
Introduction: Modernizing Oracle Data for the Open Table Era
Moving data from Oracle to Apache Iceberg is a common goal for organizations modernizing legacy systems for analytics, machine learning, and open data architectures. Oracle continues to power critical operational workloads, but its data is often locked inside proprietary infrastructure that is expensive to scale and difficult to integrate with modern analytics platforms.
Apache Iceberg provides an open table format that brings transactional guarantees, schema evolution, and time travel to cloud object storage. Together, Oracle and Iceberg allow teams to decouple operational systems from analytical workloads. However, this only works if Oracle data can be moved continuously, reliably, and without introducing excessive operational overhead.
The challenge is that syncing Oracle data to Iceberg in real time is not straightforward. Oracle extraction requires careful change data capture (CDC) configuration, schema changes are frequent, and Iceberg requires transactional writes and consistent metadata management. As a result, teams typically choose between batch-based pipelines, custom CDC implementations, or managed streaming approaches, each with different trade-offs.
The sections below outline the common approaches teams take to move Oracle data into Iceberg, why those approaches are challenging, and how modern streaming architectures simplify the process.
Other Ways Teams Move Data from Oracle to Apache Iceberg
Before managed streaming pipelines became widely available, most organizations relied on custom-built or manual approaches to move data from Oracle into Apache Iceberg. These methods are still used today, but they come with significant operational and engineering trade-offs.
Batch-Based Extraction Pipelines
A common approach is to extract data from Oracle on a scheduled basis and load it into Iceberg tables using batch jobs. Data is exported from Oracle, written to cloud storage, and then processed using Spark, AWS Glue, or similar frameworks.
While straightforward to implement, batch pipelines place additional load on the Oracle database, introduce data latency, and make it difficult to support near real-time analytics or machine learning workloads. As data volumes grow, batch jobs become slower, more expensive, and harder to operate reliably.
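To make the trade-off concrete, here is a minimal sketch of what such a batch job often looks like, assuming a Spark session that is already configured with an Iceberg catalog (named `glue` here) and an Oracle JDBC driver on the classpath. The connection details, schema, and table names are placeholders.

```python
# Minimal sketch of a scheduled batch job, assuming a Spark session already
# configured with an Iceberg catalog named "glue" and the Oracle JDBC driver
# on the classpath. Connection details and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-batch-to-iceberg").getOrCreate()

# Full-table read from Oracle over JDBC; this is the extra load on the source
# database that batch pipelines introduce on every run.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle.example.com:1521/ORCL")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "batch_reader")
    .option("password", "********")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# Rewrite the Iceberg table on every run; incremental logic would need a
# reliable "last updated" column, which many legacy schemas lack.
orders.writeTo("glue.analytics.orders").using("iceberg").createOrReplace()
```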
Custom CDC Pipelines Using Oracle LogMiner
Some teams build custom CDC pipelines using Oracle LogMiner. In this model, engineers manage log extraction, offset tracking, schema changes, retries, and recovery logic themselves. Change events are then applied to Iceberg tables using custom Spark or Flink jobs.
Although this approach can deliver lower latency than batch pipelines, achieving exactly-once semantics, handling schema evolution safely, and maintaining reliable recovery paths requires deep Oracle expertise and ongoing engineering effort.
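The hard part is less about reading the logs than applying them correctly. The sketch below shows only the apply step, assuming change events have already been parsed and staged in a Spark view named `oracle_changes` with hypothetical columns (ORDER_ID as the key, OP for the operation type, plus the changed values). It is not Estuary's implementation, just an illustration of the merge logic such pipelines have to maintain.

```python
# Sketch of the "apply changes" step only, assuming parsed LogMiner events are
# staged in a view named "oracle_changes" with hypothetical columns:
# ORDER_ID (key), OP ('I'/'U'/'D'), STATUS, AMOUNT.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("apply-oracle-cdc").getOrCreate()

spark.sql("""
    MERGE INTO glue.analytics.orders AS t
    USING oracle_changes AS s
    ON t.ORDER_ID = s.ORDER_ID
    WHEN MATCHED AND s.OP = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET t.STATUS = s.STATUS, t.AMOUNT = s.AMOUNT
    WHEN NOT MATCHED AND s.OP <> 'D' THEN
      INSERT (ORDER_ID, STATUS, AMOUNT) VALUES (s.ORDER_ID, s.STATUS, s.AMOUNT)
""")
```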
Manual Hybrid Architectures
Other organizations attempt a manual hybrid architecture that combines batch snapshots with custom CDC streams. Batch jobs are used to load historical data, while CDC streams handle ongoing changes.
While this approach can reduce staleness compared to pure batch pipelines, it often introduces complexity around deduplication, ordering, and reconciliation when writing into Iceberg tables. Managing these concerns outside of a unified CDC system increases operational risk and long-term maintenance costs.
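For example, before any merge can run safely, a hybrid pipeline has to collapse the snapshot and CDC streams down to one row per key, ordered by Oracle SCN, so that replays or overlapping batches cannot regress newer data. A rough sketch of that deduplication step, with placeholder table and column names:

```python
# Deduplication step a hybrid pipeline has to own: keep only the latest change
# per primary key (ordered by Oracle SCN) before merging into Iceberg.
# Table and column names are placeholders.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("dedupe-oracle-changes").getOrCreate()

changes = spark.table("oracle_changes_raw")  # snapshot rows + CDC rows combined

latest = (
    changes.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("ORDER_ID").orderBy(F.col("SCN").desc())
        ),
    )
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Feeds a MERGE like the one sketched in the previous section.
latest.createOrReplaceTempView("oracle_changes")
```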
These approaches can work, but they require custom infrastructure, careful coordination, and constant engineering investment. This is why many teams now prefer managed solutions that treat snapshot ingestion and continuous CDC as a single, unified pipeline.
Why Estuary Is the Easiest Way to Sync Oracle to Apache Iceberg
For teams that need continuous Oracle to Apache Iceberg synchronization, the core challenge is not moving data once, but maintaining correctness, schema evolution, and transactional guarantees over time. Manual batch pipelines and custom CDC implementations require significant operational effort and are difficult to scale reliably.
Estuary provides a unified, right-time data pipeline that captures historical data and ongoing changes from Oracle as a single system and delivers them transactionally into Apache Iceberg tables. Instead of managing separate snapshot jobs, custom LogMiner consumers, and reconciliation logic, teams rely on built-in change data capture, schema enforcement, and managed delivery to Iceberg.
Native Oracle CDC with LogMiner
Estuary uses Oracle LogMiner to capture inserts, updates, and deletes directly from the database redo logs with minimal impact on operational workloads. The connector supports Oracle 11g and newer versions and provides:
- Full insert, update, and delete capture
- Configurable backfill behavior
- Schema-aware change capture
- Optional SSH tunneling for secure network access
This enables real-time Oracle to Iceberg synchronization rather than periodic batch exports.
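For context, the sketch below is a rough illustration of what mining the redo stream looks like at the database level; it is not Estuary's code. It assumes the python-oracledb driver, a user with LogMiner privileges, an SCN window you have tracked yourself, and a log file path that is purely a placeholder. The exact procedure options vary by Oracle version.

```python
# Rough illustration of redo-log mining at the database level (not Estuary's
# implementation). Assumes python-oracledb, LogMiner privileges, a tracked SCN
# window, and a placeholder log file path; options vary by Oracle version.
import oracledb

conn = oracledb.connect(user="cdc_user", password="********",
                        dsn="oracle.example.com:1521/ORCL")
cur = conn.cursor()

# Register a redo/archive log covering the SCN window (path is a placeholder).
cur.execute("""
    BEGIN
      DBMS_LOGMNR.ADD_LOGFILE(
        LOGFILENAME => '/u01/oradata/ORCL/arch/arch_0001.log',
        OPTIONS     => DBMS_LOGMNR.NEW);
    END;
""")

# Start a LogMiner session over the tracked SCN range, using the online catalog
# as the dictionary.
cur.execute("""
    BEGIN
      DBMS_LOGMNR.START_LOGMNR(
        STARTSCN => :start_scn,
        ENDSCN   => :end_scn,
        OPTIONS  => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
    END;
""", start_scn=5000000, end_scn=5000500)

# Each row is one change event: SCN gives ordering, SQL_REDO the change itself.
cur.execute("""
    SELECT SCN, OPERATION, SEG_OWNER, TABLE_NAME, SQL_REDO
      FROM V$LOGMNR_CONTENTS
     WHERE SEG_OWNER = 'SALES'
""")
for scn, op, owner, table, redo in cur:
    print(scn, op, f"{owner}.{table}", redo)

cur.execute("BEGIN DBMS_LOGMNR.END_LOGMNR; END;")
```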
Built-In Apache Iceberg Materialization
Estuary includes a native Apache Iceberg materialization that writes data directly to cloud object storage in Iceberg format while managing table metadata and transactional commits. The materialization:
- Writes data to S3-backed Iceberg tables
- Updates metadata catalogs such as AWS Glue or REST catalogs
- Manages schema evolution and partitioning automatically
- Supports both delta-style and fully reduced updates
This removes the need to build and operate custom Spark or Glue pipelines.
Schema Enforcement and Evolution
Oracle schemas evolve over time, and unmanaged schema changes are a common source of pipeline failures. Estuary enforces schemas at the collection level, validates changes, and propagates compatible updates safely downstream to Iceberg tables without manual intervention.
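For reference, this is what a compatible change looks like once it reaches the Iceberg side. Estuary applies such changes for you; the equivalent manual operation, shown here with placeholder catalog and table names, is an additive ALTER on the table.

```python
# What a compatible schema change looks like on the Iceberg table itself.
# Estuary propagates this automatically; catalog/table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-schema-evolution").getOrCreate()

# Adding a nullable column is a metadata-only change in Iceberg: existing data
# files are untouched and older snapshots remain readable via time travel.
spark.sql("ALTER TABLE glue.analytics.orders ADD COLUMNS (discount_code STRING)")
```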
Unified Stream-to-Lakehouse Pipeline
Oracle to Iceberg synchronization is part of a broader, unified pipeline. With Estuary, teams can filter or transform data in motion, materialize the same stream to multiple destinations, and monitor pipeline health using built-in metrics and observability tools.
This approach treats snapshot ingestion and continuous CDC as a single system, reducing operational complexity while maintaining real-time data freshness.
Steps to Connect Oracle to Apache Iceberg with Estuary
Estuary helps you build a real-time Oracle to Iceberg data pipeline in just a few steps. Whether you’re modernizing your warehouse or streaming operational data to a data lakehouse, Estuary makes it simple — no custom code or infrastructure required.
Before you begin, ensure the following prerequisites are in place:
Prerequisites
Oracle (Source):
- Oracle 11g or later
- An Oracle DB user with:
  - SELECT access to the source tables
  - Access to LogMiner views (for CDC)
- Supplemental logging enabled (a setup sketch follows this list)
- A custom FLOW_WATERMARKS table for tracking
- Network connectivity between Estuary and Oracle DB (via public IP or SSH tunneling)
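The sketch below illustrates the DBA-side setup for these Oracle prerequisites, run through python-oracledb. The `flow_capture` user name, the grant list, and the FLOW_WATERMARKS column layout are illustrative assumptions; follow the Oracle connector docs for the authoritative statements for your Oracle version and edition.

```python
# Illustrative DBA-side setup for the prerequisites above, via python-oracledb.
# The "flow_capture" user, grant list, and FLOW_WATERMARKS layout are
# assumptions; see the connector docs for the authoritative statements.
import oracledb

conn = oracledb.connect(user="sys", password="********",
                        dsn="oracle.example.com:1521/ORCL",
                        mode=oracledb.AUTH_MODE_SYSDBA)
cur = conn.cursor()

# Supplemental logging so redo records carry enough column data for CDC.
cur.execute("ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS")

# A dedicated capture user with read access to the source schema and LogMiner.
cur.execute("CREATE USER flow_capture IDENTIFIED BY StrongPassword1")
for stmt in [
    "GRANT CREATE SESSION TO flow_capture",
    "GRANT SELECT ANY TABLE TO flow_capture",        # or per-table SELECT grants
    "GRANT LOGMINING TO flow_capture",               # Oracle 12c and newer
    "GRANT EXECUTE ON DBMS_LOGMNR TO flow_capture",
    "GRANT SELECT ON V_$LOGMNR_CONTENTS TO flow_capture",
    "GRANT SELECT ON V_$DATABASE TO flow_capture",
]:
    cur.execute(stmt)

# Watermarks table the connector uses to coordinate backfills with the change
# stream (the two-column layout here is an assumption).
cur.execute("""
    CREATE TABLE flow_capture.FLOW_WATERMARKS (
        SLOT      VARCHAR2(1000) PRIMARY KEY,
        WATERMARK VARCHAR2(4000)
    )
""")
```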
Apache Iceberg (Destination):
- An S3 bucket for storing Iceberg table files
- An EMR Serverless application (Spark runtime) for standard, fully-reduced updates (not needed for delta updates)
- AWS Glue or a REST catalog set up as your Iceberg catalog (a quick verification sketch follows this list)
- IAM credentials with access to:
- S3 bucket
- EMR execution
- Catalog read/write permissions
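Before configuring the materialization, it can help to sanity-check the AWS side. The following sketch uses boto3 to confirm the caller identity, the S3 bucket, and the Glue database that will serve as the Iceberg namespace; the bucket, region, and database names are placeholders.

```python
# Quick sanity check of the AWS-side prerequisites with boto3 (caller identity,
# S3 bucket, and Glue database). Names and region are placeholders.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
glue = boto3.client("glue", region_name="us-east-1")
sts = boto3.client("sts")

print("Acting as:", sts.get_caller_identity()["Arn"])

# Raises if the bucket does not exist or the credentials cannot reach it.
s3.head_bucket(Bucket="my-iceberg-warehouse")

# The Glue database doubles as the Iceberg namespace; create it if missing.
try:
    glue.get_database(Name="analytics")
except glue.exceptions.EntityNotFoundException:
    glue.create_database(DatabaseInput={"Name": "analytics"})
```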
Step 1: Set Up Oracle as the Source
- Log into your Estuary account.
- In the Sources section, click + NEW CAPTURE.
- Search for Oracle, then click the Capture button for the real-time connector.
- On the Oracle connector configuration page, fill in the required fields:
  - Address: e.g., oracle.example.com:1521
  - User / Password: Your Oracle CDC user credentials
  - Database: Use the PDB name (if using a container database) or default to ORCL
- (Optional) Configure SSH tunneling if your DB is behind a firewall.
- (Optional) Define advanced settings such as:
  - watermarksTable: <user>.FLOW_WATERMARKS
  - incremental_scn_range: Adjust based on data volume
  - discover_schemas: Specify schemas to include (for faster discovery)
  - Dictionary Mode: Use extract for schema-change resilience
- Click NEXT > SAVE AND PUBLISH to create your capture.
This step sets up a real-time Oracle CDC stream, which will populate an Estuary collection with change events.
Step 2: Configure Iceberg as the Destination
Once Oracle data is flowing into your collection, it’s time to materialize that data to Apache Iceberg:
- On the popup after capture creation, click MATERIALIZE COLLECTIONS (or go to Destinations > + NEW MATERIALIZATION).
- Search for Iceberg, and select the appropriate materialization connector:
- Use Amazon S3 Iceberg for delta updates and AWS-native setup using a Glue or REST catalog
- Use Apache Iceberg for standard, fully-reduced updates using a REST catalog
- Click the Materialize button and configure the destination:
Amazon S3 Iceberg (Delta Updates) Required Fields:
- Name: A unique materialization name
- AWS Access Key ID / Secret Access Key: With access to S3 and AWS Glue
- Bucket: Your S3 bucket name
- Region: Where the S3 bucket is hosted
- Namespace: A logical grouping for your Iceberg tables
- Catalog: Choose between AWS Glue and REST; a REST catalog will require:
- URI: The REST catalog URI
- Warehouse: The warehouse to connect to
- Credential or Token: Authentication to connect to the catalog
Apache Iceberg (Standard Updates) Required Fields:
- Name: A unique materialization name
- URL: Base URL for the catalog
- Warehouse: Warehouse to connect to
- Namespace: A logical grouping for your Iceberg tables
- Catalog authentication: Credentials to access the REST catalog
- OAuth 2.0: Requires the URI and credentials
- AWS SigV4 Authentication: Requires AWS access key/ID and region
- Compute Settings (for EMR Serverless):
- AWS Access Key ID / Secret Access Key: With access to S3 and EMR
- Bucket: Your S3 bucket name
- Region: Where both EMR and the S3 bucket are hosted
- Application ID: Your EMR Serverless app ID (Spark runtime, e.g., emr-7.7.0)
- Execution Role ARN: IAM role for EMR job execution
- Under Source Collections, ensure the Oracle capture is linked. If not, click SOURCE FROM CAPTURE to bind it manually.
- Click NEXT > SAVE AND PUBLISH to activate your real-time sync.
What Happens Next?
- Estuary will stream change data from Oracle to a collection.
- Data will be converted to Parquet, then written to Iceberg tables in S3.
- Iceberg metadata will be updated in Glue or REST Catalog.
- Your tables are now queryable via Trino, Spark, Athena, or Snowflake (see the example query below).
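As a quick check, you can query the freshly materialized table from PySpark through a Glue-backed Iceberg catalog. The catalog name, namespace, table, and bucket below are placeholders, and the Iceberg Spark runtime and AWS bundle jars are assumed to be on the classpath.

```python
# Example of querying the materialized table through a Glue-backed Iceberg
# catalog. Catalog, namespace, table, and bucket names are placeholders; the
# Iceberg Spark runtime and AWS bundle jars must be on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("query-iceberg")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-iceberg-warehouse/")
    .getOrCreate()
)

# Fresh rows from Oracle should appear shortly after they are committed upstream.
spark.sql("SELECT COUNT(*) AS orders FROM glue.analytics.orders").show()

# Iceberg metadata table: inspect the snapshots produced by each commit.
spark.sql("SELECT * FROM glue.analytics.orders.snapshots").show(truncate=False)
```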
Whether you're building a modern lakehouse or want low-latency reporting from Oracle, Estuary is the fastest way to set up a streaming Oracle to Iceberg pipeline.
Get Started with Real-Time Oracle to Iceberg Sync
Whether you're modernizing your data stack, scaling your analytics platform, or enabling real-time ML, Estuary gives you a faster, more straightforward way to sync Oracle to Apache Iceberg.
You don't need to build fragile pipelines, manage Spark jobs, or worry about schema drift. Just connect, configure, and let Estuary handle the rest — with:
- Change Data Capture from Oracle (via LogMiner)
- Streaming ingestion with schema validation
- Delta or fully-reduced updates to Apache Iceberg tables
- Automated metadata updates via AWS Glue or REST Catalogs
- Millisecond latency for real-time access
Ready to build your Oracle to Iceberg pipeline?
👉 Try Estuary for Free - No code required, set up in minutes
📚 Or explore the docs:
- Oracle Connector Docs
- Standard Apache Iceberg Materialization Docs
- Delta Updates Apache Iceberg Materialization Docs
FAQs
Can you stream Oracle data to Apache Iceberg in real time?
Yes. Change data capture reads inserts, updates, and deletes from Oracle's redo logs via LogMiner and applies them to Iceberg tables continuously, so the lakehouse stays in sync without scheduled batch exports. Estuary provides this as a managed pipeline.
Do I need Spark to sync Oracle to Iceberg?
Not for delta updates, which Estuary writes directly to S3 and your catalog. The standard, fully-reduced materialization uses an EMR Serverless Spark runtime that you provide, but you do not build or maintain custom Spark jobs.

About the author
Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Dani focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.