
Introduction: Modernizing Oracle Data for the Open Table Era
Moving data from Oracle to Apache Iceberg is a common goal for organizations modernizing legacy systems for analytics, machine learning, and open data architectures. Oracle continues to power critical operational workloads, but its data is often locked inside proprietary infrastructure that is expensive to scale and difficult to integrate with modern analytics platforms.
Apache Iceberg provides an open table format that brings transactional guarantees, schema evolution, and time travel to cloud object storage. Together, Oracle and Iceberg allow teams to decouple operational systems from analytical workloads. However, this only works if Oracle data can be moved continuously, reliably, and without introducing excessive operational overhead.
The challenge is that syncing Oracle data to Iceberg in real time is not straightforward. Oracle extraction requires careful change data capture (CDC) configuration, schema changes are frequent, and Iceberg requires transactional writes and consistent metadata management. As a result, teams typically choose between batch-based pipelines, custom CDC implementations, or managed streaming approaches, each with different trade-offs.
The sections below outline the common approaches teams take to move Oracle data into Iceberg, why those approaches are challenging, and how modern streaming architectures simplify the process.
Other Ways Teams Move Data from Oracle to Apache Iceberg
Before managed streaming pipelines became widely available, most organizations relied on custom-built or manual approaches to move data from Oracle into Apache Iceberg. These methods are still used today, but they come with significant operational and engineering trade-offs.
Batch-Based Extraction Pipelines
A common approach is to extract data from Oracle on a scheduled basis and load it into Iceberg tables using batch jobs. Data is exported from Oracle, written to cloud storage, and then processed using Spark, AWS Glue, or similar frameworks.
While straightforward to implement, batch pipelines place additional load on the Oracle database, introduce data latency, and make it difficult to support near real-time analytics or machine learning workloads. As data volumes grow, batch jobs become slower, more expensive, and harder to operate reliably.
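To make the trade-off concrete, here is a minimal sketch of what such a batch job often looks like, assuming a Spark session that is already configured with an Iceberg catalog (named `glue` here) and an Oracle JDBC driver on the classpath. The connection details, schema, and table names are placeholders.

```python
# Minimal sketch of a scheduled batch job, assuming a Spark session already
# configured with an Iceberg catalog named "glue" and the Oracle JDBC driver
# on the classpath. Connection details and table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle-batch-to-iceberg").getOrCreate()

# Full-table read from Oracle over JDBC; this is the extra load on the source
# database that batch pipelines introduce on every run.
orders = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle.example.com:1521/ORCL")
    .option("dbtable", "SALES.ORDERS")
    .option("user", "batch_reader")
    .option("password", "********")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# Rewrite the Iceberg table on every run; incremental logic would need a
# reliable "last updated" column, which many legacy schemas lack.
orders.writeTo("glue.analytics.orders").using("iceberg").createOrReplace()
```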
Custom CDC Pipelines Using Oracle LogMiner
Some teams build custom CDC pipelines using Oracle LogMiner. In this model, engineers manage log extraction, offset tracking, schema changes, retries, and recovery logic themselves. Change events are then applied to Iceberg tables using custom Spark or Flink jobs.
Although this approach can deliver lower latency than batch pipelines, achieving exactly-once semantics, handling schema evolution safely, and maintaining reliable recovery paths requires deep Oracle expertise and ongoing engineering effort.
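The hard part is less about reading the logs than applying them correctly. The sketch below shows only the apply step, assuming change events have already been parsed and staged in a Spark view named `oracle_changes` with hypothetical columns (ORDER_ID as the key, OP for the operation type, plus the changed values). It is not Estuary's implementation, just an illustration of the merge logic such pipelines have to maintain.

```python
# Sketch of the "apply changes" step only, assuming parsed LogMiner events are
# staged in a view named "oracle_changes" with hypothetical columns:
# ORDER_ID (key), OP ('I'/'U'/'D'), STATUS, AMOUNT.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("apply-oracle-cdc").getOrCreate()

spark.sql("""
    MERGE INTO glue.analytics.orders AS t
    USING oracle_changes AS s
    ON t.ORDER_ID = s.ORDER_ID
    WHEN MATCHED AND s.OP = 'D' THEN DELETE
    WHEN MATCHED THEN UPDATE SET t.STATUS = s.STATUS, t.AMOUNT = s.AMOUNT
    WHEN NOT MATCHED AND s.OP <> 'D' THEN
      INSERT (ORDER_ID, STATUS, AMOUNT) VALUES (s.ORDER_ID, s.STATUS, s.AMOUNT)
""")
```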
Manual Hybrid Architectures
Other organizations attempt a manual hybrid architecture that combines batch snapshots with custom CDC streams. Batch jobs are used to load historical data, while CDC streams handle ongoing changes.
While this approach can reduce staleness compared to pure batch pipelines, it often introduces complexity around deduplication, ordering, and reconciliation when writing into Iceberg tables. Managing these concerns outside of a unified CDC system increases operational risk and long-term maintenance costs.
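For example, before any merge can run safely, a hybrid pipeline has to collapse the snapshot and CDC streams down to one row per key, ordered by Oracle SCN, so that replays or overlapping batches cannot regress newer data. A rough sketch of that deduplication step, with placeholder table and column names:

```python
# Deduplication step a hybrid pipeline has to own: keep only the latest change
# per primary key (ordered by Oracle SCN) before merging into Iceberg.
# Table and column names are placeholders.
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("dedupe-oracle-changes").getOrCreate()

changes = spark.table("oracle_changes_raw")  # snapshot rows + CDC rows combined

latest = (
    changes.withColumn(
        "rn",
        F.row_number().over(
            Window.partitionBy("ORDER_ID").orderBy(F.col("SCN").desc())
        ),
    )
    .filter(F.col("rn") == 1)
    .drop("rn")
)

# Feeds a MERGE like the one sketched in the previous section.
latest.createOrReplaceTempView("oracle_changes")
```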
These approaches can work, but they require custom infrastructure, careful coordination, and constant engineering investment. This is why many teams now prefer managed solutions that treat snapshot ingestion and continuous CDC as a single, unified pipeline.
Why Estuary Is the Easiest Way to Sync Oracle to Apache Iceberg
For teams that need continuous Oracle to Apache Iceberg synchronization, the core challenge is not moving data once, but maintaining correctness, schema evolution, and transactional guarantees over time. Manual batch pipelines and custom CDC implementations require significant operational effort and are difficult to scale reliably.
Estuary provides a unified, right-time data pipeline that captures historical data and ongoing changes from Oracle as a single system and delivers them transactionally into Apache Iceberg tables. Instead of managing separate snapshot jobs, custom LogMiner consumers, and reconciliation logic, teams rely on built-in change data capture, schema enforcement, and managed delivery to Iceberg.
Native Oracle CDC with LogMiner
Estuary uses Oracle LogMiner to capture inserts, updates, and deletes directly from the database redo logs with minimal impact on operational workloads. The connector supports Oracle 11g and newer versions and provides:
- Full insert, update, and delete capture
- Configurable backfill behavior
- Schema-aware change capture
- Optional SSH tunneling for secure network access
This enables real-time Oracle to Iceberg synchronization rather than periodic batch exports.
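For context, the sketch below is a rough illustration of what mining the redo stream looks like at the database level; it is not Estuary's code. It assumes the python-oracledb driver, a user with LogMiner privileges, an SCN window you have tracked yourself, and a log file path that is purely a placeholder. The exact procedure options vary by Oracle version.

```python
# Rough illustration of redo-log mining at the database level (not Estuary's
# implementation). Assumes python-oracledb, LogMiner privileges, a tracked SCN
# window, and a placeholder log file path; options vary by Oracle version.
import oracledb

conn = oracledb.connect(user="cdc_user", password="********",
                        dsn="oracle.example.com:1521/ORCL")
cur = conn.cursor()

# Register a redo/archive log covering the SCN window (path is a placeholder).
cur.execute("""
    BEGIN
      DBMS_LOGMNR.ADD_LOGFILE(
        LOGFILENAME => '/u01/oradata/ORCL/arch/arch_0001.log',
        OPTIONS     => DBMS_LOGMNR.NEW);
    END;
""")

# Start a LogMiner session over the tracked SCN range, using the online catalog
# as the dictionary.
cur.execute("""
    BEGIN
      DBMS_LOGMNR.START_LOGMNR(
        STARTSCN => :start_scn,
        ENDSCN   => :end_scn,
        OPTIONS  => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
    END;
""", start_scn=5000000, end_scn=5000500)

# Each row is one change event: SCN gives ordering, SQL_REDO the change itself.
cur.execute("""
    SELECT SCN, OPERATION, SEG_OWNER, TABLE_NAME, SQL_REDO
      FROM V$LOGMNR_CONTENTS
     WHERE SEG_OWNER = 'SALES'
""")
for scn, op, owner, table, redo in cur:
    print(scn, op, f"{owner}.{table}", redo)

cur.execute("BEGIN DBMS_LOGMNR.END_LOGMNR; END;")
```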
Built-In Apache Iceberg Materialization
Estuary includes a native Apache Iceberg materialization that writes data directly to cloud object storage in Iceberg format while managing table metadata and transactional commits. The materialization:
- Writes data to S3-backed Iceberg tables
- Updates metadata catalogs such as AWS Glue or REST catalogs
- Manages schema evolution and partitioning automatically
- Supports both delta-style and fully reduced updates
This removes the need to build and operate custom Spark or Glue pipelines.
Schema Enforcement and Evolution
Oracle schemas evolve over time, and unmanaged schema changes are a common source of pipeline failures. Estuary enforces schemas at the collection level, validates changes, and propagates compatible updates safely downstream to Iceberg tables without manual intervention.
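For reference, this is what a compatible change looks like once it reaches the Iceberg side. Estuary applies such changes for you; the equivalent manual operation, shown here with placeholder catalog and table names, is an additive ALTER on the table.

```python
# What a compatible schema change looks like on the Iceberg table itself.
# Estuary propagates this automatically; catalog/table names are placeholders.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("iceberg-schema-evolution").getOrCreate()

# Adding a nullable column is a metadata-only change in Iceberg: existing data
# files are untouched and older snapshots remain readable via time travel.
spark.sql("ALTER TABLE glue.analytics.orders ADD COLUMNS (discount_code STRING)")
```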
Unified Stream-to-Lakehouse Pipeline
Oracle to Iceberg synchronization is part of a broader, unified pipeline. With Estuary, teams can filter or transform data in motion, materialize the same stream to multiple destinations, and monitor pipeline health using built-in metrics and observability tools.
This approach treats snapshot ingestion and continuous CDC as a single system, reducing operational complexity while maintaining real-time data freshness.
Steps to Connect Oracle to Apache Iceberg with Estuary
Estuary helps you build a real-time Oracle to Iceberg data pipeline in just a few steps. Whether you’re modernizing your warehouse or streaming operational data to a data lakehouse, Estuary makes it simple — no custom code or infrastructure required.
Before you begin, ensure the following prerequisites are in place:
Prerequisites
Oracle (Source):
- Oracle 11g or later
- An Oracle DB user with:
  - SELECT access to the source tables
  - Access to LogMiner views (for CDC)
- Supplemental logging enabled (a setup sketch follows this list)
- A custom FLOW_WATERMARKS table for tracking
- Network connectivity between Estuary and Oracle DB (via public IP or SSH tunneling)
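The sketch below illustrates the DBA-side setup for these Oracle prerequisites, run through python-oracledb. The `flow_capture` user name, the grant list, and the FLOW_WATERMARKS column layout are illustrative assumptions; follow the Oracle connector docs for the authoritative statements for your Oracle version and edition.

```python
# Illustrative DBA-side setup for the prerequisites above, via python-oracledb.
# The "flow_capture" user, grant list, and FLOW_WATERMARKS layout are
# assumptions; see the connector docs for the authoritative statements.
import oracledb

conn = oracledb.connect(user="sys", password="********",
                        dsn="oracle.example.com:1521/ORCL",
                        mode=oracledb.AUTH_MODE_SYSDBA)
cur = conn.cursor()

# Supplemental logging so redo records carry enough column data for CDC.
cur.execute("ALTER DATABASE ADD SUPPLEMENTAL LOG DATA (ALL) COLUMNS")

# A dedicated capture user with read access to the source schema and LogMiner.
cur.execute("CREATE USER flow_capture IDENTIFIED BY StrongPassword1")
for stmt in [
    "GRANT CREATE SESSION TO flow_capture",
    "GRANT SELECT ANY TABLE TO flow_capture",        # or per-table SELECT grants
    "GRANT LOGMINING TO flow_capture",               # Oracle 12c and newer
    "GRANT EXECUTE ON DBMS_LOGMNR TO flow_capture",
    "GRANT SELECT ON V_$LOGMNR_CONTENTS TO flow_capture",
    "GRANT SELECT ON V_$DATABASE TO flow_capture",
]:
    cur.execute(stmt)

# Watermarks table the connector uses to coordinate backfills with the change
# stream (the two-column layout here is an assumption).
cur.execute("""
    CREATE TABLE flow_capture.FLOW_WATERMARKS (
        SLOT      VARCHAR2(1000) PRIMARY KEY,
        WATERMARK VARCHAR2(4000)
    )
""")
```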
Apache Iceberg (Destination):
- An S3 bucket for storing Iceberg table files
- An EMR Serverless application (Spark runtime) for standard, fully-reduced updates (not needed for delta updates)
- AWS Glue or a REST catalog set up as your Iceberg catalog (a quick verification sketch follows this list)
- IAM credentials with access to:
- S3 bucket
- EMR execution
- Catalog read/write permissions
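Before configuring the materialization, it can help to sanity-check the AWS side. The following sketch uses boto3 to confirm the caller identity, the S3 bucket, and the Glue database that will serve as the Iceberg namespace; the bucket, region, and database names are placeholders.

```python
# Quick sanity check of the AWS-side prerequisites with boto3 (caller identity,
# S3 bucket, and Glue database). Names and region are placeholders.
import boto3

s3 = boto3.client("s3", region_name="us-east-1")
glue = boto3.client("glue", region_name="us-east-1")
sts = boto3.client("sts")

print("Acting as:", sts.get_caller_identity()["Arn"])

# Raises if the bucket does not exist or the credentials cannot reach it.
s3.head_bucket(Bucket="my-iceberg-warehouse")

# The Glue database doubles as the Iceberg namespace; create it if missing.
try:
    glue.get_database(Name="analytics")
except glue.exceptions.EntityNotFoundException:
    glue.create_database(DatabaseInput={"Name": "analytics"})
```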
Step 1: Set Up Oracle as the Source
- Log into your Estuary account.
- In the Sources section, click + NEW CAPTURE.
- Search for Oracle, then click the Capture button for the real-time connector.
- On the Oracle connector configuration page, fill in the required fields:
  - Address: e.g., oracle.example.com:1521
  - User / Password: Your Oracle CDC user credentials
  - Database: Use the PDB name (if using a container database) or default to ORCL
- (Optional) Configure SSH tunneling if your DB is behind a firewall.
- (Optional) Define advanced settings such as:
  - watermarksTable: <user>.FLOW_WATERMARKS
  - incremental_scn_range: Adjust based on data volume
  - discover_schemas: Specify schemas to include (for faster discovery)
  - Dictionary Mode: Use extract for schema-change resilience
- Click NEXT > SAVE AND PUBLISH to create your capture.
This step sets up a real-time Oracle CDC stream, which will populate an Estuary collection with change events.
Step 2: Configure Iceberg as the Destination
Once Oracle data is flowing into your collection, it’s time to materialize that data to Apache Iceberg:
- On the popup after capture creation, click MATERIALIZE COLLECTIONS (or go to Destinations > + NEW MATERIALIZATION).
- Search for Iceberg, and select the appropriate materialization connector:
- Use Amazon S3 Iceberg for delta updates and AWS-native setup using a Glue or REST catalog
- Use Apache Iceberg for standard, fully-reduced updates using a REST catalog
- Click the Materialize button and configure the destination:
Amazon S3 Iceberg (Delta Updates) Required Fields:
- Name: A unique materialization name
- AWS Access Key ID / Secret Access Key: With access to S3 and AWS Glue
- Bucket: Your S3 bucket name
- Region: Where the S3 bucket is hosted
- Namespace: A logical grouping for your Iceberg tables
- Catalog: Choose between AWS Glue and REST; a REST catalog will require:
- URI: The REST catalog URI
- Warehouse: The warehouse to connect to
- Credential or Token: Authentication to connect to the catalog
Apache Iceberg (Standard Updates) Required Fields:
- Name: A unique materialization name
- URL: Base URL for the catalog
- Warehouse: Warehouse to connect to
- Namespace: A logical grouping for your Iceberg tables
- Catalog authentication: Credentials to access the REST catalog
- OAuth 2.0: Requires the URI and credentials
- AWS SigV4 Authentication: Requires AWS access key/ID and region
- Compute Settings (for EMR Serverless):
- AWS Access Key ID / Secret Access Key: With access to S3 and EMR
- Bucket: Your S3 bucket name
- Region: Where both EMR and the S3 bucket are hosted
- Application ID: Your EMR Serverless app ID (Spark runtime, e.g., emr-7.7.0)
- Execution Role ARN: IAM role for EMR job execution
- Under Source Collections, ensure the Oracle capture is linked. If not, click SOURCE FROM CAPTURE to bind it manually.
- Click NEXT > SAVE AND PUBLISH to activate your real-time sync.
What Happens Next?
- Estuary will stream change data from Oracle to a collection.
- Data will be converted to Parquet, then written to Iceberg tables in S3.
- Iceberg metadata will be updated in Glue or REST Catalog.
- Your tables are now queryable via Trino, Spark, Athena, or Snowflake (see the example query below).
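As a quick check, you can query the freshly materialized table from PySpark through a Glue-backed Iceberg catalog. The catalog name, namespace, table, and bucket below are placeholders, and the Iceberg Spark runtime and AWS bundle jars are assumed to be on the classpath.

```python
# Example of querying the materialized table through a Glue-backed Iceberg
# catalog. Catalog, namespace, table, and bucket names are placeholders; the
# Iceberg Spark runtime and AWS bundle jars must be on the classpath.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder.appName("query-iceberg")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue.io-impl",
            "org.apache.iceberg.aws.s3.S3FileIO")
    .config("spark.sql.catalog.glue.warehouse", "s3://my-iceberg-warehouse/")
    .getOrCreate()
)

# Fresh rows from Oracle should appear shortly after they are committed upstream.
spark.sql("SELECT COUNT(*) AS orders FROM glue.analytics.orders").show()

# Iceberg metadata table: inspect the snapshots produced by each commit.
spark.sql("SELECT * FROM glue.analytics.orders.snapshots").show(truncate=False)
```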
Whether you're building a modern lakehouse or want low-latency reporting from Oracle, Estuary is the fastest way to set up a streaming Oracle to Iceberg pipeline.
Get Started with Real-Time Oracle to Iceberg Sync
Whether you're modernizing your data stack, scaling your analytics platform, or enabling real-time ML, Estuary gives you a faster, more straightforward way to sync Oracle to Apache Iceberg.
You don't need to build fragile pipelines, manage Spark jobs, or worry about schema drift. Just connect, configure, and let Estuary handle the rest — with:
- Change Data Capture from Oracle (via LogMiner)
- Streaming ingestion with schema validation
- Delta or fully-reduced updates to Apache Iceberg tables
- Automated metadata updates via AWS Glue or REST Catalogs
- Millisecond latency for real-time access
Ready to build your Oracle to Iceberg pipeline?
👉 Try Estuary for Free - No code required, set up in minutes
📚 Or explore the docs:
- Oracle Connector Docs
- Standard Apache Iceberg Materialization Docs
- Delta Updates Apache Iceberg Materialization Docs
FAQs
Can you stream Oracle data to Apache Iceberg in real time?
Yes. Change data capture reads inserts, updates, and deletes from Oracle's redo logs via LogMiner and applies them to Iceberg tables continuously, so the lakehouse stays in sync without scheduled batch exports. Estuary provides this as a managed pipeline.
Do I need Spark to sync Oracle to Iceberg?
Not for delta updates, which Estuary writes directly to S3 and your catalog. The standard, fully-reduced materialization uses an EMR Serverless Spark runtime that you provide, but you do not build or maintain custom Spark jobs.

About the author
Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Dani focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.