
DynamoDB to Apache Iceberg: Streaming Change Data into a Lakehouse

Learn the common ways to stream DynamoDB data into Apache Iceberg, including AWS-native pipelines and a managed CDC approach with Estuary. Covers architecture, tradeoffs, and design considerations.


Streaming data from Amazon DynamoDB into Apache Iceberg is a common requirement for teams that want to analyze application data at scale without sacrificing freshness. DynamoDB is optimized for low-latency transactional workloads, while Iceberg is designed for analytical queries, historical analysis, and lakehouse architectures. Bridging these systems requires capturing DynamoDB change events and applying them correctly to Iceberg tables on object storage.

This guide explains the common DynamoDB to Iceberg architectures, their tradeoffs, and how Estuary simplifies real-time streaming into Iceberg without requiring teams to operate complex streaming infrastructure.

Key takeaways

  • DynamoDB is not designed for analytical queries, joins, or historical analysis.

  • Apache Iceberg provides ACID transactions, schema evolution, and time travel on low-cost object storage.

  • DynamoDB Streams is the authoritative source for change data capture from DynamoDB.

  • AWS-native pipelines can stream DynamoDB data into Iceberg but introduce operational and correctness challenges.

  • Estuary provides a right-time data platform that streams DynamoDB changes into Iceberg with built-in CDC handling and minimal operational overhead.

The business problem: why DynamoDB data is hard to analyze

Amazon DynamoDB is purpose-built for operational workloads. It excels at high-throughput key-value access, predictable latency, and horizontal scalability. These strengths come with tradeoffs that become apparent as data volumes and analytical needs grow.

Limited analytical query capabilities

DynamoDB does not support complex analytical queries. Even with PartiQL, querying remains constrained to access patterns defined by primary keys and indexes. Joins, aggregations across large datasets, and historical trend analysis are not practical.

No native historical or time-based analysis

DynamoDB is optimized for current state access, not historical exploration. While point-in-time recovery exists for backup and restore, it is not designed for querying data over time or analyzing change history.

Analytics workflows require external systems

To analyze DynamoDB data, teams typically export data into systems like Amazon S3, data warehouses, or lakehouse platforms. This immediately introduces data movement, transformation logic, and operational complexity.

Why batch exports fall short

A common workaround is to export DynamoDB data to S3 using periodic jobs or managed services such as AWS Glue. While simple to reason about, batch pipelines introduce several limitations.

High latency

Batch exports run on schedules. Whether hourly or nightly, insights lag behind production activity. This impacts real-time dashboards, operational monitoring, and machine learning pipelines that depend on fresh data.

Fragile and hard to evolve

Batch pipelines often rely on scripts or Glue jobs that must be updated when schemas change. DynamoDB is schemaless by nature, but analytical systems are not. Over time, these pipelines become brittle and costly to maintain.

Inefficient for large-scale change data

Re-exporting large tables repeatedly is inefficient when only a small subset of records change. Batch jobs waste compute and storage while still failing to deliver low-latency insights.

For teams that need timely analytics, streaming change data capture is a more appropriate model.

Common ways to stream DynamoDB changes into Apache Iceberg

There are several established, AWS-native approaches to streaming DynamoDB data into Iceberg tables on object storage. Each approach makes different tradeoffs between simplicity, correctness, and operational effort.

Option 1: DynamoDB Streams to Firehose to Iceberg

This is the simplest managed option.

How it works

  • DynamoDB Streams is enabled on a table, typically using NEW_AND_OLD_IMAGES to capture full change events.
  • Stream records are forwarded into Amazon Data Firehose, often via a Lambda or Kinesis intermediary.
  • Firehose writes records into Apache Iceberg tables on Amazon S3 using a supported catalog.

When this works well

  • Minimal operational overhead
  • Mostly append-oriented analytics
  • Low transformation requirements

Tradeoffs

DynamoDB Streams emits CDC events, not row-level table updates. Mapping inserts, updates, and deletes into correct Iceberg row semantics requires careful design. Frequent updates and deletes often require downstream merge or compaction processes. Schema evolution and deduplication logic must be handled outside of Firehose.
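
For illustration, the sketch below shows the kind of flattening a Lambda between DynamoDB Streams and Firehose typically has to perform. It is a minimal example under assumed names, not a production implementation: the delivery stream name and the _cdc_op and _cdc_sequence fields are placeholders.

```python
import json

import boto3
from boto3.dynamodb.types import TypeDeserializer

firehose = boto3.client("firehose")
deserializer = TypeDeserializer()

# Hypothetical Firehose delivery stream configured with an Iceberg destination.
DELIVERY_STREAM = "iceberg-delivery-stream"


def handler(event, context):
    """Flatten DynamoDB Streams records into newline-delimited CDC rows."""
    records = []
    for record in event["Records"]:
        change = record["dynamodb"]
        # REMOVE events carry no NewImage, so fall back to the old image.
        image = change.get("NewImage") or change.get("OldImage", {})
        row = {k: deserializer.deserialize(v) for k, v in image.items()}
        row["_cdc_op"] = record["eventName"]             # INSERT, MODIFY, or REMOVE
        row["_cdc_sequence"] = change["SequenceNumber"]  # preserves per-key ordering
        records.append({"Data": (json.dumps(row, default=str) + "\n").encode("utf-8")})

    # PutRecordBatch accepts at most 500 records per call; larger Lambda
    # batches would need to be chunked.
    if records:
        firehose.put_record_batch(DeliveryStreamName=DELIVERY_STREAM, Records=records)
```

Even in this simplest path, the function must decide how deletes are represented downstream, which is exactly the kind of merge logic Firehose does not handle for you.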

Option 2: DynamoDB Streams to Apache Flink to Iceberg

This approach offers the most control and correctness.

How it works

  • DynamoDB Streams provides ordered change events per partition.
  • Apache Flink reads the stream using a DynamoDB Streams connector, commonly via Amazon Managed Service for Apache Flink.
  • Flink applies CDC semantics, transformations, enrichment, and writes into Iceberg using the Iceberg sink connector.

When this works well

  • True upserts and deletes are required
  • Data must be enriched, re-keyed, or transformed
  • Strong control over watermarking and state

Tradeoffs

Operating Flink introduces significant complexity. Teams must manage state backends, checkpoints, failure recovery, and small-file mitigation. Iceberg table maintenance, including compaction and optimization, must be planned and operated continuously.
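
To make the Flink path more concrete, here is a minimal PyFlink sketch under assumed catalog, warehouse, and table names. The DynamoDB Streams source (dynamodb_changes below) is assumed to be registered separately via the connector, and the catalog and table properties shown are illustrative rather than prescriptive.

```python
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Register an Iceberg catalog backed by AWS Glue; paths and names are assumptions.
t_env.execute_sql("""
    CREATE CATALOG lake WITH (
        'type' = 'iceberg',
        'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog',
        'io-impl' = 'org.apache.iceberg.aws.s3.S3FileIO',
        'warehouse' = 's3://my-lakehouse/warehouse'
    )
""")

# An Iceberg v2 table with a primary key so the sink can apply upserts and deletes.
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.orders (
        order_id STRING,
        status STRING,
        updated_at TIMESTAMP(3),
        PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH (
        'format-version' = '2',
        'write.upsert.enabled' = 'true'
    )
""")

# dynamodb_changes stands in for the changelog table produced by the
# DynamoDB Streams connector; inserts, updates, and deletes flow through
# as row-level changes on the Iceberg table.
t_env.execute_sql("""
    INSERT INTO lake.analytics.orders
    SELECT order_id, status, updated_at FROM dynamodb_changes
""")
```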

Option 3: DynamoDB Streams to AWS Glue streaming jobs to Iceberg

This option is familiar to Spark-centric teams.

How it works

  • DynamoDB Streams data is routed through a streaming backbone such as Kinesis.
  • AWS Glue streaming jobs read the stream and apply transformations using Spark.
  • Data is written to Iceberg tables using the Glue Data Catalog.

When this works well

  • Existing Spark and Glue expertise
  • Hybrid batch and streaming transformations
  • Integration with existing Glue-based data lakes

Tradeoffs

Although Glue abstracts some infrastructure, CDC correctness, merge semantics, and Iceberg table maintenance remain the responsibility of the data team. Operational complexity is higher than with Firehose and similar to that of Flink-based approaches.
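
As a sketch of what that responsibility looks like in practice, a Glue streaming job might apply each micro-batch of change events with an Iceberg MERGE. The table names, the _cdc_op column, and the checkpoint location below are assumptions, and in Glue the Iceberg Spark configuration is often supplied through job parameters rather than in code.

```python
from pyspark.sql import SparkSession

# Iceberg-enabled Spark session; in Glue these settings are typically passed
# as job parameters (for example --datalake-formats iceberg) instead.
spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-lakehouse/warehouse")
    .getOrCreate()
)


def upsert_batch(batch_df, batch_id):
    """Apply one micro-batch of CDC events to the Iceberg table."""
    batch_df.createOrReplaceTempView("changes")
    spark.sql("""
        MERGE INTO glue_catalog.analytics.orders AS t
        USING changes AS s
        ON t.order_id = s.order_id
        WHEN MATCHED AND s._cdc_op = 'REMOVE' THEN DELETE
        WHEN MATCHED THEN UPDATE SET t.status = s.status, t.updated_at = s.updated_at
        WHEN NOT MATCHED AND s._cdc_op != 'REMOVE' THEN
            INSERT (order_id, status, updated_at)
            VALUES (s.order_id, s.status, s.updated_at)
    """)


# stream_df stands in for the DataFrame read from the Kinesis source; the
# checkpoint location is required for reliable micro-batch processing.
# stream_df.writeStream \
#     .foreachBatch(upsert_batch) \
#     .option("checkpointLocation", "s3://my-lakehouse/checkpoints/orders") \
#     .start()
```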

The shared challenges across all AWS-native approaches

While the architectures above differ, they share a set of underlying challenges that teams consistently encounter.

CDC correctness is not automatic

DynamoDB Streams produces change events, not relational updates. Teams must define how primary keys map to Iceberg rows, how updates overwrite prior values, and how deletes are represented and applied.

Schema evolution requires discipline

DynamoDB allows attributes to appear and disappear freely. Analytical systems require stable schemas. Without careful handling, schema drift can break downstream queries or force repeated manual intervention.
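
Iceberg's own schema evolution makes the additive case manageable: when a new DynamoDB attribute appears, the table can be widened without rewriting existing data. A minimal sketch, assuming the Iceberg-enabled Spark session and hypothetical table from the earlier Glue example, plus a hypothetical new attribute:

```python
# Add the new attribute as a nullable column; existing data files are untouched
# and older rows simply read the column as NULL.
spark.sql("""
    ALTER TABLE glue_catalog.analytics.orders
    ADD COLUMNS (loyalty_tier STRING)
""")
```

The hard part is not the DDL but deciding, consistently and automatically, when and how to apply it as attributes drift.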

Backfills and reprocessing are complex

Replaying historical data or recovering from pipeline failures often requires custom logic. This is particularly difficult in streaming systems where state and ordering matter.

Operational burden grows over time

Flink and Spark pipelines require monitoring, tuning, upgrades, and cost management. Even fully managed services still require an operating model.

For many teams, the challenge is not whether DynamoDB data can reach Iceberg, but how much infrastructure they want to own to make it reliable.

A simpler approach: DynamoDB to Iceberg with Estuary

Estuary addresses these challenges by providing a right-time data platform that natively handles change data capture and delivery into analytical systems like Apache Iceberg.

Right-time data movement means teams can choose when data moves, whether sub-second, near real-time, or batch, without rebuilding pipelines.

At a high level, Estuary:

  • Reads change events directly from DynamoDB Streams
  • Applies CDC semantics consistently for inserts, updates, and deletes
  • Enforces and evolves schemas for analytical use
  • Writes data into Apache Iceberg tables on object storage
  • Operates as a managed service with predictable reliability and cost

Instead of assembling Firehose, Flink, or Spark pipelines, Estuary collapses these responsibilities into a single managed system designed specifically for streaming operational data into analytics-ready formats.

When Estuary is the right choice

AWS-native pipelines for DynamoDB to Iceberg are viable, but they assume teams are willing to design, operate, and continuously maintain streaming infrastructure. Estuary is designed for teams that want correct, real-time data movement without owning that operational complexity.

Estuary is a strong fit when:

  • Correct CDC semantics matter
    Inserts, updates, and deletes from DynamoDB need to be applied deterministically to Iceberg tables without custom merge logic.
  • Low-latency analytics are required
    Dashboards, monitoring, or downstream systems depend on near real-time visibility into application data.
  • Operational simplicity is a priority
    Teams want to avoid running Flink clusters, Spark streaming jobs, or custom retry and recovery logic.
  • Schema evolution is expected
    DynamoDB attributes change over time, and the analytics layer must adapt without breaking queries.
  • Predictable cost and reliability are important
    Streaming infrastructure sprawl often leads to hidden costs and operational risk.

In these cases, Estuary functions as a purpose-built CDC-to-lakehouse layer rather than a general-purpose streaming framework.

How to Set Up a DynamoDB to Iceberg Pipeline with Estuary

With Estuary, you can move from raw DynamoDB change data to structured Iceberg tables in minutes. Let's walk through the steps to build a DynamoDB to Iceberg pipeline.

Prerequisites

  • An Estuary account
  • A DynamoDB table with DynamoDB Streams enabled
  • AWS credentials with access to DynamoDB, plus the S3 bucket and Iceberg catalog (AWS Glue or REST) you plan to write to

Step 1: Configure Amazon DynamoDB as the Source

Selecting DynamoDB as a source
  1. Log into your Estuary account.
  2. In the left sidebar, click Sources, then click + NEW CAPTURE.
  3. In the Search connectors field, search for DynamoDB.
  4. Click the Capture button on the Amazon DynamoDB connector.
  5. On the configuration page, fill in:
    • Name: A unique identifier for your capture
    • AWS Access Key ID / Secret Access Key: Credentials with access to DynamoDB
    • Region: AWS region of your DynamoDB table
  6. Click NEXT > SAVE AND PUBLISH to activate the capture.

Once configured, Estuary will begin reading real-time change events (inserts, updates, deletes) from your table and write them to a collection.

Step 2: Configure Apache Iceberg as the Destination

Selecting an Apache Iceberg materialization connector
  1. After your capture is active, click MATERIALIZE COLLECTIONS in the pop-up, or navigate to Destinations > + NEW MATERIALIZATION.
  2. In the Search connectors field, type Iceberg.
  3. Select the appropriate materialization:
    • Amazon S3 Iceberg (for delta update pipelines using S3 + AWS Glue)
    • Apache Iceberg (for full updates using a REST catalog, S3, and EMR)

Configuration Fields

Amazon S3 Iceberg (Delta Updates):

  • Name: Unique materialization name
  • AWS Access Key ID / Secret Access Key: Must have permissions for S3 and Glue
  • Bucket: Your S3 bucket for data storage
  • Region: AWS region for the S3 bucket and Glue catalog
  • Namespace: Logical grouping of your Iceberg tables (e.g., prod/analytics)
  • Catalog:
    • Glue: If using AWS Glue as the catalog
    • REST: Provide REST URI, warehouse path, and credentials if using a custom catalog

Apache Iceberg (Standard Updates with EMR Serverless):

  • URL: REST catalog base URI
  • Warehouse: Iceberg warehouse path
  • Namespace: Logical table grouping
  • Authentication: OAuth or AWS SigV4, depending on catalog type
  • Compute Settings:
    • Application ID: EMR Serverless application ID
    • Execution Role ARN: IAM role for job execution
    • Bucket / Region: S3 bucket and AWS region for EMR
    • AWS Access Key ID / Secret Access Key: Credentials to access EMR and S3
  4. In the Source Collections section, click SOURCE FROM CAPTURE to bind the collection created by your DynamoDB capture.
  5. Click NEXT > SAVE AND PUBLISH to finalize your materialization.

What You Get

Once active, Estuary continuously syncs change events from your DynamoDB table into Iceberg-backed tables — enabling real-time analytics, queryability via SQL engines, and durable, governed data storage.
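
Once the tables are live, any engine that understands Iceberg can read them. As one example, here is a minimal sketch using the pyiceberg client against an AWS Glue catalog; the namespace and table name are assumptions.

```python
from pyiceberg.catalog import load_catalog

# Load the Glue catalog (credentials come from the standard AWS chain);
# "analytics.orders" is a hypothetical namespace and table.
catalog = load_catalog("default", **{"type": "glue"})
table = catalog.load_table("analytics.orders")

# Scan a small slice into pandas for quick exploration; heavier analytics
# would typically go through Spark, Trino, or another SQL engine.
df = table.scan(limit=100).to_pandas()
print(df.head())
```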

You can also:

  • Backfill historical data (if needed)
  • Apply schema evolution rules
  • Monitor the pipeline via metrics and alerts

How Estuary compares to AWS-native approaches

| Approach                     | Latency | CDC correctness | Operational effort | Schema handling |
|------------------------------|---------|-----------------|--------------------|-----------------|
| DynamoDB export + batch jobs | Hours   | Limited         | Medium             | Manual          |
| Firehose to Iceberg          | Minutes | Partial         | Low                | Manual          |
| Flink to Iceberg             | Seconds | High            | Very high          | Manual          |
| Glue streaming to Iceberg    | Minutes | High            | High               | Manual          |
| Estuary to Iceberg           | Seconds | High            | Low                | Automatic       |

This comparison highlights the core distinction: most pipelines focus on data movement, while Estuary focuses on change data correctness and lifecycle management.

Common use cases

Streaming DynamoDB into Iceberg enables a range of analytical and operational workloads.

Real-time operational analytics

Application events, transactions, and state changes can be queried in near real time using SQL engines without impacting production workloads.

Lakehouse integration

DynamoDB data can be joined with relational, event, and batch data in a unified Iceberg-based lakehouse architecture.

Machine learning pipelines

Fresh, versioned data supports feature generation, model training, and reproducibility using Iceberg’s snapshot and time-travel capabilities.
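
For example, a training run can pin itself to a fixed point in table history so that features are reproducible. A short Spark SQL sketch, reusing the hypothetical table from the earlier examples; the timestamp is illustrative:

```python
# Read the table as of a fixed timestamp so repeated training runs see
# identical data, regardless of changes that arrived afterward.
features_df = spark.sql("""
    SELECT order_id, status, updated_at
    FROM glue_catalog.analytics.orders
    TIMESTAMP AS OF '2025-01-01 00:00:00'
""")
```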

Long-term retention and compliance

Historical records can be stored cost-effectively on object storage with full auditability and schema governance.

Conclusion

Streaming DynamoDB data into Apache Iceberg is essential for teams that want scalable analytics, historical visibility, and lakehouse interoperability. While AWS-native approaches can achieve this, they often require significant infrastructure and operational investment.

Estuary provides a purpose-built alternative that delivers correct, right-time change data into Iceberg without requiring teams to assemble and operate complex streaming systems. By abstracting CDC handling, schema evolution, and delivery mechanics, Estuary allows teams to focus on analytics rather than infrastructure.

Ready to Get Started?

Sign up for Estuary to build your DynamoDB to Iceberg pipeline today — and unlock real-time analytics at scale.

FAQs

    Can DynamoDB data be streamed into Apache Iceberg?

    Yes. DynamoDB Streams provides change data capture, which can be applied to Iceberg tables using streaming pipelines or managed platforms such as Estuary.

    Do I need Spark or Flink to stream DynamoDB data into Iceberg?

    Not necessarily. While Spark and Flink are common choices, managed platforms can handle CDC ingestion and Iceberg writes without requiring teams to operate streaming frameworks.

    How are DynamoDB updates and deletes reflected in Iceberg?

    Updates overwrite existing rows based on logical keys, and deletes are applied using CDC semantics supported by Iceberg.

    How fresh is the data in Iceberg?

    Latency depends on the pipeline design. Streaming approaches can deliver data within seconds, while batch exports introduce significantly higher delays.


About the author

Jeffrey Richman

With over 15 years in data engineering, Jeffrey is a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. His writing provides insights that help companies scale efficiently and effectively in an evolving data landscape.
