
Streaming data from Amazon DynamoDB into Apache Iceberg is a common requirement for teams that want to analyze application data at scale without sacrificing freshness. DynamoDB is optimized for low-latency transactional workloads, while Iceberg is designed for analytical queries, historical analysis, and lakehouse architectures. Bridging these systems requires capturing DynamoDB change events and applying them correctly to Iceberg tables on object storage.
This guide explains the common DynamoDB to Iceberg architectures, their tradeoffs, and how Estuary simplifies real-time streaming into Iceberg without requiring teams to operate complex streaming infrastructure.
Key takeaways
DynamoDB is not designed for analytical queries, joins, or historical analysis.
Apache Iceberg provides ACID transactions, schema evolution, and time travel on low-cost object storage.
DynamoDB Streams is the authoritative source for change data capture from DynamoDB.
AWS-native pipelines can stream DynamoDB data into Iceberg but introduce operational and correctness challenges.
Estuary provides a right-time data platform that streams DynamoDB changes into Iceberg with built-in CDC handling and minimal operational overhead.
The business problem: why DynamoDB data is hard to analyze
Amazon DynamoDB is purpose-built for operational workloads. It excels at high-throughput key-value access, predictable latency, and horizontal scalability. These strengths come with tradeoffs that become apparent as data volumes and analytical needs grow.
Limited analytical query capabilities
DynamoDB does not support complex analytical queries. Even with PartiQL, querying remains constrained to access patterns defined by primary keys and indexes. Joins, aggregations across large datasets, and historical trend analysis are not practical.
No native historical or time-based analysis
DynamoDB is optimized for current state access, not historical exploration. While point-in-time recovery exists for backup and restore, it is not designed for querying data over time or analyzing change history.
Analytics workflows require external systems
To analyze DynamoDB data, teams typically export data into systems like Amazon S3, data warehouses, or lakehouse platforms. This immediately introduces data movement, transformation logic, and operational complexity.
Why batch exports fall short
A common workaround is to export DynamoDB data to S3 using periodic jobs or managed services such as AWS Glue. While simple to reason about, batch pipelines introduce several limitations.
High latency
Batch exports run on schedules. Whether hourly or nightly, insights lag behind production activity. This impacts real-time dashboards, operational monitoring, and machine learning pipelines that depend on fresh data.
Fragile and hard to evolve
Batch pipelines often rely on scripts or Glue jobs that must be updated when schemas change. DynamoDB is schemaless by nature, but analytical systems are not. Over time, these pipelines become brittle and costly to maintain.
Inefficient for large-scale change data
Re-exporting large tables repeatedly is inefficient when only a small subset of records change. Batch jobs waste compute and storage while still failing to deliver low-latency insights.
For teams that need timely analytics, streaming change data capture is a more appropriate model.
Common ways to stream DynamoDB changes into Apache Iceberg
There are several established, AWS-native approaches to streaming DynamoDB data into Iceberg tables on object storage. Each approach makes different tradeoffs between simplicity, correctness, and operational effort.
Option 1: DynamoDB Streams to Firehose to Iceberg
This is the simplest managed option.
How it works
- DynamoDB Streams is enabled on a table, typically using NEW_AND_OLD_IMAGES to capture full change events.
- Stream records are forwarded into Amazon Data Firehose, often via a Lambda or Kinesis intermediary.
- Firehose writes records into Apache Iceberg tables on Amazon S3 using a supported catalog.
When this works well
- Minimal operational overhead
- Mostly append-oriented analytics
- Low transformation requirements
Tradeoffs
DynamoDB Streams emits CDC events, not row-level table updates. Mapping inserts, updates, and deletes into correct Iceberg row semantics requires careful design. Frequent updates and deletes often require downstream merge or compaction processes. Schema evolution and deduplication logic must be handled outside of Firehose.
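To make that gap concrete, the sketch below shows the kind of normalization a Lambda (or any intermediary) typically performs before records reach Firehose: each DynamoDB Streams record is flattened into a row-level change event with an explicit operation type. The field names and operation codes are illustrative assumptions, not part of any AWS contract, and the exact Firehose transformation response format is omitted.

```python
# Hedged sketch: flatten raw DynamoDB Streams records into row-level change events.
# Table, key, and field names are placeholders.
from boto3.dynamodb.types import TypeDeserializer

_deserializer = TypeDeserializer()

def to_row_change(record: dict) -> dict:
    """Map one DynamoDB Streams record (INSERT/MODIFY/REMOVE) to a row-level event."""
    event = record["eventName"]                       # INSERT | MODIFY | REMOVE
    ddb = record["dynamodb"]
    image = ddb.get("NewImage") or ddb.get("OldImage") or {}
    return {
        "op": {"INSERT": "c", "MODIFY": "u", "REMOVE": "d"}[event],
        "key": {k: _deserializer.deserialize(v) for k, v in ddb["Keys"].items()},
        "row": {k: _deserializer.deserialize(v) for k, v in image.items()},
        "sequence": ddb["SequenceNumber"],            # used for ordering/deduplication downstream
    }

if __name__ == "__main__":
    sample = {
        "eventName": "MODIFY",
        "dynamodb": {
            "Keys": {"order_id": {"S": "o-123"}},
            "NewImage": {"order_id": {"S": "o-123"}, "status": {"S": "shipped"}},
            "OldImage": {"order_id": {"S": "o-123"}, "status": {"S": "pending"}},
            "SequenceNumber": "111",
        },
    }
    print(to_row_change(sample))
```

Even with this mapping in place, turning a stream of such events into correct Iceberg rows (applying updates over prior values, physically handling deletes) still requires downstream merge or compaction logic.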
Option 2: DynamoDB Streams to Apache Flink to Iceberg
This approach offers the most control and correctness.
How it works
- DynamoDB Streams provides ordered change events per partition.
- Apache Flink reads the stream using a DynamoDB Streams connector, commonly via Amazon Managed Service for Apache Flink.
- Flink applies CDC semantics, transformations, enrichment, and writes into Iceberg using the Iceberg sink connector.
When this works well
- True upserts and deletes are required
- Data must be enriched, re-keyed, or transformed
- Strong control over watermarking and state
Tradeoffs
Operating Flink introduces significant complexity. Teams must manage state backends, checkpoints, failure recovery, and small-file mitigation. Iceberg table maintenance, including compaction and optimization, must be planned and operated continuously.
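For orientation, here is a heavily simplified PyFlink sketch of the write side of such a job, assuming the Iceberg Flink runtime and AWS bundle jars are on the classpath. The catalog, warehouse, table, and column names are all placeholders, and the DynamoDB Streams source is stubbed out with a datagen table rather than a real connector.

```python
# Hedged PyFlink sketch: upsert a changelog-style stream into an Iceberg v2 table.
# Requires the Iceberg Flink runtime + AWS bundle jars on the classpath; all names are placeholders.
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())
t_env.get_config().set("execution.checkpointing.interval", "60s")  # Iceberg commits on checkpoints

# Iceberg catalog backed by AWS Glue (options are illustrative).
t_env.execute_sql("""
    CREATE CATALOG lake WITH (
      'type' = 'iceberg',
      'catalog-impl' = 'org.apache.iceberg.aws.glue.GlueCatalog',
      'warehouse' = 's3://my-bucket/warehouse'
    )
""")

# Stand-in for the DynamoDB Streams source; a real job would register the
# DynamoDB Streams / Kinesis connector here instead of datagen.
t_env.execute_sql("""
    CREATE TEMPORARY TABLE orders_changes (
      order_id STRING,
      status STRING,
      updated_at TIMESTAMP(3)
    ) WITH ('connector' = 'datagen', 'rows-per-second' = '5')
""")

# Iceberg v2 table with upserts enabled so changes resolve by primary key.
t_env.execute_sql("""
    CREATE TABLE IF NOT EXISTS lake.analytics.orders (
      order_id STRING,
      status STRING,
      updated_at TIMESTAMP(3),
      PRIMARY KEY (order_id) NOT ENFORCED
    ) WITH ('format-version' = '2', 'write.upsert.enabled' = 'true')
""")

# Blocks while the streaming insert runs.
t_env.execute_sql(
    "INSERT INTO lake.analytics.orders SELECT order_id, status, updated_at FROM orders_changes"
).wait()
```

Everything around this sketch, such as state backends, checkpoint storage, restarts, and small-file compaction, is the operational work described above.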
Option 3: DynamoDB Streams to AWS Glue streaming jobs to Iceberg
This option is familiar to Spark-centric teams.
How it works
- DynamoDB Streams data is routed through a streaming backbone such as Kinesis.
- AWS Glue streaming jobs read the stream and apply transformations using Spark.
- Data is written to Iceberg tables using the Glue Data Catalog.
When this works well
- Existing Spark and Glue expertise
- Hybrid batch and streaming transformations
- Integration with existing Glue-based data lakes
Tradeoffs
Although Glue abstracts some infrastructure, CDC correctness, merge semantics, and Iceberg table maintenance remain the responsibility of the data team. Operational complexity is higher than with Firehose and comparable to Flink-based approaches.
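As a rough illustration of the merge semantics a Glue streaming job must own, the PySpark sketch below applies each micro-batch of change events to an Iceberg table with MERGE INTO. The catalog, table, and column names, along with the `event_name` convention, are assumptions, and the Kinesis read is left as a comment because it relies on Glue-specific sources.

```python
# Hedged PySpark sketch of the CDC merge logic a Glue streaming job must implement itself.
# Catalog, table, and column names are placeholders; the Kinesis source is Glue-specific and omitted.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.glue_catalog", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.glue_catalog.catalog-impl",
            "org.apache.iceberg.aws.glue.GlueCatalog")
    .config("spark.sql.catalog.glue_catalog.warehouse", "s3://my-bucket/warehouse")
    .getOrCreate()
)

def apply_changes(batch_df, batch_id):
    # Merge each micro-batch of change events into the Iceberg table by primary key.
    batch_df.createOrReplaceTempView("changes")
    batch_df.sparkSession.sql("""
        MERGE INTO glue_catalog.analytics.orders t
        USING changes c
        ON t.order_id = c.order_id
        WHEN MATCHED AND c.event_name = 'REMOVE' THEN DELETE
        WHEN MATCHED THEN UPDATE SET t.status = c.status, t.updated_at = c.updated_at
        WHEN NOT MATCHED AND c.event_name <> 'REMOVE' THEN
          INSERT (order_id, status, updated_at) VALUES (c.order_id, c.status, c.updated_at)
    """)

# In a Glue streaming job, `stream_df` would come from the Kinesis/DynamoDB Streams source, e.g.:
# stream_df.writeStream.foreachBatch(apply_changes) \
#     .option("checkpointLocation", "s3://my-bucket/checkpoints/orders").start()
```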
The shared challenges across all AWS-native approaches
While the architectures above differ, they share a set of underlying challenges that teams consistently encounter.
CDC correctness is not automatic
DynamoDB Streams produces change events, not relational updates. Teams must define how primary keys map to Iceberg rows, how updates overwrite prior values, and how deletes are represented and applied.
Schema evolution requires discipline
DynamoDB allows attributes to appear and disappear freely. Analytical systems require stable schemas. Without careful handling, schema drift can break downstream queries or force repeated manual intervention.
Backfills and reprocessing are complex
Replaying historical data or recovering from pipeline failures often requires custom logic. This is particularly difficult in streaming systems where state and ordering matter.
Operational burden grows over time
Flink and Spark pipelines require monitoring, tuning, upgrades, and cost management. Even fully managed services still require an operating model.
For many teams, the challenge is not whether DynamoDB data can reach Iceberg, but how much infrastructure they want to own to make it reliable.
A simpler approach: DynamoDB to Iceberg with Estuary
Estuary addresses these challenges by providing a right-time data platform that natively handles change data capture and delivery into analytical systems like Apache Iceberg.
Right-time data movement means teams can choose when data moves, whether sub-second, near real-time, or batch, without rebuilding pipelines.
At a high level, Estuary:
- Reads change events directly from DynamoDB Streams
- Applies CDC semantics consistently for inserts, updates, and deletes
- Enforces and evolves schemas for analytical use
- Writes data into Apache Iceberg tables on object storage
- Operates as a managed service with predictable reliability and cost
Instead of assembling Firehose, Flink, or Spark pipelines, Estuary collapses these responsibilities into a single managed system designed specifically for streaming operational data into analytics-ready formats.
When Estuary is the right choice
AWS-native pipelines for DynamoDB to Iceberg are viable, but they assume teams are willing to design, operate, and continuously maintain streaming infrastructure. Estuary is designed for teams that want correct, real-time data movement without owning that operational complexity.
Estuary is a strong fit when:
- Correct CDC semantics matter: Inserts, updates, and deletes from DynamoDB need to be applied deterministically to Iceberg tables without custom merge logic.
- Low-latency analytics are required: Dashboards, monitoring, or downstream systems depend on near real-time visibility into application data.
- Operational simplicity is a priority: Teams want to avoid running Flink clusters, Spark streaming jobs, or custom retry and recovery logic.
- Schema evolution is expected: DynamoDB attributes change over time, and the analytics layer must adapt without breaking queries.
- Predictable cost and reliability are important: Streaming infrastructure sprawl often leads to hidden costs and operational risk.
In these cases, Estuary functions as a purpose-built CDC-to-lakehouse layer rather than a general-purpose streaming framework.
How to Set Up a DynamoDB to Iceberg Pipeline with Estuary
With Estuary, you can move from raw DynamoDB change data to structured Iceberg tables in minutes. Let's walk through the steps to build a DynamoDB to Iceberg pipeline.
Prerequisites
- One or more DynamoDB tables with DynamoDB Streams enabled (a boto3 sketch for enabling this follows the list)
- A target Apache Iceberg setup, backed by object storage (e.g., Amazon S3) and a catalog (AWS Glue or REST)
- An active Estuary account
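If Streams is not yet enabled on a table, it can be turned on without downtime. A minimal boto3 sketch, using the NEW_AND_OLD_IMAGES view type mentioned earlier; the table name and region are placeholders, and credentials are assumed to come from your usual AWS configuration:

```python
# Hedged sketch: enable DynamoDB Streams with NEW_AND_OLD_IMAGES on an existing table.
# "orders" and "us-east-1" are placeholders; credentials come from standard AWS config.
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.update_table(
    TableName="orders",
    StreamSpecification={
        "StreamEnabled": True,
        "StreamViewType": "NEW_AND_OLD_IMAGES",  # capture both old and new item images per change
    },
)

# Verify the stream is active and note its ARN.
desc = dynamodb.describe_table(TableName="orders")["Table"]
print(desc["LatestStreamArn"], desc["StreamSpecification"]["StreamViewType"])
```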
Step 1: Configure Amazon DynamoDB as the Source
- Log into your Estuary account.
- In the left sidebar, click Sources, then click + NEW CAPTURE.
- In the Search connectors field, search for DynamoDB.
- Click the Capture button on the Amazon DynamoDB connector.
- On the configuration page, fill in:
- Name: A unique identifier for your capture
- AWS Access Key ID / Secret Access Key: Credentials with access to DynamoDB
- Region: AWS region of your DynamoDB table
- Click NEXT > SAVE AND PUBLISH to activate the capture.
Once configured, Estuary will begin reading real-time change events (inserts, updates, deletes) from your table and write them to a collection.
Step 2: Configure Apache Iceberg as the Destination
- After your capture is active, click MATERIALIZE COLLECTIONS in the pop-up, or navigate to Destinations > + NEW MATERIALIZATION.
- In the Search connectors field, type Iceberg.
- Select the appropriate materialization:
- Amazon S3 Iceberg (for delta update pipelines using S3 + AWS Glue)
- Apache Iceberg (for full updates using a REST catalog, S3, and EMR)
Configuration Fields
Amazon S3 Iceberg (Delta Updates):
- Name: Unique materialization name
- AWS Access Key ID / Secret Access Key: Must have permissions for S3 and Glue
- Bucket: Your S3 bucket for data storage
- Region: AWS region for the S3 bucket and Glue catalog
- Namespace: Logical grouping of your Iceberg tables (e.g., prod/analytics)
- Catalog:
- Glue: If using AWS Glue as the catalog
- REST: Provide REST URI, warehouse path, and credentials if using a custom catalog
Apache Iceberg (Standard Updates with EMR Serverless):
- URL: REST catalog base URI
- Warehouse: Iceberg warehouse path
- Namespace: Logical table grouping
- Authentication: OAuth or AWS SigV4, depending on catalog type
- Compute Settings:
- Application ID: EMR Serverless application ID
- Execution Role ARN: IAM role for job execution
- Bucket / Region: S3 bucket and AWS region for EMR
- AWS Access Key ID / Secret Access Key: Credentials to access EMR and S3
- In the Source Collections section, click SOURCE FROM CAPTURE to bind the collection created by your DynamoDB capture.
- Click NEXT > SAVE AND PUBLISH to finalize your materialization.
What You Get
Once active, Estuary continuously syncs change events from your DynamoDB table into Iceberg-backed tables — enabling real-time analytics, queryability via SQL engines, and durable, governed data storage.
You can also:
- Backfill historical data (if needed)
- Apply schema evolution rules
- Monitor the pipeline via metrics and alerts
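To sanity-check the pipeline end to end, the materialized table can be read with any Iceberg-aware engine. Below is a minimal sketch using pyiceberg against an AWS Glue catalog; the catalog, namespace, table, and column names are assumptions based on the configuration above, not values Estuary produces.

```python
# Hedged sketch: read the Estuary-materialized Iceberg table with pyiceberg (pyiceberg[glue] extra).
# Assumes AWS credentials in the environment; all names are placeholders.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("analytics", **{"type": "glue"})
table = catalog.load_table("prod.orders")   # namespace.table as configured in the materialization

# Pull a filtered slice into pandas for quick inspection (requires pyarrow + pandas).
df = table.scan(row_filter="status = 'shipped'", limit=100).to_pandas()
print(df.head())
```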
How Estuary compares to AWS-native approaches
| Approach | Latency | CDC correctness | Operational effort | Schema handling |
|---|---|---|---|---|
| DynamoDB export + batch jobs | Hours | Limited | Medium | Manual |
| Firehose to Iceberg | Minutes | Partial | Low | Manual |
| Flink to Iceberg | Seconds | High | Very high | Manual |
| Glue streaming to Iceberg | Minutes | High | High | Manual |
| Estuary to Iceberg | Seconds | High | Low | Automatic |
This comparison highlights the core distinction: most pipelines focus on data movement, while Estuary focuses on change data correctness and lifecycle management.
Common use cases
Streaming DynamoDB into Iceberg enables a range of analytical and operational workloads.
Real-time operational analytics
Application events, transactions, and state changes can be queried in near real time using SQL engines without impacting production workloads.
Lakehouse integration
DynamoDB data can be joined with relational, event, and batch data in a unified Iceberg-based lakehouse architecture.
Machine learning pipelines
Fresh, versioned data supports feature generation, model training, and reproducibility using Iceberg’s snapshot and time-travel capabilities.
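For example, a training dataset can be pinned to the snapshot it was built from and re-read later for reproducibility. A hedged pyiceberg sketch, with placeholder catalog and table names:

```python
# Hedged sketch: reproduce a feature set by reading the Iceberg table at an earlier snapshot.
from pyiceberg.catalog import load_catalog

catalog = load_catalog("analytics", **{"type": "glue"})
table = catalog.load_table("prod.orders")

history = table.history()                  # snapshot log entries (snapshot_id, timestamp_ms)
pinned_snapshot = history[0].snapshot_id   # e.g. the snapshot recorded when the model was trained

features = table.scan(snapshot_id=pinned_snapshot).to_pandas()
print(len(features), "rows as of snapshot", pinned_snapshot)
```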
Long-term retention and compliance
Historical records can be stored cost-effectively on object storage with full auditability and schema governance.
Conclusion
Streaming DynamoDB data into Apache Iceberg is essential for teams that want scalable analytics, historical visibility, and lakehouse interoperability. While AWS-native approaches can achieve this, they often require significant infrastructure and operational investment.
Estuary provides a purpose-built alternative that delivers correct, right-time change data into Iceberg without requiring teams to assemble and operate complex streaming systems. By abstracting CDC handling, schema evolution, and delivery mechanics, Estuary allows teams to focus on analytics rather than infrastructure.
Ready to Get Started?
Sign up for Estuary to build your DynamoDB to Iceberg pipeline today — and unlock real-time analytics at scale.
FAQs
Do I need Spark or Flink to write to Iceberg?
Not necessarily. Firehose can deliver records into Iceberg tables without either, and Estuary writes to Iceberg as a managed service. Flink or Spark is only required if you choose to build and operate that layer yourself.
How are updates and deletes handled?
DynamoDB Streams emits change events, so updates and deletes must be mapped onto Iceberg rows. In AWS-native pipelines this typically means custom merge or compaction logic; Estuary applies inserts, updates, and deletes deterministically as part of its built-in CDC handling.
How real-time can DynamoDB to Iceberg pipelines be?
It depends on the architecture. Batch exports lag by hours, Firehose and Glue streaming typically land data within minutes, and Flink- or Estuary-based pipelines can deliver changes in seconds.

About the author
The author has over 15 years of experience in data engineering and specializes in driving growth for early-stage data companies, focusing on strategies that attract customers and users. Their writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.