
Amazon DynamoDB is a go-to choice for fast, scalable application data. But as your data grows and analytics needs evolve, DynamoDB’s limitations become clear: it’s not designed for querying history, performing complex joins, or running large-scale analytics.
That’s where Apache Iceberg comes in — a modern, open table format built for the lakehouse era. It turns low-cost object storage into an analytics powerhouse with support for ACID transactions, schema evolution, and time travel.
So, how do you bridge these two worlds? The answer is a real-time DynamoDB to Iceberg pipeline — and in this guide, we’ll show you how to build one using Estuary Flow. Whether you're looking to stream DynamoDB to Apache Iceberg, support low-latency analytics, or simplify your data architecture, this is the most efficient and scalable way to do it.
The Business Problem
As teams scale their applications and data volumes grow, they quickly run into a wall: DynamoDB isn’t built for analytics, and bridging it with platforms like Apache Iceberg is far from straightforward.
DynamoDB Isn’t Designed for Analytical Workloads
Amazon DynamoDB is a high-performance NoSQL database optimized for transactional operations, not for deep analysis. Out of the box, you:
- Can only use a subset of the PartiQL query language when running ad hoc queries
- Have a limited selection of AWS-only data integrations
- Can’t explore trends or perform time-series analysis
To get around this, teams often export DynamoDB data to Amazon S3, then stitch together ETL pipelines using AWS Glue or manual scripts. These pipelines are slow, fragile, and require constant maintenance.
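For reference, the batch side of that pattern often starts with a single point-in-time export call like the hedged sketch below (the table ARN and bucket name are placeholders, and the API requires point-in-time recovery to be enabled on the table); the Glue jobs and scripts then take over to reshape and load the exported files.

```python
import boto3

# One-off, point-in-time export of a DynamoDB table to S3.
# This is the batch pattern described above: the data is already
# stale by the time downstream Glue jobs or scripts pick it up.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

response = dynamodb.export_table_to_point_in_time(
    TableArn="arn:aws:dynamodb:us-east-1:123456789012:table/orders",  # placeholder ARN
    S3Bucket="my-analytics-exports",   # placeholder bucket
    S3Prefix="dynamodb/orders/",       # where the export files land
    ExportFormat="DYNAMODB_JSON",      # or "ION"
)

print(response["ExportDescription"]["ExportStatus"])  # e.g. "IN_PROGRESS"
```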
Batch Pipelines Fall Short on Freshness
Batch-based approaches introduce latency. Whether you’re exporting data hourly or nightly, the insights are already outdated. That hurts:
- Real-time dashboards
- ML models that need fresh data
- Operational agility, where decisions hinge on recent activity
In short, batch ETL slows you down.
Streaming From DynamoDB to Iceberg Is Complex to Build
Building a custom streaming architecture to move data from DynamoDB to Iceberg often involves DynamoDB Streams, Apache Kafka, Spark, and a mesh of connectors and glue code. It’s technically possible — but:
- Schema changes can break pipelines
- Backfills require custom handling
- Retries and deduplication are hard to get right
- Infrastructure costs and operational burden grow quickly
Most teams don’t want to engineer a real-time stack from scratch — they just want a streaming DynamoDB to Iceberg pipeline that works.
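To make that burden concrete, here is a minimal sketch of just the ingestion half of a do-it-yourself pipeline, using boto3's DynamoDB Streams client with a placeholder stream ARN. Even this toy version leaves shard re-discovery, checkpointing, retries, deduplication, and the Iceberg commit logic entirely up to you.

```python
import boto3

# Minimal, hand-rolled DynamoDB Streams poller. A production version also needs
# shard re-discovery, checkpoint persistence, retries, deduplication, and
# something downstream to batch records into Parquet and commit them to Iceberg.
streams = boto3.client("dynamodbstreams", region_name="us-east-1")
stream_arn = "arn:aws:dynamodb:us-east-1:123456789012:table/orders/stream/2024-01-01T00:00:00.000"  # placeholder

shards = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]["Shards"]

for shard in shards:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",  # or "LATEST"
    )["ShardIterator"]

    while iterator:
        page = streams.get_records(ShardIterator=iterator, Limit=100)
        for record in page["Records"]:
            # record["eventName"] is INSERT, MODIFY, or REMOVE;
            # record["dynamodb"] holds the keys and the new/old item images.
            print(record["eventName"], record["dynamodb"].get("Keys"))
        if not page["Records"]:
            break  # a real consumer would sleep and keep polling instead
        iterator = page.get("NextShardIterator")
```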
The Solution: Real-Time DynamoDB to Iceberg Integration
To unlock the full analytical potential of your DynamoDB data, you need more than periodic exports and brittle ETL scripts. You need a system that:
- Captures every change in real time
- Handles schema evolution and data consistency
- Writes to Iceberg in a format optimized for analytics
- Requires minimal engineering overhead
That’s where a real-time DynamoDB to Iceberg integration comes in — and more specifically, where Estuary Flow delivers.
With Estuary Flow, you can stream DynamoDB change events (via DynamoDB Streams) directly into Iceberg-backed tables stored on Amazon S3 or another object store. Flow does all the heavy lifting — capturing inserts, updates, and deletes, transforming data into structured formats like Parquet, and delivering it into your Iceberg tables with low latency and high reliability.
Unlike custom pipelines or batch tools, Flow provides a declarative, no-code interface and built-in support for schema enforcement, retries, monitoring, and backfills, making it the fastest way to get from operational data to analytics-ready tables.
Whether you're building a real-time lakehouse, powering dashboards, or training ML models, streaming DynamoDB to Apache Iceberg with Flow gets you there without the complexity.
Why Estuary Flow for DynamoDB to Apache Iceberg?
Here’s why Estuary Flow is the best choice for building a real-time DynamoDB to Iceberg pipeline:
- Real-time, always-on sync: Flow captures change events from DynamoDB Streams as they happen and delivers them to Iceberg with low latency — ideal for real-time analytics.
- No-code setup: Create powerful, production-ready pipelines without writing a single line of code. Flow’s intuitive UI and pre-built connectors let you go from idea to implementation in minutes.
- Native support for Iceberg: Materialize data directly into Apache Iceberg tables stored on Amazon S3, with support for AWS Glue or REST catalogs — no Spark job orchestration required.
- Automated schema handling: Flow automatically maps and enforces schemas, handles type conversions, and adapts to changes in your DynamoDB documents — saving you hours of manual work.
- Backfills + exactly-once delivery: Easily backfill historical data from DynamoDB and trust that your data lands in Iceberg once and only once — even if there are failures or retries.
- Scalable and cloud-native: Whether you're syncing one table or hundreds, Flow is built to scale with your data — and it runs as managed SaaS or in your private cloud.
- Reliable monitoring and observability: Get visibility into every step of your pipeline with built-in metrics, logs, and real-time alerts.
With Estuary Flow, you don’t need to build or manage a fragile data stack. You get a reliable, real-time DynamoDB to Iceberg pipeline — ready for analytics, machine learning, or anything your lakehouse throws at it.
How to Set Up a Real-Time DynamoDB to Iceberg Pipeline with Estuary Flow
With Estuary Flow, you can move from raw DynamoDB change data to structured Iceberg tables in minutes. Let’s walk through the steps to build a DynamoDB → Iceberg pipeline.
Prerequisites
- One or more DynamoDB tables with DynamoDB Streams enabled (a sketch for enabling them follows this list)
- A target Apache Iceberg setup, backed by object storage (e.g., Amazon S3) and catalog (AWS Glue or REST)
- An active Estuary Flow account
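If streams are not yet enabled on a source table, you can switch them on from the AWS console or with a short boto3 call like the sketch below (the table name is a placeholder; Estuary's connector documentation describes the exact stream settings it expects, typically including both new and old images).

```python
import boto3

# Enable DynamoDB Streams on an existing table so change events
# (inserts, updates, deletes) become available to the capture connector.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.update_table(
    TableName="orders",  # placeholder table name
    StreamSpecification={
        "StreamEnabled": True,
        # NEW_AND_OLD_IMAGES exposes both the before and after state of each item.
        "StreamViewType": "NEW_AND_OLD_IMAGES",
    },
)
```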
Step 1: Configure Amazon DynamoDB as the Source
- Log into your Estuary Flow account.
- In the left sidebar, click Sources, then click + NEW CAPTURE.
- In the Search connectors field, search for DynamoDB.
- Click the Capture button on the Amazon DynamoDB connector.
- On the configuration page, fill in:
- Name: A unique identifier for your capture
- AWS Access Key ID / Secret Access Key: Credentials with access to DynamoDB
- Region: AWS region of your DynamoDB table
- Click NEXT > SAVE AND PUBLISH to activate the capture.
Once configured, Flow will begin reading real-time change events (inserts, updates, deletes) from your table and write them to a collection.
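Before or after publishing the capture, it can help to confirm that the credentials and region you entered can actually see the table and that its stream is on. A quick check with boto3 (placeholder table name) might look like this:

```python
import boto3

# Quick sanity check with the same credentials/region used in the capture:
# confirm the table is visible and that its stream is enabled.
dynamodb = boto3.client("dynamodb", region_name="us-east-1")

table = dynamodb.describe_table(TableName="orders")["Table"]  # placeholder name
print(table["TableStatus"])              # e.g. "ACTIVE"
print(table.get("StreamSpecification"))  # should show StreamEnabled: True
print(table.get("LatestStreamArn"))      # ARN of the stream Flow will read
```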
Step 2: Configure Apache Iceberg as the Destination
- After your capture is active, click MATERIALIZE COLLECTIONS in the popup, or navigate to Destinations > + NEW MATERIALIZATION.
- In the Search connectors field, type Iceberg.
- Select the appropriate materialization:
- Amazon S3 Iceberg (for delta update pipelines using S3 + AWS Glue)
- Apache Iceberg (for full updates using a REST catalog, S3, and EMR)
Configuration Fields
Amazon S3 Iceberg (Delta Updates):
- Name: Unique materialization name
- AWS Access Key ID / Secret Access Key: Must have permissions for S3 and Glue
- Bucket: Your S3 bucket for data storage
- Region: AWS region for the S3 bucket and Glue catalog
- Namespace: Logical grouping of your Iceberg tables (e.g., prod/analytics)
- Catalog:
- Glue: If using AWS Glue as the catalog
- REST: Provide REST URI, warehouse path, and credentials if using a custom catalog
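When you use the Glue catalog, the Namespace corresponds to a Glue database. A short boto3 check like the sketch below (the database name is a placeholder) verifies that the database exists, creating it if needed, before you publish the materialization:

```python
import boto3

# Verify that the Glue database backing the Iceberg namespace exists,
# creating it if necessary, so the materialization has a place to register tables.
glue = boto3.client("glue", region_name="us-east-1")
namespace = "analytics"  # placeholder; should match the Namespace field above

try:
    glue.get_database(Name=namespace)
    print(f"Glue database '{namespace}' already exists")
except glue.exceptions.EntityNotFoundException:
    glue.create_database(DatabaseInput={"Name": namespace})
    print(f"Created Glue database '{namespace}'")
```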
Apache Iceberg (Standard Updates with EMR Serverless):
- URL: REST catalog base URI
- Warehouse: Iceberg warehouse path
- Namespace: Logical table grouping
- Authentication: OAuth or AWS SigV4, depending on catalog type
- Compute Settings:
- Application ID: EMR Serverless application ID
- Execution Role ARN: IAM role for job execution
- Bucket / Region: S3 bucket and AWS region for EMR
- AWS Access Key ID / Secret Access Key: Credentials to access EMR and S3
- In the Source Collections section, click SOURCE FROM CAPTURE to bind the collection created by your DynamoDB capture.
- Click NEXT > SAVE AND PUBLISH to finalize your materialization.
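If you chose the EMR Serverless variant, a quick boto3 check (with a placeholder application ID) can confirm that the Application ID in Compute Settings points at a real application in a usable state:

```python
import boto3

# Confirm the EMR Serverless application referenced in the materialization's
# Compute Settings exists and is in a usable state.
emr = boto3.client("emr-serverless", region_name="us-east-1")

app = emr.get_application(applicationId="00abc123def456")["application"]  # placeholder ID
print(app.get("name"), app["state"])  # state should be CREATED or STARTED
print(app.get("releaseLabel"))        # EMR release the application runs on
```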
What You Get
Once active, Estuary Flow continuously syncs change events from your DynamoDB table into Iceberg-backed tables — enabling real-time analytics, queryability via SQL engines, and durable, governed data storage.
You can also:
- Backfill historical data (if needed)
- Apply schema evolution rules
- Monitor the pipeline via metrics and alerts
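Any Iceberg-aware engine can now read these tables. As a lightweight sanity check, a sketch with the pyiceberg library against a Glue catalog could look like the following (the namespace and table name are placeholders; credentials and region are picked up from your standard AWS environment):

```python
from pyiceberg.catalog import load_catalog

# Load the Glue-backed Iceberg catalog and read the table that Estuary Flow
# is materializing. Requires `pip install "pyiceberg[glue]"` plus pandas/pyarrow.
catalog = load_catalog("glue", **{"type": "glue"})

table = catalog.load_table("analytics.orders")  # placeholder namespace.table
df = table.scan().to_pandas()                   # full-table scan; fine for a small check
print(df.head())
```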
Popular Use Cases for DynamoDB to Iceberg Pipelines
Syncing DynamoDB to Apache Iceberg unlocks a broad range of powerful, real-world applications. Whether you're a data engineer building modern lakehouse pipelines or a product manager seeking better analytics, here are some top use cases:
Real-Time Operational Analytics
Keep a pulse on application activity, such as user behavior, transactions, or sensor data, by streaming DynamoDB change events into Iceberg. Query the latest data using Trino or Spark for real-time insights.
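As an illustration, a freshness-style query through Trino's Python client might look like the sketch below; the host, catalog, schema, table, and column names are all placeholders for your own deployment.

```python
import trino

# Query the Iceberg table that Flow keeps in sync, via a Trino cluster
# configured with an Iceberg catalog named "iceberg".
conn = trino.dbapi.connect(
    host="trino.example.com",  # placeholder host
    port=443,
    user="analyst",
    http_scheme="https",
    catalog="iceberg",
    schema="analytics",        # placeholder schema / namespace
)

cur = conn.cursor()
cur.execute("""
    SELECT status, count(*) AS orders_last_hour
    FROM orders                                   -- placeholder table
    WHERE updated_at > now() - INTERVAL '1' HOUR  -- placeholder column
    GROUP BY status
""")
for row in cur.fetchall():
    print(row)
```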
Data Lakehouse Enablement
Integrate DynamoDB into a larger data lakehouse architecture by syncing to Iceberg. This allows you to join NoSQL data with other sources (like relational databases or Kafka) in a unified, open analytics layer.
Machine Learning Feature Stores
Maintain fresh, queryable feature tables for machine learning pipelines. Iceberg’s versioning and time travel features support reproducibility and batch/stream hybrid ML workflows.
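Iceberg's snapshot history is what makes that reproducibility possible. As a hedged sketch with pyiceberg (the table name and catalog setup are placeholders), you can list snapshots and re-read the table exactly as it looked when a training run started:

```python
from pyiceberg.catalog import load_catalog

# Reproduce a training dataset by reading the Iceberg table at a past snapshot.
catalog = load_catalog("glue", **{"type": "glue"})
table = catalog.load_table("analytics.user_features")  # placeholder feature table

# Each history entry records a snapshot id and commit timestamp (ms since epoch).
for entry in table.history():
    print(entry.snapshot_id, entry.timestamp_ms)

# Pin a scan to a specific snapshot id captured at training time.
snapshot_id = table.history()[0].snapshot_id
df_at_training_time = table.scan(snapshot_id=snapshot_id).to_pandas()
print(len(df_at_training_time), "rows at snapshot", snapshot_id)
```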
Long-Term Data Retention & Compliance
Store historical records from DynamoDB in Iceberg tables for years, cost-effectively and with schema control. This is ideal for audit logs, financial records, or regulated industries.
Simplifying ETL Infrastructure
Replace brittle batch jobs and manual exports with a real-time, fully managed pipeline that continuously syncs DynamoDB to a structured analytics layer.
Conclusion
As modern data teams move toward real-time insights and scalable analytics, bridging DynamoDB to Apache Iceberg is no longer a nice-to-have — it's essential. But building that bridge yourself? Costly, fragile, and slow.
Estuary Flow changes that.
With native support for both DynamoDB Streams and Iceberg materializations, Flow delivers a fully-managed, real-time pipeline that’s reliable, scalable, and fast to deploy — all without the usual engineering overhead.
Whether you're powering live dashboards, supporting ML pipelines, or just looking for a better way to store and query your application data, Flow gets you from source to insight in minutes.
Ready to Get Started?
Sign up for Estuary Flow to build your DynamoDB to Iceberg pipeline today — and unlock real-time analytics at scale.

About the author
A data engineering expert with over 15 years of experience, specializing in driving growth for early-stage data companies through strategies that attract customers and users. Their writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.