
Key Takeaways
- Stream Kinesis to Apache Iceberg tables with low-latency transactional commits for analytics, AI models, and BI dashboards.
- Ensure consistent and reliable tables with Iceberg’s ACID-compliant commit process.
- Maintain schema governance and evolution with Estuary Flow’s managed collections for smooth downstream queries.
- Configure and deploy a Kinesis to Iceberg table pipeline in minutes using Estuary Flow’s no-code interface or specification files.
- Start with Estuary Flow’s free tier to process up to 10 GB per month, then scale to production workloads without changing your pipeline.
Introduction
Amazon Kinesis is a cloud-based service for ingesting high volumes of streaming data such as application logs, clickstream events, IoT sensor readings, and real-time transactions. Apache Iceberg is an open table format designed for large-scale analytics on data lakes, offering ACID transactions, schema evolution, and compatibility with multiple query engines including Spark, Trino, Flink, and Athena.
For many teams, combining Kinesis with Iceberg creates a powerful foundation for analytics and AI workloads. Kinesis delivers a constant stream of events, while Iceberg provides the reliability and structure needed for querying that data at scale. The challenge is building a pipeline that moves data from Kinesis into Iceberg tables without manual batch jobs, custom scripts, or complex orchestration.
Estuary Flow solves this problem by providing a managed, low-latency pipeline from Amazon Kinesis streams directly into Apache Iceberg tables. It handles the ingestion, schema validation, and table commits for you, so your analytics and machine learning teams can work with the most up-to-date data in a consistent and query-ready format. Whether you are building BI dashboards, real-time analytics platforms, or AI-driven applications, this approach lets you focus on insights instead of pipeline maintenance.
Why Stream Kinesis Data to Apache Iceberg
Moving data from Amazon Kinesis streams into Apache Iceberg tables allows organizations to combine the speed of streaming ingestion with the reliability and flexibility of a modern data lakehouse format. This integration delivers several advantages:
1. Low-latency data availability
By configuring Estuary Flow to capture events from Kinesis and commit them to Iceberg on a frequent schedule, your tables can be updated quickly after events occur. This ensures that your downstream systems always work with fresh data instead of waiting for traditional batch windows.
2. ACID-compliant table updates
Iceberg supports atomic commits, which means every set of changes from Kinesis is applied in a consistent state. This prevents partial updates and ensures that queries, dashboards, and machine learning pipelines always return reliable results.
3. Schema evolution and governance
Kinesis event schemas can change over time as applications evolve. Estuary Flow enforces JSON schema validation and manages safe schema changes so that your Iceberg tables stay compatible with downstream tools.
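Flow performs this validation automatically, but the idea can be illustrated with a small sketch. The field names below (user_id, event_type, amount) are hypothetical, and this hand-rolled check is only a stand-in for Flow's real JSON schema enforcement:

```python
import json

# Hypothetical schema: required fields and expected types for one event.
# Stands in for the JSON schema attached to a Flow collection.
SCHEMA = {
    "required": ["user_id", "event_type"],
    "types": {"user_id": str, "event_type": str, "amount": (int, float)},
}

def validate_event(raw: str) -> bool:
    """Return True if a raw JSON event satisfies the schema sketch."""
    event = json.loads(raw)
    for field in SCHEMA["required"]:
        if field not in event:
            return False
    for field, expected in SCHEMA["types"].items():
        if field in event and not isinstance(event[field], expected):
            return False
    return True
```

Events that fail validation never reach the Iceberg table, which is what keeps downstream query engines from breaking when producers drift.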
4. Simplified data pipeline operations
Streaming directly from Kinesis to Iceberg removes the need for complex ETL jobs, multiple staging steps, or custom integration code. This reduces operational overhead and makes the pipeline easier to maintain.
5. Scalability for high-volume workloads
Kinesis is designed to handle massive data throughput, and Iceberg can scale to store and query petabytes of data. Together, they support real-time ingestion and analytics for enterprise-grade workloads.
With these benefits, a Kinesis to Iceberg pipeline is well-suited for applications like BI dashboards, time-series analytics, operational monitoring, and large-scale data science projects.
Prerequisites Checklist
To stream Kinesis data into Iceberg tables using Estuary Flow, you need:
- Amazon Kinesis streams with JSON data in the same AWS region.
- AWS IAM user with Kinesis read permissions and secure access keys.
- Apache Iceberg REST catalog such as AWS Glue, S3 Tables, or Snowflake Open Catalog.
- AWS EMR Serverless application for compute, with a staging S3 bucket.
- Estuary Flow account to configure the capture and materialization.
With these in place, you can configure the pipeline and start sending Kinesis events into Iceberg tables within minutes.
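As a starting point, a least-privilege IAM policy for the capture user might look like the following. The stream ARN is a placeholder, and the exact set of actions should be confirmed against the Kinesis connector documentation:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "kinesis:ListStreams",
        "kinesis:DescribeStream",
        "kinesis:ListShards",
        "kinesis:GetShardIterator",
        "kinesis:GetRecords"
      ],
      "Resource": "arn:aws:kinesis:us-east-1:123456789012:stream/my-stream"
    }
  ]
}
```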
Architecture Overview
A Kinesis to Iceberg pipeline in Estuary Flow has three main layers that work together to move data reliably from a streaming source into a transactional table format.
1. Source Layer – Kinesis Capture: The pipeline starts with Amazon Kinesis streams, which deliver continuous JSON events. Estuary Flow’s Kinesis capture connector subscribes to these streams, performs an initial backfill of historical records, and then switches to streaming mode to capture new events as they arrive.
2. Transport and Governance Layer – Estuary Flow Collections: Captured events are stored in Flow collections, which act as a real-time staging area. These collections validate data against JSON schemas, track data lineage, and handle schema changes safely so that your downstream Iceberg tables stay compatible with query engines.
3. Destination Layer – Iceberg Materialization: On the destination side, Estuary Flow uses the Apache Iceberg materialization connector to merge new events into Iceberg tables. This process runs on AWS EMR Serverless, staging files in S3 and committing them atomically to the Iceberg catalog. The result is consistent, ACID-compliant tables that are ready for analytics and BI queries.
This architecture removes the need for manual ETL jobs, enables frequent table updates, and ensures that data is both current and reliable for downstream use.
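The atomic commit at the heart of the destination layer can be shown conceptually: Iceberg makes a set of staged files visible by atomically swapping a pointer to a new metadata file. This Python sketch mimics that property with an atomic file replace; it illustrates the idea, not how the connector is actually implemented:

```python
import json
import os
import tempfile

def atomic_commit(table_dir: str, snapshot: dict) -> None:
    """Write new table metadata, then atomically swap it into place.

    Readers see either the old metadata or the new one, never a partial
    write -- the same property Iceberg's catalog commit relies on.
    """
    fd, tmp_path = tempfile.mkstemp(dir=table_dir)
    with os.fdopen(fd, "w") as f:
        json.dump(snapshot, f)
    # os.replace is atomic on the same filesystem.
    os.replace(tmp_path, os.path.join(table_dir, "metadata.json"))

def read_table(table_dir: str) -> dict:
    """Read whatever metadata version is currently committed."""
    with open(os.path.join(table_dir, "metadata.json")) as f:
        return json.load(f)
```

Because the swap is all-or-nothing, a query running mid-commit still sees a consistent snapshot, which is why dashboards never observe half-applied Kinesis batches.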
Watch in action
See how to load data into Iceberg using Estuary Flow:
Step-by-Step: Setting Up Kinesis to Iceberg in Estuary Flow
Step 1: Create the Kinesis Capture
- Log in to the Estuary Flow web application.
- Navigate to Sources and select Amazon Kinesis from the connector list.
- Enter the required connection details:
  - AWS Access Key ID
  - AWS Secret Access Key
  - AWS Region where the Kinesis stream is located
  - (Optional) AWS Endpoint if capturing from a Kinesis-compatible API not hosted by AWS
- Add one or more stream bindings for the Kinesis streams you want to capture.
- Save and publish the capture. Flow will backfill existing data, then continue capturing new events.
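If you prefer specification files over the web application, the published capture corresponds to a YAML spec along these lines. The task name, stream name, and credential placeholders are illustrative, and the exact config field names should be confirmed against the source-kinesis connector reference:

```yaml
captures:
  acmeCo/kinesis-events:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-kinesis:dev
        config:
          # Field names may differ slightly; check the connector reference.
          awsAccessKeyId: <AWS_ACCESS_KEY_ID>
          awsSecretAccessKey: <AWS_SECRET_ACCESS_KEY>
          region: us-east-1
    bindings:
      - resource:
          stream: clickstream-events
        target: acmeCo/clickstream-events
```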
Step 2: Create the Apache Iceberg Materialization
- Go to Destinations in the Flow web application and select Apache Iceberg.
- Configure your Iceberg catalog details:
  - Base URL for the catalog (e.g., Glue, S3 Tables, or Snowflake Open Catalog)
  - Warehouse (for Glue, this is your AWS account ID without hyphens)
  - Namespace for your Iceberg tables
  - Base Location if required by your catalog
- Set Catalog Authentication:
  - AWS SigV4 for Glue or S3 Tables
  - OAuth 2.0 Client Credentials for other REST catalogs
- Provide Compute configuration for AWS EMR Serverless:
  - AWS Access Key ID and Secret Access Key
  - AWS Region of the EMR application and S3 staging bucket
  - EMR Application ID and Execution Role ARN
  - S3 Bucket and optional Bucket Path for staging
- (Optional) Enable Lowercase Column Names if you will query tables in Athena.
- Save and publish the materialization.
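As with the capture, the materialization can also be expressed as a specification file. The sketch below shows the overall shape only; the task and table names are illustrative, and the endpoint config mirrors the UI settings above (see the materialize-iceberg connector reference for exact field names):

```yaml
materializations:
  acmeCo/iceberg-tables:
    endpoint:
      connector:
        image: ghcr.io/estuary/materialize-iceberg:dev
        config:
          # Catalog details as entered in the UI: base URL, warehouse,
          # namespace, catalog authentication, and EMR Serverless compute
          # settings. Consult the connector docs for exact field names.
    bindings:
      - source: acmeCo/clickstream-events
        resource:
          table: clickstream_events
          namespace: analytics
```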
Step 3: Bind Collections to Iceberg Tables
- In the materialization setup, go to Source Collections.
- Select the Flow collections created by your Kinesis capture.
- Map each collection to an Iceberg table name and namespace.
- Save and publish to start moving data from Kinesis streams into Iceberg tables.
Ready to streamline your data pipeline?
Start streaming Kinesis data into Apache Iceberg tables today with Estuary Flow. Sign up for the free tier to move up to 10 GB per month at no cost, or scale to production workloads with confidence. Get Started Now
Performance and Cost Optimization
A well-configured Kinesis to Iceberg pipeline in Estuary Flow can deliver fresh, consistent data without unnecessary infrastructure costs. These settings and practices can help you get the most out of your deployment:
- Adjust sync frequency based on needs: Set the Iceberg materialization’s sync schedule according to how quickly downstream systems need updates. Shorter intervals deliver fresher data but may increase compute costs.
- Use EMR Serverless autoscaling: Enable autoscaling in AWS EMR Serverless so compute resources scale with workload size. Configure auto-stop timers to release resources when idle, reducing costs during periods of low data volume.
- Optimize table partitioning: Partition Iceberg tables based on query patterns, such as by date or customer ID. This reduces the amount of data scanned in queries and improves performance.
- Schedule table maintenance: Over time, Iceberg tables can accumulate many small files. Run compaction and other maintenance tasks during off-peak hours to improve read performance and keep EMR runtimes efficient.
- Avoid unnecessary captures: Only bind and materialize the streams you actually need. Reducing the number of bindings lowers both storage and processing overhead.
By tuning these parameters, you can balance data freshness with predictable costs while keeping your Iceberg tables query-ready.
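The trade-off between sync frequency and small-file buildup is easy to quantify. Assuming each sync with new data writes at least one file per bound table (a deliberate simplification), a quick calculation shows why compaction matters more at short intervals:

```python
def files_per_day(sync_interval_minutes: int, bindings: int = 1) -> int:
    """Estimate data files written per day under the simplifying
    assumption that every sync writes one file per bound table."""
    syncs_per_day = (24 * 60) // sync_interval_minutes
    return syncs_per_day * bindings
```

Under this model, a 5-minute schedule produces 288 files per table per day, while an hourly schedule produces 24: roughly twelvefold fewer files to compact, at the cost of data freshness.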
Security and Compliance
When streaming data from Kinesis to Iceberg, it is important to ensure that the pipeline is both secure and compliant with your organization’s requirements. Estuary Flow provides several options to help you achieve this.
1. Bring Your Own Cloud (BYOC)
Run Estuary Flow’s data plane inside your own VPC so data never leaves your infrastructure. This gives you full control over compute, storage, and network paths.
2. Private networking
Connect securely to AWS services and catalogs using:
- VPC Peering for direct VPC-to-VPC connectivity
- AWS PrivateLink for private, secure access to AWS services without traversing the public internet
- SSH tunnels for secure connections to restricted networks
3. Least-privilege IAM policies
Grant only the permissions needed for the pipeline to operate:
- Kinesis read permissions for capture
- Catalog and staging bucket access for Iceberg materialization
- EMR Serverless execution role limited to the specific application and region
4. Encryption in transit and at rest
All communication between Flow and AWS services is encrypted with TLS. Data stored in S3 or other staging locations can be encrypted using SSE-S3, SSE-KMS, or equivalent cloud-native encryption.
5. Audit and lineage tracking
Flow collections provide immutable, schema-enforced logs of all records written. This supports auditing and compliance by allowing you to trace each record from capture to table commit.
By combining these security features with your organization’s policies, you can build a Kinesis to Iceberg pipeline that meets strict governance and compliance standards.
Conclusion
Integrating Amazon Kinesis with Apache Iceberg through Estuary Flow allows you to move from high-volume streaming data to query-ready, ACID-compliant tables without managing complex ETL pipelines. By capturing events directly from Kinesis streams and committing them to Iceberg tables through AWS EMR Serverless, you maintain a pipeline that is reliable, scalable, and easy to operate.
With schema governance, frequent commit options, and security features such as BYOC and private networking, this setup supports both technical performance and compliance requirements. Whether your goal is powering BI dashboards, operational monitoring, or large-scale analytics, the combination of Kinesis and Iceberg provides a strong foundation for timely, trustworthy data.
Get started today — sign up for a free Estuary Flow account, connect your Kinesis streams, and begin materializing them into Iceberg tables in just a few minutes.
FAQs
1. Can I stream multiple Kinesis streams to the same Iceberg table?
Each Flow collection normally maps to its own Iceberg table. To combine events from multiple streams into a single table, you can merge the collections with a Flow derivation before materializing.
2. Do I need to write custom code for this integration?
No. Both the Kinesis capture and the Iceberg materialization are configured through the Flow web application or declarative specification files, so no custom code is required.
3. What happens if the pipeline fails during a sync?
Because Iceberg commits are atomic, a failed sync never leaves a partially written table. Flow resumes from its last checkpoint, so the table stays consistent and query-ready.
4. Can I run this pipeline entirely in my own cloud environment?
Yes. With Bring Your Own Cloud (BYOC), the Flow data plane runs inside your own VPC, so data never leaves your infrastructure.

About the author
Team Estuary is a group of engineers, product experts, and data strategists building the future of real-time and batch data integration. We write to share technical insights, industry trends, and practical guides.
