How to Load Streaming Data into DuckLake with Estuary Flow

Learn how to create a DuckLake lakehouse in MotherDuck and continuously load real-time data using Estuary Flow. Includes setup steps, SQL examples, and tips for BYOB or fully managed deployments.


DuckLake is a new, simplified approach to building lakehouses. Instead of the JSON and Avro manifest layers that Iceberg and Delta Lake rely on, DuckLake uses a SQL database to manage metadata and plain Parquet files for storage. It’s fast, open, and easy to manage.

In this guide, we’ll show how to set up a DuckLake database in MotherDuck and continuously load data into it using Estuary Flow.

Setting Up DuckLake with MotherDuck and Estuary Flow

To load data into DuckLake, you'll need to create a database in MotherDuck and configure a streaming data pipeline using Estuary Flow. DuckLake on MotherDuck supports both fully managed and Bring Your Own Bucket (BYOB) deployment models, giving you flexibility over where the underlying Parquet data is stored. Follow the steps below to set up your lakehouse and start ingesting real-time data.

Step 1: Choose Your DuckLake Deployment Model

MotherDuck offers two ways to create a DuckLake database. Choose the one that best fits your use case:

Option 1: Fully Managed DuckLake Database

Both the metadata and data are stored in MotherDuck-managed infrastructure. Fast to set up, great for quick evaluations.

plaintext
CREATE DATABASE my_ducklake (TYPE DUCKLAKE);

Option 2: Bring Your Own Bucket (BYOB)

You use your own S3-compatible storage for Parquet files while MotherDuck handles the metadata.

plaintext
CREATE DATABASE my_ducklake (
    TYPE DUCKLAKE,
    DATA_PATH 's3://your-bucket/your-path/'
);

Then create a secret for credentials:

plaintext
CREATE SECRET my_secret IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'your-access-key',
    SECRET 'your-secret-key',
    REGION 'your-region'
);

✅ Tip: Use an S3 bucket in us-east-1 to avoid cross-region latency when using MotherDuck compute.
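Either way, you can sanity-check the new database with a quick round trip before wiring up a pipeline. A minimal sketch (the events table and its columns are placeholders, not something the setup creates for you):

plaintext
-- Switch to the new DuckLake database
USE my_ducklake;

-- Create a throwaway table, write a row, and read it back
CREATE TABLE events (
    event_id   INTEGER,
    event_time TIMESTAMP,
    payload    VARCHAR
);
INSERT INTO events VALUES (1, now(), 'hello ducklake');
SELECT * FROM events;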

Step 2: Set Up Estuary Flow to Materialize into DuckLake

Estuary Flow lets you connect streaming and batch data sources and continuously materialize them into DuckLake.

Follow these steps to connect Estuary Flow to DuckLake:

  1. Set Up Your Source Connector: Choose from supported sources like PostgreSQL, MySQL, Kafka, S3, or even webhooks.

  2. Create a Derivation (optional): You can transform and filter your data using Flow’s TypeScript derivations, or just pass it through.

  3. Configure the DuckLake Materialization: Use Estuary’s DuckLake connector to write directly to your DuckLake catalog (via the MotherDuck catalog endpoint or your own DuckDB instance).

You’ll configure:

  • Target database
  • Table name and schema mapping
  4. Deploy the Flow Pipeline: Once deployed, Flow continuously pushes updates to your DuckLake tables with exactly-once semantics. (A quick verification query is sketched below.)
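Once the pipeline is running, a simple way to confirm data is landing is to watch the row count and latest event advance between runs. A sketch (my_ducklake.your_table and event_time stand in for whatever your materialization actually writes):

plaintext
-- Re-run this a few times; the numbers should climb as Flow commits new transactions
SELECT count(*)        AS row_count,
       max(event_time) AS latest_event
FROM my_ducklake.your_table;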

Step 3: Query and Explore Your DuckLake Data

After Flow writes data into DuckLake, you can query it from:

  • MotherDuck web UI
  • DuckDB CLI
  • dbt or your favorite SQL IDE
  • Python, JavaScript, or other language bindings

Example:

plaintext
SELECT * FROM my_ducklake.your_table WHERE event_time > NOW() - INTERVAL '1 HOUR';

DuckLake supports time travel, incremental reads, and even metadata queries like:

plaintext
FROM ducklake_snapshots('my_ducklake');
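For example, you can read a table as of an earlier snapshot, or pull only the rows that changed between two snapshots. A sketch using the DuckLake extension's time-travel clause and change-feed table function (the snapshot IDs and table name are placeholders; check your DuckLake version for the exact signatures):

plaintext
-- Time travel: query the table as it looked at snapshot 3
SELECT * FROM my_ducklake.your_table AT (VERSION => 3);

-- Incremental read: rows that changed between snapshots 3 and 5
-- (assumes the table lives in the default 'main' schema)
FROM ducklake_table_changes('my_ducklake', 'main', 'your_table', 3, 5);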

Ready to build your DuckLake pipeline? Create your free Estuary account and start streaming data in minutes — no code required.

Use Cases for DuckLake + Estuary Flow

Here are some powerful real-world use cases enabled by combining DuckLake and Estuary Flow:

  • Data Sharing: Publish curated, fast-changing datasets (e.g., product metrics, financial records) to other teams via DuckLake’s SQL-based structure and versioned snapshots.
  • Machine Learning Feature Tables: Continuously update feature tables with low-latency writes from Flow and train models directly from DuckLake via its native DuckDB or Python integrations (see the sketch after this list).
  • Streaming ETL into Parquet Lakes: Automate complex transformation logic in Flow, and land the data in open Parquet format with schema control and rollback support.
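For the feature-table case, pinning reads to a DuckLake snapshot keeps training runs reproducible even while Flow keeps appending new rows. A sketch (the user_features table, its columns, and snapshot 42 are illustrative):

plaintext
-- Freeze the feature set used for one training run by pinning it to a snapshot
SELECT user_id, feature_1, feature_2
FROM my_ducklake.user_features AT (VERSION => 42);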

Summary

DuckLake + Estuary Flow is a powerful combo. You get:

  • Open format (Parquet) + SQL-native metadata
  • Fully managed or BYOB flexibility
  • Real-time, exactly-once ingestion with Flow
  • Scalable reads/writes from your apps or tools

Whether you're building a real-time analytics stack or just want a no-fuss lakehouse, DuckLake makes it simple, and Estuary gets your data there.

Want help getting started? Reach out to us at Estuary or join our Slack community.

FAQs

Does Estuary Flow handle schema changes when loading into DuckLake?

    Yes. Estuary Flow supports full schema evolution — adding, removing, or renaming fields — and these changes are reflected in DuckLake via transactional DDL. You can version and roll back schema changes easily using DuckLake’s built-in snapshotting.

Can I read and write DuckLake with my own DuckDB client?

    Absolutely. You can use your own DuckDB client to read and write to a DuckLake database, as long as it can access the metadata (via MotherDuck or local storage) and the Parquet files (via S3 or other compatible storage). This makes DuckLake great for hybrid setups (see the sketch below).

How does Estuary Flow avoid duplicate or partial writes to DuckLake?

    Estuary Flow uses exactly-once delivery semantics and writes each transaction to DuckLake as a single, atomic SQL transaction. This guarantees that each change set is either fully committed or not at all — no duplicates, no partial writes.
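For the hybrid setup described above, a local DuckDB client can attach the same lakehouse directly. A sketch assuming the DuckLake extension with a self-managed metadata file (the paths and names are placeholders); for a MotherDuck-hosted catalog you would instead attach via the md: prefix after authenticating:

plaintext
INSTALL ducklake;
LOAD ducklake;

-- Metadata lives in a local DuckLake file, Parquet data in your bucket
ATTACH 'ducklake:metadata.ducklake' AS my_ducklake (DATA_PATH 's3://your-bucket/your-path/');

SELECT * FROM my_ducklake.your_table LIMIT 10;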

About the author

Dani Pálma, Head of Data Engineering Marketing

Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Dani focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.
