
DuckLake is a new, simplified approach to building lakehouses. Instead of layering JSON and Avro manifest files the way Iceberg and Delta Lake do, DuckLake uses a SQL database to manage metadata and plain Parquet files for storage. It's fast, open, and easy to manage.
In this guide, we’ll show how to set up a DuckLake database in MotherDuck and continuously load data into it using Estuary Flow.
Setting Up DuckLake with MotherDuck and Estuary Flow
To load data into DuckLake, you'll need to create a database in MotherDuck and configure a streaming data pipeline using Estuary Flow. DuckLake supports both fully managed and Bring Your Own Bucket (BYOB) deployment models, giving you flexibility over metadata and storage layers. Follow the steps below to set up your lakehouse and start ingesting real-time data.
Step 1: Choose Your DuckLake Deployment Model
MotherDuck offers two ways to create a DuckLake database. Choose the one that best fits your use case:
Option 1: Fully Managed DuckLake Database
Both the metadata and data are stored in MotherDuck-managed infrastructure. Fast to set up, great for quick evaluations.
CREATE DATABASE my_ducklake (TYPE DUCKLAKE);
Option 2: Bring Your Own Bucket (BYOB)
You use your own S3-compatible storage for Parquet files while MotherDuck handles the metadata.
CREATE DATABASE my_ducklake (
    TYPE DUCKLAKE,
    DATA_PATH 's3://your-bucket/your-path/'
);
Then create a secret for credentials:
CREATE SECRET my_secret IN MOTHERDUCK (
    TYPE S3,
    KEY_ID 'your-access-key',
    SECRET 'your-secret-key',
    REGION 'your-region'
);
✅ Tip: Use an S3 bucket in us-east-1 to avoid cross-region latency when using MotherDuck compute.
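Once the database (and, for BYOB, the secret) is in place, a quick sanity check is to create a table, insert a row, and read it back with plain SQL. The events table below is purely illustrative:

-- Create a table inside the new DuckLake catalog (illustrative schema)
CREATE TABLE my_ducklake.events (
    event_id   INTEGER,
    event_time TIMESTAMP,
    payload    VARCHAR
);

-- Insert a test row and read it back
INSERT INTO my_ducklake.events VALUES (1, now()::TIMESTAMP, 'hello ducklake');
SELECT * FROM my_ducklake.events;

Behind the scenes, DuckLake writes the row to a Parquet file and records the change in the SQL catalog.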
Step 2: Set Up Estuary Flow to Materialize into DuckLake
Estuary Flow lets you connect streaming and batch data sources and continuously materialize into DuckLake.
Follow these steps to connect Estuary Flow to DuckLake:
- Set Up Your Source Connector: Choose from supported sources like PostgreSQL, MySQL, Kafka, S3, or even webhooks.
- Create a Derivation (optional): You can transform and filter your data using Flow’s TypeScript derivations, or just pass it through.
- Configure the DuckLake Materialization: Use Estuary’s DuckLake connector to write directly to your DuckLake catalog (via the MotherDuck catalog endpoint or your own DuckDB instance). You’ll configure:
  - Target database
  - Table name and schema mapping
- Deploy the Flow Pipeline: Once deployed, Flow continuously pushes updates to your DuckLake tables with exactly-once semantics. A quick SQL check for verifying the pipeline is shown below.
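Once the pipeline is running, you can confirm rows are landing with a simple aggregate against the target table. The table and column names below are illustrative; use whatever you configured in the materialization:

-- Check row count and freshness of the materialized table
SELECT COUNT(*)        AS row_count,
       MAX(event_time) AS latest_event
FROM my_ducklake.events;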
Step 3: Query and Explore Your DuckLake Data
After Flow writes data into DuckLake, you can query it from:
- MotherDuck web UI
- DuckDB CLI
- dbt or your favorite SQL IDE
- Python, JavaScript, or other language bindings
Example:
SELECT *
FROM my_ducklake.your_table
WHERE event_time > NOW() - INTERVAL '1 HOUR';
DuckLake supports time travel, incremental reads, and even metadata queries like:
FROM ducklake_snapshots('my_ducklake');
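Once you know a snapshot version from that query, you can read a table as it existed at that point. The example below assumes DuckDB's AT clause for DuckLake time travel, and the version number is illustrative:

-- Read the table as of snapshot version 3 (illustrative)
SELECT *
FROM my_ducklake.your_table AT (VERSION => 3);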
Ready to build your DuckLake pipeline? Create your free Estuary account and start streaming data in minutes — no code required.
Use Cases for DuckLake + Estuary Flow
Here are some powerful real-world use cases enabled by combining DuckLake and Estuary Flow:
- Data Sharing: Publish curated, fast-changing datasets (e.g., product metrics, financial records) to other teams via DuckLake’s SQL-based structure and versioned snapshots.
- Machine Learning Feature Stores: Continuously update feature tables with low-latency writes from Flow and train models directly from DuckLake with native DuckDB or Python integrations.
- Streaming ETL into Parquet Lakes: Automate complex transformation logic in Flow, and land the data in open Parquet format with schema control and rollback support.
Summary
DuckLake + Estuary Flow is a powerful combo. You get:
- Open format (Parquet) + SQL-native metadata
- Fully managed or BYOB flexibility
- Real-time, exactly-once ingestion with Flow
- Scalable reads/writes from your apps or tools
Whether you're building a real-time analytics stack or just want a no-fuss lakehouse, DuckLake makes it simple, and Estuary gets your data there.
Want help getting started? Reach out to us at Estuary or join our Slack community.
FAQs
1. Does Estuary Flow support schema evolution when materializing into DuckLake?
2. Can I use my own compute engine to query a DuckLake filled by Flow?
3. How does Estuary ensure data consistency in DuckLake?

About the author
Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Dani focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.
