
Imagine a scenario where real-time ecommerce order data is being ingested into MongoDB. MongoDB is excellent for fast writes and flexible data models, but it is not ideal for real-time analytics and business insights such as tracking the number of orders per minute, identifying top-selling products, or understanding user behavior, because it isn't designed for large-scale analytical workloads. In microservice setups, services often have to fetch and process data from one another just to answer these questions, which adds latency and tight coupling.
To solve this, modern architectures adopt centralized data warehousing combined with event-driven streaming. MongoDB can continue to handle real-time operational data, while ClickHouse, an OLAP-optimized engine, can handle fast, scalable analytics. Traditionally, this would require setting up Kafka to handle the pipeline. Estuary Flow simplifies this process by using its internal system called Dekaf, which emits change data as Kafka-compatible topics without the need to manage Kafka infrastructure.
In this tutorial, we’ll simulate an ecommerce order tracking scenario, where we ingest mock data into MongoDB and stream it to ClickHouse using Estuary Flow. We’ll then query the live data for analytics such as tracking order status, product demand, and user activity.
Why Sync MongoDB Data to ClickHouse for Real-Time Ecommerce Analytics?
MongoDB is widely used for application development due to its flexible schema and quick writes, making it suitable for use cases like event tracking, user profiles, and content management. However, it's not optimized for complex analytical queries at scale. Once data volume increases, MongoDB’s performance for aggregations and joins becomes limited and often requires costly scaling.
ClickHouse is a columnar OLAP database built for fast analytical workloads. It offers:
- Real-time analytics on billions of rows
- Advanced aggregations and filtering
- Compression and optimized reads
By syncing MongoDB to ClickHouse, you get the best of both worlds: MongoDB for real-time data ingestion and application logic, and ClickHouse for scalable analytics.
This combination enables use cases like:
- Real-time dashboards powered by ClickHouse without overloading operational systems
- Scalable product, customer, or behavioral analytics built on top of fast-ingested operational data
- Offloading complex, resource-heavy queries from MongoDB to a purpose-built engine
- Customer-facing analytics features, where speed and responsiveness are critical
- Cross-service joins and aggregations, where centralized analysis removes the need for interdependent microservice calls
Streaming MongoDB to ClickHouse using Estuary Flow’s Dekaf
Flow connects directly to your MongoDB cluster and captures data in real time using change streams—MongoDB’s native mechanism for watching inserts, updates, and deletes. It then delivers those changes to ClickHouse as structured, schema-enforced records, ready for analytics.
Here’s what makes Flow ideal for this sync:
- Change Data Capture with no extra tooling - Flow taps directly into MongoDB change streams, capturing live updates as they happen. If your deployment doesn’t support change streams (like time-series collections), Flow can use batch modes with customizable polling.
- Kafka-style delivery without Kafka - Flow uses its internal Dekaf system to emit changes as Kafka-compatible topics. These topics can be consumed by ClickHouse using ClickPipes, with no need to deploy or manage Kafka infrastructure.
- Automatic backfill and live sync - When you first set up the pipeline, Flow performs a historical load to ensure ClickHouse has a complete dataset. After that, it streams changes continuously, so your analytics stay up to date.
- Schema-aware transformation - MongoDB documents are flexible, but analytics tools like ClickHouse require consistent structure. Estuary Flow infers schemas from your data, flattens nested fields where needed, and keeps track of changes to ensure reliability over time.
- Resilience and observability - Built-in retries, state tracking, and logs make the pipeline production-ready. Flow automatically resumes from where it left off if interrupted, and helps you monitor pipeline health through the UI or API.
Real-World Use Case: Ecommerce Order Tracking
In ecommerce, customers expect real-time updates on their order status. With a CDC pipeline, each status change—"placed," "shipped," "delivered"—is streamed instantly from MongoDB to ClickHouse, where analytical queries can be run to visualize fulfillment progress.
Step-by-Step Guide: Real-Time Ecommerce Order Analytics from MongoDB to ClickHouse with Estuary Flow
Prerequisites
To follow this guide, you will need:
- MongoDB Atlas Cluster
- Enable change streams. If you’re using MongoDB Atlas or a self-managed replica set, ensure change streams are enabled. This allows Flow to capture real-time inserts, updates, and deletes.
- Allow network access. If you’re using a managed cluster, allowlist Estuary’s IP addresses or set up an SSH tunnel so Flow can securely reach your database.
- Any Python IDE: Used to simulate mock ecommerce order data and insert it into MongoDB.
- Estuary Flow Account: Go to the Flow web app at dashboard.estuary.dev to sign up for the free tier.
- ClickHouse Cloud Account: You can get a 30-day trial
Step 1: Create a Python Script to Simulate Real-Time Ecommerce Order Data
```python
from pymongo import MongoClient
from datetime import datetime, timezone
import random
import time
import uuid

# Connect to the MongoDB Atlas cluster and the ecommerce database
conn = "mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/"
client = MongoClient(conn)
db = client["ecommerce"]
orders_col = db["orders"]

# Static reference data inserted ahead of time
users = list(db["users"].find())
products = list(db["products"].find())

status_flow = ["placed", "packed", "shipped", "out_for_delivery", "delivered"]
active_orders = {}  # orderId -> index of the next stage in status_flow


def create_order():
    """Insert a new order in the 'placed' state for a random user and product."""
    user = random.choice(users)
    product = random.choice(products)
    order_id = "ORD" + str(uuid.uuid4())[:8].upper()
    now = datetime.now(timezone.utc)
    order = {
        "orderId": order_id,
        "userEmail": user["email"],
        "productId": product["productId"],
        "status": "placed",
        "events": [{"status": "placed", "timestamp": now}],
        "lastModified": now,
    }
    orders_col.insert_one(order)
    active_orders[order_id] = 1  # next stage index
    print(f"[Insert] {order_id} placed by {user['email']} - {product['name']}")


def progress_order():
    """Advance every active order to its next delivery stage."""
    for order_id in list(active_orders.keys()):
        stage = active_orders[order_id]
        if stage >= len(status_flow):
            # Order has reached 'delivered'; stop tracking it
            del active_orders[order_id]
            continue
        new_status = status_flow[stage]
        now = datetime.now(timezone.utc)
        orders_col.update_one(
            {"orderId": order_id},
            {
                "$set": {"status": new_status, "lastModified": now},
                "$push": {"events": {"status": new_status, "timestamp": now}},
            },
        )
        print(f"[Update] {order_id} → {new_status}")
        active_orders[order_id] += 1


if __name__ == "__main__":
    while True:
        # 60% of iterations create a new order, 40% progress existing ones
        if random.random() < 0.6:
            create_order()
        else:
            progress_order()
        time.sleep(30)
```
This Python script simulates real-time ecommerce order processing:
- It continuously inserts new order documents into a MongoDB collection.
- It progresses existing orders through delivery stages such as "packed," "shipped," and "delivered," updating each document and appending a new status event.
- It runs forever, choosing between inserting a new order (60% chance) and updating existing ones (40% chance).
- It sleeps for 30 seconds between actions to simulate real-time behavior.
This setup is ideal for testing real-time data pipelines or analytics systems that rely on continuous data streams.
Output in MongoDB Collection:
I have also added products and users collections, which hold static data for this example.
Step 2: Set Up MongoDB as a Source
Now that your MongoDB cluster is prepared, you can connect it to Estuary Flow to begin capturing data into real-time collections.
Go to your dashboard, click Sources, and search for MongoDB.
Enter the following MongoDB connection details:
- Name: Give your source a meaningful name like ecommerce_orders
- Address: Provide the host and port of your MongoDB instance. For Atlas, use the full mongodb+srv://... URI. If your credentials authenticate via the admin DB, be sure to append ?authSource=admin.
- Username & Password: Enter the database user's credentials with read access to the collections you want to capture.
- Database: You can optionally specify which databases to include.
- Configure network access: allowlist Estuary's IP addresses so Flow can reach your cluster.
After completing the fields, click Next. Flow will test the connection to ensure your credentials are valid and that it can access the cluster. If successful, you'll move on to selecting collections for capture.
Step 3: Choose Collections and Configure Capture Mode
Next, choose which MongoDB collections you want to stream. Estuary Flow allows you to select collections from one or more databases and configure how each should be captured.
For each collection, you’ll need to select a capture mode:
- Change Stream Incremental – Recommended for real-time data sync. This uses MongoDB’s native change streams to continuously capture inserts, updates, and deletes.
- Batch Incremental or Batch Snapshot – Suitable for collections that don’t support change streams (e.g., views or time-series collections). In this mode, Flow periodically polls for changes using a specified cursor field (usually _id or a timestamp).
In this tutorial, we’ll use incremental streaming for the orders, products, and users collections.
After configuring the capture mode for each collection, click Publish to deploy your source. Estuary Flow will begin backfilling historical data from MongoDB, then automatically switch to real-time streaming as new changes occur.
Step 4: Stream to ClickHouse via Kafka-Compatible Materialization
- In the Flow dashboard, go to Destinations and click + New Destination
- Choose ClickHouse from the connector catalog
- Click Materialize
Fill in the configuration details:
- Name the materialization (e.g., order_tracking)
- Choose the same Data Plane region used for your MongoDB source
- Set an Auth Token that ClickHouse will use to authenticate with Flow
- Under Source Collections, select the MongoDB-backed collections you captured in Step 3.
- Review the field mappings and click Publish.
Here are the collections:
Materialization Details:
Step 5: Connect to ClickHouse via ClickPipes
Now it’s time to consume the data stream inside ClickHouse.
In the ClickHouse Cloud dashboard:
- Select the Data Source as Kafka
- Add a new Kafka Pipe with these connection settings:
- Broker: dekaf.estuary-data.com:9092
- Protocol: SASL_SSL
- SASL Mechanism: PLAIN
- Username: Your full Flow materialization name (e.g., your-org/order-tracking/dekaf-clickhouse)
- Password: The Auth Token from Step 4
- Schema Registry URL: https://dekaf.estuary-data.com
- Select the topics that match your Flow collections
- Choose topic: orders
- Offset: From Beginning or Timestamp
ClickHouse will now start ingesting your MongoDB data in real time.
You can see your metrics once it successfully connects:
Step 6: Query Your Data for Real-Time Analytics in ClickHouse
Here is the loaded Orders table:
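Before defining downstream tables, it's worth sanity-checking the raw table that ClickPipes created from the Dekaf topic. The sketch below assumes the destination table is named orders and exposes the fields captured from MongoDB (orderId, status, lastModified); adjust the names to match your pipe's actual target table.
```sql
-- Confirm rows are arriving from the pipe (assumes the ClickPipes target table is named `orders`)
SELECT count() AS total_rows FROM orders;

-- Peek at the most recent order events
SELECT orderId, status, lastModified
FROM orders
ORDER BY lastModified DESC
LIMIT 5;
```
The row count should keep growing roughly every 30 seconds while the simulation script from Step 1 is running.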
1. Define a ClickHouse table to receive the streaming order data:
```sql
CREATE OR REPLACE TABLE orders_final (
    orderId String,
    userEmail String,
    productId String,
    status String,
    lastModified DateTime,
    _topic String
) ENGINE = ReplacingMergeTree(lastModified)
ORDER BY orderId;

SELECT * FROM orders_final;
```
2. Create a Materialized View for Snapshots:
```sql
ALTER TABLE orders_final
    ADD COLUMN current_timestamp DateTime DEFAULT now();

ALTER TABLE orders_final
    ADD COLUMN count_orders UInt32 DEFAULT 1;

CREATE MATERIALIZED VIEW events_snapshot_mv
TO orders_final
AS
SELECT
    *,
    now() AS current_timestamp,
    1 AS count_orders
FROM orders;

SELECT * FROM events_snapshot_mv;
```
Events Snapshot MV Table:
To show how many orders were processed per minute using the snapshot view:
```sql
SELECT
    toStartOfMinute(current_timestamp) AS minute,
    sum(count_orders) AS total_orders
FROM events_snapshot_mv
GROUP BY minute
ORDER BY minute DESC;
```
Total Orders by current timestamp:
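Because orders_final uses ReplacingMergeTree keyed on orderId with lastModified as the version column, you can also track where every order currently sits in the fulfillment flow. The query below is a sketch against the table defined above; FINAL collapses duplicate rows so only the latest status per order is counted.
```sql
-- Current fulfillment status distribution across all orders
-- FINAL deduplicates on orderId, keeping the row with the latest lastModified
SELECT
    status,
    count() AS orders
FROM orders_final FINAL
GROUP BY status
ORDER BY orders DESC;
```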
Step 7: Create an Enriched Snapshot Table
```sql
CREATE TABLE orders_enriched_snapshots (
    orderId String,
    status String,
    lastModified DateTime,
    userEmail String,
    userName String,
    productId String,
    productName String,
    productCategory String,
    productPrice Float64,
    enriched_at DateTime
)
ENGINE = MergeTree
ORDER BY (orderId, enriched_at);

INSERT INTO orders_enriched_snapshots
SELECT
    o.orderId,
    o.status,
    o.lastModified,
    o.userEmail,
    u.name AS userName,
    o.productId,
    p.name AS productName,
    p.category AS productCategory,
    p.price AS productPrice,
    now() AS enriched_at
FROM orders o
LEFT JOIN users u ON o.userEmail = u.email
LEFT JOIN products p ON o.productId = p.productId;
```
The query above combines order data with user and product data. It works like a CDC snapshot: every run captures the latest state from orders and enriches it with user and product details into the orders_enriched_snapshots table.
This is useful because operational data in systems like MongoDB is often fragmented across multiple collections. By centralizing and enriching it in a single table, you make it easier to run analytical queries, build dashboards, and track historical trends without repeatedly performing expensive joins.
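With the enriched snapshot in place, product-demand and user-activity questions become simple aggregations. The queries below are sketches against the orders_enriched_snapshots schema defined above; because the table stores one row per order per snapshot run, they count distinct orderId values rather than raw rows.
```sql
-- Top-selling products by number of distinct orders, with a rough revenue estimate
SELECT
    productName,
    productCategory,
    uniqExact(orderId) AS orders,
    uniqExact(orderId) * any(productPrice) AS est_revenue
FROM orders_enriched_snapshots
GROUP BY productName, productCategory
ORDER BY orders DESC
LIMIT 10;

-- Most active users by number of distinct orders placed
SELECT
    userEmail,
    userName,
    uniqExact(orderId) AS orders
FROM orders_enriched_snapshots
GROUP BY userEmail, userName
ORDER BY orders DESC
LIMIT 10;
```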
Ready to Power Real-Time Analytics at Scale? Start streaming for free and bring your analytics to life.
Conclusion
Using Estuary Flow with MongoDB and ClickHouse enables seamless real-time analytics without managing Kafka infrastructure or tightly coupling microservices. With just a few configuration steps, you can simulate, ingest, stream, and query ecommerce orders for instant operational insights.
We’ve demonstrated how to:
- Simulate real-time operational data in MongoDB.
- Capture changes in near real time using Estuary Flow’s CDC pipeline.
- Stream changes to ClickHouse for fast, OLAP-style analytics.
- Enrich and join data from multiple sources
This architecture solves common microservice pain points—such as inter-service joins, performance bottlenecks, and scaling issues—by centralizing analytics in a purpose-built system. Whether you’re running ecommerce tracking, cross-service reporting, or real-time dashboards, this approach allows your services to remain decoupled and focused while still enabling powerful insights at scale.
FAQs
1. Do I need to set up Kafka to stream MongoDB data to ClickHouse with Estuary Flow?
No. Estuary Flow's Dekaf emits change data as Kafka-compatible topics, which ClickHouse consumes through ClickPipes, so there is no Kafka cluster to deploy or manage.
2. What types of MongoDB collections can I capture with Estuary Flow?
Collections on replica sets that support change streams can be captured in real time; views and time-series collections can instead use batch incremental or batch snapshot modes with a polling cursor.
3. How can I enrich MongoDB order data with user and product info in ClickHouse?
Materialize the orders, users, and products collections into ClickHouse, then join them (for example with LEFT JOINs on userEmail and productId) into an enriched snapshot table such as orders_enriched_snapshots.

About the author
Ruhee has a background in Computer Science and Economics and has worked as a Data Engineer for SaaS tech startups, where she automated ETL processes using cutting-edge technologies and migrated data infrastructure to the cloud with AWS and Azure services. She is currently pursuing a Master's in Business Analytics with a focus on Operations and AI at Worcester Polytechnic Institute.
