
MongoDB to ClickHouse: Real-Time Ecommerce Analytics with Estuary

Learn how to stream real-time ecommerce order data from MongoDB to ClickHouse using Estuary Flow. No Kafka required. Build dashboards, enrich data, and scale analytics fast.


Imagine a scenario where real-time ecommerce order data is being ingested into MongoDB. While MongoDB is excellent for fast writes and flexible data models, it isn’t ideal for real-time analytics and business insights such as tracking the number of orders per minute, identifying top-selling products, or understanding user behavior. MongoDB isn’t designed for large-scale analytical workloads, and when services must fetch and process data from one another to answer those questions, the result is tight coupling and performance bottlenecks.

To solve this, modern architectures adopt centralized data warehousing combined with event-driven streaming. MongoDB can continue to handle real-time operational data, while ClickHouse, an OLAP-optimized engine, can handle fast, scalable analytics. Traditionally, this would require setting up Kafka to handle the pipeline. Estuary Flow simplifies this process by using its internal system called Dekaf, which emits change data as Kafka-compatible topics without the need to manage Kafka infrastructure. 

In this tutorial, we’ll simulate an ecommerce order tracking scenario, where we ingest mock data into MongoDB and stream it to ClickHouse using Estuary Flow. We’ll then query the live data for analytics such as tracking order status, product demand, and user activity.

Why Sync MongoDB Data to ClickHouse for Real-Time Ecommerce Analytics?

MongoDB is widely used for application development due to its flexible schema and quick writes, making it suitable for use cases like event tracking, user profiles, and content management. However, it's not optimized for complex analytical queries at scale. Once data volume increases, MongoDB’s performance for aggregations and joins becomes limited and often requires costly scaling.

ClickHouse is a columnar OLAP database built for fast analytical workloads. It offers:

  • Real-time analytics on billions of rows
  • Advanced aggregations and filtering
  • Compression and optimized reads

By syncing MongoDB to ClickHouse, you get the best of both worlds: MongoDB for real-time data ingestion and application logic, and ClickHouse for scalable analytics.

This combination enables use cases like:

  • Real-time dashboards powered by ClickHouse without overloading operational systems
  • Scalable product, customer, or behavioral analytics built on top of fast-ingested operational data
  • Offloading complex, resource-heavy queries from MongoDB to a purpose-built engine
  • Customer-facing analytics features, where speed and responsiveness are critical
  • Cross-service joins and aggregations, where centralized analysis removes the need for interdependent microservice calls

Streaming MongoDB to ClickHouse using Estuary Flow’s Dekaf

Flow connects directly to your MongoDB cluster and captures data in real time using change streams—MongoDB’s native mechanism for watching inserts, updates, and deletes. It then delivers those changes to ClickHouse as structured, schema-enforced records, ready for analytics.

Here’s what makes Flow ideal for this sync:

  • Change Data Capture with no extra tooling - Flow taps directly into MongoDB change streams, capturing live updates as they happen (see the pymongo sketch after this list for what a raw change stream looks like). If your deployment doesn’t support change streams (like time-series collections), Flow can use batch modes with customizable polling.
  • Kafka-style delivery without Kafka - Flow uses its internal Dekaf system to emit changes as Kafka-compatible topics. These topics can be consumed by ClickHouse using ClickPipes, with no need to deploy or manage Kafka infrastructure.
  • Automatic backfill and live sync - When you first set up the pipeline, Flow performs a historical load to ensure ClickHouse has a complete dataset. After that, it streams changes continuously, so your analytics stay up to date.
  • Schema-aware transformation - MongoDB documents are flexible, but analytics tools like ClickHouse require consistent structure. Estuary Flow infers schemas from your data, flattens nested fields where needed, and keeps track of changes to ensure reliability over time.
  • Resilience and observability - Built-in retries, state tracking, and logs make the pipeline production-ready. Flow automatically resumes from where it left off if interrupted, and helps you monitor pipeline health through the UI or API.
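
For context, the change streams Flow taps into are a native MongoDB feature you can watch yourself. Here is a minimal pymongo sketch (the connection string is a placeholder) that tails the orders collection the way a raw consumer would; Flow does the equivalent for you, plus backfill, schema enforcement, and delivery to ClickHouse:

python
from pymongo import MongoClient

# Placeholder URI; change streams require a replica set or Atlas cluster,
# not a standalone server.
client = MongoClient("mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/")
orders = client["ecommerce"]["orders"]

# Tail inserts, updates, and deletes as they happen.
with orders.watch(full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change["documentKey"], change.get("fullDocument"))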

Real-World Use Case: Ecommerce Order Tracking

In ecommerce, customers expect real-time updates on their order status. With a CDC pipeline, each status change—"placed," "shipped," "delivered"—is streamed instantly from MongoDB to ClickHouse, where analytical queries can be run to visualize fulfillment progress.

Architecture Diagram - Real-time streaming from MongoDB to ClickHouse

Step-by-Step Guide: Real-Time Ecommerce Order Analytics from MongoDB to ClickHouse with Estuary Flow

Prerequisites

To follow the guide, you will need:

  • MongoDB Atlas Cluster
    • Enable change streams. If you’re using MongoDB Atlas or a self-managed replica set, ensure change streams are enabled. This allows Flow to capture real-time inserts, updates, and deletes.
    • Allow network access. If you’re using a managed cluster, allowlist Estuary’s IP addresses or set up an SSH tunnel so Flow can securely reach your database.
  • Any Python IDE: Used to simulate mock ecommerce order data and insert it into MongoDB.
  • Estuary Flow Account: Go to the Flow web app on dashboard.estuary.com to sign up for the free tier.
  • ClickHouse Cloud Account: You can sign up for a 30-day trial.

Step 1: Create a Python Script to Simulate Real-Time Ecommerce Order Data

python
from pymongo import MongoClient
from datetime import datetime, timezone
import random
import time
import uuid

conn = 'mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/'
client = MongoClient(conn)
db = client["ecommerce"]
orders_col = db["orders"]

# Static reference data used to generate orders
users = list(db["users"].find())
products = list(db["products"].find())

status_flow = ["placed", "packed", "shipped", "out_for_delivery", "delivered"]
active_orders = {}  # orderId -> index of the next status in status_flow


def create_order():
    user = random.choice(users)
    product = random.choice(products)
    order_id = "ORD" + str(uuid.uuid4())[:8].upper()
    now = datetime.now(timezone.utc)
    order = {
        "orderId": order_id,
        "userEmail": user["email"],
        "productId": product["productId"],
        "status": "placed",
        "events": [{"status": "placed", "timestamp": now}],
        "lastModified": now,
    }
    orders_col.insert_one(order)
    active_orders[order_id] = 1  # next stage index
    print(f"[Insert] {order_id} placed by {user['email']} - {product['name']}")


def progress_order():
    for order_id in list(active_orders.keys()):
        stage = active_orders[order_id]
        if stage >= len(status_flow):
            # Order has reached "delivered"; stop tracking it
            del active_orders[order_id]
            continue
        new_status = status_flow[stage]
        now = datetime.now(timezone.utc)
        orders_col.update_one(
            {"orderId": order_id},
            {
                "$set": {"status": new_status, "lastModified": now},
                "$push": {"events": {"status": new_status, "timestamp": now}},
            },
        )
        print(f"[Update] {order_id} -> {new_status}")
        active_orders[order_id] += 1


if __name__ == "__main__":
    while True:
        # 60% chance to place a new order, 40% chance to progress existing ones
        if random.random() < 0.6:
            create_order()
        else:
            progress_order()
        time.sleep(30)

This Python script simulates real-time ecommerce order processing. It:

  • Continuously inserts new order documents into a MongoDB collection.
  • Progresses existing orders through delivery stages such as “packed,” “shipped,” and “delivered,” updating each document and appending new status events.
  • Runs forever, alternating between inserting a new order (60% chance) and updating existing ones (40% chance).
  • Sleeps for 30 seconds between each action to simulate real-time behavior.

This setup is ideal for testing real-time data pipelines and analytics systems that rely on continuous data streams.

Output in MongoDB Collection:

MongoDB to ClickHouse - Output in MongoDB Collection

I have also added products and users collections with static data for this example.
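
For reference, here is one possible way to seed those static collections. The documents below are made-up examples, but the field names (email, name, productId, category, price) match what the order script and the later ClickHouse joins expect:

python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/")
db = client["ecommerce"]

# Hypothetical static reference data; replace with whatever catalog you want to simulate.
db["users"].insert_many([
    {"email": "alice@example.com", "name": "Alice"},
    {"email": "bob@example.com", "name": "Bob"},
])
db["products"].insert_many([
    {"productId": "P001", "name": "Wireless Mouse", "category": "Electronics", "price": 24.99},
    {"productId": "P002", "name": "Coffee Mug", "category": "Kitchen", "price": 9.99},
])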

Step 2: Set Up MongoDB as a Source

Move Data from MongoDB to ClickHouse - Setup MongoDB as a Source

Now that your MongoDB cluster is prepared, you can connect it to Estuary Flow to begin capturing data into real-time collections.

Go to your dashboard, click Sources, and search for MongoDB.

Enter the following MongoDB connection details:

  • Name: Give your source a meaningful name like ecommerce_orders.
  • Address: Provide the host and port of your MongoDB instance. For Atlas, use the full mongodb+srv://... URI. If your credentials authenticate via the admin DB, be sure to append ?authSource=admin.
  • Username & Password: Enter the database user's credentials with read access to the collections you want to capture.
  • Database: You can optionally specify which databases to include.
  • Network access: Allowlist Estuary’s IP addresses so Flow can securely reach your cluster.

After completing the fields, click Next. Flow will test the connection to ensure credentials are valid and that it can access the cluster. If successful, you’ll move on to selecting collections for capture.
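
Before handing the credentials to Flow, it can help to confirm they work from Python. A minimal sketch using pymongo, with the same placeholder URI as the Step 1 script:

python
from pymongo import MongoClient
from pymongo.errors import PyMongoError

# Placeholder URI; append ?authSource=admin if your user authenticates against the admin DB
uri = "mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/?authSource=admin"

try:
    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    client.admin.command("ping")  # raises if the cluster is unreachable or auth fails
    print("Connected. Databases:", client.list_database_names())
except PyMongoError as exc:
    print("Connection failed:", exc)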

Step 3: Choose Collections and Configure Capture Mode


Next, choose which MongoDB collections you want to stream. Estuary Flow allows you to select collections from one or more databases and configure how each should be captured.

For each collection, you’ll need to select a capture mode:

  • Change Stream Incremental – Recommended for real-time data sync. This uses MongoDB’s native change streams to continuously capture inserts, updates, and deletes.
  • Batch Incremental or Batch Snapshot – Suitable for collections that don’t support change streams (e.g., views or time-series collections). In this mode, Flow periodically polls for changes using a specified cursor field (usually _id or a timestamp).

In this tutorial, we’ll use Change Stream Incremental capture for the orders, products, and users collections.

After configuring the capture mode for each collection, click Publish to deploy your source. Estuary Flow will begin backfilling historical data from MongoDB and then automatically switch to real-time streaming as new changes occur.

Step 4: Stream to ClickHouse via Kafka-Compatible Materialization

  1. In the Flow dashboard, go to Destinations and click + New Destination
  2. Choose ClickHouse from the connector catalog
  3. Click Materialize

Fill in the configuration details:

  • Name the materialization (e.g., order_tracking)
  • Choose the same Data Plane region used for your MongoDB source
  • Set an Auth Token that ClickHouse will use to authenticate with Flow
  • Under Source Collections, select the MongoDB-backed collections you captured in Step 3.
  • Review the field mappings and click Publish.

Here are the captured collections:


Materialization Details: 



Step 5: Connect to ClickHouse via ClickPipes

Now it’s time to consume the data stream inside ClickHouse.

In the ClickHouse Cloud dashboard:

  1. Select the Data Source as Kafka
  2. Add a new Kafka Pipe with these connection settings:
  • Broker: dekaf.estuary-data.com:9092
  • Protocol: SASL_SSL
  • SASL Mechanism: PLAIN
  • Username: Your full Flow materialization name (e.g., your-org/order-tracking/dekaf-clickhouse)
  • Password: The Auth Token from Step 4
  • Schema Registry URL: https://dekaf.estuary-data.com
  3. Select the topics that match your Flow collections:
  • Choose topic: orders
  • Offset: From Beginning or Timestamp

ClickHouse will now start ingesting your MongoDB data in real time.
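
If you want to sanity-check the Dekaf topic outside of ClickPipes first, any standard Kafka client can connect with the same settings. Below is a minimal sketch using the confluent-kafka Python package; the consumer group name is arbitrary, and because payloads are Avro-encoded against Dekaf’s schema registry, this only confirms that records are flowing:

python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "dekaf.estuary-data.com:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "your-org/order-tracking/dekaf-clickhouse",  # full materialization name
    "sasl.password": "<AUTH_TOKEN>",  # the Auth Token from Step 4
    "group.id": "dekaf-smoke-test",   # arbitrary consumer group for this check
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(5.0)
        if msg is None:
            continue
        if msg.error():
            print("Error:", msg.error())
            continue
        # Payloads are Avro-encoded; we only confirm that messages are arriving.
        print(f"topic={msg.topic()} partition={msg.partition()} offset={msg.offset()}")
finally:
    consumer.close()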

You can see your metrics once it successfully connects:


Step 6: Query Your Data for Real-Time Analytics in ClickHouse

Here is the loaded Orders table:

  1. Define a ClickHouse table to receive streaming order data.
sql
CREATE OR REPLACE TABLE orders_final (
    orderId String,
    userEmail String,
    productId String,
    status String,
    lastModified DateTime,
    _topic String
) ENGINE = ReplacingMergeTree(lastModified)
ORDER BY orderId;

SELECT * FROM orders_final;

2. Create Materialized View for Snapshots:

sql
ALTER TABLE orders_final ADD COLUMN current_timestamp DateTime DEFAULT now();
ALTER TABLE orders_final ADD COLUMN count_orders UInt32 DEFAULT 1;

CREATE MATERIALIZED VIEW events_snapshot_mv TO orders_final AS
SELECT
    *,
    now() AS current_timestamp,
    1 AS count_orders
FROM orders;

SELECT * FROM events_snapshot_mv;

Events Snapshot MV Table:


To show how many orders were processed per minute using the snapshot view:

sql
SELECT
    toStartOfMinute(current_timestamp) AS minute,
    sum(count_orders) AS total_orders
FROM events_snapshot_mv
GROUP BY minute
ORDER BY minute DESC;

Total Orders by current timestamp:

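If you'd rather pull this metric into a Python notebook or dashboard, the clickhouse-connect driver can run the same query. A minimal sketch; the host, username, and password are placeholders for your ClickHouse Cloud service:

python
import clickhouse_connect

# Placeholder credentials for your ClickHouse Cloud service
client = clickhouse_connect.get_client(
    host="<YOUR_SERVICE>.clickhouse.cloud",
    username="default",
    password="<YOUR_PASSWORD>",
    secure=True,
)

result = client.query(
    """
    SELECT toStartOfMinute(current_timestamp) AS minute,
           sum(count_orders) AS total_orders
    FROM events_snapshot_mv
    GROUP BY minute
    ORDER BY minute DESC
    LIMIT 10
    """
)

for minute, total_orders in result.result_rows:
    print(minute, total_orders)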

Step 7: Create an Enriched Snapshot Table

sql
CREATE TABLE orders_enriched_snapshots (
    orderId String,
    status String,
    lastModified DateTime,
    userEmail String,
    userName String,
    productId String,
    productName String,
    productCategory String,
    productPrice Float64,
    enriched_at DateTime
) ENGINE = MergeTree
ORDER BY (orderId, enriched_at);

INSERT INTO orders_enriched_snapshots
SELECT
    o.orderId,
    o.status,
    o.lastModified,
    o.userEmail,
    u.name AS userName,
    o.productId,
    p.name AS productName,
    p.category AS productCategory,
    p.price AS productPrice,
    now() AS enriched_at
FROM orders o
LEFT JOIN users u ON o.userEmail = u.email
LEFT JOIN products p ON o.productId = p.productId;

The above query combines order data with user and product data. It works like a CDC snapshot: every run captures the latest state from orders and enriches it with user and product attributes into the orders_enriched_snapshots table.

This is useful because operational data in systems like MongoDB is often fragmented across multiple collections. By centralizing and enriching it in a single table, you make it easier to run analytical queries, build dashboards, and track historical trends without repeatedly performing expensive joins.
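
Because each run of the INSERT captures a fresh snapshot, you can schedule it on whatever cadence your dashboards need. A minimal sketch that refreshes the enriched table every five minutes using clickhouse-connect (the interval and credentials are placeholders):

python
import time

import clickhouse_connect

ENRICH_SQL = """
INSERT INTO orders_enriched_snapshots
SELECT
    o.orderId, o.status, o.lastModified, o.userEmail,
    u.name AS userName, o.productId, p.name AS productName,
    p.category AS productCategory, p.price AS productPrice,
    now() AS enriched_at
FROM orders o
LEFT JOIN users u ON o.userEmail = u.email
LEFT JOIN products p ON o.productId = p.productId
"""

client = clickhouse_connect.get_client(
    host="<YOUR_SERVICE>.clickhouse.cloud",  # placeholder ClickHouse Cloud host
    username="default",
    password="<YOUR_PASSWORD>",
    secure=True,
)

# Re-run the snapshot on a fixed cadence (every 5 minutes here)
while True:
    client.command(ENRICH_SQL)
    print("Enriched snapshot refreshed")
    time.sleep(300)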

Ready to Power Real-Time Analytics at Scale? Start streaming for free and bring your analytics to life.

Conclusion

Using Estuary Flow with MongoDB and ClickHouse enables seamless real-time analytics without managing Kafka infrastructure or tightly coupling microservices. With just a few configuration steps, you can simulate, ingest, stream, and query ecommerce orders for instant operational insights.

We’ve demonstrated how to:

  • Simulate real-time operational data in MongoDB.
  • Capture changes in near real time using Estuary Flow’s CDC pipeline.
  • Stream changes to ClickHouse for fast, OLAP-style analytics.
  • Enrich and join data from multiple sources.

This architecture solves common microservice pain points—such as inter-service joins, performance bottlenecks, and scaling issues—by centralizing analytics in a purpose-built system. Whether you’re running ecommerce tracking, cross-service reporting, or real-time dashboards, this approach allows your services to remain decoupled and focused while still enabling powerful insights at scale.

FAQs

Do I need to set up and manage Kafka to stream data from MongoDB to ClickHouse?
No. Estuary Flow eliminates the need to set up or manage Kafka infrastructure. It uses an internal Kafka-compatible engine called Dekaf, which emits change data from MongoDB as Kafka-style topics. These can be consumed directly by ClickHouse via ClickPipes, making the architecture much simpler and faster to deploy.

Which MongoDB collections can Estuary Flow capture?
Estuary Flow supports capturing data from MongoDB collections in multiple modes. Change Stream Incremental (CDC) is ideal for real-time streaming from collections that support MongoDB change streams (like normal collections in a replica set or Atlas). Batch Snapshot and Batch Incremental are suitable for collections that don’t support change streams, such as time-series collections or views; these use polling based on a cursor field. This flexibility ensures you can stream virtually any data structure from MongoDB into ClickHouse.

How do I enrich order data with user and product information in ClickHouse?
You can build an enriched snapshot table in ClickHouse by joining order records with static users and products tables using SQL. This is often done via periodic queries that insert into an orders_enriched_snapshots table, with joins based on shared fields like userEmail and productId. This structure supports fast, centralized analytical queries without needing repeated joins at query time, making it ideal for real-time dashboards or customer-facing analytics features.


About the author

Ruhee Shrestha, Technical Writer

Ruhee has a background in Computer Science and Economics and has worked as a Data Engineer for tech startups providing SaaS, where she automated ETL processes using cutting-edge technologies and migrated data infrastructures to the cloud with AWS and Azure services. She is currently pursuing a Master’s in Business Analytics with a focus on Operations and AI at Worcester Polytechnic Institute.
