
MongoDB to ClickHouse: Real-Time Ecommerce Analytics with Estuary

Learn how to stream real-time ecommerce order data from MongoDB to ClickHouse using Estuary Flow. No Kafka required. Build dashboards, enrich data, and scale analytics fast.


Imagine a scenario where real-time ecommerce order data is being ingested into MongoDB. While MongoDB is excellent for fast writes and flexible data models, it isn’t ideal for real-time analytics and business insights such as tracking the number of orders per minute, identifying top-selling products, or understanding user behavior. MongoDB isn’t designed for large-scale analytical workloads, and when services must fetch and process data from one another to answer those questions, the result is tight coupling and performance bottlenecks.

To solve this, modern architectures adopt centralized data warehousing combined with event-driven streaming. MongoDB can continue to handle real-time operational data, while ClickHouse, an OLAP-optimized engine, can handle fast, scalable analytics. Traditionally, this would require setting up Kafka to handle the pipeline. Estuary Flow simplifies this process by using its internal system called Dekaf, which emits change data as Kafka-compatible topics without the need to manage Kafka infrastructure. 

In this tutorial, we’ll simulate an ecommerce order tracking scenario, where we ingest mock data into MongoDB and stream it to ClickHouse using Estuary Flow. We’ll then query the live data for analytics such as tracking order status, product demand, and user activity.

Why Sync MongoDB Data to ClickHouse for Real-Time Ecommerce Analytics?

MongoDB is widely used for application development due to its flexible schema and quick writes, making it suitable for use cases like event tracking, user profiles, and content management. However, it's not optimized for complex analytical queries at scale. Once data volume increases, MongoDB’s performance for aggregations and joins becomes limited and often requires costly scaling.

ClickHouse is a columnar OLAP database built for fast analytical workloads. It offers:

  • Real-time analytics on billions of rows
  • Advanced aggregations and filtering
  • Compression and optimized reads

By syncing MongoDB to ClickHouse, you get the best of both worlds: MongoDB for real-time data ingestion and application logic, and ClickHouse for scalable analytics.

This combination enables use cases like:

  • Real-time dashboards powered by ClickHouse without overloading operational systems
  • Scalable product, customer, or behavioral analytics built on top of fast-ingested operational data
  • Offloading complex, resource-heavy queries from MongoDB to a purpose-built engine
  • Customer-facing analytics features, where speed and responsiveness are critical
  • Cross-service joins and aggregations, where centralized analysis removes the need for interdependent microservice calls

Streaming MongoDB to ClickHouse using Estuary Flow’s Dekaf

Flow connects directly to your MongoDB cluster and captures data in real time using change streams—MongoDB’s native mechanism for watching inserts, updates, and deletes. It then delivers those changes to ClickHouse as structured, schema-enforced records, ready for analytics.

Here’s what makes Flow ideal for this sync:

  • Change Data Capture with no extra tooling - Flow taps directly into MongoDB change streams, capturing live updates as they happen (see the pymongo sketch after this list for what a raw change stream looks like). If your deployment doesn’t support change streams (like time-series collections), Flow can use batch modes with customizable polling.
  • Kafka-style delivery without Kafka - Flow uses its internal Dekaf system to emit changes as Kafka-compatible topics. These topics can be consumed by ClickHouse using ClickPipes, with no need to deploy or manage Kafka infrastructure.
  • Automatic backfill and live sync - When you first set up the pipeline, Flow performs a historical load to ensure ClickHouse has a complete dataset. After that, it streams changes continuously, so your analytics stay up to date.
  • Schema-aware transformation - MongoDB documents are flexible, but analytics tools like ClickHouse require consistent structure. Estuary Flow infers schemas from your data, flattens nested fields where needed, and keeps track of changes to ensure reliability over time.
  • Resilience and observability - Built-in retries, state tracking, and logs make the pipeline production-ready. Flow automatically resumes from where it left off if interrupted, and helps you monitor pipeline health through the UI or API.
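
For context, the change streams Flow taps into are a native MongoDB feature you can watch yourself. Here is a minimal pymongo sketch (the connection string is a placeholder) that tails the orders collection the way a raw consumer would; Flow does the equivalent for you, plus backfill, schema enforcement, and delivery to ClickHouse:

python
from pymongo import MongoClient

# Placeholder URI; change streams require a replica set or Atlas cluster,
# not a standalone server.
client = MongoClient("mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/")
orders = client["ecommerce"]["orders"]

# Tail inserts, updates, and deletes as they happen.
with orders.watch(full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change["documentKey"], change.get("fullDocument"))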

Real-World Use Case: Ecommerce Order Tracking

In ecommerce, customers expect real-time updates on their order status. With a CDC pipeline, each status change—"placed," "shipped," "delivered"—is streamed instantly from MongoDB to ClickHouse, where analytical queries can be run to visualize fulfillment progress.

Architecture Diagram - Real-time streaming from MongoDB to ClickHouse

Step-by-Step Guide: Real-Time Ecommerce Order Analytics from MongoDB to ClickHouse with Estuary Flow

Prerequisites

To follow the guide, you will need:

  • MongoDB Atlas Cluster
    • Enable change streams. If you’re using MongoDB Atlas or a self-managed replica set, ensure change streams are enabled. This allows Flow to capture real-time inserts, updates, and deletes.
    • Allow network access. If you’re using a managed cluster, allowlist Estuary’s IP addresses or set up an SSH tunnel so Flow can securely reach your database.
  • Any Python IDE: Used to simulate mock ecommerce order data and insert it into MongoDB.
  • Estuary Flow Account: Go to the Flow web app on dashboard.estuary.com to sign up for the free tier.
  • ClickHouse Cloud Account: You can sign up for a 30-day trial.

Step 1: Create a Python Script to Simulate Real-Time Ecommerce Order Data

python
from pymongo import MongoClient
from datetime import datetime, timezone
import random
import time
import uuid

conn = 'mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/'
client = MongoClient(conn)
db = client["ecommerce"]
orders_col = db["orders"]

# Static reference data used to generate orders
users = list(db["users"].find())
products = list(db["products"].find())

status_flow = ["placed", "packed", "shipped", "out_for_delivery", "delivered"]
active_orders = {}  # orderId -> index of the next status in status_flow


def create_order():
    user = random.choice(users)
    product = random.choice(products)
    order_id = "ORD" + str(uuid.uuid4())[:8].upper()
    now = datetime.now(timezone.utc)
    order = {
        "orderId": order_id,
        "userEmail": user["email"],
        "productId": product["productId"],
        "status": "placed",
        "events": [{"status": "placed", "timestamp": now}],
        "lastModified": now,
    }
    orders_col.insert_one(order)
    active_orders[order_id] = 1  # next stage index
    print(f"[Insert] {order_id} placed by {user['email']} - {product['name']}")


def progress_order():
    for order_id in list(active_orders.keys()):
        stage = active_orders[order_id]
        if stage >= len(status_flow):
            # Order has reached "delivered"; stop tracking it
            del active_orders[order_id]
            continue
        new_status = status_flow[stage]
        now = datetime.now(timezone.utc)
        orders_col.update_one(
            {"orderId": order_id},
            {
                "$set": {"status": new_status, "lastModified": now},
                "$push": {"events": {"status": new_status, "timestamp": now}},
            },
        )
        print(f"[Update] {order_id} -> {new_status}")
        active_orders[order_id] += 1


if __name__ == "__main__":
    while True:
        # 60% chance to place a new order, 40% chance to progress existing ones
        if random.random() < 0.6:
            create_order()
        else:
            progress_order()
        time.sleep(30)

This Python script simulates real-time ecommerce order processing. It:

  • Continuously inserts new order documents into a MongoDB collection.
  • Progresses existing orders through delivery stages such as “packed,” “shipped,” and “delivered,” updating each document and appending new status events.
  • Runs forever, alternating between inserting a new order (60% chance) and updating existing ones (40% chance).
  • Sleeps for 30 seconds between each action to simulate real-time behavior.

This setup is ideal for testing real-time data pipelines and analytics systems that rely on continuous data streams.

Output in MongoDB Collection:

MongoDB to ClickHouse - Output in MongoDB Collection

I have also added products and users collections with static data for this example.
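
For reference, here is one possible way to seed those static collections. The documents below are made-up examples, but the field names (email, name, productId, category, price) match what the order script and the later ClickHouse joins expect:

python
from pymongo import MongoClient

client = MongoClient("mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/")
db = client["ecommerce"]

# Hypothetical static reference data; replace with whatever catalog you want to simulate.
db["users"].insert_many([
    {"email": "alice@example.com", "name": "Alice"},
    {"email": "bob@example.com", "name": "Bob"},
])
db["products"].insert_many([
    {"productId": "P001", "name": "Wireless Mouse", "category": "Electronics", "price": 24.99},
    {"productId": "P002", "name": "Coffee Mug", "category": "Kitchen", "price": 9.99},
])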

Step 2: Set Up MongoDB as a Source

Move Data from MongoDB to ClickHouse - Setup MongoDB as a Source

Now that your MongoDB cluster is prepared, you can connect it to Estuary Flow to begin capturing data into real-time collections.

Go to your dashboard, click Sources, and search for MongoDB.

Enter the following MongoDB connection details:

  • Name: Give your source a meaningful name like ecommerce_orders.
  • Address: Provide the host and port of your MongoDB instance. For Atlas, use the full mongodb+srv://... URI. If your credentials authenticate via the admin DB, be sure to append ?authSource=admin.
  • Username & Password: Enter the database user's credentials with read access to the collections you want to capture.
  • Database: You can optionally specify which databases to include.
  • Network access: Allowlist Estuary’s IP addresses so Flow can securely reach your cluster.

After completing the fields, click Next. Flow will test the connection to ensure credentials are valid and that it can access the cluster. If successful, you’ll move on to selecting collections for capture.
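
Before handing the credentials to Flow, it can help to confirm they work from Python. A minimal sketch using pymongo, with the same placeholder URI as the Step 1 script:

python
from pymongo import MongoClient
from pymongo.errors import PyMongoError

# Placeholder URI; append ?authSource=admin if your user authenticates against the admin DB
uri = "mongodb+srv://<YOUR_CREDENTIAL>@<YOUR_CLUSTER>.XXX.mongodb.net/?authSource=admin"

try:
    client = MongoClient(uri, serverSelectionTimeoutMS=5000)
    client.admin.command("ping")  # raises if the cluster is unreachable or auth fails
    print("Connected. Databases:", client.list_database_names())
except PyMongoError as exc:
    print("Connection failed:", exc)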

Step 3: Choose Collections and Configure Capture Mode


Next, choose which MongoDB collections you want to stream. Estuary Flow allows you to select collections from one or more databases and configure how each should be captured.

For each collection, you’ll need to select a capture mode:

  • Change Stream Incremental – Recommended for real-time data sync. This uses MongoDB’s native change streams to continuously capture inserts, updates, and deletes.
  • Batch Incremental or Batch Snapshot – Suitable for collections that don’t support change streams (e.g., views or time-series collections). In this mode, Flow periodically polls for changes using a specified cursor field (usually _id or a timestamp).

In this tutorial, we’ll use Change Stream Incremental capture for the orders, products, and users collections.

After configuring the capture mode for each collection, click Publish to deploy your source. Estuary Flow will begin backfilling historical data from MongoDB and then automatically switch to real-time streaming as new changes occur.

Step 4: Stream to ClickHouse via Kafka-Compatible Materialization

  1. In the Flow dashboard, go to Destinations and click + New Destination
  2. Choose ClickHouse from the connector catalog
  3. Click Materialize

Fill in the configuration details:

  • Name the materialization (e.g., order_tracking)
  • Choose the same Data Plane region used for your MongoDB source
  • Set an Auth Token that ClickHouse will use to authenticate with Flow
  • Under Source Collections, select the MongoDB-backed collections you captured in Step 3.
  • Review the field mappings and click Publish.

Here are the captured collections:


Materialization Details: 



Step 5: Connect to ClickHouse via ClickPipes

Now it’s time to consume the data stream inside ClickHouse.

In the ClickHouse Cloud dashboard:

  1. Select the Data Source as Kafka
  2. Add a new Kafka Pipe with these connection settings:
  • Broker: dekaf.estuary-data.com:9092
  • Protocol: SASL_SSL
  • SASL Mechanism: PLAIN
  • Username: Your full Flow materialization name (e.g., your-org/order-tracking/dekaf-clickhouse)
  • Password: The Auth Token from Step 4
  • Schema Registry URL: https://dekaf.estuary-data.com
  3. Select the topics that match your Flow collections:
  • Choose topic: orders
  • Offset: From Beginning or Timestamp

ClickHouse will now start ingesting your MongoDB data in real time.
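
If you want to sanity-check the Dekaf topic outside of ClickPipes first, any standard Kafka client can connect with the same settings. Below is a minimal sketch using the confluent-kafka Python package; the consumer group name is arbitrary, and because payloads are Avro-encoded against Dekaf’s schema registry, this only confirms that records are flowing:

python
from confluent_kafka import Consumer

consumer = Consumer({
    "bootstrap.servers": "dekaf.estuary-data.com:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanisms": "PLAIN",
    "sasl.username": "your-org/order-tracking/dekaf-clickhouse",  # full materialization name
    "sasl.password": "<AUTH_TOKEN>",  # the Auth Token from Step 4
    "group.id": "dekaf-smoke-test",   # arbitrary consumer group for this check
    "auto.offset.reset": "earliest",
})
consumer.subscribe(["orders"])

try:
    while True:
        msg = consumer.poll(5.0)
        if msg is None:
            continue
        if msg.error():
            print("Error:", msg.error())
            continue
        # Payloads are Avro-encoded; we only confirm that messages are arriving.
        print(f"topic={msg.topic()} partition={msg.partition()} offset={msg.offset()}")
finally:
    consumer.close()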

You can see your metrics once it successfully connects:


Step 6: Query Your Data for Real-Time Analytics in ClickHouse

Here is the loaded Orders table:

  1. Define a ClickHouse table to receive streaming order data.
sql
CREATE OR REPLACE TABLE orders_final (
    orderId String,
    userEmail String,
    productId String,
    status String,
    lastModified DateTime,
    _topic String
) ENGINE = ReplacingMergeTree(lastModified)
ORDER BY orderId;

SELECT * FROM orders_final;

2. Create Materialized View for Snapshots:

sql
ALTER TABLE orders_final ADD COLUMN current_timestamp DateTime DEFAULT now();
ALTER TABLE orders_final ADD COLUMN count_orders UInt32 DEFAULT 1;

CREATE MATERIALIZED VIEW events_snapshot_mv TO orders_final AS
SELECT
    *,
    now() AS current_timestamp,
    1 AS count_orders
FROM orders;

SELECT * FROM events_snapshot_mv;

Events Snapshot MV Table:


To show how many orders were processed per minute using the snapshot view:

sql
SELECT
    toStartOfMinute(current_timestamp) AS minute,
    sum(count_orders) AS total_orders
FROM events_snapshot_mv
GROUP BY minute
ORDER BY minute DESC;

Total Orders by current timestamp:

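If you'd rather pull this metric into a Python notebook or dashboard, the clickhouse-connect driver can run the same query. A minimal sketch; the host, username, and password are placeholders for your ClickHouse Cloud service:

python
import clickhouse_connect

# Placeholder credentials for your ClickHouse Cloud service
client = clickhouse_connect.get_client(
    host="<YOUR_SERVICE>.clickhouse.cloud",
    username="default",
    password="<YOUR_PASSWORD>",
    secure=True,
)

result = client.query(
    """
    SELECT toStartOfMinute(current_timestamp) AS minute,
           sum(count_orders) AS total_orders
    FROM events_snapshot_mv
    GROUP BY minute
    ORDER BY minute DESC
    LIMIT 10
    """
)

for minute, total_orders in result.result_rows:
    print(minute, total_orders)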

Step 7: Create an Enriched Snapshot Table

sql
CREATE TABLE orders_enriched_snapshots (
    orderId String,
    status String,
    lastModified DateTime,
    userEmail String,
    userName String,
    productId String,
    productName String,
    productCategory String,
    productPrice Float64,
    enriched_at DateTime
) ENGINE = MergeTree
ORDER BY (orderId, enriched_at);

INSERT INTO orders_enriched_snapshots
SELECT
    o.orderId,
    o.status,
    o.lastModified,
    o.userEmail,
    u.name AS userName,
    o.productId,
    p.name AS productName,
    p.category AS productCategory,
    p.price AS productPrice,
    now() AS enriched_at
FROM orders o
LEFT JOIN users u ON o.userEmail = u.email
LEFT JOIN products p ON o.productId = p.productId;

The above query combines order data with user and product data. It works like a CDC snapshot: every run captures the latest state from orders and enriches it with user and product attributes into the orders_enriched_snapshots table.

This is useful because operational data in systems like MongoDB is often fragmented across multiple collections. By centralizing and enriching it in a single table, you make it easier to run analytical queries, build dashboards, and track historical trends without repeatedly performing expensive joins.
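
Because each run of the INSERT captures a fresh snapshot, you can schedule it on whatever cadence your dashboards need. A minimal sketch that refreshes the enriched table every five minutes using clickhouse-connect (the interval and credentials are placeholders):

python
import time

import clickhouse_connect

ENRICH_SQL = """
INSERT INTO orders_enriched_snapshots
SELECT
    o.orderId, o.status, o.lastModified, o.userEmail,
    u.name AS userName, o.productId, p.name AS productName,
    p.category AS productCategory, p.price AS productPrice,
    now() AS enriched_at
FROM orders o
LEFT JOIN users u ON o.userEmail = u.email
LEFT JOIN products p ON o.productId = p.productId
"""

client = clickhouse_connect.get_client(
    host="<YOUR_SERVICE>.clickhouse.cloud",  # placeholder ClickHouse Cloud host
    username="default",
    password="<YOUR_PASSWORD>",
    secure=True,
)

# Re-run the snapshot on a fixed cadence (every 5 minutes here)
while True:
    client.command(ENRICH_SQL)
    print("Enriched snapshot refreshed")
    time.sleep(300)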

Ready to Power Real-Time Analytics at Scale? Start streaming for free and bring your analytics to life.

Conclusion

Using Estuary Flow with MongoDB and ClickHouse enables seamless real-time analytics without managing Kafka infrastructure or tightly coupling microservices. With just a few configuration steps, you can simulate, ingest, stream, and query ecommerce orders for instant operational insights.

We’ve demonstrated how to:

  • Simulate real-time operational data in MongoDB.
  • Capture changes in near real time using Estuary Flow’s CDC pipeline.
  • Stream changes to ClickHouse for fast, OLAP-style analytics.
  • Enrich and join data from multiple sources.

This architecture solves common microservice pain points—such as inter-service joins, performance bottlenecks, and scaling issues—by centralizing analytics in a purpose-built system. Whether you’re running ecommerce tracking, cross-service reporting, or real-time dashboards, this approach allows your services to remain decoupled and focused while still enabling powerful insights at scale.

FAQs

Do I need to set up and manage Kafka to stream data from MongoDB to ClickHouse?
No. Estuary Flow eliminates the need to set up or manage Kafka infrastructure. It uses an internal Kafka-compatible engine called Dekaf, which emits change data from MongoDB as Kafka-style topics. These can be consumed directly by ClickHouse via ClickPipes, making the architecture much simpler and faster to deploy.

Which MongoDB collections can Estuary Flow capture?
Estuary Flow supports capturing data from MongoDB collections in multiple modes. Change Stream Incremental (CDC) is ideal for real-time streaming from collections that support MongoDB change streams (like normal collections in a replica set or Atlas). Batch Snapshot and Batch Incremental are suitable for collections that don’t support change streams, such as time-series collections or views; these use polling based on a cursor field. This flexibility ensures you can stream virtually any data structure from MongoDB into ClickHouse.

How do I enrich order data with user and product information in ClickHouse?
You can build an enriched snapshot table in ClickHouse by joining order records with static users and products tables using SQL. This is often done via periodic queries that insert into an orders_enriched_snapshots table, with joins based on shared fields like userEmail and productId. This structure supports fast, centralized analytical queries without needing repeated joins at query time, making it ideal for real-time dashboards or customer-facing analytics features.


About the author

Ruhee Shrestha, Technical Writer

Ruhee has a background in Computer Science and Economics and has worked as a Data Engineer for tech startups providing SaaS, where she automated ETL processes using cutting-edge technologies and migrated data infrastructures to the cloud with AWS and Azure services. She is currently pursuing a Master’s in Business Analytics with a focus on Operations and AI at Worcester Polytechnic Institute.
