
How to Replicate MySQL to ClickHouse in Real Time Using CDC (In Minutes)

Set up real-time MySQL to ClickHouse replication using Change Data Capture (CDC) and Estuary Flow—no Kafka, no batch jobs, and no custom ETL. Learn how to build a low-latency pipeline in minutes.


From OLTP to OLAP: Syncing MySQL to ClickHouse for Real-Time Analytics

MySQL is a workhorse of transactional systems. It's simple, stable, and widely used, powering everything from e-commerce checkouts to SaaS user activity. But while it's great for reads and writes at scale, it's not exactly built for deep analytics.

As soon as teams start asking complex questions—“What are our top-selling products by region over the past year?” or “How is user behavior trending in real time?”—MySQL starts showing its limits. Long-running queries clash with day-to-day operations. Read replicas help for a while, but they come with lag and cost. Eventually, you're left with one option: move your analytical workloads elsewhere.
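To make that concrete, here is a sketch of the kind of query that strains an OLTP database once tables grow; the orders, order_items, and products tables are hypothetical stand-ins for your own schema:

sql
-- Top-selling products by region over the past year (hypothetical schema).
SELECT p.name, o.region, SUM(oi.quantity * oi.unit_price) AS revenue
FROM orders o
JOIN order_items oi ON oi.order_id = o.id
JOIN products p ON p.id = oi.product_id
WHERE o.created_at >= NOW() - INTERVAL 1 YEAR
GROUP BY p.name, o.region
ORDER BY revenue DESC
LIMIT 20;

On a large row-store table, a scan-and-aggregate query like this competes for the same disk, CPU, and buffer pool that your application traffic depends on.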

Enter ClickHouse.

ClickHouse is a high-performance, columnar database designed for lightning-fast analytics. It’s optimized for big scans, aggregations, and time-series analysis—everything MySQL wasn’t designed to do.

The real challenge? Getting your MySQL data into ClickHouse continuously, reliably, and without duct-taped ETL scripts.

That’s where Estuary Flow comes in. With built-in support for MySQL change data capture (CDC) via the binary log, Flow lets you replicate data from MySQL to ClickHouse in real time—no code, no Kafka, no hassle.

In this guide, we’ll walk you through:

  • Why MySQL isn't ideal for large-scale analytics
  • How ClickHouse delivers speed and flexibility at analytical scale
  • How to set up real-time MySQL to ClickHouse replication using Estuary Flow

If you're struggling to get fresh insights without slowing down your transactional database, this is the sync you've been looking for.

Why MySQL Isn’t Built for Modern Analytics

MySQL excels at transactional workloads—fast inserts, updates, and simple lookups. It’s a great choice for powering applications, but once you start running complex analytical queries, cracks begin to show.

As data grows, so do the pain points:

  • Aggregations and joins become slow
  • Dashboards lag or time out
  • Read replicas struggle to keep up
  • Analytical queries interfere with production traffic

You might try exporting data on a schedule or standing up read-only replicas, but those are short-term fixes. Batch ETL adds lag, and scaling vertically only buys time.

MySQL’s row-based storage, lack of vectorized execution, and tight coupling of reads and writes make it fundamentally limited for analytics. You end up choosing between up-to-date insights and application performance.

The truth is, MySQL wasn’t designed for large-scale, real-time analysis. If you want fast, flexible insights without slowing down production, you need a dedicated analytics engine.

That’s where ClickHouse comes in.

Why ClickHouse Complements MySQL

You don’t need to replace MySQL to improve analytics—you just need to offload the parts it wasn’t built for.

MySQL is excellent at handling operational data: orders, logins, transactions, and user updates. It ensures consistency, supports ACID properties, and powers mission-critical applications. That role shouldn't change.

ClickHouse, on the other hand, is built for speed and scale, perfect for analytical queries, dashboards, and real-time monitoring. It doesn’t compete with MySQL; it enhances it.

By syncing data from MySQL to ClickHouse, you get the best of both worlds:

  • MySQL remains the source of truth, optimized for high-volume transactions
  • ClickHouse becomes your analytics layer, tuned for exploration, aggregation, and speed

This separation of concerns reduces stress on production systems and opens the door to rich, real-time insights, without slowing down your apps or waiting on batch jobs.

How to Replicate MySQL to ClickHouse Using CDC and Estuary Flow

Traditionally, syncing data from MySQL to an analytics system like ClickHouse meant building custom ETL jobs, running batch exports, or deploying Kafka. These approaches often involve high latency, fragile scripts, and significant operational overhead.

ClickHouse even offers its own MySQL integration for simple syncs between the two systems—but it currently only supports SELECTs and INSERTs of your MySQL data from within ClickHouse. This isn’t the automated replication we’re looking for. After all, we’re trying to move our complex SELECT queries out of MySQL entirely.
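For reference, that native option is typically the mysql() table function (or the MySQL table engine), which queries the remote MySQL server on demand; the host, database, table, and credentials below are placeholders:

sql
-- Query MySQL directly from ClickHouse via the mysql() table function.
-- Connection details are placeholders.
SELECT region, count() AS order_count
FROM mysql('mysql-host:3306', 'shop', 'orders', 'mysql_user', 'mysql_password')
GROUP BY region;

Because each query is pushed to MySQL at read time, this approach doesn't offload any analytical work; it just changes where you type the query.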

Thankfully, there’s a solution.

Estuary Flow lets you stream data from MySQL to ClickHouse in real time—no code, no Kafka, and no manual sync logic. It does this by leveraging MySQL’s binary log (binlog), which captures every row-level change as it happens. This process is known as Change Data Capture (CDC).

Here’s how it works with Flow:

  1. Flow connects to your MySQL database and reads directly from the binlog, capturing inserts, updates, and deletes in real time
  2. It transforms those changes into structured events and stores them in versioned, schema-aware collections
  3. Using Flow’s Dekaf module, those collections are exposed as Kafka-compatible streams, which ClickHouse can consume directly via ClickPipes in real time

This architecture gives you a continuous pipeline that’s:

  • Low-latency — Events reach ClickHouse seconds after they occur in MySQL
  • Resilient — Built-in backfill, exactly-once or at-least-once delivery
  • Schema-aware — Handles changes gracefully and enforces structure
  • Fully managed — Flow handles orchestration, monitoring, and fault tolerance

Whether you’re using self-hosted MySQL, Amazon RDS, Aurora, or Cloud SQL, Estuary Flow supports it—and makes syncing to ClickHouse seamless.

Step-by-Step: Sync MySQL to ClickHouse with Estuary Flow

You can set up a fully managed, real-time replication pipeline from MySQL to ClickHouse in just a few steps using Estuary Flow—no Kafka, no scripts, and no infrastructure to manage.

Here’s how to do it:

Step 1: Prepare Your MySQL Environment for CDC

Before connecting to Flow, make sure your MySQL instance is configured for Change Data Capture using the binary log (binlog).

✅ Enable binlog with ROW format

sql
SET GLOBAL binlog_format = 'ROW';

✅ Set binlog retention (recommended: 7+ days)

sql
SET PERSIST binlog_expire_logs_seconds = 604800;

✅ Create a capture user with replication access

sql
CREATE USER IF NOT EXISTS 'flow_capture'@'%' IDENTIFIED BY 'your_password';
GRANT REPLICATION CLIENT, REPLICATION SLAVE, SELECT ON *.* TO 'flow_capture';
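As a quick sanity check, you can confirm the account's privileges before moving on:

sql
SHOW GRANTS FOR 'flow_capture'@'%';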

✅ Set the time zone, if using DATETIME fields, to avoid conversion issues:

sql
SET PERSIST time_zone = 'America/New_York'; -- Or your region
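It's also worth confirming the settings stuck. The variable names below apply to MySQL 8.0; note that log_bin itself usually has to be enabled in the server configuration rather than at runtime:

sql
-- Expect log_bin = ON and binlog_format = ROW.
SHOW VARIABLES WHERE Variable_name IN
  ('log_bin', 'binlog_format', 'binlog_expire_logs_seconds', 'time_zone');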

If you're using Amazon RDS, Aurora, Google Cloud SQL, or Azure MySQL, follow the platform-specific setup steps to enable binlog access and networking.

Step 2: Capture Data from MySQL in Estuary Flow

Various MySQL capture connectors you can use as Estuary sources
  1. Log into Estuary Flow
  2. Go to Sources > + New Source
  3. Select MySQL from the connector catalog and click Capture

Now configure your connection:

  • Name: A unique name like mysql_orders_capture
  • Data Plane: Choose your preferred processing region
  • Server Address: Your MySQL host (e.g., db.example.com:3306)
  • Username / Password: The flow_capture credentials

If your database is behind a VPC or firewall, configure SSH forwarding via the Network Tunnel section.

Click Next, test the connection, and proceed.
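If the connection test fails, it's often a privilege or binlog issue. Connecting from a SQL client as flow_capture and running a couple of checks can help narrow it down (SHOW BINARY LOGS requires the REPLICATION CLIENT privilege; 'shop' is a placeholder database name):

sql
-- Run as flow_capture to confirm binlog metadata and table access are visible.
SHOW BINARY LOGS;
SELECT table_schema, table_name
FROM information_schema.tables
WHERE table_schema = 'shop'
LIMIT 5;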

Step 3: Select Tables and Define Collections

Once connected, Estuary Flow auto-detects CDC-enabled tables.

  1. Select one or more tables you want to replicate
  2. For any table without a primary key, manually assign a collection key
  3. Review and customize the schema (optional)

Flow will generate versioned collections for each table, schema-enforced and ready for downstream streaming.
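If you're not sure which tables lack a primary key (and will therefore need a manually assigned collection key), a query against information_schema can list them; this is a general-purpose sketch rather than anything Estuary-specific:

sql
-- Base tables with no PRIMARY KEY constraint, excluding system schemas.
SELECT t.table_schema, t.table_name
FROM information_schema.tables t
LEFT JOIN information_schema.table_constraints c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('mysql', 'sys', 'information_schema', 'performance_schema')
  AND c.constraint_name IS NULL;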

Click Publish to deploy your MySQL capture.

Step 4: Configure ClickHouse as the Destination

Dekaf-backed ClickHouse destination connector
  1. Go to Destinations > + New Destination
  2. Search for ClickHouse and choose Materialization
  3. Enter your destination settings:
    • Name: e.g., clickhouse_orders_sync
    • Data Plane: Should match your capture’s region
    • Auth Token: Set a secure token — used by ClickHouse to authenticate
  4. Under Source Collections, click Modify and link your MySQL collections

Click Publish to finalize the materialization.

Behind the scenes, Flow uses Dekaf to expose collections as Kafka-style topics, ready for ClickHouse to consume—no Kafka cluster needed.

Step 5: Connect ClickHouse to Estuary via ClickPipes

In your ClickHouse Cloud UI:

  1. Go to Integrations → ClickPipes
  2. Add a new Kafka pipe with the following:
    • Broker: dekaf.estuary-data.com:9092
    • Protocol: SASL_SSL
    • SASL Mechanism: PLAIN
    • Username: The full Flow materialization name (e.g., your-org/clickhouse_orders_sync)
    • Password: The auth token you set in Step 4
    • Schema Registry URL: https://dekaf.estuary-data.com
  3. Select the topics (1 per table)
  4. Map the fields to your ClickHouse schema
  5. Save and activate the pipe

Within seconds, your MySQL data will begin streaming into ClickHouse in real time.
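If you manage the destination table yourself, a ReplacingMergeTree engine is a common choice for CDC-style data, since newer versions of a row replace older ones during merges. The table below is a hypothetical orders example, not a required schema:

sql
-- Hypothetical destination table for an orders topic; adjust columns to your schema.
CREATE TABLE orders
(
    order_id    UInt64,
    customer_id UInt64,
    region      LowCardinality(String),
    amount      Decimal(18, 2),
    updated_at  DateTime
)
ENGINE = ReplacingMergeTree(updated_at)
ORDER BY order_id;

-- Quick freshness check once the pipe is active.
SELECT count(), max(updated_at) FROM orders;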

Migrating Data into ClickHouse With Estuary

3 Reasons to Stream MySQL to ClickHouse

Setting up a real-time sync from MySQL to ClickHouse isn’t just about modernizing your stack—it directly improves performance, visibility, and engineering velocity.

Here’s why teams are making the switch:

1. Offload Analytics Without Harming Transactions

Running analytical queries on your primary MySQL instance adds risk: slowdowns, replication lag, and blocked transactions. By replicating data to ClickHouse, you isolate your production database from reporting workloads, keeping apps fast and users happy.

ClickHouse becomes your dedicated analytics layer, optimized for heavy reads and complex queries. MySQL stays lean and focused on serving application traffic.

2. Real-Time Insights Without ETL Overhead

Batch pipelines and scheduled exports always lag behind reality. With Estuary Flow and MySQL CDC, changes stream into ClickHouse seconds after they happen. No more waiting for hourly jobs or stale dashboards—your analytics are always current.

Whether you're tracking user events, product metrics, or operational KPIs, ClickHouse gives you real-time visibility at scale.
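For instance, a dashboard query over a hypothetical events table synced from MySQL can roll activity up by hour and stay fast as the table grows:

sql
-- Hourly event counts for the last 24 hours (hypothetical events table).
SELECT toStartOfHour(created_at) AS hour, event_type, count() AS total
FROM events
WHERE created_at >= now() - INTERVAL 24 HOUR
GROUP BY hour, event_type
ORDER BY hour DESC, total DESC;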

3. No Kafka, No Scripts, No Headaches

Most change data pipelines require stitching together Kafka, custom consumers, and monitoring glue. With Flow, the entire process is fully managed: from initial backfill to streaming sync.

You configure it once, and it just works. No brokers to maintain, no retries to debug, no custom code to fix when schemas evolve.

Final Thoughts: Replicate MySQL to ClickHouse the Right Way

MySQL was built for transactions. ClickHouse was built for analytics. When you connect them using real-time replication, your architecture becomes both reliable and insightful.

With Estuary Flow, you can stream changes from MySQL to ClickHouse in minutes, without writing code, deploying Kafka, or managing brittle pipelines. The result? Real-time insights, clean separation of workloads, and a data stack that actually scales.


FAQs

What is the best way to replicate MySQL to ClickHouse?
The best way to replicate MySQL to ClickHouse is by using a CDC-powered platform like Estuary Flow. It captures real-time changes from MySQL’s binary log and streams them into ClickHouse with schema enforcement, low latency, and no need for Kafka or custom code.

Which tools support real-time MySQL to ClickHouse replication?
Tools like Estuary Flow are purpose-built for real-time MySQL to ClickHouse replication using Change Data Capture (CDC). Unlike traditional ETL tools, Flow offers continuous sync, automatic backfills, and full schema awareness without requiring manual orchestration or Kafka infrastructure.

Can I replicate MySQL to ClickHouse without a dedicated tool?
Yes, you can build a manual pipeline using cron jobs, mysqldump, custom scripts, or batch ETL processes. However, this approach often leads to data lag, broken schemas, and high maintenance, making it unsuitable for real-time analytics or production workloads.

Is CDC better than batch ETL for syncing MySQL to ClickHouse?
Absolutely. CDC (Change Data Capture) allows you to stream only the row-level changes from MySQL, significantly reducing load and latency. It’s far more efficient than batch ETL and is essential for building real-time, scalable analytics pipelines with ClickHouse.

About the author

Team Estuary, Estuary Editorial Team

Team Estuary is a group of engineers, product experts, and data strategists building the future of real-time and batch data integration. We write to share technical insights, industry trends, and practical guides.
