
PostgreSQL to ClickHouse: Real-Time Streaming with CDC

Stream PostgreSQL data to ClickHouse with CDC for real-time dashboards and analytics. Step-by-step guide to building a fast, reliable pipeline.

Stream PostgreSQL to ClickHouse in Real-Time with Estuary

As businesses scale, so does the demand for faster, more actionable data. PostgreSQL is a trusted choice for transactional workloads, powering applications, websites, and core business services. But when it comes to running complex analytics on large volumes of data, PostgreSQL can hit performance ceilings fast.

This is where ClickHouse comes in. Designed for high-speed OLAP (Online Analytical Processing), ClickHouse can process billions of rows per second, making it perfect for real-time dashboards, anomaly detection, and operational analytics.

But how do you get your data from PostgreSQL to ClickHouse in real time?
That’s where many teams struggle. Traditional ETL pipelines are batch-based, lag-prone, and challenging to maintain. They lead to stale reports, delayed decisions, and fragile pipelines that break with every schema change.

This guide shows you how to stream data from PostgreSQL to ClickHouse using Estuary Flow, a real-time data operations platform built for streaming-first architectures. With support for change data capture (CDC), Estuary continuously syncs your PostgreSQL database to ClickHouse with minimal latency—no batch jobs, no custom code, and no operational complexity.

By the end of this guide, you’ll know exactly what’s required to set up a Postgres to ClickHouse pipeline, how Estuary handles data capture and streaming behind the scenes, and how to configure each step from source to destination.

Why Sync PostgreSQL with ClickHouse?

PostgreSQL is one of the most reliable and feature-rich databases for transactional workloads. It powers everything from customer-facing apps to internal business systems. But when it comes to running complex analytical queries, especially across large datasets, it starts to show limitations. Aggregations slow down, indexing becomes costly, and query performance degrades as data grows.

ClickHouse fills this gap. It’s an open-source, columnar OLAP database designed for lightning-fast analytics. With features like vectorized execution and efficient compression, ClickHouse can process billions of rows per second with minimal latency.

By syncing data from PostgreSQL to ClickHouse, teams can offload analytical workloads, build real-time dashboards, and unlock insights without impacting transactional performance. Whether you’re tracking events in a SaaS app, analyzing ecommerce behavior, or monitoring financial transactions, combining these two systems provides the best of both worlds: trusted source-of-truth data with high-speed analytics.

Challenges with Traditional Postgres to ClickHouse ETL

Moving data from PostgreSQL to ClickHouse is not a new idea, but doing it well is a different story.

Most traditional ETL (Extract, Transform, Load) approaches are built around batch processing. You schedule periodic jobs to dump data from Postgres, transform it, and load it into ClickHouse. This works—for a while. But as data volumes grow and the need for real-time insights becomes critical, these methods quickly fall short.

Here are the main issues:

  • Latency: Batch jobs introduce unavoidable lag. Your dashboards are always looking at data that’s minutes—or hours—old.
  • Operational overhead: Managing extraction scripts, transformation logic, retries, and schema mismatches across two different systems adds complexity and failure points.
  • Lack of change awareness: Most batch pipelines don’t track incremental changes effectively. You either reprocess everything or risk missing updates and deletes.
  • Scalability bottlenecks: High-frequency batch loads can overwhelm both the source and the destination, leading to contention and degraded performance.

To build a truly real-time, reliable, and scalable sync from PostgreSQL to ClickHouse, you need a different architecture—one that’s stream-based, change-aware, and built to evolve with your data.

Alternative Methods: ClickHouse PostgreSQL Engine

ClickHouse also provides a PostgreSQL table engine and foreign table connectors that allow you to query data directly from Postgres without building a separate pipeline. These methods can be useful for quick prototypes or low-volume workloads, but they come with limitations:

  • No real-time CDC: The Postgres engine reads snapshots of data rather than continuously streaming row-level changes. This means dashboards or reports can still lag behind the source of truth.
  • Scalability challenges: Direct queries place additional load on the PostgreSQL database, which can slow down transactional workloads as data volumes grow.
  • Limited schema handling: Schema evolution and type compatibility require manual effort, increasing operational complexity.
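For context, here is a minimal sketch of the engine-based approach (the host, database, and credentials are placeholders). ClickHouse forwards each query against this table to PostgreSQL on demand rather than storing the data itself:

```sql
-- Read-through table backed by ClickHouse's PostgreSQL table engine.
-- Every query against postgres_orders hits the live Postgres database.
CREATE TABLE postgres_orders
(
    order_id UInt64,
    status   String
)
ENGINE = PostgreSQL('your-db-host:5432', 'your-db-name', 'orders', 'your-db-user', 'your-db-password', 'public');
```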

For production scenarios where freshness, scale, and reliability matter, a streaming-first architecture like Estuary Flow is the more robust choice. It captures incremental changes in real time and ensures ClickHouse always has the latest data without straining your Postgres system.

Estuary Flow: Real-Time CDC + Kafka-Compatible ClickHouse Integration

Estuary Flow is a streaming-native platform designed to move data in real time, without the complexity of traditional ETL. At its core, Flow uses Change Data Capture (CDC) to detect and stream row-level changes from databases like PostgreSQL as they happen.

To send this data into ClickHouse, Flow uses a clever approach: it materializes data as Kafka-compatible messages via a component called Dekaf. This makes Flow a seamless bridge between Postgres and ClickHouse, leveraging ClickHouse’s built-in ClickPipes feature to consume data from Kafka topics.

Here’s how the architecture works:

  1. Capture from PostgreSQL
    Flow connects directly to your Postgres instance and captures inserts, updates, and deletes in real time using logical replication.
  2. Flow Collection
    These change events are stored in an internal, schema-enforced data lake called a collection, which acts as an intermediate layer for reliability and transformation.
  3. Materialize to ClickHouse via Dekaf
    The Dekaf connector emits your Flow collection data as Kafka messages. ClickHouse, using ClickPipes, consumes those messages and writes them to native tables for fast querying.
  4. End-to-End Streaming
    The entire pipeline—from Postgres to ClickHouse—is continuous, fault-tolerant, and exactly-once (depending on destination configuration).

Whether you’re analyzing user events, financial transactions, or IoT metrics, Estuary Flow offers a low-latency, fully managed pipeline that’s robust and easy to configure.

Streaming Postgres to ClickHouse with Estuary Flow

Before diving into configuration, here’s what you’ll need to set up a real-time data pipeline from PostgreSQL to ClickHouse using Estuary Flow.

Prerequisites

To complete this setup, you’ll need:

  • PostgreSQL database (self-hosted or cloud-managed: RDS, Aurora, Cloud SQL, Azure Database for PostgreSQL).
    • A database user with replication privileges in PostgreSQL (see the example after this list).
    • Network access from Estuary to your database (via public IP or SSH tunnel).
  • ClickHouse Cloud account with ClickPipes enabled.
  • Estuary Flow access via the web UI or CLI.
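As a hedged sketch, creating a dedicated capture user in PostgreSQL might look like the following. The user name, password, and schema are illustrative; managed services and Estuary's source-postgres documentation may call for additional grants or setup objects:

```sql
-- Illustrative only: a dedicated user for Estuary's CDC capture.
CREATE USER flow_capture WITH REPLICATION PASSWORD 'choose-a-strong-password';

-- Allow the capture user to read the tables you plan to sync.
GRANT USAGE ON SCHEMA public TO flow_capture;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO flow_capture;
```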

Step 1: Create a Flow Collection from PostgreSQL

A range of Postgres source options in Estuary, both self-hosted and cloud-hosted

Select a PostgreSQL Source and fill out the required fields to connect to your database, such as address, user, and password.

Estuary uses CDC to capture changes from your Postgres database and write them to a versioned Flow collection.

Configuration example (YAML):

```yaml
captures:
  your-org/postgres-capture:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-postgres:dev
        config:
          address: your-db-host:5432
          user: your-db-user
          password: your-db-password
          database: your-db-name
    bindings:
      - resource:
          table: public.orders
        target: your-org/orders
```

Key Points:

  • You don’t need to pre-create Flow collections—publishing this capture will auto-generate them.
  • Flow supports field-level schema enforcement and handles reserved words automatically.
  • Logical replication must be enabled in your Postgres settings (see the example after this list).
  • You can check the docs for help with specific configurations, such as Google Cloud SQL for Postgres or Neon PostgreSQL.
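As a rough guide, enabling logical replication on a self-hosted instance looks like the sketch below. The setting is standard PostgreSQL, though managed services expose wal_level through parameter groups or flags instead, and the publication name here is only illustrative:

```sql
-- Logical replication must be on for CDC; changing wal_level requires a server restart.
ALTER SYSTEM SET wal_level = logical;

-- Publish the tables you want Flow to capture (here, just public.orders).
CREATE PUBLICATION flow_publication FOR TABLE public.orders;
```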

Step 2: Materialize Your Collection to ClickHouse via Dekaf

Dekaf-backed ClickHouse destination connector

Select the ClickHouse Dekaf Destination and link your Postgres collection(s).

Flow materializes data to ClickHouse using Dekaf, which emits Kafka-compatible topics that ClickPipes in ClickHouse can consume.

Configuration example (YAML):

```yaml
materializations:
  your-org/clickhouse-mat:
    endpoint:
      dekaf:
        config:
          token: your-auth-token
          strict_topic_names: false
          deletions: kafka
        variant: clickhouse
    bindings:
      - resource:
          topic_name: orders
        source: your-org/orders
```

Key Points:

  • Set a secure token; this will be used by ClickHouse to authenticate.
  • Use the clickhouse variant to help keep your Dekaf materializations organized.
  • Each Flow collection you want to sync must be bound to a corresponding Kafka topic.

Step 3: Connect ClickHouse ClickPipes to Flow

Now that your Kafka-compatible topics are live via Estuary’s Dekaf connector, it’s time to link them to ClickHouse using ClickPipes.

In your ClickHouse Cloud dashboard:

  1. Go to Integrations, and select Apache Kafka as your data source.
  2. When prompted for connection details:
    • Use dekaf.estuary-data.com:9092 as the broker address.
    • Set the schema registry URL to https://dekaf.estuary-data.com.
    • Choose SASL_SSL for the security protocol.
    • Set the SASL mechanism to PLAIN.
    • For both the SASL username and schema registry username, use the full name of your Estuary materialization (e.g., your-org/clickhouse-mat).
    • For the password, enter the same authentication token you configured in the Dekaf materialization.
  3. Once connected, ClickHouse will prompt you to map the incoming fields to your target table schema. Use the mapping interface to align Flow fields with ClickHouse columns (a sample target table is sketched after these steps).
  4. Save and activate the ClickPipe. Within seconds, data will begin streaming from PostgreSQL into ClickHouse in real time, without manual intervention.
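If you create the target table yourself rather than letting ClickPipes generate one, it might look something like the sketch below. The column names and types are assumptions based on the example orders collection and should be adjusted to match your own schema:

```sql
-- Hypothetical ClickHouse target table for the "orders" topic.
CREATE TABLE orders
(
    order_id     UInt64,
    customer_id  UInt64,
    region       String,
    status       String,
    total_amount Decimal(18, 2),
    created_at   DateTime64(3, 'UTC')
)
ENGINE = MergeTree
ORDER BY (created_at, order_id);
```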

Migrating Data into ClickHouse With Estuary

Data Type Mapping and Schema Evolution

When syncing PostgreSQL with ClickHouse, one important consideration is how data types and schema changes are handled.

ClickHouse and PostgreSQL have overlapping but not identical type systems. For example:

| PostgreSQL Type | ClickHouse Equivalent | Notes |
| --- | --- | --- |
| TEXT, VARCHAR | String | Strings map directly, with compression handled natively in ClickHouse. |
| NUMERIC, DECIMAL | Decimal(P, S) | Choose appropriate precision/scale for financial or high-accuracy workloads. |
| BOOLEAN | UInt8 (0/1) | Represented as integers in ClickHouse. |
| TIMESTAMP WITH TIME ZONE | DateTime64 | ClickHouse stores timezone-aware timestamps with sub-second precision. |
| JSONB | String or Nested | Typically ingested as strings; can be transformed into ClickHouse Nested structures if needed. |

Estuary Flow automatically enforces JSON schemas on every collection, which means:

  • Schema enforcement: Each record conforms to a validated schema before it ever reaches ClickHouse.
  • Graceful evolution: Adding new fields or changing types can be managed in Flow’s schema evolution workflows, reducing the risk of broken pipelines.
  • Compatibility checks: If an upstream schema change could cause incompatibility (e.g., changing a NUMERIC field to TEXT), Flow flags it early.

This schema-first approach ensures that your analytical workloads in ClickHouse stay consistent, even as your PostgreSQL schema evolves.
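If you do choose to project a newly added upstream field into ClickHouse, the target table needs a matching column. A minimal, hypothetical example (the column name is illustrative):

```sql
-- Add a column for a newly captured, optional field; existing rows read as NULL.
ALTER TABLE orders ADD COLUMN IF NOT EXISTS coupon_code Nullable(String);
```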

Key Features and Benefits

Estuary Flow isn’t just a faster way to move data—it’s a smarter, more resilient approach to real-time pipelines. By bridging PostgreSQL and ClickHouse through CDC and Kafka-compatible messaging, Flow offers a robust set of features that solve the most common pain points in analytics infrastructure.

Real-Time Change Data Capture

Flow captures inserts, updates, and deletes from PostgreSQL the moment they happen, with no polling and no periodic syncs. This enables you to power dashboards, anomaly detection, and alerts with always-fresh data.

ClickHouse-Native Streaming

With Dekaf connectors, Flow emits fully compatible Kafka messages that plug directly into ClickHouse ClickPipes. No extra services or Kafka brokers are needed—Flow handles the hard parts.

Schema Enforcement and Evolution

Flow collections are backed by JSON schemas, so you always know what your data looks like. When your upstream schema changes, Flow helps you manage evolution gracefully without breaking downstream pipelines.

Exactly-Once Delivery Semantics

Flow guarantees at-least-once delivery by default, and supports exactly-once semantics depending on your destination configuration. This ensures consistency in high-volume pipelines without the risk of duplication.

Delta Updates for Efficiency

Flow materializations can optionally use delta updates, which reduce write amplification by updating only changed fields. This is especially useful for high-churn tables.

Flexible Deployment Options

Run Flow as a fully managed SaaS, deploy in your own cloud (BYOC), or use a private deployment model to meet compliance and control needs.

Production-Ready Monitoring

Flow integrates with Prometheus via its OpenMetrics API, so you can track latency, throughput, error rates, and more—no guesswork required.

Best Practices for Bulk Load and CDC Performance

A common challenge when moving data from PostgreSQL to ClickHouse is handling both the initial load of historical data and the continuous stream of new changes. Estuary addresses this by combining a one-time backfill with ongoing CDC, but there are best practices you can follow to maximize performance:

  • Use snapshot + CDC together: Flow automatically takes an initial snapshot of your Postgres tables before switching to streaming CDC. This ensures your ClickHouse tables start with a complete dataset and then stay continuously updated.
  • Partition large tables: For very large or high-churn tables (like orders or events), partitioning in Postgres helps Flow capture changes more efficiently and reduces lock contention (see the sketch after this list).
  • Enable delta updates where possible: Instead of re-writing entire rows, Flow can propagate only the fields that changed. This reduces write amplification in ClickHouse and improves performance for high-frequency updates.
  • Monitor pipeline health: Flow integrates with Prometheus via its OpenMetrics API. Tracking metrics like end-to-end latency, throughput, and error rates helps you quickly spot bottlenecks and scale resources as needed.
  • Tune resource allocation: For mission-critical workloads, dedicate sufficient Postgres replication slots and configure ClickHouse ingestion settings (like batch size) to match your data velocity.
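Two illustrative Postgres-side snippets under the assumption of a high-churn events table and a logical replication slot created by the capture; table, column, and partition names are placeholders:

```sql
-- Declarative range partitioning for a high-churn events table (new table shown;
-- an existing non-partitioned table must be migrated into a structure like this).
CREATE TABLE public.events (
    id          bigint NOT NULL,
    user_id     bigint NOT NULL,
    event_type  text NOT NULL,
    payload     jsonb,
    occurred_at timestamptz NOT NULL
) PARTITION BY RANGE (occurred_at);

CREATE TABLE public.events_2025_01 PARTITION OF public.events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Keep an eye on replication slot lag so WAL does not pile up on the source.
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots;
```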

Following these practices ensures you get both a fast initial load and a low-latency CDC pipeline that can handle production-scale workloads without surprises.

Real-World Use Case: E-commerce Order Analytics

Imagine you're running an e-commerce platform where every transaction is recorded in a PostgreSQL database. Your operations team wants a real-time dashboard that shows order volume, revenue trends, top-selling products, and customer activity across regions—updated every few seconds.

Here’s how Estuary Flow makes that possible:

Source: PostgreSQL

New orders, updates to shipping status, and cancellations are continuously logged in a public.orders table. Instead of relying on nightly ETL jobs, you capture this data in real time using Flow’s Postgres connector.

Stream: Estuary Flow Collection

As changes occur, they’re streamed into a Flow collection with schema enforcement and versioning. You don’t have to manage storage, transformation, or failover—Flow handles it for you.

Destination: ClickHouse via ClickPipes

The collection is materialized into ClickHouse through Flow’s Kafka-compatible Dekaf connector. ClickHouse consumes these records using ClickPipes and inserts them into an analytics-optimized table.

Outcome: Real-Time Visibility

Now your BI dashboard is powered by ClickHouse’s ultra-fast queries, with data that’s seconds old, not hours. You can monitor conversions, detect stockouts, or adjust promotions dynamically—all without putting load on your transactional database.
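For illustration, a dashboard panel over the hypothetical orders table sketched earlier might run a query like this (column names are assumptions):

```sql
-- Order volume and revenue by region over the last 15 minutes.
SELECT
    region,
    count() AS orders_placed,
    sum(total_amount) AS revenue
FROM orders
WHERE created_at >= now() - INTERVAL 15 MINUTE
GROUP BY region
ORDER BY revenue DESC;
```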

This setup gives your team the analytical agility of ClickHouse with the trusted source-of-truth integrity of PostgreSQL—and it’s built entirely on streaming infrastructure.

Conclusion

Syncing PostgreSQL to ClickHouse no longer requires brittle batch pipelines, custom Kafka deployments, or hours of engineering work. With Estuary Flow, you get a fully-managed, streaming-first solution that brings transactional data into ClickHouse in real time, with exactly-once guarantees, built-in schema management, and seamless compatibility via ClickPipes.

Whether you're building operational dashboards, powering real-time analytics, or simply offloading queries from Postgres, Estuary Flow makes it easy to modernize your data stack.

Ready to stream from Postgres to ClickHouse in minutes? Try Estuary Flow and see what real-time really looks like.

FAQs

    What’s the best way to sync PostgreSQL to ClickHouse in real time?

    The most efficient way to sync PostgreSQL to ClickHouse in real time is by using Estuary Flow. It leverages change data capture (CDC) to detect row-level changes and streams them to ClickHouse via a Kafka-compatible interface, with minimal setup and no batch jobs.

    Can I connect Estuary Flow to ClickHouse without hosting my own Kafka cluster?

    Yes. Estuary Flow offers a built-in Kafka-compatible connector called Dekaf. It emits data from Flow collections as Kafka topics, which ClickHouse can consume using ClickPipes, without requiring you to host Kafka yourself.

    How does Estuary Flow handle schema changes in PostgreSQL?

    Estuary Flow uses JSON schema enforcement on its collections. When your source schema evolves, Flow can accommodate changes like new fields or modified types with minimal intervention, reducing the risk of broken pipelines.

    How does Estuary compare to batch ETL tools like Airbyte or Fivetran?

    Airbyte and Fivetran are widely used ETL tools, but they rely on batch-based pipelines. Data is extracted from PostgreSQL at scheduled intervals and then loaded into ClickHouse, which means there is always a delay before new changes appear in your analytics. This approach can work for traditional reporting but falls short when real-time visibility is required.

    Estuary takes a different approach by using change data capture (CDC) to stream every insert, update, and delete from PostgreSQL into ClickHouse as it happens. With its built-in Kafka-compatible Dekaf connector, Flow integrates directly with ClickHouse ClickPipes, eliminating the need for extra brokers or infrastructure. It also automatically handles schema evolution, reducing the operational burden of managing changes between systems. For teams that need dashboards and analytics updated within seconds rather than hours, Flow provides a more reliable and truly real-time solution compared to batch-based alternatives.
