
PostgreSQL to ClickHouse: Real-Time Streaming with CDC

Stream PostgreSQL data to ClickHouse with CDC for real-time dashboards and analytics. Step-by-step guide to building a fast, reliable pipeline.

Stream PostgreSQL to ClickHouse in Real-Time with Estuary

As businesses scale, so does the demand for faster, more actionable data. PostgreSQL is a trusted choice for transactional workloads, powering applications, websites, and core business services. But when it comes to running complex analytics on large volumes of data, PostgreSQL can hit performance ceilings fast.

This is where ClickHouse comes in. Designed for high-speed OLAP (Online Analytical Processing), ClickHouse can process billions of rows per second, making it perfect for real-time dashboards, anomaly detection, and operational analytics.

But how do you get your data from PostgreSQL to ClickHouse in real time?
That’s where many teams struggle. Traditional ETL pipelines are batch-based, lag-prone, and challenging to maintain. They lead to stale reports, delayed decisions, and fragile pipelines that break with every schema change.

This guide shows you how to stream data from PostgreSQL to ClickHouse using Estuary Flow, a real-time data operations platform built for streaming-first architectures. With support for change data capture (CDC), Estuary continuously syncs your PostgreSQL database to ClickHouse with minimal latency—no batch jobs, no custom code, and no operational complexity.

By the end of this guide, you’ll know exactly what’s required to set up a Postgres to ClickHouse pipeline, how Estuary handles data capture and streaming behind the scenes, and how to configure each step from source to destination.

Why Sync PostgreSQL with ClickHouse?

PostgreSQL is one of the most reliable and feature-rich databases for transactional workloads. It powers everything from customer-facing apps to internal business systems. But when it comes to running complex analytical queries, especially across large datasets, it starts to show limitations. Aggregations slow down, indexing becomes costly, and query performance degrades as data grows.

ClickHouse fills this gap. It’s an open-source, columnar OLAP database designed for lightning-fast analytics. With features like vectorized execution and efficient compression, ClickHouse can process billions of rows per second with minimal latency.

By syncing data from PostgreSQL to ClickHouse, teams can offload analytical workloads, build real-time dashboards, and unlock insights without impacting transactional performance. Whether you’re tracking events in a SaaS app, analyzing ecommerce behavior, or monitoring financial transactions, combining these two systems provides the best of both worlds: trusted source-of-truth data with high-speed analytics.

Challenges with Traditional Postgres to ClickHouse ETL

Moving data from PostgreSQL to ClickHouse is not a new idea, but doing it well is a different story.

Most traditional ETL (Extract, Transform, Load) approaches are built around batch processing. You schedule periodic jobs to dump data from Postgres, transform it, and load it into ClickHouse. This works—for a while. But as data volumes grow and the need for real-time insights becomes critical, these methods quickly fall short.

Here are the main issues:

  • Latency: Batch jobs introduce unavoidable lag. Your dashboards are always looking at data that’s minutes—or hours—old.
  • Operational overhead: Managing extraction scripts, transformation logic, retries, and schema mismatches across two different systems adds complexity and failure points.
  • Lack of change awareness: Most batch pipelines don’t track incremental changes effectively. You either reprocess everything or risk missing updates and deletes.
  • Scalability bottlenecks: High-frequency batch loads can overwhelm both the source and the destination, leading to contention and degraded performance.

To build a truly real-time, reliable, and scalable sync from PostgreSQL to ClickHouse, you need a different architecture—one that’s stream-based, change-aware, and built to evolve with your data.

Alternative Methods: ClickHouse PostgreSQL Engine

ClickHouse also provides a PostgreSQL table engine and foreign table connectors that allow you to query data directly from Postgres without building a separate pipeline. These methods can be useful for quick prototypes or low-volume workloads, but they come with limitations:

  • No real-time CDC: The Postgres engine reads snapshots of data rather than continuously streaming row-level changes. This means dashboards or reports can still lag behind the source of truth.
  • Scalability challenges: Direct queries place additional load on the PostgreSQL database, which can slow down transactional workloads as data volumes grow.
  • Limited schema handling: Schema evolution and type compatibility require manual effort, increasing operational complexity.
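For context, here is a minimal sketch of the engine-based approach (the host, database, and credentials are placeholders). ClickHouse forwards each query against this table to PostgreSQL on demand rather than storing the data itself:

```sql
-- Read-through table backed by ClickHouse's PostgreSQL table engine.
-- Every query against postgres_orders hits the live Postgres database.
CREATE TABLE postgres_orders
(
    order_id UInt64,
    status   String
)
ENGINE = PostgreSQL('your-db-host:5432', 'your-db-name', 'orders', 'your-db-user', 'your-db-password', 'public');
```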

For production scenarios where freshness, scale, and reliability matter, a streaming-first architecture like Estuary Flow is the more robust choice. It captures incremental changes in real time and ensures ClickHouse always has the latest data without straining your Postgres system.

Estuary Flow: Real-Time CDC + Kafka-Compatible ClickHouse Integration

Estuary Flow is a streaming-native platform designed to move data in real time, without the complexity of traditional ETL. At its core, Flow uses Change Data Capture (CDC) to detect and stream row-level changes from databases like PostgreSQL as they happen.

To send this data into ClickHouse, Flow uses a clever approach: it materializes data as Kafka-compatible messages via a component called Dekaf. This makes Flow a seamless bridge between Postgres and ClickHouse, leveraging ClickHouse’s built-in ClickPipes feature to consume data from Kafka topics.

Here’s how the architecture works:

  1. Capture from PostgreSQL
    Flow connects directly to your Postgres instance and captures inserts, updates, and deletes in real time using logical replication.
  2. Flow Collection
    These change events are stored in an internal, schema-enforced data lake called a collection, which acts as an intermediate layer for reliability and transformation.
  3. Materialize to ClickHouse via Dekaf
    The Dekaf connector emits your Flow collection data as Kafka messages. ClickHouse, using ClickPipes, consumes those messages and writes them to native tables for fast querying.
  4. End-to-End Streaming
    The entire pipeline—from Postgres to ClickHouse—is continuous, fault-tolerant, and exactly-once (depending on destination configuration).

Whether you’re analyzing user events, financial transactions, or IoT metrics, Estuary Flow offers a low-latency, fully managed pipeline that’s robust and easy to configure.

Streaming Postgres to ClickHouse with Estuary Flow

Before diving into configuration, here’s what you’ll need to set up a real-time data pipeline from PostgreSQL to ClickHouse using Estuary Flow.

Prerequisites

To complete this setup, you’ll need:

  • PostgreSQL database (self-hosted or cloud-managed: RDS, Aurora, Cloud SQL, Azure Database for PostgreSQL).
    • A database user with replication privileges in PostgreSQL (see the example after this list).
    • Network access from Estuary to your database (via public IP or SSH tunnel).
  • ClickHouse Cloud account with ClickPipes enabled.
  • Estuary Flow access via the web UI or CLI.
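As a hedged sketch, creating a dedicated capture user in PostgreSQL might look like the following. The user name, password, and schema are illustrative; managed services and Estuary's source-postgres documentation may call for additional grants or setup objects:

```sql
-- Illustrative only: a dedicated user for Estuary's CDC capture.
CREATE USER flow_capture WITH REPLICATION PASSWORD 'choose-a-strong-password';

-- Allow the capture user to read the tables you plan to sync.
GRANT USAGE ON SCHEMA public TO flow_capture;
GRANT SELECT ON ALL TABLES IN SCHEMA public TO flow_capture;
```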

Step 1: Create a Flow Collection from PostgreSQL

A range of Postgres source options in Estuary, both self-hosted and cloud-hosted

Select a PostgreSQL Source and fill out the required fields to connect to your database, such as address, user, and password.

Estuary uses CDC to capture changes from your Postgres database and write them to a versioned Flow collection.

Configuration example (YAML):

```yaml
captures:
  your-org/postgres-capture:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-postgres:dev
        config:
          address: your-db-host:5432
          user: your-db-user
          password: your-db-password
          database: your-db-name
    bindings:
      - resource:
          table: public.orders
        target: your-org/orders
```

Key Points:

  • You don’t need to pre-create Flow collections—publishing this capture will auto-generate them.
  • Flow supports field-level schema enforcement and handles reserved words automatically.
  • Logical replication must be enabled in your Postgres settings (see the example after this list).
  • You can check the docs for help with specific configurations, such as Google Cloud SQL for Postgres or Neon PostgreSQL.
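As a rough guide, enabling logical replication on a self-hosted instance looks like the sketch below. The setting is standard PostgreSQL, though managed services expose wal_level through parameter groups or flags instead, and the publication name here is only illustrative:

```sql
-- Logical replication must be on for CDC; changing wal_level requires a server restart.
ALTER SYSTEM SET wal_level = logical;

-- Publish the tables you want Flow to capture (here, just public.orders).
CREATE PUBLICATION flow_publication FOR TABLE public.orders;
```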

Step 2: Materialize Your Collection to ClickHouse via Dekaf

Dekaf-backed ClickHouse destination connector

Select the ClickHouse Dekaf Destination and link your Postgres collection(s).

Flow materializes data to ClickHouse using Dekaf, which emits Kafka-compatible topics that ClickPipes in ClickHouse can consume.

Configuration example (YAML):

```yaml
materializations:
  your-org/clickhouse-mat:
    endpoint:
      dekaf:
        config:
          token: your-auth-token
          strict_topic_names: false
          deletions: kafka
        variant: clickhouse
    bindings:
      - resource:
          topic_name: orders
        source: your-org/orders
```

Key Points:

  • Set a secure token; this will be used by ClickHouse to authenticate.
  • Use the clickhouse variant to help keep your Dekaf materializations organized.
  • Each Flow collection you want to sync must be bound to a corresponding Kafka topic.

Step 3: Connect ClickHouse ClickPipes to Flow

Now that your Kafka-compatible topics are live via Estuary’s Dekaf connector, it’s time to link them to ClickHouse using ClickPipes.

In your ClickHouse Cloud dashboard:

  1. Go to Integrations, and select Apache Kafka as your data source.
  2. When prompted for connection details:
    • Use dekaf.estuary-data.com:9092 as the broker address.
    • Set the schema registry URL to https://dekaf.estuary-data.com.
    • Choose SASL_SSL for the security protocol.
    • Set the SASL mechanism to PLAIN.
    • For both the SASL username and schema registry username, use the full name of your Estuary materialization (e.g., your-org/clickhouse-mat).
    • For the password, enter the same authentication token you configured in the Dekaf materialization.
  3. Once connected, ClickHouse will prompt you to map the incoming fields to your target table schema. Use the mapping interface to align Flow fields with ClickHouse columns (a sample target table is sketched after these steps).
  4. Save and activate the ClickPipe. Within seconds, data will begin streaming from PostgreSQL into ClickHouse in real time, without manual intervention.
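If you create the target table yourself rather than letting ClickPipes generate one, it might look something like the sketch below. The column names and types are assumptions based on the example orders collection and should be adjusted to match your own schema:

```sql
-- Hypothetical ClickHouse target table for the "orders" topic.
CREATE TABLE orders
(
    order_id     UInt64,
    customer_id  UInt64,
    region       String,
    status       String,
    total_amount Decimal(18, 2),
    created_at   DateTime64(3, 'UTC')
)
ENGINE = MergeTree
ORDER BY (created_at, order_id);
```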

Migrating Data into ClickHouse With Estuary

Data Type Mapping and Schema Evolution

When syncing PostgreSQL with ClickHouse, one important consideration is how data types and schema changes are handled.

ClickHouse and PostgreSQL have overlapping but not identical type systems. For example:

| PostgreSQL Type | ClickHouse Equivalent | Notes |
| --- | --- | --- |
| TEXT, VARCHAR | String | Strings map directly, with compression handled natively in ClickHouse. |
| NUMERIC, DECIMAL | Decimal(P, S) | Choose appropriate precision/scale for financial or high-accuracy workloads. |
| BOOLEAN | UInt8 (0/1) | Represented as integers in ClickHouse. |
| TIMESTAMP WITH TIME ZONE | DateTime64 | ClickHouse stores timezone-aware timestamps with sub-second precision. |
| JSONB | String or Nested | Typically ingested as strings; can be transformed into ClickHouse Nested structures if needed. |

Estuary Flow automatically enforces JSON schemas on every collection, which means:

  • Schema enforcement: Each record conforms to a validated schema before it ever reaches ClickHouse.
  • Graceful evolution: Adding new fields or changing types can be managed in Flow’s schema evolution workflows, reducing the risk of broken pipelines.
  • Compatibility checks: If an upstream schema change could cause incompatibility (e.g., changing a NUMERIC field to TEXT), Flow flags it early.

This schema-first approach ensures that your analytical workloads in ClickHouse stay consistent, even as your PostgreSQL schema evolves.
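If you do choose to project a newly added upstream field into ClickHouse, the target table needs a matching column. A minimal, hypothetical example (the column name is illustrative):

```sql
-- Add a column for a newly captured, optional field; existing rows read as NULL.
ALTER TABLE orders ADD COLUMN IF NOT EXISTS coupon_code Nullable(String);
```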

Key Features and Benefits

Estuary Flow isn’t just a faster way to move data—it’s a smarter, more resilient approach to real-time pipelines. By bridging PostgreSQL and ClickHouse through CDC and Kafka-compatible messaging, Flow offers a robust set of features that solve the most common pain points in analytics infrastructure.

Real-Time Change Data Capture

Flow captures inserts, updates, and deletes from PostgreSQL the moment they happen, with no polling and no periodic syncs. This enables you to power dashboards, anomaly detection, and alerts with always-fresh data.

ClickHouse-Native Streaming

With Dekaf connectors, Flow emits fully compatible Kafka messages that plug directly into ClickHouse ClickPipes. No extra services or Kafka brokers are needed—Flow handles the hard parts.

Schema Enforcement and Evolution

Flow collections are backed by JSON schemas, so you always know what your data looks like. When your upstream schema changes, Flow helps you manage evolution gracefully without breaking downstream pipelines.

Exactly-Once Delivery Semantics

Flow guarantees at-least-once delivery by default, and supports exactly-once semantics depending on your destination configuration. This ensures consistency in high-volume pipelines without the risk of duplication.

Delta Updates for Efficiency

Flow materializations can optionally use delta updates, which reduce write amplification by updating only changed fields. This is especially useful for high-churn tables.

Flexible Deployment Options

Run Flow as a fully managed SaaS, deploy in your own cloud (BYOC), or use a private deployment model to meet compliance and control needs.

Production-Ready Monitoring

Flow integrates with Prometheus via its OpenMetrics API, so you can track latency, throughput, error rates, and more—no guesswork required.

Best Practices for Bulk Load and CDC Performance

A common challenge when moving data from PostgreSQL to ClickHouse is handling both the initial load of historical data and the continuous stream of new changes. Estuary addresses this by combining a one-time backfill with ongoing CDC, but there are best practices you can follow to maximize performance:

  • Use snapshot + CDC together: Flow automatically takes an initial snapshot of your Postgres tables before switching to streaming CDC. This ensures your ClickHouse tables start with a complete dataset and then stay continuously updated.
  • Partition large tables: For very large or high-churn tables (like orders or events), partitioning in Postgres helps Flow capture changes more efficiently and reduces lock contention (see the sketch after this list).
  • Enable delta updates where possible: Instead of re-writing entire rows, Flow can propagate only the fields that changed. This reduces write amplification in ClickHouse and improves performance for high-frequency updates.
  • Monitor pipeline health: Flow integrates with Prometheus via its OpenMetrics API. Tracking metrics like end-to-end latency, throughput, and error rates helps you quickly spot bottlenecks and scale resources as needed.
  • Tune resource allocation: For mission-critical workloads, dedicate sufficient Postgres replication slots and configure ClickHouse ingestion settings (like batch size) to match your data velocity.
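Two illustrative Postgres-side snippets under the assumption of a high-churn events table and a logical replication slot created by the capture; table, column, and partition names are placeholders:

```sql
-- Declarative range partitioning for a high-churn events table (new table shown;
-- an existing non-partitioned table must be migrated into a structure like this).
CREATE TABLE public.events (
    id          bigint NOT NULL,
    user_id     bigint NOT NULL,
    event_type  text NOT NULL,
    payload     jsonb,
    occurred_at timestamptz NOT NULL
) PARTITION BY RANGE (occurred_at);

CREATE TABLE public.events_2025_01 PARTITION OF public.events
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Keep an eye on replication slot lag so WAL does not pile up on the source.
SELECT slot_name,
       active,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), confirmed_flush_lsn)) AS lag
FROM pg_replication_slots;
```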

Following these practices ensures you get both a fast initial load and a low-latency CDC pipeline that can handle production-scale workloads without surprises.

Real-World Use Case: E-commerce Order Analytics

Imagine you're running an e-commerce platform where every transaction is recorded in a PostgreSQL database. Your operations team wants a real-time dashboard that shows order volume, revenue trends, top-selling products, and customer activity across regions—updated every few seconds.

Here’s how Estuary Flow makes that possible:

Source: PostgreSQL

New orders, updates to shipping status, and cancellations are continuously logged in a public.orders table. Instead of relying on nightly ETL jobs, you capture this data in real time using Flow’s Postgres connector.

Stream: Estuary Flow Collection

As changes occur, they’re streamed into a Flow collection with schema enforcement and versioning. You don’t have to manage storage, transformation, or failover—Flow handles it for you.

Destination: ClickHouse via ClickPipes

The collection is materialized into ClickHouse through Flow’s Kafka-compatible Dekaf connector. ClickHouse consumes these records using ClickPipes and inserts them into an analytics-optimized table.

Outcome: Real-Time Visibility

Now your BI dashboard is powered by ClickHouse’s ultra-fast queries, with data that’s seconds old, not hours. You can monitor conversions, detect stockouts, or adjust promotions dynamically—all without putting load on your transactional database.
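For illustration, a dashboard panel over the hypothetical orders table sketched earlier might run a query like this (column names are assumptions):

```sql
-- Order volume and revenue by region over the last 15 minutes.
SELECT
    region,
    count() AS orders_placed,
    sum(total_amount) AS revenue
FROM orders
WHERE created_at >= now() - INTERVAL 15 MINUTE
GROUP BY region
ORDER BY revenue DESC;
```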

This setup gives your team the analytical agility of ClickHouse with the trusted source-of-truth integrity of PostgreSQL—and it’s built entirely on streaming infrastructure.

Conclusion

Syncing PostgreSQL to ClickHouse no longer requires brittle batch pipelines, custom Kafka deployments, or hours of engineering work. With Estuary Flow, you get a fully-managed, streaming-first solution that brings transactional data into ClickHouse in real time, with exactly-once guarantees, built-in schema management, and seamless compatibility via ClickPipes.

Whether you're building operational dashboards, powering real-time analytics, or simply offloading queries from Postgres, Estuary Flow makes it easy to modernize your data stack.

Ready to stream from Postgres to ClickHouse in minutes? Try Estuary Flow and see what real-time really looks like.

FAQs

    What’s the best way to sync PostgreSQL to ClickHouse in real time?

    The most efficient way to sync PostgreSQL to ClickHouse in real time is by using Estuary Flow. It leverages change data capture (CDC) to detect row-level changes and streams them to ClickHouse via a Kafka-compatible interface, with minimal setup and no batch jobs.

    Can I connect Estuary Flow to ClickHouse without hosting my own Kafka cluster?

    Yes. Estuary Flow offers a built-in Kafka-compatible connector called Dekaf. It emits data from Flow collections as Kafka topics, which ClickHouse can consume using ClickPipes, without requiring you to host Kafka yourself.

    How does Estuary Flow handle schema changes in PostgreSQL?

    Estuary Flow uses JSON schema enforcement on its collections. When your source schema evolves, Flow can accommodate changes like new fields or modified types with minimal intervention, reducing the risk of broken pipelines.

    How does Estuary compare to batch ETL tools like Airbyte or Fivetran?

    Airbyte and Fivetran are widely used ETL tools, but they rely on batch-based pipelines. Data is extracted from PostgreSQL at scheduled intervals and then loaded into ClickHouse, which means there is always a delay before new changes appear in your analytics. This approach can work for traditional reporting but falls short when real-time visibility is required.

    Estuary takes a different approach by using change data capture (CDC) to stream every insert, update, and delete from PostgreSQL into ClickHouse as it happens. With its built-in Kafka-compatible Dekaf connector, Flow integrates directly with ClickHouse ClickPipes, eliminating the need for extra brokers or infrastructure. It also automatically handles schema evolution, reducing the operational burden of managing changes between systems. For teams that need dashboards and analytics updated within seconds rather than hours, Flow provides a more reliable and truly real-time solution compared to batch-based alternatives.
