
As businesses scale, so does the demand for faster, more actionable data. PostgreSQL is a trusted choice for transactional workloads, powering applications, websites, and core business services. But when it comes to running complex analytics on large volumes of data, PostgreSQL can hit performance ceilings fast.
This is where ClickHouse comes in. Designed for high-speed OLAP (Online Analytical Processing), ClickHouse can process billions of rows per second, making it perfect for real-time dashboards, anomaly detection, and operational analytics.
But how do you get your data from PostgreSQL to ClickHouse in real time?
That’s where many teams struggle. Traditional ETL pipelines are batch-based, lag-prone, and challenging to maintain. They lead to stale reports, delayed decisions, and fragile pipelines that break with every schema change.
This guide shows you how to stream data from PostgreSQL to ClickHouse using Estuary Flow, a real-time data operations platform built for streaming-first architectures. With support for change data capture (CDC), Estuary continuously syncs your PostgreSQL database to ClickHouse with minimal latency—no batch jobs, no custom code, and no operational complexity.
By the end of this guide, you’ll know exactly what’s required to set up a Postgres to ClickHouse pipeline, how Estuary handles data capture and streaming behind the scenes, and how to configure each step from source to destination.
Why Sync PostgreSQL with ClickHouse?
PostgreSQL is one of the most reliable and feature-rich databases for transactional workloads. It powers everything from customer-facing apps to internal business systems. But when it comes to running complex analytical queries, especially across large datasets, it starts to show limitations. Aggregations slow down, indexing becomes costly, and query performance degrades as data grows.
ClickHouse fills this gap. It’s an open-source, columnar OLAP database designed for lightning-fast analytics. With features like vectorized execution and efficient compression, ClickHouse can process billions of rows per second with minimal latency.
By syncing data from PostgreSQL to ClickHouse, teams can offload analytical workloads, build real-time dashboards, and unlock insights without impacting transactional performance. Whether you’re tracking events in a SaaS app, analyzing ecommerce behavior, or monitoring financial transactions, combining these two systems provides the best of both worlds: trusted source-of-truth data with high-speed analytics.
Challenges with Traditional Postgres to ClickHouse ETL
Moving data from PostgreSQL to ClickHouse is not a new idea, but doing it well is a different story.
Most traditional ETL (Extract, Transform, Load) approaches are built around batch processing. You schedule periodic jobs to dump data from Postgres, transform it, and load it into ClickHouse. This works—for a while. But as data volumes grow and the need for real-time insights becomes critical, these methods quickly fall short.
Here are the main issues:
- Latency: Batch jobs introduce unavoidable lag. Your dashboards are always looking at data that’s minutes—or hours—old.
- Operational overhead: Managing extraction scripts, transformation logic, retries, and schema mismatches across two different systems adds complexity and failure points.
- Lack of change awareness: Most batch pipelines don’t track incremental changes effectively. You either reprocess everything or risk missing updates and deletes.
- Scalability bottlenecks: High-frequency batch loads can overwhelm both the source and the destination, leading to contention and degraded performance.
To build a truly real-time, reliable, and scalable sync from PostgreSQL to ClickHouse, you need a different architecture—one that’s stream-based, change-aware, and built to evolve with your data.
Alternative Methods: ClickHouse PostgreSQL Engine
ClickHouse also provides a PostgreSQL table engine and foreign table connectors that allow you to query data directly from Postgres without building a separate pipeline. These methods can be useful for quick prototypes or low-volume workloads, but they come with limitations:
- No real-time CDC: The Postgres engine reads snapshots of data rather than continuously streaming row-level changes. This means dashboards or reports can still lag behind the source of truth.
- Scalability challenges: Direct queries place additional load on the PostgreSQL database, which can slow down transactional workloads as data volumes grow.
- Limited schema handling: Schema evolution and type compatibility require manual effort, increasing operational complexity.
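To make the trade-off concrete, here is a minimal sketch of the direct-query approach using ClickHouse's postgresql() table function from Python via the clickhouse-connect client. The hosts, credentials, and the public.orders table are placeholders, and every query re-reads from Postgres rather than streaming changes:

```python
import clickhouse_connect

# Connect to ClickHouse Cloud (placeholder host and credentials).
client = clickhouse_connect.get_client(
    host="your-instance.clickhouse.cloud",
    port=8443,
    username="default",
    password="your-clickhouse-password",
    secure=True,
)

# Every execution of this query reaches back into Postgres over the wire:
# convenient for prototypes, but it adds load to the transactional database
# and returns a point-in-time snapshot rather than a continuous stream.
result = client.query(
    """
    SELECT count() AS order_count
    FROM postgresql('your-db-host:5432', 'your-db-name', 'orders',
                    'your-db-user', 'your-db-password', 'public')
    """
)
print(result.result_rows)
```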
For production scenarios where freshness, scale, and reliability matter, a streaming-first architecture like Estuary Flow is the more robust choice. It captures incremental changes in real time and ensures ClickHouse always has the latest data without straining your Postgres system.
Estuary Flow: Real-Time CDC + Kafka-Compatible ClickHouse Integration
Estuary Flow is a streaming-native platform designed to move data in real time, without the complexity of traditional ETL. At its core, Flow uses Change Data Capture (CDC) to detect and stream row-level changes from databases like PostgreSQL as they happen.
To send this data into ClickHouse, Flow uses a clever approach: it materializes data as Kafka-compatible messages via a component called Dekaf. This makes Flow a seamless bridge between Postgres and ClickHouse, leveraging ClickHouse’s built-in ClickPipes feature to consume data from Kafka topics.
Here’s how the architecture works:
- Capture from PostgreSQL: Flow connects directly to your Postgres instance and captures inserts, updates, and deletes in real time using logical replication.
- Flow Collection: These change events are stored in an internal, schema-enforced data lake called a collection, which acts as an intermediate layer for reliability and transformation.
- Materialize to ClickHouse via Dekaf: The Dekaf connector emits your Flow collection data as Kafka messages. ClickHouse, using ClickPipes, consumes those messages and writes them to native tables for fast querying.
- End-to-End Streaming: The entire pipeline, from Postgres to ClickHouse, is continuous, fault-tolerant, and exactly-once (depending on destination configuration).
Whether you’re analyzing user events, financial transactions, or IoT metrics, Estuary Flow offers a low-latency, fully managed pipeline that’s robust and easy to configure.
Streaming Postgres to ClickHouse with Estuary Flow
Before diving into configuration, here’s what you’ll need to set up a real-time data pipeline from PostgreSQL to ClickHouse using Estuary Flow.
Prerequisites
To complete this setup, you’ll need:
- A PostgreSQL database (self-hosted or cloud-managed: RDS, Aurora, Cloud SQL, Azure).
- A database user with replication privileges in PostgreSQL.
- Network access from Estuary to your database (via public IP or SSH tunnel).
- A ClickHouse Cloud account with ClickPipes enabled.
- Estuary Flow access via the web UI or CLI.
Step 1: Create a Flow Collection from PostgreSQL
Select a PostgreSQL Source and fill out the required fields to connect to your database, such as address, user, and password.
Estuary uses CDC to capture changes from your Postgres database and write them to a versioned Flow collection.
Configuration example (YAML):
```yaml
captures:
  your-org/postgres-capture:
    endpoint:
      connector:
        image: ghcr.io/estuary/source-postgres:dev
        config:
          address: your-db-host:5432
          user: your-db-user
          password: your-db-password
          database: your-db-name
    bindings:
      - resource:
          table: public.orders
        target: your-org/orders
```
Key Points:
- You don’t need to pre-create Flow collections—publishing this capture will auto-generate them.
- Flow supports field-level schema enforcement and handles reserved words automatically.
- Logical replication must be enabled in your Postgres settings (a quick verification sketch follows these key points).
- You can check the docs for help with specific configurations, such as Google Cloud SQL for Postgres or Neon PostgreSQL.
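Before publishing the capture, it can help to confirm the database is ready for CDC. The snippet below is a minimal pre-flight check using psycopg2 (an assumption; any Postgres client works): it verifies that wal_level is set to logical and lists existing replication slots. The connection details are placeholders matching the config above:

```python
import psycopg2

# Placeholder connection details; match them to the capture config above.
conn = psycopg2.connect(
    host="your-db-host",
    port=5432,
    dbname="your-db-name",
    user="your-db-user",
    password="your-db-password",
)

with conn, conn.cursor() as cur:
    # CDC requires logical decoding, i.e. wal_level = 'logical'.
    cur.execute("SHOW wal_level;")
    print("wal_level:", cur.fetchone()[0])

    # The capture will need a replication slot of its own, so make sure
    # max_replication_slots leaves headroom beyond the slots listed here.
    cur.execute("SELECT slot_name, plugin, active FROM pg_replication_slots;")
    for slot_name, plugin, active in cur.fetchall():
        print(f"slot={slot_name} plugin={plugin} active={active}")

conn.close()
```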
Step 2: Materialize Your Collection to ClickHouse via Dekaf
Select the ClickHouse Dekaf Destination and link your Postgres collection(s).
Flow materializes data to ClickHouse using Dekaf, which emits Kafka-compatible topics that ClickPipes in ClickHouse can consume.
Configuration example (YAML):
```yaml
materializations:
  your-org/clickhouse-mat:
    endpoint:
      dekaf:
        config:
          token: your-auth-token
          strict_topic_names: false
          deletions: kafka
        variant: clickhouse
    bindings:
      - resource:
          topic_name: orders
        source: your-org/orders
```
Key Points:
- Set a secure `token`; this will be used by ClickHouse to authenticate.
- Use the `clickhouse` variant to help keep your Dekaf materializations organized.
- Each Flow collection you want to sync must be bound to a corresponding Kafka topic.
Step 3: Connect ClickHouse ClickPipes to Flow
Now that your Kafka-compatible topics are live via Estuary’s Dekaf connector, it’s time to link them to ClickHouse using ClickPipes.
In your ClickHouse Cloud dashboard:
- Go to Integrations, and select Apache Kafka as your data source.
- When prompted for connection details:
  - Use `dekaf.estuary-data.com:9092` as the broker address.
  - Set the schema registry URL to `https://dekaf.estuary-data.com`.
  - Choose SASL_SSL for the security protocol.
  - Set the SASL mechanism to `PLAIN`.
  - For both the SASL username and schema registry username, use the full name of your Estuary materialization (e.g., `your-org/clickhouse-mat`).
  - For the password, enter the same authentication token you configured in the Dekaf materialization.
- Once connected, ClickHouse will prompt you to map the incoming fields to your target table schema. Use the mapping interface to align Flow fields with ClickHouse columns.
- Save and activate the ClickPipe. Within seconds, data will begin streaming from PostgreSQL into ClickHouse in real time, without manual intervention.
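Once the ClickPipe is active, you can sanity-check the flow of data with a quick query against the target table. This sketch uses the clickhouse-connect Python client; the host, credentials, and the orders table name are assumptions based on the earlier examples:

```python
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="your-instance.clickhouse.cloud",
    port=8443,
    username="default",
    password="your-clickhouse-password",
    secure=True,
)

# The count should climb as new Postgres changes stream through the pipeline.
total = client.query("SELECT count() FROM orders").result_rows[0][0]
print(f"rows in orders: {total}")
```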
Data Type Mapping and Schema Evolution
When syncing PostgreSQL with ClickHouse, one important consideration is how data types and schema changes are handled.
ClickHouse and PostgreSQL have overlapping but not identical type systems. For example:
| PostgreSQL Type | ClickHouse Equivalent | Notes |
|---|---|---|
| TEXT, VARCHAR | String | Strings map directly, with compression handled natively in ClickHouse. |
| NUMERIC, DECIMAL | Decimal(P, S) | Choose appropriate precision/scale for financial or high-accuracy workloads. |
| BOOLEAN | UInt8 (0/1) | Represented as integers in ClickHouse. |
| TIMESTAMP WITH TIME ZONE | DateTime64 | ClickHouse stores timezone-aware timestamps with sub-second precision. |
| JSONB | String or Nested | Typically ingested as strings; can be transformed into ClickHouse Nested structures if needed. |
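To see these mappings in context, here is a sketch of what an analytics-ready orders table in ClickHouse might look like, created through the clickhouse-connect Python client. The column names and MergeTree sort key are illustrative assumptions; in practice the ClickPipes setup flow handles table creation and field mapping, as described in Step 3 above.

```python
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="your-instance.clickhouse.cloud",
    port=8443,
    username="default",
    password="your-clickhouse-password",
    secure=True,
)

# Each column mirrors a row of the mapping table above:
# TEXT -> String, NUMERIC -> Decimal, BOOLEAN -> UInt8,
# TIMESTAMPTZ -> DateTime64, JSONB -> String.
client.command(
    """
    CREATE TABLE IF NOT EXISTS orders (
        order_id    Int64,
        customer    String,
        total       Decimal(18, 2),
        is_paid     UInt8,
        ordered_at  DateTime64(3, 'UTC'),
        line_items  String
    )
    ENGINE = MergeTree
    ORDER BY (ordered_at, order_id)
    """
)
```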
Estuary Flow automatically enforces JSON schemas on every collection, which means:
- Schema enforcement: Each record conforms to a validated schema before it ever reaches ClickHouse.
- Graceful evolution: Adding new fields or changing types can be managed in Flow’s schema evolution workflows, reducing the risk of broken pipelines.
- Compatibility checks: If an upstream schema change could cause incompatibility (e.g., changing a NUMERIC field to TEXT), Flow flags it early.
This schema-first approach ensures that your analytical workloads in ClickHouse stay consistent, even as your PostgreSQL schema evolves.
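To make the schema-first idea concrete, here is a small sketch of validating a change record against a JSON schema with Python's jsonschema package. It mirrors the concept rather than Flow's internal machinery, and the fields are invented for the example:

```python
from jsonschema import ValidationError, validate

# A simplified schema in the spirit of a Flow collection schema (illustrative).
ORDER_SCHEMA = {
    "type": "object",
    "required": ["order_id", "total", "ordered_at"],
    "properties": {
        "order_id": {"type": "integer"},
        "total": {"type": "number"},
        "is_paid": {"type": "boolean"},
        "ordered_at": {"type": "string", "format": "date-time"},
    },
}

record = {
    "order_id": 42,
    "total": 19.99,
    "is_paid": True,
    "ordered_at": "2024-01-01T12:00:00Z",
}

try:
    # Non-conforming records would be caught here, before reaching ClickHouse.
    validate(instance=record, schema=ORDER_SCHEMA)
    print("record conforms to schema")
except ValidationError as err:
    print("schema violation:", err.message)
```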
Key Features and Benefits
Estuary Flow isn’t just a faster way to move data—it’s a smarter, more resilient approach to real-time pipelines. By bridging PostgreSQL and ClickHouse through CDC and Kafka-compatible messaging, Flow offers a robust set of features that solve the most common pain points in analytics infrastructure.
Real-Time Change Data Capture
Flow captures inserts, updates, and deletes from PostgreSQL the moment they happen, with no polling and no periodic syncs. This enables you to power dashboards, anomaly detection, and alerts with always-fresh data.
ClickHouse-Native Streaming
With Dekaf connectors, Flow emits fully compatible Kafka messages that plug directly into ClickHouse ClickPipes. No extra services or Kafka brokers are needed—Flow handles the hard parts.
Schema Enforcement and Evolution
Flow collections are backed by JSON schemas, so you always know what your data looks like. When your upstream schema changes, Flow helps you manage evolution gracefully without breaking downstream pipelines.
Exactly-Once Delivery Semantics
Flow guarantees at-least-once delivery by default, and supports exactly-once semantics depending on your destination configuration. This ensures consistency in high-volume pipelines without the risk of duplication.
Delta Updates for Efficiency
Materializations can optionally use delta updates, which reduce write amplification by updating only changed fields. This is especially useful for high-churn tables.
Flexible Deployment Options
Run Flow as a fully managed SaaS, deploy in your own cloud (BYOC), or use a private deployment model to meet compliance and control needs.
Production-Ready Monitoring
Flow integrates with Prometheus via its OpenMetrics API, so you can track latency, throughput, error rates, and more—no guesswork required.
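As a rough sketch of what consuming those metrics can look like, the snippet below scrapes an OpenMetrics endpoint with Python and prints each metric sample. The endpoint URL and token are placeholders; consult Estuary's documentation for the actual metrics address and authentication for your tenant.

```python
import requests
from prometheus_client.parser import text_string_to_metric_families

# Placeholder endpoint and token; substitute the values for your tenant.
METRICS_URL = "https://metrics.example.invalid/your-org/metrics"
resp = requests.get(
    METRICS_URL,
    headers={"Authorization": "Bearer your-estuary-token"},
    timeout=10,
)
resp.raise_for_status()

# Print every sample, e.g. throughput and error counters per task.
for family in text_string_to_metric_families(resp.text):
    for sample in family.samples:
        print(sample.name, sample.labels, sample.value)
```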
Best Practices for Bulk Load and CDC Performance
A common challenge when moving data from PostgreSQL to ClickHouse is handling both the initial load of historical data and the continuous stream of new changes. Estuary addresses this by combining a one-time backfill with ongoing CDC, but there are best practices you can follow to maximize performance:
- Use snapshot + CDC together: Flow automatically takes an initial snapshot of your Postgres tables before switching to streaming CDC. This ensures your ClickHouse tables start with a complete dataset and then stay continuously updated.
- Partition large tables: For very large or high-churn tables (like orders or events), partitioning in Postgres helps Flow capture changes more efficiently and reduces lock contention (a short sketch follows below).
- Enable delta updates where possible: Instead of re-writing entire rows, Flow can propagate only the fields that changed. This reduces write amplification in ClickHouse and improves performance for high-frequency updates.
- Monitor pipeline health: Flow integrates with Prometheus via its OpenMetrics API. Tracking metrics like end-to-end latency, throughput, and error rates helps you quickly spot bottlenecks and scale resources as needed.
- Tune resource allocation: For mission-critical workloads, dedicate sufficient Postgres replication slots and configure ClickHouse ingestion settings (like batch size) to match your data velocity.
Following these practices ensures you get both a fast initial load and a low-latency CDC pipeline that can handle production-scale workloads without surprises.
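As a reference for the partitioning tip above, here is a minimal sketch of declarative range partitioning in Postgres, applied from Python with psycopg2. The events table, its columns, and the monthly boundaries are illustrative assumptions:

```python
import psycopg2

conn = psycopg2.connect(
    host="your-db-host",
    port=5432,
    dbname="your-db-name",
    user="your-db-user",
    password="your-db-password",
)

with conn, conn.cursor() as cur:
    # Parent table partitioned by event time: high-churn writes land in
    # smaller child tables, which keeps vacuum and index bloat manageable.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id         bigint NOT NULL,
            created_at timestamptz NOT NULL,
            payload    jsonb
        ) PARTITION BY RANGE (created_at)
    """)
    # One child partition per month; create future partitions ahead of time.
    cur.execute("""
        CREATE TABLE IF NOT EXISTS events_2024_01 PARTITION OF events
            FOR VALUES FROM ('2024-01-01') TO ('2024-02-01')
    """)

conn.close()
```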
Real-World Use Case: E-commerce Order Analytics
Imagine you're running an e-commerce platform where every transaction is recorded in a PostgreSQL database. Your operations team wants a real-time dashboard that shows order volume, revenue trends, top-selling products, and customer activity across regions—updated every few seconds.
Here’s how Estuary Flow makes that possible:
Source: PostgreSQL
New orders, updates to shipping status, and cancellations are continuously logged in a `public.orders` table. Instead of relying on nightly ETL jobs, you capture this data in real time using Flow's Postgres connector.
Stream: Estuary Flow Collection
As changes occur, they’re streamed into a Flow collection with schema enforcement and versioning. You don’t have to manage storage, transformation, or failover—Flow handles it for you.
Destination: ClickHouse via ClickPipes
The collection is materialized into ClickHouse through Flow’s Kafka-compatible Dekaf connector. ClickHouse consumes these records using ClickPipes and inserts them into an analytics-optimized table.
Outcome: Real-Time Visibility
Now your BI dashboard is powered by ClickHouse’s ultra-fast queries, with data that’s seconds old, not hours. You can monitor conversions, detect stockouts, or adjust promotions dynamically—all without putting load on your transactional database.
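For instance, a dashboard panel showing the last hour of revenue by customer might run a query like this through the clickhouse-connect Python client (a sketch; the orders table and its columns are assumptions carried over from the earlier examples):

```python
import clickhouse_connect

client = clickhouse_connect.get_client(
    host="your-instance.clickhouse.cloud",
    port=8443,
    username="default",
    password="your-clickhouse-password",
    secure=True,
)

# Aggregate the freshest data; rows arrive seconds after the Postgres write.
result = client.query(
    """
    SELECT customer, count() AS order_count, sum(total) AS revenue
    FROM orders
    WHERE ordered_at >= now() - INTERVAL 1 HOUR
    GROUP BY customer
    ORDER BY revenue DESC
    LIMIT 10
    """
)
for customer, order_count, revenue in result.result_rows:
    print(customer, order_count, revenue)
```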
This setup gives your team the analytical agility of ClickHouse with the trusted source-of-truth integrity of PostgreSQL—and it’s built entirely on streaming infrastructure.
Conclusion
Syncing PostgreSQL to ClickHouse no longer requires brittle batch pipelines, custom Kafka deployments, or hours of engineering work. With Estuary Flow, you get a fully managed, streaming-first solution that brings transactional data into ClickHouse in real time, with exactly-once guarantees, built-in schema management, and seamless compatibility via ClickPipes.
Whether you're building operational dashboards, powering real-time analytics, or simply offloading queries from Postgres, Estuary Flow makes it easy to modernize your data stack.
Ready to stream from Postgres to ClickHouse in minutes? Try Estuary Flow and see what real-time really looks like.
FAQs
Can I use ClickHouse with Kafka without managing Kafka infrastructure?
Yes. Estuary Flow's Dekaf connector exposes your collections as Kafka-compatible topics and a schema registry, so ClickHouse ClickPipes can consume them directly without you running any Kafka brokers of your own.
How does Estuary Flow handle schema changes in PostgreSQL?
Every Flow collection is backed by a JSON schema. When the upstream Postgres schema changes, Flow's schema evolution workflows let you review and apply the change, and incompatible changes are flagged before they can break the ClickHouse materialization.
How does Estuary Flow compare to tools like Airbyte or Fivetran for Postgres to ClickHouse?
Batch-oriented ELT tools sync on a schedule, which reintroduces the latency and operational overhead discussed earlier. Estuary Flow streams changes continuously with CDC, enforces schemas along the way, and delivers data to ClickHouse through a Kafka-compatible interface via ClickPipes.

About the author
The author has over 15 years in data engineering and is a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. Their writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.
