
No-Code Real-time CDC to StarTree Using Estuary Flow

Learn how to build a real-time CDC pipeline from PostgreSQL to StarTree using Estuary Flow. Skip Kafka setup and power fast OLAP analytics with ease.


One of the most common use cases in Data Engineering is making the data from relational databases available to the analytical world. Traditionally, this involved periodic batch processing, but as user expectations have evolved, the demand for real-time analytics has grown significantly. Today, delivering fresh, up-to-the-minute data is crucial for many applications. To meet this need, it's essential to propagate changes from transactional databases to analytical systems in real time – a process made possible through Change Data Capture (CDC).

In this blog, we will see how to seamlessly build a no-code PostgreSQL to StarTree CDC pipeline using Estuary Flow. This setup propagates the changes happening in a relational database to an OLAP datastore in real time. We will use Cloud SQL for PostgreSQL as the relational database, capture the changes from it using Estuary Flow, and direct them to Pinot tables in StarTree. The complete pipeline will be built in a no-code fashion.

Change Data Capture from Postgres to StarTree using Dekaf

To capture data from PostgreSQL, we will be utilizing the Change Data Capture (CDC) technique. CDC is a data management approach used to identify and capture changes made to a database or data source over time. All types of changes, such as inserts, updates, and deletes, are captured using CDC.

Estuary connects to the source through Captures, which collect the change data from the source, in this case, PostgreSQL. Estuary is a sophisticated tool capable of capturing not only data changes but also schema changes made at the source. The captured CDC data is stored in Collections within Estuary.

The data from these collections is then propagated to the OLAP datastore, StarTree, using Dekaf. Dekaf serves as Estuary Flow’s Kafka-API compatibility layer, allowing services to read data from Estuary Flow's collections as though they were topics in a Kafka cluster. Using Dekaf, the StarTree materialization propagates the data to Pinot tables, making it available for analytics.

With Estuary, the heavy lifting required to transfer data from PostgreSQL to StarTree in real-time is greatly simplified. There is no need to manage a Kafka cluster, CDC tools like Debezium, or metadata systems such as schema registries. Estuary abstracts all of this, offering a fully managed solution. The Estuary team ensures that your pipelines remain up-to-date and run seamlessly. As a result, Estuary significantly streamlines your data pipelines and ensures that your destination data stores remain in sync.

Step-by-Step Guide for Creating RDBMS to StarTree Estuary Flow

Prerequisites

To follow the guide, you will need:

  • A GCP account with admin access to Cloud SQL.
  • An Estuary account: Go to the Flow web app on dashboard.estuary.com to sign up for the free tier.
  • A StarTree account: StarTree provides a free developer account that you can use for development use cases. You can find more about it here.

Step 1: Set Up Cloud SQL for PostgreSQL

  1. You can navigate to Cloud SQL on the Google Cloud console, and click on the Create Instance button to create a new Cloud SQL instance. 
Create Cloud SQL instance
  2. On the Create an instance page, under Choose your database engine section, click on the Choose PostgreSQL button.
Choose PostgreSQL
  3. On the Create a PostgreSQL instance page, choose an appropriate Cloud SQL edition and Edition preset as per your requirement. Choose an appropriate Database version, and provide an appropriate Instance ID and Password for the PostgreSQL instance. Choose the Region and Zonal availability as per your requirement.
  4. Click on Show Configuration Options, and under Connections, ensure that Public IP is enabled. Click on the Add a Network option, provide an appropriate Name for the network, and in the Network text box, enter 0.0.0.0/0. This allows connections to the PostgreSQL instance from any IP address, which is convenient for this walkthrough but should be restricted for production workloads. Click on Done.
Configure connection to Postgres
  5. Finally, click on the Create Instance button at the bottom of the page. It will take a few minutes for the PostgreSQL instance to be created.

 

You can now connect to the PostgreSQL instance from any client using the instance's public IP address on port 5432, with the username `postgres`, the password you provided while creating the instance, and the database `postgres`. Once connected, create the table using the following SQL:

plaintext
CREATE TABLE users (
  id         INT,
  first_name VARCHAR(100),
  last_name  VARCHAR(100),
  city       VARCHAR(100),
  created_at TIMESTAMP NOT NULL DEFAULT NOW(),
  PRIMARY KEY (id)
);
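
Optionally, seed the table with a few rows so that data is available for the pipeline to pick up right away. The values below are purely illustrative:

plaintext
INSERT INTO users (id, first_name, last_name, city) VALUES
  (1, 'Asha',  'Rao',    'Bengaluru'),
  (2, 'Diego', 'Santos', 'Lisbon'),
  (3, 'Mina',  'Park',   'Seoul');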

Step 2: Create Estuary Flow Capture for PostgreSQL

  1. Navigate to the Estuary dashboard, and click on Sources from the left navigation menu.
Sources menu
  2. Click on the New Capture button on the Sources page.
New Capture
  3. In the Search connectors text box, search for Cloud SQL. On the Google Cloud SQL for PostgreSQL tile, click on the docs link, which will take you to this page.
Search for Cloud SQL
  4. Follow the instructions under the Setup section to ensure the PostgreSQL instance is set up for capturing the change data.
  5. Come back to the Estuary dashboard, and click on the Capture button on the Google Cloud SQL for PostgreSQL tile.
Select Capture
  6. On the Create Capture page, provide an appropriate Name for the capture and choose an appropriate Data Plane. You can also leave the Data Plane at its default value.
  7. In the Server Address text box, provide the public IP of the PostgreSQL instance with port 5432, as `<ip_address>:5432`. Provide the User as `flow_capture`, and the Password as the one you set while running the following SQL command from the Setup section in step 4 above (in this case, the password is `secret`):
plaintext
CREATE USER flow_capture WITH REPLICATION IN ROLE cloudsqlsuperuser LOGIN PASSWORD 'secret';
  8. Provide the Database as `postgres`.
  9. Click on the Next button at the top of the page.
  10. You should now see the Output Collections section being populated with the `users` table. Click on the Next button at the top of the page.
  11. Click on the Save and Publish button at the top of the page. This will create the capture for our Estuary Flow.
  12. Navigate to Collections from the left navigation menu. You should see a collection created corresponding to the capture we published in the previous step.
  13. You can now insert, update, and/or delete any records in the `users` table using the PostgreSQL client or any other tool, and see the change data being captured in the collection.
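
For example, the following statements (which assume the illustrative rows inserted in Step 1) each generate a change event that should appear in the collection shortly after it commits:

plaintext
INSERT INTO users (id, first_name, last_name, city) VALUES (4, 'Lena', 'Weber', 'Berlin');
UPDATE users SET city = 'Munich' WHERE id = 4;
DELETE FROM users WHERE id = 3;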

Step 3: Create Estuary Flow Materialization into StarTree

Let us now proceed to create the StarTree materialization for our Estuary Flow.

  1. On the Estuary dashboard, navigate to Destinations from the left navigation menu. On the Destinations page, click on the New Materialization button.
  2. Search for StarTree, and click on the Materialization button on the StarTree tile.
  3. On the Create Materialization page, provide an appropriate Name for the materialization. You can choose a Data Plane of your choice, or leave it at the default.

Under the Endpoint Config section, provide an Auth Token of your choice. Note that this is going to be used later while creating the connection in StarTree.

Click on the Deletion Mode dropdown, and choose `cdc`.

  4. Under the Source Collections section, click on the Add button beside Collections. On the popup that appears, tick the checkbox next to the collection we created in the previous step, and click on Continue. The selected collection will now start appearing in the Collections section.
  5. Click on the Next button at the top of the page.
  6. You can now see the Test and Save and Publish buttons at the top of the page. Click on the Save and Publish button.

This will create the StarTree materialization in Estuary.


Step 4: Create Dataset in StarTree

Now that the Estuary Flow is ready, we will proceed to the StarTree Cloud console.

  1. On the StarTree Cloud console, click on the Go To Data Manager button.
  2. On the Data Manager Overview page, click on the Create a Dataset button.
  3. On the Create Dataset page, select Kafka as the Connection Type by clicking on the Kafka button present under the Event Streams section.
  4. On the Connection Details page, click on the Create Connection button.
  5. On the Create Connection popup, provide the following details:
  • Connection Name: An appropriate name of choice, say `EstuaryKafka`
  • Broker Url: `dekaf.estuary-data.com:9092`
  • Authentication Type: `SASL`
  • Security Protocol: `SASL_SSL`
  • SASL Mechanism: `PLAIN`
  • Username: StarTree Materialization’s full name. In this case, it will be `ShrutisOrganization/users/dekaf-startree`.
  • Password: Auth Token provided while creating StarTree Materialization.
  • Click on the Show Advanced Options to configure the Schema Registry.
  • Schema Registry URL: `https://dekaf.estuary-data.com`
  • Schema Registry Key: Same as the SASL username
  • Schema Registry Secret: Same as the SASL password

Click on the Test Connection button, and ensure that the connection is successful. Click on the Create Connection button to create the connection.

  6. Select the newly created Kafka connection, and click on the Next button.
  7. On the Dataset Details page, provide the Name for the dataset and, optionally, the Description. Under the Kafka input format section, select the Topic name corresponding to `users` from the dropdown. Select the Data format as `AVRO`. Leave the Record reader config with its default value (`{}`).
  8. Click on the Check Sample Data button. The row(s) from the PostgreSQL `users` table should start showing up in the sample data.

Note: If you have not inserted any rows into the PostgreSQL table yet, insert a few rows using the PostgreSQL client (for example, the sample INSERT statements from Step 1), and then click on the Check Sample Data button.

Note that a new column called `_is_deleted` has been added, which tracks whether the row has been deleted in the PostgreSQL table.

  9. Click on the Next button.
  10. On the Data Modelling page, change the following values:
    1. For the `_is_deleted` column, change the Field type to `Dimension` and the Data type to `Boolean`.
    2. For the `created_at` column, change the Field type to `Date_Time` and the Data type to `Long`.

Note: If you get a warning after making the changes to the `created_at` column, click on the Back button to go to the previous page, and then click on the Next button to return to the Data Modelling page. The warning will go away.

Ensure that under the Pinot Time Column section, the `created_at` column appears in the Time Column dropdown.

Click on the Next button.

  11. On this page, you can leave all the values under the Configure Indexes and Add Encoding Types section as is. Under the Additional Configuration section, select `id` as the Primary Key.

Click on the Next button.

  12. You can now preview the Data, Schema and Table Config.
  13. Click on the Create Dataset button. The new dataset will get created.
  14. On the `users` dataset page, click on the Edit button on the top right corner of the page.
  15. Under the Table Config section, make the following changes:
    1. In the upsertConfig section:
      1. Add a new attribute "deleteRecordColumn" with value "_is_deleted".
      2. Change the value of "metadataTTL" and "deletedKeysTTL" to a small number of seconds, say 10.
      3. Leave the rest of the values under this section as default.

The changed fields will look as follows:

plaintext
"upsertConfig": {
  "deleteRecordColumn": "_is_deleted",
  "metadataTTL": 10,
  "deletedKeysTTL": 10,
  …
}
    2. In the ingestionConfig section, the username and password appear in two places, one for Kafka and another for the Schema Registry, masked with asterisks (*). Replace the masked values with the actual values. These are the values that will change:
plaintext
"ingestionConfig": {
  …
  "streamIngestionConfig": {
    …
    "streamConfigMaps": [
      {
        "stream.kafka.username": "ShrutisOrganization/users/dekaf-startree",
        "stream.kafka.decoder.prop.schema.registry.key": "ShrutisOrganization/users/dekaf-startree",
        "sasl.jaas.config": "org.apache.kafka.common.security.plain.PlainLoginModule required \n username=\"ShrutisOrganization/users/dekaf-startree\" \n password=\"password\";",
        "stream.kafka.password": "password",
        …
      }
    ],
  },
}
  16. Click on the Save button at the bottom of the page.

With this, our StarTree dataset is ready.

Step 5: Test the Pipeline

Our CDC pipeline created using Estuary Flow is now ready. Let us try it out.

  1. On the Data Manager console, click on the Query Console from the left navigation menu.
  2. On the Query Console page, you can now run any query on the `users` table.
  3. You can now make any changes (inserts, updates, or deletes) to the `users` table in the PostgreSQL database, and see the same changes captured and reflected in the `users` table in the StarTree Query Console.
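
For example, assuming the illustrative rows from the earlier steps are present, run the following DML in PostgreSQL and then re-run a query in the StarTree Query Console; the updated row should show the new city, and the deleted row should drop out of the results:

plaintext
-- In PostgreSQL
UPDATE users SET city = 'Pune' WHERE id = 1;
DELETE FROM users WHERE id = 2;
-- In the StarTree Query Console (Pinot SQL)
SELECT id, first_name, last_name, city FROM users ORDER BY id LIMIT 10;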

Thus, we have successfully ingested data from the Cloud SQL PostgreSQL instance into StarTree using Estuary Flow, in real time and in a no-code fashion.

Conclusion

Estuary provides a powerful mechanism to establish flows that capture data in real time from different databases and route it to StarTree seamlessly in a no-code fashion. This makes handling data pipelines far less of a hassle, with all the complexities of real-time change data capture abstracted away.

Ready to build your own Postgres to StarTree pipeline in minutes?

👉 Get started with Estuary Flow for free and experience real-time analytics without the maintenance burden.

FAQs

How can I build a real-time CDC pipeline from PostgreSQL to StarTree?
You can use Estuary Flow to build a no-code real-time CDC pipeline from PostgreSQL (Cloud SQL) to StarTree. Estuary captures data changes and schema updates from PostgreSQL and streams them to StarTree's Pinot tables using Dekaf, a Kafka-API-compatible layer—eliminating the need to manually manage Kafka, Debezium, or schema registries.

Do I need to write code or manage Kafka infrastructure to set up this pipeline?
No, Estuary Flow offers a completely no-code experience. You can configure CDC from PostgreSQL and materialize data into StarTree using intuitive UI steps on the Estuary dashboard—no custom coding, Kafka setup, or infrastructure management required.

Why use Estuary Flow for real-time analytics with StarTree?
Estuary Flow simplifies real-time data integration by abstracting complex components like Kafka and Debezium. It supports streaming schema evolution, CDC-based row-level tracking, and plug-and-play materializations into StarTree, enabling fast OLAP analytics without engineering overhead.


About the author

Shruti Mantri

Shruti is an accomplished Data Engineer with over a decade of experience, specializing in innovative data solutions. Her passion for exploring new technologies keeps her at the forefront of advancements in the field. As an active contributor to open-source projects and a technical blog writer, Shruti shares her knowledge with the wider community. She is also an Udemy author, with multiple courses in data engineering, helping others build expertise in this domain.
