
How to Move Data from Kafka to Redshift: Options + Tutorial

Need to analyze data that's stuck in Kafka? Here are 2 methods to get data from Kafka to Redshift, with step-by-step guidance.


Are you looking to migrate data from Apache Kafka to Amazon Redshift in a way that is fast, reliable, and easy to maintain? You are in the right place.

Many teams start with Kafka Connect to wire data from Kafka into Redshift. While it works well for streaming, it often comes with ongoing challenges: managing connector deployments, handling schema drift, tuning performance, and staging data in S3 for batch loads. These tasks can become a significant engineering burden, especially when you also need exactly-once delivery and both historical and real-time data in one pipeline.

The good news is that you have options. In this article, we will explore two proven methods:

  • Method 1: Kafka Connect with Confluent’s Amazon Redshift Sink or S3 Sink
  • Method 2: Estuary Flow, a fully managed platform that handles real-time streaming and historical backfill automatically

By the end, you will understand how each approach works, what prerequisites are involved, and the limitations you should be aware of, so you can select the best path for your migration project.

What is Kafka?

[Image: Apache Kafka overview]

Kafka, developed at LinkedIn, open-sourced in 2011, and later donated to the Apache Software Foundation, is an open-source, distributed event streaming platform. It enables the development of real-time, event-driven applications.

Kafka was initially developed to handle message queues and to manage the real-time streaming of data between applications. Data generated by producers (for example, front-end or application servers) is stored as streams in a Kafka cluster, whose servers are called brokers. The brokers act as intermediaries between producers and consumers (end users). A Kafka topic is a user-defined category used to organize messages, and each topic has a unique name across the entire cluster. Producers write data to topics, and consumers read data from those topics: whenever a consumer issues a request, it consumes data from the cluster.
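To make the producer, topic, and consumer relationship concrete, here is a minimal sketch using the confluent-kafka Python client. The broker address (localhost:9092) and the topic name (orders) are placeholders for illustration; adjust them for your own cluster.

```python
from confluent_kafka import Producer, Consumer

BOOTSTRAP = "localhost:9092"   # placeholder broker address
TOPIC = "orders"               # placeholder topic name

# Producer: writes a message to the "orders" topic.
producer = Producer({"bootstrap.servers": BOOTSTRAP})
producer.produce(TOPIC, key="order-1001", value='{"item": "book", "qty": 2}')
producer.flush()  # block until the message is delivered

# Consumer: reads messages back from the same topic.
consumer = Consumer({
    "bootstrap.servers": BOOTSTRAP,
    "group.id": "demo-consumer-group",
    "auto.offset.reset": "earliest",
})
consumer.subscribe([TOPIC])

msg = consumer.poll(timeout=10.0)  # wait up to 10 seconds for a message
if msg is not None and msg.error() is None:
    print(msg.key(), msg.value())
consumer.close()
```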

Let’s understand some of Kafka’s best features:

  • Scalability: It's a highly scalable distributed system that can be expanded with no downtime.
  • High volume: It can continuously handle terabytes of data generated by producers and seamlessly deliver it to consumers.
  • Fault tolerance: It handles failures and data recovery by replicating data across multiple brokers.

Apache Kafka can be used on its own or through Confluent, a commercial platform built on Apache Kafka that adds tooling and managed services to make data streaming easier to operate.

What is Redshift?

[Image: Amazon Redshift overview]

Redshift, developed by Amazon, is a cloud-based, petabyte-scale, fully managed data warehouse service. It is specifically built to support large-scale data analytics and data warehousing. It stores data from various sources and applications for analysis with business intelligence tools or for machine learning workflows.

Some of the advantages of Redshift are:

  • Cloud Data Warehouse: You don't need to provision hardware or set up your own servers and databases, which removes that upfront cost and maintenance overhead.
  • Easy to Set Up and Manage: You simply launch a data warehouse with the configuration you need.
  • Massively Parallel Processing (MPP): It enables fast execution of complex queries over voluminous data.
  • Columnar Storage: Redshift stores data in a column-based format. This optimizes analytical query performance because it reduces the overall disk I/O required.
  • Result Caching: To reduce query execution time and improve system performance, Redshift caches the results of repeated queries and returns the cached result when those queries are rerun.


Move Data from Kafka to Redshift 

Using the methods below, you can quickly migrate your data from Kafka to Amazon Redshift:

Method 1: Kafka to Redshift using the Kafka Connect Amazon Redshift Sink or S3 Sink Connector.

Method 2: Using a SaaS Alternative (Estuary Flow) to Move Data from Kafka to Redshift.

Method 1: Kafka to Redshift using the Kafka Connect Amazon Redshift Sink or S3 Sink Connector

Kafka Connect is an open-source tool to stream data between Apache Kafka and other systems. Confluent offers two ways to move Kafka data into Redshift:

  1. The Amazon Redshift Sink Connector, which writes directly to Redshift over JDBC.
  2. The Amazon S3 Sink Connector combined with Redshift’s COPY command, which stages data in S3 before loading it into Redshift.

In this guide, we'll look at both, with particular attention to the S3 staging method, as it's widely used and works across both self-managed and fully managed (Confluent Cloud) setups.

Features of the Redshift Sink Connector

Let's take a look at the key features of these Kafka to Redshift connectors:

  • At-least-once delivery: Guarantees that records from Kafka topics are delivered at least once.
  • Dead Letter Queue: Stores messages that fail to reach their destination due to format or serialization errors.
  • Scheduled file rotation (S3 method): Regularly rotates output files for efficient batch loading.
  • Multiple tasks: Supports running multiple tasks in parallel to boost throughput.
  • Time-based partitioning (S3 method): Partitions files hourly or daily for better load performance.
  • Data format flexibility: Works with Avro, JSON (with schema), Protobuf, or Bytes format.

The Amazon Redshift Sink and S3 Sink connectors can be used with:

1. Confluent Platform: This is a self-managed, enterprise-grade distribution of Apache Kafka.

2. Confluent Cloud: This is a fully managed, cloud-native service for Apache Kafka.

Confluent Platform

You can use the Amazon Redshift Sink connector with Confluent Platform to export data. It continuously polls data from Kafka topics and writes it to Redshift over JDBC.
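As a rough sketch of what a self-managed deployment involves, the snippet below registers a Redshift Sink connector with a Kafka Connect worker's REST API. The property names follow Confluent's Redshift Sink documentation, but verify them against your connector version; the Connect URL, cluster endpoint, credentials, and topic are placeholders.

```python
import json
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # placeholder Connect worker address

connector = {
    "name": "redshift-sink-orders",
    "config": {
        # Connector class for Confluent's Amazon Redshift Sink (verify for your version).
        "connector.class": "io.confluent.connect.aws.redshift.RedshiftSinkConnector",
        "tasks.max": "1",
        "topics": "orders",                        # Kafka topic(s) to export
        "aws.redshift.domain": "my-cluster.abc123.us-east-1.redshift.amazonaws.com",
        "aws.redshift.port": "5439",
        "aws.redshift.database": "dev",
        "aws.redshift.user": "awsuser",
        "aws.redshift.password": "********",
        "auto.create": "true",                     # create the target table if it doesn't exist
        "pk.mode": "kafka",                        # use Kafka coordinates as the primary key
        # Licensed Confluent connectors typically also need a license topic setting:
        "confluent.topic.bootstrap.servers": "localhost:9092",
    },
}

resp = requests.post(CONNECT_URL, json=connector)
resp.raise_for_status()
print(json.dumps(resp.json(), indent=2))
```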

Confluent Cloud

You can use the Amazon S3 Sink connector with Confluent Cloud to export data. It streams data from Kafka topics in JSON, Bytes, Avro, or Protobuf format and writes it to Amazon S3. You then use Redshift's COPY command to load the staged files from Amazon S3 into Redshift.
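If you run Kafka Connect yourself, a configuration along the lines of the sketch below sets up the S3 staging step; in Confluent Cloud you supply the equivalent settings through the console or CLI instead. The property names come from Confluent's S3 Sink connector docs, and the bucket, region, topics, and Connect URL are placeholders. Note how the rotation, partitioning, and dead letter queue settings correspond to the features listed earlier.

```python
import requests

CONNECT_URL = "http://localhost:8083/connectors"  # placeholder Connect worker address

s3_sink = {
    "name": "s3-sink-orders",
    "config": {
        "connector.class": "io.confluent.connect.s3.S3SinkConnector",
        "tasks.max": "2",                                  # multiple tasks for throughput
        "topics": "orders",
        "s3.bucket.name": "my-kafka-staging-bucket",       # placeholder bucket
        "s3.region": "us-east-1",
        "storage.class": "io.confluent.connect.s3.storage.S3Storage",
        "format.class": "io.confluent.connect.s3.format.json.JsonFormat",
        "flush.size": "10000",                             # records per staged S3 object
        "rotate.schedule.interval.ms": "600000",           # scheduled file rotation: every 10 minutes
        # Time-based partitioning: one S3 prefix per hour for efficient COPY loads.
        "partitioner.class": "io.confluent.connect.storage.partitioner.TimeBasedPartitioner",
        "partition.duration.ms": "3600000",
        "path.format": "'year'=YYYY/'month'=MM/'day'=dd/'hour'=HH",
        "locale": "en-US",
        "timezone": "UTC",
        # Dead letter queue for records that fail conversion or serialization.
        "errors.tolerance": "all",
        "errors.deadletterqueue.topic.name": "dlq-orders",
    },
}

requests.post(CONNECT_URL, json=s3_sink).raise_for_status()
```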

[Image: Confluent Kafka-to-Redshift data flow]

Limitations of the Sink Connector

General limitations (apply to both methods)

  • Network reliability: Poor or unstable network connections between Kafka Connect and the target service can cause lag or failed writes.
  • Nested data structures: Deeply nested JSON objects must be flattened or transformed before ingestion.
  • Unsupported data types: Array fields are not supported.
  • Schema requirements: For schema-based formats (Avro, JSON with schema), a valid schema must be registered in the Confluent Schema Registry.

Direct JDBC (Amazon Redshift Sink Connector)

  • Data type restrictions: BYTES, STRUCT, MAP, and ARRAY types are not supported.
  • Decimal limitations: Avro schemas containing the decimal logical type are not supported.
  • Schema evolution limits: Auto-evolve can only add new columns. Type changes, column removals, or adding primary key constraints must be done manually.

S3 staging (Amazon S3 Sink Connector + Redshift COPY)

  • Region restrictions: Your Confluent Cloud cluster and the target Redshift/S3 bucket must be in the same AWS region.
  • Load frequency: Delivery is batch-based, so near real-time behavior depends on S3 rotation intervals and COPY schedules (a minimal scheduled COPY sketch follows this list).
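
For the load side, the COPY step is ordinary SQL that you can run on whatever schedule matches your file rotation interval. Below is a minimal sketch using psycopg2 (Redshift speaks the PostgreSQL wire protocol); the cluster endpoint, credentials, target table, bucket path, and IAM role are placeholders.

```python
import psycopg2

# Placeholder connection details for your Redshift cluster.
conn = psycopg2.connect(
    host="my-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="dev",
    user="awsuser",
    password="********",
)

# Load the JSON files the S3 Sink connector staged for this topic.
copy_sql = """
    COPY public.orders
    FROM 's3://my-kafka-staging-bucket/topics/orders/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole'
    FORMAT AS JSON 'auto';
"""

with conn, conn.cursor() as cur:
    cur.execute(copy_sql)   # run this on a schedule (e.g. cron or Airflow)

conn.close()
```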

For detailed, step-by-step setup instructions, see Confluent's documentation for the Amazon Redshift Sink and Amazon S3 Sink connectors.

Method 2: SaaS Alternative

If you want to avoid the operational complexity of running and maintaining Kafka Connect, Estuary Flow offers a fully managed alternative that is designed to get your Kafka data into Redshift within minutes, not days.

Estuary Flow is a real-time data movement platform built on Gazette, the same kind of distributed, fault-tolerant architecture that powers mission-critical event systems. It captures both historical data and live event streams, then materializes them into Redshift with exactly-once delivery guarantees.

Unlike traditional connector setups that require you to manage infrastructure, Flow runs the entire pipeline in the cloud. This means no connector upgrades, no scaling headaches, and no risk of losing data during redeployments.

Prerequisites to Move Data from Kafka to Redshift with Estuary Flow

  • Estuary Flow account.
  • A Kafka cluster with a reachable bootstrap.servers address and connection security enabled with TLS.
  • S3 bucket for staging temporary files. 
  • AWS root or IAM user with read and write access to the S3 bucket.

Steps to Move Data from Kafka to Redshift with Estuary Flow

  1. Sign in to your Estuary Flow account (or create a new one).
  2. Once you have logged in, click on Capture + New Capture.
  3. Search for the Kafka connector and click Capture.
[Image: Kafka connector tile in Estuary Flow]
  4. Provide a unique name for your capture.
  5. Fill in the Endpoint Config details: enter the required values for Bootstrap Servers, TLS connection settings, and Authentication, then click Next to test the connection.
  6. Flow uses the provided information to connect to Kafka. A Collection Selector will then appear, listing collections that correspond to your Kafka topics. Deselect any collections you don't want to capture.
  7. You can view the generated capture definition and the schema for each collection in the Specification Editor. Modify each collection's JSON schema as needed; the schema determines how the data is mapped to Redshift (see the schema sketch after these steps).
  8. Click Save and Publish. You will receive a notification when the capture is published successfully.
  9. In the dialog box that opens, click Materialize Collections to continue.
  10. The Create Materialization window will appear. In the connector search box, type Amazon Redshift and press Enter.
  11. When the Amazon Redshift tile appears, click Materialize.
[Image: Amazon Redshift connector tile in Estuary Flow]
  12. Choose a unique name for the materialization and fill in the Endpoint Config details. This connector materializes Flow collections (from your Kafka topics) into tables in an Amazon Redshift database.
  13. Click Save and Publish. You will receive a notification when the Data Flow is published successfully.
  14. From now on, new data streaming through your Kafka topics will be materialized to the appropriate Redshift tables immediately.
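
To give a sense of step 7, here is a hedged sketch of what a collection schema for a hypothetical orders topic might look like, written as a Python dict for consistency with the earlier examples. Flow collections are described with standard JSON Schema plus a key, and top-level fields like these typically become columns in the materialized Redshift table. The field names and key are illustrative, not prescribed by Estuary.

```python
# Illustrative JSON schema for a collection backed by a hypothetical "orders" topic.
# Field names and the key pointer are placeholders; adapt them to your own messages.
orders_collection_schema = {
    "type": "object",
    "properties": {
        "order_id": {"type": "string"},
        "item": {"type": "string"},
        "qty": {"type": "integer"},
        "ordered_at": {"type": "string", "format": "date-time"},
    },
    "required": ["order_id"],
}

# The collection key (a JSON pointer) identifies each document uniquely,
# which in turn drives how rows are keyed in the Redshift table.
orders_collection_key = ["/order_id"]
```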

Features of Estuary Flow

  • Handles both batch (historical) and event (real-time) data in the same pipeline.
  • Performs initial backfill from Kafka topics plus ongoing continuous sync.
  • Wide connector ecosystem, covering databases, warehouses, and SaaS tools.
  • Exactly-once delivery and schema enforcement for data integrity.
  • Once deployed, pipelines run continuously with no manual intervention.

Summary

You can move data from Kafka to Redshift using either a self-managed or managed Kafka Connect setup or by using Estuary Flow.

  • Kafka Connect (Confluent) is a mature option with both direct JDBC and S3 staging methods. It is best for teams with Kafka Connect expertise who want full control over infrastructure and configurations.
  • Estuary Flow removes the infrastructure burden entirely, delivering exactly-once, real-time pipelines that combine historical backfill with continuous streaming. It is ideal for teams that want fast deployment, zero maintenance, and guaranteed data integrity.

Bottom line: If you have the resources to manage and tune Kafka Connect, it is a capable solution. If you want a faster path to production with fewer moving parts and lower operational costs, Estuary Flow is the smarter choice.

You can deploy your first Kafka to Redshift pipeline on Estuary Flow for free and have it running in minutes.

Several tools on the market can move data from Kafka to Redshift. Depending on your application's requirements and availability needs, one of the approaches above may suit you better than the other.

If you are a team of engineers that prefers the flexibility of a programmatic approach and self-managed software deployment, use the Amazon Redshift Sink/S3 connector method.

Create your free Estuary Flow account and launch your first Kafka to Redshift pipeline today.


About the author

Jeffrey Richman

With over 15 years in data engineering, Jeffrey is a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. His extensive writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.
