
Amazon Redshift has served as a foundational data warehouse for many organizations, especially those that invested early in the AWS ecosystem. But as data strategies evolve — with growing demands for real-time analytics, AI/ML readiness, and support for open formats — teams are increasingly reevaluating their stack.
Databricks has emerged as a preferred destination for those looking to scale beyond the limitations of Redshift. With its lakehouse architecture, native support for Delta Lake, and strong integrations with machine learning tools, Databricks enables a unified environment for both batch and streaming data workloads.
Migrating from Redshift to Databricks isn’t just about shifting storage or compute — it’s about unlocking greater flexibility, performance, and long-term architectural efficiency.
In this guide, we’ll walk you through the best ways to migrate Redshift to Databricks, starting with a streaming-friendly, low-latency pipeline using Estuary Flow. We’ll also cover an alternative batch method using S3 and Databricks Auto Loader for teams with simpler or one-off needs.
Want to go straight to the recommended approach? Here’s how to set up Redshift to Databricks with Estuary Flow →
Redshift vs Databricks: Why More Teams Are Making the Switch
Amazon Redshift is a powerful tool — but only up to a point. As organizations scale their data operations, the cracks begin to show. More users, more workloads, and more demands for real-time access put pressure on infrastructure that was never built for dynamic, open-ended analytics.
Common Friction Points with Redshift
- Rigid scaling: You have to provision capacity in advance, which leads to either over-provisioning or query slowdowns during peak usage
- Concurrency limitations: As more dashboards, tools, and users pile on, workloads compete for limited slots
- Batch-oriented mindset: Redshift was built for scheduled loads, not continuous streams or real-time updates
- Data lock-in: Data in Redshift is stored in a proprietary format, making it harder to move, reuse, or serve AI/ML pipelines
- Manual operations: Maintenance tasks like vacuuming, WLM tuning, and slot prioritization demand constant attention
These constraints stall productivity, and they force engineers to spend time managing infrastructure instead of unlocking value from data.
The Lakehouse Advantage: Why Teams Are Choosing Databricks
Databricks offers a modern alternative: a lakehouse architecture that unifies the scalability of data lakes with the reliability of data warehouses.
- Dynamic compute scaling with job-level cluster configuration
- Delta Lake format enables ACID transactions and time travel on open storage
- Streaming-native: ingest, process, and serve data in near real-time
- Multi-language support: Python, SQL, R, and Scala in one platform
- Built-in ML ecosystem with MLflow, feature stores, and notebook-based modeling
- Governance-first architecture via Unity Catalog
This shift isn’t just about cost or performance — it’s about enabling faster insights, broader collaboration, and more intelligent products.
But the migration itself needs to be just as forward-thinking. That’s where Estuary Flow comes in.
Why Estuary Flow Is the Smarter Path to Databricks
Data migration isn’t just about moving tables — it’s about preserving business continuity, reducing downtime, and preparing for what comes next. That’s where most traditional approaches fall short. They require manual scripting, introduce long delays between syncs, and often break when schemas evolve.
Estuary Flow is built for modern data teams who want to avoid those pitfalls. It connects Redshift and Databricks through a streaming-first, no-code pipeline that supports:
- Continuous sync from Redshift with incremental change detection
- Real-time delivery into Databricks Delta tables via Unity Catalog
- Automated schema propagation, backfill, and transformation
- No-code configuration, deployable in minutes
By continuously syncing your Redshift data into Databricks, Estuary lets your team:
- Run analytics in Databricks without waiting for batch exports
- Migrate incrementally — reducing cutover risk
- Build future-proof workflows for machine learning, BI, and operations
If you're planning to use Databricks as your new central lakehouse, Estuary makes that transition seamless and production-ready.
Let’s walk through exactly how it works.
Method 1: Redshift to Databricks with Estuary Flow (Recommended)
Estuary Flow is a streaming-native ETL platform that makes it easy to build and manage data pipelines, with <100ms latency, real-time change detection, and no code required. Using Flow, you can continuously sync data from Redshift to Databricks while preserving schema, data integrity, and real-time updates.
Here’s how to set it up in two simple steps.
Prerequisites
Before you start, make sure you have:
Redshift
- Cluster or serverless access
- Username and password (or IAM role access)
- Permissions to read from the source schema/tables
Databricks
- SQL Warehouse enabled
- Unity Catalog with a target schema
- Personal Access Token (PAT)
- HTTP Path and SQL endpoint
Estuary Flow
- A free Estuary Flow account
- Network access to both Redshift and Databricks endpoints
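If any of these prerequisites are still missing, the SQL below is a minimal sketch of how you might set them up. The user, schema, and catalog names (estuary_user, analytics, main.raw_redshift) are placeholders; substitute your own.

```sql
-- Redshift: allow the capture user to read the source schema (placeholder names)
GRANT USAGE ON SCHEMA analytics TO estuary_user;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO estuary_user;

-- Databricks (run in your SQL Warehouse): create the target Unity Catalog schema
CREATE SCHEMA IF NOT EXISTS main.raw_redshift;
```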
Estuary captures data from Redshift using its batch connector, which can ingest entire tables or perform incremental sync based on timestamps or primary keys.
Let’s start with the capture side.
Step 1: Capture Data from Redshift
- Log in to your Estuary Flow dashboard.
- In the left menu, navigate to Sources and click + New Capture.
- In the search bar, type “Redshift” and select the Amazon Redshift Batch connector.
- Fill in your Redshift connection details:
  - Name: e.g., redshift_to_databricks_migration
  - Server Address: e.g., redshift-cluster.xxxx.us-west-2.redshift.amazonaws.com
  - Database Name, Username, and Password
- Click Next, and Estuary will retrieve a list of available schemas and tables.
- Select the specific tables or entire schema you want to capture.
- Click Save and Publish to deploy the capture.
Once deployed, Estuary will automatically create Flow collections representing the selected Redshift tables. These collections act as schema-enforced, versioned datasets that stay continuously in sync with your source.
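Because the batch connector can sync incrementally on a timestamp or primary key, it helps to confirm up front that each source table has a usable cursor column. One way to check in Redshift (the analytics.orders table here is hypothetical):

```sql
-- Redshift: inspect column names and types to find a timestamp or key cursor column
SELECT column_name, data_type
FROM svv_columns
WHERE table_schema = 'analytics'
  AND table_name   = 'orders';
```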
Step 2: Materialize Data into Databricks
Now that your Redshift data is flowing into Estuary collections, the next step is to push those into a Databricks SQL Warehouse via a Unity Catalog Volume.
Follow these steps:
- In your Estuary dashboard, go to Destinations and click + New Materialization.
- Search for and select the Databricks connector.
- Configure the materialization with the following:
  - Address: Your Databricks SQL endpoint (e.g., dbc-abc123.cloud.databricks.com:443)
  - HTTP Path: From your SQL Warehouse settings
  - Catalog Name: e.g., main
  - Schema Name: e.g., raw_redshift or staging
  - Auth Type: Choose Personal Access Token
  - Token: Paste your PAT from Databricks
- In the Source Collections section, select the collections from your Redshift capture.
- Click Next, review settings, then hit Save and Publish.
Estuary now handles the rest — automatically creating tables and transactionally applying new inserts and updates as they arrive from Redshift.
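A quick way to confirm the materialization is landing data is to query the new table from your SQL Warehouse. The table name below (main.raw_redshift.orders) is a placeholder for whichever collection you materialized:

```sql
-- Databricks SQL: sanity-check that rows are arriving in the materialized table
SELECT COUNT(*) AS row_count
FROM main.raw_redshift.orders;
```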
What Happens Behind the Scenes
Once both the capture and materialization are active, Estuary Flow:
- Extracts full and incremental data from Redshift using batch scans or key-based syncs
- Streams data through its pipeline engine, applying optional transformations if configured
- Writes to Databricks Delta tables in your specified schema, with support for schema mapping and updates
- Handles retry logic, consistency guarantees, and schema enforcement end-to-end
This architecture ensures your Redshift data is always fresh and query-ready in Databricks, without manual exports, scripts, or scheduled jobs.
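Because the destination is a Delta table, you can also watch those transactional writes accumulate over time. A small check, again using a placeholder table name:

```sql
-- Delta Lake: each applied transaction appears as a commit in the table history
DESCRIBE HISTORY main.raw_redshift.orders;
```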
Start streaming from Redshift to Databricks in minutes. Try Estuary Flow free — no credit card required.
Method 2: One-Time Migration Using Redshift UNLOAD + Databricks Auto Loader
If your use case doesn’t require ongoing sync or real-time updates — for example, migrating historical data or setting up a new Databricks environment from a static Redshift snapshot — a manual batch migration using Amazon S3 and Databricks Auto Loader may be sufficient.
This approach gives you full control over the extract and load process, but it requires more setup, maintenance, and manual handling of schema consistency and updates.
Step 1: Export Data from Redshift to S3
Use Redshift’s native UNLOAD command to write data into S3 in CSV or Parquet format.
```sql
UNLOAD ('SELECT * FROM your_schema.your_table')
TO 's3://your-bucket/redshift-export/'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=yyy'
DELIMITER ','
ALLOWOVERWRITE
PARALLEL OFF;
```
Tip: Use PARQUET format if possible — it improves performance during the Databricks load step and supports schema inference.
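For reference, a Parquet export might look like the sketch below, assuming an IAM role attached to the cluster instead of static access keys (the role ARN is a placeholder):

```sql
UNLOAD ('SELECT * FROM your_schema.your_table')
TO 's3://your-bucket/redshift-export/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-unload-role'
FORMAT AS PARQUET
ALLOWOVERWRITE;
```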
Step 2: Configure Databricks Auto Loader
In Databricks, use Auto Loader with the cloudFiles source format to load the exported Redshift data from S3.
```python
# Auto Loader: incrementally read the exported files from S3 as a stream
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")  # or "csv"
    .option("cloudFiles.inferColumnTypes", "true")
    .load("s3://your-bucket/redshift-export/")
)

# Write the stream into a Delta table, allowing the schema to evolve
df.writeStream.format("delta") \
    .option("checkpointLocation", "/tmp/checkpoints/redshift") \
    .option("mergeSchema", "true") \
    .toTable("your_catalog.raw_redshift_table")
```
Make sure your Databricks workspace has IAM access to read from the S3 bucket. You may need to configure instance profiles or credential passthrough.
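Before starting the stream, you can check that the workspace can actually read the exported files. A minimal check, assuming the Parquet export path from Step 1:

```sql
-- Databricks SQL: read a few rows directly from the exported Parquet files
SELECT * FROM parquet.`s3://your-bucket/redshift-export/` LIMIT 10;
```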
Limitations of This Approach
- No change tracking: You’ll need to manually re-run the pipeline for any new data
- No schema evolution: Any schema changes in Redshift will need to be re-exported and reloaded
- Operational overhead: Managing S3 permissions, export jobs, and ingestion code adds complexity
For simple use cases or initial backfills, this method can be effective. But if your data keeps changing — or your team wants to avoid repeated manual work — Estuary Flow offers a much more robust, low-maintenance solution.
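To make the “no change tracking” point concrete: every refresh means re-running the export filtered by a watermark you track yourself. A hypothetical incremental re-export, assuming an updated_at column and a manually recorded cutoff:

```sql
-- Redshift: manual incremental re-export; the watermark must be tracked by hand
UNLOAD ('SELECT * FROM your_schema.your_table WHERE updated_at > ''2024-01-01 00:00:00''')
TO 's3://your-bucket/redshift-export/incremental/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-unload-role'
FORMAT AS PARQUET
ALLOWOVERWRITE;
```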
Ready to move real data? Sign up for Estuary Flow
Conclusion
Migrating from Redshift to Databricks isn’t just about switching platforms — it’s about leveling up your data architecture. Databricks offers unmatched flexibility for analytics, machine learning, and streaming, but the success of your migration depends on how you move the data.
While the manual S3-based method can get the job done for static snapshots or initial loads, it quickly becomes burdensome when real-time updates, schema changes, or operational simplicity are required.
Estuary Flow takes the friction out of data movement. With real-time CDC, automated schema handling, and seamless delivery into Databricks’ Unity Catalog, it’s the most efficient way to modernize your pipeline — without writing a single line of glue code.
Get Started with Estuary Flow
- Sign up for a free account: Estuary Flow Dashboard
- Explore the Redshift and Databricks connector docs
- Or talk to our team about your specific use case

About the author
Emily is a software engineer and technical content creator with an interest in developer education. She has experience across Developer Relations roles from her FinTech background and is always learning something new.