
Amazon Redshift has served as a foundational data warehouse for many organizations, especially those that invested early in the AWS ecosystem. But as data strategies evolve — with growing demands for real-time analytics, AI/ML readiness, and support for open formats — teams are increasingly reevaluating their stack.
Databricks has emerged as a preferred destination for those looking to scale beyond the limitations of Redshift. With its lakehouse architecture, native support for Delta Lake, and strong integrations with machine learning tools, Databricks enables a unified environment for both batch and streaming data workloads.
Migrating from Redshift to Databricks isn’t just about shifting storage or compute — it’s about unlocking greater flexibility, performance, and long-term architectural efficiency.
In this guide, we’ll walk you through the best ways to migrate Redshift to Databricks, starting with a streaming-friendly, low-latency pipeline using Estuary Flow. We’ll also cover an alternative batch method using S3 and Databricks Auto Loader for teams with simpler or one-off needs.
Want to go straight to the recommended approach? Here’s how to set up Redshift to Databricks with Estuary Flow →
Redshift vs Databricks: Why More Teams Are Making the Switch
Amazon Redshift is a powerful tool — but only up to a point. As organizations scale their data operations, the cracks begin to show. More users, more workloads, and more demands for real-time access put pressure on infrastructure that was never built for dynamic, open-ended analytics.
Common Friction Points with Redshift
- Rigid scaling: You have to provision capacity in advance, which leads to either over-provisioning or query slowdowns during peak usage
- Concurrency limitations: As more dashboards, tools, and users pile on, workloads compete for limited slots
- Batch-oriented mindset: Redshift was built for scheduled loads, not continuous streams or real-time updates
- Data lock-in: Data in Redshift is stored in a proprietary format, making it harder to move, reuse, or serve AI/ML pipelines
- Manual operations: Maintenance tasks like vacuuming, WLM tuning, and slot prioritization demand constant attention
These constraints stall productivity, and they force engineers to spend time managing infrastructure instead of unlocking value from data.
The Lakehouse Advantage: Why Teams Are Choosing Databricks
Databricks offers a modern alternative: a lakehouse architecture that unifies the scalability of data lakes with the reliability of data warehouses.
- Dynamic compute scaling with job-level cluster configuration
- Delta Lake format enables ACID transactions and time travel on open storage
- Streaming-native: ingest, process, and serve data in near real-time
- Multi-language support: Python, SQL, R, and Scala in one platform
- Built-in ML ecosystem with MLflow, feature stores, and notebook-based modeling
- Governance-first architecture via Unity Catalog
This shift isn’t just about cost or performance — it’s about enabling faster insights, broader collaboration, and more intelligent products.
But the migration itself needs to be just as forward-thinking. That’s where Estuary Flow comes in.
Why Estuary Flow Is the Smarter Path to Databricks
Data migration isn’t just about moving tables — it’s about preserving business continuity, reducing downtime, and preparing for what comes next. That’s where most traditional approaches fall short. They require manual scripting, introduce long delays between syncs, and often break when schemas evolve.
Estuary Flow is built for modern data teams who want to avoid those pitfalls. It connects Redshift and Databricks through a streaming-first, no-code pipeline that supports:
- Continuous sync from Redshift with incremental change detection
- Real-time delivery into Databricks Delta tables via Unity Catalog
- Automated schema propagation, backfill, and transformation
- No-code configuration, deployable in minutes
By continuously syncing your Redshift data into Databricks, Estuary lets your team:
- Run analytics in Databricks without waiting for batch exports
- Migrate incrementally — reducing cutover risk
- Build future-proof workflows for machine learning, BI, and operations
If you're planning to use Databricks as your new central lakehouse, Estuary makes that transition seamless and production-ready.
Let’s walk through exactly how it works.
Method 1: Redshift to Databricks with Estuary Flow (Recommended)
Estuary Flow is a streaming-native ETL platform that makes it easy to build and manage data pipelines, with <100ms latency, real-time change detection, and no code required. Using Flow, you can continuously sync data from Redshift to Databricks while preserving schema, data integrity, and real-time updates.
Here’s how to set it up in two simple steps.
Prerequisites
Before you start, make sure you have:
Redshift
- Cluster or serverless access
- Username and password (or IAM role access)
- Permissions to read from the source schema/tables
Databricks
- SQL Warehouse enabled
- Unity Catalog with a target schema
- Personal Access Token (PAT)
- HTTP Path and SQL endpoint
Estuary Flow
- A free Estuary Flow account
- Network access to both Redshift and Databricks endpoints
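If any of these prerequisites are still missing, the SQL below is a minimal sketch of how you might set them up. The user, schema, and catalog names (estuary_user, analytics, main.raw_redshift) are placeholders; substitute your own.

```sql
-- Redshift: allow the capture user to read the source schema (placeholder names)
GRANT USAGE ON SCHEMA analytics TO estuary_user;
GRANT SELECT ON ALL TABLES IN SCHEMA analytics TO estuary_user;

-- Databricks (run in your SQL Warehouse): create the target Unity Catalog schema
CREATE SCHEMA IF NOT EXISTS main.raw_redshift;
```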
Estuary captures data from Redshift using its batch connector, which can ingest entire tables or perform incremental sync based on timestamps or primary keys.
Let’s start with the capture side.
Step 1: Capture Data from Redshift
- Log in to your Estuary Flow dashboard.
- In the left menu, navigate to Sources and click + New Capture.
- In the search bar, type “Redshift” and select the Amazon Redshift Batch connector.
- Fill in your Redshift connection details:
  - Name: e.g., redshift_to_databricks_migration
  - Server Address: e.g., redshift-cluster.xxxx.us-west-2.redshift.amazonaws.com
  - Database Name, Username, and Password
- Click Next, and Estuary will retrieve a list of available schemas and tables.
- Select the specific tables or entire schema you want to capture.
- Click Save and Publish to deploy the capture.
Once deployed, Estuary will automatically create Flow collections representing the selected Redshift tables. These collections act as schema-enforced, versioned datasets that stay continuously in sync with your source.
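Because the batch connector can sync incrementally on a timestamp or primary key, it helps to confirm up front that each source table has a usable cursor column. One way to check in Redshift (the analytics.orders table here is hypothetical):

```sql
-- Redshift: inspect column names and types to find a timestamp or key cursor column
SELECT column_name, data_type
FROM svv_columns
WHERE table_schema = 'analytics'
  AND table_name   = 'orders';
```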
Step 2: Materialize Data into Databricks
Now that your Redshift data is flowing into Estuary collections, the next step is to push those into a Databricks SQL Warehouse via a Unity Catalog Volume.
Follow these steps:
- In your Estuary dashboard, go to Destinations and click + New Materialization.
- Search for and select the Databricks connector.
- Configure the materialization with the following:
  - Address: Your Databricks SQL endpoint (e.g., dbc-abc123.cloud.databricks.com:443)
  - HTTP Path: From your SQL Warehouse settings
  - Catalog Name: e.g., main
  - Schema Name: e.g., raw_redshift or staging
  - Auth Type: Choose Personal Access Token
  - Token: Paste your PAT from Databricks
- In the Source Collections section, select the collections from your Redshift capture.
- Click Next, review settings, then hit Save and Publish.
Estuary now handles the rest — automatically creating tables and transactionally applying new inserts and updates as they arrive from Redshift.
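A quick way to confirm the materialization is landing data is to query the new table from your SQL Warehouse. The table name below (main.raw_redshift.orders) is a placeholder for whichever collection you materialized:

```sql
-- Databricks SQL: sanity-check that rows are arriving in the materialized table
SELECT COUNT(*) AS row_count
FROM main.raw_redshift.orders;
```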
What Happens Behind the Scenes
Once both the capture and materialization are active, Estuary Flow:
- Extracts full and incremental data from Redshift using batch scans or key-based syncs
- Streams data through its pipeline engine, applying optional transformations if configured
- Writes to Databricks Delta tables in your specified schema, with support for schema mapping and updates
- Handles retry logic, consistency guarantees, and schema enforcement end-to-end
This architecture ensures your Redshift data is always fresh and query-ready in Databricks, without manual exports, scripts, or scheduled jobs.
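Because the destination is a Delta table, you can also watch those transactional writes accumulate over time. A small check, again using a placeholder table name:

```sql
-- Delta Lake: each applied transaction appears as a commit in the table history
DESCRIBE HISTORY main.raw_redshift.orders;
```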
Start streaming from Redshift to Databricks in minutes. Try Estuary Flow free — no credit card required.
Method 2: One-Time Migration Using Redshift UNLOAD + Databricks Auto Loader
If your use case doesn’t require ongoing sync or real-time updates — for example, migrating historical data or setting up a new Databricks environment from a static Redshift snapshot — a manual batch migration using Amazon S3 and Databricks Auto Loader may be sufficient.
This approach gives you full control over the extract and load process, but it requires more setup, maintenance, and manual handling of schema consistency and updates.
Step 1: Export Data from Redshift to S3
Use Redshift’s native UNLOAD command to write data into S3 in CSV or Parquet format.
```sql
UNLOAD ('SELECT * FROM your_schema.your_table')
TO 's3://your-bucket/redshift-export/'
CREDENTIALS 'aws_access_key_id=xxx;aws_secret_access_key=yyy'
DELIMITER ','
ALLOWOVERWRITE
PARALLEL OFF;
```
Tip: Use PARQUET format if possible — it improves performance during the Databricks load step and supports schema inference.
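For reference, a Parquet export might look like the sketch below, assuming an IAM role attached to the cluster instead of static access keys (the role ARN is a placeholder):

```sql
UNLOAD ('SELECT * FROM your_schema.your_table')
TO 's3://your-bucket/redshift-export/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-unload-role'
FORMAT AS PARQUET
ALLOWOVERWRITE;
```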
Step 2: Configure Databricks Auto Loader
In Databricks, use Auto Loader with the cloudFiles source format to load the exported Redshift data from S3.
```python
# Auto Loader: incrementally read the exported files from S3 as a stream
df = (
    spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")  # or "csv"
    .option("cloudFiles.inferColumnTypes", "true")
    .load("s3://your-bucket/redshift-export/")
)

# Write the stream into a Delta table, allowing the schema to evolve
df.writeStream.format("delta") \
    .option("checkpointLocation", "/tmp/checkpoints/redshift") \
    .option("mergeSchema", "true") \
    .toTable("your_catalog.raw_redshift_table")
```
Make sure your Databricks workspace has IAM access to read from the S3 bucket. You may need to configure instance profiles or credential passthrough.
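Before starting the stream, you can check that the workspace can actually read the exported files. A minimal check, assuming the Parquet export path from Step 1:

```sql
-- Databricks SQL: read a few rows directly from the exported Parquet files
SELECT * FROM parquet.`s3://your-bucket/redshift-export/` LIMIT 10;
```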
Limitations of This Approach
- No change tracking: You’ll need to manually re-run the pipeline for any new data
- No schema evolution: Any schema changes in Redshift will need to be re-exported and reloaded
- Operational overhead: Managing S3 permissions, export jobs, and ingestion code adds complexity
For simple use cases or initial backfills, this method can be effective. But if your data keeps changing — or your team wants to avoid repeated manual work — Estuary Flow offers a much more robust, low-maintenance solution.
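To make the “no change tracking” point concrete: every refresh means re-running the export filtered by a watermark you track yourself. A hypothetical incremental re-export, assuming an updated_at column and a manually recorded cutoff:

```sql
-- Redshift: manual incremental re-export; the watermark must be tracked by hand
UNLOAD ('SELECT * FROM your_schema.your_table WHERE updated_at > ''2024-01-01 00:00:00''')
TO 's3://your-bucket/redshift-export/incremental/'
IAM_ROLE 'arn:aws:iam::123456789012:role/your-redshift-unload-role'
FORMAT AS PARQUET
ALLOWOVERWRITE;
```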
Ready to move real data? Sign up for Estuary Flow
Conclusion
Migrating from Redshift to Databricks isn’t just about switching platforms — it’s about leveling up your data architecture. Databricks offers unmatched flexibility for analytics, machine learning, and streaming, but the success of your migration depends on how you move the data.
While the manual S3-based method can get the job done for static snapshots or initial loads, it quickly becomes burdensome when real-time updates, schema changes, or operational simplicity are required.
Estuary Flow takes the friction out of data movement. With real-time CDC, automated schema handling, and seamless delivery into Databricks’ Unity Catalog, it’s the most efficient way to modernize your pipeline — without writing a single line of glue code.
Get Started with Estuary Flow
- Sign up for a free account: Estuary Flow Dashboard
- Explore the Redshift and Databricks connector docs
- Or talk to our team about your specific use case

About the author
Emily is a software engineer and technical content creator with an interest in developer education. She has experience across Developer Relations roles from her FinTech background and is always learning something new.