
Moving data from Snowflake to Databricks looks simple at first, but most teams quickly discover that the two platforms do not natively sync in real time. Snowflake is optimized for scalable SQL analytics, while Databricks excels at machine learning, Delta Lake storage, and large-scale processing. When your AI pipelines, feature engineering workloads, or streaming use cases depend on fresh data, traditional batch exports or cloud storage hops are not fast or dependable enough.
The short answer is this: you can move data from Snowflake to Databricks using several methods, including cloud storage staging, ETL tools, and custom Spark jobs, but the only way to achieve reliable real-time CDC replication with sub-second latency is to use a right-time data platform like Estuary. Right-time means you choose exactly when data moves, whether sub-second, near real time, or batch, without rewriting pipelines or maintaining separate tools.
This guide explains every method for syncing Snowflake and Databricks, the tradeoffs of each approach, and why right-time data movement has become the new standard for analytics and AI teams. You will also learn how to set up a dependable Snowflake-to-Databricks pipeline using Estuary that delivers continuous updates, handles schema changes automatically, and requires no Spark code or orchestration.
Key Takeaways
- You can move data from Snowflake to Databricks using several methods, but only right-time platforms support reliable real-time CDC replication suitable for AI and machine learning workloads.
- Traditional ETL pipelines rely on batch jobs, which introduce delays, require ongoing maintenance, and break easily during schema changes or data spikes.
- A right-time data platform like Estuary lets you choose exactly when data moves, whether sub-second, near real time, or batch, without rewriting pipelines or managing multiple tools.
- Estuary continuously captures changes from Snowflake and delivers them directly into Databricks Delta Lake with exactly-once guarantees and automatic schema evolution.
- The result is faster model training, fresher analytics, and a more efficient architecture that unifies your warehouse and lakehouse without complex orchestration or custom Spark jobs.
Why Teams Sync Snowflake and Databricks
Snowflake and Databricks often sit together in the modern data stack because each platform excels at different parts of the analytics and AI lifecycle. Snowflake provides a simple and scalable environment for SQL workloads, BI reporting, and operational analytics. Databricks is built for advanced processing, large-scale data engineering, machine learning, and open formats like Delta Lake and Parquet. When combined, they give teams the flexibility to support both analytical and computationally heavy workflows without forcing everything onto a single system.
Many organizations want to connect Snowflake and Databricks so each platform can do what it does best:
- Analytics, dashboards, and business reporting in Snowflake
- Feature engineering, model training, and AI pipelines in Databricks
- Delta Lake storage for ML workflows
- Notebook-driven experimentation and data science in Databricks
These workflows depend heavily on data freshness. A model trained on stale data quickly becomes inaccurate. A feature store built from lagging tables produces weaker predictions. Real-time inference pipelines require fast and continuous updates to deliver relevant outputs.
Without a reliable method to keep Snowflake and Databricks in sync, teams end up with duplicated datasets, inconsistent versions of truth, and unnecessary compute cost. Unifying the two systems through dependable right-time data movement solves this problem by making sure that every downstream operation in Databricks always sees the latest version of Snowflake data.
Suggested Read: Databricks vs Snowflake
Core Challenges of Moving Data from Snowflake to Databricks
Syncing Snowflake and Databricks is harder than it looks because the two platforms do not have a native real-time connection. Most teams end up stitching together batch pipelines or custom scripts, which creates delays and reliability issues.
Here are the main challenges:
1. No direct, real-time path between the platforms
Snowflake cannot stream changes directly into Databricks, so you must build or manage a pipeline.
2. Batch pipelines slow down AI and analytics
Staging data in cloud storage and loading it with Auto Loader works for daily syncs but not for real-time or near real-time updates.
3. CDC is difficult to build yourself
Handling inserts, updates, deletes, checkpoints, and ordering from Snowflake streams requires significant engineering effort.
4. Schema changes break most pipelines
A new column or type change in Snowflake often requires manual updates to keep Databricks in sync.
5. High operational overhead
Orchestrators, Spark jobs, monitoring, alerting, and retries quickly become complex and expensive.
6. Risk of inconsistent or duplicate data
Without strong guarantees, Databricks may receive partial loads or duplicates, which directly impacts ML and AI accuracy.
All the Ways to Move Data from Snowflake to Databricks
There are several patterns for moving data from Snowflake to Databricks. They all live in the same general world of ETL, ELT, or data replication, but they make very different tradeoffs around latency, complexity, and control.
Below are the main approaches.
1. Batch file exports with Databricks Auto Loader
How it works
You export data from Snowflake to cloud storage (for example S3, GCS, or ADLS), then use Databricks Auto Loader to ingest those files into Delta Lake tables.
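As a rough sketch of this pattern, the snippet below unloads a Snowflake table to cloud storage with the Python connector and then picks the files up with Auto Loader. The account, credentials, stage, bucket paths, and table names are placeholders, and the two halves run in different environments: the unload wherever you schedule it, the ingest in a Databricks job or notebook.

```python
import snowflake.connector

# 1) Snowflake side: unload the table to cloud storage as Parquet.
#    Account, credentials, stage, and table names are placeholders.
conn = snowflake.connector.connect(
    account="xy12345.us-east-1",
    user="EXPORT_USER",
    password="...",
    warehouse="EXPORT_WH",
    database="ANALYTICS",
    schema="PUBLIC",
)
conn.cursor().execute("""
    COPY INTO @export_stage/orders/
    FROM orders
    FILE_FORMAT = (TYPE = PARQUET)
    HEADER = TRUE
    OVERWRITE = TRUE
""")

# 2) Databricks side: ingest whatever has landed with Auto Loader, then stop.
#    Bucket paths and the target table name are also placeholders.
(spark.readStream
    .format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/orders")
    .load("s3://my-bucket/exports/orders/")
    .writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("main.analytics.orders"))
```

Each run is still a batch: freshness is bounded by how often you schedule the export.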
When it fits
- Daily or hourly refreshes
- Reporting and BI that do not need real-time updates
Tradeoffs
- Simple, but always batch
- Requires managing export schedules and file layouts
- Sensitive to schema changes
2. Snowflake external tables and Auto Loader
How it works
Snowflake writes data to external storage that both Snowflake and Databricks can see. Databricks uses Auto Loader or standard reads to load from there.
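A minimal Databricks-side sketch, assuming a shared external location and the table name shown: a standard batch read picks up the files Snowflake has written, and mergeSchema lets new columns flow into the Delta table.

```python
# Read the files Snowflake has written to the shared external location.
# The path and target table name are assumptions for illustration.
df = spark.read.parquet("s3://shared-lake/snowflake_external/orders/")

# Append into a Delta table; mergeSchema allows new columns to be added.
(df.write
    .format("delta")
    .mode("append")
    .option("mergeSchema", "true")
    .saveAsTable("main.analytics.orders_raw"))
```

A plain read like this reprocesses every file on each run; tracking which files are new is exactly the bookkeeping Auto Loader would otherwise handle for you.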
When it fits
- Incremental batch ingestion
- Migration and offloading scenarios
Tradeoffs
- Still batch focused
- Extra storage cost and complexity
- Limited support for true change data capture
3. Managed ETL and replication platforms
This is where tools like Fivetran, Airbyte, and Estuary all live. They connect to Snowflake for you and handle most of the extract and load work into Databricks or Delta Lake. The key difference is how they treat time.
There are two main flavors.
3.1 Schedule-based ETL tools (Fivetran, Airbyte, Matillion, and similar)
These tools typically run syncs on a schedule and move data in batches.
Strengths
- Very quick to get started
- Great for standard SaaS-to-warehouse or warehouse-to-lakehouse replication
- Good fit when you only need data refreshed every few hours or once a day
Limitations
- Latency is tied to the schedule, not to individual changes
- Real-time or near real-time ML and streaming use cases are harder
- CDC support, where available, is often layered on top of a batch-oriented design
Best use cases
- Dashboards and reporting
- Periodic data warehouse to lakehouse syncs
- When cost and simplicity matter more than low latency
3.2 Real-time and CDC replication platforms (Estuary)
Estuary belongs in the same broad ETL and replication category, but is designed to support continuous change data capture with very low latency. Instead of waiting for scheduled syncs, Estuary streams inserts, updates, and deletes from Snowflake into Databricks Delta Lake as they happen.
How it works
- Connects to Snowflake using native CDC
- Streams each change into Estuary collections
- Materializes data into Databricks Delta tables in near real-time
- Also supports right-time control, letting teams choose sub-second, near real-time, or batch delivery
Strengths
- True CDC replication built for continuous updates
- Sub-second or near real-time latency
- Exactly-once delivery and schema enforcement
- Smooth handling of schema changes with no manual work
- No need for staging files or custom Spark jobs
Best for
- AI and ML feature pipelines that depend on fresh Snowflake data
- Real-time analytics and streaming use cases
- Continuous sync between warehouse and lakehouse
4. Custom Spark or Python pipelines
How it works
Engineering teams write their own Spark jobs, Python scripts, and orchestrations to pull from Snowflake and load into Databricks.
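A typical hand-rolled job looks something like the sketch below: pull an incremental slice through the Snowflake Spark connector, then merge it into Delta Lake. The connection options, secret scope, table names, key column, and updated_at watermark are all assumptions.

```python
from delta.tables import DeltaTable

# Snowflake Spark connector options; values here are placeholders.
# dbutils.secrets is available in Databricks notebooks and jobs.
sf_options = {
    "sfURL": "xy12345.us-east-1.snowflakecomputing.com",
    "sfUser": "ETL_USER",
    "sfPassword": dbutils.secrets.get("snowflake", "password"),
    "sfDatabase": "ANALYTICS",
    "sfSchema": "PUBLIC",
    "sfWarehouse": "TRANSFORM_WH",
}

# Pull only rows changed in the last hour (assumes an updated_at column).
changes = (spark.read
    .format("snowflake")
    .options(**sf_options)
    .option("query", """
        SELECT * FROM orders
        WHERE updated_at > DATEADD('hour', -1, CURRENT_TIMESTAMP())
    """)
    .load())

# Upsert into the Delta target by primary key.
target = DeltaTable.forName(spark, "main.analytics.orders")
(target.alias("t")
    .merge(changes.alias("s"), "t.ORDER_ID = s.ORDER_ID")
    .whenMatchedUpdateAll()
    .whenNotMatchedInsertAll()
    .execute())
```

Even this simplified version only handles inserts and updates; propagating deletes, persisting the watermark, and surviving schema changes are where most of the maintenance effort goes.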
When it fits
- Highly specialized logic that is hard to express in managed tools
- Teams that want full control and have staff to maintain it
Tradeoffs
- Significant engineering and maintenance overhead
- CDC and schema evolution are non-trivial to implement correctly
- Monitoring, retries, and observability are all custom work
5. Snowflake to Kafka to Databricks
How it works
Snowflake change events are pushed or replicated into Kafka, then Databricks consumes from Kafka as a streaming source.
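A minimal Databricks consumer sketch, assuming change events already arrive on a Kafka topic as JSON with the fields shown; the broker address, topic name, event schema, and table names are placeholders.

```python
from pyspark.sql import functions as F
from pyspark.sql.types import StructType, StructField, StringType, TimestampType

# Assumed shape of the change events on the topic.
event_schema = StructType([
    StructField("order_id", StringType()),
    StructField("status", StringType()),
    StructField("updated_at", TimestampType()),
])

# Read the topic as a stream; broker and topic names are placeholders.
events = (spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker-1:9092")
    .option("subscribe", "snowflake.orders.changes")
    .option("startingOffsets", "latest")
    .load())

# Kafka delivers bytes; decode the value and parse the JSON payload.
parsed = (events
    .select(F.from_json(F.col("value").cast("string"), event_schema).alias("e"))
    .select("e.*"))

# Continuously append the parsed changes to a Delta table.
(parsed.writeStream
    .option("checkpointLocation", "s3://lake/_checkpoints/orders_changes")
    .toTable("main.analytics.orders_changes"))
```

Getting Snowflake changes onto the topic in the first place, and turning this appended change log back into an up-to-date table, are separate problems you still have to solve.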
When it fits
- Organizations that already run Kafka at scale
- Event-driven architectures that treat data as streams
Tradeoffs
- Operationally complex
- Not ideal for straightforward table replication
- Another piece of infrastructure to manage
Why Right-Time Data Movement Matters
Not all pipelines need the same level of freshness. Some workloads run perfectly well with daily updates, while others require data within seconds. What most teams discover is that latency is not a single requirement. It varies across analytics, machine learning, and operational workflows. This is why right-time data movement matters.
Right-time simply means choosing the timing that fits each use case without building separate pipelines or tools. Some examples:
- Sub-second updates for real-time inference, fraud detection, or anomaly alerts
- Near real-time updates for ML feature stores, operational analytics, and fast experimentation
- Batch updates for dashboards, reporting, or compliance workloads
Traditional ETL pipelines force teams into a batch-only mindset. Real-time streaming systems force them into continuous processing even when they do not need it. Right-time data movement removes this constraint by supporting all timing models inside one platform.
When moving data from Snowflake to Databricks, right-time replication ensures that machine learning pipelines, notebooks, and Delta Lake tables always see the most current version of your data. This leads to more accurate models, faster experimentation, and simpler architectures that do not require multiple separate systems for batch and streaming.
Step-by-Step Guide: Snowflake to Databricks with Estuary
You can set up a Snowflake to Databricks pipeline in a few minutes. The process requires no Spark jobs, no orchestration tools, and no staging in cloud storage. Estuary handles capture, storage, transformation, and delivery for you.
Below is the complete setup workflow.
Step 1: Connect Snowflake as Your Source
- Log in to the Estuary Dashboard. If you don’t have an account yet, create one for free — no credit card required.
- In the left sidebar, click on Sources, then hit the + New Source button.
- From the list of connectors, select Snowflake and click Capture.
- Enter your Snowflake credentials:
  - Host: Your Snowflake account URL (e.g. xy12345.us-east-1.snowflakecomputing.com)
  - Database and Warehouse: Where your source data lives.
  - User and Password: A Snowflake user with appropriate roles (we recommend creating a dedicated ESTUARY_USER).
- Estuary will auto-discover your Snowflake schema. Select one or more tables to sync. Estuary will now capture all inserts, updates, and deletes in real time using CDC.
Step 2: Set Up Databricks as the Destination
- From the dashboard sidebar, go to Destinations and click + New Materialization.
- Select Databricks from the list and click Materialize.
- Fill in your Databricks configuration details:
  - Address: Host and port for your SQL warehouse.
  - HTTP Path: From your SQL warehouse.
  - Catalog Name: Name of your Unity Catalog.
  - Personal Access Token: A personal access token generated in your Databricks workspace.
- Link the collections from your Snowflake source to this Databricks materialization. Estuary will ensure schema compatibility.
Step 3: Save and Activate the Pipeline
- Click Save & Publish to activate your pipeline. Estuary begins streaming data from Snowflake to Databricks immediately.
- From the dashboard, you can:
- Monitor sync status and latency in real time
- View row counts and throughput
- Edit schemas and transformations
- Enable logging and error alerts
- Want to transform data in-flight? Use Estuary’s UI for field mappings, or go deeper with SQL and TypeScript derivations.
Need to scale across more tables or use cases? Repeat the same flow. Estuary supports multiple pipelines and horizontal scaling.
You’ve now built a production-ready, real-time data pipeline from Snowflake to Databricks in minutes, with zero code and full observability.
Handling Schema Evolution Automatically
Snowflake schemas change over time. New columns are added, types are updated, and tables evolve. Estuary is designed so these changes flow through to Databricks with minimal work from you.
Snowflake side
- The Snowflake CDC connector manages its own streams and transient staging tables in a schema such as ESTUARY_STAGING.
- When you add or change columns in a source table, Snowflake streams expose the updated schema.
- Estuary picks up the new structure on the next polling cycle (default 5 minutes, configurable with the interval field in the capture spec).
- The corresponding Flow collection schema is updated so new fields are available downstream.
You do not need to manually manage streams or staging tables.
Databricks side
- The Databricks materialization writes into Delta Lake tables using merge by default.
- As the collection schema evolves, new fields can be added to the Databricks table as part of normal syncs, assuming your Delta table allows schema evolution.
- You can optionally enable delta_updates for high-volume workloads, and columnMapping in Delta if your environment needs name-based schema evolution.
Most of the time, new fields simply appear in the target table after the next sync.
What you actually need to do
Usually nothing, except:
- Update any transformations if they rely on fields that changed or were removed.
- Verify your Delta tables are configured to allow schema evolution and, if needed, enable column mapping for more complex changes.
You do not need to rebuild the pipeline, recreate tables, or write Spark code just because the Snowflake schema changed.
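If you want to double-check the Delta side, a small sketch like the one below (the table name is a placeholder) enables automatic schema merging for the session and turns on name-based column mapping for a target table.

```python
# Let MERGE and append operations add new columns automatically in this session.
spark.conf.set("spark.databricks.delta.schema.autoMerge.enabled", "true")

# Enable name-based column mapping on the target table, which more complex
# changes such as renames rely on. The table name is a placeholder.
spark.sql("""
    ALTER TABLE main.analytics.orders SET TBLPROPERTIES (
        'delta.columnMapping.mode' = 'name',
        'delta.minReaderVersion' = '2',
        'delta.minWriterVersion' = '5'
    )
""")
```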
Common Challenges and How Estuary Solves Them
Even well-designed Snowflake to Databricks pipelines face predictable issues. Estuary addresses these challenges so data stays consistent and pipelines stay reliable.
1. Large initial backfills
Challenge: Copying the full contents of a Snowflake table into Databricks can be slow and resource-heavy.
Estuary solution: Performs an automated backfill once, then switches to CDC. You can control the process with backfill settings or adjust the Snowflake polling interval for cost or freshness.
2. Latency and warehouse cost
Challenge: Real-time syncs often require keeping a Snowflake warehouse running.
Estuary solution: Uses a configurable polling interval (default 5 minutes) so the warehouse only runs when needed. Lower intervals give fresher data, while higher intervals reduce cost.
3. CDC complexity
Challenge: Handling inserts, updates, deletes, ordering, and checkpoints manually is error prone.
Estuary solution: Manages Snowflake streams, staging tables, and checkpointing internally with exactly-once guarantees.
4. Schema evolution
Challenge: New columns or type changes often break pipelines.
Estuary solution: Automatically detects schema changes from Snowflake and updates the downstream Delta Lake table when possible.
5. Consistency and duplicate prevention
Challenge: Many DIY and batch pipelines introduce duplicate rows or partial updates.
Estuary solution: Materializes into Delta Lake using merge or delta updates with consistent keys and strong ordering.
6. Operational overhead
Challenge: Multiple tools, orchestrators, and storage layers increase complexity.
Estuary solution: Capture, storage, transformation, and delivery all happen in one platform with unified monitoring.
Conclusion
Moving data from Snowflake to Databricks is no longer just a matter of linking two platforms. It is about keeping analytics, machine learning, and operational systems aligned with the freshest version of your data. Batch pipelines can still work for slow-changing workloads, but real-time or near real-time use cases require continuous replication that can adapt as schemas evolve and tables grow.
Estuary makes this possible by capturing Snowflake changes through CDC, shaping them in Flow collections, and delivering them directly into Delta Lake with strong consistency and minimal operational work. This approach gives teams dependable pipelines, predictable costs, and the flexibility to choose the timing that fits each workload, whether sub-second, near real-time, or scheduled batch.
If your goal is to unify Snowflake analytics with Databricks processing, Estuary provides a straightforward and reliable way to keep both systems in sync and ready for modern AI and data engineering needs.
Next Steps
- Explore the Estuary Demo: See how right-time data pipelines work in action.
- Start Your First Integration: Set up your Snowflake to Databricks pipeline in minutes with Estuary’s no-code interface.
- Learn from the Documentation: Explore configuration, delta updates, and advanced transformations.
- Talk to an Expert: Have specific latency, compliance, or architecture needs? Connect with our team.
FAQs
How do I replicate Snowflake tables to Delta Lake with real-time updates?
Use a CDC-based platform such as Estuary: connect Snowflake as a capture source, select the tables you want to sync, and materialize them into Databricks Delta tables. Changes then stream continuously instead of waiting on scheduled batch jobs.
How does schema evolution work when syncing Snowflake to Databricks?
Estuary detects new or changed columns through the Snowflake CDC connector and updates the Flow collection schema automatically. The new fields are added to the Delta table on the next sync, provided the table allows schema evolution.
How are inserts, updates, and deletes handled in Databricks when using Snowflake CDC?
Each change event captured from Snowflake is applied to the Delta table with a merge on consistent keys (or with delta updates for high-volume workloads), so the target stays free of duplicates and partial loads.

About the author
Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Dani focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.
