
How to Move Data from Snowflake to Databricks in Real Time

Learn how to move data from Snowflake to Databricks in real time using Estuary. Build dependable right-time pipelines for analytics, AI, and machine learning with no code and continuous data sync between your warehouse and lakehouse.

Stream Snowflake CDC to Databricks with Estuary Flow

You can stream data from Snowflake to Databricks in real time using Estuary, a right-time data platform that lets you move data continuously without managing batch pipelines or manual exports. This integration keeps Databricks always up to date with your latest Snowflake tables, powering faster analytics, AI workflows, and machine learning pipelines.

Snowflake is built for scalable SQL analytics, while Databricks excels at advanced processing, streaming, and open data formats like Delta Lake. When connected through Estuary, the two platforms complement each other perfectly, enabling low-latency data sharing, reduced compute costs, and unified access to your most current data.

In this guide, you’ll learn how to set up a reliable Snowflake to Databricks connection in minutes using Estuary, along with key reasons this real-time approach outperforms traditional ETL pipelines.

Key Takeaways

  • You can connect Snowflake and Databricks to combine scalable SQL analytics with advanced AI and machine learning workloads.
  • Traditional ETL or batch pipelines introduce latency and maintenance overhead, slowing down data-driven workflows.
  • Estuary enables right-time data movement, letting you stream Snowflake data to Databricks with sub-second latency.
  • No coding or orchestration tools are required — Estuary automatically handles schema evolution, monitoring, and recovery.
  • The result: faster insights, real-time model training, and lower compute costs across your warehouse and lakehouse environments.

Snowflake and Databricks: Better Together, Not One vs the Other

Image: Snowflake vs. Databricks Cost Over Time. Snowflake’s compute costs scale steeply over time (Image Source).

At first glance, Snowflake and Databricks might seem like competing platforms. But in reality, they’re built for different strengths—and when used together, they can unlock far more value than either one alone.

Snowflake is a fully managed cloud data warehouse known for its ease of use, native SQL support, and high performance for structured analytics. It’s ideal for dashboards, BI tools, and operational reporting.

Databricks, on the other hand, is a powerful lakehouse platform built on Apache Spark. It shines in machine learning, streaming, and large-scale data processing—especially when working with open file formats like Delta Lake or Parquet. It's also where many teams are running AI workflows, from real-time inference to model training and experimentation.

In a modern data stack, it’s common to see both tools working side by side:

  • Product analytics in Snowflake, feature engineering in Databricks
  • SQL dashboards in Snowflake, AI model pipelines in Databricks
  • Raw event data stored in Snowflake, real-time enrichment and MLOps in Databricks

But here’s the catch: they don’t sync out of the box. And AI pipelines are only as good as the freshness and completeness of the data they’re trained on.

That’s why moving data from Snowflake to Databricks—continuously, in real time—has become a key part of modern architectures. Instead of treating these systems as silos, the smart move is to connect them and let each do what it does best.

Suggested Read: Databricks vs Snowflake

Why Moving Data from Snowflake to Databricks Is Harder Than It Looks

While both Snowflake and Databricks are powerful in their own right, connecting them is surprisingly difficult. Most teams start with manual or batch-based methods, only to find themselves stuck with brittle workflows that don’t scale.

Here are the common roadblocks:

  • No native sync: Snowflake doesn’t offer built-in connectors to stream data directly into Databricks. You’ll need custom pipelines or third-party tools to bridge the gap.
  • ETL is complex and slow: Traditional extract-transform-load (ETL) pipelines are often batch-oriented. They introduce hours of latency, which kills real-time use cases like AI-powered recommendations or live dashboards.
  • Maintenance overhead: Managing scripts, orchestrators, and schema changes across two evolving platforms becomes a full-time job. One change in Snowflake can break your entire Databricks workflow.
  • Data duplication or loss risks: Without exactly-once delivery and schema enforcement, syncing can result in duplicates, partial updates, or broken AI inputs.
  • Limited flexibility: Most off-the-shelf ETL tools don’t support custom transformations, streaming updates, or hybrid cloud environments well enough to keep up.

If your team relies on fresh Snowflake data for AI pipelines, ML features, or real-time metrics in Databricks, you can’t afford to wait hours—or rebuild pipelines every time something changes.

This is where Estuary Flow makes a meaningful difference.

How to Move Data from Snowflake to Databricks Using Estuary (Step-by-Step)


Estuary makes it easy to stream data from Snowflake to Databricks — no pipelines to maintain, no custom Spark jobs, no batch scripts.

In this step-by-step guide, you’ll connect Snowflake as your source using CDC and deliver data continuously into Databricks in Delta Lake format. This gives you sub-second latency, automatic schema handling, and a fully managed pipeline.

Let’s walk through the setup:

Step 1: Connect Snowflake as Your Source

Snowflake CDC source connector in Estuary
  1. Log in to the Estuary Dashboard. If you don’t have an account yet, create one for free — no credit card required.
  2. In the left sidebar, click on Sources, then hit the + New Source button.
  3. From the list of connectors, select Snowflake and click Capture.
  4. Enter your Snowflake credentials:
    • Host: Your Snowflake account URL (e.g. xy12345.us-east-1.snowflakecomputing.com)
    • Database and Warehouse: Where your source data lives.
    • User and Password: A Snowflake user with appropriate roles (we recommend creating a dedicated ESTUARY_USER; see the setup sketch after this list).
  5. Estuary will auto-discover your Snowflake schema. Select one or more tables to sync. Estuary will now capture all inserts, updates, and deletes in real time using CDC.
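
If you want to follow the dedicated-user recommendation from step 4, here is a minimal sketch of how an admin might create that user with the snowflake-connector-python package (pip install snowflake-connector-python). The role, warehouse, database, and schema names are placeholders, and the exact privileges Estuary requires may differ, so treat this as a starting point and confirm against Estuary’s Snowflake connector documentation.

```python
# One-time admin setup (sketch): a dedicated, read-only Snowflake user for Estuary.
# ESTUARY_ROLE, MY_WH, MY_DB, and PUBLIC are placeholders; adjust to your account.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",   # your account identifier
    user="ADMIN_USER",             # a user allowed to create users and roles
    password="********",
    role="ACCOUNTADMIN",
)

statements = [
    "CREATE ROLE IF NOT EXISTS ESTUARY_ROLE",
    "CREATE USER IF NOT EXISTS ESTUARY_USER PASSWORD = '<strong-password>' DEFAULT_ROLE = ESTUARY_ROLE",
    "GRANT ROLE ESTUARY_ROLE TO USER ESTUARY_USER",
    "GRANT USAGE ON WAREHOUSE MY_WH TO ROLE ESTUARY_ROLE",
    "GRANT USAGE ON DATABASE MY_DB TO ROLE ESTUARY_ROLE",
    "GRANT USAGE ON SCHEMA MY_DB.PUBLIC TO ROLE ESTUARY_ROLE",
    "GRANT SELECT ON ALL TABLES IN SCHEMA MY_DB.PUBLIC TO ROLE ESTUARY_ROLE",
]

with conn.cursor() as cur:
    for stmt in statements:
        cur.execute(stmt)
conn.close()
```

A scoped role like this keeps the capture’s access limited to the warehouse and tables you actually intend to sync.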

Step 2: Set Up Databricks as the Destination

Databricks destination connector in Estuary
  1. From the dashboard sidebar, go to Destinations and click + New Materialization.
  2. Select Databricks from the list and click Materialize.
  3. Fill in your Databricks configuration details (you can sanity-check these values with the snippet after this list):
    • Address: Host and port for your SQL warehouse.
    • HTTP Path: From your SQL warehouse.
    • Catalog Name: Name of your Unity Catalog.
    • Personal Access Token: Generate a Personal Access Token in Databricks.
  4. Link the collections from your Snowflake source to this Databricks materialization. Estuary will ensure schema compatibility.
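
Before saving, it can help to confirm the connection details outside of Estuary. The short sketch below (not part of Estuary itself) uses the open-source databricks-sql-connector package (pip install databricks-sql-connector) to check that the hostname, HTTP path, and personal access token work; all values shown are placeholders.

```python
# Sanity check: can we reach the SQL warehouse with these credentials?
from databricks import sql

connection = sql.connect(
    server_hostname="dbc-a1b2c3d4-e5f6.cloud.databricks.com",  # placeholder host
    http_path="/sql/1.0/warehouses/abc123def456",               # placeholder HTTP path
    access_token="dapiXXXXXXXXXXXXXXXX",                        # placeholder personal access token
)

with connection.cursor() as cursor:
    cursor.execute("SELECT current_catalog(), current_schema(), current_timestamp()")
    print(cursor.fetchone())

connection.close()
```

If this query returns a row, the same values should work in the materialization form.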

Step 3: Save and Activate the Pipeline

  1. Click Save & Publish to activate your pipeline. Estuary begins streaming data from Snowflake to Databricks immediately.
  2. From the dashboard, you can:
    • Monitor sync status and latency in real time
    • View row counts and throughput
    • Edit schemas and transformations
    • Enable logging and error alerts
  3. Want to transform data in-flight? Use Estuary’s UI for field mappings, or go deeper with SQL and TypeScript derivations.

Need to scale across more tables or use cases? Repeat the same flow. Estuary supports multiple pipelines and horizontal scaling.

You’ve now built a production-ready, real-time data pipeline from Snowflake to Databricks in minutes, with zero code and full observability.
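
For an extra end-to-end check, you can compare row counts for one synced table on both sides. The sketch below uses hypothetical names (ORDERS in Snowflake, main.estuary.orders in Databricks) and the same client libraries as above; swap in your own identifiers and credentials.

```python
# Optional verification: compare row counts for one table in Snowflake and Databricks.
# All connection values and table names are placeholders.
import snowflake.connector
from databricks import sql as dbsql

snow = snowflake.connector.connect(
    account="xy12345.us-east-1", user="ESTUARY_USER", password="********",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
with snow.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM ORDERS")
    snowflake_count = cur.fetchone()[0]
snow.close()

dbx = dbsql.connect(
    server_hostname="dbc-a1b2c3d4-e5f6.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/abc123def456",
    access_token="dapiXXXXXXXXXXXXXXXX",
)
with dbx.cursor() as cur:
    cur.execute("SELECT COUNT(*) FROM main.estuary.orders")
    databricks_count = cur.fetchone()[0]
dbx.close()

print(f"Snowflake: {snowflake_count} rows, Databricks: {databricks_count} rows")
# Counts may differ briefly while changes are in flight, but should converge.
```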

Why Use Estuary to Move Data from Snowflake to Databricks?

Moving data from Snowflake to Databricks might sound simple in theory, but maintaining reliability, low latency, and scalability in practice is another story. Here’s why Estuary is the smart choice for teams looking to bridge the two platforms:

Real-Time Streaming with Sub-Second Latency

Estuary supports Snowflake CDC out of the box. That means you’re not relying on batch jobs or time-consuming DIY implementations — your data flows continuously with latency low enough to power real-time dashboards and analytics in Databricks.

You can also ensure data makes it into Snowflake in real time in the first place with Estuary’s Snowpipe Streaming integration. Any latency upstream cascades along the pipeline, so a truly real-time solution needs the lowest possible latency at each step of the journey.

No-Code Setup, End-to-End

Traditional approaches require writing custom Spark jobs, managing orchestration tools, or configuring middleware like Kafka. Estuary eliminates that complexity. You configure your pipeline once through the UI, and Estuary handles the rest — from data capture to delivery.

Automatic Schema Management

When your schema changes in Snowflake — a new column, a renamed field, or a changed data type — Estuary Flow can automatically adapt. No broken pipelines, no manual intervention, and no downstream data loss.
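
As a concrete illustration (not an Estuary feature demo), the kind of change this refers to is something like adding a column to a captured table; with schema evolution enabled on your pipeline, the expectation is that the change propagates without manual edits. Table and credential values below are placeholders.

```python
# Example schema change in Snowflake: add a column to a captured table.
# Names and credentials are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1", user="ADMIN_USER", password="********",
    warehouse="MY_WH", database="MY_DB", schema="PUBLIC",
)
with conn.cursor() as cur:
    cur.execute("ALTER TABLE ORDERS ADD COLUMN LOYALTY_TIER VARCHAR(32)")
conn.close()
```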

Delta Lake Compatibility

Data lands in Databricks in Delta Lake format, which means it’s immediately queryable and ACID-compliant. Whether you're building ML pipelines or interactive dashboards, you can trust your data is fresh and reliable.
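
For example, once the materialization is running you can query the table from a Databricks notebook and inspect its Delta transaction history. The table name below is a placeholder for whatever Estuary materializes into your catalog, and the updated_at column is assumed for illustration.

```python
# Run inside a Databricks notebook, where `spark` is provided by the runtime.
# "main.estuary.orders" is a placeholder table name.
df = spark.read.table("main.estuary.orders")
df.orderBy("updated_at", ascending=False).show(5)  # assumes an updated_at column

# Delta's transaction log records every committed write as a table version,
# which is what makes the table ACID-compliant and safe to query mid-sync.
spark.sql("DESCRIBE HISTORY main.estuary.orders") \
    .select("version", "timestamp", "operation") \
    .show(10, truncate=False)
```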

Built-In Transformations

Need to reshape or clean your data before it hits Databricks? Estuary supports field mappings, filtering, and derived collections using SQL or TypeScript, right in the pipeline.

Unified Monitoring and Observability

With Estuary’s dashboard, you get full visibility into every pipeline: throughput, latency, sync health, error tracking, and more. No more jumping between tools or building your own observability stack.

Flexible Enough to Fit Any Workflow

Want to mix real-time and batch? Build pipelines across cloud regions? Maintain strict compliance with private deployments? Estuary’s flexible architecture — including support for BYOC — gives you full control without locking you into a rigid model.

Conclusion

Syncing data between Snowflake and Databricks is no longer just about migration. It is about enabling analytics and AI systems to work from the same, freshest version of data. Batch jobs and manual exports cannot meet that standard anymore.

Using a right-time data platform like Estuary, teams can continuously move data from Snowflake to Databricks with sub-second latency, automatic schema handling, and exactly-once reliability. The result is a more efficient architecture where Snowflake remains the foundation for analytics and Databricks becomes the engine for machine learning and large-scale processing.

Whether you are building real-time dashboards, feature pipelines, or unified data models, the key is to control when and how your data moves while balancing performance, cost, and reliability. Estuary makes that balance possible through unified right-time data movement.

Next Steps

Ready to try it yourself? Create a free Estuary account and build your first Snowflake to Databricks pipeline in minutes.

FAQs

    What does Estuary do that tools like Fivetran or Airbyte don’t?

    Estuary supports real-time streaming using CDC and syncs into Delta Lake. Most ETL tools focus on batch extraction and may not support Databricks well, or may not support Snowflake as a source at all. Estuary’s architecture is built for low-latency pipelines and event-based syncs.

    Do I need special Snowflake or Databricks editions to use Estuary?

    No, you can use Estuary with standard Snowflake and Databricks accounts as long as the required APIs and credentials are enabled.

    How does this approach affect compute costs?

    Estuary is optimized to reduce compute costs by minimizing warehouse usage in Snowflake and streaming efficiently to Delta Lake in Databricks. You also eliminate the operational cost of managing batch infrastructure and scheduling jobs.

    Can I move data from Snowflake to Databricks manually?

    Yes, it’s possible to move data from Snowflake to Databricks manually, but the process is time-consuming, error-prone, and not suitable for real-time use cases. The most common manual method involves exporting data from Snowflake into files—usually in CSV or Parquet format—then uploading those files into cloud storage like Amazon S3 or Azure Blob Storage. From there, you would configure Databricks to ingest the files into Delta Lake tables using notebook jobs or data ingestion tools like Auto Loader.

    While this approach can work for one-time transfers or infrequent updates, it introduces significant delays, lacks automation, and doesn’t scale well for production workloads. You’ll also need to handle schema changes, file cleanup, job scheduling, and failure monitoring manually. That’s why many teams are shifting to automated, real-time solutions like Estuary Flow, which eliminates batch jobs and continuously streams Snowflake data to Databricks with minimal setup and no code. A rough sketch of that manual path is shown below.
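
For reference, here is roughly what that manual path looks like in practice, sketched under assumptions: an external stage backed by S3, Parquet files, and Auto Loader in a Databricks notebook. All stage, bucket, and table names are placeholders.

```python
# Step 1 runs in Snowflake (shown as a comment); Step 2 runs in a Databricks
# notebook, where `spark` is provided by the runtime.
#
# Step 1 - unload from Snowflake to an external stage backed by cloud storage:
#   COPY INTO @my_s3_stage/exports/orders/
#   FROM MY_DB.PUBLIC.ORDERS
#   FILE_FORMAT = (TYPE = PARQUET) HEADER = TRUE;
#
# Step 2 - ingest the exported files into a Delta table with Auto Loader:
(
    spark.readStream.format("cloudFiles")
    .option("cloudFiles.format", "parquet")
    .option("cloudFiles.schemaLocation", "s3://my-bucket/_schemas/orders")
    .load("s3://my-bucket/exports/orders/")
    .writeStream
    .option("checkpointLocation", "s3://my-bucket/_checkpoints/orders")
    .trigger(availableNow=True)
    .toTable("main.manual.orders")
)
# You still own re-running the unload, cleaning up files, evolving schemas,
# and monitoring failures - the overhead described in the answer above.
```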

    Can Snowflake and Databricks be used together in the same data stack?

    Absolutely. In fact, using Snowflake and Databricks together is increasingly common among data-driven teams. Snowflake serves as a trusted source for clean, structured data, while Databricks provides the flexibility and compute power for streaming, feature engineering, and model training. The key is having a reliable way to move data between them—preferably in real time. That’s where tools like Estuary Flow come in, enabling continuous, low-latency syncs from Snowflake to Databricks without manual pipelines or batch delays.


About the author

Team Estuary (Estuary Editorial Team)

Team Estuary is a group of engineers, product experts, and data strategists building the future of real-time and batch data integration. We write to share technical insights, industry trends, and practical guides.

