
Apache Kafka is the industry standard for high-throughput, real-time data ingestion — but it wasn’t built for complex analytics. Storing large volumes of Kafka data and running SQL queries for ML models, dashboards, or operational analytics? That’s where things start to break.
Enter Databricks, the unified platform for lakehouse analytics. With support for Delta Lake, streaming ingestion, and SQL warehouses, it’s a natural destination for Kafka pipelines.
But moving data from Kafka to Databricks isn’t always straightforward — unless you use the right tooling.
In this guide, we’ll show you:
- Why teams move data from Kafka to Databricks
- Two methods to do it: real-time with Estuary Flow vs. manual with Kafka Connect
- A step-by-step walkthrough for building a zero-code Kafka → Databricks pipeline using Estuary
- Performance, reliability, and transformation considerations
Let’s get streaming.
Why Stream Data from Kafka to Databricks?
Kafka is ideal for real-time ingestion, but it's not a storage or analytics engine. If you're trying to build data pipelines for dashboards, machine learning, or even basic reporting — you need a scalable, queryable platform on the other end.
That’s why modern teams integrate Kafka with Databricks SQL Warehouse, backed by Delta Lake and the Unity Catalog.
| Kafka | Databricks |
|---|---|
| Real-time event streaming | Scalable, transactional storage |
| Optimized for ingestion | Optimized for analytics |
| Limited querying capabilities | SQL engine with BI & ML support |
| Volatile message retention | Persistent, structured data |
Method 1: Stream Kafka to Databricks with Estuary Flow (Recommended)
Estuary Flow is a real-time ETL/ELT platform that connects data systems with <100ms latency, full CDC (Change Data Capture) support, and no code required.
It supports:
- Streaming ingestion from Kafka topics (JSON or Avro message format)
- Transformation using SQL or TypeScript
- Real-time materialization to Databricks tables via Unity Catalog
Prerequisites
Before you begin, make sure you have:
- A free Estuary Flow account (sign in with GitHub, Google, or Azure)
- Kafka cluster access with:
- bootstrap.servers, auth config, and TLS enabled
- Optional schema registry (required for Avro)
- A Databricks workspace with:
- SQL Warehouse
- Unity Catalog + schema
- Personal Access Token for authentication
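Before setting up the capture, it can help to confirm that your bootstrap servers and SASL credentials actually work. Here's a minimal connectivity check using the confluent-kafka Python client; the broker address, mechanism, and credentials are placeholders, so substitute your own values (or your AWS MSK IAM settings if that's how your cluster authenticates).

```python
# pip install confluent-kafka
from confluent_kafka.admin import AdminClient

# Placeholder connection details -- replace with your cluster's values.
conf = {
    "bootstrap.servers": "kafka1.example.com:9092",
    "security.protocol": "SASL_SSL",      # TLS, as required in the prerequisites above
    "sasl.mechanism": "SCRAM-SHA-256",
    "sasl.username": "your-kafka-username",
    "sasl.password": "your-kafka-password",
}

admin = AdminClient(conf)

# list_topics() fails quickly if the brokers are unreachable or the credentials are wrong.
metadata = admin.list_topics(timeout=10)
print("Connected. Topics visible to this user:", sorted(metadata.topics))
```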
Step-by-Step: Kafka to Databricks with Estuary Flow
Step 1: Capture Streaming Data from Kafka
Estuary’s Kafka connector captures records from topics using Avro or JSON formats.
- On the Estuary dashboard, go to Sources > + New Capture
- Search for Kafka and click Capture
- Enter your configuration details:
- bootstrap_servers: e.g. kafka1.example.com:9092
- Credentials: Username and password for SASL authentication or AWS access key information for AWS MSK IAM authentication
- Schema Registry (optional but recommended)
- Schema registry URL, username, and password (e.g., for Confluent Cloud)
- Click Next
- Select the topics you want to capture
- Save and Publish your capture
Once configured, Estuary will create Flow collections representing your Kafka topics in real time.
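If your topic is still empty, you can publish a few test records so the new collection has something to show. The snippet below is a rough sketch using the confluent-kafka Python producer; the topic name `events` and the connection settings are placeholders, and the message shape is just an example of a JSON payload the capture can handle.

```python
# pip install confluent-kafka
import json
import time
from confluent_kafka import Producer

# Same placeholder connection settings as the connectivity check above.
producer = Producer({
    "bootstrap.servers": "kafka1.example.com:9092",
    "security.protocol": "SASL_SSL",
    "sasl.mechanism": "SCRAM-SHA-256",
    "sasl.username": "your-kafka-username",
    "sasl.password": "your-kafka-password",
})

# Publish a handful of JSON events to a hypothetical "events" topic.
for i in range(5):
    record = {"event_id": i, "action": "page_view", "ts": time.time()}
    producer.produce("events", key=str(i), value=json.dumps(record))

producer.flush()  # block until all messages are delivered
```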
Step 2: Materialize Data to Databricks
Once your Kafka capture is active, stream that data into Databricks:
- After your capture is saved, click Materialize Collections (or go to Destinations > + New Materialization)
- Search for Databricks, then click Materialize
- Enter Databricks configuration:
- Address: Your SQL warehouse endpoint (e.g., dbc-abc.cloud.databricks.com)
- HTTP Path: Found in your SQL warehouse connection details
- Catalog Name: Name of your Unity Catalog
- Schema Name: e.g., raw_streaming_data
- Auth Type: PAT
- Personal Access Token: Paste from Databricks UI or CLI
- Confirm that Flow collections from Kafka are bound to destination tables; if not, add them in the Source Collections section
- Click Next, then Save and Publish
Estuary now handles:
- Uploading data to Unity Catalog Volumes
- Transactionally applying updates to Databricks Delta tables
- Automatic schema mapping, retry logic, and scheduling
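At this point the materialization is live. A quick way to confirm rows are landing is to query the destination table with the databricks-sql-connector package. This is a minimal sketch, not part of the Estuary setup itself; the hostname, HTTP path, token, catalog, schema, and table name are placeholders that should match the values you entered above (tables are named after your Kafka topics).

```python
# pip install databricks-sql-connector
from databricks import sql

# Placeholder connection details -- reuse the values from your materialization config.
with sql.connect(
    server_hostname="dbc-abc.cloud.databricks.com",
    http_path="/sql/1.0/warehouses/your-warehouse-id",
    access_token="your-personal-access-token",
) as conn:
    with conn.cursor() as cursor:
        # Hypothetical destination table: <catalog>.<schema>.<topic>
        cursor.execute("SELECT COUNT(*) FROM your_catalog.raw_streaming_data.events")
        print("Rows materialized so far:", cursor.fetchone()[0])
```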
Bonus: Streamlining for Scale
Estuary Flow also supports:
- Delta Updates: Improve latency by skipping table queries (use only if Kafka messages have unique keys)
- Transformations: Enrich or filter Kafka messages in-flight using SQL or TypeScript
- Backfill + CDC: Materialize historical + new Kafka messages without data loss
- Sync Schedule: Default sync delay is 30 min (configurable)
Method 2: Kafka Connect + Delta Lake Sink Connector
If you're managing your own infrastructure, you can use Kafka Connect with the Delta Lake Sink Connector to write Kafka topics to Databricks.
Prerequisites:
- Kafka Connect installed and running
- Delta Lake Sink Connector installed
- Access to Databricks workspace and SQL Warehouse
- Write permissions on target Delta Lake tables
Example JSON Configuration:
```json
{
  "name": "kafka-to-databricks",
  "config": {
    "connector.class": "io.delta.connectors.spark.DeltaSinkConnector",
    "topics": "events",
    "delta.tables": "/mnt/datalake/events",
    "spark.sql.catalog": "spark_catalog",
    "format": "delta",
    "checkpointLocation": "/mnt/datalake/_checkpoints"
  }
}
```
You can post this config to your Kafka Connect REST API:
```bash
curl -X POST http://localhost:8083/connectors \
  -H "Content-Type: application/json" \
  -d @connector-config.json
```
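After posting the config, you can poll the connector's status endpoint to confirm it came up. Here's a small sketch using Python's requests library, assuming the connector name from the config above and a Connect worker on localhost:8083.

```python
# pip install requests
import requests

# Kafka Connect REST API status endpoint for the connector defined above.
resp = requests.get("http://localhost:8083/connectors/kafka-to-databricks/status")
resp.raise_for_status()

status = resp.json()
print("Connector state:", status["connector"]["state"])
for task in status.get("tasks", []):
    print(f"Task {task['id']}: {task['state']}")
```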
Drawbacks:
- Complex deployment and configuration
- Requires Spark runtime with Delta support
- No GUI or transformation logic
- Manual schema management
- Less fault-tolerant than Estuary Flow
Looking for a faster, no-code way to move Kafka data to Databricks?
Estuary Flow gives you real-time pipelines, built-in CDC, and zero maintenance — all in minutes. 👉 Start your free pipeline →
Kafka to Databricks: Estuary Flow vs Kafka Connect
Estuary Flow offers a no-code, real-time Kafka to Databricks pipeline with built-in CDC and transformation support, while Kafka Connect requires manual setup and ongoing maintenance.
| Feature | Estuary | Kafka Connect + Delta Sink |
|---|---|---|
| No-code UI | Yes | No |
| Real-time streaming | Yes | Yes |
| Auto schema discovery | Yes | No |
| CDC support | Yes | Basic |
| Transformations (SQL/TS) | Yes | No |
| Built-in retries + monitoring | Yes | Manual |
| Integration with Unity Catalog | Yes | Complex |
| Setup time | Minutes | Hours |
Top Use Cases for Kafka to Databricks Integration
ML Feature Engineering
Stream Kafka events into Databricks to build real-time feature stores for ML models, with low-latency data ingestion and training-ready datasets.
Real-Time Analytics
Ingest IoT metrics, clickstreams, or logs from Kafka and run SQL-based analytics in Databricks for monitoring, alerting, or trend analysis.
E-commerce Personalization
Capture user behavior in Kafka and sync to Databricks to power personalized recommendations, funnel analysis, and A/B test insights.
Operational Dashboards
Enable live dashboards with up-to-date Kafka data in Delta Lake — perfect for tracking system health, orders, or business KPIs.
Compliance & Audit Logging
Store Kafka event streams in Databricks for secure, queryable audit logs to meet regulatory, security, and governance requirements.
Final Thoughts: From Streams to Insight in Minutes
You chose Kafka for real-time data streaming. Now it's time to unlock its full potential with powerful analytics in Databricks.
Whether you're building dashboards, training ML models, or analyzing IoT data — Estuary Flow gives you:
- Real-time Kafka ingestion
- Transactional Delta Lake materialization
- Zero code, zero delay
💡 Ready to go from Kafka to Databricks in minutes? Start your free Estuary Flow pipeline
FAQ: Kafka to Databricks
Can I connect Kafka to Databricks without Spark?
Yes — Estuary Flow handles this fully in the background. You don’t need to manage Spark.
What message formats are supported?
Estuary supports JSON and Avro. For Avro, a schema registry is required.
Is Estuary secure?
Yes. Flow supports TLS, secure credentials, and private deployments with VPC control.
Does Flow support Unity Catalog + Delta Lake?
Yes — the connector writes to Unity Catalog Volumes and applies updates transactionally to Delta tables.
Also exploring other destinations? Learn how to stream Kafka to BigQuery, Kafka to Iceberg, or Kafka to PostgreSQL using Estuary Flow.

About the author
Emily is a software engineer and technical content creator with an interest in developer education. She has experience across Developer Relations roles from her FinTech background and is always learning something new.