databricksmysql

14 min read

Last updated: March 6, 2026

How to Connect MySQL to Databricks: 2 Effective Methods

Expand the possibilities for your MySQL data! Learn how to migrate data from MySQL to Databricks using efficient integration methods.

Rob Meyer Technical Product Marketing (Data & Integration)

Share this article

Summarize this page with AI

Start Building For Free

Looking to load MySQL data into Databricks for real-time analytics, machine learning, or centralized reporting?

Whether you're building a real-time data pipeline or performing a one-time data migration, integrating MySQL (a widely used relational database) with Databricks (a unified data and AI platform) can significantly enhance your ability to generate insights, govern data, and automate workflows.

This guide explores two proven methods to connect MySQL to Databricks:

A manual approach using CSV export and import
An automated, real-time solution using Estuary, a low-code platform with built-in CDC and streaming capabilities

By the end, you’ll know exactly which method best fits your business case, whether you're managing small datasets manually or scaling real-time operations across your enterprise.

If you're eager to jump straight to the methods and steps for connecting MySQL to Databricks, feel free to click here to skip ahead.

An Overview of MySQL

Image Source

MySQL is one of the most popular relational database management systems (RDBMS) available today, renowned for its reliability and performance. It is acclaimed for its ability to organize data into structured tables and establish relationships among various data types. The SQL in MySQL implies you can utilize Structured Query Language (SQL) for creating, modifying, and extracting data from relational databases.

Here are some key features of MySQL:

High Performance: MySQL supports multiple storage engines, such as MyISAM and InnoDB, each optimized for different use cases. And, with efficient indexing of tables, MySQL enhances performance, speeding up data retrieval operations like SELECT queries. MySQL also employs various caching techniques to improve query response times.
Robust Security Measures: MySQL provides robust security features, including host-based verification and support for secure password mechanisms. It also allows detailed control of user privileges that can be configured per host, per database, or even per table.
Reliability: MySQL ensures data integrity and reliability with features like foreign key constraints and transactional support. Regular updates, bug fixes, and security patches help improve stability while keeping the system secure and up-to-date.

An Overview of Databricks

Image Source

Databricks is a cloud-based platform that utilizes the Apache Spark big data processing engine to manage and analyze large datasets efficiently. It can handle diverse workloads, including batch processing, real-time streaming, and machine-learning tasks, making it highly versatile.

To handle such workloads, Databricks optimizes cloud resources with dynamic allocation and scaling to match your workload requirements. This ensures efficient data processing and helps in cost optimization.

Here are some of the key features of Databricks:

Integrated Tools: Databricks offers comprehensive tools and services tailored for big data operations. These include support for multiple data formats, seamless integration with leading libraries and frameworks, and data analysis utilities.
Data Governance: Databricks offers robust data governance services, including audits and compliance controls. These functionalities empower you to regulate data access, manage data lifecycle, and fulfill their legal obligations and compliance objectives.
Encryption: Databricks employ encryption to protect data at rest and in transit. Some of the encryption measures include safeguarding data storage, network communication, and user authentication.

Why Migrate from MySQL to Databricks

Migrating from MySQL to Databricks can help unlock new data insights and improve decision-making. Let's examine the several benefits of this migration.

Unified Data Platform: Databricks offers a unified data platform that integrates all your data, analytics, and AI capabilities in a single environment. This simplifies data workflows, reduces integration complexity, and enables cross-functional collaboration.
Build Machine Learning Models: Databricks provides cutting-edge tools like Databricks AutoML for automated ML model training and Unity Catalog for model management. You can utilize these tools for a streamlined, full-cycle development of machine learning models.
Cost-Effective Scalability: Databricks optimizes resource allocation automatically, scaling up or down based on workloads. This helps reduce costs by ensuring you only pay for the resources you use.

How to Migrate Data from MySQL to Databricks: 2 Reliable Methods

You can connect or migrate your data from MySQL to Databricks using two proven approaches, depending on your data volume, sync frequency, and technical requirements:

Method 1: Real-Time ETL with Estuary

Use Estuary’s low-code platform to stream MySQL data into Databricks with Change Data Capture (CDC), schema validation, and Unity Catalog compatibility.

Method 2: Manual CSV Export/Import

Export data from MySQL as CSV files and upload them to Databricks using its built-in UI.

Let’s walk through each method step-by-step so you can choose the one that best fits your data integration needs.

Method #1: Using Estuary to Load Data from MySQL to Databricks

Estuary is a real-time ETL solution with low-code automation capabilities. It offers a user-friendly interface and pre-built connectors that help effortlessly transfer data between the source and destination in real time.

Some of the key features of Estuary are:

Transformation Options: Estuary allows you to perform real-time or batch transformations using SQL or TypeScript. You can use these transformations for ELT or the traditional ETL approach with Estuary.
Change Data Capture (CDC): Estuary leverages Change Data Capture (CDC) for real-time data processing. This low-latency technique helps track and capture the changes made to the source data, ensuring these updates are reflected in the destinations immediately.
Real-time Integration: The real-time data integration feature helps build pipelines of millisecond latency between data extraction and loading. This ensures immediate insights and swift decision-making since you can analyze and act upon the data almost as soon as it is generated.
No-code Configuration: Estuary offers 200+ pre-built connectors that can seamlessly synchronize any source and destination with just a few clicks. Configuring these connectors does not require writing a single line of code.

Prerequisites

Before you begin setting up a MySQL to Databricks pipeline using Estuary, make sure you have the following in place:

A MySQL Database

binlog_format must be set to ROW (required for CDC).
The binary log retention period should be at least 7 days to reduce the risk of unrecoverable failures.
Create a dedicated database user (commonly flow_capture) with the following privileges:
- REPLICATION CLIENT
- REPLICATION SLAVE
- SELECT on the tables being captured (plus information_schema if using automatic discovery).
If capturing DATETIME columns, configure the database time_zone to an IANA zone name (e.g., America/New_York) or a valid numerical offset.

The Mysql connector supports self-hosted MySQL, Amazon RDS, Amazon Aurora, Google Cloud SQL, and Azure Database for MySQL. It can also capture from read replicas if binlog is enabled.

A Databricks Workspace
You’ll need:

A Unity Catalog
A SQL Warehouse
A schema in the Unity Catalog for materialized tables
A user or service principal with sufficient permissions

Authentication

A Personal Access Token (PAT), or
A service principal access token (admins group only).

How it Works
Estuary’s Databricks connector first stages MySQL change data into a Unity Catalog Volume, then applies those changes transactionally into Databricks tables.

Advanced Options

Configure sync schedules (default delay: 30 minutes).
Enable delta updates to reduce query costs and latency for large datasets (useful if all rows have unique keys).
Use column mapping to handle schema evolution where Databricks does not allow direct column alteration.
Flow automatically handles reserved words by quoting them, but queries must reference them with the quotes.

An Estuary Account: If you haven’t already, sign up for a free Estuary account here →

Step 1: Configure MySQL as the Source

Login to your Estuary account to access the dashboard.
Click the Sources option on the left-side pane of the dashboard.
Click the + NEW CAPTURE button, and you will be redirected to the Create Capture page.

mysql to databricks - mysql connector search

On the Create Capture page, type MySQL in the Search connectors field. Click the Capture button of the real-time MySQL connector from the search results.

MySQL to Databricks - MySQL connector configuration

On the MySQL connector configuration page, fill in the necessary details, such as Name, Server Address, Login Username, and Login Password.
Finally, click NEXT on the top right corner and then SAVE AND PUBLISH to complete the source configuration.

Before publishing, confirm these required settings are in place on your MySQL instance

binlog_format is ROW
Binary log retention is ≥ 7 days
The capture user has REPLICATION CLIENT, REPLICATION SLAVE, and SELECT privileges
If any captured tables include DATETIME, ensure the database time_zone is set to an IANA name or numeric offset, or set the connector’s timezone

This CDC connector captures changes from the MySQL Binary Log. It supports self-hosted MySQL, Amazon RDS, Amazon Aurora, Google Cloud SQL, and Azure Database for MySQL. It can also capture from read replicas if binlog is enabled.

Step 2: Configure Databricks as the Destination

After the capture is created, click MATERIALIZE COLLECTIONS in the pop-up.
- Or go to Destinations > + NEW MATERIALIZATION to open Create Materialization.
Search for Databricks and select the Materialization option.

MySQL to Databricks - Databricks connector search

On the Databricks connector configuration page, provide:
- Address: Host[:port] of your SQL Warehouse
- HTTP path: HTTP path for the SQL Warehouse
- Catalog Name: Your Unity Catalog
- Schema Name: Default schema to materialize to
- Authentication:
  - Role: PAT
  - Personal Access Token: Your Databricks token
  - Alternatively, use a service principal access token if the principal has sufficient permissions

MySQL to Databricks - databricks connector configuration

In Source Collections, confirm the MySQL capture collections are bound to your Databricks tables.
Finally, click the NEXT > SAVE AND PUBLISH buttons to complete the destination configuration.

How the materialization runs

The connector uploads changes to a Unity Catalog Volume and then applies them transactionally to your Databricks tables.
You can configure a Sync Schedule to control update frequency. The default delay is 30 minutes.
For large datasets with unique keys, you can enable delta updates on a per-table basis to reduce read costs and latency.

Optional: If your Databricks environment restricts column alteration, enable column mapping on the target table with Delta protocol reader version 2+ and writer version 5+.

Ready to streamline your MySQL to Databricks data transfer? Sign up for your free Estuary account and start building today!

Advanced Notes for MySQL to Databricks with Estuary

While most use cases work out of the box with the standard setup, Estuary also supports advanced configurations that are useful for larger or more complex environments:

Skip Backfill: By default, Estuary backfills existing tables before switching to CDC. For very large tables, you can disable backfill per table and capture only new changes.
Read Replica Support: The MySQL connector can capture from read replicas, provided that binary logging is enabled and all prerequisites are met.
Column Mapping in Databricks: If your Databricks setup doesn’t allow direct column alterations, you can enable column mapping with Delta protocol reader v2+ and writer v5+. This makes schema evolution easier.
Reserved Words Handling: Flow automatically quotes Databricks reserved words when creating tables. Keep in mind that any queries must reference these quoted identifiers exactly.

These options make Estuary suitable not just for simple pipelines but also for production-scale data engineering needs.

Have a complex environment or need help tailoring your setup? Contact our team to discuss your use case.

See Estuary streamline your Databricks data loading in this quick video demo.

Method 2: Using CSV Export/Import to Move Data from MySQL to Databricks

This method involves extracting data from MySQL in CSV format and loading it into Databricks. Let’s look at the details of the steps involved in this method:

Step 1: Use MySQL Workbench to Extract MySQL Data as CSV Files

MySQL Workbench is an import/export wizard that allows you to export databases/tables using a graphical user interface (GUI). It supports exports in CSV and JSON formats. Download MySQL Workbench here.

Here are the steps to export MySQL data as CSV

Open MySQL Workbench. To connect to your MySQL database, click the + sign next to MySQL Connections on the home page.
Provide the necessary details, such as Connection Name, Connection Method, Hostname, Port, Username, and Password.

MySQL to Databricks - setup manual mysql connection

Image Source

Click the Test Connection button to complete the database connection
To export MySQL data, right-click on the table containing the data you want to export and select the Table Data Export Wizard option.

Image Source

Specify the data to be exported, including the Columns, Count (the number of rows to be exported from the top), and Row Offset (the number of rows to be exported from the bottom).

Image Source

Specify the File Path, format (CSV/JSON), Line Separator, Enclose String in, Field separator used to delimit values, and "null and Null word as SQL keyword". Then click the Next button to proceed.

MySQL to Databricks - choose output file path

Image Source

Once the export is successful, you can see a message similar to the one below.

Image Source

You can find the CSV files in the mentioned path to process the data.

Step 2: Load CSV to Databricks

Follow the steps below to load a CSV file to Databricks,

Login to your Databricks account and navigate to the dashboard.
In the Sidebar menu, click Data > Add Data > Create Table.
Drag and drop the CSV files from your local directory and upload them.
To access your data, click the Create Table with UI button.
Now, you can read and modify the CSV data in Databricks.

Challenges of Using CSV Export/Import to Move Data from MySQL to Databricks

While the CSV export/import process is effective for a MySQL-Databricks integration, there are some associated limitations, including:

Time-consuming: Manually extracting MySQL data as CSV and then loading it into Databricks is a time-consuming process. This method can be slow, especially when dealing with large volumes of data and limited network bandwidth.
Risk of Errors: The CSV export/import method requires manual intervention to execute and monitor various tasks. This can impact the data integrity and make it more error-prone, affecting the reliability of the transferred data.
Lack of Automation: The CSV export/import method lacks automation capabilities, making it difficult to streamline repetitive data transformation tasks. Maintaining accuracy, consistency, and efficiency throughout the migration process can be difficult without automation.
Lack of Real-Time Integration: The manual migration typically involves batch processing with data transferred on-need basis. There can be a significant delay between data updates in the source and destination system, resulting in the method lacking real-time integration capabilities.

Final Thoughts

By connecting Databricks to MySQL, you gain multiple benefits, including rapid data processing, advanced machine learning modeling, and the flexibility to adapt to varying data requirements. To migrate data from MySQL to Databricks, you can either use the manual CSV export/import method or Estuary.

The manual CSV export/import method can be time-consuming and require extensive technical expertise. Consider using Estuary to help overcome these limitations. With ready-to-use connectors and an intuitive interface, Estuary supports effortless data migration from source to destination with negligible latencies.

Would you like to transfer data between different data sources and destinations? To get started, log in or register for free here.

FAQs

What are the top ETL tools for transferring data from MySQL to Databricks?

The top three ETL tools to transfer data from MySQL to Databricks include Estuary and Fivetran. These tools help extract data from MySQL, transform it, and load it into Databricks to enhance data management capabilities.

Are there any best practices for optimizing Databricks data migration performance?

Best practices include parallelizing data transfer and transformation tasks, optimizing network bandwidth usage, caching intermediate results, and leveraging Databricks’ distributed processing capabilities. With Estuary, you can also configure sync schedules and enable delta updates for large datasets to reduce costs and latency.

What are the post-migration steps to ensure a successful migration to Databricks?

Post-migration steps include verifying data completeness and correctness within Databricks by comparing it with the source system. Ensure data pipelines, analytical tasks, and machine learning algorithms function properly. Conduct thorough testing of data processing and analytical workflows to validate their accuracy and reliability.

About the author

Rob MeyerTechnical Product Marketing (Data & Integration)

Rob is a technical product marketing leader with expertise in data engineering, databases, and integration technologies. He has previously worked with WSO2, Firebolt, Imply, GridGain, Axway, Informatica, and TIBCO, focusing on data platforms, APIs, and real-world data movement solutions.