
Looking to load MySQL data into Databricks for real-time analytics, machine learning, or centralized reporting?
Whether you're building a real-time data pipeline or performing a one-time data migration, integrating MySQL (a widely used relational database) with Databricks (a unified data and AI platform) can significantly enhance your ability to generate insights, govern data, and automate workflows.
This guide explores two proven methods to connect MySQL to Databricks:
- A manual approach using CSV export and import
- An automated, real-time solution using Estuary Flow, a low-code platform with built-in CDC and streaming capabilities
By the end, you’ll know exactly which method best fits your business case, whether you're managing small datasets manually or scaling real-time operations across your enterprise.
If you're eager to jump straight to the methods and steps for connecting MySQL to Databricks, feel free to click here to skip ahead.
An Overview of MySQL
MySQL is one of the most popular relational database management systems (RDBMS) available today, renowned for its reliability and performance. It organizes data into structured tables and lets you define relationships between them. The "SQL" in MySQL stands for Structured Query Language, the language you use to create, modify, and query relational data.
Here are some key features of MySQL:
- High Performance: MySQL supports multiple storage engines, such as MyISAM and InnoDB, each optimized for different use cases. Efficient table indexing speeds up data retrieval operations like SELECT queries, and various caching techniques further improve query response times.
- Robust Security Measures: MySQL provides robust security features, including host-based authentication and support for secure password mechanisms. It also allows fine-grained control of user privileges, which can be configured per host, per database, or even per table.
- Reliability: MySQL ensures data integrity and reliability with features like foreign key constraints and transactional support. Regular updates, bug fixes, and security patches help improve stability while keeping the system secure and up-to-date.
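To make these features concrete, here is a small illustrative sketch showing an InnoDB table with an index and a foreign key constraint, plus a simple query. The table and column names are invented for the example and are not part of any specific migration.

```sql
-- Illustrative InnoDB table with an index and a foreign key constraint
CREATE TABLE orders (
  id          INT PRIMARY KEY AUTO_INCREMENT,
  customer_id INT NOT NULL,
  amount      DECIMAL(10, 2),
  created_at  DATETIME,
  INDEX idx_orders_created_at (created_at),           -- speeds up date-range SELECTs
  FOREIGN KEY (customer_id) REFERENCES customers (id) -- enforces referential integrity
) ENGINE = InnoDB;

-- A typical query that benefits from the index above
SELECT customer_id, SUM(amount) AS total_spent
FROM orders
WHERE created_at >= '2024-01-01'
GROUP BY customer_id;
```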
An Overview of Databricks
Databricks is a cloud-based platform that utilizes the Apache Spark big data processing engine to manage and analyze large datasets efficiently. It can handle diverse workloads, including batch processing, real-time streaming, and machine-learning tasks, making it highly versatile.
To handle such workloads, Databricks optimizes cloud resources with dynamic allocation and scaling to match your workload requirements. This ensures efficient data processing and helps in cost optimization.
Here are some of the key features of Databricks:
- Integrated Tools: Databricks offers comprehensive tools and services tailored for big data operations. These include support for multiple data formats, seamless integration with leading libraries and frameworks, and data analysis utilities.
- Data Governance: Databricks offers robust data governance services, including audits and compliance controls. These capabilities let you regulate data access, manage the data lifecycle, and meet your legal and compliance obligations.
- Encryption: Databricks employs encryption to protect data at rest and in transit, covering data storage, network communication, and user credentials.
Why Migrate from MySQL to Databricks
Migrating from MySQL to Databricks can help unlock new data insights and improve decision-making. Let's examine several benefits of this migration.
- Unified Data Platform: Databricks offers a unified data platform that integrates all your data, analytics, and AI capabilities in a single environment. This simplifies data workflows, reduces integration complexity, and enables cross-functional collaboration.
- Build Machine Learning Models: Databricks provides cutting-edge tools like Databricks AutoML for automated ML model training and Unity Catalog for model management. You can utilize these tools for a streamlined, full-cycle development of machine learning models.
- Cost-Effective Scalability: Databricks optimizes resource allocation automatically, scaling up or down based on workloads. This helps reduce costs by ensuring you only pay for the resources you use.
How to Migrate Data from MySQL to Databricks: 2 Reliable Methods
You can connect or migrate your data from MySQL to Databricks using two proven approaches, depending on your data volume, sync frequency, and technical requirements:
- Method 1: Real-Time ETL with Estuary Flow
Use Estuary Flow’s low-code platform to stream MySQL data into Databricks with Change Data Capture (CDC), schema validation, and Unity Catalog compatibility.
- Method 2: Manual CSV Export/Import
Export data from MySQL as CSV files and upload them to Databricks using its built-in UI.
Let’s walk through each method step-by-step so you can choose the one that best fits your data integration needs.
Method #1: Using Estuary Flow to Load Data from MySQL to Databricks
Estuary Flow is a real-time ETL solution with low-code automation capabilities. It offers a user-friendly interface and pre-built connectors that help effortlessly transfer data between the source and destination in real time.
Some of the key features of Estuary Flow are:
- Transformation Options: Estuary Flow allows you to perform real-time or batch transformations using SQL or TypeScript. You can use these transformations for ELT or the traditional ETL approach with Estuary Flow.
- Change Data Capture (CDC): Estuary Flow leverages Change Data Capture (CDC) for real-time data processing. This low-latency technique helps track and capture the changes made to the source data, ensuring these updates are reflected in the destinations immediately.
- Real-time Integration: The real-time data integration feature helps build pipelines with millisecond latency between data extraction and loading. This ensures immediate insights and swift decision-making, since you can analyze and act upon data almost as soon as it is generated.
- No-code Configuration: Estuary Flow offers 200+ pre-built connectors that can seamlessly synchronize any source and destination with just a few clicks. Configuring these connectors does not require writing a single line of code.
Prerequisites
Before you begin setting up a MySQL to Databricks pipeline using Estuary Flow, make sure you have the following in place:
A MySQL Database
- binlog_format must be set to ROW (required for CDC).
- The binary log retention period should be at least 7 days to reduce the risk of unrecoverable failures.
- Create a dedicated database user (commonly flow_capture) with the following privileges:
  - REPLICATION CLIENT
  - REPLICATION SLAVE
  - SELECT on the tables being captured (plus information_schema if using automatic discovery).
- If capturing DATETIME columns, configure the database time_zone to an IANA zone name (e.g., America/New_York) or a valid numerical offset.
The MySQL connector supports self-hosted MySQL, Amazon RDS, Amazon Aurora, Google Cloud SQL, and Azure Database for MySQL. It can also capture from read replicas, provided binary logging is enabled on the replica. A sketch of the corresponding SQL setup is shown below.
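For a self-hosted MySQL 8.0+ instance, the setup might look like the following sketch. The user name, password, and database name are placeholders, and managed services such as RDS or Cloud SQL configure binlog settings through parameter groups or flags rather than SET PERSIST.

```sql
-- Dedicated capture user (name and password are placeholders)
CREATE USER IF NOT EXISTS 'flow_capture'@'%' IDENTIFIED BY 'secret';

-- Replication privileges required for CDC
GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'flow_capture'@'%';

-- Read access to the tables being captured (here: everything in the `sales` database)
GRANT SELECT ON sales.* TO 'flow_capture'@'%';

-- Keep binary logs for at least 7 days (value is in seconds)
SET PERSIST binlog_expire_logs_seconds = 604800;

-- Ensure ROW-based binary logging
SET PERSIST binlog_format = 'ROW';

-- Time zone for DATETIME capture: an IANA name (requires the time zone tables
-- to be loaded) or a numeric offset such as '-05:00'
SET PERSIST time_zone = 'America/New_York';
```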
A Databricks Workspace
You’ll need:
- A Unity Catalog
- A SQL Warehouse
- A schema in the Unity Catalog for materialized tables
- A user or service principal with sufficient permissions
Authentication
- A Personal Access Token (PAT), or
- A service principal access token (admins group only).
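As a reference, a minimal Unity Catalog setup run from a SQL Warehouse might look like the sketch below. The catalog name main, schema name estuary_materialized, and principal flow@example.com are placeholders, and the exact privileges you need depend on your workspace's governance model.

```sql
-- Schema that will hold the materialized tables
CREATE SCHEMA IF NOT EXISTS main.estuary_materialized;

-- Grant the materializing user or service principal access to the catalog and schema
GRANT USE CATALOG ON CATALOG main TO `flow@example.com`;
GRANT USE SCHEMA, CREATE TABLE, MODIFY, SELECT
  ON SCHEMA main.estuary_materialized TO `flow@example.com`;

-- Staging uses a Unity Catalog Volume, so volume privileges may also be required
GRANT CREATE VOLUME, READ VOLUME, WRITE VOLUME
  ON SCHEMA main.estuary_materialized TO `flow@example.com`;
```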
How it Works
Estuary’s Databricks connector first stages MySQL change data into a Unity Catalog Volume, then applies those changes transactionally into Databricks tables.
Advanced Options
- Configure sync schedules (default delay: 30 minutes).
- Enable delta updates to reduce query costs and latency for large datasets (useful if all rows have unique keys).
- Use column mapping to handle schema evolution where Databricks does not allow direct column alteration.
- Flow automatically handles reserved words by quoting them, but queries must reference them with the quotes.
An Estuary Flow Account: If you haven’t already, sign up for a free Estuary Flow account here.
Step 1: Configure MySQL as the Source
- Log in to your Estuary Flow account to access the dashboard.
- Click the Sources option on the left-side pane of the dashboard.
- Click the + NEW CAPTURE button, and you will be redirected to the Create Capture page.
- On the Create Capture page, type MySQL in the Search connectors field. Click the Capture button of the real-time MySQL connector from the search results.
- On the MySQL connector configuration page, fill in the necessary details, such as Name, Server Address, Login Username, and Login Password.
- Finally, click NEXT on the top right corner and then SAVE AND PUBLISH to complete the source configuration.
Before publishing, confirm these required settings are in place on your MySQL instance:
- binlog_format is ROW
- Binary log retention is ≥ 7 days
- The capture user has REPLICATION CLIENT, REPLICATION SLAVE, and SELECT privileges
- If any captured tables include DATETIME, ensure the database time_zone is set to an IANA name or numeric offset, or set the connector’s timezone
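One way to sanity-check these settings is to run a few queries as an administrator before publishing. This is a sketch; flow_capture is the capture user created earlier.

```sql
-- Should return ROW
SHOW VARIABLES LIKE 'binlog_format';

-- Retention in seconds; 604800 or more covers 7 days (MySQL 8.0+)
SHOW VARIABLES LIKE 'binlog_expire_logs_seconds';

-- Confirm the capture user's replication and SELECT privileges
SHOW GRANTS FOR 'flow_capture';

-- Check the server time zone used for DATETIME columns
SELECT @@global.time_zone;
```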
Step 2: Configure Databricks as the Destination
- After the capture is created, click MATERIALIZE COLLECTIONS in the pop-up.
- Or go to Destinations > + NEW MATERIALIZATION to open Create Materialization.
- Search for Databricks and select the Materialization option.
- On the Databricks connector configuration page, provide:
  - Address: Host[:port] of your SQL Warehouse
  - HTTP path: HTTP path for the SQL Warehouse
  - Catalog Name: Your Unity Catalog
  - Schema Name: Default schema to materialize to
  - Authentication:
    - Role: PAT
    - Personal Access Token: Your Databricks token
    - Alternatively, use a service principal access token if the principal has sufficient permissions
- In Source Collections, confirm the MySQL capture collections are bound to your Databricks tables.
- Finally, click the NEXT > SAVE AND PUBLISH buttons to complete the destination configuration.
How the materialization runs
- The connector uploads changes to a Unity Catalog Volume and then applies them transactionally to your Databricks tables.
- You can configure a Sync Schedule to control update frequency. The default delay is 30 minutes.
- For large datasets with unique keys, you can enable delta updates on a per-table basis to reduce read costs and latency.
Optional: If your Databricks environment restricts column alteration, enable column mapping on the target table with Delta protocol reader version 2+ and writer version 5+.
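For reference, enabling column mapping on an existing target table uses standard Delta table properties along these lines; the table name below is a placeholder.

```sql
-- Enable name-based column mapping so columns can be renamed or dropped
-- without rewriting the table (requires reader v2+ and writer v5+)
ALTER TABLE main.estuary_materialized.orders SET TBLPROPERTIES (
  'delta.columnMapping.mode' = 'name',
  'delta.minReaderVersion'   = '2',
  'delta.minWriterVersion'   = '5'
);
```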
Ready to streamline your MySQL to Databricks data transfer? Sign up for your free Estuary account and start building today!
Advanced Notes for MySQL to Databricks with Estuary Flow
While most use cases work out of the box with the standard setup, Estuary Flow also supports advanced configurations that are useful for larger or more complex environments:
- Skip Backfill: By default, Estuary Flow backfills existing tables before switching to CDC. For very large tables, you can disable backfill per table and capture only new changes.
- Read Replica Support: The MySQL connector can capture from read replicas, provided that binary logging is enabled and all prerequisites are met.
- Column Mapping in Databricks: If your Databricks setup doesn’t allow direct column alterations, you can enable column mapping with Delta protocol reader v2+ and writer v5+. This makes schema evolution easier.
- Reserved Words Handling: Flow automatically quotes Databricks reserved words when creating tables. Keep in mind that any queries must reference these quoted identifiers exactly.
These options make Estuary Flow suitable not just for simple pipelines but also for production-scale data engineering needs.
Have a complex environment or need help tailoring your setup? Contact our team to discuss your use case.
See Estuary Flow streamline your Databricks data loading in this quick video demo.
Method 2: Using CSV Export/Import to Move Data from MySQL to Databricks
This method involves extracting data from MySQL in CSV format and loading it into Databricks. Let’s look at the details of the steps involved in this method:
Step 1: Use MySQL Workbench to Extract MySQL Data as CSV Files
MySQL Workbench is a graphical administration tool for MySQL whose Table Data Export wizard lets you export databases and tables through a GUI, with support for CSV and JSON output. Download MySQL Workbench here.
Here are the steps to export MySQL data as CSV
- Open MySQL Workbench. To connect to your MySQL database, click the + sign next to MySQL Connections on the home page.
- Provide the necessary details, such as Connection Name, Connection Method, Hostname, Port, Username, and Password.
- Click the Test Connection button to verify the details, then click OK to create the connection.
- To export MySQL data, right-click on the table containing the data you want to export and select the Table Data Export Wizard option.
- Specify the data to be exported, including the Columns, Count (the number of rows to export), and Row Offset (the number of rows to skip from the beginning).
- Specify the File Path, format (CSV/JSON), Line Separator, Enclose Strings in, Field Separator used to delimit values, and the null and NULL word as SQL keyword option. Then click Next to proceed.
- Once the export completes successfully, Workbench displays a confirmation message.
- You can find the exported CSV files at the file path you specified and use them in the next step.
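If you prefer the command line to the Workbench wizard, MySQL can also write a table directly to CSV on the database server. This is a sketch; the table, columns, and output path are placeholders, and the server's secure_file_priv setting must allow writing to that directory.

```sql
-- Export a table to CSV on the database server host
-- Note: INTO OUTFILE does not write a header row
SELECT id, customer_id, amount, created_at
FROM sales.orders
INTO OUTFILE '/var/lib/mysql-files/orders.csv'
FIELDS TERMINATED BY ',' ENCLOSED BY '"'
LINES TERMINATED BY '\n';
```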
Step 2: Load CSV to Databricks
Follow the steps below to load a CSV file into Databricks:
- Log in to your Databricks account and navigate to the dashboard.
- In the Sidebar menu, click Data > Add Data > Create Table.
- Drag and drop the CSV files from your local directory and upload them.
- To access your data, click the Create Table with UI button.
- Now, you can read and modify the CSV data in Databricks.
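If you'd rather load the file with SQL than the upload UI, Databricks also supports COPY INTO from a Unity Catalog Volume. The catalog, schema, table, and volume path below are placeholders, and they assume the CSV has already been uploaded to that volume.

```sql
-- Empty placeholder table; COPY INTO with mergeSchema infers the columns
CREATE TABLE IF NOT EXISTS main.estuary_materialized.orders_csv;

-- Load the uploaded CSV, inferring column types from the header row
COPY INTO main.estuary_materialized.orders_csv
FROM '/Volumes/main/estuary_materialized/staging/orders.csv'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true', 'inferSchema' = 'true')
COPY_OPTIONS ('mergeSchema' = 'true');
```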
Challenges of Using CSV Export/Import to Move Data from MySQL to Databricks
While the CSV export/import process is effective for a MySQL-Databricks integration, there are some associated limitations, including:
- Time-consuming: Manually extracting MySQL data as CSV and then loading it into Databricks is a time-consuming process. This method can be slow, especially when dealing with large volumes of data and limited network bandwidth.
- Risk of Errors: The CSV export/import method requires manual intervention to execute and monitor various tasks, which makes the process more error-prone and can compromise the integrity and reliability of the transferred data.
- Lack of Automation: The CSV export/import method lacks automation capabilities, making it difficult to streamline repetitive data transformation tasks. Maintaining accuracy, consistency, and efficiency throughout the migration process can be difficult without automation.
- Lack of Real-Time Integration: Manual migration typically involves batch processing, with data transferred on an as-needed basis. There can be a significant delay between data updates in the source and the destination, so this method lacks real-time integration capabilities.
Final Thoughts
By connecting Databricks to MySQL, you gain multiple benefits, including rapid data processing, advanced machine learning modeling, and the flexibility to adapt to varying data requirements. To migrate data from MySQL to Databricks, you can either use the manual CSV export/import method or Estuary Flow.
The manual CSV export/import method can be time-consuming and requires extensive technical expertise. Consider using Estuary Flow to overcome these limitations. With ready-to-use connectors and an intuitive interface, Estuary Flow supports effortless data migration from source to destination with negligible latency.
Would you like to transfer data between different data sources and destinations? To get started, log in or register for free here.
FAQs
Are there any best practices for optimizing Databricks data migration performance?
What are the post-migration steps to ensure a successful migration to Databricks?

About the author
Rob has worked extensively in marketing and product marketing on database, data integration, API management, and application integration technologies at WSO2, Firebolt, Imply, GridGain, Axway, Informatica, and TIBCO.
