
How to Move Data from MySQL to Redshift in Minutes

Need to move data from MySQL to Redshift? Here are two methods: a manual migration and a real-time data pipeline.


Organizations use many data systems to support their business processes, and keeping them in sync is a never-ending challenge. Day-to-day operational data is often stored in databases like MySQL, but MySQL isn’t ideal for analyzing massive amounts of data. Moving data into a data warehouse like Redshift therefore makes more sense for analytics workflows: Redshift stores data efficiently and supports fast query processing for analytics with BI tools. You can consolidate data from various applications in Amazon Redshift and perform robust analytics.

In this article, we’ll introduce MySQL and Redshift briefly, and provide a step-by-step guide on how to migrate data from MySQL to Redshift using two methods: the manual method and an automated approach using Estuary Flow.

What is MySQL?


MySQL is an open-source Relational Database Management System (RDBMS), now developed by Oracle, that stores data in a tabular format of rows and columns. It enforces referential integrity to keep related tables consistent, and you use SQL (Structured Query Language) to store and retrieve information.
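For example, here is the kind of SQL you would run against MySQL to store and retrieve a row (the users table here is hypothetical):

plaintext
CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50));
INSERT INTO users (id, name) VALUES (1, 'Ada');
SELECT name FROM users WHERE id = 1;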

MySQL is known for the following features:

  • Open-source: MySQL offers Community and Enterprise editions. The Community edition is freely available to anyone who wants to use and modify it.
  • Scalability: It supports up to 64 indexes per table, and each index can include 1 to 16 columns.
  • Cross-platform: MySQL runs on operating systems such as Windows, Linux, and macOS.

What is Amazon Redshift?


Amazon Redshift is a large-scale, fully managed data warehousing service from Amazon. It is widely used by organizations because of its fast query and I/O performance on large datasets.

Amazon Redshift is a popular data warehouse known for the following features:

  • It stores data in a columnar format, which optimizes analytical query performance.
  • You can add nodes as the data in the database grows and take advantage of Massively Parallel Processing (MPP), which spreads work across nodes so that even complex queries over voluminous data run quickly (see the sketch just after this list).
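To illustrate how these features surface in practice, here is a minimal sketch of a Redshift table definition using hypothetical table and column names; the distribution key controls how rows are spread across nodes for MPP, and the sort key speeds up range-based analytical queries:

plaintext
CREATE TABLE sales (
    sale_id     BIGINT,
    customer_id BIGINT,
    sale_date   DATE,
    amount      DECIMAL(10,2)
)
DISTKEY (customer_id)
SORTKEY (sale_date);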

Methods for Moving Data from MySQL to Redshift

This section contains two comprehensive step-by-step methods for transferring data from MySQL to Amazon Redshift: one using native MySQL commands and one using a SaaS tool.

Method 1: Manually Migrate Data from MySQL to Redshift

Method 2: Use Estuary Flow to Stream Data from MySQL to Redshift

Method 1: Manually Migrate Data from MySQL to Redshift

Migrating data manually from MySQL to Redshift can be done using CSV export and the COPY command. While it’s a straightforward process, it requires some manual effort, especially if you’re dealing with a large database or multiple tables.

The diagram below shows an overview of how data is migrated from MySQL to Amazon Redshift using CSV files and the COPY command.

mysql to redshift - LOAD method



Step-by-Step Process to Manually Migrate Data:

Step 1: Export Data from MySQL Using mysqldump

You first need to export your MySQL data to a .sql file using the mysqldump command. This tool is widely used to create backups of MySQL databases and export data in formats such as CSV, SQL, or delimited text.

Here’s the command to generate a .sql file:

plaintext
mysqldump -h mysql_host -u user -p database_name table_name --result-file=dumpfilename.sql

Explanation of Parameters:

  • -h: Specifies the host name of your MySQL server.
  • -u: Specifies the user accessing the MySQL server.
  • -p: Prompts for that user's password.
  • database_name: Name of the database you want to export.
  • table_name: The table(s) you want to export. You can specify multiple tables.
  • --result-file: Saves the output to the specified file (e.g., dumpfilename.sql).
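For example, with hypothetical values filled in (a local server, a shop database, and an employee table), the command might look like this:

plaintext
mysqldump -h localhost -u admin -p shop employee --result-file=employee_dump.sql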

For multiple databases, use the command:

plaintext
mysqldump -u user -p --databases db_name1 [db_name2 ...] > pathname.sql

The resulting .sql file will contain SQL statements like this:

plaintext
INSERT INTO `employee` (`e_id`, `employee_first_name`, `employee_last_name`, `gender`) VALUES (567, 'Emily', 'John', 'F'), (568, 'Karan', 'Kumar', 'M');

However, this file contains SQL statements, and to load the data into Amazon Redshift, you’ll need it in CSV format, like this:

plaintext
"567","Emily","John", "F" "568","Karan","Kumar","M"

Step 2: Convert Data to CSV Format

To load data into Redshift, you need to convert your .sql file into a CSV format. You can use the following command to extract and convert the MySQL data directly to a CSV file:

plaintext
mysql -B -u username -p database_name -h mysql_hostname -e "SELECT * FROM table_name;" | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > table_name_data.csv

Explanation of Parameters:

  • sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g": A stream editor expression that performs basic text transformation on the input stream, wrapping each field in double quotes and separating fields with commas.
  • table_name_data.csv: The name and path of the output .csv file.

Another option is to use MySQL’s SELECT ... INTO OUTFILE statement, which writes a .csv file directly on the database server:

plaintext
SELECT * FROM table_name INTO OUTFILE 'C:/directory_name/table_name.csv' FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';

This will save the CSV file to the specified directory.
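Note that MySQL only writes OUTFILE results to a directory allowed by the server’s secure_file_priv setting, and the path refers to the database server’s filesystem, not your local machine. You can check the permitted directory with:

plaintext
SHOW VARIABLES LIKE 'secure_file_priv';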

Step 3: Load Data to Amazon S3

Before loading the data into Redshift, you need to stage the CSV files in an Amazon S3 bucket.

  1. Sign in to your AWS Management Console.
  2. Create a new bucket or use an existing one.
  3. Upload your CSV files into this S3 bucket.
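If you have the AWS CLI installed and configured, you can also upload the file from the command line (the bucket name and path below are placeholders):

plaintext
aws s3 cp table_name_data.csv s3://bucket_name/file_path/table_name_data.csv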

Step 4: Use the COPY Command to Load Data to Redshift

Now, load the data from the S3 bucket into Amazon Redshift using the COPY command. Here’s the command to load the data from S3 into Redshift:

plaintext
COPY table_name FROM 's3://bucket_name/file_path/table_name_data.csv' CREDENTIALS 'aws_access_key_id=your_access_key;aws_secret_access_key=your_secret_key' CSV;

Explanation:

  • table_name: The target table in Redshift where the data will be loaded.
  • 's3://bucket_name/file_path': The location of the CSV file in your S3 bucket.
  • CREDENTIALS: AWS access credentials for the S3 bucket.
  • CSV: Informs Redshift that the data is in CSV format.
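Keep in mind that COPY assumes the target table already exists in Redshift with a compatible schema. A minimal sketch for the employee example above (the column names and types are assumptions based on the sample data) might be:

plaintext
CREATE TABLE employee (
    e_id                INTEGER,
    employee_first_name VARCHAR(100),
    employee_last_name  VARCHAR(100),
    gender              CHAR(1)
);

If your cluster has an IAM role attached that can read the bucket, you can also replace the CREDENTIALS clause with IAM_ROLE 'arn:aws:iam::123456789012:role/RedshiftCopyRole' (substituting your own account ID and role name) to avoid embedding access keys in the command.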

Step 5: Verify Data Transfer

After running the COPY command, check if all the tables were successfully copied by reviewing the STL_LOAD_ERRORS system table in Redshift:

plaintext
SELECT * FROM STL_LOAD_ERRORS WHERE query = pg_last_copy_id();

This will display any errors that occurred while loading the data. For more information on possible errors, you can refer to Amazon’s documentation.
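As a final sanity check, you can compare row counts between the source and target. Running the same count query against the table in MySQL and in Redshift should return matching results:

plaintext
SELECT COUNT(*) FROM table_name;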

Challenges of the Manual Method

  • Time-consuming: Exporting data from multiple tables and transforming it to the right format can take a significant amount of time.
  • Error-prone: Network issues or manual errors can cause data inconsistencies.
  • Lack of Real-time Updates: By the time you finish exporting data, your source tables might have changed, leading to outdated data in Redshift.

If you are looking for a more automated and real-time approach, consider using Estuary Flow as explained in Method 2.

Method 2: Use Estuary Flow to Stream Data from MySQL to Redshift

The manual method to migrate data from MySQL to Redshift looks cost-effective but becomes drawn out when you encounter errors. If you have a huge database with many tables, exporting each one to CSV quickly becomes burdensome.

Plus, your MySQL database likely changes quite frequently. By the time you finish the export, the source tables may already contain new updates, so the data in Redshift is out of date unless you keep repeating the migration. You might also face network issues or timeout errors during the MySQL to Redshift migration.

Estuary Flow addresses these challenges: it extracts data from MySQL using change data capture and writes that data to a variety of destinations, including Amazon Redshift, Google Sheets, PostgreSQL, and many more. This eliminates most of the associated risks and considerably reduces internal effort. Once you deploy your pipeline, it operates continuously in real time, removing the repetitive work.

Build your first pipeline using Flow with a free trial account to migrate MySQL to Redshift. Register here. Before you set up your MySQL capture and Redshift materialization, check the prerequisites for each connector in the Estuary documentation.

Step 1: Go to the Estuary web app and sign in. 

Step 2: To capture data from MySQL, click the Captures tab and choose + New Capture. In the search connectors box, enter MySQL and then select capture.

Step 3: In the Create Capture window, add a unique name for your capture. Fill out the capture details with your MySQL server address, database username, and password.

Step 4: Click Next. With the provided information, Flow will establish a connection with your MySQL server and display your tables as one or more collections.

mysql to redshift - estuary mysql capture

Step 5: By default, all the tables in your database will be captured as Flow collections. You can remove any that you don’t wish to capture. Next, click Save and Publish. You will receive a notification once the capture is published.

Step 6: On the dialog box showing the capture’s logs, click the Materialize collections button.

Step 7: Next, a Create Materialization window will appear. In the search connectors box, type Amazon Redshift and press Enter. Click Materialize, as shown in the image below.

mysql to redshift - estuary redshift connector

Step 8: Fill in the Materialization details and Endpoint Config details that include the AWS access key and secret access key, the S3 bucket name, and the region. 

mysql to redshift - estuary redshift config

Step 9: You can see the Collection Selector at the end of the page. Every data collection that you have captured from MySQL is mapped to a new table in Redshift. From the available collections list, remove any that you don’t want to materialize, and add a table name for each one you keep. Click Next, then Save and Publish.

mysql to redshift - redshift resource config

From now on, all existing data from your MySQL database will be copied to Amazon Redshift, and any new data written to MySQL will be copied to Redshift in real time.

Register & Set up your first pipeline in minutes—no coding required. Got questions or need help? Join our Slack community for real-time support and connect with data experts!

Summary: Which Method is Best for MySQL to Redshift Migration?

Migrating a MySQL database to Amazon Redshift is a great way to store backups or analyze data, and there are plenty of tools available to export databases. Of the methods above, you can use the INTO OUTFILE or mysqldump approach if you are comfortable with querying, but it can be time-consuming, requires plenty of manual work, and assumes familiarity with Redshift’s data structures and the COPY command. To quickly replicate your MySQL data to Amazon Redshift in real time, give Estuary Flow a try; your first pipeline is free.

FAQs

1. How do I migrate data from MySQL to Redshift?

You can migrate data from MySQL to Redshift either manually using mysqldump and S3 or by using automated tools like Estuary Flow, which offers real-time data streaming.

2. What’s the best way to move data from MySQL to Redshift?

For small databases, the manual method using CSV export and the COPY command can work, but for real-time replication and large datasets, Estuary Flow provides a more efficient solution.


About the author

Jeffrey Richman

With over 15 years in data engineering, Jeffrey is a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. His extensive writing provides insights that help companies scale efficiently and effectively in an evolving data landscape.
