Organizations use many data systems to support their business processes, and the never-ending challenge is keeping them in sync. Often, day-to-day operational data is stored in databases like MySQL. However, MySQL isn’t ideal for analyzing of massive amounts of data. Hence, moving data into data warehouses like Redshift makes a lot more sense for analytics workflows. Redshift stores data efficiently and supports faster query processing for analytics using BI tools. You can consolidate data from various applications in Amazon Redshift and perform robust analytics.

In this article, we’ll start with a brief introduction to MySQL and Redshift. Then, we’ll provide a step-by-step guide to export your data from MySQL to Amazon Redshift using two methods. 

What is MySQL?

mysql to redshift - mysql

 

MySQL is an open-source database management system developed by Oracle. It is a Relational Database Management System (RDBMS) that stores data in a tabular format consisting of rows and columns. To maintain consistency amongst tables and data, MySQL follows a referential integrity concept. As a result, you can use SQL (Structured Query Language) to store and retrieve information. 

MySQL is known for its following features:

  • Open-source: MySQL offers Community and an Enterprise edition. The Community edition is freely available to anyone who wants to use and modify databases.
  • Scalability: It supports up to 64 indexes per table, where each can contain 1 to 16 columns.
  • OS Compatible: MySQL is compatible with OS such as Windows, Linux, and macOS.

What is Amazon Redshift?

mysql to redshift - redshift

 

Amazon Redshift is one of the most powerful, large-scale data warehousing services provided by Amazon. It is widely used by organizations because of its fast query and I/O performance for large datasets.

Amazon Redshift is one of the popular database warehouses known for its following features:

  • It stores data in a column-based format in databases, optimizing analytical query performance. 
  • You can add additional nodes whenever the data in the database increases and take advantage of Massive Parallel Processing (MPP). MPP helps to run the most complex queries swiftly, even on voluminous data.

Methods for Moving Data from MySQL to Redshift

This section contains two comprehensive step-by-step methods for transferring data from MySQL to Amazon Redshift using MySQL commands and a SaaS alternative Tool.

Method 1: Manually Migrate Data from MySQL to Redshift

Method 2: Use Estuary Flow to Stream Data from MySQL to Redshift

Method 1: Manually Migrate Data from MySQL to Redshift

The diagram below shows a rough scenario of how to migrate data from MySQL to Amazon Redshift using CSV format files and the LOAD command.

mysql to redshift - LOAD method

Image Source

The first part of this process is to extract data from MySQL databases into a .csv file. We’ll show you two methods to do so: using mysqldump and OUTFILE

Export from MySQL using mysqldump

MySQL’s mysqldump command is used to export data from the MySQL database into a single text file. The text file contains SQL statements that can be used to reproduce MySQL's original database. mysqldump is used to perform logical backups. This command can also generate output in .csvXML, or other delimited text

  1. Open your command prompt and connect to your MySQL server.
     
  2. Navigate to the directory where the mysqldump.exe file is located (because ‘mysqldump’ is not recognized as an internal or external command). Now, insert the below command to create a .sql file using the mysqldump command.
plaintext
mysqldump -h mysql_host -u user database_name table_name --result-file dumpfilename.sql
  • -hIs used to provide the name of the host.
  • --result-file: Directs the output to the given destination file.
  • database_name: The name of your database.
  • tablename: The name of the table that you want to export. You can also list multiple tables like tablename1 tablename2 tablename3.
  • dumpfilename.sqlThe path and name of the final output file with a .sql extension. 

You can specify the configuration option to customize your backup. You’ll be prompted to enter your password.

You can also dump multiple databases using mysqldump. See the command below:

plaintext
mysqldump --databases db_name1 [db_name2 ...] > pathname.sql

The mysqldump command will migrate all data from databases/tables to the dumpfilename.sql file. It will create a .sql file for each table in the directory path that you provided. The .sql file will contain a query structure for creating a table, which will look something like this:

plaintext
INSERT INTO `employee` (`e_id`, `employee_first_name`, `employee_last_name`, `gender`) VALUES (567, ‘Emily’, ‘John’, 'F'), (568,’Karan’,’Kumar’, 'M');

But the file is in text format, and you need to load a CSV or JSON format file in Amazon   Redshift. Now, we need to to convert the above .sql file data into the following format:

plaintext
"567","Emily","John", "F" "568","Karan","Kumar","M"
  1. Insert the below query to generate the output in .csv format. You have to do this specifically for each table.
plaintext
mysql -B -u username database_name -h mysql_hostname -e "SELECT * FROM table_name;" | sed "s/'/'/;s/t/","/g;s/^/"/;s/$/"/;s/n//g" > table_name_data.csv
  • sed "s/'/'/;s/t/","/g;s/^/"/;s/$/"/;s/n//g"Stream editor to perform basic text transformation on an input stream.
  • table_name_data.csvThe name and path of the .csv file.

Export from MySQL using OUTFILE

Alternatively, you can create a .csv file by using the OUTFILE command. This command will write rows you select into a specified file.

  1. Insert the following code in your command prompt.
plaintext
mysql> select * from table_name into OUTFILE “C:\\directory_name\\table_name.csv” fields terminated by ‘,’ lines terminated by ‘\n’
  • table_name.csvThe name of your .csv file.
  • fields terminated byProvide values separated by ‘,’
  • lines terminated by ‘\n’Enter a new record in a new line.

The above query will create a Comma Separated Values (CSV) format file on the given path. This process has to be done for every table. 

Transform the CSV and load data into RedShift

The next step is to clean and transform the CSV file. In order to store data in Redshift, you need to structure and store the data in the right format. Hence, you need to clean the data before uploading it to Redshift. You might want to concatenate two fields or change the format of the date and time fields in your .csv file. All these changes have to be done manually.

Once the .csv file is cleaned, you have to upload it to the Amazon S3 bucket, Amazon’s object storage service. Then, use the following COPY command to load data to Amazon Redshift from S3.

plaintext
COPY table_name FROM 's3://redshift_bucket_name/path_name/table_name/' credentials 'aws_access_key_id=access_key_name;aws_secret_access_key=secret_key';

To export all tables, the above step has to be run separately for each table.

After executing the above commands, you can check if all the tables are copied correctly in STL_LOAD_ERRORS. STL_LOAD_ERRORS contains a history (typically of 4-5 days) of all records and displays the errors that occurred while loading the data from the respective file. For reference, check the list of possible load errors here.

If you don’t find any STL_LOAD_ERRORS, then your MySQL to Redshift migration is successful.

Method 2: Use Estuary Flow to Stream Data from MySQL to Redshift

The manual method to migrate data from MySQL to Redshift looks cost-effective but becomes drawn-out when you encounter errors. If you have a huge database to export, this process might get burdensome to export each table into CSV format. 

Plus, your MySQL database likely changes quite frequently. By the time you export, the source table might already contain updates. This scenario leads to an inaccurate picture in reality, where you keep on moving your data regularly. You might also face network issues or timeout errors while MySQL to Redshift migration.

Understanding the above challenges, Estuary Flow allows you to easily extract data from MySQL using change data capture and write that data to a variety of destinations, including Amazon Redshift, Google Sheets, PostgreSQL, and many more. This process will eliminate most of the associated risks and considerably reduce internal efforts. Once you deploy your database, the pipeline will operate continuously in real-time, reducing the repetitive work.

Build your first pipeline using Flow with a free trial account to migrate MySQL to RedshiftRegister here. Before you set up your MySQL and Redshift, see the prerequisites:

Step 1: Go to the Estuary web app and sign in. 

Step 2: To capture data from MySQL, click the Captures tab and choose + New Capture. In the search connectors box, enter MySQL and then select capture.

Step 3: In the create capture window, add a unique name for your capture. Fill out the capture details with your SQL server addressdatabase username, and password

Step 4: Click Next. With the provided information, Flow will establish a connection with your SQL Server and display your tables as one or more collections. 

mysql to redshift - estuary mysql capture

Step 5:  By default, all the tables in your database will be captured as Flow collections. In case you don’t wish to capture any, you can remove them. Next, click Save and Publish. You will receive a notification once the capture is published.

Step 6: On the dialog box showing the capture’s logs, click the Materialize collections button.

Step 7: Next, a Create Materialization window will appear. In the search connectors, type Amazon Redshift and enter. Click on Materialize, as shown in the image below.

mysql to redshift - estuary redshift connector

Step 8: Fill in the Materialization details and Endpoint Config details that include the AWS access key and secret access key, the S3 bucket name, and the region. 

mysql to redshift - estuary redshift config

Step 9: You can see the Collection Selector at the end of the page. Every data collection that you have captured from MySQL is mapped to a new table in Redshift. From the available collections drop-down list, you can remove them in case you don’t want to add them. Add a table name for each. Click on Next, then Save and Publish.