Organizations use many data systems to support their business processes, and the never-ending challenge is keeping them in sync. Often, day-to-day operational data is stored in databases like MySQL. However, MySQL isn’t ideal for analyzing massive amounts of data, so moving data into a data warehouse like Redshift makes a lot more sense for analytics workflows. Redshift stores data efficiently and supports fast query processing for analytics with BI tools. You can consolidate data from various applications in Amazon Redshift and perform robust analytics.
In this article, we’ll introduce MySQL and Redshift briefly, and provide a step-by-step guide on how to migrate data from MySQL to Redshift using two methods: the manual method and an automated approach using Estuary Flow.
What is MySQL?
MySQL is an open-source database management system developed by Oracle. It is a Relational Database Management System (RDBMS) that stores data in a tabular format consisting of rows and columns. MySQL enforces referential integrity to maintain consistency across tables, and you use SQL (Structured Query Language) to store and retrieve information.
MySQL is known for its following features:
- Open-source: MySQL offers a Community edition and an Enterprise edition. The Community edition is freely available to anyone who wants to use and modify the database.
- Scalability: It supports up to 64 indexes per table, and each index can contain 1 to 16 columns.
- OS Compatibility: MySQL runs on operating systems such as Windows, Linux, and macOS.
What is Amazon Redshift?
Amazon Redshift is one of the most powerful, large-scale data warehousing services provided by Amazon. It is widely used by organizations because of its fast query and I/O performance for large datasets.
Amazon Redshift is a popular data warehouse known for the following features:
- It stores data in a columnar format, which optimizes analytical query performance.
- You can add nodes whenever the data in your database grows and take advantage of Massively Parallel Processing (MPP). MPP helps run even the most complex queries swiftly on voluminous data.
Methods for Moving Data from MySQL to Redshift
This section contains two comprehensive step-by-step methods for transferring data from MySQL to Amazon Redshift: one using MySQL commands and one using a SaaS alternative.
Method 1: Manually Migrate Data from MySQL to Redshift
Method 2: Use Estuary Flow to Stream Data from MySQL to Redshift
Method 1: Manually Migrate Data from MySQL to Redshift
Migrating data manually from MySQL to Redshift can be done using CSV export and the COPY command. While it’s a straightforward process, it requires some manual effort, especially if you’re dealing with a large database or multiple tables.
The diagram below shows an overview of how data is migrated from MySQL to Amazon Redshift using CSV files and the COPY command.
[Image: MySQL to Redshift - COPY method]
Step-by-Step Process to Manually Migrate Data:
Step 1: Export Data from MySQL Using mysqldump
You first need to export your MySQL data to a .sql file using the mysqldump command. This tool is widely used to create backups of MySQL databases and export data in formats such as CSV, SQL, or delimited text.
Here’s the command to generate a .sql file:
mysqldump -h mysql_host -u user database_name table_name --result-file dumpfilename.sql
Explanation of Parameters:
- -h: Specifies the host name of your MySQL server.
- -u: Specifies the user accessing the MySQL server.
- database_name: Name of the database you want to export.
- table_name: The table(s) you want to export. You can specify multiple tables.
- --result-file: Saves the output to the specified file (e.g., dumpfilename.sql).
For multiple databases, use the command:
mysqldump --databases db_name1 [db_name2 ...] > pathname.sql
The resulting .sql file will contain SQL statements like this:
INSERT INTO `employee` (`e_id`, `employee_first_name`, `employee_last_name`, `gender`) VALUES (567, 'Emily', 'John', 'F'), (568, 'Karan', 'Kumar', 'M');
However, this file contains SQL statements. To load the data into Amazon Redshift, you’ll need it in CSV format, like this:
plaintext"567","Emily","John", "F"
"568","Karan","Kumar","M"
Step 2: Convert Data to CSV Format
To load data into Redshift, you need to convert your .sql file into a CSV format. You can use the following command to extract and convert the MySQL data directly to a CSV file:
mysql -B -u username database_name -h mysql_hostname -e "SELECT * FROM table_name;" | sed "s/'/\'/;s/\t/\",\"/g;s/^/\"/;s/$/\"/;s/\n//g" > table_name_data.csv
Explanation of Parameters:
sed "s/'/'/;s/t/","/g;s/^/"/;s/$/"/;s/n//g"
: Stream editor to perform basic text transformation on an input stream.table_name_data.csv
: The name and path of the .csv file.
Another option is MySQL’s SELECT ... INTO OUTFILE statement, which writes the results directly to a .csv file:
SELECT * FROM table_name
INTO OUTFILE 'C:/directory_name/table_name.csv'
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n';
This will save the CSV file to the specified directory on the MySQL server (the server must have write access to that path).
Step 3: Load Data to Amazon S3
Before loading the data into Redshift, you need to stage the CSV files in an Amazon S3 bucket.
- Sign in to your AWS Management Console.
- Create a new bucket or use an existing one.
- Upload your CSV files into this S3 bucket, either through the console or the AWS CLI, as shown below.
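If you prefer the command line, the AWS CLI can handle the upload. Here’s a minimal sketch; the bucket name and file paths are placeholders you’d replace with your own:

# Upload the exported CSV file to the S3 bucket (bucket name and paths are placeholders)
aws s3 cp table_name_data.csv s3://bucket_name/file_path/table_name_data.csv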
Step 4: Use the COPY Command to Load Data to Redshift
Now, load the data from the S3 bucket into Amazon Redshift using the COPY command. Here’s the command to load the data from S3 into Redshift:
COPY table_name
FROM 's3://bucket_name/file_path/table_name_data.csv'
CREDENTIALS 'aws_access_key_id=your_access_key;aws_secret_access_key=your_secret_key'
CSV;
Explanation:
- table_name: The target table in Redshift where the data will be loaded.
- 's3://bucket_name/file_path/table_name_data.csv': The location of the CSV file in your S3 bucket.
- CREDENTIALS: AWS access credentials for the S3 bucket.
- CSV: Informs Redshift that the data is in CSV format.
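Note that COPY loads into an existing table, so the target table must be created in Redshift first. Below is a minimal sketch for the sample employee data shown earlier; the column types are assumptions, so adjust them to match your source schema:

-- Target table for the sample employee data (column types are assumptions)
CREATE TABLE employee (
    e_id                INTEGER,
    employee_first_name VARCHAR(50),
    employee_last_name  VARCHAR(50),
    gender              CHAR(1)
);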
Step 5: Verify Data Transfer
After running the COPY command, check if all the tables were successfully copied by reviewing the STL_LOAD_ERRORS system table in Redshift:
SELECT * FROM STL_LOAD_ERRORS WHERE query = pg_last_query_id();
This will display any errors that occurred while loading the data. For more information on possible errors, you can refer to Amazon’s documentation.
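As an additional sanity check, you can compare row counts between the source and the target. A simple sketch, assuming the same table name on both sides:

-- Run in Redshift, then compare with SELECT COUNT(*) on the MySQL source table
SELECT COUNT(*) FROM table_name;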
Challenges of the Manual Method
- Time-consuming: Exporting data from multiple tables and transforming it to the right format can take a significant amount of time.
- Error-prone: Network issues or manual errors can cause data inconsistencies.
- Lack of Real-time Updates: By the time you finish exporting data, your source tables might have changed, leading to outdated data in Redshift.
If you are looking for a more automated and real-time approach, consider using Estuary Flow as explained in Method 2.
Method 2: Use Estuary Flow to Stream Data from MySQL to Redshift
The manual method of migrating data from MySQL to Redshift looks cost-effective but becomes drawn-out when you encounter errors. If you have a huge database, exporting each table to CSV can quickly become burdensome.
Plus, your MySQL database likely changes quite frequently. By the time you export, the source tables might already contain updates, leaving Redshift with an inaccurate picture unless you keep repeating the migration. You might also face network issues or timeout errors during the MySQL to Redshift migration.
To address these challenges, Estuary Flow lets you extract data from MySQL using change data capture and write that data to a variety of destinations, including Amazon Redshift, Google Sheets, PostgreSQL, and many more. This eliminates most of the associated risks and considerably reduces internal effort. Once you deploy your pipeline, it operates continuously in real time, cutting down on repetitive work.
Build your first pipeline using Flow with a free trial account to migrate MySQL to Redshift: Register here. Before you set up your MySQL capture and Redshift materialization, review the prerequisites for each connector.
Step 1: Go to the Estuary web app and sign in.
Step 2: To capture data from MySQL, click the Captures tab and choose + New Capture. In the search connectors box, enter MySQL, then select the MySQL capture connector.
Step 3: In the create capture window, add a unique name for your capture. Fill out the capture details with your MySQL server address, database username, and password.
Step 4: Click Next. With the provided information, Flow will establish a connection with your MySQL server and display your tables as one or more collections.
Step 5: By default, all the tables in your database will be captured as Flow collections. You can remove any you don’t wish to capture. Next, click Save and Publish. You will receive a notification once the capture is published.
Step 6: On the dialog box showing the capture’s logs, click the Materialize collections button.
Step 7: Next, a Create Materialization window will appear. In the search connectors box, type Amazon Redshift and press Enter, then click Materialize.
Step 8: Fill in the Materialization details and Endpoint Config details that include the AWS access key and secret access key, the S3 bucket name, and the region.
Step 9: Scroll to the Collection Selector at the end of the page. Every data collection you captured from MySQL is mapped to a new table in Redshift. From the available collections list, you can remove any you don’t want to add, and specify a table name for each one you keep. Click Next, then Save and Publish.