Organizations rely on vast data volumes for effective decision-making. With constantly expanding data volumes, there is a rise in demand for databases that can handle it all. Amazon Relational Database Service (RDS) plays a pivotal role in this by providing a reliable platform for handling this data as it’s easy to set up.

However, Amazon RDS is primarily good for processing transactional data. It does not have the power to process large-scale analytical queries or derive insights for impactful decision-making. By loading your data from Amazon RDS to BigQuery, you can leverage the warehousing and advanced analytical capabilities of BigQuery. It also offers you real-time data monitoring, dynamic scalability, and simplified data analysis at reduced costs.

But, the best way to learn how to load your data into BigQuery from Amazon RDS is to try it out yourself. This is exactly what this tutorial is all about. So, let’s dive in. 

Amazon RDS: Overview

Blog Post Image

Image Source

Amazon Relational Database Service (Amazon RDS) is a serverless relational database service offered by Amazon Web Services (AWS). It supports effortless deployment through its managed service model, which automates the database setup and configuration in the cloud environment. You can effortlessly set up a new database with just a few API calls without any manual setup. In addition, it has the ability to collaborate with multiple popular database engines such as MySQL, PostgreSQL, Oracle, MariaDB, and SQL Server. This diverse engine support facilitates combining the scalability and managed services of Amazon RDS with the power of multiple databases.

Amazon RDS also automates time-consuming and repetitive administrative tasks such as database configuration, patching, and backup maintenance. These robust functionalities assist you in working on the development part of the application rather than putting in the effort to manage the infrastructure.

Some key features of Amazon RDS are mentioned below.

  • Scalability: Amazon RDS offers you the ability to scale up or down on demand effortlessly. This permits you to handle fluctuating workloads with optimal processing speed and without much downtime.
  • Automated Software Patching: Amazon RDS has the functionality to automatically apply software updates to the database as soon as they are available. This ensures the databases are continuously upgraded to the latest version and includes all security fixes.
  • Automated Backups: Amazon RDS automates the backups for all the databases and securely stores them in Amazon S3. This facilitates fault tolerance and ensures data durability.

What Is BigQuery?

Blog Post Image

Image Source

BigQuery is a highly scalable cloud-based data warehousing platform offered by Google Cloud Platform (GCP). It leverages Google's infrastructure to efficiently store, process, manage, and analyze petabyte-scale datasets. BigQuery also supports collaboration, consistency, and high-performance querying of these massive datasets using SQL-like syntax.

Its architecture offers columnar storage and a highly parallelized processing engine for achieving optimal performance and quick results on large and growing datasets. This also ensures automatic scaling by dynamically allocating resources for increasing or decreasing requirements.

Some of the key features of BigQuery are mentioned below.

  • User-defined Functions: BigQuery offers user-defined functions (UDFs) by allowing you to write and use custom SQL functions. This facilitates query reusability and encapsulates complex logic and functionalities with the datasets.
  • Data Import/Export Formats: BigQuery allows you to work with multiple data import and export formats such as JSON, CSV, Parquet, Avro, and ORC. This lets you seamlessly transfer data between BigQuery and multiple other platforms for smooth data processing.
  • Disaster Recovery: BigQuery facilitates cross-region replication of your data for disaster recovery. With this feature, you can ensure high availability and no downtime, even in case of any failures or interruptions.

How to Migrate From Amazon RDS to BigQuery

Here are two methods you can use to reliably load data from Amazon RDS to BigQuery:

  • Method 1: Using Estuary Flow to load data from Amazon RDS to BigQuery
  • Method 2: Using manual data export/import to load data from Amazon RDS to BigQuery

Method 1: Using Estuary Flow to Load Data From Amazon RDS to BigQuery

Estuary Flow is a no-code, real-time ETL (extract, transform, load) platform that enables you to streamline your data-loading process in real time. The data extraction and loading is completely automated, thanks to its readily available connectors, allowing you to integrate your data without requiring prior coding experience.

Prerequisites

Step 1: Configure Amazon RDS as the Source 

  • Log in to your Estuary account to start configuring Amazon RDS as your data source.
  • Go to the Sources tab on the dashboard.
  • Click on the + NEW CAPTURE button on the Sources page.
Blog Post Image
  • Using the Search connectors box, search for the Amazon RDS connector and then click on the Capture button.

You can select one of the many connector options, including Amazon RDS for MariaDBAmazon RDS for MySQLAmazon RDS for PostgreSQL, or Amazon RDS for SQL Server.

Blog Post Image
  • You will be redirected to the Amazon RDS connector page. Fill in all the specified details, such as Server Address, Login Username, and Login Password.
  • Click on NEXT > SAVE AND PUBLISH. The CDC connector will capture change events from MySQL into Flow Collections via the Binary Log.

Step 2: Configure BigQuery as the Destination 

  • Select the Destinations option on the dashboard to start configuring the destination end of the data pipeline.
  • Click on the + NEW MATERIALIZATION button on the Destinations page.
Blog Post Image
  • Using the Search connectors box, search for the BigQuery connector and click on its Materialization button.
Blog Post Image
  • You will be redirected to the BigQuery connector page. Fill in all the specified details, such as Project IDService Account JSONRegionDataset, and Bucket.
  • While your source data from Flow collections will be automatically added to your materialization, consider using the Source Collections section to do this manually.
  • Click on NEXT > SAVE AND PUBLISH. This will materialize the data from your Flow collections into BigQuery tables.

Benefits of Using Estuary Flow

  • Change Data Capture (CDC): Estuary Flow offers CDC support for processing data in real-time. It facilitates near instantaneous synchronization and continuous updating of all the changes in the data to maintain integrity.
  • Built-in Testing: Flow maintains the quality of data while migration by using its built-in testing feature. This ensures that the data that you are loading from Amazon RDS to BigQuery is accurate and does not result in unexpected errors.
  • Built-in Connectors: Estuary Flow provides you with more than 300 ready-to-use connectors for creating custom pipelines. This streamlines your entire data-loading process and makes it less prone to human errors.

Method 2: Using Manual Data Export/Import to Load Data From Amazon RDS to BigQuery

Let’s explore the step-by-step instructions to load your data from Amazon RDS to BigQuery by manually transferring data files. This involves exporting data from Amazon RDS and temporarily storing it in Amazon S3, following which, the data is then imported into BigQuery.

Step 1: Create an S3 Bucket

  • Sign into your AWS Management Console.
  • Open the Amazon S3 console by clicking on S3 under Storage & Content Delivery.
  • Click on Create Bucket from the Amazon S3 console dashboard.
  • Fill in the specified details, such as Bucket Name and AWS Region, and click on Create. Amazon S3 will successfully create your bucket and display it in the Buckets pane.

Step 2: Export Data From Amazon RDS to S3 Bucket

Use the following CLI command to export your data from Amazon RDS to the S3 bucket.

plaintext
aws rds-data execute-statement --database <database-name> --resource-arn <db-cluster-arn> --Secret-arn <secret-arn> --sql “SELECT * FROM <table-name>” --Include-result-metadata --Output text > s3://<bucket-name>/data.txt

Substitute the following parameters in the above command:

  1. <database-name>: The name of your Amazon RDS database.
  2. <db-cluster-arn>: The arn of your RDS cluster.
  3. <secret-arn>: The arn of your secret.
  4. <table-name>: The name of the table from which the data has to be exported.
  5. <bucket-name>: The name of your newly created S3 bucket.

Step 3: Import Data From S3 Bucket into BigQuery

  • In the Google Cloud Console, navigate to the BigQuery page.
  • In Transfers, click Create a Transfer.
  • You will be redirected to the Create Transfer page. Here, enter all the mandatory details as mentioned below:
  1. In the Source box under the Source type section, select Amazon S3.
Blog Post Image

Image Source

  1. In the Display name box under the Transfer config name section, fill in a name for the transfer.
Blog Post Image

Image Source

  1. In the Schedule options section, specify a Repeat frequency.
  2. In the Destination dataset box under the Destination settings section, choose the dataset you created to store your data.

 

Blog Post Image

Image Source

  1. Under the Data source details section, fill in all the fields such as Destination tableAmazon S3 URI, Access key IDSecret access key, and File Format.
Blog Post Image

Image Source

  1. In the Transfer Options section, fill in the Number of errors allowed.
Blog Post Image

Image Source

  1. Select the format type and fill in the corresponding details.
Blog Post Image

Image Source

  1. Select a Service account associated with your Google Cloud Project.
  • Save all your changes by clicking the Save button to finish the importing process. 

The Takeaway

Loading your data from Amazon RDS to BigQuery facilitates effective data analysis and management. It allows you to leverage the highly scalable and advanced analytical querying capabilities of BigQuery.

There are two methods for loading data from Amazon RDS to Google BigQuery. The manual export/import method uses CLI (Command Line Interface) commands to export data from Amazon RDS into the Amazon S3 bucket as a temporary location. Then, this data is imported from the S3 bucket into BigQuery using Google Cloud Console. The process is time-consuming and error-prone due to the involvement of multiple steps.

No-code ETL platforms offer a powerful solution to overcome time-consuming hurdles. Estuary Flow automates the entire data extraction and loading processes and offers an extensive connector set with real-time data processing capabilities.

Automate your data loading process with the use of Estuary Flow by leveraging its robust capabilities and dynamic scalability. Register now to get started!

Start streaming your data for free

Build a Pipeline