Estuary

Ingesting Data into BigQuery: How to set up a Materialization in Estuary Flow

TUTORIALS · 7/6/2023

Try us free:
https://www.estuary.dev/

Join our Slack channel with a community of developers:
https://estuary-dev.slack.com/

BigQuery blog:
https://estuary.dev/cloud-sql-to-bigquery/

BigQuery sandbox:
https://estuary.dev/bigquery-sandbox/

________________________________________________________________________

Building a Pipeline With Estuary Flow
Estuary Flow is a real-time data integration platform that can connect Cloud SQL to BigQuery, among many other sources and destinations. Estuary is streaming-native and has an intuitive no-code UI that's quick to use once your data systems meet the prerequisites. Like Dataflow, it's also highly scalable and hands-off once the initial setup is done.

To connect Cloud SQL to BigQuery using Estuary, you'll need to meet the following requirements:

Google Cloud SQL instance: You need to have a running Cloud SQL instance that contains the data you want to transfer to BigQuery.

Allow connections from Estuary Flow: You'll need to enable public IP on your database and add the IP address of Estuary Flow (currently 34.121.207.128) as an authorized IP address.
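If you prefer the command line, both of these settings can be applied with gcloud. This is a sketch; the instance name is a placeholder, and note that the flag replaces the existing allowlist:

```shell
# Assign a public IP to the Cloud SQL instance and allowlist Estuary Flow's
# current address. "my-instance" is a placeholder. --authorized-networks
# overwrites the existing list, so include any addresses you already rely on.
gcloud sql instances patch my-instance \
  --assign-ip \
  --authorized-networks=34.121.207.128/32
```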

Depending on whether your Cloud SQL instance is MySQL, Postgres, or SQL Server, you’ll have to meet a few more requirements to prepare your database. See the guides below:

MySQL
Postgres
SQL Server
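As an illustration, preparing a Cloud SQL for PostgreSQL instance amounts to enabling logical replication, creating a dedicated capture user, and publishing the tables to capture. This is a sketch based on Estuary's Postgres guide; the instance name, connection details, and password are placeholders:

```shell
# Enable logical replication on the instance (this triggers a restart).
gcloud sql instances patch my-instance \
  --database-flags=cloudsql.logical_decoding=on

# Connect as an administrative role and create the capture user, the
# watermarks table Flow uses for backfills, and a publication.
psql "host=<instance-ip> user=postgres dbname=postgres" <<'SQL'
CREATE USER flow_capture WITH REPLICATION PASSWORD 'secret';
GRANT pg_read_all_data TO flow_capture;
CREATE TABLE IF NOT EXISTS public.flow_watermarks (slot TEXT PRIMARY KEY, watermark TEXT);
GRANT ALL PRIVILEGES ON TABLE public.flow_watermarks TO flow_capture;
CREATE PUBLICATION flow_publication FOR ALL TABLES;
SQL
```

See the linked Postgres guide for the authoritative steps; MySQL and SQL Server have analogous but different preparation.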

A Google Cloud Storage bucket in the same region as the BigQuery dataset.
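For example, a staging bucket co-located with a US-region dataset could be created like this (bucket, project, dataset, and region names are placeholders; bucket names must be globally unique):

```shell
# Create the staging bucket in the same region as the BigQuery dataset.
gcloud storage buckets create gs://my-flow-staging --location=US-CENTRAL1

# If the dataset doesn't exist yet, create it in the matching region.
bq mk --location=US-CENTRAL1 --dataset my_project:my_dataset
```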

A Google Cloud service account with the roles/bigquery.dataEditor, roles/bigquery.jobUser, and roles/storage.objectAdmin roles, plus a generated service account key. See this guide for help.

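The service account setup can be scripted with gcloud along these lines (a sketch; the project ID and account name are placeholders):

```shell
PROJECT=my-project
SA=flow-bq@${PROJECT}.iam.gserviceaccount.com

# Create the service account.
gcloud iam service-accounts create flow-bq --project="$PROJECT"

# Grant the three roles the BigQuery materialization needs.
for role in roles/bigquery.dataEditor roles/bigquery.jobUser roles/storage.objectAdmin; do
  gcloud projects add-iam-policy-binding "$PROJECT" \
    --member="serviceAccount:${SA}" --role="$role"
done

# Generate the JSON key you'll paste into the materialization config.
gcloud iam service-accounts keys create flow-bq-key.json --iam-account="$SA"
```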
Once you've met these requirements, you can follow these steps to connect Cloud SQL to BigQuery using Estuary Flow:

Log in to your Estuary account, or sign up to get started for free.

Go to the create a new capture page of the Estuary web app and select either the MySQL, PostgreSQL, or SQL Server connector, depending on your Cloud SQL database type.

Add a unique name for the capture. Provide the Cloud SQL server address, the database username (this should be “flow_capture” if you followed the prerequisite steps), and that user's password.

Click the Next button. Flow lists all the tables in your database, which it will convert into Flow data collections, each described by a JSON schema. You can remove any tables you don’t want to capture.

Click Save and Publish.

On the dialog box showing your capture was successful, click the Materialize Collections button to continue.

Choose the BigQuery connector.

Add a unique name for the materialization.

Provide the following details for your BigQuery dataset:
Google Cloud project ID
Service account JSON credentials (which you generated per the prerequisites)
The project’s Google Cloud Region
Dataset name
Staging Google Cloud Storage bucket name

Scroll down to the Collection Selector. Each table you just captured from Cloud SQL will be mapped to a new table in BigQuery. Provide a name for each (you might choose to use the same names).

Optionally, you can modify the collection's schema, determining how it'll be mapped to BigQuery, but that shouldn't be necessary: Flow will output the data in a queryable format in BigQuery tables.

Click Next.

Click Save and Publish.
All historical data from your Cloud SQL database will be copied to BigQuery. Any new data that appears in Cloud SQL will also be copied to BigQuery in real time. Along the way, data will be cleaned and re-formatted to adhere to BigQuery's data types and valid schema options.
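Once the materialization is live, a quick sanity check is to compare row counts between a source table and its BigQuery counterpart, for example with the bq CLI (the project, dataset, and table names below are placeholders):

```shell
# Count rows in the materialized table; compare against the source table.
bq query --use_legacy_sql=false \
  'SELECT COUNT(*) AS row_count FROM `my_project.my_dataset.my_table`'
```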

Using this method requires minimal technical expertise, and your data pipeline is backed by schema validation and exactly-once semantics. Additionally, a single data pipeline can sync many (or all) tables in your Cloud SQL database into equivalent BigQuery tables.

#bigquery #data #dataengineering #datapipeline
