Estuary

How To Load Data From CSV to BigQuery: Full Tutorial

Moving data from CVS to BigQuery is critical for data analysis. Learn the most effective methods.

Picture of Jeffrey Richman
Jeffrey Richman
How To Load Data From CSV to BigQuery: Full Tutorial
Share this article

Due to the simplicity and versatility of CSV files, it continues to be a preferred choice for data storage and sharing across various domains. As a result, data-driven companies deal with a lot of CSV files to move data from one platform to another. However, if you want to perform analysis, you must centralize the data in a data warehouse like BigQuery. By loading your CSV data into BigQuery, you can run queries and gain valuable insights into your data. 

What is Google BigQuery?

csv to bigquery - BigQuery logo

Google BigQuery is a fully-managed, cloud-based data warehouse and analytics platform offered by Google Cloud Platform (GCP). It provides a powerful, scalable solution to process and analyze massive amounts of data efficiently. For analysis, you can either use SQL commands or connect business intelligence tools to obtain insights. And for machine learning, you can also leverage the BigQuery ML capability. It allows you to create and deploy ML models using GoogleSQL queries. BigQuery ML supports internally trained models like Linear regression, K-means clustering, and Principal Component Analysis. However, externally trained models like Deep Neural Networks and Random Forest are also supported by BigQuery ML.

BigQuery also has the ability to integrate with a range of other GCP services like Google Cloud Storage, Cloud Dataflow, and Google Cloud Pub/Sub. This allows you to easily transfer data into BigQuery and analyze it using other GCP tools. 

Why Move Data From CSV to BigQuery?

CSV, or Comma-Separated Values, is the most common file format for exporting or importing data. While CSV files are widely used, there are some limitations pertaining to processing and analyzing large datasets. 

However, Google BigQuery can handle large amounts of data efficiently. When you upload CSV to BigQuery, you can achieve fast query execution. This reduces the time required to obtain insights into your data. BigQuery offers advanced analytics features like ML and geographic data analysis. You can use these features to gain deeper insights for making informed decisions.

Additionally, BigQuery is a pay-as-you-go model, so you pay only for the storage and queries you use. You don’t need any expensive software or hardware for the same.

Methods to Load Data From CSV to BigQuery

Whether you’re loading CSV data from cloud storage or a local file, you can load the data into a new BigQuery table or BigQuery partitioned table. You can also append to or overwrite an existing table or partition.

Before you get started with loading your CSV data to BigQuery, ensure that you have a BigQuery dataset to store your data. Then, you can pick one of the different methods to upload CSV to BigQuery:

Method #1: Using the Command Line

You can use the bq command line tool’s bq load command to load data from a readable data source. A load job is automatically created when you load data using the command line tool. The bq load command will create or update a table and load the data in a single step.

Let’s say you have a dataset named my_dataset with a table named my_table. Then, you can use the following command:

plaintext
bq load \ - - source_format=CSV \ my_dataset.my_table \ gs://my_bucket/mydata.csv \ ./myschema.json

This command loads data from the source file in gs://mybucket/mydata.csv into the table named my_table within my_dataset. For specific schema definitions, the local schema file named myschema.json is included. There are other optional flags you can include in this command to specify additional information. For example, autodetect will enable auto-detection of schema, and ignore_unknown_values will ignore extra, unrecognized CSV values.

If you want to check whether your BigQuery table has been populated, run the following command:

plaintext
bq show my_dataset.my_table

If my_table within my_dataset has all your data, this command will provide detailed information about my_table, including its schema, field names, data types, and other attributes.

While the manual process seems pretty straightforward, it comes with scalability limitations and susceptibility to errors.

Method #2: Using the BigQuery Web UI

You can use the BigQuery web UI to load CSV data into a BigQuery table. First, you must open the BigQuery web UI. In the Explorer pane, expand your project, and select a dataset you want to load. In the Data set info section, click on Create Table.

csv to bigquery - BigQuery Web UI

In the Create table panel, you must specify details in the required fields. Under the Source section, you’ll have the option to Create a table from Google Cloud Storage, Upload, or Drive, among other options. Select the appropriate option. The File format must be set to CSV.

Similarly, specify the Project, Data set, and Table details in the Destination section. Finally, within the Schema section, you can enable Auto-detect or manually enter the schema information.

csv to bigquery - Bigquery create table page

You can scroll down in the Create table panel and check Advanced options for some other parameters like Encryption and Default Rounding Mode. After providing the necessary details, click on the Create Table button. Your CSV data will be fetched, and after ascertaining the schema, the table will be created and populated with the CSV data.

Method #3: Using the Web API

If you’re an avid programmer and want to upload CSV to BigQuery programmatically, you can use BigQuery APIs. Using the BigQuery APIs is a flexible way to load data from CSV to BigQuery. 

Here’s a sample Python code you can use to load CSV to BigQuery:

plaintext
from google.cloud import bigquery client = bigquery.Client() table_ref = client.dataset('dataset').table('table') job_config = bigquery.LoadJobConfig() job_config.source_format = bigquery.SourceFormat.CSV job_config.skip_leading_rows = 1 job_config.schema = [    bigquery.SchemaField('col1', 'STRING'),    bigquery.SchemaField('col2', 'INTEGER'), ] with open('/path/to/file.csv', 'rb') as source_file:    job = client.load_table_from_file(source_file, table_ref, job_config=job_config)    job.result()  # Wait for the job to complete    table = client.get_table(table_ref)

This code uses the load_table_from_file method to add CSV to BigQuery. It specifies the schema of the table with SchemaField objects. 

You can also use options like skip_leading_rows and field_delimiter in the LoadJobConfig object for a customized load process.

Method #4: SaaS Alternatives to CSV Export

Wouldn’t you love to have a hassle-free process to upload data to BigQuery, with a low-code approach? While there is a range of multi-purpose data transfer ETL, you can use Estuary Flow for quick and seamless data integration in real time.

Flow is a promising tool that will help extract your data from CSV and load it in BigQuery. Completing this task will only take a few clicks. However, to get started, you must have a Google Cloud Storage bucket in the same region as the BigQuery destination dataset.

Here’s how you can go about loading data from CSV to BigQuery using Estuary Flow:

Step 1: Log in to Estuary. Don’t have an account yet? You can register for a free account.

Step 2: After logging in, you’ll be redirected to the Estuary dashboard. Click on Captures on the left side of the dashboard to start setting up your source end of the pipeline.

csv to bigquery - Captures page

Click on the New Capture button to land on the Create Capture page. You can load CSV data either from Google Sheets or CSV hosted at an HTTP endpoint. Search for Google Sheets or HTTP File in Search Connectors accordingly.

csv to bigquery - HTTP File search result

If you have your CSV data hosted at an HTTP endpoint, select the HTTP File connector. On the connector page, provide a Name for the connector and the HTTP file URL. Click on the Next button after you’ve filled in the details, then click on Save and Publish.

csv to bigquery - HTTP Connector page1

If you’ve stored your CSV data in Google Sheets, search for the Google Sheets connector in the Search Connectors box.

csv to bigquery -Google Sheets Connector Search

Click on the Capture button to navigate to the Google Sheets connector page.

csv to bigquery -Google Sheets Connector Page

Provide an appropriate Name for the capture and the link to your Spreadsheet. You can mention the number of rows to be fetched when making a Google Sheet API call. Once you’ve filled in the details, then click on Next. Following this, you must click on Save and Publish. This will capture your data from Google Sheets into a Flow collection.

Step 3: Now that you’ve set up your pipeline’s source end, you can proceed with the destination setup. Navigate to the Estuary dashboard and click on Materializations on the left side.

You’ll land on the Materializations page. Click on the New Materialization button to start setting up the destination end of the pipeline. Since you’re moving data from CSV to BigQuery, search for BigQuery in Search Connectors.

csv to bigquery - BigQuery Materialization Connector

Click on the Materialize button of the BigQuery connector. You’ll be redirected to the BigQuery materialization connector page.

Before connecting BigQuery using Flow, you must meet a few prerequisites. Ensure you complete these steps before you proceed. Once you’ve completed the steps, you can continue setting up your destination.

csv to bigquery - BigQuery Connector page

Fill in the required fields, like a Name for the connector, Project ID, and Name of the target BigQuery dataset. Once you’ve provided all the details, click on the Next button. If you see that your CSV data wasn’t automatically loaded, use the Collection Selector to add the captured CSV data. Then, click on Save and Publish.

If you’d like more information and detailed instructions, here’s the Estuary documentation on:

Conclusion

That’s all there is to add CSV to BigQuery! While you can use the CLI, Web UI, or Web API, nothing can compare to using an ETL tool like Estuary Flow. The built-in connectors in Flow allow the building of real-time data pipelines between CSV to BigQuery. All this with little-to-no code! By transferring your CSV data to BigQuery, you can easily analyze and gain valuable insights for better decision-making.

Get started with Estuary Flow today. Your first pipeline is free!

Start streaming your data for free

Build a Pipeline

Author

Author's Avatar
Jeffrey Richman

Popular Articles

Streaming Pipelines.
Simple to Deploy.
Simply Priced.
$0.50/GB of data moved + $.14/connector/hour;
50% less than competing ETL/ELT solutions;
<100ms latency on streaming sinks/sources.