Google Sheets is among the more popular applications used for data management and collaboration. You can easily perform data manipulation and analysis with the range of functions supported by Google Sheets. On the other hand, Google BigQuery, a data warehouse, offers advanced analytics capabilities. When you combine the strength of the two tools—Google Sheets and BigQuery—you can uncover new possibilities for data analysis.
Whether you want to track customer data, analyze sales-related data, or explore survey results, you can do all this effectively by loading your Google Sheets data into BigQuery. It will help you gain deeper insights into your business and make better decisions. Here’s all the information you need to get started with importing your data from Google Sheets to BigQuery.
What is Google Sheets?
Google Sheets is a free, cloud-based spreadsheet application provided by Google within the Google Drive service. It enables you to create, edit, and format spreadsheets online for organizing and analyzing data. An advantage of Google Sheets is that multiple people can work simultaneously, and every change is saved automatically. This allows you to see people’s changes as they make them and also track any changes made using a version history.
Here are some useful features of Google Sheets:
- Offline Editing: Google Sheets supports offline editing on both desktop and mobile. On desktop, you must use the Chrome browser and install the Google Docs Offline extension. On mobile, you must use the Google Sheets app for Android or iOS.
- Import and Export Options: Google Sheets supports various formats like XLSX, CSV, and PDF for importing and exporting data. You can also import data from external sources, like databases, cloud-based storage platforms, and web services.
What is BigQuery?
Google BigQuery is a fully managed, serverless, petabyte-scale data warehouse offered by Google Cloud Platform.
Unlike most traditional database systems, which store data in a row-oriented format, BigQuery stores data in a columnar format. Column-oriented databases are optimized for analytics workloads and are particularly efficient at scanning individual columns across an entire dataset. Another advantage of columnar storage is that all values within a column share the same data type, which allows for better compression and, in turn, faster reads.
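To make the row versus column distinction concrete, here is a toy Python sketch. It is not BigQuery's actual storage format; the table, its column names, and the run-length encoding helper are illustrative assumptions. It only shows why reading one column is cheap and why same-typed, repetitive column values compress well.

```python
from itertools import groupby

# Three rows of a hypothetical sales table (illustrative data only).
rows = [
    {"region": "EU", "product": "A", "amount": 10},
    {"region": "EU", "product": "B", "amount": 20},
    {"region": "US", "product": "A", "amount": 30},
]

# Column-oriented layout: one list per column instead of one dict per row.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column only needs that column's values.
total_amount = sum(columns["amount"])


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Repeated values in a single column collapse into a shorter representation.
encoded_region = run_length_encode(columns["region"])

print(total_amount)    # 60
print(encoded_region)  # [('EU', 2), ('US', 1)]
```

In a row-oriented layout, the same sum would have to step through every full row, including the region and product values it never uses.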
Here are some useful features of BigQuery:
- Fast Query Capabilities: BigQuery separates the compute engine that analyzes your data from the storage system. The separate compute engine, together with BigQuery’s columnar storage, allows you to query terabytes in seconds and petabytes in minutes.
- Standard SQL Support: BigQuery supports standard SQL, so you can analyze your data with familiar query syntax. Queries can run either as interactive queries, which execute as soon as possible, or as batch queries, which are queued until resources become available.
- BigQuery ML: You can use BigQuery ML to perform advanced analytics and derive useful insights from your BigQuery data. It allows you to create and deploy ML models using SQL queries. Some of the models that BigQuery ML supports through SQL include Boosted Trees, Random Forest, Deep Neural Networks, and linear and logistic regression.
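As a rough illustration of creating an ML model with SQL, the sketch below assembles a BigQuery ML `CREATE MODEL` statement in Python. The dataset, model, table, and label column names are hypothetical placeholders, and the helper function is not part of any library. You could paste the resulting statement into the BigQuery query editor.

```python
def build_create_model_sql(model_name, model_type, label_column, source_table):
    """Return a BigQuery ML CREATE MODEL statement as a string.

    Values are interpolated directly, so only pass trusted identifiers.
    """
    return (
        f"CREATE OR REPLACE MODEL `{model_name}`\n"
        f"OPTIONS (model_type = '{model_type}', "
        f"input_label_cols = ['{label_column}'])\n"
        f"AS SELECT * FROM `{source_table}`"
    )


create_model_sql = build_create_model_sql(
    model_name="my_dataset.churn_model",   # hypothetical model name
    model_type="LOGISTIC_REG",             # logistic regression
    label_column="churned",                # hypothetical label column
    source_table="my_dataset.customers",   # hypothetical training table
)

print(create_model_sql)
```

Swapping `model_type` for another supported value, such as `LINEAR_REG`, would train a different kind of model on the same data.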
Methods to Load Data from Google Sheets to BigQuery
By loading data from Google Sheets to BigQuery, you can leverage BigQuery’s SQL querying capabilities for complex operations and analysis. This helps in the creation of insightful visualizations and reports.
Here’s a list of the different methods you can use to load data from Google Sheets to BigQuery:
- Method #1: Use BigQuery Connector
- Method #2: Use SaaS alternatives like Estuary Flow
Method #1: Using BigQuery Connector to Load Data From Google Sheets to BigQuery
You can use the BigQuery connector to load your Google Sheets data into BigQuery. Here are the steps that you can follow:
Step 1: Log in to your Google Cloud account. Navigate to the GCP console → BigQuery UI (in the Navigation menu).
Step 2: In BigQuery, datasets live inside a project and are used to organize and control access to your tables and views. Because every table or view must belong to a dataset, you need to create at least one dataset before you load data into BigQuery.
To create a new dataset, select the project of your choice in the Explorer pane, then click on the three vertical dots to View actions. Now, click on Create dataset.
This opens the Create dataset panel, where you must fill in the required fields, such as Dataset ID and Location type.
Step 3: After creating a dataset, you must create a BigQuery table to load the data from Google Sheets.
Click on CREATE TABLE within the tab of the dataset you selected to start creating a new table.
In the Create table from option, select Drive. Then provide the Drive URL to access the Google Sheets file. Select File format as CSV or Google Sheets, as both formats allow you to select the Auto-Detect schema. Optionally, you can specify the column names and data types yourself.
Step 4: Under the Destination section, fill in the Table name. Then, click on the CREATE TABLE button at the bottom. Your Google Sheet will be linked to BigQuery as an external table. Because BigQuery reads the sheet each time you query the table, any changes you make to the sheet are reflected in your query results automatically.
Step 5: Now that you have your Google Sheets data in BigQuery, you can run SQL queries on the ingested data.
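The console steps above also have a SQL equivalent. The Python sketch below builds a `CREATE EXTERNAL TABLE` statement for a Google Sheets source, along with a simple follow-up query. The table ID and spreadsheet URL are placeholders, and the helper function is a hypothetical convenience, not part of any BigQuery library. When no column list is given, BigQuery auto-detects the schema.

```python
def build_sheets_external_table_sql(table_id, sheet_url, skip_leading_rows=1):
    """Return DDL that links a Google Sheet to BigQuery as an external table.

    Values are interpolated directly, so only pass trusted identifiers.
    """
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table_id}`\n"
        "OPTIONS (\n"
        "  format = 'GOOGLE_SHEETS',\n"
        f"  uris = ['{sheet_url}'],\n"
        f"  skip_leading_rows = {int(skip_leading_rows)}\n"
        ")"
    )


ddl = build_sheets_external_table_sql(
    table_id="my_dataset.sheet_data",  # hypothetical dataset and table name
    sheet_url="https://docs.google.com/spreadsheets/d/YOUR_SPREADSHEET_ID",
    skip_leading_rows=1,               # treat the first row as a header
)

# A simple query to run once the table exists (Step 5).
sample_query = "SELECT * FROM `my_dataset.sheet_data` LIMIT 10"

print(ddl)
print(sample_query)
```

Both statements can be pasted into the BigQuery query editor. The account running them needs access to the sheet in Google Drive.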
Note: The BigQuery export to Google Sheets is another useful feature. It allows you to export your BigQuery data to Google Sheets, and the results of SQL queries on the data will be available on Sheets. Read this article to learn how to get started with BigQuery export to Google Sheets.
Method #2: Using SaaS Alternatives like Estuary Flow
While you can use the BigQuery connector or Sheets connector to import data from Google Sheets to BigQuery, there are some drawbacks associated with these methods. Both these connectors are primarily designed for batch loading of data and not real-time streaming. If you require real-time or near-real-time updates between your spreadsheets and BigQuery, these may not be the best options.
Several ETL tools offer connectors that link Google Sheets to BigQuery with real-time streaming. Estuary Flow is one such easy-to-use tool, with in-built connectors that extract data from Google Sheets and load it into BigQuery. Flow supports real-time data streaming, so any change in your Google Sheet is reflected in your BigQuery table in real time.
Here’s how you can start loading your Google Sheets data into BigQuery using Estuary Flow:
Step 1: Log in to your Estuary account to open the Estuary dashboard.
Step 2: Now, you must set up the source connection—Google Sheets. Click on the Captures option on the left-side pane of the Estuary dashboard. You will be redirected to the Captures page. Here, click on the NEW CAPTURE button.
On the Create Capture page, you’ll find the full list of source connectors that Estuary supports for configuring your endpoint. Search for Google Sheets in the Search Connectors box.
Now, click on the Capture button of the Google Sheets connector. This connector is derived from a third-party open-source connector and modified for performance in the Flow system.
On the connector page, fill in the required fields, such as the Name of the connector, Spreadsheet Link, and Row Batch Size. For authentication, you can either use OAuth2 or generate a service account key manually.
After filling in the necessary fields, click on the Next button. Flow will connect to your Google account and detect the data resources for the link you provided. Then, click on Save and Publish.
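Before pasting a Spreadsheet Link into the connector, it can help to confirm that it looks like a Google Sheets URL. The Python sketch below is a local sanity check only and is not part of Estuary Flow. It assumes the common `https://docs.google.com/spreadsheets/d/<ID>/...` URL shape, and the example link is made up.

```python
import re

# Assumed URL shape: https://docs.google.com/spreadsheets/d/<ID>/...
SHEETS_URL_PATTERN = re.compile(
    r"^https://docs\.google\.com/spreadsheets/d/([A-Za-z0-9_-]+)"
)


def extract_spreadsheet_id(link):
    """Return the spreadsheet ID from a Sheets link, or None if it does not match."""
    match = SHEETS_URL_PATTERN.match(link.strip())
    return match.group(1) if match else None


# Made-up link for illustration.
example_link = "https://docs.google.com/spreadsheets/d/abc123_XY-z/edit#gid=0"

print(extract_spreadsheet_id(example_link))            # abc123_XY-z
print(extract_spreadsheet_id("https://example.com/"))  # None
```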
Step 3: Now that you have set up the source endpoint of the data pipeline, the next step is to set up the destination endpoint, BigQuery. You can click on Materialize Collections in the pop-up that follows a successful capture. Alternatively, you can navigate to the Estuary dashboard and click on the Materializations option on the left-side pane. On the Materializations page, click on the NEW MATERIALIZATION button.
On the Create Materialization page, you’ll see the list of Estuary-supported connectors to materialize your data. Search for BigQuery in the Search Connectors box.
Click on the Materialization button of the Google BigQuery connector. The connector materializes Flow collections into tables within a Google BigQuery dataset. Before you set up the BigQuery connector, make sure you have completed its prerequisites, which are listed in the connector documentation.
On the connector page, provide all the required details like Project ID, Service Account JSON credentials, Dataset, and Bucket details. Now, click on the Next button. You can use the Collection Selector to add the data captured from Google Sheets if it wasn’t filled in automatically. Then, click on Save and Publish.
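Since the connector asks for Service Account JSON credentials, a quick local check of the key file can catch copy-and-paste mistakes before you submit the form. The Python sketch below is a hypothetical helper, not part of Estuary Flow or any Google library. It only checks for a few fields that service account key files normally contain, and the example key uses placeholder values.

```python
import json

# Fields a service account key file is expected to contain.
REQUIRED_KEY_FIELDS = ("type", "project_id", "private_key", "client_email")


def check_service_account_key(key_json_text):
    """Return a list of problems found in the key; an empty list means none."""
    key = json.loads(key_json_text)
    problems = [
        f"missing field: {name}"
        for name in REQUIRED_KEY_FIELDS
        if not key.get(name)
    ]
    if key.get("type") and key["type"] != "service_account":
        problems.append(f"unexpected type: {key['type']}")
    return problems


# Placeholder values for illustration; a real key comes from Google Cloud.
example_key = json.dumps({
    "type": "service_account",
    "project_id": "my-project",
    "private_key": "PLACEHOLDER",
    "client_email": "flow@my-project.iam.gserviceaccount.com",
})

print(check_service_account_key(example_key))  # []
```

This check says nothing about whether the service account has the permissions the connector needs. It only confirms that the JSON has the expected shape.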
If you’d like to know more about the process and connectors, here’s a list of the Estuary documentation:
- The Google Sheets source connector
- The BigQuery materialization connector
- Detailed steps to create a Data Flow like this one
By importing data from Google Sheets to BigQuery, you can leverage the power of a scalable data warehouse for advanced analytics. This helps unlock the full potential of your data for better decision-making.
The BigQuery connector and Sheets connector offer a user-friendly way to integrate Google Sheets and BigQuery directly. However, since they don’t support real-time updates, you can use Estuary to load data from Google Sheets to BigQuery instead. With Estuary Flow’s in-built connectors, you can automate the extraction of Google Sheets data and load it into BigQuery tables. And with its real-time support, any changes in your spreadsheet will be reflected in your BigQuery table in real time. This makes Flow a more effective solution than the BigQuery connector or Sheets connector.
Estuary Flow allows you to load data from a range of sources to different destinations with its in-built connectors.