Estuary

How to Connect Salesforce to BigQuery: Complete Guide

Learn how to connect Salesforce data to BigQuery in real time, with step-by-step guidance.

Businesses generate a vast amount of data through sales transactions, customer interactions, social media platforms, and marketing campaigns. Salesforce plays a critical role in organizing essential domain data.

To enable advanced analytics, integrating data seamlessly from Salesforce to BigQuery, a powerful enterprise data warehouse, is essential. This integration facilitates in-depth analysis, providing valuable insights to enhance decision-making processes and drive business growth.

Let’s explore both platforms before diving into how to seamlessly connect Salesforce to BigQuery using two primary methods: APIs and SaaS ETL tools like Estuary Flow.

What is Salesforce CRM Data Platform?

Salesforce is a cloud-based customer relationship management (CRM) platform that has evolved to support critical business operations beyond sales, including customer service and marketing. It helps businesses manage customer interactions across these functions so they can find more prospects, close more deals, and improve customer service. This makes CRM data central to running those operations effectively.

Key features of Salesforce include:

  • Analytics and Reporting: Salesforce offers built-in reporting and analytics capabilities to gain insights into sales, marketing, and customer service data. You can create custom reports, dashboards, and visualizations for tracking key metrics, helping make data-driven decisions. It plays a crucial role in turning business data into actionable insights, empowering businesses to deliver personalized customer experiences and run more effective campaigns.
  • Security and Data Privacy: Salesforce adheres to industry-standard security practices and compliance regulations, helping protect your data. It provides Salesforce Shield, a trio of data encryption, event monitoring, and audit trails to ensure the confidentiality and integrity of sensitive information.
  • Integration Capabilities: Salesforce provides APIs and integration capabilities to connect with other systems and applications. Through these APIs, many third-party integrations are available. These include ETL tools like Estuary, Stitch, and Talend, allowing you to extend its functionality.

What is Google BigQuery Data Warehouse?

BigQuery is Google’s serverless, fully managed data warehouse that enables scalable analysis. It allows you to run SQL queries on terabytes of data in seconds and on petabytes of data within minutes. This is possible mainly because BigQuery separates the compute engine that analyzes your data from storage. Compute can be scaled independently based on the processing requirements of a query, providing flexibility for varying needs.

Here are some interesting features of BigQuery:

  • Columnar Storage: Data in BigQuery is stored in a columnar format, meaning it stores each column separately. This type of storage is optimized for analytical queries of large datasets.
  • Scalability: Because BigQuery is designed to handle petabytes of data, you can store and process massive datasets, which helps you avoid the performance bottlenecks and hardware limitations of fixed-capacity systems.
  • Integrations: BigQuery integrates seamlessly with other Google Cloud services, such as Dataflow, Google Analytics, and Dataproc. Its tables can also be kept in sync with external sources such as Salesforce through tools like Estuary, so your data stays up to date and accessible. This allows you to use BigQuery alongside your existing data infrastructure and workflows.
  • Access to Free Version: If you’d like to try out BigQuery features for free and ensure it’s suitable for your business needs, you can use BigQuery Sandbox. You need not provide any billing or credit card details to use Sandbox.

Why Connect Salesforce to Google BigQuery?

While Salesforce and BigQuery are each powerful tools on their own, integrating them brings several benefits. Here are some reasons why migrating Salesforce data to BigQuery is worth considering:

  • Advanced Analytics: You can use BigQuery for more advanced analytics beyond the basic reporting and analytical capabilities of Salesforce. It provides powerful data analysis and machine learning workflows to help you uncover valuable insights and make data-driven decisions.
  • Data Warehousing: Salesforce is primarily a transactional system for managing customer interactions. While it preserves a limited amount of historical data, it isn’t optimized for data analytics. Migrating Salesforce data into a purpose-built data warehouse such as BigQuery centralizes it for analytics and reporting.
  • Integration with Data Visualization Tools: BigQuery integrates with popular data visualization and BI tools like Looker Studio (formerly Data Studio), Looker, and Power BI. Connecting these tools to your Salesforce data in BigQuery helps you create interactive dashboards, reports, and visualizations for intuitive insights.

Methods to Connect Salesforce to Google BigQuery

To connect Salesforce to BigQuery and transfer data efficiently, you can use one of these two methods. In both cases, you can manage the BigQuery side of the setup through the Google Cloud console, and you will need to prepare the Salesforce data before loading it.

  • Method #1: Using APIs
  • Method #2: Using SaaS Alternatives

Method #1: Use APIs to Load Data From Salesforce to BigQuery

Apart from the many products that Salesforce offers, it also has an abundance of APIs. You can use certain APIs to access and extract data from Salesforce to load into BigQuery. Ensuring data quality before loading it into BigQuery is crucial for achieving analysis-ready data and real-time business insights.

Here’s a step-by-step guide for API-based Salesforce to BigQuery integration:

Step 1: Choose and Access Your API:

Step 2: Extract Data from Salesforce:

  • After successful authentication, fetch the desired data from Salesforce using the chosen API.
  • Build API requests to retrieve specific objects and fields.
  • If you’re looking for real-time data, you can use Salesforce’s Streaming API.
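
As a rough illustration of the extraction step, the Python sketch below queries the Salesforce REST API with SOQL and follows the nextRecordsUrl pagination link until the result set is complete. It assumes you already hold an OAuth access token and know your org's instance URL. The API version v59.0, the function names, and the send parameter are illustrative choices, not part of any official client.

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# Assumption: substitute whichever REST API version your org supports.
API_VERSION = "v59.0"


def build_query_request(instance_url: str, access_token: str, soql: str) -> urllib.request.Request:
    """Build a GET request against the REST API query resource."""
    query_string = urllib.parse.urlencode({"q": soql})
    url = f"{instance_url}/services/data/{API_VERSION}/query?{query_string}"
    return urllib.request.Request(url, headers={"Authorization": f"Bearer {access_token}"})


def http_get_json(request: urllib.request.Request) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def fetch_all_records(
    instance_url: str,
    access_token: str,
    soql: str,
    send: Callable[[urllib.request.Request], dict] = http_get_json,
) -> Iterator[dict]:
    """Yield every record for a SOQL query, following pagination links."""
    request = build_query_request(instance_url, access_token, soql)
    while True:
        page = send(request)
        yield from page["records"]
        if page.get("done", True):
            break
        # nextRecordsUrl is a path relative to the instance URL.
        next_url = instance_url + page["nextRecordsUrl"]
        request = urllib.request.Request(
            next_url, headers={"Authorization": f"Bearer {access_token}"}
        )
```

Calling list(fetch_all_records(instance_url, token, "SELECT Id, Name FROM Account")) would collect the Account records as dictionaries. Passing the HTTP call in as send keeps the paging logic easy to exercise without a live Salesforce org.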

Step 3: Prepare Data for BigQuery:

  • Ensure that extracted data is formatted correctly before loading it into BigQuery. BigQuery load jobs accept CSV and newline-delimited JSON, as well as Avro, Parquet, and ORC.
  • Ensure that data types are compatible with BigQuery (refer to the BigQuery Data Types documentation).
  • Carefully choose the 'data location' in BigQuery to optimize access and processing speeds.
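
To make the formatting step concrete, here is a minimal Python sketch that turns Salesforce REST API records into newline-delimited JSON, one of the formats BigQuery can load. Records returned by the REST API carry an attributes object with metadata about the record, which this sketch drops, including inside nested relationship objects. The function names are hypothetical, and the sketch does not handle subquery results or convert data types.

```python
import json
from typing import Iterable


def clean_record(record: dict) -> dict:
    """Return a copy of a record without Salesforce 'attributes' metadata."""
    cleaned = {}
    for key, value in record.items():
        if key == "attributes":
            continue
        # Relationship fields (for example Owner) arrive as nested objects.
        cleaned[key] = clean_record(value) if isinstance(value, dict) else value
    return cleaned


def to_ndjson(records: Iterable[dict]) -> str:
    """Serialize records as newline-delimited JSON, one object per line."""
    return "".join(json.dumps(clean_record(record)) + "\n" for record in records)
```

Writing the returned string to a file such as accounts.ndjson gives you a file that a BigQuery load job can read with the NEWLINE_DELIMITED_JSON source format.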

Step 4: Stage Data in Google Cloud Storage (Optional):

  • For large datasets, consider using Google Cloud Storage (GCS) as a staging area.
  • Upload your prepared data to GCS for efficient processing.
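
The staging step could look something like the following Python sketch, which uploads a prepared file to a GCS bucket. It assumes the google-cloud-storage package is installed and that application default credentials are configured. The object naming layout and the function names are hypothetical; the optional client parameter exists only so the upload logic can be tried with a stand-in client.

```python
from datetime import date


def staging_object_name(sobject: str, run_date: date) -> str:
    """Build a dated object path (hypothetical layout) for a Salesforce object."""
    return f"salesforce/{sobject}/{run_date.isoformat()}/records.ndjson"


def upload_to_gcs(local_path: str, bucket_name: str, object_name: str, client=None) -> str:
    """Upload a local file to GCS and return its gs:// URI."""
    if client is None:
        # Imported lazily so the helper above works without the package installed.
        from google.cloud import storage  # pip install google-cloud-storage

        client = storage.Client()
    blob = client.bucket(bucket_name).blob(object_name)
    blob.upload_from_filename(local_path)
    return f"gs://{bucket_name}/{object_name}"
```

The returned gs:// URI is what you would pass to a BigQuery load job as the source. Keeping one dated path per object and run makes it easier to reload or audit a specific batch later.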

Step 5: Load Data into BigQuery:

  • Utilize the BigQuery API or command-line tools like bq to load your data.
  • Load data directly from your local environment or from GCS (if used as a staging area).
  • Create a new BigQuery table or append data to an existing table within your chosen dataset. 
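
One way to script the load step is to assemble a bq load command from Python, as in the sketch below. It assumes the bq command-line tool is installed and authenticated, that the source is newline-delimited JSON, and that schema auto-detection is acceptable for a first load. The table name and source path in the usage note are hypothetical. By default bq load appends to an existing table, and the --replace flag overwrites it instead.

```python
import subprocess
from typing import List, Optional


def build_bq_load_command(
    table: str,
    source: str,
    replace: bool = False,
    location: Optional[str] = None,
) -> List[str]:
    """Assemble the argument list for a bq load job."""
    command = ["bq"]
    if location:
        # --location is a global bq flag, so it goes before the subcommand.
        command.append(f"--location={location}")
    command += ["load", "--source_format=NEWLINE_DELIMITED_JSON", "--autodetect"]
    if replace:
        # Overwrite the table instead of appending to it.
        command.append("--replace")
    command += [table, source]
    return command


def run_bq_load(command: List[str]) -> None:
    """Run the load job and raise if bq exits with an error."""
    subprocess.run(command, check=True)
```

For example, run_bq_load(build_bq_load_command("crm.accounts", "gs://my-bucket/salesforce/Account/records.ndjson", location="US")) would append the staged file to the accounts table in the crm dataset. The source can also be a local file path if you skipped the staging step.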

Method #2: Salesforce to BigQuery Integration with SaaS Alternative Estuary Flow

While APIs make it possible to connect Salesforce to BigQuery, they come with drawbacks. An API-based transfer requires writing custom code and maintaining it as the APIs change, which adds complexity and effort to the integration. Many data analysts find it more streamlined to use SaaS ETL (Extract, Transform, Load) tools like Estuary Flow. These tools automate data transfer, manage schema compatibility, and often provide a low-code or no-code interface.

Estuary Flow is a low-code ETL tool that automates real-time data transfer between sources and destinations using pre-built connectors while keeping schemas compatible. This makes it suitable for monitoring, data processing, and reporting tasks, especially when dealing with multiple data sources.

Estuary delivers a comprehensive experience for capturing data from Salesforce by using a mixture of different APIs and methods. You can capture historical and real-time updates with two connectors based on different APIs. For capturing historical data, Estuary uses a slightly modified version of Airbyte's Salesforce connector that uses the REST Query API.

For real-time data, it uses the PushTopic Streaming API, which is built for receiving real-time updates.

Here’s a step-by-step guide to transferring Salesforce data to BigQuery using Estuary Flow:

Step 1: Sign Up and Access Estuary Flow:

  • Log in to your Estuary account if you already have one. Otherwise, you can register for a free Estuary account.

Step 2: Capture Salesforce Data

  • In the Estuary Flow dashboard, click Captures on the left sidebar, then select New Capture.
  • Locate the Salesforce connector by scrolling or using the Search Connectors box. Once you’ve located the connector, click on the Capture button.
  • Notice that there are two Salesforce connectors. The regular Salesforce connector captures data from Salesforce objects into Flow collections using batch processing. On the other hand, the Salesforce Real-Time connector is used for capturing data as soon as it is generated via the Salesforce PushTopic API.

Let’s consider the regular Salesforce connector for this tutorial. You will be directed to the Salesforce Historical Data connector page.

  • Complete the required fields on this page, such as connector Name, Start Date, and Filters for your Salesforce Objects. If you’re using a Salesforce Sandbox, tick the box against Sandbox. Now, click on Authenticate Your Salesforce Account. You’ll be prompted to provide your Salesforce user credentials. 
  • Upon completing these requirements, click on Next. Flow connects with your Salesforce account and detects all the data objects available there. When you’re ready, click on Save and Publish.

Instead of the regular Salesforce connector, you can select the Salesforce Real-Time connector to capture data from Salesforce objects in real time. This connector only requires you to authenticate your Salesforce account to start capturing the data.

Step 3: Set Up BigQuery Destination (Materialization)

  • Next, you must set up the destination for your Salesforce data. You can click on Materialize Connections in the pop-up following a successful capture. Or you can navigate to the Estuary dashboard and click on Materializations on the left-side pane. Then, click on the New Materialization button.
  • Search for the BigQuery connector in Search Connectors. Then, click on the Materialize button.
  • BigQuery requires a few additional prerequisites before connecting to Flow.
  • Once you’re done with the prerequisite steps, you can proceed with setting up your destination. The BigQuery materialization requires a few details. Fill in the required fields: connector Name, Project ID, Service Account JSON credentials, Region, Dataset, and Bucket.
  • Once you’ve provided all the details, click on Next. Then, click on Save and Publish.

By following these steps and leveraging Estuary Flow's intuitive interface, you can efficiently transfer your Salesforce data to BigQuery for advanced analytics and insights.

For more detailed instructions, refer to the Estuary documentation for the Salesforce and BigQuery connectors used above.

Key Advantages of Using Estuary Flow

  • Reduced Complexity: Eliminate the need for custom code and API maintenance.
  • Accelerated Time-to-Value: Set up and start transferring data quickly.
  • Real-Time Insights: Gain access to up-to-the-minute Salesforce data in BigQuery for immediate analysis.
  • Scalability: Easily handle large volumes of Salesforce data.

Final Thoughts on Salesforce to BigQuery Integration

Salesforce and BigQuery are each powerful on their own, but the real benefit comes from combining them. Migrating your Salesforce data to BigQuery enables you to leverage BigQuery’s powerful querying capabilities. With the capacity to store petabytes of data and process them in minutes, you can gain valuable insights for improved decision-making.

Two Primary Methods for Salesforce to BigQuery Integration:

  1. Direct API Integration: Offers flexibility but requires technical expertise and ongoing maintenance.
  2. SaaS ETL Tools (like Estuary Flow): Streamlines the process with a user-friendly, low-code interface, automatic data capture, and real-time updates.

A drawback of the API method is the need to stay vigilant for Salesforce API changes. Estuary Flow, by contrast, streamlines the entire data movement process, enabling you to set up and start transferring data in just a few minutes.

Experience the power of Salesforce and BigQuery integration. Register for a free Estuary account and start building your first pipeline!

About the author

Jeffrey Richman

With over 15 years in data engineering, Jeffrey Richman is a seasoned expert in driving growth for early-stage data companies, focusing on strategies that attract customers and users. Jeffrey’s extensive writing provides insights to help companies scale efficiently and effectively in an evolving data landscape.
