sql serverbigquery

11 min read

Last updated: May 19, 2026

How to Easily Connect Microsoft SQL Server to BigQuery

Stop loading SQL Server tables into BigQuery with brittle ODBC scripts or batch jobs that miss deletes. Three production paths compared: log-based CDC, Google's native transfer service, and manual pipelines, with real tradeoffs.

Jeffrey Richman Data Engineering & Growth Specialist

Share this article

Summarize this page with AI

Start Building For Free

Moving data from SQL Server to BigQuery is one of the most common integration challenges for data engineering teams. The right approach depends on your latency requirements, hosting environment, and how much engineering overhead you can absorb. This guide covers three proven methods, from Google's native transfer service to real-time CDC pipelines, with honest tradeoffs for each.

How Do I Connect SQL Server to BigQuery?

You can connect SQL Server to BigQuery using three main approaches:

Google BigQuery Data Transfer Service: Native, scheduled batch transfers. Free during preview, but limited change-capture support.
Manual ODBC and Cloud SDK pipelines: Full control, but high setup and maintenance overhead.
Real-time CDC platforms like Estuary: Continuous streaming of inserts, updates, and deletes with minimal setup.

For production workloads that need complete, always-fresh data in BigQuery, CDC-based pipelines are the most reliable option.

SQL Server to BigQuery Methods Compared

Method	Best for	Captures deletes?	Freshness	Setup effort
Estuary	Continuous replication and fresh analytics	Yes, when CDC is available	Seconds to minutes	Low to medium
BigQuery Data Transfer Service	Native scheduled transfers	Limited; review Google docs	Scheduled batch	Medium
Manual ODBC / Cloud SDK	Custom one-off or fully controlled workflows	Only if custom-built	Depends on implementation	High
Cloud Data Fusion or Datastream	GCP-native replication workflows	Depends on configuration	Batch to real time	Medium to high

What to Know Before You Start

Before choosing a method, a few technical requirements and decision points will determine which approach fits your environment.

SQL Server requirements

Real-time SQL Server CDC requires the following:

Version: SQL Server 2016 SP1 or later. CDC is available across all editions from this version onward. Earlier versions restrict CDC to Enterprise and Developer editions.
SQL Server Agent: Must be running on self-hosted SQL Server and Azure SQL Managed Instance. Azure SQL Database uses an internal scheduler instead of Agent jobs.
Keys: SQL Server CDC itself can capture tables without primary keys, but downstream materialization into BigQuery needs a stable collection key for correct merge updates. If a source table does not have a primary key, define an appropriate key manually during capture setup in Estuary.

Estuary regularly tests its SQL Server connector against SQL Server 2017 and up.

Hosting environment

The command to enable CDC differs by platform:

Hosting	Enable CDC command
Self-hosted	`EXEC sys.sp_cdc_enable_db;`
AWS RDS	`EXEC msdb.dbo.rds_cdc_enable_db;`
Google Cloud SQL	`EXEC msdb.dbo.gcloudsql_cdc_enable_db '<database>';`
Azure SQL Database	`EXEC sys.sp_cdc_enable_db;` where CDC is supported; otherwise use Change Tracking
Azure SQL Managed Instance	`EXEC sys.sp_cdc_enable_db;`

Estuary provides dedicated connector variants for self-hosted SQL Server, Amazon RDS, and Google Cloud SQL, each with platform-specific setup instructions.

If CDC is not an option

Estuary offers two alternatives for environments where CDC is unavailable or not the right fit:

Change Tracking connector: Stores only primary keys rather than full row contents. Lower storage overhead on the source database, but may combine intermediate changes during rapid succession.
Batch Query connector: Periodically queries tables and captures results. Works on any SQL Server version and hosting type, but does not capture individual row-level changes between polling intervals.

BigQuery requirements

On the destination side, you need:

A Google Cloud Storage (GCS) bucket in the same region as your BigQuery destination dataset, used as a temporary staging area.
A Google Cloud service account with three IAM roles:
- roles/bigquery.dataEditor on the destination dataset
- roles/bigquery.jobUser on the project
- roles/storage.objectAdmin on the GCS bucket

One gotcha worth knowing upfront

If you add a column to a source table in SQL Server, the CDC change table does not automatically reflect the new column. Microsoft's recommended approach is to create a second capture instance for the updated schema, transition to it, and delete the old one. Estuary's connector detects a second capture instance automatically and seamlessly switches to the newest one once both are valid. You can also enable Automatic Capture Instance Management in Estuary to have the connector handle this, though it requires additional permissions on the source database.

Method 1: Real-Time CDC Pipeline with Estuary

Estuary captures changes from SQL Server using log-based Change Data Capture and streams them into BigQuery continuously. No custom code, no batch scheduling. Setup takes three steps.

Step 1: Enable CDC on SQL Server

Run the following script as a privileged user. This example is for a self-hosted SQL Server instance. For AWS RDS or Google Cloud SQL, swap the first EXEC line for the platform-specific command shown in the prerequisites table.

sql
USE <database>;

-- Enable CDC for the database
EXEC sys.sp_cdc_enable_db;

-- Create a user for the connector
CREATE LOGIN flow_capture WITH PASSWORD = 'secret';
CREATE USER flow_capture FOR LOGIN flow_capture;

-- Grant required permissions
GRANT SELECT ON SCHEMA :: dbo TO flow_capture;
GRANT SELECT ON SCHEMA :: cdc TO flow_capture;
GRANT VIEW DATABASE STATE TO flow_capture;

-- Create watermarks table used by the connector for replication state
CREATE TABLE dbo.flow_watermarks(slot INTEGER PRIMARY KEY, watermark TEXT);
GRANT SELECT, INSERT, UPDATE ON dbo.flow_watermarks TO flow_capture;

-- Enable CDC on a source table you want to capture
EXEC sys.sp_cdc_enable_table
  @source_schema = 'dbo',
  @source_name = 'Orders',
  @role_name = 'flow_capture';

Estuary also requires a flow_watermarks table for replication state tracking. Create it and grant the connector user SELECT, INSERT, and UPDATE permissions as shown above. Then run sys.sp_cdc_enable_table for each business table you want to replicate, such as Orders, Customers, or Invoices.

Before moving to Step 2, verify two things:

SQL Server Agent is running (self-hosted and Azure SQL Managed Instance only). CDC will not function without it on these platforms. Azure SQL Database uses an internal scheduler and does not require Agent.
Source tables have a defined key. If a table lacks a primary key, you can specify one manually in the Estuary collection definition during capture setup.

Step 2: Create the SQL Server Capture in Estuary

Log in to the Estuary dashboard and create a new capture. Select the SQL Server connector that matches your hosting environment.

Fill in the endpoint configuration:

Address: Your SQL Server host in host:1433 format
Database: The database name where CDC is enabled
User and password: The flow_capture credentials you created in Step 1
Network access: Either allowlist Estuary's IP addresses in your firewall, or configure an SSH tunnel directly in the endpoint settings

Estuary discovers your CDC-enabled tables automatically. Select the tables you want to capture, then publish the capture.

Captured changes are written to Estuary collections, which are append-only durable logs stored indefinitely in your cloud storage bucket. The full historical record is always available for backfills or replays without re-reading from the source database.

Step 3: Configure BigQuery Materialization

Go to the Destinations page in the Estuary dashboard and create a new materialization using the BigQuery connector.

Enter the following configuration:

Project ID and dataset name: Your BigQuery destination
Service account key: Paste or upload the JSON key file for your service account
GCS bucket: The staging bucket you created (must be in the same region as your dataset)
Sync schedule: Controls how frequently Estuary pushes accumulated changes to BigQuery

A note on sync schedule and cost: setting the interval to zero seconds means changes flow into BigQuery in near real-time. Setting a longer interval (for example, 30 minutes) reduces the number of load operations BigQuery processes, which can lower BigQuery destination costs. Estuary's own pricing does not change based on sync frequency.

Select the collections you want to materialize, then click Save and Publish. Estuary runs a complete backfill first, then switches to continuous incremental replication automatically.

The BigQuery connector supports two update modes:

Standard (merge) updates: Your destination table stays as an always-current mirror of the source.
Delta updates: Raw change events stream into the destination without merging, useful for building a full change history in BigQuery.

What Lands in BigQuery?

After the initial backfill, Estuary materializes SQL Server collections into BigQuery tables. In standard update mode, each BigQuery table represents the latest state of the selected source table, which is useful for reporting and dashboards.

If you choose delta updates, BigQuery receives raw change events instead of only the merged current state. This is useful when you want to preserve insert, update, and delete history for auditing, debugging, or downstream event analysis.

For example, a SQL Server Orders table can become a BigQuery orders table for current reporting, while a delta-update configuration can preserve the sequence of changes behind those orders.

Method 2: BigQuery Data Transfer Service (Google Native)

Google's BigQuery Data Transfer Service includes a SQL Server connector in preview. It supports both on-premises and cloud-hosted SQL Server instances (including AWS and Azure), handles data type mapping automatically, and is free to use during the preview period.

The service has meaningful limitations for production workloads:

Incremental transfers only work with TIMESTAMP watermark columns. Tables without a monotonically increasing timestamp cannot be incrementally synced.
A single transfer configuration supports either incremental or full ingestion, not both.
The number of concurrent transfer jobs is capped by SQL Server's maximum concurrent connections.
Some configuration options are locked once a transfer has run.

Because the SQL Server transfer connector is in preview, pricing, limitations, and supported configurations may change. Review Google's documentation before choosing it for a production pipeline. BQ DTS is a reasonable free option for one-time migrations or periodic batch loads. For continuous sync with full change capture, a CDC-based approach is usually a better fit.

See the official BigQuery Data Transfer Service documentation for setup instructions.

Method 3: Manual Pipeline (ODBC + Cloud SDK)

The traditional manual approach uses the BigQuery ODBC driver on the SQL Server side and the Google Cloud SDK for authentication. The process involves nine main steps:

Create a Google Cloud project and enable the BigQuery API.
Create a service account with permissions to access BigQuery.
Generate and download a JSON key file for the service account.
Install the Google Cloud SDK on your SQL Server machine.
Authenticate the SDK using your service account and project ID.
Install the BigQuery ODBC driver on your SQL Server machine.
Configure an ODBC data source for BigQuery in SQL Server.
Write the transfer logic (SSIS package, T-SQL job, or custom script).
Test the connection and validate data movement.

This method gives you full control but carries significant ongoing costs. You are responsible for driver updates, credential rotation, error handling, retry logic, schema change detection, and monitoring. There is no built-in CDC, so most implementations fall back to full table loads or timestamp-based incremental queries.

Other Approaches Worth Knowing

Several other data integration tools can move data from SQL Server to BigQuery. Each fits a specific use case.

SQL Server Integration Services (SSIS) with the BigQuery Connector suits teams already deeply invested in the SSIS ecosystem. It provides drag-and-drop components for SQL Server to BigQuery transfers but requires SSIS infrastructure and licensing, and lacks native CDC support.

Google Cloud Data Fusion is a managed visual pipeline builder native to GCP. It offers pre-built SQL Server and BigQuery connectors and can run CDC replication jobs. It is a strong fit for GCP-native shops that want a graphical interface, though pricing can scale quickly for smaller workloads.

Google Cloud Datastream is Google's native real-time CDC service. It supports SQL Server as a source and BigQuery as a destination with low-latency change capture. It is the closest GCP-native competitor to Estuary, best suited to teams committed to staying entirely within Google Cloud.

Apache Beam is an open-source unified programming model for batch and streaming pipelines. It offers maximum flexibility but requires significant engineering investment. Best for teams with strong data engineering capacity that need custom transformation logic.

How to Choose the Right Method

The right method depends on three decision axes.

Latency requirement

Near real-time (seconds to minutes): CDC-based platforms like Estuary or Google Cloud Datastream.
Hourly or daily batch: BigQuery Data Transfer Service or scheduled SSIS jobs.
One-time migration: Manual ODBC pipeline, BQ DTS in full ingestion mode, or a one-shot Estuary backfill.

Engineering capacity

No-code or low-code preferred: Estuary, Fivetran, Cloud Data Fusion.
Code-heavy, custom transformations needed: Apache Beam, custom Python with pyodbc and google-cloud-bigquery.
Existing SSIS or SSMS workflows: SSIS BigQuery Connector.

Hosting environment

Multi-cloud or hybrid: Estuary, Fivetran, or other vendor-neutral platforms.
100% Google Cloud: BQ DTS or Datastream may simplify procurement and billing.
Existing Microsoft stack with Azure: Azure Data Factory may integrate more naturally.

Where Estuary may not be the right fit

If your pipeline requires complex transformations before data lands in BigQuery (joins across sources, heavy enrichment, lookup tables), you may need to pair Estuary with dbt for downstream transformations in BigQuery, or evaluate a tool with a built-in mid-pipeline transformation layer. If your stack is exclusively Google Cloud and you only need to replicate GCP-managed databases, Datastream may be simpler to operate.

Common Use Cases

Connecting SQL Server to BigQuery typically supports one of the following scenarios:

Analytics offloading: Run heavy analytical queries in BigQuery instead of taxing the SQL Server OLTP database.
Cloud migration: Replicate SQL Server data to BigQuery as part of a phased migration off on-premises infrastructure.
Hybrid data warehousing: Combine SQL Server transactional data with other cloud sources in a single BigQuery warehouse.
Federated analytics: Enable BigQuery to query SQL Server data alongside other GCP data sources.
Real-time dashboards: Stream CDC-captured changes into BigQuery for immediate visualization in Looker or connected BI tools. See our guide on building real-time data pipelines for architecture patterns.

Getting Started

Estuary can help teams set up SQL Server to BigQuery pipelines without writing custom replication code, especially when they need CDC, backfills, and ongoing sync.

Start your free Estuary account or book a demo to walk through your specific use case.

This guide is published by Estuary. We have included competing approaches with honest assessments of where each fits best.

About the author

Jeffrey RichmanData Engineering & Growth Specialist

Jeffrey is a data engineering professional with over 15 years of experience, helping early-stage data companies scale by combining technical expertise with growth-focused strategies. His writing shares practical insights on data systems and efficient scaling.