
Google BigQuery to Iceberg Tables: Best Methods and Tools

This guide explains how to move data from BigQuery to Apache Iceberg, covering automated sync options, manual export workflows, and key considerations for analytics and lakehouse use cases.


You can move data from Google BigQuery to Apache Iceberg using two practical approaches: an automated pipeline that continuously syncs data, or a manual export-and-load workflow using native BigQuery exports and Apache Spark. The right method depends on how frequently your data changes, how much operational complexity you can support, and whether Iceberg tables need to stay up to date over time.

Apache Iceberg is often used alongside BigQuery to reduce long-term storage costs, enable multi-engine analytics, and support large-scale historical analysis. Estuary is the Right-Time Data Platform that allows teams to automate data movement from BigQuery into Iceberg tables, with control over when data moves, including near real-time or batch delivery.

Key Takeaways

  • There are two common ways to move data from BigQuery to Iceberg: an automated integration using Estuary, or a manual workflow using BigQuery exports and Apache Spark.

  • Automated pipelines are best for ongoing analytics and production workloads, where Iceberg tables must remain synchronized with BigQuery data over time.

  • Manual export-based approaches are better suited for one-time migrations or archival use cases, but require more engineering effort and infrastructure management.

  • Apache Iceberg enables capabilities that complement BigQuery, including open table storage, schema evolution, time travel, and support for multiple query engines.

  • The choice depends on data freshness, cost predictability, and operational ownership, not just dataset size.

BigQuery to Apache Iceberg Data Movement Options: How to Choose the Right Method

There are multiple ways to move data from Google BigQuery to Apache Iceberg, but in practice most teams choose between an automated incremental pipeline or a manual export-and-load workflow. The right approach depends on how frequently your BigQuery data changes, how fresh Iceberg data needs to be, and how much operational complexity your team can support.

In most real-world architectures, BigQuery is not fully replaced. Instead, data is copied or synchronized into Iceberg to support long-term storage, multi-engine analytics, or cost optimization. For this reason, it is more accurate to think in terms of data movement or offloading, rather than a one-time migration.

BigQuery to Iceberg: Method Comparison

| Criteria | Automated Data Movement (Estuary) | Manual Export and Load (BigQuery + Spark) |
|---|---|---|
| Primary use case | Ongoing data synchronization | One-time or periodic data offloading |
| Data freshness | Incremental, poll-based (near real-time to batch) | Batch only |
| Setup effort | Low | Medium to high |
| Ongoing maintenance | Minimal | Requires scripting and job management |
| Schema changes | Automatically propagated | Manual handling required |
| Infrastructure required | Managed by platform | Spark environment required |
| Failure recovery | Built-in state tracking | Custom retry logic required |
| Best suited for | Production analytics and lakehouse pipelines | Backfills and archival use cases |

When to Use Automated BigQuery to Iceberg Data Movement

An automated approach is best when BigQuery tables change over time and Iceberg tables are expected to stay up to date without repeated manual exports.

This pattern is common when:

  • BigQuery is used for ingestion or transformation
  • Iceberg serves as a long-term analytical or lakehouse layer
  • Data must be queried by multiple engines such as Spark, Trino, or Flink
  • Teams want predictable operations without maintaining export scripts

Estuary supports this pattern by periodically querying BigQuery tables, using cursor columns to detect new or updated rows, and materializing those results into Iceberg tables on object storage. This approach aligns with BigQuery’s execution model and avoids assumptions about log-based CDC, which BigQuery does not provide.

When Manual Export and Spark-Based Loading Makes Sense

A manual export-based approach is more appropriate for controlled, batch-oriented scenarios.

This method is typically used when:

  • Data needs to be moved once or infrequently
  • BigQuery tables are relatively static
  • Teams already operate Spark infrastructure
  • Full control over export and load logic is required

In this workflow, BigQuery data is exported to cloud storage using native SQL commands and then loaded into Iceberg tables using Apache Spark. While flexible, this approach requires additional effort to manage schema evolution, retries, and job orchestration.

Key Factors to Consider Before Choosing a Method

Before deciding how to move data from BigQuery to Iceberg, consider:

  • Update frequency: Frequently changing tables benefit from automated incremental sync.
  • Operational ownership: Automated pipelines reduce long-term maintenance.
  • Cost predictability: Repeated full exports can increase BigQuery scan and egress costs.
  • Long-term architecture: Iceberg is often part of a shared lakehouse rather than a replacement for BigQuery.

Suggested read: How to Load Data into Apache Iceberg

Best Tools to Move Data from BigQuery to Apache Iceberg

Several tools and approaches can be used to move data from Google BigQuery into Apache Iceberg tables. In practice, these options fall into two categories: managed data movement platforms that automate incremental sync, and custom export-and-load pipelines built using BigQuery and Spark.

The best tool depends on how often your BigQuery data changes, how fresh Iceberg data needs to be, and how much operational complexity your team is willing to manage.

Estuary (Automated Incremental Data Movement from BigQuery to Iceberg)

Estuary is the Right-Time Data Platform used to automate data movement from BigQuery into Apache Iceberg tables stored on object storage. Instead of relying on log-based CDC, Estuary’s BigQuery connector works by periodically executing queries against BigQuery tables and capturing new or updated rows into collections.

Incrementality is achieved by configuring cursor columns, such as update timestamps or monotonically increasing IDs. On each poll, Estuary queries only rows with cursor values greater than the last observed value, making it well suited for large analytical tables that change over time.

Estuary is best suited for:

  • Ongoing synchronization between BigQuery and Iceberg
  • Lakehouse architectures where Iceberg serves as long-term storage
  • Teams that want to avoid maintaining custom export jobs
  • Use cases where near real-time to batch updates are acceptable

Key characteristics:

  • Poll-based incremental data capture aligned with BigQuery’s execution model
  • Automatic schema propagation from BigQuery to Iceberg
  • Writes to Apache Iceberg tables on object storage using a catalog such as AWS Glue or a REST catalog
  • Minimal operational overhead compared to custom pipelines

Because Estuary manages state, polling, and delivery internally, it is commonly used when Iceberg tables must remain accurate and continuously updated without manual intervention.

Apache Spark with Native BigQuery Exports (Manual Pipeline)

Apache Spark is often used when teams want full control over how BigQuery data is exported and loaded into Iceberg.

In this approach:

  • Data is exported from BigQuery using the EXPORT DATA SQL command, typically in Parquet format
  • Exported files are stored in cloud storage such as Google Cloud Storage
  • Spark reads the exported files and writes them into Iceberg tables using an Iceberg catalog

This method is best suited for:

  • One-time migrations or historical backfills
  • Periodic archival of BigQuery data
  • Teams with existing Spark infrastructure and expertise

While flexible, this approach requires engineers to manage:

  • Export scheduling
  • Schema evolution
  • Spark job configuration
  • Failure handling and retries

As a result, it is generally not ideal for keeping Iceberg tables continuously in sync with BigQuery.

BigQuery-Native and Cloud-Native Tooling (Limited Scope)

BigQuery-native tools, such as scheduled queries and export jobs, can be combined with cloud services to move data into Iceberg. These tools are typically used as building blocks rather than complete solutions.

They are best suited for:

  • Simple batch exports
  • Infrequent data movement
  • Low operational requirements

However, these approaches still require Spark or similar engines to write Iceberg tables and do not provide built-in handling for incremental updates or schema changes.

Tool Selection Summary

For most teams:

  • Use Estuary when you need an automated, low-maintenance way to keep Iceberg tables synchronized with BigQuery data.
  • Use Spark-based pipelines when performing one-time data movement or when full control over the export and load process is required.
  • Use BigQuery-native exports alone only for simple batch scenarios where Iceberg is not continuously updated.

Step-by-Step: Automated BigQuery to Apache Iceberg Data Movement Using Estuary

This method uses Estuary to automate incremental data movement from Google BigQuery into Apache Iceberg tables stored on object storage. It is best suited for teams that want Iceberg tables to stay up to date over time without building and maintaining custom export scripts or Spark jobs.

Estuary’s BigQuery connector operates by periodically executing queries against BigQuery tables and capturing new or updated rows using cursor columns, which aligns with BigQuery’s execution model and avoids assumptions about log-based CDC.

Prerequisites

Before setting up the pipeline, ensure the following are in place:

  • An active Estuary account
  • A Google Cloud project with BigQuery enabled
  • A Google Cloud service account with the following roles:
    • BigQuery User
    • BigQuery Data Viewer
  • A service account key in JSON format
  • An object storage bucket for Iceberg tables (such as Amazon S3)
  • An Iceberg catalog, such as AWS Glue or a REST-based catalog
  • Credentials with permission to write to the Iceberg storage and catalog

Step 1: Configure BigQuery as the Source

  • Log in to your Estuary account.
  • From the left navigation pane of the Estuary page, click the Sources option.
  • Click the + NEW CAPTURE button on the Sources page and search for BigQuery using the Search connectors field.
  • When the BigQuery connector appears in the search results, click the connector’s Capture button.
  • You will be redirected to the BigQuery connector configuration page; enter a unique source capture name in the Name field under the Capture Details section.
  • Expand the Endpoint Config section and fill in the mandatory fields, including:
    • Project ID: The Google Cloud project ID associated with your BigQuery dataset(s).
    • Service Account JSON: The service account key, in JSON format, used to authenticate with BigQuery.
    • Dataset: The name of your BigQuery dataset.

For each selected table, configure cursor columns to enable incremental data capture. Common cursor choices include update timestamps or monotonically increasing identifiers. When a cursor is defined, Estuary queries only rows with cursor values greater than the last observed value on each polling cycle.

Once the configuration is complete, save and publish the capture.

Step 2: Configure Apache Iceberg as the Destination

Estuary Flow provides Amazon S3-managed Apache Iceberg tables as a destination for faster analytics. Once your BigQuery data capture is complete, a pop-up window with the capture details appears. Click the MATERIALIZE COLLECTIONS option in this window to begin configuring the destination end of the data pipeline.

Alternatively, on the Estuary Flow dashboard, click Destinations > + NEW MATERIALIZATION. On the Create Materialization page, begin your setup using the following steps:

  • Search for Iceberg in the Search connectors field; an Amazon S3 Iceberg connector will appear. Using this connector, you can materialize delta updates from Flow collections into Apache Iceberg tables. The connector leverages Amazon S3 for object storage and AWS Glue as the catalog.

Delta updates contain only the changes made to the data since the previous update. These updates are batched within Flow, converted into Parquet files, and then appended to the Iceberg tables according to your defined schedule.

  • Click the connector’s Materialization button to proceed with your destination configuration.
  • On the Create Materialization page, fill all the necessary fields, including:
    • Name: A unique name for your materialization.
    • AWS Access Key ID: The Access Key ID to access AWS services.
    • AWS Secret Access Key: A secret access key to access AWS services.
    • Bucket: The name of the S3 bucket into which the data will be written.
    • Region: The AWS region where your S3 bucket exists.
    • Namespace: A namespace for bound collection tables.
  • In the Catalog section, you can choose either REST or AWS GLUE as the Iceberg catalog. 
    • The AWS Glue Catalog requires that your IAM user have the permissions to access AWS Glue.
    • The REST Catalog requires the following additional configuration:
      • The Catalog URI
      • Credentials to connect to the catalog
      • The name of the Warehouse to connect with
  • Under the Source Collections section, verify that the captured data from the BigQuery source is automatically linked to your materialization. If it is not, you can manually link the capture by clicking the SOURCE FROM CAPTURE button.
  • Click NEXT > SAVE AND PUBLISH to finish your destination configuration.

Estuary will begin materializing data from BigQuery into Iceberg tables according to the configured polling intervals.

How Incremental Updates Work

Estuary captures BigQuery data incrementally by:

  • Running SELECT queries against source tables at configurable intervals
  • Filtering rows using cursor columns
  • Tracking the highest observed cursor value between polls
  • Writing only new or updated rows to Iceberg tables

This approach allows Iceberg tables to remain synchronized with BigQuery data without repeatedly exporting full tables.
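
To illustrate the pattern, the sketch below shows the kind of cursor-filtered query a poll-based capture issues against BigQuery. This is a conceptual example using the google-cloud-bigquery Python client, not Estuary's internal implementation; the `orders` table and `updated_at` cursor column are hypothetical.

```python
# Conceptual sketch of cursor-based polling, not Estuary's internal code.
# Assumes a hypothetical `orders` table with an `updated_at` cursor column.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")

def poll_new_rows(last_cursor_value):
    """Fetch only rows whose cursor value is greater than the last one observed."""
    query = """
        SELECT *
        FROM `your-project-id.your_dataset.orders`
        WHERE updated_at > @cursor
        ORDER BY updated_at
    """
    job_config = bigquery.QueryJobConfig(
        query_parameters=[
            bigquery.ScalarQueryParameter("cursor", "TIMESTAMP", last_cursor_value)
        ]
    )
    rows = list(client.query(query, job_config=job_config).result())
    # The highest observed cursor value becomes the starting point for the next poll.
    new_cursor = max((row["updated_at"] for row in rows), default=last_cursor_value)
    return rows, new_cursor
```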

How Data Is Written to Iceberg

Captured data is:

  • Converted into columnar files
  • Written to object storage
  • Committed to Iceberg tables using snapshot-based metadata

Iceberg features such as schema evolution and time travel remain available, and tables can be queried by engines such as Spark, Trino, Flink, or Redshift Spectrum.
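
As a quick illustration of snapshot-based metadata and time travel, the sketch below lists a table's snapshots and queries an earlier one. It assumes a Spark session with an Iceberg catalog named `iceberg` and the `analytics.your_table` identifiers used elsewhere in this guide; all names are placeholders.

```python
# Sketch: listing snapshots and time-traveling to one of them with Spark SQL.
# Assumes the `iceberg` catalog and `analytics.your_table` identifiers used
# in this guide; both are placeholders for your own names.
snapshots = spark.sql(
    "SELECT snapshot_id, committed_at FROM iceberg.analytics.your_table.snapshots"
)
snapshots.show()

# Pick a snapshot ID from the listing (illustrative) and query the table as of that snapshot.
snapshot_id = snapshots.first()["snapshot_id"]
historical_df = spark.sql(
    f"SELECT * FROM iceberg.analytics.your_table VERSION AS OF {snapshot_id}"
)
historical_df.show(5)
```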

When This Method Is the Best Choice

Use this approach when:

  • BigQuery tables change regularly
  • Iceberg is used as a long-term analytical or lakehouse layer
  • You want to minimize operational overhead
  • Incremental updates are sufficient (near real-time to batch)

This method is commonly used for analytics pipelines, historical data offloading, and shared data lake architectures.

Step-by-Step: Manual BigQuery to Apache Iceberg Data Movement Using BigQuery Exports and Apache Spark

This method moves data from Google BigQuery to Apache Iceberg using a batch-based export and load workflow. It is best suited for one-time data movement, historical backfills, or periodic archival use cases where continuous synchronization is not required.

Unlike automated pipelines, this approach requires manual orchestration and infrastructure management, but provides full control over how data is exported, transformed, and written to Iceberg.

Prerequisites

Before starting, ensure you have:

  • A Google Cloud project with BigQuery enabled
  • Permissions to run BigQuery export jobs
  • A cloud storage bucket (typically Google Cloud Storage) for exported files
  • A Spark environment (Dataproc, Databricks, EMR, or self-managed Spark)
  • Apache Iceberg libraries compatible with your Spark version
  • An Iceberg catalog, such as:
    • AWS Glue Catalog
    • Hadoop catalog
    • REST catalog

Step 1: Export Data from BigQuery to Cloud Storage

BigQuery provides a native EXPORT DATA statement that allows you to export query results directly to cloud storage in columnar formats such as Parquet, which are compatible with Iceberg.

Example export command:

```sql
EXPORT DATA OPTIONS(
  uri='gs://your-bucket/bigquery-export/your_table_*.parquet',
  format='PARQUET',
  overwrite=true
) AS
SELECT * FROM dataset_name.table_name;
```

This operation:

  • Executes a snapshot query against the table
  • Writes the results as Parquet files to Google Cloud Storage
  • Produces a static view of the data at export time

Each export job handles one query result, so multiple tables require separate export operations.
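
Because each export job handles a single query result, teams offloading several tables often script the exports. The sketch below loops over a hypothetical list of tables using the google-cloud-bigquery Python client; the project, dataset, table, and bucket names are placeholders.

```python
# Sketch: running one EXPORT DATA job per table with the BigQuery Python client.
# All identifiers below are placeholders for your own project and tables.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")
tables = ["orders", "customers", "events"]  # hypothetical tables to offload

for table in tables:
    export_sql = f"""
        EXPORT DATA OPTIONS(
          uri='gs://your-bucket/bigquery-export/{table}_*.parquet',
          format='PARQUET',
          overwrite=true
        ) AS
        SELECT * FROM `your-project-id.dataset_name.{table}`;
    """
    # Each export runs as a regular query job; result() blocks until it completes.
    client.query(export_sql).result()
    print(f"Exported {table}")
```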

Step 2: Configure Apache Spark with Iceberg

To load data into Iceberg, Spark must be configured with Iceberg extensions and a catalog.

Example Spark configuration:

```bash
spark-sql \
  --packages org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1 \
  --conf spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions \
  --conf spark.sql.catalog.iceberg=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.iceberg.type=hadoop \
  --conf spark.sql.catalog.iceberg.warehouse=s3://your-bucket/iceberg-warehouse
```

You can replace the Hadoop catalog with AWS Glue or a REST catalog, depending on your environment.
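
If you prefer configuring Spark in code rather than on the spark-sql command line, the same settings can be applied when building a PySpark session. The sketch below mirrors the CLI example above; the runtime package version and warehouse path are assumptions that may need adjusting for your Spark version and storage layout.

```python
# Sketch: the same Iceberg catalog settings applied programmatically in PySpark.
# The package version and warehouse path mirror the CLI example and may need
# to be adjusted for your environment.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("bigquery-to-iceberg-load")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.7.1")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.iceberg", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.iceberg.type", "hadoop")
    .config("spark.sql.catalog.iceberg.warehouse", "s3://your-bucket/iceberg-warehouse")
    .getOrCreate()
)
```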

Step 3: Load Exported Data into Iceberg Tables

Spark can read the exported Parquet files directly from cloud storage without downloading them locally.

Example Spark SQL or PySpark workflow:

```python
# Create the target namespace in the Iceberg catalog, then load the exported
# Parquet files and write them as an Iceberg table.
spark.sql("CREATE DATABASE IF NOT EXISTS iceberg.analytics")

df = spark.read.parquet("gs://your-bucket/bigquery-export/your_table_*.parquet")

df.writeTo("iceberg.analytics.your_table") \
    .using("iceberg") \
    .createOrReplace()
```

At this stage:

  • The data is written into an Iceberg table
  • Iceberg metadata tracks snapshots and file locations
  • The table becomes queryable by Spark and other compatible engines

Optional optimizations include:

  • Defining explicit schemas instead of relying on inference
  • Applying partition transforms such as days(timestamp_column), as shown in the sketch after this list
  • Running compaction jobs to merge small files
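
As a sketch of the partition-transform optimization above, the load can be rewritten to partition the Iceberg table by day. The `event_ts` column is a placeholder for a timestamp column in your data.

```python
# Sketch: writing the Iceberg table with a daily partition transform instead of
# the default unpartitioned layout. `event_ts` is a hypothetical timestamp column.
from pyspark.sql.functions import days

df = spark.read.parquet("gs://your-bucket/bigquery-export/your_table_*.parquet")

(
    df.writeTo("iceberg.analytics.your_table")
    .using("iceberg")
    .partitionedBy(days("event_ts"))
    .createOrReplace()
)
```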

Step 4: Validate and Maintain the Iceberg Table

After loading data:

  • Validate row counts against BigQuery, as shown in the sketch after this list
  • Run sample queries to confirm schema correctness
  • Monitor file sizes and metadata growth
  • Schedule compaction if data volumes are large
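
A minimal validation sketch is shown below. It compares row counts between the BigQuery source and the Iceberg copy, assuming the Spark session and table identifiers from the earlier steps and the google-cloud-bigquery client; all names are placeholders.

```python
# Sketch: comparing row counts between the BigQuery source and the Iceberg copy.
# Assumes the identifiers used earlier in this guide; adjust them to your environment.
from google.cloud import bigquery

bq_client = bigquery.Client(project="your-project-id")
bq_count = list(
    bq_client.query(
        "SELECT COUNT(*) AS n FROM `your-project-id.dataset_name.table_name`"
    ).result()
)[0]["n"]

iceberg_count = spark.table("iceberg.analytics.your_table").count()

print(f"BigQuery rows: {bq_count}, Iceberg rows: {iceberg_count}")
if bq_count != iceberg_count:
    raise ValueError("Row counts do not match; re-check the export and load steps.")
```

Note that counts can legitimately differ if the BigQuery table changed after the export snapshot was taken.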

Because this method is batch-based, any new data added to BigQuery after the export requires a new export and load cycle.

Limitations of the Manual Export-and-Load Approach

This approach introduces several trade-offs:

  • Batch-only behavior: Iceberg tables are not continuously updated
  • Manual orchestration: Scheduling, retries, and monitoring must be handled by engineers
  • Schema drift risk: Changes in BigQuery schemas require manual updates
  • Higher operational overhead: Spark infrastructure must be managed

For these reasons, this method is typically used for migrations or archival workflows rather than production synchronization.

Operational Considerations and Best Practices for Moving Data from BigQuery to Apache Iceberg

Moving data from Google BigQuery to Apache Iceberg is not just a matter of exporting and loading tables. To ensure correctness, performance, and long-term maintainability, teams need to account for cost behavior, schema evolution, partitioning strategy, and ongoing table maintenance.

The following best practices apply whether you use an automated pipeline or a manual export-based approach.

Understand BigQuery Scan and Export Costs

BigQuery pricing is largely driven by data scanned during query execution. Exporting data using EXPORT DATA or incremental polling queries still incurs scan costs.

Best practices:

  • Avoid frequent full-table exports for large tables
  • Use incremental approaches when tables change regularly
  • Limit exports to required columns rather than SELECT * when possible
  • Be mindful of repeated scans caused by poorly chosen cursor columns

Automated incremental pipelines typically reduce scan costs over time compared to repeated full exports.
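
Before running a large export or polling query, a dry run can show how many bytes it would scan, which translates directly into query cost. The sketch below uses the google-cloud-bigquery client; the table and columns are placeholders.

```python
# Sketch: estimating scanned bytes with a BigQuery dry run before exporting.
# A dry run reports the bytes a query would process without executing it.
from google.cloud import bigquery

client = bigquery.Client(project="your-project-id")
job_config = bigquery.QueryJobConfig(dry_run=True, use_query_cache=False)

job = client.query(
    "SELECT order_id, updated_at FROM `your-project-id.dataset_name.orders`",
    job_config=job_config,
)
print(f"This query would scan {job.total_bytes_processed / 1e9:.2f} GB")
```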

Choose Cursor Columns Carefully for Incremental Data Movement

When moving data incrementally from BigQuery, cursor columns determine which rows are captured on each run.

Recommended cursor characteristics:

  • Monotonically increasing values
  • Update timestamps when available
  • Creation timestamps for append-only tables
  • Composite cursors only when necessary

Avoid:

  • Non-deterministic columns
  • Columns that can decrease or be updated retroactively
  • Cursors that require scanning large portions of the table repeatedly

Well-chosen cursors significantly reduce query cost and improve reliability.

Plan for Schema Evolution Early

BigQuery schemas often change over time as new fields are added or data types evolve.

Best practices:

  • Validate how new columns are handled in Iceberg
  • Avoid incompatible type changes such as string-to-integer
  • Track nullable versus non-nullable fields
  • Monitor schema changes in BigQuery datasets proactively

Iceberg supports schema evolution, but not all changes are backward compatible without careful planning.
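
On the Iceberg side, additive changes can usually be applied with simple DDL. The sketch below assumes the Spark setup from the manual method and hypothetical column names; the second statement shows a safe type widening, which Iceberg supports, in contrast to incompatible changes such as string-to-integer.

```python
# Sketch: additive schema evolution when a new column appears in the BigQuery source.
# Column names and types are hypothetical; requires the Iceberg Spark extensions.
spark.sql("""
    ALTER TABLE iceberg.analytics.your_table
    ADD COLUMN discount_code STRING
""")

# Widening a numeric type (e.g., int -> bigint) is a compatible change.
spark.sql("""
    ALTER TABLE iceberg.analytics.your_table
    ALTER COLUMN order_count TYPE bigint
""")
```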

Design an Effective Partitioning Strategy in Iceberg

Partitioning has a major impact on Iceberg query performance and storage efficiency.

Recommended practices:

  • Use low-cardinality columns such as dates or regions
  • Prefer Iceberg partition transforms like days(timestamp_column)
  • Avoid over-partitioning, which increases metadata overhead
  • Base partitioning on query patterns, not source table structure

Iceberg’s hidden partitioning allows you to change partition layouts later without rewriting data.
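
For example, a partition spec can be evolved with DDL alone; existing files keep their old layout while new writes use the new spec. The sketch below assumes the Spark setup from the manual method and hypothetical column names.

```python
# Sketch: evolving the partition spec without rewriting existing data files.
# Requires Iceberg's Spark SQL extensions; `event_ts` is a placeholder column and
# `region` is assumed to be an existing partition field.
spark.sql("""
    ALTER TABLE iceberg.analytics.your_table
    ADD PARTITION FIELD days(event_ts)
""")

spark.sql("""
    ALTER TABLE iceberg.analytics.your_table
    DROP PARTITION FIELD region
""")
```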

Manage File Sizes and Compaction

Incremental data movement often produces many small files, which can degrade query performance.

Best practices:

  • Periodically compact small files into larger ones
  • Target file sizes in the 256 MB to 512 MB range for analytics
  • Schedule compaction as a background maintenance task
  • Monitor snapshot and metadata growth over time

Regular compaction improves scan efficiency and reduces planning overhead.
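
Iceberg ships a Spark stored procedure for compaction. The sketch below assumes the `iceberg` catalog from the manual method and targets roughly 512 MB files, matching the range above; the table name is a placeholder.

```python
# Sketch: compacting small files with Iceberg's rewrite_data_files procedure.
# Assumes the `iceberg` catalog configured earlier; the table name is a placeholder.
spark.sql("""
    CALL iceberg.system.rewrite_data_files(
        table => 'analytics.your_table',
        options => map('target-file-size-bytes', '536870912')
    )
""")
```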

Account for Cross-Cloud Data Movement

In many architectures, BigQuery data is exported to Google Cloud Storage while Iceberg tables are stored on Amazon S3 or another object store.

Considerations:

  • Network egress costs between cloud providers
  • Latency for large exports
  • Security and credential management across clouds

For large datasets, cross-cloud data movement can significantly impact cost and performance.

Validate Data After Movement

After moving data into Iceberg, validation is essential.

Recommended checks:

  • Compare row counts between BigQuery and Iceberg
  • Validate sample records for type and value correctness
  • Run representative analytical queries to confirm performance
  • Monitor query plans and scanned file sizes

Early validation prevents subtle issues from propagating into downstream analytics.

Monitor and Maintain Iceberg Tables Over Time

Once data is in Iceberg, ongoing maintenance is required.

Best practices:

  • Monitor snapshot growth and metadata size
  • Clean up expired snapshots when appropriate
  • Review access controls on object storage and catalog services
  • Track query performance across different engines

Iceberg tables are durable, but they benefit from periodic housekeeping.
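
For example, old snapshots can be cleaned up with Iceberg's expire_snapshots procedure. The sketch below assumes the `iceberg` catalog from the manual method; the 7-day retention window is illustrative, not a recommendation.

```python
# Sketch: expiring snapshots older than a retention window to limit metadata growth.
# The 7-day cutoff is an illustrative choice; pick a window that fits your recovery needs.
from datetime import datetime, timedelta

cutoff = (datetime.utcnow() - timedelta(days=7)).strftime("%Y-%m-%d %H:%M:%S")

spark.sql(f"""
    CALL iceberg.system.expire_snapshots(
        table => 'analytics.your_table',
        older_than => TIMESTAMP '{cutoff}'
    )
""")
```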

Conclusion

Moving data from Google BigQuery to Apache Iceberg is a common pattern for teams that want more control over long-term storage costs, support multiple analytics engines, or build an open lakehouse architecture. In most cases, this process is not a full migration away from BigQuery, but a deliberate way to offload or synchronize data into Iceberg for broader analytical use.

There are two practical approaches to achieve this. An automated, incremental pipeline is best suited for production analytics and long-lived data platforms where Iceberg tables need to stay up to date as BigQuery data changes. A manual export-and-load workflow is more appropriate for one-time data movement, historical backfills, or archival use cases where continuous updates are not required.

For teams that want to minimize operational overhead and avoid maintaining custom export jobs, automated data movement provides a more reliable and scalable solution. For teams that need full control and already operate Spark infrastructure, manual exports remain a viable option. The right choice depends on data freshness requirements, cost considerations, and long-term ownership of the pipeline.

To understand how Estuary automates data integration and enhances your business productivity, feel free to connect with Estuary experts.


FAQs

    How do I move data from BigQuery to Apache Iceberg?

    Data can be moved from Google BigQuery to Apache Iceberg using either an automated incremental pipeline or a manual batch-based workflow. An automated approach uses a data movement platform to periodically query BigQuery tables, capture new or updated rows, and write them into Iceberg tables on object storage. A manual approach involves exporting data from BigQuery using the EXPORT DATA command and then loading the exported files into Iceberg tables using Apache Spark. The automated method is better suited for ongoing analytics, while the manual method is typically used for one-time or infrequent data movement.

    Which method is best for keeping Iceberg tables in sync with BigQuery?

    The best approach depends on how often the data changes and how the Iceberg tables are used. For production analytics and shared lakehouse environments, an automated incremental pipeline is usually the best option because it keeps Iceberg tables synchronized with BigQuery over time and reduces operational effort. For historical backfills or archival use cases, a manual export-and-load process can be sufficient, though it requires more engineering effort to manage.

    Why use Apache Iceberg alongside BigQuery?

    Iceberg is often chosen when teams want to separate storage from compute, reduce long-term storage costs, support multiple query engines, or enable features such as schema evolution and time travel. BigQuery remains a strong option for serverless analytics and interactive querying, but Iceberg provides greater flexibility for modern data lake and lakehouse architectures.
