
How to Migrate From Snowflake Native Tables to Open Catalog (Polaris) Iceberg Tables

Learn how to migrate Snowflake native tables to Polaris Iceberg for open storage, scalability, and interoperability. Step-by-step guide with PySpark code.


Overview

Migrating from a Snowflake-managed table to a Polaris Iceberg table requires a well-planned and structured approach. Snowflake’s native tables are proprietary, while Snowflake Open Catalog (Polaris) leverages Apache Iceberg for an open, scalable table format.

At a high level, the migration from Snowflake native tables to Polaris Iceberg tables involves three steps:

  1. Extracting data from Snowflake.
  2. Syncing it with an external Polaris catalog.
  3. Converting it to a Polaris internal catalog.

This guide walks through the technical details, including setting up external volumes, catalog integration, and the actual migration steps using PySpark.

Why Migrate to Polaris Iceberg?

Migrating to Polaris Iceberg tables provides several advantages:

  1. Open Storage Format: Avoids vendor lock-in by leveraging Apache Iceberg.
  2. Cost Efficiency: Optimizes storage and query performance.
  3. Interoperability: Enables seamless analytics across Spark, Presto, Trino, and Flink.
  4. Scalability: Supports large datasets with transactional integrity.

Prerequisites

Before migrating, ensure:

  • A Snowflake account with the necessary permissions
  • A Polaris account with external and internal catalog access
  • Apache Spark with Iceberg support configured (see the sketch after this list)
  • Access to an object storage bucket (S3, GCS, or equivalent)
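
To confirm the Spark prerequisite, the minimal bootstrap below pulls the Iceberg runtime from Maven and enables the Iceberg SQL extensions. It is a sketch, assuming Spark 3.5 with Scala 2.12 and Iceberg 1.5.0; adjust the package coordinates to match your cluster.

python
from pyspark.sql import SparkSession

# Sketch: assumes Spark 3.5 / Scala 2.12 / Iceberg 1.5.0.
# Swap the coordinates for the versions running on your cluster.
spark = (
    SparkSession.builder
    .appName("IcebergReadinessCheck")
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .getOrCreate()
)

# If the extension loaded, Iceberg DDL statements and procedures are available in Spark SQL.
print(spark.conf.get("spark.sql.extensions"))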

Step 1: Set Up Polaris External Catalog

First, create an external Polaris catalog to store the Snowflake table in an Iceberg-compatible format.

plaintext
-- Create an external catalog in Polaris
CREATE CATALOG polaris_external WITH (
    location = 's3://your-bucket/polaris_external/'
);

Step 2: Create a Snowflake Iceberg Table

Now, create an Iceberg table in Snowflake and enable continuous sync with Polaris.

plaintext
USE DATABASE sales_db;
USE SCHEMA transactions;

CREATE OR REPLACE ICEBERG TABLE order_history_ext (
    order_id INT,
    customer_id STRING,
    order_date TIMESTAMP,
    total_amount DECIMAL(18,2),
    status STRING
)
CATALOG = 'SNOWFLAKE'
EXTERNAL_VOLUME = 'polaris_external_vol'
BASE_LOCATION = 'order_history'
CATALOG_SYNC = 'sales_migration';

This ensures that every commit in Snowflake is reflected in Polaris.
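
The CREATE statement above references an external volume (polaris_external_vol) and a catalog integration (sales_migration) used by CATALOG_SYNC. If those objects don't exist yet, they can be created along the lines of the hedged sketch below; the bucket, IAM role, Open Catalog account URL, and OAuth credentials are placeholders, and the exact parameters depend on your cloud provider and Open Catalog configuration.

plaintext
-- Sketch only: external volume covering the bucket used for Iceberg data and metadata
CREATE OR REPLACE EXTERNAL VOLUME polaris_external_vol
  STORAGE_LOCATIONS = (
    (
      NAME = 'polaris-s3-location'
      STORAGE_PROVIDER = 'S3'
      STORAGE_BASE_URL = 's3://your-bucket/polaris_external/'
      STORAGE_AWS_ROLE_ARN = '<your-iam-role-arn>'
    )
  );

-- Sketch only: catalog integration that lets CATALOG_SYNC push commits to Open Catalog
CREATE OR REPLACE CATALOG INTEGRATION sales_migration
  CATALOG_SOURCE = POLARIS
  TABLE_FORMAT = ICEBERG
  REST_CONFIG = (
    CATALOG_URI = 'https://<open-catalog-account>.snowflakecomputing.com/polaris/api/catalog'
    CATALOG_NAME = 'polaris_external'
  )
  REST_AUTHENTICATION = (
    TYPE = OAUTH
    OAUTH_CLIENT_ID = '<client-id>'
    OAUTH_CLIENT_SECRET = '<client-secret>'
    OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
  )
  ENABLED = TRUE;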

Step 3: Sync Data with Polaris External Catalog

Once the table is created, populate it with data from an existing Snowflake table.

plaintext
INSERT INTO order_history_ext
SELECT *
FROM sales_db.transactions.order_history
WHERE order_date >= '2023-01-01';
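
Before moving on, it's worth a quick sanity check that the row counts line up for the migrated date range. A simple comparison, run in Snowflake, could look like this:

plaintext
-- Row counts for the migrated date range should match
SELECT
  (SELECT COUNT(*) FROM sales_db.transactions.order_history
     WHERE order_date >= '2023-01-01') AS source_rows,
  (SELECT COUNT(*) FROM sales_db.transactions.order_history_ext) AS synced_rows;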

Step 4: Migrate to Polaris Internal Catalog Using PySpark

With the data synced to Polaris, use PySpark to register the external table into Polaris’ internal catalog.
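
The migration script below calls a helper, get_latest_metadata_json_file, to find the newest Iceberg metadata file for the synced table. That helper isn't part of any library; a hedged sketch using boto3 (assuming AWS credentials are already configured for the bucket) might look like this:

python
import boto3

def get_latest_metadata_json_file(bucket_name, prefix):
    """Return the key of the most recently written *.metadata.json under the prefix."""
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")

    metadata_files = []
    for page in paginator.paginate(Bucket=bucket_name, Prefix=prefix):
        for obj in page.get("Contents", []):
            if obj["Key"].endswith(".metadata.json"):
                metadata_files.append(obj)

    if not metadata_files:
        raise FileNotFoundError(
            f"No metadata.json files found under s3://{bucket_name}/{prefix}"
        )

    # The latest table commit corresponds to the most recently modified metadata file.
    return max(metadata_files, key=lambda o: o["LastModified"])["Key"]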

python
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName("PolarisOrderMigration") \
    .config("spark.sql.catalog.polaris", "org.apache.iceberg.spark.SparkCatalog") \
    .config("spark.sql.catalog.polaris.type", "hadoop") \
    .config("spark.sql.catalog.polaris.warehouse", "s3a://your-bucket/polaris_internal/") \
    .getOrCreate()

# Switch to the Polaris catalog
spark.sql("USE CATALOG polaris")
spark.sql("SHOW NAMESPACES").show()

# Fetch the latest Iceberg metadata file for the synced table
bucket_name = 'your-bucket'
prefix = 'transactions/order_history_ext/metadata/'
latest_metadata = get_latest_metadata_json_file(bucket_name, prefix)

# Register the table in the Polaris internal catalog
spark.sql(f'''
CALL polaris.system.register_table(
    table => 'polaris.transactions.order_history',
    metadata_file => 's3a://{bucket_name}/{latest_metadata}'
)
''')

# Verify the migration
spark.sql("SHOW TABLES").show()
spark.sql("SELECT * FROM polaris.transactions.order_history").show(10)

Step 5: Cleanup and Finalize

Once the data is successfully migrated, drop unnecessary Snowflake tables to finalize the migration.

plaintext
-- Drop the original Snowflake table
DROP TABLE sales_db.transactions.order_history;

-- Drop the external sync table
DROP TABLE sales_db.transactions.order_history_ext;

After deleting the old tables, you're good to go!

Optional: Expose the Polaris Table to Snowflake Using REST Integration

If you need Snowflake to access the Polaris internal catalog table, you can expose it back to Snowflake as an Iceberg table.

plaintext
CREATE OR REPLACE ICEBERG TABLE sales_migrated.transactions.order_history AS
SELECT * FROM polaris.transactions.order_history;
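
Note that the CTAS above copies the data into a Snowflake-managed table. If you want Snowflake to read the Polaris-managed table in place instead, an externally managed (read-only) Iceberg table is an option. A hedged sketch follows, assuming a catalog integration pointing at the Polaris internal catalog (shown here as sales_internal, a hypothetical name) and an external volume that covers the internal catalog's storage location:

plaintext
-- Sketch only: 'sales_internal' is a hypothetical catalog integration for the internal catalog
CREATE OR REPLACE ICEBERG TABLE sales_migrated.transactions.order_history
  EXTERNAL_VOLUME = 'polaris_external_vol'
  CATALOG = 'sales_internal'
  CATALOG_TABLE_NAME = 'order_history';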

Post-Migration Optimizations

After migration, optimize the Iceberg table for performance:

python
# Rewrite data files for optimization
spark.sql("CALL polaris.system.rewrite_data_files('polaris.transactions.order_history')")

# Add partitions (if needed)
spark.sql("ALTER TABLE polaris.transactions.order_history ADD PARTITION FIELD order_date")

# Configure RBAC permissions
spark.sql("GRANT SELECT ON polaris.transactions.order_history TO role data_analyst")
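
Beyond rewriting data files, regular table maintenance keeps Iceberg metadata small. Here is a hedged sketch using standard Iceberg Spark procedures; the cutoff timestamp is a placeholder to adjust for your retention policy.

python
# Expire snapshots older than a chosen cutoff (placeholder timestamp) to shrink metadata
spark.sql("""
    CALL polaris.system.expire_snapshots(
        table => 'polaris.transactions.order_history',
        older_than => TIMESTAMP '2024-01-01 00:00:00'
    )
""")

# Remove data files that are no longer referenced by any snapshot
spark.sql("""
    CALL polaris.system.remove_orphan_files(
        table => 'polaris.transactions.order_history'
    )
""")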

Frequently Asked Questions (FAQs)

1. What is Polaris Iceberg in Snowflake?

Polaris is Snowflake’s Open Catalog, allowing users to store and manage Apache Iceberg tables externally while maintaining compatibility with Snowflake analytics.

2. How does Polaris differ from Snowflake native tables?

  • Polaris uses Apache Iceberg, which is an open table format.
  • Snowflake native tables are proprietary and locked into the Snowflake ecosystem.
  • Polaris supports multiple compute engines beyond Snowflake.

3. Can I use Polaris Iceberg tables with Trino and Presto?

Yes, Polaris Iceberg tables are fully compatible with Trino, Presto, Apache Spark, and Flink, making them ideal for a multi-engine architecture.

4. What are the cost benefits of moving to Polaris Iceberg?

Polaris Iceberg provides lower storage costs, better query optimization, and open storage flexibility, reducing vendor lock-in costs.

5. How do I ensure data integrity during migration?

  • Enable continuous sync between Snowflake and Polaris.
  • Perform data validation using SELECT COUNT(*) comparisons before switching.
  • Run checksum tests to validate data accuracy.

Conclusion

Migrating from Snowflake to Polaris Iceberg unlocks open storage, better scalability, and interoperability with other analytics engines. With continuous sync, PySpark migration, and post-migration optimizations, you get a fully operational Iceberg table ready for production workloads.
