Estuary

Apache Iceberg Materialization: Standard Updates & More Catalogs

We've added new features to our Apache Iceberg materialization: support for Amazon EMR to run merge queries and expanded catalog compatibility with Apache Polaris and Snowflake Open Catalog. Learn more in our latest update.


Apache Iceberg Materialization Enhancements in Estuary

Estuary's Apache Iceberg materialization has been updated with new features that improve flexibility and integration options. These enhancements include support for running merge queries using Amazon EMR and expanded catalog compatibility with Apache Polaris and Snowflake Open Catalog.

Feature Updates

1. Bring Your Own EMR for Merge Queries

Estuary now lets users run MERGE INTO operations on their own Amazon EMR (Elastic MapReduce) clusters. This update provides:

  • The ability to control compute costs by managing EMR cluster usage.
  • Integration with existing AWS-managed environments.
  • Performance improvements for large-scale updates and deletes in Iceberg tables.

For configuration details, refer to the documentation.
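
As a rough illustration of the kind of statement a merge query involves, the sketch below builds a Spark SQL MERGE INTO statement for an Iceberg table. It is not the query Estuary generates; the function `build_merge_sql` and every table and column name are hypothetical placeholders chosen for the example.

```python
def build_merge_sql(target, source, key_columns, value_columns):
    """Build a Spark SQL MERGE INTO statement (upsert) for an Iceberg table.

    All names passed in are hypothetical; this only assembles a string and
    does not talk to Spark, EMR, or any catalog.
    """
    if not key_columns or not value_columns:
        raise ValueError("key_columns and value_columns must both be non-empty")

    # Rows match when every key column is equal in target (t) and source (s).
    on_clause = " AND ".join(f"t.{c} = s.{c}" for c in key_columns)
    # On a match, overwrite the non-key columns with the incoming values.
    set_clause = ", ".join(f"t.{c} = s.{c}" for c in value_columns)
    # Otherwise insert the full row from the source.
    all_columns = list(key_columns) + list(value_columns)
    insert_cols = ", ".join(all_columns)
    insert_vals = ", ".join(f"s.{c}" for c in all_columns)

    return (
        f"MERGE INTO {target} t\n"
        f"USING {source} s\n"
        f"ON {on_clause}\n"
        f"WHEN MATCHED THEN UPDATE SET {set_clause}\n"
        f"WHEN NOT MATCHED THEN INSERT ({insert_cols}) VALUES ({insert_vals})"
    )


# Hypothetical target table and staging view.
sql = build_merge_sql(
    target="my_catalog.my_namespace.orders",
    source="staged_orders",
    key_columns=["order_id"],
    value_columns=["status", "updated_at"],
)
print(sql)
```

In a Spark session with an Iceberg catalog configured, a string like this could be passed to `spark.sql(...)`. Running it on your own EMR resources means the compute for that merge is billed and sized in your AWS account.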

2. Expanded Catalog Support: Apache Polaris & Snowflake Open Catalog

Estuary Flow now supports materializing Iceberg tables into Apache Polaris and Snowflake Open Catalog:

  • Apache Polaris: An open-source catalog for Apache Iceberg, built on the Iceberg REST catalog API, that centralizes table management across query engines.
  • Snowflake Open Catalog: Snowflake’s managed service for Apache Polaris. It enables the use of Iceberg tables stored in S3, GCS, or Azure Blob Storage while integrating with Snowflake’s query engine and governance features.

For implementation details, visit the documentation.
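
Both catalogs speak the Iceberg REST catalog protocol, so a Spark job is usually pointed at them through `spark.sql.catalog.*` properties. The sketch below assembles such a property set. It is an illustration rather than Estuary's connector configuration: the helper `rest_catalog_spark_conf`, the catalog name, the endpoint URL, and the credential string are all placeholder assumptions, and the exact values for your catalog should come from its own documentation.

```python
def rest_catalog_spark_conf(name, uri, warehouse, credential,
                            scope="PRINCIPAL_ROLE:ALL"):
    """Return Spark properties for an Iceberg REST catalog.

    Hypothetical helper: it only builds a dict and does not contact
    the catalog. The default scope is an assumption for the example.
    """
    prefix = f"spark.sql.catalog.{name}"
    return {
        # Register the catalog name with Iceberg's Spark catalog class.
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Use the REST catalog implementation.
        f"{prefix}.type": "rest",
        f"{prefix}.uri": uri,
        f"{prefix}.warehouse": warehouse,
        # OAuth2 client credentials in "client_id:client_secret" form.
        f"{prefix}.credential": credential,
        f"{prefix}.scope": scope,
    }


# Placeholder values only; substitute your own endpoint and credentials.
conf = rest_catalog_spark_conf(
    name="open_catalog",
    uri="https://example.com/api/catalog",
    warehouse="my_warehouse",
    credential="<client_id>:<client_secret>",
)
for key, value in conf.items():
    print(f"{key}={value}")
```

Each entry could be supplied as a `--conf key=value` argument when submitting a Spark job.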

You can find the new connector by searching for Apache Iceberg.


These updates provide more options for integrating Iceberg tables within different environments, helping users manage their data lakehouse architectures more effectively.

For further details, visit the documentation or contact our team.
