Estuary

Apache Iceberg Copy-On-Write (COW) vs Merge-On-Read (MOR): A Deep Dive

Learn the difference between Iceberg Copy-On-Write (COW) and Merge-On-Read (MOR), with PySpark examples, performance impacts, and best practices.

Blog post hero image
Share this article

Apache Iceberg is widely adopted for its ability to manage large-scale datasets efficiently while supporting schema evolution and ACID transactions. However, when configuring Iceberg tables, data engineers often face the crucial decision of whether to use Copy-On-Write (COW) or Merge-On-Read (MOR) as their update strategy.

This post explores the fundamental differences between these approaches, with practical PySpark examples demonstrating their impact on query performance, update efficiency, and storage behavior.

If you're looking to get started with Apache Iceberg, check out this article.

Apache Iceberg COW vs MOR: What’s the Difference?

Feature

Copy-On-Write (COW)

Merge-On-Read (MOR)

Write StrategyRewrites entire affected filesCreates separate delete/update files
Read PerformanceFast, as data is fully compactedSlower, as delete files must be applied
Write PerformanceSlow, due to full file rewritesFaster, as only small delete/update files are written
Storage ImpactHigher, due to frequent file rewritesLower, as full files are not rewritten
Use Case SuitabilityWorkloads with frequent reads and infrequent updatesWorkloads with frequent updates and fewer reads
Update MechanismFull file rewrite on updateNewly updated rows written separately, old data marked as deleted
Delete MechanismFull file rewrite on deleteDeletes tracked in separate delete files
Compaction RequirementNot required frequentlyRequired periodically to merge delete files
Best ForRead-heavy workloads, analytics, bulk updatesUpdate-heavy workloads, streaming ingestion, incremental updates

Let's dive deeper into each approach, including real-world PySpark examples.

1. Copy-On-Write (COW) in Apache Iceberg

How Copy-On-Write Works

  • When a write, update, or delete operation occurs, the affected data files are completely rewritten with the new changes.
  • This means no additional delete files are created.
  • Reads are highly optimized since the data is fully compacted.

Pros:

  • Optimized for frequent reads due to fewer fragmented files.
  • Efficient for bulk updates where many records are changed at once.
  • Simple file layout as each rewrite produces a clean version.

Cons:

  • Slow updates and deletes due to file rewriting overhead.
  • High write amplification, leading to increased storage usage and I/O costs.

PySpark Example: Creating a COW Table

python
prod_db_nm = "analytics_db" file_dir = "/tmp/warehouse" prod_db_dir = "analytics" spark.sql(f"""    CREATE TABLE IF NOT EXISTS iceberg_catalog.{prod_db_nm}.customer_cow (        customer_id BIGINT,        name STRING,        age INT,        signup_date DATE    )    TBLPROPERTIES (        'write.format.default' = 'parquet',        'write.delete.mode' = 'copy-on-write',        'write.update.mode' = 'copy-on-write',        'write.merge.mode' = 'copy-on-write',        'format-version' = '2'    )    LOCATION '{file_dir}/{prod_db_dir}/customer_cow' """)

Inserting Data into the Table

python
spark.sql(f"""    INSERT INTO iceberg_catalog.{prod_db_nm}.customer_cow    VALUES (1, 'Alice', 30, '2024-01-10'),           (2, 'Bob', 27, '2024-02-15'),           (3, 'Charlie', 35, '2024-03-20') """)

Expected Output:

python
+-------------+ |num_affected | +-------------+ | 3 | +-------------+

Updating a Record in COW Table

Every update rewrites the entire affected file.

python
spark.sql(f"""    UPDATE iceberg_catalog.{prod_db_nm}.customer_cow    SET age = 31    WHERE customer_id = 1 """)

Expected Output:

python
+-------------+ |num_affected | +-------------+ | 1 | +-------------+

Checking File Changes

python
spark.sql(f"""    SELECT file_path, partition FROM iceberg_catalog.{prod_db_nm}.customer_cow.files """).show()

Expected Output:

python
+----------------------+------------+ | file_path  | partition  | +----------------------+------------+ | /tmp/.../customer_cow/data/0001.parquet | 2024-01-10 | | /tmp/.../customer_cow/data/0002.parquet | 2024-02-15 | +----------------------+------------+

Visual Walkthrough: Iceberg COW File Changes

Initial State (Before COW Write)

Initial State Before COW Write

Snapshot 0: The table currently has only one snapshot (S0), which tracks the table's current state.

Manifest List 1: The manifest list points to a single manifest file (MF-1) that stores metadata about data files.

Manifest File 1: Tracks column statistics (e.g., min/max values) and points to data files.

Data Files:

  • Partition 1 contains F1 and F2.
  • Partition 2 contains F3 and F4.

No Updates Yet: The data files remain untouched, and the structure is simple.

Post-Write State (After COW Write)

Post COW-write state

New Snapshot 1: Iceberg creates a new snapshot (1) that tracks the latest table state.

New Manifest List 2: The new manifest list (2) replaces 1, pointing to the updated files.

New Manifest File 2: A new manifest file (2) is generated, reflecting the updated column statistics.

Rewritten Data Files:

  • Partition 1's data files are completely rewritten (1' and 2').
  • Partition 2 remains unchanged since no updates occurred there.

No Delete Files: Unlike Merge-On-Read (MOR), COW does not create delete files; instead, the entire affected data files are rewritten.

Warnings & Edge Cases for COW

  • Storage Growth: Every update recreates entire data files. Frequent updates will cause storage to grow exponentially.
  • Concurrency Issues: Ensure no other process modifies the same table simultaneously to avoid unnecessary rewrites.
  • Partition Awareness: If updates are localized to specific partitions, optimize storage by partitioning effectively.

2. Merge-On-Read (MOR) in Apache Iceberg

How Merge-On-Read Works

  • Updates and deletes create delete files instead of rewriting entire data files.
  • During reads, the delete files are applied on the fly, filtering out the outdated records.
  • This reduces write amplification but can degrade read performance due to extra processing.

Pros

  • Efficient Writes: Updates and deletes are written as separate files, avoiding the need for full file rewrites, which makes writes faster.
  • Lower Storage Overhead: Since only small delete/update files are created instead of rewriting entire data files, storage usage grows at a slower rate.
  • Better for Frequent Updates: Suitable for workloads with continuous updates, such as streaming data ingestion, CDC (Change Data Capture), and real-time ETL.
  • Flexible Schema Evolution: MOR allows incremental changes without significantly impacting existing data structures.
  • Reduced Write Amplification: Since only changed rows are written separately, there is less I/O cost compared to full file rewrites in Copy-On-Write (COW).

Cons

  • Slower Read Performance: Queries must apply delete files dynamically, which increases read latency and can degrade query speed, especially for analytical workloads.
  • Accumulation of Delete Files: Over time, a large number of delete/update files can accumulate, increasing metadata overhead and making queries slower.
  • Requires Periodic Compaction: To maintain query performance, Iceberg tables using MOR require background compaction jobs to merge delete files into base data files.
  • Higher Read Complexity: Queries need to merge data files and delete files dynamically, leading to increased processing overhead.
  • Potentially High Latency for Large Deletes: If a significant portion of data is deleted frequently, the number of delete files can grow rapidly, making queries inefficient until compaction occurs.

PySpark Example: Creating a MOR Table

python
spark.sql(f"""    CREATE TABLE IF NOT EXISTS iceberg_catalog.{prod_db_nm}.customer_mor (        customer_id BIGINT,        name STRING,        age INT,        signup_date DATE    )    TBLPROPERTIES (        'write.format.default' = 'parquet',        'write.delete.mode' = 'merge-on-read',        'write.update.mode' = 'merge-on-read',        'write.merge.mode' = 'merge-on-read',        'format-version' = '2'    )    LOCATION '{file_dir}/{prod_db_dir}/customer_mor' """)

Deleting a Record in MOR Table

python
spark.sql(f"""    DELETE FROM iceberg_catalog.{prod_db_nm}.customer_mor    WHERE customer_id = 2 """)

Expected Output:

python
+-------------+ | num_affected | +-------------+ | 1 | +-------------+

Checking File Changes

python
spark.sql(f"""    SELECT file_path FROM iceberg_catalog.{prod_db_nm}.customer_mor.files """).show()

Expected Output:

python
+-------------------------+ | file_path | +-------------------------+ | /tmp/.../customer_mor/data/0001.parquet | | /tmp/.../customer_mor/data/delete_0001.parquet | +-------------------------+

Visual Walkthrough: Iceberg MOR File Changes

Initial State (Before MOR Write)

This represents an Iceberg table before any updates or deletions occur. The table has a single snapshot (0) referencing existing data files.

Initial state before MOR write

Post-Write State (After MOR Write)

Unlike COW, Merge-On-Read does not rewrite entire files when an update or delete occurs. Instead:

  • Deleted rows are tracked in delete files.
  • Newly updated rows are written to new data files.
  • During reads, delete files are applied to filter out old data.
Post-MOR write state

Here’s what’s happening in more detail:

 

Initial State (Before MOR Write)

After MOR Write

SnapshotsOnly 0 existsA new 1 is created
Manifest ListsPoints to 1A new manifest list 2 is generated
Manifest FilesTracks original column stats and data filesA new manifest file 2 replaces 1
Data FilesOriginal files remain unchangedNew updated records written separately
Delete TrackingNo delete filesDelete files (1) are added
Performance ImpactFaster reads, no rewritesReads may slow down due to applying delete files

Warnings & Edge Cases for MOR

  • Read Performance Impact: Queries must filter out delete files, increasing read latency.
  • Storage Management: Although updates don’t rewrite full files, multiple delete files may accumulate over time.
  • Compaction: Periodically run Iceberg compaction jobs to merge delete files into base data.

Wrapping Up: Iceberg COW vs MOR

By understanding these trade-offs, you can fine-tune Iceberg tables for optimal performance and cost efficiency. Always benchmark both strategies based on query performance and storage impact before making a decision.

Start streaming your data for free

Build a Pipeline
Share this article

Table of Contents

Start Building For Free

About the author

Picture of Dani Pálma
Dani PálmaHead of Data Engineering Marketing

Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Daniel focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.

Popular Articles

Streaming Pipelines.
Simple to Deploy.
Simply Priced.
$0.50/GB of data moved + $.14/connector/hour;
50% less than competing ETL/ELT solutions;
<100ms latency on streaming sinks/sources.