
Apache Iceberg is widely adopted for its ability to manage large-scale datasets efficiently while supporting schema evolution and ACID transactions. However, when configuring Iceberg tables, data engineers often face the crucial decision of whether to use Copy-On-Write (COW) or Merge-On-Read (MOR) as their update strategy.
This post explores the fundamental differences between these approaches, with practical PySpark examples demonstrating their impact on query performance, update efficiency, and storage behavior.
Apache Iceberg COW vs MOR: What’s the Difference?
| Feature | Copy-On-Write (COW) | Merge-On-Read (MOR) |
| --- | --- | --- |
| Write Strategy | Rewrites entire affected files | Creates separate delete/update files |
| Read Performance | Fast, as data is fully compacted | Slower, as delete files must be applied |
| Write Performance | Slow, due to full file rewrites | Faster, as only small delete/update files are written |
| Storage Impact | Higher, due to frequent file rewrites | Lower, as full files are not rewritten |
| Use Case Suitability | Workloads with frequent reads and infrequent updates | Workloads with frequent updates and fewer reads |
| Update Mechanism | Full file rewrite on update | Newly updated rows written separately, old data marked as deleted |
| Delete Mechanism | Full file rewrite on delete | Deletes tracked in separate delete files |
| Compaction Requirement | Not required frequently | Required periodically to merge delete files |
| Best For | Read-heavy workloads, analytics, bulk updates | Update-heavy workloads, streaming ingestion, incremental updates |
Let's dive deeper into each approach, including real-world PySpark examples.
1. Copy-On-Write (COW) in Apache Iceberg
How Copy-On-Write Works
- When a write, update, or delete operation occurs, the affected data files are completely rewritten with the new changes.
- This means no additional delete files are created.
- Reads are highly optimized since the data is fully compacted.
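The write modes are ordinary table properties, so an existing Iceberg table can be switched to COW semantics in place. Here is a minimal sketch, assuming a Spark session with an Iceberg catalog named iceberg_catalog and a hypothetical existing table analytics_db.customer:
```python
# Switch an existing Iceberg table to copy-on-write for deletes, updates, and merges.
# The property keys are the standard Iceberg write-mode settings; the table name is illustrative.
spark.sql("""
    ALTER TABLE iceberg_catalog.analytics_db.customer SET TBLPROPERTIES (
        'write.delete.mode' = 'copy-on-write',
        'write.update.mode' = 'copy-on-write',
        'write.merge.mode'  = 'copy-on-write'
    )
""")
```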
Pros:
- Optimized for frequent reads due to fewer fragmented files.
- Efficient for bulk updates where many records are changed at once.
- Simple file layout as each rewrite produces a clean version.
Cons:
- Slow updates and deletes due to file rewriting overhead.
- High write amplification, leading to increased storage usage and I/O costs.
PySpark Example: Creating a COW Table
```python
prod_db_nm = "analytics_db"
file_dir = "/tmp/warehouse"
prod_db_dir = "analytics"

spark.sql(f"""
    CREATE TABLE IF NOT EXISTS iceberg_catalog.{prod_db_nm}.customer_cow (
        customer_id BIGINT,
        name STRING,
        age INT,
        signup_date DATE
    )
    USING iceberg
    PARTITIONED BY (signup_date)
    LOCATION '{file_dir}/{prod_db_dir}/customer_cow'
    TBLPROPERTIES (
        'write.format.default' = 'parquet',
        'write.delete.mode' = 'copy-on-write',
        'write.update.mode' = 'copy-on-write',
        'write.merge.mode' = 'copy-on-write',
        'format-version' = '2'
    )
""")
```
Inserting Data into the Table
```python
spark.sql(f"""
    INSERT INTO iceberg_catalog.{prod_db_nm}.customer_cow
    VALUES (1, 'Alice', 30, DATE '2024-01-10'),
           (2, 'Bob', 27, DATE '2024-02-15'),
           (3, 'Charlie', 35, DATE '2024-03-20')
""")
```
Expected Output:
```
+-------------+
|num_affected |
+-------------+
|           3 |
+-------------+
```
Updating a Record in COW Table
Every update rewrites the entire affected file.
```python
spark.sql(f"""
    UPDATE iceberg_catalog.{prod_db_nm}.customer_cow
    SET age = 31
    WHERE customer_id = 1
""")
```
Expected Output:
```
+-------------+
|num_affected |
+-------------+
|           1 |
+-------------+
```
Checking File Changes
```python
spark.sql(f"""
    SELECT file_path, partition
    FROM iceberg_catalog.{prod_db_nm}.customer_cow.files
""").show(truncate=False)
```
Expected Output:
```
+-----------------------------------------+------------+
| file_path                               | partition  |
+-----------------------------------------+------------+
| /tmp/.../customer_cow/data/0001.parquet | 2024-01-10 |
| /tmp/.../customer_cow/data/0002.parquet | 2024-02-15 |
+-----------------------------------------+------------+
```
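You can also verify that the update committed a brand-new snapshot rather than editing files in place by querying the snapshots metadata table (snapshot IDs will differ on every run):
```python
# Each COW update appears as a new snapshot with an 'overwrite' operation.
spark.sql(f"""
    SELECT snapshot_id, operation, committed_at
    FROM iceberg_catalog.{prod_db_nm}.customer_cow.snapshots
    ORDER BY committed_at
""").show(truncate=False)
```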
Visual Walkthrough: Iceberg COW File Changes
Initial State (Before COW Write)
Snapshot S0: The table currently has only one snapshot (S0), which tracks the table's current state.
Manifest List ML-1: The manifest list points to a single manifest file (MF-1) that stores metadata about the data files.
Manifest File MF-1: Tracks column statistics (e.g., min/max values) and points to the data files.
Data Files:
- Partition 1 contains F1 and F2.
- Partition 2 contains F3 and F4.
No Updates Yet: The data files remain untouched, and the structure is simple.
Post-Write State (After COW Write)
New Snapshot S1: Iceberg creates a new snapshot (S1) that tracks the latest table state.
New Manifest List ML-2: The new manifest list (ML-2) supersedes ML-1, pointing to the updated files.
New Manifest File MF-2: A new manifest file (MF-2) is generated, reflecting the updated column statistics.
Rewritten Data Files:
- Partition 1's data files are completely rewritten as F1' and F2'.
- Partition 2 remains unchanged since no updates occurred there.
No Delete Files: Unlike Merge-On-Read (MOR), COW does not create delete files; instead, the affected data files are rewritten in full.
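Because the previous snapshot and its files remain readable until they are expired, you can still query the pre-update state with time travel. A minimal sketch, assuming a Spark/Iceberg version that supports VERSION AS OF and that you substitute a real snapshot_id from the snapshots query above:
```python
# Read the table as of an earlier snapshot; 1234567890 is a placeholder ID.
spark.sql(f"""
    SELECT * FROM iceberg_catalog.{prod_db_nm}.customer_cow
    VERSION AS OF 1234567890
""").show()
```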
Warnings & Edge Cases for COW
- Storage Growth: Every update recreates entire data files, so frequent small updates can inflate storage and I/O costs quickly; expire old snapshots to reclaim space.
- Concurrency: Iceberg uses optimistic concurrency, so concurrent writers touching the same files can trigger commit conflicts and retries; coordinate heavy writers where possible.
- Partition Awareness: If updates are localized to specific partitions, an effective partition spec limits how much data each update rewrites (the table above partitions by signup_date; see the sketch after this list).
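Iceberg also lets you evolve the partition spec in place if the original layout turns out to be a poor fit for your update pattern. A minimal sketch, assuming the Iceberg Spark SQL extensions are enabled and that bucketing on customer_id suits your workload:
```python
# Add a partition field without rewriting existing data; new writes use the new spec.
spark.sql(f"""
    ALTER TABLE iceberg_catalog.{prod_db_nm}.customer_cow
    ADD PARTITION FIELD bucket(16, customer_id)
""")
```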
2. Merge-On-Read (MOR) in Apache Iceberg
How Merge-On-Read Works
- Updates and deletes create delete files instead of rewriting entire data files.
- During reads, the delete files are applied on the fly, filtering out the outdated records.
- This reduces write amplification but can degrade read performance due to extra processing.
Pros:
- Efficient Writes: Updates and deletes are written as separate files, avoiding the need for full file rewrites, which makes writes faster.
- Lower Storage Overhead: Since only small delete/update files are created instead of rewriting entire data files, storage usage grows at a slower rate.
- Better for Frequent Updates: Suitable for workloads with continuous updates, such as streaming data ingestion, CDC (Change Data Capture), and real-time ETL.
- Incremental Changes: MOR absorbs row-level changes incrementally, leaving existing data files untouched until compaction folds the changes in.
- Reduced Write Amplification: Since only changed rows are written separately, there is less I/O cost compared to full file rewrites in Copy-On-Write (COW).
Cons:
- Slower Read Performance: Queries must apply delete files dynamically, which increases read latency and can degrade query speed, especially for analytical workloads.
- Accumulation of Delete Files: Over time, a large number of delete/update files can accumulate, increasing metadata overhead and making queries slower.
- Requires Periodic Compaction: To maintain query performance, Iceberg tables using MOR require background compaction jobs to merge delete files into base data files.
- Higher Read Complexity: Queries need to merge data files and delete files dynamically, leading to increased processing overhead.
- Potentially High Latency for Large Deletes: If a significant portion of data is deleted frequently, the number of delete files can grow rapidly, making queries inefficient until compaction occurs.
PySpark Example: Creating a MOR Table
```python
spark.sql(f"""
    CREATE TABLE IF NOT EXISTS iceberg_catalog.{prod_db_nm}.customer_mor (
        customer_id BIGINT,
        name STRING,
        age INT,
        signup_date DATE
    )
    USING iceberg
    PARTITIONED BY (signup_date)
    LOCATION '{file_dir}/{prod_db_dir}/customer_mor'
    TBLPROPERTIES (
        'write.format.default' = 'parquet',
        'write.delete.mode' = 'merge-on-read',
        'write.update.mode' = 'merge-on-read',
        'write.merge.mode' = 'merge-on-read',
        'format-version' = '2'
    )
""")
```
Deleting a Record in MOR Table
```python
spark.sql(f"""
    DELETE FROM iceberg_catalog.{prod_db_nm}.customer_mor
    WHERE customer_id = 2
""")
```
Expected Output:
```
+--------------+
| num_affected |
+--------------+
|            1 |
+--------------+
```
Checking File Changes
```python
spark.sql(f"""
    SELECT file_path FROM iceberg_catalog.{prod_db_nm}.customer_mor.files
""").show(truncate=False)
```
Expected Output:
```
+------------------------------------------------+
| file_path                                      |
+------------------------------------------------+
| /tmp/.../customer_mor/data/0001.parquet        |
| /tmp/.../customer_mor/data/delete_0001.parquet |
+------------------------------------------------+
```
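The files metadata table also exposes a content column that distinguishes base data from delete files (0 = data, 1 = position deletes, 2 = equality deletes), which is handy for monitoring how many delete files are piling up:
```python
# Count files by content type; a growing number of content=1 rows signals compaction is due.
spark.sql(f"""
    SELECT content, COUNT(*) AS file_count
    FROM iceberg_catalog.{prod_db_nm}.customer_mor.files
    GROUP BY content
""").show()
```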
Visual Walkthrough: Iceberg MOR File Changes
Initial State (Before MOR Write)
This represents an Iceberg table before any updates or deletions occur. The table has a single snapshot (S0) referencing the existing data files.
Post-Write State (After MOR Write)
Unlike COW, Merge-On-Read does not rewrite entire files when an update or delete occurs. Instead:
- Deleted rows are tracked in delete files.
- Newly updated rows are written to new data files.
- During reads, delete files are applied to filter out old data.
Here’s what’s happening in more detail:
| Aspect | Initial State (Before MOR Write) | After MOR Write |
| --- | --- | --- |
| Snapshots | Only S0 exists | A new snapshot S1 is created |
| Manifest Lists | Points to ML-1 | A new manifest list ML-2 is generated |
| Manifest Files | MF-1 tracks the original column stats and data files | A new manifest file MF-2 supersedes MF-1 |
| Data Files | Original files remain unchanged | Newly updated records are written separately |
| Delete Tracking | No delete files | Delete files (DF-1) are added |
| Performance Impact | Faster reads, no rewrites | Reads may slow down while delete files are applied |
Warnings & Edge Cases for MOR
- Read Performance Impact: Queries must filter out delete files, increasing read latency.
- Storage Management: Although updates don’t rewrite full files, multiple delete files may accumulate over time.
- Compaction: Periodically run Iceberg compaction jobs to merge delete files into the base data files (see the sketch after this list).
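Iceberg ships Spark stored procedures for exactly this maintenance. A minimal sketch, assuming the Iceberg Spark SQL extensions are enabled and your Iceberg version includes both procedures (rewrite_position_delete_files is available in newer releases):
```python
# Rewrite small data files and fold position deletes into them.
spark.sql(f"""
    CALL iceberg_catalog.system.rewrite_data_files(table => '{prod_db_nm}.customer_mor')
""")

# Compact the accumulated position delete files themselves.
spark.sql(f"""
    CALL iceberg_catalog.system.rewrite_position_delete_files(table => '{prod_db_nm}.customer_mor')
""")
```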
Wrapping Up: Iceberg COW vs MOR
COW optimizes for read-heavy analytics at the cost of expensive writes, while MOR optimizes for frequent updates at the cost of read-time merging and periodic compaction. By understanding these trade-offs, you can fine-tune Iceberg tables for optimal performance and cost efficiency. Always benchmark both strategies against your own query and update patterns before committing to one.
