
The Estuary 2025 Data Warehouse Benchmark

The Estuary 2025 Data Warehouse Benchmark evaluates leading cloud data warehouses in terms of performance and cost efficiency. Discover which platforms perform best for modern, streaming-first workloads.


Choosing the right data warehouse might be the single most expensive decision a data team makes.

It’s also one of the easiest to get wrong.

Behind the glossy marketing promises and high-level feature lists, every data warehouse platform hides a few hard truths: performance ceilings, unexpected costs, reliability gaps under concurrent workloads, and, in some cases, outright failure when pushed beyond toy examples.

At Estuary, we work with teams moving real data, fast and at high volume. We’ve seen first-hand how warehouse choices made in haste can lead to 18-month migrations and millions in sunk costs.

So we decided to do something about it.

We conducted a comprehensive, vendor-agnostic benchmark of today’s leading data warehouses, focused on the performance realities that matter in production: ingestion capacity, complex query runtime, failure modes, and true cost-to-performance ratios.

The result?

👉 Download the Estuary 2025 Data Warehouse Benchmark Report

What Makes This Benchmark Different


We didn’t just re-run the same stale TPC-H queries on stock datasets. We tested how data warehouses behave under real workloads, using:

  • The TPC-H SF1000 dataset (1TB+ of structured and semi-structured data)
  • Custom queries reviewed by real data practitioners
  • Full ingestion and query lifecycle tracking via Estuary Flow
  • No vendor-specific tuning or caching cheats
  • Open-source methodology you can run yourself (GitHub repo)

Key Findings

  1. BigQuery is a speed demon, but also the most cost-volatile

Incredible performance, especially on nested JSON data. But without cost guardrails, bills can be unpredictable.

  2. Snowflake offers balance

Stable, reliable, and scalable. With smart warehouse sizing and auto-suspend, it can be cost-effective too.

  3. Databricks brings AI-ready flexibility, but has query performance quirks

If you're building ML pipelines, it’s a powerful option. Just be ready for slower SQL under load.

  4. Redshift and Fabric? Buyer beware

Both platforms suffered from frequent memory failures, long runtimes, and rigid provisioning. In multiple tests, queries ran for hours or never completed.

  5. Cost-to-runtime ratios vary wildly

Saving a few seconds of runtime can cost you hundreds. Without tuning, some warehouses cost 10x more for the same output.
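To make the cost-to-runtime point concrete, here is a minimal sketch of the arithmetic, assuming simple per-hour compute pricing. The warehouse names, hourly rates, and runtimes are hypothetical placeholders chosen for illustration, not figures from the benchmark report.

```python
from dataclasses import dataclass


@dataclass
class Run:
    """One hypothetical query run: runtime in seconds and compute price per hour."""
    name: str
    runtime_s: float
    usd_per_hour: float

    def cost_usd(self) -> float:
        # Cost = runtime converted to hours x hourly compute rate (assumed pricing model).
        return self.runtime_s / 3600 * self.usd_per_hour


# Hypothetical placeholder values, NOT benchmark results.
runs = [
    Run("warehouse_a", runtime_s=120, usd_per_hour=16.0),
    Run("warehouse_b", runtime_s=100, usd_per_hour=192.0),
]

baseline = runs[0]
for run in runs:
    # Cost relative to the first (slower, cheaper) run.
    ratio = run.cost_usd() / baseline.cost_usd()
    print(f"{run.name}: {run.runtime_s:.0f}s, ${run.cost_usd():.2f}, {ratio:.1f}x baseline cost")
```

Under these assumed numbers, the second engine finishes 20 seconds sooner but costs 10x as much per run, which is why runtime alone says little without the price attached.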

Note: if you think there's something we should update for the next iteration, let us know!

Real-World Stress, Not Vendor Demos

In our benchmark:

  • We loaded over 8TB of data using Estuary pipelines
  • We ran multi-step queries with window functions, joins, and nested logic, not just SELECT COUNT(*)
  • We intentionally designed Query-F, aka the “Frankenquery,” to push every platform to its limit
  • It’s all open source and available on GitHub.

Rankings that Actually Matter

We break it all down in the report, with detailed rankings across:

  • Cost-efficiency
  • Raw performance
  • Scalability
  • Startup-friendliness
  • Enterprise readiness
  • Reliability under stress

And yes, we call out which platforms failed memory tests and which ones succeeded even at small scale.

👉 Get the full report here

Why Estuary Did This

Estuary is a real-time data integration platform built to move massive amounts of data, in both streaming and batch modes, into warehouses like Snowflake, BigQuery, Databricks, and more.

We built this benchmark because we work with these systems every day. We’ve seen the gaps. We’ve helped customers recover from bad fits. And we believe data teams deserve transparency before investing 12–24 months in a new warehouse.

Download the Report. Make a Better Decision.

Whether you’re:

  • Migrating to a new warehouse
  • Scaling analytics in a fast-growing company
  • Building real-time AI and ops pipelines
  • Or just curious how your current stack stacks up…

This benchmark gives you the clarity you need.

Want help assessing your current pipeline performance or planning a migration?
We’d love to chat. Contact us here.

FAQs

What is the Estuary 2025 Data Warehouse Benchmark?
It’s an independent performance evaluation comparing top cloud data warehouses like Snowflake, BigQuery, Redshift, and others on real-time ingestion, query latency, and cost-efficiency using real-world streaming datasets.

Which data warehouses are included?
The benchmark includes leading solutions such as Snowflake, BigQuery, Redshift, Databricks, and Microsoft Fabric, measured under identical workloads to ensure fair, reproducible results.

Why does the benchmark matter?
It highlights how well data warehouses handle modern real-time use cases, helping teams select the most performant and cost-effective platform for streaming analytics, ELT, and operational workloads.


About the author

Dani Pálma, Head of Data & Marketing

Dani is a data professional with a rich background in data engineering and real-time data platforms. At Estuary, Dani focuses on promoting cutting-edge streaming solutions, helping to bridge the gap between technical innovation and developer adoption. With deep expertise in cloud-native and streaming technologies, Dani has successfully supported startups and enterprises in building robust data solutions.
