
Optimize Snowflake Costs Beyond Streaming

Discover smarter ways to lower Snowflake costs. Compare batch vs streaming ingestion and learn when right-time data delivers maximum efficiency.


In our previous post, Cut Your Snowflake Bill by 70% with Streaming Ingestion, we explored how Snowpipe Streaming and Estuary together can cut ingestion costs by up to 70%. That approach is ideal for use cases that require sub-second or near real-time data freshness.

However, not every workload needs data to move in real time. Many use cases, such as periodic reporting and historical data analysis, work just as well with batch ingestion. In these cases, a well-designed batch strategy can meet data freshness requirements while significantly lowering compute consumption in Snowflake.

This article explains how Estuary, the Right-Time Data Platform, supports multiple materialization strategies to help teams optimize Snowflake ingestion costs. You will learn when to use streaming versus batch ingestion and how Estuary gives you full control over when and how your data lands in Snowflake.

Key Takeaways

  • Many workloads do not need real-time streaming. Batch ingestion can meet analytics needs while lowering Snowflake costs.
  • Scheduling data loads can reduce warehouse runtime and credits by up to 75 percent.
  • Smart batch loading and configurable sync schedules help control when data lands in Snowflake.
  • A hybrid approach that combines streaming and batch ingestion provides the best balance of cost, freshness, and scalability.
  • With Estuary, data moves at the right time so teams get exactly the data they need, when they need it.

When Streaming Isn’t the Right Fit for Snowflake Ingestion

Although row-level streaming is efficient in many real-time use cases, it is not required in every scenario. Batch ingestion is often sufficient to satisfy analytics requirements while costing less than streaming ingestion.

For example, financial reporting, nightly data reconciliation, or historical trend analysis don’t need real-time data freshness. In these scenarios, Snowflake virtual warehouses don’t have to run continuously, which helps reduce warehouse runtime and compute credit usage.

Batch ingestion provides several key advantages in such scenarios:

  • Lower compute costs: Rather than keeping a virtual warehouse active to ingest each row in real time, data can be ingested in larger, less frequently scheduled batches. This minimizes the total time the Snowflake warehouse must be online and reduces idle time before the warehouse suspends.
  • Fewer queries: By loading data in batches, Snowflake executes fewer queries, which reduces processing time and consequently compute costs.
  • Controlled orchestration: With Estuary, you specify when data is written to Snowflake in your pipeline. You can customize your sync frequency or combine batch and streaming in the same pipeline, so data that must be fresher is available when needed, while less urgent data doesn’t consume unnecessary ingestion resources.

Estuary’s smart batch loading ensures that Snowflake receives data efficiently. Instead of inserting each record continuously, data is staged and merged in optimized batches. This approach maintains data accuracy while lowering costs.

Smart Batch Loading with Merge Queries in Snowflake

When building data pipelines in Estuary, you can choose the materialization type based on how frequently you want data to land in Snowflake and on your performance and cost goals. Estuary supports both standard updates and delta updates, giving you the flexibility to manage data movement efficiently.

  • Standard updates: Choose this option when you are working in batch mode and ingesting data on a schedule. The data is staged in a Snowflake internal stage and then merged into the target table.
  • Delta updates: Suited for real-time ingestion with or without Snowpipe Streaming. Each row that represents a change in source data is ingested into Snowflake with minimal latency.

You can choose which type of materialization to use on a per-table basis, mixing and matching real-time and batch ingestion in the same project. This flexibility allows data engineers to optimize ingestion for freshness and cost according to the analytics use case.
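To make the difference concrete, the sketch below shows roughly what each mode boils down to on the Snowflake side. The table and column names are illustrative only, and for simplicity the staged changes are shown as a staging table rather than files in an internal stage; these are not the statements the connector actually generates.

    -- Standard updates (batch): changes are staged, then merged so that
    -- each key keeps exactly one current row in the target table.
    MERGE INTO analytics.orders AS tgt
    USING staging.orders_changes AS src
      ON tgt.order_id = src.order_id
    WHEN MATCHED THEN
      UPDATE SET tgt.status = src.status, tgt.updated_at = src.updated_at
    WHEN NOT MATCHED THEN
      INSERT (order_id, status, updated_at)
      VALUES (src.order_id, src.status, src.updated_at);

    -- Delta updates (real-time): each change document is appended as-is,
    -- with no read of the existing table, which keeps latency and compute low.
    INSERT INTO analytics.orders
    SELECT order_id, status, updated_at
    FROM staging.orders_changes;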

When operating in batch mode, Estuary applies several techniques to improve performance and minimize cost:

  • Bulk loading via Snowflake stages: Data changes are uploaded to a Snowflake internal stage. From there, the changes are materialized to the target table using a MERGE statement.
  • Min/Max values for key fields: This feature is designed to reduce compute usage and improve merge performance by limiting how much of the target table is scanned during updates (see the sketch after this list).
  • Configurable polling intervals: You can set polling intervals that determine how frequently Estuary checks batch sources for changes. Alternatively, use CDC for continuous capture from the source while materializing in batches to reduce costs in the destination system (see an example of this below).
  • Flexible sync schedules: With sync schedules, you control how often Estuary materializes data to Snowflake.
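As a rough illustration of the first two techniques, the merge below restricts the join to the key range present in the current batch, so Snowflake can prune micro-partitions instead of scanning the whole target table. The names and the key range are illustrative, not the connector's actual generated SQL.

    -- Illustrative merge with a min/max key predicate: only the slice of the
    -- target table covering this batch's keys is scanned during the update.
    MERGE INTO analytics.orders AS tgt
    USING staging.orders_changes AS src
      ON tgt.order_id = src.order_id
     AND tgt.order_id BETWEEN 100000 AND 250000  -- min/max keys in this batch
    WHEN MATCHED THEN
      UPDATE SET tgt.status = src.status, tgt.updated_at = src.updated_at
    WHEN NOT MATCHED THEN
      INSERT (order_id, status, updated_at)
      VALUES (src.order_id, src.status, src.updated_at);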

Together, these capabilities ensure that even without streaming, Estuary can ingest and update data efficiently and cost-effectively.

Image: Data moves from Estuary to Snowflake using an internal stage, polling for changes and batching ingestion according to the sync schedule.

Watch: How to Load Data into Snowflake Using Estuary

If you want to see how these ingestion techniques look in practice, watch this short demo that walks through setting up Snowflake resources and configuring Estuary’s Snowflake materialization connector.

Control When Materialization Happens

Many data pipelines waste money by running 24/7 even when the downstream consumers don’t require real-time data. For workloads where data freshness measured in minutes or hours is acceptable, scheduled bulk loading can strike the right balance between cost and performance.

Estuary enables data engineers to have fine-grained control over when data is written to Snowflake. Through configurable polling intervals and sync schedules, they can define how often Estuary materializes the data to the target, while still capturing CDC data from the source in real time.

Image: Choose sync frequency and fast sync times in Estuary. Set your preferred sync frequency or choose times when data should sync more frequently than normal.

For example, data pipelines in a retail analytics environment might continuously capture CDC changes from a PostgreSQL database but materialize the changes to Snowflake every hour. This approach ensures that data is fresh enough for reporting dashboards, while reducing compute usage as compared to streaming mode.

Additionally, Estuary doesn’t simply forward each individual CDC change event to Snowflake. Instead, it reduces documents before merging them, consolidating multiple intermediate changes into the latest state of each record. Snowflake therefore receives only the latest version of the data, without the intermediate changes, which reduces the number of records Snowflake processes and consequently reduces consumption.
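The effect is similar to deduplicating a change feed before loading it. Estuary performs this reduction before the data ever reaches Snowflake, but as a rough illustration in Snowflake SQL, collapsing intermediate changes to the latest state per key would look like this (names are illustrative):

    -- Illustrative only: keep just the newest change per order_id, so the
    -- merge touches one row per key instead of every intermediate update.
    SELECT *
    FROM staging.orders_changes
    QUALIFY ROW_NUMBER() OVER (
      PARTITION BY order_id
      ORDER BY updated_at DESC
    ) = 1;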

Another reason a pipeline might capture CDC changes continuously from a source like PostgreSQL is to avoid letting Write-Ahead Log (WAL) files pile up. PostgreSQL retains WAL segments until the replication slot consumes them, so an idle consumer lets them grow until they eventually exhaust disk space. By continuously capturing CDC data while materializing to Snowflake on a schedule, Estuary prevents WAL buildup and at the same time ensures there is no data loss.
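If you want to verify on the PostgreSQL side that the slot is being drained, one way to check how much WAL a replication slot is still retaining (PostgreSQL 10 or later) is:

    -- Amount of WAL each replication slot still retains; a steadily growing
    -- value means the slot is not being consumed and disk usage will climb.
    SELECT slot_name,
           active,
           pg_size_pretty(
             pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)
           ) AS retained_wal
    FROM pg_replication_slots;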

Comparing Cost: Streaming vs. Batch Ingestion in Snowflake

Here is a simple example that calculates the cost savings of batch ingestion compared to streaming ingestion.

Assuming we are running an X-Small Snowflake virtual warehouse, which consumes 1 credit per hour, the cost comparison is:

  • Streaming mode: The virtual warehouse must remain active at all times to process each incoming change. At a rate of 1 credit per hour, it consumes 24 credits in a 24-hour period.
  • Batch mode: If materializations are scheduled every hour and each batch run takes up to 15 minutes to complete, the warehouse consumes no more than 0.25 credits each hour, or a total of 6 credits in a 24-hour period.

In this example, the cost reduction is 75%, which is achieved simply by adjusting the materialization frequency and reducing the amount of time that a Snowflake virtual warehouse must be up and running.

This calculation holds only when the Snowflake virtual warehouse doesn’t consume credits for idle time, for example during the auto-suspend period. Therefore, the warehouse should be created with an auto-suspend parameter of 30 seconds, the lowest recommended value, ensuring that it suspends as soon as possible after each batch load completes.
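For example, a dedicated X-Small loading warehouse might be created as follows (the warehouse name is illustrative):

    -- Dedicated loading warehouse that suspends quickly after each batch
    -- and resumes automatically when the next scheduled merge arrives.
    CREATE WAREHOUSE IF NOT EXISTS estuary_loading_wh
      WAREHOUSE_SIZE = 'XSMALL'
      AUTO_SUSPEND = 30          -- seconds of inactivity before suspending
      AUTO_RESUME = TRUE
      INITIALLY_SUSPENDED = TRUE;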

By optimizing your sync frequency and warehouse settings, Estuary helps you achieve significant cost savings without sacrificing data reliability or accuracy.

The Best of Both Worlds: Flexibility in Ingestion

In many organizations, the ideal data strategy isn’t strictly real-time or strictly batch. Teams can have the best of both worlds by combining real-time freshness for operational dashboards with less frequent updates for other use cases.

Estuary makes it easy to configure each materialization either as streaming or batch, depending on the need, all within the same project. This lets data engineers fine-tune performance and cost on a per-table basis instead of maintaining separate ingestion systems.

This combined approach offers many benefits, such as:

  • Operational simplicity: Manage both batch and streaming ingestion within one Estuary project without maintaining separate systems.
  • Scalability: Adjust sync schedules or materialization frequencies anytime as data volumes or business needs evolve.
  • Predictable TCO: Control how frequently data lands in Snowflake, reducing unnecessary warehouse runtime and compute usage.
  • Right-time performance: Ensure that each dataset is updated exactly when it is needed, neither sooner nor later.

By combining streaming ingestion with efficient batch materializations, Estuary enables organizations to build right-time data pipelines. This approach ensures that data movement is optimized for freshness, reliability, and cost in every use case.

Conclusion

While real-time streaming supports many use cases where data freshness is critical, it is not the only option when ingesting data for analytics. When you factor in cost optimization for Snowflake, the best approach is to match the ingestion strategy to the use case, rather than defaulting to real-time just in case. That’s what right-time data is all about.

With Estuary, data engineers can continuously capture CDC data, decide when to materialize it, and mix streaming and batch modes as needed.

Start Building Right-Time Pipelines: Control when data lands in Snowflake and cut compute costs with the Right-Time Data Platform. Get Started Free or Talk to an Expert.

About the author

Maja Ferle, Data Architect | Author

Maja is a seasoned data architect with more than 30 years of experience in data analytics, data warehousing, business intelligence, data engineering, and data modeling. She has written extensively on these topics, helping practitioners and organizations take control of their data. Most recently, she authored Snowflake Data Engineering, where she shares practical insights into building and managing data engineering pipelines in modern cloud-based data platforms.
