Estuary

Better Backfills with Dataflow Reset

Kick off a one-step backfill that refreshes your entire data pipeline—including your schemas—with a dataflow reset.

Backfilling data is now easier than ever with Estuary's Dataflow Reset feature.

Dataflow reset offers a one-stop solution to refresh everything in a dataflow, including:

  • Source collections
  • Derivations
  • Schemas
  • Materialized tables

This means that a dataflow reset not only re-extracts data from your sources and reloads it into your destinations; it also recalculates schemas based on the refreshed data. While Estuary's schema inference is a powerful tool for detecting fields' data types, bad or changing data can lead it to faulty or outdated assumptions. Clearing the slate and recalculating refreshes your schemas and can tighten any fields that were unintentionally broadened after receiving bad data.
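
To make the idea of a "broadened" field concrete, here is a minimal sketch in Python. It is a toy model, not Estuary's actual schema inference: the `infer_types` function, the sample documents, and the type names are all assumptions made up for illustration. It shows how a single bad value can widen an inferred type, and how re-inferring from refreshed data alone tightens it again.

```python
# Toy illustration only: this is NOT how Estuary implements schema inference.
# infer_types and the sample documents below are hypothetical.

def infer_types(documents):
    """Return {field: sorted list of type names observed} across the documents."""
    names = {
        bool: "boolean",
        int: "integer",
        float: "number",
        str: "string",
        type(None): "null",
    }
    schema = {}
    for doc in documents:
        for field, value in doc.items():
            # Every observed type is added, so the inferred type only ever widens.
            schema.setdefault(field, set()).add(names.get(type(value), "other"))
    return {field: sorted(types) for field, types in schema.items()}


# What the running pipeline has seen: one bad record put a string in "amount".
history = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": "N/A"},
    {"id": 3, "amount": 12},
]

# What a fresh read of the source returns after the bad record was fixed upstream.
refreshed = [
    {"id": 1, "amount": 10},
    {"id": 2, "amount": 11},
    {"id": 3, "amount": 12},
]

widened = infer_types(history)
tightened = infer_types(refreshed)

print(widened["amount"])    # ['integer', 'string']
print(tightened["amount"])  # ['integer']
```

In this toy model, the inferred type can only grow as new documents arrive, which is why starting over from refreshed data is what narrows the field back down.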

To kick off a dataflow reset, start from your capture. On the capture's edit screen, selecting "Backfill" now defaults to a dataflow reset.

Select between dataflow reset and incremental backfill when backfilling a source

The dataflow reset automatically detects affected destinations and other downstream resources, so you don't need to select specific resources to update.

While a dataflow reset will likely be the best choice for most use cases, Estuary still offers advanced options to backfill individual parts of a pipeline:

  • Incremental backfills update source data without recreating destination tables
  • Advanced materialization backfills recreate destination tables without re-extracting source data
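
As a rough way to keep the three options straight, the choice can be sketched as a small decision helper. This is an illustrative summary of the descriptions above, not an Estuary API: the `choose_backfill` function and its parameter names are hypothetical, and the docs remain the authority on when to pick each option.

```python
# Hypothetical helper summarizing the options described in this post.
# It does not call Estuary; it only encodes the trade-offs in plain Python.

def choose_backfill(reextract_source: bool, recreate_tables: bool) -> str:
    """Map what you want refreshed to the backfill option described in this post."""
    if reextract_source and recreate_tables:
        # Refreshes everything in the dataflow, including schemas.
        return "dataflow reset"
    if reextract_source:
        # Updates source data without recreating destination tables.
        return "incremental backfill"
    if recreate_tables:
        # Recreates destination tables without re-extracting source data.
        return "materialization backfill"
    return "no backfill needed"


print(choose_backfill(reextract_source=True, recreate_tables=True))
print(choose_backfill(reextract_source=True, recreate_tables=False))
print(choose_backfill(reextract_source=False, recreate_tables=True))
```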

Known limitations: Using a dataflow reset with a Dekaf materialization is currently not recommended.

Find out more, including suggestions on when to choose which backfill option, in our docs.
