Estuary

Data Engineering Glossary

Clear, plain-language definitions of the data integration, streaming, and pipeline terms that data teams work with every day. Browse by letter or filter by category to find what you need.

B

Backfill

Core Concept

A backfill loads historical data from a source into a destination. It runs when a new pipeline is created, when a new table is added, or when it is triggered manually.
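As a rough illustration of the idea, the sketch below copies rows that already existed in a source into an empty destination. The `backfill` function, the in-memory list, and the `id` key are hypothetical stand-ins chosen for the example, not part of any Estuary API.

```python
from typing import Iterable


def backfill(source_rows: Iterable[dict], destination: dict, key: str = "id") -> int:
    """Copy every existing source row into the destination, keyed by `key`."""
    loaded = 0
    for row in source_rows:
        # Write by key so that re-running the backfill does not duplicate rows.
        destination[row[key]] = dict(row)
        loaded += 1
    return loaded


# Hypothetical historical rows that existed before the pipeline was created.
historical_rows = [
    {"id": 1, "status": "shipped"},
    {"id": 2, "status": "pending"},
]

destination_table: dict = {}
rows_loaded = backfill(historical_rows, destination_table)
print(rows_loaded, destination_table)
```

Writing by key rather than appending is a design choice here: it lets a manually re-triggered backfill overwrite rows instead of duplicating them.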

Bring Your Own Cloud (BYOC)

Estuary Platform

Bring Your Own Cloud (BYOC) is a deployment option where Estuary's data plane runs entirely in the customer's own infrastructure.

C

Capture

Estuary Platform

A capture is the Estuary component that ingests data from an external source system into a collection.

Collection

Estuary Platform

A collection is a real-time, append-only log of JSON documents stored in cloud object storage, produced by a capture or derivation and consumed by one or more materializations.

Connector Catalog

Estuary Platform

Estuary's catalog of connectors includes real-time CDC, streaming, batch, and materialization connectors for warehouses, lakes, and AI infrastructure.

D

Data Flow

Estuary Platform

A Data Flow is an end-to-end pipeline in Estuary connecting one or more sources to one or more destinations.

Data Freshness

Core Concept

Data freshness is the gap between when a change happens in a source system and when it is visible in a destination.
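To make the definition concrete, freshness can be computed as the difference between two timestamps. This is a minimal sketch; the function name and the sample timestamps are made up for illustration.

```python
from datetime import datetime, timezone


def freshness_seconds(changed_at: datetime, visible_at: datetime) -> float:
    """Return the gap, in seconds, between a source change and its visibility downstream."""
    return (visible_at - changed_at).total_seconds()


# Illustrative timestamps: the change is visible 2.5 seconds after it happened.
changed_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
visible_at = datetime(2024, 1, 1, 12, 0, 2, 500000, tzinfo=timezone.utc)

lag = freshness_seconds(changed_at, visible_at)
print(lag)
```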

Data Replication

Core Concept

Data replication is the continuous process of copying changes from a source system to one or more destinations, keeping them in sync.

Database Connector

Estuary Platform

A database connector is a pre-built component that connects a pipeline to a specific source database.

Delivery Semantics

Core Concept

Delivery semantics describes the guarantee a pipeline makes about how many times each change event reaches the destination: at most once, at least once, or exactly once.

Derivation

Estuary Platform

In Estuary, a derivation is a collection built by applying transformation logic written in SQL, TypeScript, or Python.

E

ELT

Core Concept

ELT (Extract, Load, Transform) is a data movement pattern in which raw data lands in the warehouse first and is transformed there afterward, typically by tools like dbt.

Exactly-once Delivery

Core Concept

Exactly-once delivery is the guarantee that each record from a source reaches the destination precisely one time, with no duplicates and no dropped events.
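One common way to approximate this guarantee is to accept redelivered events from the transport and discard duplicates at the destination. The sketch below assumes each event carries a unique `event_id`; the `DedupingSink` class and its field names are hypothetical, and this is not a description of how any particular platform implements exactly-once delivery.

```python
class DedupingSink:
    """A destination that ignores events it has already written."""

    def __init__(self) -> None:
        self.seen_ids: set = set()
        self.records: list = []

    def write(self, event: dict) -> bool:
        """Write the event once; return False if it was a duplicate."""
        if event["event_id"] in self.seen_ids:
            return False
        self.seen_ids.add(event["event_id"])
        self.records.append(event["payload"])
        return True


sink = DedupingSink()
delivered = [
    {"event_id": "a1", "payload": "order created"},
    {"event_id": "a2", "payload": "order paid"},
    {"event_id": "a1", "payload": "order created"},  # redelivered after a retry
]
results = [sink.write(event) for event in delivered]
print(results, sink.records)
```

No event is dropped and the redelivered one is written only once, so the destination ends up with each record exactly one time.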

I

Idempotent

Core Concept

An idempotent operation produces the same result whether it runs once or multiple times, which makes pipelines safe to retry after failures.
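The difference shows up clearly when the same write is retried. In this small sketch (function and variable names are illustrative), an append is not idempotent, while a keyed upsert is.

```python
def append_row(table: list, row: dict) -> None:
    """Not idempotent: every call adds another copy of the row."""
    table.append(row)


def upsert_row(table: dict, row: dict) -> None:
    """Idempotent: repeated calls leave the table in the same state."""
    table[row["id"]] = row


row = {"id": 7, "total": 42}
appended: list = []
upserted: dict = {}

# Simulate a pipeline retrying the same write three times after failures.
for _ in range(3):
    append_row(appended, row)
    upsert_row(upserted, row)

print(len(appended), len(upserted))
```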

L

Latency

Core Concept

Latency is the technical measure behind data freshness: the gap, in milliseconds or seconds, between when data is created at a source and when it is available at a destination.

LLM (Large Language Model)

Core Concept

Large Language Models (LLMs) are a type of AI that works with natural language to power chatbots, copilots, and agentic workflows.

Log-based Change Data Capture

Methodology

Log-based CDC reads committed changes directly from a database's transaction log and delivers them as a real-time ordered stream of change events.
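The consuming side of such a stream can be sketched as applying ordered change events to a replica. The event shape below (`lsn`, `op`, `key`, `after`) is an assumption made for this example; real databases and connectors each define their own format.

```python
def apply_change(replica: dict, event: dict) -> None:
    """Apply one change event to an in-memory replica keyed by primary key."""
    op = event["op"]
    if op in ("insert", "update"):
        replica[event["key"]] = event["after"]
    elif op == "delete":
        replica.pop(event["key"], None)
    else:
        raise ValueError(f"unknown operation: {op}")


# Hypothetical change events, as they might be read from a transaction log.
change_log = [
    {"lsn": 101, "op": "insert", "key": 1, "after": {"name": "Ada"}},
    {"lsn": 102, "op": "insert", "key": 2, "after": {"name": "Grace"}},
    {"lsn": 103, "op": "update", "key": 1, "after": {"name": "Ada L."}},
    {"lsn": 104, "op": "delete", "key": 2, "after": None},
]

replica: dict = {}
last_lsn = 0
# Events are applied in log order so the replica converges on the source state.
for event in sorted(change_log, key=lambda e: e["lsn"]):
    apply_change(replica, event)
    last_lsn = event["lsn"]  # a checkpoint to resume from after a restart

print(replica, last_lsn)
```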

M

Materialization

Estuary Platform

A materialization is the Estuary component that delivers data from a collection into an external destination, keeping it up to date as new change events arrive.

O

Operational Analytics

Core Concept

Operational analytics uses analytical data to drive real-time business actions rather than retrospective reporting.

R

RAG (Retrieval-Augmented Generation)

Core Concept

Retrieval-augmented generation is an AI pattern where a language model retrieves relevant context from an external data store before generating a response.
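The retrieve-then-generate flow can be sketched without a real model. Here retrieval is a naive word-overlap score and the "generation" step stops at building the prompt; both functions are hypothetical simplifications, and a production system would typically use embeddings for retrieval and pass the prompt to a language model.

```python
def retrieve(question: str, documents: list, k: int = 1) -> list:
    """Rank documents by how many lowercase words they share with the question."""
    question_words = set(question.lower().split())
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & set(doc.lower().split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: list) -> str:
    """Place the retrieved context ahead of the question."""
    joined = "\n".join(f"- {passage}" for passage in context)
    return f"Context:\n{joined}\n\nQuestion: {question}"


documents = [
    "Backfills load historical data into a destination.",
    "Vector databases store embeddings.",
]
question = "What do backfills load?"

context = retrieve(question, documents)
prompt = build_prompt(question, context)
print(prompt)  # this prompt would then be sent to a language model
```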

Real-time Data Pipeline

Core Concept

A real-time data pipeline moves data from source to destination continuously, with end-to-end latency measured in milliseconds to seconds rather than hours.

Reverse ETL

Core Concept

Reverse ETL moves data from a data warehouse back into the operational tools where business teams work, such as CRMs, ad platforms, and support systems.

Right-time Data

Core Concept

Right-time data means delivering data at the cadence each use case actually requires, whether that is sub-second, near-real-time, or scheduled batch.

S

Schema Evolution

Core Concept

Schema evolution is the practice of handling changes to data structure (added columns, renamed fields, changed types) as they occur.
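A small sketch of one possible policy: add newly observed fields to the schema and widen a field to a catch-all type when its type changes. The `evolve_schema` function and the `"any"` marker are assumptions for this example, and renamed fields are not detected here since they look like one removed and one added field.

```python
def evolve_schema(schema: dict, record: dict) -> dict:
    """Return a copy of the schema updated to accept the given record."""
    evolved = dict(schema)
    for field, value in record.items():
        observed = type(value).__name__
        known = evolved.get(field)
        if known is None:
            evolved[field] = observed  # added column
        elif known != observed:
            evolved[field] = "any"  # changed type: widen instead of failing
    return evolved


schema = {"id": "int", "email": "str"}

# A record arrives with a new "plan" field.
with_new_field = evolve_schema(schema, {"id": 1, "email": "a@example.com", "plan": "pro"})

# A later record carries "id" as a string instead of an integer.
with_changed_type = evolve_schema(with_new_field, {"id": "1", "email": "b@example.com"})

print(with_new_field)
print(with_changed_type)
```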

Streaming

Core Concept

Streaming processes data continuously as a flow of events the moment it is produced, making it available to downstream consumers within milliseconds.

Streaming ETL

Core Concept

Streaming ETL applies the extract, transform, load pattern continuously rather than on a scheduled batch cadence.

T

Transaction Log

Architecture

A transaction log is an ordered, append-only record of every committed change a database makes. It is used for crash recovery and serves as the foundation of log-based change data capture.

Trigger-based Change Data Capture (CDC)

Methodology

Trigger-based CDC fires a database trigger on every insert, update, or delete, writing each change to a shadow table that the pipeline then reads.
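The mechanism can be tried with SQLite from the Python standard library. The table, trigger, and column names below are invented for the example; a real deployment would use the trigger syntax of its own database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

    -- Shadow table that records one row per change.
    CREATE TABLE customers_changes (
        change_id INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT NOT NULL,
        customer_id INTEGER NOT NULL,
        name TEXT
    );

    CREATE TRIGGER customers_after_insert AFTER INSERT ON customers BEGIN
        INSERT INTO customers_changes (op, customer_id, name)
        VALUES ('insert', NEW.id, NEW.name);
    END;

    CREATE TRIGGER customers_after_update AFTER UPDATE ON customers BEGIN
        INSERT INTO customers_changes (op, customer_id, name)
        VALUES ('update', NEW.id, NEW.name);
    END;

    CREATE TRIGGER customers_after_delete AFTER DELETE ON customers BEGIN
        INSERT INTO customers_changes (op, customer_id, name)
        VALUES ('delete', OLD.id, OLD.name);
    END;
    """
)

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("UPDATE customers SET name = 'Ada L.' WHERE id = 1")
conn.execute("DELETE FROM customers WHERE id = 1")

# The pipeline side: read the shadow table in the order changes were recorded.
changes = conn.execute(
    "SELECT op, customer_id, name FROM customers_changes ORDER BY change_id"
).fetchall()
print(changes)
```

Each write to `customers` also performs a write to the shadow table, which is the extra load on the source database that this approach carries.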

V

Vector Database

Destination Type

A vector database stores high-dimensional numerical embeddings of text, images, or other data and retrieves them by similarity to a query embedding.
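Stored embeddings are typically queried by similarity to a query vector. The toy store below uses a brute-force cosine similarity scan over three-dimensional vectors; the `TinyVectorStore` class and the sample embeddings are made up for illustration, and real vector databases rely on approximate indexes and far higher dimensions.

```python
import math


def cosine_similarity(a: list, b: list) -> float:
    """Cosine of the angle between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Holds embeddings in memory and ranks them against a query by brute force."""

    def __init__(self) -> None:
        self.items: dict = {}

    def add(self, item_id: str, embedding: list) -> None:
        self.items[item_id] = embedding

    def nearest(self, query: list, k: int = 1) -> list:
        ranked = sorted(
            self.items,
            key=lambda item_id: cosine_similarity(query, self.items[item_id]),
            reverse=True,
        )
        return ranked[:k]


store = TinyVectorStore()
store.add("cat", [0.9, 0.1, 0.0])
store.add("dog", [0.8, 0.3, 0.0])
store.add("invoice", [0.0, 0.1, 0.9])

top = store.nearest([1.0, 0.0, 0.0], k=2)
print(top)
```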
