
How to Evaluate Change Data Capture (CDC) Solutions: Key Considerations for Buyers
Choosing the right change data capture (CDC) solution can be confusing. Use these criteria for evaluating CDC tools so you can make the right decision based on your use case and requirements.

If you've read our intro to change data capture, you already know the three capture methods differ significantly in source load, delete handling, and latency. The next question is which CDC tool to use, and that's where things get harder. The market has expanded fast, and most vendors describe themselves the same way: real-time, log-based, scalable, easy. The differences only show up once a pipeline is in production, when retries, schema changes, and destination slowdowns start to matter.
This guide covers what data teams should actually evaluate when comparing CDC tools, with examples from teams who've made the call.
Key Takeaways
Not all CDC tools use log-based capture—confirm the tool reads from the transaction log for your specific source before evaluating anything else.
Delivery semantics matter more than capture speed: delete handling, per-key ordering, and retry behavior are where most CDC tools fail in production.
Backfills aren't edge cases, so choose a tool that supports bounded replay and selective backfill, not just full resyncs.
Evaluate pricing against your real workload; per-row and per-destination models can make CDC significantly more expensive as you scale.
What capture method does the tool use?
Start here, because everything else depends on it. A tool that calls itself "CDC" can mean log-based capture, query-based polling, or a hybrid that switches depending on the source. The differences in latency, source load, and delete handling are large enough to disqualify a tool before you look at anything else.
Confirm the tool reads from the transaction log for your specific source: PostgreSQL WAL, MySQL binlog, SQL Server transaction log, Oracle redo logs, MongoDB oplog or change streams. Then confirm what happens when log access isn't available. A tool that quietly falls back to polling without telling you is one that will surprise you later when latency drifts and source load climbs.
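For Postgres specifically, the preconditions for log-based capture are server settings you can check up front (`wal_level`, replication slots, WAL senders). Here is a minimal sketch of that check as a pure function; in practice you'd run `SHOW wal_level;` and query `pg_replication_slots` against the source, and the function name and settings dict here are invented for illustration:

```python
# Illustrative check for whether a Postgres source can support log-based
# CDC via logical decoding. Real tools query the server directly; here
# the relevant settings arrive as a plain dict.

def supports_log_based_cdc(settings: dict) -> tuple:
    """Return (ok, reason) for server settings pulled from the source."""
    if settings.get("wal_level") != "logical":
        return False, "wal_level must be 'logical' for logical decoding"
    if int(settings.get("max_replication_slots", 0)) < 1:
        return False, "no replication slots available"
    if int(settings.get("max_wal_senders", 0)) < 1:
        return False, "no WAL sender processes configured"
    return True, "log-based capture possible"

ok, reason = supports_log_based_cdc(
    {"wal_level": "logical", "max_replication_slots": 4, "max_wal_senders": 4}
)
# ok is True; with wal_level = "replica", the check fails instead of
# silently falling back to polling.
```

A tool that surfaces this kind of precondition failure loudly, rather than degrading to polling, is the behavior you want to confirm during evaluation.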
Hayden AI's situation is a good example of why this matters. Their Postgres database is the system of record for the data that powers their analytics, and they needed CDC that captured changes continuously without adding query load to a production database their application depends on. Log-based capture from the WAL was a hard requirement, not a preference.
How does the tool handle deletes, ordering, and retries?
This is where most CDC tools fail. Capturing changes is the easy part. Delivering them correctly when something goes wrong is what separates a pipeline you can trust from one that silently corrupts data.
When evaluating CDC tools for this, consider these three questions:
- Does it capture deletes as first-class events? Some tools only emit inserts and updates, leaving you to model deletes yourself. If your warehouse is going to reflect the source, deletes have to propagate.
- Does it preserve per-key ordering? Two updates to the same customer record cannot arrive out of order in the destination. Tools that don't guarantee this will eventually show "shipped" before "ordered" in your analytics.
- What happens on retry? At-least-once delivery is the default for most CDC tools, which means duplicates will happen. The question is whether the tool gives you the building blocks (stable event IDs, idempotent upserts, dedupe on log position) to end up with correct state in the destination.
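The three questions above reduce to how the destination applies events. This is a minimal in-memory sketch of an idempotent apply loop, assuming events carry a log position (LSN), an operation, and a key; the event shape and field names are invented for illustration, not any tool's actual format:

```python
# Sketch of idempotent apply with dedupe on log position. At-least-once
# delivery means the same event may arrive twice, and per-key ordering
# must hold in the destination.

def apply_events(events, state=None, applied_lsn=None):
    """Apply CDC events to an in-memory 'destination' table."""
    state = {} if state is None else state
    applied_lsn = {} if applied_lsn is None else applied_lsn  # per-key high-water mark
    for ev in events:
        key = ev["key"]
        # Dedupe/ordering guard: skip anything at or below the last
        # applied log position for this key.
        if applied_lsn.get(key, -1) >= ev["lsn"]:
            continue
        if ev["op"] == "delete":
            state.pop(key, None)          # deletes are first-class events
        else:                             # insert/update become an upsert
            state[key] = ev["value"]
        applied_lsn[key] = ev["lsn"]
    return state

events = [
    {"lsn": 1, "op": "insert", "key": "c1", "value": {"status": "ordered"}},
    {"lsn": 2, "op": "update", "key": "c1", "value": {"status": "shipped"}},
    {"lsn": 2, "op": "update", "key": "c1", "value": {"status": "shipped"}},  # retry duplicate
    {"lsn": 3, "op": "delete", "key": "c1", "value": None},
]
final = apply_events(events)
# final == {}: the delete propagated and the duplicate was a no-op.
```

A tool that exposes stable log positions and performs upserts keyed on them gives you exactly this guarantee; one that only appends rows leaves the dedupe and ordering problems to you.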
Resend ran into this exact category of problem before switching. Their pipeline relied on backfills to recover from delivery gaps, which were both expensive and unreliable. Replacing that with a CDC system that handled retries and recovery natively turned data movement from a recurring incident into infrastructure they don't have to think about.
How does the tool handle backfills and schema changes?
Backfills aren't edge cases. You'll do them when you add a new destination, fix a transformation bug, onboard a new analytics use case, or recover from a destination outage. A CDC tool that treats every backfill as a full re-snapshot turns routine work into multi-hour incidents.
Look for:
- Bounded replay: Can the tool replay a defined window without dropping and rebuilding the destination?
- Selective backfill: Can you backfill one table or one tenant without touching the rest of the pipeline?
- Schema evolution: What happens when a column is added, renamed, or retyped? The tool should handle additive changes automatically and pause or quarantine on breaking changes, not silently drop fields.
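The schema evolution behavior described above can be sketched in a few lines: additive changes merge in automatically, while breaking changes are quarantined rather than silently dropped. This is a simplified model with string type names and an invented function name, not any tool's actual implementation:

```python
# Sketch of schema-change handling: additive changes accepted, breaking
# changes (a retyped column) quarantined for review instead of being
# silently dropped or auto-cast.

def evolve_schema(current: dict, incoming: dict):
    """Return (new_schema, quarantined_columns) for an incoming record schema."""
    new_schema = dict(current)
    quarantined = []
    for column, col_type in incoming.items():
        if column not in new_schema:
            new_schema[column] = col_type      # additive: accept the new column
        elif new_schema[column] != col_type:
            quarantined.append(column)         # breaking: retype -> quarantine
    return new_schema, quarantined

schema, bad = evolve_schema(
    {"id": "int", "email": "text"},
    {"id": "int", "email": "text", "signup_date": "date"},  # column added
)
# schema now includes signup_date; bad == []
```

When you evaluate a tool, ask which branch of this logic it takes for each change type, and whether the quarantine path produces an alert you'd actually see.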
Glossier is a useful example here. As an Estuary customer, they were finally able to connect their ERP's new data endpoint and unlock data that had previously been blocked by the cost of moving it. The unlock wasn't capture. It was that adding a new source and reshaping the pipeline didn't require rebuilding everything downstream.
What does the operational burden actually look like?
Some CDC tools require you to run Kafka, Kafka Connect, and Debezium yourself. Others handle the streaming layer for you. The decision isn't ideological. It's about whether your team's time is better spent operating infrastructure or building on top of it.
A few items to evaluate here:
- Do you need to run Kafka? If yes, factor in cluster operations, partition management, and the people-cost of keeping it healthy.
- What's the monitoring story? Can you see freshness lag and backlog per pipeline, or do you have to build that yourself?
- How does the tool handle failure? Dead-letter queues, error streams, and clear alerts on schema mismatch matter more than they sound like they do.
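The "monitoring story" item above is concrete enough to sketch: the core signal is freshness lag per pipeline. This is an illustrative check of the kind a CDC platform should give you out of the box; the pipeline names and threshold are invented for the example:

```python
# Freshness-lag check: flag pipelines whose newest delivered event is
# older than a lag threshold. Timestamps are seconds since epoch.

def stale_pipelines(last_event_ts: dict, now: float, max_lag_s: float):
    """Return sorted names of pipelines exceeding the freshness threshold."""
    return sorted(
        name for name, ts in last_event_ts.items()
        if now - ts > max_lag_s
    )

alerts = stale_pipelines(
    {"orders": 990.0, "customers": 400.0},
    now=1000.0,
    max_lag_s=300.0,
)
# alerts == ["customers"]: 600s of lag exceeds the 300s threshold.
```

If a tool doesn't expose per-pipeline freshness and backlog, this small loop is something your team ends up building and operating themselves.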
Shippit chose their CDC tool partly to avoid this exact tradeoff. Running Kafka in-house meant either dedicating engineers to streaming infrastructure or accepting fragility, neither of which fit their team. A managed approach removed that decision entirely.
Does the pricing model match how you'll actually use it?
Pricing in the CDC space can get confusing. A tool that looks affordable in the demo often becomes expensive once you add destinations or scale data volume. The two patterns that catch teams off guard are per-row pricing (which punishes you for high-change tables) and per-destination pricing (which punishes the multi-consumer use case CDC is supposed to enable).
Ask the vendor to model your real workload, not a generic example. Then ask what changes if you double the volume, add a second destination, or onboard a new use case. The answer should be predictable.
LOVESPACE compared Estuary to Fivetran and Airbyte specifically on this dimension. Cost efficiency was one of their two stated reasons for the choice, alongside support responsiveness. Cosuno made a similar call: real-time data movement at half the cost of their previous setup, with no incidents since. Shippit framed it more directly: they didn't want to be locked into a system where faster syncs meant higher bills.
Checklist: What to compare when evaluating CDC tools
Use this checklist to avoid selecting a CDC tool that works in a demo but fails under retries, bursts, and schema changes:
1. Capture method
- True log-based CDC vs. polling / "incremental sync"
- Source support (Postgres WAL, MySQL binlog, SQL Server log, Oracle redo, MongoDB oplog/change streams)
2. Delivery capabilities
- How duplicates are handled under retries (idempotency / dedupe)
- Ordering guarantees (at least per-key ordering)
- Delete handling (hard deletes, soft delete options, tombstones)
- Destination latency (are change events sent to destination systems in real time, or is there lag?)
3. Backfill and recovery
- Initial snapshot + streaming handoff correctness
- Adding new destinations with backfill (without disrupting existing pipelines)
- Bounded replay vs "full resync" as the default recovery response
4. Schema change behavior
- Additive evolution support
- What happens on breaking changes (pause, error stream, silent drop, auto-cast)?
5. Operational burden
- Whether you must run Kafka / Connect infrastructure
- Monitoring and lag visibility
- Failure handling (dead-letter/quarantine behavior)
- Cost profile as volumes and destinations scale
Bringing it together
The teams who get CDC right tend to evaluate on the same five dimensions: capture method, delivery semantics, backfill and schema handling, operational burden, and pricing alignment. Anything else is a feature checklist that won't predict how the tool behaves in production.
If you're building a shortlist, consider Estuary. Our platform offers log-based CDC across Postgres, MySQL, SQL Server, MongoDB, Oracle, and others, with built-in handling of ordering, retries, and deletes. Bounded replay and selective backfill are standard operations rather than rebuild events, there's no requirement to run Kafka, and pricing doesn't penalize freshness or additional destinations.
The teams quoted above all made the same evaluation in different orders, and ended up at the same place.
FAQs
What's the difference between log-based CDC and query-based CDC?
Log-based CDC reads changes directly from the database's transaction log (such as the Postgres WAL or MySQL binlog), so it captures deletes, adds no query load to the source, and delivers changes with low latency. Query-based CDC polls tables on a schedule, which adds source load, increases latency, and typically misses deletes.
How should a CDC tool handle schema changes?
It should apply additive changes like new columns automatically, and pause or quarantine on breaking changes such as renamed or retyped columns rather than silently dropping fields.
Why do CDC pipelines fail in production?
Most failures show up in delivery, not capture: duplicates under retries, out-of-order updates to the same key, deletes that never propagate, and schema changes that silently drop data.
Do I need Kafka to run CDC?
No. Some tools require you to operate Kafka, Kafka Connect, and Debezium yourself, but managed platforms handle the streaming layer for you, which removes that operational burden.
How does Estuary handle CDC?
Estuary uses log-based capture across Postgres, MySQL, SQL Server, MongoDB, Oracle, and other sources, with built-in ordering, retry, and delete handling, bounded replay and selective backfill, and no requirement to run Kafka.

About the author
Emily is an engineer and technical content creator with an interest in developer education. At Estuary, she works with data pipelines for both streaming and batch data and finds satisfaction in transforming a mess of information into usable data. Previous roles familiarized her with FinTech data and working closely with REST APIs.




