Data integration platforms come in various types and forms. Whether they’re branded as ETL, ELT, DataOps, or something else, these platforms are essential.
They help businesses develop, test, process, and update multiple data pipelines. Today’s article focuses on three notable names in the data integration space: Airbyte, Fivetran, and Estuary.
This guide is meant for businesses struggling to choose the most reliable and capable of these three platforms. Fivetran has long been one of the most in-demand data integration platforms, but it has been challenged by relative newcomers that bring new features to the scene, such as Airbyte and Estuary.
We’ll go over the specifics of all three options so that by the time you are done reading this article, you’ll know which platform you should invest in for your company and project development.
But first, let’s begin with a crash course on all three platforms and the development teams behind them.
What Is Airbyte?
Airbyte is an open-source data integration platform that promises the configuration of data pipelines within minutes of setting up the software. The company understands the growing pains of incorporating data into projects and how it is an essential part of every development today, with most businesses and services relying heavily on multiple data pipelines.
These pipelines feed data from multiple sources so their project has the necessary prerequisites to handle user requests. Airbyte is based in San Francisco, California and the team behind the product’s development boasts more than a hundred years of cumulative experience.
This experience has led them to develop one of the most popular new data integration tools available today.
What Is Fivetran?
Like Airbyte, Fivetran is a data integration platform; it uses its Metadata API to support end-to-end data analysis and visibility. It performs similarly to other data integration tools but follows the ELT model and is meant primarily for deployment in cloud-based systems.
Fivetran enables efficient business processes by streamlining data replication, fetching, sourcing, and integration. However, unlike most other options, Fivetran focuses heavily on extraction and loading, whereas other platforms rely on extraction and transformation of data.
The California-based company does not have an open-source license for its platform and instead uses the traditional SaaS (Software as a Service) model.
What Is Estuary?
The Estuary Flow platform is a real-time data integration system built for the future and is based on the DataOps methodology, ensuring your teams can build data-intensive architecture for your applications with minimal friction and issues.
Perhaps our biggest claim to fame is that we unify databases, pub/sub systems, and SaaS in real time, and do all of this without needing new investments in infrastructure or development.
Our ELT platform, Estuary Flow, captures data from multiple sources instantly and lands it wherever it is required. And with real-time transformations, Flow aims to be one of the fastest platforms available to businesses today.
Airbyte, Fivetran, and Estuary Features At A Glance
With a brief understanding of the companies and their platforms out of the way, the next step is to compare the three platforms head-on. Before we dive deeper into the specifics, here is a comparison table highlighting the key differences among them.
|Authentication to data sources |API Tokens / Dev API / Cloud: Auth2.0 |OAuth 2.0 / API tokens
|Maintenance of connectors |Airbyte & Community |Only Fivetran team |Estuary team and community; integrates with Airbyte connectors
|Sync Data Frequency |Restricted per tier |Restricted per tier
|Lock-in to Enterprise
On the surface, all three products look similar: they all feature a variety of connectors for different data systems and intuitive user interfaces.
But if you look at the underlying architecture that powers data pipelines made on each platform, things start to look different.
In simple terms, there are two main types of processing that a data integration tool can use:
- Batch processing, in which the pipeline checks the data source for changes periodically and processes those changes in batches.
- Real-time processing, also known as streaming, in which any change event in the source is detected within milliseconds and processed immediately.
The most obvious effect is that data pipelines using real-time processing deliver data much faster. But there are also differences in performance and scalability between the two methods.
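To make the latency difference concrete, here is a minimal sketch in Python. It is a toy model, not any vendor’s implementation: the function names, the 300-second polling interval, and the event timestamps are all assumptions chosen for illustration. A batch pipeline delivers each change at the next scheduled poll, while a streaming pipeline delivers it as soon as it occurs.

```python
def batch_delivery_times(event_times, interval):
    """Delivery time of each change under batch processing.

    Polls run at interval, 2 * interval, ... (in seconds). A change is
    picked up by the first poll at or after the moment it occurs.
    """
    deliveries = []
    for t in event_times:
        polls_needed = max(1, -(-t // interval))  # ceiling division
        deliveries.append(polls_needed * interval)
    return deliveries


def streaming_delivery_times(event_times, processing_delay=0):
    """Delivery time under streaming: each change is handled as it occurs."""
    return [t + processing_delay for t in event_times]


def average_latency(event_times, delivery_times):
    """Mean time between a change occurring and reaching the destination."""
    waits = [d - t for t, d in zip(event_times, delivery_times)]
    return sum(waits) / len(waits)


# Hypothetical change events, in seconds after the pipeline starts.
events = [5, 70, 130, 299]

batch = batch_delivery_times(events, interval=300)
stream = streaming_delivery_times(events)

print(average_latency(events, batch))   # 174.0 seconds on average
print(average_latency(events, stream))  # 0.0 in this idealized model
```

In this model the batch latency depends only on where each change falls within the polling window, so shortening the interval reduces the wait but never removes it.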
How Airbyte Processes Data
Airbyte syncs data in batches from a source system. A sync can be triggered manually in the UI or run on a schedule at a pre-set time interval.
There are four methods, or sync modes, available, depending on the connector in use:
- Full refresh – Overwrite: All available data is synced from the source, even if it has been synced before. Identical data records are overwritten.
- Full refresh – Append: All available data is synced from the source, even if it has been synced before. Identical data records are duplicated.
- Incremental – Append: Only new or modified data is synced, but modified rows are duplicated.
- Incremental – Deduped History: Only new or modified data is synced, and modified rows are merged.
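The four sync modes above can be sketched as plain Python functions. This is a simplified model of the semantics described in the list, not Airbyte’s code: the record shape (`id`, `value`, `updated_at`), the integer cursor, and the function names are assumptions, and the deduped mode here shows only the merged result.

```python
def full_refresh_overwrite(source, destination):
    """Replace the destination with everything currently in the source."""
    return list(source)


def full_refresh_append(source, destination):
    """Copy everything again; rows that were synced before are duplicated."""
    return destination + list(source)


def incremental_append(source, destination, cursor):
    """Copy only rows changed after the cursor; old versions stay in place."""
    changed = [row for row in source if row["updated_at"] > cursor]
    return destination + changed


def incremental_deduped(source, destination, cursor):
    """Copy only rows changed after the cursor and merge them by primary key."""
    merged = {row["id"]: row for row in destination}
    for row in source:
        if row["updated_at"] > cursor:
            merged[row["id"]] = row
    return list(merged.values())


# Hypothetical state: two rows were synced earlier, at cursor value 1.
destination = [
    {"id": 1, "value": "a", "updated_at": 1},
    {"id": 2, "value": "b", "updated_at": 1},
]
# Since then, row 2 was modified and row 3 was added in the source.
source = [
    {"id": 1, "value": "a", "updated_at": 1},
    {"id": 2, "value": "B", "updated_at": 2},
    {"id": 3, "value": "c", "updated_at": 2},
]

print(len(full_refresh_overwrite(source, destination)))         # 3
print(len(full_refresh_append(source, destination)))            # 5
print(len(incremental_append(source, destination, cursor=1)))   # 4
print(len(incremental_deduped(source, destination, cursor=1)))  # 3
```

Both incremental modes read only the two changed rows; they differ in whether the old version of row 2 remains in the destination.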
How Fivetran Processes Data
Fivetran ingests data from a source in two ways:
- Pull connectors: Connectors retrieve data from the source at a set interval.
- Push connectors: Source systems send data to Fivetran as events.
The data is kept temporarily in cloud storage. Although push connectors provide the data events in real time, data from push and pull captures alike is synced to the destination system in batches.
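As a rough illustration of this flow, the sketch below models a staging area that accepts both pushed events and pulled changes but writes to the destination only when a sync runs. The `StagedPipeline` class and its method names are hypothetical and are not part of Fivetran’s product or API.

```python
class StagedPipeline:
    """Toy model: data is staged on arrival and loaded only in batches."""

    def __init__(self):
        self.staging = []      # stands in for temporary cloud storage
        self.destination = []  # stands in for the warehouse

    def receive_push(self, event):
        """A source system sends one event; it is staged immediately."""
        self.staging.append(event)

    def run_pull(self, fetch_changes):
        """A pull connector retrieves changes from a source on its interval."""
        self.staging.extend(fetch_changes())

    def sync(self):
        """Load everything staged so far into the destination as one batch."""
        batch, self.staging = self.staging, []
        self.destination.extend(batch)
        return len(batch)


pipeline = StagedPipeline()
pipeline.receive_push({"id": 1})                      # arrives in real time
pipeline.run_pull(lambda: [{"id": 2}, {"id": 3}])     # arrives on a schedule
print(len(pipeline.destination))                      # 0: nothing loaded yet
print(pipeline.sync())                                # 3: one batch of three
```

The pushed event reaches staging right away, but it becomes visible in the destination only after the next sync, which is why the end-to-end behavior is still batch.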
How Estuary Processes Data
Estuary Flow is fundamentally a real-time system because of its event-based runtime. Rather than syncing data on a set schedule, all data events are processed as soon as the source system makes them available.
- Most data updates are reflected in milliseconds.
- Sources or destinations may bottleneck the speed of data, but Estuary never will.
- Since Flow reacts to events, it doesn’t need to scan the entire data source each time it makes an update. This saves performance costs, especially for large datasets.
Though it is real-time, Estuary is able to interface with batch-based connectors that are open-source, as well.
Category Winner — Estuary
Estuary’s event-based runtime is its principal value add, setting it apart from the other platforms in this category.
If you’re not worried about timeliness, pre-built connectors are one of the best ways to differentiate ETL / ELT solutions, since they determine whether a pipeline solution matches your use case. Connectors are standardized, pluggable components used for interfacing with a system to pull or push data.
Let’s look at the connectors each platform offers and why they differ from each other so much.
Airbyte has only been in the data integration business for two years and already offers connectors for over 200 data sources. It supports all major data lakes, warehouses, and databases as destinations.
The biggest reason Airbyte has more connectors than the other two platforms is that both Airbyte and its connectors are open-source. This allows analysts to work with custom connectors and edit them to address customers’ specific needs.
Regardless of which version of the platform you choose (open-source or cloud), Airbyte users can leverage this catalog of connectors easily.
Airbyte also provides a no-code Connector Development Kit which lets users develop custom connectors. This process typically takes two days on most platforms but the kit lets them get started within 30 minutes. Plus, the Airbyte team and community are always available and can help with their maintenance.
A little more than 50% of the connectors have been contributed by the growing community. The company also has an SLA for certified connectors and is looking to offer reverse-ETL tools in 2023.
The long-term ambition of the company is to provide an SLA for other connectors through the community and reach more than 1000 connectors in the years to come.
Next to Airbyte’s catalog, Fivetran’s connector selection can seem meager. The platform has connectors for almost 150 data sources, including Zoho CRM, AWS CloudTrail, and Salesforce, and all of them support major data warehouses and databases as destinations. One big difference is that Fivetran does not support data lakes.
Since Fivetran is proprietary, customers cannot build custom connectors as they can with Airbyte. Instead, the company charges extra for custom requests when users need Fivetran to build a new data source. Similarly, no one outside the company can make improvements to existing sources.
As a workaround, Fivetran customers can have their own data engineering teams, or a third-party company, build and maintain custom connectors to address their specific needs.
Estuary is focusing on high-scale technologies, like databases and pub/sub systems, instead of SaaS products. It strikes a balance between the other two companies: it offers pre-built connectors for most users but also supports the open-source connector repository.
Estuary uses an adaptation of the Airbyte community connector specification for its connectors, which are dual-licensed under the Apache 2.0 and MIT licenses. This means that many of Airbyte’s open-source connectors are usable in Estuary Flow, as well.
Estuary currently provides connectors for all vital endpoints, including:
- Amazon S3
- Google Firestore
- Amazon Kinesis
Currently, the Estuary team is accepting suggestions for new connectors from beta users with rapid turnaround time.
Category Winner – Tie
The winner in this category depends on your use case, namely the systems that you are moving data between. Check each company’s website to see if your most valued systems are supported. In general terms:
- Looking for a huge variety of SaaS connectors and open-source freedom? Choose Airbyte.
- Still looking for a variety of SaaS connectors with more established support channels? Choose Fivetran.
- Value technology systems over SaaS, especially real-time CDC connectors? Choose Estuary.
Data transformation is the process that converts datasets from one format to another. Data is converted into the required format of a destination system based on the format of a source system.
Transforming data is one of the most important components of data integration and data management tasks. And with the reality of big data today, the need to transform data into readable formats has never been higher.
Apps, programs, and mobile devices produce vast amounts of data daily, and handling it all without the necessary conversion can be challenging for most integration platforms. Transformation enables consumers and businesses to change data from any source into a format that can be evaluated, integrated, stored, and ultimately mined for useful business intelligence.
Let’s see how the three integration platforms handle data transformation:
Airbyte is an ELT tool and does not transform data before loading. This can be problematic for companies looking to transform data as soon as it is ingested, but Airbyte does offer two options for how your data is delivered:
- Serialized JSON object
- A normalized version of the record as tables
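The difference between these two output options can be sketched in Python. This is a simplified illustration, not Airbyte’s actual normalization logic: the `raw_json` column name, the underscore-joined column naming, and the sample record are assumptions.

```python
import json


def as_raw_row(record):
    """Option 1: keep the whole record as one serialized JSON value."""
    return {"raw_json": json.dumps(record, sort_keys=True)}


def as_normalized_row(record, prefix=""):
    """Option 2: spread the record's fields across table columns.

    Nested objects are flattened into prefixed columns here; real
    normalization rules may treat nesting differently.
    """
    row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(as_normalized_row(value, prefix=f"{column}_"))
        else:
            row[column] = value
    return row


# Hypothetical source record.
record = {"id": 7, "customer": {"name": "Ada", "country": "UK"}}

print(as_raw_row(record))
print(as_normalized_row(record))
# {'id': 7, 'customer_name': 'Ada', 'customer_country': 'UK'}
```

The raw form preserves the record exactly and defers all structure to later SQL, while the normalized form is ready to query as ordinary columns.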
Custom transformations are also possible with SQL on Airbyte and users can also program deep integration with dbt. This integration lets them trigger their dbt packages at the destination level right after the EL.
Since Fivetran is an ELT tool like Airbyte, it does not transform data before loading, either. However, the platform offers opinionated normalization out of the box. It also supports post-load transformations, so users can copy and paste in their SQL and dbt code.
Estuary Flow is an ELT platform as well, but unlike the other two, it uses TypeScript for data transformations. And because these transforms are executed in real time with built-in testing and static type checking, the platform has a leg up on other ELT tool providers, on our list and otherwise.
As an alternative, Estuary also lets users customize their transformation experience with webhooks.
Category Winner – Estuary
Estuary provides on-the-fly transformation capabilities. By integrating with data warehouses like Snowflake and BigQuery, it also allows you to power dbt with real-time data.
Airbyte has three pricing plans to choose from.
- Open Source – Free
- Cloud – $2.50 per credit
- Enterprise – Quote through the sales team
Most businesses and users will love the fact that Airbyte is a free-to-use tool but for cloud-based architectures, the software may be an expensive option.
Fivetran has five plans to choose from.
- Starter – Estimated $120 per month
- Standard Select – Estimated $60 per month
- Standard – Estimated $180 per month
- Enterprise – Estimated $240 per month
- Business Critical – Quote through the sales team
Fivetran is a tricky pick for most people. While the platform is a great solution for businesses and organizations, users and smaller companies can find it to be an expensive investment.
Estuary Flow has three plans to choose from.
- Open Source – Free
- Cloud – Free
- Enterprise – Quote through the sales team
Estuary is still in development but offers the most bang for the buck at the moment.
Category Winner – Tie (Airbyte + Estuary)
Both products provide a generous free tier and scalable pricing for businesses of all sizes.
What Do The 3 Platforms Get Right?
Pros Of Airbyte
- Has support for in-house connectors as well as custom connectors.
- Airbyte is open-source which means the platform is free to use for anyone, if self-hosted. The connectors are completely open source.
- The company adds on average 9 new connectors every month and even features a connector marketplace where they can be placed for easy visibility for other users.
- Airbyte and its community can help you maintain connectors and the platform.
- No vendor lock-in, since the platform allows users to pick any cloud data storage they want.
- Airbyte has API support right out of the box, which helps users create their own connectors.
- Flexible enough to allow users to code in any language they want, including Java and Python.
Pros Of Fivetran
- Fivetran has a wide user base and companies appreciate its SaaS model as a robust and simple mechanism.
- User permissions come included with Fivetran. Restack still creates single-user access and protects the custom domain.
- Access to data sources is easy through the single sign-in method compared to having data engineers fiddle with an API to access their data.
- Another benefit lies in Fivetran’s SaaS model. The company has immense support from its in-house teams and detailed guides and troubleshooting FAQs are provided.
Pros Of Estuary
- Of the three platforms, Estuary is the only true real-time data pipeline solution.
- Highly scalable architecture focused on large datasets, and connectors that emphasize high-scale systems.
- Estuary is an open-source platform and its default version is free to use for any user or business.
- The platform has real-time data transformation. DataOps enables data teams to develop real-time, data-intensive applications efficiently and at scale.
- Built-in support for unit testing.
What Do The 3 Platforms Get Wrong?
Cons Of Airbyte
Airbyte requires extensive knowledge of setting up data pipelines and isn’t the easiest platform to get up and running with. The platform also has few releases, and its UI, while intuitive, has a steeper learning curve than competing platforms.
Perhaps its biggest flaw is in its transformation capabilities (or lack thereof) where transformations only occur once all data is pre-loaded.
Cons Of Fivetran
Fivetran relies heavily on dbt to handle behind-the-scenes complex transformations. Due to this, data teams will find their workflow incredibly muddied as they switch back and forth between different data management tools. For this reason, we wouldn’t recommend Fivetran to startups with limited engineering personnel to handle transformation and management tasks.
The platform also lacks broader data management capabilities such as data orchestration, auto-discovery of data semantics, data governance and virtualization, and data quality capabilities.
Cons Of Estuary
The only issue we can spot with Estuary Flow is that it is a relatively new product that is still undergoing development. This means the platform may at times have bugs and errors with data ingestion in its current state.
Regardless, the platform holds its own compared to the other two established names in the industry and even excels in most areas already.
This guide aims to provide a clear answer and a clear winner when comparing Airbyte vs Fivetran vs Estuary. We’ve compared the three platforms for different use cases and seen that all have their strengths and weaknesses. While Airbyte and Estuary are comparable platforms, Fivetran’s SaaS model makes it the odd one out.
Of course we’re a bit biased, but in our review today, we’ve found Estuary Flow to be the overall winner. It’s capable of all your data integration requirements, with all the benefits of real-time data. And being the most cost-effective option out of the three, the platform is the obvious choice for any business looking for a reliable ELT platform.
You can get started with Estuary for free – register here!