The modern data stack requires robust pipelines that unify source and destination systems across an organization in a timely, reliable way. Data integration platforms let businesses develop, test, run, and update these data pipelines. This guide helps you choose between two of the top platforms available today: Meltano vs Airbyte. We’ll also compare both tools to Estuary Flow.
This post aims to answer some of your burning questions and compare key features of these platforms so you can decide which is the best fit for your business and data goals.
We’ll go over each product’s specifics and walk you through the benefits and drawbacks. We can’t promise that you’ll be able to make a decision based on this guide alone; your situation is unique! But by the time you’ve finished reading, you’ll have the essential knowledge to narrow your search. You’ll save time by knowing which solutions to read about in depth, discuss with sales teams, and try out, and which to skip.
But first, a refresher on data integration platforms.
What Is A Data Integration Platform?
A data integration platform is a system that allows you to collect, transform, and transport datasets between disparate data systems. The transported data is then used by specific applications, business units, and partners to help deliver data-driven business outcomes.
Each connection between a source and destination is called a data pipeline. With a data integration platform, you can easily create and maintain the various pipelines your data stack requires.
Put another way, a data integration platform can automate much of the work that in the past would be completed manually by IT professionals or data engineers. Today, as data volume grows exponentially and the number of data use cases dramatically increases, it would be impossible for data professionals at most companies to build all the required pipelines manually.
Modern data integration platforms are user-friendly, so in addition to highly technical professionals, stakeholders from data scientists to marketers can create and manage pipelines.
The benefits of effective data integration extend across an organization: from business intelligence dashboards to machine learning models, up-to-date, accurate data is critical.
So, from an infrastructure standpoint, how are data integration platforms supported?
You won’t be surprised to learn that, like most IT infrastructure, you have two main options: on-premises and cloud. Often, the same vendor can provide you with both options.
Type 1: On-Prem Data Integration Platforms
Typically, on-prem platforms are installed within an existing business operation like any other piece of software. They often run smoothly alongside other data management technologies.
These types of platforms are favored by companies that specifically need on-premises architecture, usually for security reasons.
Type 2: Cloud-Based Data Integration Solutions
Cloud-based integration platforms are scalable, cheaper to maintain, and don’t disrupt existing systems when they are adapted or upgraded. These platforms are becoming more and more popular because of their scalability and affordability.
They also offer real-time data backup, customized access, and big data storage, and their security measures are becoming more robust over time.
ELT Or ETL? What Is The Difference And Which One To Pick
In addition to installed vs cloud-based data integration solutions, there’s another important distinction we need to make. It has to do with the order in which data pipeline steps take place.
ELT (Extract, Load, Transform) and ETL (Extract, Transform, Load) are the two most basic sequences a pipeline can have. Ultimately, all data pipelines must extract data from a source and load it to a destination. At some point, data must be transformed from the source format into one that’s usable at the destination.
On paper, the difference between ETL and ELT is quite simple. ETL tools transform data before loading it to the destination whereas ELT completes the data transformation after loading.
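To make the ordering concrete, here is a minimal Python sketch. It assumes in-memory lists stand in for the destination and uses a hypothetical email-normalizing `transform`; real pipelines would read from and write to actual systems. The only difference between `etl` and `elt` is where the transform runs relative to the load step.

```python
from typing import Dict, List

Row = Dict[str, str]

def extract() -> List[Row]:
    # Hypothetical source rows, messy as they often are at the source.
    return [{"email": " Ada@Example.com "}, {"email": "grace@example.com"}]

def transform(rows: List[Row]) -> List[Row]:
    # Normalize emails into the shape the destination expects.
    return [{"email": row["email"].strip().lower()} for row in rows]

def etl(destination: List[Row]) -> None:
    # ETL: transform in flight, then load only the cleaned rows.
    destination.extend(transform(extract()))

def elt(raw_table: List[Row], clean_table: List[Row]) -> None:
    # ELT: load the raw rows into the destination first...
    raw_table.extend(extract())
    # ...then transform using the data that already lives there.
    clean_table.extend(transform(raw_table))
```

In the ELT version the raw rows stay in the destination, so analysts can change and re-run the transformation later without extracting the data again.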
ETL is the older methodology of the two and provides stronger guarantees of data shape and quality, sometimes at the expense of flexibility or processing speed. When the pipeline destination is an application with a rigid schema or data structure, ETL may be necessary.
ELT is the newer approach. It’s more popular for pipelines that end in analytical storage systems like data warehouses or data lakes. Because raw data, both structured and unstructured, can be loaded as-is, analysts have more flexibility.
Each method has benefits and drawbacks, and architectures can be hybrids of the two. But this is still a helpful framework to keep in mind as you evaluate data integration platform options. Be mindful of where and when you’d like transformations to occur, and pay attention to what the platform supports.
Now that you’ve thought through a couple of high-level architecture options (and have your unique use case in mind, of course), let’s dive into specific platforms.
Comparing Data Integration Options – Meltano vs Airbyte vs Estuary
There are far more than three data integration platforms on the market today. Fivetran, Stitch, and Matillion may come to mind.
Instead of comparing them all, we’re focusing on Meltano, Airbyte, and Estuary, which have a few things in common:
- All are open-source.
- All have self-hosted and cloud-managed options.
- All are relatively new players to the scene and rapidly growing.
Here is a breakdown of the three platforms:
What is Meltano?
Meltano is an open-source data integration platform that has its roots in GitLab. In addition to its own core code, it uses Singer’s taps and targets. Founded in 2019, the company has iterated on several approaches and currently has 35 of its own connectors.
It has few connectors of its own because it relies on Singer’s offerings, but it does not provide any support or maintenance for them. The lack of maintenance may not be an issue, since Singer’s taps and targets are open-source and can be customized to your needs.
One area where businesses might hesitate in choosing Meltano: its taps and targets often require extra engineering work and, unlike Airbyte’s and Estuary’s connectors, do not work out of the box.
Meltano also distinguishes itself with its focus on DataOps and the command-line interface (CLI). Unlike Airbyte and Estuary, Meltano does not have a user interface (UI) and is designed to provide an experience akin to software development, including CI/CD and version control.
Currently, you can self-host Meltano (typically, on-prem) and use a Git repository as a backup. Meltano’s managed cloud offering is expected to be available in 2023.
What is Airbyte?
Airbyte is a data integration tool similar to Meltano, but rather than emphasizing developer workflows, it promises that you can configure data pipelines within minutes of setting up the software. It is primarily a low-code, UI-based app designed to be easy for more stakeholders to use.
Airbyte can be self-hosted using the open-source option or managed in the cloud through the paid model.
Airbyte is also relatively new to the data integration space but is backed by a team of data scientists and engineers with a cumulative 100 years of experience. Based in San Francisco, California, the company understands the challenges of bringing data into projects and why data integration is an essential part of development today.
Since most businesses and services rely heavily on multiple data pipelines, Airbyte has prioritized creating a vast library of open-source connectors, mostly for SaaS and APIs. This diversity of connectors and the community that has grown around their development has helped propel Airbyte to popularity.
What is Estuary?
Built around the DataOps philosophy, Estuary Flow is a real-time data integration platform designed for the future. Flow combines the ease of popular ELT tools (like Airbyte) with the scalability and efficiency of data streaming.
Estuary’s team prioritizes integrating high-scale technologies like databases, pub/sub, and filestores, but it also supports Airbyte’s open-source specification to provide a wider variety of SaaS connectors.
Real-time data architecture is widely considered challenging for engineers. Flow is designed to eliminate these classic challenges, which saves not only time but also cost, and lets pipelines scale seamlessly as your data grows.
Flow offers real-time transformation and has both CLI and UI support. You can self-host Flow or use the managed cloud offering.
Founded in New York by data industry veterans, Estuary is a relatively new company that is growing rapidly.
Meltano Vs Airbyte Vs Estuary Features At A Glance
Understanding the platforms and the companies behind their development is only part of the story. The other half is ensuring their technical specifications line up with your requirements. Let’s approach the nuances of the three platforms by comparing them head-on.
Before we dive deeper into a few specifics, here is a comparison table highlighting the key differences across all three platforms.
| | Meltano | Airbyte | Estuary Flow |
|---|---|---|---|
| Interface | CLI | UI; CLI coming soon | UI and CLI |
| Connectors | 30 natively. Integrates with all open-source Singer connectors. | More than 200 open-source connectors. | More than 25, focused on high-scale technologies and CDC. Integrates with Airbyte. |
| Open-source, self-hosted option | Yes; managed cloud offering expected in 2023. | Yes; paid cloud-hosted and enterprise tiers are also available. | Yes; paid cloud-hosted and enterprise tiers are also available. |
| Open-source connectors | Singer taps and targets (most of Meltano’s connectors) are open-source. | Yes | Yes |
| Transformation | Integrates with dbt for transformation after loading. | Integrates with dbt and allows SQL scripts after loading. | Native real-time transformations prior to loading. |
All three platforms provide a central pipeline runtime that interfaces with outside systems using connectors.
Connectors, sometimes called integrations, are plugin components that connect the central platform to either a data source or destination system. Simply put, if the platform doesn’t have a connector for one of your sources or destinations, you’ll have a hard time using the platform. As a customer, you may have the opportunity to create a connector yourself or request that the company make one for your use case.
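The plugin idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the actual interface of Meltano, Airbyte, or Flow: `SourceConnector`, `DestinationConnector`, `ListSource`, `ListDestination`, and `run_pipeline` are hypothetical names, and the connectors are in-memory stand-ins. The point is that the runtime depends only on the two interfaces, so supporting a new system means writing a new connector rather than changing the runtime.

```python
from typing import Dict, Iterable, Iterator, List, Protocol

Record = Dict[str, object]

class SourceConnector(Protocol):
    def read(self) -> Iterator[Record]: ...

class DestinationConnector(Protocol):
    def write(self, records: Iterable[Record]) -> int: ...

class ListSource:
    """Hypothetical source connector backed by an in-memory list."""

    def __init__(self, rows: List[Record]) -> None:
        self.rows = rows

    def read(self) -> Iterator[Record]:
        yield from self.rows

class ListDestination:
    """Hypothetical destination connector that appends to an in-memory list."""

    def __init__(self) -> None:
        self.rows: List[Record] = []

    def write(self, records: Iterable[Record]) -> int:
        before = len(self.rows)
        self.rows.extend(records)
        return len(self.rows) - before  # number of records written

def run_pipeline(source: SourceConnector, destination: DestinationConnector) -> int:
    # The runtime only knows the two interfaces; any connector that
    # implements them can be swapped in without changing this function.
    return destination.write(source.read())
```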
Meltano has been in the data integration industry for only two years. Before its team decided to stop extending its list of natively supported connectors, it produced 35 of them. Even so, the official website lists support for more than 300 connectors.
The discrepancy comes from Meltano’s reliance on Singer’s taps and targets. Although these connectors are open-source and can be modified, they are known to be challenging to work with and often need engineering work.
Meltano also provides engineers and analysts with its own SDK that makes it simpler to create more consistent Singer taps and targets.
Like Meltano, Airbyte has been in the industry for only two years, but it supports more than 200 data connectors and is working to expand that number to more than 1,000. Airbyte currently offers an SLA for some certified connectors and is expected to extend SLAs to more connectors in the future.
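For a sense of what a Singer tap actually produces, the sketch below hand-writes the three core message types from the Singer specification (`SCHEMA`, `RECORD`, and `STATE`) as newline-delimited JSON on stdout, where a Singer target would read them from stdin. It deliberately does not use the Meltano SDK, whose job is to generate this kind of boilerplate for you. The `users` stream, its fields, and the `bookmarks` state layout (a common convention, not a requirement of the spec) are illustrative assumptions.

```python
import json
import sys
from typing import Any, Dict, List

Message = Dict[str, Any]

def tap_messages(rows: List[Dict[str, Any]], stream: str = "users") -> List[Message]:
    # SCHEMA comes first: it describes the stream's records with JSON Schema.
    messages: List[Message] = [{
        "type": "SCHEMA",
        "stream": stream,
        "schema": {
            "type": "object",
            "properties": {
                "id": {"type": "integer"},
                "name": {"type": "string"},
            },
        },
        "key_properties": ["id"],
    }]
    # One RECORD message per row.
    messages += [{"type": "RECORD", "stream": stream, "record": row} for row in rows]
    # STATE records progress so the next run can resume (a "bookmark").
    if rows:
        messages.append({
            "type": "STATE",
            "value": {"bookmarks": {stream: {"last_id": rows[-1]["id"]}}},
        })
    return messages

def main() -> None:
    rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Grace"}]
    for message in tap_messages(rows):
        # Newline-delimited JSON on stdout; a Singer target reads it from stdin.
        sys.stdout.write(json.dumps(message) + "\n")

if __name__ == "__main__":
    main()
```

Piping this script’s output into a Singer target would load the two records; the final `STATE` message is what would let a later run pick up after `last_id`.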
These connectors cover data lakes, warehouses, and databases, but the vast majority are for SaaS APIs. Airbyte’s open-source connector framework and focus on community make this wide variety possible — it allows the Airbyte community to develop connectors for their use cases.
Users can build custom connectors or modify existing ones to meet their unique requirements, and then share their work with the community. Airbyte’s Connector Development Kit (CDK) lets users get a new connector running quickly, whereas building one with competing tools can take two days. The Airbyte community has contributed roughly half of Airbyte’s connectors.
Estuary builds and maintains the connectors our users need, focused on high-scale technology systems including:
- Amazon S3 – AWS
- Amazon Kinesis – AWS
Our connectors are open-source and follow Airbyte’s specification. This means you can also add open-source Airbyte connectors for extended SaaS support. Estuary’s engineering team independently tests and adapts open-source Airbyte connectors for use with Flow, so there’s no need for engineering work on the customer’s end.
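Connectors that follow the Airbyte protocol also exchange newline-delimited JSON messages, but they wrap each record differently from Singer: the payload sits under `record.data`, next to a `stream` name and an `emitted_at` timestamp in epoch milliseconds. The sketch below builds such a message and filters record payloads out of a connector’s mixed output (which can also contain `STATE`, `LOG`, and other message types). `airbyte_record` and `records_from_output` are hypothetical helpers for illustration, not part of Flow or Airbyte.

```python
import json
import time
from typing import Any, Dict, Iterable, Iterator, Optional

def airbyte_record(stream: str, data: Dict[str, Any],
                   emitted_at_ms: Optional[int] = None) -> str:
    # An Airbyte protocol RECORD message, serialized as one line of JSON.
    if emitted_at_ms is None:
        emitted_at_ms = int(time.time() * 1000)
    return json.dumps({
        "type": "RECORD",
        "record": {"stream": stream, "data": data, "emitted_at": emitted_at_ms},
    })

def records_from_output(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    # Keep only record payloads; skip blank lines and non-RECORD messages.
    for line in lines:
        if not line.strip():
            continue
        message = json.loads(line)
        if message.get("type") == "RECORD":
            yield message["record"]["data"]
```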
If you self-host the platform, your engineering team can create and update connectors. If you use the cloud-managed version of Flow, you can request new connectors from the team.
Ideally, you want your data integration platform to meet all of your needs out of the box. In reality, the complex nature of your data infrastructure might make it necessary to customize — whether that means adding a new connector or crafting a highly specific pipeline.
The question is whether the platform you’re choosing will make customization easy.
Since Meltano is open-source, you can leverage Singer’s taps and targets the way you want. You can also use the Meltano SDK to create custom connectors that follow the Singer specification.
Meltano is modular in nature. Each step in a pipeline comes in the form of a plugin, and a wide variety of plugins are available. This makes it easy to mix and match components. If you don’t find what you need, Meltano’s code base is open and accepting contributions.
Airbyte’s major success when it comes to customizability is its open-source connector library. Businesses can use the Connector Development Kit to build their custom connectors in as little as a few hours.
As an open-source project, Airbyte also accepts contributions to its code. Self-hosting the open-source offering allows flexibility for engineers; the UI for Airbyte Cloud is easier to use but less customizable.
Like Airbyte and Meltano, Estuary is a modular platform with open-source connectors. Using the open-source version affords full customizability.
Flow’s cloud-managed tier is designed to combine the ease of use of a web application with the flexibility of a CLI. From schema changes to real-time transforms, you can modify just about everything about your pipeline. Advanced users can even view and control how processing tasks are handled.
For the vast majority of data pipelines, you’ll need to transform data from the source format to one that’s usable at the destination.
Most data integration platforms make it easy to make small adjustments to conform data to a schema. But what about when you need more — say, a join or calculation? Pipeline platforms address this in different ways.
Meltano’s Data Transformation
Meltano is an ELT tool: transformation occurs after loading, usually into a data warehouse. Transformations are run on batches of data.
Meltano offers an integration with dbt, a popular transformation platform, through a plugin. This allows you to create transformations from the familiar Meltano CLI environment.
Airbyte’s Data Transformation
Airbyte is also an ELT tool.
Airbyte allows you to trigger transformations to occur immediately after loading to a data warehouse through an integration with dbt. You can also use custom SQL scripts. As with Meltano, transformations are run in batches.
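As an illustration of the load-then-transform pattern, the sketch below uses Python’s built-in `sqlite3` module as a stand-in for a data warehouse. It is a simplification under assumptions: it does not use Airbyte or dbt, and the `raw_orders` and `revenue_by_customer` tables are hypothetical. Raw rows land first; then a custom SQL script rebuilds a derived table from everything loaded so far, which is what makes this a batch transformation.

```python
import sqlite3
from typing import Iterable, Tuple

# Hypothetical custom SQL script, run inside the destination after each load.
TRANSFORM_SQL = """
DROP TABLE IF EXISTS revenue_by_customer;
CREATE TABLE revenue_by_customer AS
SELECT customer, SUM(amount_cents) / 100.0 AS revenue
FROM raw_orders
GROUP BY customer;
"""

def load_raw(conn: sqlite3.Connection, rows: Iterable[Tuple[str, int]]) -> None:
    # Extract + load: land the source rows unchanged in a raw table.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders (customer TEXT, amount_cents INTEGER)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?)", rows)
    conn.commit()

def run_batch(conn: sqlite3.Connection, rows: Iterable[Tuple[str, int]]) -> None:
    load_raw(conn, rows)
    # Transform: rebuild the derived table from all rows loaded so far.
    conn.executescript(TRANSFORM_SQL)
```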
Estuary’s Data Transformation
Estuary Flow supports real-time data transformation prior to loading. Unlike its competitors, transformations are performed as soon as data events flow through the pipeline. When the data arrives at the destination, it’s already transformed.
With Flow, you can generate robust, type-checked transformation functions in TypeScript, or use simple SQL. Type checking provides a strong guarantee against transformation failures. You can also host transformation functions at an HTTP endpoint and implement them however you like.
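The per-event idea can be sketched in Python, with the caveat that Flow’s own transformations are written in TypeScript or SQL and this is not Flow’s API. `OrderEvent`, `OrderOut`, and `run_stream` are hypothetical names. Each event is converted the moment it is pulled through the generator, so the destination only ever receives transformed documents. Python type hints are not enforced at runtime; a static checker such as mypy plays the role TypeScript’s compiler plays in catching shape mismatches before the pipeline runs.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator

@dataclass(frozen=True)
class OrderEvent:
    # Hypothetical shape of a source event.
    order_id: str
    amount_cents: int

@dataclass(frozen=True)
class OrderOut:
    # Hypothetical shape the destination expects.
    order_id: str
    amount_usd: float

def transform(event: OrderEvent) -> OrderOut:
    # Typed signature: a static checker flags mismatched fields ahead of time.
    return OrderOut(order_id=event.order_id, amount_usd=event.amount_cents / 100)

def run_stream(events: Iterable[OrderEvent]) -> Iterator[OrderOut]:
    # Transform each event as soon as it arrives, rather than waiting
    # for a batch to land in the destination first.
    for event in events:
        yield transform(event)
```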
Let’s compare the pricing of all three tools.
Meltano is open-source and free. Its managed cloud offering doesn’t have pricing available yet.
Airbyte has three plans to choose from.
- Open Source – Free
- Cloud – $2.50 per credit. How much data a credit covers depends on the data source.
- Enterprise – Quote through the sales team
Estuary Flow has three plans to choose from.
- Open Source – Free
- Cloud – Free trial, then $0.75 per GB of data processed. This rate is the same regardless of data source.
- Enterprise – Quote through the sales team
Estuary’s pricing is designed to scale up affordably: streaming technology keeps pipelines more efficient than the batch workflow used by other providers.
We hope this guide has given you a basic overview of Meltano vs Airbyte vs Estuary.
While it’s impossible to know the best option for you without considering your unique use case, what you’ve learned here has laid the important groundwork. You may have realized that a certain option is a non-starter, or have homed in on specific questions for further research.
As we wrap up, keep in mind that all of these companies are relatively new and products are evolving quickly. Pay attention to the timeline for future offerings, and take advantage of free trials and self-hosted instances to make sure the platform performs as you expect.
You can get your free trial of Estuary Flow here.