This guide is designed to help you evaluate different Matillion alternatives and choose the best data integration tool for your needs. If you’re looking for alternatives to Matillion, you will most likely be looking at other ETL tools, or possibly ELT tools focused on cloud data warehouses. You might be considering a move to a cloud-based ETL solution for loading data into your cloud data warehouse. You might even need only data replication, not full-blown ETL. This guide covers the top Matillion alternatives, including Estuary, Airbyte, Fivetran, Hevo, Informatica, Talend, and Stitch Data.
What is Matillion?
Matillion ETL was initially developed as an on-premises platform, before the emergence of cloud data warehouses, and it remains largely focused on on-premises deployments. That said, it works well with cloud data platforms such as Snowflake, Amazon Redshift, and Google BigQuery, and it combines many features to extract, transform, and load (ETL) data. More recently, Matillion has been adding cloud options as part of its Data Productivity Cloud offering. This consists of a Hub for administration and billing, plus a choice of two products. One is the on-premises Matillion ETL deployed as a “private cloud.” The other is Matillion Data Loader, a free cloud batch and CDC replication tool that is built on Matillion ETL but lacks many of its capabilities, including transforms.
As with most mature ETL tools, Matillion has a strong feature set, but it is harder to learn and use, and more expensive.
Pros
Perhaps Matillion’s biggest advantages are its ETL and orchestration capabilities, especially compared to ELT tools.
- Advanced transforms: Matillion ETL supports a variety of transform options, from drag-and-drop to code editors for complex transformations.
- Orchestration: Matillion offers advanced graphical workflow design and orchestration.
- Pushdown optimization: Matillion ETL can push down transformations to the target data warehouse.
- Reverse ETL: Matillion lets you extract data from a source, cleanse it, and insert it back into the source.
Cons
- SaaS: Matillion ETL, its flagship product, is on-premises only. Matillion does offer Data Loader, which is built on Matillion ETL, as a free cloud replication service, and Matillion ETL integrates with the Matillion Cloud Hub for billing. You can migrate work from Data Loader to Matillion ETL if you choose, but that is a migration from the cloud to your own managed environment.
- Free tier: Matillion Data Loader is free, but it has limitations, such as no support for data transformations. This makes it difficult to assess the tool thoroughly before committing to a paid plan.
- Connectors: Matillion has fewer connectors than most vendors (120+ in total). You can invoke external APIs to access other systems, but getting access to all your sources and destinations can become an issue.
- Schema evolution: Matillion does support adding columns to existing destination tables, deleting a column, and handling data type changes as sources change. But adding a table requires creating a new pipeline, and there is no automation for schema evolution.
- dbt integration for SaaS: While Matillion ETL has a connector for dbt, there is no integration between Data Loader and dbt.
- Pricing: Compared to more modern ELT vendors, Matillion is expensive. It starts at $1,000/month for 500 credits, where each credit is one virtual core-hour, similar to an AWS, Azure, or Google virtual core. In practice, the minimum runs to several thousand dollars per month. Data Productivity Cloud consumes a credit per running task every 15 minutes, and only while tasks are running. The smallest Matillion ETL instance has two cores, so it consumes 2 credits an hour. Left running all month, that is nearly 3x the 500 included credits.
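The "nearly 3x" figure is simple arithmetic. The sketch below assumes a two-core instance left running for a full 30-day month. Real usage depends on how long instances actually run.

```python
# Sketch: monthly credit use for an always-on, two-core Matillion ETL instance.
# Assumption: the instance runs around the clock for a 30-day month.
CREDITS_INCLUDED = 500       # credits in the $1,000/month entry plan
CORES = 2                    # smallest Matillion ETL instance
HOURS_PER_MONTH = 24 * 30    # assumed 30-day month, always on


def monthly_credits(cores: int, hours: int) -> int:
    """One credit is one virtual core-hour, so credits = cores * hours."""
    return cores * hours


used = monthly_credits(CORES, HOURS_PER_MONTH)
ratio = used / CREDITS_INCLUDED
print(f"{used} credits used, {ratio:.2f}x the {CREDITS_INCLUDED} included")
```

With these assumptions the instance uses 1,440 credits, or 2.88x the included 500.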
Top 7 Matillion Alternatives in 2024
If you’re being held back by some of Matillion’s limitations, you’ll most likely be evaluating other ETL or cloud ELT tools. If your focus is on loading a cloud data warehouse, it may make sense to move to an ELT vendor and run your transforms as dbt or SQL inside the cloud data warehouse. You may instead need to support multiple use cases in addition to a data warehouse, such as data replication, operational analytics, general-purpose operational data integration, or a generative AI project. In that case, another ETL tool may be a better fit.
This guide reviews 7 of the more common ETL and ELT alternatives to Matillion:
- Estuary
- Airbyte
- Fivetran
- Hevo
- Informatica
- Talend
- Stitch
Estuary
Estuary Flow is a powerful SaaS platform that delivers scalable, real-time ETL, ELT, and CDC capabilities for efficient data integration. It has an easy-to-use no-code interface that lets you build pipelines in minutes. Flow lets you combine batch and real-time sources and destinations across hundreds of databases, SaaS apps, and other systems. Estuary Flow includes both streaming ETL transforms using SQL or TypeScript and support for dbt in destinations.
Flow is an excellent choice if you need any of the following:
- real-time support, including reliable change data capture (CDC)
- a mix of real-time and batch
- loading multiple destinations and supporting multiple projects

It also happens to be the lowest-cost option on this list.
Pros
Flow has several key features that make it a great alternative to Matillion. It offers many of Matillion’s features, including ETL support. Beyond that, it runs in both public and private cloud, supports real-time and batch, has some of the best CDC support available, and is the lowest-cost option.
- Low latency: Estuary supports sub-100ms latency.
- Real-time and batch: Estuary supports real-time and batch, and lets you mix them in a single pipeline. Its support for real-time CDC and batch-loading a data warehouse helps lower costs.
- ETL and ELT: Flow supports using streaming SQL and TypeScript transforms (ETL) and dbt (ELT).
- Built-in store and replay: Estuary automatically stores data as it streams it, which lets you replay data and perform time travel.
- Advanced schema evolution: Estuary lets you automate how changes in sources are passed through to destinations.
- Multiple destinations: Estuary lets you load multiple destinations with a single pipeline, unlike most ELT vendors, which support only one destination per pipeline.
- High scale: Estuary stands out as one of the most scalable ETL/ELT vendors, uniquely offering incremental snapshots and demonstrating 5-10x the scalability of competitors like Airbyte, Fivetran, and Hevo. With messaging-level scalability akin to Kafka, Estuary sets a new standard in data integration.
- Reliable CDC: Estuary has the fastest and most efficient CDC connectors. It also enables exactly-and-only-once capture, where future backfills use the stream store. This puts the least load on the source system, preserves more change data, and helps keep source systems consistent.
- Private cloud: Estuary supports private cloud deployments. You can deploy a Flow data plane in any private account and manage several data planes from a single shared SaaS control plane. It offers the privacy of on-premises deployment with the simplicity of SaaS.
- Exactly once delivery: Estuary supports exactly-once transactional delivery so that you don’t have to de-duplicate in the destination.
- Lowest cost: For data at any volume, Estuary is the clear low-cost winner.
- Great support: Customers cite great support as one of the reasons for choosing Estuary.
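For contrast, consider what de-duplicating in the destination involves when a pipeline guarantees only at-least-once delivery. The sketch below assumes each change record carries a primary key and a monotonically increasing sequence number. It keeps only the latest version of each key. The `Change` type and its fields are hypothetical and do not describe any vendor’s record format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    key: str    # hypothetical primary key of the changed row
    seq: int    # hypothetical source sequence number (e.g. a log position)
    value: str


def dedupe_latest(changes: list[Change]) -> dict[str, Change]:
    """Keep the highest-seq change per key, so redelivered duplicates collapse."""
    latest: dict[str, Change] = {}
    for c in changes:
        current = latest.get(c.key)
        if current is None or c.seq > current.seq:
            latest[c.key] = c
    return latest


# At-least-once delivery: the change with seq=2 arrives twice.
delivered = [
    Change("order-1", 1, "created"),
    Change("order-1", 2, "paid"),
    Change("order-1", 2, "paid"),   # duplicate redelivery
    Change("order-2", 3, "created"),
]
result = dedupe_latest(delivered)
print({k: v.value for k, v in result.items()})  # {'order-1': 'paid', 'order-2': 'created'}
```

With exactly-once transactional delivery, this extra step in the destination is not needed.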
Cons
- New vendor: Estuary is a relatively new company compared to several ETL vendors. Flow is open source. Its underlying open source Gazette framework has been maturing for 10 years, but the rest of the platform is 5 years old.
- 150+ connectors: Estuary Flow has 150+ connectors built by Estuary, which is fewer than Fivetran. It also supports 500+ open source connectors, but open source connectors are not always as reliable.
Pricing
Estuary Flow offers three pricing plans:
- Free: Up to 2 connectors and 10 GB/month (try Flow for free).
- Cloud (with a 30-day free trial): $0.50/GB of change data moved from a source or to a destination, plus $0.14/hour per connector.
- Enterprise: Pricing available on request (contact Estuary for more).
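A rough Cloud-plan estimate follows directly from the two rates above. This sketch makes four assumptions:
- a 730-hour month
- every connector running all month
- `gb_moved` is the total billable change data
- no free-tier allowance or enterprise discount

The example workload is hypothetical.

```python
# Sketch: estimate a monthly Estuary Flow Cloud bill from the list prices above.
# Assumptions (not from Estuary): 730-hour month, connectors always running,
# gb_moved = total billable change data, no free tier or discounts applied.
PRICE_PER_GB = 0.50
PRICE_PER_CONNECTOR_HOUR = 0.14
HOURS_PER_MONTH = 730


def estimate_monthly_cost(gb_moved: float, connectors: int,
                          hours: float = HOURS_PER_MONTH) -> float:
    data_cost = gb_moved * PRICE_PER_GB
    connector_cost = connectors * hours * PRICE_PER_CONNECTOR_HOUR
    return round(data_cost + connector_cost, 2)


# Hypothetical pipeline: one capture plus one materialization, 100 GB/month.
print(estimate_monthly_cost(gb_moved=100, connectors=2))
```

For this hypothetical pipeline, the estimate is about $254 a month. The connector-hour charge dominates at low volumes.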
Airbyte
Airbyte was founded in 2020 as an open source ELT company based on the Singer open source framework, and launched its cloud service in 2022. It has since changed its protocol and connectors to diverge from Singer, though it kept Singer compatibility to support Singer taps as needed. Like Singer, Airbyte has remained batch-based. It is still a relatively new product. The official 1.0 launch, a big milestone for any open source project, is planned for September 2024.
Airbyte has become one of the main open source ELT options. Going by pricing calculators and customers, it’s the second-lowest-cost ELT vendor after Estuary.
If you’re focused on open source, Airbyte is a contender (Meltano is the other open source ELT option). But as a Matillion replacement, Airbyte is only suitable for a lower-scale, lower-cost cloud data warehouse deployment where you don’t need real-time or ETL.
Pros
- Ease of use: Airbyte is easy to use, like other ELT vendors.
- Low cost: Airbyte Cloud is higher cost than Estuary but lower than most of the others.
- Widely used: While Airbyte is only 4 years old, it is already widely used, with thousands of deployments. Most customers use the open source version.
Cons
- 50+ managed connectors, 300+ total: Airbyte lists 300+ connectors, but only 50+ of them are actively managed by Airbyte. The rest are open source connectors, listed as Marketplace connectors in Airbyte Cloud.
- High latency: Airbyte is batch only. The self-hosted open source version can load at intervals of 5 minutes or more. Airbyte Cloud only supports intervals of 1 hour or more, and one source connector at a time.
- Reliability: Airbyte guarantees at-least-once delivery, though it does offer both incremental and deduped modes. Under-sized workers have been a bigger reliability issue. Airbyte pipelines also lack staging or storage mechanisms to preserve state. If you need the data again, or if anything fails, you’ll need to re-extract it from the source.
- Scalability: Airbyte is known for scalability challenges, which may make it less suitable for larger workloads. Its latest benchmark peaked at 9MB/sec throughput. That benchmark used the new self-hosted open source Postgres connector, not the cloud offering. 9MB/sec works out to around 0.5TB per day, depending on how loads vary throughout the day.
- ELT only: Airbyte Cloud supports dbt Cloud but is ELT only. If you want to implement (ETL) transforms outside the data warehouse, Airbyte is not a good option.
- DataOps: Airbyte provides UI-based replication designed for ease of use. It does not give you a configuration or code option that helps with automating end-to-end pipelines, adding tests, or managing schema evolution.
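The 0.5TB-per-day estimate in the Scalability bullet can be checked against the benchmark’s peak rate. The sketch below converts a sustained throughput into a daily volume, assuming decimal units (1 TB = 1,000,000 MB).

```python
# Sketch: convert a sustained throughput in MB/s into data volume per day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def mb_per_sec_to_tb_per_day(mb_per_sec: float) -> float:
    # Decimal units assumed: 1 TB = 1,000,000 MB.
    return mb_per_sec * SECONDS_PER_DAY / 1_000_000


peak = mb_per_sec_to_tb_per_day(9)
print(f"{peak:.2f} TB/day if 9 MB/s were sustained around the clock")
```

Sustaining 9MB/sec for a full day gives about 0.78TB, which is an upper bound. Real loads that vary over the day land lower, which is consistent with the roughly 0.5TB figure above.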
Pricing
Airbyte starts at $10 per GB of data moved, or $15 per million rows of data moved via an API (or custom source). Prices drop from there with volume-based discounts. Initial backfills are billed at the same rates.
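The two rates can be combined into a rough list-price estimate before volume discounts. The sketch assumes the per-GB rate applies to non-API sources and the per-million-rows rate applies to API or custom sources. The workload sizes are hypothetical.

```python
# Sketch: list-price Airbyte Cloud estimate from the rates quoted above,
# before volume discounts. Workload sizes below are hypothetical.
PRICE_PER_GB = 10.0             # assumed: non-API (e.g. database) sources
PRICE_PER_MILLION_ROWS = 15.0   # API or custom sources


def airbyte_estimate(gb: float, api_million_rows: float) -> float:
    return gb * PRICE_PER_GB + api_million_rows * PRICE_PER_MILLION_ROWS


backfill = airbyte_estimate(gb=200, api_million_rows=0)   # one-time initial load
monthly = airbyte_estimate(gb=50, api_million_rows=20)    # ongoing syncs
print(backfill, monthly)
```

Because backfills are billed at the same rates, the initial load can cost more than several months of incremental syncs.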
Fivetran
Founded in 2012 by data scientists, Fivetran was built to deliver an integrated platform for effortless data capture and analysis. The name is a play on Fortran, and Fivetran was originally intended as a programming language for big data. Eventually Fivetran focused on just the data integration part, because that’s what people were buying. In 2018, Fivetran raised its Series A. In 2020, it added transformation capabilities with Data Build Tool (dbt) support and started to support CDC. Fivetran also acquired HVR for its CDC, but HVR remains a separate data pipeline.
Fivetran is a leading ELT vendor, renowned for its ease of use and an extensive library that includes nearly 300 pre-built connectors along with 300+ lite (API) connectors. If you’re loading a cloud data warehouse, it is a viable option for replacing Matillion. But Fivetran has no ETL, private cloud, or real-time support. It is also considered the most expensive of the ELT vendors.
Pros
- Ease of Use: Fivetran is cloud-native SaaS with a user-friendly interface that requires minimal coding.
- Pre-built Connectors: Fivetran provides nearly 300 pre-built connectors for various data sources, along with an additional 300+ connectors designed to interact with APIs.
- Scalability: Fivetran is known for its scalability compared to other ELT vendors.
- Integration with dbt: Fivetran has done a good job of integrating with dbt core (open source).
- Focus on EL: Fivetran's core strength is batch-based data extraction and loading, which is well suited for cloud data warehouses.
- Advanced schema evolution: With Fivetran, you can automate how changes in sources are passed through to destinations better than with most other vendors (similar to Estuary).
Cons
- High latency: Fivetran is batch only for all its connectors, even CDC and messaging. HVR supports real-time, but it’s a different product and pipeline from Fivetran.
- High, unpredictable costs: Fivetran is known for high and unpredictable costs, making it the priciest and least predictable option among modern ELT vendors. Fivetran bills based on monthly active rows (MAR). Some sources require all source data to be copied. Data from non-relational sources is transformed into highly normalized relational data, often resulting in far more MARs than anticipated.
- Reliability: Fivetran implements batch CDC. It first extracts a full snapshot, and then processes the source database’s transaction log in batch intervals. Both steps add load to a database and have been known to cause database failures. The resulting alerts, warnings, and failures lead to additional admin time.
- Support: When issues arise, customers often report that Fivetran's support can be slow to respond.
- DataOps: Fivetran offers little control or transparency over how it manages data and schema, leaving users in the dark. Depending on the data structure, it sometimes doesn’t bring in all the data, and it doesn’t tell you why. It changes field names and data structures but does not let you rename columns. This also makes it harder to migrate to other ELT/ETL tools.
- Roadmap: Customers frequently comment Fivetran does not provide much of a future direction or roadmap.
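Normalizing non-relational data can inflate row counts. To see why, consider a generic normalization that moves each nested array of objects into a child table. This is a hypothetical illustration of the technique, not Fivetran’s actual schema mapping or MAR accounting. The function name, the table naming, and the `_parent_id` column are invented for the example.

```python
from typing import Any


def normalize(doc: dict[str, Any], table: str,
              rows: dict[str, list[dict[str, Any]]],
              parent_id: Any = None) -> None:
    """Generic sketch: scalar fields stay in this table's row; each list of
    objects becomes rows in a child table linked back via _parent_id
    (taken from the parent's "id" field, an assumption of this sketch)."""
    row: dict[str, Any] = {} if parent_id is None else {"_parent_id": parent_id}
    children = []
    for field, value in doc.items():
        if isinstance(value, list) and value and all(isinstance(v, dict) for v in value):
            children.append((field, value))
        else:
            row[field] = value
    rows.setdefault(table, []).append(row)
    for field, items in children:
        for item in items:
            normalize(item, f"{table}_{field}", rows, parent_id=doc.get("id"))


order = {
    "id": "o-1",
    "customer": "c-9",
    "items": [{"sku": "A", "qty": 1}, {"sku": "B", "qty": 2}, {"sku": "C", "qty": 1}],
}
tables: dict[str, list[dict[str, Any]]] = {}
normalize(order, "orders", tables)
print({name: len(r) for name, r in tables.items()})        # {'orders': 1, 'orders_items': 3}
print(sum(len(r) for r in tables.values()), "rows from 1 source document")  # 4 rows ...
```

Suppose every resulting row counted toward a row-based bill. Then a change to this single order document could count as four rows rather than one.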
Pricing
Fivetran’s pricing is based on Monthly Active Rows (MAR), which can be unpredictable due to how Fivetran internally represents and processes data, especially from non-relational sources. Reducing latency also significantly increases costs. It is not unheard of for Fivetran costs to reach six figures annually, especially with some of the non-relational or limited sources mentioned above.
- A small deployment (2M MARs/month) can cost $700 to $2,667.
- 10M MARs/month puts you in the range of $10K a month.
Hevo Data
Hevo Data is a no-code SaaS data pipeline platform that started as a cloud service in 2017. Hevo is primarily ELT but has been adding some row-based ETL support.
If you’re looking for a Matillion alternative that has some row-based transform support, you could consider Hevo. While Hevo uses Kafka as part of its underlying architecture, it’s only batch-based. If you need real-time CDC or loading, consider others like Estuary.
Pros
- Ease of use: Hevo is intuitive and easy to use, like several other ELT vendors.
- ELT and ETL: Hevo has been adding more transform (ETL) capabilities, including Python scripts and a new drag-and-drop editor (in beta). Both are aimed mainly at row-level transformations. Hevo’s main transform support is dbt, especially for more complex transforms involving joins.
- Reverse ETL: Hevo can insert source data back into the source once it’s been cleansed. If you’re implementing this type of data cleansing Hevo might be a good option.
Cons
- Connectivity: Hevo has only slightly more connectors than Matillion, slightly over 150. You’ll have to evaluate whether it provides all the sources and destinations you need.
- Latency: Hevo source connectors are batch only, with 5-minute minimum intervals. Also, there is currently no common scheduler. As a result, end-to-end latency grows when the sources and the destination operate on different intervals and schedules.
- Costs: Hevo can be one of the most affordable options for low data volumes, particularly in the low GBs per month. But it becomes more expensive than others as you reach 10s of GBs a month. Costs also rise sharply as you lower latency, because several Hevo connectors only support full extracts each time.
- Reliability: Customers have also complained about Hevo bugs that make it into production and cause downtime. Also, CDC is batch only, which can add load to source databases and even cause failures.
- Scalability: Hevo has several limitations around scale. Some can be raised by contacting support, like the 50MB Excel and 5GB CSV/TSV file limits. But most cannot be changed. These include column limits and ingestion limits that cause issues, such as a 25 million row limit per table on initial ingestion. Custom scheduling is limited to 24 different times, and you can’t make more than 100 API calls per minute. Users have also reported high CPU usage, which can cause system slowdowns.
- DataOps: There is no CLI or configuration-based automation support with Hevo. You can map to a destination table manually, which can help. You can’t fully automate schema evolution. There is no schema testing or evolution control. New tables can be passed through. Also, many column changes end up getting moved to a failed events table that must be fixed within 30 days or the data is permanently lost.
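A small simulation shows how two uncoordinated schedules stretch end-to-end latency, as described in the Latency bullet above. This is a generic model of two independent fixed-interval jobs, not Hevo’s actual scheduler. The 5-minute source interval matches the stated minimum. The 15-minute destination interval and 1-minute offset are hypothetical.

```python
import math


def next_run(t: float, interval: float, offset: float = 0.0) -> float:
    """First scheduled run at or after time t (minutes) for a fixed-interval job."""
    k = math.ceil((t - offset) / interval)
    return offset + k * interval


def end_to_end_delay(change_at: float, src_interval: float, dst_interval: float,
                     dst_offset: float = 0.0) -> float:
    picked_up = next_run(change_at, src_interval)            # source sync extracts the change
    loaded = next_run(picked_up, dst_interval, dst_offset)   # destination load applies it
    return loaded - change_at


# Changes arriving every 0.1 minute over one hour, with uncoordinated schedules.
delays = [end_to_end_delay(t / 10, 5, 15, dst_offset=1) for t in range(1, 600)]
print(f"worst case ~{max(delays):.1f} min vs. a 5 min source interval alone")
```

With these settings, a change can wait almost 16 minutes even though the source syncs every 5 minutes. The exact worst case depends on how the schedules line up, but it stays below the sum of the two intervals.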
Pricing
- Free: Up to 1 million free events per month with free initial load, 50+ connectors, and unlimited models
- Starter ($239/mo): Includes access to 150+ connectors, on-demand event processing, and a 12-hour support window.
- Business (Custom Pricing): HIPAA compliance with a dedicated data architect and account manager
Informatica
Informatica offered one of the first ETL products, PowerCenter, in 1993, and one of the first cloud integration products, Informatica Cloud, in 2006. Informatica Cloud was originally built on an older version of Informatica PowerCenter. It was later upgraded to newer versions of the on-premises data integration engine, first based on Hadoop, then on Spark.
Informatica is perhaps the best example of a mature data integration platform and a solid alternative to Matillion, especially if you need more than just data integration. It’s among the best data quality and master data management software, and has many other capabilities as well.
While it was one of the first to make the transition to the cloud and has one of the strongest and broadest data integration feature sets, it is harder to use, more expensive, and not as DataOps-native. However, it offers strong enterprise features and boasts one of the best private cloud architectures.
Pros
- A comprehensive data management platform: Informatica Intelligent Data Management Cloud is a data integration platform, not just ETL. It offers a wide range of features, including replication, data quality management, and master data management.
- Rich data integration functionality: Informatica is probably the most feature-rich platform for data integration, data quality, and master data management.
- Great connectors: Informatica has over 300 connectors, including proven connectors to high-performance on-premises and cloud data warehouses.
- Performance and scalability: Informatica is built for batch and low latency data pipelines at scale. It has supported serverless compute, pipeline partitioning, push-down optimization, and other features for years.
- Private cloud: Informatica is one of the few vendors that supports private cloud, with data plane deployments managed by a shared SaaS control plane.
Cons
- Harder to learn: Informatica Cloud is easier to use than PowerCenter, but it still has a significant learning curve compared to most SaaS ELT services. This may limit it to larger, specialized data integration teams.
- Doesn’t support DataOps as well: Informatica Cloud wasn’t designed for CI/CD and DataOps. Versioning and other tasks are harder than some of the more modern ELT tools.
- Higher vendor costs: Informatica is more expensive than most other ELT and ETL vendors, perhaps with the exception of Fivetran.
Pricing
Informatica’s consumption-based pricing is complicated and is not pay-as-you-go. Read the Informatica Cloud and Product Description Schedule for more information. Cloud pricing is mostly hourly, with some exceptions, such as row-based pricing for CDC-based replication.
Talend
Talend, now part of Qlik, has two main products: Talend Data Fabric and Stitch. Talend Data Fabric is a data integration platform that, like Informatica, is broader than ETL. It also includes data quality and data governance capabilities. If you’re searching for Talend ETL, this is the product you’re looking for. Stitch is an open-source ELT tool built on the Singer framework. At this point, Meltano is arguably a better Singer option to consider, and both Airbyte and Estuary support Singer/Stitch open source connectors.
Talend had an open-source option, Talend Open Studio, but it has been retired. You can choose to deploy Data Fabric on premises, or use Talend Cloud, which includes Data Fabric Cloud.
Talend is certainly a strong alternative to Matillion ETL. It has most of Matillion’s features, plus a better SaaS option, data quality and governance, and API-based integration. Like Matillion, Talend will be expensive and more complex to learn and use than an ELT vendor, and it lacks some of the latest features. Talend claims to offer around 1,000 connectors, but only 50+ of these are actually pre-built. The rest are reusable connections that, for example, package up API calls; they’re not really connectors.
Pros
- ETL platform: Data Fabric has rich transformation, data mapping, and data quality features that help with building data pipelines.
- Real-time and batch: Real-time support includes streaming CDC. While it’s mature technology, it is still real-time.
- Strong monitoring and analytics: Like Informatica, Talend has built up good visibility for operations.
- Cloud and on premises: Talend does have a rich SaaS option in addition to its original Data Fabric, unlike Matillion ETL, which is on premises only.
Cons
- Learning curve: Talend has an older UI that takes time to learn, just like some other ETL tools. Building transforms can take time.
- Limited connectors: Talend claims 1000+ connectors. But it lists 50 or so databases, file systems, applications, messaging, and other systems it supports. The rest are Talend Cloud Connectors, which you create as reusable objects.
- High costs: There is no pricing listed. Expect higher costs than most pay-as-you-go tools, as well as Stitch.
Pricing
Quotes are available upon request. You should expect it’s going to be higher cost than several of the pay-as-you-go ELT vendors, with the exception of Fivetran.
Stitch Data
Stitch is a SaaS version of the Singer open source project. Stitch was originally created within RJMetrics and became a separate company when Magento purchased RJMetrics in 2016. Soon after, in 2017, Stitch contributed to the Singer open source project. Stitch was then acquired by Talend in 2018, and Talend was later acquired by Qlik. Over 3,000 companies rely on Stitch, and even more use the Singer framework.
Stitch calls itself ETL; it’s not. Stitch is batch-based ELT with basic row-based data conversions to move raw data from each source to the target just like most other ELT technologies. It also only supports soft deletes like other ELT products.
Stitch is not a great option to consider as a Matillion alternative for several reasons. First, there isn’t much investment going into Singer or Stitch, which you can see by looking at what’s new in the release notes. Second, Stitch is ELT only. If you’re using Matillion for ETL, you should consider Estuary, Informatica, or perhaps Hevo.
Pros
- Open source: Singer is a solid open source framework, and you have the choice of Stitch, Meltano, even Airbyte and Estuary which can support Singer taps.
- Log retention: Stitch supports up to 60 day log retention for recovery, which is better than many other vendors except for Estuary.
- Support: Qlik offers support for Stitch.
- Integration with other Qlik products: if you’re a Qlik customer and using Stitch, there is a lot to be said for continuing to use Stitch.
Cons
- Lack of investment: You only need to look at the Stitch changelog to see there isn’t a lot of investment in the Singer framework or Stitch.
- Pricing: You will most likely spend $1,250+ a month on any reasonable deployment. If you need VPN/privatelink and a high level of support, you will need Premium, at a minimum of $2,500 a month. At that price, you can consider several other ELT vendors along with Estuary.
- Limited connectors: Stitch still only supports 140+ taps (connectors). That’s a good amount, but it’s fewer than several others offer. There are more (200+) Singer connectors, but their quality may vary, and not many new connectors are being added. If you need additional connectors, this may be an issue.
Pricing
Stitch starts at $100 per month ($1,000 per year) for the most basic plan, at 3 million rows per month. Advanced is $1,250 a month at 100 million rows, and Premium is $2,500 a month at 1 billion rows per month.
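The list prices above can be encoded as a lookup to see which Stitch tier a workload lands in. This is a minimal sketch. It assumes monthly billing, models no overage charges, and uses "Basic" as a placeholder name for the entry plan. The private-link rule comes from the Pricing bullet above.

```python
from typing import Optional

# (plan, USD per month, rows per month), ordered from cheapest to most expensive.
# "Basic" is a placeholder label for the entry plan described above.
PLANS = [
    ("Basic", 100, 3_000_000),
    ("Advanced", 1_250, 100_000_000),
    ("Premium", 2_500, 1_000_000_000),
]


def cheapest_plan(rows_per_month: int,
                  needs_private_link: bool = False) -> Optional[tuple[str, int]]:
    """Return the cheapest listed plan whose row allowance covers the workload."""
    for name, price, row_limit in PLANS:
        if needs_private_link and name != "Premium":
            continue  # VPN/privatelink requires Premium, per the pricing notes above
        if rows_per_month <= row_limit:
            return name, price
    return None  # beyond the largest listed allowance


print(cheapest_plan(2_000_000))                           # ('Basic', 100)
print(cheapest_plan(40_000_000))                          # ('Advanced', 1250)
print(cheapest_plan(2_000_000, needs_private_link=True))  # ('Premium', 2500)
```

The private-link case shows how a single requirement can move a small workload straight to the $2,500 tier.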
Conclusion
For the most part, if you are interested in a cloud option and Estuary supports the sources and destinations you need, you should consider evaluating Estuary as a Matillion alternative or replacement.
- Low latency: Estuary is the only SaaS vendor in this comparison with sub-second end-to-end latency.
- High scale: Estuary Flow is one of the most scalable products, especially with CDC. It is the only vendor capable of incremental snapshots and has demonstrated the ability to extract and load gigabytes per second in production.
- Most reliable: Estuary’s exactly-once transactional delivery and durable stream storage are part of what makes it the most reliable data pipeline vendor for real-time workloads.
- Private cloud: Beyond Matillion, Informatica, and Talend, Estuary is one of the few modern ETL/ELT vendors that supports private cloud deployments.
- Lowest cost: For data at any volume, Estuary is the clear low-cost winner.
- Great support: Estuary customers frequently comment on great support as one of the reasons they use Estuary.
Ultimately, the best approach to choosing an alternative to Matillion is to identify your current and future needs. These include your sources and destinations, the key data integration features you need, and your performance, scalability, reliability, and security requirements. Then use this information to choose a good short-term and long-term solution.
Getting Started with Estuary
- Getting started with Estuary is simple. Sign up for a free account.
- Make sure you read through the documentation, especially the get started section:
- I highly recommend you also join the Slack community. It’s the easiest way to get support while you’re getting started.
- If you want an introduction and walk-through of Estuary, you can watch the Estuary 101 Webinar.
- Questions? Feel free to contact us any time!