How new pipeline tools are changing data engineering in the 2020s
For the past several years, the data workforce has been chronically short on one type of professional: the data engineer. But what will happen as new vendors build services that automate much of their daily work?
Data engineers inhabit a booming industry where job descriptions shift just as quickly as new data tools hit the market. Far from being a cause for concern, this is actually a good thing.
We’ll never stop needing highly technical data people. As we devise ways to automate the tedious aspects of data engineering, their responsibilities will shift, and ultimately, opportunities will become more complex and diverse.
To see where we’ve been and where we may be headed, let’s dive into data engineering’s past, present, and future.
A brief history of data engineering
Let’s start with a simple question: what exactly is a data engineer, and what do they do?
In rough terms, data engineers make sure that the data an organization needs is available for analysis and can be used to reliably meet business objectives.
It’s a technical role that exploded in popularity during the latter half of the 2010s. To understand what the job has really meant during the ensuing years, and to understand its future, we need a quick historical primer.
The rise of data engineering more or less parallels the rise of “big data” (in other words, data as we know it today) and the resulting boom in tooling.
Prior to 2010, data engineering was more or less an unknown concept, and data itself lay mostly under the purview of a given organization’s IT department. Things began to change rapidly throughout the 2010s. Data systems like Hadoop, and later Kafka, Airflow, and others, originated at tech giants and grew legs. OLAP warehouses like Redshift and Snowflake powered analytics at a scale previously thought impossible, and business demands for powerful data-driven insights became the norm. However, getting ahold of accurate, complete, clean data for analysis was no longer such a simple matter.
During this time, data engineering went from a virtually unknown job title to one for which openings far outnumbered qualified applicants. This is because the systems mentioned above are hugely powerful but also hugely high-maintenance. For instance, Kafka is complex and unopinionated in how it is used, and managing Airflow is often an entire job unto itself.
What did this mean for the daily work of the data engineer? Most of it was (and still is):
- Using the available frameworks combined with custom code to integrate data across an organization; in other words, building custom ETL solutions.
- Maintaining sensitive data systems and pipelines, often fixing broken systems under extremely urgent timelines (after all, the entire business’s data supply was at stake).
Despite the recent rise of ELT SaaS and managed services — more on that below — the explosion in data engineering jobs has more or less continued to the present day.
In their 2020 emerging jobs report, LinkedIn found that between 2015 and 2020, hiring rates for data engineers increased by 35%, though the job is notoriously hard to hire for. In 2020, DICE reported data engineer job growth at about 50%, an extraordinary rate. At the time of this writing, about 195,000 data engineer positions in the USA were posted on LinkedIn.
So, for emerging data professionals in the last several years, pursuing data engineering may have seemed like a foolproof path to job security. But is that really the case? If you follow this industry, you already know the answer isn’t so simple.
The confusing landscape of data jobs
You might view the demand for data engineers as a bubble; we can’t expect it to grow forever, but it’s unlikely to disappear entirely. After all, data engineering was a virtually unknown concept twelve years ago.
In 2012, a Harvard Business Review article arguably jumped the gun by declaring data science the sexiest job of the 21st century. Ten years later, it’s hard to make a case for “data scientist” as the sexiest job in data. Its buzz was overtaken by “data engineer,” which is now starting to lose attention to the “analytics engineer” role, which we’ll discuss shortly.
Changes in the job makeup are inevitably faster in a field as dynamic as data, but the data professions in general will continue to see growth. The key to success, regardless of your current title, is to be adaptable in the challenges you’re poised to solve, of which there will be many.
ELT services change the data engineering role
I’ve stated that the data engineering hiring boom was mostly driven by demand for custom-made, hard-to-manage ETL pipelines. Let’s back that up with some, um, data.
- A joint study from data.world and DataKitchen found that 50% of surveyed data engineers spent too much time maintaining pipelines and/or manual processes. Constantly playing catch-up and fast-paced stakeholder requests were also major stressors, which suggests that most issues are pipeline-related.
- A Gartner study found that data professionals spent 56% of their time on operational execution and just 22% on innovation that delivers value.
These statistics definitely have a negative tone, and it’s worth noting that hand-building ETL pipelines is not inherently undesirable work. But for a growing number of companies, it has proven to be stressful, time-consuming, and not the only thing that a growing cohort of gifted engineers would like to be doing. Since as early as 2016, people have been writing that engineers shouldn’t write ETL, but many are still doing it.
At the very least, it’s become clear that a certain number of repetitive, time-sensitive tasks could be automated, lifting much of the burden from data engineers.
This is where modern ELT providers come in.
Various startups have grown and thrived by providing data pipelines in the form of an intuitive SaaS tool or managed service. ETL stands for “extract, transform, load”; ELT reorders the last two steps, so data is loaded into the destination first and transformed there. Generally speaking, ELT vendors offer a way to make the “E” and “L” easier, taking the burden of pipeline building off of engineers but leaving the “T” piece at the discretion of the company.
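To make that split concrete, here is a minimal sketch of the ELT pattern in Python. It is an illustration under stated assumptions, not any vendor's implementation: an in-memory SQLite database stands in for the warehouse, and the sample records, table names, and function names are all hypothetical. The `extract` and `load` steps are the part an ELT service would typically handle, while `transform` is the SQL the company still writes for itself.

```python
import sqlite3

# Hypothetical raw records, standing in for rows pulled from a source system.
# Every field is text, as it might arrive from an API or a CSV export.
RAW_ORDERS = [
    ("1001", "2022-03-01", "1999"),
    ("1002", "2022-03-01", "500"),
    ("1003", "2022-03-02", "4250"),
]


def extract():
    """E: read records from the source exactly as they are."""
    return list(RAW_ORDERS)


def load(conn, rows):
    """L: land the untouched records in a raw table in the 'warehouse'."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS raw_orders "
        "(order_id TEXT, order_date TEXT, amount_cents TEXT)"
    )
    conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", rows)
    conn.commit()


def transform(conn):
    """T: reshape the raw data with SQL, after it has been loaded."""
    conn.execute("DROP TABLE IF EXISTS daily_revenue")
    conn.execute(
        """
        CREATE TABLE daily_revenue AS
        SELECT order_date,
               SUM(CAST(amount_cents AS INTEGER)) AS revenue_cents
        FROM raw_orders
        GROUP BY order_date
        """
    )
    conn.commit()
    return conn.execute(
        "SELECT order_date, revenue_cents FROM daily_revenue ORDER BY order_date"
    ).fetchall()


def run_elt():
    """Run extract and load first, then transform inside the warehouse."""
    conn = sqlite3.connect(":memory:")  # assumption: SQLite as a stand-in warehouse
    load(conn, extract())
    result = transform(conn)
    conn.close()
    return result


if __name__ == "__main__":
    for order_date, revenue_cents in run_elt():
        print(order_date, revenue_cents)
```

Because the raw table is kept as loaded, the transformation can be rewritten and rerun later without going back to the source, which is one reason the “T” is commonly left in the company's hands.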
Some vendors, like Confluent, aren’t actually ELT, but meet a similar need by managing a single, more challenging integration tool; in their case, Kafka. Some, like Fivetran, strive for easy ELT with many connectors to external systems. Hevo’s real-time ETL combines the benefits of a vast connector library with the real-time benefits of Kafka.
And Estuary Flow allows you to build scalable, flexible pipelines that can ingest and operationalize data in real time. With both a web UI and a powerful CLI, team members across the technical spectrum can use Flow to meaningfully contribute to data integration.
The first and most obvious impact such automation will have on data engineering is that the exponential growth in job openings, specifically those that are ETL-centric, will taper off dramatically. In other words, the supply of data engineers will meet the demand. This is good for businesses’ bottom lines, company culture, and existing engineers’ quality of life.
The second is that data engineers whose jobs revolve around pipeline building and maintenance will see a shift in their role.
Neither of these is a bad thing. In fact, they open the door for new possibilities in a fast-growing field.
The future of data engineering
Remember this statistic from earlier?
Data professionals spent 56% of their time on operational execution and just 22% on innovation that delivers value.
It’s unrealistic to think that the operational work will just go away with the rise of ELT and other data integration solutions. There are always new pieces being added to the data stack, and innovation adds pitfalls that require humans to navigate. What we can hope for is that better tooling will flip those percentages.
Once the operational burden of infrastructure is eased, data professionals — both engineers and others — can focus more on building things that are new, unique, and valuable. The tool takes care of the stressful, time-dependent work of tending to pipelines. It keeps engineers from constantly needing to put out fires.
They can then focus their efforts on things their tools can’t do, which they wouldn’t have had time for in the past. This could be anything from refining performance in a specific area to collaborating with analysts to build cutting-edge data solutions.
For more insight, check out Matt Arderne’s article The future history of Data Engineering. In it, he says:
When the data is easy to centralise, combine and analyse, engineers won’t be needed to devise and contrive data combining solutions.
They can go and contrive and devise something else, that is complex, and that gives the company a competitive advantage.
It’s an appealing prospect: a dynamic in which the work of data engineers is likely more interesting, and has a more direct impact on the growth of the business.
It’s not about the job title, anyway
While some data engineers shift toward tackling more in-depth technical goals, others may shift to being more team- and product-oriented. In fact, enough professionals are already occupying the space between data infrastructure and the business applications of data that a different job title has come about: analytics engineer.
Analytics engineering is focused on providing data products to end users. This is in contrast to data engineering, which is typically more about the pipeline itself.
Of course, it can be hard to delineate what various data job titles actually mean, or to predict what job titles may pop up in the future. As a data professional, it may become harder to figure out exactly where you fit. But many would argue that this grey area (or, should I say, purple area) is a cause for celebration. It eliminates the dichotomy between those who care about the business outcomes of data and those who understand the infrastructure.
Even if “data engineer” job postings taper off, businesses won’t stop needing highly technical, infrastructure-focused data experts. And by adding a bit of automation and easing the burden on these people, we can empower them to specialize, branch out, and innovate. In this environment, perhaps data teams will be able to work better together, understand each other better, and ultimately, accomplish more.
Estuary was founded by data professionals on a mission to reduce operational burden, give engineers data integration superpowers, and unlock your data’s full potential.