The all-round profile of the modern data engineer

Essential skills and team structures for effective data engineering

Oskar van den Berg

Data Engineer

13 Jul 2023

Since the field of big data emerged, many elements of the modern data stack have become the data engineer's responsibility. What are these elements, and how should you build your data team?

A brief history

The term ‘data engineer’ was coined by big tech companies in the early 2010s, when data became ‘big data’. It was used for software engineers who specialised in building data pipelines. A few years earlier, when Hadoop was first released, the term ‘Big Data Engineer’ had already been introduced in the industry. Around the same time, Amazon was the first to launch its cloud service, Amazon Web Services (AWS). Over time, most data platforms have moved to the cloud, most frequently running on AWS, Azure or Google Cloud Platform (GCP).

The data engineer has become the all-round engineer for building data platforms in the cloud. At first, the term was mostly used for engineers building data pipelines; over time it came to cover building the platforms as well. As the world of big data matured and moved to the cloud, knowledge of CI/CD and cloud environments was added to the stack.

The skills of a modern data engineer

What skills are expected from a modern data engineer?

The following requirements are common in data engineering job listings:

Programming: Often Python, sometimes Scala or another language. Usually object-oriented programming (OOP), including writing tests, optimising code performance, version control, etc.
CI/CD: Configuring pipelines, setting up build agents, storing artifacts and/or containers, running tests, and deploying applications and infrastructure.
Cloud: Knowledge of multiple cloud environments, user and identity management, infrastructure as code, cloud functions and other compute services, orchestration, queues and databases.
Databases and data warehousing: Querying SQL and NoSQL databases, building data layers in data warehouses (for example with dbt), data modelling and database performance.
Infrastructure: Running and deploying infrastructure as code, networking, and deploying and maintaining Kubernetes.
Architecture: Designing and evaluating architectures, sometimes in collaboration with solution architects.
Data science and analysis: Knowledge of machine learning lifecycle management and dashboarding.

This list is quite extensive. In practice, only senior engineers have a comprehensive understanding of all, or even most, of these elements.
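To make the first item on the list a little more concrete, here is a minimal sketch of what "programming, including writing tests" can look like for a single pipeline step. The `Order` schema, the field names and the `parse_order` function are hypothetical and chosen only for illustration; the test functions use plain `assert` statements and a `test_` prefix, so a runner such as pytest could pick them up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    """Hypothetical cleaned record produced by one pipeline step."""
    order_id: str
    amount_cents: int
    country: str


def parse_order(raw: dict) -> Order:
    """Validate and normalise one raw record (illustrative schema)."""
    amount = round(float(raw["amount"]) * 100)
    if amount < 0:
        raise ValueError(f"negative amount for order {raw['order_id']}")
    return Order(
        order_id=str(raw["order_id"]).strip(),
        amount_cents=amount,
        # Fall back to a sentinel when the country is missing or empty.
        country=str(raw.get("country", "")).strip().upper() or "UNKNOWN",
    )


def test_parse_order_normalises_fields():
    order = parse_order({"order_id": " A1 ", "amount": "12.50", "country": "nl"})
    assert order == Order(order_id="A1", amount_cents=1250, country="NL")


def test_parse_order_rejects_negative_amounts():
    try:
        parse_order({"order_id": "A2", "amount": "-1"})
    except ValueError:
        pass
    else:
        raise AssertionError("expected ValueError for a negative amount")
```

Keeping the transformation a pure function of its input is what makes it this easy to test; the same tests can then run in a CI/CD pipeline before anything is deployed.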

A new default

If we draw a parallel with software engineering: there, tests might be written by a dedicated tester, and most developers focus on one part of the stack, such as ‘frontend’, ‘backend’ or ‘mobile’. In the world of data engineering, ‘full stack’ is often the default. However, we do see more specialised roles emerging in the field of data as well, such as DevOps engineers and platform engineers. This is a positive development, and one we expect to see more often in the future.

T-shaped profiles

It is difficult to form a team of professionals who all have the all-round profile described above. Instead, we recommend putting together a team of T-shaped profiles: people with complementary, in-depth knowledge of specific sub-areas who still have a good high-level understanding of the entire stack.

When doing so, keep the following considerations in mind:

Team size

A large team will have more room for specialised roles which are part of the data engineering stack, like a DevOps engineer or a Python programmer. Smaller teams will have to rely on all-round profiles, as fewer people will become responsible for larger parts of the stack.

Complexity

A complicated tech stack, business and/or IT environment will create extra demand for specialised roles. For instance, complicated business logic might make it necessary to bring in an analytics engineer to design and develop the data layers in a data warehouse.

Expectations

This still leaves the question of why the role of the modern data engineer has come to carry so many expectations. It is a difficult question to answer, though we expect it has to do with the following two factors:

It has become increasingly easy to set up resources within cloud environments using Infrastructure as Code. For example, with a couple of lines of code you can set up a platform for which an entire network is deployed for you. Without ever explicitly asking for it, you are deploying, and becoming the maintainer of, a network for the data pipelines you want to run.
The field of data is still relatively new compared to IT and software engineering. As the field matures and (hiring) managers become more knowledgeable about specific roles, team structures and responsibilities may change over time.
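The first factor can be illustrated with a toy model. The sketch below is not a real Infrastructure as Code tool and does not describe the behaviour of any specific cloud provider; `Resource`, `expand` and the list of implicit resource kinds are hypothetical. It only shows how a one-line declaration can expand into several resources that you did not write down but now have to maintain.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    kind: str
    name: str
    declared: bool  # True if the engineer wrote it down explicitly


def expand(platform_name: str) -> list[Resource]:
    """Expand a one-line platform declaration into everything that gets deployed.

    The implicit resources are illustrative assumptions, not the output of
    any real provider or tool.
    """
    implicit_kinds = ["virtual_network", "subnet", "security_group"]
    resources = [Resource("data_platform", platform_name, declared=True)]
    resources += [
        Resource(kind, f"{platform_name}-{kind}", declared=False)
        for kind in implicit_kinds
    ]
    return resources


# One declared resource ...
deployed = expand("analytics")

# ... and the resources that came along without being asked for.
to_maintain = [r.name for r in deployed if not r.declared]
print(f"declared: 1, deployed: {len(deployed)}")
print("implicitly yours to maintain:", to_maintain)
```

In this toy example, one declared resource results in four deployed ones, three of which are networking components that nobody explicitly requested.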

Wrap up

It’s important to be aware of the different subdomains which are part of data engineering. Every data engineer has a different profile, depending on their interests and experience. Before hiring, be sure to know which parts of the technology stack are most relevant to your team and what knowledge is currently lacking.
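One way to make that last check tangible is a simple skills matrix. The sketch below is an assumption-laden illustration: the team members, the 0 to 2 scoring scale and the thresholds for "deep" and "broad" are all hypothetical, and the areas are simply the items from the skills list above.

```python
# Areas taken from the skills list in this article.
AREAS = [
    "programming", "ci_cd", "cloud", "databases",
    "infrastructure", "architecture", "data_science",
]

# Hypothetical self-assessed levels:
# 0 = none, 1 = working knowledge, 2 = deep expertise.
team = {
    "engineer_a": {"programming": 2, "ci_cd": 1, "cloud": 1, "databases": 2,
                   "infrastructure": 1, "architecture": 1, "data_science": 1},
    "engineer_b": {"programming": 1, "ci_cd": 2, "cloud": 2, "databases": 1,
                   "infrastructure": 1, "architecture": 1, "data_science": 0},
}


def coverage_gaps(team, relevant_areas, deep=2):
    """Return the relevant areas in which nobody has deep expertise."""
    return [
        area for area in relevant_areas
        if not any(skills.get(area, 0) >= deep for skills in team.values())
    ]


def is_t_shaped(skills, areas, deep=2, broad=1):
    """Deep in at least one area, at least working knowledge of all areas."""
    has_depth = any(skills.get(area, 0) >= deep for area in areas)
    has_breadth = all(skills.get(area, 0) >= broad for area in areas)
    return has_depth and has_breadth


# Areas this (hypothetical) team considers most relevant right now.
relevant = ["programming", "cloud", "databases", "infrastructure"]

print(coverage_gaps(team, relevant))          # ['infrastructure']
print(is_t_shaped(team["engineer_a"], AREAS))  # True
print(is_t_shaped(team["engineer_b"], AREAS))  # False
```

Here the gap is infrastructure, which suggests what the next hire should specialise in. The scores are only as good as the assessment behind them, so treat the output as a conversation starter rather than a verdict.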

Are you interested in advice on setting up your data team? Or would you like expertise from a data engineer within your organisation? We will be happy to help you!

This is an article by Oskar van den Berg, Data Engineering Consultant at Digital Power

Oskar started developing his own websites at the age of 10. During his career, he increasingly focused on Data Engineering. Through Digital Power he works for major clients such as De Nederlandsche Bank and ASML.