The all-round profile of the modern data engineer
Essential skills and team structures for effective data engineering
Since the field of big data emerged, many elements of the modern data stack have become the data engineer's responsibility. What are these elements, and how should you build your data team?
A brief history
The term ‘data engineer’ was coined by big tech companies in the early 2010s, when data became ‘big data’. It was used for software engineers who specialised in building data pipelines. A few years earlier, when Hadoop was first released, the term ‘Big Data Engineer’ had already been introduced in the industry. Around the same time, Amazon became the first of today's major cloud providers to release its cloud service, Amazon Web Services (AWS). Over time, most data platforms have moved to the cloud, running most frequently on AWS, Azure or Google Cloud Platform (GCP).
The data engineer has become the all-round engineer for building data platforms in the cloud. At first, the term was mostly used for engineers building data pipelines; over time, it came to cover building the platforms as well. As the world of big data matured and moved to the cloud, knowledge of CI/CD and cloud environments was added to the expected skill set.
The skills of a modern data engineer
What skills are expected from a modern data engineer?
The following requirements are common in data engineering job listings:
- Programming: Often Python, sometimes Scala or other programming languages. Often object-oriented programming (OOP), including writing tests, optimising code performance, version control, etc.
- CI/CD: Configuring your pipelines, setting up build agents, storing artifacts and/or containers, running tests, deploying applications and infrastructure.
- Cloud: Knowledge of multiple cloud environments, user and identity management, infrastructure as code, cloud functions and other compute services, orchestration, queues, databases.
- Databases and data warehousing: Querying SQL and NoSQL databases, building data layers in data warehouses (for example with dbt), data modelling and database performance.
- Infrastructure: Running and deploying infrastructure as code, networking, deployment and maintenance of Kubernetes.
- Architecture: Designing and evaluating architectures, sometimes in collaboration with solution architects.
- Data science and analysis: Knowledge of machine learning lifecycle management and dashboarding.
This list is quite extensive. In practice, only senior engineers have a comprehensive understanding of all, or even most, of these elements.
A new default
If we draw a parallel with software engineering, tests there may be written by a dedicated tester, and most developers are dedicated to one part of the stack: ‘frontend’, ‘backend’ or ‘mobile’, for instance. In the world of data engineering, ‘full stack’ is often the default. However, we do see more specialised roles emerging in the field of data as well, such as DevOps engineers or platform engineers. This is a positive development, which we expect to see more often in the future.
T-shaped profiles
It is difficult to form a team of professionals who all have the all-round profile described above. Instead, we recommend putting together a team of T-shaped profiles: people who have complementary in-depth knowledge of specific sub-areas but still have a good high-level understanding of the entire stack.
When doing so, keep the following considerations in mind:
Team size
A large team has more room for specialised roles that cover part of the data engineering stack, such as a DevOps engineer or a Python programmer. Smaller teams have to rely on all-round profiles, as fewer people are responsible for larger parts of the stack.
Complexity
A complicated tech stack, business and/or IT environment will create extra demand for specialised roles. For instance, complicated business logic might raise the need to bring in an analytics engineer to design and develop the data layers in a data warehouse.
Expectations
This still leaves the question of why the role of the modern data engineer has come to carry so many expectations. This is a difficult question to answer, though we expect it has to do with the following two factors:
- It has become increasingly easy to set up resources within cloud environments using Infrastructure as Code. For example, with a couple of lines of code you can set up a platform for which an entire network is deployed for you. Without this ever being stated explicitly, you are deploying, and becoming the maintainer of, a network for the data pipelines you want to run.
- The field of data is still relatively new compared to IT and software engineering. As the field matures and (hiring) managers become more knowledgeable about specific roles, team structures and responsibilities may change over time.
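To make the first factor more concrete, here is a minimal sketch in plain Python rather than in a real Infrastructure as Code tool. The `DataPlatform` class and all resource names are hypothetical. They only mimic how a high-level platform declaration can expand into network resources that you never wrote down yourself, but still end up maintaining.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    """A single piece of infrastructure that someone has to maintain."""
    kind: str
    name: str


@dataclass
class DataPlatform:
    """Hypothetical high-level construct, loosely modelled on IaC modules."""
    name: str
    resources: list[Resource] = field(default_factory=list)

    def __post_init__(self) -> None:
        # What the engineer asked for explicitly.
        self.resources.append(Resource("compute_cluster", f"{self.name}-cluster"))
        # What comes along implicitly: the networking the cluster needs to run.
        for kind in ("virtual_network", "subnet", "firewall_rule"):
            self.resources.append(Resource(kind, f"{self.name}-{kind}"))


# The "couple of lines" the engineer actually writes.
platform = DataPlatform(name="pipelines")

# Everything the engineer is now responsible for.
for resource in platform.resources:
    print(f"{resource.kind}: {resource.name}")
```

One explicit declaration results in four resources, three of which belong to the network. Real tools differ in the details, but the pattern of implicit ownership is similar.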
Wrap up
It’s important to be aware of the different subdomains that make up data engineering. Every data engineer has a different profile, depending on their interests and experience. Before hiring, be sure to know which parts of the technology stack are most relevant to your team and what knowledge is currently lacking.
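One simple way to find out what knowledge is lacking is a skills matrix. The sketch below is an illustration only: the team members, the 0 to 3 scoring scale and the rule that an area is covered once one person reaches expert level are all assumptions, not a prescribed method.

```python
# Hypothetical self-assessed skill levels per team member (0 = none, 3 = expert).
team = {
    "engineer_a": {"programming": 3, "ci_cd": 2, "cloud": 1, "databases": 2, "infrastructure": 0},
    "engineer_b": {"programming": 2, "ci_cd": 1, "cloud": 3, "databases": 1, "infrastructure": 2},
    "engineer_c": {"programming": 2, "ci_cd": 1, "cloud": 1, "databases": 3, "infrastructure": 1},
}

# Assumption: an area counts as covered when at least one person is an expert.
EXPERT_LEVEL = 3


def find_gaps(team: dict[str, dict[str, int]], required: list[str]) -> list[str]:
    """Return the required areas in which nobody reaches expert level."""
    gaps = []
    for area in required:
        # Highest level anyone in the team has for this area (0 if unlisted).
        best = max((skills.get(area, 0) for skills in team.values()), default=0)
        if best < EXPERT_LEVEL:
            gaps.append(area)
    return gaps


required_areas = ["programming", "ci_cd", "cloud", "databases", "infrastructure"]
print(find_gaps(team, required_areas))  # ['ci_cd', 'infrastructure']
```

In this made-up team, each engineer is T-shaped, with one deep specialism and a working knowledge of the rest. The gaps in CI/CD and infrastructure show where a new hire or extra training would add the most.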
Are you interested in advice on setting up your data team? Or would you like expertise from a data engineer within your organisation? We will be happy to help you!
We’re also looking for data engineers who would like to join our team! View the job opening here.
This is an article by Oskar van den Berg, Data Engineering Consultant at Digital Power
Oskar started developing his own websites at the age of 10. During his career, he increasingly focused on Data Engineering. Through Digital Power he works for major clients such as De Nederlandsche Bank and ASML.
Data Engineer, oskar.vandenberg@digital-power.com