Should you hire a data engineer instead of a data scientist?
In this post, we'll look at
- which aspects are required to develop a data-driven software product,
- how data scientists and data engineers fit these aspects,
- how to detect if your team needs a data engineer, and
- how to find a data engineer for your team, if you need one.
Three aspects for any data-driven product
Developing a data-driven software product is not only about analytics. In fact, there are three aspects required for a product to succeed: Consulting, Analytics, and Automation.
Let’s dive into each of them individually.
Aspect #1: Consulting
Consulting is about
- translating the business problem into analytical terms,
- collaborating on a business case, and
- validating outcomes with (internal) business clients.
It requires business acumen and a desire to understand the domain.
Aspect #2: Analytics
Analytics is about extracting insights from data to solve a given business problem. It requires in-depth knowledge of:
- preparing and cleaning the data and features,
- simple tools such as Excel and data exploration/visualization,
- more complex models to reach for when needed, such as machine learning, statistics, and simulations.
The focus here is to demonstrate via a proof of concept that the analytical solution adds value to the business.
Aspect #3: Automation
Automation is about optimising and preparing the product for long-term use. The proof of concept is turned into production-level code using software-engineering best practices.
Underestimating this step might result in:
- low confidence in the product from end-users, due to many unexpected bugs,
- an unexpected loss of production data,
- unexpected long-term service unavailability,
- inability to promptly add a feature/fix an issue.
Automation requires a software-engineering skill set tailored to data-driven products.
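To make "production-level code" a little more concrete, here is a minimal sketch of how a cleaning step from a proof of concept might look once it is hardened. Everything in it is hypothetical and only for illustration: the `Order` record, the `order_id` and `amount` field names, and the rule that negative or non-finite amounts are invalid are assumptions, not part of any particular product.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Order:
    """Hypothetical cleaned record used only for this illustration."""
    order_id: str
    amount: float


def parse_amount(raw: Optional[str]) -> Optional[float]:
    """Return the amount as a float, or None if the raw value is unusable."""
    if raw is None:
        return None
    text = raw.strip()
    if not text:
        return None
    try:
        value = float(text)
    except ValueError:
        return None
    # Assumption of this sketch: NaN, infinite and negative amounts are invalid.
    if not math.isfinite(value) or value < 0:
        return None
    return value


def clean_orders(rows: Iterable[dict]) -> List[Order]:
    """Keep only rows that have a non-empty id and a valid amount."""
    cleaned = []
    for row in rows:
        order_id = (row.get("order_id") or "").strip()
        amount = parse_amount(row.get("amount"))
        if order_id and amount is not None:
            cleaned.append(Order(order_id, amount))
    return cleaned
```

The point is less the logic itself than its shape: small typed functions, explicit handling of bad input, and behaviour that is easy to cover with unit tests before it runs unattended.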
Data Scientists vs. Data Engineers
Data Scientists usually come from a mathematics, statistics, operations-research, or machine-learning background. With their business acumen, they are strong in both analytics and consulting. They live and breathe data insights and applying models to solve a business problem.
Data Engineers, on the other hand, come from a software-engineering background. They live and breathe automation. They understand how to ship high-quality production-level code, including code readability, testability, architecture, DevOps, automated deployment, robust ETL, etc.
Data engineers can also speed up the delivery of analytical parts by providing technical support for data scientists. For example:
- create a discovery platform for data scientists (e.g. IPython notebooks automatically hooked to a cleaned data warehouse representing a single source of truth),
- provide infrastructure improvements for data scientists (e.g. a common Docker image with all data-science tooling installed),
- create shareable libraries, boilerplates, etc.
Skills mapping: Summary
Here's a summary of the expected expertise in the three discussed aspects, by role.

| Expected expertise     | Data Scientist | Data Engineer  |
|------------------------|----------------|----------------|
| Consulting             | :medal: Strong | Basic          |
| Analytics              | :medal: Strong | Medium         |
| Automation/Engineering | Basic          | :medal: Strong |

NB: As with any role, the more a data engineer knows about analytics and consulting, the better. And vice versa: the more a data scientist knows about automation and engineering, the better.
Do you need a Data Engineer?
This question should be simple to answer: ask your team which activities they spend most of their time on.
If the answer is mostly manual deployments, getting access to data, re-cleaning the data, code refactoring, application monitoring, DevOps, or fighting Spark/Hadoop/Kafka/YARN issues, then you probably need an additional Data Engineer.
If the answer is modeling, feature engineering, visualizations, or communicating with an internal customer, you probably do not need an additional Data Engineer.
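If you prefer numbers to gut feeling, the check above can be roughly sketched as a tally of hours per activity. The activity labels, the function names, and the 50% threshold are all assumptions made for this illustration; "mostly" is a judgment call, not a fixed rule.

```python
from typing import Dict

# Hypothetical labels, loosely following the two lists of activities above.
ENGINEERING_ACTIVITIES = {
    "manual deployments",
    "data access",
    "re-cleaning data",
    "code refactoring",
    "application monitoring",
    "devops",
    "cluster issues",
}
ANALYTICS_ACTIVITIES = {
    "modeling",
    "feature engineering",
    "visualizations",
    "customer communication",
}


def engineering_share(hours_by_activity: Dict[str, float]) -> float:
    """Fraction of recognized hours that went into engineering-type work."""
    engineering = sum(
        hours for activity, hours in hours_by_activity.items()
        if activity in ENGINEERING_ACTIVITIES
    )
    analytics = sum(
        hours for activity, hours in hours_by_activity.items()
        if activity in ANALYTICS_ACTIVITIES
    )
    total = engineering + analytics
    if total == 0:
        raise ValueError("no recognized activities with non-zero hours")
    return engineering / total


def probably_needs_data_engineer(
    hours_by_activity: Dict[str, float], threshold: float = 0.5
) -> bool:
    """True if engineering work exceeds the (assumed) threshold share."""
    return engineering_share(hours_by_activity) > threshold


# Example: one week of a small team, in hours (made-up numbers).
week = {"manual deployments": 10, "code refactoring": 8, "modeling": 6}
print(engineering_share(week))             # 0.75
print(probably_needs_data_engineer(week))  # True
```

Treat the result as a conversation starter with the team rather than a verdict.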
How to find Data Engineers
In the current job market, the demand for data engineers exceeds supply. In this context, there seem to be two viable options for adding a data engineer to the team:
Option #1: Become an attractive workplace
Become an attractive workplace so that data engineers come to you: start open-source initiatives and analytical blogs, strengthen your conference presence, and start organizing local meetups.
This option is the harder one, but it pays off in the long term.
Option #2: Turn generalists into specialists
Alternatively, hire software engineers who are generalists. A strong generalist (e.g. a Python/Scala developer) would pick up the required stack quickly and would be a great engineering complement to your existing team of data scientists.