Data scientist vs machine learning engineer

31

13

What are the differences, if any, between a "data scientist" and a "machine learning engineer"?

Over the past year or so "machine learning engineer" has started to show up a lot in job postings. This is particularly noticeable in San Francisco, which is arguably where the term "data scientist" originated. At one point "data scientist" overtook "statistician", and I'm wondering if the same is now slowly beginning to happen to "data scientist".

Career advice is listed as off-topic on this site, but I view my question as highly relevant since I'm asking about definitions; I'm not asking about recommendations given my own career trajectory or personal circumstances like other off-topic questions have.

This question is on-topic because it might someday have significant implications for many users of this site. In fact, this stack-exchange site might not exist if the "statistician" vs "data scientist" evolution had not occurred. In that sense, this is a rather pertinent, potentially existential question.

Ryan Zotti

Posted 2018-02-20T06:15:04.687

Reputation: 2 047

Data scientist sounds like a designation with little clarity on what the actual work will be, while machine learning engineer is more specific. In the first case, your company will give you a target and you need to figure out what approach (machine learning, image processing, neural network, fuzzy logic, etc.) you would use. In the second case, your company has already narrowed down what approach has to be used.

gurvinder372 2018-02-20T06:31:40.477

Related: data science vs operations research. Also, a scientist is something different from an engineer. Unfortunately, industry doesn't seem to care about this.

Discrete lizard 2018-02-21T09:56:41.027

Answers

21

Good question. Actually, there is a lot of confusion on this subject, mainly because both are quite new jobs. But if we focus on the semantics, the real meaning of the jobs becomes clear.

First, it is better to compare apples with apples by talking about a single subject: data. Machine learning and its offspring (deep learning, etc.) are just one sub-subject of the data world, together with statistical theory, data acquisition (DAQ), processing (which need not be driven by machine learning), interpretation of the results, etc.

So, for my explanation, I will broaden the Machine Learning Engineer role to that of the Data Engineer.

Science is about experiments, trial and error, theory building, and phenomenological understanding. Engineering is about working with what science already knows, perfecting it, and carrying it into the "real world".

Think about a proxy: what is the difference between a nuclear scientist and a nuclear engineer?

The nuclear scientist is the one who knows the science behind the atom and the interactions between atoms: the one who wrote the recipe that allows us to get energy from atoms.

The nuclear engineer is the person charged with taking the scientist's recipe and carrying it into the real world. His knowledge of atomic physics is more limited, but he also knows about materials, buildings, economics, and whatever else is useful for building a proper nuclear plant.

Coming back to the data world, here is another example: the person who developed Convolutional Neural Networks (Yann LeCun) is a Data Scientist; the person who deploys the model to recognize faces in pictures is a Machine Learning Engineer. The person responsible for the whole process, from data acquisition to storing the .JPG image, is a Data Engineer.

So, basically, 90% of Data Scientists today are actually Data Engineers or Machine Learning Engineers, and 90% of the positions opened as Data Scientist actually need engineers. An easy check: in the interview, you will be asked how many ML models you have deployed in production, not how many papers on new methods you have published.

Instead, when you see job postings for "Machine Learning Engineer", it means that the recruiters are well aware of the difference, and they really need someone able to put models into production.

Vincenzo Lavorini

Posted 2018-02-20T06:15:04.687

Reputation: 506

I'd never thought of the nuclear scientist vs. engineer analogy; I think this is a thorough answer. It matches my experience: when I'm doing analysis, it's like wearing the white lab coat (Jupyter and pretty graphs). When I'm "getting my hands dirty" with engineering production work (ETL and web-app containers), I'm constantly finding weird edge cases, bugs, and bad code smells.

Tony 2018-02-20T14:52:28.433

6

It may vary from company to company, but Data Scientist as a designation has been around for some time now and is usually meant for extracting knowledge and insights from data.

I have seen data scientists doing things like:

  • Writing image processing and image recognition algorithms,
  • Designing and implementing decision trees for a business use case,
  • Or simply designing and implementing reports, or writing ETLs for data transformations.

Data science, however, is a super-domain of machine learning:

It employs techniques and theories drawn from many fields within the broad areas of mathematics, statistics, information science, and computer science, in particular from the subdomains of machine learning, classification, cluster analysis, uncertainty quantification, computational science, data mining, databases, and visualization.

Machine learning engineer seems to be a designation where your employer has already narrowed down the

  • Approach,
  • Tools,
  • and a rough model (of what to deliver)

to extract knowledge or insights from data using machine learning, and your work will be to design and implement machine learning algorithms to deliver it.

gurvinder372

Posted 2018-02-20T06:15:04.687

Reputation: 161

5

[Completely a personal opinion]

When the term 'Data Scientist' overtook 'Statistician', it was more about sounding cool than about any major difference. The same goes for the term 'Deep Learning': it is just neural networks (which are one kind of machine learning algorithm) with a couple more layers. No one can say exactly when a particular neural net should be called DL rather than ML, because the definition itself is fuzzy. So is the term 'Data Scientist'.

However, as companies adopt a DevOps mindset toward data science, the term ML Engineer has evolved.

What is the DevOps mindset to data science?

This is where you build the model, deploy it, and are also expected to maintain it in production. This helps avoid a lot of friction in software teams.

[PS: DevOps is a way of doing software, more like a philosophy. So using it as a designation, again, confuses me.]

So, ML engineers are supposed to know the nuances of systems engineering, ML, and stats (obviously).

A vague generalization would be Data Engineer + Data Scientist = ML Engineer.

Having said that, the designations in this space are becoming vague day by day, and the term 'Statistician' is becoming more and more relevant (the irony!).

Dawny33

Posted 2018-02-20T06:15:04.687

Reputation: 4 688

Machine Learning is much more than just neural nets (just as an example, consider all kinds of tree-based classifiers), so I don't see how "Deep Learning is just Machine Learning with a couple of more layers".

Stephan Kolassa 2018-02-20T12:38:51.973

@StephanKolassa Yeah. Agree. Shouldn't have generalized it too much :) Thanks for pointing it out.

Dawny33 2018-02-20T13:49:54.397

(+1) but I don't think "statistician" becoming more relevant is an irony, just... an expected transition? Where are the "operational researchers" these days? ;)

usεr11852 2018-02-20T22:28:01.023

3

The terms are nebulous because they are new

Being in the middle of a job search in the 'data science' field, I think that there are two things going on here. First, the jobs are new, and there are no set definitions of the various terms, so there is no commonly agreed-upon matching of terms with job descriptions. Compare this to 'web developer' or 'back-end developer.' These are two similar jobs that have reasonably well agreed-upon and distinct descriptions.

Second, a lot of the people doing the job posting and initial interviews don't know very well what they are hiring for. This is particularly true of small and medium-sized companies that hire recruiters to find applicants for them. It is these intermediaries who post the job descriptions on CareerBuilder or whatever forum. This isn't to say that they don't know their stuff; many of them are quite knowledgeable about the companies they represent and the requirements of the workplace. But without well-defined terms to describe different specific jobs, nebulous job titles are often the result.

There are three general divisions of the field

In my experience, there are three general divisions of the 'job space' of data science.

The first is the development of the mathematical and computational techniques that make data science possible. This covers things like statistical research into new machine learning methods, the implementation of these methods, and the building of computational infrastructure to employ these methods in the real world. This is the division farthest removed from the customer, and the smallest division. Much of this work is done either by academics or by researchers at the big companies (Google, Facebook, etc.). This is for things like developing Google's TensorFlow, IBM's SPSS neural nets, or whatever the next big graph database is going to be.

The second division is using the underlying tools to create application-specific packages to perform whatever data analysis needs to be done. People are hired to use Python or R or whatever to build analysis capability on some set of data. A lot of this work, in my experience, involves doing the 'data laundry,' turning raw data in whatever form into something usable. Another big chunk of this work is databasing: figuring out how to store the data so that it can be accessed on whatever timeline you need it. This job isn't so much building tools as using existing database, statistics, and graphical analysis libraries to produce results.

The third division is producing analysis from the newly organized and accessible data. This is the most customer-facing side, depending on your organization. You have to produce analysis that business leaders can use to make decisions. This would be the least technical of the three divisions; many jobs are hybrids of the second and third divisions at this point, since data science is in its infancy. But in the future, I strongly suspect that there will be a cleaner division between these two jobs, with people in the second job needing a technical, computer science or statistics based education, and the third job needing only a general education.

In general, all three could describe themselves as 'data scientist', but only the first two could reasonably describe themselves as 'machine learning engineer.'

Conclusion

For the time being, you will have to find out for yourself what each job entails. My current job hired me on as an 'analyst,' to do some machine learning stuff. But as we got to work, it became apparent that the company's databasing was inadequate, and now probably 90% of my time is spent working on the databases. My machine learning exposure is now just quickly running stuff through whatever scikit-learn package seems most appropriate, and shooting CSV files to the third-division analysts to make PowerPoint presentations for the customer.

The field is in flux. A lot of organizations are trying to add data science decision making to their processes, but without knowing clearly what that means. It's not their fault; it's pretty hard to predict the future, and the ramifications of a new technology are never very clear. Until the field is more established, many jobs themselves will be as nebulous as the terms used to describe them.

kingledion

Posted 2018-02-20T06:15:04.687

Reputation: 223

1

Machine Learning Engineers and engineering-focused Data Scientists are the same, but not all Data Scientists are engineering-focused. About 5 years ago almost all Data Scientists were engineering-focused, e.g., they had to write production code. Now, however, there are many Data Scientist roles that are, for the most part: playing in Jupyter notebooks, understanding data, making pretty graphs, explaining to clients, managers, analysts... They don't do any engineering. And I believe the term Machine Learning Engineer came about to underline that this is an engineering position.

Akavall

Posted 2018-02-20T06:15:04.687

Reputation: 236

0

I don't disagree with any of the answers given. However, I do think that there is a role of Data Scientist that is being glossed over in virtually all of the answers here. Most of these answers say something to the effect of, "Well, an engineer just writes and deploys the model . . . ". Hold on a sec - there's A LOT of work in those two steps!

My core definition of a Data Scientist is someone who applies the scientific method to working with data. So I'm constantly thinking of hypotheses, designing tests, collecting my data and executing those tests, checking my cross-validation results, trying new approaches, transforming my data, etc., etc. That's essentially what goes into "just writes and deploys the model" in a professional setting.

So, for your answer, I think "the devil is in the details" because you can't just gloss over some of these steps/terms. Also, if you are job hunting, you should be careful because "data engineer" and "data scientist" can have woefully different pay scales - you do not want to be a data scientist on a data engineer salary!

I always put myself out there as a data scientist: I tell companies that I work on predictive models (not just analytical ones) and that I'm not an Excel jockey; I write in programming languages (R, Python, etc.). If you can find a position that lets you do both of those, then you're on your way to being a data scientist.

Unknown Coder

Posted 2018-02-20T06:15:04.687

Reputation: 275

0

TL;DR: It depends on who is asking.

The answer to this question depends largely on the expectations, knowledge, and experience of whoever is asking. An analogous question with an equally fuzzy answer is:

What is the difference between a software developer, a software engineer, and a computer scientist?

To some people, particularly people who study or teach computer science and software engineering, there is a large and defined difference between these fields. But to the average HR worker, technical recruiter, or manager, these are all just "Computer People".

I love this quote by Vincent Granville, emphasis mine:

Earlier in my career (circa 1990) I worked on image remote sensing technology, among other things to identify patterns (or shapes or features, for instance lakes) in satellite images and to perform image segmentation: at that time my research was labeled as computational statistics, but the people doing the exact same thing in the computer science department next door in my home university, called their research artificial intelligence. Today, it would be called data science or artificial intelligence, the sub-domains being signal processing, computer vision or IoT.

lfalin

Posted 2018-02-20T06:15:04.687

Reputation: 101