
WHAT IS A DATA SCIENTIST?

Data and scientist are two words that have been around a long time with crystal clear definitions, yet together they do nothing but cause controversy. This blog aims to dig into the profession’s purpose and reputation, with an eye on what it means for your career. It is key reading for understanding our blog, as it frames our articles, gives them context and colours our opinions throughout. That said, it is far from a definitive or exhaustive account!

WHY THE CONFUSION?

Data science is a relatively new job title and no one argues over what areas of skill it encompasses. A data scientist will not get hired or be effective without proficiency in:

Math 

Statistics

Coding

Machine learning / AI

Data cleaning / preparation

UC Berkeley has a fantastic and detailed diagram of what it considers the role to mean, in which all of the above are involved (link).

For comparison, the popular Google-backed MOOC provider Udacity offers this chart (link), framing the position against other roles that involve overlapping skills.

JOB DESCRIPTIONS

But simply looking at job postings makes you realise how different the roles are. Here are snippets from a search on LinkedIn Jobs (March 2020). These are simply the first results on my page; I have not filtered them. As these are local examples, they are also more relevant to this blog and its readers!

iFindTech, Oslo

“The position is within Advanced Analytics and involves Machine Learning. A big part of this role will involve the building of the data architecture, aws, and api’s.

It will also involve a lot of data extraction and make data available. 

You will be managing the domain, and will be heading the analytics and data science, Building up the area, including architecture.”

Varner, Oslo

“Solve complex data challenges. Develop strong business cases together with domain experts and business managers. Test your hypotheses and assumptions. Communicate and coordinate well with business operational experts throughout the process from problem identification to production roll-out. Create AI models that scales when it is in production.

Your model generates business value, such as achieving certain revenue KPIs.”

Cognizant, Oslo

“responsible for delivering analytics ranging from advanced statistical (predictive) modeling, Machine learning algorithms, Operations Research, and solutions of AI nature. You will be working in a variety of industries and using a diverse set of tools to bring insights out of complex data. Accessing a range of data stored in disparate systems, integrating data and providing data mining to answer specific business questions as well as identifying unknown trends and relationships in data.” 

Microsoft, Oslo

“You will work with massive amounts of data collected from multiple search services running on more than 100.000 servers. You will be responsible for answering questions from senior leaders and stakeholders in high profile projects to help them make data driven decisions. Need: Experience with big data tools like Apache Hadoop, Apache Spark, AWS or Azure Data Lake. Experience building and optimizing ‘big data’ data pipelines, architectures and data sets.”

Looking at these posts, there are the obvious commonalities mentioned earlier. Company size clearly makes a difference to the size of the team you work in and how independent your contributions are. So how do these adverts match up to the diagram above?

Interestingly, both iFindTech and Varner are looking for a candidate to create and productionize models from scratch. This implies a focus on software engineering skills, yet iFindTech, as a smaller company (≈50 employees), focuses on more typical analytics, whilst Varner is looking to get started with AI, which could simply mean machine learning.

Cognizant and Microsoft are massive, mature organisations, so the volume and complexity of their data is likely to be far in excess of that at the other companies. Cognizant looks to be a more independent, research-focussed role, whereas the Microsoft position stresses working at ‘scale’ and suggests more of a team-based role.

HIRING GOALS

Here are some further thoughts about what a company might have in mind when creating a job advert for a data scientist. A data science education requires a lot of technical and theoretical knowledge, but the complexity of the jobs means that industry experience can come to define a career path. A company could be:

Looking for an AI wizard-

The promise and hype of AI have reached fever pitch in recent years with world-changing technologies such as self-driving cars. Almost every business is looking to see how this could work for them. A skilled data scientist not only has the skills to implement the algorithms and data structures of AI or ML but can also apply them in the right place in the organisation.

Looking for a data analyst-

Parsing data is by no means a purely modern role, as at its core is a skill set in maths and statistics. Can you avoid biases such as recency, selection and inclusion when making sense of your data? The additional demands of modern data sets and analytical tools mean that a software engineering skillset can help you reach conclusions faster, and ML tools can lead to conclusions that would not be reachable otherwise.

Looking to build a data science development pipeline-

This role is aimed at candidates who were previously software engineers, at a company that is just starting its transition into becoming ‘data driven’. Although there are many pre-built systems and tools available thanks to the open source community, there is always an element of customization or specialization required.

Looking for research-

Many data scientists are active, publishing members of the scientific community. Their ambitions lie in working at the frontiers of what we know about certain problems. A company may hire this type of data scientist in multiple ways, but two come to mind. Firstly, they could be hired in a general sense and given absolute freedom to ‘discover’ something new in the business. Secondly, they could be hired for a specific problem where the solution requires collaboration with the scientific community at large.

DAILY INSIGHT

So far we have covered data science in a very general way, so in this section I aim to offer a little more insight into what the role looks like day to day.

From Jason Goodman (3 years DS experience) – 

  • “Hadoop, Spark, Yarn, Julia, Kafka, Airflow, Scalding, Redshift, Hive, TensorFlow, Kubernetes… there are a seemingly unending number of data science coding languages, frameworks, and tools. Unless you’re going for a really specialized role, they’ll expect you can learn their stack on the job.
  • You should go deep on the basic tools you use daily. You’ll never regret learning the boring parts of whatever SQL dialect your company uses.
  • A lot of very technical data scientists’ careers are implicitly limited because they can’t write or speak clearly.
  • With Kaggle, the hardest part has already been done for you: collecting, cleaning, and defining the problem to be solved with that data.
  • Go to events: hackathons, conferences, meetups. Doing so will give you a better understanding of the realities of the field” (article link)

From Crowdflower Research – (report 1 and report 2)

MISCONCEPTIONS

Despite data science being touted as the ultimate job (link), the role isn’t quite met with positivity all round. From the job excerpts above, one can deduce that the title implies a wide-ranging skillset. Is this generality healthy, or should companies define roles in more specific areas?

For local relevance and an honest opinion, I’ll include this quote from Håkon Hapnes Strand (Data Scientist at Webstep, Stavanger), who is popular on Quora (profile).

“There is relatively little algorithmic finesse in the daily work of a data scientist. We reuse library implementations and embed them in standardized pipelines. There is often more algorithmic complexity in backend software engineering than in data science, and more mathematical complexity in the work of a statistician or actuary.

We’re not really scientists. We’re glorified business developers with powerful tools. But that’s okay. There is actually plenty of room for creativity in solving business problems with data, and those powerful tools allow for high-impact solutions. However, if you enter the career expecting to develop low-level machine learning algorithms, you may be disappointed.” (full answer).

The thread has many answers from people with all types of math or software backgrounds and makes for interesting reading. Of course, these are personal comments on a social site, but they present a challenge to the industry.

Another chart from the aforementioned Crowdflower report (link) highlights the overlap between the most common data science activities and the least enjoyable ones!

CLOSING THOUGHTS

As the popular MOOC provider Udacity states, “data scientist is often used as a blanket title to describe jobs that are drastically different.” The modern advent of big data and machine learning/AI tools has created a demand for many new combinations of skill sets. As the field matures and businesses become more and more data driven, I’m sure we will see a more defined picture emerge of what it means to be a data scientist.

My guess is that the scientist part of the title will begin to take on more significance and prestige, so it will be harder to claim you are doing science rather than just doing data. A lot of roles may become more defined and more segmented, and many current data scientists will continue their careers as data engineers, analysts or big data specialists.
