Thursday, January 2, 2020

Data Science Job Market

A few weeks ago, while thinking about the state of the current Data Science job market and feeling a bit frustrated by inquiries from job seekers on LinkedIn, I hastily sent out a tweet with my thoughts.

As the weeks have gone by (and my employment situation has become a bit weird, in good ways), I've often thought about that tweet. Since the day I sent it, I've found it to be even more true than I originally thought, especially on the Shingy front.

I thought it would be good to put together some more in-depth thoughts on the different types of Data Science candidates I see on the market and how they relate to real roles in companies. As a result, this post can serve as a helpful guide to building a data science team at your organization. I've described the five types of candidates in more detail below.

Directly Out of School

The hallmark of these candidates is that they have recently completed grad school with little or no work experience. A few resumes will show a history of internships, or will try to pass off class projects as "jobs they worked." My general thoughts:
  • There are a lot of these candidates. A lot.
  • They harass Data Science managers regularly via social media, LinkedIn, and email.
  • Many of them are not yet talented, and will need a lot of work before they can tackle projects on their own.
  • Some of them have not been screened for what I call "employability" and are unlikely to survive the rigor, rules and norms of the workplace.
  • Be careful of buzzword slingers.
Advice: You can take on a few of these, and they can add value for your team over time. But you can and should screen heavily before choosing your candidate. The sheer number of these candidates on the market gives the employer the luxury of being picky, and because many of these candidates come untested and without serious references (professors are not real references), you need to use your best BS detection.


Couple of Years Experience

These candidates have been in the workforce for one to five years, generally with one or two employers. Their resumes will usually list real projects, though because they were junior-level employees, you don't know what their actual role involved. My thoughts on this group:
  • There are quite a few in this group as well.
  • The loudest ones are usually in the process of failing out of a first job.
  • A lot of them are well on their way to a great career, though.
  • Their skills may be largely defined by the experience of their first job, so there will still be a big training job ahead.
  • It is very important to determine *exactly* what their roles were on teams and projects; beware of Tableau jockeys and spreadsheet analysts padding their resumes.
Advice: These employees are generally a great investment, though you have to be very careful not to pick up another organization's castoffs. They can begin to take on larger projects on their own, or serve as junior mentors to the first category of employees. This group should be seen as your bridge to the future Data Science team: the group that will become your Senior and Principal data scientists within 2-5 years.


Shingy Clones

First, who's Shingy? This guy: David Shing, AOL's self-styled "Digital Prophet."

These candidates are full of hot air and very much hyped on Data Science as a concept, along with other hyped concepts (have fun getting them to talk about blockchain). The dark side, of course, is that they have no Data Science abilities and are just low-rent hype people. My thoughts on this group:
  • They will come with a lot of energy and enthusiasm, which can be hypnotizing, especially for executives.
  • These people are exactly why the interview process is critical.
  • If you hire one, they are completely destructive: they will always be hyping things and saying we need a new technology or need to do "x." However, they don't have the skills or knowledge to understand what they are suggesting or how to deliver it.
  • They have no clue what they are talking about.
Advice: You can't hire these people. They are going to be a high cost with zero deliverables. To avoid them, put some matrix algebra or simple calculus questions in your interview. Ask them to interpret a model coefficient. Ask them questions about coding in unsexy languages (SQL). On difficult questions, this group will break.

(As an aside, I've had a few of these people try to gaslight me and then end up yelling at me in an interview. It's not fun to be yelled at, but when this happens, I know I've dodged a bullet by calling someone out.)


Experienced Statisticians

These candidates are more advanced in their careers, and often will shun the term Data Scientist. They may be less striking at first, and certainly have less flash than a 25-year-old machine learning expert, but they add a ton of value to your organization. My thoughts on this group:
  • These candidates generally build models that often outperform data scientists' models while using simpler, more elegant methods.
  • They can be great mentors to young data scientists, if junior staff are willing to listen.
  • They often lack machine learning skills or experience with big data systems (e.g., Hadoop).
  • They will also lack some more modern coding and computing skills (e.g., containerization, cloud).
  • One successful tactic is to use this type of employee on a project team with a machine learning expert and a data engineer.  The data engineer will bring the technical coding skills, the machine learning expert will bring modern methods, and the more experienced employee will bring research design and rigor.
Advice: These candidates are some of the best deals on the market, mainly because they can mentor and "fix" a lot of the missing knowledge of young data scientists. Younger data scientists tend to have bad habits, or in some cases massive holes in their skill sets and intuition around research design, rigor, probability, and statistics. More simply put, these employees can often do better work using older methods, and they are great mentors.


Unicorns

These are the classic Data Scientists that many organizations are looking for.  Their traits:
  • 15+ years in building machine learning models.
  • 15+ years building econometric models.
  • Production-level developer who has built massive productionized ML systems.
  • Hadoop/Spark developer.
  • 100x ROI.
  • Virtually non-existent.
I'm being a bit hyperbolic, but these candidates essentially don't exist. Well, some do, but you may not want to pay the premium involved (it's high). If you can find one, by all means hire them. Otherwise, you can build an all-star team fairly well by focusing on the other building blocks.

Summary

A few months ago, a recruiter called me and asked if I had 15 years of experience in Hadoop. This is an absurd question given that Hadoop's first release was in 2006, but it speaks to an underlying truth: many organizations are looking for a Data Science candidate pool that simply does not exist. I hope the essential takeaway here helps you build a reasonable Data Science team from building blocks based on the talent actually available. A reasonable team might involve:
  • Directly out of School: 1-2 FTE
  • Couple of Years Experience: 2 FTE
  • Shingy Clones: 0 FTE
  • Experienced Statisticians: 1 FTE
  • Unicorns: 1 if you can find one, but not necessary
As an aside, the team model without "Unicorn" candidates will likely require substantial help from data engineers and some developers in order to get data into systems and models into production. This does create some inefficiencies, but it is often less expensive than finding a unicorn candidate.