Thinking about candidates in the #DataScience job market:
40% just out of school, no real experience/skills.
40% 1-2 years out of school, very limited experience/skills.
15% David Shing hype clones.
4% experienced, good, but incomplete skills.
1% unicorn candidates.
(Levi Bowles, @LeviABx, October 22, 2019)
As the weeks have gone by (and my employment situation has become a bit weird, in good ways), I've often thought about that tweet. Since the day I posted it, I've found it to be even more true than I originally thought, especially on the Shingy front.
I thought it would be good to put together some more in-depth thoughts on the different types of Data Science candidates I see on the market and how they relate to real roles in companies. As a result, this post can serve as a helpful guide to building a data science team at your organization. I describe each of the five types of candidates in more detail below.
Directly Out of School
The hallmark of these candidates is that they have recently completed grad school and have little or no work experience. Some resumes will show a history of internships, and some will try to pass off class projects as "jobs they worked." My general thoughts:
- There are a lot of these candidates. A lot.
- They harass Data Science managers regularly on social media, LinkedIn and email.
- Many of them are not yet skilled, and will need a lot of work before they can tackle projects on their own.
- Some of them have not been screened for what I call "employability" and are unlikely to survive the rigor, rules and norms of the workplace.
- Be careful of buzzword slingers.
Couple of Years Experience
These candidates have been in the job market for one to five years, generally with one or two employers. Their resumes will usually list real projects, but because they were junior employees, you can't tell from the resume what their actual role involved. My thoughts on this group:
- There are quite a few in this group as well.
- The loudest ones are usually in the process of failing out of a first job.
- A lot of them are well on their way to a great career, though.
- Their skills may be largely defined by the experience of their first job, so there will still be a big training effort ahead.
- It is very important to determine *exactly* what their roles were on teams and projects; beware of Tableau jockeys and spreadsheet analysts padding their resumes.
Shingy Clones
First, who's Shingy? David Shing, AOL's "Digital Prophet." These candidates are full of hot air and very much hyped on Data Science as a concept, along with other hyped concepts (have fun getting them to talk about blockchain). The dark side, of course, is that they have no Data Science abilities and are just low-rent hype people.
- They will come with a lot of energy and enthusiasm, which can be hypnotizing, especially for executives.
- These people are exactly why the interview process is critical.
- If you hire one, they are completely destructive: they will always be hyping new technology or insisting that "we need to do x," but they don't have the skills or knowledge to understand what they are suggesting or how to deliver it.
- They have no clue what they are talking about.
Advice: You can't hire these people. They are going to be a high cost with zero deliverables. To avoid this, put some matrix algebra or simple calculus questions in your interview. Ask them to interpret regression coefficients. Ask them questions about coding in unsexy languages (SQL). On difficult questions, this group will break.
(As an aside, I've had a few of these people try to gaslight me and then end up yelling at me in an interview. It's not fun to be yelled at, but when this happens, I know I've dodged a bullet by calling someone out.)
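As one hypothetical example of the matrix algebra and coefficient-interpretation questions suggested above, you could hand a candidate a tiny made-up dataset and ask them to fit a simple linear regression by solving the normal equations (X^T X)b = X^T y by hand, then explain what the coefficients mean. The sketch below shows the expected answer in Python; the function name `ols_normal_equations` and the data are illustrative assumptions, not a prescribed interview question.

```python
from collections.abc import Sequence


def ols_normal_equations(x: Sequence[float], y: Sequence[float]) -> tuple[float, float]:
    """Fit y = b0 + b1*x by solving the 2x2 normal equations (X^T X) b = X^T y.

    Hypothetical interview exercise: X has a column of ones and a column of x.
    """
    n = len(x)
    sx = sum(x)                                  # sum of x
    sxx = sum(xi * xi for xi in x)               # sum of x^2
    sy = sum(y)                                  # sum of y
    sxy = sum(xi * yi for xi, yi in zip(x, y))   # sum of x*y

    # X^T X = [[n, sx], [sx, sxx]] and X^T y = [sy, sxy].
    det = n * sxx - sx * sx
    if det == 0:
        # All x values identical: X^T X is singular, so the slope is not identifiable.
        raise ValueError("x has no variance; coefficients are not identifiable")

    # Invert the 2x2 matrix explicitly and multiply by X^T y.
    b0 = (sxx * sy - sx * sxy) / det
    b1 = (n * sxy - sx * sy) / det
    return b0, b1


if __name__ == "__main__":
    # Made-up data that lies exactly on y = 2 + 3x.
    b0, b1 = ols_normal_equations([0, 1, 2, 3, 4], [2, 5, 8, 11, 14])
    print(b0, b1)  # prints "2.0 3.0"
```

A candidate with real fundamentals should be able to say that the intercept (2) is the predicted y when x is 0, and that the slope (3) means each one-unit increase in x is associated with a 3-unit increase in predicted y. A follow-up such as "what happens if every x is the same?" (the matrix is singular, so there is no unique slope) is one way to probe whether they understand the math or only the vocabulary.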
Experienced Statisticians
These candidates are more advanced in their careers and will often shun the term Data Scientist. They may be less striking at first, and certainly have less flash than a 25-year-old machine learning expert, but they add a ton of value to your organization.
- These candidates generally build models, often outperforming data scientists' models while using simpler, more elegant methods.
- They can be great mentors to young data scientists, if junior staff are willing to listen.
- They often lack experience with machine learning or big data systems (e.g., Hadoop).
- They will also lack some more modern coding/computing skills (e.g., containerization, cloud).
- One successful tactic is to use this type of employee on a project team with a machine learning expert and a data engineer. The data engineer will bring the technical coding skills, the machine learning expert will bring modern methods, and the more experienced employee will bring research design and rigor.
Unicorns
These are the classic Data Scientists that many organizations are looking for. Their traits:
- 15+ years in building machine learning models.
- 15+ years building econometric models.
- Production-level developer who has built massive productionized ML systems.
- Hadoop/Spark developer.
- 100x ROI.
- Virtually non-existent.
Summary
A few months ago a recruiter called me and asked if I had 15 years of experience in Hadoop. This is an absurd question given that Hadoop's first release was in 2006, but it speaks to an underlying truth: many organizations are looking for a Data Science candidate pool that simply does not exist. I hope this post helps you build a reasonable Data Science team from building blocks based on the talent that is actually available. A reasonable team might involve:
- Directly out of School: 1-2 FTE
- Couple Years Experience: 2 FTE
- Shingy Clones: 0 FTE
- Experienced Statisticians: 1 FTE
- Unicorns: 1, if you can find one, but not necessary
As an aside, the model without "Unicorn" candidates will likely require substantial help from data engineers and some developers in order to get data into systems and models into production. This does create some inefficiencies, but it is often less expensive than finding a unicorn candidate.