Friday, November 10, 2017

A shortage of data scientists - really?

One of the truisms of our industry at the moment is that there is, right now (and into the future), a serious shortage of data scientists. I was at an industry seminar last week (a roadshow that had been all over the country) where a senior executive from a BI company repeated this "axiom". She showed some charts and numbers to back up her claim. All was accepted uncritically by the audience. It's what we want to believe.

... but ...

Her numbers were based on predictions relevant to the US, and she'd just made some simple (and reasonable) adjustments for the size and nature of Australia's economy to come up with numbers representing the shortage in Australia. A reasonable approach, but I have a problem with those US-based predictions in the first place. I think - based on the data we collect at Monash (and I say "we", as I'm a guest at Monash now, not a full-time staff member) - they are way out.

No doubt there is a significant market for data scientists in Australia. Just log on to, say, Seek.com today and you'll see 200-odd jobs. I think - and I didn't collect the data previously, so I'm guessing - that most of the jobs listed aren't "new" in the sense that most people want to believe - that there's a revolution happening. Most are jobs that only a few years ago would have been called statistician, econometrician or operational researcher. Just now these roles are being labelled "data scientist". If you delve into the jobs, you'll see listed as a required qualification a degree in "Data Science, Analytics, Operational Research, Engineering, Econometrics, Applied Mathematics or Computer Science" - except for "Data Science", all the degrees in that list (from an actual current role) have been around for many years. The job label "data scientist" is new, and that's good, but relabelling a category of job (even if it represents a maturation or appreciation of the role of IT and of data in those roles) isn't a revolution.

Regardless of what they are called, what of the numbers of jobs themselves - are they growing? In 2015 a feature article in The Australian newspaper agreed with the still prevailing wisdom that the market for data scientists was growing fast; a 77% annual increase is noted in the article. Well - that might be true. I started collecting data on how many jobs there were for roles matching the description "data scientist" at that time, and I think at the time the article was published there were actually about 70-odd jobs listed on-line, so that could be consistent with the numbers quoted if the growth came from a very low base. What about now, 2 years later? Well, not much has happened. The growth has been solid but slow. There are 200-odd listed right now (jobs tend to list for 30 days - so the total count is kind of a moving average), but that's up dramatically in the last few days due to a lot of cross posting for roles in the federal govt. Dept. of Human Services (data science jobs seem to get cross posted - listed with more than one agency - at a much higher rate than other jobs). So there's a bit of a spike right now, but the longer term trend is slower growth from nothing to a couple of hundred over the last few years.

Is that the start of a major gap between supply and demand? I don't think so. Universities all over the country now have graduate and undergraduate offerings in data science. These courses are popular, and a lot of graduates are being produced. Graduates with qualifications in computer science, statistics and mathematical modelling are also still being produced. Most of the students attracted to the graduate courses have come from overseas, so it's not as if there has been displacement from one university course to another at the expense of the "older" courses.
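To see why the "active listings" count behaves like a moving average (and why a burst of cross posting produces a spike), here's a minimal sketch. The daily posting numbers and the exact 30-day listing window are invented for illustration - the real Seek data won't be this tidy:

```python
# Hypothetical illustration: if every job stays listed for 30 days,
# the count of active listings on any day is just the sum of postings
# over the previous 30 days - i.e. a 30-day moving (windowed) sum.
daily_postings = [3] * 60 + [10] * 5  # steady 3/day, then a 5-day cross-posting spike

def active_listings(postings, day, listing_period=30):
    """Number of jobs still listed on `day` (posted within the window)."""
    start = max(0, day - listing_period + 1)
    return sum(postings[start:day + 1])

print(active_listings(daily_postings, 59))  # steady state: 30 days x 3/day = 90
print(active_listings(daily_postings, 64))  # spike: 25 x 3 + 5 x 10 = 125
```

The point of the sketch: a short-lived burst of cross posts inflates the headline count for a full listing period, which is why a one-off spike shouldn't be read as a change in the underlying trend.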

The long term job trend for "data science" - well - I think it's pretty flat. It's largely been - for the past 18 months - between 100 and 200 active listed vacancies. To give some context, there are about 700 active listed roles in traditional BI, and about 2,500 for JavaScript programmers (the leading required language skill for programmers). So no, there's not a massive shortage of people with data science qualifications, and there isn't a massive job market for data scientists.

Feel free to check my numbers here: http://dsslab.infotech.monash.edu.au:8080/datajobs/, or follow the Twitter account @MonashBIIndex

Wednesday, March 15, 2017

Arrgh. Why do people believe stories that are too good to be true?

OMG help me! It's a simple rule – if it's too good to be true it probably isn't. I'm hot under the collar right now from reading an academic paper that has fallen hook, line and sinker for an urban myth.
It's hard not to get lost in hype. We all believe what we want to believe. We are hard-wired to 'see' the world in a way that conforms to our existing beliefs (it's called confirmation bias).
In IT, there is plenty of hype and lots of selective (and sometimes unconscious) use of evidence as justification for positions and beliefs. That's just the way the world is, and I love that as an academic I have the freedom to rock the boat occasionally and poke fun at some of what goes on in industry.
I think academics have a lot to offer industry, even though we often use language that is inaccessible to practitioners (largely because we are writing for other academics). We provide thinking that is objective, theory-rich and evidence-based. And when it's explained or presented in the right way, this can provide a useful perspective for practitioners to deal with real problems and issues.
However, in the last little bit, many academics have been swept up in the hype surrounding Big Data and Data Science. Davenport's famous statement that being a data scientist is 'The Sexiest Job of the 21st Century' is thrown into conversations, papers and presentations uncritically by academics and practitioners alike. We desperately want this to be true, but seriously, if committing R code to a Git repository is sexy, then I'm in need of a whole new definition of sexy. As a result, we've forgotten what our role should be and have unthinkingly dived head-first into the role of Data Science evangelism.
That 'need to believe' is going to cause us problems. Data Science is, of course, great, but it has its limitations, and there are many, many problems with what used to be called the 'normative approach' to decision support. These problems have been well understood since the 70s and 80s and aren't changed by the use of Hadoop or R, or whatever the new silver-bullet technology for crunching large amounts of data happens to be. Nobody that I've seen working on Data Science is seriously addressing these issues. (Have a read of Peter Keen's insightful review of problems facing the approach - written in 1987 - "Decision Support Systems: The Next Decade".) The work being done on Data Science is almost exclusively focussed on the development of new technologies and techniques for crunching data - that's nice, and makes for neat applications, but it doesn't change the kinds of problems that will be solved (which are generally narrow, structured, well defined and almost always operational).
In reading about Data Science today, I ran head-on into the kind of mistake that an academic shouldn't make. I was reading a paper in a very good journal - highly rated, peer reviewed - on the ROI of data science. It was the kind of journal where, if you are able to get a paper published, your career is made (at least for a short while). In its abstract and main body of discussion, the paper repeated an often-told story of a large US-based retailer (Target) that used an algorithm to predict which female customers were pregnant, and used this information to send them offers. It's a great anecdote that illustrates the power of predictive algorithms while also showing the ethical line that can be crossed by Big Data analysis. As the paper's authors stated, the predictive algorithm "proved to be an invasion of privacy into the life of a minor [a teenage girl who was correctly identified by the algorithm as being pregnant] and informed her father of her untimely pregnancy prematurely".
It's a great story, but ... sadly not true. Despite being widely reported in the trade press, it has been just as widely refuted. (Fake news!)
Academics should be better than this. So should peer reviewed journals. Our job should be to seek truth, not to add credibility to made up stories.