Friday, November 10, 2017

A shortage of data scientists - really?

One of the standard claims in our industry at the moment is that there is, right now (and into the future), a crucial and serious shortage of data scientists. I was at an industry seminar last week (a roadshow that had been all over the country) where a senior executive from a BI company repeated this "axiom". She showed some charts and numbers to back up her comment. All was accepted uncritically by the audience. It's what we want to believe.

... but ...

Her numbers were based on predictions that were relevant to the US, and she'd just made some simple (and reasonable) adjustments for the size and nature of Australia's economy to come up with numbers representing the shortage in Australia. A reasonable approach, but I have a problem with those US-based predictions in the first place. I think - based on the data we collect at Monash (and I say "we", as I'm a guest at Monash now, and not a full-time staff member) - they are way out.

No doubt there is a significant market for data scientists in Australia. Just log on to, say, Seek.com today and you'll see 200-odd jobs. I think - and I didn't collect the data previously, so I'm guessing - that most of the jobs listed aren't "new" in the sense that most people want to believe - that there's a revolution happening. Most are for jobs that only a few years ago would have been called statistician, econometrician or operational researcher. It's just that these roles are now being labelled "data scientist". If you delve into the jobs, you'll see, listed as a required qualification, a degree in "Data Science, Analytics, Operational Research, Engineering, Econometrics, Applied Mathematics or Computer Science" - except for "Data Science" in that list (taken from an actual current role), all those degrees have been around for many years. The job label "data scientist" is new, and that's good, but relabelling a category of job (even if it represents a maturation or appreciation of the role of IT and of data in those roles) isn't a revolution.

Regardless of what they are called, what of the number of jobs themselves - are they growing? In 2015 a feature article in The Australian newspaper agreed with the still prevailing wisdom that the market for data scientists was growing fast; a 77% annual increase is noted in the article. Well, that might be true. I started collecting data on how many jobs there were for roles matching the description "data scientist" at that time, and I think at the time the article was published there were actually about 70-odd jobs listed online, so the figure could be consistent with the numbers quoted if the growth came from a very low base. What about now, 2 years later? Well, not much has happened. The growth has been solid but slow. There are 200-odd listed right now (jobs tend to list for 30 days, so the total count is kind of a moving average), but that's up dramatically in the last few days due to a lot of cross-posting for roles in the federal government Department of Human Services (data science jobs seem to get cross-posted - listed with more than one agency - at a much higher rate than other jobs). So there's a bit of a spike right now, but the longer-term trend is slow growth from nothing to a couple of hundred over the last few years. Is that the start of a major gap between supply and demand? I don't think so. Universities all over the country now have graduate and undergraduate offerings in data science. These courses are popular, and a lot of graduates are being produced. Graduates with qualifications in computer science, statistics and mathematical modelling are also still being produced. Most of the students attracted to the graduate courses have come from overseas, so it's not as if there has been displacement from one university course to another at the expense of the "older" courses.

The long-term job trend for "data science" - well - I think it's pretty flat. It's largely - for the past 18 months - been between 100 and 200 active listed vacancies. To give some context - there are about 700 active listed roles in traditional BI, and about 2,500 for JavaScript programmers (the leading required language skill for programmers). So no, there's not a massive shortage of people with data science qualifications, and there isn't a massive job market for data scientists.

Feel free to check my numbers here: http://dsslab.infotech.monash.edu.au:8080/datajobs/, or follow the Twitter account @MonashBIIndex

Wednesday, March 15, 2017

Arrgh. Why do people believe stories that are too good to be true?

OMG help me! It's a simple rule – if it's too good to be true it probably isn't. I'm hot under the collar right now from reading an academic paper that has fallen hook, line and sinker for an urban myth.
It's hard not to get lost in hype. We all believe what we want to believe. We are hard-wired to 'see' the world in a way that confirms our existing beliefs (it's called confirmation bias).
In IT, there is plenty of hype and lots of selective (and sometimes unconscious) use of evidence as justification for positions and beliefs. That's just the way the world is, and I love that as an academic I have the freedom to rock the boat occasionally and poke fun at some of what goes on in industry.
I think academics have a lot to offer industry, even though we often use language that is inaccessible to practitioners (largely because we are writing for other academics). We provide thinking that is objective, theory-rich and evidence-based. And when it's explained or presented in the right way, this can provide a useful perspective for practitioners to deal with real problems and issues.
However, in the last little while, many academics have been swept up in the hype surrounding Big Data and Data Science. Davenport's famous statement that being a data scientist is 'The Sexiest Job of the 21st Century' is thrown into conversations, papers and presentations uncritically by academics and practitioners alike. We desperately want this to be true, but seriously, if committing R code to a Git repository is sexy, then I'm in need of a whole new definition of sexy. As a result, we've forgotten what our role should be and have unthinkingly dived head-first into the role of Data Science evangelism.
That 'need to believe' is going to cause us problems. Data Science is, of course, great, but it has its limitations, and there are many, many problems with what used to be called the 'normative approach' to decision support. These problems have been well understood since the 70s and 80s and aren't changed by the use of Hadoop or R, or whatever is the new silver bullet technology for crunching large amounts of data. Nobody that I've seen working on Data Science is seriously addressing these issues. (Have a read of Peter Keen's insightful review of the problems facing the approach - written in 1987 - "Decision Support Systems: The Next Decade".) The work being done on Data Science is almost exclusively focussed on the development of new technologies and techniques for crunching data – that's nice, and makes for neat applications, but it doesn't change the kinds of problems that will be solved (which are generally narrow, structured, well defined and almost always operational).
In reading about Data Science today, I ran head-on into the kind of mistake that an academic shouldn't make. I was reading a paper in a very good journal – highly rated, peer reviewed – on the ROI of data science. It was the kind of journal where, if you are able to get a paper published, your career is made (at least for a short while). In its abstract and main body of discussion, the paper repeated an often-told story of a large US-based retailer (Target) that used an algorithm to predict which female customers were pregnant, and used this information to send them offers. It's a great anecdote that illustrates the power of predictive algorithms while also showing the ethical line that can be crossed by Big Data analysis. As the paper's authors stated, the predictive algorithm "proved to be an invasion of privacy into the life of a minor [a teenage girl who was correctly identified by the algorithm as being pregnant] and informed her father of her untimely pregnancy prematurely". 
It's a great story, but ... sadly not true. Despite being widely reported in the trade press, it has been just as widely refuted. (Fake news!) 
Academics should be better than this. So should peer reviewed journals. Our job should be to seek truth, not to add credibility to made up stories.

Friday, December 4, 2015

Big Data: Massive Ado about ... Nothing?

Some of the hype circulating right now around the topics of data science and big data is really annoying me - I used to find it amusing, but now it's starting to make me angry. There is a lot to the data science and big data movements - don't get me wrong - but a lot of what's being said is just plain wrong and doomed to fail.

My main issue is the complete ignorance of history shown by people talking up these concepts. They show - often - that they know nothing of the history of work in the area. The most important thing they miss in doing that is the hard-earned lesson that normative approaches to decision making simply don't work unless they are applied collaboratively, with decision makers understanding the models developed. The conceptual frameworks currently being developed for data science and big data totally ignore human decision making. They assume more data is better for decision making - it's not. Time to take a deep breath, I think, and build a better and more reasoned critique of what's missing and why it won't work (I'll post here when I've calmed down).

POD (sitting in a "big data" panel session at a major academic conference, fuming)
Just saw a "big data" process model that ignored decision making, and had developing a business case as step 5 of 6. OMG


Wednesday, October 28, 2015

Here is a link to my ppt slide deck: https://app.box.com/s/yd9i9fh1go1pbbr4cvwzj62i3yboq3mk

A reading list relevant to my Mastering SAP BA presentation will follow soon.

POD

Tuesday, June 23, 2015


Peter O’Donnell, a lecturer in Monash University’s Decision Support Systems laboratory, along with Masters student Yasmine AL Ahmadi, is running a project aimed at improving medical reporting. The idea is to apply BI design techniques to reports containing medical test results. If you can spare 5 to 10 minutes you can help. Click the link below to find out more about the project.

If you participate in the study you simply use a browser to access a Web-based system that will take you through the "experiment". After answering some simple demographic questions (you remain anonymous), you will be shown a series of reports showing some medical test results. You will be asked to answer three simple questions about each report. It should take no longer than 5 to 10 minutes to complete the task.

We would greatly appreciate it if you could help us conduct this project, either by participating in the study yourself or by letting others who you think might like to participate know about it. For further information about the study, please access the following URL:




Thank you
Peter and Yasmine 

Monday, October 27, 2014

What Next for Business Intelligence? Big Data Analytics, Hype and All That Jazz

The slide deck:

  • 1 per page [pdf]
  • 4 per page [pdf]
  • 6 per page [pdf]
More to come :-)

- POD

Saturday, November 26, 2011

DSS LAB Internship Sharing: BI application prototype development on iPad

The DSS Lab runs an internship programme for coursework students to get a taste of what it's like to be an active participant in the research group. Jason Lu recently completed a project in which he developed a proof-of-concept iOS interface for a BI reporting tool, something that we hope to be able to use to do research on mobile and touch interfaces for BI. - Rob
In recent years, mobile BI has become a frequent topic of discussion among BI vendors and users, especially since the success of the iPhone and iPad from 2008 onwards. Smartphones and tablet PCs are playing an essential role in managers' daily lives; however, BI users and vendors still face challenges in using and developing mobile BI applications.

Over the last 15 weeks, I have worked as a research intern in the DSS Lab to develop an iPad application prototype for exploring the dimensional structures of BI reports. In this prototype, I adopted an approach that presents the user interface using Apple's WebView controller, along with JavaScript and HTML. Users select report generation criteria in the user interface, and the criteria are sent to a web server that, in turn, builds an SQL query for execution against a data warehouse. Finally, the information for the report is sent back to the client system on the iPad, which displays the report.
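
To make that flow a little more concrete, here is a minimal TypeScript sketch of how the client side of such a prototype might look inside the WebView. The /report endpoint, the criteria fields and the row shape are hypothetical names chosen for illustration, not the prototype's actual API.

// Hypothetical report criteria selected in the WebView user interface.
interface ReportCriteria {
  measure: string;    // e.g. "sales_amount"
  dimension: string;  // e.g. "region"
  period: string;     // e.g. "2011-Q3"
}

// Shape of the rows the (assumed) server endpoint returns after running
// its SQL query against the data warehouse.
interface ReportRow {
  label: string;
  value: number;
}

// Post the selected criteria to the web server and return the report rows.
async function fetchReport(criteria: ReportCriteria): Promise<ReportRow[]> {
  const response = await fetch("/report", {  // endpoint name is an assumption
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(criteria),
  });
  if (!response.ok) {
    throw new Error(`Report request failed with status ${response.status}`);
  }
  return response.json();
}

// Render the returned rows as a simple HTML table inside the WebView.
function renderReport(rows: ReportRow[], container: HTMLElement): void {
  const table = document.createElement("table");
  for (const row of rows) {
    const tr = table.insertRow();
    tr.insertCell().textContent = row.label;
    tr.insertCell().textContent = String(row.value);
  }
  container.innerHTML = "";
  container.appendChild(table);
}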

One of the issues for BI software vendors is the need to connect to a server to retrieve data every time the user attempts to generate a report. This creates a problem when using the application without a network connection, such as when flying. One area of further research may be to try an alternative that stores temporary report data locally on the iPad. However, this may present a security issue if the user loses the device.
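
As a rough illustration of that research direction, the sketch below (reusing the ReportCriteria, ReportRow and fetchReport names from the sketch above) falls back to a copy kept in the WebView's localStorage when the network is unavailable. The key scheme and the 24-hour freshness window are assumptions, and the fact that localStorage is unencrypted is exactly the security concern raised above.

// Keep cached copies for 24 hours (an assumed freshness window).
const CACHE_TTL_MS = 24 * 60 * 60 * 1000;

interface CachedReport {
  savedAt: number;   // epoch milliseconds when the copy was stored
  rows: ReportRow[];
}

// Build a stable localStorage key from the report criteria.
function cacheKey(criteria: ReportCriteria): string {
  return "report:" + JSON.stringify(criteria);
}

// Try the network first; fall back to a locally cached copy when offline.
// Note: localStorage is not encrypted, which is the security issue if the
// user loses the device.
async function fetchReportWithCache(criteria: ReportCriteria): Promise<ReportRow[]> {
  try {
    const rows = await fetchReport(criteria);
    const entry: CachedReport = { savedAt: Date.now(), rows };
    localStorage.setItem(cacheKey(criteria), JSON.stringify(entry));
    return rows;
  } catch {
    const cached = localStorage.getItem(cacheKey(criteria));
    if (cached !== null) {
      const entry: CachedReport = JSON.parse(cached);
      if (Date.now() - entry.savedAt < CACHE_TTL_MS) {
        return entry.rows;  // serve the recent offline copy
      }
    }
    throw new Error("No network connection and no recent cached copy of this report");
  }
}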

Some other areas of future research might include investigating user interface principles for touch, such as pre-defined gestures for operations like drill-down and roll-up.
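
One hypothetical way such a gesture mapping could look in the prototype's HTML/JavaScript layer, written here as TypeScript: the choice of pinch-out for drill-down and pinch-in for roll-up, and the 1.25/0.8 thresholds, are purely illustrative.

// Illustrative only: interpret a two-finger pinch on a report element as
// drill-down (fingers spread apart) or roll-up (fingers pinched together).
function attachDrillGestures(
  element: HTMLElement,
  onDrillDown: () => void,  // e.g. expand a dimension to a finer level
  onRollUp: () => void,     // e.g. collapse back to a coarser level
): void {
  let startDistance = 0;
  let lastDistance = 0;

  const distanceBetween = (touches: TouchList): number => {
    const dx = touches[0].clientX - touches[1].clientX;
    const dy = touches[0].clientY - touches[1].clientY;
    return Math.hypot(dx, dy);
  };

  element.addEventListener("touchstart", (event) => {
    if (event.touches.length === 2) {
      startDistance = lastDistance = distanceBetween(event.touches);
    }
  });

  element.addEventListener("touchmove", (event) => {
    if (event.touches.length === 2) {
      lastDistance = distanceBetween(event.touches);
    }
  });

  element.addEventListener("touchend", () => {
    if (startDistance === 0) return;
    if (lastDistance > startDistance * 1.25) {
      onDrillDown();  // pinch out: show more detail
    } else if (lastDistance < startDistance * 0.8) {
      onRollUp();     // pinch in: aggregate back up
    }
    startDistance = lastDistance = 0;
  });
}

A drill-down handler attached this way would typically re-run the report at the next level of the dimension hierarchy and re-render the result in the WebView.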

To conclude, this internship experience offered me a great opportunity to study, in depth, some of the advantages and disadvantages of mobile BI applications on multi-touch mobile devices. In my opinion, tablet devices let users interact with reports more intuitively, but users face potential security and accessibility issues alongside the benefits of a portable device.

Jason