Category Archives: Statistics

Analytics IV and V: Projects

Last year (with Chandler Johnson and Alessandra Luzzi) and this year (with Chandler, Jadwiga Supryn and Prakash Raj Paudel), I have taught a course called Analytics for Strategic Management. In this course, executive students work on real projects for real companies, applying various forms of machine learning (big data, analytics, whatever you want to call it) to business problems. Here is a list (mostly anonymised, except for public organizations) from this year:

  • One group wants to use machine learning to predict fraud in public security contracts in a developing country
  • A credit agency wants to predict which of their customers will pay their bills by the end of the month
  • An engineering company wants to predict the number of hours needed to meet demand for each month in each department
  • One group wants to predict housing prices within Oslo, to help house sellers get a realistic estimate of what their property is worth
  • A higher education provider wants to predict which students are likely to fail or not qualify for an exam, to be able to intervene early
  • A couple of municipalities want to predict who will accept a kindergarten allocation or not
  • A telecommunications company wants to predict which customers will churn
  • An Internet product company wants to predict necessary capacity for picking and shipping work every day
  • One group wants to predict the likelihood of a road closing due to bad weather, in order to warn truck drivers so they can detour
  • One group wants to predict the future financial health of companies based on employee engagement numbers
  • One group wants to predict efficiency of production in a wind power park

And last year we had these projects:

  • An investment company wanted to predict bankruptcies from media events
  • Ruter, Oslo’s public transportation authority, wanted to predict the number of passengers (for each station, to great precision) for one line on the metro
  • A telecommunications company wanted to predict customer feedback scores from analyzing customer interactions (so the customer does not have to answer a survey afterwards)
  • The Norwegian Health Directorate wanted to predict general practitioner (“fastlege”) churn
  • A commercial TV station wanted to predict subscriber churn
  • An insurance company wanted to identify customers likely to buy a group insurance package
  • An online gaming company wanted to predict customer churn
  • A large political party wanted to predict membership churn
  • One group wanted to start a company based on using machine learning to diagnose hearing problems
  • A large retail chain wanted to predict churn based on customer purchase patterns

Hans Rosling in memoriam

Hans Rosling died from cancer this morning.

Not much to say, really. Or, maybe, so much to say. I met him in Oslo once: I had seen his video and suggested him for the annual “big” conference for movers and shakers in Oslo. He came and wowed everyone. Simple as that.

Here is another one (this one in Swedish) where he just shuts down a rather snooty and ill-prepared news show host by saying, essentially, “this is not a matter of opinion, this is a matter of statistics and facts. I am right and you are wrong.”

What a man.

Analytics for Strategic Management

I am starting a new executive course, Analytics for Strategic Management, with my young and very talented colleagues Alessandra Luzzi and Chandler Johnson (both with the Center for Digitization at BI Norwegian Business School).

Alessandra Luzzi

Chandler Johnson

The course (over five modules) is aimed at managers who want to become sophisticated consumers of analytics (be it Big Data or the more regular kind). The idea is to learn just enough analytics that you know what to ask for and where the pressure points are (so you do not ask for things that cannot be done or will be prohibitively expensive). The participants will learn from cases, discussions, live examples and assignments.

Central to the course is an analytics project, where the participants will seek out data from their own company (or, since it will be group work, someone else’s), figure out what they can do with the data, and end up, if not with a finished analysis (that might happen), at least with a well-developed project specification.

The course will contain quite a bit of analytics – including a spot of Python and R programming – again, so that the executives taking it will know what they are asking for and what is being done.

We were a bit nervous about offering this course – a technically oriented course with a February startup date. The response, however, has been excellent, with more than 20 students signed up already. In fact, we will probably cap the course at 30 participants: this is the first time we are teaching it, and we will undoubtedly change many things as we go along.

If you can’t do the course this year – here are a few starting pointers to whet your appetite:

  • Big Data is difficult to define. This is always the case with fashionable monikers – for instance, how big is “big”? – but good ol’ Wikipedia comes to the rescue, with an excellent introductory article on the concept. For me, Big Data has always been about having the entire data set instead of a sample (i.e., n = p, the sample being the whole population), but I can certainly see the other dimensions of delineation suggested here.
  • Data analytics can be very profitable (PDF), but few companies manage to really mine their data for insights and actions. That’s great – more upside for those who really want to do it!
  • Data may be big but often is bad, causing data scientists to spend most of their time fixing errors, cleaning things up and, in general, preparing for analytics rather than the analysis itself. Sometimes you can almost smell that the data is bad – I recommend The Quartz guide to bad data as a great list of indicators that something is amiss.
  • Data scientists are few, far between and expensive. There is a severe shortage of people with data analysis skills in Norway and elsewhere, and the educational systems (yours truly excepted, of course) are not responding. Good analysts are expensive. Cheap analysts – well, you get what you pay for. And, quite possibly, some analytics you may like, but not what you ought to get.
  • There is lots of data, but a shortage of models. Though you may have the data and the data scientists, that does not mean that you have good models. It is actually a problem that as soon as you have numbers – even though they are bad – they become a focal point for decision makers, who show a marked reluctance to ask where the data is coming from, what it actually means, and how the constructed models have materialised.
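
To make the point about bad data a little more concrete, here is a minimal Python sketch of the kind of quick checks one might run before any analysis. It is not taken from the Quartz guide: the column names, the sample rows and the list of placeholder values are all invented for illustration, and a real data set would need its own rules.

```python
from collections import Counter

# Hypothetical sample records; names and values are invented for illustration.
ROWS = [
    {"customer_id": "1", "age": "34", "monthly_spend": "299"},
    {"customer_id": "2", "age": "", "monthly_spend": "149"},
    {"customer_id": "3", "age": "999", "monthly_spend": "0"},
    {"customer_id": "2", "age": "", "monthly_spend": "149"},
]

# Values that often stand in for "unknown" (an assumption, adjust per data set).
PLACEHOLDERS = {"", "NA", "N/A", "null", "999", "9999", "-1"}


def quality_report(rows):
    """Count missing/placeholder values per column and exact duplicate rows."""
    suspicious = Counter()
    for row in rows:
        for column, value in row.items():
            if value.strip() in PLACEHOLDERS:
                suspicious[column] += 1

    # A row counts as a duplicate if an identical row has already been seen.
    seen = set()
    duplicates = 0
    for row in rows:
        key = tuple(sorted(row.items()))
        if key in seen:
            duplicates += 1
        seen.add(key)

    return {
        "rows": len(rows),
        "suspicious": dict(suspicious),
        "duplicates": duplicates,
    }


if __name__ == "__main__":
    print(quality_report(ROWS))
```

Checks like these do not fix anything, but they give a first indication of how much cleaning lies ahead before the actual analysis can start.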

And with that – if you are a participant, I look forward to seeing you in February. If you are not – well, you better boogie over to BI’s web pages and sign up.

R – the Swiss Army knife of the data scientist

The video below, a talk by John D. Cook (via Flowingdata), is a very nice intro to R for someone who wants to be a data scientist and has some notion or experience of programming. I have begun to look at R, but need a specific project to analyze in order to get into it. When learning a programming language (or any powerful tool, for that matter) it is important to get under the skin of it, to understand it to the point where you don’t look up the function or whatever in the manual because you intuitively know what it would be named, since you think like the developers. (I can’t claim any knowledge like that, except perhaps for IFPS (a defunct financial programming language), REXX (macro language for IBM mainframes), and Minitab (statistical package, rather marginalized now).) Learning something to that level requires time and, most importantly, a need. We’ll see.

But it helps to have someone explain things, so I guess watching this video is not a waste of time. It wasn’t for me, anyway. And R certainly is the thing to learn, in this Big Data (whatever that may mean) world. (Though, as is said here, it was never designed for huge data sets. But huge data sets need models to work, and you build those on small data sets…)
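
The remark about building models on small data sets can be illustrated with a rough sketch. It is written in Python rather than R, purely to keep it dependency-free, and it is not from the talk. It generates synthetic data with a known slope (all numbers are made-up assumptions), then fits a straight line once on the full set and once on a small random sample, so the two estimates can be compared.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def make_data(n, rng):
    """Synthetic data: true intercept 5, true slope 2, Gaussian noise (assumed)."""
    xs = [rng.uniform(0, 100) for _ in range(n)]
    ys = [5.0 + 2.0 * x + rng.gauss(0, 10) for x in xs]
    return xs, ys


rng = random.Random(42)  # fixed seed so the run is repeatable
xs, ys = make_data(100_000, rng)

# Draw a 1% random sample of row indices without replacement.
sample_idx = rng.sample(range(len(xs)), 1_000)
sample_xs = [xs[i] for i in sample_idx]
sample_ys = [ys[i] for i in sample_idx]

full_fit = fit_line(xs, ys)
sample_fit = fit_line(sample_xs, sample_ys)

if __name__ == "__main__":
    print("full data   (intercept, slope):", full_fit)
    print("1% sample   (intercept, slope):", sample_fit)
```

With well-behaved data like this, the sample estimate of the slope lands close to the full-data estimate. Real data is rarely this tidy, which is why the sample has to be drawn with care.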