Category Archives: Analytics

Analytics VI: Projects

Another year, another list of exciting projects (previous ones here, here, and here) from the course Analytics for Strategic Management, which I teach with my excellent colleague Chandler Johnson. In this course, students work on real data analysis projects for real companies – and here is a (rather disguised) list:

  • Avinor, the Norwegian Airport Authority, wants to predict TOBT (Target Off-Block Time) for Gardermoen Airport. TOBT is a measure of when the plane will leave the gate, and is very important for planning access to runways and other congested areas of the airport.
  • Norsk Tipping AS wants to improve its marketing of certain products by predicting customers’ likelihood of adopting them (see this (Norwegian) article for an earlier, very successful project with this company)
  • An international company in the shipping supply business wants to predict prices for some of its products. A key issue here (as is often the case) is finding data on orders that were not accepted (and, hence, not registered anywhere). You cannot know what price a customer will accept unless you also have access to cases where the price was too high (see the sketch after this list).
  • A news agency wants to predict the uptake of its articles in order to prioritise its editorial resources. They get news articles from agencies and other sources around the world, and need to know which of those to spend money on translating and editing for the Norwegian market.
  • A grocery wholesaler wants to predict demand for its products. In the grocery industry, most stock levels are determined either by minimum levels or by going by what you bought last year, with some adjustments. This group wants to see if they can improve on that.
  • An insurance company wants to predict churn for some of its products. This problem is common to any company running a subscription business – which is increasingly true for more and more companies.
  • A large business school wants to predict grades for large exams. Manually reading through thousands of exams is boring work – not to mention expensive – so can machine learning in some form be used to automate some of it?
  • A large engineering company wants to predict employee churn. Engineers and other specialists are difficult to find, and it is much cheaper to retain a good employee than to find a new one.
  • Brønnøysundregistrene, a Norwegian register for, amongst other things, company annual accounts, wants to predict late submissions. If they can predict this, they can make efforts to follow up more carefully on those companies, rather than sending out nagging reminders to everyone.
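
As a small illustration of the pricing point above: with both won and lost quotes you can estimate how acceptance probability falls with price; with won quotes only, that curve is unrecoverable. Here is a minimal sketch on simulated data – the numbers, the acceptance curve and the 70% target are all invented for illustration:

```python
# Hypothetical sketch: why rejected quotes matter for price prediction.
# Assumes a quote log with a price and an accepted flag (1 = order won,
# 0 = customer walked away) -- all values here are simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Simulate 1,000 quotes: the higher the price, the lower the odds of acceptance.
price = rng.uniform(80, 120, size=1_000)
p_accept = 1 / (1 + np.exp(0.15 * (price - 100)))  # "true" acceptance curve
accepted = rng.binomial(1, p_accept)

# Fit an acceptance model on ALL quotes, won and lost alike.
model = LogisticRegression().fit(price.reshape(-1, 1), accepted)

# Now we can ask: at roughly what price do we win, say, 70% of orders?
grid = np.linspace(80, 120, 200).reshape(-1, 1)
probs = model.predict_proba(grid)[:, 1]
print("price for ~70% acceptance:", round(grid[np.argmin(np.abs(probs - 0.7)), 0], 1))

# Train on accepted orders only and this is impossible: every row has
# accepted == 1, so there is no acceptance curve left to estimate.
```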

One problem we often have in these projects is difficulty in getting data. This is not the case this year. Whether this is a result of more companies saving more data, the students getting better at defining problems based on data they have, or just plain coincidence, remains to be seen. But it is a welcome development!

EU’s new AI regulation: GDPR for machine learning?

The EU has recently released a proposal for regulating the use of AI in companies and organizations. As far as I can see, it is modelled on the GDPR regulations: assigning responsibility to the board and top management, sanctions expressed as percentages of revenue, and (hopefully) some sort of “safe harbor” rules so you can be somewhat confident that you are not doing anything wrong.

An interesting aspect here is that the EU is early when it comes to regulating the use of AI (yes, I know “AI” is a really diffuse concept, but leave that be for a moment) – once again taking the lead in regulation where Silicon Valley (and China) leads in implementation.

Elin Hauge

This means that managers, board members and researchers will need to learn more. I plan to do this by attending a webinar at Applied Artificial Intelligence Conference 2021. This webinar (May 27, at CDT 1430-1600) is open for everyone who registers. It will be facilitated by Elin Hauge, who is a member of one of the EGN networks I lead.

Recommended – see you there!

Analytics IV and V: Projects

Last year (with Chandler Johnson and Alessandra Luzzi) and this year (with Chandler, Jadwiga Supryn and Prakash Raj Paudel), I have taught a course called Analytics for Strategic Management. In this course, executive students work on real projects for real companies, applying various forms of machine learning (big data, analytics, whatever you want to call it) to business problems. Here is a list (mostly anonymised, except for public organizations) from this year:

  • One group wants to use machine learning to predict fraud in public security contracts in a developing country
  • A credit agency wants to predict which of their customers will pay their bills by the end of the month
  • An engineering company wants to predict the number of hours needed to meet demand for each month in each department
  • One group wants to predict housing prices within Oslo, to help house sellers get a realistic estimate of what their property is worth
  • A higher education provider wants to predict which students are likely to fail or not qualify for an exam, to be able to intervene early
  • A couple of municipalities want to predict who will accept a kindergarten allocation or not
  • A telecommunications company wants to predict which customers will churn
  • An Internet product company wants to predict necessary capacity for picking and shipping work every day
  • One group wants to predict the likelihood of a road closing due to bad weather, in order to warn truck drivers so they can detour
  • One group wants to predict the future financial health of companies based on employee engagement numbers
  • One group wants to predict efficiency of production in a wind power park

And last year we had these projects:

  • An investment company wanted to predict bankruptcies from media events
  • Ruter, Oslo’s public transportation authority, wanted to predict the number of passengers (for each station, to great precision) for one line on the metro
  • A telecommunications company wanted to predict customer feedback scores from analyzing customer interactions (so the customer does not have to answer a survey afterwards)
  • The Norwegian Directorate of Health wanted to predict general physician (“fastlege”) churn
  • A commercial TV station wanted to predict subscriber churn
  • An insurance company wanted to identify customers likely to buy a group insurance package
  • An online gaming company wanted to predict customer churn
  • A large political party wanted to predict membership churn
  • One group wanted to start a company based on using machine learning to diagnose hearing problems
  • A large retail chain wanted to predict churn based on customer purchase patterns (a sketch of the typical churn setup follows this list)
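
Several of these projects – telecoms, TV, gaming, party membership, retail – share the same basic shape: a table of customers, a churn flag, and a model that ranks customers by risk. A minimal sketch of that setup, where the file name and columns (“tenure_months”, “monthly_fee”, “support_calls”, “churned”) are invented placeholders:

```python
# Minimal churn-prediction sketch: the shape shared by several projects above.
# The dataset and all column names are hypothetical placeholders.
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

df = pd.read_csv("customers.csv")          # one row per customer
X = df[["tenure_months", "monthly_fee", "support_calls"]]
y = df["churned"]                          # 1 = left within the horizon

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y)

model = GradientBoostingClassifier().fit(X_train, y_train)

# AUC measures how well the model ranks leavers above stayers.
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))

# The business deliverable: a ranked list of customers to contact first.
df["churn_risk"] = model.predict_proba(X)[:, 1]
print(df.sort_values("churn_risk", ascending=False).head(10))
```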

Dealing with cheating

At BI Norwegian Business School, we are (naturally and way overdue, but a virus crisis helps) moving all exams to digital. This means a lot of changes for people who have not done that before. One particular anxiety is cheating – normally not a problem in the subjects I teach (case- and problem-oriented, master/executive, small classes) but certainly an issue in large classes at the bachelor level, where many answers are easily found online, the students are many, and the subjects introductory in nature.

Here are some strategies to deal with this:

  • Have an academic honesty policy and have the students sign it as part of the exam. This is to make them aware of what they risk if they cheat.
  • Keep the exam time short – three hours at the most – and deliberately ask more questions than usual. This leaves less time for cheating (by collaborating), because collaboration takes time. It also introduces more differentiation between the students – if just a few students manage to answer all the questions, those are the A candidates. Obviously, you need to adjust the grade scale somewhat (you can’t expect everyone to answer everything), and there is an issue of rewarding students who are good at taking exams at the expense of deep learning, but that is the way of all exams.
  • Don’t ask the obvious questions, especially not those asked on previous exams. Sorry, no reuse. Or perhaps a little bit (it is a tiring time.)
  • Tell the students that all answers will be subjected to an automated plagiarism check. Whether this is true or not does not matter – plagiarism checkers are somewhat unreliable, have many false positives, and require a lot of follow-up work – but the threat alone will eliminate much cheating. (Personally, I look for cleverly crafted answers and Google them – amazing what shows up…)
  • Tell the students that after the written exam, they can be called in for an oral exam where they will need to show how they got their answers (if it is a single-answer, mathematically oriented course) or answer more detailed questions (if it is a more analysis- or literature oriented course). Who gets called in (via videoconference) will be partially random and partially based on suspicion. Failing the orals results in failing the course.
  • When you write the questions: If applicable, Google them, look at the most common results, and deliberately reshape the questions so that the answer is not one of those.
  • Use an example for the students to discuss/calculate, preferably one that is fresh from a news source or from a deliberately obscure academic article they have not seen before.
  • Consider giving sub-groups of students different numbers to work from – either automatically (different questions allocated through the exam system) or by having questions like “If your student ID ends in an even number (0,2,4,6,8) answer question 2a, otherwise answer question 2b” (use the student ID, not “birthday in January, February, March…”, as the ID will be the only marker you have; a small sketch of such an allocation follows after this list). The questions may pose the same problem, but with small, unimportant differences such as names, coefficients or other details. This makes it much harder for the students to collaborate. (If you use multiple-choice questions in an electronic context, I assume a number of the tools will have functionality for changing the order of the questions – it would, frankly, astonish me if they did not – but I don’t use multiple choice myself, so I don’t know.)
  • Consider telling the students they will all get different problems (as discussed above) but not doing it. It will still prevent a lot of cheating simply because the students believe they all have different problems and act accordingly.
  • If you have essay questions, ask the students to pick a portion of them and answer only those. I do this on all my exams anyway – give the students 6 questions with short (150-word) answers and ask them to pick 4 and answer only those, and give them 2 or 3 longer questions (400 words or so) and ask them to answer only one. (Make it clear that if they answer them all, only the first answers will be considered.) Again, this makes cheating harder.
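
To illustrate the student-ID allocation above: the mapping can be done deterministically in a few lines. The `assign_variant` helper and the IDs here are made up; any scripting hook in your exam system (if it has one) could do the same job:

```python
# Sketch of deterministic question-variant allocation from a student ID.
# The function and the example IDs are hypothetical.
def assign_variant(student_id: str, n_variants: int = 2) -> str:
    """Map a student ID to a variant: even last digit -> 2a, odd -> 2b, etc."""
    last_digit = int(student_id.strip()[-1])
    return chr(ord("a") + last_digit % n_variants)

for sid in ["104562", "104563", "208770"]:
    print(sid, "-> question 2" + assign_variant(sid))
# 104562 -> question 2a
# 104563 -> question 2b
# 208770 -> question 2a
```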

Lastly: You can’t eliminate cheating in regular, physical exams, so don’t think you can do it in online exams. But you certainly can increase the disincentives to do so, and that is the most you can hope for.

Department for future ideas
I have always wanted to use machine learning for grading exams. At BI, we have some exams with 6,000 candidates writing textual answers. Grading these surely must constitute cruel and unusual punishment. With my eminent colleague Chandler Johnson I tried to start a project where we would have graders grade 1,000 of these exams, then use text recognition and other tools to build an ML model and use that to grade the rest. Worth an experiment, surely (a minimal sketch of the idea follows below). The project (like many other ideas) never took off, largely because of difficulties in getting the data, but perhaps this situation will make it possible.
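
For the curious, here is a minimal sketch of how such a grading model could be set up – TF-IDF plus ridge regression is a deliberately simple stand-in for “text recognition and other tools”, and the file and column names are hypothetical:

```python
# Hypothetical sketch: train on human-graded exams, suggest grades for the rest.
# File names and columns ("answer_text", "grade_points") are placeholders.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

graded = pd.read_csv("graded_exams.csv")      # the 1,000 human-graded answers
ungraded = pd.read_csv("ungraded_exams.csv")  # the remaining pile

model = make_pipeline(
    TfidfVectorizer(max_features=20_000, ngram_range=(1, 2)),
    Ridge(alpha=1.0),
)
model.fit(graded["answer_text"], graded["grade_points"])

# Suggested grades for the ungraded answers -- to be spot-checked by humans.
ungraded["suggested_points"] = model.predict(ungraded["answer_text"])
```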

And that would be a good thing…

From notepad: The power and limits of deep learning – Yann LeCun

Warning: These are my notes from an ACM webcast. Misunderstandings, skips, jumps and errors (probably) abound. Caveat emptor.

Notes from “The Power and Limits of Deep Learning,” presented on Thursday, July 11 at 1 PM ET/10 AM PT by Yann LeCun, VP & Chief AI Scientist at Facebook, Silver Professor at NYU, and 2018 ACM A.M. Turing Award Laureate.

Abstract:
Deep Learning (DL) has enabled significant progress in computer perception, natural language understanding, and control. Almost all these successes rely on supervised learning, where the machine is required to predict human-provided annotations, or model-free reinforcement learning, where the machine learns policies that maximize rewards. Supervised learning paradigms have been extremely successful for an increasingly large number of practical applications such as medical image analysis, autonomous driving, virtual assistants, information filtering, ranking, search and retrieval, language translation, and many more. Today, DL systems are at the core of search engines and social networks. DL is also used increasingly widely in the physical and social sciences to analyze data in astrophysics, particle physics, and biology, or to build phenomenological models of complex systems. An interesting example is the use of convolutional networks as computational models of human and animal perception. But while supervised DL excels at perceptual tasks, there are two major challenges to the next quantum leap in AI: (1) getting DL systems to learn tasks without requiring large amounts of human-labeled data; (2) getting them to learn to reason and to act. These challenges motivate some of the most interesting research directions in AI.

Notes:

  • supervised learning works, but requires too many samples
  • convolutional networks: using layers to tease out compositional hierarchy
  • other approaches: reinforcement learning,
    • use convolutional networks and a few other architectural concepts; requires a huge number of interactions with a clearly defined universe – takes 80 hours to reach the performance a human reaches in 15 minutes. In the end it does better than the human, but it takes a long time
    • impractical for non-electronic settings (a self-driving car would need to crash thousands of times)
  • better approach: (deep) multi-layer neural nets
    • alternates linear/non-linear layers
  • supervised machine learning, such as stochastic gradient descent
  • figure out tweaking by computing gradients by back-propagation (automatic differentiation) – a toy numeric sketch of these ideas follows after these notes
  • architecture of neural networks – figure out sparse networks, not using all connections, based on research on visual cortex
    • first using simple cells, then combining them
  • convolutional neural network builds on this idea, but introduces back propagation
    • turn on/off each neuron based on the portion it sees, then combine
  • shows examples through the nineties, such as recognising numbers (for checks)
  • neural networks went out of fashion with AI researchers, even as it was realized that they could recognize multiple objects
  • research on moving robots, did not need training data
  • moving on to autonomous driving by classifying pixels
  • 2010: Deep learning revolution, driven by speech recognition community
    • largely responsible for lowering of errors in SR
  • 2012: (AlexNet) Krizhevsky et al, NIPS 2012, other nets, large networks
  • better and better performance, dramatic increase in number of layers
    • current record: 84% image recognition
    • trying to find the minimal architecture that gives performance
    • Facebook: billions of pictures, each goes through 6 convnets
  • Mask R-CNN: instance segmentation, two-stage detection system, identifies areas of interest and sends them to new networks
  • RetinaNet: One-pass object recognition
  • other works, recognizing background,
  • Applications:
    • image recognition, such as finding femurs (for hip ops) by taking in the whole 3D picture rather than using layers
    • autonomous driving
    • everyone uses convnets
  • Limitation:
    • good for perception, not for reasoning
    • for this: introducing working memory (differentiable associative memory), need to maintain a number of facts, “memory network”, a neural net with an attached network for memory, essentially soft RAM
    • transformer networks, every unit is itself a neural network, works with translation (dynamic convolution)
    • Facebook: dynamic neural nets – networks that put out networks
  • Challenge: How can humans and animals learn so quickly?
    • children learn largely by observation
      • learn about gravity between 6 and 9 months, just by observation
    • solution(?) self-supervised networks
      • not task-directed, comprises most of our own learning (cake example)
      • very large networks (see slide on process)
      • works for speech recognition and text, filling in 15-20% of blanks in text
      • does not work for filling in missing parts of images (yet)
      • works partly for speech recognition
      • summary: works with discrete data (text, partly speech), much more difficult with continuous data, because we do not have good ways of parameterization
        • predicts the average of all possible futures, results in blurry images…
    • Adversarial training: prediction under uncertainty:
      • generator that makes prediction, discriminator that determines whether it is good or not
      • works well for generating images of people who don’t exist, clothes that have not been designed yet
      • important with video prediction for self-driving cars, that is where the demand is
    • Self-supervised forward models: training self-driving cars to predict their environment by adding latent variables, randomly sampled
    • Final slide: Theory follows invention, will deep learning result in a theory of intelligence?

(did not take notes during the question session, should have done (might add them later); talk available at learning.acm.com)
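
To make the core mechanics in these notes concrete – alternating linear and non-linear layers, gradients computed by back-propagation, weights updated by gradient descent – here is a toy NumPy sketch learning XOR. The task, layer sizes and learning rate are all invented for illustration; this is not LeCun’s code, just the textbook mechanics:

```python
# Toy two-layer network: linear -> tanh -> linear -> sigmoid, trained by
# back-propagation and (here full-batch, for simplicity) gradient descent.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)      # XOR

W1 = rng.normal(0, 1, (2, 8)); b1 = np.zeros(8)      # linear layer 1
W2 = rng.normal(0, 1, (8, 1)); b2 = np.zeros(1)      # linear layer 2

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

lr = 0.5
for step in range(10_000):
    # forward pass: alternating linear and non-linear layers
    h = np.tanh(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # backward pass: back-propagation of the squared-error gradient
    d_out = (out - y) * out * (1 - out)      # dL/d(output pre-activation)
    d_W2 = h.T @ d_out
    d_h = (d_out @ W2.T) * (1 - h ** 2)      # tanh derivative
    d_W1 = X.T @ d_h

    # gradient-descent update
    W2 -= lr * d_W2; b2 -= lr * d_out.sum(0)
    W1 -= lr * d_W1; b1 -= lr * d_h.sum(0)

print(out.round(2).ravel())   # approaches [0, 1, 1, 0]
```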

Analytics III: Projects

Together with Chandler Johnson and Alessandra Luzzi, I currently teach a course called Analytics for Strategic Management. In this course (now in its third iteration), executive students work on real projects for real companies, applying various forms of machine learning (big data, analytics, whatever you want to call it) to business problems. We have just finished the second of five modules, and the projects are now defined.

Here is a (mostly anonymised, except for publicly owned companies) list:

  • An IT service company that provides data and analytics wants to predict customer use of their online products, in order to provide better products and tailor them more to the most active customers
  • A gas station chain company wants to predict churn in their business customers, to find ways to keep them (or, if necessary, scale down some of their offerings)
  • An electricity distribution network company wants to identify which of their (recently installed) smart meters are not working properly, to reduce the cost of inspection and increase the quality of
  • A hairdressing chain wants to predict which customers will book a new appointment when they have had their hair done, in order to increase repeat business and build a group of loyal customers
  • A large financial institution wants to identify employees that misuse company information (such as looking at celebrities’ information), in order to increase privacy and data confidentiality
  • NAV IT wants to predict which employees are likely to leave the company, in order to better plan for recruitment and retraining
  • OSL Gardermoen wants to find out which airline passengers are more likely to use the tax-free shop, in order to increase sales (and not bother those who will not use the tax-free shop too much)
  • A bank wants to find out which of their younger customers will need a house loan soon, to increase their market share
  • A TV media company wants to find out which customers are likely to cancel their subscription within a certain time frame, to better tailor their program offering and their marketing
  • A provider of managed data centers wants to predict their customers’ energy needs, to increase the precision of their own and their customers’ energy budgets
  • Ruter (the public transportation umbrella company for the Oslo area) wants to build a model to better predict crowding on buses, to, well, avoid overcrowding
  • Barnevernet wants to build a model to better predict which families are most likely to be approved as foster parents, in order to speed up the qualification process
  • An electrical energy production company wants to build a model to better predict electricity usage in their market, in order to plan their production process better

All in all, a fairly typical set of examples of the use of machine learning and analytics in business – and I certainly like to work with practical examples with very clearly defined benefits. Over the next three modules (to be finished in the Spring) we will take these projects closer to fruition, some to a stage of a completed proposal, some probably all the way to a finished model and perhaps even an implementation.

Interesting interview with Rodney Brooks

Boingboing, which is a fantastic source of interesting stuff to do during Easter vacation, has a long and fascinating interview by Rob Reid with Rodney Brooks, AI and robotics researcher and entrepreneur extraordinaire. Among the things I learned:

  • What the Baxter robot really does well – interacting with humans and not requiring 1/10 mm precision, especially when learning
  • There are not enough workers in manufacturing (even in China); most of the people working there spend their time waiting for some expensive piece of capital equipment to finish
  • The automation infrastructure is really old, still using PLCs that refresh and develop really slowly
  • Robots will be important in health care – preserving people’s dignity by allowing them to drive and stay at home longer, with robots that understand force and softness and can do things such as help people out of bed.
  • He has written an excellent 2018 list of dated predictions on the evolution of robotic and AI technologies, highly readable, especially his discussions on how to predict technologies and that we tend to forget the starting points. (And I will add his blog to my Newsblur list.)
  • He certainly doesn’t think much of the trolley problem, but has a great example to understand the issue of what AI can do, based on what Isaac Newton would think if he were transported to our time and given a smartphone – he would assume that it would be able to light a candle, for instance.

Worth a listen…

Neural networks – explained

As mentioned here a few times, I teach an executive course called Analytics for Strategic Management, as well as a short program (three days) called Decisions from Data: Driving an Organization on Analytics. We have just finished the first version of both of these courses, and it has been a very enjoyable experience. The students (in both courses) have been interested and keen to learn, bringing relevant and interesting problems to the table, and we have managed to do what it said on the tin (I think) – make them better consumers of analytics, capable of having a conversation with the analytics team, employing the right vocabulary and being able to ask more intelligent questions.

Of course, programs of this type do not allow you to dive deep into how things work, though we have been able to demonstrate MySQL, Python and DataRobot, and also give the students an understanding of how rapidly these things are evolving. We have talked about deep learning, for instance, but not about how it works.

But that is easy to fix – almost everything about machine learning is available on YouTube and in other web channels, once you know a little bit of the language. For instance, to understand how deep learning works, you can check out a series of videos from Grant Sanderson, who produces very good educational videos on the web site 3Blue1Brown.

(There are follow-up videos: Chapter 2, Chapter 3, and Chapter 3 (formal calculus appendix). This YouTube channel has a lot of other math-related videos, too, including a great explanation of how Bitcoin works, which I’ll have to get into at some point, since I keep being asked why I don’t invest in Bitcoin all the time.)

Of course, you have to be rather interested to dive into this, and it certainly is not required reading for an executive who only wants to be able to talk intelligently to the analytics team. But it is important (and a bit reassuring) to note the mechanisms employed: breaking a very complex problem up into smaller problems, breaking those up into even smaller problems, solving the small problems by programming, then stepping back up. For those of you with high school math: it really isn’t that complicated. Just complicated in layers.

And it is good to know that all this advanced AI stuff really is rather basic math. Just applied in an increasingly complex way, really fast.

Analytics projects

Together with Chandler Johnson and Alessandra Luzzi, I currently teach a course called Analytics for Strategic Management. In this course (now in its second iteration), executive students work on real projects for real companies, applying various forms of machine learning (big data, analytics, whatever you want to call it) to business problems. We have just finished the second of five modules, and the projects are now defined.

Here is a (mostly anonymised) list:

  • The Agency for Public Management and eGovernment (Difi) wants to understand and predict which citizens are likely to reserve themselves against electronic communications from the government. The presumption is that these people may be mostly old, not on electronic media, or in other ways digitally unsophisticated – but that may not be true, so they want to find out.
  • An electric power distribution company wants to investigate power imbalances in the electric grid: In the electric grid, production has to match consumption at all times, or you will get (sometimes rather large) price fluctuations. Can they predict when imbalances (more consumption than production, for instance) will occur, so that they can adjust accordingly?
  • A company in the food and beverage industry wants to offer recommendations to their (business) customers: When you order products from them, how can they suggest other products that may either sell well or differentiate the customer from the competition?
  • A petroleum producing company wants to predict unintended shutdowns and slowdowns in their production infrastructure. Such problems are costly and risky, but predictions are difficult because shutdowns are rather rare – and that creates difficulties with unbalanced data sets (see the sketch after this list).
  • A major bank wants to look into the security profiles of their online customers and investigate whether some customers are less likely to be exposed to security risks (and therefore may be able to use less cumbersome security procedures than others).
  • An insurance company wants to investigate which of their new customers are likely to leave them (churn analysis) – and why. They want to find them early, while there is still time to do something to make them stay.
  • A ship management company wants to investigate the use of certain types of oil and optimise the delivery and use of it. (Though the oil is rather specialised, the ships are large and the expense significant.)
  • Norsk Tipping runs a service helping people who are in danger of becoming addicted to gambling – an important part of their societal responsibility, which they take very seriously. They want to identify which of their customers are most likely to benefit from intervention. This is a rather tricky and interesting problem – you need to identify not only those who are likely to become addicted, but also make a judgement as to whether the intervention (of which there is limited capacity) is likely to help.
  • A major health club chain wants to identify customers who are not happy with their services, and they want to find them early, so they can make offers to activate them and make them stay.
  • A regional bank wants to identify customers who are about to leave them, particularly those who want to move their mortgage somewhere else. (This is also a problem of unbalanced data sets, since most customers stay.)
  • A major electronic goods retailer wants to do market basket analysis to be able to recommend and stock products that customers are likely to buy together with others.
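
On the unbalanced-data problem that the shutdown and mortgage projects share: one common first step is to re-weight the rare class rather than train on raw counts. A minimal sketch on simulated data (the ~2% event rate and the features are invented; in practice you would of course evaluate on a held-out set):

```python
# Sketch of one standard trick for unbalanced data sets: re-weight the rare
# class instead of training on raw counts. All data here is simulated.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
n = 10_000
X = rng.normal(size=(n, 5))
# a rare event (~2% positives), loosely driven by the first two features
p = 1 / (1 + np.exp(-(X[:, 0] + X[:, 1] - 4)))
y = rng.binomial(1, p)

# class_weight="balanced" scales each class by 1 / its frequency, so the
# ~2% of rare events count as much as the ~98% of normal observations.
model = LogisticRegression(class_weight="balanced").fit(X, y)
print(classification_report(y, model.predict(X), digits=3))
```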

All in all, a fairly typical set of examples of the use of machine learning and analytics in business – and I certainly like to work with practical examples with very clearly defined benefits. Now – a small matter of implementation!

Big Data and analytics – briefly

Data and data analytics are becoming more and more important for companies and organizations. Are you wondering what data and data science might do for your company? Welcome to a three-day ESP (Executive Short Program) called Decisions from Data: Driving an Organization with Analytics. It will take place at BI Norwegian Business School from December 5-7 this year. The short course is an offshoot from our very popular executive programs Analytics for Strategic Management, which are fully booked. (Check this list (Norwegian) for a sense of what those students are doing.)

Decisions from Data is aimed at managers who are curious about Big Data and data science and want an introduction and an overview, without having to take a full course. We will talk about and show various forms of data analysis, discuss the most important obstacles to becoming a data-driven organization and how to deal with data scientists, and, of course, give lots of examples of how to compete with analytics. The course will not be tech-heavy, but we will look at and touch a few tools, just to get an idea of what we are asking those data scientists to do.

The whole thing will be in English, because, well, the (in my humble opinion) best people we have on this (Chandler Johnson and Alessandra Luzzi) are from the USA and Italy, respectively. As for myself, I tag along as best I can…

Welcome to the data revolution – it starts here!

Big Data in practice

(This is a translation of an earlier post in my Norwegian blog. This translation was done by Ragnvald Sannes using Google Translate with a few amendments. This technology malarky is getting better and better, isn’t it?).
I have just finished teaching four days of data analytics – proper programming and data collection. We (Chandler, Alessandra and the undersigned) have managed to trick over 30 executives and middle managers in Norway into attending a programming and statistics course (more or less, this is actually what analytics basically is), while sort of wondering how we did that. The students are motivated and hard-working and ask many smart questions – in a course taught in English. It is almost enough to make me stop complaining about the state of the world and education and other things.
Anyway – what are these students going to do with this course? We are working on real projects, in the sense that we require people to come up with a problem from their own job – preferably something that is actually important and where deep data analysis can make a difference. This has worked for almost all the groups: they work on real issues in real organizations – and that is incredibly fun for the teacher. Here is a list of the projects, so judge for yourself. (I do not identify any students here, but believe me – these people face these issues every day.) Well worth spending time on:
  • What is the correct price for newly built homes? A group is working to figure out how to price homes that are not built yet, for a large residential building company.
  • What is the tax effect of the sharing economy? This group (where one student works for the Tax Administration) tries to figure out how to identify people who cheat on their taxes as Uber drivers – while making suggestions on how tax rules can be adapted to make it easy to follow the law.
  • What characterizes successful consulting proposals? A major consulting firm wants to use data from their CRM system (which documents the bidding process) to understand what kind of projects they will win or lose.
  • How to recognize money laundering transactions? A bank wants to find out if any of their customers are doing money laundering through online gaming companies.
  • How to offer benefits to customers with automated analysis? A company that supplies stock trading terminals wants to use data analysis to create a competitive edge.
  • How to segment Norwegian shareholders? A company that offers online trading of shares wants to identify segments of its customers to pinpoint and improve its marketing strategy.
  • How to lower costs and reduce the risk of production stoppages in a process business? A hydropower company wants to better understand when and why their power stations need repairs or maintenance.
  • How to identify customers who are in the process of terminating? A TV company wants to understand what characterizes “churn” – how can they identify customers who are about to leave them?
  • Why are some wines more popular than others? A group will work with search data from a wine site to find out what makes some wines more sought after than others.
  • Which customers will buy a new product? A group is working on data from a large bank that wants to offer its existing customers more services.
  • How to increase the recycling rate for waste in Oslo? REN – Oslo’s municipal trash service – wants to find out if you can organize routes and routines differently to better utilize trash trucks and recycling plants.
  • How to avoid selling out of promotional items? One of Norway’s largest grocery chains wishes to improve its ordering routines so that customers do not come to the store and find that there is nothing left of the promotional item they wanted.
  • How to model fraud risk in maritime insurance? An insurance company wants to build a model to understand how to find customers attempting to defraud companies or authorities.
  • Which customers are about to leave us? A large transport company wants to find out which customers are about to go to a competitor so that they can take action before it happens.
  • What characterizes students who drop out? BI enrolls 3,500 new students each year, but some of them leave after the first year. How can we find evidence that a student is about to drop out?
Common to all the projects – and this is true of all the student projects I have advised since I started in this industry – is that you start with a big question and reduce it to something that can actually be answered. Then you look for data and find that you need to reduce the question even more. Then you discover that the data is either missing, unreliable or inadequate – and you have to figure out what to do about it. Finally, after about 90% of the time and money budget is gone, you can begin to think about analysis. And then there is a risk that you find nothing…
And that is an important lesson of this course: the goal is for the students to know enough about actual data analysis to ask the right questions and have a realistic expectation of what kind of answers they can actually get.
There is a great demand for this course – so we have set up an additional course this fall. See you there!

Key myths about analytics

My excellent colleagues Alessandra Luzzi and Chandler Johnson have pointed me to this video, a keynote speech from 2015 by Ken Rudin, head of analytics at Facebook:

This is a really good speech, and almost an advertisement for our course Analytics for Strategic Management, which starts in two days (and, well, sorry, it is full, but will be arranged again next year.)

In the talk (starting about 1:30 in), Ken breaks down four common myths surrounding Big Data:

  1. Big Data does not necessarily imply use of certain tools, in particular Hadoop. Hadoop can sift through mountains of data, but other tools, such as relational databases, are better at ad hoc analysis once you have structured the data and determined which parts of it are interesting and worth analyzing.
  2. Big Data does not always provide better answers. Big Data will give you more answers, but, as Rudin says, can give you “brilliant answers to questions that no one cares about.” He stated that the best way to better answers is to formulate better questions, which requires hiring smart people with “business savvy” who will ask how to solve real business problems. Also, you need to place the data analysts out in the organization, so they understand how the business runs and what is important. He advocates an embedded model – centrally organized analysts sitting geographically with the people they are helping.
  3. Data Science is not all science. A lot of data science has an “art” to it, and you have to have a balance. Having a common language between business and analytics is important here – and Facebook sends its people to a two-week “Data Camp” to learn that. You need to avoid the “HiPPO” problem – the highest paid person’s opinion – essentially, not enough science. The other side is the “groundhog” issue – based on the movie Groundhog Day – where the main character tries to win the girl by gradual experimentation. Data is like sandpaper – it cannot create a good idea, but it can shape it after it has been created.
  4. The goal of analytics is not insights, but results. To that end, data scientists have to help making sure that people act on the analysis, not just inform them. “An actionable insight that nobody acts on has no value.”

To the students we’ll meet on Tuesday: This is not a bad way of gearing up for the course. To anyone else interested in analytics and Big Data: This video is recommended.

(And if you think, like I do, that this sounds like the discussion of what IT should be in an organization 20 years ago – well, fantastic, then we know what problems to expect and how to act on them.)

Analytics for Strategic Management

I am starting a new executive course, Analytics for Strategic Management, with my young and very talented colleagues Alessandra Luzzi and Chandler Johnson (both with the Center for Digitization at BI Norwegian Business School).


Alessandra Luzzi


Chandler Johnson

The course (over five modules) is aimed at managers who want to become sophisticated consumers of analytics (be it Big Data or the more regular kind). The idea is to learn just enough analytics that you know what to ask for and where the pressure points are (so you do not ask for things that cannot be done or will be prohibitively expensive). The participants will learn from cases, discussions, live examples and assignments.

Central to the course is an analytics project, where the participants will seek out data from their own company (or, since it will be group work, someone else’s), figure out what they can do with the data, and end up, if not with a finished analysis (that might happen), at least with a well-developed project specification.

The course will contain quite a bit of analytics – including a spot of Python and R programming – again, so that the executives taking it will know what they are asking for and what is being done.

We were a bit nervous about offering this course – a technically oriented course with a February start date. The response, however, has been excellent, with more than 20 students signed up already. In fact, we will probably be capping the course at 30 participants, simply because this is the first time we are teaching it – we will be doing everything for the first time and will undoubtedly change many things as we go along.

If you can’t do the course this year – here are a few starting pointers to whet your appetite:

  • Big Data is difficult to define. This is always the case with fashionable monikers – for instance, how big is “big”? – but good ol’ Wikipedia comes to the rescue, with an excellent introductory article on the concept. For me, Big Data has always been about having the entire data set instead of a sample (i.e., n = all), but I can certainly see the other dimensions of delineation suggested here.
  • Data analytics can be very profitable (PDF), but few companies manage to really mine their data for insights and actions. That’s great – more upside for those who really want to do it!
  • Data may be big but often is bad, causing data scientists to spend most of their time fixing errors, cleaning things up and, in general, preparing for analytics rather than doing the analysis itself. Sometimes you can almost smell that the data is bad – I recommend The Quartz guide to bad data as a great list of indicators that something is amiss (a small sketch of typical sanity checks follows after this list).
  • Data scientists are few, far between and expensive. There is a severe shortage of people with data analysis skills in Norway and elsewhere, and the educational systems (yours truly excepted, of course) are not responding. Good analysts are expensive. Cheap analysts – well, you get what you pay for. And, quite possibly, some analytics you may like, but not what you ought to get.
  • There is lots of data, but a shortage of models. Though you may have the data and the data scientists, that does not mean that you have good models. It is actually a problem that as soon as you have numbers – even if they are bad – they become a focal point for decision makers, who show a marked reluctance to ask where the data comes from, what it actually means, and how the models were constructed.
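
To make the “bad data” point concrete, here is a hedged sketch of the kind of sanity checks the Quartz guide advocates, run before any modelling – the file and column names (“orders.csv”, “order_date”, “amount”) are placeholders:

```python
# Hypothetical pre-modelling sanity checks on a transactional data set.
import pandas as pd

df = pd.read_csv("orders.csv", parse_dates=["order_date"])

report = {
    "rows": len(df),
    "duplicate rows": df.duplicated().sum(),
    "missing values per column": df.isna().sum().to_dict(),
    "dates in the future": (df["order_date"] > pd.Timestamp.today()).sum(),
    "negative amounts": (df["amount"] < 0).sum(),
}
for check, value in report.items():
    print(f"{check}: {value}")

# Suspiciously dominant or round values are also classic smells:
print(df["amount"].value_counts().head())
```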

And with that – if you are a participant, I look forward to seeing you in February. If you are not – well, you had better boogie over to BI’s web pages and sign up.