Category Archives: Big Data

On videoconferencing and security


Yesterday began with a message from a business executive who was concerned about the security of Zoom, the video conferencing platform that many companies (and universities) have landed on. The reason was a newspaper article regurgitating several internet articles, partly about functionality that has been adequately documented by Zoom, and partly about security holes that were fixed long ago.

So is there any reason to be concerned about Zoom or Whereby or Teams or Hangouts or all the other platforms?

My answer is “probably not” – at least not for the security holes discussed here, and not for ordinary users (which includes most small- to medium-sized companies I know about).

It is true that video conferencing introduces some security and privacy issues, but realistically, the biggest problem is not the technology but the people using it (something we nerds refer to as PEBKAC – Problem Exists Between Keyboard and Chair).

When a naked man sneaks into an elementary school class via Whereby, as happened a few days ago here in Norway, it is not due to technology problems, but because the teacher had left the door wide open, i.e., had not turned on the function that makes it necessary to “knock” and ask for permission to enter.

When anyone can record (and have the dialogue automatically transcribed) from Zoom, it is because the host has not turned off the recording feature. By the way, anyone can record a video conference with screen capture software (such as Camtasia), a sound recorder or for that matter a cell phone, and no (realistic) security system in the world can do anything about it.

When the boss can monitor that people are not using other software while sitting in a meeting (a feature that can be completely legitimate in a classroom – it is equivalent to the teacher looking out over the class to see if the students are awake), well, I don’t think the system is to blame for that either. Any leader who holds meetings so irrelevant that people do not bother to pay attention should rethink their communications strategy. No executive I know would have either the time or the interest to activate this feature – because if you need technology to force people to wake up, you don’t have a problem technology can solve.

The risk of a new tool should not be measured against some perfect solution, but against the alternative if you don’t have it. Right now, video conferencing is the easiest and best tool for many – and that is why we use it. But we have to take the trouble to learn how it works. The best security system in the world is helpless against people writing their password on a Post-it, visible when they are in a videoconference.

So, therefore – before using the tool – take a tour of the setup page, choose carefully what features you want to use, and think through what you want to achieve by having the meeting.

If that’s hard, maybe you should cancel the whole thing and send an email instead.

Dealing with cheating

At BI Norwegian Business School, we are (naturally and way overdue, but a virus crisis helps) moving all exams to digital. This means a lot of changes for people who have not done it before. One particular anxiety is cheating – normally not a problem in the subjects I teach (case- and problem-oriented, master/executive level, small classes), but it certainly is an issue in large classes at the bachelor level, where many answers are easily found online, the students are many, and the subjects are introductory in nature.

Here are some strategies to deal with this:

  • Have an academic honesty policy and have the students sign it as part of the exam. This is to make them aware of the risk they take if they cheat.
  • Keep the exam time short – three hours at most – and deliberately ask more questions than usual. This leaves less time for cheating (by collaborating), because collaboration takes time. It also introduces more differentiation between the students – if just a few students manage to answer all the questions, those are the A candidates. Obviously, you need to adjust the grading scale somewhat (you can’t expect everyone to answer everything), and there is an issue of rewarding students who are good at taking exams at the expense of deep learning, but that is the way of all exams.
  • Don’t ask the obvious questions, especially not those asked on previous exams. Sorry, no reuse. Or perhaps a little bit (it is a tiring time.)
  • Tell the students that all answers will be subjected to an automated plagiarism check. Whether this is true or not does not matter – plagiarism checkers are somewhat unreliable, have many false positives, and require a lot of follow-up work – but just the threat will eliminate much cheating. (Personally, I look for cleverly crafted answers and Google them; it is amazing what shows up…)
  • Tell the students that after the written exam, they can be called in for an oral exam where they will need to show how they got their answers (if it is a single-answer, mathematically oriented course) or answer more detailed questions (if it is a more analysis- or literature oriented course). Who gets called in (via videoconference) will be partially random and partially based on suspicion. Failing the orals results in failing the course.
  • When you write the questions: If applicable, Google them, look at the most common results, and deliberately reshape the questions so that the answer is not one of those.
  • Use an example for the students to discuss/calculate, preferably one that is fresh from a news source or from a deliberately obscure academic article they have not seen before.
  • Consider giving sub-groups of students different numbers to work from – either automatically (different questions allocated through the exam system) or by having questions like “If your student ID ends in an even number (0, 2, 4, 6, 8), answer question 2a; otherwise answer question 2b” (use the student ID, not “birthday in January, February, March…”, as the ID will be the only marker you have). The questions may pose the same problem, but with small, unimportant differences such as names or coefficients. This makes it much harder for the students to collaborate; a small scripting sketch follows this list. (If you use multiple-choice questions in an electronic context, I assume a number of the tools will have functionality for changing the order of the questions – it would, frankly, astonish me if they did not – but I don’t use multiple choice myself, so I don’t know.)
  • Consider telling the students they will all get different problems (as discussed above) but not doing it. It still will prevent a lot of cheating simply because the students believe they all have different problems and act accordingly.
  • If you have essay questions, ask the students to pick a portion of them and answer only those. I do this on all my exams anyway – give the students 6 questions with short (150-word) answers and ask them to pick 4 and answer only those, and give them 2 or 3 longer questions (400 words or so) and ask them to answer only one. (Make it clear that if they answer them all, only the first answers will be considered.) Again, this makes cheating harder.
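
For the student-ID allocation above, the rule is trivial to script if you prefer not to do it by eye. A minimal sketch in Python; the IDs and question labels are hypothetical examples, not from any real exam system:

```python
# Minimal sketch: allocate question variants by the last digit of the
# student ID. IDs and question labels are hypothetical.

def assign_question(student_id: str) -> str:
    """Even final digit (0, 2, 4, 6, 8) gets question 2a, odd gets 2b."""
    return "2a" if int(student_id[-1]) % 2 == 0 else "2b"

for sid in ["1047382", "2049571"]:  # hypothetical student IDs
    print(sid, "->", assign_question(sid))
```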

Lastly: You can’t eliminate cheating in regular, physical exams, so don’t think you can do it in online exams. But you certainly can increase the disincentives to do so, and that is the most you can hope for.

Department for future ideas
I have always wanted to use machine learning for grading exams. At BI, we have some exams with 6000 candidates writing textual answers. Grading these surely must constitute cruel and unusual punishment. With my eminent colleague Chandler Johnson, I tried to start a project where we would have graders grade 1000 of these exams, then use text recognition and other tools to build an ML model, and use that model to grade the rest. Worth an experiment, surely. The project (like many other ideas) never took off, largely because of difficulties in getting the data, but perhaps this situation will make it possible.
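
The rough shape of the idea: extract text features from the human-graded answers, fit a simple model, and predict grades for the rest. A minimal sketch, assuming scikit-learn; the data, features, and model choice here are placeholders, not the project’s actual design:

```python
# Minimal sketch of the grading idea: learn from human-graded answers,
# then predict grades for the rest. All data here is hypothetical; a
# real project would need far more careful features and validation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

graded_answers = ["first graded answer ...", "second graded answer ..."]
human_grades = [4.0, 2.5]           # grades assigned by the human graders

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), Ridge())
model.fit(graded_answers, human_grades)

remaining_answers = ["an ungraded answer ..."]
print(model.predict(remaining_answers))   # machine-suggested grades
```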

And that would be a good thing…

From notepad: The power and limits of deep learning – Yann LeCun

Warning: These are my notes from an ACM webcast. Misunderstandings, skips, jumps and errors (probably) abound. Caveat emptor.

Notes from “The Power and Limits of Deep Learning,” presented on Thursday, July 11 at 1 PM ET/10 AM PT by Yann LeCun, VP & Chief AI Scientist at Facebook, Silver Professor at NYU, and 2018 ACM A.M. Turing Award Laureate.

Abstract:
Deep Learning (DL) has enabled significant progress in computer perception, natural language understanding, and control. Almost all these successes rely on supervised learning, where the machine is required to predict human-provided annotations, or model-free reinforcement learning, where the machine learns policies that maximize rewards. Supervised learning paradigms have been extremely successful for an increasingly large number of practical applications such as medical image analysis, autonomous driving, virtual assistants, information filtering, ranking, search and retrieval, language translation, and many more. Today, DL systems are at the core of search engines and social networks. DL is also used increasingly widely in the physical and social sciences to analyze data in astrophysics, particle physics, and biology, or to build phenomenological models of complex systems. An interesting example is the use of convolutional networks as computational models of human and animal perception. But while supervised DL excels at perceptual tasks, there are two major challenges to the next quantum leap in AI: (1) getting DL systems to learn tasks without requiring large amounts of human-labeled data; (2) getting them to learn to reason and to act. These challenges motivate some of the most interesting research directions in AI.

Notes:

  • supervised learning works, but requires too many samples
  • convolutional networks: using layers to tease out compositional hierarchy
  • other approaches: reinforcement learning,
    • uses convolutional networks and a few other architectural concepts; requires a huge number of interactions with a clearly defined universe – takes 80 hours to reach a level of performance a human reaches in 15 minutes. In the end it does better than the human, but it takes a long time
    • impractical for non-electronic settings (a self-driving car would need to crash thousands of times)
  • better approach: (deep) multi-layer neural nets
    • alternates linear/non-linear layers
  • supervised machine learning, trained with, e.g., stochastic gradient descent
  • figure out the tweaking by computing gradients via back-propagation (automatic differentiation) – see the sketch at the end of these notes
  • architecture of neural networks – figure out sparse networks, not using all connections, based on research on visual cortex
    • first using simple cells, then combining them
  • convolutional neural network builds on this idea, but introduces back propagation
    • turn on/off each neuron based on the portion it sees, then combine
  • shows examples through the nineties, such as recognising numbers (for checks)
  • neural networks fell out of fashion with AI researchers, even as it was realized that they could recognize multiple objects
  • research on moving robots, did not need training data
  • moving on to autonomous driving by classifying pixels
  • 2010: Deep learning revolution, driven by speech recognition community
    • largely responsible for lowering of errors in SR
  • 2012: AlexNet (Krizhevsky et al., NIPS 2012), other nets, large networks
  • better and better performance, dramatic increase in number of layers
    • current record: 84% image recognition
    • trying to find the minimal architecture that gives performance
    • Facebook: billions of pictures, each goes through 6 convnets
  • Mask R-CNN: instance segmentation, two-stage detection system, identifies areas of interest and sends them to new networks
  • RetinaNet: One-pass object recognition
  • other works, recognizing background,
  • Applications:
    • image recognition, such as finding femurs (for hip ops) by taking in the whole 3D picture rather than using layers
    • autonomous driving
    • everyone uses convnets
  • Limitation:
    • good for perception, not for reasoning
    • for this: introducing working memory (differentiable associative memory), need to maintain a number of facts, “memory network”, a neural net with an attached network for memory, essentially soft RAM
    • transformer networks, every unit is itself a neural network, works with translation (dynamic convolution)
    • Facebook: dynamic neural nets – networks that produce networks
  • Challenge: How can humans and animals learn so quickly?
    • children learn largely by observation
      • learn about gravity between 6 and 9 months, just by observation
    • solution(?) self-supervised networks
      • not task-directed, comprises most of our own learning (cake example)
      • very large networks (see slide on process)
      • works for speech recognition and text, filling in 15-20% of blanks in text
      • does not work for filling in missing parts of images (yet)
      • works partly for speech recognition
      • summary: works with discrete data (text, partly speech), much more difficult with continuous data, because we do not have good ways of parameterization
        • predicts the average of all possible futures, results in blurry images…
    • Adversarial training: prediction under uncertainty:
      • generator that makes prediction, discriminator that determines whether it is good or not
      • works well for generating images of people that don’t exist, or clothes that have not been designed yet
      • important for video prediction for self-driving cars – that is where the demand is
    • Self-supervised forward models: training self-driving cars to predict their environment by adding latent variables, randomly sampled
    • Final slide: Theory follows invention, will deep learning result in a theory of intelligence?

(I did not take notes during the question session, though I should have done – I might add them later. The talk is available at learning.acm.com.)
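
To make the convnet, gradient-descent and back-propagation bullets concrete – the sketch promised above, my own illustration rather than anything from the talk – here is a minimal PyTorch network with alternating linear and non-linear layers, trained for one step of stochastic gradient descent on random data:

```python
# Minimal sketch: a small convolutional net with alternating linear and
# non-linear layers, trained by SGD with gradients from back-propagation.
# The data is random and the architecture is made up for illustration.
import torch
import torch.nn as nn

net = nn.Sequential(
    nn.Conv2d(1, 8, kernel_size=3, padding=1),  # convolution (linear)
    nn.ReLU(),                                  # non-linearity
    nn.MaxPool2d(2),                            # pool/combine "simple cells"
    nn.Flatten(),
    nn.Linear(8 * 14 * 14, 10),                 # classifier head
)

opt = torch.optim.SGD(net.parameters(), lr=0.01)  # stochastic gradient descent
loss_fn = nn.CrossEntropyLoss()

images = torch.randn(32, 1, 28, 28)    # a fake batch of 28x28 images
labels = torch.randint(0, 10, (32,))   # fake labels

loss = loss_fn(net(images), labels)
opt.zero_grad()
loss.backward()   # back-propagation (automatic differentiation)
opt.step()        # tweak the weights along the negative gradient
```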

Analytics III: Projects

Together with Chandler Johnson and Alessandra Luzzi, I currently teach a course called Analytics for Strategic Management. In this course (now in its third iteration), executive students work on real projects for real companies, applying various forms of machine learning (big data, analytics, whatever you want to call it) to business problems. We have just finished the second of five modules, and the projects are now defined.

Here is a (mostly anonymised, except for publicly owned companies) list:

  • An IT service company that provides data and analytics wants to predict customer use of their online products, in order to provide better products and tailor them more to the most active customers
  • A gas station chain company wants to predict churn in their business customers, to find ways to keep them (or, if necessary, scale down some of their offerings)
  • An electricity distribution network company wants to identify which of their (recently installed) smart meters are not working properly, to reduce the cost of inspection and increase the quality of
  • A hairdressing chain wants to predict which customers will book a new appointment when they have had their hair done, in order to increase repeat business and build a group of loyal customers
  • A large financial institution wants to identify employees that misuse company information (such as looking at celebrities’ information), in order to increase privacy and data confidentiality
  • NAV IT wants to predict which employees are likely to leave the company, in order to better plan for recruitment and retraining
  • OSL Gardermoen wants to find out which airline passengers are more likely to use the tax-free shop, in order to increase sales (and to avoid bothering those who will not use the tax-free shop too much)
  • A bank wants to find out which of their younger customers will need a house loan soon, to increase their market share
  • A TV media company wants to find out which customers are likely to cancel their subscription within a certain time frame, to better tailor their program offering and their marketing
  • A provider of managed data centers wants to predict their customers’ energy needs, to increase the precision of their own and their customers’ energy budgets
  • Ruter (the public transportation umbrella company for the Oslo area) wants to build a model to better predict crowding on buses, to, well, avoid overcrowding
  • Barnevernet (the Norwegian child welfare service) wants to build a model to better predict which families are most likely to be approved as foster parents, in order to speed up the qualification process
  • an electrical energy production company wants to build a model to better predict electricity usage in their market, in order to plan their production process better

All in all, a fairly typical set of examples of the use of machine learning and analytics in business – and I certainly like to work with practical examples with very clearly defined benefits. Over the next three modules (to be finished in the spring) we will take these projects closer to fruition, some to the stage of a completed proposal, some probably all the way to a finished model and perhaps even an implementation.
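
Most of these projects have the same supervised-learning shape: a table of historical observations, a label (churned or stayed, booked or not), and a classifier that scores new cases. A minimal sketch of that shape – the features and data below are fabricated for illustration, not from any of the projects:

```python
# Minimal sketch of a churn-type classifier, the shape most of the
# projects above share. Features and labels are fabricated.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(1, 60, n),     # hypothetical feature: tenure in months
    rng.uniform(10, 500, n),    # hypothetical feature: monthly spend
    rng.integers(0, 10, n),     # hypothetical feature: support tickets
])
y = (rng.random(n) < 0.1).astype(int)   # 1 = churned, 0 = stayed (random here)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("AUC:", roc_auc_score(y_test, model.predict_proba(X_test)[:, 1]))
```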

Neural networks – explained

As mentioned here a few times, I teach an executive course called Analytics for Strategic Management, as well as a short program (three days) called Decisions from Data: Driving an Organization on Analytics. We have just finished the first version of both of these courses, and it has been a very enjoyable experience. The students (in both courses) have been interested and keen to learn, bringing relevant and interesting problems to the table, and we have managed to do what it says on the tin (I think) – make them better consumers of analytics, capable of having a conversation with the analytics team, employing the right vocabulary and being able to ask more intelligent questions.

Of course, programs of this type do not allow you to dive deep into how things work, though we have been able to demonstrate MySQL, Python and DataRobot, and also give the students an understanding of how rapidly these things are evolving. We have talked about deep learning, for instance, but not about how it works.

But that is easy to fix – almost everything about machine learning is available on YouTube and in other web channels, once you know a little bit of the language. For instance, to understand how deep learning works, you can check out a series of videos from Grant Sanderson, who produces very good educational videos on the website 3Blue1Brown.

(There are follow-up videos: Chapter 2, Chapter 3, and Chapter 3’s formal calculus appendix. This YouTube channel has a lot of other math-related videos, too, including a great explanation of how Bitcoin works, which I’ll have to get into at some point, since I keep being asked why I don’t invest in Bitcoin all the time.)

Of course, you have to be rather interested to dive into this, and it certainly is not required reading for an executive who only wants to be able to talk intelligently to the analytics team. But it is important (and a bit reassuring) to note the mechanisms employed: breaking a very complex problem up into smaller problems, breaking those up into even smaller problems, solving the small problems by programming, and then stepping back up. For those of you with high school math: it really isn’t that complicated. Just complicated in layers.
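
To see how un-mysterious the layers are, here is the complete computation of a tiny two-layer network in a few lines of Python – just multiplication, addition and a cutoff at zero. The numbers are arbitrary, purely for illustration:

```python
# A toy two-layer neural network, forward pass only: each layer is a
# matrix multiplication, an addition, and a simple non-linearity.
import numpy as np

def layer(x, W, b):
    return np.maximum(0, W @ x + b)   # multiply, add, clip negatives to zero

x = np.array([0.5, -1.0, 2.0])        # input, e.g. three pixel values
W1 = np.random.randn(4, 3)            # first layer: 3 inputs -> 4 outputs
b1 = np.zeros(4)
W2 = np.random.randn(2, 4)            # second layer: 4 -> 2
b2 = np.zeros(2)

hidden = layer(x, W1, b1)             # layer 1
output = W2 @ hidden + b2             # layer 2 (no non-linearity at the end)
print(output)
```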

And it is good to know that all this advanced AI stuff really is rather basic math. Just applied in an increasingly complex way, really fast.

Analytics projects

Together with Chandler Johnson and Alessandra Luzzi, I currently teach a course called Analytics for Strategic Management. In this course (now in its second iteration), executive students work on real projects for real companies, applying various forms of machine learning (big data, analytics, whatever you want to call it) to business problems. We have just finished the second of five modules, and the projects are now defined.

Here is a (mostly anonymised) list:

  • The Agency for Public Management and eGovernment (Difi) wants to understand and predict which citizens are likely to reserve themselves against electronic communications from the government. The presumption is that these people may be mostly old, not on electronic media, or in other ways digitally unsophisticated – but that may not be true, so they want to find out.
  • An electric power distribution company wants to investigate power imbalances in the electric grid: in the grid, production has to match consumption at all times, or you will get (sometimes rather large) price fluctuations. Can they predict when imbalances (more consumption than production, for instance) will occur, so that they can adjust accordingly?
  • A company in the food and beverage industry wants to offer recommendations to their (business) customers: when you order products from them, how can they suggest other products that may either sell well or differentiate the customer from the competition?
  • A petroleum producing company wants to predict unintended shutdowns and slowdowns in their production infrastructure. Such problems are costly and risky, but predictions are difficult because the events are rather rare – and that creates difficulties with unbalanced data sets (see the sketch after this list).
  • A major bank wants to look into the security profiles of their online customers and investigate whether some customers are less likely to be exposed to security risks (and therefore may be able to use less cumbersome security procedures than others).
  • An insurance company wants to investigate which of their new customers are likely to leave them (churn analysis) – and why. They want to find them early, while there is still time to do something to make them stay.
  • A ship management company wants to investigate the use of certain types of oil and optimise the delivery and use of it. (Though the oil is rather specialised, the ships are large and the expense significant.)
  • Norsk Tipping runs a service helping people who are in danger of becoming addicted to gaming, an important part of their societal responsibility which they take very seriously. They want to identify which of their customers are most likely to benefit from intervention. This is a rather tricky and interesting problem – you need to identify not only those who are likely to become addicted, but also make a judgement as to whether the intervention (of which there is limited capacity) is likely to help.
  • A major health club chain wants to identify customers who are not happy with their services, and they want to find them early, so they can make offers to activate them and make them stay.
  • A regional bank wants to identify customers who are about to leave them, particularly those who want to move their mortgage somewhere else. (This is also a problem of unbalanced data sets, since most customers stay.)
  • A major electronic goods retailer wants to do market basket analysis to be able to recommend and stock products that customers are likely to buy together with others.
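
Two of these projects run into the unbalanced-data-set problem: the event to predict (a shutdown, a departing mortgage customer) is rare, so a naive model can score well by always predicting “nothing happens”. One standard first response is to weight the rare class up when fitting – a minimal sketch, with fabricated data:

```python
# Minimal sketch: handling an unbalanced data set by up-weighting the
# rare class. The data is fabricated; ~3% of the labels are positive.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))               # fake feature matrix
y = (rng.random(1000) < 0.03).astype(int)    # rare event labels

# class_weight="balanced" scales each class by the inverse of its
# frequency, so the few positives are not drowned out by the negatives
model = LogisticRegression(class_weight="balanced")
model.fit(X, y)
```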

All in all, a fairly typical set of examples of the use of machine learning and analytics in business – and I certainly like to work with practical examples with very clearly defined benefits. Now – a small matter of implementation!

A tour de Fry of technology evolution

There are many things to say about Stephen Fry, but it is enough to show this video, filmed at Nokia Bell Labs, explaining, amongst other things, the origin of microchips, the power of exponential growth, and the adventure and consequences of evolving performance and functionality. I am beginning to think that “the apogee, the acme, the summit of human intelligence” might actually be Stephen himself:

(Of course, the most impressive feat is his easy banter on hard questions after the talk itself. Quotes like: “[and] who is to program any kind of moral [into computers]… If [the computer] dives into the data lake and learns to swim, which is essentially what machine learning is, it’s just diving in and learning to swim, it may pick up some very unpleasant sewage.”)