From notepad: The power and limits of deep learning – Yann LeCun

Warning: These are my notes from an ACM webcast. Misunderstandings, skips, jumps and errors (probably) abound. Caveat emptor.

Notes from “The Power and Limits of Deep Learning,” presented on Thursday, July 11 at 1 PM ET/10 AM PT by Yann LeCun, VP & Chief AI Scientist at Facebook, Silver Professor at NYU, and 2018 ACM A.M. Turing Award Laureate.

Deep Learning (DL) has enabled significant progress in computer perception, natural language understanding, and control. Almost all these successes rely on supervised learning, where the machine is required to predict human-provided annotations, or model-free reinforcement learning, where the machine learns policies that maximize rewards. Supervised learning paradigms have been extremely successful for an increasingly large number of practical applications such as medical image analysis, autonomous driving, virtual assistants, information filtering, ranking, search and retrieval, language translation, and many more. Today, DL systems are at the core of search engines and social networks. DL is also used increasingly widely in the physical and social sciences to analyze data in astrophysics, particle physics, and biology, or to build phenomenological models of complex systems. An interesting example is the use of convolutional networks as computational models of human and animal perception. But while supervised DL excels at perceptual tasks, there are two major challenges to the next quantum leap in AI: (1) getting DL systems to learn tasks without requiring large amounts of human-labeled data; (2) getting them to learn to reason and to act. These challenges motivate some of the most interesting research directions in AI.


  • supervised learning works, but requires too many samples
  • convolutional networks: using layers to tease out compositional hierarchy
  • other approaches: reinforcement learning
    • uses convolutional networks and a few other architectural concepts; requires a huge number of interactions with a clearly defined universe – takes 80 hours to reach the performance a human reaches in 15 minutes. In the end it does better than the human, but it takes a long time
    • impractical for non-electronic settings (a self-driving car would need to crash thousands of times)
  • better approach: (deep) multi-layer neural nets
    • alternates linear/non-linear layers
  • supervised machine learning, trained with methods such as stochastic gradient descent
  • figure out how to tweak the weights by computing gradients with back-propagation (automatic differentiation)
  • architecture of neural networks – figure out sparse networks, not using all connections, based on research on visual cortex
    • first using simple cells, then combining them
  • convolutional neural network builds on this idea, but introduces back-propagation
    • turn on/off each neuron based on the portion it sees, then combine
  • shows examples through the nineties, such as recognising numbers (for checks)
  • neural networks out of fashion with AI researchers, realized that they could recognize multiple objects
  • research on moving robots, did not need training data
  • moving on to autonomous driving by classifying pixels
  • 2010: Deep learning revolution, driven by speech recognition community
    • largely responsible for lowering of errors in SR
  • 2012: (AlexNet) Krizhevsky et al, NIPS 2012, other nets, large networks
  • better and better performance, dramatic increase in number of layers
    • current record: 84% image recognition
    • trying to find the minimal architecture that gives performance
    • Facebook: billions of pictures, each goes through 6 convnets
  • Mask R-CNN: instance segmentation, two-stage detection system, identifies areas of interest and sends them to new networks
  • RetinaNet: One-pass object recognition
  • other works, recognizing background
  • Applications:
    • image recognition, such as finding femurs (for hip ops) by taking in the whole 3D picture rather than using layers
    • autonomous driving
    • everyone uses convnets
  • Limitation:
    • good for perception, not for reasoning
    • for this: introducing working memory (differentiable associative memory), need to maintain a number of facts, “memory network”, a neural net with an attached network for memory, essentially soft RAM
    • transformer networks, every unit is itself a neural network, works with translation (dynamic convolution)
    • Facebook: dynamic neural nets: networks that put out networks
  • Challenge: How can humans and animals learn so quickly?
    • children learn largely by observation
      • learn about gravity between 6 and 9 months, just by observation
    • solution(?) self-supervised networks
      • not task-directed, comprises most of our own learning (cake example)
      • very large networks (see slide on process)
      • works for speech recognition and text, filling in 15-20% of blanks in text
      • does not work for filling in missing parts of images (yet)
      • works partly for speech recognition
      • summary: works with discrete data (text, partly speech), much more difficult with continuous data, because we do not have good ways of parameterization
        • predicts the average of all possible futures, results in blurry images…
    • Adversarial training: prediction under uncertainty:
      • generator that makes prediction, discriminator that determines whether it is good or not
      • works well for generating images of people that don’t exist, clothes that have not been designed yet
      • important with video prediction for self-driving cars, that is where the demand is
    • Self-supervised forward models: training self-driving cars to predict their environment by adding latent variables, randomly sampled
    • Final slide: Theory follows invention, will deep learning result in a theory of intelligence?
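
To make the bullets on multi-layer nets, stochastic gradient descent and back-propagation concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from the talk: a linear layer, a tanh non-linearity and a second linear layer, with the gradients worked out by the chain rule. The function names, the tanh choice and the squared-error loss are all assumptions.

```python
import math
import random


def init_params(n_in, n_hidden, seed=0):
    """Random small weights for: linear -> tanh -> linear (scalar output)."""
    rng = random.Random(seed)
    W1 = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_hidden)]
    b1 = [0.0] * n_hidden
    w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
    b2 = 0.0
    return W1, b1, w2, b2


def forward(x, W1, b1, w2, b2):
    """Alternate a linear layer, a non-linearity, and another linear layer."""
    h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
         for row, b in zip(W1, b1)]
    y = sum(w * hi for w, hi in zip(w2, h)) + b2
    return h, y


def loss(x, t, params):
    """Squared error between the network output and the target t."""
    _, y = forward(x, *params)
    return 0.5 * (y - t) ** 2


def backprop(x, t, params):
    """Gradient of the loss w.r.t. every parameter, via the chain rule."""
    W1, b1, w2, b2 = params
    h, y = forward(x, W1, b1, w2, b2)
    dy = y - t                                   # dL/dy
    g_w2 = [dy * hi for hi in h]                 # output weights
    g_b2 = dy                                    # output bias
    # Back through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    dz = [dy * w * (1.0 - hi * hi) for w, hi in zip(w2, h)]
    g_W1 = [[d * xi for xi in x] for d in dz]    # hidden weights
    g_b1 = dz                                    # hidden biases
    return g_W1, g_b1, g_w2, g_b2


def sgd_step(params, grads, lr):
    """Move every parameter a small step against its gradient."""
    W1, b1, w2, b2 = params
    g_W1, g_b1, g_w2, g_b2 = grads
    W1 = [[w - lr * g for w, g in zip(r, gr)] for r, gr in zip(W1, g_W1)]
    b1 = [b - lr * g for b, g in zip(b1, g_b1)]
    w2 = [w - lr * g for w, g in zip(w2, g_w2)]
    b2 = b2 - lr * g_b2
    return W1, b1, w2, b2


def train(data, params, lr=0.1, epochs=200, seed=0):
    """Stochastic gradient descent: one randomly ordered sample at a time."""
    rng = random.Random(seed)
    for _ in range(epochs):
        samples = list(data)
        rng.shuffle(samples)
        for x, t in samples:
            params = sgd_step(params, backprop(x, t, params), lr)
    return params
```

Calling `train(data, init_params(2, 3))` with `data` as a list of `(inputs, target)` pairs runs the loop. Real systems leave the gradient bookkeeping to automatic differentiation instead of writing it by hand as done here.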
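
The convolutional-network bullets (each unit looks only at a portion of the input, simple cells first, then combining them) can be sketched as follows. This is a toy illustration under my own assumptions: the kernel, the ReLU non-linearity and the 2x2 max-pooling are common choices, not necessarily the ones shown in the talk.

```python
def conv2d(image, kernel):
    """Slide one shared kernel over every patch of the image ("valid" mode).

    Each output unit only sees the small patch under the kernel, and all
    units reuse the same weights.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[a][b] * image[i + a][j + b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(fmap):
    """Switch a unit off (0) unless its patch matched the kernel."""
    return [[max(0.0, v) for v in row] for row in fmap]


def max_pool(fmap, size=2):
    """Combine neighbouring units by keeping the strongest response."""
    return [[max(fmap[i + a][j + b]
                 for a in range(size) for b in range(size))
             for j in range(0, len(fmap[0]) - size + 1, size)]
            for i in range(0, len(fmap) - size + 1, size)]


# A 4x4 image with a vertical edge, and a 1x2 kernel that responds to
# a dark-to-bright step from left to right.
image = [[0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1],
         [0, 0, 1, 1]]
edge_kernel = [[-1, 1]]

feature_map = conv2d(image, edge_kernel)   # each row is [0, 1, 0]
pooled = max_pool(relu(feature_map))       # [[1], [1]]
```

The feature map lights up only in the column where the edge sits, and pooling keeps that response while discarding its exact position.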
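
The "differentiable associative memory" or "soft RAM" bullet can be illustrated with a generic soft-attention read. This is a sketch of the general idea as I understand it, not the specific memory network from the talk: instead of fetching one slot by address, the read is a softmax-weighted blend of all slots, which keeps it differentiable. The names `memory_read`, `keys` and `values` are mine.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    m = max(scores)                       # subtract the max for stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def memory_read(query, keys, values):
    """Soft lookup: weight every stored value by how well its key matches."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    read = [sum(w * v[d] for w, v in zip(weights, values))
            for d in range(dim)]
    return read, weights


# Two memory slots; the query points strongly at the first key.
keys = [[1.0, 0.0], [0.0, 1.0]]
values = [[10.0], [20.0]]
read, weights = memory_read([8.0, 0.0], keys, values)
# read[0] is close to 10.0, with a small contribution from the second slot
```

Because every slot contributes a little, gradients flow to all keys and values, which is what lets the memory be trained together with the rest of the network.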
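
The "filling in 15-20% of blanks in text" bullet describes how self-supervised training pairs can be made from raw text without human labels. A minimal sketch of that data preparation step, assuming a hypothetical `[MASK]` placeholder token and a helper name of my own (`mask_tokens`); the model that would predict the blanks is not shown.

```python
import random

MASK = "[MASK]"  # hypothetical placeholder token, chosen for illustration


def mask_tokens(tokens, fraction=0.15, seed=0):
    """Hide a fraction of the tokens and remember what was hidden.

    Returns the masked sequence (the model input) and a dict mapping
    each hidden position to its original token (the training target).
    """
    rng = random.Random(seed)
    n_mask = min(len(tokens), max(1, round(len(tokens) * fraction)))
    positions = sorted(rng.sample(range(len(tokens)), n_mask))
    masked = list(tokens)
    targets = {}
    for p in positions:
        targets[p] = tokens[p]
        masked[p] = MASK
    return masked, targets


tokens = "the cat sat on the mat and looked at the dog".split()
masked, targets = mask_tokens(tokens, fraction=0.2)
# `masked` has two tokens replaced by [MASK]; `targets` holds the originals
```

The labels come from the text itself, which is the point of the self-supervised approach: any amount of unlabeled text becomes training data.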
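
The remark that a predictor "predicts the average of all possible futures, results in blurry images" follows from training with squared error. A small worked example, using two made-up two-pixel "futures" that are equally likely (the numbers are mine, purely for illustration):

```python
def expected_mse(prediction, futures):
    """Average squared error of one prediction over equally likely futures."""
    return sum(sum((p - f) ** 2 for p, f in zip(prediction, fut))
               for fut in futures) / len(futures)


def mean_future(futures):
    """Pixel-wise average of all futures."""
    n = len(futures)
    return [sum(f[i] for f in futures) / n for i in range(len(futures[0]))]


# Two sharp, equally likely outcomes: the bright pixel is left or right.
futures = [[1.0, 0.0], [0.0, 1.0]]

blurry = mean_future(futures)                   # [0.5, 0.5]
err_blurry = expected_mse(blurry, futures)      # 0.5
err_sharp = expected_mse(futures[0], futures)   # 1.0
```

The grey average scores better than either sharp outcome, so a network trained this way learns to output the blur. Adversarial training and latent variables, mentioned in the bullets above, are ways around this.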

(did not take notes during question session, should have done (might add them later), talk available at