TTIC 31230: Fundamentals of Deep Learning

Autumn 2020

Lecturer: David McAllester

TA: Pedro Savarese

This class is intended to give students an understanding of the technical content of current research in deep learning. Students who successfully complete the class should be able to read and understand current deep learning research papers and should possess the technical knowledge necessary both to reproduce research results and to do original research in deep learning. The course covers current methods in computer vision, natural language processing, and reinforcement learning for games and robotics. One of the amazing aspects of deep learning is that much of the conceptual knowledge needed for research in these areas is shared among them, which makes such broad coverage possible.

  1. Introduction:

    The History of Deep Learning and Moore's Law of AI Slides Video
    The Fundamental Equations of Deep Learning Slides Video
    Some Information Theory Slides Video

  2. Frameworks and Back-Propagation:

    Deep Learning Frameworks Slides Video
    Backpropagation for Scalar Source Code Slides Video
    Backpropagation for Tensor Source Code Slides Video
    Minibatching: The Batch Index Slides Video
    The Educational Framework (EDF) Slides Video
    EDF source code 150 lines of Python/NumPy
    MNIST Coding Problem (EDF)
    PyTorch tutorial

  3. Convolutional Neural Networks (CNNs):

    Einstein Notation Slides Video
    CNNs Slides Video
    PyTorch Convolution Functions
    Dilation, Hypercolumns and Grouping (optional)
    Invariant Theory (optional)

  4. Trainability, Residual Connections and RNNs:

    Trainability: ReLU, Initialization, Batch Normalization and Residual Connections (ResNet) Slides Video
    Language Modeling Slides Video
    Recurrent Neural Networks (RNNs) Slides Video
    Quiz 1, Autumn 2020

  5. Attention, Machine Translation and the Transformer:

    Machine Translation and Attention Slides Video
    The Transformer Part I Slides Video
    The Transformer Part II Slides Video
    Statistical Machine Translation (optional) Slides
    Self Attention in Vision Coding Problem (EDF)

  6. SGD I: Convergence and Temperature:

    The Classical Convergence Theorem Slides Video
    The Learning Rate, the Batch Size, and Temperature Slides Video
    Momentum and Temperature Slides Video

  7. SGD II: Continuous Time Analyses:

    Gradient Flow Slides Video
    Stochastic Differential Equations (SDEs) Slides Video
    Stationary Distributions and Temperature Slides Video
    Readings: Stochastic Gradient Descent as Approximate Bayesian Inference, Mandt, Hoffman, Blei, 2017

  8. SGD III: Adaptive Methods and Heat Capacity:

    RMSProp and Adam Slides Video
    Heat Capacity: Loss (Energy) as a function of Learning Rate (Temperature). Slides Video
    Quiz 2, Autumn 2020

  9. Generalization and Regularization I:

    Early Stopping and Shrinkage Slides Video
    Early Stopping as Shrinkage, L1 Regularization and Ensembles Slides Video

  10. Generalization and Regularization II:

    Learning Theory I: The Occam Guarantee Slides Video
    Learning Theory II: The PAC-Bayes Guarantee Slides Video
    Implicit Regularization Slides Video
    Double Descent Slides Video
    PAC-Bayes Tutorial
    Langevin Dynamics Coding Problem (PyTorch)

  11. Deep Graphical Models I:

    Exponential Softmax Slides Video
    Backpropagation for Exponential Softmax: The Model Marginals Slides Video
    Markov Chain Monte Carlo (MCMC) Sampling Slides Video

  12. Deep Graphical Models II:

    Deep Graphical Models Summary and Review Slides Video
    Pseudo-Likelihood and Contrastive Divergence Slides Video
    Loopy Belief Propagation (Loopy BP) Slides Video
    Connectionist Temporal Classification (CTC) (optional) Slides

  13. Generative Adversarial Networks (GANs):

    GAN Fundamentals Slides Video
    Timeline of GAN Development Slides Video
    An Early Method for Overcoming Discriminator Victories (optional) Slides
    StyleGAN2 YouTube1
    StyleGAN2 YouTube2
    StyleGAN2 Paper
    Quiz 3, Autumn 2020

  14. Rate-Distortion Autoencoders:

    Perils of Differential Entropy Slides Video
    Rate-Distortion Autoencoders (RDAs) Slides Video
    Noisy Channel RDAs Slides Video

  15. Latent Variables and Variational Autoencoders:

    The Evidence Lower Bound (ELBO) and Variational Autoencoders (VAEs) Slides Video
    Posterior Collapse, beta-VAEs, and Encoder Autonomy. Slides Video
    Vector Quantized VAEs Slides Video
    Unsupervised speech representation learning
    Jukebox: An amazing application of VQ-VAEs
    VAE Coding Problem (PyTorch)

  16. Mutual Information Training Objectives:

    Mutual Information Co-Training Slides Video
    Contrastive Predictive Coding (CPC) Slides Video

    Co-Training References:
    Harwath et al., Unsupervised Learning of Spoken Language with Visual Context, 2016
    McAllester, Information Theoretic Co-Training, Feb. 2018
    Stratos and Wiseman, Learning Discrete Structured Representations by Adversarially Maximizing Mutual Information, April 2020

    CPC References:
    van den Oord et al., Contrastive Predictive Coding, July 2018
    McAllester and Stratos, Formal Limitations on the Measurement of Mutual Information, Nov. 2018
    Schneider et al., wav2vec: Unsupervised Pre-training for Speech Recognition, April 2019
    Poole et al., On Variational Bounds of Mutual Information, May 2019
    Chen et al., A Simple Framework for Contrastive Learning of Visual Representations, Feb. 2020
    Song and Ermon, Multi-label Contrastive Predictive Coding, July 2020

  17. Some Mathematics of Contrastive Methods: (optional)

    Optimal Discrimination and Jensen-Shannon Divergence Slides
    Noise Contrastive Estimation Slides
    Contrastive GANs Slides

  18. Reinforcement Learning (RL):

    Basic Definitions, Value Iteration Slides Video
    Q-Learning and Deep Q Networks (DQN) for Atari Slides Video
    The REINFORCE algorithm Slides Video
    Actor-Critic algorithms, A3C for Atari Slides Video
    Quiz 4, Autumn 2020

  19. AlphaZero and AlphaStar:

    Background Algorithms Slides Video
    The AlphaZero Training Algorithm Slides Video
    AlphaZero Results Slides Video
    AlphaStar Slides Video

  20. The Quest for Artificial General Intelligence (AGI):

    AGI: Universality Slides Video
    AGI: Bootstrapping Slides Video
    AGI: Logic Slides Video
    AGI: Natural Language Slides Video