TTIC 31230: Fundamentals of Deep Learning

Lecturer: David McAllester

TA: David Yunis

Graders: Shuo Xie and Feiyu Han

This class is intended to provide students with an understanding of the technical content of current research in deep learning. Students successfully completing the class should be able to read and understand current deep learning research papers and should possess the technical knowledge necessary both to reproduce research results and to do original research in deep learning. The course covers current methods in computer vision, natural language processing and reinforcement learning for games and robotics. One of the amazing aspects of deep learning is that much of the conceptual knowledge needed for research in these areas is shared among them, which makes such broad coverage possible.

Prerequisites: This class assumes knowledge of vector calculus, basic linear algebra (matrices, eigenvectors, eigenvalues) and significant familiarity with probability and statistics. Familiarity with Markov chains is advised. The course is quite technical overall, and a strong technical background and mathematical maturity are advised. There are machine problems and a class programming project, so previous familiarity with programming, and with Python in particular, is advised.

In the fall of 2022 there will be three machine problem sets, three exams and a final project.

  1. Introduction:

    The History of Deep Learning and Moore's Law of AI (2020) Slides Video1 Video2
    Generative Spoken Language Model Demo
    Codex Demo
    Soccer Demo
    The Fundamental Equations of Deep Learning Slides Video
    Some Information Theory Slides Video
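
    As context for "The Fundamental Equations of Deep Learning" above, one standard formulation (a summary in my notation, not a quote from the slides) is the cross-entropy training objective

        \Phi^* = \mathrm{argmin}_\Phi \; E_{(x,y)\sim\mathrm{Pop}}\left[-\ln P_\Phi(y \mid x)\right]

    whose value decomposes into an irreducible entropy term plus a model-fit term:

        E\left[-\ln P_\Phi(y \mid x)\right] = H(y \mid x) + E_x\left[\mathrm{KL}\big(\mathrm{Pop}(y \mid x)\,\|\,P_\Phi(y \mid x)\big)\right]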

  2. Frameworks and Back-Propagation:

    Deep Learning Frameworks Slides Video
    Backpropagation for Scalar Source Code Slides Video
    Framework Objects and Backpropagation for Tensor Source Code Slides Video
    Minibatching: The Batch Index Slides Video
    The Educational Framework (EDF) Slides Video
    EDF and the MNIST Coding Problem
    PyTorch tutorial
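
    As a companion to the backpropagation lectures above, here is a minimal sketch of scalar reverse-mode automatic differentiation in the spirit of EDF; the names Node, add and mul are illustrative, not the actual EDF API.

        class Node:
            def __init__(self, value, parents=()):
                self.value = value
                self.grad = 0.0
                self.parents = parents  # (parent, local_gradient) pairs

            def backward(self, grad=1.0):
                # Accumulate the incoming gradient, then apply the chain rule.
                self.grad += grad
                for parent, local in self.parents:
                    parent.backward(grad * local)

        def add(a, b):
            return Node(a.value + b.value, [(a, 1.0), (b, 1.0)])

        def mul(a, b):
            return Node(a.value * b.value, [(a, b.value), (b, a.value)])

        x, y = Node(2.0), Node(3.0)
        loss = add(mul(x, y), x)  # loss = x*y + x
        loss.backward()
        print(x.grad, y.grad)     # 4.0 2.0

    Real frameworks visit each node once in reverse topological order rather than recursing path by path, but the accumulated gradients are the same.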

  3. Convolutional Neural Networks (CNNs):

    Einstein Notation Slides Video
    CNNs Slides Video
    PyTorch Convolution Functions
    Dilation, Hypercolumns and Grouping (optional)
    Invariant Theory (optional)
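
    To connect the Einstein-notation and CNN lectures above, the sketch below writes a convolution both ways in PyTorch; the sizes are arbitrary, and the unfold-plus-einsum version is only for exposition.

        import torch
        import torch.nn.functional as F

        B, C_in, C_out, H, W, K = 8, 3, 16, 32, 32, 3
        x = torch.randn(B, C_in, H, W)
        w = torch.randn(C_out, C_in, K, K)

        y = F.conv2d(x, w, padding=1)

        # The same contraction written explicitly as an einsum over patches.
        patches = F.unfold(x, K, padding=1)            # (B, C_in*K*K, H*W)
        y2 = torch.einsum('of,bfp->bop', w.reshape(C_out, -1), patches)
        y2 = y2.reshape(B, C_out, H, W)
        print(torch.allclose(y, y2, atol=1e-4))        # True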

  4. Trainability, Residual Connections and RNNs:

    Trainability: ReLU, Initialization, Batch Normalization and Residual Connections (ResNet) Slides Video
    Language Modeling Slides Video
    Recurrent Neural Networks (RNNs) Slides Video
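
    A minimal residual block in PyTorch, illustrating the trainability ideas above: the identity path keeps gradients flowing even when the learned branch is poorly conditioned. The layer choices are illustrative, not the exact blocks from the slides.

        import torch.nn as nn

        class ResBlock(nn.Module):
            def __init__(self, channels):
                super().__init__()
                self.conv1 = nn.Conv2d(channels, channels, 3, padding=1)
                self.bn1 = nn.BatchNorm2d(channels)
                self.conv2 = nn.Conv2d(channels, channels, 3, padding=1)
                self.bn2 = nn.BatchNorm2d(channels)
                self.relu = nn.ReLU()

            def forward(self, x):
                out = self.relu(self.bn1(self.conv1(x)))
                out = self.bn2(self.conv2(out))
                return self.relu(out + x)  # identity skip connection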

  5. Attention, Machine Translation and the Transformer:

    Machine Translation and Attention Slides Video
    The Transformer Part I Slides Video
    The Transformer Part II Slides Video
    Statistical Machine Translation (optional) Slides
    Masked Language Modeling, Gibbs Sampling and Pseudo-Likelihood Slides Video
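
    The core operation of the transformer lectures above is scaled dot-product attention; a minimal sketch (the tensor shapes are an assumed convention):

        import math
        import torch

        def attention(Q, K, V):
            # Q, K, V: (batch, heads, time, d_k)
            scores = Q @ K.transpose(-2, -1) / math.sqrt(Q.size(-1))
            return torch.softmax(scores, dim=-1) @ V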

  6. SGD I: Convergence and Temperature:

    The Classical Convergence Theorem Slides Video
    The Learning Rate, the Batch Size, and Temperature Slides Video
    Momentum and Temperature Slides Video
    RMSProp and Adam Slides Video
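
    A plain sketch of the Adam update covered above, keeping the per-parameter state explicit; in practice one would use torch.optim.Adam, and adam_step here is only illustrative.

        import torch

        def adam_step(p, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
            state['t'] += 1
            state['m'] = b1 * state['m'] + (1 - b1) * g      # first moment
            state['v'] = b2 * state['v'] + (1 - b2) * g * g  # second moment
            m_hat = state['m'] / (1 - b1 ** state['t'])      # bias correction
            v_hat = state['v'] / (1 - b2 ** state['t'])
            p -= lr * m_hat / (v_hat.sqrt() + eps)

        p = torch.zeros(10)
        state = {'t': 0, 'm': torch.zeros_like(p), 'v': torch.zeros_like(p)}
        adam_step(p, torch.randn(10), state)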

  7. SGD II: Continuous Time Analyses:

    Gradient Flow Slides Video
    Stochastic Differential Equations (SDEs) Slides Video
    Stationary Distributions and Temperature Slides Video
    Heat Capacity: Loss (Energy) as a function of Learning Rate (Temperature). Slides Video
    Readings: SGD as Approximate Bayesian Inference, Mandt et al., 2017
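
    A standard statement tying these lectures together (in my notation): modeling SGD as the Langevin SDE

        d\Theta = -\nabla\mathcal{L}(\Theta)\,dt + \sqrt{2T}\,dB_t

    gives the Gibbs stationary distribution

        p(\Theta) \propto e^{-\mathcal{L}(\Theta)/T}

    so the temperature T, controlled by the learning rate and batch size, determines how sharply the stationary distribution concentrates on low-loss regions.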

  8. Generalization and Regularization I: Early Stopping and Shrinkage:

    Early Stopping and Shrinkage Slides Video
    Early Stopping as Shrinkage, L1 regularization and Ensembles Slides Video
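
    As a reference point for the lectures above (standard results, stated in my notation): explicit shrinkage is the L2-regularized objective

        \Theta^* = \mathrm{argmin}_\Theta \; \hat{\mathcal{L}}(\Theta) + \lambda\|\Theta\|^2

    and, for quadratic losses, stopping gradient descent after t steps with learning rate \eta acts roughly like shrinkage with \lambda \approx 1/(\eta t).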

  9. Generalization and Regularization II: PAC-Bayesian Learning Theory:

    Learning Theory I: The Occam Guarantee Slides Video
    Learning Theory II: The PAC-Bayes Guarantee Slides Video
    Implicit Regularization Slides Video
    Double Descent Slides Video
    PAC-Bayes Tutorial
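
    One standard form of the Occam guarantee from the learning-theory lectures (my statement of a textbook bound, with loss in [0,1], n samples, and a prior P over a countable hypothesis class): with probability at least 1 - \delta, simultaneously for all h,

        L(h) \le \hat{L}(h) + \sqrt{\frac{\ln\frac{1}{P(h)} + \ln\frac{1}{\delta}}{2n}}

    The PAC-Bayes guarantee generalizes this from single hypotheses to posterior distributions over hypotheses.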

  10. Generative Adversarial Networks (GANs):

    GAN Fundamentals Slides Video
    Timeline of GAN Development Slides Video
    StyleGAN2 YouTube1
    StyleGAN2 YouTube2
    StyleGAN2 Paper
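
    For reference, the original GAN minimax objective underlying the fundamentals lecture above (Goodfellow et al., 2014):

        \min_G \max_D \; E_{x\sim p_{\mathrm{data}}}\left[\ln D(x)\right] + E_{z\sim p(z)}\left[\ln(1 - D(G(z)))\right]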

  11. Variational Autoencoders:

    The Evidence Lower Bound (ELBO) and Variational Autoencoders (VAEs) Slides 2021 Video
    Perils of Differential Entropy Slides 2021 Video
    Vector Quantized VAEs Slides 2021 Video
    Progressive VAEs Slides 2021 Video
    Jukebox using VQ-VAEs
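
    For reference, the ELBO at the center of this unit (standard form): for an encoder q_\phi(z|x), a decoder p_\theta(x|z) and a prior p(z),

        \ln p_\theta(x) \ge E_{z\sim q_\phi(z\mid x)}\left[\ln p_\theta(x\mid z)\right] - \mathrm{KL}\big(q_\phi(z\mid x)\,\|\,p(z)\big)

    and a VAE is trained by maximizing this lower bound on the data log-likelihood.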

  12. Contrastive Coding:

    Contrastive Coding Slides

    Tishby, Pereira and Bialek, The Information Bottleneck Method, 2000
    McAllester, Information Theoretic Co-Training, Feb. 2018
    van den Oord et al., Contrastive Predictive Coding, July 2018
    McAllester and Stratos, Formal Limitations on the Measurement of Mutual Information, Nov. 2018
    Schneider et al., wav2vec: Unsupervised Pre-training for Speech Recognition, April 2019
    Poole et al., On Variational Bounds of Mutual Information, May 2019
    Chen et al., A Simple Framework for Contrastive Learning of Visual Representations, Feb. 2020
    Caron et al., Unsupervised Learning of Visual Features by Contrasting Cluster Assignments, Jan. 2021
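
    A minimal InfoNCE-style contrastive loss in the spirit of the CPC and SimCLR papers above, using in-batch negatives; info_nce and the temperature value are illustrative, not any one paper's exact recipe.

        import torch
        import torch.nn.functional as F

        def info_nce(anchors, positives, temperature=0.1):
            # anchors, positives: (batch, dim); row i of each is a positive pair.
            a = F.normalize(anchors, dim=1)
            p = F.normalize(positives, dim=1)
            logits = a @ p.t() / temperature   # (batch, batch) similarities
            labels = torch.arange(a.size(0))   # diagonal entries are positives
            return F.cross_entropy(logits, labels)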


  13. Diffusion Models:

    VAE Formulation Slides
    Score Matching Formulation Slides
    DALLE-2 Development Timeline Slides
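
    A sketch of one training step in the formulations above, in the DDPM style: the network learns to predict the noise that was added. Here eps_model is an assumed callable taking (x_t, t), and alphas_bar is the usual cumulative product of the noise-schedule terms.

        import torch

        def diffusion_loss(eps_model, x0, alphas_bar):
            t = torch.randint(0, len(alphas_bar), (x0.size(0),))
            a_bar = alphas_bar[t].view(-1, *([1] * (x0.dim() - 1)))
            eps = torch.randn_like(x0)
            x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * eps  # forward noising
            return ((eps_model(x_t, t) - eps) ** 2).mean()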

  14. Reinforcement Learning (RL):

    Basic Definitions, Value Iteration Slides Video
    Q-Learning and Deep Q Networks (DQN) for Atari Slides Video
    The REINFORCE algorithm Slides Video
    Actor-Critic algorithms, A3C for Atari Slides Video
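
    A sketch of tabular value iteration from the first lecture above; the transition representation P[s][a] as a list of (probability, next_state, reward) triples is an assumed convention.

        def value_iteration(P, gamma=0.99, tol=1e-6):
            V = [0.0] * len(P)
            while True:
                delta = 0.0
                for s in range(len(P)):
                    # Bellman backup: best one-step lookahead value over actions.
                    q = [sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][a])
                         for a in range(len(P[s]))]
                    best = max(q)
                    delta = max(delta, abs(best - V[s]))
                    V[s] = best
                if delta < tol:
                    return V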

  15. AlphaZero and AlphaStar:

    Background Algorithms Slides Video
    The AlphaZero Training Algorithm Slides Video
    AlphaZero Results Slides Video
    MuZero Slides
    AlphaStar Slides Video
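
    For reference, the PUCT rule used to select moves during AlphaZero's tree search (standard statement): from state s, pick the action

        \mathrm{argmax}_a \; Q(s,a) + c_{\mathrm{puct}}\,P(s,a)\,\frac{\sqrt{\sum_b N(s,b)}}{1 + N(s,a)}

    where P(s,a) is the policy network's prior, N counts visits, and Q averages the values observed below the node.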

  16. The Quest for Artificial General Intelligence (AGI):

    AGI: Universality Slides Video
    AGI: Bootstrapping Slides Video
    AGI: Logic Slides Video
    AGI: Natural Language Slides Video