TTIC 31230: Fundamentals of Deep Learning

Autumn 2020

Lecturer: David McAllester

TA: Pedro Savarese

This class is intended to provide students with an understanding of the technical content of current research in deep learning. Students successfully completing the class should be able to read and understand current deep learning research papers and possess the technical knowledge necessary both to reproduce research results and to do original research in deep learning. The course covers current methods in computer vision, natural language processing and reinforcement learning for games and robotics. One of the amazing aspects of deep learning is that much of the conceptual knowledge needed for research in these areas is shared among the areas, making such broad coverage possible.

The official meeting time is MWF 1:50-2:40. However, this will be an on-line class and recorded lectures will be available in advance. The class time will be used for on-line office hours (by Zoom) with Prof. McAllester.

Evolving Course Material:

  1. Introduction: Reviewed at class time office hours Friday Oct 2.

    The History of Deep Learning and Moore's Law of AI Slides Video
    The Fundamental Equations of Deep Learning Slides Video
    Some Information Theory Slides Video

  2. Frameworks and Back-Propagation: Reviewed at office hours Monday Oct. 5 and Wednesday Oct. 7.

    MNIST in EDF coding problem due Friday Oct. 9.

    Deep Learning Frameworks Slides Video
    Backpropagation for Scalar Source Code Slides Video
    Backpropagation for Tensor Source Code Slides Video
    Minibatching: The Batch Index Slides Video
    The Educational Framework (EDF) Slides Video
    EDF source code 150 lines of Python/NumPy
    MNIST in EDF problem set
    PyTorch tutorial

  3. Einstein Notation and Convolutional Neural Networks (CNNs): Reviewed at office hours Monday Oct. 12.

    Einstein Notation Slides Video
    CNNs Slides Video
    PyTorch Convolution Functions
    Invariant Theory (optional)

  4. Trainability, Residual Connections and RNNs: Reviewed at office hours Wednesday Oct. 14.

    Quiz 1 is Friday Oct. 16.

    Trainability: ReLU, Initialization, Batch Normalization and Residual Connections (ResNet) Slides Video
    Language Modeling Slides Video
    Recurrent Neural Networks (RNNs) Slides Video

  5. Attention, Machine Translation and the Transformer: Reviewed at office hours Monday Oct. 19

    Machine Translation and Attention Slides Video
    The Transformer Slides Video
    Statistical Machine Translation (optional) Slides Video(?)

  6. Stochastic Gradient Descent I: Reviewed at office hours Wednesday Oct. 21.

    Problem set 2 due Friday Oct. 23.

    The Classical Convergence Theorem Slides Video
    Decoupling the Learning Rate from the Batch Size Slides Video
    Momentum as a Running Average and Decoupled Momentum Slides Video

  7. Stochastic Gradient Descent II: Reviewed at office hours Friday Oct. 23.

    RMSProp, Adam, and Decoupled Versions Slides Video
    Gradient Flow Slides Video
    Heat Capacity with Loss as Energy and Learning Rate as Temperature Slides Video

  8. Stochastic Gradient Descent III: Reviewed at office hours Monday Oct. 26.

    Continuous Time Noise and Stationary Parameter Densities Slides Video

  9. Generalization and Regularization I: Reviewed at office hours Wednesday Oct. 28.

    Quiz 2 is Friday Oct. 30.

    Early Stopping, Shrinkage and Decoupled Shrinkage Slides Video

  10. Generalization and Regularization II: Reviewed at office hours Monday Nov. 2.

    PAC-Bayes Generalization Theory Slides Video
    Implicit Regularization Slides Video
    Double Descent Slides Video
    PAC-Bayes Tutorial

  11. Deep Graphical Models I: Reviewed at office hours Wednesday Nov. 4.

    Problem set 3 Due Friday Nov. 6.

    Exponential Softmax Slides Video
    Speech Recognition: Connectionist Temporal Classification (CTC) Slides Video
    Backpropagation for Exponential Softmax: The Model Marginals Slides Video

  12. Deep Graphical Models II: Reviewed at office hours Friday Nov. 6.

    Markov Chain Monte Carlo (MCMC) Sampling Slides Video
    Pseudo-Likelihood and Contrastive Divergence Slides Video
    Loopy Belief Propagation (Loopy BP) Slides Video
    Noise Contrastive Estimation Slides Video

  13. Generative Adversarial Networks (GANs): Reviewed at office hours Monday Nov. 9

    Overview and Timeline of GAN Development Slides Video
    Replacing the Loss Gradient with the Margin Gradient Slides Video
    Optimal Discrimination and Jensen-Shannon Divergence Slides Video
    Contrastive GANs Slides Video

  14. Rate-Distortion Autoencoders: Reviewed at office hours Wednesday Nov. 11

    Quiz 3 is Friday Nov. 13.

    Perils of Differential Entropy Slides Video
    Rate-Distortion Autoencoders (RDAs) Slides Video
    Noisy Channel RDAs Slides Video
    Gaussian Noisy Channel RDAs Slides Video

  15. Latent Variables and Variational Autoencoders: Reviewed at office hours Monday Nov 16.

    Interpretability of Latent Variables Slides Video
    The Evidence Lower Bound (ELBO) and Variational Autoencoders (VAEs) Slides Video
    Gaussian VAEs Slides Video
    Posterior Collapse, VAE Non-Identifiability, and beta-VAEs Slides Video
    Vector Quantized VAEs Slides Video

  16. Pretraining: Reviewed at office hours Wednesday Nov. 18

    Problem set 4 due Friday, Nov 20.

    Pretraining for NLP Slides Video
    Supervised ImageNet Pretraining Slides Video
    Self-Supervised Pretraining for Vision Slides Video
    Contrastive Predictive Coding Slides Video
    Mutual Information Coding Slides Video

  17. Reinforcement Learning (RL): Reviewed at office hours Friday Nov. 20.

    Basic Definitions, Q-learning, Deep Q Networks (DQN) for Atari Slides Video
    The REINFORCE algorithm, Actor-Critic algorithms, A3C for Atari Slides Video

  18. AlphaZero and AlphaStar: Reviewed at office hours Monday Nov. 30.

    Background Algorithms Slides Video
    The AlphaZero Training Algorithm Slides Video
    Some Quantitative Empirical Results Slides Video
    The Policy as a Q-Function Slides Video
    What Happened to alpha-beta? Slides Video
    AlphaStar Slides Video

  19. The Quest for Artificial General Intelligence (AGI): Reviewed at office hours Wednesday, December 2

    Quiz 4 is Friday December 4.

    The Free Lunch Theorem and The Intelligence Explosion Slides Video
    Representing Functions with Shallow Circuits: The Classical Universality Theorems Slides Video
    Representing Functions with Deep Circuits: Circuit Complexity Theory Slides Video
    Representing Functions with Programs: Python, Assembler and the Turing Tarpit Slides Video
    Representing Functions and Knowledge with Logic Slides Video
    Representing Choices and Knowledge with Natural Language Slides Video