TTIC 31230: Fundamentals of Deep Learning

David McAllester

Revised from winter 2020

Lecture Slides and Problems:

  1. Introduction
    1. The History of Deep Learning and Moore's Law of AI
    2. The Fundamental Equations of Deep Learning
    3. Problems
  2. Frameworks and Back-Propagation
    1. Deep Learning Frameworks
    2. Backpropagation for Scalar Source Code
    3. Backpropagation for Tensor Source Code
    4. Minibatching: The Batch Index
    5. The Educational Framework (EDF)
    6. Problems
    7. EDF source code: 150 lines of Python/NumPy
    8. MNIST in EDF problem set
    9. PyTorch tutorial
  3. Vision: Convolutional Neural Networks (CNNs)
    1. Einstein Notation
    2. CNNs
    3. Trainability: ReLU, Initialization, Batch Normalization and Residual Connections (ResNet)
    4. Invariant Theory (optional)
    5. Problems
    6. PyTorch Convolution Functions
  4. Natural Language Processing
    1. Language Modeling
    2. Recurrent Neural Networks (RNNs)
    3. Machine Translation and Attention
    4. The Transformer
    5. Statistical Machine Translation (optional)
    6. Problems
  5. Stochastic Gradient Descent
    1. The Classical Convergence Theorem
    2. Decoupling the Learning Rate from the Batch Size
    3. Momentum as a Running Average and Decoupled Momentum
    4. RMSProp, Adam, and Decoupled Versions
    5. Gradient Flow
    6. Heat Capacity with Loss as Energy and Learning Rate as Temperature
    7. Continuous Time Noise and Stationary Parameter Densities
    8. Problems
  6. Generalization and Regularization
    1. Early Stopping, Shrinkage and Decoupled Shrinkage
    2. PAC-Bayes Generalization Theory
    3. Implicit Regularization
    4. Double Descent
    5. Problems
    6. PAC-Bayes Tutorial
  7. Deep Graphical Models
    1. Exponential Softmax
    2. Speech Recognition: Connectionist Temporal Classification (CTC)
    3. Backpropagation for Exponential Softmax: The Model Marginals
    4. Markov Chain Monte Carlo (MCMC) Sampling
    5. Pseudo-Likelihood and Contrastive Divergence
    6. Loopy Belief Propagation (Loopy BP)
    7. Noise Contrastive Estimation
    8. Problems
  8. Generative Adversarial Networks (GANs)
    1. Perils of Differential Entropy
    2. Overview and Timeline of GAN Development
    3. Replacing the Loss Gradient with the Margin Gradient
    4. Optimal Discrimination and Jensen-Shannon Divergence
    5. Contrastive GANs
    6. Problems
  9. Autoencoders
    1. Rate-Distortion Autoencoders (RDAs)
    2. Noisy Channel RDAs
    3. Gaussian Noisy Channel RDAs
    4. Interpretability of Latent Variables
    5. The Evidence Lower Bound (ELBO) and Variational Autoencoders (VAEs)
    6. Gaussian VAEs
    7. Posterior Collapse, VAE Non-Identifiability, and beta-VAEs
    8. Vector Quantized VAEs
    9. Problems
  10. Pretraining
    1. Pretraining for NLP
    2. Supervised ImageNet Pretraining
    3. Self-Supervised Pretraining for Vision
    4. Contrastive Predictive Coding
    5. Mutual Information Coding
    6. Problems
  11. Reinforcement Learning (RL)
    1. Basic Definitions, Q-learning, Deep Q Networks (DQN) for Atari
    2. The REINFORCE Algorithm, Actor-Critic Algorithms, A3C for Atari
    3. Problems
  12. AlphaZero and AlphaStar
    1. Background Algorithms
    2. The AlphaZero Training Algorithm
    3. Some Quantitative Empirical Results
    4. The Policy as a Q-Function
    5. What Happened to alpha-beta?
    6. AlphaStar
    7. Problems
  13. The Quest for Artificial General Intelligence (AGI)
    1. The Free Lunch Theorem and the Intelligence Explosion
    2. Representing Functions with Shallow Circuits: The Classical Universality Theorems
    3. Representing Functions with Deep Circuits: Circuit Complexity Theory
    4. Representing Functions with Programs: Python, Assembler and the Turing Tarpit
    5. Representing Functions and Knowledge with Logic
    6. Representing Choices and Knowledge with Natural Language