Autumn 2020
Lecturer: David McAllester (mcallester@ttic.edu)
TA: Pedro Savarese (savarese@ttic.edu)
This class is intended to provide students with an understanding of the technical content of current research in deep learning. Students who successfully complete the class should be able to read and understand current deep learning research papers and should possess the technical knowledge necessary both to reproduce research results and to do original research in deep learning. The course covers current methods in computer vision, natural language processing, and reinforcement learning for games and robotics. One of the amazing aspects of deep learning is that much of the conceptual knowledge needed for research in these areas is shared among the areas, making such broad coverage possible.
The History of Deep Learning and Moore's Law of AI | Slides | Video |
The Fundamental Equations of Deep Learning | Slides | Video |
Some Information Theory | Slides | Video |
Problems |
Frameworks and Back-Propagation:
Deep Learning Frameworks | Slides | Video |
Backpropagation for Scalar Source Code | Slides | Video |
Backpropagation for Tensor Source Code | Slides | Video |
Minibatching: The Batch Index | Slides | Video |
The Educational Framework (EDF) | Slides | Video |
Problems | ||
EDF source code 150 lines of Python/NumPy | ||
MNIST Coding Problem (EDF) | ||
PyTorch tutorial |
Einstein Notation | Slides | Video |
CNNs | Slides | Video |
Pytorch Convolution Functions | ||
Dilation, Hypercolumns and Grouping (optional) | ||
Invariant Theory (optional) |
Trainability, Residual Connections and RNNs:
Trainability: Relu, Initialization, Batch Normalization and Residual Connections (ResNet) | Slides | Video |
Language Modeling | Slides | Video |
Recurrent Neural Networks (RNNs) | Slides | Video |
Problems | ||
Quiz 1, Autumn 2020 |
Attention, Machine Translation and the Transformer:
Machine Translation and Attention | Slides | Video |
The Transformer Part I | Slides | Video |
The Transformer Part II | Slides | Video |
Statistical Machine Translation (optional) | Slides | |
Problems | ||
Self Attention in Vision Coding Problem (EDF) |
SGD I: Convergence and Temperature:
The Classical Convergence Theorem | Slides | Video |
The Learning Rate, the Batch Size, and Temperature | Slides | Video |
Momentum and Temperature | Slides | Video |
Gradient Flow | Slides | Video |
Stochastic Differential Equations (SDEs) | Slides | Video |
Stationary Distributions and Temperature | Slides | Video |
Readings: Stochastic Gradient Descent as Approximate Bayesian Inference, Mandt, Hoffman, Blei, 2017 |
SGD III: Adaptive Methods and Heat Capacity:
RMSProp and Adam | Slides | Video |
Heat Capacity: Loss (Energy) as a function of Learning Rate (Temperature). | Slides | Video |
Problems | ||
Quiz 2, Autumn 2020 |
Generalization and Regularization I:
Early Stopping and Shrinkage | Slides | Video |
Early Stopping as Shrinkage, L1 Regularization and Ensembles | Slides | Video |
Generalization and Regularization II:
Learning Theory I: The Occam Guarantee | Slides | Video |
Learning Theory II: The PAC-Bayes Guarantee | Slides | Video |
Implicit Regularization | Slides | Video |
Double Descent | Slides | Video |
PAC-Bayes Tutorial | ||
Problems | ||
Langevin Dynamics Coding Problem (PyTorch) |
Deep Graphical Models I:
Exponential Softmax | Slides | Video |
Backpropagation for Exponential Softmax: The Model Marginals | Slides | Video |
Markov Chain Monte Carlo (MCMC) Sampling | Slides | Video |
Deep Graphical Models II:
Deep Graphical Models Summary and Review | Slides | Video |
Pseudo-Likelihood and Contrastive Divergence | Slides | Video |
Loopy Belief Propagation (Loopy BP) | Slides | Video |
Connectionist Temporal Classification (CTC) (optional) | Slides | |
Problems |
GAN Fundamentals | Slides | Video |
Timeline of GAN Development | Slides | Video |
An Early Method for Overcoming Discriminator Victories (optional) | Slides | |
StyleGAN2 YouTube1 | ||
StyleGAN2 YouTube2 | ||
StyleGAN2 Paper | ||
Problems | ||
Quiz 3, Autumn 2020 |
Rate-Distortion Autoencoders:
Perils of Differential Entropy | Slides | Video |
Rate-Distortion Autoencoders (RDAs) | Slides | Video |
Noisy Channel RDAs | Slides | Video |
Latent Variables and Variational Autoencoders:
The Evidence Lower Bound (ELBO) and Variational Autoencoders (VAEs) | Slides | Video |
Posterior Collapse, beta-VAEs, and Encoder Autonomy. | Slides | Video |
Vector Quantized VAEs | Slides | Video |
Unsupervised speech representation learning | ||
Jukebox: An amazing application of VQ-VAEs | ||
Problems | ||
VAE Coding Problem (PyTorch) |
Mutual Information Training Objectives:
Mutual Information Co-Training | Slides | Video |
Contrastive Predictive Coding (CPC) | Slides | Video |
Some Mathematics of Contrastive Methods: (optional)
Optimal Discrimination and Jensen-Shannon Divergence | Slides |
Noise Contrastive Estimation | Slides |
Contrastive GANs | Slides |
Reinforcement Learning (RL):
Basic Definitions, Value Iteration | Slides | Video |
Q-Learning and Deep Q Networks (DQN) for Atari | Slides | Video |
The REINFORCE algorithm | Slides | Video |
Actor-Critic algorithms, A3C for Atari | Slides | Video |
Problems | ||
Quiz 4, Autumn 2020 |
Background Algorithms | Slides | Video |
The AlphaZero Training Algorithm | Slides | Video |
AlphaZero Results | Slides | Video |
AlphaStar | Slides | Video |
Problems |
The Quest for Artificial General Intelligence (AGI):
AGI: Universality | Slides | Video |
AGI: Bootstrapping | Slides | Video |
AGI: Logic | Slides | Video |
AGI: Natural Language | Slides | Video |