Autumn 2020
Lecturer: David McAllester
TA: Pedro Savarese
This class is intended to provide students with an understanding of the technical content of current research in deep learning. Students who successfully complete the class should be able to read and understand current deep learning research papers and possess the technical knowledge necessary both to reproduce research results and to do original research in deep learning. The course covers current methods in computer vision, natural language processing, and reinforcement learning for games and robotics. One of the remarkable aspects of deep learning is that much of the conceptual knowledge needed for research in these areas is shared among them, which makes such broad coverage possible.
The official meeting time is MWF 1:50-2:40. However, this will be an online class, and recorded lectures will be available in advance. The class time will be used for online office hours (by Zoom) with Prof. McAllester.
Evolving Course Material:
The History of Deep Learning and Moore's Law of AI | Slides | Video |
The Fundamental Equations of Deep Learning | Slides | Video |
Some Information Theory | Slides | Video |
Problems |
Frameworks and Backpropagation: Reviewed at office hours Monday Oct. 5 and Wednesday Oct. 7.
MNIST in EDF coding problem due Friday Oct. 9.
Deep Learning Frameworks | Slides | Video |
Backpropagation for Scalar Source Code | Slides | Video |
Backpropagation for Tensor Source Code | Slides | Video |
Minibatching: The Batch Index | Slides | Video |
The Educational Framework (EDF) | Slides | Video |
Problems |
EDF source code (150 lines of Python/NumPy) |
MNIST in EDF problem set |
PyTorch tutorial |
Einstein Notation | Slides | Video |
CNNs | Slides | Video |
PyTorch Convolution Functions |
Invariant Theory (optional) |
Trainability, Residual Connections and RNNs: Reviewed at office hours Wednesday Oct. 14.
Quiz 1 is Friday Oct. 16.
Trainability: Relu, Initialization, Batch Normalization and Residual Connections (ResNet) | Slides | Video |
Language Modeling | Slides | Video |
Recurrent Neural Networks (RNNs) | Slides | Video |
Problems |
Attention, Machine Translation and the Transformer: Reviewed at office hours Monday Oct. 19.
Machine Translation and Attention | Slides | Video |
The Transformer | Slides | Video |
Statistical Machine Translation (optional) | Slides | Video(?) |
Problems |
Stochastic Gradient Descent I: Reviewed at office hours Wednesday Oct. 21.
Problem set 2 due Friday Oct. 23.
The Classical Convergence Theorem | Slides | Video |
Decoupling the Learning Rate from the Batch Size | Slides | Video |
Momentum as a Running Average and Decoupled Momentum | Slides | Video |
Stochastic Gradient Descent II: Reviewed at office hours Friday Oct. 23.
RMSProp, Adam, and Decoupled Versions | Slides | Video |
Gradient Flow | Slides | Video |
Heat Capacity with Loss as Energy and Learning Rate as Temperature | Slides | Video |
Continuous Time Noise and Stationary Parameter Densities | Slides | Video |
Problems |
Generalization and Regularization I: Reviewed at office hours Wednesday Oct. 28.
Quiz 2 is Friday Oct. 30.
Early Stopping, Shrinkage and Decoupled Shrinkage | Slides | Video |
Generalization and Regularization II: Reviewed at office hours Monday Nov. 2.
PAC-Bayes Generalization Theory | Slides | Video |
Implicit Regularization | Slides | Video |
Double Descent | Slides | Video |
Problems |
PAC-Bayes Tutorial |
Deep Graphical Models I: Reviewed at office hours Wednesday Nov. 4.
Problem set 3 due Friday Nov. 6.
Exponential Softmax | Slides | Video |
Speech Recognition: Connectionist Temporal Classification (CTC) | Slides | Video |
Backpropagation for Exponential Softmax: The Model Marginals | Slides | Video |
Deep Graphical Models II: Reviewed at office hours Friday Nov. 6.
Markov Chain Monte Carlo (MCMC) Sampling | Slides | Video |
Pseudo-Likelihood and Contrastive Divergence | Slides | Video |
Loopy Belief Propagation (Loopy BP) | Slides | Video |
Noise Contrastive Estimation | Slides | Video |
Problems |
Generative Adversarial Networks (GANs)
Overview and Timeline of GAN Development | Slides | Video |
Replacing the Loss Gradient with the Margin Gradient | Slides | Video |
Optimal Discrimination and Jensen-Shannon Divergence | Slides | Video |
Contrastive GANs | Slides | Video |
Problems |
Rate-Distortion Autoencoders: Reviewed at office hours Wednesday Nov. 11.
Quiz 3 is Friday Nov. 13.
Perils of Differential Entropy | Slides | Video |
Rate-Distortion Autoencoders (RDAs) | Slides | Video |
Noisy Channel RDAs | Slides | Video |
Gaussian Noisy Channel RDAs | Slides | Video |
Latent Variables and Variational Autoencoders: Reviewed at office hours Monday Nov. 16.
Interpretability of Latent Variables | Slides | Video |
The Evidence Lower Bound (ELBO) and Variational Autoencoders (VAEs) | Slides | Video |
Gaussian VAEs | Slides | Video |
Posterior Collapse, VAE Non-Identifiability, and beta-VAEs | Slides | Video |
Vector Quantized VAEs | Slides | Video |
Problems |
Pretraining: Reviewed at office hours Wednesday Nov. 18.
Problem set 4 due Friday Nov. 20.
Pretraining for NLP | Slides | Video |
Supervised ImageNet Pretraining | Slides | Video |
Self-Supervised Pretraining for Vision | Slides | Video |
Contrastive Predictive Coding | Slides | Video |
Mutual Information Coding | Slides | Video |
Problems |
Reinforcement Learning (RL): Reviewed at office hours Friday Nov. 20.
Basic Definitions, Q-learning, Deep Q Networks (DQN) for Atari | Slides | Video |
The REINFORCE algorithm, Actor-Critic algorithms, A3C for Atari | Slides | Video |
Problems |
AlphaZero and AlphaStar
Background Algorithms | Slides | Video |
The AlphaZero Training Algorithm | Slides | Video |
Some Quantitative Empirical Results | Slides | Video |
The Policy as a Q-Function | Slides | Video |
What Happened to alpha-beta? | Slides | Video |
AlphaStar | Slides | Video |
Problems |
The Quest for Artificial General Intelligence (AGI): Reviewed at office hours Wednesday Dec. 2.
Quiz 4 is Friday Dec. 4.
The Free Lunch Theorem and The Intelligence Explosion | Slides | Video |
Representing Functions with Shallow Circuits: The Classical Universality Theorems | Slides | Video |
Representing Functions with Deep Circuits: Circuit Complexity Theory | Slides | Video |
Representing Functions with Programs: Python, Assembler and the Turing Tarpit | Slides | Video |
Representing Functions and Knowledge with Logic | Slides | Video |
Representing Choices and Knowledge with Natural Language | Slides | Video |