Lecturer: David McAllester (mcallester@ttic.edu)
TA: David Yunis (dyunis@ttic.edu)
Graders: Shuo Xie and Feiyu Han
This class is intended to provide students with an understanding of the technical content of current research in deep learning. Students successfully completing the class should be able to read and understand current deep learning research papers and possess the technical knowledge necessary both to reproduce research results and to do original research in deep learning. The course covers current methods in computer vision, natural language processing, and reinforcement learning for games and robotics. One of the amazing aspects of deep learning is that much of the conceptual knowledge needed for research in these areas is shared among them, making such broad coverage possible.
Prerequisites: This class assumes knowledge of vector calculus, basic linear algebra (matrices, eigenvectors, eigenvalues), and significant familiarity with probability and statistics. Familiarity with Markov chains is also advised. The course is quite technical overall, and a strong technical background and mathematical maturity are advised. There are machine problems and a class programming project, so previous familiarity with programming, and with Python in particular, is advised.
In the fall of 2022 there will be three machine problem sets, three exams and a final project.
The History of Deep Learning and Moore's Law of AI (2020) | Slides | Video1 | Video2 |
Generative Spoken Language Model Demo | |||
Codex Demo | |||
Soccer Demo | |||
The Fundamental Equations of Deep Learning | Slides | Video | |
Some Information Theory | Slides | Video | |
Problems | |||
Solutions |
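As a quick self-check on the information-theory lecture above, here is a minimal Python sketch (illustrative only, not course code) of entropy, cross-entropy, and KL divergence for discrete distributions:

```python
import math

def entropy(p):
    # H(p) = -sum_x p(x) ln p(x), measured in nats
    return -sum(px * math.log(px) for px in p if px > 0)

def cross_entropy(p, q):
    # H(p, q) = -sum_x p(x) ln q(x)
    return -sum(px * math.log(qx) for px, qx in zip(p, q) if px > 0)

def kl(p, q):
    # KL(p || q) = H(p, q) - H(p), always >= 0
    return cross_entropy(p, q) - entropy(p)

p = [0.5, 0.5]
q = [0.9, 0.1]
print(entropy(p))  # ln 2, about 0.693 nats
print(kl(p, q))    # positive, since q differs from p
```

Note that KL(p || q) is zero exactly when the two distributions coincide.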
Frameworks and Back-Propagation:
Deep Learning Frameworks | Slides | Video |
Backpropagation for Scalar Source Code | Slides | Video |
Framework Objects and Backpropagation for Tensor Source Code | Slides | Video |
Minibatching: The Batch Index | Slides | Video |
The Educational Framework (EDF) | Slides | Video |
EDF and the MNIST Coding Problem | ||
PyTorch tutorial | ||
Problems | ||
Solutions |
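The backpropagation lectures above can be illustrated with a minimal reverse-mode autodiff sketch over scalars. This is a toy illustration of the compute-then-backpropagate pattern, not the EDF framework itself:

```python
# Each node stores its value and accumulates d(loss)/d(node) in .grad
# during the backward sweep over a topological ordering of the graph.
class Scalar:
    def __init__(self, value, parents=(), local_grads=()):
        self.value = value
        self.grad = 0.0
        self.parents = parents          # input nodes
        self.local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Scalar(self.value + other.value, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Scalar(self.value * other.value, (self, other),
                      (other.value, self.value))

    def backward(self):
        # Topologically order the graph, then propagate gradients backward.
        order, seen = [], set()
        def visit(node):
            if node not in seen:
                seen.add(node)
                for p in node.parents:
                    visit(p)
                order.append(node)
        visit(self)
        self.grad = 1.0
        for node in reversed(order):
            for parent, g in zip(node.parents, node.local_grads):
                parent.grad += node.grad * g

x = Scalar(3.0)
y = Scalar(4.0)
loss = x * y + x   # d(loss)/dx = y + 1 = 5,  d(loss)/dy = x = 3
loss.backward()
print(x.grad, y.grad)  # 5.0 3.0
```

The `+=` in the backward sweep is the key point: a node used twice (like `x` here) sums the gradients from both uses.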
Einstein Notation and Convolutional Neural Networks (CNNs):
Einstein Notation | Slides | Video |
CNNs | Slides | Video |
PyTorch Convolution Functions | ||
Dilation, Hypercolumns and Grouping (optional) | ||
Invariant Theory (optional) | ||
Problems | ||
Solutions |
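The core operation of the CNN lectures can be sketched in a few lines of plain Python (a toy 1-D "valid" cross-correlation, the convolution used in CNN layers; real layers add channels, batching, and padding):

```python
def conv1d(signal, kernel):
    # "Valid" cross-correlation: out[i] = sum_k signal[i + k] * kernel[k]
    # In Einstein-style index notation: out[i] = signal[i + k] kernel[k]
    K = len(kernel)
    return [sum(signal[i + k] * kernel[k] for k in range(K))
            for i in range(len(signal) - K + 1)]

# A difference kernel acts as a simple edge detector.
print(conv1d([1, 2, 3, 4], [1, 0, -1]))  # [-2, -2]
```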
Trainability, Residual Connections and RNNs:
Trainability: Relu, Initialization, Batch Normalization and Residual Connections (ResNet) | Slides | Video |
Language Modeling | Slides | Video |
Recurrent Neural Networks (RNNs) | Slides | Video |
Problems | ||
Solutions |
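A vanilla RNN cell of the kind covered above can be sketched as follows (pure Python, illustrative only; the weight values here are arbitrary, not trained):

```python
import math

def rnn_step(h, x, W_hh, W_hx, b):
    # h_next[i] = tanh( sum_j W_hh[i][j] h[j] + sum_j W_hx[i][j] x[j] + b[i] )
    return [math.tanh(sum(W_hh[i][j] * h[j] for j in range(len(h)))
                      + sum(W_hx[i][j] * x[j] for j in range(len(x)))
                      + b[i])
            for i in range(len(h))]

# Run a 2-unit RNN over a short sequence of 1-dimensional inputs.
W_hh = [[0.5, 0.0], [0.0, 0.5]]
W_hx = [[1.0], [-1.0]]
b = [0.0, 0.0]
h = [0.0, 0.0]
for x in [[1.0], [0.5]]:
    h = rnn_step(h, x, W_hh, W_hx, b)
print(h)
```

The same `rnn_step` weights are reused at every time step; that parameter sharing across time is what makes the network recurrent.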
Attention, Machine Translation and the Transformer:
Machine Translation and Attention | Slides | Video |
The Transformer Part I | Slides | Video |
The Transformer Part II | Slides | Video |
Statistical Machine Translation (optional) | Slides | |
Masked Language Modeling, Gibbs Sampling and Pseudo-Likelihood | Slides | Video |
Problems | ||
Solutions |
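The attention mechanism at the heart of the transformer lectures can be sketched as scaled dot-product attention (single head, no learned projections, pure Python for clarity):

```python
import math

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    # out_i = sum_j softmax_j( Q_i . K_j / sqrt(d) ) V_j
    d = len(Q[0])
    out = []
    for q in Q:
        scores = [sum(qc * kc for qc, kc in zip(q, k)) / math.sqrt(d)
                  for k in K]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V))
                    for c in range(len(V[0]))])
    return out

Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[10.0], [20.0]]
# The query matches the first key more strongly, so the output
# is a weighted average pulled toward the first value (10).
print(attention(Q, K, V))
```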
SGD I: Convergence and Temperature:
The Classical Convergence Theorem | Slides | Video |
The Learning Rate, the Batch Size, and Temperature | Slides | Video |
Momentum and Temperature | Slides | Video |
RMSProp and Adam | Slides | Video |
Problems | ||
Solutions |
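The Adam update rule discussed above can be sketched on a one-dimensional quadratic (a toy illustration with the usual default hyperparameters; the convergence behavior, not the exact trajectory, is the point):

```python
import math

def adam(grad, x0, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    # Exponential moving averages of the gradient (m) and its square (v),
    # with bias correction, give a per-parameter adaptive step size.
    x, m, v = x0, 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        x -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return x

# Minimize f(x) = (x - 3)^2, whose gradient is 2(x - 3); x approaches 3.
x_star = adam(lambda x: 2 * (x - 3), x0=0.0)
print(x_star)
```

Note that early on `m_hat / sqrt(v_hat)` is close to the sign of the gradient, so the initial step size is roughly the learning rate regardless of the gradient's scale.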
SGD II: Gradient Flow, Stochastic Differential Equations and Stationary Distributions:
Gradient Flow | Slides | Video |
Stochastic Differential Equations (SDEs) | Slides | Video |
Stationary Distributions and Temperature | Slides | Video |
Heat Capacity: Loss (Energy) as a Function of Learning Rate (Temperature) | Slides | Video |
Readings: SGD as Approximate Bayesian Inference, Mandt et al. 2017 | ||
Problems | ||
Solutions |
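The SDE view above can be illustrated by simulating discretized Langevin dynamics, whose stationary distribution is the Gibbs distribution at the given temperature (a toy simulation; the step size and run length here are arbitrary choices):

```python
import math, random

def langevin(grad, x0, temp, dt=0.01, steps=50_000, burn_in=5_000, seed=0):
    # Discretized Langevin SDE: dx = -f'(x) dt + sqrt(2 T dt) dW.
    # Its stationary distribution is the Gibbs distribution ~ exp(-f(x)/T).
    rng = random.Random(seed)
    x, samples = x0, []
    for t in range(steps):
        x += -grad(x) * dt + math.sqrt(2 * temp * dt) * rng.gauss(0.0, 1.0)
        if t >= burn_in:
            samples.append(x)
    return samples

# For f(x) = x^2 / 2 the Gibbs distribution is N(0, T), so the
# empirical variance of the samples should track the temperature.
T = 0.5
samples = langevin(lambda x: x, x0=0.0, temp=T)
var = sum(s * s for s in samples) / len(samples)
print(var)  # roughly T
```

Doubling the temperature doubles the stationary variance, which is the sense in which the noise level of SGD sets an effective temperature.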
Generalization and Regularization I: Early Stopping and Shrinkage
Early Stopping and Shrinkage | Slides | Video |
Early Stopping as Shrinkage, L1 Regularization and Ensembles | Slides | Video |
Problems | ||
Solutions |
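Shrinkage in its simplest form is one-dimensional ridge regression, where the L2 penalty visibly shrinks the least-squares estimate toward zero (a minimal closed-form sketch):

```python
def ridge_1d(xs, ys, lam):
    # One-dimensional ridge regression (L2 shrinkage):
    #   w = argmin_w  sum_i (y_i - w x_i)^2 + lam * w^2
    #     = (sum_i x_i y_i) / (sum_i x_i^2 + lam)
    sxy = sum(x * y for x, y in zip(xs, ys))
    sxx = sum(x * x for x in xs)
    return sxy / (sxx + lam)

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]           # the exact fit is w = 2
print(ridge_1d(xs, ys, 0.0))   # 2.0: no shrinkage
print(ridge_1d(xs, ys, 14.0))  # 1.0: larger lam shrinks w toward 0
```

Early stopping on gradient descent for this objective has a qualitatively similar effect: stopping sooner corresponds to a larger effective lam.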
Generalization and Regularization II: PAC-Bayesian Learning Theory
Learning Theory I: The Occam Guarantee | Slides | Video |
Learning Theory II: The PAC-Bayes Guarantee | Slides | Video |
Implicit Regularization | Slides | Video |
Double Descent | Slides | Video |
PAC-Bayes Tutorial | ||
Problems | ||
Solutions |
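A numerical feel for the Occam guarantee: below is one standard Occam-style bound (Hoeffding plus a union bound over a prior P on hypotheses); the lectures' exact constants may differ, so treat this form as illustrative:

```python
import math

def occam_bound(train_err, prior_prob, n, delta=0.05):
    # With probability at least 1 - delta over a sample of n points,
    #   err(h) <= train_err(h) + sqrt( (ln 1/P(h) + ln 1/delta) / (2 n) )
    # A hypothesis with small prior probability P(h) (i.e., a long
    # description) pays a larger complexity penalty.
    return train_err + math.sqrt(
        (math.log(1 / prior_prob) + math.log(1 / delta)) / (2 * n))

# A hypothesis with prior probability 2^-20 (a 20-bit description)
# and 5% training error on 10,000 points:
print(occam_bound(0.05, 2 ** -20, 10_000))
```

As expected, the bound tightens as the sample size grows and loosens as the description length of the hypothesis grows.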
Generative Adversarial Networks (GANs):
GAN Fundamentals | Slides | Video |
Timeline of GAN Development | Slides | Video |
StyleGAN2 YouTube1 | ||
StyleGAN2 YouTube2 | ||
StyleGAN2 Paper | ||
Problems | ||
Solutions |
Variational Autoencoders:
The Evidence Lower Bound (ELBO) and Variational Autoencoders (VAEs) | Slides | 2021 Video |
Perils of Differential Entropy | Slides | 2021 Video |
Vector Quantized VAEs | Slides | 2021 Video |
Progressive VAEs | Slides | 2021 Video |
Jukebox using VQ-VAEs | ||
Problems | ||
Solutions |
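The regularization term of the ELBO for a Gaussian VAE has a simple closed form, sketched below: the KL divergence between the diagonal Gaussian posterior q(z|x) = N(mu, sigma^2) and the standard normal prior:

```python
import math

def gaussian_kl(mu, log_var):
    # KL( N(mu, sigma^2) || N(0, I) ) for a diagonal Gaussian, the
    # regularizer in the ELBO:
    #   0.5 * sum_i ( mu_i^2 + sigma_i^2 - 1 - ln sigma_i^2 )
    return 0.5 * sum(m * m + math.exp(lv) - 1.0 - lv
                     for m, lv in zip(mu, log_var))

print(gaussian_kl([0.0, 0.0], [0.0, 0.0]))   # 0.0: q already equals the prior
print(gaussian_kl([1.0, -1.0], [0.0, 0.0]))  # 1.0
```

During training this term is subtracted from the reconstruction log-likelihood, pulling the posterior toward the prior.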
Contrastive Coding:
Contrastive Coding | Slides |
Problems |
Solutions |
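The contrastive objective can be sketched as the InfoNCE loss: classify the positive pair against a set of negatives (scores here stand in for learned similarity values):

```python
import math

def info_nce(pos_score, neg_scores):
    # Contrastive (InfoNCE) loss:
    #   loss = -ln [ exp(s+) / (exp(s+) + sum_j exp(s_j)) ]
    # computed stably via the log-sum-exp trick.
    scores = [pos_score] + list(neg_scores)
    m = max(scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in scores))
    return -(pos_score - log_z)

# The loss falls as the positive score rises above the negatives.
print(info_nce(5.0, [0.0, 0.0]))  # small
print(info_nce(0.0, [0.0, 0.0]))  # ln 3: all pairs look alike
```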
Diffusion Models:
VAE Formulation | Slides |
Score Matching Formulation | Slides |
DALLE-2 Development Timeline | Slides |
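The forward (noising) process of a diffusion model has a closed form that can be sketched directly (a toy illustration on a 3-dimensional "image"; real models pair this with a learned denoiser):

```python
import math, random

def forward_diffuse(x0, alpha_bar_t, rng):
    # Closed-form forward process of a diffusion model:
    #   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps,
    # with eps ~ N(0, I). alpha_bar_t decreases from 1 toward 0 with t.
    return [math.sqrt(alpha_bar_t) * x
            + math.sqrt(1 - alpha_bar_t) * rng.gauss(0.0, 1.0)
            for x in x0]

rng = random.Random(0)
x0 = [1.0, -1.0, 2.0]
print(forward_diffuse(x0, 0.99, rng))  # early step: close to x0
print(forward_diffuse(x0, 0.01, rng))  # late step: nearly pure noise
```

Sampling runs this process in reverse, using a network trained to predict the noise eps at each step.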
Reinforcement Learning (RL):
Basic Definitions, Value Iteration | Slides | Video |
Q-Learning and Deep Q Networks (DQN) for Atari | Slides | Video |
The REINFORCE algorithm | Slides | Video |
Actor-Critic algorithms, A3C for Atari | Slides | Video |
Problems | ||
Solutions |
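Value iteration, the starting point of the RL unit above, can be sketched on a tiny deterministic MDP (a toy example with made-up transitions and rewards):

```python
def value_iteration(transitions, rewards, gamma=0.9, iters=100):
    # transitions[s][a] = next state (deterministic MDP for simplicity);
    # Bellman update: V(s) <- max_a [ R(s, a) + gamma * V(s') ]
    n = len(transitions)
    V = [0.0] * n
    for _ in range(iters):
        V = [max(rewards[s][a] + gamma * V[s2]
                 for a, s2 in enumerate(transitions[s]))
             for s in range(n)]
    return V

# Two states, two actions ("stay", "move"); reward 1 only for staying
# in state 1, so V(1) -> 1/(1-gamma) = 10 and V(0) -> gamma * 10 = 9.
transitions = [[0, 1], [1, 0]]
rewards = [[0.0, 0.0], [1.0, 0.0]]
V = value_iteration(transitions, rewards)
print(V)
```

Q-learning estimates the same fixed point from sampled transitions rather than from a known model.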
AlphaZero and AlphaStar:
Background Algorithms | Slides | Video |
The AlphaZero Training Algorithm | Slides | Video |
AlphaZero Results | Slides | Video |
MuZero | Slides | |
AlphaStar | Slides | Video |
Problems | ||
Solutions |
The Quest for Artificial General Intelligence (AGI):
AGI: Universality | Slides | Video |
AGI: Bootstrapping | Slides | Video |
AGI: Logic | Slides | Video |
AGI: Natural Language | Slides | Video |