David McAllester

Winter 2019

This year the course will not involve programming assignments or class projects. There will be problem sets, but the grade will be based entirely on exams, including a final. Exams will include problems sampled from the problem sets plus new problems. I will generally give permission to take the class, but prospective students might want to look at the first lecture slides and the associated problems to get a sense of the level of mathematical maturity assumed.

The course will involve reading and writing pseudo-code corresponding to code in frameworks such as PyTorch. This is analogous to the use of pseudo-code in an algorithms class as distinct from actual programming in a programming class.

This course covers the topics listed below. Most topics are relevant to most applications --- applications to natural language processing, computer vision, speech recognition, computational biology, and computational chemistry will be integrated into the presentations of the general methods.

- Information theory: entropy, cross-entropy, KL-divergence, mutual information.
- Deep learning frameworks: computation graphs, back-propagation, minibatching.
- Basic Architectures: multi-layer perceptrons, convolutional neural networks, Einstein notation.
- More advanced architectures: gated RNNs (LSTMs), ResNet, attention.
- Stochastic gradient descent (SGD): standard variations (Vanilla, Adam, RMSProp), minibatch scaling laws, second order methods, Hessian-vector products, SGD-friendly initialization.
- Generalization and Regularization: PAC-Bayesian generalization bounds, L2 regularization (shrinkage), dropout.
- Autoencoders: rate-distortion autoencoding, variational autoencoding (VAEs) and the evidence lower bound (the ELBO), vector quantized VAEs (VQ-VAE).
- Deep graphical models: expectation maximization (EM), expectation gradient (EG), connectionist temporal classification (CTC), various EG approximations.
- Generative Adversarial Networks (GANs): Adversarial optimization, Jensen-Shannon divergence, mode collapse, Wasserstein GANs, progressive GANs.
- Deep Reinforcement Learning: The REINFORCE algorithm, policy-gradient theorems, DQN, A3C, AlphaZero.
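
As a rough indication of the level of the first topic, the following is a minimal sketch (not taken from the course materials) of entropy, cross-entropy, and KL-divergence for discrete distributions in plain Python. The function names and the example distributions `p` and `q` are assumptions chosen only for illustration. It checks the standard identity H(P, Q) = H(P) + KL(P, Q).

```python
import math

def entropy(p):
    """H(P) = -sum_y P(y) log P(y), measured in nats."""
    return -sum(py * math.log(py) for py in p if py > 0)

def cross_entropy(p, q):
    """H(P, Q) = -sum_y P(y) log Q(y); assumes Q(y) > 0 wherever P(y) > 0."""
    return -sum(py * math.log(qy) for py, qy in zip(p, q) if py > 0)

def kl_divergence(p, q):
    """KL(P, Q) = sum_y P(y) log(P(y) / Q(y)); same support assumption as above."""
    return sum(py * math.log(py / qy) for py, qy in zip(p, q) if py > 0)

# Hypothetical distributions over three outcomes, for illustration only.
p = [0.5, 0.25, 0.25]
q = [0.4, 0.4, 0.2]

# Cross-entropy decomposes into entropy plus KL-divergence,
# so for fixed P it is minimized by choosing Q = P.
assert math.isclose(cross_entropy(p, q), entropy(p) + kl_divergence(p, q))
assert kl_divergence(p, q) >= 0.0
```

Because KL(P, Q) is non-negative and zero when Q = P, minimizing cross-entropy over Q for a fixed P amounts to matching Q to P.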

- Tuesday, January 15, 10% of grade, class 3
- Tuesday, January 29, 20% of grade, class 7
- Tuesday, February 12, 20% of grade, class 11
- Tuesday, February 26, 20% of grade, class 15
- Final, Tuesday, March 19, 1:30-3:30, TTI 526B, 30% of grade

Lecture Slides and Course Material (under development --- please refresh for latest version):

- The Fundamental Equations of Deep Learning
- Back-Propagation and Frameworks
- The Educational Framework (EDF) written in Python/NumPy
- Convolutional Neural Networks (CNNs)
- Controlling Gradients: Initialization, Batch Normalization, ResNet and Gated RNNs
- Language Modeling, Machine Translation and Attention
- First Order Stochastic Gradient Descent (SGD)
- Regularization
- Rate-Distortion Autoencoders (RDAs)
- Variational Autoencoders (VAEs) and Noisy Channel RDAs
- Generative Adversarial Networks (GANs)
- Pretraining
- Reinforcement Learning (RL)
- AlphaZero
- Deep Graphical Models
- Connectionist Temporal Classification (CTC)
- Gradients as Dual Vectors, Hessian-Vector Products, and Information Geometry
- The Black Box Problem
- Algorithms for Unfriendly Graphical Models
- The Quest for Artificial General Intelligence (AGI)