This year the course will not involve programming assignments or class projects. There will be problem sets, but the grade will be based entirely on exams, including a final. Exams will include problems sampled from the problem sets plus new problems. I will generally give permission to take the class, but prospective students may want to look at the first lecture slides and the associated problems to get a sense of the level of mathematical maturity assumed.
The course will involve reading and writing
pseudo-code corresponding to code in frameworks such as PyTorch. This is analogous to the use of
pseudo-code in an algorithms class as distinct from actual programming in a programming class.
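For concreteness, here is a minimal sketch of the style of code in question: a single PyTorch training step on an illustrative linear model. The model, data, and hyperparameters are invented for this example; course pseudo-code would present the same structure abstractly.

```python
import torch

# Illustrative model, loss, and optimizer (names and sizes are made up).
model = torch.nn.Linear(10, 2)                         # a one-layer model
loss_fn = torch.nn.CrossEntropyLoss()                  # cross-entropy loss on logits
opt = torch.optim.SGD(model.parameters(), lr=0.1)      # vanilla SGD

x = torch.randn(32, 10)                                # a minibatch of 32 inputs
y = torch.randint(0, 2, (32,))                         # 32 class labels

opt.zero_grad()                      # clear previously accumulated gradients
loss = loss_fn(model(x), y)          # forward pass through the computation graph
loss.backward()                      # back-propagation
opt.step()                           # one SGD parameter update
```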
This course covers the topics listed below. Most topics are relevant to most applications;
applications to natural language processing, computer vision, speech recognition, computational
biology, and computational chemistry will be integrated into the presentation of the general material.
- Information theory: entropy, cross-entropy, KL-divergence, mutual information (see the short numerical sketch after this topic list).
- Deep learning frameworks: computation graphs, back-propagation, minibatching.
- Basic Architectures: multi-layer perceptrons, convolutional neural
networks, Einstein notation.
- More advanced architectures: gated RNNs (LSTMs), ResNet, attention.
- Stochastic gradient descent (SGD): standard variations (Vanilla, Adam, RMSProp), minibatch
scaling laws, second order methods, Hessian-vector products, SGD-friendly initialization.
- Generalization and Regularization: PAC-Bayesian generalization bounds, L2 regularization.
- Autoencoders: rate-distortion autoencoding, variational autoencoders (VAEs) and the evidence
lower bound (the ELBO), vector quantized VAEs (VQ-VAE).
- Deep graphical models: expectation maximization (EM), expectation gradient (EG), connectionist
temporal classification (CTC), various EG approximations.
- Generative Adversarial Networks (GANs): Adversarial optimization, Jensen-Shannon divergence,
mode collapse, Wasserstein GANs, progressive GANs.
- Deep Reinforcement Learning: the REINFORCE algorithm, policy-gradient theorems, DQN, A3C.
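As a taste of the information-theory material above, here is a minimal NumPy sketch computing entropy, cross-entropy, and KL-divergence for two small discrete distributions. The distributions are invented for illustration; the point is the identity KL(p||q) = H(p,q) - H(p).

```python
import numpy as np

p = np.array([0.5, 0.25, 0.25])   # an illustrative distribution
q = np.array([0.25, 0.25, 0.5])   # a second distribution on the same support

entropy = -np.sum(p * np.log2(p))         # H(p) = 1.5 bits
cross_entropy = -np.sum(p * np.log2(q))   # H(p, q) = 1.75 bits
kl = np.sum(p * np.log2(p / q))           # KL(p || q) = 0.25 bits

# KL(p||q) = H(p,q) - H(p), and KL is nonnegative, so H(p,q) >= H(p).
assert np.isclose(kl, cross_entropy - entropy)
```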
Office Hours: Mondays 9:00-11:00, TTIC 530, and 1:00-3:00, TTIC 435
Exams:
- Tuesday, January 15, 10% of grade, class 3
- Tuesday, January 29, 20% of grade, class 7
- Tuesday, February 12, 20% of grade, class 11
- Tuesday, February 26, 20% of grade, class 15
- Final, Tuesday, March 19, 1:30-3:30, TTIC 526B, 30% of grade
Lecture Slides and Course Material (under development; please refresh for the latest version):
- The Fundamental Equations of Deep Learning
- Back-Propagation and Frameworks
- The Educational Framework (EDF) written in Python/NumPy (a minimal computation-graph sketch in this spirit appears at the end of this page)
- Convolutional Neural Networks (CNNs)
- Controlling Gradients: Initialization, Batch Normalization, ResNet and Gated RNNs
- Language Modeling, Machine Translation and Attention
- First Order Stochastic Gradient Descent (SGD)
- Rate-Distortion Autoencoders (RDAs)
- Variational Autoencoders (VAEs) and Noisy Channel RDAs
- Generative Adversarial Networks (GANs)
- Reinforcement Learning (RL)
- Deep Graphical Models
- Connectionist Temporal Classification (CTC)
- Gradients as Dual Vectors, Hessian-Vector Products, and Information Geometry
- The Black Box Problem
- Algorithms for Unfriendly Graphical Models
- The Quest for Artificial General Intelligence (AGI)
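Regarding the EDF item above: EDF is a small computation-graph framework in Python/NumPy. The sketch below shows the general shape of such a framework, forward evaluation followed by backward gradient accumulation via the chain rule. The class and attribute names here (Input, Sigmoid, value, grad) are illustrative assumptions, not EDF's actual API.

```python
import numpy as np

class Input:
    """A leaf node holding a value and an accumulated gradient."""
    def __init__(self, value):
        self.value = np.asarray(value, dtype=float)
        self.grad = np.zeros_like(self.value)
    def forward(self): pass
    def backward(self): pass

class Sigmoid:
    """An interior node: forward computes its value from its parent;
    backward adds dLoss/d(parent) into the parent's gradient."""
    def __init__(self, x):
        self.x = x  # parent node
    def forward(self):
        self.value = 1.0 / (1.0 + np.exp(-self.x.value))
        self.grad = np.zeros_like(self.value)
    def backward(self):
        # chain rule: dL/dx += dL/dy * sigma'(x), with sigma'(x) = y * (1 - y)
        self.x.grad += self.grad * self.value * (1.0 - self.value)

# Run forward in topological order, then backward in reverse order.
x = Input([0.0, 1.0])
y = Sigmoid(x)
for node in [x, y]:
    node.forward()
y.grad = np.ones_like(y.value)   # seed the backward pass with dLoss/dy = 1
for node in [y, x]:
    node.backward()
print(y.value)   # [0.5, 0.7310...]
print(x.grad)    # [0.25, 0.1966...]
```

Gradients accumulate with `+=` rather than assignment so that a node feeding several downstream nodes sums their contributions, which is the same convention PyTorch uses for `.grad`.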