TTIC 31230: Fundamentals of Deep Learning Winter 2020

Stochastic Gradient Descent (SGD)

Slides on Basic Algorithms and Hyperparameter Decoupling

Problems

Additional Material:

Slides on Gradient Flow and Langevin Dynamics

Langevin Dynamics Problems

Slides on a Quenching Algorithm

Blog post on SGD variants

Training ResNet-50 on ImageNet in one hour

Paper on batch size scaling of the learning rate and momentum parameter

Adding Gradient Noise

Temperature Cycling in SGD

MCMC with momentum