Stochastic Gradient Descent (SGD)
Slides on Basic Algorithms and Hyperparameter Decoupling
Additional Material:
Slides on Gradient Flow and Langevin Dynamics
Slides on a Quenching Algorithm
Training ResNet-50 on ImageNet in one hour
Paper on batch size scaling of the learning rate and momentum parameter