Bayesian filtering unifies adaptive and non-adaptive neural network optimization methods

Research output: Contribution to journal › Article (Academic Journal)

Abstract

Neural network optimization methods fall into two broad classes: adaptive methods such as Adam, and non-adaptive methods such as vanilla stochastic gradient descent (SGD). Here, we formulate neural network optimization as Bayesian filtering. We find that state-of-the-art adaptive (AdamW) and non-adaptive (SGD) methods can be recovered by taking limits as the amount of information about each parameter gets large or small, respectively. Building on this correspondence, we develop a new neural network optimization algorithm, AdaBayes, which adaptively transitions between SGD-like and Adam(W)-like behaviour. AdaBayes converges more rapidly than Adam early in learning, and its generalisation performance is competitive with SGD.
Original language: English
Article number: 1807.07540
Journal: arXiv
Publication status: Accepted/In press - 31 Jul 2019
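The abstract describes treating each parameter's value as a latent quantity tracked by a Bayesian filter, with the amount of accumulated information about the parameter determining the optimizer's behaviour. The following toy scalar sketch illustrates the general flavour of such a filtering-style update; it is not the paper's actual AdaBayes derivation, and the function name, the precision-accumulation rule, and all constants are illustrative assumptions:

```python
# Toy sketch (illustrative, not the paper's algorithm): track a Gaussian
# posterior N(mu, sigma2) over a single parameter and update it with a
# Kalman-filter-style step driven by the minibatch gradient. The posterior
# variance acts as a per-parameter step size, so the effective learning
# rate changes as information about the parameter accumulates.
def filtering_step(mu, sigma2, grad, grad_var=1.0):
    # Accumulate precision from the squared gradient (reminiscent of an
    # adaptive optimizer's second-moment estimate).
    sigma2_new = 1.0 / (1.0 / sigma2 + grad ** 2 / grad_var)
    # Take a gradient step whose size is the posterior variance.
    mu_new = mu - sigma2_new * grad
    return mu_new, sigma2_new

# Minimise f(w) = 0.5 * (w - 3)^2, whose gradient is w - 3.
mu, sigma2 = 0.0, 1.0
for _ in range(200):
    mu, sigma2 = filtering_step(mu, sigma2, mu - 3.0)
```

In this sketch the posterior variance `sigma2` shrinks as squared gradients accumulate, so updates start large and become progressively smaller; the real algorithm's transition between Adam(W)-like and SGD-like regimes is derived from the filtering posterior rather than this hand-picked rule.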

