Abstract
We investigate the local spectral statistics of the loss surface Hessians of artificial neural networks, where we discover excellent agreement with Gaussian Orthogonal Ensemble statistics across several network architectures and datasets. These results shed new light on the applicability of Random Matrix Theory to modelling neural networks and suggest a previously unrecognised role for it in the study of loss surfaces in deep learning. Inspired by these observations, we propose a novel model for the true loss surfaces of neural networks, consistent with our observations, which allows for Hessian spectral densities with rank degeneracy and outliers, extensively observed in practice, and predicts a growing independence of loss gradients as a function of distance in weight-space. We further investigate the importance of the true loss surface in neural networks and find, in contrast to previous work, that the exponential hardness of locating the global minimum has practical consequences for achieving state-of-the-art performance.
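As a rough illustration of what agreement with Gaussian Orthogonal Ensemble (GOE) local statistics means, the sketch below compares the mean consecutive-spacing ratio of a synthetic GOE matrix with that of uncorrelated levels. It is not the procedure used in the paper: the Hessian is replaced by a randomly sampled matrix, the spacing ratio is one common choice of local statistic, and the names `sample_goe`, `mean_spacing_ratio`, and `bulk_fraction` are hypothetical. The well-known reference values are about 0.53 for GOE and about 0.39 for uncorrelated (Poisson) levels.

```python
import numpy as np

def sample_goe(n, rng):
    """Draw an n x n GOE matrix by symmetrising i.i.d. Gaussian entries."""
    a = rng.normal(size=(n, n))
    # Scaling keeps the spectrum on an O(1) interval as n grows.
    return (a + a.T) / np.sqrt(2.0 * n)

def mean_spacing_ratio(eigenvalues, bulk_fraction=0.5):
    """Mean of min(s_i, s_{i+1}) / max(s_i, s_{i+1}) over the central bulk.

    s_i are gaps between consecutive sorted eigenvalues. The ratio does not
    require unfolding the spectrum, which keeps this sketch short.
    """
    ev = np.sort(np.asarray(eigenvalues, dtype=float))
    n = ev.size
    lo = int(n * (1.0 - bulk_fraction) / 2.0)  # drop the spectral edges
    hi = n - lo
    s = np.diff(ev[lo:hi])
    r = np.minimum(s[:-1], s[1:]) / np.maximum(s[:-1], s[1:])
    return float(r.mean())

rng = np.random.default_rng(0)

# Synthetic stand-in for a Hessian (assumption: a real study would use the
# Hessian of a trained network's loss instead).
goe_ratio = mean_spacing_ratio(np.linalg.eigvalsh(sample_goe(1000, rng)))

# Baseline with no level repulsion: independent uniform "eigenvalues".
poisson_ratio = mean_spacing_ratio(rng.uniform(size=1000))

print(f"GOE mean spacing ratio:     {goe_ratio:.3f}")
print(f"Poisson mean spacing ratio: {poisson_ratio:.3f}")
```

A spectrum whose mean ratio sits near the GOE value rather than the Poisson value shows the level repulsion characteristic of random matrix statistics.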
Original language | English |
---|---|
Article number | 126742 |
Number of pages | 12 |
Journal | Physica A: Statistical Mechanics and its Applications |
Volume | 590 |
Early online date | 11 Dec 2021 |
Publication status | Published - 15 Mar 2022 |
Bibliographical note
Funding Information: JPK is pleased to acknowledge support from ERC Advanced Grant 740900 (LogCorRM). DMG is grateful for the support from the JADE computing facility and in particular the extensive support of Andrew Gittings. NPB is grateful for the support of the Advanced Computing Research Centre of the University of Bristol. Furthermore, the authors would like to thank Samuel Albanie for extensive discussions on the exponential hardness of the true loss.
The authors declare the following financial interests/personal relationships which may be considered as potential competing interests: Nicholas Baskerville reports financial support was provided by Government Communications Headquarters. Jonathan Keating reports financial support was provided by European Research Council.
Publisher Copyright:
© 2021 Elsevier B.V.
Keywords
- cs.LG
- math-ph
- math.MP
- stat.ML
Fingerprint
Dive into the research topics of 'Appearance of Random Matrix Theory in Deep Learning'. Together they form a unique fingerprint.
Equipment
- HPC (High Performance Computing) and HTC (High Throughput Computing) Facilities
Alam, S. R. (Manager), Williams, D. A. G. (Manager), Eccleston, P. E. (Manager) & Greene, D. (Manager)
Facility/equipment: Facility