Deep Convolutional Networks as shallow Gaussian Processes

Adria Garriga-Alonso, Carl E. Rasmussen, Laurence Aitchison

Research output: Chapter in Book/Report/Conference proceeding › Conference Contribution (Conference Proceeding)


We show that the output of a (residual) CNN with an appropriate prior over the weights and biases is a GP in the limit of infinitely many convolutional filters, extending similar results for dense networks. For a CNN, the equivalent kernel can be computed exactly and, unlike "deep kernels", has very few parameters: only the hyperparameters of the original CNN. Further, we show that this kernel has two properties that allow it to be computed efficiently; the cost of evaluating the kernel for a pair of images is similar to that of a single forward pass through the original CNN with only one filter per layer. The kernel equivalent to a 32-layer ResNet obtains 0.84% classification error on MNIST, a new record for GPs with a comparable number of parameters.
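The kernel evaluation the abstract describes can be sketched in NumPy. This is an illustrative reconstruction, not the authors' code: it propagates per-position covariance maps between two images through alternating convolution steps (patch averaging under i.i.d. weight priors) and ReLU steps (the order-1 arc-cosine expectation), then reads out a scalar kernel by averaging over positions, mimicking a dense final layer. The depth, 3×3 patches, and variance hyperparameters `sw2`/`sb2` are assumptions chosen for the sketch.

```python
import numpy as np

def conv_prop(K, sw2=1.0, sb2=0.1):
    """Propagate a covariance map through one conv layer with i.i.d.
    zero-mean weights (variance sw2 / patch_size) and biases (variance sb2):
    each output covariance is the patch average plus the bias variance."""
    H, W = K.shape
    P = np.pad(K, 1)  # zero-pad so output keeps the spatial shape
    out = np.empty_like(K)
    for i in range(H):
        for j in range(W):
            out[i, j] = sw2 * P[i:i + 3, j:j + 3].mean() + sb2
    return out

def relu_prop(Kxx, Kxy, Kyy):
    """E[relu(u) relu(v)] for zero-mean jointly Gaussian (u, v):
    the order-1 arc-cosine kernel formula."""
    s = np.sqrt(Kxx * Kyy)
    theta = np.arccos(np.clip(Kxy / s, -1.0, 1.0))
    Kxy_new = s / (2 * np.pi) * (np.sin(theta) + (np.pi - theta) * np.cos(theta))
    # E[relu(u)^2] = Var(u) / 2 for zero-mean Gaussian u
    return Kxx / 2, Kxy_new, Kyy / 2

def cnn_gp_kernel(x, y, depth=3):
    """Kernel between two grayscale images under the infinite-filter CNN prior.
    Only same-position covariances are tracked, which is what makes the cost
    comparable to one forward pass with a single filter per layer."""
    Kxx, Kxy, Kyy = x * x, x * y, y * y
    for _ in range(depth):
        Kxx, Kxy, Kyy = conv_prop(Kxx), conv_prop(Kxy), conv_prop(Kyy)
        Kxx, Kxy, Kyy = relu_prop(Kxx, Kxy, Kyy)
    return Kxy.mean()  # dense readout over spatial positions
```

Note that only same-position covariance maps are carried between layers; under the i.i.d. weight prior, cross-position terms are not needed for a dense readout, which is the efficiency property the abstract refers to.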
Original language: English
Title of host publication: International Conference on Learning Representations
Publication status: Published - 6 May 2019
