On the Connection Between the Human Visual System and Machine Learning: Exploiting Perceptual Distances in Neural Networks

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)

Abstract

It has been demonstrated many times that the behaviour of the human visual system is connected to the statistics of natural images. Since machine learning also relies on the statistics of its training data, this connection has interesting implications. Of particular interest is the use of perceptual distances, which capture how the human visual system processes images and produces subjective opinions, in the machine learning pipeline. In this thesis we examine three scenarios that utilise perceptual metrics in different parts of the pipeline: in the model architecture, as regularisation in the loss function, and in explaining the model's decisions.
In the first scenario, we propose a perceptual metric using a neural network whose architecture is inspired by the various processing stages in the human visual system. We show that a carefully designed neural network architecture can outperform traditional architectures while using several orders of magnitude fewer parameters.
In the next chapter, we show that replacing the l1-norm with a perceptual metric as a regulariser in image-to-image translation improves the quality of the generated images, as evaluated with both non-reference image quality metrics and human studies. Our final scenario involves using a perceptual metric to generate explanations for a model's decision. In the family of surrogate explainers, the perceptual metric is used as a neighbourhood weighting and leads to more coherent explanations, both for reference images and for images to which natural distortions have been applied.
In the final chapter, we aim to unravel the non-trivial relationship between the probability distribution of the data, perceptual distances, and unsupervised machine learning. To this end, we show that perceptual sensitivity is correlated with the probability of an image in its close neighbourhood. We also explore the relation between the distances induced by autoencoders and the probability distribution of the data used to train them, as well as how these induced distances correlate with human perception. In the scenarios presented in the previous chapters, we observe only a small performance increase from using perceptual distances in the machine learning pipeline instead of standard Euclidean distances. Given the difference between Euclidean and perceptual distances in their ability to predict human judgements, one would expect a larger increase in performance. We discuss why perceptual distances might not lead to noticeable gains in performance for common image processing tasks and propose that this may be due to a double-counting effect of the image statistics, which are accounted for once in the perceptual distance and once in the training procedure. When data is scarce, the perceptual metric acts as a regulariser and the performance increase is noticeable. When training an autoencoder using random uniform noise as input, optimising for a perceptual metric leads to a significant increase in the quality of reconstructions compared to optimising for a Euclidean distance. We also find that perceptual metrics obtain more accurate gradients with batch stochastic gradient descent when using a small batch size.
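As a rough illustration of the neighbourhood weighting mentioned for surrogate explainers, the following minimal sketch weights perturbed samples by their distance to a reference image. It is not the thesis implementation: the exponential kernel, the `kernel_width` parameter, the flattened-list image representation and the function names are all assumptions made for illustration, and a plain Euclidean distance stands in where a perceptual metric would be plugged in.

```python
import math
from typing import Callable, List, Sequence

# Assumption: images are flattened sequences of pixel values.
Image = Sequence[float]


def euclidean_distance(a: Image, b: Image) -> float:
    """Stand-in distance; a perceptual metric would replace this function."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def neighbourhood_weights(
    reference: Image,
    samples: Sequence[Image],
    distance: Callable[[Image, Image], float] = euclidean_distance,
    kernel_width: float = 1.0,
) -> List[float]:
    """Weight each sample with an exponential kernel on its distance
    to the reference (hypothetical kernel choice for this sketch)."""
    return [
        math.exp(-(distance(reference, s) ** 2) / kernel_width ** 2)
        for s in samples
    ]


reference = [0.0, 0.0]
samples = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
weights = neighbourhood_weights(reference, samples)
```

Samples closer to the reference receive weights nearer to 1, so a surrogate model fitted with these weights concentrates on the local neighbourhood. Swapping `distance` for a perceptual metric changes which samples count as "close" without altering the rest of the procedure.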
Date of Award: 22 Mar 2022
Original language: English
Awarding Institution
  • University of Bristol
Supervisors: Raul Santos-Rodriguez, Ryan McConville & Valero Laparra Perez-Muelas
