Towards more efficient CNN models with filter distribution templates

  • Ramon Izquierdo Cordova

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)

Abstract

Researchers in the field of deep learning have used convolutional neural network (CNN) architectures to solve computer vision tasks, claiming improved performance at reduced computational cost. Designing these networks remains a complex task that relies primarily on human experience. New architectures have introduced important innovative elements (ReLU, dropout and batch-normalisation layers) and new structures (residual connections, grouped convolutions). However, one feature has remained unchanged: the pyramidal pattern for distributing the number of filters across layers. Initially proposed in 1989 in the LeNet architecture, this pyramidal design is found in both classical and state-of-the-art CNN architectures. The rationale for this incremental design is rooted in the definition of deep learning itself, which aims to learn hierarchical levels of representation in which complex concepts at higher levels are built by reusing simpler low-level ones. Because many convolutional networks are tested mainly on the ImageNet dataset, and the pyramidal distribution is tuned to fit that dataset, it is unclear whether the pattern works well on other datasets and domains or whether other distributions could yield better performance.

This thesis introduces the concept of filter distribution templates: a small set of filter distribution patterns, differing from the widely adopted pyramidal distribution, that are used to reassign filters in an existing convolutional network without changing the original architecture. The research shows experimentally that models produced with templates outperform those using the pyramidal distribution of filters on several popular datasets from the domains of image classification, audio classification and camera pose estimation. For example, experiments with three popular handcrafted architectures (VGG, ResNet and MobileNetV2) and one automatically discovered architecture (MNASNet), trained on the MNIST, CIFAR, CINIC10 and TinyImagenet datasets, show that models with these alternative distributions are more resource-efficient, reaching reductions of up to 90% in parameters and 79% in memory requirements while matching or surpassing the accuracy of the original model.
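
To make the idea of reassigning filters concrete, the following Python sketch redistributes the per-layer filter counts of a plain stack of convolutions and counts the resulting parameters. It is an illustration only, not the method used in the thesis. The function names (apply_template, conv_params), the template names ("uniform", "reversed") and the choice to keep the total number of filters constant are all assumptions made for this example.

```python
def apply_template(filters, template):
    """Return new per-layer filter counts for a hypothetical template.

    Assumption of this sketch: the total number of filters is preserved.
    The thesis may define its templates and constraints differently.
    """
    n = len(filters)
    total = sum(filters)
    if template == "pyramidal":
        # Keep the original (increasing) distribution unchanged.
        return list(filters)
    if template == "reversed":
        # Put the widest layers first instead of last.
        return list(reversed(filters))
    if template == "uniform":
        # Spread the filters evenly; early layers absorb any remainder.
        base, extra = divmod(total, n)
        return [base + (1 if i < extra else 0) for i in range(n)]
    raise ValueError(f"unknown template: {template}")


def conv_params(filters, in_channels=3, kernel=3):
    """Count weights and biases of a plain chain of k x k convolutions."""
    params = 0
    c_in = in_channels
    for c_out in filters:
        params += kernel * kernel * c_in * c_out + c_out
        c_in = c_out
    return params


if __name__ == "__main__":
    pyramidal = [64, 128, 256, 512]  # illustrative layer widths
    for name in ("pyramidal", "uniform", "reversed"):
        widths = apply_template(pyramidal, name)
        print(name, widths, conv_params(widths))
```

The layer sequence and connectivity stay the same in every case; only the width of each layer changes. That is why the parameter count can differ between templates even when the total number of filters does not.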
Date of Award: 27 Sept 2022
Original language: English
Awarding Institution
  • University of Bristol
Supervisors: Walterio W Mayol-Cuevas & Oliver Ray
