Deep Video Compression

  • Di Ma

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)


With the increase in demand for improved viewing quality and more immersive experiences, the tension between the large amounts of video data consumed every day and the available bandwidth is ever increasing. To address this issue, new video coding standards have been developed, including Versatile Video Coding (VVC) and Alliance for Open Media Video 1 (AV1). Although these compression techniques have achieved evident coding gains when compared to current standards, they still employ a similar framework to that used in previous codecs, but with much more sophisticated modifications and enhancements. None of them, however, exploits recent advances in artificial intelligence and machine learning.

In this context, this thesis describes novel CNN-based algorithms to further enhance video compression efficiency. It first presents a new extensive and representative video database (BVI-DVC) for training deep video compression algorithms, which can provide significantly improved training effectiveness compared to other commonly used image and video training databases. The overall additional coding improvements (based on the HEVC HM 16.20) by using the BVI-DVC for all tested coding modules and CNN architectures are up to 10.3% based on the assessment of PSNR and 8.1% based on VMAF.

Novel network architectures have also been investigated in the context of video coding, including MFRNet, which consists of new multi-level feature review residual dense blocks. This structure offers significant coding gains when integrated into various enhancement-based coding tools. When compared to state-of-the-art networks, MFRNet provides overall additional coding gains of up to 11.6% (PSNR) and 11.7% (VMAF), based on the HEVC HM 16.20.

The perceptual quality of CNN-reconstructed content has been further improved through the utilisation of GAN-based networks and training methodologies. The new CVEGAN architecture is also presented in this thesis, which achieves superior compression performance over state-of-the-art architectures for different coding tools (an average additional coding gain of up to 18.1% (VMAF) has been achieved based on the HEVC HM 16.20).

Finally, the complexity issue of these CNN-based coding tools is addressed through flexible complexity distribution between the encoder and decoder. By including a CNN-based resolution down-sampling module, we have achieved both coding performance improvement (more than 10% (BD-rate PSNR) based on the HEVC HM 16.20) and computational complexity reduction (29% and 10% for encoder and decoder, respectively).
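The coding gains quoted above are expressed as BD-rate figures. The abstract does not say how these were computed, so the following is only a minimal sketch of the commonly used Bjøntegaard delta-rate calculation, assuming four rate/PSNR points per codec and a cubic fit of log-rate against PSNR. The function name `bd_rate` and the sample numbers are hypothetical and are not taken from the thesis.

```python
import numpy as np


def bd_rate(rate_anchor, psnr_anchor, rate_test, psnr_test):
    """Average bitrate change (%) of the test codec relative to the anchor.

    Negative values mean the test codec needs less bitrate for the
    same PSNR. Illustrative sketch only, not the thesis's own tooling.
    """
    # Work in the log-rate domain, as in the Bjontegaard method.
    log_ra = np.log10(np.asarray(rate_anchor, dtype=float))
    log_rt = np.log10(np.asarray(rate_test, dtype=float))
    pa = np.asarray(psnr_anchor, dtype=float)
    pt = np.asarray(psnr_test, dtype=float)

    # Fit log-rate as a cubic polynomial of quality for each curve.
    fit_a = np.polyfit(pa, log_ra, 3)
    fit_t = np.polyfit(pt, log_rt, 3)

    # Integrate both fits over the overlapping PSNR interval.
    lo = max(pa.min(), pt.min())
    hi = min(pa.max(), pt.max())
    prim_a = np.polyint(fit_a)
    prim_t = np.polyint(fit_t)
    int_a = np.polyval(prim_a, hi) - np.polyval(prim_a, lo)
    int_t = np.polyval(prim_t, hi) - np.polyval(prim_t, lo)

    # Mean log-rate difference, converted back to a percentage.
    avg_diff = (int_t - int_a) / (hi - lo)
    return (10.0 ** avg_diff - 1.0) * 100.0


# Made-up example points (kbps, dB), for illustration only.
anchor_rates = [1000.0, 2000.0, 4000.0, 8000.0]
anchor_psnr = [32.0, 35.0, 38.0, 41.0]
test_rates = [0.9 * r for r in anchor_rates]  # 10% fewer bits at equal PSNR

saving = bd_rate(anchor_rates, anchor_psnr, test_rates, anchor_psnr)
print(f"BD-rate: {saving:.2f}%")
```

In this made-up case the test codec uses 10% fewer bits at every quality level, so the result is a BD-rate of about -10%. The same procedure applies to VMAF by passing VMAF scores in place of PSNR.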
Date of Award: 24 Jun 2021
Original language: English
Awarding Institution
  • University of Bristol
Supervisor: David R Bull