Abstract
With the rapid growth of video streaming and rising expectations for perceptual quality, delivering artifact-free content has become a critical objective. However, modern video delivery pipelines, including acquisition, compression, transmission, and format adaptation (e.g., HDR, HFR, UHD), often introduce multiple co-occurring artifacts (e.g., banding, blockiness, motion blur) that degrade visual quality. Although video quality assessment (VQA) and artifact detection methods have been proposed to address these issues, existing approaches remain limited in three key areas: (1) the lack of robust architectures for jointly detecting multiple interacting artifacts under complex streaming conditions; (2) the absence of comprehensive benchmarks tailored to multi-artifact detection in professionally generated content (PGC); and (3) the limited generalizability and scalability of traditional VQA models, which often depend on isolated-artifact assumptions, heuristic thresholds, or extensive subjective annotations.

To address these gaps, this thesis introduces novel deep learning-based methodologies, datasets, and unified frameworks that advance artifact-aware VQA and multi-artifact detection. First, it presents BVI-Artefact, the first large-scale public dataset tailored for multi-artifact detection in PGC, comprising 480 annotated sequences covering ten common artifact types. Benchmarking on this dataset reveals notable weaknesses in existing detectors, motivating the need for improved solutions.
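As an illustration of the multi-label structure such a dataset implies, the sketch below shows one way an annotated sequence could be represented: each sequence carries an independent binary flag per artifact type, so several artifacts can co-occur. The schema and field names are hypothetical, not BVI-Artefact's published format, and only the three artifact types named above are listed.

```python
from dataclasses import dataclass

# Hypothetical schema for a multi-label artifact annotation; the remaining
# seven artifact types are omitted since the abstract does not enumerate them.
ARTIFACT_TYPES = ("banding", "blockiness", "motion_blur")  # ... plus seven more

@dataclass
class AnnotatedSequence:
    video_path: str
    labels: dict[str, bool]  # artifact type -> present in this sequence?

seq = AnnotatedSequence(
    video_path="sequences/clip_0001.mp4",  # illustrative path
    labels={"banding": True, "blockiness": True, "motion_blur": False},
)

# Multiple artifacts can be flagged at once, reflecting co-occurrence.
present = [name for name, flag in seq.labels.items() if flag]
print(present)  # ['banding', 'blockiness']
```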
Second, it proposes RankDVQA, a deep ranking-based VQA framework that reformulates quality prediction as a pairwise ranking task. Combining a patch-level quality network (PQANet) trained on VMAF-derived labels with a spatio-temporal aggregator (STANet), RankDVQA significantly outperforms state-of-the-art methods on eight HD test datasets. It is further extended as RankDVQA+, which supports cross-format generalization through a SlowFast-inspired spatio-temporal aggregation network that better exploits the distinct characteristics of video content. Evaluations on a wide range of HD, UHD, HDR, and HFR/VFI test databases demonstrate that both the full-reference (FR) and no-reference (NR) versions of RankDVQA+ consistently outperform existing cutting-edge VQA methods.
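To make the pairwise-ranking reformulation concrete, here is a minimal PyTorch sketch, assuming VMAF differences supply the rank direction for each training pair; the tiny PatchQualityNet below merely stands in for PQANet, and all names are illustrative rather than the thesis's actual implementation.

```python
import torch
import torch.nn as nn

class PatchQualityNet(nn.Module):
    """Toy stand-in for PQANet: maps a patch to a scalar quality estimate."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(16, 1),
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

model = PatchQualityNet()
rank_loss = nn.MarginRankingLoss(margin=0.5)

# Two distorted versions of the same source patches, plus VMAF proxy labels.
patch_a, patch_b = torch.randn(8, 3, 64, 64), torch.randn(8, 3, 64, 64)
vmaf_a, vmaf_b = torch.rand(8) * 100, torch.rand(8) * 100

# target = +1 where patch_a should outrank patch_b, else -1.
target = (vmaf_a > vmaf_b).float() * 2 - 1
loss = rank_loss(model(patch_a), model(patch_b), target)
loss.backward()  # the network learns relative, not absolute, quality
```

The appeal of this formulation is that supervision only requires relative quality, which metric-derived labels such as VMAF differences can provide at scale without subjective annotation.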
Finally, this thesis introduces MVAD, a unified deep model for the simultaneous detection of ten video artifact types. As the first multi-artifact detector to operate independently of VQA models, MVAD leverages an Artifact-aware Dynamic Feature Extractor (ADFE) and a Recurrent Memory Vision Transformer (RMViT). Trained on a large, adversarially augmented dataset that simulates real-world streaming conditions, MVAD significantly outperforms seven existing detectors on the Maxwell and BVI-Artefact datasets, achieving an average mAP improvement of 18.7%.
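As a rough illustration of the multi-label framing under which a unified detector such as MVAD operates, the sketch below predicts an independent presence logit for each of the ten artifact types in a single forward pass; the plain 3D-CNN backbone is a placeholder, not the ADFE/RMViT design described above.

```python
import torch
import torch.nn as nn

NUM_ARTIFACTS = 10  # banding, blockiness, motion blur, ...

class MultiArtifactDetector(nn.Module):
    """Toy multi-label detector: one presence logit per artifact type."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1), nn.Flatten(),
        )
        self.head = nn.Linear(32, NUM_ARTIFACTS)

    def forward(self, clip):  # clip: (batch, 3, frames, height, width)
        return self.head(self.features(clip))

model = MultiArtifactDetector()
criterion = nn.BCEWithLogitsLoss()  # per-class loss, so artifacts may co-occur

clip = torch.randn(2, 3, 8, 64, 64)
labels = torch.randint(0, 2, (2, NUM_ARTIFACTS)).float()
loss = criterion(model(clip), labels)
loss.backward()
```

Framing detection as ten independent binary decisions, rather than thresholding a single quality score, is what allows co-occurring artifacts to be reported simultaneously and evaluated with per-class mAP.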
In summary, these contributions establish a comprehensive foundation for robust, perceptually informed video quality systems. Future research should focus on exploring foundation models for zero-shot VQA and integrating lightweight artifact detectors as perceptual priors in neural codecs and streaming frameworks.
| Date of Award | 9 Dec 2025 |
|---|---|
| Original language | English |
| Awarding Institution | |
| Supervisor | David R Bull (Supervisor) & Fan Zhang (Supervisor) |