Deep Learning-Based Analysis of Videofluoroscopy Imaging

  • Simon Zeng

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)

Abstract

Recent advances in deep learning have redefined image analysis, with transformer-based architectures leading performance gains. Vision Transformers excel at capturing global context, while Swin Transformers enhance accuracy by jointly modeling local and global information. Additionally, large-scale foundation models have emerged as robust tools for transfer learning.
However, applying these general-purpose vision methods to medical imaging remains complex. Videofluoroscopic Swallow Studies (VFSS) present unique challenges, including motion artifacts, variable image quality, and a scarcity of annotated data. Consequently, the primary motivation of this work is to adapt these architectures specifically to improve VFSS analysis and diagnosis.

Parallel to this, Physics-Informed Neural Networks (PINNs) offer a promising avenue for integrating physical constraints—such as tissue mechanics—directly into models. This thesis demonstrates the synergy of Vision Transformers, Swin Transformers, and foundation models in segmenting VFSS data. It establishes a framework tailored to clinical demands and outlines a future vision for unifying representation learning with PINNs. Ultimately, this research aims to deliver diagnostic tools that are accurate, efficient, and objective for practitioners and patients.
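The core idea behind PINNs mentioned above is to add a physics-residual penalty to the usual data-fitting loss, so predictions are pulled toward solutions of a governing equation. The thesis does not specify its constraint, so the sketch below is purely illustrative: it uses a simple exponential-decay ODE (du/dt = -k·u) as a stand-in for tissue-mechanics constraints, with finite differences in place of automatic differentiation. The function name, weighting `lam`, and the ODE itself are all assumptions, not the author's method.

```python
import numpy as np

def pinn_style_loss(u_pred, u_obs, t, decay=1.0, lam=0.1):
    """Illustrative physics-informed loss (hypothetical, not the thesis's model):
    data-fitting MSE plus the squared residual of the ODE du/dt = -decay * u."""
    data_loss = np.mean((u_pred - u_obs) ** 2)
    du_dt = np.gradient(u_pred, t)       # finite-difference stand-in for autograd
    residual = du_dt + decay * u_pred    # residual of the assumed decay ODE
    physics_loss = np.mean(residual ** 2)
    return data_loss + lam * physics_loss
```

Because u(t) = exp(-t) solves the assumed ODE exactly, feeding it in as both prediction and observation yields a near-zero loss, while any perturbation raises it; in a real PINN the residual term would instead encode the relevant mechanical constraint and be differentiated through the network.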
Date of Award: 9 Dec 2025
Original language: English
Awarding Institution
  • University of Bristol
