Leveraging home-based semi-structured writings and unstructured conversations for early dementia detection

  • Dan P Kumpik

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)

Abstract

Alzheimer’s disease affects 42 million people globally, causing progressive memory, executive and language dysfunction. Developing scalable, noninvasive tools for early detection of prodromal dementia is critical for timely intervention and improved outcomes. Naturalistic narratives offer a promising biomarker, showing deficits decades before cognitive symptoms. Spoken conversations have gained attention due to their cognitive complexity and ease of acquisition, although they are usually collected in semi-structured interviews, which limit ecological validity and overlook longitudinal insights from changes in companion behaviors, such as increased prompting during word-finding difficulties.
This thesis develops machine learning approaches for identifying early cognitive decline through passive monitoring of naturalistic, unstructured conversations between cohabiting companions. In Chapter Two, we validated the relevance of linguistic features across lexical, connected language, semantic, and sentiment domains to cognitive change, using semi-structured handwritten narratives from the Caerphilly Prospective Study, a longitudinal dataset spanning 11+ years. Deficits were multifactorial and sometimes compensatory, with connected language particularly sensitive to early decline. In Chapter Three, in the CUBOId study, we applied these features to naturalistic, home-based conversations recorded six months apart. Mixed-effects analyses revealed sensitivity to baseline differences and changes over time, particularly for connected language. Combining linguistic features with embeddings from a DistilRoBERTa large language model (LLM), we developed a multi-task learning pipeline with diagnosis as the primary task and speaker identification as the auxiliary task. Ablation experiments confirmed strong diagnostic performance when both linguistic features and LLM embeddings were included. In Chapter Four, we tested whether intermediate transfer learning on a semi-structured dementia dataset improved performance. However, direct transfer to naturalistic conversations proved more effective, suggesting unstructured language samples alone are sufficient for robust dementia detection "in-the-wild".

Our findings highlight connected language features and LLMs as promising tools for early dementia detection and demonstrate the clinical value of passive monitoring of unstructured speech in naturalistic settings.
Date of Award: 30 Sept 2025
Original language: English
Awarding Institution
  • University of Bristol
Supervisors: Raul Santos-Rodriguez, Yoav Ben-Shlomo & E J Coulthard
