Abstract

The use of machine learning and AI on electronic health records (EHRs) holds substantial potential for clinical insight. However, this approach faces significant challenges due to data heterogeneity, sparsity, temporal misalignment, and limited labeled outcomes. In this context, we leverage a linked EHR dataset of approximately one million de-identified individuals from Bristol, North Somerset, and South Gloucestershire, UK, to characterize urinary tract infections (UTIs) and develop predictive models focused on data quality, fairness and transparency. A comprehensive data pre-processing and curation pipeline transforms the raw EHR data into a structured format suitable for AI modeling. Given the limited availability and biases of ground truth UTI outcomes, we introduce a UTI risk estimation framework informed by clinical expertise to estimate UTI risk across individual patient timelines. Using this framework, we built pairwise XGBoost models to differentiate UTI risk categories with explainable AI techniques to identify key predictors while ensuring interpretability. Our findings reveal differences in clinical and demographic factors across risk groups, offering insights into UTI risk stratification and progression. This study demonstrates the added value of AI-driven insights into UTI clinical decision-making while prioritizing interpretability, transparency, and fairness, underscoring the importance of sound data practices in advancing health outcomes.
Original languageEnglish
PublisherarXiv.org
DOIs
Publication statusPublished - 26 Nov 2024

Keywords

  • cs.LG
  • cs.AI

Fingerprint

Dive into the research topics of 'Explainable AI for Classifying UTI Risk Groups Using a Real-World Linked EHR and Pathology Lab Dataset'. Together they form a unique fingerprint.

Cite this