TY - JOUR
T1 - Natural language processing techniques to detect delirium in hospitalized patients from clinical notes
T2 - a systematic review
AU - Shankar, Ravi
AU - Kannan, Janani
AU - Tan, Yih Hng
AU - Xu, Qian
N1 - © 2025. The Author(s).
PY - 2025/11/20
Y1 - 2025/11/20
AB - Delirium affects 50% of hospitalized older adults and 80% of ICU patients, yet detection rates are as low as 20-30% with traditional methods. Natural language processing (NLP) offers potential for automated detection from clinical notes. We systematically reviewed NLP techniques, searching eight databases through March 2025. From 912 records, 13 studies met inclusion criteria, analyzing data from over 450,000 patients. Studies employed rule-based methods (38%), machine learning (31%), deep learning (15%), topic modeling (8%), and semi-supervised learning (8%). Sensitivity ranged from 28.5% to 99.1%, with transformer models achieving the highest performance (AUROC 0.984). However, 61.5% of studies showed high risk of bias due to inadequate validation practices. Only one study conducted external validation, and none evaluated prospective implementation or patient outcomes. Critical reporting gaps common to all studies included the complete absence of fairness considerations, missing-data handling procedures, and implementation guidance. While transformer-based models demonstrated superior performance, significant barriers remain before clinical deployment.
U2 - 10.1038/s41746-025-02051-w
DO - 10.1038/s41746-025-02051-w
M3 - Article (Academic Journal)
C2 - 41266514
SN - 2398-6352
VL - 8
JO - npj Digital Medicine
JF - npj Digital Medicine
M1 - 701
ER -