Abstract
Psychosocial risk assessment is a cornerstone of mental health care, yet remains resource-intensive and inconsistently delivered across domains such as suicide, intimate partner violence (IPV), and substance misuse. Recent advances in large language models (LLMs) raise the possibility of scalable, conversational agents capable of detecting and evaluating psychosocial risk. Across three interlinked studies, we evaluated the performance of LLMs in this context. Study 1 benchmarked GPT-4 and Claude 3 Sonnet against vignettes constructed from participants’ lived experience, finding high accuracy in detecting risk domains and substantial agreement with participant-rated severity, though suicidality proved more challenging than IPV or substance misuse. Study 2 examined participants’ perceptions of LLM-generated responses, revealing that most judged them accurate, empathic, and clinically useful, with no differences across models or domains. Study 3 implemented a supervised, three-agent GPT-4o-based chatbot system comprising a therapeutic agent, a supervisor for risk detection, and a JSON-based assessor for structured evaluation. The therapeutic agent successfully completed full risk assessments most of the time while maintaining therapeutic quality. Together, these studies suggest that LLMs can contribute to psychosocial risk detection and structured assessment under controlled conditions, while underscoring the need for careful supervision, rigorous validation, and clearly defined boundaries before consideration of real-world clinical deployment.
| Original language | English |
|---|---|
| Article number | e0001352 |
| Number of pages | 25 |
| Journal | PLOS Digital Health |
| Volume | 5 |
| Issue number | 4 |
| Publication status | Published - 27 Apr 2026 |
Bibliographical note
Publisher Copyright: © 2026 Vowels et al.
UN SDGs
This output contributes to the following UN Sustainable Development Goals (SDGs)
- SDG 3 Good Health and Well-being
- SDG 16 Peace, Justice and Strong Institutions
Fingerprint
Dive into the research topics of 'Large language models for psychosocial risk assessment: A multi-method evaluation across suicide, intimate partner violence, and substance misuse'. Together they form a unique fingerprint.