Abstract
Objective
When developing prediction models, researchers commonly employ a single model which uses all the available data (end-to-end approach). Alternatively, a similarity-based approach has been previously proposed, in which patients with similar clinical characteristics are first grouped into clusters, then prediction models are developed within each cluster. The potential advantage of the similarity-based approach is that it may better address heterogeneity in patient characteristics. However, it remains unclear whether it improves the overall predictive performance. We illustrate the similarity-based approach using data from people with depression and empirically compare its performance with the end-to-end approach.
Methods
We used primary care data collected in general practices in the UK. Using 31 predefined baseline variables, we aimed to predict the severity of depressive symptoms, measured by the Patient Health Questionnaire-9, 60 days after initiation of antidepressant treatment. Following the similarity-based approach, we used k-means to cluster patients based on their baseline characteristics. We derived the optimal number of clusters using the Silhouette coefficient. We used ridge regression to build prediction models in both approaches. To compare the models’ performance, we calculated the mean absolute error (MAE) and the coefficient of determination (R²) using bootstrapping.
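The two approaches described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' code: it uses scikit-learn, synthetic data in place of the primary care records, standardised features, a ridge penalty of `alpha=1.0`, candidate cluster counts of 2 to 4, and a simple out-of-bag bootstrap. The abstract does not specify the bootstrap scheme, so that choice is a guess. All function names are hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score, silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k(X, candidates=(2, 3, 4), seed=0):
    """Return the candidate k with the highest Silhouette coefficient."""
    scores = {}
    for k in candidates:
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(X)
        scores[k] = silhouette_score(X, labels)
    return max(scores, key=scores.get)


def fit_similarity_based(X, y, k, alpha=1.0, seed=0):
    """Cluster the rows with k-means, then fit one ridge model per cluster."""
    km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(X)
    models = {}
    for c in range(k):
        mask = km.labels_ == c
        models[c] = Ridge(alpha=alpha).fit(X[mask], y[mask])
    return km, models


def predict_similarity_based(km, models, X):
    """Assign each row to its nearest cluster and use that cluster's model."""
    labels = km.predict(X)
    preds = np.empty(len(X))
    for c, model in models.items():
        mask = labels == c
        if mask.any():
            preds[mask] = model.predict(X[mask])
    return preds


def bootstrap_scores(X, y, fit, predict, n_boot=10, seed=0):
    """Mean out-of-bag MAE and R^2 over bootstrap resamples (assumed scheme)."""
    rng = np.random.default_rng(seed)
    n = len(y)
    maes, r2s = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)          # sample rows with replacement
        oob = np.setdiff1d(np.arange(n), idx)     # rows not drawn: held out
        model = fit(X[idx], y[idx])
        pred = predict(model, X[oob])
        maes.append(mean_absolute_error(y[oob], pred))
        r2s.append(r2_score(y[oob], pred))
    return float(np.mean(maes)), float(np.mean(r2s))


# Synthetic stand-in data: 300 "patients", 5 baseline variables (assumption).
rng = np.random.default_rng(42)
X = StandardScaler().fit_transform(rng.normal(size=(300, 5)))
y = X @ np.array([1.0, -0.5, 0.0, 0.3, 0.0]) + rng.normal(scale=1.0, size=300)

k_best = choose_k(X)

# End-to-end approach: a single ridge model on all rows.
mae_e2e, r2_e2e = bootstrap_scores(
    X, y,
    fit=lambda X_, y_: Ridge(alpha=1.0).fit(X_, y_),
    predict=lambda m, X_: m.predict(X_),
)

# Similarity-based approach: cluster first, then one ridge model per cluster.
mae_sim, r2_sim = bootstrap_scores(
    X, y,
    fit=lambda X_, y_: fit_similarity_based(X_, y_, k_best),
    predict=lambda m, X_: predict_similarity_based(*m, X_),
)

print(f"end-to-end:       MAE={mae_e2e:.2f}  R2={r2_e2e:.2f}")
print(f"similarity (k={k_best}): MAE={mae_sim:.2f}  R2={r2_sim:.2f}")
```

In this sketch the clustering is refitted inside every bootstrap resample, so the held-out rows influence neither the cluster centres nor the per-cluster ridge coefficients.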
Results
We analysed data from 16 384 patients. The end-to-end approach resulted in an MAE of 4.64 and R² of 0.20. The best-performing similarity-based model was for four clusters, with MAE of 4.65 and R² of 0.19.
Conclusions
The end-to-end and the similarity-based models yielded comparable performance. Due to its simplicity, the end-to-end approach can be favoured when using demographic and clinical data to build prediction models for pharmacological treatments for depression.
| Original language | English |
|---|---|
| Article number | bmjment-2023-300701 |
| Journal | BMJ Mental Health |
| Volume | 26 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 14 Jun 2023 |
Bibliographical note
Funding Information: OE was supported by the Swiss National Science Foundation (Ambizione grant number 180083). EGO is funded by the National Institute for Health Research (NIHR) Research Professorship to Professor AC (grant RP-2017-08-ST2-006), by the National Institute for Health Research (NIHR) Applied Research Collaboration Oxford and Thames Valley (ARC OxTV) at Oxford Health NHS Foundation Trust, by the National Institute for Health Research (NIHR) Oxford cognitive health Clinical Research Facility and by the NIHR Oxford Health Biomedical Research Centre (grant BRC-1215-20005). FDC is supported by the NIHR Research Professorship to AC (grant RP-2017-08-ST2-006) and by the NIHR Oxford Health Biomedical Research Centre (grant BRC-1215-20005). AC is supported by the National Institute for Health Research (NIHR) Oxford cognitive health Clinical Research Facility, by an NIHR Research Professorship (grant RP-2017-08-ST2-006), by the NIHR Oxford and Thames Valley Applied Research Collaboration, and by the NIHR Oxford Health Biomedical Research Centre (grant BRC-1215-20005); he is currently the CI/PI of two trials about seltorexant in depression, sponsored by Janssen.
Publisher Copyright:
© Author(s) (or their employer(s)) 2023. Re-use permitted under CC BY. Published by BMJ.