Assessing and predicting adolescent and early adulthood common mental disorders in the ALSPAC cohort using electronic primary care data

Daniel Smith, Kathryn Willan, Stephanie Prady, Josie Dickerson, Gillian Santorelli, Kate M Tilling, Rosie P Cornish

Research output: Working paper


Objectives: To (1) examine agreement between common mental disorders (CMDs) derived from primary health care records and repeated CMD questionnaire data from ALSPAC (the Avon Longitudinal Study of Parents and Children); (2) explore the factors affecting CMD identification in primary care records; and (3) taking ALSPAC as the reference standard, construct models predicting ALSPAC-derived CMDs from primary care data.

Design and Setting: Prospective cohort study (ALSPAC) with linkage to electronic primary care data.

Participants: Primary care records were extracted for 11,807 ALSPAC participants (80% of the 14,731 eligible participants). The number of participants with both linked primary care and ALSPAC CMD data ranged from 3,633 (age 15/16) to 1,298 (age 21/22).

Outcome measures: Outcome measures from ALSPAC data were diagnoses of suspected depression and/or CMDs. For the primary care data, Read codes for diagnosis, symptoms and treatment were used to indicate the presence of depression and CMDs. For each time point, sensitivities and specificities were calculated using ALSPAC-derived CMDs as the reference standard, and the factors associated with identification of primary care-based CMDs among those with suspected ALSPAC-derived CMDs were explored. Lasso models were then fitted to predict ALSPAC CMDs from primary care data.

Results: Sensitivities were low for CMDs (range: 3.5 to 19.1%) and depression (range: 1.6 to 34.0%), while specificities were high (nearly all >95%). The strongest predictor of identification in the primary care data was symptom severity. The lasso models had relatively low predictive performance, especially for out-of-sample prediction (deviance ratio range: -1.3 to 12.6%), but performance improved with age.

Conclusions: Even with predictive modelling using all available information, primary care data underestimate CMD rates compared to estimates from population-based studies. Research into the use of free-text data or secondary care information is needed to improve the predictive accuracy of models using clinical data.
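As an illustration of the measures reported in the abstract, the sketch below computes sensitivity and specificity against a reference standard, and a deviance ratio for binary predictions. It is not the authors' code: the function names and the toy data are hypothetical, and the deviance ratio is assumed to follow the common definition of one minus the model deviance divided by the null (intercept-only) deviance. Under that assumption, a negative out-of-sample value such as -1.3% means the model predicted worse than the null model on new data.

```python
import math


def sensitivity_specificity(reference, test):
    """Return (sensitivity, specificity) for 0/1 flags.

    `reference` is the reference standard (here, questionnaire-derived CMD)
    and `test` is the measure being assessed (here, primary care-derived CMD).
    Assumes both classes occur at least once in `reference`.
    """
    pairs = list(zip(reference, test))
    tp = sum(1 for r, t in pairs if r == 1 and t == 1)
    fn = sum(1 for r, t in pairs if r == 1 and t == 0)
    tn = sum(1 for r, t in pairs if r == 0 and t == 0)
    fp = sum(1 for r, t in pairs if r == 0 and t == 1)
    return tp / (tp + fn), tn / (tn + fp)


def binomial_deviance(y, p):
    """Binomial deviance: -2 times the Bernoulli log-likelihood."""
    return -2.0 * sum(
        yi * math.log(pi) + (1 - yi) * math.log(1 - pi) for yi, pi in zip(y, p)
    )


def deviance_ratio(y, p_model, p_null):
    """1 - D_model / D_null; negative when the model is worse than the null."""
    return 1.0 - binomial_deviance(y, p_model) / binomial_deviance(y, p_null)


# Hypothetical toy data: 4 reference cases, of whom 1 is flagged in primary care.
reference = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
primary_care = [1, 0, 0, 0, 0, 0, 0, 0, 0, 1]
sens, spec = sensitivity_specificity(reference, primary_care)

# Hypothetical held-out outcomes with poorly calibrated model predictions.
y_new = [1, 0, 0, 0]
ratio = deviance_ratio(y_new, [0.1, 0.6, 0.1, 0.1], [0.25] * 4)
```

In this toy example sensitivity is low and specificity is high, the same qualitative pattern the abstract reports, and the deviance ratio is negative because the hypothetical predictions fit the held-out outcomes worse than a constant prevalence would.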
Original language: English
Publication status: Published - 14 May 2021

Structured keywords



  • Common Mental Disorders
  • Depression
  • Primary Care Data
  • Data Linkage
  • Predictive Modelling

