Abstract
Background
Updatable estimates of COVID-19 onset, progression, and trajectories underpin pandemic mitigation efforts. To identify and characterise disease trajectories, we aimed to define and validate ten COVID-19 phenotypes from nationwide linked electronic health records (EHR) using an extensible framework.
Methods
In this cohort study, we used eight linked National Health Service (NHS) datasets for people in England alive on Jan 23, 2020. Data on COVID-19 testing, vaccination, primary and secondary care records, and death registrations were collected until Nov 30, 2021. We defined ten COVID-19 phenotypes reflecting clinically relevant stages of disease severity and encompassing five categories: positive SARS-CoV-2 test, primary care diagnosis, hospital admission, ventilation modality (four phenotypes), and death (three phenotypes). We constructed patient trajectories illustrating transition frequency and duration between phenotypes. Analyses were stratified by pandemic waves and vaccination status.
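As a rough illustration of how transition frequency and duration between phenotypes could be tabulated, the sketch below counts consecutive phenotype pairs per patient and takes the median gap in days. It is not the authors' implementation: the function name `transition_summary`, the phenotype labels, and the two toy patients are all hypothetical and serve only to show the shape of the calculation.

```python
from collections import defaultdict
from datetime import date
from statistics import median


def transition_summary(events_by_patient):
    """Summarise transitions between consecutive phenotypes.

    events_by_patient maps a patient id to a list of (phenotype, date) pairs.
    Returns {(source, target): {"n": count, "median_days": median gap}}.
    """
    gaps = defaultdict(list)
    for events in events_by_patient.values():
        # Order each patient's events chronologically before pairing them.
        ordered = sorted(events, key=lambda event: event[1])
        for (src, start), (dst, end) in zip(ordered, ordered[1:]):
            gaps[(src, dst)].append((end - start).days)
    return {
        pair: {"n": len(days), "median_days": median(days)}
        for pair, days in gaps.items()
    }


# Toy, invented records (not study data); labels are hypothetical.
toy = {
    "p1": [
        ("positive_test", date(2020, 4, 1)),
        ("hospital_admission", date(2020, 4, 6)),
        ("death", date(2020, 4, 20)),
    ],
    "p2": [
        ("positive_test", date(2020, 11, 2)),
        ("hospital_admission", date(2020, 11, 9)),
    ],
}

summary = transition_summary(toy)
```

Stratifying by wave or vaccination status would, under the same assumptions, amount to calling the function once per subgroup of patients.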
Findings
Among 57 032 174 individuals included in the cohort, 13 990 423 COVID-19 events were identified in 7 244 925 individuals, equating to an infection rate of 12·7% during the study period. Of 7 244 925 individuals, 460 737 (6·4%) were admitted to hospital and 158 020 (2·2%) died. Of 460 737 individuals who were admitted to hospital, 48 847 (10·6%) were admitted to the intensive care unit (ICU), 69 090 (15·0%) received non-invasive ventilation, and 25 928 (5·6%) received invasive ventilation. Among 384 135 patients who were admitted to hospital but did not require ventilation, mortality was higher in wave 1 (23 485 [30·4%] of 77 202 patients) than wave 2 (44 220 [23·1%] of 191 528 patients), but remained unchanged for patients admitted to the ICU. Mortality was highest among patients who received ventilatory support outside of the ICU in wave 1 (2569 [50·7%] of 5063 patients). 15 486 (9·8%) of 158 020 COVID-19-related deaths occurred within 28 days of the first COVID-19 event without a COVID-19 diagnosis on the death certificate. 10 884 (6·9%) of 158 020 deaths were identified exclusively from mortality data with no previous COVID-19 phenotype recorded. We observed longer patient trajectories in wave 2 than wave 1.
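The percentages in the Findings follow directly from the reported counts. The short check below recomputes each one from its numerator and denominator, rounded to one decimal place; the counts are copied from the text above, while the dictionary labels and the helper names `percent` and `check_reported` are illustrative only.

```python
# label: (numerator, denominator, percentage as reported in the abstract)
REPORTED = {
    "infected / cohort": (7_244_925, 57_032_174, 12.7),
    "hospital admission / infected": (460_737, 7_244_925, 6.4),
    "died / infected": (158_020, 7_244_925, 2.2),
    "ICU / admitted": (48_847, 460_737, 10.6),
    "non-invasive ventilation / admitted": (69_090, 460_737, 15.0),
    "invasive ventilation / admitted": (25_928, 460_737, 5.6),
    "wave 1 deaths, admitted without ventilation": (23_485, 77_202, 30.4),
    "wave 2 deaths, admitted without ventilation": (44_220, 191_528, 23.1),
    "wave 1 deaths, ventilation outside ICU": (2_569, 5_063, 50.7),
    "deaths within 28 days, no certificate diagnosis": (15_486, 158_020, 9.8),
    "deaths from mortality data only": (10_884, 158_020, 6.9),
}


def percent(numerator, denominator):
    """Return the proportion as a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


def check_reported(rows=REPORTED):
    """Map each label to (recomputed percentage, reported percentage)."""
    return {
        label: (percent(num, den), reported)
        for label, (num, den, reported) in rows.items()
    }


if __name__ == "__main__":
    for label, (computed, reported) in check_reported().items():
        print(f"{label}: computed {computed}%, reported {reported}%")
```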
Interpretation
Our analyses illustrate the wide spectrum of disease trajectories as shown by differences in incidence, survival, and clinical pathways. We have provided a modular analytical framework that can be used to monitor the impact of the pandemic and generate evidence of clinical and policy relevance using multiple EHR sources.
Original language | English |
---|---|
Pages (from-to) | e542-e557 |
Journal | The Lancet. Digital health |
Volume | 4 |
Issue number | 7 |
Early online date | 9 Jun 2022 |
Publication status | Published - 1 Jul 2022 |
Bibliographical note
Funding Information:
The authors would like to thank the BHF Data Science Centre's lay members panel for their input and NHS data access environment output checkers Lisa Gray and James Walker. This work was supported by the BHF Data Science Centre led by HDR UK (grant SP/19/3/34678). This study makes use of de-identified data held in NHS Digital's Trusted Research Environment for England and made available via the BHF Data Science Centre's CVD-COVID-UK/COVID-IMPACT consortium. This work uses data provided by patients and collected by the NHS as part of their care and support. We would also like to acknowledge all data providers who make health relevant data available for research. The views expressed are those of the authors and not necessarily those of the organisations listed. This study was supported by a BHF Data Science Centre grant (SP/19/3/34678), awarded to HDR UK, which funded co-development (with NHS Digital) of the trusted research environment, provision of linked datasets, data access, user software licences, computational usage, and data management and obtaining support, with additional contributions from the HDR UK Data and Connectivity component of the UK Government Chief Scientific Adviser's National Core Studies programme to coordinate national COVID-19 priority research. Consortium partner organisations enabled data analysts, biostatisticians, epidemiologists, and clinicians to contribute their time to the study. This study was also funded by the Longitudinal Health and Wellbeing COVID-19 National Core Study, which was established by the UK Chief Scientific Officer in October, 2020, and funded by UK Research and Innovation (grants MC_PC_20030 and MC_PC_20059), by the Data and Connectivity National Core Study, led by Health Data Research UK in partnership with the Office for National Statistics and funded by UK Research and Innovation (grant MC_PC_20058), and by the CONVALESCENCE study of long COVID, which is funded by the NIHR and UKRI. 
This study was also supported by Health Data Research UK, which receives its funding from HDR UK (HDR-9006) funded by the UK Medical Research Council, Engineering and Physical Sciences Research Council, Economic and Social Research Council, Department of Health and Social Care (England), Chief Scientist Office of the Scottish Government Health and Social Care Directorates, Health and Social Care Research and Development Division (Welsh Government), Public Health Agency (Northern Ireland), BHF, and the Wellcome Trust. AA is supported by HDR UK (HDR-9006); and Administrative Data Research UK, which is funded by the Economic and Social Research Council (grant ES/S007393/1). AGL is supported by funding from the Wellcome Trust (204841/Z/16/Z), the NIHR University College London Hospitals Biomedical Research Centre (BRC714/HI/RW/101440), NIHR Great Ormond Street Hospital Biomedical Research Centre (19RX02), and the Academy of Medical Sciences (SBF006\1084). AH is supported by research funding from the HDR UK text analytics implementation project. AB, AW, HH, and SD are part of the BigData@Heart Consortium, funded by the Innovative Medicines Initiative-2 Joint Undertaking under grant agreement 116074. AW is supported by the BHF-Turing Cardiovascular Data Science Award (BCDSA\100005) and by core funding from UK MRC (MR/L003120/1), BHF (RG/13/13/30194; RG/18/13/33946), and NIHR Cambridge Biomedical Research Centre (BRC-1215–20014). JACS and JC are supported by the HDR UK South West Better Care Partnership and the NIHR Bristol Biomedical Research Centre at University Hospitals Bristol, Weston NHS Foundation Trust, and the University of Bristol. JACS is additionally supported by UKRI and the MRC. SD and HH are supported by HDR UK London. HH and SD are supported by the NIHR Biomedical Research Centre at University College London (UCL) Hospital NHS Trust. 
SD is supported by an Alan Turing Fellowship (EP/N510129/1), the BHF Data Science Centre, and the NIHR-UKRI CONVALESCENCE study. HH is an NIHR Senior Investigator. SD and HH are supported by the BHF Accelerator Award (AA/18/6/24223). CT is supported by a UCL UKRI Centre for Doctoral Training in AI-enabled Healthcare studentship (EP/S021612/1), MRC Clinical Top-Up, and a studentship from the NIHR Biomedical Research Centre at University College London Hospital NHS Trust. HW is supported by the MRC (MR/S004149/2), NIHR (grant NIHR202639), and the Advanced Care Research Centre Programme at the University of Edinburgh. KL is supported by University College London and Rosetrees Trust (UCL-IHE-2020\102), NIHR, the NHS (AI_AWARD01786), the NIHR University College London Hospitals NHS Foundation Trust Biomedical Research Centre (BRC713/HI/RW/101440), and the UCL Higher Education Innovation Fund (KEI2021–03–16).
Publisher Copyright:
© 2022 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license