Design and quality control of large-scale two-sample Mendelian randomization studies

Research output: Contribution to journal; Article (Academic Journal); peer-review


Mendelian randomization (MR) studies are susceptible to metadata errors (e.g. incorrect specification of the effect allele column) and other analytical issues that can introduce substantial bias into analyses. We developed a quality control (QC) pipeline for the Fatty Acids in Cancer Mendelian Randomization Collaboration (FAMRC) that can be used to identify and correct for such errors.

We collated summary association statistics from fatty acid and cancer genome-wide association studies (GWAS) and subjected the collated data to a comprehensive QC pipeline. We identified metadata errors by comparing study-specific statistics with external reference data sets (the National Human Genome Research Institute-European Bioinformatics Institute GWAS catalogue and the 1000 Genomes super populations), and identified other analytical issues by comparing reported with expected genetic effect sizes. Comparisons were based on three sets of genetic variants: (i) GWAS hits for fatty acids, (ii) GWAS hits for cancer and (iii) a 1000 Genomes reference set.
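The two kinds of comparison described above can be illustrated with a minimal Python sketch. It is not the CheckSumStats implementation (that package is written in R, and none of its functions are reproduced here); the function names, the `maf_ceiling` threshold and the toy frequencies are assumptions made purely for illustration. The first check flags variants whose reported effect allele frequency lies on the opposite side of 0.5 from a reference panel, which is the pattern expected when the effect allele column has been mis-specified. The second derives an expected effect size from the Z score, allele frequency and sample size, using a widely used approximation that assumes a continuous trait analysed in standard deviation units.

```python
import math


def allele_frequency_conflict(reported_eaf, reference_eaf, maf_ceiling=0.4):
    """Return True if the reported effect allele frequency (EAF) lies on the
    opposite side of 0.5 from the reference frequency of the same allele.

    Variants whose minor allele frequency exceeds maf_ceiling in either source
    are never flagged, because sampling noise alone can move them across 0.5.
    maf_ceiling is an illustrative threshold, not a value from the paper.
    """
    def maf(p):
        return min(p, 1.0 - p)

    if maf(reported_eaf) > maf_ceiling or maf(reference_eaf) > maf_ceiling:
        return False
    return (reported_eaf - 0.5) * (reference_eaf - 0.5) < 0


def conflict_rate(variants):
    """Proportion of (reported_eaf, reference_eaf) pairs flagged as conflicts.

    A rate near 1 suggests the effect allele column is mis-specified; a rate
    near 0 is what a correctly annotated study should show.
    """
    flags = [allele_frequency_conflict(rep, ref) for rep, ref in variants]
    return sum(flags) / len(flags)


def expected_beta(z, eaf, n):
    """Expected per-allele effect for a standardised continuous trait.

    Uses the approximation beta = z / sqrt(2 * p * (1 - p) * (n + z^2)),
    where p is the effect allele frequency and n the sample size. The
    assumption that the trait is in SD units is ours, for illustration.
    """
    return z / math.sqrt(2.0 * eaf * (1.0 - eaf) * (n + z ** 2))


# Toy data (hypothetical): the same three variants, once with the effect
# allele correctly labelled and once with the allele columns swapped.
correct_study = [(0.10, 0.12), (0.80, 0.77), (0.30, 0.28)]
swapped_study = [(0.90, 0.12), (0.20, 0.77), (0.70, 0.28)]

rate_correct = conflict_rate(correct_study)
rate_swapped = conflict_rate(swapped_study)

# Expected effect for a hypothetical variant with z = 10, EAF = 0.5, n = 9900.
beta_example = expected_beta(10.0, 0.5, 9900)
```

In practice the expected effects would be plotted or regressed against the reported effects across many variants; a relationship far from the line of identity would suggest an analytical issue, such as effect sizes reported on an unexpected scale.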

We collated summary data from 6 fatty acid and 54 cancer GWAS. Metadata errors and analytical issues with the potential to introduce substantial bias were identified in seven studies (11.6%). After resolving metadata errors and analytical issues, we created a data set of 219 842 genetic associations with 90 cancer types, generated in analyses of 566 665 cancer cases and 1 622 374 controls.

In this large MR collaboration, 11.6% of included studies were affected by a substantial metadata error or analytical issue. By increasing the integrity of collated summary data prior to their analysis, our protocol can be used to increase the reliability of downstream MR analyses. Our pipeline is available to other researchers via the CheckSumStats package (
Original language: English
Article number: dyad018
Pages (from-to): 1498-1521
Number of pages: 24
Journal: International Journal of Epidemiology
Issue number: 5
Publication status: Published - 12 Apr 2023

Bibliographical note

Funding Information:
RMM was supported by a Cancer Research UK (C18281/A19169) programme grant (the Integrative Cancer Epidemiology Programme) and by the National Institute for Health Research (NIHR) Biomedical Research Centre at University Hospitals Bristol and Weston NHS Foundation Trust and the University of Bristol. RMM is a National Institute for Health Research Senior Investigator (NIHR202411). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care. This research was carried out in the MRC Integrative Epidemiology Unit (MC_UU_00011/1, MC_UU_00011/4, MC_UU_00011/6). PCH was supported by Cancer Research UK (C52724/A20138 and C18281/A29019). MCB was supported by a UK Medical Research Council (MRC) Skills Development Fellowship (MR/P014054/1). MML is supported in part by the National Institute for Health and Care Research (NIHR) Leeds Biomedical Research Centre. The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health and Social Care. NKK is supported by NIH R00 CA215360. PG is supported by an NHMRC Investigator Grant (#1173390). CIA is a research scholar of the Cancer Prevention Research Institute of Texas supported by RR170048. This research was also partially supported by U19CA203654.

Publisher Copyright:
© 2023 The Author(s). Published by Oxford University Press on behalf of the International Epidemiological Association.

Structured keywords

  • Bristol Population Health Science Institute
  • ICEP

