Machine learning to assist risk of bias assessments in systematic reviews

Research output: Contribution to journal › Article

Original language: English
Pages (from-to): 266-277
Number of pages: 12
Journal: International Journal of Epidemiology
Issue number: 1
Early online date: 8 Dec 2015
Date accepted/in press: 23 Oct 2015
Date e-pub ahead of print: 8 Dec 2015
Date published (current): 1 Feb 2016


Background: Risk-of-bias assessments are now a standard component of systematic reviews. At present, reviewers need to manually identify relevant parts of research articles for a set of methodological elements that affect the risk of bias, in order to make a risk-of-bias judgement for each of these elements. We investigate the use of text mining methods to automate risk-of-bias assessments in systematic reviews. We aim to identify relevant sentences within the text of included articles, to rank articles by risk of bias and to reduce the number of risk-of-bias assessments that the reviewers need to perform by hand.

Methods: We use supervised machine learning to train two types of models, for each of the three risk-of-bias properties of sequence generation, allocation concealment and blinding. The first model predicts whether a sentence in a research article contains relevant information. The second model predicts a risk-of-bias value for each research article. We use logistic regression, where each independent variable is the frequency of a word in a sentence or article, respectively.
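As an illustration of the modelling idea in the Methods, the following is a minimal sketch of logistic regression over word-frequency features, written in plain Python. It is not the authors' implementation: the whitespace tokeniser, the gradient-descent settings, the function names (`word_counts`, `train_logistic`, `predict_proba`) and the four toy sentences are all assumptions made for the example.

```python
import math
from collections import Counter


def word_counts(text):
    """Count words after lowercasing and splitting on whitespace (a crude, assumed tokeniser)."""
    return Counter(text.lower().split())


def train_logistic(texts, labels, epochs=200, lr=0.1):
    """Fit logistic regression by batch gradient descent.

    Each independent variable is the frequency of one word in the text.
    Returns a (weights, bias) pair, where weights maps word -> coefficient.
    """
    feats = [word_counts(t) for t in texts]
    weights, bias = {}, 0.0
    n = len(feats)
    for _ in range(epochs):
        grad_w, grad_b = {}, 0.0
        for x, y in zip(feats, labels):
            z = bias + sum(weights.get(w, 0.0) * c for w, c in x.items())
            p = 1.0 / (1.0 + math.exp(-z))
            err = p - y  # gradient of the log loss with respect to z
            grad_b += err
            for w, c in x.items():
                grad_w[w] = grad_w.get(w, 0.0) + err * c
        bias -= lr * grad_b / n
        for w, g in grad_w.items():
            weights[w] = weights.get(w, 0.0) - lr * g / n
    return weights, bias


def predict_proba(text, weights, bias):
    """Predicted probability that the text is relevant (label 1)."""
    z = bias + sum(weights.get(w, 0.0) * c for w, c in word_counts(text).items())
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy data: 1 = sentence relevant to a risk-of-bias property, 0 = not.
sentences = [
    "patients were randomised using a computer generated sequence",
    "allocation was concealed using sealed opaque envelopes",
    "the mean age of participants was 54 years",
    "follow up visits took place every six months",
]
relevant = [1, 1, 0, 0]

weights, bias = train_logistic(sentences, relevant)
p_rel = predict_proba("randomised using a computer generated list", weights, bias)
p_irrel = predict_proba("mean age was 60 years", weights, bias)
```

The same structure would apply at article level by passing whole article texts and article-level risk-of-bias labels instead of sentences; a real system would also need regularisation and far more training data than this toy set.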

Results: We found that sentences can be successfully ranked by relevance with area under the receiver operating characteristic (ROC) curve (AUC) > 0.98. Articles can be ranked by risk of bias with AUC > 0.72. We estimate that more than 33% of articles can be assessed by just one reviewer, where two reviewers are normally required.
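For readers less familiar with the ranking metric reported above, the sketch below computes the area under the ROC curve directly from its probabilistic interpretation: the chance that a randomly chosen positive item is scored above a randomly chosen negative one, with ties counted as half. The function name `roc_auc` and the example scores are hypothetical and are not taken from the study.

```python
def roc_auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores for five items and their true labels.
example_scores = [0.9, 0.8, 0.4, 0.35, 0.1]
example_labels = [1, 0, 1, 0, 0]
auc = roc_auc(example_scores, example_labels)  # 5 of 6 pairs ranked correctly
```

An AUC of 0.5 corresponds to random ordering and 1.0 to a perfect ranking, which is the scale on which the reported values of 0.98 and 0.72 should be read.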

Conclusions: We show that text mining can be used to assist risk-of-bias assessments.

    Research areas

  • Risk of bias, systematic review, text mining, machine learning

    Structured keywords

  • ConDuCT-II
  • Jean Golding



  • Int. J. Epidemiol.-2016-Millard-266-77

    Rights statement: (C) The Author 2015. Published by Oxford University Press on behalf of the International Epidemiological Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.

    Final published version, 644 KB, PDF document

    Licence: CC BY

