Personal profile

Research interests

Group website→


My research interests lie in the development and application of computational methods in population health sciences. I am involved in a wide range of projects and am always interested in hearing from potential PhD students or postdoctoral researchers. A selection of my research interests:

Data mining

I am interested in understanding the mechanisms of disease, and approach this through the integration of diverse biomedical and epidemiological data and the development of software tools for analysis of these data. One of our key developments is EpiGraphDB, a database that integrates epidemiological and biomedical data to support mechanism discovery and aid causal inference. The platform is openly available and is used by academic and industry researchers worldwide. 

Systematic analysis of potential interventions

The MR-Base platform aims to systematise causal inference using Mendelian randomization [Gib Hemani, Philip Haycock, Ben Elsworth, Matt Lyon and Jie (Chris) Zheng]. MR-Base integrates an extensive database of genome-wide association study data (the MRC-IEU OpenGWAS database) with Mendelian randomization (MR) methods in both a user-friendly web application and a comprehensive R package.

We have applied these tools to the systematic causal analysis of a wide array of risk factors and diseases and the prioritization of drug targets. OpenGWAS is openly available (hosted in Oracle Cloud) and used by thousands of academic and industry users worldwide to support MR and other post-GWAS analyses.

Drug target prioritization

Working in collaboration with major pharmaceutical companies, we have carried out systematic analyses of potential drug targets using MR, making results openly available in EpiGraphDB [Zheng et al., Nature Genetics 2020]. This approach has been summarised in an animation. We have subsequently applied this in other contexts, including for neurological and psychiatric disease, and in a cross-population context for various diseases.

Literature mining and natural language processing (NLP)

The MELODI and newer MELODI-Presto platforms both aim to mine mechanistic pathways from the biomedical literature [Ben Elsworth]. The software searches for overlapping terms between two literature sets that represent two different entities (e.g. a risk factor and a disease). Enriched overlapping terms may represent candidate mechanisms for further investigation. MELODI is paralleled by the TeMMPo platform (developed in collaboration with WCRF), which assesses the literature for the number of publications underpinning hypothesised mechanistic pathways.
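
As a rough illustration of the overlapping-terms idea described above, the sketch below finds terms that appear in both of two small literature sets and are more frequent in each than in a background set. It is a minimal toy under stated assumptions, not the MELODI implementation: the function names, the whitespace tokenisation, the simple frequency-ratio score, the `min_ratio` threshold and the placeholder abstracts are all hypothetical.

```python
from collections import Counter


def document_frequency(abstracts):
    """Return the fraction of abstracts that mention each lower-cased term."""
    counts = Counter()
    for text in abstracts:
        # Use a set so a term is counted at most once per abstract.
        counts.update(set(text.lower().split()))
    return {term: n / len(abstracts) for term, n in counts.items()}


def enriched_overlap(set_a, set_b, background, min_ratio=2.0):
    """Terms present in both sets and enriched in each relative to background.

    The enrichment score is a plain frequency ratio (an assumption made for
    this sketch); a real analysis would normally use a statistical test.
    """
    freq_a = document_frequency(set_a)
    freq_b = document_frequency(set_b)
    freq_bg = document_frequency(background)
    # Floor frequency for terms never seen in the background set.
    floor = 1.0 / (len(background) + 1)
    results = {}
    for term in freq_a.keys() & freq_b.keys():
        base = freq_bg.get(term, floor)
        ratio_a = freq_a[term] / base
        ratio_b = freq_b[term] / base
        if ratio_a >= min_ratio and ratio_b >= min_ratio:
            # Report the weaker of the two enrichments.
            results[term] = min(ratio_a, ratio_b)
    # Highest-scoring candidate terms first.
    return dict(sorted(results.items(), key=lambda kv: kv[1], reverse=True))


# Invented placeholder abstracts, not real literature.
risk_factor_docs = ["exposure raises marker_a", "exposure alters marker_b levels"]
disease_docs = ["marker_a precedes outcome", "outcome linked to marker_b"]
background_docs = [
    "cohort measured height",
    "cohort reported diet",
    "trial measured marker_b",
    "survey was repeated",
]

result = enriched_overlap(risk_factor_docs, disease_docs, background_docs)
print(result)
```

In this toy case only the two placeholder marker terms occur in both sets, so they are the only candidates returned for follow-up.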

We implement NLP tools (such as text embeddings and language models) to enable the mapping of human traits across different biomedical and health datasets, with proofs-of-principle including Vectology and the NLP tool in EpiGraphDB. These approaches have been used to provide trait recommendations in the OpenGWAS database. We are also working on natural language interfaces to knowledge graphs (such as EpiGraphDB), and have recently implemented the ASQ EpiGraphDB platform as a proof-of-principle of this approach. 

Machine learning

I am interested in the application of machine learning approaches to molecular data, and (with Colin Campbell) have published tools that predict the functional effects of genetic variants (the widely used FATHMM suite of tools), haploinsufficiency (HIPred) and breast cancer survival (FS-MKL).


As co-I of the BBSRC-funded ARIES project I led the bioinformatics workpackage in generating, QC’ing and normalizing the data, and have subsequently been involved in over 20 papers utilizing these data (including a major methylation QTL analysis published in Genome Biology in 2016). The methylation QTL derived from the ARIES data are presented in our online mQTLdb, and ongoing work with the GoDMC consortium will substantially extend the scale of this analysis.

Other software

Other software tools I have overseen include: FATHMM (Shihab), mQTLdb (Shihab), TeMMPo and GTB (Shihab) (see MRC-IEU software page). 

See my Scopus and Google Scholar pages for publications.

Research group and funding

My group currently comprises 6 postdoctoral researchers and 10 PhD students.

I lead a programme in Data Mining in the MRC Integrative Epidemiology Unit, a bioinformatics cross-cutting strand in the CRUK Integrative Cancer Epidemiology programme, and I co-lead the Translational Data Science Theme in the Bristol NIHR Biomedical Research Centre. I am an Executive Board member for the ALSPAC cohort.


Grant Boards/Panels

I am a member of the Medical Research Council Population and Systems Medicine Board and also a member of the NIHR-MRC Better Methods, Better Research Panel. 

Postgraduate research career support

I am Faculty Postgraduate Research Director, a co-director of the Wellcome Molecular, Genetic and Lifecourse Epidemiology PhD programme and a co-director of the BHF Integrative Cardiovascular Science PhD programme.


External positions

Population and Systems Medicine Board Member, Medical Research Council

2020 → …

Better Methods, Better Research Panel Member, Medical Research Council

2020 → …


  • Bioinformatics
  • Data Science
  • Population Health


Dive into the research topics where Tom R Gaunt is active. These topic labels come from the works of this person. Together they form a unique fingerprint.