The need for statistical contributions to bioinformatics at scale, with illustration to mass spectrometry

Research output: Contribution to journal, Article

Original language: English
Pages (from-to): 290-299
Number of pages: 10
Journal: Statistical Modelling
Volume: 17
Issue number: 4-5
DOI: 10.1177/1471082X17708519
Date accepted/in press: 31 Mar 2017
Date published (current): 1 Aug 2017

Abstract

In their article, Morris and Baladandayuthapani clearly evidence the influence of statisticians in recent methodological advances throughout the bioinformatics pipeline and advocate for the expansion of this role. The latest acquisition platforms, such as next generation sequencing (genomics/transcriptomics) and hyphenated mass spectrometry (proteomics/metabolomics), output raw datasets in the order of gigabytes; it is not unusual to acquire a terabyte or more of data per study. The increasing computational burden this brings is a further impediment against the use of statistically rigorous methodology in the pre-processing stages of the bioinformatics pipeline. In this discussion I describe the mass spectrometry pipeline and use it as an example to show that beneath this challenge lies a two-fold opportunity: (a) Biological complexity and dynamic range is still well beyond what is captured by current processing methodology; hence, potential biomarkers and mechanistic insights are consistently missed; (b) Statistical science could play a larger role in optimizing the acquisition process itself. Data rates will continue to increase as routine clinical omics analysis moves to large-scale facilities with systematic, standardized protocols. Key inferential gains will be achieved by borrowing strength across the sum total of all analyzed studies, a task best underpinned by appropriate statistical modelling.

    Research areas

  • computational statistics, mass spectrometry, metabolomics, proteomics, sparse signal processing

Documents

  • Full-text PDF (accepted author manuscript)

    Rights statement: This is the author accepted manuscript (AAM). The final published version (version of record) is available online via Sage at http://journals.sagepub.com/doi/10.1177/1471082X17708519. Please refer to any applicable terms of use of the publisher.

    Accepted author manuscript, 139 KB, PDF document

    Licence: Unspecified
