Bayesian protein sequence and structure alignment

Christopher J. Fallaize*, Peter J. Green, Kanti V. Mardia, Stuart Barber

*Corresponding author for this work

Research output: Contribution to journal › Article (Academic Journal) › peer-review

Abstract

The structure of a protein is crucial in determining its functionality and is much more conserved than sequence during evolution. A key task in structural biology is to compare protein structures to determine evolutionary relationships, to estimate the function of newly discovered structures and to predict unknown structures. We propose a Bayesian method for protein structure alignment, with the prior on alignments based on functions which penalize ‘gaps’ in the aligned sequences. We show how a broad class of penalty functions fits into this framework, and how the resulting posterior distribution can be efficiently sampled. A commonly used gap penalty function is shown to be a special case, and we propose a new penalty function which alleviates an undesirable feature of the commonly used penalty. We illustrate our method on benchmark data sets and find that it competes well with popular tools from computational biology. Our method has the benefit of being able potentially to explore multiple competing alignments and to quantify their merits probabilistically. The framework naturally enables further information such as amino acid sequence to be included and could be adapted to other situations such as flexible proteins or domain swaps.

Original language: English
Number of pages: 25
Journal: Journal of the Royal Statistical Society. Series C: Applied Statistics
Early online date: 8 Jan 2020
DOIs
Publication status: E-pub ahead of print - 8 Jan 2020

Keywords

  • Gap penalty prior
  • Markov chain Monte Carlo sampling
  • Protein structure alignment
  • Structural bioinformatics
  • Unlabelled shape analysis
