Markov models of biomolecular systems

  • Robert E Arbon

Student thesis: Doctoral Thesis, Doctor of Philosophy (PhD)


Markov models are a popular technique for understanding the dynamics of systems that move through “rough” potentials [1]. In such cases, the system is well approximated as transitioning between discrete states with a fixed state-to-state probability, independent of its history. Choosing how these states relate to the coordinates of the system (the discretization) and how they are partitioned into metastable sets (the coarse graining) is of central importance to the technique. This thesis contributes to methods for making these choices and applies them to two systems: water diffusion and enzyme dynamics.
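The Markov picture described above can be illustrated with a minimal sketch. It assumes a trajectory that has already been discretized into integer state labels, and estimates state-to-state probabilities by counting transitions at a chosen lag and normalising each row. The function name `estimate_transition_matrix`, the toy trajectory and the lag value are hypothetical choices made for illustration; this is a simple counting estimator, not necessarily the estimator used in the thesis.

```python
import numpy as np


def estimate_transition_matrix(dtraj, n_states, lag=1):
    """Estimate a row-stochastic transition matrix from a discrete trajectory.

    dtraj    : sequence of integer state labels in the range [0, n_states)
    n_states : number of discrete states (assumed known in advance)
    lag      : lag time in trajectory steps (hypothetical default of 1)
    """
    counts = np.zeros((n_states, n_states))
    # Count every observed transition i -> j separated by `lag` steps.
    for i, j in zip(dtraj[:-lag], dtraj[lag:]):
        counts[i, j] += 1
    row_sums = counts.sum(axis=1, keepdims=True)
    if np.any(row_sums == 0):
        raise ValueError("every state needs at least one outgoing transition")
    # Normalise each row so that it sums to one.
    return counts / row_sums


# Toy two-state trajectory, made up purely for illustration.
toy_dtraj = [0, 0, 1, 1, 0, 1, 0, 0]
T = estimate_transition_matrix(toy_dtraj, n_states=2)
```

Each row of `T` gives the probabilities of moving from one state to every other state after one lag time. The estimate depends only on the current state, which is the history-independence assumption stated above.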
Markov models were used to provide an explanation of water diffusion through viscous aerosol particles, where diffusion is known to diverge from typical Stokes-Einstein behaviour. The choice of discretization and coarse-graining techniques came from established methods and heuristics in the Markov modelling literature. The analysis showed that water diffuses by hopping between transient cavities created by the organic fraction of the aerosol particle. For the majority of the time this process is irreversible, but the water can also establish local equilibria between clusters of cavities, arresting the diffusion process.
A more complex workflow was proposed and evaluated for the case of aromatic amine dehydrogenase, an enzyme at the heart of the debate surrounding hydrogen tunneling and enzyme dynamics. This workflow used ideas from the statistics and machine learning communities to make the modelling process more transparent, efficient and reproducible. The response surface of a Markov state model (MSM), that is, the change in model quality in response to modelling choices, was estimated and optimised using Bayesian optimisation. Statistical model selection techniques for selecting the number of metastable states in a hidden Markov model were evaluated. Theoretical and practical arguments are made in favour of the integrated complete-data likelihood criterion. The benefits of this more elaborate workflow were mixed. The response surface proved useful in creating tests of the sensitivity of inferences to the modelling choices. Many of the modelling choices were shown not to affect the model quality and, as a result, Bayesian optimisation proved of little benefit. The conformational landscape of aromatic amine dehydrogenase was found to consist of many short-lived (20 ns to 300 ns) metastable states which slowly interconvert on a timescale of approximately 1.2 μs. However, the simulations had moved away from their reactive conformations and so the implications for understanding reactivity were limited. In addition, these results could not be validated and sensitivity tests cast doubt on the robustness of this conclusion. The source of these problems was investigated and several solutions were proposed.
Date of Award: 28 Sept 2021
Original language: English
Awarding Institution
  • The University of Bristol
Supervisor: David Glowacki (Supervisor) & Fred Manby (Supervisor)
