In this paper we propose a Bayesian modeling approach to the analysis of genome-wide association studies based on single nucleotide polymorphism (SNP) data. Our latent seed model combines various aspects of k-means clustering, hidden Markov models (HMMs) and logistic regression into a fully Bayesian model. It is fitted using the Markov chain Monte Carlo stochastic simulation method, with Metropolis-Hastings update steps. The approach is flexible, both in allowing different types of genetic models, and because it can be easily extended while remaining computationally feasible due to the use of fast algorithms for HMMs. It allows for inference primarily on the location of the causal locus and also on other parameters of interest. The latent seed model is used here to analyze three data sets, using both synthetic and real disease phenotypes with real SNP data, and shows promising results. Our method is able to correctly identify the causal locus in examples where single SNP analysis is both successful and unsuccessful at identifying the causal SNP.
Bibliographical notePublisher: Wiley
- Markov chain Monte Carlo; hidden Markov model; logistic regression; SNP-base population association study