Time series genetic data allow for more accurate inference of population genetic parameters and for hypothesis testing on the recent action of natural selection. In this work, we develop a novel likelihood-based method for jointly estimating the selection coefficient and the allele age from time series data of allele frequencies. Our approach is based on a hidden Markov model in which the underlying process is a Wright-Fisher diffusion conditioned to survive until the time of the most recent sample. This formulation circumvents the assumption, required in existing approaches, that the allele is created by mutation at a certain low frequency. We calculate the likelihood by numerically solving the resulting Kolmogorov backward equation backwards in time while re-weighting the solution with the emission probabilities of the observations at each sampling time point. This procedure reduces the two-dimensional numerical search for the maximum of the likelihood surface over both the selection coefficient and the allele age to a one-dimensional search over the selection coefficient only. We illustrate through extensive simulations that our approach can produce accurate estimates of the selection coefficient and the allele age under both constant and non-constant demographic histories. We use our method to re-analyse ancient DNA data associated with horse base coat colours. We find that ignoring demographic histories or grouping raw samples can significantly bias the inference.
- Natural selection
- Allele age
- Conditioned Wright-Fisher diffusion
- Hidden Markov model
- Maximum likelihood estimation
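
To make the hidden Markov model idea in the abstract concrete, the following is a minimal sketch, not the method of this work. It swaps in a simpler technique: a discrete Wright-Fisher Markov chain with genic selection, binomial emission probabilities, and a forward (rather than backward) recursion, followed by a one-dimensional grid search over the selection coefficient. It does not condition on survival and does not estimate the allele age. All function names, the flat initial distribution, the population size, and the sample counts are assumptions made purely for illustration.

```python
import math
import numpy as np

def binom_pmf(n, p):
    """Binomial(n, p) probabilities for k = 0..n."""
    return np.array([math.comb(n, k) * p**k * (1.0 - p)**(n - k)
                     for k in range(n + 1)])

def transition_matrix(two_n, s):
    """One-generation Wright-Fisher transition matrix with genic selection.

    State i is the number of mutant copies among two_n gene copies
    (hypothetical parameterisation, valid for s > -1).
    """
    P = np.empty((two_n + 1, two_n + 1))
    for i in range(two_n + 1):
        x = i / two_n
        x_sel = x * (1.0 + s) / (1.0 + s * x)  # frequency after selection
        P[i] = binom_pmf(two_n, x_sel)         # binomial resampling (drift)
    return P

def log_likelihood(s, samples, two_n):
    """Forward-algorithm log-likelihood of the sampled allele counts.

    samples: list of (generation, mutant_count, sample_size), sorted by
    generation. The initial distribution is flat (an assumption).
    """
    P = transition_matrix(two_n, s)
    freqs = np.arange(two_n + 1) / two_n
    alpha = np.full(two_n + 1, 1.0 / (two_n + 1))
    log_lik = 0.0
    t_prev = samples[0][0]
    for t, k, n in samples:
        for _ in range(t - t_prev):            # propagate between samples
            alpha = alpha @ P
        emission = np.array([math.comb(n, k) * x**k * (1.0 - x)**(n - k)
                             for x in freqs])
        alpha = alpha * emission               # re-weight by the observation
        norm = alpha.sum()
        log_lik += math.log(norm)              # accumulate, then rescale
        alpha = alpha / norm
        t_prev = t
    return log_lik

# Hypothetical data: (generation, mutant copies observed, copies sampled).
samples = [(0, 2, 20), (10, 6, 20), (20, 12, 20), (30, 17, 20)]
two_n = 40

# One-dimensional search over the selection coefficient.
grid = np.linspace(-0.2, 0.4, 13)
log_liks = [log_likelihood(s, samples, two_n) for s in grid]
s_hat = float(grid[int(np.argmax(log_liks))])
print("grid estimate of s:", s_hat)
```

The rescaling of `alpha` after each sample only guards against numerical underflow; the accumulated logarithms of the normalising constants give the log-likelihood. A grid is used here for transparency, and any one-dimensional optimiser could replace it.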