Abstract
The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented
detail, but presents new statistical challenges. We propose a novel inference framework
that aims to efficiently capture information on population structure provided by patterns of haplotype
similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are
reconstructed using chunks of DNA donated by the other individuals. Results of this ‘chromosome painting’
can be summarized as a ‘coancestry matrix’, which directly reveals key information about ancestral
relationships among individuals. If markers are viewed as independent, we show that this matrix almost
completely captures the information used by both standard Principal Components Analysis (PCA) and
model-based approaches such as STRUCTURE, in a unified manner. Furthermore, when markers are in
linkage disequilibrium, the matrix combines information across successive markers to increase the ability
to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based
approach to identify discrete populations using this matrix, which offers advantages over PCA in
terms of interpretability, and over existing clustering algorithms in terms of speed, number of separable
populations, and sensitivity to subtle population structure. We analyse the Human Genome Diversity Panel
data for 938 individuals and 641,000 markers, and identify 226 populations reflecting differences on continental,
regional, local and family scales. We present multiple lines of evidence that whilst many methods
capture similar information among strongly differentiated groups, more subtle population structure in
human populations is consistently present at a much finer level than currently available geographic labels,
and is only captured by the haplotype-based approach. The software used for this article, ChromoPainter
and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
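The painting-and-counting idea described above can be illustrated with a small sketch. It assumes a toy copying hidden Markov model in the spirit of the Li and Stephens model on which ChromoPainter builds, with hypothetical fixed `mismatch` and `switch` probabilities, treats each haplotype as one "individual", and keeps only a single most-likely (Viterbi) painting per recipient rather than averaging over paintings. The functions `paint` and `coancestry` are illustrative names, not part of the released software; the article describes ChromoPainter's actual model and counts.

```python
import math


def paint(recipient, donors, mismatch=0.01, switch=0.05):
    """Viterbi 'painting' of one recipient haplotype by a list of donor haplotypes.

    Returns, for each marker, the index (into `donors`) of the donor copied there.
    Toy copying HMM with assumed parameters: the hidden state is the donor being
    copied; the recipient allele equals that donor's allele with probability
    1 - mismatch; between adjacent markers the copied donor switches with
    probability `switch`, uniformly to one of the other donors.
    """
    k, n_markers = len(donors), len(recipient)
    log_stay = math.log(1 - switch)
    log_move = math.log(switch / (k - 1)) if k > 1 else -math.inf

    def emit(d, m):
        return math.log(1 - mismatch) if donors[d][m] == recipient[m] else math.log(mismatch)

    def trans(p, d):
        return log_stay if p == d else log_move

    score = [emit(d, 0) - math.log(k) for d in range(k)]  # uniform start over donors
    back = []  # back[m - 1][d]: best predecessor of state d at marker m
    for m in range(1, n_markers):
        ptr, new = [], []
        for d in range(k):
            p = max(range(k), key=lambda q: score[q] + trans(q, d))
            ptr.append(p)
            new.append(score[p] + trans(p, d) + emit(d, m))
        back.append(ptr)
        score = new
    state = max(range(k), key=lambda d: score[d])
    path = [state]
    for ptr in reversed(back):  # trace the best path back to marker 0
        state = ptr[state]
        path.append(state)
    return path[::-1]


def coancestry(haplotypes, **hmm_params):
    """x[i][j] = number of chunks recipient i copies from donor j (diagonal stays 0)."""
    n = len(haplotypes)
    if n < 2:
        raise ValueError("need at least two haplotypes")
    x = [[0] * n for _ in range(n)]
    for i, hap in enumerate(haplotypes):
        others = [j for j in range(n) if j != i]  # a recipient never copies itself
        path = paint(hap, [haplotypes[j] for j in others], **hmm_params)
        for m, d in enumerate(path):
            if m == 0 or d != path[m - 1]:  # a new chunk begins whenever the donor changes
                x[i][others[d]] += 1
    return x


if __name__ == "__main__":
    haps = [
        [0, 0, 0, 0, 1, 1, 1, 1],  # haplotypes 0 and 1: group A
        [0, 0, 0, 0, 1, 1, 1, 1],
        [1, 1, 1, 1, 0, 0, 0, 0],  # haplotypes 2 and 3: group B
        [1, 1, 1, 1, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0, 0, 0],  # haplotype 4: A-like first half, B-like second half
    ]
    for row in coancestry(haps):
        print(row)
```

In this toy input, haplotype 4 is painted as one chunk copied from group A followed by one chunk from group B, so its row records mixed ancestry, whereas haplotypes 0 to 3 each copy a single chunk from their group partner. The inner loop over predecessor states costs O(L K^2) per recipient for L markers and K donors, which is adequate for illustration but not for genome-scale data.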
Field | Value |
---|---|
Title | Inference of population structure using dense haplotype data |
Original language | English |
Article number | e1002453 |
Number of pages | 16 |
Journal | PLoS Genetics |
Volume | 8 |
Issue number | 1 |
Publication status | Published (Jan 2012) |