Similarity matrices and clustering algorithms for population identification using genetic data

Daniel John Lawson, Daniel Falush

Research output: Contribution to journal › Article (Academic Journal) › peer-review

66 Citations (Scopus)


A large number of algorithms have been developed to classify individuals into discrete populations using genetic data. Recent results show that
the information used by both model-based clustering methods and principal components analysis can be summarized by a matrix of pairwise
similarity measures between individuals. Similarity matrices have been
constructed in a number of ways, usually treating markers as independent but differing in the weighting given to polymorphisms of different
frequencies. Additionally, methods are now being developed that take
linkage into account. We review several such matrices and evaluate their
information content. A two-stage approach for population identification is to first construct a similarity matrix and then perform clustering.
We review a range of common clustering algorithms and evaluate their
performance through a simulation study. The clustering step can be
performed either on the matrix or by first using a dimension-reduction
technique; we find that the latter approach substantially improves the
performance of most algorithms. Based on these results, we describe
the population structure signal contained in each similarity matrix and
find that accounting for linkage leads to significant improvements for
sequence data. We also perform a comparison on real data, where we
find that population genetics models outperform generic clustering
approaches, particularly with regard to robustness for features such as
relatedness between individuals.
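
The two-stage approach described above (build a similarity matrix, optionally reduce its dimension, then cluster) can be sketched roughly as follows. This is only an illustrative toy and not the authors' implementation: the simulated data, the PCA-style frequency normalization that treats markers as independent, the choice of K - 1 leading eigenvectors, the small k-means routine, and all function names are assumptions made for this example. Linkage is not modeled.

```python
import numpy as np


def simulate_genotypes(n_per_pop=30, n_markers=1000, seed=0):
    """Toy data (assumption): two populations, independent allele frequencies."""
    rng = np.random.default_rng(seed)
    freqs = rng.uniform(0.1, 0.9, size=(2, n_markers))  # one row per population
    labels = np.repeat([0, 1], n_per_pop)
    # Diploid genotypes coded as 0/1/2 allele copies; markers are unlinked.
    genotypes = rng.binomial(2, freqs[labels]).astype(float)
    return genotypes, labels


def similarity_matrix(genotypes):
    """Stage 1: pairwise similarity with a PCA-style frequency weighting."""
    p = genotypes.mean(axis=0) / 2.0           # pooled allele frequency
    keep = (p > 0) & (p < 1)                   # drop monomorphic markers
    g, p = genotypes[:, keep], p[keep]
    z = (g - 2 * p) / np.sqrt(2 * p * (1 - p))  # standardize each marker
    return z @ z.T / z.shape[1]


def reduce_dimension(sim, n_components):
    """Optional step: coordinates from the leading eigenvectors of sim."""
    vals, vecs = np.linalg.eigh(sim)            # eigenvalues in ascending order
    order = np.argsort(vals)[::-1][:n_components]
    return vecs[:, order] * np.sqrt(np.maximum(vals[order], 0.0))


def kmeans(points, k, n_iter=100):
    """Stage 2: plain Lloyd's k-means with a deterministic initialization."""
    by_first_axis = np.argsort(points[:, 0])
    start = by_first_axis[np.linspace(0, len(points) - 1, k).astype(int)]
    centers = points[start].copy()
    for _ in range(n_iter):
        dists = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            if np.any(assign == j):             # leave empty clusters in place
                new_centers[j] = points[assign == j].mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign


K = 2
genotypes, labels = simulate_genotypes()
sim = similarity_matrix(genotypes)
coords = reduce_dimension(sim, n_components=K - 1)
assign = kmeans(coords, K)

# Cluster labels are arbitrary, so score agreement up to label switching.
agreement = max(np.mean(assign == labels), np.mean(assign != labels))
print(f"agreement with simulated populations: {agreement:.2f}")
```

Clustering the rows of `sim` directly, instead of `coords`, would correspond to the "on the matrix" alternative mentioned in the abstract; the toy populations here are far more differentiated than most real data, so the sketch says nothing about the relative performance of the two routes.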
Original language: English
Pages (from-to): 337-361
Number of pages: 27
Journal: Annual Review of Genomics and Human Genetics
Publication status: Published - 2012

