Phylogenomic mixture models outperform homogeneous and partitioned models

Mattia Giacomelli, Davide Pisani, Gergely J. Szöllősi, Eleonora Rossi, Marc Domenech Andreu, Jesus Lozano-Fernandez

Research output: Contribution to journal; Article (Academic Journal); peer-review

Abstract

Significant advances have been made in resolving the tree of life, but many nodes remain debated. The last two decades saw the emergence of mixture models, which proved particularly useful in accounting for across-site compositional heterogeneity and played a central role in improving our understanding of difficult phylogenetic problems. However, some scholars have remained skeptical of their use. Here we perform a large simulation study comparing mixture models accounting for across-site compositional heterogeneity, across-site compositionally homogeneous models, and partitioned models. We show that the tested mixture models fit across-site compositionally heterogeneous datasets best and achieve greater accuracy. CAT-GTR, an infinite mixture model combining a General Time Reversible (GTR) matrix with a mixture of site-frequency profiles (i.e. categories, CAT, or components) characterized by different amino acid frequency vectors, maximizes accuracy and fit. Mixture models, and particularly CAT-GTR, also perform well with across-site compositionally homogeneous datasets, where the use of a mixture of site-frequency profiles is not necessary. We show that this is because with homogeneous data these models converge to appropriate compositionally homogeneous models, avoiding overparametrization. Our results dispel doubts about the utility of models accounting for compositional heterogeneity across sites and identify CAT-GTR as one of the most flexible models in the phylogenomic arsenal.
Original language: English
Journal: Molecular Biology and Evolution
Publication status: Accepted/In press - 13 Mar 2026
