TY - JOUR
T1 - CALANGO
T2 - A phylogeny-aware comparative genomics tool for discovering quantitative genotype-phenotype associations across species
AU - Hongo, Jorge Augusto
AU - de Castro, Giovanni Marques
AU - Albuquerque Menezes, Alison Pelri
AU - Rios Picorelli, Agnello César
AU - Martins da Silva, Thieres Tayroni
AU - Imada, Eddie Luidy
AU - Marchionni, Luigi
AU - Del-Bem, Luiz Eduardo
AU - Vieira Chaves, Anderson
AU - Almeida, Gabriel Magno de Freitas
AU - Campelo, Felipe
AU - Lobo, Francisco Pereira
N1 - Publisher Copyright: © 2023 The Author(s)
PY - 2023/6/9
Y1 - 2023/6/9
AB - Living species vary significantly in phenotype and genomic content. Sophisticated statistical methods linking genes with phenotypes within a species have led to breakthroughs in complex genetic diseases and genetic breeding. Despite the abundance of genomic and phenotypic data available for thousands of species, finding genotype-phenotype associations across species is challenging due to the non-independence of species data resulting from common ancestry. To address this, we present CALANGO (comparative analysis with annotation-based genomic components), a phylogeny-aware comparative genomics tool to find homologous regions and biological roles associated with quantitative phenotypes across species. In two case studies, CALANGO identified both known and previously unidentified genotype-phenotype associations. The first study revealed unknown aspects of the ecological interaction between Escherichia coli, its integrated bacteriophages, and the pathogenicity phenotype. The second identified an association between maximum height in angiosperms and the expansion of a reproductive mechanism that prevents inbreeding and increases genetic diversity, with implications for conservation biology and agriculture.
KW - comparative genomics
KW - comparative methods
KW - DSML2: Proof-of-concept: Data science output has been formulated, implemented, and tested for one domain/problem
KW - evolution of quantitative phenotypes
KW - genotype-phenotype association
KW - molecular functional convergence
KW - quantitative trait
KW - species data
UR - http://www.scopus.com/inward/record.url?scp=85153940065&partnerID=8YFLogxK
U2 - 10.1016/j.patter.2023.100728
DO - 10.1016/j.patter.2023.100728
M3 - Article (Academic Journal)
C2 - 37409050
AN - SCOPUS:85153940065
SN - 2666-3899
VL - 4
JO - Patterns
JF - Patterns
IS - 6
M1 - 100728
ER -