Evolutionary and methodological considerations when interpreting gene presence-absence variation in pangenomes
Abstract
While graph-based pangenomes have become a standard and interoperable foundation for comparisons across multiple reference genomes, integrating protein-coding gene annotations across pangenomes into a single ‘pangene set’ remains challenging because of both methodological inconsistency and biological presence-absence variation (PAV). Here, we review and experimentally evaluate the root causes of genome annotation and pangene set inconsistency using two polyploid plant pangenomes, cotton and soybean, which were chosen because of their diverse, high-quality genomic resources and the known importance of gene presence-absence variation in their respective breeding programs. We first demonstrate that building pangene sets across different genome resources is highly error-prone: PAV calculated directly from the genome annotations hosted on public repositories recapitulates structure in annotation methods rather than biological sequence differences. Re-annotating all genomes with a single pipeline largely resolves the broad-stroke issues; however, substantial challenges remain, including a surprisingly common case in which identical sequences receive different structural gene model annotations. Combined, these results show that pangenome gene model annotations must be carefully integrated before any biological inference can be made regarding sequence evolution, gene copy number, or presence-absence variation.
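As a rough illustration of what "PAV calculated directly from the genome annotations" can mean in practice, the sketch below builds a presence-absence matrix from per-genome sets of pangene identifiers and separates pangenes found in every genome from those missing in at least one. It is not the authors' pipeline: the function names, the genome labels, and the pangene IDs are all hypothetical, and the input is assumed to be an already-computed assignment of annotated genes to pangenes. Note that an "absent" call here only means the gene was not annotated, which, as the abstract stresses, may reflect the annotation method and not the underlying sequence.

```python
from typing import Dict, List, Set, Tuple


def pav_matrix(annotations: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Return {pangene: {genome: 1 or 0}} from {genome: set of pangene IDs}.

    Hypothetical helper: 1 means the pangene was annotated in that genome,
    0 means it was not annotated (not necessarily biologically absent).
    """
    genomes = sorted(annotations)
    pangenes = sorted(set().union(*annotations.values()))
    return {
        gene: {genome: int(gene in annotations[genome]) for genome in genomes}
        for gene in pangenes
    }


def split_core_variable(
    matrix: Dict[str, Dict[str, int]]
) -> Tuple[List[str], List[str]]:
    """Split pangenes into those present in all genomes and the rest."""
    core: List[str] = []
    variable: List[str] = []
    for gene, row in matrix.items():
        (core if all(row.values()) else variable).append(gene)
    return core, variable


# Toy input with made-up genome names and pangene IDs (assumptions).
toy = {
    "genomeA": {"pg1", "pg2", "pg3"},
    "genomeB": {"pg1", "pg3"},
    "genomeC": {"pg1", "pg2"},
}

matrix = pav_matrix(toy)
core, variable = split_core_variable(matrix)
print("core:", core)
print("variable:", variable)
```

Comparing such a matrix built from repository-hosted annotations against one built after re-annotating every genome with the same pipeline is one simple way to see how much of the apparent PAV tracks annotation method.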

