Evolutionary and methodological considerations when interpreting gene presence-absence variation in pangenomes
Abstract
While graph-based pangenomes have become a standard and interoperable foundation for comparisons across multiple reference genomes, integrating protein-coding gene annotations across pangenomes into a single ‘pangene set’ remains challenging because of both methodological inconsistency and biological presence-absence variation (PAV). Here, we review and experimentally evaluate the root causes of genome annotation and pangene set inconsistency using two polyploid plant pangenomes, cotton and soybean, which were chosen because of their diverse, high-quality genomic resources and the known importance of gene presence-absence variation in their respective breeding programs. We first demonstrate that building pangene sets across different genome resources is highly error-prone: PAV calculated directly from the genome annotations hosted on public repositories recapitulates structure in annotation methods rather than biological sequence differences. Re-annotation of all genomes with a single identical pipeline largely resolves the broadest-stroke issues; however, substantial challenges remain, including a surprisingly common case in which identical sequences have different structural gene model annotations. Combined, these results show that pangenome gene model annotations must be carefully integrated before any biological inference can be made regarding sequence evolution, gene copy number, or presence-absence variation.
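To make the central concern concrete, the following is a minimal sketch, not the authors' pipeline, of how a PAV matrix might be derived from per-genome pangene membership and how one could check whether PAV tracks annotation method. All genome names, pangene identifiers, and pipeline labels are invented toy values, and the Jaccard distance is an assumed choice of dissimilarity measure rather than one named in the abstract.

```python
from itertools import combinations
from typing import Dict, List, Set

# Hypothetical toy input: genome -> set of pangene IDs annotated in it.
annotations: Dict[str, Set[str]] = {
    "genomeA": {"pg1", "pg2", "pg3"},
    "genomeB": {"pg1", "pg2", "pg3", "pg4"},
    "genomeC": {"pg1", "pg5", "pg6"},
    "genomeD": {"pg1", "pg4", "pg5", "pg6"},
}

# Hypothetical label for the annotation pipeline used on each genome.
pipeline_of: Dict[str, str] = {
    "genomeA": "pipeline1",
    "genomeB": "pipeline1",
    "genomeC": "pipeline2",
    "genomeD": "pipeline2",
}


def pav_matrix(ann: Dict[str, Set[str]]) -> Dict[str, Dict[str, int]]:
    """Return pangene -> genome -> 1 (present) or 0 (absent)."""
    genomes = sorted(ann)
    pangenes = sorted(set().union(*ann.values()))
    return {pg: {g: int(pg in ann[g]) for g in genomes} for pg in pangenes}


def variable_pangenes(matrix: Dict[str, Dict[str, int]]) -> List[str]:
    """Pangenes present in some, but not all, genomes."""
    return [pg for pg, row in matrix.items() if 0 < sum(row.values()) < len(row)]


def jaccard_distance(a: Set[str], b: Set[str]) -> float:
    """1 - |a & b| / |a | b|; defined as 0.0 for two empty sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0


def mean_distance(ann: Dict[str, Set[str]],
                  pipelines: Dict[str, str],
                  same_pipeline: bool) -> float:
    """Mean pairwise Jaccard distance over genome pairs that do (or do not)
    share an annotation pipeline."""
    dists = [
        jaccard_distance(ann[g1], ann[g2])
        for g1, g2 in combinations(sorted(ann), 2)
        if (pipelines[g1] == pipelines[g2]) == same_pipeline
    ]
    return sum(dists) / len(dists) if dists else float("nan")


matrix = pav_matrix(annotations)
within = mean_distance(annotations, pipeline_of, same_pipeline=True)
between = mean_distance(annotations, pipeline_of, same_pipeline=False)
print("variable pangenes:", variable_pangenes(matrix))
print("mean distance within pipeline:", round(within, 3))
print("mean distance between pipelines:", round(between, 3))
```

In this toy data, genomes annotated with the same pipeline have far more similar PAV profiles than genomes annotated with different pipelines. A pattern like that in real data would suggest that the apparent PAV reflects annotation method rather than sequence differences, which is the situation the abstract warns about.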

