
Efficient inference of large pangenomes with PanTA

Abstract: Pangenome analysis is an indispensable step in bacterial genomics for addressing the high variability of bacterial genomes. However, speed and scalability remain a challenge for pangenome inference tools as genomic collections grow rapidly. We present PanTA, a software package for constructing the pangenomes of large bacterial collections. We show that PanTA exhibits unprecedented efficiency, multiple times greater than that of current state-of-the-art tools, while maintaining similar pangenome accuracy. In addition, PanTA introduces a novel mechanism for constructing the pangenome progressively, in which new samples are added to an existing pangenome without rebuilding the accumulated collection from scratch. In progressive mode, PanTA is demonstrated to consume orders of magnitude less computational resources than existing solutions when managing the pangenomes of growing microbial datasets. We further show that PanTA can build the pangenome of the entire collection of >28000 Escherichia coli genomes from the RefSeq database on a laptop computer in 32 hours, highlighting the scalability and practicality of PanTA. The software is open source and publicly available at https://github.com/amromics/panta under an MIT license.
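The abstract does not describe how progressive construction works internally, so the following is only a toy sketch of the general idea: genes from a new sample are compared against one stored representative per existing gene cluster, and earlier samples are never re-clustered. The class `ToyPangenome`, the k-mer Jaccard similarity, and the 0.7 threshold are all assumptions made for illustration; none of them is taken from PanTA.

```python
from dataclasses import dataclass, field


def kmers(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets (1.0 when both are empty)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


@dataclass
class ToyPangenome:
    # Hypothetical parameters, chosen only for this illustration.
    threshold: float = 0.7
    k: int = 3
    # One k-mer profile per cluster (the cluster's first member).
    representatives: list = field(default_factory=list)
    # Parallel list: (sample, gene_id) pairs assigned to each cluster.
    members: list = field(default_factory=list)

    def add_sample(self, sample, genes):
        """Add one sample; `genes` maps gene_id -> protein sequence.

        Only the new genes are compared against the stored
        representatives, so the cost depends on the size of the new
        sample and the number of clusters, not on re-processing every
        previously added genome.
        """
        for gene_id, seq in genes.items():
            profile = kmers(seq, self.k)
            for idx, rep in enumerate(self.representatives):
                if jaccard(profile, rep) >= self.threshold:
                    self.members[idx].append((sample, gene_id))
                    break
            else:
                # No existing cluster is similar enough: start a new one.
                self.representatives.append(profile)
                self.members.append([(sample, gene_id)])

    def presence_absence(self):
        """For each cluster, the sorted list of samples that carry it."""
        return [sorted({s for s, _ in m}) for m in self.members]


pangenome = ToyPangenome()
pangenome.add_sample("A", {"g1": "MKTAYIAKQR", "g2": "MSLLNDVQKW"})
# Later, a new genome arrives and is merged without rebuilding sample A.
pangenome.add_sample("B", {"g1": "MKTAYIAKQR", "g3": "MPEPTIDEXX"})
```

A real tool would use a proper sequence clustering and alignment step rather than k-mer sets, and would typically refine clusters using additional evidence; the sketch only shows why an incremental update can avoid reprocessing the accumulated collection.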

Related Results

Methodological approaches to studying coupled human-water systems
This paper reports on the progress being made on the “Methodologies” chapter of the Panta Rhei synthesis book due in May 2023 and to be...
Evolutionary Grammatical Inference
Grammatical Inference (also known as grammar induction) is the problem of learning a grammar for a language from a set of examples. In a broad sense, some data is presented to the ...
Evolutionary and methodological considerations when interpreting gene presence-absence variation in pangenomes
Abstract: While graph-based pangenomes have become a standard and interoperable foundation for comparisons across multiple reference genomes, integrating protein-coding ...
Methodologies for the study of change in hydrology and society
This paper reports on the progress being made on the “Methodologies” chapter of the Panta Rhei synthesis book due in May 2023 and to be...
Panta Rhei Benchmark Dataset
We tackle the unsolved problem in hydrology “How can we extract information from available data on human and water systems in order to inform the bui...
Panache: a Web Browser-Based Viewer for Linearized Pangenomes
Abstract. Motivation: Pangenomics evolved since its first applications on bacteria, extending from the study of genes for a given population to the study of all of its sequences availa...
Persistent, Private and Mobile genes: a model for gene dynamics in evolving pangenomes
Abstract: The pangenome of a species is the set of all genes carried by at least one member of the species. In bacteria, pangenomes can be much larger than the set of genes carried b...
What Are We Learning from Plant Pangenomes?
A single reference genome does not fully capture species diversity. By contrast, a pangenome incorporates multiple genomes to capture the entire set of nonredundant genes in a give...
