Kun-peng: an ultra-memory-efficient, fast, and accurate pan-domain taxonomic classifier for all
Abstract
Comprehensive metagenomic sequence classification of diverse environmental samples faces significant computing memory challenges due to exponentially expanding genome databases. Here, we present Kun-peng, featuring a unique ordered 4GB block database design for ultra-efficient resource management, faster processing, and higher accuracy. When benchmarked on mock communities (Amos HiLo, Mixed, and NIST) against Kraken2, Centrifuge, and Sylph, Kun-peng matched Sylph, achieving the highest precision and lowest false-positive rates while demonstrating superior time and memory efficiency among all tested tools. Furthermore, Kun-peng’s efficient database architecture enables the practical use of large-scale reference databases that were previously computationally prohibitive. In comprehensive testing across 586 air, water, soil, and human metagenomic samples using an expansive pan-domain database (204,477 genomes, 4.3TB), Kun-peng classified 69.78-94.29% of reads, achieving 38-43% higher classification rates than Kraken2 with the standard database. Unexpectedly, Sylph failed to classify any reads in air samples and left >99.85% of reads unclassified in water and soil samples. In terms of computational efficiency, Kun-peng processed each sample in 0.2 to 11.2 minutes using only 4.0 to 35.4GB peak memory. Remarkably, these processing times were comparable to those of Kraken2 using the standard database (81GB, 5% of the pan-domain database). Memory-wise, Kun-peng required only 35.4GB peak memory with the pan-domain database, representing a 473-fold reduction compared to Kraken2. Compared to Sylph, Kun-peng processed samples up to 46.3 times faster while using up to 20.6 times less memory. Overall, Kun-peng offers an ultra-memory-efficient, fast, and accurate solution for pan-domain metagenomic classification.
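The abstract does not describe how the ordered block database works internally, so the following is only a minimal sketch of the general idea that a fixed-size block layout can bound peak memory: queries are grouped by the block they fall in, and blocks are visited one at a time in order. Everything here is an assumption made for illustration, not Kun-peng's actual implementation: the names `BLOCK_SLOTS`, `block_of`, `build_blocks`, and `classify` are hypothetical, integer "slots" stand in for hashed k-mers, and an in-memory dictionary stands in for a 4GB block read from disk.

```python
from collections import defaultdict

# Hypothetical toy block size (slots per block); stands in for a 4GB block.
BLOCK_SLOTS = 4


def block_of(slot: int) -> int:
    """Return the index of the block that holds a given table slot."""
    return slot // BLOCK_SLOTS


def build_blocks(table: dict[int, str]) -> dict[int, dict[int, str]]:
    """Split a slot -> taxon table into fixed-size blocks, ordered by block index."""
    blocks: dict[int, dict[int, str]] = defaultdict(dict)
    for slot, taxon in table.items():
        blocks[block_of(slot)][slot] = taxon
    return dict(sorted(blocks.items()))


def classify(query_slots: list[int], blocks: dict[int, dict[int, str]]):
    """Look up every query while touching only one block at a time.

    Returns the per-query taxa (None when unclassified) and the size of the
    largest single block that was resident during the run.
    """
    # Group queries by block so each block is visited exactly once.
    pending: dict[int, list[tuple[int, int]]] = defaultdict(list)
    for i, slot in enumerate(query_slots):
        pending[block_of(slot)].append((i, slot))

    results: list[str | None] = [None] * len(query_slots)
    max_resident = 0
    for block_id in sorted(pending):
        # Assumption: in a real tool this step would load one block from disk.
        block = blocks.get(block_id, {})
        max_resident = max(max_resident, len(block))
        for i, slot in pending[block_id]:
            results[i] = block.get(slot)
    return results, max_resident


if __name__ == "__main__":
    toy_table = {0: "A", 1: "B", 5: "C", 9: "D"}
    toy_blocks = build_blocks(toy_table)
    print(classify([9, 0, 5, 2], toy_blocks))
```

In this toy setup, peak residency depends on the block size rather than on the total table size, which is the property the abstract attributes to the block design; the real tool's on-disk format and lookup logic may differ substantially.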

