Kun-peng: an ultra-memory-efficient, fast, and accurate pan-domain taxonomic classifier for all
Abstract
Comprehensive metagenomic sequence classification of diverse environmental samples faces significant computing memory challenges due to exponentially expanding genome databases. Here, we present Kun-peng, featuring a unique ordered 4GB block database design for ultra-efficient resource management, faster processing, and higher accuracy. When benchmarked on mock communities (Amos HiLo, Mixed, and NIST) against Kraken2, Centrifuge, and Sylph, Kun-peng matched Sylph in achieving the highest precision and lowest false-positive rates, while demonstrating superior time and memory efficiency among all tested tools. Furthermore, Kun-peng's efficient database architecture enables the practical use of large-scale reference databases that were previously computationally prohibitive. In comprehensive testing across 586 air, water, soil, and human metagenomic samples using an expansive pan-domain database (204,477 genomes, 4.3TB), Kun-peng classified 69.78–94.29% of reads, achieving 38–43% higher classification rates than Kraken2 with the standard database. Unexpectedly, Sylph failed to classify any reads in air samples and left >99.85% of reads unclassified in water and soil samples. In terms of computational efficiency, Kun-peng processed each sample in 0.2–11.2 minutes using only 4.0–35.4GB peak memory. Remarkably, these processing times were comparable to those of Kraken2 using the standard database (81GB, 5% of the pan-domain database). Memory-wise, Kun-peng required only 35.4GB peak memory with the pan-domain database, representing a 473-fold reduction compared to Kraken2. Compared to Sylph, Kun-peng processed samples up to 46.3 times faster while using up to 20.6 times less memory. Overall, Kun-peng offers an ultra-memory-efficient, fast, and accurate solution for pan-domain metagenomic classification.
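The abstract does not describe how the ordered 4GB blocks are laid out or queried. As a rough illustration only, the Python sketch below assumes one plausible reading: k-mer hashes are split into contiguous, ordered hash ranges, each range is written to disk as its own block, and query k-mers are bucketed by block so that classification loads one block at a time. Every name and choice here (`build_blocks`, `classify`, the toy `K` and `NUM_BLOCKS`, majority voting, the `AMBIGUOUS` marker) is hypothetical and is not Kun-peng's actual code, algorithm, or file format.

```python
import hashlib
import pickle
import tempfile
from collections import Counter, defaultdict
from pathlib import Path

K = 5           # toy k-mer length (hypothetical; real classifiers use much larger k)
NUM_BLOCKS = 4  # stands in for the number of fixed-size on-disk blocks (hypothetical)
AMBIGUOUS = -1  # marker for k-mers shared by more than one taxon (hypothetical)


def kmers(seq):
    return (seq[i:i + K] for i in range(len(seq) - K + 1))


def kmer_hash(kmer):
    # Stable 64-bit hash, so the same k-mer always maps to the same block.
    return int.from_bytes(hashlib.blake2b(kmer.encode(), digest_size=8).digest(), "big")


def block_of(h):
    # Ordered partition: block i owns the hash range [i * 2**64 / N, (i + 1) * 2**64 / N).
    return (h * NUM_BLOCKS) >> 64


def build_blocks(references, out_dir):
    """references: {taxon_id: sequence}. Writes one pickle file per block, in order."""
    # For brevity this toy builder holds all blocks in memory before writing them.
    blocks = [dict() for _ in range(NUM_BLOCKS)]
    for taxon, seq in references.items():
        for km in kmers(seq):
            h = kmer_hash(km)
            table = blocks[block_of(h)]
            prev = table.get(h)
            table[h] = taxon if prev in (None, taxon) else AMBIGUOUS
    paths = []
    for i, table in enumerate(blocks):
        path = Path(out_dir) / f"block_{i}.pkl"
        path.write_bytes(pickle.dumps(table))
        paths.append(path)
    return paths


def classify(reads, block_paths):
    # Bucket every query k-mer hash by the block that owns it.
    queries = defaultdict(list)
    for r, seq in enumerate(reads):
        for km in kmers(seq):
            h = kmer_hash(km)
            queries[block_of(h)].append((r, h))
    votes = [Counter() for _ in reads]
    # Visit blocks in order; only one block table is loaded at a time.
    for i, path in enumerate(block_paths):
        table = pickle.loads(path.read_bytes())
        for r, h in queries[i]:
            taxon = table.get(h)
            if taxon is not None and taxon != AMBIGUOUS:
                votes[r][taxon] += 1
        del table  # release this block before loading the next one
    # Assign each read the taxon with the most k-mer hits, or None if unclassified.
    return [v.most_common(1)[0][0] if v else None for v in votes]


refs = {101: "ACGTACGTTAGCCGATAGGCTAACG", 202: "TTGACCGGTAACCTTGGAACCTTGA"}
with tempfile.TemporaryDirectory() as d:
    paths = build_blocks(refs, d)
    print(classify(["ACGTACGTTAGC", "GGTAACCTTGGA", "CCCCCCCCCCCC"], paths))
    # → [101, 202, None]
```

In this sketch, peak memory during lookup is bounded by one block plus the bucketed query hashes rather than by the whole index, so a larger reference database adds more blocks to visit instead of more resident memory.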