Kun-peng enables scalable and accurate pan-domain metagenomic classification
Abstract
Comprehensive pan-domain metagenomic classification is increasingly constrained by the memory and runtime costs of building and querying databases over a rapidly expanding reference genome space. We introduce Kun-peng, a taxonomic classifier built on a block-partitioned database structure and optimized search strategies, which together enable highly scalable, memory-efficient pan-domain profiling. On the Critical Assessment of Metagenome Interpretation II (CAMI II) benchmark, Kun-peng reduces memory usage for database building and querying by up to 24-fold and accelerates sample classification by up to 4.73-fold compared with Kraken2. Kun-peng achieves competitive accuracy with fewer false positives than Kraken2, Centrifuger, and even KrakenUniq, while maintaining consistently high sensitivity across diverse datasets. In a real-world evaluation of 586 metagenomic samples spanning air, water, soil, and human-associated environments, we classified reads against a 4.3 TB pan-domain database of 204,477 genomes, which Kun-peng built with a peak memory of only 4.1 GB. Kun-peng processed each sample in 0.2–11.2 min with 4.0–35.4 GB peak memory, a 54–473-fold reduction in memory usage relative to Kraken2. Compared with Sylph, Kun-peng was up to 46-fold faster while requiring 21-fold less memory. Kun-peng classified 69.8%–94.3% of reads, a 20%–60% improvement in coverage over the standard Kraken2 database of 62,026 genomes. This improvement reflects the expanded reference coverage, although k-mer-based methods inherently produce a small fraction of false positives. Overall, Kun-peng effectively eliminates the long-standing memory bottleneck in pan-domain database building and classification, enabling rapid, scalable pan-domain taxonomic analysis of complex environmental, ecological, and exposomic sequencing datasets.
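The abstract does not describe the block-partitioned layout in detail. The Python sketch below is only a rough illustration of the general idea, not Kun-peng's algorithm or data format. It hash-partitions a k-mer-to-taxon table into blocks and streams over them one block at a time, so that only one block needs to be resident at once. Several parts are hypothetical simplifications:

- the `block_of` hash,
- the toy `K = 5` and `N_BLOCKS = 4`,
- marking shared k-mers as ambiguous instead of computing a lowest common ancestor,
- the majority-hit read assignment.

```python
# Hypothetical sketch of block-partitioned k-mer classification.
# Assumption: this is NOT Kun-peng's implementation; it only illustrates
# hash-partitioning a k-mer -> taxon table so one block is used at a time.
import hashlib
from collections import Counter, defaultdict

K = 5          # toy k-mer length (real classifiers use longer k-mers/minimizers)
N_BLOCKS = 4   # hypothetical number of database blocks


def kmers(seq, k=K):
    """Yield all overlapping k-mers of a sequence."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def block_of(kmer, n_blocks=N_BLOCKS):
    """Assign a k-mer to a block with a stable (run-independent) hash."""
    digest = hashlib.blake2b(kmer.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "little") % n_blocks


def build_blocks(references, n_blocks=N_BLOCKS):
    """Build one k-mer -> taxon dict per block.

    Simplification: a k-mer seen in two different taxa is marked ambiguous
    (None) rather than mapped to their lowest common ancestor.
    """
    blocks = [dict() for _ in range(n_blocks)]
    for taxon, genome in references.items():
        for km in kmers(genome):
            table = blocks[block_of(km, n_blocks)]
            if km in table and table[km] != taxon:
                table[km] = None  # shared between taxa: ambiguous
            else:
                table[km] = taxon
    return blocks


def classify(reads, blocks, min_hits=1):
    """Classify reads by visiting the database blocks one at a time."""
    n_blocks = len(blocks)
    # Route every read k-mer to the block that owns it.
    routed = defaultdict(list)  # block index -> [(read_id, kmer), ...]
    for read_id, seq in reads.items():
        for km in kmers(seq):
            routed[block_of(km, n_blocks)].append((read_id, km))

    hits = defaultdict(Counter)  # read_id -> Counter(taxon -> hit count)
    for b in range(n_blocks):
        table = blocks[b]  # a disk-backed system would load block b here
        for read_id, km in routed[b]:
            taxon = table.get(km)
            if taxon is not None:
                hits[read_id][taxon] += 1
        # ...and release block b before moving on to block b + 1

    result = {}
    for read_id in reads:
        counts = hits.get(read_id)
        if counts:
            taxon, n = counts.most_common(1)[0]
            result[read_id] = taxon if n >= min_hits else "unclassified"
        else:
            result[read_id] = "unclassified"
    return result


refs = {
    "taxonA": "ACGTACGTTGCAAGGCTTAACG",
    "taxonB": "TTTTGGGGCCCCAAAATTTTGG",
}
reads = {"r1": "ACGTACGTTGCA", "r2": "GGGGCCCCAAAA", "r3": "NNNNNNNN"}
print(classify(reads, build_blocks(refs)))
# {'r1': 'taxonA', 'r2': 'taxonB', 'r3': 'unclassified'}
```

Each k-mer is owned by exactly one block, so the classification result here does not depend on the number of blocks. Changing `N_BLOCKS` only trades the number of passes against the size of each block. In this toy version everything stays in memory. In a disk-backed design, each block would be loaded, queried, and released in turn. That is the property that could keep peak memory close to the size of one block rather than the whole database.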
Oxford University Press (OUP)