
Kun-peng enables scalable and accurate pan-domain metagenomic classification

Abstract: Comprehensive pan-domain metagenomic classification is increasingly constrained by the memory and runtime costs of building and querying the rapidly expanding reference genome space. We introduce Kun-peng, a taxonomic classifier powered by an intelligent block-partitioned database structure and optimized search strategies, enabling ultra-scalable, memory-efficient pan-domain profiling. Using the Critical Assessment of Metagenome Interpretation II benchmark, Kun-peng substantially reduces the memory usage of database-building and querying by up to 24-fold, and accelerates sample classification by up to 4.73-fold compared with Kraken2. Kun-peng achieves competitive accuracy with fewer false positives than Kraken2, Centrifuger, and even KrakenUniq, while maintaining consistently high sensitivity across diverse datasets. In a real-world evaluation of 586 metagenomic samples spanning air, water, soil, and human-associated environments, we performed classification using a 4.3 TB pan-domain database comprising 204,477 genomes, which was built by Kun-peng with only 4.1 GB peak memory. Kun-peng processed each sample in 0.2–11.2 min with 4.0–35.4 GB peak memory, corresponding to a 54–473-fold reduction in memory usage relative to Kraken2. Compared with Sylph, Kun-peng achieved up to a 46-fold speedup while requiring 21-fold less memory. Kun-peng classified 69.8%–94.3% of reads, improving coverage by 20%–60% over the standard Kraken2 database with 62,026 genomes. This improvement reflects expanded reference coverage, although a small fraction of false positives is inherent to k-mer-based methods. Overall, Kun-peng effectively eliminates the long-standing memory bottleneck in pan-domain database building and classification, enabling rapid and scalable pan-domain taxonomic analysis of complex environmental, ecological, and exposomic sequencing datasets.
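
The abstract attributes the memory savings to a block-partitioned database but does not describe the mechanism. The toy Python sketch below illustrates only the general idea of partitioning a k-mer table so that a query touches one block at a time. It is not Kun-peng's implementation. The names `block_of`, `build_blocks`, and `classify`, the constants `K` and `NUM_BLOCKS`, the CRC32 partitioning rule, and the majority-vote scoring are all assumptions made for illustration.

```python
from collections import defaultdict
import zlib

K = 5            # toy k-mer length (assumption; real classifiers use longer k-mers)
NUM_BLOCKS = 4   # hypothetical number of database partitions


def kmers(seq, k=K):
    """Yield every overlapping k-mer of seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def block_of(kmer, num_blocks=NUM_BLOCKS):
    """Map a k-mer to a block with a stable hash (CRC32, not Python's salted hash())."""
    return zlib.crc32(kmer.encode()) % num_blocks


def build_blocks(references):
    """Build one k-mer -> taxon table per block from a {taxon: sequence} mapping."""
    blocks = [dict() for _ in range(NUM_BLOCKS)]
    for taxon, seq in references.items():
        for km in kmers(seq):
            # Toy rule: the first taxon to claim a k-mer keeps it.
            blocks[block_of(km)].setdefault(km, taxon)
    return blocks


def classify(reads, blocks):
    """Classify reads from a {read_id: sequence} mapping, visiting one block at a time."""
    # Pass 1: bucket every query k-mer by the block that could contain it.
    pending = defaultdict(list)
    for read_id, seq in reads.items():
        for km in kmers(seq):
            pending[block_of(km)].append((read_id, km))

    # Pass 2: visit blocks in turn. With blocks stored on disk, only the
    # current block would need to be resident in memory.
    votes = defaultdict(lambda: defaultdict(int))
    for b, table in enumerate(blocks):
        for read_id, km in pending.get(b, []):
            taxon = table.get(km)
            if taxon is not None:
                votes[read_id][taxon] += 1

    # Pass 3: majority vote per read; reads without any hit stay unclassified (None).
    return {
        rid: (max(votes[rid], key=votes[rid].get) if votes[rid] else None)
        for rid in reads
    }
```

In this sketch the blocks are plain in-memory dictionaries. The point is the access pattern: because pass 2 needs only one block at a time, peak memory in a disk-backed variant would scale with the largest block rather than with the whole database.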

Related Results

Kun-peng: an ultra-memory-efficient, fast, and accurate pan-domain taxonomic classifier for all
Abstract Comprehensive metagenomic sequence classification of diverse environmental samples faces significant computing memory challenges due to exponentially expan...
CAIM: Coverage-based Analysis for Identification of Microbiome
ABSTRACT Accurate taxonomic profiling of microbial taxa in a metagenomic sample is vital to gain insights into microbial ecology. Recent advancements in sequencing ...
Metagenomic Thermometer
Abstract Various microorganisms exist in environments, and each of which has an optimal growth temperature (OGT). The relationship between genomi...
Targeted domain assembly for fast functional profiling of metagenomic datasets with S3A
Abstract Motivation The understanding of the ever-increasing number of metagenomic sequences accumulating in our databases deman...
Metagenomic Thermometer
Abstract Various microorganisms exist in environments, and each of them has its optimal growth temperature (OGT). The relationship between genomic information and...
Coriolis: enabling metagenomic classification on lightweight mobile devices
Abstract Motivation The introduction of portable DNA sequencers such as the Oxford Nanopore Technologies MinION has enabled real...
LMAS: evaluating metagenomic short de novo assembly methods through defined communities
Abstract Background The de novo assembly of raw sequence data is key in metagenomic analysis. It allows recovering draft genomes...
