Kun-peng enables scalable and accurate pan-domain metagenomic classification

Abstract

Comprehensive pan-domain metagenomic classification is increasingly constrained by the memory and runtime costs of building and querying the rapidly expanding reference genome space. We introduce Kun-peng, a taxonomic classifier powered by an intelligent block-partitioned database structure and optimized search strategies, enabling ultra-scalable, memory-efficient pan-domain profiling. Using the Critical Assessment of Metagenome Interpretation II (CAMI II) benchmark, Kun-peng substantially reduces the memory usage of database building and querying by up to 24-fold and accelerates sample classification by up to 4.73-fold compared with Kraken2. Kun-peng achieves competitive accuracy with fewer false positives than Kraken2, Centrifuger, and even KrakenUniq, while maintaining consistently high sensitivity across diverse datasets. In a real-world evaluation of 586 metagenomic samples spanning air, water, soil, and human-associated environments, we performed classification using a 4.3 TB pan-domain database comprising 204,477 genomes, which Kun-peng built with only 4.1 GB peak memory. Kun-peng processed each sample in 0.2–11.2 min with 4.0–35.4 GB peak memory, corresponding to a 54–473-fold reduction in memory usage relative to Kraken2. Compared with Sylph, Kun-peng achieved up to a 46-fold speedup while requiring 21-fold less memory. Kun-peng classified 69.8%–94.3% of reads, improving coverage by 20%–60% over the standard Kraken2 database with 62,026 genomes. This improvement reflects expanded reference coverage, although a small fraction of false positives is inherent to k-mer-based methods. Overall, Kun-peng effectively eliminates the long-standing memory bottleneck in pan-domain database building and classification, enabling rapid and scalable pan-domain taxonomic analysis of complex environmental, ecological, and exposomic sequencing datasets.
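The abstract attributes Kun-peng's memory savings to a block-partitioned database. As a rough illustration of the general idea only (the abstract does not describe Kun-peng's actual data layout or search strategy), the sketch below hash-partitions a toy k-mer table into blocks and processes query reads block by block, so that in principle only one block needs to be resident in memory at a time. The k-mer length, block count, CRC32-based partitioning, the `None` marker for k-mers shared between taxa, and the majority-vote assignment are all hypothetical simplifications; production classifiers such as Kraken2 use much longer k-mers and resolve shared k-mers with a lowest-common-ancestor (LCA) rule.

```python
"""Toy sketch of a block-partitioned k-mer classifier (hypothetical, not Kun-peng's code)."""
from collections import Counter
from zlib import crc32

K = 5         # hypothetical k-mer length; real classifiers use much longer k-mers
N_BLOCKS = 4  # hypothetical number of database blocks


def kmers(seq, k=K):
    """Return all overlapping k-mers of a sequence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def block_of(kmer, n_blocks):
    """Deterministically assign a k-mer to one block via a CRC32 hash."""
    return crc32(kmer.encode()) % n_blocks


def build_blocks(references, n_blocks=N_BLOCKS):
    """Map each reference k-mer to a taxon, split across n_blocks tables.

    A k-mer seen in more than one taxon is marked ambiguous (None); real
    tools would instead store the lowest common ancestor of those taxa.
    """
    blocks = [{} for _ in range(n_blocks)]
    for taxon, seq in references.items():
        for km in kmers(seq):
            table = blocks[block_of(km, n_blocks)]
            if km in table and table[km] != taxon:
                table[km] = None
            else:
                table[km] = taxon
    return blocks


def classify(reads, blocks):
    """Classify reads by visiting one database block at a time."""
    n_blocks = len(blocks)
    # Bucket every read k-mer by its block so each block is scanned once.
    buckets = [[] for _ in range(n_blocks)]
    for read_id, seq in reads.items():
        for km in kmers(seq):
            buckets[block_of(km, n_blocks)].append((read_id, km))

    hits = {read_id: Counter() for read_id in reads}
    for block_id in range(n_blocks):
        table = blocks[block_id]  # a real tool would load only this block from disk here
        for read_id, km in buckets[block_id]:
            taxon = table.get(km)
            if taxon is not None:
                hits[read_id][taxon] += 1

    # Majority vote over k-mer hits; reads with no hits stay unclassified.
    return {
        read_id: counts.most_common(1)[0][0] if counts else "unclassified"
        for read_id, counts in hits.items()
    }


if __name__ == "__main__":
    refs = {"taxonA": "ACGTACGTTAGC", "taxonB": "GGGCCCTTTAAA"}
    reads = {"r1": "ACGTACGTT", "r2": "GGCCCTTTA", "r3": "NNNNNNN"}
    print(classify(reads, build_blocks(refs)))
    # {'r1': 'taxonA', 'r2': 'taxonB', 'r3': 'unclassified'}
```

Because every k-mer hashes to exactly one block, the per-read hit counts are identical whether the table is split into one block or many; only the portion of the table that must be held in memory at once changes. In this toy everything stays in a Python list, and the comment in `classify` marks where a disk-backed tool would load each block.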

Oxford University Press (OUP)

Qiong Chen, Boliang Zhang, Chen Peng, Jiajun Huang, Zhen Liu, Xiaotao Shen, Chao Jiang

Briefings in Bioinformatics

2026
