Deep Learning for Predicting 16S rRNA Gene Copy Number
ABSTRACT
Background: Culture-independent 16S rRNA gene metabarcoding is a commonly used method in microbiome profiling. However, this approach reflects only the proportion of sequencing reads, not the actual cell fraction. To obtain more quantitative cell fraction estimates, the 16S gene copy number (GCN) must be resolved for each community member. Several bioinformatic tools are currently available to estimate 16S GCN, based on either taxonomy assignment or phylogeny.
Method: Here we develop a novel algorithm, the Stacked Ensemble Model (SEM), that estimates 16S GCN directly from 16S rRNA gene sequence strings, without resolving taxonomy or phylogeny. For accessibility, we developed a public, end-to-end, web-based tool built on the SEM, named Artificial Neural Network Approximator for 16S rRNA Gene Copy Number (ANNA16).
Results: Using 27,579 16S rRNA gene sequences from the rrnDB database, we show that ANNA16 outperforms the most commonly used 16S GCN prediction algorithms. In 5-fold cross-validation, the prediction error range of SEM lies entirely below that of all other algorithms for the full-length 16S sequence and partially below it for 16S subregions. The final test and a mock community test indicate that ANNA16 is more accurate than all currently available tools (i.e., rrnDB, CopyRighter, PICRUSt2, and PAPRICA). SHAP value analysis indicates that ANNA16 mainly learns information from rare insertions.
Conclusion: ANNA16 is a deep learning-based 16S GCN prediction tool. Compared with traditional GCN prediction tools, ANNA16 has a simple structure, faster inference without precomputing, and higher accuracy. As more 16S GCN data are added to the database, future studies could reduce the prediction errors for rare, high-GCN taxa, which are currently undersampled.
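To illustrate why GCN matters for the cell fraction estimates mentioned in the Background, the following is a minimal sketch of a copy-number correction: each taxon's read count is divided by its 16S GCN and the results are renormalized. The function name `correct_for_gcn`, the taxon labels, and the numbers are hypothetical and chosen only for illustration; this is not ANNA16's code, and it assumes the GCN values are already known or predicted.

```python
def correct_for_gcn(read_counts, gcn):
    """Convert 16S read counts into estimated cell fractions.

    read_counts: dict mapping taxon -> number of 16S reads (hypothetical input)
    gcn:         dict mapping taxon -> 16S gene copy number per cell
    """
    # Dividing reads by copy number approximates the relative number of cells.
    cells = {taxon: reads / gcn[taxon] for taxon, reads in read_counts.items()}
    total = sum(cells.values())
    # Renormalize so the estimated cell fractions sum to 1.
    return {taxon: value / total for taxon, value in cells.items()}


# Illustrative numbers only: taxon_A carries 7 copies per cell, taxon_B carries 1.
read_counts = {"taxon_A": 700, "taxon_B": 300}
gcn = {"taxon_A": 7, "taxon_B": 1}
fractions = correct_for_gcn(read_counts, gcn)
```

In this toy case taxon_A accounts for 70% of the reads but only 25% of the estimated cells, which is the kind of distortion GCN prediction is meant to correct.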

