
Deep Learning for Predicting 16S rRNA Gene Copy Number

Abstract

Background: Culture-independent 16S rRNA gene metabarcoding is a commonly used method in microbiome profiling. However, this approach reflects only the proportion of sequencing reads, not the actual cell fraction. To obtain more quantitative cell-fraction estimates, the 16S gene copy number (GCN) of each community member must be resolved. Several bioinformatic tools currently estimate 16S GCN, based on either taxonomy assignment or phylogeny.

Method: Here we develop a novel algorithm, the Stacked Ensemble Model (SEM), that estimates 16S GCN directly from 16S rRNA gene sequence strings, without resolving taxonomy or phylogeny. For accessibility, we also developed a public, end-to-end, web-based tool based on the SEM model, named Artificial Neural Network Approximator for 16S rRNA Gene Copy Number (ANNA16).

Results: Using 27,579 16S rRNA gene sequences from the rrnDB database, we show that ANNA16 outperforms the most commonly used 16S GCN prediction algorithms. In 5-fold cross-validation, the prediction error range of SEM lies entirely below that of all other algorithms for full-length 16S sequences and partially below it for 16S subregions. The final test and a mock community test indicate that ANNA16 is more accurate than all currently available tools (i.e., rrnDB, CopyRighter, PICRUSt2, and PAPRICA). SHAP value analysis indicates that ANNA16 mainly learns information from rare insertions.

Conclusion: ANNA16 is a deep learning-based 16S GCN prediction tool. Compared with traditional GCN prediction tools, ANNA16 has a simple structure, faster inference without precomputation, and higher accuracy. As more 16S GCN data accumulate in the database, future studies could reduce the prediction errors for rare, high-GCN taxa, which are currently undersampled.
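The Background's point, that read proportions are not cell fractions until copy number is accounted for, can be made concrete. Assuming taxon i has observed read count r_i and a predicted 16S GCN c_i, a standard correction divides reads by copy number and renormalizes: f_i = (r_i / c_i) / Σ_j (r_j / c_j). The Python sketch below applies that step. The function name `gcn_corrected_fractions`, the taxon labels, and the numbers are hypothetical illustrations, not part of ANNA16's interface.

```python
def gcn_corrected_fractions(read_counts, copy_numbers):
    """Convert per-taxon read counts into estimated cell fractions.

    Both arguments are dicts keyed by taxon name (hypothetical inputs);
    copy_numbers holds predicted 16S GCN values for the same taxa.
    """
    if read_counts.keys() != copy_numbers.keys():
        raise ValueError("read_counts and copy_numbers must cover the same taxa")
    # Reads divided by copy number is proportional to the number of cells.
    per_cell = {}
    for taxon, reads in read_counts.items():
        gcn = copy_numbers[taxon]
        if gcn <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
        per_cell[taxon] = reads / gcn
    total = sum(per_cell.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    # Renormalize so the estimated cell fractions sum to 1.
    return {taxon: value / total for taxon, value in per_cell.items()}


reads = {"taxon_A": 700, "taxon_B": 300}
gcn = {"taxon_A": 7, "taxon_B": 1}
fractions = gcn_corrected_fractions(reads, gcn)
print(fractions)  # {'taxon_A': 0.25, 'taxon_B': 0.75}
```

In this toy case a 7-copy taxon supplies 70% of the reads, but after correction the single-copy taxon accounts for three quarters of the cells. This is why the accuracy of the predicted GCN directly limits the accuracy of the cell-fraction estimate.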
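The abstract's Method and Results describe two ideas: predicting GCN from the sequence string itself, with no taxonomy or phylogeny step, and evaluating with 5-fold cross-validation. The sketch below illustrates both under stated assumptions. Sequences are turned into k-mer frequency profiles, a common way to give variable-length sequences fixed-length features; whether this matches ANNA16's own preprocessing is an assumption here. A simple k-nearest-neighbour regressor stands in for the Stacked Ensemble Model and does not reproduce it. All function names are hypothetical, and the toy sequences and GCN labels are synthetic and carry no biological meaning.

```python
import math
import random
from collections import Counter
from itertools import product


def kmer_profile(seq, k=3):
    """Relative frequency of every possible k-mer over A/C/G/T."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    # k-mers containing ambiguous bases (e.g. N) are not in `kmers` and are ignored.
    total = sum(counts[km] for km in kmers) or 1
    return [counts[km] / total for km in kmers]


def knn_predict(train_x, train_y, x, n_neighbors=3):
    """Stand-in regressor: mean GCN of the nearest training profiles (Euclidean)."""
    nearest = sorted((math.dist(x, tx), ty) for tx, ty in zip(train_x, train_y))[:n_neighbors]
    return sum(ty for _, ty in nearest) / len(nearest)


def kfold_mae(seqs, gcns, n_folds=5, k=3, n_neighbors=3):
    """Mean absolute error of the stand-in model on each of n_folds held-out folds."""
    features = [kmer_profile(s, k) for s in seqs]
    indices = list(range(len(seqs)))
    # Deterministic round-robin split; a real study would shuffle (and perhaps stratify).
    folds = [indices[f::n_folds] for f in range(n_folds)]
    errors = []
    for test_idx in folds:
        held_out = set(test_idx)
        train_x = [features[i] for i in indices if i not in held_out]
        train_y = [gcns[i] for i in indices if i not in held_out]
        abs_err = [abs(knn_predict(train_x, train_y, features[i], n_neighbors) - gcns[i])
                   for i in test_idx]
        errors.append(sum(abs_err) / len(abs_err))
    return errors


# Synthetic toy data: invented sequences and GCN labels, not rrnDB records.
rng = random.Random(0)


def toy_seq(gc_fraction, length=200):
    """Random sequence whose G/C content is roughly gc_fraction."""
    return "".join(rng.choice("GC") if rng.random() < gc_fraction else rng.choice("AT")
                   for _ in range(length))


seqs = [toy_seq(0.3) for _ in range(10)] + [toy_seq(0.7) for _ in range(10)]
gcns = [1] * 10 + [7] * 10
print(kfold_mae(seqs, gcns))
```

Replacing the k-NN stand-in with a stacked ensemble, where several base regressors produce out-of-fold predictions that feed a meta-regressor, would bring the sketch closer to the SEM design the abstract names. That step is omitted here to keep the example dependency-free.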

Related Results

Identification of bacterial pathogens from clinical samples using 16S rRNA sequencing
Introduction: Bacterial infections have a substantial impact on global health and can become serious if misdiagnosed with several diseases related to the central nervous, cardiovas...
Gallibacterium
Abstract Gal.li.bac.te'ri.um. L. masc. n. gallus, chicken; N.L. neut. n. bacterium, rod; N.L. neut. n. Gallibacterium, bacterium of chicken. Proteobacteria / Gammaproteobacteria / Pasteurella...
Expression and polymorphism of genes in gallstones
ABSTRACT Through the method of clinical case control study, to explore the expression and genetic polymorphism of KLF14 gene (rs4731702 and rs972283) and SR-B1 gene (rs...
Glycomic profiling of the gut microbiota by Glycan-seq
Abstract Background: There has been immense interest in studying the relationship between the gut microbiota and human health. Bacterial glycans modulate the cross talk between the gu...
Deep convolutional neural network and IoT technology for healthcare
Background Deep Learning is an AI technology that trains computers to analyze data in an approach similar to the human brain. Deep learning algorithms can find complex patterns in ...
Microbial Translocation in Patients with Parkinson’s Disease in Zambia: a Case Control Study
Abstract BACKGROUND Over the past few years evidence has emerged that Parkinson’s disease (PD) could originate from the gastrointestinal tract. Gut leakiness in patients wh...
