Deep Learning for Predicting 16S rRNA Gene Copy Number

Abstract
Background: Culture-independent 16S rRNA gene metabarcoding is a commonly used method in microbiome profiling. However, this approach can only reflect the proportion of sequencing reads, rather than the actual cell fraction. To achieve more quantitative cell fraction estimates, we need to resolve the 16S gene copy numbers (GCN) for different community members. Currently, there are several bioinformatic tools available to estimate 16S GCN, based either on taxonomy assignment or on phylogeny.
Method: Here we develop a novel algorithm, Stacked Ensemble Model (SEM), that estimates 16S GCN directly from 16S rRNA gene sequence strings, without resolving taxonomy or phylogeny. For accessibility, we developed a public, end-to-end, web-based tool based on the SEM model, named Artificial Neural Network Approximator for 16S rRNA Gene Copy Number (ANNA16).
Results: Based on 27,579 16S rRNA gene sequences from the rrnDB database, we show that ANNA16 outperforms the most commonly used 16S GCN prediction algorithms. In 5-fold cross-validation, the prediction error range of SEM lies entirely below that of all other algorithms for full-length 16S sequences and partially below it for 16S subregions. The final test and a mock community test indicate that ANNA16 is more accurate than all currently available tools (i.e., rrnDB, CopyRighter, PICRUSt2, and PAPRICA). SHAP value analysis indicates that ANNA16 mainly learns information from rare insertions.
Conclusion: ANNA16 is a deep-learning-based 16S GCN prediction tool. Compared to traditional GCN prediction tools, ANNA16 has a simple structure, faster inference speed without precomputing, and higher accuracy. As more 16S GCN data are added to the database, future studies could reduce the prediction errors for rare, high-GCN taxa, which are currently undersampled.
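To illustrate why copy numbers matter for cell fraction estimates, the following is a minimal sketch of the usual correction: each taxon's read count is divided by its 16S GCN and the results are renormalized, i.e. c_i = (r_i / g_i) / sum_j (r_j / g_j). The function name, taxon labels, and numbers are hypothetical and chosen only for illustration; this is not ANNA16's code, and in practice the GCN values would come from a predictor such as the one described in the abstract.

```python
def reads_to_cell_fractions(read_counts, gcn):
    """Convert per-taxon 16S read counts into estimated cell fractions.

    read_counts: dict mapping taxon -> number of 16S reads
    gcn:         dict mapping taxon -> 16S gene copy number (known or predicted)
    """
    for taxon in read_counts:
        if gcn[taxon] <= 0:
            raise ValueError(f"copy number for {taxon} must be positive")
    # Reads divided by copy number are proportional to the number of cells.
    weighted = {taxon: count / gcn[taxon] for taxon, count in read_counts.items()}
    total = sum(weighted.values())
    if total == 0:
        raise ValueError("no reads to normalize")
    # Renormalize so the estimated cell fractions sum to 1.
    return {taxon: value / total for taxon, value in weighted.items()}


# Hypothetical two-member community with made-up numbers (assumption).
reads = {"taxon_A": 700, "taxon_B": 300}
copies = {"taxon_A": 7, "taxon_B": 1}
fractions = reads_to_cell_fractions(reads, copies)
print(fractions)
```

In this made-up example, taxon_A accounts for 70% of the reads but only a quarter of the estimated cells, because each of its cells contributes seven 16S copies.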

Related Results

Evaluation of 16S rRNA Gene Sequence for DNA Barcoding of Tuna Fish
For fish product authentication, DNA barcoding has been a reliable tool. This is due to its requirement of a small amount of tissue sample in order to conduct a full analysis for s...
PEMANFAATAN GEN 16S rRNA SEBAGAI PERANGKAT IDENTIFIKASI BAKTERI UNTUK PENELITIAN-PENELITIAN DI INDONESIA [Utilization of the 16S rRNA Gene as a Bacterial Identification Tool for Research in Indonesia]
Abstract: The 16S rRNA gene has a hypervariable region that differs from one bacterial species to another. The gene is being used as a research tool to help with accurate identification ...
Pipeline for species-resolved full-length 16S rRNA amplicon nanopore sequencing analysis of low-complexity bacterial microbiota
Abstract: 16S rRNA amplicon sequencing is a fundamental tool for characterizing prokaryotic microbial communities. While short-read 16S rRNA sequencing is a proven s...
Identification of bacterial pathogens from clinical samples using 16S rRNA sequencing
Introduction: Bacterial infections have a substantial impact on global health and can become serious if misdiagnosed with several diseases related to the central nervous, cardiovas...
Gallibacterium
Abstract: Gal.li.bac.te'ri.um. L. masc. n. gallus chicken; N.L. neut. n. bacterium rod; N.L. neut. n. Gallibacterium bacterium of chicken. Proteobacteria / Gammaproteobacteria / Pasteurella...
Expression and polymorphism of genes in gallstones
Abstract: To explore, through a clinical case-control study, the expression and genetic polymorphism of the KLF14 gene (rs4731702 and rs972283) and the SR-B1 gene (rs...
