Usefulness of scRNA-seq data in predicting plant metabolic pathway genes
Making genome-wide predictions of plant metabolic pathway genes (MPGs), which encode enzymes that catalyze the biosynthesis of plant natural products, remains a challenging task. Here, starting from 1,129 benchmark MPGs with experimental evidence in Arabidopsis thaliana, we investigate the usefulness of single-cell RNA sequencing (scRNA-seq) data, a recently emerged type of omics data already used in several other fields, for predicting MPGs with five machine learning (ML) algorithms that support multi-label tasks. Compared with traditional bulk RNA-seq data, scRNA-seq data yield different but comparable co-expression networks among MPGs within metabolic classes, and significantly higher accuracy in assigning MPGs to classes. Prediction accuracy for individual metabolic classes is not associated with co-expression network tightness but is correlated with the number of MPGs in each class, suggesting that including more benchmark genes in the future will improve MPG prediction. Splitting the RNA-seq data into subsets by genetic background/condition or by tissue can improve co-expression network tightness and MPG prediction accuracy for some classes; when corresponding subsets are used, scRNA-seq-based models still outperform bulk RNA-seq-based models for most classes. In addition, deep learning approaches outperform classical ML approaches, whereas approaches implemented in the ensemble workflow AutoGluon tend to overfit severely, potentially because benchmark MPGs are relatively scarce within classes. Our results demonstrate the superiority of scRNA-seq data over bulk RNA-seq data for predicting the metabolic classes of MPGs, and suggest that future efforts to identify plant MPGs should include scRNA-seq data.
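The abstract states that all five ML algorithms support multi-label tasks, so one MPG can, in principle, be assigned to more than one metabolic class. A minimal sketch of that framing follows, assuming a binary-relevance encoding (one 0/1 column per class) and per-class F1 as the score. The abstract does not name its accuracy metric, and the gene IDs, class names, and predictions here are hypothetical.

```python
# Sketch only: binary-relevance view of multi-label MPG-to-class prediction.
# Gene IDs, class names, and predictions are hypothetical toy values.

def to_label_matrix(assignments, classes):
    """Encode each gene's set of classes as a 0/1 row (one column per class)."""
    return {gene: [1 if c in labels else 0 for c in classes]
            for gene, labels in assignments.items()}


def per_class_f1(true_matrix, pred_matrix, classes):
    """Score each class as its own binary task and return its F1."""
    scores = {}
    for j, c in enumerate(classes):
        tp = fp = fn = 0
        for gene, truth in true_matrix.items():
            t, p = truth[j], pred_matrix[gene][j]
            if t and p:
                tp += 1      # gene correctly placed in class c
            elif p:
                fp += 1      # predicted in c, but not annotated to c
            elif t:
                fn += 1      # annotated to c, but missed
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


classes = ["A", "B", "C"]                      # hypothetical metabolic classes
truth = {"gene1": {"A"}, "gene2": {"A", "B"}, "gene3": {"C"}}
pred = {"gene1": {"A"}, "gene2": {"B"}, "gene3": {"A", "C"}}

scores = per_class_f1(to_label_matrix(truth, classes),
                      to_label_matrix(pred, classes), classes)
print(scores)  # {'A': 0.5, 'B': 1.0, 'C': 1.0}
```

Binary relevance treats each class independently; a method that models dependencies between classes would need a different encoding.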
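The abstract compares co-expression network "tightness" between bulk and scRNA-seq data without defining the measure here. One plausible proxy, assumed for this sketch, is the mean pairwise Pearson correlation among the expression profiles of the MPGs in one class. Treating scRNA-seq profiles as vectors over cells or cell clusters is also an assumption, and the gene names and values are toy data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("constant profile: correlation undefined")
    return cov / (sx * sy)


def class_tightness(expression, genes):
    """Assumed tightness proxy: mean pairwise Pearson r within one class."""
    pairs = list(combinations(genes, 2))
    return sum(pearson(expression[g], expression[h]) for g, h in pairs) / len(pairs)


# Toy profiles over four samples (cells, clusters, or bulk libraries).
expression = {
    "g1": [1.0, 2.0, 3.0, 4.0],
    "g2": [2.0, 4.0, 6.0, 8.0],
    "g3": [4.0, 3.0, 2.0, 1.0],
}
tight = class_tightness(expression, ["g1", "g2"])        # close to 1.0
loose = class_tightness(expression, ["g1", "g2", "g3"])  # close to -1/3
print(round(tight, 3), round(loose, 3))
```

Computing the same proxy on bulk and single-cell matrices for the same gene set gives one number per data type and class, which is the kind of comparison the abstract describes.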
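The reported link between per-class prediction accuracy and the number of MPGs per class can be checked with a rank correlation. The sketch below uses Spearman's rho with average ranks for ties; choosing Spearman over another statistic is an assumption, and the class sizes and scores are made-up numbers, not results from the study.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sqrt(sum((a - mx) ** 2 for a in x)) * sqrt(sum((b - my) ** 2 for b in y)))


def average_ranks(values):
    """Rank values 1..n; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # 0-based span -> 1-based mean rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho = Pearson correlation of the rank vectors."""
    return pearson(average_ranks(x), average_ranks(y))


# Made-up numbers for five hypothetical classes (not results from the study).
class_sizes = [12, 45, 30, 80, 20]
class_scores = [0.41, 0.66, 0.70, 0.79, 0.47]
rho = spearman(class_sizes, class_scores)
print(round(rho, 2))  # 0.9
```

A rank correlation only assumes a monotone relationship, which suits a small number of classes whose sizes may be highly skewed.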