Generating Synthetic Single Cell Data from Bulk RNA-seq Using a Pretrained Variational Autoencoder
Abstract
Single cell RNA sequencing (scRNA-seq) is a powerful approach that generates genome-wide gene expression profiles at single cell resolution. Among its many applications, it enables determination of the transcriptional states of distinct cell types in complex tissues, thereby allowing the precise cell type and set of genes driving a disease to be identified. However, scRNA-seq remains costly, and even the most extensive human disease studies generate very few samples. In sharp contrast, there is a wealth of publicly available bulk RNA-seq data, in which single cell and cell type information are effectively averaged. To further leverage this wealth of RNA-seq data, methods have been developed that infer the fractions of cell types in bulk RNA-seq data, using single cell data to train the models. Additionally, generative AI models have been developed to generate additional cells resembling an existing scRNA-seq dataset. In this study, we develop a framework that combines generative AI approaches with existing scRNA-seq data to generate representative scRNA-seq data from bulk RNA-seq. Our bulk to single cell variational autoencoder-based model, termed bulk2sc, is trained to deconvolve pseudo-bulk RNA-seq datasets back into their constituent single-cell transcriptomes by learning the specific distributions and proportions related to each cell type. We assess the performance of bulk2sc by comparing synthetically generated scRNA-seq data to actual scRNA-seq data. Application of bulk2sc to large-scale bulk RNA-seq human disease datasets could yield single cell level insights into disease processes and suggest targeted scRNA-seq experiments.
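The abstract refers to pseudo-bulk RNA-seq and to per-cell-type proportions without saying how either is computed. As an illustration only, the sketch below shows one common way to form a pseudo-bulk profile (summing counts over all cells) and to compute cell-type proportions from annotated cells. The function name `make_pseudo_bulk`, the toy count matrix, and the cell-type labels are hypothetical, and the summation rule is an assumption; none of this is taken from the bulk2sc implementation.

```python
import numpy as np


def make_pseudo_bulk(counts, cell_types):
    """Collapse a cells-by-genes count matrix into one pseudo-bulk profile.

    Assumption: pseudo-bulk is the per-gene sum over all cells. This is a
    common convention, not necessarily the one used by bulk2sc.
    """
    counts = np.asarray(counts)
    cell_types = np.asarray(cell_types)
    if counts.shape[0] != cell_types.shape[0]:
        raise ValueError("need exactly one cell-type label per cell (row)")

    # Per-gene total across cells: the single-cell identity is lost here,
    # which is what a deconvolution model would have to recover.
    pseudo_bulk = counts.sum(axis=0)

    # Fraction of cells belonging to each annotated cell type.
    labels, n_cells = np.unique(cell_types, return_counts=True)
    total = n_cells.sum()
    proportions = {str(label): float(n / total) for label, n in zip(labels, n_cells)}
    return pseudo_bulk, proportions


# Hypothetical toy data: 4 cells (rows) by 3 genes (columns).
counts = np.array([
    [5, 0, 2],
    [3, 1, 0],
    [0, 4, 4],
    [1, 1, 1],
])
cell_types = ["T cell", "T cell", "B cell", "monocyte"]

pseudo_bulk, proportions = make_pseudo_bulk(counts, cell_types)
print(pseudo_bulk)
print(proportions)
```

Pairs of this kind (a pseudo-bulk vector together with the cells and proportions it came from) are the sort of training signal the abstract describes, since the true single-cell composition of each pseudo-bulk sample is known by construction.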
Related Results
MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing v1
Human tissues comprise trillions of cells that populate a complex space of molecular phenotypes and functions and that vary in abundance by 4–9 orders of magnitude. Relying solely ...
Abstract P1-05-23: Utilities and challenges of RNA-Seq based expression and variant calling in a clinical setting
Abstract
Introduction
Variant calling based on DNA samples has been the gold standard of clinical testing since the advent of Sanger sequencing. The u...
MuSiC2: cell type deconvolution for multi-condition bulk RNA-seq data
Abstract
Cell type composition of intact bulk tissues can vary across samples. Deciphering cell type composition and its changes during disease progression is an important step towa...
Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data
Abstract
Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA-seq data, which helps to decipher single-cell heterogeneity and ...
Detection of Multiple Types of Cancer Driver Mutations Using Targeted RNA Sequencing in NSCLC
Abstract
Currently, DNA and RNA are used separately to capture different types of gene mutations. DNA is commonly used for the detection of SNVs, indels and CNVs; RNA is used for an...
Abstract 2708: Toward improved cancer classification using PCA + tSNE dimensionality reduction on bulk RNA-seq data
Abstract
Intro: Minor variations in cancer type can have a major impact on therapeutic effectiveness and on the course of drug research and development. In order to ...
Global Prediction of Chromatin Accessibility Using RNA-seq from Small Number of Cells
Abstract
Conventional high-throughput technologies for mapping regulatory element activities such as ChIP-seq, DNase-seq and FAIRE-seq cannot analyze samples with small number of ce...
Autoencoder-based cluster ensembles for single-cell RNA-seq data analysis
Abstract
Background
Single-cell RNA-sequencing (scRNA-seq) is a transformative technology, allowing global transcriptomes of individual cells to be profiled with high accuracy. An es...


