Effects of mapping algorithms on gene selection for RNA-Seq analysis: pulmonary response to acute neonatal hyperoxia

Background: A major goal of RNA-Seq data analysis is to reconstruct the full set of gene transcripts expressed in a biological sample in order to quantify their expression levels.
The process typically involves multiple steps including mapping short sequence reads to a reference genome, and estimating expression levels based on these mappings.
Multiple algorithms and approaches for each processing step exist, and the impact of different methods on estimation of gene expression is not entirely clear.
Methods: We evaluated the impact of three common mapping algorithms on differential expression analysis in an RNA-Seq dataset describing the lung response to acute neonatal hyperoxia.
RNA-Seq data generated using the Illumina platform were mapped and aligned using CASAVA, TopHat, and SHRiMP against the mouse genome.
Significance Analysis of Microarrays and Cuffdiff were used to identify differentially expressed genes between hyperoxia-challenged and age-matched control mice.
Results: 1403 genes were detected as differentially expressed by at least one mapping and gene selection method.
A majority of genes (>65%) were identified by all three mapping methods, regardless of the gene selection approach.
Expression patterns for 52 genes were examined by quantitative polymerase chain reaction (qPCR).
Importantly, we found different validation rates for genes selected by each method: 72% for CASAVA, 69% for TopHat, and 63% for SHRiMP.
Surprisingly, the validation rate for genes selected by all three mapping methods was no greater than that of the best single method.
Conclusion: The choice of mapping strategy impacts the reliability of gene selection for RNA-Seq data analysis.
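
The two quantities reported in the Results (agreement between mapping methods and per-method qPCR validation rate) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function names, the gene labels `g1`..`g6`, and the qPCR outcomes are hypothetical placeholders, and the choice of the union of all selected genes as the denominator for the overlap is an assumption, since the abstract does not state how the >65% figure was computed.

```python
from typing import Dict, Set


def overlap_fraction(selected: Dict[str, Set[str]]) -> float:
    """Fraction of all selected genes (union) that every method selected."""
    sets = list(selected.values())
    if not sets:
        return 0.0
    union = set().union(*sets)
    common = set.intersection(*sets)
    return len(common) / len(union) if union else 0.0


def validation_rate(genes: Set[str], qpcr_confirmed: Dict[str, bool]) -> float:
    """Share of qPCR-tested genes in `genes` that qPCR confirmed."""
    tested = [g for g in genes if g in qpcr_confirmed]
    if not tested:
        return float("nan")  # no tested genes, so the rate is undefined
    return sum(qpcr_confirmed[g] for g in tested) / len(tested)


# Hypothetical toy input, NOT data from the study.
selected = {
    "CASAVA": {"g1", "g2", "g3", "g4"},
    "TopHat": {"g1", "g2", "g3", "g5"},
    "SHRiMP": {"g1", "g2", "g6"},
}
qpcr_confirmed = {"g1": True, "g2": False, "g3": True, "g6": False}

shared_by_all = set.intersection(*selected.values())
rates = {name: validation_rate(genes, qpcr_confirmed)
         for name, genes in selected.items()}
rate_shared = validation_rate(shared_by_all, qpcr_confirmed)
```

Comparing `rate_shared` with the largest value in `rates` mirrors the comparison made in the abstract between genes selected by all three mapping methods and the best single method.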

Related Results

MARS-seq2.0: an experimental and analytical pipeline for indexed sorting combined with single-cell RNA sequencing v1
Human tissues comprise trillions of cells that populate a complex space of molecular phenotypes and functions and that vary in abundance by 4–9 orders of magnitude. Relying solely ...
Hyperoxia-Induced ΔR1
Background and Purpose: Acceleration of longitudinal relaxation under hyperoxic challenge (ie, hyperoxia-induced ΔR1) indic...
Abstract P1-05-23: Utilities and challenges of RNA-Seq based expression and variant calling in a clinical setting
Abstract. Introduction: Variant calling based on DNA samples has been the gold standard of clinical testing since the advent of Sanger sequencing. The u...
Detection of Multiple Types of Cancer Driver Mutations Using Targeted RNA Sequencing in NSCLC
Abstract. Currently, DNA and RNA are used separately to capture different types of gene mutations. DNA is commonly used for the detection of SNVs, indels and CNVs; RNA is used for an...
Benchmarking Algorithms for Gene Set Scoring of Single-cell ATAC-seq Data
Abstract. Gene set scoring (GSS) has been routinely conducted for gene expression analysis of bulk or single-cell RNA-seq data, which helps to decipher single-cell heterogeneity and ...
Abstract 2323: Deciphering RNA degradation: Insights from a comparative analysis of paired fresh frozen/FFPE total RNA-seq
Abstract. Background: Fresh frozen (FF) and formalin-fixed paraffin-embedded (FFPE) samples are primary resources for archival tissues in cancer studies. Despite the ...
Generating Synthetic Single Cell Data from Bulk RNA-seq Using a Pretrained Variational Autoencoder
Abstract. Single cell RNA sequencing (scRNA-seq) is a powerful approach which generates genome-wide gene expression profiles at single cell resolution. Among its many applications, i...