Effect of de novo transcriptome assembly on transcript quantification
Abstract
Background
Accurate quantification of transcript expression is essential for understanding the functional products of the genome under different physiological conditions and at different developmental stages. The development of high-throughput RNA sequencing (RNA-Seq) allows researchers to perform transcriptome analysis for organisms that lack a reference genome and transcriptome. For such projects, de novo transcriptome assembly must be carried out prior to quantification. However, the large number of erroneous contigs produced by assemblers might result in unreliable estimates of transcript abundance. This study therefore investigates how assembly quality affects the performance of quantification in RNA-Seq analysis based on de novo transcriptome assembly.
Results
We examined several factors that might seriously affect the accuracy of RNA-Seq analysis. First, we found that assemblers perform comparatively well for transcriptomes with lower biological complexity. Second, we examined over-extended and incomplete contigs and demonstrated that assembly completeness has a strong impact on the estimation of contig abundance. Lastly, we investigated the behavior of quantifiers with respect to sequence ambiguity, which might be originally present in the transcriptome or accidentally introduced by assemblers. The results suggest that quantifiers often over-estimate the expression of family-collapse contigs and under-estimate the expression of duplicated contigs. For organisms without a reference transcriptome, it remains challenging to detect inaccurate abundance estimates for family-collapse contigs. In contrast, under-estimation of duplicated contigs can be flagged by analyzing the read distribution of the duplicated contigs.
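The two effects of sequence ambiguity can be illustrated with a toy calculation. This is only a sketch under a strong simplifying assumption: each read is split uniformly across the contigs it aligns to, whereas real quantifiers use probabilistic models. The contig names, read counts, and the `count_reads` helper are hypothetical and not taken from the study.

```python
from collections import Counter

def count_reads(read_hits):
    """Split each read uniformly across the contigs it aligns to."""
    counts = Counter()
    for hits in read_hits:
        share = 1.0 / len(hits)
        for contig in hits:
            counts[contig] += share
    return dict(counts)

# Duplicated contigs: 100 reads from one transcript, but the assembly
# contains two identical copies, so every read aligns to both.
duplicated = count_reads([("T1_copy_a", "T1_copy_b")] * 100)

# Family collapse: two paralogs with 60 and 40 reads were merged into
# a single contig, so all reads align to that one contig.
collapsed = count_reads([("family_contig",)] * 60 + [("family_contig",)] * 40)

print(duplicated)  # each copy receives half of the true count
print(collapsed)   # the merged contig receives the sum of both paralogs
```

In this toy setting each duplicated copy is credited with half of the true read count, while the collapsed contig is credited with more reads than either paralog actually produced.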
Conclusions
In summary, we characterized the behavior of quantifiers when erroneous contigs are present and outlined the potential problems that assemblers might cause for downstream RNA-Seq analysis. We anticipate that the analyses conducted in this study provide valuable insights for the future development of transcriptome assembly and quantification.
Availability
We propose QuantEval, an open-source Python-based package that builds connected components of the assembled contigs based on sequence similarity and evaluates the quantification results for each connected component. The package can be downloaded from https://github.com/dn070017/QuantEval.
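The idea of grouping contigs into connected components can be sketched as follows. This is a minimal illustration, not QuantEval's actual interface: the function names, the input format (a list of contig pairs already judged similar), and the abundance values are all assumptions made for the example.

```python
from collections import defaultdict

def connected_components(contigs, similar_pairs):
    """Group contigs linked by sequence similarity using union-find."""
    parent = {c: c for c in contigs}

    def find(x):
        # Follow parent links to the root, compressing the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for c in contigs:
        groups[find(c)].append(c)
    return sorted(sorted(g) for g in groups.values())

def component_abundance(components, abundance):
    """Sum the estimated abundance over the contigs of each component."""
    return [sum(abundance[c] for c in comp) for comp in components]

# Hypothetical input: c1-c2 and c2-c3 are similar; c4 is unrelated.
contigs = ["c1", "c2", "c3", "c4"]
pairs = [("c1", "c2"), ("c2", "c3")]
abundance = {"c1": 10.0, "c2": 5.0, "c3": 2.5, "c4": 7.0}

components = connected_components(contigs, pairs)
totals = component_abundance(components, abundance)
print(components, totals)
```

Evaluating abundance per component rather than per contig means that reads redistributed among similar contigs do not change the component total, which is one plausible motivation for this grouping.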

