Effect of de novo transcriptome assembly on transcript quantification

Abstract

Background: Correct quantification of transcript expression is essential for understanding the functional products of the genome in different physiological conditions and developmental stages. The development of high-throughput RNA sequencing (RNA-Seq) has allowed researchers to perform transcriptome analysis for organisms that lack a reference genome and transcriptome. For such projects, de novo transcriptome assembly must be carried out prior to quantification. However, the large number of erroneous contigs produced by assemblers might lead to unreliable estimates of transcript abundance. This study therefore investigates how assembly quality affects the performance of quantification in RNA-Seq analysis based on de novo transcriptome assembly.

Results: We examined several factors that might seriously affect the accuracy of RNA-Seq analysis. First, we found that assemblers perform comparatively well on transcriptomes with lower biological complexity. Second, we examined over-extended and incomplete contigs and demonstrated that assembly completeness has a strong impact on the estimation of contig abundance. Lastly, we investigated the behavior of quantifiers with respect to sequence ambiguity, which might be originally present in the transcriptome or accidentally introduced by assemblers. The results suggest that quantifiers often over-estimate the expression of family-collapse contigs and under-estimate the expression of duplicated contigs. For organisms without a reference transcriptome, it remains challenging to detect inaccurate abundance estimates on family-collapse contigs. In contrast, we observed that under-estimation on duplicated contigs can be flagged by analyzing the read distribution of the duplicated contigs.

Conclusions: In summary, we explicated the behavior of quantifiers when erroneous contigs are present and outlined the potential problems that assemblers might cause for the downstream analysis of RNA-Seq. We anticipate that the analyses conducted in this study will provide valuable insights for the future development of transcriptome assembly and quantification.

Availability: We propose QuantEval, an open-source Python-based package that builds connected components for the assembled contigs based on sequence similarity and evaluates the quantification results for each connected component. The package can be downloaded from https://github.com/dn070017/QuantEval.
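The idea of grouping contigs into connected components by sequence similarity, then evaluating quantification per component, can be illustrated with a minimal sketch. This is not QuantEval's actual code or interface: the function names, the input format (a list of contig names plus a list of contig pairs already judged similar by some upstream alignment step), and the toy abundance values are all assumptions made for illustration. The grouping uses a plain union-find structure, which is one common way to compute connected components and may differ from what the package does internally.

```python
from collections import defaultdict


def connected_components(contigs, similar_pairs):
    """Group contigs into connected components with union-find.

    contigs: iterable of contig names.
    similar_pairs: iterable of (a, b) tuples judged similar by an
    upstream alignment step (hypothetical input, not QuantEval's format).
    Every name in similar_pairs must also appear in contigs.
    """
    parent = {c: c for c in contigs}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in similar_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for c in parent:
        groups[find(c)].append(c)
    # Sort for a deterministic result.
    return sorted(sorted(g) for g in groups.values())


def abundance_per_component(components, abundance):
    """Sum an abundance estimate (for example TPM) over each component."""
    return [sum(abundance[c] for c in comp) for comp in components]


# Toy, made-up data for illustration only.
contigs = ["c1", "c2", "c3", "c4", "c5"]
similar_pairs = [("c1", "c2"), ("c2", "c3"), ("c4", "c5")]
estimated = {"c1": 10.0, "c2": 5.0, "c3": 0.0, "c4": 20.0, "c5": 1.0}

components = connected_components(contigs, similar_pairs)
totals = abundance_per_component(components, estimated)
for comp, total in zip(components, totals):
    print(comp, total)
```

Summing within a component is one plausible design choice here: when reads are split among duplicated contigs, the per-contig estimates can each look too low while the component total is less affected, so comparing at the component level may be more robust than comparing contig by contig.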

openRxiv

Ping-Han Hsieh, Yen-Jen Oyang, Chien-Yu Chen

2018
