
How to normalize metatranscriptomic count data for differential expression analysis

ABSTRACT

BACKGROUND Differential expression analysis based on RNA-Seq count data has become a standard tool in transcriptomics. Several studies have shown that normalizing the data beforehand is crucial for reliable detection of transcriptional differences. It is still unclear whether and how the transcriptomic approach can be used for differential expression analysis in metatranscriptomics. The potential side effects of applying transcriptomic tools directly to metatranscriptomic count data have not yet been studied.

METHODS We propose a model for differential expression in metatranscriptomics that explicitly accounts for variations in the taxonomic composition of transcripts across samples. As a main consequence, correct normalization of metatranscriptomic count data requires taxonomic separation of the data into organism-specific bins. Taxon-specific scaling of the organism profiles then yields a valid normalization and allows the scaled profiles to be recombined into a metatranscriptomic count matrix. This matrix can then be analyzed with statistical tools for transcriptomic count data. We provide a simple R script for taxon-specific scaling and recombination of the scaled counts.

RESULTS When transcriptomic tools for differential expression analysis are applied directly to metatranscriptomic data, the organism-independent (global) scaling of counts carries a high risk of falsely predicted functional differences. In simulation studies we show that incorrect normalization not only tends to lose significant differences but, more importantly, can produce a large number of false positives. In contrast, taxon-specific scaling can equalize the variation of relative library sizes across organisms and therefore detects significant differences reliably in all simulations. On real metatranscriptomic data, the results of taxon-specific and global scaling can differ substantially. In our study, global scaling yields a high number of extra predictions that are not supported by single-transcriptome analyses. Inspection of the scaling error suggests that these extra predictions may in fact be artifacts of incorrect normalization.

CONCLUSIONS As in transcriptomics, proper normalization of count data is essential for differential expression analysis in metatranscriptomics. Our model implies a taxon-specific scaling of counts for normalization of the data. Applying taxon-specific scaling removes taxonomic composition variations from functional profiles and therefore effectively prevents false predictions caused by incorrect normalization.
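The difference between global and taxon-specific scaling described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' R script. It assumes simple total-count scaling (each sample is rescaled to the mean library size), which stands in for whatever scaling method the authors actually use, and all function names and the toy counts are hypothetical. The example contains two taxa whose expression profiles are identical in both samples; only their relative abundance changes.

```python
from typing import Dict, List

# A count matrix: rows are functional features, columns are samples.
Matrix = List[List[float]]


def scale_to_mean_library_size(counts: Matrix) -> Matrix:
    """Total-count scaling (an assumption of this sketch): rescale every
    sample column so that its sum equals the mean library size."""
    n_samples = len(counts[0])
    sizes = [sum(row[j] for row in counts) for j in range(n_samples)]
    target = sum(sizes) / n_samples
    factors = [target / s if s > 0 else 0.0 for s in sizes]
    return [[row[j] * factors[j] for j in range(n_samples)] for row in counts]


def add_matrices(matrices: List[Matrix]) -> Matrix:
    """Element-wise sum of equally shaped count matrices."""
    total = [[0.0] * len(matrices[0][0]) for _ in matrices[0]]
    for m in matrices:
        for i, row in enumerate(m):
            for j, value in enumerate(row):
                total[i][j] += value
    return total


def global_scaling(bins: Dict[str, Matrix]) -> Matrix:
    """Pool all taxa first, then scale the pooled matrix once."""
    return scale_to_mean_library_size(add_matrices(list(bins.values())))


def taxon_specific_scaling(bins: Dict[str, Matrix]) -> Matrix:
    """Scale each organism-specific bin separately, then recombine."""
    return add_matrices([scale_to_mean_library_size(m) for m in bins.values()])


# Toy data (hypothetical): two features, two samples.
# taxon_a always expresses the features 90:10, taxon_b always 20:80.
# Only the library size of each taxon differs between the samples.
bins = {
    "taxon_a": [[90.0, 270.0], [10.0, 30.0]],
    "taxon_b": [[80.0, 20.0], [320.0, 80.0]],
}

global_counts = global_scaling(bins)
taxon_counts = taxon_specific_scaling(bins)

for name, matrix in (("global", global_counts), ("taxon-specific", taxon_counts)):
    for feature, row in zip(("feature_1", "feature_2"), matrix):
        print(name, feature, [round(v, 2) for v in row])
```

With global scaling, the pooled counts for each feature differ strongly between the two samples even though no organism changed its expression, which is the kind of spurious difference the abstract warns about. With taxon-specific scaling, the recombined counts are the same in both samples, because the variation in relative library sizes has been removed before pooling.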

openRxiv

Heiner Klingenberg Peter Meinicke

2017
