
Uncovering the consequences of batch effect associated missing values in omics data analysis

Abstract: Statistical analyses of high-dimensional omics data are often hampered by batch effects (BEs) and missing values (MVs), but the interaction between these two issues is neither well studied nor well understood. MVs may manifest as a BE when their proportions differ across batches; such MVs are termed Batch-Effect Associated Missing values (BEAMs). We hypothesized that BEAMs may introduce bias that impedes the performance of missing value imputation (MVI). To test this, we simulated data with two batches and then, over 100 iterations, introduced either 20% and 40% MVs in the two batches respectively (BEAMs) or 30% in both (control). K-nearest neighbours (KNN) was then used to perform MVI, in a typical global approach (M1) and a supposedly superior batch-sensitized approach (M2). BEs were then corrected using ComBat. The effectiveness of the MVI was evaluated by its imputation accuracy and by true and false positive rates. Notably, when BEAMs existed, M2 was generally undesirable, as the differing application of MV filtering in the M1 and M2 strategies resulted in an overall coverage deficiency. Additionally, both M1 and M2 suffered in the presence of BEAMs, highlighting the need for a novel approach to MVI in data with BEAMs.

Author summary: High-throughput omics data are often combined from different sources (batches), which creates batch effects in the data. Missing values are common in these data, and their proportions are usually assumed to be equal across batches. However, these proportions can vary between batches, for example when one batch has more missing values than another, resulting in batch effect associated missing values. Missing values are often handled through missing value imputation, but whether variation in missing value proportions across batches affects imputation outcomes is unknown. In this paper, we investigate the consequences of performing imputation when this issue is present. We simulated data with equal and unequal missing value proportions, then assessed the performance of k-nearest neighbours imputation by its imputation accuracy and downstream analysis outcomes. This revealed that unequal missing value proportions worsen imputation, establishing the need for smarter imputation strategies to handle this complication.
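The simulation design described in the abstract can be illustrated with a small sketch. This is not the authors' code: the data generator, the choice of samples as neighbours, the parameter values, and every function name (`simulate`, `mask_values`, `knn_impute`, `rmse`) are assumptions made for illustration. It runs a single iteration rather than 100, removes values completely at random, and leaves out MV filtering and the ComBat correction step.

```python
import math
import random


def simulate(n_features=60, n_per_batch=8, batch_shift=2.0, seed=0):
    """Two batches of samples; batch 1 carries an additive shift (the batch effect)."""
    rng = random.Random(seed)
    batches = [0] * n_per_batch + [1] * n_per_batch
    data = []
    for _ in range(n_features):
        base = rng.gauss(10.0, 2.0)
        data.append([base + batch_shift * b + rng.gauss(0.0, 0.5) for b in batches])
    return data, batches


def mask_values(data, batches, rates, seed=1):
    """Replace each value with None with the probability assigned to its batch."""
    rng = random.Random(seed)
    return [[None if rng.random() < rates[b] else v for v, b in zip(row, batches)]
            for row in data]


def knn_impute(masked, groups, k=3):
    """Fill each gap with the mean of the k nearest samples in the same group.

    groups = [all columns] gives a global run (M1-like);
    groups = one column list per batch gives a batch-wise run (M2-like).
    """
    n_feat = len(masked)

    def dist(a, b):
        # Root mean squared difference over features observed in both samples.
        sq = [(masked[f][a] - masked[f][b]) ** 2 for f in range(n_feat)
              if masked[f][a] is not None and masked[f][b] is not None]
        return math.sqrt(sum(sq) / len(sq)) if sq else math.inf

    filled = [row[:] for row in masked]
    for cols in groups:
        for s in cols:
            ranked = sorted((c for c in cols if c != s), key=lambda c: dist(s, c))
            for f in range(n_feat):
                if masked[f][s] is None:
                    donors = [masked[f][c] for c in ranked
                              if masked[f][c] is not None][:k]
                    if donors:  # stays None if no sample in the group observed f
                        filled[f][s] = sum(donors) / len(donors)
    return filled


def rmse(truth, masked, filled):
    """Imputation error over the entries that were masked and then filled."""
    errs = [(truth[f][s] - filled[f][s]) ** 2
            for f in range(len(truth)) for s in range(len(truth[0]))
            if masked[f][s] is None and filled[f][s] is not None]
    return math.sqrt(sum(errs) / len(errs))


truth, batches = simulate()
all_cols = list(range(len(batches)))
by_batch = [[s for s in all_cols if batches[s] == b] for b in (0, 1)]
scenarios = {"BEAMs (20%/40%)": (0.2, 0.4), "control (30%/30%)": (0.3, 0.3)}

results = {}
for name, rates in scenarios.items():
    masked = mask_values(truth, batches, rates)
    m1 = knn_impute(masked, [all_cols])   # global
    m2 = knn_impute(masked, by_batch)     # batch-wise
    results[name] = (rmse(truth, masked, m1), rmse(truth, masked, m2))
    print(f"{name}: M1 RMSE={results[name][0]:.3f}, M2 RMSE={results[name][1]:.3f}")
```

A single run like this says little on its own. To approximate the study design, the masking seed would be varied over many iterations and the errors averaged, with batch correction and differential analysis added downstream to obtain true and false positive rates.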
Cold Spring Harbor Laboratory

Related Results

The importance of batch sensitization in missing value imputation
Abstract: Data analysis is complex due to a myriad of technical problems. Amongst these, missing values and batch effects are endemic. Although many methods have been developed for m...
Benchmarking multi-omics integrative clustering methods for subtype identification in colorectal cancer
Abstract (Background and objectives): Colorectal cancer (CRC) is a heterogeneous malignancy with a concerning global burden of incidence and mortality. The tradition...
Exploring the classification of cancer cell lines from multiple omic views
Background: Cancer classification is of great importance to understanding its pathogenesis, making diagnosis and developing treatment. The accumulation of extensive o...
Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis
Abstract (Background): The COVID-19 pandemic has revealed significant challenges in disease forecasting and in developing a public health response, ...
Long-range superharmonic Josephson current and spin-triplet pairing correlations in a junction with ferromagnetic bilayers
Abstract: The long-range spin-triplet supercurrent transport is an interesting phenomenon in the superconductor/ferromagnet heterostructure containing noncolline...
Muon: multimodal omics analysis framework
Abstract: Advances in multi-omics technologies have led to an explosion of multimodal datasets to address questions ranging from basic biology to translation. While these rich data p...
A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis
Background: Metabolomics data generation and quantification are different from other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based (gas chromatogra...
