
Uncovering the consequences of batch effect associated missing values in omics data analysis

Abstract

Statistical analyses of high-dimensional omics data are often hampered by batch effects (BEs) and missing values (MVs), but the interaction between these two issues is neither well studied nor well understood. MVs may manifest as a BE when their proportions differ across batches; we term these Batch-Effect Associated Missing values (BEAMs). We hypothesized that BEAMs may introduce bias that impedes the performance of missing value imputation (MVI). To test this, we simulated data with two batches and, over 100 iterations, introduced either 20% MVs into one batch and 40% into the other (BEAMs) or 30% into both (control). MVI was then performed with k-nearest neighbours (KNN), using either a typical global approach (M1) or a supposedly superior batch-sensitized approach (M2). BEs were then corrected using ComBat. The effectiveness of MVI was evaluated by imputation accuracy and by true and false positive rates. Notably, when BEAMs were present, M2 was generally undesirable, because the differing application of MV filtering in the M1 and M2 strategies resulted in an overall coverage deficiency. Moreover, both M1 and M2 performed worse in the presence of BEAMs, highlighting the need for a novel approach to MVI in data with BEAMs.

Author summary

High-throughput omics data are often combined from different sources (batches), which introduces batch effects into the data. Missing values are common in these data, and their proportions are assumed to be equal across batches. However, these proportions sometimes vary between batches (for example, one batch has more missing values than another), resulting in batch effect associated missing values. Missing values are often handled by missing value imputation, but whether variation in missing value proportions across batches affects imputation outcomes is unknown. In this paper, we investigate the consequences of performing imputation when this issue is present. We simulated data with equal and unequal missing value proportions, then assessed the performance of k-nearest neighbours imputation by its imputation accuracy and downstream analysis outcomes. This revealed that unequal missing value proportions worsen imputation, establishing the need for smarter imputation strategies to handle this complication.
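The simulation design described above (two batches, per-batch missing value rates, global versus batch-sensitized KNN imputation, accuracy scoring) can be sketched roughly as follows. This is a minimal illustration under our own assumptions, not the authors' code: values are removed completely at random, M1 is taken to mean KNN imputation over the pooled samples and M2 KNN imputation run separately within each batch, the KNN imputer is a small hand-written NumPy version, and the ComBat correction, MV filtering and true/false positive rate evaluation are omitted. All function names (`simulate`, `add_missing`, `knn_impute`, `impute_m1`, `impute_m2`, `rmse`) are hypothetical.

```python
import numpy as np

def simulate(n_per_batch=20, n_features=100, batch_shift=2.0, seed=0):
    """Hypothetical two-batch data (samples x features) with an additive batch effect."""
    rng = np.random.default_rng(seed)
    n = 2 * n_per_batch
    mu = rng.normal(0, 1, n_features)                # feature baselines
    z = rng.normal(0, 1, (n, 1))                     # one latent sample factor
    w = rng.normal(0, 1, (1, n_features))            # feature loadings
    X = mu + z @ w + rng.normal(0, 0.5, (n, n_features))
    batch = np.repeat([0, 1], n_per_batch)
    X[batch == 1] += rng.normal(batch_shift, 0.5, n_features)  # batch effect
    return X, batch

def add_missing(X, batch, rates, rng):
    """Mask entries completely at random with a separate rate per batch (assumption: MCAR)."""
    mask = np.zeros(X.shape, dtype=bool)
    for b, rate in enumerate(rates):
        rows = batch == b
        mask[rows] = rng.random((rows.sum(), X.shape[1])) < rate
    Xm = X.copy()
    Xm[mask] = np.nan
    return Xm, mask

def knn_impute(X, k=5):
    """Fill each NaN with the mean of the k nearest samples that observe that feature.

    Distance = RMS difference over features observed in both samples.
    Entries with no possible donor are left as NaN.
    """
    X = np.asarray(X, dtype=float)
    out = X.copy()
    obs = ~np.isnan(X)
    n = X.shape[0]
    D = np.full((n, n), np.inf)
    for i in range(n):
        for j in range(n):
            shared = obs[i] & obs[j]
            if i != j and shared.any():
                D[i, j] = np.sqrt(np.mean((X[i, shared] - X[j, shared]) ** 2))
    for i, f in zip(*np.where(~obs)):
        donors = np.where(obs[:, f])[0]
        if donors.size == 0:
            continue  # feature never observed in this pool: cannot impute
        nearest = donors[np.argsort(D[i, donors])[:k]]
        out[i, f] = X[nearest, f].mean()
    return out

def impute_m1(Xm, batch, k=5):
    """Global strategy: impute using all samples regardless of batch."""
    return knn_impute(Xm, k)

def impute_m2(Xm, batch, k=5):
    """Batch-sensitized strategy: impute each batch using only its own samples."""
    out = Xm.copy()
    for b in np.unique(batch):
        rows = batch == b
        out[rows] = knn_impute(Xm[rows], k)
    return out

def rmse(X_true, X_imp, mask):
    """Imputation error over masked entries that were actually imputed."""
    ok = mask & ~np.isnan(X_imp)
    return float(np.sqrt(np.mean((X_true[ok] - X_imp[ok]) ** 2)))

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X, batch = simulate(seed=1)
    for label, rates in [("control", (0.3, 0.3)), ("BEAMs", (0.2, 0.4))]:
        Xm, mask = add_missing(X, batch, rates, rng)
        for name, fn in [("M1", impute_m1), ("M2", impute_m2)]:
            print(label, name, round(rmse(X, fn(Xm, batch), mask), 3))
```

In this sketch, a feature that is entirely missing within one batch has no donors under M2 and stays NaN, which is one simple way a batch-sensitized strategy can lose coverage; `rmse` therefore scores only the entries that were actually imputed. Repeating the loop over many random seeds would mirror the 100 iterations described in the abstract.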

Cold Spring Harbor Laboratory

Harvard Wai Hann Hui, Wilson Wen Bin Goh

2023

