
GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies

Abstract
Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered missing not at random (MNAR). Improper handling of missing values adversely affects subsequent statistical analyses. However, few imputation methods have been developed for and applied to MNAR in metabolomics, so a practical left-censored missing value imputation method is urgently needed. We have developed an iterative Gibbs sampler based left-censored missing value imputation approach (GSimp). We compared GSimp with three other imputation methods on two real-world targeted metabolomics datasets and one simulated dataset using our imputation evaluation pipeline. The results show that GSimp outperforms the other imputation methods in terms of imputation accuracy, observation distribution, univariate and multivariate analyses, and statistical sensitivity. The R code for GSimp, the evaluation pipeline, a vignette, and the real-world and simulated targeted metabolomics datasets are available at: https://github.com/WandeRum/GSimp.

Author summary
Missing values caused by the limit of detection/quantification (LOD/LOQ) are widely observed in mass spectrometry (MS)-based targeted metabolomics studies and can be recognized as missing not at random (MNAR). MNAR leads to biased parameter estimates and jeopardizes subsequent statistical analyses in several ways, such as distorting sample distributions and impairing statistical power. Although a wide range of missing value imputation methods has been developed for -omics studies, only a limited number are designed appropriately for MNAR. To alleviate the problems caused by MNAR and facilitate targeted metabolomics studies, we developed a Gibbs sampler based missing value imputation approach, called GSimp, which is publicly accessible on GitHub. We also compared our method with existing approaches using an imputation evaluation pipeline on real-world and simulated metabolomics datasets to demonstrate its superiority from different perspectives.
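To make the idea of left-censored imputation concrete, here is a minimal sketch in Python. It is not the GSimp implementation (which is R code in the repository linked above) and it handles only a single variable. It assumes values below the LOD are recorded as `None`, that the variable is roughly normal on its analysis scale, and that plugging in the current mean and standard deviation is an acceptable simplification of a full Gibbs step. The function names `sample_truncated_normal` and `impute_left_censored` are hypothetical.

```python
import random
from statistics import NormalDist, mean, stdev


def sample_truncated_normal(mu, sigma, upper, rng):
    """Draw from N(mu, sigma^2) restricted to (-inf, upper] by inverse-CDF sampling."""
    dist = NormalDist(mu, max(sigma, 1e-8))
    # Probability mass below the bound, clamped so inv_cdf stays inside (0, 1).
    p_upper = min(max(dist.cdf(upper), 2e-12), 1.0 - 1e-12)
    u = rng.uniform(1e-12, p_upper)
    # min() guards against tiny floating-point overshoot of the bound.
    return min(dist.inv_cdf(u), upper)


def impute_left_censored(values, lod, n_iter=50, seed=0):
    """Iteratively impute values recorded as None (assumed to lie below `lod`).

    Simplification (an assumption of this sketch): each sweep re-estimates the
    mean and standard deviation from the completed data, then redraws every
    missing value from a normal distribution truncated above at `lod`.
    """
    rng = random.Random(seed)
    missing = [i for i, v in enumerate(values) if v is None]
    # Start the missing entries at the detection limit itself.
    completed = [lod if v is None else v for v in values]
    for _ in range(n_iter):
        mu = mean(completed)
        sigma = stdev(completed)
        for i in missing:
            completed[i] = sample_truncated_normal(mu, sigma, lod, rng)
    return completed


if __name__ == "__main__":
    # Hypothetical measurements; None marks a value below the detection limit.
    data = [5.1, None, 6.3, None, 5.8, 7.0, None, 6.1]
    print(impute_left_censored(data, lod=5.0))
```

The truncation is what distinguishes this from mean or median substitution: every imputed value stays at or below the detection limit, while observed values are left untouched.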

Related Results

Hydatid Disease of The Brain Parenchyma: A Systematic Review
Abstract. Introduction: Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...
A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis
Background: Metabolomics data generation and quantification are different from other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based (gas chromatogra...
Regression analysis of interval-censored failure time data with non proportional hazards models
[ACCESS RESTRICTED TO THE UNIVERSITY OF MISSOURI AT AUTHOR'S REQUEST.] Interval-censored failure time data arises when the failure time of interest is known only to lie within an i...
Uncovering the consequences of batch effect associated missing values in omics data analysis
Abstract: Statistical analyses in high-dimensional omics data are often hampered by the presence of batch effects (BEs) and missing values (MVs), but the interaction between these tw...
Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis
Abstract. Background: The COVID-19 pandemic has revealed significant challenges in disease forecasting and in developing a public health response, ...
A framework for testing different imputation methods for tabular datasets
Abstract. Background and purpose: Handling missing values is a prevalent challenge in the analysis of clinical data. The rise of data-driven models demands an efficient use of the avai...
Hydatid Cyst of The Orbit: A Systematic Review with Meta-Data
Abstract. Introduction: Orbital hydatid cysts (HCs) constitute less than 1% of all cases of hydatidosis, yet their occurrence is often linked to severe visual complications. This stu...
Genotype Imputation
Abstract: A missing data problem arises in genetic epidemiological studies when genotypes of particular markers are unavailable fo...
