A Hybrid Imputation Method for Multi-Pattern Missing Data: A Case Study on Type II Diabetes Diagnosis
Real medical datasets usually contain missing data with different patterns, which degrades the performance of classifiers used in intelligent healthcare and disease diagnosis systems. Many methods have been proposed to impute missing data; however, they do not meet data-quality needs, especially in real datasets with several missing data patterns. In this paper, a four-layer model is introduced, and a hybrid imputation (HIMP) method based on this model is proposed to impute multi-pattern missing data, including non-random, random, and completely random patterns. In HIMP, non-random missing data patterns are imputed first, and the resulting dataset is then decomposed into two datasets containing random and completely random missing data patterns. Depending on the missing data pattern in each dataset, different single or multiple imputation methods are then applied. Finally, the best-imputed datasets obtained for the random and completely random patterns are merged to form the final dataset. The experimental evaluation was conducted on a real dataset named IRDia, which includes all three missing data patterns. The proposed method was compared with other imputation methods using different classifiers in terms of accuracy, precision, recall, and F1-score. The classifiers’ performance shows that HIMP imputes multi-pattern missing values more effectively than the comparative methods.
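The flow described in the abstract (impute non-random gaps, decompose, impute each part, keep the best result, merge) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not say how patterns are detected, which imputers are used, or how the "best" imputed dataset is chosen. The sketch therefore assumes a column-wise split supplied by the caller, a hypothetical domain rule for the non-random gaps, mean and median as stand-in single-imputation candidates, and a leave-one-out error as a stand-in selection criterion (the paper evaluates with classifiers instead). All function names, column names, and sample values are hypothetical.

```python
from statistics import mean, median


def impute_non_random(rows, rules):
    """Step 1 (assumed): fill non-random gaps with caller-supplied domain rules."""
    out = []
    for row in rows:
        row = dict(row)
        for col, rule in rules.items():
            if row[col] is None:
                row[col] = rule(row)  # a rule may return None if it does not apply
        out.append(row)
    return out


def split_columns(rows, cols):
    """Step 2 (assumed column-wise): keep only the given columns."""
    return [{c: r[c] for c in cols} for r in rows]


def impute_with(rows, stat):
    """Single imputation: replace each gap with a statistic of its column.

    Assumes every column has at least one observed value.
    """
    cols = list(rows[0])
    fill = {c: stat([r[c] for r in rows if r[c] is not None]) for c in cols}
    return [{c: fill[c] if r[c] is None else r[c] for c in cols} for r in rows]


def loo_error(rows, stat):
    """Stand-in quality score: leave-one-out absolute error on observed values."""
    errors = []
    for c in rows[0]:
        observed = [r[c] for r in rows if r[c] is not None]
        for i, value in enumerate(observed):
            rest = observed[:i] + observed[i + 1:]
            if rest:
                errors.append(abs(value - stat(rest)))
    return mean(errors) if errors else 0.0


def best_imputation(rows, candidates):
    """Step 3: try every candidate imputer and keep the lowest-error one."""
    name = min(candidates, key=lambda n: loo_error(rows, candidates[n]))
    return name, impute_with(rows, candidates[name])


def himp_sketch(rows, rules, random_cols, completely_random_cols, candidates):
    """Run the three steps, then merge the two imputed parts (step 4)."""
    step1 = impute_non_random(rows, rules)
    _, part_a = best_imputation(split_columns(step1, random_cols), candidates)
    _, part_b = best_imputation(
        split_columns(step1, completely_random_cols), candidates
    )
    used = set(random_cols) | set(completely_random_cols)
    rest = [c for c in rows[0] if c not in used]
    return [
        {**{c: r[c] for c in rest}, **a, **b}
        for r, a, b in zip(step1, part_a, part_b)
    ]


# Hypothetical toy records; None marks a missing value.
rows = [
    {"sex": "M", "pregnancies": None, "bmi": 27.0, "glucose": 110.0},
    {"sex": "F", "pregnancies": 2, "bmi": None, "glucose": 95.0},
    {"sex": "F", "pregnancies": 1, "bmi": 31.0, "glucose": None},
    {"sex": "M", "pregnancies": None, "bmi": 29.0, "glucose": 140.0},
]
# Hypothetical rule: a pregnancy count is structurally absent for male patients.
rules = {"pregnancies": lambda row: 0 if row["sex"] == "M" else None}
candidates = {"mean": mean, "median": median}

completed = himp_sketch(rows, rules, ["bmi"], ["glucose"], candidates)
```

Passing the candidate imputers as a dictionary keeps the selection step independent of any particular method, so a multiple-imputation routine or a classifier-based score could be substituted without changing the overall flow.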
Related Results
Hydatid Disease of The Brain Parenchyma: A Systematic Review
Abstract
Introduction
Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...
Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis
Abstract
Background
The COVID-19 pandemic has revealed significant challenges in disease forecasting and in developing a public health response, ...
A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis
Background: Metabolomics data generation and quantification are different from other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based (gas chromatogra...
GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies
Abstract
Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for...
Uncovering the consequences of batch effect associated missing values in omics data analysis
Abstract
Statistical analyses in high-dimensional omics data are often hampered by the presence of batch effects (BEs) and missing values (MVs), but the interaction between these tw...
Breast Carcinoma within Fibroadenoma: A Systematic Review
Abstract
Introduction
Fibroadenoma is the most common benign breast lesion; however, it carries a potential risk of malignant transformation. This systematic review provides an ove...
A framework for testing different imputation methods for tabular datasets
Abstract
Background and purpose
Handling missing values is a prevalent challenge in the analysis of clinical data. The rise of data-driven models demands an efficient use of the avai...
Enhancing data integrity in Electronic Health Records: Review of methods for handling missing data
Abstract
Introduction
Electronic Health Records (EHRs) are vital repositories of patient information for medical research, but the prevalence of missing data presents an obstacle to ...


