
A Hybrid Imputation Method for Multi-Pattern Missing Data: A Case Study on Type II Diabetes Diagnosis

Real medical datasets usually contain missing data with different patterns, which degrade the performance of classifiers used in intelligent healthcare and disease diagnosis systems. Many methods have been proposed to impute missing data; however, they do not meet the need for data quality, especially in real datasets with different missing data patterns. In this paper, a four-layer model is introduced, and a hybrid imputation (HIMP) method built on this model is then proposed to impute multi-pattern missing data, including non-random, random, and completely random patterns. In HIMP, non-random missing data patterns are imputed first, and the resulting dataset is then decomposed into two datasets containing the random and the completely random missing data patterns. Next, depending on the missing data patterns in each dataset, different single or multiple imputation methods are applied. Finally, the best-imputed datasets obtained from the random and completely random patterns are merged to form the final dataset. The experimental evaluation was conducted on a real dataset named IRDia, which includes all three missing data patterns. The proposed method and the comparative methods were compared using different classifiers in terms of accuracy, precision, recall, and F1-score. The classifiers’ performances show that HIMP can impute multi-pattern missing values more effectively than the other comparative methods.
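The abstract describes HIMP's flow only at a high level: impute non-random gaps first, split the result into a random (MAR) view and a completely random (MCAR) view, impute each view with a suitable method, and merge the results. The Python below is a minimal sketch of that flow under stated assumptions, not the authors' implementation. The per-column pattern labels are supplied by hand (the abstract does not say how HIMP detects them), the non-random layer fills gaps with a placeholder constant, the MAR view uses a group-conditional mean keyed on a fully observed column, and the MCAR view uses the unconditional column mean. All function and parameter names (`himp_sketch`, `patterns`, `group_by`, `fill_value`) are hypothetical, and the sketch runs one single-imputation method per view rather than choosing the best among several single and multiple imputation methods.

```python
from statistics import mean

# Hypothetical pattern labels; in practice they would come from an analysis step
# that the abstract does not describe.
MNAR, MAR, MCAR = "MNAR", "MAR", "MCAR"


def columns_with(patterns, label):
    """Return the columns whose missing-data pattern equals `label`."""
    return [col for col, pat in patterns.items() if pat == label]


def impute_mnar(rows, cols, fill_value=0.0):
    """Non-random layer (assumed): replace each gap with a fixed placeholder value."""
    out = [dict(r) for r in rows]  # copy so the caller's rows are not mutated
    for r in out:
        for c in cols:
            if r[c] is None:
                r[c] = fill_value
    return out


def impute_mar(rows, cols, group_by):
    """MAR view: fill a gap with the mean of rows sharing the same `group_by` value."""
    out = [dict(r) for r in rows]
    for c in cols:
        groups = {}
        for r in out:
            if r[c] is not None:
                groups.setdefault(r[group_by], []).append(r[c])
        overall = mean(v for vals in groups.values() for v in vals)
        for r in out:
            if r[c] is None:
                vals = groups.get(r[group_by])
                # Fall back to the overall mean if the group has no observed values.
                r[c] = mean(vals) if vals else overall
    return out


def impute_mcar(rows, cols):
    """MCAR view: fill a gap with the unconditional column mean."""
    out = [dict(r) for r in rows]
    for c in cols:
        col_mean = mean(r[c] for r in out if r[c] is not None)
        for r in out:
            if r[c] is None:
                r[c] = col_mean
    return out


def himp_sketch(rows, patterns, group_by):
    """Impute MNAR columns first, split into MAR and MCAR views, impute each, merge."""
    stage1 = impute_mnar(rows, columns_with(patterns, MNAR))
    mar_cols = columns_with(patterns, MAR)
    mcar_cols = columns_with(patterns, MCAR)
    # Decompose: each view keeps only its own columns (the MAR view also keeps group_by).
    mar_view = [{c: r[c] for c in mar_cols + [group_by]} for r in stage1]
    mcar_view = [{c: r[c] for c in mcar_cols} for r in stage1]
    mar_done = impute_mar(mar_view, mar_cols, group_by)
    mcar_done = impute_mcar(mcar_view, mcar_cols)
    # Merge: overlay the imputed views onto the stage-1 rows, row by row.
    return [{**base, **a, **b} for base, a, b in zip(stage1, mar_done, mcar_done)]


# Toy records invented for illustration; None marks a missing value.
example_rows = [
    {"sex": "F", "glucose": 100.0, "insulin": None, "bmi": 30.0},
    {"sex": "F", "glucose": None, "insulin": 12.0, "bmi": None},
    {"sex": "M", "glucose": 140.0, "insulin": 20.0, "bmi": 26.0},
    {"sex": "M", "glucose": 120.0, "insulin": None, "bmi": 28.0},
]
example_patterns = {"insulin": MNAR, "glucose": MAR, "bmi": MCAR}
result = himp_sketch(example_rows, example_patterns, group_by="sex")
print(result[1])  # {'sex': 'F', 'glucose': 100.0, 'insulin': 12.0, 'bmi': 28.0}
```

Swapping `impute_mar` or `impute_mcar` for other single or multiple imputation methods, and keeping whichever variant scores best downstream, would bring the sketch closer to the selection step the abstract describes.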

MDPI AG

Mohammad H. Nadimi-Shahraki, Saeed Mohammadi, Hoda Zamani, Mostafa Gandomi, Amir H. Gandomi

Electronics

2021
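The abstract judges each imputed dataset indirectly, through classifiers scored by accuracy, precision, recall, and F1-score. As a reminder of how those four numbers relate, here is a minimal Python sketch for a binary diagnosis label. The encoding (1 = diabetic, 0 = non-diabetic), the function name `binary_metrics`, and the toy labels are assumptions made for illustration.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Accuracy, precision, recall, and F1 for one positive class (assumed 1 = diabetic)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn
    accuracy = (tp + tn) / len(pairs)
    # Define each ratio as 0.0 when its denominator is zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Invented labels and predictions, e.g. from a classifier trained on one imputed dataset.
scores = binary_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(scores)  # each of the four scores is approximately 0.667 (2/3) here
```

Comparing these scores across classifiers trained on differently imputed versions of the same data is one way to rank imputation methods when the true missing values are unknown.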

Related Results

Abstract Introduction Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...

Handling Missing Data in COVID-19 Incidence Estimation: Secondary Data Analysis

Abstract Background The COVID-19 pandemic has revealed significant challenges in disease forecasting and in developing a public health response, ...

A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis

Background: Metabolomics data generation and quantification are different from other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based (gas chromatogra...

GSimp: A Gibbs sampler based left-censored missing value imputation approach for metabolomics studies

Abstract Left-censored missing values commonly exist in targeted metabolomics datasets and can be considered as missing not at random (MNAR). Improper data processing procedures for...

Uncovering the consequences of batch effect associated missing values in omics data analysis

Abstract Statistical analyses in high-dimensional omics data are often hampered by the presence of batch effects (BEs) and missing values (MVs), but the interaction between these tw...

Breast Carcinoma within Fibroadenoma: A Systematic Review

Abstract Introduction Fibroadenoma is the most common benign breast lesion; however, it carries a potential risk of malignant transformation. This systematic review provides an ove...

A framework for testing different imputation methods for tabular datasets

Abstract Background and purpose Handling missing values is a prevalent challenge in the analysis of clinical data. The rise of data-driven models demands an efficient use of the avai...

Enhancing data integrity in Electronic Health Records: Review of methods for handling missing data

Abstract Introduction Electronic Health Records (EHRs) are vital repositories of patient information for medical research, but the prevalence of missing data presents an obstacle to ...
