
Handling Missing Values and Outliers in Advanced Data Pre-processing: An Enhancement of Diabetes Classification Accuracy

Abstract

Background: The rising global threat of diabetes demands timely detection to prevent its complications. Data scientists and practitioners have applied AI and other classification models to various aspects of the problem. Nevertheless, when missing data and outliers are not addressed, the accuracy of the resulting predictions may be questionable. Incorporating ML and AI for early diagnosis has therefore gained attention. This study integrates medical knowledge and advanced technology to develop a comprehensive diabetes classification model, focusing on handling missing values and outliers to improve accuracy in early disease identification.

Methods: The researchers' methodology prioritized careful data pre-processing to enhance analysis quality. To address missing data, the researchers used the missForest method, employing a multistage imputation process that minimizes data loss and distortion. Outliers were detected using squared Mahalanobis distances, which identify anomalous data points. Instead of removing outliers outright, the researchers temporarily replaced them with missing values and then imputed those values with missForest, known for its robust imputation capabilities, so that outlier handling was folded into the imputation step. The resulting hybrid dataset, free of extreme outliers and completed via missForest, formed the foundation for subsequent analysis and modelling. Model selection and evaluation were performed on the pre-processed data. The analysis used a two-step validation scheme: an initial partition of the dataset (80% training, 20% testing), followed by ten iterations of ten-fold cross-validation for model stability and parameter optimization. A range of ML models was assessed, including LogitBoost, mlpWeightDecayML, and avNNet, among others. The metrics examined were sensitivity, specificity, precision, recall, F1-score, AUC, accuracy, and Kappa score.

Results: Among the models examined, LogitBoost emerged as a strong contender, with a sensitivity of 0.8095, specificity of 0.9464, precision of 0.85, recall of 0.8095, F1-score of 0.8293, AUC of 0.7888, accuracy of 0.9091, and Kappa score of 0.7674. Performance nevertheless varied across metrics and models: sensitivity ranged from 0.6792 to 0.9057, specificity from 0.6 to 0.9464, and precision from 0.5455 to 0.85.

Conclusions: In summary, this methodical approach points toward improved diabetes classification accuracy. By addressing missing values with the missForest method and managing outliers with the hybrid approach, the researchers improved the integrity and quality of the PIMA dataset. This handling of missing values and outliers both protected the dataset against potential distortions and led to improved accuracy in diabetes classification. Through the combination of careful pre-processing, strategic outlier management, and comprehensive model evaluation, the researchers have contributed useful insights into early diabetes detection.
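The hybrid outlier step described in the Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, the synthetic data, and the threshold of 16.27 (roughly the 0.999 chi-square quantile for three variables) are all assumptions made for illustration. The abstract does not say which cells of a flagged row are blanked, so the sketch blanks the cell with the largest absolute z-score. A column-median fill stands in for missForest, which is a random-forest imputer and would be substituted at that point.

```python
import numpy as np


def mahalanobis_sq(X):
    """Squared Mahalanobis distance of each row from the sample mean."""
    centered = X - X.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(X, rowvar=False))
    # Row-wise quadratic form: d_i^2 = c_i^T S^{-1} c_i
    return np.einsum("ij,jk,ik->i", centered, cov_inv, centered)


def mask_outliers(X, threshold):
    """Return a copy of X with one cell per flagged row set to NaN.

    Assumption: within a flagged row, the cell with the largest absolute
    z-score is treated as the offending value. The source does not state
    which cells are blanked.
    """
    X = np.asarray(X, dtype=float)
    flagged = mahalanobis_sq(X) > threshold
    z = np.abs((X - X.mean(axis=0)) / X.std(axis=0, ddof=1))
    masked = X.copy()
    rows = np.flatnonzero(flagged)
    masked[rows, z[rows].argmax(axis=1)] = np.nan
    return masked, flagged


def impute_median(X):
    """Stand-in imputer using column medians; this is NOT missForest."""
    filled = X.copy()
    medians = np.nanmedian(X, axis=0)
    nan_rows, nan_cols = np.where(np.isnan(filled))
    filled[nan_rows, nan_cols] = medians[nan_cols]
    return filled


# Synthetic data (hypothetical): 200 rows, one planted extreme value.
rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))
data[5, 1] = 25.0

# Threshold is an assumed cut-off, not a value taken from the source.
masked, flagged = mask_outliers(data, threshold=16.27)
cleaned = impute_median(masked)
```

Because the flagged value becomes an ordinary missing value, the same imputer that handles genuinely missing entries also handles outliers, and no rows are dropped from the dataset.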

Springer Science and Business Media LLC

Md. Hossain, Astami Devnath, Provash Karmokar

2023



[RETRACTED]Rhino XL Reviews, NY USA: Studies show that testosterone levels in males decrease constantly with growing age. There are also many other problems that males face due ...

Undiagnosed Diabetes in Acute Coronary Syndrome: A Silent Threat in Pakistan

Diabetes mellitus (DM) has emerged as one of the most pressing public health challenges globally, and Pakistan stands among the countries most severely affected. With rising urbani...

PENURUNAN KADAR GULA DARAH DAN RESIKO ULKUS PADA PENDERITA DIABETES MELLITUS DENGAN SENAM KAKI DIABETES

ABSTRACT Diabetes mellitus is a disease characterized by blood glucose levels above normal. Indonesia ranks 7th among countries by number of people with diabetes mellitus ...

A New Approach of Outlier-robust Missing Value Imputation for Metabolomics Data Analysis

Background:Metabolomics data generation and quantification are different from other types of molecular “omics” data in bioinformatics. Mass spectrometry (MS) based (gas chromatogra...

Effect of Diabetes Online Community Engagement on Health Indicators: Cross-Sectional Study (Preprint)

BACKGROUND Successful diabetes management requires ongoing lifelong self-care and can require that individuals with diabetes become experts in translating c...

Pendidikan dan promosi kesehatan tentang diabetes mellitus

Health education and promotion about diabetes mellitus Introduction: Diabetes mellitus in Indonesia is a serious threat to health development. The 2010 NCD World Health Organizatio...

Diabetes Awareness Among High School Students in Qatar

Diabetes is a disease that occurs when there is an abundance of glucose in the blood stream and the body cannot produce enough insulin in the pancreas to transfer the sugar from th...

Diabetes Mellitus: Life Style, Obesity and Insulin Resistance

In millennia, 40 million people were died with non-communicable diseases and diabetes is one of them. In diabetes, insulin secretions are not produced properly or resist to body an...


Related Results