
Comparison of Error Rate Prediction Methods in Classification Modeling with the CHAID Method for Imbalanced Data

CHAID (Chi-Square Automatic Interaction Detection) is a classification algorithm from the family of decision tree methods, and its classification results are displayed as a tree diagram. Once the model has been built, its accuracy must be assessed to evaluate its performance. This can be done by estimating the model's prediction error rate. Three estimation methods are considered: leave-one-out cross-validation (LOOCV), hold-out, and K-fold cross-validation. Because these methods divide the data into training and testing sets in different ways, each has its own advantages and disadvantages. Imbalanced data are data in which the classes have unequal numbers of observations. In the CHAID method, imbalanced data affect the prediction results: the more imbalanced the data, the closer the prediction result comes to the number of minority-class observations. Therefore, the three error rate prediction methods were compared to determine which is most appropriate for the CHAID method on imbalanced data. This is an experimental study that uses simulated data generated in RStudio. The comparison considers several factors: the marginal probability matrix, different correlations, and several observation ratios. The results are compared using boxplots, focusing on the median error rate and on which method has the lowest variance. This research finds that K-fold cross-validation is the most suitable error rate prediction method for the CHAID method on imbalanced data.
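The abstract does not specify the simulation design or the CHAID settings, so the following is only a minimal pure-Python sketch of how the three error rate estimators differ in the way they split the data. It rests on several loudly stated assumptions. CHAID is replaced by a hypothetical one-level, CHAID-like stump (`fit_stump`) that picks the predictor with the largest Pearson chi-square statistic and predicts the majority class in each category, with no category merging, Bonferroni adjustment, or stopping rule. The helper names (`holdout`, `kfold`, `simulate`, `compare`), the 30% hold-out fraction, K = 10, and the data-generating process are all illustrative choices, not the authors' code. LOOCV is obtained as K-fold cross-validation with K equal to the sample size. Following the comparison criterion described above, `compare` summarizes each estimator by the median and variance of its error rates over replications.

```python
import random
from collections import Counter, defaultdict
from statistics import median, pvariance


def chi_square(xs, ys):
    """Pearson chi-square statistic of the contingency table of xs vs ys."""
    n = len(xs)
    rows, cols, cells = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    stat = 0.0
    for a in rows:
        for b in cols:
            expected = rows[a] * cols[b] / n
            stat += (cells[(a, b)] - expected) ** 2 / expected
    return stat


def fit_stump(X, y):
    """Hypothetical CHAID-like one-level split (assumption: no category
    merging, no Bonferroni adjustment, no stopping rule). Chooses the
    predictor with the largest chi-square statistic and predicts the
    majority class within each of its categories."""
    best = max(range(len(X[0])), key=lambda j: chi_square([x[j] for x in X], y))
    groups = defaultdict(list)
    for x, label in zip(X, y):
        groups[x[best]].append(label)
    rule = {c: Counter(ls).most_common(1)[0][0] for c, ls in groups.items()}
    default = Counter(y).most_common(1)[0][0]  # for categories unseen in training
    return best, rule, default


def misclassified(X, y, train, test):
    """Fit on the train indices and count errors on the test indices."""
    best, rule, default = fit_stump([X[i] for i in train], [y[i] for i in train])
    return sum(rule.get(X[i][best], default) != y[i] for i in test)


def holdout(X, y, test_frac=0.3, seed=0):
    """Hold-out: a single random train/test split."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    cut = max(1, int(len(idx) * test_frac))
    return misclassified(X, y, idx[cut:], idx[:cut]) / cut


def kfold(X, y, k=10, seed=0):
    """K-fold CV: each observation is tested exactly once.
    With k == len(y) every fold holds one observation, i.e. LOOCV."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    wrong = 0
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        wrong += misclassified(X, y, train, test)
    return wrong / len(y)


def simulate(n=200, minority=0.1, seed=1):
    """Hypothetical imbalanced data (not the paper's design): x0 agrees
    with the label 80% of the time, x1 is pure noise."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        label = 1 if rng.random() < minority else 0
        x0 = label if rng.random() < 0.8 else 1 - label
        X.append((x0, rng.randint(0, 1)))
        y.append(label)
    return X, y


def compare(reps=20, n=200, minority=0.1):
    """Median and population variance of each estimator's error rate."""
    est = {"hold-out": [], "10-fold": [], "LOOCV": []}
    for r in range(reps):
        X, y = simulate(n, minority, seed=r)
        est["hold-out"].append(holdout(X, y, seed=r))
        est["10-fold"].append(kfold(X, y, k=10, seed=r))
        est["LOOCV"].append(kfold(X, y, k=len(y)))
    return {name: (median(v), pvariance(v)) for name, v in est.items()}


if __name__ == "__main__":
    for name, (med, var) in compare().items():
        print(f"{name:>8}: median={med:.3f} variance={var:.5f}")
```

Running the module prints one median/variance pair per estimator. Under these toy assumptions, a strongly imbalanced sample may lead the stump to predict the majority class in every category, so the estimates tend to sit near the minority share. This is one way to see why imbalance matters when the estimators are compared. Drawing boxplots of the per-replication error rates would mirror the comparison described in the abstract more closely.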

Universitas Negeri Padang

Seif Adil El-Muslih, Dodi Vionanda, Nonong Amalita, Admi Salma

UNP Journal of Statistics and Data Science

2023

Related Results

Chi-squared automatic interaction detector (CHAID) algorithm is considered to be one of the most widely used supervised learning methods as it is adaptable to solving any kind of proble...

Classification Accuracy of the Logistic Regression and CHAID Methods with Sample Weighting

This study aims to determine the accuracy of the logistic regression and CHAID methods with sample weighting in classifying the labor force status of Temanggung Regency in 2015. Popul...

Body weight prediction using different data mining algorithms in Thalli sheep: A comparative study

Background and Aim: The Thalli sheep are the main breed of sheep in Pakistan, and an effective method to predict their body weight (BW) using linear body measurements has not yet b...

Hybrid Approach Based on the CHAID Algorithm for Improving Classification Performance of Diabetes Data

Diabetes, a chronic disease that is becoming more prevalent, presents increasing challenges, especially in low- and middle-income countries, where it is a growing burden. Egypt is ...

The Exhaustive CHAID Method for Classifying Households Receiving KKS in West Java in 2024

Abstract. The poverty rate in West Java in 2024 reached 7.78% or around 3.8 million people. The distribution of social assistance through the Family Welfare Card (KKS) program stil...

Application and Comparison of Three Decision Tree Analysis Methods for Classifying Breast Cancer Patients

Abstract. Today there is a considerable amount of work dealing with decision trees, especially in survival analysis (Ibrahim et al, 2008). Cases classified as survival analysis, li...

Classification of the Employment Status of the Labor Force in Tanah Datar Regency Using the CART and CHAID Methods

The CART and CHAID methods are classification methods that aim to determine the factors best able to distinguish between classes of objects. The CART method and ...

Optimising tool wear and workpiece condition monitoring via cyber-physical systems for smart manufacturing

Smart manufacturing has been developed since the introduction of Industry 4.0. It consists of resource sharing and networking, predictive engineering, and material and data analyti...
