Comparison of Error Rate Prediction Methods in Classification Modeling with the CHAID Method for Imbalanced Data
CHAID (Chi-Square Automatic Interaction Detection) is a classification algorithm in the decision tree family, and its classification results are displayed as a tree diagram. Once the model has been built, its accuracy must be assessed in order to judge its performance. This can be done by estimating the model's prediction error rate. Three methods are considered here: leave-one-out cross-validation (LOOCV), hold-out, and k-fold cross-validation. They differ in how they divide the data into training and testing sets, so each has its own advantages and disadvantages. Imbalanced data are data in which the classes have different numbers of observations. In the CHAID method, imbalanced data affect the prediction results: as the data become more imbalanced, the predicted error rate approaches the proportion of minority-class observations. The three error rate prediction methods were therefore compared to determine which is appropriate for the CHAID method on imbalanced data. This is an experimental study that uses simulated data generated in RStudio. The comparison takes several factors into account, namely the marginal probability matrix, different correlations, and several observation ratios. The results are compared using boxplots, looking for the lowest median error rate and the lowest variance. The study finds that k-fold cross-validation is the most suitable error rate prediction method for the CHAID method on imbalanced data.
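The three error rate estimators named in the abstract can be illustrated with a minimal sketch. It is not the study's code: the study used CHAID on data simulated in RStudio, whereas this Python sketch assumes a hypothetical generator (`simulate`) with a single three-level categorical predictor and uses a stand-in classifier (`fit`) that predicts the majority class within each predictor level, with no chi-square testing or category merging. The sample size, minority share, test fraction, number of folds, and number of replications are likewise illustrative assumptions.

```python
import random
import statistics
from collections import Counter, defaultdict


def simulate(n, minority_share, rng):
    """Hypothetical generator: one 3-level categorical predictor, binary class."""
    rows = []
    for _ in range(n):
        y = 1 if rng.random() < minority_share else 0
        # Assumption: minority cases fall more often into level 2.
        weights = [0.2, 0.2, 0.6] if y == 1 else [0.45, 0.45, 0.10]
        x = rng.choices([0, 1, 2], weights=weights)[0]
        rows.append((x, y))
    return rows


def fit(train):
    """Stand-in classifier (NOT CHAID): majority class within each level."""
    overall = Counter(y for _, y in train).most_common(1)[0][0]
    by_level = defaultdict(Counter)
    for x, y in train:
        by_level[x][y] += 1
    rule = {x: c.most_common(1)[0][0] for x, c in by_level.items()}
    # Levels unseen in training fall back to the overall majority class.
    return lambda x: rule.get(x, overall)


def error_rate(train, test):
    """Share of test observations misclassified by a model fit on train."""
    predict = fit(train)
    return sum(predict(x) != y for x, y in test) / len(test)


def holdout_error(rows, test_frac, rng):
    """Hold-out: a single random split into training and testing data."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_frac))
    return error_rate(shuffled[n_test:], shuffled[:n_test])


def kfold_error(rows, k, rng):
    """K-fold: each fold is the test set once; errors are pooled over folds."""
    shuffled = rows[:]
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    wrong = 0
    for i, test in enumerate(folds):
        train = [r for j, f in enumerate(folds) if j != i for r in f]
        predict = fit(train)
        wrong += sum(predict(x) != y for x, y in test)
    return wrong / len(rows)


def loocv_error(rows, rng):
    """LOOCV is k-fold with k equal to the number of observations."""
    return kfold_error(rows, len(rows), rng)


def replicate(n_reps, n, minority_share, seed):
    """Repeat the simulation and collect each method's error estimates."""
    rng = random.Random(seed)
    results = {"hold-out": [], "10-fold": [], "LOOCV": []}
    for _ in range(n_reps):
        data = simulate(n, minority_share, rng)
        results["hold-out"].append(holdout_error(data, 0.3, rng))
        results["10-fold"].append(kfold_error(data, 10, rng))
        results["LOOCV"].append(loocv_error(data, rng))
    return results


results = replicate(n_reps=30, n=200, minority_share=0.2, seed=0)
for name, errs in results.items():
    med = statistics.median(errs)
    var = statistics.variance(errs)
    print(f"{name}: median={med:.3f} variance={var:.5f}")
```

The printed median and variance per method are the same two summaries the abstract reads off the boxplots. Numbers from this sketch say nothing about CHAID itself, since the classifier and the data generator are both stand-ins.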
Universitas Negeri Padang
Related Results
Performance Analysis of CHAID Algorithm for Accuracy
The chi-squared automatic interaction detector (CHAID) algorithm is considered to be one of the most widely used supervised learning methods, as it is adaptable to solving any kind of proble...
Ketepatan Klasifikasi Metode Regresi Logistik dan Metode Chaid dengan Pembobotan Sampel [Classification Accuracy of the Logistic Regression Method and the CHAID Method with Sample Weighting]
The aim of this study is to determine the accuracy of the logistic regression and CHAID methods with sample weighting in classifying the labor force status of Temanggung Regency in 2015. Popul...
Body weight prediction using different data mining algorithms in Thalli sheep: A comparative study
Background and Aim: The Thalli sheep are the main breed of sheep in Pakistan, and an effective method to predict their body weight (BW) using linear body measurements has not yet b...
Hybrid Approach Based on the CHAID Algorithm for Improving Classification Performance of Diabetes Data
Diabetes, a chronic disease that is becoming more prevalent, presents increasing challenges, especially in low- and middle-income countries, where it is a growing burden. Egypt is ...
Metode Exhaustive CHAID untuk Klasifikasi Rumah Tangga Penerima KKS di Jawa Barat Tahun 2024 [The Exhaustive CHAID Method for Classifying KKS Recipient Households in West Java in 2024]
Abstract. The poverty rate in West Java in 2024 reached 7.78% or around 3.8 million people. The distribution of social assistance through the Family Welfare Card (KKS) program stil...
Penerapan dan Perbandingan Tiga Metode Analisis Pohon Keputusan pada Klasifikasi Penderita Kanker Payudara [Application and Comparison of Three Decision Tree Analysis Methods for Classifying Breast Cancer Patients]
Abstract. Today there is a considerable amount of work dealing with decision trees, especially in survival analysis (Ibrahim et al, 2008). Cases classified as survival analysis, li...
Pengklasifikasian Status Kerja pada Angkatan Kerja di Kabupaten Tanah Datar Menggunakan Metode CART dan Metode CHAID [Classification of Employment Status in the Labor Force of Tanah Datar Regency Using the CART and CHAID Methods]
The CART and CHAID methods are classification methods that aim to determine the factors best able to distinguish the classification of objects. The CART method and...
Optimising tool wear and workpiece condition monitoring via cyber-physical systems for smart manufacturing
Smart manufacturing has been developed since the introduction of Industry 4.0. It consists of resource sharing and networking, predictive engineering, and material and data analyti...

