Measuring Resampling Methods on Imbalanced Educational Dataset’s Classification Performance

Imbalanced data refers to a condition in which the number of samples differs substantially between classes. The class with more instances in the dataset is called the “majority” class, and the classes with fewer instances are called “minority” classes. Educational data mining demands accurate analysis of student performance, which in turn requires an appropriate dataset to produce good accuracy. This study measures the performance of resampling methods through classification on student performance datasets, which are multi-class datasets; it therefore also measures how the methods perform on a multi-class classification problem. Using four public educational datasets that record the results of an educational process, this study aims to get a better picture of which resampling methods suit this kind of dataset. The research uses more than twenty resampling methods from the SMOTE variants library. For comparison, it applies nine classification methods to measure performance on the resampled data against the non-resampled data. According to the results, SMOTE-ENN is generally the better resampling method, since it produces an F1 score of 0.97 under the Stacking classification method, the highest among the methods tested. However, the resampling methods perform relatively poorly on the dataset with wider label variation. Future work will investigate why the resampling methods cannot handle large class variation, given that the F1 score on the student dataset is lower than on the other datasets.
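To make the idea of resampling concrete, here is a minimal sketch of SMOTE-style oversampling on a multi-class dataset. It is an illustration only, not the study's pipeline and not the SMOTE variants library's implementation: the function name `smote_oversample`, the toy feature values, and the class labels are all hypothetical, and the ENN cleaning step of SMOTE-ENN is omitted.

```python
import math
import random
from collections import Counter


def smote_oversample(X, y, k=3, seed=0):
    """Oversample every minority class up to the majority class size.

    Simplified SMOTE-style sketch (an assumption, not the library code):
    each synthetic point lies on the line segment between a minority
    sample and one of its k nearest same-class neighbours.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)

    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if count == target or len(members) < 2:
            continue  # majority class, or too few points to interpolate
        for _ in range(target - count):
            base = rng.choice(members)
            # k nearest same-class neighbours of the base point, excluding itself
            others = [m for m in members if m is not base]
            neighbours = sorted(others, key=lambda m: math.dist(base, m))[:k]
            neighbour = rng.choice(neighbours)
            gap = rng.random()  # position along the segment, in [0, 1)
            synthetic = [b + gap * (n - b) for b, n in zip(base, neighbour)]
            X_out.append(synthetic)
            y_out.append(label)
    return X_out, y_out


# Toy three-class data with hypothetical labels: 6 "pass", 3 "fail", 2 "withdrawn".
X = [[1.0, 1.0], [1.2, 0.9], [0.9, 1.1], [1.1, 1.3], [1.3, 1.0], [0.8, 0.8],
     [4.0, 4.0], [4.2, 3.9], [3.8, 4.1],
     [8.0, 1.0], [8.3, 1.2]]
y = ["pass"] * 6 + ["fail"] * 3 + ["withdrawn"] * 2

X_res, y_res = smote_oversample(X, y)
print("before:", Counter(y))
print("after: ", Counter(y_res))
```

After resampling, every class has as many samples as the original majority class, and the original samples are kept unchanged at the front of the output. With very small minority classes the synthetic points stay close to the few available samples, which may be one reason resampling helps less when there are many sparsely populated labels; that is a conjecture, not a finding of the study.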

Universitas Pesantren Tinggi Darul Ulum (Unipdu)

Irfan Pratama, Putri Taqwa Prasetyaningrum, Albert Yakobus Chandra, Ozzi Suria

2024

Imbalanced classification is a common problem in machine learning, where one class significantly outnumbers the others. This imbalance leads to biased model performance, where the ...

Application of Machine Learning Techniques for Customer Churn Prediction in the Banking Sector

Aim/Purpose: Previous studies have primarily focused on comparing predictive models without considering the impact of data preprocessing on model performance. Therefore, this study...

Comparative analysis of resampling algorithms in the prediction of stroke diseases

Stroke disease is a serious cause of death globally. Early predictions of the disease will save a lot of lives but most of the clinical datasets are imbalanced in nature including ...

Machine Learning Based on Resampling Approaches and Deep Reinforcement Learning for Credit Card Fraud Detection Systems

The problem of imbalanced datasets is a significant concern when creating reliable credit card fraud (CCF) detection systems. In this work, we study and evaluate recent advances in...

An automated approach for binary classification on imbalanced data

Imbalanced data is present in various business areas and must be dealt with using appropriate resampling techniques and classification algorithms. However, there is ...

Selective Ensemble Learning Algorithm for Imbalanced Dataset

Under the imbalanced dataset, the performance of the base-classifier, the computing method of weight of base-classifier and the selection method of the base-classif...

Improving Medical Document Classification via Feature Engineering

Document classification (DC) is the task of assigning the predefined labels to unseen documents by utilizing the model trained on the available labeled documents...

Handling the Imbalanced Problem in Agri-Food Data Analysis

Imbalanced data situations exist in most fields of endeavor. The problem has been identified as a major bottleneck in machine learning/data mining and is becoming a serious issue o...

Related Results