Measuring Resampling Methods on Imbalanced Educational Dataset’s Classification Performance
Imbalanced data refers to a dataset in which the number of samples differs between classes. The “majority” class is the class with more instances in the dataset, and the “minority” classes are those with fewer instances. Educational data mining demands accurate analysis of student performance, and accurate results in turn require an appropriate dataset. This study measures the performance of resampling methods through classification on student performance datasets, which are also multi-class datasets; it therefore also measures how the methods perform on a multi-class classification problem. Using four public educational datasets, each consisting of the results of an educational process, the study aims to get a better picture of which resampling methods suit this kind of data. It applies more than twenty resampling methods from the SMOTE variants library. For comparison, it implements nine classification methods to measure performance on the resampled data against the non-resampled data. According to the results, SMOTE-ENN is generally the best resampling method: it produces an F1 score of 0.97 under the Stacking classification method, the highest among the methods tested. However, the resampling methods perform relatively poorly on the dataset with wider label variation. Future work will investigate why the resampling methods cannot handle such large class variation, given that the F1 score on the student dataset is lower than on the other datasets.
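To make the idea concrete, the following is a minimal sketch, using only the Python standard library, of SMOTE-style oversampling on a multi-class toy dataset together with a macro-averaged F1 score. It is not the study's pipeline: it shows plain SMOTE interpolation only (without the ENN cleaning step of SMOTE-ENN), it does not use the SMOTE variants library, and the function names (`smote_like_oversample`, `macro_f1`), the labels, and the toy points are all hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def smote_like_oversample(X, y, k=3, seed=0):
    """Oversample every non-majority class up to the majority class size.

    Each synthetic point is interpolated between a sample and one of its
    k nearest same-class neighbours, which is the core idea of SMOTE.
    This is an illustrative sketch, not a library implementation.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, count in counts.items():
        members = [x for x, lab in zip(X, y) if lab == label]
        if len(members) < 2:
            continue  # no neighbour to interpolate with; leave the class as is
        for _ in range(target - count):
            base = rng.choice(members)
            # k nearest same-class neighbours of the base point (excluding itself)
            neighbours = sorted(
                (m for m in members if m is not base),
                key=lambda m: math.dist(base, m),
            )[:k]
            other = rng.choice(neighbours)
            gap = rng.random()  # position along the segment base -> other
            X_out.append([b + gap * (o - b) for b, o in zip(base, other)])
            y_out.append(label)
    return X_out, y_out


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical three-class toy data: 6 "pass", 3 "fail", 2 "dropout" samples.
X = [
    [0.0, 0.0], [0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.1],
    [5.0, 5.0], [5.2, 5.1], [5.1, 4.8],
    [9.0, 0.0], [9.2, 0.3],
]
y = ["pass"] * 6 + ["fail"] * 3 + ["dropout"] * 2

X_res, y_res = smote_like_oversample(X, y)
print(Counter(y), Counter(y_res))
```

Macro averaging weights every class equally, so a classifier that ignores a small class is penalised even when overall accuracy looks high; this is one reason F1 is a common choice for imbalanced multi-class problems.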
Universitas Pesantren Tinggi Darul Ulum (Unipdu)
Related Results
Application of Machine Learning Techniques for Customer Churn Prediction in the Banking Sector
Aim/Purpose: Previous studies have primarily focused on comparing predictive models without considering the impact of data preprocessing on model performance. Therefore, this study...
Comparative analysis of resampling algorithms in the prediction of stroke diseases
Stroke disease is a serious cause of death globally. Early predictions of the disease will save a lot of lives but most of the clinical datasets are imbalanced in nature including ...
Machine Learning Based on Resampling Approaches and Deep Reinforcement Learning for Credit Card Fraud Detection Systems
The problem of imbalanced datasets is a significant concern when creating reliable credit card fraud (CCF) detection systems. In this work, we study and evaluate recent advances in...
An automated approach for binary classification on imbalanced data
Imbalanced data is present in various business areas and must be dealt with the appropriate resampling techniques and classification algorithms. However, there is ...
Selective Ensemble Learning Algorithm for Imbalanced Dataset
Under the imbalanced dataset, the performance of the base-classifier, the computing method of weight of base-classifier and the selection method of the base-classif...
Improving Medical Document Classification via Feature Engineering
Document classification (DC) is the task of assigning the predefined labels to unseen documents by utilizing the model trained on the available labeled documents...
Handling the Imbalanced Problem in Agri-Food Data Analysis
Imbalanced data situations exist in most fields of endeavor. The problem has been identified as a major bottleneck in machine learning/data mining and is becoming a serious issue o...
Performance of Sampling/Resampling-based Particle Filters Applied to Non-Linear Problems
In this work, we propose a wireless body area sensor network (WBASN) to monitor patient position. Localization and tracking are enhanced by improving the effect of the received sig...

