
Measuring Resampling Methods on Imbalanced Educational Dataset’s Classification Performance

Imbalanced data refers to a dataset in which the number of samples differs between one class and another (or others). This gives rise to the terms “majority” class, for the class with more instances in the dataset, and “minority” classes, for those with fewer instances. Educational data mining aims to measure and analyze student performance accurately, and data mining needs an appropriate dataset to achieve good accuracy. This study measures the performance of resampling methods through classification on a student performance dataset, which is also a multi-class dataset; it therefore also examines how these methods perform on a multi-class classification problem. Using four public educational datasets that record the results of an educational process, the study aims to get a better picture of which resampling methods suit this kind of data. The research applies more than twenty resampling methods from the SMOTE variants library and, for comparison, nine classification methods to measure performance on the resampled data against the non-resampled data. According to the results, SMOTE-ENN is generally the best resampling method: it produces an F1 score of 0.97 under the Stacking classification method, the highest among all methods tested. However, resampling performs relatively poorly on the dataset with a wider variety of class labels. Because the F1 score on the student dataset is lower than on the other datasets, future work will investigate why resampling cannot handle such a large number of classes.
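SMOTE-ENN, reported as the strongest resampler, chains two steps: SMOTE adds synthetic minority-class samples by interpolating between a sample and one of its nearest same-class neighbours, and Edited Nearest Neighbours (ENN) then removes samples whose label disagrees with most of their neighbours. The following is a minimal from-scratch sketch of that idea, not the SMOTE variants library's code. The function names, the neighbour counts (k=5 for SMOTE, k=3 for ENN), the tie rule in ENN, and the toy data are all assumptions chosen for illustration.

```python
# Minimal illustrative sketch of SMOTE followed by ENN.
# Assumption: hypothetical helper names; this is NOT the SMOTE variants library API.
import math
import random
from collections import Counter


def nearest(i, pool, X, k):
    """Indices of the k points in `pool` closest to X[i] (Euclidean), excluding i."""
    others = [j for j in pool if j != i]
    return sorted(others, key=lambda j: math.dist(X[i], X[j]))[:k]


def smote(X, y, k=5, seed=0):
    """Oversample every smaller class up to the majority-class size.

    Each synthetic point lies on the segment between a random sample of the
    class and one of its k nearest same-class neighbours.
    """
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [tuple(x) for x in X], list(y)
    for label, n in counts.items():
        if n >= target or n < 2:  # majority class, or too few points to interpolate
            continue
        members = [i for i, lab in enumerate(y) if lab == label]
        for _ in range(target - n):
            i = rng.choice(members)
            j = rng.choice(nearest(i, members, X, k))
            gap = rng.random()  # position along the segment, in [0, 1)
            X_out.append(tuple(a + gap * (b - a) for a, b in zip(X[i], X[j])))
            y_out.append(label)
    return X_out, y_out


def enn(X, y, k=3):
    """Wilson-style editing: drop a sample whose label is not the most common
    label among its k nearest neighbours (assumption: ties keep the sample)."""
    everyone = range(len(X))
    keep = []
    for i in everyone:
        votes = Counter(y[j] for j in nearest(i, everyone, X, k))
        if votes[y[i]] == max(votes.values()):
            keep.append(i)
    return [X[i] for i in keep], [y[i] for i in keep]


def smote_enn(X, y, k_smote=5, k_enn=3, seed=0):
    """SMOTE oversampling followed by ENN cleaning."""
    X_s, y_s = smote(X, y, k=k_smote, seed=seed)
    return enn(X_s, y_s, k=k_enn)


# Toy data (hypothetical): class 0 is the majority, class 1 the minority.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0), (0, 0.5), (5, 5), (5, 6), (6, 5)]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
print("before:", Counter(y))      # before: Counter({0: 6, 1: 3})
X_res, y_res = smote_enn(X, y)
print("after:", Counter(y_res))   # after: Counter({0: 6, 1: 6})
```

The neighbour search is brute force (quadratic in the number of samples), which keeps the sketch short but is only suitable for small data. Library implementations may use different defaults for the target class ratio, neighbour counts, and editing rule.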

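With several classes, a single F1 figure such as the reported 0.97 requires averaging per-class scores, and the abstract does not state which averaging was used. Assuming macro averaging (every class weighted equally, so minority classes count as much as the majority), the score can be sketched as follows; the function name and the toy labels are hypothetical. scikit-learn's `f1_score(y_true, y_pred, average="macro")` should give the same value for inputs like these.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred.

    Hypothetical helper for illustration: a class that is never predicted
    (or never present) gets precision or recall 0 here.
    """
    labels = sorted(set(y_true) | set(y_pred))
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    n_true = Counter(y_true)
    n_pred = Counter(y_pred)
    scores = []
    for c in labels:
        precision = true_pos[c] / n_pred[c] if n_pred[c] else 0.0
        recall = true_pos[c] / n_true[c] if n_true[c] else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return sum(scores) / len(scores)


# A classifier that always predicts the majority class on an imbalanced 3-class set.
y_true = [0] * 8 + [1, 2]
y_pred = [0] * 10
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(accuracy)                            # 0.8
print(round(macro_f1(y_true, y_pred), 3))  # 0.296
```

Accuracy stays at 0.8 while macro-F1 drops to about 0.30, because the two minority classes each contribute an F1 of 0. This is one reason F1-based metrics are commonly used to evaluate classifiers on imbalanced, multi-class data.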
Related Results

REBALANCING DATA FOR CANCER-ASSOCIATED THROMBOSIS: COMPARISON OF DIFFERENT RESAMPLING APPROACH
Objective: Cancer-associated thrombosis (CAT) presents a complex challenge in oncology, exacerbated by data imbalances in related datasets that often lead to suboptimal outcomes in...
Advanced Re-Sampling Techniques for Multi-Class Imbalanced Classification
Imbalanced classification is a common problem in machine learning, where one class significantly outnumbers the others. This imbalance leads to biased model performance, where the ...
Application of Machine Learning Techniques for Customer Churn Prediction in the Banking Sector
Aim/Purpose: Previous studies have primarily focused on comparing predictive models without considering the impact of data preprocessing on model performance. Therefore, this study...
Comparative analysis of resampling algorithms in the prediction of stroke diseases
Stroke disease is a serious cause of death globally. Early predictions of the disease will save a lot of lives but most of the clinical datasets are imbalanced in nature including ...
Machine Learning Based on Resampling Approaches and Deep Reinforcement Learning for Credit Card Fraud Detection Systems
The problem of imbalanced datasets is a significant concern when creating reliable credit card fraud (CCF) detection systems. In this work, we study and evaluate recent advances in...
An automated approach for binary classification on imbalanced data
Imbalanced data is present in various business areas and must be dealt with using the appropriate resampling techniques and classification algorithms. However, there is ...
Selective Ensemble Learning Algorithm for Imbalanced Dataset
Under the imbalanced dataset, the performance of the base-classifier, the computing method of weight of base-classifier and the selection method of the base-classif...
Machine Learning Modelling for Imbalanced Dataset: Case Study of Adolescent Obesity in Malaysia
Obesity among adolescents is a public health issue with increasing burden of disease. Predicting imbalanced health data with Machine Learning may introduce bias and lead to diminish...
