Selective Ensemble Learning Algorithm for Imbalanced Dataset
Abstract
On imbalanced datasets, the performance of an ensemble classifier depends strongly on the performance of its base-classifiers, on how their weights are computed, and on how they are selected. To address these issues and improve the generalization performance of the ensemble classifier, a selective ensemble learning algorithm based on cluster under-sampling for imbalanced datasets is proposed. First, the proposed algorithm calculates the number K of under-sampled samples according to the relationship between the class sample densities. Then, an improved K-means clustering algorithm is used to under-sample the majority class and obtain K cluster centers. All cluster centers (or the samples nearest to the cluster centers) are then taken as the new majority samples and combined with the minority class samples to construct a new balanced training subset. These steps are repeated to generate multiple training subsets and train multiple base-classifiers. However, as the number of iterations grows, the number of base-classifiers increases, and so does the similarity among them. It is therefore necessary to select, for the ensemble, base-classifiers that classify well and differ substantially from one another. In the selection stage, base-classifiers are chosen according to their difference and performance, using the idea of maximum correlation and minimum redundancy. In the ensemble stage, G-mean or F-mean, which evaluates the classification performance of a base-classifier on an imbalanced dataset, is used to compute the weight of each base-classifier, and the weighted voting method is then used to combine them. Finally, simulation results on artificial, UCI and KDDCUP datasets show that the algorithm has good generalization performance on imbalanced datasets, especially on those with a high degree of imbalance.
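As a rough illustration of two steps named in the abstract, the sketch below builds one balanced training subset by cluster under-sampling and combines base-classifier outputs by G-mean-weighted voting. It is not the authors' implementation. It uses plain Lloyd's K-means rather than the improved variant, sets K to the minority class size instead of deriving it from class densities, and omits the base-classifier selection stage. All function names are hypothetical, labels are assumed to be 0 (majority) and 1 (minority), and ties in the vote go to class 0.

```python
import math
import random


def kmeans_centers(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means (not the paper's improved variant); returns k centers."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)  # initialise from k sampled points
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: math.dist(p, centers[j]))
            clusters[nearest].append(p)
        # Move each center to its cluster mean; keep the old center if the cluster is empty.
        centers = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centers[j]
            for j, members in enumerate(clusters)
        ]
    return centers


def balanced_subset(majority, minority, k=None, seed=0):
    """Replace the majority class by k cluster centers and append all minority samples."""
    k = len(minority) if k is None else k  # assumption: K equals the minority class size
    centers = kmeans_centers(majority, k, seed=seed)
    X = centers + [tuple(p) for p in minority]
    y = [0] * len(centers) + [1] * len(minority)
    return X, y


def g_mean(y_true, y_pred):
    """Geometric mean of the true positive rate and the true negative rate."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    pos = sum(t == 1 for t in y_true)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))


def weighted_vote(predictions, weights):
    """Combine 0/1 predictions from several classifiers, one weight per classifier."""
    result = []
    for i in range(len(predictions[0])):
        pos = sum(w for preds, w in zip(predictions, weights) if preds[i] == 1)
        neg = sum(w for preds, w in zip(predictions, weights) if preds[i] == 0)
        result.append(1 if pos > neg else 0)  # ties go to the majority class
    return result
```

In use, each base-classifier would be trained on a subset from `balanced_subset` with a different seed, scored with `g_mean` on held-out data, and that score passed as its weight to `weighted_vote`.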
Related Results
Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
BACKGROUND
As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...
CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The Covid-19 pandemic currently demands that teachers be able to use technology in the teaching and learning process. But in reality there are still many teachers who have not been able ...
Optimasi Data Tidak Seimbang pada Interaksi Drug Target dengan Sampling dan Ensemble Support Vector Machine (Optimizing Imbalanced Data in Drug-Target Interaction with Sampling and Ensemble Support Vector Machine)
Imbalanced data is one of the problems that arise in prediction or classification tasks. This study focuses on addressing the problem of imbalanced data ...
Machine Learning Modelling for Imbalanced Dataset: Case Study of Adolescent Obesity in Malaysia
Obesity among adolescents is a public health issue with an increasing burden of disease. Predicting imbalanced health data with Machine Learning may introduce bias and lead to diminish...
Advanced Re-Sampling Techniques for Multi-Class Imbalanced Classification
Imbalanced classification is a common problem in machine learning, where one class significantly outnumbers the others. This imbalance leads to biased model performance, where the ...
Application of Machine Learning Techniques for Customer Churn Prediction in the Banking Sector
Aim/Purpose: Previous studies have primarily focused on comparing predictive models without considering the impact of data preprocessing on model performance. Therefore, this study...
Handling the Imbalanced Problem in Agri-Food Data Analysis
Imbalanced data situations exist in most fields of endeavor. The problem has been identified as a major bottleneck in machine learning/data mining and is becoming a serious issue o...
“Continuous Neural Correlates of Imbalanced Reinforcement Learning in Obsessive-Compulsive Disorder and Healthy Individuals”
Abstract
Aim
Obsessive-compulsive disorder (OCD) is characterized by imbalanced reinforcement learning. This study investigated...

