
Selective Ensemble Learning Algorithm for Imbalanced Dataset

Abstract On an imbalanced dataset, the performance of the base classifiers, the method used to compute their weights, and the method used to select them all strongly affect the performance of the ensemble classifier. To address these issues and improve the generalization performance of the ensemble classifier, a selective ensemble learning algorithm based on cluster under-sampling for imbalanced datasets is proposed. First, the proposed algorithm calculates the number K of under-sampled samples according to the relationship between the class sample densities. Next, an improved K-means clustering algorithm under-samples the majority class and yields K cluster centers. All cluster centers (or the samples nearest to each cluster center) are then taken as the new majority samples and combined with the minority class samples to construct a new balanced training subset. These steps are repeated to generate multiple training subsets and train multiple base classifiers. However, as the number of iterations grows, the number of base classifiers increases, and so does the similarity among them. It is therefore necessary to select, for the ensemble, base classifiers that classify well and differ substantially from one another. In the selection stage, base classifiers are chosen according to their diversity and performance, following the idea of maximum correlation and minimum redundancy. In the ensemble stage, G-mean or F-mean, which evaluates the classification performance of a base classifier on an imbalanced dataset, is used to compute the weight of each base classifier, and the weighted voting method then combines them. Finally, simulation results on the artificial, UCI, and KDDCUP datasets show that the algorithm has good generalization performance on imbalanced datasets, especially on datasets with a high degree of imbalance.
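The pipeline described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions: plain Lloyd's K-means stands in for the paper's improved K-means, K is simply set to the minority class size instead of being derived from class sample density, the nearest-centroid base classifier is a hypothetical placeholder, the base-classifier selection stage is omitted, and the G-mean weights are computed on the training data. G-mean is taken here in its usual sense, the geometric mean of the recall on the minority class and the recall on the majority class. All function names are illustrative.

```python
import math
import random

def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mean_point(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(col) / n for col in zip(*points))

def kmeans_centers(points, k, rng, iters=20):
    """Plain Lloyd's K-means (assumption: not the paper's improved variant)."""
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        centers = [mean_point(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers

def g_mean(y_true, y_pred):
    """Geometric mean of recall on class 1 (minority) and class 0 (majority)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    pos = sum(1 for t in y_true if t == 1)
    neg = len(y_true) - pos
    if pos == 0 or neg == 0:
        return 0.0
    return math.sqrt((tp / pos) * (tn / neg))

def train_centroid_classifier(maj_points, min_points):
    """Hypothetical base classifier: predict the class with the nearer centroid."""
    c0, c1 = mean_point(maj_points), mean_point(min_points)
    return lambda x: 1 if dist2(x, c1) < dist2(x, c0) else 0

def build_ensemble(majority, minority, n_rounds=5, seed=0):
    """Train one base classifier per balanced subset and weight it by G-mean."""
    rng = random.Random(seed)
    k = min(len(minority), len(majority))  # assumption: K = minority class size
    X = majority + minority
    y = [0] * len(majority) + [1] * len(minority)
    ensemble = []
    for _ in range(n_rounds):
        # Cluster centers replace the majority class in the balanced subset.
        centers = kmeans_centers(majority, k, rng)
        clf = train_centroid_classifier(centers, minority)
        weight = g_mean(y, [clf(x) for x in X])
        ensemble.append((clf, weight))
    return ensemble

def weighted_vote(ensemble, x):
    """Return class 1 if its total weight exceeds that of class 0."""
    score1 = sum(w for clf, w in ensemble if clf(x) == 1)
    score0 = sum(w for clf, w in ensemble if clf(x) == 0)
    return 1 if score1 > score0 else 0

if __name__ == "__main__":
    rng = random.Random(1)
    majority = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(200)]
    minority = [(rng.gauss(4, 1), rng.gauss(4, 1)) for _ in range(10)]
    ens = build_ensemble(majority, minority)
    print(weighted_vote(ens, (4.0, 4.0)), weighted_vote(ens, (0.0, 0.0)))
```

Each round draws a different random initialization, so the cluster centers, and with them the balanced subsets and base classifiers, vary between rounds. A selection step in the spirit of the abstract would then keep only the classifiers that score well and disagree most with those already chosen.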

Springer Science and Business Media LLC

Du Hongle, Zhang Yan, Ke Gang, Zhang Lin, Chen Yeh-Cheng

2022



