Selective Ensemble Learning Algorithm for Imbalanced Dataset
Abstract
On imbalanced datasets, the performance of an ensemble classifier depends heavily on three factors: the performance of the base classifiers, the method used to compute their weights, and the method used to select them. To address these factors and improve the generalization performance of the ensemble classifier, a selective ensemble learning algorithm based on cluster under-sampling is proposed for imbalanced datasets. First, the algorithm calculates the number K of under-sampled samples according to the relationship between the class sample densities. Next, an improved K-means clustering algorithm under-samples the majority class and yields K cluster centers. All cluster centers (or the samples nearest to them) are then treated as the new majority samples and combined with the minority class samples to construct a new balanced training subset. These steps are repeated to generate multiple training subsets and train multiple base classifiers. However, as the number of iterations grows, the number of base classifiers increases, and so does the similarity among them. It is therefore necessary to select base classifiers that combine good classification performance with high diversity. In the selection stage, base classifiers are chosen according to their diversity and performance, using the idea of maximum correlation and minimum redundancy. In the ensemble stage, G-mean or F-mean, both of which evaluate the classification performance of a base classifier on imbalanced datasets, is used to compute the weight of each base classifier, and the base classifiers are then combined by weighted voting. Finally, simulation results on an artificial dataset, UCI datasets, and the KDDCUP dataset show that the algorithm achieves good generalization performance on imbalanced datasets, especially those with a high degree of imbalance.
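The cluster under-sampling step can be illustrated with a minimal sketch. It is not the authors' implementation: it uses plain Lloyd's K-means rather than the improved variant the abstract mentions, and it assumes K equals the minority class size because the density-based rule for K is not given here. The function names `kmeans_centers` and `cluster_undersample` are hypothetical.

```python
import numpy as np

def kmeans_centers(X, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means (a stand-in for the paper's improved variant)."""
    rng = np.random.default_rng(seed)
    # Initialise centers from k distinct samples.
    centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Squared distance from every sample to every center: shape (n, k).
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members):  # keep the old center if a cluster is empty
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers

def cluster_undersample(X_maj, X_min, k=None, use_nearest_sample=True, seed=0):
    """Build one balanced training subset from majority and minority samples."""
    # Assumption: K defaults to the minority class size.
    k = len(X_min) if k is None else k
    reps = kmeans_centers(X_maj, k, seed=seed)
    if use_nearest_sample:
        # Replace each center with the real majority sample closest to it.
        d2 = ((X_maj[:, None, :] - reps[None, :, :]) ** 2).sum(axis=2)
        reps = X_maj[d2.argmin(axis=0)]
    X = np.vstack([reps, X_min])
    # Label convention (assumed): 0 = majority, 1 = minority.
    y = np.concatenate([np.zeros(len(reps), dtype=int),
                        np.ones(len(X_min), dtype=int)])
    return X, y
```

Calling `cluster_undersample` repeatedly with different `seed` values is one simple way to obtain several distinct balanced subsets, each of which could train one base classifier.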
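The ensemble stage described in the abstract (G-mean as the weight of each base classifier, followed by weighted voting) can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' code: it assumes binary labels with 1 for the minority class and 0 for the majority class, uses the raw G-mean as the weight without normalisation, and breaks ties in favour of the majority class. The function names are hypothetical.

```python
import numpy as np

def g_mean(y_true, y_pred):
    """Geometric mean of sensitivity (minority recall) and specificity."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    neg = y_true == 0
    sensitivity = (y_pred[pos] == 1).mean() if pos.any() else 0.0
    specificity = (y_pred[neg] == 0).mean() if neg.any() else 0.0
    return float(np.sqrt(sensitivity * specificity))

def weighted_vote(predictions, weights):
    """Combine 0/1 predictions of shape (n_classifiers, n_samples)."""
    predictions = np.asarray(predictions)
    weights = np.asarray(weights, dtype=float)
    # Total weight voting for each class, per sample.
    score_pos = weights @ (predictions == 1).astype(float)
    score_neg = weights @ (predictions == 0).astype(float)
    # Assumption: ties go to class 0 (majority).
    return (score_pos > score_neg).astype(int)

# Toy validation labels and three base-classifier outputs.
y_val = np.array([1, 1, 0, 0, 0, 0])
preds = np.array([
    [1, 1, 0, 0, 0, 0],  # perfect on both classes
    [0, 0, 0, 0, 0, 0],  # always predicts majority, so G-mean is 0
    [1, 0, 0, 0, 1, 0],  # partially correct
])
weights = [g_mean(y_val, p) for p in preds]
ensemble = weighted_vote(preds, weights)
```

A classifier that ignores the minority class receives zero weight under G-mean, which is why this metric suits imbalanced data better than plain accuracy.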
Related Results
Application of Machine Learning Techniques for Customer Churn Prediction in the Banking Sector
Aim/Purpose: Previous studies have primarily focused on comparing predictive models without considering the impact of data preprocessing on model performance. Therefore, this study...
Elevating Security Analysis: The MTLPT Framework for Enhanced Vulnerability Prediction
In the current field of vulnerability prediction, accurate forecasting and identification of potential vulnerabilities in software are crucial, especially when dea...
Initial Experience with Pediatrics Online Learning for Nonclinical Medical Students During the COVID-19 Pandemic
Background: To minimize the risk of infection during the COVID-19 pandemic, the learning mode of universities in China has been adjusted, and the online learning o...
Multivariate Ensemble Sensitivity Analysis for an Extreme Weather Event Over Indian Subcontinent
Ensemble forecasts have proven useful for diagnosing the source of forecast uncertainty in a wide variety of atmospheric systems. Ensemble Sensitivity Analysis (ES...
Heteroscedastic-embedded Ensemble for Imbalanced Massive Data Classification
The imbalanced learning methods aim to learn the unbiased models from massive class-imbalanced datasets. However, due to the uncertainty of data distributions affe...
A Credit Scoring Model Based on Integrated Mixed Sampling and Ensemble Feature Selection: RBR_XGB
With the rapid development of the economy, financial institutions pay more and more attention to the importance of financial credit risk. The XGBoost algorithm is often us...
Adversarial Learning Improves Vision-Based Perception from Drones with Imbalanced Datasets
This work proposes a vision-based perception algorithm that combines image-processing-based detection and tracking of aerial objects with convolutional neural networks (CNNs) integ...
Ensemble learning with imbalanced data handling in the early detection of capital markets
Research aims: This study aims to create an early detection model to predict events in the Indonesian capital market. Design/Methodology/Approach: A quantitative study comparing ens...

