An automated approach for binary classification on imbalanced data
Abstract
Imbalanced data is present in various business areas and must be handled with appropriate resampling techniques and classification algorithms. However, there is a multitude of possible combinations of resampling and learning methods for imbalanced data, and using them correctly requires specialised knowledge. In this paper, several approaches to handling imbalanced data are considered, ranging from more accessible to more advanced, in the domains of data resampling and cost-sensitive techniques. The application developed recommends the most suitable combinations of techniques for a specific dataset by extracting the dataset's meta-feature values and comparing them with those recorded in a knowledge base. It simplifies classification and automates part of the machine learning pipeline, with results comparable to or better than those of a state-of-the-art solution and with a much shorter execution time.
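The recommendation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the meta-features chosen here (log-scaled sample count, log-scaled feature count, imbalance ratio), the Euclidean nearest-neighbour lookup, and every entry in `KNOWLEDGE_BASE` are assumptions made up for illustration. The abstract does not state which meta-features, distance measure, or technique combinations the application uses.

```python
from math import log10, sqrt

# Hypothetical knowledge base: each entry pairs the meta-features of a
# previously analysed dataset with the combination that worked best on it.
# All values and technique names are invented for this sketch.
KNOWLEDGE_BASE = [
    {"meta": {"log_n_samples": 3.0, "log_n_features": 1.0, "imbalance_ratio": 9.0},
     "best_combination": ("SMOTE", "random forest")},
    {"meta": {"log_n_samples": 2.0, "log_n_features": 0.5, "imbalance_ratio": 2.0},
     "best_combination": ("no resampling", "logistic regression")},
    {"meta": {"log_n_samples": 4.0, "log_n_features": 1.5, "imbalance_ratio": 50.0},
     "best_combination": ("random undersampling", "cost-sensitive decision tree")},
]


def meta_features(X, y):
    """Compute a small, illustrative meta-feature vector for a binary dataset.

    X is a list of feature rows, y a list of 0/1 labels.
    """
    n_pos = sum(1 for label in y if label == 1)
    n_neg = len(y) - n_pos
    minority, majority = min(n_pos, n_neg), max(n_pos, n_neg)
    if minority == 0:
        raise ValueError("both classes must be present")
    return {
        "log_n_samples": log10(len(y)),
        "log_n_features": log10(len(X[0])),
        "imbalance_ratio": majority / minority,
    }


def distance(a, b):
    """Euclidean distance between two meta-feature dictionaries."""
    return sqrt(sum((a[key] - b[key]) ** 2 for key in a))


def recommend(query, knowledge_base, k=1):
    """Return the combinations stored for the k most similar known datasets."""
    ranked = sorted(knowledge_base, key=lambda entry: distance(query, entry["meta"]))
    return [entry["best_combination"] for entry in ranked[:k]]


if __name__ == "__main__":
    # Toy dataset: 100 rows, 4 features, 90 negatives and 10 positives.
    X = [[0.0] * 4 for _ in range(100)]
    y = [0] * 90 + [1] * 10
    print(recommend(meta_features(X, y), KNOWLEDGE_BASE, k=1))
```

In this sketch the meta-features are not normalised, so the imbalance ratio dominates the distance. A real system would likely scale each meta-feature before comparing datasets.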
Related Results
BINARY TOPOLOGY BASED ON SOME NEW SETS
In this chapter, we introduce and some new sets called binary -open sets, binary -sets, binary -sets, binary -closed sets, binary -sets and binary -sets , which are simple forms of...
Advanced Re-Sampling Techniques for Multi-Class Imbalanced Classification
Imbalanced classification is a common problem in machine learning, where one class significantly outnumbers the others. This imbalance leads to biased model performance, where the ...
Optimization of Imbalanced Data in Drug-Target Interaction with Sampling and Ensemble Support Vector Machine (Optimasi Data Tidak Seimbang pada Interaksi Drug Target dengan Sampling dan Ensemble Support Vector Machine)
Imbalanced data is one of the problems that arise in prediction or classification tasks. This research focuses on addressing the problem of imbalanced data ...
Solving Imbalance Data Classification Problem by Particle Swarm Optimization Support Vector Machine
A database holds plenty of hidden knowledge, which can be used in decision making to support commerce, research and other activities. Classification analysis performs a very import...
Comparison of Error Rate Prediction Methods in Classification Modeling with the CHAID Method for Imbalanced Data
CHAID (Chi-Square Automatic Interaction Detection) is one of the classification algorithms in the decision tree method. The classification results are displayed in the form of a tr...
Handling Fuzzy Similarity for Data Classification
Representing and consequently processing fuzzy data in standard and binary databases is problematic. The problem is further amplified in binary databases where continuous data is r...
Machine Learning Algorithms for Health Care Data Analytics Handling Imbalanced Datasets
In Machine Learning, classification is considered a supervised learning technique to predict class samples based on labeled data. Classification techniques have been applied to var...
Improving Medical Document Classification via Feature Engineering
Document classification (DC) is the task of assigning the predefined labels to unseen documents by utilizing the model trained on the available labeled documents...

