
IMPLEMENTATION OF DATA LEVEL APPROACH TECHNIQUES TO SOLVE UNBALANCED DATA CASE ON SOFTWARE DEFECT CLASSIFICATION

Defects can cause significant software rework, delays, and high costs, so the likelihood of defects should be predicted in order to prevent them. Software metrics datasets are used for this prediction. NASA MDP is one of the popular software metrics repositories used to predict software defects; it contains 13 datasets, which are generally imbalanced. This imbalance can degrade defect prediction because, with imbalanced data, predictions skew toward the majority class. Data imbalance can be handled with two approaches: data level techniques and algorithm level techniques. Data level techniques aim to improve the class distribution through resampling and data synthesis. This research proposes a data level approach using the resampling techniques Random Oversampling (ROS), Random Undersampling (RUS), Synthetic Minority Oversampling Technique (SMOTE), Tomek Link (TL), and One-Sided Selection (OSS). The resampled data are classified with Naïve Bayes, validated using 10-fold cross-validation, and evaluated with the Area Under the ROC Curve (AUC). Across datasets, the best AUC was obtained on MC2, with a value of 0.7277 using SMOTE. Across data level techniques, the best average AUC was obtained with Tomek Link (TL), with a value of 0.62587.
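
To make the data level idea concrete, the following is a minimal sketch of three of the named techniques (ROS, RUS, and Tomek Link detection) together with a rank-based AUC computation. It is not the authors' implementation: the function names, the toy data, and the use of squared Euclidean distance are assumptions made for illustration, and the Naïve Bayes classifier and 10-fold cross-validation steps are omitted.

```python
import random
from collections import Counter

def random_oversample(X, y, seed=0):
    """ROS: duplicate random rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for _ in range(target - n):
            X_out.append(rng.choice(rows))
            y_out.append(label)
    return X_out, y_out

def random_undersample(X, y, seed=0):
    """RUS: keep a random subset of each class, sized to the smallest class."""
    rng = random.Random(seed)
    counts = Counter(y)
    target = min(counts.values())
    X_out, y_out = [], []
    for label in counts:
        rows = [x for x, lab in zip(X, y) if lab == label]
        for x in rng.sample(rows, target):
            X_out.append(x)
            y_out.append(label)
    return X_out, y_out

def tomek_links(X, y):
    """Return index pairs (i, j) that are mutual nearest neighbours with different labels."""
    def sq_dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    def nearest(i):
        return min((j for j in range(len(X)) if j != i),
                   key=lambda j: sq_dist(X[i], X[j]))
    nn = [nearest(i) for i in range(len(X))]
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and y[i] != y[j]]

def auc(scores, labels, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, lab in zip(scores, labels) if lab == positive]
    neg = [s for s, lab in zip(scores, labels) if lab != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy imbalanced data (hypothetical): 8 non-defective rows, 2 defective rows.
X = [[float(i)] for i in range(10)]
y = [0] * 8 + [1] * 2

print(Counter(random_oversample(X, y)[1]))   # both classes now have 8 rows
print(Counter(random_undersample(X, y)[1]))  # both classes now have 2 rows
print(auc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

In a Tomek Link cleaning step, the majority-class member of each returned pair would typically be dropped before training. When resampling is combined with cross-validation, it is usually applied to the training folds only, so that the AUC is measured on data with the original class distribution.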

Related Results

Hydatid Disease of The Brain Parenchyma: A Systematic Review
Abstract Introduction Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...
Mining Software Repositories for Defect Categorization
Early detection of software defects is very important to decrease the software cost and subsequently increase the software quality. Success of software industries not only depends ...
Breast Carcinoma within Fibroadenoma: A Systematic Review
Abstract Introduction Fibroadenoma is the most common benign breast lesion; however, it carries a potential risk of malignant transformation. This systematic review provides an ove...
Clinical and Radiographic Assessment of Periodontal Infrabony Defect Depth and Width and Their Correlation
Brief Background There is preliminary evidence of periodontal defect depth, number of walls and the width of infrabony defects exerting influence on the regenerative potential of p...
Intelligent Radar Software Defect Classification Approach based on the Latent Dirichlet Allocation Topic Model
Abstract Existing software intelligent defect classification approaches don’t consider radar characters and prior statistics information. Thus when applying these approach...
Ensemble Machine Learning Model for Software Defect Prediction
Software defect prediction is a significant activity in every software firm. It helps in producing quality software by reliable defect prediction, defect elimination, and predictio...
Unbalanced Translocation in a Phenotypically Normal Male Patient Detected by Karyotyping and Array-comparative Genomic Hybridization
Background: Unbalanced translocations can cause developmental delay, intellectual disability, growth problems, dysmorphic features, and congenital anomalies. Unbalanced chromosome re...
Visual software defect prediction method based on improved recurrent criss-cross residual network
Purpose This study aims to solve the problems of large training sample size, low data sample quality, low efficiency of the currently used classical model, high computational compl...
