
Feature Learning via Correlation Analysis for Effective Duplicate Detection

With the growing reliance on software, the frequency of software bugs has increased significantly. To address these issues, users or developers typically submit bug reports, which developers analyze and resolve. However, many submitted bug reports are duplicates of previously reported issues, creating inefficiencies in the bug resolution process. To enhance developer productivity, an automatic method for detecting duplicate bug reports is essential. In this study, we present a novel approach for identifying duplicate and nonduplicate bug reports using feature learning through correlation analysis. Our method utilizes bug report features, including product and component information, extracted from bug repositories. The process begins with preprocessing the bug reports to ensure data quality. Next, a feature selection algorithm identifies relevant features, which are then used to train a machine learning model based on bidirectional encoder representations from transformers (BERT). The proposed model’s effectiveness was evaluated across multiple datasets: Apache, JDT, Platform, KDE, Core, Firefox, and Thunderbird. Our results show detection accuracies of 91.41%, 88.66%, 86.08%, 92.94%, 90.68%, 88.25%, and 91.62%, respectively. These outcomes represent a significant improvement of 32% to 41% compared to baseline models, including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), convolutional LSTMs (CNN-LSTMs), Naive Bayes classifiers, and random forest classifiers. Our findings show that the proposed model is highly effective for duplicate bug report prediction and offers substantial advancements over existing methods. This approach has the potential to streamline bug management processes and improve overall software development efficiency.
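The abstract does not name the correlation measure or the selection rule the authors use. The sketch below is a hypothetical illustration of correlation-based feature selection. It scores each candidate pair-level feature by its Pearson correlation with a binary duplicate label and keeps the features whose absolute correlation meets a threshold. The feature names (`same_product`, `same_os`, `title_similarity`), the 0.5 threshold, and the toy data are assumptions made for illustration. They are not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either input is constant."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant feature carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def select_features(features: Dict[str, List[float]], labels: List[int],
                    threshold: float = 0.5) -> List[str]:
    """Keep features whose |r| with the duplicate label meets the threshold, strongest first."""
    scores = {name: pearson(values, labels) for name, values in features.items()}
    kept = [name for name, r in scores.items() if abs(r) >= threshold]
    return sorted(kept, key=lambda name: abs(scores[name]), reverse=True)


# Hypothetical toy data: one row per bug-report pair; names and values are illustrative only.
labels = [1, 1, 0, 0, 1, 0]  # 1 = duplicate pair, 0 = non-duplicate pair
features = {
    "same_product": [1, 1, 0, 1, 1, 0],                    # r ~ 0.71
    "same_os": [1, 0, 1, 0, 1, 0],                         # r ~ 0.33, dropped
    "title_similarity": [0.9, 0.8, 0.2, 0.3, 0.7, 0.1],    # r ~ 0.96
}
print(select_features(features, labels, threshold=0.5))  # prints ['title_similarity', 'same_product']
```

In a full pipeline, the retained features (for example, product and component fields) would be passed to the downstream classifier along with the report text. Pearson correlation captures only linear relationships, so a different relevance measure could be substituted without changing the overall structure.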

Related Results

Missing values compensation in duplicates detection using hot deck method
Duplicate record is a common problem within data sets especially in huge volume databases. The accuracy of duplicate detection determines the efficiency ...
CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021
The pandemic Covid-19 currently demands teachers to be able to use technology in teaching and learning process. But in reality there are still many teachers who have not been able ...
Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)
BACKGROUND As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...
A Near-Duplicate Video Detection Method Based on Invariant Moments and Feature Point Matching
In this paper, a two-level near-duplicate video detection method based on invariant moment was proposed. To reduce the computational complexity of near-duplicate video detection, a...
From features to functions : leveraging protein feature architectures in comparative genomics
When analyzing genomic data, one of the key challenges is the annotation of new genes. The toolkit for incorporating newly discovered proteins into a comprehensive evolutionary and...
Depth-aware salient object segmentation
Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...
Real-Time Fraud Detection Using Reinforcement Learning with Dynamic Feature Selection
Real-time fraud detection systems require sophisticated approaches capable of adapting to evolving fraud patterns while maintaining high accuracy and minimal false positive rates u...
Advanced frameworks for fraud detection leveraging quantum machine learning and data science in fintech ecosystems
The rapid expansion of the fintech sector has brought with it an increasing demand for robust and sophisticated fraud detection systems capable of managing large volumes of financi...
