Feature Learning via Correlation Analysis for Effective Duplicate Detection

With the growing reliance on software, the frequency of software bugs has increased significantly. To address these issues, users or developers typically submit bug reports, which developers analyze and resolve. However, many submitted bug reports are duplicates of previously reported issues, creating inefficiencies in the bug resolution process. To enhance developer productivity, an automatic method for detecting duplicate bug reports is essential. In this study, we present a novel approach for identifying duplicate and nonduplicate bug reports using feature learning through correlation analysis. Our method utilizes bug report features, including product and component information, extracted from bug repositories. The process begins with preprocessing the bug reports to ensure data quality. Next, a feature selection algorithm identifies relevant features, which are then used to train a machine learning model based on bidirectional encoder representations from transformers (BERT). The proposed model’s effectiveness was evaluated across multiple datasets: Apache, JDT, Platform, KDE, Core, Firefox, and Thunderbird. Our results show detection accuracies of 91.41%, 88.66%, 86.08%, 92.94%, 90.68%, 88.25%, and 91.62%, respectively. These outcomes represent a significant improvement of 32% to 41% compared to baseline models, including convolutional neural networks (CNNs), long short-term memory networks (LSTMs), convolutional LSTMs (CNN-LSTMs), Naive Bayes classifiers, and random forest classifiers. Our findings show that the proposed model is highly effective for duplicate bug report prediction and offers substantial advancements over existing methods. This approach has the potential to streamline bug management processes and improve overall software development efficiency.
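The abstract does not spell out how the correlation-based feature selection step works, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes each pair of bug reports is described by binary match indicators (the hypothetical names `same_product`, `same_component`, and `same_priority`), computes the Pearson correlation of each indicator with the duplicate label, and keeps the indicators whose absolute correlation reaches an assumed threshold of 0.5. The toy data is invented for illustration and is not drawn from any of the evaluated datasets.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no signal; treat it as uncorrelated.
        return 0.0
    return cov / sqrt(var_x * var_y)


def select_features(pairs, labels, threshold=0.5):
    """Keep features whose |correlation| with the label is >= threshold.

    `pairs` is a list of dicts mapping feature name -> numeric value,
    `labels` is a list of 0/1 duplicate labels. The threshold value is an
    assumption for this sketch, not a value taken from the paper.
    """
    selected = {}
    for name in pairs[0]:
        column = [pair[name] for pair in pairs]
        r = pearson(column, labels)
        if abs(r) >= threshold:
            selected[name] = r
    return selected


# Invented toy data: one dict per pair of bug reports (hypothetical features).
pairs = [
    {"same_product": 1, "same_component": 1, "same_priority": 1},
    {"same_product": 1, "same_component": 1, "same_priority": 0},
    {"same_product": 1, "same_component": 0, "same_priority": 1},
    {"same_product": 0, "same_component": 0, "same_priority": 0},
    {"same_product": 0, "same_component": 0, "same_priority": 1},
    {"same_product": 1, "same_component": 1, "same_priority": 0},
]
labels = [1, 1, 0, 0, 0, 1]  # 1 = duplicate pair, 0 = nonduplicate pair

selected = select_features(pairs, labels)
for name, r in selected.items():
    print(f"{name}: r = {r:.3f}")
```

In a pipeline like the one described, the selected fields would then be passed, together with the report text, to the BERT-based classifier; that stage is omitted here because the abstract gives no detail on how the features are encoded.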

MDPI AG

Geunseok Yang, Jinfeng Ji, Taemin Kim

Applied Sciences

2025

Duplicate record is a common problem within data sets especially in huge volume databases. The accuracy of duplicate detection determines the efficiency ...

CREATING LEARNING MEDIA IN TEACHING ENGLISH AT SMP MUHAMMADIYAH 2 PAGELARAN ACADEMIC YEAR 2020/2021

The pandemic Covid-19 currently demands teachers to be able to use technology in teaching and learning process. But in reality there are still many teachers who have not been able ...

Selection of Injectable Drug Product Composition using Machine Learning Models (Preprint)

Background: As of July 2020, a Web of Science search of “machine learning (ML)” nested within the search of “pharmacokinetics or pharmacodynamics” yielded over 100...

Learning retention mechanisms and evolutionary parameters of duplicate genes from their expression data

Learning about the roles that duplicate genes play in the origins of novel phenotypes requires an understanding of how their functions e...

A Near-Duplicate Video Detection Method Based on Invariant Moments and Feature Point Matching

In this paper, a two-level near-duplicate video detection method based on invariant moment was proposed. To reduce the computational complexity of near-duplicate video detection, a...

From features to functions : leveraging protein feature architectures in comparative genomics

When analyzing genomic data, one of the key challenges is the annotation of new genes. The toolkit for incorporating newly discovered proteins into a comprehensive evolutionary and...

Depth-aware salient object segmentation

Object segmentation is an important task which is widely employed in many computer vision applications such as object detection, tracking, recognition, and ret...

Advanced frameworks for fraud detection leveraging quantum machine learning and data science in fintech ecosystems

The rapid expansion of the fintech sector has brought with it an increasing demand for robust and sophisticated fraud detection systems capable of managing large volumes of financi...

Related Results