Evaluating Fine-tuning Approaches for Duplicate Bug Report Detection

Bug reports are artefacts that document defects encountered by users or developers. Rapid testing and release cycles often lead to similar or near-duplicate bug reports, which introduce redundancy, delay triage, and increase maintenance overhead. Prior research has extensively explored automated methods to detect and manage duplicate bug reports. Because bug reports are written in natural language, recent advances in large language models (LLMs), particularly BERT and its successors, offer a promising avenue for improving robustness and accuracy. In this study, we investigate the use of LLMs to identify duplicate bug reports (DBRs), focusing on the impact of fine-tuning an all-mpnet-base-v2 model, which builds on BERT-based architectures while addressing several of their limitations. We fine-tuned the model on large, open-source bug tracking datasets from the Eclipse, OpenOffice, Firefox, and NetBeans projects. Our evaluation shows that fine-tuning yields only marginal performance improvements across all datasets. We also discuss the trade-offs involved in fine-tuning LLMs for this task, including hyperparameter tuning guidelines and the practical challenges posed by computational and financial costs.
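
The abstract does not spell out how the model's output is turned into a duplicate decision. One common setup, assumed here purely for illustration, is to embed each report as a vector and rank existing reports by cosine similarity to a new one. The sketch below shows that ranking step only. The `embed` function is a toy bag-of-words stand-in, not all-mpnet-base-v2, and the report IDs and texts are hypothetical; a real pipeline would swap in dense vectors from a sentence-embedding model.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for a sentence-embedding model (assumption):
    a sparse bag-of-words count vector over lowercase tokens."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty report is similar to nothing
    return dot / (norm_a * norm_b)


def rank_candidates(new_report, existing_reports, top_k=3):
    """Return up to top_k (report_id, score) pairs, most similar first."""
    query = embed(new_report)
    scored = [
        (cosine(query, embed(text)), report_id)
        for report_id, text in existing_reports.items()
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(report_id, score) for score, report_id in scored[:top_k]]


# Hypothetical reports, invented for this example.
existing = {
    "BUG-1": "Editor crashes when opening a large XML file",
    "BUG-2": "Toolbar icons are blurry on high DPI displays",
    "BUG-3": "Spell checker ignores words with hyphens",
}
ranked = rank_candidates("Crash when opening large XML file in editor", existing)
print(ranked[0])  # the most similar existing report and its score
```

In this setup, fine-tuning would change only the embedding function, while the ranking logic stays the same, which is one way to compare a pretrained model against its fine-tuned counterpart on the same data.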

