
Evaluating Fine-tuning Approaches for Duplicate Bug Report Detection

Bug reports are artefacts that document defects encountered by users or developers. Rapid testing and release cycles often lead to the creation of similar or near-duplicate bug reports, which introduce redundancy, delay triage, and increase maintenance overhead. Prior research has extensively explored automated methods to detect and manage duplicate bug reports. Because bug reports are written in natural language, recent advances in large language models (LLMs), particularly BERT and its successors, offer a promising avenue for improving the robustness and accuracy of these methods. In this study, we investigate the use of LLMs to identify duplicate bug reports (DBRs), focusing on the impact of fine-tuning an all-mpnet-base-v2 model, which builds on BERT-based architectures while addressing several of their limitations. We fine-tuned the model using large, open-source bug tracking datasets from the Eclipse, OpenOffice, Firefox, and NetBeans projects. Our evaluation shows that fine-tuning yields only marginal performance improvements across all datasets. We also discuss the trade-offs involved in fine-tuning LLMs for this task, including hyperparameter tuning guidelines and the practical challenges posed by computational and financial cost.
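
The abstract does not describe the retrieval pipeline, so the following is only a minimal sketch of how embedding-based duplicate detection is commonly set up: each report is mapped to a vector, and existing reports are ranked by cosine similarity to the new one. The `embed` function here is a hypothetical bag-of-words stand-in, not the all-mpnet-base-v2 encoder, and the bug IDs and report texts are invented for illustration.

```python
import math
from collections import Counter


def embed(text):
    """Hypothetical stand-in for a sentence encoder: a bag-of-words count vector.

    In a real setup this would be replaced by the embedding model under study.
    """
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_candidates(new_report, existing_reports, top_k=3):
    """Return the top_k (bug_id, similarity) pairs most similar to new_report."""
    query = embed(new_report)
    scored = [
        (cosine(query, embed(text)), bug_id)
        for bug_id, text in existing_reports.items()
    ]
    # Highest similarity first; the sort is stable, so ties keep insertion order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(bug_id, round(score, 3)) for score, bug_id in scored[:top_k]]


# Invented example data (not from any of the datasets named above).
existing = {
    "BUG-1": "editor crashes when opening large xml file",
    "BUG-2": "toolbar icons are missing after update",
    "BUG-3": "crash on opening a large xml document in editor",
}
ranked = rank_candidates("editor crash when opening large xml file", existing, top_k=2)
print(ranked)
```

Under this framing, fine-tuning would change only what `embed` returns; the ranking step stays the same, which is one way to compare a base model against a fine-tuned one on equal terms.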

Related Results

Hydatid Disease of The Brain Parenchyma: A Systematic Review
Abstract Introduction Isolated brain hydatid disease (BHD) is an extremely rare form of echinococcosis. A prompt and timely diagnosis is a crucial step in disease management. This ...
Bug Report Summarization by Using Swarm Intelligence Approaches
Background: Bug reports are considered a reference document during the maintenance phase of the software development process. Developers consult them at whatever point the...
Breast Carcinoma within Fibroadenoma: A Systematic Review
Abstract Introduction Fibroadenoma is the most common benign breast lesion; however, it carries a potential risk of malignant transformation. This systematic review provides an ove...
Effective Bug Triage With Software Reliability
Programming associations spend in excess of 45 percent of cost in overseeing programming bugs. An inevitable progress of settling bugs is bug triage, which wants to precisely dole ...
A Comparative Study of Multilabel Classification Techniques for Analyzing Bug Report Dependencies
Bug report dependency analysis entails identifying and examining the interrelations among software bug reports. Dependencies may indicate that bugs are interconnected, with one bug...
Using CNN to Predict the Resolution Status of Bug Reports
Abstract Bug tracking systems (BTS) are a resource for receiving bug reports that help to improve software applications. They usually contain reports reported by the...
Software Bug Ontology Supporting Bug Search on Peer-to-Peer Networks
This paper presents a semantics-based bug search system that allows users to solve bugs by searching similar bug reports on peer-to-peer networks. This system uses a bug schema to ...
Classification of open source software bug report based on transfer learning
Abstract Currently, the feature richness of text encoding vectors in the bug report classification model based on deep learning is limited by the size of the domain dataset and the ...
