Evaluating Fine-tuning Approaches for Duplicate Bug Report Detection

Bug reports are artefacts that document defects encountered by users or developers. Rapid testing and release cycles often lead to the creation of similar or near-duplicate bug reports, introducing redundancy, delaying triage, and increasing maintenance overhead. Prior research has extensively explored automated methods to detect and manage duplicate bug reports. Because bug reports are written in natural language, recent advances in large language models (LLMs), particularly BERT and its successors, offer a promising avenue for improving the robustness and accuracy of these methods. In this study, we investigate the use of LLMs to identify duplicate bug reports (DBRs), focusing on the impact of fine-tuning an all-mpnet-base-v2 model, which builds on BERT-based architectures while addressing several of their limitations. We fine-tuned the model using large, open-source bug tracking datasets from the Eclipse, OpenOffice, Firefox, and NetBeans projects. Our evaluation shows that fine-tuning yields only marginal performance improvements across all datasets. We also discuss the trade-offs involved in fine-tuning LLMs for this task, including hyperparameter tuning guidelines and the practical challenges posed by computational and financial costs.

Sociedade Brasileira de Computação

Luiz Eduardo Philippi Rosane Robert Einer Mert Yurdakul Francisco Gomes de Oliveira Neto

Anais do XXXIX Simpósio Brasileiro de Engenharia de Software (SBES 2025)

2025
