Investigating the Refactoring Capabilities of Small Open-Weight Language Models

Refactoring is essential for developing maintainable software. Using Large Language Models in software engineering is widespread, but compared to well-established domains such as code generation, reliable refactoring is still relatively underexplored. In this paper, we perform a broad analysis of the refactoring capabilities of small open-weight language models (SLMs) by evaluating 12 models on 3,453 Python programs. Our study focuses on the two defining aspects of refactoring: behavior preservation and code quality improvement. We evaluate these properties using unit tests and various code metrics. Across models ranging from 0.5B to 8B parameters, most models improve code quality. Larger models are more reliable, as they preserve behavior more consistently. Reasoning models often make more significant changes while refactoring. Allowing models to generate reasoning traces improves performance, but only for models larger than 4B. For smaller models, reasoning in fact reduces refactoring reliability. The difficulty of the underlying task affects refactoring performance, with more complex tasks associated with higher failure rates. Our results indicate that current open SLMs can support refactoring tasks, especially larger ones with reasoning capabilities, but they are best used with human oversight.
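The two criteria named in the abstract, behavior preservation checked with unit tests and quality improvement measured with code metrics, can be illustrated with a small Python sketch. This is not the study's evaluation pipeline: the three programs, the test cases, the function name `solve`, and the use of a rough McCabe-style cyclomatic complexity as the single metric are all hypothetical assumptions chosen for illustration.

```python
import ast

# Hypothetical programs: an original and two candidate refactorings.
# All are assumed to define a single function named `solve`.
ORIGINAL = """
def solve(n):
    if n % 2 == 0:
        if n > 0:
            return True
        else:
            return False
    else:
        return False
"""

REFACTORED = """
def solve(n):
    return n % 2 == 0 and n > 0
"""

# A plausible-looking but wrong refactoring (`or` instead of `and`).
BROKEN = """
def solve(n):
    return n % 2 == 0 or n > 0
"""

# Hypothetical unit tests as (argument, expected result) pairs.
TESTS = [(4, True), (3, False), (-2, False), (0, False)]


def load(source, name="solve"):
    """Execute `source` in a fresh namespace and return the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def preserves_behavior(source, tests):
    """True if the program passes every test; any exception counts as a failure."""
    try:
        fn = load(source)
        return all(fn(arg) == expected for arg, expected in tests)
    except Exception:
        return False


def cyclomatic_complexity(source):
    """Rough McCabe-style estimate: 1 + number of decision points in the AST."""
    score = 1
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.If, ast.IfExp, ast.For, ast.While, ast.ExceptHandler)):
            score += 1
        elif isinstance(node, ast.comprehension):
            # the comprehension loop itself plus each `if` filter
            score += 1 + len(node.ifs)
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` contributes one branch per extra operand
            score += len(node.values) - 1
    return score


def evaluate(original, candidate, tests):
    """Check behavior preservation and report the change in the complexity estimate."""
    return {
        "passes_tests": preserves_behavior(candidate, tests),
        "complexity_delta": cyclomatic_complexity(candidate) - cyclomatic_complexity(original),
    }


if __name__ == "__main__":
    for label, candidate in [("refactored", REFACTORED), ("broken", BROKEN)]:
        print(label, evaluate(ORIGINAL, candidate, TESTS))
```

The `BROKEN` candidate shows why both checks matter: it lowers the complexity estimate exactly as much as the correct refactoring, but it fails the unit tests, so it would not count as a valid refactoring. Note that `exec` runs arbitrary code; a real harness evaluating model output would need to sandbox it.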

Related Results

Hubungan Perilaku Pola Makan dengan Kejadian Anak Obesitas (The Relationship Between Eating Behavior and the Incidence of Obesity in Children)
A Systematic Literature Review on Software-Refactoring Techniques, Challenges, and Practices
Abstract: Software-refactoring improves the quality and reduces the complexity during the whole life cycle of the software system. The objective of this work is to elicit th...
Refactoring for Java-Structured Concurrency
Structured concurrency treats multiple tasks running in different threads as a single unit, thereby improving reliability and enhancing observability. The existing IDE (Integrated ...
A Task-driven Grammar Refactoring Algorithm
This paper presents our proposal and the implementation of an algorithm for automated refactoring of context-free grammars. Rather than operating under some domain-specific task, i...
Učinak poučavanja razrednomu jeziku u izobrazbi nastavnika njemačkoga (The Effect of Teaching Classroom Language in German Teacher Education)
The actual use of classroom language is principally limited to the classroom environment. As far as foreign language learning is concerned, the classroom often turns out to be the ...
[RETRACTED] Prima Weight Loss Dragons Den UK v1
[RETRACTED]Prima Weight Loss Dragons Den UK :-Obesity is a not kidding medical issue brought about by devouring an excessive amount of fat, eating terrible food sources, and practi...
The Study on Software Architecture Smell Refactoring
Abstract: Maintenance and complexity issues in software development continue to increase because of new requirements and software evolution, and refactoring is required to h...