Investigating the Refactoring Capabilities of Small Open-Weight Language Models

Refactoring is essential for developing maintainable software. Large Language Models are widely used in software engineering, but compared to well-established domains such as code generation, reliable refactoring is still relatively underexplored. In this paper, we perform a broad analysis of the refactoring capabilities of small open-weight language models (SLMs) by evaluating 12 models on 3,453 Python programs. Our study focuses on the two defining aspects of refactoring: behavior preservation and code quality improvement. We evaluate these properties using unit tests and various code metrics. Across models ranging from 0.5B to 8B parameters, most models improve code quality. Larger models are more reliable, as they preserve behavior more consistently. Reasoning models often make more significant changes while refactoring. Allowing models to generate reasoning traces improves performance, but only for models larger than 4B. For smaller models, reasoning in fact reduces refactoring reliability. The difficulty of the underlying task also affects refactoring performance, with more complex tasks associated with higher failure rates. Our results indicate that current open SLMs can support refactoring tasks, especially the larger models with reasoning capabilities, but they are best used with human oversight.
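As an illustration of the two properties named in the abstract, the following minimal Python sketch checks behavior preservation with a handful of test cases and compares a rough complexity proxy before and after a refactoring. It is not the authors' evaluation pipeline: the example programs, the test cases, the helper names (`load_function`, `preserves_behavior`, `branch_count`), and the choice of branch counting as the metric are all assumptions made for illustration.

```python
import ast

# Hypothetical example programs: an original and a refactored version.
ORIGINAL = '''
def count_positive(xs):
    n = 0
    for x in xs:
        if x > 0:
            n = n + 1
        else:
            pass
    return n
'''

REFACTORED = '''
def count_positive(xs):
    return sum(1 for x in xs if x > 0)
'''

# Hypothetical unit tests: (argument, expected result) pairs.
TESTS = [([1, -2, 3], 2), ([], 0), ([-1, -5], 0)]


def load_function(source, name):
    """Execute the source in a fresh namespace and return the named function."""
    namespace = {}
    exec(source, namespace)
    return namespace[name]


def preserves_behavior(source, name, tests):
    """Return True if the function passes every test case without raising."""
    func = load_function(source, name)
    try:
        return all(func(arg) == expected for arg, expected in tests)
    except Exception:
        return False


# AST node types treated as decision points (an assumed, simplified proxy
# for cyclomatic complexity, not a standard metric implementation).
BRANCH_NODES = (ast.If, ast.For, ast.While, ast.BoolOp,
                ast.ExceptHandler, ast.IfExp, ast.comprehension)


def branch_count(source):
    """Rough complexity proxy: 1 + number of branching nodes in the AST."""
    tree = ast.parse(source)
    return 1 + sum(isinstance(node, BRANCH_NODES) for node in ast.walk(tree))


if __name__ == "__main__":
    print("behavior preserved:",
          preserves_behavior(REFACTORED, "count_positive", TESTS))
    print("complexity before/after:",
          branch_count(ORIGINAL), branch_count(REFACTORED))
```

Under this sketch, a refactoring counts as successful only if the refactored program still passes the tests and its metric does not get worse. A real evaluation would run the tests in an isolated process and use established metric tools rather than this simplified counter.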

MDPI AG

Tamás Márton, Balázs Szalontai, Balázs Pintér, Tibor Gregorics

2026
