
Empirical evaluation of feature selection and machine learning techniques to recommend clones for software refactoring

This article addresses the management of software clones. Software clones are duplicate code fragments that can exist in the same or different software files. Software clone detection and management has become a well-established research area. Clones should be managed to minimize their ill effects, as their presence can increase a software system's maintenance cost and resource requirements. Refactoring is a commonly used technique for managing clones. A clone detection tool can detect many clones in a software system, but not all detected clones are suitable for refactoring. A developer needs the subset of detected clones that can be easily refactored. This study aims to suggest software clones for refactoring using machine learning techniques. It evaluates the performance of fourteen machine learning algorithms and investigates the influence of three feature selection methods on clone recommendation accuracy. The tasks to be solved are as follows: selecting appropriate features from the datasets, developing machine learning-based models that can suggest suitable clones for refactoring, and selecting an effective combination of machine learning and feature selection algorithms for recommending clones for refactoring. The feature selection methods used are correlation, InfoGain, and ReliefF. The study is conducted on datasets from six open-source software systems written in Java. The experimental results show that the Decision Tree and LogitBoost classifiers achieve the highest accuracy, 94.44%, on the Lucene dataset. ReliefF yields the best performance among the feature selection methods, particularly when used with the Decision Tree algorithm. The study concludes that Random Committee, Random Forest, and Decision Tree perform best when paired with correlation, InfoGain, and ReliefF, respectively. Overall, the Decision Tree classifier combined with the ReliefF feature selection method delivers the highest average precision, recall, and F-score across datasets.
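The abstract names InfoGain as one of the feature selection methods and reports precision, recall, and F-score. As a rough illustration only, the sketch below ranks discrete features by information gain and computes those three metrics in plain Python. The feature names (`same_file`, `size`), the labels (`refactor`, `skip`), and the toy rows are hypothetical and are not taken from the study's datasets or tooling.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def info_gain(feature_values, labels):
    """Information gain of one discrete feature with respect to the labels."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    # Weighted entropy that remains after splitting on the feature.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_features(rows, labels):
    """Return (feature names by decreasing gain, dict of gains)."""
    gains = {name: info_gain([row[name] for row in rows], labels)
             for name in rows[0]}
    return sorted(gains, key=gains.get, reverse=True), gains


def precision_recall_f1(y_true, y_pred, positive="refactor"):
    """Precision, recall, and F-score for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical toy data: each row describes one detected clone.
rows = [
    {"same_file": "yes", "size": "small"},
    {"same_file": "yes", "size": "large"},
    {"same_file": "no", "size": "small"},
    {"same_file": "no", "size": "large"},
]
labels = ["refactor", "refactor", "skip", "skip"]

ranking, gains = rank_features(rows, labels)
print(ranking, gains)
```

In this toy set, `same_file` separates the two classes perfectly and `size` carries no information, so a selector that keeps the top-ranked features would retain only `same_file`. Continuous metrics would need discretization first, which this sketch does not cover.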

National Aerospace University - Kharkiv Aviation Institute

Manpreet Kaur, Dhavleesh Rattan, Madan Lal

Radioelectronic and Computer Systems

2025



