
Explaining the Imperfect: How do LLMs Respond to Smelly Code?

Code smells, indicators of suboptimal design or implementation, contribute to technical debt by impairing software comprehensibility and maintainability. While Large Language Models (LLMs) can define code smells accurately when prompted, their ability to handle smelly code appropriately when generating explanations is not well understood, despite the prevalence of smells in training datasets. This study empirically investigates how two distinct LLM types, general-purpose chat applications (e.g., GPT-4o, GLM-4) and open-source code-specific models (e.g., CodeT5+, CodeQwen1.5), respond to method-level Java code containing smells. We categorized smells into Structural Complexity, Data/Type Issues, and Expression Clarity, and tasked LLMs with generating summary-length explanations for smelly and non-smelly code. Our multi-faceted evaluation, including n-gram metrics and corpus-level analysis, reveals that both chat LLMs and open code LLMs generally exhibit limited sensitivity to the presence of these code smells in their explanatory outputs. They tend to produce consistent, fluent explanations that often do not significantly differentiate between smelly and non-smelly code. These findings underscore the need for LLMs to develop a more nuanced awareness of diverse code characteristics to effectively assist developers in understanding and addressing code quality issues.
Institute of Electrical and Electronics Engineers (IEEE)
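
The n-gram comparison mentioned in the abstract can be illustrated with a small sketch. The abstract does not say which n-gram metrics the authors used, so the clipped n-gram precision below is only an assumed stand-in. The function names (`ngrams`, `ngram_overlap`) and both example explanations are hypothetical and invented for illustration. A score near 1.0 means the explanation of the smelly method is almost indistinguishable, at the word-sequence level, from the explanation of the non-smelly one.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_overlap(text_a, text_b, n=2):
    """Clipped n-gram precision of text_a against text_b (0.0 to 1.0).

    Assumption: whitespace tokenization and lowercasing; the paper's
    actual preprocessing and metrics are not given in the abstract.
    """
    a = ngrams(text_a.lower().split(), n)
    b = ngrams(text_b.lower().split(), n)
    total = sum(a.values())
    if total == 0:
        return 0.0
    # Count each n-gram of text_a at most as often as it occurs in text_b.
    matched = sum(min(count, b[gram]) for gram, count in a.items())
    return matched / total


# Hypothetical explanations, not taken from the study.
clean_explanation = "Returns the total price of all items in the cart"
smelly_explanation = "Returns the total price of all items in the order"

score = ngram_overlap(smelly_explanation, clean_explanation, n=2)
print(f"bigram overlap: {score:.2f}")  # prints "bigram overlap: 0.89"
```

Here eight of the nine bigrams in the second explanation also occur in the first, which is the kind of high overlap that would indicate limited sensitivity to the smell.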

