Explaining the Imperfect: How do LLMs Respond to Smelly Code?
Code smells are indicators of suboptimal design or implementation that contribute to technical debt, impairing software comprehensibility and maintainability. While Large Language Models (LLMs), particularly Chat LLMs like GPT-4o, are increasingly adopted by the developer community, their ability to handle such code imperfections remains under-explored. Existing studies show that LLMs can accurately define code smells when explicitly queried. However, it is unclear whether this knowledge translates into their code explanation capabilities. In this study, we perform a detailed empirical analysis of both general-purpose Chat LLMs (GPT-4o, GLM-4) and specialized code LLMs (CodeT5+, CodeQwen1.5) to investigate their sensitivity to code smells. We tasked these models with generating method-level summaries for Java code containing smells across three categories: Structural Complexity, Data/Type Smells, and Expression Clarity. Our results reveal that Chat LLMs exhibit limited sensitivity to code smells, often prioritizing fluent and polite explanations over identifying underlying quality issues. Both generalist and specialized models tend to generate consistent summaries regardless of the presence of smells, effectively masking potential code risks. We conclude that future LLMs require a more nuanced awareness of diverse code characteristics to effectively assist developers in code comprehension and maintenance.
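To make the setup concrete, here is a minimal Java sketch, not taken from the study's dataset, that pairs a hypothetical smelly method with a behavior-equivalent cleaner version. The class and method names, the rates and thresholds, and the assignment of each smell to one of the three categories are assumptions made purely for illustration.

```java
// Hypothetical example: both methods compute the same shipping cost.
final class SmellDemo {

    // Smelly variant (illustrative assumptions):
    //  - nested conditionals        -> Structural Complexity
    //  - int used as a type code    -> Data/Type Smell
    //  - magic numbers, terse names -> Expression Clarity
    static double calc(double w, int t) {
        double r = 0;
        if (w > 0) {
            if (t == 1) {
                if (w > 20) {
                    r = w * 1.5 + 10;
                } else {
                    r = w * 1.5;
                }
            } else {
                if (t == 2) {
                    r = w * 2.5;
                }
            }
        }
        return r;
    }

    // Cleaner variant: an enum replaces the int type code, named constants
    // replace the magic numbers, and guard clauses flatten the nesting.
    enum Service { STANDARD, EXPRESS }

    static final double STANDARD_RATE = 1.5;
    static final double EXPRESS_RATE = 2.5;
    static final double HEAVY_THRESHOLD_KG = 20;
    static final double HEAVY_SURCHARGE = 10;

    static double shippingCost(double weightKg, Service service) {
        if (weightKg <= 0) {
            return 0;
        }
        if (service == Service.EXPRESS) {
            return weightKg * EXPRESS_RATE;
        }
        double cost = weightKg * STANDARD_RATE;
        return weightKg > HEAVY_THRESHOLD_KG ? cost + HEAVY_SURCHARGE : cost;
    }
}
```

A method-level summary such as "computes a shipping cost from the weight and the service type" describes both variants equally well. A summary of that kind is fluent and accurate yet says nothing about the quality issues in the first method, which is the smell-insensitive behavior the abstract reports.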
Related Results
Exploring Large Language Models Integration in the Histopathologic Diagnosis of Skin Diseases: A Comparative Study
Introduction: The exact manner in which large language models (LLMs) will be integrated into pathology is not yet fully comprehended. This study examines the accuracy, bene...
Perspectives and Experiences With Large Language Models in Health Care: Survey Study (Preprint)
Background: Large language models (LLMs) are transforming how data is used, including within the health care sector. However, frameworks including the Unifie...
Perspectives and Experiences With Large Language Models in Health Care: Survey Study
Background: Large language models (LLMs) are transforming how data is used, including within the health care sector. However, frameworks including the Unified Th...
A Systematic Review of ChatGPT and Other Conversational Large Language Models in Healthcare
Background: The launch of the Chat Generative Pre-trained Transformer (ChatGPT) in November 2022 has attracted public a...
Joint Beamforming and Aerial IRS Positioning Design for IRS-assisted MISO System with Multiple Access Points
Intelligent reflecting surface (IRS) is a promising concept for 6G wireless communications...
LLMs and AI: Understanding Its Reach and Impact
Large Language Models (LLMs) have revolutionized the field of Artificial Intelligence with their ability to understand and generate natural language discourse. This has led to the ...

