
Explaining the Imperfect: How do LLMs Respond to Smelly Code?


Code smells are indicators of suboptimal design or implementation that contribute to technical debt, impairing software comprehensibility and maintainability. While Large Language Models (LLMs), particularly Chat LLMs like GPT-4o, are increasingly adopted by the developer community, their ability to handle such code imperfections remains under-explored. Existing studies show that LLMs can accurately define code smells when explicitly queried. However, it is unclear whether this knowledge translates into their code explanation capabilities. In this study, we perform a detailed empirical analysis of both general-purpose Chat LLMs (GPT-4o, GLM-4) and specialized code LLMs (CodeT5+, CodeQwen1.5) to investigate their sensitivity to code smells. We tasked these models with generating method-level summaries for Java code containing smells across three categories: Structural Complexity, Data/Type Smells, and Expression Clarity. Our results reveal that Chat LLMs exhibit limited sensitivity to code smells, often prioritizing fluent and polite explanations over identifying underlying quality issues. Both generalist and specialized models tend to generate consistent summaries regardless of the presence of smells, effectively masking potential code risks. We conclude that future LLMs require a more nuanced awareness of diverse code characteristics to effectively assist developers in code comprehension and maintenance.
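To make the setup concrete, the following is a minimal sketch of the kind of input pair such a study compares: one Java method written with smells and a cleaner counterpart that computes the same result. Everything in it is hypothetical and not taken from the paper: the class `SmellyVersusClean`, both method names, the fee values, and the choice of deep nesting and unexplained literals as the smells. Mapping those two smells to the Structural Complexity and Expression Clarity categories is an assumption, since the abstract does not list the individual smells in each category.

```java
public class SmellyVersusClean {

    // Smelly variant (illustrative): deeply nested conditionals,
    // duplicated expressions, and unexplained numeric literals.
    static double shippingFeeSmelly(double weightKg, boolean express) {
        double fee;
        if (weightKg > 0) {
            if (weightKg <= 5) {
                if (express) {
                    fee = 4.0 * 2;
                } else {
                    fee = 4.0;
                }
            } else {
                if (express) {
                    fee = (4.0 + (weightKg - 5) * 0.5) * 2;
                } else {
                    fee = 4.0 + (weightKg - 5) * 0.5;
                }
            }
        } else {
            fee = 0;
        }
        return fee;
    }

    // Named constants replace the literals used above (values are made up).
    private static final double BASE_FEE = 4.0;
    private static final double BASE_WEIGHT_KG = 5.0;
    private static final double FEE_PER_EXTRA_KG = 0.5;
    private static final double EXPRESS_MULTIPLIER = 2.0;

    // Cleaner variant: a guard clause and a single formula.
    // It matches the smelly variant for the finite weights checked in main.
    static double shippingFeeClean(double weightKg, boolean express) {
        if (weightKg <= 0) {
            return 0.0;
        }
        double extraKg = Math.max(0.0, weightKg - BASE_WEIGHT_KG);
        double fee = BASE_FEE + extraKg * FEE_PER_EXTRA_KG;
        return express ? fee * EXPRESS_MULTIPLIER : fee;
    }

    public static void main(String[] args) {
        double[] weights = {0.0, 3.0, 5.0, 8.0};
        for (double w : weights) {
            for (boolean express : new boolean[] {false, true}) {
                double smelly = shippingFeeSmelly(w, express);
                double clean = shippingFeeClean(w, express);
                System.out.println(w + " kg, express=" + express
                        + ": smelly=" + smelly + ", clean=" + clean);
            }
        }
    }
}
```

A summary such as "computes a shipping fee from weight and an express flag" describes both methods equally well. The abstract's finding is that models tend to produce this kind of summary for either variant, without remarking on the nesting or the unexplained literals in the first one.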

Institute of Electrical and Electronics Engineers (IEEE)

Zhengyi Zhuo, Yan Liu, Xindi Yang

