Explaining the Imperfect: How do LLMs Respond to Smelly Code?

Code smells, indicators of suboptimal design or implementation, contribute to technical debt by impairing software comprehensibility and maintainability. While Large Language Models (LLMs) can define code smells accurately when prompted, their ability to handle smelly code appropriately when generating explanations is not well understood, despite the prevalence of smells in training datasets. This study empirically investigates how two distinct types of LLMs (general-purpose chat applications such as GPT-4o and GLM-4, and open-source code-specific models such as CodeT5+ and CodeQwen1.5) respond to method-level Java code containing smells. We categorized smells into Structural Complexity, Data/Type Issues, and Expression Clarity, and tasked LLMs with generating summary-length explanations for smelly and non-smelly code. Our multi-faceted evaluation, including N-gram metrics and corpus-level analysis, reveals that both chat LLMs and open code LLMs generally exhibit limited sensitivity to the presence of these code smells in their explanatory outputs. They tend to produce consistent, fluent explanations that often do not differentiate significantly between smelly and non-smelly code. These findings underscore the need for LLMs to develop a more nuanced awareness of diverse code characteristics so that they can effectively assist developers in understanding and addressing code quality issues.
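To make the idea of an N-gram comparison concrete, here is a minimal sketch of one such metric: clipped bigram precision between two explanations. The abstract does not say which N-gram metrics the study used, so the function names, the choice of bigrams, and the example sentences below are all assumptions made for illustration, not the authors' evaluation pipeline.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a multiset (Counter) of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def ngram_precision(candidate, reference, n=2):
    """Clipped n-gram precision of `candidate` against `reference`.

    Assumption: whitespace tokenization and lowercasing; a real
    evaluation would likely use a proper tokenizer and several n.
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram counts at most as often as it occurs in the reference.
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


# Hypothetical explanations, invented for illustration only.
clean_explanation = "returns the total price of all items in the cart"
smelly_explanation = "returns the total price of all items in the cart"
different_explanation = "computes a discount using several nested conditions"

same_score = ngram_precision(smelly_explanation, clean_explanation)
diff_score = ngram_precision(different_explanation, clean_explanation)
```

Under this sketch, a score near 1 for explanations of a smelly method and its non-smelly counterpart would correspond to the limited sensitivity the abstract reports, while a low score would indicate that the explanation changed with the code.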

Institute of Electrical and Electronics Engineers (IEEE)

Zhengyi Zhuo, Yan Liu, Xindi Yang
