Systematic Evaluation of AI-Generated Python Code: A Comparative Study across Progressive Programming Tasks

View through CrossRef
Springer Science and Business Media LLC
Title: Systematic Evaluation of AI-Generated Python Code: A Comparative Study across Progressive Programming Tasks
Description:
Abstract Background: AI-based code assistants are increasingly adopted in software development as powerful tools that promise to streamline code generation and improve code quality.
However, their effectiveness varies widely, and understanding their strengths and weaknesses is essential to using them well.
Introduction: This study evaluates the capabilities of four prominent AI-based code assistants: GitHub Copilot, Microsoft Copilot, Tabnine, and ChatGPT.
It examines whether the code they produce is functional, efficient, and maintainable, and where it leaves room for improvement.
Methodology: The AI-generated code was compared in terms of correctness, McCabe complexity (cyclomatic complexity), efficiency, and code size.
The correctness percentage was the proportion of generated solutions that ran without errors, McCabe complexity captured structural complexity, execution performance represented efficiency, and code size was measured in lines of code (an illustrative sketch of how such metrics might be collected appears after this description).
All AI tools were benchmarked against a standard set of 100 prompts to ensure like-for-like assessment.
Results: GitHub Copilot had the highest correctness at 42%, and ChatGPT generated the most complex code, measured at a McCabe complexity score of 2.92.
In terms of efficiency, ChatGPT also led, with the largest number of solutions meeting the "good" criterion.
On average, Tabnine produced the shortest code, whereas GitHub Copilot and ChatGPT were among the most verbose.
The analysis revealed that although AI-based assistants can generate high-quality code, their output often differs substantially from solutions written by developers, and they struggle to handle dependencies between classes.
Conclusion: AI-based code assistants have substantial potential for improving code generation and software development efficiency.
However, challenges remain, particularly in handling complex dependencies and producing ready-to-use code.
The study suggests that leveraging the strengths of different assistants and focusing on enhancing their ability to manage complex coding scenarios could lead to significant advancements.
Ongoing research and development are essential to address these limitations and fully harness the potential of AI-based code assistants in software development.
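
For illustration only, here is a minimal Python sketch of how the four metrics named in the methodology (correctness, McCabe-style complexity, execution time, and lines of code) could be collected for a single generated snippet. It is not the evaluation harness used in the study: the complexity function is a rough approximation that counts branching AST nodes rather than a full McCabe computation, and the factorial prompt and its test are hypothetical.

import ast
import time
from typing import Callable, Dict

def lines_of_code(source: str) -> int:
    """Code size: number of non-blank lines in the generated snippet."""
    return sum(1 for line in source.splitlines() if line.strip())

def approx_mccabe(source: str) -> int:
    """Rough McCabe-style score: 1 + count of branching constructs.
    A dedicated analysis tool would be used for the real measurement."""
    branch_nodes = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                    ast.With, ast.BoolOp, ast.IfExp)
    return 1 + sum(isinstance(n, branch_nodes) for n in ast.walk(ast.parse(source)))

def evaluate_snippet(source: str, test: Callable[[dict], None]) -> Dict[str, object]:
    """Collect correctness, complexity, runtime, and size for one generated snippet."""
    metrics: Dict[str, object] = {
        "loc": lines_of_code(source),
        "complexity": approx_mccabe(source),
    }
    namespace: dict = {}
    start = time.perf_counter()
    try:
        exec(source, namespace)   # load the AI-generated code
        test(namespace)           # per-prompt check; raises if the output is wrong
        metrics["correct"] = True
    except Exception:
        metrics["correct"] = False
    metrics["runtime_s"] = time.perf_counter() - start
    return metrics

# Hypothetical prompt: "write a factorial function".
generated = """
def factorial(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
"""

def factorial_test(ns: dict) -> None:
    assert ns["factorial"](5) == 120

print(evaluate_snippet(generated, factorial_test))

In the actual study, each of the 100 benchmark prompts would presumably pair a generated solution with its own correctness check, with per-snippet results aggregated for each assistant.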

Related Results

Primerjalna književnost na prelomu tisočletja
In a comprehensive and at times critical manner, this volume seeks to shed light on the development of events in Western (i.e., European and North American) comparative literature ...
Basic and Advance: Phython Programming
"This book will introduce you to the python programming language. It's aimed at beginning programmers, but even if you have written programs before and just want to add python to y...
Do evidence summaries increase health policy‐makers' use of evidence from systematic reviews? A systematic review
This review summarizes the evidence from six randomized controlled trials that judged the effectiveness of systematic review summaries on policymakers' decision making, or the most...
Evaluating the Science to Inform the Physical Activity Guidelines for Americans Midcourse Report
Abstract The Physical Activity Guidelines for Americans (Guidelines) advises older adults to be as active as possible. Yet, despite the well documented benefits of physical a...
PYTHON POWERED INTELLIGENCE AND ML
Python Powered Intelligence And ML is designed to be your essential companion in your journey through the world of Artificial Intelligence and Python programming. We understand th...
Code Plagiarism Checking Function and Its Application for Code Writing Problem in Java Programming Learning Assistant System
A web-based Java programming learning assistant system (JPLAS) has been developed for novice students to study Java programming by themselves while enhancing code reading and code ...
