Systematic Evaluation of AI-Generated Python Code: A Comparative Study across Progressive Programming Tasks

View through CrossRef
Springer Science and Business Media LLC
Title: Systematic Evaluation of AI-Generated Python Code: A Comparative Study across Progressive Programming Tasks
Description:
Abstract Background: AI-based code assistants are increasingly adopted in software development as powerful tools that promise to streamline code generation and improve code quality.
However, their effectiveness varies widely, and understanding their strengths and weaknesses is essential to using them well.
Introduction: This study evaluates the capabilities of four prominent AI-based code assistants: GitHub Copilot, Microsoft Copilot, Tabnine, and ChatGPT.
It examines whether the code they produce is functional, efficient, and maintainable, and where it leaves room for improvement.
Methodology: The AI-generated code was compared in terms of correctness, McCabe complexity (cyclomatic complexity), efficiency, and code size.
The correctness percentage was the proportion of generated solutions that ran without errors, McCabe complexity captured structural complexity, execution performance represented efficiency, and code size was measured in lines of code (an illustrative sketch of how such metrics might be collected appears after this description).
All AI tools were benchmarked against a standard set of 100 prompts to ensure like-for-like assessment.
Results: GitHub Copilot had the highest correctness at 42%, and ChatGPT generated the most complex code, measured at a McCabe complexity score of 2.92.
In terms of efficiency, ChatGPT also led, with the largest number of solutions meeting the "good" criterion.
On average, Tabnine produced the shortest code, whereas GitHub Copilot and ChatGPT were among the most verbose.
The analysis revealed that although AI-based assistants can generate high-quality code, their output often differs substantially from solutions written by developers, and they struggle to handle dependencies between classes.
Conclusion: AI-based code assistants have substantial potential for improving code generation and software development efficiency.
However, challenges remain, particularly in handling complex dependencies and producing ready-to-use code.
The study suggests that leveraging the strengths of different assistants and focusing on enhancing their ability to manage complex coding scenarios could lead to significant advancements.
Ongoing research and development are essential to address these limitations and fully harness the potential of AI-based code assistants in software development.
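
For illustration only, here is a minimal Python sketch of how the four metrics named in the methodology (correctness, McCabe-style complexity, execution time, and lines of code) could be collected for a single generated snippet. It is not the evaluation harness used in the study: the complexity function is a rough approximation that counts branching AST nodes rather than a full McCabe computation, and the factorial prompt and its test are hypothetical.

import ast
import time
from typing import Callable, Dict

def lines_of_code(source: str) -> int:
    """Code size: number of non-blank lines in the generated snippet."""
    return sum(1 for line in source.splitlines() if line.strip())

def approx_mccabe(source: str) -> int:
    """Rough McCabe-style score: 1 + count of branching constructs.
    A dedicated analysis tool would be used for the real measurement."""
    branch_nodes = (ast.If, ast.For, ast.While, ast.ExceptHandler,
                    ast.With, ast.BoolOp, ast.IfExp)
    return 1 + sum(isinstance(n, branch_nodes) for n in ast.walk(ast.parse(source)))

def evaluate_snippet(source: str, test: Callable[[dict], None]) -> Dict[str, object]:
    """Collect correctness, complexity, runtime, and size for one generated snippet."""
    metrics: Dict[str, object] = {
        "loc": lines_of_code(source),
        "complexity": approx_mccabe(source),
    }
    namespace: dict = {}
    start = time.perf_counter()
    try:
        exec(source, namespace)   # load the AI-generated code
        test(namespace)           # per-prompt check; raises if the output is wrong
        metrics["correct"] = True
    except Exception:
        metrics["correct"] = False
    metrics["runtime_s"] = time.perf_counter() - start
    return metrics

# Hypothetical prompt: "write a factorial function".
generated = """
def factorial(n):
    result = 1
    for i in range(2, n + 1):
        result *= i
    return result
"""

def factorial_test(ns: dict) -> None:
    assert ns["factorial"](5) == 120

print(evaluate_snippet(generated, factorial_test))

In the actual study, each of the 100 benchmark prompts would presumably pair a generated solution with its own correctness check, with per-snippet results aggregated for each assistant.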

Related Results

Primerjalna književnost na prelomu tisočletja
In a comprehensive and at times critical manner, this volume seeks to shed light on the development of events in Western (i.e., European and North American) comparative literature ...
Basic and Advance: Phython Programming
"This book will introduce you to the python programming language. It's aimed at beginning programmers, but even if you have written programs before and just want to add python to y...
Do evidence summaries increase health policy‐makers' use of evidence from systematic reviews? A systematic review
This review summarizes the evidence from six randomized controlled trials that judged the effectiveness of systematic review summaries on policymakers' decision making, or the most...
Evaluating the Science to Inform the Physical Activity Guidelines for Americans Midcourse Report
Abstract The Physical Activity Guidelines for Americans (Guidelines) advises older adults to be as active as possible. Yet, despite the well documented benefits of physical a...
PYTHON POWERED INTELLIGENCE AND ML
Python Powered Intelligence And ML is designed to be your essential companion in your journey through the world of Artificial Intelligence and Python programming. We understand th...
Code Plagiarism Checking Function and Its Application for Code Writing Problem in Java Programming Learning Assistant System
A web-based Java programming learning assistant system (JPLAS) has been developed for novice students to study Java programming by themselves while enhancing code reading and code ...
