Understanding Code Quality: A Qualitative Evaluation of LLM-Generated vs. Human-Written Code.

Bibliographic Details
Title: Understanding Code Quality: A Qualitative Evaluation of LLM-Generated vs. Human-Written Code.
Authors: Naqvi, Abiha; Jain, Apeksha; Goyal, Avisha; Verma, Ankita (ankita.verma@mail.jiit.ac.in)
Source: International Journal of Performability Engineering. Oct 2025, Vol. 21, Issue 10, pp. 559-571. 13 pp.
Subjects: Artificial intelligence, Computer software quality control, Code generators, Computer software development, Python programming language, Software measurement, C++
Abstract: As Large Language Models (LLMs) like GPT and Gemini become increasingly integrated into software development, understanding their capabilities and limitations is essential. This project evaluates the effectiveness of these models in code generation by comparing AI-generated code to human-written code in C++ and Python. Key software quality metrics, including cyclomatic complexity, lines of code, and space and time complexity, are used to assess the performance, efficiency, and readability of the generated code. The study also examines how prompt complexity, analyzed at two distinct levels, influences the quality of code produced by the models. By highlighting the strengths and weaknesses of LLMs in handling programming tasks of varying difficulty, this research provides valuable insights for developers, researchers, and industry professionals. The findings aim to inform best practices for integrating AI assistance into development workflows, ensuring a balance between automation and human oversight. Ultimately, this work contributes to more efficient and maintainable coding practices in an AI-augmented development landscape. [ABSTRACT FROM AUTHOR]
Copyright of International Journal of Performability Engineering is the property of Totem Publisher, Inc. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
ISSN: 0973-1318
DOI: 10.23940/ijpe.25.10.p3.559571