Comprehending C codes with LLMs: Effective comment generation through retrieval and reasoning.
| Title: | Comprehending C codes with LLMs: Effective comment generation through retrieval and reasoning. |
|---|---|
| Authors: | Majumdar, Srijoni [1,2] (AUTHOR) s.majumdar@leeds.ac.uk; Deshpande, Adwita [3] (AUTHOR) adwita.deshpande.22033@iitgoa.ac.in; Das, Partha Pratim [4] (AUTHOR) partha.das@ashoka.edu.in; Chakrabarti, Partha Pratim [2] (AUTHOR) ppchak@cse.iitkgp.ac.in |
| Source: | Pattern Recognition Letters. Jan. 2026, Vol. 199, pp. 295-302 (8 pp.). |
| Subjects: | C (Computer program language), Language models, Evaluation methodology, Software maintenance, Annotations, Natural language processing, Machine learning |
| Abstract: | Software maintenance requires substantial time for program comprehension. Code comments significantly improve understandability by providing a glass-box view of the code and are thus essential for maintainability. Prior work has analyzed comment attributes, built automated systems to detect irrelevant comments, and applied machine learning to generate meaningful comments. With the rise of large language models, comment generation has accelerated, particularly for Java and Python. In this paper, we present a first-of-its-kind framework for code comment generation in C, a language widely used in low-level tasks. We explore the effectiveness of few-shot learning, retrieval-augmented generation, and code-structure-based context modeling. Our work builds on prior field studies conducted across seven companies in India and the UK, resulting in a dataset of 20,206 human-annotated C comments rated for usefulness. By 2024, contributions from 40 academic teams and 50 hackathon groups expanded this dataset to 24,578 comments. We further introduce a reusable evaluation framework involving human experts and large language model evaluators, grounded in eight dimensions derived from four industry case studies. A subset of 11,797 comments has been annotated for the presence or absence of these dimensions, serving as both input for generation and evaluation. Our results show that GPT-4o mini-trained models produce comments most aligned with human-annotated ones, achieving a similarity score of 0.64, followed by Gemini 1.5 at 0.58. GPT-4.5 achieves the highest alignment with humans as an evaluator, while Llama-3.1-70b performs the lowest. |
| Highlights: | • Generic RAG and source-code-based architecture for comment generation in C. • Evaluation with human and LLM critics for assessing and improving generated comments. • 11.7K code comments with annotated categories relevant for code comprehension. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Pattern Recognition Letters is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=189883472 |
| DOI: | 10.1016/j.patrec.2025.10.007 |
| ISSN (print): | 0167-8655 |
| Language: | English |
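
The abstract names retrieval-augmented generation combined with few-shot prompting as one of the explored approaches. As a rough illustration of that general idea only (not the authors' pipeline), the sketch below retrieves the annotated (C code, comment) pairs most similar to a target snippet and assembles them into a few-shot prompt. The tiny corpus, the bag-of-tokens cosine retriever, the prompt wording, and the names `retrieve` and `build_prompt` are all hypothetical stand-ins; the paper's actual retriever, code-structure context, and prompts are not described in this record.

```python
import math
import re
from collections import Counter

# Hypothetical annotated corpus of (C snippet, human-written comment) pairs.
# Invented for illustration; NOT taken from the paper's dataset.
CORPUS = [
    ("for (i = 0; i < n; i++) sum += a[i];",
     "Accumulate the total of the first n array elements."),
    ("fp = fopen(path, \"r\"); if (!fp) return -1;",
     "Open the file for reading; fail early if it cannot be opened."),
    ("p = malloc(n * sizeof(int)); if (p == NULL) exit(1);",
     "Allocate an int buffer of n elements and abort on allocation failure."),
]

# Identifiers, integer literals, or any single non-space character.
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def tokenize(code: str) -> Counter:
    """Split C source into a bag of identifier, number and punctuation tokens."""
    return Counter(TOKEN_RE.findall(code))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-token count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, corpus, k: int = 2):
    """Return the k corpus pairs whose code is most similar to the query snippet."""
    q = tokenize(query)
    ranked = sorted(corpus, key=lambda pair: cosine(q, tokenize(pair[0])), reverse=True)
    return ranked[:k]


def build_prompt(query: str, corpus, k: int = 2) -> str:
    """Assemble a few-shot prompt: retrieved examples first, target snippet last."""
    parts = ["Write a concise, useful comment for the final C snippet."]
    for code, comment in retrieve(query, corpus, k):
        parts.append(f"Code:\n{code}\nComment: {comment}")
    parts.append(f"Code:\n{query}\nComment:")
    return "\n\n".join(parts)


if __name__ == "__main__":
    target = "buf = malloc(len * sizeof(char)); if (buf == NULL) return NULL;"
    print(build_prompt(target, CORPUS, k=1))
```

For the `malloc` target above, the allocation example ranks first because it shares `malloc`, `sizeof`, and `NULL`, so it becomes the single in-context example. Plain token overlap is used only to keep the sketch self-contained; a practical system would more plausibly use learned embeddings or the code-structure context the abstract mentions.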