Comprehending C codes with LLMs: Effective comment generation through retrieval and reasoning.

Bibliographic Details
Title: Comprehending C codes with LLMs: Effective comment generation through retrieval and reasoning.
Authors: Majumdar, Srijoni [1,2] (s.majumdar@leeds.ac.uk); Deshpande, Adwita [3] (adwita.deshpande.22033@iitgoa.ac.in); Das, Partha Pratim [4] (partha.das@ashoka.edu.in); Chakrabarti, Partha Pratim [2] (ppchak@cse.iitkgp.ac.in)
Source: Pattern Recognition Letters, Jan. 2026, Vol. 199, pp. 295-302 (8 pp.).
Subjects: C (Computer program language), Language models, Evaluation methodology, Software maintenance, Annotations, Natural language processing, Machine learning
Abstract: Software maintenance requires substantial time for program comprehension. Code comments significantly improve understandability by providing a glass-box view of the code and are thus essential for maintainability. Prior work has analyzed comment attributes, built automated systems to detect irrelevant comments, and applied machine learning to generate meaningful comments. With the rise of large language models, comment generation has accelerated, particularly for Java and Python. In this paper, we present a first-of-its-kind framework for code comment generation in C, a language widely used in low-level tasks. We explore the effectiveness of few-shot learning, retrieval-augmented generation, and code-structure-based context modeling. Our work builds on prior field studies conducted across seven companies in India and the UK, resulting in a dataset of 20,206 human-annotated C comments rated for usefulness. By 2024, contributions from 40 academic teams and 50 hackathon groups had expanded this dataset to 24,578 comments. We further introduce a reusable evaluation framework involving human experts and large language model evaluators, grounded in eight dimensions derived from four industry case studies. A subset of 11,797 comments has been annotated for the presence or absence of these dimensions, serving as input for both generation and evaluation. Our results show that GPT-4o mini-trained models produce comments most aligned with human-annotated ones, achieving a similarity score of 0.64, followed by Gemini 1.5 at 0.58. As an evaluator, GPT-4.5 achieves the highest alignment with humans, while Llama-3.1-70b performs the lowest.
• Generic RAG and source-code-based architecture for comment generation in C.
• Evaluation with human and LLM critics for assessing and improving generated comments.
• 11.7K code comments with annotated categories relevant for code comprehension.
[ABSTRACT FROM AUTHOR]
Copyright of Pattern Recognition Letters is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Accession Number: 189883472
Publication Type: Academic Journal
Permalink: https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=189883472
DOI: 10.1016/j.patrec.2025.10.007
Language: English
ISSN (print): 0167-8655