Key-based data augmentation with curriculum learning for few-shot code search.

Bibliographic Details
Title: Key-based data augmentation with curriculum learning for few-shot code search.
Authors: Zhang, Fan (fanzhang@hnu.edu.cn); Peng, Manman (pengmanman@hnu.edu.cn); Wu, Qiang (wuqiang@hnu.edu.cn); Shen, Yuanyuan (shenyuanyuan@hnu.edu.cn)
Source: Neural Computing & Applications. Jan 2025, Vol. 37, Issue 3, pp. 1475-1490 (16 pages).
Subjects: Domain-specific programming languages, Data augmentation, Programming languages, Curriculum frameworks, Natural languages
Abstract: Given a natural language query, code search aims to find matching code snippets in a codebase. Recent work is designed mainly for mainstream programming languages with large amounts of training data. However, code search is also needed for domain-specific programming languages, which have less training data, and labeling a large amount of training data for each domain-specific language is a heavy burden. To this end, we propose DAFCS, a data augmentation framework with curriculum learning for few-shot code search tasks. Specifically, we first collect unlabeled code in the same programming language as the original code, which can provide additional semantic signals for the original code. Second, we employ an occlusion-based method to identify key statements in code fragments. Third, we design a set of new key-based augmentation operations for the original code. Finally, we use curriculum learning to schedule the augmented samples for training well-performing models. We conduct retrieval experiments on a public dataset and find that DAFCS surpasses state-of-the-art methods by 5.42% and 5.05% on the Solidity and SQL domain-specific languages, respectively. Our study shows that DAFCS, which adopts data augmentation and curriculum learning strategies, can achieve promising performance in few-shot code search tasks. [ABSTRACT FROM AUTHOR]
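Illustrative note: the abstract names an occlusion-based method for finding key statements and a curriculum that schedules augmented samples, but gives no implementation details. The Python sketch below shows only the general idea of each technique under loud assumptions: the bag-of-tokens `embed` function stands in for a learned encoder, the importance score is taken to be the drop in query-code similarity when one statement is removed, and the curriculum is taken to be a simple easy-to-hard sort over precomputed difficulty values. All function names and the scoring rule are hypothetical and are not taken from DAFCS.

```python
from __future__ import annotations

from collections import Counter
from math import sqrt


def embed(text: str) -> Counter:
    """Toy bag-of-tokens vector (assumption: stands in for a learned encoder)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def occlusion_scores(query: str, statements: list[str]) -> list[float]:
    """Score each statement by how much similarity to the query drops
    when that statement is occluded (removed) from the snippet."""
    q = embed(query)
    base = cosine(q, embed(" ".join(statements)))
    scores = []
    for i in range(len(statements)):
        occluded = statements[:i] + statements[i + 1:]
        scores.append(base - cosine(q, embed(" ".join(occluded))))
    return scores


def key_statement_index(query: str, statements: list[str]) -> int:
    """Index of the statement whose removal hurts the match the most."""
    scores = occlusion_scores(query, statements)
    return max(range(len(scores)), key=scores.__getitem__)


def curriculum_order(samples: list[tuple[str, float]]) -> list[str]:
    """Easy-to-hard schedule: sort samples by a precomputed difficulty
    value (assumption: lower means easier)."""
    return [sample for sample, _ in sorted(samples, key=lambda pair: pair[1])]


if __name__ == "__main__":
    query = "select user names from users table"
    statements = ["SELECT name", "FROM users", "WHERE age > 18"]
    print(occlusion_scores(query, statements))
    print(key_statement_index(query, statements))
    print(curriculum_order([("aug_a", 0.9), ("aug_b", 0.1), ("aug_c", 0.5)]))
```

In a real system the similarity would come from a trained query-code model, and the difficulty measure would need to be defined by the training setup; neither is specified in this record.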
Copyright of Neural Computing & Applications is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Publication Type: Academic Journal
Accession Number: 182466861
DOI: 10.1007/s00521-024-10670-9
ISSN: 0941-0643 (print)
Language: English
Publication Date: 21 January 2025
Permalink: https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=182466861