Key-based data augmentation with curriculum learning for few-shot code search.
| Title: | Key-based data augmentation with curriculum learning for few-shot code search. |
|---|---|
| Authors: | Zhang, Fan (fanzhang@hnu.edu.cn); Peng, Manman (pengmanman@hnu.edu.cn); Wu, Qiang (wuqiang@hnu.edu.cn); Shen, Yuanyuan (shenyuanyuan@hnu.edu.cn) |
| Source: | Neural Computing & Applications. Jan. 2025, Vol. 37, Issue 3, pp. 1475-1490 (16 pages). |
| Subjects: | Domain-specific programming languages, Data augmentation, Programming languages, Curriculum frameworks, Natural languages |
| Abstract: | Given a natural language query, code search aims to find matching code snippets from a codebase. Recent works are mainly designed for mainstream programming languages with large amounts of training data. However, code search is also needed for domain-specific programming languages, which have fewer training data, and it is a heavy burden to label a large amount of training data for each domain-specific language. To this end, we propose DAFCS, a data augmentation framework with curriculum learning for few-shot code search tasks. Specifically, we first collect unlabeled codes in the same programming language as the original codes, which can provide additional semantic signals to the original codes. Second, we employ an occlusion-based method to identify key statements in code fragments. Third, we design a set of new key-based augmentation operations for the original codes. Finally, we use curriculum learning to reasonably schedule augmented samples for training well-performing models. We conduct retrieval experiments on a public dataset and find that DAFCS surpasses state-of-the-art methods by 5.42% and 5.05% in the Solidity and SQL domain-specific languages, respectively. Our study shows that DAFCS, which adopts data augmentation and curriculum learning strategies, can achieve promising performance in few-shot code search tasks. [ABSTRACT FROM AUTHOR] |
| Copyright: | Copyright of Neural Computing & Applications is the property of Springer Nature and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
| DOI: | 10.1007/s00521-024-10670-9 |
| ISSN: | 0941-0643 (print) |
| Publication date: | 21 Jan 2025 |
| Publication type: | Academic Journal |
| Language: | English |
| Accession number: | 182466861 |
| Permalink: | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=182466861 |
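
The abstract names two steps that are easier to picture with a small example: occlusion-based identification of key statements, and curriculum scheduling of augmented samples. This record does not say how DAFCS scores statements or measures sample difficulty, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. The token-overlap `similarity` function stands in for a trained code search model, and the difficulty value (number of augmentation edits) is hypothetical.

```python
import re


def tokens(text):
    """Lowercase word tokens; a toy stand-in for a learned encoder (assumption)."""
    return set(re.findall(r"[a-z_]+", text.lower()))


def similarity(query, code):
    """Jaccard overlap between query and code tokens (assumed scoring model)."""
    q, c = tokens(query), tokens(code)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


def statement_importance(query, statements):
    """Occlusion: remove each statement in turn and record the score drop."""
    full = similarity(query, "\n".join(statements))
    drops = []
    for i in range(len(statements)):
        occluded = statements[:i] + statements[i + 1:]
        drops.append(full - similarity(query, "\n".join(occluded)))
    return drops


def key_statement(query, statements):
    """Index of the statement whose removal hurts the match the most."""
    drops = statement_importance(query, statements)
    return max(range(len(statements)), key=drops.__getitem__)


def curriculum_order(samples, difficulty):
    """Easy-to-hard ordering of training samples by a difficulty function."""
    return sorted(samples, key=difficulty)


query = "select user names ordered by age"
statements = ["SELECT name FROM users", "WHERE active = 1", "ORDER BY age"]
key_index = key_statement(query, statements)

# Hypothetical samples: (name, number of augmentation edits applied).
samples = [("aug_b", 3), ("orig", 0), ("aug_a", 1)]
schedule = curriculum_order(samples, difficulty=lambda s: s[1])
```

Here `key_index` points at the statement whose removal lowers the query match the most, and `schedule` lists the unmodified sample before the more heavily edited ones. A real system would replace `similarity` with the retrieval model's own score and pick a difficulty measure suited to its augmentation operations.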