CL-ECDD: A contrastive learning framework for enterprise-level code defect detection.
| Title: | CL-ECDD: A contrastive learning framework for enterprise-level code defect detection. |
|---|---|
| Authors: | Xue, Liyuan (1); Leng, Fangling (1); Zou, Junhao (1); Bao, Yubin (1), baoyubin@cse.neu.edu.cn; Zheng, Zikuan (2); Yu, Ge (1) |
| Source: | Neurocomputing. Jun 2026, Vol. 680, pN.PAG-N.PAG. 1p. |
| Subjects: | Defect tracking (Computer software development), Machine learning, Encoding, Computer software |
| Abstract: | Existing research on AI-based code defect detection primarily relies on open-source code repositories, which struggle to adapt to the practical scenarios and requirements of enterprise-level applications. To address this issue, this paper proposes a novel code defect detection framework, CL-ECDD (Enterprise-Level Code Defect Detection with Momentum and Hard Negative Sampling), based on analyzing enterprise scenario data characteristics and optimized contrastive learning paradigms. The framework adheres to the design principle of "maximizing mutual information relevant to downstream tasks while minimizing redundant mutual information." This work integrates several key techniques. It uses task-oriented data augmentation and a sample pool built with hard negative sampling. Furthermore, it incorporates an auxiliary task for functional feature learning and an attention mechanism for word-level features. These components enhance the model's detection capabilities across word-level, semantic-level, and functional-level granularities, enabling joint modeling of code semantics and business functionalities. Experiments were conducted on the open-source Bug-Fix Pairs (BFP) and real-world industrial scenario datasets. In the industrial scenario, compared to existing baseline models, CL-ECDD achieves an F1 score improvement exceeding 1.5%, with a precision of 97.72% and a recall of 96.41%, significantly validating its effectiveness and robustness in complex enterprise environments. [ABSTRACT FROM AUTHOR] |
| Highlights: | • Proposed a CL-ECDD framework, a contrastive learning framework for enterprise-level code defect detection. • A hard negative sample pool improves contrastive learning effectiveness and stability. • Multi-task learning with functional clustering enhances semantic representation learning. • Self-attention captures local code fragment relations for finer-grained defect detection. • The model achieves 97.72% precision and 96.41% recall on industrial real-world data. |
| Copyright: | Copyright of Neurocomputing is the property of Elsevier B.V. and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.) |
| Database: | Engineering Source |
| PLink | https://search.ebscohost.com/login.aspx?direct=true&site=eds-live&db=egs&AN=192692362 |
| DOI: | 10.1016/j.neucom.2026.133298 |
| ISSN: | 0925-2312 (print) |
| Language: | English |