CL-ECDD: A contrastive learning framework for enterprise-level code defect detection.

Bibliographic Details
Title: CL-ECDD: A contrastive learning framework for enterprise-level code defect detection.
Authors: Xue, Liyuan¹; Leng, Fangling¹; Zou, Junhao¹; Bao, Yubin¹ (baoyubin@cse.neu.edu.cn); Zheng, Zikuan²; Yu, Ge¹
Source: Neurocomputing. Jun 2026, Vol. 680, no pagination given. 1 p.
Subjects: Defect tracking (Computer software development), Machine learning, Encoding, Computer software
Abstract: Existing research on AI-based code defect detection relies primarily on open-source code repositories, and the resulting approaches struggle to adapt to the practical scenarios and requirements of enterprise-level applications. To address this issue, this paper proposes CL-ECDD (Enterprise-Level Code Defect Detection with Momentum and Hard Negative Sampling), a novel code defect detection framework built on an analysis of the characteristics of enterprise data and on an optimized contrastive learning paradigm. The framework follows the design principle of "maximizing mutual information relevant to downstream tasks while minimizing redundant mutual information." It integrates several key techniques: task-oriented data augmentation, a sample pool built with hard negative sampling, an auxiliary task for functional feature learning, and an attention mechanism for word-level features. Together, these components strengthen detection at the word, semantic, and functional levels of granularity, enabling joint modeling of code semantics and business functionality. Experiments were conducted on the open-source Bug-Fix Pairs (BFP) dataset and on real-world industrial datasets. In the industrial scenario, CL-ECDD improves the F1 score by more than 1.5% over existing baseline models, with a precision of 97.72% and a recall of 96.41%, supporting its effectiveness and robustness in complex enterprise environments.
Highlights:
• Proposes CL-ECDD, a contrastive learning framework for enterprise-level code defect detection.
• A hard negative sample pool improves the effectiveness and stability of contrastive learning.
• Multi-task learning with functional clustering enhances semantic representation learning.
• Self-attention captures relations among local code fragments for finer-grained defect detection.
• The model achieves 97.72% precision and 96.41% recall on real-world industrial data.
[ABSTRACT FROM AUTHOR]
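The abstract names three ingredients (a momentum encoder, a pool of hard negatives, and a contrastive objective) without giving their formulas. The following is a generic MoCo-style illustration of how such pieces usually fit together, not the authors' implementation. It assumes an exponential-moving-average key encoder with coefficient m, a fixed-size FIFO pool of past key embeddings, hard negatives chosen as the k pool entries most cosine-similar to the query, and an InfoNCE loss with temperature tau. All names (NegativePool, momentum_update, info_nce) and default values are hypothetical.

```python
import math
from collections import deque
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def momentum_update(key_params: Vector, query_params: Vector, m: float = 0.999) -> List[float]:
    """EMA update (assumed MoCo-style): the key encoder slowly follows the query encoder."""
    return [m * k + (1.0 - m) * q for k, q in zip(key_params, query_params)]


class NegativePool:
    """Hypothetical FIFO pool of past key embeddings used as negatives."""

    def __init__(self, capacity: int):
        # deque with maxlen evicts the oldest entries automatically
        self.items = deque(maxlen=capacity)

    def enqueue(self, keys: Sequence[Vector]) -> None:
        self.items.extend(keys)

    def hardest(self, query: Vector, k: int) -> List[Vector]:
        """Return the k pool entries most similar to the query (the 'hardest' negatives)."""
        return sorted(self.items, key=lambda n: cosine(query, n), reverse=True)[:k]


def info_nce(query: Vector, positive: Vector, negatives: Sequence[Vector], tau: float = 0.07) -> float:
    """InfoNCE: -log of the softmax probability of the positive among positive + negatives."""
    logits = [cosine(query, positive) / tau] + [cosine(query, n) / tau for n in negatives]
    top = max(logits)  # subtract the max for a numerically stable log-sum-exp
    log_sum_exp = top + math.log(sum(math.exp(z - top) for z in logits))
    return log_sum_exp - logits[0]


if __name__ == "__main__":
    pool = NegativePool(capacity=4)
    pool.enqueue([[0.9, 0.1], [-1.0, 0.0], [0.0, 1.0]])
    query, positive = [1.0, 0.0], [1.0, 0.1]
    hard = pool.hardest(query, k=1)  # the pool entry closest to the query
    print(hard, round(info_nce(query, positive, hard), 3))
```

Selecting negatives by similarity raises the loss, and hence the gradient, for near-miss samples, which is the usual motivation for hard negative mining. How CL-ECDD actually constructs and refreshes its sample pool is specified in the paper, not in this sketch.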
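The abstract reports precision and recall on the industrial data but not the resulting F1 value. Assuming the standard binary F1 (the harmonic mean of precision and recall for the defective class), the two reported figures imply an F1 of roughly 97.06%. The paper may use a different averaging scheme, so this is only a consistency check, not a reported result; the helper name f1_score is illustrative.

```python
def f1_score(precision: float, recall: float) -> float:
    """Standard F1: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Industrial-dataset precision and recall from the abstract, as fractions.
p, r = 0.9772, 0.9641
print(f"F1 = {f1_score(p, r):.4f}")
```

Running this prints `F1 = 0.9706`.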
Copyright of Neurocomputing is the property of Elsevier B.V.
Database: Engineering Source
ISSN: 0925-2312
DOI: 10.1016/j.neucom.2026.133298