DP-S3: software defect prediction through feature fusion with syntax trees, program slices and standard features.

Bibliographic Details
Title: DP-S3: software defect prediction through feature fusion with syntax trees, program slices and standard features.
Authors: Chen, Jinfu (AUTHOR), Li, Zhehao (AUTHOR), Xu, Jiaping (AUTHOR), Cai, Saihua (AUTHOR), Sun, Jian (AUTHOR), Sosu, Rexford Nii Ayitey (AUTHOR), Chen, Jiming (AUTHOR)
Source: Computer Journal. Mar 2026, Vol. 69, Issue 3, pp. 470-491. 22 pp.
Subjects: Defect tracking (Computer software development), Software measurement, Java programming language, Long short-term memory, Data analytics
Abstract: Software defect prediction (SDP) is crucial for enhancing software quality and reducing development costs. Prevailing SDP methods often depend on traditional code metrics, which inadequately capture vital semantic information from source code, thereby limiting defect identification accuracy. This paper introduces DP-S3, a novel SDP model that integrates features from abstract syntax trees (ASTs), program slices, and standard metrics. DP-S3 extracts ASTs and program slices, transforming them into vector representations. A hierarchical long short-term memory network then learns semantic features from these vectors, which are combined with standard metrics from the PROMISE repository. A key innovation is our feature fusion strategy employing a channel self-attention mechanism to dynamically weight the three feature sets. We evaluated DP-S3 on seven open-source Java projects from the Apache repository against several state-of-the-art methods. The results demonstrate DP-S3's superior performance, achieving average improvements of up to 3.8% in area under the receiver operating characteristic curve, 4.5% in F1, and 7.5% in Matthews correlation coefficient over baselines, showcasing its effectiveness. Key limitations include its current focus on Java projects and within-project defect prediction. Nevertheless, this work concludes that a synergistic fusion of syntactic, semantic (slice-based), and traditional metric features, guided by attention mechanisms, significantly enhances SDP capabilities and offers a promising direction for future research. [ABSTRACT FROM AUTHOR]
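Note: the abstract reports gains in F1 and Matthews correlation coefficient (MCC). As a reading aid, the sketch below shows how these two metrics are conventionally computed from a binary confusion matrix, using their standard textbook definitions. It is not taken from the paper; the function name `f1_and_mcc` and the counts in the usage line are hypothetical and chosen only for illustration.

```python
import math


def f1_and_mcc(tp, fp, tn, fn):
    """Return (F1, MCC) for a binary confusion matrix.

    tp/fp/tn/fn are counts of true positives, false positives,
    true negatives and false negatives. Both metrics fall back
    to 0.0 when their denominator is zero (a common convention,
    assumed here).
    """
    # F1 is the harmonic mean of precision and recall,
    # which simplifies to 2*TP / (2*TP + FP + FN).
    f1_den = 2 * tp + fp + fn
    f1 = 2 * tp / f1_den if f1_den else 0.0

    # MCC uses all four cells, so it stays informative on the
    # imbalanced data that is typical of defect prediction.
    mcc_den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_den if mcc_den else 0.0
    return f1, mcc


# Hypothetical counts, for illustration only (not results from the paper).
f1, mcc = f1_and_mcc(tp=40, fp=10, tn=45, fn=5)
print(f"F1={f1:.3f} MCC={mcc:.3f}")
```

F1 ignores true negatives, whereas MCC ranges from -1 to 1 and is 0 for a chance-level classifier, which is one reason the two are often reported together.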
Copyright of Computer Journal is the property of Oxford University Press / USA and its content may not be copied or emailed to multiple sites without the copyright holder's express written permission. Additionally, content may not be used with any artificial intelligence tools or machine learning technologies. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
Database: Engineering Source
Description
ISSN: 0010-4620
DOI: 10.1093/comjnl/bxaf126