The Value of Imbalance Ensemble Techniques on Software Defect Prediction.

Bibliographic Details
Title: The Value of Imbalance Ensemble Techniques on Software Defect Prediction.
Authors: Altamimi, Ahmad M. (AUTHOR) a_altamimi@psut.edu.jo; Azzeh, Mohammad (AUTHOR) m.azzeh@psut.edu.jo; Sowan, Bilal (AUTHOR) bilal.sowan@uop.edu.jo
Source: Arabian Journal for Science & Engineering (Springer Science & Business Media B.V.). Mar2026, Vol. 51 Issue 5, p6573-6597. 25p.
Subject Terms: *Ensemble learning, *Defect tracking (Computer software development), *Resampling (Statistics), *Bootstrap aggregation (Algorithms), *Machine learning
Abstract: Software Defect Prediction (SDP) plays a critical role in ensuring software quality by identifying high-risk components prone to defects or bugs. Machine learning classification methods have been widely employed for defect prediction; however, their effectiveness is significantly impacted by class imbalance in training data, where non-defective modules vastly outnumber defective ones. To address this issue, various techniques have been proposed, including data rebalancing (IDRB) methods and Imbalance Ensemble Learning (IEL) approaches. While previous research has explored IDRB techniques, the comparative effectiveness of IEL remains underexamined. Furthermore, no conclusive evidence exists on whether IEL outperforms IDRB across diverse experimental settings. This study systematically evaluates IEL and IDRB techniques through extensive experiments on 38 publicly available defect prediction datasets from diverse domains. The evaluation incorporates nine base learners, five data rebalancing techniques, and six imbalance ensemble methods. Results indicate that IEL techniques generally outperform IDRB approaches, particularly on datasets with high and very high imbalance ratios. Additionally, IEL combined with Bagging and undersampling proves highly effective in severe imbalance scenarios. Conversely, IDRB techniques—especially SMOTE and ROSE—emerge as competitive alternatives for lower imbalance ratios. Given the lack of a definitive trend regarding the optimal IEL technique for different learners, practitioners are advised to select computationally efficient methods that provide accurate results. [ABSTRACT FROM AUTHOR]
Database: Energy & Power Source
ISSN: 2193-567X
DOI: 10.1007/s13369-025-10773-y