An Investigation Towards Resampling Techniques and Classification Algorithms on CM1 NASA PROMISE Dataset for Software Defect Prediction

Dublin Core

Title

Subject

software defect prediction; machine learning; classification algorithm; imbalanced data; resampling

Description

Software defect prediction is a practical approach to improving the quality and efficiency of software testing processes. However, building robust and trustworthy software defect prediction models is challenging because most developers can collect only limited historical data. The inherently imbalanced nature of most software defect datasets poses a further problem. Insight into how to properly construct software defect prediction models on a small yet imbalanced dataset is therefore required. The objective of this study is to provide that insight by investigating and comparing a number of resampling techniques, classification algorithms, and evaluation measurements (metrics) for building software defect prediction models on the CM1 NASA PROMISE dataset, taken as representative of a small yet imbalanced dataset. This study is comparative descriptive research following a positivist (quantitative) approach. Data were collected by observing experiments that combined four categories of resampling techniques (oversampling, undersampling, ensemble, and combined) with three categories of machine learning classification algorithms (traditional, ensemble, and neural network) to predict defective software modules in the CM1 NASA PROMISE dataset. Training was carried out twice: once with 5-fold cross-validation and once with the holdout method, using a 70% training and 30% testing data split.
Our results show that the combined and oversampling techniques have a positive effect on model performance. Among the classification models, ensemble-based algorithms that extend the decision tree classification mechanism, such as Random Forest and eXtreme Gradient Boosting, achieved sufficiently good performance in predicting defective software modules. Regarding the evaluation measurements, the combined and rank-based performance metrics yielded modest variance values and are therefore deemed suitable for evaluating model performance in this context.

Creator

Agung Fatwanto, Muh Nur Aslam, Rebbecah Ndugi, Muhammad Syafrudin

Source

https://jurnal.iaii.or.id/index.php/RESTI/article/view/5910/973

Publisher

Informatics Department, Faculty of Science and Technology, UIN Sunan Kalijaga, Yogyakarta, Indonesia

Date

14-10-2024

Contributor

FAJAR BAGUS W

Format

PDF

Language

ENGLISH

Type

TEXT

Files

5910-Article Text-20659-2-10-20241016 (1).pdf

Collection

Vol 8 No 5 (2024)

Citation

Agung Fatwanto, Muh Nur Aslam, Rebbecah Ndugi, Muhammad Syafrudin, “An Investigation Towards Resampling Techniques and Classification Algorithms on CM1 NASA PROMISE Dataset for Software Defect Prediction,” Repository Horizon University Indonesia, accessed January 26, 2026, https://repository.horizon.ac.id/items/show/10439.