Improving Classification Performance on Imbalanced Stroke Datasets Using Oversampling Techniques

Dublin Core

Title

Improving Classification Performance on Imbalanced Stroke Datasets Using Oversampling Techniques

Subject

borderline-SMOTE; imbalanced data; SMOTE; stroke prediction; XGBoost method

Description

Stroke is the second leading cause of death globally and significantly contributes to long-term disability. While machine learning techniques have been increasingly used for early stroke detection, their performance is often limited by imbalanced data distributions that bias classification outcomes. This study investigates the effectiveness of three oversampling techniques (SMOTE, Borderline-SMOTE, and SVM-SMOTE) in improving stroke classification performance on imbalanced datasets. The oversampling methods are applied to balance the class distributions, and Random Forest and XGBoost classifiers are then trained for stroke prediction. Experimental results show that oversampling substantially improves classification performance, particularly in the Matthews Correlation Coefficient (MCC) and Area Under the Curve (AUC) metrics. Among the tested methods, Borderline-SMOTE yields the best performance, achieving accuracies of 96.45% with Random Forest and 96.41% with XGBoost. Moreover, compared with the same classifiers trained without oversampling, it increases MCC by 87.51% and AUC by 45.40% for Random Forest, and MCC by 76.52% and AUC by 41.81% for XGBoost. These results indicate that Borderline-SMOTE effectively addresses data imbalance, enhances model robustness, and improves the detection of minority stroke cases in classification tasks.
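The two ideas the description relies on, synthetic oversampling of the minority class and the MCC metric, can be illustrated with a minimal sketch. This record does not include the study's code, so everything below is an assumption made for illustration: the function names `smote_sketch` and `mcc` are hypothetical, the toy points are invented, and the interpolation shown is the plain SMOTE idea rather than the Borderline-SMOTE variant the study found best. It uses only the Python standard library.

```python
import math
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the segment between a
    minority sample and one of its k nearest minority neighbours.
    Illustrative only: plain SMOTE-style interpolation, not Borderline-SMOTE."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, return 0.0 when any marginal count is zero
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Invented toy data: 4 minority points, to be topped up to a majority of 10
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sketch(minority, n_new=6)
balanced_minority = minority + new_points
```

Because MCC uses all four cells of the confusion matrix, a classifier that always predicts the majority class scores 0 on it even when its accuracy is high, which is why the metric is informative on imbalanced data. In practice, oversampling is normally applied to the training split only, so that synthetic points do not leak into evaluation.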

Creator

Muhammad Innuddin, Hairani Hairani, M. Thonthowi Jauhari, Lalu Zazuli Azhar Mardedi

Source

https://jurnal.iaii.or.id/index.php/RESTI/article/view/6859/1158

Publisher

Department of Computer Science, Faculty of Engineering, Universitas Bumigora, Mataram, Indonesia

Date

October 26, 2025

Contributor

FAJAR BAGUS W

Format

PDF

Language

ENGLISH

Type

TEXT

Citation

Muhammad Innuddin, Hairani Hairani, M. Thonthowi Jauhari, Lalu Zazuli Azhar Mardedi, “Improving Classification Performance on Imbalanced Stroke Datasets Using Oversampling Techniques,” Repository Horizon University Indonesia, accessed February 9, 2026, https://repository.horizon.ac.id/items/show/10596.