Improving Classification Performance on Imbalanced Stroke Datasets Using Oversampling Techniques
Dublin Core
Title
Improving Classification Performance on Imbalanced Stroke Datasets Using Oversampling Techniques
Subject
Borderline-SMOTE; imbalanced data; SMOTE; stroke prediction; XGBoost method
Description
Stroke is the second leading cause of death globally and a significant contributor to long-term disability. Machine learning techniques are increasingly used for early stroke detection, but their performance is often limited by imbalanced class distributions that bias classification outcomes. This study investigates the effectiveness of three oversampling techniques (SMOTE, Borderline-SMOTE, and SVM-SMOTE) in improving stroke classification performance on imbalanced datasets. The oversampling methods are applied to balance the class distributions, and Random Forest and XGBoost classifiers are then trained for stroke prediction. Experimental results show that oversampling substantially improves classification performance, particularly in the Matthews Correlation Coefficient (MCC) and Area Under the Curve (AUC) metrics. Among the tested methods, Borderline-SMOTE yields the best performance, achieving accuracies of 96.45% with Random Forest and 96.41% with XGBoost. Compared with the baseline without oversampling, it increases MCC by 87.51% and AUC by 45.40% for Random Forest, and MCC by 76.52% and AUC by 41.81% for XGBoost. The results demonstrate that Borderline-SMOTE effectively addresses data imbalance, enhances model robustness, and improves the detection of minority stroke cases in classification tasks.
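The two ideas the description relies on, synthetic oversampling of the minority class and MCC as an imbalance-aware metric, can be illustrated with a minimal sketch. It is not the study's implementation. It shows only the basic SMOTE interpolation step; Borderline-SMOTE additionally restricts oversampling to minority samples near the class boundary. The function names `smote_like_oversample` and `mcc`, the toy data, and the parameter values are all assumptions made for illustration.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours (basic SMOTE idea).
    Hypothetical helper for illustration, not the study's code."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, excluding `base` itself
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(b + gap * (o - b) for b, o in zip(base, other)))
    return synthetic


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


# Toy minority class (assumed data): four 2-D points.
minority = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3)]
new_points = smote_like_oversample(minority, n_new=6)
print(len(new_points), "synthetic minority samples generated")

# A classifier that always predicts "no stroke" on 95 negatives and
# 5 positives: accuracy looks high, but MCC exposes the failure.
tp, tn, fp, fn = 0, 95, 0, 5
print("accuracy:", (tp + tn) / (tp + tn + fp + fn))
print("MCC:", mcc(tp, tn, fp, fn))
```

The second half shows why accuracy alone can mislead on imbalanced data: a model that never predicts the minority class still scores 0.95 accuracy on this toy split, while its MCC is 0.0.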
Creator
Muhammad Innuddin, Hairani Hairani, M. Thonthowi Jauhari, Lalu Zazuli Azhar Mardedi
Source
https://jurnal.iaii.or.id/index.php/RESTI/article/view/6859/1158
Publisher
Department of Computer Science, Faculty of Engineering, Universitas Bumigora, Mataram, Indonesia
Date
October 26, 2025
Contributor
FAJAR BAGUS W
Format
PDF
Language
ENGLISH
Type
TEXT
Citation
Muhammad Innuddin, Hairani Hairani, M. Thonthowi Jauhari, Lalu Zazuli Azhar Mardedi, “Improving Classification Performance on Imbalanced Stroke Datasets Using Oversampling Techniques,” Repository Horizon University Indonesia, accessed February 9, 2026, https://repository.horizon.ac.id/items/show/10596.