Enhancing Stroke Prediction with Logistic Regression and Support Vector Machine Using Oversampling Techniques

Dublin Core

Title

Enhancing Stroke Prediction with Logistic Regression and Support Vector Machine Using Oversampling Techniques

Subject

grid search cross-validation; logistic regression; machine learning; stroke disease; support vector machine

Description

Stroke is a significant health concern that can result in both death and disability, making the early identification of risk factors crucial. Previous studies on stroke prediction have been limited by inadequate handling of class imbalance and by a lack of comprehensive feature selection and parameter optimization, with accuracy rates usually below 80%. This study compares the performance of Logistic Regression (LR) and Support Vector Machine (SVM) algorithms combined with different resampling methods, namely SMOTE, Borderline-SMOTE, ADASYN, Random Over Sampling (ROS), and Random Under Sampling (RUS), on a stroke prediction dataset. Correlation-based feature selection identified age, hypertension, and heart disease as significant predictors. GridSearchCV with 10-fold cross-validation was used for hyperparameter optimization, and performance was evaluated using precision, recall, accuracy, and ROC curves. The results showed that SVM significantly outperformed LR across all sampling methods. SVM + ROS achieved the highest performance, with perfect recall (100%), precision of 97.18%, and accuracy of 98.56% (AUC: 0.9857), whereas SVM + Borderline-SMOTE offered balanced performance, with a recall of 94.99%, precision of 95.06%, and accuracy of 95.17% (AUC: 0.9512). Among the LR models, LR + Borderline-SMOTE performed best, with an accuracy of 84.98% (AUC: 0.8503), above the sub-80% accuracy typical of previous studies. This improved accuracy suggests meaningful clinical benefits: it could reduce missed stroke diagnoses by identifying thousands of additional at-risk patients in large-scale screening programs. Healthcare providers should consider implementing SVM with ROS in critical care settings, where missed stroke cases have severe consequences. SVM with Borderline-SMOTE, by contrast, may be more appropriate for resource-constrained environments.
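As a rough illustration of two ideas named in the description, the sketch below shows what Random Over Sampling (ROS) does to an imbalanced label distribution and how precision, recall, and accuracy are computed from predictions. It is not the authors' code: the helper names (`random_oversample`, `binary_metrics`), the toy rows, and the toy predictions are all hypothetical, and only the Python standard library is used.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen rows of each smaller class until every
    class has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # Rows of this class available for duplication.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall, and accuracy for a binary task (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": (tp + tn) / len(pairs),
    }


# Toy rows of (age, hypertension, heart_disease); values are invented.
rows = [(67, 0, 1), (45, 0, 0), (52, 1, 0), (33, 0, 0),
        (80, 1, 1), (29, 0, 0), (61, 0, 0), (40, 0, 0)]
labels = [1, 0, 0, 0, 1, 0, 0, 0]  # 1 = stroke (minority class)

balanced_rows, balanced_labels = random_oversample(rows, labels, seed=42)
class_counts = Counter(balanced_labels)

# Invented predictions, only to exercise the metric definitions.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 0, 1]
metrics = binary_metrics(y_true, y_pred)

print(class_counts)
print(metrics)
```

In a real pipeline, resampling is normally applied only to the training portion of each cross-validation fold, so that duplicated minority rows do not leak into the data used for evaluation; the description does not state how the study handled this.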

Creator

Syamsul Risal, Fajar Apriyadi, A. Sumardin, Andini Dani Achmad, Annisa Nurul Puteri

Source

https://jurnal.iaii.or.id/index.php/RESTI/article/view/6431/1089

Publisher

Department of Informatics, Universitas Teknologi Akba Makassar, Indonesia

Date

June 22, 2025

Contributor

FAJAR BAGUS W

Format

PDF

Language

ENGLISH

Type

TEXT

Citation

Syamsul Risal, Fajar Apriyadi, A. Sumardin, Andini Dani Achmad, Annisa Nurul Puteri, “Enhancing Stroke Prediction with Logistic Regression and Support Vector Machine Using Oversampling Techniques,” Repository Horizon University Indonesia, accessed January 27, 2026, https://repository.horizon.ac.id/items/show/10528.