A Gaussian Naive Bayes and SMOTE-Based Approach for Predicting
Breast Cancer Aggressiveness in Imbalanced Datasets
Dublin Core
Title
A Gaussian Naive Bayes and SMOTE-Based Approach for Predicting
Breast Cancer Aggressiveness in Imbalanced Datasets
Subject
Breast Cancer, Gaussian Naive Bayes, Classification, SMOTE, Medical Diagnosis, Machine Learning.
Description
Breast cancer remains one of the leading causes of death among women worldwide, making early and accurate detection essential to improving
patient outcomes. This study aims to develop a predictive model for breast cancer aggressiveness using the Gaussian Naive Bayes algorithm on
the Breast Cancer Wisconsin Diagnostic Dataset. The dataset contains 569 instances with 30 numerical features representing various cell
characteristics. Preprocessing steps included data cleaning, label encoding, and Min-Max normalization. The model was evaluated using
accuracy, precision, recall, F1-score, and a confusion matrix. Initially, the model achieved an accuracy of 78.88%; however, the recall for
malignant cases was relatively low at 45.5%, highlighting a critical limitation in detecting aggressive cancer. To address class imbalance and
improve model sensitivity, the Synthetic Minority Oversampling Technique (SMOTE) was applied. While detailed post-SMOTE metrics were
not reported in this version, the approach is expected to enhance recall and F1-score for the malignant class. This research demonstrates the
potential of Gaussian Naive Bayes, combined with data balancing techniques, as a fast and interpretable tool for early breast cancer diagnosis.
Future work will focus on model comparison, cross-validation, and statistical evaluation to improve robustness and reliability.
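The record does not state which software the authors used, so the following is only a minimal illustrative sketch of the steps the description names: Min-Max normalization, SMOTE-style oversampling of the minority class, a Gaussian Naive Bayes classifier, and recall for the malignant class. It uses only the Python standard library and a small synthetic two-feature dataset that is an assumption made for illustration, not the Breast Cancer Wisconsin data. All function and class names (`min_max_scale`, `smote`, `GaussianNaiveBayes`, `recall`) are hypothetical.

```python
import math
import random
from collections import Counter


def min_max_scale(rows):
    """Rescale every feature column to [0, 1] (Min-Max normalization)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    spans = [(max(c) - min(c)) or 1.0 for c in cols]  # guard constant columns
    return [[(v - lo) / sp for v, lo, sp in zip(row, lows, spans)] for row in rows]


def smote(minority, n_new, k=3, rng=None):
    """Create n_new synthetic points, each interpolated between a minority
    sample and one of its k nearest minority neighbours."""
    rng = rng or random.Random(0)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        neighbours = sorted((p for p in minority if p is not base),
                            key=lambda p: math.dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


class GaussianNaiveBayes:
    """Per-class, per-feature Gaussian likelihoods combined with class priors."""

    def fit(self, X, y):
        self.stats = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            cols = list(zip(*rows))
            means = [sum(c) / len(c) for c in cols]
            # Small constant keeps the variance strictly positive.
            variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9
                         for c, m in zip(cols, means)]
            self.stats[label] = (math.log(len(rows) / len(y)), means, variances)
        return self

    def _log_posterior(self, x, label):
        log_prior, means, variances = self.stats[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances))

    def predict(self, X):
        return [max(self.stats, key=lambda lab: self._log_posterior(x, lab))
                for x in X]


def recall(y_true, y_pred, positive):
    """Share of truly positive cases that the model labels positive."""
    tp = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    fn = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if tp + fn else 0.0


# Synthetic, imbalanced toy data (assumption): 0 = benign, 1 = malignant.
rng = random.Random(42)
benign = [[rng.gauss(2.0, 1.0), rng.gauss(2.0, 1.0)] for _ in range(40)]
malignant = [[rng.gauss(4.0, 1.0), rng.gauss(4.0, 1.0)] for _ in range(8)]
X = min_max_scale(benign + malignant)
y = [0] * 40 + [1] * 8

# Oversample the minority class until both classes have 40 samples.
minority = [x for x, t in zip(X, y) if t == 1]
extra = smote(minority, n_new=40 - len(minority), rng=rng)
X_bal, y_bal = X + extra, y + [1] * len(extra)

baseline = GaussianNaiveBayes().fit(X, y)
balanced = GaussianNaiveBayes().fit(X_bal, y_bal)

# Scored on the original toy samples only to keep the sketch short.
print("class counts after oversampling:", Counter(y_bal))
print("malignant recall, original data:", recall(y, baseline.predict(X), 1))
print("malignant recall, after SMOTE:  ", recall(y, balanced.predict(X), 1))
```

In a real evaluation, oversampling would be applied to the training split only and recall would be measured on held-out data; otherwise synthetic points derived from test samples can leak into training. In practice, library implementations such as scikit-learn's `GaussianNB` and imbalanced-learn's `SMOTE` would usually replace the hand-written versions above.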
Creator
Deshinta Arrova Dewi, Tri Basuki Kurniawan
Source
https://ijiis.org/index.php/IJIIS/article/view/250/158
Publisher
INTI International University, Malaysia
Date
January 2025
Contributor
Fajar Bagus W
Format
PDF
Language
English
Type
Text
Files
Collection
Citation
Deshinta Arrova Dewi and Tri Basuki Kurniawan, “A Gaussian Naive Bayes and SMOTE-Based Approach for Predicting
Breast Cancer Aggressiveness in Imbalanced Datasets,” Repository Horizon University Indonesia, accessed January 2, 2026, https://repository.horizon.ac.id/items/show/9726.