TELKOMNIKA Telecommunication, Computing, Electronics and Control
A novel data balancing technique via resampling majority and minority classes toward effective classification
Dublin Core
Title
TELKOMNIKA Telecommunication, Computing, Electronics and Control
A novel data balancing technique via resampling majority and minority classes toward effective classification
Subject
Divide and conquer
Heart disease prediction
Imbalance data handling
Machine learning
Medical informatics
Description
Classification is a predictive modelling task in machine learning (ML) in which a class label is determined for a given example described by predefined features. In tasks such as recognizing handwritten characters, identifying spam, detecting disease, and identifying signals, classification requires training data with many features and labelled instances. In medical informatics, high precision and recall are required in addition to high classifier accuracy. Most real-life datasets are imbalanced, which hampers the overall performance of classifiers. Existing data balancing techniques operate on the whole dataset at once, which sometimes causes overfitting and underfitting. We propose a data balancing technique that follows a divide-and-conquer procedure: the dataset is clustered into several segments, and both oversampling and undersampling are performed on each cluster. Finally, the clusters are joined to build a balanced dataset. We evaluate the technique on sample data from two heart disease datasets: Hungarian and Long Beach. Logistic regression and random forest classifiers serve as the representative ML algorithms. We compare the proposed technique with the existing SMOTE, NearMiss, and SMOTETomek data balancing techniques. Both algorithms perform better on the dataset balanced with the proposed technique. This technique can be the optimal solution for handling imbalanced data.
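The divide-and-conquer idea in the description can be illustrated with a minimal sketch. It is not the authors' implementation, because the abstract does not specify the clustering method or the resampling rules. The function name `balance_by_segments`, the sort-and-chunk segmentation (a stand-in for clustering), the midpoint target size, and oversampling by random duplication (rather than SMOTE-style synthesis) are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def balance_by_segments(rows, labels, n_segments=3, seed=0):
    """Hypothetical sketch: split the data into segments, resample each, rejoin."""
    rng = random.Random(seed)

    # Stand-in for clustering (an assumption): sort by the first feature and
    # cut the ordering into contiguous chunks of roughly equal size.
    order = sorted(range(len(rows)), key=lambda i: rows[i][0])
    size = -(-len(order) // n_segments)  # ceiling division
    segments = [order[i:i + size] for i in range(0, len(order), size)]

    out_rows, out_labels = [], []
    for segment in segments:
        # Group the indices of this segment by class label.
        by_class = defaultdict(list)
        for i in segment:
            by_class[labels[i]].append(i)

        # Assumed target: midpoint between the smallest and largest class.
        counts = [len(idx) for idx in by_class.values()]
        target = (min(counts) + max(counts)) // 2

        for label, idx in by_class.items():
            if len(by_class) == 1:
                chosen = idx  # single-class segment: nothing to balance against
            elif len(idx) > target:
                chosen = rng.sample(idx, target)  # undersample the larger class
            else:
                # Oversample the smaller class by random duplication.
                chosen = idx + rng.choices(idx, k=target - len(idx))
            out_rows.extend(rows[i] for i in chosen)
            out_labels.extend([label] * len(chosen))

    # Joining the resampled segments yields the balanced dataset.
    return out_rows, out_labels
```

Because each segment is resampled separately, the majority class is reduced and the minority class is enlarged locally, rather than by a single global pass over the whole dataset.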
Creator
Mahmudul Hasan, Md. Fazle Rabbi, Md. Nahid Sultan, Adiba Mahjabin Nitu, Md. Palash Uddin
Source
http://journal.uad.ac.id/index.php/TELKOMNIKA
Date
Jul 30, 2023
Contributor
peri irawan
Format
pdf
Language
English
Type
text
Files
Collection
Citation
Mahmudul Hasan, Md. Fazle Rabbi, Md. Nahid Sultan, Adiba Mahjabin Nitu, Md. Palash Uddin, “TELKOMNIKA Telecommunication, Computing, Electronics and Control
A novel data balancing technique via resampling majority and minority classes toward effective classification,” Repository Horizon University Indonesia, accessed February 5, 2025, https://repository.horizon.ac.id/items/show/4646.