Exploring feature selection techniques on Classification Algorithms for
Predicting Type 2 Diabetes at Early Stage

Dublin Core

Title

Exploring feature selection techniques on Classification Algorithms for
Predicting Type 2 Diabetes at Early Stage

Subject

Type 2 diabetes, machine learning, feature selection, feature importance

Description

Predicting Type 2 diabetes (T2D) early is critical for improved care and better outcomes. Accurate and efficient T2D
prediction relies on unbiased, relevant features. In this study, we searched for important features to predict T2D by integrating
machine learning (ML) models for feature selection and classification, using data from 520 individuals who were newly
diagnosed with diabetes or would go on to develop it. We used standard ML classifiers: logistic regression (LR), Gaussian
naive Bayes (NB), decision tree (DT), random forest (RF), support vector machine (SVM) with linear basis function, and
k-nearest neighbors (KNN). We systematically explored one representative feature selection method from each family of
techniques: a statistical filter method (F-score), an entropy-based filter method (mutual information), an ensemble-based
filter method (random forest importance), and a stochastic optimization method (simultaneous perturbation feature selection
and ranking, SpFSR). We used stratified 10-fold cross-validation and assessed discrimination, calibration, and clinical
utility. We attained the highest accuracy, 98%, using RF with the full set of 16 features, then used RF as the wrapper
classifier to select the important features. The combination of SpFSR and RF was the best model: its accuracy did not differ
significantly from that of the full feature set (P-value = 0.26, above the 0.05 threshold). The findings support the
efficiency and usefulness of the proposed method for choosing the most important features of diabetes data: polyuria,
gender, polydipsia, age, itching, sudden weight loss, delayed healing, and alopecia.
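
The workflow summarized in the description can be sketched roughly as follows. This is a minimal illustration, not the authors' code. It assumes scikit-learn and SciPy, uses a synthetic dataset from `make_classification` as a stand-in for the 520-record diabetes data, and assumes a paired t-test for the comparison, since the record does not name the statistical test. The subset size `K = 8` mirrors the eight features listed above. SpFSR is omitted because it is not part of scikit-learn.

```python
from functools import partial

import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import (
    SelectFromModel,
    SelectKBest,
    f_classif,
    mutual_info_classif,
)
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in (assumption): 520 samples, 16 features, binary target.
X, y = make_classification(
    n_samples=520, n_features=16, n_informative=8, n_redundant=2, random_state=0
)

# Stratified 10-fold cross-validation; the fixed seed gives every model the same folds.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

K = 8  # number of features to keep (assumption, matching the eight listed features)


def make_rf():
    """Return a fresh random forest so each pipeline gets its own estimator."""
    return RandomForestClassifier(n_estimators=50, random_state=0)


# One representative selector per filter family named in the description.
selectors = {
    "f_score": SelectKBest(f_classif, k=K),
    "mutual_info": SelectKBest(partial(mutual_info_classif, random_state=0), k=K),
    # threshold=-inf makes max_features the only selection criterion.
    "rf_importance": SelectFromModel(make_rf(), threshold=-np.inf, max_features=K),
}

# Baseline: random forest on the full feature set.
results = {"full": cross_val_score(make_rf(), X, y, cv=cv, scoring="accuracy")}

# Selection happens inside the pipeline, so it is refit on each training fold
# and never sees the held-out fold.
for name, selector in selectors.items():
    pipeline = make_pipeline(selector, make_rf())
    results[name] = cross_val_score(pipeline, X, y, cv=cv, scoring="accuracy")

for name, scores in results.items():
    line = f"{name:14s} mean accuracy = {scores.mean():.3f}"
    if name != "full":
        # Paired test on per-fold accuracies against the full-feature baseline.
        p_value = ttest_rel(results["full"], scores).pvalue
        line += f", paired t-test p = {p_value:.3f}"
    print(line)
```

A P-value above 0.05 in this comparison would indicate that the reduced feature set performs on par with the full set, which is the sense in which the description reports P-value = 0.26 for SpFSR with RF.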

Creator

Mila Desi Anasanti, Khairunisa Hilyati, Annisa Novtariany

Publisher

University College London, London, United Kingdom

Date

31-10-2022

Contributor

Fajar bagus W

Format

PDF

Language

Indonesia

Type

Text

Citation

Mila Desi Anasanti, Khairunisa Hilyati, Annisa Novtariany, “Exploring feature selection techniques on Classification Algorithms for
Predicting Type 2 Diabetes at Early Stage,” Repository Horizon University Indonesia, accessed June 7, 2025, https://repository.horizon.ac.id/items/show/9262.