Diabetes Risk Prediction using Feature Importance Extreme Gradient Boosting (XGBoost)
Dublin Core
Title
Diabetes Risk Prediction using Feature Importance Extreme Gradient Boosting (XGBoost)
Subject
diabetes; prediction; machine learning; xgboost
Description
Diabetes results from impaired function of the pancreas, which produces the hormones insulin and glucagon that regulate
glucose levels in the blood. Diabetes today is not limited to adults; prediabetes has also been identified in children and
adolescents. Early prediction of diabetes makes it easier for doctors and patients to intervene as soon as possible, which
reduces the risk of complications. Medical data from diabetes patients can be used to build a model that helps medical staff
predict and identify diabetes in patients. Various techniques, including machine learning, are used to predict diabetes as
early as possible based on the symptoms that patients experience. Machine learning generates a model from historical data of
diabetic patients, and predictions are then made with that model. This study uses extreme gradient boosting (XGBoost) as the
machine learning technique for predicting diabetes, together with XGBoost feature importance. The dataset is the Early stage
diabetes risk prediction dataset published by the UCI Machine Learning Repository, which has 520 records and 16 attributes.
The resulting XGBoost prediction model is displayed as a tree. The model achieved an accuracy of 98.71% and an F1 score of
98.18%, while the model built on the 10 best attributes selected by XGBoost feature importance achieved an accuracy of 98.72%.
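The two ideas the description relies on, scoring a classifier by accuracy and F1 and keeping only the highest-ranked attributes, can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' code: the function names, the toy labels, and the toy importance scores are all assumptions made for the example, and none of the numbers come from the study. In practice the importance scores would be read from a trained model, for instance the `feature_importances_` attribute of `xgboost.XGBClassifier`.

```python
from typing import Dict, List, Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def f1_score(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def top_k_features(importances: Dict[str, float], k: int) -> List[str]:
    """Names of the k features with the largest importance scores."""
    ranked = sorted(importances.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:k]]


# Toy values for illustration only; they are NOT taken from the study.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 1, 1]
toy_importances = {
    "feature_a": 0.40,
    "feature_b": 0.05,
    "feature_c": 0.35,
    "feature_d": 0.20,
}

acc = accuracy(y_true, y_pred)                    # 6 of 8 labels match
f1 = f1_score(y_true, y_pred)                     # precision = recall = 4/5
selected = top_k_features(toy_importances, k=2)   # two highest-scoring names
```

With a real model, the list returned by `top_k_features` would be used to subset the attribute columns before retraining, which is the comparison the description reports for the best 10 attributes.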
Creator
Kartina Diah Kusuma W, Memen Akbar
Source
http://jurnal.iaii.or.id
Publisher
Professional Organization Ikatan Ahli Informatika Indonesia (IAII)/Indonesian Informatics Experts Association
Date
August 2023
Contributor
Sri Wahyuni
Rights
ISSN Media Electronic: 2580-0760
Format
PDF
Language
English
Type
Text
Files
Collection
Citation
Kartina Diah Kusuma W, Memen Akbar, “Diabetes Risk Prediction using Feature Importance Extreme Gradient Boosting (XGBoost),” Repository Horizon University Indonesia, accessed January 11, 2026, https://repository.horizon.ac.id/items/show/10060.