Development of decision tree classification algorithms in predicting mortality of COVID‐19 patients
Dublin Core
Title
Development of decision tree classification algorithms in predicting mortality of COVID‐19 patients
Subject
Decision tree, CART, C5.0, CHAID, Logistic regression, COVID-19 mortality, Predictive factors
Description
Abstract
Introduction The accurate prediction of COVID-19 mortality risk, taking influencing factors into account, is crucial for guiding effective public policies that alleviate the strain on the healthcare system. This study therefore aimed to assess the efficacy of decision tree algorithms (CART, C5.0, and CHAID) in predicting COVID-19 mortality risk and to compare their performance with that of a logistic regression model.
Methods This retrospective cohort study examined 5080 COVID-19 patients in Babol, a city in northern Iran, who tested positive for the virus by PCR between March 2020 and March 2022. To check the validity of the findings, the data were randomly divided into an 80% training set and a 20% testing set. The prediction models (logistic regression and the decision tree algorithms) were trained on the training data and evaluated on the testing data. Their performance on the test samples was assessed using the ROC curve, sensitivity, specificity, and AUC.
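The evaluation protocol described in the Methods (a random 80/20 split, then threshold-based metrics and AUC on the held-out set) can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' code: the function names `split_80_20`, `threshold_metrics`, and `auc`, the seed value, the 0.5 cut-off, and the toy scores are all hypothetical. Deaths are coded as the positive class (1), and AUC is computed with the rank (Mann-Whitney) formulation, which equals the area under the empirical ROC curve.

```python
import random


def split_80_20(records, seed=2020):
    """Shuffle a copy of the records and cut it into 80% training / 20% testing.

    Simple (unstratified) random split; the seed value is a hypothetical choice.
    """
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    cut = round(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def _ratio(num, den):
    """Return num/den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def threshold_metrics(y_true, y_pred):
    """Confusion-matrix metrics, with 1 = died (positive class) and 0 = survived."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = _ratio(tp, tp + fn)  # share of deaths correctly flagged
    specificity = _ratio(tn, tn + fp)  # share of survivors correctly cleared
    precision = _ratio(tp, tp + fp)
    accuracy = _ratio(tp + tn, len(pairs))
    f_score = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "precision": precision, "accuracy": accuracy, "f_score": f_score}


def auc(y_true, y_score):
    """Area under the ROC curve via the rank (Mann-Whitney) formulation.

    Probability that a randomly chosen death gets a higher predicted risk than a
    randomly chosen survivor; ties count as one half.
    """
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Usage with made-up scores (not study data); 0.5 is an assumed cut-off.
y_true = [1, 0, 0, 1, 0, 0, 0, 1]
y_score = [0.9, 0.2, 0.4, 0.35, 0.1, 0.8, 0.3, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]
print(threshold_metrics(y_true, y_pred))
print(auc(y_true, y_score))  # 0.8
```

Because the positive class is rare (a 7.7% mortality rate), a real analysis might prefer a stratified split so that both sets keep a similar death rate; the abstract only states that the split was random.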
Results The mortality rate among hospitalized COVID-19 patients was 7.7%. Through cross-validation, the CHAID algorithm was found to outperform the other decision tree algorithms and logistic regression in specificity and precision, but not in sensitivity, for predicting the risk of COVID-19 mortality. CHAID achieved a specificity, precision, accuracy, and F-score of 0.98, 0.70, 0.95, and 0.52, respectively. All models identified ICU hospitalization, intubation, age, kidney disease, BUN, CRP, WBC, NLR, O2 saturation, and hemoglobin among the factors influencing the mortality of COVID-19 patients.
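The reported CHAID figures can be cross-checked with a short calculation. Assuming the F-score is the balanced F1 score (the harmonic mean of precision P and sensitivity R), the reported P = 0.70 and F = 0.52 imply a sensitivity of roughly 0.41, which is consistent with the statement that CHAID did not lead in sensitivity. This is a back-of-the-envelope derivation from rounded values, not a figure reported in the abstract.

```latex
F_1 = \frac{2PR}{P + R}
\;\Longrightarrow\;
R = \frac{F_1 P}{2P - F_1}
  = \frac{0.52 \times 0.70}{2 \times 0.70 - 0.52}
  = \frac{0.364}{0.88} \approx 0.41
```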
Conclusions The CART and C5.0 models performed better in sensitivity, whereas CHAID performed better than the other decision tree algorithms in specificity, precision, and accuracy, and showed a slight improvement over logistic regression in predicting the risk of COVID-19 mortality in the study population.
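One reason the three tree algorithms behave differently is that each chooses splits with a different criterion. The sketch below scores a single candidate split of deaths versus survivors under simplified versions of those criteria: Gini impurity reduction (CART), gain ratio (the criterion C5.0 inherits from C4.5), and the Pearson chi-square statistic (CHAID). It is an illustration under stated assumptions, not the study's implementation: the function names and the `[deaths, survivors]` input format are hypothetical, and the full algorithms add pruning, boosting or winnowing (C5.0), and category merging with Bonferroni-adjusted multiway splits (CHAID), none of which is shown here.

```python
import math


def gini(counts):
    """Gini impurity of a node given class counts."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts) if n else 0.0


def entropy(counts):
    """Shannon entropy (bits) of a node given class counts."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c) if n else 0.0


def split_scores(children):
    """Score one candidate split under three simplified criteria.

    children: one [deaths, survivors] count pair per child node (hypothetical format).
    """
    parent = [sum(col) for col in zip(*children)]
    n = sum(parent)
    sizes = [sum(child) for child in children]
    weights = [s / n for s in sizes]

    # CART-style: reduction in Gini impurity from parent to children.
    gini_gain = gini(parent) - sum(w * gini(ch) for w, ch in zip(weights, children))

    # C4.5/C5.0-style: information gain normalised by split information.
    info_gain = entropy(parent) - sum(w * entropy(ch) for w, ch in zip(weights, children))
    split_info = entropy(sizes)
    gain_ratio = info_gain / split_info if split_info else 0.0

    # CHAID-style: Pearson chi-square of the child x outcome contingency table.
    chi2 = 0.0
    for child, size in zip(children, sizes):
        for observed, column_total in zip(child, parent):
            expected = size * column_total / n
            if expected:
                chi2 += (observed - expected) ** 2 / expected
    dof = (len(children) - 1) * (len(parent) - 1)
    # Closed-form p-value only for 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2)) if dof == 1 else None

    return {"gini_gain": gini_gain, "gain_ratio": gain_ratio,
            "chi2": chi2, "p_value": p_value}


# Made-up counts for a binary split (e.g. ICU vs. no ICU); not study data.
print(split_scores([[20, 0], [0, 80]]))    # perfect separation
print(split_scores([[10, 90], [10, 90]]))  # uninformative split
```

Because CHAID judges a split by statistical significance rather than by impurity reduction, it tends to stop splitting earlier and produce more conservative trees, which is one plausible (not confirmed by the abstract) explanation for its higher specificity and lower sensitivity here.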
Keywords Decision tree, CART, C5.0, CHAID, Logistic regression, COVID-19 mortality, Predictive factors
Creator
Zahra Mohammadi‐Pirouz, Karimollah Hajian‐Tilaki, Mahmoud Sadeghi Haddat‐Zavareh, Abazar Amoozadeh and Shabnam Bahrami
Source
https://doi.org/10.1186/s12245-024-00681-7
Date
2024
Contributor
Peri Irawan
Format
pdf
Language
English
Type
text
Citation
Zahra Mohammadi‐Pirouz, Karimollah Hajian‐Tilaki, Mahmoud Sadeghi Haddat‐Zavareh, Abazar Amoozadeh and Shabnam Bahrami, “Development of decision tree classification algorithms in predicting mortality of COVID‐19 patients,” Repository Horizon University Indonesia, accessed April 25, 2026, https://repository.horizon.ac.id/items/show/12400.