TELKOMNIKA Telecommunication, Computing, Electronics and Control
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data classification
Dublin Core
Title
TELKOMNIKA Telecommunication, Computing, Electronics and Control
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data classification
Subject
CSBS
Mixed data
Multi-classification
Naïve Bayes
Short text
Similarity-based
Description
This paper introduces a hybrid method that improves the classification
performance of naïve Bayes (NB) on mixed datasets and multi-class
problems. The method applies a similarity measure to the portions of the
data that NB does not classify correctly. Because the data contain
multi-valued short text with rare words that limit NB performance, we
employ an adapted selective classifier based on similarities (CSBS) to
overcome the NB limitations and include the rare words in the computation.
This is achieved by transforming the formula from the product of the
probabilities of the categorical variable into their sum, weighted by the
numerical variable. The proposed algorithm was evaluated on card payment
transaction data containing the transaction label (the multi-valued short
text) and the transaction amount. Under K-fold cross-validation, the
proposed method achieves better precision, recall, and F-score than the NB
and CSBS classifiers used separately. In addition, converting the product
form into a sum gives rare words a greater chance to contribute to the text
classification, which is a further advantage of the proposed method.
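The contrast between the product form and the weighted-sum form can be illustrated with a minimal Python sketch. It is not the authors' formula, which this record does not give. The toy transactions, the function names, the unsmoothed word frequencies, and the amount weight 1 / (1 + |amount - class mean amount|) are all assumptions chosen only to show why a single unseen word zeroes a product but not a sum.

```python
from collections import Counter, defaultdict

# Toy data (hypothetical): (label words, amount, class).
SAMPLES = [
    (["carrefour", "market"], 50.0, "groceries"),
    (["carrefour", "express"], 30.0, "groceries"),
    (["shell", "station"], 40.0, "fuel"),
    (["shell", "express"], 60.0, "fuel"),
]

def train(samples):
    """Count, per class, the samples, the word occurrences and the amounts."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    amount_sums = defaultdict(float)
    for words, amount, label in samples:
        class_counts[label] += 1
        amount_sums[label] += amount
        word_counts[label].update(set(words))
    return word_counts, class_counts, amount_sums

def word_prob(word, label, model):
    """Unsmoothed share of the class's samples that contain the word."""
    word_counts, class_counts, _ = model
    return word_counts[label][word] / class_counts[label]

def product_score(words, label, model):
    """NB-style score: class prior times the product of word probabilities."""
    _, class_counts, _ = model
    score = class_counts[label] / sum(class_counts.values())
    for word in words:
        score *= word_prob(word, label, model)
    return score

def sum_score(words, amount, label, model):
    """Assumed sum form: word probabilities added, then weighted by how
    close the amount is to the class mean amount (hypothetical weight)."""
    _, class_counts, amount_sums = model
    mean_amount = amount_sums[label] / class_counts[label]
    weight = 1.0 / (1.0 + abs(amount - mean_amount))
    return weight * sum(word_prob(word, label, model) for word in words)

def predict_sum(words, amount, model):
    """Pick the class with the highest sum-form score."""
    _, class_counts, _ = model
    return max(class_counts, key=lambda c: sum_score(words, amount, c, model))

model = train(SAMPLES)
query_words, query_amount = ["carrefour", "zzz"], 45.0
for label in ("groceries", "fuel"):
    print(label,
          product_score(query_words, label, model),
          sum_score(query_words, query_amount, label, model))
print(predict_sum(query_words, query_amount, model))
```

In this toy setting the unseen word "zzz" drives the product score to zero for every class, so the product form cannot separate the classes, while the sum form still lets "carrefour" contribute and selects "groceries".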
Creator
Fatima El Barakaz, Omar Boutkhoum, Abdelmajid El Moutaouakkil
Source
http://journal.uad.ac.id/index.php/TELKOMNIKA
Date
Sep 16, 2020
Contributor
peri irawan
Format
pdf
Language
english
Type
text
Files
Collection
Citation
Fatima El Barakaz, Omar Boutkhoum, Abdelmajid El Moutaouakkil, “TELKOMNIKA Telecommunication, Computing, Electronics and Control
A hybrid naïve Bayes based on similarity measure to optimize the mixed-data classification,” Repository Horizon University Indonesia, accessed April 4, 2025, https://repository.horizon.ac.id/items/show/3643.