TELKOMNIKA Telecommunication Computing Electronics and Control
Experimental of vectorizer and classifier for scrapped social media data
Dublin Core
Title
TELKOMNIKA Telecommunication Computing Electronics and Control
Experimental of vectorizer and classifier for scrapped social media data
Subject
Classifier
Experiment
Social media
Text processing
Vectorizer
Description
In this study, we evaluated several classifiers and vectorizers to see their
effect on processing social media data. The classifiers used were random
forest, logistic regression, Bernoulli Naive Bayes (NB), and support vector
clustering (SVC). Random forests are used to reduce spatial complexity and
to minimize errors. Logistic regression is a statistical model whose basic
form uses a logistic function to represent a binary dependent variable.
Bernoulli NB operates on binary features, and SVC has so far given good
results that rival other supervised learning methods. Our tests use social
media data. Across the classifier and vectorizer variations tested, the best
classifier was a linear regression algorithm based on adaptive prediction,
compared with the decision-tree-based random forest, the probability-based
Bernoulli NB, and SVC, which works by clustering. Among the count
vectorizer, term frequency-inverse document frequency (TFIDF), and hashing
vectorizers, TFIDF achieved the best accuracy. This suggests that the TFIDF
vectorizer gives a better representation of word feature dimensions.
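The three vectorizers named in the description differ in how they turn tokens into numeric features. The sketch below is a minimal, standard-library illustration of that difference and is not the authors' implementation: the toy corpus, the function names, the whitespace tokenizer, the unsmoothed `tf * log(N / df)` weighting, and the use of CRC32 as the hash function are all assumptions made for this example. Library implementations commonly add smoothing and normalization on top of this.

```python
import math
import zlib
from collections import Counter

# Hypothetical toy corpus; not data from the study.
docs = [
    "great phone great battery",
    "bad battery",
    "great camera",
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, simplistic tokenizer)."""
    return text.lower().split()


def count_vectors(corpus):
    """Count vectorizer: raw term counts over a shared, sorted vocabulary."""
    vocab = sorted({tok for doc in corpus for tok in tokenize(doc)})
    rows = []
    for doc in corpus:
        counts = Counter(tokenize(doc))
        rows.append([counts[tok] for tok in vocab])
    return vocab, rows


def tfidf_vectors(corpus):
    """Textbook TF-IDF: term count times log(N / document frequency)."""
    vocab, counts = count_vectors(corpus)
    n_docs = len(corpus)
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    rows = [
        [row[j] * math.log(n_docs / doc_freq[j]) for j in range(len(vocab))]
        for row in counts
    ]
    return vocab, rows


def hashing_vectors(corpus, n_features=8):
    """Hashing trick: each token maps to bucket crc32(token) % n_features.

    No vocabulary is stored, so distinct tokens may collide in one bucket.
    """
    rows = []
    for doc in corpus:
        row = [0] * n_features
        for tok in tokenize(doc):
            row[zlib.crc32(tok.encode("utf-8")) % n_features] += 1
        rows.append(row)
    return rows


vocab, counts = count_vectors(docs)
_, tfidf = tfidf_vectors(docs)
hashed = hashing_vectors(docs)
print(vocab)
print(counts[0])
print(tfidf[0])
print(hashed[0])
```

In this sketch, a term that appears in every document would receive a TF-IDF weight of zero, while a term that appears in a single document is weighted most heavily. That down-weighting of common terms is one plausible reason a TF-IDF representation can separate classes better than raw counts, although the record itself does not state the cause.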
Creator
Setiawan Assegaff, Errissya Rasywir, Yovi Pratama
Source
http://telkomnika.uad.ac.id
Date
Feb 16, 2023
Contributor
peri irawan
Format
pdf
Language
english
Type
text
Files
Collection
Citation
Setiawan Assegaff, Errissya Rasywir, Yovi Pratama, “TELKOMNIKA Telecommunication Computing Electronics and Control
Experimental of vectorizer and classifier for scrapped social media data,” Repository Horizon University Indonesia, accessed April 17, 2025, https://repository.horizon.ac.id/items/show/4567.