Application of Naïve Bayes Algorithm Variations
On Indonesian General Analysis Dataset for Sentiment Analysis
Dublin Core
Title
Application of Naïve Bayes Algorithm Variations
On Indonesian General Analysis Dataset for Sentiment Analysis
Subject
sentiment analysis, Indonesian dataset, Bernoulli naïve Bayes, Gaussian naïve Bayes, multinomial naïve Bayes,
complement naïve Bayes
Description
The Indonesian General Analysis Dataset is sourced from Twitter, using conjunctions as search keywords so that the
collected tweets are not confined to a particular topic. An Indonesian-language dataset covering general topics can be
used to test the accuracy of classification models, providing an additional reference for choosing suitable methods and
parameters for sentiment analysis. Naive Bayes, which has several variations, is one of the algorithms reported in several
studies to produce the highest accuracy. This study aims to identify which naive Bayes variation achieves the best accuracy
on the Indonesian General Analysis Dataset for sentiment analysis when the minimum and maximum document frequency
parameters are tuned. The variations evaluated are Bernoulli naive Bayes, Gaussian naive Bayes, complement naive Bayes,
and multinomial naive Bayes. The research begins with downloading the dataset. Preprocessing follows and consists of
tokenizing, stemming, converting abbreviations, and removing conjunctions. Features are then extracted from the
preprocessed data by converting the dataset into vectors and applying the TF-IDF method before the sentiment
classification stage. Testing applies the minimum document frequency (min-df) and maximum document frequency (max-df)
to each naive Bayes variation to find appropriate parameter values. K-fold cross-validation is used to split the dataset
into training and test data for sentiment analysis. A confusion matrix is then built to evaluate the level of accuracy
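The evaluation procedure described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes scikit-learn (the record does not name the library used), a small hypothetical toy corpus in place of the real preprocessed tweets, and arbitrary example values for min-df, max-df, and the number of folds. The helper names `to_dense` and `evaluate` are invented for this sketch.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import StratifiedKFold, cross_val_predict
from sklearn.naive_bayes import BernoulliNB, ComplementNB, GaussianNB, MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer

# Hypothetical toy corpus standing in for the preprocessed tweets
# (1 = positive, 0 = negative). The real dataset is not reproduced here.
texts = [
    "film ini bagus sekali", "pelayanan bagus dan cepat", "saya suka produk ini",
    "makanan enak dan murah", "produk bagus saya suka", "pelayanan cepat dan ramah",
    "film ini buruk sekali", "pelayanan buruk dan lambat", "saya benci produk ini",
    "makanan mahal dan hambar", "produk buruk saya kecewa", "pelayanan lambat dan kasar",
]
labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0]


def to_dense(matrix):
    """Convert the sparse TF-IDF matrix to a dense array (GaussianNB requires dense input)."""
    return matrix.toarray()


def evaluate(min_df, max_df, n_splits=3):
    """Cross-validate the four naive Bayes variations for one (min_df, max_df) setting.

    Returns a dict mapping variation name to (accuracy, confusion matrix).
    """
    classifiers = {
        "bernoulli": BernoulliNB(),
        "gaussian": GaussianNB(),
        "complement": ComplementNB(),
        "multinomial": MultinomialNB(),
    }
    cv = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=0)
    results = {}
    for name, classifier in classifiers.items():
        # The vectorizer sits inside the pipeline, so it is refit on each training fold.
        steps = [TfidfVectorizer(min_df=min_df, max_df=max_df)]
        if name == "gaussian":
            steps.append(FunctionTransformer(to_dense, accept_sparse=True))
        steps.append(classifier)
        model = make_pipeline(*steps)
        predicted = cross_val_predict(model, texts, labels, cv=cv)
        results[name] = (
            accuracy_score(labels, predicted),
            confusion_matrix(labels, predicted),
        )
    return results


# Example parameter settings (assumed values, not those reported in the study).
for min_df, max_df in [(1, 1.0), (2, 0.9)]:
    for name, (accuracy, matrix) in evaluate(min_df, max_df).items():
        print(f"min_df={min_df} max_df={max_df} {name}: accuracy={accuracy:.2f}")
        print(matrix)
```

In this sketch an integer `min_df` is an absolute document count and a float `max_df` is a proportion of documents, which is how scikit-learn's `TfidfVectorizer` interprets the two parameters. Whether the study used counts or proportions is not stated in the abstract.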
Creator
Najirah Umar, M. Adnan Nur
Publisher
STMIK Handayani Makassar
Date
22-08-2022
Contributor
Fajar bagus W
Format
PDF
Language
Indonesia
Type
Text
Citation
Najirah Umar, M. Adnan Nur, “Application of Naïve Bayes Algorithm Variations
On Indonesian General Analysis Dataset for Sentiment Analysis,” Repository Horizon University Indonesia, accessed June 7, 2025, https://repository.horizon.ac.id/items/show/9214.