Sentiment Analysis of COVID-19 Vaccines in Indonesia on Twitter Using Pre-Trained and Self-Training Word Embeddings
Dublin Core
Title
Sentiment Analysis of COVID-19 Vaccines in Indonesia on Twitter Using Pre-Trained and Self-Training Word Embeddings
Subject
Sentiment Analysis, Twitter, Bidirectional LSTM, Word Embedding, fastText, GloVe
Description
Public sentiment toward COVID-19 vaccines can be gathered from social media, where users routinely
express their opinions. Twitter is one of the platforms Indonesians use most often to do so. This
research uses a Bidirectional LSTM combined with word embeddings, testing fastText and GloVe as the
embedding methods. We created 8 test scenarios to examine the performance of the word embeddings,
using both pre-trained and self-trained embedding vectors. The dataset gathered from Twitter was
prepared in two versions: stemmed and unstemmed. In the GloVe scenario group, the highest accuracy,
92.5%, came from the model that used self-trained GloVe and was trained on the unstemmed dataset. In
the fastText scenario group, the highest accuracy, 92.3%, came from the model that used self-trained
fastText and was trained on the stemmed dataset. Scenarios that used pre-trained embedding vectors
achieved noticeably lower accuracy than those that used self-trained vectors, because the pre-trained
embeddings were trained on a Wikipedia corpus of standard, well-structured language, whereas the
dataset in this study came from Twitter and contains non-standard sentences. Even though the dataset
was processed with stemming and a slang-word dictionary, the pre-trained embeddings still could not
recognize several words in our dataset.
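The abstract attributes the weaker pre-trained scenarios to tweet words that a Wikipedia-trained vocabulary does not contain. One way to quantify that effect is to measure what share of dataset tokens appear in the embedding vocabulary, before and after slang normalization. The following is a minimal Python sketch under stated assumptions, not the authors' pipeline: plain lowercase whitespace tokenization, a toy vocabulary (`wiki_vocab`) standing in for a pre-trained embedding, and a small hypothetical slang dictionary (`SLANG`). The names `normalize` and `coverage` and all sample entries are illustrative only.

```python
from collections import Counter

# Hypothetical slang dictionary (illustrative entries, not the paper's dictionary):
# maps informal Indonesian tokens to their standard forms.
SLANG = {"gak": "tidak", "yg": "yang", "udh": "sudah", "bgt": "banget"}


def normalize(tokens, slang):
    """Replace each token with its standard form if the slang dictionary lists one."""
    return [slang.get(t, t) for t in tokens]


def coverage(tweets, vocab, slang=None):
    """Return (share of tokens found in vocab, sorted list of out-of-vocabulary types).

    Tokenization is plain lowercase whitespace splitting, an assumption made
    to keep the sketch self-contained.
    """
    counts = Counter()
    for tweet in tweets:
        tokens = tweet.lower().split()
        if slang is not None:
            tokens = normalize(tokens, slang)
        counts.update(tokens)
    total = sum(counts.values())
    # Count token occurrences (not just distinct words) that the vocabulary covers.
    covered = sum(n for word, n in counts.items() if word in vocab)
    oov = sorted(word for word in counts if word not in vocab)
    return (covered / total if total else 0.0), oov


# Toy vocabulary standing in for a Wikipedia-trained embedding (assumption).
wiki_vocab = {"tidak", "mau", "vaksin", "yang", "sudah", "itu"}
tweets = ["gak mau vaksin yg itu", "udh vaksin mantul"]

print(coverage(tweets, wiki_vocab))         # (0.5, ['gak', 'mantul', 'udh', 'yg'])
print(coverage(tweets, wiki_vocab, SLANG))  # (0.875, ['mantul'])
```

In this toy example, slang normalization raises coverage from 0.5 to 0.875, but a slang word missing from the dictionary ("mantul") stays out of vocabulary, which illustrates how preprocessing can narrow the gap without closing it. An embedding trained on the tweets themselves would include every token that survives its minimum-count filter, which is one plausible reading of why the self-trained scenarios fared better. The check treats words as atomic; a fastText model loaded with its subword n-grams can still compose vectors for unseen words, so this sketch understates fastText coverage in that setting.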
Creator
Kartikasari Kusuma Agustiningsih, Ema Utami, and Omar Muhammad Altoumi Alsyaibani
Source
http://dx.doi.org/10.21609/jiki.v15i1.1044
Publisher
Faculty of Computer Science Universitas Indonesia
Date
022-02-27
Contributor
Sri Wahyuni
Rights
e-ISSN: 2502-9274; print ISSN: 2088-7051
Format
PDF
Language
English
Type
Text
Coverage
Jurnal Ilmu Komputer dan Informasi (Journal of Computer Science and Information)
Citation
Kartikasari Kusuma Agustiningsih, Ema Utami, and Omar Muhammad Altoumi Alsyaibani, “Sentiment Analysis of COVID-19 Vaccines in Indonesia on Twitter Using Pre-Trained and Self-Training Word Embeddings,” Repository Horizon University Indonesia, accessed May 22, 2025, https://repository.horizon.ac.id/items/show/8838.