TELKOMNIKA Telecommunication, Computing, Electronics and Control
RoBERTa: language modelling in building Indonesian question-answering systems
Dublin Core
Title
TELKOMNIKA Telecommunication, Computing, Electronics and Control
RoBERTa: language modelling in building Indonesian question-answering systems
Subject
ALBERT, ELECTRA, Indonesian QAS, Language modelling, RoBERTa
Description
This research aimed to evaluate the performance of the A Lite BERT (ALBERT), efficiently learning an encoder that classifies token replacements accurately (ELECTRA), and robustly optimized BERT pretraining approach (RoBERTa) models to support the development of an Indonesian question-answering system. The evaluation used Indonesian, Malay, and Esperanto. Esperanto served as a comparison for Indonesian because it is an international language that does not belong to any person or country, which makes it neutral. Compared to other foreign languages, the structure and construction of Esperanto are relatively simple. The dataset was obtained by crawling Wikipedia for Indonesian and from the Open Super-large Crawled ALMAnaCH coRpus (OSCAR) for Esperanto. The token dictionary used in the tests contained approximately 30,000 subword tokens for both the SentencePiece and byte-level byte pair encoding (ByteLevelBPE) methods. The tests were carried out with learning rates of 1e-5 and 5e-5 for both languages, following the reference values in the bidirectional encoder representations from transformers (BERT) paper. In the final results of this study, the ALBERT and RoBERTa models in Esperanto produced loss values that were not much different. This showed that the RoBERTa model was better suited to implementing an Indonesian question-answering system.
Creator
Wiwin Suwarningsih, Raka Aditya Pratama, Fadhil Yusuf Rahadika, Mochamad Havid Albar Purnomo
Source
DOI: 10.12928/TELKOMNIKA.v20i6.24248
Publisher
Universitas Ahmad Dahlan
Date
December 2022
Contributor
Sri Wahyuni
Rights
ISSN: 1693-6930
Relation
http://journal.uad.ac.id/index.php/TELKOMNIKA
Format
PDF
Language
English
Type
Text
Coverage
TELKOMNIKA Telecommunication, Computing, Electronics and Control
Citation
Wiwin Suwarningsih, Raka Aditya Pratama, Fadhil Yusuf Rahadika, Mochamad Havid Albar Purnomo, “TELKOMNIKA Telecommunication, Computing, Electronics and Control
RoBERTa: language modelling in building Indonesian question-answering systems,” Repository Horizon University Indonesia, accessed April 4, 2025, https://repository.horizon.ac.id/items/show/4483.