Comparing Word Representation BERT and RoBERTa in Keyphrase Extraction using TgGAT
Dublin Core
Title
Comparing Word Representation BERT and RoBERTa in Keyphrase Extraction using TgGAT
Subject
Keyphrase Extraction, BERT, RoBERTa, Pre-Trained Language Models, Topic-Guided Graph Attention Networks
Description
In this digital era, accessing vast amounts of information from websites and academic papers has become easier. However, efficiently locating relevant content remains challenging due to the overwhelming volume of data. Keyphrase extraction systems automate the process of generating phrases that accurately represent a document's main topics. These systems are crucial for supporting various natural language processing tasks, such as text summarization, information retrieval, and document representation. The traditional method of manually selecting keyphrases is still common but often proves inefficient and inconsistent in summarizing the main ideas of a document. This study introduces an approach that integrates pre-trained language models, BERT and RoBERTa, with Topic-Guided Graph Attention Networks (TgGAT) to enhance keyphrase extraction. TgGAT strengthens the extraction process by combining topic modelling with graph-based structures, providing a more structured and context-aware representation of a document's key topics. By leveraging the strengths of both graph-based and transformer-based models, this research proposes a framework that improves keyphrase extraction performance. This is the first work to apply graph-based and PLM methods to keyphrase extraction in the Indonesian language. The results revealed that BERT outperformed RoBERTa, with precision, recall, and F1-scores of 0.058, 0.070, and 0.062, respectively, compared to RoBERTa's 0.026, 0.030, and 0.027. These results show that BERT with TgGAT obtained more representative keyphrases than RoBERTa with TgGAT. These findings underline the benefits of integrating graph-based approaches with pre-trained models for capturing both semantic relationships and topic relevance.
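As an illustration of the word-representation step compared in the abstract, the sketch below shows how token-level embeddings could be obtained from a BERT and a RoBERTa checkpoint with the Hugging Face Transformers library. This is a minimal sketch only: the model names (bert-base-multilingual-cased, xlm-roberta-base) and the helper function embed are assumptions for illustration, and the downstream TgGAT topic-guided graph scoring is not shown; it is not the authors' implementation.

```python
# Illustrative sketch (not the paper's code): token-level word representations
# from BERT and RoBERTa via Hugging Face Transformers. Model names are
# assumptions; the TgGAT scoring step that consumes these vectors is omitted.
import torch
from transformers import AutoModel, AutoTokenizer


def embed(text: str, model_name: str) -> torch.Tensor:
    """Return token-level hidden states (seq_len x hidden_dim) for `text`."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)
    model.eval()
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = model(**inputs)
    return outputs.last_hidden_state.squeeze(0)


doc = "Sistem ekstraksi frasa kunci membantu merangkum topik utama dokumen."
bert_vectors = embed(doc, "bert-base-multilingual-cased")  # BERT branch
roberta_vectors = embed(doc, "xlm-roberta-base")           # RoBERTa branch
print(bert_vectors.shape, roberta_vectors.shape)
```

In a TgGAT-style pipeline, vectors like these would serve as node features for candidate-phrase tokens in a document graph, which is then attended over with topic guidance to rank keyphrases.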
Creator
Novi Yusliani, Aini Nabilah, Muhammad Raihan Habibullah, Annisa Darmawahyuni, Ghita Athalina
Source
https://jurnal.iaii.or.id/index.php/RESTI/article/view/6279/1034
Publisher
Department of Informatics Engineering, Faculty of Computer Science, Universitas Sriwijaya, Palembang, Indonesia
Date
20-03-2025
Contributor
FAJAR BAGUS W
Format
PDF
Language
ENGLISH
Type
TEXT
Citation
Novi Yusliani, Aini Nabilah, Muhammad Raihan Habibullah, Annisa Darmawahyuni, Ghita Athalina, “Comparing Word Representation BERT and RoBERTa in Keyphrase Extraction using TgGAT,” Repository Horizon University Indonesia, accessed January 26, 2026, https://repository.horizon.ac.id/items/show/10490.