TELKOMNIKA Telecommunication, Computing, Electronics and Control
Single document keywords extraction in Bahasa Indonesia using phrase chunking
Dublin Core
Title
TELKOMNIKA Telecommunication, Computing, Electronics and Control
Single document keywords extraction in Bahasa Indonesia using phrase chunking
Subject
Keyword candidate, Keyword extraction, Phrase chunking, Phrase pattern, Single document
Description
Keywords help readers to understand the idea of a document quickly.
Unfortunately, considerable time and effort are often needed to come up with a good set of keywords manually. This research focused on generating keywords from a document automatically using phrase chunking. Firstly, we collected part-of-speech patterns from a collection of documents. Secondly, we used those patterns to extract candidate keywords from the abstract and the content of a document. Finally, we selected keywords from the candidates based on the number of words in the keyword phrases and on several scenarios involving candidate reduction and sorting. We evaluated the result of each scenario using precision, recall, and F-measure. The experiment results show that: i) shorter-phrase keywords with string reduction, extracted from the abstract and sorted by frequency, provide the highest score; ii) in every proposed scenario, extracting keywords from the abstract gives a better result than extracting them from the content; iii) using shorter-phrase patterns in keyword extraction gives a better score than using all phrase patterns; iv) sorting scenarios based on the multiplication of candidate frequencies and the weight of the phrase patterns offer better results.
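The pipeline summarized in the description (match part-of-speech patterns, rank the candidates, score against reference keywords) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tag set, the three patterns in `PATTERNS`, and all function names are assumptions made for illustration, and the input is assumed to be already tokenized and tagged. The string reduction step and the pattern-weighted sorting mentioned in the description are not modeled here.

```python
from collections import Counter

# Hypothetical tag patterns; the paper collects its own patterns from a corpus.
PATTERNS = [("NN",), ("NN", "NN"), ("NN", "JJ")]


def extract_candidates(tagged, patterns=PATTERNS):
    """Return every phrase whose tag sequence matches one of the patterns.

    `tagged` is a list of (word, tag) pairs for one document.
    """
    candidates = []
    for start in range(len(tagged)):
        for pattern in patterns:
            window = tagged[start:start + len(pattern)]
            # A window cut short by the end of the text cannot match.
            if tuple(tag for _, tag in window) == pattern:
                candidates.append(" ".join(word.lower() for word, _ in window))
    return candidates


def rank_by_frequency(candidates, top_n):
    """Keep the top_n most frequent candidate phrases."""
    return [phrase for phrase, _ in Counter(candidates).most_common(top_n)]


def evaluate(predicted, gold):
    """Return (precision, recall, F-measure) using exact phrase matches."""
    predicted, gold = set(predicted), set(gold)
    hits = len(predicted & gold)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(gold) if gold else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure


if __name__ == "__main__":
    # Toy tagged text, invented for this example.
    tagged = [("ekstraksi", "NN"), ("kata", "NN"), ("kunci", "NN"),
              ("otomatis", "JJ"), ("kata", "NN"), ("kunci", "NN")]
    keywords = rank_by_frequency(extract_candidates(tagged), top_n=3)
    print(keywords, evaluate(keywords, ["kata kunci", "ekstraksi"]))
```

Exact string matching is the simplest way to count a hit; an evaluation that tolerates partial or stemmed matches would give different scores.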
Creator
I Nyoman Prayana Trisna, Arif Nurwidyantoro
Source
DOI: 10.12928/TELKOMNIKA.v18i4.14389
Publisher
Universitas Ahmad Dahlan
Date
August 2020
Contributor
Sri Wahyuni
Rights
ISSN: 1693-6930
Relation
http://journal.uad.ac.id/index.php/TELKOMNIKA
Format
PDF
Language
English
Type
Text
Coverage
TELKOMNIKA Telecommunication, Computing, Electronics and Control
Citation
I Nyoman Prayana Trisna, Arif Nurwidyantoro, “TELKOMNIKA Telecommunication, Computing, Electronics and Control
Single document keywords extraction in Bahasa Indonesia using phrase chunking,” Repository Horizon University Indonesia, accessed November 13, 2024, https://repository.horizon.ac.id/items/show/3974.