Folk Games Image Captioning using Object Attention
Dublin Core
Title
Folk Games Image Captioning using Object Attention
Subject
image captioning; folk games; object attention; object detection
Description
The results of a deep-learning-based image captioning system with an encoder-decoder framework rely heavily on the image feature
extraction technique and the caption-based model. Model accuracy is also heavily influenced by the attention mechanism used.
A mismatch between the output of the attention model and the input expected by the decoder can cause the decoder to produce
incorrect results. In this paper, we propose an object attention mechanism using object detection. Object detection outputs a
bounding box and an object category label, which are then used as image input to VGG16 for feature extraction and to a
caption-based LSTM model. Experimental results show that the system with object attention performs better than
the system without object attention. The BLEU-1, BLEU-2, BLEU-3, BLEU-4, and CIDEr scores of the image captioning system with
object attention improved by 12.48%, 17.39%, 24.06%, 36.37%, and 43.50%, respectively, compared to the system without object
attention.
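The abstract says detected bounding boxes and category labels determine the image input passed to feature extraction. The sketch below illustrates only that cropping step, under stated assumptions: the `Detection` class, the `crop_objects` function, the nested-list image representation, and the exclusive upper box bounds are all hypothetical and not taken from the paper. The detector, VGG16, and the LSTM decoder are out of scope here.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: a grayscale image stored as rows of pixel values.
Image = List[List[int]]


@dataclass
class Detection:
    """Hypothetical detector output: a category label and a bounding box."""
    label: str
    x_min: int
    y_min: int
    x_max: int  # exclusive (assumption)
    y_max: int  # exclusive (assumption)


def crop_objects(image: Image, detections: List[Detection]) -> List[Tuple[str, Image]]:
    """Return (label, cropped region) pairs for each detection.

    Boxes are clipped to the image bounds; boxes that fall entirely
    outside the image are skipped.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    crops: List[Tuple[str, Image]] = []
    for det in detections:
        x0, y0 = max(0, det.x_min), max(0, det.y_min)
        x1, y1 = min(width, det.x_max), min(height, det.y_max)
        if x0 >= x1 or y0 >= y1:
            continue  # empty or out-of-bounds box
        crops.append((det.label, [row[x0:x1] for row in image[y0:y1]]))
    return crops


# Usage on a tiny 4x4 image whose pixel value is row * 4 + column.
image = [[r * 4 + c for c in range(4)] for r in range(4)]
detections = [
    Detection("ball", 1, 1, 3, 3),
    Detection("child", -1, 0, 2, 1),    # partly outside, gets clipped
    Detection("offscreen", 5, 5, 7, 7), # fully outside, gets skipped
]
crops = crop_objects(image, detections)
```

In a full pipeline, each cropped region would presumably be resized and passed to the feature extractor, with the label available to the caption model; how the paper combines them is not specified in this record.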
Creator
Saiful Akbar, Benhard Sitohang, Jasman Pardede,
Irfan I. Amal, Kurniandha S. Yunastrian, Marsa T. Ahmada, Anindya Prameswari
Source
http://jurnal.iaii.or.id
Publisher
Professional Organization Ikatan Ahli Informatika Indonesia (IAII)/Indonesian Informatics Experts Association
Date
August 2023
Contributor
Sri Wahyuni
Rights
ISSN Media Electronic: 2580-0760
Format
PDF
Language
English
Type
Text
Citation
Saiful Akbar, Benhard Sitohang, Jasman Pardede,
Irfan I. Amal, Kurniandha S. Yunastrian, Marsa T. Ahmada, Anindya Prameswari, “Folk Games Image Captioning using Object Attention,” Repository Horizon University Indonesia, accessed January 12, 2026, https://repository.horizon.ac.id/items/show/10071.