Analysis and Mitigation of Religion Bias in Indonesian Natural Language Processing Datasets

Dublin Core

Title

Analysis and Mitigation of Religion Bias in Indonesian Natural Language Processing Datasets

Subject

natural language processing; Indonesian NLP; social bias; debiasing

Description

Previous studies have shown that various religious identities are misrepresented in Indonesian media.
Misrepresentations of other marginalized identities in natural language processing (NLP) datasets have been shown to
harm those identities, for example in automated content moderation, and must therefore be mitigated. In this paper, we
analyze, for the first time, several Indonesian NLP datasets to determine whether they contain unwanted bias and to measure
the effects of debiasing them. We find that two of the three datasets analyzed in this study contain unwanted bias,
whose effects carry over to downstream performance in the form of allocation and representation harm. Debiasing at the
dataset level, applied in response to the biases discovered, yields consistently positive results for the respective
dataset. At the level of downstream performance, however, the results vary widely depending on the dataset and the
embedding used to train the model. In particular, the same debiasing technique can decrease bias for one combination of
dataset and embedding yet increase it for another, particularly in the case of representation harm.

Creator

Muhammad Arief Fauzan, Ari Saptawijaya

Source

http://jurnal.iaii.or.id

Publisher

Professional Organization Ikatan Ahli Informatika Indonesia (IAII)/Indonesian Informatics Experts Association

Date

August 2023

Contributor

Sri Wahyuni

Rights

ISSN (electronic): 2580-0760

Format

PDF

Language

English

Type

Text

Files

5035-Article Text-17642-1-10-20230812.pdf

Collection

VOL 7 NO 4 (2023)

Tags

debiasing, Indonesian NLP, natural language processing, social bias

Citation

Muhammad Arief Fauzan, Ari Saptawijaya, “Analysis and Mitigation of Religion Bias in Indonesian Natural Language Processing Datasets,” Repository Horizon University Indonesia, accessed February 3, 2026, https://repository.horizon.ac.id/items/show/10030.