Oversampling vs. undersampling in TF-IDF variations for imbalanced Indonesian short texts classification
Dublin Core
Title
Oversampling vs. undersampling in TF-IDF variations for imbalanced Indonesian short texts classification
Subject
Bahasa Indonesia
Imbalanced dataset
Oversampling method
Short-text classification
Term frequency inversed document frequency
Undersampling method
Description
Although it is considered a traditional method compared to more modern algorithms, term frequency inversed document frequency (TF-IDF) still produces good results in a range of text mining tasks. This study assesses the effectiveness of several TF-IDF modifications for short text classification. It also addresses the problem of imbalanced datasets. To handle the class imbalance, we combine standard, log-scaled, and boolean TF-IDF with undersampling and oversampling methods in short text classification. Precision, recall, and f-measure are used to evaluate each experiment. The best result is obtained when applying boolean TF-IDF with the oversampling method. Oversampling methods outperform undersampling methods in every experiment, although in some cases the results with undersampling are still worth considering. Additionally, our study shows that modified TF-IDF, such as the boolean or log-scaled version, benefits classification performance more than relying solely on standard TF-IDF, particularly when handling imbalanced datasets.
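The three TF-IDF variants and the two resampling directions named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give their formulas or name their sampling algorithms, so the weighting definitions (raw count, 1 + log(count), and presence/absence, each multiplied by log(N/df)), the plain random resampling, the function names `tf_weight`, `tfidf`, and `resample`, and the toy Indonesian tokens are all assumptions made for illustration.

```python
import math
import random
from collections import Counter


def tf_weight(count, variant):
    """Term-frequency weight for one term in one document (assumed definitions)."""
    if count == 0:
        return 0.0
    if variant == "standard":
        return float(count)           # raw term count
    if variant == "log":
        return 1.0 + math.log(count)  # log-scaled count
    if variant == "boolean":
        return 1.0                    # presence only
    raise ValueError(f"unknown variant: {variant}")


def tfidf(docs, variant="standard"):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df_t) for term, df_t in df.items()}
    return [
        {term: tf_weight(c, variant) * idf[term] for term, c in Counter(doc).items()}
        for doc in docs
    ]


def resample(samples, labels, mode, seed=0):
    """Random over- or undersampling so that every class has the same size.

    mode="over" duplicates minority samples up to the largest class size;
    mode="under" keeps a random subset down to the smallest class size.
    """
    if mode not in ("over", "under"):
        raise ValueError(f"unknown mode: {mode}")
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    sizes = [len(items) for items in by_class.values()]
    target = max(sizes) if mode == "over" else min(sizes)
    out = []
    for label, items in by_class.items():
        if mode == "over":
            extra = [rng.choice(items) for _ in range(target - len(items))]
            picked = items + extra
        else:
            picked = rng.sample(items, target)
        out.extend((sample, label) for sample in picked)
    return out


# Toy, hypothetical short texts (already tokenized) with imbalanced labels.
docs = [["harga", "naik", "naik"], ["harga", "turun"], ["cuaca", "cerah"]]
labels = ["ekonomi", "ekonomi", "cuaca"]

for variant in ("standard", "log", "boolean"):
    print(variant, tfidf(docs, variant)[0])

print(len(resample(docs, labels, "over")), len(resample(docs, labels, "under")))
```

In this sketch the boolean variant ignores how often a term repeats, which may matter less in short texts where most terms occur only once; whether that explains the reported result is not stated in the record.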
Creator
I Nyoman Prayana Trisna, Ni Wayan Emmy Rosiana Dewi, Muhammad Alam Pasirulloh
Source
Journal homepage: http://telkomnika.uad.ac.id
Date
Dec 26, 2024
Contributor
PERI IRAWAN
Format
PDF
Language
ENGLISH
Type
TEXT
Files
Collection
Citation
I Nyoman Prayana Trisna, Ni Wayan Emmy Rosiana Dewi, Muhammad Alam Pasirulloh, “Oversampling vs. undersampling in TF-IDF variations for imbalanced Indonesian short texts classification,” Repository Horizon University Indonesia, accessed January 12, 2026, https://repository.horizon.ac.id/items/show/10009.