Oversampling vs. undersampling in TF-IDF variations for imbalanced Indonesian short texts classification

Dublin Core

Title

Oversampling vs. undersampling in TF-IDF variations for imbalanced Indonesian short texts classification

Subject

Bahasa Indonesia
Imbalanced dataset
Oversampling method
Short-text classification
Term frequency-inverse document frequency
Undersampling method

Description

Although it is considered a traditional method compared with more modern algorithms, term frequency-inverse document frequency (TF-IDF) still produces good results in a range of text mining tasks. This study assesses the effectiveness of several TF-IDF modifications for short text classification and also addresses the problem of imbalanced datasets. To correct the class imbalance, we combine standard, log-scaled, and boolean TF-IDF with undersampling and oversampling methods in short text classification. Precision, recall, and F-measure are used to evaluate each experiment. The best result is obtained when applying boolean TF-IDF with the oversampling method. Oversampling methods outperform undersampling methods in every experiment, although in some cases the undersampling results are still noteworthy. Additionally, our study shows that modified TF-IDF, such as the boolean or log-scaled version, benefits classification performance more than relying solely on standard TF-IDF, particularly when handling imbalanced datasets.
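
To make the three weighting schemes and the two resampling strategies named in the description concrete, here is a minimal Python sketch. It is an illustration only, not the authors' implementation. It makes several assumptions: the function names `tfidf` and `resample` are hypothetical, the formulas are the common textbook ones (raw count, 1 + log(count), and a 0/1 presence flag, each multiplied by log(N/df)), and resampling is done by simple random duplication or random removal. The record does not state which exact formulas or resampling algorithms the study used.

```python
import math
import random
from collections import Counter


def tfidf(docs, variant="standard"):
    """Return one {term: weight} dict per tokenized document.

    Assumed textbook definitions, not taken from the paper:
      standard: tf = raw count
      log:      tf = 1 + ln(count)
      boolean:  tf = 1 if the term occurs in the document
    Each tf is multiplied by idf = ln(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        vec = {}
        for term, count in Counter(doc).items():
            if variant == "standard":
                tf = count
            elif variant == "log":
                tf = 1 + math.log(count)
            elif variant == "boolean":
                tf = 1
            else:
                raise ValueError(f"unknown variant: {variant}")
            vec[term] = tf * math.log(n_docs / df[term])
        vectors.append(vec)
    return vectors


def resample(samples, labels, mode="over", seed=0):
    """Randomly over- or undersample so every class has the same size."""
    rng = random.Random(seed)
    by_class = {}
    for sample, label in zip(samples, labels):
        by_class.setdefault(label, []).append(sample)
    sizes = [len(items) for items in by_class.values()]
    # Oversampling grows every class to the largest class size;
    # undersampling shrinks every class to the smallest one.
    target = max(sizes) if mode == "over" else min(sizes)
    out_samples, out_labels = [], []
    for label, items in by_class.items():
        if mode == "over":
            # Keep all originals, then duplicate random ones up to the target.
            extra = [rng.choice(items) for _ in range(target - len(items))]
            chosen = items + extra
        else:
            # Draw the target number of items without replacement.
            chosen = rng.sample(items, target)
        out_samples.extend(chosen)
        out_labels.extend([label] * target)
    return out_samples, out_labels
```

In a setup like this, the resampling step would normally be applied to the training split only, so that the evaluation data keeps its original class distribution. Note that the boolean variant ignores repeated terms entirely; in short texts most terms occur once, so the variants differ only on the few documents with repeats.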

Creator

I Nyoman Prayana Trisna, Ni Wayan Emmy Rosiana Dewi, Muhammad Alam Pasirulloh

Source

Journal homepage: http://telkomnika.uad.ac.id

Date

Dec 26, 2024

Contributor

PERI IRAWAN

Format

PDF

Language

ENGLISH

Type

TEXT

Citation

I Nyoman Prayana Trisna, Ni Wayan Emmy Rosiana Dewi, Muhammad Alam Pasirulloh, “Oversampling vs. undersampling in TF-IDF variations for imbalanced Indonesian short texts classification,” Repository Horizon University Indonesia, accessed January 12, 2026, https://repository.horizon.ac.id/items/show/10009.