Classification and Prediction of Video Game Sales Levels Using the Naive
Bayes Algorithm Based on Platform, Genre, and Regional Market Data
Dublin Core
Title
Classification and Prediction of Video Game Sales Levels Using the Naive
Bayes Algorithm Based on Platform, Genre, and Regional Market Data
Subject
Naïve Bayes, Video Game Sales, Machine Learning, Classification, Data Imbalance, Feature Engineering, Predictive Modeling.
Description
The exponential expansion of the video game industry has resulted in a vast accumulation of market data that can be leveraged to analyze and
predict sales performance. This study aims to construct a classification model for video game sales levels by applying the Naïve Bayes algorithm,
recognized for its simplicity, efficiency, and strong baseline performance in supervised learning tasks. The research employs a public dataset
containing over 13,000 video game entries, encompassing key attributes such as genre, platform, publisher, release year, user and critic ratings,
and global sales figures. The target variable, global sales, was discretized into three categories: Low (<1 million units), Medium (1–5 million
units), and High (>5 million units) to represent distinct tiers of commercial success. Prior to modeling, the dataset underwent a comprehensive
preprocessing pipeline involving duplicate removal, handling of missing data, normalization of numerical attributes, and feature selection to
ensure optimal model performance. The Multinomial Naïve Bayes classifier was then implemented and assessed using standard evaluation
metrics, including accuracy, precision, recall, and F1-score. Experimental results revealed an accuracy of 71.82% and an F1-score of 70.03%,
signifying strong predictive capability for a probabilistic model of this simplicity. The classifier effectively identified low and medium sales
categories, though slightly underperformed on the high sales group due to class imbalance within the dataset. Further analysis of conditional
probabilities indicated that game genre, platform popularity (especially PS2 and Wii), and critic scores were the most influential determinants of
higher sales outcomes. These findings affirm that the Naïve Bayes algorithm provides a reliable and interpretable foundation for video game
sales prediction, serving as a benchmark model in market analytics. Future studies are encouraged to address data imbalance through
oversampling or synthetic data generation, incorporate contextual variables such as marketing strategies and release schedules, and explore
ensemble or deep learning approaches to enhance predictive accuracy and robustness.
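
The discretization and classification steps described above can be illustrated with a minimal sketch. It is not the study's implementation: the abstract reports a Multinomial Naïve Bayes classifier, whereas the hypothetical `TinyCategoricalNB` class below is a from-scratch categorical variant with add-one smoothing, written only to show the mechanics. The function name `sales_tier`, the treatment of exactly 1 and 5 million units as Medium, and the toy rows are all assumptions made for illustration; none of the rows come from the dataset.

```python
import math
from collections import Counter, defaultdict


def sales_tier(global_sales_millions):
    """Map global sales (millions of units) to Low / Medium / High.

    Assumption: values of exactly 1 and exactly 5 million count as Medium.
    """
    if global_sales_millions < 1.0:
        return "Low"
    if global_sales_millions <= 5.0:
        return "Medium"
    return "High"


class TinyCategoricalNB:
    """Hypothetical Naive Bayes over categorical features, add-one smoothing."""

    def fit(self, rows, labels):
        self.class_counts = Counter(labels)
        # value_counts[label][feature_index][value] -> occurrences in training
        self.value_counts = defaultdict(lambda: defaultdict(Counter))
        # vocab[feature_index] -> distinct values seen for that feature
        self.vocab = [set() for _ in range(len(rows[0]))]
        for row, label in zip(rows, labels):
            for i, value in enumerate(row):
                self.value_counts[label][i][value] += 1
                self.vocab[i].add(value)
        return self

    def log_posteriors(self, row):
        """Return unnormalized log P(class) + sum of log P(value | class)."""
        total = sum(self.class_counts.values())
        scores = {}
        for label, count in self.class_counts.items():
            score = math.log(count / total)
            for i, value in enumerate(row):
                seen = self.value_counts[label][i][value]
                score += math.log((seen + 1) / (count + len(self.vocab[i])))
            scores[label] = score
        return scores

    def predict(self, row):
        scores = self.log_posteriors(row)
        return max(scores, key=scores.get)


# Toy rows invented for illustration: (platform, genre, global sales in millions).
toy_games = [
    ("Wii", "Sports", 8.2),
    ("PS2", "Action", 6.1),
    ("PS2", "Action", 2.3),
    ("Wii", "Racing", 3.5),
    ("PC", "Strategy", 0.4),
    ("PC", "Puzzle", 0.2),
    ("DS", "Puzzle", 0.7),
    ("PS2", "Sports", 1.5),
]

features = [(platform, genre) for platform, genre, _ in toy_games]
labels = [sales_tier(sales) for _, _, sales in toy_games]

model = TinyCategoricalNB().fit(features, labels)
print(model.predict(("PC", "Puzzle")))
```

The per-class log scores returned by `log_posteriors` are the kind of conditional-probability evidence the abstract refers to when it ranks genre and platform as influential; on real data, a library classifier and a held-out test split would replace this toy setup.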
Creator
Rafi Pratama Putra, Nevita Cahaya Ramadani, Agi Nanjar
Source
https://ijiis.org/index.php/IJIIS/article/view/242/155
Publisher
University of AMIKOM Purwokerto
Date
January 2025
Contributor
Fajar Bagus W
Format
PDF
Language
English
Type
Text
Files
Collection
Citation
Rafi Pratama Putra, Nevita Cahaya Ramadani, Agi Nanjar, “Classification and Prediction of Video Game Sales Levels Using the Naive
Bayes Algorithm Based on Platform, Genre, and Regional Market Data,” Repository Horizon University Indonesia, accessed January 2, 2026, https://repository.horizon.ac.id/items/show/9723.