TELKOMNIKA Telecommunication, Computing, Electronics and Control
Parallel classification and optimization of telco trouble ticket dataset
Dublin Core
Title
TELKOMNIKA Telecommunication, Computing, Electronics and Control
Parallel classification and optimization of telco trouble ticket dataset
Subject
Classification
Hadoop
Optimization
Spark
Trouble ticket
Description
In the big data age, extracting useful information with traditional machine
learning methods is very challenging. The problem stems from the restricted
design of existing traditional machine learning algorithms, which do not fully
support large datasets and distributed processing. Today's large data volumes
demand an efficient method of building machine learning classifiers for big
data. This research proposes to address the problem by converting traditional
machine learning classification into a parallel-capable form. Apache Spark is
recommended as the primary data processing framework for the research
activities. The dataset used in this research consists of telco trouble tickets,
identified as a large-volume dataset. The study aims to solve the problem of
classifying such data on a single machine with traditional classifiers such as
W-J48. The proposed solution is to enable a conventional classifier to run the
classification on big data platforms such as Hadoop. The study’s significant
contributions are the evaluation of output metrics, such as accuracy and
computation time, for both approaches after hyper-parameter tuning, and the
improvement of W-J48 classification accuracy on the telco trouble ticket
dataset. Additional optimization and estimation techniques, such as grid search
and cross-validation, have been incorporated into the study; they improve
classification accuracy by 22.62% and reduce classification time by 21.1% in
parallel execution inside the big data environment.
Creator
Fauzy Che Yayah, Khairil Imran Ghauth, Choo-Yee Ting
Source
http://journal.uad.ac.id/index.php/TELKOMNIKA
Date
Sep 20, 2020
Contributor
peri irawan
Format
pdf
Language
english
Type
text
Citation
Fauzy Che Yayah, Khairil Imran Ghauth, Choo-Yee Ting, “TELKOMNIKA Telecommunication, Computing, Electronics and Control
Parallel classification and optimization of telco trouble ticket dataset,” Repository Horizon University Indonesia, accessed April 3, 2025, https://repository.horizon.ac.id/items/show/3826.