TELKOMNIKA Telecommunication, Computing, Electronics and Control
Large scale data analysis using MLlib
Dublin Core
Title
TELKOMNIKA Telecommunication, Computing, Electronics and Control
Large scale data analysis using MLlib
Subject
Big data
Data analysis
Machine learning
Open source
Parallel processing
Spark MLlib
Description
Recent advancements in the internet, social media, and internet of things
(IoT) devices have significantly increased the amount of data generated in a
variety of formats. The data must be converted into formats that are easily
handled by data analysis techniques. Applying machine learning algorithms to
large and complex data sets is computationally expensive: it is a
resource-intensive process that requires a large amount of logical and
physical resources. Machine learning is a sophisticated data analytics
technology that has gained importance as a result of the massive amount of
data generated daily that needs to be examined. The Apache Spark machine
learning library (MLlib) is one of the big data analysis platforms; it
provides a variety of functions for machine learning tasks, ranging from
classification to regression and dimensionality reduction. From a
computational standpoint, this research investigates Apache Spark MLlib 2.0
as an open source, autonomous, scalable, and distributed learning library.
Several real-world machine learning experiments are carried out to evaluate
the properties of the platform both qualitatively and quantitatively. Some
of the fundamental concepts and approaches for developing a scalable data
model in a distributed environment are also discussed.
Creator
Ahmed Hussein Ali, Maan Nawaf Abbod, Mohammed Khamees Khaleel, Mostafa Abdulghafoor Mohammed, Tole Sutikno
Source
http://journal.uad.ac.id/index.php/TELKOMNIKA
Date
Sep 12, 2021
Contributor
peri irawan
Format
pdf
Language
English
Type
text
Citation
Ahmed Hussein Ali, Maan Nawaf Abbod, Mohammed Khamees Khaleel, Mostafa Abdulghafoor Mohammed, Tole Sutikno, “TELKOMNIKA Telecommunication, Computing, Electronics and Control: Large scale data analysis using MLlib,” Repository Horizon University Indonesia, accessed November 21, 2024, https://repository.horizon.ac.id/items/show/4294.