TELKOMNIKA Telecommunication, Computing, Electronics and Control
WEIDJ: Development of a new algorithm for semi-structured web data extraction
Dublin Core
Title
TELKOMNIKA Telecommunication, Computing, Electronics and Control
WEIDJ: Development of a new algorithm for semi-structured web data extraction
Subject
Document object model
JavaScript object notation
Web data extraction
Wrapper extraction of image
Description
In the era of industrial digitalization, people are increasingly investing in
solutions that support their processes for data collection, data analysis and
performance improvement. This paper considers advancing web-scale knowledge
extraction and alignment by integrating a few sources and exploring different
methods of aggregation and attention, with a focus on image information. The
main aim of data extraction from semi-structured data is to retrieve useful
information from the web. Data on the deep web is retrievable, but it must be
requested through form submission because search engines cannot reach it. As
HTML documents grow larger, data extraction has been found to suffer from
lengthy processing times. In this work, we propose an improved model, wrapper
extraction of image using document object model (DOM) and JavaScript object
notation (JSON) data (WEIDJ), in response to promising results in mining a
higher volume of images from various types of formats. To assess the
efficiency of WEIDJ, we compare its data extraction performance at different
levels of page extraction with VIBS, MDR, DEPTA and VIDE. WEIDJ yielded the
best results, with a precision of 100, a recall of 97.93103 and an F-measure
of 98.9547.
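The three figures reported in the description are related: the F-measure is conventionally the harmonic mean of precision and recall. The short sketch below checks that relationship for the reported values. It assumes the numbers are percentages and that the standard balanced F-measure (F1) is meant; the record does not state either explicitly, and the function name `f_measure` is chosen here for illustration only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure: harmonic mean of precision and recall.

    Both arguments must be on the same scale (here: percentages).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Values as reported in the description, read as percentages (an assumption).
precision = 100.0
recall = 97.93103

print(round(f_measure(precision, recall), 4))
```

Under these assumptions the computed value agrees with the reported F-measure of 98.9547 to four decimal places.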
Creator
Ily Amalina Ahmad Sabri, Mustafa Man
Source
http://journal.uad.ac.id/index.php/TELKOMNIKA
Date
Contributor
peri irawan
Format
pdf
Language
English
Type
text
Files
Collection
Citation
Ily Amalina Ahmad Sabri, Mustafa Man, “TELKOMNIKA Telecommunication, Computing, Electronics and Control
WEIDJ: Development of a new algorithm for semi-structured web data extraction,” Repository Horizon University Indonesia, accessed November 22, 2024, https://repository.horizon.ac.id/items/show/3550.