TELKOMNIKA Telecommunication, Computing, Electronics and Control
WEIDJ: Development of a new algorithm for semi-structured web data extraction

Dublin Core

Title

TELKOMNIKA Telecommunication, Computing, Electronics and Control
WEIDJ: Development of a new algorithm for semi-structured web data extraction

Subject

Document object model
JavaScript object notation
Web data extraction
Wrapper extraction of image

Description

In the era of industrial digitalization, people are increasingly investing in
solutions that support their processes for data collection, data analysis and
performance improvement. This paper considers advancing web-scale knowledge
extraction and alignment by integrating a few sources and exploring different
methods of aggregation and attention, with a focus on image information. The
main aim of data extraction with regard to semi-structured data is to retrieve
useful information from the web. Data from the part of the web known as the
deep web is retrievable, but retrieving it requires a request through form
submission, because search engines cannot reach it. As HTML documents grow
larger, the data extraction process has been found to suffer from lengthy
processing times. In this research work, we propose an improved model, namely
wrapper extraction of image using document object model (DOM) and JavaScript
object notation data (JSON) (WEIDJ), in response to the promising results of
mining higher volumes of images in various formats. To evaluate the efficiency
of WEIDJ, we compare its data extraction performance at different levels of
page extraction with VIBS, MDR, DEPTA and VIDE. WEIDJ yielded the best
results, with a precision of 100, a recall of 97.93103 and an F-measure of
98.9547.
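
Assuming the standard balanced definition, the F-measure reported above is the harmonic mean of precision and recall, F = 2PR / (P + R). The short Python check below is a sketch that treats the reported figures as percentages; it shows that the three published numbers are mutually consistent. The function name `f_measure` is illustrative only.

```python
def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): harmonic mean of precision and recall.

    The result is on the same scale as the inputs (here, percentages).
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are zero
    return 2 * precision * recall / (precision + recall)


# Figures taken from the description, assumed to be percentages.
reported_precision = 100.0
reported_recall = 97.93103
f = f_measure(reported_precision, reported_recall)
print(round(f, 4))  # prints 98.9547
```

Because precision is 100, the F-measure falls between the recall and 100, which matches the reported 98.9547.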
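
The description says that WEIDJ combines the document object model (DOM) with JSON to extract images, but this record does not give the algorithm itself. The following is a minimal, hypothetical sketch of that general idea in Python. It uses the standard-library `html.parser` as a lightweight stand-in for a full DOM. For each `<img>` element, it records the element's ancestor tag path along with its `src` and `alt` attributes, then serializes those records as JSON. `ImageWrapper` and `extract_images_as_json` are illustrative names. This is not the authors' WEIDJ implementation.

```python
import json
from html.parser import HTMLParser

# Hypothetical illustration only; not the WEIDJ algorithm from the paper.
# Void elements never get a closing tag, so they are not pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class ImageWrapper(HTMLParser):
    """Collect <img> elements together with a DOM-style ancestor path."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open element tags, outermost first
        self.records = []  # one dict per <img> found

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            attr_map = dict(attrs)  # attribute values may be None
            self.records.append({
                "path": "/".join(self.stack + ["img"]),
                "src": attr_map.get("src"),
                "alt": attr_map.get("alt"),
            })
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack:
                if self.stack.pop() == tag:
                    break


def extract_images_as_json(html: str) -> str:
    """Parse an HTML string and return its image records as a JSON array."""
    parser = ImageWrapper()
    parser.feed(html)
    parser.close()
    return json.dumps(parser.records, indent=2)


page = ('<html><body><div class="item"><img src="a.jpg" alt="A"></div>'
        '<p><img src="b.png"/></p></body></html>')
print(extract_images_as_json(page))
```

The ancestor path (for example `html/body/div/img`) is one simple way for a wrapper to tell apart images that sit in different page regions. A real extractor would also need to handle malformed markup and pages rendered by scripts more robustly.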

Creator

Ily Amalina Ahmad Sabri, Mustafa Man

Source

http://journal.uad.ac.id/index.php/TELKOMNIKA

Contributor

Peri Irawan

Format

pdf

Language

English

Type

text

Tags

Repository, Repository Horizon University Indonesia, Repository Universitas Horizon Indonesia, Horizon.ac.id, Horizon University Indonesia, Universitas Horizon Indonesia, HorizonU, Repo Horizon

Citation

Ily Amalina Ahmad Sabri, Mustafa Man, “TELKOMNIKA Telecommunication, Computing, Electronics and Control
WEIDJ: Development of a new algorithm for semi-structured web data extraction,” Repository Horizon University Indonesia, accessed November 22, 2024, https://repository.horizon.ac.id/items/show/3550.