The Design of a C1 Document Data Extraction Application Using a Tesseract-Optical Character Recognition Engine
Dublin Core
Title
The Design of a C1 Document Data Extraction Application Using a Tesseract-Optical Character Recognition Engine
Subject
affine transformation;digital signature;automatic data entry;optical character recognition;RSA-2048;SHA-256;tesseract-OCR
Description
The 2019 election process employed the Vote Counting Information System, also known as Sistem Informasi Penghitungan Suara (Situng), to provide transparency in the recapitulation process. The data displayed in Situng is from the C1 document for 813,336 voting stations in Indonesia. The data collected from the C1 document is entered and uploaded into Situng by officers at the municipal General Election Commission (GEC). Since this process is performed by humans, it is not immune to errors. In the recapitulation process of the 2019 election results, there were 269 data entry errors, and the data entry process also did not run according to the specified target, resulting in delays. Furthermore, there were cases of C1 document modification, raising concerns about the data's authenticity. To avoid human errors and increase data entry speed, automatic data entry is a plausible option. The data entered is text data in image documents with the same template format, so that optical character recognition (OCR) can be used to read the text while improving image quality and alignment, resulting in a more accurate OCR reading area. In this study, we developed a C1 document data extraction application using the waterfall SDLC method, which has undergone a systematic and thorough process. The application was developed using Tesseract optical character recognition. Tesseract is an open-source OCR engine and command-line program that allows for the recognition of text characters within a digital image. The accuracy obtained by using this method is still not optimal as a substitute for Situng's data entry officer. To guarantee the integrity of the C1 document, we used the RSA-2048 digital signature scheme. Using the Tesseract-OCR Engine for character recognition, combined with digital signature capabilities, provides a comprehensive solution to reduce the human error factor that might result in miscalculations and inaccurate processes.
Creator
Ircham Aji Nugroho, Bety Hayat Susanti, Mareta Wahyu Ardyani, Nadia Paramita R.A.
Source
https://jurnal.iaii.or.id/index.php/RESTI/article/view/5151/891
Publisher
Department of Cryptographic Engineering, Politeknik Siber dan Sandi Negara
Date
04-02024
Contributor
FAJAR BAGUS W
Format
PDF
Language
ENGLISH
Type
TEXT
Citation
Ircham Aji Nugroho, Bety Hayat Susanti, Mareta Wahyu Ardyani, Nadia Paramita R.A., “The Design of a C1 Document Data Extraction Application Using a Tesseract-Optical Character Recognition Engine,” Repository Horizon University Indonesia, accessed January 12, 2026, https://repository.horizon.ac.id/items/show/10193.