The Design of a C1 Document Data Extraction Application Using a Tesseract-Optical Character Recognition Engine

Dublin Core

Title

The Design of a C1 Document Data Extraction Application Using a Tesseract-Optical Character Recognition Engine

Subject

affine transformation; digital signature; automatic data entry; optical character recognition; RSA-2048; SHA-256; tesseract-OCR

Description

The 2019 election process employed the Vote Counting Information System, also known as Sistem Informasi Penghitungan Suara (Situng), to provide transparency in the recapitulation process. The data displayed in Situng comes from the C1 documents of 813,336 voting stations across Indonesia. Officers at the municipal General Election Commission (GEC) enter this data from the C1 documents and upload it to Situng. Because this process is performed by humans, it is not immune to errors: during the recapitulation of the 2019 election results there were 269 data entry errors, and data entry also fell behind its specified target, causing delays. Furthermore, there were cases of C1 document modification, raising concerns about the data's authenticity. Automatic data entry is a plausible way to avoid human errors and speed up data entry. Because the data to be entered is text in image documents that all follow the same template, optical character recognition (OCR) can be used to read it, with image quality and alignment improved first so that the OCR reads a more accurate area. In this study, we developed a C1 document data extraction application using the waterfall SDLC method, a systematic and thorough development process. The application was developed using Tesseract, an open-source OCR engine and command-line program that recognizes text characters within a digital image. The accuracy obtained with this method is not yet sufficient for the application to replace Situng's data entry officers. To guarantee the integrity of the C1 document, we used the RSA-2048 digital signature scheme. Using the Tesseract-OCR engine for character recognition, combined with digital signature capabilities, provides a comprehensive solution for reducing the human errors that might result in miscalculations and inaccurate processes.
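The abstract describes aligning each scanned C1 form to the shared template so that Tesseract reads the correct area, and the subject keywords list an affine transformation for this. The sketch below illustrates that alignment step under stated assumptions: three reference marks are assumed to be already located on both the blank template and the scan, and all function names, coordinates, and the field box are hypothetical, not the authors' implementation. It solves the six affine coefficients from the three point pairs with Cramer's rule, then maps a template field box into scan coordinates for cropping.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Matrix = List[List[float]]


def _det3(m: Matrix) -> float:
    """Determinant of a 3x3 matrix (cofactor expansion along the first row)."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def _solve3(m: Matrix, rhs: Sequence[float]) -> List[float]:
    """Solve m @ x = rhs for a 3x3 system using Cramer's rule."""
    d = _det3(m)
    if abs(d) < 1e-12:
        raise ValueError("reference points are collinear; affine transform is undefined")
    solution = []
    for col in range(3):
        # Replace column `col` with the right-hand side and take the determinant ratio.
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]
        solution.append(_det3(mc) / d)
    return solution


def affine_from_points(src: Sequence[Point], dst: Sequence[Point]) -> Matrix:
    """Return the 2x3 affine matrix mapping three src points onto three dst points.

    u = a*x + b*y + c and v = d*x + e*y + f are solved separately.
    """
    m = [[float(x), float(y), 1.0] for x, y in src]
    a, b, c = _solve3(m, [u for u, _ in dst])
    d, e, f = _solve3(m, [v for _, v in dst])
    return [[a, b, c], [d, e, f]]


def apply_affine(A: Matrix, p: Point) -> Point:
    """Map one point through the 2x3 affine matrix A."""
    x, y = p
    return (A[0][0] * x + A[0][1] * y + A[0][2],
            A[1][0] * x + A[1][1] * y + A[1][2])


def field_box_in_scan(A: Matrix, box: Tuple[float, float, float, float]) -> Tuple[float, float, float, float]:
    """Map a template field box (x0, y0, x1, y1) into scan coordinates.

    Returns the axis-aligned bounding box of the four transformed corners,
    which is the region that would be cropped and handed to the OCR engine.
    """
    x0, y0, x1, y1 = box
    corners = [apply_affine(A, p) for p in ((x0, y0), (x1, y0), (x0, y1), (x1, y1))]
    xs = [u for u, _ in corners]
    ys = [v for _, v in corners]
    return (min(xs), min(ys), max(xs), max(ys))


# Hypothetical example: three reference marks on the blank template and where they
# were detected on a scan that is scaled by 1.1 and shifted by (20, 35) pixels.
template_marks = [(100.0, 100.0), (900.0, 100.0), (100.0, 1200.0)]
scan_marks = [(130.0, 145.0), (1010.0, 145.0), (130.0, 1355.0)]

A = affine_from_points(template_marks, scan_marks)
vote_box = field_box_in_scan(A, (400.0, 500.0, 600.0, 540.0))
print(vote_box)
```

Detecting the reference marks on a real scan is not shown. The cropped region could then be read by Tesseract, for example through the pytesseract wrapper's image_to_string; if OpenCV is already a dependency, cv2.getAffineTransform solves the same three-point system and cv2.warpAffine can warp the whole scan onto the template instead.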

Creator

Ircham Aji Nugroho, Bety Hayat Susanti*, Mareta Wahyu Ardyani, Nadia Paramita R.A.

Source

https://jurnal.iaii.or.id/index.php/RESTI/article/view/5151/891

Publisher

Department of Cryptographic Engineering, Politeknik Siber dan Sandi Negara

Date

04-02024

Contributor

FAJAR BAGUS W

Format

PDF

Language

ENGLISH

Type

TEXT

Citation

Ircham Aji Nugroho, Bety Hayat Susanti, Mareta Wahyu Ardyani, Nadia Paramita R.A., “The Design of a C1 Document Data Extraction Application Using a Tesseract-Optical Character Recognition Engine,” Repository Horizon University Indonesia, accessed January 12, 2026, https://repository.horizon.ac.id/items/show/10193.