Paper The following article is Open access

Balinese character recognition on mobile application based on tesseract open source OCR engine


Published under licence by IOP Publishing Ltd
Citation I M D R Mudiarta et al 2020 J. Phys.: Conf. Ser. 1516 012017 DOI 10.1088/1742-6596/1516/1/012017


Abstract

Balinese script, a part of Balinese culture, is rarely used today. Through Governor Regulation number 80 of 2018, the Provincial Government of Bali is trying to preserve the Balinese language and script. This study aimed to help preserve the Balinese script through a mobile technology approach, a recent trend with worldwide coverage that supports ubiquitous learning. The research integrated the Tesseract open source Optical Character Recognition (OCR) engine into an Android application that recognizes Balinese characters in images and converts them into text. The input of the application is a Balinese script image, either captured by the mobile camera or taken from an existing image. Based on the training data available in the application, the input image is recognized as text that can be processed further. The new Balinese script training data was created from the eighteen basic syllables of the Balinese script and numbers only. The application can be operated offline on mobile hardware that supports camera functions. In a test on 50 words, the recognition rate was 62%, obtained on good-quality images in the Bali-Simbar font. The application can be developed further to recognize other character repertoires, i.e., vowels (Akśara Suara), semivowels (Arda Suara), additional syllables (Akśara Şwalalita), and sound killers (Pangangge Tengenan).
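
The abstract does not state how the 62% figure was computed. As a minimal sketch, assuming it is the fraction of test words whose recognized text exactly matches the expected text, a 62% rate on 50 words would correspond to 31 correctly recognized words. The function name `word_accuracy` and the word lists below are hypothetical placeholders, not data from the study.

```python
def word_accuracy(expected, recognized):
    """Fraction of words whose recognized text exactly matches the expected text.

    Assumption: exact, whole-word matching. The paper may use a different metric.
    """
    if len(expected) != len(recognized):
        raise ValueError("expected and recognized must have the same length")
    if not expected:
        return 0.0
    correct = sum(1 for e, r in zip(expected, recognized) if e == r)
    return correct / len(expected)


# Placeholder data only: 50 hypothetical words, the first 31 recognized correctly.
expected = [f"word{i}" for i in range(50)]
recognized = [w if i < 31 else "?" for i, w in enumerate(expected)]

rate = word_accuracy(expected, recognized)
print(f"{rate:.0%}")
```

A character-level metric, such as an edit-distance-based error rate, would give a different number for the same OCR output, so the choice of metric matters when comparing results.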


Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
