Balinese Glyph Recognition with Gabor Filters

Recognizing Balinese glyphs in the Balinese script of palm leaf manuscripts is not trivial. The Balinese script has more than a hundred glyphs that represent basic and compound syllables, as well as some punctuation marks. Because they are written with similar curves, many of these glyphs naturally share a strong interclass similarity. The degraded, textured images of the palm leaf manuscripts add further difficulty to recognizing Balinese glyphs. In this paper, we investigate the use of a Gabor filter bank as the feature extraction method for recognizing Balinese glyphs. A Gabor filter bank can detect texture variations at many orientations and frequencies. Our experiments used the published AMADI_LontarSet dataset for glyph recognition. With a single hidden layer neural network as the classifier, the results were very promising, and Gabor filters combined with the Zoning method achieved a fairly high recognition rate. In future work, Gabor filters will be analyzed in combination with Histogram of Gradient, Neighborhood Pixel Weight, and Kirsch Edges features.


Introduction
Glyph recognition, or character recognition, is frequently performed in document image analysis (DIA) research projects. It is a necessary step in the Optical Character Recognition (OCR) module, especially for segmentation-based text recognition methods [1], [2]. In these methods, a text line is first segmented into individual, isolated glyph or character segments, and the recognition process is then performed on each segment.
Many works on isolated character recognition have already been reported, and the wide variety of feature extraction methods for glyph/character recognition has helped many DIA projects achieve optimal performance. Nevertheless, the discovery of ancient documents and manuscripts all over the world continues to pose challenges for glyph/character recognition methods. These manuscripts contain ancient scripts with entirely new collections of glyphs/characters that need to be analyzed, for example the manuscript collections found in some Asian countries, and more specifically in some Southeast Asian countries. For these manuscripts, the feature extraction methods that are widely used for Latin script recognition are not straightforward to apply, because the ancient scripts, glyphs, and characters have specific writing styles, different forms and formats, and often very complex writing rules. Examples include the Chinese and Japanese script families [3]-[6], the Indian script families [7], [8], the Gurmukhi script [9]-[12], the Sundanese script [13], the Khmer script [14]-[16], and the Balinese script [17]-[19]. The collection of palm leaf manuscripts from Bali, Indonesia, also poses a significant number of challenges for DIA research projects [17]. These manuscripts were written on dried palm leaves, with the script scratched into the leaf using a small knife. The first challenge is the physical condition of the manuscripts: they are degraded and the writing is faded. The second challenge is the complexity of the Balinese script, which belongs to the family of alpha-syllabic scripts. This paper presents an investigation of the use of Gabor filters to recognize isolated Balinese glyphs. Previous work on isolated Balinese glyph recognition has already been reported [18].
However, a new challenge in word spotting research for the Balinese palm leaf manuscript collections led to a new hypothesis on the use of Gabor filters as the feature extraction method. Section 2 of this paper gives a short overview of the characteristics of the glyphs in Balinese script. Gabor filters are presented in Section 3. The Balinese glyph dataset, the evaluation method, and the experimental results are described in Section 4. Finally, Section 5 lists some conclusions and future work.

Balinese Glyphs of Balinese Script
In this section, we give a brief overview of the use of Balinese script in Balinese palm leaf manuscripts and introduce the Balinese glyph collection.

The Use of Balinese Script in Palm Leaf Manuscripts
More than fifty thousand palm leaf manuscript collections have been found in Bali, Indonesia (Figure 1). The main collections, about six thousand palm leaf manuscript collections, are held in two principal museums owned by the Balinese regional government, but most of the remaining collections are owned by private Balinese families and kept in their houses. The importance of these collections lies in the variety of their content, which ranges widely from religion, arts, law, and architecture to traditional medical knowledge and many other socio-cultural aspects of the ancient Balinese way of life. The manuscripts were written in the Balinese language, mixed with Old Javanese (Kawi) and Sanskrit, using the Balinese script, which is a descendant of the Brahmi script. Nowadays, the Balinese people still speak the Balinese language, but they no longer write in Balinese script. As a result, the young generation is largely unaware of the richness of the cultural heritage preserved in the palm leaf manuscript collections; since they do not know how to write in Balinese script, reading the script in the manuscripts is even more difficult for them. The Balinese script has more than a hundred glyphs, comprising consonants, independent vowels, dependent vowels, conjunct forms of consonants, digits, punctuation marks, and many additional symbols for special consonants and musical notation (Figure 2).

Gabor Filters
A Gabor filter is a sinusoidal wave modulated by a Gaussian envelope [20]. It is a texture-based filter that can be tuned to many orientations and frequencies. A Gabor filter bank is a collection of Gabor filters that differ in their parameters: orientation, wavelength, aspect ratio, and bandwidth (Figure 4). Our hypothesis is that Gabor filters can provide initial information about the existence of textures in the document, and can therefore be used to detect preliminary information about text and non-text regions [21]. The Zoning method was then applied to the binarized version of each Gabor filtered image. Seven different zoning variants were used, including vertical and horizontal zoning, block zoning, diagonal zoning, and radial and circular zoning (Figure 5). For all zoning variants, the zone width was set to 10 pixels. The feature value of each zone was computed as the ratio of text pixels to all pixels in that zone. Finally, the feature values extracted from the binarized versions of the 64 Gabor filtered images, over all seven zoning variants, were concatenated into a 4,992-dimensional feature vector, i.e., 78 zone values per filtered image.
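The feature extraction pipeline described above (Gabor filter bank, binarization, zoning, concatenation) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' implementation: the split of the 64 filters into 8 wavelengths × 8 orientations, the wavelength values, the one-octave bandwidth-to-sigma rule, the mean-threshold binarization, and all helper names (`gabor_kernel`, `gabor_bank`, `filter2d_same`, `zoning_features`, `extract_features`) are hypothetical. Only the vertical, horizontal, and block zoning variants are implemented; the diagonal, radial, and circular variants are omitted.

```python
import numpy as np


def gabor_kernel(wavelength, theta, sigma, gamma=0.5, psi=0.0):
    """Real 2-D Gabor kernel: a cosine carrier modulated by a Gaussian envelope."""
    half = int(np.ceil(3 * sigma))  # odd-sized kernel covering +/- 3 sigma
    y, x = np.mgrid[-half:half + 1, -half:half + 1].astype(float)
    # Rotate coordinates so the carrier oscillates along orientation theta.
    x_r = x * np.cos(theta) + y * np.sin(theta)
    y_r = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-(x_r**2 + (gamma * y_r) ** 2) / (2 * sigma**2))
    carrier = np.cos(2 * np.pi * x_r / wavelength + psi)
    return envelope * carrier


def gabor_bank(n_orient=8, wavelengths=(4, 6, 8, 10, 12, 14, 16, 18),
               gamma=0.5, bandwidth=1.0):
    """Hypothetical 64-filter bank: 8 wavelengths x 8 orientations (assumed split)."""
    # Standard relation between spatial-frequency bandwidth (octaves) and sigma.
    ratio = (2**bandwidth + 1) / (2**bandwidth - 1)
    bank = []
    for lam in wavelengths:
        sigma = lam / np.pi * np.sqrt(np.log(2) / 2) * ratio
        for i in range(n_orient):
            bank.append(gabor_kernel(lam, np.pi * i / n_orient, sigma, gamma))
    return bank


def filter2d_same(image, kernel):
    """2-D linear convolution via zero-padded FFT, cropped to the input size."""
    ih, iw = image.shape
    kh, kw = kernel.shape
    shape = (ih + kh - 1, iw + kw - 1)
    full = np.real(np.fft.ifft2(np.fft.fft2(image, shape) * np.fft.fft2(kernel, shape)))
    top, left = kh // 2, kw // 2
    return full[top:top + ih, left:left + iw]


def zoning_features(binary, zone=10):
    """Ratio of text pixels to all pixels in vertical strips, horizontal strips and blocks."""
    h, w = binary.shape
    feats = []
    for c in range(0, w, zone):  # vertical zoning: column strips of width `zone`
        feats.append(binary[:, c:c + zone].mean())
    for r in range(0, h, zone):  # horizontal zoning: row strips of height `zone`
        feats.append(binary[r:r + zone, :].mean())
    for r in range(0, h, zone):  # block zoning: zone x zone cells
        for c in range(0, w, zone):
            feats.append(binary[r:r + zone, c:c + zone].mean())
    return np.array(feats)


def extract_features(image, bank, zone=10):
    """Filter, binarize, zone, and concatenate the features of every filter in the bank."""
    vectors = []
    for kernel in bank:
        response = np.abs(filter2d_same(image, kernel))
        binary = response > response.mean()  # assumed binarization rule
        vectors.append(zoning_features(binary, zone))
    return np.concatenate(vectors)


if __name__ == "__main__":
    glyph = np.zeros((60, 60))
    glyph[20:40, 28:32] = 1.0  # hypothetical vertical stroke standing in for a glyph
    bank = gabor_bank()
    features = extract_features(glyph, bank)
    print(len(bank), features.shape)  # prints: 64 (3072,)
```

For a 60 × 60 glyph image, each filtered image yields 6 vertical + 6 horizontal + 36 block zones = 48 values, so this sketch produces 64 × 48 = 3,072 features rather than the 4,992 (64 × 78) reported above, which also include the diagonal, radial, and circular zoning variants and depend on the actual glyph image size.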

Experimental Results and Evaluation
In this section, we present the dataset used in our experiments and evaluate the performance of Gabor filters in recognizing Balinese glyphs.

Dataset
To investigate the performance of Gabor filters for Balinese glyph recognition, the AMADI_LontarSet dataset [17] was used. Each class has a different number of sample images: some glyphs occur frequently in this dataset, while others are very rare (Figure 6).
Table 1 shows the recognition rate of isolated Balinese glyph recognition using Gabor filters and the Zoning method. With a single hidden layer neural network as the classifier, the results are very promising, and Gabor filters combined with the Zoning method achieved a fairly high recognition rate.

Conclusions and Future Work
Recognizing Balinese glyphs in the Balinese script of palm leaf manuscripts is not trivial. In this paper, we investigated the use of a Gabor filter bank as the feature extraction method to recognize Balinese glyphs. Gabor filters provide initial information about the existence of text textures in the document, and a bank of them can detect texture variations at many orientations and frequencies. In future work, Gabor filters will be analyzed in combination with Histogram of Gradient, Neighborhood Pixel Weight, and Kirsch Edges features.