Image Enhancement System for the Restoration of Old Jawi Malay Manuscripts using Binarization Method

Image enhancement is the process of refining a given image so that the desired image features become easier for the human visual system to perceive. Ancient manuscripts, some hundreds to thousands of years old, are often in an incomprehensible condition. Contributing factors include the age of the manuscript, environmental influence, and ink quality. Malay manuscripts, which are normally written in the Jawi script, are among those affected. These documents, written between the 16th and 19th centuries, survive to this day, but their quality has degraded. The purpose of this study is to develop a system that protects deteriorating historical manuscripts and to find a solution that reduces image noise. The enhancement method employed is binarization, also called thresholding. The study found that the Image Enhancement System (IES) was easy to use and has great potential for future improvement, because IES yields a higher entropy value for the enhanced image than for the original image.


Introduction
A large number of manuscripts were written in the past, and the condition of these historical manuscripts is degrading. Factors that cause the degradation include age, environmental influence, and ink quality. Old Malay manuscripts, which are normally written in the Jawi script, are one example. These documents, written between the 16th and 19th centuries, have survived until today, but their quality has degraded [1]. The Image Enhancement System (IES) aids in the restoration of old Jawi Malay manuscripts using the binarization process, preserving cultural heritage and ancient knowledge for future generations.
The software used to develop this project is MATLAB. MATLAB allows algorithms to be tested immediately without recompilation: the developer can type a command at the command line or execute a section in the editor and see the results at once, which greatly facilitates algorithm development. MATLAB also produces results quickly in other ways, probably because it has a long history of refinement, very good user documentation, and a very large library of built-in functions for common numerical computing tasks.
This study classifies the methods for enhancing old manuscripts into two categories: binarization (thresholding) and local image enhancement. In the evaluation section, entropy values and expert review are used to evaluate the results of the project.

Background and related study
This section describes the background of the Image Enhancement System (IES) and the related studies that were investigated. There are a few useful methods for calculating the background and foreground regions of an image. A prominent method for segmenting the original image into a binary image is the entropy-based method. Based on the performance analysis, IES yields a higher entropy value for the enhanced image than for the original image.
Past studies have proposed different methods to make damaged manuscripts easier to read and reproduce [1]. Fuzzy logic and histogram-based techniques are fast and efficient methods for colour image enhancement. Research by [2] proposes a new fuzzy logic and histogram-based algorithm for improving low-contrast colour images. The technique is based on the concept of mapping the distorted histogram of the original image onto a standardised histogram. Visual quality, Tenengrad, the contrast improvement index (CII), and computing time are used to compare the efficiency of the various contrast enhancement algorithms. Based on the results of the performance review, the researchers concluded that the proposed fuzzy logic approach is well suited to enhancing the contrast of low-contrast colour images [2].
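The idea of mapping a distorted histogram onto a standardised one has a plain, non-fuzzy baseline: classic histogram equalization. The sketch below shows only that baseline and is not the fuzzy algorithm of [2]. It assumes images are lists of rows of 8-bit integer grey values, and the function name `equalize_histogram` is hypothetical.

```python
def equalize_histogram(image, levels=256):
    """Classic histogram equalization: remap each grey level through the
    normalised cumulative histogram so the output histogram is spread more
    evenly over [0, levels - 1]. `image` is a list of rows of ints."""
    pixels = [p for row in image for p in row]
    n = len(pixels)
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    # Cumulative distribution: cdf[g] = number of pixels with value <= g.
    cdf, running = [], 0
    for count in hist:
        running += count
        cdf.append(running)
    cdf_min = next(c for c in cdf if c > 0)  # cdf at the darkest occupied level
    if cdf_min == n:
        return [row[:] for row in image]  # constant image: nothing to stretch
    # Look-up table mapping each input level to its equalized output level.
    lut = [max(0, round((c - cdf_min) / (n - cdf_min) * (levels - 1))) for c in cdf]
    return [[lut[p] for p in row] for row in image]
```

A low-contrast patch such as `[[50, 50], [100, 150]]` is stretched so that its darkest level maps to 0 and its brightest to 255.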
MedGA is an innovative evolutionary approach for enhancing image quality in medical imaging systems. Image enhancement techniques are frequently used in medical imaging systems to aid physicians in anomaly/abnormality detection and diagnosis, as well as to improve the quality of images that are subjected to automated image processing. MedGA is a new image enhancement approach based on Genetic Algorithms that strengthens the two underlying sub-distributions of images with a bimodal grey level intensity histogram, improving the appearance and visual quality [3].
The study on digital image restoration of historical Devanagari manuscripts [4] discussed current techniques for background restoration, which remove background deterioration. However, little work has been done to remove deterioration in the foreground of a manuscript. The aim of that work is to restore the foreground by using restoration techniques to complete missing characters in torn sections of the manuscript, using the Otsu binarization thresholding method to restore the context. Image processing methods were applied to degraded manuscript images to restore the text, foreground, and context [4].
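Otsu's method, which [4] uses, chooses the global threshold that maximises the between-class variance of the grey-level histogram. The following is a minimal sketch under a few assumptions: 8-bit grey values supplied as a flat list, and dark ink on a lighter background. The names `otsu_threshold` and `binarize` are hypothetical. In MATLAB, the built-in `graythresh` function provides an Otsu threshold directly.

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: return the threshold t maximising the between-class
    variance, where one class is {p <= t} and the other is {p > t}."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(g * h for g, h in enumerate(hist))
    w_low = sum_low = 0
    best_t, best_var = 0, -1.0
    for t in range(levels):
        w_low += hist[t]            # number of pixels with value <= t
        if w_low == 0:
            continue
        w_high = total - w_low      # number of pixels with value > t
        if w_high == 0:
            break
        sum_low += t * hist[t]
        mean_low = sum_low / w_low
        mean_high = (sum_all - sum_low) / w_high
        # Between-class variance, up to the constant factor 1 / total**2.
        between = w_low * w_high * (mean_low - mean_high) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image, t):
    """1 = ink (p <= t), 0 = background; assumes dark ink on lighter parchment."""
    return [[1 if p <= t else 0 for p in row] for row in image]
```

Because the criterion uses only the histogram, the threshold is computed once per image, which is why Otsu's method is a common choice for the global first pass before any local refinement.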

Methodology
The Image Enhancement System (IES) was developed in two phases. The first phase covered the formulation of the idea and its documentation, including preparing the proposal, presenting the idea and its objectives, analysing requirements, and designing and prototyping the application. The requirements had to cover several functions: the binarization/thresholding process, the entropy value, and the local image enhancement method. The second phase focused on development, testing, and maintenance. The analysis phase identifies where the problem lies. It breaks the system down into pieces to analyse the situation, analyses the project goals, breaks down what needs to be created, and engages users so that definite requirements can be defined. A survey was used for requirement analysis: a Google Form with 20 questions about IES was prepared and distributed randomly. An expert review completes this study.

Binarization/Thresholding Method
Historical document images generally show low quality because of storage conditions and the quality of the parchment, which cause the documents to degrade over time. As a result, it is difficult to separate the foreground from the background. The problem is made harder because many documents have varying contrast, dirt, varying background intensity, and ink seeping through from the opposite side of the page [8].
In the first stage, a global threshold is used to generate an initial binary image B. This is sufficient for noise-free characters. An evaluation procedure then determines which connected components in B are well separated from the background and which need to be refined. Within the bounding box of each component CCi, this study treats SDi as the foreground and the rest (CCi − SDi) as the background. A distance transform algorithm [10] then calculates, for each background pixel (a pixel in the set CCi − SDi), the distance to the nearest foreground pixel (a pixel in the set SDi). The set of distances calculated for CCi is denoted DTi. The variation of the distances in DTi, denoted σi, is used to distinguish good characters from noisy ones. The document is presumed to be fully degraded if σmean ≥ σths, where σths is an empirically validated threshold, and in that case all its components CCi are labelled as noisy. Otherwise, the document contains both good and bad characters, and every component CCi with σi ≤ σmean is classified as a well-segmented character. The remaining components are processed with the local method as follows [8]:
a) Find all candidate pixels, which are the background pixels that are 8-connected to the growing foreground.
b) For each candidate pixel p, let Mf be the average grey value of the foreground pixels and Mb the average grey value of the background pixels in the window around p, and assign p to one of the two classes based on Mf and Mb.
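The first stage and the local refinement step can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of [8]. Images are assumed to be lists of rows of 8-bit grey values with dark ink, and components are found by breadth-first 8-connected labelling. The window is assumed to be 3x3 (`half=1`). Because the text leaves the assignment rule open, a candidate pixel is assumed to join the foreground when its grey value is closer to Mf than to Mb. The sketch does a single refinement pass, whereas a growing foreground would repeat the pass until no pixel changes. The distance-transform test that separates good from noisy components is not shown. All function names are hypothetical.

```python
from collections import deque

# 8-connectivity: offsets (row, column) of the eight neighbours of a pixel.
NEIGHBOURS_8 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]


def global_binarize(gray, t):
    """Initial binary image B: 1 = ink (grey value <= t), 0 = background."""
    return [[1 if p <= t else 0 for p in row] for row in gray]


def connected_components(binary):
    """Return the 8-connected foreground components CC_i as lists of (row, col)."""
    h, w = len(binary), len(binary[0])
    seen = [[False] * w for _ in range(h)]
    components = []
    for y in range(h):
        for x in range(w):
            if not binary[y][x] or seen[y][x]:
                continue
            seen[y][x] = True
            queue, pixels = deque([(y, x)]), []
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for dy, dx in NEIGHBOURS_8:
                    ny, nx = cy + dy, cx + dx
                    if 0 <= ny < h and 0 <= nx < w and binary[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            components.append(pixels)
    return components


def refine_candidates(gray, binary, component, half=1):
    """One local pass for one component: background pixels 8-connected to it
    are candidates; a candidate joins the foreground if its grey value is
    closer to the window's foreground mean Mf than to its background mean Mb
    (assumed rule)."""
    h, w = len(gray), len(gray[0])
    candidates = set()
    for y, x in component:
        for dy, dx in NEIGHBOURS_8:
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and not binary[ny][nx]:
                candidates.add((ny, nx))
    refined = [row[:] for row in binary]
    for y, x in candidates:
        fg, bg = [], []
        # (2*half + 1) x (2*half + 1) window centred on p, clipped at the borders.
        for wy in range(max(0, y - half), min(h, y + half + 1)):
            for wx in range(max(0, x - half), min(w, x + half + 1)):
                (fg if binary[wy][wx] else bg).append(gray[wy][wx])
        if not fg or not bg:
            continue  # both classes are needed to compare means
        mf, mb = sum(fg) / len(fg), sum(bg) / len(bg)
        if abs(gray[y][x] - mf) < abs(gray[y][x] - mb):
            refined[y][x] = 1
    return refined
```

In this sketch a faint ink pixel (grey value 90) next to a stroke that the global threshold kept is pulled into the foreground, while bright parchment pixels around the stroke stay in the background.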

Method Based on Entropy Threshold Calculation
Several methods can be used to calculate the background and foreground regions. The entropy-based method is a prominent means of segmenting the original image into a binary image [12]. This method usually involves threshold values for two classes, T1 (background) and T2 (foreground). In theory, the higher the entropy value, the more detail the image contains, so a higher entropy value is desirable.
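Entropy-based segmentation can be illustrated with one widely used formulation, Kapur's maximum-entropy criterion, which picks the single threshold t that maximises the sum of the background and foreground entropies. This technique is swapped in for illustration only. The text does not say whether [12] uses this exact criterion, and the sketch collapses the two-class T1/T2 notation into one threshold. The function name `kapur_threshold` is hypothetical, and pixels are assumed to be a flat list of 8-bit grey values.

```python
import math


def kapur_threshold(pixels, levels=256):
    """Kapur-style maximum-entropy threshold: choose t maximising
    H(class p <= t) + H(class p > t), each entropy computed from the
    class-normalised histogram. Uses the natural log; the base does not
    change which t wins. Returns None if the image has a single grey level."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    n = len(pixels)
    prob = [h / n for h in hist]
    best_t, best_h = None, -math.inf
    for t in range(levels - 1):
        p_low = sum(prob[: t + 1])
        p_high = sum(prob[t + 1:])
        if p_low == 0 or p_high == 0:
            continue  # one class is empty, so this t does not split the image
        h_low = -sum(q / p_low * math.log(q / p_low) for q in prob[: t + 1] if q > 0)
        h_high = -sum(q / p_high * math.log(q / p_high) for q in prob[t + 1:] if q > 0)
        if h_low + h_high > best_h:
            best_t, best_h = t, h_low + h_high
    return best_t
```

The loop is quadratic in the number of grey levels, which is negligible for 256 levels. Cumulative sums would make it linear if that ever mattered.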

Usefulness and Ease of Use
Two types of acceptance testing, alpha and beta, were used to review the system. For beta acceptance testing, approximately 30 respondents, comprising students and staff of Universiti Utara Malaysia, were recruited. Their opinions were collected through a questionnaire in Google Forms accompanied by a video of the image enhancement system. The questionnaire has three sections. Section A covers the respondents' demographics and background, Section B covers usefulness and ease of use, and Section C covers satisfaction with the Image Enhancement System (IES). Respondents had to answer all the questions and were given 15 to 20 minutes to perform all tasks. For alpha acceptance testing, a few output images of the image enhancement system were provided to five experts, who evaluated each output image and wrote down what they observed.

Demography and Background.
Demographics and background are important because they are measurable characteristics. For example, it is simple to determine the numbers of male and female students, their current semester of study, and other relevant information. Measuring such characteristics helps identify how many people may need current and future image enhancement systems. Thirty (30) respondents answered the questionnaire distributed to them, eight (8) male and 22 female. The largest group of respondents is from semester 5, making up 40% of the total. Six (6) respondents (20%) are from semester 6, four (4) (13.3%) are from semester 3, and three (3) (10%) each are from semesters 4 and 7. Finally, two (2) respondents (6.7%) are from semester 1, and none are from semester 2.
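The semester and gender figures can be cross-checked with a few lines of arithmetic. The count of 12 for semester 5 is derived from the stated 40% of 30 respondents. The two groups of three respondents are assigned to semesters 4 and 7, which is the only reading under which the counts sum to 30. All other counts are quoted directly.

```python
# Respondents per semester, as reported in the demography section.
# Semester 5 (12) is derived from 40% of 30 respondents.
counts = {1: 2, 2: 0, 3: 4, 4: 3, 5: 12, 6: 6, 7: 3}
gender = {"male": 8, "female": 22}

total = sum(counts.values())
# Percentage share of each semester, rounded to one decimal place.
shares = {sem: round(100 * n / total, 1) for sem, n in counts.items()}
gender_total = sum(gender.values())
gender_shares = {g: round(100 * n / gender_total, 1) for g, n in gender.items()}

print(total, shares, gender_shares)
```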

Usefulness of Image Enhancement System.
Most respondents agreed with the statement 'Able to complete the task quickly using image enhancement system'. Sixteen (16) answered 'Agree' and six (6) answered 'Strongly Agree', while seven (7) answered 'Neutral' and one (1) answered 'Disagree'. Most respondents also agreed that 'Whenever they made a mistake while using this system, they could recover quickly and easily'. Twenty (20) answered 'Agree' and one (1) answered 'Strongly Agree', while seven (7) answered 'Neutral' and two (2) answered 'Disagree'.
Most respondents agreed that 'Image Enhancement System has all the functions and capabilities they expected it to have'. Nineteen (19) answered 'Agree' and two (2) answered 'Strongly Agree', while eight (8) answered 'Neutral' and one (1) answered 'Disagree'.
Most respondents agreed that 'All the functions and capabilities work or behave exactly as anticipated'. Eighteen (18) answered 'Agree' and two (2) answered 'Strongly Agree', while 10 answered 'Neutral'.

Ease of Use of Image Enhancement System.
Twenty-one (21) respondents agreed with the statement, five (5) answered 'Strongly agree', and the remaining four (4) answered 'Neutral'. Most respondents also agreed that they feel comfortable using the image enhancement system and that they feel in control and free when using it. Sixteen (16) answered 'Agree', five (5) answered 'Strongly Agree', and the rest answered 'Neutral'. For the statement 'Navigation through this image enhancement system is easy', 21 respondents answered 'Agree' and five (5) answered 'Strongly agree', while six (6) answered 'Neutral' and one (1) answered 'Strongly disagree'.
Most respondents agreed that 'They always know where to get the information they wanted'. Fourteen (14) answered 'Agree' and five (5) answered 'Strongly agree', while 11 answered 'Neutral'. For overall user satisfaction, most respondents agreed that 'Image Enhancement System is easy to use'. Seventeen (17) answered 'Agree', two (2) answered 'Strongly agree', and 11 answered 'Neutral'. Most respondents also agreed that 'They are satisfied with this Image Enhancement System'. Seventeen (17) answered 'Agree', two (2) answered 'Strongly agree', and 11 answered 'Neutral'.

Evaluation Process
For the evaluation process, five evaluators with expertise in image processing were selected and given a few output images from the image enhancement system. The evaluators had to evaluate each output image and were also asked to write down what they observed. Figure 1 shows the process of obtaining feedback from the evaluators. The evaluators had to identify the Jawi words shown in the output images. Two evaluators obtained better results than the other three. Accuracy was assessed on the words each evaluator identified. Inaccurate answers arose mostly from the letters chosen in the segmented image and from the number of dots in the word.
The first word is 'dengan' (gloss: with), in which the letter 'nga' (ڠ) is written with three dots. Three of the five evaluators could not identify this word because dots were missing from the letters. The three dots are also unclear because of lines in the image. The researchers believe that two of the five evaluators have a good foundation in Jawi writing.
The second word is 'kita' (gloss: we), which is difficult to identify in the second image. The letter 'wau' (و) at the beginning of the word makes it hard for the evaluators to read the word in this image. Separating one word from another, which the cursive Jawi letters make difficult, is beyond the scope of this study. The letters 'ya' (ي) and 'ta' (ت) are also unclear in the image, even though the binarization method has improved them. Nevertheless, the image appears clearer than the original manuscript.
The third word is 'puji' (gloss: praise), which all evaluators identified correctly. The word is clearly visible after binarization. In the new Jawi script, the letter 'pa' is written with three dots (ڤ). However, the old manuscript uses a single dot (ف) for both 'fa' and 'pa'.

Prototype Development
A prototype of the Image Enhancement System was developed to represent the requirements explained in the previous subsection. Software prototyping is a standard way of demonstrating software requirements so that users can give further comments and suggestions based on their experience of interacting with the prototype. The system was developed in MATLAB. Figure 3 shows the login page of the system, and Figure 4 shows the interface of the enhancement system. The enhancement process includes loading a file, enhancing the image based on sharpness, filtering colours, colour emp (filtering of the image), and colour inversion. In addition, entropy in Shannon's sense is used to measure the amount of detail in the image. Theoretically, the higher the entropy value, the more detail the image contains, so a higher entropy value is desired.
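The entropy measure used to compare original and enhanced images can be sketched as the Shannon entropy of the grey-level histogram, H = −Σ p_i log2 p_i, where p_i is the fraction of pixels at grey level i. This is a minimal sketch that assumes flat lists of integer grey values. The function name `shannon_entropy` is hypothetical. MATLAB's Image Processing Toolbox provides an `entropy` function that computes the same quantity from the image histogram.

```python
import math
from collections import Counter


def shannon_entropy(pixels):
    """Shannon entropy, in bits, of the grey-level histogram of `pixels`
    (a flat sequence of integer grey values): H = sum p_i * log2(1 / p_i)."""
    n = len(pixels)
    counts = Counter(pixels)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


# Usage: flatten a 2-D image (a list of rows) before measuring it.
image = [[12, 12, 200], [200, 90, 12]]
flat = [p for row in image for p in row]
h = shannon_entropy(flat)
print(h)
```

A constant image scores 0 bits, and an image spread evenly over 2^k grey levels scores k bits. This is why a higher value is read as "more detail" when the original and enhanced images are compared.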

Conclusion
This study describes the design and development of the Image Enhancement System (IES). Many aspects of IES remain to be studied. The usability evaluation suggests that the interface of the system needs improvement. Meanwhile, based on the expert review, the system requires