Feature extraction of Hue, Saturation, Value (HSV) and Gray Level Co-occurrence Matrix (GLCM) features for identification of woven fabric motifs in South Central Timor Regency

South Central Timor (TTS) is one of the districts in East Nusa Tenggara that has a weaving culture and produces woven cloth. Because each TTS tribe has many types of woven fabric, outsiders and even native TTS people often do not recognize the typical TTS woven fabrics. A system is therefore needed to help the community recognize the type and motif of woven fabric. In this study, digital image processing is used to identify the type of woven fabric in TTS district, using HSV color feature extraction and GLCM texture features, with the Euclidean distance used to measure the similarity between woven fabrics. The image data consist of woven fabric images from three tribes of TTS district, namely the Amanatun, Amanuban, and Mollo tribes. The identification of woven fabric motifs was evaluated with K-fold cross validation in two stages, training and testing. Testing with 10 folds gave an accuracy of 55% for GLCM texture features, 62.5% for HSV color features, and 91.67% for the combination of color and texture features.


I. Introduction
Indonesia has great artistic and cultural diversity, with characteristics specific to each region, including dress. Almost every region in Indonesia has its own distinctive and highly valued traditional fabrics. Whereas batik cloth generally comes from Java, woven cloth originates from and is characteristic of the populations of Sumatra, Nusa Tenggara, Kalimantan and Sulawesi. In NTT (East Nusa Tenggara), the woven fabrics of each region are unique in their patterns and motifs. These motifs and patterns depict the daily life of the community and are closely tied to the people of each region. The tradition of weaving in NTT has been largely abandoned because fewer and fewer young people learn weaving techniques from their parents. South Central Timor (TTS) is one of the districts in East Nusa Tenggara that has a weaving culture and produces woven fabrics. TTS Regency has three major indigenous tribes, namely Amanuban, Amanatun and Mollo, each with many types of fabrics with distinctive weaving patterns and motifs. This variety makes outsiders and even native TTS people less familiar with the typical TTS woven fabrics. People who want to learn about or buy these fabrics find it difficult to choose the fabrics of each tribe, because only certain people, such as weavers or local customary elders, know the types and motifs of each tribe's woven fabrics well. The local government has built several galleries for TTS woven fabrics to help the community recognize their types and motifs, but these efforts are not enough: people still have difficulty distinguishing the woven motifs of each tribe and still often choose the wrong cloth motif for a given tribe in TTS.
Therefore, to make it easier for people to choose woven fabric motifs, a system is needed that can assist stakeholders, especially the TTS woven fabric galleries. In this study, the authors used digital images to identify the woven fabrics of the three tribes of TTS district. Digital image processing supports many methods, for example image feature extraction, edge detection, convolution and image quality improvement. One of the methods used in this study is the extraction of Hue, Saturation, Value (HSV) color features. HSV was chosen for color feature recognition because, among the existing color feature recognition methods, it is the best [1]. The Gray Level Co-occurrence Matrix (GLCM) was used for texture feature extraction because, according to Siqueira et al. [2], among several statistical approaches GLCM has proven very powerful as a feature descriptor for representing the texture characteristics of an image. In another study, Nurhaida et al. [3] stated that GLCM is the best feature extraction method for recognizing batik images when compared with Canny edge detection and Gabor filters. The Euclidean distance is used to classify fabric types according to the motif patterns of the woven fabrics, based on their texture and color features.

Digital image processing
Digital image processing is a process that aims to improve image quality so that the image is easily interpreted or identified by humans or computers. The term is generally defined as two-dimensional image processing by computer, where the input is an image and the output can be an image or a set of characteristics related to the image.

Preprocessing
Preprocessing is the processing applied to image data before it enters the image identification stage. The preprocessing steps used in this study include cropping, resizing, contrast stretching, conversion to the HSV color space, and grayscale conversion (grayscaling).

2.2.1. Cropping. Image cutting, commonly called cropping, is a technique for cutting off part of an image in order to keep only the part that is used. This process aims to separate one object from other objects in an image.

Resizing.
Resizing is the process of changing the original size of an image to a specified size. In this study, the image size is changed with bicubic interpolation, which uses the neighboring 4x4 pixels to compute each new pixel value [4].
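A minimal pure-Python sketch of bicubic resizing for a grayscale image stored as a list of rows is shown below. It assumes the Keys cubic convolution kernel with a = -0.5, pixel-center coordinate mapping and edge-pixel replication; the paper does not state which bicubic variant or library it used, and the names `cubic_kernel` and `resize_bicubic` are hypothetical. Each output pixel is a weighted sum over the 4x4 neighborhood described above.

```python
import math

def cubic_kernel(x, a=-0.5):
    """Keys cubic convolution kernel (a = -0.5 is an assumed, common choice)."""
    x = abs(x)
    if x <= 1:
        return (a + 2) * x**3 - (a + 3) * x**2 + 1
    if x < 2:
        return a * x**3 - 5 * a * x**2 + 8 * a * x - 4 * a
    return 0.0

def resize_bicubic(img, new_h, new_w):
    """Resize a grayscale image (list of rows) with bicubic interpolation.

    Each output pixel is a weighted sum of the 4x4 source neighborhood
    around its back-projected position; pixels outside the image are
    replaced by the nearest edge pixel.
    """
    h, w = len(img), len(img[0])
    sy, sx = h / new_h, w / new_w
    out = []
    for y in range(new_h):
        src_y = (y + 0.5) * sy - 0.5          # pixel-center mapping
        y0 = math.floor(src_y)
        row = []
        for x in range(new_w):
            src_x = (x + 0.5) * sx - 0.5
            x0 = math.floor(src_x)
            acc = 0.0
            for m in range(-1, 3):            # 4 neighbor rows
                yy = min(max(y0 + m, 0), h - 1)
                wy = cubic_kernel(src_y - (y0 + m))
                for n in range(-1, 3):        # 4 neighbor columns
                    xx = min(max(x0 + n, 0), w - 1)
                    acc += wy * cubic_kernel(src_x - (x0 + n)) * img[yy][xx]
            row.append(acc)
        out.append(row)
    return out
```

For example, `resize_bicubic(gray, 200, 200)` would produce the 200x200 size used by the system; a color image would be resized one channel at a time.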

Contrast stretching.
Contrast stretching is a technique used to obtain a new image with better contrast than the original image [5]. It is an image quality improvement method that increases or decreases the contrast of an image by widening or narrowing the range of its pixel intensity values [6].
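One simple form of contrast stretching is a linear min-max stretch, sketched below under the assumption that the target range is [0, 255]; the paper does not specify its exact stretching rule, and `contrast_stretch` is a hypothetical name. Pixels at the current minimum map to 0 and pixels at the current maximum map to 255, which widens the intensity range.

```python
def contrast_stretch(img, out_min=0, out_max=255):
    """Linear (min-max) contrast stretching of a grayscale image.

    img is a list of rows of intensities. The input range [lo, hi] is
    mapped linearly onto [out_min, out_max] and rounded to integers.
    """
    flat = [p for row in img for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:                       # flat image: nothing to stretch
        return [[out_min for _ in row] for row in img]
    span = out_max - out_min
    return [[round(out_min + (p - lo) * span / (hi - lo)) for p in row]
            for row in img]
```

Applying the same function to each RGB channel separately is one way to handle color images, but it is an assumption rather than something the text specifies.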

HSV color space.
The HSV color space is one of the color spaces humans use to determine and describe colors. To obtain the HSV color features, the image is converted from RGB to HSV using Equations (1) to (3) [7]. In these formulas, H cannot be determined when S = 0, so the RGB values must first be normalized with Equations (4) to (6), where R, G and B are the unnormalized red, green and blue values and r, g and b are the corresponding normalized values. After the RGB image is normalized, it is converted into an HSV image with a set of case-wise equations for H, the first of which applies when R = V.
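The sketch below converts one 8-bit RGB pixel to HSV with the standard hexcone formulas (the same model as Python's `colorsys` module), returning H in degrees. It illustrates the general conversion and is not a reproduction of the paper's Equations (1) to (12), which also involve the normalization step described above. `rgb_to_hsv_pixel` is a hypothetical name, and setting H = 0 for gray pixels (S = 0) is an assumed convention.

```python
def rgb_to_hsv_pixel(r, g, b):
    """Convert one 8-bit RGB pixel to HSV (hexcone model).

    Returns H in degrees [0, 360) and S, V in [0, 1]. Hue is undefined
    for gray pixels (max == min); H = 0 is returned in that case.
    """
    r, g, b = r / 255.0, g / 255.0, b / 255.0   # scale to [0, 1]
    v = max(r, g, b)
    delta = v - min(r, g, b)
    s = 0.0 if v == 0 else delta / v
    if delta == 0:
        h = 0.0                                  # undefined hue
    elif v == r:
        h = 60.0 * (((g - b) / delta) % 6)
    elif v == g:
        h = 60.0 * ((b - r) / delta + 2)
    else:
        h = 60.0 * ((r - g) / delta + 4)
    return h, s, v
```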

Grayscaling.
Grayscaling is an initial step in image processing that simplifies the image model: the three color channels that make up the image are reduced to a single channel, producing a grayscale image. A color image is converted to a grayscale image with Equation (13).
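Assuming Equation (13) is the common weighted sum with the ITU-R BT.601 luma weights (0.299, 0.587, 0.114), the conversion can be sketched as follows; the paper's weights may differ. `to_grayscale` is a hypothetical name, and the image is assumed to be a list of rows of (R, G, B) tuples.

```python
def to_grayscale(rgb_img):
    """Convert an RGB image (rows of (R, G, B) tuples) to grayscale.

    Uses the assumed BT.601 weights; the result is rounded to integer
    gray levels in [0, 255].
    """
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb_img]
```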

Feature extraction
Feature extraction is a way of recognizing an object by looking at the specific characteristics of that object. The feature extraction methods used in this study are HSV color feature extraction and Gray Level Co-occurrence Matrix (GLCM) texture feature extraction.
2.3.1. HSV color feature extraction. The color features are the means of the HSV color space distribution, so each component of the HSV color space produces one mean feature value. The mean of each HSV color component is calculated with Equation (14) [8].
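A minimal sketch of the mean HSV color feature follows, assuming Equation (14) is the arithmetic mean of each channel over all pixels. It uses Python's standard `colorsys.rgb_to_hsv`, which returns H, S and V in [0, 1]; `hsv_mean_features` is a hypothetical name.

```python
import colorsys

def hsv_mean_features(rgb_img):
    """Mean H, S and V over all pixels of an 8-bit RGB image.

    rgb_img is a list of rows of (R, G, B) tuples. Returns the 3-element
    color feature vector [mean_H, mean_S, mean_V], each in [0, 1].
    """
    pixels = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
              for row in rgb_img for (r, g, b) in row]
    n = len(pixels)
    return [sum(p[k] for p in pixels) / n for k in range(3)]
```

Hue is an angle, so a plain arithmetic mean treats hues just above 0 and just below 1 (both reddish) as far apart; the sketch keeps the plain per-channel mean described in the text.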

Texture features
Before texture features are extracted, the RGB image is converted to a grayscale image. The texture features of the image are then computed with the Gray Level Co-occurrence Matrix (GLCM) method and second order statistical features.

a. Gray Level Co-occurrence Matrix (GLCM)
GLCM is a matrix that represents the neighborhood relationship between pixels in an image at certain angles and distances. To build the GLCM, a co-occurrence matrix is first created for each of four angles, 0°, 45°, 90° and 135°, with a distance of 1 pixel. These four angle matrices are then normalized into probability form.

1) Creating the 0°, 45°, 90° and 135° angle matrices. Each of the four angle matrices is obtained by counting the neighbor relationships between pixels at the given direction and distance in the image. Figure 1 illustrates the four angles and the 1-pixel distance used.

2) Normalization of the Gray Level Co-occurrence Matrix (GLCM). Each angle matrix is normalized to obtain the GLCM, which is later used as input for calculating the second order statistical features. The steps are as follows:
• The angle matrix is added to its transpose so that it becomes symmetric.
• The symmetric matrix is then normalized into probability form: the value of each cell is divided by the sum of all cells, N.
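The two construction steps can be sketched in pure Python for a grayscale image that has already been quantized to `levels` gray levels; the number of levels used in the paper is not stated, so it is a parameter here. The offsets follow the usual (row, column) convention in which 45° points up and to the right. `glcm` and `OFFSETS` are hypothetical names.

```python
# (row, col) step for distance d = 1 at each angle; 45 deg points up-right.
OFFSETS = {0: (0, 1), 45: (-1, 1), 90: (-1, 0), 135: (-1, -1)}

def glcm(img, levels, angle, d=1):
    """Symmetric, normalized GLCM of a quantized grayscale image.

    img: list of rows with integer gray levels in [0, levels)
    angle: one of 0, 45, 90, 135 (degrees)
    """
    dr, dc = OFFSETS[angle]
    dr, dc = dr * d, dc * d
    h, w = len(img), len(img[0])
    P = [[0] * levels for _ in range(levels)]
    for r in range(h):
        for c in range(w):
            r2, c2 = r + dr, c + dc
            if 0 <= r2 < h and 0 <= c2 < w:       # neighbor inside image
                P[img[r][c]][img[r2][c2]] += 1
    # Step 1: add the transpose so the matrix becomes symmetric.
    S = [[P[i][j] + P[j][i] for j in range(levels)] for i in range(levels)]
    # Step 2: divide every cell by the total count N (probability form).
    N = sum(map(sum, S))
    return [[S[i][j] / N for j in range(levels)] for i in range(levels)]
```

For the 4x4 image [[0,0,1,1],[0,0,1,1],[0,2,2,2],[2,2,3,3]] with 4 levels, the 0° matrix has 12 horizontal pairs before symmetrization and 24 after, so cell (0, 0) becomes 4/24.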

b. Second order statistical feature extraction. Second order statistical features are used to distinguish the textures of image objects by using the GLCM. To obtain them, the mean, variance and standard deviation of the GLCM are first computed with the following equations, where:
µx, µy : mean of the normalized matrix
σx², σy² : variance of the normalized matrix
σx, σy : standard deviation of the normalized matrix
Pij : value of the second order co-occurrence matrix
These three values are used as input for calculating the second order statistical features. This study uses five second order statistical features.

1) Contrast. Contrast measures the variation of image intensity. Visually, the contrast value is a measure of the variation between gray levels in an image area. The contrast value is calculated with Equation (22).

3) Homogeneity. Homogeneity measures the uniformity of the intensity variation of an image. The homogeneity value is calculated with Equation (24), where:
i : row of the second order co-occurrence matrix
j : column of the second order co-occurrence matrix
P(i, j) : value of the second order co-occurrence matrix

4) Energy. Energy represents the distribution of pixel intensities and measures the concentration of intensity pairs in the co-occurrence matrix. The energy value is calculated with Equation (25), where i, j and P(i, j) are defined as above.

5) Entropy. Entropy is a measure of the irregularity of the image structure. According to [9], the entropy value is large for an image with evenly distributed gray-level transitions and small if the image structure is irregular or varied. The entropy value is calculated with Equations (26) and (27), where i, j and P(i, j) are defined as above and ε is a small constant (2.2204e-16) that prevents taking the logarithm of zero.
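The five features, together with the means and standard deviations they rely on, can be sketched as follows for a normalized GLCM `P`. The formulas are standard Haralick-style definitions assumed to correspond to Equations (22) to (27): homogeneity uses 1/(1 + |i - j|) (some definitions use (i - j)² instead), entropy uses a base-2 logarithm with the ε given above, and correlation is set to 1.0 for a constant image, where it is otherwise undefined. `glcm_features` is a hypothetical name.

```python
import math

EPS = 2.2204e-16  # epsilon value given in the text

def glcm_features(P):
    """Second order statistics of a normalized GLCM P (list of lists)."""
    L = len(P)
    cells = [(i, j, P[i][j]) for i in range(L) for j in range(L)]
    # Means, variances and standard deviations of the normalized matrix.
    mu_x = sum(i * p for i, j, p in cells)
    mu_y = sum(j * p for i, j, p in cells)
    var_x = sum((i - mu_x) ** 2 * p for i, j, p in cells)
    var_y = sum((j - mu_y) ** 2 * p for i, j, p in cells)
    sd_x, sd_y = math.sqrt(var_x), math.sqrt(var_y)

    contrast = sum((i - j) ** 2 * p for i, j, p in cells)
    cov = sum((i - mu_x) * (j - mu_y) * p for i, j, p in cells)
    # Correlation is undefined when a std is 0; 1.0 is an assumed convention.
    correlation = cov / (sd_x * sd_y) if sd_x * sd_y > 0 else 1.0
    energy = sum(p * p for _, _, p in cells)
    homogeneity = sum(p / (1 + abs(i - j)) for i, j, p in cells)
    entropy = -sum(p * math.log2(p + EPS) for _, _, p in cells)
    return {"contrast": contrast, "correlation": correlation,
            "energy": energy, "homogeneity": homogeneity,
            "entropy": entropy}
```

How the values for the four angles are combined is not specified in the text; averaging each feature over the four angles is one common option.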

Euclidean distance method
In image processing, one of the parameters that represents the level of similarity between two images is the Euclidean distance. The smaller the Euclidean distance between two images, the more similar they are. The Euclidean distance between two feature vectors is the square root of the sum of the squared differences between their corresponding elements.

As shown in Figure 2, the system testing process starts with inputting an image into the system. The input image is preprocessed: it is cropped to separate it from unused parts, resized to 200x200 pixels so that all images have the same size, and its contrast is adjusted with contrast stretching. The preprocessed image then goes through feature extraction. For the color features, the image is converted to HSV and the HSV color features are extracted; for the texture features, the image is converted to grayscale and the GLCM texture features are extracted as five second order statistical features (contrast, correlation, energy, homogeneity and entropy). After the features of the test image are obtained, the Euclidean distance between the test image and the training images in the database is calculated, which yields the woven fabric identification result. The identification results are stored in the application database and displayed in the results bar of the system.
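The matching step can be sketched as a nearest-neighbor search: the test image's feature vector (for example the three HSV means followed by the five GLCM features) is compared with every training vector, and the label of the closest one is returned. This is a minimal sketch assuming a simple 1-nearest-neighbor rule; `euclidean`, `identify` and the (vector, label) layout of the training data are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def identify(test_vec, training):
    """Return the label of the training vector closest to test_vec.

    training: list of (feature_vector, label) pairs, e.g. combined
    color and texture vectors stored in the database.
    """
    _, best_label = min(training, key=lambda t: euclidean(test_vec, t[0]))
    return best_label
```

Because the HSV means lie in [0, 1] while GLCM contrast can be much larger, features on larger scales dominate the distance; normalizing each feature before matching is a common design choice, although the text does not say whether it was applied.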

System testing method.
The image data used for cross validation testing in this study consist of 240 woven fabric images: 96 images of Amanatun woven cloth, 84 images of Amanuban woven cloth and 60 images of Mollo woven cloth. With a total of 240 images, the researchers chose k = 10, dividing the 240 images randomly into 10 parts (subsets). The first step in testing is therefore to divide the data into 10 folds. In each fold, all subsets except one are combined and used as training data, and the remaining subset is used as testing data. Training and testing are repeated until every subset has been used once for testing.
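The fold procedure can be sketched as follows, assuming a plain random split with a fixed seed (the text does not say whether folds were stratified by tribe) and a pluggable classifier function such as a nearest-neighbor matcher. `k_fold_indices` and `cross_validate` are hypothetical names. With n = 240 and k = 10, slicing the shuffled indices with a stride of 10 gives 10 subsets of 24 images, matching the subset size reported for this study.

```python
import random

def k_fold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and split them into k near-equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[f::k] for f in range(k)]

def cross_validate(features, labels, classify, k=10, seed=0):
    """Return the accuracy of each fold.

    Each fold is used once as the test set while the other k - 1 folds
    form the training set. classify(vec, train) must return a label,
    where train is a list of (feature_vector, label) pairs.
    """
    folds = k_fold_indices(len(features), k, seed)
    accuracies = []
    for f, test_idx in enumerate(folds):
        train = [(features[i], labels[i])
                 for g, fold in enumerate(folds) if g != f for i in fold]
        correct = sum(classify(features[i], train) == labels[i]
                      for i in test_idx)
        accuracies.append(correct / len(test_idx))
    return accuracies
```

A stratified split, which keeps the 96/84/60 tribe proportions roughly equal in every fold, is a possible alternative when class sizes differ.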

Results of system testing with K-fold cross validation
In this study, 240 images were used, divided into subsets for 10 folds. K-fold testing is carried out until every subset has taken its turn in training and in testing. The test results using 10-fold cross validation are given below.
3.1.1. 10-fold cross validation. For 10-fold cross validation, the image data are divided into 10 subsets of 24 images each. The test results with 10-fold cross validation are shown in Table 3.

Discussion
The results indicate that the 9th fold is the fold whose data best represent the overall data for identifying TTS woven fabrics, with the highest accuracy of 91.67%. Testing with 10-fold cross validation produced a large accuracy deviation of 58.34. This is because the quality of some of the image data is poor, owing to non-uniform brightness levels, and the shape of some woven fabric objects in the images is unclear, which causes the system to misidentify them and lowers the accuracy.

Conclusion
Based on testing with K-fold cross validation, the 9th fold is the fold whose data best represent the overall data for identifying TTS woven fabrics, with the highest accuracy of 91.67%. The test results show that texture feature extraction alone is not good enough for identifying woven fabrics, reaching a highest accuracy of 54.16%. Color feature extraction alone reaches a highest accuracy of 62.5%, while the combination of color and texture feature extraction works well, reaching a highest accuracy of 91.67%. The accuracy deviation for 10-fold cross validation is large (58.34), because the quality of some of the image data is poor, owing to non-uniform brightness levels, and the shape of some woven cloth objects in the images is unclear, which causes the system to misidentify them and lowers the accuracy.

Suggestions
In the identification of woven fabrics, HSV color feature extraction and GLCM texture feature extraction are not the only benchmarks for determining the color and texture characteristics of an image. The more features are used for identification, the better the identification results. Future studies are therefore recommended to use other feature extraction methods in the identification process. Future studies are also expected to use woven fabric image data of better quality than the image data used in this study.