Segmentation of Leaf Spots Disease in Apple Plants Using Particle Swarm Optimization and K-means Algorithm

Leaf spot disease is one of the causes of decreased apple production. Early detection of this disease can increase the quality and quantity of apple production. Monitoring the health of apple plants over large areas is traditionally a job that requires a lot of time and effort. Using drones to detect leaf spot disease is an alternative technology for monitoring the health of apple plants over large areas. To detect apple leaf spot disease, an apple leaf image taken by a drone needs to be segmented into infected and healthy leaf areas. Compared to level set algorithms, K-means clustering is a simple, fast, and unsupervised algorithm for image segmentation. However, random selection of the initial centroids can cause K-means to become trapped at a local optimum and produce unsatisfactory image segmentation. Particle Swarm Optimization offers a good way to avoid convergence to a local optimum. Therefore, in this paper, we study how to segment leaf spot disease on apple leaves using Particle Swarm Optimization and the K-means algorithm. The objective function of the K-means algorithm, optimized by Particle Swarm Optimization, is used for segmenting the leaf spot disease. The first step of the proposed method is to convert the leaf image from RGB to the CIE L*a*b* format. The a* component of the L*a*b* image is taken and clustered using Particle Swarm Optimization. The global best of the Particle Swarm Optimization becomes the initial centroid of the K-means algorithm. The experimental results show that Particle Swarm Optimization-K-means (PSOK) performs better than the K-means algorithm in segmenting leaf spot disease.


Introduction
Horticultural commodities are an important part of the agricultural sector. In line with population growth, public demand for horticultural products in Indonesia is also increasing. Apples are among the most popular horticultural commodities in Indonesia, besides oranges and mangoes. Since 1960, apple plants have been widely planted in Batu City to replace citrus plants that died from disease. The apple is one of the main commodities of Batu City. Leaf spot disease is one of the diseases that often attack apple plants. It attacks leaves aged 4-6 weeks after the pruning of less productive leaves and twigs. Severe infection results in significant defoliation and decreases apple quality and marketability [1]. Early detection of leaf spot disease on apple plants is therefore very important. According to some studies, leaf spot disease develops very quickly and the incubation period can be as short as two days, which makes early detection necessary. Farmers detect leaf spot disease manually when the field is small. However, detecting leaf spot disease in a short time on a large field is not an easy job for

Research Methodology
This research starts with a literature study. The second step is the development of the proposed method and its program. The last step is the performance evaluation of the proposed method.

Image Segmentation
As information technology and computers have developed, their uses have become increasingly diverse. One field of information technology that has developed very rapidly and has been applied in agriculture is image processing. Combined with artificial intelligence and robotics, image processing is a promising technology in agriculture. In 2008, Zhai and Du proposed a method for classifying plants from leaf images using the extreme learning machine method [5]. Image segmentation is one of the pre-processing steps before classification or identification. It is the process of partitioning a digital image into segments so that each segment has similar visual characteristics. There are several image segmentation techniques, namely thresholding, edge-based, clustering-based, and neural-network-based techniques. Among these, clustering is the easiest and most widely used. One popular clustering method is K-means clustering [6].

K-means Clustering for image segmentation
Clustering is a method used to divide a set of data into a specific number of groups [6]. K-means clustering is a non-hierarchical (partitional) grouping method that attempts to partition the data into two or more groups. Data with similar characteristics are placed in the same cluster, and data with different characteristics are placed in different clusters. The purpose of this grouping is to minimize the objective function, which generally seeks to minimize variation within a group and maximize variation between groups. In image segmentation, the data points are the pixel values of an image. A digital image can be represented as an M × N multidimensional matrix. The matrix is then reshaped into a data matrix X of size l × p, where l is the product of the number of rows M and the number of columns N, and p is the number of dimensions. For color images, p is 3 because a color image has three matrices, namely R, G, and B. The purpose of image segmentation using K-means clustering is to group the image pixels into k clusters. The flowchart of the K-means clustering algorithm for image segmentation is shown in Figure 1.
Several studies have found that K-means clustering has several disadvantages: the number of clusters must be determined at the beginning of the algorithm, it converges quickly to a local optimum or slowly to the global optimum, and it is very sensitive to the initial centroids. If the initial centroids are not suitable, the algorithm becomes trapped around a local optimum and produces poor clustering and therefore poor image segmentation. For this reason, researchers have combined K-means clustering with metaheuristic algorithms, such as Particle Swarm Optimization (PSO), to balance exploration and exploitation and obtain better algorithms.
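The reshaping of an M × N image into an l × p data matrix and the resulting K-means loop can be illustrated with a minimal NumPy sketch. The function name `kmeans_segment`, its parameters, and the stopping tolerance are illustrative assumptions and are not taken from the paper; the optional `init_centroids` argument is included only to make the sensitivity to the starting centroids explicit.

```python
import numpy as np

def kmeans_segment(image, k, init_centroids=None, max_iter=100, tol=1e-6, seed=0):
    """Cluster the pixels of an M x N (x p) image into k groups (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    M, N = image.shape[:2]
    X = image.reshape(M * N, -1).astype(float)        # l x p data matrix, l = M * N
    if init_centroids is None:
        # Random initial centroids: the choice K-means is sensitive to.
        centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centroids = np.asarray(init_centroids, dtype=float).reshape(k, -1).copy()
    for _ in range(max_iter):
        # Squared Euclidean distance of every pixel to every centroid: shape (l, k).
        dist = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centroids = centroids.copy()
        for m in range(k):
            members = X[labels == m]
            if len(members) > 0:                       # keep the old centroid if a cluster is empty
                new_centroids[m] = members.mean(axis=0)
        shift = np.abs(new_centroids - centroids).max()
        centroids = new_centroids
        if shift < tol:                                # centroids stopped moving
            break
    return labels.reshape(M, N), centroids
```

Calling `kmeans_segment(image, k=2)` with different `seed` values on the same image is a simple way to observe how much the result depends on the initial centroids.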

Particle Swarm Optimization
Particle Swarm Optimization (PSO) is a metaheuristic method inspired by the social behavior of animals such as flocks of birds and schools of fish. Because the PSO algorithm has a number of desirable properties, such as simplicity of implementation, scalability in dimension, and good empirical performance, it has been used to solve many real-world problems [7]. PSO has three important parameters, namely the particle velocity (v), the cognitive and social parameters (c1 and c2), and the inertia weight (w). The particle velocity is limited to the range [vmin, vmax] so that the particle does not oscillate around the optimal value with a very large amplitude and leave the search area. These three parameters balance intensification and diversification. Large values of the three parameters cause large particle movements and thus encourage diversification, while small values cause fine particle movements and thus encourage intensification. The cognitive and social parameters change the direction of the particles and determine their path to the optimum solution. PSO has strong exploration power because it is able to search the whole solution space for the optimum value. Therefore, PSO is well suited to finding global optimum solutions and can be applied to optimization problems with large dimensions. In addition, the PSO algorithm is conceptually simple [6]. Based on this description, the PSO algorithm and K-means clustering are combined to balance exploration and exploitation so that a better algorithm is obtained. The combined PSO and K-means clustering algorithm is called PSOK.
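To make the roles of v, c1, c2, and w concrete, the following is a minimal sketch of one standard PSO update with velocity clamping. The function names, the array layout, and the linearly decreasing inertia schedule are assumptions made for illustration; the paper takes its inertia update from [3], which may differ.

```python
import numpy as np

def pso_step(x, v, pbest, gbest, w, c1, c2, v_min, v_max, rng):
    """One velocity and position update for a whole swarm.

    x, v, pbest: arrays of shape (s, dim); gbest: array of shape (dim,).
    """
    r1 = rng.random(x.shape)                      # random numbers in [0, 1)
    r2 = rng.random(x.shape)
    v_new = (w * v                                # inertia: keep part of the old velocity
             + c1 * r1 * (pbest - x)              # cognitive pull towards each particle's best
             + c2 * r2 * (gbest - x))             # social pull towards the swarm's best
    v_new = np.clip(v_new, v_min, v_max)          # keep velocity inside [v_min, v_max]
    return x + v_new, v_new

def linear_inertia(t, t_max, w_min=0.4, w_max=0.9):
    """Linearly decreasing inertia weight (an assumed schedule, not necessarily that of [3])."""
    return w_max - (w_max - w_min) * t / t_max
```

A large w or wide velocity limits favor diversification, since particles keep moving far from their current position; a small w and tight limits favor intensification around the best positions found so far.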

Proposed Method
This article proposes a method for segmenting leaf spot disease using PSOK. The flowchart of the proposed method is shown in Figure 3.1

Transform the image color space from RGB to L*a*b*
In this study, the image used is a color image. The color image in the RGB color space is first transformed into the L*a*b* color space. The process produces three matrices for each image, namely the L*, a*, and b* matrices. The three matrices are then visualized to determine which one is used as the input variable for each image, namely the matrix whose pixel values are able to distinguish the image objects from the image background. After that, the segmentation is performed and the performance of the PSOK algorithm is evaluated.
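As a rough illustration of this transformation, the sketch below converts an sRGB image to CIE L*a*b* with NumPy, assuming 8-bit sRGB input and the D65 reference white. The paper does not state which conversion routine or white point was used, so the function name `rgb_to_lab` and these constants are assumptions.

```python
import numpy as np

# Linear sRGB -> CIE XYZ matrix and D65 reference white (assumed, not stated in the paper).
_RGB_TO_XYZ = np.array([[0.4124564, 0.3575761, 0.1804375],
                        [0.2126729, 0.7151522, 0.0721750],
                        [0.0193339, 0.1191920, 0.9503041]])
_WHITE_D65 = np.array([0.95047, 1.0, 1.08883])

def rgb_to_lab(rgb):
    """Convert an M x N x 3 sRGB image with values in [0, 255] to CIE L*a*b*."""
    c = np.asarray(rgb, dtype=float) / 255.0
    # Undo the sRGB gamma to obtain linear RGB.
    linear = np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)
    # Linear RGB -> XYZ, normalized by the reference white.
    xyz = linear @ _RGB_TO_XYZ.T / _WHITE_D65
    delta = 6.0 / 29.0
    f = np.where(xyz > delta ** 3, np.cbrt(xyz), xyz / (3 * delta ** 2) + 4.0 / 29.0)
    L = 116.0 * f[..., 1] - 16.0
    a = 500.0 * (f[..., 0] - f[..., 1])           # red-green axis
    b = 200.0 * (f[..., 1] - f[..., 2])           # yellow-blue axis
    return np.stack([L, a, b], axis=-1)
```

With this convention, `rgb_to_lab(image)[..., 1]` is the a* matrix: brownish pixels tend to have positive a* and greenish pixels negative a*, which is why this single channel can separate spot areas from healthy tissue.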

Image Segmentation by PSOK
PSOK is a hybrid of the PSO and K-means clustering algorithms and consists of two stages. In image segmentation using the PSOK algorithm, the first step is to input the image to be segmented. The image is represented as a multidimensional matrix of size $M \times N$. The matrix is then reshaped into a data matrix $X$ of size $l \times p$, where $l$ is the product of the $M$ rows and $N$ columns and $p$ is the number of dimensions. After the matrix $X$ is obtained, the PSOK algorithm runs the first stage, in which it searches for the global optimum in the solution space using the PSO algorithm. The fitness value calculated by the PSO algorithm is the sum, over all image pixels, of the minimum distance from the pixel to the cluster centers encoded in the particle. When the global optimum solution, or an estimate of it, is found, the PSO algorithm stops and PSOK continues with the second stage, K-means clustering. The global optimum solution found in the first stage is the initial centroid of K-means clustering. The PSOK algorithm is as follows:

1. Input the matrix $X$, the number of particles ($s$), the PSOK parameters, the maximum iteration ($t_{max}$), epsilon ($\varepsilon$), the number of clusters ($k$), $c_1$, $c_2$, $w_{min}$, and $w_{max}$.
2. Generate the positions and velocities of the $s$ particles, where $\vec{x}_i(t)$ is the position of the $i$-th particle at iteration $t$, $\vec{v}_i(t)$ is the velocity of the $i$-th particle at iteration $t$, and $p$ is the number of dimensions.
3. Calculate the distance from each data point to each cluster center in the particle position using

   $$ d(X_j, \vec{x}_{i,m}) = \sqrt{\sum_{q=1}^{p} \left( X_{jq} - x_{i,m,q} \right)^2} $$

   where $d(X_j, \vec{x}_{i,m})$ is the Euclidean distance between the $j$-th data point and the position of particle $i$ for cluster center $m$, $X_j$ is the $j$-th data point with $p$ dimensions, and $\vec{x}_{i,m}$ is the position of particle $i$ at the center of cluster $m$ with $p$ dimensions.
4. Assign each data point to the nearest cluster center in the particle position.
5. Calculate the fitness value of each particle using

   $$ f_i = \sum_{j=1}^{l} \min_{1 \le m \le k} d(X_j, \vec{x}_{i,m}) $$

   where $f_i$ is the fitness value of the $i$-th particle.
6. Update the best position of each particle using

   $$ \vec{P}_i(t) = \begin{cases} \vec{x}_i(t), & f(\vec{x}_i(t)) < f(\vec{P}_i(t-1)) \\ \vec{P}_i(t-1), & \text{otherwise} \end{cases} $$

   where $\vec{P}_i(t)$ is the best position of the $i$-th particle at iteration $t$, $\vec{x}_i(t)$ is the position of the $i$-th particle at iteration $t$, and $f(\vec{P}_i(t))$ is the fitness value of the best position of the $i$-th particle at iteration $t$.
7. Update the global best position of all particles using

   $$ \vec{G} = \arg\min_{\vec{P}_i(t),\; i = 1, \dots, s} f(\vec{P}_i(t)) $$

   where $\vec{G}$ is the global best position.
8. Update the velocity and position of each particle using

   $$ \vec{v}_i(t+1) = w(t)\, \vec{v}_i(t) + c_1 r_1(t) \left( \vec{P}_i(t) - \vec{x}_i(t) \right) + c_2 r_2(t) \left( \vec{G} - \vec{x}_i(t) \right) $$

   $$ \vec{x}_i(t+1) = \vec{x}_i(t) + \vec{v}_i(t+1) $$

   where $\vec{v}_i(t)$ is the velocity of the $i$-th particle, $w(t)$ is the inertia weight, $c_1$ and $c_2$ are the cognitive and social parameters, $r_1(t)$ and $r_2(t)$ are random numbers in $[0, 1]$, $\vec{P}_i(t)$ is the best position of the $i$-th particle at iteration $t$, $\vec{x}_i(t)$ is the position of the $i$-th particle at iteration $t$, and $\vec{G}$ is the global best position.
9. Check the convergence of the first stage from the fitness values, where $\bar{f}$ is the average fitness value of all particles and $\sigma^2$ is the variance of the fitness values of all particles.
10. Update the inertia weight $w(t)$ as described in [3].
11. If the first stage has converged, create the initial centroid matrix from the global best position $\vec{G}$. The clustering process is then carried out using K-means clustering.
12. The outputs of the program are the matrix of clustering results, the fitness value, and the computational time.
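The two stages can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the uniform initialization inside the data range, the velocity limits, the linearly decreasing inertia weight, and the variance-below-epsilon convergence test are all assumed here, and the names `fitness` and `psok` are hypothetical. The default parameter values are those reported in the parameter settings.

```python
import numpy as np

def fitness(X, centers):
    """Sum over data points of the distance to the nearest of the k centers."""
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)   # shape (l, k)
    return d.min(axis=1).sum()

def psok(X, k=2, s=20, t_max=100, eps=1e-6, c1=1.49, c2=1.49,
         w_min=0.4, w_max=0.9, seed=0):
    """Two-stage PSOK sketch: PSO searches for centroids, K-means refines them."""
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)                    # l x p data matrix
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    # Stage 1: each particle encodes k candidate centers, shape (s, k, p).
    x = lo + rng.random((s, k, X.shape[1])) * (hi - lo)   # assumed initialization
    v = np.zeros_like(x)
    pbest = x.copy()
    pbest_fit = np.array([fitness(X, xi) for xi in x])
    gbest = pbest[pbest_fit.argmin()].copy()
    for t in range(t_max):
        w = w_max - (w_max - w_min) * t / t_max       # assumed linear inertia schedule
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        v = np.clip(v, -span, span)                   # assumed velocity limits
        x = x + v
        fit = np.array([fitness(X, xi) for xi in x])
        improved = fit < pbest_fit                    # personal best update
        pbest[improved] = x[improved]
        pbest_fit[improved] = fit[improved]
        gbest = pbest[pbest_fit.argmin()].copy()      # global best update
        if fit.var() < eps:                           # assumed convergence test
            break
    # Stage 2: K-means starting from the global best position.
    centroids = gbest.copy()
    for _ in range(t_max):
        d = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == m].mean(axis=0) if np.any(labels == m)
                        else centroids[m] for m in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return labels, centroids, fitness(X, centroids)
```

For an a* channel `a` of shape M × N, a call such as `psok(a.reshape(-1, 1), k=2)` returns one label per pixel, which can be reshaped back to M × N to form the segmentation.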

Experimental Results and Discussions
In this section, we describe the data used in this paper, the parameters used in the study, and the experimental results of the proposed method on the test data. The experimental results are also analyzed.

Data used for experiments
This paper uses a set of images of apple leaves infected with leaf spot disease to test the performance of the proposed method. Four leaf images are used, which can be seen in Figure 4. The test images vary in color, shape, texture, and lighting, so that the performance of the method can be assessed under different image conditions.

Setting parameters
Before running the experiments to evaluate the proposed method, several parameters of the PSOK algorithm have to be determined in advance. The parameters used in the PSOK algorithm are a cognitive parameter c1 = 1.49, a social parameter c2 = 1.49, a maximum inertia weight wmax = 0.9, and a minimum inertia weight wmin = 0.4 [3]. The number of particles for the PSO algorithm is typically 20 to 30 [8], so these experiments use 20 particles. The number of clusters (k) used for all images is 2, because the goal of the proposed method is to divide each image into two clusters: the leaf spot disease area (foreground) and the healthy leaf area (background). The maximum number of iterations is 100.

Figure 5 shows the comparison of image representations in different color spaces. Figure 5(a) shows the leaf image in the RGB color space. Leaf spot disease is indicated by the brownish areas and healthy leaf tissue by the greenish areas. Figure 5 shows that only the red-green component (a*) of the L*a*b* space is able to distinguish the areas affected by leaf spot disease. The grayscale image, the individual RGB components, and the L* and b* components of the L*a*b* color space cannot distinguish the affected areas. Therefore, only the a* component is used for the clustering process in the next step.

Figure 7 shows the results of leaf spot disease segmentation in apple plants using the PSOK method. From Figure 7, it can be seen that the PSOK method segments the affected leaf areas well. In contrast, the K-means algorithm is not able to properly separate the areas with leaf spot disease, as can be seen in Figure 8. In Figures 8(a), 8(b), and 8(c), the healthy leaf area is still mixed with the area affected by leaf spot disease, while in Figure 8(d) the healthy part of the leaf is separated from the part infected with leaf spot disease. From Figures 7 and 8, it can be concluded that the PSOK method has better performance.

Figure 7. Segmentation results by PSOK (the proposed method). (a)-(d) Healthy leaf area of the 1st, 2nd, 3rd, and 4th data segmented by PSOK, respectively. (e)-(h) Leaf spot disease area of the 1st, 2nd, 3rd, and 4th data segmented by PSOK, respectively.

Figure 8. Segmentation results by the K-means algorithm. (a)-(d) Healthy leaf area of the 1st, 2nd, 3rd, and 4th data segmented by the K-means algorithm, respectively. (e)-(h) Leaf spot disease area of the 1st, 2nd, 3rd, and 4th data segmented by the K-means algorithm, respectively.
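Healthy-area and spot-area panels like those in Figures 7 and 8 could be produced from a two-cluster labeling as sketched below. The function name `split_by_cluster` is hypothetical, and the rule that the cluster with the larger a* centroid is the diseased one is an assumption based on spots being brownish (more red) and healthy tissue greenish; the paper does not state how the clusters were assigned.

```python
import numpy as np

def split_by_cluster(rgb, labels, centroids):
    """Return (healthy, diseased) copies of rgb with the other cluster blacked out.

    rgb: M x N x 3 image; labels: M x N cluster indices (0 or 1);
    centroids: the two cluster centers of the a* channel.
    """
    # Assumption: the cluster with the larger a* centroid is the leaf spot area.
    diseased_cluster = int(np.argmax(np.asarray(centroids, dtype=float).ravel()))
    mask = labels == diseased_cluster
    healthy = rgb.copy()
    healthy[mask] = 0                 # hide spot pixels in the healthy-area image
    diseased = rgb.copy()
    diseased[~mask] = 0               # hide healthy pixels in the spot-area image
    return healthy, diseased
```

Viewing the two outputs side by side makes mixing between the clusters easy to spot: healthy tissue that survives in the spot-area image indicates a poor separation.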

Conclusions
From the experimental results, it can be concluded that the proposed method, which segments leaf spot disease using PSOK, performs much better than the K-means algorithm.