Automatic Identification Of Construction Dust Based On Improved K-Means Algorithm

Construction dust causes substantial harm to public health and to national economic development. To address the shortcomings of existing real-time sensor networks for construction dust detection, such as poor accuracy, this paper proposes an automatic identification method for construction dust based on computer vision and an improved K-Means clustering algorithm. We extract the saturation of the HSV color model of each image to form a data set. We take the median of the data set as the initial clustering centroid, which reduces the number of iterations and helps the algorithm reach the global optimal solution. Mahalanobis distance is used as the similarity measure for clustering, which reduces the differences in scale between feature measures and improves accuracy, realizing automatic recognition of construction dust based on computer vision. The clustering results of the improved K-Means algorithm are evaluated by precision, recall and their harmonic mean. The experimental results show that the improved K-Means algorithm has good robustness and high accuracy, with an automatic recognition rate of up to 89.33%.


Introduction
With the rapid development of the economy, urbanization is accelerating and more and more urban infrastructure is being built. However, under the influence of traditional production modes, the management of construction sites remains poor. Construction produces a large amount of dust, which seriously damages the urban environment. From the point of view of environmental pollution, Hui Yan et al. [1] showed that the concentrations of suspended particulates, PM2.5 and PM10 in construction sites were 42.24%, 19.76% and 16.27% higher, respectively, than those in the surrounding environment. Construction dust is also one of the main factors in haze formation. Smaoui Nour et al. [2] showed that dust emission is also a major threat to the health of construction workers and managers. The concentration of dust in the environment is directly proportional to the incidence of respiratory disease: as the dust concentration in the air increases, so does the chance of illness. Dust reduces visibility, increases the pressure on urban traffic and degrades the visual quality of the urban environment. Dust also contains a variety of heavy metal elements, which accumulate with the dust on the surface of plant leaves, hindering photosynthesis and respiration and affecting plant growth. Dust from the construction site also enters equipment, which shortens the service life of machines and increases construction costs [3].
In view of these hazards, it is of great significance for the urban environment and the health of construction workers to detect dust on the construction site and to take timely measures to reduce it. To reduce the emission of construction dust, the government has promulgated various standards for construction sites and for construction dust detection methods [3]. At present, the weight method is the only method with legal effect in China. It calculates the weight difference of a filter membrane before and after sampling and converts this difference into the concentration of construction dust. The process is complicated, the workload is large, and the degree of automation is relatively low. On construction sites, real-time sensor networks are usually used to monitor the emission of construction dust. If the dust reaches a certain concentration, the network sends instructions to the dust reduction facilities to lower it. However, the maintenance cost of real-time sensor network detection equipment is high and the accuracy is poor [4][5][6].
With the development of computer vision and large-capacity digital image storage, the use of computer vision and machine learning for image classification and recognition is becoming more and more popular [7][8]. In view of the shortcomings of existing sensor-network dust detection technology, this paper proposes a computer-vision method based on an improved K-Means algorithm to automatically identify construction dust. This economical and efficient detection method greatly facilitates the work of monitoring personnel.
Section 1 of this paper describes the hazards of construction dust and the limitations of current research. Section 2 presents the overall design scheme. Section 3 describes the extraction of image features and the improved algorithm. Section 4 analyzes the results of our improved algorithm and of the traditional method. Section 5 summarizes the contribution of this study and the prospects for future research.

Scheme Overview
The K-Means clustering algorithm is simple in principle and easy to understand, which makes it suitable for clustering experiments on a large amount of image data. In an outdoor vision system, a large number of images need to be cluster-analyzed to automatically recognize those containing construction dust. Therefore, this paper uses the K-Means clustering algorithm to identify images with construction dust. The scheme flow chart is shown in Fig.1.

Data sources
At the beginning of the experiment, we visited a construction site in Beijing on multiple occasions to collect images. As shown in Fig.2, each image was taken at a different time, angle and place, which ensures the comprehensiveness and coverage of the data set. We collected 150 images with construction dust and 150 images without construction dust. The resolution of each image is 500 × 500 pixels.
Figure 2. Sample images: (a) image with construction dust; (b) image without construction dust.

Image processing and feature extraction
Color is one of the most basic features of an image. It is also the most widely used feature in image classification and is easy to understand and extract. Construction dust differs markedly in color from the surrounding environment. HSV is a color model that matches human visual perception: its hue, saturation and value components are consistent with human visual sense [9][10].
The images collected in the early stage are RGB color images, so we first convert them from the RGB color model to the HSV color model. We then separate the three HSV channels of the sample images and compare their features. As shown in the comparison in Fig.3, the saturation of images with construction dust is significantly lower than that of images without construction dust. Therefore, the mean value and variance of the saturation component are selected as the features for identifying images with construction dust (formula (1) [10]). The mean value A and variance B of image saturation form a coordinate point $(A_i, B_i)$, $i = 1, 2, \dots, N$ (N represents the total number of samples), representing the features of each image. The coordinate points of all images in the sample constitute a data set C:

$$C = \{(A_1, B_1), (A_2, B_2), \dots, (A_N, B_N)\}$$

K-Means partitions the data set into K classes by minimizing the sum of squared errors E:

$$E = \sum_{\tau=1}^{K} \sum_{X \in D_\tau} \lVert X - \mu_\tau \rVert^2 \qquad (2)$$

In formula (2), $D_\tau$ ($\tau = 1, 2, 3, \dots, K$) represents the class $\tau$ after clustering, X represents the sample data points in class $D_\tau$, and $\mu_\tau$ represents the mean value of the sample data points in class $D_\tau$.
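The feature extraction described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes 8-bit RGB pixels, uses the standard-library `colorsys` conversion, and takes the population variance. The function name `saturation_features` and the toy pixel values are hypothetical.

```python
import colorsys
from statistics import fmean, pvariance


def saturation_features(pixels):
    """Return (mean, variance) of HSV saturation for 8-bit RGB pixels.

    Assumption: `pixels` is an iterable of (r, g, b) tuples in 0..255;
    the population variance is used as feature B.
    """
    saturations = [
        colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)[1]
        for r, g, b in pixels
    ]
    return fmean(saturations), pvariance(saturations)


# Hypothetical toy "images": a greyish, dusty-looking patch and a vivid patch.
dusty = [(180, 175, 170), (200, 198, 190), (160, 158, 150)]
clear = [(30, 120, 200), (20, 160, 60), (220, 40, 40)]

A_dusty, B_dusty = saturation_features(dusty)
A_clear, B_clear = saturation_features(clear)
print(A_dusty < A_clear)  # the greyish patch has the lower mean saturation
```

Each image then contributes one point (A, B) to the data set C. A real pipeline would read the pixels from the 500 × 500 image files with an image library of choice.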

Improved K-Means clustering algorithm
The stability of the K-Means algorithm depends on the selection of the initial centroids. Random selection may increase the number of iterations and cause the algorithm to fall into a local minimum, so that the optimal solution cannot be reached [11]. To address this problem, our improved K-Means clustering algorithm determines the initial centroids from the data rather than selecting them at random.
We arrange the coordinate points of the data set in ascending order and use the median of the sample data set, given by formula (3), as the initial centroid of clustering. The initial centroids of the two classes are centroid1 (0.282410, 0.037153) and centroid2 (0.287985, 0.037225).

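$$M = \begin{cases} X_{(N+1)/2}, & N \text{ odd} \\ \tfrac{1}{2}\left(X_{N/2} + X_{N/2+1}\right), & N \text{ even} \end{cases} \qquad (3)$$

In formula (3), M is the median of the data set, X represents the sample data points arranged in ascending order, and N represents the total number of samples.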
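The text does not spell out how two initial centroids are obtained from the median. One possible reading, consistent with the two closely spaced centroids reported above, is that the two middle points of the sorted data set are used. The sketch below implements that reading only; the function name `median_initial_centroids` and the sample points are hypothetical.

```python
def median_initial_centroids(points):
    """Pick two initial centroids from the middle of the sorted data set.

    Assumption (not confirmed by the text): the two middle points of the
    ascending order serve as the initial centroids of the two classes.
    For an odd number of points this returns the median point and the
    point just below it.
    """
    ordered = sorted(points)  # ascending by mean saturation, then by variance
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two points")
    mid = n // 2
    return ordered[mid - 1], ordered[mid]


# Hypothetical (mean, variance) feature points.
points = [(0.10, 0.02), (0.45, 0.05), (0.30, 0.04), (0.20, 0.03)]
centroid1, centroid2 = median_initial_centroids(points)
print(centroid1, centroid2)
```

Because the choice depends only on the data, repeated runs start from the same centroids, which is what removes the run-to-run variation of random initialization.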
The traditional K-Means clustering algorithm uses the Euclidean distance $d_1$, given by formula (4), as the measure of similarity. Every dimension of a coordinate point contributes equally to the Euclidean distance regardless of its scale, so differences in scale between feature measures can cause judgment errors and seriously affect the accuracy of the experiment [12]. In this paper, the Mahalanobis distance $d_2$, given by formula (5), is used as the similarity measure; it normalizes the variance of each dimension of the coordinate points. The resulting feature relationship is more in line with the actual situation, which improves the accuracy of the clustering results [13].

$$d_1 = \sqrt{(X_{jx} - \mu_{\tau x})^2 + (X_{jy} - \mu_{\tau y})^2} \qquad (4)$$

$$d_2 = \sqrt{(X - \mu_\tau)^{T} V^{-1} (X - \mu_\tau)} \qquad (5)$$

In formula (4), $X_{jx}$ and $X_{jy}$ represent the abscissa and ordinate of data point X, and $\mu_{\tau x}$ and $\mu_{\tau y}$ represent the abscissa and ordinate of the centroid of the class where data point X is located. In formula (5), X is the sample data point, $\mu_\tau$ is the centroid of the class to which X belongs, and V is the covariance matrix of all sample data points in the class to which X belongs.
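The difference between the two measures can be illustrated with a small sketch for two-dimensional points. It assumes a population covariance matrix and a closed-form 2 × 2 inverse; the function names and the sample values are hypothetical.

```python
import math


def euclidean(x, mu):
    """Euclidean distance between 2-D points x and mu."""
    return math.hypot(x[0] - mu[0], x[1] - mu[1])


def covariance_2d(points):
    """Population covariance matrix ((vxx, vxy), (vxy, vyy)) of 2-D points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    vxx = sum((p[0] - mx) ** 2 for p in points) / n
    vyy = sum((p[1] - my) ** 2 for p in points) / n
    vxy = sum((p[0] - mx) * (p[1] - my) for p in points) / n
    return ((vxx, vxy), (vxy, vyy))


def mahalanobis(x, mu, cov):
    """Mahalanobis distance sqrt((x - mu)^T V^-1 (x - mu)) for a 2x2 V."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx, dy = x[0] - mu[0], x[1] - mu[1]
    # V^-1 = (1 / det) * [[d, -b], [-c, a]], expanded into the quadratic form.
    q = (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
    return math.sqrt(max(q, 0.0))


# Hypothetical covariance: the first feature varies four times as much.
V = ((4.0, 0.0), (0.0, 1.0))
origin = (0.0, 0.0)
print(euclidean((2.0, 0.0), origin), euclidean((0.0, 2.0), origin))
print(mahalanobis((2.0, 0.0), origin, V), mahalanobis((0.0, 2.0), origin, V))
```

Both offsets have the same Euclidean length, but the Mahalanobis distance treats the offset along the high-variance feature as the smaller one. This is the normalization effect the text attributes to formula (5).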

Experimental evaluation method
We conducted four clustering experiments to evaluate the effect of determining the initial centroid of clustering and of using different distance metrics. The experimental results were evaluated by precision, recall and their harmonic mean F1 [14]. In the formulas, X = {X1, X2, X3, …, Xn} represents all sample data, where n is the number of sample points; L(Xj) is the known category of Xj (1 ≤ j ≤ n) and D(Xj) is the category of Xj after clustering; Xi and Xj (1 ≤ i, j ≤ n, i ≠ j) denote two distinct samples. F1 is the harmonic mean of precision and recall.
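A minimal sketch of such an evaluation is given below. It assumes the pair-counting definitions that the pairwise notation (Xi, Xj, i ≠ j) suggests: a pair of samples is a true positive when it shares both the known category L and the cluster D. The paper's exact formulas may differ; the function name and the labels are hypothetical.

```python
from itertools import combinations


def pair_counting_scores(true_labels, cluster_labels):
    """Pair-counting precision, recall and F1 for a clustering result.

    Assumption: a pair (i, j), i != j, counts as
      TP if L(Xi) == L(Xj) and D(Xi) == D(Xj),
      FP if L(Xi) != L(Xj) and D(Xi) == D(Xj),
      FN if L(Xi) == L(Xj) and D(Xi) != D(Xj).
    """
    tp = fp = fn = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_class = true_labels[i] == true_labels[j]
        same_cluster = cluster_labels[i] == cluster_labels[j]
        if same_class and same_cluster:
            tp += 1
        elif same_cluster:
            fp += 1
        elif same_class:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: one dust image is placed in the wrong cluster.
truth = ["dust", "dust", "dust", "clear", "clear", "clear"]
clusters = [0, 0, 1, 1, 1, 1]
precision, recall, f1 = pair_counting_scores(truth, clusters)
print(precision, recall, f1)
```

Pair counting does not require matching cluster indices to category names, which is convenient because K-Means numbers its clusters arbitrarily.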

Experimental results
The comparison of the four experimental results is shown in Table 1. Experiments 1 and 3 use Mahalanobis distance and Euclidean distance, respectively, with randomly selected initial centroids. Experiments 2 and 4 use Mahalanobis distance and Euclidean distance, respectively, with the median of the data set as the initial clustering centroid. The evaluation data show that the precision, recall and harmonic mean of the improved K-Means clustering algorithm are better. Comparing Experiment 1 with Experiment 2, and Experiment 3 with Experiment 4, shows that the recognition rate is higher when the initial centroid is determined in advance. The reason is that determining the initial centroid avoids the local minimum and reaches the global optimal solution. Likewise, comparing Experiment 1 with Experiment 3, and Experiment 2 with Experiment 4, shows that Mahalanobis distance gives a higher recognition rate than Euclidean distance. The reason is that Mahalanobis distance normalizes the variance of each dimension of the coordinate points, reduces the differences in scale between features, eliminates the interference of correlation between variables, and thus improves the accuracy of the clustering algorithm.
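The clustering loop shared by these experiments can be sketched as follows. For brevity the sketch uses Euclidean distance (the setting of Experiments 3 and 4) and takes the initial centroids as an argument, so either a random or a data-derived choice can be passed in. The function name `kmeans`, the stopping rule and the sample points are assumptions, not details from the paper.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd-style K-Means on 2-D points from the given initial centroids.

    Returns (final_centroids, labels, iterations_used). Iteration stops
    when the centroids no longer change or after max_iter passes.
    """
    centroids = [tuple(c) for c in centroids]
    labels = []
    for iteration in range(1, max_iter + 1):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for k, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                updated.append((
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                ))
            else:
                updated.append(old)  # an empty cluster keeps its centroid
        if updated == centroids:
            return centroids, labels, iteration
        centroids = updated
    return centroids, labels, max_iter


# Hypothetical (mean, variance) points forming two well-separated groups.
points = [(0.10, 0.02), (0.12, 0.03), (0.11, 0.02),
          (0.60, 0.08), (0.62, 0.09), (0.61, 0.08)]
final_centroids, labels, iterations = kmeans(points, [(0.12, 0.03), (0.60, 0.08)])
print(labels, iterations)
```

Swapping `math.dist` for a Mahalanobis distance computed from each cluster's covariance matrix would give the setting of Experiments 1 and 2.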
Precision reflects the ability of the algorithm to distinguish images without construction dust: the higher the precision, the stronger the discrimination ability. Recall reflects the ability to identify images with construction dust: the higher the recall, the stronger the recognition ability. The larger the harmonic mean, the more robust the algorithm is [15]. The data in Table 1 show that the improved K-Means algorithm has higher recognition accuracy and better robustness. According to the evaluation results, the best clustering effect is obtained in Experiment 2, which uses Mahalanobis distance with a determined initial centroid. The accuracy of automatic identification of construction dust is 89.33%. Fig.4 shows the distribution of the sample coordinate data before clustering, and Fig.5 shows the distribution of the sample coordinate points after clustering with a determined initial centroid and Mahalanobis distance. In Fig.4 and Fig.5, the green points represent the mean value and variance of the saturation of images with construction dust, and the red points represent the mean value and variance of the saturation of images without construction dust.

Figure 5. Results of Experiment 2

We also compared the running time of the four experiments, as shown in Fig.6. The running time of the improved K-Means algorithm is shorter than that of the traditional K-Means algorithm. Using the median as the initial centroid reduces the number of iterations. Mahalanobis distance eliminates the interference of variable correlation, and its running time is shorter than that of Euclidean distance. The K-Means algorithm that both determines the initial centroid and uses Mahalanobis distance as the similarity measure therefore also has an advantage in running speed.

Conclusion and outlook
At present, there is little research on applying image processing and computer vision to the detection of construction dust. This paper proposes an improved K-Means clustering algorithm that determines the initial clustering centroid and uses Mahalanobis distance as the similarity measure. The improved algorithm can automatically identify and cluster images of construction dust. However, saturation as the feature for image classification has poor recognition ability for low-concentration construction dust. In future work, we can enlarge the data set, carry out a more detailed cluster analysis, and divide construction dust into high-concentration and low-concentration classes, so as to promote the development of intelligent detection of construction dust. This work lays the foundation for an easy-to-operate construction dust detection system that uses a camera directly.