A New Method of Cloud Detection Based on Cascaded AdaBoost

Cloud detection is a critical step in the processing of remote sensing images. Detecting cloud quickly, accurately and effectively in such images remains a challenging problem. To avoid the disadvantages of current algorithms, this paper applies the cascaded AdaBoost classifier to cloud detection and proposes a new algorithm that combines the cascaded AdaBoost classifier with multiple features. First, colour, texture and spectral features are extracted from the remote sensing image. Second, an automatic cloud detection model is trained with the cascaded AdaBoost algorithm. The results show that the new algorithm determines the cloud detection model and its threshold values adaptively for remote sensing training data of different resolutions, and that it improves the accuracy of cloud detection. It is therefore an effective new algorithm for cloud detection in remote sensing images.


Introduction
Remote sensing satellite images have significant applications in areas such as climate modelling, flood forecasting, water resources management, assessment of forest resources and the examination of marine environments [1,2]. Cloud detection is a critical step in the processing of remote sensing images [2,5]. Detecting cloud quickly, accurately and effectively in remote sensing data remains a challenging problem.
Most current cloud detection methods were developed for moderate-resolution sensors. Zhu [3] (2012), Hollingsworth [4] (1996) and Irish [6] (2006) proposed methods for Landsat images, while Wang [13] (2006) and Li [16] (2009) studied MODIS images. There are also some studies on other sensors [2,15]. Most of this research focuses on finding better features to separate cloud from water, soil, shadow, stone and other objects on the earth's surface [3,4,6,7,8,9,10,11,12]. These methods work well on the specific remote sensing data they were designed for, but not on other images. Selecting the features, setting a threshold value for each feature and deciding the order of the tests in the model all require considerable prior knowledge. To avoid these disadvantages, this paper applies the cascaded AdaBoost classifier algorithm to cloud detection and proposes a new algorithm that combines the cascaded AdaBoost classifier with multiple features. The new method solves the feature selection and threshold setting problems. The results show that the new algorithm determines the cloud detection model and its threshold values adaptively for remote sensing training data of different resolutions, and that it improves the accuracy of cloud detection. It is therefore an effective new algorithm for cloud detection in remote sensing images.

The AdaBoost Algorithm
In this section, we give a brief description of the AdaBoost algorithm. The original AdaBoost algorithm was introduced by Freund and Schapire [13,14]. It is a supervised learning algorithm designed to find a binary classifier that discriminates between positive and negative examples. A weak classifier can be defined by Eq. (1):

$$h(x, f, p, \theta) = \begin{cases} 1, & \text{if } p\, f(x) < p\, \theta \\ 0, & \text{otherwise} \end{cases} \tag{1}$$

where f(x) denotes one dimension of the input data x, θ is a threshold and p ∈ {+1, −1} denotes the direction of the inequality. The function returns a Boolean value in {0, 1}: an output of 1 means that the input x is classified as a positive example, and 0 otherwise.
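To illustrate the thresholded weak classifier described above, the following Python sketch implements it as a decision stump. It assumes the common convention that the stump outputs 1 when p·x_f < p·θ, with p equal to +1 or −1. The function name `weak_classify` and the sample values are hypothetical and chosen only for illustration.

```python
import numpy as np

def weak_classify(X, f, theta, p):
    """Decision stump on feature column f.

    Returns 1 where p * X[:, f] < p * theta and 0 elsewhere
    (assumed convention; p is +1 or -1 and flips the inequality).
    """
    X = np.asarray(X, dtype=float)
    return (p * X[:, f] < p * theta).astype(int)

# Three hypothetical samples with two features each.
X = np.array([[0.10, 0.80],
              [0.45, 0.30],
              [0.70, 0.55]])

above = weak_classify(X, f=0, theta=0.4, p=-1)  # positive when feature 0 > 0.4
below = weak_classify(X, f=0, theta=0.4, p=+1)  # positive when feature 0 < 0.4
```

Changing the sign of p turns the same threshold into the opposite test, so a single parameterization covers both "greater than" and "less than" rules.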
The AdaBoost algorithm boosts the classification performance of a simple learning algorithm by combining a collection of weak classifiers into a stronger classifier. The stronger classifier is built by supervised learning with a wrapper method of feature selection: at each iteration, it selects the weak classifier with the lowest weighted error on the input data. Data misclassified by the previous classifiers receive higher weights in the next iteration, so that each new weak classifier concentrates on the examples the earlier ones got wrong. Let T denote the number of AdaBoost weak classifiers, which is also the number of iterations. The AdaBoost stronger classifier can then be obtained according to Eq. (2),
where α_t is the weight of the t-th weak classifier, obtained by Eq. (3),
and ε_t represents the classification error of the t-th weak classifier.
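The boosting loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes the common convention α_t = ln((1 − ε_t)/ε_t) for the classifier weights and a decision threshold of half the total weight, which may differ from the paper's own equations by a constant factor. All function names and the toy data are hypothetical.

```python
import numpy as np

def stump_predict(X, f, theta, p):
    # Weak classifier: 1 where p * x_f < p * theta, else 0 (assumed convention).
    return (p * X[:, f] < p * theta).astype(int)

def best_stump(X, y, w):
    """Exhaustively pick the (f, theta, p) with the lowest weighted error."""
    best = (np.inf, 0, 0.0, 1)
    for f in range(X.shape[1]):
        values = np.unique(X[:, f])
        # Candidate thresholds: midpoints between sorted values plus two outer bounds.
        mids = (values[:-1] + values[1:]) / 2.0
        thresholds = np.concatenate(([values[0] - 1.0], mids, [values[-1] + 1.0]))
        for theta in thresholds:
            for p in (1, -1):
                err = np.sum(w * (stump_predict(X, f, theta, p) != y))
                if err < best[0]:
                    best = (err, f, theta, p)
    return best

def train_adaboost(X, y, T):
    """Return a list of (alpha, f, theta, p) tuples, one per iteration."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=int)
    w = np.full(len(y), 1.0 / len(y))
    model = []
    for _ in range(T):
        w = w / w.sum()                          # normalize the sample weights
        err, f, theta, p = best_stump(X, y, w)
        err = np.clip(err, 1e-10, 1 - 1e-10)     # avoid division by zero
        alpha = np.log((1 - err) / err)          # assumed weight formula
        miss = stump_predict(X, f, theta, p) != y
        w = w * np.exp(alpha * miss)             # raise weights of misclassified samples
        model.append((alpha, f, theta, p))
    return model

def strong_classify(model, X):
    """Weighted vote: 1 where the vote reaches half of the total weight."""
    X = np.asarray(X, dtype=float)
    score = sum(a * stump_predict(X, f, th, p) for a, f, th, p in model)
    total = sum(a for a, _, _, _ in model)
    return (score >= 0.5 * total).astype(int)

# Toy 1-D problem: positives lie in an interval, so no single stump separates them.
X = np.array([[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]])
y = np.array([0, 0, 1, 1, 0, 0])
model = train_adaboost(X, y, T=3)
predictions = strong_classify(model, X)
```

On this toy data a single stump misclassifies at least two of the six samples, while the weighted vote of three stumps reproduces all six labels, which is the sense in which weak classifiers are combined into a stronger one.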

The New Method of Cloud Detection Based on Cascaded AdaBoost
The stronger classifier increases the classification precision, but it needs to combine a large number of weak classifiers, which makes the resulting detector very slow. To overcome this problem, the cascaded AdaBoost algorithm is adopted in this paper, and a new method of cloud detection based on cascaded AdaBoost is proposed. The cascaded AdaBoost model has two key properties: both its computation time and its detection rate can be adjusted.

Building the cascaded AdaBoost detector model
The cascaded AdaBoost classifier model comprises several stronger classifiers, each of which achieves a high precision rate. Good results can then be obtained simply by aiming for a fixed maximum false drop rate. More detail about the cascaded AdaBoost classifier model is shown in Figure 1.

Feature Selection of input data
Cascaded AdaBoost is a degenerate decision tree. Figure 2 shows the main procedure by which the cascaded AdaBoost detector model is applied to cloud detection. To process a target image, the feature vector of each pixel is computed first. These feature vectors are then passed as input to the first-stage classifier. Pixels classified as "cloudy pixels" move on to the next-stage classifier, and so on. As soon as a pixel is classified as a "clear-sky pixel" by any stage, it is finally labelled a "clear-sky pixel". Only pixels classified as "cloudy pixels" by every stage are labelled "cloudy pixels".
(2) Choose the training input data and label them as "cloudy pixels" or "clear-sky pixels". Then specify the feature vector of each training sample as x_i.
(3) Initialize the false drop rate and the detection rate, and set the number of the current working stage.
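The stage-by-stage decision rule described in this section (a pixel is labelled cloudy only if every stage accepts it, and is rejected as clear-sky at the first stage that refuses it) can be sketched in Python as follows. This is an illustrative sketch only: `cascade_predict` is a hypothetical name, and the two threshold stages stand in for trained stage classifiers.

```python
import numpy as np

def cascade_predict(stages, X):
    """Label each row of X as 1 (cloudy) or 0 (clear-sky).

    stages: list of callables, each mapping an (n, d) array to n labels in {0, 1}.
    A sample rejected by any stage is never evaluated by the later stages.
    """
    X = np.asarray(X, dtype=float)
    labels = np.zeros(len(X), dtype=int)     # default label: clear-sky
    alive = np.arange(len(X))                # indices still considered cloudy
    for stage in stages:
        if alive.size == 0:
            break
        keep = np.asarray(stage(X[alive])) == 1
        alive = alive[keep]                  # only accepted samples move on
    labels[alive] = 1                        # survived every stage: cloudy
    return labels

# Two hypothetical stages standing in for trained stage classifiers.
stages = [
    lambda X: (X[:, 0] > 0.2).astype(int),
    lambda X: (X[:, 1] > 0.5).astype(int),
]
pixels = np.array([[0.1, 0.9],    # rejected by stage 1
                   [0.4, 0.3],    # rejected by stage 2
                   [0.6, 0.8]])   # accepted by both stages
labels = cascade_predict(stages, pixels)
```

Because later stages only see the samples that earlier stages accepted, most clear-sky pixels are discarded cheaply, which is the source of the cascade's speed advantage over a single large classifier.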

Feature Selection of input data
As mentioned before, the training data are pixels taken from Landsat-5 remote sensing images. To keep the detector model general, the training data come from seven Landsat-5 images covering different regions (such as Japan, the US and China) and various kinds of land cover (water, mountain, sea, city, vegetation and so on). The training pixels are labelled "cloudy pixel" or "clear-sky pixel", corresponding to positive and negative examples respectively. In this paper, the numbers of "cloudy pixel" and "clear-sky pixel" samples are 64,577 and 155,717. The total number of training samples is 220,254.
Feature extraction is the basis of cloud detection. To describe the visual content of the image, we implement colour, texture and spectral descriptors. Digital number (DN) values are converted to top-of-atmosphere (TOA) reflectance for Bands 1, 2, 3, 4, 5 and 7, and to brightness temperature (BT) for Band 6. In addition to the TOA reflectance and BT values, the NDVI, NDSI, Whiteness, HOT and Ratio4_5 features are used to describe the input data [3]. The dimension of the feature vector is 12. The relevance description can be obtained by the following equation:
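As an illustration of how such a 12-dimensional feature vector could be assembled, the Python sketch below combines the six TOA reflectance bands, the Band 6 brightness temperature and the five derived indices. The index formulas are assumptions that follow the definitions commonly used with Landsat TM data in Fmask-style methods such as [3]; the ordering of the features, the function names and the sample pixel values are hypothetical.

```python
import numpy as np

def safe_ratio(num, den):
    """Element-wise num / den, returning 0 where the denominator is 0."""
    num = np.asarray(num, dtype=float)
    den = np.asarray(den, dtype=float)
    return np.divide(num, den, out=np.zeros_like(num), where=den != 0)

def build_feature_vector(toa, bt):
    """Stack 12 features per pixel along the last axis.

    toa: dict mapping band number (1, 2, 3, 4, 5, 7) to a TOA reflectance array.
    bt:  Band 6 brightness temperature array of the same shape.
    """
    b1, b2, b3, b4, b5, b7 = (np.asarray(toa[k], dtype=float)
                              for k in (1, 2, 3, 4, 5, 7))
    bt = np.asarray(bt, dtype=float)

    ndvi = safe_ratio(b4 - b3, b4 + b3)      # vegetation index (assumed definition)
    ndsi = safe_ratio(b2 - b5, b2 + b5)      # snow index (assumed definition)
    mean_vis = (b1 + b2 + b3) / 3.0
    deviation = np.abs(b1 - mean_vis) + np.abs(b2 - mean_vis) + np.abs(b3 - mean_vis)
    whiteness = safe_ratio(deviation, mean_vis)
    hot = b1 - 0.5 * b3 - 0.08               # haze optimized transformation (assumed)
    ratio4_5 = safe_ratio(b4, b5)

    return np.stack([b1, b2, b3, b4, b5, b7, bt,
                     ndvi, ndsi, whiteness, hot, ratio4_5], axis=-1)

# One hypothetical pixel: flat visible reflectance, brighter near infrared.
toa = {1: np.array([0.3]), 2: np.array([0.3]), 3: np.array([0.3]),
       4: np.array([0.4]), 5: np.array([0.2]), 7: np.array([0.1])}
features = build_feature_vector(toa, bt=np.array([280.0]))
```

The same function works on whole image arrays, in which case the result has one 12-element feature vector per pixel along the last axis.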

System implementation details
Our method was implemented in the Matlab 2010b environment. The empirical evaluation was performed on a Dell PC with 3 GB of memory running the Windows 7 operating system. The detailed parameter settings of the cascaded AdaBoost detector are as follows: the fixed maximum false drop rate (

Experimental results
The results obtained by the new cloud detection method based on cascaded AdaBoost were compared with those of the stronger classifier of the AdaBoost algorithm and of the object-based cloud detection model [3]. The precision index was used as the performance measure for the different methods. To evaluate the precision index on test data, the input data were divided into independent training and test sets of 110,127 samples each. The precision index of the object-based cloud detection model [3] on the test data is 92.03%. Table 1 shows the precision index of the AdaBoost algorithm and the cascaded AdaBoost algorithm as the number of training samples increases. Each result is the average precision index over 10 experiments. The precision index of both the AdaBoost and the cascaded AdaBoost algorithms is higher than that of the object-based cloud detection model, especially as the number of training samples increases. When the number of training samples exceeds 60,000, the precision index on the test data is higher for the cascaded AdaBoost algorithm than for the AdaBoost algorithm. Overall, the precision index on the test data using the cascaded AdaBoost algorithm is as high as 98.61%.
Table 1.
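The evaluation protocol described above (an even split into independent training and test sets, repeated 10 times and averaged) can be sketched in Python as follows. The sketch assumes that the precision index is the fraction of test pixels whose predicted label matches the reference label, which the text does not define explicitly. The function names and the trivial threshold model are hypothetical.

```python
import numpy as np

def split_half(n, rng):
    """Randomly split indices 0..n-1 into two disjoint halves."""
    idx = rng.permutation(n)
    half = n // 2
    return idx[:half], idx[half:]

def precision_index(y_true, y_pred):
    # Assumed definition: fraction of samples labelled correctly.
    return float(np.mean(np.asarray(y_true) == np.asarray(y_pred)))

def averaged_score(X, y, fit, predict, runs=10, seed=0):
    """Average the precision index on the test half over several random splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(runs):
        train_idx, test_idx = split_half(len(y), rng)
        model = fit(X[train_idx], y[train_idx])
        scores.append(precision_index(y[test_idx], predict(model, X[test_idx])))
    return float(np.mean(scores))

# Toy usage with a fixed threshold standing in for a trained detector.
X = np.arange(20, dtype=float).reshape(-1, 1)
y = (X[:, 0] >= 10).astype(int)
fit = lambda X_train, y_train: 9.5
predict = lambda model, X_test: (X_test[:, 0] > model).astype(int)
score = averaged_score(X, y, fit, predict)

train_idx, test_idx = split_half(220254, np.random.default_rng(0))
```

With the 220,254 samples reported in the paper, this split yields two halves of 110,127 samples each.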

Conclusion
In this study, the cascaded AdaBoost algorithm was successfully applied to cloud detection in remote sensing images. Multiple features based on colour, texture and spectral properties are extracted for the pixels of the image to be processed and for the training examples. The cascaded AdaBoost detector model is then trained on the training data. Finally, the "cloudy pixels" of the remote sensing image are detected according to the model. The new model for cloud detection is fast, accurate and robust. Although the training phase is time-consuming because of the large number of samples and features, the resulting detector is very fast. Training with a wide variety of data covering various kinds of clouds made the detector robust to some rotations and occlusions, and also, to a significant extent, to degradation of image quality. It also does not require any manual intervention. Thus, it is suitable for real-time surveillance and non-intrusive applications. However, the model can still be improved in the future. First, more features common to different kinds of remote sensing data should be extracted in order to make the model applicable to different images. Second, the performance of the detector might be further enhanced by training with samples normalized to an