Application of a Remote Sensing Sea Ice Monitoring Method Based on the C4.5 Decision Tree Algorithm in Bohai Bay

In this paper, remote sensing monitoring of sea ice is cast as a classification problem in data mining. Statistics were computed for the band data of HJ-1B remote sensing images, and these were used to identify the bands most closely related to the reflectance of sea water and sea ice. Decision tree rules for sea ice monitoring were then built from these bands and applied to the Liaodong Bay area, which is heavily covered by sea ice. The results show that the method is effective.


Introduction
As early as 1992, Key and Haefliger pointed out that land satellites could be used to locate sea ice and to study its activity and trends. Many methods have since been developed for monitoring sea ice by satellite remote sensing, mostly using SAR (Synthetic Aperture Radar) or MODIS (Moderate-resolution Imaging Spectroradiometer) data as the source. Han Suqin et al. [1] extracted the distribution of sea ice and the characteristic values of its outer edge from the signatures of sea ice in the visible, near-infrared and far-infrared channels. Their results show that sea ice over large areas can be detected with MODIS images. Wu Kuiqiao et al. [2] and Wu Longtao et al. [3] retrieved sea ice parameters mainly from MODIS data and produced sea ice images and numerical products such as sea ice concentration. Franz J. Meyer et al. [4] extracted the extent of offshore sea ice by processing interferometric phase patterns and interferometric coherence images derived from L-band SAR data. Sungwook Hong et al. [5] retrieved the small-scale roughness coefficients and the refractive index of sea ice from passive microwave data, and used the refractive index to identify sea ice.
In essence, sea ice monitoring is a two-class classification of sea ice and sea water. Data mining techniques have inherent advantages for classification problems and can discover potentially useful rules and knowledge that were previously unknown, and many researchers have applied them to classification tasks. Li Mingshi [6] used ASTER data to train on samples of 8 main land cover types and extracted the spatial distribution of each type with the maximum likelihood method, a BP neural network and a decision tree classifier. In that comparison, the decision tree classifier performed best. Wang Changying [7] discussed data-mining-based classification of coastal-zone remote sensing images in depth. In this paper, HJ-1B images are used to monitor sea ice in the Liaodong Bay area, the most heavily iced area of the Bohai Sea, and the sea ice extent is extracted with a decision tree classification method. Finally, the accuracy of the extraction is verified.

Research area
The Bohai Sea is part of the western Pacific Ocean and is an inland sea of China. It consists of five parts: Liaodong Bay in the north, Bohai Bay in the west, Laizhou Bay in the south, the central shallow basin, and the Bohai Strait. Sea ice appears in the Bohai Sea and the northern Yellow Sea only in winter, and the ice period there lasts about 3 to 4 months. The ice period becomes progressively shorter from Liaodong Bay to the northern Yellow Sea, Bohai Bay and Laizhou Bay.

Data preprocessing
The HJ-1 CCD images were first preprocessed. The steps were radiometric calibration, geometric correction, registration of the CCD1 and CCD2 data, mosaicking and clipping.
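Radiometric calibration converts raw digital numbers (DN) into physical quantities such as the reflectance values used in the analysis below. The following is a minimal sketch under stated assumptions. It uses a linear gain/offset model L = DN / gain + offset for at-sensor radiance and the standard top-of-atmosphere reflectance formula rho = pi * L * d^2 / (ESUN * cos(theta_s)). The gain, offset, ESUN, sun elevation and Earth-Sun distance would come from the scene metadata and the published calibration coefficients. The function names and example values are hypothetical.

```python
import math

def dn_to_radiance(dn, gain, offset):
    """Convert a digital number to at-sensor radiance.

    Assumes the linear calibration model L = DN / gain + offset;
    gain and offset are per-band coefficients supplied by the caller.
    """
    return dn / gain + offset

def toa_reflectance(radiance, esun, sun_elevation_deg, earth_sun_dist_au):
    """Convert at-sensor radiance to top-of-atmosphere reflectance.

    rho = pi * L * d^2 / (ESUN * cos(solar zenith angle)), where the
    solar zenith angle is 90 degrees minus the sun elevation angle.
    """
    zenith = math.radians(90.0 - sun_elevation_deg)
    return math.pi * radiance * earth_sun_dist_au ** 2 / (esun * math.cos(zenith))

if __name__ == "__main__":
    # Illustrative values only, not real HJ-1 calibration coefficients.
    radiance = dn_to_radiance(dn=120, gain=0.8, offset=4.0)
    print(toa_reflectance(radiance, esun=1900.0, sun_elevation_deg=30.0,
                          earth_sun_dist_au=0.985))
```

Keeping the two steps separate allows the radiance stage to be checked against the calibration documentation before any solar geometry is introduced.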

Statistical analysis of sample data and collation of training samples
Statistical analysis was carried out on the data selected above. First, the mean values of the 4 bands were calculated for sea ice and sea water; they are shown in Figure 1. As Figure 1 shows, the mean reflectance of sea ice differs greatly from that of sea water in bands B1 and B2. Figure 2 shows the distributions of sea ice and sea water reflectance in bands B1 and B2. In band B1, sea ice reflectance is mostly concentrated between 0.05 and 0.15, while sea water reflectance is mainly concentrated between 0.02 and 0.05. Based on this result, bands B1 and B2 were chosen as the characteristic attributes for separating sea ice from sea water. Similarly, Figure 1 shows that the mean reflectance of sea ice differs considerably from that of land in bands B2 and B4.
In addition, the NDSI (Normalized Difference Snow Index) [8] is used to extract snow information, mainly because snow has high reflectance in the visible band and low reflectance in the short-wave infrared. Sea ice has similar characteristics. The NDSI is computed as:
$$\mathrm{NDSI} = \frac{B3 - B4}{B3 + B4}$$
Sea ice can occur against three kinds of background: sea ice and sea water; sea ice and land; and sea ice, sea water and land. To validate the C4.5 decision tree algorithm, the experiment was divided into three sets of training sample data, one for each case.
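The per-class band means summarized in Figure 1 and the NDSI defined above can be computed directly from labelled samples. The following is a minimal sketch over hypothetical sample records of the form (class label, B1, B2, B3, B4) in reflectance units. The record layout and function names are assumptions. Returning 0.0 when B3 + B4 is zero is an arbitrary choice that avoids division by zero.

```python
from statistics import mean

def class_band_means(samples):
    """Return {class_label: [mean B1, mean B2, mean B3, mean B4]}.

    Each sample is a hypothetical record (label, b1, b2, b3, b4).
    """
    rows_by_class = {}
    for label, *bands in samples:
        rows_by_class.setdefault(label, []).append(bands)
    # zip(*rows) transposes the rows so that each column holds one band.
    return {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in rows_by_class.items()
    }

def ndsi(b3, b4):
    """NDSI as defined in this paper: (B3 - B4) / (B3 + B4)."""
    total = b3 + b4
    if total == 0:
        return 0.0  # arbitrary value where the ratio is undefined
    return (b3 - b4) / total

if __name__ == "__main__":
    # Hypothetical reflectance samples, for illustration only.
    samples = [
        ("sea_ice", 0.10, 0.11, 0.12, 0.09),
        ("sea_ice", 0.12, 0.13, 0.13, 0.10),
        ("sea_water", 0.03, 0.04, 0.03, 0.02),
    ]
    print(class_band_means(samples))
    print(ndsi(0.12, 0.09))
```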
The sample set selected in Section 3.2 was used as the training sample, and the NDSI was calculated with the formula above. Bands B1 and B2 were then selected as one attribute set, bands B2 and B4 as a second set, and bands B1, B2 and B4 together as a third set. A category attribute was added to each. The complete data structure of the training sample set is shown in Table 2.
C4.5 [9] is a classic decision tree algorithm developed from the ID3 algorithm. Because the model is evaluated by cross-validation [10], only a training set is needed and no separate test set is required. Figure 3 shows the final C4.5 decision tree classification model constructed in this paper.
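For continuous attributes such as band reflectance, C4.5 tests binary splits of the form value <= t and compares attributes by gain ratio, which is information gain divided by split information. The following is a simplified sketch for a single attribute. It takes candidate thresholds at the midpoints of adjacent sorted values (a common variant), picks the threshold with the highest information gain, and reports that split's gain ratio. It omits pruning, missing-value handling and the threshold-cost correction used in later C4.5 releases. The function names and the sample values are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def best_split(values, labels):
    """Best binary split 'value <= t' for one continuous attribute.

    Returns (threshold, information_gain, gain_ratio), or None when all
    values are identical and no split is possible.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    base = entropy([label for _, label in pairs])
    best = None
    for i in range(1, n):
        lower, upper = pairs[i - 1][0], pairs[i][0]
        if lower == upper:
            continue  # no threshold can separate equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        p = i / n  # fraction of samples falling on the left branch
        gain = base - (p * entropy(left) + (1 - p) * entropy(right))
        if best is None or gain > best[1]:
            split_info = -(p * math.log2(p) + (1 - p) * math.log2(1 - p))
            best = ((lower + upper) / 2, gain, gain / split_info)
    return best

if __name__ == "__main__":
    # Hypothetical B1 reflectances for four ice and four water samples.
    b1 = [0.08, 0.12, 0.10, 0.06, 0.03, 0.04, 0.02, 0.045]
    classes = ["ice"] * 4 + ["water"] * 4
    print(best_split(b1, classes))
```

On these hypothetical values, the best threshold falls midway between the brightest water sample and the darkest ice sample. This matches the separation of the B1 reflectance ranges reported for the two classes.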

Validation of decision tree model
Five scenes of HJ-1B CCD data covering the Bohai Sea on February 4, 2012, chosen for their good imaging quality, were used to validate the effectiveness of the decision tree model constructed above for sea ice monitoring. Figure 4 shows the preprocessed HJ-1B image on the left and, on the right, the sea ice monitoring map generated by the model trained in the previous section. Ice-covered sea is shown in red, land in black, and ice-free sea in dark green.
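Applying a trained tree to an image means evaluating its if-then rules pixel by pixel and coloring the result with the scheme described above. The following is a minimal sketch. The rule order and every threshold are hypothetical placeholders, not the split values of the tree in Figure 3. Only the 0.05 cut on B1 is loosely suggested by the B1 reflectance ranges reported earlier. The class names, RGB triples and function names are also assumptions.

```python
# Hypothetical ordered rules standing in for the tree in Figure 3.
# Thresholds are illustrative placeholders, NOT the fitted split values.
RULES = [
    ("land", lambda p: p["B4"] > 0.25 and p["B2"] < 0.10),
    ("sea_ice", lambda p: p["B1"] > 0.05),
]
DEFAULT_CLASS = "sea_water"

# Red for ice-covered sea, black for land, dark green for ice-free sea.
COLORS = {"sea_ice": (255, 0, 0), "land": (0, 0, 0), "sea_water": (0, 100, 0)}

def classify(pixel, rules=RULES, default=DEFAULT_CLASS):
    """Return the label of the first rule whose test the pixel passes."""
    for label, test in rules:
        if test(pixel):
            return label
    return default

def render(image):
    """Map a 2-D grid of band dictionaries to a 2-D grid of RGB tuples."""
    return [[COLORS[classify(pixel)] for pixel in row] for row in image]

if __name__ == "__main__":
    # A hypothetical 1 x 3 image of reflectance values.
    image = [[
        {"B1": 0.10, "B2": 0.12, "B4": 0.10},
        {"B1": 0.03, "B2": 0.04, "B4": 0.02},
        {"B1": 0.06, "B2": 0.05, "B4": 0.30},
    ]]
    print(render(image))
```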

Conclusion
In this paper, sea ice monitoring in remote sensing images is transformed into a data mining classification problem. Remote sensing image data were subjected to statistics, analysis and the necessary operations, a decision tree for sea ice monitoring was constructed with the decision tree algorithm, and rules for sea ice monitoring in HJ images were obtained. Experiments were carried out on Bohai Sea remote sensing images from February 8, 2010, and the results of the C4.5 decision tree algorithm were compared with those of the ISODATA method. The rules were then applied to the February 4, 2012 sea ice remote sensing images of the Bohai Sea, covering the heavily iced Liaodong Bay area. The sea ice area of Liaodong Bay extracted by the C4.5 decision tree is 15243.831 square kilometres. The satellite remote sensing monitoring results released by the Liaoning Meteorological Administration in 2012 report 16391.216 square kilometres, so the accuracy reached 93%. This demonstrates the validity of the method.
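The 93% figure can be reproduced from the two areas. The following is a minimal sketch with two assumptions. First, accuracy is taken as 1 - |A_extracted - A_reference| / A_reference; for an underestimate this equals the simple ratio A_extracted / A_reference, here 15243.831 / 16391.216 ≈ 0.930. Second, the extracted area is obtained by counting ice pixels at the nominal 30 m HJ-1 CCD resolution, so each pixel covers 900 m², or 0.0009 km². The function names are hypothetical.

```python
def ice_area_km2(ice_pixel_count, pixel_size_m=30.0):
    """Area covered by ice pixels, in square kilometres.

    Assumes square pixels of side pixel_size_m (30 m is the nominal
    HJ-1 CCD resolution); each pixel covers pixel_size_m**2 square metres.
    """
    return ice_pixel_count * pixel_size_m ** 2 / 1e6

def area_accuracy(extracted_km2, reference_km2):
    """Assumed accuracy measure: 1 - |extracted - reference| / reference."""
    return 1.0 - abs(extracted_km2 - reference_km2) / reference_km2

if __name__ == "__main__":
    acc = area_accuracy(15243.831, 16391.216)
    print(f"{acc:.1%}")  # prints 93.0%
```

Using the absolute difference makes the measure symmetric, so an overestimate of the same size would score the same as an underestimate.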

Suggestions
(1) Sea ice information could also be extracted with techniques other than decision tree classification, so that extraction methods keep improving and reach higher accuracy.
(2) Decision tree classification of sea ice information could be applied to other remote sensing images, such as TM and MODIS.