Research on Building Height Extraction Method from High-resolution Image

In recent years, many studies have calculated building height from extracted building shadows, but this approach is difficult to apply where buildings are dense or the ground is complex. To address this problem, the side information of a building can be used to retrieve its height. In this paper, the Otsu algorithm and an LVQ neural network are used to extract building side information from high-resolution remote sensing images and calculate its length; the building height is then calculated according to the physical model of imaging. Compared with measured building heights, the results of this method meet the accuracy requirements of building height calculation, and the method can complement the results of the shadow extraction method. As a result, height information can be obtained for most buildings in the area.


Introduction
The key to three-dimensional modeling of buildings over large areas is obtaining building height information, particularly in areas where basic surveying and mapping data are scarce, such as certain areas abroad and poor, underdeveloped regions. Current methods of obtaining building height information fall into the following categories: (1) Using stereo image pairs to calculate building height according to the principle of aerial triangulation. This method has high accuracy in theory, but requires two or more images with a certain overlap. (2) Extracting building shadows, calculating the shadow length, and modeling the spatial geometric relationship between shadow and building from the Sun's altitude and azimuth. This method needs only one image and is relatively low cost, but it cannot obtain complete shadow information when buildings are dense or the ground objects are complex. (3) Using LiDAR (Light Detection And Ranging) data to obtain building height information, which is less widely applicable [1][2].
The research group has worked for many years on retrieving building height from shadows [3][4][5][6] and has found that this method has some limitations. For example, when buildings are dense, the shadow of a front-row building may be cast onto a back-row building, so the extracted shadow is incomplete.
Therefore, this paper attempts to use deep learning and other methods to obtain the side information of buildings, calculate its length, and then estimate building height from the spatial geometric relationship between the satellite and the building, in order to make up for the shortcomings of shadow-based height calculation. Some studies at home and abroad have addressed this problem [1][7], but they mainly involve relatively simple scenes and have not been compared with the shadow extraction method.

Research methods and principle
The method requires the following prerequisites: (1) Remote sensing images with high spatial resolution. High-resolution images provide more pixels per object; examples are sub-meter QuickBird images or higher-resolution aerial images.
(2) Non-orthographic (off-nadir) imaging. Estimating building height from side information requires that the building side be visible in the remote sensing image, but when the satellite height (elevation) angle is at or close to 90 degrees, the side is barely visible. The visible side must therefore not be too short; otherwise the side information of the building cannot be identified.
(3) Sun and satellite on the same side of the building. In this case the spectral features of the building side are distinct, and its shape and texture features help the extraction of the building side.
(4) Regular external structure. If the structure is irregular, a more complex physical model is needed to calculate the building height; an irregular exterior also reduces the extraction accuracy of the side information and hinders the retrieval of building height.
(5) Little occlusion of the building side. When buildings are dense, adjacent buildings block each other, or the shadow of one building is cast on the side of another; dense vegetation on the ground also blocks the building side.
Subject to the above five premises, the spatial relationship between the satellite and buildings is shown in figure 1.
Figure 1. Schematic diagram of the spatial position relationship between satellite and buildings
It can be seen from figure 1 that the height information of the building is closely related to the azimuth of the satellite and the height angle of the satellite. Because the remote sensing image is two-dimensional, it must be related back to the three-dimensional scene for calculation. The relationship between the two-dimensional and three-dimensional views is shown in figure 2. The side of the building as it appears on the image is B, while the side of the real building is A. The relationship between the actual height of the building and the length of the building side on the image is shown in formula (1).
H represents the actual height of the building, L represents the length of the building side on the image, and β represents the height angle of the satellite.
Figure 2. 3-D and 2-D diagram of the building side
In fact, in most cases the satellite's parallel lines of sight are not perpendicular to the orientation of the building, and the actual shape is as shown in plane C. To simplify the algorithm, only the edge direction of the building needs to be considered; the edge line gives the orientation of the building side. As shown in figure 3, the angle between the edge line and the Y direction of the screen is obtained, and the actual length of the side is calculated from the triangular relationship.
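Formula (1) itself is not reproduced in this text. The sketch below assumes the usual relief-displacement relation H = L · tan β, where L is the side length converted to ground units and β is the satellite height (elevation) angle; this is consistent with the side shrinking to zero as β approaches 90 degrees. The function name and the `gsd_m` parameter (ground sample distance in meters per pixel) are hypothetical, and the edge-angle correction of figure 3 is not included.

```python
import math


def building_height(side_length_px, gsd_m, sat_height_angle_deg):
    """Estimate building height from the visible side length.

    Assumption (not taken from the paper): H = L * tan(beta), with L the
    side length on the ground and beta the satellite height angle.
    """
    if not 0.0 < sat_height_angle_deg < 90.0:
        raise ValueError("satellite height angle must be in (0, 90) degrees")
    # Convert the measured length from pixels to meters.
    side_length_m = side_length_px * gsd_m
    # Project the ground-plane length back to a vertical height.
    return side_length_m * math.tan(math.radians(sat_height_angle_deg))


if __name__ == "__main__":
    # Hypothetical values: 25 px side, 0.6 m pixels, 60 degree height angle.
    print(round(building_height(25, 0.6, 60.0), 2))
```

The closer β is to 90 degrees, the larger tan β becomes, so a one-pixel error in L produces a larger height error; this matches prerequisite (2) above.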

Side information extraction based on deep learning method
The experimental data are a fused QuickBird image. Two methods are used for building side extraction: (1) threshold segmentation (Otsu algorithm) [8] combined with mathematical morphology; (2) deep learning, here an LVQ neural network [9]. The following briefly introduces the processing steps of the second method.
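For reference, the thresholding step of the first method can be sketched as follows. This is a minimal NumPy version of the standard Otsu criterion (maximizing the between-class variance), assuming an 8-bit grayscale input; it is not the authors' implementation, and the morphological post-processing is omitted.

```python
import numpy as np


def otsu_threshold(gray):
    """Return the Otsu threshold t for an 8-bit grayscale image.

    Pixels with value <= t form one class and pixels > t the other.
    """
    hist = np.bincount(gray.ravel(), minlength=256).astype(np.float64)
    prob = hist / hist.sum()
    levels = np.arange(256)
    omega = np.cumsum(prob)            # probability of the lower class
    mu = np.cumsum(prob * levels)      # cumulative first moment
    mu_total = mu[-1]
    denom = omega * (1.0 - omega)
    with np.errstate(divide="ignore", invalid="ignore"):
        sigma_b2 = (mu_total * omega - mu) ** 2 / denom
    sigma_b2[denom == 0] = 0.0         # thresholds that leave one class empty
    return int(np.argmax(sigma_b2))


if __name__ == "__main__":
    # Synthetic image: dark half and bright half.
    gray = np.array([[50] * 4 + [200] * 4] * 4, dtype=np.uint8)
    t = otsu_threshold(gray)
    mask = gray > t                    # candidate foreground pixels
    print(t, int(mask.sum()))
```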
The LVQ (Learning Vector Quantization) neural network used in this paper is a feed-forward network that trains a competitive layer with a supervised learning method. Because each competitive neuron is assigned to a class and the class labels guide the training, LVQ can avoid some of the classification errors made by purely unsupervised competitive learning algorithms. For example, given an input picture and the discriminant condition "does this picture contain an aircraft", the picture is labeled 1 if it contains an aircraft and 0 if it does not. Using this property, the LVQ neural network is treated as a classifier: the image is divided into blocks, each block whose features match a building side is labeled 1, and each block that does not is labeled 0. The blocks labeled 1 are then set as foreground and those labeled 0 as background. Finally, the side information of the building is obtained.
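The block classification described above can be illustrated with the basic LVQ1 update rule. This is a minimal sketch under stated assumptions: the learning-rate schedule, the prototype initialization and the function names are hypothetical, and the paper does not state which LVQ variant or toolbox was used.

```python
import numpy as np


def train_lvq1(X, y, prototypes, proto_labels, lr=0.1, epochs=20, seed=0):
    """Train prototypes with the LVQ1 rule and return the updated copy."""
    rng = np.random.default_rng(seed)
    W = np.asarray(prototypes, dtype=np.float64).copy()
    for epoch in range(epochs):
        alpha = lr * (1.0 - epoch / epochs)    # linearly decaying step size
        for i in rng.permutation(len(X)):
            # Competitive step: find the prototype nearest to the sample.
            k = int(np.argmin(np.linalg.norm(W - X[i], axis=1)))
            step = alpha * (X[i] - W[k])
            if proto_labels[k] == y[i]:
                W[k] += step                   # same class: pull closer
            else:
                W[k] -= step                   # wrong class: push away
    return W


def predict_lvq(X, W, proto_labels):
    """Label each sample with the class of its nearest prototype."""
    d = np.linalg.norm(X[:, None, :] - W[None, :, :], axis=2)
    return proto_labels[np.argmin(d, axis=1)]


if __name__ == "__main__":
    # Toy feature vectors: label 1 = building side block, 0 = background.
    X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
                  [1.0, 1.0], [0.9, 1.0], [1.0, 0.9]])
    y = np.array([0, 0, 0, 1, 1, 1])
    labels = np.array([0, 1])
    W = train_lvq1(X, y, np.array([[0.2, 0.2], [0.8, 0.8]]), labels)
    print(predict_lvq(X, W, labels))
```

In practice each row of `X` would be the feature vector of one image block, and the predicted labels would be written back to the block positions to form the foreground/background mask.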
(1) Select training samples
A large number of image tiles of 256×256 pixels, in which the building side is clearly visible, are cut from the remote sensing image and labeled. Each tile is converted to a gray image, and set the side [...]
(3) Set input vectors
For most buildings, the side is made of the same material as the top, so the spectral features of the building side and top may be similar, and spectral features alone cannot serve as the feature vector. The geometric and texture features of the building side must also be added to form the input vector. The geometric features comprise three indexes: the ratio of the square root of the area to the perimeter of the extracted region, the ratio of the perimeter to the number of vertices, and the ratio of the area of the feature primitive to that of its bounding rectangle. Together they form a three-dimensional shape feature vector. The formulas are shown in formulas (2), (3) and (4). Texture features reflect the arrangement rules within a local area of the image, but they capture only a few properties of the object's surface. Texture is described by four descriptors derived from the gray-level co-occurrence matrix (a second-order statistic of gray levels and relative pixel positions), which are used as input vectors: consistency, entropy, contrast, and self-correlation. The formulas are shown in formulas (5), (6), (7) and (8).
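Formulas (5) to (8) are not reproduced in this text. The sketch below computes the four texture descriptors using the standard gray-level co-occurrence matrix definitions, taking "consistency" as the angular second moment (energy) and "self-correlation" as the GLCM correlation; that mapping, the 8-level quantization and the one-pixel horizontal offset are assumptions rather than the paper's stated settings.

```python
import numpy as np


def glcm(gray, levels=8, dx=1, dy=0):
    """Normalized symmetric co-occurrence matrix for offset (dx, dy) >= 0."""
    q = (gray.astype(np.int64) * levels) // 256   # quantize 8-bit gray levels
    h, w = q.shape
    a = q[0:h - dy, 0:w - dx].ravel()             # reference pixels
    b = q[dy:h, dx:w].ravel()                     # neighbor pixels
    m = np.zeros((levels, levels), dtype=np.float64)
    np.add.at(m, (a, b), 1.0)                     # count co-occurrences
    m = m + m.T                                   # make the matrix symmetric
    return m / m.sum()


def texture_features(p):
    """Return (consistency, entropy, contrast, correlation) of a GLCM."""
    i, j = np.indices(p.shape)
    consistency = float(np.sum(p ** 2))
    nz = p[p > 0]
    entropy = float(-np.sum(nz * np.log2(nz)))
    contrast = float(np.sum((i - j) ** 2 * p))
    mu_i, mu_j = np.sum(i * p), np.sum(j * p)
    sd_i = np.sqrt(np.sum((i - mu_i) ** 2 * p))
    sd_j = np.sqrt(np.sum((j - mu_j) ** 2 * p))
    if sd_i * sd_j > 0:
        correlation = float(np.sum((i - mu_i) * (j - mu_j) * p) / (sd_i * sd_j))
    else:
        correlation = 0.0   # undefined for a constant block; 0 by convention here
    return consistency, entropy, contrast, correlation


if __name__ == "__main__":
    checker = (np.indices((4, 4)).sum(axis=0) % 2 * 255).astype(np.uint8)
    print(texture_features(glcm(checker)))
```

The geometric indexes of formulas (2) to (4) depend on the vectorized outline of each region and are not sketched here.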

(4) Normalize inputs
When training the model, the input feature vectors are normalized in order to speed up learning and reduce the experimental time. Normalization consists of two steps. The first step is zero-mean centering, that is, subtracting the mean of the training values from each training value. The second step is variance normalization, which scales the centered data from the first step so that each feature has unit variance.
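The two normalization steps can be sketched as below. This is a minimal example, assuming one row per sample and one column per feature; the function names are hypothetical, and the guard for constant features is an added safeguard not mentioned in the paper.

```python
import numpy as np


def fit_standardizer(X_train, eps=1e-12):
    """Compute the per-feature mean and standard deviation of the training set."""
    mean = X_train.mean(axis=0)
    std = X_train.std(axis=0)
    std = np.where(std < eps, 1.0, std)   # leave constant features unscaled
    return mean, std


def standardize(X, mean, std):
    """Step 1: subtract the mean. Step 2: divide by the standard deviation."""
    return (X - mean) / std


if __name__ == "__main__":
    X_train = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
    mean, std = fit_standardizer(X_train)
    print(standardize(X_train, mean, std))
```

The statistics are computed on the training samples only and then reused for new image blocks, so that training and prediction see inputs on the same scale.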

Precision analysis of experimental results
Three parameters are used to evaluate the extraction accuracy: the side missed detection rate, the side false detection rate and the side total error rate. The expressions are shown in formulas (9), (10) and (11).
Here $SOC_F$ represents the side missed detection rate, $SCE_F$ the side false detection rate, and $STE_F$ the side total error rate. $N_F$ represents the total number of building side pixels that were not detected, $E_F$ the total number of non-side pixels mistakenly detected as building side pixels, and $Y_F$ the true total number of building side pixels. The above formulas are used to calculate the three evaluation parameters, and the results are shown in the following two tables. From the above two tables, it can be seen that the results extracted by the deep learning method are more convenient to obtain and more accurate than those of the traditional method (Otsu algorithm), which shows that this extraction method is feasible.
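Formulas (9) to (11) are not reproduced in this text. The sketch below counts N_F, E_F and Y_F from a predicted and a reference binary mask as defined above, and then assumes that each rate is normalized by Y_F, with the total error rate being the sum of the other two. That normalization is an assumption based on the three quantities defined, not a confirmed form of the paper's formulas.

```python
import numpy as np


def side_error_rates(pred_mask, true_mask):
    """Return missed, false and total error rates for a side extraction."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    y_f = int(true.sum())                 # true side pixels
    if y_f == 0:
        raise ValueError("reference mask contains no building side pixels")
    n_f = int((true & ~pred).sum())       # side pixels that were missed
    e_f = int((pred & ~true).sum())       # non-side pixels detected as side
    # Assumed normalization: every rate is divided by Y_F.
    return {
        "SOC_F": n_f / y_f,
        "SCE_F": e_f / y_f,
        "STE_F": (n_f + e_f) / y_f,
    }


if __name__ == "__main__":
    true = np.array([[1, 1, 1, 1], [0, 0, 0, 0]])
    pred = np.array([[1, 1, 1, 0], [1, 0, 0, 0]])
    print(side_error_rates(pred, true))
```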

Estimation of actual building height
The estimation of the actual building height is divided into two steps: (1) calculating the length of the extracted side that corresponds to the building height; (2) using formula (1) to calculate the actual height of the building.
In this paper, the building sides in the experimental area are mostly parallelograms, so the length of the building side can be obtained by direct measurement. The fishing net method [3] is used to calculate the side length of the building. In this method, grid lines are generated on the binary image, and the cut lines are obtained by a logical negation operation. The cut lines are stored in an array as connected components. After filtering with the Pauta criterion (3σ rule), the number of pixels in each remaining line is counted and averaged to give the final side length.
After the side length is calculated, the height of the building is estimated with formula (1). The calculated and actual heights of 10 buildings are compared, and the results are shown in Table 3 and Table 4. It can be seen from the results that both methods have some errors and need further improvement. In general, the absolute accuracy of the deep learning method is slightly higher than that of the threshold segmentation method.
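The length measurement can be illustrated with a simplified sketch. It assumes that the binary side mask has been oriented so that the height direction of the side runs along image columns, takes every `spacing`-th column as a grid line, treats each run of foreground pixels on a grid line as one cut line, and applies a single pass of the 3σ (Pauta) filter. The function names and the `spacing` parameter are hypothetical, and the original fishing net implementation in [3] may differ.

```python
import numpy as np


def run_lengths(line):
    """Lengths of consecutive runs of True in a 1-D boolean array."""
    padded = np.concatenate(([False], line, [False])).astype(np.int8)
    diff = np.diff(padded)
    starts = np.flatnonzero(diff == 1)     # run begins
    ends = np.flatnonzero(diff == -1)      # run ends (exclusive)
    return ends - starts


def fishnet_side_length(mask, spacing=2):
    """Mean cut-line length in pixels after 3-sigma outlier removal."""
    mask = np.asarray(mask, dtype=bool)
    cols = range(0, mask.shape[1], spacing)
    lengths = np.concatenate([run_lengths(mask[:, c]) for c in cols])
    if lengths.size == 0:
        raise ValueError("no foreground pixels on any grid line")
    mean, std = lengths.mean(), lengths.std()
    # Pauta criterion: keep values within three standard deviations.
    kept = lengths[np.abs(lengths - mean) <= 3 * std]
    return float(kept.mean())


if __name__ == "__main__":
    mask = np.zeros((10, 6), dtype=bool)
    mask[2:7, :] = True                    # a side that is 5 pixels long
    print(fishnet_side_length(mask))
```

The returned value is in pixels; it still has to be multiplied by the ground sample distance before formula (1) is applied.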

Conclusion
The following conditions were found to affect the performance of this method: (1) lush vegetation at the base of the building, which prevents the boundary line where the building meets the ground from being selected directly; (2) roof and side built from the same material, which interferes with side extraction; (3) a complex and uneven roof structure; (4) overly dense buildings with few exposed sides.
The results show that it is feasible to estimate building height from building side information in high-resolution remote sensing images. The method and the shadow calculation method can complement each other: the side method makes up for some defects of shadow calculation, such as areas with dense building groups, while some situations that are difficult for the side method, such as buildings whose side and top are made of the same material, can be handled by the shadow method. The degree of complementarity between this method and the shadow extraction method needs further verification in subsequent research.