Research on 3D skeletal model extraction algorithm of branch based on SR4000

A branch 3D skeleton extraction method based on the SR4000 depth sensor is proposed for the visual recognition of pruning robots. With this method, a 3D skeleton model of apple tree branches in the initial fruit stage was successfully constructed. According to the experimental results, the average computational efficiency of 2AHC (2-level Aggregation Hierarchy Clustering) is increased by 369%, with a detection rate of 84.69% and an error rate of 5.03%. The average detection rate of the depth level analysis algorithm is 64%, and the overall 3D skeleton restoration effect is good. The algorithm therefore reflects the qualitative relationships between branches well, improves computational efficiency, and provides algorithmic support for the visual recognition of automatic pruning robots.


Introduction
With the rise of "Intelligent Agriculture" in China, agricultural robots have gradually become an important part of agricultural production. At present, fruit tree pruning robots are widely used in orchards, but the mainstream pruning robots are still semi-automatic: the robot first captures branch images; experts then devise pruning schemes from those images; finally, an operator remotely controls the robot to carry out the pruning. The degree of automation and intelligence therefore remains low [1][2][3]. How to improve the automation and intelligence of pruning robots has thus become a research focus in agricultural machinery both in China and abroad.
At present, research on automatic pruning robots faces two main problems. The first is the robot's visual recognition algorithm: the robot must acquire images and build a three-dimensional model of the crown, which then serves as the data basis for generating pruning strategies. The second is the algorithm for automatically generating the pruning strategy itself: the robot needs to decide on a pruning plan without manual intervention. For the visual recognition problem, two methods are currently common: target detection based on RGB images and 3D reconstruction based on infrared scanning. Neither can be applied directly to pruning robots. Although RGB image analysis can achieve good accuracy, two-dimensional images cannot fully reflect the morphology of branches, which grow in complex shapes with strong spatial characteristics. As for infrared scanning, the scanner is too bulky to mount on a pruning robot. To date, research on fruit tree pruning at home and abroad has concentrated on simulation and pruning logic [8][9][10][11][12]; the application of computer vision to pruning is still not deep enough. Lightweight image recognition algorithms and miniaturized data acquisition instruments for the pruning field remain under-studied, and many difficulties remain.
In recent years, with the continuous development of computer vision, VR technology, and 3D reconstruction, cameras with depth perception have emerged, and research on 3D reconstruction of fruit trees using depth images has made progress in fruit calibration and visualization [13][14][15][16][17]. Given the practical needs of pruning robots, applying a depth camera to pruning robots is a valuable research direction. Accordingly, the SR4000 depth sensor, based on the TOF (Time of Flight) principle, is used as the data acquisition device, and a tree branch 3D skeleton extraction algorithm based on the SR4000 is proposed. By combining the collected binary images, depth images, and point cloud data, the complex calibration operations of three-dimensional reconstruction are simplified while the three-dimensional spatial information of the branches is preserved, and both the data volume and the computational workload are reduced. Experimental analysis shows that, at the expense of some accuracy, the algorithm can still restore the skeleton features of the crown well, and its accuracy and efficiency meet the practical application requirements of robots.

Experimental Equipment and Parameters
The experiment uses MESA's SR4000 depth sensor, shown in Fig. 1 (left); the device connections are shown in Fig. 1 (right). The depth camera connects to the computer over Fast Ethernet and is powered from a 220 V supply. Its modulation frequency is 30 MHz, its maximum effective shooting range is 5 m, its calibration range is 0.8 m to 5 m, its absolute accuracy is 10 mm, its maximum frame rate is 54 FPS, and its pixel matrix is 176 × 144. The depth measurement principle is the TOF method: an optical pulse is continuously sent to the target, the light returned from the object is received by the sensor, and the round-trip flight time of the pulse gives the target distance [21,22]. The computer runs Windows 10, and MESA's official software, Swissranger, is used to control camera shooting and data acquisition.
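The TOF relation described above reduces to a single formula. A minimal illustration (the speed-of-light constant and the example timing are standard physics, not values from the paper):

```python
# Time-of-flight ranging: distance = (speed of light * round-trip time) / 2.
C = 299_792_458.0  # speed of light in m/s

def tof_distance(round_trip_seconds: float) -> float:
    """Convert a measured optical round-trip time to a target distance in metres."""
    return C * round_trip_seconds / 2.0

# A round trip of about 33.3 ns corresponds to roughly 5 m,
# the SR4000's maximum effective range.
print(round(tof_distance(33.3e-9), 2))
```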

Extraction Method of the Three-dimensional Skeleton from Branches
Considering the practical needs of pruning robots, the output of the visual recognition algorithm should match the input of the pruning decision algorithm. Combining common fruit tree pruning principles with the analysis of empirical trees, the pruning decision algorithm mainly depends on the spatial and morphological characteristics of the branches, that is, on a three-dimensional model of the branches. Traditional pruning principles, however, do not require a quantitatively accurate shape of each branch, but rather a qualitative analysis of branch shape. Extracting the three-dimensional skeleton of a branch therefore amounts to extracting the spatial vector representing that branch, which is sufficient for pruning decisions. What we really need is a graph G(V,E) that reflects the spatial structure of the branches, where V is the set of skeleton key points and E is the adjacency matrix over those key points. The method first finds the key points of the branch skeleton, namely the starting point and the ending point of each branch; for branches growing on the same main branch, their branching points on the main branch must also be found. After that, the connection relations between key points are judged from their regional relations on the binary image, yielding the adjacency matrix E. The algorithm flow chart is shown in Fig. 3. The data collected by the SR4000 are mainly output as a point cloud in Swissranger, which also provides grayscale images, depth images, and confidence images of the target. The point cloud data are in ASC format; each row holds the global coordinates (x,y,z) of one point, giving a matrix of 176 × 144 rows and 3 columns. The third column, the z value, is the depth value d of that point, i.e. d(x,y) = z. A Python script is used to process the output point cloud data in batches.
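The per-pixel mapping d(x,y) = z can be sketched in the spirit of the batch script mentioned above. A minimal version, assuming each ASC line holds one "x y z" triple in row-major pixel order (the exact file layout beyond three columns is our assumption):

```python
def load_depth_matrix(asc_path, rows=144, cols=176):
    """Read an ASC point cloud (one 'x y z' line per pixel) and return the z
    values as a rows x cols depth matrix, assuming row-major pixel order."""
    with open(asc_path) as f:
        z = [float(line.split()[2]) for line in f if line.strip()]
    assert len(z) == rows * cols, "unexpected point count"
    # d(x, y) = z: slice the flat z column into image rows.
    return [z[r * cols:(r + 1) * cols] for r in range(rows)]
```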
The binary image and the depth image can be obtained from the point cloud by the following two mappings.
(1) Mapping of binary images:

D = 255 if d > 0; D = 0 if d = 0

where d is the original depth value of the point and D is the mapped value. D = 0 means no depth (i.e. no object detected). The depth value of every detected point is set to 255, while points with no detected object keep the value 0, yielding a binary image of the target.

(2) Mapping of depth images:

D = 255 · d / d_max

where d is the original depth value of the point and d_max is the maximum depth value over all points. Scaling the depth values up to the grayscale range (0, 255) yields the depth image of the target.
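The two mappings can be sketched together. A minimal Python illustration of both formulas, operating on a depth matrix d(x,y):

```python
def to_binary(d):
    """Formula (1): any detected point (d > 0) maps to 255; the rest stay 0."""
    return [[255 if v > 0 else 0 for v in row] for row in d]

def to_depth_image(d):
    """Formula (2): scale raw depths into the 0..255 grey range by d_max."""
    d_max = max(max(row) for row in d)
    return [[int(255 * v / d_max) for v in row] for row in d]

raw = [[0.0, 1.2], [2.4, 4.8]]
print(to_binary(raw))       # [[0, 255], [255, 255]]
print(to_depth_image(raw))  # [[0, 63], [127, 255]]
```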
In this algorithm, the Harris corner detection algorithm is applied to the binary image, and the detected corners are taken as the initial candidate points. To eliminate the invalid points among these candidates, an improved algorithm based on agglomerative hierarchical clustering, named 2AHC (2-level Aggregation Hierarchy Clustering), and a depth level analysis algorithm are proposed; the skeleton key points are then obtained. To obtain the connection relations between the key points, that is, the adjacency matrix E, a line-covering algorithm is proposed.

Key Algorithms
Harris corner detection only returns points located at corners. Some of these are key points of the branch skeleton, i.e. valid candidate points. However, the complex situations in actual crown images produce a large number of invalid candidate points, of which the two most common types are adjacent candidate points and cross candidate points.

2AHC(2-level Aggregation Hierarchy Clustering)
Adjacent candidate points arise mainly because branches have a certain thickness, so that a corner is detected on both the left and right sides of a branch. In addition, when complex branches cross, many invalid candidate points are generated; for example, two crossing branches generate four irrelevant corner points. To merge such invalid candidate points, the 2AHC algorithm is proposed. The specific steps are as follows: (1) Rough clustering: according to the actual branch images, a rough clustering distance threshold S_1 is selected. A point P(x,y) is taken from the Harris corner detection result set A_1 and put into a cluster set B, and the distance L between P and every other point P_i(x_i,y_i) in A_1 is computed. If L > S_1, P_i remains in A_1; if L ≤ S_1, P_i is moved from A_1 to B. Each traversal of A_1 produces one cluster set B. These steps are repeated until A_1 is empty, yielding n cluster sets B_i (i = 1, 2, …, n).
(2) Fine clustering: according to the actual branch images, a fine clustering distance threshold S_2 is selected, and the following steps are repeated for each cluster set B_i. Compute the distance L_ij between every pair of points P_i(x_i,y_i) and P_j(x_j,y_j) in B_i, and take the two points P_i and P_j for which L_ij is minimal. Then compute the midpoint P_1/2 = ((x_i+x_j)/2, (y_i+y_j)/2) and the quarter points P_1/4 = ((3x_i+x_j)/4, (3y_i+y_j)/4) and P_3/4 = ((x_i+3x_j)/4, (y_i+3y_j)/4), and read their gray values from the binary image.
If the gray values of P_1/2, P_1/4, and P_3/4 are all 255, the two points are merged into the point P_1/2 and the shortest distance L_ij is removed. If the gray value of any of P_1/2, P_1/4, P_3/4 is 0, the two points are considered not to belong to the same merging group; the shortest distance L_ij is removed and the above steps are repeated with the pair at the second shortest distance. The process stops when the shortest remaining distance satisfies L ≥ S_2. This yields n merged key point cluster sets B_i (i = 1, 2, …, n); all points in the B_i are collected into the set A_2, the merged key point set.
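The two clustering levels described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the greedy seeding in rough clustering and the exact merge bookkeeping are our assumptions.

```python
import math

def rough_cluster(points, s1):
    """Level 1 (rough clustering): greedily pull every point within distance
    s1 of a seed point into one cluster, until no points remain."""
    remaining, clusters = list(points), []
    while remaining:
        seed = remaining.pop(0)
        near = [p for p in remaining if math.dist(seed, p) <= s1]
        remaining = [p for p in remaining if math.dist(seed, p) > s1]
        clusters.append([seed] + near)
    return clusters

def fine_merge(cluster, s2, binary):
    """Level 2 (fine clustering): repeatedly merge the closest pair whose
    connecting segment's quarter-, mid- and three-quarter-points all read
    255 in the binary image; stop once no pair closer than s2 can merge."""
    pts = [tuple(p) for p in cluster]
    while len(pts) > 1:
        pairs = sorted((math.dist(a, b), a, b)
                       for i, a in enumerate(pts) for b in pts[i + 1:])
        merged = False
        for dist, a, b in pairs:
            if dist >= s2:
                break
            probes = [(round(a[0] + t * (b[0] - a[0])),
                       round(a[1] + t * (b[1] - a[1])))
                      for t in (0.25, 0.5, 0.75)]
            if all(binary[y][x] == 255 for x, y in probes):
                pts = [p for p in pts if p not in (a, b)]
                pts.append(((a[0] + b[0]) / 2, (a[1] + b[1]) / 2))
                merged = True
                break
        if not merged:
            break
    return pts
```

Applying `rough_cluster` first keeps the pairwise work of `fine_merge` inside small groups, which is the source of the speed-up over plain agglomerative clustering reported later.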

Depth Level Analysis Algorithm
Cross candidate points arise mainly because the complex growth of branches causes them to intersect. Crossing branches generate several corners that are not key points of the branch skeleton but invalid candidates produced by visual factors. The characteristic of such a candidate point is that several intersecting branches pass through its neighborhood, so its depth data show multiple levels: each intersecting branch contributes an additional depth level, whereas a non-intersection point has only two levels, the branch level and the background level. This feature can be exploited by a depth level analysis of the pixel neighborhood through clustering, to judge whether a point is a cross candidate. The specific algorithm is as follows: (1) For each candidate point P(x,y) in the set A_2, find its corresponding position P_D(x_D,y_D) in the depth image: x_D = x, y_D = y. (2) An 11 × 11 neighborhood is selected, and the depth value d(x,y) of each point Q(x,y) in the neighborhood centered on P_D is taken out.
Q(x,y) = {(x,y) | x ∈ (x_D−5, x_D+5), y ∈ (y_D−5, y_D+5), x, y ∈ N}. The neighborhoods of candidate points on the image boundary exceed the coordinate range; to simplify the calculation, the values of all pixels beyond the range are defined as 0. (3) The depth values in the neighborhood are clustered, and the number of resulting depth levels is recorded as N. (4) If N = 2, the point is a key point of the crown skeleton and is saved into the candidate point set A_3. If N > 2, the point is a cross candidate point and is deleted from A_2. After traversing A_2, the skeleton key point set A_3 is obtained.
According to the coordinates of each point P in A_3, the global coordinates O_i(x_i,y_i,z_i) of the corresponding point O in the ASC point cloud data are selected; the set of all O_i is the three-dimensional key point set V of the branch skeleton.
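The neighborhood test can be sketched as follows. The paper does not specify the clustering used to count depth levels, so grouping sorted depth values wherever consecutive values differ by more than a threshold `gap` is our assumption:

```python
def depth_levels(depth, x, y, gap=0.3, radius=5):
    """Count depth levels in the (2*radius+1)^2 neighbourhood of (x, y).
    Out-of-range pixels are treated as 0. The sorted depth values are split
    into a new level wherever consecutive values differ by more than `gap`
    (this grouping rule is our assumption; the paper only names a threshold)."""
    h, w = len(depth), len(depth[0])
    vals = sorted(
        depth[j][i] if 0 <= i < w and 0 <= j < h else 0.0
        for i in range(x - radius, x + radius + 1)
        for j in range(y - radius, y + radius + 1))
    levels = 1
    for a, b in zip(vals, vals[1:]):
        if b - a > gap:
            levels += 1
    return levels

def is_cross_point(depth, x, y, gap=0.3):
    """N > 2 depth levels marks an invalid crossing candidate (step 4)."""
    return depth_levels(depth, x, y, gap) > 2
```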

Line-Covering Algorithm
In the key point set V obtained above, the points are mutually independent and carry no logical relations. We therefore need to judge, for any two key points, whether a branch runs between them, that is, whether the points are connected. To solve this problem, the line-covering algorithm is proposed.
The basic principle of the line-covering method is to draw a line segment between any two points; if more than 80% of the segment lies in the target domain (i.e. is covered by the target domain), the two points are considered connected. The specific steps are as follows: (1) Traverse the key point set A_3 = {P_i} (i = 1, 2, 3, …, n), and for each key point P_i(x_i,y_i), form the line segment I_ij with every other point P_j(x_j,y_j) in A_3, namely y_k = y_i + (x_k − x_i)(y_j − y_i)/(x_j − x_i). For x_k ∈ (x_i, x_j), take x_k with a step length of 1 and substitute it into this formula to find the corresponding y_k. (2) Read the gray value g_k of each pixel (x_k, y_k) in the binary image and compute the average gray value ḡ over all such points (x_k, y_k) on the segment I_ij.
(3) Judge whether ḡ is greater than 80% of 255, i.e. 204. If it is, there is a branch between P_i and P_j, that is, the two points are connected; if it is not, the two points are not connected. The result is saved into the adjacency matrix E.
The key point set V and the adjacency matrix E together form the graph G(V,E), which is an approximate skeleton model of the branches; every entry of the adjacency matrix with value 1 corresponds to a spatial vector (P_i, P_j) of a branch.
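The line-covering test and the resulting adjacency matrix can be sketched as below. Instead of stepping only along x as in the paper, this sketch steps along the longer axis so that near-vertical segments are also sampled; that adaptation, and all names, are ours:

```python
def connected(p_i, p_j, binary, threshold=0.8):
    """Line-covering test: sample the segment between two key points pixel by
    pixel and report a connection if the average grey value reaches
    `threshold` of 255 (i.e. at least 80% of samples lie on the target)."""
    (x1, y1), (x2, y2) = p_i, p_j
    steps = max(abs(x2 - x1), abs(y2 - y1))
    if steps == 0:
        return True
    samples = [binary[round(y1 + t * (y2 - y1) / steps)]
                     [round(x1 + t * (x2 - x1) / steps)]
               for t in range(steps + 1)]
    return sum(samples) / len(samples) >= threshold * 255

def adjacency(points, binary):
    """Build the adjacency matrix E over the key point set."""
    n = len(points)
    return [[1 if i != j and connected(points[i], points[j], binary) else 0
             for j in range(n)] for i in range(n)]
```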

Image acquisition of fruit tree branches under general conditions
Considering the usage requirements, acquisition accuracy, and effective shooting range of the SR4000, the data acquisition site was the Beijing Bakou Fruit Tree Experimental Park, and spindle-shaped apple trees in the initial fruit period were selected as the experimental objects. Since the SR4000 is easily affected by light, data acquisition was carried out at 19:00. During measurement, one person held the SR4000 to simulate the pruning robot and operated Swissranger for data acquisition. A total of 15 fruit trees with regular shapes were selected; each tree was photographed from 8 angles, giving 120 data sets. Sample binary images and corresponding depth images are shown in Fig. 4 (rotated to an angle suitable for viewing).

Extraction of the 3D Skeleton
Since the SR4000 is easily affected by illumination, the obtained binary and depth images contain noise; Gaussian filtering is therefore applied to the binary image before skeleton extraction. The set A_1 obtained by Harris corner detection is shown in Fig. 5 (top), the set A_2 obtained by 2AHC in Fig. 5 (middle), and the set A_3 obtained by the depth level analysis in Fig. 5 (bottom).
The adjacency matrix E is then obtained by the line-covering algorithm. According to the coordinates of each point P in the set A_3, the global coordinates O(x,y,z) of the 3D skeleton key points are obtained, giving the key point set V. The branch 3D skeleton model G(V,E) established in global coordinates is shown in Fig. 6. Although its accuracy is somewhat lower than that of the infrared scanning method, it is still sufficient to reflect the qualitative relationships between crown branches and meets the practical application requirements of pruning robots.

Comprehensive analysis
Considering the practical application requirements of pruning robots, and comparing this algorithm with the infrared scanning method and RGB image target detection, its advantages and disadvantages are as follows. Disadvantage: its accuracy is lower than that of the other two methods. Advantages: 3D information is retained, so the spatial characteristics of branches can be fully reflected; the equipment is compact (about 7 cm × 7 cm × 8 cm); the data volume is small; the processing speed is fast; and the measurement operation is simple, involving no complicated steps such as calibration or multi-point measurement.

Analysis of Key Algorithms
(1) Analysis of 2AHC
According to the characteristics of adjacent candidate points, the candidates can be pre-grouped by distance before merging: key points close to each other are processed within a group, while distant key points across groups are not, avoiding a large number of meaningless calculations. The 120 data sets were processed with both algorithms and divided into 15 groups by source tree; the average time costs are shown in Table 1 (unit: second). The overall average time is 1.034 s for traditional agglomerative hierarchical clustering and 0.280 s for 2AHC, an average processing speed increase of 369%.
To study the performance of the algorithm, one data set was extracted from each of the 15 fruit trees to form the sample set, and the two algorithms were compared against manual analysis. The results are given in Table 2 (adjacent candidates detected, P) and Table 3 (erroneous detection points, Q). From these tables, the detection rate is R = (P_i − Q_i)/P_s and the error rate is W = Q_i/P_i, where P_i is the number of adjacent candidates detected, Q_i the number of erroneous detections, and P_s the count obtained by manual analysis. The average detection rates of the two methods are R1 = 85.20% and R2 = 84.69%, and the average error rates are W1 = 6.55% and W2 = 5.03%, which meet the usage requirements. On a few fruit tree images with special shapes, the detection rate of 2AHC is slightly lower than that of the traditional agglomerative hierarchical clustering algorithm, but its error rate is also lower, so 2AHC has an overall performance advantage over the traditional algorithm.
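The two rates reduce to simple ratios. A small sketch with illustrative numbers (not taken from the paper's tables), where reading P_s as the manually counted ground truth is our interpretation:

```python
def detection_rate(p, q, p_s):
    """R = (P - Q) / P_s: correct detections over the manual ground-truth count."""
    return (p - q) / p_s

def error_rate(p, q):
    """W = Q / P: erroneous detections over all detections."""
    return q / p

# Illustrative numbers only (not from the paper's tables):
print(round(detection_rate(90, 5, 100), 4))  # 0.85
print(round(error_rate(90, 5), 4))
```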
(2) Depth level analysis algorithm. The key difference between the depth level analysis algorithm and 2AHC is that its input size is fixed at 121 (the 11 × 11 neighborhood), so its single execution time is relatively constant: 1000 executions take 117.6 s, i.e. about 0.118 s per execution. Its performance was tested on the sample data set extracted in the previous section and compared with manual analysis; the results are shown in Table 4 (cross candidates detected) and Table 5 (detection rate). The algorithm effectively detects cross candidate points in most data sets, with a total detection rate of 64%. The error mainly comes from the accuracy of the SR4000, whose depth perception accuracy is lower than that of an infrared scanner. When the crossing branches of an actual fruit tree lie close together, the differences between depth values in the point cloud are small, which calls for a low detection threshold. At the same time, the SR4000 is easily affected by illumination, and background image noise during shooting interferes with the depth features of the image; improving noise immunity calls for a high detection threshold. Faced with this contradiction, the algorithm chooses the latter, which improves noise immunity but makes it difficult to detect cross candidate points generated by closely spaced crossing branches. How to better resolve this contradiction and improve the detection rate of cross candidate points requires further research.

Overall error analysis
In practical applications, pruning robots do not need a high-precision quantitative analysis of the branches; pruning strategies focus on the relationships between branches, i.e. on qualitative analysis. Based on this characteristic, the following characteristic values are defined to evaluate how well the method restores the branch 3D skeleton:
(1) The number of vectors per branch K, with average K̄ = (1/n) Σ K_i over all branches. K means that a branch is spliced from K vectors in the 3D skeleton, and K̄ is the average of K over all branches in a data set. The smaller K̄, the more accurately a single branch is restored.
(2) The length restoration rate L/L_0, averaged over all branches, where L is the total length of the space vectors representing a branch and L_0 is the true length of the branch in the original image. The larger the average, the better the branch lengths are restored.
(3) The direction restoration rate α/α_0, averaged over all branches, where α is the angle between the space vector representing a branch and the Y axis, and α_0 is the true angle between the branch and the Y axis in the original image. The larger the average, the better the branch directions are restored.
(4) The number of detected branches T, and the branch detection rate H = T/T_0, where T is the number of branches detected in each data set and T_0 is the true number of branches of the tree. The larger H, the better the detection effect.
Based on these characteristic values, the sample data set extracted in the previous section is used for testing; the results are shown in Table 6. Because sub-nodes inevitably exist, most branches with sub-branches are connected by multiple vectors, and K̄ is generally around 2~3. Gaussian filtering before Harris corner detection blurs the edges, and 2AHC merges adjacent points, so the restored lengths are smaller than the actual lengths, with an average length restoration rate of about 80%. The method restores branch growth directions well, with an average direction restoration rate of up to 90%. Limited by the accuracy of the SR4000, a few short or thin branches are lost in filtering and cannot be detected; overall, H exceeds 80%, and some excellent samples reach 100%.
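The four characteristic values can be computed together. A sketch following the characteristic values defined above (the tuple layout and function names are our own convention):

```python
def skeleton_metrics(branches, t_detected, t_true):
    """Per-data-set restoration metrics. Each branch is given as a tuple
    (k, length, true_length, angle, true_angle), where k is the number of
    vectors splicing the branch; returns (K-bar, average length restoration
    rate, average direction restoration rate, branch detection rate H)."""
    n = len(branches)
    k_bar = sum(b[0] for b in branches) / n          # (1) average vector count
    len_rate = sum(b[1] / b[2] for b in branches) / n  # (2) mean L / L0
    dir_rate = sum(b[3] / b[4] for b in branches) / n  # (3) mean alpha / alpha0
    detect_rate = t_detected / t_true                  # (4) H = T / T0
    return k_bar, len_rate, dir_rate, detect_rate
```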

Conclusion
(1) A branch 3D skeleton extraction method based on the SR4000 is proposed for the visual recognition of pruning robots. The method combines grayscale images, depth images, and point cloud data to find the key points of the branch skeleton and their adjacency relations, thus generating a branch 3D skeleton model and obtaining the space vectors representing the branches. The method retains the 3D information of the branches and does not require large infrared scanners, calibration rods, or similar equipment; the measuring instrument is simple and portable and meets the actual needs of pruning robots.
(2) A 3D skeleton model of the branches of apple trees in the initial fruit stage was successfully constructed. According to the experimental results, the average computational efficiency of the improved 2AHC is increased by 369%, with a detection rate of 84.69% and an error rate of 5.03%; the average detection rate of the depth level analysis algorithm is 64%; and the overall 3D skeleton restoration effect is good. The algorithm therefore reflects the qualitative relationships between branches well, improves computational efficiency, and provides algorithmic support for the visual recognition of automatic pruning robots.
(3) The algorithm is suitable for tree species with small crowns and relatively simple structures. The restoration effect for large trees or trees with dense branches and complex structures is not yet ideal, and further research is needed.