Robust solution of coordinate transformation parameters with a high breakdown point

Because the M estimation method may lead to poor robustness or even failure owing to excessive outliers, a robust algorithm with a high breakdown point was proposed and applied to the estimation of coordinate transformation parameters. First, the sampling method was used to calculate multiple sets of model parameters, and some sampling results were sifted according to posterior information. Then, the samples were sorted according to their frequency of occurrence in the sifted results, and the F-test was adopted to screen and reserve valid information. Finally, reliable initial parameter values were computed from the valid information, and the final parameters were obtained by the Institute of Geodesy and Geophysics III scheme. The Monte Carlo method was adopted for the simulation test, and a case analysis was chosen for verification. The results show that the proposed method identifies and processes outliers more accurately than the methods of Rousseeuw and Hubert (2011 Wires Data Min. Knowl. 1 73–79) and Tao et al (2016 Acta Geod. Cartogr. Sin. 45 297–301). Even when the proportion of outliers exceeded 50%, the proposed algorithm maintained strong robustness, i.e. it has a high breakdown point.


Introduction
Coordinate transformation has been widely used in the fields of geodesy and engineering measurement. Common models include four-parameter and seven-parameter models based on coordinate-axis rotation [1,2], quaternion models based on space-vector rotation [3,4], and models based on Rodrigues and antisymmetric matrices [5,6]. The main differences between them lie in how the rotation matrix is described. The four-parameter model is relatively simple in form but is limited by the rotation angle; therefore, it has poor applicability. The linear model produces a larger model error as the rotation angle increases. Zeng and Tao [7] proposed a nonlinear model for three-dimensional (3D) coordinate transformation that treats the linearization error as a function-model error and expands the application range of the four-parameter model. The Gauss-Newton iterative method is typically used to solve this type of nonlinear problem. When there are errors in the common points of the two epochs, the total least squares (TLS) method [8,9] can account for the error of the coefficient matrix and obtain a better parameter estimate. Wang et al [10] used the mixed LS-TLS algorithm to calculate the spatial transformation parameters, which can weaken the impact of gross errors; when there is no gross error, LS-TLS performs the same as LS. Centralization of the coordinates can overcome the strong correlation between parameters and reduce computational complexity [11]. Liu and Li [12] applied best-fit LS and radial basis function neural networks to improve the accuracy of coordinate transformation. Chang et al [13] developed an optimal solution of the coordinate transformation parameters for multiple frames. When the data contain no outliers, the parameters estimated by the different models are the same. In practice, measurement errors are unavoidable, and displacement and deformation of the points occasionally occur. To make the model more practical, robust estimation of the transformation parameters deserves further attention.
Setting system and gross errors aside, and assuming that the errors follow a normal distribution, the LS method can acquire the optimal estimate of the parameters. However, LS is very sensitive to outliers and cannot obtain reliable results when the relative positions of the common points change. Large errors produced in the measurement process and displacements of the common points between the two epochs can both be treated as outliers.
There are two schemes for handling gross errors. One treats them within the function model, as gross-error detection [14-16]; the other treats them within the stochastic model, as robust estimation [17,18]. When a common point is deformed in only one coordinate component, the gross-error detection method eliminates the point entirely, which wastes data, whereas the robust estimation method [19,20] only reduces the weight of the single contaminated observation. Comparatively speaking, the latter is more effective and utilizes the data more efficiently.
Reliable initial values of the parameters can guarantee the robustness of a robust estimation algorithm and help the iterative process converge faster. Qiu [21] used the minimum sum of the absolute values of the residuals as the objective function and established a weighted iterative method based on linear programming; the initial value of the residual is reliable, the location of the gross error is accurate, and the method has strong robustness. The standard deviation of the unit weight plays an important role in robust estimation because it affects the determination of the weights of the observations. The median of all absolute deviations from the median (MAD) was proposed by Rousseeuw and Hubert [22] to estimate the scale factor; the breakdown value of the MAD estimator is 50%. The breakdown point of the median parameter method [23,24] reaches 50% in single-parameter estimation. However, it drops sharply with an increase in the number of parameters. Tao et al [25] applied the median parameter method to obtain reliable parameters by sampling and used them as the initial values in the equivalent-weight robust estimation method, which improves the robustness of the algorithm. However, their process of acquiring the initial parameter values cannot fully utilize the valid information in the data. Statistics can be constructed from residuals and posterior information, and, combined with a statistical hypothesis test, outliers can be reasonably down-weighted, but only when the number of gross errors is small [26]. Qu et al [27] proposed an automatic selection strategy that combines the K-means clustering algorithm and the quasi-accurate detection [14] method to obtain quasi-accurate observations and calculate the true errors. They estimated the variance factor and fixed the weights according to the true errors, which can accurately estimate the model parameters and has a high breakdown point.
Wang et al [28,29] combined Jackknife resampling theory and the least-norm algorithm with the weighted total least squares (WTLS) algorithm to identify and detect outliers; these methods can effectively remove the interference caused by gross errors.
The robustness of M estimation is affected by the magnitude, quantity, and location of the gross errors. When there are gross errors, reliable initial parameter values can be obtained by introducing a statistical hypothesis test and screening the valid information in the data, which improves algorithm efficiency and robustness. When a large amount of the data is contaminated, only the small amount of good data can be used to obtain a robust estimate of the parameters. Sifting as much valid information as possible is therefore the key to robust estimation [30,31]. The median parameter method is a robust algorithm with a high breakdown point, but it only emphasizes the strong robustness of the estimation and does not fully utilize the valid information in the data; therefore, it is inefficient.
Robust statistics aims to detect outliers by searching for the model fitted by the majority of the data. Inspired by Qu et al [27], this study proposes a new strategy. First, multiple sets of model parameters are calculated by the sampling method, reliable parameter results are selected according to the posterior information, the occurrences of the samples are counted, and the samples are sorted according to their frequency; the higher the frequency, the more reliable the sample. Then, from high to low frequency, all good samples are sifted with the F-test, and LS estimation is performed on the valid information to obtain reliable initial parameter values. The posterior standard deviation of the unit weight is taken as the standard deviation of the unit weight for the robust estimation. Finally, the observations are weighted by the Institute of Geodesy and Geophysics (IGG) III scheme, and the transformation parameters are calculated to improve the accuracy and reliability of the coordinate transformation.

Seven-parameter coordinate transformation model
The seven-parameter coordinate transformation model [2] has no special requirements for the initial parameter values and no limit on the rotation angle; therefore, its application is flexible and convenient. Assume that the control points in the original coordinate system rotate counterclockwise about the Z, Y, and X axes by angles γ, β, and α, respectively. The common coordinate transformation model is expressed as

X_T = Δ + kR(α, β, γ)X,  (1)

where X and X_T are the coordinate vectors of a common point in the original and target systems, Δ = (ΔX, ΔY, ΔZ)^T is the translation vector, k is the scale factor, and R is the rotation matrix. The Gauss-Newton iterative method is adopted to calculate the parameters. Expanding equation (1) as a Taylor series at the approximate parameter values and retaining the first-order terms, we get

X_T = Δ_0 + k_0R_0X + dΔ + R_0X dk + k_0(dR)X,  (2)

where R_0 is the rotation matrix at the approximate parameter values and dR is the complete differential form of the rotation matrix.
From equation (2), the error equation becomes

V = Ax − l,  (3)

where V is the coordinate correction (the residual of the observations), A is the coefficient matrix, x = (dΔX, dΔY, dΔZ, dk, dα, dβ, dγ)^T is the parameter correction, and l = X_T − Δ_0 − k_0R_0X is the misclosure vector. For any common point, A can be expressed as

A = [I  R_0X  k_0(∂R/∂α)_0X  k_0(∂R/∂β)_0X  k_0(∂R/∂γ)_0X],  (4)

where I is the 3 × 3 unit matrix and the partial derivatives are evaluated at the approximate parameter values.
x can be calculated as

x̂ = (A^T P A)⁻¹ A^T P l,  (5)

where P is the weight matrix of the observations.
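The Gauss-Newton solution described above can be sketched as follows. For brevity, a central-difference numerical Jacobian replaces the analytic design matrix; the rotation order R_x R_y R_z and all function names are assumptions of this sketch, not the authors' code.

```python
import numpy as np

def rotation(a, b, g):
    """Rotation matrix for counterclockwise rotations about X, Y, Z by a, b, g (rad).
    The multiplication order R_x @ R_y @ R_z is an assumption of this sketch."""
    ca, sa, cb, sb, cg, sg = np.cos(a), np.sin(a), np.cos(b), np.sin(b), np.cos(g), np.sin(g)
    Rx = np.array([[1, 0, 0], [0, ca, sa], [0, -sa, ca]])
    Ry = np.array([[cb, 0, -sb], [0, 1, 0], [sb, 0, cb]])
    Rz = np.array([[cg, sg, 0], [-sg, cg, 0], [0, 0, 1]])
    return Rx @ Ry @ Rz

def transform(params, src):
    """Apply X_T = Delta + k * R * X to every row of src (n x 3)."""
    delta, k, angles = params[:3], params[3], params[4:]
    return delta + k * (rotation(*angles) @ src.T).T

def gauss_newton(src, dst, iters=20):
    """Estimate (dX, dY, dZ, k, alpha, beta, gamma) by Gauss-Newton iteration."""
    x = np.array([0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0])   # approximate initial values
    for _ in range(iters):
        l = (dst - transform(x, src)).ravel()            # misclosure vector
        A = np.empty((l.size, 7))
        for j in range(7):                               # central-difference Jacobian
            h = np.zeros(7)
            h[j] = 1e-7
            A[:, j] = (transform(x + h, src) - transform(x - h, src)).ravel() / 2e-7
        x = x + np.linalg.lstsq(A, l, rcond=None)[0]     # LS correction, unit weights
    return x
```

With noise-free common points, the iteration converges to the generating parameters; the weight matrix P of equation (5) is taken as the identity here.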

Robust estimation based on equivalent weight function
The influence of outliers on the parameter estimates is weakened through weight reduction by robust estimation. The robust solution for x is

x̂ = (A^T P̄ A)⁻¹ A^T P̄ l,  (6)

where P̄ = diag(p̄_1, p̄_2, ..., p̄_n) is the equivalent weight matrix, p̄_i is the equivalent weight of the ith observation, and n is the number of observations. The solution for x in the (j + 1)th iteration is

x̂^(j+1) = (A^T P̄^(j) A)⁻¹ A^T P̄^(j) l.  (7)

Common equivalent weight functions include Huber, Hampel, Danish, Tukey, and IGG III [32]. The IGG III weight function divides the observations into three categories, which is theoretically more rigorous. The equivalent weight of the ith observation is

p̄_i = p_i, for |ṽ_i| ≤ k_1;
p̄_i = p_i (k_1/|ṽ_i|) [(k_2 − |ṽ_i|)/(k_2 − k_1)]², for k_1 < |ṽ_i| ≤ k_2;
p̄_i = 0, for |ṽ_i| > k_2,  (8)

where p_i is the prior weight, ṽ_i is the standardized residual, and k_1 and k_2 are the critical values. The standard deviation of the unit weight can be expressed as

σ̂_0 = √(V^T P V / (n − t)),  (9)

where t denotes the number of parameters. The magnitude of σ̂_0 is sensitive to outliers, and it is the basis for standardizing each residual during outlier recognition.
To reduce the interference of outliers, the scale factor can instead be estimated via the MAD [22,33]:

σ̂_0 = λ · med_i |v_i − med_j(v_j)|,  (10)

where med(·) represents the median operator and λ = 1.4826 is the adjustment factor. The procedure for parameter calculation based on the equivalent weight function [34] is as follows. Step (1): The initial values of the residual V, the parameters x, and the scale factor σ̂_0 are calculated using equations (3)-(10).
Step (2): The standardized residual Ṽ is obtained from V and σ̂_0, and the equivalent weight matrix P̄ is obtained from equation (8). Steps (1) and (2) are then iterated until the parameter corrections converge.
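The MAD scale estimate and the IGG III weighting above can be sketched as follows (names are illustrative; k_1 = 1.0 and k_2 = 3.0 match the values used later in the simulation test):

```python
import numpy as np

def mad_sigma(v, lam=1.4826):
    """Robust scale factor: lambda * med_i |v_i - med_j(v_j)|  (equation (10))."""
    return lam * np.median(np.abs(v - np.median(v)))

def igg3_weights(v, sigma0, p=None, k1=1.0, k2=3.0):
    """IGG III equivalent weights (equation (8)) for a residual vector v.
    Observations fall into three bands of the standardized residual."""
    p = np.ones_like(v, dtype=float) if p is None else np.asarray(p, dtype=float)
    u = np.abs(v) / sigma0                       # standardized residuals
    w = np.ones_like(u)
    mid = (u > k1) & (u <= k2)                   # down-weighted band
    w[mid] = (k1 / u[mid]) * ((k2 - u[mid]) / (k2 - k1)) ** 2
    w[u > k2] = 0.0                              # rejected band
    return p * w
```

One robust iteration then alternates between solving for the parameters with the current weights, recomputing the residuals, and updating the weights until the estimates stabilize.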

Parameter estimation algorithm based on valid information automatic sifting
The median parameter method only uses the ordering information of the observations and emphasizes the robustness of the algorithm; therefore, its estimation efficiency is not high. For multiple-parameter estimation, the median parameter method fails when the proportion of outliers exceeds 50%.
To ensure that the algorithm has strong robustness and can fully utilize the valid information in the data, a robust estimation algorithm with a high breakdown point is proposed. Compared to the median parameter method, the new algorithm makes full use of the posterior information of the sample set. The credibility of all the samples is analyzed through sampling, sorting, and statistical analysis. Valid information is fully selected using the hypothesis test and applied to estimate the reliable initial parameter values and standard deviation of the unit weight. Finally, an equivalent weight function is adopted to reasonably reduce the weight of the outliers and estimate the final model parameters.
The main steps of the algorithm are as follows.

Step (1): r samples (the minimum number required to calculate the model parameters) are chosen from the m samples to compute the model parameters, yielding ω = C_m^r groups of results. The sample sets are marked as (S_1, S_2, ..., S_ω), and the parameter estimates are collected as

X̂ = [x̂_1 x̂_2 ··· x̂_ω],  (11)

where each column x̂_j = [x_1j x_2j ··· x_tj]^T is an estimated parameter vector, t is the number of parameters, and the ω solutions of the ith parameter form the ith row of the matrix. Based on LS estimation, the posterior standard deviation of the unit weight σ̂_0i (i ∈ [1, 2, ..., ω]) is calculated according to equation (9). The sample set of each group is marked so that σ̂_0i and S_i correspond uniquely.
Step (2): The posterior standard deviations of the unit weight are sorted from small to large and recorded as (σ_1, σ_2, ..., σ_ω). The difference between adjacent standard deviations of the unit weight is

dσ_k = σ_(k+1) − σ_k, k ∈ [1, 2, ..., ω − 1].

Considering that a sample with small errors corresponds to high posterior precision, the maximum value max(dσ_k′), k′ ∈ [1, 2, ..., int[(ω − 1)/2]], is found in the first half. We truncate at this maximum and reserve the part with high accuracy.
Step (3): The number of occurrences of each sample in the reserved data is counted and sorted from the largest to the smallest. The more frequent the sample, the more reliable it is.
Step (4): The samples used for calculating the initial model parameter values are determined. Initially, r high-frequency samples are selected according to the sorting result to compute the parameters, and the posterior variance is expressed as

σ̂_f² = V_f^T V_f / (3r − t),  (12)

where each sample (common point) contributes three observation equations. According to the sample ordering, the (r + 1)th sample is then added on this basis, that is, r + 1 samples are used to calculate the parameters. The posterior variance is

σ̂_l² = V_l^T V_l / (3(r + 1) − t),  (13)

where the weight matrices are all unit matrices. The F-test is applied to determine whether the added (r + 1)th sample contains gross errors. The null hypothesis H_0: σ_f² = σ_l² states that the (r + 1)th sample contains no gross error; the alternative hypothesis H_1: σ_f² ≠ σ_l² states that it does. The statistic is constructed as

F = σ̂_l² / σ̂_f² ∼ F(f_l, f_f),  (14)

with degrees of freedom f_l = 3(r + 1) − t and f_f = 3r − t. The significance level α is selected; if F < F_(1−α/2)(f_l, f_f), the original hypothesis H_0 is accepted, which indicates that the added sample contains no gross error. The posterior variance of the r + 1 samples is computed according to equation (13), and the sample addition is continued.
Step (4) is repeated until the test fails: when the alternative hypothesis H_1 is accepted, the sample addition is stopped, and only the samples reserved up to that point are retained.
Step (5): LS estimation is applied to all samples that pass the test, and the resulting model parameters are used as the initial values of the robust estimation. The standard deviation of the unit weight is calculated using equation (9), and the weights of all observations are determined using the IGG III weight function to solve for the final parameters.
In the proposed algorithm, the reliability of the samples is reasonably ranked by means of sampling, sorting, and statistics. This is helpful for the effective implementation of the F-test and the identification of outliers. During the calculation of the initial parameter values, the valid information in the samples is fully utilized to guarantee the robustness and efficiency of the proposed algorithm.
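As a minimal illustration of steps (1)-(5), the sketch below applies the same sampling, sorting, truncation, frequency-counting, and F-test pipeline to a two-parameter line fit instead of the full seven-parameter transformation. All names are illustrative, a one-sided F-test variant is used for brevity, and SciPy supplies the F quantile; this is a sketch of the strategy, not the authors' implementation.

```python
import numpy as np
from itertools import combinations
from scipy.stats import f as f_dist

def fit_line(idx, xs, ys):
    """LS fit of y = a*x + b on the selected observations; returns (params, sigma0^2)."""
    A = np.c_[xs[idx], np.ones(len(idx))]
    x, res, *_ = np.linalg.lstsq(A, ys[idx], rcond=None)
    dof = len(idx) - 2
    return x, (res[0] / dof if dof > 0 and res.size else np.inf)

def high_breakdown_line(xs, ys, r=3, alpha=0.001):
    m = len(xs)
    # Step 1: every subset of r observations with its posterior sigma0,
    # Step 2: sorted from small to large
    subs = sorted((fit_line(np.array(c), xs, ys)[1], c)
                  for c in combinations(range(m), r))
    sig = np.sqrt([s for s, _ in subs])
    half = (len(subs) - 1) // 2                          # truncate at the largest
    cut = int(np.argmax(np.diff(sig[:half + 1]))) + 1    # gap in the first half
    # Step 3: count how often each observation occurs in the reserved subsets
    freq = np.bincount([i for _, c in subs[:cut] for i in c], minlength=m)
    order = np.argsort(-freq)
    # Step 4: grow the accepted set, F-testing each newly added observation
    acc = list(order[:r])
    s2_old = fit_line(np.array(acc), xs, ys)[1]
    for i in order[r:]:
        s2_new = fit_line(np.array(acc + [int(i)]), xs, ys)[1]
        if s2_new / s2_old < f_dist.ppf(1 - alpha, len(acc) - 1, len(acc) - 2):
            acc.append(int(i))
            s2_old = s2_new
        else:
            break                                        # gross error detected
    # Step 5: final LS on all accepted observations -> robust initial values
    return fit_line(np.array(acc), xs, ys)[0]
```

Even with 40% of the observations grossly contaminated, the accepted set is built from the high-frequency (majority-consistent) observations, so the fit stays close to the uncontaminated model.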

Simulation test
The center of the instrument was considered the origin of the coordinate system. Twenty points well-distributed in space were simulated and measured by the total station and used as the first group of data; the even spatial distribution avoids non-convergence caused by collinearity or poor geometric conditions of the selected samples. The scale of the first group of data remained unchanged. First, all the points were translated 30, 50, and 80 m along the X, Y, and Z directions, respectively. Then, all the points were rotated 30°, 45°, and 60° around the X, Y, and Z axes, respectively. Finally, random errors were added to the coordinate components of all points according to a horizontal-angle precision of ±0.5 s, a vertical-angle precision of ±0.7 s, and a ranging precision of ±(0.6 mm + 1 × 10⁻⁶ D) to obtain the second group of data.
LS estimation was used to process the above two groups of data. The Monte Carlo method was adopted to run 1000 tests, and the posterior standard deviation of the unit weight ranged from 0.262 to 0.513 mm. Taking a significance level of ψ = 0.001 and a test power of η = 0.8, the lower bound of detectable gross error ranged from 1.122 to 2.325 mm in the experiments.
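The lower bound above was obtained experimentally. As a cross-check, the classical first-order approximation δ₀ = (u_(1−ψ/2) + u_η)·σ̂₀ gives similar values; the formula and function name below are assumptions of this sketch, not the authors' derivation.

```python
from statistics import NormalDist

def detectable_gross_error(sigma0, psi=0.001, power=0.8):
    """Classical minimal detectable bias: (u_{1-psi/2} + u_power) * sigma0."""
    nd = NormalDist()
    return (nd.inv_cdf(1 - psi / 2) + nd.inv_cdf(power)) * sigma0

# For the simulated range of sigma0 (0.262-0.513 mm):
low = detectable_gross_error(0.262)   # about 1.08 mm
high = detectable_gross_error(0.513)  # about 2.12 mm
```

These approximate bounds bracket the same order of magnitude as the experimentally determined range of 1.122-2.325 mm.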
Fifteen points were randomly selected from the second group of data, and errors of 2.000-5.000 mm (including small errors and gross errors) were added to their coordinate components in random directions and positions for testing. The following five schemes were designed to generate simulation data, and each scheme was tested 1000 times by the Monte Carlo method.

Scheme 1: The coordinate components contain only random errors; the proportion of outliers is 0.00%.
Scheme 2: The coordinate components were mixed with 10 outliers; the proportion of outliers is 16.67%.
Scheme 3: The coordinate components were mixed with 20 outliers; the proportion of outliers is 33.33%.
Scheme 4: The coordinate components were mixed with 30 outliers; the proportion of outliers is 50.00%.
Scheme 5: The coordinate components were mixed with 40 outliers; the proportion of outliers is 66.67%.
The root mean square errors (RMSEs) of the parameter estimates of LS, Rousseeuw and Hubert [22], Tao et al [25], and the proposed algorithm are shown in table 2. In the experiment, the IGG III equivalent weight function was used for all methods, and k_1 and k_2 were set to 1.0 and 3.0, respectively.
From table 2, the following results can be obtained.
The result of Scheme 1 shows that the optimal estimate of the parameters can be acquired by LS estimation when the observations contain only random errors. The results of the four methods are basically equivalent, and the differences between the sets of parameters are extremely small, indicating that the parameter estimates are very close to the theoretical values. The RMSE of k was less than 0.5 × 10⁻⁶. The RMSEs of the translation parameters in the X and Y directions were within 0.060 mm, while that in the Z direction was larger (0.088-0.207 mm); the precision of the coordinate transformation in the Z direction is relatively low owing to the lower accuracy of the vertical angle, which accords with the actual situation. Additionally, the RMSEs of the rotation parameters were between 0.13 and 0.22 s.
The results of Schemes 2 and 3 show that the RMSE of each parameter gradually increased after gross errors were added. The parameter results of Tao et al [25] are better than those of Rousseeuw and Hubert [22], which shows that the former method can effectively resist the interference of outliers and is more robust than the latter when the proportion of outliers is close to or less than 35%. When ten outliers were mixed in the data, the results obtained by the proposed algorithm were consistent with those of Scheme 1, whereas the RMSEs of the parameters of Rousseeuw and Hubert [22] and Tao et al [25] increased significantly, which illustrates that the parameters of the proposed algorithm are the least affected by outliers. When the number of outliers increased to 20, the RMSEs of the translation parameters of Rousseeuw and Hubert [22] and Tao et al [25] were close to 0.350 mm, whereas those of the proposed algorithm were less than 0.300 mm, and the RMSEs of the rotation parameters were within 0.50 s.
The results of Schemes 4 and 5 show that the method of Tao et al [25] is invalid, with the worst results, when the proportion of outliers reaches or exceeds 50%. The results of Rousseeuw and Hubert [22] are slightly better than those of Tao et al [25], but the RMSE of k is approximately 5 × 10⁻⁶, ten times that of Scheme 1, so robustness is lost. However, the proposed algorithm could still effectively resist the interference of outliers when the proportion of outliers was 50%: the RMSEs of the translation parameters were less than 0.500 mm, and those of the rotation parameters were within 1.00 s. When the proportion of outliers exceeded 50%, the increase in the RMSEs of the parameters of the proposed algorithm remained small, which demonstrates its strong robustness.
Taking the posterior standard deviation of the unit weight as the internal coincidence accuracy of the coordinate transformation, the corresponding σ̂_0 and the weighted mean κ of the weights of all observations for the four methods are shown in table 3.
As shown in table 3, for Scheme 1, the σ̂_0 of LS reaches 0.249 mm, which represents the actual accuracy of the coordinate transformation of the current observations. Because the random errors differ in size, the three robust methods still down-weight some observations with relatively large random errors when processing data without gross errors, yielding a nominally higher accuracy than LS; their parameter estimates are basically unaffected, however, and reliable parameters can still be obtained. The results of Scheme 2 show that the posterior accuracy of the three robust methods is less than or close to the actual accuracy when ten gross errors are mixed in the data, which indicates that all three robust algorithms can handle outliers effectively when their number is small. When the proportion of outliers exceeds 30%, the results of Schemes 3-5 show that the tolerance of the three robust methods gradually decreases as the number of outliers increases. When the proportion of outliers reaches or exceeds 50%, the posterior accuracy of Rousseeuw and Hubert [22] and Tao et al [25] far exceeds the actual accuracy, indicating that the two methods fail. In contrast, the proposed algorithm has the highest accuracy, kept within 1.000 mm, showing stronger robustness. LS does not reduce the weight of observations during the adjustment, so its κ is always equal to 1.000. The difference in κ between Rousseeuw and Hubert [22] and Tao et al [25] is small, indicating that the weight-reduction effects of the two methods are basically consistent. In contrast, the κ of the proposed method is the smallest in all schemes and decreases as the number of outliers increases, which is consistent with the actual situation; the degree of weight reduction grows with the number of outliers.
At the same time, this verifies the strong robustness of the proposed method. When 10 and 20 gross errors were mixed in the data, figure 2 shows that the residuals of the gross errors for the three methods are close to 5.000 mm, which illustrates that the gross errors are well reflected in the residuals. Table 4 shows that the τ_1 and τ_2 of the three algorithms are equal to 1.0, which indicates that the gross errors can be accurately located and identified.
When the number of mixed gross errors reached 30 and 40, the maximum residuals of Rousseeuw and Hubert [22] and Tao et al [25] were close to 10.000 mm, indicating that the gross errors were transferred and that the residuals of the observations were irregular and inconsistent with the actual situation. The τ_1 and τ_2 of Rousseeuw and Hubert [22] and Tao et al [25] are equal to 0.0, which illustrates that the two methods can neither recognize nor locate the gross errors. When the number of gross errors increased to 30, figure 2(c) shows that the proposed algorithm could still effectively identify and deal with the gross errors: both its τ_1 and τ_2 are 1.0, and the distribution of the residuals is regular. The residuals of the valid observations are close to 0.000 mm, and those of the observations with gross errors are close to 5.000 mm, which is consistent with the added gross errors. When the proportion of gross errors exceeded 50%, the τ_2 of the proposed method was 0.9. On inspection, two gross errors had been transferred to two other contaminated observations, so the residuals of those two observations were close to 0.000 mm, while the residuals of the other two contaminated observations were close to 10.000 mm. However, this had no effect on the parameter results.
The above analysis results show that compared with Rousseeuw and Hubert [22] and Tao et al [25], the proposed algorithm possesses stronger robustness and a higher breakdown point.

Case analysis
Data of the tunnel control network of Shanghai Synchrotron Radiation Facility in 2016 and 2017 were selected. After separate adjustments of the two periods of data, 269 common  points were obtained, with 807 observations participating in coordinate transformation. The distribution of the points is shown in figure 3.
As shown in figure 3, the control network is circular with a circumference of 432 m, and the points of the two phases exhibit an obvious displacement on the horizontal plane. The pattern of free station setting and multi-station splicing was adopted to construct the network. Generally, the instrument is roughly leveled during the measurement. Therefore, the points of the two phases have a certain rotation in the Z axis direction.
The data of the two phases were processed using the methods of LS, Rousseeuw and Hubert [22], Tao et al [25], and the proposed algorithm. The residuals of the observations for the four methods are shown in figure 4. Figure 4 shows that the residuals of the observations obtained by the four methods are mainly distributed within ±0.500 mm, while some are relatively large, indicating that the common points had moved between the two epochs; the movements did not exceed 1.500 mm. Overall, the residual distributions of the four methods exhibit the same trend. Among them, the residual distribution ranges of Tao et al [25] and the proposed algorithm are larger, and their local residual distributions are more consistent.
The residual of a point after coordinate transformation can be calculated by

v_p = √(v_x² + v_y² + v_z²),

where v_p represents the residual of the point, and v_x, v_y, and v_z are the residuals of the three coordinate components. The residuals of the points after coordinate transformation are shown in figure 5. It can be seen from figure 5 that the residual trends of the points obtained via the four methods are generally the same, but the results for the points with large errors differ. The residuals of the points with large errors in the proposed method are numerically greater than those of the other three methods, and the residuals of the points with small errors are significantly smaller than those of the other methods. This demonstrates that the proposed algorithm can effectively deal with large errors and, to a certain extent, inhibit their transfer to good observations. At the same time, the residuals of the points with small errors were also small, and the errors of all points were well reflected in the residuals.
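A worked instance of the point-residual formula above (the values are illustrative):

```python
import math

def point_residual(vx, vy, vz):
    """Combine the three component residuals into one point residual:
    v_p = sqrt(vx^2 + vy^2 + vz^2)."""
    return math.sqrt(vx * vx + vy * vy + vz * vz)

# Component residuals of 0.3, -0.4 and 0.0 mm give a point residual of 0.5 mm
vp = point_residual(0.3, -0.4, 0.0)
```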
For the analysis of the robustness of the various methods, the weights of the observations for the three robust methods are shown in figure 6, and the posterior standard deviation of the unit weight σ̂_0 and the weighted mean of the weights of all observations, κ = (Σ_(i=1)^807 p̄_i)/807, are shown in table 5. As shown in figure 6 and table 5, the three robust methods can resist the interference of outliers to a certain extent and improve the accuracy of the coordinate transformation by reducing the weights of the observations with errors when the common points are deformed. Combined with the residual distribution of the observations in figure 4, the three methods performed reasonable weight reduction for the observations with large residuals. When the standardized residual of an observation is greater than the threshold k_2, it is considered a gross error. Among the 807 observations, the number of gross errors identified by Rousseeuw and Hubert [22] was 0 and by Tao et al [25] was 3, whereas 25 gross errors were recognized by the proposed algorithm, which indicates that the proposed algorithm processes errors more reasonably. The LS method has the lowest precision for coordinate transformation because it is not robust. The method of Rousseeuw and Hubert [22] had the same accuracy as that of Tao et al [25]. In contrast, the σ̂_0 of the proposed algorithm is the smallest, and the precision of the coordinate transformation is better than 0.200 mm. Furthermore, the weighted mean of the weights κ is consistent with σ̂_0, which illustrates that the proposed algorithm is more robust than those of Rousseeuw and Hubert [22] and Tao et al [25].

Conclusions
The seven-parameter coordinate transformation model is not limited by the rotation angle, and there is no need to calculate reliable initial parameter values before the adjustment. Based on this model, a robust algorithm for coordinate transformation parameters with a high breakdown point was established. Through sampling, sorting, and statistics, the new algorithm makes full use of the posterior information of the sample set, reasonably ranks the reliability of the samples, and uses a hypothesis test to select the valid information from the data. When the proportion of outliers exceeds 50%, the new algorithm can still deal with the outliers effectively and has strong robustness and applicability (e.g. the selection of stable points in a deformation monitoring network). Compared with the methods of Rousseeuw and Hubert [22] and Tao et al [25], the proposed algorithm has a higher breakdown point. However, the efficiency of the proposed algorithm decreases as the number of common points grows. How to improve the computational efficiency of the algorithm deserves further research.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).