A rapid outliers detection and correction method for external ballistic velocity measurement data

Due to the influence of various sudden and abnormal factors during the tracking environment, equipment working conditions, and operation process, there are inevitable outliers in the tracking measurement process of aircraft such as carrier rockets, artificial satellites, and missiles. Therefore, the prerequisite for ensuring the reliability of processing results is to timely and accurately detect and correct outliers. In this paper, we proposed the sliding window-based variable degree B-spline function method (SWVD B-spline), which can handle isolated outliers and spotted outliers. SWVD B-spline uses variable degree B-spline function to model observed data in sliding windows, which can detect and correct outliers point by point along with window sliding. Then, we propose an initial window data selection method to remove outliers in initial windows to ensure the processing effect. In addition, because there are often inflection points in external ballistic velocity measurement data, differential evolution is used to optimize variable degree B-spline in windows that include inflection points to improve processing accuracy. The experimental results verify that SWVD B-spline can handle various outliers rapidly and efficiently.


Introduction
In the external ballistic tracking measurement of spacecraft, the measured data is influenced by factors such as * Author to whom any correspondence should be addressed.
Original Content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence.Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.measurement instruments, measurement methods, measurement environment, and surveyors, so there are always unpredictable outliers in the results.Outliers will seriously affect the accuracy of subsequent data processing.Therefore, how to detect and correct outliers more efficiently to improve data credibility is a problem that must be solved first.
In data processing, outliers are points that deviate significantly from normal values, they are usually the few numbers of data that deviates significantly from the trend of most remaining data, and the extent of deviation from the primary trend can be used to distinguish outliers [1].There are many causes for generating outliers in the measurement process, such as gross error, sudden and temporary changes in the sampling environment, heavy-tailed distribution [2,3], and beyond the measurement range of the sensor [4].In general, outliers can be divided into isolated outliers and spotted outliers according to whether they are continuous or not.Isolated outliers refer to outliers taking the form of isolated points, and spotted outliers are a piece of continuous abnormal data.In engineering practice, the fitting results of data are often affected by outliers.With the number of outliers in the data increases or the deviation distance of the outliers increases, the fitting results of the data will deteriorate.Because the sampling time of external ballistic measurement data is long, it usually fits by piecewise or sliding polynomials.If one of the segments is deviated due to outliers, it will also have an unpredictable effect on the fitting results of the other segments.
Scholars have done much research on handling outliers, and there were many practical algorithms [5][6][7][8].In [9], the outlier detection was based on the median relative height of the surrounding data within a defined detection window.And in [10], a locally estimated noise variance was used.Then, Sangeetha and Mary A [11] carried out a method for detecting outliers in a mixed dataset using the rough set-based entropy-weighted density.The classifier methods and clustering methods could also be used to handle outliers in a dataset.Classification has been one of the essential methods for outlier detection [12,13].In [14], an adaptive distributed Bayesian approach was proposed.Similarly, some improved SVM classifier methods also solved the outlier detection problems [15,16].Clustering approaches have relied on similarity metrics to detect outliers [17,18].Yu et al [19] proposed an outlier elimination algorithm that can improve k-means clustering efficiency.Subsequently, Yan et al [20] proposed a novel statistical outlier detection method to identify cluster centroids automatically from the decision graph.
In recent years, some deep learning methods have been gradually used to detect outliers [21].For instance, in [22], two recurrent autoencoder ensembles are used to solve outlier detection in time series, and these autoencoders are built by sparsely-connected recurrent neural networks.In the same year, Chakraborty et al [23] used the stacked autoencoders for extract feature extraction first, and then they used a probabilistic neural network to detect the outliers.Otherwise, a novel single-objective generative adversarial active learning method was proposed to enrich the information on outliers [24].And to detect outliers in a few labeled data, a graph-based semisupervised method was designed [25].However, deep learning models often require a large amount of training data, so they cannot be applied to a small sample of data.Moreover, these methods are time-consuming and involve many adjustable parameters.
Although the existing outliers handling methods have shown effectiveness, there are still some areas for improvement.(1) Many methods can only be applied to process data segments, and the processing time is usually long, so they cannot rapidly deal with the outliers in observation data point by point.(2) Some methods require a large amount of sample data for learning, which cannot be realized in some practical problems, so they have certain limitations.(3) Many methods can only deal with the sudden appearance of spotted outliers, but the processing effect of some complex spotted outliers is not very good.To solve the problems mentioned above, we propose the sliding window-based variable degree B-spline function method (SWVD B-spline) in this paper.
The main contributions in this paper are summarized as follows: (1) To detect and correct the possible outliers in the external ballistic velocity measurement data, we propose SWVD B-spline method for the first time.SWVD B-spline uses variable degree B-spline to deal with outliers point by point with sliding windows, which can simultaneously detect and correct outliers.(2) To reduce the influence of the possible outliers in the initial window on the subsequent data processing, we propose a data selection method to remove the outliers in the initial window.
(3) Differential evolution (DE) is used to optimize SWVD Bspline, which can further improve the processing effect near inflection points of velocity measurement data.(4) The experimental results on two velocity measurement data sets prove that SWVD B-spline can rapidly and efficiently deal with isolated outliers and spotted outliers.
The proposed method can be applied to the point-by-point rapid processing of external ballistic velocity measurement data.
The rest of this paper is structured as follows.Section 2 uses the B-spline to model the external ballistic data.In section 3, we propose SWVD B-spline.Section 4 presents the two optimization methods of SWVD B-spline: model initial window data selection and DE.The experiments and results are presented in section 5. Finally, the conclusion and future works are given in section 6.

External ballistic data modeling
In order to detect outliers in observed data, we need to model normal data.In this section, we first prove that the spline functions can model data better than the polynomials.Subsequently, we use the B-spline to model the normal data in the external ballistic observed data.

Fundamental properties of approximation
Based on the physical properties of trajectory in aerospace external measurement data processing, it seems that modeling the measurement data using polynomial or B-spline function can both solve the external ballistic data processing problem.However, there are essential differences in their intrinsic estimates.The following properties give a clear answer by mathematical methods.For convenience of explanation, the following definition is given.From the above definition and [26,27], we have the following theorem.
For the spline functions, according to [28], we have the following theorem.
then the following inequality holds When the same accuracy is required, in order to compare the number of parameters to be estimated for the polynomial function and the spline function, we give the following example [27].
∞ ⩽ 10, assume that the approximation accuracy δ is equal to 10 −5 , in order to ensure Solving the above inequality about k, we can conclude that the degree k of the polynomial function should be 50, so the corresponding number of coefficients to be estimated should be 51.This result shows that if the polynomial function Q(t) is used to approximate f(t), 51 parameters must be estimated to ensure the accuracy given in advance. When Solving the above inequality about h, we can obtain that the maximum interval h of the spline function should be 0.094, so the minimum number of coefficients to be estimated should be 25.This result shows that if the spline function Q(t) is used to approximate f(t), 25 parameters must be estimated to ensure the accuracy given in advance.The computational complexity of the spline function is obviously lower than that of the polynomial function for the same accuracy.
Whether the polynomial function or the spline function is used to approximate trajectory f(t), the final use of redundant data is faced with a least squares problem.There are the following theorem about least squares estimation [27].
Theorem 3. Suppose the linear regression model is Y = Xθ + ε, X is the known full column rank matrix, rank(X) = n, θ are the unknown parameters, and ε ∼ N(0, σ 2 I) are the random errors of the measuring equipment, then θ = (X T X) − Theorem 3 shows that the estimation error is positively proportional to the number of unknown parameters when using X θ as the estimate of Xθ [27].Hence, the more parameters to be estimated, the larger the estimation error and the worse the estimation accuracy.For the above example, the number of unknown parameters to be estimated by the polynomial approximation is much higher than that required by the spline function.Therefore, the estimation accuracy of the polynomial approximation is much lower than that of the spline function approximation.
In conclusion, the above analysis proves that ballistic approximation using splines is better than polynomials in terms of computational complexity and approximation accuracy in the aerospace data processing.

B-spline modeling
Suppose a certain station tracks and observes the space flight target in a measurement task, a set of time series observed data is O = {(t 1 , y 1 ), (t 2 , y 2 ), . . ., (t n , y n )}, and there is a nonlinear measurement equation of the following form where, y(t) is a set of the observed data, S(t) is a set of the normal data, and ξ are the outliers in the data.The normal data S(t) = ȳ(t) + s(t) + ε, where ȳ(t) is a set of truth values, s(t) is a set of systematic errors with a certain varying regularity, and ε are the random errors obeying the statistical properties.Because in practical situations, S(t) within a time window usually has a continuous change trend, a B-spline function of k-th degree can be used for fitting.
where, m is the number of spline segments, d j (j = 0, 1, . . ., m) refers to the j-th control point, which is the spline coefficient to be estimated, and N j,k (t)(j = 0, 1, . . ., m) is the j-th Bspline basis function of a k-th degree curve.
If the matrix form is used, let Then equation ( 3) can be expressed as where ε is a vector composed of the random errors ε.
The B-spline fitting coefficients vector d estimated by the least squares method is (6) and from d = d0 d1 . . .dm , the B-spline fitting result is To compare the fitting accuracy of B-spline and traditional polynomial for external ballistic velocity measurement data, we used the quasi-uniform B-spline method, the center sliding polynomial method, and the Hermite method to process a set of velocity radar tracking data that include ε ∼ N(0, σ 2 ), where the quasi-uniform B-spline and Hermite used the same number of segments.The differences between the fitting results and the truth values are shown in figure 1, and the root mean square error (RMSE) results are shown in table 1.
From figure 1 and table 1, the B-spline method can effectively deal with the random errors in this set of external ballistic velocity measurement data, and its fitting accuracy is obviously better than that of the center sliding polynomial method and the Hermite method.

Outliers handling using SWVD B-spline
The previous section uses the B-spline function to model the external ballistic velocity measurement data.Compared with the traditional polynomial model, the B-spline model can effectively reduce the coefficients to be estimated in the model and significantly improve the estimation accuracy of the model.However, the model only models normal data in observed data and does not deal with outliers.In order to solve the problem of outliers handling, we propose a new faulttolerant B-spline algorithm called the SWVD B-spline.This method uses sliding fitting windows and variable degree Bspline function to construct the model.It combines the characteristics of outliers in external measurement data to distinguish outliers so that various outliers in observed data can be quickly detected and corrected.Since external ballistic flight targets generally do not change dramatically or sharply, the observation curves of the flight target can be described by polynomials with a maximum degree of no more than cubic.Hence, the method proposed in this paper uses the B-spline with no more than cubic.
Step 1: Let k = 1, k = 2, and k = 3 in turn.Then the B-spline is modeled by equation ( 5), and the estimated values of the first-degree, second-degree, and third-degree B-spline fitting coefficients are calculated by equation ( 6) corresponding as d (1) , d (2) , and d (3) , respectively.The coefficients 2) , and d (3) are substituted into equation (7) with the next time t p+L , respectively, so the estimated values of y p+L are ŷ (1)  p+L , ŷ (2)  p+L , and p+L by the extrapolation fitting method.Step 2: If any ŷ(1) p+L , ŷ(2) p+L , ŷ(3) p+L satisfies condition |ŷ where λ is the given threshold.Then it is determined that y p+L is not an outlier and goes to Step 4; otherwise, it is determined that y p+L is an outlier and goes to Step 3.
After y p+L = ỹp+L , go to Step 4.
Step 4: Let p = p + 1. Slide the window point by point to the next window until the end.
The flow chart of SWVD B-spline handling outliers is shown in figure 2.
As seen in figure 2, the process of SWVD B-spline handling outliers is divided into two parts: outliers detection and outliers correction.And this method can handle outliers point by point by sliding a window of length L until the end of the last data is processed.

Optimization methods for SWVD B-spline
In order to better handle outliers, two optimization algorithms are used to optimize SWVD B-spline.In section 3, SWVD Bspline first needs an initial window without outliers or outliers that have been corrected.If the window includes outliers, it may affect the overall processing effect, so the initial window data selection method is proposed.In addition, the B-spline knot vector optimization can achieve better fitting effects to deal with more problems, so we adopt the DE algorithm.

Model initial window data selection
The initial window data of the model refer to a segment of data at the initial phase of the observed data, and its length is less than or equal to the sliding window length L. The initial phase of external ballistic velocity measurement data is usually smooth and has a low probability of outliers.In order to avoid the influence of the possible outliers in the initial window on the subsequent fitting, we remove the outliers in the initial window and then select the remaining data as the initial window data of the model.
Firstly, a set of the initial window data {(t 1 , y 1 ), (t 2 , y 2 ), . . ., (t l , y l )} of length l is given starting from the moment t 1 of the observed data O. Calculate the linear regression coefficient ρ and the estimated values ŷ(t) where t = 1 l l i =1 t i and ȳ = 1 l l i =1 y i .Calculate the distance D i = |y i − ŷi |, (i = 1, 2, . . ., l) between the original data y(t) and the corresponding estimate ŷ(t) point by point, and calculate their mean value D = where C is a constant, then determine that the i-th data is an outlier and remove it; otherwise, determine that the i-th data is normal and retain it.
After removing outliers, only normal data are retained in the initial window, and the retained data are selected as the initial window data of the model.If the initial window length of the model is less than L, the window does not slide at this time but expands the window length by processing and adding point by point until the window length is equal to L.
As shown in figure 3, given 20 data points that include an outlier, calculate their distances between the original data and the corresponding estimated values.
In figure 3 So

B-spline Knot vector optimization
In the external ballistic velocity measurement data, there are often inflection points caused by the velocity change.In order to further improve the fitting accuracy of the B-spline function near the inflection point, the DE algorithm is used to optimize the number and position of knot points in the non-uniform B-spline.DE has emerged as one of the most efficient global search algorithms, which excels when searching over real continuous domains.It is an iterative, population-based algorithm that uses simple mathematical operations to find global solutions to optimization problems.It has been shown to outperform more well-established evolutionary algorithms like genetic algorithm and particle swarm optimization when applied over real domains [30].
DE divides into two stages: initialization and evolution.At the first stage, the population is generated randomly.At the second stage, the population undergoes mutation, crossover, and selection, which are repeated until a termination criterion is met [31].
Step 3: Calculate the mutant vector where F is the scaling factor.(3) Crossover: DE employs the crossover operator to generate a trial vector to increase the diversity of the population.Crossover is performed between the current target vector β i and the mutant vector U i to obtain the trial vector V i .The trial vector is generated as V i,j , β i,j , and U i,j are the j-th values in V i , β i , and U i .CR ∈ (0, 1) is the crossover probability and j = 1, 2, . . ., d. (4) Selection: BIC [32] is used to evaluate the fitting effect of the model, and the current target vector is selected according to BIC.The expression of BIC in the curve-fitting problem is where R is the residual sum of squares between the fitted data and the original data, and k is the degree of B-spline fitting.Calculate BIC β of β i and BICv of V i by equation (15), then let The mutation, crossover, and selection operations are repeated until the end of the iteration.The optimal vector in the population is output as the internal knot vector of the B-spline.

Simulation analysis
To verify the processing effect of SWVD B-spline on outliers in external ballistic velocity measurement data, we used the external ballistic tracking velocity measurement data during the launch of the carrier rocket for experimental analysis.We selected a set of velocity radar tracking data and a set of X-direction velocity data that included an inflection point.
Experimental data 1: A set of velocity radar tracking data was selected, and the data segment during the 10 s-110 s was intercepted as experimental data.The sampling frequency of the data is 20 Hz, and there are a total of 2001 data.
Experimental data 2: A set of X-direction velocity data that included an inflection point was selected, and the data segment during the 1 s-269.95s was intercepted as experimental data.The sampling frequency of the data is 20 Hz, and there are a total of 5380 data.

Model initial window data selection
The initial window data of experimental data 1 and the initial window data of experimental data 2 have similar characteristics.Here, experimental data 1 was taken as an example.
The 20 data points starting from the start time in experimental data 1 were used as the initial window data, their time during 10 s-10.95 s.Firstly, a set of Gaussian white noise ε ∼ N(0, 2.5) was added to the data set as random errors.Then, an isolated outlier or a set of spotted outliers was added to this data set for the experiments, respectively.The isolated outlier was added at the 10.As can be seen from figure 4, the proposed method can detect isolated outliers and spotted outliers in the initial window data very well.However, due to the existence of random errors, there may be occasional superfluous detection, such as at the 10.15 s in figure 4(c) and the 10.7 s in figure 4(d).In order to increase the reliability of the initial data as much as possible and reduce its adverse influence on the subsequent algorithm, it is necessary to eliminate all the detection results that include additional detection and then select the remaining data as the initial window data of the model.

Velocity measurement data without inflection point
Firstly, a set of Gaussian white noise ε ∼ N(0, 2.5) was added to experimental data 1.Then, three different isolated outliers were added to the data at the 27.15 s, 43.65 s, and 103 s, respectively, and three different types of spotted outliers were added to the data during 21.45 s-22.25 s, 50.95 s-51.95 s, and 94.45 s-96 s, respectively.The detection results of SWVD Bspline for outliers are shown in figure 5.
As seen in figure 5, SWVD B-spline has good detection results for various types of outliers on this velocity measurement data set, such as the isolated outliers with different values in partially enlarged detail figures 5(b), (c), and (f) and the spotted outliers with different types in partially enlarged detail figures 5(a), (d), and (e).To compare the experiment effect SWVD B-spline, we used the first-order difference method [33] and DBSCAN method to detect the same data set.Their results are shown in figures 6 and 7.
In figure 6, the first-order difference method has good results for the isolated outliers, but it cannot detect the spotted outliers accurately.Figure 7 shows the outliers detection results by DBSCAN.We first segmented the data and removed the data trends by the polynomials.Then, we used the DBSCAN clustering method to detect the outliers and obtain the detection results of outliers.From figure 7, this method can detect the isolated outliers and the sudden appearance of spotted outliers, but it cannot accurately detect the complex spotted outliers.
We counted the number of outliers detected by the firstorder difference method, DBSCAN method, and SWVD Bspline on this data set.These results are shown in table 2.
From the above experiments, it can be verified that SWVD B-spline can accurately detect the various outliers in the simulation data.The correction results of the outliers detected by SWVD B-spline are shown in figure 8.In figure 9, the comparison of the differences with experimental data 1 shows the correction effect of the outliers.
As shown in figures 8 and 9, SWVD B-spline can effectively correct different types of outliers, and the correction results are basically consistent with the data distribution of random errors.
To be realistic, more complex errors were added to experimental data 1 where ȳ(t) is experimental data 1, ε 1 are the random errors of ε 1 ∼ N(0, 2.5), ε 2 are the random errors of ε 2 ∼ N(0, 1), and ξ 1 are outliers.Because 0.02 • ε 2 • |ȳ(t)| simulates the situation that the measurement errors increase with the increase of |ȳ(t)|, the larger the value of the range rate, the larger the impact of the random errors.Also, due to this data set was added a set of periodic errors, which would have a certain impact on the handling of outliers.To further verify the effectiveness of SWVD B-spline, in addition to the same outliers in the above experiment, we added a set of spotted outliers during 105.2 s-106.3s.The proposed method is used to handle outliers in y(t), and the detection and correction results are shown in figure 10.
For the data with complex random errors, SWVD B-spline may have a small amount of missed and incorrect detection.But overall, SWVD B-spline is still applicable even if the random errors are complex, and the correction effect of outliers is good.Because in the actual situation, the external ballistic velocity data often includes inflection points.To verify the processing effect of SWVD B-spline on the outliers near the inflection point, we used experimental data 2 for experiments.Firstly, a set of Gaussian white noise ε ∼ N(0, 2.5) was added to experimental data 2.Then, two different isolated outliers were added to the data at the 52.6 s and 127 s, respectively, and six different types of spotted outliers were added to the data during 17.65 s-18.55 s, 41.95 s-42.75 s, 77.25 s-79.1 s, 155.95 s-157.5 s, 159 s-159.7 s, and 205.45 s-206.8s, respectively.The detection and correction results of SWVD B-spline not using DE are shown in figure 11.
As seen in figure 11, SWVD B-spline can get good detection and correction results for outliers far from the inflection point.However, for outliers near the inflection point, such as figures 11(f) and (g), although the outliers can be detected, their correction results deviate from the original data.If the spotted outliers near the inflection point are long, it is easy to appear the overall offset.To solve this problem, we used DE to optimize SWVD B-spline in the sliding windows that include the inflection point.Its partially enlarged detail is shown in figure 12.In figure 13, the comparison of the differences with experimental data 2 shows the correction effect of the outliers near the inflection point.
The outliers in figures 12 and 13 correspond to figures 11(f) and (g).The experimental results show that the correction effect of using DE to optimize SWVD B-spline is better.
To verify the rapidity of the proposed method, we took experimental data 2 as an example to count the operation time on the device with a CPU of i5-8400 and memory of 16GB, and the results are shown in table 3.
When the sliding window includes the inflection point, it obviously takes longer time using DE than not using DE.However, whether DE is used, the average operation time of each window is less than the sampling time of 0.05 s, so the proposed algorithm can rapidly deal with the outliers in the external ballistic velocity measurement data.The average operation time of each window (s) 0.0017 not using DE 0.0018 0.0017 using DE 0.1715 0.0081

Conclusion
SWVD B-spline is proposed to solve the problem of outliers handling in external ballistic velocity measurement data.To verify the effect of this method, we conducted experiments, and the conclusions of the experiment are as follows.
(1) For the velocity measurement data without any inflection point, this method can accurately detect and correct the outliers in the data.(2) For the velocity measurement data that includes the inflection point, we propose using DE to optimize the knot vectors of variable order B-spline in the windows that include the inflection point.(3) The proposed method is a point-bypoint sliding operation, and the average operation time of each window is less than the sampling time of 0.05 s, so SWVD Bspline can be applied to the point-by-point rapid processing of external ballistic data.In summary, the proposed method can detect various outliers rapidly and effectively, and it also can correct the detected outliers to generate valid data that conforms to the original data trend.Furthermore, The detected outliers can also be applied to engineering analysis.Hence, SWVD B-spline has good engineering application value.In future work, we will consider data processing problems in more complex environments, and SWVD B-spline can also be further optimized and applied to other fields.

Figure 1 .
Figure 1.The differences between the fitting results and the truth values.

Figure 2 .
Figure 2. The flow chart of SWVD B-spline handling outliers.

Figure 3 .
Figure 3.Comparison of the distances between the original data and the corresponding estimated values.
3 s, and its detection results are shown in figures 4(a) and (c).The set of spotted outliers was added during the 10.3 s-10.4 s, and its detection results are shown in figures 4(b) and (d).

Figure 4 .
Figure 4.The detection results of the outliers in the initial window data.

Figure 5 .
Figure 5.The detection results of SWVD B-spline for outliers.

Figure 6 .
Figure 6.The detection results of the first-order difference method for outliers.

Figure 7 .
Figure 7.The detection results of the DBSCAN method for outliers.

Figure 8 .
Figure 8.The correction results of SWVD B-spline for outliers.

Figure 9 .
Figure 9.The differences with experimental data 1.

Figure 10 .
Figure 10.The detection and correction results of SWVD B-spline for outliers.

Figure 11 .
Figure 11.The detection and correction results when SWVD B-spline not using DE.

Figure 12 .
Figure 12.The detection and correction results when SWVD B-spline using DE.

Figure 13 .
Figure 13.The differences with experimental data 2.

Definition 1 .
Let f(t) be an unknown function to be determined on the interval [a, b], P(t) be a polynomial function of time t, and H n be a set of polynomials with degree k, whose coefficients are real.Then ∆(P) = max [29]ose that the p-th window contains data {S(t p ), S(t p+1 ), . .., S(t p+L−1 )}(p ∈ {1, 2, . .., n − L + 1}) of length L, let knot points u 0 ⩽ u 1 ⩽ u 2 ⩽ . ..⩽ u m+k+1 , and B-spline basis functions are defined by the De Boor recursive formula[29]

Table 1 .
The RMSE results of the different methods.

Table 2 .
The detected outliers number statistics.

Table 3 .
The average operation time of each window.