Analysis of statistical methods for outlier detection in telemetry data arrays, obtained from “AIST” small satellites

Because of various disturbing factors affecting data reception, telemetry information often contains outliers and lost entries. Detection of outliers and anomalies is one of the key problems of satellite data analysis. To increase the accuracy of telemetry-based monitoring of satellite systems, several statistical methods of outlier detection were compared on power supply system telemetry obtained from the “AIST” small satellites. The structure of telemetry from the power supply and thermal control systems of the “AIST” small satellites was analyzed. The methods described in the paper make it possible to detect rapid changes in telemetry data from the “AIST” small satellites.


Introduction
Telemetry data is the main source of information about events occurring onboard an operating spacecraft. Telemetry analysis is used to identify emergency markers, monitor the satellite's operation, evaluate the platform's performance, and detect failures and malfunctions of the onboard equipment.
Telemetry data is transmitted to Earth during communication sessions between the satellite and the ground control center [1]. However, the received data is prone to outliers and missing entries due to numerous disturbing factors, such as weather conditions and equipment malfunctions. The paper [2] presents a study of methods for imputing missing data entries into telemetry arrays and proposes an algorithm for imputing data into the telemetry from the "AIST" small satellite. That paper, however, does not cover the problem of outlier detection.
Detection of outliers and data anomalies is one of the main problems in satellite data analysis. There are several definitions of the term "outlier". In [3], an outlier is "an observation (or subset of observations) which appears to be inconsistent with the remainder of that set of data". In [4], an outlier is "an observation which deviates so much from other observations as to arouse suspicions that it was generated by a different mechanism".
A well-structured and detailed review of outlier and anomaly detection in data arrays is given in [5]. Because data types are so diverse, researchers propose their own methods and algorithms for detecting outliers [6], [7]. According to [5], most of the methods can be divided into classification (neural network), nearest neighbor, cluster and statistical methods, as well as methods related to information theory and spectral theory. This paper discusses the application of a number of parametric statistical methods to the analysis of accumulated telemetry information from the "AIST" small satellites. Detecting outliers in the telemetry information makes it possible to identify anomalous readings from onboard equipment sensors, which may indicate a malfunction in one of the spacecraft systems, and to identify outliers caused by errors in receiving telemetry information, which reduces the error in assessing the state of the spacecraft onboard systems.

Structure of the telemetry data
Telemetry data received from the "AIST" small satellite [8] is an array of measurements of the main parameters that define the state of the spacecraft. Telemetry data is transmitted to the ground station as a series of frames. Structurally, each frame is a sequence of bytes of predefined length with a header at the beginning and a checksum at the end. Thus, telemetry information is a data set measured sequentially at equal time intervals, which makes it possible to treat it as a time series.
Let us group the telemetric information according to the purpose of the "AIST" small satellite systems. Table 1 lists the parameters of the power supply system and Table 2 lists the parameters related to the thermal control system. Telemetry of the power supply system contains the date and time of readings and the values of the main parameters: onboard voltage and current, solar array current and the electric currents of the onboard subsystems of the satellite (17 in total). Telemetry of the thermal control system contains the time and date of measurements, readings from temperature sensors located on the external and internal surfaces of the satellite, and the temperatures of certain subsystems, such as the battery, telemetry controller, navigation equipment, transmitters and onboard computer. The results of the analysis of the operation of the thermal control system of the "AIST" small satellite are presented in [9,10,11].

Analysis of statistical methods of outlier detection
In this section we review several statistical methods with respect to their ability to detect outliers in a test telemetry set based on the power supply system telemetry from the "AIST" small satellite. The test set was constructed from a regular telemetry file by adding several outlier values to it. The data array contains recordings of three parameters of the power supply system: on-board voltage (Ubs, V), on-board current (Ibs, A) and solar array current (Isun, A). The voltage chart shows sporadic changes that fall outside the allowable values.
The graph of the on-board current shows sharp surges of up to 1.5 A lasting about 5-10 minutes; such jumps indicate that some onboard equipment was operating at that time.
The solar array current varies between 0 and 2 A, reflecting the satellite's periodic passage through the Earth's shadow as it moves along its orbit; there is also one sharp jump to 4 A.
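The construction of a test set from regular telemetry can be sketched as follows. This is only an illustration under stated assumptions: the function name `inject_outliers`, the sample readings and the choice to overwrite randomly chosen samples (rather than append new ones) are hypothetical and are not taken from the original processing chain.

```python
import random

def inject_outliers(values, outlier_values, seed=0):
    """Return a copy of `values` with randomly chosen samples overwritten
    by `outlier_values`, plus the sorted indices that were overwritten."""
    rng = random.Random(seed)  # fixed seed keeps the test set reproducible
    corrupted = list(values)
    positions = sorted(rng.sample(range(len(corrupted)), len(outlier_values)))
    for pos, bad in zip(positions, outlier_values):
        corrupted[pos] = bad
    return corrupted, positions

# Hypothetical clean on-board voltage readings (V), not actual AIST data.
clean_ubs = [14.2, 14.3, 14.1, 14.4, 14.2, 14.3, 14.5, 14.2, 14.3, 14.4]
test_ubs, injected_at = inject_outliers(clean_ubs, [7.15, 3.61])
print(injected_at, [test_ubs[i] for i in injected_at])
```

Keeping the list of injected positions gives a ground truth against which the flags produced by each detection method can later be checked.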

Standard score
Standard score (or z-score) is a measure of how far an observed or measured value lies from the sample mean, expressed in standard deviations [12]. The z-score is defined as follows:
$$z_i = \frac{x_i - \bar{x}}{s},$$
where $\bar{x}$ and $s$ denote the mean and standard deviation of the sample, respectively. To find outliers in telemetry arrays, the z-score is calculated for each value, which amounts to centering and normalizing the random variable. Values whose z-score lies too far from zero are then identified as outliers. A threshold of ±2.5 was used: a value was identified as an outlier if its z-score was below -2.5 or above 2.5.
This method has a disadvantage: the outliers themselves influence the mean and the standard deviation. Because the z-score method relies on these values to measure central tendency and dispersion, its results can be distorted.
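A minimal sketch of the z-score rule with the ±2.5 threshold is given below. The function name `zscore_outliers`, the sample readings and the use of the sample (n - 1) standard deviation are assumptions made for illustration; the text does not specify which estimator of the standard deviation was used.

```python
from statistics import mean, stdev

def zscore_outliers(values, threshold=2.5):
    """Return indices of values whose |z-score| exceeds `threshold`."""
    m = mean(values)
    s = stdev(values)  # assumption: sample standard deviation (n - 1)
    if s == 0:
        return []  # all values identical, nothing can be flagged
    return [i for i, x in enumerate(values) if abs((x - m) / s) > threshold]

# Hypothetical on-board voltage readings (V) with one corrupted last sample.
ubs = [14.2, 14.3, 14.1, 14.4, 14.2, 14.3, 14.5, 14.2, 14.3, 14.4,
       14.2, 14.3, 14.1, 14.4, 14.2, 14.3, 14.5, 14.2, 14.3, 7.15]
flagged = zscore_outliers(ubs)
print(flagged)
```

On this series only the last sample is flagged. Note that the single value 7.15 pulls the mean down to about 13.93 V and raises the standard deviation to about 1.6 V, which illustrates how outliers distort the very statistics the score depends on.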

Modified z-score method
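The modified z-score, described in [13], uses the median and the median absolute deviation from the median (MAD) instead of the mean and standard deviation of the sample, thus reducing the influence of the outliers on the score.
The modified z-score is computed as follows:
$$M_i = \frac{0.6745\,(x_i - \tilde{x})}{\mathrm{MAD}}, \qquad \mathrm{MAD} = \operatorname{median}_i\,|x_i - \tilde{x}|,$$
where $\tilde{x}$ is the sample median and 0.6745 is the 0.75 quantile of the standard normal distribution, the value to which the MAD, expressed in units of the standard deviation, converges for large normal samples.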
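The modified z-score can be sketched in the same way. The function name `modified_zscore_outliers` and the sample data are hypothetical, and the cut-off of 3.5 is an assumption: it is a commonly recommended value for this score, whereas the threshold actually applied to the AIST telemetry is not stated in the text.

```python
from statistics import median

def modified_zscore_outliers(values, threshold=3.5):
    """Return indices of values whose |modified z-score| exceeds `threshold`."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        return []  # more than half of the values coincide, score undefined
    return [i for i, x in enumerate(values)
            if abs(0.6745 * (x - med) / mad) > threshold]

# Hypothetical on-board voltage readings (V) with one corrupted last sample.
ubs = [14.2, 14.3, 14.1, 14.4, 14.2, 14.3, 14.5, 14.2, 14.3, 14.4,
       14.2, 14.3, 14.1, 14.4, 14.2, 14.3, 14.5, 14.2, 14.3, 7.15]
flagged = modified_zscore_outliers(ubs)
print(flagged)
```

Here the median stays at 14.3 V and the MAD at about 0.1 V despite the corrupted sample, so the reference values are barely affected by the outlier being tested.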

The interquartile range
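The interquartile range (IQR) is a measure of statistical dispersion that is much more resilient towards outliers; it is equal to the difference between the 75th and 25th percentiles, that is, between the upper and lower quartiles. According to [14], any data point lying more than 1.5·IQR below the first quartile or more than 1.5·IQR above the third quartile is an outlier:
$$x_{outlier} < Q_1 - 1.5\,\mathrm{IQR} \quad \text{or} \quad x_{outlier} > Q_3 + 1.5\,\mathrm{IQR},$$
where $x_{outlier}$ is the outlier data point, $Q_1$ and $Q_3$ are the first and third quartiles, and $\mathrm{IQR} = Q_3 - Q_1$.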
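A short sketch of the IQR rule follows. The helper names `iqr_bounds` and `iqr_outliers` and the sample data are hypothetical. Quartiles are computed with Python's `statistics.quantiles` using the inclusive method, which is an assumption: quartile conventions differ between tools, and the text does not say which one was used.

```python
from statistics import quantiles

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def iqr_outliers(values, k=1.5):
    """Return indices of values lying outside the IQR fences."""
    lower, upper = iqr_bounds(values, k)
    return [i for i, x in enumerate(values) if x < lower or x > upper]

# Hypothetical on-board voltage readings (V) with one corrupted last sample.
ubs = [14.2, 14.3, 14.1, 14.4, 14.2, 14.3, 14.5, 14.2, 14.3, 14.4,
       14.2, 14.3, 14.1, 14.4, 14.2, 14.3, 14.5, 14.2, 14.3, 7.15]
lower, upper = iqr_bounds(ubs)
flagged = iqr_outliers(ubs)
print(lower, upper, flagged)
```

Because the fences are expressed directly in the units of the parameter, they can be compared with the allowable range of the sensor without any further conversion.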

Results
The test telemetry set, based on the power supply system telemetry from the "AIST" small satellite, was analyzed using the methods described above. Table 3 shows the outliers detected by the z-score method for the on-board voltage Ubs: the values 7.15, 12.0 and 3.61 V are outliers, as their z-scores are below -2.5. Overall, the z-score method considers Ubs values lower than 13.38 V and higher than 15.68 V to be outliers. For the on-board current Ibs, values lower than 0.11 A and higher than 0.29 A are considered outliers. For the solar array current, every value above 1.7 A is an outlier according to the z-score method.
Table 4 shows the outliers detected by the modified z-score method in the on-board voltage Ubs data array. According to the modified z-score method, Ubs values lower than 13.78 V and higher than 14.76 V are outliers. For the on-board current Ibs, values lower than 0.11 A and higher than 0.29 A are considered outliers. For the solar array current, every value above 1.7 A is an outlier according to the modified z-score method.
Analysis of the Ubs telemetry data with the interquartile range method shows that values lower than 13.64 V and higher than 14.76 V are outliers. For the on-board current Ibs, values lower than 0.19 A and higher than 0.27 A are considered outliers. For the solar array current, every value above 1.79 A is an outlier according to the IQR method.
Among the compared methods, the modified z-score and the IQR showed the best and the most similar results. Figure 1 shows the test telemetry array with the outliers detected by the modified z-score and IQR methods marked with dots.
Analysis of the data presented in Figure 1 shows that the modified z-score and IQR methods identified as outliers the data instances that appear to be caused by errors in the reception of telemetry information. At the same time, these methods also marked data instances that are clearly not outliers, namely the increased on-board current values (Ibs, A) corresponding to the spacecraft communication session. Figure 2 shows the test telemetry from the thermal control system of the "AIST" small satellite with the outliers detected by the modified z-score and IQR methods marked with dots.
The periodic change in the readings from thermal sensors that can be seen in Figure 2 is caused by rotation of the satellite around its center of mass. All deviations were successfully detected as outliers by the modified z-score and IQR methods.
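As a consistency check, the z-score limits quoted for Ubs can be turned back into the sample statistics they imply. The worked example below assumes that the limits 13.38 V and 15.68 V are exactly the mean plus or minus 2.5 standard deviations; the resulting values are inferred from the rounded limits and are not reported in the text.

```latex
\bar{x} \pm 2.5\,s = \{13.38,\ 15.68\}
\;\Rightarrow\;
\bar{x} = \frac{13.38 + 15.68}{2} = 14.53\ \text{V}, \qquad
s = \frac{15.68 - 13.38}{2 \cdot 2.5} = 0.46\ \text{V}

z(12.0) = \frac{12.0 - 14.53}{0.46} = -5.5 < -2.5
```

Under this assumption the reading of 12.0 V lies 5.5 standard deviations below the mean, and the readings of 7.15 V and 3.61 V lie further still, which agrees with all three being reported as outliers.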
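One hypothetical way to quantify how similar the results of two methods are is the Jaccard index of the sets of flagged sample indices. The sketch below is an illustration only: the function name `flag_agreement` and the index lists are invented, and the text does not state how similarity between the methods was assessed.

```python
def flag_agreement(flags_a, flags_b):
    """Jaccard index of two collections of flagged sample indices (0..1)."""
    a, b = set(flags_a), set(flags_b)
    if not a and not b:
        return 1.0  # neither method flagged anything: full agreement
    return len(a & b) / len(a | b)

# Hypothetical flagged indices from two detection methods.
modified_z_flags = [3, 17, 42, 43]
iqr_flags = [3, 17, 42, 43, 44]
agreement = flag_agreement(modified_z_flags, iqr_flags)
print(agreement)
```

A value close to 1 means the two methods flag almost the same samples; a value close to 0 means their results barely overlap.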

Conclusion
The article has presented a structural analysis of the telemetry arrays of the power supply system and thermal control system of the "AIST" small satellite. Statistical methods capable of detecting outliers in arrays of data were analyzed. The methods considered in the article make it possible to find uncharacteristic data values in telemetry arrays. However, they often mark valid data instances as outliers, as was shown by the example of the satellite's on-board current values spiking during communication sessions. Thus, it can be concluded that such methods are suitable only for a preliminary study of telemetric information, identifying data instances that may require specific attention, be it an outlier or a sudden change in the operating parameters of the onboard equipment. For further, more detailed analysis of telemetric information, it will be necessary to use neural network and cluster methods in order to detect outliers and anomalies effectively.