Simulation analysis of outage recovery time of navigation satellite

Outage recovery time is an important factor affecting the availability of a navigation satellite. To estimate the on-orbit outage recovery time effectively at the design stage, this paper analyses the outage recovery process and proposes a modelling and analysis method for outage recovery time based on the characteristics of that process. Outage recovery time is divided into fault detection time, fault location time, fault handling time, state recovery time and system access time. Correlation analysis is used to quickly locate the equipment that may cause an outage. Taking the recovery time of each recovery strategy as input, the on-board recovery time is obtained by weighting the recovery times by failure rate. Because the outage recovery process is random, a state flow graph is used to model and simulate the recovery time. Finally, a simulation example is given.


Introduction
Satellite navigation systems have become important space infrastructure for the national economy and national defense. With the completion of BDS (BeiDou Navigation Satellite System) deployment in China, BDS will enter the stage of comprehensive application. Continuous broadcasting of available navigation signals is the basic service required of a navigation satellite. In engineering, the degree to which a navigation satellite meets this demand is evaluated quantitatively through availability requirements. Generally, the state in which a satellite is unavailable is defined as an outage, and outage frequency and recovery time are used to characterize outages quantitatively.
In the operation of a satellite navigation system, a high outage frequency directly affects continuity, and a long outage recovery time directly affects availability. In July 2019, the Galileo system was interrupted for 117 h due to a technical failure of ground infrastructure, which significantly affected users. Therefore, to ensure the availability of navigation services, attention must be paid to the outage recovery strategy and to controlling the recovery time.
In order to quantify the impact of outages and improve the design, BDS proposed an outage recovery time requirement. Outages are often caused by soft faults on the satellite, and recovery from an outage is a complex dynamic process. However, current research and public reports mainly focus on the availability of the navigation constellation [1][2][3][4]; they mostly take the satellite as the basic unit and model and evaluate constellation availability from assumed basic data. Other studies have explored availability modeling of satellites [5,6] or the propagation model of soft errors [7], but none of them address how to obtain the values of the outage requirements. The causes of failures and the recovery strategies are uncertain. Against this background, this paper analyzes the composition of the outage recovery process and recovery time and, based on the characteristics of the outage recovery process, proposes a modeling and analysis method for outage recovery time.

Outage recovery process and recovery time decomposition
The navigation satellites constitute the space segment of the satellite navigation system, which provides users with continuous space navigation signals through multiple coverage. When the navigation signal of a navigation satellite is interrupted or abnormal, a single-satellite service outage event results. In time sequence, the recovery process can be divided into three stages: fault detection and location, determination and implementation of the disposal strategy, and satellite access to the navigation system.
(1) Fault detection and location. The ground system receives the navigation message while the satellite is in the visible section of its orbit. After finding a data abnormality, it interprets and analyses the data and locates the faulty equipment.
(2) Determination and implementation of the disposal strategy. After the faulty equipment is located, it is reset or switched according to the failure contingency plan. The equipment performs initialization, initial time synchronization and other necessary procedures, and then enters the normal working state. Some failures may require coordination between the ground system and the satellite system, which introduces management delays.
(3) Satellite access to the navigation system. After the satellite returns to normal, the ground segment observes the satellite and processes its data again, completes the navigation message injection, and restores the satellite to the network.
Combined with the above recovery process and the general process of satellite fault handling, the typical recovery process of a single-satellite outage is shown in Figure 1. As Figure 1 shows, the recovery time of a satellite outage consists of the following parts:
(1) Fault detection time. The time from the occurrence of the anomaly or interruption on the satellite until the ground segment detects the abnormal signal. This time depends on the coverage of ground segment monitoring and on the interpretation time of the abnormal data. Satellite faults that occur outside the monitoring range can usually be judged only after the satellite enters the visible area. The process is deterministic, but the time required is uncertain.
(2) Fault location time. A recurring anomaly can be located quickly, whereas a first-occurrence anomaly may require joint location by several parties. Both the process and the time required are uncertain.
(3) Fault handling time (troubleshooting time). The time needed to send telecommands to the satellite and for the equipment to resume normal functions. The fault handling process is affected by the faulty equipment, the fault mode, the recovery strategy and the start-up characteristics of the equipment. Both the process and the time required are uncertain.
(4) State recovery time. The time for the satellite to recover its operating state, such as parameter recovery and message injection. The state recovery process is affected by the faulty equipment and the recovery strategy, and both the process and the time required are uncertain.
(5) Time of access to the navigation system. The time from when the ground segment confirms that the satellite is normal and sets it as available until the satellite participates in the navigation service again. This process and the corresponding time are relatively certain.
Therefore, the recovery time of a satellite outage can be expressed as the sum of these five components.
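Written as an equation, the decomposition listed above is a plain sum. The symbol names below are assumptions chosen for this sketch, not notation confirmed by the source:

```latex
T_{r} = T_{fd} + T_{fl} + T_{fh} + T_{sr} + T_{as}
```

Here $T_{fd}$ is the fault detection time, $T_{fl}$ the fault location time, $T_{fh}$ the fault handling time, $T_{sr}$ the state recovery time and $T_{as}$ the time of access to the navigation system.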

Determination of analysis object and scope
The composition of the outage recovery time of a navigation satellite is examined first. The failure detection time, failure location time and time of access to the navigation system are mainly affected by management factors. Apart from the data interpretation time and failure diagnosis time, which are design-related, these times are management and support delays. The fault handling time and state recovery time are mainly determined by the design and are therefore designable and verifiable. The outage recovery time can thus be divided into two types: the inherent design time and the ground delay time. The ground delay time is mainly estimated from experience data, while the time taken by the ground software for data interpretation and fault diagnosis is relatively fixed. Excluding the ground factors, the fault handling time and state recovery time after a navigation satellite outage vary greatly with the faulty equipment and the fault mode. These two parts are therefore the focus of the analysis of outage recovery time and are collectively referred to as on-board recovery time in this paper.
The outage recovery time differs from the mean repair time used in maintenance engineering. A significant difference is that a failure does not necessarily lead to an outage. For example, the loss of a remote control channel only affects the remote control function for a short time, and a failed earth sensor can be replaced by a redundant unit or a star sensor without affecting the normal operation of the navigation function. A satellite has more than one hundred pieces of equipment and tens of thousands of components, so a comprehensive analysis of the relationship between all of them and satellite outage would consume a lot of resources. It is therefore necessary first to determine the scope of the outage recovery time analysis in a simple and effective way, that is, to identify the equipment and components related to navigation signal outage.
The availability engineering of ESA (European Space Agency) proposes an outage analysis method similar to FMEA [8], which lists possible outage events for satellite equipment and then analyses their causes and effects. This method needs to cover all equipment and costs a lot of manpower, time and other resources. To improve analysis efficiency and save cost, correlation analysis based on function analysis and information flow analysis can be carried out to narrow the analysis scope quickly from top to bottom. The analysis steps are as follows:
a. Obtain the composition, function and redundancy design information of all subsystems;
b. Analyse the relationship between each subsystem and the navigation signals, and clarify the effects of outages;
c. For each subsystem that may lead to a satellite outage, further analyse the relationship between each piece of equipment and the navigation signal, and determine the underlying units that may lead to a satellite outage.
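The top-down narrowing in steps a to c can be sketched as a simple filter. The following is a minimal illustration under stated assumptions: the class names, the two boolean flags and the small inventory are hypothetical and stand in for the results of function and information flow analysis; they are not data from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Equipment:
    name: str
    affects_signal: bool  # hypothetical result of information flow analysis
    # True if a redundant unit takes over without interrupting the signal
    seamless_redundancy: bool = False


@dataclass
class Subsystem:
    name: str
    affects_signal: bool  # hypothetical result of step b
    equipment: list[Equipment] = field(default_factory=list)


def outage_candidates(subsystems):
    """Steps b and c: keep only equipment that can interrupt the navigation signal."""
    candidates = []
    for sub in subsystems:
        if not sub.affects_signal:  # step b: prune the whole subsystem
            continue
        for eq in sub.equipment:  # step c: examine each unit in the subsystem
            if eq.affects_signal and not eq.seamless_redundancy:
                candidates.append((sub.name, eq.name))
    return candidates


# Step a: hypothetical inventory, for illustration only
subsystems = [
    Subsystem("navigation payload", True, [
        Equipment("signal generator", True),
        Equipment("payload thermistor", False),
    ]),
    Subsystem("attitude control", True, [
        Equipment("earth sensor", True, seamless_redundancy=True),
    ]),
    Subsystem("thermal control", False, [
        Equipment("heater", False),
    ]),
]

print(outage_candidates(subsystems))
```

Pruning at subsystem level before looking at individual units is what keeps the effort below that of an exhaustive FMEA-style pass over all equipment.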
To quickly determine the relationship between subsystems, equipment and satellite outage, Table 1 gives some reference clues. On this basis, the FTA method can be used to obtain the list of equipment and parts that may cause a satellite outage, taking "navigation satellite service outage" as the top event and using the results of the outage analysis.
Suppose that there are $n$ pieces of equipment and that equipment $i$ has $n_i$ failure modes that cause a navigation satellite service outage. Each failure mode has $m$ disposal strategies, and the recovery time corresponding to each disposal strategy can be divided into $k$ time units. Assuming that in any outage event a failure mode adopts a given disposal strategy with a probability between 0 and 1, the on-board recovery time corresponding to the failure mode is the probability-weighted sum of the recovery times of its disposal strategies, each of which is the sum of its time units. The average on-board recovery time $T_{rs}$ is then obtained by weighting these recovery times by failure rate, where $\lambda_i$ is the failure rate of equipment $i$, which usually comes from reliability prediction or equipment reliability assessment results.
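One reading of this weighting that is consistent with the definitions above can be written as follows. The symbols $p_{ijl}$ (probability that failure mode $j$ of equipment $i$ is handled with strategy $l$), $t_{ijlq}$ (duration of the $q$-th time unit of that strategy), $T_{ij}$ and $\alpha_{ij}$ (share of $\lambda_i$ attributed to failure mode $j$) are assumptions introduced for this sketch, and the paper's own equations may differ in form:

```latex
T_{ij} = \sum_{l=1}^{m} p_{ijl} \sum_{q=1}^{k} t_{ijlq},
\qquad \sum_{l=1}^{m} p_{ijl} = 1

T_{rs} = \frac{\sum_{i=1}^{n} \lambda_i \sum_{j=1}^{n_i} \alpha_{ij} T_{ij}}
              {\sum_{i=1}^{n} \lambda_i},
\qquad \sum_{j=1}^{n_i} \alpha_{ij} = 1
```

Under this reading, equipment that fails more often contributes proportionally more to the average on-board recovery time.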

Establishment of simulation logic
The occurrence of a navigation satellite outage is random, and the recovery process is uncertain because failure causes differ. Even for the same failure mode, the recovery time differs from one outage to the next, because the implementation process is closely tied to the failure state and to ground operation. When the detection and location times are also considered, differences in satellite transit time, management delay and support resources make the recovery time of each outage even more uncertain. Therefore, to obtain a more accurate recovery time, Monte Carlo simulation is needed to account for the randomness of the recovery process.
The typical recovery time simulation logic is shown in Figure 3.
where $T_{ri}$ is the recovery time in the $i$-th simulation.
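A Monte Carlo estimate of this kind can be sketched in a few lines. The example below is a minimal illustration, not the simulation logic of Figure 3: the failure modes, failure rates, strategy probabilities and time ranges are hypothetical, and the uniform distribution of strategy durations is an assumption. It draws one recovery time per simulated outage and averages the draws.

```python
import random

# Hypothetical inputs, for illustration only (not the paper's Table 2).
# Each failure mode maps to (failure rate per hour,
# [(strategy probability, min minutes, max minutes), ...]).
FAILURE_MODES = {
    "processor upset": (2.0e-4, [(0.8, 2.0, 4.0), (0.2, 8.0, 12.0)]),
    "clock anomaly": (5.0e-5, [(1.0, 10.0, 20.0)]),
}


def sample_recovery_time(rng):
    """Draw one on-board recovery time in minutes for a single outage."""
    names = list(FAILURE_MODES)
    rates = [FAILURE_MODES[name][0] for name in names]
    # The failing mode is drawn in proportion to its failure rate.
    mode = rng.choices(names, weights=rates, k=1)[0]
    strategies = FAILURE_MODES[mode][1]
    probs = [s[0] for s in strategies]
    _, low, high = rng.choices(strategies, weights=probs, k=1)[0]
    # Uniform duration is an assumption; any distribution could be used here.
    return rng.uniform(low, high)


def mean_recovery_time(n_runs, seed=0):
    """Average the sampled recovery times over n_runs simulated outages."""
    rng = random.Random(seed)
    samples = [sample_recovery_time(rng) for _ in range(n_runs)]
    return sum(samples) / n_runs


def analytic_mean():
    """Failure-rate-weighted mean, using the midpoint of each uniform range."""
    total_rate = sum(rate for rate, _ in FAILURE_MODES.values())
    weighted = 0.0
    for rate, strategies in FAILURE_MODES.values():
        mode_mean = sum(p * (lo + hi) / 2 for p, lo, hi in strategies)
        weighted += rate * mode_mean
    return weighted / total_rate


if __name__ == "__main__":
    print(f"simulated: {mean_recovery_time(100_000):.2f} min")
    print(f"analytic:  {analytic_mean():.2f} min")
```

With enough runs the simulated mean should approach the failure-rate-weighted value, which gives a quick consistency check between the weighting approach and the simulation.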

Background
The basic mission of a navigation satellite is to receive the navigation message and send the navigation signal to the ground system and users. The analysis finds three pieces of equipment and five failure modes that can cause an outage of the satellite signal. The relevant information is shown in Table 2.

Simulation model of recovery time
Table 2 shows that the satellite has two types of failure modes and three typical recovery strategies. Because the occurrence of an outage, the implementation of recovery strategies and state recovery form a typical state transition process, Simulink is used for modelling.
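The state transition view can be illustrated without Simulink. The sketch below swaps in a plain Python state machine for the Simulink model and does not reproduce it: the three states, the outage rate and the time ranges are hypothetical, outages are assumed to arrive with exponentially distributed intervals, and ground delays are left out.

```python
import random
from enum import Enum, auto


class State(Enum):
    NORMAL = auto()
    FAULT_HANDLING = auto()
    STATE_RECOVERY = auto()


# Hypothetical parameters, for illustration only.
OUTAGE_RATE_PER_HOUR = 1.0e-3  # assumed rate of outage-causing failures
HANDLING_MIN = (1.0, 5.0)      # assumed fault handling time range, minutes
RECOVERY_MIN = (2.0, 6.0)      # assumed state recovery time range, minutes
HOURS_PER_YEAR = 8760.0


def simulate(duration_h=HOURS_PER_YEAR, seed=1):
    """Walk the state machine over the mission time.

    Returns the on-board recovery time, in minutes, of every outage
    that starts before duration_h.
    """
    rng = random.Random(seed)
    clock_h = 0.0
    state = State.NORMAL
    outage_start_h = 0.0
    recovery_times = []
    while True:
        if state is State.NORMAL:
            # Time to the next outage is drawn from an exponential distribution.
            clock_h += rng.expovariate(OUTAGE_RATE_PER_HOUR)
            if clock_h >= duration_h:
                break
            outage_start_h = clock_h
            state = State.FAULT_HANDLING
        elif state is State.FAULT_HANDLING:
            clock_h += rng.uniform(*HANDLING_MIN) / 60.0
            state = State.STATE_RECOVERY
        else:  # State.STATE_RECOVERY
            clock_h += rng.uniform(*RECOVERY_MIN) / 60.0
            recovery_times.append((clock_h - outage_start_h) * 60.0)
            state = State.NORMAL
    return recovery_times


if __name__ == "__main__":
    times = simulate()
    if times:
        print(f"{len(times)} outages, mean recovery {sum(times) / len(times):.2f} min")
    else:
        print("no outage occurred in the simulated period")
```

Each pass through the fault handling and state recovery states yields one recovery time sample, so running the machine over the mission time collects the samples that are then averaged.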
The basic structure of the navigation satellite recovery process model is shown in Figure 4, and the recovery process model of the processing unit is shown in Figure 5.
Using the data in Table 2 and setting the simulation time to 1 year, the simulated average recovery time is 7.01 min.
According to the results of ground tests and of on-orbit outage handling of navigation satellites, the recovery time of the satellite itself is within 10 minutes. The simulated value of 7.01 min falls within this bound and is consistent with the actual test results.

Conclusion
Analyzing the outage requirements and carrying out quantitative availability design is the basic way to ensure the long-term stability and continuous availability of the navigation satellite system. Outage recovery time is an important factor affecting the availability of a satellite navigation system. This paper reviews the outage recovery process, proposes an analysis method for outage recovery time, and presents a simulation case. The main conclusions are as follows:
(1) The recovery time of a satellite outage can be divided into fault detection time, fault location time, fault handling (troubleshooting) time, state recovery time and time of access to the navigation system. Fault handling time and state recovery time constitute the on-board recovery time and are the focus of analysis.
(2) To improve the efficiency of analysis, the correlation analysis method can be used to quickly locate the equipment that may cause an outage.
(3) The on-board recovery time takes the recovery time of each recovery strategy for each failure mode as input and is obtained by weighting these recovery times by failure rate.
(4) Considering the randomness of the recovery process, it is necessary to establish a Monte Carlo simulation model for simulation analysis.