An approximate optimal control method for 6-DOF spacecraft

In this paper, an approximate optimal control method based on adaptive dynamic programming (ADP) is proposed for six-degree-of-freedom (6-DOF) relative motion control of spacecraft in proximity operations. First, the dynamic model is normalized and its dimension is reduced to avoid excessive gradient changes in the neural network. Then, a nonlinear disturbance observer is designed to handle disturbances. Finally, following the reinforcement learning idea of ADP, a single-critic neural network is constructed to obtain the approximate optimal control strategy that minimizes the cost function. The simulation results show that the designed controller achieves the attitude and orbit maneuvering targets with approximately optimal overall energy consumption.


Introduction
With the development of space technology, there is an increasing demand for close-range operations using spacecraft, such as the observation of non-cooperative targets and spacecraft formation flying. The coupling between spacecraft attitude and orbit makes controller design challenging.
At present, sliding mode control [1], backstepping control [2], and model predictive control [3] are widely used for 6-DOF spacecraft control. However, sliding mode control produces high-frequency chattering when switching across the sliding surface; backstepping control is prone to the so-called 'differential explosion' caused by repeatedly differentiating virtual control laws; and model predictive control carries a heavy computational burden.
Adaptive dynamic programming (ADP) is an intelligent control method that integrates reinforcement learning, dynamic programming, and neural networks. It obtains an approximate optimal control strategy by approximately solving the Hamilton-Jacobi-Bellman (HJB) equation [4]. ADP has been widely applied and is well suited to complex nonlinear control problems.
In recent years, ADP-based control strategies have been studied in the aerospace field. Reference [5] studies the attitude control problem with pointing constraints based on ADP. Reference [6] uses ADP to study the integrated guidance and control problem with impact angle and field-of-view constraints. Reference [7] combines ADP with event-triggered control to solve the optimal tracking control problem for a leader-follower spacecraft formation flying system.
In this paper, an ADP-based control method is proposed for 6-DOF spacecraft, in which normalization and dimension reduction prevent excessive rates of change in the neural network weights. Combined with a nonlinear disturbance observer that handles the disturbance, the ADP-based controller minimizes the cost function while achieving the control target.

Spacecraft attitude dynamics
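Modified Rodrigues parameters (MRPs) are used to describe the attitude of the spacecraft. The attitude kinematics and dynamics of the spacecraft are described by [8]:

$$\dot{\sigma} = \frac{1}{4}\left[\left(1-\sigma^{T}\sigma\right)I_3 + 2S(\sigma) + 2\sigma\sigma^{T}\right]\omega$$

$$J\dot{\omega} = -S(\omega)J\omega + \tau + \tau_d$$

where $\sigma = [\sigma_1, \sigma_2, \sigma_3]^T$ denotes the attitude of the body-fixed frame relative to the Earth-centered inertial (ECI) frame, $\sigma_1$, $\sigma_2$, $\sigma_3$ are its components, and $\omega$ is the angular velocity of the body-fixed frame relative to the ECI frame, expressed in the body-fixed frame. $J$ denotes the inertia matrix of the spacecraft. $\tau$ and $\tau_d$ represent the control torque and the external disturbance torque of the spacecraft, respectively. $I_n$ denotes the $n \times n$ identity matrix. $S(x)$ is the cross-product matrix of $x = [x_1, x_2, x_3]^T$, given by:

$$S(x) = \begin{bmatrix} 0 & -x_3 & x_2 \\ x_3 & 0 & -x_1 \\ -x_2 & x_1 & 0 \end{bmatrix}$$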
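The attitude model above can be evaluated numerically as a state-derivative function. The Python sketch below is illustrative only: it assumes the standard MRP kinematics and Euler rotational dynamics written above, and the function names `skew`, `mrp_kinematics_matrix`, and `attitude_rhs` are hypothetical, not taken from the paper.

```python
import numpy as np

def skew(x):
    """Cross-product matrix S(x), so that skew(x) @ y equals np.cross(x, y)."""
    return np.array([[0.0, -x[2], x[1]],
                     [x[2], 0.0, -x[0]],
                     [-x[1], x[0], 0.0]])

def mrp_kinematics_matrix(sigma):
    """G(sigma) in sigma_dot = G(sigma) @ omega (standard MRP kinematics)."""
    s2 = sigma @ sigma
    return 0.25 * ((1.0 - s2) * np.eye(3) + 2.0 * skew(sigma) + 2.0 * np.outer(sigma, sigma))

def attitude_rhs(sigma, omega, J, tau, tau_d):
    """Return (sigma_dot, omega_dot) for J omega_dot = -S(omega) J omega + tau + tau_d."""
    sigma_dot = mrp_kinematics_matrix(sigma) @ omega
    omega_dot = np.linalg.solve(J, -skew(omega) @ J @ omega + tau + tau_d)
    return sigma_dot, omega_dot
```

For a symmetric inertia $J = cI_3$ with no applied torque, $S(\omega)J\omega = c\,\omega \times \omega = 0$, so the sketch returns $\dot\omega = 0$, which is a quick sanity check.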

Spacecraft relative translation dynamics
Assume that the distance between the tracking spacecraft and the reference spacecraft is much smaller than the radius of the Earth. In the local-vertical-local-horizontal (LVLH) frame, the relative motion dynamics of the tracking spacecraft with respect to the reference spacecraft are described by [9]. In these dynamics, the position vector points from the reference spacecraft to the tracking spacecraft, $m$ represents the mass of the spacecraft, $R$ denotes the distance between the center of the Earth and the reference spacecraft, the gravitational parameter of the Earth and the true anomaly of the reference spacecraft appear in the orbital terms, $e$ denotes the orbital eccentricity, $a$ denotes the semi-major axis, and $f$ and $f_d$ represent the control force and the external disturbance force, respectively.
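Because the relative translation equation from [9] is not reproduced here, the following Python sketch assumes the standard nonlinear relative motion model in LVLH coordinates (x radial, y along-track, z along the orbit normal) for a Keplerian reference orbit, with $p = a(1-e^2)$, $R = p/(1+e\cos\theta)$, and $\dot\theta = \sqrt{\mu/p^3}\,(1+e\cos\theta)^2$. The symbols $\theta$ (true anomaly) and $\mu$ (gravitational parameter) and all function names are assumptions. Linearizing this model for a separation much smaller than $R$ gives the usual linearized elliptic-orbit equations, which may be closer to the form used in the paper.

```python
import numpy as np

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter [m^3/s^2]

def orbit_rates(theta, a, e, mu=MU_EARTH):
    """Radius, true-anomaly rate and true-anomaly acceleration of a Keplerian reference orbit."""
    p = a * (1.0 - e**2)                      # semi-latus rectum
    k = 1.0 + e * np.cos(theta)
    R = p / k
    theta_dot = np.sqrt(mu / p**3) * k**2
    theta_ddot = -2.0 * e * np.sin(theta) * theta_dot**2 / k
    return R, theta_dot, theta_ddot

def relative_accel(rho, rho_dot, theta, a, e, m, f, f_d, mu=MU_EARTH):
    """Assumed nonlinear relative acceleration in LVLH (x radial, y along-track, z normal)."""
    x, y, z = rho
    vx, vy, _ = rho_dot
    R, wd, wdd = orbit_rates(theta, a, e, mu)
    r3 = ((R + x)**2 + y**2 + z**2) ** 1.5   # cube of the tracking spacecraft's orbit radius
    acc = (np.asarray(f, dtype=float) + np.asarray(f_d, dtype=float)) / m
    ax = 2*wd*vy + wdd*y + wd**2*x - mu*(R + x)/r3 + mu/R**2 + acc[0]
    ay = -2*wd*vx - wdd*x + wd**2*y - mu*y/r3 + acc[1]
    az = -mu*z/r3 + acc[2]
    return np.array([ax, ay, az])
```

With zero separation, zero relative velocity, and no force, the differential gravity and frame terms cancel and the relative acceleration is zero; an applied force then enters simply as $f/m$.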

Position-normalized 6-DOF spacecraft dynamics
Using these dynamics directly would cause the gradients of the neural network to vary excessively. The dynamics model is therefore normalized so that the attitude and position states have the same order of magnitude. A maximum distance between the reference spacecraft and the tracking spacecraft is defined, and its value is chosen by the designer. The 6-DOF spacecraft dynamics are then rewritten in normalized form, where $R_b^L$ denotes the rotation matrix from the body-fixed frame to the LVLH frame.
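A minimal sketch of the position normalization, assuming the position and velocity are simply divided by the designer-chosen maximum distance (written `RHO_MAX` here; the name and its value are hypothetical) and stacked with the MRPs into one 6-vector pose. Under this scaling, the control force enters the normalized translational equation divided by $m$ times the maximum distance.

```python
import numpy as np

RHO_MAX = 100.0  # hypothetical maximum relative distance [m], chosen by the designer

def normalized_pose(sigma, rho, rho_max=RHO_MAX):
    """Stack MRPs and position scaled by rho_max into a 6-vector q = [sigma; rho / rho_max]."""
    return np.concatenate([np.asarray(sigma, dtype=float), np.asarray(rho, dtype=float) / rho_max])

def normalized_pose_rate(sigma_dot, rho_dot, rho_max=RHO_MAX):
    """Time derivative of q: the same scaling applied to the relative velocity."""
    return np.concatenate([np.asarray(sigma_dot, dtype=float), np.asarray(rho_dot, dtype=float) / rho_max])

def normalized_force_accel(f, m, rho_max=RHO_MAX):
    """Contribution of the force f to the normalized acceleration: f / (m * rho_max)."""
    return np.asarray(f, dtype=float) / (m * rho_max)
```

For example, a relative position of $[50, 0, -100]^T$ m becomes $[0.5, 0, -1]^T$, the same order as the MRP components it is stacked with.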

Dimension reduction of the dynamics
First, an intermediate variable is defined to reduce the system dimension and, in turn, the number of neurons required in the controller design.

In Eq. (5), $q_d$ and $\dot{q}_d$ denote the desired states and their derivative, and a positive constant is chosen to ensure the stability of the system. When the intermediate variable approaches 0, the system reaches the maneuvering target. Differentiating Eq. (5) yields the dynamics of the intermediate variable.
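The definition in Eq. (5) is not reproduced here. A common choice consistent with the description (desired states, their derivative, and a positive stabilizing constant) is the sliding-type combination $s = (\dot q - \dot q_d) + \lambda(q - q_d)$; the Python sketch below assumes that form, with $\lambda$ and all names hypothetical. It shows how one 6-vector replaces the 12 tracking errors, and why $s \to 0$ implies $q \to q_d$: with $s = 0$ the error $e = q - q_d$ obeys $\dot e = -\lambda e$.

```python
import numpy as np

LAMBDA = 0.5  # hypothetical positive constant (lambda in the assumed form)

def intermediate_variable(q, q_dot, q_d, q_d_dot, lam=LAMBDA):
    """Assumed form s = (q_dot - q_d_dot) + lam * (q - q_d); q stacks MRPs and normalized position."""
    return (np.asarray(q_dot, dtype=float) - q_d_dot) + lam * (np.asarray(q, dtype=float) - q_d)

def error_with_s_zero(e0, lam=LAMBDA, t_end=20.0, dt=0.01):
    """Forward-Euler integration of e_dot = -lam * e, the error dynamics implied by s = 0."""
    e = np.asarray(e0, dtype=float).copy()
    for _ in range(int(round(t_end / dt))):
        e = e + dt * (-lam * e)
    return e
```

The design choice is that a critic trained on the 6-dimensional $s$ needs far fewer basis functions than one trained on the full 12-dimensional state.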

Design of disturbance observer
Based on Eq. (6), a nonlinear disturbance observer is introduced to handle continuous disturbances acting on the system, in which $z$ is an auxiliary variable, $l$ is a positive constant, and $\hat{d}$ is the estimate of the disturbance $d$.
Assumption 1. The disturbance $d$ is continuous and bounded, with $\|d\| \le d_{1\max}$ and $\|\dot{d}\| \le d_{2\max}$, where $d_{1\max}$ and $d_{2\max}$ are positive numbers. Then, the control component that compensates for the disturbance is designed using the estimate $\hat{d}$.
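Since the observer equation is not reproduced here, the following Python sketch assumes a widely used nonlinear disturbance observer form for dynamics $\dot s = F(s) + Gu + d$: $\hat d = z + ls$ and $\dot z = -lz - l\,(F(s) + Gu + ls)$, which gives $\dot{\hat d} = l\,(d - \hat d)$. The drift $F$, input matrix $G$, disturbance signal, and gain values are hypothetical test choices that satisfy Assumption 1, read here as bounds on $\|d\|$ and $\|\dot d\|$.

```python
import numpy as np

N = 6            # dimension of the intermediate variable (hypothetical)
G = np.eye(N)    # hypothetical input matrix

def drift(s):
    """Hypothetical known drift F(s)."""
    return -s

def disturbance(t):
    """Hypothetical disturbance with |d| <= 0.2 and |d_dot| <= 0.02 per component."""
    return 0.2 * np.sin(0.1 * t) * np.ones(N)

def simulate_ndo(l=5.0, dt=1e-3, t_end=20.0):
    """Forward-Euler simulation of plant and observer; returns the largest
    per-component estimation error over the second half of the run."""
    s = np.full(N, 0.3)
    z = -l * s                 # initial z chosen so that d_hat(0) = z + l*s = 0
    u = np.zeros(N)            # the observer works with any known input; zero here
    worst_late = 0.0
    for k in range(int(round(t_end / dt))):
        t = k * dt
        d_hat = z + l * s
        if t >= 0.5 * t_end:
            worst_late = max(worst_late, np.max(np.abs(disturbance(t) - d_hat)))
        z_dot = -l * z - l * (drift(s) + G @ u + l * s)
        s_dot = drift(s) + G @ u + disturbance(t)
        z, s = z + dt * z_dot, s + dt * s_dot
    return worst_late
```

In this run the late-time error stays below $d_{2\max}/l = 0.02/5 = 0.004$, and a larger gain $l$ shrinks it further; in practice a high gain also tends to amplify measurement noise, which is the usual trade-off.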

Design of control method
The dynamic equation of the 6-DOF spacecraft used for the optimal control design is then written with the disturbance $d$ handled by the disturbance observer.
A cost function is defined in which $Q$ and $R$ are positive-definite symmetric weight matrices, and the corresponding Hamiltonian is then defined. Solving the HJB equation gives the optimal control law $u^*$, and substituting Eq. (12) into Eq. (11) yields the HJB equation in terms of the optimal cost function. With the help of a neural network (NN), the optimal cost function and its gradient are expressed through an ideal weight vector, a nonlinear NN activation function, and the NN reconstruction error, and the Hamiltonian is rewritten accordingly.
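For a control-affine model $\dot s = F(s) + Gu$ with cost integrand $s^TQs + u^TRu$, setting $\partial H/\partial u = 0$ gives the familiar $u^* = -\tfrac{1}{2}R^{-1}G^T\nabla V^*$. The Python sketch below assumes this form and a critic $\hat V = \hat W^T\phi(s)$ with a hypothetical quadratic-monomial activation $\phi$; the paper's activation is not reproduced, so the basis and all names are illustrative only.

```python
import numpy as np
from itertools import combinations_with_replacement

def quad_basis(s):
    """Hypothetical critic activation phi(s): all monomials s_i * s_j with i <= j."""
    idx = combinations_with_replacement(range(len(s)), 2)
    return np.array([s[i] * s[j] for i, j in idx])

def quad_basis_jacobian(s):
    """Jacobian d phi / d s with shape (number of basis terms, len(s))."""
    n = len(s)
    idx = list(combinations_with_replacement(range(n), 2))
    jac = np.zeros((len(idx), n))
    for k, (i, j) in enumerate(idx):
        jac[k, i] += s[j]
        jac[k, j] += s[i]     # for i == j this accumulates 2 * s_i
    return jac

def critic_control(W_hat, s, G, R):
    """u_hat = -1/2 R^{-1} G^T (d phi / d s)^T W_hat, the critic-based control."""
    grad_V = quad_basis_jacobian(s).T @ W_hat
    return -0.5 * np.linalg.solve(R, G.T @ grad_V)
```

For a 6-dimensional $s$ this basis has 21 terms rather than the 18 weights used in the paper, so it should be read as a stand-in that shows the structure of the control law, not the paper's network.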
Replacing the ideal weights with their estimates gives the estimated optimal cost function and its gradient, from which the approximately optimal control is obtained. The approximate Hamiltonian is then written in terms of the estimated weights, and the critic weights are updated by an adaptive law based on gradient descent that minimizes the squared approximate Hamiltonian.
Assumption 2. Assume that there exists a positive-definite matrix such that:
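One common single-critic tuning law, based on normalized gradient descent, minimizes $E = \tfrac{1}{2}e^2$ with residual $e = \hat W^T\bar\sigma + s^TQs + \hat u^TR\hat u$ and $\bar\sigma = \nabla\phi(s)\,(F + G\hat u)$, using $\dot{\hat W} = -\alpha\,\bar\sigma e/(1 + \bar\sigma^T\bar\sigma)^2$. The paper's exact update law is not reproduced, so this Python sketch is an assumed stand-in; the names and the learning rate $\alpha$ are hypothetical. The activation Jacobian is passed in, so any basis can be used.

```python
import numpy as np

def hamiltonian_residual(W_hat, grad_phi, s, F, G, Q, R):
    """e = W_hat^T sigma_bar + s^T Q s + u_hat^T R u_hat, with u_hat taken from the critic.
    grad_phi is the activation Jacobian at s; F is the drift evaluated at s."""
    u_hat = -0.5 * np.linalg.solve(R, G.T @ (grad_phi.T @ W_hat))
    sigma_bar = grad_phi @ (F + G @ u_hat)
    e = W_hat @ sigma_bar + s @ Q @ s + u_hat @ R @ u_hat
    return e, sigma_bar

def critic_step(W_hat, grad_phi, s, F, G, Q, R, alpha=1.0, dt=1e-2):
    """One forward-Euler step of the normalized gradient-descent law on E = e^2 / 2."""
    e, sigma_bar = hamiltonian_residual(W_hat, grad_phi, s, F, G, Q, R)
    W_dot = -alpha * sigma_bar / (1.0 + sigma_bar @ sigma_bar) ** 2 * e
    return W_hat + dt * W_dot
```

On the scalar test problem $\dot s = -s + u$ with $Q = R = 1$ and $\phi(s) = s^2$, the algebraic Riccati equation $W^2 + 2W - 1 = 0$ gives $W^* = \sqrt{2} - 1$, where the residual vanishes, and repeated steps from $\hat W = 1$ at $s = 1$ converge to it. With more weights, a single fixed $s$ is no longer enough and the state must be sufficiently exciting.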

Stability of disturbance observer
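Define the disturbance estimation error as in Eq. (22). Taking the time derivative of Eq. (22) and substituting Eq. (7) gives Eq. (23). Differentiating Eq. (24) and substituting Eq. (23) then shows that, after a sufficiently long time, the norm of the estimation error is bounded by a constant determined by the observer gain $l$ and the disturbance bounds in Assumption 1.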
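The bound can be made explicit under an assumed observer form, since Eqs. (7) and (22)-(24) are not reproduced here. Assuming $\dot s = F(s) + Gu + d$, $\hat d = z + ls$, $\dot z = -lz - l\,(F(s) + Gu + ls)$, and reading Assumption 1 as $\|\dot d\| \le d_{2\max}$, the error $\tilde d = d - \hat d$ evolves as follows:

$$\dot{\hat d} = \dot z + l\dot s = -lz - l\big(F(s) + Gu + ls\big) + l\big(F(s) + Gu + d\big) = l\,(d - \hat d)$$

$$\dot{\tilde d} = \dot d - \dot{\hat d} = \dot d - l\,\tilde d, \qquad V_d = \tfrac{1}{2}\tilde d^T\tilde d \;\Rightarrow\; \dot V_d = \tilde d^T\dot d - l\|\tilde d\|^2 \le -\|\tilde d\|\big(l\|\tilde d\| - d_{2\max}\big)$$

$$\tilde d(t) = e^{-lt}\tilde d(0) + \int_0^t e^{-l(t-\tau)}\dot d(\tau)\,d\tau \;\Rightarrow\; \|\tilde d(t)\| \le e^{-lt}\|\tilde d(0)\| + \frac{d_{2\max}}{l}\big(1 - e^{-lt}\big)$$

Under these assumptions, $\limsup_{t\to\infty}\|\tilde d(t)\| \le d_{2\max}/l$, so a larger observer gain tightens the ultimate bound.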

Stability of ADP
Define the weight estimation error as the difference between the ideal critic weights and their estimates. The estimated Hamiltonian can then be rewritten in terms of this error. A Lyapunov function, Eq. (30), is defined with positive weighting constants. Differentiating Eq. (30) and choosing positive constants so that the required coefficients are positive, the derivative of the Lyapunov function is negative whenever either of two conditions is satisfied. Adding and subtracting an auxiliary term on the right-hand side leads to a second pair of conditions, either of which again makes the derivative negative. According to the standard Lyapunov extension [10], with appropriately chosen parameters, the intermediate variable and the weight estimation error are uniformly ultimately bounded (UUB).

Numerical simulations
The simulation parameters include the semi-major axis $a$ of the reference orbit, the spacecraft mass $m = 50$ kg, and the inertia matrix $J$ of the spacecraft. The initial NN weights are set to $\hat{W}_0 = 0.001 \cdot \mathbf{1}_{18}$, where $\mathbf{1}_{18}$ is an 18-dimensional column vector of ones. The controller parameters, the output range of the actuators, and the disturbance are also specified. For comparison, a sliding mode control (SMC) law is used as the baseline controller. Figures 1-5 show that the proposed ADP-based controller and the SMC both complete the maneuvering target at nearly the same time, at about 250 s; however, the proposed controller consumes less energy in both the attitude and position maneuvers. Figure 6 shows that the NN weights change slowly at first, because actuator saturation causes a large deviation between the computed control law and the control actually applied, but the weights eventually converge.

Conclusions
In this paper, an approximate optimal control strategy based on ADP is designed for 6-DOF spacecraft. Through normalization and dimension reduction, excessive gradients of the neural network weights are avoided and the number of neurons is reduced. The cost function is approximately minimized while ensuring that the maneuvering target is achieved, and the stability of the proposed controller is proved using Lyapunov stability theory. The simulation results show that the proposed method consumes less energy than the SMC baseline and achieves approximately optimal control.