Improving resilience of sensors in planetary exploration using data-driven models

Improving the resilience of sensor systems in space exploration is a key objective, since the environmental conditions to which they are exposed are very harsh. For example, flying debris and dust devils on the Martian surface are known to partially damage sensors on rovers and landers. The objective of this work is to show how data-driven methods can improve sensor resilience, particularly in the case of complex sensors with multiple intermediate variables feeding an inverse algorithm (IA) based on calibration data. The method comprises three phases: an initial phase, in which the sensor is calibrated in the laboratory and an IA is designed; a second phase, in which the sensor is placed at its intended location and sensor data are used to train a data-driven model; and a third phase, once the model has been trained and partial damage is detected, in which the data-driven algorithm reduces the resulting errors. The proposed method is tested with the intermediate data of the wind sensor of the TWINS instrument (NASA InSight mission), which consists of two booms placed on the deck of the lander, with three boards per boom. Wind speed and angle are recovered from the intermediate variables provided by the sensor and predicted by the proposed method. A comparative analysis of various data-driven methods, including machine learning and deep learning (DL) methods, is carried out. It is shown that even a simple method such as k-nearest neighbors can successfully recover the missing data of a board, compared to complex DL models. Depending on the missing board, errors are reduced by a factor between 2.43 and 4.78 for horizontal velocity, and by a factor between 1.74 and 4.71 for angle, compared with using only the two remaining boards.


Introduction
Sensor reliability is one of the key properties of sensors and sensor networks. This is particularly true in the case of space exploration, since the effect of missing data can strongly affect the science output, and, at the same time, replacing or repairing a damaged sensor is usually not possible. The case example studied in this paper is the wind sensor of the TWINS instrument of the NASA InSight mission, launched in 2018 [1,2]. Even though this wind sensor has not suffered any damage on the Martian surface, other wind sensors have suffered partial damage. This is the case, for example, of the MEDA wind sensor [3], which has been partially damaged on Mars by flying debris generated by dust devils [4]. Therefore, the use of techniques to improve sensor resilience in space exploration (and other fields) is a key research objective.
Atmospheric monitoring on Mars plays an important role in developing scientific knowledge of the planet. It involves various sensors that characterize the environmental dynamics (air temperature, pressure, sky opacity, wind velocity, etc.). Among these, wind sensors have been considered key elements for the exploration of the atmosphere [5,6]. Thus, it is crucial to receive sensor data reliably, which implies uninterrupted monitoring of wind activity from the received data.
With the increased use of sensors, sensor problems have also become common in different domains such as industrial condition monitoring, traffic monitoring, marine systems, and healthcare systems [7][8][9]. The possible reasons behind data discontinuities, or stoppages, include network outages, power issues, and sensor failures. In particular, sensors used in space exploration missions operate under harsh environmental conditions [10]. Considering their importance in space exploration, sensors must be resilient against such scenarios, which can otherwise negatively affect the missions.
Therefore, various methods have been applied to recover the data of a sensor in the case of long discontinuities. Data-driven methods come into play to provide data estimations for an unavailable sensor using data from other sensors. The estimation of dependent variables using independent variables is known as a regression problem [11]. Various data-driven models have been used to develop soft or virtual sensors in different applications [12][13][14][15]. Among these data-driven models, machine learning (ML) and deep learning (DL) models have emerged as promising solutions for implementing soft sensors or predicting time-series data. These models solve regression problems through training processes that map inputs to outputs. Data-driven methods provide reliable estimations by modeling other process variables; however, it is difficult for them to accurately model data with complex characteristics [12]. Fortunately, ML and DL methods have been surpassing the statistical and signal processing methods investigated for complex data analysis. Moreover, DL methods can learn rich features from non-linear data owing to their multilayered and hierarchical structure. The key features of DL models, such as high accuracy, self-feature learning, and the capability of working with raw/noisy data, have been capturing the attention of data scientists and researchers [16][17][18].
In this paper, a comparative analysis of various ML and DL methods is carried out for making complex sensors resilient against long data discontinuities. The comparative analysis among data-driven models such as polynomial regression (PR), k-nearest neighbors (KNN), random forest (RF), extreme gradient boosting (XGBoost), multilayer perceptron (MLP), long short-term memory (LSTM), and bidirectional LSTM (Bi-LSTM) is conducted for implementing a soft wind sensing board. These ML and DL algorithms have been widely explored for similar problems with different model configurations. In fact, the performance of these algorithms not only relies on their hyperparameters but also depends on the nature of the data. Depending on data complexity, a simple model can sometimes yield better results than a complex one [19,20]. Thus, all the aforementioned models have been evaluated in the comparative analysis of this work.
Various data-driven methods have been used for sensor hysteresis compensation [21], sensor variable estimation and localization [22,23], sensor output classification [24], sensor calibration [13], and sensor fault diagnosis [25]. Considering the missing data imputation problem, multiple studies have been conducted in different domains using data-driven approaches for imputing multivariate or univariate missing data from relevant input features. A recent study [26] investigated multiple methods for imputing missing environmental data. The methods, including missForest (MF), multivariate imputation by chained equations (MICE), and KNN, were employed to address the random missing data problem. To conduct the study, the authors artificially induced missing data in the environmental databases at varying percentages. The experiments were conducted on ten different pre-processed datasets, and the results demonstrated that the MF model outperformed the MICE and KNN models: MF reduced the imputation errors significantly, while KNN required less processing time than the other two models. Moreover, the researchers conducted a real case study on the performance monitoring station of a Quebec wastewater plant, which showed the effectiveness of the proposed method in addressing the missing data problem in diverse environmental databases. Another relevant investigation [27] experimented with five data imputation methods on a US Credit Bureau dataset. After pre-processing, the dataset included 329,917 data samples and 153 variables, and different percentages of data were removed to establish the datasets for the proposed methods. The methods were Amelia, MICE, mi, Hmisc, and MF. Their performance was evaluated using the root mean squared error (RMSE) against the conventional mean-based imputation method.
Based on the achieved results, the authors concluded that MF yielded the best performance and the mi model the worst; moreover, the choice of imputation model was not affected by the missing-data percentage. More recently, the authors of [28] explored the potential of diffusion models for imputing both categorical and numerical missing variables. The proposed method was applied to seven different open-source benchmarking datasets. The results demonstrated the effectiveness of the conditional score-based diffusion model for tabular data (TabCSDI) compared to well-known existing imputation methods, including mean imputation, MICE (linear), and MF. Furthermore, the model performed noticeably better at imputing categorical variables with feature tokenization. For further improvement of the diffusion models, the authors pointed to inference time, model architecture, and loss function optimization.
Compared to the existing work in the domain of sensors and missing data imputation, we present a data-driven approach to improve the resilience of a complex wind sensor in space (on Mars) when some of its variables go continuously missing. This study focuses on imputing the missing data variables by learning features from the data variables of the remaining sensing units of the sensor. The investigation targets the imputation of multiple numerical variables. Moreover, this work extends our preliminary work [29,30]: we consider a larger number of different models and improve the previously obtained results.
The proposed methods are tested with the intermediate variables of the wind sensor of the TWINS instrument of the InSight mission. The key contributions are: (1) We propose a soft wind sensing board based on a data-driven model for recovering the missing intermediate data of the wind sensor feeding an inverse algorithm (IA), from which the final sensor output is established. In the case of data missing from a board for longer periods, the data predicted by a data-driven model can help improve sensor accuracy by recovering the intermediate variables.
(2) For the implementation of the soft wind sensing board, a comparative analysis of multiple data-driven methods has been carried out. The analysis of different ML and DL models allows selecting a simple and efficient solution for the soft sensor. (3) We employ a power transformation for data pre-processing. This method makes the dataset more Gaussian-like, which allows the models to perform smoothly and learn better representations from the data. (4) The proposed methods allow predicting the multivariate data of one wind sensing board using the data of the two remaining boards of the same boom. The comparative analysis of various data-driven methods reveals that even a simple method like KNN works effectively for the proposed soft sensor, owing to the nature of the data and the model architecture.
The paper is organized as follows: section 2 describes the wind sensor and its working principle; section 3 describes the dataset used in the research; section 4 describes the algorithms; section 5 discusses the experimental configuration and evaluation metrics; section 6 reports and discusses the results achieved in this research. Finally, section 7 concludes the paper.

Mars wind sensor
The wind sensor of the TWINS instrument is very similar to that of the REMS instrument of the Mars Science Laboratory (MSL) mission, launched in 2011 [16,17]. Both sensors are based on hot-film anemometry implemented on two booms. On the surface of each boom there are three boards, with which the local wind velocity is monitored at three points. Local 3D wind recovery is obtained at each boom using the data from its three boards. Finally, a high-level algorithm implements the final wind selection from the two local wind estimations (one per boom), based on the optimality of the wind angle of incidence on the booms. Figure 1 shows the placement of the booms on the lander deck. Figure 2 depicts a detail of one of the booms, in which one of its boards can be clearly seen. The three boards of each boom are oriented sideways at an angle of 120° between them. The booms are installed on the lander's deck in such a way that their horizontal plane is parallel to the ground, and they are placed diametrically opposite each other. The booms are placed in this position to minimize the impact of other components on InSight's deck, which also ensures that at least one of the booms faces the wind flow at all times [2].
The detail of each board has been presented in [2,31,32]. The objective is to monitor the convective heat transfer from four silicon dice, which are heated above the temperature of the air and kept at a constant temperature difference with respect to a fifth, unheated cold die on the same board. All dice are placed on top of four supporting rods in order to thermally isolate them from the boom. Figure 3 shows a picture of one of the prototypes of the boards.
The hot dice have three kinds of deposited Pt resistors: one resistor is used for heating the die, a second one is used to measure the temperature (sensing resistor), and the last one is used as the overheat reference for the measurements (it helps set the temperature difference between the hot and cold dice). The cold die, at the end of the printed circuit board, uses only the sensing resistor to measure the temperature. As mentioned before, the board operates under a constant temperature difference between the hot dice and the cold die. Changes in the wind velocity observed by the boards are compensated by the closed loops controlling the temperatures of the hot dice [31]; the control loops implement thermal sigma-delta modulators. The wind sensor data used in this paper is acquired at a sampling rate of 1 Hz [1]. For a better interpretation of the sensor's behavior over time, the variables from the four dice are reduced to two variables, obtained as differences between the normalized convective conductances of the thermal sigma-delta modulators [31].

Generation of the intermediate variables
Calculating the intermediate variables B_LONG and B_TRANS first requires obtaining the convective power exchanged with the atmosphere by each of the four dice of the transducer board. The injected power (P_inj) for each die can be directly calculated from the sigma-delta pulse readings [31], from which the convective power (P_conv) is then calculated. The conductive (P_cond) and radiative (P_rad) powers are calculated using the coefficients obtained in the thermal vacuum dice characterization tests [2]. The convective conductance (G) is obtained simply by dividing the convective power by the overheat, as given in equation (4), and is directly related to the convective coefficient h. To make the convective conductance independent of temperature, pressure and the conductivity of the fluid, the estimator B, the normalized convective conductance, is obtained, where L is the characteristic die dimension, K is the fluid thermal conductivity, and Pr is the Prandtl number. Using equations (1) and (2), B_LONG and B_TRANS can finally be obtained.
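As a minimal illustration, the power computation described above can be sketched as follows, assuming the power balance P_conv = P_inj − P_cond − P_rad and omitting the normalization to B, whose exact form is given in [31]:

```python
def convective_conductance(p_inj, p_cond, p_rad, overheat):
    """Convective conductance G of one die (equation (4) in the text).

    Assumes the power balance P_conv = P_inj - P_cond - P_rad; the
    further normalization to the estimator B (using L, K and Pr) is
    omitted here, since its exact form is given in [31].
    """
    p_conv = p_inj - p_cond - p_rad   # convective power
    return p_conv / overheat          # G = P_conv / overheat
```

With an injected power of 10 mW, 2 mW conducted, 1 mW radiated and an overheat of 35 K, this yields G = 0.2 mW/K.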

Inverse algorithm (IA)
The calibration of the sensor in the wind tunnel, as described in [2], produces a calibration file relating the Reynolds number (used as wind speed estimator), the wind direction, and the B_LONG-B_TRANS variables of each board (1)-(3). This calibration file has been interpolated over a fine Reynolds-direction mesh so that a lookup algorithm outputs the most similar Reynolds-direction pair for a given reading of B_LONG_1-3 and B_TRANS_1-3. It must be noted that the IA computes the local wind velocity and angle at each boom independently of the other, using only the boards of that boom. The local wind velocities at each boom can be different, owing to possible obstacles on the lander deck in the direction of the incoming wind. Additionally, for each boom, wind retrieval is optimal only for a certain range of horizontal angles. For example, when the wind comes from the rear of a boom, the wind inference is not correct, because the rear part of the boom generates an aerodynamic distortion of the local wind pattern. It has been determined [1] that for all angles at least one boom is always in a good position, and therefore the last part of the IA includes a routine determining which local boom velocity is finally considered correct (and placed in the Planetary Data System, PDS).
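For illustration, the core lookup of such an IA can be sketched as a nearest-neighbor search over the interpolated calibration table. This is a toy sketch under simplifying assumptions (Euclidean similarity over the six variables, no boom-selection routine); all names are hypothetical:

```python
import numpy as np

def inverse_algorithm(reading, cal_table, cal_reynolds, cal_angle):
    """Toy sketch of a lookup-based IA.

    cal_table has one row per (Reynolds, angle) mesh point holding the
    six calibrated intermediate variables (B_LONG_1-3, B_TRANS_1-3).
    The lookup returns the mesh point whose entry is closest (here, in
    the Euclidean sense) to the measured six-variable reading. The real
    IA also selects the better boom; that step is omitted.
    """
    d = np.linalg.norm(cal_table - reading, axis=1)  # distance to each mesh point
    j = np.argmin(d)                                 # most similar calibration entry
    return cal_reynolds[j], cal_angle[j]
```

The RIA described next follows the same pattern, with the columns of the missing board dropped from both `reading` and `cal_table`.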

Restricted inverse algorithm (RIA)
The first possibility, given the failure of a board, is to use a restricted version of the IA, the RIA, in which the lookup algorithm uses only a database populated with the Reynolds-wind direction entries associated with the two remaining pairs of B_LONG/B_TRANS (the missing B_LONG/B_TRANS pair is not used), as illustrated in figure 4. Other than the missing B_LONG/B_TRANS entries, this restricted algorithm is identical to the regular IA.
As mentioned before, the objective of this paper is to improve the resilience of complex sensors that use multiple intermediate data for generating the output signals. The proposed algorithm will be tested with data from the wind sensor of the TWINS instrument. We will consider scenarios with missing data from one board of either of the two booms of the sensor. The ML or DL network will be used to infer the B_LONG and B_TRANS intermediate variables of the missing board, so that all six intermediate variables can be fed into the IA. A comparison will be made with using only the RIA and the data from the two remaining boards per boom. Here, we show that the built-in correlations of the sensor can be further exploited by using, for training, the data obtained while the sensor operates on the Martian surface. In most missions, this data will far exceed the data obtained during the calibration process; the amount of information used for training, validation and testing is therefore much larger than the original calibration set.

Dataset and pre-processing
The TWINS instrument dataset, provided by the Centro de Astrobiologia (CAB, INTA-CSIC), is used for the proposed work. The dataset includes multiple comma-separated values files, each containing the data for one sol (one Martian day). The files include time-series data of the two booms (BPY and BMY) and of the temperature, pressure, and wind sensors installed on the InSight lander. Each boom contains three boards; thus, the variables used for the proposed models are BMY_LONG_1-3, BMY_TRANS_1-3, BPY_LONG_1-3, and BPY_TRANS_1-3, where B_LONG and B_TRANS are the orthogonal parameters of the normalized convective conductance obtained using equations (1) and (2).
To investigate the proposed method, the individual data files of the TWINS instrument from sol 126 to sol 254 are loaded and combined. The combined dataset of 128 sols is then cleaned by removing the rows with no values. In the next step, the combined dataset is split into three sets with percentages of 70%, 20%, and 10%, as training set, validation set, and testing/cross-validation (CV) set, respectively. The training and validation sets comprise 3,534,901 data samples; the testing (CV) set includes 354,491 data samples. After splitting the dataset, it is necessary to scale the data for efficient representation learning. Complex datasets sometimes need to be transformed toward a Gaussian distribution for the models to achieve optimal performance. For the proposed work, we apply a power transform to make the data distribution more Gaussian-like. This transformation has been applied with all the data-driven methods used except PR.
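The chronological 70/20/10 split described above can be sketched as follows (a minimal illustration; file loading and row cleaning are omitted):

```python
def chronological_split(n_samples, train=0.70, val=0.20):
    """Index ranges for an unshuffled 70/20/10 split.

    The data is a time series, so the split is chronological rather
    than random; the percentages follow those used in the text.
    """
    n_train = int(n_samples * train)
    n_val = int(n_samples * val)
    train_idx = range(0, n_train)
    val_idx = range(n_train, n_train + n_val)
    test_idx = range(n_train + n_val, n_samples)  # remaining ~10%
    return train_idx, val_idx, test_idx
```

A chronological split avoids leaking future samples into the training set, which matters for time-series data.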
The Yeo-Johnson (YJ) transformation is an extension of the Box-Cox transformation. It is applied to the InSight mission dataset to transform it into a Gaussian-like distribution, which allows the DL models to learn features from the non-linear data more effectively. It stabilizes the variance and normalizes the data distribution, and it works for both positive and negative data values, making the data distribution symmetric about the mean for better analysis [33]. It has been extensively used for pre-processing datasets to improve data analysis accuracy [34,35]. The YJ power transformation is expressed as follows:

P_d = ((q + 1)^δ − 1)/δ,                 if δ ≠ 0, q ⩾ 0
P_d = ln(q + 1),                         if δ = 0, q ⩾ 0
P_d = −((1 − q)^(2−δ) − 1)/(2 − δ),      if δ ≠ 2, q < 0
P_d = −ln(1 − q),                        if δ = 2, q < 0

where P_d represents the transformed data, q is the input data, and δ is the transformation parameter, a real number between 0 and 2. The parameter is optimized by the transformation method so that the transformed distribution is as close as possible to a normal distribution. We apply the YJ transformation to the dataset to improve the data quality; the pre-processing allows the DL models to efficiently learn representations from a Gaussian-like data distribution. The predicted data is transformed back to the original scale by applying the inverse YJ transformation.
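For illustration, the piecewise YJ transformation can be implemented directly. This is a minimal NumPy sketch in which δ is passed in explicitly; in practice the parameter is fitted per variable (e.g. scikit-learn's PowerTransformer with method='yeo-johnson' fits it by maximum likelihood):

```python
import numpy as np

def yeo_johnson(q, delta):
    """Yeo-Johnson transform of array q with parameter delta.

    Implements the four-branch piecewise definition; in the text the
    transformed data is P_d and the parameter is delta.
    """
    q = np.asarray(q, dtype=float)
    out = np.empty_like(q)
    pos = q >= 0
    if delta != 0:
        out[pos] = ((q[pos] + 1.0) ** delta - 1.0) / delta
    else:
        out[pos] = np.log1p(q[pos])
    if delta != 2:
        out[~pos] = -(((1.0 - q[~pos]) ** (2.0 - delta) - 1.0) / (2.0 - delta))
    else:
        out[~pos] = -np.log1p(-q[~pos])
    return out
```

Note that δ = 1 leaves the data unchanged, which provides a quick sanity check of the implementation.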

Soft wind sensing board based on ML and DL models
Several ML and DL algorithms have been used, namely PR, KNN, RF, XGboost, MLP, LSTM, and Bi-LSTM models. This section provides a short theoretical description of these models implemented for multivariate data predictions.
Each of these models is explained in the following subsections:

Polynomial regression (PR)
A PR model learns non-linear relationships between dependent and independent variables by computing polynomial features. Multivariate PR can be represented by equation (7) [36]:

ŷ = w_0 + Σ_{l1} w_{l1} x_{l1} + Σ_{l1,l2} w_{l1l2} x_{l1} x_{l2} + … + Σ_{l1,…,lk} w_{l1l2…lk} x_{l1} x_{l2} ⋯ x_{lk}

where k is the polynomial order, w_{l1}, w_{l1l2}, …, w_{l1l2…lk} represent the PR coefficients, and x_{l1}, x_{l2}, …, x_{lk} represent the input variables. We use the least squares method to find the polynomial coefficients by minimizing the sum of squared errors between the predicted outcomes and the ground-truth values. In this research, we use the PR method to predict TWINS wind sensor data. Furthermore, a grid-search approach, explained in section 5, is used to select the optimal polynomial degree for the analysis.
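A minimal sketch of multivariate PR by least squares, building the monomial features up to order k explicitly (illustrative only; library implementations such as scikit-learn's PolynomialFeatures follow the same idea):

```python
import numpy as np
from itertools import combinations_with_replacement

def poly_features(X, degree):
    """Expand X (n_samples, n_vars) with all monomials up to `degree`,
    including the constant term, as in multivariate PR."""
    n, m = X.shape
    cols = [np.ones(n)]
    for d in range(1, degree + 1):
        for idx in combinations_with_replacement(range(m), d):
            col = np.ones(n)
            for j in idx:
                col = col * X[:, j]   # build the monomial x_{l1}...x_{ld}
            cols.append(col)
    return np.column_stack(cols)

def fit_pr(X, y, degree):
    """Least-squares fit of the PR coefficients."""
    A = poly_features(X, degree)
    w, *_ = np.linalg.lstsq(A, y, rcond=None)
    return w

def predict_pr(X, w, degree):
    return poly_features(X, degree) @ w
```

For a target that is exactly a second-order polynomial of the inputs, a degree-2 fit recovers it to numerical precision.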

K-nearest neighbor (KNN)
KNN is a widely used method for regression and classification problems owing to its simple learning process. For this algorithm, the selection of an optimal k value is of prime importance to achieve effective performance [37]. It uses feature similarity, learned from the nearest values in the training data, to predict new data instances, in a few simple steps: first, it computes the distances between the input data sample and the available data samples and selects the k nearest neighbors of the input sample; then, it yields the output prediction as the average value of the selected neighbors. Thus, the performance of the KNN method depends on the optimal selection of the k value. KNN is a simple and efficient learning model; however, as a lazy learner it becomes slow on large datasets, because it performs many distance computations over a large training set [38]. We employ the KNN model for multivariate regression with the Minkowski distance metric, given by equation (8):

D(x, y) = (Σ_i |x_i − y_i|^p)^(1/p)

where x_i is the query data sample and y_i is an available data sample. The parameter p selects the distance metric: if p = 1 it is the Manhattan distance, and if p = 2 it becomes the Euclidean distance. In this research, the p value is obtained through the grid-search approach.
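The two steps above (distance computation, then averaging over the k nearest neighbors) can be sketched in a few lines (illustrative only):

```python
import numpy as np

def knn_predict(X_train, y_train, x_query, k=3, p=2):
    """KNN regression: average the targets of the k training points
    closest to x_query under the Minkowski distance
    D(x, y) = (sum_i |x_i - y_i|^p)^(1/p)."""
    d = np.sum(np.abs(X_train - x_query) ** p, axis=1) ** (1.0 / p)
    nearest = np.argsort(d)[:k]          # indices of the k nearest samples
    return y_train[nearest].mean(axis=0)  # average of their targets
```

With `y_train` as a 2D array, the same call returns a multivariate prediction, as required for the two output variables of a board.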

Random forest (RF)
Ensemble learning improves model performance by using multiple learners together. The RF algorithm was created by Breiman [39] with the idea of achieving better performance through ensemble learning. The RF model is an ensemble of multiple decision trees, which reduces overfitting and improves generalization by averaging several deep decision trees, thereby addressing their large variance. It is a supervised learning method used for both classification and regression problems. It provides predictions by utilizing multiple uncorrelated decision trees with a set depth, and the final prediction is obtained as the average over all decision trees in the forest. The model is computationally efficient and easy to implement owing to its few hyperparameters, such as the number of trees, tree depth, and number of variables. Moreover, there are many examples in which the RF model is employed to implement a soft sensor [40,41]. In this research, the RF model is used for the multivariate regression problem.

Extreme gradient boosting (XGBoost)
XGBoost is an upgraded form of the gradient boosting decision tree method [42], introduced by Chen and He [43]. XGBoost can efficiently construct boosted trees that operate in parallel and solve regression and classification problems. To achieve higher prediction performance, XGBoost continuously adds trees and grows their depth by feature splitting. Once the model is trained, k trees are obtained, each with corresponding scores, and the final prediction is obtained from the scores of the k trees. The model optimizes an objective function within a gradient boosting framework, with a regularization term to avoid overfitting. Compared to general gradient boosting, XGBoost has a highly scalable and flexible model structure that improves its speed and makes it less computationally intensive [44]. We use the XGBoost model to predict multivariate data of the wind sensing board of the TWINS instrument.

Multilayer perceptron (MLP)
MLP is a non-linear and widely used DL topology that consists of at least three layers: an input layer, a hidden layer, and an output layer. These layers are interconnected by basic units called neurons. Within an MLP network, three main operations are carried out: first, the input vectors are multiplied by weight matrices; second, a bias value is added to the weighted inputs; lastly, an activation function is applied to the weighted sum to obtain the neuron outputs. The activation function ensures that each input data sample is mapped to a different space in the final output vector [45,46].
The training of an MLP network consists of two processes: forward propagation and backpropagation. Forward propagation computes the output, while backpropagation computes the error and updates the weights based on it, using differentiation and the chain rule. The MLP network can include multiple hidden layers, which learn the model parameters [47]. The learned parameters are fed to the output layer, which, in the case of regression problems, yields outputs with linear activations. In this work, the MLP model is implemented for the multivariate regression problem.

Long short-term memory (LSTM)
LSTM models have been extensively used owing to their memory-based architecture. Moreover, they have provided promising results in continuous data prediction by avoiding the vanishing gradient problem [48], which typically affects deep neural networks.
The LSTM topology was proposed by Hochreiter and Schmidhuber [48] to overcome the limitations of recurrent neural networks. The capabilities of LSTM, such as memorizing past data sequences and discarding unnecessary information through its gated architecture, make it a good choice for problems with sequential data. Each LSTM unit consists of four sub-units, as shown in figure 5: three gates and one memory unit. Each LSTM unit receives three inputs: the current input (x_t), the previous cell state (C_(t−1)), and the previous hidden state (h_(t−1)). The gates regulate the flow of information within the LSTM unit and yield its outputs.
The operation of an LSTM unit is defined as follows. The memory cell (C_t) can be considered a mini-state machine that stores, or memorizes, the state of the unit.
The forget gate (f_t) decides what fraction of the information from the previous memory cell is discarded or preserved. It does so by combining the previous hidden state (h_(t−1)) and the current input (x_t) and applying a sigmoid activation function (σ), expressed in equation (9):

σ(x) = 1/(1 + e^(−x))

which yields a value between 0 and 1, where 0 means forgetting and 1 means memorizing the state:

f_t = σ(W_f [h_(t−1), x_t] + b_f)

In the subsequent step, f_t is multiplied element-wise with the previous cell state (C_(t−1)). The input gate (i_t) selects the input values that update the current memory state, again by combining h_(t−1) and x_t and applying the sigmoid activation:

i_t = σ(W_i [h_(t−1), x_t] + b_i)

Then, the hyperbolic tangent function (tanh), which yields an output in the range (−1, 1), is applied to a similar combination of parameters to generate the candidate cell state vector (C̃_t):

C̃_t = tanh(W_C [h_(t−1), x_t] + b_C)

Subsequently, the current cell state (C_t) is obtained by adding the forget-gate and input-gate outcomes after element-wise multiplication with the previous and candidate cell states, respectively:

C_t = f_t ⊙ C_(t−1) + i_t ⊙ C̃_t

The output gate (o_t) produces the output at time t by applying the sigmoid function to the current input and the previous hidden state (h_(t−1)):

o_t = σ(W_o [h_(t−1), x_t] + b_o)

The new hidden state (h_t) is produced by element-wise multiplication of the output gate and the tanh-activated new cell state:

h_t = o_t ⊙ tanh(C_t)

In this work, stateless LSTM layers are used, in which the LSTM cells reinitialize their cell states for every new batch of data during training, in order to reduce the complexity of the network and speed up the training process.
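The gate operations can be sketched as a single cell step (a minimal NumPy illustration with hypothetical weight shapes, not the trained models used in this work):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step implementing the standard gate equations.

    W is a list of four matrices mapping the concatenated [h_prev, x_t]
    to the forget, input, candidate and output pre-activations; b holds
    the corresponding bias vectors. Shapes are illustrative.
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W[0] @ z + b[0])          # forget gate f_t
    i = sigmoid(W[1] @ z + b[1])          # input gate i_t
    c_tilde = np.tanh(W[2] @ z + b[2])    # candidate cell state
    c = f * c_prev + i * c_tilde          # new cell state C_t
    o = sigmoid(W[3] @ z + b[3])          # output gate o_t
    h = o * np.tanh(c)                    # new hidden state h_t
    return h, c
```

Since o_t lies in (0, 1) and tanh(C_t) in (−1, 1), every component of h_t stays strictly inside (−1, 1).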

Bidirectional LSTM (Bi-LSTM)
Bi-LSTM is an advanced variant of the LSTM algorithm; it computes the final output of the hidden layer by combining a forward layer and a backward layer [49], as shown in figure 6. The forward and backward computations use the same operations as a standard LSTM layer, described by the following expressions:

h→_t = LSTM(x_t, h→_(t−1))
h←_t = LSTM(x_t, h←_(t+1))

where LSTM(·) denotes the LSTM network. The final output vector of the Bi-LSTM layer is obtained as follows:

y_t = σ(W_(→hy) h→_t + W_(←hy) h←_t + b_y)

where W_(→hy) and W_(←hy) denote the weight matrices of the forward and backward LSTM layers, respectively, and b_y denotes the output layer bias. Moreover, σ(·) is a concatenating or averaging function that combines both outputs.
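The forward/backward scheme can be sketched generically, here combining the two hidden-state sequences by concatenation, one common choice for σ(·) (names are illustrative, and the recurrent step is abstracted as a callable):

```python
import numpy as np

def bidirectional(seq, step, h0):
    """Run a recurrent step function over `seq` forwards and backwards
    and concatenate the two hidden-state sequences per time step."""
    fwd, h = [], h0
    for x in seq:                 # left-to-right pass
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], h0
    for x in reversed(seq):       # right-to-left pass
        h = step(x, h)
        bwd.append(h)
    bwd.reverse()                 # align backward states with time order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]
```

With concatenation, each output vector has twice the hidden-state dimension, which is why Bi-LSTM layers expose both past and future context at every time step.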
The Bi-LSTM structure allows the capture of important trends from data with forward and backward moving windows [50]. Thus, this study also investigates the Bi-LSTM model for multivariate predictions for the TWINS wind sensing board. The LSTM and Bi-LSTM models require a three-dimensional (3D) input of shape (batch size, time steps, number of features); in this work, the input shape for these models is (512, 1, 4).
The DL architectures, including MLP, LSTM, and Bi-LSTM, have been implemented with three hidden layers of 16, 8, and 4 units. The parameters learned by the hidden layers of the DL models are finally fed to the output layer, which comprises two dense units with linear activation; the final dense units use a linear activation function to yield the output variables. The number of inputs is four, from the two remaining wind-sensing boards (four input variables: B_LONG and B_TRANS of each remaining board), while the number of outputs is two, as the model is targeted at predicting the variables of one board of a TWINS boom.
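The resulting topology can be sketched as a plain forward pass. Random weights stand in for trained parameters, and ReLU is assumed for the hidden layers since the text does not fix the hidden activation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer sizes used in the text: 4 inputs (B_LONG/B_TRANS of the two
# remaining boards), hidden layers of 16, 8 and 4 units, and 2 linear
# outputs (B_LONG and B_TRANS of the missing board). The weights below
# are random placeholders standing in for trained parameters.
sizes = [4, 16, 8, 4, 2]
weights = [rng.normal(scale=0.1, size=(a, b)) for a, b in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(b) for b in sizes[1:]]

def mlp_forward(x):
    """Forward pass: assumed ReLU on hidden layers, linear output layer."""
    for W, b in zip(weights[:-1], biases[:-1]):
        x = np.maximum(x @ W + b, 0.0)       # hidden-layer activation
    return x @ weights[-1] + biases[-1]      # linear regression output
```

A batch of n samples of shape (n, 4) thus maps to predictions of shape (n, 2).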

Experimental configuration and evaluation
The overall methodology used for DL models is illustrated in figure 7.
The proposed DL models are trained with an unshuffled dataset. The raw data is fed to the models after pre-processing, as explained in section 3. Activation functions are added in each successive layer of the DL models except the last dense layer, which yields the multivariate regression output. The Adam optimizer is employed for efficient training, allowing accelerated and improved convergence towards the minima; the learning rate was set to 0.0001 and the batch size to 512. The loss function used is the mean squared error (MSE) given in equation (20). The training process is controlled with an early stopping callback, which monitors the validation error and stops training when the model starts to overfit the data. When training stops, the learned model parameters are saved; the saved model is then used to make predictions, and the predicted data is inversely power transformed. Since the LSTM and Bi-LSTM models require a 3D input shape (batch size, time step, features), the input has been converted accordingly, feeding the models one time step per sample.

Regarding the ML models (MPR, KNN, RF, and XGBoost), hyperparameters are obtained through grid search with the k-fold cross-validation (CV) approach [51], which selects the most effective parameters based on the k folds and estimates how the models would perform when deployed to generate predictions in actual scenarios. We use five folds of data with the grid-search CV technique to obtain the optimal parameters of the ML models. Figure 8 illustrates the general k-fold CV procedure used with the grid-search approach.
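A minimal sketch of this grid-search CV step with scikit-learn, shown here for KNN; the data is synthetic and the parameter grid is illustrative (the tuned grids are reported in table 1):

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
# Stand-in data: 4 input variables (two boards) -> 2 targets (missing board)
X = rng.random((200, 4))
y = rng.random((200, 2))

# Illustrative candidate hyperparameters
param_grid = {"n_neighbors": [3, 5, 7], "weights": ["uniform", "distance"]}

# Five-fold CV grid search, scoring by (negative) MSE
search = GridSearchCV(KNeighborsRegressor(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)
best_knn = search.best_estimator_
```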
Moreover, the hyperparameters obtained through the grid-search method are reported in table 1. The training and validation sets (90% of the data) are used for finding the optimal parameters of the ML models, including KNN, RF, and XGBoost. After obtaining the optimal parameters, the models are trained using the training set (70%) and tested using the testing/CV set (10%) so that their performance can be compared with the DL models. The models are trained on a graphics processing unit (GPU) server with 256 GB RAM and four GPU units. The DL models are implemented in the Keras and TensorFlow environments, and the ML models with the Scikit-learn library. Essential Python libraries such as NumPy, Pandas, Seaborn, and Matplotlib are employed for data preprocessing and plotting the achieved results. Moreover, the performance of the models is evaluated using the RMSE and the mean relative error (MRE), expressed as follows:

RMSE = sqrt( (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)² )

MRE = (1/n) Σ_{i=1}^{n} |y_i − ŷ_i| / |y_i|

where y_i denotes the actual values, ŷ_i denotes the predicted values, and n denotes the number of samples.
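The two evaluation metrics translate directly into code (a sketch; whether MRE is reported as a fraction or a percentage is not stated here):

```python
import numpy as np

def rmse(y_true, y_pred):
    # Root-mean-square error
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mre(y_true, y_pred):
    # Mean relative error (undefined where y_true == 0)
    return np.mean(np.abs(y_true - y_pred) / np.abs(y_true))
```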

Results and discussions
This section reports and discusses the results achieved with the proposed method. In this study, seven ML and DL methods were explored for multivariate predictions. For the applied methods, the RMSE errors in the B_LONG and B_TRANS variables and in the horizontal wind speed and angle are studied. The comparative analysis of the experimented methods allowed us to select an appropriate method that yields effective and consistent performance for all six wind-sensing boards. The models have been tested for each of the six wind-sensing boards of the two booms (at each time, only one board in one boom is predicted, using only the data from the other two boards in the same boom). The proposed models were efficiently trained for multivariate predictions in the six cases owing to the computing power of the GPU server, the statistical transformation, and effective hyperparameters. Finally, the KNN, MLP, and LSTM models generalize well on the dataset and yield consistent performance in all six cases, as can be observed from table 2. The KNN model was found to be the most suitable for the proposed soft wind-sensing board owing to its high generalization performance, simplicity, and lower computational cost compared with the DL models (MLP and LSTM), which are computationally intensive due to their large number of parameters. The results achieved for board-1 of the BMY boom are depicted in figure 9, which shows line plots of 5 min of data for the intermediate variables B_LONG and B_TRANS, with the predicted values close to the actual ones. Figure 10(a) shows the comparison between the velocity obtained from the ground-truth data of the three boards and the IA, the velocity obtained from two boards' data and the RIA, and the velocity obtained from two boards plus the one predicted board using a KNN model, fed into the IA.
As can be observed from the figures, the wind velocity obtained with the RIA and two boards is quite different from the actual wind velocity: it fluctuates strongly and, in most instances, its magnitude is much greater than that of the velocity obtained with three boards. In contrast, the velocity obtained with the predicted data is quite similar to the original velocity, as depicted in the figure. Similarly, the recovered wind-direction values with one board predicted are very close to the actual values, compared with the highly fluctuating values obtained from two boards and the RIA, as shown in figure 10(b). A clear impact of the absence of BMY board-1 can be observed in the velocity and wind direction obtained from two boards' data, which fluctuate more and at some points have larger magnitude than the values obtained from three boards. In contrast, the velocity with one board predicted is quite similar to the ground-truth velocity. Thus, the results obtained from the IA with one board predicted confirm the usefulness of the proposed method. Figures 11(a) and (b) illustrate the RMSE of the wind variables (wind velocity and wind angle) over the total test data. The errors are computed between the wind values estimated from the predictions of the different models and those from the original test data of all three boards, using the IA in both cases. Additionally, these figures also depict the RMSE between the values estimated from three boards with the IA and from two boards with the RIA. The errors were computed taking into account only the data from the boom (BMY/BPY) selected as test data for obtaining the wind estimations from the intermediate variables, i.e. the selected boom is well positioned for the angle of the wind. In all cases, one of the three boards is unavailable.
The comparative error analysis of figure 11 clearly shows that the applied models help recover the wind values and outperform the values obtained from two boards and the RIA. Among the applied models, KNN, MLP, and LSTM perform consistently, whereas PR shows a high error for board-1 of the BMY boom, and RF, XGBoost, and Bi-LSTM show high errors for board-3 of the BPY boom. Similarly, the KNN, MLP, and LSTM models perform well for all six boards, as can be observed from figure 11(b), while PR, RF, XGBoost, and Bi-LSTM show a high wind-direction error for at least one of the six boards.
Similarly, figures 12(a) and (b) depict the MRE of the wind variables (wind velocity and wind angle) for the test set. They also show that the proposed methods are effective in recovering the wind values from the relevant variables in the case of one missing board, and confirm the superiority of the KNN, MLP, and LSTM models over the other employed methods.

Figure 10. Comparison of (a) horizontal velocity and (b) wind direction between the values obtained using the inverse algorithm with the original three boards' data, the restricted inverse algorithm with the original data of only two boards, and the values obtained using the data predicted for one missing board together with the remaining boards, fed into the complete IA.

The performance comparison makes the KNN model stand out among all the models, based on its effective performance and simple architecture. In fact, MLP and LSTM also demonstrate effective performance and have yielded results similar to those of KNN; among these models, KNN is the obvious choice owing to its less complex architecture and lower computational cost.
Considering the nature of the data, the statistical transformation was essential to avoid bias and complexity in the training process. Some methods, such as PR and KNN, do not strictly require data transformation and are able to work without it. Nonetheless, applying the power transformation converted the data to a Gaussian-like distribution and allowed a better and smoother training process. For a large and complex time-series dataset, it becomes crucial to apply statistical transformations to achieve the targeted results with ML and DL models. Moreover, the use of the grid-search CV approach resulted in a smooth training process without overfitting. Thus, the various models were efficiently trained after applying the appropriate data transformations. Comparatively, the models performed better on the data of the BMY boom, as can be observed from the error reduction in table 2.
The comparative analysis of the seven data-driven models revealed that a simple ML model like KNN was able to yield performance similar to that of the LSTM and MLP models. We expected the LSTM models to work far better than the other methods, but this did not occur. On the other hand, different variables were input to the algorithms, some more or less correlated with the variables of the two boards of the same boom that are used to predict the missing values. This is why in some cases KNN and MLP may work better than Bi-LSTM and LSTM, and in other cases the other way around. In the LSTM-based methods, time memory could play an important role, but given the results it does not. Additionally, it must be noted that the correlation between all the variables clearly depends on the wind magnitude and direction.
Considering the multivariate missing data problem, KNN has proven its effectiveness in multivariate numerical data imputation compared to powerful DL algorithms [52]. Depending upon the nature of numerical data, KNN has been able to yield better or close enough performance to the complex models in various studies [53][54][55][56]. Therefore, it is often preferred over complex models for numerical data imputation owing to its competitive performance, deterministic nature, and low computational requirements.
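As a toy illustration of KNN-based multivariate imputation, scikit-learn's KNNImputer fills each missing entry from the k most similar complete rows (the numbers below are made up and unrelated to the TWINS data):

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy matrix of intermediate variables; np.nan marks the damaged board's value
X = np.array([[1.0, 2.0, 3.0],
              [1.1, 2.1, 3.1],
              [0.9, 1.9, np.nan],
              [1.2, 2.2, 3.2]])

# Each missing entry is replaced by the mean of that feature over the
# k nearest rows (distances computed on the observed features only)
imputer = KNNImputer(n_neighbors=2)
X_filled = imputer.fit_transform(X)
```

In this work the same idea is realized with a KNN regressor that maps the four observed intermediate variables to the two variables of the missing board.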
It must be noted that, even though all the ML and DL models give comparable prediction metrics, if the calculation had to be made in situ (on the Martian surface, using the very limited computational resources of a rover or lander), the trained DL models might be more useful, since they require fewer computational resources at the edge to obtain inferences: unlike KNN, they do not need to store and search the training set at prediction time.
The complex aerodynamics generated by the wind around the booms requires an inversion algorithm based on extensive calibration in the wind tunnel; there are no analytical models or approximations describing the sensor outputs as a function of the incoming wind. On the other hand, having three boards on each boom typically guarantees that at least two of them sense the wind very well, so from the aerodynamical point of view a certain correlation between the output signals is to be expected. This is one of the key factors explaining why the algorithms presented in this work are capable of producing good predictions.

Conclusions
In this paper, a comparative analysis of seven ML and DL models was carried out to predict intermediate variables from a complex sensor, the wind sensor of the TWINS instrument, with the aim of improving sensor resilience. The models were trained with the statistically transformed data of two wind-sensing boards to predict the data of the third board. The RMSE-based evaluation of the outputs demonstrated the significant predictive potential of the KNN, MLP, and LSTM models. Nonetheless, the KNN model is the preferable choice for the soft wind-sensing board owing to its simple architecture, lower computational cost, and robust performance in all six cases. The performance of the proposed model in obtaining wind values was evaluated based on the results of the IA. The comparative error analysis demonstrated a good fit between the wind values obtained through the original and the predicted data. The wind-recovery errors for the horizontal velocity are reduced by a factor between 2.43 and 4.78 compared with the situation of using only two boards and restricting the IA to them. Similarly, the error for the wind direction is reduced by a factor between 1.74 and 4.71.
This approach can be applied to other sensors working with complex intermediate variables sets and calibration data.

Data availability statement
The data that has been used is confidential.