Prediction model of bearing fault remaining useful life based on weighted variable loss degradation characteristics

In the prediction of the remaining useful life (RUL) of faulty bearings, the identification and feature extraction of early bearing faults are very important. In order to improve the accuracy of early fault RUL prediction, a bearing fault RUL prediction model based on weighted variable loss degradation characteristics is proposed. The model is composed of a stacked denoising autoencoder (SDAE) module guided by a variable loss, a signal-noise feature adaptive weighting module, and a long short-term memory (LSTM) degradation feature extraction and regression output module. Firstly, the model improves the ability of the SDAE to extract weak fault features through dimension-raising learning and a variable loss function. Then, an adaptive weighting matrix is generated from the test signal to modulate the weight matrix of the SDAE. Finally, the hidden layer features of the SDAE are input into the LSTM model to extract the bearing state degradation features and realize RUL prediction for bearing faults. The experimental results show that the proposed model can accurately predict the RUL of the test data in both the early fault stage and the fault development stage, and can give early fault warnings of the bearing state.


Introduction
The development of mechanical equipment reliability research is based on sensor measurement. The vibration and acoustic signals of mechanical equipment are collected by measuring devices for processing and analysis, so as to achieve safe and reliable operation. The development of measurement science is also promoting the progress of reliability research. With the advent of the Internet of Things era, the reliability demands on mechanical equipment have been upgraded from traditional fault monitoring to early fault monitoring [1]. The rotating components of mechanical equipment, such as bearings and gears, are prone to bearing wear and gear breakage, and are therefore the key research objects in early fault monitoring [2]. In order to ensure safe and reliable early fault monitoring, not only accurate and efficient fault diagnosis technology is needed, but also remaining useful life (RUL) prediction methods for the faulty parts are increasingly in demand. The data-driven method is one of the commonly used life prediction methods and has been widely applied when prior knowledge for fault prediction is absent [3]. Data-driven prediction mainly focuses on two key issues: how to establish a health indicator that reflects the degradation process, and how to choose a suitable prediction algorithm to predict the development trend [4]. As a powerful tool for processing big data, deep learning models [5, 6] have been successfully applied to mechanical failure RUL prediction.
The whole life cycle of a part mainly includes the smooth running stage, the early failure stage, the failure development stage and the failure stage [2]. In the early failure stage, the fault begins to appear in the form of weak defects (such as small cracks and small deformations). Because the fault is weak, the amplitude of the fault feature is much smaller than that of the rotation frequency feature. Therefore, in the time-domain and frequency-domain diagrams, the early failure stage is almost identical to the smooth running stage. In order to accurately reflect the early weak fault degradation process, it is therefore necessary to reduce the noise of the early weak fault signal and enhance its features. In recent years, researchers have proposed signal processing methods for weak-signal noise reduction and feature enhancement, such as correlation analysis [7], wavelet decomposition [8] and singular value decomposition [9]. In order to reduce the involvement of expert experience, researchers have successively proposed weak-signal processing methods based on machine learning, such as the denoising autoencoder (DAE) [10], sparse coding [11] and convolutional noise reduction [12]. The DAE is based on the autoencoder: by adding random noise to the input data and then encoding and decoding the noisy input, the reconstructed data is made as close as possible to the original input, which greatly enhances the robustness of the model. Jiejie et al used a DAE network to clean the status data of power transmission and transformation equipment, and then identified abnormal operating states of the equipment [13]. Majumdar designed a blind denoising method based on the autoencoder to improve noise reduction performance [14]. Following these references, in order to enhance the noise reduction ability of the DAE, multiple DAEs can be superimposed to form a stacked denoising autoencoder (SDAE). Increasing the depth and complexity of machine learning models can effectively improve their nonlinear mapping ability and robustness. However, overly complex models tend to produce local overfitting or underfitting during training, which slows model convergence or even prevents it. In order to avoid these problems, when improving the DAE for low signal-to-noise ratio (SNR) problems, this paper considers not only how to improve the nonlinear mapping capability of the model, but also the balance between the reconstruction accuracy of the model and the training speed.
As time advances and the fault develops, the early weak fault characteristics gradually strengthen until the faulty component fails. In this process, the SNR of the target signal increases over the time sequence. If a noise reduction method designed for strong noise is used to process a weakly noisy signal, the accuracy of the reconstructed signal will be reduced, which is not conducive to accurately establishing the fault degradation characteristics. To solve the problem that the performance of noise reduction methods decreases when the application scenario changes, researchers generally adopt adaptive noise reduction [15], convolutional neural networks [16] and transfer learning [17]. Inspired by these references, in the process of improving the DAE this paper starts from the perspective of learning the change process of the target signal characteristics, so that the model can maintain its reconstruction accuracy even when the SNR of the target signal changes.
Fault characteristics can reflect whether a component has a fault, but may not accurately reflect the fault degradation state. A reasonable fault degradation index should meet the requirements of correlation, monotonicity and robustness. In recent years, much research on the construction and prediction of health indicators has been carried out using, for example, ANN [18], SVM [19], RNN [20] and their improved algorithms. Camci et al extracted time-domain indices of rolling bearing vibration signals and used a monotonicity index to evaluate the degradation performance of the time-domain features [21]. Qian et al used recurrence quantification analysis to extract recurrence-plot entropy features from vibration signals as an effective indicator for monitoring bearing deterioration [22]. Qiu et al used self-organizing mapping (SOM) to fuse time-domain features and took the minimum quantization error (MQE) as a health indicator [23]. ANN and SVR fit nonlinear data well, but these algorithms treat time series points as independent of each other and do not consider the temporal memory between data points. Therefore, RNNs and their improved variant, long short-term memory (LSTM), have been proposed [24]. LSTM can effectively solve the gradient vanishing and gradient explosion problems caused by the long-term memory of RNNs. Cheng et al combined a series of long short-term memory neural networks according to a Bayesian inference algorithm and proposed an integrated LSTM model to improve the adaptability and generalization under different prediction conditions [25]. Wang et al proposed a recurrent convolutional neural network to model the time dependence of different degradation states, and adopted variational inference to quantify the uncertainty of the RUL [26]. It can be seen from the above references that the time-memory characteristics of LSTM are very suitable for fault degradation feature construction and lifetime prediction. Therefore, this paper uses the time-memory characteristics of LSTM, combined with the extracted fault characteristics, to construct the bearing fault degradation index and to predict the remaining life of the faulty bearing.
In summary, aiming at the problem of early bearing fault RUL prediction, and considering the excellent noise reduction ability of the DAE model and the time-memory characteristics of the LSTM model, this paper improves and merges the two models and proposes a bearing fault RUL prediction model based on weighted variable loss degradation characteristics. Firstly, the DAE model is optimized to improve its weak-signal processing ability and its adaptability to noise changes. Then, the hidden layer features of the DAE are used as the input of the LSTM model to learn the bearing fault degradation features. Finally, the learned degradation features are used to realize RUL prediction for the faulty bearing.


Basic theory

Denoising autoencoder
Suppose x ∈ R^m is the original signal of dimension 1 × m; the destroyed signal is then x̃ = x + n, where n is artificially added noise, usually Gaussian noise or masking noise. The hidden layer h and the output signal y are expressed as follows:

h = σ(W x̃ + b)    (1)
y = σ(W′ h + b′)    (2)

where {W, W′} and {b, b′} are the weight matrices and bias vectors of the encoder and decoder, and σ is the activation function. The reconstruction loss is

L_R = (1/m) Σ_{i=1}^{m} (x_i − y_i)²    (3)

where x_i and y_i are the ith values of the original signal x and the output signal y. The gradient descent method is usually used to optimize the network parameters.
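The DAE encode-decode step and its reconstruction loss can be sketched in NumPy. This is a minimal illustration, not the paper's implementation; the layer sizes, noise level and random seeds below are purely illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def dae_forward(x, W, b, W_prime, b_prime, noise_std=0.1, seed=None):
    """Corrupt x with Gaussian noise, then encode and decode it.

    Returns the hidden representation h and the reconstruction y.
    """
    rng = np.random.default_rng(seed)
    x_tilde = x + rng.normal(0.0, noise_std, size=x.shape)  # destroyed signal
    h = sigmoid(W @ x_tilde + b)            # hidden layer
    y = sigmoid(W_prime @ h + b_prime)      # reconstruction of the clean signal
    return h, y

def reconstruction_loss(x, y):
    """Mean squared reconstruction error between original and output signal."""
    return float(np.mean((x - y) ** 2))

# Toy usage: an over-complete layer (12 hidden units for an 8-point signal),
# mirroring the dimension-raising DAE1 described later in the paper.
rng = np.random.default_rng(0)
m, k = 8, 12
x = sigmoid(rng.normal(size=m))             # targets in (0, 1) to match sigmoid output
W, b = 0.1 * rng.normal(size=(k, m)), np.zeros(k)
W_p, b_p = 0.1 * rng.normal(size=(m, k)), np.zeros(m)
h, y = dae_forward(x, W, b, W_p, b_p, seed=1)
```

Training would minimize `reconstruction_loss(x, y)` by gradient descent, as stated above.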

Long short-term memory
The LSTM network is an improved RNN algorithm. LSTM solves the gradient explosion and gradient vanishing problems caused by the continuous matrix multiplication of the RNN by adding multiple gate control structures on top of the time path. The LSTM model structure is shown in figure 2. Each large square in figure 2 is the memory unit for that moment. Each memory unit acts as a short-term memory that uses three gate signals (input gate in^(t), forget gate f^(t), output gate out^(t)) and a nonlinear tanh layer to protect and update its cell state. The core of LSTM is the cell state of the memory unit, which determines which information is passed to the next memory unit. The forget gate f^(t) determines what information in the cell state of the current memory unit should be discarded. The input gate in^(t) and the nonlinear tanh layer together determine which new information should be stored in the memory unit. The output gate out^(t) determines what information is output. x^(t) is the input signal at time t, h^(t−1) and C^(t−1) are the output and main-line memory of the memory unit at time t−1, and h^(t) and C^(t) are the output and main-line memory of the memory unit at time t. The yellow areas represent matrix multiplication and the pink areas represent dot multiplication or dot addition. The forward propagation of LSTM is calculated as follows:

f^(t) = σ(W_f [h^(t−1), x^(t)] + b_f)    (4)
in^(t) = σ(W_in [h^(t−1), x^(t)] + b_in)    (5)
out^(t) = σ(W_out [h^(t−1), x^(t)] + b_out)    (6)
C̃^(t) = tanh(W_c [h^(t−1), x^(t)] + b_c)    (7)
C^(t) = f^(t) • C^(t−1) + in^(t) • C̃^(t)    (8)
h^(t) = out^(t) • tanh(C^(t))    (9)

where {W_f, W_in, W_out, W_c} and {b_f, b_in, b_out, b_c} are the weight matrices and the corresponding bias vectors, and are the main parameters of the memory unit. '•' stands for the element-wise vector product. tanh and σ stand for the tanh and sigmoid activation functions, respectively. According to equations (4)-(6), all gated units are vectors activated by the sigmoid function, so all of their elements lie in the interval (0, 1). In equations (8) and (9), the gating units control the pass rate of information through dot multiplication to realize the gating function. Combining figure 2 and equation (8), it can be seen that only dot multiplication and dot addition, both linear operations, are carried out during the forward propagation of the main-line memory C^(t). This ensures that important timing information does not change even after a long time, and is the core mechanism by which LSTM solves long-term dependency problems. LSTM back-propagates the error between the target output and the actual output to calculate the gradients of all weights and biases, and obtains the optimal weights and biases through an optimization algorithm.
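The forward propagation of a single memory unit can be sketched as follows. The concatenated-input convention `[h(t−1), x(t)]` and the toy weight shapes are standard choices assumed for illustration, not settings taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell(x_t, h_prev, c_prev, params):
    """One LSTM memory unit: three sigmoid gates plus a tanh candidate layer."""
    z = np.concatenate([h_prev, x_t])                        # [h(t-1), x(t)]
    f_t = sigmoid(params["W_f"] @ z + params["b_f"])         # forget gate
    in_t = sigmoid(params["W_in"] @ z + params["b_in"])      # input gate
    out_t = sigmoid(params["W_out"] @ z + params["b_out"])   # output gate
    c_hat = np.tanh(params["W_c"] @ z + params["b_c"])       # candidate memory
    c_t = f_t * c_prev + in_t * c_hat      # main line: only dot mult/add
    h_t = out_t * np.tanh(c_t)             # unit output
    return h_t, c_t

# Toy usage: 4 input features, 3 hidden units, 5 time steps.
rng = np.random.default_rng(0)
n_in, n_h = 4, 3
params = {k: 0.1 * rng.normal(size=(n_h, n_h + n_in))
          for k in ("W_f", "W_in", "W_out", "W_c")}
params.update({k: np.zeros(n_h) for k in ("b_f", "b_in", "b_out", "b_c")})
h, c = np.zeros(n_h), np.zeros(n_h)
for t in range(5):
    h, c = lstm_cell(rng.normal(size=n_in), h, c, params)
```

Note that the cell state `c_t` is updated only by element-wise multiplication and addition, which is exactly the linear main-line path the text describes.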

RUL prediction model based on weighted variable loss degradation characteristics
Considering the excellent noise reduction ability of the DAE model and the time-memory characteristics of the LSTM model, this paper improves and fuses the two models to propose a RUL prediction model based on weighted variable loss degradation characteristics. Its structure is shown in figure 3. The input to the model is the time series X^(t−S_m+1), X^(t−S_m+2), …, X^(t), X^(t) ∈ R^l of the raw, unprocessed vibration signal at time t, where l is the length of the input signal at each time step and S_m is the number of input time steps. The output of the model is the RUL at time step t, expressed as RUL_pre^(t). The model consists of three modules, namely the SDAE module guided by variable loss (SDAE module), the adaptive weighting module for signal-noise feature change (AWM module), and the LSTM time-series degradation feature extraction and regression output module (LSTM module). Firstly, each time-step signal is input into the SDAE module and the AWM module, where the input signal is reconstructed with noise reduction and feature extraction. Then the degradation trend features are extracted by inputting the results into the LSTM module in sequential order. Finally, the RUL is predicted by the regression model.

SDAE module guided by variable loss
Due to the limited ability of shallow networks to express complex problems, a SDAE module guided by variable loss is constructed in this paper. Its network structure and training process are shown in figure 4. Assume that the original signal of the training dataset is x.

Step 1: Corrupt the original signal x with noise and train DAE1 to reconstruct x, obtaining the hidden layer h_1.

Step 2: Randomly zero the hidden layer of DAE1 using dropout to obtain the destroyed hidden layer h̃_1 = h_1 • dropout, where dropout is a vector containing only the two elements 0 and 1, and the vector elements obey the Bernoulli distribution, namely dropout(i) ∼ Bernoulli(p).

Step 3: Train DAE2 with the destroyed hidden layer h̃_1 as input and h_1 as the reconstruction target.

Step 4: Repeat steps 2 and 3 for the subsequent DAEs until the Nth DAE is trained.

Step 5: Reserve the weight parameters and bias vectors from the input layer to the hidden layer of each DAE to form the encoder of the SDAE. The output of the encoder is the hidden layer feature h_N.

A variety of loss functions are used in the training process, namely the reconstruction loss function L_R, the sparse loss function L_S and the disturbance loss function L_F. The details are as follows:

(a) Reconstruction loss function L_R. The reconstruction loss function L_R, shown in equation (3), minimizes the difference between the original signal and the reconstructed signal, and is the essential standard that ensures the noise reduction capability of the DAE. According to the idea of manifold learning, L_R helps the DAE learn a vector field pointing towards the low-dimensional manifold, so that a correct estimate of the manifold regression can be obtained from destroyed data points that deviate from the manifold. This is the fundamental reason why the DAE is able to reconstruct damaged signals. As can be seen from figure 4, except for DAE1, which is an overcomplete autoencoder, the rest of the DAEs are undercomplete. In order to prevent the learning task of the overcomplete autoencoder from degenerating into a meaningless identity mapping, and to encourage the DAE model to learn beneficial features about the data distribution from the training data, two regularization terms, namely the sparse loss function L_S and the disturbance loss function L_F, are added to the reconstruction loss in this paper.
(b) Sparse loss function L_S. The sparse loss function L_S is shown in equations (10) and (11):

L_S = Σ_j KL(ρ ‖ ρ̂_j)    (10)
KL(ρ ‖ ρ̂_j) = ρ log(ρ/ρ̂_j) + (1 − ρ) log((1 − ρ)/(1 − ρ̂_j))    (11)

where ρ̂_j represents the average activation value over all training samples of neuron j in the hidden layer. L_S uses the KL divergence to measure the difference between the distribution ρ̂_j and the given sparse value ρ, and thereby constrains the average activation value output by the hidden layer neurons. One SDAE training run consists of N DAE training runs, and one DAE training run consists of many fine-tuning updates of four parameter matrices whose dimensions are set by the numbers of neurons in the input and hidden layers. The training cost of the SDAE is therefore large, and L_S is introduced to reduce the computation of the model. On the one hand, the sparse constraint reduces the computational burden of the model by inhibiting the activity of some hidden layer neurons; on the other hand, it enables the hidden layer to interpret the input and learn distinctive statistical features by shaping the distribution of the hidden layer variables.

(c) Disturbance loss function L_F. The structure of the disturbance loss function L_F is shown in equation (12):

L_F = ‖J_f(x)‖²_F    (12)

where J_f(x) is the Jacobian matrix of the hidden layer expression with respect to the input. L_F, the squared Frobenius norm of this Jacobian, shrinks the mapping of the feature space in the neighbourhood of the training data. By achieving local spatial contraction, perturbations of the training sample in all directions are suppressed.
When L_F is relatively small, the hidden layer expression corresponding to the input signal is smooth. In other words, when the input changes, the hidden layer expression does not change much, which makes the DAE insensitive to input changes. With the deepening of the stack structure, the impact of the noise disturbance gradually decreases and the task focus of the DAE shifts from noise reduction to feature extraction, so the weight of the disturbance loss function can be reduced appropriately. With the reduction of the hidden layer dimension, the computational burden of the network is reduced; in order to ensure that the hidden layer still contains enough features, the sparsity requirement should be relaxed appropriately. Combining the training requirements and all the loss functions, a variable loss function is proposed, as shown in equation (13):

L = L_R + L_S(ρ) + β L_F    (13)

In the equation, β is a hyperparameter with value range [0, 1] that determines the weight of the disturbance loss function, while the weight of the sparse loss function is adjusted through the sparse value ρ. Therefore, the 'multiple' in the variable loss function refers to the combination of multiple loss functions, and the 'variable' refers to the hyperparameters changing with the training layer.

When a stack structure is used, the interference and error generated by each noise disturbance accumulate layer by layer; that is, the earlier a disturbance appears, the greater its impact on the final result of the SDAE. Therefore, the training of DAE1 is particularly important. The main task of DAE1 is to return data that deviates from the manifold to the vicinity of the manifold, while the task of the remaining DAEs is to perform corrective remediation and feature extraction when unknown noise distorts the regression estimate of DAE1. In order to ensure the learning quality of the DAE1 hidden layer, DAE1 is optimized as follows:
1. When the destruction process is applied to the original signal, only Gaussian white noise is used in DAE1. This is because, among all the DAEs, only DAE1 takes a time-domain signal as input; dropout cannot correctly represent the feature distribution of test data destroyed by noise.
2. The number of neurons in the hidden layer of DAE1 is greater than that in the input layer, which is a dimension-raising learning process aimed mainly at the case of high noise intensity. When the target features are weak and scarce, raising the dimension expands the feature space so that the hidden layer can learn more feature information and lay the foundation for the subsequent DAEs.
The model is trained by stochastic gradient descent with the goal of minimizing the loss function. The gradient update process is shown in equation (14), where ε_1 represents the learning rate:

θ ← θ − ε_1 ∂L/∂θ    (14)
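The variable loss described above can be sketched numerically. This is a minimal sketch assuming the additive combination L = L_R + L_S(ρ) + β·L_F; the hidden-layer Jacobian is passed in precomputed, since how it is obtained depends on the network implementation.

```python
import numpy as np

def kl_sparsity(rho, rho_hat, eps=1e-8):
    """KL divergence between the target sparse value rho and the mean
    activation rho_hat of each hidden neuron, summed over neurons."""
    rho_hat = np.clip(rho_hat, eps, 1.0 - eps)
    return float(np.sum(rho * np.log(rho / rho_hat)
                        + (1.0 - rho) * np.log((1.0 - rho) / (1.0 - rho_hat))))

def variable_loss(x, y, h_batch, jac, rho, beta):
    """L = L_R + L_S(rho) + beta * L_F: reconstruction error, sparsity
    penalty with a layer-dependent rho, and the squared Frobenius norm
    of the hidden-layer Jacobian weighted by a layer-dependent beta."""
    L_R = float(np.mean((x - y) ** 2))              # reconstruction loss
    L_S = kl_sparsity(rho, h_batch.mean(axis=0))    # sparse loss over the batch
    L_F = float(np.sum(jac ** 2))                   # disturbance (contractive) loss
    return L_R + L_S + beta * L_F
```

Per layer, β and ρ would be set to the decreasing/increasing schedules chosen later in the experiments (β = 0.7, 0.6, 0.5, 0.4 and ρ = 0.2, 0.3, 0.5, 0.8).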

Adaptive weighting model of signal-noise feature change
As the time sequence advances, the weak fault characteristics gradually strengthen and the SNR of the target signal changes accordingly. For a SDAE model trained for a specific noise intensity, the reconstruction accuracy decreases when it deals with noise of a different intensity, which is not conducive to accurately establishing the fault degradation characteristics. In this paper, an adaptive weighting module for signal-noise feature change is proposed from the perspective of learning the change process of the target signal features. The network training process of this module is shown in figure 5.
Considering that signal feature changes in the frequency domain reflect the signal-noise intensity distribution more clearly, a Fourier transform is performed on the input signal x of the SDAE model to obtain the frequency-domain signal F(x) composed of Fourier coefficients. F(x) then undergoes two dimension-raising learning steps under the weight matrices W_A1 and W_A2 to obtain the adaptive weighting matrix W_AWM:

W_AWM = f(W_A2 f(W_A1 F(x)))    (15)

where f(•) is the sigmoid function. It can be seen from equation (15) that the adaptive weighting matrix W_AWM is jointly determined by the parameters W_A1 and W_A2 and the input signal. The modulated first-layer weight matrix is

Ŵ_11 = W_AWM • W_11    (16)

According to section 3.1, the weight matrix of the SDAE is a vector field that assists the regression of data deviating from the manifold. When there are signal-noise characteristic differences between the test signal and the training signal of the SDAE, the regression of the vector field does not match the actual degree of deviation, resulting in inaccurate SDAE reconstruction results. The value range of each element in the adaptive weighting matrix W_AWM is [0, 1], so it can be regarded as a control gate. The control gate modulates the vector field according to the differences in signal and noise characteristics, so that the regression matches the actual degree of deviation. Because the first layer of the SDAE constructed in this paper is responsible for the main manifold regression task, only the weight matrix W_11 of the first layer needs to be modulated. Therefore, the modulated weight matrix Ŵ_11 and the optimized SDAE can reflect the signal characteristics more accurately, and lay the foundation for accurately extracting the fault degradation characteristics.
Two training objectives are summarized from the above process. Training objective 1: when input signals have different signal-to-noise characteristics, the SDAE reconstruction modulated by the adaptive weighting matrix W_AWM should have higher reconstruction accuracy. Training objective 2: the computational burden of the adaptive weighting module should be reduced while ensuring that useful features can still be learned from the dimension-raising learning.
According to the weighting principle and the training objectives of the adaptive weighting matrix W_AWM, the objective function of the adaptive weighting module is constructed, as shown in equation (17). The first term in the equation is the reconstruction error between the input signal and the reconstructed signal, which meets training objective 1. The second term is the L1 norm of the weight matrices, which drives as many elements of the weight matrices as possible to 0 to obtain a sparse effect and meet training objective 2; λ is the weight attenuation coefficient. The parameters are trained by minimizing the objective function with stochastic gradient descent, and the gradient update process is shown in equation (18), where ε_2 represents the learning rate.
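The gating idea can be sketched as follows. The exact shapes of W_A1 and W_A2 are not specified above, so the dimension-raising mapping and the outer-product construction of the gate matrix are illustrative assumptions; the two properties that matter are that the gate entries lie in (0, 1) and that they modulate W_11 element-wise.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def adaptive_weighting_matrix(x, W_A1, W_A2):
    """Map the input's Fourier magnitudes to a gate matrix in (0, 1)."""
    F_x = np.abs(np.fft.rfft(x))        # frequency-domain signal-noise profile
    a = sigmoid(W_A1 @ F_x)             # first weighting step
    return sigmoid(np.outer(W_A2, a))   # gate matrix, same shape as W_11 (assumed)

def modulate_first_layer(W_11, W_AWM):
    """Element-wise gating of the trained first-layer SDAE weights."""
    return W_AWM * W_11

# Toy usage: an 8-point signal gives 8//2 + 1 = 5 rfft magnitudes,
# and a first SDAE layer W_11 of shape (6, 5).
rng = np.random.default_rng(0)
x = rng.normal(size=8)
W_A1 = 0.1 * rng.normal(size=(5, 5))
W_A2 = 0.1 * rng.normal(size=6)
W_11 = rng.normal(size=(6, 5))
W_AWM = adaptive_weighting_matrix(x, W_A1, W_A2)
W_11_hat = modulate_first_layer(W_11, W_AWM)
```

Because every gate entry is strictly between 0 and 1, the modulated weights never exceed the trained weights in magnitude; the gate can only attenuate the vector field, never amplify it.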

LSTM degradation feature extraction and regression output module
In the process of bearing life-cycle degradation, there is a strong correlation between the degradation states of adjacent time steps. Therefore, when predicting the RUL of the current time step, not only the characteristics of the current time step's monitoring data but also the rich performance degradation trend information in the historical monitoring data are needed. This prevents abnormal predicted values caused by sudden changes in the monitoring data features due to information interference. Considering the problems of long-term memory and time-sequence dependence, the LSTM model is used to extract the bearing fault degradation state features. Its structure is shown in figure 3. This module consists of the LSTM layer, the FC layer and the regression output layer. The time-sequence signals of each time step are processed by the adaptive weighting module and the SDAE module, which complete the noise reduction and preliminary feature extraction; the input of this module is therefore the Nth SDAE hidden layer h_N. Firstly, the degradation state features {h^(t−S_m+1), h^(t−S_m+2), …, h^(t)} of each time step are extracted by the LSTM layer. They are then input into the FC layer for feature weighting and dimensionality reduction to obtain the degradation state feature H^(t) of the current time step t. Finally, the regression output layer calculates the health index value RUL_pre^(t) of the current time step. The calculation formula of the regression output layer is shown in equation (19):

RUL_pre^(t) = ψ(W_O H^(t) + b_O)    (19)

where ψ represents the ReLU activation function, and W_O and b_O are the weight matrix and bias vector of the regression output layer, respectively. The parameters of this module are trained with the adaptive moment estimation (Adam) algorithm to minimize the objective function L_output(θ_output; RUL_pre^(t), RUL_true^(t)) shown in equation (20), which measures the error between the predicted and true RUL.
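The regression head can be sketched as follows. The mean-squared-error form of the objective is an assumption for illustration, since the text only states that the objective compares the predicted RUL with the true RUL.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def rul_output(H_t, W_O, b_O):
    """Regression output layer: RUL_pre = ReLU(W_O H + b_O).

    ReLU conveniently guarantees a non-negative predicted RUL.
    """
    return relu(W_O @ H_t + b_O)

def output_objective(rul_pre, rul_true):
    """Assumed MSE objective between predicted and true RUL over a batch."""
    diff = np.asarray(rul_pre, dtype=float) - np.asarray(rul_true, dtype=float)
    return float(np.mean(diff ** 2))
```

In training, `output_objective` would be minimized with Adam over the LSTM, FC and output-layer parameters jointly.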

The training and testing process of RUL prediction model
The training and testing process of the RUL prediction model based on weighted variable loss degradation features is shown in figure 6. The specific implementation steps are summarized as follows.

Step 1: Collect life-cycle monitoring data. First, whole life cycle vibration monitoring data are collected at a fixed time interval.
Step 2: Construct the training sample sets. A sample set {X^(t−S_m+1), X^(t−S_m+2), …, X^(t), RUL_true^(t)}, t = S_m, S_m + 1, …, RUL_O/τ, is constructed from the monitoring data of each full life cycle, where RUL_O is the run time and τ is the time interval of data collection in the RUL monitoring process.
Step 3: Build and train the SDAE model guided by variable loss. According to the method introduced in section 3.1 and the structure shown in figure 4, the SDAE model is constructed. The SDAE module network parameters are trained using training sample set 1.

Step 4: Build and train the adaptive weighting model of signal-noise feature change. According to the method introduced in section 3.2 and the structure shown in figure 5, the AWM module is constructed on the basis of the SDAE module trained in step 3. The network parameters of the AWM module are trained using training sample set 2.

Step 5: Construct and train the RUL prediction model based on weighted variable loss degradation features. According to the method introduced in section 3.3 and the structure shown in figure 3, the RUL prediction model is constructed on the basis of the SDAE module and the AWM module trained in steps 3 and 4. The network parameters of the RUL prediction model are trained using training sample set 3. In practical applications, the network size of the model and its hyperparameters can be adjusted as needed, such as the number of SDAE hidden layers, LSTM layers and FC layers, the number of neurons in each network layer, and the loss function parameters.

Step 6: Use the trained RUL prediction model to make RUL predictions. A monitoring signal of a certain length is collected every interval τ, the signal X^(t) collected in real time is combined with the signals collected over several historical time steps into the signal sequence {X^(t−S_m+1), X^(t−S_m+2), …, X^(t)}, and the sequence is input into the RUL prediction model to calculate RUL_pre^(t) for the current time step.
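The sliding-window assembly of the model input in step 6 can be sketched as follows; the list-of-arrays buffer is an illustrative data structure, not the paper's implementation.

```python
import numpy as np

def build_input_sequence(history, S_m):
    """Stack the last S_m monitoring samples into one model input.

    history: list of 1-D arrays, oldest first; each is one time step's signal X(t).
    Returns an array of shape (S_m, l) holding {X(t-S_m+1), ..., X(t)}.
    """
    if len(history) < S_m:
        raise ValueError("need at least S_m historical time steps")
    return np.stack(history[-S_m:])

# Toy usage: l = 6 points per step, S_m = 3 time steps; each fake sample
# is filled with its time-step index so the windowing is easy to check.
history = [np.full(6, t, dtype=float) for t in range(5)]
seq = build_input_sequence(history, S_m=3)   # rows are steps t-2, t-1, t
```

At run time, each newly acquired signal would be appended to `history` every interval τ before calling `build_input_sequence` again.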

Test equipment and data introduction
For experimental verification and analysis, this paper adopts the PHM2012 bearing accelerated life experimental dataset published by the FEMTO-ST Institute [27], which is widely used to verify the validity of RUL prediction models. The structure of the experimental platform is shown in figure 7. In the experimental platform, a pressure cylinder is arranged beside the bearing, and bearing failure is induced by applying a continuous load to the bearing. Signal acquisition is performed by two DYTRAN 3035B vibration acceleration sensors mounted horizontally and vertically on the bearing. Because the vertical data contain less information [28, 29], this paper mainly uses the horizontal vibration signals.
The experimental data consist of three working conditions: working condition 1 (load 4000 N, speed 1800 rpm), working condition 2 (load 4200 N, speed 1650 rpm) and working condition 3 (load 5000 N, speed 1500 rpm). Under the three working conditions, life cycle vibration monitoring data were collected for 7 groups (Bearing1_1-Bearing1_7), 7 groups (Bearing2_1-Bearing2_7) and 3 groups (Bearing3_1-Bearing3_3), respectively. The sampling frequency is 25.6 kHz. Vibration data of 0.1 s are collected every 10 s as one monitoring sample; that is, the length of a single sample is 2560 points. The signal acquisition method is shown in figure 8, and details of the data are presented in table 1. The training set consists of monitoring data over the whole life cycle degradation, and the time-domain waveforms of the raw vibration data of the training set are shown in figure 9. Each test-set record retains only the first part of the truncated life cycle data, which is used to predict the RUL of the bearing.
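The sampling arithmetic above (25.6 kHz, 0.1 s windows every 10 s) can be checked with a short helper. The label convention below (true RUL in seconds, reaching 0 at the last sample of a run-to-failure record) is an assumption for illustration; the paper does not state its label units explicitly.

```python
# PHM2012-style sampling parameters as described in the text.
FS = 25_600          # sampling frequency, Hz
T_ACQ = 0.1          # acquisition window per monitoring sample, s
TAU = 10             # interval between monitoring samples, s

sample_length = int(round(FS * T_ACQ))   # points per monitoring sample

def rul_labels(n_samples, tau=TAU):
    """Assumed true-RUL labels (seconds) for each monitoring sample of a
    run-to-failure record: the ith sample has (n-1-i)*tau seconds left."""
    return [(n_samples - 1 - i) * tau for i in range(n_samples)]
```

For example, a record with 4 monitoring samples would carry the labels 30 s, 20 s, 10 s and 0 s under this convention.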

Evaluation index of RUL prediction model
In order to quantitatively evaluate the performance of the RUL prediction model, a score is used as the evaluation index. The percentage error %Er_i of the RUL predicted for the ith test sample is calculated as shown in equation (21):

%Er_i = 100 × (RUL_true^(i) − RUL_pre^(i)) / RUL_true^(i)    (21)

where RUL_true^(i) and RUL_pre^(i) are the true and predicted remaining life of the ith test sample, respectively. The scoring function for the prediction result of the ith test sample is shown in equation (22), and its curve is shown in figure 10:

A_i = exp(−ln(0.5) · %Er_i/5),  %Er_i ⩽ 0
A_i = exp(+ln(0.5) · %Er_i/20),  %Er_i > 0    (22)

In engineering applications, a lead prediction error (%Er_i > 0) carries lower risk than a lag prediction error (%Er_i < 0). Therefore, for lead and lag errors of the same magnitude, the lead prediction error receives the higher score; that is, a predicted RUL lower than the actual RUL is preferred. The score of the prediction performance of the RUL model is calculated as shown in equation (23), where N_S is the number of test samples; a higher score means better prediction performance:

Score = (1/N_S) Σ_{i=1}^{N_S} A_i    (23)
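The scoring pipeline can be sketched as follows. The piecewise constants follow the standard PHM2012 challenge definition of the score [27], which figure 10 appears to depict; treat them as an assumption if the paper's exact curve differs.

```python
import math

def percent_error(rul_true, rul_pre):
    """%Er_i = 100 * (true - predicted) / true; positive means early prediction."""
    return 100.0 * (rul_true - rul_pre) / rul_true

def sample_score(er):
    """Asymmetric score: lag errors (er < 0) decay faster than lead errors."""
    if er <= 0:
        return math.exp(-math.log(0.5) * er / 5.0)   # late prediction, steep penalty
    return math.exp(math.log(0.5) * er / 20.0)       # early prediction, mild penalty

def model_score(rul_true_list, rul_pre_list):
    """Mean score over N_S test samples; higher is better, 1.0 is perfect."""
    ers = [percent_error(t, p) for t, p in zip(rul_true_list, rul_pre_list)]
    return sum(sample_score(e) for e in ers) / len(ers)
```

Under these constants, a perfect prediction scores 1.0, a +20% early error scores 0.5, and a −5% late error already scores 0.5, reflecting the preference for early warnings.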

The training and validation of SDAE module
Following the procedure in section 3.1, the SDAE module is trained and validated. The whole life cycle training set contains 7534 single samples, and the single-sample length is 2560. The number of neurons in the input layer of the SDAE module is equal to the length of a single sample; the remaining hyperparameters are determined by cross-comparison. The SDAE module in this paper consists of a stack of four DAE models. The numbers of neurons in the four hidden layers are 3000, 1500, 750 and 380, respectively. The disturbance loss weights β are 0.7, 0.6, 0.5 and 0.4, respectively, and the sparse values ρ are 0.2, 0.3, 0.5 and 0.8, respectively. The activation function of the network is the sigmoid function. There are 100 training iterations for each layer, and the learning rate is 1 × 10^−4. 40% of the samples in the whole life cycle training set were randomly selected to construct training sample set 1, and the rest were used as the test sample set.
According to the test results in table 2, the variable-loss-guided SDAE model proposed in this paper can raise a noisy signal from −20 dB to +9.36 dB, with a reconstruction error of only 0.83. This shows that the proposed model can denoise weak signals with a low degree of distortion. The average SNR and average RMSE of the proposed model are better than those of the comparison models. However, under the combined effects of dimension-raising learning and relaxed sparsity, the training time of the proposed model is slightly longer than that of comparison models 3 and 4.
The test result of comparison model 1 is the worst: it cannot completely eliminate the noise, and the reconstructed signal distortion is high. All stack models other than comparison model 1 obtain better results, which proves the necessity of the stack structure. The reconstruction results of comparison model 2 are also poor because of the lack of perturbation constraints. Moreover, the training time of comparison model 2 is the longest because it lacks sparsity constraints. These results prove the necessity of the multiple loss functions. The test results of comparison model 3 are better, but the number of neurons in its fourth hidden layer is only half that of the proposed model; that is, its hidden layer contains less information. It can be seen that dimensionality reduction learning can improve the training speed, but it reduces the useful features learned by the hidden layer and affects the use of subsequent models. The test result of comparison model 4 is also good, and its training speed is faster than that of the proposed model because of its fixed sparsity value. The disturbance loss weight of comparison model 4 is fixed, while that of the proposed model decreases layer by layer, yet the difference between the noise reduction results of the two models is very small. This result shows that the influence of the noise disturbance decreases as the stack structure deepens; that is, the variable loss function is correct. The test results of comparison model 5 are similar to those of the proposed method, and both have a good noise reduction effect, which proves once again that the proposed method can denoise weak signals. However, due to the complex structure of comparison model 5, its training time is longer than that of the proposed method. This is consistent with the problem mentioned in section 1 that a complex model structure reduces convergence speed.
The features extracted from the noise reduction model will be applied to the RUL prediction model, so the effective information transmitted by the noise reduction model must be considered. The results of the fourth hidden layer of a test sample in the proposed model and comparison models 2, 3 and 4 are shown in figure 11.
As can be seen from figure 11, due to the sparsity constraints, some data points in (a), (c) and (d) are 0; that is, the information contained is less than the hidden layer dimension. Because of differences in model structure and sparsity values, the information content of each model also differs. There are about 300 active data points for the proposed model, 380 for comparison model 2, 130 for comparison model 3, and 76 for comparison model 4. It is difficult to judge the validity of the data information directly from the figure, so it is desirable to retain as many data points as possible. Combined with the test results in table 2, even though comparison model 2 has the largest amount of data, it cannot guarantee the accuracy of life prediction. The reconstruction accuracy of the other three models is close, but the proposed model can transmit the most information, providing more information for the degraded feature extraction module and ensuring the accuracy of the RUL prediction model. Therefore, from the perspective of noise reduction and reconstruction, the proposed model and comparison models 3 and 4 are all excellent noise reduction models; however, from the perspective of RUL prediction, the proposed model is the better choice.

The training and validation of AWM module
The effect of the AWM module needs to be verified through the noise reduction reconstruction results, so SNR and RMSE are still used to evaluate the performance of the model. One of the comparison models for this verification is the joint-loss convolutional neural network proposed in [16], which improves the generalization of deep learning models by sharing parameters and partial structures. The average SNR and average RMSE of each model's test results are shown in table 3. Figure 12 shows the time domain and frequency domain diagrams of a test sample after adding Gaussian white noise with different SNRs.
According to the test results in table 3, with the AWM module proposed in this paper, the average SNR and average RMSE of the noise reduction reconstructed signals under different noise environments are superior to those of the comparison models. The result of comparison model 1 is very unsatisfactory, which indicates that the SDAE module alone cannot return the damaged signal to the correct manifold when the noise intensity changes. The contrast between the proposed model and comparison model 1 shows the necessity of the AWM module. These results prove that the AWM module can learn the characteristics of noise environment changes and modulate the first-layer weight matrix of the SDAE module in the form of a control gate, so that the SDAE module maintains a high reconstruction accuracy when dealing with different levels of noise.
The results of comparison model 2 in table 3 are also unsatisfactory: the noise can be suppressed, but the distortion of the reconstructed signal is high. This is because its input is a time domain signal. Looking at the frequency domain diagrams in figure 12, although different levels of noise are added, the signal characteristics remain consistent; the difference between the spectra is reflected in the variation of the characteristic amplitudes. Such consistency and difference help the neural network focus on learning how to correctly identify the characteristics of noise changes, so as to achieve correct modulation. However, the differences between the signals in the time domain diagrams are large, which distracts the neural network and prevents it from correctly identifying the characteristics of noise changes. Therefore, the input of the adaptive weighting module must be a frequency domain signal.
The experimental result of comparison model 3 is lower than that of the proposed model, but higher than those of comparison models 1 and 2. This is because comparison model 3 has generalization capability but lacks noise suppression capability for early faults. This shows that, under the joint action of the SDAE module and the AWM module, the features of the early fault stage and the fault development stage can be extracted at the same time to ensure the prediction performance of the RUL prediction model.

The training and validation of RUL prediction model
Construct training sample set 3 from the full lifecycle training set as described in section 3.4. The value of the time step S_m is 20. The input of the LSTM module is the fourth-hidden-layer data of the SDAE module, so the number of input neurons is 380. The LSTM model outputs 20 neurons per time step. The number of outputs in the FC layer is 20. The regression output layer has a single output neuron. Each layer is trained for 100 iterations, with a learning rate of 1 × 10⁻⁴. The model in this paper does not involve cross-domain problems, so it is only trained and tested under the same condition. According to the verification requirements, two groups of verification tasks are constructed.
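The construction of time-step windows for the LSTM module can be sketched as follows, assuming each window of S_m = 20 consecutive hidden-layer feature vectors (dimension 380) is labelled with the RUL at its last step; the windowing policy and function name are assumptions, not taken from the paper:

```python
import numpy as np

def make_lstm_samples(features, ruls, s_m=20):
    """Slice per-time-step SDAE hidden features (shape [T, 380]) into
    overlapping windows of S_m steps; each window's label is the RUL at
    its last step. The labelling policy is an assumption."""
    X, y = [], []
    for t in range(len(features) - s_m + 1):
        X.append(features[t:t + s_m])
        y.append(ruls[t + s_m - 1])
    return np.stack(X), np.array(y)

feats = np.random.rand(100, 380)    # 100 time steps of hidden-layer features
ruls = np.linspace(1.0, 0.0, 100)   # normalised RUL, decreasing to failure
X, y = make_lstm_samples(feats, ruls)
# X has shape (81, 20, 380): 81 windows of 20 steps, 380 features each
```

Each window X[i] would then be fed to the LSTM step by step, with the regression output layer predicting y[i].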
The five cut-off points in task C1 represent five stages of the bearing's whole life cycle: the smooth operation stage, the early failure stage, the failure development stage (two groups) and the failure stage. In practical engineering, more attention is paid to the early fault stage and the fault development stage. According to the data in table 4, the prediction error of the proposed model in the early fault stage and the fault development stage is small, and the predictions are in advance. This shows that the RUL prediction effect of the proposed model is good. It can be seen from the results in table 5 that the single-bearing scores and the average score of the proposed model are higher than those of the comparison models. In conclusion, under the same working conditions, the proposed model can be effectively applied to the prediction of bearing RUL and meets the engineering requirements.
Comparison model 1 has the lowest score because it takes the time domain signal as input. Although the input signal has temporal variation characteristics, it contains interference factors such as noise, which leads to inaccurate temporal characteristics learned by the LSTM model. As can be seen from the data in table 4, the error of comparison model 1 in the early stage is very large, and it is almost impossible for it to correctly predict the RUL in the early stage, because it lacks the SDAE module for extracting early fault features. Comparison model 2 has the SDAE module, and its score is higher than that of comparison model 1, which also proves the necessity of the SDAE module. Because comparison model 2 lacks the AWM module, its score is lower than that of the proposed model. It can be seen from the data in table 4 that its error in the early stage is small, while its error in the fault development stage is large. This is because, lacking the AWM module, the feature manifold cannot be regressed correctly. This shows the necessity of the AWM module. Comparison model 3 scores higher than comparison model 4 but lower than the proposed model. The difference between these three models is the amount of data entered into the LSTM: the proposed model has the largest amount and comparison model 4 the least. The amount of data matches the score; that is, the more data, the higher the score. This is consistent with the conclusion in section 4.3, and again proves the importance of ascending dimension learning and the variable loss function. As can be seen from figure 14, both groups of training data follow a slow bearing fault degradation pattern: after the early fault characteristics appear, the fault gradually develops over time, and the fault characteristics become gradually more obvious until final failure. Bearing1_3 in the test data is also a slowly degrading case, similar to the training data, so Bearing1_3 obtains a higher score in the test. The test data Bearing1_5 remains in the stable operation stage until fault characteristics appear at some point in time, after which it rapidly develops to the failure state in a short time. It is therefore inferred that Bearing1_5 belongs to an accelerated degradation pattern and differs greatly from the training data.

Set up four comparison models for task C2. Sutrisno et al [31] proposed a RUL prediction method based on vibration frequency anomaly detection, degradation feature inference and run-time ratio. Lei et al [32] proposed a RUL prediction model based on MQE and a particle filter algorithm. Hong et al [33] adopted a lifetime prediction method based on wavelet-empirical mode decomposition and SOM. Guo et al [34] proposed a RUL calculation method based on multiple features and a RNN. The prediction performance results of the model proposed in this paper and the comparison models on the C2 task are shown in table 6.
As can be seen from the results in table 6, the final score of the proposed model is higher than those of the comparison models in the early fault RUL prediction verification. This shows that the proposed model can be effectively applied to the RUL prediction of early bearing faults under the same working conditions. The comparison models from the references can achieve high scores in the fault development stage; however, because they do not consider the early fault stage, they lack the ability to extract weak fault features. Therefore, none of the four comparison models achieves a high score in the C2 task.
In summary, the RUL prediction model proposed in this paper can extract the fault characteristics of both the early fault stage and the fault development stage, and predict the RUL of the bearing from these fault characteristics.

Conclusion
In this paper, a bearing fault RUL prediction model based on weighted variable loss degradation characteristics is proposed. The experimental results show that the proposed model has the lowest life prediction error in the early fault stage and the fault development stage, and its score is higher than those of the comparison models. This shows that the proposed model can not only predict the life of the bearing fault, but also give fault warning of the bearing state earlier than the comparison models.
In future work, the effect of degradation rate on prediction accuracy will be further studied.

Data statement
The data that support the findings of this study are openly available at the following URL/DOI: Patrick Nectoux, Rafael Gouriveau, Kamal Medjaher, Emmanuel Ramasso, Brigitte Morello, Noureddine Zerhouni and Christophe Varnier, PRONOSTIA: an experimental platform for bearings accelerated life test, IEEE International Conference on Prognostics and Health Management, 2012.

The extracted features are used as the input index to predict the RUL of the faulty bearing. The predictive performance of the proposed model and the comparison models is verified using bearing life cycle test data. The rest of this paper is organized as follows. Section 2 introduces the basic theory. Section 3 gives the proposed model structure diagram and steps. In section 4, the proposed model is verified experimentally. Section 5 summarizes the full text. The innovations of this paper are as follows:
1. A SDAE model with variable loss guidance is proposed. The sparsity and disturbance resistance of the DAE model are improved by the stack structure, ascending dimension learning and the variable loss function. The proposed SDAE model can denoise a −20 dB signal to 9.36 dB.
2. An adaptive weighting model of signal-noise feature change is proposed. The frequency domain data containing the noise signal is used as input to learn the variation characteristics of the noise intensity. The adaptive weighting matrix output by the model modulates the first weight matrix of the SDAE model in the form of a control gate, so that the SDAE model is adaptive and robust to noise changes.
3. A bearing fault RUL prediction model based on weighted variable loss degradation characteristics is proposed. Thanks to the ability of the SDAE module to extract weak fault features, the proposed model can predict RUL in the early fault stage. Thanks to the adaptability of the adaptive weighting module to noise changes, the proposed model can predict the RUL in the fault development stage. Thanks to the RUL prediction of early failure, the bearing condition can be warned about earlier.

Figure 3. The structure of the RUL prediction model based on weighted variable loss degradation characteristics.
and add Gaussian white noise (SNR = −15 dB) to the original signal to obtain the destroyed signal x = {x_i}, i = 1, …, N. The SDAE training process is as follows:

Figure 4. The structure of the SDAE module guided by variable loss.

Figure 5. The structure of the adaptive weighting model of signal-noise feature change.

Figure 6. The training and testing process of the RUL prediction model.

Figure 7. The structure of the PRONOSTIA experimental platform.
White Gaussian noise (−20 dB) is added to the single sample data of the full lifecycle training set. The SNR formula is shown in align (24) [30]:

SNR = 10 log₁₀ (Signal power / Noise power).   (24)
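Align (24) and the corresponding noise injection can be sketched as follows; `add_noise_at_snr` is an illustrative helper, not from the paper:

```python
import numpy as np

def snr_db(signal, noise):
    """Align (24): SNR = 10 * log10(signal power / noise power)."""
    return 10.0 * np.log10(np.sum(signal ** 2) / np.sum(noise ** 2))

def add_noise_at_snr(signal, target_snr_db, rng=None):
    """Scale white Gaussian noise so the corrupted copy has the target SNR."""
    rng = np.random.default_rng(rng)
    noise = rng.standard_normal(signal.shape)
    p_signal = np.mean(signal ** 2)
    p_noise = np.mean(noise ** 2)
    # Solve for the scale that makes 10*log10(p_signal / p_scaled) = target.
    scale = np.sqrt(p_signal / (p_noise * 10 ** (target_snr_db / 10.0)))
    noise = noise * scale
    return signal + noise, noise

x = np.sin(np.linspace(0, 40 * np.pi, 2560))   # stand-in vibration sample
x_noisy, n = add_noise_at_snr(x, -20.0, rng=0)
# snr_db(x, n) evaluates to the requested -20 dB
```

At −20 dB the noise power is 100 times the signal power, which is why the raw signal characteristics in figure 8 are completely submerged before denoising.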
Four comparison models were used to verify the SDAE module. Comparison model 1 is a single-layer multi-loss DAE model with a structure of 2560-1280-2560, a disturbance loss weight of 0.7 and a sparsity value of 0.2; the other parameters are the same as the model in this paper. Comparison model 2 is a SDAE model with only the reconstruction loss function; the model structure and parameters are the same as those in this paper. Comparison model 3 is a variable loss SDAE model with dimensionality reduction learning; the model structure is 2560-1280-640-320-160, and the other parameter settings are the same as the model in this paper. Comparison model 4 is a fixed multi-loss SDAE model with the same structure and parameters as the model in this paper, except that the disturbance loss weight is fixed at 0.7 and the sparsity value is fixed at 0.2. Comparison model 5 is the parallel noisy deep learning model proposed in [12].

Figure 9. Time domain diagram of vibration data of the training set.

Figure 10. The curve of the marking function.

Figure 12. The time domain and frequency domain diagrams of a test sample with different Gaussian white noise.
(a) Task C1: training and testing using data from condition 1. The C1 task truncates the test data several times at different time steps and predicts the RUL at each truncation point. The time domain diagram and cut-off points of the C1 task are shown in figure 13. C1 is used to verify the lifetime prediction ability of the proposed model on whole life cycle data. Set up four comparison models for task C1. Comparison model 1 is a single LSTM module with time domain samples as input; its structure and parameter settings are the same as in this paper. Comparison model 2 consists of a SDAE module and a LSTM module with the same structure and parameter settings as in this paper. In comparison model 3, the SDAE module of the proposed model is replaced with the variable loss SDAE module with dimensionality reduction learning; the structure and parameter settings are the same as those in section 4.3. In comparison model 4, the SDAE module of the proposed model is replaced with a fixed multi-loss SDAE model, with the same structure and parameter settings as in section 4.3. The RUL prediction model is verified by the evaluation indicators defined in section 4.2. The RUL prediction results at each truncation point of Bearing1_3 in the C1 task are shown in table 4. The prediction performance results of the proposed model and the comparison models on the C1 task are shown in table 5.

Figure 13. C1 task test data and cut-off points.

Figure 14. RUL predicted values for the whole life cycle of training data, high-score data and low-score data.

Figure 15. C2 task test data and cut-off points.
and the input signal x. Since the parameters are fixed after training, the adaptive weighting matrix changes only with the input signal. When the SDAE receives input signals with different signal-noise characteristics, the adaptive weighting module constructs an adaptive weighting matrix W_AWM according to the signal-noise characteristics of the input signal. The adaptive weighting matrix W_AWM weights the first-layer weight matrix W_11 of the SDAE model by dot multiplication.
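The gating step can be sketched as follows. Here `awm` stands in for the trained adaptive weighting network and is a hypothetical callable; only the elementwise (dot) multiplication of W_AWM with the first-layer weights W_11 is taken from the text:

```python
import numpy as np

def modulate_first_layer(W11, x_freq, awm):
    """Gate the SDAE first-layer weight matrix with the adaptive weighting
    matrix. `awm` maps the frequency-domain input to a matrix W_AWM with the
    same shape as W11; the product acts as a control gate."""
    W_awm = awm(x_freq)
    assert W_awm.shape == W11.shape
    return W11 * W_awm    # elementwise (dot) multiplication

# Toy demonstration: a fixed sigmoid-range gate that ignores its input,
# standing in for the trained AWM network.
rng = np.random.default_rng(1)
W11 = rng.normal(size=(8, 16))
gate = lambda x_freq: 1.0 / (1.0 + np.exp(-rng.normal(size=(8, 16))))
W_mod = modulate_first_layer(W11, None, gate)
```

Because a sigmoid-range gate lies in (0, 1), the modulated weights never exceed the originals in magnitude, which is the control-gate behaviour described above.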

Table 1. Data set description [30].

SNR and RMSE are used to evaluate the noise reduction performance of the model. The RMSE calculation is shown in align (25) [30]:

RMSE = sqrt( (1/N) Σ_{i=1}^{N} (x_i − y_i)² ),   (25)

where x_i is the noiseless original signal and y_i is the reconstructed signal after noise reduction. The average SNR, average RMSE and average training time of each model test result are shown in table 2.
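Align (25) translates directly to code:

```python
import numpy as np

def rmse(x, y):
    """Align (25): root-mean-square error between the noiseless original
    signal x_i and the denoised reconstruction y_i over N points."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(np.sqrt(np.mean((x - y) ** 2)))

# A perfect reconstruction gives rmse == 0.0; larger values mean more distortion.
```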

Table 2. The average SNR, average RMSE and average training time of each model test result.
Figure 11. The results of the fourth hidden layer of a test sample.

Table 3. The average SNR and average RMSE of each model test result.

Table 4. RUL prediction performance of the proposed model and comparison models in the C1 task (Bearing1_3).

Table 5. RUL prediction performance of the proposed model and comparison models in the C1 task.

Table 6. RUL prediction performance of the proposed model and comparison models in the C2 task.
The model is composed of a SDAE module guided by variable loss, an adaptive weighting model of signal-noise feature change, and a LSTM degradation feature extraction and regression output module. The reconstruction accuracy of the DAE model for weak signals is improved by the stack structure, ascending dimension learning and the variable loss function. On this basis, combined with the modulation effect of the adaptive weighting matrix, the adaptability and robustness of the model to noise changes are improved. Thanks to these two points, the RUL prediction model can obtain accurate RUL prediction results in the early fault stage and the fault development stage. After theoretical and experimental analysis, the following conclusions can be drawn:
1. In the SDAE module verification, using data containing white Gaussian noise (SNR = −20 dB) as training and test samples, the SNR of the SDAE model with variable loss guidance can be improved to 9.36 dB. The experimental results show that the proposed model not only has better reconstruction accuracy than the comparison models, but also transfers the most information backward. This shows that the stack structure, ascending dimension learning and the variable loss function effectively improve the sparsity and disturbance resistance of the model, and lay a foundation for improving the accuracy of RUL prediction in the early fault stage.
2. In the AWM module verification, a variety of noisy data were used as training and test samples; the reconstructed SNR of the SDAE model after adaptive weighting matrix modulation reaches 9.85 dB, while the reconstructed SNR of the unmodulated model is only 1.28 dB. This shows that the adaptive weighting matrix trained according to the input signal can adapt the SDAE model to noise changes, improve the robustness of the model, and lay a foundation for improving the accuracy of RUL prediction in the fault development stage.
3. In the validation of the RUL prediction model, RUL prediction tests are carried out on early fault data and full life cycle data respectively. In the two verification tasks, the prediction scores of the bearing fault RUL prediction model based on weighted variable loss degradation characteristics are 0.5143 and 0.5989, respectively.