A feature reconstruction and SAE model based diagnosis method for multiple mixed faults

Owing to automatic feature extraction and deep structure, intelligent fault diagnosis based on deep neural networks has attracted considerable attention. However, in actual industrial machinery, non-fault monitoring data are abundant, whereas fault-state data are scarce and their signatures are weak. Furthermore, diagnosing multiple mixed faults from such skewed data distributions is extremely difficult. To bridge these gaps, this study proposes a feature reconstruction and sparse auto-encoder (SAE) model-based diagnosis method for multiple mixed faults. A feature reconstruction algorithm is designed to address two issues: (1) the expensive computation resulting from the long sequential features of vibration monitoring data and (2) the difficulty of extracting scarce-data features that are submerged. Furthermore, an adaptive loss function is formulated, and a deep AE network is constructed to identify the health status and determine the fault level. Diagnoses of artificial and real faults verify the availability and superiority of the proposed scheme and demonstrate the adaptability and robustness of its hyperparameters.


Introduction
As critical components of complex, large-scale industrial units operating under harsh and extreme conditions, rotating machines are usually inclined to undergo various malfunctions, inevitably weakening the performance of the device and even causing heavy losses [1,2]. However, fault formation has a progressive course, exhibiting a characteristic intensity ranging from weak to severe. Therefore, effective and timely fault diagnosis of rotating machinery systems is crucial to prevent these faults and improve reliability.
In fact, because of the complexity of industrial operation processes combined with the interference of working conditions, the vibration signals produced by machinery are usually nonstationary, nonlinear, and mingled with intense background noise [3]. In addition, the complex structures of advanced devices bring about the common issue of multiple mixed faults, which increases the difficulty of diagnostic tasks [4]. Consequently, mining features from early breakdown signals for identifying the machine status remains a popular and challenging research topic.
Currently, in pursuit of mechanical security and reliability, researchers continuously explore effective measures and have further proposed various strategies for data handling to extract breakdown features. For instance, Tahi et al [5] proposed an expert system method based on a decision tree that utilized the statistical feature parameters of vibration signals to diagnose faults in rotating machinery. Zhao et al [6] designed a fault diagnosis method for rotating machinery based on denoising the wavelet domain and measurement of distance and compared the measured distances between tested fault samples and known fault samples to identify fault types. Han et al [7] proposed a hierarchical Lempel-Ziv complexity analysis algorithm for extracting the fault features of rotating machinery. By combining multivariate hierarchical multiscale fluctuation dispersion entropy, multi-cluster feature selection, and a gray wolf optimization-based kernel extreme learning machine, Zhou et al [8] constructed a fault diagnosis model for rotating machinery. These methods exhibit a high classification accuracy for small-sample data without relying on precise mathematical models. However, the artificial design of feature parameters hinders the adaptivity and generalization performance of these measures. Additionally, the recognition accuracy significantly decreases in large-scale complex monitoring data mining.
Compared with the artificial mining of features in shallow framework architectures, automatic feature extraction [9][10][11] is an advantage of deep learning. Naturally, as a core module of deep framework architectures [12], auto-encoders (AEs) possess the abilities of data dimension reduction, automatic feature extraction, and parameter discrimination and are already widely employed in mechanical fault diagnosis [13][14][15]. Xiong et al [16] proposed a method based on the sparrow search algorithm (SSA), variational modal extraction (VME), and multipoint optimal minimum entropy deconvolution adjusted (MOMEDA), that is, SSA-VME-MOMEDA, for gear fault diagnosis, and the simulation and experimental results revealed the excellent diagnostic effect of this method. However, the empirical nature of hyperparameter configurations and known feature parameters limit their effectiveness and practicality. Combined with multi-dilation rates and a multi-attention mechanism, Chu et al [17] designed a multiscale convolution model for mechanical fault diagnosis, and experiments for bearing and gearbox datasets showed fine diagnostic performance. However, the mixed-fault issues were not fully considered, and the model had a large scale and a long training time. Liu et al [18] developed a method based on meta-analogical momentum contrast (MA-MOCO) learning that focused on wind turbine fault diagnosis with data scarcity and mixed faults. The results indicate that there is still room for improvement in accuracy and generalization ability. Gunapriya et al [19] integrated principal component analysis and a fuzzy inference system for signal feature selection in induction motor fault classification, which is particularly suitable for single-point faults with data scarcity. However, mixed faults were neglected, and the artificial selectivity of the feature parameters prevented their practicality and generalization.
However, deep neural network-based diagnostic strategies generally assume balanced monitoring class data and pronounced fault symptoms. These assumptions seriously hinder recognition accuracy and limit the practical effectiveness of the methods in intelligent status detection [20,21]. On the one hand, recognizing equipment health conditions under balanced class data ignores the scarcity of fault data at industrial sites, which biases diagnostic results toward the abundant classes. On the other hand, when machinery malfunctions, the weak initial signal strength, low signal-to-noise ratio, and submerged fault features increase the difficulty of feature mining.
Additionally, intelligent detection and recognition of industrial machines encounter some key challenges. Currently, most deep-learning-based diagnostic methods have been validated with over 50% of the total training samples [20]. However, in practical industrial engineering, high-quality condition monitoring data, particularly high-capacity labeled condition monitoring data, are insufficient [22]. Unlike one-point failures, multiple mixed faults are more realistic in practical industrial machine systems, and their vibration signals are more complicated, increasing the difficulty of diagnosis [2,23]. Fault diagnosis should be studied at a level higher than the component level. Similarly, it is necessary to consider the fundamental causes (i.e. interactions between mechanical systems) of faults [24,25]. Consequently, achieving an effective intelligent diagnosis of multiple mixed faults is challenging with a small training-sample size in rotating machinery.
Therefore, in this study, a feature reconstruction and sparse AE (SAE) model-based intelligent diagnostic solution for multiple mixed faults was constructed. The major contributions of this study are as follows: (1) An effective intelligent diagnostic solution is built, eliminating the constraints on weak fault-feature mining under intense background noise. (2) A feature reconstruction algorithm is designed for transmitting and processing multi-scale sub-feature data to enhance the feature representation of scarce data and reduce operating costs. (3) Based on the improved SAE model, the proposed method can conduct accurate diagnosis with a small training-sample size. (4) The proposed method achieves effective and high-accuracy diagnosis of multiple mixed faults.
The remainder of this paper is organized into several sections. Section 2 provides a brief introduction to deep SAE networks. Section 3 presents the proposed feature reconstruction algorithm and introduces the designed feature reconstruction and the SAE model-based diagnosis method for multiple mixed faults. Section 4 describes the diagnostic model for hyperparameter configuration and the criteria for performance evaluation. Performance verification and comparative experiments on artificial- and real-fault diagnoses are presented in section 5. Finally, conclusions are drawn in section 6.

A brief introduction to AE
A deep model, an AE, is proposed to address the primary disadvantages of traditional shallow machine-learning methods.
The AE model is a deep network based on unsupervised learning that achieves an efficient representation of input data.

Encoder
As a module of the AE model, the encoder encodes high-dimensional inputs into low-dimensional hidden variables, thereby forcing the model to learn the most informative features. In addition, the dimensions of the feature data obtained by this method are much smaller than those of the original inputs, reducing the dimensionality of the feature data.
Specifically, consider an unlabeled input data sample set $X_{In} = \{x_{In}^{1}, x_{In}^{2}, \ldots, x_{In}^{k}, \ldots, x_{In}^{F}\}$, where $x_{In}^{k} \in \mathbb{R}^{D \times 1}$, $D$ represents the dimension of every input sample, and $F$ refers to the capacity of the input data sample set. The mapping function of the encoder is defined as $f_{en}^{In}$, and the hidden variable, $Hf_{en}^{In}$, can be calculated from formula (1), where the sigmoid function $Si(\cdot)$ is generally regarded as the activation function used in the hidden layer, and $W_{en}^{In}$ and $b_{en}^{In}$ are the weight matrix and the bias vector of the encoder, respectively.
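The encoder mapping described above can be sketched in a few lines. This is a minimal illustration, assuming formula (1) has the usual affine-plus-sigmoid form $Si(W_{en}^{In} x_{In}^{k} + b_{en}^{In})$; the sizes and random values are placeholders, not settings from this study.

```python
import math
import random


def sigmoid(z):
    """Logistic activation Si(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def encode(x, W_en, b_en):
    """Map a D-dimensional sample x to a hidden vector with len(W_en) entries.

    Assumes the affine-plus-sigmoid form; each row of W_en feeds one hidden unit.
    """
    return [
        sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(W_en, b_en)
    ]


# Illustrative sizes only (not the configuration used in the paper).
random.seed(0)
D, hidden = 8, 3
x = [random.uniform(-1.0, 1.0) for _ in range(D)]
W_en = [[random.uniform(-0.5, 0.5) for _ in range(D)] for _ in range(hidden)]
b_en = [0.0] * hidden
h = encode(x, W_en, b_en)  # hidden variable with 3 entries, each in (0, 1)
```

Because the hidden vector is shorter than the input, the encoder acts as the dimension-reducing step that the text describes.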

Decoder
As another module of the AE model, the decoder function restores the hidden variables of the hidden layer to their initial dimensions.In the best form, the decoder output can perfectly or approximately restore the original input.
If the mapping function of the decoder is defined as $f_{de}^{Ou}$, the reconstruction vector of the input data sample can be calculated from formula (2), where $W_{de}^{Ou}$ and $b_{de}^{Ou}$ are the weight matrix and the bias vector of the decoder, respectively.
The AE model rebuilds the signals, minimizing the reconstruction error in formula (3) to maximize the accuracy of raw data recovery, where $\hat{X}_{In}$ represents the reconstructed data sample set.
Therefore, AE models are commonly employed as feature detectors to mine features hidden in raw input samples. Generally, because an unconstrained AE may simply copy its input to its output, different constraints are often introduced to the AE in practical applications, which forces the model to mine valuable features in raw samples.
An effective practice is to introduce the Kullback-Leibler (KL) divergence function into the loss function of the AE model as a sparse-constraint condition, sparsifying the hidden variable $Hf_{en}^{In}$. The average activation of $Hf_{en}^{In}$ is shown in formula (4), where $Dim$ refers to the dimension of the hidden variable $(Hf_{en}^{In})_i$. The KL divergence function generated by the sparseness-constraint condition is presented in formula (5), where $\rho$ is the sparsity parameter, with values approaching 0. Models incorporating the sparseness condition are called SAE models, which have already been widely applied in multiple sectors [26,27]. Finally, the training of the SAE model is transformed into solving the optimization problem of formula (6), where $\Phi = \{W_{en}^{In}, W_{de}^{Ou}, b_{en}^{In}, b_{de}^{Ou}\}$, and $a$ is a proportion-tuning parameter that balances the reconstruction error against the sparseness condition.
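The following sketch shows how a reconstruction error and a KL sparsity penalty can be combined into one SAE objective. It uses the textbook formulation, which is an assumption here: a sigmoid decoder, a mean squared reconstruction error scaled by 1/(2F), and an average activation taken per hidden unit over the samples. The exact forms of formulas (2) to (6) may differ, and the values `rho=0.05` and `a=3.0` are illustrative defaults only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine_sigmoid(v, W, b):
    """Apply Si(W v + b) row by row."""
    return [
        sigmoid(sum(w * vi for w, vi in zip(row, v)) + bi)
        for row, bi in zip(W, b)
    ]


def kl_sparsity(rho, rho_hat):
    """Sum over hidden units of KL(rho || rho_hat_j) for Bernoulli means."""
    return sum(
        rho * math.log(rho / r) + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - r))
        for r in rho_hat
    )


def sae_loss(X, W_en, b_en, W_de, b_de, rho=0.05, a=3.0):
    """Reconstruction error plus a-weighted KL sparsity penalty (assumed form)."""
    H = [affine_sigmoid(x, W_en, b_en) for x in X]          # hidden variables
    X_hat = [affine_sigmoid(h, W_de, b_de) for h in H]      # reconstructions
    n_samples = len(X)
    recon = sum(
        sum((xh - xi) ** 2 for xh, xi in zip(x_hat, x))
        for x_hat, x in zip(X_hat, X)
    ) / (2.0 * n_samples)
    # Average activation of each hidden unit over all samples.
    rho_hat = [sum(h[j] for h in H) / n_samples for j in range(len(H[0]))]
    return recon + a * kl_sparsity(rho, rho_hat)


# Tiny illustrative problem: two 2-dimensional samples, one hidden unit.
X = [[0.2, 0.8], [0.6, 0.4]]
loss = sae_loss(X, W_en=[[0.1, -0.2]], b_en=[0.0], W_de=[[0.3], [-0.1]], b_de=[0.0, 0.0])
```

The penalty is zero only when every hidden unit's average activation equals `rho`, so a small `rho` pushes most hidden units toward inactivity.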

Feature reconstruction and SAE model based diagnosis method
Although deep-learning-based diagnostic methods are promising and emerging techniques, they face the problems of scarce fault data and multiple mixed-fault diagnoses in the early detection of rotating machinery failure. The crucial challenge in addressing these issues while maintaining feature diversity is minimizing the detrimental effects of high-dimensional feature vectors on computational reliability. Meeting this challenge also strengthens feature learning and diagnostic expression under small training samples and multiple mixed faults. Thus, as shown in figure 1, a feature reconstruction and SAE model-based intelligent diagnostic solution for multiple mixed faults was constructed in this study, eliminating the constraints on weak fault feature mining under intense background noise.

Feature reconstruction algorithm
An effective algorithm for feature reconstruction was developed in this study to alleviate two problems: the requirement of most deep learning-based diagnostic methods for many training samples in rotating machinery fault diagnosis, and the scarcity of high-quality labeled data in practical industrial engineering. This algorithm transmits and processes multi-scale sub-feature data, enhancing the feature representation of scarce data and reducing running costs.
Suppose a raw input data sample with $F$ features is decomposed into three sub-data samples based on three sliding windows $wd_i$, where $wd_i$ represents the window size and $i = 1, 2, 3$. For the raw input data samples of the $C$ classes, each sub-sample generates a $D_i C$-dimensional class vector, where $D_i$ is calculated from formula (7).
Notably, to reduce the matrix dimensionality, a maximum pooling layer is applied to each class matrix, which reduces the matrix dimension from $D_i \times C$ to $\frac{D_i}{2} \times C$. By cascading, the input data samples ultimately generate three $D_i$-dimensional transformed feature vectors, as shown in formula (9).
Because similar features may cause overfitting, pooling is employed in this study to compare adjacent intra-class feature vectors and reduce their dimensionality by half. Therefore, the initial feature extraction can be achieved without changes in the characteristic distribution. In this manner, the redundancy of the signal data decreases, and the computational efficiency of the algorithm is improved.
The algorithm for feature reconstruction can be expressed by formula (10) to improve the transmission and processing efficiency of the feature data, where $Fe_j$ represents the feature vector cascaded to the $j$th layer in the feature extraction model, and $Fe_1$ is the feature vector input from the multi-scale window to the feature extraction model. Denote the transformed feature vector for every $wd_i$ as $X_i$, where $i = 1, 2, 3$.
To fully consider the diversity of features, as shown in formula (11), $X_1$, $X_2$, and $X_3$ are cascaded back together to produce a feature vector with $Y$ dimensions, and this vector is then adopted as the input sample for the first layer of the feature-extraction model. Using this approach, the original feature diversity can be preserved.
In the characteristic matrix, the feature vector for each transformation is formed by flattening the stacked features of the $C$ classes. Essentially, a group of feature vectors with a length of $C$ is a type of class feature. Accordingly, based on the proposed feature reconstruction algorithm, maximum pooling resampling is performed on these class feature vectors in every transformed feature vector, effectively reducing the data dimensionality and improving the feature density.
The feature reconstruction algorithm designed in this study reduces the memory cost for calculating and transmitting feature data and preserves their diversity.Therefore, the operational effectiveness of fault diagnosis increases.
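The multi-scale windowing, pairwise maximum pooling, and cascading steps can be illustrated with the sketch below. Several parts are assumptions rather than details from this study: a stride of 1 (so each window size `wd` yields `F - wd + 1` windows), and a hypothetical `window_descriptor` helper that stands in for whatever produces the C-dimensional class vector of each window.

```python
import math


def sliding_windows(signal, wd, stride=1):
    """Return all windows of length wd (stride 1 is an assumption)."""
    return [signal[s:s + wd] for s in range(0, len(signal) - wd + 1, stride)]


def window_descriptor(window, n_classes):
    """Hypothetical stand-in for the C-dimensional class vector of one window.

    Here: mean absolute value of n_classes consecutive chunks of the window.
    """
    step = len(window) // n_classes
    out = []
    for c in range(n_classes):
        end = (c + 1) * step if c < n_classes - 1 else len(window)
        chunk = window[c * step:end]
        out.append(sum(abs(v) for v in chunk) / len(chunk))
    return out


def max_pool_pairs(vectors):
    """1 x 2 max pooling: merge adjacent class vectors element-wise."""
    return [
        [max(a, b) for a, b in zip(vectors[i], vectors[i + 1])]
        for i in range(0, len(vectors) - 1, 2)
    ]


def reconstruct_features(signal, window_sizes=(64, 128, 256), n_classes=3):
    """Cascade the pooled, flattened class vectors of every window size."""
    features = []
    for wd in window_sizes:
        class_vectors = [
            window_descriptor(w, n_classes) for w in sliding_windows(signal, wd)
        ]
        pooled = max_pool_pairs(class_vectors)             # halves the row count
        features.extend(v for vec in pooled for v in vec)  # flatten and cascade
    return features


signal = [math.sin(0.1 * t) for t in range(300)]  # synthetic placeholder signal
features = reconstruct_features(signal)
```

With a 300-point signal, the three window sizes give 237, 173, and 45 windows, which pooling reduces to 118, 86, and 22 class vectors before they are flattened and concatenated.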

Improved SAE model
Some problems of traditional SAE in the feature extraction of rotating machinery are the similarity and inadequacy of feature learning caused by the random initialization of network hyperparameters and the restriction of parameter size in the loss function on optimization efficiency.An improved SAE model is constructed to address these issues.
First, the initialization constraint condition of the network hyperparameters was designed as follows. The input layer size of the SAE connection weight $W$ is presumed to be $Z_{in}$ and the output layer size to be $Z_{ou}$. Additionally, $y = \frac{6}{Z_{in} + Z_{ou}}$ is developed. Formula (12) is then constructed for the initialization of $W$, where $RF(\cdot)$ is a random function.
Second, the loss function of the SAE was improved as follows. Compared to the KL divergence function, the advantages of the L1 norm lie in its ability to sparsify the weights and characteristics of a single hyperparameter. Therefore, the sparse-constraint term in formula (13) was designed, where $Dim$ represents the dimensions of the hidden layer feature vector. In addition, to fully mine the most discriminative features in the input signal, the weight regularization term for $W_{en}^{In}$ is constructed, as shown in formula (14), where $n$ denotes the dimension of the input sample and $m$ is the feature dimension.
Finally, after employing the well-trained improved SAE to extract features, softmax was selected as the terminal classifier to perform the discrimination task.
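A bounded random initialization and an L1 sparsity term of the kind described above might look as follows. This sketch swaps in the Glorot (Xavier) uniform scheme, which draws weights from [-y, y] with y = sqrt(6 / (Z_in + Z_ou)). Whether the initialization in formula (12) uses this square-root bound is an assumption that cannot be confirmed from the text. The L1 term is likewise assumed to be the plain sum of absolute hidden activations.

```python
import math
import random


def init_weights(z_in, z_ou, rng):
    """Glorot-style uniform initialization (assumed form of the bounded RF(.))."""
    y = math.sqrt(6.0 / (z_in + z_ou))  # assumption: square-root bound
    return [[rng.uniform(-y, y) for _ in range(z_in)] for _ in range(z_ou)]


def l1_sparsity(hidden):
    """L1 norm of a hidden feature vector (assumed form of the sparse term)."""
    return sum(abs(h) for h in hidden)


rng = random.Random(0)
W = init_weights(512, 200, rng)          # 512 inputs feeding 200 hidden units
penalty = l1_sparsity([0.5, -0.25, 0.0])  # 0.75
```

Scaling the bound by the layer sizes keeps the initial activations from saturating the sigmoid, which is the usual motivation for this family of initializers.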

Proposed diagnosis method based on feature reconstruction and improved SAE model
In this study, a feature reconstruction and SAE model combined with a recognition scheme for multiple mixed faults was developed.The general procedure is shown in figure 1.First, the original vibration signals are acquired by monitoring the rotating machinery.Then, the obtained signals are decomposed, with one part enhanced by the proposed feature reconstruction algorithm to form a training set and the other part intercepted to form a testing set.Next, the training samples were input into the improved SAE to train the deep model.Finally, the failure types and levels in the test sets are successfully determined using a well-trained model.
The network was trained using a greedy algorithm in a layered manner for exceptional discriminative performance.The improved SAE hyperparameters are individually trained for each layer in the forward pretraining part, and in the back-fine-tuning part, the hyperparameters of each layer of the diagnosis model are updated together based on the backpropagation algorithm.
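The final classification step, in which the softmax classifier turns the scores of the last layer into a health-state decision, can be sketched as below. The label names are borrowed from the gearbox experiment in section 5 purely for illustration, and the score vector is made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Health-state labels taken from the gearbox experiment (illustrative use).
LABELS = ["N", "PGP", "PGB", "PGB&SGW", "PGP&SGW", "SGW"]


def predict(scores, labels=LABELS):
    """Return the most probable label and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return labels[best], probs[best]


label, confidence = predict([0.1, 2.0, 0.3, 0.0, -1.0, 0.5])  # made-up scores
```

In the two-stage training described above, this classifier sits on top of the pretrained layers and is updated together with them during fine-tuning.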

Diagnostic model hyper-parameters configuration
To achieve a preferable classification performance, reasonably configuring the hyperparameters for the deep framework is crucial. Consequently, an experimental method was adopted to obtain the optimal hyperparameters of the diagnostic model. Specifically, parameter configuration tests were conducted on unloaded rolling bearings based on the vibration signals. This set included one normal dataset and nine types of failure data. The dataset is rich in normal-condition data, whereas the failure data are in small volumes. Therefore, the proposed feature reconstruction algorithm was first employed to construct a training-sample set as follows:
(1) The truncation method is adopted to obtain a data sample matrix with size 1200 × 300 for the normal-condition dataset of the bearing.
(2) Sliding windows with sizes of 64, 128, and 256 are employed to decompose the nine fault condition datasets of the bearing into three sub-datasets.
(3) For each fault condition, a maximum pooling layer with a window size of 1 × 2 is applied to every sub-data sample set, achieving a reduction in the dimensionality of the sub-data sample matrix.
(4) The data sample matrix in (1) is combined with the data sample set of all fault conditions after dimensionality reduction to form a training-sample set with size 1200 × 3000.
The truncation method was then used to construct a testing sample set with size 1200 × 300.The training and testing sample sets have different distributions, which are more suitable for revealing the generalization ability of the proposed model.
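The stated set sizes can be checked with simple arithmetic. The sketch below assumes that every class, fault or normal, contributes the same number of samples (300) to the training set. The text implies this through the 1200 × 3000 total but does not state it outright.

```python
def training_set_shape(sample_length, samples_per_class, n_fault_classes):
    """Shape of the combined training matrix: one normal class plus fault classes.

    Assumes each class contributes the same number of samples (columns).
    """
    n_columns = samples_per_class * (1 + n_fault_classes)
    return sample_length, n_columns


# Bearing configuration test: one normal class and nine fault classes.
bearing_shape = training_set_shape(1200, 300, 9)  # (1200, 3000)
```

The testing set, by contrast, is built by truncation alone, which is why its 1200 × 300 size and its distribution differ from those of the training set.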
In particular, considering both contingency and randomness, 30 consecutive trials were conducted, and the hyperparameters for the deep framework were effectively arranged based on the average experimental effect.

Configuration of hidden layer neuron size.
The number of neurons decreases with the depth of the network [28]. Figure 2 shows the relationship between the hidden layer neuron sizes and the diagnostic indicators.
Figure 2(a) shows the diagnostic accuracy obtained from 30 consecutive trials under different neuron-size conditions, whereas figure 2(b) shows the average training and testing times of these trials. After weighing diagnostic accuracy against efficiency, 200 × 100 was selected as the hidden layer neuron size configuration in this study, as highlighted by the red circle in the figure.

Configuration of measurement coefficient of loss function.
(1) Configuration of measurement coefficient of first hidden layer weight regularization term. In the cost function, the share of the designed weight regularization term for $W_{en}^{In}$ is set by adjusting the proportion-tuning parameter. Thus, appropriately configuring the measurement coefficient is a feasible way to detect the most discriminative features of a vibration signal. Figure 3 shows the relationship between the measurement coefficient and recognition accuracy. As shown in figure 3, when the measurement coefficient was within [0.01, 0.1], the test accuracy of the designed framework remained above 99%. Furthermore, as the measurement coefficient deviated from this interval, the evaluation indicator of the framework exhibited a downward trend. Finally, in this study, 0.01 is selected as the value of this measurement coefficient, as indicated by the black circle in figure 3.
(2) Configuration of measurement coefficient of the sparse-constraint term. To best satisfy the feature detection requirements and improve the diagnostic efficiency of this model, the share of the sparseness condition in the cost function must be allocated reasonably. Consequently, the configuration of the proportion-tuning parameter is particularly important. Figure 4 shows the relationship between the measurement coefficient and the diagnostic accuracy.
As shown in figure 4, when the measurement coefficient was within [3, 5], the diagnostic accuracy of the model was always greater than 99%. As the measurement coefficient continued to decrease, diagnostic accuracy showed a downward trend. Therefore, 5 is selected as the value of the measurement coefficient in this study, as indicated by the blue circle in figure 4.

Diagnostic model performance evaluation criteria
Furthermore, for performance evaluation of the proposed diagnostic model, the strategy in section 4.1 is still adopted to construct the training/testing sample set of rolling bearings.Tests were then conducted to investigate the impact of changes in the size of the condition-monitoring data on the diagnostic results.The process is as follows.
(1) Consider that the normal-condition data are rich data with a length of $L_N$ that remains unchanged, while the failure data are scarce data with a length of $L_F = L_N / \varphi$, where $\varphi > 1$. The imbalance ratio between the scarce classes and the rich class data is then $\varphi = L_N / L_F$.
(2) $\varphi$ is increased in steps of 10, and then $\varphi_i$ is acquired.
Although figure 5 shows distribution differences between the training and testing sample sets, the mean testing accuracy of this diagnostic framework still exceeds 99% at $\varphi_i = 50$. With the continuous increase in $\varphi_i$, the size of the fault condition data decreases sharply, and the diagnostic performance of the model shows a downward trend. Nevertheless, the average testing accuracy remains over 94% when $\varphi_i = 100$. However, as $i$ continues to increase, the proposed method can hardly complete the diagnostic task accurately under a severe shortage of fault data. Therefore, the model proposed in this paper is particularly suitable for high-accuracy diagnosis situations with $\varphi_i \leqslant 50$.
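The imbalance protocol can be made concrete with a short sketch. The relation L_F = L_N / φ comes from the text. The starting value of the ratio sweep and the normal-data length used below are hypothetical, chosen only to show how quickly the fault data shrink.

```python
def fault_length(l_n, phi):
    """Length of the scarce fault data for imbalance ratio phi (L_F = L_N / phi)."""
    if phi <= 1:
        raise ValueError("the imbalance ratio must exceed 1")
    return l_n / phi


def imbalance_schedule(phi_start, step, count):
    """Ratios phi_i obtained by increasing phi in fixed steps.

    The step of 10 follows the text; phi_start is a hypothetical choice.
    """
    return [phi_start + step * i for i in range(count)]


L_N = 120000                              # hypothetical normal-data length
ratios = imbalance_schedule(10, 10, 10)   # 10, 20, ..., 100
lengths = [fault_length(L_N, phi) for phi in ratios]
```

At a ratio of 50 the fault data are already fifty times shorter than the normal data, which is the regime where the reported mean testing accuracy still exceeds 99%.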

Multiple mixed faults diagnosis in gearbox
A gearbox is a mechanical system used to increase/decrease torque through deceleration. It consists of two or more gears, of which one is driven by an electric motor. Gearboxes are typically preferred in constant-speed applications because they provide increased torque. Unlike element failures, most gearbox faults are mixed rather than single-point failures. In some ways, they reflect the interactions between units and therefore correspond more closely to practical mechanical faults. Compared to a single fault, a mixed fault also introduces more noise into the vibration signal, making effective fault diagnosis more difficult. Therefore, to verify the effectiveness of the proposed method, a gearbox fault-detection dataset collected from the test bench [29] provided by the Qianpeng Company was employed. The gearbox was formed by meshing the primary and secondary gears. In the test, the acceleration sensor installed on the gearbox collected vibration signals at a sampling frequency of 5.12 kHz. Specifically, the following types of vibration data were collected: (1) normal (N), (2) primary gear-pitting malfunction (PGP), (3) primary gear tooth breakage malfunction (PGB), (4) primary gear tooth breakage and secondary gear wear malfunctions (PGB&SGW), (5) primary gear corrosive pitting and secondary gear wear malfunctions (PGP&SGW), and (6) secondary gear wear malfunction (SGW). Under load, the drive motor operated at 1 hp and 3 hp. Among the obtained monitoring signals, those in the non-fault state formed the rich category and those in the faulty states formed the scarce categories. The imbalance ratio from scarce to rich class data is φ = 20.
First, the proposed feature reconstruction algorithm is adopted to construct the training-sample set using the following steps:
(1) For the rich class dataset, the truncation method is employed to obtain the data sample matrix with size 512 × 300.
(2) Sliding windows with sizes of 64, 128, and 256 are used to decompose the five scarce class datasets into three sub-datasets.
(3) For each scarce class dataset, a maximum pooling layer with a window size of 1 × 2 is applied to each sub-dataset, reducing the dimensionality of the sub-data sample matrix.
(4) The data sample matrix in (1) is combined with those of all scarce class data after dimensionality reduction to form the training-sample set with size 512 × 1800.
The truncation method was then adopted to construct a testing sample set of size 512 × 420. The training and testing sample sets exhibited different distributions.
A detailed description of the experimental sample sets obtained from the aforementioned operations is provided in table 1.
The input/output neuron dimension in this diagnostic model was set as 512/6, whereas the other hyperparameters in the experiment followed the configuration scheme in section 4.1.In addition, formula (12) in section 3.1 is employed for initializing the weight matrix, while a zero vector is assigned to bias.The technical specifications of the computer used for model training were as follows: CPU Core i7-6700 3.4 GHz with 16 GB of RAM.

Validation experiments and analysis.
Considering both occasionality and randomness, ten consecutive trials are conducted; then, figure 6 gives the precision of these tests.The solid dots indicate the accuracy of a single test, and the blue dashed line indicates the average accuracy of ten trials.From figures 6(a) and (b), it can be observed that even under different loading conditions, the proposed method can detect fault types and determine gearbox health modes with an accuracy of over 99%.
The accumulated testing results for each health condition under different load conditions were calculated, as shown in figure 7. Figure 7(a) shows the correctly predicted sample size and practical sample size of the various health modes obtained from ten cumulative trials under loading mode 1 (1 hp). Only the correctly classified sample sizes of PGB, SGW, and SGW are slightly smaller than the practical sample sizes. Similarly, figure 7(b) shows the results under loading mode 2 (3 hp).
For the quantitative analysis of figure 7, the mean values of the classification precision for each state under different load conditions were calculated, as listed in table 2, which clearly shows the differences in the identification of each state. However, the developed scheme achieved a diagnostic precision of over 98.8% for multiple mixed faults in the gearbox, even with a load disturbance.
Further, to detect the specific situation of misclassification and missing classification, the classification matrices of confusion under loading mode 1 (1 hp) and loading mode 2 (3 hp) are generated, as shown in figure 8, where the elements on the diagonal of the matrix represent the correctly predicted sample size while the elements outside the diagonal are the incorrectly predicted sample size.
In loading mode 1 (1 hp), the single-point fault SGW is mainly misdiagnosed as the multiple fault PGB&SGW, as illustrated in figure 8(a). As shown in figure 8(b), the increased load makes the misdiagnosis in loading mode 2 (3 hp) more complex. Multiple-fault SGW are misdiagnosed as single-point fault PGP, and single-fault PGB and SGW are misdiagnosed as multiple-fault SGW. Notably, however, the developed scheme keeps the misclassification scale low in multiple mixed-fault diagnoses in the gearbox. The missing classification rate is 0.
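A confusion matrix of the kind analyzed above can be computed as in the following sketch. The row/column convention (rows are true states, columns are predicted states) is an assumption. The text only states that diagonal elements count correct predictions. The toy labels are invented and are not results from this study.

```python
def confusion_matrix(true_labels, predicted_labels, classes):
    """Count samples per (true, predicted) pair; rows are true classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(true_labels, predicted_labels):
        matrix[index[t]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of samples on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


CLASSES = ["N", "PGP", "PGB", "PGB&SGW", "PGP&SGW", "SGW"]

# Invented toy predictions: one SGW sample is mistaken for PGB&SGW.
y_true = ["N", "SGW", "SGW", "PGB"]
y_pred = ["N", "SGW", "PGB&SGW", "PGB"]
cm = confusion_matrix(y_true, y_pred, CLASSES)
acc = accuracy(cm)  # 0.75
```

Off-diagonal entries show which state a sample was mistaken for, which is how a single-point fault being confused with a mixed fault becomes visible.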
Finally, table 3 provides the comprehensive diagnostic indicators for the developed scheme. Under load interference and ambient noise, the mean values of diagnostic precision for multiple mixed faults in the gearbox were above 99.4%, the average training time for this diagnostic framework was less than 20 s, and the average testing time did not exceed 4 ms. Therefore, even in the case of an imbalanced distribution of condition data, the proposed method can quickly and accurately identify fault types and determine the health modes for the gearbox.

Table 4. Average accuracy (%) of the comparison methods.
MDRMA-MSCM [17]              97.714   97.428
MA-MOCO [18]                 98.285   98.143
Improved deep forest [30]    93.238   92.976
MP-DBN [31]                  97.524   97.128
KPCA + AE [32]               96.436   96.064
CNN + TL [33]                97.143   96.857
The proposed method          99.786   99.405

Comparative experiments and analysis.
In this section, six methods were employed to analyze the same dataset: the multiscale convolution model with multi-dilation rates and a multi-attention mechanism (MDRMA-MSCM) [17], MA-MOCO [18], improved deep forest [30], mixed pooling deep belief network (MP-DBN) [31], KPCA + AE [32], and convolutional neural network + transfer learning (CNN + TL) [33]. Considering the testing stability, ten consecutive experiments were conducted; the average accuracy of the comparison methods is shown in table 4.
The accuracy of multi-scale convolution model based on MDRMA-MSCM method in [17] is relatively high.However, it is mainly aimed at single-point fault diagnosis and does not fully consider mixed-fault detection issues.The MA-MOCO learning-based model designed in [18] successfully handled wind turbine fault diagnosis, which is difficult to generalize for other mechanical condition recognition.As described in [30], the improved deep forest-based method reduces the burden of hyperparameter tuning.However, it ignores uneven data distribution, and there is still room for improvement in diagnostic accuracy.The fine diagnostic performance of the MP-DBN-based method proposed in [31] relies on massive sample data, images, and a powerful computer processing ability.The AE-based method designed in [32] performed well in handling single-point fault diagnosis issues with a balanced data distribution.However, it is not competent when faced with multiple mixed-fault diagnoses under imbalanced data distribution conditions.Fan et al [33] employed a CNN and TL for rolling-bearing single-point fault diagnosis.However, they are not well suited for other mechanical applications.
In summary, the method proposed in this study can effectively achieve multiple mixed-fault diagnoses of a gearbox with a small training-sample size, and its performance is superior to that of traditional methods.There are several reasons for this achievement.The designed feature reconstruction algorithm enhances the feature representation of scarce data and reduces  running costs.In addition, the constructed model quickly mined valuable features from the original signal, whereas the model parameter size decreased.Furthermore, the improved SAE model alleviates the impact of overfitting on model accuracy and strengthens the diagnostic accuracy and model generalization ability.

Multiple mixed faults diagnosis in rolling bearings
Rolling bearings are critical components of rotating machinery, and their quality directly determines the condition of the machine. Owing to the harsh work environment, the failure of rolling bearings is inevitable, and malfunctioning situations are unpredictable. Such multiple mixed faults seriously damage the stability and performance of the machinery. Therefore, to further verify the availability and generalization ability of the developed diagnosis scheme, a public experiment dataset was employed in this section, i.e. the rolling bearing fault detection dataset provided by the Chair of Design and Drive Technology at Paderborn University [34]. The main modules and parameters of the testbed are shown in figure 9 and table 5. Malfunction data on these bearings were produced based on accelerated life testing. Three types of rolling bearings were used in this experiment: healthy bearing (HB), inner circle failure bearing (IFB), and outer circle failure bearing (OFB). Three types of failures were observed: single spot failure (SiF), repeated failure (ReF), and multiple failures (MuFs). There are three levels of fault severity: D1, D2, and D3. Tables 6 and 7 list the full details of the experimental dataset. The fault modes in this dataset vary, especially for multiple mixed faults. Therefore, fault detection in this section faces greater complexity and diagnosis difficulty.
In the experiment, the vibration data of HBs are abundant, whereas those of faulty bearings are scarce; the imbalance ratio between the scarce and rich class data is φ = 30. Subsequently, a truncation strategy was employed to produce a testing sample set with a size of 2560 × 910. In this case, the training and testing sample sets still exhibited different distributions.
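As a rough illustration of the truncation strategy, the sketch below cuts a one-dimensional vibration record into non-overlapping segments of 2560 points and stacks them as columns. The function name `truncate_signal`, the synthetic random signals, and the split of the 910 test columns into 70 segments for each of the 13 health states are assumptions made for illustration; the text states only the final matrix size.

```python
import numpy as np

def truncate_signal(signal, seg_len=2560, n_segments=None):
    """Cut a 1-D signal into non-overlapping segments stacked as columns."""
    signal = np.asarray(signal, dtype=float)
    max_segments = signal.size // seg_len
    if n_segments is None:
        n_segments = max_segments
    if n_segments > max_segments:
        raise ValueError("signal too short for the requested number of segments")
    # Discard the trailing points that do not fill a whole segment.
    trimmed = signal[: n_segments * seg_len]
    # Each row of the reshaped array is one segment; transpose to get columns.
    return trimmed.reshape(n_segments, seg_len).T

rng = np.random.default_rng(0)
n_states, per_state = 13, 70  # assumed split: 13 * 70 = 910 columns
test_set = np.hstack([
    truncate_signal(rng.standard_normal(2560 * per_state + 100), n_segments=per_state)
    for _ in range(n_states)  # one synthetic record per health state
])
print(test_set.shape)  # (2560, 910)
```

Because the segments do not overlap, every test column is built from points that appear in no other column, which is one simple way to keep test samples independent of one another.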
Finally, for the rolling bearings, a detailed description of the experimental sample sets collected from the above steps is presented in table 8.

Validation experiments and analysis.
The input/output neuron dimensions of the model were set to 2560/13 in this section. The remaining hyperparameters still adopted the configuration scheme in section 4.1 and the parameter initialization strategy in section 5.1, with the aim of revealing the effectiveness, stability, and generalization performance of the detection model developed in this study.
Ten consecutive trials were conducted to ensure the generality of the results. Figure 10 displays the testing precision, where the solid dots denote the accuracy of a single test, and the green dashed line indicates the mean accuracy of all experiments. Under coexisting multiple mixed faults with bearing loading, the proposed method can detect the actual health condition and fault severity with an accuracy of over 99.7%.
The testing results for the actual health condition of each bearing were accumulated over the ten trials and plotted in figure 11, where the X-axis marks the bearing code and the Y-axis provides the accumulated testing sample size. The figure shows that only for FB01 and FB12 is the number of correctly classified samples slightly smaller than the actual number of samples.
For the quantitative analysis of figure 11, the mean values of the classification precision for the state of each bearing were computed and are provided in table 9. The identification results differ from bearing to bearing. However, with complex fault coexistence and loading conditions, the proposed method achieved a diagnostic accuracy of over 99.5% for multiple mixed faults in rolling bearings. Additionally, to facilitate the observation of misclassified and missed samples, a classification confusion matrix of health-condition detection for every bearing was generated and is displayed in figure 12, where the numerals on the diagonal of this matrix are the correctly predicted sample sizes, while the elements outside the diagonal denote the incorrectly predicted sample sizes.
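The relationship between the confusion matrix and the per-bearing accuracies can be sketched as follows. The labels are a small made-up example rather than data from the experiment, the helper names are hypothetical, and the convention of rows as true classes and columns as predicted classes is an assumption, since the text does not state the orientation used in figure 12.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Count samples; rows index the true class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def per_class_accuracy(cm):
    """Fraction of each true class that falls on the diagonal."""
    return np.diag(cm) / cm.sum(axis=1)

# Toy labels for three classes (illustrative only).
y_true = [0, 0, 0, 1, 1, 1, 2, 2, 2, 2]
y_pred = [0, 0, 0, 1, 2, 1, 2, 2, 2, 1]

cm = confusion_matrix(y_true, y_pred, n_classes=3)
print(cm)
print(per_class_accuracy(cm))    # accuracy for each class
print(np.trace(cm) / cm.sum())   # overall accuracy
```

Summing the matrices from repeated trials before computing the ratios gives accumulated counts of the kind plotted per bearing, whereas averaging the per-trial ratios gives mean accuracies of the kind tabulated per bearing.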
The main problem in the detection is the misdiagnosis of the indentation fault of the outer ring of FB02 as the pitting fault of the outer ring of FB03, and of the single-point fault of the outer ring of FB05 as the distributed multiple faults of [...] 10. Based on the real-fault data collected by accelerated life testing, it can be observed that under load interference and ambient noise, the mean values of the diagnostic precision for multiple mixed faults in bearings are above 99.7%. Moreover, with 3900 training samples, the average training time for this diagnostic framework was less than [...] s, whereas the average testing time was less than 15 ms. Therefore, even in the case of an imbalanced distribution of condition-monitoring data, the proposed method can achieve failure-type and health-mode classification for different bearings quickly and accurately.

Comparative experiments and analysis.
To further demonstrate the classification and generalization ability of the developed scheme, the six methods tested in section 5.1 (MDRMA-MSCM [17], MA-MOCO [18], improved deep forest [30], MP-DBN [31], AE [32], and CNN + TL [33]) were adopted again to analyze the same rolling bearing dataset. To reduce the influence of randomness and contingency, ten consecutive trials were conducted; the final statistical results of the comparison methods are listed in table 11.
The method designed in [17] is particularly applicable to unmixed mechanical fault diagnosis, and it is necessary to improve the generalization performance of the model in [18]. The method in [30] promoted the efficiency of hyperparameter tuning. However, the issues of multiple mixed faults, practical monitoring data, and uneven data distribution were not considered, which made it difficult to achieve satisfactory detection accuracy in this experiment. The method referred to in [32] is particularly suitable for bearing condition detection tasks with balanced data distribution and single-point faults only, whereas its recognition performance on the dataset in this test is significantly reduced. Compared to the previous methods, the diagnostic accuracy of the method proposed in [31] is relatively good. However, it has strict restrictions on data capacity, image quantity, and computer performance, limiting the flexibility and universality of its applications. In addition, the results in [33] show that it is suitable for classifying the target domain data when the source domain contains sufficient mechanical state samples.

Method                      Average test accuracy (%)
MDRMA-MSCM [17]             97.285
MA-MOCO [18]                98.428
Improved deep forest [30]   90.022
MP-DBN [31]                 96.626
KPCA + AE [32]              94.748
CNN + TL [33]               97.285
The proposed method         99.725
From the test results in table 11 and the above analysis, the method developed in this study can precisely complete the practical multiple mixed-fault diagnosis of rolling bearings with a small training-sample size, and its performance is superior to that of traditional methods. The main advantages of this method are as follows: (1) the designed feature reconstruction algorithm strengthens the feature representation of the fault data of each bearing and reduces running costs, (2) the constructed model can quickly extract valuable features from the original signal and decrease the model parameter size, and (3) the improved SAE model alleviates the impact of overfitting on model accuracy and promotes diagnostic accuracy and model generalization ability.
The experimental results in sections 5.1 and 5.2 show that practical faults differ from artificial damage in physical devices. Therefore, a data-based fault diagnosis method should be able not only to diagnose artificial damage but also to recognize practical faults quickly and accurately and determine the actual health condition, thereby improving the practicality of the diagnostic scheme.

Conclusions
In this study, a feature reconstruction and SAE model-based diagnostic method for multiple mixed faults is developed. For the long sequential features of vibration monitoring data, a feature reconstruction algorithm was designed to process the condition monitoring signal, solving the issues of feature extraction from scarce data and high computing costs. First, in multi-scale translation, the sub-features decomposed by three different sizes of sliding windows were connected to enhance data representation while retaining critical fault information. Subsequently, at the signal transmission from multi-scale translation to the SAE model, a resampling step that implants a maximum pooling layer was proposed to reduce excessive redundant data and improve network learning efficiency. Additionally, in the SAE model, the adaptive loss function was optimized to improve learnability with a smaller training-sample size. Finally, the results of artificial- and real-fault diagnoses were analyzed, which verified the validity of this scheme on multiple occasions and showed that it outperformed classical intelligent recognition strategies. The major contributions of this study are as follows.
(1) A hyperparameter configuration scheme for the designed diagnostic model was discussed and presented using a monitoring dataset independent of the cases in this study.
(2) Diagnostic performance evaluation criteria for the applicability and limitations of the designed method were tested and proposed, and the effectiveness on concrete problems was pre-estimated under different distributions of training and testing samples.
(3) The proposed method can effectively diagnose multiple mixed faults under class-imbalanced monitoring data. With an imbalance ratio of 100 between the scarce and rich class data, the diagnostic accuracy remained over 94%.
(4) Under the default hyperparameter settings, the developed scheme precisely classifies six states (i.e. one non-fault state + five multiple mixed failure states) of the gearbox and 13 states (i.e. one non-fault state + 12 multiple mixed failure states) of the rolling bearings, with diagnostic accuracies above 99.4% and 99.7%, respectively.
(5) The designed feature reconstruction algorithm enhanced the feature representation of scarce data and reduced operating costs, effectively improving the diagnostic performance of the SAE model.
(6) Compared with traditional intelligence-based diagnostic methods, the proposed method reduced the constraints on weak fault feature mining under intense background noise, decreased the hyperparameter size, and improved the accuracy and convenience of multiple mixed-fault diagnoses under skewed data distribution conditions.
Therefore, it is necessary to focus on solving practical engineering issues. Accordingly, future studies will focus on detection issues using cross-domain monitoring data, i.e. training and testing samples collected from different datasets for (1) the same machine and (2) different machines. Combining the developed scheme with the concept of TL will be an excellent direction for future studies.

Figure 1. Procedure of the proposed feature reconstruction and improved SAE based diagnosis method.

Figure 2. Tests on neuron size of hidden layers to diagnostic results.

Figure 3. Tests on regularization term measurement factor to diagnostic accuracy.

Figure 4. Tests on sparse constraint term measurement factor to diagnostic accuracy.

Figure 5. Tests on imbalanced scale to diagnostic accuracy.
(3) Based on each φ_i, the rolling bearing training-sample set is constructed with the strategy in section 4.1. (4) The rolling bearing test-sample set is formed with the strategy in section 4.1, and all the training-sample sets corresponding to φ_i share the same test set. (5) To avoid chance and randomness, this study conducted 30 consecutive tests, and figure 5 shows the mean results of the tests.

Figure 6. Diagnosis accuracy of gearbox multiple mixed faults under different loading modes.
Figure 7(b) shows the cumulative test results under loading mode 2 (3 hp), which are consistent with those in figure 7(a).

Figure 7. Accumulated testing results for every gearbox health condition under different loading modes.

Figure 8. Classification confusion matrices of gearbox multiple mixed faults diagnosis under different loading modes.

Table 6. Description of operating condition for healthy bearing.
Bearing code    Run-in period (h)    Radial load (N)    Speed (min⁻¹)

The proposed feature-reconstruction algorithm was employed to form the training-sample set. (1) For the rich class dataset, the truncation strategy is used to obtain a data sample matrix with a size of 2560 × 300. (2) Sliding windows with sizes of 64, 128, and 256 are adopted to disassemble each of the twelve scarce class datasets into three sub-datasets. (3) For each scarce class dataset, a maximum pooling layer with a window size of 1 × 2 is implanted on each sub-dataset, achieving a dimensionality reduction of the sub-data sample matrix. (4) The data sample matrix in (1) is combined with those of all scarce class data after dimensionality reduction to construct a training-sample set with a size of 2560 × 3900.
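Steps (2) to (4) can be sketched as follows. This is only one possible reading, not the authors' implementation: it assumes that the sizes 64, 128, and 256 act as the shift between consecutive overlapping segments, that each segment initially holds 5120 points so that 1 × 2 max pooling yields 2560 points, and that each scale contributes 100 columns per scarce class (so that 12 × 3 × 100 + 300 = 3900). The function names, the record length, and the random stand-in signals are hypothetical.

```python
import numpy as np

def sliding_segments(signal, seg_len, step, n_segments):
    """Overlapping segments of length seg_len, each shifted by `step` points."""
    needed = seg_len + step * (n_segments - 1)
    if signal.size < needed:
        raise ValueError("signal too short for the requested segments")
    starts = step * np.arange(n_segments)
    # Stack the segments as columns: shape (seg_len, n_segments).
    return np.stack([signal[s:s + seg_len] for s in starts], axis=1)

def max_pool_1x2(columns):
    """Maximum over non-overlapping pairs of adjacent points in each column."""
    seg_len, n = columns.shape
    return columns.reshape(seg_len // 2, 2, n).max(axis=1)

def reconstruct_scarce_class(signal, out_len=2560, steps=(64, 128, 256), per_scale=100):
    """Build one sub-dataset per scale, pool each, and join them column-wise."""
    subsets = [
        max_pool_1x2(sliding_segments(signal, 2 * out_len, step, per_scale))
        for step in steps
    ]
    return np.hstack(subsets)  # shape (out_len, len(steps) * per_scale)

rng = np.random.default_rng(0)
rich = rng.standard_normal((2560, 300))  # stand-in for the truncated rich-class matrix
scarce = [reconstruct_scarce_class(rng.standard_normal(2560 * 12)) for _ in range(12)]
train_set = np.hstack([rich] + scarce)
print(train_set.shape)  # (2560, 3900)
```

Under these assumptions a short scarce-class record is reused many times at different offsets, which is how a few raw samples can be expanded to match the 300 columns of the rich class.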

Figure 10. Diagnosis accuracy of loaded bearings with multiple mixed faults.

Figure 11. Accumulated testing results of loaded bearings health condition.

Figure 12. Classification confusion matrix of loaded bearings multiple mixed faults diagnosis.

Table 1. Detailed description of sample sets for gearbox multiple mixed faults diagnosis.

Table 2. Average test accuracy for every gearbox health condition under different loading modes.

Table 3. Comprehensive diagnosis indicators of gearbox multiple mixed faults diagnosis under different loading modes.

Table 4. Comparison of gearbox diagnostic accuracy for different methods.

Table 5. Description of test rig parameters.

Table 7. Description of operating conditions for damaged bearings.

Table 8. Detailed description of sample sets for bearings multiple mixed faults diagnosis.

Table 9. Average test accuracy for every loaded bearing health condition.

Table 10. Comprehensive diagnosis indicators of loaded bearings multiple mixed faults diagnosis.

Table 11. Comparison of loaded bearings diagnostic accuracy for different methods.