Fault diagnosis of a planetary gearbox based on a local bi-spectrum and a convolutional neural network

The transmission paths of vibration signals in planetary gearboxes are complex, and the signals are characterized by strong background noise, non-stationarity, and non-Gaussianity. Bi-spectrums can suppress Gaussian colored noise and are therefore suitable for processing the vibration signals of planetary gearboxes. Traditional bi-spectrum-based fault diagnosis methods generally use the amplitudes at the fault characteristic frequencies, or quantities calculated from them, as the basis of diagnosis. It has been found, however, that bi-spectrum images can directly characterize the faults of planetary gearboxes. Convolutional neural networks (CNNs) have been used in mechanical fault diagnosis in recent years, and converting one-dimensional original signals into two-dimensional images as CNN input is an effective approach. At present, no relevant research has used bi-spectral images as CNN input. In this study, a fault diagnosis method based on a local bi-spectrum and a CNN was proposed. A bi-spectral analysis of the vibration signals of the planetary gearbox was first carried out in order to reveal the fault information while retaining the non-Gaussian information. Then, according to the symmetry of the bi-spectrum, local images containing the main information were taken as the input of the CNN, which reduced the redundancy of the fault information. Next, in order to improve the diagnostic accuracy of the CNN, the key parameters of the CNN architecture were optimized. Finally, a CNN diagnosis model was built to classify different fault positions and different fault degrees of planetary gearboxes.
A comparison of the diagnosis results of full bi-spectrum + CNN, original vibration signal + CNN, local bi-spectrum + support vector machine, and local bi-spectrum + stacked auto-encoder showed that the proposed method achieved both accuracy and speed in the fault diagnosis of planetary gearboxes.

Keywords: convolutional neural networks, local bi-spectrum, planetary gearboxes, fault diagnoses
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Introduction
Planetary gearboxes are widely used in large and complex mechanical equipment, for example in aviation, metallurgy, and shipping. As an important part of the transmission system, their running condition directly determines the performance and safety of the entire system. In actual operation, the complex working environment of planetary gear pairs often leads to broken teeth and other faults in key components, such as sun gears, planet gears, and inner ring gears. If maintenance is not carried out in a timely manner, the entire system is likely to shut down [1,2]. Therefore, research on the condition monitoring and fault diagnosis of planetary gearboxes is of major significance for maintaining the normal operation of equipment and the safety of personnel. The vibration signal characteristics of a planetary gearbox are significantly different from those of a fixed-shaft gearbox. The planet gears not only rotate around their own axes, but also revolve with the planet carrier around the axes of other gears. Therefore, the vibration responses are complex and contain various frequency components, making it difficult to extract the fault characteristics [3]. In addition, in a planetary gearbox, the sensor is generally installed on the housing connected to the ring gear, and the relative position between the sensor and the meshing points of the two meshing pairs (sun gear-planet gear and planet gear-ring gear) changes with the rotation of the planet carrier. This changes the vibration transmission paths [4]. Consequently, the fault vibration signals of the planetary gear train present strong background noise and non-stationary, non-Gaussian characteristics, which increases the difficulty of fault diagnosis.
Many time-frequency analysis techniques have been successfully used for the fault detection of planetary gears. Representative methods include empirical mode decomposition (EMD), wavelet transform (WT), variational mode decomposition (VMD), and local mean decomposition (LMD) [2]. However, these methods have some inherent weaknesses for non-stationary signals. EMD suffers from mode aliasing and end effects. The WT and VMD methods are non-adaptive. The LMD method suffers from end effects, requires a large number of smoothing iterations, can advance or delay the signal, cannot optimize the step length used in smoothing, and has no fast algorithm [5]. In addition, these time-frequency analysis methods cannot retain the phase information between frequency components. Bi-spectrums, which are based on high-order statistics, completely retain the amplitude, frequency, and phase information of a signal, and have the properties of time-shift invariance, scale variability, and phase retention. They can also effectively suppress Gaussian colored noise, making them suitable for extracting features from non-stationary signals with strong background noise. Jiang et al [6] proposed a fault feature extraction method that combined EMD and bi-spectral analysis, and successfully applied it to the fault diagnosis of rolling bearings. Chen et al [7] effectively diagnosed early faults of motor bearings by combining improved EMD with bi-spectrum analysis. Liang et al [8] diagnosed tooth surface wear in planetary gearboxes based on the bi-spectral features and the 1.5D spectrum features of the signals. The abovementioned bi-spectrum-based methods generally used the amplitudes at the fault characteristic frequencies of the bi-spectrums or bi-spectrum slices, or quantities calculated from them, as the basis of the fault diagnosis.
However, the bi-spectrums of different types of gear faults, and even of different degrees of the same fault, show obvious differences as images and can therefore represent gear faults directly. For example, in [9] the bi-spectrum distribution area was used as the image feature for gear fault diagnosis, and a target template was constructed from the binary bi-spectrum image to diagnose typical gear faults by template matching. However, this method is not highly automated, and it is prone to confusion when the number of samples is large.
Machine learning fault diagnosis methods based on the bi-spectrum already exist in some applications. For example, Cao et al [10] proposed a radar emitter signal recognition method that performed bi-spectrum estimation of radar signals and used a hierarchical extreme learning machine for further feature learning and recognition. Mitiche et al [11] used a convolutional neural network (CNN) and ResNet to classify bi-spectrum images, and successfully classified insulation faults in a high-voltage environment. The CNN, proposed by Yann LeCun of New York University in 1998, is essentially a multi-layer perceptron. Its strong feature self-learning ability avoids the time-consuming and non-general feature selection of traditional algorithms. It has been applied in speech recognition [12,13], image classification [14,15], and other fields. In the past five years, CNNs have been applied in the field of mechanical fault diagnosis. Initially, they were mainly used for their powerful feature self-learning ability, in order to obtain fault features directly from the original signals. For example, Olivier et al [16] proposed a CNN-based feature learning model for condition monitoring that autonomously learns useful features for bearing fault detection from the data itself. Min et al [17] considered both the temporal and spatial information of the raw data from multiple sensors during the training of the CNN, and extracted representative features automatically from the raw signals. Then, with the wide application of CNNs in image recognition, researchers in mechanical fault diagnosis began to convert one-dimensional (1D) original time-domain signals into two-dimensional (2D) images and use them as CNN input in order to train classification models. For example, Long et al [18] proposed a conversion method that turns signals into 2D images, and then used a new CNN based on LeNet-5 for fault diagnosis.
They considered that the proposed method can extract the features of the converted 2D images and eliminate the effect of handcrafted features. Zhiyu et al [19] and Ying et al [20], seeking to take advantage of existing CNN research originally designed for images, applied the short-time Fourier transform to convert signals into 2D graphs and then used a CNN to classify the fault category. Huaqing et al [21] converted vibration signals from multiple sensors into images; in their view, feature maps of different fault types can be obtained in this way without tedious parameter adjustments. Based on the feature maps, a corresponding novel CNN model was then constructed to perform fault diagnosis. Zhuyun et al [22] proposed a novel deep learning (DL)-based fault diagnosis method, based on 2D map representations of cyclic spectral coherence and CNNs, to improve the recognition performance for rolling element bearing faults. Therefore, using 2D images as CNN input has been found to be an effective approach to mechanical fault diagnosis.
Taking bi-spectral images as input and using a CNN to construct the classifier for the fault diagnosis of planetary gearboxes makes full use of the advantages of both methods. Since bi-spectrums are symmetrical, the fault information contained in a full bi-spectrum is highly redundant, so directly taking a full bi-spectrum as the CNN input sample reduces the training efficiency of the CNN. Replacing the global bi-spectrum with a local bi-spectrum that still contains the global information not only reduces the redundancy of the fault information, but also reduces the dimension of the CNN training samples and improves the training speed. The performance of a CNN is largely dependent on its architecture, including the number of layers, the number of convolution kernels, the sizes of the convolution kernels, and the step-size of the convolution kernels. Therefore, optimizing those parameters can further improve the classification accuracy of the CNN. In the current research, the local bi-spectrum of the planetary gearbox signal was used to construct the CNN input samples in order to enhance the generalization performance of the CNN diagnosis model. In addition, in order to improve the fault diagnosis accuracy for the planetary gearbox, the key structural parameters of the CNN were optimized.

Theory of bi-spectrum
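The bi-spectrum is the Fourier transform of the third-order cumulant. Among the higher-order spectra, the bi-spectrum has the lowest order, so it has the lowest computational complexity and is highly practical. In the current study, {x(n)} was taken to be a zero-mean non-stationary random signal, and its autocorrelation function was defined as follows:

$$R(\tau) = E\{x(n)\,x(n+\tau)\}$$

where E{x(n) x(n + τ)} is the mathematical expectation. The power spectrum is defined as the Fourier transform of the autocorrelation function:

$$P(\omega) = \sum_{\tau=-\infty}^{\infty} R(\tau)\,e^{-j\omega\tau}$$

The third-order moment of x(n) is defined as:

$$R_3(\tau_1,\tau_2) = E\{x(n)\,x(n+\tau_1)\,x(n+\tau_2)\}$$

The bi-spectrum is defined as the 2D Fourier transform of the third-order moment:

$$B(\omega_1,\omega_2) = \sum_{\tau_1=-\infty}^{\infty}\sum_{\tau_2=-\infty}^{\infty} R_3(\tau_1,\tau_2)\,e^{-j(\omega_1\tau_1+\omega_2\tau_2)}$$

The bi-spectrum can be estimated as:

$$\hat{B}(\omega_1,\omega_2) = X(\omega_1)\,X(\omega_2)\,X^{*}(\omega_1+\omega_2)$$

where X(ω) is the Fourier transform of x(n) and X*(ω) is its complex conjugate.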
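As an illustration of the estimate built from the product of X(ω1), X(ω2), and the conjugate of X(ω1 + ω2), the following is a minimal NumPy sketch of a direct (FFT-based) bi-spectrum estimator. The function name `bispectrum_direct`, the segment length `nfft`, and the averaging over non-overlapping segments are assumptions made for illustration; the paper does not state how its estimate was computed.

```python
import numpy as np

def bispectrum_direct(x, nfft=64):
    """Direct bi-spectrum estimate, averaged over non-overlapping segments.

    Assumed procedure (not taken from the paper): split x into segments of
    length nfft, remove each segment's mean, and average
    X(w1) * X(w2) * conj(X(w1 + w2)) over the segments.
    """
    x = np.asarray(x, dtype=float)
    n_seg = len(x) // nfft
    if n_seg == 0:
        raise ValueError("signal is shorter than one segment")
    segs = x[: n_seg * nfft].reshape(n_seg, nfft)
    segs = segs - segs.mean(axis=1, keepdims=True)  # zero mean per segment
    X = np.fft.fft(segs, axis=1)                    # X(w) for every segment
    k = np.arange(nfft)
    sum_idx = (k[:, None] + k[None, :]) % nfft      # bin index of w1 + w2
    triple = X[:, :, None] * X[:, None, :] * np.conj(X[:, sum_idx])
    return triple.mean(axis=0)                      # shape (nfft, nfft)
```

The magnitude `np.abs(bispectrum_direct(signal))` is the kind of 2D array that could be rendered as a bi-spectrum image. For a real-valued signal the result is symmetric in its two frequency arguments, which is the redundancy that the local bi-spectrum exploits.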

Theory of convolution neural network
A CNN has a strong ability for feature learning and representation. Its weight sharing significantly reduces the computational complexity of the network, and it processes image information well. The CNN structure mainly includes an input layer, convolution layers, pooling layers, fully connected layers, and an output layer, as shown in figure 1.
The input layer is used to accept input images with predefined labels as valid training samples. The quality of the input image affects CNN performance, such as accuracy and efficiency.
The main function of the convolution layer is to extract features from the input image. The convolution kernel produces the filtering results by sliding over the input image. The convolution kernel does not need to be designed manually; it is randomly initialized and then optimized by backpropagation to obtain better recognition results. The operation of the convolution layer is expressed as

$$x_j^{l} = f\left(\sum_{i=1}^{M} x_i^{l-1} * k_{ij}^{l} + b_j^{l}\right)$$

where l is the index of the current layer, M is the number of input feature maps, $x_j^{l}$ is the jth feature map of the lth layer, $k_{ij}^{l}$ is the weight matrix of the convolution kernel, $b_j^{l}$ is the bias, f is the activation function, and * is the convolution operation [23].
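The forward pass of such a convolution layer can be sketched in a few lines of NumPy. The function name `conv_layer`, the array layout, the absence of padding, and the choice of ReLU as the activation f are assumptions for illustration. As in most deep learning libraries, the kernel is applied without flipping (cross-correlation).

```python
import numpy as np

def conv_layer(x, k, b, stride=1):
    """Forward pass of one convolution layer (no padding).

    x: (M, H, W) input feature maps
    k: (M, N, C, C) kernels linking input map i to output map j
    b: (N,) biases, one per output map
    Returns (N, out_h, out_w) output maps after a ReLU activation.
    """
    M, H, W = x.shape
    _, N, C, _ = k.shape
    out_h = (H - C) // stride + 1
    out_w = (W - C) // stride + 1
    out = np.zeros((N, out_h, out_w))
    for j in range(N):
        for r in range(out_h):
            for c in range(out_w):
                r0, c0 = r * stride, c * stride
                patch = x[:, r0:r0 + C, c0:c0 + C]
                # Sum over all M input maps, then add the bias of map j.
                out[j, r, c] = np.sum(patch * k[:, j]) + b[j]
    return np.maximum(out, 0.0)  # activation f, assumed to be ReLU
```

The explicit loops are slow but mirror the summation over input maps directly; a practical model would use a library implementation instead.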
The batch normalization (BN) layer normalizes the input of each layer of the network. When the network is trained, its parameters are updated, so the input data distribution of every layer after the input layer keeps changing. This change in data distribution is called internal covariate shift. BN addresses the changes in the data distribution of the intermediate layers during training. The expression of BN is

$$x_i^{l} = \gamma\,\frac{x_i^{l-1} - \mu}{\sigma} + \beta$$

where µ is the mean value of $x_i^{l-1}$, σ is the standard deviation of $x_i^{l-1}$, γ is the scale factor, and β is the shift factor [24].

ReLU allows the network to introduce sparsity by itself [25]. Compared with sigmoid and tanh, ReLU converges rapidly in stochastic gradient descent (SGD). The expression is

$$f(x) = \max(0, x)$$

The pooling layer compresses the extracted features. After feature extraction in the convolution layer, the output feature map is passed to the pooling layer for feature selection and information filtering. In this paper, maximum pooling is selected, and the maximum value of each sampling area is taken as the output of the pooling layer. The pooling layer expression is

$$x_j^{l} = f\left(\beta_j^{l}\,\mathrm{down}\left(x_j^{l-1}\right) + b_j^{l}\right)$$

where down() represents a down-sampling function, and each output map is given its own multiplicative bias β and additive bias b. If there are N input maps, there will be exactly N output maps.
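The three operations described here (batch normalization, ReLU, and maximum pooling) can be sketched with NumPy as follows. The small constant `eps`, which guards against division by zero, is an assumption that does not appear in the text, as are the function names and the choice to normalize over the first (batch) axis.

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize over the batch axis (axis 0), then scale and shift."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    # eps is an added assumption to avoid dividing by zero.
    return gamma * (x - mu) / np.sqrt(sigma ** 2 + eps) + beta

def relu(x):
    """Rectified linear unit: max(0, x) element-wise."""
    return np.maximum(x, 0.0)

def max_pool(x, size=2):
    """Non-overlapping max pooling of a 2D feature map."""
    h, w = x.shape
    x = x[: h - h % size, : w - w % size]  # drop rows/columns that do not fit
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))
```

For example, `max_pool` applied to a 4 × 4 map keeps the largest value of each 2 × 2 block and returns a 2 × 2 map, so each pooling stage with `size=2` halves both image dimensions.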
The fully connected layer maps the distributed feature representation to the sample space. In the fully connected layer, the feature map loses its spatial topology and is flattened into a vector. The fully connected layer combines the extracted features nonlinearly to obtain the output. A SoftMax classifier can be used for classification, and the output of layer l is

$$x_j^{l} = f\left(\sum_{i=1}^{M} \alpha_{ij}\,x_i^{l-1}\right), \quad j = 1, \ldots, N$$

where M is the dimension of the input vector, N is the dimension of the output vector, and $\alpha_{ij}$ is the weight of the jth output connected to the ith input.
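A minimal NumPy sketch of a fully connected layer followed by a SoftMax classifier is given below. The weight layout (an M × N matrix `alpha` whose entry [i, j] links input i to output j) follows the description above; the function names and the omission of a bias term are assumptions.

```python
import numpy as np

def fully_connected(x, alpha):
    """Map an M-dimensional input to N outputs: y_j = sum_i alpha[i, j] * x[i]."""
    return alpha.T @ x

def softmax(z):
    """Convert N scores into class probabilities that sum to one."""
    z = z - np.max(z)  # subtract the maximum for numerical stability
    e = np.exp(z)
    return e / e.sum()

# Example: 3 flattened features mapped to 7 operating states.
features = np.array([1.0, 2.0, 3.0])
alpha = np.zeros((3, 7))
alpha[0, 2] = 5.0  # hypothetical weight favouring class index 2
probs = softmax(fully_connected(features, alpha))
```

The predicted class is the index of the largest probability, `int(np.argmax(probs))`.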

Fault diagnosis method of a planetary gearbox based on a local bi-spectrum and a CNN
In this study, due to the symmetry of the bi-spectrum, the redundant information could be removed and only part of the quadrant information could be selected for analysis purposes.
In this investigation, one quarter of the bi-spectral image was cropped in order to obtain its local bi-spectral image for further analysis. This was then used as the input of the CNN, which not only completely described the original image information, but also improved the learning efficiency of the network. The fault diagnosis flow diagram of the planetary gearbox based on a local bi-spectrum and a CNN is shown in figure 2, with details as follows:
(a) Bi-spectral analysis was performed on the collected vibration signals for the purpose of obtaining a bi-spectral map.
(b) One quarter of the bi-spectrum was cropped in order to obtain a local bi-spectrum which could effectively represent the global information.
(c) A CNN parameter optimization sample library was constructed, and the number of CNN layers, number of convolution kernels, sizes of convolution kernels, and the step-sizes of the convolution kernels were optimized in this study by comprehensively considering the identification accuracy and running time.
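Step (b) can be sketched as a simple array crop. Which quadrant the authors kept is not stated, so the sketch below assumes the bi-spectrum is stored in the usual FFT bin order and keeps the quadrant in which both frequencies are non-negative; the function name `local_bispectrum` is hypothetical.

```python
import numpy as np

def local_bispectrum(bispec):
    """Keep one quadrant of a square bi-spectrum array.

    Assumes FFT bin order along both axes, so the first half of each axis
    holds the non-negative frequencies. The magnitude is returned because
    the CNN input is an image.
    """
    n = bispec.shape[0]
    half = n // 2
    return np.abs(bispec[:half, :half])

full = np.arange(16.0).reshape(4, 4)  # stand-in for a bi-spectrum array
local = local_bispectrum(full)        # one quarter of the full array
```

The cropped array has one quarter as many elements as the full one, which is the source of the reduced input dimension described above.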

Data acquisition
In this study, the effectiveness of the proposed diagnosis method based on a local bi-spectrum and a CNN was verified through fault diagnosis experiments on a planetary gearbox. The experimental data were obtained from the planetary gearbox fault test bench shown in figure 3, which was mainly composed of a planetary gearbox, its accessory equipment, and a signal acquisition system. The planetary gearbox in the test bench is an NGW-type planetary transmission gearbox: the shaft of the sun gear is the input shaft, the ring gear is fixed, and the planet carrier is the output shaft. The specific parameters are shown in table 1. The fault test bench can simulate various failure modes by replacing healthy parts with faulty ones for the study of fault diagnosis techniques. The faulty parts, shown in figure 4, include half broken teeth of a sun gear; full broken teeth of a sun gear; half broken teeth of a planet gear; full broken teeth of a planet gear; half broken teeth of an inner ring gear; and full broken teeth of an inner ring gear. In order to verify the effectiveness of the proposed method, vibration simulation tests of typical faults in a planetary gearbox were carried out. Vibration signals were collected for seven running states in total: the normal state and the six fault states listed above. The sampling frequency was 12 384 Hz, and the signals at an input rotation speed of 2400 r min −1 were recorded for further analysis. In total, 480 samples of each operating state were selected, giving 3360 samples across the seven operating states.

Bi-spectrum analysis
In theory, a bi-spectrum has the ability to completely suppress Gaussian noise and reveal nonlinear information in fault signals while retaining useful non-Gaussian components. Figure 5 shows the bi-spectrum of a normal gear. Figure 6 shows the bi-spectrum of the different fault degrees of broken teeth of a sun gear. Figure 7 illustrates the bi-spectrum of different fault degrees of broken teeth of a planet gear. The bi-spectrum of different fault degrees of broken teeth of an inner ring gear is shown in figure 8.
By comparing figures 5, 6, 7(a), and 8(a), it can be seen that there are obvious differences in the bi-spectrum forms of the vibration signals of planetary gearboxes at different fault positions. In addition, it can be seen in figures 6-8 that for different fault degrees at the same location, the forms of the bi-spectrum were similar, but the coverage area of the bi-spectrum increased with the fault degree. Therefore, the bi-spectrum can represent faults of different positions and different degrees in the planetary gear transmission system.

CNN parameter optimization
The performance of a CNN is largely dependent on its architecture, including the number of layers, number of convolution kernels, sizes of the convolution kernels, and the step-size of the convolution kernels.
In regard to the number of convolution layers, the number of layers of a model determines the depth of the model. Generally speaking, within a certain range, the nonlinear expression ability of the network will be enhanced by increases in the model depth. The extracted features will be clearer, and the effects of pattern recognition will improve. However, deep networks will increase the complexity of the model and reduce the training efficiency.
From the perspective of the number of convolution kernels, convolution kernels are used to extract signal features, and different convolution kernels will extract different features. Generally speaking, multiple convolution kernels are used to extract different features in a network in order to improve the accuracy of pattern recognition. At the same time, the number of convolution kernels will directly affect the size of the network. Therefore, using too many convolution kernels will increase the network redundancy, thereby affecting the recognition efficiency.
In regard to the sizes of the convolution kernels, for the same receptive field, smaller convolution kernels require fewer parameters and less computation.
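A short calculation illustrates the trade-off between kernel size and parameter count for an equal receptive field. The sketch assumes stride-1 convolutions, ignores biases, and uses 12 channels per layer only as an example value; the function names are hypothetical.

```python
def receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1 convolutions."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

def weight_count(kernel_sizes, channels):
    """Number of kernel weights, with `channels` maps in and out of every layer."""
    return sum(k * k * channels * channels for k in kernel_sizes)

# Three stacked 3x3 layers cover the same 7x7 area as a single 7x7 layer.
assert receptive_field([3, 3, 3]) == receptive_field([7]) == 7
small = weight_count([3, 3, 3], channels=12)  # 3 * 9 * 144 = 3888
large = weight_count([7], channels=12)        # 49 * 144 = 7056
```

Under these assumptions the stacked small kernels need 3888 weights against 7056 for the single large kernel, at the cost of a deeper network.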
The step-sizes of the convolution kernels refer to the convolution kernels sliding on the data with a certain step-size. In order to avoid information loss, the step-size is usually smaller than the convolution kernel size. The smaller the step size is, the more sufficient the data scanning will be. As a result, more complete feature information can be obtained. However, at the same time, more noise information will be repeatedly calculated and the generalization ability of the model will be reduced.
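The effect of the step-size on how densely the data are scanned can be made concrete with the standard output-size formula for a convolution. The function name and the zero-padding default are assumptions; the 128-pixel input width and the kernel size of 7 are used only as example values.

```python
def conv_output_size(n, kernel, stride, padding=0):
    """Number of kernel positions along one axis of an n-pixel input."""
    return (n + 2 * padding - kernel) // stride + 1

# A 7-wide kernel sliding over a 128-pixel axis.
dense = conv_output_size(128, kernel=7, stride=1)   # 122 positions
sparse = conv_output_size(128, kernel=7, stride=2)  # 61 positions
```

Doubling the step-size halves the number of positions per axis, so the output map has roughly one quarter as many elements and is correspondingly cheaper to compute.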
In order to improve the ability of a CNN to extract the fault information of a planetary gear bi-spectrum, the number of CNN layers, number of convolution kernels, sizes of the convolution kernels, and the step-sizes of the convolution kernels were optimized in this study by comprehensively considering both the recognition accuracy and the running time.
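One way to organize such a comparison is an exhaustive search over the four parameters. The sketch below is a hypothetical illustration, not the authors' procedure: the candidate values, the `evaluate` callback (which would train a CNN and return its accuracy and running time), and the linear accuracy/time trade-off controlled by `time_weight` are all assumptions.

```python
from itertools import product

def select_architecture(evaluate,
                        layers=(1, 2, 3),
                        kernels=(4, 8, 12),
                        sizes=(3, 5, 7),
                        strides=(1, 2),
                        time_weight=0.01):
    """Return the (layers, kernels, size, stride) tuple with the best score.

    evaluate(n_layers, n_kernels, size, stride) must return
    (accuracy, running_time_in_seconds) for that configuration.
    """
    best_score, best_cfg = None, None
    for cfg in product(layers, kernels, sizes, strides):
        accuracy, seconds = evaluate(*cfg)
        score = accuracy - time_weight * seconds  # assumed trade-off
        if best_score is None or score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg

def dummy_evaluate(n_layers, n_kernels, size, stride):
    """Stand-in for real training; favours one configuration by construction."""
    accuracy = 0.9 if (n_layers, n_kernels, size, stride) == (3, 12, 7, 2) else 0.5
    return accuracy, 1.0

chosen = select_architecture(dummy_evaluate)
```

In practice `evaluate` would be replaced by a function that trains and tests the network on the parameter optimization sample set.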
The CNN parameter optimization sample sets were constructed by selecting samples of the normal state; half broken teeth of a sun gear; half broken teeth of a planet gear; and half broken teeth of an inner ring gear. Introducing a BN layer into a CNN can alleviate the vanishing gradient problem. As a result, a higher learning rate can be used, and the manual adjustment of parameters can be reduced [26]. The activation layer adopted the rectified linear unit, which is simple to calculate, increases the training speed of the model, and helps to prevent overfitting. The sizes of the convolution kernels were set as 3 × 3, and the convolution step-sizes were set as 1 and 2, respectively. In addition, convolution layers with different depths were set in order to compare the accuracy and running times of the fault identifications. The test results are shown in figure 9.
As can be seen in figure 9, the accuracy improved as the number of convolution kernels increased, and tended to stabilize once the number reached a certain level. At the same time, the running time increased with the number of convolution kernels. This was consistent with the preceding analysis: the more convolution kernels there were, the more fully the signal features could be extracted; however, once the features were fully extracted, adding further convolution kernels tended to make the network parameters redundant and reduced the operating efficiency. It can also be seen in the figure that the running time was affected by the number of convolution layers, increasing as the number of layers increased.
The sizes of the convolution kernels were set as 5 × 5 and 7 × 7, respectively, and the step-sizes and depths were set accordingly. The accuracy of the fault identifications and running times are shown in figures 10 and 11.
As can be seen in figures 10 and 11, too small a step-size tended to reduce the accuracy, since the denser scanning by the convolution kernels caused noise to be calculated repeatedly. By comparing the results for the three convolution kernel sizes, it was found that the best results were obtained when the number of convolution layers was 3; the step-size was 2; the number of convolution kernels was 12; and the size of the convolution kernels was 7.
According to the comparison optimization, the number of layers, number of convolution kernels, sizes of the convolution kernels, and the step-size of the convolution kernels of the CNN were obtained. The learning rate was set as 0.001, and the 'adam' optimizer was used to construct this study's CNN diagnosis model. The CNN architecture constructed in this article is shown in table 2.

Table 2 (recovered fragment, layers 9 to 14 only).
No.   Layer        Parameters
9     Relu2        N = 12
10    Maxpool2     C = 2, N = 12, S = 2
11    Conv3        C = 7, N = 12, S = 2
12    Batchnorm3   N = 12
13    Relu3        N = 12
14    Maxpool3     (parameters not recovered)

In table 2, C is the size of the filters, N is the number of filters, S is the stride, W is the dropout rate of the dropout layer, and F is the number of nodes.

Analysis of the diagnosis results
In order to verify the effectiveness of the method, five sample sets were constructed, as shown in table 3. In this study's experimental process, 300 samples of each operational state were selected as training samples in order to construct the classifier, and the remaining 180 samples were used as test samples to verify the classification diagnosis. The input size of the full bi-spectrum is 512 × 512, and the input size of the local bi-spectrum is 128 × 128. The diagnosis results of each sample set are detailed in tables 4-8.

Table 3 (recovered fragment).
(sample set label not recovered): Normal state, half broken teeth of an inner ring gear, full broken teeth of an inner ring gear
Sample set 5: Normal state, half broken teeth of a sun gear, full broken teeth of a sun gear, half broken teeth of a planet gear, full broken teeth of a planet gear, half broken teeth of an inner ring gear, full broken teeth of an inner ring gear

As can be seen in the tables, the local bi-spectrum + CNN method not only diagnosed the different fault positions of the planetary gearbox, but also distinguished faults of the same type with different fault degrees. However, compared with the sun gear and the inner ring gear, the diagnosis accuracy for the different degrees of broken teeth of the planet gear was lower. The reason was that the planet gear not only rotated around its own axis, but also revolved with the planet carrier around the axes of the other gears. Therefore, its vibration responses were more complex and more difficult to identify. In order to further verify the effectiveness and advantages of the proposed method, full bi-spectrum + CNN, original data + CNN, local bi-spectrum + support vector machine (SVM), and local bi-spectrum + stacked autoencoder (SAE) were each used to analyze the data of sample set 5. The fault identification results are shown in figure 12. The average accuracy rates and running times are shown in table 9.
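The per-state split of 300 training and 180 test samples can be sketched as follows. Random selection with a fixed seed is an assumption, since the text does not say how the samples were chosen; the function name and the placeholder integer samples are hypothetical, while the state labels are the abbreviations used in figure 12.

```python
import random

def split_per_class(samples_by_class, n_train=300, seed=0):
    """Split each class into n_train training samples and the rest for testing."""
    rng = random.Random(seed)
    train, test = [], []
    for label, samples in samples_by_class.items():
        shuffled = list(samples)
        rng.shuffle(shuffled)  # assumed: random selection within each class
        train += [(s, label) for s in shuffled[:n_train]]
        test += [(s, label) for s in shuffled[n_train:]]
    return train, test

# Seven operating states with 480 placeholder samples each.
states = ["ZC", "TD", "TQ", "XD", "XQ", "CD", "CQ"]
data = {state: list(range(480)) for state in states}
train_set, test_set = split_per_class(data)
```

With seven states this gives 2100 training samples and 1260 test samples for the full seven-class sample set.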
In figure 12, XD, XQ, CD, CQ, TD, TQ and ZC represent half broken teeth of a planet gear, full broken teeth of a planet gear, half broken teeth of an inner ring gear, full broken teeth of an inner ring gear, half broken teeth of a sun gear, full broken teeth of a sun gear, and normal gear, respectively. As can be seen from figure 12 and table 9, the accuracy of the diagnosis method based on full bi-spectrum + CNN was consistent with that of the local bi-spectrum + CNN proposed in this study, but the running time of the proposed method was more than ten times shorter. Also, compared with local bi-spectrum + SVM, local bi-spectrum + SAE, and original data + CNN, the proposed method improved both the accuracy and the running time. The main reason was that the local bi-spectrum was only one quarter the size of the full bi-spectrum image, which reduced the number of training parameters in the CNN and therefore greatly reduced the training time. The original 1D vibration signal contained much noise, and its fault characteristics were not obvious, so diagnosing it directly not only reduced the accuracy but also increased the training time. An SVM is a binary classifier, and the multi-class problem encountered in this study was resolved by combining multiple binary classifiers. However, since the one-versus-one combination method trains each classifier on the samples of only two classes, both training and classification become very slow when there are many classes. An SAE may lose some important information due to its sparsity constraints. In addition, in order to meet the input requirements of the SAE, the image is flattened to one dimension, so the spatial information in the original image is not utilized, resulting in a relatively low diagnostic accuracy. Therefore, a CNN based on a local bi-spectrum has advantages over the abovementioned methods.
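The cost of the one-versus-one strategy can be illustrated by counting the binary classifiers it needs for the seven operating states. The sketch only enumerates the class pairs and does not train any SVM; the function name is hypothetical, and the labels are the abbreviations used in figure 12.

```python
from itertools import combinations

def one_vs_one_pairs(labels):
    """List every pair of classes that needs its own binary classifier."""
    return list(combinations(labels, 2))

states = ["ZC", "TD", "TQ", "XD", "XQ", "CD", "CQ"]
pairs = one_vs_one_pairs(states)
n_classifiers = len(pairs)  # k * (k - 1) / 2 with k = 7
```

For k = 7 classes this gives 21 binary classifiers, and the count grows quadratically with the number of classes.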

Conclusions
This study focused on the fault diagnosis of a planetary gearbox and presented a diagnosis method based on a local bi-spectrum and a CNN. The proposed method was found to have high accuracy not only for the different fault positions of a planetary gearbox, but also for different degrees of broken-tooth faults in each part. The proposed method also had a high recognition accuracy for mixed samples with different fault locations and different fault degrees. When the local bi-spectrum + CNN method proposed in this study was compared with the full bi-spectrum + CNN method and the original vibration signal + CNN method, the local bi-spectrum was found to be more suitable as a CNN input sample for diagnosing broken teeth in planetary gearboxes. Also, compared with the local bi-spectrum + SVM method and the local bi-spectrum + SAE method, the proposed method was found to have advantages in both diagnosis accuracy and diagnosis time.

Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.