Bi-filter multiscale-diversity-entropy-based weak feature extraction for a rotor-bearing system

Multiscale-based entropy methods have proven to be a promising tool for extracting fault information owing to their strong feature extraction ability and ease of application. Although multiscale analysis shows great potential for extracting fault characteristics, it has drawbacks: it shortens the data length and neglects high-frequency information. This paper proposes a bi-filter multiscale diversity entropy (BMDE) to capture comprehensive fault information and address the data-length problem. First, low-frequency information is extracted by a moving average in a multi-low procedure, and high-frequency information is extracted by adjacent subtraction in a multi-high procedure. Second, a modified coarse-grained process is introduced to overcome the data-length issue. The validity of the BMDE method is evaluated using both simulation signals and experimental measurements. The results demonstrate that the proposed method offers the best feature extraction capability and the highest diagnostic accuracy compared with four other traditional entropy-based diagnosis methods.


Introduction
Rotor-bearing systems play an important role in modern production and manufacturing [1][2][3][4]. Timely maintenance strategies can guarantee a long service life for a rotor-bearing system, enhance its economic benefits and reduce unplanned downtime [5][6][7]. Feature extraction is the core step in the condition monitoring of rotor-bearing systems. Entropy theory, which is used to this end, emerged from information entropy, as originally proposed by Shannon. Information entropy reflects the uncertainty and complexity of a system, and it laid the foundations for modern information theory and digital communication.
Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Recently, entropy-based methods have been demonstrated to be a promising tool for extracting hidden information from measured vibration signals [8][9][10][11][12][13][14]. Their advantages are that they do not depend on prior knowledge, require no preprocessing and are easy to apply [13,14]. The most commonly used entropy methods include approximate entropy (AE) [10], sample entropy (SE) [15][16][17], fuzzy entropy (FE) [12,13,18] and permutation entropy (PE) [19]. SE can be regarded as a more reliable improvement of AE. Seeking greater stability, FE was then developed as an improvement of SE. By using the amplitude permutation of orbits to estimate dynamical complexity, PE provides a new perspective for dynamic complexity analysis. Each of these methods has its own intrinsic merits and drawbacks.
Unfortunately, there are two main drawbacks to using multiscale analysis. First, the coarse-graining operation shortens the time series [29], leading to large fluctuations and inaccurate complexity evaluation [27]. Second, the coarse-graining procedure can be considered as a low-pass filter based on a Haar wavelet [29]. As a result, only the low-frequency information is preserved and the high-frequency information is lost.
Aiming to overcome the inherent drawback of shortened time series in coarse-grained algorithms, composite multiscale entropy was proposed [25]. In composite multiscale sample entropy (CMSE), at a scale factor of τ, the entropy value is computed for all τ processed time series, and the CMSE value is defined as the mean of these τ values. Although this addresses the data-length problem, the loss of high-frequency information is not addressed. The hierarchical procedure [30] can be regarded as an improvement of the coarse-grained procedure; it creates two frequency operators that are applied repeatedly in the following layers. However, the signals in the different layers are not distributed as high-frequency components or low-frequency components. There are also other methods that enhance the coarse-grained method, but its drawbacks have not been satisfactorily addressed [31][32][33]. Therefore, it is necessary to introduce a new method to overcome these drawbacks.
This paper proposes bi-filter multiscale analysis, which decomposes a raw signal into multiple-scale time series by a multi-high procedure and a multi-low procedure simultaneously. The multi-low procedure extracts the low-frequency information over different scales by overlapping means, while the multi-high procedure extracts the high-frequency information by subtraction between shifted samples. Both procedures use a sliding window with a fixed step size so that the data length remains nearly unchanged. In this way, the comprehensive fault information embedded in the low and high spectra can be extracted, and the drawbacks of the original coarse-graining process are mitigated. Building on bi-filter multiscale analysis, bi-filter multiscale diversity entropy (BMDE) is obtained by calculating the diversity entropy value of the processed signals.
In the overall fault diagnosis method, the BMDE is first applied to extract fault features from vibration signals. Then, a random forest (RF) [30,34] classifier is used to identify the fault types from the BMDE features. The remainder of this paper is organized as follows. Section 2 introduces the theoretical framework of the proposed method and the related theories. Section 3 presents a simulation model and the simulated signals used to evaluate the performance of the BMDE. Section 4 validates the effectiveness of the proposed method with experimental bearing signals. Finally, section 5 summarizes the conclusions.

The shortcomings of the original multiscale analysis
The original multiscale diversity entropy (MDE) method is composed of two main steps: first, generating multiple time series by a multiscale procedure; second, calculating the entropy value of each coarse-grained time series. Figure 1 demonstrates the original multiscale procedure. The implementation steps of the multiscale process are as follows.
Step 1. For a given time series X = {x_1, x_2, ..., x_N}, the coarse-grained time series at scale τ is constructed by averaging non-overlapping windows of τ points, as in equation (1):
$$y_j^{(\tau)} = \frac{1}{\tau}\sum_{i=(j-1)\tau+1}^{j\tau} x_i, \quad 1 \le j \le \lfloor N/\tau \rfloor, \tag{1}$$
where τ indicates the scale factor, which should be fixed as a positive integer. The parameter τ represents the strength of the procedure.
Step 2. Calculate the diversity entropy value of each multiscale time series using equation (2). The detailed computational procedure of the diversity entropy algorithm can be found in [30].
To illustrate the procedure, a synthesized signal with a data length of 8196 is used. The simulated signal is given by equation (3), and the sampling frequency is fixed at 63 Hz. Figure 2 illustrates the waveforms of the multiple series. Comparing the three waveforms (τ = 1, τ = 3, τ = 5) shows that the envelope of the signal tends to flatten as the scale factor increases, and that the data become shorter with a larger scale factor. Figure 3 presents the spectra of the multiple series (τ = 1, τ = 3, τ = 5). The low-frequency components are retained and the high-frequency components are suppressed. As mentioned above, the multiscale procedure is effectively a low-pass filter [29], and the resulting spectra in figure 3 are consistent with this theory. This demonstrates that the multiscale procedure extracts only low-frequency information.
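The two-step procedure above can be sketched in Python as follows. This is a minimal illustration rather than the authors' implementation: the diversity entropy is written in its commonly described form (cosine similarity between adjacent orbits, a histogram over [-1, 1] and a normalized Shannon entropy), and the function names, the embedding dimension m = 3 and the number of intervals n_bins = 30 are assumptions made for the example.

```python
import math


def coarse_grain(x, tau):
    """Average non-overlapping windows of tau points (original multiscale step)."""
    n_blocks = len(x) // tau
    return [sum(x[j * tau:(j + 1) * tau]) / tau for j in range(n_blocks)]


def diversity_entropy(x, m=3, n_bins=30):
    """Normalized entropy of cosine similarities between adjacent m-point orbits.

    m and n_bins are illustrative defaults, not values confirmed by the paper.
    """
    orbits = [x[i:i + m] for i in range(len(x) - m + 1)]
    counts = [0] * n_bins
    for a, b in zip(orbits, orbits[1:]):
        norm = math.sqrt(sum(v * v for v in a)) * math.sqrt(sum(v * v for v in b))
        if norm == 0.0:
            continue  # cosine similarity is undefined for an all-zero orbit
        cos = sum(p * q for p, q in zip(a, b)) / norm
        cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
        k = min(int((cos + 1.0) / 2.0 * n_bins), n_bins - 1)
        counts[k] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    probs = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in probs) / math.log(n_bins)


def multiscale_diversity_entropy(x, max_scale, m=3, n_bins=30):
    """Diversity entropy of the coarse-grained series for tau = 1..max_scale."""
    return [diversity_entropy(coarse_grain(x, tau), m, n_bins)
            for tau in range(1, max_scale + 1)]
```

Note how `coarse_grain` returns only `len(x) // tau` points, which is the data-length problem discussed in this section.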

Improvements to multiscale procedure
To tackle the data-length problem, a new strategy, called bi-filter multiscale analysis, is developed in this paper. On the one hand, a modified coarse-grained process is developed to solve the data-length problem of the original multiscale analysis. The BMDE moves a sliding window one step at a time, so most of the data points are reused when generating the next data point of the multiple series. As the schematic in figure 4 shows, the new strategy keeps the data length highly stable. For example, if the data length of the original time series is 1000, the traditional multiscale procedure reduces the length of the multiple series to 100, whereas the new strategy generates multiple series with a data length of 980.
On the other hand, bi-filter multiscale analysis adopts a multi-high operator to extract high-frequency information. In this strategy, some of the data points are reused to generate adjacent data points of the multiple series, which enhances the relevance of the multiple series. Traditionally, subtraction between adjacent signal points is used to offset the noise signal and retain the high-frequency information. The strategy for obtaining high-frequency information is given in equation (4). In the multi-high operator, x_j is differenced with each of the next τ − 1 points, and the mean value of the τ − 1 results is then calculated. While the traditional process considers only the influence of the adjacent data point, the multi-high operator calculated by equation (4) also takes into account the data points that follow x_j. The frequency-loss problem is thus also considered in the new strategy.
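The two operators described in this section can be illustrated with a short Python sketch. It assumes a sliding step of one sample, takes the multi-low operator to be an overlapping moving average, and takes the multi-high operator to be the mean of the differences x_j − x_(j+i) for i = 1, ..., τ − 1; the sign convention, the handling of τ = 1 and the function names are assumptions, since the exact form of equation (4) is not reproduced here.

```python
def multi_low(x, tau):
    """Overlapping moving average with a step of one sample (assumed form)."""
    return [sum(x[j:j + tau]) / tau for j in range(len(x) - tau + 1)]


def multi_high(x, tau):
    """Mean difference between x[j] and each of its next tau - 1 samples (assumed form)."""
    if tau == 1:
        return list(x)  # assumption: scale 1 leaves the raw signal unchanged
    return [sum(x[j] - x[j + i] for i in range(1, tau)) / (tau - 1)
            for j in range(len(x) - tau + 1)]
```

Under these assumptions each output keeps N − τ + 1 points, so the length shrinks by only a few samples per scale instead of by a factor of τ. A constant offset is removed entirely by `multi_high`, which reflects its high-pass character, whereas `multi_low` smooths the signal.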

The proposed BMDE method
In the proposed BMDE method, more frequency information can be captured while the data length is kept nearly constant. The computational procedure is as follows.
Step 1. For a given time series X = {x_1, x_2, ..., x_N}, N indicates the data length and τ represents the scale factor. When the scale factor is fixed, X is processed with the multi-low procedure and the multi-high procedure, which are defined by the multi-low operator and the multi-high operator, respectively.
Step 2. Calculate the entropy value of the two types of series at each scale. The BMDE is formed from these entropy values.
A flow chart of the proposed method is given in figure 5, and a diagram of the proposed method is shown in figure 6. It is worth noting that the length of the processed time series is not reduced dramatically. Two columns of data are obtained using the proposed method, representing the high-frequency components and the low-frequency components, respectively.
Overall, a novel method that considers both the length of data and more frequency information is here proposed.
The BMDE values calculated by the procedure can be considered as fault features. After feature extraction by the proposed BMDE method, the separability can be evaluated by RF classification.
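As a rough end-to-end illustration of steps 1 and 2, the sketch below builds a BMDE-style feature vector by applying a diversity entropy estimate to the multi-low and multi-high series at each scale. The operator forms, the entropy formulation, the parameter defaults (m = 3, n_bins = 30) and the ordering of the output (low-frequency features followed by high-frequency features) are all assumptions for the example, not the paper's reference implementation.

```python
import math
import random


def diversity_entropy(x, m=3, n_bins=30):
    """Normalized entropy of cosine similarities between adjacent orbits (assumed form)."""
    orbits = [x[i:i + m] for i in range(len(x) - m + 1)]
    counts = [0] * n_bins
    for a, b in zip(orbits, orbits[1:]):
        norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
        if norm == 0.0:
            continue  # skip undefined similarities
        cos = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b)) / norm))
        counts[min(int((cos + 1.0) / 2.0 * n_bins), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum(c / total * math.log(c / total) for c in counts if c) / math.log(n_bins)


def bmde_features(x, max_scale=20, m=3, n_bins=30):
    """Return [low-frequency entropies..., high-frequency entropies...] over all scales."""
    low, high = [], []
    for tau in range(1, max_scale + 1):
        n_out = len(x) - tau + 1
        lo = [sum(x[j:j + tau]) / tau for j in range(n_out)]  # multi-low series
        if tau == 1:
            hi = list(x)  # assumption: scale 1 is the raw signal
        else:
            hi = [sum(x[j] - x[j + i] for i in range(1, tau)) / (tau - 1)
                  for j in range(n_out)]  # multi-high series
        low.append(diversity_entropy(lo, m, n_bins))
        high.append(diversity_entropy(hi, m, n_bins))
    return low + high


if __name__ == "__main__":
    random.seed(0)
    noise = [random.gauss(0.0, 1.0) for _ in range(512)]
    print(len(bmde_features(noise, max_scale=5)))  # 2 features per scale
```

A vector of this kind, computed for every sample, is what would be passed to the RF classifier.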

Simulation evaluation
To validate the reliability of the proposed method, a simulated bearing-structure model is constructed in this section. Three types of faults (inner race fault, outer race fault and rolling element fault) are introduced. The detailed parameters of the tested bearing are listed in table 1. In the simulation, the rotation speed is set to 6000 rpm and the sampling frequency to 10 240 Hz. Structure diagrams of the three types of simulated faults are shown in figure 7.
Assume that the damage point is located in the load zone of the rolling bearing and the sensor is located in the load zone with the maximum load density, as shown in figure 7(a). When the damage point comes into contact with the rolling elements, a series of impulsive forces is generated, as expressed in equation (8), where d_o denotes the impulse intensity, δ(t) denotes the unit impulse function, k is the number of generated pulses, f_o signifies the characteristic frequency of the outer race fault and T_o = 1/f_o is the time interval between two impulses. The damped vibration function excited by the impulse force is given by equation (9). Because the time required for the bearing vibration to attenuate is much smaller than T_o, the damage point of the outer race continuously generates impact pulses at frequency f_o during rotation. The outer race fault signal is then calculated by equation (10).
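The outer race model can be illustrated with a short Python sketch in which an impulse train with period T_o = 1/f_o excites a decaying oscillation. The exponentially damped cosine used for the response, the resonance frequency f_n, the decay rate and the function name are assumptions chosen for the example; they are not the paper's equation (9) or its parameter values.

```python
import math


def outer_race_fault_signal(fs, duration, f_o, d_o=1.0, f_n=3000.0, decay=800.0):
    """Impulse train at f_o exciting a damped oscillation (hypothetical response form)."""
    n = int(fs * duration)
    signal = [0.0] * n
    period = 1.0 / f_o  # T_o, time between two impulses
    k = 0
    while True:
        start = int(round(k * period * fs))  # sample index of the k-th impulse
        if start >= n:
            break
        for i in range(start, n):
            t = (i - start) / fs
            envelope = math.exp(-decay * t)
            if envelope < 1e-6:
                break  # response has effectively died out
            signal[i] += d_o * envelope * math.cos(2.0 * math.pi * f_n * t)
        k += 1
    return signal
```

For instance, `outer_race_fault_signal(10240, 0.125, 100.0)` gives 1280 samples with an impact every 0.01 s; the value f_o = 100 Hz is purely illustrative.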

Inner race fault model for rolling bearings.
The inner race fault model of rolling bearings is shown in figure 7(b). It is assumed that the rolling elements come into contact with the damage point in the inner race at the point of maximum load and that the first pulse is excited at t = 0. Ignoring the influence of load distribution and damage-point position, the series of impulsive forces generated by the damage point can be expressed as equation (11), which is similar to that of the outer race fault. Here, d_i indicates the intensity of the impulsive force, δ(t) is the unit impulse function and k represents the number of impulses. In addition, f_i is the characteristic frequency of the inner race fault and T_i = 1/f_i denotes the time interval between two impulses. The load distribution is expressed as equation (12), for which n = 1.1 and σ = 0.5 are used in this paper. As shown in figure 7(b), when the damage point of the inner race contacts a rolling element at a certain angle φ, the impulsive force collected by the sensor is the projection of that force onto the axis of the sensor. The influence coefficient of the damage-point location is given by equation (13), where f_r indicates the rotation frequency. In this way, the impulsive force on the axis of the sensor is given by equation (14). Finally, the simulated inner race fault signal is expressed as equation (15), where A_i is the conversion coefficient between the impulsive force and vibration. In this paper, A_i = 1.
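To make the amplitude modulation of the inner race impulses concrete, the sketch below combines a Stribeck-type load distribution with a cosine projection onto the sensor axis. Both forms are assumptions (a widely used bearing model, not necessarily the paper's equations (12) and (13)); only n = 1.1 and σ = 0.5 come from the text, and the function names and the value of f_i in the usage note are hypothetical.

```python
import math


def load_distribution(phi, q_max=1.0, n=1.1, sigma=0.5):
    """Stribeck-type load at angle phi; zero outside the load zone (assumed form)."""
    base = 1.0 - (1.0 - math.cos(phi)) / (2.0 * sigma)
    return q_max * base ** n if base > 0.0 else 0.0


def inner_race_impulse_amplitudes(f_i, f_r, n_impulses, d_i=1.0):
    """Amplitude of each impulse as seen on the sensor axis (assumed cosine projection)."""
    amplitudes = []
    for k in range(n_impulses):
        t = k / f_i                      # impulses occur every T_i = 1 / f_i
        phi = 2.0 * math.pi * f_r * t    # angular position of the rotating damage point
        amplitudes.append(d_i * load_distribution(phi) * math.cos(phi))
    return amplitudes
```

With a rotation speed of 6000 rpm (f_r = 100 Hz) and an illustrative f_i = 540 Hz, the amplitudes rise and fall once per shaft revolution, which is the modulation that distinguishes the inner race signal from the outer race signal.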

Rolling element fault model.
The simulated rolling element fault signal is constructed in a similar way to the inner race and outer race fault signals, and is given by equation (16), where A_b is the conversion coefficient between the impulsive force and vibration. In this paper, A_b = 1. Figure 8 illustrates the raw time-domain signals obtained from the constructed fault models and the corresponding spectra. The signals are consistent with the constructed models.

Simulation results and analysis
To illustrate the superiority of the proposed method, the proposed BMDE is compared with hierarchical diversity entropy (HDE) and traditional multiscale diversity entropy (MDE) using the simulated signals. The length of the data used is fixed at 4096. To simulate the realistic noisy operating environment of a rotating machine, white noise is added to the simulated signals. The signal-to-noise ratio is set to 8. The parameters of the three methods are listed in table 2.
Figure 9 shows the Euclidean distance of each set of entropy features for the different methods. The mean value of the features of the first 10 samples is taken as the base vector, and the vertical axis represents the Euclidean distance between each sample and this base vector. The smaller the Euclidean distance between samples of the same fault and the base vector, the better the stability of the entropy method. Furthermore, the larger the Euclidean distance between samples of different faults, the better the expected performance of the entropy method in feature extraction and fault classification. The points numbered 1-50, 51-100 and 101-150 in (a)-(c) are generated by the simulated outer race fault signal, inner race fault signal and rolling element fault signal, respectively. The proposed method demonstrates better distinguishability, stability and dependability than the other two entropy methods.
Figure 10 presents frequency spectra of the data after the multi-high process. The data are simulated using the constructed outer race fault model. Considering the length of the data, the scale is set to 1, 2, 4, 6 and 8, respectively. It should be noted that figure 10(a) represents the spectrum of the original data, for which τ = 1.
As mentioned above, the multi-high process is designed to extract high-frequency information. Compared with the raw signal spectrum in figure 10(a), the frequency components in the low-frequency area are attenuated and the spectrum is boosted in the high-frequency area ((b)-(e) in figure 10) as the scale increases. Similarly, figure 11 presents frequency spectra of the data after the multi-low process, which are also obtained from the constructed outer race fault model. Compared with the spectrum of the raw signal when τ = 1, the frequency components gradually gather in the low-frequency area as the scale increases. These multi-low and multi-high spectra are consistent with the theoretical derivation.
Table 3 (partial). Casing friction support and blade disc; 6, test bearing pedestal; 7, worm and worm gear.
In this section, the simulation model was first established and the proposed method has been verified by comparing it with other well-known entropy methods. Then, the capacity of BMDE to capture more frequency information in spectrum processing has also been analyzed and proven.
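The distance index used for figure 9 can be written as a small Python sketch: the base vector is the mean of the features of the first 10 samples, and every sample is then scored by its Euclidean distance to that vector. The function name and the list-of-lists input layout are assumptions for illustration.

```python
import math


def distance_to_base(features, n_base=10):
    """Euclidean distance of each feature vector to the mean of the first n_base vectors."""
    dim = len(features[0])
    base = [sum(f[d] for f in features[:n_base]) / n_base for d in range(dim)]
    return [math.dist(f, base) for f in features]
```

Samples of the same fault as the base vector should give small, stable distances, while samples of other faults should give clearly larger ones.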

Experimental evaluation
To evaluate the feasibility and advantages of the proposed method under practical working conditions, an experimental test rig was set up, as shown in figure 12. The experimental system consists of a variable frequency motor, rotor shaft, rolling bearing seat, shaft system load disc, radial loading device, grinding installation bracket and coupling. The test bearing is of type 6205, and each component of the test rig in figure 12 is described in table 3.

Experimental setting
Faults are simulated by substituting the faulty parts or increasing the extent of the axis shift. More than 20 types of faults are simulated by the system, including single-point failures and combined faults. Single-point failures and combined faults are discussed separately. The parameters of the experiment are given in table 4. In order to obtain the experimental data, an acceleration sensor was placed on the axis, as labeled in figure 12. The constant rotational speed was 2000 rpm. The load was set to 40% and the sampling frequency to 10 240 Hz.

Table 5. Parameters of methods set in the experiment.
Method  Scale  Layer  Time delay  Threshold  Embedding dimension
HDE     -      3      1           -          3
MPE     20     -      1           -          6
MSE     20     -      -           -          2
MFE     20     -      -           0.15       2
BMDE    20     -      1           -          3
MDE     20

Results and discussion
The performance of the proposed method has been validated by comparing it with other classical entropy methods. According to the related references, the optimal parameter settings of the methods applied are listed in table 5 [5].
To validate the performance in a practical environment, a shaft rubbing fault signal is used. For the simulated signals, the multi-high procedure suppresses the frequencies in the low-frequency band and emphasizes those in the high-frequency band. This behaviour also holds for the experimental shaft-rubbing signal, as shown in figure 13.
The performance of the low-frequency procedure is likewise similar to that in the simulation. The detailed results are shown in figure 14.
To examine the quality of the features, the features from the six methods are first projected onto a two-dimensional plane with the t-distributed stochastic neighbor embedding (t-SNE) method, as shown in figure 15. The clustering reflects the feature-extraction ability of each method: the smaller the intra-class distance among samples within the same cluster and the larger the inter-class distance among clusters, the better the feature extraction performs.
The features from the proposed BMDE method display the largest inter-class distance and the smallest intra-class distance, as shown in figure 15(a). There is only slight mixing at the margin between class-D and class-E, which means that the BMDE features have good separability. In figures 15(b) and (e), the features of HDE show a better inter-class distance than those of MPE, but the intra-class distance of MPE is smaller than that of HDE. In figures 15(c) and (d), the clusters of MDE and MFE are dispersed, and those of MFE are substantially mixed. Furthermore, the clusters obtained by MSE mix together and can hardly be distinguished, as shown in figure 15(f). Based on the above analysis, the BMDE method generates the most distinctive features and is superior to the other entropy methods.
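The visual comparison of intra-class and inter-class distances can be complemented by a simple numerical check. The sketch below is an illustrative proxy and not a metric used in the paper: it takes the intra-class distance as the mean distance of samples to their own class centroid and the inter-class distance as the mean distance between class centroids. The function name and input layout are assumptions.

```python
import math
from itertools import combinations


def class_distances(features, labels):
    """Return (mean intra-class distance, mean inter-class centroid distance)."""
    groups = {}
    for f, y in zip(features, labels):
        groups.setdefault(y, []).append(f)
    # Centroid of each class, computed coordinate by coordinate.
    centroids = {y: [sum(c) / len(pts) for c in zip(*pts)] for y, pts in groups.items()}
    intra = sum(math.dist(f, centroids[y]) for f, y in zip(features, labels)) / len(features)
    pairs = list(combinations(centroids.values(), 2))  # requires at least two classes
    inter = sum(math.dist(a, b) for a, b in pairs) / len(pairs)
    return intra, inter
```

A feature set with good separability gives a small first value and a large second value; the check can be applied either to the raw features or to their two-dimensional embedding.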
Following this, the original features from the six methods are processed using the RF classifier mentioned above. Half of the data are used to train the classifier and the rest are used to test the identification rate. As figure 16 illustrates, the BMDE method provides the most distinctive features among the six methods. Significantly, the identification rate of BMDE reaches 95.1%, which is favorable in industrial practice. The performance of HDE is notable among the other methods.
In practice, the recognition accuracy increases as the proportion of training data increases. It is noticeable that when only 15% of the data is used for training, the proposed method already achieves a high accuracy, which is equivalent to that of HDE trained on 90% of the data and much higher than that of MDE. This means that the proposed method works accurately, especially when the data set is insufficient. In real-world industrial diagnosis, data are usually limited. In this regard, BMDE demonstrates outstanding application prospects.
Figure 17 shows the identification results of the proposed BMDE. Only 28 samples are misclassified. According to the confusion matrix, samples of class-4 and class-5 tend to be confused with each other, which is consistent with the clustering results. Overall, the method performs well in feature extraction when combined with the RF classifier.

Combined failure analysis.
In a real working environment, combined malfunctions are commonly found along with single faults, and so should be considered in any analysis. Accordingly, data for 11 kinds of combined failures and for the healthy condition are collected from the test rig shown in figure 12. The combined failures include full annular rubbing with a shaft coupling fault and shaft crack (FARCFSF), full annular rubbing and shaft rubbing (FARSR), a blade crack(4) with shaft coupling fault (BC(4)SCF), a blisk crack with full annular rubbing (BCFAR), a blisk crack with a shaft crack (BCSC), a blisk crack with shaft rubbing (BCSR), a shaft coupling unbalanced fault with a shaft crack (BCUFSC), shaft rubbing with a shaft fault (SRSF), a shaft crack with a blade crack(4) (SCBC(4)), a shaft crack with a shaft coupling fault (SCSCF) and a shaft crack with shaft rubbing (SCSR). The parameter settings for data collection remain unchanged.
First, the raw features from the six methods are processed by a dimension-reduction method to obtain a visual data distribution. The sample clusters of the proposed BMDE shown in figure 18(a) display a small intra-class distance and a large inter-class distance, with few clusters mixed. In figures 18(b), (d) and (e), the clusters of HDE, MFE and MPE are nearly the same, with larger intra-class distances and smaller inter-class distances than those of BMDE. However, the clusters of HDE, MFE and MPE are still basically distinguishable. In addition, the result of the MSE method is far from ideal, as shown in figure 18(f).
When 50% of the data is used for training, the recognition rate of the RF classifier reaches 98.2% for the 12 classes (11 kinds of combined failures and the healthy condition), as shown in figure 19. The recognition rates achieved by the classifier are highly consistent with the cluster diagrams. HDE, MFE and MPE yield practically the same classification performance, in line with their scatter plots.

Conclusion
To address the characteristic issues of entropy-based fault diagnosis using multiscale analysis, this paper proposed a bi-filter multiscale analysis method. To solve the frequency-limitation problem, a multi-low procedure and a multi-high procedure were used to extract low-frequency and high-frequency information, respectively. To deal with the data-length loss problem, the step of the sliding window was reset in the bi-filter multiscale analysis, which overcomes this drawback of coarse-graining.
The bi-filter analysis is combined with diversity entropy to constitute the new BMDE method. Finally, a fault-diagnosis framework was developed based on BMDE and RF to achieve accurate fault diagnosis for a rotor-bearing system. The performance of the proposed method was verified using simulated and experimental data. A comparison with the HDE, MDE, MFE, MPE and MSE methods showed that the BMDE can extract more features and provide accurate entropy values with long data lengths. Thus, it achieves the highest diagnostic accuracy and shows superior resistance to environmental noise compared with the aforementioned methods.
In future work, bi-filter multiscale analysis will be combined with other entropy-based methods that can provide accurate and low-volatility features. Bi-filter multiscale analysis shows great potential in weak feature extraction for fault diagnosis.

Data availability statement
The data cannot be made publicly available upon publication because they contain commercially sensitive information. The data that support the findings of this study are available upon reasonable request from the authors.