
Adversarial domain adaptation with classifier alignment for cross-domain intelligent fault diagnosis of multiple source domains


Published 11 December 2020 © 2020 IOP Publishing Ltd
Citation: Yongchao Zhang et al 2021 Meas. Sci. Technol. 32 035102, DOI 10.1088/1361-6501/abcad4

0957-0233/32/3/035102

Abstract

Recently, most cross-domain fault diagnosis methods have focused on single-source domain adaptation. However, multiple labeled source domains are often available in real industrial scenarios, and how to use them to extract common domain-invariant features and obtain satisfactory diagnosis results remains a difficult question. This paper proposes a novel adversarial domain adaptation method with classifier alignment (ADACL) to address multi-source domain adaptation. ADACL consists of a universal feature extractor, multiple classifiers and a domain discriminator, whose parameters are updated simultaneously via a cross-entropy loss, a domain distribution alignment loss and a domain classifier alignment loss. Under this framework of cooperative multi-loss learning, both the distribution discrepancy among all domains and the prediction discrepancy among all classifiers on target domain data are minimized. Two experimental cases, with two and three source domains, respectively, verify that ADACL remarkably enhances cross-domain diagnostic performance under diverse operating conditions. In addition, the diagnostic performance of different methods is extensively evaluated in noisy environments with different signal-to-noise ratios.


1. Introduction

Rolling bearings and gears play a key and irreplaceable role in rotating machinery, which usually operates in harsh environments, under changing operating conditions or under shock loads. Incipient faults may cause system downtime, equipment damage or casualties [1–3]. Thus, fault diagnosis of rolling bearings and gears under variable operating conditions is becoming increasingly important [4–7].

With the advent of Industry 4.0, large volumes of real-time data can be collected and stored from the various sensors of a device. As a result, data-driven artificial intelligence methods such as stacked autoencoder networks [8], deep belief networks [9, 10], convolutional neural networks (CNNs) [11, 12], recurrent neural networks [13] and long short-term memory networks [14] are widely used in fault diagnosis. However, these data-driven methods all assume that the training data and testing data come from the same distribution. Deep neural networks have powerful feature extraction capabilities and can achieve high diagnostic accuracy when this assumption holds [15, 16]. Unfortunately, since the operating conditions and noise environment of the same machine may change, it is difficult to keep the data distribution consistent in real industrial scenarios, and collecting labeled data for every operating condition is impractical [17]. Hence, these methods cannot establish an effective fault diagnosis model when training data from the same distribution are lacking. A practical diagnosis model should therefore not only diagnose faults accurately under the same distribution, but also generalize well to similar tasks without labeled data.

To address the domain shift caused by changes in operating conditions, transfer learning methods have been proposed that try to learn domain-invariant features from two domains so as to improve diagnostic precision on the target domain. Han et al [18] presented a transfer learning framework in which a pre-trained CNN is fine-tuned to diagnose a new task. Li et al [19] used a fine-tuned pre-trained model to improve the accuracy of cross-domain diagnosis, using L2 regularization and particle swarm optimization to optimize a traditional DNN. Although fine-tuning-based transfer learning can improve diagnostic accuracy on the target domain using a small number of labeled target samples, it is no longer applicable in an unsupervised setting where target labels are unavailable. Several unsupervised transfer learning methods, also known as domain adaptation methods, have therefore been proposed. Lu et al [20] presented a deep neural network in which the maximum mean discrepancy (MMD) is minimized to reduce the distribution discrepancy between the source and target domains. Wen et al [21] proposed a domain adaptation method based on a three-layer sparse autoencoder, which also minimizes an MMD term to implement cross-domain fault diagnosis. Li et al [22] further improved the accuracy of cross-domain fault diagnosis by minimizing the multi-kernel MMD between the two domains. Zhu et al [23] proposed a multi-layer domain adaptation method for bearing fault diagnosis under diverse operating conditions. Ma et al [24] presented an improved transfer learning algorithm based on weighted transfer component analysis to reduce the distribution discrepancy between two domains.
Yang et al [25] proposed a feature-based transfer neural network that diagnoses locomotive bearing faults using data collected in laboratories. Jiao et al [26] proposed an unsupervised diagnostic network that uses classifier discrepancy to learn class-separable and domain-invariant features. Han et al [27] proposed an adversarial learning framework with an additional discriminative classifier and demonstrated its applicability and superiority on two fault datasets. Li et al [28] proposed an unsupervised cross-domain diagnosis method based on adversarial training for vibration signals measured at different positions. Guo et al [29] presented a deep convolutional transfer learning network based on a one-dimensional CNN to diagnose unlabeled data, in which a domain classification error and an MMD loss are optimized simultaneously to learn domain-invariant features. Despite extensive research on domain adaptation under diverse operating conditions, these studies address only the single-source-domain case.

Domain adaptation methods have achieved good cross-domain diagnosis results in single-source scenarios when the gap between the source and target domains is small [30, 31]. Generally, domain adaptation for fault diagnosis assumes that the discrepancy between the two domains is small. This assumption may not hold in actual fault diagnosis scenarios, because the operating conditions of a machine often vary widely, and diagnostic knowledge learned under a relatively stable operating condition is difficult to transfer to extreme operating conditions with large gaps [32]. Thus, accurate diagnosis is difficult to achieve from a single source domain. In industrial scenarios, however, labeled data from multiple operating conditions can often be obtained, and diagnostic knowledge from these conditions may help solve fault diagnosis under extreme operating conditions. Multi-source domain adaptation has recently been studied in a few fields, such as image classification [33, 34] and text classification [35]. However, how to use multi-source domain data to improve classification accuracy on the target domain has received little attention in cross-domain fault diagnosis [36].

Inspired by multi-source domain adaptation, an adversarial domain adaptation with classifier alignment (ADACL) method is proposed for unsupervised multi-source domain adaptation under diverse operating conditions. The main contributions of this study are as follows:

  • (a)  
    A novel adversarial learning network architecture is proposed for multi-source domain adaptation, which shares information among the multiple source domains and the target domain.
  • (b)  
    A novel dual alignment mechanism is developed: domain distribution alignment, based on adversarial learning across all source domains and the target domain, and classifier alignment among the predictions of all classifiers. This aligns the multiple feature spaces and improves diagnostic accuracy for samples near the decision boundary.
  • (c)  
    Two experimental scenarios, with two and three source domains, are used to compare the proposed method with three comparison approaches, and the results show that the proposed method achieves state-of-the-art cross-domain diagnosis results.

The rest of this paper is organized as follows. Section 2 details the proposed method. Section 3 presents two experimental cases with two and three source domains. Finally, conclusions are summarized in section 4.

2. Proposed cross-domain diagnosis method

2.1. Problem formulation

Single-source domain adaptation under diverse operating conditions has been widely studied [37]. In this study, a multi-source unsupervised domain adaptation method is proposed for cross-domain fault diagnosis. Single-source and multi-source domain adaptation are illustrated in figure 1, which shows how combining multiple source domains can improve diagnostic accuracy on the target domain.

Figure 1. Illustration for single-source and multi-source domain adaptations.

Let $\mathcal{D}_{s_j} = \{(x_i^{s_j}, y_i^{s_j})\}_{i=1}^{n_{s_j}}$, $j = 1, 2, \ldots, N_s$, denote the labeled source samples, where $x_i^{s_j}$ is a sample with $N_{\text{input}}$ feature dimensions from source domain j, $y_i^{s_j}$ is its label, $n_{s_j}$ is the number of labeled samples in source domain j and $N_s$ is the number of source domains. Let $\mathcal{D}_t = \{\mathcal{D}_t^{\text{train}}, \mathcal{D}_t^{\text{test}}\}$ denote the target domain samples, where $\mathcal{D}_t^{\text{train}}$ and $\mathcal{D}_t^{\text{test}}$ are the target domain training and testing samples, respectively. $\mathcal{D}_t^{\text{train}} = \{x_i^{\text{train}}\}_{i=1}^{n_{\text{train}}}$ denotes the $n_{\text{train}}$ unlabeled target samples, where $x_i^{\text{train}}$ is a sample. Correspondingly, $\mathcal{D}_t^{\text{test}} = \{(x_i^{\text{test}}, y_i^{\text{test}})\}_{i=1}^{n_{\text{test}}}$ denotes the $n_{\text{test}}$ labeled target samples, where $x_i^{\text{test}}$ is a sample and $y_i^{\text{test}}$ is its label. The purpose of this study is to train a model on $\mathcal{D}_{s_j}$ and $\mathcal{D}_t^{\text{train}}$ that accurately diagnoses $\mathcal{D}_t^{\text{test}}$.

2.2. Proposed network

The architecture of the proposed method is shown in figure 2. It consists of a feature extractor G, a domain discriminator D and classifiers C. G receives samples from multiple source domains and the target domain simultaneously and maps each sample x to a one-dimensional feature vector F(x). The feature vectors of samples from all domains are fed into D, which aims to align the feature distributions of all domains. Specifically, D learns domain-invariant features through adversarial learning between D and G, implemented by adding a simple gradient reversal layer (GRL). The classifiers $\{C_j\}^{N_s}_{j=1}$ are also placed after the feature vectors; all classifiers share the same network structure and initialization parameters. Each classifier $C_j$ receives F(x) from both the jth source domain samples and the target domain samples and outputs the corresponding predicted labels.

Figure 2. The architecture overview of the ADACL. The framework receives multi-source and target instances, and updates the parameters using both the domain distribution alignment and domain classifier alignment to implement classification of the target samples. Blue and green arrows indicate the data flows of source data; red arrows indicate the data flows of target data (best viewed in color).

2.3. Optimization objective

  • (a)  
    Source domain classification loss term

The multi-source domain fault diagnosis task is studied in this paper. Each source domain's data $x^{s_j}$ corresponds to a classifier $C_j$, and the parameters of G and $C_j$ are updated using the corresponding source domain samples. The cross-entropy loss is adopted for this term [38]. The cross-entropy loss of all classifiers $C_j$ on the labeled source samples $\{(x_i^{s_j}, y_i^{s_j})\}_{i=1}^{n_{s_j}}$ is formulated as:

Equation (1)

where $N_s$ is the number of source domains and $\mathcal{L}^j_{\text{cls}}(\cdot)$ is the cross-entropy loss of the jth classifier, defined as follows:

Equation (2)

Here $C_j(F(x^{s_j}))$ denotes the predicted probability output of $x^{s_j}$ by classifier $C_j$, and k is the number of health conditions.
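Following the description of equations (1) and (2), each classifier $C_j$ incurs a cross-entropy loss on its own source domain and these losses are combined over the $N_s$ source domains. The pure-Python sketch below assumes that each classifier's loss is averaged over its mini-batch and that the $N_s$ terms are summed without weights; both choices, and the function names, are illustrative assumptions rather than the authors' code.

```python
import math


def cross_entropy(probs, labels, eps=1e-12):
    """Mean cross-entropy of predicted class probabilities against integer labels.

    probs:  list of probability vectors (e.g. softmax outputs of C_j), one per sample
    labels: list of integer health-condition labels in [0, k)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        total -= math.log(max(p[y], eps))  # only the true-class term of the one-hot sum survives
    return total / len(labels)


def source_classification_loss(per_domain_probs, per_domain_labels):
    """L_cls sketch: unweighted sum of the per-classifier losses L^j_cls over the N_s source domains."""
    return sum(cross_entropy(p, y) for p, y in zip(per_domain_probs, per_domain_labels))


# Two source domains, k = 3 health conditions, two samples each (made-up numbers).
probs_s1 = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # outputs of C_1 on source domain 1
probs_s2 = [[0.6, 0.3, 0.1], [0.2, 0.2, 0.6]]   # outputs of C_2 on source domain 2
loss = source_classification_loss([probs_s1, probs_s2], [[0, 1], [0, 2]])
print(round(loss, 4))
```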

The parameters of the feature extractor, the domain discriminator and the jth classifier are denoted by $\theta_G$, $\theta_D$ and $\theta_{C_j}$, respectively, and the parameters of all classifiers are denoted by $\theta_C$. The cross-entropy loss of each classifier is minimized to find the optimal $\theta_G$ and $\theta_C$:

Equation (3)

  • (b)  
    Domain distribution alignment loss term

To achieve domain distribution alignment, the first of the two alignment stages, the domain discriminator D is introduced into the training process. The final output layer of D has $N_s + 1$ neurons, each representing a specific domain. The cross-entropy loss is also used for this term, so the domain classification loss is defined as:

Equation (4)

where $D(F(x))$ denotes the predicted domain probability output of x by D.
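Because D has $N_s + 1$ outputs, domain identification is an $(N_s + 1)$-way classification. The sketch below (hypothetical helper names, pure Python) assigns zero-based domain indices $0, \ldots, N_s - 1$ to the source domains and $N_s$ to the target, and assumes equation (4) is a batch-mean cross-entropy. It also shows why a fully confused discriminator, the state G tries to induce, yields a loss of $\ln(N_s + 1)$.

```python
import math


def domain_labels(batch_sizes_per_source, n_target):
    """Domain index of every sample: 0..N_s-1 for the source domains, N_s for the target domain."""
    labels = []
    for j, n in enumerate(batch_sizes_per_source):
        labels += [j] * n
    labels += [len(batch_sizes_per_source)] * n_target
    return labels


def domain_loss(domain_probs, labels, eps=1e-12):
    """L_d sketch: mean cross-entropy of D's (N_s + 1)-way probability output against the domain labels."""
    return -sum(math.log(max(p[d], eps)) for p, d in zip(domain_probs, labels)) / len(labels)


labels = domain_labels([2, 2], 2)   # N_s = 2 source domains, two samples per domain
print(labels)                       # [0, 0, 1, 1, 2, 2]
# A maximally confused discriminator assigns 1 / (N_s + 1) to every domain.
uniform = [[1 / 3] * 3 for _ in labels]
print(round(domain_loss(uniform, labels), 4))   # equals ln 3
```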

During training, the distribution shift in the shared feature space F(x) is reduced through adversarial learning. D aims to distinguish the samples of all domains accurately, whereas G aims to confuse D. Specifically, using samples from all domains, the parameters of D are updated to minimize $\mathcal{L}_d$ while the parameters of G are updated to maximize $\mathcal{L}_d$, which aligns the domain distributions [39]. The optimal parameters are thus obtained by solving:

Equation (5)

Equation (5) cannot be optimized directly, because $\mathcal{L}_d$ is maximized by G and minimized by D at the same time. To solve this problem, a GRL is introduced between G and D, as shown in figure 2. The GRL acts only during backpropagation: the sign of the gradient is flipped as it passes through the GRL [40], so that the maximization and minimization can be trained simultaneously with ordinary gradient descent. Formally, the GRL can be written as a function R(x):

$$R(x) = x, \qquad \frac{\mathrm{d}R(x)}{\mathrm{d}x} = -\lambda \mathbf{I} \qquad (6)$$

where $\lambda$ is the penalty coefficient, set to $\lambda = 1$ in the proposed method, and $\mathbf{I}$ denotes an identity matrix. In this way, equation (5) can be formulated as:

Equation (7)
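The GRL changes only the backward pass. The pure-Python sketch below traces one backward pass through a toy scalar chain G → GRL → D, with hypothetical names `GradientReversal` and `grads_one_step` and a squared-error loss standing in for the cross-entropy $\mathcal{L}_d$. It is an illustration, not the authors' implementation; in PyTorch the same idea is commonly written as a custom `torch.autograd.Function` whose `backward` returns the negated, scaled gradient.

```python
class GradientReversal:
    """R(x) from equation (6): identity in the forward pass, gradient scaled by -lam in the backward pass."""

    def __init__(self, lam=1.0):
        self.lam = lam

    def forward(self, x):
        return x                          # R(x) = x

    def backward(self, grad_output):
        return -self.lam * grad_output    # dR/dx = -lam * I


def grads_one_step(w, v, x, t, lam=1.0):
    """Toy chain G -> GRL -> D; returns (gradient reaching G's weight w, gradient for D's weight v).

    G: f = w * x,  D: out = v * R(f),  loss = (out - t) ** 2 (squared error stands in for L_d).
    """
    grl = GradientReversal(lam)
    f = w * x
    r = grl.forward(f)
    out = v * r
    d_out = 2.0 * (out - t)
    d_v = d_out * r                   # D gets the ordinary gradient and keeps minimizing the loss
    d_r = d_out * v
    d_f = grl.backward(d_r)           # sign flipped before the gradient enters G
    d_w = d_f * x                     # so a descent step on w increases the loss
    return d_w, d_v


gw, gv = grads_one_step(w=0.5, v=2.0, x=1.0, t=0.0)
print(gw, gv)  # -4.0 1.0
```

Without the GRL the gradient on `w` would be +4.0; with it, one SGD step moves G in the direction that increases the discriminator's loss while D is still trained normally, which is exactly the min-max behaviour of equation (5).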

  • (c)  
    Domain classifier alignment loss term

Domain distribution alignment only confuses features across domains; it does not take the classifiers' decision boundaries into account. Therefore, target domain samples near a decision boundary are easily misclassified. Intuitively, different classifiers are likely to make different predictions for such target samples. Reducing the prediction disagreement among the classifiers therefore reduces the number of misclassified samples near the decision boundary.

Specifically, all classifiers are used together as a discriminator, and the L1-norm of the differences between the classifier predictions on the target training samples $x^{\text{train}}$ is used as the discrepancy loss:

Equation (8)

where $C_j(F(x^{\text{train}}))$ and $C_i(F(x^{\text{train}}))$ denote the predicted probability outputs of $x^{\text{train}}$ by classifiers $C_j$ and $C_i$, respectively. In the same way, the optimal parameters are obtained by solving:

Equation (9)
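A minimal sketch of the classifier discrepancy described for equation (8): the L1 distance between the probability vectors that each pair of classifiers assigns to the same target training sample. Averaging over classifier pairs and samples is an assumption (equation (8) may normalize differently), and the function name is hypothetical.

```python
from itertools import combinations


def classifier_discrepancy(preds):
    """L_dis sketch: mean pairwise L1 distance between classifiers' probability outputs.

    preds[j][n] is the probability vector of classifier C_j for target training sample n.
    Requires at least two classifiers (N_s >= 2), as in both cases studied here.
    """
    pairs = list(combinations(range(len(preds)), 2))
    n_samples = len(preds[0])
    total = 0.0
    for i, j in pairs:
        for p_i, p_j in zip(preds[i], preds[j]):
            total += sum(abs(a - b) for a, b in zip(p_i, p_j))   # L1 norm of C_i - C_j
    return total / (len(pairs) * n_samples)


agree = [[[0.9, 0.1]], [[0.9, 0.1]], [[0.9, 0.1]]]      # three classifiers, one target sample
disagree = [[[0.9, 0.1]], [[0.1, 0.9]], [[0.5, 0.5]]]
print(classifier_discrepancy(agree), round(classifier_discrepancy(disagree), 4))
```

When all classifiers agree the loss is zero; minimizing it pushes the classifiers toward consistent predictions on target samples, which is the intended effect near decision boundaries.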

  • (d)  
    Overall formulation of the proposed method

To sum up, the proposed method includes three loss terms. All the terms are integrated to obtain the following overall optimization objective:

Equation (10)

where $\mathcal{L}$ denotes the total loss, and $\alpha$ and $\beta$ are two trade-off parameters.

Specifically, G is optimized to minimize $\mathcal{L}_{\text{cls}}$, $\mathcal{L}_d$ and $\mathcal{L}_{\text{dis}}$; because the gradient of $\mathcal{L}_d$ is reversed by the GRL before it reaches G, this effectively maximizes $\mathcal{L}_d$ with respect to G. D is optimized to minimize $\mathcal{L}_d$, and C is optimized to minimize $\mathcal{L}_{\text{cls}}$ and $\mathcal{L}_{\text{dis}}$. Thus, the overall optimization problem is as follows:

Equation (11)

2.4. Training procedure

The ADACL algorithm is summarized in Algorithm 1. During training, equal numbers of source and target samples are fed to the network to calculate the three losses simultaneously, and the total loss is minimized by stochastic gradient descent (SGD) until the maximum number of epochs is reached [41]. Finally, to evaluate the network on the target domain testing samples, the average of all classifier outputs is taken as the final output.
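The test-time rule above, averaging the outputs of all classifiers, can be sketched in pure Python as follows. The function name is hypothetical, and taking the argmax of the averaged probabilities as the predicted label is an assumption.

```python
def ensemble_predict(per_classifier_probs):
    """Average N_s classifiers' probability vectors per target sample and return the argmax labels.

    per_classifier_probs[j][n] is the probability vector of classifier C_j for test sample n.
    """
    n_classifiers = len(per_classifier_probs)
    labels = []
    for sample_probs in zip(*per_classifier_probs):          # all classifiers' outputs for one sample
        avg = [sum(col) / n_classifiers for col in zip(*sample_probs)]
        labels.append(max(range(len(avg)), key=avg.__getitem__))
    return labels


c1 = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]   # classifier C_1, two test samples, k = 3
c2 = [[0.2, 0.7, 0.1], [0.1, 0.3, 0.6]]   # classifier C_2
print(ensemble_predict([c1, c2]))         # [1, 2]
```

In this example $C_1$ alone would assign the first sample to class 0, but the averaged output selects class 1, illustrating how the ensemble resolves disagreement between classifiers.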

Algorithm 1 Training procedure of ADACL

 
Input: labeled source domain samples $\mathcal{D}_{s_j} = \{(x_i^{s_j}, y_i^{s_j})\}_{i=1}^{n_{s_j}}$, $j = 1, \ldots, N_s$; unlabeled target domain samples $\mathcal{D}_t^{\text{train}} = \{x_i^{\text{train}}\}_{i=1}^{n_{\text{train}}}$; number of training epochs E.
Output: trained parameters of ADACL
1: Initialize $\theta_G$, $\theta_D$ and $\theta_C$
2: for e = 1 to E do
3:   Uniformly sample $(x^{s_j}, y^{s_j})$ from each $\mathcal{D}_{s_j}$
4:   Uniformly sample $x^{\text{train}}$ from $\mathcal{D}_t^{\text{train}}$
5:   Gradually change α and β from 0 to 1
6:   Feed the source and target samples to G, D and C
7:   Calculate $\mathcal{L}_{\text{cls}}$ using equation (1)
8:   Calculate $\mathcal{L}_d$ using equation (4)
9:   Calculate $\mathcal{L}_{\text{dis}}$ using equation (8)
10:  Optimize the subnetworks G, C and D in turn:
11:    $\hat{\theta}_G = \arg\min_{\theta_G} \{\mathcal{L}_{\text{cls}}, \mathcal{L}_d, \mathcal{L}_{\text{dis}}\}$ (the gradient of $\mathcal{L}_d$ is reversed by the GRL)
12:    $\hat{\theta}_D = \arg\min_{\theta_D} \mathcal{L}_d$
13:    $\hat{\theta}_C = \arg\min_{\theta_{C_1}, \theta_{C_2}, \ldots, \theta_{C_{N_s}}} \{\mathcal{L}_{\text{cls}}, \mathcal{L}_{\text{dis}}\}$
14: end for
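Steps 3 and 4 of Algorithm 1 feed equal numbers of samples from every domain. One way to organize this, sketched below with a hypothetical `aligned_batches` generator, is to shuffle every domain once per epoch and cut aligned mini-batches of the same size (50, the batch size given in section 3.1) from each. Ending the epoch when the smallest domain runs out is an assumption; the paper does not specify it.

```python
import random


def aligned_batches(sources, target, batch_size, rng):
    """Yield one epoch of aligned mini-batches: batch_size labeled samples from every source
    domain plus batch_size unlabeled target samples (hypothetical helper, not the authors' code).

    sources: list of N_s lists of (x, y) pairs; target: list of unlabeled x.
    The epoch ends when the smallest domain is exhausted (an assumption).
    """
    shuffled = [rng.sample(d, len(d)) for d in sources] + [rng.sample(target, len(target))]
    n_batches = min(len(d) for d in shuffled) // batch_size
    for b in range(n_batches):
        window = slice(b * batch_size, (b + 1) * batch_size)
        source_batches = [d[window] for d in shuffled[:-1]]   # one (x, y) batch per source domain
        target_batch = shuffled[-1][window]                    # one unlabeled x^train batch
        yield source_batches, target_batch


rng = random.Random(0)
s1 = [(f"s1_{i}", i % 5) for i in range(2000)]   # 5 conditions x 400 samples, as in case 1
s2 = [(f"s2_{i}", i % 5) for i in range(2000)]
tgt = [f"t_{i}" for i in range(2000)]
batches = list(aligned_batches([s1, s2], tgt, batch_size=50, rng=rng))
print(len(batches), [len(b) for b in batches[0][0]], len(batches[0][1]))  # 40 [50, 50] 50
```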

3. Experimental verification

3.1. Implementation details

To improve feature extraction, the network uses several effective operations [42, 43], namely convolution (Conv), batch normalization, the rectified linear unit and max pooling (MP). The architecture and parameters of ADACL are listed in table 1, where k is the number of health conditions and $N_s$ is the number of source domains. Since the vibration signal is one-dimensional, a one-dimensional CNN is adopted. The raw signal is fed directly into G without manual feature extraction, which avoids heavy manual signal processing and makes the method easier to apply in industry. Each sample has a length of 2048.

Table 1. The architecture and parameters of the ADACL.

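| Network module | Network layer | Kernel size | Stride size | Padding size | Input/output size |
|---|---|---|---|---|---|
| G | Conv | 9 | 1 | 1 | 2048 × 1 / 2042 × 4 |
| | MP | 2 | 2 | 0 | 2042 × 4 / 1021 × 4 |
| | Conv | 9 | 1 | 1 | 1021 × 4 / 1015 × 8 |
| | MP | 2 | 2 | 0 | 1015 × 8 / 507 × 8 |
| | Conv | 9 | 1 | 1 | 507 × 8 / 501 × 16 |
| | MP | 2 | 2 | 0 | 501 × 16 / 250 × 16 |
| | Conv | 9 | 1 | 1 | 250 × 16 / 244 × 32 |
| | MP | 2 | 2 | 0 | 244 × 32 / 122 × 32 |
| | Conv | 9 | 1 | 1 | 122 × 32 / 116 × 64 |
| | MP | 2 | 2 | 0 | 116 × 64 / 58 × 64 |
| | Conv | 9 | 1 | 1 | 58 × 64 / 52 × 128 |
| | MP | 2 | 2 | 0 | 52 × 128 / 26 × 128 |
| | Flat | | | | 26 × 128 / 3328 |
| | Linear | | | | 3328 / 256 |
| | Linear | | | | 256 / 128 |
| D | Linear | | | | 128 / 10 |
| | Linear | | | | 10 / (1 + $N_s$) |
| C | Linear | | | | 128 / k |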
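The sizes in table 1 follow from the usual length formula for 1-D convolution and pooling, $L_{\text{out}} = \lfloor (L_{\text{in}} + 2p - k)/s \rfloor + 1$. The short check below recomputes the feature extractor's shapes from the kernel, stride and padding values in the table. It is an arithmetic check, not the authors' code; in PyTorch these layers would correspond to `nn.Conv1d(..., kernel_size=9, stride=1, padding=1)` and `nn.MaxPool1d(2)`.

```python
def conv1d_len(n, kernel, stride=1, padding=0):
    """Output length of a 1-D convolution or pooling layer (floor convention, dilation 1)."""
    return (n + 2 * padding - kernel) // stride + 1


def feature_extractor_shapes(length=2048):
    """Recompute (layer, length, channels) for the six Conv + MP blocks of G in table 1."""
    channels = [4, 8, 16, 32, 64, 128]           # output channels of the six Conv layers
    shapes = []
    for c in channels:
        length = conv1d_len(length, kernel=9, stride=1, padding=1)   # Conv: kernel 9, stride 1, padding 1
        shapes.append(("Conv", length, c))
        length = conv1d_len(length, kernel=2, stride=2, padding=0)   # MP: kernel 2, stride 2, padding 0
        shapes.append(("MP", length, c))
    return shapes, length * channels[-1]         # flattened size fed to the first Linear layer


shapes, flat = feature_extractor_shapes()
for layer in shapes:
    print(layer)
print("Flat:", flat)
```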

The objective is optimized by SGD with a momentum of 0.9. The number of training epochs and the batch size are set to 500 and 50, respectively. The initial learning rate is 0.1 and decreases by 10% every 100 epochs. The proposed network is implemented in the PyTorch framework. When calculating the total loss, α and β are gradually increased from 0 to 1 to suppress noisy signals in the early stage of training, and they are calculated as [40]:

$$\alpha = \beta = \frac{2}{1 + \exp(-\gamma\, i / E)} - 1 \qquad (12)$$

where i is the current epoch, E is the total number of epochs, and γ is set to 10 throughout the experiments.
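Assuming equation (12) follows the schedule of [40], $2/(1+\exp(-\gamma\, i/E)) - 1$, and that equation (10) is the additive combination $\mathcal{L} = \mathcal{L}_{\text{cls}} + \alpha\mathcal{L}_d + \beta\mathcal{L}_{\text{dis}}$ (both are assumptions here), the weights and the total loss can be sketched as follows; the function names are hypothetical.

```python
import math


def adaptation_weight(i, E, gamma=10.0):
    """Schedule sketch for alpha and beta: 2 / (1 + exp(-gamma * i / E)) - 1, rising from 0 toward 1."""
    p = i / E                                   # training progress in [0, 1]
    return 2.0 / (1.0 + math.exp(-gamma * p)) - 1.0


def total_loss(l_cls, l_d, l_dis, i, E):
    """Total-loss sketch, assuming L = L_cls + alpha * L_d + beta * L_dis with alpha = beta."""
    w = adaptation_weight(i, E)
    return l_cls + w * l_d + w * l_dis


E = 500
for i in (0, 50, 100, 250, 500):
    print(i, round(adaptation_weight(i, E), 4))
```

With E = 500 and γ = 10, the weight is about 0.46 after 50 epochs and about 0.76 after 100 epochs, so the adaptation terms reach most of their weight within the first fifth of training while early, unreliable discriminator and discrepancy signals are damped.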

3.2. Comparison approaches

To assess the effectiveness of the ADACL comprehensively, several common methods are implemented as comparisons. The network structure and parameters of these comparison approaches are similar to the proposed method, and they are implemented on the same dataset.

  • (a)  
    Without domain adaptation (WDA).

First, a simple method, WDA, is carried out, in which the model is trained using only labeled source domain samples. Its network consists of a feature extractor and a classifier, and the cross-entropy loss is used as the optimization objective. Since the multi-source scenario is studied in this paper, WDA is divided into two schemes: 1. WDAhigh: a single source domain is used as training data, and WDAhigh reports the best result among the single-source models. 2. WDAcom: all source domains are combined as the training data.

  • (b)  
    MMD-based domain adaptation method.

In this approach, the network consists of a feature extractor, an MMD adaptation layer and a classifier, with the MMD adaptation layer placed after the feature extractor. The MMD is used to minimize the distribution discrepancy between the two domains. Hence, a cross-entropy loss and an MMD loss are adopted as the optimization objective. Likewise, because multiple source domains exist, MMDhigh is defined as the best single-source diagnosis result among the source domains.
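For reference, a biased estimate of the squared MMD between two feature sets can be sketched in pure Python. The single Gaussian kernel, its bandwidth `sigma` and the function names are assumptions; the kernel used by the comparison method is not stated in the paper.

```python
import math


def gaussian_kernel(a, b, sigma):
    """k(a, b) = exp(-||a - b||^2 / (2 sigma^2))."""
    sq = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-sq / (2.0 * sigma ** 2))


def mmd2(xs, ys, sigma=1.0):
    """Biased estimate of squared MMD: mean k(x, x') + mean k(y, y') - 2 mean k(x, y)."""
    k_xx = sum(gaussian_kernel(a, b, sigma) for a in xs for b in xs) / len(xs) ** 2
    k_yy = sum(gaussian_kernel(a, b, sigma) for a in ys for b in ys) / len(ys) ** 2
    k_xy = sum(gaussian_kernel(a, b, sigma) for a in xs for b in ys) / (len(xs) * len(ys))
    return k_xx + k_yy - 2.0 * k_xy


src = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]       # source-domain features
same = [[0.0, 0.0], [0.1, -0.1], [-0.1, 0.1]]      # identically distributed target features
shifted = [[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]]     # target features after a domain shift
print(mmd2(src, same), mmd2(src, shifted) > 0)
```

Minimizing this quantity with respect to the feature extractor pulls the source and target feature distributions together, which is the role of the MMD adaptation layer.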

  • (c)  
    Domain adversarial network (DAN)-based domain adaptation method

Since the proposed method extends the adversarial approach, the original domain adversarial method is used as a comparison approach. Its network structure is consistent with ADACL, and the cross-entropy loss and domain adversarial loss are adopted as optimization objectives. Similarly, DANhigh is defined as the best single-source diagnosis result among the source domains, and DANcom is defined as the diagnosis result when all source domains are combined as training data, which also corresponds to the proposed method without the domain classifier alignment loss.

3.3. Case 1: analysis of the bearing test-rig dataset

3.3.1. Experimental setup.

In this study, a practical bearing test-rig is first constructed to assess the effectiveness of ADACL. Accelerometers installed on the bearing housing collect the vibration signals, and the vertical vibration signals are used for fault analysis. Five health conditions are considered, i.e. normal (N), outer-race fault (OF), inner-race fault (IF), ball fault (BF) and compound fault (OF_BF). The bearing test-rig and three types of fault are shown in figure 3. The experiment is conducted at three rotating speeds, namely 600 rpm (R1), 1200 rpm (R2) and 1800 rpm (R3). The sampling time is 220 s, and the sampling frequency is 20 kHz.

Figure 3. The bearing test-rig and three types of fault.

The waveforms of all health conditions under the three operating conditions are shown in figure 4. In the normal condition, no obvious impacts appear under any operating condition. In the four fault conditions, however, the transient fault characteristics differ across operating conditions. Such domain shift, caused by the feature distribution discrepancy, seriously weakens the generalization ability of deep learning models. Domain adaptation techniques can be used to reduce this domain distribution discrepancy.

Figure 4. The waveforms of various health conditions under the three operating conditions.

In this experiment, three operating conditions (R1, R2 and R3) are studied. Following the multi-source domain adaptation scenario of this study, three domain adaptation tasks are designed; in each task, the model is trained with two source domains and one target domain. The details of the three tasks are shown in table 2.

Table 2. The details of three tasks on the bearing test-rig dataset.

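| Task | Source domain (rpm) | Target domain (rpm) | Source training samples | Target training samples | Target testing samples | Fault categories |
|---|---|---|---|---|---|---|
| R1, R2→R3 | 600, 1200 | 1800 | 2 × 5 × 400 | 5 × 400 | 5 × 200 | Five conditions (labels 0–4) |
| R1, R3→R2 | 600, 1800 | 1200 | 2 × 5 × 400 | 5 × 400 | 5 × 200 | Five conditions (labels 0–4) |
| R2, R3→R1 | 1200, 1800 | 600 | 2 × 5 × 400 | 5 × 400 | 5 × 200 | Five conditions (labels 0–4) |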
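As a rough consistency check on table 2, assume that each sample is a non-overlapping 2048-point window cut from a single 220 s record per health condition and rotating speed (the paper does not state how the records were segmented). Each record then holds

$$220\ \text{s} \times 20\,000\ \text{Hz} = 4.4 \times 10^{6}\ \text{points}, \qquad \left\lfloor \frac{4.4 \times 10^{6}}{2048} \right\rfloor = 2148\ \text{windows},$$

while a target domain needs $400 + 200 = 600$ samples per condition, i.e. $600 \times 2048 = 1\,228\,800$ points, or about 28% of the record, so the sample counts are attainable without overlapping windows.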

3.3.2. Experimental results.

The diagnostic performance on the bearing test-rig dataset is analyzed first. The diagnostic results of all methods on the three domain adaptation tasks are given in table 3. The proposed method achieves an average diagnostic accuracy of 90.7%, the highest among all methods. This demonstrates that ADACL can effectively achieve cross-domain fault diagnosis under diverse operating conditions and outperforms the compared methods.

Table 3. Fault classification accuracy on the bearing test-rig dataset.

| Method | R1, R2→R3 (%) | R1, R3→R2 (%) | R2, R3→R1 (%) | Average (%) |
|---|---|---|---|---|
| WDAhigh | 65.6 | 65.8 | 32.5 | 57.6 |
| WDAcom | 81.3 | 79.2 | 56.2 | 72.2 |
| MMDhigh | 90.2 | 89.1 | 62.8 | 80.7 |
| DANhigh | 89.1 | 89.3 | 69.3 | 82.9 |
| DANcom | 90.4 | 91.9 | 71.5 | 84.6 |
| Proposed | 94.8 | 99.6 | 77.7 | 90.7 |

Several conclusions can be drawn. Combining the two source domains into one training set (WDAcom) generally yields better diagnostic results than using a single source domain (WDAhigh). This may be because the additional, more diverse samples improve the generalization ability of the network, thereby improving diagnostic accuracy. The results of MMDhigh and DANhigh show that, in most tasks, learning domain-invariant representations of the source and target domains yields encouraging results. Compared with single-source adversarial learning (DANhigh), simply introducing multiple source domains into the DAN (DANcom) does not significantly improve diagnostic performance. However, adding domain classifier alignment on top of multi-source adversarial learning narrows the gap among the classifiers and improves the classification performance of all of them. Hence, these results indicate that ADACL can effectively achieve cross-domain fault diagnosis with two source domains, even though the operating conditions vary greatly among the domains.

To further show the advantage of ADACL, t-distributed stochastic neighbor embedding (t-SNE) [44] is used to visualize the learned features in two dimensions. Taking task R1, R3→R2 as an example, the visualization results of the training process are shown in figure 5. The features of different health conditions from the two source domains and the target domain training samples are well separated, and the feature distributions of the same health condition are aligned across domains. Figure 6 shows the visualization results of the testing process for all methods. As shown in figure 6, the features learned by WDAhigh do not cluster well for health conditions N, IF and OF, possibly because these three conditions present similar features under different operating conditions. The general domain adaptation methods improve the clustering of N, IF and OF, but IF and OF still overlap slightly. In contrast, the features learned by ADACL achieve the best clustering and separability for all health conditions.

Figure 5. The visualized features of the training process on the bearing test-rig dataset.

Figure 6. The visualized features of the testing process for all methods on the bearing test-rig dataset.

To present detailed diagnostic results for each health condition in task R1, R3→R2, the confusion matrices of the six methods on the bearing test-rig dataset are given in figure 7. Each confusion matrix shows the predicted and ground-truth labels, together with the prediction accuracy and error rates for each health condition. Figure 7 shows that ADACL predicts all health conditions on the bearing test-rig dataset accurately.
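A row-normalized confusion matrix gives the per-condition accuracy on the diagonal and the error rates off the diagonal. A minimal pure-Python sketch, assuming rows index the ground-truth label and columns the predicted label, with hypothetical function names, made-up predictions and the label order listed for figure 7:

```python
def confusion_matrix(y_true, y_pred, k):
    """k x k count matrix: rows are ground-truth labels, columns are predicted labels."""
    m = [[0] * k for _ in range(k)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m


def row_rates(m):
    """Divide each row by its total: per-condition accuracy (diagonal) and error rates (off-diagonal)."""
    rates = []
    for row in m:
        n = sum(row)
        rates.append([c / n if n else 0.0 for c in row])
    return rates


names = ["BF", "IF", "N", "OF", "OF_BF"]   # labels 0-4 in the order used by figure 7
y_true = [0, 0, 1, 1, 2, 3, 3, 4]
y_pred = [0, 0, 1, 3, 2, 3, 1, 4]          # made-up predictions for illustration only
rates = row_rates(confusion_matrix(y_true, y_pred, len(names)))
for i, name in enumerate(names):
    print(f"{name:<6} accuracy {rates[i][i]:.2f}")
```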

Figure 7. The confusion matrices in task R1, R3→R2 on the bearing test-rig dataset (0, 1, 2, 3 and 4 represent BF, IF, N, OF and OF_BF, respectively).

Since various kinds of noise may be present in actual industrial scenarios, the diagnostic performance is also evaluated in a noisy environment on task R1, R3→R2 to verify the generalization ability of the proposed method. In this experiment, Gaussian noise with signal-to-noise ratios (SNRs) of −4, 0, 4 and 8 dB is added to both the source domain and target domain signals, where the SNR is defined as [45]:

$$\mathrm{SNR} = 10 \log_{10}\left(\frac{P_{\text{signal}}}{P_{\text{noise}}}\right) \qquad (13)$$

where $P_{\text{signal}}$ and $P_{\text{noise}}$ are the powers of the original signal and the added noise, respectively.
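Noise at a prescribed SNR can be generated by scaling white Gaussian noise to $P_{\text{noise}} = P_{\text{signal}} / 10^{\mathrm{SNR}/10}$, assuming the SNR is expressed in decibels. The sketch below (hypothetical function names; a 50 Hz tone sampled at 20 kHz stands in for a vibration signal) adds such noise and measures the resulting SNR, which should come out close to each target value.

```python
import math
import random


def add_noise_at_snr(signal, snr_db, rng):
    """Add white Gaussian noise so that 10 * log10(P_signal / P_noise) equals snr_db in expectation."""
    p_signal = sum(s * s for s in signal) / len(signal)
    p_noise = p_signal / (10 ** (snr_db / 10))
    sigma = math.sqrt(p_noise)                   # noise standard deviation for that power
    return [s + rng.gauss(0.0, sigma) for s in signal]


def measured_snr_db(clean, noisy):
    """SNR of noisy relative to clean, using the empirical powers of signal and residual noise."""
    noise = [n - c for c, n in zip(clean, noisy)]
    p_s = sum(c * c for c in clean) / len(clean)
    p_n = sum(e * e for e in noise) / len(noise)
    return 10 * math.log10(p_s / p_n)


rng = random.Random(42)
clean = [math.sin(2 * math.pi * 50 * t / 20000) for t in range(20000)]   # 1 s of a 50 Hz tone
for target in (-4, 0, 4, 8):
    print(target, round(measured_snr_db(clean, add_noise_at_snr(clean, target, rng)), 2))
```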

Figure 8 shows the diagnostic results of the proposed method and the comparison methods under different SNRs on the bearing test-rig dataset. Environmental noise significantly reduces the diagnostic performance of WDAhigh, MMDhigh and DANcom, whereas it has little effect on the proposed method and WDAcom. A likely explanation is that introducing multiple source domains improves the generalization ability of the model and thus its robustness to noise. Therefore, the proposed method can also obtain accurate predictions in a noisy environment.

Figure 8. The diagnostic results of all methods under different SNR on the bearing test-rig dataset.

3.4. Case 2: analysis of the generic gearbox dataset

3.4.1. Experimental setup.

The generic gearbox dataset, from the 2009 Prognostics and Health Management (PHM) data challenge [46], is analyzed as the second case. The internal structure of the gearbox is shown in figure 9. The generic gearbox consists mainly of four gears, three shafts (input shaft IS, intermediate shaft ID and output shaft OS) and six bearings (three on the input side, IS, and three on the output side, OS). Eight health conditions are considered, as shown in table 4. Four rotating speeds are studied, namely 35 Hz (R1), 40 Hz (R2), 45 Hz (R3) and 50 Hz (R4). The vibration signals are acquired at the output shaft end by an accelerometer with a sampling frequency of 66.67 kHz.

Figure 9. The inside detail of the gearbox.

Table 4. Pattern label description of the generic gearbox.

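| Label | Gear 32T | Gear 96T | Gear 48T | Gear 80T | Bearing IS:IS | Bearing ID:IS | Bearing OS:IS | Bearing IS:OS | Bearing ID:OS | Bearing OS:OS | Shaft Input | Shaft Output |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | G | G | G | G | G | G | G | G | G | G | G | G |
| 1 | C | G | E | G | G | G | G | G | G | G | G | G |
| 2 | G | G | E | G | G | G | G | G | G | G | G | G |
| 3 | G | G | E | Br | B | G | G | G | G | G | G | G |
| 4 | C | G | E | Br | In | B | O | G | G | G | G | G |
| 5 | G | G | G | Br | In | B | O | G | G | G | Im | G |
| 6 | G | G | G | G | In | G | G | G | G | G | G | Ks |
| 7 | G | G | G | G | G | B | O | G | G | G | Im | G |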

G: good; C: chipped; E: eccentric; Br: broken; B: ball; In: inner race; O: outer race; Im: imbalance; Ks: keyway sheared.

In this experiment, four operating conditions (R1, R2, R3 and R4) are studied. Case 1 validated the proposed method with two source domains, so this case studies a scenario with three source domains. Each operating condition serves in turn as the target domain, with the remaining three as source domains. This yields four domain adaptation tasks, whose details are given in table 5.
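The four tasks in table 5 follow a leave-one-out pattern over the operating conditions. The sketch below enumerates them together with the corresponding sample counts. The constant and function names are illustrative, and the counts simply restate table 5 (100 samples per health condition and domain).

```python
# Operating conditions of the generic gearbox dataset: name -> rotating speed (Hz).
CONDITIONS = {"R1": 35, "R2": 40, "R3": 45, "R4": 50}
NUM_CLASSES = 8          # health conditions, labels 0-7 (table 4)
SAMPLES_PER_CLASS = 100  # samples per class and domain (table 5)


def leave_one_out_tasks(conditions):
    """Return (sources, target) pairs in which each condition is the target
    once and all remaining conditions act as source domains."""
    names = sorted(conditions)
    return [([n for n in names if n != target], target) for target in reversed(names)]


def sample_counts(num_sources):
    """(source training, target training, target testing) sample counts."""
    per_domain = NUM_CLASSES * SAMPLES_PER_CLASS
    return num_sources * per_domain, per_domain, per_domain


if __name__ == "__main__":
    for sources, target in leave_one_out_tasks(CONDITIONS):
        hz = ", ".join(str(CONDITIONS[s]) for s in sources)
        src, tgt_train, tgt_test = sample_counts(len(sources))
        print(f"{', '.join(sources)}→{target}: sources {hz} Hz, target "
              f"{CONDITIONS[target]} Hz, samples {src}/{tgt_train}/{tgt_test}")
```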

Table 5. The details of the four tasks on the generic gearbox dataset.

                The operating condition                 Number of samples
Task            Source domain (Hz)  Target domain (Hz)  Source training  Target training  Target testing  Fault categories
R1, R2, R3→R4   35, 40, 45          50                  3 × 8 × 100      8 × 100          8 × 100         Eight conditions (labels 0–7)
R1, R2, R4→R3   35, 40, 50          45                  3 × 8 × 100      8 × 100          8 × 100         Eight conditions (labels 0–7)
R1, R3, R4→R2   35, 45, 50          40                  3 × 8 × 100      8 × 100          8 × 100         Eight conditions (labels 0–7)
R2, R3, R4→R1   40, 45, 50          35                  3 × 8 × 100      8 × 100          8 × 100         Eight conditions (labels 0–7)

3.4.2. Experimental results.

The diagnostic performance on the generic gearbox dataset is analyzed in the three-source-domain scenario. Table 6 summarizes the diagnostic results of all methods on the four domain adaptation tasks. As in the bearing test-rig case, ADACL outperforms all comparison methods on every task. It achieves an average accuracy of 97.1%, exceeding the strongest comparison method, DANcom (94.2%), by 2.9 percentage points. This demonstrates that the proposed method can effectively achieve cross-domain fault diagnosis with three source domains.

Table 6. Fault classification accuracy on the generic gearbox dataset.

                 Fault classification accuracy (%)
Method           R1, R2, R3→R4  R1, R2, R4→R3  R1, R3, R4→R2  R2, R3, R4→R1  Average
WDAhigh          76.3           78.6           80.2           79.3           78.6
WDAcom           85.7           87.8           88.4           87.3           87.3
MMDhigh          88.6           90.3           91.5           90.1           90.1
DANhigh          91.6           91.1           90.7           90.5           91.0
DANcom           93.8           95.6           94.9           92.3           94.2
Proposed method  96.3           97.9           97.3           96.8           97.1
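The Average column of table 6 can be checked against the per-task accuracies. The short sketch below recomputes each mean with half-up rounding to one decimal place; the rounding rule is an assumption. It also computes the margin of the proposed method over the strongest comparison method. The dictionary names are illustrative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Per-task accuracies (%) transcribed from table 6, in the column order
# R1, R2, R3→R4 | R1, R2, R4→R3 | R1, R3, R4→R2 | R2, R3, R4→R1.
TABLE6 = {
    "WDAhigh": ["76.3", "78.6", "80.2", "79.3"],
    "WDAcom": ["85.7", "87.8", "88.4", "87.3"],
    "MMDhigh": ["88.6", "90.3", "91.5", "90.1"],
    "DANhigh": ["91.6", "91.1", "90.7", "90.5"],
    "DANcom": ["93.8", "95.6", "94.9", "92.3"],
    "Proposed method": ["96.3", "97.9", "97.3", "96.8"],
}
# "Average" column as printed in table 6.
REPORTED_AVERAGE = {
    "WDAhigh": "78.6", "WDAcom": "87.3", "MMDhigh": "90.1",
    "DANhigh": "91.0", "DANcom": "94.2", "Proposed method": "97.1",
}


def average(values):
    """Arithmetic mean of decimal strings, rounded half-up to 0.1."""
    total = sum(Decimal(v) for v in values)
    return (total / len(values)).quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


if __name__ == "__main__":
    for method, accs in TABLE6.items():
        print(f"{method}: recomputed {average(accs)}, reported {REPORTED_AVERAGE[method]}")
    gap = average(TABLE6["Proposed method"]) - average(TABLE6["DANcom"])
    print(f"Margin over DANcom: {gap} percentage points")
```

All six recomputed averages agree with the printed column, and the margin over DANcom comes out at 2.9 percentage points. `Decimal` is used so that half-way values such as 94.15 round as in the table instead of being skewed by binary floating-point representation.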

Similarly, to visually demonstrate the superiority of ADACL, the features learned during training on the task R1, R2, R4→R3 are visualized in figure 10. Samples of different health conditions are well separated, and the features of the four domains are well aligned. Figures 11 and 12 give, respectively, the visualized features and the confusion matrices of all methods on the test data of the same task. The features learned by the comparison methods overlap in some regions across different health conditions. In contrast, the features of all categories learned by the proposed method are well separated, which means that ADACL can correctly distinguish between the different health conditions.
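A confusion matrix such as those in figure 12 counts, for each true health condition, how the target test samples are distributed over the predicted conditions. Its off-diagonal entries therefore show which conditions a method confuses. The minimal pure-Python sketch below assumes integer labels (0–7 for the eight conditions of table 4) and the convention that rows are true conditions and columns are predicted ones. The function names and the toy labels are hypothetical.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix: rows index the true health condition, columns the predicted one."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_recall(matrix):
    """Fraction of each true condition that is predicted correctly (diagonal / row sum)."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]


def accuracy(matrix):
    """Overall accuracy: trace divided by the total number of samples."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / sum(sum(row) for row in matrix)


if __name__ == "__main__":
    # Hypothetical predictions for six target test samples with three health conditions.
    y_true = [0, 0, 1, 1, 2, 2]
    y_pred = [0, 1, 1, 1, 2, 0]
    cm = confusion_matrix(y_true, y_pred, num_classes=3)
    for row in cm:
        print(row)
    print(per_class_recall(cm), accuracy(cm))
```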

Figure 10. The visualized features of the training process on the generic gearbox dataset.

Figure 11. The visualized features of the testing process for all methods on the generic gearbox dataset.

Figure 12. The confusion matrices in task R1, R2, R4→R3 on the generic gearbox dataset.

Figure 13 shows the diagnostic results of the proposed method and the comparison methods under different SNRs on the task R1, R2, R4→R3. Environmental noise affects the diagnostic performance of the proposed method less than that of the comparison methods, indicating that it also resists noise well with three source domains. In summary, case 2 further shows that the proposed method achieves accurate cross-domain fault diagnosis with three source domains.

Figure 13. The diagnostic results of all methods under different SNR on the generic gearbox dataset.

4. Conclusions

In this study, a novel ADACL method is proposed for cross-domain intelligent fault diagnosis with multiple source domains under diverse operating conditions. Three loss terms are introduced into training simultaneously: a cross-entropy loss, a domain distribution alignment loss and a domain classifier alignment loss. Together they align the distributions of all domains and the target-domain outputs of all classifiers. As a result, the proposed method can accurately identify target samples near the class boundaries that are easily misclassified, thereby improving the diagnostic performance.
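One common way to quantify disagreement among the classifiers' target-domain outputs is the mean absolute difference between the softmax outputs of every pair of classifiers. Minimizing this value pushes the classifiers toward consistent predictions on target samples. The sketch below uses this L1 form only as an assumed illustration; it is not the paper's classifier alignment loss, which is defined earlier in the paper. The function names and logits are hypothetical.

```python
import math
from itertools import combinations


def softmax(logits):
    """Numerically stable softmax of a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classifier_discrepancy(logits_per_classifier):
    """Mean absolute difference between the softmax outputs of every pair of
    classifiers for one target sample (assumed L1 form; needs >= 2 classifiers)."""
    probs = [softmax(logits) for logits in logits_per_classifier]
    pairs = list(combinations(probs, 2))
    total = 0.0
    for p, q in pairs:
        total += sum(abs(a - b) for a, b in zip(p, q)) / len(p)
    return total / len(pairs)


if __name__ == "__main__":
    # Hypothetical logits from three source-specific classifiers for one target sample.
    agree = [[4.0, 1.0, 0.5], [3.8, 1.1, 0.4], [4.2, 0.9, 0.6]]
    disagree = [[4.0, 1.0, 0.5], [0.5, 4.0, 1.0], [1.0, 0.5, 4.0]]
    print(classifier_discrepancy(agree), classifier_discrepancy(disagree))
```

The discrepancy is zero when all classifiers output the same distribution and grows as their predictions diverge. A target sample that lies near a class boundary, where the classifiers tend to disagree, therefore contributes most to such a term.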

Experiments in two scenarios, with two and three source domains respectively, verify the effectiveness of ADACL. The results show that adding multiple source domains contributes to better learning of domain-invariant features under different operating conditions. As a result, the proposed method achieves higher cross-domain diagnostic accuracy than all comparison methods. In particular, the proposed method is more robust to noise than the other methods. Consequently, ADACL provides a promising tool for addressing cross-domain fault diagnosis problems in real industrial scenarios.

Acknowledgments

This research was supported by the Fundamental Research Funds for the Central Universities (N180304018) and the National Key Research and Development Program of China (2017YFB1103700).
