Adversarial domain adaptation with classifier alignment for cross-domain intelligent fault diagnosis of multiple source domains

Yongchao Zhang; Zhaohui Ren; Shihua Zhou; Tianzhuang Yu

doi:10.1088/1361-6501/abcad4

1. Introduction

Rolling bearings or gears play a key and irreplaceable role in rotating machinery, which usually operates in poor operating environments, changing operating conditions or shock loads. The incipient faults may cause system downtime, equipment damage or casualties [1–3]. Thus, the fault diagnosis of rolling bearings or gears under variable operating conditions is becoming increasingly important [4–7].

With the advent of the Industry 4.0 era, it is possible to collect and store a mass of real-time data from the various sensors of a device. As a result, data-driven artificial intelligence methods such as the stacked autoencoder network [8], deep belief network [9, 10], convolutional neural network (CNN) [11, 12], recurrent neural network [13] and long short-term memory network [14] are widely used in the fault diagnosis field. However, these data-driven methods all assume that the training data and testing data come from the same distribution. Deep neural networks have powerful feature extraction capabilities and can achieve high diagnostic accuracy under the same data distribution [15, 16]. Unfortunately, since the operating conditions and noise environments of the same machine may change, it is difficult to maintain the consistency of data distribution in a real industrial scenario, and it is practically impossible to collect labeled data for all operating conditions [17]. Hence, these methods cannot establish an effective fault diagnosis model because training data under the same distribution are lacking. As a result, a reasonable diagnosis model should not only achieve accurate fault diagnosis under the same distribution, but also have strong generalization ability on similar tasks without labeled data.

To address the domain shift problem caused by a change in the operating conditions, some scholars proposed transfer learning methods, which try to learn domain-invariant features from two domains so as to enhance the diagnostic precision of the target domain. Han et al [18] presented a transfer learning framework based on pre-trained CNN, in which a pre-trained CNN is used to diagnose a new task with proper fine-tuning. Li et al [19] used a fine-tuned pre-trained model to enhance the accuracy of cross-domain diagnosis, in which L2 regularization and particle swarm optimization are utilized to optimize a traditional DNN. A transfer learning model based on fine-tuning can improve the diagnostic accuracy of the target domain by using a small number of labeled target domain samples. However, in an unsupervised setting without target domain labels, transfer learning based on fine-tuning is no longer applicable. Currently, some unsupervised transfer learning methods have been proposed, also known as domain adaptation methods. Lu et al [20] presented a deep neural network with maximum mean discrepancy (MMD), in which the MMD is minimized to reduce the distribution discrepancy of the source domain and target domain. Wen et al [21] proposed a domain adaptation method based on a three-layer sparse auto-encoder, in which the MMD term is also minimized to implement cross-domain fault diagnosis. Li et al [22] further enhanced the accuracy of cross-domain fault diagnosis by minimizing the multi-kernel MMD between the two domains. Zhu et al [23] also proposed a new multi-layer domain adaptation method to achieve bearing fault diagnosis under diverse operating conditions. Ma et al [24] presented an improved transfer learning algorithm based on weighted transfer component analysis to reduce the distribution discrepancy between two diverse domains.
Yang et al [25] proposed a feature-based transfer neural network for locomotive bearing fault diagnosis using data collected in laboratories. Jiao et al [26] proposed a new unsupervised diagnostic network, in which the classifier discrepancy is utilized to learn class-separable and domain-invariant features. Han et al [27] proposed an adversarial learning framework by adding an additional discriminative classifier, and the applicability and superiority of this framework are demonstrated on two fault datasets. Li et al [28] proposed an unsupervised cross-domain diagnosis method based on adversarial training for vibration signals at different positions. Guo et al [29] presented a deep convolutional transfer learning network using one-dimensional CNN to achieve fault diagnosis of unlabeled data, in which a domain classification error and an MMD loss are optimized simultaneously to learn domain-invariant features. Despite numerous studies on domain adaptation under diverse operating conditions, these works only consider the domain adaptation problem in the case of a single source domain.

Domain adaptation methods have achieved good cross-domain diagnosis results in a single source domain scenario when the gap between the source domain and the target domain is small [30, 31]. Generally, the domain adaptation task of fault diagnosis assumes that the discrepancy between the two domains is small. This assumption may not hold in actual fault diagnosis scenarios, because the operating conditions of the machine often vary widely. Diagnosis knowledge learned in a relatively stable operating condition is difficult to transfer to extreme operating conditions with large gaps [32]. Thus, it is difficult to achieve accurate diagnosis through a single source domain. In industrial scenarios, labeled data of multiple operating conditions can be obtained. The diagnosis knowledge from multiple operating conditions may solve the issue of fault diagnosis in extreme operating conditions. Recently, multi-source domain adaptation has been studied in a small number of research fields, such as image classification [33, 34] and text classification [35]. However, how to use multi-source domain data to improve the classification accuracy of the target domain has received far too little attention in the field of cross-domain fault diagnosis [36].

Inspired by the multi-source domain adaptation, an adversarial domain adaptation with classifier alignment (ADACL) method is adopted for unsupervised multi-source domain adaptation under diverse operating conditions. The main contributions of this study are as follows:

(a) A novel adversarial learning network architecture is proposed to conduct the multi-source domain adaptation task, which realizes information sharing among the multiple source domains and the target domain.
(b) A novel dual alignment mechanism is developed: domain distribution alignment based on adversarial learning across all source domains and the target domain, and classifier alignment among the predictions of all classifiers. This aligns the multiple feature spaces and improves the diagnostic accuracy for samples near the decision boundary.
(c) Two experimental scenarios with two and three source domains are used to evaluate the proposed method and three comparison approaches, and the experimental results demonstrate that the proposed method obtains state-of-the-art cross-domain diagnosis results.

The rest of this paper is organized as follows. Section 2 details the proposed method. Section 3 presents two experimental cases with two source domains and three source domains. Finally, conclusions are briefly summarized in section 4.

2. Proposed cross-domain diagnosis method

2.1. Problem formulation

A single-source domain adaptation problem under diverse operating conditions has been widely studied [37]. In this study, the multi-source unsupervised domain adaptation method is proposed for cross-domain fault diagnosis. The illustration of single-source and multi-source domain adaptations is shown in figure 1. It can be seen that the diagnosis accuracy of the target domain can be improved through the organic combination of multiple source domains.

**Figure 1.** Illustration for single-source and multi-source domain adaptations.

Let D^sj = $\{({x}_i^{sj}, {y}_i^{sj})\}_{i=1}^{n_{sj}}$, j = 1, 2, ..., N_s, denote the labeled source samples, where ${x}_i^{sj}$ indicates the sample features of N_input dimensions from source domain j, ${y}_i^{sj}$ represents the corresponding label, n_sj indicates the number of labeled samples of source domain j and N_s indicates the number of source domains. Let D^t = {D^t _train, D^t _test} denote the target domain samples, where D^t _train and D^t _test indicate the target domain training samples and testing samples, respectively. Among them, D^t _train = $\{{x}_i^{\text{train}}\}_{i=1}^{n_{\text{train}}}$ denotes the unlabeled target samples, where x_i ^train indicates the sample features and n_train indicates the number of unlabeled samples. Correspondingly, D^t _test = $\{({x}_i^{\text{test}}, {y}_i^{\text{test}})\}_{i=1}^{n_{\text{test}}}$ denotes the labeled target samples, where x_i ^test indicates the sample features, y_i ^test represents the corresponding label, and n_test indicates the number of labeled samples. The purpose of the study is to train a model that can accurately diagnose D^t _test using D^sj and D^t _train.

2.2. Proposed network

The architecture overview and basic network of the proposed method are shown in figure 2, which includes a feature extractor G, a domain discriminator D and classifiers C. G receives multiple source domain samples and target domain samples simultaneously and outputs the one-dimensional feature vectors F(x). The feature vectors of all domain samples are fed into D, which aims to align the feature distributions of all domains. Specifically, domain-invariant features are learned through adversarial learning between D and G, implemented by adding a simple gradient reversal layer (GRL). The feature vectors are also fed into the classifiers $\{C_j\}^{N_s}_{j=1}$. All classifiers have the same network structure and initialization parameters. Each classifier C_j receives F(x) from the jth source domain samples and the target domain samples simultaneously and outputs the corresponding predicted labels.

2.3. Optimization objective

(a) Source domain classification loss term

The multi-source domain fault diagnosis task is studied in this paper. Each set of source domain data x^sj corresponds to a classifier C_j . The parameters of G and C_j are updated using the corresponding source domain samples. The cross-entropy loss is adopted in this term [38]. The cross-entropy loss of all classifiers C_j on the labeled source samples $\{({x}_i^{sj}, {y}_i^{sj})\}_{i=1}^{n_{sj}}$ is formulated as:

$\begin{equation}\mathcal{L}_{_{cls}}^{} = \frac{1}{{{N_s}}}\sum\limits_{j = 1}^{{N_s}} {\mathcal{L}_{_{cls}}^j({C_j}(F({x^{sj}})),{y^{sj}})} \end{equation} \tag{ 1 }$

where N_s indicates the number of all the source domains, ${\mathcal{L}}^j_{\text{cls}}$ (·) indicates the cross-entropy loss of jth classifier, and ${\mathcal{L}}^j_{\text{cls}}$ (·) is defined as follows:

$\begin{align}\mathcal{L}_{cls}^j({C_j}(F({x^{sj}})),{y^{sj}}) & = - {\mathbb{E}_{({x^{sj}},{y^{sj}}) \in {D^{sj}}}}\bigg[\sum\limits_{c = 1}^{k} {\mathbb{I}_{\left[c = {y^{sj}}\right]}} \nonumber\\ & \quad \log {C_j}(F({x^{sj}}))_c \bigg].\end{align} \tag{ 2 }$

Here C_j (F(x^sj )) denotes the prediction probability output of x^sj by classifier C_j, C_j (F(x^sj ))_c is its component for health condition c, 𝕀 is the indicator function, and k indicates the number of health conditions.
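Equations (1) and (2) can be illustrated with a minimal sketch in plain Python rather than in the PyTorch implementation used in the paper. The function names and the probability values below are hypothetical and serve only to show how the per-classifier cross-entropy is averaged over the source domains.

```python
import math

def cross_entropy(probs, label):
    """Negative log-probability that a classifier assigns to the true class."""
    return -math.log(probs[label])

def source_classification_loss(per_domain_batches):
    """Equation (1): mean over source domains of each classifier's cross-entropy.

    per_domain_batches[j] is a list of (probs, label) pairs, where probs is the
    probability output of classifier C_j for one sample of source domain j.
    """
    domain_losses = []
    for batch in per_domain_batches:
        # Equation (2): empirical expectation over the samples of one domain.
        losses = [cross_entropy(p, y) for p, y in batch]
        domain_losses.append(sum(losses) / len(losses))
    return sum(domain_losses) / len(domain_losses)

# Hypothetical outputs for two source domains and three health conditions.
batches = [
    [([0.7, 0.2, 0.1], 0), ([0.1, 0.8, 0.1], 1)],  # classifier C_1
    [([0.5, 0.25, 0.25], 0)],                       # classifier C_2
]
loss_cls = source_classification_loss(batches)
```

Each source domain contributes with equal weight regardless of its batch size, which matches the 1/N_s factor in equation (1).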

The parameters of the feature extractor, the domain discriminator and the jth classifier are denoted θ_G, θ_D and θ_Cj , respectively, while the parameters of all classifiers are collectively denoted θ_C . The cross-entropy loss is minimized to seek the optimal θ_G and θ_C , which can be expressed as follows:

$\begin{equation}({\hat \theta _G},{\hat \theta _C}) = \mathop {\arg \min }\limits_{{\theta _G},{\theta _{C_1}},{\theta _{C_2}},\ldots,{\theta _{C_{N_s}}}} \mathcal{L}_{cls}.\end{equation} \tag{ 3 }$

(b) Domain distribution alignment loss term

To achieve the first domain distribution alignment stage, the domain discriminator D is introduced into the training process. There are N_s + 1 neurons in the final output layer of D, and each neuron represents a specific domain. The cross-entropy loss is also used in this term. Therefore, the domain classification loss can be defined as:

$\begin{equation}\mathcal{L}_{_d}^{} = - {\mathbb{E}_{(x,y) \in {D^{sj}},D_{^{train}}^t}}\left[\sum\limits_{k = 1}^{1 + {N_s}} {{\mathbb{I}_{[y = k]}}\log (D(F(x)))} \right]\end{equation} \tag{ 4 }$

where D(F(x)) denotes the prediction probability output of x by D.

In the training process, the distribution shift in the shared feature space F(x) is reduced through an adversarial learning process. The purpose of D is to accurately distinguish the samples of all domains, whereas the purpose of G is to confuse D. To be specific, the parameters of D are updated to minimize $\mathcal{L}_d$ and the parameters of G are updated to maximize $\mathcal{L}_d$ simultaneously using all the domain samples, which reduces the domain distribution discrepancy [39]. Thus, the optimal parameters can be obtained by training this function:

$\begin{equation}\begin{gathered} ({{\hat \theta }_D}) = \mathop {\arg \min }\limits_{{\theta _D}} \mathcal{L}_d^{} \hfill \\ ({{\hat \theta }_G}) = \mathop {\arg \max }\limits_{{\theta _G}} \mathcal{L}_d^{} \hfill \\ \end{gathered} .\end{equation} \tag{ 5 }$

In this scenario, the parameters in equation (5) cannot be optimized directly, because $\mathcal{L}_d$ is maximized by G and minimized by D simultaneously. To solve the problem, the GRL is introduced as shown in figure 2. The GRL acts on the back-propagation process of the network: the sign of the gradient is flipped as it passes through the GRL [40]. This resolves the issue that maximization and minimization cannot be performed by the same gradient update. In ADACL, the GRL is inserted between G and D. To be specific, the GRL can be written as a function R(x):

$\begin{equation}\begin{gathered} R(x) = x \hfill \\ \frac{{{\text{d}}R(x)}}{{{\text{d}}x}} = - \lambda {\text{I}} \hfill \\ \end{gathered} \end{equation} \tag{ 6 }$

where λ indicates the penalty coefficient and λ = 1 in the proposed method, and I denotes an identity matrix. In this way, equation (5) can be formulated as:

$\begin{equation}({\hat \theta _G},{\hat \theta _D}) = \mathop {\arg \min }\limits_{{\theta _G},{\theta _D}} \mathcal{L}_d^{}.\end{equation} \tag{ 7 }$
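The effect of the GRL in equations (6) and (7) can be checked by hand on a toy scalar model. The sketch below is an illustration under simplifying assumptions, not the paper's network: the one-weight "feature extractor" and "discriminator", the squared-error loss standing in for the cross-entropy of equation (4), and all variable names are hypothetical.

```python
def grl_forward(x):
    """Forward pass of R(x) in equation (6): the identity."""
    return x

def grl_backward(grad_output, lam=1.0):
    """Backward pass of R(x): multiply the incoming gradient by -lam."""
    return -lam * grad_output

# Toy model (assumption): feature f = w_g * x, domain score d = w_d * R(f),
# and a squared-error stand-in for the domain loss, L_d = (d - y)^2.
x, y = 2.0, 1.0
w_g, w_d = 0.5, 0.3

f = w_g * x
d = w_d * grl_forward(f)
loss_d = (d - y) ** 2

# Back-propagation by hand.
grad_d = 2.0 * (d - y)                # dL_d / dd
grad_w_d = grad_d * f                 # discriminator weight: true gradient
grad_f = grl_backward(grad_d * w_d)   # sign flipped before reaching G
grad_w_g = grad_f * x                 # extractor weight: reversed gradient

# One ordinary gradient descent step on both parameters.
lr = 0.01
w_d_new = w_d - lr * grad_w_d
w_g_new = w_g - lr * grad_w_g
```

A single descent step therefore lowers the loss with respect to the discriminator weight and raises it with respect to the extractor weight, which is why the two-sided problem of equation (5) can be written as the single minimization of equation (7).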

(c) Domain classifier alignment loss term

The domain distribution alignment can only achieve the confusion of the multi-domain features. Therefore, the target domain samples near the decision boundary are easily misclassified by the classifier. Intuitively, for these target samples close to the decision boundary, different classifiers are likely to make different predictions. In this case, the number of misclassified samples close to the decision boundary is reduced by reducing the predicted disagreement of classifiers.

To be specific, all classifiers are used to construct a discriminator, and the L1-norm of the differences between the classifier predictions on the target training samples x^train is used as the discrepancy loss:

$\begin{equation}\mathcal{L}_{_{dis}}^{} = {\mathbb{E}_{x \in D_{^{train}}^t}}\frac{1}{{{N_s}}}\sum\limits_{j = 1}^{{N_s} - 1} {\sum\limits_{i = j + 1}^{{N_s}} {{{\left\| {{C_j}(F({x^{train}})) - {C_i}(F({x^{train}}))} \right\|}_1}} } \end{equation} \tag{ 8 }$

where C_j (F(x^train)) and C_i (F(x^train)) denote the prediction probability output of x^train by classifier C_j and C_i . In the same way, the optimal parameters can be obtained by training this function:

$\begin{equation}({\hat \theta _G},{\hat \theta _C}) = \mathop {\arg \min }\limits_{{\theta _G},{\theta _{C_1}},{\theta _{C_2}},\ldots,{\theta _{C_{N_s}}}} \mathcal{L}_{dis}.\end{equation} \tag{ 9 }$
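As an illustration of equation (8), the following plain-Python sketch computes the discrepancy loss for a small batch. The function names and the probability values are hypothetical, and the expectation is approximated by the batch mean.

```python
def l1_distance(p, q):
    """L1-norm of the difference between two probability vectors."""
    return sum(abs(a - b) for a, b in zip(p, q))

def classifier_discrepancy(predictions):
    """Equation (8) evaluated on one mini-batch.

    predictions[j][n] is the probability vector produced by classifier C_j
    for the n-th target training sample.
    """
    n_s = len(predictions)
    n_samples = len(predictions[0])
    total = 0.0
    for n in range(n_samples):
        pair_sum = 0.0
        # Sum over all unordered classifier pairs (j, i) with j < i.
        for j in range(n_s - 1):
            for i in range(j + 1, n_s):
                pair_sum += l1_distance(predictions[j][n], predictions[i][n])
        total += pair_sum / n_s          # the 1/N_s factor of equation (8)
    return total / n_samples             # empirical expectation over the batch

# Two classifiers, two target samples: they disagree on the first sample only.
preds = [
    [[0.9, 0.1], [0.5, 0.5]],   # classifier C_1
    [[0.6, 0.4], [0.5, 0.5]],   # classifier C_2
]
loss_dis = classifier_discrepancy(preds)
```

The loss is zero only when every pair of classifiers returns the same probabilities for every target sample, so minimizing it pushes the predictions on boundary samples towards agreement.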

(d) Overall formulation of the proposed method

To sum up, the proposed method includes three loss terms. All the terms are integrated to obtain the following overall optimization objective:

$\begin{equation}\mathcal{L} = \mathcal{L}_{_{cls}}^{} + \alpha \mathcal{L}_{_d}^{} + \beta \mathcal{L}_{_{dis}}^{}\end{equation} \tag{ 10 }$

where $\mathcal{L}$ denotes the total loss, and α and β are two trade-off parameters.

Specifically, G is optimized to minimize $\mathcal{L}_{cls}$, $\mathcal{L}_d$ and $\mathcal{L}_{dis}$. D is optimized to minimize $\mathcal{L}_d$. C is optimized to minimize $\mathcal{L}_{cls}$ and $\mathcal{L}_{dis}$. Thus, the overall optimization problem is as follows:

$\begin{equation}\begin{gathered} {\hat \theta _G} = \arg \{ \mathop {\min }\limits_{\theta _G} \mathcal{L}_{cls},\mathop {\min }\limits_{\theta _G} \mathcal{L}_d,\mathop {\min }\limits_{\theta _G} \mathcal{L}_{dis}\} \hfill \\[3pt] {\hat \theta _D} = \mathop {\arg \min }\limits_{\theta _D} \mathcal{L}_d \hfill \\[3pt] {\hat \theta _C} = \arg \{ \mathop {\min }\limits_{{\theta _{C_1}},{\theta _{C_2}},\ldots,{\theta _{C_{N_s}}}} \mathcal{L}_{cls},\mathop {\min }\limits_{{\theta _{C_1}},{\theta _{C_2}},\ldots,{\theta _{C_{N_s}}}} \mathcal{L}_{dis}\} \hfill \\ \end{gathered}. \end{equation} \tag{ 11 }$

2.4. Training procedure

The algorithm of the ADACL is summarized in Algorithm 1. During the training process, equal numbers of source samples and target samples are fed to the network to calculate the three losses simultaneously, and the total loss is minimized by stochastic gradient descent (SGD) until the maximum number of epochs is reached [41]. Finally, to test the network performance on the target domain testing samples, the average of all classifier outputs is taken as the final output.

Algorithm 1 Training procedure of ADACL


Input: Labeled source domain samples D^sj = $\{({x}_i^{sj}, {y}_i^{sj})\}_{i=1}^{n_{sj}}$, unlabeled target domain samples D^t _train = $\{{x}_i^{\text{train}}\}_{i=1}^{n_{\text{train}}}$, and the number of training epochs E.
Output: Trained ADACL model (θ_G, θ_D and θ_C)
1: Initialize θ_G, θ_D and θ_C
2: for e = 1 to E do
3:	Uniformly sample (x^sj, y^sj) from each D^sj
4:	Uniformly sample (x^train) from D^t _train
5:	Update α and β from 0 towards 1 using equation (12)
6:	Feed the sampled source and target data to the G, D and C
7:	Calculate $\mathcal{L}_{cls}$ using equation (1)
8:	Calculate $\mathcal{L}_{d}$ using equation (4)
9:	Calculate $\mathcal{L}_{dis}$ using equation (8)
10:	Optimize multiple subnetworks G, C, and D in turn:
11:	${\hat \theta _G} = \arg \{ \mathop {\min }\limits_{\theta _G} \mathcal{L}_{cls},\mathop {\min }\limits_{\theta _G} \mathcal{L}_d,\mathop {\min }\limits_{\theta _G} \mathcal{L}_{dis}\}$
12:	${\hat \theta _D} = \mathop {\arg \min }\limits_{\theta _D} \mathcal{L}_d$
13:	${\hat \theta _C} = \arg \{ \mathop {\min }\limits_{{\theta _{C_1}},{\theta _{C_2}},\ldots,{\theta _{C_{N_s}}}} \mathcal{L}_{cls},\mathop {\min }\limits_{{\theta _{C_1}},{\theta _{C_2}},\ldots,{\theta _{C_{N_s}}}} \mathcal{L}_{dis}\}$
14: end for
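The test-time rule stated in section 2.4, in which the average of all classifier outputs is taken as the final output, can be sketched as follows. The function names and example probabilities are hypothetical; taking the arg-max of the averaged probabilities as the predicted label is an assumption, since the text only specifies the averaging.

```python
def average_predictions(classifier_outputs):
    """Average the probability vectors of all classifiers for one sample."""
    n_classifiers = len(classifier_outputs)
    n_classes = len(classifier_outputs[0])
    return [
        sum(out[c] for out in classifier_outputs) / n_classifiers
        for c in range(n_classes)
    ]

def predict_label(classifier_outputs):
    """Predicted health condition: index of the largest averaged probability."""
    mean_probs = average_predictions(classifier_outputs)
    return max(range(len(mean_probs)), key=lambda c: mean_probs[c])

# Two classifiers that individually prefer different health conditions.
outputs = [[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]
mean_probs = average_predictions(outputs)
label = predict_label(outputs)
```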

3. Experimental verification

3.1. Implementation details

In the network architecture, some effective operations [42, 43], namely convolution (Conv), batch normalization, rectified linear unit and maximum pooling (MP), are introduced to improve the ability of feature extraction. The architecture and parameters of the ADACL are displayed in table 1, where k indicates the number of health conditions and N_s indicates the number of the source domains. As the vibration signal is one-dimensional data, a one-dimensional CNN is adopted. The raw signal is input into G directly without manually extracting features, which avoids heavy manual signal processing and improves the applicability of the method in industry. The length of each sample is 2048.

Table 1. The architecture and parameters of the ADACL.

Network module	Network layers	Kernel size	Stride size	Padding size	Input/output size
G	Conv	9	1	1	2048 × 1/2042 × 4
	MP	2	2	0	2042 × 4/1021 × 4
	Conv	9	1	1	1021 × 4/1015 × 8
	MP	2	2	0	1015 × 8/507 × 8
	Conv	9	1	1	507 × 8/501 × 16
	MP	2	2	0	501 × 16/250 × 16
	Conv	9	1	1	250 × 16/244 × 32
	MP	2	2	0	244 × 32/122 × 32
	Conv	9	1	1	122 × 32/116 × 64
	MP	2	2	0	116 × 64/58 × 64
	Conv	9	1	1	58 × 64/52 × 128
	MP	2	2	0	52 × 128/26 × 128
	Flat	∼			26 × 128/3328
	Linear	∼			3328/256
	Linear	∼			256/128
D	Linear	∼			128/10
D	Linear	∼			10/1 + N_s
C	Linear	∼			128/k
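The input/output sizes listed in table 1 follow from the standard output-length formula for one-dimensional convolution and pooling. The short sketch below recomputes them in plain Python; the function names are hypothetical, and the channel widths are read from the table.

```python
def conv_out_len(length, kernel=9, stride=1, padding=1):
    """Output length of a 1D convolution with the settings of table 1."""
    return (length + 2 * padding - kernel) // stride + 1

def pool_out_len(length, kernel=2, stride=2):
    """Output length of a 1D max pooling layer without padding."""
    return (length - kernel) // stride + 1

def feature_extractor_shapes(input_len=2048, channels=(4, 8, 16, 32, 64, 128)):
    """Return (layer, length, channels) per layer and the flattened size."""
    shapes = []
    length = input_len
    for ch in channels:
        length = conv_out_len(length)
        shapes.append(("Conv", length, ch))
        length = pool_out_len(length)
        shapes.append(("MP", length, ch))
    return shapes, length * channels[-1]

shapes, flat_size = feature_extractor_shapes()
for layer, length, ch in shapes:
    print(f"{layer}: {length} x {ch}")
print("Flat:", flat_size)
```

The flattened size is 26 × 128 = 3328, which is the input width of the first linear layer of G in table 1.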

In network training, the objective is optimized via SGD with a momentum of 0.9. The number of epochs and the batch size are set at 500 and 50, respectively. The initial learning rate is 0.1, and its value decreases by 10% after every 100 epochs. The proposed network is implemented in the PyTorch framework. In calculating the total loss, α and β are gradually changed from 0 to 1 to suppress noisy signals in the initial stage of the training process, and they can be calculated as [40]:

$\begin{equation}{\alpha _i} = {\beta _i} = \frac{2}{{1 + \exp ( - \gamma (\frac{i}{{\text{E}}}))}} - 1\end{equation} \tag{ 12 }$

where i indicates the ith epoch in the training, E indicates the total number of epochs, and γ is 10 throughout the experiments.
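The schedule of equation (12) can be evaluated directly. The sketch below uses a hypothetical function name and the settings stated above (E = 500, γ = 10).

```python
import math

def tradeoff(epoch, total_epochs, gamma=10.0):
    """Equation (12): alpha_i = beta_i = 2 / (1 + exp(-gamma * i / E)) - 1."""
    return 2.0 / (1.0 + math.exp(-gamma * epoch / total_epochs)) - 1.0

E = 500
schedule = [tradeoff(i, E) for i in (0, 50, 250, 500)]
```

The value starts at exactly 0 and rises monotonically towards 1, so the adversarial and classifier alignment terms carry little weight early in training. The expression is algebraically equal to tanh(γi/(2E)).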

3.2. Comparison approaches

To assess the effectiveness of the ADACL comprehensively, several common methods are implemented as comparisons. The network structure and parameters of these comparison approaches are similar to the proposed method, and they are implemented on the same dataset.

(a) Without domain adaptation (WDA).

First, a simple method WDA is carried out, where the model is trained using only labeled source domain samples. In this approach, the network structure includes a feature extractor and a classifier, and cross-entropy loss is adopted as the optimization objective. Since the multi-source domain scenario is studied in this paper, WDA can be divided into two schemes: 1. WDA_high: with one source domain as the training data, WDA_high represents the best single source domain diagnosis result among the multiple source domains. 2. WDA_com: WDA_com represents the diagnosis result obtained when all source domains are combined as the training data.

(b) MMD-based domain adaptation method.

In this approach, the network structure includes a feature extractor, an MMD adaptation layer and a classifier, and the MMD adaptation layer is added after the feature extractor. The MMD is used to minimize the distribution discrepancy of the two domains. Hence, a cross-entropy loss and an MMD loss are adopted as the optimization objective in this comparison approach. In the same way, because of the existence of multiple source domains, MMD_high is defined as the best single source domain diagnosis result among the multiple source domains.

(c) Domain adversarial network (DAN)-based domain adaptation method.

Considering that the proposed method is an extension of the adversarial method, the original domain adversarial method is used as a comparison approach. In this approach, the network structure is consistent with the ADACL. The cross-entropy loss and domain adversarial loss are adopted as optimization objectives. Similarly, DAN_high is defined as the best single source domain diagnosis result among the multiple source domains. DAN_com is defined as the diagnosis result obtained when all source domains are combined as the training data, which also represents the proposed method without the domain classifier alignment loss.

3.3. Case 1: analysis of the bearing test-rig dataset

3.3.1. Experimental setup.

In this study, a practical bearing test-rig is constructed first to assess the effectiveness of the ADACL. In this experiment, the accelerometers used to collect vibration signals are installed on the bearing housing, and the vertical vibration signals are used for fault analysis. This experiment considers five health conditions, i.e. normal (N), outer-race fault (OF), inner-race fault (IF), ball fault (BF) and compound fault (OF_BF). The bearing test-rig and three types of fault are shown in figure 3. The experiment is implemented under three rotating speed conditions, namely 600 rpm (R₁), 1200 rpm (R₂) and 1800 rpm (R₃). The sampling time is set at 220 s, and the sampling frequency is 20 kHz.

**Figure 3.** The bearing test-rig and three types of fault.

The waveforms of all health conditions under the three operating conditions are shown in figure 4. In the normal condition, no obvious impacts appear in the vibration waveforms under any operating condition. However, in the four fault conditions, the transient fault characteristics differ between operating conditions. The generalization ability of the deep learning model is seriously weakened by the domain shift caused by this feature distribution discrepancy. In existing studies, the domain adaptation technique is utilized to reduce the domain distribution discrepancy.

In this experiment, three operating conditions (R₁, R₂ and R₃) are studied. According to the multi-source domain adaptation scenario in this study, three domain adaptation tasks are designed. That is, the model is trained with two source domains and a target domain. The details of the three domain adaptation tasks are shown in table 2.

Table 2. The details of three tasks on the bearing test-rig dataset.

Task	Source domains (rpm)	Target domain (rpm)	Source training samples	Target training samples	Target testing samples	Fault categories
R₁, R₂→R₃	600, 1200	1800	2 × 5 × 400	5 × 400	5 × 200	Five conditions (labels 0–4)
R₁, R₃→R₂	600, 1800	1200	2 × 5 × 400	5 × 400	5 × 200	Five conditions (labels 0–4)
R₂, R₃→R₁	1200, 1800	600	2 × 5 × 400	5 × 400	5 × 200	Five conditions (labels 0–4)

3.3.2. Experimental results.

The diagnostic performance on the bearing test-rig dataset is analyzed. The fault diagnostic results of all methods in the three domain adaptation tasks are given in table 3. The average diagnostic accuracy of the proposed method is 90.7%, which is the highest among all diagnostic methods. This indicates that the ADACL can effectively achieve cross-domain fault diagnosis under diverse operating conditions, and that its results are better than those of the comparison methods.

Table 3. Fault classification accuracy on the bearing test-rig dataset.

	Fault classification accuracy (%)
Method	R₁, R₂→R₃	R₁, R₃→R₂	R₂, R₃→R₁	Average
WDA_high	65.6	65.8	32.5	57.6
WDA_com	81.3	79.2	56.2	72.2
MMD_high	90.2	89.1	62.8	80.7
DAN_high	89.1	89.3	69.3	82.9
DAN_com	90.4	91.9	71.5	84.6
Proposed	94.8	99.6	77.7	90.7

To be specific, some conclusions can be drawn. WDA_com, which combines the two source domains into a single training set, generally gives better diagnosis results than the single-source WDA_high. This may be because the addition of discrepant samples improves the generalization ability of the network, thereby improving the diagnostic accuracy. As can be seen from the results of MMD_high and DAN_high, in most domain adaptation tasks, encouraging results can be obtained by learning domain-invariant representations of the source domain and target domain. For DAN_com, compared with single-source adversarial learning, the diagnostic performance is not significantly improved merely by introducing multiple source domains into the DAN. However, adding domain classifier alignment on top of multi-source adversarial learning can narrow the gap among the classifiers and improve their classification performance. Hence, these results indicate that the ADACL can effectively achieve cross-domain fault diagnosis with two source domains, even though the operating conditions vary greatly among the domains.

To further show the advantage of the ADACL, the t-distributed stochastic neighbor embedding technique (t-SNE) [44] is used to implement two-dimensional feature visualization of output features. Taking the task R₁, R₃→R₂ as an example, the two-dimensional visualization results of the training process are shown in figure 5. It is clear that the features of two source domain samples and target domain training samples are well separated, and the feature distributions of the same health conditions are aligned. Figure 6 shows the visualization results of the testing process for all methods. As shown in figure 6, the feature map by WDA_high does not cluster well in health conditions N, IF and OF, because the three health conditions may present similar features under different operating conditions. The clustering conditions are improved in health conditions N, IF and OF by introducing the general domain adaptation method, but there is still a slight overlap between IF and OF. However, it is clear that the learned feature by ADACL achieves the best clustering and separability in all health conditions.

**Figure 5.** The visualized features of the training process on the bearing test-rig dataset.

**Figure 6.** The visualized features of the testing process for all methods on the bearing test-rig dataset.

To present detailed diagnostic results for each health condition by different methods in task R₁, R₃→R₂, the confusion matrices of the six methods on the bearing test-rig dataset are given in figure 7. The confusion matrices display the predicted labels against the ground-truth labels, together with the prediction accuracy and error rates for each health condition. It is clear from figure 7 that the ADACL can achieve accurate predictions of all health conditions on the bearing test-rig dataset.

**Figure 7.** The confusion matrices in task R₁, R₃→R₂ on the bearing test-rig dataset (0, 1, 2, 3 and 4 represent BF, IF, N, OF and OF_BF respectively).

Considering that there may be a variety of noises in the actual industrial scenario, the diagnostic performance is evaluated under a noisy environment in the task R₁, R₃→R₂ to verify the generalization ability of the proposed method. In this experiment, Gaussian noise with signal-to-noise ratios (SNR) of −4, 0, 4 and 8 dB is added to the source domain signal and the target domain signal, where the SNR is defined as [45]:

$\begin{equation}{\text{SNR}}({\text{dB}}) = 10{\log _{10}}({P_{signal}}/{P_{noise}})\end{equation} \tag{ 13 }$

where P_signal and P_noise indicate the powers of the original signal and the added noise, respectively.
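Equation (13) fixes the noise power once the signal power and the target SNR are known. The sketch below shows one way to generate such noise in plain Python; the function names and the sinusoidal test signal are hypothetical, and rescaling the drawn noise so that its empirical power matches the target exactly is a design choice of this sketch rather than a detail given in the paper.

```python
import math
import random

def power(values):
    """Mean squared amplitude of a sequence."""
    return sum(v * v for v in values) / len(values)

def add_noise(signal, target_snr_db, rng):
    """Add Gaussian noise scaled to the requested SNR of equation (13)."""
    noise = [rng.gauss(0.0, 1.0) for _ in signal]
    # Invert equation (13): P_noise = P_signal / 10^(SNR / 10).
    target_noise_power = power(signal) / (10.0 ** (target_snr_db / 10.0))
    scale = math.sqrt(target_noise_power / power(noise))
    noise = [scale * n for n in noise]
    return [s + n for s, n in zip(signal, noise)], noise

def measured_snr_db(signal, noise):
    """Equation (13) evaluated on a signal and the noise added to it."""
    return 10.0 * math.log10(power(signal) / power(noise))

rng = random.Random(0)
clean = [math.sin(2 * math.pi * 50 * t / 2048) for t in range(2048)]
noisy, noise = add_noise(clean, -4, rng)
```

At −4 dB the noise power is about 2.5 times the signal power, which is the harshest of the four settings used in the experiment.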

Figure 8 shows the diagnostic results of the proposed method and comparison methods under different SNR on the bearing test-rig dataset. It can be seen that the environmental noise significantly reduces the diagnostic performance of the model in methods WDA_high, MMD_high and DAN_com. However, in the proposed method and the WDA_com method, environmental noise has little effect on diagnostic performance. Since the introduction of multi-source domains improves the generalization ability of the model, the performance of the model against noise is enhanced. Therefore, the proposed method can also obtain accurate prediction results in a noisy environment.

3.4. Case 2: analysis of the generic gearbox dataset

3.4.1. Experimental setup.

The generic gearbox dataset is analyzed as the second case. The generic gearbox dataset is from the 2009 challenge data of Prognostics and Health Management [46]. The inside details of the gearbox are shown in figure 9. The main elements of the generic gearbox consist of four gears, three axes (input axis IS, intermediate axis ID, output axis OS) and six bearings (three on input side: IS and three on output side: OS). This dataset considers eight health conditions, as shown in table 4. Four rotating speeds are considered in this study, namely 35 Hz (R₁), 40 Hz (R₂), 45 Hz (R₃) and 50 Hz (R₄). The vibration signals are obtained from the output shaft end by an accelerometer with a sampling frequency of 66.67 kHz.

**Figure 9.** The inside detail of the gearbox.

Table 4. Pattern label description of the generic gearbox.

	Gear				Bearing						Shaft
Label	32 T	96 T	48 T	80 T	IS:IS	ID:IS	OS:IS	IS:OS	ID:OS	OS:OS	Input	Output
0	G	G	G	G	G	G	G	G	G	G	G	G
1	C	G	E	G	G	G	G	G	G	G	G	G
2	G	G	E	G	G	G	G	G	G	G	G	G
3	G	G	E	Br	B	G	G	G	G	G	G	G
4	C	G	E	Br	In	B	O	G	G	G	G	G
5	G	G	G	Br	In	B	O	G	G	G	Im	G
6	G	G	G	G	In	G	G	G	G	G	G	Ks
7	G	G	G	G	G	B	O	G	G	G	Im	G

G: good; C: chipped; E: eccentric; Br: broken; B: ball; In: inner race; O: outer race; Im: imbalance; Ks: keyway sheared.

In this experiment, four operating conditions (R₁, R₂, R₃ and R₄) are studied. Since the first case validated the proposed method in a scenario with two source domains, this case studies a scenario with three source domains. With four domains, four domain adaptation tasks can be formed, each taking three domains as the source domains and the remaining one as the target domain; the details of the four tasks are shown in table 5.

Table 5. The details of the four tasks on the generic gearbox dataset.

	The operating condition		Number of samples
Task	Source domain (Hz)	Target domain (Hz)	Source training	Target training	Target testing	Fault categories
R₁, R₂, R₃→R₄	35, 40, 45	50	3 × 8 × 100	8 × 100	8 × 100	Eight conditions (labels 0–7)
R₁, R₂, R₄→R₃	35, 40, 50	45	3 × 8 × 100	8 × 100	8 × 100	Eight conditions (labels 0–7)
R₁, R₃, R₄→R₂	35, 45, 50	40	3 × 8 × 100	8 × 100	8 × 100	Eight conditions (labels 0–7)
R₂, R₃, R₄→R₁	40, 45, 50	35	3 × 8 × 100	8 × 100	8 × 100	Eight conditions (labels 0–7)
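The task construction in table 5 amounts to a leave-one-domain-out split over the four rotating speeds. The sketch below illustrates this under stated assumptions: the helper name `build_tasks` is hypothetical, and the sample counts read "8 × 100" in table 5 as eight health conditions with 100 samples each per domain.

```python
# Rotating speed (Hz) of each operating condition, as given in the setup.
speeds_hz = {"R1": 35, "R2": 40, "R3": 45, "R4": 50}

# Assumed reading of table 5: 8 health conditions, 100 samples per condition.
n_conditions, n_per_condition = 8, 100

def build_tasks(domains):
    """Leave-one-domain-out: every domain serves once as the target."""
    names = list(domains)
    tasks = []
    for target in reversed(names):          # same row order as table 5
        sources = [name for name in names if name != target]
        tasks.append((sources, target))
    return tasks

tasks = build_tasks(speeds_hz)
for sources, target in tasks:
    source_speeds = [speeds_hz[name] for name in sources]
    n_source = len(sources) * n_conditions * n_per_condition
    n_target = n_conditions * n_per_condition
    print(f"{', '.join(sources)} -> {target}: "
          f"source speeds {source_speeds} Hz, target {speeds_hz[target]} Hz, "
          f"{n_source} source samples, {n_target} target samples")
```

Each task therefore uses 2400 labeled source samples and 800 unlabeled target training samples under this reading.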

3.4.2. Experimental results.

The diagnostic performance on the generic gearbox dataset is analyzed in the scenario with three source domains. The statistics of the diagnostic results of all methods in the four domain adaptation tasks are given in table 6. Similar to the case study on the bearing test-rig dataset, the ADACL achieves the highest accuracy on every task, with an average of 97.1% against 94.2% for the best comparison method (DAN_com). This demonstrates that the proposed method can effectively achieve cross-domain fault diagnosis with three source domains.

Table 6. Fault classification accuracy on the generic gearbox dataset.

	Fault classification accuracy (%)
Method	R₁, R₂, R₃→R₄	R₁, R₂, R₄→R₃	R₁, R₃, R₄→R₂	R₂, R₃, R₄→R₁	Average
WDA_high	76.3	78.6	80.2	79.3	78.6
WDA_com	85.7	87.8	88.4	87.3	87.3
MMD_high	88.6	90.3	91.5	90.1	90.1
DAN_high	91.6	91.1	90.7	90.5	91.0
DAN_com	93.8	95.6	94.9	92.3	94.2
Proposed method	96.3	97.9	97.3	96.8	97.1
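The averages in table 6 and the margin of the proposed method over the comparison methods can be re-derived from the per-task accuracies. The snippet below is only a consistency check on the tabulated numbers; the variable names are illustrative.

```python
# Per-task accuracies (%) copied from table 6, in the column order
# R1,R2,R3->R4 | R1,R2,R4->R3 | R1,R3,R4->R2 | R2,R3,R4->R1.
accuracy = {
    "WDA_high": [76.3, 78.6, 80.2, 79.3],
    "WDA_com": [85.7, 87.8, 88.4, 87.3],
    "MMD_high": [88.6, 90.3, 91.5, 90.1],
    "DAN_high": [91.6, 91.1, 90.7, 90.5],
    "DAN_com": [93.8, 95.6, 94.9, 92.3],
    "Proposed method": [96.3, 97.9, 97.3, 96.8],
}

# Mean accuracy of each method over the four tasks.
averages = {name: sum(vals) / len(vals) for name, vals in accuracy.items()}

# Best comparison method by average accuracy, and the margin over it.
baselines = [name for name in accuracy if name != "Proposed method"]
best_baseline = max(baselines, key=averages.get)
margin = averages["Proposed method"] - averages[best_baseline]

# Check whether the proposed method wins on every individual task.
wins_every_task = all(
    accuracy["Proposed method"][i] > max(accuracy[b][i] for b in baselines)
    for i in range(4)
)

for name, avg in averages.items():
    print(f"{name}: {avg:.3f}")
print(f"best baseline: {best_baseline}, margin: {margin:.3f} points, "
      f"wins every task: {wins_every_task}")
```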

Similarly, to visually demonstrate the superiority of the ADACL, the visualization results of the training process are shown in figure 10, taking the task R₁, R₂, R₄→R₃ as an example. They show that the health conditions in the data of the four domains are well separated and that the features of the four domains are well aligned. The visualization results and confusion matrices of the testing process for all methods on the task R₁, R₂, R₄→R₃ are given in figures 11 and 12. For the comparison methods, the features of some health conditions clearly overlap. In contrast, the features of all categories obtained by the proposed method are well separated, which means that the ADACL can correctly distinguish between the health conditions.

**Figure 10.** The visualized features of the training process on the generic gearbox dataset.

**Figure 11.** The visualized features of the testing process for all methods on the generic gearbox dataset.

**Figure 12.** The confusion matrices in task R₁, R₂, R₄→R₃ on the generic gearbox dataset.

Figure 13 shows the diagnostic results of the proposed method and the comparison methods under different SNRs on the task R₁, R₂, R₄→R₃. It is clear that the diagnostic performance of the proposed method is less affected by environmental noise than that of the comparison methods. Therefore, the proposed method also resists noise well in the case of three source domains. To sum up, case 2 further illustrates that the proposed method can accurately achieve cross-domain fault diagnosis with three source domains.

4. Conclusions

In this study, a novel ADACL method is proposed for cross-domain intelligent fault diagnosis with multiple source domains under diverse operating conditions. In the proposed method, three loss terms, namely a cross-entropy loss, a domain distribution alignment loss and a domain classifier alignment loss, are introduced into the training process simultaneously to align the distributions of all domains and the target domain outputs of all the classifiers. The proposed method can accurately identify the target samples that are easily misclassified at the class boundary, thereby improving the diagnostic performance.

Experiments in two scenarios, with two and three source domains, are conducted to verify the effectiveness of the ADACL. It is observed that the addition of multiple source domains contributes to better learning of domain-invariant features under different operating conditions; thus, the proposed method achieves higher cross-domain diagnostic accuracy than the comparison methods. In particular, the proposed method is less affected by noise than the other methods. Consequently, the ADACL provides a promising tool for addressing cross-domain fault diagnosis problems in real industrial scenarios.

Acknowledgments

The research was supported by the Fundamental Research Funds for the Central Universities (N180304018) and also supported by the National Key Research and Development Program of China (2017YFB1103700).
