Domain adaptation with domain specific information and feature disentanglement for bearing fault diagnosis

Collecting bearing fault signals from several rotating machines or under varied operating conditions often results in data distribution shift. Furthermore, the newly obtained data is typically unlabelled. When intricate confounding aspects of the data distribution across several domains are present, achieving the desired outcomes through straightforward transfer learning techniques becomes challenging. This research presents a new framework, the domain-specific invariant adversarial network, which combines the principles of domain-invariant representation learning and feature disentanglement to solve this challenge. The framework uses domain-specific information as an auxiliary training tool and employs the data generation process to transfer labelled source domain data to the target domain. The aim of this approach is to uncover latent information components and improve the model's ability to recognise patterns. The study demonstrates the method's strong diagnostic capability through experimental analysis on four fault datasets.


Introduction
Rotating machinery is widely used in various industries, and its reliable operation is crucial for economic growth. However, the operation of rotating machinery can lead to a multitude of problems and failures in the bearings. Statistical data indicate that rolling bearings are responsible for approximately 40% of rotating machinery failures [1]. These figures emphasise the importance of developing accurate and effective procedures for diagnosing bearing faults. Such diagnostic approaches have the potential to greatly decrease maintenance costs and the economic losses resulting from failures. (Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.)
Intelligent fault diagnosis (IFD) methods can be classified into two categories: model-based methods [2] and data-driven methods [3]. Model-based methods require expertise and knowledge to extract features and identify problems using complex mathematical deductions [4], making them more time-consuming and challenging to apply compared to data-driven methods. Data-driven fault diagnosis methods are less dependent on pre-existing information. Machine learning algorithms are frequently utilised in fault diagnosis to automatically extract fault information and perform diagnostics. While conventional algorithms such as the k-nearest neighbour algorithm [5] and convolutional neural networks (CNNs) [6] can handle specific tasks, they primarily focus on surface-level characteristics of the data. As a result, they have limitations when it comes to processing intricate signals to achieve effective diagnostic outcomes. Defect detection has also been facilitated by deep learning techniques [7] and algorithms based on deep representation learning, such as deep neural networks [8] and deep belief networks [9]. Deep learning approaches enable the autonomous extraction of intricate features from unprocessed data, avoiding the laborious and time-intensive tasks involved in conventional human feature engineering. However, these techniques usually require a substantial amount of annotated data to achieve consistent and precise prediction outcomes, which can be a significant obstacle in real-world scenarios. The majority of the data collected during a machine's operation relates to its healthy state, with a smaller amount of data related to malfunctions. This implies that obtaining sufficient fault data requires a significant investment of time and money. Additionally, newly collected data often lacks labels [10]. Moreover, the methods mentioned above assume that all data conform to the same distribution. However, it is common for data obtained from several rotating machines, or from the same equipment under varying operating conditions, to follow distinct distributions. Hence, it is imperative to devise novel approaches to tackle these challenges.
The process of transfer learning effectively addresses the unlabelled and imbalanced distribution of the target dataset by transferring existing knowledge to the relevant domain [11]. Domain adaptation (DA) [12] is a subset of transfer learning that focuses on situations where only the source domain has labelled data. It maps data features from different domains to a common feature space, enabling the transfer of information from labelled to unlabelled domains. The approach addresses the issue of unlabelled data in the target domain by using representations that are consistent across domains. To address the uneven distribution of training and test data, it is necessary to make assumptions about the correlation between the distributions of the two domains. Covariate shift [13] postulates that the input distributions of the training and test sets are different but share the same functional relationship. In a covariate shift study [14], the marginal distribution P(X) changes but the conditional distribution P(Y|X) remains consistent across domains. The kernel method uses a mapping to transform data that cannot be separated in a low-dimensional space into a high-dimensional space. It then utilises criteria such as the maximum mean discrepancy (MMD) [15] or the Wasserstein distance [16] to minimise the disparity between the marginal distributions, thereby aligning the feature distributions. This approach effectively addresses the issue of unlabelled data in the target domain. Multi-kernel maximum mean discrepancy (MK-MMD) [17] decreases the disparity between the feature distributions of the two domains and enhances the consistency of features across domains. Domain adversarial learning techniques aim to achieve domain-invariant features by minimising the distributional discrepancy between the latent features of the two domains [18]. While these methods aim to achieve domain-invariant representations through mapping or shared representations [19], potential discrepancies in the feature distributions can hinder the learning process in practice. Simply restricting the marginal distribution P(X) may not be suitable in intricate circumstances, particularly in scenarios where the conditional distribution P(Y|X) differs.
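The marginal-alignment criteria mentioned above (MMD and its variants) can be made concrete with a small numerical sketch. The Gaussian-kernel bandwidth and the synthetic Gaussian samples below are illustrative assumptions, not part of any cited method:

```python
import numpy as np

def gaussian_kernel(a, b, sigma):
    """Pairwise Gaussian (RBF) kernel matrix between rows of a and b."""
    d2 = np.sum(a**2, 1)[:, None] + np.sum(b**2, 1)[None, :] - 2 * a @ b.T
    return np.exp(-d2 / (2 * sigma**2))

def mmd2(xs, xt, sigma=4.0):
    """Biased estimate of the squared maximum mean discrepancy (MMD)."""
    return (gaussian_kernel(xs, xs, sigma).mean()
            + gaussian_kernel(xt, xt, sigma).mean()
            - 2 * gaussian_kernel(xs, xt, sigma).mean())

rng = np.random.default_rng(0)
same = mmd2(rng.normal(0, 1, (200, 8)), rng.normal(0, 1, (200, 8)))
shifted = mmd2(rng.normal(0, 1, (200, 8)), rng.normal(2, 1, (200, 8)))
# identical distributions give a value near zero; a mean shift inflates it
```

Minimising such a statistic over the encoder parameters is what drives the feature distributions of the two domains together.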
At a finer-grained level, it is more realistic to align conditional probability distributions rather than simply relying on covariate shift. Joint distribution adaptation [20] adapts the marginal distributions of the two domains using MMD, and subsequently addresses the conditional probability distribution through pseudo-label iteration. Joint adaptation networks [21] aim to align the joint distributions of several layers across domains. They achieve this by employing adversarial training algorithms to optimise the joint maximum mean discrepancy (JMMD) and minimise the discrepancies in the joint distribution between domains. When dealing with complex working conditions, such as major variances in labels between overlapping parts of the two domain distributions, confounding factors have a significant impact on predictive ability, and it is challenging to produce optimal diagnostic results with basic constraints. To address this issue, the domain index method, which incorporates domain-specific auxiliary factors into data training, has been proposed in the literature [22]. This approach aims to explore the latent information components and enhance the identification ability. In the literature [23], domain-specific information is utilised as a domain index in the encoder to preserve crucial information about Y in the latent feature representation z. The literature [24] suggests an auxiliary indexing method that labels the target domain with sub-labels based on the probability distribution of the sample space. This labelling is then used to guide the alignment of the distributions, allowing samples of the same category to be grouped together in the feature space, which effectively reduces the mismatch between categories. The research in this paper proposes the use of domain-specific information to facilitate transfer learning in order to investigate fault diagnosis solutions for cross-working-condition scenarios in complicated environments. In bearing fault diagnosis, different datasets correspond to particular changes in the environment, such as different rotation rates and load circumstances. By utilising θ as a representation of domain-specific information and including it in fault data learning, the data dimension is expanded, expressiveness is enhanced, and the ability of the model to handle complicated working situations is improved.
Domain-indexed models face challenges in cross-condition fault detection tasks due to their limited performance in specific transfer tasks. This limitation may be related to the model's varying ability to recognise different features. Models in recognition tasks need to accurately differentiate and utilise cross-domain invariant features while disregarding or adequately managing cross-domain varying features. Taking image recognition as an example, variations in lighting conditions or backgrounds can be considered cross-domain varying features, whereas the principal objects in an image remain consistent across domains. In domain separation networks [25], a given feature z is separated into two components: z_s, which represents the change or style, and z_c, which represents the invariant part or content. By imposing supplementary constraints, z is enabled not only to predict the label y but also to reconstruct the initial data, thereby improving the model's capacity for generalisation. The literature [26] argues for the recognisability of the latent representations of these two components in different domains based on assumptions about the data generation process. Study [27] introduced a domain-invariant variational autoencoder that decomposes features into domain information, category information, and other information. Another investigation [28] focuses on the perceptibility of common features and verifies the consistency of these common features across various perspectives. Literature [29] provides a systematic approach to disentanglement, which further improves recognition performance by recovering the original latent features. Literature [30] devised a framework for learning conditional feature disentanglement that specifically targets the separation of operating-condition and health-state features; the framework exhibited exceptional resistance to interference in diverse operational situations. Furthermore, literature [31] employs a disentanglement strategy that integrates subdomain adaptation and adversarial learning. This technique aims to effectively align the distributions of local and global features, thereby improving recognition performance. Although current approaches do not adequately resolve labelling discrepancies in the overlapping region of the two domain distributions, these findings indicate that enhancing the model's internal feature representation can greatly enhance the performance and reliability of cross-domain diagnostic tasks.
This study presents a novel architecture for IFD that integrates domain-invariant representation learning and feature disentanglement, referred to as the domain-specific invariant adversarial network (DSIAN). The approach is based on the data generation process, in which domain-specific information is utilised as a domain index. This information is inputted into the encoder along with the data for training. By increasing the number of feature dimensions, the model is able to learn domain-specific content. Subsequently, the approach divides the latent features into a part that varies across domains and a part that remains constant, thereby augmenting the model's identification capacity by analysing the constituent elements of the latent data. The primary contributions of this study can be summarised as follows.
(1) A novel domain-adaptive fault diagnosis method is proposed, which effectively utilises domain-specific information as an auxiliary tool during training. This method efficiently achieves knowledge transfer from the source domain to the target domain and effectively performs bearing fault diagnosis under different operating conditions by integrating the data generation process with the autoencoder methodology.
(2) A novel feature processing mechanism is proposed, which enhances the model's ability to distinguish between different cases by effectively utilising the parameter set of latent features for data reconstruction and transfer. Additionally, it achieves distribution alignment by extracting the cross-domain invariant component of the latent feature representation, and accurately identifies faults based on this component.
(3) The method described in this paper is extensively compared with other existing domain-adaptive methods in tests performed on four datasets. The experimental findings demonstrate that the approach employed in this investigation surpasses the comparative approaches in terms of diagnostic precision.
The subsequent sections of the paper are structured in the following manner. Section 2 provides a concise description of the domain adaptive bearing fault diagnosis approach. Section 3 outlines the suggested DSIAN technique. Section 4 provides an overview of the datasets utilised in the tests. Section 5 presents the test outcomes and analysis. Section 6 provides the final conclusion.

Introduction to bearing fault diagnosis based on DA
DA is a specific area within transfer learning. It involves two domains: the source domain, denoted by D_S = {X_S, P_S(X)}, and the target domain, denoted by D_T = {X_T, P_T(X)}. The data samples of the source domain, X_S, are labelled and their marginal distribution is P_S(X); the data samples of the target domain, X_T, are unlabelled and their marginal distribution is P_T(X). When utilising the domain adaptive technique for the diagnosis of vibration signals, it is unavoidable that data obtained from rotating machines under different operating conditions will have distinct distributions [32]. The primary goal of DA is to train a classifier using labelled data from the source domain that can accurately identify the joint distribution P_T(X, Y) of the unlabelled target domain, leveraging the similarity between the two domains. Previous research has primarily concentrated on a single shift, presuming that other changes across disparate domains remain unaltered. For instance, certain conventional transfer learning methods impose constraints only on the marginal invariance P_S(Z) = P_T(Z) of the latent feature Z. However, this hinders the model's classification efficiency when the constraint fails to adequately align the joint probability distribution. The previous methods face challenges in learning the features of data collected under different operating conditions using a single fixed encoder, mainly due to the intricate variations between different operating modes of machines, such as diverse driving speeds and different radial forces applied to bearings. Hence, this study introduces domain-specific information θ as a tool to facilitate training, taking different working contexts into account. In the fault diagnosis task of rotating machinery, θ can represent various rotational speeds, varied load circumstances, or other environment-specific variations that are not connected to the predicted class Y.

Proposed method
This paper presents a state-of-the-art (SOTA) DA framework called DSIAN, which is specifically developed to overcome the constraints of current fault diagnosis approaches. The key innovation of the DSIAN model lies in its integration of domain-invariant representation learning and feature disentanglement techniques. This enables DSIAN to successfully manage domain-specific parameters and efficiently disentangle latent features. The architecture of the model, depicted in figure 2, uses a sophisticated network design. In this design, the model first obtains the labels u_S and u_T for the source and target domains and inputs them into the domain encoder. This step produces the domain-specific information θ_S and θ_T, which is crucial for successful transfer learning. The source domain data x_S and the target domain data x_T are then processed through the convolutional section of the encoder. The resulting features are combined with the domain-specific information to ensure a comprehensive representation of the data characteristics in each domain. DSIAN applies an adversarial approach to reduce the distributional discrepancy between the source and target domains in the latent space. This is done specifically for the domain-invariant parts z^S_c and z^T_c within the latent feature z. Furthermore, by using auxiliary classifiers and decoders, DSIAN enhances the classification accuracy and generalisation capability of the model. The following sections provide a comprehensive analysis of the technical aspects of this study, including a detailed explanation of the design concept, network topology, and algorithmic strategy of the model. In summary, the DSIAN model represents a significant technological improvement in fault diagnosis, enhancing diagnostic precision and introducing novel concepts to the field of IFD.

Invariance of latent features
Prior research has mostly concentrated on employing classifiers to acquire invariant representations across domains. This involves mapping the hidden variable z of different domains into a shared latent space and minimising distributional disparities so that classifiers attain favourable cross-domain classification accuracy. Empirical evidence has demonstrated that these methodologies cannot consistently yield favourable outcomes in every case. The aim of this work is to clarify the distinguishing characteristics of the invariant and varying components of hidden representations and to use this knowledge to make predictions in the target domain.
The study assumes that the data x ∈ X is generated by the latent variable z ∈ Z ⊆ R^n and the domain-specific parameter θ, i.e. x = g(z, θ). The latent representation z is divided into two parts, z = [z_c, z_s], where z_c ∈ Z_c ⊆ R^{n_c} denotes the cross-domain invariant part of the latent variable and z_s ∈ Z_s ⊆ R^{n_s} denotes the cross-domain varying part. The data generation process can therefore also be expressed as x = g(z_c, z_s, θ). The invariant part z^u_c of the latent representation z^u can then be used to predict the label ŷ^u = h(z^u_c).
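The assumed generation process x = g(z_c, z_s, θ) and the prediction ŷ = h(z_c) can be illustrated with a toy simulation. The Gaussian mixing function and the one-dimensional threshold classifier below are hypothetical stand-ins for g and h, not the paper's actual networks:

```python
import numpy as np

rng = np.random.default_rng(0)

def generate(n, theta):
    """Toy version of x = g(z_c, z_s, theta): z_c carries the class,
    while z_s and theta carry domain-specific variation."""
    y = rng.integers(0, 2, n)                           # fault class
    z_c = y[:, None] * 2.0 + rng.normal(0, 0.3, (n, 2))  # invariant part
    z_s = rng.normal(theta, 1.0, (n, 2))                 # domain-varying part
    x = np.concatenate([z_c, z_s], axis=1) + theta       # mixing g(.)
    return x, y, z_c

xs, ys, zcs = generate(500, theta=0.0)   # source domain
xt, yt, zct = generate(500, theta=3.0)   # target domain

# a threshold h(z_c) fit on the source invariant part transfers to the target
thr = zcs[:, 0].mean()
acc_t = ((zct[:, 0] > thr).astype(int) == yt).mean()
```

Because only z_c carries label information and is unaffected by θ, the source-fitted predictor remains accurate on the shifted target domain.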
Based on the proof of identifiability of the varying part z_s of the latent features, [25] proved that the invariant subspace is identifiable in a block-wise manner. Furthermore, they showed that the invariant part retains the information of the true invariant part and does not mix in information from the varying component. The estimated data generation process in equation (1) achieves block-wise identifiability of the invariant subspace, which suggests that g is partially invertible and identifiable:

x̂ = ĝ(ẑ_c, ẑ_s, θ̂).    (1)

Therefore, this study can recover z_c and z_s from the observed value x in both domains, and the joint distribution P(x, z_c, z_s|u) can be partially identified. Once z_c is known, a predictor can be trained on the labelled data and then used to forecast the label P(y|z_c) of the data in the target domain. Additionally, P(x, z_c|u_T; θ) is used in the target domain to partially identify the joint distribution P(x, y|u_T).

Domain invariant representation learning
This paper describes an approach that integrates domain-invariant representation learning with feature disentanglement. The latent feature z preserves important domain-related information because domain-specific information is included with each domain's data as a domain index. This allows the data generation process to reveal the components of this latent information, hence improving the diagnostic capability of the model.
This article describes a method that considers the domain index of the data, represented as u ∈ {S, T}. The domain-specific information is denoted by θ ∈ {θ_S, θ_T} and serves as a parameter for the cross-domain variation of P(X|Y). The domain index matrix is input into the domain encoder to acquire the domain-specific parameter θ̂_u. Subsequently, the data are input into the convolutional layers of the encoder. The extracted features are merged with the estimated parameter θ̂_u and input into the fully connected section of the encoder to acquire the latent feature representation z^u_i = ϕ(x^u_i, θ̂_u). Here, x^u_i and z^u_i represent the ith input data point and the corresponding latent representation from domain u, respectively.
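The step described here, mapping the domain index u to θ and concatenating θ with the convolutional features before the fully connected part, can be sketched as a plain-NumPy forward pass. The layer sizes, tanh activations and random weights are placeholders, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(1)

def linear(x, w, b):
    """Affine layer: x @ w + b."""
    return x @ w + b

# toy parameters (shapes are illustrative only)
W_dom = rng.normal(0, 0.1, (2, 4)); b_dom = np.zeros(4)      # domain encoder
W_fc = rng.normal(0, 0.1, (16 + 4, 8)); b_fc = np.zeros(8)   # fully connected part

def encode(conv_feat, u_onehot):
    """z^u_i = phi(x^u_i, theta_u): concatenate conv features with theta_u."""
    theta = np.tanh(linear(u_onehot, W_dom, b_dom))           # theta_u
    return np.tanh(linear(np.concatenate([conv_feat, theta], -1), W_fc, b_fc))

conv_feat = rng.normal(size=(5, 16))                  # stand-in for conv output
z_src = encode(conv_feat, np.tile([1.0, 0.0], (5, 1)))  # u = S
z_tgt = encode(conv_feat, np.tile([0.0, 1.0], (5, 1)))  # u = T
# the same input features yield different z when the domain index differs
```

The point of the concatenation is exactly this last observation: the latent representation is conditioned on the domain, not computed by a single fixed encoder.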
To align the distributions of the two domains' features, P_S(Z) = P_T(Z), this study trains a domain predictor h_D. The predictor classifies whether a sample belongs to the source or the target domain, û_i = h_D(z^u_{i,c}), and an adversarial strategy is then applied to diminish the disparity between the two domain distributions. Subsequently, the data generation process employs the latent representation z and the domain-specific parameter θ to reconstruct the data through the decoder, x̂^u_i = φ(z^u_i, θ̂_u). The joint mutual information I(θ; (X, Y)) [33] quantifies the impact of θ on the process of generating X from Y, and the aim is to reduce this influence. To achieve this, we measure and restrict I(θ; (X, Y)) = JSD(P_{XY; θ=θ_S} || P_{XY; θ=θ_T}).
Here, θ_S and θ_T are the domain-specific variables and JSD denotes the Jensen-Shannon divergence. As the target domain in this study lacks labels, information about P_{XY; θ=θ_T} is unavailable, so directly assessing JSD(P_{XY; θ=θ_S} || P_{XY; θ=θ_T}) from the implicitly reconstructed joint distribution is not feasible. To overcome this, the inferred domain-specific variables can be used to transfer the source domain data to the target domain through the decoder, x̂^T_trans = φ(z^S_i, θ_T). The annotated data then serve as an estimate of the joint distribution in the target domain, restricting the mutual information I(θ; (X, Y)) [34]. An appropriate approach is required to minimise the JSD between the joint distributions of the reconstructed and the transferred source domain data. [35] provides a theorem that gives a readily reducible upper bound on this JSD (equation (2)), consisting of two KL-divergence terms weighted by factors of the form 1/2 ∫ P_{Y|X}(y|x) dµ(x, y); the two KL divergences are constrained using the cross-entropy loss. The model is pre-trained using labelled data from the source domain, and an optimiser is employed to reduce the classification loss, enabling the model to identify features related to fault classification. The predictor ŷ^S = h(z^S_c, θ_S) is trained by incorporating the cross-domain invariant part z^S_c and the domain-specific information θ_S of the latent features of the labelled source domain. By calculating and optimising the loss function between the predicted and true labels, the classification loss is reduced, improving the accuracy of the classifier when predicting unlabelled target domain data. The classification loss L_cls is given in equation (3), where m_S denotes the total number of observations in the source domain.
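Since the training objective restricts a Jensen-Shannon divergence, a minimal sketch of JSD between two discrete distributions may clarify its properties. The three-class example distributions are arbitrary illustrative values:

```python
import numpy as np

def kl(p, q):
    """KL divergence between discrete distributions (clipped for stability)."""
    p, q = np.clip(p, 1e-12, 1.0), np.clip(q, 1e-12, 1.0)
    return float(np.sum(p * np.log(p / q)))

def jsd(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by ln 2, zero iff p == q."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.1, 0.2, 0.7])
d = jsd(p, q)  # strictly positive for differing distributions
```

These properties (symmetry and boundedness) are what make JSD a convenient surrogate for restricting the influence of θ.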

L_cls = -(1/m_S) Σ_{i=1}^{m_S} Σ_n y^S_{i,n} log ŷ^S_{i,n}.    (3)

Process for proposed method
After entering the domain encoder, the domain index u is mapped to θ. The convolutional layers of the feature encoder learn from the source domain data x^S and the target domain data x^T. The resulting latent features are then combined with the domain-specific parameter θ output by the domain encoder, and the combined features are input into the fully connected part of the feature encoder and the bottleneck layer. The classifier uses the invariant part z_c of the latent feature z output by the bottleneck layer for the classification task. z_c is also input into the domain discriminator, and the marginal distributions of the source and target domains are aligned adversarially. To achieve this, we introduce the adversarial loss L_inv:

L_inv = -(1/(m_S + m_T)) Σ_i [u_i log h_D(z_{i,c}) + (1 - u_i) log(1 - h_D(z_{i,c}))].    (4)

To promote the identifiability of the model and retain valuable domain-related information in the latent feature z, this study combines the latent feature z output by the bottleneck layer with the domain-specific parameter θ. The feature decoder is then used to reconstruct the data x, introducing the reconstruction loss L_recon:

L_recon = (1/m) Σ_i ||x^u_i - φ(z^u_i, θ̂_u)||^2.    (5)

Mutual information minimisation follows the discussion in [31]. To improve the prediction of the target domain labels, an auxiliary classifier is trained on the reconstructed source domain data and the transferred data. This constrains the influence of θ and forces the cross-domain change of P(X|Y) to be minimal. The loss function is optimised to minimise the influence of θ when reconstructing and transferring data, resulting in a condition-invariant representation and minimising the change of the corresponding conditional distribution P(X|Y). The output features of the bottleneck layer in the source domain are combined with the domain-specific parameter θ_T of the target domain, and this combination is input into the decoder to perform data transfer through the data generation process. The auxiliary classifier takes in the transferred data x̂^T_trans = φ(z^S_i, θ_T) and the reconstructed data x̂^S_i = φ(z^S_i, θ_S), and the optimiser is used to reduce the cross-entropy loss of the auxiliary classifier. This ensures that the conditional distribution P(X|Y) undergoes minimal cross-domain change. The cross-entropy loss of the auxiliary classifier is

L_auxcls = -(1/m_S) Σ_i Σ_n y^S_{i,n} (log ŷ^recon_{i,n} + log ŷ^trans_{i,n}),    (6)

where ŷ^recon_{i,n} and ŷ^trans_{i,n} are the softmax predictions of the predictor h_C on the reconstructed and transferred source domain data, respectively.
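The transfer step, decoding source latents with the target domain's parameter, can be sketched with a toy decoder. The weights, dimensions and θ values below are arbitrary placeholders for the paper's decoder φ:

```python
import numpy as np

rng = np.random.default_rng(2)
W_dec = rng.normal(0, 0.1, (8 + 4, 16)); b_dec = np.zeros(16)  # toy decoder

def decode(z, theta):
    """x_hat = phi(z, theta): decode latent z under a domain parameter."""
    return np.tanh(np.concatenate([z, theta], -1) @ W_dec + b_dec)

z_s = rng.normal(size=(5, 8))            # latent features of source samples
theta_S = np.full((5, 4), 0.2)           # stand-in source domain parameter
theta_T = np.full((5, 4), -0.2)          # stand-in target domain parameter

x_recon = decode(z_s, theta_S)   # source reconstruction
x_trans = decode(z_s, theta_T)   # source-to-target transfer, labels unchanged
```

Swapping θ_S for θ_T changes only the domain-specific style of the decoded samples, which is why the transferred data can stand in for labelled target-domain data.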
The total loss of the algorithm in this paper is

L_total = L_cls + L_inv + α_1 · L_recon + α_2 · L_auxcls.

The optimisation objective of the proposed algorithm is divided into two distinct components. First, the label predictor is trained to reduce the cross-entropy loss of the labelled data in the source domain, and supervised learning is used to minimise the cross-entropy loss of the reconstructed source domain data and the transferred data in the auxiliary classifier. Second, unsupervised learning is employed to decrease the adversarial loss and the reconstruction loss of the source and target domain data.
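As a toy illustration of how the total objective combines the four terms, the sketch below uses the weights on L_recon and L_auxcls reported in the ablation settings later in the paper (0.1 and 0.01), treated here as placeholder defaults:

```python
import numpy as np

def cross_entropy(y_onehot, y_prob):
    """Mean categorical cross-entropy, the form used for L_cls and L_auxcls."""
    logp = np.log(np.clip(y_prob, 1e-12, 1.0))
    return float(-np.mean(np.sum(y_onehot * logp, axis=1)))

def total_loss(l_cls, l_inv, l_recon, l_auxcls, a1=0.1, a2=0.01):
    """L_total = L_cls + L_inv + a1 * L_recon + a2 * L_auxcls."""
    return l_cls + l_inv + a1 * l_recon + a2 * l_auxcls

perfect = np.eye(2)[[0, 1, 1]]   # one-hot labels predicted exactly
loss = total_loss(cross_entropy(perfect, perfect), 0.5, 0.3, 0.2)
```

The small weights on the reconstruction and auxiliary terms keep them as regularisers rather than letting them dominate the classification and adversarial objectives.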

Open source data sets
To test the efficacy of the newly developed algorithm, its performance must be evaluated on publicly available datasets. The objective of this study is to assess the effectiveness of the DSIAN approach by processing and testing four open-source bearing datasets. Below is a description of the datasets.

CWRU data set
The Case Western Reserve University (CWRU) bearing data centre provides the CWRU bearing fault dataset. The dataset includes vibration signals from an SKF-6205 drive end bearing, sampled at a frequency of 12 kHz. The dataset has four operating speeds: 1797 rpm, 1772 rpm, 1750 rpm and 1730 rpm, each corresponding to a different task. In total, there are 12 transfer learning settings. CWRU divides bearings into a normal state (NA) and three fault types: inner ring fault (IF), ball fault (BF) and outer ring fault (OF), with ten categories based on the size of the fault. The faults in the bearing are artificially seeded using electro-discharge machining at fault diameters of 7, 14 and 21 mils (thousandths of an inch). For a comprehensive explanation, please refer to [36].
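The 12 transfer settings arise from ordering the four operating conditions as source-target pairs; a minimal sketch using the a-b task notation introduced later in the training details:

```python
from itertools import permutations

speeds = [1797, 1772, 1750, 1730]  # rpm; one operating condition per speed
tasks = [f"{a}-{b}" for a, b in permutations(range(len(speeds)), 2)]
# four conditions yield 4 * 3 = 12 ordered source-target transfer settings
```

The same enumeration explains the 12 settings of the PU and PHM2009 datasets (four conditions each) and the six settings of the three-speed JNU dataset.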

PU data set
The Paderborn University (PU) dataset is a collection of bearing fault data provided by Paderborn University, which includes both artificially induced and real damage. The dataset is divided into four working conditions based on different loads, torques, radial forces, and speeds. These four working conditions correspond to 12 transfer learning settings, and the data is sampled at a frequency of 64 kHz. The dataset includes 13 labels for different fault types and locations. For a detailed explanation, please refer to [37].

JNU data set
The Jiangnan University (JNU) bearing dataset, provided by Jiangnan University in China, covers four health conditions: NA, IF, OF and BF, corresponding to four fault labels. The vibration signal is sampled at a frequency of 50 kHz at three different speeds: 600 rpm, 800 rpm and 1000 rpm. The signals sampled at the three speeds are regarded as different tasks, resulting in a total of six transfer learning task settings. For a detailed explanation, please refer to [38].

PHM2009 data set
The PHM2009 dataset, provided by the PHM Data Challenge, is a comprehensive gearbox dataset that is extensively utilised in several industries. This study focuses on the helical gear data obtained through an accelerometer installed on the mounting plate of the input shaft. The data was sampled at a frequency of 200/3 kHz (≈66.7 kHz). This study examines data obtained at four distinct rotational frequencies (30 Hz, 35 Hz, 40 Hz and 45 Hz) under high-load conditions. The data were separated into four tasks based on their respective rotating speeds, and 12 transfer learning scenarios were created. For a detailed explanation, please refer to [39].

Training details
This study applies an IFD approach based on unsupervised deep transfer learning (UDTL) and integrates it into the unified DSIAN framework using the PyTorch framework. The data in each dataset is partitioned into segments of length 1024 without any overlap between them. The Z-score normalisation approach is then applied to process the data. The Adam optimiser is employed as the default choice with a learning rate of 10^-3 and a batch size of 64. To prevent data leakage and ensure accurate computation outcomes, this study allocated four-fifths of the samples to the training set and the remaining portion to the test set, ensuring that there is no overlap of data between the two sets. In order to mitigate underfitting, each algorithm was trained for 300 epochs. To facilitate comprehension, we employ the notation a-b to represent tasks that span multiple domains, where a is the source domain and b is the target domain. The model is pre-trained using data from the source domain during the first 50 epochs of training. After the pre-training phase, the model is trained using both the source and target domain data simultaneously.
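The segmentation, per-segment Z-score normalisation and 80/20 split described above can be sketched as follows; the random signal is a stand-in for a real vibration record:

```python
import numpy as np

def preprocess(signal, seg_len=1024, train_frac=0.8, seed=0):
    """Cut a 1-D signal into non-overlapping segments, z-score normalise
    each segment, then split into disjoint training and test sets."""
    n = len(signal) // seg_len
    segs = signal[: n * seg_len].reshape(n, seg_len)
    segs = (segs - segs.mean(1, keepdims=True)) / (segs.std(1, keepdims=True) + 1e-8)
    idx = np.random.default_rng(seed).permutation(n)  # shuffle before splitting
    cut = int(train_frac * n)
    return segs[idx[:cut]], segs[idx[cut:]]

train, test = preprocess(np.random.default_rng(1).normal(size=100_000))
```

Splitting by whole segments, rather than by overlapping windows, is what prevents the leakage of test samples into the training set.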
The trials were conducted on PCs with an Intel Core i7-12700K CPU and an NVIDIA RTX A5000 GPU with 32 GB RAM, running on the Linux operating system with PyTorch version 1.10.

Result analysis
This study uses overall accuracy to evaluate the classification proficiency of the various algorithms. Overall accuracy measures the proportion of correctly identified samples out of the total number of classified samples. In order to mitigate the influence of chance, five trials were conducted for each transfer learning task. The ultimate performance was assessed by calculating the average of the highest overall accuracy achieved in each trial.
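The evaluation protocol described above, the best per-trial overall accuracy averaged over five trials, can be sketched as follows; the accuracy histories are made-up numbers for illustration:

```python
def overall_accuracy(y_true, y_pred):
    """Fraction of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def task_score(trial_histories):
    """Average, over trials, of the best per-epoch overall accuracy."""
    return sum(max(h) for h in trial_histories) / len(trial_histories)

trials = [[0.62, 0.71, 0.69], [0.60, 0.74, 0.73], [0.65, 0.70, 0.72],
          [0.58, 0.69, 0.75], [0.61, 0.68, 0.70]]
score = task_score(trials)  # mean of the five best-epoch accuracies
```

Taking the best epoch per trial before averaging separates convergence noise from the model's attainable accuracy on each task.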
Tables 1-3 show the experimental results of the proposed method DSIAN compared to the other methods presented in this paper. The highest accuracy in each case and the final overall accuracy are marked in bold. The results demonstrate the advantages of DSIAN in various tasks. DSIAN achieves the highest diagnostic accuracy compared to other methods in most test scenarios, highlighting its superior ability to deal with significant distributional differences in data under different operating conditions. Traditional CNNs, on the other hand, perform mediocrely in reducing the distributional differences between the source and target domains. CNN performs better when combined with distribution distance constraint algorithms such as MK-MMD, JMMD and CORAL, which highlights the key role of reducing the inter-domain distribution difference in model performance optimisation. Meanwhile, the domain adversarial neural network (DANN) and the conditional domain adversarial network (CDAN), based on adversarial learning, still show a significant gap compared with DSIAN, although they improve classification effectiveness. The existence of this gap verifies the effectiveness of DSIAN in simultaneously considering the effects of the conditional and marginal probability distributions, as well as in optimising the loss function. DSIAN performs well when dealing with the complex PU datasets, with an average accuracy of 61.13%, outperforming all other comparative methods in a range of cross-domain tasks. In some of the more difficult tasks, such as 0-1, 0-3, 2-3, 3-0 and 3-2, DSIAN improves by 5.47%, 8.28%, 5.31%, 8.78% and 5.13%, respectively, compared to the other optimal methods. This further corroborates its remarkable ability in domain discrepancy reduction and fault diagnosis. In addition, DSIAN demonstrates a significant improvement in classification accuracy compared to the domain-specific adversarial network (DSAN). This is achieved through the use of a feature disentanglement technique to classify latent features and impose the corresponding constraints, which validates the effectiveness of the approach in improving model performance.
To demonstrate the versatility of DSIAN, this study applies it to the PHM2009 dataset to identify complex gearbox vibration signals. The experimental results are summarised in table 4. Consistent with the three previous experiments, DSIAN achieves optimal performance in most of the migration tasks and also shows excellent overall average accuracy. The results demonstrate the effectiveness of the methodology presented in this paper for DA and fault diagnosis. Overall, the experiments demonstrate that DSIAN is a valuable technique.
To demonstrate the novelty of this paper, we compare our method with two newly proposed methods and present the results in table 5. We transplanted the loss functions and other elements of these two methods into the network model of this paper, using the same network parameters and hyper-parameters to ensure a fair comparison. We provide a brief description of the two methods below. First, the discriminator-free adversarial learning network (DALN) [40] introduces a nuclear-norm Wasserstein discrepancy coupled with a classifier for explicit domain alignment and category differentiation; DALN has shown advantages over existing state-of-the-art methods. Second, Quan et al [41] proposed a new variance representation metric, the maximum mean square discrepancy, which comprehensively expresses variance and mean information to improve the distributional alignment between domains. Table 5 shows that the DSIAN method outperforms the two new methods under most working conditions and has the highest total accuracy, demonstrating the novelty and effectiveness of the method presented in this paper. Tables 6-9 show the results of the ablation experiments for the proposed algorithm on the four datasets. The tables show the impact of the individual loss terms, where α_1 = 1, α_2 = 0.1, and L_total = L_cls + L_inv + 0.1·L_recon + 0.01·L_auxcls. To conduct an in-depth evaluation of the model's performance, we first determined the baseline configuration, i.e.
the model that employs only the classification loss function. This setting provided an initial performance reference point. Subsequently, we introduced the adversarial loss function, based on the theory of adversarial learning, to enhance the model's robustness and adaptability to different feature distributions. The adversarial loss produced a significant improvement in classification accuracy (from about 2% to 9%), indicating its role in optimising classification performance. However, upon adding the reconstruction loss, we observed a slight decrease in model accuracy. This decrease may be due to the additional constraints introduced by the reconstruction loss conflicting with the existing classification and adversarial losses; it is therefore important to consider the interactions and trade-offs between different loss functions when designing a multitask learning framework. Finally, we improved the model's classification performance by integrating an auxiliary classifier and incorporating its classification loss. The auxiliary classifier helps the model capture data features in a more detailed way, leading to better overall classification performance. Based on these experimental results, it can be concluded that the DSIAN model exhibits a significant performance improvement under the loss-function configuration we set up, which validates the effectiveness of its design and implementation. This study also conducted experimental tests on the four datasets to assess the efficacy of the domain index θ, comparing the approach that incorporates domain indexing with the one that does not. The experimental results on the CWRU and JNU datasets, presented in table 10, show that the DSIAN method achieves high accuracy even without the domain index θ; while adding θ does improve accuracy, this alone does not provide sufficient evidence to confirm
the significant role of the domain index. However, in the trials conducted on the PU and PHM2009 datasets, the models that employed the domain-indexing configuration exhibited superior accuracy in the majority of migration tasks compared to the models without it. The DSIAN model's average accuracy is 3.23% and 1.51% higher, respectively, than that of the model without domain indexing. These results indicate that domain indexing substantially enhances the precision and flexibility of the model across the various datasets, further demonstrating the effectiveness of the domain index θ.
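The ablation configurations discussed above all instantiate the same weighted sum, L_total = L_cls + L_inv + 0.1·L_recon + 0.01·L_auxcls. A minimal sketch of that combination (the weights are taken from the paper's ablation setting; the function itself is an illustrative stand-in, not the actual training code):

```python
def total_loss(l_cls, l_inv, l_recon, l_auxcls, w_recon=0.1, w_aux=0.01):
    """Weighted sum of the four training objectives used in the ablation study:
    classification, adversarial/invariance, reconstruction, auxiliary classifier."""
    return l_cls + l_inv + w_recon * l_recon + w_aux * l_auxcls
```

Ablation variants then correspond to zeroing individual terms; for instance, `total_loss(l_cls, 0.0, 0.0, 0.0)` is the classification-only baseline.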
To evaluate the performance of DSIAN thoroughly, this study presents the detailed classification results of the various algorithms using a confusion matrix. The confusion matrix for the PU test set is shown in figure 3.
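A confusion matrix of the kind shown in figure 3 simply counts, for each true class, how predictions are distributed over the predicted classes. A minimal sketch (illustrative; real experiments would typically use a library routine such as scikit-learn's `confusion_matrix`):

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true fault class, columns the predicted class;
    entry [i][j] counts samples of class i predicted as class j."""
    m = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        m[t][p] += 1
    return m
```

Diagonal entries are correct classifications; off-diagonal entries reveal which fault classes are confused with each other.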

Visual analysis
To gain a deeper understanding of the feature transformation involved in the migration task, this study introduces the t-distributed stochastic neighbour embedding (t-SNE) algorithm, which effectively reduces the dimensionality of the features and produces a clear picture of their distribution. Figure 4 illustrates the distribution of the cross-domain invariant portion of the latent features, obtained by applying the t-SNE algorithm [42] to the bottleneck-layer outputs of migration task 0-1 in the JNU dataset. The graph uses identical colours for identical labels; circles represent source-domain data, while triangles represent the target domain. This study presents the results of four algorithms (CNN, DANN, DSAN and DSIAN) separately to enable a comprehensive comparison and analysis, allowing a clearer observation of the feature extraction and differentiation ability of each algorithm on the migration task. The t-SNE visualisation of the standard CNN shows scattered data points, indicating poor feature differentiation and unclear category boundaries. DANN, an adversarial DA model, reduces the distribution distance of the data features in the latent space, making the source and target domains more similar; in its subfigure, the data points of different categories are more clearly separated, indicating better feature separation, although DANN still suffers from clustering errors. DSAN and DSIAN further reduce the distributional differences, demonstrating the effectiveness of the proposed method.
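The visualisation step can be reproduced with scikit-learn's t-SNE implementation. The sketch below embeds source and target bottleneck features jointly so that both domains share one 2-D map, as in figure 4 (the function name and the perplexity value are our assumptions, not the paper's settings):

```python
import numpy as np
from sklearn.manifold import TSNE

def tsne_embed(src_feats, tgt_feats, perplexity=5.0, seed=0):
    """Jointly project source and target bottleneck features into 2-D.
    Stacking before fitting ensures both domains live in the same embedding."""
    feats = np.vstack([src_feats, tgt_feats])
    emb = TSNE(n_components=2, perplexity=perplexity,
               random_state=seed).fit_transform(feats)
    # Split back into the two domains for separate plotting
    # (e.g. circles for source, triangles for target).
    return emb[:len(src_feats)], emb[len(src_feats):]
```

The returned arrays can then be scattered with per-class colours to reproduce a figure-4-style comparison between algorithms.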

Conclusion
This study proposes a novel domain-adaptive fault diagnosis method that combines the concepts of domain-invariant representation learning and feature disentanglement. Specifically, the method uses domain-specific information as a domain index and inputs it into an encoder along with the training data. The learned latent representations are then separated into a cross-domain invariant part and a cross-domain variable part. The full set of latent features is fed into the decoder for data reconstruction and migration, while distribution alignment and classification are performed on the invariant part. By exploiting the data generation process to parse the components of the latent information, the method significantly improves the model's recognition ability. This study demonstrates the excellent capability of the DSIAN method in fault diagnosis and provides new ideas for future fault diagnosis and DA research. Future research will investigate the impact of domain-specific information on fault diagnosis. The plan is to digitise operating-condition information, such as speed and loading conditions, and integrate it into the domain-specific data used to train fault diagnosis models. The objective is to simulate model training under different operating conditions by adjusting these domain-specific parameters, which will help to cope with the distribution bias problem and thereby improve the model's generalisation ability and accuracy.

Figure 1 .
Figure 1.The generation process of proposed methods.

Figure 2 .
Figure 2. The diagram of the DSIAN framework.

The quantities of the form (1/2)∫ |Q_X(x)| dµ(x) are denoted by c_1 and c_2, respectively. The conditional distribution of Y given X specified by the auxiliary classifier h_C is denoted by Q^{h_C}_{Y|X}. JSD(P_{X|θ=θ_S} ‖ P_{X|θ=θ_T}) denotes the Jensen-Shannon divergence (JSD) between the marginal distributions of X in the source and target domains, and is fixed. The bound is therefore tightened only by minimising the two KL divergences on the right-hand side of inequality (2). To this end, the auxiliary classifier h_C is trained on the reconstructed source-domain data x_S and the migrated data x_T^trans.
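The bound above involves the Jensen-Shannon divergence between the source and target marginals. For two discrete distributions it can be computed directly; this is a generic sketch of the definition, not the paper's estimator:

```python
import math

def kl_div(p, q, eps=1e-12):
    """KL(p || q) for discrete distributions; eps guards log(0)."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)

def js_div(p, q):
    """Jensen-Shannon divergence: symmetric, bounded by log 2."""
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    return 0.5 * kl_div(p, m) + 0.5 * kl_div(q, m)
```

Unlike the KL terms, the JSD is symmetric and fixed once the two marginals are given, which is why only the KL terms are minimised during training.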

Figure 2
Figure 2 depicts the model architecture described in this article. The training procedure is as follows. The model is first pre-trained using labelled data from the source domain: an optimiser is employed to reduce the model's classification loss, enabling it to identify features related to fault classification. The predictor ŷ_S = h(z_c^S, θ_S) is trained by incorporating the cross-domain invariant portion z_c^S together with the domain index θ_S.
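The pre-training step can be sketched as follows. This is an illustrative stand-in for the paper's encoder/predictor h(z, θ): a linear softmax classifier trained by gradient descent on labelled source data, with the scalar domain index θ appended to each feature vector (the concatenation scheme and all hyper-parameters here are our assumptions):

```python
import numpy as np

def pretrain_source_classifier(x_src, y_src, theta_src, n_classes,
                               lr=0.1, steps=200, seed=0):
    """Minimal pre-training sketch: softmax classifier on source data,
    conditioned on the domain index by feature concatenation."""
    rng = np.random.default_rng(seed)
    # Append the domain index θ_S to every sample.
    z = np.hstack([x_src, np.full((len(x_src), 1), theta_src)])
    w = rng.normal(scale=0.01, size=(z.shape[1], n_classes))
    b = np.zeros(n_classes)
    losses = []
    for _ in range(steps):
        logits = z @ w + b
        logits -= logits.max(axis=1, keepdims=True)   # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        # Cross-entropy (classification) loss on the source labels.
        losses.append(-np.mean(np.log(p[np.arange(len(y_src)), y_src] + 1e-12)))
        g = p.copy()
        g[np.arange(len(y_src)), y_src] -= 1.0        # d(CE)/d(logits)
        g /= len(y_src)
        w -= lr * (z.T @ g)
        b -= lr * g.sum(axis=0)
    return w, b, losses
```

Minimising the classification loss this way is only the first stage; the adversarial, reconstruction and auxiliary objectives described earlier are added on top of it in the full framework.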

Figure 3 .
Figure 3. Confusion matrix of different methods in PU test data set.

Figure 4 .
Figure 4. Visualisation of features in source and target domain samples based on t-SNE.

Table 1 .
Experimental results of CWRU data set.

Table 2 .
Experimental results of JNU data set.

Table 3 .
Experimental results of PU data set.

Table 4 .
Experimental results of PHM2009 data set.

Table 5 .
Experimental results of DSIAN with two new methods.

Table 6 .
Results of ablation experiments on CWRU data set.

Table 7 .
Results of ablation experiments on JNU data set.

Table 8 .
Results of ablation experiments on PU data set.

Table 10 .
The results of using domain index or not in four data set transfer diagnosis experiment.