Single-source UDA for privacy-preserving intelligent fault diagnosis based on domain augmentation

In practical applications of fault diagnosis, several factors, including load fluctuations, changes in equipment condition, and environmental noise, can cause a classifier trained on the source domain to be ill-suited to data from the target domain. Unsupervised domain adaptation (UDA) techniques have been developed to tackle this issue, but they typically demand access to a fully labeled source domain, ignoring privacy concerns regarding the source-domain data. We therefore consider a new research scenario, source-free unsupervised domain adaptation (SFUDA), which relies exclusively on a source model trained on source-domain samples, without requiring access to the labeled source-domain data itself. This paper introduces an SFUDA approach based on knowledge distillation (KD) that involves two stages: (1) generalizing the source model by applying domain augmentation and label smoothing (LS) techniques to enhance its generalization capability; and (2) adapting the target model within a KD framework to achieve knowledge transfer, with a mutual-information structural regularization added to account for the internal structure of the target data and thereby improve the model's adaptability. To evaluate the efficacy of our approach, we perform experiments on two datasets—the Case Western Reserve University dataset and the Paderborn University dataset—comprising 24 transfer tasks in total. Our experiments demonstrate the effectiveness of the domain augmentation technique, the mutual-information regularization, and the proposed method as a whole.


Introduction
Rolling bearings are critical components in mechanical equipment and find widespread use across various industrial fields [1]. The reliability of these bearings directly impacts the equipment's performance, making fault detection paramount for ensuring industrial production safety and warranting significant research effort. In recent years, numerous intelligent fault diagnosis (IFD) techniques [2][3][4][5][6] have been proposed to enhance the dependability of systems and ensure machinery and equipment safety, including support vector machines (SVM) [7], convolutional neural networks (CNN) [8], and deep belief networks [9]. Such intelligent classification models have made remarkable progress on fault diagnosis problems using large amounts of labeled bearing data.
However, real-world industrial scenes present a challenge due to limited labeled samples and fault features that shift with varying operational conditions [10]. Applying a model trained under one specific operating condition to another can shift the classification boundaries owing to variations in loads, speeds, and equipment wear, ultimately degrading model performance [11]. Extensive research in fault diagnosis has explored unsupervised domain adaptation (UDA) [12][13][14] as a means to enhance diagnostic performance by addressing issues such as insufficient labeled samples and domain shift. Using UDA, discriminative and domain-invariant features can be learned from the labeled data of single or multiple source domains and then applied to different but related unlabeled target domains. Mainstream UDA methods use domain adversarial training [15,16] or maximum mean discrepancy [17,18] to align the features of the source domain with those of the target domain, thereby narrowing the inter-domain distribution gap and improving classification performance. For example, Wang et al [19] build a Wasserstein distance-based semi-supervised deep adversarial UDA network for learning domain-invariant features and combine it with supervised instance-based methods to learn discriminative features with better intra-class cohesion and inter-class separability. Cao et al [20] combine a clustering strategy with a UDA adversarial network to suppress negative transfer during partial domain adaptation. To address the challenges posed by noisy data and changing working environments, Su et al [21] propose a hierarchical branch CNN (HB-CNN) scheme that uses three-level labels to represent fault detection, fault isolation, and fault identification. Zhang et al [22] propose an enhanced CNN-based approach for fault diagnosis that uses multiple parallel convolutional layers to construct a multi-mode CNN; this approach is effective in extracting diverse and complementary fault features and addresses the classification of bearing faults under varying operating conditions. Guo et al [23] propose a deep convolutional transfer learning network, which obtains a shared feature representation for bearing diagnosis from vibration signals through condition recognition, while simultaneously maximizing the recognition error between domains and minimizing the distance between probability distributions in order to learn domain-invariant features.
The traditional UDA approach presupposes that source-domain data are available; otherwise, inter-domain features cannot be aligned. In reality, however, the datasets collected by various laboratories are not convenient to share, because the original signals carry confidential information such as physical parameters, machine operating conditions, and production capacity. Considering the issue of privacy protection, federated learning (FL) provides a new solution [24][25][26]. FL consists of two parts: the server and the clients. At the beginning of the FL process, each client downloads the global common model from the server and trains a local model using its own data. The server then aggregates the local models from the clients to form a common model. After many epochs of training on the clients and interactions between clients and server, a global model with better generalization is obtained. In FL, the privacy of local data is preserved, as the server only uses the parameter weights of the local models and does not require direct access to the labeled data of the clients. A small group of researchers has introduced FL into the realm of IFD. Zhang and Li [27] first propose an FL method for IFD, using prior distributions to bridge the domain gap indirectly. Subsequently, Zhang and Li [28] further propose a federated transfer learning approach for fault diagnosis that considers domain shift phenomena and data privacy issues. However, the multiple clients of FL correspond to multiple source domains, so FL cannot be applied to the common single-source-domain UDA scenario.
This paper tackles the issue of single-source-domain privacy preservation, focusing on the challenging scenario of source-free unsupervised domain adaptation (SFUDA) within the broader field of UDA. The SFUDA scenario takes privacy preservation into consideration and does not directly access the source-domain data during the training of the target model; put another way, SFUDA uses the source model as a black-box predictor, in a manner similar to FL. SFUDA is a worthy research direction: although some researchers have explored it in the field of vision [29][30][31] in recent years, related studies in the realm of IFD remain scarce. Current SFUDA research proceeds in two stages: source-domain generalization and target-domain adaptation. These two stages address the critical challenges of acquiring a source model with robust generalization capability and enhancing the target model's effectiveness under cross-domain adaptation. In brief, SFUDA is a particularly challenging UDA scenario that involves additional obstacles due to the restriction on source-domain data access: it applies an already-trained source model to an unlabeled target domain without using the labeled source-domain data. In addition, compared with the multi-source-domain scenario, the single-source domain has a more homogeneous data distribution, which makes improving the generalization of the source model more challenging. Zhu et al [32] use the above two-stage training approach to address the SFUDA problem under privacy protection: virtual adversaries are first introduced into the source-domain generalization training, and the model is then adaptively fine-tuned on the target-domain data by minimizing losses such as information maximization, showing good results on classification tasks.
To address fault diagnosis in the SFUDA scenario, this paper introduces knowledge distillation (KD) [33] and, drawing on the knowledge adaptation network DINE, proposes a domain augmentation-based KD framework. Our algorithm also takes the classical two-stage approach to the classification task in SFUDA scenarios, as shown in figure 1. The first stage extracts knowledge from the teacher model of the source domain; the second stage achieves knowledge transfer across domains without using source-domain data.
To improve the generalization of the source model, the first stage performs domain augmentation on the source-domain samples; the second stage uses the classical distillation method to transfer knowledge from the source domain to the target domain and adds a structural regularization loss to further improve the model's cross-domain classification performance. The main contributions are as follows:
• This paper focuses on the privacy-preservation issue in the SFUDA scenario, an area that has received limited attention in the field of IFD.
• We propose a KD scheme based on domain augmentation and introduce structural regularization into the unsupervised distillation to further enhance the model's cross-domain classification capability.

Method
Our primary concern is solving the realistic UDA setting of the cross-domain fault diagnosis classification task, where the unlabeled target domain only has access to a single black-box source predictor. This section presents the problem formulation and notation for SFUDA in this paper and introduces our proposed method.

Problem formulation and notations
For the single-source UDA scenario, the source domain is represented as $D_s = \{(x_i^s, y_i^s)\}_{i=1}^{n_s}$ and the unlabeled target domain as $D_t = \{x_i^t\}_{i=1}^{n_t}$, where $n_s$ and $n_t$ are the numbers of samples within the respective domains. The goal of UDA is to fit the class labels $\{y_i^t\}_{i=1}^{n_t}$. The greatest distinction between SFUDA and conventional UDA is that with SFUDA only the source model $f_s$ is accessible, and the source-domain data cannot be accessed. A feature extractor $g_s(x_s, \theta_s) = F_s$ and a classifier $h_s(F_s) = \hat{y}_s$ make up the pretrained model $f_s(x_s, \theta_s) = h_s(g_s(x_s))$, where $\theta_s$ stands for the parameters of $f_s$, $F_s$ is the feature extracted by $g_s$, and $\hat{y}_s$ represents the predicted label. SFUDA seeks to train the target model $f_t(x_t, \theta_t) = h_t(g_t(x_t))$ using $x_t$ and $f_s$, where $\theta_t$ stands for the target model's parameters.

Source domain generalization
During UDA, strongly generalized pre-trained source models play a crucial role in enhancing the practicality and performance of unlabeled target models. Therefore, our proposed method first performs source-domain generalization. This section involves the application of label smoothing (LS) and domain augmentation techniques.

LS.
The LS method helps enhance the discriminability of $f_s$ to some extent by encouraging features to form tightly clustered groups. The LS objective can be expressed as:

$$\mathcal{L}_{ls} = -\mathbb{E}_{(x_s, y_s) \in D_s} \sum_{j=1}^{K} \left[(1-\eta)\,\mathbf{1}_{j=y_s} + \eta/K\right] \log p_j(x_s), \tag{1}$$

where $(1-\eta)\mathbf{1}_{y_s} + \eta/K$ stands for the smoothed label vector, $\eta$ is the smoothing parameter (set to 0.1), $K$ is the number of classes, $\mathbf{1}_j$ is a $K$-dimensional one-hot encoding, and $p_j(x_s)$ is the predicted probability of class $j$.
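As a concrete illustration, the following PyTorch sketch implements the smoothed cross-entropy of equation (1); the function name and tensor shapes are our own illustrative choices, not part of the original method description.

```python
import torch.nn.functional as F

def label_smoothing_loss(logits, targets, eta=0.1):
    """Cross-entropy against the smoothed label vector
    (1 - eta) * one_hot(y) + eta / K from equation (1)."""
    K = logits.size(1)                                    # number of classes
    log_probs = F.log_softmax(logits, dim=1)
    one_hot = F.one_hot(targets, num_classes=K).float()
    smoothed = (1.0 - eta) * one_hot + eta / K            # smoothed labels
    return -(smoothed * log_probs).sum(dim=1).mean()
```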

Domain augmentation.
Due to the difficulty and high cost of collecting labeled data from actual industrial machines, publicly available datasets and data collected in laboratories contain only a limited number of bearing failure types and operating conditions. Especially in the single-source-domain UDA scenario, acquiring a pre-trained source model with strong generalization ability is challenging using only source-domain data from a single operating condition. Therefore, in this study we consider expanding the source-domain data, expecting to learn generalized fault diagnosis knowledge from more diverse source-domain training data. To increase the size of the source-domain training set, this paper implements a domain augmentation technique that uses an interpolation function to stretch the time-domain vibration signal samples, yielding additional pseudo-domains, so that the single source domain is extended to multiple similar but differently distributed pseudo-domains. In other words, the interpolation operation realizes the transformation from a single-source domain to a multi-source domain, and the pseudo-domains expand the single-source-domain data to simulate the multi-source UDA scenario. By enriching the distribution space of the source-domain samples, this approach improves the generalization ability of $f_s$.
Since the rolling elements repeatedly pass over the fault during operation, bearing failures produce broad-band impulse responses in the acceleration signals [34]. From the dynamics of rotating machinery, it is established that the fault characteristic frequency bears a linear relationship to the shaft speed [35]. Hence, extra pseudo-domains can be generated by manually stretching the time-domain vibration signal data by a certain scale factor α. Specifically, as shown in figure 2, suppose that we require a pseudo-sample of length $N_{input}$. When α > 1, a data sample containing $N_{input}/\alpha$ amplitudes is first selected from the acquired continuous mechanical vibration signal, and this sample is stretched by interpolation to generate a fake sample of length $N_{input}$. Similarly, when α < 1, a data sample of length $N_{input}/\alpha$ is intercepted from the acquired time series and then compressed into a generated sample of length $N_{input}$. Through these interpolation operations, the size of the training set can be increased by generating fake samples for extra domains; each stretch coefficient corresponds to a unique pseudo-domain. By using this technique to simulate the transition from a single-source to a multi-source domain, it is possible to enhance the model's generalization capabilities and mitigate overfitting in model $f_s$. It is worth mentioning that in each transfer task, the domain augmentation technique is applied only when training the source-domain teacher model in the first stage.
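A minimal NumPy sketch of this interpolation-based augmentation is given below; the function name and the use of linear interpolation are our assumptions (the paper does not specify the interpolation kernel).

```python
import numpy as np

def generate_pseudo_sample(signal, start, alpha, n_input=1024):
    """Cut a segment of length round(n_input / alpha) from the raw
    vibration signal and resample it to n_input points, emulating a
    shaft-speed change by the factor alpha (alpha > 1 stretches,
    alpha < 1 compresses the waveform)."""
    seg_len = int(round(n_input / alpha))
    segment = signal[start:start + seg_len]
    old_grid = np.arange(seg_len)
    new_grid = np.linspace(0, seg_len - 1, n_input)
    return np.interp(new_grid, old_grid, segment)
```

With the coefficients finally adopted in the experiments (0.9 and 1.1), each position in the raw signal would yield two additional pseudo-domain samples alongside the original one.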

Target model adaptation
KD.
This article applies KD [33] to the SFUDA scenario so as to acquire fault knowledge from the single model $f_s$. Logit-based KD forces the target model (student) to produce predictions similar to those of the source model (teacher) without attending to the features of the model's intermediate layers. The distillation loss for a target sample $x_t$ is expressed as:

$$\mathcal{L}_{kd} = \mathbb{E}_{x_t \in D_t}\, D_{kl}\big(f_s(x_t)\,\|\,f_t(x_t)\big), \tag{2}$$

where $D_{kl}$ stands for the Kullback–Leibler divergence loss.
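In PyTorch, this KL-based distillation term can be sketched as follows; we assume the black-box teacher returns class probabilities and omit any temperature scaling, which the paper does not mention.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_probs):
    """KL(teacher || student) over a batch, as in equation (2); only the
    teacher's outputs are needed, never its parameters or data."""
    log_p_student = F.log_softmax(student_logits, dim=1)
    # F.kl_div expects log-probabilities as input and probabilities as target.
    return F.kl_div(log_p_student, teacher_probs, reduction="batchmean")
```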

Distillation with structural regularizations.
Due to domain shift, the output of the source teacher model is likely to be corrupted by noise and may be incorrect. Therefore, this article adds global structural information during the distillation process in the target domain. To alleviate the misclassification of a small number of samples, we use mutual-information maximization to make the predictions on target instances more diverse. The objective of maximizing mutual information is as follows:

$$\mathcal{L}_{mi} = I(X_t; Y_t) = H(Y_t) - H(Y_t \mid X_t). \tag{3}$$

Increasing the marginal entropy $H(Y_t)$ keeps the predicted label distribution uniform, while reducing the conditional entropy $H(Y_t \mid X_t)$ sharpens the network's individual predictions.
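A sketch of the batch estimate of equation (3), assuming `probs` holds the softmax outputs of the target model on a batch:

```python
import torch

def mutual_information(probs, eps=1e-8):
    """I(X_t; Y_t) = H(Y_t) - H(Y_t | X_t), estimated over a batch."""
    marginal = probs.mean(dim=0)                                      # p(y)
    h_marginal = -(marginal * (marginal + eps).log()).sum()           # H(Y_t)
    h_conditional = -(probs * (probs + eps).log()).sum(dim=1).mean()  # H(Y_t|X_t)
    return h_marginal - h_conditional
```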

Target model training objective.
The final loss function combining formulas (2) and (3) is as follows:

$$\mathcal{L} = \mathcal{L}_{kd} - \beta\, \mathcal{L}_{mi}, \tag{4}$$

where β is a hyperparameter that has been experimentally fixed to 1.
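Putting the pieces together, one adaptation step might look as follows. This is a sketch reusing the helper functions above; `teacher`, `student`, `target_loader`, and `optimizer` are assumed names, and the sign convention reflects our reading of equation (4) (minimizing the loss maximizes mutual information).

```python
import torch
import torch.nn.functional as F

beta = 1.0  # hyperparameter from equation (4)
for x_t in target_loader:                                # unlabeled target batch
    with torch.no_grad():
        teacher_probs = F.softmax(teacher(x_t), dim=1)   # black-box f_s outputs
    student_logits = student(x_t)
    student_probs = F.softmax(student_logits, dim=1)
    loss = distillation_loss(student_logits, teacher_probs) \
           - beta * mutual_information(student_probs)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```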

Dataset
Open-source datasets are the foundation for researching, comparing, and evaluating different methods. In this article, we evaluate our method's performance on two public datasets, namely the Case Western Reserve University (CWRU) dataset and the Paderborn University (PU) dataset. The CWRU dataset, supplied by CWRU [34], is a highly popular open-source dataset for fault diagnosis. As illustrated in figure 3(a), the CWRU data acquisition device comprises five components: a fan-end bearing, an electric motor, a drive-end bearing, a torque transducer and encoder, and a dynamometer. In this paper, we use drive-end bearing failure data with a sampling frequency of 12 kHz. Nine fault categories are created by combining three fault types with three fault severities; together with the normal data, there are ten classes in total. The specific fault category information is shown in table 1. In addition, the CWRU dataset is divided into four working conditions, 0, 1, 2, and 3, according to the different motor loads, corresponding to speeds of 1797, 1772, 1750, and 1730 rpm, respectively (table 2). To facilitate the description of transfer tasks, we define an identifier for each task, as shown in table 3. For example, $T_{01}$ denotes a source domain of working condition 0 and a target domain of working condition 1. In this study, we focus on transfer tasks from a particular source domain to the various target domains. For example, when working condition 0 serves as the source domain, our method transfers its knowledge to working conditions 1, 2, and 3, corresponding to three transfer learning tasks: $T_{01}$, $T_{02}$, and $T_{03}$. Therefore, there are 12 transfer learning tasks across these working conditions. The PU dataset [36] contains both artificial and natural damage, with a sampling frequency of 64 kHz. The acquisition equipment for the PU dataset is shown in figure 3(b). The PU dataset is more complex than the CWRU dataset, and the existence of compound faults increases the difficulty of the classification task in the SFUDA scenario. The PU dataset is likewise split into four working conditions based on different speeds, radial bearing forces, and drive-train load torques, as indicated in table 2; these are also designated 0, 1, 2, and 3. The transfer task identifiers are consistent with those of CWRU, but it is important to note that the working conditions in the PU and CWRU datasets are not the same (table 2). In addition, table 4 displays the fault category information for the 14 bearings.
In addition, table 2 lists the number of original, augmented, and total samples corresponding to each working condition for both datasets, where the total number of samples is the sum of the original and augmented samples. In the pre-training stage, only the source-domain data are used, and the original source-domain samples are augmented to expand the training set. The transfer task identifiers in table 3 pair the working conditions as follows:

Source domain: 0 0 0 1 1 1 2 2 2 3 3 3
Target domain: 1 2 3 0 2 3 0 1 3 0 1 2

Baselines
In the experiments, we use several methods to validate the efficacy of our proposed approach, namely Basic, DINE [37], NoAug, NoLS, NoSR, and AugOnly. To be specific, the Basic method, i.e. the outcome of the first pre-training stage, takes into account only the source-domain data and does not use transfer learning. DINE is a relatively new two-step knowledge adaptation framework that first distills knowledge from the source predictor into a student model and then fine-tunes the distilled model to further adapt it to the target domain. The NoAug method, which removes the domain augmentation part of our method, validates the efficacy of domain augmentation; NoLS verifies the role of LS in source-domain generalization; and NoSR tests the role of the regularization in target-model adaptation. The AugOnly method verifies the impact of the techniques other than domain augmentation. All the above methods share the same backbone and experimental setup as the proposed method, allowing a fair assessment of its performance.

Implementation details
We use the PyTorch framework in our experiments, and the average classification accuracy over three replications of each transfer task is taken as the measure of classification performance. The first stage trains the source model using all samples in the source domain $D_s$, and the second, distillation-based domain adaptation stage uses only the unlabeled data in the target domain together with $f_s$ to finally obtain the model $f_t$. Implementation details are described below in terms of data pre-processing, the structure of the backbone network, and experimental settings.

Data pre-processing.
The pre-processing of the data contributes to the method's performance to some degree. For example, data cleaning, noise reduction, normalization, or applying the short-time Fourier transform to convert the signal into a time-frequency map will all affect the final result. In this article, non-overlapping sampling is performed on the original vibration signal with a sample length of 1024. Each sample is then subjected to Z-score normalization, and the normalized data are the final input of the backbone network. The normalized sample $x_i^{normalize}$ is obtained as

$$x_i^{normalize} = \frac{x_i - \mu}{\sigma},$$

where $\mu$ represents the mean of $x_i$ and $\sigma$ its standard deviation.
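The segmentation and normalization described above can be sketched as follows (the function name is illustrative):

```python
import numpy as np

def make_samples(signal, length=1024):
    """Non-overlapping segmentation of the raw vibration signal followed
    by per-sample Z-score normalization."""
    n = len(signal) // length
    samples = signal[:n * length].reshape(n, length)
    mu = samples.mean(axis=1, keepdims=True)
    sigma = samples.std(axis=1, keepdims=True)
    return (samples - mu) / (sigma + 1e-8)  # small eps guards against sigma = 0
```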

The structure of the backbone.
The backbone network in this paper is consistent with the backbone network of UDTL [38]. It consists of four stacked 1D convolution modules, each of which adds a ReLU activation function and a batch normalization layer. Moreover, the pooling layers adopt both max pooling and adaptive max pooling. Figure 4 illustrates the specifics of the backbone network, where C stands for the number of classes.
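For orientation, a PyTorch sketch of such a backbone is given below; the channel counts, kernel sizes, and adaptive pooling output size are illustrative assumptions, since the exact configuration follows UDTL [38] and figure 4.

```python
import torch.nn as nn

class Backbone(nn.Module):
    """Four stacked 1D convolution blocks (Conv1d + BatchNorm + ReLU),
    max pooling between blocks and adaptive max pooling before the
    classifier; C output classes."""
    def __init__(self, num_classes):
        super().__init__()
        chans = [1, 16, 32, 64, 128]  # assumed channel progression
        blocks = []
        for c_in, c_out in zip(chans[:-1], chans[1:]):
            blocks += [nn.Conv1d(c_in, c_out, kernel_size=3),
                       nn.BatchNorm1d(c_out),
                       nn.ReLU(),
                       nn.MaxPool1d(2)]
        blocks[-1] = nn.AdaptiveMaxPool1d(4)  # last block pools adaptively
        self.features = nn.Sequential(*blocks)
        self.classifier = nn.Linear(128 * 4, num_classes)

    def forward(self, x):  # x: (batch, 1, 1024)
        return self.classifier(self.features(x).flatten(1))
```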

Experimental settings.
In all compared methods, we adopt the same backbone network for fair comparison. The Adam optimizer is used for all experiments, and the backpropagation algorithm is used for all parameter updates. The initial learning rate lr is set to 0.001, the batch size bs is set to 64, and the hyper-parameter β is set to 1.

Evaluation metrics
Our evaluation metric for measuring model performance is the average classification accuracy, represented as

$$\mathrm{Accuracy} = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left[f_t(x_i^t) = y_i^t\right],$$

where $n$ refers to the total number of target samples, $y_i^t$ denotes the fault label of the $i$-th target sample, and $f_t(x_i^t)$ denotes the predicted label.

Discussion and analysis
We compare the proposed method with other methods on two datasets: the CWRU dataset and the PU dataset. The average results for all approaches' classification accuracy on these two datasets are shown in tables 5 and 6, respectively.

Results on CWRU.
Because the differences between domains are small, the CWRU dataset is less difficult to classify, and the Basic method reaches 90.78%. While DINE performs remarkably on image classification tasks, its accuracy on the CWRU dataset is actually lower than that of Basic. Compared with our method, NoAug and NoLS drop by 3.21% and 0.35%, respectively, which shows the effectiveness of the domain augmentation technique and LS in improving the source model's generalization capacity. In addition, the results of NoSR are poorer still, 7.03% lower than Basic and 13.66% lower than our method, which indicates that the regularization directly affects the adaptive ability of the target model; without it, the model's learning ability suffers. Overall, our proposed method exhibits superior performance on the CWRU dataset, achieving an accuracy of 97.41%.

Results on PU.
On the PU dataset, with an average accuracy of only 32.57% across the 12 transfer tasks, the Basic method's diagnostic performance on cross-domain classification tasks is clearly inadequate. On the one hand, applying fault knowledge acquired from a source domain to target domains suffers from a serious domain shift problem; on the other hand, the compound faults in the PU dataset make the classification more difficult. DINE is a distillation and fine-tuning framework proposed for the SFUDA scenario in the visual field, and its results do not improve greatly when it is applied to the realm of IFD: although it improves by 8.22% over the Basic method, it is still nearly 10% worse than our method. On the basis of the Basic method, AugOnly adds only the domain augmentation technique, and the final average accuracy increases by 5.01%; in addition, the accuracy of the NoAug method is 5% lower than that of our proposed method. From these two sets of comparison experiments, it can be seen that the domain augmentation strategy provides a significant improvement in the model's generalization. The effectiveness of NoLS is comparable to that of our method; evidently the LS technique used in source-model training does not improve the final result as much as the domain augmentation technique. In contrast, the results of NoSR are worse, less than 5% higher than Basic and 13.26% lower than our method.

Results on source domain generalization.
In the source-domain generalization stage, we adopt the domain augmentation technique to expand the source-domain data: pseudo-domain samples are generated with different stretching coefficients, enlarging the source-domain sample size and thereby improving the generalization ability of the teacher model. In particular, domain augmentation improves the result of Basic from 32.57% to 37.58% on the PU dataset. The effectiveness of domain augmentation can also be seen from the comparison between NoAug and our method. By comparison, the augmentation technique is less effective on CWRU. While our method achieves an accuracy 3.21% higher than NoAug, the result of AugOnly in the pre-training stage is barely improved over the Basic method (90.93% versus 90.78%). The primary reason is the relatively minor domain shift within the CWRU dataset, which enables direct transfer of the source model to the target domain with accuracy already exceeding 90%. Meanwhile, although the domain augmentation technique increases the source-domain samples and can improve the performance of $f_s$, it also has the potential to distort the original data distribution and increase the divergence between the source and target distributions; this is why the AugOnly result is almost the same as that of the Basic method, with domain augmentation failing to enhance generalization here. Therefore, the domain augmentation technique is better suited to harder classification tasks and may be counterproductive for simpler ones.

Results on target domain adaptation.
After the domain generalization phase of pre-training, we obtain a source model $f_s$ with stronger generalization ability, and the performance of $f_s$ tends to be proportional to the result of the final target model. Tables 5 and 6 report the results of the 24 transfer tasks across the two datasets, in 14 of which our method achieves the highest accuracy, followed by the NoLS and NoAug methods. This result is also consistent with the analysis in the previous three sections. The difference in transfer difficulty between the datasets themselves leads to a 47.24% gap in the final results for the two datasets.

Visualization results.
To conduct a detailed analysis of the classification accuracy for each fault class, we use the transfer task $T_{03}$ as a case study and compute the corresponding confusion matrices for the Basic, AugOnly, and NoSR comparison methods. Additionally, we apply t-SNE to observe the feature distribution of the model output. The subgraphs in figures 5 and 6 correspond one-to-one. From figure 5(a), the classification performance of the Basic method on the three categories with labels 1, 4, and 8 is very unsatisfactory, with 32, 77, and 27 misclassified samples, respectively; the corresponding t-SNE plot also clearly shows that the four categories BF-7 and BF-21 in the red circle and IF-14 and OF-21 in the blue circle cannot be well distinguished and overlap heavily. Compared with Basic, the AugOnly method, which uses only domain augmentation, shows good classification ability on the inner fault with label 1, with no misclassified samples. However, the classification of the two classes with labels 4 and 8 is still very poor, with 42 and 75 misclassified samples, respectively. This is because the domain augmentation technique somewhat distorts the inter-domain distribution differences, which introduces some interference into the results, as depicted by the red circle in figure 6(b). As demonstrated by panels (c) of figures 5 and 6, the regularization plays a crucial role in determining the final classification outcomes. The NoSR method experiences an unfavorable convergence pattern during training, which hinders its ability to learn good class features and ultimately yields results worse than the Basic method. In contrast, our proposed method classifies the samples very well. As shown in figure 6(d), there are clear gaps between classes, and merely a negligible fraction of samples is classified incorrectly. On the CWRU dataset, although the domain augmentation technique has no significant effect in the pre-training stage and even introduces some disturbance, after the learning and correction of the target-model adaptation stage it nevertheless exerts its advantage of enhancing domain generalization. Overall, the above results and analysis demonstrate the effectiveness of the domain augmentation technique and the regularization.
Parameter sensitivity experiment.
We investigate the sensitivity of the parameter β on the CWRU dataset over the candidate values {0.0, 0.2, 0.5, 1.0, 2.0, 5.0}. Combining figures 7(a) and (b), the results show that setting β = 1 yields the highest accuracy across all 12 transfer tasks. Therefore, we set it to 1 to obtain the best performance from our proposed method.
In addition, the choice of augmentation coefficients affects the domain generalization ability of $f_s$. On the one hand, selecting too many augmentation coefficients increases the number of samples in $D_s$ and thus decreases the training efficiency of the model; on the other hand, if too few coefficients are selected, the model's generalization capacity will not improve significantly. Therefore, we carry out multiple sets of experiments on the PU dataset and select an appropriate number and value for the augmentation coefficients α, taking into account both the training efficiency and the generalization ability of $f_s$.
We empirically select the appropriate augmentation coefficients from 0.8, 0.9, 1.1, 1.2, and 1.5. In this experiment, we focus only on the pre-training results. The upper limit of the training epochs is fixed at 150, and the remaining settings are the same as in the pre-training part of our method. The specific experimental results can be found in table 7. The Basic method performs no stretching operations on the source-domain data, corresponding to α = 1. Table 7 reveals that the generalization ability of $f_s$ becomes stronger as the augmentation coefficient increases, with the classification accuracy increasing by 1.93%, 2.55%, and 5.72%, respectively. In order for the augmented domains to include both compressed and stretched states while also considering the efficiency of model training, we finally chose 0.9 and 1.1 as the augmentation coefficients. The experimental results also show that with these two augmentation coefficients, 7 out of 12 transfer tasks achieve the highest classification accuracy, showing strong domain generalization ability.

Conclusion
This paper considers a challenging SFUDA scenario. In order to solve the privacy problem under SFUDA, we do not use the source-domain data directly in the training of the target domain and only use the pre-trained model $f_s$ as the teacher to guide $f_t$ (the student model). The method proposed in this research comprises two steps: source-model generalization and target-model adaptation. The generalization phase of the source model incorporates domain augmentation, with suitable augmentation coefficients selected to boost the effectiveness of the baseline approach. In the target model's training stage, regularization is introduced to take the target domain's data structure into account, further reducing inter-domain differences and easing the domain shift problem. Our method exhibits superior performance compared with the baselines on the PU and CWRU datasets. These results highlight the effectiveness of the domain augmentation technique and the regularization.