A novel unsupervised dynamic feature domain adaptation strategy for cross-individual myoelectric gesture recognition

Objective. Surface electromyography pattern recognition (sEMG-PR) is considered a promising control method for human-machine interaction systems. However, the performance of a trained classifier degrades greatly for novel users, since sEMG signals are user-dependent and strongly affected by individual factors such as the quantity of subcutaneous fat and the skin impedance. Approach. To solve this issue, we propose a novel unsupervised cross-individual motion recognition method that aligns sEMG features from different individuals by self-adaptive dimensional dynamic distribution adaptation (SD-DDA). In this method, the distances of both the marginal and conditional distributions between source and target features are minimized, and the optimal feature domain dimension is selected automatically using a small amount of unlabeled target data. Main results. The effectiveness of the proposed method was tested on four different feature sets, and the results showed that the average classification accuracy improved by more than 10% on our collected dataset, with the best accuracy reaching 90.4%. Compared with six classic transfer learning methods, the proposed method achieved improvements of 3.2%-13.8%. Additionally, the proposed method achieved an improvement of approximately 9% on a publicly available dataset. Significance. These results suggest that the proposed SD-DDA method is feasible for cross-individual motion intention recognition, which would support the application of sEMG-PR-based systems.


Introduction
Surface electromyography (sEMG) is commonly used to decode human physiological activities and monitor neuromuscular functions due to its simple acquisition and rich information content [1,2]. By analyzing sEMG signals with pattern recognition techniques, the motion intent contained in the signals can be decoded and used as a control signal for prosthetic or rehabilitation hands [3,4] or as an interactive signal in virtual and augmented reality scenarios [5,6]. In recent years, pattern recognition based on sEMG signals has become a research hotspot because it enables intuitive control of assistive devices and real-time interaction in human-computer systems. However, sEMG signals are user-dependent, as they are largely affected by factors such as the quantity of subcutaneous fat, skin impedance, and electrode position [7,8]. Even when users perform the same intended motion class, sEMG signals recorded from different users often exhibit noticeable differences, which poses challenges for developing myoelectric systems that work effectively for multiple users.
To achieve adequate classification accuracy for new users, EMG-based motion recognition methods built on traditional machine learning algorithms [9][10][11][12], such as linear discriminant analysis (LDA), artificial neural networks (ANN), and support vector machines, or on deep learning models such as convolutional neural networks (CNN) and long short-term memory networks, often require collecting sEMG signals from new users for each motion class and retraining a user-specific classifier. However, collecting training data from new users usually requires them to perform motions according to a designed experimental paradigm, which is time-consuming and tedious [13,14]. Therefore, developing a more efficient cross-individual motion intention recognition method is crucial for the practical application of myoelectric systems and has attracted the attention of many researchers worldwide.
Transfer learning (TL) is an effective approach to the problem of cross-individual pattern recognition [15]. In TL, a motion classifier is pre-trained on the sEMG data of multiple users to capture domain knowledge. This knowledge can then be transferred to new users using a small amount of their data. In this way, TL vastly reduces the number of training samples required from new users. Kim et al proposed a subject-transfer framework that selected and fine-tuned pre-trained CNN classifiers using the labeled EMG data of the new subject's hand movements, which obtained 23.2% higher accuracy than the model trained using only the new subject's EMG data [16]. Khushaba et al designed a canonical correlation analysis (CCA) framework to extract individual-irrelevant feature sets from different users for building a cross-individual motion classifier, and achieved 83% accuracy across multiple users [17]. Kobylarz et al achieved a recognition accuracy of around 97% on new subjects by introducing inductive and supervised transductive approaches and using five seconds of EMG data per motion class as calibration data [18]. Although these supervised methods can improve the accuracy of motion intent recognition for new users, they usually require a labeled dataset from the new users for TL, which increases the burden of providing labeled calibration data.
Compared with supervised methods, unsupervised TL methods only require unlabeled data from novel users, which eliminates additional hardware set-up and the time-consuming process of data relabeling. In particular, the unlabeled data generated during long-term usage can be used to retrain the unsupervised TL model continually [19]. Recently, an increasing number of unsupervised methods have been proposed in the field of TL. One common strategy is to align the feature representations of the source and target domains in a common feature subspace. For example, Zhang et al proposed a joint geometrical and statistical alignment framework to align the features of different domains in a common subspace, and achieved outstanding performance in cross-domain visual recognition tasks [20]. Wang et al proposed the dynamic distribution adaptation (DDA) algorithm to find the optimal feature subspace that reduces inter-domain differences, which clearly improved the performance of unsupervised TL in digit recognition, sentiment analysis and image classification [21]. In these methods, the selection of hyperparameters, such as the dimension of the feature subspace and the regularization parameters, greatly affects the performance of feature alignment. Among the hyperparameters, the dimensionality of the resulting subspace plays a crucial role in finding a common feature representation of the source and target domains [22,23]. As demonstrated by Chen et al, a large dimension could not effectively remove irrelevant information, while a small dimension would impair the identification capacity [24]. In previous studies, the optimal dimension was often determined by prior verification on the source domain [21,25]. However, due to the large individual differences in sEMG signals, the dimension determined from source users may not be applicable to new individuals, which would degrade the performance of the unsupervised methods. Therefore, it is necessary to develop an unsupervised sEMG-based cross-individual TL method that can adaptively determine the dimensionality of the common subspace.
In this study, we propose an optimized unsupervised feature domain adaptation strategy, namely self-adaptive dimensional dynamic distribution adaptation (SD-DDA), which automatically selects the optimal feature dimension based on unlabeled target domain data from novel users to improve the performance of cross-individual motion recognition. In this method, the feature spaces of different individuals are mapped to the same subspace by dynamically aligning the marginal and conditional distributions of features between individuals, and the optimal dimensionality of the feature subspace is adjusted for each new subject by estimating the pre-classification performance on the target individual's unlabeled data. The effectiveness of the algorithm was verified on our collected dataset and a publicly available dataset. Furthermore, the effects of different feature sets, the data length of the new individual and the number of source individuals on the performance of the proposed method were comprehensively investigated.

Method
DDA [21] is a feature domain adaptation method that minimizes the distances of the marginal and conditional distributions between the source and target domains through dimensionality reduction, which notably improves the performance of TL in digit recognition, sentiment analysis and image classification. However, a major limitation of the traditional DDA method is that the dimensionality determined from source individuals is fixed during the TL process, and this fixed dimension is not necessarily applicable to target individuals because of the large individual differences in sEMG signals. To address this issue, we propose the SD-DDA framework shown in figure 1. In this framework, the sEMG signals of the source and target subjects are segmented and features are extracted to compose the source and target datasets, respectively. Next, the labeled source dataset and the unlabeled target dataset are input into the SD-DDA algorithm as the source and target domain data, respectively. Then, the transfer matrix is obtained by minimizing the difference between the source and target domain data, where the optimal dimension of the common feature subspace is automatically selected by estimating the performance of the pre-classification. Finally, the source and target datasets are mapped to a common feature subspace by the transfer matrix to obtain the training and testing datasets, respectively.

Dynamic distribution adaptation (DDA)

Marginal and conditional distribution
To minimize the difference between the source and target domain data, a feature transfer matrix is used to map the source and target domains to the same subspace. In this subspace, the distributions of the source and target domains should be as close as possible, which means that their marginal and conditional distributions should be similar after transfer. This alignment of the distributions is crucial for effective domain adaptation, as it ensures that the learned feature subspace is applicable to both the source and target domains.
In the DDA method, the maximum mean discrepancy (MMD) is used to estimate the distance of the marginal distribution and the distance of the conditional distribution between the source and target domains. The MMD of the marginal distribution between the source domain (Ds) and the target domain (Dt) is given by equation (1), where n and m represent the total number of samples in the source and target domains, respectively, and ||·||_H is the reproducing kernel Hilbert space (RKHS) norm, the RKHS being a complete normed space with an inner product. The MMD asymptotically approaches zero if and only if the two distributions are the same in the RKHS. A is a matrix that transfers Ds and Dt to a common feature subspace. The distance of the conditional distribution between Ds and Dt is given by equation (2), where C is the number of motion categories, and n_c and m_c are the numbers of samples of the cth motion category in the source and target domains, respectively. Since the sEMG data of target individuals are unlabeled, the categories c = 1 to C of the target data are pseudo-labels obtained by pre-classification, in which the classifier trained on the labeled source domain data is applied to the unlabeled target domain data. To obtain matrix A, we convert the MMD into a matrix representation by applying the kernel trick and the identity ||W||^2 = trace(WW^T), which are commonly used in dimensionality reduction-based domain adaptation methods [26,27]. Equations (1) and (2) can then be expressed in trace form in terms of A, the combined data X = [Xs, Xt] of the source and target domains, and the MMD matrices M_0 (marginal) and M_c (conditional, one per motion category). The objective that combines the two distances of the marginal and conditional distributions additionally contains the regularization term λ||A||_F^2 and the adaptive factor µ ∈ [0,1], which indicates the importance of the conditional distribution. The problem of minimizing the difference between the source and target domain data is then equivalent to a trace optimization problem [28].
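For reference, the quantities described in this subsection are commonly written as follows in the JDA and DDA literature [21,27]. This is a sketch in standard notation, assuming that the columns of X are samples and that A acts as a linear map; the superscript (c) on Ds and Dt (samples of category c) is notation introduced here, and the exact typesetting and numbering of the original equations may differ.

```latex
\begin{align}
\mathrm{MMD}_{M}(D_s, D_t) &= \Big\| \frac{1}{n}\sum_{x_i \in D_s} A^{T}x_i
  - \frac{1}{m}\sum_{x_j \in D_t} A^{T}x_j \Big\|_{\mathcal{H}}^{2}
  = \operatorname{tr}\!\big(A^{T} X M_0 X^{T} A\big), \\
\mathrm{MMD}_{C}(D_s, D_t) &= \sum_{c=1}^{C} \Big\| \frac{1}{n_c}\sum_{x_i \in D_s^{(c)}} A^{T}x_i
  - \frac{1}{m_c}\sum_{x_j \in D_t^{(c)}} A^{T}x_j \Big\|_{\mathcal{H}}^{2}
  = \sum_{c=1}^{C} \operatorname{tr}\!\big(A^{T} X M_c X^{T} A\big), \\
(M_0)_{ij} &=
  \begin{cases}
    1/n^{2}, & x_i, x_j \in D_s \\
    1/m^{2}, & x_i, x_j \in D_t \\
    -1/(nm), & \text{otherwise}
  \end{cases} \\
(M_c)_{ij} &=
  \begin{cases}
    1/n_c^{2}, & x_i, x_j \in D_s^{(c)} \\
    1/m_c^{2}, & x_i, x_j \in D_t^{(c)} \\
    -1/(n_c m_c), & x_i \in D_s^{(c)},\, x_j \in D_t^{(c)} \ \text{or vice versa} \\
    0, & \text{otherwise}
  \end{cases} \\
\min_{A}\ & (1-\mu)\operatorname{tr}\!\big(A^{T} X M_0 X^{T} A\big)
  + \mu \sum_{c=1}^{C} \operatorname{tr}\!\big(A^{T} X M_c X^{T} A\big)
  + \lambda \|A\|_{F}^{2}
\end{align}
```

With µ = 0 only the marginal term remains, and with µ = 1 only the conditional term remains, which matches the role of µ described in the text.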

Minimizing distance of source and target domain
To map the source and target domains into a common subspace in which the features of the two domains can be efficiently aligned while the differences between motion classes remain distinguishable, Long et al proposed a dimensionality reduction framework that reduces the distances of the marginal and conditional distributions between the source and target domains [27]. In this method, a constraint is added on the transformed data, in which H is the centering matrix and I is the identity matrix. The resulting optimization objective is given by equation (9). According to the theory of constrained optimization, this objective can be converted into the generalized eigenvalue decomposition of equation (10), where ϕ is the Lagrange multiplier. The eigenvectors of equation (10) are then solved to obtain the eigen matrix A.
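In the JDA-style formulation of Long et al [27] that this paragraph follows, the constraint, the constrained objective and the resulting eigenproblem usually take the form below. This is a sketch in standard notation under the assumption that M_0 and M_c are the marginal and conditional MMD matrices and X = [Xs, Xt]; the shorthand M and the all-ones vector 1 are introduced here for compactness.

```latex
\begin{align}
& A^{T} X H X^{T} A = I, \qquad H = I - \frac{1}{n+m}\,\mathbf{1}\mathbf{1}^{T}, \\
& \min_{A}\ \operatorname{tr}\!\big(A^{T} X M X^{T} A\big) + \lambda \|A\|_{F}^{2}
  \quad \text{s.t.}\ A^{T} X H X^{T} A = I,
  \qquad M = (1-\mu) M_0 + \mu \sum_{c=1}^{C} M_c, \\
& \big( X M X^{T} + \lambda I \big) A = X H X^{T} A\, \phi
\end{align}
```

The last line follows from setting the derivative of the Lagrangian tr(A^T (X M X^T + λI) A) + tr((I − A^T X H X^T A) ϕ) with respect to A to zero, with ϕ a diagonal matrix of Lagrange multipliers.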
Then, the transfer matrix B is obtained by taking k eigenvectors from A, as shown in equation (11). Finally, the source domain data and the target domain data are mapped to a k-dimensional subspace by multiplying them with the transfer matrix B to achieve feature alignment. It is worth noting that k is a crucial parameter that affects the alignment performance of the source and target domains within the common subspace. The value of k is determined adaptively by evaluating the alignment performance according to the method described in section 2.2.
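The procedure above can be sketched numerically as follows. This is a minimal illustration with a linear kernel, not the authors' implementation: the function names `mmd_matrix` and `transfer_matrix`, the default values of `mu` and `lam`, and the choice of the k eigenvectors with the smallest generalized eigenvalues (the usual JDA convention) are assumptions made here.

```python
import numpy as np

def mmd_matrix(n, m, ys=None, yt_pseudo=None, mu=0.5):
    """Weighted MMD matrix (1 - mu) * M0 + mu * sum_c Mc for n source, m target samples."""
    e = np.concatenate([np.full(n, 1.0 / n), np.full(m, -1.0 / m)])
    M = (1.0 - mu) * np.outer(e, e)              # marginal term M0
    if ys is not None and yt_pseudo is not None:
        ys, yt_pseudo = np.asarray(ys), np.asarray(yt_pseudo)
        for c in np.unique(ys):
            src = np.flatnonzero(ys == c)
            tgt = n + np.flatnonzero(yt_pseudo == c)
            if len(src) == 0 or len(tgt) == 0:
                continue                         # class missing from the pseudo-labels
            ec = np.zeros(n + m)
            ec[src] = 1.0 / len(src)
            ec[tgt] = -1.0 / len(tgt)
            M += mu * np.outer(ec, ec)           # conditional term Mc
    return M

def transfer_matrix(Xs, Xt, k, ys=None, yt_pseudo=None, mu=0.5, lam=1.0):
    """Xs: (n, d) source features, Xt: (m, d) target features. Returns B of shape (d, k)."""
    n, m = len(Xs), len(Xt)
    X = np.vstack([Xs, Xt]).T                    # (d, n + m), columns are samples
    M = mmd_matrix(n, m, ys, yt_pseudo, mu)
    H = np.eye(n + m) - np.full((n + m, n + m), 1.0 / (n + m))  # centering matrix
    P = X @ M @ X.T + lam * np.eye(X.shape[0])   # left-hand side of the eigenproblem
    Q = X @ H @ X.T                              # right-hand side (scatter matrix)
    # P a = phi * Q a  is equivalent to  inv(P) Q a = (1 / phi) a,
    # so the smallest phi correspond to the largest eigenvalues of inv(P) Q.
    vals, vecs = np.linalg.eig(np.linalg.solve(P, Q))
    order = np.argsort(-vals.real)[:k]
    return vecs[:, order].real

# Usage: map both domains into the k-dimensional subspace.
# B = transfer_matrix(Xs, Xt, k=5, ys=ys, yt_pseudo=yt_pseudo, mu=0.5)
# Zs, Zt = Xs @ B, Xt @ B
```

Inverting the regularized matrix P rather than the scatter matrix Q keeps the sketch NumPy-only and well conditioned, since P is positive definite whenever `lam` is positive.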

Dynamically adjust the importance of marginal distributions and conditional distribution
To compute the matrix A, the parameter µ, which indicates the importance of the conditional distribution in equation (10), must be calculated first. In the DDA algorithm, µ is determined using the A-distance, a measure that evaluates the distance between different distributions [21]. According to Ben-David et al [29], the A-distance is defined through the error of a linear classifier built to distinguish the two domains.
The A-distance for the marginal distributions is denoted d_M and is computed from err(h), the error rate for classifying the two domains using an LDA classifier. The A-distance for the conditional distributions is denoted d_C and is computed from err^(c)(h), the error rate of domain classification for the cth motion. The parameter µ is then calculated as the proportion of d_C to the sum of d_M and d_C. When the marginal and conditional differences between the domains decrease, the ability of the binary classifier to distinguish the source and target domains decreases, which in turn leads to a decrease in |d_M| and |d_C| [29]. It is therefore straightforward to represent the importance of the marginal and conditional distributions in equation (9) according to the proportion of |d_M| and |d_C|.
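As an illustration of this weighting, the sketch below computes µ from domain-classification error rates. It assumes the proxy A-distance form d = 2(1 − 2·err) from Ben-David et al [29] and averages the per-class distances to obtain d_C; the averaging (rather than summing) and the fallback value when both distances are zero are assumptions made here, and the function names are hypothetical.

```python
def a_distance(err):
    """Proxy A-distance from the error rate of a source-vs-target classifier."""
    return 2.0 * (1.0 - 2.0 * err)

def adaptive_mu(err_marginal, err_per_class):
    """Weight of the conditional term: mu = d_C / (d_M + d_C)."""
    d_m = abs(a_distance(err_marginal))
    # Assumption: d_C is the mean of the per-class distances.
    d_c = sum(abs(a_distance(e)) for e in err_per_class) / len(err_per_class)
    if d_m + d_c == 0.0:
        return 0.5  # assumption: weight both terms equally when both distances vanish
    return d_c / (d_m + d_c)

# Example: domains are easy to separate overall (err = 0.05) but hard to
# separate within each class (err close to 0.5), so the marginal term dominates.
mu = adaptive_mu(0.05, [0.45, 0.40, 0.48])
```

An error rate of 0.5 means the classifier cannot tell the domains apart (distance 0), whereas an error rate of 0 gives the maximum distance of 2.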

Self-adaptive dimensional selection
Since the value of k determines the dimensionality of the new feature space after transfer in equation (11), the optimization of parameter k is another important issue in obtaining the transfer matrix B. The traditional approach determines k from the classification performance on the source users, which may not be suitable for target users because of the large cross-individual differences in sEMG signals. Therefore, we propose a method that automatically selects the optimal dimension based on clustering the unlabeled target domain data. The steps are as follows:
(1) Clustering the target domain data: since the target domain data are unlabeled, k-means clustering [30] is applied to cluster the data. It is assumed that each cluster contains all the samples of one motion class, even though the motion label of each cluster is unknown. The data center of each cluster is then obtained as the average of the samples in that cluster.
(2) Pre-classifying the target domain data: the target domain data are pre-classified using the LDA classifier trained on the source domain data, and the center of each pre-classified class is calculated as the average of the samples assigned to that class.
(3) Calculating the distance between the cluster centers and the pre-classification centers: the Euclidean distance between each cluster center and each pre-classification center is calculated, where i and j denote the ith cluster and the jth pre-classified class, respectively. These distances compose the matrix DE, whose (i, j) entry is the distance between the ith cluster center and the jth pre-classification center.
(4) Estimating the performance of pre-classification and selecting the best dimension: the performance of pre-classification is estimated from the distance between the cluster and the pre-classification of each motion class. However, as the motion labels of the clusters are unknown, the ith cluster may not correspond to the ith pre-classified class in the DE matrix. To address this, each cluster is matched to a pre-classification motion class based on the distance between the cluster center and the pre-classification center. This yields the matrix DE', whose diagonal values are the distances between the cluster centers and their corresponding pre-classification centers. The sum of the diagonal values of DE', denoted Perf_clas, is used to estimate the performance of the pre-classification. Perf_clas is calculated for each candidate dimension, ranging from 1 to the maximum feature dimension, and the dimension that minimizes Perf_clas is selected as the optimal dimension.
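Steps (3) and (4) can be sketched as follows. This is a hedged illustration rather than the authors' code: the text does not specify how clusters are matched to pre-classified classes, so a greedy nearest-pair matching is assumed here, and the function names `preclass_score` and `select_dimension` are hypothetical. The cluster centers and pre-classification centers are assumed to have been computed beforehand (for example with any k-means and LDA implementation) in the k-dimensional subspace under evaluation.

```python
import numpy as np

def preclass_score(cluster_centers, preclass_centers):
    """Perf_clas: sum of distances between matched cluster and pre-classification centers."""
    # DE[i, j] is the Euclidean distance between cluster i and pre-classified class j.
    diff = cluster_centers[:, None, :] - preclass_centers[None, :, :]
    DE = np.linalg.norm(diff, axis=2)
    work = DE.copy()
    total = 0.0
    # Assumption: greedy matching, repeatedly pairing the closest unmatched
    # cluster and class; this sum plays the role of the diagonal of DE'.
    for _ in range(min(DE.shape)):
        i, j = np.unravel_index(np.argmin(work), work.shape)
        total += DE[i, j]
        work[i, :] = np.inf
        work[:, j] = np.inf
    return total

def select_dimension(scores_by_k):
    """scores_by_k maps each candidate dimension k to its Perf_clas; return the best k."""
    return min(scores_by_k, key=scores_by_k.get)

# Toy example with two clusters whose order is swapped relative to the classes.
clusters = np.array([[0.0, 0.0], [10.0, 0.0]])
classes = np.array([[10.0, 1.0], [0.0, 1.0]])
score = preclass_score(clusters, classes)
best_k = select_dimension({1: 5.0, 2: 1.5, 3: 2.0})
```

A globally optimal assignment (for instance the Hungarian algorithm) could replace the greedy loop; the greedy version is used here only to keep the sketch dependency-free.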

Experiment setup and acquisition protocol
In this experiment, eight able-bodied subjects aged 22-33 years participated in the collection of the dataset, and the surface EMG signals were collected by eight wireless electrodes (Trigno wireless system, Delsys Inc., Boston, USA). The experimental protocol was approved by the Institutional Review Board of Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences (SIAT-IRB-200715-H0513). All subjects provided permission for publication of photographs for scientific and educational purposes.

Data acquisition protocol
As shown in figure 2(a), eight wireless electrodes were evenly attached to the skin surface of the forearm in two rings, with 5 cm between the two rows. We collected surface EMG signals of the seven movement classes shown in figure 2(b): hand close (HC), hand open (HO), wrist extension (WE), wrist flexion (WF), wrist pronation (WP), wrist supination (WS), and no movement (NM). There were two experiment sessions for each subject. In each session, subjects were asked to hold each movement for 5 s and repeat it four times. To avoid muscle fatigue, a 5 s rest was given between successive repetitions and a 5 min rest between the two sessions.

Data preprocessing and dataset segmentation
The EMG signals were recorded at a sampling rate of 2000 Hz, and filtered with a band-pass filter from 20 Hz to 450 Hz and a 50 Hz notch filter. Based on the conclusions of the existing literature [31][32][33] and the results of our preliminary experiment, a window size of 100 ms with a 50 ms sliding step was used to segment the sEMG data to obtain a reasonable number of samples (i.e. 5 s/50 ms = 100 windows for one repetition of 5 s of data). To validate the proposed cross-individual motion intent recognition algorithm, for each target subject three subjects were randomly selected from the remaining seven subjects as the source dataset. Thus, there are C(7,3) = 35 possible combinations of source subjects for each target subject, and the labeled data of the source subjects were used to train the feature transfer matrix. For target subjects, the data were unlabeled and the feature sample sequences were randomized, as shown in figure 3.
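The segmentation described above can be sketched as a simple sliding window. The function name `segment` and the array layout (samples by channels) are assumptions made for illustration. Note that without padding, 5 s of data sampled at 2000 Hz yields 99 complete 100 ms windows at a 50 ms step, which is close to the nominal 100 windows quoted in the text.

```python
import numpy as np

def segment(emg, fs=2000, win_ms=100, step_ms=50):
    """Split emg of shape (n_samples, n_channels) into overlapping windows.

    Returns an array of shape (n_windows, win_len, n_channels).
    """
    win = int(fs * win_ms / 1000)    # 200 samples per window at 2000 Hz
    step = int(fs * step_ms / 1000)  # 100 samples between window starts
    n_windows = (emg.shape[0] - win) // step + 1
    if n_windows <= 0:
        return np.empty((0, win, emg.shape[1]))
    starts = np.arange(n_windows) * step
    return np.stack([emg[s:s + win] for s in starts])

# One 5 s repetition recorded from eight channels (placeholder zeros).
repetition = np.zeros((5 * 2000, 8))
windows = segment(repetition)
```

Features such as those in the four feature sets would then be computed per window and per channel.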

Performance evaluation and validation

Performance evaluation
The performance of the proposed SD-DDA was tested on four state-of-the-art sEMG feature sets [5,[34][35][36][37][38], including normalized TD4 (TD4N), root mean square (RMS) with fifth-order autoregressive (AR) coefficients (TDAR), time domain power spectral descriptors (TDPSD), and improved discrete Fourier transform (iDFT). Six advanced TL algorithms, including CCA, principal components analysis (PCA), geodesic flow kernel (GFK), transfer component analysis (TCA), joint distribution adaptation (JDA) and DDA [17,21,22,26,27], were compared with the SD-DDA algorithm. Among them, the dimensionality of the subspaces in which the CCA, PCA, TCA, JDA and DDA algorithms align the different domains needs to be obtained by cross-validation on the training set. In our study, LDA was used as the classifier because of its low computational requirements and high classification performance in sEMG-PR.
To investigate the effect of the target domain data length on the classification accuracy, sEMG data lengths of 0.5 s-6 s were used in this study. In addition, between 2 and 7 subjects were used as the source dataset to investigate the effect of the number of source subjects on the classification performance. The classification accuracy was used to evaluate the cross-individual motion classification performance of the proposed method, and is defined as the number of correctly classified samples divided by the total number of testing samples, expressed as a percentage. In addition, we performed statistical significance analysis of the classification results using a t-test with a significance level of 0.01.

Performance validation on public dataset
In addition to the dataset collected for this paper, a public dataset called the MYO Dataset [7], which was collected with the Thalmic Labs Myo Armband, was used to verify the cross-individual classification performance of the SD-DDA algorithm. The MYO Dataset was built by Cote-Allard et al to study sEMG-based cross-individual gesture recognition algorithms. The dataset contains seven motion classes, which are NM, HC, HO, WE, WF, ulnar deviation (UD) and radial deviation (RD). There are two sub-datasets in the MYO Dataset: a pre-training dataset and an evaluation dataset, which comprise 19 and 17 able-bodied subjects, respectively. In our study, the pre-training dataset and the evaluation dataset were used as the source dataset and the target dataset, respectively, to study the generalization ability of the proposed SD-DDA algorithm.

Cross-individual classification performance of the proposed SD-DDA algorithm
The accuracies of TD4N, TDAR, TDPSD and iDFT across eight subjects before and after transfer by SD-DDA are shown in figure 4. It can be observed that the classification accuracy of the TD4N feature set improved by 11.0% after transfer. For TDAR, TDPSD, and iDFT, the accuracy of each subject improved after transfer, and their average accuracies all reached around 90%. Among them, TDPSD showed the highest improvement, with an average accuracy increase of 12.9%. In addition, the TDPSD feature set also achieved the highest accuracy of 90.4%. The SD-DDA algorithm achieved an improvement of 8%-13% on the four feature sets, which was statistically significant according to the t-test (p < 0.01).
The confusion matrices of the seven motion classes for the four feature sets, before and after transfer by SD-DDA, are shown in figure 5. It can be observed that prior to transfer, the accuracies of HO and WS were notably lower than those of the other motion classes for all four feature sets. However, after feature transfer by SD-DDA, the accuracies of HO and WS improved notably for all feature sets except TD4N, with the accuracy of WS for TDPSD increasing by 17.4%. As for the motion classes of WE, WF, and NM, which had higher accuracies before transfer, the average accuracies still improved by 3.3% for all feature sets after transfer by the SD-DDA algorithm.

Effect of different parameters on the classification performance of SD-DDA algorithm
The effects of the sEMG data length of target subjects on the performance of TL were investigated in this study. As shown in figure 6, the classification accuracies for all four feature sets generally increased with increasing data length. For TD4N, the classification accuracy exceeded 79% when the data length reached 6000 samples, and then stabilized at around 77%-80% with further increases in data length. For TDAR, the accuracy fluctuated with increasing data length, achieving the highest accuracy of 90.5% at a data length of 12 000 samples.
For TDPSD, the classification accuracy increased from 79.9% to 90.3% as the data length increased from 1000 samples to 8000 samples, but no notable improvement was observed with longer data length.For iDFT, the accuracy showed a rapid improvement with increasing data length, reaching 90.0% at a data length of 4000 samples.
Furthermore, the effects of the number of subjects used in the source dataset were also investigated. As shown in figure 7, the area with slashes represents the portion of increased accuracy after applying the SD-DDA algorithm. It can be seen that as the number of source subjects increased, the accuracies of the different feature sets after TL showed an upward trend. For TD4N, as the number of source subjects increased, the accuracy after transfer by SD-DDA gradually improved and stabilized above 80% when the number of source subjects reached 4. For TDAR, when the number of source subjects went from 2 to 3, the accuracy after transfer improved greatly from 83.1% to 88.4%, and then slowly increased to more than 90% as the number of source subjects increased further. For TDPSD, the accuracy after transfer by SD-DDA increased by 7.1% to 94.8% when the number of source subjects was increased from 2 to 5, and no notable improvement was obtained when the number of source subjects was further increased. The accuracy of iDFT after transfer improved from 88.1% to 94.5% when the number of source subjects went from 2 to 7.

Comparison of performance with different TL algorithms
We also compared the performance of the proposed SD-DDA strategy with other advanced TL algorithms, as shown in figure 8. It can be observed that the proposed SD-DDA strategy achieved the highest classification accuracy among the algorithms and demonstrated a significant improvement over the baseline without TL. In particular, the accuracies of SD-DDA were around 90% for the feature sets of TDAR, TDPSD and iDFT, which were higher than those of the other algorithms. The classification accuracy of the different feature sets increased by less than 1% when the CCA algorithm was used, and even decreased for the GFK and PCA algorithms. In contrast, the TCA, JDA and DDA algorithms had relatively higher classification accuracies among the commonly used algorithms. The accuracy of TD4N increased by more than 7% for TCA. The accuracy of the TDPSD feature set increased from 77.5% to 83.4% and 87.2% for JDA and DDA, respectively, which was significantly higher than that without TL.
Figure 9 shows feature maps of the original TDPSD feature set and of the feature set transferred by SD-DDA, where the first eight principal components are shown. It can be seen that there were notable differences between source and target subjects for the original feature set. Especially for the motion class of WF, the feature maps of the source and target subjects exhibit large differences in the original feature set. However, after feature transfer by SD-DDA, the feature distributions of the source and target subjects for each motion class largely coincided, indicating a successful alignment of the feature distributions between the source and target subjects.

Performance validation on public dataset
The performance of the proposed SD-DDA framework was investigated on the MYO dataset, which has been used for cross-individual EMG-based pattern recognition in a previous study. It can be observed from table 1 that the accuracies of the TDPSD and iDFT feature sets after transfer by the SD-DDA algorithm improved by approximately 9% and 8%, respectively. For the TD4N and TDAR feature sets, the average accuracies also improved by 2%-6% after transfer by the SD-DDA algorithm.
Compared with other classic TL algorithms, the performance of SD-DDA algorithm was also notably better.These results indicate that the proposed SD-DDA algorithm has good generalization ability.
Figure 10. The 2D feature distribution of WE motion after minimizing the differences of marginal distribution, the differences of conditional distribution and the difference of these two distributions between source and target subjects, respectively.

Discussion
Hand motion recognition is an important research area in the field of human-computer interaction. However, due to individual differences, the performance of a model trained on known individuals degrades greatly for new individuals [39]. The traditional approach is to collect a large amount of labeled data from new users to retrain a model, which is time-consuming. Knowledge transfer of existing motion recognition models using a small amount of unlabeled data from new users has recently become a research hotspot. In this study, we proposed an optimized unsupervised feature domain adaptation strategy to align the features of source individuals and target individuals. The method achieved good cross-individual classification performance on our collected sEMG dataset and a public sEMG dataset, indicating the good generalization ability of the method.

Performance of the SD-DDA algorithm in feature alignment
In this paper, we aim to reduce the individual differences between source and target subjects through feature alignment. The proposed SD-DDA algorithm realizes feature alignment between source and target individuals by minimizing the differences of the marginal and conditional distributions between different individuals. Figure 10 shows the 2D feature distribution maps of WE after minimizing the differences of the marginal distribution, the differences of the conditional distribution and the differences of these two distributions between source and target subjects, respectively. It can be seen that the distance between the original feature distributions of source and target subjects was large, while the distributions were closer after transfer by minimizing the distance of either the marginal or the conditional distribution between source and target subjects. Compared with feature alignment based on the marginal distribution or the conditional distribution only, the alignment based on both distributions produced the most similar feature distributions for source and target subjects. These results show that the individual-specific information in the source and target subjects is greatly reduced by the SD-DDA algorithm, which facilitates the recognition of cross-individual motion intention.
In this study, we investigated the effectiveness of the proposed SD-DDA method on state-of-the-art feature sets, including TD4N, TDAR, TDPSD and iDFT. From figure 4, it can be seen that the classification accuracies of the four feature sets improved by 8%-13% after transfer by SD-DDA. In terms of motion classes, as shown in figure 5, the accuracies of all motion classes were above 80% after transfer by SD-DDA for the iDFT feature set. In particular, the accuracies of HO and WS were notably improved, with an accuracy increment of 17.4% for WS with TDPSD. These results provide evidence that the SD-DDA algorithm generalizes well across different features.

Effects of target data length and source subject numbers on the performance of SD-DDA
The length of unlabeled sEMG data collected from new individual and the number of subjects in the source dataset are important factors that affects the classification performance of TL algorithms [15,40].
From figure 6, it can be seen that as the data length increasing, the classification accuracy of all feature sets gradually increased and then stabilized at around the highest value, in which the classification accuracy of the TDAR, TDPSD and iDFT feature sets reached a stable level of around 90%.The result demonstrates that the classification accuracy is positively correlated with the data length until a stable accuracy is reached, after which further increases in data length have no apparent effect on the accuracy.Thus, it is reasonable to select an appropriate sample length for new individuals to achieve the highest classification performance.
As shown in figure 7, the accuracies of all four feature sets without TL showed an upward trend as the number of subjects in the source dataset increased, reaching a maximum accuracy of over 85%. This indicates that having more subjects in the source dataset helps to reduce the impact of individual differences, which is consistent with the findings in [41]. More subjects in the source dataset also improved the classification performance after transfer by SD-DDA, with the highest accuracy of 95.5%. The improvement in classification accuracy after applying the SD-DDA method remained above 6% as the number of subjects in the source dataset increased. In summary, increasing the number of subjects in the source dataset can reduce the effect of individual differences and thereby improve the classification accuracy after transfer by SD-DDA.

Comparison with other TL methods and validation on the public dataset
Six classic TL algorithms (GFK, CCA, PCA, TCA, JDA and DDA) were compared with SD-DDA.
As shown in figure 8, the accuracy of the GFK algorithm was lower than that of the SD-DDA algorithm, with no improvement over the feature sets without TL. This may be because the subspace dimension selected by the subspace disagreement measure is too small to represent the input information [27].
The CCA algorithm, which has been shown to extract the most relevant information from the same motions and to significantly improve cross-individual classification performance [42,43], obtained an improvement of less than 1% over the original accuracies without TL. A likely reason is that the motions in the target subjects' data were randomly ordered. To compare the performance of CCA under sequential and non-sequential conditions, we also ran the unsupervised CCA algorithm with the motion sequence of the target subjects ordered identically to that of the source subjects. The results are shown in table 2. The classification accuracies exceeded 87% after feature transfer by the unsupervised CCA algorithm when the movement sequences of the source and target data were the same, which was notably higher than when the movement sequence of the target data was randomized. The unsupervised CCA algorithm with sequential motions achieved a classification performance of about 90%, and the unsupervised SD-DDA algorithm proposed in this study achieved a similar performance. As with SD-DDA, the accuracies of domain distribution adaptation algorithms such as TCA, JDA and DDA were higher than those of non-domain distribution adaptation algorithms such as CCA, PCA and GFK when the motions in the target subjects' data were randomly ordered, which is consistent with the findings of Pan et al [26].
For domain distribution adaptation algorithms, choosing an optimal dimension that aligns the features of source and target subjects is crucial to achieving the highest classification accuracy. A common method for selecting the optimal dimension is to use a small amount of labeled data from the target subjects for validation, which is infeasible when the target subjects' data are unlabeled [15,44]. Another approach is to determine the appropriate dimension from the source dataset and then use it for the target subjects [25]. This approach can be applied when the data of the target subjects are unlabeled. However, as shown in figure 8, the classification accuracies of the DDA algorithm, which adopted the second method to select the optimal dimension, were noticeably lower than those of the SD-DDA algorithm, indicating that the dimension selected according to the source subjects is not suitable for the target subjects.
Furthermore, in most TL methods, an optimal dimension is selected by pre-training and is not changed for different target datasets. However, the optimal dimension may vary across features and individuals in EMG signals. Figure 11(a) illustrates the accuracies obtained at different fixed subspace dimensions on the four feature sets. It can be observed that the TDPSD feature set achieved its highest accuracy of 88.3% at a dimension of 12, while the TDAR feature set attained its highest accuracy of 88.4% at a dimension of 16. Similarly, as shown in figure 11(b), subject Sub1 exhibited the highest accuracy at a dimension of 28, while subject Sub6 achieved the highest accuracy at a dimension of 12. These findings emphasize the importance of selecting appropriate dimensions for different features and individuals in TL algorithms. Therefore, the proposed SD-DDA algorithm, which automatically selects the optimal dimension to match the features of the target individual, is useful for cross-individual motion recognition.
To further evaluate the performance of our proposed method in more difficult recognition tasks, six finger gestures, namely HC, HO, index finger extension (EXIF), double fingers extension (EXDF), three fingers extension (EXTF) and four fingers extension (EXFF), from the public SIA dataset (https://github.com/malele4th/sEMG_DeepLearning) were utilized. The dataset comprises data from four able-bodied subjects, which were collected with six Delsys electrodes affixed to the forearm at a sampling rate of 2000 Hz. Each subject was chosen in turn as the target subject, while the other three served as source subjects. The results of the TDPSD feature set before and after feature transfer by the proposed SD-DDA algorithm are shown in figure 12. It can be observed that the classification accuracies for all six finger gestures increased with SD-DDA. The classification accuracies of EXIF, EXDF, EXFF and HC increased by more than 3% after feature transfer by SD-DDA. Notably, the classification accuracy of HO increased by 6.8% after employing SD-DDA. These results suggest that our proposed method is also effective for recognizing finger gestures.
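The evaluation protocol described above, in which each subject serves once as the target while the remaining subjects form the source, can be enumerated with a short sketch. The subject identifiers and the function name `source_target_pairs` are hypothetical and are not taken from the SIA repository.

```python
from itertools import combinations


def source_target_pairs(subjects, n_sources):
    """List every (target, sources) pair with n_sources source subjects per pair."""
    pairs = []
    for target in subjects:
        others = [s for s in subjects if s != target]
        for sources in combinations(others, n_sources):
            pairs.append((target, sources))
    return pairs


# Four subjects, three source subjects per target: one pair per target subject.
sia_subjects = ["S1", "S2", "S3", "S4"]  # hypothetical identifiers
sia_pairs = source_target_pairs(sia_subjects, n_sources=3)
```

Under the assumption that the collected dataset uses three source subjects per combination, the same function applied to eight subjects yields 35 combinations per target and 280 pairs in total, which matches the counts reported for that dataset.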

Limitations and future work
We used only engineered feature sets to demonstrate the performance of the proposed SD-DDA method. Although the classification performance was improved to more than 90%, about 10% room for improvement remains. In the future, we will combine the framework with deep learning to improve the cross-individual classification performance. Furthermore, more subjects, including upper-arm amputees, who are the end users of EMG-PR based prostheses, will be recruited to verify the effectiveness of the SD-DDA algorithm. According to the existing literature [42,45] and the results of our preliminary experiment, the transferred features do not always have a positive impact on the target domain, because the degree of difference between the source and target domains varies. In future research, we will seek an index, such as cosine similarity, to evaluate how much difference between the source and target subjects our method can tolerate. By continuously refining and expanding our research, we hope to enhance the practicality and effectiveness of the SD-DDA method for cross-individual motion intention recognition.
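As an illustration of what such an index could look like, the sketch below computes the cosine similarity between the mean feature vectors of a source and a target subject. The text only names cosine similarity as a candidate, so the choice of mean feature vectors, the function names, and any acceptance threshold are assumptions rather than part of the proposed method.

```python
import math


def mean_vector(rows):
    """Column-wise mean of a list of equal-length feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def domain_similarity(source_features, target_features):
    """Hypothetical index: cosine similarity of the two domain mean vectors."""
    return cosine_similarity(mean_vector(source_features),
                             mean_vector(target_features))
```

A value near 1 would indicate that the two subjects' average features point in nearly the same direction. Whether such a value predicts positive transfer would have to be established experimentally.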

Conclusion
Developing an efficient cross-individual motion recognition method is crucial for the practical application of myoelectric systems. However, most multiuser myoelectric systems have poor accuracy and require labeled target user data. In this study, we proposed a novel SD-DDA method that automatically maps source and target domains to a common subspace using a small amount of unlabeled target data from a new subject. The performance of this method was verified on eight able-bodied subjects and four state-of-the-art feature sets. Results showed that the average classification accuracy was improved by more than 10%, and the best accuracy reached 90.4% ± 5.5%. Additionally, compared with six classic TL algorithms, the proposed SD-DDA method notably improved the classification accuracies, by 3.2%-13.8%. Therefore, this study provides a feasible method for cross-individual sEMG pattern recognition systems.

Figure 1. Schematic diagram of the SD-DDA algorithm.

$C_7^3 = 35$ combinations of source dataset for each target dataset and 280 (35 source datasets × 8 target datasets) pairs of source and target datasets were explored. For subjects of the source dataset, the EMG signals of the first session were used as training set data. For the subject of the target dataset, the EMG signals of the second session were used as test set data. The EMG signals of first repetition of each motion from the target subjects' first session are taken

Figure 3. De-labeling and randomly sequencing the target dataset.

Figure 5. Comparison of classification confusion matrices between features without transfer learning and after transfer learning by SD-DDA.

Figure 6. Classification performance for different lengths of target subject's sEMG data used in our proposed SD-DDA framework. Error bars represent the standard deviation in accuracy across the eight subjects.

Figure 7. Accuracies for different numbers of subjects used in the source dataset (the area with slashes is the improvement after using SD-DDA).

Figure 8. Comparison of the classification performance between the SD-DDA algorithm and other advanced transfer learning algorithms. Error bars represent the standard deviation in accuracy across the eight subjects. * indicates that the difference between the transfer learning algorithm and the original method is significant at the 1% level (p-value < 0.01).

Figure 9. Feature map of the original TDPSD feature and that transferred by SD-DDA, where the first eight principal components are shown.

Figure 11. Classification accuracy of different subspace dimensions without dynamic dimensional selection. (a) Different features. (b) Different subjects based on the TD4N feature set.

Figure 12. Classification confusion matrix of the TDPSD feature set of the SIA dataset with and without transfer learning by the SD-DDA algorithm.

Table 1. Classification performance of the SD-DDA algorithm and classic transfer learning algorithms on the public dataset.

Table 2. Comparison of classification performance between our proposed method and the unsupervised CCA method using non-sequential and sequential target datasets.