DistaNet: grasp-specific distance biofeedback promotes the retention of myoelectric skills

Objective. An active myoelectric interface responds to the user's muscle signals to enable movements. Machine learning can decode user intentions from myoelectric signals. However, machine learning-based interface control lacks continuous, intuitive feedback about task performance, which is needed to facilitate the acquisition and retention of myoelectric control skills. Approach. We propose DistaNet, a neural network-based framework that extracts smooth, continuous, and low-dimensional signatures of hand grasps from multi-channel myoelectric signals and provides grasp-specific biofeedback to users. Main results. Experimental results show its effectiveness in decoding user gestures and providing biofeedback, helping users retain the acquired motor skills. Significance. We demonstrate myoelectric skill retention in a pattern recognition setting for the first time.


Introduction
Beyond diverse applications in neuroscience and clinical neurology [1], one use of myoelectric signals is in the control of active effectors, e.g. prostheses and exoskeletons [2][3][4][5][6][7][8], or to enable interaction with objects in virtual, augmented, and mixed-reality environments. Machine learning enables the estimation of user intents by decoding distinct grasp or movement signatures from the myoelectric signals [5, 9]. For instance, Meta is developing machine-learning models of upper-limb muscle activity that can adapt to a user's unique typing patterns and enable personalised virtual keyboards [10].
In conventional human motor learning tasks, in which the relationship between the muscle activity and the task is one-to-one, linear, and simple, practice and (bio-)feedback improve performance and reduce undesired variability in the relevant degrees of freedom [4, 11]. This improvement in performance persists over time, that is, the delivery of feedback supports the retention of new myoelectric control skills [12][13][14]. However, existing machine learning-based approaches to myoelectric control cannot deliver continuous, intuitive, and smooth feedback about the control space or myoelectric variability, relevance, or redundancy to the user.
Similarly, both one-day and multi-day studies of myoelectric control with machine learning-based decoders reveal improvement in control with practice, whether with fixed, recalibrated, or adaptive decoders [11, 15-18]. Nonetheless, it is debatable whether these improvements are temporary, reflecting motor adaptation [19], or long-term, supporting motor learning [20]. A common component of studies of myoelectric adaptation and learning is feedback, be it presented visually on a screen or with a prosthesis, or delivered with electro- or vibro-tactile stimuli [21][22][23][24]. Extrinsic feedback typically provides the user with task results, e.g. target hits in a typical motor control task, or with knowledge of the quality of control, e.g. path efficiency in the control space [25][26][27]. If motor learning underpins these improvements, one would expect to observe the retention of myoelectric skills [28][29][30] after learning. Assessing the retention of myoelectric skills is only feasible when feedback is not available [30, 31].
One challenge in the use of current black-box and typically non-linear machine learning algorithms is that mapping high-dimensional myoelectric signals to low-dimensional task-related feedback results in discrete, jittery, and non-intuitive signals. This limitation is exacerbated by the intrinsic noise and signal-dependent variability in myoelectric signals [32], as well as the inaccuracies of machine learning algorithms. Consequently, providing simple, continuous, and target-specific biofeedback is not feasible. In fact, at best, machine learning methods can offer feedback regarding the decoded motor intention in a discrete manner, designed to maximise confidence and ensure reliability [33]. Current feedback methods usually depend on the electromyography (EMG) signal amplitude, e.g. [34][35][36]. Fang et al [36] used dimensionality reduction to visualise the pattern recognition control space. However, the unsupervised dimension reduction method was unstable, and the feedback trajectory in the 2D space was noisy and hard for users to follow, hence non-intuitive.
With this paper, we address two primary limitations of machine learning methods when applied to myoelectric signals: (1) lack of explainability and (2) absence of smoothness in low-dimensional biofeedback. Addressing these challenges facilitates the study of myoelectric control learning using machine learning decoders.

DistaNet
We propose DistaNet, a neural network framework whose inputs comprise the conventional features of multi-channel myoelectric signals and whose outputs are the estimated distances between the user's state in the control space and the centroids of all gestures. This distance measure is an abstract construct and is unit-free. It offers three features, namely (1) target-specificity, (2) continuity, and (3) smoothness.
DistaNet creates continuous pseudo-labels for the input data in a low-dimensional control space and uses these pseudo-labels to train a neural network. Figures 1(A) and (B) depict the operation of DistaNet in the neural network training and testing phases. Figure 1(C) shows 25 s of myoelectric signals recorded from the surface of the skin on the forearm of a participant whilst they performed different gestures. Figure 1(D) depicts two attributes (features) of the signals from each channel, namely waveform length (WL) and the logarithm of variance. Figure 1(E) illustrates the outputs of DistaNet, which are the six estimated distances D to the six gestures in the task space, including rest. At each time instance, the classifier chooses the gesture with the lowest D as its output.
As we detail in the Experiment Design section, in each trial of the experiment, participants were shown a target gesture which they had to perform and hold for 1 s. Figure 1(F) shows the classifier output, with the thick overlay representing the hold period. In some parts of the experiments, participants were presented with a score between 0% and 100% at the end of each trial, which quantified their success in matching the decoder output to the instructed target gesture during the hold period. Finally, figure 1(G) shows the task state machine and the score in each trial.

Dimension reduction
In a typical EMG-based human-machine interface application, be it in a metaverse, virtual and extended reality, or upper-limb prosthetic application, the EMG signals are acquired from multiple channels. The decoding algorithm needs to deal with a high-dimensional data space, of size (number of channels) × (number of extracted EMG attributes). Although feature extraction methods can reduce data complexity, dimension explosion remains a considerable challenge, especially when high-density EMG signals are considered [10, 37]. To present users with a more intuitive, continuous, objective, and visualisable space, dimensionality reduction is necessary. A wide variety of dimensionality reduction methods are available, e.g. linear discriminant analysis (LDA), principal component analysis, locally linear embedding, Laplacian eigenmaps, t-distributed stochastic neighbour embedding, and isomaps. In this work, we chose LDA because its simplicity and linearity offer intuitiveness and explainability.
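For illustration, the LDA projection step can be sketched with a minimal Fisher discriminant analysis in plain NumPy. This is a generic sketch under stated assumptions, not the implementation used in the experiments; the synthetic data, dimensions, and class counts are purely illustrative.

```python
import numpy as np

def lda_projection(X, y, n_components):
    """Fisher LDA: find directions that maximise between-class scatter
    relative to within-class scatter, keeping n_components of them."""
    classes = np.unique(y)
    mean_all = X.mean(axis=0)
    d = X.shape[1]
    Sw = np.zeros((d, d))  # within-class scatter
    Sb = np.zeros((d, d))  # between-class scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        Sw += (Xc - mc).T @ (Xc - mc)
        diff = (mc - mean_all)[:, None]
        Sb += len(Xc) * (diff @ diff.T)
    # Solve the generalised eigenproblem Sb v = lambda Sw v.
    eigvals, eigvecs = np.linalg.eig(np.linalg.pinv(Sw) @ Sb)
    order = np.argsort(eigvals.real)[::-1]
    return eigvecs.real[:, order[:n_components]]

# Toy stand-in for 16-dimensional features (8 channels x 2 features),
# with 3 gesture classes.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, size=(50, 16)) for c in range(3)])
y = np.repeat([0, 1, 2], 50)
W = lda_projection(X, y, n_components=2)
Z = X @ W          # samples mapped into the low-dimensional control space
print(Z.shape)     # (150, 2)
```

The projected space Z is where the distances to class centroids are subsequently computed.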

Pseudo-label estimation
The instantaneous distance between each sample in the LDA space and the class centroids was chosen to build the pseudo-label. This approach resulted in continuous and intuitive feedback when compared to conventional feedback about the decoding outcome likelihood. We used the Euclidean distance because of its simplicity, intuitiveness, and explainability. Other distances, e.g. the Hamming, Chebyshev, and Minkowski distances, could have also been used. We further used a Savitzky-Golay filter [38] to smooth the distance pseudo-labels. It is a time-domain filtering method based on local orthogonal polynomial least-squares fitting. Its hyper-parameters are the window length and filter order, which were determined by an exhaustive search for higher classification accuracy over the ranges 10-500 and 1-5, respectively.
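The pseudo-label construction can be sketched as follows. The window length (51) and polynomial order (3) here are illustrative values from within the search ranges above, and the synthetic 2D data stands in for the LDA-projected features; this is a sketch, not the study's exact pipeline.

```python
import numpy as np
from scipy.signal import savgol_filter

def distance_pseudolabels(Z, y, window=51, order=3):
    """Euclidean distance from each low-dimensional sample to every
    class centroid, smoothed over time with a Savitzky-Golay filter."""
    centroids = np.stack([Z[y == c].mean(axis=0) for c in np.unique(y)])
    # D[t, c] = distance of sample t to the centroid of class c
    D = np.linalg.norm(Z[:, None, :] - centroids[None, :, :], axis=2)
    return savgol_filter(D, window, order, axis=0)

# Toy low-dimensional trajectories for 3 classes, 200 frames each.
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(loc=c, scale=0.3, size=(200, 2)) for c in range(3)])
y = np.repeat([0, 1, 2], 200)
D = distance_pseudolabels(Z, y)
decoded = D.argmin(axis=1)   # classifier output: class with the lowest D
print(D.shape)               # (600, 3)
```

The `argmin` on the last line mirrors the decoding rule described earlier: at each time instance, the gesture with the lowest estimated distance wins.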

Neural network mapping
We used a shallow temporal convolutional network (TCN) [39] structure, as shown in figure 2. In this study, we stacked two blocks of TCN with one dilated convolution layer at the beginning. The layer sizes of the two blocks were 64 and 32, respectively. The kernel size was 7. The dropout rate was set to 0.5 to mitigate over-fitting.
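The core operation of a TCN, the dilated causal convolution, can be illustrated in a few lines of NumPy. This is a conceptual sketch of the operation, not the network used in the study; the toy kernel and dilation are arbitrary.

```python
import numpy as np

def causal_dilated_conv1d(x, w, dilation):
    """1D causal convolution with dilation: the output at time t depends
    only on x[t], x[t-d], x[t-2d], ... (left zero-padding, 'chomping')."""
    k = len(w)
    pad = (k - 1) * dilation
    xp = np.concatenate([np.zeros(pad), x])
    return np.array([
        sum(w[j] * xp[t + pad - j * dilation] for j in range(k))
        for t in range(len(x))
    ])

x = np.arange(10, dtype=float)
y = causal_dilated_conv1d(x, w=np.array([1.0, 1.0]), dilation=2)
# y[t] = x[t] + x[t-2]; with kernel size k and dilation d, one layer has a
# receptive field of (k-1)*d + 1, so stacking layers with growing dilation
# covers long histories cheaply.
print(y)   # [ 0.  1.  2.  4.  6.  8. 10. 12. 14. 16.]
```

Causality matters here because the decoder runs in real time: the output at each frame may only use past EMG samples.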

Participants
Eighteen adult limb-intact right-handed people (age range: 23-43) took part in this experiment.Three participants had previous experience with myoelectric control.

Ethics
All experimental procedures were in accordance with the Declaration of Helsinki and approved by the local Ethics Committees of the School of Informatics, University of Edinburgh (2019/89177).All participants read an information sheet and gave consent prior to the experiments.

Experiment design
The participants were allocated to three independent groups. Group one used an LDA decoder to make an instructed grasp with their muscle activity and hold the grasp for a certain period for a score between 0% and 100%, which was displayed at the end of a trial in the practice blocks only. We denote this group with LDA. The second group used the DistaNet decoder. They also received the score feedback at the end of each trial, reflecting how well they performed in the trial. We denote this group with DistaNet-S, where S stands for Score. The third group, DistaNet-SD, performed the same task but, during practice trials, received two feedback signals, namely the score per trial and a visual representation of D on the computer screen with six bars. Each bar represented the distance between where the control signal was in the feature space, after dimensionality reduction, and the centroids of the gesture classes as well as the rest class.
Experiments were conducted over two days in real-time settings. On the first day, we initially collected some EMG data for model training. Participants sat in front of a computer screen, on which pictures of five hand postures and the rest were shown. The postures included power, lateral, tripod, pointer, and hand open, as shown in figure 3(A).
Participants were instructed to perform and hold each grasp ten times, each repetition being six seconds long. They had ten seconds to relax between repetitions. Finally, the same procedure was repeated for the Rest posture. We kept 80% of the recorded data to train a standard LDA decoder and used the remaining data to test it. Simple cross-validation was performed to avoid over-fitting. The model with the best testing score was picked. This data collection took less than 10 min.
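The split-and-select procedure above can be sketched as repeated random 80/20 splits, keeping the model with the best held-out score. The helper names, the toy nearest-centroid "decoder", and the number of splits are illustrative assumptions, not the paper's exact validation code.

```python
import numpy as np

def best_split_model(X, y, fit, score, n_splits=5, train_frac=0.8, seed=0):
    """Repeat a random 80/20 split, fit on each training portion, and
    keep the model with the best held-out score."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_splits):
        idx = rng.permutation(len(X))
        cut = int(train_frac * len(X))
        tr, te = idx[:cut], idx[cut:]
        model = fit(X[tr], y[tr])
        s = score(model, X[te], y[te])
        if s > best_score:
            best, best_score = model, s
    return best, best_score

# Toy nearest-centroid "decoder" standing in for the LDA model.
fit = lambda X, y: {c: X[y == c].mean(axis=0) for c in np.unique(y)}
def score(model, X, y):
    C = np.stack(list(model.values()))
    pred = np.linalg.norm(X[:, None] - C[None], axis=2).argmin(axis=1)
    return (pred == y).mean()

rng = np.random.default_rng(2)
X = np.vstack([rng.normal(c, 0.5, size=(40, 4)) for c in range(3)])
y = np.repeat([0, 1, 2], 40)
model, acc = best_split_model(X, y, fit, score)
print(round(acc, 2))
```

With well-separated toy clusters the held-out accuracy is high; with real EMG features the same loop simply guards against picking an over-fitted model.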
Model training was followed by a Baseline block during which all participants used the LDA model to perform 20 trials. Each grasp was presented four times, in a pseudo-randomised order. After completing the baseline block, participants were assigned to one of three groups, namely LDA, DistaNet-S, and DistaNet-SD. The assignment was based on a moving average of the group average scores, that is, participants were assigned to groups such that the difference in the average performance score between the groups was minimised, as shown in figure 4.
Trials started when participants relaxed their muscles. An audible beep signalled the start of the trial and one of the target grasps was presented. Trials were four seconds long, comprising two periods of three and one seconds, referred to as reach and hold, as shown in figure 3(B). The start of the hold period and the end of the trial were also cued with two audible beeps. Once a target grasp was presented, the reach period allowed the participant to change the posture of their hand before holding the grasp for one second. At the end of a typical trial, a score was presented in the centre of the screen. The score, expressed as a percentage, is the proportion of the hold period during which the output of the decoder matched the instructed grasp. We included a two-second rest period between trials. Participants had longer rest periods, of approximately five minutes, between blocks.
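The per-trial score can be computed as the fraction of hold-period frames in which the decoder output equals the target. The frame rate, gesture ids, and mask layout below are hypothetical, chosen only to mirror the 3 s reach + 1 s hold structure described above.

```python
import numpy as np

def trial_score(decoder_output, target, hold_mask):
    """Score (%) = fraction of the hold period during which the decoder
    output matched the instructed grasp."""
    hold = decoder_output[hold_mask]
    return 100.0 * np.mean(hold == target)

# Hypothetical 4 s trial at 20 frames/s: 3 s reach, then 1 s hold.
frames = np.array([0]*40 + [3]*25 + [2]*5 + [3]*10)  # decoded gesture ids
hold_mask = np.zeros(80, dtype=bool)
hold_mask[60:] = True                                # final second is held
print(trial_score(frames, target=3, hold_mask=hold_mask))  # 75.0
```

Here the decoder matches the target (gesture id 3) in 15 of the 20 hold frames, hence a score of 75%.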
Figure 3(C) shows the complete experiment design, including the type and number of blocks during the two experimental days. Figure 3(D) reports the full arrangement of the experiment in terms of the decoder and the availability of scores and visual feedback in the Practice and Test blocks. In summary, no feedback was provided in the Baseline block. In the Practice blocks, participants in the LDA and DistaNet-S groups received score feedback only, and participants in the DistaNet-SD group received both score and distance visual feedback. The participants did not receive any feedback in the Test blocks. Accuracy, i.e. higher scores, in the Test blocks was used as a metric to quantify the learning and retention of myoelectric skills.
The entire experiment was carried out in real time to simulate a practical scenario. Given the light-weight model employed, a hardware implementation could follow the work presented by Wu et al [40]. The experiments were performed on an HP EliteBook 840 G8 laptop computer (2.6 GHz i5-1145G7 CPU, 16 GB RAM; HP Inc., California, USA). The real-time experimental software was implemented in Python using the AxoPy library [41].

EMG signal recording and preprocessing
Eight channels of EMG signals were recorded with two Trigno Quattro sensors (Delsys, USA). Sensors were placed on the forearm, c. 2 cm below the elbow. Starting from the extensor carpi ulnaris muscle, we spread the electrodes around the limb equidistantly. The band-pass filtered [10-500 Hz] EMG signals were sampled at 2000 Hz. During the experiment, two features were extracted from each channel with a window size of 150 ms and an overlap of 100 ms. These features were WL and log-variance (log-var). The WL feature quantifies the complexity of the signal waveform by calculating the cumulative length of the EMG signal in each window. The variance of the EMG signal indicates the contraction power [42, 43] in a nonlinear way. The log-var feature linearises the variance [44]. These features are calculated with

\mathrm{WL} = \sum_{i=1}^{N-1} |x_{i+1} - x_i|,

where x_i and x_{i+1} stand for two neighbouring samples within a window of length N samples, and

\text{log-var} = \log\left(\frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2\right),

where \mu is the mean of the sample values within the analysis window. Figure 1(C) shows example EMG signals and the extracted features. Our offline analyses indicated that the two features offer an acceptable trade-off between accuracy and computational complexity. Importantly, however, the DistaNet methodology may be applied to any type or number of EMG features.
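The two features can be computed per analysis window as follows. The window and step sizes follow the values stated above (150 ms windows, 100 ms overlap, i.e. a 50 ms step at 2 kHz); the signal itself is simulated, and the small epsilon in the log is an assumption added to avoid log(0) on silent windows.

```python
import numpy as np

def emg_features(x, fs=2000, win_ms=150, step_ms=50):
    """Waveform length (WL) and log-variance per sliding window."""
    win = int(fs * win_ms / 1000)
    step = int(fs * step_ms / 1000)
    wl, logvar = [], []
    for start in range(0, len(x) - win + 1, step):
        seg = x[start:start + win]
        wl.append(np.sum(np.abs(np.diff(seg))))       # WL = sum |x[i+1]-x[i]|
        logvar.append(np.log(np.var(seg) + 1e-12))    # log of window variance
    return np.array(wl), np.array(logvar)

rng = np.random.default_rng(3)
x = rng.normal(size=2000)       # 1 s of simulated single-channel EMG at 2 kHz
wl, lv = emg_features(x)
print(len(wl))                  # 18 windows for 1 s of signal
```

Applying this per channel yields the 16 feature traces (8 channels × 2 features) shown in figure 1(D).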

Results
Figure 5(B) provides a clearer view of performance retention across Test blocks 2, 3, and 4. Group DistaNet-SD was significantly better than group LDA (Mann-Whitney test, n = 6, p = 0.04) in Test Block 3 at the start of day 2. In Test Block 4 (the end of day 2), participants in both groups DistaNet-SD and DistaNet-S outperformed those in group LDA (both with p = 0.002). No difference was observed between DistaNet-SD and DistaNet-S in Test Block 4. We had predicted that participants in group DistaNet-SD would exhibit the highest performance in both Practice and Test blocks, because they had access to both the distance D feedback and the trial score, which helped form an internal model of the task, as required in the Test blocks where no feedback was presented. However, the improvement of the DistaNet-S group on the second day in both Test and Practice blocks was unexpected. Therefore, we asked whether the improvement in performance in group DistaNet-S was across all gestures. Figure 5(C) shows that the improvement in the overall score comes primarily from improvement in the decoding of the Pointer and Lateral gestures, as DistaNet-S shows a relatively stable performance in decoding the Open, Tripod, and Power gestures. Figures 5(D) and (E) depict the confusion matrices and the low-dimensional space, in which the ellipses show the distribution of the samples according to a Gaussian mixture model, at the beginning (top) and the end of day 2 (bottom). The lower panel reveals a better separation of the gesture clusters.

Discussion
In the human motor control and sports and exercise literature, the dichotomy between product and process has been a focal point of discussion for decades [45][46][47]. This dichotomy reflects two distinct approaches to understanding and evaluating motor learning. The product perspective centres on the final outcome of the movement, such as achieving a particular performance goal [46]. Proponents of the product-oriented approach argue that setting specific targets and measuring performance against them is essential for assessing progress and determining success. This viewpoint often aligns with the competitive nature of sports, as achieving desired outcomes is often the ultimate objective. On the other hand, the process perspective places emphasis on the quality of movement execution, the underlying mechanics, and the developmental journey individuals undertake to improve their skills [7, 47, 48]. Advocates of this viewpoint argue that focusing on the process allows people to build a strong foundation, refine their techniques, and enhance their overall motor control. The process-oriented approach encourages practitioners to prioritise factors such as body awareness and movement efficiency. This perspective recognises that mastery is a result of consistent practice, deliberate refinement, and a deep understanding of the intricate nuances of movement patterns.
While both the process and product perspectives play roles in motor skill acquisition and retention, focusing on the process tends to lead to more robust and enduring learning outcomes [45]. In other words, when training prioritises understanding the underlying mechanics and dynamics, e.g. with feedback, movement patterns are refined and a deeper level of skill acquisition can be achieved. This process-oriented approach involves cognitive engagement, which aids in encoding the motor skills into long-term memory, making them more likely to be retained over time [45]. On the other hand, a sole focus on the product, while motivating in the short term, may lead to shortcuts and short-term winning strategies. This can hinder the long-term retention of the acquired motor skills.
Academic research in the field of human-machine interfaces and upper-limb prostheses has traditionally leaned towards a product-oriented approach, given the short-term and laboratory-based nature of experiments, and hence can be better explained by motor adaptation [49]. However, it is becoming increasingly evident that a shift towards a process-oriented perspective with the use of biofeedback is necessary to achieve more comprehensive and lasting outcomes. We contribute to this evolving literature by highlighting the significance of grasp-specific distance biofeedback, emphasising that integrating biofeedback mechanisms can greatly enhance learning outcomes and promote the retention of motor skills. Nevertheless, the tension between these two perspectives is not a contradiction but rather a delicate balance. In practice, effective motor learning and skill acquisition often involve a symbiotic relationship between process and product.
We hypothesised that intuitive and smooth biofeedback signals enhance the likelihood of retention and sustainable success, while a focus on achieving goals can provide motivation and a sense of accomplishment. In our experiments, which spanned two days, we sought to examine the impact of grasp-specific distance biofeedback on the retention of motor skills. Our findings showed a promising avenue for enhancing skill retention by utilising feedback tailored to specific grasps in the control space. Importantly, we demonstrated that the DistaNet-SD group achieved higher levels of control than the LDA group on the second day, suggesting the potential of this approach for more effective myoelectric interface performance. However, as we currently lack insight into the long-term retention dynamics, an intriguing question arises: how much time will pass before individuals begin to forget the decoder-based control, despite retaining the intuitive motor strategies they have developed? To address this aspect, further investigations are warranted. Additionally, our study indicates the necessity for future experiments to delve into the various factors influencing myoelectric interface control, thereby contributing to a more comprehensive understanding of this field and paving the way for advancements in human-machine interactions.
DistaNet-SD underscores the significance of process-oriented feedback in facilitating exceptional myoelectric interface performance. By providing users with an intuitive and coherent representation of the control manifold, DistaNet-SD bridges the gap between the myoelectric signal domain and the ideal grasp space within the control dimension. This seamless linkage enables users to establish an innate understanding of the relationship between their physical movements and their positions within the control space, fostering an embodied comprehension of the motor skill execution process. Notably, DistaNet-SD presents users with both process-oriented feedback, in the form of bars displayed on the screen, and product-oriented information, in the form of a score. This dual-layered feedback approach seems to play a pivotal role in achieving exceptional results. The training phase demonstrated the highest accuracy rates, further validating the effectiveness of the process-oriented information. Crucially, during testing, DistaNet-SD exhibited the highest retention levels. Further studies of process-oriented feedback and learning involving deliberate and variable practice, which exposes individuals to different contexts and challenges related to the skill, may be informative in future multi-day experiments with myoelectric control users. Such intentional variability can be achieved with gamification [50, 51].
An unexpected outcome of the study was the performance observed in the DistaNet-S group. Initially, on the first day, this group exhibited a performance level that aligned with our expectations, given that they received less feedback than the DistaNet-SD group (which had both score and distance information) and that DistaNet is a more accurate decoder than the LDA decoder. However, what caught our attention were the developments on the second day. The DistaNet-S participants displayed a distinct pattern: their performance scores dipped initially, not as drastically as those of the LDA group, but remarkably rebounded. This rapid recovery was followed by a fascinating observation during the final test block, wherein the DistaNet-S participants showcased a performance equivalent to that of the DistaNet-SD group, who had access to detailed feedback. This unexpected observation underscores the complex interplay between feedback provision and myoelectric skill retention, suggesting that if the noise in decoding can be contained with an efficient decoder such as DistaNet, certain intermediate levels of feedback, albeit product feedback, can be enough to facilitate the internalisation of myoelectric skill. Further exploration of such serendipitous results may uncover novel insights into the cognitive and neuromuscular mechanisms governing myoelectric skill learning and retention.
We simplified DistaNet's complexity to prioritise explainability, as depicted in figure 2(A). The combination of dimensionality reduction, continuous pseudo-labelling, and machine learning offers opportunities for further enhancement. For example, for dimensionality reduction, instead of the LDA model, we could have used the linear optimal low-rank projection method [52], a supervised manifold learning method incorporating class-conditional moments, which may better represent data with spatially overlapping classes. Similarly, for the continuous pseudo-labelling, we employed a basic Euclidean distance, but this could be substituted with more intuitive distance metrics, like the Jaccard distance, to incorporate sets of samples instead of pairs. This modification could enhance robustness by shifting from a point-to-point distance measurement to a sequence-to-sequence measurement for pseudo-labelling. Alternatively, we could have opted for the Mahalanobis distance to enable a point-to-sequence approach in pseudo-labelling, measuring the distance between a current sample and a target distribution. Whatever the choice of substructures of DistaNet, after reducing the dimensionality, the distances can capture the relationship between samples and labels. This reduction in dimensionality transforms the distances into folded poly-planes without sacrificing the information linking class labels to the samples. This not only decreases the possibility of user confusion but also enhances the feedback's robustness. Importantly, after smoothing and filtering, the distances within each class exhibit greater stability and continuity, promising users more intuitive and comprehensible feedback. Alternative methods to enhance the performance of DistaNet include increasing the amount of training data collected from each participant, which is likely to boost performance. Additionally, training a shared model for all participants and subsequently fine-tuning and calibrating it appears feasible for further improvements, similar to [18].
Everything within the proposed system embodies intuitiveness; however, a noteworthy challenge lies in crafting an equally intuitive interface when dealing with more than 4-5 classes. This challenge extends beyond prosthetic control, permeating into metaverse applications. As we venture into complex environments and multifaceted interactions, the need for user-friendly interfaces becomes paramount. Such interfaces must seamlessly accommodate an array of options, functions, and controls, ensuring that users can effortlessly navigate and interact within these digital landscapes.

Figure 1 .
Figure 1. DistaNet. (A) The diagram for the training stage of the DistaNet method, compared to a basic LDA classifier method; feature extraction is necessary for both the DistaNet and LDA-only methods. (B) The diagram for the testing stage of the DistaNet method. Example signal processing pipeline, including (C) eight channels of raw EMG signals; (D) feature extraction resulting in 16 traces (2 features per channel); (E) estimated distances used for continuous pseudo-labelling; (F) classifier output determined by the lowest estimated distance as the winning class; (G) task states presented on the user interface (three example trials).

Figure 2 .
Figure 2. The structure of the neural network. (A) Dilated convolution effectively enlarges the kernel with a hollow (dilated) structure, which significantly increases the receptive field; causal convolution uses chomping, which makes each output depend only on inputs from previous time steps [39]. (B) The layered structure of a TCN block.

Figure 3 .
Figure 3. Experimental design. (A) Hand gestures involved in the experiment. (B) The temporal structure of each trial; the green volume sign represents a higher-pitched sound and the red one a lower-pitched sound. (C) The arrangement of the experimental test and practice blocks across the two days. (D) Feedback availability for the different groups in the baseline, practice, and test blocks; 'None' refers to no feedback, i.e. nothing on the screen after the user finishes the trial.

Figure 4 .
Figure 4. Average performance per group in the baseline block. Subjects were evaluated in a feedback-free baseline block and afterwards assigned to groups so as to balance the initial performance of each group. All groups had the same average accuracy before the first practice block.

Figure 5 .
Figure 5. Results. (A) The average decoding accuracy for the three groups during each practice, test, and baseline block. (B) The decoding accuracy for each subject in each group in Test Blocks 2, 3, and 4; * indicates statistical significance. (C) The average decoding accuracy for each gesture in each block on day 2 for group DistaNet-S. (D) The confusion matrices of group DistaNet-S at the beginning and end of day 2. (E) The latent space of one subject's data from group DistaNet-S across all classes, compared between the beginning and the end of day 2.