Fuzzy inference system (FIS) - long short-term memory (LSTM) network for electromyography (EMG) signal analysis

A wide range of application domains, such as remote robotic control, rehabilitation, and remote surgery, require capturing neuromuscular activities. The reliability of such applications depends heavily on the ability to decode intentions accurately from the captured neuromuscular signals. Physiological signals such as Electromyography (EMG) and Electroencephalography (EEG) generated by neuromuscular activities contain intrinsic patterns for users’ particular actions. Such actions can generally be classified as motor states, such as Forward, Reverse, Hand-Grip, and Hand-Release. To classify these motor states reliably, the signals must be captured and decoded correctly. This paper proposes a novel classification technique using a Fuzzy Inference System (FIS) and a Long Short-Term Memory (LSTM) network to classify the motor states based on EMG signals. Existing EMG signal classification techniques generally rely on features derived from data captured at a specific time instance. This typical approach does not consider the temporal correlation of the signal across the entire window. The proposed LSTM with fuzzy logic method classifies four major hand movements: forward, reverse, raise, and lower. Features associated with the pattern generated throughout the motor state movement were extracted from published data within a given time window. The classification achieves 91.3% accuracy for the 4-way action (Forward/Reverse/GripUp/RelDown), and 95.1% (Forward/Reverse) and 96.7% (GripUp/RelDown) for the 2-way actions. The proposed mechanism produces high-level, human-interpretable results that can be employed in the rehabilitation or medical-device industries.


Introduction
Electromyography (EMG) signals are used to capture and analyze the neuromuscular activation observed in various muscles during different physical activities [1], particularly resistance-training exercises. Two types of contraction account for most muscular activation in resistance training: isometric and isotonic. Muscular fatigue and neuromuscular diseases can be studied through EMG signals obtained under isotonic conditions [2,3]. Many assistive robotic systems, exoskeletons, and lower-limb orthoses are designed based on isotonic and isometric contractions.
In isotonic contractions, the muscle contracts against a resistance and its length changes. By varying its length, the muscle can generate a force. Isotonic contractions are either concentric or eccentric. In concentric contractions, muscles tend to shorten [4]. During this time, the tension within the muscle is constant [5]. In eccentric contractions, muscles elongate as they face resistance. When the external force is greater than the force the muscle can generate, the muscle goes through forced lengthening. Many routine activities, like walking, involve eccentric contractions, which makes them a popular study area [6]. Many muscular injuries are also associated with eccentric contractions [7].
In isometric contractions, the muscle length does not change; however, the muscle's energy and tension keep fluctuating, allowing a force to be produced. Typically, isometric contractions are observed when an action is performed toward a fixed object without any resultant movement.
The EMG is a complex signal [8] and has many dependencies on the physiological and anatomical properties of the underlying muscles. The most convenient approach to obtain EMG signals is through the skin surface. Thus, it is also known as sEMG (surface EMG). However, the signals obtained through this method tend to have much noise due to inputs from other neighbouring motor units. Therefore, various signal processing and feature extraction techniques must be utilized for detailed data analysis.
In the human brain, the nerves utilize electrical potentials to convey information. A similar phenomenon is observed with muscles when electrical potentials are generated by the muscles during any motor movement. The motor unit action potential is the combined effect of the various muscle fibres associated with a single motor unit.
Generally, many integrated sensors perform primary signal processing techniques with hardware support. These include amplification, filtering to remove artefacts, and smoothing signals to enable easier envelope detection. They also provide custom signal processing and feature extraction techniques.
A variety of signal processing and feature extraction techniques have been utilized in the study of EMG signals [8]. Machine learning techniques have become pervasive in system development, and several have been used to analyze and classify hand movements. In [9], the authors proposed an approach using cross-recurrence quantification analysis (CRQA) based on a set of specific hand movements. They achieved classification accuracies of 80% to 94% for the various gestures. However, their method relies on plot analysis, which limits its use to investigative settings. In [10], the authors proposed real-time classification using recurrent neural networks (RNNs). Their approach utilized 16 EMG channels and achieved an accuracy of 92.7% for specific palm gestures. A more detailed review of various techniques and results is given in [11]. In that review, the authors examine various time- and frequency-domain features used in EMG classification and assess their effectiveness for other gestures.
EMG-based classification features are categorised as time-domain, frequency-domain, and time-frequency-domain features [12,13]. The study in [14] concluded that the features yielding the highest classification accuracy were Waveform Length (WL), Mean Absolute Value (MAV), Willison Amplitude (WA), Auto-regressive coefficients (AR), and slightly modified mean absolute values (MAV1 and MAV2). Its results suggested that frequency-domain features were not suitable for EMG classification. In [15], a technique to remove user variability was proposed. Using RMS signals and a Support Vector Machine (SVM) classifier, the system accurately classified data from new users without retraining. Different combinations of time-domain features, sixth-order AR, and RMS were used by the authors in [16]. Principal Component Analysis (PCA) and Uncorrelated Linear Discriminant Analysis (ULDA) were performed for dimensionality reduction, and the best results were obtained using ULDA for a 7-class problem.
The authors in [17] used 300 ms and 125 ms windows in the time domain. The 300 ms window had no overlap, while the 125 ms window had a 90 ms overlap. No clear reasoning was given for the selection of these window and overlap durations. The classifier trained with the 300 ms window achieved a better accuracy than the one trained with the 125 ms window. In [18], 16 EMG channels were analyzed with a window size of 300 ms and a step of 50 ms. The classifier was trained with data from 10 distinct motor movements, and a classification accuracy of 92.75% was achieved using the RMS features. The authors in [19] proposed using supervised Common Spatial Patterns (CSP) to improve class separability. This technique was found to enhance the Signal-to-Noise Ratio (SNR) of the EMG signals. In [19], the short-time Fourier Transform (STFT) ranking feature was shown to better reflect the relationship between the signals from various EMG channels.
Convolutional Neural Networks (CNNs) for EMG-based gesture recognition have been investigated in [20]. Using adaptive feature learning techniques, the authors improved the overall inter-subject accuracy. In [21], EMG signals were explored for onset/offset movement detection using a proposed LSTM-MAD (Long Short-Term Memory for Muscle Activity Detection) network. The authors reported a detection rate of up to 97%; however, the results reflect only the detection of movement onset and not any particular gesture or movement. The authors in [22] proposed a CNN-RNN network to capture the sequential nature of EMG signals for gesture recognition. Their approach, although complex, used sEMG image representation methods and achieved a high level of accuracy. In [23], the authors proposed using an LSTM network to predict the upper limb motion trajectory of patients. Their system achieved an average prediction accuracy of 95.3% using heart rate information and kinematic data for the classification.
In [24], a comprehensive review of Deep Learning methods for EMG signal analysis and classification was performed. Networks such as CNN, RNN, Gated Recurrent Units (GRU), Auto-Encoder (AE), Deep Belief Networks (DBN), and combinations of these were described. Olsson et al. [25] proposed a CNN-based multi-label classification scheme. Using data from 14 healthy subjects with 16 independent movements, an accuracy of 78.7% was reached. In [26], a sliding window method, together with a GRU-based scheme, was used to classify 6 gestures from 35 participants, achieving an accuracy of 77.85%. In [27], a bi-directional LSTM with an attention mechanism and a step-wise learning rate was used to classify 18 gestures. The highest accuracy achieved was 86.7%.
The existing research shows that the focus has been on classifying specific hand gestures or movements [28][29][30][31] and overlooks connections between them. This is very limiting as user movements generally consist of several actions linked together in a practical setting. Furthermore, signals from several sensors are generally used to train a classifier without fusing the information from the signals together. In this paper, we have explored the use of various time and frequency-related features with an LSTM network and Fuzzy Logic for full-range hand movement classification. A comparative study is also done to evaluate the effectiveness of using fuzzy logic signals in LSTM networks.
The key contributions of this paper are:
• Integration of EMG features using a fuzzy inference system (FIS). The integrated signal, 'emg_score', has a low correlation factor between the different pairs of movements, which supports high classification accuracies. This allows for a more effective means of utilising the various EMG signals before they are passed on to the classifier.
• Utilising the LSTM network to accurately represent the temporal characteristics of the signals based on the 'emg_score' generated from the FIS. As EMG signals generated by different movements are time-varying in nature, the ability to capture this change is crucial in designing an efficient classifier. The LSTM network naturally supports this because it retains the temporal characteristics of the signal. The use of fuzzy logic allows for a more natural representation of the EMG signals. The results are comparable to the current state-of-the-art.
• Achieving 4-class classification accuracies for complete full-range hand movements that outperform other classifiers. This allows a single system to decode various real-world movements accurately.
The rest of this paper is structured as follows: the next section describes the methodology and the details of the approach adopted in this research, including the LSTM architecture, Dataset, Signal Processing, and the FIS. Section 3 outlines the results, followed by a discussion and conclusion of this study.

Methodology
The methodology outlined in figure 1 gives an overview of the steps carried out. The dataset published in [32] is used in this study. This is a comprehensive dataset consisting of 3936 different grasp and lift operations. It captures both EEG and EMG data together with other physiological parameters. A total of 12 participants took part in the data acquisition.
The objective of each participant was to reach for an object, perform a grasp operation with the thumb and the index finger, and then lift the object a short distance in the air. After holding it stable for a few seconds, the participant was given a visual cue to lower the object and place it back in its original position. The hand was then retracted to the starting position. Data were collected from 32 channels through the EEG headset and 5 channels through the EMG sensors [32].
The raw EMG data is first extracted from the dataset. Band-pass filtering is performed with cut-off frequencies of 5 Hz and 500 Hz. These frequencies are selected based on the EMG sensor specifications provided in the dataset. The root-mean-square (RMS) computation is then carried out on the data. The RMS data produces a more easily analyzable waveform than the raw data [12]. The specific time instances that correspond to the four movements of 'Forward,' 'Grip and Lift,' 'Down & Release,' and 'Reverse' are used to extract the associated data. These movement onsets can be seen in figure 2, which corresponds to a single EMG channel; the raw and RMS signals are indicated in blue (thin line) and red (thick line), respectively. In the next step, various time- and frequency-domain features are extracted. These features are then fused through a fuzzy inference system. The fuzzy outputs are then used to train an LSTM network for classification. A detailed explanation of the LSTM architecture, fuzzy logic, and the rule base used to generate the 'emg_score' is given in the following subsections.
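The RMS step described above can be sketched as a moving-window computation. This is only a minimal illustration in plain Python: the function name `moving_rms`, the window length, and the sample values are assumptions, since the text does not state the RMS window that was used, and the band-pass filtering stage is omitted here.

```python
import math


def moving_rms(samples, window):
    """Return the RMS of every full sliding window over `samples` (step of one sample)."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    squares = [s * s for s in samples]
    running = sum(squares[:window])  # sum of squares inside the first window
    envelope = [math.sqrt(running / window)]
    for i in range(window, len(squares)):
        # Slide by one sample: add the newest square, drop the oldest one.
        running += squares[i] - squares[i - window]
        envelope.append(math.sqrt(max(running, 0.0) / window))
    return envelope


# Hypothetical, already band-pass-filtered EMG samples (arbitrary units).
emg = [0.0, 3.0, -4.0, 0.0, 0.0]
envelope = moving_rms(emg, window=2)
```

A longer window gives a smoother envelope at the cost of temporal resolution, so the window would normally be chosen relative to the sampling rate of the recording.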

LSTM architecture
An LSTM network is a form of Recurrent Neural Network (RNN). RNNs can effectively learn short-term dependencies but have difficulty with long-term dependencies. LSTMs are designed to address the long-term dependency issue so that both short- and long-term patterns in the data can be learned [33]. Figure 3 illustrates how data flows through an LSTM layer. It shows a time series X with C features (channels) and length S. The state at any time step t is denoted by both $h_t$ and $c_t$.
The hidden state $h_t$ provides the cell with working memory that allows it to carry information from events that have recently happened. This is also updated at every step. The cell state $c_t$ provides the long-term capability of the cell. It can store and retrieve information from past events that may not have happened recently. The first LSTM block uses the initial state together with the first time step of the feature sequence. The hidden state (output state) at time step t provides the LSTM layer output for that particular time step. Information learned from previous time steps is contained within the cell state.
The cell states are recursively updated to either add or remove information. This behaviour is achieved through the use of gates. The components explained in table 1 control the cell state and hidden state of the layer, together with their associated activation functions. Figure 4 shows the data flow at a particular time step t within an LSTM unit block, as introduced in equation (1). Further concatenation of the matrices gives the following: at any time step, the state of the cell is given by equation (2).
The hidden state is computed as shown in equation (3).
The state activation function is represented by $\sigma_c$.
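For reference, a commonly used formulation of the LSTM update that is consistent with the description above can be sketched as follows. The symbols $W$, $R$, and $b$ (input weights, recurrent weights, and bias), the gate subscripts $i$, $f$, $g$, $o$ (input gate, forget gate, cell candidate, output gate), and the gate activation $\sigma_g$ are assumed notation and may differ from the authors' equations (1) to (3); $\odot$ denotes the element-wise product.

```latex
\begin{align}
W &= \begin{bmatrix} W_i \\ W_f \\ W_g \\ W_o \end{bmatrix}, \quad
R = \begin{bmatrix} R_i \\ R_f \\ R_g \\ R_o \end{bmatrix}, \quad
b = \begin{bmatrix} b_i \\ b_f \\ b_g \\ b_o \end{bmatrix} \\
i_t &= \sigma_g\left(W_i x_t + R_i h_{t-1} + b_i\right) \\
f_t &= \sigma_g\left(W_f x_t + R_f h_{t-1} + b_f\right) \\
g_t &= \sigma_c\left(W_g x_t + R_g h_{t-1} + b_g\right) \\
o_t &= \sigma_g\left(W_o x_t + R_o h_{t-1} + b_o\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \sigma_c\left(c_t\right)
\end{align}
```

In this form, $\sigma_g$ is typically the logistic sigmoid and $\sigma_c$ the hyperbolic tangent, so the forget gate $f_t$ scales how much of the previous cell state is kept and the input gate $i_t$ scales how much of the candidate $g_t$ is added.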

Fuzzy logic
Fuzzy Logic allows us to relate classes of objects without clearly defining any boundaries. It classifies objects based on membership functions formed as a matter of degree. It can also be viewed as a methodology for interpreting descriptive words rather than numbers, and in this manner, it can very closely approximate human reasoning.

Fuzzy sets and membership functions
The basic idea that drives fuzzy systems is the fuzzy set. In classical set theory, elements assigned a value of 1 belong to the set X, while elements assigned a value of 0 do not.
A fuzzy set adds flexibility to this rule, allowing a more human-like interpretation of the data. This is achieved by allowing membership values between 0 and 1, as in equation (4), where $\mu_{\bar{A}}$ is the membership function (MF) of $\bar{A}$ and $\bar{A}$ is a fuzzy set in the universe U, equation (5). The MF determines the degree of participation of each input. Each input has a weighted value that may overlap with those of other inputs. The output value is finally determined using rules that interpret these inputs.
Logical Operations such as AND, OR, and NOT can also be performed on Fuzzy Sets. Figure 5 demonstrates the results of these operations. The same logical concepts as digital logic apply, except now they are applied to the MFs.
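A minimal sketch of these operations on membership degrees is given below. It assumes the common minimum, maximum, and complement definitions; the text does not state which operators figure 5 uses, and the function names are illustrative only.

```python
def fuzzy_and(a, b):
    """Fuzzy intersection of two membership degrees (minimum operator)."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy union of two membership degrees (maximum operator)."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy complement of a membership degree."""
    return 1.0 - a


# Hypothetical membership degrees of one RMS sample in two fuzzy sets.
mu_low, mu_high = 0.7, 0.4
both = fuzzy_and(mu_low, mu_high)
either = fuzzy_or(mu_low, mu_high)
not_low = fuzzy_not(mu_low)
```

When the degrees are restricted to 0 and 1, these definitions reduce to the usual Boolean AND, OR, and NOT.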
Step 1: Initialisation. The first step is to initialize a Mamdani FIS [36]. Such a system generates a final fuzzy output that can then be used in the next phase by the LSTM network.
Step 2: Membership Functions (MF). In the next step, membership functions are defined to map the raw time-series RMS values to new outputs. A Gaussian MF was selected, with RMS values in the range 0 to 1 and an interval of 0.2 for all values. In equations (5) and (6), the membership functions for each channel are specified as $\mu_{ch}$, where 'ch' represents the various EMG channels. This was applied to all four movements (Forward/Reverse/GripUp/RelDown) across all 5 channels.
Step 3: Specifying the Rule Base. The rules fusing the various channels are specified in this step. For the Forward/Reverse movements, the AD, FD, and BR RMS values are used to generate a final fuzzified output. For the GripUp/RelDown movements, the FD, CED, and FDI channels are used. The rule base can be viewed in the online reference link [37]. The rules were formulated by observing the signals for the various movements, as discussed in earlier published papers [38,39].
Step 4: Completing the FIS. The FIS model is complete when the required input blocks (channels) are connected to the FIS rule base. The surface plot gives a pictorial view of the interpretation of the rules. Figure 6(a) shows the surface plot for the Forward/Reverse movements, while figure 6(b) shows the surface plot for the GripUp/RelDown movements. The generated surface plots are used to compute the final 'emg_score' from the FIS.
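The four steps above can be illustrated with a small Mamdani-style sketch that fuses two normalised RMS inputs into a single score. Everything specific here is an assumption made for illustration: the term names and centres, the width `SIGMA`, the two-input rule base `RULES`, and the function name `emg_score_sketch`. The actual rule base is the one referenced in [37], and the paper fuses three channels per movement pair.

```python
import math


def gauss_mf(x, centre, sigma):
    """Gaussian membership function evaluated at x."""
    return math.exp(-((x - centre) ** 2) / (2.0 * sigma ** 2))


# Assumed linguistic terms on the normalised RMS range [0, 1].
TERMS = {"low": 0.0, "medium": 0.5, "high": 1.0}
SIGMA = 0.2  # assumed width, not taken from the paper

# Hypothetical rule base: (term of channel A, term of channel B, output term).
RULES = [
    ("high", "high", "high"),
    ("medium", "medium", "medium"),
    ("low", "low", "low"),
    ("high", "low", "medium"),
    ("low", "high", "medium"),
]


def emg_score_sketch(rms_a, rms_b, steps=101):
    """Mamdani inference: min for AND and implication, max aggregation, centroid output."""
    grid = [i / (steps - 1) for i in range(steps)]  # discretised output universe
    aggregated = [0.0] * steps
    for term_a, term_b, term_out in RULES:
        # Firing strength of the rule (fuzzy AND of the two antecedents).
        strength = min(gauss_mf(rms_a, TERMS[term_a], SIGMA),
                       gauss_mf(rms_b, TERMS[term_b], SIGMA))
        for k, y in enumerate(grid):
            clipped = min(strength, gauss_mf(y, TERMS[term_out], SIGMA))
            aggregated[k] = max(aggregated[k], clipped)
    # Centroid defuzzification of the aggregated output set.
    return sum(y * m for y, m in zip(grid, aggregated)) / sum(aggregated)


strong = emg_score_sketch(0.90, 0.85)  # both channels strongly active
weak = emg_score_sketch(0.10, 0.15)    # both channels weakly active
```

With this hypothetical rule base, strongly active inputs yield a score above 0.5 and weakly active inputs a score below 0.5, which is the kind of separation between actions that the fused signal is meant to provide.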

Testing and results
The dataset provided in [32] has been used in this research. It is a highly comprehensive dataset consisting of 3936 different grasp and lift operations. It captures both EEG and EMG signals and other physiological parameters. A total of 12 participants took part in the data acquisition. The objective for each participant was to reach for an object, perform a grasp operation with the thumb and index finger, and then lift the object a short distance in the air. After holding it stable for a few seconds, they were told to lower it and place it back in its original position. Data were collected from 32 channels through the EEG headset and 5 channels of EMG data from the FDI, CED, FD, BR, and AD muscles. Figure 7 shows the placement of these sensors on the person's arm.

LSTM analysis
The time-series data were extracted for the duration of each of the movements, as shown in figure 2. In the first phase, each channel was used to extract the time series for the LSTM network. The RMS was computed from all the trials from the entire series. The data for each participant was packed together with the appropriate labels for 'Forward,' 'Reverse,' 'GripUp,' and 'RelDown.' 10-fold cross-validation was performed for the feature set using the LSTM network, and the classification accuracy, equation (8), and F1-score, equation (9), were computed for each participant. The average classification accuracies and F1-scores for all the participants are shown in table 2. It can be seen that using any single channel with the LSTM is not very effective for classification.
In an earlier publication [12], the same dataset was analyzed for the various movements. It was demonstrated that the AD, BR, and FD channels generated the highest accuracies for the Forward/Reverse movement. These channels correspond to the sensors placed on the shoulder and forearm. For the GripUp and RelDown movements, the FD, CED, and FDI channels are most suitable, as they correspond to the sensors placed on the forearm and wrist. To fuse these various channels in a more human-like manner, Fuzzy Logic is used.
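As a sketch of the two metrics mentioned above, the snippet below uses the standard binary definitions of accuracy and F1-score in terms of true/false positives and negatives. These are assumed to correspond to equations (8) and (9); the counts are hypothetical and are not taken from table 2.

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that are correct."""
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall, written as 2TP / (2TP + FP + FN)."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Hypothetical confusion counts for one participant and one class.
acc = accuracy(tp=45, tn=40, fp=10, fn=5)
f1 = f1_score(tp=45, fp=10, fn=5)
```

For a multi-class problem, such per-class scores are usually averaged across the classes, although the averaging scheme used in the paper is not stated.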

TWO-WAY classification with fuzzy+LSTM
In order to fuse the various signals, the fuzzy rule base is applied such that the resultant output (emg_score) is clearly distinct between the various actions. The RMS data is fed into the rule base as defined in [37]. Figure 8 shows the generated emg_score output. To measure the dissimilarity between the two actions of forward and reverse, the cross-correlation between their emg_score signals was computed using equation (10). A cross-correlation result of 0.09 was obtained for this pair. A similar approach was taken for the GripUp and RelDown movements, for which the cross-correlation score was 0.07. Both scores are close to zero, which indicates a clear distinction between the emg_scores generated by the two different actions within each pair.
In equation (10), Cov refers to the covariance and $\sigma_x$ refers to the standard deviation. Further to this, a One-Way ANOVA (Analysis of Variance) [40] was performed. ANOVA is a hypothesis test used to determine whether the means of independent groups differ. For One-Way ANOVA, the hypotheses are (1) the null hypothesis, where the group means are equal, and (2) the alternative hypothesis, where the group means are not all equal. To determine whether the difference between the group means is statistically significant, we look at the p-value [40]. By comparing the p-value with the significance level, we can decide whether to reject the null hypothesis. With the significance level set to 0.05, a p-value of 0.006 was obtained for the Forward/Reverse pair and a p-value of 0.008 was obtained for the GripUp/RelDown pair. Both values are below 0.05, which shows that the emg_scores within each pair are significantly different from each other. This allows them to be used as a reliable feature for classification.
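The two statistics used above can be sketched in plain Python. The correlation function reads equation (10) as the normalised (Pearson) correlation coefficient, which is an assumption based on the covariance and standard deviation terms it mentions. The ANOVA function returns only the F statistic; converting it to a p-value requires the F distribution and is left out. All function names and sample values are illustrative.

```python
import math


def correlation(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return cov / (std_x * std_y)


def one_way_anova_f(groups):
    """F statistic: between-group mean square over within-group mean square."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Hypothetical score sequences, not values from the dataset.
r_same = correlation([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
r_unrelated = correlation([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0])
f_stat = one_way_anova_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]])
```

A coefficient near zero, as in the second example, indicates that the two sequences do not vary together, which is how the low scores reported above are interpreted.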
The emg_score data is then used as the input stream for the LSTM network. Data from all twelve participants for all trials are used to train the network. 10-fold cross-validation was performed, and the average classification accuracy across all the trials for the participants was then calculated. A classification accuracy of 95.1% was achieved for the Forward/Reverse action. This is an improvement of approximately 45% over using the LSTM with time-series RMS data. For the GripUp/RelDown action, a classification accuracy of up to 96.7% was achieved. This is an improvement of approximately 47% over using the LSTM with time-series RMS data.
A comparative study was performed to evaluate the effectiveness of LSTM against other Deep Learning techniques such as Convolutional Neural Networks (CNN) and Deep Belief Networks (DBN). For the proposed Long Short-Term Memory (LSTM) network, the average accuracy was 95.9% with an F1 score of 0.9467. For the DBN, the accuracy achieved was 92.3% with an F1 score of 0.9021, and for the CNN, the accuracy was 88.1% with an F1 score of 0.8213. The Receiver Operating Characteristic (ROC) [41] is a probability curve that plots the true-positive rate against the false-positive rate. The area under the ROC, referred to as the Area Under the Curve (AUC), measures the ability of a classifier to distinguish between different classes. The higher the AUC, the better the model is at distinguishing between the positive and negative classes. It can be seen from figure 9 that the AUC for the LSTM network is greater than that of the CNN and DBN.
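To make the AUC concrete, the sketch below computes it directly as the probability that a randomly chosen positive sample receives a higher classifier score than a randomly chosen negative one, with ties counted as one half. This pairwise form is equivalent to the area under the ROC curve. The labels and scores are hypothetical and unrelated to figure 9.

```python
def roc_auc(labels, scores):
    """AUC from binary labels (1 = positive, 0 = negative) and classifier scores."""
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(positives) * len(negatives))


# Hypothetical labels and scores for a two-class problem.
auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

An AUC of 0.5 corresponds to chance-level ranking and 1.0 to perfect separation of the two classes.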

FOUR-WAY classification with fuzzy+LSTM
In the final phase, a 4-way classification is performed on the data. Building on the existing methodology, the FIS rules are updated to incorporate all the rules for both the Forward/Reverse and GripUp/RelDown action pairs. The emg_scores generated for the GripUp/RelDown pairs are shown in figure 10.
With the combined model, 10-fold cross-validation is performed on all data from all the participants. The average classification accuracy is shown in figure 11 with the associated Confusion Matrix shown in figure 12.
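The 10-fold procedure can be sketched as an index splitter in which each fold serves once as the test set while the remaining folds form the training set. The function name and the sample count are illustrative, and no shuffling or per-class stratification is applied, since the text does not say whether either was used.

```python
def k_fold_indices(n_samples, k=10):
    """Return k (train_indices, test_indices) pairs covering range(n_samples)."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and n_samples")
    # Spread the remainder over the first folds so sizes differ by at most one.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    indices = list(range(n_samples))
    folds, start = [], 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        folds.append((train, test))
        start += size
    return folds


# Hypothetical number of labelled sequences.
folds = k_fold_indices(25, k=10)
```

The reported accuracy is then the mean of the per-fold test accuracies, so every sample contributes to the test score exactly once.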
A comparison study was also performed to evaluate the proposed methodology against other deep learning networks and classifiers, such as CNN, DBN, Support Vector Machine (SVM), k-Nearest Neighbour (kNN), and the Artificial Neural Network (ANN). With the same features and fuzzy rule base, the following networks were tested: Fuzzy+CNN, Fuzzy+DBN, Fuzzy+SVM, Fuzzy+kNN, and Fuzzy+ANN. Figure 13 shows the critical network parameters together with the average four-class classification accuracies for each of them.
The classification results are shown in table 3.

Discussion
The time-series nature of physiological signals like EMG allows the use of LSTM networks to identify muscle activity instances without attempting to extract extensive features from the raw data.
The initial results highlighted in table 2 indicate that the LSTM accuracy based on individual sensor channels is very low. It has been demonstrated in [12] that a combination of various channels can yield better classification accuracies. The use of Fuzzy Logic allows us to generate a unified 'emg_score' value based on the channels that are related to the specific action being carried out. Using this emg_score, the LSTM network is trained and achieves much higher classification accuracies, up to 96.7%. With the proposed methodology, 4-way classification is also performed, with up to 91.3% accuracy. The use of LSTM for EMG signal classification has been explored by others.
EMG signals were explored in [21] for onset/offset movement detection using a proposed LSTM-MAD network. A detection rate of up to 97% was reported by the authors. Their approach demonstrated the ability to differentiate activation intervals from background noise for both simulated and real data. Similar approaches were attempted using the Teager-Kaiser Energy Operator [42] and a double-threshold statistical detector [43]. In [27], a comparative study was performed between different Recurrent Neural Networks (RNNs). The focus was on the use of LSTM and Gated Recurrent Units (GRU). A classification accuracy of up to 86.7% was achieved.
The authors in [22] proposed a CNN-RNN network to capture the sequential nature of EMG signals for gesture recognition. Their approach, although complex, used sEMG image representation methods and achieved a high level of accuracy. In [44], the authors used EMG images to achieve accuracies of 77.8% for 52 gestures and 89.3% for 27 gestures. In [34], the authors used fuzzy logic to classify EMG signals for wrist movements alone. They achieved an average classification accuracy of 93.12%.
The two-way classification achieved an accuracy of 95.1% for the Forward/Reverse pair and 96.7% for the GripUp/RelDown pair. These high accuracies are achieved for full-range arm movements. The four-way classification for all the movements achieved accuracies in the range of 86.7±5% to 91.3±3%. These results give confidence in the design of a unified Fuzzy+LSTM network that can be trained for multiple arm movements. The comparison study between the proposed system and alternative classifiers shows that Fuzzy+LSTM can outperform other classifiers in a similar test case.
Compared to the methods discussed in the literature, the technique proposed in this paper uses a simple RMS time-series feature integrated with fuzzy logic and LSTM. The RMS time-series feature set allows the system to extract meaningful features with reduced computing resources and complexity. The classification accuracies reported in earlier papers were targeted toward actions of a particular body part. Our research demonstrates the effectiveness of the proposed approach in classifying a complete movement that is more representative of many real-world actions.
Designing wearables with limited resources becomes a feasible option when the collected data are streamed back for further processing. Implementing the FIS allows us to relate the various sensor readings to each other more effectively. The ability of the LSTM network to identify patterns in a continuously changing signal allows the system to decode the users' intentions accurately. Accuracy levels of up to 91% for 4-way classification provide a good benchmark for future work that extends this study.

Conclusion
The methodology outlined in this paper provides a novel approach that combines Fuzzy Logic with LSTM networks to achieve a high level of accuracy. The results indicate that using LSTM with individual channel features, such as RMS data, yields poor results. Fusing the data from the various EMG sensors generates a much better classification accuracy. The proposed emg_score mechanism using Fuzzy Logic allows for a more human-interpretable way of expressing the relationship between the various sensors. The combined use of Fuzzy Logic with LSTM yields a more accurate system that is able to clearly distinguish two major movement pairs, Forward/Reverse and GripUp/RelDown. Combining all the movements to perform 4-way classification allows a single LSTM network to generate good results of around 90%.
In the next phase, the authors will explore using other physiological signals like EEG together with EMG to improve the efficiency and classification accuracy of the system. This would be beneficial in situations where one of the physiological signals is affected by illness or injury. The system can then consider using other inputs that are functioning well to perform the classification. Implementation of such systems on hand-held portable devices will also be explored so as to develop a practical solution that individuals can use. These can include platforms like FPGAs or multithreaded programming on a microcontroller.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https:// doi.org/https://figshare.com/collections/WAY_