Selective peripheral nerve recordings from nerve cuff electrodes using convolutional neural networks

Ryan G L Koh; Michael Balas; Adrian I Nachman; José Zariffa

doi:10.1088/1741-2552/ab4ac4

1. Introduction

Recording from and stimulating the peripheral nervous system are becoming important components of a new generation of bioelectronic systems. Neurostimulation using implanted peripheral neural interfaces has a long history of success, with applications including the reduction of phantom pain or the restoration of sensation in amputees [1–3]; treatment of overactive bladder [4, 5]; modulation of inflammatory activity using the vagus nerve [6]; implanted functional electrical stimulation for movement restoration [7]; and phrenic nerve stimulation for obstructive sleep apnea [8]. Unfortunately, recording using implanted peripheral neural interfaces has seen much less success and remains a challenge for applications in humans [9]. Overcoming this recording roadblock would provide better control signals for implanted closed-loop devices such as bi-directional neuroprosthetics [10–14] or neuromodulation applications [15–19].

Peripheral neural interfaces based on nerve cuff electrodes have demonstrated the desirable long-term stability needed for clinical application in humans, for both recording [20, 21] and stimulation [1, 2, 22], but they typically suffer from a lower signal-to-noise ratio (SNR) than their intraneural counterparts (e.g. microelectrode arrays). Different techniques have been employed to raise the SNR, such as redesigning the cuff electrode [19, 23–25] or using windowing techniques such as rectify-bin-integration to average the signal over time [19, 21, 26–28]. In addition, multi-contact configurations have been used to work around the low SNR, improving recording selectivity via source localization or beamforming approaches [26–30].

Recently, our group established that recording selectivity in nerve cuff electrodes can be improved by integrating temporal information (waveform shape and conduction velocity) and spatial information (spatial variations in electric fields) into spatiotemporal signatures [31, 32], allowing for better characterization of neural pathways of interest. This framework for associating spatiotemporal multi-contact nerve cuff recordings of compound action potentials (CAPs) with the corresponding neural pathways is hereafter referred to as the Extraneural Spatiotemporal CAPs Extraction (ESCAPE) technique.

In brief, our previous study [32] demonstrated that spatiotemporal signatures of individual, naturally evoked compound action potentials (termed here nCAPs, in contrast to CAPs evoked by direct electrical stimulation of the nerve) could be associated with neural pathways and used to train a neural network to discriminate between different neural pathways. This was a novel contribution, as most previous studies have either worked with electrically evoked CAPs [33–35], which have higher amplitudes, or have not used the CAPs directly, relying instead on windowed data [27, 36]. Only [35, 37] have attempted to cluster individual nCAPs by their conduction velocities, which is insufficient for discriminating neural pathways with similar conduction velocities. The ability to identify individual nCAPs allows the firing rates of different neural pathways to be reconstructed with fine temporal resolution, which could then be used to robustly predict physiological parameters of interest (e.g. joint angles or cutaneous input). Additionally, this could improve selective recording in situations where multiple neural pathways are active, since windowed signals increase the chance of multiple sources overlapping in time, whereas individual nCAPs are less likely to overlap.

We previously suggested that important characteristics are encoded within these spatiotemporal signatures and could be extracted to improve the recording selectivity of the nerve cuff electrode [32]. That approach provided the spatiotemporal signatures of nCAPs to the neural network as a 1D vector. However, the benefits of the spatiotemporal signatures may be better exploited using techniques that retain their 2D structure, so that information encoded in local relationships within the signatures is better preserved. In particular, convolutional neural networks (CNNs) [38, 39], which are most often used in image processing tasks, are an appealing choice because their 2D convolutional operations can pick out structural similarities among neighbouring pixels, allowing for improved integration of the spatial and temporal information encoded within the recordings.

In this study, we propose a CNN called ESCAPE-NET that takes advantage of the structural information encoded within the spatiotemporal signatures. We present the resulting dramatic improvements in the discrimination of individual nCAPs relative to the results reported in [32]. We also investigate how factors affecting the underlying patterns in the spatiotemporal signatures (reference montage and ordering of contacts) influence the discriminability of neural pathways, examine the effects of reducing the number of contacts, and demonstrate the physiological validity of the nCAP-based classification output by predicting joint angles.

2. Methods

2.1. Experimental approach

The same in vivo data as in our previous study [32] was used here. Briefly, nine Long-Evans rats (retired breeders) were placed under isoflurane anesthesia. Access to the sciatic nerve was obtained through an oblique incision on the posterior and dorsal aspect of the hip. This provided access through the natural splitting of the fibers of the gluteus maximus after clearing of the skin and deep fascia. A 56-contact spiral polyimide nerve cuff electrode (CorTec GmbH, Freiburg, Germany) was placed on the nerve to record neural activity, just proximal to its branching into the tibial, peroneal and sural nerves. The electrode consisted of seven rings of eight contacts distributed over the length of the electrode. The cuff had a length of 23 mm and a diameter of 1 mm. A needle electrode, placed in the back of the animal, served as a reference.

Data was acquired through a neural data acquisition board (RHD2000, Intan Technologies, USA) at a sampling rate of 30 kHz. Recordings were bandpass filtered on the acquisition board between 256 Hz and 7.5 kHz. Afferent activity was selectively evoked in different fascicles of the sciatic nerve using three mechanical stimuli: dorsiflexion, plantarflexion and pricking of the heel. The foot was held by the claws using a clip with plastic tips, and dorsiflexion/plantarflexion of the ankle (approximately 60°) was applied manually to evoke proprioceptive activity in the tibial and peroneal branches, respectively [40]. A cutaneous stimulus to the heel, applied using a Von Frey monofilament (300 g), was used to elicit activity in the sural branch [41]. One hundred trials of each activity (dorsiflexion, plantarflexion and pricking of the heel) were collected, along with a recording of alternating dorsi- and plantarflexion. The application of stimuli was video-recorded using a webcam at 30 frames s⁻¹ and synchronized to the neural activity through a visible light-emitting diode (LED) in the videos. The videos were later manually annotated to determine the times at which each type of stimulus was applied, and this process provided the ground truth nCAP labels for training the classifier.

The experimental procedures were approved by the Animal Care Committee of the University of Toronto and all experiments were performed in accordance with the Animal Care Committee's guidelines.

2.2. ESCAPE framework

This framework, first used in [32], involves extracting the spatiotemporal signatures of nCAPs associated with neural pathways of interest (figure 1(a)). The defining characteristic of the ESCAPE framework is the use of these spatiotemporal signatures to improve recording selectivity. In this context, we use the term 'neural pathway' to refer to a small group of fibers that have a related function and fire approximately synchronously. The resulting nCAP produced by such a group of fibers is conjectured to underlie a 'spike' (i.e. a sharp change in amplitude) visible in the nerve cuff recording (as discussed further below).

**Figure 1.** (a) The ESCAPE framework is shown, consisting of the following steps. **Raw Signal:** A neural signal is recorded using an implanted extraneural electrode with multiple rings of multiple contacts (i.e. a 56-channel nerve cuff). **Preprocessing:** The raw signal is then re-referenced (e.g. tripolar) and bandpass filtered per contact. These preprocessed signals in each ring are averaged, and the delay-and-add operations are applied to these average signals to obtain a signal with improved SNR. **nCAP detection:** This newly obtained signal is then thresholded and used to obtain the locations of nCAPs in the preprocessed signal (i.e. the signal before delay-and-add operations are applied). **Extraction of Spatiotemporal Signatures:** nCAPs are extracted from the preprocessed signal and represented as spatiotemporal signatures. **Classification:** These spatiotemporal signatures can then be used for classification with a particular classifier (e.g. CNNs) to obtain spike times for each neural pathway of interest (i.e. dorsiflexion, plantarflexion or pricking). (b) Example signal throughout the ESCAPE framework from raw recording to extraction of the spatiotemporal signatures. Each row in the spatiotemporal signature shown in (a) corresponds to the signal from one contact, as visible in the grid in the bottom right portion of (b).

Preprocessing: First, the raw signal is referenced using a referencing montage (e.g. tripolar [42] or common average [43]), and any required filtering is applied per contact. nCAP detection: The preprocessed signals are averaged within each ring, and delay-and-add operations, as described in the velocity selective recording methods [33–35, 37, 44], are applied to these ring averages to obtain a single averaged signal for the middle ring with improved SNR. nCAPs are then detected by applying a thresholding method (e.g. the median absolute deviation approach [45]) to this averaged signal. Signature extraction: The spatiotemporal signatures of the nCAPs are extracted from the original filtered, referenced signal for use in classifier training and evaluation. Classification: The extracted spatiotemporal signatures are used as input to a classifier that associates each nCAP with a neural pathway. The specific steps used in this study are explained below; a visual example of the signal throughout the framework is shown in figure 1(b), and spatiotemporal signatures of each class are shown in figure 2.

**Figure 2.** Example of spatiotemporal signatures extracted from each class. Signals shown are from rat 6's neural activity.

2.2.1. Preprocessing

Tripolar referencing was first applied off-line to the raw signal, by using the average of the contacts in the two outer rings as the reference. Bandpass filtering was then applied using a 6th order Butterworth filter with a 1–3 kHz passband, implemented using Matlab's butter and filtfilt functions.
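The preprocessing step can be sketched in Python as follows. This is a minimal illustration, not the authors' Matlab code: it assumes the recording is held in a NumPy array of shape (7 rings, 8 contacts, samples) with rings ordered along the cuff, it uses SciPy's `butter` and `filtfilt` as counterparts of the Matlab functions named above, and the helper names `tripolar_reference`, `bandpass` and `preprocess` are hypothetical. Because a bandpass design doubles the prototype order, `order=3` is used to obtain a 6th-order filter; whether '6th order' refers to the prototype or to the final filter is an assumption.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 30_000  # sampling rate (Hz) reported for the recordings


def tripolar_reference(raw: np.ndarray) -> np.ndarray:
    """Subtract the mean of the two outer rings from every contact.

    raw has shape (n_rings, n_contacts_per_ring, n_samples), rings ordered along the cuff.
    """
    outer = np.concatenate([raw[0], raw[-1]], axis=0)  # 16 contacts for a 7 x 8 cuff
    reference = outer.mean(axis=0)  # shape (n_samples,)
    return raw - reference  # broadcast over rings and contacts


def bandpass(x: np.ndarray, low: float = 1_000.0, high: float = 3_000.0,
             order: int = 3, fs: float = FS) -> np.ndarray:
    """Zero-phase Butterworth bandpass along the last (time) axis.

    A bandpass design doubles the prototype order, so order=3 yields a
    6th-order filter (an assumed reading of '6th order').
    """
    b, a = butter(order, [low, high], btype="bandpass", fs=fs)
    return filtfilt(b, a, x, axis=-1)  # forward-backward pass, no phase shift


def preprocess(raw: np.ndarray) -> np.ndarray:
    """Tripolar referencing followed by 1-3 kHz bandpass filtering."""
    return bandpass(tripolar_reference(raw))
```

Applying `filtfilt` runs the filter forwards and backwards, which removes phase distortion so that nCAP peak locations are not shifted in time.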

2.2.2. nCAP detection

Next, the average signal in each ring was obtained, and these signals were shifted in time relative to one another by integer multiples of a time step dt and summed to form a single signal (delay-and-add). A dt of 1 time sample was used, matching the expected nCAP conduction velocity, so that the signals in the different rings add constructively and increase the amplitude of the delay-and-add output. This cleaned signal was then thresholded using the median absolute deviation estimate approach (equation (1), where X denotes the delay-and-add signal) [45] to find nCAP locations. Detected nCAPs with peak values above 15 µV in the tripolar referenced recordings (before delay-and-add) were discarded to minimize the inclusion of noise or artefacts in the dataset (previous analysis found that small variations in this threshold value had a negligible impact on the end results [32]).

$$\mathrm{Threshold} = 4 \times \frac{\mathrm{Median}(|X|)}{0.6745} \tag{1}$$
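The delay-and-add and thresholding steps might look like the sketch below. It is illustrative only: the ring ordering (along the direction of propagation), the use of |X| so that peaks of either polarity are detected, and the 30-sample (1 ms) minimum spacing between detections are assumptions, and `delay_and_add`, `mad_threshold` and `find_ncap_peaks` are hypothetical names. SciPy's `find_peaks` stands in for whatever peak-picking routine was actually used.

```python
import numpy as np
from scipy.signal import find_peaks


def delay_and_add(ring_means: np.ndarray, dt: int = 1) -> np.ndarray:
    """Align ring-averaged signals on the middle ring and sum them.

    ring_means has shape (n_rings, n_samples); rings are assumed to be ordered
    in the direction of propagation. Ring r is advanced by (r - middle) * dt
    samples, so an nCAP that reaches each successive ring dt samples later
    lines up across rings and adds constructively.
    """
    n_rings, n_samples = ring_means.shape
    middle = n_rings // 2
    out = np.zeros(n_samples)
    for r in range(n_rings):
        shift = (r - middle) * dt
        # np.roll wraps samples around the ends; edge samples should be ignored.
        out += np.roll(ring_means[r], -shift)
    return out


def mad_threshold(x: np.ndarray, k: float = 4.0) -> float:
    """Equation (1): k * Median(|x|) / 0.6745, a robust estimate of k noise SDs."""
    return k * np.median(np.abs(x)) / 0.6745


def find_ncap_peaks(x: np.ndarray, threshold: float, min_distance: int = 30) -> np.ndarray:
    """Sample indices where |x| peaks above threshold (polarity and spacing assumed)."""
    peaks, _ = find_peaks(np.abs(x), height=threshold, distance=min_distance)
    return peaks
```

For a preprocessed array `x` of shape (7, 8, samples), the ring averages are `x.mean(axis=1)`. The division by 0.6745 converts the median absolute value into a standard-deviation estimate for Gaussian noise, so the threshold sits at about four noise standard deviations while remaining insensitive to the sparse nCAPs themselves.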

2.2.3. Spatiotemporal signatures

Once nCAPs were detected, the 49 time samples before and the 50 time samples after each peak location (100 samples including the peak) were used to create the spatiotemporal signatures. These spatiotemporal signatures were extracted from the signal after tripolar referencing, but before the delay-and-add operation, which was used solely for nCAP detection.
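A sketch of the window extraction is shown below. It assumes the detected peak indices share the sample timebase of the referenced, filtered signal, and that peaks too close to either end of the recording are simply dropped (edge handling is not described, so this is an assumption). `extract_signatures` is a hypothetical helper.

```python
import numpy as np

PRE, POST = 49, 50  # samples kept before and after each detected peak


def extract_signatures(signals: np.ndarray, peaks: np.ndarray) -> np.ndarray:
    """Cut an (n_contacts x 100) window around each detected peak.

    signals: referenced, filtered data of shape (n_contacts, n_samples),
             i.e. taken before delay-and-add.
    peaks:   sample indices of detected nCAPs.
    Returns shape (n_valid_peaks, n_contacts, PRE + 1 + POST).
    """
    n_contacts, n_samples = signals.shape
    # Drop peaks whose window would run past either end of the recording.
    valid = peaks[(peaks >= PRE) & (peaks + POST < n_samples)]
    windows = [signals[:, p - PRE : p + POST + 1] for p in valid]
    if not windows:
        return np.empty((0, n_contacts, PRE + 1 + POST))
    return np.stack(windows)
```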

For each activity (dorsiflexion, plantarflexion and pricking of the heel), spatiotemporal signatures were constructed from the detected nCAPs using M contacts at T consecutive time samples, so that each nCAP is represented by an M × T matrix, its spatiotemporal signature. By associating each nCAP in a new recording with one of these known signatures, the activities of multiple neural pathways can be discriminated.

Two different spatiotemporal signatures were created: numbering contacts within a ring (radially) produced a representation with a 'spatial emphasis' (SE), and numbering contacts along rings (longitudinally) produced a representation with a 'temporal emphasis' (TE), as illustrated in figure 3. In addition, different referencing montages (tripolar and common average) were applied to create alternative representations of the spatiotemporal signatures. The tripolar reference montage uses the average signal from the two outer rings of contacts of the nerve cuff electrode (i.e. the average signal of 16 contacts) as a reference for all contacts. The common average reference montage takes the average signal across all contacts as the reference. These spatiotemporal signatures were then used as inputs to the CNN.
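The two row orderings and the common average montage can be illustrated with NumPy reshapes. This sketch assumes each signature is stored as (ring, position within ring, time) and that SE and TE correspond to ring-major and position-major numbering, respectively; the exact numbering is defined by figure 3(a), and the function names are hypothetical.

```python
import numpy as np


def to_spatial_emphasis(x: np.ndarray) -> np.ndarray:
    """(rings, contacts per ring, T) -> (rings * contacts, T); adjacent rows share a ring."""
    n_rings, n_per_ring, n_time = x.shape
    return x.reshape(n_rings * n_per_ring, n_time)


def to_temporal_emphasis(x: np.ndarray) -> np.ndarray:
    """(rings, contacts per ring, T) -> (rings * contacts, T); adjacent rows share a position."""
    n_rings, n_per_ring, n_time = x.shape
    return x.transpose(1, 0, 2).reshape(n_rings * n_per_ring, n_time)


def common_average_reference(raw: np.ndarray) -> np.ndarray:
    """Subtract the mean over all contacts in all rings from every contact."""
    return raw - raw.mean(axis=(0, 1), keepdims=True)
```

In the TE ordering, consecutive rows come from successive rings at the same angular position, so a propagating nCAP appears as a diagonal shift across neighbouring rows, a local pattern that small convolution kernels can pick up.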

**Figure 3.** (a) Example of the numbering of contacts used for the spatial and temporal emphasis representations. (b) and (c) show the spatial emphasis (SE) and temporal emphasis (TE) representations, respectively. Each signature consists of 56 contacts (rows) and 100 time samples (columns), with the contacts ordered according to the SE or TE numbering shown in (a). Representative signals are from a detected nCAP from rat 6's dorsiflexion activity.

2.3. ESCAPE-NET architecture

CNNs consist of three fundamental building blocks: convolutional layers, pooling layers and fully connected layers.

A convolutional layer convolves an input image with an N × N filter that slides along the image with a stride of K. A pooling layer groups neighbouring pixels and represents them as a single pixel; for example, max pooling uses the maximum value among these neighbouring pixels. This results in an output image with a smaller dimensionality than the input image. Lastly, a fully connected layer takes the output of several convolutional and max-pooling layers and uses it as features for training a fully connected neural network.

The CNN architecture used in this study was created in Keras [46]. Two pairs of convolutional and max-pooling layers, followed by a third convolutional layer, were applied before the outputs were flattened and sent into a fully connected neural network (table 1). The first, second and third convolutional layers used 8 × 8, 4 × 4 and 2 × 2 filters, respectively, with a stride of 1 and zero-padding to maintain the same dimensionality. All max-pooling layers were based on 2 × 2 groupings. All activation functions in ESCAPE-NET were rectified linear units (ReLu), except for the output layer, which used softmax. A block diagram of the CNN is shown in figure 4.

Table 1. ESCAPE-NET architecture for spatial or temporal emphasis.

Layer	Kernel/number of nodes	Stride	Padding	Activation function
Conv_1	8 × 8 × 64	1	Same	ReLu
Max Pooling_1	2 × 2	1	Same
Conv_2	4 × 4 × 64	1	Same	ReLu
Max Pooling_2	2 × 2	1	Same
Conv_3	2 × 2 × 64	1	Same	ReLu
Fully Connected_1	256			ReLu
Output	3			softmax

'Same' means that the output of the layer is the same size as the input.
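Table 1 does not list the input size or the resulting parameter count. The plain-Python bookkeeping below (written without Keras so that it runs without a deep-learning framework) traces tensor shapes for a single-channel 56 × 100 signature. By default it follows table 1 literally, with a pooling stride of 1; setting `pool_stride=2` gives the Keras default for 2 × 2 pooling (stride equal to the pool size), under which each pooling layer halves the dimensions as described in the text. Which of the two was used is not confirmed here, and `escape_net_shapes` is a hypothetical helper.

```python
import math


def escape_net_shapes(input_hw=(56, 100), pool_stride=1, filters=64, hidden=256, n_classes=3):
    """Trace output shapes and parameter counts through the layers of table 1.

    Convolutions use stride 1 and 'same' padding, so height and width are kept.
    Max pooling with 'same' padding gives ceil(size / stride) along each axis.
    Returns (list of (layer, output shape, parameters), total parameters).
    """
    h, w = input_hw
    c = 1  # the signature enters as a single-channel image
    layers, total = [], 0
    for kind, k in [("conv", 8), ("pool", 2), ("conv", 4), ("pool", 2), ("conv", 2)]:
        if kind == "conv":
            params = k * k * c * filters + filters  # kernel weights + biases
            c = filters
        else:
            h, w = math.ceil(h / pool_stride), math.ceil(w / pool_stride)
            params = 0  # pooling has no trainable weights
        total += params
        layers.append((kind, (h, w, c), params))
    flat = h * w * c
    dense = flat * hidden + hidden
    out = hidden * n_classes + n_classes
    total += dense + out
    layers += [("flatten", (flat,), 0), ("dense", (hidden,), dense), ("softmax", (n_classes,), out)]
    return layers, total
```

Under the literal reading, the 256-node dense layer alone holds about 91.8 million parameters, compared with about 5.7 million when the pooling stride is 2, so the pooling stride largely determines the model's size.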

**Figure 4.** (a) Block diagram of ESCAPE-NET − SE/TE (b) block diagram of ESCAPE-NET − SE + TE.

2.4. ESCAPE-NET training and evaluation

A separate CNN was trained for each rat, using a 3-fold cross-validation approach. nCAPs that were detected from each activity (the 100 trials each from dorsiflexion only, plantarflexion only, and pricking only) were split into training and test sets. These training sets included data augmentation to increase the number of training examples to 10 000 (a combination of detected and augmented nCAPs) for each class. Testing sets only consisted of detected nCAPs (no augmented nCAPs) for each class.

Data augmentation was performed to equalize the number of nCAPs used for training for each activity per rat. This procedure aimed to enhance the training of the CNN by addressing the class imbalance and by providing the CNN with more training examples. New synthetic nCAPs were constructed by generating noisy versions of an averaged nCAP for a particular neural pathway. First, an average nCAP was obtained by averaging all nCAPs detected for a particular neural pathway. Second, the noise at each contact was characterized by subtracting the average nCAP from each individual nCAP and fitting a Gaussian distribution to the resulting noise-only data. These Gaussian distributions were then sampled, and the sampled noise was added to the average nCAP template of a particular neural pathway to create new noisy nCAPs. This process was based on the assumption that the neural activity is reproducible and that a large portion of the observed variability is due to noise in the nerve cuff recordings, which have low SNR.
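The augmentation procedure can be sketched as follows. It is one possible reading of the description: each contact's noise is modelled as a zero-mean Gaussian fitted over all residual samples of that contact, and noise is drawn independently at every time point, which ignores any temporal correlation in real recordings. `augment_class` is a hypothetical name.

```python
import numpy as np


def augment_class(ncaps: np.ndarray, n_total: int, rng: np.random.Generator) -> np.ndarray:
    """Pad one class up to n_total examples with template-plus-noise nCAPs.

    ncaps: detected training nCAPs for one pathway, shape (n, n_contacts, n_time).
    """
    template = ncaps.mean(axis=0)  # average nCAP, shape (n_contacts, n_time)
    residuals = ncaps - template  # noise-only data (zero mean by construction)
    sigma = residuals.std(axis=(0, 2))  # one Gaussian width per contact
    n_new = max(n_total - len(ncaps), 0)
    noise = rng.normal(size=(n_new, *template.shape)) * sigma[None, :, None]
    synthetic = template[None] + noise
    return np.concatenate([ncaps, synthetic], axis=0)  # detected nCAPs first
```

Because only training folds should be augmented, the function would be applied after the cross-validation split, leaving the test fold made of detected nCAPs only.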

ESCAPE-NET was trained for up to 1000 epochs with an early stopping criterion of no decrease in the cost function over 15 epochs on the validation set (a random 10% subset of the training set). ESCAPE-NET was optimized using stochastic gradient descent with a momentum of 0.9, minimizing a categorical cross-entropy loss. Each convolutional layer used 64 filters, and the fully connected layer had a 256-node hidden layer and an output layer of three nodes (one for each class). The first and second convolutional layers were each trained for 25 epochs to initialize their weights before the full ESCAPE-NET architecture was trained. We additionally investigated combining information from the SE and TE representations (SE + TE) of the spatiotemporal signatures. In this case, the outputs of the ESCAPE-NET networks trained separately on SE and TE were concatenated before the fully connected layer and fed into a 512-node (instead of 256-node) hidden layer, with the rest of the network unchanged.

Classification accuracy was quantified on an individual nCAP basis as the percentage of correctly identified nCAPs. Additionally, the F₁-score was computed for each class. The F₁-score is a better indication of ESCAPE-NET's performance than classification accuracy when classes are imbalanced in the test set. For this multi-class problem, the F₁-score was calculated from the precision and recall of each class (dorsiflexion, plantarflexion and pricking), and the reported F₁-score is the average of the per-class F₁-scores.
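A minimal sketch of the per-class and averaged F₁-score for the three-class problem is given below. Reporting an F₁ of 0 for a class with no predicted or no true examples is an assumed convention, and `macro_f1` is a hypothetical helper.

```python
import numpy as np


def macro_f1(y_true, y_pred, n_classes: int = 3) -> float:
    """Average of per-class F1 scores, each computed one-vs-rest."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    scores = []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return float(np.mean(scores))
```

For example, with 90 nCAPs of one class and 5 of each other class, a classifier that labels every nCAP as the majority class reaches 90% accuracy but a mean F₁-score of only about 0.32, which is why the F₁-score is the more informative metric under class imbalance.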

ESCAPE-NET was compared with the algorithms developed previously in [31, 32]. The first algorithm quantifies the similarity of an nCAP to the average spatiotemporal template of each class using a bank of matched filters; the pathway whose corresponding filter has the highest maximum normalized output for a given nCAP is chosen as the active pathway. The second method is a random forest classifier that uses the 1D spatiotemporal signatures as input; it builds decision trees from random subsets of the data and combines their outputs to obtain a classification. The last method is a feedforward neural network that also uses the 1D spatiotemporal signatures to classify the activity of different neural pathways.
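The matched-filter baseline can be illustrated as below. The exact normalization used in [31, 32] is not restated in this section, so this sketch assumes that the per-contact cross-correlations are summed over contacts and divided by the product of the signal and template norms, with the maximum taken over all time lags; `matched_filter_classify` is a hypothetical name.

```python
import numpy as np


def matched_filter_classify(ncap: np.ndarray, templates: np.ndarray) -> int:
    """Return the index of the class whose template gives the largest normalized output.

    ncap:      one spatiotemporal signature, shape (n_contacts, n_time).
    templates: class-average signatures, shape (n_classes, n_contacts, n_time).
    """
    scores = []
    for tmpl in templates:
        # Cross-correlate each contact over all lags, then sum across contacts.
        xcorr = sum(np.correlate(ncap[c], tmpl[c], mode="full") for c in range(ncap.shape[0]))
        # By Cauchy-Schwarz the normalized output never exceeds 1.
        scores.append(xcorr.max() / (np.linalg.norm(ncap) * np.linalg.norm(tmpl)))
    return int(np.argmax(scores))
```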

2.5. Contact information metric

The contact information metric (CIM) previously defined in [43] was developed to evaluate the selectivity of a multi-contact configuration without being biased by the number of contacts or the configuration of the electrode. The CIM is defined individually for each contact using Shannon entropy to quantify the amount of information that the contact provides for discriminating between different neural pathways.

The original metric calculated the entropy of a single probability mass function derived from a histogram of the peak values of the detected nCAPs. However, this peak value may be drastically affected by noise and thus an enhanced definition of the CIM is described here. It uses multiple points from the nCAP and evaluates the entropy from the probability mass function of each point, allowing for more robustness against noise.

$$H_{kl} = -\sum_{i=1}^{n} P_{x_i} \log_2 P_{x_i} \tag{2}$$

In equation (2), H_kl is the Shannon entropy calculated for the kth contact at the lth time point, x is the histogram of values from the kth contact at the lth time point observed in noisy recordings from M neural pathways, x_i is the ith bin of x, P_{x_i} is the probability of a value falling in bin x_i, and n is the number of equally spaced bins in x. Bins were selected using the histogram bin optimization method of Shimazaki and Shinomoto [47]. The entropy value reflects the variability in the measurements, which can be due both to differences between pathways and to noise. To separate these contributions, H_kl was normalized by the Shannon entropy of a single neural pathway, within which the variability is the result of noise only. This normalizing entropy was obtained as the mean of the Shannon entropies computed separately for each of the M neural pathways. The resulting normalized value provides the CIM for each contact at each time point.

$$\mathrm{CIM}_{kl} = \frac{H_{kl}}{\frac{1}{M}\sum_{i=1}^{M} H_i} \tag{3}$$

where H_i is the Shannon entropy calculated for the ith neural pathway alone at the kth contact and lth time point. The CIM for the kth contact is then calculated by averaging CIM_kl over all T = 100 time points of the signature.

$$\mathrm{CIM}_{k} = \frac{1}{T}\sum_{l=1}^{T} \mathrm{CIM}_{kl} \tag{4}$$

This formulation allows for CIM values to range from 0 to ∞ where a higher CIM would correspond to a more informative contact and a lower CIM would correspond to a less informative contact.
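Equations (2)–(4) can be sketched in NumPy as follows. Two substitutions are assumptions: a fixed number of equally spaced bins replaces the Shimazaki and Shinomoto bin optimization [47], and the same bin edges are shared by the pooled and per-pathway histograms at each contact and time point. The function names are hypothetical.

```python
import numpy as np


def entropy_bits(values: np.ndarray, edges: np.ndarray) -> float:
    """Shannon entropy (bits) of the histogram of values over fixed bin edges."""
    counts, _ = np.histogram(values, bins=edges)
    p = counts[counts > 0] / counts.sum()  # empty bins contribute nothing
    return float(-np.sum(p * np.log2(p)))


def contact_information_metric(by_pathway: list, n_bins: int = 20) -> np.ndarray:
    """CIM_k for every contact, following equations (2)-(4).

    by_pathway: one array per neural pathway, each of shape (n_ncaps, n_contacts, n_time).
    Returns an array of shape (n_contacts,).
    """
    n_contacts, n_time = by_pathway[0].shape[1:]
    cim_kl = np.zeros((n_contacts, n_time))
    for k in range(n_contacts):
        for t in range(n_time):  # t plays the role of the time index l
            per_path = [x[:, k, t] for x in by_pathway]
            pooled = np.concatenate(per_path)
            edges = np.histogram_bin_edges(pooled, bins=n_bins)  # equally spaced bins
            h_kl = entropy_bits(pooled, edges)  # equation (2)
            h_noise = np.mean([entropy_bits(v, edges) for v in per_path])
            cim_kl[k, t] = h_kl / max(h_noise, 1e-12)  # equation (3), guarded against 0
    return cim_kl.mean(axis=1)  # equation (4)
```

A contact whose values separate the pathways has a pooled entropy well above the per-pathway entropy, giving a CIM above 1, whereas a contact that sees the same distribution for every pathway stays close to 1.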

2.6. Contact selection

Smaller electrodes with fewer contacts may provide greater flexibility in terms of implantation sites and applications. We therefore explored how the classification accuracy was affected by the use of electrode configurations with fewer contacts. Contacts were removed from the original 7 × 8 configuration, creating 5 × 8 and 3 × 8 configurations. In addition, contacts were also removed using the CIM described above, to examine the effects of removing less informative contacts from the spatiotemporal signature on the nCAP discrimination performance (for example, for a 40-contact configuration, only the 40 contacts with the highest CIMs were used).
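Selecting the most informative contacts from the CIM values amounts to a sort. In this sketch (`top_contacts_by_cim` is a hypothetical helper) the retained contacts are kept in their original order so that the SE or TE row arrangement is preserved; how the remaining rows were ordered is not stated, so this is an assumption.

```python
import numpy as np


def top_contacts_by_cim(cim: np.ndarray, n_keep: int) -> np.ndarray:
    """Indices of the n_keep contacts with the highest CIM, in original contact order."""
    keep = np.argsort(cim)[::-1][:n_keep]  # highest CIM first
    return np.sort(keep)  # restore the original row order
```

For a batch of signatures of shape (n, 56, 100), `signatures[:, top_contacts_by_cim(cim, 40), :]` would give the 40-contact configuration.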

2.7. Reconstruction of ankle angle from firing rates

The final metric investigated was the ability to reconstruct the angle of the rat's ankle from the reconstructed nCAP trains of each neural pathway. Training sets consisted of all nCAPs from the dorsiflexion, plantarflexion and pricking only trials in combination with augmented nCAPs for each activity to reach 10 000 training examples per class. ESCAPE-NET was then evaluated on the test set data from an alternating dorsi- and plantar- flexion recording never seen in the training set. Detected nCAPs in the test set were classified, and both estimated and ground truth nCAP trains reconstructed for each neural pathway. The ground truth was constructed using manual classification of all detected nCAPs based on the synchronized video data.

Ground truth and estimated nCAP firing rates for each neural pathway were obtained by convolving the ground truth and estimated nCAP trains, respectively, with a Gaussian kernel with a standard deviation of 150 ms [48]. These firing rates were then binned in 25 ms intervals and used to train a recurrent neural network (RNN) in Keras [46]. The dataset was split into 70% training and 30% testing data. Inputs to the RNN were 10-sample (250 ms) windows of the binned ground truth firing rates, together with ten shifted versions of these windows (e.g. −500 ms to −250 ms, −475 ms to −225 ms, ..., −250 ms to 0 ms). These inputs were used to predict the ankle angle at the next time sample.
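The firing-rate estimation and binning can be sketched as below, assuming nCAP times in seconds rasterized on a 1 ms grid and a Gaussian kernel truncated at ±4 standard deviations. `rnn_windows` builds only the basic sliding 10-bin window preceding each target bin; the arrangement of the ten shifted copies fed to the RNN is not reproduced. All names are hypothetical.

```python
import numpy as np


def firing_rate(spike_times_s, duration_s: float, sigma_s: float = 0.150,
                dt_s: float = 0.001) -> np.ndarray:
    """Gaussian-smoothed firing rate (nCAPs per second) on a dt_s grid."""
    n = int(round(duration_s / dt_s))
    train = np.zeros(n)
    idx = np.clip(np.round(np.asarray(spike_times_s) / dt_s).astype(int), 0, n - 1)
    np.add.at(train, idx, 1.0)  # count nCAPs per grid step
    half = int(round(4 * sigma_s / dt_s))
    t = np.arange(-half, half + 1) * dt_s
    kernel = np.exp(-0.5 * (t / sigma_s) ** 2)
    kernel /= kernel.sum()  # unit area: output is in nCAPs per grid step
    return np.convolve(train, kernel, mode="same") / dt_s


def bin_rate(rate: np.ndarray, dt_s: float = 0.001, bin_s: float = 0.025) -> np.ndarray:
    """Average the smoothed rate in non-overlapping 25 ms bins."""
    step = int(round(bin_s / dt_s))
    n_bins = len(rate) // step
    return rate[: n_bins * step].reshape(n_bins, step).mean(axis=1)


def rnn_windows(binned: np.ndarray, width: int = 10):
    """Pair each 10-bin (250 ms) window with the index of the bin that follows it."""
    X = np.stack([binned[i - width : i] for i in range(width, len(binned))])
    return X, np.arange(width, len(binned))
```

With one binned rate per pathway stacked as columns, `rnn_windows` returns an array of shape (samples, 10, pathways), the layout a Keras LSTM layer expects.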

The RNN architecture had two long short-term memory layers (64 units, 128 units respectively), followed by a 1 unit fully connected layer for regression to output the predicted ankle angle. The RNN was trained for 1000 epochs with an early stopping criterion of no decrease in the cost function over 15 epochs on the validation set (10% random subset of the training set). Weights were initialized using a normal distribution. The network used a mean squared error loss function and was optimized using the adaptive moment estimation (ADAM) approach [49].

This trained network was then used to estimate the angle of the rat's ankle from the firing rates of the testing set by using either the estimated firing rate (to obtain the results with the current algorithm), or the ground truth firing rate (to obtain the best theoretical results). Pearson correlation coefficients were used to describe the relationship between the manually labeled ankle angle obtained from annotating recorded videos (with interpolation performed for points in between) and the estimated ankle angles from the trained RNN. Rat 3's data was removed from this analysis due to the degradation of the plantarflexion signal through the course of the experiment.

2.8. Statistical methods

A one-way repeated-measures analysis of variance (ANOVA) was performed to compare all six algorithms (table 2), and post hoc pairwise comparisons were done with t-tests, with a Bonferroni correction for multiple comparisons. Subsequent evaluations (effects of reference montage and contact configuration) were performed on the best performing algorithm only (ESCAPE-NET SE + TE), using a pairwise t-test (table 3) or one-way repeated-measures ANOVAs (tables 4 and 5). Post hoc comparisons were conducted with t-tests, and Bonferroni correction was applied.
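The post hoc procedure can be sketched with SciPy's `ttest_rel`, under the assumption that the t-tests were paired across rats; the omnibus repeated-measures ANOVA is not shown (statsmodels provides `AnovaRM` for that purpose). `bonferroni_pairwise` is a hypothetical name, and it expects one value per rat for each algorithm.

```python
from itertools import combinations

from scipy.stats import ttest_rel


def bonferroni_pairwise(scores: dict) -> dict:
    """Paired t-tests between every pair of algorithms, Bonferroni-corrected.

    scores maps algorithm name -> per-rat values (same rat order for all).
    Returns {(name_a, name_b): corrected p-value}.
    """
    pairs = list(combinations(scores, 2))
    corrected = {}
    for a, b in pairs:
        p = ttest_rel(scores[a], scores[b]).pvalue
        corrected[(a, b)] = min(1.0, p * len(pairs))  # multiply by number of comparisons
    return corrected
```

With six algorithms there are 15 pairwise comparisons, so each raw p-value is multiplied by 15 before being compared with the significance level.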

Table 2. Mean classification accuracies and corresponding F₁-score between different algorithms.

Algorithm	Classification accuracy (%)	F₁-score
Matched filter [32]	51.0 ± 10.8	0.446 ± 0.157
Random forest [32]	65.8 ± 11.5	0.578 ± 0.210
Neural network [32]	68.6 ± 12.6	0.605 ± 0.212
ESCAPE-NET − SE	79.3 ± 10.5	0.725 ± 0.114
ESCAPE-NET − TE	79.9 ± 10.9	0.739 ± 0.118
ESCAPE-NET − SE + TE	80.8 ± 10.4	0.747 ± 0.114

SE—spatial emphasis; TE—temporal emphasis.

Table 3. Mean classification accuracies and corresponding F₁-score between different referencing montages.

Tripolar reference	Classification accuracy (%)	F₁-score
ESCAPE-NET − SE	79.3 ± 10.5	0.725 ± 0.114
ESCAPE-NET − TE	79.9 ± 10.9	0.739 ± 0.118
ESCAPE-NET − SE + TE	80.8 ± 10.4^a	0.747 ± 0.114^a

Common average reference

ESCAPE-NET − SE	78.8 ± 11.1	0.722 ± 0.116
ESCAPE-NET − TE	79.0 ± 11.2	0.727 ± 0.117
ESCAPE-NET − SE + TE	80.8 ± 10.6	0.745 ± 0.115

^aHighest score from all configurations. SE—spatial emphasis; TE—temporal emphasis.

Table 4. Mean classification accuracies and corresponding F₁-score with different number of rings.

7 × 8 configuration (original)	Classification accuracy (%)	F₁-score
ESCAPE-NET − SE	79.3 ± 10.5	0.725 ± 0.114
ESCAPE-NET − TE	79.9 ± 10.9	0.739 ± 0.118
ESCAPE-NET − SE + TE	80.8 ± 10.4^a	0.747 ± 0.114^a

5 × 8 configuration

ESCAPE-NET − SE	73.8 ± 12.8	0.664 ± 0.123
ESCAPE-NET − TE	73.9 ± 12.8	0.662 ± 0.128
ESCAPE-NET − SE + TE	75.1 ± 12.4	0.678 ± 0.123

3 × 8 configuration

ESCAPE-NET − SE	63.7 ± 11.7	0.546 ± 0.103
ESCAPE-NET − TE	63.7 ± 12.1	0.543 ± 0.105
ESCAPE-NET − SE + TE	64.8 ± 12.5	0.557 ± 0.102

^aHighest score from all configurations. SE—spatial emphasis; TE—temporal emphasis.

Table 5. Mean classification accuracies and corresponding F₁-score with removal of contacts using the CIM.

56 contacts (original)	Classification accuracy (%)	F₁-score
ESCAPE-NET − SE	79.3 ± 10.5	0.725 ± 0.114
ESCAPE-NET − TE	79.9 ± 10.9	0.739 ± 0.118
ESCAPE-NET − SE + TE	80.8 ± 10.4^a	0.747 ± 0.114^a

52 contacts

ESCAPE-NET − SE	78.9 ± 11.6	0.713 ± 0.123
ESCAPE-NET − TE	78.9 ± 11.1	0.708 ± 0.117
ESCAPE-NET − SE + TE	80.6 ± 10.9	0.732 ± 0.117

48 contacts

ESCAPE-NET − SE	78.6 ± 11.7	0.706 ± 0.121
ESCAPE-NET − TE	78.7 ± 11.4	0.705 ± 0.118
ESCAPE-NET − SE + TE	80.5 ± 10.9	0.728 ± 0.122

44 contacts

ESCAPE-NET − SE	78.6 ± 11.6	0.708 ± 0.122
ESCAPE-NET − TE	78.5 ± 11.4	0.705 ± 0.121
ESCAPE-NET − SE + TE	79.8 ± 11.0	0.720 ± 0.118

40 contacts

ESCAPE-NET − SE	79.6 ± 11.0	0.718 ± 0.120
ESCAPE-NET − TE	78.9 ± 11.2	0.714 ± 0.121
ESCAPE-NET − SE + TE	80.8 ± 10.7	0.736 ± 0.116

^aHighest score from all configurations. SE—spatial emphasis; TE—temporal emphasis.

3. Results

3.1. Number of detected nCAPs

The means and standard deviations of the number of nCAPs detected per rat were 5513 ± 2465, 5615 ± 3638 and 7513 ± 4046 for dorsiflexion, plantarflexion and pricking, respectively. The minimum and maximum numbers of detected nCAPs were 1950 and 8808 for dorsiflexion, 293 and 9954 for plantarflexion, and 1164 and 13 180 for pricking.

3.2. Classification accuracy

Table 2 shows the mean classification accuracy and corresponding mean F₁-score across all rats for the algorithms presented previously in [32] and for the proposed ESCAPE-NET variants in the 3-class discrimination problem (dorsiflexion, plantarflexion and pricking) using a tripolar reference. The choice of algorithm had a significant main effect on both mean classification accuracy and F₁-score (p < 0.001 for both). In post hoc pairwise comparisons, ESCAPE-NET significantly improved both mean classification accuracy and F₁-score over the previously reported methods (table 2), regardless of whether the SE, TE or SE + TE representation was used (p < 0.001 for all comparisons except ESCAPE-NET TE versus random forest, for which p < 0.01). All post hoc pairwise comparisons can be found in supplementary tables I and II (stacks.iop.org/JNE/17/016042/mmedia).

Table 3 shows the mean classification accuracy and corresponding mean F₁-score of the proposed method across all rats in the 3-class problem for different referencing montages (tripolar and common average). The choice of reference montage produced no significant differences in mean classification accuracy or F₁-score.

Figure 5 shows an example of the predicted firing pattern of dorsiflexion and plantarflexion using ESCAPE-NET SE + TE for a subsection of the alternating dorsi-/plantar- flexion recording in rat 6.

3.3. Contact selection

Table 4 shows the mean classification accuracy and corresponding mean F₁-score across all rats for reduced electrode configurations (5 × 8 and 3 × 8) in the 3-class problem using a tripolar reference. Contact configuration had a significant main effect (p < 0.001) on both mean classification accuracy and F₁-score. Additionally, for ESCAPE-NET SE + TE, both mean classification accuracy and F₁-score decreased significantly with each reduction in the number of rings (p < 0.001 for all pairwise comparisons). All post hoc pairwise comparisons can be found in supplementary tables III and IV.

Table 5 shows the mean classification accuracy and corresponding mean F₁-score across all rats in the 3-class problem using a tripolar reference. These results are for reduced electrode configurations (52, 48, 44 and 40 contacts) obtained by removing the least informative contacts according to the CIM. Removing contacts with low CIM scores led to a slight decline in recording selectivity, and contact removal had a significant main effect on the F₁-score (p < 0.05). Significant pairwise differences were found only between the 44- and 40-contact configurations, for both mean classification accuracy (p < 0.05) and F₁-score (p < 0.01). All post hoc pairwise comparisons can be found in supplementary tables V and VI.
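Removing contacts by importance can be sketched as a ranking step. The sketch assumes three things:

- Each contact already has a scalar CIM score.
- The ranking is computed once, so each smaller configuration is a subset of the larger one. Recomputing the CIM after each removal step would be an alternative that the text does not specify.
- The starting point is the full 7 × 8 configuration (56 contacts).

The function name and scores are hypothetical.

```python
def prune_contacts(cim_scores, n_keep):
    """Return the n_keep contacts with the highest importance scores.

    cim_scores maps a contact label to a precomputed importance score
    (higher = more informative). How the scores are obtained is outside
    the scope of this sketch.
    """
    if not 0 < n_keep <= len(cim_scores):
        raise ValueError("n_keep must be between 1 and the number of contacts")
    # Rank once, most informative first, then keep the top n_keep.
    ranked = sorted(cim_scores, key=cim_scores.get, reverse=True)
    return sorted(ranked[:n_keep])


# A 7 x 8 cuff (7 rings, 8 contacts per ring) gives 56 contacts,
# labelled (ring, contact). The scores below are placeholders.
contacts = [(ring, k) for ring in range(7) for k in range(8)]
scores = {c: (37 * (c[0] * 8 + c[1])) % 56 for c in contacts}
configs = {n: prune_contacts(scores, n) for n in (52, 48, 44, 40)}
for n, kept in configs.items():
    print(n, "contacts kept; lowest retained score:", min(scores[c] for c in kept))
```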

Continuing to reduce the number of contacts based on the CIM, down to ten contacts, led to a gradual decrease in F₁-score. However, it did not reveal a distinct corner in the curve that would indicate a critical number of contacts (supplementary figure 1).

3.4. Reconstruction of ankle angle from firing rates

The mean Pearson correlation coefficient between the ankle angle predicted from the firing rates estimated by ESCAPE-NET and the manually labelled ankle angle was 0.812 ± 0.177. To contextualize this value, we calculated the theoretical maximum performance of the RNN, i.e. the performance achievable if nCAP classification were perfect. This was done by predicting the ankle angles with the ground truth nCAPs as inputs to the RNN, which yielded a mean Pearson correlation coefficient of 0.980 ± 0.009. Figure 6 shows two examples of predicted joint angles, representative of the best and worst results achieved.
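The agreement metric can be sketched as follows. The correlation is computed per rat between the predicted and labelled angle traces and then summarized as mean ± standard deviation across rats. That "±" denotes the sample standard deviation is an assumption. The function names and traces are hypothetical, and the RNN itself is not modelled.

```python
import math
import statistics


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length traces."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two traces of equal length (at least 2 samples)")
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant trace")
    return cov / (sx * sy)


def summarize(per_rat_r):
    """Mean and sample standard deviation of per-rat correlations."""
    return statistics.mean(per_rat_r), statistics.stdev(per_rat_r)


# Hypothetical angle traces (degrees) for one rat.
labelled = [90.0, 95.0, 105.0, 110.0, 100.0, 92.0]
predicted = [91.0, 97.0, 102.0, 108.0, 103.0, 94.0]
r = pearson_r(labelled, predicted)
mean_r, sd_r = summarize([r, 0.75, 0.62])  # the other values are placeholders
print(f"r = {r:.3f}; across rats: {mean_r:.3f} ± {sd_r:.3f}")
```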

**Figure 6.** (a) and (b) Prediction of the ankle angle in rat 4 using the ground truth firing rate and the firing rate estimated by ESCAPE-NET, respectively (Pearson correlation coefficients = 0.9783 and 0.9752). Mean F₁-score = 0.7812 for ESCAPE-NET. (c) and (d) Prediction of the ankle angle in rat 10 using the ground truth firing rate and the firing rate estimated by ESCAPE-NET, respectively (Pearson correlation coefficients = 0.9625 and 0.5159). Mean F₁-score = 0.5298 for ESCAPE-NET.

Figure 7 shows the classification performance and the mean Pearson correlation coefficient (i.e. the ability to correctly predict the joint angle) for each rat except rat 3, which was excluded as described in section 2.7. The figure shows the expected relationship between the two metrics and suggests that an F₁-score of 0.7–0.8 is required for robust tracking in a 3-class problem.

4. Discussion

The classification accuracies and ankle angle reconstructions achieved further support the nCAP-based classification approach introduced in [32]. They also demonstrate the benefits of a CNN architecture that makes more effective use of spatiotemporal information. Our results shed light on how the ordering, number and location of contacts influence discrimination performance. In addition, as discussed below, occasional misclassification of nCAPs is tolerable: the resulting firing rates can still be used to adequately track the ankle angle.

ESCAPE-NET performed substantially better than previous techniques, regardless of how the contacts were ordered and whether the SE and TE representations were combined. This improvement is likely due to the design of CNNs, which can emphasize patterns among neighbouring points [50]. CNNs can therefore extract useful structural features from the spatiotemporal signatures in their 2D form, structure that is lost when the signatures are represented as 1D vectors.
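The point about neighbourhoods can be illustrated with a generic sketch that is not ESCAPE-NET's actual architecture. A 2D convolution over a contacts × time array combines each sample with its neighbours along both axes. Flattening the array instead separates samples recorded on adjacent contacts. The toy signature, kernel and function name are assumptions for illustration.

```python
import numpy as np


def conv2d_valid(signature, kernel):
    """'Valid' 2D cross-correlation, the operation in a standard CNN layer.

    signature: array of shape (n_contacts, n_samples), one row per contact.
    kernel: array of shape (kh, kw); each output value combines kh adjacent
    contacts over kw adjacent time samples.
    """
    kh, kw = kernel.shape
    rows = signature.shape[0] - kh + 1
    cols = signature.shape[1] - kw + 1
    out = np.empty((rows, cols))
    for i in range(rows):
        for j in range(cols):
            out[i, j] = np.sum(signature[i:i + kh, j:j + kw] * kernel)
    return out


# Toy "travelling" peak: it reaches each successive contact one sample later.
n_contacts, n_samples = 4, 6
sig = np.zeros((n_contacts, n_samples))
for k in range(n_contacts):
    sig[k, k + 1] = 1.0

# A 2 x 2 diagonal kernel responds most strongly to this propagation pattern.
response = conv2d_valid(sig, np.eye(2))
print(response.max())

# Flattened to 1D, the peak's samples on neighbouring contacts sit
# n_samples + 1 positions apart, so a short 1D kernel cannot span them.
print(np.flatnonzero(sig.ravel()))
```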

The representation of the inputs to ESCAPE-NET (i.e. the ordering of contacts for spatial or temporal emphasis) did not substantially change discrimination performance. However, combining the representations (SE + TE) always provided better discrimination performance. This suggests that different structural features may be extracted from the different representations of the spatiotemporal signature. These features could then be combined to better emphasize what distinguishes the nCAPs of multiple neural pathways. Further analysis would be needed to confirm whether different features are in fact obtained. The choice of referencing montage had no significant influence on the discrimination performance of ESCAPE-NET (table 3).

As expected, reducing the number of contacts relative to the seven-ring by eight-contact (7 × 8) configuration decreased the discrimination performance of ESCAPE-NET. The 5 × 8 configuration showed a slight decrease in performance, whereas the 3 × 8 configuration showed a steep decrease. Similarly, removing less informative contacts according to their CIMs produced a general, but smaller, decrease in discrimination performance. Removing uninformative contacts may even facilitate training, because it reduces the number of free parameters without reducing the amount of useful data available. This places more emphasis on the informative contacts and may enhance performance.

Robust ankle angle reconstruction was demonstrated using nCAP trains predicted by ESCAPE-NET. Ground truth firing rates yielded near-ideal ankle angle prediction (mean r = 0.980), and firing rates estimated by ESCAPE-NET could still be used to predict ankle angles accurately (mean r = 0.812). However, further improvements in classification are needed to ensure robust ankle angle reconstruction from the predicted firing rates in all cases. These findings demonstrate that the nCAP-based approach can be used to predict the ankle angle robustly. This approach can likely also predict other parameters of interest associated with particular neural pathways (e.g. velocity or cutaneous input).
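This section does not restate the exact firing-rate estimator. The sketch below assumes the simplest option: counting each class's nCAPs in fixed, non-overlapping time bins. The bin width, function name and data are hypothetical. The RNN that maps these rate traces to ankle angle is not modelled here.

```python
import math


def binned_rate(event_times, labels, target, t_start, t_stop, bin_width):
    """Firing rate (events/s) of one class in fixed, non-overlapping bins.

    event_times: nCAP detection times in seconds; labels: predicted class of
    each nCAP; target: the class (neural pathway) whose rate is wanted.
    """
    n_bins = math.ceil((t_stop - t_start) / bin_width)
    counts = [0] * n_bins
    for t, lab in zip(event_times, labels):
        if lab == target and t_start <= t < t_stop:
            # Clamp guards against floating-point rounding at the last edge.
            idx = min(int((t - t_start) // bin_width), n_bins - 1)
            counts[idx] += 1
    return [c / bin_width for c in counts]


times = [0.10, 0.20, 0.30, 0.80]                 # hypothetical detections (s)
labels = ["dorsi", "dorsi", "plantar", "dorsi"]  # hypothetical predictions
print(binned_rate(times, labels, "dorsi", 0.0, 1.0, 0.25))
```

Each misclassified nCAP moves one count from one class's rate trace to another's. This is how classification errors propagate into the angle prediction.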

Figure 6 shows examples of the best and worst performance in tracking the ankle angle. Decreased tracking performance can mainly be attributed to the classification performance of ESCAPE-NET. If classification accuracy is high enough, the firing rates are reconstructed well and thus provide robust tracking of the ankle angle. However, as classification performance drops, the reconstruction of firing rates starts to deteriorate. This is particularly true when the numbers of detected nCAPs in the classes are imbalanced: a given number of misclassifications affects the class with more detected nCAPs proportionally less than the class with fewer.
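A small worked example makes this asymmetry concrete. It simplifies the problem to two classes with the same misclassification rate in both directions. The counts and rate are hypothetical and are not taken from the recordings.

```python
def count_errors(n_large, n_small, miss_rate):
    """Relative error in the estimated nCAP counts of two classes when a
    fraction miss_rate of each class is assigned to the other class."""
    est_large = n_large * (1 - miss_rate) + n_small * miss_rate
    est_small = n_small * (1 - miss_rate) + n_large * miss_rate
    return (est_large - n_large) / n_large, (est_small - n_small) / n_small


# Hypothetical counts: 900 nCAPs in one class, 100 in the other, 10% misclassified.
err_large, err_small = count_errors(900, 100, 0.10)
print(f"large class: {err_large:+.1%}, small class: {err_small:+.1%}")
```

With these numbers, the larger class's count is underestimated by about 9%, while the smaller class's count is overestimated by 80%. The same classifier error rate therefore corrupts the smaller pathway's firing rate far more.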

As figure 7 suggests, F₁-scores of 0.7–0.8 are needed for robust tracking of the ankle angle in a 3-class problem. This score will likely need to be maintained or improved as the number of classes increases, e.g. when scaling to nerves with more complex anatomies for human applications (such as the median nerve). Improvements could be achieved by better exploiting the features that ESCAPE-NET extracts from the spatiotemporal signatures. For example, this particular architecture uses only the features from the last convolutional layer. Features from shallower layers may help improve performance, as in architectures such as U-Net [51], which combines features from each level through skip connections. Additionally, in chronic applications, signal changes resulting from slight electrode movement and tissue encapsulation will likely degrade performance. To counter these issues, ESCAPE-NET could be retrained at different time points with new examples of nCAPs. In the experiments reported here, data collection for 100 trials required approximately 3–3.5 min for each stimulus (dorsiflexion, plantarflexion or pricking only). These trials generated the numbers of nCAPs presented in section 3.1, which were sufficient to achieve our results. Obtaining the necessary training data is therefore feasible if recalibration becomes necessary.
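Assume a recalibration session would repeat the reported protocol of 100 trials for each of the three stimulus classes. The time cost then works out as follows (a rough estimate that excludes setup time):

```latex
\begin{aligned}
t_{\text{trial}} &\approx \frac{3\text{--}3.5\ \text{min}}{100\ \text{trials}} = 1.8\text{--}2.1\ \text{s per trial},\\
t_{\text{session}} &\approx 3 \times (3\text{--}3.5\ \text{min}) = 9\text{--}10.5\ \text{min}.
\end{aligned}
```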

Although not demonstrated in this study, ESCAPE-NET has the potential to be part of a real-time system. One methodological caveat is that filtering was applied using a zero-phase infinite impulse response (IIR) filter (sixth-order Butterworth) implemented with filtfilt in MATLAB. This is not possible in real-time applications, because forward-backward filtering is non-causal: each output sample depends on future input samples. A sixth-order finite impulse response (FIR) filter implementation was also investigated (not reported here). It yielded slightly lower performance than that reported for ESCAPE-NET SE + TE. Alternative filter designs will therefore need to be investigated to maximize performance in a real-time system. Additionally, computational speed was analysed to assess the real-time applicability of ESCAPE-NET. The classification step takes on average 0.5–0.6 ms for ESCAPE-NET SE/TE and 0.95 ms for ESCAPE-NET SE + TE (Intel® Core™ i7-6700 CPU @ 3.40 GHz, 24 GB of RAM). Given the 3.3 ms duration of the spatiotemporal signatures and the nCAP detection steps, the algorithm requires a short window of approximately 5–7 ms to classify new nCAPs.
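The causality problem can be seen with a deliberately simple filter. The sketch below replaces the sixth-order Butterworth with a hypothetical first-order IIR low-pass, where alpha is an arbitrary smoothing parameter. It also mimics forward-backward filtering without filtfilt's edge handling. Its only purpose is to illustrate why zero-phase filtering needs future samples.

```python
def one_pole_lowpass(x, alpha):
    """Causal first-order IIR low-pass: y[n] = alpha*x[n] + (1 - alpha)*y[n-1].

    Each output depends only on current and past inputs, so it can run in real time.
    """
    y, prev = [], 0.0
    for sample in x:
        prev = alpha * sample + (1 - alpha) * prev
        y.append(prev)
    return y


def forward_backward(x, alpha):
    """Zero-phase filtering in the spirit of filtfilt: filter forwards, then
    filter the time-reversed result and reverse it back. Needs the whole
    signal in advance (no edge padding here, unlike MATLAB's filtfilt)."""
    forward = one_pole_lowpass(x, alpha)
    return one_pole_lowpass(forward[::-1], alpha)[::-1]


impulse = [0.0] * 11
impulse[5] = 1.0  # unit impulse at sample 5
causal = one_pole_lowpass(impulse, 0.5)
zero_phase = forward_backward(impulse, 0.5)

# The causal output is zero before the impulse; the zero-phase output is not,
# i.e. it responds to an input that has not yet arrived.
print([round(v, 3) for v in causal[:5]])
print([round(v, 3) for v in zero_phase[:5]])
```

A causal IIR filter avoids this, but at the cost of a frequency-dependent phase delay that can distort spike waveform shape. This is one reason alternative causal designs would need to be evaluated for real-time use.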

While this work further supports the feasibility of the nCAP-based classification introduced in [32], several factors may affect the results of this study. The effects of nerve cuff positioning, nerve size and the amount of electrode surface contact on nCAP-based classification have yet to be quantified. Even in the short term, factors such as electrode movement or variable instrumentation noise can degrade the signal and reduce accuracy. In this study, one of nine rats (rat 3) showed signal degradation in the later portion of the experiment, which led to its exclusion from the angle reconstruction analysis. Furthermore, this in vivo study was performed acutely on anaesthetized rats. Noise from muscle activity was therefore absent, and longitudinal changes in the recordings could not be assessed. Nevertheless, we have focused on techniques that are robust to noise, so that the proposed approach is suitable for the chronic studies needed to further assess the feasibility of nCAP-based classification.

Additionally, the ankle angle was manually annotated from video and synchronized to the recordings. This may have introduced slight deviations from the true ankle angle due to limited pixel resolution and human error during labelling. Direct recording of neural activity in the tibial, peroneal and sural nerves might also have provided more direct ground truth data for classification than the stimulus timing information. However, the space required for the additional electrode implantations would have necessitated a larger animal model. Although these factors may have affected our reported metrics, we do not expect them to change any of the overall trends observed in this study.

Lastly, we note that our approach involves a spike-detection step in nerve cuff recordings. Both experimental [52] and simulation [53] evidence indicates that single-fiber action potentials (SFAPs) should produce smaller signals in nerve cuff recordings than those observed here. SFAPs are on the order of 1.5 µV peak-to-peak, compared with approximately 10–20 µV peak-to-peak for spikes in typical nerve cuff recordings, including this study. Fibers close to the surface of the nerve may produce larger amplitudes [53, 54]. However, this would result in larger amplitude differences among the contacts within a ring, whereas our data showed similar amplitudes, consistent with more central sources. The travelling behavior observable in the spatiotemporal signatures is clearly consistent with action potential propagation, discounting the possibility that the spikes originate from a different source. We have therefore conjectured that the spikes are generated by small groups of fibers with related activity, hence our use of the term 'nCAP'. However, large SFAPs may also be involved. Our data do not allow us to conclusively determine the origin of the spikes, but our method does not depend on it.

5. Conclusion

This study demonstrates that a CNN designed to leverage spatiotemporal information enhances the classification of individual, naturally evoked CAPs recorded from multi-contact nerve cuffs. Altering the spatiotemporal signatures through the ordering and number of contacts was shown to help optimize classification accuracy. Although improvements are still warranted to ensure robust performance, the extracted information was shown to be physiologically meaningful, based on the ability to predict joint angles from proprioceptive neural data. These results suggest a promising path towards intuitive bi-directional neuroprosthetic systems capable of producing more natural movements. They also point to other applications in the rapidly growing field of bioelectronics.

Acknowledgments

This work was supported in part by the Natural Sciences and Engineering Research Council of Canada (RGPIN-2014-05498, RGPIN-2016-06329 and the NSERC Alexander Graham Bell Canada Graduate Scholarship-Doctoral Program) and the Institute of Biomaterials and Biomedical Engineering at the University of Toronto. The authors would also like to acknowledge Sin-Tung Lau and Grace A Gabriel for their statistical insight, as well as Daniel Tovbis and Stephen Sammut.
