This site uses cookies. By continuing to use this site you agree to our use of cookies. To find out more, see our Privacy and Cookies policy. Close this notification
Paper

Intracortical recording stability in human brain–computer interface users

, , , and

Published 15 May 2018 © 2018 IOP Publishing Ltd
, , Citation John E Downey et al 2018 J. Neural Eng. 15 046016 DOI 10.1088/1741-2552/aab7a0

1741-2552/15/4/046016

Abstract

Objective. Intracortical brain–computer interfaces (BCIs) are being developed to assist people with motor disabilities in communicating and interacting with the world around them. This technology relies on recordings from the primary motor cortex, which may vary from day to day. Approach. Here we quantify, in two long-term BCI subjects, the length of time that action potentials from the same neuron, or group of neurons, can be recorded from the motor cortex. Main results. These action potentials are identified by their extracellular waveforms and may change within a single day, although some of these identified units can be identified consistently for weeks and even months. Features of the extracellular waveforms allowed us to predict whether a specific unit was more or less likely to remain stable over a prolonged period. Significance. A greater understanding of unit stability and instability can aid the development of motor BCIs, where the goal is to maintain a high level of performance despite changes in the recorded population. BCIs should be able to be operated without technician intervention for hours, and hopefully days, to provide the most benefit to the end-users of this technology.

Export citation and abstract BibTeX RIS

Introduction

Intracortical brain–computer interface (BCI) studies with human subjects have used microelectrode arrays to record action potentials from cortical neurons [14]. It has been documented that these intracortical recordings can be unstable [59], which leads to challenges with maintaining BCI performance. The BCI paradigm relies on a consistent relationship between neural firing rates and desired movement commands, so changes in the recorded signal, if unaccounted for, will degrade the performance of the method. This issue of unit instability, or the inability to consistently record from a given unit, is distinct from unit longevity in which the overall number of units recorded from an implanted electrode array decreases over time due to changes in the device or surrounding tissue [10].

The causes of recording instability are not well understood, though it has typically been attributed to movements of the electrode tip relative to the recorded neurons [11, 12]. The most obvious changes observed on a given electrode occur when a neuron can no longer be recorded, or when a new unit is identified. These changes in whether or not a unit is recorded will significantly degrade decoder performance when the decoding model relies on their presence. Previous studies suggest that as long as stable units are accurately isolated, the relationship between their firing rates and desired motor activity will remain stable as well [6, 1214] unless the decoders are trained such that changes in modulation depth and/or small changes to preferred directions of certain units could improve performance [15, 16]. Therefore, we expect that decoder performance degradation is due to changes in neuron identity, not the tuning of those neurons during a specific task. Additionally, some studies have demonstrated stable decoding when using thresholded activity or local field potentials (LFP) rather than sorted units [17, 18], indicating that larger samples of the neural population may remain stable even if individual neuron recordings do not. The analysis here focuses on identifying changes in unit identity rather than transient changes in neural firing rates that can occur with posture [1921], attention [2224], or other task-related factors [2528].

Previous studies of the stability of individual unit recordings have been conducted in non-human primates and have been limited to a few weeks [7, 9], with the exception of one animal that was studied for 150 d [6]. Pronounced differences in recording stability have been observed between recordings in cats [29] and non-human primates [69], raising the potential of further differences between human and non-human primates. Here we report on long-term recording stability in humans with tetraplegia, which is the target population for many BCI devices. Other motor BCI studies have spanned months or years [28, 30, 31], but have not analyzed unit stability. These studies typically recalibrate decoders on at least a daily basis to maximize performance.

The primary goal of this study was to quantify unit stability using a previously described statistical method [6]. Neural recordings from microelectrodes implanted in motor cortex were compared within and across channels to estimate stability over time, on timescales ranging from hours to months. Units were characterized across four dimensions: shape of the voltage waveform, the firing rate of the unit, the autocorrelation of the unit's firing, and the correlation between the unit's firing and that of other units in the recorded population. Units that showed marked changes between recording sessions were considered unstable. In this way, we could determine the rate at which recorded units turn over. In addition to quantifying the turnover rate, we also investigated whether unit characteristics, such as firing rate, waveform size, recording noise, firing rate consistency, or percent shared variance (a model-free measure of tuning [32]), predicted how long a unit would remain stable. Without an additional independent measure of the activity of individual neurons, we can only determine that we are unable to record from a given neuron. We cannot definitively determine if a unit became unstable because the recorded neuron moved out of range of the recording electrode or because the neuron itself stopped firing.

Previously, Ganguly and Carmena showed that a decoder that used only stable units, could maintain performance for days to weeks [33]. However, that study used a small number of units to provide two degrees of freedom of BCI control. For many BCI applications, more degrees of freedom, and thus more stable units, will be needed. Further, we seek to determine which unit characteristics are predictive of stability, which is critical if the BCIs are to be based on stable units. While many BCI studies make use of threshold crossing activity rather than sorted units [3, 17, 18, 3436], we included sorted single- and multi-activity in our analysis to provide a more comprehensive description of recording stability. Understanding the dynamics of instability and characteristics of stable units may help provide a framework for adaptive decoders, which gradually update the decoding parameters for all channels to account for changes in the recorded activity [3742].

We found that while many recorded units became unstable over the course of hours to days, a substantial proportion of the units are recordable for weeks at a time. Moreover, units with high firing rates (approximately 5 Hz or greater), large peak-to-peak voltage (approximately 50 µV or greater), and those that are well tuned are more likely to remain stable for weeks. These results provide a basis for developing BCI decoding methods that can enable long-term, stable performance.

Methods

Subjects

Two subjects were implanted with platinum-coated silicon intracortical microelectrode arrays (Blackrock Microsystems, Salt Lake City, UT) in left primary motor cortex (M1). Subject 1 was a 54 year old female diagnosed with spinocerebellar degeneration without cerebellar involvement [43]. She was implanted with two 4 mm  ×  4 mm, 96-channel arrays for 33 months. Subject 2 was a 28 year old male with a C5 motor/C6 sensory ASIA B spinal cord injury. He had two 4 mm  ×  4 mm  ×  1.5 mm, 88 channel arrays implanted in motor cortex. Data from the first six months of recording from Subject 2 are included here. Subject 2 also had two 4 mm  ×  2.4 mm  ×  1.5 mm, 32 channel arrays implanted in primary somatosensory cortex, however, the recordings from those arrays are not included in this analysis. The arrays in somatosensory cortex were implanted for the purpose of providing cutaneous sensations through intracortical microstimulation [44], and were not recorded from as consistently as the arrays in motor cortex.

Data collection and spike sorting

Neural data were recorded during BCI test sessions, typically performed 2–3 d a week, using the NeuroPort data acquisition system (Blackrock Microsystems, Salt Lake City, UT). We used 3 min of data collected during closed-loop BCI decoder calibration in order to identify the units recorded for each test session. We recorded 287 sessions across 33 months from Subject 1, and 51 sessions across six months from Subject 2. The neural data recorded during calibration for each of these days were used to quantify between-day unit stability.

We also wanted to quantify stability within a single day, so on a subset of days, 5 min long recordings were taken during closed-loop BCI calibration approximately every hour for 7 h. Subject 1 completed five sessions of testing in order to quantify within-day unit stability. These sessions were completed 276, 297, 307, 466, and 467 d after implant. Subject 2 participated in four sessions of within-day stability testing at 101, 128, 163, and 193 d after implant.

For all recording sessions, voltages were sampled on each electrode at 30 kHz with a 250–4500 Hz bandpass filter. Each time the voltage signal crossed a pre-defined spike threshold, the time of the crossing and a 48-sample (1.6 ms) snippet of the signal, starting ten samples before the threshold crossing, was saved. The spike threshold was set to either  −5.25 (Subject 1 prior to day 565) or  −4.50 (all other test sessions) times the baseline root-mean-square voltage on each channel, and then rarely manually adjusted if necessary.

Threshold crossing events were later sorted offline in Matlab (MathWorks, Natick, MA) to identify 'units', which we defined here as a collection of threshold crossing events from a single electrode that have similar waveform shapes. These units could be single-neuron action potentials, multi-neuron activity without well-isolated action potential shapes, or a collection of indistinct low-amplitude threshold crossings. Our sorting method relied on principal component analysis (PCA) to separate units based on their waveform shape. For each recording session and channel, the researcher viewed two plots of neural data prior to sorting the units. The first plot showed up to 400 overlaid spike snippets from a single channel (figure 1(A)). The second plot showed all recorded snippets from that channel as points in the plane of the first two principal components based on snippet waveform shape (figure 1(B)). The researcher then specified the number of sortable units (including single-neuron, multi-neuron, and indistinct low-amplitude units) on a given channel and initialized a Gaussian mixture model expectation-maximization algorithm to create the specified number of clusters in the principal component space. Every recorded snippet was assigned to the cluster to which it had the highest probability of belonging according to the expectation-maximization algorithm. Once all snippets were assigned to a unit, any unit that had a firing rate less than 0.2 Hz was discarded prior to further analysis.

Figure 1.

Figure 1. Spike sorting procedure. (A) Snippets were overlaid for each electrode and viewed by the experimenter during sorting. (B) The experimenter also viewed the same snippets in principal component space during sorting. The means (red dots) and standard deviations (red ellipsoids) for each cluster were overlaid after sorting.

Standard image High-resolution image

Iterative-comparison stability determination

Once the snippets were sorted into units, we used a method developed by Fraser and Schwartz to determine if the units were stably recorded from one session to the next [6] (posted to Matlab Central as 'Tracking neurons over multiple days', identification no. 30113). A summary of this method is presented here. The method used four classifying characteristics: waveform shape, auto-correlation, mean firing rate, and cross-correlation with other units. The differences in these characteristics between a unit and any unit recorded during a different session generates a 4D similarity score. These characteristics 'are individually weak classifiers, but, because they represent independent sources of information, they can be combined into a strong classifier' [6]. The method also makes use of the assumption that a neuron can only be recorded by a single electrode, and so neurons recorded on different electrodes must be unique. Based on this assumption, the similarity scores for each unit when compared to all units recorded on all the other electrodes is calculated to generate a 4D null distribution of similarity scores for units that are known to be different. Similarity scores were then calculated between sessions for units recorded on the same channel, creating a population of potentially-stable scores. Using expectation-maximization to fit a mixture of Gaussian models to the single-label null distribution of known-different units, and the similarity scores of the multi-label potentially-stable units allows a quadratic classifier to compute an optimal decision boundary with a set error rate. For this analysis, the error rate was set to 5%.

As a first estimate of stability, pairwise comparisons were made from recordings taken on the same electrode during adjacent testing sessions. Each test session was compared to the session that immediately followed it. We refer to this as the iterative-comparison stability method. Pairwise comparisons continued longitudinally across all recording sessions. If a unit was stable from the first session to the second, and then again from the second to the third, it was considered stable across all three sessions. This yielded a monotonically decreasing number of stable units across time. For the within-day stability analysis, iterative pairwise comparisons were made to the recording sessions taken each hour.

The iterative-comparison method provided a conservative estimate of the number of stable units across three or more sessions due to its iterative nature. A false negative result during any single session comparison marked a unit as unstable for the rest of the test period, even if it was labeled as stable in all session comparisons following the false negative. While false negatives could inflate estimates of instability, a complementary feature of the iterative approach was that false positives, which occur at a 5% rate (based on the decision boundary being set at 95% of the null distribution), were rapidly corrected after only a few comparisons. Therefore, the iterative-comparison method for determining stability provided a lower bound on the estimated number of stable units over the test period.

Direct-comparison stability determination

An alternative method to determine stability is to compare each session to all following sessions directly. This direct method of comparison allowed units that may not have been recorded during some sessions to be marked as stable if they were recorded again later, similar to the method used by Vaidya et al [8]. The direct-comparison method provided a more generous estimate of unit stability over time. With this method, each comparison has a 5% false positive rate. However, false negatives which incorrectly mark a unit as unstable, are limited to single pairwise session comparisons rather than every subsequent session. By using both the iterative-comparison method and the direct-comparison method, we estimated a range within which the true number of stable units fell.

To perform the direct-comparison, we compared each session directly to the subsequent 50 recording sessions, or all remaining sessions if there were fewer than 50, to identify stable units. We chose to only make direct-comparisons across 50 sessions at a time because that would always yield comparisons for at least 100 calendar days, and continuing the comparisons beyond that point provided minimal additional information because the shape of the stability decay curve was well defined by the first 100 d. Once those comparisons were done for each session, we had the stability rates for units from every recording session to all subsequent recording sessions. For the within-day comparison, the first recording in each day was directly compared to the subsequent seven recordings.

Unit characteristics for predicting stability

The following predictive characteristics were calculated using data from the first day that a given unit was recorded: peak-to-peak voltage, baseline noise, firing rate, firing rate consistency, and 'model-free tuning', a task-independent metric of a unit's firing rate correlations. These predictive characteristics were selected separately from the classifying characteristics that were used to determine stability between sessions, (see iterative-comparison stability determination section). The stability determination used individual unit characteristics as well as information about the entire recorded population of units to enable a robust characterization of unit identity where we assume that a unit can only be recorded on a single electrode tip. These same characteristics were used previously in non-human primate studies to track unit stability across days [6]. Here, we chose to compare units that we classified as having different levels of stability in terms of characteristics that could be computed for each unit individually rather than considering the relationship to properties of the recorded population. The predictive characteristics computed for each unit are defined as follows:

  • 1.  
    Baseline noise was calculated as two times the standard deviation of the first five and last five samples of each 48-sample unit snippet. These samples were chosen because each snippet is aligned such that the first ten samples occur before the threshold is crossed, and most waveforms will have returned to baseline approximately 1 ms (30 samples) after crossing threshold. Units with high baseline noise might be expected to be less stable because the recording noise could overwhelm the signal.
  • 2.  
    Peak-to-peak voltage was calculated as the difference in voltage from the trough to the peak of the average waveform for each unit.
  • 3.  
    Firing rate was calculated as the average number of threshold crossings per second across the 3 or 5 min recording. Firing rate was square-root transformed to better approximate a normal distribution [45].
  • 4.  
    Firing rate consistency was calculated as the inverse of the variance of the inter-spike interval divided by the mean firing rate of the unit. A unit with a low level of firing rate consistency might be more likely to be unstable if this is an indication that its firing behavior is more transient.
  • 5.  
    Model-free tuning, which we defined as the percentage of firing rate variance explained through factor analysis. Typically, neural tuning is calculated by determining how much of the observed firing rate variance can be explained by task-relevant variables, such as position or velocity. However, since different BCI tasks, including 2 degree-of-freedom cursor control and 3, 5, 7, and 10 degree-of-freedom hand and arm control [2, 28, 31, 36], were conducted across days, we could not evaluate one specific model to quantify neural tuning. Instead, a 10D factor analysis model was used to identify the amount of shared firing rate variance for each unit as described in Williamson et al [32] and Bittner et al [46]. Factor analysis approximates the covariance matrix of the neural population activity as the sum of two components: a shared component, that represents shared fluctuations across the population, and a private component that represents variation in activity that is specific to each neuron. Briefly, if Σ is the n-by-n covariance matrix measured for all n neurons in a single session, Σ  ≈  Λ  +  Φ where Λ is a low-rank covariance matrix that captures the ten dimensions explaining the most shared covariance of the population and Φ is a diagonal matrix that summarizes the remaining 'private' variability not accounted for by shared fluctuations in the population activity. Essentially, Φ represents the independent variance of each neuron that is not explained by other recorded neurons. As a proxy for neural tuning across various BCI tasks, model-free tuning was calculated as the 'percent shared variance', i.e. the ratio of a neuron's shared variance to its total variance:
    Equation (1)
    where mi is the model-free tuning value for unit i and Λii and Φii are the scalar diagonal entries of their respective matrices corresponding to unit i. A 10D model was chosen to match the largest number of degrees-of-freedom decoded during the study [36]. Higher levels of model-free tuning indicate that a neuron's activity is better predicted by the population. All of the tasks during which the data was collected involved the subject using the recordings from M1 to control the velocity of an end-effector. We therefore assume that the majority of shared variance in the population will be related to the task, and that units with a higher percentage of shared variance are therefore better tuned to the task.

A Wilcoxon rank-sum test was used to compare each predictive characteristic between units that were stable for at least 1 d and units that were not stable between days. We also compared these characteristics for units that were stable for 1–6 d, 7–21 d, or greater than 21 d using the Kruskal–Wallis test.

Cox regression modeling of unit survival

Given that there may be an interaction between the different predictive characteristics and their impact on unit stability, we sought to model the relative utility of the different predictive characteristics for predicting the length of stability using Cox regression modeling [47]. The Cox regression model uses a survival function of the form:

Equation (2)

where $S(t,i)$ is the survival rate for unit i at time t, days after a selected recording session. ${{S}_{0}}(t)$ is the baseline survival rate at time t, the average survival rate across the entire population. The exponent of the equation is called the relative risk. The relative risk is calculated using the b regression coefficients for each characteristic: noise, peak-to-peak voltage, firing rate, firing rate consistency, and model-free tuning. The subscript k denotes each different characteristic. ${{C}_{k}}(i)$ is the measured characteristic of each unit, and ${{\overline{C}}_{k}}$ is the population mean for characteristic k. The higher the survival rate for a unit at a given time after the initial recording session, the more likely it is to still be stable. The model was fit based on each observed unit's predictive characteristics on the first day it was observed, and the number of days it was subsequently observed before becoming unstable. The model was only fit to Subject 1's data because the time period of recordings for Subject 2 was insufficient to confidently fit a model.

The distributions of all of the predictive characteristics were heavily skewed. Data were log transformed prior to performing the Cox regression analysis in order normalize the distribution of observed values.

Impact of prior stability

We also sought to examine whether prior stability was predictive of future stability. For example, was a unit that had been stable for a week more likely to be stable for another week than a unit that had only been recorded for one day? Units were labeled by the number of days that they had been stable, ranging from 1 to 60 d. For each duration of stability (n  =  1–60 d), we determined how many days longer each unit would remain stable. In this way, we were able to compare the expected length of stability for units based on how long they had already been recorded. This analysis was performed only for Subject 1, as Subject 2's data was collected over a shorter period of time. Additionally, we chose to investigate prior stability separately, rather than as part of the Cox regression modeling because prior stability was highly correlated with the predictive characteristics.

Results

Recording quality through time

The largest number of recorded units (defined here as a mix of well-isolated single neurons, multi-neuron waveforms, and low-amplitude indistinct threshold crossings) was observed in the first two months after implant for both subjects. Subject 1 had a maximum of 301 recorded units on Day 17 and Subject 2 had a maximum of 407 units on Day 52. Since the participants had a different number of wired electrodes, we computed a ratio of the number of recorded units per channel, as shown in figure 2(A). This ratio peaked at 1.6 units per channel for Subject 1 and 2.3 units per channel for Subject 2. The number of sorted units declined over time, consistent with previous non-human primate and human subject studies. While the yield varies widely across subjects, studies tend to report many fewer units in implants that are more than a year old, compared to implants that are only a few months old [30, 4850].

Figure 2.

Figure 2. Signal quality metrics. (A) The number of sorted units per channel for each recording session. The inset shows the first six months for both subjects. (B) The distribution of peak-to-peak voltages across all recorded channels for each recording session. Again, the inset shows the first six months for both subjects. The dots show the median peak-to-peak voltage for the day, with the dashed lines show the interquartile range. The recording quality metrics exhibit a discontinuity at day 565 for Subject 1 due to a change in the spike threshold from  −5.25 to  −4.5 RMS.

Standard image High-resolution image

The peak-to-peak voltage of the recorded units increased over the first month of the implant before slowly decreasing (figure 2(B)), again consistent with previous literature [30, 4850]. The median unit size reached a maximum value of 89 and 102 µV, for Subjects 1 and 2 respectively. Note that the recording quality metrics in figure 2 exhibit a discontinuity at day 565 for Subject 1 due to a change in the spike threshold from  −5.25 to  −4.5 RMS. By moving the threshold closer to 0, more units with lower peak-to-peak voltages were recorded. This change was motivated by reports that small amplitude units can carry significant movement-related information and contribute to decoding [34, 35]. Subject 2 had more units per recording channel compared to Subject 1. This could be attributed to using the more sensitive threshold (−4.5 RMS) for the duration of the Subject 2's implant. Subject 2 tended to have more large amplitude units, as seen by the higher bound on the interquartile range for peak-to-peak voltage in figure 2(B).

These recording quality results demonstrate that the arrays performed as expected based on previous results [30, 4850]. We do not believe this slow decline in recording quality is directly related to the recording instability observed over the course of hours and days described below. If units were lost from the recorded population because they became permanently unrecordable, then the number of available units would decline much faster than we observed.

Consistent between-day recordings of some units

Data were compiled across all recording sessions to obtain the median trajectory for unit stability over the subsequent days and weeks of testing. Figure 3 shows the median and interquartile ranges for estimated unit stability rates using both the iterative (figure 3(A)) and direct (figure 3(B)) comparison methods. In both cases, we observed two primary responses: (1) a sharp initial decline in the number of stable units and (2) a small group of long-lasting stable units. Therefore we fit the time course of unit loss with a two-term exponential function. This implies that units do not decay at a constant rate. Instead, there is a large proportion of units (~75%) that were moderately stable over multiple days but became unstable within one week. The remaining units were highly stable and were lost at a slower rate. The exponential function used to describe unit stability for both the iterative- and direct-comparison methods was:

Equation (3)

where t is the number of days since the initial recording session. We define this equation such that a is the proportion of the recorded population that were moderately stable, c is the proportion of the recorded population that were highly stable, $(1-{{e}^{-b}})$ was the proportion of moderately stable units that become unstable each day, and $(1-{{e}^{-d}})$ . was the proportion of the highly stable units that become unstable each day. The proportion of units that were unstable is the remainder of the population not described by equation (3) when t  =  0, i.e. 1  −  (a  +  c).

Figure 3.

Figure 3. Estimates of between-day recording stability. (A) Percentage of units remaining stable calculated with the iterative-comparison method. (B) Percentage of units remaining stable calculated with the direct-comparison method. The data have been shifted downward by 5 percentage points to account for the known 5% false positive rate. For (A) and (B), each dot shows the median percentage of units remaining stable for each number of days following every recording session, dashed lines show the interquartile range. The solid lines show the exponential fit to the medians. (C) The exponential fits to estimate the unit stability profiles are shown for both subjects and both comparison methods. The iterative-comparison method is more conservative and estimates fewer highly stable units whereas the direct-comparison method estimates that 10%–15% of units can remain stable for months. The solid lines are for the iterative-comparison method, dashed lines for the direct-comparison method.

Standard image High-resolution image

Figure 3(C) shows the two-term exponential models of unit stability for Subjects 1 and 2 using both the iterative- and direct-comparison methods. The participants exhibited very similar trends in the rate of unit stability.

Using the more conservative iterative-comparison method 47.1% of recorded units from Subject 1 were unstable. However, 44.7% of units recorded from Subject 1 showed moderate stability. Of these moderately stable units, 16.3% became unstable each day. Finally, 8.2% of units showed high stability, with only 2.6% of these units becoming unstable each day. Subject 2 showed similar trends although a larger percentage of units (61.4%) were unstable. Another 32% of units showed moderate stability, with 15.3% of those becoming unstable each day. Another 6.6% of units showed high stability, with 3.7% of those becoming unstable each day. In figure 3(C), the y-intercepts of the solid lines are equal to the sum of the percentage of units that are moderately stable and the percentage of units that are highly stable for each subject.

As expected using the direct-comparison approach, a larger percentage of units were considered to be highly stable (19.7% for Subject 1 and 12.5% for Subject 2). Nearly 20% of units were moderately stable for both subjects. Using the direct-comparison approach, we estimated that approximately two-thirds of units were unstable. The estimated decay rates were lower using the direct estimation of stability as compared to the iterative approach, which is much more conservative across repeated sessions (table 1). All data in table 1 is derived from the two-term exponential models of the stability estimates.

Table 1. Between-day unit stability distributions and decay rates.

  Subject 1   Subject 2
Percentage of units (%) Decay rate (per day) (%)   Percentage of units (%) Decay rate (per day) (%)
Iterative Unstable 47.1 N/A   61.4 N/A
Moderately stable 44.7 16.3   32.0 15.3
Highly stable 8.2 2.6   6.6 3.7
Direct Unstable 62.8 N/A   67.6 N/A
Moderately stable 17.5 18.1   19.9 7.7
Highly stable 19.7 0.8   12.5 0.5

Most units are stable for hours

The between-day stability analysis revealed that approximately 60% of units were only recorded on a single day (table 1). As such, we sought to examine unit stability over a shorter timescale by collecting eight recordings, one every hour for 7 h, to identify units that were stable on the order of hours, rather than days.

For both subjects, the iterative- and direct-comparison methods estimated that 72%–80% of recorded units were stable across two consecutive recordings, separated by 1 h (table 2). Figure 4 shows the unit stability profiles for the iterative (4(A)) and direct (4(B)) comparison methods. Overall, Subject 1 appeared to have a slightly higher percentage of stable units, which is consistent with the between-day results. Both subjects showed the expected trend, in that fewer units were recorded stably as the time since the initial recording increased.

Table 2. Within-day unit stability distributions and decay rates.

  Percentage of units (%) Decay rate (per hour) (%) Percentage of units (%) Decay rate (per hour) (%)
Iterative Unstable 19.6 N/A 20.0 N/A
Stable 80.4 8.7 80.0 12.5
Direct Unstable 23.2 N/A 27.7 N/A
Stable 76.8 3.6 72.3 5.8
Figure 4.

Figure 4. Estimates of within-day recording stability. (A) Percentage of units remaining stable calculated with the iterative-comparison method. (B) Percentage of units remaining stable calculated with the direct-comparison method. The data have been shifted downward by 5 percentage points to account for the known 5% false positive rate. For A and B, each dot shows the percentage of units remaining at the time a recording was made (approximately every hour, but dots are plotted at precise times after first recording). The solid lines show the exponential fit to the data. (C) The exponential models fit to the data from panels (A) and (B) are shown for both subjects and both comparison methods. The iterative-comparison method is more conservative and estimates fewer stable units whereas the direct-comparison method estimates that at least 50% of units can remain stable for 7 h. The solid lines are for the iterative-comparison method, dashed lines for the direct-comparison method.

Standard image High-resolution image

We fit the within-day stability estimates with a single-term exponential function as illustrated in figure 4(C) because the collected data did not extend far enough in time to judge whether highly stable units had a significantly different decay rate as was observed between days. The exponential regression models were used to estimate the rate at which stable units could no longer be recorded. Using the more conservative iterative-comparison approach, we estimated that stable units were lost at a rate of 8.7% or 12.5% per hour for Subjects 1 and 2 respectively. When we instead used the direct-comparison method, we estimated that the number of stable units decreased at a rate of 3.6% or 5.8% per hour for Subjects 1 and 2 (table 2). These decay rates are markedly faster than those from the between-day stability analysis. The shorter length of data collection makes it impossible to properly identify units that will be highly stable, leading to the assumption that all units will become unstable as quickly as the ones that are observed to become unstable during the 8 h of recording. This is why it is important to consider the between-day and within-day analyses together.

Data from the between- and within-day experiments can be combined to provide a rough estimate of overall unit stability. Using the iterative-comparison method, approximately 25% of units were stable on the order of hours, 47.5% of units were stable on the order of days, and 7.5% were stable on the order of weeks. Using the direct-comparison method, approximately 45% of units were stable for hours, 17.5% for days, and 12.5% for weeks.

Unit characteristics predict stability

When each predictive characteristic was considered individually, unstable units had lower noise levels (12.5 versus 14.0 µV and 23.9 versus 25.8 µV for Subjects 1 and 2 respectively), lower peak-to-peak voltages (37.6 versus 43.9 µV and 62.5 versus 75.4 µV), lower firing rates (0.36 versus 1.59 Hz and 1.84 versus 3.52 Hz), less firing rate consistency (0.47 versus 0.67 and 0.74 versus 0.99), and lower model-free tuning (0.010 versus 0.022 and 0.016 versus 0.022) than units that were stable for at least one day (p  <  10−18 for all predictive characteristics for both subjects, Wilcoxon rank-sum). When comparing across different durations of stability (less than 1 week, 1–3 weeks, or greater than 3 weeks), we observed differences in all 5 predictive characteristics in Subject 1 (p  <  10−16, Kruskall-Wallace, figure 5). Subject 2 showed a significant difference (p  <  0.01) for all predictive characteristics other than firing rate consistency (p  =  0.32) for units that were stable for the same three time periods.

Figure 5.

Figure 5. Distributions of predictive characteristics of units based on stability for Subject 1. In all plots, the red line is the median, blue box outlines the interquartile region, and the whiskers extend from the 5th–95th percentile. (A) Model-free tuning from the 10D factor analysis model. (B) Peak-to-peak voltage. (C) Magnitude of the noise during baseline recording. (D) Average firing rate of the unit. (E) Firing rate consistency.

Standard image High-resolution image

We created a Cox regression model to predict the survival rate of a given unit over time based on its noise level, peak-to-peak voltage, firing rate, firing rate consistency, and model-free tuning. For both subjects, higher peak-to-peak voltage, firing rate and model-free tuning all correlated with a higher survival rate. Firing rate had the strongest correlation, followed by peak-to-peak voltage, and then model-free tuning. Firing rate consistency did not have a significant correlation with survival rate. Baseline noise had a strong correlation with survival rate in the data from Subject 1 but not Subject 2. This could be because all of Subject 2's data were collected early in the implant when signal quality is generally higher. The Cox regression model allows us to assess the relative risk that a unit will become unstable. The relative risk is the exponent from the survival function (equation (1)), which is calculated from observed predictive characteristic values and the regression coefficients. An increase in relative risk decreases the survival rate of a unit. To illustrate the relative strength of each characteristic we calculated the relative risk for hypothetical units. For the characteristic of interest, we evaluated the 25th or 75th percentile value measured across all units, while the other characteristics were set at the median value. The change in relative risk between the 25th and 75th percentile hypothetical unit for each characteristic is shown in table 3.

Table 3. Change in relative risk, given a change in each neural characteristica.

  Subject 1 Subject 2
25th %ile value 75th %ile value Change in relative risk (%) 25th %ile value 75th %ile value Change in relative risk (%)
Noise 11 µV 18 µV 15* 21 µV 29 µV 1
Peak-to-peak voltage 27 µV 53 µV −13* 53 µV 90 µV −7*
Firing rate 0.50 Hz 2.4 Hz −20* 1.0 Hz 5.6 Hz −13*
Firing rate consistency 0.28 1.0 1 0.34 1.9 4
Model-free tuning 0.006 0.028 −8* 0.003 0.026 −10*

a The percent change in relative risk when each predictive characteristic was changed from its 25th percentile value to its 75th percentile value while all other characteristics are held at their median. Negative changes indicated that increasing the value of a given characteristic made a unit more likely to remain stable. * Denotes changes in relative risk that were significantly different from a 0% change.

The Cox regression model also allows us to estimate the survival rates for the 'best' and 'worst' units. Here we defined the best-case unit as one in the 95th percentile for firing rate, peak-to-peak voltage, and model-free tuning, and in the 5th percentile for noise and firing rate consistency. Conversely, the worst-case unit was one in the 5th percentile for firing rate, peak-to-peak voltage, and model-free tuning, and in the 95th percentile for noise and firing rate consistency. By plotting the predicted stability for these two cases and for a unit with the median value in all predictive characteristics, we can visualize the range of possible survival rates (figure 6). The best-case unit had a relative risk of becoming unstable that was 3.7×  lower than the worst-case unit and 2.1×  lower than a unit with the median value of each characteristic. This means a best-case unit has a 36.8% chance of remaining stable for one week, compared to 12.8% for a median unit and only 2.4% for a worst-case unit. The best-case unit is also much more likely to be stable for one month (9.5%) compared to the worst-case unit (0.015%).

Figure 6.

Figure 6. Survival rate approximations based on predictive characteristics. The best-case units (green) show far higher survival rate than units with median (black) or worst-case (orange) characteristics according to the Cox regression model.

Standard image High-resolution image

Prior stability predicts future stability

For Subject 1, we could also examine whether the amount of time that a unit had been stable predicted future stability. As shown in figure 7, the longer a unit had been stable, up to a point, the longer it was likely to remain stable in the future. We see a steady increase in both the amount of time a unit will remain stable and the variance about that estimate, up to approximately 40 d of stable recording. After 40 d there appears to be a plateau in the amount of time a unit is expected to remain stable.

Figure 7.

Figure 7. Prior stability predicts future stability. The median and interquartile range for how long a unit will remain stable after being stable for the number of days on the x-axis.

Standard image High-resolution image

Discussion

Intracortical microelectrode arrays provide a way to measure the activity of populations of neurons in the cortex. This technology has enabled many scientific and engineering discoveries, for example by advancing our understanding of motor control and creating brain–computer interfaces to restore movement [1, 2, 16, 33, 51]. However, this technology also presents challenges as single unit recordings can be unstable and signal quality can degrade over time. Here we have presented a detailed look at recording stability from intracortical microelectrodes implanted in human motor cortex. Here we identify changes in unit identity, and ignore transient changes in neural firing rates that can occur with posture [1921], attention [2224], or other task-related factors [2528]. These instabilities are distinct from recording longevity [10]. While many studies have shown long-term recording capabilities using intracortical microelectrode arrays, it has also been well documented that the number of units recorded over time tends to decrease [30, 4850] as was observed for Subject 1 (figure 2(A)). In this study, neural recordings were captured for multiple recording sessions each week for many months, as well as multiple times during single days of testing to capture within-day stability.

Rates of instability

We found that approximately 60% of units are unstable after a single day. Even within a single day, we found that 45% of units became unstable after 7 h. The rate at which units became unstable was well-fit by a one-term exponential for recordings made within a day, and by a two-term exponential for recordings made on multiple days. Within a single day, units became unstable at a rate of approximately 11% per hour. For units that were stable for at least one day, the majority were lost at a rate of 16% per day, while a smaller percentage of units were highly stable for multiple weeks and were lost at a rate of 3% per day. Based on the differences in decay rates estimated by the two-term exponential models, we classified units as unstable, moderately stable, and highly stable. This is a coarse classification of the population that likely has a more continuous distribution of stability rates, however we are limited by the frequency (e.g. three times per week) of our recordings. In the future, continuous wireless [52, 53] recordings will likely make it possible to describe more detailed temporal profiles of unit stability.

Our results confirm prior reports [69] of signal instabilities that are commonly observed with intracortical recordings of rhesus macaques. When using the iterative-comparison method, we found slightly fewer stable units than Fraser et al who used the same method to quantify stability in two macaques [6]. However, our direct-comparison estimates of stability were comparable to what Vaidya et al observed in their macaque subject [8]. An important difference is that both studies showed substantially higher rates of stability over the first two weeks than we observed in our study, but this may be due to the fact that the previous studies only included well-isolated units. Given the similarities in long-term stability estimates between previous studies and this one, a macaque model appears to be appropriate for investigating issues surrounding chronic recording stability. However, further investigation is needed to determine if short term stability, particularly within a single day, is comparable between macaques and humans.

As noted above, some previous studies of stability in macaques [6, 8] only included well-isolated units. However, we chose to include poorly-isolated units, in addition to well-isolated units, because they contain significant information about movement parameters [34, 35] and are often used for decoding [3, 36]. Poorly-isolated unit recordings are often a summation of activity from multiple units and are therefore referred to as multi-unit activity. We recognize that differences between single- and multi-unit activity may be important for interpretation of our stability results. For example, the voltage measured from multi-unit activity, is more variable than that measured from well-isolated single units [35]. Additionally, well-isolated units tend to provide information about multi-dimensional movement velocities, while multi-unit activity may be more representative of movement speed [34]. The Cox regression model showed that units with lower peak-to-peak voltage, higher noise levels, and less distinct tuning were less stable overall (see worst-case curve in figure 6). These predictive characteristics are more typically observed for multi-unit recordings suggesting that multi-unit activity is more unstable than single-unit activity. This result will need to be reconciled with the relatively high levels of stability in LFP and unsorted threshold crossings found by Flint et al [17]. One potential explanation is that by averaging activity over a much larger population for LFPs, or the combination of well-isolated single-units and multi-unit activity that contribute to threshold crossings, individual neuron stability becomes less important.

To be sure that our results were not driven primarily by poorly-isolated units, we conducted a secondary analysis using the within-day stability data. With this analysis, we defined well-isolated units as those with the most consistent waveform shapes. Consistency was calculated by taking the dot product of each recorded snippet with the subsequent one after normalizing them to remove the effect of peak-to-peak voltage. This allows us to identify units with very consistently shaped waveforms (high average dot product). The population of units was separated based on a mixture of Gaussians model, with 47.7% and 54.1% of units being labeled well-isolated for Subjects 1 and 2 respectively, and the rest being labeled poorly-isolated. While waveform similarity is a criteria used to determine stability, stability determination is based on the comparison of mean waveform shape between sessions, while the isolation determination was made based on the variability of the waveform from one spike to the next within a single recording session. After 1 h, 81% of well-isolated units were stable as compared to 63% of poorly-isolated units. Approximately 8.5% of well-isolated units became unstable each hour as compared to 16.4% of the poorly-isolated units. This result agrees with the between-day analysis of the Cox regression model and suggests that multi-unit activity is less stable, although it is informative for BCI. Developers of long-term BCI systems will need to balance the need for the extra information available from multi-unit activity [34, 35] with the need for recorded units to remain stable for long periods of time.

Predicting a unit's stability

In addition to measuring the rate of unit instability, we also sought to determine whether there were particular characteristics that distinguished stable units from those units that were more likely to be unstable. We investigated unit recording noise, peak-to-peak voltage, firing rate, firing rate consistency, and model-free tuning because they provide information about a unit's activity, consistency, clarity, and modulation and are independent of the task being performed. In general, stable units tended to have a higher firing rate, peak-to-peak voltage, and model-free tuning value. Units that were not stable had lower values of these characteristics, likely because many of them were low-amplitude activity without clearly defined action potentials. We used a regression model to predict unit stability and found higher firing rates were most predictive of prolonged stability with higher peak-to-peak voltage and model-free tuning values also providing significant improvements in expected length of stability. These characteristics appear to be the most robust predictive characteristics that could be used to identify units that would provide more stable long-term recordings, creating more reliable decoders.

Possible causes of instability

The within- and between-day stability time courses provide some insight into the causes of recording instability. The typical explanation of recording instability has been that the electrode tip moves relative to the recorded neurons so that the distance becomes too large for clear recording [11, 12]. Our within-day analysis shows that the loss of units is gradual over this time period suggesting that electrode movement does not occur suddenly, for instance from sudden head movement. It remains a possibility that micromotion occurs at the electrode-tissue interface due to ongoing biological processes [54]. Another possible explanation for instability is that neurons go through active and inactive states, not unlike the inactivation that occurs during slow wave sleep [55]. A change in unit state could drastically alter the firing rate causing the unit to be recognized as unstable. Finally, neuronal death and glial scarring have also been proposed as potential causes of unit instability [54, 56]. With glial scarring, neurons are further away from the electrode tip and at some point could become too far away to be recorded. However, we observed that the total number of recorded neurons decreases much more slowly, by orders of magnitude, than the number of stable units so we do not expect that neuronal death or an increase in glial scar thickness are the primary cause of instability (figures 2 and 3).

We found the rate of stability was significantly increased when estimated with the direct-comparison method suggesting units frequently return after being marked as unstable. This is consistent with the theories of micromotion and state changes. Neurons that move away from electrodes could move back towards the tip if movement resembles a random walk, so they would be recognized as stable with the direct-comparison method, but not the iterative-comparison method. Similarly, neurons would be recognized as stable with the direct-comparison method if they returned to an active state after entering an inactive state. Further evidence to disentangle these hypotheses could be gathered using forthcoming wireless recording systems [52, 53] to conduct continuous long-term recordings. If most units become unstable after small changes in peak-to-peak voltage, this would suggest that micromotion may be the primary driver of instability. In contrast, sudden shifts in firing rates would suggest a change in neural state.

If micromotion of the electrode tips relative to the recorded neurons were the primary driver of instability, then the units with large peak-to-peak voltage would be the most stable because the recorded neuron is nearer to the electrode tip and could tolerate a small amount of motion while still being recorded. We did observe that a large peak-to-peak voltage was predictive of stability, however we also found that firing rate and model-free tuning were predictive of stability. There is not an obvious reason for firing rate and model-free tuning to correlate with a neuron's proximity to the electrode. We think that this suggests neuron-type may explain differences in stability rate, potentially because certain neuron-types undergo state changes differently [55] or because certain types of neurons are more or less stable for reasons that are yet to be determined. A thorough waveform analysis, for instance using the fact that interneurons have narrower spike waveforms than pyramidal neurons [57], may be able to determine whether specific types of neurons are more consistently recorded.

Implications for BCI development

While the stability of recordings may improve with changes to the size, shape, materials, and coatings of recording electrodes [56], understanding the rate at which units are changing and which predictive characteristics are more likely to be stable using existing clinical electrodes provides useful information for the development of intracortical BCI devices. One important design criterion for a clinical BCI is to minimize the number of times that the device needs to be recalibrated. One possible approach is to identify very stable units and use only those units for the BCI. For example, Ganguly and Carmena [33] showed that a monkey could complete a cursor task using the same decoder for up to 19 d using just 15 recorded units. The results from the iterative-comparison analysis provide a conservative estimate that a 96-channel array (assuming ~1 unit per channel, figure 1(A)) would provide at least 15 stable units for 9 d. The direct-comparison analysis, predicts at least 15 stable units for 29 d. We would therefore expect that if we could identify these stable units, BCI performance on a cursor control task could be maintained for 9–29 d.

A significant limitation of selecting only highly stable units for a BCI is that it will not scale well to applications that require higher degrees of freedom because a limited amount of information can be extracted from such a small sample of units. However, the estimates of unit stability presented here should aid in the development of unsupervised recalibrating decoders [3742]. These decoders adjust to the loss of units from the recorded population and the addition of the new units that replace them by recalibrating the decoder during use. The stability decay rates shown here provide recalibration algorithms with an a priori estimate of how quickly units will be lost and trigger a recalibration before performance decays too far. The results from the within-day comparisons are most pertinent here, showing that around half of the recorded units turn over within an 8 h period. This indicates that updates will need to occur at least every few hours, which will require a system that is used consistently through the day. The units that are identified as stable could also provide a baseline framework to which the activity of newer units can be compared to begin estimating their tuning. If unsupervised recalibration could run in the background of a BCI system, then a BCI user may only have to participate in supervised decoder training once while maintaining performance for at least the rest of the day without operator intervention.

An important additional consideration is that while a unit may remain stable, it is possible that its relationship to motor output may change over time. For example, some studies have shown that preferred directions and modulation depth can change with learning or task context as the subject's control of the BCI improves [15, 16, 58]. We did not test this explicitly in our study, however, mean firing rate and model-free tuning are likely related to a neuron's functional properties. Future studies will need to characterize stability in terms of BCI performance across different types of tasks. Here we provide information regarding the stability of the ability to record a given unit chronically in a human intracortical implant.

Conclusions

This analysis of the stability of intracortical microelectrode array recordings in the motor cortex of two BCI users with tetraplegia showed that while many units become unstable over hours or days, well-tuned units with high firing rates and large peak-to-peak waveforms are more likely to remain stable for weeks to months. These heterogeneous rates of stability can inform the development of BCI systems intended for consistent long-term clinical use without needing regular technical intervention.

Acknowledgments

This study was conducted under investigational device exemptions granted by the US Food and Drug Administration (NCT01364480 and NCT01894802). We thank Jan Scheuermann and Nathan Copeland for their extraordinary commitment and effort in relation to this study and insightful discussions with the study team; Karina Palko for her participation as an honorary research team member and support of the study; Debbie Harrington (Physical Medicine and Rehabilitation) for regulatory management of the study; Jeffrey Weiss for engineering support; the University of Pittsburgh Clinical and Translational Science Institute and the Office of Investigator-Sponsored Investigational New Drugs and Investigational Device Exemption support for assistance with protocol development and regulatory reporting and compliance; the volunteer members of the DSMB for their continued monitoring of this study; and Blackrock Microsystems (Salt Lake City, UT, USA) for technical support in relation to this project. This material is based upon work supported by the Defense Advanced Research Projects Agency (DARPA) and Space and Naval Warfare Systems Center Pacific (SSC Pacific) under Contract No. N66001-16-C-4051 and the Revolutionizing Prosthetics program (Contract No. N66001-10-C-4056). The views, opinions, and/or findings contained in this article are those of the authors and should not be interpreted as representing the official views or policies of the Department of Veterans Affairs, Department of Defense, or US Government.

Please wait… references are loading.