Neural decoding of electrocorticographic signals using dynamic mode decomposition

Yoshiyuki Shiraishi; Yoshinobu Kawahara; Okito Yamashita; Ryohei Fukuma; Shota Yamamoto; Youichi Saitoh; Haruhiko Kishima; Takufumi Yanagisawa

doi:10.1088/1741-2552/ab8910

1. Introduction

A brain–computer interface (BCI), which enables communication via brain activity, is the only way to express thoughts for severely paralyzed patients who are completely unable to move their body [1]. BCIs are in high demand for these patients [2], who choose to undergo invasive procedures involving the placement of electrodes on the surface [3] or inside the cortex [4, 5]. An electrocorticographic (ECoG) signal, which is measured by a surface electrode, is a promising signal for use in BCIs [6] due to its stability [7] and less invasive nature compared to penetrating electrodes [8, 9]. An ECoG-BCI was successfully applied to a severely paralyzed patient with amyotrophic lateral sclerosis (ALS) and enabled the patient to stably communicate for months [10]. However, much less information is derived from an ECoG signal than from penetrating electrodes [11]. The patient with ALS was able to type faster using unit activity recorded by a penetrating electrode, at more than 30 characters per minute [11], than using ECoG signals [10]. Thus, for the clinical application of ECoG-BCIs, some novel technologies to extract information from ECoG signals are necessary.

To decode the ECoG signals, the oscillation component is considered to have important information, and various power spectral density (PSD) estimations [12–14] have been used as features for machine learning [15–19]. Among the methods to evaluate the PSD [18], fast Fourier transform (FFT) is often used for online decoding because of the low calculation cost [20]. FFT evaluates the powers of some frequency bands for each channel in a time series and for individual time points in space. However, because these features are independently evaluated, FFT cannot evaluate the patterns of ECoG signals in both space and time simultaneously. Therefore, a method to characterize the nonlinear dynamics as the linear summation of coherent patterns across both space and time (e.g. frequency and phase) is necessary.

Dynamic mode decomposition (DMD) has recently attracted attention as a way of obtaining modal representations of nonlinear dynamics from general multivariate time series data without explicit prior knowledge about the dynamics [21, 22]. DMD is closely related to spectral analysis of the Koopman operator (see supplementary information (available online at stacks.doi.org/JNE/17/036009/mmedia)) and has been successfully applied for the extraction of spatiotemporal patterns, broadly in fluid mechanics [21–24], but rarely in neuroscience, e.g. ECoG and functional magnetic resonance imaging (fMRI) [25–28]. Although a previous study using human ECoG signals demonstrated that the DMD was able to capture some characteristic spatiotemporal patterns related to spindles [27], the spatiotemporal patterns were not used for neural decoding. It has not been revealed whether the spatial and temporal coherent patterns identified by DMD improve the accuracy of the neural decoding used in BCI technology.

A problem in the application of DMD for neural decoding is that DMD transforms the spatial and temporal signals into a set of vectors representing the characteristic spatial mode, which has a characteristic frequency in the time domain. Notably, because the characteristic frequency of the DMD mode is estimated from single trial data, the frequency is not fixed across trials and varies depending on the original signals. Therefore, to classify the signals by the vectors estimated by DMD, proper metrics are needed to evaluate how the sets of vectors are similar among trials. Here, we used the principal angle as a metric to evaluate the relationship among the sets of vectors for each trial [29]. A Grassmann kernel was used with support vector machine (SVM) to classify the set of vectors. In this study, we classified upper limb movements using the DMD components with SVM. The classification accuracy using the DMD component was compared with that using the FFT powers and that using the DMD components whose phases were shuffled.

2. Methods

2.1. Subjects

11 patients (four females, seven males; age range, 13–66 years) with subdural electrodes participated in this study. Eight patients were the same patients from our previous report [17]. Five patients had different degrees of motor dysfunction and sensory disturbances in their upper limbs due to strokes without damage to the sensorimotor cortex, previous surgery on the motor cortex, or brain tumor on the sensorimotor cortex (table 1). Six patients with epilepsy had no motor dysfunctions or sensory disturbances. All participants or their guardians gave written informed consent to participate in the study, which was approved by the ethics committee of Osaka University Hospital (No. 08061, No. 19257).

Table 1. Clinical profiles.

Patient no.	Age (yr)/sex	Diagnosis	Paresis in affected limb (MMT)	Sensation in affected limb
1	65/M	Ruptured spinal dAVF	Spastic (4)	Hypoesthesia
2	64/M	R thalamic hemorrhage	Spastic (4)	Hypoesthesia
3	20/F	L intractable epilepsy	None (5)	Normal
4	34/F	R intractable epilepsy	None (5)	Normal
5	22/F	R intractable epilepsy	None (5)	Normal
6	14/M	R intractable epilepsy	None (5)	Normal
7	13/M	L intractable epilepsy	None (5)	Normal
8	33/M	R intractable epilepsy	Slightly spastic (4)	Hypoesthesia
9	26/M	L brain tumor	None (5)	Normal
10	66/F	R subcortical infarction	Spastic (4)	Hypoesthesia
11	49/M	R putaminal hemorrhage	Slightly spastic (5-)	Hypoesthesia

dAVF, dural arteriovenous fistula; F, female; L, left; M, male; MMT, manual muscle test; R, right.

2.2. Movement tasks

Experiments were performed approximately 1 week after electrode placement. The patients performed three types of upper limb movements, selecting among grasping, pinching, hand opening, thumb flexion, and elbow flexion [17]. We presented the patients with three sound cues at 1 s intervals. The patients performed one of three prespecified movements after the third sound cue. The patient selected the movement type for each trial, so the order and number of exercises were random. The numbers of the selected movement types are shown in table 2. The patients were instructed to perform the selected movement once, immediately after the third cue, and to return their hands to the resting position, which was an immediate hand posture among the performed movements. For the resting position, patients were instructed to relax their hands or elbows with slightly flexed joints. Each type of movement was performed approximately 30 to 100 times. The movement instructions were delivered using a PC monitor controlled by ViSaGe (Cambridge Research System, Rochester, UK) placed in front of the patients.

Table 2. Movement tasks and recording conditions.

Patient no.	Pre-specified movement tasks (label 1, 2, 3)	Trial count (label 1)	Trial count (label 2)	Trial count (label 3)	Number of electrodes
1	Grasp, pinch, open	42	46	32	60
2	Grasp, pinch, open	19	25	33	60
3	Grasp, pinch, open	44	36	37	20
4	Grasp, pinch, open	34	44	49	30
5	Grasp, pinch, open	42	39	39	30
6	Grasp, pinch, open	34	20	22	15
7	Grasp, pinch, open	37	50	36	27
8	Rock, paper, scissors	31	30	39	49
9	Rock, paper, scissors	20	20	20	24
10	Grasp, thumb, elbow	36	40	36	20
11	Grasp, thumb, elbow	71	69	39	32

2.3. ECoG recording and preprocessing

ECoG signals were recorded at 1 kHz by an EEG-1200 system (Nihon Koden, Tokyo, Japan). The subdural electrodes were placed on the sensorimotor cortex based on clinical necessity. For each patient, 15–60 electrodes were implanted. All signals were subtracted by the average of the signals of all electrodes at each time point, a well-known method for the calculation of a common average reference [30, 31]. An epileptologist identified the channels with severe noise, and these were excluded from the analysis. Three channels for patient 7 and one channel for patient 10 were removed (supplementary figures 3, 5).

For the following analysis, we used the ECoG signals at 0.5 s after the third sound cue, ${S_{execute}}$ , and at 1.0 s before the first sound cue, ${S_{norm}}$ (figure 1). Here, ${S_{execute}}$ and ${S_{norm}}$ were composed of n channels for 500 and 1000 time points, respectively. Here, we used 1.0 s of ECoG signals for normalization ( ${S_{norm}}$ ) to decrease the variability in ${S_{execute}}$ . Notably, we also used 0.5 s of ECoG signals before the first sound cue for the ${S_{norm}}$ , but the classification accuracy using the FFT feature did not significantly change with the normalization of 0.5 s (supplementary information, table 1). All processes were implemented by MATLAB (R2015b).

2.4. FFT features for neural decoding

As a conventional approach for the neural decoding of ECoG signals, FFT was used to create the FFT features of the signals [32]. FFT was applied as 512 points and 1024 points for ${S_{execute}}$ (500 points) and ${S_{norm}}\,$ (1000 points), respectively, by zero-padding with a hamming window. ${S_{norm}}\,$ was only used to normalize ${S_{execute}}$ to reduce variation among trials. The power features of four frequency bands were obtained by averaging the estimated power of the four frequency bands: theta (4–8 Hz), alpha (8–13 Hz), beta (13–30 Hz), and high gamma (80–150 Hz). In addition to these powers, we also calculated the average 0 Hz power for ${S_{execute}}$ and ${S_{norm}}$ . Finally, these five features of ${S_{execute}}$ were normalized by the mean power $\overline {{\boldsymbol{p}_{norm}}}$ and standard deviations ${{\boldsymbol{\sigma}} _{norm}}$ of the corresponding features derived from ${S_{norm}}$ to create the FFT features, which consisted of five features by n channels [17, 33]. The features were obtained by $\frac{{{\boldsymbol{p}_{execute}} - \overline {{\boldsymbol{p}_{norm}}} }}{{{\boldsymbol{{\boldsymbol{\sigma}}} _{norm}}}}$ ( ${\boldsymbol{p}_{execute}}$ is the power feature vector of ${S_{execute}}$ ). Here, the FFT powers of ${S_{execute}}$ were normalized because normalized features were shown to be decoded with better accuracy than FFT powers of ${S_{execute}}$ without normalization [15].

2.4.1. Computation of the dynamic mode decomposition.

Assuming that the sensorimotor cortex during exercise is one dynamic system, the system can be described using its observed variable ${\mathbf{x}}$ (ECoG signals) as follows:

$\begin{equation}\frac{{{\text{d}}{\mathbf{x}}}}{{{\text{dt}}}} = f\left( {{\mathbf{x}},\,t;\,\mu } \right),\end{equation}$

where $t$ and $\mu$ are the time and system parameters. Because the actual recorded signals are values at discrete times, ${{\mathbf{x}}_k} = {\mathbf{x}}\left( {k\Delta t} \right)$ is delimited by $\Delta t$ , and the dynamic system is described as ${{\mathbf{x}}_{k + 1}} = {\mathbf{F}}\left( {{{\mathbf{x}}_k}} \right)$ . Although F is difficult to obtain analytically, we can obtain linearly approximated dynamics ${{\mathbf{x}}_{k + 1}} = {\mathbf{A}}{{\mathbf{x}}_k}$ from the recorded signals. The DMD can be used to estimate A of unknown dynamics.

First, we briefly explain how the DMD was evaluated for column vector ${{\mathbf{x}}_k}$ , which is the observations across all electrodes at the k-th time sample [27]. Gathering measurements from m snapshots in time, we constructed two n × (m − 1) raw data matrices, ${\mathbf{X}_1}$ and ${\mathbf{X}_2}$ :

$\begin{equation} \begin{array}{l} {\mathbf{X}_1} = \left[ {\begin{array}{*{20}{c}} {{{\bf{x}}_1}}&{{{\bf{x}}_2}}& \ldots &{{{\bf{x}}_{m - 1}}} \end{array}} \right]\\ {\mathbf{X}_2} = \left[ {\begin{array}{*{20}{c}} {{{\bf{x}}_2}}&{{{\bf{x}}_3}}& \ldots &{{{\bf{x}}_m}} \end{array}} \right] \end{array} \end{equation}$

Here, the DMD of the data matrix pair ${\mathbf{X}_1}$ and ${\mathbf{X}_2}$ was given by the eigendecomposition of the transition matrix A, which is an n × n matrix satisfying ${\mathbf{X}_2}\, = \boldsymbol{A}{\mathbf{X}_1}$ .

Using the singular value decomposition (SVD) of the data matrix ${\mathbf{X}_1} = {{\boldsymbol{U}}}\sum {{{\boldsymbol{V}}}^*},{{\boldsymbol{A}}}$ was approximated as ${{\boldsymbol{A}}} \approx \,{\mathbf{X}_2}{\mathbf{X}_1}^ + \triangleq \,{\mathbf{X}_2}{{\boldsymbol{V}}}{\sum ^{ - 1}}{{{\boldsymbol{U}}}^*}$ (⁺ is pseudoinverse matrix of ${\mathbf{X}_1}$ ).

However, the eigenvalue decomposition of A requires a tremendous amount of calculation. Therefore, we used the orthogonal matrix U to make ${\boldsymbol{\tilde A}} \triangleq {{{\boldsymbol{U}}}^T} \times {{\boldsymbol{A}}} \times {{\boldsymbol{U}}} = {{{\boldsymbol{U}}}^T} \times {\mathbf{X}_2}\boldsymbol{V}{\mathop \sum^{ - 1}}$ , which has fewer dimensions than A. That is, we assumed that there exists some low-dimensional spatial structure in ${\mathbf{X}_1}$ . Using ${\boldsymbol{\widetilde A}}$ , the calculation for the eigenvalue decomposition of A was reduced. The eigenvalues ${\boldsymbol{\varLambda}} \,$ of the eigenvalue decomposition ${\boldsymbol{\widetilde A}}$ were obtained by ${\boldsymbol{\widetilde A}}\textbf{W} = \textbf{W}\widetilde {\boldsymbol{\varLambda}}$ , with the eigenvalue matrix $\widetilde {\boldsymbol{\varLambda}}$ approximately equal to the eigenvalue matrix of A.

In addition, for the exact DMD, $\boldsymbol{\widehat A} \triangleq {\mathop \sum \nolimits^{ - \frac{1}{2}}}{\boldsymbol{\widetilde A}}{\mathop \sum \nolimits^{\boldsymbol{\frac{1}{2}}}}$ was set to obtain the eigenvector $\boldsymbol{W}$ by applying the eigenvalue decomposition $\boldsymbol{\widehat AW }= \boldsymbol{W}\widehat {\boldsymbol{\varLambda}}$ . Then, by putting $\boldsymbol{\widehat W} \triangleq {\mathop \sum \nolimits ^{\boldsymbol{-1}}}\boldsymbol{W}$ , the DMD mode was obtained as each row of $\widehat {\boldsymbol{\varPhi}} \triangleq {\mathbf{X}_2}\boldsymbol{V}{\mathop \sum \nolimits ^{\boldsymbol{- 1}}}*\boldsymbol{\widehat W}{\widehat {\boldsymbol{\varLambda}} ^{ - 1}}$ .

Here, $\widehat {\boldsymbol{\varLambda}} \,$ is the diagonal matrix of eigenvalues λ_i. Each eigenvalue λ_i is a DMD eigenvalue. Each column of $\widehat {\boldsymbol{\varPhi}}$ is a DMD mode ϕ_i corresponding to the eigenvalue λ_i. i = 1, ...,m − h − 1. The absolute values and phases of $\widehat {\boldsymbol{\varLambda}}$ are the decay rates and frequencies, respectively.

Finally, we were able to write an approximation of the observed data as a simple dynamic model ${\boldsymbol{\widehat X}}\left( t \right)$ ,

$\begin{equation}{\boldsymbol{\widehat X}}\left( t \right) = \widehat {\boldsymbol{\varPhi}} \exp \left( {{\boldsymbol{\Omega }}t} \right)\boldsymbol{z},\end{equation}$

where ${\boldsymbol{\Omega }} = {\text{log}}\left( {\boldsymbol{\varLambda}} \right)/{{\Delta }}t$ , t is time, and z is a set of weights to match the first time point measured such that ${\boldsymbol{\widehat X}}\left( 1 \right) = \widehat {\boldsymbol{\varPhi}} \boldsymbol{z}$ . It should be noted that $\widehat {\boldsymbol{\varPhi}}$ and Ω are sets of vectors with complex values (figure 2).

**Figure 2.** DMD of ECoG signals during movements. The ECoG signals of n channels ( ${S_{execute}}$ ) were converted into the n × 300 matrix of magnitude and phases that represents the complex values of the DMD modes ***Φ_j***. Λ_real and Λ_imagin represent the decay rates and frequencies.
Download figure:
Standard image High-resolution image

**Figure 2.** DMD of ECoG signals during movements. The ECoG signals of n channels ( ${S_{execute}}$ ) were converted into the n × 300 matrix of magnitude and phases that represents the complex values of the DMD modes ***Φ_j***. Λ_real and Λ_imagin represent the decay rates and frequencies.
Download figure:
Standard image High-resolution image

DMD was originally used to evaluate the ${\mathbf{X}_1}$ of n ≫ m. However, in our data, the ECoG signal has approximately 100 channels for 500 ms. Therefore, to calculate the DMD, we constructed a new augmented data matrix X_{1 aug} as follows:

$\begin{equation}{\mathbf{X}_{1\,aug}} = \left[ {\begin{array}{*{20}{c}} {{{\mathbf{x}}_1}}&{{{\mathbf{x}}_2}}& \ldots &{{{\mathbf{x}}_{m - h}}} \\ {}&{}&{}&{} \\ {{{\mathbf{x}}_2}}&{{{\mathbf{x}}_3}}& \ldots &{{{\mathbf{x}}_{m - h + 1}}} \\ {}&{}&{}&{} \\ \vdots & \vdots & \vdots & \vdots \\ {}&{}&{}&{} \\ {{{\mathbf{x}}_h}}&{{{\mathbf{x}}_{h + 1}}}& \ldots &{{{\mathbf{x}}_{m - 1}}} \end{array}} \right]\end{equation}$

Here, h is a minimum integer that satisfies $h < \frac{{m + 1}}{{n + 1}}$ . X_{1 aug} is an array generated from ${\mathbf{X}_1}$ of m length by stacking them in the channel dimension by sliding the step one by one. We constructed X_{2 aug} using the same procedure.

The DMD features were obtained from X_{1 aug} and X_{2 aug} by the method previously mentioned. Notably, we generated X_{1 aug} and X_{2 aug} using only ${S_{execute}}$ without any normalization by ${S_{norm}}$ to avoid damaging the spatial relations among the channels of ${S_{execute}}$ by subtracting the averages and dividing the standard deviations of ${S_{norm}}$ for each channel. In fact, the DMD mode $\widehat {\boldsymbol{\varPhi}}$ will be changed by the normalization of X_{1 aug} and X_{2 aug} by the same ${S_{norm}}$ .

Here, the number of channels, n, and the size of the X_{1 aug} were different among the subjects. Therefore, to equalize the number of DMD modes, we selected the 300 modes with the largest singular values. Because the DMD mode ϕ_i_aug yielded from X_{1 aug} and X_{2 aug} was a vector of $hn \times 1$ , we used the first $n$ vector of ϕ_i_aug corresponding to ϕ_i, which is the mode for the original data ${\mathbf{X}_1}$ and ${\mathbf{X}_2}$ for the decoding task. In addition, because the size of ϕ_i generally decreases with the higher frequency range (imagery part of the eigenvalue λ_i), we normalized each mode by the L2 norm of the mode ϕ_i. Notably, the principal angle requires the orthonormal matrix to be equalized among the permutation of the composed vectors. To simplify the calculation, we aligned the mode magnitude by the L2 norm instead of the orthogonalization.

2.4.2. Neural decoding by the Grassmann kernel.

The FFT feature is represented as a set of vectors of n × 5. The DMD features are composed of eigenvectors ${{\boldsymbol{\varPhi}} _j}\,$ corresponding to a characteristic frequency ${\omega _j}\,\left( {j = 1 \ldots 300} \right)$ .

Therefore, the DMD feature is represented as a set of vectors n × 300. Notably, the DMD feature for classification is a subspace of the representation. To compare the subspaces for the classification task, we evaluated a principal angle, which measures the distance between the subspaces based on the set of vectors. Here, based on the principal angle, we used the Grassmann kernel to classify the set of vectors by SVM [29].

Y₁ and Y₂ were set as two orthonormal matrices of size D by m. The principal angles 0 ≦ θ₁ ≦...≦ θ_m ≦ π/2 between two subspaces, span (Y₁) and span (Y₂), were defined recursively by the following:

$\begin{equation}\begin{array}{*{20}{l}} cos{\theta _k} = \mathop {\max }\limits_{{u_{k \in span\left( {{Y_1}} \right)}}} \mathop {\max }\limits_{{v_{k \in span\left( {{Y_2}} \right)}}} {u_k}^\prime {v_k},{\rm{subject}}\ {\rm{to}} \\[5pt] {u_k}^\prime {u_k} = 1, {v_k}^\prime {v_k} = 1,\\[5pt] {{u_k}^\prime {u_i} = 0,{v_k}^\prime {v_i} = 0,\;\left( {i = 1, \ldots ,k - 1} \right)} \end{array}\end{equation}$

The principal angles were computed from the SVD of ${Y_1}'{Y_2}$ as follows:

$\begin{equation*}Y_1^\prime {Y_2} = U(\cos {{\Theta }})V',\end{equation*}$

where $U = \left[ {{u_{1 \ldots \,}}{u_m}} \right]$ , $V = \left[ {{v_{1 \ldots \,}}{v_m}} \right]$ , and $\cos {{\Theta }}$ is the diagonal matrix $\cos {{\Theta }} = {\text{diag}}(\cos {\theta _1} \ldots \,\cos {\theta _m})$ .

Based on the principal angles, the projection kernel of subspaces Y₁ and Y₂ was determined as follows:

$\begin{equation*}{k_p}\left( {{Y_1},{{\,}}{Y_2}} \right) = {Y_1}'{Y_2}_F^2 = {\text{tr}}\left[ {\left( {{Y_1}{Y_1}'} \right)\left( {{Y_2}{Y_2}'} \right)} \right]\end{equation*}$

Here, we applied the projection kernel to the DMD features $\,{\widehat {\boldsymbol{\varPhi}} _i}$ and ${\widehat {\boldsymbol{\varPhi}} _j}$ to generate the Gram matrix ${{\,}}{k_p}\left( {{{\widehat {\boldsymbol{\varPhi}} }_i},\,{{\widehat {\boldsymbol{\varPhi}} }_j}} \right)$ . To apply the Gram matrix to SVM, each component of the Gram matrix was divided by the average of all components of the Gram matrix. For comparison, the FFT features were also classified using SVM with the projection kernel. We also calculated the inner product matrix correlation [34] defined as ${r_{in}}\left( {{\boldsymbol{A}},\,{\boldsymbol{B}}} \right) = \frac{{{\text{tr}}\left( {{\boldsymbol{A}}'{\boldsymbol{B}}} \right)}}{{{{\left[ {{\text{tr}}\left( {{\boldsymbol{A}}'{\boldsymbol{A}}} \right){\text{tr}}\left( {{\boldsymbol{B}}'{\boldsymbol{B}}} \right)} \right]}^{1/2}}}}$ of two $i \times j$ matrices A and B to examine the independent information contained in the DMD and FFT features.

2.4.3. T-distributed stochastic neighbor embedding analysis of the Gram matrix.

Based on the Grassmann kernel, the distance between each DMD mode is represented in the Grassmann space, in which the distance between each mode of the Gram matrix ${{{\boldsymbol{G}}}_{i,\,j}}$ corresponding to the inner product of ${\widehat {\boldsymbol{\varPhi}} _i}$ and ${\widehat {\boldsymbol{\varPhi}} _j}$ was defined as $Distanc{e_{i,\,j}} = \sqrt {{{{\boldsymbol{G}}}_{i,\,i}} + {{{\boldsymbol{G}}}_{j,\,j}} - 2{{{\boldsymbol{G}}}_{i,\,j}}}$ . To visualize the distribution of each DMD mode in the space, we applied t-distributed stochastic neighbor embedding (t-SNE) to the space [35].

2.5. Decoding analysis

The classifier was independently trained for each patient by SVM. The classification accuracy was evaluated by 10-fold nested cross validation. In the nested loop, the cost parameter was optimized in the range of ${10^{ - 1\,}} > {\text{ }}cost{\text{ }} > {10^8}$ . The classification accuracies were evaluated as balanced accuracy because the number of each movement type was uneven [36]. The balanced accuracy is defined as the average of recall obtained in each category.

We evaluated how the classification accuracies depend on the characteristic frequency. We selected 10 modes ${{\boldsymbol{\varPhi}} _j}\,\left( {j = {\text{ }}1 \ldots 10} \right)$ , which have the characteristic frequency ${\omega _j}$ closest to 29 selected frequencies: 10, 20, ..., 290 Hz. Then, the classification was performed with 10 selected modes for each frequency.

${{\boldsymbol{\varPhi}} _j}$ is the vector of complex values. Each complex value is expressed as a radius and a phase in a complex plane. To evaluate how much the phase information contributes to the decoding, we also evaluated the classification accuracy using the DMD features in which the phases of each ${{\boldsymbol{\varPhi}} _j}$ were randomized.

2.6. Statistical tests

We compared the classification accuracies evaluated by the two types of features, DMD features and FFT features, by Student's paired t-test, with p < 0.05 indicating significance.

We also compared the correctness of prediction by SVM trained with DMD and FFT features by Welch's t-test, with p < 0.05 indicating significance.

The classification accuracy of each frequency was compared to the classification accuracy using the same features with shuffled labels by using Bonferroni-corrected Student's paired t-test, with p < 0.05 indicating significance.

The phases of the DMD mode of each electrode were evaluated by the Rayleigh test to show some biases in the phase distribution of the electrode among each movement task.

3. Results

3.1. Gram matrix based on projection kernel with DMD mode and clustering of dynamics

For illustrative patient 1, we calculated the projection kernel values ${{{\boldsymbol{G}}}_{i,\,j}}$ , which correspond to the inner products between each DMD mode ${\widehat {\boldsymbol{\varPhi}} _i}$ and ${\widehat {\boldsymbol{\varPhi}} _j}$ . The projection kernel values demonstrated some clusters according to the movement types (figure 3(A)). Based on the projection kernel, we further calculated the distance matrix between each DMD mode, which represents the Grassmann space for the SVM classification. To visualize the Grassmann space, t-SNE was applied for the distance matrix to embed each trial into a two-dimensional space (figure 3(B)). Each movement type was classified into different clusters. The Grassmann space using the projection kernel seemed to be successful at classifying the movement types.

**Figure 3.** Gram matrix and similarity in the Grassmann space for patient 1. (A) The projection kernel values were color-coded on the matrix of each trial. The numbers on the vertical and horizontal axes represent the movement type (1, grasping; 2, opening; 3, pinching). (B) The distribution of the data for each trial in the Grassmann space that was embedded into two dimensions by t-SNE. Each class corresponds to the movement type.
Download figure:
Standard image High-resolution image

3.2. Neural decoding using DMD features

Using the Grassmann kernel, the classification of the movement types was evaluated. The classification accuracy significantly varied among the different features: DMD feature, FFT feature, DMD feature with shuffled phase, and DMD features with shuffled labels (p = 9.12 × 10^–11, one-way ANOVA; figure 4). The three types of hand movements were successfully classified using the DMD features (79.8 ± 6.28%; mean ± 95% confidence interval), the FFT features (67.3 ± 8.56%), and the DMD features with shuffled phase (66.7 ± 9.96%), with accuracies exceeding the accuracy using the DMD features with shuffled labels (33.1 ± 3.90%, i.e. chance level). Among these classification schemes, the classification accuracy using the DMD features was significantly superior to that using the FFT features and the DMD features with shuffled phase (p < 0.05, paired Student's t-test, uncorrected).

**Figure 4.** Classification accuracy using different features. The mean and 95% confidence interval of the classification accuracies are shown for the classification using the DMD features, FFT features, DMD features with shuffled phase, and DMD features with shuffled labels (*p < 0.05, **p < 0.01, paired Student's t-test).
Download figure:
Standard image High-resolution image

We also examined how the prediction of the DMD and FFT features were correlated. First, we evaluated the inner product matrix correlation of the Gram matrices of the DMD features and the FFT features. For 10 out of 11 patients, the correlation was more than 0.8 (supplementary information, table 2), suggesting that the Gram matrices were similar to each other. However, for patient 5, whose correlation was 0.359, the classification accuracy was higher with the DMD features for this patient. Second, we compared the predictions using two features. We divided the classification results with the two features into three groups for each patient: Group 1, the predictions of both features were the same; Group 2, the prediction of DMD was correct, but that of FFT was incorrect; and Group 3, the prediction of DMD was incorrect, but that of FFT was correct. Among all patients, the ratio of the groups was significantly high for Group 1 (66.5 ± 8.02%) (figure 5). Moreover, for the conflicting predictions, Group 2 was significantly larger than Group 3. Therefore, these two features mostly result in the same predictions, but in the case of conflicting predictions, the DMD features predicted significantly better than the FFT features.

**Figure 5.** Comparison of correctness of prediction by SVM trained with DMD and FFT features. The mean and 95% confidence interval of prediction ratio where the two sets of features had the same prediction label, where DMD had correct prediction labels and FFT had incorrect prediction labels, and where DMD had incorrect prediction labels and FFT had correct prediction labels (**p < 0.01, Bonferroni-corrected Welch's t-test).
Download figure:
Standard image High-resolution image

We compared the classification accuracy using subsets of the DMD features to evaluate which frequency band modes contributed to the classification. The classification accuracies using DMD features of the characteristic frequency close to the selected frequency significantly exceeded the accuracy by chance at 60, 80, 100, 110, and 150 Hz (p < 0.01, Bonferroni-corrected paired Student's t-test; figure 6). The peak mean accuracy was 50.4 ± 2.42% (mean ± 95% confidence interval) at 100 Hz, which corresponds to the high gamma band.

**Figure 6.** Classification accuracy for each frequency mode. The classification accuracies using the DMD features whose frequency components were close to each frequency point (blue line) were compared to those using the same features with shuffled labels (chance level, orange line). The accuracies at approximately 100 Hz were significantly higher than those at chance level (*p < 0.05, **p < 0.01, paired Student's t-test, Bonferroni corrected).
Download figure:
Standard image High-resolution image

3.3. DMD modes during movement

The DMD mode captured spatiotemporal patterns characterizing movement type. For the high gamma band frequency (80–150 Hz) in patient 1, the averaged amplitudes and phases of the DMD mode were evaluated and color-coded according to electrode position (figure 7(A)). The magnitude of the DMD mode was high at the electrodes on the primary motor cortex and demonstrated different patterns in accordance with movement type for patient 1. In addition, the relative phases of each electrode were significantly biased around the motor cortex, and the averaged relative phases were different among movement types. Similarly, the magnitudes and phases of the DMD modes changed based on the movement types for all other patients (supplementary information). On the other hand, although the FFT powers of each electrode averaged among the same frequency band demonstrated some spatial pattern characterizing the movement type, the distribution of the high powers was different than that of the magnitude of the DMD mode and not restricted around the primary motor cortex (figure 7(B)). The DMD mode captured some characteristic spatiotemporal patterns of the phases and magnitudes.

**Figure 7.** DMD features and FFT features for patient 1. (A) The top row represents the averaged magnitudes of the DMD modes at each electrode. These magnitudes were obtained by averaging the mode magnitudes in the high gamma band among each movement type. The bottom row represents the averaged phases of the DMD modes. The phase was evaluated for the modes with positive imagery components. Notably, each DMD mode was composed of a pair of complex conjugates. The phase of each electrode was subtracted by the phase of electrode number 1 (upper right of 60 displayed electrodes). The relative phases were averaged among each movement type. The relative phases at the electrodes with black circles were significantly biased (p < 0.05, uncorrected Rayleigh test). The white dotted line represents the location of the central sulcus. (B) The scaled FFT power was averaged between 80 and 150 Hz (high gamma band) of each movement type and color coded by electrode position.
Download figure:
Standard image High-resolution image

4. Discussion

Our analyses indicated that the DMD mode with projection kernel provided better accuracy of the neural decoding than the FFT power, thereby demonstrating that DMD succeeded in capturing neural information better than FFT. Moreover, the projection kernel was suggested to represent the characteristic differences in ECoG signals among different movement types. Our proposed method using DMD with a projection kernel is a promising technique for neural decoding.

DMD succeeded in extracting some spatial and temporal patterns characterizing three types of hand movements. Although a previous study demonstrated that DMD extracts some characteristic spatial patterns in ECoG signals [27], it remained unclear whether these patterns included more information about the movement types than the patterns evaluated by FFT. In our study, the decoding accuracy using DMD features was higher than that using FFT and DMD features with shuffled phases. Notably, both DMD and FFT were able to capture spatial and temporal patterns of each frequency range, as illustrated in figure 4. However, FFT was not able to maintain the information of phases between the dynamics of different channels. Indeed, the classification accuracy using the DMD mode significantly decreased when the phases of the DMD modes were shuffled and were similar to the accuracy using the FFT. Moreover, comparing the predictions of the DMD features and the FFT features, these predictions mostly overlapped, but the DMD features had more useful information than the FFT features. Therefore, our data suggest that the DMD mode improved the classification accuracy compared to the FFT by capturing some spatially and temporally coherent patterns in the ECoG signals to obtain motor information.

Among the frequency features examined, the DMD modes at approximately 100 Hz demonstrated higher classification accuracy, implying that they contained a larger amount of motor information. Some studies using FFT powers have also shown that the frequency powers at approximately 100 Hz provide more motor information [15, 37]. Here, DMD was suggested to extract the spatial and temporal pattern relating to the oscillation at 100 Hz. Moreover, our results demonstrated that the DMD mode close to 60 Hz was also decoded with accuracy exceeding the chance level. Although a previous study using FFT signals demonstrated that the powers at approximately 60 Hz were difficult to decode due to some electrical power noise [37], our results suggested that the DMD mode is useful for extracting neural information from the ECoG signals, even with noise.

There are some problems in applying the proposed decoding method to a BCI. First, the DMD mode calculation must be made within at least several hundred milliseconds, which would require sufficient machine power and an appropriate approximation of the calculation. An incremental SVD might be useful to obtain such an appropriate approximation [38–40]. Second, we might need to select an appropriate DMD mode for online use during which the activity patterns of the brain continuously change. Because DMD assumes a stationary nonlinear property over a certain period of time, even for nonstationary dynamics, adapting the method to nonstationary data might be difficult. Therefore, some other machine learning techniques, such as a hidden Markov model, may be needed to address nonstationary states [41].

Although some technical improvements (e.g. calculation speed, adapting to nonstationary data) are necessary, improvements in the decoding accuracy of the performed movement types will improve the BCI as a communication device for the patient with limited ability to move their bodies. Our previous study demonstrated that the ECoG signals between the performed movement and attempted movements of paralyzed patients were similar and were decoded similarly. Our proposed method is also expected to improve the performance of attempted movements. Therefore, our proposed method with DMD will improve the decoding accuracy for paralyzed patients. Using precise decoding of attempted movements, paralyzed patients will be able to control some external devices to communicate with others [10]. Our method will improve BCI performance to realize better communication using BCI for paralyzed patients.

Practically, our method can be applied for to the functional mapping of movements or other various cortical functions, such as language, which has been developed with power spectral analysis of ECoG signals [42]. DMD will reveal the spatially and temporally coherent pattern of the ECoG signals in the characteristic frequency range with the highest information regarding particular cortical functions and might improve the accuracy of the functional mapping compared to that using power spectral analysis without phase coherence among electrodes. Similarly, our method can be applied to identify particular characteristic ECoG signals related to epileptic activity. Our neural decoding with DMD will be useful to identify spatially and temporally coherent patterns that represent particular types of information.

This study had some limitations. To simplify and shorten the calculation, we aligned the mode magnitude by the L2 norm instead of the orthogonalization. Therefore, the projection kernels between two matrices were not guaranteed to be the same when the rows of the matrix were exchanged with each other [29]. However, our results demonstrated that the simplified method succeeded in decoding three movement types with accuracy that exceeded that using FFT. The proposed method with a projection kernel was shown to be useful for motor decoding. Our future work should evaluate how orthogonalization affects the decoding accuracy. In addition, our current method ignored the frequency information. Although we used the DMD modes at specific frequencies, we did not use the frequency components for the decoding. The decoding accuracy is expected to improve with the addition of the frequency components into the DMD features. Notably, we compared the classification accuracy of three types of movements that were different among patients. The differences in the movement types might have affected the accuracy. However, by comparing the accuracy using FFT features and the DMD features with Grassmann kernels for each patient, we eliminated the effects of the different sets of movement types. Moreover, the ECoG signals in this study came from two types of patients with and without paralysis in their hands. These patients had different diseases and were implanted with different numbers of electrodes but implantations on their sensorimotor cortices was common among the patients. Notably, those differences induced no apparent differences in the decoding accuracies (supplementary information). However, the anatomical pattern of the DMD features might be affected by ischemic stroke and brain tumors in the motor cortex. We have demonstrated that the classification accuracy was significantly improved by DMD features compared to FFT features using the same classification method for the ECoG signals from the sensorimotor cortex in various conditions.

5. Conclusions

We demonstrated that DMD, which extracts spatial and temporal patterns with phase relations, is useful for extracting more motor information than FFT. The proposed method will be applied to improve BCIs and to capture some characteristic patterns of cortical activities related to motor functions.

Study funding

This research was conducted under the Japan Science and Technology Agency (JST) Core Research for Evolutional Science and Technology (JPMJCR18A5). This research was also supported in part by the Japan Science and Technology Agency (JST) Precursory Research for Embryonic Science and Technology (JPMJPR1506), Exploratory Research for Advanced Technology (JPMJER1801) and CREST (JPMJCR1913); Grants-in-Aid for Scientific Research from KAKENHI (JP17H06032, JP15H05710, JP18H05522 and JP18H03287); Grants from the Japan Agency for Medical Research and Development (AMED) (19dm0207070h0001, JP19dm0307103, JP19de0107001, JP18H05522 and JP19dm0307009); Contract research with the National Institute of Information and Communications Technology (Grant #209); SIP (Innovative AI Hospital System) of NIBIOHN; and the Canon Foundation.

Conflict of interest

We declare no competing interests.

Author contributions

Yoshiyuki Shiraishi: Software, Validation, Formal analysis, Investigation, Visualization, Writing-Original Draft, Writing-Review and Editing. Yoshinobu Kawahara: Conceptualization, Methodology, Writing-Review and Editing. Okito Yamashita: Conceptualization, Writing-Review and Editing. Ryohei Fukuma: Software, Writing-Review and Editing. Shota Yamamoto: Data Curation. Youichi Saitoh: Resources. Haruhiko Kishima: Resources. Takufumi Yanagisawa: Conceptualization, Investigation, Data Curation, Writing-Original Draft, Writing-Review and Editing, Supervision, Project administration, Funding acquisition.

Data/code availability

The data that support the findings of this study are not publicly available because they contain information that could compromise research participants' privacy and/or consent. The code used in this study are publicly available at https://github.com/yanagisawa-lab.

Neural decoding of electrocorticographic signals using dynamic mode decomposition

Article metrics

Submit

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Methods

2.1. Subjects

2.2. Movement tasks

2.3. ECoG recording and preprocessing

2.4. FFT features for neural decoding

2.4.1. Computation of the dynamic mode decomposition.

2.4.2. Neural decoding by the Grassmann kernel.

2.4.3. T-distributed stochastic neighbor embedding analysis of the Gram matrix.

2.5. Decoding analysis

2.6. Statistical tests

3. Results

3.1. Gram matrix based on projection kernel with DMD mode and clustering of dynamics

3.2. Neural decoding using DMD features

3.3. DMD modes during movement

4. Discussion

5. Conclusions

Study funding

Conflict of interest

Author contributions

Data/code availability

Neural decoding of electrocorticographic signals using dynamic mode decomposition

Article metrics

Submit

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. Methods

2.1. Subjects

2.2. Movement tasks

2.3. ECoG recording and preprocessing

2.4. FFT features for neural decoding

2.4.1. Computation of the dynamic mode decomposition.

2.4.2. Neural decoding by the Grassmann kernel.

2.4.3. T-distributed stochastic neighbor embedding analysis of the Gram matrix.

2.5. Decoding analysis

2.6. Statistical tests

3. Results

3.1. Gram matrix based on projection kernel with DMD mode and clustering of dynamics

3.2. Neural decoding using DMD features

3.3. DMD modes during movement

4. Discussion

5. Conclusions

Study funding

Conflict of interest

Author contributions

Data/code availability