
Analytic beamformer transformation for transfer learning in motion-onset visual evoked potential decoding


Published 14 April 2022 © 2022 The Author(s). Published by IOP Publishing Ltd
Citation: Arno Libert et al 2022 J. Neural Eng. 19 026040. DOI: 10.1088/1741-2552/ac636a


Abstract

Objective. While decoders of electroencephalography-based event-related potentials (ERPs) are routinely tailored to the individual user to maximize performance, developing them on populations for individual usage has proven much more challenging. We propose the analytic beamformer transformation (ABT) to extract phase and/or magnitude information from spatiotemporal ERPs in response to motion-onset stimulation. Approach. We have tested ABT on 52 motion-onset visual evoked potential (mVEP) datasets from 26 healthy subjects and compared the classification accuracy of support vector machine (SVM), spatiotemporal beamformer (stBF) and stepwise linear discriminant analysis (SWLDA) when trained on individual subjects and on a population thereof. Main results. When using phase- and combined phase/magnitude information extracted by ABT, we show significant improvements in accuracy of population-trained classifiers applied to individual users (p < 0.001). We also show that 450 epochs are needed for ABT to function correctly, which corresponds to 2 min of paradigm stimulation. Significance. We have shown that ABT can be used to create population-trained mVEP classifiers using a limited number of epochs. We expect this to pertain to other ERPs or synchronous stimulation paradigms, allowing for a more effective, population-based training of visual BCIs. Finally, as ABT renders recordings across subjects more structurally invariant, it could be used for transfer learning purposes in view of plug-and-play BCI applications.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Brain-computer interfaces (BCIs) offer alternative communication or rehabilitation facilities to those suffering from severe neuromuscular disorders [1–6] such as amyotrophic lateral sclerosis, stroke and spinal cord injury. Depending on the brain recording technique used, BCIs are divided into invasive, semi-invasive and non-invasive [7–9]. Whilst invasive BCIs enjoy the highest information transfer rate (ITR) and usability scores [10], a significant group of patients (up to 40%) prefer not to undergo the required surgery despite invasive BCIs being justified on medical grounds [11, 12]. For these and other patient groups, non-invasive BCIs could be a viable option when traditional assistive devices are unsuited [13–17]. The most widely used non-invasive recording technique is electroencephalography (EEG), although functional near infrared spectroscopy, magnetoencephalography (MEG) and optically pumped magnetometer MEG have also been used in BCI settings. EEG-based BCI paradigms are based on slow cortical potentials [2, 9, 18–20], sensorimotor rhythms [2, 3, 9, 11], visual event-related potentials (ERP) [2, 4, 5, 9, 18, 21], steady-state visual evoked potentials (SSVEP) [2, 22–24], and code-modulated visual evoked potentials (cVEP) [21, 25, 26], or a combination of these (hybrid BCI, for a review see [2]).

Here we focus on a less popular BCI paradigm: the motion-onset visual evoked potential (mVEP) [27–30]. In contrast to the aforementioned paradigms, which rely on sudden or repeatedly flickering stimuli, mVEPs are evoked in response to the onset of a moving, non-flickering stimulus, which is reported to be more comfortable to the user [29–31]. Motion direction has been shown to affect the recorded mVEP responses [27, 32, 33], and the observed phase difference between EEG channels could be exploited to achieve a higher communication rate [34].

Ideally, for the individual user, a BCI application should be plug-and-play, developed beforehand on population data or through a zero-training strategy. In reality, EEG signals are spatiotemporally complex; depend on age, mental state and possibly the nature and extent of the user's brain disorder; and are further susceptible to noise and a range of physiological and muscular artefacts [8, 35]. As a result, EEG data exhibits intra- and inter-subject variability [35–38], challenging the main assumption behind machine learning: training and test data should belong to the same feature space and probability distribution [8]. This explains why subject-specific training is relied on when targeting superior BCI performance [21, 28, 39, 40]. In the case of SSVEP, there have been promising demonstrations requiring little or no training to reach high decoding accuracy [40, 41], but inter-subject variability remains an issue in other visual paradigms. This points to the more general topic of transfer learning: 'the ability of a system to recognize and apply knowledge and skills learned in previous tasks to novel tasks' [42, 43]. In the BCI domain, it refers to the transfer of subject-specific, labeled data [8, 35, 43] learned by one model to another, or to finding a structure invariant across datasets [44]. For accounts on transfer learning in EEG-based BCI settings, we refer to [8, 35, 42–44].

In this work, we propose a novel way to extract phase and magnitude information from ERP responses, called analytic beamformer transformation (ABT), and apply it to subject- as well as population-based decoder training.

2. Methods

2.1. Software

All recordings, preprocessing, analysis and classification were performed in Matlab [45]. The Psychtoolbox [3, 46, 47] extension was used for precise timing of the experimental interface.

All code can be made available upon reasonable request.

2.2. Subjects and ethics

We recruited 26 healthy subjects (13 female, average age 24.842 years, standard deviation 2.641, 16 right-handed), of which 23 had never participated in an EEG experiment before and the remaining three had no previous experience with the motion-onset paradigm. Prior to the experiments, participants read and, when they agreed, signed the consent form, which had been approved beforehand by the ethical committee of the university hospital UZ Leuven (S62547). All participants had normal or corrected-to-normal vision and were remunerated for their participation. Handedness was determined using an adapted Edinburgh inventory questionnaire [40]. In contrast to previous work, all recordings were included in the modeling efforts, despite possible issues with BCI illiteracy and poor signal quality, to test the robustness of the method in a realistic setting.

2.3. Data acquisition and experimental setup

The motion-onset data stemmed from a previous experiment [34] and is therefore only briefly described. Subjects were seated approximately 70 cm from the screen. Each subject repeated the experiment twice, once with the motion stimulus going from left to right, termed right translation (RT), and once going from right to left, termed left translation (LT); both experiments were counterbalanced across subjects. Although the mVEP responses depend on translation direction, previous work showed it had no impact on decoding accuracy [48]. The experimental interface consisted of nine rectangular targets (2.5 × 1.25 cm) laid out in a 3 × 3 matrix with 6 cm between nearest-neighbor targets. Subjects were asked to gaze at a '+' positioned in the center of, and overlaying, the cued target, and to mentally count the number of motion-onset stimulations of that target. Motion onset occurred in the form of a vertical line moving across the target, starting from the left edge of the target for RT and from the right edge for LT. Motion onset was brisk and the motion lasted for 140 ms, followed by a 60 ms inter-stimulus interval. Targets were stimulated sequentially, in a pseudorandom, non-overlapping manner, with each target stimulated exactly 10 times; one target stimulation is termed an 'epoch' and the series of 10 epochs for each of the nine targets a 'trial'. Sequential trials were separated by a 600 ms interval to enable the subject to direct his/her gaze to a new cued target, which was selected pseudorandomly. After nine trials, in which each target had been highlighted as cued target, further called a 'block', the subject was allowed to take a break. In total, the experiment consisted of four such blocks. This resulted in 3240 epochs per experiment, with 360 'on-target' epochs and 2880 'off-target' epochs.

The experimental interface was displayed on a 1920 × 1080 monitor (ViewPixx, Canada) operating at a 120 Hz refresh rate. The subjects' gaze was tracked using an EyeLink 1000 Plus device (SR-Research, Canada) to verify correct target gazing; note that the eye tracker data was not included in the analysis. The subjects' EEG was recorded with a SynAmps RT device (Compumedics, Australia) using a cap equipped with 32 active Ag/AgCl electrodes, evenly distributed over the scalp in accordance with the international 10–20 system, and sampled at 2000 Hz. The ground and reference electrodes were placed at AFz and FCz, respectively.

2.4. Data preprocessing

All data processing was performed offline. First, the raw EEG signals were re-referenced to the average of the mastoid signals (TP9 and TP10). Second, the re-referenced signals were bandpass filtered between 1 and 10 Hz using a zero-phase 4th-order Butterworth filter. Next, the filtered data was cut into 800 ms epochs, starting 200 ms before stimulus onset and ending 600 ms after it. The average activity in the 200 ms pre-stimulus window was used to baseline-correct the post-stimulus part of each epoch and was then discarded. Finally, epochs were downsampled to 100 Hz and stored for further analysis. When the cued target was stimulated, this resulted in an 'on-target' epoch; when a non-cued target was stimulated, an 'off-target' epoch; recall that the ratio of cued to non-cued epochs is 1/8.
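For illustration, a minimal Matlab sketch of this preprocessing chain follows. The variable names (eeg, onsets, iTP9, iTP10) are our assumptions, as is the reading of 'zero-phase 4th-order Butterworth' as a 2nd-order bandpass design (order 4) applied with filtfilt; the authors' exact implementation may differ.

```matlab
% Minimal sketch of the preprocessing chain. Assumes raw data 'eeg'
% (channels x samples) at fs = 2000 Hz, stimulus onset sample indices
% 'onsets', and mastoid channel indices iTP9/iTP10 (assumed names).
fs = 2000;
eeg = eeg - mean(eeg([iTP9 iTP10], :), 1);        % re-reference to mastoid average
[b, a] = butter(2, [1 10] / (fs/2), 'bandpass');  % bandpass of order 2x2 = 4
eeg = filtfilt(b, a, eeg.').';                    % zero-phase filtering
pre = round(0.2 * fs); post = round(0.6 * fs);
EP = zeros(size(eeg, 1), 60, numel(onsets));      % 600 ms at 100 Hz = 60 samples
for k = 1:numel(onsets)
    ep = eeg(:, onsets(k) - pre : onsets(k) + post - 1);  % 800 ms epoch
    ep = ep - mean(ep(:, 1:pre), 2);              % baseline-correct, then
    ep = ep(:, pre + 1:end);                      % discard the pre-stimulus window
    EP(:, :, k) = resample(ep.', 100, fs).';      % downsample to 100 Hz
end
```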

Channel selection was performed manually based on averaged motion-onset responses, averaged across epochs and across subjects. Eventually, 14 channels {'O1','Oz','O2','PO3','PO4','P7','P3','Pz','P4','P8','CP5','CP1','CP2','CP6'} were selected.

2.5. Analysis

The ABT we propose is inspired by the spatiotemporal linearly constrained minimum variance (LCMV) beamformer [21, 23, 42], which we redeveloped to extract phase and/or magnitude information from mVEP responses; this information is then input to the classifier that decodes the cued target. We adhere to the usual notation: a scalar is represented by a lowercase letter, a vector by a lowercase bold letter, a matrix by an uppercase bold letter, and a tensor by an uppercase bold underlined letter. Further, $\mathbb{R}$ refers to the real numbers and $\mathbb{C}$ to the complex numbers.

2.6. Phase and magnitude extraction using analytic beamformers

In essence, the beamforming algorithm estimates the contribution of a predefined activation pattern ('template') to a given signal. After preprocessing, we assemble our experimental data in a spatiotemporal tensor $\underline{\mathbf{EP}} \in \mathbb{R}^{c \times n \times r}$, with c the number of electrodes, n the number of time samples in an epoch, and r the number of epochs. The spatiotemporal activation matrix $\mathbf{AP} \in \mathbb{R}^{c \times n}$ captures the spatiotemporal component-of-interest and is calculated by averaging the on-target epochs in $\underline{\mathbf{EP}}$. In spatiotemporal beamforming, we turn each epoch in $\underline{\mathbf{EP}}$ into a vector by concatenating its rows, yielding a matrix $\mathbf{X} \in \mathbb{R}^{r \times (cn)}$, and obtain the concatenated spatiotemporal activation pattern $\mathbf{ap} \in \mathbb{R}^{(cn) \times 1}$ by averaging the rows of $\mathbf{X}$ corresponding to the on-target epochs. From $\mathbf{X}$, the covariance matrix $\boldsymbol{\Sigma} \in \mathbb{R}^{(cn) \times (cn)}$ is calculated. We use shrinkage regularization [31] of the covariance matrix:

$$\boldsymbol{\Sigma}_{\text{reg}} = (1 - \alpha)\,\boldsymbol{\Sigma} + \alpha\,\mathbf{I} \qquad (1)$$

with $\alpha = 0.05$ and $\mathbf{I}$ the identity matrix with the same dimensions as $\boldsymbol{\Sigma}$, as this significantly increases classification accuracy, as shown in [33]. LCMV beamformers [21, 25, 49] are calculated under the constraint $\mathbf{ap}^{\text{T}}\mathbf{w} = 1$:

$$\mathbf{w} = \frac{\boldsymbol{\Sigma}_{\text{reg}}^{-1}\,\mathbf{ap}}{\mathbf{ap}^{\text{H}}\,\boldsymbol{\Sigma}_{\text{reg}}^{-1}\,\mathbf{ap}} \qquad (2)$$

with $\mathbf{ap}^{\text{H}}$ the conjugate transpose of $\mathbf{ap}$. The likelihood of the activation pattern being present in a new epoch $\mathbf{EP} \in \mathbb{R}^{c \times n}$ is obtained by a simple weighted sum:

$$y = \mathbf{ep}\,\mathbf{w} \qquad (3)$$

where $\mathbf{ep} \in \mathbb{R}^{1 \times (cn)}$ denotes the concatenated rows of epoch $\mathbf{EP}$ and y expresses the contribution of activation pattern $\mathbf{ap}$ to $\mathbf{EP}$ as a single number.
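To make this concrete, here is a minimal Matlab sketch of steps (1)–(3); the epoch tensor EP (c × n × r) and the logical on-target indicator onTarget are assumed variable names, not the authors' code.

```matlab
% Minimal sketch: epoch concatenation, activation pattern, covariance and
% the LCMV beamformer of equations (1)-(3). EP: c x n x r; onTarget: 1 x r.
[c, n, r] = size(EP);
X = zeros(r, c * n);
for k = 1:r
    X(k, :) = reshape(EP(:, :, k).', 1, []);  % concatenate the rows (channels)
end
ap = mean(X(onTarget, :), 1).';               % activation pattern, (cn) x 1
alpha = 0.05;
Sigma = (1 - alpha) * cov(X) + alpha * eye(c * n);  % shrinkage, equation (1)
w = (Sigma \ ap) / (ap' * (Sigma \ ap));            % LCMV weights, equation (2)
y = X(1, :) * w;                                    % output for one epoch, equation (3)
```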

For ABT, the proposed data transformation, we used the Hilbert transform to calculate the discrete-time analytic signal of each epoch in $\underline{\mathbf{EP}}$, yielding a training set of analytic epochs $\underline{\widehat{\mathbf{EP}}} \in \mathbb{C}^{c \times n \times r}$ and a corresponding analytic activation pattern $\widehat{\mathbf{AP}} \in \mathbb{C}^{c \times n}$. In turn, we take each channel $q \in [1..c]$ of $\widehat{\mathbf{AP}}$ as the desired signal $\mathbf{s} \in \mathbb{C}^{1 \times n}$. We then calculate the analytic covariance matrix $\boldsymbol{\Sigma}_q \in \mathbb{C}^{n \times n}$ from that channel's data across all epochs and compute the corresponding analytic beamformer. We repeat this process for every channel of $\widehat{\mathbf{AP}}$ and assemble the resulting weight vectors in a matrix of analytic beamformers $\mathbf{ABF} \in \mathbb{C}^{n \times c}$. The pseudocode is shown in figure 2.

Figure 1.

Figure 1. Definitions of APR, AMR and AMPR.

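As a complement to the pseudocode in figure 2, a minimal Matlab sketch of the analytic beamformer calculation follows. The function name compute_abf and the onTarget indicator are assumptions; hilbert requires the Signal Processing Toolbox.

```matlab
% Minimal sketch: analytic beamformer (AB) calculation per channel.
% EP: c x n x r tensor of preprocessed epochs; onTarget: logical 1 x r;
% alpha: shrinkage parameter (0.05 in the text).
function ABF = compute_abf(EP, onTarget, alpha)
    [c, n, r] = size(EP);
    EPa = complex(zeros(c, n, r));
    for k = 1:r
        EPa(:, :, k) = hilbert(EP(:, :, k).').';  % analytic signal per channel
    end
    APa = mean(EPa(:, :, onTarget), 3);           % analytic activation pattern, c x n
    ABF = complex(zeros(n, c));
    for q = 1:c
        s  = APa(q, :).';                         % desired signal of channel q, n x 1
        Xq = squeeze(EPa(q, :, :)).';             % r x n analytic data of channel q
        Sigma = (1 - alpha) * cov(Xq) + alpha * eye(n);  % shrinkage, cf. (1)
        ABF(:, q) = (Sigma \ s) / (s' * (Sigma \ s));    % LCMV weights, cf. (2)
    end
end
```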

2.7. Data transformation using the analytic beamformer transform (ABT)

We apply ABT to transform the individual epochs in $\widehat {{\mathbf{EP}}}$ by adapting the weighted sum described in (3) to each channel 'k' of every epoch using the analytic beamformers trained using data from each channel 'j':

$$y_k = \widehat{\mathbf{ep}}_k\,\mathbf{abf}_j \qquad (4)$$

with $y_k \in \mathbb{C}$ the transformed data for channel k, $\widehat{\mathbf{ep}}_k$ the analytic signal in channel k, and $\mathbf{abf}_j$ the analytic beamformer trained using information from channel j. The result $y_k$ can be represented as a vector with magnitude m and angle θ. The magnitude reflects the (mis)match in shape and amplitude between $\widehat{\mathbf{ep}}_k$ and the signal $\mathbf{s}$ used to construct $\mathbf{abf}_j$, whilst θ is the phase difference between $\widehat{\mathbf{ep}}_k$ and $\mathbf{s}$, which can be written as $p = e^{i\theta}$. Note that, when the signal in $\widehat{\mathbf{ep}}_k$ is identical in magnitude, phase and shape to the analytic signal $\mathbf{s}$ used to construct $\mathbf{abf}_j$, $y_k$ has magnitude m = 1 and θ = 0 (thus, p = 1). We applied (4) to every epoch by taking the signal of each channel of the epoch as $\widehat{\mathbf{ep}}_k$ and each column of $\mathbf{ABF}$ as $\mathbf{abf}_j$. This transforms each complex epoch $\widehat{\mathbf{EP}} \in \mathbb{C}^{c \times n}$ into two matrices: $\mathbf{P} \in \mathbb{C}^{c \times c}$ and $\mathbf{M} \in \mathbb{R}^{c \times c}$, containing the phase shifts in Euler notation and the magnitudes of $\widehat{\mathbf{EP}}$ transformed with $\mathbf{ABF}$, respectively. After repeating this for every individual epoch, we obtain $\underline{\mathbf{P}} \in \mathbb{C}^{c \times c \times r}$ and $\underline{\mathbf{M}} \in \mathbb{R}^{c \times c \times r}$, which completes our data transformation. Pseudocode is presented in figure 3. Note that $\underline{\mathbf{P}}$, $\underline{\mathbf{M}}$ and their combination (through dot multiplication) are more compact representations of the original spatiotemporal epochs and, therefore, cannot be called epochs anymore. To avoid confusion, we will henceforth refer to analytic phase representation-based (APR-based), analytic magnitude representation-based (AMR-based) and analytic magnitude-phase representation-based (AMPR-based) decoding (figure 1).

Figure 2.

Figure 2. Pseudocode of analytic beamformer calculation.

Figure 3.

Figure 3. Pseudocode of the analytic beamformers applied to transform spatiotemporal epochs (analytic beamformer transformation, ABT).

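Alongside the pseudocode in figure 3, a minimal Matlab sketch of ABT applied to a single analytic epoch is given below; the function name apply_abt is our assumption.

```matlab
% Minimal sketch: ABT applied to one analytic epoch (cf. figure 3).
% EPa: c x n complex analytic epoch; ABF: n x c matrix of analytic beamformers.
function [P, M] = apply_abt(EPa, ABF)
    Y = EPa * ABF;             % Y(k, j): channel k filtered by beamformer j, cf. (4)
    M = abs(Y);                % AMR: magnitudes, c x c real
    P = exp(1i * angle(Y));    % APR: phase shifts in Euler notation, c x c complex
end
```

AMPR is then obtained as the element-wise ('dot') product P .* M.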

2.8. Individual subject- and population-based validation

In this work, we investigated the performance of individual subject- and population-based classification of on-target and off-target epochs.

Individual-subject classifiers were trained on epochs from three of the four blocks acquired in an experiment and validated on the fourth block, in a fourfold cross-validation setting. The reported accuracies are the averages over the four folds. The analytic beamformers used for the data transformation (ABT) were calculated using only training data, without further optimization. We also investigated APR-based, AMR-based and AMPR-based classification performance.

For population-based decoding, we considered four cases. First, the spatiotemporal epochs case. Here, we adopted a leave-one-subject-out cross-validation strategy: all other subjects' data was concatenated into the training dataset used to train the decoders, which were then tested on the data from the left-out subject. This was primarily done to compare the results of the existing methods against ABT.

Second, we verified whether ABT would have a positive or negative effect on population decoding performance given an abundance of available data to calculate the analytic beamformers (ABs). Hereto, we used all available data from the individual training sets to develop and test the ABs. It was hypothesized that every dataset needs a dataset-specific covariance matrix and activation pattern to calculate the ABs; in other words, each individual dataset would need to be transformed using the dataset-specific ABs. After this transformation, all individually transformed training datasets were concatenated into one large dataset used to train the decoders, and the transformed test set was used to test them. This was repeated until each individual dataset had served as test set, in a leave-one-subject-out cross-validation strategy.

Third, as a use case, we considered a BCI that needs to be developed for a new user (=test set), given a large set of data available from other users (training set). Since all data from the training set is available, we used all data to calculate the ABs to transform the individual training datasets. For the test set, we gradually increased the amount of available data to see how many epochs we would need to calculate the ABs, applied ABT using these ABs, and then used the transformed data to assess decoder performance.

Fourth, in order to evaluate how many epochs need to be present in a dataset before it can be added to the training set, we gradually increased the available data of each individual training set used to calculate the ABs similar to case 3 (see previous paragraph). In addition, we also limited the available data in the test set as described in case 3.

Note that for cases 2, 3 and 4 we calculated performance using AMR, APR and AMPR-based decoding, and when one dataset of a subject was used as testing data, the other dataset from that subject was omitted from the population dataset (leave-one-subject-out).

From the training data, we excluded any epochs with an amplitude higher than 75 µV and lower than −75 µV to reduce the variability and remove outliers. This resulted in 12 991 epochs being rejected out of 174 960. Note that outlier removal was not performed on the test data. We observed that mainly the decoding algorithms based on the spatiotemporal epochs benefitted from removing outliers from the training data.
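As an illustration of this amplitude criterion, a minimal Matlab sketch is given below; the variable names EP and labels are assumptions.

```matlab
% Minimal sketch: amplitude-based outlier rejection on the training epochs.
% EP: c x n x r epoch tensor in microvolts; labels: 1 x r (assumed names).
keep = squeeze(max(abs(EP), [], [1 2])) <= 75;  % reject epochs exceeding +/-75 uV
EPkept = EP(:, :, keep);
labelsKept = labels(keep);
```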

2.9. Classification methods

We considered three classifiers (spatiotemporal beamformer classifier, SWLDA, SVM) to decode population-based mVEP responses with and without the proposed transform, whilst only requiring a small amount of person-specific information. The underlying classification problem was binary: discriminating on-target from off-target epochs. The winning target of a trial was the one for which the classifier returned the highest value.

The proposed transformation ABT results in two tensors: $\underline{\mathbf{P}}$ and $\underline{\mathbf{M}}$. As the SWLDA and SVM classifiers could not be trained on complex-valued data directly, we concatenated the real and imaginary parts of APR and AMPR: $\mathbf{APR} \in \mathbb{C}^{c \times c}$ becomes a real-valued matrix in $\mathbb{R}^{c \times 2c}$, and likewise $\mathbf{AMPR} \in \mathbb{C}^{c \times c}$ becomes a real-valued matrix in $\mathbb{R}^{c \times 2c}$. Alternatively, the cosine and sine of the phase angle could be used. We were aware that classifiers for complex-valued data exist, but we opted for the SWLDA and SVM classifiers as these have been used before to classify mVEP data [33]. The goal here is to evaluate the effect of transformed vs. spatiotemporal epochs on decoding accuracy when using linear classifiers.

To train the linear classifiers, all epochs were concatenated into vector form. Note that this only occurred after ABT was applied.
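A minimal sketch of this concatenation and vectorization step, with variable names as assumptions:

```matlab
% Minimal sketch: flatten transformed representations into real-valued
% feature vectors for SWLDA/SVM. P: c x c complex (APR); M: c x c real (AMR).
AMPR     = P .* M;                              % combine phase and magnitude
aprFeat  = [real(P(:));    imag(P(:))].';       % 1 x 2c^2
amprFeat = [real(AMPR(:)); imag(AMPR(:))].';    % 1 x 2c^2
amrFeat  = M(:).';                              % 1 x c^2
```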

2.9.1. L1—support vector machine

The linear SVM algorithm calculates the optimal decision hyperplane separating two classes; its regularization parameter (the penalty for misclassification) was optimized using a line search [50] and 10-fold cross-validation, performed on the training data only. Parameters of the line search were kept at the default values suggested in [50]. As mentioned above, the SVM was not trained on analytic data directly but on the concatenated real-valued data.
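A minimal sketch of this optimization, assuming Matlab's fitcsvm (Statistics and Machine Learning Toolbox) and a hypothetical search grid (the paper's exact grid follows [50] and is not reproduced here):

```matlab
% Minimal sketch (assumed grid): line search over the box constraint of a
% linear SVM with 10-fold cross-validation on the training data only.
bestC = NaN; bestErr = Inf;
for C = 2.^(-5:2:15)                            % hypothetical search grid
    mdl = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'linear', 'BoxConstraint', C);
    err = kfoldLoss(crossval(mdl, 'KFold', 10));
    if err < bestErr, bestErr = err; bestC = C; end
end
svmFinal = fitcsvm(Xtrain, ytrain, 'KernelFunction', 'linear', 'BoxConstraint', bestC);
```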

2.9.2. Spatiotemporal beamformer classifier

We adapted the spatiotemporal beamformer to perform classification by following the methodology described in [21, 25, 49]. As activation pattern we took the average of all on-target spatiotemporal epochs or transformed epochs, and we regularized the covariance matrix of the epochs using shrinkage regularization as described in (1) with α = 0.05. Note that the spatiotemporal beamformer classifier was trained on analytic data directly. The result of APR- and AMPR-based classification is a complex number y. In previous works [21, 25, 49], the target with the maximum value of y was selected as the winner. For complex numbers, however, Matlab's maximum compares absolute values, which would allow strongly negative results to outweigh positive ones. In this work, the highest real value was chosen as the winner instead. Note that, whilst the proposed ABT was inspired by spatiotemporal beamforming, it should be distinguished from the decoder based on the latter, not least because the output formats differ substantially.
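A minimal sketch of this winner-selection rule; the variable scores is an assumed name holding the per-target beamformer outputs of a trial:

```matlab
% Minimal sketch: winner selection for the spatiotemporal beamformer
% classifier on APR/AMPR data. scores: 9 x 1 complex beamformer outputs y,
% one per candidate target (assumed name).
[~, winner] = max(real(scores));   % highest real value wins, not abs(y)
```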

2.9.3. Step-wise linear discriminant analysis (SWLDA)

SWLDA is an extension of Fisher's LDA that selects suitable features for the discriminant function [27]. As parameters, we set the maximum number of steps to 'no maximum' (the default), and p = 0.05 and 0.10 for the entry and exit tolerances, respectively, as suggested in the 'stepwisefit' function of Matlab. Note that the SWLDA algorithm was not trained on analytic data directly.
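A minimal sketch of these settings using Matlab's stepwisefit; Xtrain, ytrain and Xtest are assumed variable names:

```matlab
% Minimal sketch (assumed variable names): SWLDA via stepwisefit with the
% entry/exit tolerances mentioned above and no cap on the number of steps.
[b, se, pval, inmodel] = stepwisefit(Xtrain, ytrain, ...
    'PEnter', 0.05, 'PRemove', 0.10, 'MaxIter', Inf, 'Display', 'off');
scores = Xtest(:, inmodel) * b(inmodel);   % discriminant scores for test epochs
```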

2.10. Statistics

A pairwise comparison was performed between the accuracy of the classifier operating on spatiotemporal epochs directly and that of AMR-based, APR-based or AMPR-based classification. We used the paired Wilcoxon signed-rank test to determine whether the observed accuracy differences were significant. Our null hypothesis is that the median of the paired differences is zero. The significance threshold was set at 0.05; after Bonferroni correction for multiple comparisons, the threshold became 0.0167 (0.05/3).
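A minimal sketch of this test using Matlab's signrank; accA and accB are assumed names for the paired accuracy vectors:

```matlab
% Minimal sketch: paired Wilcoxon signed-rank test with Bonferroni correction.
% accA, accB: paired accuracy vectors (one entry per subject/dataset).
p = signrank(accA, accB);          % H0: median of the paired differences is zero
isSignificant = p < 0.05 / 3;      % Bonferroni-corrected threshold, 0.0167
```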

3. Results

In this section, we display the results for all classifiers; the results grouped per classifier can be found in the supplementary material (available at stacks.iop.org/JNE/19/026040/mmedia). In the figures, bright red, blue and green depict the results when decoding with AMR, APR and AMPR, respectively, while dark red, dark blue and dark green represent the results when using spatiotemporal epochs.

3.1. Effect of the transformation on mVEP data

In this section, we investigate the effect of ABT on the data. We show violin plots of the distributions of the spatiotemporal epochs and of AMR, APR and AMPR in supplementary figure 1, and the activation patterns for the spatiotemporal epochs, AMR, APR and AMPR in supplementary figure 2.

First, we will go over some principles of the beamforming technique. When a beamformer is calculated for an activation pattern (which, for mVEP, can be found in supplementary figure 2(A)), the result will be 1 in the case of a perfect match in wave shape and amplitude; if the match is not perfect, the value will be less than 1. However, in the case of brainwaves, where the amplitude of the activation pattern is often many times smaller than that of the individual epoch responses, the resulting magnitude can quickly surpass 1. Similarly, when applying the analytic beamformer to the analytic signal it was trained on, it will yield a complex number with magnitude 1 and phase 0. Note that, ideally, the beamformer works on signals that are time- and phase-locked, which is the case with mVEP data within a dataset; the variation in shape and magnitude across subjects, however, provides a less than ideal working condition, which is why we are obliged to calculate dataset-specific analytic beamformers when optimizing accuracy.

Second, we will discuss the effect of the transformation on the spatiotemporal epochs. The transformation calculates an analytic beamformer for each channel and applies each of these beamformers to every channel. By doing so, we calculate the similarity between the signals in different channels and split it into AMR, obtained by taking the absolute value of the complex number and representing amplitude and shape similarity, and APR, obtained by taking the angle of the complex number and representing the phase difference. The transformation thus creates an abstract representation of the relation between the signals in the channels of a spatiotemporal epoch, removing dataset-specific noise. When we observe the effect of the transformation on the data distributions (supplementary figure 1), we see that the spatiotemporal data shows a large variation centered around 0, primarily ranging between −20 and 20. No differences can be noticed between the distributions of on-target and all data (supplementary figures 1(A) and (G)). Looking at the population-based activation patterns (supplementary figures 2(A) and (B)), the amplitudes of the activation patterns are much lower than those of the individual epochs.

When we observe the distribution of the AMR data (supplementary figures 1(B) and (H)), we first notice that no negative values are present in the data. Second, we notice a smaller variation of the data, as most data points have a magnitude of less than 10. The lower variation can be explained by the nature of the transformation: if the range of amplitudes in a dataset is large, then the amplitudes of the activation pattern will be large as well. Thus, by calculating dataset-specific analytic beamformers and applying them, we remove this source of variation from the data. Again, no difference can be observed between the distributions for on-target and all data. The AMR activation pattern (supplementary figure 2(C)) provides an explanation for the upcoming results, as it appears very difficult to distinguish between on-target (blue) and off-target (red) AMR data. Next, when studying the distribution of the APR data (supplementary figures 1(C), (D), (I) and (J)), we notice two things. First, the real and imaginary values lie exclusively between −1 and 1. This is to be expected, as APR contains the phase information in Euler notation, meaning that the real and imaginary parts are the cosine and sine of the angle, respectively, which by definition lie between −1 and 1. Second, we notice a shift towards positive real values in the on-target epochs compared to the data distribution of all epochs. This is to be expected, as on-target epochs will more closely match the activation pattern, leading to a smaller phase angle, which has a greater, positive real component. Similarly, the distribution of the imaginary APR data becomes more uniformly spread, as an exact phase match is unlikely and the signals can be out of phase with a positive or negative angle. Note that not all channels necessarily contain the mVEP response in every subject (this is subject-specific), which can explain why maximal imaginary components are still present. The APR activation pattern shows that on-target and off-target APR data can be distinguished primarily through a strong real value.

The AMPR data distributions (supplementary figures 1(E), (F), (K) and (L)) show the expected combination of the APR and AMR distributions. The distribution much more closely resembles that of the spatiotemporal epochs, but with a lower variation, which can be explained using the same logic as for the AMR case above. Similarly, we can see that the on-target data is distributed more positively, although less obviously so than for the APR data. When studying the activation pattern, we can see that, by combining the APR and AMR data, the variation has drastically increased, although the activation pattern is still easily distinguishable.

3.2. APR, AMR and AMPR-based decoding performance using individual subject datasets

Figure 4 shows boxplots of the accuracies of the SVM (blue), spatiotemporal beamformer (red) and SWLDA (green) classifiers when using spatiotemporal epochs directly and when using AMR-based (A), APR-based (B) and AMPR-based (C) classification. One or more asterisks indicate the certainty (p-level) with which the null hypothesis was rejected. The p-values of the pairwise Wilcoxon signed-rank test of all classifiers are listed in supplementary table 1.

Figure 4.

Figure 4. Accuracy boxplots of LCMV BF (red), SVM (green) and SWLDA (blue) decoding strategies trained on individual datasets using spatiotemporal epochs directly ((A)–(C), dark) compared to using AMR ((A), light), APR ((B), light) and AMPR ((C), light). Asterisks indicate statistically significant differences in decoding accuracy (p < 0.0167 → *, p < 0.01 → **, p < 0.001 → ***). On each box, the central mark is the median, the edges of the box are the 25th and 75th percentiles, the whiskers extend to the most extreme data points not considered outliers (spanning 1.5 times the interquartile range), and outliers are plotted individually.


From the accuracies shown in figure 4(A), we observe that AMR-based classifiers yield significantly worse performance (see also supplementary table 1(A)) for 1–10 repetitions compared to classifiers operating on spatiotemporal epochs directly. Note that the accuracies in the case of AMR are actually a metric for the similarity between the expected and the observed activation pattern.

From figure 4(B), we observe that APR-based classifiers perform significantly worse for spatiotemporal beamformer classification for 1–5 repetitions, for SVM for 1–6 repetitions and for SWLDA for 1–7 repetitions (see also supplementary table 1(B)), but a noticeable increase in decoding accuracy can be observed for a larger number of repetitions. This indicates that APR data, which captures the phase relation between channels, carries more information than AMR data, but neither contains all the information present in the spatiotemporal epochs, as can be seen when decoding the latter directly.

In figure 4(C), supplementary figures 1(C) and 2(C) we show the performance of AMPR-based classification. We notice that for SVM the accuracies are on par with those using spatiotemporal epochs directly. The AMPR-based decoding resulted in a significant improvement for the LCMV beamformer classifier for 1, 3 and 7 repetitions and a significantly worse performance for SWLDA for eight repetitions (see also supplementary table 1(C)). From this, we conclude that decoding the intention of individuals through AMPR-based decoding can affect the decoding performance positively or negatively depending on the used decoder.

3.3. APR, AMR and AMPR-based decoding performance using datasets of a population of subjects

Figure 5 depicts boxplots of accuracies when decoding datasets in a leave-one-subject-out cross validation setting using population-trained SVM- (blue), spatiotemporal beamformer- (red) and SWLDA (green) classifiers, respectively, when using spatiotemporal epochs directly (A, B, C, dark colors), AMR-based (A), APR-based (B) and AMPR-based (C) classification.

Figure 5.

Figure 5. Accuracy boxplots of LCMV BF (red), SVM (green) and SWLDA (blue) decoding strategies trained on individual datasets using spatiotemporal epochs directly ((A)–(C), dark) compared to using AMR ((A), light), APR ((B), light) and AMPR ((C), light). Asterisks indicate statistically significant differences in decoding accuracy (p < 0.0167 → *, p < 0.01 → **, p < 0.001 → ***). Boxplots follow the same convention as figure 4.


Figure 5(A) shows the performance of AMR-based classification. We notice a significant decrease in decoding accuracy for all algorithms for 1–10 repetitions compared to decoding spatiotemporal epochs directly (supplementary table 2(A)).

When using APR-based population training we notice a strong, significant increase in decoding accuracy for all classifiers (figure 5(B)) yielding significant improvement from 1 to 10 repetitions for all investigated decoders (supplementary table 2(B)) compared to direct spatiotemporal epoch classification.

Finally, the accuracy results of AMPR-based classification can be found in figure 5(C). We observe a significant increase (see also supplementary table 2(C)) in decoding accuracy for the SVM and LCMV BF decoders for 1–10 repetitions, and a significant performance increase for SWLDA for 8 and 9 repetitions.

From these results, we conclude that AMR data contains information relevant to mVEP classification when trained on individual datasets, but has an adverse effect when trained on population data. Indeed, AMPR-based decoding shows a drop in decoding accuracy for population-based training whilst the highest accuracies were achieved when performing APR-based decoding. The latter could indicate that, although a large variability in response shape can be observed in the elicited mVEPs, the phase between the channels remains more stable, making it more suitable for pre-training decoders using population data.

3.4. Compared performance of the classifiers

To complete our investigation, we checked whether one of the chosen decoders benefitted more from one of the data representations. Here we have excluded the comparison with the spatiotemporal epochs as we have already reported on this in previous work [34, 48]. Boxplots of the results can be found in figure 4 for the decoding of individual datasets and figure 5 for population-trained decoders. Results of a paired Wilcoxon sign-rank test between classifiers can be found in supplementary table 3 for individual dataset decoding and supplementary table 4 for population-trained decoding.

For decoding individual datasets, we notice that, when using AMR data, the LCMV beamformer performs significantly worse than both SVM and SWLDA for 1–10 repetitions, whilst SWLDA performs better than SVM for 6 and 8 repetitions. For APR data, LCMV BF outperforms SVM for 1, 2, 6 and 7 repetitions and SWLDA for 1, 2 and 4–8 repetitions, whilst SVM outperforms SWLDA for 3 repetitions. This sudden shift in performance between classifiers is most likely caused by the attempt to classify complex values: whilst the LCMV BF is capable of using the complete complex numbers for training, SVM and SWLDA are not, giving the former an extra advantage in that the weights of complex features remain grouped. In AMPR-based decoding, we notice that LCMV BF and SWLDA outperform SVM for 1 repetition and LCMV BF outperforms SWLDA for 3 repetitions. These results indicate that, although exceptions are present, the performance of SVM and SWLDA is comparable, whilst LCMV BF has the advantage of being able to decode complex numbers. SWLDA appears to extract more from the AMR data, whilst SVM can better model concatenated complex values.

The performance of population-trained decoders tells a different story. First, there is no significant difference in performance between LCMV BF, SVM and SWLDA for AMR decoding. For APR decoding, SWLDA and SVM perform on par, whilst the LCMV BF outperforms SVM for 8 repetitions and SWLDA for 6 repetitions, most likely because of the previously mentioned capability to process complex values. In AMPR decoding, the LCMV BF outperforms SVM for 6 repetitions and SWLDA performs significantly worse than LCMV BF and SVM for 2–10 repetitions. This is in line with our previously reported results, as SWLDA favors AMR data, and AMR data actually causes a drop in performance when added back to the APR data.

3.5. Data size required for data transformation

Since the transformation relies on the calculation of analytic beamformers, the question arises: what is the minimum number of epochs required to correctly estimate the activation pattern and the covariance matrix? Note that only the performance of the spatiotemporal beamformer classifier will be reported, as its training time is by far the lowest [33] of the used classifiers and the effects of the proposed transformation are similar across classifiers.

We performed cross-validation on both the individual and the population level, where the analytic beamformers were calculated using a subset of the epoch data ranging from 9 to 2430 epochs. Subsequently, we transformed all data using the obtained analytic beamformers and compared the results when training on transformed data vs. on spatiotemporal epochs directly. Supplementary table 5 shows the results of the Wilcoxon signed-rank test (significance threshold 0.0167) for AMR-based (A), APR-based (B) and AMPR-based (C) decoding. We observe that AMR-based classification performs significantly worse for 1–10 repetitions, regardless of the number of epochs used in the transformation. This is not surprising since, even when using all data, performance was already significantly worse compared to the case where the classifier was operating on spatiotemporal epochs directly. For APR-based decoding, we observe that, at first, for 9–36 epochs, the classifiers operating on spatiotemporal epochs directly perform significantly better for 1–10 repetitions (supplementary table 5(B)); however, as more epochs are used to estimate the transformation, decoding performance shifts. The two decoders work on par when only 54 epochs are used for the transformation, and the APR-based decoder starts performing significantly better for multiple repetitions (7–10) when 180 epochs are used. When using 450 epochs, the transformation-based classifier significantly outperforms the direct one for 2–10 repetitions, and the difference in accuracy broadens as more epochs are used. As to the AMPR-based case, the two decoders work on par when 180 epochs are used for the transformation (supplementary table 5(C)). At least 450 epochs are required before the AMPR-based decoder significantly outperforms the spatiotemporal epoch-based decoder (1 and 3–10 repetitions), and 540 epochs are required for the transformation to lead to a significant performance increase for 1–10 repetitions.

To complete our accuracy analysis, supplementary table 4 lists the outcome of a Wilcoxon signed-rank test (significance threshold 0.0167) when the transformation is developed using a subset of epochs vs. using all epochs, for AMR-based (A), APR-based (B) and AMPR-based (C) classification. For AMR-based decoding, we notice significantly better classification accuracy for 1–10 repetitions (with some exceptions) when using all epochs, which holds until at least 270 epochs are used for the transformation. After 360 epochs, the two methods appear to perform on par, with a few exceptions where the significance threshold is passed. APR-based decoding appears to require a large number of epochs to reach results that are statistically similar to those obtained when using all epochs: at least 2250 epochs are required before both transformations perform on par, with results still passing the significance threshold for higher numbers of repetitions even when more epochs are considered. AMPR-based decoding shows that 450 epochs are required to reach comparable results between decoders but, again, the significance threshold is occasionally passed when more epochs are used. From these results, it is apparent that the best scenario is reached when a lot of subject-specific data is used, which detracts from the usability of the proposed transformation in this case. To study the relation between the number of epochs and accuracy, we plotted the average accuracy as a function of the number of repetitions and the number of epochs used for the transformation, once varying only the test set (figure 6) and once varying both training and test sets (supplementary figure 3). For AMR, APR and AMPR in both cases, we notice a logarithmic relation between decoding accuracy and the number of epochs used in the transformation, as accuracy is naturally capped at 100%. The steepness of the curve is linked to the number of repetitions, with a higher number of repetitions leading to a sharper rise. When both testing and training data used for AB calculation are gradually increased, compared to only increasing the test set, results are initially lower when few epochs are used, but once approximately 450 epochs from each subject are available, the decoders appear to reach similar results. We conclude that the number of epochs used for the transformation impacts the accuracy of a classifier trained on a low number of epochs, but quickly leads to diminishing returns as the number of epochs increases. Moreover, a fairly low number of epochs suffices to significantly outperform traditional spatiotemporal epoch decoding.

Figure 6.

Figure 6. Heatmaps depicting the average decoding accuracy as a function of the number of repetitions (y-axis) and the number of epochs used for the transformation (x-axis), in the case of an AMR- (A), APR- (B) and AMPR-based (C) spatiotemporal beamformer classifier. These results were obtained through population-based dataset decoding, gradually adding more dataset-specific data to the transformation algorithm for the test data while using all available data of the training datasets to perform the transformation.


4. Discussion

In this work, we described a novel transformation, termed the analytic beamformer transformation (ABT), which, when used to transform motion-onset VEP (mVEP) epochs, separates spatiotemporal epochs into AMR or APR, both of which can be combined through dot multiplication into AMPR. We used these representations to train spatiotemporal LCMV beamformer, SVM and SWLDA classifiers to decode data from individual datasets through fourfold cross-validation and data from a population in a leave-one-subject-out cross-validation setting, and compared the performance with that of the classifiers operating on spatiotemporal epochs directly. Finally, we calculated how many epochs are needed for the transform to perform optimally.

4.1. Effect of the transformation on the data distribution

We have shown the effect of the data transformation on the distribution of the mVEP data. By performing the transformation, we create for AMR a relation between the amplitudes and channels of an epoch and for APR a relation between their phases. This brings to light a drawback of splitting the data in this way: a signal that is the exact opposite of the activation pattern should result in a magnitude of −1, but by taking the absolute value as the magnitude, we lose the crucial sign information. The sign is easily reinstated in AMPR through the multiplication of APR and AMR, since a match in signals results in a small phase difference, which in turn results in a positive real component.

The data distributions provide an explanation as to why individual dataset decoding works best with AMPR data and population dataset decoding works best with APR data. The AMR data contains information about the similarity in shape and amplitude of the spatiotemporal epochs; this information is naturally informative for the decoding algorithms, but is high in variance across datasets. The APR data contains information on how the phases of the signals in different channels relate to each other. If many channels of an epoch exhibit a small phase difference, they appear to contain very similar signals, as can be seen from the activation patterns in supplementary figure 2. This may simply be a more consistent measure across a population compared to the large variation in spatiotemporal signals. Additionally, APR provides a built-in normalization, as the data is naturally limited between −1 and 1 independent of the dataset; this normalization is removed when AMR data is reintroduced in AMPR.

4.2. Positioning with respect to BCI transfer learning

The described transform is an example of the domain adaptation approaches implemented for BCI use. Since beamforming is a filtering technique, it can be compared to the common spatial patterns (CSP) techniques, which have been implemented in previous works primarily for motor imagery paradigms and less for paradigms based on ERP data [51–53], and which have been extended to common spatio-temporal patterns [54]. There is, however, a crucial difference in the assumptions behind these techniques: where CSP assumes there exists a set of linear filters that is invariant across either sessions or subjects, the results described here show that ABT requires session-specific filters, hence the need for session-specific data as reported in section 3.5.

ABT is more similar to techniques that attempt to model the variation between subjects in a common space and discover a transformation of new individuals' data to this space [51]. This suggests an interesting path for future work. The calculation of the analytic beamformers relies on the covariance matrix of the session-specific data and on the activation pattern, of which the covariance matrix calculation requires the bulk of the data. By using techniques proposed in works approximating the covariance matrix [55, 56], we could remove the need for a subject-specific dataset altogether and, as a result, perform the transform requiring only the activation pattern.

Promising results in BCI transfer learning have been achieved using Riemannian geometry and reference EEG signals to transform data from two separate sessions into a common space [51, 57–59]; Riemannian geometry has also been used to improve covariance matrix estimation for beamforming applications [60]. It would be interesting to combine these techniques in future work to remove the need for a dataset-specific covariance matrix for a new user by directly estimating the similarities between the available population datasets and the new user's data. This way, instead of using the entire population data, we could pick a subset of the population whose data is similar to that of the new user and calculate a population decoder that is more finetuned to the new user, thereby removing the requirement of a dataset-specific covariance matrix and severely lowering the amount of data required from that user.

4.3. AMR, APR and AMPR-based decoding performance in decoding individual subject datasets

We trained classifiers using data from individual datasets in a fourfold cross-validation setting, with and without the proposed transformation based on analytic beamformers. AMR-based training turned out not to contain sufficient information (supplementary table 1(A), figure 4(A), supplementary figures 1(A) and 2(A)), as classifier accuracy was significantly lower than when operating on spatiotemporal epochs directly. These results were confirmed in every subsequent population analysis we performed. However, when observing the decoding accuracy of APR-based classification (supplementary table 1(B), figure 4(B), supplementary figures 1(B) and 2(B)), we notice an increase in accuracy compared to AMR-based classification, but all considered linear classifiers required several repetitions before performing on par with classifiers operating on spatiotemporal epochs directly. Finally, AMPR-based decoding resulted in a significantly improved accuracy (supplementary table 1(C), figure 4(C), supplementary figures 1(C) and 2(C)) for the spatiotemporal beamformer classifier, significantly worse accuracy for the SWLDA classifier and equal performance for the SVM classifier. These results show that, when considering individual datasets, AMPR-based decoding retains as much information as the spatiotemporal epochs, but performance appears to be classifier-specific, as no univocal effect on decoding was observed across classifiers. From the comparison between classifiers, we can conclude that LCMV BF has an advantage over SVM and SWLDA because it is capable of decoding complex values directly. The performance difference is most likely due to the SVM and SWLDA being faced with double the number of features per epoch and therefore a more complicated problem to model.

4.4. AMR, APR and AMPR-based decoding performance using population of subject datasets

When a leave-one-subject-out cross-validation strategy was adopted, we noticed a significant increase in decoding accuracy when using APR- and AMPR-based classifiers (supplementary table 2(B), figure 5(B), supplementary figures 3(B) and 4(B); supplementary table 2(C), figure 5(C), supplementary figures 3(C) and 4(C), respectively) compared to a classifier operating on spatiotemporal epochs directly, for 1–10 repetitions. An interesting result was that, whilst on the individual level AMPR-based decoding performed best, on the population level APR-based decoding already provided more stimulation-specific information. This would indicate that, on the population level, AMR actually creates a more challenging linear classification problem. A possible explanation is the large variability in response shape between subjects [33, 44], whilst phase relations between channels remain more stable. Note that AMR decoding (supplementary table 2(A), figure 5(A), supplementary figures 3(A) and 4(A)) performed significantly worse on the population scale than on individual datasets. When comparing decoder performances, SWLDA performed significantly worse in population-trained decoding. A suggested explanation is that SWLDA focusses more on the relative strength of the features, which is more prevalent in AMR data, whilst the more complex SVM can more accurately model the relation between features. The LCMV BF most likely performs better because of its capability to process complex numbers.

4.5. Minimum data required to estimate the transformation

The impact of the number of epochs used to estimate the transformation on decoding performance was investigated for the spatiotemporal beamformer classifier; the other classifiers require a more substantial computational effort to train [33]. As the time required to perform one trial (90 epochs) was 24 s, we provide both the required number of epochs and the corresponding stimulation time (e.g. 450 epochs correspond to 450 × 24/90 = 120 s, i.e. 2 min).

First, the impact of the transformation was assessed in comparison to the spatiotemporal beamformer classifier operating on spatiotemporal epochs directly. In order for APR-based decoding to be on par with spatiotemporal epoch-based decoding, 15 s of stimulation (54 epochs) was required, and significant improvements started to be noticeable from 48 s of stimulation (180 epochs) onwards (supplementary table 3(B)). After 2 min of stimulation (450 epochs), APR-based decoding significantly outperformed spatiotemporal decoding for 1–10 repetitions. For AMPR-based decoding, a minimum of 48 s of stimulation (180 epochs) was required for equal performance, and after 96 s of stimulation (360 epochs), AMPR-based decoding started to outperform spatiotemporal epoch-based decoding. At least 2 min and 24 s of stimulation (540 epochs) was required for AMPR-based decoding to work significantly better than spatiotemporal epoch-based decoding (supplementary table 3(C)). Since AMR-based decoding never outperformed spatiotemporal epoch-based decoding, the influence of the number of epochs is not discussed here but can be found in supplementary table 3(A).

Second, we compared the results of using a limited number of epochs for the transformation vs. using all available epochs. We noticed similar results for AMR-based decoding when 96 s of stimulation (360 epochs) was used in the transformation (supplementary table 4(A)). For APR-based decoding, whilst at least 10 min of stimulation (2250 epochs) was required for the two methods to be on par, diminishing returns in accuracy gains started to be noticeable when using 2 min and 24 s of stimulation (540 epochs) (supplementary table 4(B)); for AMPR-based decoding this was 96 s of stimulation (360 epochs) (supplementary table 4(C)), with 2 min of stimulation (450 epochs) required for the two strategies to perform on par.

Upon investigating the average accuracies plotted as a function of repetitions and epochs used, we could see that a point of diminishing returns is reached around 2 min of stimulation (450 epochs) used in the transformation for AMR-, APR- and AMPR-based decoding (figures 6(A)–(C) and supplementary figures 3(A)–(C), respectively). We therefore suggest that the transformation can be successfully developed using as little as 450 epochs or 2 min of stimulation. This result holds both for a new user dataset and for a new dataset being added to the training data.

Here, an interesting trade-off becomes apparent. Would it be more beneficial to use a lot of data from a small subset of the population of training datasets, or little data from all subjects of the training population? Spatiotemporal epoch decoding would definitely benefit from a subset of training data that elicits similar responses, in order to minimize the variability in the training set. In the case of ABT, given the reported results, it would be more beneficial to use as much data as available from as many subjects as available. Two arguments support this: supplying machine learning algorithms with more data can lead to better models, and the ABs benefit from greater data availability as well.

4.6. Significance and future work

We have shown that ABT can be used to create population-trained mVEP decoders using a limited number of epochs. We suspect this result to pertain to other ERPs or synchronous stimulation paradigms to which the spatiotemporal beamformer has been applied [21], allowing for a more effective, population-based training of visual BCI applications. Similarly, the results seem to imply that this method may be used to transform subject-specific datasets into a space where data can be compared more directly, which in turn would allow for an easier performance comparison between subjects. Another research direction would be to investigate the relation between phase and magnitude epochs [61] in view of more complex, non-linear decoding algorithms. Additionally, it would be interesting to investigate AMR- and APR-based decoding for different age groups of healthy subjects and in neurology patients with a progressive disorder or traumatic brain injury. The transformation tackles several of the current challenges in EEG-based BCI [62]: population-trained decoders would remove long and tedious training sessions, and the transformation provides a new angle on the decoding of brain patterns recorded with EEG.

In future work, we aim to investigate the proposed transform on different ERPs and BCI paradigms.

Acknowledgments

We thank the Flemish Supercomputer Center (VSC) and the High-Performance computing (HPC) center of the KU Leuven for allowing us to run our code on their systems (vsc32985), project llonpp.

Data availability statement

The data generated and/or analysed during the current study are not publicly available for legal/ethical reasons but are available from the corresponding author on reasonable request.

Funding

AL is supported by the Belgian Fund for Scientific Research—Flanders (FWO 1SC3419N).

AVDK is supported by the special research fund of the KU Leuven (GPUDL/20/031).

MMVH is supported by research grants received from the European Union's Horizon 2020 research and innovation programme under Grant Agreement No. 857375, the special research fund of the KULeuven (C24/18/098), the Belgian Fund for Scientific Research—Flanders (G0A4118N, G0A4321N, G0C1522N), and the Hercules Foundation (AKUL 043).

Institutional review board statement

The study was conducted according to the guidelines of the Declaration of Helsinki, and approved by the Ethical committee of KU Leuven's university hospital (UZ Leuven) (S62547 approved 11 June 2019).

Informed consent statement

Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patient(s) to publish this paper.

Conflict of interest

The authors declare no conflict of interest.
