Introducing Block-Toeplitz Covariance Matrices to Remaster Linear Discriminant Analysis for Event-related Potential Brain-computer Interfaces

Covariance matrices of noisy multichannel electroencephalogram time series data are hard to estimate due to high dimensionality. In brain-computer interfaces (BCI) based on event-related potentials and a linear discriminant analysis (LDA) for classification, the state of the art to address this problem is by shrinkage regularization. We propose a novel idea to tackle this problem by enforcing a block-Toeplitz structure for the covariance matrix of the LDA, which implements an assumption of signal stationarity in short time windows for each channel. On data of 213 subjects collected under 13 event-related potential BCI protocols, the resulting 'ToeplitzLDA' significantly increases the binary classification performance compared to shrinkage regularized LDA (up to 6 AUC points) and Riemannian classification approaches (up to 2 AUC points). This translates to greatly improved application level performances, as exemplified on data recorded during an unsupervised visual speller application, where spelling errors could be reduced by 81% on average for 25 subjects. Aside from lower memory and time complexity for LDA training, ToeplitzLDA proved to be almost invariant even to a twenty-fold time dimensionality enlargement, which reduces the need of expert knowledge regarding feature extraction.


Introduction
A brain-computer interface (BCI) enables the use of brain activity as an input for a computer program. Example applications are spelling programs (Farwell & Donchin, 1988), wheelchair control (Fernández-Rodríguez et al., 2016) or even rehabilitation training for neurological diseases, e.g., stroke (Ang & Guan, 2013) or attention deficit hyperactivity disorder (Lim et al., 2012). As brain activity differs for each individual, modern BCIs make use of machine learning methods to adapt to each user. To record brain activity, the electroencephalogram (EEG) is a popular choice, as it is noninvasive, inexpensive and almost anyone can use it (Nunez, 2012). The decoding of EEG signals using machine learning methods is challenged by the very low signal-to-noise ratio of the EEG and by the fact that the characteristics of the EEG signal ideally should be learned from very little individual training data. One approach to solve this problem is to use transfer learning (Jayaram et al., 2016). However, this can be hard to realize as the EEG shows large differences between subjects and even between sessions of the same subject. Another approach is to make classifiers more data efficient, i.e., by better exploiting the available data (Gruenwald et al., 2019; Sosulski et al., 2021).
One commonly used type of brain activity for BCIs is the event-related potential (ERP), a transient potential as a response to a specific stimulus, e.g., a visual emphasis of letters on a screen (Farwell & Donchin, 1988; Lin et al., 2018) or playing a tone or word (Schreuder et al., 2010; Höhne et al., 2012). Here the BCI user is presented with multiple stimuli while the BCI exploits the fact that attended target stimuli produce different ERPs than ignored non-target ones (Sellers et al., 2012). By classifying the ERP responses, the target stimulus, e.g., the attended letter on a screen, can be detected. Note that the amplitude of an ERP is often more than ten times smaller than that of the ongoing background activity in the EEG.
For classification of ERP components, linear discriminant analysis (LDA) is often used, as it is computationally cheap during training and inference and still delivers state of the art performance (Lotte et al., 2007; 2018). LDA requires estimates of target and non-target class means and of the feature distribution expressed by covariances. As BCI problems often come with few training data points, the difficult estimation of the potentially high-dimensional covariance matrix is typically regularized by shrinking (Ledoit & Wolf, 2004; Zhao et al., 2017; Blankertz et al., 2011).
ERPs are analyzed by extracting epochs, i.e., short time windows aligned to the onset of each stimulus. While EEG generally exhibits non-stationarity over longer time scales, a weak stationarity assumption, i.e., that mean and autocovariance do not depend on time, is reasonable (Cohen & Sances, 1977) for short (few seconds) epochs of high-pass filtered EEG. While an ERP response can be seen as a mean shift non-stationarity, it is removed with other mean information prior to estimating the covariance matrix. For data generated from a stationary process, the estimation of the covariance matrix becomes much easier (Xiao et al., 2012), especially in high-dimensional settings (Chen et al., 2013). Another popular method to deal with time series covariance estimation is tapering (Furrer & Bengtsson, 2007; Pourahmadi, 2013), where the entries of the covariance matrix are regularized towards zero with increasing temporal distance.
We propose to make LDA training for spatiotemporal ERP features more efficient by enforcing these assumptions. We hypothesize, and later show in ERP data and classification results, that imposing a block-Toeplitz structure on the covariance matrix of the background activity of the EEG in ERP epochs can greatly enhance the classification performance of LDA.

Benchmark Datasets
We use datasets with ERPs evoked by stimulus modalities based on either words, tones, or visual stimuli. The signal-to-noise ratio (the ERP amplitude) depends, among other factors, on this stimulus modality. In the context of LDA, class means are provided by the average target/non-target ERPs, whereas background activity and other noise are characterized by the covariance matrix.
For a general overview of all used datasets please refer to Table 1, while dataset details are provided in Appendix A. We used MOABB (Mother of all BCI Benchmarks) (Jayaram & Barachant, 2018) to access publicly available datasets, enriched by private datasets recorded in our lab.

Classification Pipeline
While multiple sessions were recorded for some subjects, we are mainly interested in within-session performance, i.e., training and applying a classifier to data of the same session. Data of a session is split exclusively at boundaries of continuous runs, into a training set $D_{tr}$ and a validation set $D_{val}$, with $|D_{tr}| \approx |D_{val}|$. The continuous runs were bandpass filtered to a typical ERP frequency range of [0.5, 16] Hz. We then obtained feature matrices $X$ by windowing epochs relative to the onset of all stimuli and resampling the epochs to 40 Hz. A typical ERP pipeline predefines time intervals in which the signal is averaged. Thus we evaluate different ERP intervals and report the one performing best across all datasets. In addition, we evaluate a setting where every single time sample in a specific time window is used. For the complete list of used hyperparameters see Appendix B.

Classification with Linear Discriminant Analysis
In a binary classification task, the Fisher linear discriminant analysis (LDA) requires two statistical measures of the data (Bishop, 2006): first, the within-class covariance matrix $\Sigma$, and second, the class means $\mu_1$ and $\mu_2$. With these, we can calculate the weight vector $w = \Sigma^{-1}(\mu_2 - \mu_1)$ and the bias $b = -\tfrac{1}{2} w^T(\mu_1 + \mu_2)$. For high-dimensional features, shrinkage regularized covariance matrices (Ledoit & Wolf, 2004; Blankertz et al., 2011) are used in practice to improve conditioning.
While generally a bias adaptation is required for unbalanced classes, this is unnecessary for most ERP-based BCI applications, as the stimulus set contains (besides multiple non-targets) exactly one target stimulus. Assuming positive target labels, the target can safely be determined as the stimulus with the highest classifier output. This may explain the popularity of the area under the receiver operating characteristic curve (AUC) as a metric in the realm of BCI, as the bias does not impact the AUC metric.
The entries of an epoch's feature vector are denoted by $x^{t_n}_{c_m}$, the signal at the $m$-th channel and the $n$-th time point of that epoch.

Enforcing
In this paper we use the form $x$, where first all channel features are stacked, and refer to it as being channel-prime, whereas we refer to ${}^{t}x$ as time-prime. This choice affects the structure of the covariance matrix $\Sigma$: Stacking each feature vector of all $N_e$ epochs, we finally obtain our data $X \in \mathbb{R}^{N_c N_t \times N_e}$. Using the centered data $\tilde{X}$, we can estimate the sample covariance matrix as $\frac{1}{N_e - 1} \tilde{X} \tilde{X}^T$.
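When using feature vectors $x$ that are channel-prime, we can construct the channel-prime covariance matrix $\Sigma$ as a block matrix consisting of multiple blocks of the form
$$(C_{i,j})_{k,l} = \operatorname{cov}\left(x^{t_i}_{c_k}, x^{t_j}_{c_l}\right), \quad k, l = 1, \dots, N_c,$$
where block $C_{i,j} \in \mathbb{R}^{N_c \times N_c}$ describes the channel-wise (cross-)covariance matrix between the $i$-th and $j$-th time point. Note that in this work we do not enforce any assumptions about the spatial covariance between channels and simply use the empirical values. With this, we can write the complete channel-prime covariance matrix as
$$\Sigma = \begin{pmatrix} C_{1,1} & C_{1,2} & \cdots & C_{1,N_t} \\ C_{2,1} & C_{2,2} & \cdots & C_{2,N_t} \\ \vdots & \vdots & \ddots & \vdots \\ C_{N_t,1} & C_{N_t,2} & \cdots & C_{N_t,N_t} \end{pmatrix},$$
which describes the full relationship of the features across both the temporal and spatial domain.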
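The channel-prime stacking and the block layout of the covariance matrix can be sketched in NumPy as follows. The array layout `(n_epochs, n_channels, n_times)` and all function names are assumptions made for this illustration, not part of the published package.

```python
import numpy as np

def channel_prime_features(epochs):
    """Stack epochs of shape (n_epochs, n_channels, n_times) into
    X of shape (n_channels * n_times, n_epochs). All channels of one
    time point are contiguous: row index = t * n_channels + c."""
    n_epochs, n_channels, n_times = epochs.shape
    return epochs.transpose(0, 2, 1).reshape(n_epochs, n_times * n_channels).T

def sample_covariance(X):
    """Sample covariance of X (features x epochs) after centering."""
    Xc = X - X.mean(axis=1, keepdims=True)
    return Xc @ Xc.T / (X.shape[1] - 1)

def block(sigma, i, j, n_channels):
    """Return the (n_channels x n_channels) block C_{i,j} (0-based i, j)."""
    rows = slice(i * n_channels, (i + 1) * n_channels)
    cols = slice(j * n_channels, (j + 1) * n_channels)
    return sigma[rows, cols]

# Toy example: 50 epochs, 3 channels, 5 time points.
rng = np.random.default_rng(1)
epochs = rng.normal(size=(50, 3, 5))
X = channel_prime_features(epochs)
sigma = sample_covariance(X)
C_13 = block(sigma, 1, 3, n_channels=3)  # covariance between time points 1 and 3
```

Because the full matrix is symmetric, `block(sigma, j, i, n_channels)` is the transpose of `block(sigma, i, j, n_channels)`.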

ENFORCING STATIONARITY
We now want to incorporate two assumptions for ERP data:
A1 Only the ERP signal is time locked, whereas the EEG background activity is generated from a stationary process (Xiao et al., 2012). As a result, the covariance across the time domain depends only on the temporal distance of two samples, i.e., $\operatorname{cov}(x^{t_j}, x^{t_i}) = \operatorname{cov}(x^{t_j + \delta}, x^{t_i + \delta}) \; \forall \delta \in \mathbb{R}$.
A2 The covariance between two samples decreases with increasing temporal distance.
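Both assumptions only concern the temporal, but not the spatial domain. To incorporate assumption A1 of stationarity into the channel-prime covariance matrix, i.e., to let each block $C_{i,j}$ depend only on the temporal distance between the $i$-th and $j$-th time point, we introduce the notation $C_{\Delta t = d}$, where $d = j - i$ is the signed temporal distance. Now we can write the stationary covariance matrix as
$$\Sigma_{\text{stat}} = \begin{pmatrix} C_{\Delta t = 0} & C_{\Delta t = 1} & \cdots & C_{\Delta t = N_t - 1} \\ C_{\Delta t = -1} & C_{\Delta t = 0} & \cdots & C_{\Delta t = N_t - 2} \\ \vdots & \vdots & \ddots & \vdots \\ C_{\Delta t = -(N_t - 1)} & C_{\Delta t = -(N_t - 2)} & \cdots & C_{\Delta t = 0} \end{pmatrix},$$
which has a block-Toeplitz structure. Note that it is fully defined by its first block-row or block-column, as the block-diagonals of $\Sigma_{\text{stat}}$ are equal and as $C^T_{\Delta t = -d} = C_{\Delta t = d}$ holds due to symmetry.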
The advantage of representing the covariance matrix in this way is that the number of free parameters that need to be estimated decreases from $(N_t N_c)(N_t N_c + 1)/2$, which scales quadratically in $N_t$, to only $N_c(N_c + 1)/2 + (N_t - 1) N_c^2$, which scales linearly in $N_t$. Aside from fewer free parameters, using block-Toeplitz covariance matrices reduces memory usage, even if many temporal features are used.
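To make the difference between the two parameter counts tangible, the short sketch below evaluates both formulas. The function names are hypothetical; the first example uses the dimensions of Figure 1 (three channels, five time points), the second uses arbitrary illustrative values.

```python
def n_free_params_full(n_t, n_c):
    """Free parameters of a general symmetric (n_t * n_c)-dimensional covariance."""
    d = n_t * n_c
    return d * (d + 1) // 2

def n_free_params_block_toeplitz(n_t, n_c):
    """One symmetric block for distance 0 plus (n_t - 1) general blocks."""
    return n_c * (n_c + 1) // 2 + (n_t - 1) * n_c * n_c

# Dimensions of Figure 1: 3 channels, 5 time points.
print(n_free_params_full(5, 3))            # 120
print(n_free_params_block_toeplitz(5, 3))  # 42

# Illustrative larger setting: 32 channels, 20 time points.
print(n_free_params_full(20, 32))            # 205120
print(n_free_params_block_toeplitz(20, 32))  # 19984
```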
For block-Toeplitz matrices, computationally efficient algorithms for inversion or solving linear systems exist. We sped up the solving of block-Toeplitz linear systems by using Levinson-Durbin recursion (Golub & Van Loan, 2013; Arushanian et al., 1983), which has a complexity of $O(N_t^2)$ in the time dimension. Note that so-called superfast algorithms also exist (Ammar & Gragg, 1988), which further reduce the complexity to $O(N_t \log N_t)$ and are faster in practice for $N_t > 256$.
A general spatiotemporal covariance matrix $\Sigma$ for the background activity in a visual ERP task is shown in channel-prime form in Figure 1. To construct its stationary block-Toeplitz form, blocks sharing the same color are averaged, resulting in more robust block estimates per block-diagonal. Depending on the block-diagonal, only $N_t - d$ block matrices are available for averaging. Note that if the number of epochs $N_e$ is small, this transformation can lead to matrices that are not positive definite any more, an issue which is resolved by taking the second assumption A2 into account.
To incorporate A2, we introduce a monotonically decreasing tapering function (Xiao et al., 2012) $h(d)$ of the temporal distance $d$. Using the linear taper $h(d) = 1 - |d|/N_t$, a tapered channel-prime covariance matrix is obtained by scaling each block, $C_{i,j} \mapsto h(j - i)\, C_{i,j}$. Combining both assumptions, we finally obtain
$$\Sigma_{\text{stat,tap}} = \big[\, h(j - i)\, C_{\Delta t = j - i} \,\big]_{i,j = 1}^{N_t},$$
which is a channel-prime block-Toeplitz covariance matrix with both assumptions incorporated. Note that by averaging the block-diagonals and then applying the linear tapering function, the resulting block matrices are the sum along the block-diagonal with an additional scaling by $1/N_t$.
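The two steps described above, averaging along the block-diagonals and applying the linear taper $1 - |d|/N_t$, can be sketched in NumPy as below. This is an illustrative reimplementation under the assumption of a channel-prime input matrix; the function name and signature are hypothetical and do not mirror the API of the published package.

```python
import numpy as np

def block_toeplitz_taper(sigma, n_channels, n_times, taper=True):
    """Average the blocks on every block-diagonal of a channel-prime
    covariance matrix and optionally scale them by 1 - |d| / n_times."""
    nc, nt = n_channels, n_times
    # blocks[i, j] is the (nc x nc) block C_{i,j}.
    blocks = sigma.reshape(nt, nc, nt, nc).transpose(0, 2, 1, 3)
    out = np.empty_like(blocks)
    for d in range(-(nt - 1), nt):
        # Indices (i, j) of all blocks with signed distance j - i == d.
        idx_i = np.arange(max(0, -d), min(nt, nt - d))
        idx_j = idx_i + d
        mean_block = blocks[idx_i, idx_j].mean(axis=0)
        if taper:
            mean_block = mean_block * (1.0 - abs(d) / nt)
        out[idx_i, idx_j] = mean_block
    return out.transpose(0, 2, 1, 3).reshape(nt * nc, nt * nc)

# Toy example with fewer epochs (8) than features (15): the sample
# covariance is singular, the transformed matrix stays positive semidefinite.
rng = np.random.default_rng(2)
n_c, n_t = 3, 5
X = rng.normal(size=(n_c * n_t, 8))
Xc = X - X.mean(axis=1, keepdims=True)
sigma = Xc @ Xc.T / (X.shape[1] - 1)
sigma_bt = block_toeplitz_taper(sigma, n_c, n_t)
min_eig = np.linalg.eigvalsh(sigma_bt).min()
```

In line with the note above, each tapered block equals the sum of the original blocks on its block-diagonal divided by `n_times`.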
To facilitate this process in software, we published an open source package that builds on NumPy (Harris et al., 2020) and exposes all methods needed for working with block matrices with two domains to implement our approach. As shrinkage regularization improves matrix conditioning and helps with solving the linear system, we apply our assumptions on top of a shrinkage regularized covariance matrix and term the resulting LDA 'ToeplitzLDA'.

Benchmark: Binary Classification of ERP Epochs
Across the datasets in Section 2.1 we apply ToeplitzLDA and compare it with an LDA that uses only shrinkage regularization (sLDA). Additionally, we use the publicly available implementation of time-decoupled LDA (TDLDA) (Sosulski et al., 2021), a different approach to improve the covariance estimate, and a logistic regression classifier making use of Riemannian geometry (Barachant & Congedo, 2014) for comparison. In contrast to an LDA classifier, the latter augments each ERP epoch with class templates and calculates a feature covariance matrix for every epoch, which is classified using logistic regression in the tangent space (Kolkhorst et al., 2018). Note that the covariance matrices we talk about in this paper are very different, as they represent EEG background activity, artifacts and other noise instead of features.
In most datasets six stimuli form a group, e.g., visual spellers with six rows/columns or auditory oddball paradigms with 1:5 target/non-target ratios. Therefore we use [6, 12, 24, 48, 96, 192, 384] epochs as sizes for the training subsets (if supported by the size of the dataset) and draw them randomly seven times. An additional setting uses all epochs provided in the training set, the number of which varies between 300 and 7782, depending on the dataset. On each subset, all classifiers are trained and then evaluated on $D_{val}$. We report average AUC performance for each subset size and each subject in every dataset, i.e., classification results are averaged across the randomly drawn subsets and (if applicable) across multiple sessions of a subject.
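For readers unfamiliar with the metric, the AUC can be computed directly as the probability that a target epoch receives a higher classifier output than a non-target epoch. The pure-Python sketch below uses made-up scores and a hypothetical function name; it also illustrates why an additive bias leaves the AUC unchanged, since only the ranking of the outputs matters.

```python
def auc(scores, labels):
    """Rank-based AUC: fraction of (target, non-target) pairs in which the
    target has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Made-up classifier outputs for six epochs, two of them targets (label 1).
scores = [0.9, 0.2, 0.4, 0.1, 0.35, 0.8]
labels = [1, 0, 0, 0, 1, 0]

auc_raw = auc(scores, labels)                          # 0.75
auc_shifted = auc([s - 5.0 for s in scores], labels)   # same ranking, same AUC
```

The quadratic pair loop is chosen for clarity only; for large validation sets a sort-based computation would be preferable.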
Many BCIs use an aggregate of binary classification outputs obtained for several stimuli to solve a multi-class problem, e.g., in a visual ERP speller application the subject can choose between many different letters/symbols. Due to this aggregation, even modest improvements in binary classification performance can lead to strong improvements of the application-level performance.

Improving a Visual ERP Speller Application
Most BCI setups consist of two phases: In the calibration phase, labeled data is recorded, e.g., by instructing the user which symbol to attend to. In the following productive/online phase, the user can actually operate the application according to their intention with the support of the trained classifier. The usability of a BCI improves with shorter calibration and with quicker and more reliable detection of the user's intention, e.g., the desired symbol. Unsupervised classifiers have been proposed that build upon either expectation maximization (Kindermans et al., 2012), learning from label proportions (Quadrianto et al., 2009; Hübner et al., 2017), which makes use of known label proportions in the stimuli to estimate the class means, or a combination thereof (Hübner et al., 2018). In visual speller BCIs, their use made it possible to eliminate the calibration phase entirely.
To evaluate the benefit of ToeplitzLDA in an actual BCI application, we used both the V_LLP_A and the V_LLP_B datasets, which had been recorded under a modified visual ERP paradigm that made learning from label proportions possible.
In these datasets, the subjects were instructed what to spell; however, the underlying LDA classifier did not receive this information but was trained in an unsupervised manner. One trial corresponds to the spelling of one letter and consisted of 68 stimuli. The interval between the onset of two stimuli was 250 ms such that one trial took approximately 20 seconds.
In our offline replay, we re-train the LDA classifier after each trial on the data of all past trials and, as we use an unsupervised classifier, also the current trial. Only then is the updated classifier employed to predict the letter of the current trial by summing the classifier outputs separately for all possible letters and selecting the one with the maximal value.
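The aggregation step, summing the classifier outputs separately for every candidate letter and picking the maximum, can be sketched as follows. The representation of a stimulus as the set of letters it highlights, the toy alphabet and the function name are assumptions for this example and do not reflect the actual stimulus sequences of the LLP datasets.

```python
def select_letter(stimulus_letters, outputs, alphabet):
    """Sum the classifier output of each stimulus onto all letters it
    highlighted and return the letter with the largest total."""
    totals = {letter: 0.0 for letter in alphabet}
    for letters, out in zip(stimulus_letters, outputs):
        for letter in letters:
            totals[letter] += out
    return max(totals, key=totals.get)

# Toy trial: four stimuli over a four-letter alphabet.
alphabet = "ABCD"
stimulus_letters = ["AB", "CD", "AC", "BD"]  # letters highlighted per stimulus
outputs = [1.2, -0.8, 0.9, -0.5]             # made-up classifier outputs
predicted = select_letter(stimulus_letters, outputs, alphabet)  # 'A'
```

Because many stimulus outputs are summed per letter, moderately noisy single-epoch decisions can still yield a reliable letter selection.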
In each of three blocks, a subject repeated the spelling of a sentence while the classifier was trained from scratch each block. Note that in V_LLP_A a 63 letter sentence was spelled per block, whereas in V_LLP_B, only a 35 letter sentence was spelled per block.
While the class means can be estimated using learning from label proportions, we cannot calculate the within-class covariance matrix, as label information would be necessary for this. However, as shown by Hübner (2020) for two-class problems, the global covariance, which does not require label information, can be used instead. Hübner showed that the resulting weight vector defining LDA's projection has a different scaling but points in the same direction.
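The statement about the unchanged direction can be retraced with a short derivation. The following is a sketch under the assumption of a two-class mixture with class proportions $p_1 + p_2 = 1$ and a shared within-class covariance $\Sigma_w$; the symbols $p_1$, $p_2$, $\Sigma_w$, $\Sigma_g$ and $\Delta$ are introduced here for illustration and are not taken from the cited work.

```latex
% Global covariance of a two-class mixture, with \Delta = \mu_2 - \mu_1:
\Sigma_g = \Sigma_w + p_1 p_2 \, \Delta \Delta^T

% Sherman-Morrison formula applied to the rank-one update:
\Sigma_g^{-1} \Delta
  = \Sigma_w^{-1} \Delta
    - \frac{p_1 p_2 \, \Sigma_w^{-1} \Delta \, \Delta^T \Sigma_w^{-1} \Delta}
           {1 + p_1 p_2 \, \Delta^T \Sigma_w^{-1} \Delta}
  = \frac{1}{1 + p_1 p_2 \, \Delta^T \Sigma_w^{-1} \Delta} \; \Sigma_w^{-1} \Delta
```

The scalar prefactor is positive, so the weight vector obtained from the global covariance is a shortened copy of the one obtained from the within-class covariance.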
As the authors of the original publication used an sLDA classifier, we will compare the spelling performance of the sLDA with our ToeplitzLDA classifier. To reproduce the results as closely as possible, we use the same bandpass filter ([0.5, 8] Hz) as in the original publication. Regarding feature selection, the authors hand-picked $N_t = 6$ differently sized time intervals and averaged the measurements inside the intervals of each epoch before passing it to the classifier. Additionally, the authors corrected for baseline drifts, a frequently used technique for ERP analysis, by subtracting the average value of the baseline interval [−0.2, 0] s from each epoch, and omitted two artifact-prone frontal EEG channels. Even though baseline correction and using differently sized time interval averages violate the assumptions we make in Section 2.4.3, we still evaluate the performance of ToeplitzLDA in this setting as a robustness check.
To evaluate performance when our assumptions are not violated, we also evaluate a very basic feature selection setting, i.e., keeping all channels, applying no baseline correction and using all time samples in [0.05, 0.70) s relative to the stimulus onset. In this setting, the number of time features $N_t$ is determined by the sampling rate. While a sampling rate of 20 Hz should reasonably contain all information for 8 Hz low-pass filtered data, we additionally evaluate the sampling rates 40, 100 and 200 Hz. While these variants should not add new information, they shall reveal the impact of increasing time dimensionality on both sLDA and ToeplitzLDA. An implementation of ToeplitzLDA and the code to reproduce our results on the unsupervised speller paradigm can be found in our code repository.

Block-Toeplitz Structure in Visual ERPs
The temporal covariance within three different channels, estimated during a visual paradigm, is depicted in Figure 2.
If the data were perfectly stationary, the temporal covariance curves of each row would overlap. This is approximately the case for channels F3 and Cz, but less so for channel O2. Our method imposes the stationarity assumption on the covariance matrix, which corresponds to averaging the different temporal covariance curves per channel.
[...] for which the best possible class mean information has been calculated from the whole training set and only the covariance matrix was estimated from smaller subsets of varying sizes. Generally, reported averages were obtained by first averaging all sessions of a subject, and then averaging across all subjects. Note that a subset size of 384 was supported by only 10 out of 13 datasets.

Improved Binary Classification Performance on the ERP Benchmark
For all LDA variants, using an ERP interval of [0.1, 0.6) s yielded the best performance on average across all datasets when using the complete training data. The hand-picked intervals degrade in performance when they have to be generalized over various different datasets. The Riemannian classification method performed best when applied upon six xDAWN components and when both target and non-target ERP templates were used.
In Figure 3, the average AUC across all subjects and all datasets is shown. For moderately sized training data subsets, TDLDA is close to ToeplitzLDA; however, when training data grows, ToeplitzLDA has an advantage. Note that the Riemannian classifier performs markedly better than [...]
The difference between ToeplitzLDA and sLDA is even more pronounced when considering the hypothetical case where the class mean estimates are known a priori, but the covariance is not. While this setting is mostly of theoretic interest, it emphasizes that even with a very accurate mean estimation, the performance of LDA classifiers depends strongly on the estimation of the covariance matrix. In this setting, sLDA needs between four and eight times as much data to reach the AUC of the ToeplitzLDA.
Finally, in Figure 4 the influence of each assumption upon the classification performance is shown for a hypothetical setting where the mean information is known. Applying A2 only, i.e., a linear tapering on the covariance matrix, already leads to better performances compared to sLDA. In contrast, enforcing time stationarity only (A1) by using the averaging method decreases performance markedly. This can be explained by the different number of epochs available for very small time distances compared to very large time distances (see Section 2.4.3). Combining both assumptions eliminates the issue with non-positive definiteness of the transformed matrices and achieves the best classification performances.

Figure 5: Ratio of correctly spelled letters (all subjects, LLP datasets) for ToeplitzLDA and sLDA. For $N_t = 6$ the preprocessing violated two assumptions of ToeplitzLDA. Feature dimensionality is the number of time samples multiplied by the number of channels. The best median performance for sLDA is 0.895, obtained using $N_t = 6$. ToeplitzLDA reaches a median performance of 0.989 for $N_t = 27$.

Effect on an Unsupervised Visual ERP Speller Application
In contrast to the binary classification benchmark reported so far, classifiers in the unsupervised ERP speller setting had to cope with approximate class means (provided by learning from label proportions) and the global covariance matrix, as labeled data was not made available.
The results of applying both LDA classifiers to the data of 25 subjects are shown in Figure 5. Spelling errors were reduced on average by 81 % across all 25 subjects (see Appendix D). Interestingly, even when assumptions are explicitly violated in the setting $N_t = 6$, the ToeplitzLDA classifier performs better than sLDA. For larger feature dimensions, performance of sLDA degrades quickly, whereas this effect is minimal for ToeplitzLDA. Note the almost stable performance of ToeplitzLDA for thousands of features.
As shown in Figure 6, the violation of the assumptions of ToeplitzLDA under a typical preprocessing for $N_t = 6$ impacts performance only when more data is available. These results also emphasize that sLDA requires careful consideration of which time interval to use. Interestingly, the performance advantage of ToeplitzLDA is more pronounced on this LLP data compared to the benchmark, where the true class labels were used. To reach 0.80, 0.85 and 0.90 AUC, ToeplitzLDA needs approximately 2, 5 and 8 letters, respectively, whereas sLDA requires 7, 16 and 33 letters.

Discussion
Making use of domain specific assumptions for ERP-based BCI improves the performance of an sLDA classifier by up to 6 AUC points across 213 subjects from 13 different ERP datasets. The proposed ToeplitzLDA additionally obtains better AUC values compared to other state of the art classification methods in BCI. This shows that the relatively simple but still popular (Lotte et al., 2018) LDA classifier can be brought up to speed by using this novel covariance estimation. Reducing the free parameter count makes the covariance estimation more efficient, which may explain ToeplitzLDA's performance benefits especially on small data sets. As improvements were observed even for large training data, the enforced assumptions may also prevent overfitting to artifacts, which otherwise cause seemingly non-stationary covariance matrix estimates. While the linear tapering is already successful, other tapering methods should be evaluated in future work.
In an unsupervised visual speller BCI application, the ToeplitzLDA reduces the median spelling error from 10.5 % to 1.1 % compared to the originally used sLDA. Aside from the obvious benefit of making fewer errors, BCI users perceive less frustration (Hougaard et al., 2021) when the error rate is small, and motivated engagement with the BCI may improve signal quality (Kleih et al., 2010). As our method reduces errors especially in the first few minutes of usage, the BCI user faces a productive system from the start.
By enforcing the block-Toeplitz structure, we also reduce time and memory complexity of LDA training. This is especially useful in applications requiring frequent re-trainings like the unsupervised visual speller, which estimates a new weight vector after every spelled letter. Note that in our implementation we still store the full covariance matrix to facilitate the averaging process of the block-diagonals; however, this could be avoided by calculating incremental updates of the block-matrices.
Using a structured covariance matrix has been explored earlier in the domain of EEG signal modeling and classification. This comprises modeling the EEG's spatiotemporal covariance as a Kronecker product of parameterized temporal and spatial covariance matrices (Huizenga et al., 2002) or applying the Kronecker assumption on the generating sources (Hashemi et al., 2021), which improves source estimation compared to using an ordinarily estimated covariance. Estimating respective temporal and spatial covariance matrices from the data, but shrinking them towards different targets (Beltrachini et al., 2013), led to a reduced distance between estimated and true covariance matrix if the power values of EEG channels are similar. Also using a Kronecker product structure for the covariance matrix of a linear discriminant analysis, Gonzalez-Navarro et al. (2017) report an increased performance for 12 subjects and small training data sets. However, the authors used a large number of time samples ($N_t = 64$), which is a difficult and unusual feature setting for an sLDA (see Section 3.3). A Kronecker product structured covariance in the sensor space is most suitable if (up to scaling) a single temporal covariance curve matches the covariance observed within each channel. Given the observations in Figure 2, this seems too restrictive for ERP classification with LDA, whereas our proposed block-Toeplitz structure is very close to the data. We also found that slight violations of the stationarity assumptions, i.e., by averaging non-regular ERP intervals or baseline correction, seem to be detrimental only when data of at least 10 letters (680 epochs) has been recorded. This may indicate why a Kronecker product structured covariance can still be successful with fewer training epochs.
Currently, many ERP-based BCIs still average ERP signals in specific time intervals to reduce time dimensionality and thus to make sLDA feasible. If the intervals are chosen well, this approach works. Knowing which intervals to use, however, requires expert knowledge, and optimal intervals may vary between sessions, subjects and paradigms. Using ToeplitzLDA makes this step obsolete, as every time sample in a typical ERP interval can and should be used. Practically, we found our method performs well even when artifacts or time-locked noise violate the stationarity assumption. However, employing automatic artifact removal techniques (Islam et al., 2016) may align the signal even better with our assumptions and should be evaluated in further work.

Conclusion
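We proposed to exploit inherent structure in ERP electroencephalogram data to greatly improve the classification performance of LDA across many different ERP datasets. In an unsupervised visual speller BCI, our method reduces the error rate by 81 % and could even allow seven subjects to spell perfectly starting from the very first letter. Additionally, using the proposed ToeplitzLDA lowers the complexity of LDA training considerably, reducing the needed computational resources.

As the optimal time intervals are often very paradigm specific, we evaluate two additional feature settings, where we simply use every sample (at 40 Hz sampling rate) in the following two intervals.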
The Riemannian geometry based classifier we used required an xDAWN decomposition (Rivet et al., 2009) to enhance the ERP response and to reduce dimensionality. We evaluated each of {1, 3, 6, 9} as the number of used xDAWN components.
The Riemannian classifier can be trained using additional templates of either target ERPs only, or of target and non-target ERPs. As performance depends on both of these parameters, we trained a classifier for each of the resulting eight possible combinations.

C. ToeplitzLDA Performance Difference for each Dataset
Figure 7 shows the classification performance improvement of using ToeplitzLDA for each dataset. Most notably in the word datasets, all classifiers performed on a similar, though low level. This can be explained by the general difficulty of the task, and by the fact that auditory evoked responses tend to be lower than visually evoked responses.

D. Subject-wise Results for the Unsupervised P300 Speller
For fair performance comparisons, we used the best performing hyperparameters of sLDA ($N_t = 6$) and of ToeplitzLDA ($N_t = 27$) for the visualization of spelling performances in Figure 8a. Note that we omitted the second block of subject 6 of dataset V_LLP_A, as the data could not be loaded using the provided dataset wrapper.
Figure 8a shows that most errors of both methods occur during early letters and that the same subset of subjects can be considered difficult for both methods. However, ToeplitzLDA manages to reduce many of the early errors and for most subjects becomes very stable once the classifier has seen enough data. On average, ToeplitzLDA reduces the number of incorrectly spelled letters by 81.1 %, as shown in Figure 8b. Notably, no single subject decreases in performance when using ToeplitzLDA instead of sLDA.

Figure 1: Example of a sample covariance matrix of an ERP feature vector for the time points [0.1, 0.15, 0.2, 0.25, 0.3] s with three EEG channels from different head locations: F3 (front left), Cz (center) and O2 (back right). The matrix was calculated on 3896 epochs of a visual ERP paradigm. Black lines delineate the channel-wise (cross-)covariance blocks of the matrix. In order to transform this general covariance matrix into block-Toeplitz form, one averages along the block-diagonals, i.e., blocks sharing the same color are averaged.

Figure 2: Temporal covariance within different channels of the EEG of subject 6 in the visual V_LLP_A dataset estimated from 3896 epochs. Data was bandpass filtered to [0.5, 16] Hz, transformed into epochs using the [0.1, 0.6) s interval relative to each stimulus, and resampled to 40 Hz sampling rate. The legend in the lower right corner helps to interpret the shown temporal covariances: Dark blue entries relate time samples early in the interval with all later samples, while covariances between late time samples and all even later samples are indicated by bright yellow entries.

Figure 6: Averaged learning curves of sLDA and ToeplitzLDA for the first 35 letters (= 2380 epochs), averaged over three blocks and 25 subjects. Classifiers were trained on all letters up to and including the one that was evaluated on (see Section 2.6) and updated after each letter. Red lines indicate scenarios violating assumptions A1 and A2.

Figure 7: Improvement of classification performance in AUC of ToeplitzLDA compared to all other classifiers on all evaluated ERP datasets. The results are averaged over all subjects of a dataset. Error bars indicate the 95 % confidence interval. All classifiers were trained on the training subset only, i.e., for the LDA methods both class means and covariance were estimated on the subset.
(a) Heatmap showing classification results for the first 35 letters for each subject (left: sLDA, right: ToeplitzLDA). Dark purple squares indicate correctly selected letters and yellow blocks indicate wrong letters.
(b) Reduction of incorrectly spelled letters for each subject for ToeplitzLDA compared to sLDA. The average reduction is indicated by the black line. For seven subjects, the ToeplitzLDA allows for a correct selection from the first letter on.

Figure 8: Detailed results of the unsupervised visual speller application on both LLP datasets. Subjects from V_LLP_A are denoted with a leading 'A' and analogously for V_LLP_B.

Table 1: ERP datasets used for the benchmark. Note that datasets marked with p are private and datasets with * have fewer than 384 epochs available in the training set (see Section 2.5).
Zhao, M., Guo, C., Feng, C., and Chen, S. Oracle approximating shrinkage estimator based cooperative spectrum sensing for dense cognitive small cell network. In 2017 IEEE/CIC International Conference on Communications in China (ICCC), pp. 1-6. IEEE, 2017.