Sampled sinusoidal stimulation profile and multichannel fuzzy logic classification for monitor-based phase-coded SSVEP brain–computer interfacing

Nikolay V Manyakov; Nikolay Chumerin; Arne Robben; Adrien Combaz; Marijn van Vliet; Marc M Van Hulle

doi:10.1088/1741-2560/10/3/036011

1. Introduction

With a brain–computer interface (BCI), brain activity is recorded and used for enabling the subject to directly interact with the external world, by-passing the normal output pathways. In this study, brain activity is read noninvasively, using electroencephalography (EEG). The dependent BCI considered here is based on the steady-state visual evoked potential (SSVEP). This type of BCI relies on the psychophysiological properties of EEG brain responses recorded from the occipital area during periodic visual stimulation (e.g., flickering stimuli). If the latter is at a rate higher than 6 Hz, the individual transient visual responses, which are time- and phase-locked to the stimulus onset, overlap and form a steady state signal that resonates at the stimulation frequency and its integer multipliers [1]. This means that, when the subject is looking at a stimulus flickering at the frequency f, the evoked increase in amplitudes at frequencies f, 2f, 3f, ... can be detected in the Fourier transform of the EEG signal recorded above the subject's occipital lobe. When using several frequencies, each one for encoding a different target, one achieves a frequency-coded SSVEP-based BCI. It has been shown that the overall performance of the frequency-coded SSVEP BCI can be increased by also considering phase information in addition to amplitude information in the decoding process [2, 3]. However, when using a frequency-coding strategy a number of limitations arise: only stimulation frequencies within a particular frequency range evoke reasonable SSVEP responses [4]; the harmonics of some stimulation frequencies could interfere with one another, leading to a deterioration of the decoding performance [5]; additionally, when using a computer screen in combination with the standard 'on/off' visual stimulation (see figure 1(a)), frequencies are limited by the refresh rate of the screen and it is desirable for frequencies to be divisors of the screen refresh rate [5]. This encouraged the search for modifications in stimulation methods in computer screen-based frequency-coded SSVEP BCIs [6].

**Figure 1.** Examples of frame-based 'on/off' stimulation patterns. The white-shaded squares represent 'on' frames, the dark-shaded ones 'off' frames. A screen refresh rate of 60 Hz is assumed.
Download figure:
Standard image

To overcome the above mentioned restrictions, a number of authors suggested [4, 7–9] encoding the target commands not in the frequency, but rather in the phase of the stimulation: the N stimuli are simultaneously flickering at the same frequency f but with different time delays Δt_m = (m − 1)/(fN), corresponding to phase shifts Δφ_m = 2π(m − 1)/N, one for each target command m (m = 1, ..., N).

However, when using a computer screen as a stimulation device, the number of phase-coded targets is also limited in the case of an 'on/off' stimulation: the target stimuli are either maximally bright ('on') or completely dark ('off') depending on the video frame. For example, to produce a 15-Hz stimulus on a 60-Hz screen, there are only four video frames per stimulation period, which leads to only four possible phase-shifts that can be rendered (as, for example, in figure 1(b)). In order to deal with this, one can rely on the fact that video frames on computer screens are updated progressively, usually from top to bottom. This means that relatively small stimuli at different vertical positions physically appear on the screen at different moments in time, thus causing different phase lags in the stimulation. This effect can be used to increase the number of phase-coded stimuli [10]. Although useful, this approach still relies on frequencies that are integer dividers of the screen refresh rate. In addition, it poses some restrictions on the stimulus layout: e.g., it is not possible to achieve an arbitrary phase shift between horizontally arranged stimuli. In this study we propose another way to overcome the current limitations of screen-based stimulation for phase-coded SSVEP.

In phase-coded SSVEP BCI, the classifier has to operate on wrapped phase information, extracted from EEG data, which is circular by nature. The output classes are associated with phases and therefore are circularly interrelated as well. However, the majority of conventional classifiers used for phase-coded BCIs assume noncircular data, which, in turn, implies some (unfolding) conversion of the input/output circular data. Usually this conversion is done in a straightforward way, not preserving the topological structure of the circular data, and, therefore, in fact inappropriately. This calls for a new BCI decoding algorithm specifically designed/tuned for circular data.

Due to the complex nature of the problem, phase-coded SSVEP BCI systems have been proposed that consider the phase extracted only from a single channel (either Oz referenced to the mastoid [7, 11], or by using a bipolar lead [4, 9]). To incorporate phase information from several channels, spatial filtering can be used [12–16]: it provides weighted mixtures of recording channels, leading to new channels from which the phases can be further extracted. The phase can then be processed either by the mentioned single channel decoding approach [12], or by applying standard multichannel decoding strategies such as single layer neural networks [13], support vector machines [14] or probabilistic neural networks [15]. While such classification procedures lead to adequate results, the proposed decoding strategies either do not meet the multichannel requirement as in the case of [12] (the authors admitted that 'the whole methodology is not fully based on multiple channels'), or do not cope with the problem's inherent circular nature as in [13–15]. In this paper we present a fully multichannel approach based on a fuzzy logic classifier that deals with circular information from several channels either without the use of spatial filtering or on top of it. As is customary when adopting a multichannel solution, feature selection is first performed to select the optimal (in terms of decoding accuracy) channels and to reduce the dimensionality of the data. In this study we propose a filter-based feature selection technique based on circular statistics and specifically designed for our fuzzy classifier.

2. Methods

2.1. Experimental procedure

In all of the experiments the subjects were sitting about 60 cm from the laptop's LCD screen on which the stimuli of size 6 cm × 6 cm were shown. Seven (N_s = 7) male subjects (aged 23–35 with an average of 28.3 years) participated in the experiments.

The following two experiments were performed.

(1)
A stimulus flickering at frequency f = 15 Hz with zero phase shift (Δφ = 0) was presented at the center of the screen for 5 s with a fixation point (a small marker on the screen indicating the location the subject should attend to) in the center of the stimulus, followed by 1 s of no stimulation. This was repeated 30 times with the standard 'on/off' (50% duty cycle) and the proposed sampled sinusoidal stimulation (see following section 2.2). The recorded data were used to investigate the phase stability.
(2)
A set of N = 6 stimuli (50% more than the number of targets feasible with the standard 'on/off' stimulation) flickering at frequency f = 15 Hz with phase shifts Δφ_m = π(m − 1)/3 (m = 1, ..., 6) were simultaneously presented using the proposed sampled sinusoidal stimulation (see section 2.2). The 6 cm × 6 cm stimuli were arranged in two rows and three columns, separated 7.5 cm horizontally and 7.75 cm vertically. A fixation point indicated the stimulus the subject should attend to. All stimuli flickered for 5 s followed by 1 s without stimulation, allowing the subject to shift his/her focus to the new position of the fixation point. The subjects were requested to attend each stimulus N_r = 20 times. In total, we acquired 6 × 20 = 120 five-second-long EEG data intervals per subject.

2.2. Visual stimulation

The stimulation frequency in our experiments was f = 15 Hz, which was reported in [17] to elicit on average the largest SSVEP amplitude. For visual stimulation, we used a laptop LCD screen with a reported refresh rate of about 60 Hz. More precisely, the estimated refresh rate was on average 59.92 Hz, which led to an 'on/off' stimulation very close to, but not exactly at 15 Hz.

To encode more than N = 4 targets (which is the limit for 'on/off' stimulation for the used stimulation frequency), we propose varying the desired intensity of the stimulus continuously between 0 and 1 using a sinusoidal periodic profile with frequency f = 15 Hz, specified as $\alpha _f(t)=\frac{1}{2}(1+\sin (2\pi ft+\Delta \varphi ))$ . For each video frame, the intensity of each target is estimated by sampling the desired intensity profile at the time t when the stimulus appears on the screen (see figure 2). In this way, the stimulation relies on time t, rather than on the frame counter.

2.3. Luminance profiles of the stimuli

In order to gauge the real shape of the stimulus luminance profiles and to confirm our prediction (depicted in figure 2 as the rendered profile), we recorded the actual luminance of the stimuli with a photodiode. The photodiode measurements are expressed in volts, while we were interested in luminance measured in cd m⁻², thus we had to convert the photodiode's output into luminance. To build this mapping we collected a series of luminance measurements (using a Minolta Chroma Meter CS-100) along with the photodiode readouts. By uniformly sampling the requested intensity in the [0; 1] range we collected about 200 measurements of the stimulus luminance and the corresponding photodiode output. The collected data were then fitted by a polynome.

Figure 3 shows the stimulus luminance profiles recorded with the photodiode for both the 'on/off' and the sampled sinusoidal stimulations. All luminance-related recordings were done using the same setup as in our experiments (see section 2.1).

2.4. EEG data acquisition

The EEG data were recorded using an eight-channel EEG system [18] developed by Holst Centre² and built around an ultra-low power eight-channel EEG amplifier chip (for technical specifications, see [19]). In this device each EEG channel is sampled at 1024 Hz with 12 bits per sample. We used active Ag/AgCl electrodes (ActiCap, Brain Products) located primarily on the occipital area, namely at positions PO7, PO3, POz, PO4, PO8, O1, Oz, O2 according to the international 10–10 system. The reference and ground electrodes were placed on the right and left mastoids respectively (mainly to compare our decoding algorithm with the one proposed in [7], where information from only a single Oz electrode referenced to the right mastoid was used).

2.5. Feature generation

Additionally to the eight EEG channels mentioned above, for further analysis we also considered the same eight electrodes re-referenced with respect to a common average reference (CAR), and all possible bipolar combinations ( $C^2_8=28$ ), thus in total K = 8 + 8 + 28 = 44 channels s_i(t), i = 1, ..., K.

For each stimulation stage the wrapped phases φ_i were estimated as [4, 20]:

$\begin{eqnarray} &&\fl \varphi _i = {{\rm Arg}}\mathopen {}\mathclose {\left(\mathcal {F}(s_i)(f)\right)} \nonumber\\ &&\hphantom{\fl \varphi _i }= {{\rm Arg}}\mathopen {}\mathclose {\left(\sum _ts_i(t)\cos (2\pi hft) + j\sum _ts_i(t)\sin (2\pi hft)\right)} \nonumber\\ &&\hphantom{\fl \varphi _i } ={{\rm atan2}}\mathopen {}\mathclose {\left(\sum _ts_i(t)\cos (2\pi hft),\sum _ts_i(t)\sin (2\pi hft)\right)},\nonumber\\ \end{eqnarray} \tag{ 1 }$

where f is the stimulation frequency, $\mathcal {F}(s_i)$ is the discrete Fourier transform of the i-th channel data sequence s_i(t), h is the considered (sub)harmonic coefficient, and $j=\sqrt{-1}$ . The phases of only the fundamental stimulation frequency were used in this study, thus h = 1, leading to 44-dimensional feature space of wrapped phases φ_i (i = 1, ..., K, K = 44). For our assessment we considered N_t = 5 cases, in which the recordings s_i(t) had corresponding durations T = 1, ..., 5 s and were taken from the beginning of each stimulation stage, i.e. when the subjects were expected to attend to the flickering stimulus marked with the fixation point (see section 2.1 for details).

2.6. Feature selection

To reduce the amount of information for the subsequent classification we propose a filter-based (thus not relying on further classification) feature selection procedure on training data, which among all class–feature pairs $\mathcal {P}=\lbrace (m,i): m={1},\ldots ,{N}, i={1},\ldots ,{K}\rbrace$ select only a subset $\mathcal {S}\subset \mathcal {P}$ of the statistically most relevant (for the desired class separation) pairs.

Since the input features (i.e. the wrapped phase values φ_i estimated from the channels s_i, i = 1, ..., K) are circular, we suggest employing circular statistics [21] and assume that the input features φ_i from the m-th class are sampled from a von Mises distribution $p_i^m(\varphi |\mu _i^m,\kappa _i^m)=\exp (\kappa _i^m\cdot \cos (\varphi -\mu _i^m))/(2\pi I_0(\kappa _i^m))$ , where I₀ is the modified zero-order Bessel function, $\kappa _i^m$ is the concentration parameter and $\mu _i^m$ is the circular mean parameter. To clarify the meaning of these parameters, assume we have the i-th channel data (s_i) acquired when the subject was observing the flickering target corresponding to the m-th class, i.e. the one with the phase shift Δφ_m = 2π(m − 1)/N. Having N_r such recordings for given m and i, it is possible for each recording r (r = 1, ..., N_r), using (1), to obtain the phase estimate $\varphi ^m_{i,r}$ . For fixed m and i, all phase estimates $\varphi ^m_{i,r}$ are expected to cluster around their circular mean $\mu _i^m$ (see figure 4, where the directions of the radial lines correspond to the discussed phase means), because the phase shifts (Δφ_m), of the corresponding visual stimulations, are identical. The dispersion of the phase estimates is reciprocally related to the concentration parameter $\kappa _i^m$ : the larger $\kappa _i^m$ is, the more tightly the phase estimates $\varphi ^m_i$ are grouped around the mean $\mu _i^m$ , and, therefore, the better the m-th class is localized in the i-th channel.

**Figure 4.** Distribution of phases (expressed as angular values) estimated from different channels of the data from the second experiment (see section 2.1) for the first subject. Each dot corresponds to a phase estimated from a 1 s long interval recorded when the subject was observing a particular phase-shifted stimulus. Colors represent target classes (with the stimulus shifted by Δϕ_m = (m − 1)π/3, where m is the class index). The directions of the radial lines correspond to the circular means for each class. For the sake of visualization, each class is drawn on a circle with a different radius.
Download figure:
Standard image

By feature selection, we want to select only those class–channel combinations (m, i), that enable us to separate the classes as well as possible. In order to perform this selection, we need a measure $Q^{m_1,m_2}_i$ of the quality of classes m₁ and m₂ phase data separation in the i-th channel. Assuming an underlying von Misses distribution for the phase estimates (for each class within each channel), we propose employing the Watson–Williams test—a circular analogue of the one-factor ANOVA—to define the class separation quality measure. The channels, for which this test produces smaller p-values, would potentially separate classes m₁ and m₂ better than others. Thus, the measure $Q^{m_1,m_2}_i$ can be specified, for example, as:

$\begin{equation} Q_i^{m_1,m_2}=\exp \big(-p_i^{m_1,m_2}\big), \end{equation} \tag{ 2 }$

where $p_i^{m_1,m_2}$ is the p-value of the corresponding Watson–Williams test applied to the phases estimated from the i-th channel of the data coming from several recordings for the m₁-th and m₂-th classes. Now, the better channels m₁ and m₂ are separated, the larger the value of the measure $Q_i^{m_1,m_2}$ is.

Assume that we want to find a feature (channel) that provides the best separation of the classes m₁ and m₂. This can be done by selecting the channel i with the largest value of $Q_i^{m_1,m_2}$ . Note, that this i-th channel is selected to separate the chosen classes m₁ and m₂ only, but not all other classes. This leads to the inclusion of only two class–channel pairs (m₁, i) and (m₂, i) into $\mathcal {S}$ . In the same way, we can select not only one but several (n₁) channels $\mathcal {I}_1=\lbrace i_1,\ldots ,i_{n_1}\rbrace$ producing n₁ largest values of the separability measure (or, equivalently, n₁ smallest p-values), leading to the inclusion of class–feature pairs (m₁, i₁), $(m_2,i_1),\ldots ,(m_1,i_{n_1})$ , $(m_2,i_{n_1})$ into $\mathcal {S}$ . This procedure could be seen as an implementation of a one versus one classification strategy, which is done for all possible pairs of classes m₁ and m₂ (1 ⩽ m₁ < m₂ ⩽ N).

The procedure presented above selects an optimal set of class–channel pairs for a pairwise class separation. As a consequence, this approach does not guarantee that the set $\mathcal {S}$ of the selected class–channel pairs would contain the entire channels (an i-th channel we call the entire channel with respect to $\mathcal {S}$ , if all the possible class–channel pairs (m, i) for all m = 1, ..., N, were selected into $\mathcal {S}$ ). To select the entire channels, in which all the target classes are well separated, we propose the following approach.

The measure of separability of all classes within the i-th channel, can be defined as:

$\begin{equation} Q_i^{{\rm all}} = \min _{1\le m_1<m_2\le N}Q_i^{m_1,m_2}. \end{equation} \tag{ 3 }$

Similarly to $Q_i^{m_1,m_2}$ , larger values of $Q_i^{{\rm all}}$ suggest a better separability of all classes in the i-th channel, while lower values indicate that at least two classes are badly separated. Using $Q_i^{{\rm all}}$ , we choose the n₂ 'best' entire channels (features) $\mathcal {I}_2=\lbrace \tilde{i}_1,\ldots ,\tilde{i}_{n_2}\rbrace$ (corresponding to the first n₂ maximal values of $Q_i^{{\rm all}}$ across channels i = 1, ..., K), and extend the selection $\mathcal {S}$ with all the class–feature pairs related to the indices from $\mathcal {I}_2$ : $(1,\tilde{i}_1),\ldots ,(N,\tilde{i}_1),\dots ,(1,\tilde{i}_{n_2}),\dots ,(N,\tilde{i}_{n_2})$ . Thus, for each feature $\tilde{i}\in \mathcal {I}_2$ , all N classes are selected. Hence, such a procedure allows one to select those entire channels for which any class has the best separability with respect to all other classes (which can be seen as a one versus all strategy). This extension of the set $\mathcal {S}$ might seem redundant, but it enables us to perform a comparison with the single channel-based classifiers described in the literature.

2.7. Classification

During the training stage the system learns to distinguish between N phase shifted stimuli/classes using the previously selected on training data set $\mathcal {S}$ of class–feature pairs. To each class–feature pair $(m,i)\in \mathcal {S}$ we assign a characteristic membership function $\mu _{A_i^m}(\cdot )$ indicating the level of affiliation of feature i to class m. It is defined as a normalized to [0; 1] von Mises distribution:

$\begin{equation*} \mu _{A_i^m}(\varphi ) =\frac{p_i^m\big(\varphi |\mu _i^m,\kappa _i^m\big)}{p_i^m\big(\mu _i^m|\mu _i^m,\kappa _i^m\big)} = \exp \big(\kappa _i^m\big(\cos \big(\varphi -\mu _i^m\big)-1\big)\big). \end{equation*}$

The classifier is based on a fuzzy system [22] with a K-dimensional input (here, for the sake of simplicity, all K features are considered—keep in mind that some of those features could have been eliminated by the feature selection procedure) and a two-dimensional output. The fuzzy system consists of N (one for each class m = 1, ..., N) fuzzy IF-THEN rules R^m of the form:

$\begin{eqnarray*} R^m : &{\rm IF\ }\big(\varphi _1\hbox{ is }A_1^m\hbox{ AND\ }\cdots \hbox{ AND\ }\varphi _K\hbox{ is } A_K^m\big)\\ &{\rm THEN\ }\big(y_1\hbox{ is }B_1^m\hbox{ AND\ }y_2\hbox{ is }B_2^m\big), \end{eqnarray*}$

where the fuzzy sets $A_i^m$ and $B_l^m$ are characterized by the membership functions $\mu _{A_i^m}$ and $\mu _{B_l^m}$ respectively. Since the resulting classes are distributed circularly, we divide the unit circle into N equal segments $\mathopen {}\mathclose {\left[2\pi (m-1)/N; 2\pi m/N\right)}$ (m = 1, ..., N), centered at φ_m = 2π(m − 0.5)/N, and use as the output class membership functions $\mu _{B_1^m}$ and $\mu _{B_2^m}$ , which are singletons³ at cos φ_m and sin φ_m respectively.

The classification is based on Mamdani-type reasoning, where antecedents and consequences of each rule are connected by the min T-norm, leading to a universal approximator [23]. Fuzzifications of the actual (indicated with a bar) input values $\bar{\varphi }_i$ are performed based on the singleton fuzzifier. Thus, each rule R^m leads to the result:

$\begin{equation*} \mu _{\bar{B}_l^m}(y_l)=\min _{\lbrace i:(m,i)\in \mathcal {S}\rbrace }\lbrace \mu _{A_i^m}(\bar{\varphi }_i),\mu _{B_l^m}(y_l)\rbrace ,\quad l=1,2. \end{equation*}$

As a consequence, the resulting (output) fuzzy sets (taken among all rules m = 1, ..., N) will be:

$\begin{equation*} \mu _{B_l}(y_l)=\max \lbrace \mu _{\bar{B}_l^1}(y_l),\dots ,\mu _{\bar{B}_l^N}(y_l)\rbrace ,\quad l=1,2. \end{equation*}$

The defuzzification is based on the center of gravity method and produces crisp values $\bar{y}_1$ and $\bar{y}_2$ , which are then used to estimate the target class index $\bar{m}$ that satisfies the inequality $2\pi (\bar{m}-1)/N\le {{\rm Arg}}(\bar{y}_1+j\bar{y}_2)<2\pi \bar{m}/N$ .

3. Results

3.1. Phase stability and class separability

While the sampled sinusoidal stimulation in principle allows us to encode more targets, it could very well be that it increases the scatter of the phases within each class compared to the 'on/off' mode. To verify this, the data from the first experiment were used. Phases were estimated and subjected to an F-test with the null hypothesis that two independent samples resulting from 'on/off' and sampled sinusoidal stimulations came from distributions with the same variance. Such a test was performed separately for each subject, for different stimulation intervals T = 1, ..., N_t, and for all features i = 1, ..., K (in total N_s × N_t × K tests). Here, we should note that a smaller variance potentially leads to more separable classes and, therefore, to the encoding of more targets simultaneously.

Analysis of the phase stability revealed that in only 7.7% of the tests, the 'on/off' stimulation produced a significantly (p < 0.05) smaller variance than the sampled sinusoidal case. The sampled sinusoidal stimulation appeared to be significantly better (i.e. the variance is smaller) in 42.3% of all tests. For the remaining tests there was no significant difference. If we consider only the best n₂ channels (N_s × N_t × n₂ tests), the above mentioned percentage for the sampled sinusoidal stimulation method even increases. For example, for n₂ = 5 the sampled sinusoidal stimulation is significantly better in 71.4% whereas the 'on/off' stimulation in only 0.8% of all tests.

In addition to the previous group of tests, we estimated the circular variances for both SSVEP stimulation methods for every subject, every feature i and every stimulation interval T. For each channel and each stimulation interval separately, a repeated measures ANOVA was performed to assess the differences between both stimulation methods (K × N_t tests). In 19.16% of the tests the sampled sinusoidal stimulation performed better (p < 0.05), while in the other tests there were no significant differences between the two stimulation methods.

We also explored the feasibility of encoding more than four simultaneous targets at 15 Hz on a 60-Hz monitor by using the proposed sampled sinusoidal stimulation. For this, we used all recorded data from the second experiment. As one can see from figure 4, depending on the observed channel, the classes can be well separated, confirming the hypothesis that more than four targets can be encoded. The results for the Oz electrode referenced to the right mastoid, as used in [7] (see figure 4(a)), and for the bipolar combination Oz–POz, as exploited in [4] (see figure 4(b)), show less class separation than for the Oz referenced with respect to CAR (see figure 4(c)). However, the class separability as well as the optimal channel vary from one subject to another. This supports the necessity of the proposed feature selection procedure prior to classification.

3.2. Classification performance

In order to analyze the performance of the proposed classifier, we acquired 120 five-second-long EEG data intervals for each subject, which corresponds to N_r = 20 trials per target class in experimental conditions 2 (see section 2.1). This is a comparable amount of data as used in similar studies [4, 12, 16], where correspondingly 15, 20 and 12 trials per class were used. We have employed leave-one-out cross-validation to compare the performance of the considered classifiers. This method is commonly used in BCI research not only to estimate performance of a BCI system [4, 12], but also to compare different approaches [16].

Figure 5 shows the leave-one-out cross-validation classification results (using data from the second experiment) based on the proposed fuzzy classifier for different EEG segment lengths T for phase derivation and classification. We compare the results to other methods proposed in the literature when applied to our data. The feature selection parameters (n₁, n₂) were obtained through a grid-search (n₁, n₂ = 0, ..., 3) on the training data (leaving the test data for the sole purpose of measuring the classifier performance). The statistical significance of the difference in accuracy between the different methods was assessed using repeated-measures ANOVA tests.

When comparing the results obtained by applying the existing single channel methods (see sections A.1 and A.2 in the appendix) to our data (with the data measured from the Oz channel referenced to the right mastoid and from the Oz–POz bipolar combination, respectively) and by applying the method employing spatial filtering based on canonical correlation analysis (see section A.4 in the appendix), we observe a superior performance for the proposed method (p < 0.001). By applying the single-channel methods described in sections A.1 and A.2 in the appendix to the optimal channel (obtained via an exhaustive search through all channels s_i, using a wrapper approach applied on the training data, as described in section A.3 in the appendix), and by applying the multichannel method based on the multilayer feedforward neural network with multi-valued neurons (MLMVN, see section A.5 in the appendix), we also observe the superiority of our multichannel fuzzy classifier with p-values as seen in figure 5. This shows a better generalization performance of the proposed fuzzy method, in particular in comparison to some of the methods considered, since the multichannel information increases the decoding accuracy compared to the single-channel case. To verify the latter assertion, we also repeated the classification with the same fuzzy classifier for all pairs (n₁, n₂) used in the grid search.

We found that the best accuracy was reached uniquely by a single-channel mode (n₁ = 0, n₂ = 1) in only 6% of the cases. In 70% of the cases this was achieved solely by a multichannel mode (other combinations of (n₁, n₂)) and in the remaining 24% the best accuracy could be reached by both modes of the classifier. This means that in range 70–94% of the cases, the gain in accuracy was due to the multichannel mode.

In figure 6, we present the averaged cross subject results for all seven classifiers considered. With increasing T (the length of the EEG segment used for classification) all classifiers reveal a saturating accuracy. The comparison becomes more informative when the classifiers are evaluated on more difficult conditions, i.e. when T is small as in our case. That is why we present our results for the shortest considered length of the EEG segment used for classification: T = 1 s. As can be seen from figure 6, the proposed classifier yields a superior performance for all classes compared to other classifiers for the same experiment and for the same subjects.

The information transfer rate (ITR) [24] of the proposed system, estimated for all subjects as a function of EEG segment length T, is presented in figure 7. The duration of each decision consisted of T seconds of visual stimulation extended by the half-second pause needed by the subject to shift his/her gaze and focus on the next target, as was also done in [4].

**Figure 7.** ITR of the proposed system for each subject (thin blue lines) and averaged across subjects ITR (thick black line) as function of EEG segment length T used for decoding in the second experiment (see section 2.1).
Download figure:
Standard image

4. Discussion

In this paper we proposed a sampled sinusoidal stimulation mode for SSVEP BCI that overcomes some of the limitations of the conventional 'on/off' stimulation mode on a regular computer screen. The proposed stimulation design allows for any phase shift Δφ in the stimulation. This can be exploited not only to increase the number of target stimuli for SSVEP BCI, but also to compensate for the stimulus appearance lags caused by the progressive rendering of the screen.

Based on the results of section 3.1 we can also conclude on statistical grounds that for phase-coded SSVEP BCI the proposed sampled sinusoidal stimulation running at 15 Hz on an LCD screen is at least not worse than the popular 'on/off' method: the deviation from the circular mean in the case of sampled sinusoidal stimulation is not higher than for the 'on/off' one. We attribute this to the fact that the proposed (sinusoidal) intensity sampling relies on the precise system timing, rather than the video frame index, which makes it less dependent on the screen refresh rate than the 'on/off' stimulation, as explained in section 2.2.

To answer the question regarding the number of targets allowed to be encoded by the proposed method using a single frequency, we have to take into account several factors: the refresh rate of the screen used in the stimulation, the stimulation frequency, the length of the EEG interval taken for deriving the phase information, the desired accuracy, and so on. If one takes a monitor with a higher refresh rate (than the 60-Hz one used in this study), or uses a lower stimulation frequency (than the considered 15 Hz), one will have more frames per stimulation period, which will lead to a smoother (and closer to sine wave) luminance profile of the stimuli (compare with figure 3). This, as a consequence, could render the phase estimates more stable, i.e. more tightly concentrated around their means, which would make the corresponding values of the concentration parameter κ higher. Evidently, one can increase the number of encoded targets at the expense of a lower decoding accuracy. The exhaustive validation of target amount dependency on the parameters mentioned is a large study on its own, and can be seen as grounds for future work.

In order to provide a suggestion on the possible number of encoded targets with proposed stimulation for the conditions used in this study (15-Hz stimulation on a 60-Hz monitor), we estimated the (averaged across all subjects and all classes) standard deviation of phase distribution within a class. For each considered EEG segment duration T = 1,2,...,5 s the corresponding estimated mean standard deviations are: 27.8°, 18.1°, 16.3°, 14.8° and 13.4°.

As can be seen for the best channel in figure 4(c), the class mean phases are not completely equidistantly distributed on the unit circle. This was also noticed in [4, 12] for an 'on/off' stimulus on a computer screen. In [12] it was hypothesized that this could be due to the layout of the target stimuli: an attended target is surrounded by several other flickering targets which could influence the phase of the subject's EEG. In our experiments, we tried to separate the stimuli on the screen as much as possible, but the distance between the stimuli was, probably, not large enough to avoid any influence from neighboring stimuli. This influence could be inferred from the results presented in figure 6, where often the first and the fifth classes were more accurately detected. We explain this by the stimulus layout on the screen: classes one and five correspond to the stimuli shown at the top-left and top-right corners of the screen. Following the above mentioned hypothesis, one can assume that the targets in the corners of the screen are detected with a higher accuracy, as they have fewer neighbors. This assumption would need to be explored by specific experiments with stimuli placed at different locations on the screen. Based on the outcome, one could experimentally verify whether the target stimuli layout and/or the stimulation mode, or neither, influence the decoding accuracy.

As can be seen from figure 4(c), different channel combinations and referencing strategies influence the separability of the classes. Thus, the proper selection of the channels should always be considered, as indicated in [4]. Compared to [4], where only a single channel was considered, we report on a multichannel approach supplied with an automatic feature selection procedure, both of which can be observed as a step forward in the phase-coded SSVEP BCI domain, as only some channel combinations provide good separability. This calls for a search for the proper spatial filtering approach, allowing the improvement of class separability. As a step in this direction, the approaches of [12–16] can be considered.

We hasten to add that the spatial filters should be specially constructed to enhance the separability between classes in the phase domain (by, for example, minimizing the circular variance within each class and maximizing the angular distances between the classes circular means). Filters that are primarily designed to enhance the amplitude signal-to-noise ratio in the frequency domain, as it is done in frequency-coded SSVEP BCI, do not necessary yield good results for phase-coded SSVEP BCI. Since our study is not focused on spatial filtering (preprocessing), but rather on multichannel classification, we considered some commonly used channel combinations: original EEG channels, bipolar channel combinations and channels referenced to CAR. But the proposed system performance could be further improved by adapting the spatial filter to the given data. Nevertheless, we should also mention that spatial filtering does not exclude the multichannel decoding proposed in this study. Normally, spatial filtering constructs several (linear) combinations of the channels according to some criteria. For further decoding it is useful to take not only one such combination (where, in this case, we can use a single-channel decoder as in the CCA method used for comparison in our study, see section A.4 in the appendix), but rather several of them, which could be beneficial for increasing the decoding accuracy. These combinations have to be further processed by a multichannel decoder. The multichannel decoding method presented here can also be used after the spatial filtering, as, for example, in this work, where it is applied on top of a predefined simple spatial filter (which involves the bipolar lead combinations and CAR re-referencing mentioned above).

The comparison performed in this paper revealed the superior performance of the proposed classifier on our data, especially for the short EEG segments used for phase estimation. But, we have to add that we applied all of the methods used for comparison under the conditions of our particular experiment, while the decoding algorithms, described in the appendix, were initially designed for other BCI systems with other stimulations, number of targets and channels, and so on. Thus, it could be true that for conditions different than ours the outcome of the comparison could be different.

Our comparison reveals that better accuracy is achieved by the methods which use some adjustments (through exhaustive wrapper channel selection or training) on training data. All such adjustments take more time than filter-based feature selection and parameter induction, used for the proposed classifier, which also favor the latter.

The comparison of the proposed classifier with another multichannel approach, based on MLMVN (see section A.5 in the appendix), reveals a better performance of the former. Note, that the MLMVN approach also depends on several parameters (such as number of selected features, number of neurons in the hidden layer, desired root mean square error (RMSE) during the training), which we took following the strategy described in [25]. Proper adjustment of these parameters through cross-validation on training data could lead to an improvement, albeit it would be time consuming.

The experiments reported here were done using stimulations with only one frequency f = 15 Hz, but they can be easily extended to the case of several stimulation frequencies. This would allow one to increase the number of targets even more by applying frequency-coding and phase-coding simultaneously. The latter SSVEP BCI was already proposed in [4], using the single-channel decoding algorithm described in section A.1 in the appendix. In [4], the frequencies 10, 12 and 15 Hz were used, and at each frequency the phase lag was used to encode a maximal number of phases with 'on/off' stimulation, i.e. 6, 5 and 4 (15 in total) stimuli, respectively, which are the limits for such a stimulation. The methodology we presented in this paper could be used to improve such a system in two ways. First, the sampled sinusoidal stimulation allows one to increase the number of targets encoded by each frequency. For example, the results of our experiments show that instead of four phase-lagged stimuli for 15 Hz, one can easily encode six stimuli. Thus, even keeping the same classifier as in [4], one can increase the number of encoded targets. Another improvement could be achieved by using a fuzzy decoder, as proposed in this paper. By constructing several such fuzzy decoders, one for each frequency (to distinguish phase shifts at this particular frequency), and by taking for each such decoder not only the angular value ${{\rm Arg}}(\bar{y}_1+j\bar{y}_2)$ (see section 2.7), but also the absolute value $M=|\max _{y_1}\mu _{B_1^m}(y_1)+j\max _{y_2}\mu _{B_1^m}(y_2)|$ , one can determine the stimulation frequency for which M is maximal for all decoders and corresponding phase shifts. Here, $\max _{y_l}\mu _{B_1^m}(y_l),\ l=1,2$ determine the strength of the result from each decoder by checking the maximum of resulting fuzzy sets. Thus, a high M value could be seen as evidence for a particular frequency. Such a method is, as a consequence, an extension of the method of [4] (see section A.1 in the appendix) developed by adding multichannel decoding (instead of one channel) and by assuming that the boundaries of the phase classes depend not only on their means, but also on their variances. In this way, the methodology presented here could lead to an improvement of existing systems.

While the previously described decoding of frequency-phase combinations is preferable, one could also use a two-stage procedure that first identifies the observed stimulation frequency by any of the methods developed for frequency-coded SSVEP BCI, and then uses phase to identify the target stimulus.

In [4] the authors analyzed how the decoding is affected by the involvement of harmonics and found that the latter improves the decoding quality. In our paper we restricted ourselves to the fundamental frequency only (h = 1), but, as we have already mentioned, the proposed decoding algorithm is able to incorporate features from the harmonics among other signal features. We expect this will lead to even better results.

We constructed a fuzzy classifier, the membership function parameters of which were statistically derived from the training data. One could possibly further increase the accuracy of the detector by applying the more flexible architecture of a neuro-fuzzy system [22], where many parameters (such as means and concentration) can be adjusted by sampling the training set, instead of the statistical induction we used. In our feature selection procedure we did not explicitly excluded dependence between the features. Thus, one can foresee an improvement in classification (or, at least, by making the classifier even more sparse) by eliminating such dependencies.

Another improvement for the phase-coded scenario can be expected from an optimization of the stimulation profile. Due to hardware limitations (i.e. nonzero pixel response time and input latency), the real screen luminance profile does not coincide with the desired intensity profile, but rather is a smoothed version of the latter [26], as could also be seen in figure 3. When stimulating with intensity values from the [0; 1] range (rather than binary set {0, 1}) and accounting for the screen-specific pixel intensity dynamics, one could construct an intensity profile in a such way that it would more closely follow the desired one.

The second experiment (see section 2.1) setup is similar to the EEG-based BCIs based on synchronous (or cue guided) protocols where the subject must follow a fixed repetitive scheme to switch from one mental task to the next [27]. In these synchronous BCI systems, the phenomena (to be recognized in EEG) are time-locked to the cue and the trial lasts for a predefined time span of several seconds. On the contrary, asynchronous (or self-paced) protocols are based on the fact that the subject makes voluntary self-paced decisions on when to stop doing a mental task and when to start the next one. The question regarding how to implement phase-coded SSVEP BCI working in the asynchronous mode is still open.

5. Conclusion

We advocated the use of a sampled sinusoidal stimulation mode for phase-coded SSVEP BCI. We showed that it yields stable phase estimates and that it allows for encoding more targets than the traditional 'on/off' stimulation mode for phase-coded SSVEP BCI. We introduced a filter-based feature selection procedure relying on circular statistics to select the relevant features according to their impact on the stimuli/class separability. We also proposed a multichannel fuzzy logic-based classifier that distinguishes itself from the existing phase-based SSVEP decoding methods for BCI which are intrinsically single-channel ones.

Acknowledgments

NVM is supported by the research grant GOA 10/019, NC is supported by the Tetra project Spellbinder, AR and AC are supported by IWT doctoral grants, MvV is supported by IUAP P6/29 and MMVH is supported by PFV/10/008, CREA/07/027, G.0588.09, IUAP P6/29, GOA 10/019 and the Tetra project Spellbinder.

Appendix.: Decoding methods

A.1. Method of Jia and co-workers

In [4], the following method was proposed. Based on the training data, the reference phases $\varphi _{m,f}^{{\rm Ref}}$ are estimated by averaging, over the whole training set, the Fourier coefficients at the stimulation frequency f of each target class m = 1, ..., N. Here, only one optimal electrode (see section A.3 in the appendix) or Oz–POz bipolar combination (as a proper channel for the data in [4]) is considered. To decode input data (e.g., from the test set), the Fourier coefficients at stimulation frequency f are estimated from the same, previously selected optimal electrode. These complex coefficients, considered as vectors in $\mathbb {R}^2$ , are then projected onto the directions of all reference $\varphi _{m,f}^{{\rm Ref}}$ phases and the resulting target class $\tilde{m}$ is selected as the one with the maximal projected value $\rho _{\tilde{m}}$ . If additionally to the fundamental frequency the harmonics are also considered, then their Fourier coefficients are also projected onto the considered reference phase directions. The class, for which the sum of the projections of all the considered frequencies (fundamental + harmonics) is maximal, is then selected as the winner class.

A.2. Method of Lee and co-workers

In [7], the following method was proposed. The EEG signals recorded from Oz channel referenced to the mastoid were band pass filtered in the range [f − 2 Hz, f + 2 Hz], where f is the stimulation frequency. Based on the training data (1 min recorded when a subject is observing a flickering stimulus with 'zero' phase lag), an SSVEP_Ref is generated by averaging all epochs (EEG recordings corresponding to one period of stimulation, from one channel). The reference value t_Ref is defined from the obtained SSVEP_Ref as the latency of the maximum amplitude peak. In the decoding stage, the phase lag between SSVEP_Ref and SSVEP_gaze is evaluated in order to identify the target class. This was done by cutting the SSVEP_gaze signal into one-period-long segments, averaging across these segments, and by determining from the resulting average the latency of the maximum amplitude peak, called t_peak. Next, the difference Δt = t_Ref − t_peak is transformed into a phase difference θ = 2πΔtf. This phase difference θ is then wrapped to the interval [0, 2π) by (if necessary) adding or subtracting 2π. The achieved phase distance θ is compared to the expected phase delays θ_m = 2π(m − 1)/N (m = 1, ..., N) through an estimation of the angular distance as $D_{m_c}=|\theta _m-\theta |$ . The resulting target class is then derived as $\begin{array}{@{}c@{}}{{{\!\!\hbox{ arg min} }}\;D_m}\\ {m}\end{array}$ .

A.3. Optimal channel

We also evaluated the methods described in sections A.1 and A.2 in the appendix using an optimal channel. The latter was chosen based on a leave-on-out cross-validation on training data with the wrapper method, i.e. one sample (validation data) was excluded from the training data and the classifier was trained on each of K considered channels (see section 2.5) on the remaining data and applied to the validation data. By using every sample as the validation data, we aggregated all results and obtained a measure of the classification accuracy for every channel. The channel with the maximal accuracy was chosen as the optimal one. After that, the classifier was retrained for this optimal channel on the entire training data set.

A.4. Method based on CCA

In [12], the following method was proposed. A spatial filter was constructed to incorporate information from several channels followed by a decoding based on a one-channel classifier. Consider a signal $x=w_x^1s_1(t)+w_x^2s_2(t)+\cdots +w_x^Ks_K(t)$ as a linear combination of several EEG channels (in our case all eight channels above the occipital lobe), and the signal $y=w_y^1\sin (2\pi ft)+w_y^2\cos (2\pi ft)$ , where f is the stimulation frequency. The vectors $W_x=(w_x^1, w_x^2,\dots ,w_c^K)^T$ and $W_y=(w_y^1, w_y^2)^T$ are estimated using a canonical correlation analysis (CCA), to maximize the correlation coefficient between x and y. The vector $W_y=(w_y^1, w_y^2)^T$ contains the phase information of the SSVEP. To avoid a possible ambiguity regarding the direction of the vector W_y, its components are estimated as: $\bar{w}_y^1={\rm sign}(-{\rm Im}(\mathcal {F}(s_{{\rm Oz}})(f)))|w_y^1|$ and $\bar{w}_y^2={\rm sign}({\rm Re}(\mathcal {F}(s_{{\rm Oz}})(f)))|w_y^2|$ , where s_Oz denotes the recordings from the Oz channel, and $\mathcal {F}(s_{{\rm Oz}})(f)$ is the Fourier transform of s_Oz. The phase is then estimated as ${{\rm Arg}}(\bar{w}_y^1+j\bar{w}_y^2)$ . The decoding algorithm from section A.1 in the appendix was applied for classification.

A.5. Method based on MLMVN

In [25] a multi-channel decoding method based on the multilayer feedforward neural network with multi-valued neurons (MLMVN) was proposed. A network with one hidden layer with four neurons in the input layer, ten in the hidden layer and one in the output layer was chosen as it was recommended in [28]. As input, complex values exp (jφ_m) were used, where the φ_m are the phases extracted from four selected channels. The selection was done by retaining the channels with the lowest (in the training data) averaged among all classes circular standard deviation.

Training was done based on the back-propagation algorithm described in [28], where $\theta _m=\exp ({\rm j}2\pi (m-\frac{1}{2})/N)$ were used as the desired outputs instead of class indicator complex values. The network was trained until the RMSE became lower than 0.1 radians. From the output O of the network, the resulting class index $\tilde{m}$ is deduced as an integer satisfying two conditions: $2\pi (\tilde{m}-1)/N\le {{\rm Arg}}\,O<2\pi \tilde{m}/N$ and $1\le \tilde{m}\le N$ .

Sampled sinusoidal stimulation profile and multichannel fuzzy logic classification for monitor-based phase-coded SSVEP brain–computer interfacing

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

Author notes

Dates

Abstract

1. Introduction