Unsupervised frequency-recognition method of SSVEPs using a filter bank implementation of binary subband CCA

Md Rabiul Islam; Md Khademul Islam Molla; Masaki Nakanishi; Toshihisa Tanaka

doi:10.1088/1741-2552/aa5847

1. Introduction

A brain–computer interface (BCI), which utilizes neurophysiological signals for communication between a human and an external environment without depending on any peripheral nerve and muscle activities, has been studied for several decades [1–3]. In recent years, electroencephalogram (EEG)-based BCIs have received increasing attention from researchers in neural engineering, neuroscience, and clinical rehabilitation due to their noninvasiveness and relatively low cost devices. Various applications of BCIs have great potential to allow people with severe motor disabilities such as locked-in-syndrome (LIS), comas, and spinal cord injuries to communicate with other people [1, 4–6].

The recognition performance and speed of EEG-based BCIs that utilize various brain responses, including sensorimotor rhythms, event-related potentials (ERPs), and visual-evoked potentials (VEPs), have improved significantly in the past decade [7–10]. For example, Jin et al proposed an ERP-based BCI that uses facial expression changes to improve performance [9]. Different familiar face stimuli enlarged the N200 and N400 components, reducing the actual error outputs [10]. However, further research is still needed for real-life applications. Recently, steady-state visual evoked potential (SSVEP)-based BCIs have attracted increasing attention in the field of BCI due to their advantages of minimal user training, ease of system configuration, and higher signal-to-noise ratio (SNR) and information transfer rate (ITR) [11–13]. An SSVEP is a periodic EEG response elicited by repetitive visual stimuli flickering at frequencies higher than 6 Hz [14, 15]. An SSVEP-based BCI enables a user to communicate with external devices by gazing at one of multiple repetitive visual stimuli flickering at different frequencies. The target stimulus, which the user is gazing at, can be identified by analyzing the resulting SSVEPs.

In previous studies, advanced signal processing and machine learning techniques have been applied to detect SSVEPs. For example, the targets can be identified by detecting the peak in the power spectral density (PSD) of EEG signals obtained by the discrete Fourier transform (DFT) [15]. A canonical correlation analysis (CCA)-based recognition method, first introduced by Lin et al, is very powerful for the frequency detection of SSVEPs [16]. The CCA-based method has shown significantly better recognition performance than the traditional PSD-based methods [16, 17]. However, a potential problem with this approach is that the canonical correlation tends to decrease as the flickering frequency increases, which lowers the accuracy of SSVEP detection at higher frequencies [18]. Nakanishi et al proposed normalized canonical correlation analysis (NCCA), which incorporates background EEG activities to improve the frequency recognition performance of CCA at higher flickering frequencies [19].

In SSVEP-BCI studies, multiway CCA and multiset CCA have been proposed to optimize the reference signals by incorporating SSVEP training data into the CCA [20, 21]. Nakanishi et al also proposed an extended CCA-based method, which combined CCA-based spatial filtering and correlation analysis with single-trial SSVEPs and training reference signals obtained by averaging a training set of SSVEP data [22]. Islam et al [23, 24] generated two levels of data-adaptive reference signals with dominant frequency, which yielded significant improvements at both higher and lower frequencies; these reference signals were derived from a training set of real SSVEP signals rather than from artificial sine-cosine signals with several harmonics. Nevertheless, recording such calibration data is time consuming.

Chen et al demonstrated a high-speed SSVEP-based BCI that combines CCA with filter bank analysis [25]. The underlying idea behind filter bank analysis is to extract independent information embedded in decomposed harmonic components. The accuracy of SSVEP frequency detection increased with the number of subbands without any training data. Our recently published study proposed binary subband-based CCA (BsCCA), in which only two subbands are used to enhance the discriminability of SSVEPs with different frequencies in an unsupervised way [26]. The first subband contains the whole frequency band of the SSVEPs, and CCA is applied with the entire set of artificial reference signals. In the second subband, the frequency components of SSVEPs originating from the desired number of lower-order stimulus frequencies are suppressed, and CCA is employed with a reduced dimension of references. Then, a weighted correlation of the two subbands is used to improve frequency recognition performance. Each of the above methods was proposed to enhance the discriminability of SSVEPs with different frequencies, and they can be combined for further improvement.

This study performs a comprehensive comparison of unsupervised feature extraction methods, including CCA, NCCA, and BsCCA, each of which is combined with filter bank analysis. This type of analysis can exploit the independent information embedded in the harmonics. A data-driven grid search was used to optimize the parameters in NCCA and BsCCA, and the optimal parameters are used in BCI experiments to demonstrate the efficacy of each method.

2. Methods

2.1. Dataset

In this study, the performance of different target identification methods was evaluated using a publicly available SSVEP dataset [27]. This dataset was collected with the approval of the Human Research Protections Program of the University of California, San Diego, and all participants signed a written informed-consent form before participating in the study. The dataset was acquired from 10 healthy subjects (9 males and 1 female; mean age: 28 years) using a Biosemi ActiveTwo EEG system (Biosemi, Inc.) with eight Ag/AgCl electrodes covering the occipital area. Five of the subjects had experience with SSVEP-based BCI experiments, and the others were naive to SSVEP experiments. The EEG signals were recorded with a sampling rate of 2048 Hz, and all electrodes were referenced to the CMS electrode close to Cz. Twelve visual stimuli were tagged with different frequencies (9.25–14.75 Hz with an interval of 0.5 Hz) and phases (0, 0.5π, π and 1.5π). The EEG recording consisted of 15 sessions. In each session, each subject was instructed to gaze at one of the stimuli, indicated in a random order, for 4 s and to go through all 12 targets with an interval of 1 s. All data epochs were down-sampled to 256 Hz and then band-pass filtered between 6 Hz and 80 Hz with an infinite impulse response (IIR) filter based on the Chebyshev Type I design. A second-order IIR notch filter was also applied to the data epochs to suppress power-supply noise. Considering the latency of 135 ms in the visual pathway [22], the data epochs were extracted in [0.135 s, 4.135 s], where time 0 indicates the stimulus onset.
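The epoch window described above can be illustrated as simple index arithmetic. This is a minimal sketch rather than the preprocessing code used for the dataset; the function name `epoch_bounds` and the choice to round the latency to the nearest sample are assumptions.

```python
def epoch_bounds(fs_hz=256, latency_s=0.135, epoch_s=4.0):
    """Return (start, stop) sample indices of one epoch relative to stimulus onset.

    Rounding the latency to the nearest sample is an assumption; the text only
    states the window [0.135 s, 4.135 s] at a 256 Hz sampling rate.
    """
    start = round(latency_s * fs_hz)        # 0.135 s * 256 Hz = 34.56 -> 35
    stop = start + round(epoch_s * fs_hz)   # 4 s of data = 1024 samples
    return start, stop


start, stop = epoch_bounds()
print(start, stop, stop - start)
```

Shorter data lengths, such as the 0.5 s to 4.0 s windows evaluated later, would correspond to smaller values of `epoch_s` with the same start index.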

2.2. Filter bank analysis

In signal processing, subband decomposition by filter bank analysis is used to divide a signal into a set of analysis signals that exhibit multiple subband frequency components [28, 29]. To extract different rhythmic components of an EEG signal (mu and beta bands), Ang et al first integrated the filter bank with common spatial pattern (CSP) to enhance the classification accuracy of different motor imagery states [30]. To implement a filter bank method in an SSVEP-based BCI, Chen et al proposed the use of band-pass filters to extract discriminative fundamental and harmonic frequencies from the original EEG signals with zero-phase Chebyshev Type I IIR filters [25]. Three filtering approaches were defined to optimize the selection of subbands of SSVEP signals. The $n\text{th}$ subband started from the frequency at $n\times k$ Hz (where n is the index of the subband and k is the starting frequency) and ended at a fixed maximum frequency (88 Hz in this configuration) [25]. In the implementation of bandpass filtering, an additional bandwidth of 2 Hz was added to both sides of the passband for each subband.
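The subband layout can be sketched as a small table of band edges. This only illustrates the rule stated above and is not the filter design of [25]; the function name `subband_passbands` is hypothetical, `k_hz = 8.0` is a made-up starting frequency, and widening each passband by 2 Hz is only one reading of the "additional bandwidth".

```python
def subband_passbands(n_subbands, k_hz, f_max_hz=88.0, margin_hz=2.0):
    """List band edges in Hz for subbands n = 1..n_subbands.

    The nth passband runs from n * k_hz to f_max_hz. The second tuple widens
    each passband by margin_hz on both sides, which is one reading of the
    additional 2 Hz bandwidth mentioned in the text.
    """
    bands = []
    for n in range(1, n_subbands + 1):
        passband = (n * k_hz, f_max_hz)
        widened = (passband[0] - margin_hz, passband[1] + margin_hz)
        bands.append((passband, widened))
    return bands


# k_hz = 8.0 is a hypothetical starting frequency chosen only for illustration.
for passband, widened in subband_passbands(n_subbands=4, k_hz=8.0):
    print(passband, widened)
```

Each pair of edges would then be passed to a zero-phase bandpass filter design, which is left out of this sketch.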

2.3. Feature extraction

2.3.1. CCA.

As a multivariate statistical method, CCA, which is a type of correlation technique for measuring similarity between two multivariate signals, has been widely used to detect the frequency of SSVEPs [31]. CCA is an extension of the ordinary correlation between two random variables to two sets of variables [16, 17, 31]. CCA finds pairs of linear transformations, called canonical variates, for two sets of multidimensional variables so as to maximize the correlation between the two canonical variates.

Considering two sets of multidimensional variables, X and Y, and their linear combinations, $x=w_{x}^{T}X$ and $y=w_{y}^{T}Y$ , CCA finds the weight vectors, w_x and w_y, which maximize the correlation between x and y by solving the following problem:

$\begin{eqnarray}\begin{array}{*{35}{l}} \rho (X,Y) & =\underset{{{w}_{x}},{{w}_{y}}}{{\max}}\,\frac{E\left[x{{y}^{T}}\right]}{\sqrt{E\left[x{{x}^{T}}\right]E\left[\,y{{y}^{T}}\right]}} \\ {} & =\underset{{{w}_{x}},{{w}_{y}}}{{\max}}\,\frac{w_{x}^{T}X{{Y}^{T}}{{w}_{y}}}{\sqrt{w_{x}^{T}X{{X}^{T}}{{w}_{x}}w_{y}^{T}Y{{Y}^{T}}{{w}_{y}}}}, \end{array}\end{eqnarray} \tag{ 1 }$

where ρ is called the canonical correlation, XX^T and YY^T are the within-sets covariance matrices, and XY^T is the between-sets covariance matrix. The maximum of correlation coefficient ρ with respect to w_x and w_y is the maximum canonical correlation.

For SSVEP frequency recognition, X refers to the set of multi-channel EEG signals, and Y contains a set of reference signals with the same length as X. The frequency of the reference signals with maximal correlation is selected as the stimulus frequency of the SSVEPs [16]. The reference signals are artificially generated with sine and cosine waves of all the stimulus frequencies and their harmonics. The artificial reference signals Y_k are derived in the following manner:

$\begin{eqnarray}&&{{Y}_{k}}=\left[\begin{array}{c} \sin \left(2\pi {{f}_{k}}t\right) \\ \cos \left(2\pi {{f}_{k}}t\right) \\ \vdots \\ \sin \left(2\pi {{N}_{h}}\,{{f}_{k}}t\right) \\ \cos \left(2\pi {{N}_{h}}\,{{f}_{k}}t\right) \end{array}\right],t=\frac{1}{F},\frac{2}{F},\ldots,\frac{P}{F},\end{eqnarray} \tag{ 2 }$

where f_k is the stimulation frequency, N_h is the number of harmonics, F is the sampling frequency, and P denotes the number of time points. The CCA calculates the canonical correlation ${{\rho}_{k}}=\rho \left(X,{{Y}_{k}}\right)$ as a feature of the target frequency by solving equation (1).
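Equations (1) and (2) can be illustrated with a short NumPy sketch. It is not the authors' MATLAB implementation: the function names are hypothetical, the maximum canonical correlation is computed with a QR-plus-SVD formulation (one standard way of solving equation (1)), and the "EEG" is a synthetic sinusoid in noise used only to exercise the code.

```python
import numpy as np


def reference_signals(f_k, n_harmonics, fs, n_points):
    """Build Y_k of equation (2): sine/cosine pairs at f_k and its harmonics."""
    t = np.arange(1, n_points + 1) / fs               # t = 1/F, 2/F, ..., P/F
    rows = []
    for h in range(1, n_harmonics + 1):
        rows.append(np.sin(2 * np.pi * h * f_k * t))
        rows.append(np.cos(2 * np.pi * h * f_k * t))
    return np.vstack(rows)                            # shape (2 * N_h, P)


def max_canonical_correlation(X, Y):
    """Largest canonical correlation between X (channels x P) and Y (refs x P)."""
    Xc = (X - X.mean(axis=1, keepdims=True)).T        # centered, samples in rows
    Yc = (Y - Y.mean(axis=1, keepdims=True)).T
    qx, _ = np.linalg.qr(Xc)                          # orthonormal basis of each set
    qy, _ = np.linalg.qr(Yc)
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)    # singular values = correlations
    return float(min(s[0], 1.0))


# Synthetic 8-channel signal: an 11.75 Hz sinusoid plus noise (illustrative only).
fs, n_points = 256, 512
stim_freqs = [9.25 + 0.5 * i for i in range(12)]
rng = np.random.default_rng(0)
t = np.arange(1, n_points + 1) / fs
X = np.sin(2 * np.pi * 11.75 * t + rng.uniform(0, np.pi, (8, 1)))
X = X + rng.standard_normal((8, n_points))
rho = [max_canonical_correlation(X, reference_signals(f, 3, fs, n_points))
       for f in stim_freqs]
detected = stim_freqs[int(np.argmax(rho))]
print(detected)
```

The number of harmonics (3), the data length (2 s) and the noise level are arbitrary choices for this example and are not taken from the study.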

2.3.2. NCCA.

The detectability level of the standard CCA-based method tends to decrease as stimulation frequency increases. According to the previous study of Wang et al, the SNRs of SSVEPs in the alpha frequency band have comparable levels despite the fact that the amplitude of SSVEPs decreased while the stimulation frequency increased [32]. Therefore, a normalization procedure similar to the calculation of the SNR is useful to improve the stimulus frequency detection of SSVEPs in a higher frequency band. Nakanishi et al proposed NCCA for SSVEP frequency detection considering the background EEG activities within neighboring frequency bands [19]. In the NCCA-based method, normalized canonical correlation coefficients ${{\rho}_{k}}$ at frequency f_k are calculated as a feature for target identification:

$\begin{eqnarray}&&{{\rho}_{k}}=\frac{L{{\rho}^{\prime}}\left(\,{{f}_{k}}\right)}{\underset{l=1}{\overset{L}{\sum}}\,\left({{\rho}^{\prime}}\left(\,{{f}_{k}}+l\bigtriangleup {{f}_{k}}\right)+{{\rho}^{\prime}}\left(\,{{f}_{k}}-l\bigtriangleup {{f}_{k}}\right)\right)},\end{eqnarray} \tag{ 3 }$

where ${{\rho}^{\prime}}\left(\,{{f}_{k}}\right)$ represents the canonical correlation between the SSVEP and the artificial reference signals of frequency f_k, $\bigtriangleup f$ represents the frequency resolution used in the normalization, and L is the number of neighboring frequencies on each side of f_k.
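The normalization in equation (3) can be written as a few lines of Python. This is a sketch under stated assumptions: `ncca_feature` and `toy_rho` are hypothetical names, the correlation profile is invented for illustration, and the spacing of 0.1 Hz is not a value taken from the study.

```python
def ncca_feature(rho_prime, f_k, n_neighbors, delta_f):
    """Normalized correlation of equation (3).

    rho_prime: callable mapping a frequency in Hz to a canonical correlation.
    n_neighbors: L, the number of neighboring frequencies on each side.
    delta_f: spacing of the neighboring frequencies in Hz.
    """
    background = sum(
        rho_prime(f_k + l * delta_f) + rho_prime(f_k - l * delta_f)
        for l in range(1, n_neighbors + 1)
    )
    return n_neighbors * rho_prime(f_k) / background


# Toy correlation profile (hypothetical): a peak of 0.6 at 12.25 Hz on a flat
# background of 0.2. Real values would come from CCA on EEG data.
def toy_rho(f):
    return 0.6 if abs(f - 12.25) < 1e-9 else 0.2


print(ncca_feature(toy_rho, f_k=12.25, n_neighbors=6, delta_f=0.1))
```

With a flat profile the feature equals 0.5, since the denominator sums 2L background values while the numerator is scaled by L; a peak at f_k pushes the feature above that baseline.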

2.3.3. BsCCA.

To improve the performance of SSVEP-based BCIs, Islam et al [26] proposed the BsCCA-based method to enhance the detectability of SSVEPs in a high frequency range. To compute the canonical correlation, a subset of reference signals and the subband components of SSVEPs were used. In addition, the full set of reference signals and the original SSVEP were used to increase classification performance. The BsCCA method broadly consists of three steps: (i) the subband extraction of multichannel SSVEP signals using equation (4), (ii) the computation of CCA between the subband components and the corresponding subsets of artificial reference signals, and (iii) frequency recognition based on the measured correlation values.

In BsCCA, only two subband components are extracted from the multichannel SSVEP signals, using zero-phase IIR filters. The first subband, represented by ${{\hat{X}}_{1}}$ , contains all the stimulus frequencies and their harmonics; it is obtained by bandpass filtering the original SSVEP between 7 Hz and 88 Hz. The second subband contains the EEG frequency components of the higher-order stimuli. It is obtained by applying a zero-phase highpass filter that suppresses the components corresponding to a desired number of lower-order stimulus frequencies. The cut-off frequency of the highpass filter is determined from the stimulus frequencies and a suppression parameter $0<\sigma \leqslant K$ , where σ is the number of lower-order stimuli to be suppressed and K is the number of stimuli. The lower cut-off frequency of the second subband $\hat{X}_{2}^{(\sigma )}$ can be expressed as

$\begin{eqnarray}&&{{\lambda}^{(\sigma )}}={{f}_{\sigma +1}}-\delta,\end{eqnarray} \tag{ 4 }$

where ${{f}_{\sigma}}$ is the $\sigma \text{th}$ stimulus frequency, and δ is an offset constant (here $\delta =0.5$ Hz). The EEG up to any desired number of lower-order stimulus frequencies can be suppressed using equation (4).
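Equation (4) can be checked numerically for the twelve stimulus frequencies of the dataset. The helper `lower_cutoff` is a hypothetical name; the sketch assumes the stimulus frequencies are sorted in ascending order and does not handle σ = K, because f_{K+1} is not defined in the text.

```python
def lower_cutoff(stim_freqs, sigma, delta=0.5):
    """Equation (4): lambda = f_(sigma+1) - delta, with 1-based stimulus indices.

    stim_freqs must be sorted in ascending order. sigma must be smaller than
    len(stim_freqs) here so that f_(sigma+1) exists.
    """
    if not 0 < sigma < len(stim_freqs):
        raise ValueError("sigma must satisfy 0 < sigma < number of stimuli")
    return stim_freqs[sigma] - delta      # 0-based index sigma is f_(sigma+1)


stim_freqs = [9.25 + 0.5 * i for i in range(12)]   # 9.25, 9.75, ..., 14.75 Hz
print(lower_cutoff(stim_freqs, sigma=6))           # f_7 - 0.5 = 12.25 - 0.5 = 11.75
```

Because the stimuli are spaced 0.5 Hz apart and δ = 0.5 Hz, the cut-off coincides with the σth stimulus frequency in this dataset.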

In this method, the second subband of SSVEPs contains only the frequency components of higher-order stimuli; hence, a reduced number of reference frequencies can be used instead of all of them. The conventional CCA is applied to each subband component individually, yielding the correlation values between the subband component and the required subset of the artificial reference signals. For the second subband, only the reference signals corresponding to the stimuli contained within that subband are used in the CCA computation. Because the constant offset δ is used, the adjacent lower-order reference stimulus is also kept in the subset of reference signals. The idea behind the reduced reference set is that the subband filtering suppresses the SSVEPs of lower-order stimuli while keeping those of higher-order stimuli, which shrinks the classification domain for the second subband. Hence, the reduced reference set improves the recognition accuracy for the higher-order stimuli of the second subband by reducing the misclassification probability. For instance, if $\sigma =2$ , only one stimulus frequency f₁ is discarded from the set of reference signals. The subset of artificial reference signals corresponding to the subband $\hat{X}_{2}^{(\sigma )}$ can therefore be defined as [26]

$\begin{eqnarray}&&Y_{k}^{(\sigma )}={{ \Phi }^{(\sigma )}}\left({{Y}_{k}}\right),\end{eqnarray} \tag{ 5 }$

where ${{ \Phi }^{(\sigma )}}(\centerdot )$ is the function that excludes sinusoidal elements with frequencies less than ${{\lambda}^{(\sigma )}}$ and their harmonics. A graphical representation of a reference signal reduction is shown in figure 1.
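The reduction of the reference set can be illustrated for the σ = 2 case mentioned above. The sketch follows one reading of equation (5), in which reference sets whose fundamental frequency lies below the cut-off are dropped entirely; the function name `reduced_reference_frequencies` is hypothetical.

```python
def reduced_reference_frequencies(stim_freqs, cutoff_hz):
    """Keep the stimulus frequencies that are not below the cut-off.

    This follows one reading of equation (5): reference sets whose fundamental
    lies below the cut-off are dropped, and the rest are kept unchanged.
    """
    return [f for f in stim_freqs if f >= cutoff_hz]


stim_freqs = [9.25 + 0.5 * i for i in range(12)]
cutoff_hz = stim_freqs[2] - 0.5            # sigma = 2: f_3 - delta = 9.75 Hz
kept = reduced_reference_frequencies(stim_freqs, cutoff_hz)
print(len(kept), kept[0])                  # f_1 = 9.25 Hz is the only one dropped
```

This reproduces the behavior described in the text: with σ = 2 only f₁ is discarded, and the adjacent lower-order stimulus f₂ is retained because of the offset.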

**Figure 1.** Reduced reference signals for the frequency of five stimuli with $\sigma =3$ derived from Y_k for the BsCCA method.

The feature for target identification is computed as a weighted sum of two canonical correlation components. The first component is computed between the full set of artificial reference signals and the pre-filtered SSVEP signals obtained by bandpass filtering between 7 Hz and 88 Hz. The second canonical correlation is obtained by applying CCA between the second subband of SSVEPs and the reduced set of reference signals. For the $k\text{th}$ stimulus frequency, two correlations, ${{\rho}_{1,k}}$ and ${{\rho}_{2,k}}$ , corresponding to the two subbands, ${{\hat{X}}_{1}}$ and $\hat{X}_{2}^{(\sigma )}$ , can be defined as [16]

$\begin{eqnarray}&&{{\rho}_{1,k}}=\rho ({{\hat{X}}_{1}},{{Y}_{k}}),~{{\rho}_{2,k}}=\rho (\hat{X}_{2}^{(\sigma )},Y_{k}^{(\sigma )}),\end{eqnarray} \tag{ 6 }$

The feature for target identification is then defined as

$\begin{eqnarray}&&{{\rho}_{k}}=\underset{s=1}{\overset{2}{\sum}}\,w(s){{\left({{\rho}_{s,k}}\right)}^{2}},\end{eqnarray} \tag{ 7 }$

where s is an index of the subbands. The weights for the subband components are defined as $w(s)={{s}^{-\alpha}}+\beta$ ; α and β are constants that maximize the classification performance as defined by [25]. In practice, α and β can be determined using a grid search method from offline analysis.
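As a worked example, assume α = 1.25 and β = 0.25 (the values quoted in the Results section). The two subband weights then evaluate to approximately

$w(1)={{1}^{-1.25}}+0.25=1.25,\qquad w(2)={{2}^{-1.25}}+0.25\approx 0.42+0.25=0.67,$

so that equation (7) becomes ${{\rho}_{k}}\approx 1.25\,\rho _{1,k}^{2}+0.67\,\rho _{2,k}^{2}$ . With these values, the full-band correlation is weighted almost twice as heavily as the second-subband correlation.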

2.4. Target identification

Considering the distinct spectral properties of multiple harmonic frequencies, CCA can be represented as a subset of filter banks so that independent information from harmonic components has great potential to improve the accuracy of SSVEP frequency recognition [25]. It was reported in [19] and [26] that NCCA and BsCCA, respectively, can improve the classification results of standard CCA for SSVEP-based BCI. Therefore, the representation of NCCA and BsCCA as subsets of a filter bank can also improve performance. Figure 2 illustrates the procedure of the proposed target identification using filter bank analysis. After this analysis, a feature extraction method using either CCA, NCCA, or BsCCA was applied to each subband component separately to obtain a set of correlation values: $\rho _{k}^{(1)},\ldots,\rho _{k}^{(N)}$ . The feature for target identification was then calculated as a weighted sum of squares of the correlation values corresponding to all subband components:

$\begin{eqnarray}&&{{\hat{\rho}}_{k}}=\underset{n=1}{\overset{N}{\sum}}\,v(n){{\left(\rho _{k}^{(n)}\right)}^{2}},\end{eqnarray} \tag{ 8 }$

where the weight for $n=1,\ldots,N$ can be defined as $v(n)={{n}^{-\alpha}}+\beta$ , and N represents the total number of subbands. Finally, ${{\hat{\rho}}_{k}}$ , which corresponds to all stimulation frequencies (i.e. ${{\hat{\rho}}_{1}},\ldots,{{\hat{\rho}}_{12}}$ ), is considered to determine the frequency of the SSVEPs as

$\begin{eqnarray}&&\hat{f}=\text{arg}\,\underset{{{f}_{k}}}{{\max}}\,\;{{\hat{\rho}}_{k}},\;k=1,\ldots,K.\end{eqnarray} \tag{ 9 }$
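Equations (8) and (9) amount to a weighted sum followed by an argmax, which can be sketched in plain Python. The function names are hypothetical, the correlation values are made up for illustration, and the defaults α = 1.25 and β = 0.25 are the values quoted in the Results section.

```python
def fuse_subband_correlations(rho_by_subband, alpha=1.25, beta=0.25):
    """Equation (8): weighted sum of squared correlations over subbands.

    rho_by_subband[n - 1][k] holds the correlation of subband n for stimulus k.
    """
    n_stimuli = len(rho_by_subband[0])
    fused = [0.0] * n_stimuli
    for n, rho_n in enumerate(rho_by_subband, start=1):
        weight = n ** (-alpha) + beta               # v(n) = n^(-alpha) + beta
        for k, rho in enumerate(rho_n):
            fused[k] += weight * rho ** 2
    return fused


def identify_target(stim_freqs, fused):
    """Equation (9): pick the stimulus frequency with the largest fused feature."""
    best_k = max(range(len(fused)), key=fused.__getitem__)
    return stim_freqs[best_k]


# Made-up correlations for three stimuli and two subbands (illustration only).
stim_freqs = [9.25, 9.75, 10.25]
rho_by_subband = [[0.30, 0.55, 0.35],
                  [0.20, 0.50, 0.25]]
fused = fuse_subband_correlations(rho_by_subband)
print(identify_target(stim_freqs, fused))
```

The per-subband correlations could come from CCA, NCCA, or BsCCA; the fusion step itself does not depend on which feature extraction method produced them.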

**Figure 2.** This flowchart shows the different components of the proposed system for frequency recognition in an SSVEP-based BCI. The values ${{b}_{1}},{{b}_{2}},\ldots,{{b}_{N}}$ represent the subbands proposed in [25], and N represents the total number of subbands.

2.5. Performance evaluation

In addition to classification accuracy, BCI performance was also evaluated by ITR [2]. The ITR is a standard measure of communication systems, which indicates the amount of information communicated per unit of time. The ITR can be expressed in the form below:

$\begin{eqnarray}&&\begin{array}{*{35}{l}} \text{ITR}=\frac{60}{T}\left\{{{\log}_{2}}K+A{{\log}_{2}}A+\,(1-A){{\log}_{2}}\left[\frac{(1-A)}{K-1}\right]\right\}, \end{array}\end{eqnarray} \tag{ 10 }$

where K is the total number of commands, A is the classification accuracy, and T (seconds/selection) is the average time for a selection. For each target, classification performance was calculated with different values of T (target gazing time: 0.5 s to 4.0 s with an interval of 0.5 s; gaze shifting time: 0.5 s). Since previous studies have presented online spelling experiments with a gaze-shifting duration of 0.5 s [22, 25], a 0.5 s interval was used to estimate practical ITRs, although there was an interval of 1 s between two consecutive stimulations in the experiment. To evaluate the performance of the different methods, feature values were evaluated across all subjects for different stimulation frequencies. In this study, the average computational time required for single-trial target detection was also estimated for each method. Computational time is the length of time required to perform CCA-based feature extraction and classification. Statistical analyses were conducted using MATLAB to compare the classification accuracy of the methods.
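The ITR computation can be sketched as follows. The function name `itr_bits_per_min` is hypothetical, and the factor 60/T, which converts bits per selection into bits per minute, is an assumption that is consistent with table 1: for K = 12, A = 100% and T = 2 s (1.5 s of gazing plus 0.5 s of gaze shifting) it gives 107.55 bits min⁻¹.

```python
import math


def itr_bits_per_min(n_targets, accuracy, selection_time_s):
    """ITR in bits/min from K targets, accuracy A in [0, 1], and T in seconds.

    The 60 / T scaling (bits per selection -> bits per minute) is an assumption
    that reproduces the values listed in table 1.
    """
    bits = math.log2(n_targets)
    if accuracy > 0.0:                     # A * log2(A) tends to 0 as A -> 0
        bits += accuracy * math.log2(accuracy)
    if accuracy < 1.0:                     # (1 - A) term vanishes at A = 1
        bits += (1 - accuracy) * math.log2((1 - accuracy) / (n_targets - 1))
    return bits * 60.0 / selection_time_s


# 1.5 s of gazing plus 0.5 s of gaze shifting gives T = 2 s per selection.
print(round(itr_bits_per_min(12, 1.0, 2.0), 2))    # 107.55
```

The two guards only avoid evaluating log2(0) at the end points of the accuracy range; for accuracies strictly between 0 and 1 the function evaluates the bracketed expression of equation (10) directly.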

3. Results

Two parameters were optimized by a grid search: L, the number of neighboring frequencies in NCCA, and σ, the suppression parameter in BsCCA. The number of subbands in the filter bank method is another crucial factor for classification performance. The recognition accuracy and ITR of NCCA for different values of L and different numbers of subbands are shown in figure 3(a). Note that the case N = 1 with BsCCA corresponds to the standard BsCCA originally proposed in [26]. The average accuracy and ITR were calculated across all the subjects with 1 s data epochs. The maximum performance of NCCA was achieved with (N = 2, L = 6). Similarly, the optimized parameters for the BsCCA approach were (N = 4, $\sigma =6$ ), as illustrated in figure 3(b). Note that standard CCA with the filter bank approach requires no parameter optimization other than the number of subbands. The average recognition accuracy of the different methods across subjects is shown in figure 4, where the optimal parameters are used in all cases. The maximum accuracy of CCA is achieved with N = 5. The selected optimal parameters were used to conduct the experiments in the performance evaluation and comparison.

**Figure 3.** Grid search for optimizing the parameters L, σ, and number of subbands N. Classification accuracy and ITR were averaged as a function of subbands L and σ, which were obtained by NCCA (a) and BsCCA (b), respectively, with 1 s data epochs. In both cases, the results for all 10 subjects were averaged.

**Figure 4.** Average frequency recognition accuracy across all subjects obtained by CCA, NCCA (with L = 6), and BsCCA (with $\sigma =6$ ) as a function of subbands N. Error bars indicate standard errors.

The average recognition accuracy and ITR across all subjects with data lengths from 0.5 s to 4 s are shown in figure 5. The parameters α and β used to calculate the weighting factor in equation (7) were set to 1.25 and 0.25, respectively, according to [25]. On average, BsCCA outperforms the other methods at every data length, whereas the performances of CCA and NCCA are quite similar. The comparison indicates that BsCCA performs better than the other methods over a wide range of SSVEP data lengths.

The following statistical analysis confirms that the accuracy and ITR of BsCCA are superior to those of the other methods. The independent variable was 'Condition' with three levels (CCA, NCCA, and BsCCA), and the dependent variable was 'Accuracy' at different time points. Since the accuracy was estimated under three levels (i.e. methods) at different time points for the same subjects, we used a two-way repeated-measures analysis of variance (ANOVA) to determine whether the differences in accuracy were statistically significant. The p value was adjusted using the Bonferroni correction. The ANOVA showed significant main effects of the method (F(2,18) = 11.50, p < 0.01) and data length (F(7, 63) = 110.66, p < 0.01). There was also a significant interaction between the method and data length (F(14,126) = 8.42, p < 0.01). All methods achieved the maximum ITR when the data length was 1.5 s (CCA: $69.29\pm 32.76$ bits min⁻¹; NCCA: $69.44\pm 32.24$ bits min⁻¹; BsCCA: $77.04\pm 31.06$ bits min⁻¹). In the remaining simulations, a data length of 1.5 s was used for performance comparison. Table 1 lists the classification accuracy and ITRs for each subject with a data length of 1.5 s. According to paired t-tests, BsCCA obtained significantly higher average accuracy across subjects than the other methods (BsCCA versus CCA: p < 0.001; BsCCA versus NCCA: p < 0.05), whereas there was no significant difference in accuracy between CCA and NCCA (p = 0.82). A similar pattern is observed for the ITR. The BsCCA-based approach achieves the highest performance for nine of the ten subjects; the exception is s5, for whom NCCA performs best (table 1).

Table 1. Accuracy (%) and ITR (bits min⁻¹) for each subject with 1.5 s data length.

Subjects	Evaluation	CCA	NCCA	BsCCA
s1	Accuracy	38.33	47.22	48.33
	ITR	14.74	22.84	23.95
s2	Accuracy	41.11	42.22	50.55
	ITR	17.12	18.11	26.24
s3	Accuracy	65	60.55	75.00
	ITR	43.20	37.58	57.26
s4	Accuracy	91.66	84.44	95.55
	ITR	86.49	72.70	95.07
s5	Accuracy	92.77	98.89	96.11
	ITR	88.83	103.75	96.40
s6	Accuracy	97.77	94.44	99.44
	ITR	100.63	92.50	105.48
s7	Accuracy	87.77	88.89	93.33
	ITR	78.79	80.92	90.03
s8	Accuracy	98.88	99.44	100
	ITR	103.75	105.48	107.55
s9	Accuracy	92.22	92.78	95.00
	ITR	87.65	88.83	93.77
s10	Accuracy	83.88	83.89	85.56
	ITR	71.72	71.72	74.68
Mean ± STD	Accuracy	78.94 ± 22.75	79.28 ± 21.33	83.89 ± 19.60
	ITR	69.29 ± 32.76	69.44 ± 32.24	77.04 ± 31.06

The average computational time across subjects required for single-trial analysis with the CCA, NCCA, and BsCCA methods for different numbers of subbands is shown in figure 6. The time was measured using MATLAB R2014b on a MacBook Pro (with an Intel Core i5 processor and 8 GB of RAM). Note that all other processing, including optimized reference signal generation, is excluded from the computational time. The recognition time increases linearly with the number of subbands. In every case, NCCA requires the highest computational time, and the standard CCA requires the shortest. Besides the dependence on the number of subbands, NCCA requires extra time for normalization after computing the CCA. BsCCA requires a comparable amount of time to NCCA in order to apply the sets of reference signals to the binary subband components. For individual subbands, the standard CCA-based method does not require any further processing for frequency recognition and therefore requires less computational time.

**Figure 6.** Computational time (ms) for single-trial analysis across all subjects with 1.5 s data epochs as a function of subbands. Error bars indicate standard errors.

4. Discussion

The statistical analysis shows that the proposed multiband BsCCA approach is superior to the related techniques in terms of frequency-recognition accuracy and ITR. This method demonstrates higher performance over a wide range of time lengths for the ten subjects used in the experiment. Correlation coefficients between each target and neighboring non-targets are used as a feature for classification to evaluate the performance of the proposed methods [27]. Therefore, the underlying reason for the improvement in recognition accuracy is that the proposed method extracts discriminative features between target and non-targets. Figure 7 illustrates the average feature values of SSVEPs across all the subjects for a stimulus frequency of 11.75 Hz. The range of the feature values for standard CCA, NCCA, and BsCCA is from 0 to 1. These values are calculated as a canonical correlation between the test data and the reference signals. The features of BsCCA are obtained by equation (8). Furthermore, it is evident from figure 7 that the feature values extracted by the BsCCA method improve the discriminability between the target frequency and the non-target frequencies. The values of ρ remain at a stable level for non-target frequencies compared to the other methods. Such characteristics suggest that the proposed method can effectively improve the recognition of SSVEP frequencies.

**Figure 7.** Average feature values of SSVEPs across all subjects at 11.75 Hz stimulation frequency with 1.5 s data epochs. Error bars indicate standard errors.

The ITR is an important measure of the performance of a BCI. A comparison study of CCA variants has been reported with an EEG dataset for 12 visual targets [27]. The ITR of unsupervised CCA with a data length of 2 s reached 50 bits min⁻¹. Some of the supervised methods significantly improved on the ITR of the standard (unsupervised) CCA (Cluster analysis of CCA coefficients [33]: 52.44 bits min⁻¹; Multiway CCA [20]: 64.15 bits min⁻¹; L1-regularized Multiway CCA [34]: 65.06 bits min⁻¹; Multiset CCA [21]: 66.22 bits min⁻¹ for 1.5 s data length; and Individual Template-Based CCA (IT-CCA) [35]: 71.37 bits min⁻¹ for 1 s data length). The ITR of the proposed unsupervised method is close to the result obtained with the combination of CCA and IT-CCA (91 bits min⁻¹ [27]). Some studies achieved a high ITR by increasing the number of targets and improving the accuracy of target selection [36]. Most of the algorithms for SSVEP-based BCI achieved their highest ITR with a data length of 1 s or 2 s. For example, the simulated ITR of the 32-target speller reported by Nakanishi et al, which used a mixed frequency and phase coding method, was 167 bits min⁻¹ [22]. Chen et al reported an ITR of 105 bits min⁻¹ in a 45-target system [36]. Bin et al proposed a code-modulated VEP (c-VEP) paradigm, which reported an ITR of 108 bits min⁻¹ in a 32-target speller [35]. However, such BCI systems always require pre-recorded data for individual calibration, and the user's gaze needs to be synchronized with the system. In contrast, pre-constructed signal sets are used as reference signals in the unsupervised method, which is more convenient for real applications than other BCI systems.

The average frequency recognition accuracy of the proposed method is stable over a wide time window (i.e. from 0.5 s to 4.0 s [ $\sigma =6$ , N = 4]; see figure 5). NCCA provided stable accuracy for longer data lengths but lower accuracy for shorter data lengths. Notably, the computational time of CCA with a filter bank increases with the number of subbands. Since the longest computational time in all conditions is around 20 ms, all the methods could be executed in nearly real time in an online system. In addition, as these methods do not require calibration data, the system can be used without any training procedures. By comparison, Nakanishi et al suggested that more than five trials of training data should be recorded to achieve higher performance using CCA with calibration data than using other methods [22]. Assuming that 5 trials of 12 visual stimuli are recorded at 5 s each, the recorded training data would be around 5 min long. The BsCCA-based method could eliminate this procedure without compromising performance. In this way, BsCCA could improve the usability of an SSVEP-based BCI system.

Further research is required to increase the number of stimuli without compromising performance for various kinds of practical real-world BCI applications. The influential parameters of the BsCCA method were selected by a grid search; in future work, these parameters should be adjusted in a data-adaptive manner. SSVEP responses to high-frequency visual stimuli are also important for reducing the fatigue of patients [37–39]. We plan to conduct an extensive study using high-frequency stimuli with the multiband BsCCA method to address such problems. In a recent study, an advanced signal processing method based on CCA with individual calibration data drastically improved the performance of an SSVEP-based BCI [40]. The method, however, requires a training session for recording users' calibration data before online operation, which is time consuming. In other studies, hybrid methods using multiple EEG responses (e.g. SSVEP and P300) have also been successfully employed to improve the performance of BCIs [41, 42]. The proposed BsCCA-based method could be integrated into these hybrid BCIs to enhance the accuracy of SSVEP detection without any training data. Thus, there are several avenues for further research to implement a high-speed BCI speller with an increased number of commands.

Acknowledgments

This work was supported in part by the Japan Society for the Promotion of Science (JSPS) under KAKENHI Grant Number 15H04002.
