An SSVEP-based BCI with 112 targets using frequency spatial multiplexing

Objective. Building brain–computer interface (BCI) systems with large, directly accessible instruction sets is one of the difficulties in BCI research. Research on achieving high target resolution (⩾100) has not yet entered a rapid development stage, which is at odds with application requirements. Steady-state visual evoked potential (SSVEP) based BCIs have an advantage in terms of the number of targets, but the competitive mechanism between the target stimulus and its neighboring stimuli is a key challenge that prevents the target resolution from being improved significantly. Approach. In this paper, we reverse the competitive mechanism and propose a frequency spatial multiplexing method to produce more targets with limited frequencies. In the proposed paradigm, we replicate each flicker stimulus as a 2 × 2 matrix and arrange the matrices of all frequencies in a tiled fashion to form the interaction interface. Using different arrangements, we designed and tested three example paradigms with different layouts. Furthermore, we designed a graph neural network that distinguishes between targets of the same frequency by recognizing the different electroencephalography (EEG) response distribution patterns evoked by each target and its neighboring targets. Main results. Extensive experiments with eleven subjects were performed to verify the validity of the proposed method. The average classification accuracies in the offline validation experiments for the three paradigms are 89.16%, 91.38%, and 87.90%, with information transfer rates (ITR) of 51.66, 53.96, and 50.55 bits/min, respectively. Significance. This study utilizes the positional relationship between stimuli and does not circumvent the competing response problem. Therefore, other state-of-the-art methods that focus on enhancing the efficiency of SSVEP detection can be used as a basis for the present method to achieve very promising improvements.


Nomenclature
$H_m$, $H_{N(v_m^i)}$, $H_c$: Hidden layer features of nodes of $G_m$, $N(v_m^i)$, and $G_c$.
$R_k$: Reference template for the $k$th stimulus frequency.
$S_{i_{sb}}$: $i_{sb}$th subband filtered EEG signal.
$T_m^{emb}$: Temporal embedding layer parameter in $G_m$.
$W_\alpha$: Edge weights of attention aggregators.
$X_m$, $X_{N(v_m^i)}$, $X_c$: Features of $G_m$, $N(v_m^i)$, and $G_c$.
$Y_c$: Predicted output of DSGAT.
$G_m$, $G_c$: Intermediate and minimum graphs.
$N(v^i)$: The 1-step neighbors of node $v^i$.
$\tilde{W}_\alpha$: Softmax normalized $W_\alpha$.
$\varrho_k$: FBCCA score at frequency $f_k$.
$c_{ij}$: $j$th minimum cell in $M_i$.
$k$: Index of attention aggregator groups.
$l$: Index of DSGAT layers.
$L_{ij}$: L-shaped region of $c_{ij}$.

Introduction
Research on improving brain-computer interface (BCI) performance mainly follows two strategies: enhancing classification algorithms and designing more effective BCI paradigms. An effective paradigm can maintain more targets while eliciting sufficiently strong brain response activity. The design of a new paradigm seeks to build a larger instruction set to improve the efficiency of target selection, but an increase in the number of targets usually raises classification difficulty. Therefore, BCI systems need to make a compromise between the number of targets and classification performance [1,2]. In terms of instruction set size, evoked potential-based BCI has an advantage over spontaneous BCI and is thus often used in applications that require a large number of options, such as BCI-based spellers [3] or scenarios that require more refined intentions [4]. Steady-state visual evoked potential (SSVEP) based BCI usually has a higher information transfer rate (ITR) and a larger instruction set than other BCI paradigms [5,6]. When the human eye is stimulated by periodic flicker, the occipital region of the brain generates a modulated signal of the corresponding frequency, which produces energy enhancement at that frequency or at its second or third harmonics [7].
The frequency range capable of evoking SSVEP components can be roughly divided into three intervals: the low-frequency band (6-12 Hz), the middle-frequency band (12-30 Hz), and the high-frequency band (30-60 Hz). The harmonic nature of SSVEP narrows the range of available frequencies for an SSVEP paradigm. In addition, there is a limit to the minimum difference between neighboring frequencies that can be discriminated, and the smaller the difference, the greater the difficulty posed to the classification algorithm. Therefore, the frequency band used for SSVEP-BCI is limited [8]. SSVEP paradigm innovations in recent years have focused on designing mechanisms that generate more target options using limited stimulation frequencies. Hwang et al proposed a dual-frequency stimulation method in which the black and white patterns in the checkerboard paradigm are set to flip at two different frequencies, thus enabling an SSVEP-BCI with 12 options using four frequencies [9]. Chen et al used three luminance variation frequencies combined with two color variation modulation frequencies to evoke an intermodulation frequency response, producing eight targets from three frequencies [10]. Kimura et al devised a new way of encoding stimuli that uses binary digit encoding of different frequency sequences to increase the number of visual stimuli with different characteristics [11]. Liang et al proposed a new dual-frequency and phase modulation paradigm to optimize the combinations of two frequencies in the checkerboard-like dual-frequency paradigm [12]. Chen et al implemented an SSVEP paradigm with 160 targets using the idea of multiple frequency sequential coding, in which a stimulus sequentially flickers at different frequencies [13].
To summarize these approaches, the main ideas for increasing the number of targets focus on (1) multiplexing stimulus frequencies, including temporal and spatial multiplexing, and (2) increasing the heterogeneous properties of same-frequency stimuli. Time-division multiplexing comes at the cost of increased time. In contrast, spatial multiplexing can use the information of different frequencies at the same time, so it has the potential to further improve efficiency and is therefore an approach worth exploring.
In addition, when a subject focuses on a target flicker, its neighboring stimuli also evoke SSVEP components. Therefore, previous studies typically increase the distance between adjacent stimuli and require subjects to reduce their attention to neighboring distractors [14][15][16][17][18]. In contrast, we reverse the use of the competition mechanism and propose a new method to produce more targets with limited frequencies.
This paper proposes a stimulus frequency spatial multiplexing method to design an SSVEP-BCI paradigm with 112 targets. The paradigm is not designed around individual stimulus targets; rather, it uses the location relationship between different stimuli to increase the attribute differences of targets with the same frequency. Specifically, different stimuli of the same frequency are in different locations, and the frequencies of the stimuli adjacent to them are different. This compound location relationship is used to encode targets, thus expanding 40 stimulus frequencies into 112 targets.

Frequency spatial multiplexing paradigm

Neurological principles
SSVEP responses are usually strongest for the stimulus located in the center of the visual field, with an approximately Gaussian distribution gradually decaying outward [7]. Fuchs et al investigated the competitive neuronal dynamics in cortical networks of early visual processing in the human brain using stimuli at two frequencies. The results showed that, during attention to the target stimulus, the proximity of a competing stimulus (visual angle less than 4.5°-5°) leads to an increase in the amplitude of the competing frequency response accompanied by a significant enhancement of the intermodulation frequency, but the intensity of the target frequency response is not diminished because the attentional mechanism releases the inhibitory effect on the target stimulus [14]. This mechanism of integrated neuronal processing of the target and competing stimuli provides the theoretical basis for this study: the visual stimulus layout is designed so that targets of the same frequency have different competing neighbors, forming a neighbor encoding in which all visual stimuli serve as both targets and competing stimuli for each other, thus making full use of the available frequencies.

Paradigm design
A total of 40 frequencies were used, ranging from 8 Hz to 15.8 Hz in 0.2 Hz steps. Each frequency is replicated as a 2 × 2 matrix of minimum cells, referred to as an intermediate cell $M_i$, and figure 1 shows the resulting interface. The interface contains 160 minimum stimulus cells, but the cells outside the red box are not sufficiently distinguishable due to the lack of neighbors, so they serve as competing stimuli to assist in locating the inner cells. Therefore, the cells that can actually be distinguished are those in the inner 8 × 14 region, i.e. 112 targets (the red boxed area in figure 1). The visual stimuli were presented using sinusoidal luminance variation; the phase within an intermediate cell was constant, with a phase difference of 0.5π between adjacent intermediate cells.
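The counting argument above can be checked with a minimal sketch. It assumes a hypothetical row-major tiling of the 40 frequencies into 5 × 8 intermediate cells (the actual frequency arrangement of the paradigms is given only in the figures), and the names `build_grid` and `signature` are illustrative. The sketch confirms that every inner target is identified by its own frequency together with the frequencies of its nearest differing neighbors.

```python
# Hypothetical layout: 5 x 8 intermediate cells, each a 2 x 2 block of
# minimum cells sharing one frequency (row-major order is an assumption).
TILE_ROWS, TILE_COLS = 5, 8
FREQS = [round(8.0 + 0.2 * k, 1) for k in range(40)]  # 8.0 ... 15.8 Hz


def build_grid():
    """Return a 10 x 16 grid holding the frequency of each minimum cell."""
    rows, cols = 2 * TILE_ROWS, 2 * TILE_COLS
    return [[FREQS[(r // 2) * TILE_COLS + (c // 2)] for c in range(cols)]
            for r in range(rows)]


def signature(grid, r, c):
    """Own frequency plus the differing frequencies among the 8 nearest
    neighbors, i.e. the frequencies found in the L-shaped region."""
    own = grid[r][c]
    around = {grid[r + dr][c + dc]
              for dr in (-1, 0, 1) for dc in (-1, 0, 1)
              if (dr, dc) != (0, 0)}
    return own, tuple(sorted(around - {own}))


grid = build_grid()
# Inner 8 x 14 region: every cell here has all eight neighbors.
inner = [(r, c) for r in range(1, 9) for c in range(1, 15)]
signatures = {signature(grid, r, c) for r, c in inner}

print(len(grid) * len(grid[0]), len(inner), len(signatures))
```

Under this assumed tiling, the number of distinct signatures equals the number of inner cells, which is the property the neighbor encoding relies on.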

Target fixation method with a regional attention
In this paper, we used target fixation with regional attention, in which the subject gives selective attention to the neighboring stimuli while gazing at the target. Taking the AA paradigm as an example (figures 1 and 2(a)), when gazing at a target $c_{ij}$, the eight nearest neighbors around it are the competing stimuli closest to the center of the visual field, which can theoretically evoke stronger SSVEP responses. Among these neighbors, the stimuli that can distinguish $c_{ij}$ from the other three corners ($c_{iq}$, $q \neq j$) are those in the L-shaped region $L_{ij}$ in the figure. Therefore, the simultaneous regional attention method is to give selective attention to $L_{ij}$ while gazing at $c_{ij}$. Due to the large number of targets in the paradigm, it is difficult to quickly locate $c_{ij}$ in $M_i$ and the corresponding $L_{ij}$, and the target cell may even be lost during the selection process. To assist in locating the target, the colors of the intermediate cells were set to alternate between white and yellow in the experiments, as shown in figure 2.

Target classification
The target classification process consists of two steps:

SSVEP detection of intermediate cells
The filter bank canonical correlation analysis (FBCCA) algorithm [19] was used. FBCCA is a classical and efficient classification algorithm in SSVEP-BCIs: it decomposes the EEG signal into several subband components, calculates the correlation coefficient of each subband separately using CCA, and then classifies the weighted features. The 12-channel EEG signals are divided into subbands by bandpass filters, each with the same upper cutoff frequency (88 Hz) and a different lower cutoff frequency. For the $i_{sb}$th subband, the lower cutoff frequency is $i_{sb} \times 8$ Hz, and the filtered EEG signal is $S_{i_{sb}}$. The sinusoidal reference template $R_k \in \mathbb{R}^{2N_h \times N_s}$ for the $k$th stimulus frequency is

$$R_k = \begin{bmatrix} \sin(2\pi f_k t / F_s) \\ \cos(2\pi f_k t / F_s) \\ \vdots \\ \sin(2\pi N_h f_k t / F_s) \\ \cos(2\pi N_h f_k t / F_s) \end{bmatrix}, \quad t = 1, \ldots, N_s,$$

where $f_k$ is the $k$th frequency, $N_h$ is the number of harmonics, $N_s$ is the signal length, $t$ is the current time point, and $F_s$ is the sampling rate. CCA finds a set of weight vectors $\alpha$ and $\beta$ to maximize the correlation between $\alpha^T S_{i_{sb}}$ and $\beta^T R_k$:

$$\rho_{i_{sb}}^{k} = \max_{\alpha, \beta} \frac{E\left[\alpha^T S_{i_{sb}} R_k^T \beta\right]}{\sqrt{E\left[\alpha^T S_{i_{sb}} S_{i_{sb}}^T \alpha\right] E\left[\beta^T R_k R_k^T \beta\right]}}, \quad i_{sb} = 1, \ldots, N_{sb}, \quad k = 1, \ldots, N_f,$$

where $N_{sb}$ is the number of filter banks and $N_f$ is the number of stimulus frequencies. The weighted sum of the squared correlation coefficients of all subbands is used as the correlation characteristic with the $k$th frequency, and the weight of each subband is

$$w(i_{sb}) = i_{sb}^{-a} + b,$$

where the values of $a$ and $b$ were optimized in [19] and are set to 1.25 and 0.25, respectively. The response score of an EEG sample at frequency $f_k$ is

$$\varrho_k = \sum_{i_{sb}=1}^{N_{sb}} w(i_{sb}) \left(\rho_{i_{sb}}^{k}\right)^2.$$
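As an illustration of how the pieces of the scoring step fit together, the following minimal sketch builds the sine and cosine reference template and combines per-subband correlations into a score. It is not the authors' implementation: bandpass filtering and the CCA step itself are omitted, the per-subband correlations are taken as given inputs, and the function names and the example values for the number of harmonics and the sampling rate are assumptions.

```python
import math


def reference_template(f_k, n_harmonics, n_samples, fs):
    """Sine/cosine reference with 2 * n_harmonics rows and n_samples columns."""
    rows = []
    for h in range(1, n_harmonics + 1):
        phase = [2 * math.pi * h * f_k * t / fs for t in range(1, n_samples + 1)]
        rows.append([math.sin(p) for p in phase])
        rows.append([math.cos(p) for p in phase])
    return rows


def subband_weights(n_subbands, a=1.25, b=0.25):
    """Subband weight i^(-a) + b for subband index i = 1..n_subbands."""
    return [i ** (-a) + b for i in range(1, n_subbands + 1)]


def fbcca_score(subband_corrs, a=1.25, b=0.25):
    """Weighted sum of squared per-subband canonical correlations."""
    weights = subband_weights(len(subband_corrs), a, b)
    return sum(w * r * r for w, r in zip(weights, subband_corrs))


# Illustrative values only: 5 harmonics, 1 s of data at 250 Hz, 11.6 Hz target.
template = reference_template(11.6, n_harmonics=5, n_samples=250, fs=250)
# Hypothetical canonical correlations of three subbands with one frequency.
score = fbcca_score([0.62, 0.41, 0.30])
print(len(template), len(template[0]), round(score, 4))
```

Because the weights decrease with the subband index, the lowest subbands, which contain the fundamental frequency, contribute most to the score.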

Minimum target classification algorithm
We used a graph neural network (GNN) to further classify the intermediate cell response results. A GNN abstracts irregular data as nodes, and each node is related to its neighbor nodes through edges; thus GNNs can capture the complex structural relationships in graphs [20][21][22]. The graph convolutional network (GCN) is a representative GNN. In particular, the spatial-based GCN borrows the idea of the convolutional neural network to define graph convolution based on the spatial relationship of nodes, which is essentially a process of iteratively aggregating neighborhood information.
The key to the minimum target classification of the proposed paradigm is the SSVEP response distributions of competing neighbors. Therefore, the neighbor aggregation property of GCN fits well with the paradigm design. We designed a dual-scale graph attention network (DSGAT) for the global 112-class classification problem.
The feature vectors of all intermediate nodes form $X_m$. Thus the number of feature channels $N_{T_m}$ of $X_m$ is the number of SSVEP detection times during the whole stimulus process. The first-order neighbors of node $v^i$ are denoted as $N(v^i)$, and the feature matrix of $N(v^i)$ is denoted as $X_{N(v^i)}$. There is a correspondence between $V_m$ and $V_c$, consistent with the inclusive relationship of the two types of cells in the paradigm.

Structure of DSGAT model
Figure 4 illustrates the architecture of DSGAT, consisting of an aggregation module, a fully connected layer, and a softmax output layer, where the aggregation module consists of a temporal embedding layer and a multi-head graph attention layer. The core idea of DSGAT is summarized in three points.
(i) Operations on the graph are performed sequentially at $G_m$ and $G_c$, so that feature information flows from coarse-grained nodes to fine-grained nodes, corresponding to the refinement process of SSVEP responses to global target localization.
(ii) The relationship between the distributions of SSVEP responses to different competing stimuli and target localization is captured using a multi-head graph attention mechanism to cope with the contradiction between repetition and heterogeneity in the paradigm.
(iii) A temporal embedding layer is added to handle the imbalance of SSVEP responses in the time dimension.
The graph structures of the three example interfaces are not identical, and the network structures here are exemplified by the AA paradigm (figure 2(a)).
The network input is the SSVEP detection results at all frequencies, and the input features are first subjected to a neighbor aggregation operation on $G_m$. For the AA paradigm, the 1-step neighbors of an intermediate node are its 8-neighborhood intermediate cells as well as itself. Unlike $G_m$, the nodes in $G_c$ have an additional attribute: the location of the minimum cell within the intermediate cell, and thus there are four location attributes. Therefore, $X_m$ is fed into four sub-networks with the same structure but different parameters to distinguish the four minimum nodes. The output of the $p$th ($p = 1, 2, 3, 4$) sub-network is passed to the $p$th group of minimum nodes, thus allowing the information to flow from the intermediate nodes to the more numerous minimum nodes.
The SSVEP detection results of the FBCCA algorithm have fluctuating amplitudes over time, and the stimulus duration affects the accuracy of SSVEP detection. Therefore, a temporal embedding layer is placed before aggregation, which creates a learnable temporal embedding matrix $T_m^{emb} \in \mathbb{R}^{|N(v_m^i)| \times N_{T_m}}$ to compensate for this temporal disequilibrium. For the $p$th sub-network, the embedded feature matrix of node $v_m^i$ and its 1-step neighbor nodes is obtained as $\sigma_{ReLU}(T_m^{emb} \circ X_{N(v_m^i)})$, where $\circ$ is the Hadamard product operation and $\sigma_{ReLU}$ is the ReLU activation function.
The embedded feature signals are then aggregated over the neighborhood through the graph attention layer. There are two types of relationships between the intermediate nodes $v_m^i$ and $v_m^j \in N(v_m^i)$: a frequency relationship and a location relationship. When gazing at a target in SSVEP-BCI, the responses of its competing stimuli are also enhanced in both the spatial and frequency domains. For $v_m^i$, the nodes $v_m^j \in N(v_m^i)$ have a regular arrangement in space and frequency: the left-right, up-down, and diagonal neighbors are symmetric to each other and have equal frequency differences to the target stimulus.
An intermediate node is related to four minimum nodes, and when a minimum node is gazed at, the SSVEP responses of neighboring intermediate nodes in the direction of other minimum nodes are also affected; thus, the target localization needs to consider all 1-step neighbors of the intermediate node.
The influence of neighboring intermediate nodes differs across minimum nodes, so the attention mechanism is applied to learn this difference adaptively, with different weights on the edges.
The global localization from 40 frequency responses to 112 targets is essentially a 4-class classification problem. The four feature distributions of the intermediate nodes and their 1-step neighbors determine the classification results of the four minimum nodes. We used a multi-head graph attention layer to learn the four distribution patterns (unlike the standard graph attention layer, DSGAT restricts the attention calculation to 1-step neighbors, and the aggregation is also varied). Each sub-network uses $K_a$ independent attention aggregators. Suppose the input feature vector of the central node $v_m^i$ at the $l$th layer is $H_m^{i(l)}$ and the weight coefficient of the neighbor node $v_m^j$ to $v_m^i$ is $\alpha_{ij}$; the parameters of layer $l$ are then the edge weights $W_\alpha$ formed by these coefficients. For the $k$th ($k = 1, \ldots, K_a$) group of attention aggregators, the edge weights are softmax normalized to give the new coefficients $\tilde{W}_\alpha$. The new feature vector of node $v_m^i$ is obtained by a weighted summation over its neighbors using the normalized coefficients, followed by the LeakyReLU activation function $\sigma_{LReLU}$. The outputs of the $K_a$ groups are combined by the concatenation operation ($\|$) into an aggregated feature vector of the $p$th group of minimum nodes. To avoid over-smoothing, the new feature vectors of the minimum nodes are obtained by skip-concatenating the pre-aggregation and aggregated feature vectors. After the information is propagated from the intermediate nodes to the minimum nodes, the subsequent computation is performed on $G_c$. The aggregated features are fed to a fully connected layer and a final softmax output layer, which produce the final predicted output $Y_c$; here $T_c$ is the number of feature channels after aggregation, and $W_f$ and $b_f$ are the fully connected layer parameters.
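The aggregation steps described above can be sketched in plain Python to make the data flow concrete. This is an illustrative reading of the text rather than the DSGAT implementation: the edge weights are treated as directly learnable values, the central node is assumed to sit at index 0 of its neighborhood, the LeakyReLU slope is an arbitrary choice, and all function names are hypothetical.

```python
import math


def relu(x):
    return max(0.0, x)


def leaky_relu(x, slope=0.01):  # slope value is an assumption
    return x if x > 0 else slope * x


def softmax(values):
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_embed(x_neigh, t_emb):
    """Hadamard product with the embedding matrix, followed by ReLU."""
    return [[relu(t * x) for t, x in zip(t_row, x_row)]
            for t_row, x_row in zip(t_emb, x_neigh)]


def aggregate(h_neigh, edge_weights):
    """One attention aggregator: softmax-normalize the edge weights, take the
    weighted sum of neighbor features per channel, then apply LeakyReLU."""
    alpha = softmax(edge_weights)
    n_channels = len(h_neigh[0])
    return [leaky_relu(sum(a * h[t] for a, h in zip(alpha, h_neigh)))
            for t in range(n_channels)]


def sub_network(x_neigh, t_emb, heads, center=0):
    """Embed, run every attention aggregator, concatenate their outputs, and
    skip-concatenate the central node's pre-aggregation features."""
    h = temporal_embed(x_neigh, t_emb)
    aggregated = []
    for edge_weights in heads:  # one entry per attention aggregator
        aggregated.extend(aggregate(h, edge_weights))
    return h[center] + aggregated


# Toy input: 9 nodes (center plus 8 neighbors), 3 time channels, 2 aggregators.
x = [[1.0] * 3 for _ in range(9)]
t_emb = [[1.0] * 3 for _ in range(9)]
features = sub_network(x, t_emb, heads=[[0.0] * 9, [0.0] * 9])
print(len(features))
```

With two aggregators and three channels, the output holds 3 pre-aggregation values followed by 6 aggregated values, which would then be passed to the fully connected and softmax layers.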
The aggregation operation in DSGAT is consistent for all nodes, but the nodes located at the outermost circle of the paradigm lack sufficient neighbors. For this case, borrowing from the padding method in convolutional neural networks, virtual nodes are constructed to fill the missing neighbors. Suppose a virtual node used for padding is $v^{pad}$, and its initial feature vector is $X_m^{pad} \in \mathbb{R}^{N_{T_m}}$. The $t$th value of $X_m^{pad}$ is the minimum value of the $t$th channel of $X_m$.
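The padding rule amounts to a per-channel minimum over all intermediate nodes. A minimal sketch, with a hypothetical function name and toy data, is:

```python
def padding_feature(x_m):
    """Feature vector of a virtual padding node: for each time channel, the
    minimum value of that channel over all intermediate nodes.

    x_m is a list of node feature vectors, one per intermediate node.
    """
    return [min(channel) for channel in zip(*x_m)]


# Toy example: three nodes with two time channels each.
x_m = [[0.3, 0.9], [0.5, 0.1], [0.4, 0.6]]
print(padding_feature(x_m))
```

Using the channel minimum means a padded neighbor never appears more responsive than any real stimulus at the same time point.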

Subjects
Eleven healthy subjects (six males; aged 23-32 years, mean age 26.6 years) participated in the experiment, and all participants had normal or corrected-to-normal vision. Two subjects had experience with SSVEP-BCI experiments, and the others did not. This study was performed in accordance with the Declaration of Helsinki. This human study was approved by The Ethics Committee of the Xiangya Hospital of Central South University (approval: 2021 111 249). All adult participants provided written informed consent to participate in this study.

Experiment process
The visual paradigm was presented on a 27 inch LCD monitor with a 240 Hz refresh rate and a resolution of 1920 × 1080 px. Frequency stimulus presentation was implemented using the PsychoPy toolkit [23], with 40 frequencies set to 8-15.8 Hz at 0.2 Hz intervals. The experimental procedure was controlled using the BCI2000 platform [24]. Subjects were seated at a distance of 60 cm from the monitor, with the horizontal viewpoint at the monitor center. A single minimum cell spanned 2.81° horizontally and 2.53° vertically and was separated from adjacent cells by approximately 0.65° and 0.59° in horizontal and vertical viewing angles, satisfying the competing stimulus condition.
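As a rough plausibility check of these viewing angles, the sketch below converts them to on-screen sizes and compares the extent of a 16 × 10 grid of minimum cells with the monitor dimensions. The grid shape is inferred from the 160 cells and the inner 8 × 14 region, and the calculation assumes square pixels, a 16:9 panel, and that each angle is measured as if the cell were centered in front of the viewer.

```python
import math


def size_from_angle(angle_deg, distance_cm):
    """On-screen extent (cm) that subtends angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def screen_size_cm(diagonal_inch, res_x, res_y):
    """Width and height (cm) of a panel with square pixels."""
    diag_cm = diagonal_inch * 2.54
    diag_px = math.hypot(res_x, res_y)
    return diag_cm * res_x / diag_px, diag_cm * res_y / diag_px


DISTANCE_CM = 60.0
cell_w = size_from_angle(2.81, DISTANCE_CM)
cell_h = size_from_angle(2.53, DISTANCE_CM)
gap_w = size_from_angle(0.65, DISTANCE_CM)
gap_h = size_from_angle(0.59, DISTANCE_CM)

# Assumed 16 columns x 10 rows of minimum cells with gaps between them.
grid_w = 16 * cell_w + 15 * gap_w
grid_h = 10 * cell_h + 9 * gap_h
screen_w, screen_h = screen_size_cm(27, 1920, 1080)

print(f"cell: {cell_w:.2f} x {cell_h:.2f} cm")
print(f"grid: {grid_w:.1f} x {grid_h:.1f} cm, "
      f"screen: {screen_w:.1f} x {screen_h:.1f} cm")
```

Under these assumptions the grid fits within the screen in both directions, which is consistent with the reported cell sizes and spacing.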
Each subject performed at least one training experiment before the formal experiments to understand the experimental process and adapt to the new paradigm. In the formal experiments, each subject performed three sessions corresponding to the three paradigms. Each session consisted of two groups, with 112 trials in each group, i.e. one data collection for each target in the effective area. The 112 trials were divided into 12 blocks, each containing ten trials (the last block contained two trials). The rest period between groups was determined by the subjects themselves according to their status. Each trial consisted of a visual cue phase and a stimulus flicker phase. In the target cue phase, all stimuli were presented at maximum luminance, and the target to be selected was covered by a red square for 2 s. During the flicker phase, all stimuli flickered for 6.5 s, and the target cue turned from a solid square into a red wireframe. To improve the subjects' attention level and their sense of control over the experiment, we used a real-time feedback mechanism. Since the global classification parameters had not yet been trained, and the FBCCA algorithm for SSVEP detection is an unsupervised method, the currently detected 40-class results were fed back in real time during the flicker phase. SSVEP detection was performed every 0.2 s, and the intermediate cell with the maximum FBCCA score was marked with a red dot. To avoid the effect of experimental sequence, each subject performed a session within one day, and the order of trials was randomized. Before the formal experiment, subjects were asked to perform several practice trials to familiarize themselves with the paradigm, the gaze method, and the experimental process. During the cueing phase in the practice experiment, the target minimum cell was marked by a red square and its adjacent L-shaped area was framed by a red wireframe, indicating the area to which the subject needed to pay attention simultaneously while gazing at the target. The target cue turned into a red box during the flicker phase, and the attention cue disappeared.
For DSGAT training, the data collected in one trial form one sample, and the data of each subject are divided into training and test sets in a ratio of 8:2. The network model uses two attention operations ($K_a = 2$) for each sub-network and contains a one-layer aggregation module. The model is trained with the Adam optimization algorithm to minimize the cross-entropy on the training data for 100 epochs. The initial learning rate is 0.1, with a decay rate of 0.7 after every 20 epochs. The batch size is 128.
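The learning rate schedule can be written out explicitly. The sketch below assumes that "a decay rate of 0.7 after every 20 epochs" means the rate is multiplied by 0.7 at epochs 20, 40, 60, and 80; other readings are possible, and the function name is illustrative.

```python
def learning_rate(epoch, initial=0.1, decay=0.7, step=20):
    """Step decay: multiply the rate by `decay` once every `step` epochs."""
    return initial * decay ** (epoch // step)


# Learning rate at the start of each 20-epoch stage of a 100-epoch run.
for epoch in range(0, 100, 20):
    print(epoch, round(learning_rate(epoch), 5))
```

Under this reading, the rate in the final stage (epochs 80 to 99) is about 0.024, roughly a quarter of the initial value.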
In addition, we used two baselines for comparison with DSGAT, using global classification accuracy as the evaluation metric.

Baseline 1
Baseline 1 (BS1) is an unsupervised method that locates the minimum cell within the intermediate cell based on the SSVEP detection results. For an EEG sample, the intermediate cell detected by FBCCA at time $t$ is the cell $M_{tg}$ whose frequency attains the maximum FBCCA score at that moment. For each $c_{ij}$ in $M_{tg}$, the sum of the SSVEP response scores of $c_{ij}$ and its L-shaped region neighbors at moment $t$ is taken as the new feature value of $c_{ij}$. The minimum cell that most often attains the maximum feature value over a period is taken as the final global target.
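A minimal sketch of this voting procedure is given below. It assumes a hypothetical row-major 5 × 8 arrangement of intermediate cells and a corner numbering of 1 to 4 for top-left, top-right, bottom-left, and bottom-right; neither is specified in the text, and `L_OFFSETS` and `baseline1` are illustrative names.

```python
from collections import Counter

TILE_ROWS, TILE_COLS = 5, 8  # assumed arrangement of the 40 intermediate cells

# Row/column offsets of the three neighboring intermediate cells that make up
# the L-shaped region of each corner (corner numbering is an assumption).
L_OFFSETS = {
    1: [(-1, -1), (-1, 0), (0, -1)],  # top-left
    2: [(-1, 0), (-1, 1), (0, 1)],    # top-right
    3: [(0, -1), (1, -1), (1, 0)],    # bottom-left
    4: [(0, 1), (1, 0), (1, 1)],      # bottom-right
}


def baseline1(score_seq):
    """score_seq: one list of 40 FBCCA scores per detection time.
    Returns (intermediate cell index, corner) with the most votes."""
    votes = Counter()
    for scores in score_seq:
        # Intermediate cell with the highest score at this time point.
        m_tg = max(range(len(scores)), key=scores.__getitem__)
        row, col = divmod(m_tg, TILE_COLS)
        best_corner, best_value = None, float("-inf")
        for corner, offsets in L_OFFSETS.items():
            value = scores[m_tg]
            for d_row, d_col in offsets:
                r, c = row + d_row, col + d_col
                if 0 <= r < TILE_ROWS and 0 <= c < TILE_COLS:
                    value += scores[r * TILE_COLS + c]
            if value > best_value:
                best_corner, best_value = corner, value
        votes[(m_tg, best_corner)] += 1
    return votes.most_common(1)[0][0]


# Toy sequence: cell 18 responds most strongly and its upper-left neighbor
# (cell 9) is enhanced, so corner 1 of cell 18 should collect the votes.
frame = [0.1] * 40
frame[18], frame[9] = 1.0, 0.5
print(baseline1([frame] * 5))
```

Each time point contributes one vote, so brief fluctuations in the neighbor responses are outvoted as long as the correct L-shaped region is the strongest most of the time.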

Qualitative results of SSVEP responses
We first used time-frequency analysis to verify whether the frequency response components of the target and competing stimuli could be evoked simultaneously in the EEG signal. For ease of observation, the paradigm is simplified to a schematic diagram with only the intermediate cells, as shown in figure 6(a). Every subplot clearly shows that the 11.6 Hz peak is evoked, while amplitude peaks at competing and harmonic frequencies are also observed. For $c_{19,1}$, the spectra show amplitude peaks at the target stimulus frequency of 11.6 Hz, the 2nd harmonic frequency of 23.2 Hz, and the competing stimulus frequency of 9.8 Hz. The 9.8 Hz stimuli are located in the neighboring L-shaped region $L_{19,1}$, whereas the amplitudes of competing frequencies farther away do not show a significant increase. Similarly, for the target $c_{19,2}$, an amplitude peak occurs at 10.4 Hz in the $L_{19,2}$ region, while the neighboring competing stimulus responses for $c_{19,3}$ and $c_{19,4}$ show peaks at 13.2 Hz and 13.4 Hz, respectively.
In addition, the spectra of $c_{19,1}$ and $c_{19,2}$ also show amplitude peaks at frequencies such as 20 Hz ($c_{19,1}$) and 19 Hz ($c_{19,2}$). These peaks may be the intermodulation and harmonic components of competing stimulus frequencies. The 20 Hz peak in $c_{19,1}$ may be the 2nd harmonic of the 10 Hz stimulus, or an intermodulation of different frequencies, such as the sum of 10.2 Hz and 9.8 Hz, or the sum of 8.6 Hz and 11.4 Hz. The 19 Hz peak in $c_{19,2}$ may be caused by complex intermodulations, such as the sum of 8.8 Hz and 10.2 Hz.
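The candidate origins of such a peak can be enumerated directly. The sketch below lists, for a given peak, the stimulus frequencies whose 2nd harmonic matches it and the pairs of stimulus frequencies whose sum matches it; it considers only these two mechanisms and works in tenths of a hertz to keep the arithmetic exact. The function name is illustrative.

```python
# The 40 stimulus frequencies in tenths of a hertz: 8.0, 8.2, ..., 15.8 Hz.
FREQS_DHZ = [80 + 2 * k for k in range(40)]


def candidate_sources(peak_hz):
    """Return (second-harmonic sources, sum-frequency pairs) for a peak."""
    peak = round(peak_hz * 10)
    harmonics = [f / 10 for f in FREQS_DHZ if 2 * f == peak]
    sums = [(f1 / 10, f2 / 10)
            for i, f1 in enumerate(FREQS_DHZ)
            for f2 in FREQS_DHZ[i + 1:]
            if f1 + f2 == peak]
    return harmonics, sums


for peak in (20.0, 19.0):
    harmonics, sums = candidate_sources(peak)
    print(peak, harmonics, sums)
```

Many frequency pairs sum to the same value on a 0.2 Hz grid, which is why the spectra alone cannot attribute such a peak to one specific pair; the spatial proximity of the candidate stimuli has to be considered as well.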
These results suggest that the competing response enhancement has different distributions when different minimum cells in the same intermediate cell are targeted, which provides initial verification of the validity of the proposed paradigm. The enhancement is not limited to 1-step neighbors; the 2-step neighbors in this example also show different degrees of amplitude increase, such as 9.6 Hz and 10.4 Hz. This phenomenon may be caused by a combination of frequency, spatial distance, and subjects' selective attention.
Further, we use the FBCCA scores to analyze the time-frequency pattern, again using the four examples above; the results are shown in figure 7. The upper panels show the FBCCA scores of each frequency at different time points. The scores for all frequencies in the beginning phase are higher than those afterward (overfitting due to the short signal length [25]), which makes the later stabilization phase difficult to visualize. Thus the figure shows the scores of all frequencies normalized at each moment. We randomly selected the FBCCA detection results at 4.6 s as an example for visualization, as shown in the lower panels, which display the scores according to the arrangement of intermediate cells.
It can be seen that all plots have the strongest response at 11.6 Hz, and the SSVEP responses become gradually ordered from about 1 s onwards. Two frequencies, 11.4 Hz and 11.8 Hz, also show higher response enhancement than the other frequencies because they are closest to the target 11.6 Hz in both frequency and spatial distance. This dual effect of space and frequency is not limited to 11.4 Hz and 11.8 Hz, as the 2-step neighbors with a 0.4 Hz frequency difference also show different degrees of response enhancement. In addition, the neighbors located on the left and right sides differ slightly from each other, and intuitively the response on the side of the minimum cell is slightly higher than that on the other side. Other competing stimuli also show response enhancement, and the degree of enhancement is correlated with the spatial distance to the target minimum cell, such as 9.  compared to the target frequency and fluctuate over time, which is related to the fixation method with regional attention, with selective attentional wandering leading to instability of competing neighbor responses.
To summarize the time-frequency analysis, the target frequency evokes the strongest SSVEP response, and the 1-step neighboring intermediate cells elicit higher responses than other stimuli, especially in the direction of the minimum target cell. Enhanced responses are also observed for 2-step neighbor stimuli, but they are generally lower than for 1-step neighbors. This subsection provides only a qualitative analysis of the SSVEP response of the intermediate cells through an example to visualize the feasibility of the method. The performance of the proposed method is analyzed statistically below.

Statistical results of SSVEP detection
Figure 8 shows the curves of intermediate cell classification accuracy against data length for each subject, with accuracy calculated every 0.2 s over the 224 trials of each paradigm. This metric reflects the performance of the first step of signal processing. A higher accuracy only represents a more accurate identification of the intermediate cell containing the target, and does not reflect the responses to the competing frequencies. Overall, for the three paradigms, the accuracy tends to increase with data length: it increases rapidly at the beginning and then largely stabilizes.
In terms of accuracy, most of the subjects were able to achieve high accuracy, with some subjects achieving a maximum accuracy close to 100%. There are also individual subjects, such as S8, with an average classification accuracy of only 67.97% for the 40 intermediate cells. In terms of response speed, there were individual differences between subjects, but the accuracy of nearly all subjects stabilized from about 2 s. Thus reliable SSVEP detection results were obtained for most of the stimulation period, providing a basis for the subsequent global classification. It can also be observed in the results of some subjects that the accuracy shows a slight decrease in the later stage. This phenomenon was particularly obvious for S6, whose accuracy shows a decreasing trend after reaching its highest point. One possible reason is that the proximity of the stimuli and their mutual influence make detection difficult, but the main reason is likely the subjects' selective attention operation, which indicates that fixation accompanied by regional attention places demands on the subjects. Subjects could potentially further enhance the efficiency of the method in this study by training and mastering this skill.
In terms of the variability in performance across the three paradigms, we record the statistical accuracy for each subject in table 1 (SSVEP columns). The statistical accuracy was calculated as follows: for each 6.5 s sample, the sequence of SSVEP detection results (one every 0.2 s) was counted, and if the frequency with the highest number of occurrences was consistent with the ground-truth frequency, the sample result was considered correct. The average accuracies of the 11 subjects for AA, NA, and SA are 95.01 ± 7.77%, 94.30 ± 9.34%, and 90.80 ± 10.67%, respectively. One-way repeated measures analysis of variance (RANOVA) was used to test the difference in classification performance across paradigms, with a Greenhouse-Geisser correction and statistical significance defined as p < 0.05. The one-way RANOVA shows that there is a statistically significant difference in accuracies among the three paradigms [F(2,20) = 4.397, p = 0.047]. In the AA and NA paradigms, there are eight competing stimuli around an intermediate cell, whereas in the SA paradigm there are six. We expected that reducing the number of competing stimuli would promote the efficiency of SSVEP detection, but the experimental results were contrary to our expectations. We also note that the accuracy of S9 in the SA paradigm is lower than in the AA and NA paradigms. The reason may be that the staggered row design of the SA paradigm interfered with the subjects' observations, thus affecting the SSVEP detection.
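The statistical accuracy described above is a majority vote over the detection sequence of each sample. A minimal sketch follows; the function names are illustrative, and the handling of ties (the first value to reach the top count wins) is an assumption, since the text does not specify it.

```python
from collections import Counter


def sample_is_correct(detections, true_freq):
    """True if the most frequently detected frequency is the ground truth."""
    most_common_freq, _ = Counter(detections).most_common(1)[0]
    return most_common_freq == true_freq


def statistical_accuracy(samples):
    """samples: list of (detection sequence, ground-truth frequency) pairs."""
    correct = sum(sample_is_correct(seq, freq) for seq, freq in samples)
    return correct / len(samples)


# Toy data: detections made every 0.2 s within two samples.
samples = [
    ([11.4, 11.6, 11.6, 11.6, 11.8, 11.6], 11.6),  # majority is correct
    ([9.8, 9.8, 10.0, 9.8, 10.0, 10.0, 9.8], 10.0),  # majority is wrong
]
print(statistical_accuracy(samples))
```

Because the vote covers the whole sequence, a sample can be counted as correct even when individual detections, especially early ones, are wrong.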

Minimum target classification results
Table 1 records the minimum target classification accuracy of each subject for the three paradigms and the different algorithms. The minimum target classification accuracy for each subject is a 5-fold cross-validation average. The table shows the SSVEP detection accuracy and the accuracies of the DSGAT method, the two baselines, and the DSGAT without temporal embedding (w/o $T_m^{emb}$). Figure 9 shows the average accuracy histogram of the three paradigms using different algorithms. It can be seen that the DSGAT approach achieves higher accuracy in all three paradigms (AA: DSGAT vs BS1, p < 0.001, DSGAT vs BS2, p < 0.01; NA: DSGAT vs BS1, p < 0.001, DSGAT vs BS2, p < 0.05; SA: DSGAT vs BS1, p < 0.001, DSGAT vs BS2, p < 0.005). For the three paradigms, the average 112-class classification accuracies across all subjects using DSGAT are 89.16%, 91.38%, and 87.90%, indicating that all three paradigms are feasible and effective. For each subject, at least one paradigm was able to achieve a classification accuracy of over 70%, and the highest accuracy of each subject (using any paradigm and approach) reaches an average of 92.84%, with subject S11 achieving 99.13% using the AA and NA paradigms.
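To relate these accuracies to the information transfer rates reported in the abstract, the sketch below applies the ITR formula commonly used in BCI studies. The selection time of 6.5 s (the flicker duration) is an assumption, and the paper's own ITR values may have been computed differently, for example by averaging per-subject ITRs, so the output is only expected to be close to the reported figures.

```python
import math


def itr_bits_per_min(n_targets, accuracy, selection_time_s):
    """Information transfer rate for N equiprobable targets at accuracy P,
    assuming errors are spread uniformly over the remaining targets."""
    p = accuracy
    bits = math.log2(n_targets)
    if 0 < p < 1:
        bits += p * math.log2(p)
        bits += (1 - p) * math.log2((1 - p) / (n_targets - 1))
    elif p == 0:
        bits += math.log2(1 / (n_targets - 1))
    return bits * 60 / selection_time_s


# Average DSGAT accuracies of the AA, NA, and SA paradigms.
for name, acc in (("AA", 0.8916), ("NA", 0.9138), ("SA", 0.8790)):
    print(name, round(itr_bits_per_min(112, acc, 6.5), 2))
```

With 112 targets, each correct selection carries close to log2(112), about 6.8 bits, so the ITR is limited mainly by the 6.5 s needed per selection rather than by the accuracy.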
Comparing the three paradigms, the overall average results indicate that the NA paradigm performs best across the different algorithms, the AA paradigm comes second, and the SA paradigm has the lowest average classification accuracy. However, the one-way RANOVA shows no statistically significant difference in accuracy among the three paradigms [F(2,20) = 2.228, p > 0.05].
The minimum cell classification results obtained by DSGAT show no obvious correlation with the SSVEP detection accuracies of the intermediate cells: the Pearson correlation coefficient between the two types of accuracy is -0.206, i.e. the SSVEP detection accuracy does not directly affect the results of DSGAT. For S8 in particular, the SSVEP detection accuracies under the three paradigms are 73.48%, 67.83%, and 62.61%, while the accuracies obtained by the DSGAT algorithm are 90.00%, 92.17%, and 93.91%, respectively. This interesting result indicates that DSGAT is able to compensate for the detection error of FBCCA. Even when SSVEP detection is unsatisfactory, the DSGAT algorithm can still learn the correct mapping relation from the intermediate cell response distribution.
We next compare the performance of the different global localization methods. BS1 is an unsupervised method that intuitively reflects the approximate distribution of competitive stimulus responses. This method obtained the lowest accuracy, indicating that it is difficult to comprehensively capture the underlying relationships and distribution patterns among competing stimuli by analyzing them in a piecemeal fashion. Some subjects achieve an accuracy of 80%, 90%, or even higher with the unsupervised method, while others score much lower; for example, the data of S4 have excellent separability, which directly demonstrates the feasibility of the paradigm. BS2 is more accurate than BS1, and a few subjects achieve results close to those of DSGAT. Compared to the two baselines, DSGAT is more stable and has the smallest standard deviation over 5-fold cross-validation, while the baselines are more affected by the data. For example, S2 and S8 both have very limited baseline results but achieve good results with DSGAT.
Comparing the three algorithms, the two baselines perform 4-classification on the basis of a single intermediate cell screened by FBCCA, and thus global localization is directly affected by the SSVEP detection. In contrast, DSGAT applies global SSVEP response information, which makes it possible to analyze subjects' SSVEP distribution preferences and thus locate the correct target minimum cell, even if the intermediate cell is mislocalized. We further analyzed the relationship between the two-step localization results. For DSGAT and the two baselines, we calculated the proportion of samples with correct global localization ($TP_C$) among those with correct SSVEP detection results ($TP_M$), denoted as $P(TP_C \mid TP_M)$, and among those with incorrect SSVEP detection results ($TN_M$), denoted as $P(TP_C \mid TN_M)$. The results are recorded in table 2. It can be seen that DSGAT can accomplish correct global localization even in the case of incorrect SSVEP detection, whereas the baselines depend on the correct intermediate cell.
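As an illustration of how the two conditional proportions can be computed from per-sample outcomes, the following is a minimal sketch. The function name and the toy boolean lists are hypothetical and are not taken from the study's data.

```python
def conditional_rates(ssvep_correct, global_correct):
    """Return (P(TP_C | TP_M), P(TP_C | TN_M)) from per-sample booleans.

    ssvep_correct[i]:  SSVEP detection (intermediate cell) was correct.
    global_correct[i]: global localization (minimum cell) was correct.
    """
    given_hit = [g for s, g in zip(ssvep_correct, global_correct) if s]
    given_miss = [g for s, g in zip(ssvep_correct, global_correct) if not s]

    def rate(flags):
        # NaN when the conditioning set is empty (no such samples).
        return sum(flags) / len(flags) if flags else float("nan")

    return rate(given_hit), rate(given_miss)


# Hypothetical outcomes for five samples.
ssvep_ok = [True, True, True, False, False]
global_ok = [True, True, False, True, False]
p_given_hit, p_given_miss = conditional_rates(ssvep_ok, global_ok)
print(p_given_hit, p_given_miss)
```

A method that depends entirely on the correct intermediate cell would yield a second value of zero, which is the contrast the comparison in table 2 is meant to expose.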
A temporal embedding operation was added to the DSGAT network. To verify the impact of this operation, we removed the temporal embedding layer in an ablation study. The results are presented in table 1. The temporal embedding operation improved the average accuracy of all subjects by 2.56%, 3.71%, and 3.46% for the AA, NA, and SA paradigms, respectively, demonstrating that the temporal embedding further exploits the process information of SSVEP detection to optimize the classification performance. DSGAT w/o $T_{emb}^{m}$ outperforms the two baselines and for some subjects even achieves higher accuracy than DSGAT, further demonstrating the superiority of treating the paradigm as graph data and showing that, for the proposed paradigm, the key to minimum target classification lies in dealing with the relationship between competing and target stimuli.
Table 3 lists the average ITRs by different algorithms using the three paradigms. Since the data length used was the same for all subjects, only the average ITRs are reported in the table. The highest average ITRs for the three paradigms are 51.66 ± 5.07 bits/min, 53.96 ± 7.33 bits/min, and 50.55 ± 5.36 bits/min, respectively.
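For readers who want to relate accuracy to ITR, the sketch below uses the Wolpaw ITR formula that is standard in BCI work. This section does not state which ITR definition or selection time was used, so both the formula choice and the 6.5-s selection time in the example are assumptions. An ITR computed from an average accuracy will also generally differ from an average of per-subject ITRs.

```python
import math


def itr_bits_per_min(n_targets, accuracy, seconds_per_selection):
    """Wolpaw information transfer rate in bits/min.

    n_targets: number of selectable targets (N).
    accuracy: classification accuracy P, with 0 < P <= 1.
    seconds_per_selection: time needed for one selection (assumed value).
    """
    if not 0.0 < accuracy <= 1.0:
        raise ValueError("accuracy must be in (0, 1]")
    bits = math.log2(n_targets)
    if accuracy < 1.0:
        # Penalty terms for errors spread evenly over the other N - 1 targets.
        bits += accuracy * math.log2(accuracy)
        bits += (1.0 - accuracy) * math.log2((1.0 - accuracy) / (n_targets - 1))
    return bits * 60.0 / seconds_per_selection


# Example: 112 targets, the NA-paradigm mean accuracy, assumed 6.5 s per selection.
print(itr_bits_per_min(112, 0.9138, 6.5))
```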

Discussion
Frequency spatial multiplexing can also be seen as a segmentation of the intermediate cell. The 112-target classification is then achieved merely by relying on the relative position of the stimuli and the position of the fixation point. Based on the AA paradigm, the NA paradigm was proposed to decouple the frequency and spatial relationships and to explore whether equally efficient global classification can be achieved through the spatial relationship alone. The NA paradigm keeps similar frequencies as far away from each other as possible, which helps to improve the SSVEP detection performance but, in turn, introduces the risk of locating completely off-target areas. The SA paradigm was proposed to increase the distinguishability between adjacent targets. The experimental results show that the NA paradigm performs better, but the three paradigms do not differ significantly (p > 0.1), demonstrating that all three paradigms can achieve high-resolution target selection and that global classification relies more on the spatial relationships of competing stimuli. In this study, the minimum target classification used the strategy of first detecting the SSVEP response distribution of the intermediate cells and then locating the minimum target. Therefore, the supervised part of the localization is essentially a 4-classification problem. If minimum target classification were performed directly, it would become a 112-classification problem, and a massive amount of data would need to be collected to ensure that each class has enough samples.
The classification accuracy of the GCN method can reach a satisfactory level. Although the paradigm layout is regular, we still treated it as graph data because the properties of each minimum cell differ according to its relative position in the intermediate cell, which constitutes the irregularity of the paradigm. The advantages of DSGAT are: ( Technically, DSGAT differs from the standard attention mechanism because classical attention is a shared mechanism, i.e. all neighbors share the same parameters to calculate the correlation coefficient from the features themselves. In this work, the neighbor features are determined by the neighborhood relationship, so the shared attention aggregation can only invert the established node relationships. Therefore, to suit the specificity of the paradigm, DSGAT directly takes the edge weights $W_\alpha$ as the learnable parameters.
As can be seen from the results, the classification accuracies of the two stages are not correlated and vary between individuals. For most subjects, the average intermediate cell classification accuracy is higher than the final minimum target classification accuracy, but in some cases the final accuracy is higher, indicating the effectiveness of the algorithm. One reason for a lower final classification accuracy could be that insufficient data samples prevent the algorithm from fully capturing the pattern features. Another is individual differences in the data, which may reflect the subjects' different adaptability to the new paradigm or differences in their response patterns to the target and competing stimuli.
As shown in table 4, we compared the proposed method with representative studies that increase the target resolution of SSVEP-BCI. The table lists the method of each study, the number of targets, the number of frequencies used, and the experimental performance. In addition, a cVEP method is listed in the table to illustrate the ability of cVEP to enhance target resolution, but we believe that SSVEP BCIs also hold promise for further enhancement and are worth continuing to explore. Compared to other methods, our method does not currently stand out in terms of the number of targets and ITR, but it has unique advantages. First, we use frequency spatial multiplexing, a line of research that has so far been attempted only with a small number of targets, whereas our method supports a large number of targets. Second, the proposed method does not conflict with other excellent methods. While other paradigms focus on the design of individual target stimuli, this study relies on the positional relationship between the stimuli and does not circumvent the competing response problem. Therefore, not only these methods but also other state-of-the-art methods focusing on enhancing the efficiency of SSVEP detection can be used as a basis for the present method to achieve further improvements.
Some limitations of this study should be discussed further. (1) The proposed method has so far achieved good results only in offline experimental evaluations; in the future, we need to further validate its feasibility under online conditions.
(2) There is still a gap in ITR relative to some cutting-edge SSVEP-BCI systems, due to the long stimulus duration. There is a significant difference between the laboratory environment and the real world. For real-world applications, methods with excellent response speed could ensure more robust performance. Moreover, in SSVEP BCI studies, a longer duration of visual stimulation inevitably causes visual fatigue. These factors prompt us to explore further strategies to improve performance under short time windows. (3) In some practical applications, visual stimuli may be superimposed on an image, but the current large size of the stimuli would obstruct the observation of other tasks. If subjects are required to balance observation and control, long-time operation risks aggravating the brain load. Therefore, lower-burden interactivity is also an area that should be further optimized. (4) Currently our algorithm requires within-subject training, which puts pressure on data collection and places a relatively heavy burden on the subjects. The next step is to explore cross-subject classification methods, employing transfer learning techniques to make full use of existing data when adapting to a new subject. (5) This work focuses on validating the effectiveness of the proposed method through three paradigms, and the next step will be to further optimize the performance for a specific paradigm.
Next, we aim to address the current shortcomings and systematically improve the performance. In terms of time reduction, according to the time-frequency analysis, the SSVEP response of competing neighbors has increased in the first half of the stimulation phase. The next step is to use graph spatial-temporal networks (GSTN) to study a real-time global classification method. GSTN better captures the temporal variation of the relationship and is suitable for the temporal continuity of SSVEP detection results. Thus, GSTN is expected to learn the changing pattern of competing stimuli better and thereby shorten the stimulus duration. Regarding raising the number of targets: (1) In the proposed paradigm, a frequency is assigned to only one intermediate cell. The frequency spatial multiplexing mechanism can be extended to intermediate cells, i.e. a frequency is assigned to multiple intermediate cells at different locations, further increasing the number of targets through a richer frequency scheduling relationship. (2) The method can be combined with other methods to advance research based on better-performing SSVEP BCIs. Finally, the present method has application prospects in fields such as medical health, computer vision assistance, spatial navigation, complex cognitive decoding, and human-machine shared control. However, the current research is synchronous. In practical asynchronous applications, the spatio-temporal pattern of the SSVEP response changes when the subject needs to switch the target, and how to adapt to and discriminate such a situation is an important aspect that needs to be further optimized.

Conclusion
This study designed a novel unimodal SSVEP-BCI paradigm with 112 targets based on frequency spatial multiplexing and the neuronal competition mechanism. Instead of designing individual visual stimuli, the present work distinguishes the minimum cells by relying on the location relationship of the stimulus arrangement. Three specific interfaces were designed, namely the AA, NA, and SA paradigms. A dual-scale graph attention network was constructed as a global localization algorithm based on the SSVEP detection using FBCCA. Eleven subjects participated in the offline validation experiments and obtained an average global localization accuracy of 91.38% and an ITR of 53.96 bits/min using the DSGAT algorithm in the NA paradigm. This study is applicable to BCI application scenarios with a large number of targets and has the potential to further expand the number of targets for SSVEP-BCI.

Figure 1. Illustration of elements and concepts of the paradigm design.

Figure 1 illustrates the basic elements of the paradigm interface and figure 2 presents three interface examples. The minimum cell is a rectangular single-frequency flickering stimulus, denoted as c. Each cell is replicated as a 2 × 2 matrix to form an intermediate cell (denoted as M); $c_{ij}$ denotes the jth corner of $M_i$. The 40 intermediate cells are arranged in a tiled array of 10 × 16 to form the stimulation interface. The four minimum cells in an intermediate cell flicker at the same frequency, but the intermediate cells adjacent to different $c_{ij}$ are differentiated. The frequency response evoked by other intermediate cells in the vicinity of $c_{ij}$ is also enhanced when $c_{ij}$ is the target, so gazing at different minimum cells leads to different SSVEP frequency response distributions; thus, a minimum cell can be located according to the distribution patterns. In this paper, three example interfaces are designed: (1) Aligned arrangement (AA) paradigm: 40 intermediate cells in sequential order with ranks aligned (figure 2(a)); (2) Nonadjacent arrangement (NA) paradigm: the order of intermediate cells in the AA paradigm is adjusted so that spatially adjacent intermediate cells are distanced in the frequency domain (figure 2(b)); (3) Row stagger arrangement (SA) paradigm: on the basis of the AA paradigm, the intermediate cells are arranged in rows misaligned by one minimum cell, and the minimum cells at the end of a row that fall out of the range of Map are filled into the vacancy at the beginning of the row (figure 2(c)).
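The tiling described above can be sketched as a grid of intermediate-cell indices. The following is a minimal illustration that assumes the 10 × 16 array means 10 rows by 16 columns of minimum cells (5 × 8 intermediate cells), that indices run row by row from 1 to 40, and that the SA stagger applies to every other intermediate row. None of these details are confirmed by the text, and the NA ordering is not reproduced because its permutation is not given here.

```python
def aligned_map(n_rows=5, n_cols=8):
    """AA layout sketch: grid of 1-based intermediate-cell indices.

    Each intermediate cell occupies a 2 x 2 block of minimum cells, so the
    grid has 2 * n_rows rows and 2 * n_cols columns (orientation assumed).
    """
    return [
        [(r // 2) * n_cols + (c // 2) + 1 for c in range(2 * n_cols)]
        for r in range(2 * n_rows)
    ]


def row_stagger_map(grid, shift=1):
    """SA layout sketch: shift alternate intermediate rows by `shift` minimum
    cells, wrapping cells pushed off the end back to the start of the row.
    Which rows are shifted is an assumption."""
    staggered = []
    for r, row in enumerate(grid):
        if (r // 2) % 2 == 1:
            row = row[-shift:] + row[:-shift]
        staggered.append(list(row))
    return staggered


aa = aligned_map()
sa = row_stagger_map(aa)
print(aa[0])
print(sa[2])
```

With a stagger of one minimum cell, each shifted row starts with a single cell of the last intermediate cell in that row, followed by the remaining cells in their original order.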

Figure 2. Schematic diagram of the three paradigm interfaces. The marked stimulus frequencies and phases are not shown in the practical experiments.

(1) 40 classification: Calculate the response distribution of the 40 intermediate cells with the SSVEP detection algorithm; (2) 112 classification: Identify the minimum cell target according to the 40 classification results.

Figure 3. Local diagram of Gm and Gc. The circles denote nodes, squares denote the minimum cells in the paradigm, and 2 × 2 squares denote intermediate cells. (a) is the local structure of Gm, including a central intermediate node and its first-order neighbors. (b) is the local structure of Gc, including a central minimum node and its first-order neighbors.
(ii) Define the minimum graph $G_c = (V_c, E_c)$, as shown in figure 3(b), with each minimum cell as a node $v_c$ (fine-grained node, later referred to as a minimum node). $V_c$ is the set of minimum nodes, with $|V_c| = N_c$. $E_c$ denotes the set of minimum node edges. The feature set of $G_c$ is denoted by $X_c \in \mathbb{R}^{N_c \times N_T^c}$.

Figure 4. Structure of the DSGAT model. Operations with the same color in the same sub-network have the same parameters.

Figure 5. Schematic diagram of the aggregation operation of Gm. The edges in four colors and line types indicate the aggregation operations of four sub-networks.

Figure 6(b) shows example spectra of four data samples obtained with the Fast Fourier Transform (FFT) for a typical subject (S1) under the AA paradigm (figure 2(a)). The spectrum in each subplot is the average over all channels of the EEG signal. The targets of the four samples belong to the same intermediate cell with a stimulus frequency of 11.6 Hz, denoted as $c_{19,1}$, $c_{19,2}$, $c_{19,3}$, and $c_{19,4}$ (11.6 Hz is the 19th frequency), and the mean spectrum of all signal channels for each data sample is shown in the figure.

Figure 8. Accuracy-data length curves for the classification of 40 intermediate cells per subject. Red, blue and green curves represent AA, NA and SA paradigms, respectively.

Note: Bolding indicates the highest accuracy for the different methods for each subject in each paradigm.

Figure 9. Average accuracy histogram of the three paradigms using different algorithms.
Algorithm 1. Dual scale graph attention networks (DSGAT).
Input: The SSVEP response scores calculated by FBCCA are used as the feature matrix $X_m \in \mathbb{R}^{N_m \times N_T^m}$ of $G_m$
Output: Prediction results $\hat{Y}_c$ of each node of $G_c$
1: for p = 1 : 4 do
2:

EEG signal acquisition utilized a BrainAmp amplifier (Brain Products GmbH, Germany).
The Logistic Regression (LR) algorithm is used to classify the four minimum cells in $M_{tg}$. The SSVEP score vectors of $M_{tg}$ and its 1-step neighbors in $G_m$ are concatenated as the feature vector $X_{LR}^{N(M_{tg})} \in \mathbb{R}^{|N(M_{tg})| \times N_T^m}$, and the sequence number of the minimum cell in the intermediate cell is used as the 4-classification label. The training data were screened before training because wrong $M_{tg}$ localization could interfere with the training of the LR algorithm. A data sample is considered valid if the intermediate cell containing the true target obtains the highest FBCCA score most often over the period considered; otherwise, the sample is rejected from training.
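The screening rule and the feature layout described above can be sketched as follows. The data layout (one list of per-cell FBCCA scores per time step), the function names, and the toy values are assumptions for illustration only; the LR classifier itself is omitted, and the neighbor list passed in is hypothetical rather than derived from $G_m$.

```python
from collections import Counter


def is_valid_training_sample(score_sequence, true_cell):
    """Keep a sample only if the true intermediate cell wins most often.

    score_sequence: list over time steps; each entry is a list of scores,
    one per intermediate cell (assumed layout).
    """
    winners = [
        max(range(len(scores)), key=scores.__getitem__)
        for scores in score_sequence
    ]
    top_cell, _ = Counter(winners).most_common(1)[0]
    return top_cell == true_cell


def neighborhood_features(score_sequence, cells):
    """Stack the score time series of the given cells, one row per cell,
    giving a |cells| x (number of time steps) feature matrix."""
    return [[scores[c] for scores in score_sequence] for c in cells]


# Hypothetical scores for 3 intermediate cells over 4 time steps.
scores = [
    [0.2, 0.7, 0.1],
    [0.3, 0.6, 0.2],
    [0.5, 0.4, 0.1],
    [0.1, 0.8, 0.3],
]
print(is_valid_training_sample(scores, true_cell=1))
print(neighborhood_features(scores, cells=[1, 0]))
```

The rows of the resulting matrix would then be flattened or passed as-is to the 4-class classifier, depending on how the concatenation is implemented.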

Table 2. Comparison analysis of SSVEP detection and global localization results.

Table 3. The average ITR (bits/min) for the three paradigms using different algorithms.

Table 4. Characteristics of BCI studies focusing on multiple targets.