LDER: a classification framework based on ERP enhancement in RSVP task

Objective. Rapid serial visual presentation (RSVP) based on electroencephalography (EEG) has been widely used in target detection; it distinguishes targets from non-targets by detecting event-related potential (ERP) components. However, the classification performance of the RSVP task is limited by the variability of ERP components, which is a major challenge for developing RSVP for real-life applications. Approach. To tackle this issue, this paper proposes a classification framework named latency detection and EEG reconstruction (LDER), which enhances ERP features to offset the negative impact of ERP variability on RSVP classification. First, a spatial-temporal similarity measurement approach was proposed for latency detection. Subsequently, we constructed a single-trial EEG signal model containing ERP latency information. Then, using the latency information detected in the first step, the model was solved to obtain the corrected ERP signal and thereby enhance the ERP features. Finally, within this framework, the EEG signal after ERP enhancement can be processed by most existing feature extraction and classification methods for the RSVP task. Main results. Nine subjects were recruited to participate in an RSVP experiment on vehicle detection. Four popular feature extraction algorithms for RSVP-based brain–computer interfaces (spatially weighted Fisher linear discriminant-principal component analysis (PCA), hierarchical discriminant PCA, hierarchical discriminant component analysis, and spatial-temporal hybrid common spatial pattern-PCA) were selected to verify the performance of the proposed framework. Experimental results showed that the proposed framework significantly outperformed the conventional classification framework in terms of area under the curve, balanced accuracy, true positive rate, and false positive rate for all four feature extraction methods.
Additionally, statistical results showed that the proposed framework achieved better performance with fewer training samples, fewer channels, and shorter temporal windows. Significance. The classification performance of the RSVP task was significantly improved by the proposed framework, which is expected to promote the practical application of the RSVP task.


Introduction
Brain-computer interfaces (BCIs) establish a direct connection between the brain and external devices or the environment, without relying on muscles and peripheral nerves [1][2][3]. Electroencephalography (EEG) is widely used in BCI studies owing to its affordability, high temporal resolution, non-invasiveness, and reliability [4]. EEG-based BCIs have been applied in several fields, such as external device control [5,6], disease detection [7], post-stroke rehabilitation [8], and emotion recognition [9].
Event-related potentials (ERPs) are small voltages generated in response to specific events or stimuli and are correlated with cognitive processes [10]. ERPs have been used in BCI technology because of their short response time after the stimulus. Rapid serial visual presentation (RSVP) is a specific type of BCI paradigm that displays images sequentially at high presentation rates [11]. It is the most popular target detection technique using EEG signals and a feasible technology for enhancing human-machine symbiosis [12]. The RSVP task distinguishes target from non-target stimuli by extracting ERP features; thus, the detection and analysis of ERP components play a major role in RSVP-BCI classification [13]. The classification framework of RSVP-BCI generally includes preprocessing, feature extraction, and classification, and the effectiveness of feature extraction is an important determinant of classification performance [13].
Over the past years, many studies have focused on feature extraction approaches to enhance the performance of RSVP-BCI. Sajda et al proposed a spatial-temporal hybrid feature extraction method named hierarchical discriminant component analysis (HDCA), which extracts spatial features using Fisher linear discriminant (FLD) and uses logistic regression to obtain temporal characteristics [14]. Alpert et al developed a spatially weighted FLD-principal component analysis (PCA) (SWFP) method, which uses FLD and PCA to extract spatial-temporal hybrid features [15]. Alpert et al also developed a modified version of HDCA named hierarchical discriminant principal component analysis (HDPCA), which differs from HDCA in using PCA for temporal dimensionality reduction [15]. Xie et al developed a filter bank spatial-temporal component analysis (FBSCA) approach, which decomposes gamma-band EEG data in the time-frequency-space domains and extracts spatial-temporal features to improve the classification performance of the RSVP task [16]. Xiao et al proposed a discriminative canonical pattern matching method that can detect miniature ERPs and performs well even with small training sets [17]. Cui et al designed a spatial-temporal hybrid common spatial pattern (CSP)-PCA (STHCP) algorithm to decode EEG signals in the RSVP task, which adopts CSP to extract spatial features and PCA to obtain temporal features [18,19]. In addition, several variants based on spatial-temporal hybrid feature extraction have contributed to improving the performance of RSVP-BCI. Note that the performance of most algorithms is limited by the characteristics of ERP components, such as amplitude, latency, and duration [20]. However, these characteristics are strongly affected by heavy cognitive workload, boredom, and fatigue, even within a single subject [21].
Therefore, offsetting the negative impact of the variability of ERP components on the model's performance is a crucial issue for RSVP-BCI.
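To make the two-stage structure of HDCA concrete, the following is a minimal NumPy sketch under our own assumptions, not a reproduction of the published implementation [14]: the epoch is split into equal, non-overlapping time windows (`n_windows` is a hypothetical parameter); a shrinkage-regularized FLD (hypothetical `reg`) learns one spatial filter per window; and a plain batch gradient-descent logistic regression combines the per-window scores.

```python
import numpy as np

def fld_weights(a, b, reg=0.05):
    """Shrinkage-regularized FLD direction separating rows of a (class 1) from rows of b (class 0)."""
    sw = np.cov(a, rowvar=False) + np.cov(b, rowvar=False)
    sw += reg * np.trace(sw) / len(sw) * np.eye(len(sw))
    return np.linalg.solve(sw, a.mean(axis=0) - b.mean(axis=0))

def window_means(epochs, n_windows):
    """(trials, channels, samples) -> (trials, windows, channels) by averaging within equal time windows."""
    chunks = np.array_split(epochs, n_windows, axis=2)
    return np.stack([c.mean(axis=2) for c in chunks], axis=1)

def fit_hdca(epochs, labels, n_windows=5, lr=0.1, n_iter=2000):
    x = window_means(epochs, n_windows)
    # Stage 1: one spatial FLD filter per time window.
    w_spatial = np.stack([fld_weights(x[labels == 1, k], x[labels == 0, k])
                          for k in range(n_windows)])
    scores = np.einsum("tkc,kc->tk", x, w_spatial)
    mu, sd = scores.mean(axis=0), scores.std(axis=0) + 1e-12
    z = (scores - mu) / sd
    # Stage 2: logistic regression over the standardized window scores.
    v, b = np.zeros(n_windows), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(z @ v + b)))
        v -= lr * z.T @ (p - labels) / len(labels)
        b -= lr * np.mean(p - labels)
    return w_spatial, mu, sd, v, b

def predict_hdca(model, epochs):
    """Return P(target) for each epoch."""
    w_spatial, mu, sd, v, b = model
    x = window_means(epochs, len(v))
    z = (np.einsum("tkc,kc->tk", x, w_spatial) - mu) / sd
    return 1.0 / (1.0 + np.exp(-(z @ v + b)))
```

On synthetic epochs in which targets carry an extra deflection on a few channels in one window, the window-specific spatial filter picks up that deflection and the second stage weights that window most heavily.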
Previous studies have addressed the variability of ERP components in different ways. Marathe et al characterized a modified HDCA algorithm named sliding HDCA, which uses sliding windows in HDCA to account for the variability of ERP components [22]. He and Wu developed a Euclidean-space EEG alignment method that makes the distributions of different EEG data more similar [23]; this method adopts the idea of transfer learning to improve classifier performance. Other researchers have improved classification performance by aligning EEG signals according to the latency of ERP components. Woody et al proposed an adaptive filter system to analyze the variable latencies of ERP components: to improve ERP detection, the ERP latencies were estimated by cross-correlation, and the EEG signals were aligned according to these latencies [24,25]. Song et al proposed an iterative minimum distance square error method for ERP feature alignment, which constructs an ERP template to compensate for possible time jitter [26]. However, these methods only align the original EEG signals according to the detected latencies, which essentially amounts to finding the most appropriate temporal window for detecting ERPs, so the resulting performance improvement is relatively limited.
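The cross-correlation alignment attributed to Woody [24,25] can be sketched as follows. This is a simplified single-channel illustration under our own assumptions: trials are shifted circularly with `np.roll` (practical implementations usually zero-pad), `max_lag` is a hypothetical search range, and the template is re-estimated as the average of the aligned trials for a fixed number of iterations.

```python
import numpy as np

def woody_align(trials, max_lag, n_iter=5):
    """Iterative cross-correlation latency estimation (Woody-style).

    trials: array (n_trials, n_samples) from one channel.
    Returns integer lags (defined relative to the final template) and the aligned trials.
    """
    lags_grid = np.arange(-max_lag, max_lag + 1)
    template = trials.mean(axis=0)  # initial template: plain stimulus-locked average
    lags = np.zeros(len(trials), dtype=int)
    for _ in range(n_iter):
        for k, trial in enumerate(trials):
            # Score every candidate lag by the dot product of the shifted trial with the template.
            scores = [np.dot(np.roll(trial, -lag), template) for lag in lags_grid]
            lags[k] = lags_grid[int(np.argmax(scores))]
        aligned = np.stack([np.roll(t, -lag) for t, lag in zip(trials, lags)])
        template = aligned.mean(axis=0)  # re-estimate the template from the aligned trials
    return lags, aligned
```

Because the template itself may end up shifted, the estimated lags are meaningful only up to a common offset; what matters for alignment is that trial-to-trial differences in latency are recovered.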
Based on the above review, a classification framework based on ERP enhancement in the RSVP task, named latency detection and EEG reconstruction (LDER), is proposed in this study. In this framework, we construct an EEG signal model that corrects the latency of ERP components to enhance ERP features. Specifically, instead of using latency detection to align EEG signals, we model the EEG signals and use the latency information to enhance the ERP components they contain. Furthermore, to avoid inaccurate latency estimation caused by spurious similarity and to make full use of the spatial-temporal characteristics of EEG signals, we estimate the latencies of each single-trial EEG signal by measuring the similarity between the spatial distributions (topographies) of the ERP template and of the single-trial EEG signal at different time points. Latencies are thus estimated not merely from the similarity of EEG waveforms but from the full spatial-temporal information. Note that the proposed framework also includes an effective training strategy to avoid overfitting and improve generalization ability. To the best of our knowledge, this is the first attempt to enhance ERP features in the RSVP task to improve classification performance.
The remainder of this paper is organized as follows: section 2 introduces the design of the RSVP experiment. Section 3 describes the details of our classification framework. Section 4 compares the classification performance of LDER and the conventional framework under different conditions. Section 5 analyzes the ERP components before and after enhancement and discusses the factors affecting RSVP-BCI classification performance, the visualization of signals before and after enhancement, and future studies. Section 6 draws conclusions.

Experiment settings
We designed an RSVP experiment to verify the performance of our proposed classification framework.

Participants
Nine participants (age range 21-24 years; 6 males, 3 females) with no history of psychiatric or neurological problems were recruited for the RSVP experiment. All participants were right-handed, had normal or corrected-to-normal vision, and signed informed consent before participating. All experimental procedures were approved by the Northwestern Polytechnical University Medical and Experimental Animal Ethics Committee.

RSVP protocol
The procedure of the RSVP paradigm is shown in figure 1. First, subjects were seated in a comfortable chair in front of a screen and were asked to minimize body movements and eye blinks during the experiment. The formal experiment then began with a 1 min resting state, during which subjects kept their eyes open. A 2 s beep signaled the end of the resting state. After that, a '+' was presented at the center of the screen for 2 s to fix the subjects' gaze. The formal experiment comprised two sessions of two runs each, separated by a 2 min break; at the end of the break, a 2 s beep reminded subjects to resume. Subjects were required to press the space bar whenever they saw a target image on the screen.
The image stream contained two types of images: target and non-target. All images had a dark background; images containing a vehicle were defined as targets, and the rest were non-targets. A total of 1500 images were presented, about 14% of which were targets, with 375 images per run. Each image was displayed for 200 ms in random order, and subjects performed continuous detection over the image stream to avoid attentional blink. Examples of target and non-target images are shown in figure 2.
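As a quick check of the stimulus arithmetic, the sketch below builds randomized runs. The exact per-run target count is not given in the text; 52 targets per run (hypothetical `n_targets`) is about 13.9% of 375, close to the reported 14%. Back-to-back presentation with no inter-stimulus gap is also an assumption, under which one run of 375 images at 200 ms lasts 75 s.

```python
import numpy as np

def make_run(n_images=375, n_targets=52, stim_ms=200, seed=None):
    """Return a shuffled 0/1 label sequence (1 = target) and the run duration in seconds."""
    rng = np.random.default_rng(seed)
    labels = np.zeros(n_images, dtype=int)
    labels[:n_targets] = 1
    rng.shuffle(labels)  # random presentation order
    duration_s = n_images * stim_ms / 1000  # assumes no gap between consecutive images
    return labels, duration_s

runs = [make_run(seed=r) for r in range(4)]  # 2 sessions x 2 runs
total_images = sum(len(labels) for labels, _ in runs)
```

Here `total_images` is 1500, matching the total reported for the experiment.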

Data acquisition
EEG data were acquired with a Neuracle wireless amplifier using 64 scalp electrodes placed according to the international 10-20 system. The reference electrode was Cz, located in the central-parietal area.

Classification framework based on the ERP enhancement
The process of the proposed classification framework LDER is shown in figure 3. The framework contains five steps: signal preprocessing, similarity measurement, single-trial EEG construction, feature extraction and classification, and final decision. Compared with the conventional classification framework, LDER adds two modules: similarity measurement and single-trial EEG construction. Because LDER enhances ERP features using both the target and non-target templates, the final decision compares the two resulting posterior probabilities. In addition, LDER includes a specific training strategy in which the training set contains both the original and the enhanced EEG signals.

Signal preprocessing
Data preprocessing included downsampling, band-pass filtering, notch filtering, average re-referencing, data segmentation, and baseline correction. The raw EEG signal was downsampled to 250 Hz to increase processing speed. A 0.5-30 Hz band-pass filter and a 50 Hz notch filter were applied to remove high-frequency noise, slow drifts, and power-line interference. Epochs were segmented from −0.1 s to 1 s relative to stimulus onset, and baseline correction used the 0.1 s prestimulus interval.
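The referencing, segmentation, and baseline steps can be sketched in NumPy as below. The sketch assumes the continuous data have already been downsampled to 250 Hz and filtered (for example with `scipy.signal.butter(..., output="sos")` plus `scipy.signal.sosfiltfilt` for 0.5-30 Hz and `scipy.signal.iirnotch` for 50 Hz, omitted here to keep the example dependency-free). Event onsets in samples (`onsets`) are a hypothetical input. At 250 Hz the −0.1 to 1 s epoch spans 275 samples, 25 of which are prestimulus.

```python
import numpy as np

FS = 250                  # sampling rate after downsampling (Hz)
T_MIN, T_MAX = -0.1, 1.0  # epoch limits relative to stimulus onset (s)

def epoch_and_baseline(eeg, onsets, fs=FS, tmin=T_MIN, tmax=T_MAX):
    """Average-reference continuous EEG (channels x samples), cut epochs, subtract the prestimulus mean.

    Assumes tmin < 0 so that a prestimulus baseline interval exists.
    """
    eeg = eeg - eeg.mean(axis=0, keepdims=True)  # common average reference
    start, stop = int(round(tmin * fs)), int(round(tmax * fs))  # -25 and 250 samples
    epochs = np.stack([eeg[:, o + start:o + stop] for o in onsets])
    baseline = epochs[:, :, :-start].mean(axis=2, keepdims=True)  # first |start| samples
    return epochs - baseline  # shape: (n_events, channels, 275)
```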

Similarity measurement
The target and non-target templates are obtained by averaging the single-trial signals of each class in the training set. A sliding window of 100 samples (400 ms at 250 Hz) is moved over each single-trial EEG signal with a step size of 1 sample. For each window, the spatial-temporal similarity between the signal in the window and the target and non-target templates is then calculated based on global field power (GFP) and global map dissimilarity (GMD) [27].
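The sliding-window latency search can be sketched as follows, using the standard GFP and GMD definitions [27], in which GMD compares average-referenced, GFP-normalized topographies. Several details are our assumptions rather than the paper's: the template segment (`template_seg`, a hypothetical input) has the same length as the window, the GMD of a window is the mean of the per-time-point GMD values, and the returned latency is the start sample of the best window.

```python
import numpy as np

def gfp(x):
    """Global field power per time point; x has shape (channels, samples)."""
    return np.sqrt(np.mean((x - x.mean(axis=0)) ** 2, axis=0))

def gmd(a, b):
    """Global map dissimilarity per time point between equally shaped (channels, samples) arrays."""
    na = (a - a.mean(axis=0)) / gfp(a)  # average-referenced, GFP-normalized map
    nb = (b - b.mean(axis=0)) / gfp(b)
    return np.sqrt(np.mean((na - nb) ** 2, axis=0))

def detect_latency(trial, template_seg, step=1):
    """Slide a window over the trial; return the start index with minimal mean GMD, plus all scores."""
    win = template_seg.shape[1]
    starts = np.arange(0, trial.shape[1] - win + 1, step)
    scores = np.array([gmd(trial[:, s:s + win], template_seg).mean() for s in starts])
    return int(starts[np.argmin(scores)]), scores
```

Because each topography is normalized by its GFP, the measure is insensitive to overall amplitude scaling and responds to differences in spatial configuration, which is what distinguishes it from a purely waveform-based similarity.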
The voltage distribution of each distinct topography reflects a different predominant brain activity, and both individual ERP components and combinations of ERP components have distinct topographies [28]. To estimate the global brain activity at each time point, the GFP is computed as the root of the mean of the squared differences between each electrode's voltage and the average voltage across electrodes [29]:

$$\mathrm{GFP}(t)=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\bigl(v_i(t)-\bar{v}(t)\bigr)^{2}} \tag{1}$$

where $N$ is the number of electrodes, $v_i(t)$ is the voltage of electrode $i$ at time $t$, and $\bar{v}(t)$ is the average voltage across all electrodes at time $t$. The GMD, which is based on the GFP, is then used to measure the topographic dissimilarity between the sample in the current window and the template:

$$\mathrm{GMD}(w)=\sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\frac{m_i-\bar{m}}{\mathrm{GFP}_m}-\frac{n_i-\bar{n}}{\mathrm{GFP}_n}\right)^{2}} \tag{2}$$

where $w$ denotes the $w$th window of the EEG sample, $m_i$ and $n_i$ are the voltages at electrode $i$ of the two topographies, $\bar{m}$ and $\bar{n}$ are their average voltages across all electrodes, and $\mathrm{GFP}_m$ and $\mathrm{GFP}_n$ are their GFPs as defined in equation (1). One topography is taken from the template signal and the other from the sample in the current window. The more similar the two topographies are, the closer the GMD is to zero; therefore, the start time point of the window that minimizes the GMD is taken as the latency:

$$\tau=\arg\min_{w}\,\mathrm{GMD}(w) \tag{3}$$

To enhance the ERP features, a single-trial EEG model is constructed as a linear decomposition of several latency-variable ERP components [25,30]:

$$\mathrm{EEG}(t)=X\,G(\tau)+\varepsilon \tag{4}$$

where $\mathrm{EEG}(t)$ is the single-trial EEG signal, $G(\tau)$ represents the ERP components located at their latencies, $X$ is a time function that encodes the locations of the ERP components and corresponds to the latency $\tau$ in equation (3), and $\varepsilon$ is the noise in the EEG signal. $G(\tau)$ is then obtained by solving equation (4) with the least-squares method:

$$\hat{G}=\left(X^{\mathrm{T}}X\right)^{-1}X^{\mathrm{T}}\,\mathrm{EEG}(t) \tag{5}$$

where $\hat{G}$ is the latency-corrected ERP signal, which realizes the enhancement of the ERP signal, $X^{\mathrm{T}}$ is the transpose of $X$, and $\left(X^{\mathrm{T}}X\right)^{-1}$ is the inverse of the matrix $X^{\mathrm{T}}X$.
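The linear model and least-squares solution described above can be illustrated with a small sketch. Because overlapping components in one trial cannot be separated from that trial alone, this sketch stacks several single-channel trials and assumes a hypothetical two-component model: a stimulus-locked component S and a component R whose onset in trial k is the detected latency τ_k. The matrix X is the resulting 0/1 shift matrix, and G = [S; R] is obtained from the normal equations. Both the component structure and the helper names are illustrative assumptions, not the paper's exact formulation.

```python
import numpy as np

def build_shift_matrix(n_samples, comp_len, latencies):
    """Stack per-trial design matrices for G = [S; R] (hypothetical two-component model).

    Row k*n_samples + t has a 1 in column j if component sample j lands on time t of trial k.
    Requires n_samples >= comp_len + max(latencies).
    """
    X = np.zeros((len(latencies) * n_samples, 2 * comp_len))
    cols = np.arange(comp_len)
    for k, tau in enumerate(latencies):
        rows = k * n_samples + cols
        X[rows, cols] = 1.0                   # S starts at sample 0 in every trial
        X[rows + tau, comp_len + cols] = 1.0  # R starts at this trial's latency
    return X

def reconstruct_components(trials, comp_len, latencies):
    """Least-squares estimate G = (X^T X)^{-1} X^T y; returns (S, R)."""
    n_trials, n_samples = trials.shape
    X = build_shift_matrix(n_samples, comp_len, latencies)
    y = trials.reshape(-1)  # concatenate trials in the same row order as X
    G = np.linalg.solve(X.T @ X, X.T @ y)  # solve the normal equations instead of inverting
    return G[:comp_len], G[comp_len:]
```

Solving the normal equations with `np.linalg.solve` is numerically preferable to forming the explicit inverse; `np.linalg.lstsq(X, y, rcond=None)` would be an alternative when X may be rank deficient, for example if every trial had the same latency.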

Feature extraction and classification
In the proposed framework, the enhanced signals can be used directly for feature extraction and classification. Because the similarity of each sample to both the target and non-target templates is measured in the previous step, each sample yields two posterior probabilities, and the class corresponding to the larger one is taken as the final decision. In addition, the training set should include both the original and the enhanced signals.
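The decision step and training strategy can be sketched as below. This is a hypothetical illustration: `enhance` and `posterior` stand in for the LDER enhancement and any probabilistic classifier, and we interpret "the two posterior probabilities" as the target posterior after enhancement with the target template and the non-target posterior after enhancement with the non-target template. In the training set, each trial is enhanced with the template of its own class, which is also an assumption.

```python
import numpy as np

def lder_decide(x, enhance, posterior, templates):
    """Return 1 (target) or 0 (non-target) for one trial x.

    enhance(x, template) -> enhanced trial; posterior(trial) -> P(target | trial).
    templates: dict with keys "target" and "nontarget".
    """
    p_target = posterior(enhance(x, templates["target"]))
    p_nontarget = 1.0 - posterior(enhance(x, templates["nontarget"]))
    return int(p_target > p_nontarget)

def augmented_training_set(trials, labels, enhance, templates):
    """Training set with the original trials plus each trial enhanced by its own class template."""
    names = np.where(labels == 1, "target", "nontarget")
    enhanced = np.stack([enhance(x, templates[n]) for x, n in zip(trials, names)])
    return np.concatenate([trials, enhanced]), np.concatenate([labels, labels])
```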

Results
We used four popular RSVP-BCI algorithms (STHCP, HDPCA, SWFP, and HDCA) for feature extraction and linear discriminant analysis (LDA) for classification to verify the performance of LDER. Because the numbers of target and non-target samples in the RSVP task are imbalanced, performance was evaluated using the area under the receiver operating characteristic curve (AUC), true positive rate (TPR), false positive rate (FPR), and balanced accuracy (BA) [31,32]. For each subject, 50% of the EEG data were randomly chosen as the training set and the rest were used as the test set; this process was repeated 4 times.
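For reference, the four metrics can be computed as below. AUC uses the rank-sum (Mann-Whitney) formulation with average ranks for tied scores, and BA is the mean of TPR and the true negative rate (1 − FPR). Binary predictions (`pred`) are assumed to come from a fixed decision threshold chosen by the caller.

```python
import numpy as np

def auc_rank(scores, labels):
    """Area under the ROC curve via the Mann-Whitney U statistic (ties get average ranks)."""
    scores, labels = np.asarray(scores, float), np.asarray(labels)
    order = np.argsort(scores, kind="mergesort")
    sorted_scores = scores[order]
    ranks = np.empty(len(scores))
    i = 0
    while i < len(scores):
        j = i
        while j + 1 < len(scores) and sorted_scores[j + 1] == sorted_scores[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average 1-based rank of the tie group
        i = j + 1
    n_pos, n_neg = np.sum(labels == 1), np.sum(labels == 0)
    u = ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

def rates(pred, labels):
    """Return TPR, FPR, and balanced accuracy for 0/1 predictions."""
    pred, labels = np.asarray(pred), np.asarray(labels)
    tpr = np.mean(pred[labels == 1] == 1)
    fpr = np.mean(pred[labels == 0] == 1)
    return tpr, fpr, (tpr + (1 - fpr)) / 2
```

Unlike plain accuracy, BA is not inflated by the majority non-target class, which is why it suits the roughly 14% target rate of this experiment.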

Performance comparison between LDER and conventional framework
The classification performance of the nine subjects using LDER and the conventional classification framework is depicted in figure 4. With LDER, the AUC, TPR, and BA values of STHCP, SWFP, HDPCA, and HDCA increased, and their FPR decreased, in all subjects. Table 1 shows the average AUC, TPR, FPR, and BA values across the nine subjects. Differences in classification performance between LDER and the conventional framework were tested using the Wilcoxon signed-rank test. With STHCP, the AUC, TPR, and BA of the proposed framework were significantly higher than those of the conventional framework, with increases of 11.5%, 21.6%, and 17.5%, respectively, and the FPR was significantly lower. Similarly, with SWFP, the AUC, TPR, and BA of LDER were significantly higher than those of the conventional framework, with increases of 23.4%, 34.8%, and 28.2%, and the FPR was 75% lower. With HDPCA, the AUC, TPR, and BA of the proposed framework increased by 19.7%, 21.2%, and 21.4%, respectively, and the FPR was significantly lower than that of the conventional framework. With HDCA, the AUC, TPR, and BA of LDER were significantly higher, and the FPR significantly lower, than those of the conventional framework.
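Because only nine paired values (one per subject) enter each comparison, the exact two-sided Wilcoxon signed-rank p-value can be obtained by enumerating all 2^9 = 512 sign patterns. The sketch below assumes no zero differences and no tied absolute differences; library routines such as `scipy.stats.wilcoxon` handle those cases. It also shows that with n = 9 the smallest attainable two-sided p-value is 2/512 ≈ 0.0039, reached when every subject improves.

```python
import itertools
import numpy as np

def wilcoxon_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for paired samples (no zeros or ties assumed)."""
    d = np.asarray(x, float) - np.asarray(y, float)
    n = len(d)
    ranks = np.argsort(np.argsort(np.abs(d))) + 1  # ranks of |d|, from 1 to n
    w_obs = ranks[d > 0].sum()                     # sum of ranks of positive differences
    center = n * (n + 1) / 4                       # expected W+ under the null hypothesis
    extreme = 0
    for signs in itertools.product((0, 1), repeat=n):  # every equally likely sign assignment
        w = sum(r for r, s in zip(ranks, signs) if s)
        extreme += abs(w - center) >= abs(w_obs - center)
    return w_obs, extreme / 2 ** n
```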

Performance with different numbers of selected channels
To explore the effect of the number of electrodes on classification performance, three channel conditions (15, 28, and 59 channels) were selected, as shown in figure 6. The average AUC values of the four baseline methods using LDER and the conventional classification framework under the different channel conditions are shown in figure 5(B). Furthermore, paired t-tests were … A repeated-measures ANOVA with Bonferroni adjustment was adopted to compare the classification performance of each method among the different channel conditions. With the conventional framework, AUC values differed significantly across channel conditions for all baseline methods except SWFP (STHCP: …)

Performance with different time windows
Different time windows may include different individual ERP components or combinations of components, so we analyzed classification performance with varying time windows, as shown in figure 5(C). First, paired t-tests were conducted to compare the AUC values of each algorithm using LDER and the conventional framework in each time window. The results show significant differences in AUC values between LDER and the conventional framework for all four methods across the different time windows (STHCP vs. …). In addition, we compared the classification performance of each method among different time windows using repeated-measures ANOVA with Bonferroni adjustment. With the conventional framework, classification performance differed significantly across time windows for all four methods (STHCP: F(3,24) = 12.895, p < 0.001; HDPCA: F(3,24) = 9.319, p < 0.001; SWFP: F(3,24) = 9.578, p < 0.001; HDCA: F(3,24) = 8.682, p = 0.004). With LDER, there was no significant difference among time windows for LDER-STHCP and only a weakly significant difference for LDER-SWFP, whereas significant differences were found for LDER-HDPCA and LDER-HDCA (LDER-STHCP: F(3,24) = 1.037, p = 0.394; LDER-HDPCA: F(3,24) = 13.102, p < 0.001; LDER-SWFP: F(3,24) = 4.335, p = 0.014; LDER-HDCA: F(3,24) = 6.178, p = 0.006). However, the differences for LDER-HDCA were weaker than those for HDCA. These results indicate that the classification performance of LDER-STHCP, LDER-SWFP, and LDER-HDCA is less sensitive to time window selection than that obtained with the conventional framework.
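The degrees of freedom reported above follow from a one-way repeated-measures design with 9 subjects and 4 time windows: 4 − 1 = 3 for the effect and (4 − 1)(9 − 1) = 24 for the error term. A minimal sketch of the uncorrected F statistic is below; the p-value would come from the F distribution (for example `scipy.stats.f.sf(F, df1, df2)`), and any sphericity correction or the Bonferroni-adjusted post hoc comparisons are not included.

```python
import numpy as np

def rm_anova_oneway(y):
    """One-way repeated-measures ANOVA on y of shape (subjects, conditions); returns F, df1, df2."""
    y = np.asarray(y, float)
    n, k = y.shape
    grand = y.mean()
    ss_cond = n * np.sum((y.mean(axis=0) - grand) ** 2)  # between-condition variation
    ss_subj = k * np.sum((y.mean(axis=1) - grand) ** 2)  # between-subject variation (removed)
    ss_total = np.sum((y - grand) ** 2)
    ss_error = ss_total - ss_cond - ss_subj               # condition x subject residual
    df1, df2 = k - 1, (k - 1) * (n - 1)
    return (ss_cond / df1) / (ss_error / df2), df1, df2
```

Removing the between-subject sum of squares is what distinguishes this from an ordinary one-way ANOVA: consistent per-subject offsets in AUC do not inflate the error term.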

Discussion
In cognitive tasks, EEG signals vary not only across subjects but also within subjects, and this variability spans a wide range of spatial and temporal scales [33]. Because target and non-target images are distinguished by identifying ERP components, the variability of ERP components can degrade classification performance. This variability is usually reflected in the latencies or amplitudes of ERPs; latency variability reduces the effective ERP amplitude because of feature overlap, so that the ERPs become buried in the background EEG. Therefore, this study focuses on the latency variability of ERP components.
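A small numerical illustration of why latency variability weakens ERP features: averaging identical Gaussian-shaped components whose peaks are jittered across trials yields a lower, broader peak than averaging aligned copies. The waveform shape, the 50 ms width, and the 50 ms jitter are arbitrary illustrative choices, not values measured in this study.

```python
import numpy as np

t = np.arange(0, 1.0, 0.004)  # 250 Hz time axis (s)

def component(peak_s, width_s=0.05):
    """Unit-amplitude Gaussian-shaped ERP component peaking at peak_s seconds."""
    return np.exp(-0.5 * ((t - peak_s) / width_s) ** 2)

rng = np.random.default_rng(0)
peaks = 0.35 + rng.normal(scale=0.05, size=200)  # trial-to-trial latency jitter, SD 50 ms
aligned_avg = np.mean([component(0.35) for _ in peaks], axis=0)
jittered_avg = np.mean([component(p) for p in peaks], axis=0)
```

In expectation, jitter with SD equal to the component width widens the averaged component by a factor of √2 and lowers its peak to roughly 1/√2 ≈ 0.71 of the aligned peak, which is the amplitude loss that latency correction aims to recover.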
To address the negative impact of ERP variability on classification performance, the LDER framework was proposed to enhance ERP features by constructing a single-trial EEG model and correcting the latency of ERP components. We analyzed the ERP components to show the effectiveness of our framework and compared them before and after enhancement. In addition, we discussed several factors affecting the classification performance of RSVP-BCI and analyzed why LDER improves classification performance.

Analysis of ERP components
The original and enhanced grand-average ERP waveforms of target and non-target trials for each subject are shown in figure 7. As can be seen, the grand-average ERP waveforms of targets and non-targets differ. Except for subjects 1 and 3, all subjects showed prominent P3 and N2 components; subject 1 showed only a significant P3 component, and subject 3 only a significant N2 component. In addition, the amplitudes of the components that distinguish targets from non-targets are enhanced, especially at 200-600 ms. Figure 8 shows the topographies of the original and enhanced grand-average ERP waveforms of target and non-target trials at 200-600 ms. 'non-Target' and 'Target' denote the topographies of the EEG signals under non-target and target image stimuli, respectively, and 'enhanced-Target' denotes the topographies of the EEG signals under target stimuli after enhancement with the proposed framework. Color intensity indicates power. Compared with the non-target topographies, the darker regions of the target topographies show the spatial distribution of ERP components at different times. During the periods in which ERP components appear, the enhanced ERP waveform shows stronger power while its spatial distribution remains unchanged. Notably, all nine subjects showed this pattern.

Analysis of factors affecting RSVP-BCI classification performance
Many existing classification methods in RSVP-BCI, such as SWFP, STHCP, FBSCA, and HDCA, extract spatial-temporal hybrid ERP features. Methods of this type have been shown to achieve satisfactory performance in RSVP-BCI. However, their performance is limited by channel selection, time window selection, and the number of training trials [34][35][36][37]. As shown in figure 5(A), classification performance improves as the training set grows, which suggests that these methods may need more samples to perform well. In contrast, the LDER framework shows no significant or only weakly significant differences across training set sizes, indicating that LDER needs fewer samples to achieve good classification performance.
As shown in figures 5(B) and (C), for almost all methods, different channel conditions and time windows lead to different classification performance. The ERP distribution over all brain regions is shown in figure 9, which indicates that the temporal patterns of ERPs vary across space: different ERP components are distributed in different brain regions and appear at different times. Owing to this spatial-temporal variability of ERP components, it is difficult to find the optimal channel condition and the corresponding temporal window. This may explain why the classification performance of hybrid spatial-temporal methods differs significantly across channel selection and time window conditions. Notably, our framework does not target any of these factors; instead, it directly enhances the ERP features and improves signal quality. Therefore, good classification performance can be achieved with a small number of channels, a small number of training trials, or a short time window.

Figure 7. The grand-average ERP waveform from the Pz channel of nine subjects. The green line represents the non-target waveform, the red line represents the target waveform, and the blue line represents the target waveform after enhancement. Shaded regions represent the standard deviation of the grand-average ERP waveform.

Visualization of signals before and after enhancement
Figure 10 depicts the discriminability of signals before and after enhancement by visualizing single-trial EEG signals with t-distributed stochastic neighbor embedding (t-SNE) [38]. The original target and non-target EEG signals are largely inseparable, whereas the enhanced EEG signals form clearly separable distributions. Overall, our framework enhances the ERP components that distinguish targets from non-targets.

How LDER works
In summary, the key to good RSVP-BCI performance is enlarging the difference between target and non-target signals. Some existing methods extract ERP features in the spatial and temporal domains, but their performance is restricted by the variability of ERP components. Even strategies designed specifically to deal with ERP variability still rely on searching for optimal channel conditions or temporal windows. Our framework instead enhances the ERP components that distinguish targets from non-targets before feature extraction and classification, which improves the effectiveness of most feature extraction and classification methods. Specifically, LDER computes the corrected ERP components from an EEG signal model using the detected latencies of ERP components, yielding enhanced features. The enhancement effect of LDER is demonstrated in figures 7, 8, and 10. Note that, to deal with within-subject variability of ERP components, all test samples are calibrated according to the non-target and target templates obtained from the training set, which can counteract the negative effect of ERP variability on RSVP-BCI performance. Unlike other methods, which rely on enlarging the training set or finding the most appropriate temporal windows or spatial regions, LDER enhances ERP features directly, making non-target and target signals more discriminable before feature extraction and classification. This may explain why LDER was not sensitive to the size of the training set, the selected time window, or the selected channels. Furthermore, LDER adopts a training strategy in which the training set contains both original and enhanced EEG data; this also enlarges the training set and can be viewed as a form of data augmentation.

Future study
In this study, we verified the effectiveness of our method only for within-subject variability; its across-subject performance should be evaluated in future work. Furthermore, variability is also one of the factors affecting the performance of online experiments, so our method needs to be verified in an online experiment. Another limitation is that the applicability of the proposed method to deep learning has not been explored. Finally, only nine subjects were included in our experiment, and further studies should expand the dataset.

Conclusion
This study proposed a classification framework based on ERP enhancement for RSVP-BCI, which enhances ERP components by constructing an EEG model and correcting ERP latency. An RSVP experiment based on vehicle detection was designed to verify the performance of LDER. The experimental results demonstrate that the proposed framework outperforms the conventional classification framework with four popular feature extraction methods. Furthermore, statistical results show that the proposed framework achieves better performance with fewer training samples, fewer channels, and shorter temporal windows. This study is expected to promote the practical application of RSVP-BCI in real-world settings.