ETucker: a constrained tensor decomposition for single trial ERP extraction

Objective. In this paper, we propose a new tensor decomposition to extract event-related potentials (ERPs) by adding a physiologically meaningful constraint to the Tucker decomposition. Approach. We analyze the performance of the proposed model and compare it with Tucker decomposition on a synthesized dataset. The simulated dataset is generated using a 12th-order autoregressive model in combination with independent component analysis (ICA) on real no-task electroencephalogram (EEG) recordings. The dataset is manipulated to contain the P300 ERP component and to cover different SNR conditions, ranging from 0 to −30 dB, to simulate the presence of the P300 component in extremely noisy recordings. Furthermore, in order to assess the practicality of the proposed methodology in real-world scenarios, we utilized the brain-computer interface (BCI) competition III, dataset II. Main results. Our primary results demonstrate the superior performance of our approach compared to conventional methods commonly employed for single-trial estimation. Additionally, our method outperformed both Tucker decomposition and non-negative Tucker decomposition on the synthesized dataset. Furthermore, the results obtained from real-world data exhibited meaningful performance and provided insightful interpretations of the extracted P300 component. Significance. The findings suggest that the proposed decomposition is eminently capable of extracting the target P300 component's waveform, including latency and amplitude as well as its spatial location, from single-trial EEG recordings.


Introduction
Event-related potentials (ERPs) are the responses of the brain to an internal or external stimulus and are mostly observable in electroencephalogram (EEG) signals. Depending on the specifications of the stimulus and the characteristics of the person responding to it, the amplitude, time of occurrence (after the onset), polarity, and spatial position of the ERPs may vary (Schomer and Da Silva 2012, Luck 2014, Ting et al 2014, Luck et al 2000).
As an example, the P300 ERP component, which is evoked as a result of an oddball task during EEG recording, mostly occurs 250–650 ms after the onset, has a positive amplitude, and is located in the superior and inferior frontal and parietal lobes (Kaplan et al 2013, Sabeti et al 2016). Cross-trial temporal averaging, also known as the grand averaging method, which averages trials over time, has been widely used to extract ERP components from EEG recordings, particularly in brain-computer interface applications (Xiao et al 2019, Ma et al 2021). The underlying assumption of this method is that the ERP component is deterministic and the background EEG is white noise (Luck 2014). This assumption, though practical and handy, has been questioned in several studies in the presence of the jitter effect, which may happen due to mental fatigue or attention deficit (Käthner et al 2014). Moreover, the amplitude, latency, and spatial location of ERP components are prone to variation across recording trials (between-trial variability) (Jarchi et al 2011). In this respect, several studies have attempted to propose methods for extracting ERP (sub)component(s) from single-trial EEG recordings. The approaches can be classified into five groups: blind source separation (BSS), e.g. independent component analysis (ICA) or principal component analysis (PCA) (Dien 2012); adaptive filters, e.g. the Wiener filter (Cerutti et al 1987) or Kalman filter (Georgiadis et al 2005); denoising algorithms such as the wavelet transform (Quiroga and Garcia 2003, Aniyan et al 2014); iterative estimation algorithms such as RIDE (Ouyang et al 2011); and spatio-temporal filtering methods (Li et al 2009, Jarchi et al 2010, Ranjbar et al 2018). While some methods suffer from a lack of interpretable spatial estimation for the targeted ERP component, the others require several trials to estimate the component in each trial or fail to extract the dynamics of the component in poor SNR conditions (lower than −15 dB).
Tensor decompositions have been widely used in EEG analysis in various applications, including but not limited to multi-modal EEG and functional magnetic resonance imaging (fMRI) analysis (Mørup et al 2008, Eliseyev and Aksenova 2013), brain-computer interfaces (BCIs) (Cichocki et al 2008), seizure detection and localization (De Vos et al 2007, Deburchgraeve et al 2009), and ERP analysis (Vanderperren et al 2013, Idaji et al 2017, Maki et al 2018, Wang et al 2018, Zhang et al 2020). Different tensor decompositions, such as canonical polyadic (CP) decomposition, also known as parallel factor analysis (PARAFAC) (Mørup et al 2006, Eliseyev and Aksenova 2013), Tucker decomposition (Latchoumane et al 2012, Cong et al 2013), non-negative Tucker decomposition (NTD) (Dao et al 2020, Rošt'áková et al 2020, Wang et al 2020), and their variations (Zhao et al 2012, Cichocki et al 2015, Idaji et al 2017), have been explored in these studies. While much research in the field of ERP analysis has focused on extracting features for pattern recognition models, to the best of our knowledge, no previous work has proposed a new tensor decomposition specifically designed for robust single-trial ERP estimation from raw EEG recordings in low signal-to-noise ratio (SNR) conditions. A new approach that benefits from the multi-dimensional flexibility of tensors may therefore help address the current gaps in single-trial ERP estimation.
The contribution of this paper is a customized tensor decomposition that adds a constraint inspired by spatio-temporal filtering models to the conventional Tucker decomposition. In this regard, the notations, the original Tucker decomposition, non-negative Tucker decomposition, and our proposed method (called ETucker) are discussed in section 2; the performance of the proposed method is then validated on a physiologically meaningful synthesized EEG dataset as well as on a real-world dataset in section 3; and conclusions are drawn in section 4.

Notations
Before presenting the details of the proposed decomposition, we need to state the assumptions and notations. We use the same notations and definitions as Kolda and Bader (Kolda and Bader 2009). A tensor $\mathcal{X} \in \mathbb{R}^{I_1 \times I_2 \times \cdots \times I_N}$ is characterized as an $N$th-dimensional array, also called an $N$-way tensor. Boldface calligraphic letters, e.g. $\boldsymbol{\mathcal{X}}$, symbolize tensors of dimension 3 or greater; bold capital letters like $\mathbf{X}$ symbolize matrices; bold lower-case letters, e.g. $\mathbf{x}$, symbolize vectors; and lower-case letters, e.g. $x$, denote scalars. '⊗' and '∘' denote the Kronecker and outer products, respectively. The $n$-mode tensor-times-matrix product of $\mathcal{X}$ and a matrix $\mathbf{U} \in \mathbb{R}^{J \times I_n}$ is indicated by $\mathcal{X} \times_n \mathbf{U}$ and is defined elementwise as
$$(\mathcal{X} \times_n \mathbf{U})_{i_1 \cdots i_{n-1}\, j\, i_{n+1} \cdots i_N} = \sum_{i_n=1}^{I_n} x_{i_1 i_2 \cdots i_N}\, u_{j i_n}.$$
The Tucker decomposition of a three-way tensor approximates it by a core tensor $\mathcal{G} \in \mathbb{R}^{R_1 \times R_2 \times R_3}$ multiplied along each mode by a factor matrix $\mathbf{U}^{(i)}$:
$$\mathcal{X} \approx \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \times_3 \mathbf{U}^{(3)}, \qquad (1)$$
where each $\mathbf{U}^{(i)}$ is called a factor matrix. The factor matrices are assumed to have orthonormal columns (Malik and Becker 2018). Also, the tensor in (1) is a rank-$(R_1, R_2, R_3)$ tensor. Tucker decomposition can also be cast as an optimization problem with the cost function
$$\min_{\mathcal{G},\, \mathbf{U}^{(1)},\, \mathbf{U}^{(2)},\, \mathbf{U}^{(3)}} \left\| \mathcal{X} - \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \times_3 \mathbf{U}^{(3)} \right\|_F^2, \qquad (2)$$
where $\|\cdot\|_F^2$ indicates the squared Frobenius norm of a tensor. Throughout this paper, it is assumed that the tensor $\mathcal{X} \in \mathbb{R}^{C \times T \times 2}$ corresponds to a single-trial EEG epoch, and accordingly $\mathbf{U}^{(1)}$, $\mathbf{U}^{(2)}$, and $\mathbf{U}^{(3)}$ contain information regarding the spatial dispersal, temporal dynamics, and contribution of the estimated sources in reconstructing each slice of the tensor, respectively.

Non-negative Tucker decomposition
The NTD is a variant of Tucker decomposition in which the core tensor ($\mathcal{G}$) and factor matrices ($\mathbf{U}^{(i)}$) are constrained to have only non-negative elements (Kim and Choi 2007). This non-negativity constraint promotes sparsity and interpretability of the factors.
The non-negativity constraint is added to the cost function in (2), and the problem is solved through an optimization process, commonly projected gradient methods or alternating non-negative least squares algorithms (Kim and Choi 2007). In this paper, we utilize NTD to compare the performance of our model in cases with positive peaks in the synthesized dataset. In contrast to all the other algorithms in this study, which were implemented in MATLAB, NTD was implemented using the Tensorly library in Python (Kossaifi et al 2018).

Constrained Tucker decomposition for ERP extraction
Though capable of source separation, Tucker decomposition fails to extract sources properly in low SNR conditions (Cichocki et al 2009). In such cases, adding a priori information regarding the sources can enhance not only the convergence of the algorithm but also the accuracy of the estimated sources (Fonał and Zdunek 2019, Chen et al 2021).
In spatio-temporal filtering papers (Li et al 2009, Jarchi et al 2010, Ranjbar et al 2018), it is assumed that the ERP component can be extracted by applying a spatial filter ($\mathbf{w} \in \mathbb{R}^C$) to the observation matrix ($\mathbf{X} \in \mathbb{R}^{C \times T}$) and comparing the result with a pre-assumed waveform with different delays ($g_0(\tau)$), which can be formulated as the following problem:
$$\min_{\mathbf{w},\, \tau} \left\| \mathbf{X}^T \mathbf{w} - \mathbf{g}_0(\tau) \right\|_2^2, \qquad (3)$$
where $C$ is the number of channels, $T$ is the number of temporal samples, and $\mathbf{g}_0(\tau)$ is the $\tau$-sample-shifted gamma function. The original gamma function is defined as
$$g(t) = c\, t^{K} e^{-t/\theta}, \qquad (4)$$
where $c$, $K$, and $\theta$ are the amplitude, bandwidth, and peak parameters of the waveform, respectively (the peak occurs at $t = K\theta$). Similar assumptions can likewise be made on the tensor's factor matrices. In this respect, the cost function in (5) is defined so that the estimated sources in the second factor matrix of the tensor form a waveform analogous to the gamma function ($\mathbf{r}(\tau)$ is the vectorized notation of $g_0(\tau)$):
$$\min_{\mathbf{w},\, \tau} \left\| \mathbf{U}^{(2)} \mathbf{w} - \mathbf{r}(\tau) \right\|_2^2. \qquad (5)$$
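The gamma template can be sketched as follows. The closed form g(t) = c·t^K·e^(−t/θ) and the P300 parameter values (c = 1, K = 14, θ = 0.3/K, taken from the simulation section) are reconstructions from the surrounding text, so treat the exact formula as an assumption; with these parameters the peak lands at t = K·θ = 0.3 s, a typical P300 latency.

```python
import numpy as np

def gamma_waveform(t, c=1.0, K=14.0, theta=0.3 / 14):
    """Gamma-shaped ERP template; the continuous maximum occurs at t = K * theta."""
    return c * t**K * np.exp(-t / theta)

fs = 200                        # sampling frequency used in the simulations
t = np.arange(0, 2, 1 / fs)     # one 2 s trial
g = gamma_waveform(t)
g = g / g.max()                 # normalize peak amplitude to c = 1
```

Shifting this template by τ samples yields the delayed versions $\mathbf{g}_0(\tau)$ used in the matching problem above.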
Since (2) and (5) should be optimized concurrently, the second term is appended to (2) using a Lagrange multiplier (λ). As a result, the Tucker decomposition in (2) is transformed to the following problem:
$$\min_{\mathcal{G},\, \mathbf{U}^{(i)},\, \mathbf{w},\, \tau} \left\| \mathcal{X} - \mathcal{G} \times_1 \mathbf{U}^{(1)} \times_2 \mathbf{U}^{(2)} \times_3 \mathbf{U}^{(3)} \right\|_F^2 + \lambda \left\| \mathbf{U}^{(2)} \mathbf{w} - \mathbf{r}(\tau) \right\|_2^2, \qquad (6)$$
where the core tensor, factor matrices, filter coefficients ($\mathbf{w}$), and delay ($\tau$) are the target parameters and should be estimated through an optimization process. To this end, the objective function in (6) is split into three objective functions, one per mode:
$$J_i = \left\| \mathbf{X}_{(i)} - \mathbf{U}^{(i)} \mathbf{C}^{(i)} \right\|_F^2 \;\; (i = 1, 3), \qquad J_2 = \left\| \mathbf{X}_{(2)} - \mathbf{U}^{(2)} \mathbf{C}^{(2)} \right\|_F^2 + \lambda \left\| \mathbf{U}^{(2)} \mathbf{w} - \mathbf{r}(\tau) \right\|_2^2, \qquad (7)$$
where $\|\cdot\|_F^2$ denotes the squared Frobenius norm of a matrix, $\mathbf{X}_{(i)}$ is the mode-$i$ unfolding of $\mathcal{X}$, and the $\mathbf{C}^{(i)}$s are defined as follows:
$$\mathbf{C}^{(1)} = \mathbf{G}_{(1)} \big( \mathbf{U}^{(3)} \otimes \mathbf{U}^{(2)} \big)^T, \quad \mathbf{C}^{(2)} = \mathbf{G}_{(2)} \big( \mathbf{U}^{(3)} \otimes \mathbf{U}^{(1)} \big)^T, \quad \mathbf{C}^{(3)} = \mathbf{G}_{(3)} \big( \mathbf{U}^{(2)} \otimes \mathbf{U}^{(1)} \big)^T. \qquad (8)$$
To reduce the risk of divergence, the gradient descent method was employed to estimate the target parameters. The first-order derivatives with respect to the factor matrices and the filter are (Akbari et al 2015):
$$\frac{\partial J_i}{\partial \mathbf{U}^{(i)}} = -2\left( \mathbf{X}_{(i)} - \mathbf{U}^{(i)} \mathbf{C}^{(i)} \right) \mathbf{C}^{(i)T} \; (i = 1, 3), \quad \frac{\partial J_2}{\partial \mathbf{U}^{(2)}} = -2\left( \mathbf{X}_{(2)} - \mathbf{U}^{(2)} \mathbf{C}^{(2)} \right) \mathbf{C}^{(2)T} + 2\lambda \left( \mathbf{U}^{(2)} \mathbf{w} - \mathbf{r}(\tau) \right) \mathbf{w}^T, \quad \frac{\partial J_2}{\partial \mathbf{w}} = 2\lambda\, \mathbf{U}^{(2)T} \left( \mathbf{U}^{(2)} \mathbf{w} - \mathbf{r}(\tau) \right), \qquad (9)$$
and the corresponding update rules are
$$\mathbf{U}^{(i)} \leftarrow \mathbf{U}^{(i)} - \gamma\, \frac{\partial J_i}{\partial \mathbf{U}^{(i)}}, \qquad \mathbf{w} \leftarrow \mathbf{w} - \gamma\, \frac{\partial J_2}{\partial \mathbf{w}}. \qquad (10)$$
The parameters are initialized using algorithm 1, where $R_i$ specifies the expected rank of the tensor and $f(\cdot)$ denotes any transformation or algorithm that can be used to extract ERP components in high SNR conditions, e.g. spatio-temporal, ICA-, or PCA-based methods (MacDonald and Barry 2014, Lee et al 2016, Ranjbar et al 2020, Zang et al 2022). Subsequently, the factor matrices and the filter are updated recursively using (10) until the algorithm converges (the difference between the norms of two successive estimates of the filtering vector $\mathbf{w}$ falls below 0.0001) or the maximum number of iterations (1000) is reached.
Algorithm 1. Initialization algorithm for factor matrices and the core tensor.

Here γ is the learning rate (0.001 in this paper). The proposed method is summarized in algorithm 2, where $T_s$ is the suspected range of the target ERP component (0.25 s–0.65 s in this work), $\mathbf{s}$ is the extracted ERP waveform, and $\mathbf{a}$ is the spatial vector of the extracted component.

Algorithm 2. ETucker
1: Initialize $\mathbf{U}^{(i)}$ and $\mathcal{G}$ using algorithm 1.

Synthesized EEG data

Previous simulation studies have assumed that the background EEG activity is similar to white noise, and accordingly used the Gaussian probability density function to generate the temporal signals. Also, a uniform distribution has been used to generate the spatial vector of the ERP component. However, this approach does not fully follow the frequency-domain characteristics of EEG signals, such as the hyperbolic (1/f) morphology of the power spectrum distribution. Therefore, in this work, an approach was developed to utilize real EEG recordings and simulate synthetic recordings for further performance validation of the proposed decomposition. Accordingly, ICA was used to extract sources from a dataset of no-task EEG recordings. The dataset had originally been recorded with eyes closed using 19 channels and a sampling frequency of 200 Hz to simulate and identify the effect of electrooculogram artifacts on EEG signals (Klados and Bamidis 2016). In this research, an autoregressive (AR) model was used to learn the behaviour of the sources estimated by ICA. The model is characterized as follows:
$$x(n) = \sum_{k=1}^{q} w_k\, x(n-k) + u(n),$$
where $u(n)$ is the input of the system and is usually considered to be zero-mean Gaussian white noise (GWN), $x(n)$ is the observation sequence, and $w_k$, $1 \leq k \leq q$, are the AR parameters, with $q$ being the order of the AR model (Li et al 2015). For training the AR model, $q$ was set to 12, and the Yule-Walker algorithm was used to find the AR parameters ($w_k$). Using the AR model with white Gaussian noise as the input, multiple meaningful EEG sources can be generated. Regarding the target ERP component, which is P300 in this article, the parameters of the gamma function are set as $c = 1$, $K = 14$, and $\theta = 0.3\, K^{-1}$. Furthermore, to simulate the jitter effect, the values of $K$ and $\theta$ are incremented by 0.02 and 0.005 in each iteration over 100 trials, respectively. The sampling frequency is 200 Hz and the duration of each trial is 2 s.
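The AR synthesis step can be sketched as below. The Yule-Walker estimator and GWN-driven generator shown are standard textbook forms, not the authors' code, and for brevity the demonstration fits a toy AR(2) process rather than an order-12 model on real ICA sources.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def yule_walker(x, order=12):
    """Estimate AR coefficients w_k from the sample autocorrelation (Yule-Walker)."""
    x = x - x.mean()
    r = np.array([x[: len(x) - k] @ x[k:] for k in range(order + 1)]) / len(x)
    return solve_toeplitz((r[:order], r[:order]), r[1 : order + 1])

def generate_ar(w, n, burn=500, rng=None):
    """Drive the fitted AR model with zero-mean GWN to synthesize an EEG-like source."""
    rng = np.random.default_rng(0) if rng is None else rng
    q = len(w)
    x = np.zeros(n + burn + q)
    for t in range(q, len(x)):
        x[t] = w @ x[t - q : t][::-1] + rng.standard_normal()
    return x[-n:]            # discard the burn-in transient
```

Fitting `yule_walker` on each ICA source and then calling `generate_ar` yields new realizations that share the source's spectral shape.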
Simulated ERP components are illustrated in figure 1. To project the sources in a meaningful manner, the Brainstorm toolbox was used, and the sources' positions were manually located. The scalp, skull, and brain tissue conductivities were set to 0.33, 0.004, and 0.33 μS cm−1, respectively. The position of the corresponding source was located around the Cz channel location (Linden 2005). The projection of the sources from the source space to the observation space was done using $\mathbf{X} = \sum_{i=1}^{N_s} \mathbf{a}_i \mathbf{s}_i^T$, where $\mathbf{X}$ is the observation matrix, $N_s$ is the number of sources (19 in this article), $\mathbf{a}_i$ is the spatial vector corresponding to the $i$th source, and $\mathbf{s}_i$ is the $i$th source. In figure 2, the power spectral densities (PSDs) of the recordings generated using the proposed synthesis method and using white noise are shown; the proposed method retrieves more information in all frequency bands and, moreover, preserves the hyperbolic shape of the EEG PSD.
As the performance of the methods should be evaluated in different SNR conditions, the matrix corresponding to the reconstructed ERP component ($\mathbf{S}$) and the matrix of noise signals or background EEG recording ($\mathbf{N}$) are merged using different coefficients, $\mathbf{X} = \mathbf{S} + \eta \mathbf{N}$, where η is calculated from the target SNR as
$$\eta = \sqrt{ \frac{ \| \mathbf{S} \|_F^2 }{ \| \mathbf{N} \|_F^2 \; 10^{\mathrm{SNR}/10} } }.$$
The EEG synthesizing algorithm is illustrated in figure 3.
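Under the η expression above (derived from SNR = 10·log10(‖S‖²/‖ηN‖²)), the mixing step can be sketched as follows; the matrix sizes are illustrative.

```python
import numpy as np

def mix_at_snr(S, N, snr_db):
    """Scale the noise matrix N so that X = S + eta*N has the requested SNR (dB)."""
    eta = np.sqrt(np.sum(S**2) / (np.sum(N**2) * 10 ** (snr_db / 10)))
    return S + eta * N, eta

rng = np.random.default_rng(3)
S = rng.standard_normal((19, 400))        # reconstructed ERP matrix (illustrative)
N = rng.standard_normal((19, 400))        # background EEG / noise matrix
X, eta = mix_at_snr(S, N, snr_db=-20)     # e.g. a heavily noisy -20 dB condition
```

Sweeping `snr_db` from 0 to −30 reproduces the range of conditions used in the evaluation.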

Real data
To evaluate the effectiveness of the proposed algorithm in real-world scenarios, we utilized EEG signals from BCI Competition III (2004), Dataset II (Blankertz et al 2004, 2006). The dataset consisted of participants engaging in five sessions of several runs, where they were tasked with spelling a series of characters by observing a 6 × 6 speller matrix. During each character presentation, the matrix was displayed for a duration of 2.5 s with uniform intensity (i.e. a blank matrix). Subsequently, individual rows and columns of the matrix were randomly intensified for 100 ms, resulting in 12 different stimuli. After each intensification, there was a 75 ms blank period. The order of row/column intensifications was randomized within blocks of 12, and each set of 12 intensifications was repeated 15 times for each character, resulting in a total of 180 intensifications per character. Following each sequence of 15 sets of intensifications, there was another 2.5 s blank period. Since the target character appeared in two out of the 12 rows/columns, the experimental paradigm was considered an odd-ball paradigm, with the intensification of the target row/column expected to elicit a P300 component. EEG signals were recorded using a 64-channel recording scheme at a sampling frequency of 240 Hz.
For this study, we utilized the training set of 'Subject A,' which consisted of 85 epochs (characters). To focus specifically on the time range where P300 components are predominantly observed, we extracted the signals from the onset of stimuli to 650 ms after the onset. Subsequently, an 8th-order Chebyshev bandpass filter with a passband of 0.1-10 Hz was applied to remove baseline drift and high-frequency artifacts from the signals. To estimate the P300 component, we only considered the first repetition (trial) out of the 15 trials available for each character. The estimated component was then correlated with the average of the 15 trials for each character. To demonstrate the significance of our work, we applied the proposed algorithm to both target trials (containing P300 components) and non-target trials (without P300 components), allowing for a comprehensive evaluation of its performance.

Metrics
To compare the estimated and the original ERP component in the synthesized EEG dataset directly, four different metrics are used: (1) delay, the difference between the indices of the maximum amplitude in the estimated and the original waveforms; (2) relative amplitude, the ratio of the maximum amplitude of the estimated component to that of the original one; (3) temporal correlation, the correlation between the estimated waveform and the original one; and (4) spatial correlation, the correlation between the spatial vectors.
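These four metrics can be computed directly; the function below is a straightforward sketch of those definitions (the function and argument names are ours).

```python
import numpy as np

def erp_metrics(s_est, s_true, a_est, a_true):
    """Delay (samples), relative amplitude, temporal and spatial correlation."""
    delay = int(np.argmax(s_est)) - int(np.argmax(s_true))      # (1) delay
    rel_amp = s_est.max() / s_true.max()                        # (2) relative amplitude
    t_corr = np.corrcoef(s_est, s_true)[0, 1]                   # (3) temporal correlation
    s_corr = np.corrcoef(a_est, a_true)[0, 1]                   # (4) spatial correlation
    return delay, rel_amp, t_corr, s_corr
```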

Tensorization
Making tensors from the EEG recording matrix can be done in several ways, e.g. concatenating recordings of different epochs in the third mode of the tensor, using a time-frequency representation such as the wavelet transform, or using transforms that have been developed for ERP extraction and adding the reconstructed component as the second slice of the tensor. In this work, though all approaches were explored, the third family showed better performance and was accordingly chosen for presentation and comparison of the results. Temporal PCA (TPCA) and ICA were used as the blind source separation methods in which sources are extracted from the observation matrix (Jung et al 1999, Kayser and Tenke 2003, Bernat et al 2005). Then, the source that is most correlated with the standard gamma function is selected as the candidate source and projected to the observation space using the corresponding spatial vector. Thereafter, the reconstructed matrix is placed in the second slice of the third mode of the tensor. The implementation of these methods was done using the ERP PCA Toolkit and ICALAB (Cichocki 2002, Dien 2010). The other method is based on spatio-temporal filtering, called STF, which estimates the source and the spatial vector utilizing several trials (Li et al 2009). Afterwards, the source is projected to the observation space using the forward model ($\mathbf{a}\mathbf{s}^T$) and the reconstructed matrix is placed in the third mode of the tensor.
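This tensorization family amounts to stacking the raw epoch with a rank-one reconstruction of the candidate component; a minimal sketch follows, where the candidate `a` and `s` are placeholders for the STF/ICA/TPCA output.

```python
import numpy as np

def build_tensor(X, a, s):
    """Stack the raw epoch X (C x T) with the rank-one reconstruction a s^T
    as the two slices of the third mode, giving a C x T x 2 tensor."""
    recon = np.outer(a, s)              # forward-model reconstruction a s^T
    return np.stack([X, recon], axis=2)

rng = np.random.default_rng(5)
X = rng.standard_normal((19, 400))      # single-trial epoch
a = rng.standard_normal(19)             # estimated spatial vector (placeholder)
s = rng.standard_normal(400)            # estimated source waveform (placeholder)
T3 = build_tensor(X, a, s)
```

The resulting C × T × 2 tensor is the input to the decompositions compared in the results.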
Regarding the rank estimation, we employed the modified eigenvalues estimator for Tucker rank determination (MEET) algorithm (Yokota et al 2016) to estimate the rank of the first mode (R1). After careful consideration, and for the sake of consistency among all SNRs, we set the first-mode rank to 8 and the third-mode rank (R3) to 2. Additionally, we intuitively set the temporal rank, R2, to 3, due to the presence of one ERP source, one physiologically meaningful EEG component, and one non-physiological source (e.g. noise).

Synthesized EEG
The comparison of the performance of the proposed method using different tensorization approaches is shown in figure 4. It can be inferred that integrating the spatio-temporal filtering method (STF) with the ETucker decomposition outperformed the other approaches. What can be clearly seen in this figure is the robustness of the algorithm in low SNR (high noise) conditions. However, almost every approach fails to estimate the spatial vector as the SNR worsens (below −20 dB).
As a tensor decomposition, Tucker has been shown to act as a source separation algorithm in which each vector of a factor matrix can be considered a source basis. In table 1, the performance of Tucker decomposition and ETucker, which was specifically extended for single-trial ERP extraction, is compared with the reference algorithms that were initially used for generating the tensors. The table also shows a sharp drop in the performance of Tucker in low SNR conditions, as if Tucker were unable to distinguish the source from the background noise.
As table 1 illustrates, utilizing ICA alone can accurately estimate the temporal occurrence and morphology of the target component across all SNR conditions. However, it is inadequate in determining the peak amplitude and spatial vector of the component. The proposed method, ETucker, in conjunction with ICA, significantly improves peak detection and peak onset estimation, while the spatial distribution estimation remains inferior. A comparison of the proposed method with the Tucker decomposition and ICA alone reveals that ETucker demonstrates superior performance in almost every SNR condition, particularly in low SNR or extremely noisy conditions. The results of applying STF to our synthesized dataset suggest that the method demonstrates reliable performance in high SNR conditions. However, it is inadequate in estimating the amplitude and peak occurrence in lower SNR conditions. These findings align with the results of the original paper (Li et al 2009). Furthermore, the results indicate that the use of the proposed ETucker method in conjunction with STF not only improves the performance of amplitude estimation, but also enhances component occurrence detection and temporal correlation. It is important to note, however, that the spatial correlation remains inadequate in high SNR conditions for all methods. Additionally, the findings of our study indicate that the Tucker decomposition built upon the STF method is ineffective in extracting the target component, particularly at high SNR levels where the component is almost diminished.
Furthermore, table 1 suggests that the TPCA method performs relatively acceptably in estimating the target component at the 0 dB SNR level. However, as the SNR decreases, the temporal occurrence and correlation of the estimation deteriorate significantly. To address this limitation, we propose the use of ETucker in conjunction with TPCA. The outcomes imply that this approach significantly improves the delay, temporal correlation, and amplitude estimation of the target component. Despite this, it is important to note that all methods, including TPCA, exhibit limitations in accurately estimating the spatial vectors of the target component in noisy conditions. This is a common limitation among all methods, regardless of the use of Tucker or ETucker decomposition.
In addition to Tucker decomposition, we also compared our proposed method with NTD, specifically targeting the peak corresponding to our intended ERP component, the P300, which is characterized by a positive peak. Table 2 presents the results of this comparison. The temporal estimation performance of NTD was found to be superior to Tucker decomposition, indicating its ability to capture the temporal dynamics of the P300 component more accurately. However, it is important to note that the spatial precision of NTD was poor. This can be attributed to the specific characteristics of the synthesized data used in our study, where different weights were assigned to project sources onto the EEG subspace, resulting in a mixture of positive and negative values. The optimization procedure of NTD, which enforces non-negativity, ignores the presence of negative values, leading to suboptimal spatial estimation. Furthermore, when comparing our proposed method with NTD, our model demonstrated superior performance. This can be attributed to the fact that our approach imposes a physiologically meaningful constraint rather than relying solely on non-negativity.

Real data
The STF tensorization approach, using the same hyperparameters that were used to simulate the synthesized EEG data, was employed to generate a tensor from the multi-channel recordings. Temporal correlation was calculated between the estimated component in both target and non-target epochs and the grand average of four specific channels: Cz, Fpz, Oz, and T9. The selection of Cz and Fpz channels was based on their proximity to the source of the P300 component, while Oz and T9 were chosen as they are spatially distant and not expected to contain substantial P300-related information. The reported results present the average correlation of the target row and column for target trials (2 out of 12), as well as the average correlation of all rows and columns for non-target trials (10 out of 12). These correlations provide insights into the performance of the algorithm in distinguishing between target and non-target trials.
The findings indicate that the Cz channel ( figure 5(a)) and Fpz channel ( figure 5(b)) exhibit the highest mean correlation across different epochs, suggesting their sensitivity to the underlying ERP characteristics. Furthermore, these channels demonstrate a significant distinction between target and non-target trials. Conversely, in figures 5(c) and (d), the mean correlation between target and non-target ERPs is not significant. This indicates that the algorithm successfully detects the ERP component in target trials, while being unable to identify a significant component in channels that lack sufficient information about the underlying ERP component.

Limitations and future directions
The proposed algorithm in this study has been shown to perform superiorly compared to Tucker, NTD, and other conventional matrix-based transformations for the task of source separation. However, the convergence of the algorithm is not fully guaranteed due to the complexity of the loss function. This complexity leads to some unexpected trends in the results, as seen in figures 4 and 5 and tables 1 and 2. In future directions of this study, more complex and non-linear optimization algorithms should be investigated to address this challenge. One possible approach is to use second-order methods, such as Newton's method and quasi-Newton methods. These methods require the computation of the Hessian matrix, which can be computationally expensive, but they can provide faster convergence than first-order methods. Another approach is to use heuristic methods, such as genetic algorithms or simulated annealing, which do not rely on gradient information and can be less sensitive to the loss function's complexity.
In this study, we utilized higher-order singular value decomposition (HOSVD) for initializing the first and third modes of the tensor decompositions under investigation. While the uniqueness of HOSVD is guaranteed due to the uniqueness of the singular vector matrices of the corresponding unfoldings of the tensor, we adopted a different approach to initialize the second factor matrix, which in turn compromised the uniqueness of the proposed decomposition. However, it is important to note that the main focus of this study was to explore the Tucker decomposition in a way that allows for accurate prediction of the underlying ERP by appropriately combining different fibers of the second factor matrix. Additionally, the introduced constraint (the λ-weighted term in (6)) ensures that the amplitude of the estimated component falls within an acceptable range. Moreover, the presence of w helps alleviate the issue of column permutation within the second-mode factor matrix. In future steps of this study, additional constraints and/or unique initialization methods can be incorporated into the decomposition to address the issue of non-uniqueness.
Additionally, it is important to investigate the robustness of the proposed algorithm to different types of noise and different EEG datasets. This will allow us to better understand the limitations and strengths of the proposed algorithm and to identify areas for further improvement. Furthermore, it would also be interesting to study the extension of the proposed algorithm to other source separation tasks, such as audio and image source separation.
Overall, this study provides a promising approach for source separation, but there is still room for improvement. By investigating more complex and non-linear optimization algorithms and studying the robustness of the proposed algorithm, we can further improve its performance and make it more widely applicable to different source separation tasks.

Conclusion
In this paper, we proposed an extended tensor decomposition for single-trial ERP extraction. The proposed model is based on Tucker decomposition and adds meaningful constraints to the tensor factorizations to improve performance in source separation. Our experimental results in the synthesized EEG dataset show that the proposed model outperforms Tucker decomposition in terms of source separation accuracy. Additionally, the feasibility of leveraging the model in clinical applications is evaluated using a publicly available dataset that includes the P300 component.
This study has important implications for the field of EEG signal processing and source separation. Our findings suggest that modified tensor decompositions may be a promising approach for extracting single-trial ERPs from EEG data. Furthermore, the proposed method may also be applied to other source separation problems, such as audio and image source separation.
In future work, we plan to investigate the effectiveness of the proposed method on different EEG datasets and with different types of noise. Additionally, we will explore the possibility of incorporating additional constraints into the tensor decomposition to further improve the performance of the proposed method. Overall, this study has the potential to pave the way for new and more accurate methods for extracting single-trial ERPs and other source separation applications.

Data availability statement
The simulated data and all data supporting the findings of this study are available from the authors upon reasonable request. The real-world data are also available through the BCI competition website (https://www.bbci.de/competition/iii/).