EEG-based hierarchical classification of level of demand and modality of auditory and visual sensory processing

Objective. To date, most research on electroencephalography (EEG)-based mental workload detection for passive brain-computer interface (pBCI) applications has focused on identifying the overall level of cognitive resources required, such as whether the workload is high or low. We propose, however, that being able to determine the specific type of cognitive resources being used, such as visual or auditory, would also be useful. This would enable the pBCI to take more appropriate action to reduce the overall level of cognitive demand on the user. For example, if a high level of workload was detected and it is determined that the user is primarily engaged in visual information processing, then the pBCI could cause some information to be presented aurally instead. In our previous work we showed that EEG could be used to differentiate visual from auditory processing tasks when the level of processing is high, but the two modalities could not be distinguished when the level of cognitive processing demand was very low. The current study aims to build on this work and move toward the overall objective of developing a pBCI that is capable of predicting both the level and the type of cognitive resources being used. Approach. Fifteen individuals undertook carefully designed visual and auditory tasks while their EEG data was being recorded. In this study, we incorporated a more diverse range of sensory processing conditions including not only single-modality conditions (i.e. those requiring one of either visual or auditory processing) as in our previous study, but also dual-modality conditions (i.e. those requiring both visual and auditory processing) and no-task/baseline conditions (i.e. when the individual is not engaged in either visual or auditory processing). Main results. 
Using regularized linear discriminant analysis within a hierarchical classification algorithm, the overall cognitive demand was predicted with an accuracy of more than 86%, while the presence or absence of visual and auditory sensory processing were each predicted with an accuracy of approximately 70%. Significance. The findings support the feasibility of establishing a pBCI that can determine both the level and type of attentional resources required by the user at any given moment. This pBCI could assist in enhancing safety in hazardous jobs by triggering the most effective and efficient adaptation strategies when high workload conditions are detected.


Introduction
A passive brain-computer interface (pBCI) is a system that continuously monitors a user's mental state (e.g. cognitive or emotional) as they are engaged in human-computer interaction and uses this information to enhance the interaction in some helpful way. An example would be a system that automatically adjusts the difficulty level of a video game based on the estimated level of engagement of the player. In a pBCI, the information about the user's mental state is derived from neurophysiological signals, and because it is non-invasive, relatively cost-effective, portable, and has a high temporal resolution, electroencephalography (EEG) is currently the most common method for acquiring these signals [1].
Key to developing a pBCI is the ability to reliably detect the different mental states of interest, and research has been done toward detecting such states as fatigue [1,2], attention [3-5], stress [6,7], and various emotions [8-11]. Among the most researched states in pBCI research is mental workload, due to its relevance in many applications, including improving workplace safety in high-risk environments. The purpose of a pBCI for mental workload detection would be to monitor the user's mental workload and adjust the task demands in order to avoid overload, the point at which the individual's processing capacity is exceeded and performance deteriorates significantly.
Mental workload is defined as the portion of an individual's limited cognitive resources required to perform a task or set of concurrent tasks [12]. Several studies have demonstrated the ability to classify mental workload associated with various laboratory tasks (e.g. N-back [13], mental arithmetic [14], Sternberg memory [15], auditory oddball target paradigm [16,17], visual search [17], and a laboratory version of air traffic control for vigilance tests [18]), as well as in more realistic task scenarios like flight [1,17,19-21] or driving [1,22] simulation, using EEG signals. Typically in such studies, different levels of mental workload are induced by altering the difficulty of the task being performed or by introducing secondary tasks.
A distinct but closely related concept to mental workload is multiple resource theory (MRT) [23]. According to MRT, an individual's capacity for workload consists of separate resource pools, defined in terms of a set of three dimensions of information processing, each with different levels [23]. In terms of multi-tasking, an individual will be better able to perform two tasks concurrently to the extent that the tasks draw on different levels within these dimensions. Broadly, the dimensions can be described as modality (levels: visual or auditory), code (levels: verbal or spatial), and stage of processing (levels: perceptual-central or response-related) [23].
In virtually all practical scenarios where pBCI systems would find application, the user will almost certainly at times be engaging in tasks involving various sub-tasks using different levels within the three dimensions of information processing. For example, an air traffic controller may be receiving information both visually and aurally. In such a scenario, a pBCI that is capable of predicting not just the overall level of mental workload (the amount of cognitive resources being used) but also the type(s) of resources being used would be very useful, as it would allow the BCI to apply more suitable adaptation techniques to avoid overload conditions. For example, if the pBCI determines that the air traffic controller is using a significant amount of auditory attentional resources, but no visual ones, the system could present some information visually rather than aurally. Indeed, research has demonstrated that dual-task interference, the enhanced difficulty experienced when performing two or more tasks simultaneously, can be reduced by off-loading some of the information channels from one modality to another. In particular, it has been shown that transferring some information channels from the visual modality to the auditory modality can reduce dual-task interference [24]. Similarly, transferring information channels from the auditory to the visual modality has also been shown to reduce dual-task interference [25,26]. This is likely because the two modalities draw on separate resource pools, so switching the modality of a task reduces competition for the same cognitive resources. The development of a pBCI that is capable of predicting both the amount and type of cognitive resources being used is the ultimate, long-term objective of this research.
While many studies have attempted to detect overall mental workload level via EEG, few studies have considered the classification of sensory modality. Putze et al [27] explored single-trial classification of tasks involving primarily visual and auditory processing via band power changes and the event-related potential (ERP) waveform. They showed that trials of a 'silent video watching' task and an 'audiobook listening' task could be differentiated with about 94% accuracy. While the results of this study were promising, the authors acknowledged that other factors, including the different memory loads in the auditory and visual conditions, may have contributed to the separability of the classes. Also, this study only used stimuli from a single modality in each task, so it is unclear whether task separability was based on attention to/perception of stimuli or just passive exposure to them.
In a recent study [28], we sought to address some of these issues by investigating visual and auditory versions of a target monitoring task that differed only in the sensory modality of the target stimuli to be monitored. We also included passive/task-irrelevant stimuli from the opposing modality, so that we could be sure any ability to differentiate the tasks was indeed due to sensory processing/perception and not simply to the passive sensation of the stimuli. Furthermore, we investigated the effect of the amount of sensory processing requirements on the ability to classify modalities by implementing high and low demand conditions of each task. The results showed that at the higher level of demand, the auditory vs. visual processing tasks could be distinguished with an accuracy of 77.1% on average. However, in the low demand condition, where the required sensory processing was very low, the visual and auditory tasks could not be classified with an accuracy exceeding chance. These results support the feasibility of developing a pBCI for detecting the type of attentional resources being required of the user at a given time, at least in the higher demand condition, which is most critical.
The current study aims to build significantly on this work and move toward the overall objective of developing a pBCI that is capable of predicting both the level and the type of cognitive resources being used. In doing so, we incorporated a more diverse range of task scenarios than was previously explored, including not only single-modality task conditions (i.e. those requiring one of either visual or auditory processing) as in our previous study, but also dual-modality task conditions (i.e. those requiring both visual and auditory processing) and no-task/baseline conditions (i.e. when the individual is not engaged in either visual or auditory processing). In total we considered nine different task conditions (four single-modality, four dual-modality, and one baseline). Whereas in our previous work we explored just the classification of auditory vs. visual sensory processing within high task demand conditions and low task demand conditions separately, in this work we developed a hierarchical classifier to simultaneously detect, among these varied conditions, (1) the level of task demand (high or low), (2) the presence or absence of auditory processing, and (3) the presence or absence of visual processing. The novel hierarchical structure we have proposed provides a foundational model for creating a pBCI designed to predict sensory domain-specific workload across diverse task conditions. The scenarios explored more closely reflect the range of conditions an actual pBCI user may experience in real-world settings, and the pBCI must be able to handle this variation.

Material and methods
The dataset used in this study was previously described in [28].

Participants
Fifteen healthy adults (9 female, average age = 28.7 ± 5.7 years) participated in this study. Participants were included if they had no history of neurological disease, disorder, injury, or cognitive impairment, and had normal or corrected-to-normal visual acuity and normal auditory acuity. The study was approved by the Interdisciplinary Committee on Ethics in Human Research at Memorial University of Newfoundland. All participants provided written informed consent prior to participating. The data from one male subject were excluded from analysis due to a reported lack of concentration during the session, as well as excessive body movement, which significantly affected signal quality.

EEG data acquisition
Scalp EEG was recorded for each subject using a 64-channel ActiCHamp system (Brain Products GmbH, Gilching, Germany) with active electrodes. The electrode placement followed the 10-10 international standard [29,30]. The EEG signals were recorded at a sampling rate of 500 Hz and referenced to electrode FCz. Electrode impedance was maintained below 10 kΩ throughout recording. To reduce motion artifacts, participants were asked to limit movement as much as possible during experimental trials.

Experimental design
We designed a simple monitoring task that allowed us to induce two types of sensory processing (visual and auditory), each at two levels of demand (high and low). Single- and dual-modality versions of the task were used. In the single-modality tasks, both the visual and auditory conditions contained nearly identical visual and auditory stimuli (within a demand level), and the conditions differed only with respect to which type of stimuli the individual attended to. In the dual-modality tasks, the participant attended to both types of stimuli. The different tasks are described in detail in the following sections. The cues for the experimental trials were designed using the Cogent 2000 toolbox in MATLAB, and the trials were completed via a desktop computer.

Type of sensory processing: auditory and visual
In each task trial, the participant was asked to monitor stimuli (letters A-Z from the English alphabet and numbers 0-9) and respond by pressing the keyboard space bar when specific target characters were presented. During the trial, targets were presented at a rate of 30 ± 3%. For Auditory (Aud) trials, the stimuli were presented via speakers located on the desk in front of the subject. For Visual (Vis) trials, the stimuli were presented as white characters in the center of the black computer screen. The target characters were indicated prior to the start of each trial (they were different for each trial).
We wanted to ensure that any differences we observed between the auditory and visual trials were actually due to the sensory processing requirements of the task, and not merely due to passive exposure to sensory stimuli. Therefore, in addition to the stimuli described above, which were presented in the sensory modality of interest for that trial and which the participant was instructed to attend to (henceforth referred to as 'active stimuli'), in the single-modality tasks (described in section 2.3.3) another set of stimuli were presented in the opposite sensory modality, which the participant was instructed not to attend/respond to or monitor in any way (henceforth referred to as 'passive stimuli'). The passive stimuli were characters from the Greek alphabet, chosen because they were similar to the active stimuli yet different enough to be obviously task-irrelevant and easily ignored.
The result of this task design was that for the single-modality condition, both visual and auditory trials included both visual and auditory stimuli, and the only difference between the conditions was the type of sensory stimuli that the subject had to actually pay attention to during the trial. Note that during auditory trials, the participants were instructed to keep their eyes open and look at the screen, but not to pay attention to the visual passive stimuli. For the visual trials, the passive auditory stimuli were played through the speakers, so the participants were exposed to them but were told not to pay attention to them. In the dual-modality tasks, there were active stimuli in both sensory modalities (described in section 2.3.3).

Sensory processing demand
For each sensory modality, two levels of sensory processing were induced: low-demand (L) and high-demand (H). The level of processing demand was varied by changing: (1) the stimulus presentation speed and (2) the number of target letters/numbers. In the low-demand condition, the stimuli were presented slowly (once every 2.25 s) and there was a single target, while in the high-demand condition the stimuli were presented more quickly (once every 0.75 s) and there were two targets.

Types of task trials

Single-modality trials
For the single-modality tasks, one of the two sensory modalities made up the active stimuli, while the other made up the passive stimuli. That is, for the visual trials the active stimuli were visual while the auditory stimuli were passive, and for the auditory trials the active stimuli were auditory while the passive stimuli were visual. There were trials in both the low- and high-demand conditions for each sensory modality.
In the single-modality trials, the active and passive stimuli were presented at the same speed.
To summarize, there were four different single-modality task conditions: high demand auditory (AudH), high demand visual (VisH), low demand auditory (AudL), and low demand visual (VisL).

Dual-modality trials
There were also four different dual-modality task conditions. In each case, both visual and auditory active stimuli were present, and the participant had to attend (and respond) to the stimuli from both modalities. In the first dual-modality task, both the auditory and visual stimuli were presented at the faster speed (AudHVisH). In the second, both were presented at the slower speed (AudLVisL). In the remaining two dual-modality tasks, the auditory stimuli were presented at the faster speed while the visual stimuli were presented at the slower speed (AudHVisL), and vice versa (AudLVisH).

Baseline trials
Three types of baseline trials were collected: one in which passive stimuli in both modalities were presented at the higher speed (BLH), one in which passive stimuli in both modalities were presented at the lower speed (BLL), and a true baseline in which the participant focused on the black computer screen with a constant '+' symbol in the center. Because the results of our previous work suggested that the BLL condition was too similar to the AudL and VisL conditions [28], we excluded the BLL and BLH conditions, and only the true baseline (BL) trials were considered in this work.
Table 1 summarizes the different types of task trials.

Experimental procedure
The experiment was carried out in a single session taking approximately 2 hours.The session started with the recording of two 1-minute baseline trials, one with eyes-open and one with eyes-closed, followed by a practice block for participants to become familiar with the tasks and the experimental procedure.
The main part of the experiment consisted of four blocks of trials. Each block comprised 18 trials: two trials of each of the four single-modality task conditions (AudH, VisH, AudL and VisL), one trial of each of the four dual-modality task conditions (AudHVisH, AudLVisL, AudHVisL and AudLVisH), and two trials of each of the three baseline conditions (BLH, BLL and BL). Each trial was 30 s in duration. This resulted in a total of 240 s of data for each of the single-modality tasks, 120 s of data for each of the dual-modality tasks, and 240 s of data for each of the baseline conditions. The order of trials was near-random (no two trials of the same condition appeared back-to-back) and different for each block.
Before each trial, the subject was informed about the type of trial and reminded of the instructions via text on the computer screen.For non-baseline trials, the subject was then presented with the target stimulus/stimuli.Participants started each trial by pressing the space bar, and thus were able to progress through the trials at their own pace, taking breaks as needed between trials and blocks.The experiment ended with one eyes-closed and one eyes-open baseline trial, each 60 s in duration.
At the end of each trial, the participant's response accuracy was shown in order to increase engagement in the experiment. Response accuracy was calculated according to:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

where:
• TP (true positive) = key press when the stimulus is the target
• TN (true negative) = no key press when the stimulus is not the target
• FP (false positive) = key press when the stimulus is not the target
• FN (false negative) = no key press when the stimulus is the target.
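As a minimal illustration, the response accuracy can be computed from these four counts as follows (the example counts are hypothetical, not taken from the experiment):

```python
def response_accuracy(tp, tn, fp, fn):
    """Proportion of stimuli that elicited the correct response (press / no press)."""
    return (tp + tn) / (tp + tn + fp + fn)

# Hypothetical trial: 8 hits, 18 correct rejections, 2 false alarms, 2 misses
print(round(response_accuracy(8, 18, 2, 2), 3))  # 0.867
```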
Also at the end of each trial, participants were asked to complete a modified version of the Rating Scale for Mental Effort, indicating the amount of mental effort they perceived was needed to complete the trial (see figure 1). Figure 2 illustrates the timing of the experiment overall, as well as at the block and trial levels.

Data analysis

EEG pre-processing and artifact removal
The EEG data were analyzed using EEGLAB 2021.0 [31] and MATLAB 2022a (The MathWorks, Inc.), with the use of custom code where necessary. The pre-processing steps included two central signal processing techniques: artifact subspace reconstruction (ASR) [32] and independent component analysis (ICA) [33,34]. For each subject, the data from the entire session were first downsampled to 250 Hz using an anti-aliasing filter. Following this, a band-pass filter was applied to the data to eliminate any potential baseline drifts and line noise. The filter used for this step was a 0.5-55 Hz Hamming-windowed sinc FIR filter with a transition bandwidth of 1 Hz.
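These first two steps can be sketched outside EEGLAB with SciPy standing in for the MATLAB routines. This is an illustrative approximation, not the paper's pipeline: the filter order (825 taps, chosen for roughly a 1 Hz transition band at 250 Hz) and the simulated data are assumptions, and `filtfilt` gives a zero-phase result rather than EEGLAB's exact implementation.

```python
import numpy as np
from scipy.signal import resample_poly, firwin, filtfilt

fs_raw, fs_new = 500, 250  # original and target sampling rates (Hz)

# Simulated continuous EEG: 63 channels x 30 s at 500 Hz (placeholder data)
eeg = np.random.randn(63, fs_raw * 30)

# Downsample 500 -> 250 Hz; resample_poly applies an anti-aliasing filter
eeg_ds = resample_poly(eeg, up=1, down=2, axis=1)

# 0.5-55 Hz band-pass, Hamming-windowed sinc FIR, applied forward-backward
b = firwin(numtaps=825, cutoff=[0.5, 55], pass_zero=False,
           fs=fs_new, window="hamming")
eeg_filt = filtfilt(b, [1.0], eeg_ds, axis=1)

print(eeg_filt.shape)  # (63, 7500)
```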
The ASR algorithm is described in detail in [32]. The EEGLAB plugin clean_rawdata(), an offline version of the data cleaning suite from BCILAB, was used to de-noise the continuous channel data [35]. This process included removing poorly correlated (r < 0.7) channels, rejecting all inter-trial intervals (to remove any noisy components that might contribute to the next step of the process), and removing non-stationary high-amplitude bursts. The standard deviation cut-off for removal of bursts was set to 50 to be very conservative and avoid losing potentially valuable EEG. This process improved data stationarity, which is an assumption of the subsequent ICA. The next steps were to interpolate any removed channels and, finally, re-reference the data to the common average.

ICA
Adaptive mixture independent component analysis (AMICA) [36,37] was applied. AMICA assumes that the probability density functions of the source activations are mixtures of Gaussians. The parameters used for AMICA were: number of models, 1 (i.e. a single model, for faster analysis); number of times to perform rejection of unlikely data based on initial samples, 15; iteration interval between rejections, 1. All other parameters were set to their defaults. Next, equivalent current dipoles were estimated using the DIPFIT plugin [38] for the scalp projections of the independent components (ICs). Then, ICs with bilaterally near-symmetrical projection patterns were modeled with dual equivalent dipoles using the fitTwoDipoles EEGLAB plugin [39]. ICs were selected if: (1) they were localized inside the brain [40]; (2) they had residual variance less than 25% [41]; (3) their power spectral density (PSD) followed a 1/f curve; and (4) they were scored by ICLabel as brain ICs [42].
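The four selection criteria amount to a simple conjunctive filter over IC properties. The sketch below is hypothetical (the dictionary fields and values are illustrative, not an EEGLAB data structure), but it makes the selection logic explicit:

```python
# Hypothetical IC property records mirroring the four selection criteria
ics = [
    {"inside_brain": True,  "residual_var": 0.12, "psd_one_over_f": True,  "iclabel": "brain"},
    {"inside_brain": True,  "residual_var": 0.40, "psd_one_over_f": True,  "iclabel": "brain"},
    {"inside_brain": False, "residual_var": 0.10, "psd_one_over_f": False, "iclabel": "muscle"},
]

def keep_ic(ic):
    # (1) dipole inside the brain, (2) residual variance < 25%,
    # (3) 1/f-shaped spectrum, (4) labelled 'brain' by ICLabel
    return (ic["inside_brain"] and ic["residual_var"] < 0.25
            and ic["psd_one_over_f"] and ic["iclabel"] == "brain")

kept = [i for i, ic in enumerate(ics) if keep_ic(ic)]
print(kept)  # [0]
```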
All steps of the EEG data pre-processing, including ASR and ICA, were performed in a single script on each subject's entire session data. However, manual inspections were done during the parameter selection for each plugin to ensure that artifacts were removed effectively without losing a significant amount of data.

Feature extraction
To extract relevant features from the EEG data, we focused on the power of the signals within standard frequency bands of interest: Delta (1-4 Hz), Theta (4-8 Hz), Alpha (8-12 Hz), Beta (12-30 Hz), and Gamma (30-50 Hz). To calculate the average power within each of these bands, we divided the data into 10 s windows (epochs) with a sliding step of 1 s (i.e. 9 s overlap). The power spectra were then computed using the fast Fourier transform with a DPSS taper and a frequency smoothing of 0.5 Hz. For each trial, the average power was calculated for each of the 63 electrodes within each of the five frequency bands. The function ft_freqanalysis() in the FieldTrip toolbox [40] was used for this process.
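The windowing and band-averaging can be sketched as follows. This is an approximation of the paper's FieldTrip pipeline: Welch's method here stands in for the DPSS multitaper estimate, and the simulated trial data are placeholders.

```python
import numpy as np
from scipy.signal import welch

fs = 250
bands = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 12),
         "beta": (12, 30), "gamma": (30, 50)}

def band_powers(trial, fs, win_s=10, step_s=1):
    """Band power per channel over sliding 10 s epochs with a 1 s step.

    trial: channels x samples array. Returns epochs x bands x channels.
    """
    n_win, n_step = win_s * fs, step_s * fs
    feats = []
    for start in range(0, trial.shape[1] - n_win + 1, n_step):
        seg = trial[:, start:start + n_win]
        f, pxx = welch(seg, fs=fs, nperseg=n_win, axis=1)
        feats.append([pxx[:, (f >= lo) & (f < hi)].mean(axis=1)
                      for lo, hi in bands.values()])
    return np.array(feats)

trial = np.random.randn(63, 30 * fs)   # one simulated 30 s trial
X = band_powers(trial, fs)
print(X.shape)                         # (21, 5, 63): 21 epochs from a 30 s trial
```

A 30 s trial with 10 s windows stepped by 1 s yields 21 overlapping epochs, matching the epoch counts implied by the cross-validation description below.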

Classification

Classification problems
In this study, we aimed to investigate the use of EEG for classifying both the level and type of cognitive load experienced as individuals underwent a diverse range of task conditions, including the performance of both single- and dual-modality tasks at different levels of task demand, as well as baseline/rest tasks. To accomplish this, we used a hierarchical classification approach.
The first level of the hierarchical classifier was responsible for predicting the level of cognitive demand (i.e. high or low). This was because our previous results suggested that level of demand could be classified with better accuracy than type of demand [28], and it is good practice in hierarchical classification to put more accurate layers first to avoid error propagation. This level consisted of a single classifier trained using data from all task conditions. The first column of table 2 lists the 'true label' for demand level assumed for each task condition (note that for the dual-modality tasks, all but the AudLVisL task, where both the visual and auditory components were at the low demand level, were considered 'high demand' conditions). Every test sample, Xi, first went through this classifier to get a prediction for demand level, ỹLi (low = 0, high = 1).
The second level of the hierarchical classifier was responsible for predicting the type(s) of cognitive demand, i.e. the sensory modality (or modalities) of the task being performed. This level consisted of two paths. The first path was for samples predicted in the first level as being high demand (ỹLi = 1), and the classifiers in this path were trained only on samples from high demand conditions. The second path was for samples predicted in the first level as being low demand (ỹLi = 0), and the classifiers in this path were trained only on samples from low demand conditions. Because it is possible for a given sample to have both an auditory and a visual component (i.e. the dual-modality conditions), or neither an auditory nor a visual component (i.e. the baseline/rest condition), each path of the second level consisted of two distinct modality classifiers: one predicted whether the sample, Xi, had a visual component (ỹVi = 1) or not (ỹVi = 0), and one predicted whether the sample had an auditory component (ỹAi = 1) or not (ỹAi = 0). The second and third columns of table 2 list the 'true labels' for visual/not visual (yVi) and auditory/not auditory (yAi) for each task condition. The demand-specific visual classifier and the demand-specific auditory classifier operate in parallel and are independent. The output ỹVi = 1 indicates the presence of visual sensory processing, while ỹAi = 1 indicates the presence of auditory sensory processing. So, in scenarios where both outputs are 1, the task contained elements of both visual and auditory sensory processing (i.e. was from a dual-task condition), while in scenarios where both outputs are 0, the task contained neither (i.e. was from the baseline condition). The results from these classifiers are distinct and do not influence each other. For each individual classifier within the hierarchical classification scheme, regularized linear discriminant analysis (r-LDA) from the MATLAB toolbox BCILAB was used as the classification algorithm.
We first implemented the hierarchical classification approach considering only single-modality and baseline conditions (i.e. excluding the dual-modality scenarios), and then incorporated the four dual-modality conditions as well. Figure 3 depicts the hierarchical classifier structure (for the case where dual-modality conditions are included).
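The routing logic of the hierarchy can be sketched as below. This is not the paper's implementation: scikit-learn's shrinkage LDA stands in for BCILAB's r-LDA, and the features and labels are synthetic stand-ins for the band-power samples and table 2 labels.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA

def make_rlda():
    # 'lsqr' solver with Ledoit-Wolf shrinkage approximates regularized LDA
    return LDA(solver="lsqr", shrinkage="auto")

rng = np.random.default_rng(0)
n, d = 200, 315                      # samples x features (5 bands x 63 channels)
X = rng.normal(size=(n, d))          # synthetic band-power features
y_level = rng.integers(0, 2, n)      # demand labels (0 = low, 1 = high)
y_vis = rng.integers(0, 2, n)        # visual component present?
y_aud = rng.integers(0, 2, n)        # auditory component present?

# Layer 1: demand-level classifier trained on all conditions
clf_level = make_rlda().fit(X, y_level)

# Layer 2: demand-specific visual and auditory detectors (two paths)
clfs = {lvl: {"vis": make_rlda().fit(X[y_level == lvl], y_vis[y_level == lvl]),
              "aud": make_rlda().fit(X[y_level == lvl], y_aud[y_level == lvl])}
        for lvl in (0, 1)}

def predict(x):
    lvl = int(clf_level.predict(x[None])[0])      # route to one path
    return (lvl,
            int(clfs[lvl]["vis"].predict(x[None])[0]),
            int(clfs[lvl]["aud"].predict(x[None])[0]))

lvl, vis, aud = predict(X[0])
```

Note that the vis/aud outputs are independent, so (1, 1) corresponds to a dual-modality prediction and (0, 0) to a baseline prediction, as in the text.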

Classification method
For each classification scenario (i.e. with and without dual-modality conditions included), stratified 4-fold block-wise cross-validation was used. To begin, the data were randomly divided into four subsets such that each subset comprised 25% of the trials from all conditions to be classified (i.e. two out of eight trials for each single-modality condition and the baseline condition, one out of four trials for each dual-modality condition). In each 'fold' of the cross-validation, one of the subsets was held out for testing, while the remaining trials made up the training set. To balance the classes in the training set, the class with the larger number of trials was down-sampled to match the number of trials in the other class. This down-sampling was done pseudo-randomly to ensure that each and every task condition was present in the training set, and that the number of trials for each of the task conditions within a class was as similar as possible. To reduce the variability introduced by the random division of data into four subsets for the cross-validation, and by the random down-sampling needed to balance the classes, the 4-fold block-wise cross-validation was repeated for 100 runs. Because we extracted overlapping 10 s epochs from the 30 s trials of each condition, employing trial-wise cross-validation (rather than randomized k-fold cross-validation) ensures that there is no data leakage between the test and training sets. For the three classification problems of (i) High vs. Low Demand, (ii) Visual vs. Not Visual, and (iii) Auditory vs. Not Auditory, overall classifier performance was estimated via the mean of the accuracy, as well as the sensitivity and specificity, over all runs and folds of the cross-validation. To further investigate the performance of the classifiers, we also calculated the accuracies for each task condition within the two classes. For example, for the High vs. Low classification, within the 'high' class, we calculated the percentage of AudH samples that were correctly classified as being high demand (and did so for all task conditions with yLi = 1).
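The stratified, trial-wise fold construction can be sketched as follows. The function name and trial list are hypothetical; the key point is that whole trials (not individual overlapping epochs) are assigned to folds, with each condition split evenly across them.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical trial list for the single-modality + baseline scenario:
# 8 trials per condition, with (condition name, demand label)
trials = ([("AudH", 1)] * 8 + [("VisH", 1)] * 8 +
          [("AudL", 0)] * 8 + [("VisL", 0)] * 8 + [("BL", 0)] * 8)

def blockwise_folds(trials, k=4):
    """Stratified trial-wise folds: each fold holds 1/k of every condition,
    so overlapping epochs from one trial never straddle train and test."""
    by_cond = {}
    for i, (cond, _) in enumerate(trials):
        by_cond.setdefault(cond, []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_cond.values():
        idx = rng.permutation(idx)
        for f in range(k):
            folds[f].extend(int(i) for i in idx[f::k])
    return folds

folds = blockwise_folds(trials)
print([len(f) for f in folds])  # [10, 10, 10, 10]
```

Each fold then serves once as the test set while the other three, after pseudo-random down-sampling of the larger class, form the training set.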

Results

Considering single-modality and baseline conditions only
The performance of the classifier in predicting the level of task demand (first layer of the hierarchy) in the case where only the single-modality and baseline conditions were considered is presented in table 3. An overall accuracy of 86.4 ± 6.9% was achieved in classifying level of task demand, averaged across all participants. A paired t-test on the sensitivity and specificity results showed that high demand conditions were classified significantly better (88.3 ± 6.9%) than low demand conditions (84.4 ± 9.4%) (t(13) = 2.17, p = .049). The breakdown of these results into individual accuracies for each of the task conditions indicates that the level of task demand could be classified with relatively high accuracy for samples from all task conditions (per-condition accuracies ranged from 78.1% to 91.8%). The high demand conditions included AudH and VisH, while the low demand conditions included AudL, VisL and BL.
The results for detecting sensory modality (second layer of the hierarchy) are presented in tables 4 and 5. Recall that based on the prediction of the task demand classifier in the first layer, test samples were fed into two demand-specific classifiers trained to detect the presence or absence of (i) auditory and (ii) visual processing.
In terms of auditory processing detection (see table 4(a)), the overall accuracy of the classifier in determining the presence or absence of an auditory component was 71.9 ± 5.4%, averaged across all participants. Of all samples with an auditory processing component, 73.3 ± 8.0% were correctly identified, while of all samples with no auditory processing component, 70.5 ± 6.5% were correctly identified. The results for each individual task condition show that, in general, auditory processing detection was more successful in high demand conditions (AudH = 80.5 ± 10.5% and VisH = 78.5 ± 9.2%) than low demand conditions (AudL = 67.7 ± 9.8%, VisL = 60.4 ± 10.9%, and BL = 75.4 ± 10.5%) (paired t-test, t(13) = 8.21, p < .001).
In terms of visual processing detection (see table 5(a)), the overall accuracy of the classifier in determining the presence or absence of a visual processing component was 70.6 ± 4.4%, averaged across all participants. Of all the samples with a visual processing component, 71.0 ± 6.3% were correctly identified, while of the samples with no visual processing component, 70.2 ± 7.9% were correctly identified. The results for each individual task condition show that visual processing detection was generally more successful in high demand conditions (VisH = 80.6 ± 7.8% and AudH = 77.7 ± 11.1%) than low demand conditions (VisL = 63.7 ± 10.0%, AudL = 61.8 ± 11.8% and BL = 73.1 ± 11.1%) (paired t-test, t(13) = 9.3, p < .001).
Figure 4 presents the accuracies of the hierarchical classifications for each subject when considering single-modality tasks.
Table 6 presents the confusion matrices for each of the classifiers used in classifying the single-modality tasks, providing a detailed evaluation of their performance.

Considering single-modality, dual-modality, and baseline conditions
The performance of the classifier in predicting the level of task demand (first layer of the hierarchy) in the case where single-modality, dual-modality and baseline conditions were considered is presented in table 7. An overall accuracy of 87.7 ± 5.3% was achieved, averaged across all participants. Again, two distinct demand-specific classifiers were trained to determine the presence or absence of (i) auditory and (ii) visual components. In terms of auditory processing detection (see table 8(a)), the overall accuracy in determining the presence or absence of an auditory processing component was 70.2 ± 5.0%, averaged across participants. The sensitivity and specificity, shown in table 8(a), indicate that 70.3 ± 6.9% of samples with an auditory processing component were correctly identified, while 70.2 ± 4.8% of samples without an auditory processing component were correctly identified. Breaking down the results into each task condition (tables 8(b) and (c)) shows that the presence or absence of an auditory component could be detected more accurately in the high demand task conditions (per condition accuracies ranged from 68.9% to 77.1%) than in the low demand task conditions, excluding BL (per condition accuracies ranged from 58.5% to 66.3%, with the exception of the BL condition at 81.4%; paired t-test, t(13) = 4.43, p < .001).
In terms of visual processing component detection (see table 9(a)), the overall accuracy in determining the presence or absence of a visual processing component was 70.8 ± 3.9%, averaged across all participants. The sensitivity and specificity, shown in table 9(a), indicate that 71.9 ± 4.9% of samples with a visual processing component were correctly identified, while 69.9 ± 5.7% of samples without a visual processing component were correctly identified. Breaking down the results into each task condition (tables 9(b) and (c)) shows that the presence or absence of a visual processing component could be more accurately predicted in high demand conditions (per condition accuracies ranged from 70.3% to 83.8%) than in low demand conditions (per condition accuracies ranged from 59.2% to 70.0%, with the exception of 78.7% for the BL condition) (paired t-test, t(13) = 4.70, p < .001).
Figure 5 visualizes the accuracies of the hierarchical classifications for each subject when considering multi-modal tasks.
Table 10 presents the confusion matrices for each classifier used in classifying the multi-modal tasks, providing further insight into the classifiers' performance.
To determine the most important features contributing to the level of task demand and sensory modality classification problems, we examined the feature weights used by the r-LDA classifier for each classification problem separately. Figures 6 and 7 show these feature weights, averaged across all participants and over all runs and folds of the trial-wise cross-validation. Recall that the features were calculated as the EEG signal power in each of five frequency bands (delta, theta, alpha, beta, gamma) at all 63 electrodes. To improve visualization and interpretability, the feature weights are shown for three classification problems, for the single-modality and multi-modal tasks in figures 6 and 7, respectively: the high demand vs. low demand classification problem, and the two modality-specific classifiers within the high demand branch.
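A feature vector of this kind (power in five bands at each electrode) can be computed along the following lines. This is a sketch only: the paper does not specify the spectral estimator, band edges, or sampling rate, so the Welch method, the band boundaries in `BANDS`, and the default `fs` below are all assumptions for illustration.

```python
import numpy as np
from scipy.signal import welch

# Assumed band edges in Hz (not taken from the paper)
BANDS = {"delta": (1, 4), "theta": (4, 8), "alpha": (8, 13),
         "beta": (13, 30), "gamma": (30, 45)}

def band_power_features(eeg, fs=250):
    """eeg: (n_channels, n_samples) array for one epoch.
    Returns a (n_channels * 5,) feature vector, concatenated band-by-band."""
    freqs, psd = welch(eeg, fs=fs, nperseg=min(eeg.shape[1], 2 * fs))
    feats = []
    for lo, hi in BANDS.values():
        mask = (freqs >= lo) & (freqs < hi)
        feats.append(psd[:, mask].mean(axis=1))  # mean power per channel
    return np.concatenate(feats)
```

With 63 electrodes this yields a 315-dimensional feature vector per epoch, one block of 63 values per band.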

Classification of level of task demand
The results from tables 3 and 7 support previous studies and suggest that EEG-measured brain activity can effectively be used to predict the level of processing demand (high vs. low). The use of regularized LDA, a traditional machine learning classifier, was able to robustly and accurately distinguish the level of demand, with accuracies around 86%-87%, even when there was significant variability in the task conditions used to train and test the classifier. Furthermore, the level of task demand could be detected with reasonable accuracy for samples from all task conditions considered. While an important result on its own, this was also critical for the purpose of sensory modality detection. The classifiers for auditory and visual processing detection were demand-specific, meaning they were trained using samples from only the high or the low demand task conditions. Depending on their predicted level of task demand, test samples were fed into the appropriate demand-specific modality classifiers. If the accuracy of the task demand classifier (the first layer of the hierarchy) was insufficient, it would negatively impact the results of the modality classifications (the second layer of the hierarchy).
To get more insight into which features are most useful in classifying the level of processing demand, we looked at the weights given to the features by the r-LDA classifier for the high demand vs. low demand classification problem. Figures 6(a) and 7(a) show the average of these weights when considering (i) only single-modality and (ii) multi-modal tasks in the classifications, respectively. The results indicate that theta and delta band features were weighted most highly by the classifiers. Theta band spectral power has been shown to increase with increasing task demand [1] and cognitive load (see [43][44][45] for review), and synchronization in the theta band has been shown to be sensitive to working memory load [46]. Experimental studies provide evidence that delta oscillations in the brain have functions broadly similar to those of theta oscillations (see [45] for review). On the other hand, the weights associated with the beta band show a reverse trend, especially in the parietal area. Increased power within the beta band has been shown to correspond with overload [1,47] and to be negatively correlated with task performance [47].
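Inspecting the weights of a fitted linear classifier in this way amounts to reshaping its coefficient vector back into a band-by-electrode layout. A minimal sketch, assuming the feature vector was built band-by-band (all channels of delta, then theta, and so on); the function name `weight_topographies` is hypothetical.

```python
import numpy as np

def weight_topographies(rlda, n_channels=63,
                        band_names=("delta", "theta", "alpha", "beta", "gamma")):
    """Reshape a fitted binary LDA's coefficient vector into a
    {band: per-channel weight array} dict, assuming features were
    concatenated band-by-band (all channels of band 1, then band 2, ...)."""
    w = np.asarray(rlda.coef_).ravel()  # (n_bands * n_channels,)
    assert w.size == len(band_names) * n_channels
    return {name: w[i * n_channels:(i + 1) * n_channels]
            for i, name in enumerate(band_names)}
```

Each per-band array can then be plotted as a scalp topography; averaging the weights across participants and cross-validation folds, as done for figures 6 and 7, smooths out fold-to-fold variability.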
The reader may note that the stimuli were presented periodically, and at different rates for the low and high demand conditions. This may raise the question of whether the ability to distinguish the low demand from the high demand conditions is really due to changes in band power related to sensory processing, or rather due to steady-state evoked potentials. To investigate this, we performed a classification of the BL L vs. BL H conditions using the same feature selection and classification procedure described in this paper for the individual classifiers within the hierarchical structure. The result was that BL L and BL H could be classified with an accuracy of 58% across participants. While this is slightly above chance (which could be due to steady-state evoked potentials, or some other factor), it is far below the average accuracies of approximately 87% that were obtained in the high demand vs. low demand scenarios. This provides strong evidence that classification was not based on steady-state frequencies but rather, as expected, on the changes in band power due to sensory processing.

Classification of sensory processing
The results from tables 4(a) and 5(a) suggest that, when considering only single-modality and baseline conditions, the presence or absence of both auditory and visual sensory processing could be detected with accuracies around 70%, which is significantly higher than chance (for approximately 640 samples, the upper bound of the 95% confidence interval for chance is about 53.9%). For both modalities, though, the ability to detect the presence or absence of each type of sensory processing was significantly better in high demand than in low demand conditions. Even when including dual-modality conditions along with single-modality and baseline, the presence or absence of both auditory and visual sensory processing could be detected with similar accuracies, around 70%, which is again significantly higher than chance (for approximately 800 samples, the upper bound of the 95% confidence interval for chance is about 53.4%). Again, predictions were more accurate for samples from the high demand conditions than for those from the low demand conditions; specifically, in visual component detection, the result was significantly higher in the high demand conditions (see tables 8(a) and 9(a)).
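The quoted chance-level upper bounds follow from the normal approximation to the binomial distribution for a balanced two-class problem: chance accuracy is 0.5, with a 95% interval of ±1.96 standard errors. A small sketch reproduces the figures (the exact values depend on the "approximately" stated sample counts, so the last decimal may differ slightly):

```python
from math import sqrt

def chance_upper_95(n_samples, p=0.5):
    """Upper bound of the 95% confidence interval for chance-level accuracy
    of a balanced binary classifier, via the normal approximation."""
    return p + 1.96 * sqrt(p * (1 - p) / n_samples)

# n ~ 640 -> about 0.539 (53.9%), n ~ 800 -> about 0.535 (close to the
# reported ~53.4%), n ~ 320 -> about 0.555 (close to the reported ~55.4%)
for n in (640, 800, 320):
    print(n, round(chance_upper_95(n), 4))
```

The interval shrinks with the square root of the sample count, which is why the bound for ~800 samples is tighter than for ~320.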
Overall, these are very encouraging results. The results from the scenario where all task conditions (single-modality, dual-modality and baseline) were included suggest that even when sensory processing in the two modalities is occurring simultaneously, the presence or absence of both auditory and visual processing can be detected via EEG with accuracy significantly exceeding chance. This supports the feasibility of developing an EEG-based passive BCI for detecting both the level and type of cognitive processing for use in complex, realistic task scenarios. And while the ability to classify sensory processing modality was lower in the low task demand conditions, in the real-world scenarios where such a passive BCI would most likely be used, it is the high demand conditions that are most critical, particularly when considering applications to improve safety in high-risk work environments. It is in high demand scenarios that one would want the system to redistribute tasks between the two modalities to reduce the overall workload on the user; such redistribution would not be necessary in low demand conditions. The fact that we were able to accurately classify the auditory and visual sensory modalities in high demand conditions is a very promising result (see tables 4(b), 5(b), 8(b) and 9(b)). Delta band weights appear to be of most relevance in all four of the sensory processing classification problems. This is in line with the previous literature, where it has been shown that delta oscillations demonstrate their highest response amplitude in parietal locations when reacting to visual oddball targets, whereas the highest delta response amplitudes to auditory target stimuli are observed in central and frontal areas (see [45,48]). Specifically, high weights were given to occipital theta in both the single-modality and multi-modal classification problems in visual component detection (figures 6(c) and 7(c)). The occipital lobe serves as the brain's visual processing center, so it is reasonable that this region would play a
significant role in detecting the visual component. Additionally, a correlation between enhanced delta and theta EEG power in the occipital areas and exertion in a visual attention task has previously been shown [49].
The fact that sensory processing detection was poorer in the low demand conditions was not unanticipated, based on our previous work on this dataset [28]. In that work, we explored the ability to distinguish visual from auditory processing trials via EEG signals, in single-modality conditions and within workload levels (i.e. Aud H vs. Vis H and Aud L vs. Vis L). Our findings showed that while the Vis H and Aud H conditions could be differentiated with an accuracy of approximately 77.1%, the accuracy achieved for Vis L vs. Aud L was at chance level. It is encouraging that the results in the current work showed improvement over our previous findings for both the low and high demand task conditions (tables 4(c) and 5(c)). Even when including the more complex dual-modality conditions and the baseline condition (tables 8(c) and 9(c)), the presence and absence of auditory and visual processing in the Vis L and Aud L conditions were detected with accuracies between 59% and 66% (for approximately 320 samples, the upper bound of the 95% confidence interval for chance is about 55.4%). The improvement in the results may be attributable to the use of a single classifier trained to distinguish auditory vs. visual processing in the previous work, as compared to the use of two classifiers for predicting the presence/absence of each modality in this work. In future, these accuracies could potentially be increased further through improvement of the classifier, collection of more training data, etc.
It is worth noting that, given the way our classes are defined, two consecutive trials may be from the same class, and in the trial-wise CV one might be included in the training set while the other is in the testing set, resulting in temporal autocorrelation that could inflate the classification accuracy. We did not anticipate this effect to be significant. However, to ensure that our results were not affected by potential temporal autocorrelation across consecutive trials, we conducted a re-analysis of the data using randomized labels. In this approach, all aspects of the analysis remained unchanged, except that the condition labels were randomized across trials in the dataset. By doing this, true class differences were eliminated while the potential temporal autocorrelation was maintained. If the temporal autocorrelation were significant enough to affect the classification accuracy, we would expect to observe accuracies exceeding chance in the randomized-labels analysis. The analysis with randomized labels yielded accuracies of 49.7 ± 4.4% and 50.4 ± 4.2% for auditory and visual component detection in the single-modality tasks, respectively. Similarly, for the multi-modal tasks, the accuracies with randomized labels were 49.7 ± 2.8% and 50.1 ± 2.5% for auditory and visual component detection, respectively. These outcomes suggest that there is no significant effect of temporal autocorrelation on the classification accuracies reported in the original analysis.
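This randomized-label control can be sketched as below. It is an illustration of the general technique, not the authors' exact pipeline: scikit-learn's shrinkage LDA stands in for the r-LDA, and the function name `randomized_label_control` is hypothetical.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score

def randomized_label_control(X, y, n_folds=5, seed=0):
    """Shuffle condition labels across trials (destroying true class
    differences while keeping any temporal autocorrelation in X), then
    rerun the same cross-validated classification. Accuracies near 50%
    indicate autocorrelation is not inflating the original results."""
    rng = np.random.default_rng(seed)
    y_shuffled = rng.permutation(y)
    clf = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto")
    scores = cross_val_score(clf, X, y_shuffled, cv=n_folds,
                             scoring="accuracy")
    return scores.mean()
```

Because only the labels are permuted, any structure in the EEG features that is shared by neighboring trials survives; if that structure alone could drive classification, the shuffled-label accuracy would rise above chance.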
While the results presented are promising, it is important to acknowledge some limitations of this study. First of all, we designed a very controlled task rather than using a highly realistic one (e.g. a flight simulator). This was necessary to eliminate potentially confounding variables; however, it is not clear how our results will transfer to the more naturalistic scenarios in which a passive BCI might be used. The strict design may have either overestimated or underestimated the performance of the classifier in recognizing auditory and visual processing tasks in a more realistic environment. Further research is needed to examine this. Also, while we were able to detect both the level of task demand and the presence/absence of auditory and visual components in single-modality, dual-modality and baseline conditions, we did not detect the level of demand within each modality. For example, for samples from the Aud L Vis H condition, we may successfully classify that they are high demand with both a visual and an auditory processing component, but we would not know that the auditory processing component is low and the visual component is high. This additional information could be useful for the passive BCI, and future work could aim to achieve this added functionality. However, the system we propose would still be very practical. It would (1) identify whether the task demand on the user is high or low, and (2) identify whether the user is engaged in a visual-only task, an auditory-only task, a visual-auditory task, or is at baseline. As mentioned, it is in the high demand condition that adaptation strategies might be employed. In the case of a single-modality task, the system could off-load some information to the other modality to avoid potential overload. In the dual-modality case, even if the demand level of each modality were known, it would likely be unwise to off-load from the higher demand modality to the lower one; it would be wiser to simply reduce demand altogether. So the important information is whether the level of demand is high or low, and what types of processing the individual is experiencing (i.e. whether it is a single-modality or a dual-modality task), regardless of the level of each type. This is the information the system we have described would provide.

Conclusion
The current study aimed to build on our previous work and move toward the overall objective of developing a pBCI that is capable of predicting both the level and the type of cognitive resources being used. In doing so, we incorporated a more diverse range of task scenarios, including not only single-task conditions (i.e. those requiring one of either visual or auditory processing) as in our previous study, but also multi-task conditions (i.e. those requiring both visual and auditory processing) and no-task/baseline conditions (i.e. when the individual is not engaged in either visual or auditory processing). These scenarios more closely reflect the range of conditions an actual pBCI user may experience in real-world settings. Our results indicate that, using a traditional classifier within a hierarchical approach, the overall level of demand was successfully detected in all task conditions, and the presence/absence of auditory and visual processing could be detected accurately in high demand scenarios (and less accurately, though above chance level, in low demand ones). These results support the potential of developing an EEG-based passive BCI system for detecting both the level and type of cognitive processing. Such a system could significantly improve safety in high-risk and safety-critical work environments by detecting and off-loading high workload from one modality to another, thereby decreasing overall workload and, in turn, the likelihood of human error-related accidents.

Figure 1 .
Figure 1.The modified Rating Scale for Mental Effort (RSME) shown at the end of each trial to ask the participant to rate the amount of mental effort required to complete the trial.

Figure 2 .
Figure 2. Experimental procedure. The experiment was completed in a single session. The protocol began and concluded with one eyes-closed and one eyes-open baseline trial, followed by a practice block. In the main part of the experiment, blocks were similar in terms of the arrangement of trials; the only difference was the order of the trials. There were two trials for each single-modality condition in each block, and one trial for each dual-modality condition.

Figure 3 .
Figure 3. The structure of the hierarchical classifier, showing the flow of a test sample, i, with feature vector X i, and the resulting predicted labels ỹLi, ỹVi and ỹAi for workload level (1 = high, 0 = low), visual component (1 = yes, 0 = no) and auditory component (1 = yes, 0 = no), respectively.

Figure 4 .
Figure 4. Accuracies of the hierarchical classifications in single-modality tasks per subject, averaged across cross-validation folds. The blue bars represent the first layer of the hierarchical classification, which determines task demand. The orange and yellow bars correspond to the second layer, which detects the auditory and visual components, respectively.

Table 8 .
Classification accuracies for auditory component detection in the multi-modal tasks. (a) Overall results: accuracy 70.2 ± 5.0%; % Aud correct = 70.3 ± 6.9; % Non-Aud correct = 70.2 ± 4.8. (b) Breakdown of the results for the high demand conditions.

Figure 5 .
Figure 5. Accuracies of the hierarchical classifications in multi-modal tasks per subject, averaged across cross-validation folds. The blue bars represent the first layer of the hierarchical classification, which determines task demand. The orange and yellow bars correspond to the second layer, which classifies the auditory and visual components, respectively.

Figure 6 .
Figure 6. Average of feature weights in the r-LDA classifier across single-modality classification problems for (a) high demand vs. low demand; (b) and (c) modality-specific classification within high demand conditions.

Figure 7 .
Figure 7. Average of feature weights in the r-LDA classifier across multi-modal classification problems for (a) high demand vs. low demand; (b) and (c) modality-specific classification within high demand conditions.
To get more insight into which features are most useful in classifying the sensory processing, the weights given to the features by the r-LDA classifier are represented in figures 6 and 7. Figures 6(b) and 7(b) show the average of these weights for the auditory component detection problem within the high demand classification. Figures 6(c) and 7(c) show the average of these weights for the visual component detection problem.

Table 1 .
Different trial types, including task and baseline trials. Trials are defined based on the modalities of the target and passive stimuli (auditory and visual), and on the stimulus presentation speed, which defines high demand and low demand trials.

Table 2 .
True labels for samples of all task conditions in terms of demand level, presence of a visual component, and presence of an auditory component.

Table 3 .
Classification accuracies for workload in the single-modality tasks.

Table 4 .
Classification accuracies for auditory component detection in the single-modality tasks.

Table 5 .
Classification accuracies for visual component detection in the single-modality tasks.

Table 6 .
Confusion matrices of three different classifiers used in classifying single-modality tasks.

Table 7 .
Classification accuracies for workload in the multi-modal tasks.

Table 9 .
Classification accuracies for visual component detection in the multi-modal tasks.

Table 10 .
Confusion matrices of three different classifiers used in classifying multi-modal tasks.
(a) Level of cognitive demand classifier.