Abstract
Objective. The subthalamic nucleus (STN) of the basal ganglia interacts with the medial prefrontal cortex (mPFC) and shapes a control loop, specifically when the brain receives contradictory information from either different sensory systems or conflicting information from sensory inputs and prior knowledge that developed in the brain. Experimental studies demonstrated that significant increases in theta activities (2–8 Hz) in both the STN and mPFC as well as increased phase synchronization between mPFC and STN are prominent features of conflict processing. While these neural features reflect the importance of STN-mPFC circuitry in conflict processing, a low-dimensional representation of the mPFC–STN interaction referred to as a cognitive state, that links neural activities generated by these sub-regions to behavioral signals (e.g. the response time), remains to be identified. Approach. Here, we propose a new model, namely, the heterogeneous input discriminative-generative decoder (HI-DGD) model, to infer a cognitive state underlying decision-making based on neural activities (STN and mPFC) and behavioral signals (individuals' response time) recorded in ten Parkinson's disease (PD) patients while they performed a Stroop task. PD patients may have conflict processing which is quantitatively (may be qualitative in some) different from healthy populations. Main results. Using extensive synthetic and experimental data, we showed that the HI-DGD model can diffuse information from neural and behavioral data simultaneously and estimate cognitive states underlying conflict and non-conflict trials significantly better than traditional methods. Additionally, the HI-DGD model identified which neural features made significant contributions to conflict and non-conflict choices. Interestingly, the estimated features match well with those reported in experimental studies. Significance. Finally, we highlight the capability of the HI-DGD model in estimating a cognitive state from a single trial of observation, which makes it appropriate to be utilized in closed-loop neuromodulation systems.
Export citation and abstract BibTeX RIS
1. Introduction
The subthalamic nucleus (STN) of the basal ganglia and medial prefrontal cortex (mPFC) are essential brain structures in the decision-making processes [1, 2]. The interaction between the STN and the PFC results in slower and more controllable decisions during conflict tasks which has been previously demonstrated [1–4]. The presence of ambiguity in selection choices or conflict choices activates the cortex which sends excitatory inputs to the STN via the hyper-direct pathway [4–8]. In response, the STN sends excitatory inputs to the globus pallidus internus (GPi). GPi sends inhibitory inputs to the thalamus to suppress subsequent motor actions that lead to wrong decisions [4, 6, 7, 9]. This inhibition acts as a brake that slows the decision-making process down and helps to prevent impulsive decisions [4, 6, 7, 10].
Despite this mechanistic understanding of the role of the mPFC-STN neuronal circuit in decision-making with competing options, a low-dimensional representation of the mPFC–STN interactions that links neural activities generated by these sub-regions to behavioral signals (e.g. the response time) remains to be identified. This cognitive state can reveal the state of the brain during decision-making. Such a low dimensional representation, also known as the latent brain state, was identified in other situations, e.g. the mood state in patients with major depression [11], the state underlying cognitive deficits [12], the state of speech intention [13], the learning state in memory [14], the state of attention in eye-metrics [15], and the state underlying pain [16]. In this work, we propose a Bayesian-based model, the heterogeneous input discriminative-generative decoder (HI-DGD), which can combine information from multiple sources with different sampling rates, suitable for decoding the dynamics of the cognitive states underlying conflict processing in the verbal Stroop task. The HI-DGD consists of
- (i)A state transition model that controls the cognitive state evolvement over time.
- (ii)A generative model that represents the behavioral signal as a function of the cognitive state.
- (iii)A discriminative model that formulates the cognitive state as a function of (high-dimensional) neural activity features.
This hybrid approach will allow us to simultaneously combine simple generative models of behavior as a function of the underlying cognitive state with information encoded in complex patterns of high-dimensional neural activity. By using HI-DGD, we show that cognitive states underlying conflict and non-conflict tasks can be inferred from individual trials of the experiment accurately. Moreover, calculated model parameters reveal significant features of neural activities that classify conflict and non-conflict choices. Interestingly, these features have been proven experimentally, demonstrating the fidelity of our approach.
2. HI-DGD model
Here we propose the HI-DGD model. HI-DGD model infers a cognitive state, x k , that is a low-dimensional representation of conflict processing state underlying observed neural activity, , and behavioral signals, . A graphical representation of the HI-DGD model is shown in figure 1(B). The HI-DGD model uses three processes to infer a cognitive state that links neural activities and behavioral observations:
- (i)A state transition process that represents the evolution of the states in time, which follows a random walk model [17], .
- (ii)A discriminative process to link neural activity and the cognitive state as a function of high-dimensional observations, .
- (iii)A generative process that links the cognitive state to behavioral signals, .
Assume that s k represents high dimensional observations, at time index k, and z k represents all other observation elements which are usually low dimensional or very sparse [14, 18]. Given s k and z k , our goal for the HI-DGD model is to infer the posterior distribution over cognitive states, x k , as written below
This equation defines the conditional density function of the current value of the cognitive state given all sources of observations from k = 1. Following [19], equation (1) can be approximated as follow
where the nominator of the first term is called the prediction process. The second term of equation (2), , is the conditional density of the behavior observation, i.e. response time, given the current value of the state. The integral term of equation (2) is the one-step prediction density by the posterior function from the last time-step derived by the Chapman–Kolmogorov equation [20].
The state transition process is defined by a set of free parameters β as
The observation processes are defined by
where equations (4) and (5) indicate observation processes for the neural and the behavioral signals, respectively.
Most of the previous studies in cognitive state modeling [11, 12, 21, 22] considered a two-step process, namely, encoding and decoding, to identify the relationship between brain dynamics and behavior through a latent dynamical variable. In contrast to these works, the HI-DGD model infers the underlying cognitive state using both neural and behavioral observations simultaneously. In this way, the HI-DGD can use information from both behavior and neural data to estimate cognitive dynamics. In addition, the HI-DGD model can combine information at different temporal scales through the process defined in equations (3)–(5) to infer the underlying cognitive state. This property will make the solution flexible and powerful in decoding a cognitive state from different sources of information where each source carries information in different temporal scales.
2.1. Marked-point process observation model for modeling behavioral signals
In the present work, we utilized a marked point-process (MPP) observation model [23] for modeling the behavioral signals, both response times and trial types (conflict or non-conflict), in the Stroop task [24]. Each participant's response time is considered an event. We define as the random variable representing the onset of the participant's response time. At each time step k, indicates the onset of the participant's response and otherwise. Additionally, to differentiate between conflict and non-conflict tasks, we define another random variable, referred to as a mark, , in the MPP theory [13, 23, 25]. The variable mk can take two values, 1 and −1, where 1 represents a conflict decision, and −1 indicates a non-conflict choice. The distribution of the rk and mk represents the MPP observation process associated with the behavioral signals.
For the MPP observation, the joint mark intensity function of the process is defined by
where λ defines the instantaneous probability of observing an event, speech onset, at time index k with mark mk given the full history of events up to time k, Hk . Using the factorization rule, the joint distribution of the mark values and events is proportional to
where is the conditional intensity function (CIF) for the underlying event process and is the conditional intensity of the marks for specific events. Hereafter is written as for the sake of simplicity. In the case of observation processes characterized by the response time accompanied by a two choices decision like conflict and non-conflict choices, we can define the following distribution
Note that there are only two types of tasks, therefore mk can be fully represented by a Bernoulli random variable. To be consistent with the general linear model (GLM) theory, we used the 'log' and 'logit' link functions to describe the Poisson and Bernoulli distributions, respectively. Therefore, the predictor can be defined as a function of the cognitive state by
where are the observation process free parameters [25] and Hk is the history term of previous events. This leads to the observation likelihood formulated as
The first term, , characterizes the distribution of observing an event (the speech onset) and the remaining term, , characterizes the distribution of mark values (trial type) given an event time.
The remaining continuous observations such as local field potentials (LFP) and electroencephalogram (EEG) neural features during the cognitive task can be modeled as Gaussian observation processes (unless another type of observation is preferred) defined as
where maps the state values to the observations and εk is additive white Gaussian noise with covariance matrix .
2.2. Feature extraction and data processing
To reach a set of representative features relevant to conflict processing in the brain, we replicated the procedure proposed in [26]. We wrote a customized Python script along with using the MNE library [27] to analyze the LFP/EEG signals data. All LFP/EEG data were down-sampled to 1000 Hz and filtered by a band-passed filter (between 1 and 400 Hz).
2.2.1. Power spectrum
Following [26], we use time-frequency features from 3 s before to 2 s after the cue time. The LFP/EEG was decomposed into the time-frequency domain using a bank of band-pass filters for the 2–100 Hz in steps of 1 Hz. Then, the frequency features are normalized with the baseline power (p) of 3 to 2 s before the cue time using
where means the average of baseline power. We quantified the power spectrum features for delta-theta (2–8 Hz), alpha-l (8–12 Hz), alpha-h (12–20 Hz), low-beta (20–30 Hz), gamma-l (30–55 Hz), and gamma-h (65–100 Hz) frequency bands by averaging power spectrum over each frequency band.
2.2.2. Phase coherency
The STN and mPFC phase synchronization values at (k, f), , were estimated by projecting the phase difference between STN (LFP) and cortex (EEG) by phase coherency measure described in [28, 29] as follows.
here k is the time step (we considered 10 ms time resolutions for the time steps) and f is the frequency. The range of is from 0 to 1, where 0 indicates random phase differences between STN and mPFC, and 1 indicates identical phase differences.
2.2.3. Behavioral signals
For behavioral recording, a microphone of an Apple MacBook recorded the patients' voices. The speech onset time considered the response time and also measured the patient's performance based on his/her ability in making the correct decision. Trials with response time 2.0 s or 0.3 s and those with noise or unclear/incorrect speech responses were excluded from the analysis used in [26].
3. Results
Previous studies demonstrated that conflict processing is associated with statistical changes in the power and the phase of low-frequency oscillations (2–8 Hz—delta and theta bands) in the STN and mPFC, as well as in behavior such as the participants' response-time [26, 30]. We analyzed LFPs, EEGs, and response time of six PD patients who under went deep brain stimulation (DBS) surgeries recruited in a verbal Stroop task. A schematic view of neural feature extraction from LFP signals recorded from STN and EEG signals recorded from mPFC is shown in figure 1(A). This figure includes the inferred cognitive state from neural and behavioral signals (the blue signal). Details regarding the patients and experimental procedures reported in a previous study that studied the same cohort of patients [26, 30] (patients 3, 5–13 in table 1 of [26]). The location of STN contacts is described in [26] under 'contact localization'. In nine patients, the contact was in the STN, and in one patient it was the dorsal border of the STN. As described in [30], the adjacent contact pair (0–1, 1–2 or 2–3) with the highest beta (13–30 Hz) power in the left STN was chosen for further analysis as it has been shown to indicate the contacts most likely within the STN. Additionally, mPFC was derived from a bipolar recording of Fz-Cz EEG electrodes.
In the present work, we selected 6 out of 10 patients to test the accuracy of our proposed algorithm for individual trials. The selection criterion was based on the accuracy of the baseline model; we excluded those patients with low accuracy ( accuracy) in distinguishing conflict and non-conflict trials by the baseline model discussed in section 3.6. Moreover, we compared the accuracy of the proposed HI-DGD model with that of the baseline model including all ten patients. As shown in supplementary results—figure C1, the accuracy of the proposed model is significantly higher than that of the baseline model (p-value ). To this end, we used neural and behavioral signals to find a low-dimensional cognitive state underlying conflict processing in the selected participants.
3.1. Identifying optimal kernel of neural features using behavioral signals
To identify how neural features are linked to the behavioral signal, we, first, utilized a Poisson probabilistic model that represents the response time as a function of neural features. This is a traditional probabilistic model that frequently has been used to identify the relationship between inputs and the time of an event (here, patient response time) [14, 18]. By considering kc as the start time of a trial, as the feature vector extracted from neural activities and M as the number of the neural features for each time index, we define the response time indicator rk at time index k with a Poisson probabilistic model as
where the is the CIF that links the neural features to the response time indicator rk with a Poisson distribution. The parameter β0 governs the baseline response time, whereas ω represents the weight by which the subject response time is related to the neural features. The logarithmic relationship imposes an empirical limitation of human response time into the model [31, 32]. The model parameters can be optimized by a GLM with a proper link function and additive noise [18], see figure 2(A). Figure 2(B) shows the absolute weight of the extracted neural features ( ω ), indicating the importance of the neural features, in representing the response times for the patients. As time passes from cue time, the importance of the neural features that represent the response time of a patient decreases, and the variability of the features increases. To identify the optimal kernel width during which neural features represent response times, we calculated the log-likelihood (LL) of the optimized model and plotted the measure for different windows in figure 2(C). The LL value represents the goodness of fit for the behavioral model. The higher the value of the LL, the better a model fits to a dataset. The LL converges to its maximal value after [400–500] ms of observing neural features after the cue time. This means that increasing the time window for neural features is not statistically significant for the model fitting. Therefore, the most relevant neural features that represent the participant's response times, i.e. behavioral signal, lie within the 500 ms time window after the cue time.
Download figure:
Standard image High-resolution imageTable 1. Neural featured used in the HI-DGD model. Note that we remove the mPFC gamma features from the neural features due to its recorded from noisy EEG [33], therefore, it may not carry useful information but noise.
F01 | Power STN 2–8 Hz | F08 | Power mPFC 8–12 Hz |
F02 | Power STN 8–12 Hz | F09 | Power mPFC 12–20 Hz |
F03 | Power STN 12–20 Hz | F10 | Power mPFC 20–30 Hz |
F04 | Power STN 20–30 Hz | F11 | Phase Coh. STN-mPFC 2–8 Hz |
F05 | Power STN 30–55 Hz | F12 | Phase Coh. STN-mPFC 8–12 Hz |
F06 | Power STN 65–95 Hz | F13 | Phase Coh. STN-mPFC 12–20,Hz |
F07 | Power mPFC 2–8 Hz | F14 | Phase Coh. STN-mPFC 20–30 Hz |
3.2. HI-DGD model parameters inference using neural activity and behavioral signal
Building on the analysis from the previous section and experimental evidence related to conflict processing representation in both neural and behavioral signals, we used the HI-DGD model to infer a cognitive state that represents the dynamics underlying conflict/non-conflict processing from neural and behavioral signals (response time). Figure 1(B) shows a graphical representation of the HI-DGD model. In HI-DGD, represents the cognitive state, neural features, and the behavioral features at time index k, respectively. The HI-DGD model uses s k and z k to infer a cognitive state xk .
Therefore, the objective for the HI-DGD model optimization is to find a parameter set that maximizes the joint distribution of neural features, behavioral signals, and the cognitive state, , for all time steps K. To meet this objective, as shown in figure 3(A), we optimized the HI-DGD model parameters by maximizing the following function
Since both model parameters and the true cognitive state are unknown, one cannot simply use a maximum likelihood approach for finding the optimal parameters. To address this issue, we used expectation maximization (EM) to estimate the model parameters. The EM algorithm is an established method for maximum likelihood estimation of model parameters in the presence of a latent process or missing observations [34]. Other approaches including fully Bayesian or variational-Bayesian approaches, can be applied to our modeling approach in the case that a prior distribution for the model parameters can be defined [35]. In appendix
With an updating rule as
Further details of the function Q are provided in appendix
Download figure:
Standard image High-resolution image3.3. HI-DGD model validation using simulated dataset
To verify HI-DGD model training and validate its performance, we simulated a dataset with the assumption that there is a low dimensional dynamical state, we referred to it as the cognitive state, that simultaneously drives two sources of observations: neural activity and behavioral signals. We generate the data for 1000 data points and 5 simulated patients. More details about the simulation are available in appendix
Figure 4(A) shows a sample of simulated data. Our simulation data has a complex observation process along with a low-dimensional state process; this setting was purposefully picked to justify the HI-DGD prediction power and build a clear comparison across trial types. The value of the objective function for the EM algorithm and all simulated patients is shown in figure 4(B), wherein the increasing trend in Q values shows that the new parameters set for the HI-DGD, at iteration , improves model training. The Q values eventually reach a steady state at around 10 iterations and give the optimal model parameters sets. The optimized model parameters for the behavioral and neural signals are shown in figures 4(C) and (D), respectively. The HI-DGD result in inferring the cognitive state associated with different trial types are shown in figure 4. The qualitative result shown in this figure verifies the HI-DGD model's ability to infer the latent state from a group of noisy observations.
Download figure:
Standard image High-resolution image3.4. HI-DGD model identification in the verbal Stroop task dataset
A similar analysis is performed for HI-DGD model identification in the verbal Stroop task dataset. The value of the objective function for the EM algorithm is shown in figure 5(A), wherein the increasing trend in Q values shows that the new parameters set for the HI-DGD, , improves LL for the joint distribution of . The Q values eventually reach a steady state at around 15 iterations and give the optimal model parameters sets.
Download figure:
Standard image High-resolution imageThe optimized model parameters for the neural- and behavioral signals are shown in figures 5(C) and (B), respectively. In the HI-DGD, we assumed that the discriminative process model, , is a linear function of the neural features with additive normal noise with variance σn [21]. The generative process model, , is a marked-point process in which its marginal CIF is weighted by the cognitive state, by coefficient cx , and the history of response times, by coefficient , with a baseline intensity of c0 [18, 25]. Similarly, the generation of the associated marks to behavioral signals is parameterized by the cognitive state, by coefficient dx , and the history of response time, by coefficient , with a baseline of d0. Notice that we fixed the and in the optimization to avoid computational complexity [13]. The variables represents the prior knowledge model of the cognitive state with a linear transition process and additive Gaussian noise with variance σ0. The parameter σx is significantly smaller than σn which indicates that the cognitive state dynamic is smoother than the actual observed neural activity.
3.5. Decoding cognitive state from single trial observations
Here we tested the ability of the HI-DGD ability in decoding the conflict state from single trials of neural recordings (STN-mPFC). Using optimized model parameters, we applied the HI-DGD model to the first 500 ms (i.e. the optimal kernel width for neural features) of neural activities (after the cue time) and inferred the cognitive state associated with each participant. We then passed the mean and variance of decoded cognitive state to a logistic regression model to distinguish between conflict and non-conflict trials, figure 3(B). The logistic regression model can simply draw a border (a threshold) to separate between conflict and non-conflict trials based on the moments of the decoded cognitive state. The time interval used to decode cognitive state is shown by a green shaded area, the first 500 ms of neural features, see figure 6(A). The average of the inferred cognitive state is shown in figure 6(B). There is a difference between the distributions of the means in different trial types. This means the mean of the decoded cognitive state for the first 500 ms represents information about the trial types. To evaluate the performance of the decoded cognitive state by HI-DGD in distinguishing between conflict and non-conflict trials, we measured accuracy and f1-score on the prediction result by the logistic regressor for each patient in figure 6(C). The mean accuracy and f1 score for HI-DGD are and ; respectively. This shows that the cognitive state decoded by HI-DGD reliably and accurately represents conflict processing in the STN in PD patients.
Download figure:
Standard image High-resolution image3.6. Performance comparison with the baseline
To identify to what extent the HI-DGD model improves conflict state prediction compared to the traditional analysis, we compared the decoding performance of the HI-DGD model with traditional methods in the literature. One should note that the power of the STN theta band was considered as the main indicator of conflict versus non-conflict choices [26]. Therefore, we compared the HI-DGD model with a baseline model that only uses this feature to distinguish between conflict and non-conflict trials. We used the classifier that was used in the previous section with the STN theta band power as the input feature (consistent with [26, 30]). Similar to the HI-DGD model, we only used the first 500 ms of the neural features after cue time. The accuracy and f1-score of the baseline and the HI-DGD model are shown in figure 6(C). The HI-DGD model significantly improved the prediction of conflict vs. non-conflict trials compared to the baseline model for all the patients (KS-2 [36] with p-value ). We also compared the HI-DGD result with the Encoder-Decoder model suggested in [21] in figure 6(C). The plot shows the HI-DGD increased the accuracy and f1-score of the predictions by approximately 5%.
3.7. Significant neural features underlying conflict and non-conflict processing
To determine the importance of individual neural features in conflict processing by HI-DGD, we masked all neural features except the feature of interest and used the neural decoder to measure the accuracy of the prediction. In this way, we could identify the contribution of each neural feature in decoding the cognitive state. As shown in figure 7(A), the most significant features are
- (i)The power of the STN and mPFC channels in the 2–8 Hz frequency band.
- (ii)Phase coherency between STN and mPFC in the theta frequency band (2–8 Hz frequency band).
- (iii)Power of the STN channel in the gamma frequency band.
Download figure:
Standard image High-resolution imageThe power of the theta frequency band (2–8 Hz) in both STN and mPFC, as well as the phase coherency in the theta band, significantly contributed to the decoding of conflict vs. non-conflict choices. These results of the HI-DGD model are in agreement with those reported in the literature [26, 30], confirming that the inferred cognitive state represents a low dimensional representation of the cognitive state underlying conflict processing in the STN-mPFC neuronal circuit. One possible interpretation of the role of Gamma frequency band features in decoding the cognitive state is as follows. Gamma activities are associated with contralateral movement such as speech production [37]. Speech production, in a verbal Stroop task, is an inevitable part of the task and its onset is marked as the response time. As reported in [26], the response times were significantly longer for conflict trials than non-conflict trials which makes it a good indicator for separating conflict and non-conflict trials. Because gamma activity was locked to the response time (speech onset), the contribution of gamma activity may reflect the contribution of response time in decoding the cognitive state. Note that even though the gamma frequency band features are likely related to speech involved in the Stroop task, the gamma activity is involved in other functions such as learning, perception, memory, and emotional processing, and these processes could also be involved. This makes it difficult to interpret the exact role of gamma activity in the decoding cognitive state by HI-DGD.
Furthermore, we tested the performance of the HI-DGD with all neural features against that using only the significant ones in figures 7(B)–(E). We use the receiver operator characteristic (ROC) curve as an evaluation metric for the HI-DGD performance. The ROC curve is a probabilistic curve that indicates the true prediction rate against the false prediction rate at various threshold values for a classifier in identifying conflict and non-conflict trials. We plotted the ROC curve for HI-DGD performance with all neural features and using only the significant ones in figures 7(B) and (C); respectively. The area under the curve (AUC) of the ROC plot is the measure of the ability of the classifier to identify trial types and is used as a summary of the ROC curve. The dashed line in figures 7(B) and (C), associated with AUC = 0.5, indicates the performance of a non-expert classifier that is unable to distinguish between trial types. Therefore, The higher the AUC, the better performance of the model at distinguishing between the trial types. The AUC for HI-DGD performance for the patients is shown in figure 7(D). The non-significant difference between the AUC of HI-DGD with the significant neural features and all neural features shows that the identified significant neural features represent a similar result as the HI-DGD model does with all neural features. We also verified the findings by measuring accuracy and f1 scores for the HI-DGD in distinguishing between the trial types in figure 7(E). In sum, the difference between these measures is negligible.
4. Discussion
Both mPFC and STN play crucial roles in conflict processing [30]. Previous studies reported elevated theta power in STN and mPFC, and increased phase synchrony in the theta band between STN and mPFC in conflict trials [26, 30]. A recent study showed increased theta-gamma phase-amplitude coupling (PAC) at STN in conflict trials [30]. Despite the importance of these features in conflict processing (calculated by group-averaging methods across conflict and non-conflict trials), none of them represent common dynamics of conflict processing at every single trial of observation. Such dynamics, referred to as a cognitive state, represent a low-dimensional representation of the mPFC-STN interaction and link it to the behavioral signal, i.e. the response time. In this study, we developed a Bayesian-based method, namely, HI-DGD model to infer a cognitive state underlying decision-making from simultaneously recorded EEG from the mPFC and local field potentials from the STN recorded using deep brain stimulation electrodes in PD patients performing a Stroop task. First, we used a simulation dataset to assess the performance of the model. We showed that the HI-DGD model can infer cognitive states accurately and tracks the dynamics of the data. Second, by applying the HI-DGD model to the experimental data, we showed that the HI-DGD model can predict conflict vs. non-conflict choices using only 500 ms of STN-mPFC recordings. Moreover, the performance of the HI-DGD model in distinguishing between conflict and non-conflict choices was significantly better than using traditional methods. Third, we identified significant neural features estimated by the HI-DGD model: (a) theta band power in both STN and mPFC, (b) theta phase synchrony between mPFC and STN, and (c) gamma power in STN. These features match well with those reported in previous studies, indicating the interpretability of the HI-DGD model.
4.1. The HI-DGD enables trial-based detection of conflict and non-conflict states
One of the main advantages of the HI-DGD model, compared to the traditional group-averaging analysis, is that the cognitive state can be inferred (and thus decoded) from individual trials. The model not only enables studying the trial-to-trial variability of cognitive states underlying conflict processing but also provides an opportunity to track the dynamics of these states and design model-based closed-loop cognitive controllers. In section 3.5, we applied a (trained) neural decoder in the first 500 ms of observed neural activities after the cue time. An interval of 500 ms is sufficient to track at least one cycle of each frequency band, specifically for the theta band power which is a prominent neural feature of conflict processing. We showed in figure 6 that the HI-DGD model can capture dynamics underlying conflict processing and distinguishes between conflict and non-conflict choices with the mean accuracy of (compared to using traditional methods).
Two main sources of error in these types of problems are
- (i)Uncertainty or inconsistency in patterns of EEG/LFP data due to the quality of data. For example, if there is no consistent pattern in EEG/LFP power spectrums, the model is not able to find an accurate model parameter associated with this and therefore the model is not able to perform accurately.
- (ii)Suboptimality of models due to the difficulty of the problem or modeling limitations. In these types of problems, because we have latent/unobservable dynamical models, we need to approximate the nose model with a predefined distribution and use variational inference or EM to be able to approximate intractable computations for posterior inference over latent variables, such as the cognitive state in this study, with simpler ones. Therefore the identified noise model is not ideal or the optimization schemes may not guarantee optimal model parameters, especially in the EM algorithm.
Therefore, the source of error in single-trial decoding by HI-DGD could have been either or a combination of these factors. Unfortunately, there is no solid way to measure the contribution of each of these factors in our HI-DGD.
4.2. HI-DGD comparison with state-space models traditional models
Various methods including state-space models (SSMs) [18, 29, 38] and machine learning approaches [39–41] has been used to model cognitive states using neural or behavioral data [11, 12, 21, 31, 42]. An important key assumption behind these approaches is that dynamics in the latent brain states, hereafter known as cognitive states, are encoded in a subset of features of the neural activity and behavioral signals. Therefore, it can be decoded from either neural recordings or behavioral signals.
Despite the usefulness of these methods in decoding cognitive states, they suffer from several limitations. For high-dimensional neural data (recorded by several electrodes over time), these models become impractical and computationally inefficient for building accurate receptive-field models for individual neurons or sources [43–46]. Building accurate receptive-field models requires additional modeling steps, such as spike sorting or cluster-less models, which add complexity and computational expense to the process [47]. Moreover, individual neural receptive-field models do not contain information about the co-activity across neurons that encapsulate essential information about interactions among different brain regions of the brain [48, 49]. For example, traditional models often ignore the coherence and synchrony between different channels of neural activity for decoding a cognitive state [12].
In addition, traditional models do not use all possible sources of information at the same time for decoding a cognitive state [11, 12, 21, 22]. The existing methods utilize behavioral signals as a noisy observation of the designated cognitive state. Then, the cognitive state is solely decoded from neural signals. For example, Yousefi et al [21] considered the response time of participants performing a multisource interference task as a noisy measure of cognitive flexibility state and inferred it from LFP signals in a separate step [50]. Therefore, this strategy of decoding a cognitive state employs only one source of information and may lead to suboptimal results.
The hybrid inference solution suggested by the HI-DGD simultaneously combines simple generative models of behavior as a function of the underlying cognitive state with information encoded in complex patterns of high-dimensional neural to address the shortcomings of traditional models.
In developing our modeling framework, the HI-DGD, we hypothesized that there is a low-dimensional dynamical manifold—or, a state process—present in the data which helps to better characterize and potentially understand neural mechanisms of complex cognitive processes. The state process in HI-DGD is directly connected to both generative and discriminative processes and can update its value by either one or two observation(s) at the same time. The discriminative part can therefore be used for decoding a latent state from high-dimensional observed neural recordings [51, 52]. By considering the discriminative process, the HI-DGD does not need to perform intermediate processing steps such as spike sorting or independent component analysis, which are common practices in preparing neural data [53]. As the HI-DGD uses the discriminative model for high-dimensional data, it offers a more accurate prediction of the cognitive state compared to generative models where the model inadequacy might be problematic [54, 55].
We compared the performance of the HI-DGD model with that using a baseline analysis technique. For the latter, we used a logistic regressor with an input calculated by the power of the theta band of the STN averaged over all trials and across participants (consistent with [26, 30]). Similar to the HI-DGD model, we only used the first 500 ms of observed neural recordings after the cue time. We showed that the HI-DGD model significantly improved the accuracy of the predictions compared to the baseline technique (statistics for accuracy and F1-score, with p-value = 0.002).
There are many choices to build the discriminative model including deep neural networks, convolutional neural networks [56], recurrent neural networks [57], and parametric models like the Gaussian process [58]. Unlike high dimensional neural recordings, behavioral signals such as response-time and error rates are usually very low dimensional and have slow temporal dynamics compared to neural signals, thus it is not efficient to use discriminative processes for capturing dynamics of behavioral signals [59, 60]. Therefore, we used a generative model, similar to the SSM, for the low dimensional subset of the observed behavioral signals.
Acknowledgments
This work was supported by Dr Lankarany's NSERC Fund (RGPIN-2020-05868) and MITACS Accelerate (IT19240).
Data availability statement
The data that support the findings of this study are available upon reasonable request from the authors.
Conflict of interest
The authors declare no competing interests.
Appendix A: Objective function derivation for HI-DGD model identification
We use EM algorithm [61] to find maximum likelihood estimates of the model parameters, . The EM algorithm is an established solution to perform maximum likelihood estimation of model parameters when there is an unobservable process or missing observations [21]. In the HI-DGD, the state variable is the unobserved process, the latent dynamical variable. The EM solution recursively estimates the model parameters , based on an updated posterior distribution of and the observation from the previous EM iteration, . The EM includes two steps: expectation and maximization [62]. The expectation step (or Q function) is defined by
Here after we replace with for summarizing the notation. The Q function can be rewritten as
Expanding the Q function yields
The integration term here can be expressed as
When the state process is linear with an additive Gaussian noise and the prediction process is a multi-variate normal, there is a closed form solution for this expectation. To derive a general solution for nonlinear and non-Gaussian noises, we can rewrite Q function as
Since , we can exchange the log and expectation operations to yield a lower bound for Q, which can be written as
where, the expectation in the last term, defined by the prediction process, is a function of the model free parameters. The second step of the EM is maximization step (M-step) that updates parameter set at iteration (r) by maximizing Q. The maximization is defined by
This optimization can be calculated analytically or numerically (e.g. gradient descent [63]). After each iteration, a new set of parameters are estimated and the EM routine is stopped when a stopping criteria based on the likelihood growth or parameter changes is satisfied.
Appendix B: Simulation dataset details
We simulated a group of patients with the assumption that there is a low dimensional dynamics, we referred to it as the cognitive state, that drives both neural activity and behavioral signals. The statistical properties of the cognitive state are different for trial types. Let us assume the cognitive state for trial j of mth patient, is defined by , where d is the trial type and is modeled by a Bernoulli random variable. Here we selected the Bernoulli parameter 0.5 to make sure we have balanced data from conflict and non-conflict trials. We modeled the state process as a random walk as follows
where and are the transition coefficient and standard deviation for each random walk. We generated two sets of observations. The first one is a high-dimensional and continuous, s k , designed to resemble the dynamics of the LFP/mPFC signals in a cognitive task. Using the state, we then generate a 20-dimensional continuous signal with the conditional distribution defined by
where is a 20-dimensional vector of non-linear functions, like tanh, and cosine, with an argument defined by a subset of state processes at the current and previous time points. The lm is the maximum number of data points used in creating function. For each channel of data, we pick a l uniformly from 0 to lm randomly, which is used for data generation. For example, in our simulation, the g1 function corresponding to the first element of g , is a with the argument to be . is the stationary covariance matrix with sizer . 's non-diagonal terms are non-zeros, implying observations across channels are correlated. In our simulation, we picked a = .9, b = 0, and .
The second observation, rk , is a low dimensional, discrete, and sparse signal that resembles the dynamics of a behavioral signal in a cognitive task such as response time. rk can be seen as the time of the agent response to a task and it can be modeled as:
where represent the weight of current state and represent the weight of the error signal mk in generating . ϑ is an additive Gaussian noise with a standard deviation of 0.3. We assume this relationship between rk , conflict state, and error rate because empirically response times typically decline rapidly in non-conflict trials and later tend to reach a lower bound based on subject-specific physiological constraints.
Appendix C: Supplementary figure
Download figure:
Standard image High-resolution image