Inferring cognitive state underlying conflict choices in verbal Stroop task using heterogeneous input discriminative-generative decoder model

Mohammad R Rezaei; Haseul Jeoung; Ayda Gharamani; Utpal Saha; Venkat Bhat; Milos R Popovic; Ali Yousefi; Robert Chen; Milad Lankarany

doi:10.1088/1741-2552/ace932

1. Introduction

The subthalamic nucleus (STN) of the basal ganglia and medial prefrontal cortex (mPFC) are essential brain structures in the decision-making processes [1, 2]. The interaction between the STN and the PFC results in slower and more controllable decisions during conflict tasks which has been previously demonstrated [1–4]. The presence of ambiguity in selection choices or conflict choices activates the cortex which sends excitatory inputs to the STN via the hyper-direct pathway [4–8]. In response, the STN sends excitatory inputs to the globus pallidus internus (GPi). GPi sends inhibitory inputs to the thalamus to suppress subsequent motor actions that lead to wrong decisions [4, 6, 7, 9]. This inhibition acts as a brake that slows the decision-making process down and helps to prevent impulsive decisions [4, 6, 7, 10].

Despite this mechanistic understanding of the role of the mPFC-STN neuronal circuit in decision-making with competing options, a low-dimensional representation of the mPFC–STN interactions that links neural activities generated by these sub-regions to behavioral signals (e.g. the response time) remains to be identified. This cognitive state can reveal the state of the brain during decision-making. Such a low dimensional representation, also known as the latent brain state, was identified in other situations, e.g. the mood state in patients with major depression [11], the state underlying cognitive deficits [12], the state of speech intention [13], the learning state in memory [14], the state of attention in eye-metrics [15], and the state underlying pain [16]. In this work, we propose a Bayesian-based model, the heterogeneous input discriminative-generative decoder (HI-DGD), which can combine information from multiple sources with different sampling rates, suitable for decoding the dynamics of the cognitive states underlying conflict processing in the verbal Stroop task. The HI-DGD consists of

(i)
A state transition model that controls the cognitive state evolvement over time.
(ii)
A generative model that represents the behavioral signal as a function of the cognitive state.
(iii)
A discriminative model that formulates the cognitive state as a function of (high-dimensional) neural activity features.

This hybrid approach will allow us to simultaneously combine simple generative models of behavior as a function of the underlying cognitive state with information encoded in complex patterns of high-dimensional neural activity. By using HI-DGD, we show that cognitive states underlying conflict and non-conflict tasks can be inferred from individual trials of the experiment accurately. Moreover, calculated model parameters reveal significant features of neural activities that classify conflict and non-conflict choices. Interestingly, these features have been proven experimentally, demonstrating the fidelity of our approach.

2. HI-DGD model

Here we propose the HI-DGD model. HI-DGD model infers a cognitive state, x _k, that is a low-dimensional representation of conflict processing state underlying observed neural activity, $\boldsymbol{s}_{1:k}$ , and behavioral signals, $\boldsymbol{z}_{1:k}$ . A graphical representation of the HI-DGD model is shown in figure 1(B). The HI-DGD model uses three processes to infer a cognitive state that links neural activities and behavioral observations:

(i)
A state transition process that represents the evolution of the states in time, which follows a random walk model [17], $p(\boldsymbol{x}_{k}|\boldsymbol{x}_{k-1})$ .
(ii)
A discriminative process to link neural activity and the cognitive state as a function of high-dimensional observations, $P(\boldsymbol{x}_k|\boldsymbol{s}_k)$ .
(iii)
A generative process that links the cognitive state to behavioral signals, $P(\boldsymbol{z}_k|\boldsymbol{x}_k)$ .

**Figure 1.** HI-DGD encodes the cognitive state associated with conflict processing from neural activities of STN and mPFC and the patient response time. (A) A schematic view of the encoding mechanism. Each trial of the experiment begins with a cue signal, after which a color word appears on the screen. The subject should respond with the ink color of the words and ignore their meaning. As illustrated, the subject would speak (RED) whether the word is 'RED' (non-conflict trial) or 'GREEN' (conflict trial). A one-to-one ratio of conflict and non-conflict trials was used in this study. (B) The graphical view of the HI-DGD model. s _k, z _k, x _k represent the neural features, behavioral features, and the inferred cognitive state at time k; respectively.
Download figure:
Standard image High-resolution image

Assume that s _k represents high dimensional observations, at time index k, and z _k represents all other observation elements which are usually low dimensional or very sparse [14, 18]. Given s _k and z _k, our goal for the HI-DGD model is to infer the posterior distribution over cognitive states, x _k, as written below

$\begin{equation} p(\boldsymbol{x}_k|\boldsymbol{s}_{1:k},\boldsymbol{z}_{1:k}). \end{equation} \tag{ 1 }$

This equation defines the conditional density function of the current value of the cognitive state given all sources of observations from k = 1. Following [19], equation (1) can be approximated as follow

$\begin{align} p(\boldsymbol{x}_{k}| \boldsymbol{s}_{1:k}, \boldsymbol{z}_{1:k}) &\propto \frac{p(\boldsymbol{x}_{k}|\boldsymbol{s}_{k})}{p(\boldsymbol{x}_{k})} \times p(\boldsymbol{z}_{k}| \boldsymbol{x}_{k}) \nonumber\\ &\quad \times {\int p(\boldsymbol{x}_{k}|\boldsymbol{x}_{k-1}) p(\boldsymbol{x}_{k-1}|\boldsymbol{s}_{1:k-1}) \mathrm d\boldsymbol{x}_{k-1}} \end{align} \tag{ 2 }$

where the nominator of the first term is called the prediction process. The second term of equation (2), $p(\boldsymbol{z}_{k}| \boldsymbol{x}_{k})$ , is the conditional density of the behavior observation, i.e. response time, given the current value of the state. The integral term of equation (2) is the one-step prediction density by the posterior function from the last time-step derived by the Chapman–Kolmogorov equation [20].

The state transition process is defined by a set of free parameters β as

$\begin{equation} \boldsymbol{x}_{k}|\boldsymbol{x}_{k-1}\sim f(\boldsymbol{x}_{k-1};\boldsymbol{\beta}). \end{equation} \tag{ 3 }$

The observation processes are defined by

$\begin{equation} \boldsymbol{x}_{k}|\boldsymbol{s}_{k}\sim g(\boldsymbol{s}_{k};\boldsymbol{\Omega}) \end{equation} \tag{ 4 }$

$\begin{equation} \boldsymbol{z}_{k}|\boldsymbol{x}_{k}\sim l(\boldsymbol{x}_{k};\boldsymbol{\Gamma}) \end{equation} \tag{ 5 }$

where equations (4) and (5) indicate observation processes for the neural and the behavioral signals, respectively.

Most of the previous studies in cognitive state modeling [11, 12, 21, 22] considered a two-step process, namely, encoding and decoding, to identify the relationship between brain dynamics and behavior through a latent dynamical variable. In contrast to these works, the HI-DGD model infers the underlying cognitive state using both neural and behavioral observations simultaneously. In this way, the HI-DGD can use information from both behavior and neural data to estimate cognitive dynamics. In addition, the HI-DGD model can combine information at different temporal scales through the process defined in equations (3)–(5) to infer the underlying cognitive state. This property will make the solution flexible and powerful in decoding a cognitive state from different sources of information where each source carries information in different temporal scales.

2.1. Marked-point process observation model for modeling behavioral signals

In the present work, we utilized a marked point-process (MPP) observation model [23] for modeling the behavioral signals, both response times and trial types (conflict or non-conflict), in the Stroop task [24]. Each participant's response time is considered an event. We define $r_k\in\{0,1\}$ as the random variable representing the onset of the participant's response time. At each time step k, $r_k = 1$ indicates the onset of the participant's response and $r_k = 0$ otherwise. Additionally, to differentiate between conflict and non-conflict tasks, we define another random variable, referred to as a mark, $m_k\in\{-1,1\}$ , in the MPP theory [13, 23, 25]. The variable m_k can take two values, 1 and −1, where 1 represents a conflict decision, and −1 indicates a non-conflict choice. The distribution of the r_k and m_k represents the MPP observation process associated with the behavioral signals.

For the MPP observation, the joint mark intensity function of the process is defined by

$\begin{equation} \lambda(k,m_k|H_k) \end{equation} \tag{ 6 }$

where λ defines the instantaneous probability of observing an event, speech onset, at time index k with mark m_k given the full history of events up to time k, H_k . Using the factorization rule, the joint distribution of the mark values and events is proportional to

$\begin{equation} \lambda(k,m_k|H_k) \propto \Lambda(k|H_k)f(m_k|k,H_k) \end{equation} \tag{ 7 }$

where $\Lambda(k|H_k) = \sum_{m} \lambda(k,m_k\mid H_k)$ is the conditional intensity function (CIF) for the underlying event process and $f(m_k|k,H_k)$ is the conditional intensity of the marks for specific events. Hereafter $\Lambda(k|H_k)$ is written as $\Lambda_k$ for the sake of simplicity. In the case of observation processes characterized by the response time accompanied by a two choices decision like conflict and non-conflict choices, we can define the following distribution

$\begin{equation} r_k \sim Poisson(\Lambda_k), m_k\sim Bernoulli(p_k). \end{equation} \tag{ 8 }$

Note that there are only two types of tasks, therefore m_k can be fully represented by a Bernoulli random variable. To be consistent with the general linear model (GLM) theory, we used the 'log' and 'logit' link functions to describe the Poisson and Bernoulli distributions, respectively. Therefore, the predictor can be defined as a function of the cognitive state by

$\begin{align} \log (\Lambda_k) &= c_0 + c_x*x_k + c_H*H_k, \nonumber\\ logit (p_k) &= d_0 + d_x*x_k + d_H*H_k, \end{align} \tag{ 9 }$

where $\{c_0, d_0, c_x, d_x, c_H, d_H\}$ are the observation process free parameters [25] and H_k is the history term of previous events. This leads to the observation likelihood formulated as

$\begin{align} P(r_k,m_k|x_k,H_k) &\propto \Lambda_k^{r_k} \exp -\Lambda_k \nonumber\\ & \quad \times {\left({p_k}^{m_k}{(1-p_k)}^{(1-m_k)}\right)}^{r_k}. \end{align} \tag{ 10 }$

The first term, $\Lambda_k^{r_k} \exp -\Lambda_k$ , characterizes the distribution of observing an event (the speech onset) and the remaining term, ${({p_k}^{m_k}{(1-p_k)}^{(1-m_k)})}^{r_k}$ , characterizes the distribution of mark values (trial type) given an event time.

The remaining continuous observations such as local field potentials (LFP) and electroencephalogram (EEG) neural features during the cognitive task can be modeled as Gaussian observation processes (unless another type of observation is preferred) defined as

$\begin{equation} \boldsymbol{y_k} = F_k(x_k) +\boldsymbol{\epsilon_k}, \boldsymbol{\epsilon_k}\sim N(0,\Sigma_k), \end{equation} \tag{ 11 }$

where $F_k(.)$ maps the state values to the observations and ε_k is additive white Gaussian noise with covariance matrix $\Sigma_k$ .

2.2. Feature extraction and data processing

To reach a set of representative features relevant to conflict processing in the brain, we replicated the procedure proposed in [26]. We wrote a customized Python script along with using the MNE library [27] to analyze the LFP/EEG signals data. All LFP/EEG data were down-sampled to 1000 Hz and filtered by a band-passed filter (between 1 and 400 Hz).

2.2.1. Power spectrum

Following [26], we use time-frequency features from 3 s before to 2 s after the cue time. The LFP/EEG was decomposed into the time-frequency domain using a bank of band-pass filters for the 2–100 Hz in steps of 1 Hz. Then, the frequency features are normalized with the baseline power (p) of 3 to 2 s before the cue time using

$\begin{equation} p_\textrm{normalized} = \frac{p-\bar{p}_\textrm{baseline}}{\bar{p}_\textrm{baseline}} \end{equation} \tag{ 12 }$

where $\bar{p}_\textrm{baseline}$ means the average of baseline power. We quantified the power spectrum features for delta-theta (2–8 Hz), alpha-l (8–12 Hz), alpha-h (12–20 Hz), low-beta (20–30 Hz), gamma-l (30–55 Hz), and gamma-h (65–100 Hz) frequency bands by averaging power spectrum over each frequency band.

2.2.2. Phase coherency

The STN and mPFC phase synchronization values at (k, f), $Coh(k,f\,)$ , were estimated by projecting the phase difference between STN (LFP) and cortex (EEG) by phase coherency measure described in [28, 29] as follows.

$\begin{equation} Coh(k,f\,) = |\exp(i*(\theta^\textrm{LFP}_{k,f\,}- \theta^\textrm{EEG}_{k,f}))|, \end{equation} \tag{ 13 }$

here k is the time step (we considered 10 ms time resolutions for the time steps) and f is the frequency. The range of $Coh(k,f\,)$ is from 0 to 1, where 0 indicates random phase differences between STN and mPFC, and 1 indicates identical phase differences.

2.2.3. Behavioral signals

For behavioral recording, a microphone of an Apple MacBook recorded the patients' voices. The speech onset time considered the response time and also measured the patient's performance based on his/her ability in making the correct decision. Trials with response time $\gt$ 2.0 s or $\lt$ 0.3 s and those with noise or unclear/incorrect speech responses were excluded from the analysis used in [26].

3. Results

Previous studies demonstrated that conflict processing is associated with statistical changes in the power and the phase of low-frequency oscillations (2–8 Hz—delta and theta bands) in the STN and mPFC, as well as in behavior such as the participants' response-time [26, 30]. We analyzed LFPs, EEGs, and response time of six PD patients who under went deep brain stimulation (DBS) surgeries recruited in a verbal Stroop task. A schematic view of neural feature extraction from LFP signals recorded from STN and EEG signals recorded from mPFC is shown in figure 1(A). This figure includes the inferred cognitive state from neural and behavioral signals (the blue signal). Details regarding the patients and experimental procedures reported in a previous study that studied the same cohort of patients [26, 30] (patients 3, 5–13 in table 1 of [26]). The location of STN contacts is described in [26] under 'contact localization'. In nine patients, the contact was in the STN, and in one patient it was the dorsal border of the STN. As described in [30], the adjacent contact pair (0–1, 1–2 or 2–3) with the highest beta (13–30 Hz) power in the left STN was chosen for further analysis as it has been shown to indicate the contacts most likely within the STN. Additionally, mPFC was derived from a bipolar recording of Fz-Cz EEG electrodes.

In the present work, we selected 6 out of 10 patients to test the accuracy of our proposed algorithm for individual trials. The selection criterion was based on the accuracy of the baseline model; we excluded those patients with low accuracy ( ${\sim} 50\%$ accuracy) in distinguishing conflict and non-conflict trials by the baseline model discussed in section 3.6. Moreover, we compared the accuracy of the proposed HI-DGD model with that of the baseline model including all ten patients. As shown in supplementary results—figure C1, the accuracy of the proposed model is significantly higher than that of the baseline model (p-value $\leqslant 0.001$ ). To this end, we used neural and behavioral signals to find a low-dimensional cognitive state underlying conflict processing in the selected participants.

3.1. Identifying optimal kernel of neural features using behavioral signals

To identify how neural features are linked to the behavioral signal, we, first, utilized a Poisson probabilistic model that represents the response time as a function of neural features. This is a traditional probabilistic model that frequently has been used to identify the relationship between inputs and the time of an event (here, patient response time) [14, 18]. By considering k_c as the start time of a trial, $\boldsymbol{\psi_k} \in \mathbb{R}^M$ as the feature vector extracted from neural activities and M as the number of the neural features for each time index, we define the response time indicator r_k at time index k with a Poisson probabilistic model as

$\begin{equation} r_k \sim Poisson(\Lambda_k),\log \Lambda_k = {\boldsymbol{\omega}}^\mathrm T \boldsymbol{\psi}_k + \beta_0 \end{equation} \tag{ 14 }$

where the $\Lambda_k$ is the CIF that links the neural features to the response time indicator r_k with a Poisson distribution. The parameter β₀ governs the baseline response time, whereas ω represents the weight by which the subject response time is related to the neural features. The logarithmic relationship imposes an empirical limitation of human response time into the model [31, 32]. The model parameters $\{\boldsymbol{\omega},\beta_0\}$ can be optimized by a GLM with a proper link function and additive noise [18], see figure 2(A). Figure 2(B) shows the absolute weight of the extracted neural features ( ω ), indicating the importance of the neural features, in representing the response times for the patients. As time passes from cue time, the importance of the neural features that represent the response time of a patient decreases, and the variability of the features increases. To identify the optimal kernel width during which neural features represent response times, we calculated the log-likelihood (LL) of the optimized model and plotted the measure for different windows in figure 2(C). The LL value represents the goodness of fit for the behavioral model. The higher the value of the LL, the better a model fits to a dataset. The LL converges to its maximal value after [400–500] ms of observing neural features after the cue time. This means that increasing the time window for neural features is not statistically significant for the model fitting. Therefore, the most relevant neural features that represent the participant's response times, i.e. behavioral signal, lie within the 500 ms time window after the cue time.

**Figure 2.** Behavioral model and optimal kernels. (A) The schematic view of creating the design matrix of the lasso-GLM model for behavioral analysis. (B) the $mean \pm std$ relative importance of the extracted neural features ( ω ) in time to the response time. The neural features are defined in table 1. (C) The log-likelihood (LL) plot for the lasso-GLM model vs the time window length of the neural features after the cue time for a selected group of patients. The LL of the model stops growing after [400–500] ms of observing neural features windows. Therefore the first 500 ms after the cue time is selected for decoding purposes in the result section.
Download figure:
Standard image High-resolution image

**Figure 2.** Behavioral model and optimal kernels. (A) The schematic view of creating the design matrix of the lasso-GLM model for behavioral analysis. (B) the $mean \pm std$ relative importance of the extracted neural features ( ω ) in time to the response time. The neural features are defined in table 1. (C) The log-likelihood (LL) plot for the lasso-GLM model vs the time window length of the neural features after the cue time for a selected group of patients. The LL of the model stops growing after [400–500] ms of observing neural features windows. Therefore the first 500 ms after the cue time is selected for decoding purposes in the result section.
Download figure:
Standard image High-resolution image

Table 1. Neural featured used in the HI-DGD model. Note that we remove the mPFC gamma features from the neural features due to its recorded from noisy EEG [33], therefore, it may not carry useful information but noise.

F01	Power STN 2–8 Hz	F08	Power mPFC 8–12 Hz
F02	Power STN 8–12 Hz	F09	Power mPFC 12–20 Hz
F03	Power STN 12–20 Hz	F10	Power mPFC 20–30 Hz
F04	Power STN 20–30 Hz	F11	Phase Coh. STN-mPFC 2–8 Hz
F05	Power STN 30–55 Hz	F12	Phase Coh. STN-mPFC 8–12 Hz
F06	Power STN 65–95 Hz	F13	Phase Coh. STN-mPFC 12–20,Hz
F07	Power mPFC 2–8 Hz	F14	Phase Coh. STN-mPFC 20–30 Hz

3.2. HI-DGD model parameters inference using neural activity and behavioral signal

Building on the analysis from the previous section and experimental evidence related to conflict processing representation in both neural and behavioral signals, we used the HI-DGD model to infer a cognitive state that represents the dynamics underlying conflict/non-conflict processing from neural and behavioral signals (response time). Figure 1(B) shows a graphical representation of the HI-DGD model. In HI-DGD, $\{x_k, \boldsymbol{s}_k,\boldsymbol{z_k}\}$ represents the cognitive state, neural features, and the behavioral features at time index k, respectively. The HI-DGD model uses s _k and z _k to infer a cognitive state x_k .

Therefore, the objective for the HI-DGD model optimization is to find a parameter set $\boldsymbol{\theta} = \{\boldsymbol{\beta}, \boldsymbol{\Omega}, \boldsymbol{\Gamma}\}$ that maximizes the joint distribution of neural features, behavioral signals, and the cognitive state, $P(x_k,z_k,\boldsymbol{s}_k;\boldsymbol{\theta})$ , for all time steps K. To meet this objective, as shown in figure 3(A), we optimized the HI-DGD model parameters by maximizing the following function

$\begin{equation} \boldsymbol{\theta}^{*} = \max_{\boldsymbol{\theta}} \sum_{k = 1}^{K}\log P(x_k,z_k,\boldsymbol{s}_k;\boldsymbol{\theta}). \end{equation} \tag{ 15 }$

Since both model parameters and the true cognitive state are unknown, one cannot simply use a maximum likelihood approach for finding the optimal parameters. To address this issue, we used expectation maximization (EM) to estimate the model parameters. The EM algorithm is an established method for maximum likelihood estimation of model parameters in the presence of a latent process or missing observations [34]. Other approaches including fully Bayesian or variational-Bayesian approaches, can be applied to our modeling approach in the case that a prior distribution for the model parameters can be defined [35]. In appendix A we derived the EM optimization for the HI-DGD model which recursively updates the model parameters $\boldsymbol{\theta}^{(r+1)} = \{\boldsymbol{\beta}^{(r+1)},\boldsymbol{\Omega}^{(r+1)},\boldsymbol{\Gamma}^{(r+1)}\}$ , where superscript r indicates an iteration of the EM. Based on the updated posterior distribution of iteration r, the objective function Q is defined as

$\begin{align} \begin{array}{l} Q({\mathbf{\theta }},{{\mathbf{\theta }}^{(r)}}) = {E_{{x_{0:K}}|{{\textbf{s}}_{1:K}},{z_{1:K}};{{\mathbf{\theta }}^r}}}\left[ {\log P({x_0})} \right.\\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\left. { + \sum\limits_{k = 1}^K P ({x_k},{{\textbf{s}}_k},{z_k};{\mathbf{\theta }})} \right]. \end{array} \end{align} \tag{ 16 }$

With an updating rule as

$\begin{equation} \boldsymbol{\theta}^{(r+1)} = \max_{\boldsymbol{\theta}} Q(\boldsymbol{\theta}, \boldsymbol{\theta}^{(r)}). \end{equation} \tag{ 17 }$

Further details of the function Q are provided in appendix A.

**Figure 3.** HI-DGD model identification and inference scheme used in our analysis. (A) A schematic representation of HI-DGD model identification. s _k represents neural features extracted from LFP signals recorded from the STN and EEG signal recorded from the mPFC. We quantified the frequency band features by averaging them in frequency bands delta-theta (2–8 Hz), alpha (8–12 Hz), beta-l (12–20 Hz), beta-h (20–30 Hz), gamma-1 (30–55 Hz), and gamma-2 (65–95 Hz). We extracted power and phase coherency features according to what is described in the methods section. Both neural and behavioral features are passed to the HI-DGD model and the EM algorithm optimizes the identifies the optimal values for the model parameters set $\boldsymbol{\theta}^*$ . The HI-DGD model output is an inferred cognitive state that represents the conflict processing in the brain. (B) The conflict state prediction from the HI-DGD outputs (the inferred cognitive state). The statistics of the first 500 ms of the inferred cognitive state by the HI-DGD pass to a logistic regressor to predict the probability of the conflict and non-conflict states.
Download figure:
Standard image High-resolution image

**Figure 3.** HI-DGD model identification and inference scheme used in our analysis. (A) A schematic representation of HI-DGD model identification. s _k represents neural features extracted from LFP signals recorded from the STN and EEG signal recorded from the mPFC. We quantified the frequency band features by averaging them in frequency bands delta-theta (2–8 Hz), alpha (8–12 Hz), beta-l (12–20 Hz), beta-h (20–30 Hz), gamma-1 (30–55 Hz), and gamma-2 (65–95 Hz). We extracted power and phase coherency features according to what is described in the methods section. Both neural and behavioral features are passed to the HI-DGD model and the EM algorithm optimizes the identifies the optimal values for the model parameters set $\boldsymbol{\theta}^*$ . The HI-DGD model output is an inferred cognitive state that represents the conflict processing in the brain. (B) The conflict state prediction from the HI-DGD outputs (the inferred cognitive state). The statistics of the first 500 ms of the inferred cognitive state by the HI-DGD pass to a logistic regressor to predict the probability of the conflict and non-conflict states.
Download figure:
Standard image High-resolution image

3.3. HI-DGD model validation using simulated dataset

To verify HI-DGD model training and validate its performance, we simulated a dataset with the assumption that there is a low dimensional dynamical state, we referred to it as the cognitive state, that simultaneously drives two sources of observations: neural activity and behavioral signals. We generate the data for 1000 data points and 5 simulated patients. More details about the simulation are available in appendix B.

Figure 4(A) shows a sample of simulated data. Our simulation data has a complex observation process along with a low-dimensional state process; this setting was purposefully picked to justify the HI-DGD prediction power and build a clear comparison across trial types. The value of the objective function for the EM algorithm and all simulated patients is shown in figure 4(B), wherein the increasing trend in Q values shows that the new parameters set for the HI-DGD, at iteration $(r+1)$ , improves model training. The Q values eventually reach a steady state at around 10 iterations and give the optimal model parameters sets. The optimized model parameters for the behavioral and neural signals are shown in figures 4(C) and (D), respectively. The HI-DGD result in inferring the cognitive state associated with different trial types are shown in figure 4. The qualitative result shown in this figure verifies the HI-DGD model's ability to infer the latent state from a group of noisy observations.

**Figure 4.** The HI-DGD model validation results on simulation data. (A) A sample of simulated neural activity with 20 channels for simulated patient $\#1$ . (B) Q value, defined in equation (A.1), plots for all the simulated patients over iterations of the EM algorithm. (C) Box plot of optimized model parameters, the generative process, for all the patients; dots represent the actual value of the model parameters for each simulated patient. (D) Box plot of the absolute value of optimized model parameters, the discriminative processes, for all the simulated patients. Larger values here also represent the larger importance of the associated neural feature in inferring the cognitive state in (E). (E) inferred cognitive stat by HI-DGD. The inferred hidden state for different trial types of the simulation is identified by red (conflict trials) and blue (non-conflict trials). More details about the simulation are available in appendix B.
Download figure:
Standard image High-resolution image

3.4. HI-DGD model identification in the verbal Stroop task dataset

A similar analysis is performed for HI-DGD model identification in the verbal Stroop task dataset. The value of the objective function for the EM algorithm is shown in figure 5(A), wherein the increasing trend in Q values shows that the new parameters set for the HI-DGD, $\boldsymbol{\theta}^{(r+1)}$ , improves LL for the joint distribution of $\{x_k,\boldsymbol{s}_k,{z}_k\}^K_{k = 1}$ . The Q values eventually reach a steady state at around 15 iterations and give the optimal model parameters sets.

**Figure 5.** HI-DGD model identification results for experimental data. (A) Q values plot for iterations of the EM algorithm for the selected patients. The Q values eventually reach a steady state at around 15 iterations, giving the optimal model parameters sets. (B) The generative processes, $\boldsymbol{z}_k|\boldsymbol{x}_k$ , optimized model parameters for all the patients; dots represent the actual value of the model parameters for each simulated patient. (C) The discriminative processes, $\boldsymbol{x}_k|\boldsymbol{s}_k$ , optimized model parameters for all the patients.
Download figure:
Standard image High-resolution image

**Figure 5.** HI-DGD model identification results for experimental data. (A) Q values plot for iterations of the EM algorithm for the selected patients. The Q values eventually reach a steady state at around 15 iterations, giving the optimal model parameters sets. (B) The generative processes, $\boldsymbol{z}_k|\boldsymbol{x}_k$ , optimized model parameters for all the patients; dots represent the actual value of the model parameters for each simulated patient. (C) The discriminative processes, $\boldsymbol{x}_k|\boldsymbol{s}_k$ , optimized model parameters for all the patients.
Download figure:
Standard image High-resolution image

The optimized model parameters for the neural- and behavioral signals are shown in figures 5(C) and (B), respectively. In the HI-DGD, we assumed that the discriminative process model, $g(\boldsymbol{s}_k;\boldsymbol{\Omega})$ , is a linear function of the neural features with additive normal noise with variance σ_n [21]. The generative process model, $l(x_k;\boldsymbol{\Gamma})$ , is a marked-point process in which its marginal CIF is weighted by the cognitive state, by coefficient c_x , and the history of response times, by coefficient $c_\mathrm h$ , with a baseline intensity of c₀ [18, 25]. Similarly, the generation of the associated marks to behavioral signals is parameterized by the cognitive state, by coefficient d_x , and the history of response time, by coefficient $d_\mathrm h$ , with a baseline of d₀. Notice that we fixed the $c_x = 1$ and $d_x = 1$ in the optimization to avoid computational complexity [13]. The variables $\{\mu_0, \sigma_0\}$ represents the prior knowledge model of the cognitive state with a linear transition process and additive Gaussian noise with variance σ₀. The parameter σ_x is significantly smaller than σ_n which indicates that the cognitive state dynamic is smoother than the actual observed neural activity.

3.5. Decoding cognitive state from single trial observations

Here we tested the ability of the HI-DGD ability in decoding the conflict state from single trials of neural recordings (STN-mPFC). Using optimized model parameters, we applied the HI-DGD model to the first 500 ms (i.e. the optimal kernel width for neural features) of neural activities (after the cue time) and inferred the cognitive state associated with each participant. We then passed the mean and variance of decoded cognitive state to a logistic regression model to distinguish between conflict and non-conflict trials, figure 3(B). The logistic regression model can simply draw a border (a threshold) to separate between conflict and non-conflict trials based on the moments of the decoded cognitive state. The time interval used to decode cognitive state is shown by a green shaded area, the first 500 ms of neural features, see figure 6(A). The average of the inferred cognitive state is shown in figure 6(B). There is a difference between the distributions of the means in different trial types. This means the mean of the decoded cognitive state for the first 500 ms represents information about the trial types. To evaluate the performance of the decoded cognitive state by HI-DGD in distinguishing between conflict and non-conflict trials, we measured accuracy and f1-score on the prediction result by the logistic regressor for each patient in figure 6(C). The mean accuracy and f1 score for HI-DGD are $77.4\% \pm 3.0$ and $77.5\% \pm 3.1$ ; respectively. This shows that the cognitive state decoded by HI-DGD reliably and accurately represents conflict processing in the STN in PD patients.

**Figure 6.** HI-DGD inference results in six patients for the verbal Stroop task. (A) $mean \pm 2std$ of inferred cognitive stat for different trial types by HI-DGD. The inferred cognitive state for different trial types of the patients is identified by red (conflict trials) and blue (non-conflict trials). The transparency area identifies the first 500 ms of the inferred cognitive state that is used for identifying the trial types. (B) Mean of the inferred cognitive state for the first 500 ms after the cue time for different trial types and each patient. (C) Accuracy and F1-score for HI-DGD and the baseline model evaluation; dots represent the actual value of the model parameters for each simulated patient. HI-DGD outperforms the baseline in both accuracy and F1-score performance measures significantly (p-value 0.002).
Download figure:
Standard image High-resolution image

**Figure 6.** HI-DGD inference results in six patients for the verbal Stroop task. (A) $mean \pm 2std$ of inferred cognitive stat for different trial types by HI-DGD. The inferred cognitive state for different trial types of the patients is identified by red (conflict trials) and blue (non-conflict trials). The transparency area identifies the first 500 ms of the inferred cognitive state that is used for identifying the trial types. (B) Mean of the inferred cognitive state for the first 500 ms after the cue time for different trial types and each patient. (C) Accuracy and F1-score for HI-DGD and the baseline model evaluation; dots represent the actual value of the model parameters for each simulated patient. HI-DGD outperforms the baseline in both accuracy and F1-score performance measures significantly (p-value 0.002).
Download figure:
Standard image High-resolution image

3.6. Performance comparison with the baseline

To identify to what extent the HI-DGD model improves conflict state prediction compared to the traditional analysis, we compared the decoding performance of the HI-DGD model with traditional methods in the literature. One should note that the power of the STN theta band was considered as the main indicator of conflict versus non-conflict choices [26]. Therefore, we compared the HI-DGD model with a baseline model that only uses this feature to distinguish between conflict and non-conflict trials. We used the classifier that was used in the previous section with the STN theta band power as the input feature (consistent with [26, 30]). Similar to the HI-DGD model, we only used the first 500 ms of the neural features after cue time. The accuracy and f1-score of the baseline and the HI-DGD model are shown in figure 6(C). The HI-DGD model significantly improved the prediction of conflict vs. non-conflict trials compared to the baseline model for all the patients (KS-2 [36] with p-value $\leqslant0.002$ ). We also compared the HI-DGD result with the Encoder-Decoder model suggested in [21] in figure 6(C). The plot shows the HI-DGD increased the accuracy and f1-score of the predictions by approximately 5%.

3.7. Significant neural features underlying conflict and non-conflict processing

To determine the importance of individual neural features in conflict processing by HI-DGD, we masked all neural features except the feature of interest and used the neural decoder to measure the accuracy of the prediction. In this way, we could identify the contribution of each neural feature in decoding the cognitive state. As shown in figure 7(A), the most significant features are

(i)
The power of the STN and mPFC channels in the 2–8 Hz frequency band.
(ii)
Phase coherency between STN and mPFC in the theta frequency band (2–8 Hz frequency band).
(iii)
Power of the STN channel in the gamma frequency band.

**Figure 7.** HI-DGD model interpretation and explanation for the experimental data. (A) the contribution of each neural feature in distinguishing between conflict and non-conflict trials; dots represent the actual value of the model parameters for each simulated patient. (B) The ROC curve for the conflict state prediction with all neural features for each patient. (C) The ROC curve for the conflict state prediction with the significant neural features highlighted in (A). (D) ROC-score comparison of HI-DGD model with significant (with star) and all neural features evaluated on all patients. (E) Accuracy and F1-score for HI-DGD model with significant (with star) and all neural features evaluated on all patients. The accuracy and F1-score performance measures are approximately equal for the HI-DGD model with significant and all neural features.
Download figure:
Standard image High-resolution image

The power of the theta frequency band (2–8 Hz) in both STN and mPFC, as well as the phase coherency in the theta band, significantly contributed to the decoding of conflict vs. non-conflict choices. These results of the HI-DGD model are in agreement with those reported in the literature [26, 30], confirming that the inferred cognitive state represents a low dimensional representation of the cognitive state underlying conflict processing in the STN-mPFC neuronal circuit. One possible interpretation of the role of Gamma frequency band features in decoding the cognitive state is as follows. Gamma activities are associated with contralateral movement such as speech production [37]. Speech production, in a verbal Stroop task, is an inevitable part of the task and its onset is marked as the response time. As reported in [26], the response times were significantly longer for conflict trials than non-conflict trials which makes it a good indicator for separating conflict and non-conflict trials. Because gamma activity was locked to the response time (speech onset), the contribution of gamma activity may reflect the contribution of response time in decoding the cognitive state. Note that even though the gamma frequency band features are likely related to speech involved in the Stroop task, the gamma activity is involved in other functions such as learning, perception, memory, and emotional processing, and these processes could also be involved. This makes it difficult to interpret the exact role of gamma activity in the decoding cognitive state by HI-DGD.

Furthermore, we tested the performance of the HI-DGD with all neural features against that using only the significant ones in figures 7(B)–(E). We use the receiver operator characteristic (ROC) curve as an evaluation metric for the HI-DGD performance. The ROC curve is a probabilistic curve that indicates the true prediction rate against the false prediction rate at various threshold values for a classifier in identifying conflict and non-conflict trials. We plotted the ROC curve for HI-DGD performance with all neural features and using only the significant ones in figures 7(B) and (C); respectively. The area under the curve (AUC) of the ROC plot is the measure of the ability of the classifier to identify trial types and is used as a summary of the ROC curve. The dashed line in figures 7(B) and (C), associated with AUC = 0.5, indicates the performance of a non-expert classifier that is unable to distinguish between trial types. Therefore, The higher the AUC, the better performance of the model at distinguishing between the trial types. The AUC for HI-DGD performance for the patients is shown in figure 7(D). The non-significant difference between the AUC of HI-DGD with the significant neural features and all neural features shows that the identified significant neural features represent a similar result as the HI-DGD model does with all neural features. We also verified the findings by measuring accuracy and f1 scores for the HI-DGD in distinguishing between the trial types in figure 7(E). In sum, the difference between these measures is negligible.

4. Discussion

Both mPFC and STN play crucial roles in conflict processing [30]. Previous studies reported elevated theta power in STN and mPFC, and increased phase synchrony in the theta band between STN and mPFC in conflict trials [26, 30]. A recent study showed increased theta-gamma phase-amplitude coupling (PAC) at STN in conflict trials [30]. Despite the importance of these features in conflict processing (calculated by group-averaging methods across conflict and non-conflict trials), none of them represent common dynamics of conflict processing at every single trial of observation. Such dynamics, referred to as a cognitive state, represent a low-dimensional representation of the mPFC-STN interaction and link it to the behavioral signal, i.e. the response time. In this study, we developed a Bayesian-based method, namely, HI-DGD model to infer a cognitive state underlying decision-making from simultaneously recorded EEG from the mPFC and local field potentials from the STN recorded using deep brain stimulation electrodes in PD patients performing a Stroop task. First, we used a simulation dataset to assess the performance of the model. We showed that the HI-DGD model can infer cognitive states accurately and tracks the dynamics of the data. Second, by applying the HI-DGD model to the experimental data, we showed that the HI-DGD model can predict conflict vs. non-conflict choices using only 500 ms of STN-mPFC recordings. Moreover, the performance of the HI-DGD model in distinguishing between conflict and non-conflict choices was significantly better than using traditional methods. Third, we identified significant neural features estimated by the HI-DGD model: (a) theta band power in both STN and mPFC, (b) theta phase synchrony between mPFC and STN, and (c) gamma power in STN. These features match well with those reported in previous studies, indicating the interpretability of the HI-DGD model.

4.1. The HI-DGD enables trial-based detection of conflict and non-conflict states

One of the main advantages of the HI-DGD model, compared to the traditional group-averaging analysis, is that the cognitive state can be inferred (and thus decoded) from individual trials. The model not only enables studying the trial-to-trial variability of cognitive states underlying conflict processing but also provides an opportunity to track the dynamics of these states and design model-based closed-loop cognitive controllers. In section 3.5, we applied a (trained) neural decoder in the first 500 ms of observed neural activities after the cue time. An interval of 500 ms is sufficient to track at least one cycle of each frequency band, specifically for the theta band power which is a prominent neural feature of conflict processing. We showed in figure 6 that the HI-DGD model can capture dynamics underlying conflict processing and distinguishes between conflict and non-conflict choices with the mean accuracy of $77.4\%$ (compared to $61.0\%$ using traditional methods).

Two main sources of error in these types of problems are

(i)
Uncertainty or inconsistency in patterns of EEG/LFP data due to the quality of data. For example, if there is no consistent pattern in EEG/LFP power spectrums, the model is not able to find an accurate model parameter associated with this and therefore the model is not able to perform accurately.
(ii)
Suboptimality of models due to the difficulty of the problem or modeling limitations. In these types of problems, because we have latent/unobservable dynamical models, we need to approximate the nose model with a predefined distribution and use variational inference or EM to be able to approximate intractable computations for posterior inference over latent variables, such as the cognitive state in this study, with simpler ones. Therefore the identified noise model is not ideal or the optimization schemes may not guarantee optimal model parameters, especially in the EM algorithm.

Therefore, the source of error in single-trial decoding by HI-DGD could have been either or a combination of these factors. Unfortunately, there is no solid way to measure the contribution of each of these factors in our HI-DGD.

4.2. HI-DGD comparison with state-space models traditional models

Various methods including state-space models (SSMs) [18, 29, 38] and machine learning approaches [39–41] has been used to model cognitive states using neural or behavioral data [11, 12, 21, 31, 42]. An important key assumption behind these approaches is that dynamics in the latent brain states, hereafter known as cognitive states, are encoded in a subset of features of the neural activity and behavioral signals. Therefore, it can be decoded from either neural recordings or behavioral signals.

Despite the usefulness of these methods in decoding cognitive states, they suffer from several limitations. For high-dimensional neural data (recorded by several electrodes over time), these models become impractical and computationally inefficient for building accurate receptive-field models for individual neurons or sources [43–46]. Building accurate receptive-field models requires additional modeling steps, such as spike sorting or cluster-less models, which add complexity and computational expense to the process [47]. Moreover, individual neural receptive-field models do not contain information about the co-activity across neurons that encapsulate essential information about interactions among different brain regions of the brain [48, 49]. For example, traditional models often ignore the coherence and synchrony between different channels of neural activity for decoding a cognitive state [12].

In addition, traditional models do not use all possible sources of information at the same time for decoding a cognitive state [11, 12, 21, 22]. The existing methods utilize behavioral signals as a noisy observation of the designated cognitive state. Then, the cognitive state is solely decoded from neural signals. For example, Yousefi et al [21] considered the response time of participants performing a multisource interference task as a noisy measure of cognitive flexibility state and inferred it from LFP signals in a separate step [50]. Therefore, this strategy of decoding a cognitive state employs only one source of information and may lead to suboptimal results.

The hybrid inference solution suggested by the HI-DGD simultaneously combines simple generative models of behavior as a function of the underlying cognitive state with information encoded in complex patterns of high-dimensional neural to address the shortcomings of traditional models.

In developing our modeling framework, the HI-DGD, we hypothesized that there is a low-dimensional dynamical manifold—or, a state process—present in the data which helps to better characterize and potentially understand neural mechanisms of complex cognitive processes. The state process in HI-DGD is directly connected to both generative and discriminative processes and can update its value by either one or two observation(s) at the same time. The discriminative part can therefore be used for decoding a latent state from high-dimensional observed neural recordings [51, 52]. By considering the discriminative process, the HI-DGD does not need to perform intermediate processing steps such as spike sorting or independent component analysis, which are common practices in preparing neural data [53]. As the HI-DGD uses the discriminative model for high-dimensional data, it offers a more accurate prediction of the cognitive state compared to generative models where the model inadequacy might be problematic [54, 55].

We compared the performance of the HI-DGD model with that using a baseline analysis technique. For the latter, we used a logistic regressor with an input calculated by the power of the theta band of the STN averaged over all trials and across participants (consistent with [26, 30]). Similar to the HI-DGD model, we only used the first 500 ms of observed neural recordings after the cue time. We showed that the HI-DGD model significantly improved the accuracy of the predictions compared to the baseline technique (statistics for accuracy and F1-score, with p-value = 0.002).

There are many choices to build the discriminative model including deep neural networks, convolutional neural networks [56], recurrent neural networks [57], and parametric models like the Gaussian process [58]. Unlike high dimensional neural recordings, behavioral signals such as response-time and error rates are usually very low dimensional and have slow temporal dynamics compared to neural signals, thus it is not efficient to use discriminative processes for capturing dynamics of behavioral signals [59, 60]. Therefore, we used a generative model, similar to the SSM, for the low dimensional subset of the observed behavioral signals.

Acknowledgments

This work was supported by Dr Lankarany's NSERC Fund (RGPIN-2020-05868) and MITACS Accelerate (IT19240).

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Conflict of interest

The authors declare no competing interests.

Appendix A: Objective function derivation for HI-DGD model identification

We use EM algorithm [61] to find maximum likelihood estimates of the model parameters, $\boldsymbol{\theta} = \{{\boldsymbol{\beta}},{\boldsymbol{\Omega}},{\boldsymbol{\omega}} \}$ . The EM algorithm is an established solution to perform maximum likelihood estimation of model parameters when there is an unobservable process or missing observations [21]. In the HI-DGD, the state variable ${\boldsymbol{x}}_{\boldsymbol{k}}$ is the unobserved process, the latent dynamical variable. The EM solution recursively estimates the model parameters $\boldsymbol{\theta}^{(r)} = \{{\boldsymbol{\beta}}^{(r)},{\boldsymbol{\Omega}}^{(r)},{\boldsymbol{\omega}}^{(r)} \}$ , based on an updated posterior distribution of ${\boldsymbol{x}}_{\boldsymbol{0:K}}$ and the observation from the previous EM iteration, ${\boldsymbol{\theta}}^{(r-1)} = \{\boldsymbol{\beta}^{(r-1)},{\boldsymbol{\Omega}}^{(r-1)},{\boldsymbol{\omega}}^{(r-1)} \}$ . The EM includes two steps: expectation and maximization [62]. The expectation step (or Q function) is defined by

$\begin{align} \begin{array}{l} Q({\mathbf{\theta }},{{\mathbf{\theta }}^{(r - 1)}}) = {{\textbf{E}}_{{{\textbf{x}}_{{\textbf{0}}:{\textbf{K}}}}|{{\textbf{s}}_{{\textbf{1}}:{\textbf{K}}}},{{\textbf{z}}_{{\textbf{1}}:{\textbf{K}}}};{{\mathbf{\theta }}^{(r - 1)}}}}\left[ {\log (p({{\textbf{x}}_{\textbf{0}}})} \right.\quad \\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\, \times \prod\limits_{k = 1}^K p ({{\textbf{x}}_{\textbf{k}}}|{{\textbf{x}}_{{\textbf{k}} - {\textbf{1}}}}) \times p({{\textbf{s}}_{\textbf{k}}}|{{\textbf{x}}_{\textbf{k}}},{{\textbf{h}}_{\textbf{k}}})\\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\left. {\quad \times p({{\textbf{z}}_{\textbf{k}}}|{{\textbf{x}}_{\textbf{k}}}))} \right]. \end{array} \end{align} \tag{ A.1 }$

Here after we replace ${\boldsymbol{E}}_{{\boldsymbol{x}}_{\boldsymbol{0:K}}|{\boldsymbol{s}}_{\boldsymbol{1:K}},{\boldsymbol{z}}_{\boldsymbol{1:K}};{\boldsymbol{\theta}}^{(r-1)}}$ with ${\boldsymbol{E}}_{K}$ for summarizing the notation. The Q function can be rewritten as

$\begin{align} \begin{array}{l} Q({\mathbf{\theta }},{{\mathbf{\theta }}^{(r - 1)}}) = {{\textbf{E}}_K}\left[ {\log (p({{\textbf{x}}_{\textbf{0}}})\prod\limits_{k = 1}^K p ({{\textbf{x}}_{\textbf{k}}}|{{\textbf{x}}_{{\textbf{k}} - {\textbf{1}}}}) \times p({{\textbf{z}}_{\textbf{k}}}|{{\textbf{x}}_{\textbf{k}}})} \right.\\ \,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\,\left. { \times \frac{{p({{\textbf{x}}_{\textbf{k}}}|{{\textbf{s}}_{\textbf{k}}},{{\textbf{h}}_{\textbf{k}}})}}{{\int p ({{\textbf{x}}_{\textbf{k}}}|{{\textbf{x}}_{{\textbf{k}} - {\textbf{1}}}})p({{\textbf{x}}_{{\textbf{k}} - {\textbf{1}}}}|{{\textbf{s}}_{{\textbf{k}} - {\textbf{1}}}},{{\textbf{h}}_{{\textbf{k}} - {\textbf{1}}}})}}d{{\textbf{x}}_{{\textbf{k}} - {\textbf{1}}}})} \right]. \end{array} \end{align} \tag{ A.2 }$

Expanding the Q function yields

$\begin{align} &Q(\boldsymbol{\theta},\boldsymbol{\theta}^{(r-1)}) \nonumber\\ &\quad = {\boldsymbol{E}}_{K} \left[ \log \; p({\boldsymbol{x}}_{\boldsymbol{0}}) + \sum_{k = 1}^{K} \log\; p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k-1}}) \right.\nonumber\\ & \qquad + \sum_{k = 1}^{K} \log \; p({\boldsymbol{z}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k}}) +\sum_{k = 1}^{K} \log \; p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{s}}_{\boldsymbol{k}},{\boldsymbol{h}}_{\boldsymbol{k}}) \nonumber\\ &\qquad \left.- \sum_{k = 1}^{K} \log \int p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k-1}})p({\boldsymbol{x}}_{\boldsymbol{k-1}}|{\boldsymbol{s}}_{\boldsymbol{k\!-\!1}},{\boldsymbol{h}}_{\boldsymbol{k\!-\!1}}) \mathrm d{\boldsymbol{x}}_{\boldsymbol{k-1}} \right]\!. \end{align} \tag{ A.3 }$

The integration term here can be expressed as

$\begin{align} & \int p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k-1}})p({\boldsymbol{x}}_{\boldsymbol{k-1}}|{\boldsymbol{s}}_{\boldsymbol{k-1}},{\boldsymbol{h}}_{\boldsymbol{k-1}}) \mathrm d{\boldsymbol{x}}_{\boldsymbol{k-1}} \nonumber\\ &\quad = {\boldsymbol{E}}_{{\boldsymbol{x}}_{\boldsymbol{k-1}}|{\boldsymbol{s}}_{\boldsymbol{k-1}},{\boldsymbol{h}}_{\boldsymbol{k-1}}} [p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k-1}})]. \end{align} \tag{ A.4 }$

When the state process is linear with an additive Gaussian noise and the prediction process is a multi-variate normal, there is a closed form solution for this expectation. To derive a general solution for nonlinear and non-Gaussian noises, we can rewrite Q function as

$\begin{align} Q(\boldsymbol{\theta},\boldsymbol{\theta}^{(r-1)}) &= {\boldsymbol{E}}_{K} \left[ \log \; p({\boldsymbol{x}}_{\boldsymbol{0}}) + \sum_{k = 1}^{K} \log\; p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k-1}}) \right.\nonumber\\ &\quad + \sum_{k = 1}^{K} \log \; p({\boldsymbol{z}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k}}) +\sum_{k = 1}^{K} \log \; p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{s}}_{\boldsymbol{k}},{\boldsymbol{h}}_{\boldsymbol{k}}) \nonumber\\ & \left.\quad - \sum_{k = 1}^{K} \log \; {\boldsymbol{E}}_{{\boldsymbol{x}}_{\boldsymbol{k-1}}|{\boldsymbol{s}}_{\boldsymbol{k-1}},{\boldsymbol{h}}_{\boldsymbol{k-1}}} [\,p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k-1}})]\right]. \end{align} \tag{ A.5 }$

Since $\log(E(f(x))) \geqslant E[\log(f(x))]$ , we can exchange the log and expectation operations to yield a lower bound for Q, which can be written as

$\begin{align} Q(\boldsymbol{\theta},\boldsymbol{\theta}^{(r-1)})& \geqslant {\boldsymbol{E}}_{K} \left[ \log \; p({\boldsymbol{x}}_{\boldsymbol{0}}) + \sum_{k = 1}^{K} \log\; p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k-1}}) \right.+ \sum_{k = 1}^{K} \log \; p({\boldsymbol{z}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k}})\nonumber\\ &\kern-2.5pt\quad \left. +\sum_{k = 1}^{K} \log\;p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{s}}_{\boldsymbol{k}},{\boldsymbol{h}}_{\boldsymbol{k}}) \right] - \sum_{k = 1}^{K} {\boldsymbol{E}}_{{\boldsymbol{x}}_{\boldsymbol{0,\ldots,K}}|{\boldsymbol{s}}_{\boldsymbol{1,\ldots,K}},{\boldsymbol{z}}_{\boldsymbol{1,\ldots,K}}}\nonumber\\ & \quad{\boldsymbol{E}}_{{\boldsymbol{x}}_{\boldsymbol{k-1}}|{\boldsymbol{s}}_{\boldsymbol{k-1}},{\boldsymbol{h}}_{\boldsymbol{k-1}}} [\log \; p({\boldsymbol{x}}_{\boldsymbol{k}}|{\boldsymbol{x}}_{\boldsymbol{k-1}})] \end{align} \tag{ A.6 }$

where, the expectation ${\boldsymbol{x}}_{\boldsymbol{k-1}}$ in the last term, defined by the prediction process, is a function of the model free parameters. The second step of the EM is maximization step (M-step) that updates parameter set at iteration (r) by maximizing Q. The maximization is defined by

$\begin{equation} \boldsymbol{\theta}^{(r)} = \max_{{\boldsymbol{\theta}}} \; Q(\boldsymbol{\theta},\boldsymbol{\theta}^{(r-1)}). \end{equation} \tag{ A.7 }$

This optimization can be calculated analytically or numerically (e.g. gradient descent [63]). After each iteration, a new set of parameters are estimated and the EM routine is stopped when a stopping criteria based on the likelihood growth or parameter changes is satisfied.

Appendix B: Simulation dataset details

We simulated a group of patients with the assumption that there is a low dimensional dynamics, we referred to it as the cognitive state, that drives both neural activity and behavioral signals. The statistical properties of the cognitive state are different for trial types. Let us assume the cognitive state for trial j of mth patient, $m = 0,\ldots,M$ is defined by $x_k^{m,j,d}$ , where d is the trial type and is modeled by a Bernoulli random variable. Here we selected the Bernoulli parameter 0.5 to make sure we have balanced data from conflict and non-conflict trials. We modeled the state process as a random walk as follows

$\begin{equation} P(x_k^{m,j,d}|x_{k-1}^{m,j,d})\sim N(\alpha_x^d x_{k-1}^{m,j,d}, \sigma_x^d ) \end{equation} \tag{ B.1 }$

where $\alpha_x^d$ and $\sigma_x^d$ are the transition coefficient and standard deviation for each random walk. We generated two sets of observations. The first one is a high-dimensional and continuous, s _k, designed to resemble the dynamics of the LFP/mPFC signals in a cognitive task. Using the state, we then generate a 20-dimensional continuous signal with the conditional distribution defined by

$\begin{equation} P(\boldsymbol{s}_k|x_k) \sim N(\boldsymbol{g}(x_k, x_{k-1},\ldots,x_{k-l_m}),\boldsymbol{\Sigma}_s) \end{equation} \tag{ B.2 }$

where $\boldsymbol{g(.)}$ is a 20-dimensional vector of non-linear functions, like tanh, and cosine, with an argument defined by a subset of state processes at the current and previous time points. The l_m is the maximum number of data points used in creating $g(.)$ function. For each channel of data, we pick a l uniformly from 0 to l_m randomly, which is used for data generation. For example, in our simulation, the g₁ function corresponding to the first element of g , is a $\tanh(.)$ with the argument to be $x_k+0.8x_{k-1}+0.6*x_{k-2}+0.4*x_{k-3}+0.2*x_{k-4}$ . $\boldsymbol{\Sigma}_s$ is the stationary covariance matrix with sizer $20\times20$ . $\boldsymbol{\Sigma}_s$ 's non-diagonal terms are non-zeros, implying observations across channels are correlated. In our simulation, we picked a = .9, b = 0, and $\sigma_x = 0.1$ .

The second observation, r_k , is a low dimensional, discrete, and sparse signal that resembles the dynamics of a behavioral signal in a cognitive task such as response time. r_k can be seen as the time of the agent response to a task and it can be modeled as:

$\begin{equation} \log r_k = \alpha_r x_k + \beta_r m_k + \vartheta \end{equation} \tag{ B.3 }$

where $\alpha_r = 0.9$ represent the weight of current state and $\beta_r = 0.1$ represent the weight of the error signal m_k in generating $\log r_t$ . ϑ is an additive Gaussian noise with a standard deviation of 0.3. We assume this relationship between r_k , conflict state, and error rate because empirically response times typically decline rapidly in non-conflict trials and later tend to reach a lower bound based on subject-specific physiological constraints.

Appendix C: Supplementary figure

**Figure C1.** HI-DGD accuracy measure for ten patients in the experimental data. Accuracy for HI-DGD and the baseline model evaluated on all patients. The accuracy of the proposed HI-DGD is significantly higher than that of the baseline model (p-value $\leqslant 0.001$ ).
Download figure:
Standard image High-resolution image

**Figure C1.** HI-DGD accuracy measure for ten patients in the experimental data. Accuracy for HI-DGD and the baseline model evaluated on all patients. The accuracy of the proposed HI-DGD is significantly higher than that of the baseline model (p-value $\leqslant 0.001$ ).
Download figure:
Standard image High-resolution image

Inferring cognitive state underlying conflict choices in verbal Stroop task using heterogeneous input discriminative-generative decoder model

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction

2. HI-DGD model

2.1. Marked-point process observation model for modeling behavioral signals