Reliability of the Granger causality inference

Douglas Zhou; Yaoyu Zhang; Yanyang Xiao; David Cai

doi:10.1088/1367-2630/16/4/043016

1. Introduction

Information flow plays a central role in physical and biological systems, such as in gene regulation and cortical computation in the brain. How to characterize information propagation within dynamical systems is a great theoretical challenge. There has been a long history of studying causal relations between time series [1–6]. Recently, Granger causality (GC) analysis has proven itself as a powerful method, widely used for extracting causal connectivity from data in fields ranging from physics [7–10] and biology [11–14], to social science [15–18].

In general, it is often quite involved to acquire reliable GC inference. Usually the GC framework requires the time series to be linear. For linear systems, the directed GC corresponds to the structural connectivity that mediates physical interactions within a system. For nonlinear systems, the relationship between the causal connectivity and the structural connectivity is still under active investigation. Extending the GC theory to nonlinear, non-Gaussian times series remains a central theoretical endeavor [19–21]. Note that the notion of GC is statistical rather than structural, i.e. it identifies directed statistical causal connectivity through statistical features of time series. As a statistical method, one of the important issues is how to collect time series for dynamical variables in order to faithfully capture the information flow within the dynamics. In this work, we study the reliability of GC inference for both linear and nonlinear systems and demonstrate that there are inference hazards in the GC analysis arising from the manner in which a continuous dynamical process is projected to a discrete process.

Because most dynamical quantities are continuous in time and GC values are generally evaluated using discrete time series sampled from these continuous time processes, we investigate the reliability of GC evaluation by studying the GC value as a function of the sampling interval length τ (we will term this function the GC sampling structure). We show that, for both linear and nonlinear dynamics, there are surprisingly common features in the GC sampling structure: (i) when by the design of our system there is a causal flow, yet the GC value can vanish over a certain set of τ, yielding an incorrect causal inference; or (ii) conversely, when there is no causal influence by construction, the GC value can become of appreciable size for some ranges of τ, yielding again a potentially erroneous causal inference. Clearly, these phenomena greatly complicate the interpretation of the GC inference and potentially produce opposing causality conclusions when using different sampling τ's for empirical data. These issues have yet to receive sufficient attention despite the wide use of GC analysis in many fields. It is important to examine these issues in a GC analysis to avoid erroneous causal inference. We use a small-world network reconstruction as an example to illustrate this hazard. As is well known, many important biological or social networks possess small-world attributes [22, 23]. The small-world topology is generally linked to efficient parallel information processing on both local and global levels. We show that an incorrect inference can arise, i.e. the small-world attributes are completely lost in the reconstructed network when GC analysis is blindly applied to the network topology reconstruction. Clearly, these sampling issues should also arise in data processing for GC evaluations using low-pass filtering or down-sampling for many types of imaging data, such as the BOLD signal in functional magnetic resonance imaging (fMRI) [24, 25].

To remove the inference artifacts in the GC analysis, it seems that a natural resolution is to incorporate as much information as possible by using sufficiently fine sampling. However, as we show below, this approach produces a paradox, i.e. a finer sampling may not imply a more reliable GC inference. In what follows, we will use idealized models to study the mechanisms underlying these phenomena and discuss an approach that can eliminate these artifacts to obtain a reliable GC relation.

2. GC sampling structures

GC aims to analyze the causal influence of one time series, ${{X}_{t}}$ , on another, ${{Y}_{t}}$ , by examining whether the statistical prediction of ${{Y}_{t}}$ can be improved when the past information of ${{X}_{t}}$ is incorporated into it [2, 26, 27]. First, to fix the notation, we briefly recapitulate the GC analysis [2, 27, 28]. For two stationary zero-mean discrete time series $\left\{ {{X}_{t}} \right\}$ and $\left\{ {{Y}_{t}} \right\}$ , a statistical prediction can usually be accomplished using the autoregression

$\begin{eqnarray*}\begin{matrix} {{X}_{t}} & = & \underset{j=1}{\overset{\infty}{\mathop{\mathop{\sum }^{}}}}\,{{a}_{1j}}{{X}_{t-j}}+{{\epsilon }_{1t}}, \\ {{Y}_{t}} & = & \underset{j=1}{\overset{\infty}{\mathop{\mathop{\sum }^{}}}}\,{{d}_{1j}}{{Y}_{t-j}}+{{\eta }_{1t}}, \\ &&\end{matrix}\end{eqnarray*}$

where ${{\epsilon }_{1t}}$ and ${{\eta }_{1t}}$ are residuals representing the prediction errors when we separately consider the history of each time series. The corresponding joint regression is

$\begin{eqnarray*}\begin{matrix} {{X}_{t}} & = & \underset{j=1}{\overset{\infty}{\mathop{\mathop{\sum }^{}}}}\,{{a}_{2j}}{{X}_{t-j}}+\underset{j=1}{\overset{\infty}{\mathop{\mathop{\sum }^{}}}}\,{{b}_{2j}}{{Y}_{t-j}}+{{\epsilon }_{2t}}, \\ {{Y}_{t}} & = & \underset{j=1}{\overset{\infty}{\mathop{\mathop{\sum }^{}}}}\,{{c}_{2j}}{{X}_{t-j}}+\underset{j=1}{\overset{\infty}{\mathop{\mathop{\sum }^{}}}}\,{{d}_{2j}}{{Y}_{t-j}}+{{\eta }_{2t}}, \\ &&\end{matrix}\end{eqnarray*}$

where ${{\epsilon }_{2t}}$ and ${{\eta }_{2t}}$ are residuals representing the prediction errors after we consider the shared history for both time series. The sequences $\left\{ {{\epsilon }_{1t}} \right\}$ , $\left\{ {{\epsilon }_{2t}} \right\}$ , $\left\{ {{\eta }_{1t}} \right\}$ , and $\left\{ {{\eta }_{2t}} \right\}$ are white-noise time series. The GC ${{F}_{y\to x}}$ characterizing the causal influence of $\left\{ {{Y}_{t}} \right\}$ on $\left\{ {{X}_{t}} \right\}$ is defined as

$\begin{eqnarray}&&{{F}_{y\to x}}=\ln \frac{\operatorname{var}\left( {{\epsilon }_{1t}} \right)}{\operatorname{var}\left( {{\epsilon }_{2t}} \right)}.\end{eqnarray} \tag{ 1 }$

Note that $\operatorname{var}\left( {{\epsilon }_{1t}} \right) \geqslant \operatorname{var}\left( {{\epsilon }_{2t}} \right)$ , i.e. the prediction of one time series cannot become worse (i.e. have greater residual variance) after information from the other time series is incorporated, therefore, ${{F}_{y\to x}} \geqslant 0.$ The GC from $\left\{ {{X}_{t}} \right\}$ to $\left\{ {{Y}_{t}} \right\}$ is defined similarly as ${{F}_{x\to y}}=\ln \frac{\operatorname{var}\left( {{\eta }_{1t}} \right)}{\operatorname{var}\left( {{\eta }_{2t}} \right)}.$ The total GC is ${{F}_{x,y}}={{F}_{y\to x}}+{{F}_{x\to y}}+{{F}_{x\cdot y}}$ , where ${{F}_{x\cdot y}}$ is the instantaneous GC, ${{F}_{x\cdot y}}=\ln \frac{\operatorname{var}\left( {{\epsilon }_{2t}} \right)\operatorname{var}\left( {{\eta }_{2t}} \right)}{\left| \Sigma \right|},$ $\left| \Sigma \right|=\operatorname{var}\left( {{\epsilon }_{2t}} \right)\operatorname{var}\left( {{\eta }_{2t}} \right)-{{\left[ \operatorname{cov}\left( {{\epsilon }_{2t}},{{\eta }_{2t}} \right) \right]}^{2}}$ [2, 27, 28]. In addition, we can also describe GC in the frequency domain [2, 27, 28], e.g. ${{F}_{x,y}}=\frac{1}{2\pi }\mathop{\int }^{}_{-\pi }^{\pi }\ln \left( \frac{{{S}_{yy}}\left( \omega \right){{S}_{xx}}\left( \omega \right)}{\left| \mathbf{S}\left( \omega \right) \right|} \right)\text{d}\omega$ . Here, $\left| \mathbf{S}\left( \omega \right) \right|$ is the determinant of the spectral matrix $\mathbf{S}\left( \omega \right)$ , where matrix elements ${{S}_{xx}}\left( \omega \right)$ and ${{S}_{yy}}\left( \omega \right)$ are the power spectra of $\left\{ {{X}_{t}} \right\}$ and $\left\{ {{Y}_{t}} \right\}$ , respectively, and ${{S}_{xy}}\left( \omega \right)=\mathop{\sum }^{}_{n=-\infty }^{+\infty }\operatorname{cov}\left( {{X}_{t}},{{Y}_{t-n}} \right){{e}^{-in\omega }}$ is the cross-spectrum of $\left\{ {{X}_{t}} \right\}$ and $\left\{ {{Y}_{t}} \right\}$ .

To illustrate the application of GC in nonlinear dynamics, we choose network dynamics of conductance-based, pulse-coupled integrate-and-fire (I and F) neurons [29]

$\begin{eqnarray}&&\frac{\text{d}{{v}_{i}}}{\text{d}t}=-{{g}_{\text{L}}}({{v}_{i}}-{{\epsilon }_{\text{L}}})-{{g}_{i}}({{v}_{i}}-{{\epsilon }_{\text{E}}}),\end{eqnarray} \tag{ 2a }$

$\begin{eqnarray}&&\frac{\text{d}{{g}_{i}}}{\text{d}t}=-\frac{{{g}_{i}}}{\sigma }+\underset{j,k}{\mathop{\mathop{\sum }^{}}}\,{{s}_{ji}}\delta \left( t-{{T}_{j,k}} \right)+\lambda \underset{l}{\mathop{\mathop{\sum }^{}}}\,\delta \left( t-T_{i,l}^{\text{p}} \right),\end{eqnarray} \tag{ 2b }$

where ${{v}_{i}}$ and ${{g}_{i}}$ are the membrane potential and conductance for the ith neuron, respectively, ${{g}_{\text{L}}}$ is the leak conductance, ${{\epsilon }_{\text{L}}}$ and ${{\epsilon }_{\text{E}}}$ are the resting and reversal potentials, respectively, and σ is the conductance time constant. When ${{v}_{i}}$ $\in \ ({{\epsilon }_{\text{L}}},{{V}_{\text{th}}})$ , a neuron evolves according to equation (2). When ${{v}_{i}}$ reaches the threshold ${{V}_{\text{th}}}$ , it is reset to the resting potential ${{\epsilon }_{\text{L}}}$ for a refractory period ${{\tau }_{\text{ref}}}$ . This threshold-reset event is referred to as a spiking event. The coefficient ${{s}_{ji}}=s{{A}_{ji}}$ denotes the connection from neuron j to neuron i, with s the coupling strength and $\mathbf{A}=\left( {{A}_{ji}} \right)$ the adjacency matrix of the network; ${{T}_{j,k}}$ is the kth spike time of the jth neuron, whereas $T_{i,l}^{\text{p}}$ is the lth spike time in the Poisson input to the ith neuron. This input has rate μ and strength λ.

For a network of two neurons, x and y, with the adjacency matrix $\mathbf{A}$ with elements ${{A}_{xx}}=0$ , ${{A}_{xy}}=0$ , ${{A}_{yx}}=1$ , and ${{A}_{yy}}=0$ , we apply the GC analysis on the voltage time series. We expect that a successful GC analysis of the network topology should result in consistency between the causal and structural connectivities, that is, the presence of a structural coupling, ${{A}_{yx}}=1$ , should induce a causal flow, i.e. ${{F}_{y\to x}}\ne 0$ , whereas the absence of the coupling ${{A}_{xy}}=0$ should yield ${{F}_{x\to y}}=0$ . We demonstrate below how we can reliably assess this consistency by first addressing artifacts arising from the sampling in the evaluation of GC values.

Note that the discrete time series used in the GC analysis is usually sampled from continuous-time processes. The sampling effects can be captured by the GC sampling structure, i.e. the GC values, ${{F}_{y\to x}}$ , ${{F}_{x\to y}}$ , as functions of the sampling interval τ, denoted by $F_{y\to x}^{\left( \tau \right)}$ , $F_{x\to y}^{\left( \tau \right)}$ . For the nonlinear network dynamics of two neurons, from its GC sampling structure as shown in figure 1(a), we can observe the following properties:

GC is an oscillatory, decaying function of τ,
GC becomes nearly zero, or zero, periodically.

**Figure 1.** The GC sampling structure. (a) For the uni-directed two-neuron network (2), plotted are $F_{x\to y}^{\left( \tau \right)}$ (red) and $F_{y\to x}^{\left( \tau \right)}$ (cyan), obtained from voltage time series of the system (2). Shaded regions along the curves indicate a 95% confidence interval. The parameters are ${{V}_{\text{th}}}=1$ , ${{\epsilon }_{\text{L}}}=0$ , ${{\epsilon }_{\text{E}}}=14/3$ , ${{\tau }_{\text{ref}}}=2\;\text{ms}$ , $\sigma =2\;\text{ms}$ and ${{g}_{\text{L}}}=0.05\;\text{m}{{\text{s}}^{-1}}$ throughout the text. Here $\mu =1\;\text{kHz}$ , $\lambda =0.06,$ s = 0.02. (b) For the second-order linear regression model (3), plotted are $F_{x\to y}^{\left( k \right)}$ (red) and $F_{y\to x}^{\left( k \right)}$ (cyan) with sampling interval k. The insets in (a) and (b) are corresponding spectra: ${{S}_{xx}}\left( \omega \right)$ (cyan), ${{S}_{yy}}\left( \omega \right)$ (red), and $\left| {{S}_{xy}}\left( \omega \right) \right|$ (black) computed from the original time series. (c) Comparison with asymptotic results of $F_{y\to x}^{\left( k \right)}$ : the black dotted line is numerically obtained GC and the cyan is the asymptotic form (5) with $\beta =\pi /8$ and $\phi =-\pi /2.$ (d) For model II, plotted are $F_{x\to y}^{\left( k \right)}$ (red), $F_{y\to x}^{\left( k \right)}$ (cyan), $F_{x.y}^{\left( k \right)}$ (dashed cyan), $F_{x,y}^{\left( k \right)}$ (dashed red) with sampling interval k (parameters $\xi =0.05$ , $\beta =\frac{\pi }{5}$ , $\gamma =0.3$ , $\phi =\frac{\pi }{2}$ ). Corresponding spectra are plotted in the inset: ${{S}_{xx}}$ (cyan), ${{S}_{yy}}$ (red), and $\left| {{S}_{xy}} \right|$ (black).
Download figure:
Standard image High-resolution image

**Figure 1.** The GC sampling structure. (a) For the uni-directed two-neuron network (2), plotted are $F_{x\to y}^{\left( \tau \right)}$ (red) and $F_{y\to x}^{\left( \tau \right)}$ (cyan), obtained from voltage time series of the system (2). Shaded regions along the curves indicate a 95% confidence interval. The parameters are ${{V}_{\text{th}}}=1$ , ${{\epsilon }_{\text{L}}}=0$ , ${{\epsilon }_{\text{E}}}=14/3$ , ${{\tau }_{\text{ref}}}=2\;\text{ms}$ , $\sigma =2\;\text{ms}$ and ${{g}_{\text{L}}}=0.05\;\text{m}{{\text{s}}^{-1}}$ throughout the text. Here $\mu =1\;\text{kHz}$ , $\lambda =0.06,$ s = 0.02. (b) For the second-order linear regression model (3), plotted are $F_{x\to y}^{\left( k \right)}$ (red) and $F_{y\to x}^{\left( k \right)}$ (cyan) with sampling interval k. The insets in (a) and (b) are corresponding spectra: ${{S}_{xx}}\left( \omega \right)$ (cyan), ${{S}_{yy}}\left( \omega \right)$ (red), and $\left| {{S}_{xy}}\left( \omega \right) \right|$ (black) computed from the original time series. (c) Comparison with asymptotic results of $F_{y\to x}^{\left( k \right)}$ : the black dotted line is numerically obtained GC and the cyan is the asymptotic form (5) with $\beta =\pi /8$ and $\phi =-\pi /2.$ (d) For model II, plotted are $F_{x\to y}^{\left( k \right)}$ (red), $F_{y\to x}^{\left( k \right)}$ (cyan), $F_{x.y}^{\left( k \right)}$ (dashed cyan), $F_{x,y}^{\left( k \right)}$ (dashed red) with sampling interval k (parameters $\xi =0.05$ , $\beta =\frac{\pi }{5}$ , $\gamma =0.3$ , $\phi =\frac{\pi }{2}$ ). Corresponding spectra are plotted in the inset: ${{S}_{xx}}$ (cyan), ${{S}_{yy}}$ (red), and $\left| {{S}_{xy}} \right|$ (black).
Download figure:
Standard image High-resolution image

From this GC sampling structure, clearly, one is confronted with the question of what a proper τ is to obtain the GC value for causal inference or network topology extraction.

Properties I–II are general, not limited to nonlinear dynamics, and they also appear in many of the linear models we have examined. Figure 1(b) illustrates such an example of linear (second-order regressive) models:

$\begin{eqnarray}&&{{X}_{t}}=0.9{{X}_{t-1}}-0.6{{X}_{t-2}}-0.3{{Y}_{t-1}}+0.15{{Y}_{t-2}}+{{\epsilon }_{t}},\end{eqnarray} \tag{ 3a }$

$\begin{eqnarray}&&{{Y}_{t}}=0.7{{Y}_{t-1}}-0.4{{Y}_{t-2}}+{{\eta }_{t}},\end{eqnarray} \tag{ 3b }$

where ${{\epsilon }_{t}}$ and ${{\eta }_{t}}$ are Gaussian white noise with zero mean, $\operatorname{var}\left( {{\epsilon }_{t}} \right)=0.5$ , $\operatorname{var}\left( {{\eta }_{t}} \right)=1$ , and $\operatorname{cov}\left( {{\epsilon }_{t}},{{\eta }_{t}} \right)=0.2$ . Here, 'sampling interval length' $(\tau =k)$ refers to a 'coarse-grained' discrete time series constructed using an original datum point after skipping $k-1$ points in between. As shown in figure 1(b), clearly, the model also possesses properties I–II.

Next, we construct idealized models for which we have perfect control of causal influence and use them to ascertain the phenomena observed in the GC sampling structure analytically. Using the lag operator L, i.e. $L{{X}_{t}}={{X}_{t-1}}$ , and defining $a(L)=1+\mathop{\sum }^{}_{n=1}^{+\infty }{{a}_{n}}{{L}^{n}}$ , $b(L)=\mathop{\sum }^{}_{n=1}^{+\infty }{{b}_{n}}{{L}^{n}}$ , and $c(L)=1+\mathop{\sum }^{}_{n=1}^{+\infty }{{c}_{n}}{{L}^{n}}$ , the dynamics we consider are

$\begin{eqnarray}\begin{matrix} {{X}_{t}} & = & a(L){{\epsilon }_{t}}+b(L){{\eta }_{t}}, \\ {{Y}_{t}} & = & c(L){{\eta }_{t}}, \\ &&\end{matrix}\end{eqnarray} \tag{ 4 }$

where ${{\epsilon }_{t}}$ and ${{\eta }_{t}}$ are independent white-noise series, so that $\operatorname{cov}\left( {{\epsilon }_{t}},{{\eta }_{t}} \right)=0$ . Clearly, the structural connectivity in the system (4) means that ${{Y}_{t}}$ causally influences ${{X}_{t}}$ and is not influenced by ${{X}_{t}}$ .

We first consider model I: $c(L)=1$ , $\operatorname{var}\left( {{\epsilon }_{t}} \right)=\Omega$ , $\operatorname{var}\left( {{\eta }_{t}} \right)=1$ . Denoting the Fourier transforms of a(L) and b(L) by $a\left( \omega \right)=1+\mathop{\sum }^{}_{n=1}^{+\infty }{{a}_{n}}{{e}^{-in\omega }}$ and $b\left( \omega \right)=\mathop{\sum }^{}_{n=1}^{+\infty }{{b}_{n}}{{e}^{-in\omega }}$ , respectively, we obtain the spectra of $\left\{ {{X}_{t}},{{Y}_{t}} \right\}$ as ${{S}_{xx}}=a\left( \omega \right){{a}^{*}}\left( \omega \right)\Omega +b\left( \omega \right){{b}^{*}}\left( \omega \right)$ , ${{S}_{yy}}=1$ , and ${{S}_{xy}}=b\left( \omega \right)$ . Here, we first consider ${{S}_{xx}}=C$ (C is a constant), i.e. there are to be no oscillations in $\left\{ {{X}_{t}} \right\}.$ Note that ${{S}_{xx}}\left( \omega \right)$ is the spectrum of the original time series $\left\{ {{X}_{0}},{{X}_{1}},{{X}_{2}},\cdots \right\}$ . We denote by $S_{xx}^{\left( k \right)}\left( \omega \right)$ the spectrum for the time series $\left\{ {{X}_{0}},{{X}_{k}},{{X}_{2k}},\cdots \right\}$ , sampled from the original $\left\{ {{X}_{n}} \right\}$ with the sampling interval of length k. For this model, we can easily show that $F_{x.y}^{\left( k \right)}\equiv F_{x\to y}^{\left( k \right)} \equiv 0$ , which are GC computed using the time series sampled with interval length k. Therefore, for the sampling interval k, we have

$\begin{eqnarray*}&&F_{y\to x}^{(k)}=F_{x,y}^{(k)}=-\frac{1}{2\pi }\mathop{\int }^{}_{-\pi }^{\pi }\ln \left( 1-\frac{S_{xy}^{(k)}(\omega )S_{yx}^{(k)}(\omega )}{S_{xx}^{(k)}(\omega )S_{yy}^{(k)}(\omega )} \right)\text{d}\omega \end{eqnarray*}$

by the definition of the total GC. The only spectrum component that varies with k is the cross-spectrum $S_{xy}^{\left( k \right)}$ . To illustrate the behavior of the GC sampling structure, we choose $b(L)=\mathop{\sum }^{}_{n=1}^{+\infty }{{e}^{-\xi n}}\cos \left( \beta n+\phi \right){{L}^{n}}$ , which reflects the typical oscillatory, decaying behavior of the correlations in the time series. Here ξ, β, and ϕ are constants. Under the condition that $C{{\left| b\left( \omega \right) \right|}^{2}},$ using the spectral representation of $F_{y\to x}^{\left( k \right)}$ above, we can show that $F_{y\to x}^{\left( k \right)}$ has the asymptotic form

$\begin{eqnarray}&&F_{y\to x}^{\left( k \right)}\approx {{e}^{-2\xi k}}{{\cos }^{2}}\left( \beta k+\phi \right)/C\end{eqnarray} \tag{ 5 }$

for large k, which clearly possesses properties I–II. In particular, for suitable values of β and $\phi ,$ the GC value vanishes on a certain set of k. Figure 1(c) shows such an example, which also confirms that the GC asymptotic solution for large k agrees well with the numerically obtained GC sampling structure from the time series. By the model construction (4), there is a causal flow from ${{Y}_{t}}$ to ${{X}_{t}}$ . However, through our asymptotic analysis, we confirm that the corresponding GC vanishes on certain sampling intervals, k.

To emphasize another kind of sampling artifact [30] in the GC sampling structure, we consider model II, for which $a(L)=1+\mathop{\sum }^{}_{n=1}^{+\infty }{{\text{e}}^{-\xi n}}\cos \left( \beta n \right){{L}^{n}}$ , $b(L)=\gamma \mathop{\sum }^{}_{n=1}^{+\infty }{{\text{e}}^{-\xi n}}\cos \left( \beta n+\phi \right){{L}^{n}}$ , and $c(L)=1+\mathop{\sum }^{}_{n=1}^{+\infty }{{\text{e}}^{-\xi n}}\cos \left( \beta n \right){{L}^{n}}$ , where ξ, β, γ, ϕ are constants: $\operatorname{var}\left( {{\epsilon }_{t}} \right)=1$ , $\operatorname{var}\left( {{\eta }_{t}} \right)=1$ , and $\operatorname{cov}\left( {{\epsilon }_{t}},{{\eta }_{t}} \right)=0.$ As shown in figure 1(d), model II also possesses properties I–II. Importantly, it displays a prominent feature of spurious GC as property

$F_{x\to y}^{\left( k \right)}$ (the red curve in figure 1(d)) has a non-zero value, which becomes appreciable for $k>1$ , despite the fact that there is no causal influence from $\left\{ {{X}_{t}} \right\}$ to $\left\{ {{Y}_{t}} \right\}$ by the model construction.

This phenomenon is also discernible in figure 1(a) for the nonlinear network, for which ${{A}_{xy}}=0$ , yet $F_{x\to y}^{\left( k \right)}$ can significantly deviate from zero over some interval τ, as clearly demonstrated by the 95% confidence region in figure 1(a).

As described above, properties I–III in the GC sampling structure are common features for both linear and nonlinear models. These properties have strong ramifications in the application of the GC analysis. Importantly, properties II and III demand additional sampling criteria for establishing the reliability of any conclusions about the causal influence for both linear and nonlinear systems.

3. Small τ limit for GC

Because GC is not invariant with respect to τ, we cannot choose the sampling-interval length arbitrarily in the GC analysis. One possible solution is to use a discrete time series sampled with very fine intervals from a continuous-time process. Intuitively, this approach would incorporate ever more information for causality determination with ever finer intervals. However, as shown in figure 2(a), for the case of nonlinear network dynamics, we have the following phenomenon as property

the GC value, obtained using a discrete time series with the sampling interval $\tau ,$ approaches zero almost linearly as $\tau \to 0$ .

**Figure 2.** The GC sampling structure as $\tau \to 0$ for the uni-directed two-neuron network (2) with $\mu =1\;\text{kHz}$ , $\lambda =0.0177,$ s = 0.02. (a) $F_{y\to x}^{\left( \tau \right)}$ (gray dashed line) and $F_{x\to y}^{\left( \tau \right)}$ (black dotted line) obtained from voltage times series. (b) $\frac{1}{\tau }F_{x\to y}^{\left( \tau \right)}$ (dotted line), $\frac{1}{\tau }F_{y\to x}^{\left( \tau \right)}$ (gray dashed line), $\frac{1}{\tau }F_{x.y}^{\left( \tau \right)}$ (black dashed line ), $\frac{1}{\tau }F_{x,y}^{\left( \tau \right)}$ (black solid line), and the horizontal gray straight line is $-\mathop{\int }^{}_{-\infty }^{+\infty }\ln \left( 1-C\left( f \right) \right)\text{d}f$ (see equation (6)). The inset is the coherence C(f) computed from PSD. Note that, by the asymptotic distribution theory of GC [27], the estimator of a directed GC has a bias $p/n$ , where p is the regression order and n is the length of the time series. We have used the Bayesian information criterion [31] to determine the regression order p and have subtracted this type of bias in (a) and (b).
Download figure:
Standard image High-resolution image

**Figure 2.** The GC sampling structure as $\tau \to 0$ for the uni-directed two-neuron network (2) with $\mu =1\;\text{kHz}$ , $\lambda =0.0177,$ s = 0.02. (a) $F_{y\to x}^{\left( \tau \right)}$ (gray dashed line) and $F_{x\to y}^{\left( \tau \right)}$ (black dotted line) obtained from voltage times series. (b) $\frac{1}{\tau }F_{x\to y}^{\left( \tau \right)}$ (dotted line), $\frac{1}{\tau }F_{y\to x}^{\left( \tau \right)}$ (gray dashed line), $\frac{1}{\tau }F_{x.y}^{\left( \tau \right)}$ (black dashed line ), $\frac{1}{\tau }F_{x,y}^{\left( \tau \right)}$ (black solid line), and the horizontal gray straight line is $-\mathop{\int }^{}_{-\infty }^{+\infty }\ln \left( 1-C\left( f \right) \right)\text{d}f$ (see equation (6)). The inset is the coherence C(f) computed from PSD. Note that, by the asymptotic distribution theory of GC [27], the estimator of a directed GC has a bias $p/n$ , where p is the regression order and n is the length of the time series. We have used the Bayesian information criterion [31] to determine the regression order p and have subtracted this type of bias in (a) and (b).
Download figure:
Standard image High-resolution image

Therefore, the GC constructed in this way would give rise to a paradox that we eventually lose the ability for causal inference through GC despite the fact that more information is incorporated as $\tau \to 0$ . We present the mathematical reasons below for property IV, from which one can see that it is also a general phenomenon.

Because GC can be computed through the spectral representation, we first study the limit of the spectral matrix ${{\mathbf{S}}^{\left( \tau \right)}}\left( \omega \right)$ which is the spectrum of the time series $\left\{ {{X}_{n\tau }},{{Y}_{n\tau }} \right\},\ n\in \mathbb{Z}$ , with the sampling interval τ obtained from a bivariate continuous-time stationary process $\left\{ {{X}_{t}},{{Y}_{t}} \right\}$ , $t\in \mathbb{R}$ . Then, the covariance matrix $\mathbf{G}\left( n\tau \right)$ , $n\in \mathbb{Z}$ is a sampling of $\mathbf{G}\left( s \right)=\left[ \begin{matrix} \operatorname{cov}\left( {{X}_{t}},{{X}_{t-s}} \right) & \operatorname{cov}\left( {{X}_{t}},{{Y}_{t-s}} \right) \\ \operatorname{cov}\left( {{Y}_{t}},{{X}_{t-s}} \right) & \operatorname{cov}\left( {{Y}_{t}},{{Y}_{t-s}} \right) \\ \end{matrix} \right]$ , $s\in \mathbb{R}$ . Define $\mathbf{P}\left( f \right)=\mathop{\int }^{}_{-\infty }^{+\infty }\mathbf{G}\left( s \right){{e}^{-\text{i}s2\pi f}}\text{d}s$ , i.e. the power spectral density (PSD) of the continuous processes ${{X}_{t}},{{Y}_{t}}$ . By the Wiener–Khinchin theorem, ${{\mathbf{S}}^{\left( \tau \right)}}\left( \omega \right)={{\mathop{\sum }^{}}_{n\in \mathbb{Z}}}\mathbf{G}\left( n\tau \right){{e}^{-\text{i}n\omega }},$ noting that the relation between the frequency f for the continuous process and ω for the discrete time series is $\omega =2\pi \tau f$ , we obtain $\tau {{\mathbf{S}}^{\left( \tau \right)}}\left( \omega \right)\to \mathbf{P}\left( f \right)$ as $\tau \to 0$ . Using this relation, the total GC spectral representation yields

$\begin{eqnarray}&&\frac{1}{\tau }F_{x,y}^{\left( \tau \right)}\to -\mathop{\int }^{}_{-\infty }^{+\infty }\ln \left( 1-C\left( f \right) \right)\text{d}f\;\text{as}\ \tau \to 0\end{eqnarray} \tag{ 6 }$

where $C\left( f \right)=\left[ {{P}_{xy}}\left( f \right){{P}_{yx}}\left( f \right) \right]/\left[ {{P}_{xx}}\left( f \right){{P}_{yy}}\left( f \right) \right]$ is the coherence of the continuous-time processes ${{X}_{t}}$ and ${{Y}_{t}}$ . Similarly, we can also show that $\frac{1}{\tau }F_{x\to y}^{\left( \tau \right)}$ , $\frac{1}{\tau }F_{y\to x}^{\left( \tau \right)},$ and $\frac{1}{\tau }F_{x\cdot y}^{\left( \tau \right)}$ have limits as $\tau \to 0$ and these limits are intrinsic properties of the continuous processes. Therefore, the GC is linearly proportional to the sampling interval length for small $\tau .$

4. Procedure for reliable GC inference

From the above analysis, it is clear that one needs to be rather careful in interpreting causal inference using the GC analysis. Our analysis also provides a general approach to circumventing the issues discussed above: first, a range of τ should be used to sample continuous-time processes to obtain a set of discrete time series. Second, one computes the GC structure to examine its general behavior in order to (i) avoid using an accidental sampling interval τ for which GC may vanish despite the presence of causal influence, or (ii) avoid spurious non-zero GC values for finite τ as seen in figure 1(d) when there is no causal influence. Third, on account of the typical length scales ${{\tau }_{\text{osci}}}$ and ${{\tau }_{\text{d}}}$ of the oscillations and decay in the GC sampling structure, one should use fine sampling $\tau$ ${{\tau }_{\text{osci}}}$ , ${{\tau }_{\text{d}}}$ to find the linear range of the GC, ${{F}^{\left( \tau \right)}}$ (with the estimator bias [27] removed as in figure 2), as a function of τ, and then plot the ratio of ${{F}^{\left( \tau \right)}}/\tau$ to extract its limiting value as $\tau \to 0$ .

Returning to the network of two neurons, using the above approach, we can now successfully confirm the consistency between the causal connectivity and structural connectivity, i.e. that the topology ${{A}_{yx}}=1$ and ${{A}_{xy}}=0$ of the two-neuron networks can be reliably linked to $\frac{1}{\tau }F_{y\to x}^{\left( \tau \right)}\ne 0$ , $\frac{1}{\tau }F_{x\to y}^{\left( \tau \right)}=0$ as $\tau \to 0$ , as shown in figure 2(b).

In addition, we comment on some other aspects of the GC oscillation features observed in figure 1. As shown in figures 1(a) and (b) for both linear and nonlinear dynamics, the oscillation frequency in the GC sampling structure is twice the peak frequencies of the spectra. This relation also appears in the asymptotic solution (5) for model I, which contains the oscillation frequency ${{f}_{\text{GC}}}=\beta /\pi$ and whose $\left| {{S}_{xy}}\left( \omega \right) \right|$ has a peak at ${{f}_{\text{S}}}=\beta /2\pi ={{f}_{\text{GC}}}/2$ . Note that for nonlinear networks, the GC analysis extracts an underlying regression structure. The spectra (figure 1(a)) in the network case all have the same peak frequency, resembling those in model II (figure 1(d)), therefore, they share similar GC sampling structures by the GC spectral theory. Complicated oscillation structures in the GC sampling structures may arise and, in general, are related to the spectral peak frequencies of the original time series.

5. An example: small-world networks

Finally, we illustrate the importance of the above GC extraction strategy in an example of the reconstruction of a directed, small-world network of neurons. The small world network is initially wired using a similar method to the Watts–Strogatz method [22] with a modification to take into account directed edges [32, 33]. The small-world attributes are reflected in a large clustering coefficient and a small shortest path length in comparison with random networks. For our network of 100 excitatory neurons, the average clustering coefficient and the average shortest path lengths are 0.434 and 6.53, respectively. (For an undirected random network of the same number of nodes and edges, the average clustering coefficient and the average shortest path length would be 0.040, 2.44, respectively. For a directed random network, a node can then no longer always reach any other node through a directed path.) Using the voltage time series obtained from solving system (2), we can compute the conditional GC from neuron i to neuron j given the information about other neurons in the network (for details see [34]). The network topology is then successfully reconstructed (i.e. identical to that of the original network), as shown in figure 3(a), using the above GC extraction strategy. If a fixed sampling interval, say, $\tau =6.0\;$ ms, was used, the conventional GC analysis would produce a network topology that completely fails to capture the original network topology. This reconstructed network is shown in figure 3(b), which recovers only five directed connections while failing to find nearly 400 original connections in the total 9900 possible edges. Clearly, this reconstructed network, with a vanishing clustering coefficient and no connected paths, has even failed to capture the statistical structures of the network. Incidentally, we can also successfully reconstruct networks of coupled excitatory and inhibitory neurons with an arbitrary adjacent coupling matrix [19]. This example suggests that one should be very careful about the interpretation and implication of causal connectivity obtained using conventional methods for time series with a fixed sampling frequency. A systematic verification is needed to establish their validity.

**Figure 3.** Loss of small-world attributes. Left panel: a small-world I and F network of 100 excitatory neurons whose directed connections are constructed using our GC extraction strategy with the smallest sampling interval $\tau =1\;\text{ms}$ (see text). These have been verified to be identical to the original directed network topology; here, a neuron is represented by a small circle and a directed edge is indicated by an arrow pointing from the presynaptic neuron to the postsynaptic neuron. Right panel: a network topology constructed using the $\tau =6\;\text{ms}$ sampling interval in the conventional GC analysis. This has recovered only five directed connections, while failing to capture nearly $\sim 400$ original connections. This drastic loss of small-world attributes is a consequence of the incorrect sampling interval used in the GC analysis. The parameters for the I and F system (2) are $\mu =1\;\text{kHz}$ , $\lambda =0.012$ and s = 0.005.
Download figure:
Standard image High-resolution image

**Figure 3.** Loss of small-world attributes. Left panel: a small-world I and F network of 100 excitatory neurons whose directed connections are constructed using our GC extraction strategy with the smallest sampling interval $\tau =1\;\text{ms}$ (see text). These have been verified to be identical to the original directed network topology; here, a neuron is represented by a small circle and a directed edge is indicated by an arrow pointing from the presynaptic neuron to the postsynaptic neuron. Right panel: a network topology constructed using the $\tau =6\;\text{ms}$ sampling interval in the conventional GC analysis. This has recovered only five directed connections, while failing to capture nearly $\sim 400$ original connections. This drastic loss of small-world attributes is a consequence of the incorrect sampling interval used in the GC analysis. The parameters for the I and F system (2) are $\mu =1\;\text{kHz}$ , $\lambda =0.012$ and s = 0.005.
Download figure:
Standard image High-resolution image

6. Discussion and conclusion

In summary, we have shown that for both linear and nonlinear processes the computed GC is dependent on the sampling interval length τ. For instance, the GC may vanish on a certain set of τ despite the presence of true causal influence, or it can become non-zero over a range of τ despite the absence of any causal influence. Furthermore, the naive idea of using a sufficiently fine sampling interval, thus incorporating as much information as possible, will always give rise to vanishing GC regardless of whether there is a causal influence between the time series or not. By constructing simple idealized models, we have ascertained analytically the above phenomena as observed in both linear and nonlinear processes, and proposed approaches to overcome these sampling artifacts in order to obtain a reliable GC inference. We discussed the sampling issues of GC using the parametric methods of GC evaluations in the time domain as above. However, we point out that, naturally, similar issues about the sampling effects in the GC analysis also exist for non-parametric methods in the frequency domain. One has to deal carefully with the sampling issue regardless of whether a parametric method or non-parametric method of computing GC is used. We further comment that there is another related, interesting scientific question, namely, even for a fixed sampling rate, whether there is causal influence from one time scale to another in a coarse-grained sense. We note that this question was addressed in [35] by using a wavelet decomposition in time. Finally, we expect that our approach of obtaining reliable GC values can be used in analyzing GC relations for various kinds of observational data, such as EEG and fMRI signals, and may shed light on the impact of sampling effects in other empirical data-inference methods.

Acknowledgments

DZ is supported by the Shanghai Pujiang Program (grant no. 10PJ1406300), the National Science Foundation in China (grant nos. 11101275 and 91230202) and the Scientific Research Foundation for the Returned Overseas Chinese Scholars from the State Education Ministry in China. DC is supported by NSF-DMS-1009575. All authors are also supported by a research grant G1301 from NYU Abu Dhabi Research Institute. YZ and YX were partially supported by an Undergraduate Research Program in Zhiyuan College at Shanghai Jiao Tong University.

Reliability of the Granger causality inference

Article metrics

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction

2. GC sampling structures

3. Small τ limit for GC

4. Procedure for reliable GC inference

5. An example: small-world networks

6. Discussion and conclusion

Acknowledgments

Reliability of the Granger causality inference

Article metrics

Share this article

Author e-mails

Author affiliations

Dates

Abstract

1. Introduction

2. GC sampling structures

3. Small τ limit for GC

4. Procedure for reliable GC inference

5. An example: small-world networks

6. Discussion and conclusion

Acknowledgments