Abstract
We analyze, by means of Granger causality (GC), the effect of synergy and redundancy in the inference (from time series data) of the information flow between subsystems of a complex network. While we show that fully conditioned GC (CGC) is not affected by synergy, the pairwise analysis fails to prove synergetic effects. In cases when the number of samples is low, thus making the fully conditioned approach unfeasible, we show that partially conditioned GC (PCGC) is an effective approach if the set of conditioning variables is properly chosen. Here we consider two different strategies (based either on informational content for the candidate driver or on selecting the variables with highest pairwise influences) for PCGC and show that, depending on the data structure, either one or the other might be equally valid. On the other hand, we observe that fully conditioned approaches do not work well in the presence of redundancy, thus suggesting the strategy of separating the pairwise links in two subsets: those corresponding to indirect connections of the CGC (which should thus be excluded) and links that can be ascribed to redundancy effects and, together with the results from the fully connected approach, provide a better description of the causality pattern in the presence of redundancy. Finally we apply these methods to two different real datasets. First, analyzing electrophysiological data from an epileptic brain, we show that synergetic effects are dominant just before seizure occurrences. Second, our analysis applied to gene expression time series from HeLa culture shows that the underlying regulatory networks are characterized by both redundancy and synergy.
Export citation and abstract BibTeX RIS
Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
1. Introduction
Living organisms can be modeled as an ensemble of complex physiological systems, each with its own regulatory mechanism and all continuously interacting between them [1]. Therefore, inferring the interactions within and between these modules is a crucial issue. Over the past few years, the interaction structure of many complex systems has been mapped in terms of networks, which have been successfully studied using tools from statistical physics [2]. Dynamical networks have modeled physiological behavior in many applications; examples range from networks of neurons [3], genetic networks [4], protein interaction nets [5], and metabolic networks [6–8].
The inference of dynamical networks from time series data is related to the estimation of the information flow between variables [9]; see also [10, 11]. Granger causality (GC) [12, 13] has emerged as a major tool to address this issue. This approach is based on prediction: if the prediction error of the first time series is reduced by including measurements from the second one in the linear regression model, then the second time series is said to have a Granger causal influence on the first one. It has been shown that GC is equivalent to transfer entropy [14] in the Gaussian approximation [15] and for other distributions [16]. See [17] for a discussion about the applicability of this notion in neuroscience and [18] for a discussion on the reliability of GC for continuous dynamical processes. It is worth stressing that several forms of coupling may mediate information flow in the brain, see [19, 20]. The combination of GC and complex networks theory is also a promising line of research [21].
The pairwise Granger analysis (PWGC) consists of assessing GC between each pair of variables independently from the rest of the system. It is well known that PWGC cannot disambiguate direct and indirect interactions among variables. The most straightforward extension, the conditioning approach [22], removes indirect influences by evaluating to what extent the predictive power of the driver on the target decreases when the conditioning variable is removed. It must be noted however that, even though its limitations are well known, the PWGC approach is still used in situations where the number of samples is limited and a fully conditioned (CGC) approach is unfeasible. As a convenient alternative to this suboptimal solution, a partially conditioned approach, consisting of conditioning on a small number of variables, chosen as the most informative one for the driver node, has been proposed [23]; this approach leads to results very close to those obtained with a fully conditioned analysis that are even more accurate in the presence of a small number of samples [24]. We remark that the use of partially conditioned GC (PCGC) may also be useful in non-stationary conditions whereas the GC pattern must be estimated on short time windows.
However, sometimes a CGC approach can encounter conceptual limitations in addition to the practical and computational ones: in the presence of redundant variables, the application of the standard analysis leads to an underestimation of influences [25]. Redundancy and synergy are intuitive, yet elusive, concepts that have been investigated in different fields, from pure information theory [26–28], to machine learning [29] and neural systems [30, 31], with definitions that range from purely operative to conceptual. When analyzing interactions in a multivariate time series, redundancy may arise if some channels are all influenced by another signal that is not included in the regression; another source of redundancy may be the emergence of synchronization in a subgroup of variables, without the need for an external influence. Redundancy manifests itself through a high degree of correlation in a group of variables, both for instantaneous and lagged influences. Several approaches have been proposed to reduce dimensionality in multivariate sets that eliminate redundancy, which rely on generalized variance [32], principal components analysis [33], or GC itself [34].
A complementary concept to redundancy is synergy. The synergetic effects that we address here, related to the analysis of dynamical influences in a multivariate time series, are similar to those encountered in sociological and psychological modeling, where suppressors are variables that increase the predictive validity of another variable after its inclusion into a linear regression equation [35]; see [36] for examples of easily explainable suppressor variables in multiple regression research. Redundancy and synergy have been further connected to information transfer in [37], where an expansion of the information flow has been proposed to prove redundant and synergetic multiplets of variables. Other information-based approaches have also addressed the issue of collective influence [28, 38]. The purpose of this paper is to provide evidence that, in addition to the problem related to indirect influence, PWGC shows another relevant pitfall: it fails to detect synergetic effects in the information flow. In other words, it does not account for the presence of subsets of variables that provide some information about the future of a given target only when all the variables are used in the regression model. We remark that, because it processes the system as a whole, CGC evidences synergetic effects; when the number of samples is low, PCGC can also detect synergetic effects after an adequate selection of the conditioning variables.
The paper is organized as follows. In the next section, we briefly recall the concepts of GC and PCGC. In section 3, we describe some toy systems that illustrate how redundancy can affect the results of CGC, whereas indirect interactions and synergy are the main problems inherent in PWGC. In section 4, we provide evidence of synergetic effects in epilepsy: we analyze EEG recordings of an epileptic patient corresponding to ten seconds before the seizure onset; we show that the two contacts that constitute the putative seizure onset act as synergetic variables driving the rest of the system. The pattern of information transfer proves the actual seizure onset only when synergy is correctly considered.
In section 5, we propose an approach that combines PWGC and GCC to prove the pairwise influences are only because of redundancy and not recognized by CGC. The conditioned GC pattern is used to partition the pairwise links in two sets: those that are indirect influences between the measured variables, according to CGC, and those that are not explained as indirect relationships. The unexplained pairwise links, presumably due to redundancy, are thus retained to complement the information transfer pattern discovered by CGC. In cases where the number of samples is so low that a fully multivariate approach is unfeasible, PCGC may be applied instead of CGC. We also address here the issue of variable selection for PCGC, and consider a novel strategy for the selection of variables: for each target variable, one selects the variables sending the highest amount of information to that node, as revealed by a pairwise analysis. By construction, this new selection strategy works more efficiently when the interaction graph has a tree structure: indeed, in this case, conditioning on the parents of the target node ensures that indirect influences will be removed. In the epilepsy example, the selection based on the mutual information with the candidate driver provides results closer to those obtained by CGC. Finally we apply the proposed approach to time series of gene expressions, extracted from a data set from the HeLa culture. Section 6 summarizes our conclusions.
2. Insights into Granger causality
GC is a powerful and widespread data-driven approach used to determine whether and how two time series exert direct dynamical influences on each other [39]. A convenient nonlinear generalization of GC has been implemented in [40], exploiting the kernel trick, which makes computation of dot products in high-dimensional feature spaces possible using simple functions (kernels) defined on pairs of input patterns. This trick allows the formulation of nonlinear variants of any algorithm that can be cast in terms of dot products, for example support vector machines [41]. Hence, in [40], the idea is still to perform linear GC but in a space defined by the nonlinear features of the data. This projection is conveniently and implicitly performed through kernel functions [42] and a statistical procedure is used to avoid over-fitting.
Quantitatively, let us consider n time series ; the lagged state vectors are denoted
where m is the order of the model (window length). Let denote the mean squared error prediction of on the basis of all the vectors (corresponding to the kernel approach described in [43]). The multivariate GC index is defined as follows: consider the prediction of on the basis of all variables except and the prediction of using all variables, then the GC is the (normalized) variation of the error in the two conditions, i.e.,
In [44], it has been shown that not all the kernels are suitable to estimate GC. Two important classes of kernels that can be used to construct nonlinear GC measures are the inhomogeneous polynomial kernel (whose features are all the monomials in the input variables up to the pth degree; p = 1 corresponds to linear GC) and the Gaussian kernel.
The PCGC is given by:
The PWGC is defined as follows. Let denote the variables in , excluding and . Then equation (1) can be written as:
When is only a subset of the total number of variables in not containing and , then is called the PCGC. In [23], the set is chosen as the most informative for . Here, we will also consider an alternative strategy: fixed a small number k, we select as the k variables with the maximal pairwise GC w.r.t. that target node, excluding .
3. Examples
In this section, we provide some typical examples to note possible problems that pairwise and fully conditioned analysis may encounter.
3.1. Indirect GC among measured variables
We consider the following lattice of ten unidirectionally coupled noisy logistic maps, with
and
with . Variables η are unit variance Gaussian noise terms. The transfer function is given by . In this system, the first map is evolving autonomously, whereas the other maps are influenced by the previous maps with coupling ρ, thus forming a cascade of interactions. In figure 1(a), we plot, as a function of ρ, the number of GC interactions found by PWGC and CGC using the method described in [25] with the inhomogeneous polynomial kernel of degree two. The CGC output is the correct one (nine links), whereas the PWGC output also accounts for indirect influences and therefore fails to provide the underlying network of interactions. In this example we also tested PCGC (see figure 1(b)). We considered only one conditioning variable, chosen according to the two strategies described previously. First, we consider the most informative w.r.t. the candidate driver, as described in [23]; we call this information-based (IB) strategy. Second, we choose the variable characterized by the maximal pairwise influence to the target node, a pairwise–based (PB) rule. The PB strategy yields the correct result in this example, whereas the IB strategy fails when only one conditioning variable is used and requires more than one conditioning variable to provide the correct output. This occurrence is due to the tree topology of the interactions in this example, which favors PB selecting by construction the parents of each node.
3.2. Redundancy due to a hidden source
Here, we show how redundancy constitutes a problem for CGC. Let h(t) be a zero mean and unit variance hidden Gaussian variable, influencing n variables , and let be another variable influenced by h but with a larger delay. The variables are unit variance Gaussian noise and s controls the noise level. In figure 1(c), we depict both the linear PWGC and the linear CGC from one of the x s, to w (note that h is not used in the regression model). As n increases, the conditioned GC vanishes as a consequence of redundancy. The GC relation that is found in the pairwise analysis is not revealed by CGC because variables are maximally correlated and thus xi drives w only in the absence of any other variables.
The correct way to describe the information flow pattern in this example, where the true underlying source h is unknown, is that all variables are sending the same information to w, i.e., that variables constitute a redundant multiplet w.r.t. the causal influence to w. This pattern follows from observing that for all xs, CGC vanishes whereas PWGC does not vanish. This example shows that, in presence of redundancy, the CGC pattern alone is not sufficient to describe the information flow pattern of the system and PWGC should also be taken into account.
3.3. Synergetic contributions
Let us consider three unit variance i.i.d. Gaussian noise terms x1, x2, and x3. Let
Considering the influence , the CGC reveals that 3 is influences 4, whereas PWGC fails to detect this causal relationship—see figure 1(d), in which we use the method described in [25] with the inhomogeneous polynomial kernel of degree two, where x2 is a suppressor variable for x3 w.r.t. the influence on x4. This example shows that PWGC fails to detect synergetic contributions. We note that use of nonlinear GC is mandatory in this case to verify the synergy between x2 and x3.
3.4. Redundancy due to synchronization
As another example, we consider a toy system made of five variables . The first four variables constitute a multiplet made of a fully coupled lattice of noisy logistic maps with coupling ρ, evolving independently of the fifth variable. The fifth variable is influenced by the mean field from the coupled map lattice. The equations are, for :
and
where η denotes unit variance Gaussian noise terms. Increasing the coupling ρ among the variables in the multiplet , the synchronization of these variables (measured by Pearson correlations) increases and they become nearly synchronized for ρ greater than 0.1 (perfect synchronization cannot be achieved due to the noise terms); redundancy, in this example, arises due to the complex inherent dynamics of the units. In figure 2, we depict both the causality from one variable in the multiplet (x1; the same results hold for x2, x3, and x4 ) to x5, and the causality between pairs of variables in the multiplet: both linear and nonlinear PWGC and CGC are shown for the two quantities.
Download figure:
Standard image High-resolution imageConcerning the causality , we note that, for low coupling, both PWGC and CGC, with a linear or nonlinear kernel, correctly detect the causal interaction. Around the transition to synchronization, in a window centered at , all algorithms fail to detect the causality . In the almost synchronized regime, , CGC continues to fail due to redundancy, whereas the PWGC correctly provides the causal influence, using both the linear and the nonlinear algorithm.
As far as the causal interactions within the multiplet are concerned, we note that using the linear approach, we obtain small values of causality just at the transition, whereas we obtain zero values far from the transition. Using the nonlinear algorithm, which is the correct one in this example because the system is nonlinear, we obtain a nonzero causality among the variables in the multiplet, using both PWGC or CGC: the resulting curves are non-monotonous, as one may expect, due to the inherent nonlinear dynamics. For , nonzero GC is observed because of the noise that prevents the system to go in the complete synchronized state.
This example again shows that, in the presence of redundancy, one should consider both CGC and PWGC results. Moreover, it also shows how nonlinearity may render extremely difficult the inference of interactions: in this system, there is a range of values, corresponding to the onset of synchronization, in which all methods fail to provide the correct causal interaction.
4. Synergetic effects in the epileptic brain
As a real example, we consider intracranial EEG recordings from a patient with drug-resistant epilepsy with an implanted array of 8 × 8 cortical electrodes (CE) and two depth electrodes (DE) with six contacts each. The data are available in [45] and further details on the dataset are given in [46]. Data were sampled at 400 Hz. Here we consider a portion of data recorded in the preictal period, 10 seconds preceding the seizure onset. To handle this data, we use linear GC with m equal to five. In figure 3, we depict the PWGC between DEs (panel a), from DEs to CEs (panel b), between CEs (panel c), and from CEs to DEs (panel d). We note a complex pattern of bivariate interactions among CEs, whereas the first DE seems to be the subcortical drive to the cortex. We note that there is no PWGC from the last two contacts of the second DE (channels 11 and 12) to CEs and neither to the contacts of the first DE. In figure 4, we depict the CGC among DEs (panel a), from DEs to CEs (panel b), among CEs (panel c), and from CEs to DEs (panel d). The scenario in the conditioned case is clear: contacts 11 and 12, from the second DE, are the drivers both for the cortex and for the subcortical region associated to the first DE. These two contacts can be then associated to the seizure onset zone (SOZ). The PWGC strength among CEs is due to redundancy, as these are all driven by the same subcortical source. Because contact 12 is also driving contact 11 (see figure 4(a)), we conclude that contact 12 is the closest to the SOZ and that contact 11 is a suppressor variable for it, because it is necessary to include it in the regression model to verify the influence of contact 12 on the rest of the system. Conversely, contact 12 acts as a suppressor for contact 11. We stress that the influence from contacts 11 and 12 to the rest of the system emerges only in the CGC and is neglected by PWGC: these variables are synergetically influencing the dynamics of the system. To our knowledge, this is the first time that synergetic effects are found in relation with epilepsy.
Download figure:
Standard image High-resolution imageDownload figure:
Standard image High-resolution imageWe also apply PCGC on this data using one conditioning variable. The results are depicted in figure 5: using the IB strategy, we obtain a pattern very close to the one from CGC, whereas this is not the case of PB. These results seem to suggest that IB works better in the presence of redundancy; however, we do not offer arguments to claim that this a general rule. It is worth mentioning that in the presence of synergy, the selection of variables for partial conditioning is equivalent to the search of suppressor variables.
Download figure:
Standard image High-resolution image5. Combination of pairwise and conditioned Granger causality
In the preceding sections, we have shown that CGC encounters issues resulting in poor performance in presence of redundancy, and that information about redundancy may be obtained from the PWGC pattern. Here we develop a strategy to combine the two approaches: some links inferred from PWGC are retained and added to those obtained from CGC. The PWGC links that are discarded are those that can be derived as indirect links from the CGC pattern. In the following, we describe the proposed approach in detail.
Let Δ denote the matrix of influences from CGC (or PCGC). Let denote the matrix from PWGC. Nonzero elements of Δ and correspond to the estimated influences. Let these matrices be evaluated using a model of order m.
The matrix
contains paths of length with delays in the range . Indeed:
because all elements of Δ are non-negative, it follows that is non-vanishing if, and only if it is possible, in matrix Δ, to go from node i to node j moving steps backward and α steps forward, where a step is allowed if the corresponding element of Δ is not zero. Therefore, the nonzero elements of the matrix describe a situation where two nodes receive a common input from a third node that is α steps backward in time from one node and steps backward in time from the other node. In other words, if the element does not vanish, then there exists an indirect interaction between nodes i and j due to a common input. The circuit corresponding to is represented in figure 6(a): if the element is non-vanishing, then i and j are connected as shown in figure 6(a).
Download figure:
Standard image High-resolution imageBecause the order of the model is m, a simple comparison between the delays from the common source to i and j demonstrates that the indirect influence corresponding to the nonzero element might be detected by PWGC only if
this is equivalent to
Now, the matrix with , contains paths of length α with delays in the range , indeed:
Any nonzero element of the matrix describes an indirect causal interaction between nodes i and j, where i sends information to j through a cascade of α links: i→i1, , ..., . The circuit corresponding to F2 is depicted in figure 6(b). The indirect causal interaction , corresponding to the nonzero element , might be detected by PWGC if .
Let us now consider the matrix
where the first sum is over pairs satisfying If Bij is non-vanishing, then according to CGC, there is an indirect causal interaction between i and j: therefore, PWGC might misleadingly reveal such interact, considering it a direct one. In the approach previously described, we discard (as indirect) the links found by PWGC for which . Therefore, in the pairwise matrix , we set to zero all the elements such that (pruning). The resulting matrix contains links which cannot be interpreted as indirect links of the multivariate pattern, and will be retained and ascribed to redundancy effects,
For m = 1, we have that the only terms in the first sum are those with , so the first non-trivial terms are
For m = 2, the simplest terms are:
Because, due to the finite number of samples, a mediated interaction is more unlikely to be detected (by the pairwise analysis) if it corresponds to a long path, we limit the sum in the matrix B to the simplest terms.
As a toy example to illustrate an application of the proposed approach, we consider a system made of five variables . The first four variables constitute a multiplet made of unidirectionally coupled logistic maps, equations (4)–(5) with i ranging in , coupling ρ, and interactions , , and . The fifth variable is influenced by the mean field from the coupled map lattice, see equation (7). The four variables in the multiplet become almost synchronized for . In figure 7, we depict both the average influence from the variables in the multiplet to x5 and the average influence between pairs of variables in the multiplet: both linear and nonlinear PWGC and CGC are shown for the two quantities. Note that only the nonlinear algorithm correctly verifies the causal interactions within the multiplet of four variables, whereas the linear algorithm detects a very low causal interdependency among them. The driving influence from the multiplet to x5 detected by CGC vanishes at high coupling redundancy, both in the linear and nonlinear approach, due to the redundancy induced by synchronization.
Download figure:
Standard image High-resolution imageTo explain how the proposed approach works, we describe two situations corresponding to low and high coupling, respecively. At low coupling, the CGC approach estimates the correct causal pattern in the system, and the nonzero elements of Δ are , , , , , , and . The nonzero elements of the matrix , from PWGC analysis, are the same as Δ plus , , , corresponding to indirect causalities; however, these three interactions lead to nonzero elements of Δ2 (and, therefore, of B), hence they must be discarded. It follows that does not provide further information than Δ at low coupling.
On the contrary, at high coupling, due to synchronization, the CGC approach does not reveal the causal interactions , , , and , although still they are recognized by PWGC. Here, is still nonzero in correspondence to , , , and , while the corresponding elements of B are vanishing. According to our previous discussion, the interactions , , , and , detected by PWGC, should not be discarded: combining the results from CGC and PWGC, we obtain the correct causal pattern even in presence of strong synchronization.
6. Application to gene expression data
HeLa [47] is a famous cell culture, isolated from a human uterine cervical carcinoma in 1951. HeLa cells have acquired cellular immortality, in that the normal mechanisms of programmed cell death after a certain number of divisions have somehow been switched off. We consider the HeLa cell gene expression data of [48]. Data corresponds to 94 genes and 48 time points, with an hour interval separating two successive readings (the HeLa cell cycle lasts 16 hours). The 94 genes were selected from the full data set described in [49], on the basis of the association with cell cycle regulation and tumor development. We apply linear PWGC and linear CGC (using only another conditioning variable and using both IB and PB selection strategies, described in section 3). We note that the CGC approach is unfeasible in this case due to the limited number of samples. In this case, due to the limited number of samples, we do not use statistical testing assess the significance of the retrieved links. Rather, we introduce a threshold for the influence and analyze the pattern as the threshold is varied. In figure 8, results are reported as a function of the number of links found by PWGC npairwise (which increases as the threshold is decreased); we plot (1) the number of links found by PGC npartial, (2) the number of links found by PGC and not by PWGC nnovel, which are thus a signature of synergy, (3) the percentage of pairwise links that can be explained as direct or indirect causalities of the PGC pattern (thus being consistent with the partial causality pattern), found using the matrix B1 to detect the indirect links, which correspond to circuits like the one described in figure 6(a) and (4) the number of causality links found by PWGC and not consistent with PWGC nunexplained, corresponding to redundancy. The two curves refer to the two selection strategies for partial conditioning.
Download figure:
Standard image High-resolution imageThe low number of samples here allowed us to use only one conditioning variable, and therefore, to analyze the circuits of only three variables; a closely related analysis [50] has been proposed to study how a gene modulates the interaction between two other genes. Conversely, the true underlying gene regulatory network being unknown, we cannot assess the performances of the algorithms in terms of correctly detected links.
We note that both nnovel and nunexplained assume relatively large values; hence, both redundancy and synergy characterize this data set. The PB selection strategy yields slightly higher values of nnovel and nunexplained, emerging as a better discriminator of synergy and redundancy than IB. A comparison with the fully conditioned approach is not possible in this case. On the other hand, as far as the search for synergetic effects is concerned, we find that the synergetic interactions found by PCGC with the two strategies are not coinciding; indeed, only 10% of all synergetic interactions are found by both strategies. This suggests that, when searching for suppressors, several sets of conditioning variables should be used in CGC to explore more possible alternative pathways, especially when there is not a priori information on the network structure.
7. Conclusions
In this paper, we considered the inference, from time series data, of the information flow between subsystems of a complex network, which is an important problem in medicine and biology. In particular, we analyzed the effects that synergy and redundancy induce on the Granger causal analysis of time series.
Concerning synergy, we have shown that the search for synergetic contributions in the information flow is equivalent to the search for suppressors, i.e., variables that improve the predictive validity of another variable. Pairwise analysis fails to verify this kind of variable. However, fully multivariate GC solves this problem: conditioning on suppressor variables leads to nonzero GC. In cases when the number of samples is low, we have shown that PCGC is a valuable option, provided that the selection strategy to choose the conditioning variables, succeeds in picking the suppressors. In this paper, we considered two different strategies: choosing the most informative variables for the candidate driver node, or choosing the nodes with the highest pairwise influence to the target. From the several examples analyzed here, we have shown that the first strategy is viable in the presence of redundancy, whereas when the interaction pattern has a tree-like structure, the latter is preferable. However, the issue of selecting variables for PCGC deserves further attention, as it corresponds to the search for suppressor variables and correspondingly of synergetic effects. We have also provided evidence, for the first time, that synergetic effects are present in an epileptic brain in the preictal condition (just prior to seizure).
We have then shown that CGC approaches do not work well in the presence of redundancy. To handle redundancy, we propose to split the pairwise links into two subsets: those that correspond to indirect connections of the multivariate GC, and those that do not. The links that are not explained, as indirect connections are ascribed to redundancy effects and they are merged to those from CGC to provide the full causality pattern in the system. We have applied this approach to a genetic data set from the HeLa culture, and found that the underlying gene regulatory networks are characterized by both redundancy and synergy, hence these approaches are also promising w.r.t. the reverse engineering of gene regulatory networks.
In conclusion, we observe that the problem of inferring reliable estimates of the causal interactions in real dynamical complex systems, when limited a priori information is available, remains a major theoretical challenge. Recently, the most important results in this direction are related to the use of data-driven approaches, such as GC and transfer entropy. In this work, we have shown that in the presence of redundancy and synergy, combining the results from the pairwise and conditioned approaches may lead to more effective analyses.