Revealing networks from dynamics: an introduction

What can we learn from the collective dynamics of a complex network about its interaction topology? Taking the perspective from nonlinear dynamics, we briefly review recent progress on how to infer structural connectivity (direct interactions) from accessing the dynamics of the units. Potential applications range from interaction networks in physics, to chemical and metabolic reactions, protein and gene regulatory networks as well as neural circuits in biology and electric power grids or wireless sensor networks in engineering. Moreover, we briefly mention some standard ways of inferring effective or functional connectivity.

2.1 Systems of ordinary differential equations describing networks . . 1 Where are you linked?
1.1 Relating connection topology of a network to its dynamics Networks are everywhere. And most of them are dynamic. From networks of biochemical reactions that regulate the metabolism in the cells of our bodies to the neuronal circuits in our brains, from social ties forming networks of our friendships and collaborators to the power grids and internet that provide huge amounts of electric energy and information every second. All of these systems form networks of units that interact to yield complex, collective forms of functions -and all are crucial to our everyday life.
The interaction topology of complex networks strongly impact their collective dynamics and thus the function of entire systems. For many network dynamical systems, for instance in physics and biology, the dynamics of individual units becomes more and more accessible whereas their intricate web of interactions remains uncertain or even often largely unknown. As an example, many constituents of protein and gene interaction networks are well characterized but how they interact and which pathways are relevant for suitable functioning is not well understood [1,2]. In neuroscience, the number of units from which one can simultaneously measure neuronal activity is increasing rapidly from a few units to hundreds of them [3]. Still, identifying the synaptic connections of a neuronal circuit by anatomical methods is mostly restricted to individual synapses and computer-aided reconstruction based on optical methods for more than two cells becomes available only since very recently [4,5,6]. In social networks, even in simple settings such as basic games, pairwise interactions are roughly understood, but often both the (temporally varying) interaction network and its collective consequences remain a riddle. Thus, reconstructing the structure of interaction networks from (only) the collection of local dynamical data constitutes a current open challenge, with applications across the natural and social sciences as well as engineering.
Network dynamics: forward vs. inverse problem. Yet, the vast majority of research on network dynamics has focused on the "forward direction" of modeling and asked what types of collective dynamics emerge from a network of given topology. Researchers from the natural sciences and engineering systematically address the reverse questions -how to control a network or, more generally, how to design networks for a desired dynamics and thus function and how to infer topology from dynamics -now at a rapidly increasing pace: In particular in engineering, the design of systems for a specific function always was core [7] and with the systems becoming more complex, considering recurrently interacting networks becomes indispensable. Conversely, complex systems in physics and biology require a view on networked systems to understand how complex emergent phenomena contribute to (possibly optimal) system's dynamics or function [8,9,10,11,12]. Finally, also how one could perhaps redesign collective dynamics of networks, e.g., gene and protein networks [13,14] poses challenges of frontier research.  The inverse problem of how to infer interaction topology from network dynamics constitutes the main topic of this review.

Aims and options of network inference
What do we aim to infer? Before addressing any inference problem, we have to clarify what we actually want to find out about the networked system and which level of detail we are interested in, see Table 1. For instance, we may want to know effective or functional connectivity not caring about individual interactions per se but only about statistical dependencies which the entire set of interactions yield between pairs of units through the collective network dynamics. Inter-unit correlations and various information-theoretic measures have been devised to solve such problems. These often neglect the temporal dynamics as they use temporal averages or statistical distributions of observables. Moreover, effective connectivity may depend on the current collective state and function of a given system and thus the same physical network may display different effective topologies for different functions or in different states. We may alternatively want to know structural connectivity, i.e. which unit directly Figure 1: Diverse dynamic impact of structural links on effective connectivity. (a) Structural links may not be detectable by certain correlation measures due to strong independent driving signals (e.g. noise). For example, strong inputs along two links (from within or outside the network, marked by arrows) may decorrelate the dynamics of the two nodes (gray disks) although they directly interact. (b) Common input may create effective link without the structural link present. For example, common input from a third node (open circle) may create effective connectivity (dashed line) between two nodes (gray disks) that are not directly connected. In both examples, (a) and (b), it may depend on the entire collective state of the network (and the external inputs it may receive) whether or not an effective connection is detected.
interacts with which other units (and how)?
In this review, we focus on these direct physical interactions and address various levels of detail. We also briefly present basic methods to infer different types of effective connectivity . By construction, such a review cannot be complete, also because the field is currently developing at a breathtaking pace. We therefore select specific reconstruction methods from those that are commonly used, appear promising for the future of the field, or have been recently developed and form the basis of current research.
What can we learn about the connections of a network from accessing the units' dynamics? Mathematically, inferring the connectivity constitutes a high-dimensional inverse problem and various methods have been devised to address this question. Every inference method starts from different levels of pre-knowledge about the system and has its own aims what to reconstruct, cf. Table 1. We may be interested in whether the interactions are directed or undirected, in whether or not a link exists, in the sign of the interaction dynamics, the type of links (e.g., electric vs. chemical synapses, diffusive vs. nonlinear coupling), the strengths and the temporal and spatial scales of interactions etc.
Structural connectivity may be very different from effective or functional connectivity ( Figure 1). On the one hand, high correlations may exist between two units that are not directly connected but only influenced by each other via a third unit they both directly interact with. In general, such indirect interactions may be induced not only by one third node, but equally by the entire collective dynamics of a network. On the other hand, even a strong direct interaction between two units does not necessarily mean that their dynamics is highly correlated; correlations could be submerged, e.g., by external noise or recurrent inputs the two units receive, e.g., from two distinct other parts of the network, or even from outside of it.
In general, effective and structural connectivity are related in a highly nontrivial way. In fact, a number of counter-intuitive phenomena have been observed in various systems. For instance, recent work on coupled oscillator networks highlight that under certain conditions noise may aid to reconstruct structural connectivity from correlation-based effective connectivity [15]; and one may detect small world features in the functional connectivity even if it is derived from randomly connected dynamical systems without any specific small-world features [16].
Where do we start from? It is important to clarify which knowledge about the networked system we presuppose. Is the collective dynamics known to be simple such as converging to a fixed point of concentrations in biochemical networks or to a limit cycle in coupled oscillator networks? Or do we expect more complex, irregular, and perhaps unpredictable, chaotic or random types of dynamics? Can we change the dynamics of units by interfering with the system or can we just observe? Does the research question require a global analysis in state space or do we focus on a specific dynamical state where local analysis may suffice? We should answer these questions, among others, before using or developing any inference method -to achieve reconstruction at the level we need with best quality and minimal efforts, both experimentally and computationally.
Here we review recently developed approaches to inferring structural connectivity of a network from accessing its collective dynamics. The presented approaches assume various levels of pre-knowledge about the system and may or may not require the observer to interfere with the system. The article is structured as follows. We first clarify in Section 2 what we mean by a network dynamical system, taking the view of continuous-time dynamics described by coupled ordinary differential equations. In Sections 3, 4, and 5, we explain three principally different classes of methods to obtain information about the structure of the network topology from dynamical quantities. Section 3 illustrates basic theoretical approaches based on measuring and evaluating the response of a network to external perturbations or driving. As the response depends on both the external driving signal and the interaction topology of the network, sufficiently many driving-response experiments yield information about the entire network topology. A second class of methods sets up a model copy of the original system (Section 4) and adapts the coupling matrix of the model so as to synchronize its dynamics with the original. If synchronization is achieved, the topology obtained for the model is taken as a proxy for the original. Finally, a set of direct methods (Section 5) rely on measuring time series (or features thereof), evaluating temporal derivatives, and exploiting smoothness assumptions to find solutions to an optimization problem given by the restrictions by data.
We briefly comment on technical issues (Section 6) and mention basic core approaches for identifying effective connectivity (Section 7). These approaches rely on simple linear correlation, maximum entropy principles and related statistical inference methods. Finally, we provide an outlook (Section 8) where we highlight current challenges, point out aspects sometimes overlooked and show potential research paths towards uncovering more of the topology of the networks that surround us. This review on purpose is short but self-contained. It is intended for researchers mainly in the physical and biological sciences and engineering and assumes basic knowledge of dynamical systems and probability theory. Let's start with the details.

Systems of ordinary differential equations describing networks
Throughout the main part of this review, we consider networks of units assumed to be described by systems of ordinary differential equations. Discrete time maps coupled to a network are not discussed but typically approaches similar to those presented here are viable in slightly modified form. We discuss specific issues for systems of pulse-coupled units, such as spiking neurons, in section 5. These are formally hybrid systems, i.e. mixtures of continuous-time and discrete time systems.
Assuming that interactions occur between pairs of coupled units, a generic network dynamical systems is given by the state of the i-th unit at time t ∈ R, and the functions f i : R D → R D and g ij : R D × R D → R D mediate intrinsic and interaction dynamics of the Ddimensional units, respectively. The term I i (t) represents a vector of external driving signals (possibly random) and ξ i (t) symbolically represents external noise. Finally, the J ij define the interaction topology, in the simplest setting in terms of the adjacency matrix A such that J ij = A ij = 1 if there is a direct physical interaction from unit j to i and J ij = A ij = 0 otherwise. In general, units' interaction may be higher order, e.g. requiring terms like h ijk (x i , x j , x k ) added to the right hand side of (1). For instance in gene and protein interaction networks, a protein (the so-called transcription factor, say unit k) is directly influencing the rate of transcription of a gene (say, unit j) to a DNA segment (say unit i). We do not treat such terms here explicitly. Their relevance for network dynamical systems is discussed in [17]. For some methods to infer effective connectivity, the functional form of (1) does not directly enter the inference argument, other methods can be extended to include higher order terms explicitly. We comment on higher order terms where appropriate (cf. also Sect. 5).

Rescaling, simplifications, and common interactions
Some a priori technical issues: Considering (1) as our basic level of description, in case the dimension D i of the local dynamical system i depends on unit i, we would just consider the maximal occurring dimension D = max i∈{1,...,N } D i and for each unit ignore the D − D i dummy variables. This is done purely for notational simplification. Note further that in general, the quantity J ij is a D × D matrix of coupling strengths J dd ′ ij , but that for many paradigmatic model systems, only one of these elements is non-zero, i.e.
ij is sufficient to describe the influence of unit j on i. Given a dynamical system of the form (1), the right hand side is determined up to N × D additive constants C d i and one overall multiplicative constant C 0 . Shifting the state variables x d i to co-moving frames and rescaling time enables us to set C d i = 0 for all i and all d and C 0 = 1 without loss of generality. For simplicity of presentation, we furthermore describe the methods below as if they were for networks of coupled one-dimensional units only. Often, different dimensions may be treated independently during reconstruction.
We thus take the coupled equations as our basic network characterization we start from, where now the variables x i and x j and the functions f i and g ij are treated as real scalars. Common Interaction Functions. Only non-trivial coupling terms that are not identically zero, for the relevant domain of arguments actually contribute to interactions, so we assume all coupling functions g ij for which J ij = 0 to be non-trivial in this sense. A broad range of systems exhibits diffusive coupling such that which is a special case of coupling functions that depend on state differences (e.g. phase differences for coupled oscillators) only. The simplest non-trivial form of interaction is linear coupling, and does not depend on the dynamical variable of the unit it influences.
We have now set the stage to dive into specific inference approaches.

Driving-response experiments
One idea of inferring network topology is to measure the collective response of a network dynamical system to driving by external signals. For instance, if a system exhibits a stable collective state (e.g. fixed point or periodic orbit, cf. Fig. 2), it will relax back to that state after a transient input (pulse), if the latter is not too strong (such as to not kick the system out of the basin of attraction of the stable state) [18,19,20,21,22]. Here, the input signal (driving) effectively changes the initial condition of the system, leaving the system features (given by the local and interaction functions and their parameters) the same. Similarly, a stable state of the system will generically move in state space (and keep qualitatively the same stability properties) in response to sufficiently weak, temporally constant external perturbations. These kinds of perturbations effectively creates a non-identical but similar systems with different parameters determined by the driving signal. Both the relaxation dynamics and the shift in state space in general depend not only on the external signal (which unit is perturbed, how and how strongly, i.e. known quantities), but also on the (unknown) interaction topology of the network. Each collective response of the system to an external perturbation yields a restriction on the network topology such that sufficiently many drivingresponse experiments may reveal the entire topology. In this section, we present the main ideas underlying several related driving-response approaches.

Restrictions from local fixed point analysis
Recent efforts for developing methods to identify a network's topology have emerged from the need to understand biological, in particular gene regulatory networks [1,23,24,25]. Such networks consist of genes and proteins that interact with each other within a cell [26,27]. These interactions in particular control (indirectly) the rates at which genes are transcribed into mRNA and the regulatory features emerge via the interactions and generally do not follow from the single-gene level [28]. Gene regulatory networks and other reaction networks, e.g., in chemistry or population dynamics, are often modeled by nonlinear differential equationṡ describing the rate of change of the expected numbers (or concentrations) z i (t) of entities (e.g. genes, atoms and molecules or individual organisms) at time t in terms of their dependence on the number of other entities [27]. 1 Here p is a set of parameters andĨ represents a set of external perturbations directed to the entities. The z i are typically positive real numbers but mathematically there is no restriction for them to also be negative. Under the assumption that such a system is close to a steady state z * = (z * 1 , z * 2 , . . . , z * N ), a stable fixed point where f i (z 1 , z 2 , . . . , z N ; p,Ĩ) = 0 for all i, the dynamics for perturbations i from steady state concentrations of such nonlinear models may be approximated to first order in the x i bẏ where the local Jacobian J ij = (∂f i /∂z j )(z * ) is the effective interaction matrix given the steady state and I i (t) is assumed to be an external perturbation linearly coupling to deviations of variables x i . We remark that when deriving (8) from (7) we implicitly use the relation ignoring the second and higher order terms.

Driving the system constantly to move a stable state (changing parameters)
The method presented now effectively moves the parameters and thus in particular the fixed points of a system. Originally intended for gene and protein interaction networks, Gardner and coworkers [1,29] have explicated that reconstruction is possible, at least for small networks. We here discuss two approaches of using the general form (8) to reconstruct the interaction network, i.e. the J ij . The first is based on driving the system (8) by sufficiently small, temporally constant driving forces I i (t) = I i,m to new fixed points x * i,m = 0 close to the original one x * i = 0. As the original fixed point was structurally stable, the new fixed point will generically exist and have qualitatively the same stability properties if the I i,m are small enough (Fig. 2a). This is because solutions of (7), in particular fixed point solutions, typically vary continuously with changing parameters (here: changing I i,m from zero) and there is no bifurcation close to the parameters yielding a generic stable fixed point y * i . The observed values assumed at each new steady state together with the (known) driving signals provide information about the interaction topology J ij .
one for each experiment m and for each unit i.
After an arbitrary number M of experiments, eq. (10) in matrix form becomes where J ∈ R N ×N represents the connectivity among the units, X ∈ R N ×M the steady state values with X i,m = x * i,m , and Y ∈ R N ×M the perturbations Y i,m = −I i,m that we assume to be known. This matrix equation restricts the connectivity J given the measured data X and the input perturbations I. The matrix equations constraining the full network topology J can be split into N equations one for each input connectivity J i := (J i,1 , . . . , J i,N ) ∈ R 1×N of a unit i. Thus, the same set of data X restrict all the sets of units providing interactions to i ∈ {1, . . . , N } but the data Y i = (Y i,1 , . . . , Y i,M ) T are unit dependent. This reduction to N individual equations also admits to split the computational effort for solving them. The problem becomes trivially parallelizable because for different i, these restrictions (12) are independent in the sense that reconstruction of the input coupling strengths to each unit i can be performed without taking care of input coupling strengths of other units k = i.

Observe relaxation to stable state after transient driving (changing initial conditions)
A second approach assumes that the quantities y i (and thus the x i ) are perturbed such that at time t 0 we have x i (t 0 ) = x  (11), but now with the Y i,m = −x i,m being estimated of the derivativesẋ i (t m ). We remark that these derivatives may be estimates in various ways, each of them requiring a resolution of the measured data on sufficiently small time scales, cf. section 5.1. In this second approach, the different times the transient dynamics is measured replaces the different driving experiments in the first approach. Finally, both approaches can of course be combined, several experiments evaluated at several time points, again yielding the same form of restrictions (11).

Solving the restricting equations
How can we finally obtain the coupling elements J ij and thus the interaction network? In principle, solving the matrix equation (11) yields the interaction matrix J as a function of the known data X and Y . One may naively assume that it is directly solvable once the number of experiments equals the number of units in the network, M = N . However, this problem can be numerically ill-conditioned [30] for large N , such that the result is not reliable. In addition, as also stressed in [1], the results may be sensitive to noise in the measured data.
A way to overcome this problem is by performing (many) more experiments than nodes available, M ≫ N , thus over-determining (11). In general, due to noise and measurement inaccuracies, this yields the system (12) to be inconsistent such that there is no vector J i that satisfies all constraints. It will still be possible to find a robust approximationĴ i that minimizes the error between the predicted dynamics J i X and the actual dynamics Y i for a given node. Specifically, this error function may be modeled as where the distance measure d(v, w) = v −w p p between two vectors v, w ∈ R M is commonly defined in terms of the pth power of an L p -norm with p ≥ 1, due to its convexity properties. This guarantees that any local minimum of E i is also a global minimum [31]. Particularly, the L 2 -minimization criterion, is of great importance because it has an analytical solution for its extremum. Equating to zero the derivatives of the error function with respect to the matrix elements, ∂ ∂J ik E i Ĵ i ! = 0, yields an analytical solution (see Appendix A) to L 2 error-minimization given bŷ Evaluating such equations for all i ∈ {1, . . . , N } yields the complete reconstructed networkĴ. This mathematical form of minimum L 2 -norm solution is implemented in many mathematical packages (e.g. as the mrdivide function in Matlab [32] or the LeastSquares function in Mathematica [33]). We explicate that the obtained off-diagonal termsĴ ij serve as the best estimate (in the procedural sense using the L 2 -minimization above) for the coupling constants J ij ; at the same time, the diagonal elements J ii are not relevant for the network topology because the influence of these terms on the dynamics of unit i is physically indistinguishable from an intrinsic drive to i included in the local dynamics specified by f ( Experimentally, it is in principle possible to over-determine a system of equations by performing repeated measurements on the network until the condition M ≫ N is achieved. Nevertheless, it may often seem unsuitable for large networks due to the large number of experiments that would be required. So, if the size of the network is an issue or the number of available measurements insufficient to over-determine the system, we have M < N . Assuming that the network is sparse (i.e. that each unit is connected with a small number of others and thus many connection strengths are J ij = 0), may still yield the collection of all network links. This implies that several J ij are effectively set to zero, therefore decreasing the amount of unknown coefficients to be solved for. It leaves us with the problem of finding which links are actually present and which are not. We present two related options to do so.

Sparse solution with a bounded connectivity per unit
If an upper bound for the number of links K i < M is known, we may assume only M < N experiments are available and we have a rough idea of how many nodes (at most) are connected to a particular node. In particular, assume that the number of incoming connections for node i is given by at most K i < M [1]. This assumption shifts the system (12) from having more unknowns than constraints, M < N , to have more constraints than unknowns, M > K i , therefore, implicitly over-determining the system. It means that out of the N nodes present in the network only K i of them are chosen to be part of the system of equations for node i. Such assumption may be done when there is some a priori information about the network's connectivity and dynamics. Specifically, the system of equations (12) may in principle be rewritten as where B i ∈ R 1×Ki is the reduced connectivity vector for node i that contains the coupling strengths for the selected nodes and Z i ∈ R Ki×M is a matrix that contains the states of such nodes. If we knew which K i of the N − 1 possible connections actually contributed, we could use eq. (16) to solve (17) using L 2minimization yieldingB whereB i is the best approximation to B i . Yet, it so far remains unclear which of the N Ki = N ! Ki!(N −Ki)! possible combinations of incoming connections is best suited for reproducing the dynamics of i. In principle,B i may be calculated for each combination of K i genes, and the combination that yields the smallest value of the L 2 norm B i 2 in (18) may be chosen as the best estimate for J i . The efficiency of such a procedure relies in number K i of interactions per node. Hence, choosing a proper K i aiming to recover the largest number of real interactions with the smallest number of false positives is a key factor to achieve a successful topological reconstruction.

Maximizing the sparseness of the connectivity matrix
If M < N and the K i are unknown, cannot be estimated or there are too many of them (making the combinatorial search practically impossible) maximizing the number of zero entries in J, (i.e. minimizing the K i and thus maximizing sparseness) may be a way to solve (12). This approach is particularly useful if the only a priori knowledge about the network's connectivity is some sparsity.
For general matrix equations where yields an analytic solution whereΣ = Σ T ΣΣ T −1 that parametrizes the space of all solutions through the vector c ∈ R n×1 with c i = 0 for i ∈ {1, ..., r} and r = Rank(A).
In our reconstruction problem, we are seeking to maximize the number of zero entries in J based on solving the restricting equations (12) for J i as we solved (19) for y. Consider the transpose of (12). The analogous SVD-based solution then reads where U ∈ R M×M , V ∈ R N ×N ,Σ ∈ R N ×M and c ∈ R N ×1 is a vector of remaining coefficients parametrizing the solution space. Thus, the set of all possible solutions for J i is given by (23). The goal now is to pick the sparsest solution from this set. Therefore, eq. (23) may be posed as the overdetermined Minimizing the L 1 error yields a sparse solution [31]. However, unlike the L 2 minimization, L 1 minimization has no analytical solution, so choosing an appropriate iterative algorithm to solve it is essential. The Barrodale Roberts algorithm [34] provides a particularly fast solver that has been vastly used in the field of network reconstruction [10,12,29,35].
Remarks. The core equations (50) also provide the option to reconstruct network connectivity via maximizing sparseness of the network and there is a particular relation to what is known as compressive sensing cf., e.g. [36] .
In general, linearization of dynamical equations, e.g. linearizing in state variables close to fixed points, often well approximates nonlinear dynamics. This seems to hold for gene regulatory networks [29] as well as in models of Drosophila segmentation networks [37] and may thus be of general use across systems. For gene and protein interaction networks, often single genes are selected for perturbations in an experiment, with the danger of providing non-generic restrictions in (12). Finally, for some systems increasing the number of experiments may reduce the resulting computational costs such that this trade-in may be considered.

Driving the system's state to a fixed point
One may also infer network structure by externally driving the system to a fixed point and shifting component values of the fixed point for individual units [38,39]. As before, the differences between pairs of steady-state responses are analyzed. Let us describe the network aṡ where the A ij ∈ {0, 1} are the entries of the adjacency matrix specifying only if an interaction from j to i is present (A ij = 1) or not (A ij = 0), the g ij (x i , x j ) are the coupling functions from j ∈ {1, 2, . . . , N } to i, and I i is the driving signal applied to unit i. It was demonstrated by Yu and Parlitz [38] that under driving signals with sufficiently large gain factor θ ∈ R and Lipschitz continuous f i and g ij , the network may be driven to a globally stable fixed point that is arbitrarily close to a predetermined pointx := (x 1 ,x 2 , . . . ,x N ) T ∈ R N , independent of the initial conditions [38] . At such fixed point we have for all i. To understand how the network responds to changes inx, let us define hence, eq. (28) may be rewritten in terms of ∆ i as The main idea at this point is to check whether unit k couples to unit i by evaluating how x * i reacts to the shifting of x * k throughx k . Therefore, let us set and evaluate (30) at this point, yielding Shifting the same component twice tox (1) k andx (2) k , resp., fixes a reference frame and thereby yields equations characterizing the difference between responses of a given unit i to a shifted fixed point component for unit k. It results in a condition that may be rewritten as We remark that the differences [∆ ik,1 − ∆ ik,2 ] in (33) are not known but the general form (34) may be used to reveal whether they are zero, λ ik = 0, or not: Given that we are dealing with entries of the adjacency matrix, we may infer two possible outcomes from (34), whether system k is coupled to i or not. Specifically, It was also demonstrated by Yu and Parlitz [38] that ∆ i decreases with θ if f i and g ij are Lipschitz continuous. This permits to discriminate whether there is a coupling between a pair k → i. Especially, when θ is sufficiently large, the |S ik θ| values may be classified into sets I 0 and I 1 , non-coupled and coupled sets, respectively. To construct such sets, Yu and Parlitz [38] propose to: • For fixed k, organize the |S ik θ| values in an ascending order, i. e., the values should be arranged into a new series z where z k,j < z k,j+1 < . . . that defines the indexing j of the z k,j 's.
• Establish the critical values of each set. In this case, the critical values j c and j c+1 define the end and the beginning of I 0 and I 1 , respectively. Yu and Parlitz suggest to find j c by requiring the distance between any element from I 1 with respect to z i,1 to be larger than twice the size of I 0 , Finally, by performing this process on every unit, the topology of the network may be reconstructed.
Remarks: The approach relies on the feasibility of (i) perturbing the systems in a specific manner, and (ii) measuring the steady states, suggesting that it is model independent to a large extent in that it does not in principle require knowledge of local dynamics or coupling functions (yet it requires these to be Lipschitz continuous). These features may make the approach of interest under certain conditions where only little pre-knowledge about the system is available. Yet the approach requires substantial control over the system, in particular, the option of externally driving every unit (independently) constitutes a major requirement. The study [38] does not state how indirect actions are treated, for instance, how is an indirect effect from unit k via k ′ onto i distinguished from direct interactions from k to i? Possibly, driving k may indirectly affect i only weakly and this potentially second order contribution could be treated in a perturbative way.

Distributed perturbations to collective periodic dynamics
The approaches presented above (sections 3.1-3.3) required the existence of fixed points either in the original system or in the presence of sufficiently strong external driving. Yet, more complex dynamics prevails in a large range of biological, physical or artificial systems. The second most simple invariant dynamics are periodic orbits and often arise as limit cycles of coupled oscillatory units, thus asking for a generalization beyond simple fixed point approaches. Even more complex dynamics, e.g. collective chaos, is treated by a direct approach below in section 5.1. Is it possible to infer network topology from driving-response experiments also for oscillator networks? Below we positively answer this question, at the same time showing that distributed driving signals not precisely targeted to one or a few units are at least equally appropriate to infer network topology. Several theoretical model studies of coupled oscillators [40,41,42,43,44,45] have shown that the response of single units in a network to constant or periodic driving signals as well as the transient dynamics of synchronization depend on the network topology. Some recent works [43,44] helped us to understand specific quantitative influence of structural features on the response and how the network response provides some information about the structure (and the driving signal). For instance, the magnitude of responses seem to decay exponentially with distance from the driving node [43], and the coarse-scale connectivity among connected components may qualitatively determine to which degree network dynamics is coordinated globally [44]. Further developing such insights, a follow-up work [10] presents a method of reconstructing network topology from systematic measurements of network responses to temporally constant, distributed driving signals in coupled phase-oscillator networks.
The basic idea is that any network displaying a stable invariant dynamics, not just fixed points, yield a specific response to a given perturbation as a consequence of the network's topology and the perturbation itself [40,42,44] cf. Fig.  3. If the perturbations are small, the invariant set is typically qualitatively unchanged and only slightly moved in state space. Keeping track of which driving signals resulted in which responses, we can collect evidence about the interactions among units in a network. Sufficiently many repetitions of appropriate driving-response experiments then yield the network's topology.
Weakly coupled limit cycle oscillators are well-characterized by ignoring (in the long time limit) amplitude responses to coupling and by modeling them as phase-oscillators with coupling via their phase-differences only. A method to infer network topology for coupled phase oscillators with arbitrary stable, phase-locked dynamics has been presented in [10]. One key observation is that the phase differences (yet not the phases themselves) in such systems converge with time and that comparing differences of phase differences among different driving conditions yield restrictions to network topology. The network dynamics is given byφ where φ i (t) and ω i are the phase and natural frequency of oscillator i, respectively, J ij the connection strength from oscillator j to i and I i,m is a temporally constant driving signal applied to i during the experiment m. We assume that in the absence of driving, I i,m ≡ 0, the network is in a phase-locked state wherė φ j −φ i = 0 for all i, j. We remark that one, several or all units may be perturbed during each given experiment, such that driving can be arbitrarily distributed and effectively changes the frequencies of the driven oscillators. As for the approaches relying on fixed points (section 3.1), the existence of a stable periodic orbit (and thus in particular a phase-locked state) implies that sufficiently small constant perturbations yield a (only slightly moved and slightly different) stable periodic orbit. If for a given driving condition m, the dynamics becomes phase-locked, the phase differences become constant in time, ∆ ij,m (t) → ∆ * ij,m := lim t→∞ (φ j,m (t) − φ i,m (t)) because all oscillators move at the same collective frequency Figure 3: Topology revealed by driving a stable periodic orbit: As for fixed point approaches discussed above (Section 3.1 and Fig. 2), the difference vector v (red) depends on both the driving signal and the network topology such that several measurements of v under different driving conditions yields information about the topology.
Hence, if the network is perturbed by a sufficiently small driving signal, the original phase-locked state (for I i,m ≡ 0) is slightly moved such that |∆ * ij,m − ∆ * ij,0 | ≪ 1 and there is a small difference between the perturbed and nonperturbed collective frequencies Ω m and Ω 0 . Defining the effective frequency difference D i,m := Ω m − Ω 0 − I i,m of oscillator i, and approximating the arbitrarily nonlinear coupling functions g ij by a first order Taylor expansion around ∆ * ij,0 we obtain where θ j,m := φ j,m − φ j,0 is the phase shift andĴ is the Laplacian matrix of the network given byĴ Now, identifying the matrices X i,m = D i,m and Y i,m = θ i,m we have reduced the problem of identifying network topology using distributed perturbations in systems of limit cycle oscillators to solving the same linear algebraic equation (11). As remarked in previous sections, several experiments are necessary in order to perform the reconstruction ofĴ. Therefore, from repeated measurements for different conditions it is possible to rewrite eq. (39) in the form (11) in terms of Y = D ∈ R N ×M and X = Θ ∈ R N ×M representing the differences between a b Figure 4: Revealing network topologies from response dynamics for directed networks of phase-locked Kuramoto oscillators. Variables evolve Now, for sufficiently many experiments, i.e. M ≥ N , the reconstruction may be accomplished in principle but may be ill-conditioned numerically. In addition, for large N the many required experiments may not be practical. if this condition is not fulfilled, methods like the setting a maximum connectivity per system or maximizing the sparseness of the connectivity matrix (section 3.2.2), may be applied in order to make the problem of retrievingĴ an overdetermined problem.
As shown in [10], we may compare how accurate our prediction is by defining , and using a relative difference defined as where ∆J ij ∈ [0, 1] ∀ i, j . In addition, the quality of reconstruction Q α may be posed as the fraction of connection strengths which are assumed to be correct. Here α ≤ 1 is a constant employed to set the required accuracy for predictions and H is the required for a reconstruction with a quality level q and with a prediction accuracy α. Figure 5 illustrates these measures for random networks of phase-locked Kuramoto oscillators for several random topologies and parameters. The driving response method in principle may be applied to a broad variety of problems involving stable dynamics. A model analogous to eq. (39) could be inferred as long as the systems may be linearized around a stable state, allowing to retrieve the topology from the network responses as above. Yet, there may be practical problems. For instance, even for perturbations induced by constant driving signals, the invariant solution resulting from perturbations to more complex periodic orbits or other stable invariant sets may be describable only by time dependent quantities (and not, e.g. temporally constant phase differences), limiting the approach suggested above to specific classes of systems.

Features and restrictions
One common advantage of the approaches presented above is that their required computational effort scales well (weaker than linearly) with system size N such that at least moderately large systems appear accessible (cf. Figure 5). At the same time, the approaches are relatively simple to realize because they do not require knowledge in higher mathematics or computational approaches beyond a basic standard.
A possible route of generalization is to combine some of the above approaches. For instance, one may first drive a system to a stable fixed point as in section 3.1 and then apply small perturbations around that new point as in section 3.3.
Yet, all these approaches require the researchers to be able to access (measure and drive) the dynamics of all units in the system. Moreover, the local dynamics as well as the (approximate) form of interactions typically need to be at least partially known. The collective dynamics suitable for the drivingresponse approaches described above also need to be simple, in fact to exhibit a stable fixed point or periodic orbit or to admit the system to be driven to such as state. Finally, the presented inference of the existence of physical interactions and their functional form [46] seems well understood for networks of phase-oscillators, where perturbations in oscillation amplitude decays on faster time scale than the relaxation of phases. It thus remains an open problem how to use a driving response approach to properly infer structural network connectivity of coupled oscillators in systems, where the amplitude degrees of freedom play a role or are even dominant. More generally, systems exhibiting more complex dynamics, such asynchronous chaotic activity, bifurcations, multistability or other prevalent features of high-dimensional, nonlinear systems, currently still prevent network reconstruction by the methods presented above.
These requirements severely restrict the range of applicability in praxis to simple, well-accessible systems only. In particular for biological systems such as neural circuits or gene interaction networks, dynamics are typically more complex, systems are large and it is still hard to implement controlled largescale driving experiments on the single-unit level. Direct methods (section 5) that do not rely on driving the system seem to offer viable directions towards reconstructing networks with such more complex dynamics as well. A currently open question of research constitutes how to exploit recorded time series from only a fraction of the units.

Copy-Synchronization: Adapting a model copy
Another way of reconstructing network topology of a given network is by adapting the topology of a second, model system, a network copy, such that it synchronizes with the original system. The idea is to update the model topology continuously until the copy system exhibits a dynamics identical to the original system; the rationale is that the final topology of the copy is likely to be the original topology [47].
Specifically, consider an (original) system of the forṁ where f i and g ij are known and assumed to be Lipschitz continuous. This original system can in principle have arbitrary dynamics. Now, let us propose a model copyẏ where y i represents the state of the copy system, I i (t) are control feedback signals and K ij (t) is current coupling strength in the test system. In order to synchronize the copy to the original system, both the feedback signals I i (t) and the coupling strengths K ij (t) evolve in time and depend on the states of the actual and the copy system. Defining the synchronization error we adapt the coupling strengths in the model copy according tȯ and where α > 0 and γ ij > 0. We remark that here the local dynamics f i (.) as well as the coupling function g ij (.) need to be known in order to set up the test copy. It was proven in [47] that under feedback signals (48) with sufficiently large α, the synchronization error e i decreases in time,ė i ≤ 0 for all i such that the two systems converge to a synchronized state. The rationale is that after synchronization, the copy topology equals that of the original network, K ij ≈ J ij , cf. Fig. 6. With minor modifications on the control signals I i this method admits to reconstruct networks and sub-networks in the presence of disturbances and modeling errors as well [47]. We remark that to the best of our knowledge, there is no guarantee that the resulting topology of the copy system actually reflects the one of the original network. In particular, symmetries might lead to disambiguities.
Further, the method based on copy synchronization is model dependent such that knowing the intrinsic and coupling functions of units is vital, as in several parts of section (3). Its applicability has been explicated for sample networks of up to N = 16 nodes, but it remains unclear how to handle large networks as of the order of N 2 links K ij need to be co-evolved in time and a bound of convergence times is currently missing. At the same time, the copy approach does not require perturbations to the original systems, so experimental access to it need not include driving access to its units. Interestingly, Yu et.al [47] highlight that the method may be useful to track changes in a network's connectivity in real time.

Direct approaches
In the previous two sections, we have introduced methods to infer network connectivity by either interfering with the system (driving response approaches, Section 3) or by setting up and synchronizing a second, model system (section 4). Both classes of methods work if certain requirements are met (in particular, the option to actively drive the system or the option to synchronize the copy with the original system, respectively). It remains a challenge to infer network topology from dynamics without such requirements.

Reconstruction by purely observing dynamics 2
Methods based on copy-synchronization (section 4) assume that the local dynamics f i as well as the interaction functions g ij in (1) are known and the g ij do not depend on the state of unit i, g ij (x j ). Inferring network connectivity, i.e. the J ij , then relies on the construction of a second, model network, with dynamics governed by Eq. 1 and network parameters J ′ ij that are tuned to that of the real network by an error minimization procedure. As noted recently [12], one may solve the same reconstruction problem with significantly reduced efforts and reduced requirements by evaluating the states and their time derivatives directly from the time series recorded from the original system. In particular, such a simple direct method [12] removes the need to set up and synchronize a second, copied, system.
The idea is as follows: If the local dynamics and the coupling functions are known, their evaluations at different times are also known from recorded time series and the only remaining unknown parameters in Eq. 1 are the coupling strengths, which are to be determined. Specifically recording a time series x i (t m ) at sufficiently closely spaced times t m ∈ R and estimating the temporal derivatives 3 of it yields the dynamics of the i-th unit given bẏ If there are M such times, m ∈ {1, . . . , M }, we have M equations of the forṁ with abbreviationsẋ i,m := x i (t m ), f i,m := f i (x i (t m )) and g ij,m := g ij x i (t m ) , x j (t m ) . Repeated evaluations of the equations of motion (49) of the system at different times t m thus comprise a simple and implicit restriction on the network topology J ij as follows: writing Y i,m =ẋ i,m − f i,m and the matrix X i = (g ij,m ) j,m ∈ R N ×M , these equations constitute the matrix equation where Y i ∈ R 1×M and J i ∈ R 1×N is the i-th row of the interaction matrix J, comprising the vector (J ij ) j∈{1,...,N } of all input coupling strengths to unit i. The main restricting equations (51) again have the same form as the generic restrictions (11) and thus may be solved analogously. In [12], Euclidean L 2norm minimization was used to infer the topology. Numerical tests show that reconstruction works well for transient as well as attractor dynamics, for simple as well as complex, chaotic units and collective states, and even in the presence of noise that substantially alters the dynamics and thus the recorded time series. Figure 7 illustrates successful reconstruction in the presence of various levels of noise.
It is clear that certain types of dynamics do not allow topology inference. For instance, if all local dynamics are identical f i ≡ f , all coupling functions are identical, g ij ≡ g, and collectively the units are identically synchronous, i.e. all in the same states at the same times, there is no information about the connection topology one can possibly obtain from (only) recording time series, because for all strongly connected topologies 4 , the collective dynamics would be identical.
Further theoretical considerations show that following the same lines of analysis as above, all parameters occurring linearly in the local dynamics of coupling functions, can also be reconstructed by the same error minimization based on eqn. (11). For instance for the (fictional) coupling function g ij (x, y) = a sin(xy) − exp(b + y − x), the parameters a and b need not be known but can be inferred as well (up to one constant prefactor for each pair (i, j) of nodes), because both a and exp(b) occur linearly in the equations of  motions (49). In many physical systems, parameters actually do occur linearly. These include, for instance, the dynamics of coupled classical electric LCR circuits and the strengths of diffusive (4) or direct linear coupling (6). Moreover, widely used model systems for networks of neurons, such as leaky integrate-andfire and quadratic integrate-and-fire neurons [50] or networks of coupled chaotic systems such as Rössler or Lorenz systems [51,52] exhibit all or at least most parameters occurring linearly or affinely. For all such systems, local dynamics f i and coupling functions g ij may not or not completely be required to be known in advance.
Such a direct inference method [12] for network topology from complex collective dynamics thus serves as a simple starting strategy for connectivity reconstruction in a broad range of networked systems.

Pulse-coupling: Reconstruction from spike patterns
Networks of spiking neurons or other pulse-coupled units are formally hybrid systems [53], because the continuous-time flow is interrupted at certain time points, where events (e.g. spike sending or reception) modify the dynamics [54,55] via discrete time maps.
To start addressing the reconstruction problem for such hybrid systems, researchers have considered one of the simplest model networks of pulse-coupled units based on leaky integrate-and-fire (LIF) models. Such network models are most broadly used in mathematical analysis and computational modeling. Each unit in such a network has one state variable, its membrane potential, roughly emulates leaky capacitive properties observed for membranes of biological neurons and is complemented with a threshold where a pulse (action potential or spike [56]) is artificially created and the membrane potential is reset. Although these models lack certain biological details, such as a natural spike generating mechanisms, they are simple enough to be studied analytically and they have been useful for furthering our understanding how the information is processed among neurons [57].
Explicitly, the membrane potential V i (t) of a LIF unit i changes as in an RC circuit (resistor and a capacitor connected in parallel) according tȯ where R i , γ i and I i are the membrane resistance, inverse membrane time constant and the external current applied to neuron i, respectively. Once the potential of a unit j crosses a threshold, V j (t) ≥ V T,j , the potential is reset to V j (t + ) = V R,j and the unit emits a pulse that it transmitted to the neuron's post-synaptic neighbors [58]. The time of this pulse sending event is remembered as the unit's m-th spike time t j,m := t. The collection of such pulses then define the actions onto post-synaptic units i such that the quantity in (52) represents the pulses that unit i receives from the rest of the network. Here, J ij and τ ij are the synaptic coupling strength and the synaptic transmission delay from unit j and i, respectively. Furthermore, δ(.) is the Dirac delta distribution modeling a potential response kernel that is much faster than all other time scales involved. The main question we address now is whether and how the network connectivity, as specified by the matrix of coupling strength J ij can be reconstructed if the pattern of spikes times (t j,m ) j∈{1,...,N },m∈Z is given? We remark that we do not assume access to the natural state variables, the membrane potentials V i (t), which may not be observable, but only to the spike times that are generated by the dynamics of these potentials. This difference constitutes the main novelty of the approaches presented in this subsection compared to those for continuous time state variables presented before.
First observe that (52) has an explicit solution [35] given by (54) if the time interval [t 0 , t) lies in between two subsequent spikes of neuron i. The first sum is taken over all neurons while the second is taken over those spikes generated by the other neurons j = i that affect i in the time interval [t 0 , t).
Based on this solution, we now present two distinct approaches to infer network connectivity, one direct and exact inference method assuming oscillatory units and one coarse approximate method based on stochastic optimization:

Direct topology inference assuming oscillatory units
Van Bussel et al. [35] assumes that all the parameters of (54), besides the synaptic couplings J ij , are known. The rationale is that delays and membrane time constants as well as a unit's equilibrium potential R i I i may be estimated beforehand. How could these coupling strengths and thus the structural network topology be inferred? If the currents I i are sufficiently large such that R i I i > V T,i the units are oscillatory such that they create spike even without recurrent inputs from the remaining network. After crossing the threshold V T,i , unit i sends a spike to its post-synaptic neighbors and is reset to a resting value V R,i , so that the unit's state at the boundaries of each inter-spike interval [t i,k−1 , t i,k ) is determined and one may take t 0 = t i,k−1 and t = t i,k . However, these threshold crossings may be induced in two manners at t = t i,k−1 , (a) by incoming excitatory spikes from other neurons such that V i (t − ) < V T,i but V i (t − ) + {j:∃m:tj,m+τij=t} J ij > V T,i or (b) by the intrinsic oscillation of the unit such that V i (t − ) = V T,i without simultaneously incoming spike(s). In both cases, the membrane potential is at reset value immediately after sending a spike. So identifying t 0 = t i,k−1 in (54), we have If the next spike is oscillation-induced (b), the membrane potential approaches its threshold value V T,i continuously such that in addition where we now identified t = t i,k in (54). Thus, each oscillation-induced spike at some t = t i,k implies a linear restriction of the form (54) for the coupling strengths J ij by equating For M different inter-spike intervals obeying (55) and (56), this provides a linear system of equations restricting the coupling matrix. However, as van Bussel et al. [35] remark, consecutive intervals may display similar patterns, such that it is often advisable to select M > N inter-spike intervals sufficiently separated in time, to minimize correlations between intervals and thus numerical inaccuracies. For each i, defining the subset of inter-spike intervals and eq. (54) may be rewritten as where J i := (J ij ) j∈{1,...,N } and X i ∈ R M×N and Y i ∈ R M×1 are given by and Repeating the same process for all units i yields the topology of the whole network. As shown in section 3.2 , the overdetermined problem, M > N , and the undetermined, M ≪ N , may be solved minimizing the L 2 -norm or maximizing the sparseness of the network, respectively. A major limitation is that the presented approach requires a large amount of prior knowledge about the system, such as the time delays between units and their time constants, among others. Given this knowledge, the approach is capable of inferring large networks of neurons as the computation time, as well as the number M of required inter-spike intervals to be evaluated scales linearly with system size [35,59].

Stochastic optimization of all systems parameters
In a complementary study, Makarov et al. [60] showed how stochastic optimization of all system parameters given the spike trains for all neurons in the network may yield a network topology roughly consistent with the actual one. The idea is to evaluate the predicted inter-spike interval durationsT i,m (p) from a test set of parameters and to optimize those parameters to reproduce a given (measured) spike train as closely as possible. Thus, minimizing the difference between the predicted and measured spike trains is vital for this method. The authors in particular applied their idea also to recordings of biological neurons.
As the eq. (54) yields transcendental relations for theT i,m (p), their estimates need to be determined numerically. Thus, finding yields the best (in Euclidean distance norm) solution for the set of parameters. Briefly, to find the minimum p * i in (63) one must explore the parameters space. This means that the solution is found through iteratively choosing random values for p i and comparing the value of (63) for consecutive iterations. By relating the changes in (63) with the changes in p i one may establish directions in which the minimum may be found by gradient descent. However, it is also advisable to change directions in the parameters space randomly. Mainly, because there may better solutions for p * i in regions of the parameters space where they are not expected to exist [61]. Makarov et al. [60] thus applied stochastic optimization for searching the minimum.
As a strong requirement, this method needs that the number of recorded spikes to be M ≫ 2N ; as noted in [60], robust regression models are more suitable to handle this kind of problems and special care with the inter-spike intervals must be taken as, e.g., spike bursts may lead to false intervals.

Features and limitations
The stochastic optimization approach provides a generic approach in finding best fitting parameters and thus here, potentially a well fitting network; yet, it is computationally demanding and simultaneously requires many recordings compared to the size of the network. In contrast, the direct approach assumes a large amount of pre-knowledge, in particular regarding the unit's parameters. In addition, by requiring to pre-process the data sets (i.e. choosing appropriate time intervals), the minimization problem is no longer required to deal with transcendental relations. This leads to a considerable increase on the computational performance and it is the key factor that makes the method suitable for large networks. We remark that still, both methods are currently not suitable in our opinion to reliably infer network topology from real recordings of spike data, because of several reasons: For instance, other network-external sources of spike generation, stochastic fluctuations due to intrinsic noise and the highly specific conditions required for reconstruction still limit the methods applicability. Finally, both approaches assume linearity of interactions, yet it is known that interactions can be nonlinear, e.g. due to dendritic spikes in response to sufficiently strong, simultaneous inputs to single dendritic branches of a neuron [62,63,64].

Technical issues
Let us briefly comment on four technical issues related to the structural inference approaches discussed above. We mention how some of them may be directly transferred to settings, where discrete time dynamics describes the system (section 6.1), discuss issues when measuring data from real, e.g. biological networks (section 6.2), remark on L 1 vs. L 2 minimization (section 6.4) and finally discuss what happens for hyper-networks, where more than two-point interactions influence the collective state (section 6.5).

Discrete time maps
If the dynamical system is described as a network of coupled discrete-time maps instead of by coupled differential equations (2), approaches similar to the ones presented above are often viable in slightly modified form. For instance, the core equation becomes (for 1-dimensional local units) where now the time t ∈ Z is an integer. Here, the direct method of subsection 5.1 is immediately applicable, even with the additional advantage that no temporal derivatives need to be estimated such that an observed time series (x i (t)) i∈{1,...,N },t∈{1,...,M} enters the inference equations without approximations. Of course, that time series may still contain substantial measurement errors that propagate into any solution of the inference problem.
considerations should be accounted for to actually use what it is measured (the data) for what one wants to know about the system, cf. section 1.2.

L 1 norm minimization as a linear program
Given the optimization problem where A ∈ R M×N , y ∈ R N and b ∈ R M , (65) may be posed as [31] min where 1 ∈ R M is a vector of ones and and denote entry-wise comparison and s is an auxiliary variable. To solve (66), one has to solve the linear program where and I ∈ R M×M and 0 ∈ R N are the identity matrix and a vector of zeroes, respectively. The advantage of posing problem (65) as (67) is that the latter can be easily solved in a standard way by implementing any solver for linear programs (e.g. the linprog function in MATLAB [32]).

L 2 vs. L 1 norm minimization
The need to choose a minimization scheme may turn the reconstruction problem into a great challenge, because different schemes may yield different solutions, thus forcing us to discern which scheme is best suited for our purposes in specific reconstruction problems. As illustrated in [1,10,60,68,59,35,29] these differences between minimizers may be exploited in certain situations. For instance, the L 2 minimizer finds the closest solution in the L 2 -norm and moreover has an analytic solution. Yet, given the nature of the minimizers (check [68]), an L 1 minimizer is suited for finding particular sparse solutions and it is more robust to outliers than L 2 , so it might be seen as more useful for applications. However, minimizing an L 1 norm is computationally more costly compared to the L 2 and it may have more than one solution [31]. We note that p 2 ≤ p 1 for any vector p ∈ R N

Hypernetworks: beyond two-point interactions
We have explicitly excluded networks with more than two-unit interactions from our general mathematical description (2) or dynamical networks with temporally changing connections. Those may occur, for instance in networks of computers, or other communication networks of engineering, where inputs from several units that give input to the same other unit are nonlinearly processed (for instance, multiplied) to change the latter units' state. Similarly, non-additive dendritic interactions in neurons [63,20], where two simultaneously received synaptic inputs in close spatial proximity initiate so-called dendritic spikes and thereby a nonlinearity [62], naturally imply three-point interactions.
We remark that direct three-unit and higher order interactions (i.e. threeterm and higher order products such as x d i x d ′ j x d" k where i, j, k are mutually different) are omitted in (1) because they refer to dynamical systems on hypernetworks, thus going beyond the scope of this review. Such third order and higher order terms are not covered by (1), firstly, because the notation stays much simpler without them, but secondly and more importantly, because there seem to be few major theoretical results on reconstructing such systems, if any, that may be or become of practical use. Yet, in social, communication and information networks, such questions may soon become of interest as 'big data' are pouring in.
A brief introduction to the dynamics of complex hypernetworks is given in [17].
Here the averages are time averages If multiple measurements are available, averages may be taken over ensembles of experiments rather than (or in addition to) temporal averaging. The obtained covariance matrix is then often thresholded, either just choosing a heuristic value or according to some measure of statistical significance (against an appropriate null hypothesis) and the resulting non-zero values interpreted either as "strength" of effective connectivity (weighted connectivity matrix) or just as existence (adjacency matrix) of a functional link between units. Sometimes, the value of a threshold is systematically varied and changes in resulting connectivity evaluated. Correlation, covariance and other linear algebra measures are commonly used, often complemented by nonlinear operations such as thresholding, to analyze, for instance fMRI or other spatio-temporal data [69]. We note that correlation measure may be (mathematically) seen in a number of different ways [70].

Entropy maximization
Entropy measures the uncertainty associated with a given probability distribution and constitutes a key quantity in information theory [71] . Given the probabilities of a set of events, the entropy measures how uncertain, on average, the occurrence of an event is; or in other words, how much information, on average, one obtains by measuring the occurrence of that event knowing the probability distribution of events. Reversely, given a collection of (observed) data points, we can choose probabilities to maximize the entropy. Such a distribution, known as a maximum-entropy probability distribution, would be the least biased distribution possible and any other would require further assumptions on the nature of the problem [72]. For a network dynamical system, i.e. systems of coupled dynamical units, we can ask what the effective interactions are such that the probability distribution that best describes the data (averages and correlations) has maximum entropy. Specifically, the principle of maximum information from Bayesian statistics postulates the "most likely" probability ρ(x) of measuring x given the same type of time series data x(t) = (x 1 (t), . . . , x N (t)) is the one maximizing the information obtained from measuring one more state y. More precisely, the goal is to find the probability ρ(x) that maximizes Shannon entropy under the constraints that the first and second moments are consistent with those estimated from the data, and By restricting ourselves to these conditions (and no others), we here focus on pairwise interactions and neglect three-point and higher order coupling that arise in hypernetworks (cf. section 6.5 and [17]). This yields (Appendix A) the probability distribution of the form x is a normalization constant and h i andĴ ij are parameters to be chosen such that (88) and (89) hold. The quantitiesĴ ij are interpreted as effective couplings between units i and j.
If the matrix of covariances between the N time series is the effective coupling matrix is given by its inverse (Appendix A) This means that the best available probability distribution (i.e. that yielding the maximum information) is given by second order effective coupling strengths determined by (but by no means identical to) the linear correlation matrix. This concept is applied to a range of different systems, in particular in biology, including gene networks [73], protein networks [74] and neural circuits [75]. We comment on an approach originally suggested by Bialek and coworkers [75,76] for revealing in how far two-point interactions characterize the coupled dynamics of neural circuits in retina. In fact, a systematic study of neural activity in the retina of larval tiger salamander and guinea pig revealed that 90% of the multi-information (which measures all correlative dependencies in a system [76]) is covered by pairwise correlations only [75]. The authors took this finding as a sign that pairwise interactions well characterize the full network dynamics and that higher order interactions may be neglected. Specifically, they temporally discretized neural responses into "1" or "0" states depending on whether a neuron was or was not active during each considered time interval of generically 20 ms. Thus, the state of the entire (observed) network at each time interval is given by an N −dimensional word composed of the binary components of the N neurons. As sometimes done for effective connectivity, researchers often tend to go beyond what Bialek and coworkers concluded and interpret the effective coupling strengths (75) as actual physical interactions of experimentally observed units. However, it is typically not clear how correlative dependencies yield information about direct physical interactions and such that such interpretations are not justified without substantial further knowledge about the system.

Further in finding effective connectivity
We would like to emphasize that there is a multitude of additional approaches for finding effective or functional connectivities. For instance, a range of methods are based on information theoretic measures such as mutual information [77], transfer entropy [78] and Granger causality [79,80] and extensions thereof. In addition, there are various methods relying on Bayesian statistics or explicit or implicit modeling or extended regression such as used in generalized linear models [81]. In particular in biological sciences, such statistical methods have been used early on even at times not many or not very reliable data were available, see for instance [82] for an early study regarding effective connectivity in neural circuits.
There are a number of open challenges regarding both precision of such methods and the interpretation of the respective results. For instance, functional magnetic resonance imaging (fMRI) experiments of brain areas may rely on differences in blood oxygen level (so-called BOLD signals) as observables but often aims to relate actual dependencies in neural spiking activity. The reasoning here is that the cell metabolism and thus the blood oxygen consumption in a local region (typically one cubic millimeter) of the brain is larger the more action potentials per time are generated by the (10 4 − 10 6 ) neurons in that region so that such approaches are not undisputed [83]. Moreover, the terms effective connectivity, functional connectivity and structural or anatomical connectivity are sometimes not well defined, used in different meanings across studies. There are even overlapping definitions of non-structural forms of connectivity, e.g. for functional connectivity, effective connectivity etc. Here we did not delve into historic waters and used the term "effective connectivity" for all forms of connectivity that is not structural. Sometimes, effective connectivity is even taken as an indication for structural connectivity of physical interactions. For instance, distinguishing direct interactions from indirect influences may be an important issues (cf. Fig. 1) that is not yet fully clarified [84].
Finally, we mention that for neural circuits [85] have devised a statistical method to find the couplings J ij that optimize the likelihood that a class of integrate and fire models generates the spike trains observed in experiments. This statistical method assumes the same model class as the approach for inferring structural connectivity presented above (section 5.2) and its goal indeed is finding the (most likely) structural connectivity.
As a conclusive warning, we remark that in particular different methods exit for obtaining effective connectivity given the same data set; the results each provide information about specific features of the system: exactly those defined by the method. Some might almost coincide with others, some might be congruent with and some may well be contradicting a given structural connectivity (cf. [86]).

Conclusion and Open Questions
This review focuses on how structural connectivity of networks may be inferred from dynamical features of the networks' nodes. It is on purpose on an introductory level and (given that the area of network reconstruction is rapidly growing simultaneously in different fields, from gene and neural networks to engineering systems) by construction only highlights some selected approaches, most of which based on a perspective of considering the collective nonlinear network dynamics. We provide basic approaches about finding effective or functional connectivities of a network from time series as a brief complement and to get a taste for essential differences in perspective. One main distinction between approaches for identifying qualitatively different types of connectivity is that structural inference, aiming to reconstruct real physical interactions, take into account the time evolution of a system. In contrast, finding effective connectivity is often based on a stationarity assumption and uses distributions of states, neglecting all or parts of their temporal evolution. Relating observable, possibly effective and physical connectivity, is at the heart of current interdisciplinary research [15,87]. As also mentioned in the Introduction and briefly discussed in section 7, functional vs. structural inference if often not well distinguished and, in particular in early publications, the qualitative difference in approach was sometimes not even mentioned.
The methods and approaches presented in this review aim to tell whether or not and how strongly units in a network directly interact with each other. This is in distinction to the entire field of system's identification [7] where the aim is to identify the rules underlying a dynamical system from time series and predict its (future) dynamics based on this identification. Systems identification for multi-dimensional systems, and thus in particular for large networks, seems intractably hard because even if the "real" system is almost (but not entirely) perfectly reconstructed, predicting their future dynamics can be impossible due to chaos (sensitive dependence on initial conditions), exponentially many different collective states and uncontrolled external influences, with all these factors becoming typical for multi-dimensional complex systems. At the same time, as in part illustrated in this review Figure 5, learning the existence of strengths of interactions only (and not the precise form of dynamics) may well be successful for much larger systems. We thus speculate that novel methods complementing those of systems identification may yield further insights into the interaction networks of various complex systems.
A number of key issues are not discussed in this review but still are sometimes pressing for progress research, in particular with respect to applications to real world settings. We list a few: Answers to all these questions will specifically restrict or enhance the space of optional networks consistent with recorded data. From an abstract point of view, these modify the form and dimensionality of the set of all possible networks and points to a direction that still addresses network dynamics as an inverse problem but goes beyond network reconstruction. Given all restrictions, perhaps we can design or control a network to robustly exhibit a specific functionality. Network design and control form two additional branches in the theory of network dynamical systems, with currently highly active research, from biological sciences to engineering [11,96,97,98,99]. Such approaches might soon be extremely influential and thought-provoking, when, e.g., the neural and biochemical networks in our bodies as well as our infrastructure networks surrounding us can not only be reconstructed, but even controlled and specifically engineered. We recommend a view point by Freeman Dyson for a very practical taste [100].

A Multiple Linear Regression and L 2 −Norm Minimization
Multiple linear regression is a widely-used statistical tool for predicting values of a set of variables depending on one independent set of variables. Its main purpose is to infer a linear relationship between them. Thus, the relationship between sets is assumed to have the form where y ∈ R 1×M are the values for a dependent variable, β ∈ R 1×K the column vector of unknown linear coefficients, x ∈ R K×M is the set of values for the K−independent variables and ε = (ε 1 , . . . , ε M ) ∈ R 1×M are random errors ε i , i ∈ {1, . . . , M }. In addition, the errors ε i are assumed to be independent random variables distributed according to a Gaussian distribution with mean µ = 0 and constant variance σ 2 [101]. The question is how to estimate β by someβ such that the difference between the predicted and real values, βx and y, is minimized? Using L 2 minimization, the problem formally becomeŝ also known as the linear least squares method [101]. The local extremum of the L 2 norm appearing in the right hand side of (77) implies ∀i ∈ {1, . . . , K} : such that of β according to the linear least squares method [101].

B Singular Value Decomposition and L 1 −Norm Minimization
Singular Value Decomposition (SVD) is regarded as an important tool for statisticians due to its broad variety of applications (reduced rank regression, polar decomposition, image compression, among others [102]). Formally, from the fundamental theorem of linear algebra, any rectangular matrix A ∈ R m×n may be decomposed into the product of three matrices as where U ∈ R m×m and V ∈ R n×n are unitary matrices with their columns referred to as left and right singular vectors, respectively, and Σ ∈ R m×n is a rectangular diagonal matrix containing the singular values [103]. This decomposition is known as the Singular Value Decomposition of matrix A.
The SVD properties are useful in finding the set of all possible solutions to an under-determined system of equations, because for every under-determined system where b ∈ R m×1 , we can use SVD (80) to rewrite A such that solving (81) for y yields the particular solution whereΣ ∈ R n×m is the pseudo-inverse of Σ and is defined as Equation (82) defines a particular solution from the set of all possible solutions. The general solution to equation (81) is given by our particular solution plus a linear combination of vectors in the null-space of A, i.e., where c ∈ R n×1 and c i = 0 for i ∈ {1, ..., r} and r = Rank(A). The latter ensures that the linear combinations are made only with vectors that span the null-space.
In this case, the problem consists in maximizing the number of zero entries in y by choosing the c j ∀ j > r properly. This may be achieved by solving the overdetermined system (84) with M equations and M −r unknowns. Specifically, it has been illustrated in [10,29,35] that by solving arg min employing the Barrowdale and Roberts algorithm [34], a solution having a low number (possibly the least number) of non-zero values, consistent with the restricting equations, is often recovered. However, there seem to not be a general proof of this observation [31]. In the network reconstruction context, it has been shown that optimizing systems like (85) (where y and b are replaced by the unit's connectivity J T i and network constraints Y T i ) yields the actual network topology when the network is sparse (even if there are less linear constraints than units in the network).

C Estimating maximum entropy parameters.
Maximizing the entropy we derive first the functional form of ρ(x) and second its parameters under the constraints that the non-negative function ρ ≥ 0 is normalized and the first moment and second moment are consistent with those estimated from data of N time scalar series x i (t). The covariance matrix is defined via the data as We maximize the entropy under these 1 + N + N (N − 1)/2 constraints using a Lagrange function where we drop constants as they do not influence the location of a maximum.
Here a, h i and 1 2 γ ij are the Lagrange multipliers to be determined. Computing the first derivatives and equating them to zero − ∂L ∂ρ(y) = ln (ρ(y)) + 1 + a + yields the (unique local) maximum entropy of the form where we use the abbreviationsĴ := (γ ij ) ij , z := x +Ĵ −1 h is an affine function of x and is independent of x.
In summary, this exact computation yields the probabilities of the form where Z −1 = exp (−1 − a).
To estimate the parameters a, h i andĴ ij we make the approximation that the data x i form a continuous set such that we can approximate sums by integrals. We then first observe that due to normalization (87). This implies A = (| det(Ĵ)|/(2π)) N/2 , and thus yields the parameter a as a function of h andĴ. Similarly, fixing the averages (88) yields (approximately) for all i and thus h as a function ofĴ. Finally, the equations fixing the second moments can be evaluated using subjected to With the transformation x = z −Ĵ −1 h, eqn. (99) becomes where the last equations follows from (94), (97) and (98). Thus, using the definition of the correlation matrix (90) and assuming that temporal and statistical averages are the same, we have that the matrix J = C −1 (103) of effective coupling strengths between the variables is given by the inverse of the correlation matrix of the data.