Network structure of cascading neural systems predicts stimulus propagation and recovery

Objective. Many neural systems display spontaneous, spatiotemporal patterns of neural activity that are crucial for information processing. While these cascading patterns presumably arise from the underlying network of synaptic connections between neurons, the precise contribution of the network’s local and global connectivity to these patterns and information processing remains largely unknown. Approach. Here, we demonstrate how network structure supports information processing through network dynamics in empirical and simulated spiking neurons using mathematical tools from linear systems theory, network control theory, and information theory. Main results. In particular, we show that activity, and the information that it contains, travels through cycles in real and simulated networks. Significance. Broadly, our results demonstrate how cascading neural networks could contribute to cognitive faculties that require lasting activation of neuronal patterns, such as working memory or attention.


Introduction
A central question in neuroscience is how connections between neurons determine patterns of neurophysiological activity that support organism function. Networks of neurons receive incoming stimuli and perform computations to shape cognition and behavior, such as the visual recognition of faces in regulating social behavior [1]. While many studies laud the ultimate goal of determining how the brain's network structure supports information processing [2,3], it remains challenging to empirically study the direct interactions between neural dynamics, connectivity, and computation. Indeed, neural connections and their underlying computational function have often been inferred through neural dynamics, and formal studies probing mechanistic relations among the three components have remained largely theoretical [4-6].
One characteristic empirical feature of many neural systems is cascading dynamics, in which neurons display spontaneous bursts of activity. While these bursts may seem arbitrary, they actually comprise stochastic cascades that follow spatiotemporal patterns of activity [7]. These cortical cascading dynamics have been well characterized in the empirical literature using a range of methods in vitro [8,9], in vivo [10-15], and ex vivo [16] in a variety of organisms, including humans. In a complementary line of theoretical work, these neural systems have been hypothesized to operate within a regime that maximizes information transmission [8,17], information storage [7], computational power [18], and dynamic range [19-21]. However, often left implicit in these analyses is the structure of the networks underlying such dynamics and how that structure may constrain those dynamics. Relatively recent empirical data show evidence of specific patterns of cortical connectivity. Cortical neurons are often strongly, bidirectionally connected to each other [22-24], and form higher-order network motifs in clusters of neurons [25-27], which in turn group into communities of neurons. By forming cyclical network motifs, neurons can temporally extend activity, thereby supporting short-term information storage [28-31] and computation [32-34]. These features of network structure have yet to be linked to the cascading dynamics and computations that are supported by those same circuits.
Here, we address the gap in knowledge between network structure, cascading dynamics, and information retrieval through a series of analytical and numerical analyses on simulated and empirical data. We first show that network structure constrains system memory through sustained activity by framing spike propagation as state transitions in a Markov chain [35]. We then apply linear systems theory to predict distributions of cascade duration in the stochastic dynamics of simulated and empirical spiking neural networks. We find that cascades flow through cycles in the underlying network, which have been widely observed in experiments, contributing to long tails in distributions of cascade duration. Finally, we use mutual information to probe the relations among network structure, cascade duration, and the information maintained in a network in four commonly studied generative graph models. Moreover, our method can accommodate networks that are both non-critical [36-39] and critical, including those that show avalanche behavior, characterized by power-law distributions of cascade size (i.e. the number of neurons that spike in a cascade) and duration. Collectively, our findings show that the network topology reported extensively in the empirical literature can produce complex cascading dynamics through which a network can support the lasting activation of a cluster of neurons, which in turn allows for the discrimination of stimulus patterns implicated in working memory [40-42].

Mathematical framework 2.1 Network formulation
We begin with the stipulation of a network as well as a dynamical process that occurs atop the network. We formalize the notion of a network as a directed graph G = (V, E) in which neurons are represented as nodes V = {1, …, n} and neuron-to-neuron connections are represented as edges E ⊆ V × V (figure 1(A)). The weighted and directed adjacency matrix A = [a_ij] thus encodes the edge weight a_ij from neuron j to neuron i (figure 1(B)).

Stochastic McCulloch-Pitts neuron
To model neuronal cascades, we next stipulate a stochastic, discrete-time version of the McCulloch-Pitts neuron [43]. In the McCulloch-Pitts model, a neuron i receives inputs scaled by the weights of its edges and sums the scaled inputs, a_i · y, to produce an output spike y_i(t) via an activation function (figure 1(C)). Here, the activation function is a random Bernoulli process whose probability p is the sum of the scaled inputs. The sum a_i · y is bounded by 0 and 1 such that p = min(1, max(0, a_i · y)). The network starts at some non-random initial state y(0), which can also be interpreted as a stimulus received at t = 0.
The state of an n-neuron network is a binary vector y(t) ∈ {0, 1}^n in which each element indicates whether a neuron fired at time t. The state evolves as

y_i(t + 1) = B(min(1, max(0, a_i · y(t)))),   (1)

where B(p) is a Bernoulli process with probability p, and a_i is the ith row vector of A.
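As a concrete illustration, the update rule in equation (1) can be sketched in a few lines of Python. This is our own minimal sketch of the model as described above, not the authors' code; the function names and the termination cap are arbitrary choices.

```python
import numpy as np

def step(A, y, rng):
    """One update of the stochastic McCulloch-Pitts network:
    neuron i fires at t+1 with probability min(1, max(0, a_i . y(t)))."""
    p = np.clip(A @ y, 0.0, 1.0)
    return (rng.random(len(y)) < p).astype(int)

def simulate_cascade(A, y0, max_steps=1000, seed=0):
    """Run a cascade from initial state y0 until the network reaches
    the absorbing zero state; return its duration in time steps."""
    rng = np.random.default_rng(seed)
    y = np.array(y0, dtype=int)
    duration = 0
    while y.any() and duration < max_steps:
        y = step(A, y, rng)
        duration += 1
    return duration
```

For example, a network with no edges (A = 0) terminates in a single step, whereas a network of pure self-loops with weight 1 never terminates and runs until the cap.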

Markov chain formulation
The model that we consider can be represented as a Markov chain with states s_i ∈ {0, 1}^n representing all possible patterns of spikes in the network, and with state s_1 = 0 representing the zero state. The column vector p(t) = [p_1(t); ⋯; p_{2^n}(t)] = [P(y(t) = s_1); ⋯; P(y(t) = s_{2^n})] represents the probability that the network exists in each state s_i at time t. The transition matrix T governs the evolution

p(t) = T p(t − 1),   (2)

where each entry T_lk = P(y(t) = s_l | y(t − 1) = s_k) represents the transition probability from state k to state l. See S1 Methods for details regarding the computation of the matrix T and S1 Result for numerical validation. With the transition matrix, we can compute the fundamental matrix of the Markov chain [35].
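For small networks, the transition matrix can be enumerated directly. The sketch below is our own construction (the paper's exact procedure is in its S1 Methods, which we do not reproduce); it assumes, per the model above, that neurons fire independently given the previous state.

```python
import itertools
import numpy as np

def transition_matrix(A):
    """Build the 2^n-by-2^n transition matrix T with
    T[l, k] = P(y(t) = s_l | y(t-1) = s_k), where the states s_i
    enumerate all binary spike patterns (s_1 = all-zeros is first)."""
    n = A.shape[0]
    states = [np.array(s) for s in itertools.product([0, 1], repeat=n)]
    T = np.zeros((2 ** n, 2 ** n))
    for k, s_k in enumerate(states):
        p = np.clip(A @ s_k, 0.0, 1.0)  # firing probability of each neuron
        for l, s_l in enumerate(states):
            # independent Bernoulli outcomes: p_i if neuron i fires, else 1 - p_i
            T[l, k] = np.prod(np.where(s_l == 1, p, 1.0 - p))
    return T
```

Each column of T sums to one, and the zero state is absorbing (its column places all mass on itself), which is what makes cascade termination an absorption event.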

Estimation as a linear dynamical system
Because computing the Markov chain is intractable for large network sizes, we instead estimate the process stated in equation (1) using a linear dynamical system with the same parameters A. Specifically, the average activity generated by the stochastic model can be written as x(t) = E[y(t)]. Given equal initial states x(0) = y(0), and given that ∀i ∈ V: ∑_j a_ij ≤ 1 and a_ij ≥ 0, it is straightforward to show that this average network state obeys

x(t + 1) = A x(t),   (3)

(see Methods for a formal proof and S2 Result for numerical validation). Equation (3) offers a natural intuition: the average behavior of the stochastic model follows linear dynamics and evolves exponentially as a function of time, x(t) = A^t x(0). Such a relationship allows the application of rich mathematical principles of linear dynamical systems to describe the average stochastic dynamics of the model.
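This equivalence is easy to check numerically in the spirit of the paper's S2 Result. The sketch below is our own: when the weights are nonnegative and every row sums to at most 1, the clipping inside the Bernoulli probability never binds, so the Monte Carlo mean of y(t) should match A^t y(0). The example matrix, trial count, and seed are arbitrary choices.

```python
import numpy as np

def mean_state(A, y0, t, trials=20000, seed=1):
    """Monte Carlo estimate of E[y(t)] under the stochastic model."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    total = np.zeros(n)
    for _ in range(trials):
        y = np.array(y0, dtype=float)
        for _ in range(t):
            y = (rng.random(n) < np.clip(A @ y, 0.0, 1.0)).astype(float)
        total += y
    return total / trials

# Nonnegative weights with row sums <= 1, so equation (3) holds exactly
A = np.array([[0.0, 0.4, 0.3],
              [0.5, 0.0, 0.2],
              [0.3, 0.3, 0.0]])
y0 = [1, 0, 0]
linear_prediction = np.linalg.matrix_power(A, 3) @ y0  # x(3) = A^3 x(0)
```

The Monte Carlo mean converges to the linear prediction at the usual 1/sqrt(trials) rate, which is why the linear system is a faithful description of the average stochastic dynamics.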

Network structure constrains cascade duration
As the first step towards uncovering relations between structure and cascading dynamics, we provide mathematical relationships between network topology and cascade duration. First, the Markov representation above makes explicit the relationship between the network A and stimulus propagation and discrimination. The process stated in equation (1) determines a unique map from adjacency matrix A to transition matrix T. Given an initial distribution of states, i.e. the stimulus patterns, p(0) = [p_1(0); p_2(0); ⋯], the fraction of cascades that have terminated, thereby entering the absorbing state [35], by time t is simply given by the first entry of p(t). Conversely, the probability that a cascade is alive at time t is given by P(alive, t) = 1 − p_1(t).
While the Markov representation gives an exact formulation of the stochastic dynamics, the space complexity of the transition matrix is O(2^n), making its computation intractable in empirical data with hundreds of neurons. Thus, to more generally describe cascade behavior, we use intuitions grounded in the theory of linear dynamical systems. We can decompose the weight matrix A into eigenvalues and eigenvectors to identify the elementary modes of activity propagation. Using the dominant eigenvalue λ_1 to identify the constraint on the dominant propagation of activity, we can estimate non-linear, stochastic behavior with a linear system. The dominant eigenvalue λ_1 is defined as the eigenvalue of A with the maximum absolute value. The dominant eigenvalue λ_1 scales the dominant eigenvector v_1, which constrains the most persistent mode, or vector, of activity propagation [21,44,45], thus quantifying the decay of activity in the network.
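To make the role of λ_1 concrete, the following sketch (our own construction; the network size, trial counts, and seeds are arbitrary choices, not the paper's parameters) rescales one random weight matrix to two different dominant eigenvalues and compares mean cascade durations under the stochastic model:

```python
import numpy as np

def dominant_eigenvalue(A):
    """lambda_1: the eigenvalue of A with maximum absolute value."""
    return np.max(np.abs(np.linalg.eigvals(A)))

def mean_duration(A, trials=2000, max_steps=500, seed=2):
    """Mean cascade duration from stimulating node 0, under the
    stochastic McCulloch-Pitts update used throughout."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    durations = []
    for _ in range(trials):
        y = np.zeros(n)
        y[0] = 1.0
        d = 0
        while y.any() and d < max_steps:
            y = (rng.random(n) < np.clip(A @ y, 0.0, 1.0)).astype(float)
            d += 1
        durations.append(d)
    return float(np.mean(durations))

rng = np.random.default_rng(0)
W = rng.random((20, 20))
A_short = 0.50 * W / dominant_eigenvalue(W)  # rescaled so lambda_1 = 0.50
A_long = 0.95 * W / dominant_eigenvalue(W)   # rescaled so lambda_1 = 0.95
```

Under this scaling, the λ_1 = 0.95 network sustains markedly longer cascades than the λ_1 = 0.50 network, mirroring the monotonic relation between τ′ and λ_1 reported below.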
To numerically demonstrate the utility of the metric λ_1 in explaining cascade duration (figure 2(A)), we simulated cascades on 104 networks with 2^8 nodes for 10^3 time steps (see Methods and Supplementary Information (stacks.iop.org/JNE/17/056045/mmedia) for network parameters). Using maximum likelihood estimation (MLE) [46,47], we fit a truncated power law p(x) ∼ x^(−α) e^(−x/τ) to distributions of cascade duration [48] and computed τ′ = min(d_max, τ), bounded by the maximum duration d_max (figure 2(B)). (The truncated power law fits the simulated and empirical data better than a power law; see S9 Result.) Intuitively, the metric τ′ captures the temporal scale on which activity can propagate in a network. We found that τ′ is monotonically correlated with λ_1, with a Spearman's correlation coefficient ρ of 0.93 (p ≈ 0; N = 104), and that α has a mean of 2.0 ± 0.14 (standard error; figure 2(C)) [49]. Notably, these relations can inform how one would tune the network A to produce heavy-tailed distributions of cascade duration.
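The fitting step can be illustrated with a simpler relative of the paper's estimator. The sketch below is our own and fits a pure power law via the standard continuous MLE (Hill-type) formula; the paper's truncated power law x^(−α) e^(−x/τ) adds a cutoff term that requires numerical optimization, which we omit here.

```python
import numpy as np

def powerlaw_alpha_mle(x, xmin=1.0):
    """Continuous MLE for the exponent of a pure power law
    p(x) ~ x^(-alpha) for x >= xmin:
        alpha_hat = 1 + n / sum(ln(x_i / xmin))."""
    x = np.asarray(x, dtype=float)
    x = x[x >= xmin]
    return 1.0 + len(x) / np.sum(np.log(x / xmin))

# Synthetic check: Pareto samples with known exponent alpha = 2.5
rng = np.random.default_rng(0)
samples = 1.0 + rng.pareto(1.5, size=10000)  # shape a gives alpha = a + 1
```

On synthetic power-law data, the estimator recovers the true exponent to within sampling error, which is the basic sanity check behind MLE fits of cascade-duration distributions.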
Finally, we empirically tested our predictions in 25 multielectrode array (MEA) recordings of spiking neurons in the mouse somatosensory cortex [50] and found similar correlations between network structure and dynamics (figures 2(C) and (D)). In the recordings, we binned the spikes into 5 ms bins and used MLE to fit a truncated power law and compute τ′. To derive λ_1, we calculated an effective connectivity matrix from each recording using first-order vector autoregression (VAR) [51,52]. We found that τ′ is monotonically correlated with λ_1, as reflected in a Spearman's ρ of 0.69 (p = 1.8 × 10^−4; N = 25; figure 2(D)). Moreover, we can simulate stochastic cascades on the empirically derived networks and find a significant positive correlation between τ′ and λ_1, with a Spearman's ρ of 0.68 (p = 2.6 × 10^−4; N = 25; figure 2(C)). With a mean α of 2.3 ± 0.1, these recordings range in their proximity to criticality (see Supplementary Information for their exponent relations), yet their dynamics are all well described by their network structures. Altogether, these results demonstrate the dependence of the temporal scale of activity propagation on the network structure of neural systems.

Local network structures: cycles
Having demonstrated in the previous section that cascade duration can be predicted from network structure, we next turn to a deeper examination of which specific features of a network's topology and geometry can support a heavy-tailed distribution of cascade duration. Note that we use the phrase network topology to indicate the arrangement of binary edges, and the phrase network geometry to indicate the distribution of edge weights [53]. The two candidate features that we consider are (i) the presence of cycles and (ii) the strength of connections in cycles. We will study these features through a rewiring process on an initial set of edges.
We begin by noting that cycles support temporally extended cascades. Given a single initial stimulus or spontaneous spike, a cascade can have a duration greater than the number of nodes in the graph if and only if there exists at least one cycle in the network. We demonstrate this simple intuition with an acyclic 3-node network and a cyclic 3-node network, where each edge in both networks has a weight of 0.5 (figure 3(A)). In simulations of 10^4 cascades, we found that the acyclic network produces a maximum cascade duration of 3 time steps, as expected. In contrast, using the same number of simulations on the cyclic network, we found a much greater maximum cascade duration of 13 time steps.
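This acyclic-versus-cyclic comparison is easy to reproduce. The sketch below is our own version of the figure 3(A) setup (the adjacency matrices follow the weight-0.5 description above; the trial count, cap, and seed are our choices):

```python
import numpy as np

def max_duration(A, trials=10000, cap=100, seed=3):
    """Longest cascade observed from stimulating node 0, under the
    stochastic McCulloch-Pitts update."""
    rng = np.random.default_rng(seed)
    n = A.shape[0]
    best = 0
    for _ in range(trials):
        y = np.zeros(n)
        y[0] = 1.0
        d = 0
        while y.any() and d < cap:
            y = (rng.random(n) < np.clip(A @ y, 0.0, 1.0)).astype(float)
            d += 1
        best = max(best, d)
    return best

# A[i, j] is the weight from neuron j to neuron i, all weights 0.5
acyclic = np.array([[0.0, 0.0, 0.0],   # chain 0 -> 1 -> 2
                    [0.5, 0.0, 0.0],
                    [0.0, 0.5, 0.0]])
cyclic = np.array([[0.0, 0.0, 0.5],    # cycle 0 -> 1 -> 2 -> 0
                   [0.5, 0.0, 0.0],
                   [0.0, 0.5, 0.0]])
```

The acyclic chain can never outlast its 3 nodes, while the 3-cycle routinely does, matching the if-and-only-if argument above.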
Next, we show that cascade duration scales monotonically with the prevalence of cycles in a network as measured by cycle density, which we define as the number of simple cycles divided by the number of connected edges (figure 3(B)). To study the effect of cycle density, we begin with a 10-node directed acyclic graph and randomly rewire each edge with probability p to a different target node. The directed acyclic graph has the maximum number of edges; that is, the weight matrix is an upper triangular matrix without the diagonal entries. By sweeping over rewiring probabilities p = 0.0, 0.1, 0.2, …, 1.0, we generated networks with different numbers of simple cycles, but with the same number of edges and the same edge weights of 1/n. For each p, we simulated 10^4 cascades with a maximum duration of 10^4, and we measured the slope of the linear tail of the distribution on a log-log plot. In these simulations, we found that as a network is rewired to contain more cycles, the average cascade duration increases (Pearson's correlation coefficient r = 0.82, p = 1.0835 × 10^−27; N = 30; figure 3(B)). These examples illustrate the more general rule that networks containing cycles can support longer cascades and can extend the tail of the distribution of cascade duration. Importantly, while structural cycles have been experimentally observed [22-24], we here empirically validate that activity can actually propagate through cycles. A key potential constraint on cyclical activity propagation is a large refractory period, which can impede such activity even if cycles are structurally present [54]. Hence, using the same 25 MEA recordings of spiking cortical neurons as employed previously, we measured the extent of cyclical activity by quantifying the occurrence of n-cyclic spikes, a phenomenon which occurs in a cascade when a neuron spikes again n time bins after its previous spike. We found that, on average, 1-, 2-, 3-, and 4-cyclic spikes occur 14.5 ± 3.1 times per cascade (with an average of (9.3 × 10^4) ± (6.8 × 10^3) cascades for 25 recordings; standard errors; figures 3(C) and (D)). With a 5 ms bin width, these cyclical activity patterns are within biophysical limits [55]. Collectively, these results suggest that cyclical activity propagation is not impeded by refractory periods and indeed occurs frequently in living neuronal systems.

Local network structures: connection strength
We now turn to a consideration of the distribution of edge weights. To maximize the specificity of our inferences and to generally build our intuition, we initially constrained ourselves to simple networks that contain only a small cycle (a 2-node cycle) or that also contain one relatively larger cycle (a 4-node cycle; see Supplementary Information). We probed the role of weight distributions in the dynamics of the network by placing the strongest weights on edges in one cycle and the weakest weights on edges not in that cycle. Specifically, in both the 2-node and 4-node cycle networks, for each simulation we took the strong weights initially placed on the cycle and redistributed some of their weight Δw to randomly chosen edges that are not part of the original cycle (i.e. w_strong,new := w_strong,old − Δw and w_weak,new := w_weak,old + Δw; figure 3(E)). On these new networks, we simulated the stochastic model. We found that as the weight on the original cycle is continuously redistributed away from the initial cycle and throughout the network, we observe fewer and fewer cascades of long duration (figures 3(F) and (G)).
Across empirical studies [8, 11, 12, 14-16, 37, 56, 57], the distributions of avalanche duration have been described by power law functions, whose exponent is known as the lifetime; typical values vary from −1.0 to −2.6. We seek to show how cycle density and edge weights in cycles together explain the topological and geometric differences in the networks underlying the various distributions of cascade duration. As we redistributed edge weight more uniformly in the networks, we found that the mean duration of terminated cascades increases (figures 3(F) and (G)). Furthermore, as we redistributed weight away from the uniform geometry, continuously increasing the range of edge weights, we again observed more and more cascades of long duration. These observations underscore the tight coupling between the range of edge weights and the heavy-tailed nature of the distribution of cascade duration.
Lastly, we seek to determine whether the contribution of the distribution of edge weights along cycles to cascade duration is captured by eigenvalue analysis. Towards this goal, we employed the same perturbative numerical experiments on the networks. Specifically, we found that as the weights of the original cycle are redistributed evenly to the alternate cycle (and vice versa), the mean duration of cascades increases monotonically with the average of the eigenvalues of the network (Spearman's rank correlation coefficients ρ = 0.99, p = 2.4 × 10^−53 and r = 0.46, p = 9.0 × 10^−4, respectively; N = 30; figure 3(H)). Because the dominant eigenvalues of the networks in these simulations are all equal to 1, the average of the eigenvalues provides a more descriptive estimate of activity propagation. Thus, this result demonstrates how network geometry can more subtly constrain cascade duration by determining the strength of non-dominant eigenmodes.

Node-specific dynamics
Even within a single network architecture, the range of cascade dynamics can vary depending on the nodes that are stimulated, either spontaneously as the initial state of a cascade or exogenously through input. Further, cascades follow precise activity patterns that are stable for hours [9]. Thus, we now consider the role of the stimulus pattern in cascade dynamics. We extend our eigenvalue analysis to estimate the role of a stimulus pattern in stochastic cascade dynamics by calculating the magnitude of the eigenprojection of the stimulus pattern. Because the average dynamics are explained by linear systems theory, we then use network control theory to more accurately predict how stimulation of individual nodes alters the dynamics of cascades.
The eigenprojection of the stimulus pattern. As a natural extension of the dominant eigenvalue analysis, we first tested whether the magnitude of the eigenprojection of the stimulus pattern could predict cascade dynamics. Given a stimulus y(0), the eigendecomposition of the weight matrix A into A = PDP^−1 yields c = P^−1 y(0) as the coefficients of the eigenmode excitation of y(0). The components of c determine how much the stimulus y(0) projects onto the eigenvectors of A, which describe the modes of average activity propagation through the network A. Then, as a predictor of mean duration, we can compute the 1-norm, or the sum of absolute values, of the eigenprojection of the stimulus pattern scaled by the corresponding eigenvalues λ,

‖c ⊙ λ‖_1 = ∑_i |c_i λ_i|.   (5)

We numerically tested the eigenprojection metric by simulating cascades on a 100-node, weighted random network. The mean duration of cascades generated from the stimulation of a single node was significantly positively correlated with the magnitude of the eigenprojection (Pearson's correlation coefficient r = 0.34, p = 4.5 × 10^−4). To determine the generalizability of these findings, we expanded our simulation set to include 30 random instantiations of networks with the same parameters. In this broader dataset, we found that the Pearson's correlation coefficient was highly variable (median r = 0.23; figure 4(A)). Thus, eigenvalue analysis only weakly estimates the role of a stimulus pattern in cascade dynamics.
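The eigenprojection metric of equation (5) reduces to a few lines of linear algebra. This sketch is our own reading of the metric as described above (the function name is ours), computing c = P^−1 y(0) and summing |c_i λ_i|:

```python
import numpy as np

def eigenprojection_magnitude(A, y0):
    """1-norm of the stimulus's eigenmode coefficients scaled by their
    eigenvalues: sum_i |c_i * lambda_i| with c = P^{-1} y(0)."""
    lam, P = np.linalg.eig(A)                     # A = P diag(lam) P^{-1}
    c = np.linalg.solve(P, np.asarray(y0, dtype=complex))
    return float(np.sum(np.abs(c * lam)))
```

For a diagonal A the eigenvectors are the standard basis, so stimulating node i simply returns |a_ii|, which is a convenient sanity check on the pairing of coefficients with eigenvalues.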
Network control theory. To more accurately predict the role of a stimulus pattern in cascade dynamics, we adopt the recently developed metrics of average and modal controllability from network control theory [58]. We hypothesized that these metrics, previously applied to large-scale brain networks [59,60], predict cascade duration, since network control necessitates activity. In the same set of simulations reported above, we compared the mean cascade duration to the finite average controllability of each node i, defined as the trace of the finite-horizon controllability Gramian

W_i = ∑_{t=0}^{T} A^t e_i e_i^⊤ (A^⊤)^t,

where e_i is the ith canonical basis vector. Intuitively, average controllability is the magnitude of the impulse response of the system when stimulating a node. We observed that the mean cascade duration and finite average controllability were significantly positively correlated (Pearson's correlation coefficient r = 0.79, p = 2.7 × 10^−22). In contrast, modal controllability was not strongly correlated with mean cascade duration (Pearson's correlation coefficient r = −0.12, p = 0.24; see Methods for the mathematical definition). Intuitively, modal controllability is a heuristic to determine how well a node produces activity patterns that other nodes cannot easily produce. To determine the generalizability of these findings, we expanded our simulation set to include 30 random instantiations of networks with the same parameters. In this broader dataset, we observed consistent effects (median Pearson's correlation coefficients r = 0.74 and r = −0.27 for finite average controllability and modal controllability, respectively; figure 4(A)). In comparing the predictions from linear control theory with the predictions from eigendecomposition, we note that finite average controllability is consistently more strongly correlated with mean cascade duration than the magnitude of the eigenprojection.
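Since trace(W_i) = ∑_t ‖A^t e_i‖², the finite average controllability of a node can be accumulated by iterating the impulse response, without forming the Gramian. This is our own sketch; the paper's exact horizon and any normalization are specified in its Methods, so the default below is an arbitrary choice.

```python
import numpy as np

def finite_average_controllability(A, i, horizon=50):
    """trace of the finite-horizon controllability Gramian for an input
    at node i: sum over t = 0..horizon of || A^t e_i ||^2."""
    n = A.shape[0]
    x = np.zeros(n)
    x[i] = 1.0          # impulse at node i
    total = 0.0
    for _ in range(horizon + 1):
        total += float(x @ x)   # squared norm of the response at time t
        x = A @ x
    return total
```

With no connections the metric is exactly 1 (only the impulse itself), and for a self-loop of weight 0.5 it approaches the geometric sum 1/(1 − 0.25) = 4/3, matching the impulse-response intuition above.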
Interestingly, networks with the same topological parameters as above, but with a bimodal distribution of weights, show even stronger correlations between network control statistics and cascade dynamics (figure 4(B)). Such a weight distribution reduces variance in the stochastic process, which intuitively can serve to strengthen the correlation. We observed that the mean cascade duration and finite average controllability were significantly positively correlated (Pearson's correlation coefficient r = 0.87, p = 3.2 × 10^−32). Modal controllability became strongly negatively correlated with mean cascade duration (Pearson's correlation coefficient r = −0.50, p = 9.2 × 10^−8). Again, to determine the generalizability of these findings, we expanded our simulation set to include 30 random instantiations of networks with the same parameters. We observed consistent effects (mean Pearson's correlation coefficients between mean cascade duration and finite average controllability, modal controllability, and magnitude of eigenprojection were r = 0.86, r = −0.54, and r = 0.59, respectively; figure 4(B)). Again we note that finite average controllability is consistently more strongly correlated with mean cascade duration than the magnitude of the eigenprojection. These simulations suggest that skewed weight distributions, identified in the previous section as network motifs that support long cascades, may strengthen the relationship between network control and network dynamics. Collectively, the results illustrate that the stimulus patterns and the network must be tailored to each other to produce the desired neural dynamics.
Finally, we tested these predictions in empirical data and found that the controllability of the initial states is correlated with cascade duration (figure 4(C)). In each recording from the same MEA data of spiking neurons in the mouse somatosensory cortex used earlier, we calculated the mean finite average controllability of all nodes active in the first 1, …, T time bins of each cascade. Mean finite average controllability is monotonically correlated with the duration of each cascade, with a median Spearman's ρ = 0.20 for T = 1 (largest p-value = 1.9 × 10^−33) and ρ = 0.26 for T = 2 (largest p-value = 7.9 × 10^−49; see Methods for the number of neurons in the empirical data). It is important to remember that the cascades are stochastic and cannot be predicted deterministically. Thus, it is notable to find any correlation between mean finite average controllability and cascade duration in empirical data.

Cascade duration allows network discriminability and stimulus recovery
If certain network topologies and stimulus patterns can produce long-lasting cascades consistent with avalanche dynamics, what can lasting cascades contribute to information processing? Intuitively, one cannot recover information about stimuli from cascades that have already terminated. For lasting cascades, network states can be discriminated and can also provide information about stimuli. Such delayed recovery of stimuli can allow the associative learning of stimuli across temporal delays [40-42]. The intuition that lasting cascades allow network discriminability can be formalized mathematically via equations (2) and (3). Then, with simulations, we test the intuition that cascade dynamics support stimulus recoverability.
In the Markov formulation, the discrimination between network states y(t) propagated from stimulus y(0) = s_i and from y(0) = s_j depends upon the similarity between the probability vectors p_i(t) = T^t s_i and p_j(t) = T^t s_j. For quickly decaying systems, p_i(t) and p_j(t) will both have a high probability of being in the zero state s_1, inherently reducing discriminability. Hence, the architecture of the network A constrains the amount of persisting activity that permits discrimination of the initial spiking distribution p(0).
Network discriminability. To analytically show the relationship between cascade duration and discriminability, we first define network discriminability as the Euclidean distance d(y_1(t), y_2(t)) between two states in n-dimensional space. Recall that E[y(t)] = x(t) for stimulus x(0) from equation (3). Then, given two stimuli, x_1(0) and x_2(0), we can calculate the expected network discriminability as the distance d(x_1(t), x_2(t)) between the expected network states at time t. Given that the dominant eigenvalue λ_1 < 1, x(t) approaches the zero vector 0 as t approaches ∞. As described in the previous sections, the decay in activity is constrained by the dominant eigenvalue of the network and by the finite average controllability of the individual node being stimulated. Thus, the rate at which both x_1(t) and x_2(t) decay to 0 determines the rate at which d(x_1(t), x_2(t)) approaches d(0, 0), where discriminability between two network states is zero.
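Because the expected states follow x(t) = A^t x(0), the expected discriminability of two stimuli is just the norm of A^t applied to their difference. This sketch is our own restatement of that definition (the function name is ours):

```python
import numpy as np

def discriminability(A, x1, x2, t):
    """Euclidean distance between the expected network states of two
    stimuli after t steps under the linear dynamics x(t) = A^t x(0)."""
    At = np.linalg.matrix_power(A, t)
    return float(np.linalg.norm(At @ np.asarray(x1) - At @ np.asarray(x2)))
```

For a network with dominant eigenvalue λ_1 < 1 the distance shrinks geometrically in t; for instance, uniform self-loops of weight 0.5 halve the distance at every step.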
Stimulus recovery. To numerically show the relationship between cascade duration and stimulus recoverability, we first define stimulus recoverability as the mutual information I(S; Y_t) between stimulus patterns s ∈ S and network states y ∈ Y_t at time t (see Methods for details and figures 5(A)-(D) for an intuitive schematic). Similar to discriminability, the mutual information between stimuli and network states decreases with shorter cascade duration because the Shannon entropy of the network states decreases. To probe this relation formally, we simulated cascades on 100-node networks from 4 different graph topologies with 30 instantiations of each graph type. Consistent with our intuition, we observe that mutual information is maintained longer when cascades last longer on average (figure 5(E)). We then quantified the decay in mutual information by performing linear regression on the mutual information as a function of time for the first 10 time steps. By calculating the Pearson's correlation coefficient between the slope of the linear regression and the mean cascade duration, we found that, for all four graph topologies, mutual information decays faster when the propagation of activity also decays faster (figure 5(F)). Collectively, these results demonstrate that stimulus recoverability is maintained longer when the cascades generated by stimulus patterns last longer.
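A plug-in estimate of I(S; Y_t) from (stimulus, state) samples captures the idea. This sketch is our own; the paper's exact estimator is specified in its Methods, and the empirical-frequency approach below is the simplest stand-in.

```python
from collections import Counter

import numpy as np

def mutual_information(pairs):
    """Plug-in estimate of I(S; Y) in bits from a list of
    (stimulus, state) samples, using empirical frequencies."""
    n = len(pairs)
    ps, py, psy = Counter(), Counter(), Counter()
    for s, y in pairs:
        ps[s] += 1
        py[y] += 1
        psy[(s, y)] += 1
    mi = 0.0
    for (s, y), c in psy.items():
        p_joint = c / n
        # log2( p(s, y) / (p(s) p(y)) ), written with counts
        mi += p_joint * np.log2(c * n / (ps[s] * py[y]))
    return mi
```

When every stimulus maps to a distinct surviving state, one bit per binary stimulus is recoverable; once all cascades collapse into the zero state, the states are independent of the stimuli and the estimate drops to zero, which is exactly the decay tracked in figure 5(E).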
To link information retention back to network structure, we assessed the relation between stimulus recoverability and the sum of the eigenvalues of each network. Using the same 100-node networks from 4 different graph topologies with 30 instantiations of each graph type, we found a significant positive correlation between the average decay rate of mutual information and the sum of eigenvalues, implying that network structure supports the retention of information within the network (Pearson's correlation coefficient r = 0.92, p = 1.8 × 10^−49; figure 5(G)). Moreover, while all networks had similar parameters, each graph type generated distinct ranges of decay rates and sums of eigenvalues, suggesting that certain graph types may be better suited for information retention than others (figure 5(H)). In particular, we observe lower decay rates of mutual information and lower sums of eigenvalues in the weighted random and modular graphs than in the random geometric and Watts-Strogatz graphs. Collectively, these findings demonstrate the interplay among network architecture, network dynamics, and information processing.

Discussion
Neural systems display strikingly rich dynamics that harbor the marks of a complex underlying network architecture among units, from the small scale of individual neurons to the large scale of columns and areas [33,61]. Cascades are a quintessential example of such dynamics, and, when they are close to a critical regime, are thought to allow for a diverse range of computations [7,8,19,20]. Yet, precisely how a neuronal network's structure supports stochastic dynamics and the computations that can arise therefrom remains unclear. Here, we seek to provide clarity using both precise analysis of mathematical formulations and statistically rigorous assessments of numerical experiments. We consider a generalized stochastic spiking model and demonstrate that the time-averaged activity of this model can be treated as a linear dynamical system. From this observation, we derive intuitions for how network structure, which estimates the patterns in synaptic interactions, constrains cascade duration. In subsequent numerical experiments and empirical validation, we use eigendecomposition and network control theory appropriate for linear dynamical systems to describe how network structure and the stimulus pattern together determine the manner in which a stimulus propagates through the network during a neural cascade. We identify strongly connected cycles, which have been widely empirically observed, as prevalent network motifs that promote long cascade duration in neuronal networks. Finally, we use mutual information to demonstrate that long-lasting cascades can serve as a mechanism to allow for temporally delayed recovery of desired patterns of stimulation. Broadly, our work blends dynamical systems theory, network control theory, information theory, and computational neuroscience to address the wide gap in the field's current understanding of the relations between architecture, dynamics, and computation.

Biophysical implications of results
The biophysical implications of the results demonstrated here are threefold. The first implication is on the scale of a local neuronal population of hundreds of neurons. On this scale, we showed that the dominant eigenvalue of a network scales the distribution of cascade durations. From a broad perspective, this result shows that the complex behavior of a neuronal population can be described by the collective pairwise interactions between neurons. The second implication is on the scale of a handful of neurons. On this scale, neurons form cycles through which spikes can propagate. This result demonstrates that the extensive empirical observation of bidirectionally connected neurons [22][23][24] is integral to propagating activity in a network. Importantly, a refractory period, which can limit cyclic activity if large enough [54], does not prohibit cyclic propagation of activity, at least at a temporal scale of 5 ms and longer. The third implication is on the scale of individual neurons. On this scale, the results of the eigenprojection and controllability analyses show that a neuron propagates activity for longer if it has a large eigenprojection magnitude or high controllability. In a neuronal population, different neurons can have different roles in performing computations depending on the topology [34]. Our results suggest that high-controllability neurons may serve as 'broadcasting neurons' which, upon activation, propagate activity for a long duration to the entire network.

Linear form of stochastic network dynamics
Because of the inherently stochastic nature of neuronal cascades, many previous studies have inferred properties of the underlying network only indirectly, through statistical methods [8,62]. An important innovation in this study was the demonstration that the time-averaged activity of the stochastic system is formally equivalent to a linear dynamical system. In real neuronal systems, dynamics are non-linear, which most likely accounts for the difference in the range of τ′ in figures 2(C) and (D). Such a linear estimation of the dynamics makes available powerful computational tools from matrix and linear systems theory, and allowed us to capitalize on recent advances in network control [58,63]. Network control theory is a formal approach to modeling, predicting, and tuning the response of a networked system to exogenous input, and has recently been applied to neural systems at both the cellular [64][65][66] and regional [59,60,67,68] scales (for a recent review, see [69]). In these previous efforts, linear dynamics have been assumed, whereas here such dynamics have been proven to be relevant for the neural system under study. Extensions of linear systems analysis, such as observability [70] and optimal control [71-73], follow immediately from this work and could provide added insights into other dynamical and computational properties of neural networks. Finally, it would be of interest to directly probe the effects of stimulation patterns defined by network controllability statistics on information transmission in vitro or on behaviors in vivo, following work in a similar vein in large-scale human neuroimaging [74][75][76][77].

Topological constraints on dynamics and computation
Proving formally that network topology affects dynamics and computation is important, but can be further complemented by providing intuitions regarding the specific features of a network topology that are most relevant, thus explaining and guiding experimental results. The identification of functionally relevant features of networked systems has a long history in molecular biology [78], with notable efforts identifying structural motifs in transcription regulation networks [79], protein-protein interaction networks [80], and cellular circuits [81], which are thought to arise spontaneously under evolutionary pressures [82]. Significantly extending prior statistical efforts in large-scale connectomes [83], here we demonstrate that specific structural motifs in the form of strongly connected cycles are topological features that support long cascade dynamics. These structural motifs form elementary units or building blocks of the network that can be combined to create connectivity architectures that produce certain dynamical behaviors [32,33]. Other theoretical studies have also identified strongly and bidirectionally connected neurons as motifs that produce long-lasting memory [31], potentially as a mechanism for attractor dynamics [4]. Importantly, empirical studies have shown that the network motifs identified here are observed in both cortical microcircuits [22][23][24][25][26][27] and macrocircuits [84]. Future work is needed to better understand the rules by which neurons connect to one another, and to determine whether those rules serve to increase the memory capacity of cortical networks. It would also be interesting in the future to determine whether higher-order structural motifs, such as those accessible to tools from algebraic topology [85,86], might also play a role in the relationships between topology, dynamics, and computation [84,87].

Information theory as a performance measure
To measure information retention, we use the mutual information between stimulus patterns and network states, though this captures only one aspect of information processing. Mutual information, originally developed to study communication channels [88], has proven to be a powerful tool for the study of information transmission in avalanching neural networks [8,17]. While previous studies of neuronal avalanches use power law statistics that suggest criticality as the theoretical link between dynamics and information processing [7,8,13,16,[18][19][20], we take a more mechanistic approach embedded in dynamical systems theory to study the relationships between network structure, dynamics, and mutual information. While there is substantial evidence that cortical networks frequently operate near a critical point [37,89], this is not always the case [36,38,39]; we therefore did not assume that all activity took the form of critical avalanches. Our more generic approach allowed us to develop a framework that applies to all cascades, critical or not. Despite its utility in studying information channels, mutual information is unlikely to be the only useful performance measure for a neural system, given the numerous purported computations of cortical networks [20,90]. Indeed, the explanation posited here for the prevalence of strongly connected neurons does not account for the information faculties of the rest of the neural system. Such considerations compel further investigation into how network structure supports other types of information processing accessible to other information theoretic measures.

Methodological considerations
A few remarks are warranted on the topic of linear dynamics in neural systems. Linear dynamics accurately predicts stochastic cascade dynamics, and its rich mathematical properties have been used to study neural dynamics in many organisms across a wide range of temporal and spatial scales [59,63,64,91]. At the neuronal level, however, neural dynamics are non-linear [92]. Efforts analytically demonstrating properties of non-linear systems are more limited [93], and thus further study is required to more thoroughly demonstrate the relationships shown here in a non-linear system.

Future directions
In closing, we note that the natural direction in which to take this work will be to consider other types of information processing and to identify network structures and neuronal dynamics of different cell types that produce complex network dynamics which in turn support such computations. Here, we demonstrate that the rich mathematical properties of linear systems can reveal insights into the complex dynamics of non-linear, non-deterministic neural systems. In the future, we can further apply this theory to cascading and other neural systems to ask questions about networks, their dynamics, and their computations. First, it would be apt to apply this framework to cortical networks derived from functional, structural, and effective connectivities, and to measure memory performance in terms of the network topology and dynamics. Second, it would be interesting to measure differences in memory performance across brain regions, and to test for relationships between topological features and performance. Third and finally, studying well-known network learning rules, such as Hebbian plasticity [94] and spike-timing dependent plasticity [95], in a dynamical systems and information theoretic framework may shed further light on the functional purpose of these rules.

Synthetic network generation
We use five different commonly studied graph models from network science in our analyses [96]. The first graph model is the Weighted Random Graph model (WRG), which is a weighted version of the canonical Erdös-Rényi model. The weight of an edge is distributed as a geometric distribution with probability of success p. Second, we use a Random Geometric model (RG) that is embedded in a unit cube, where the edge weights are equal to the inverse of the Euclidean distance between two nodes. We kept only a fraction of the shortest edges in order to achieve a desired edge density p. Third, we use a Modular Graph with 4 Communities model (MD4). Pairs of nodes within communities have an edge density of 0.8, and nodes across communities are connected to achieve a desired edge density of p. The edges of nodes in the same community and across communities are weighted according to a geometric distribution with probability of success p and 1 − p, respectively. Fourth, we use a Watts-Strogatz model (WS). The model builds a ring lattice and then uniformly rewires the network with a rewiring probability of r = 0.1, creating a small-world architecture. Fifth, we use a Hierarchical Modular Graph (HM). The model generates a directed network with m hierarchical levels of modules of size s, where connection density decays as 1/E^n.

To construct empirical networks, we bin multielectrode array (MEA) spike data into 5 ms time bins, chosen to capture synaptic transmission in the array area [32,33,37]. In a previous study (see Supplemental figure 7 in [32]), the authors use the identical multielectrode array data and show that the distribution of delays, measured with transfer entropy, falls largely within 5 ms. Then, using the ARfit software package for MATLAB [52], we perform vector autoregression (VAR) for the autoregressive (AR) model

y(t) = w + A₁y(t − 1) + … + A_p y(t − p) + ε(t),

where y(t) is a vector representing the number of spikes for each neuron at time t, w is a vector of intercept terms, ε(t) is a noise term, and the matrices A₁, …, A_p ∈ ℝ^(n×n) are the coefficient matrices of the AR model [51]. With 5 ms time bins, each term of the VAR model captures synaptic delays within 5 ms. For all empirical networks, negative edge weights are allowed to capture inhibition [97]. We set the lower and upper bounds for the model order, p_min and p_max, to 1 and 4, respectively. After selecting an optimal model order p_opt using Schwarz's Bayesian Criterion [98], we compute the effective connectivity A as the sum of the coefficient matrices over the model orders, such that the elements a_ij of A satisfy a_ij = Σ_{l=1}^{p_opt} a_{l,ij}, where a_{l,ij} is the corresponding element of A_l. The effective connectivity A is equivalent to a linear system, which in turn equals the stochastic McCulloch-Pitts neurons averaged across trials given the constraints laid out in equation (3). One advantage of using an autoregressive model to build an effective connectivity network, compared with, for example, a transfer entropy network, is that one can directly use linear systems theory to analyze the linearized dynamics of the network.
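The paper uses the ARfit package for MATLAB; as an illustrative sketch only (not the authors' code, and with toy stand-in data rather than spike counts), the same pipeline — OLS fitting of the VAR, BIC-based order selection over p ∈ [1, 4], and summing the lag matrices into an effective connectivity — can be written in plain NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n, T = 4, 800
A_true = 0.4 * np.eye(n)                 # toy ground-truth VAR(1) coefficients
y = np.zeros((T, n))                     # stand-in for binned spike counts
for t in range(1, T):
    y[t] = y[t - 1] @ A_true.T + 0.5 * rng.normal(size=n)

def fit_var(y, p):
    """OLS fit of y(t) = w + sum_l A_l y(t - l) + noise; returns (w, [A_1..A_p], residuals)."""
    T, n = y.shape
    # design matrix: intercept column plus the p lagged copies of y
    X = np.hstack([np.ones((T - p, 1))] + [y[p - l:T - l] for l in range(1, p + 1)])
    B, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    resid = y[p:] - X @ B
    A_ls = [B[1 + (l - 1) * n:1 + l * n].T for l in range(1, p + 1)]
    return B[0], A_ls, resid

def bic(resid, p, n):
    """Schwarz's Bayesian Criterion for an n-equation VAR(p)."""
    Te = resid.shape[0]
    sigma = resid.T @ resid / Te
    return np.log(np.linalg.det(sigma)) + (p * n * n + n) * np.log(Te) / Te

# select p_opt within [p_min, p_max] = [1, 4], then sum the lag matrices into A
p_opt = min(range(1, 5), key=lambda p: bic(fit_var(y, p)[2], p, n))
A_eff = sum(fit_var(y, p_opt)[1])        # effective connectivity a_ij = sum_l a_{l,ij}
```

Because the summed lag matrices are unconstrained OLS estimates, negative entries of `A_eff` appear naturally, matching the paper's allowance of negative edge weights for inhibition.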

Network analysis
We use three sets of weight distributions: a uniform distribution, a truncated normal distribution, and a bimodal distribution. In some simulations, however, we explicitly set the weights to particular values. In a uniform distribution of weights, we set all weights equal to 1 and normalize each row. In a truncated normal distribution, we set the nonzero weights to the upper half of a truncated normal distribution. A truncated normal distribution of weights has been widely observed both in a theoretical context with synaptic plasticity and in the experimental literature [99][100][101]. Lastly, we use a skewed, bimodal distribution with a few connections centered at a normal distribution with a large mean and most other connections centered at a normal distribution with a small mean. Bimodal distributions occur theoretically in the context of additive synaptic plasticity [102], and positively skewed distributions have been observed experimentally [103][104][105]. Our skewed, bimodal distributions combine these two observations by having a few strong connections. All weights are static and do not change with time t. See the Supplementary Information for network parameters.
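The three weight regimes can be sampled as follows; the exact parameters live in the paper's Supplement, so the means, standard deviations, and the 10%/90% bimodal split below are illustrative values only (the 10%/90% split matches the example in the Figure 4 caption):

```python
import numpy as np
from scipy.stats import truncnorm

rng = np.random.default_rng(1)
n_edges = 1000

# uniform: all weights set to 1 (rows of the weight matrix are normalized afterwards)
w_uniform = np.ones(n_edges)

# truncated normal: upper half of a normal, i.e. truncated at its mean
# (truncnorm's a, b are in standard-deviation units relative to loc, so a=0 cuts at loc)
mu, sd = 0.5, 0.2
w_truncnorm = truncnorm.rvs(a=0.0, b=np.inf, loc=mu, scale=sd,
                            size=n_edges, random_state=rng)

# skewed bimodal: a few strong connections, many weak ones
strong = rng.normal(0.9, 0.1, size=n_edges // 10)
weak = rng.normal(0.1, 0.1, size=n_edges - n_edges // 10)
w_bimodal = np.concatenate([strong, weak])
```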
To calculate the cycle density of a graph, we compute the number of simple cycles divided by the number of connected edges. A simple cycle is defined as the set of edges in a closed walk with no repetitions of vertices or edges, other than the starting and ending vertex. The number of simple cycles was calculated using the NetworkX software package (version 2.1) on Python (version 3.7.3).
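With NetworkX, which the paper names for this step, the computation reduces to a few lines; here on a toy directed graph containing a 2-cycle and a 3-cycle:

```python
import networkx as nx

# directed toy graph with two simple cycles: 0->1->0 and 0->1->2->0
G = nx.DiGraph([(0, 1), (1, 0), (1, 2), (2, 0)])

# simple_cycles enumerates closed walks with no repeated vertices or edges
n_cycles = len(list(nx.simple_cycles(G)))
cycle_density = n_cycles / G.number_of_edges()
print(n_cycles, cycle_density)   # -> 2 0.5
```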

Simulating the stochastic McCulloch-Pitts model
We model cascades as spikes propagating through a recurrent network (see Mathematical Framework). For computational tractability, we set a maximum time step K for the simulations. The simulated spike counts y(t) are stored as an n-by-K matrix. All simulations and calculations were run on MATLAB (version 2018a) provided by The MathWorks, Inc.
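The authors ran their simulations in MATLAB; a minimal Python sketch of such a stochastic spiking simulation, under our reading of the model (each neuron fires at time t with probability given by its summed weighted input, capped at 1), might look like:

```python
import numpy as np

def simulate_cascade(A, y0, K, rng):
    """Propagate spikes: neuron fires at t with prob min(1, a_j . y(t-1))."""
    n = A.shape[0]
    Y = np.zeros((n, K))          # n-by-K matrix of simulated spikes
    Y[:, 0] = y0
    for t in range(1, K):
        p = np.clip(A @ Y[:, t - 1], 0.0, 1.0)
        Y[:, t] = rng.random(n) < p
        if not Y[:, t].any():     # quiescence: the cascade has died out
            break
    return Y

rng = np.random.default_rng(0)
A = 0.3 * np.ones((5, 5))         # toy weight matrix (illustrative values)
y0 = np.eye(5)[0]                 # stimulate neuron 0
Y = simulate_cascade(A, y0, K=50, rng=rng)
```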

Stimulus pattern generation
We investigate the propagation of activity through a network initiated by stimulus patterns. The stimulus pattern is set as the initial state y(0) or x(0) of a network and then propagated forward in time according to either stochastic or linear dynamics, respectively. In our study, we consider two ways to generate stimulus patterns. In the analysis of cascade duration and controllability, we stimulate individual nodes by creating a set of vectors in which the ith element of the ith vector is set to 1 and all other elements are set to 0. In the mutual information analysis, we create a set of column vectors such that their finite average controllability values evenly span the range of controllability values (see a later section of these Methods for the definition of finite average controllability). In each of the n_m = 25 vectors, we choose m = 4 nodes from n = 100 total nodes to stimulate, such that the selected nodes increase in their finite average controllability values. Because finite average controllability is highly correlated with cascade duration, such input vectors evenly span the possible durations of cascades.
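The two generation schemes can be sketched as follows. The single-node stimuli are simply rows of an identity matrix; for the controllability-spanning patterns, one reading of the procedure (an assumption on our part: selecting consecutive blocks of m nodes ranked by their per-node score) is:

```python
import numpy as np

n, m, n_patterns = 100, 4, 25
single_node_stimuli = np.eye(n)      # i-th vector stimulates node i

# given per-node scores (e.g. finite average controllability), pick m nodes per
# pattern so the patterns span the score range; `scores` here is a stand-in
rng = np.random.default_rng(0)
scores = rng.random(n)
order = np.argsort(scores)           # nodes ranked by score, ascending
patterns = np.zeros((n_patterns, n))
for k in range(n_patterns):
    idx = order[k * m:(k + 1) * m]   # consecutive blocks of increasing score
    patterns[k, idx] = 1.0
```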

Characterizing distributions of cascade duration
We characterized the distributions of cascade duration using a truncated power law. We used maximum likelihood estimation to fit the power law with exponential cutoff P(x) ∼ x^(−α) e^(−x/τ) [46,47]. The parameter τ describes the value of x at which the exponential cutoff truncates the tail of the power law. To avoid overgeneralizing the extent of the power law, we bound τ by the maximum observed duration x and denote this bounded value as τ′ = min(τ, max(x)).
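A simple stand-in for such an estimator (not the authors' implementation) is a direct maximum-likelihood fit over discrete durations, normalizing the truncated power law numerically over the observed support:

```python
import numpy as np
from scipy.optimize import minimize

def nll(params, data, x_max):
    """Negative log-likelihood of P(x) ∝ x^-alpha * exp(-x/tau) on x = 1..x_max."""
    alpha, tau = params
    x = np.arange(1, x_max + 1)
    logZ = np.log(np.sum(x ** -alpha * np.exp(-x / tau)))   # normalization constant
    return -np.sum(-alpha * np.log(data) - data / tau - logZ)

# synthetic durations drawn from a known truncated power law (alpha=1.5, tau=30)
rng = np.random.default_rng(0)
x_max = 200
x = np.arange(1, x_max + 1)
p_true = x ** -1.5 * np.exp(-x / 30.0)
data = rng.choice(x, size=5000, p=p_true / p_true.sum())

res = minimize(nll, x0=[1.0, 10.0], args=(data, x_max),
               bounds=[(0.1, 5.0), (1.0, 1000.0)], method='L-BFGS-B')
alpha_hat, tau_hat = res.x
tau_prime = min(tau_hat, data.max())   # bounded cutoff tau' = min(tau, max(x))
```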

Mutual information calculation to probe stimulus recovery
To measure the capacity of a network to transfer information during a cascade, we calculated the mutual information I(X; Y), which quantifies the amount of information, in bits, that one random variable X reveals about another random variable Y. Here, the two random variables of interest are the initial network state y(0), where y(0) is a stimulus s_i ∈ S, and the network states y(t) ∈ Y_t at a later time t. With mutual information, we measure the amount of information that the network states Y_t at time t reveal about the stimulus patterns S.
For each stimulus pattern s_i, we simulated 1,000 cascades, where P(s_i) = P(s_{j≠i}) = 0.5.
In the analysis of the relationship between the average cascade duration and the mutual information, we quantify the decay in mutual information over time. We also calculate the correlation between the decay rate of the mutual information and the predicted mean cascade duration. For this latter calculation, we first perform a linear regression of the decay in mutual information with respect to time. Then, we calculate the Pearson correlation coefficient between the slope of the linear regression and the mean cascade duration.
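The core quantity is a plug-in mutual information between stimulus labels and discrete network states, and the decay rate is the slope of a linear fit of MI against time. A minimal sketch (illustrative, not the authors' code):

```python
import numpy as np

def mutual_information(stim_ids, states):
    """Plug-in MI (bits) between stimulus labels and discrete network states."""
    _, s = np.unique(stim_ids, return_inverse=True)
    _, y = np.unique(states, axis=0, return_inverse=True)
    joint = np.zeros((s.max() + 1, y.max() + 1))
    np.add.at(joint, (s, y), 1)              # joint counts over (stimulus, state)
    p = joint / joint.sum()
    px, py = p.sum(1, keepdims=True), p.sum(0, keepdims=True)
    nz = p > 0
    return float(np.sum(p[nz] * np.log2(p[nz] / (px @ py)[nz])))

# two perfectly distinguishable states -> 1 bit
labels = np.array([0, 0, 1, 1])
states = np.array([[1, 0], [1, 0], [0, 1], [0, 1]])
mi = mutual_information(labels, states)       # -> 1.0

# decay rate: slope of a linear regression of MI against time (toy values)
t = np.arange(5)
mi_t = np.array([1.0, 0.8, 0.55, 0.4, 0.2])
decay_rate = np.polyfit(t, mi_t, 1)[0]        # -> -0.2
```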

Estimation by linear dynamical systems
We prove by induction that linear dynamics estimates the average behavior of the stochastic model, i.e. E[y_j(t)] = x_j(t), given the same initial conditions y(0) = x(0). At t = 0, both y_j(0) and x_j(0) are set as the stimulus pattern, and so E[y_j(0)] = x_j(0). Now, assume E[y_j(t − 1)] = x_j(t − 1), and observe that x_j(t) = a_j^T x(t − 1) = a_j^T E[y(t − 1)] = E[a_j^T y(t − 1)] = E[y_j(t)], and thus E[y_j(t)] = x_j(t). To demonstrate this relation numerically, we average cascades that begin with the same initial state by taking the mean of y_i^k(t) over all cascades k at each time step t. All cascades start with the same initial condition y(0). We perform numerical validation in the Supplementary Information.
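A minimal numerical check of this equivalence (illustrative parameters, and with weights kept small so spike probabilities never saturate at 1, which is what the equality requires):

```python
import numpy as np

rng = np.random.default_rng(0)
n, T, trials = 5, 10, 20000
A = rng.random((n, n)) * 0.15    # small weights keep summed inputs below 1
y0 = np.eye(n)[0]

# linear prediction x(t) = A^t x(0)
x = np.zeros((T, n))
x[0] = y0
for t in range(1, T):
    x[t] = A @ x[t - 1]

# trial-averaged stochastic model: spike with prob a_j . y(t-1)
acc = np.zeros((T, n))
for _ in range(trials):
    y = y0.copy()
    acc[0] += y
    for t in range(1, T):
        y = (rng.random(n) < np.clip(A @ y, 0, 1)).astype(float)
        acc[t] += y
mean_y = acc / trials

err = np.max(np.abs(mean_y - x))  # shrinks as 1/sqrt(trials)
```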

Eigenvalue analysis
In our analysis of networks, we decompose the weight matrix A into eigenvalues and eigenvectors. Such an eigendecomposition is formalized as A = PDP^(−1), where P is a matrix with eigenvectors as columns and D is a diagonal matrix of the corresponding eigenvalues. We define the dominant eigenvalue λ₁ as the absolute value of the eigenvalue with the largest absolute value.
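In code, this is a direct application of `numpy.linalg.eig` (shown here on a toy random weight matrix):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.random((6, 6)) * 0.2

eigvals, P = np.linalg.eig(A)     # A = P D P^{-1}, D = diag(eigvals)
lam1 = np.max(np.abs(eigvals))    # dominant eigenvalue (spectral radius)

# sanity check: the decomposition reconstructs A
recon = P @ np.diag(eigvals) @ np.linalg.inv(P)
```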
When the row sum Σ_j a_ij is greater than 1, the linear dynamical system does not equal the expected value of the stochastic model. However, the eigenvalue analyses can still be useful in describing average stochastic behavior. In particular, when λ₁ > 1, the state x(t) of the linear dynamical system can explode exponentially. While the state y(t) of a stochastic model with the same parameters does not similarly explode exponentially, it is bounded by 1 for each neuron and reaches a fixed point at 1. In this case, the states of both models, x(t) and y(t), cannot reach quiescence at 0 and thus have infinite cascade duration.

Network control theory and controllability statistics
Network control theory is a formulation of control theory for networks of interacting components. This formulation typically consists of a set of n component nodes V = {1, …, n}, where the vector x(t) ∈ ℝ^n represents the state of node activities at time t ≥ 0. These nodes are connected by a set of edges E ⊆ V × V, where the adjacency matrix A ∈ ℝ^(n×n) has elements a_ij representing the strength of the connection from node j to node i. Here, control typically refers to a set of k inputs u(t) ∈ ℝ^k at time t ≥ 0 that drive the evolution of system states according to B ∈ ℝ^(n×k). In linear control theory, the system states evolve as

x(t + 1) = Ax(t) + Bu(t). (8)
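Equation (8) can be simulated in a few lines; the matrices below are toy values chosen for illustration:

```python
import numpy as np

def evolve(A, B, x0, U):
    """Simulate x(t+1) = A x(t) + B u(t) for the input sequence U (T x k)."""
    xs = [np.asarray(x0, dtype=float)]
    for u in U:
        xs.append(A @ xs[-1] + B @ u)
    return np.array(xs)

A = np.array([[0.5, 0.2], [0.1, 0.4]])
B = np.array([[1.0], [0.0]])           # input drives node 1 only
U = np.array([[1.0], [0.0], [0.0]])    # impulse of magnitude 1 at t = 0
traj = evolve(A, B, np.zeros(2), U)
```

With an impulse at t = 0, the resulting trajectory is exactly the impulse response A^t B whose norm underlies the controllability statistics discussed next.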

Finite average controllability
Motivated by a desire to understand how a network's architecture affects its control properties, recent work has derived network-based metrics for the control of such linear systems [58].
Particularly germane to our discussion of cascade duration is average controllability [59,106], defined as the trace of the controllability Gramian, equivalently the H₂ norm of the system's impulse response, given by

AC = Σ_{t=0}^{∞} trace(A^t BB^T (A^T)^t). (9)

Here, we set B as a binary column vector whose elements corresponding to the nodes of interest are set to 1 and whose remaining elements are set to 0; this formulation represents an impulse of magnitude 1 to the nodes of interest. The finite average controllability (FAC) is similarly defined by taking the sum to some finite positive integer F instead of infinity, and represents the norm of the system's impulse response over F time steps. Because cascades are expected to last for a finite number of time steps, we use F = 100 in the main text, and in the supplement we show that larger and smaller values of F produce similar results.
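Since trace(A^t BB^T (A^T)^t) = ‖A^t B‖_F², the truncated sum can be accumulated without forming the Gramian explicitly; a sketch (toy 2-node cycle for illustration):

```python
import numpy as np

def finite_average_controllability(A, nodes, F=100):
    """FAC: trace of the F-step controllability Gramian for an impulse to `nodes`."""
    n = A.shape[0]
    B = np.zeros((n, 1))
    B[list(nodes)] = 1.0              # impulse of magnitude 1 to the nodes of interest
    fac, v = 0.0, B.copy()
    for _ in range(F + 1):
        fac += float(np.sum(v ** 2))  # ||A^t B||_F^2 term of the sum
        v = A @ v
    return fac

A = np.array([[0.0, 0.5], [0.5, 0.0]])   # 2-node cycle with weight 0.5
fac = finite_average_controllability(A, [0], F=100)
# each step scales the impulse by 0.5, so the sum is geometric:
# sum_t 0.25^t = 1 / (1 - 0.25) ≈ 1.3333
```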

Modal controllability
Another network-based control metric we use here is modal controllability [58,59]. While modal controllability was originally formulated for symmetric matrices, here we extend the definition to include asymmetric matrices. To do this, we take the absolute value of both the eigenvalues and the eigenvector components, which can be complex numbers for an asymmetric matrix. Thus, we define the version of modal controllability of node i for asymmetric matrices as

φ_i = Σ_{j=1}^{n} (1 − |λ_j(A)|²) |v_{ij}|², (10)

where λ_j(A) is the jth eigenvalue of A and v_{ij} is the ith component of the jth eigenvector.
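Equation (10) reconstructed above is the standard modal controllability formula with magnitudes substituted for the (possibly complex) eigenvalues and eigenvector components, per the paper's description of the asymmetric extension. A sketch on a toy 3-node directed cycle, whose symmetry makes the expected value easy to verify by hand:

```python
import numpy as np

def modal_controllability(A):
    """phi_i = sum_j (1 - |lambda_j|^2) |v_ij|^2, with magnitudes so that
    complex eigenvalues/eigenvectors of an asymmetric A are handled."""
    eigvals, V = np.linalg.eig(A)     # columns of V are unit-norm eigenvectors
    lam2 = np.abs(eigvals) ** 2
    return (np.abs(V) ** 2) @ (1.0 - lam2)

# directed 3-cycle with weight 0.6: all |lambda_j| = 0.6 and all |v_ij|^2 = 1/3,
# so phi_i = 3 * (1/3) * (1 - 0.36) = 0.64 for every node
A = np.array([[0.0, 0.6, 0.0],
              [0.0, 0.0, 0.6],
              [0.6, 0.0, 0.0]])
phi = modal_controllability(A)
```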

Finite average controllability of initial states
To predict the duration of a cascade, we can calculate the finite average controllability of an initial state y(0), defined as the finite average controllability averaged over the nodes that are active in the initial state.

Figure 1 .
Figure 1. A linear dynamical system accurately estimates the average spiking of neurons in a stochastic model. (A): An example network represented as an adjacency matrix A. (B): A Markov chain of network states can accurately predict the fraction of active cascades at time t. In 10⁴ trials of stimulating neuron 8 in the network in panel (A), the root-mean-square error between the state-space prediction and the stochastic model is 1.2 × 10⁻⁴. (C): Examples of simulations of cascades generated by stimulating neuron 8 in the network in panel (A).

Figure 2 .
Figure 2. Network topology constrains cascade duration. (A): Cascade duration is defined as the number of time steps t between the point at which the first spike occurs after a time step of quiescence, and the point at which the last spike occurs, followed by a time step of quiescence. (B): The distribution of cascade duration can be described by a truncated power law, where the parameter α indicates the log-log slope of the initial distribution and τ indicates the duration of the power law regime of the distribution. (C): In simulations of the stochastic McCulloch-Pitts (SMP) model, the dominant eigenvalue λ₁ of synthetic (blue) and empirical

Figure 3 .
Figure 3. Cycles and strong connections facilitate long cascades. (A): Distribution of cascade duration in acyclic and cyclic networks (all edges have weight 0.5). In 10⁶ trials, the maximum cascade durations for the acyclic and cyclic networks are 3 and 11 time steps, respectively. (B): Networks with higher cycle density have longer cascades. We randomly rewired a directed acyclic graph to produce networks of varying cycle density. Cycle density is the number of simple cycles divided by the number of edges. (C): Distribution of n-cyclic spikes in MEA recording 1 with 5 ms bins. An n-cyclic spike occurs when node i fires and then fires again after n time bins. (D): Neurons in mouse somatosensory cortex fire in cycles with small refractory periods. The plot shows the distribution of the average number of n-cycles observed per cascade (9.3 × 10⁴ ± 6.8 × 10³ cascades for 25 recordings; standard error). (E): A schematic of a 2-node network. We redistributed the weights from the 2-node cycle to self-loops by Δw. (F): Distributions of cascade duration for Δw = 0.02, 0.26, and 0.50 in the 2-node cycle. (G): Cycles with strong connections, at either Δw ≈ 0 or Δw ≈ 1, extend the mean duration of cascades that do not reach the fixed point at 1 (quadratic fit: y = 2.7 × 10⁵x² − 2.7 × 10⁵x + 6.9 × 10⁴). (H): The mean eigenvalue λ tracks a network geometry's capacity for long-lasting cascades that do not reach the fixed point at 1.

Figure 4 .
Figure 4. Network controllability is tightly linked with cascade duration. (A): Pearson's correlation coefficients between the mean duration of cascades from the stimulation of individual nodes and the controllability measures of the respective nodes. Controllability measures include the magnitude of the eigenprojection (ME), modal controllability (MC), and finite average controllability (FAC). The networks here are 30 random instantiations of weighted random graphs, each with 100 nodes and a density of around 0.2. (B): The same plot as in panel (A), except with a bimodal distribution of weights, with 10% of connections normally distributed with a mean of 0.9 and 90% of connections with a mean of 0.1, all with a standard deviation of 0.1, before weight normalization. (C): Controllability measurements in spiking neurons in mouse somatosensory cortex predict cascade duration. Spearman's correlation between the duration of each cascade and the mean finite average controllability of neurons active in its first T time bins (5 ms bins) for 25 MEA recordings. See Supplementary Information for individual plots. The box-plot elements (center, bottom and top edges, whiskers, and '+' symbols) indicate, respectively, the median, 25th and 75th percentiles, extremes, and outliers. Points are outliers if they are greater than q₃ + 1.5 × (q₃ − q₁) or less than q₁ − 1.5 × (q₃ − q₁), where qₙ is the nth quartile. Extremes are the most extreme data that are not outliers.

Figure 5 .
Figure 5. A stimulus can be well-recovered when it generates long-lasting cascades. (A)-(B): A schematic showing two cascades triggered by different stimuli. (C): Recovery of the stimulus using an observation of a network state during a cascade. (D): Failed recovery of the stimulus. (E): Decay in mutual information (MI) over time. When activity from a stimulus pattern lasts longer, mutual information also persists for longer in a weighted random graph. (F): The linear decay rate of mutual information over the first 10 time steps plotted against the mean cascade duration in the example weighted random graph from panel (E). (G): The Pearson correlation coefficients between the linear slope of the decay in mutual information over time and the mean cascade duration for four graph types: a weighted random graph (WR), a random geometric graph (RG), a modular graph with 4 communities (M4C), and a Watts-Strogatz graph (WS). The boxplot shows data from 30 instantiations of each graph type, each network containing 100 nodes and characterized by a fractional connectivity of around 0.05. The whiskers extend to the extreme data points not considered outliers, and the outliers are plotted individually using the '+' symbol. (H): The mean decay rate in mutual information for a network is correlated with the sum of eigenvalues of the network (Pearson's correlation coefficient r = 0.92, p = 1.8 × 10⁻⁴⁹; N = 120). For all networks, we used a fractional connectivity of 0.05 to show a wide range of decay rates in mutual information (see Supplementary Information for simulations with other fractional connectivities).