Building catastrophes: networks designed to fail by avalanche-like breakdown

We present a simple method for constructing networks designed to fail catastrophically through an avalanche-like breakdown. Our method simulates the avalanche in reverse, adding nodes to the network one at a time in such a way that each new node is guaranteed to fail in its turn. Some restrictions are imposed on the output flow rates of the nodes. An expression for the critical output flow rate of a node is derived; nodes in the network are considered to have failed when their output flow rate exceeds this value. Two cases are considered: networks in which the total flow increases with network size, and networks in which the total flow is constant. We also consider networks in which nodes have weighted output flow rates. The topology of the generated networks is studied, and we find that networks that are almost homogeneous in node degree may still fail catastrophically. Finally, we present some possible extensions to the method.


Introduction
Man-made complex networks [1]- [3] such as the Internet, power transmission grids and telephone systems are susceptible to catastrophic failures in which the entire network ceases to function [4]- [6]. The most common cause of a catastrophic failure is an avalanche-like breakdown. This can result from the failure of a single node in a network in which nodes are sensitive to overloading. Redistribution of the load of this failed node over the network may cause other nodes to fail, triggering an avalanche-like event in which node failures propagate through the network. The entire network may in the end fragment into disconnected subnetworks. Networks with heterogeneous node degree distribution, such as scale-free networks [7], are much more likely to suffer this type of event [8]. This is because in a scale-free network a small subset of core nodes will be highly connected and will handle much of the traffic. If one of these heavily loaded core nodes ceases to function, either through malicious attack or random failure, it will have a large impact on other nodes in the network, making subsequent failures very likely. However, similar catastrophic failures are possible in networks with more homogeneous degree distributions. As we show in this paper, if the network degree distribution has just a small amount of heterogeneity, then avalanche-like breakdowns are possible when all nodes are close to their failure load. Similar behaviour has been seen in social networks [6]. In the theoretical case of a completely homogeneous network in which all nodes are close to their maximum load, failure of a single node could cause the whole network to collapse in a single stage.
In this paper, we are concerned with transport networks in which particles of information (we shall call them packets in this paper) are transported through the network. Packet data networks such as the Internet are the most familiar examples of this, but the model can also be applied to road networks [9] and social acquaintance networks [10]. The most obvious approach to routing packets through a network, and the one used in the Internet, is to pass them along the shortest path. In Internet routing, weights are placed on links according to different metrics; these weights are used to calculate shortest paths and generate routing tables [11,12]. Much work has been done in finding better alternatives to shortest path routing [13]- [17]. All show considerable improvements in carrying capacity, that is, the load that can be carried by the network before jamming occurs. However, as Sreenivasan et al [18] showed, there is a limit to how much improvement may be made in this way. All heavily loaded networks are in the end vulnerable to cascade failure.

DEUTSCHE PHYSIKALISCHE GESELLSCHAFT
In [6,8], [19]- [21], cascading breakdown in static networks has been studied. In [6,8,20,21], the method is to overload one or more nodes in a pre-existing network and study the resulting cascade. Holme and Kim [19] have a slightly different approach, evolving a scale-free network until cascade failure occurs due to the increasing load in the network. (Load here is defined by the topological property betweenness centrality, defined in section 2.1 below.) In [19,20], breakdown is simulated by computer, whereas [6,8,21] use mathematical models. One difficulty of the former approach is that this type of simulation is very computationally demanding, which imposes a limit on the size of network that can be modelled. This makes it difficult to find out how well the mathematical models scale with network size. In this paper, we approach the study of cascading failure from another direction. We build networks in such a way as to ensure their breakdown. In essence we follow the cascading breakdown in reverse. By doing this we hope to better understand the dynamics of the process. This approach is also less computationally demanding and will therefore allow the simulation of larger networks. The model can be easily extended to real-world networks.

Definitions and network measures
It is conventional to represent a complex network by an undirected graph, G(V, E). Here V is the set of vertices of the graph, representing the nodes of the network; E is the set of edges, representing the links of the network. In a packet data network, for example, the vertices would represent routers or hosts; the edges, data links. Edges are unweighted and there are no self-edges or duplicate edges between vertices. We assume that flows between source and destination all follow the shortest possible path (the geodesic path). The average shortest path length is

\bar{\ell} = \frac{1}{N(N-1)} \sum_{s \neq d} \ell_{s,d},    (1)

where \ell_{s,d} is the length of the shortest path between source, s, and destination, d, and N is the number of nodes in the network. As in [13,14,19,21] and others, we chose B(v), the vertex betweenness centrality [22,25] (often abbreviated to 'betweenness'), to give a measure of the load on a node based purely on the topology of the network. If one imagines that for a single time step one packet of information is passed between each node pair in the network, the route taken always being the shortest path, then the load on any given node would be equivalent to the number of shortest paths passing through that node. This is the basis of betweenness. The proportion of shortest paths from s to d that pass through vertex v is p_{s,d}(v) = \sigma_{s,d}(v)/\sigma_{s,d}, where \sigma_{s,d} is the number of shortest paths between s and d, and \sigma_{s,d}(v) is the number of shortest paths between s and d that pass through v. The betweenness is then

B(v) = \sum_{s,d} p_{s,d}(v).    (2)

It should be noted that our definition of B(v) is slightly different from others. In Freeman's original definition [22], node v is not counted as either source or destination when summing values of p_{s,d}(v) in (2). Other authors do include v as source or destination [10,23]. In our case we would like to include single-hop routes (routes with no intervening nodes) and allow packets to leave the network immediately on reaching their destination. Hence when summing in (2), v can be the source, but not the destination.
A property of the betweenness centrality as defined here is that

\sum_{v \in V} B(v) = N(N-1)\bar{\ell}.    (3)
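The convention above (v counted as a source but never as a destination, single-hop routes included) is easy to get wrong with off-the-shelf centrality routines, so a brute-force check is useful. The Python below is an illustrative sketch of ours, not the authors' code; it enumerates all shortest paths per ordered pair and is only suitable for small graphs.

```python
from collections import deque

def bfs_paths(adj, s):
    """BFS from s: shortest-path distances and predecessor lists."""
    dist = {s: 0}
    pred = {v: [] for v in adj}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
            if dist[w] == dist[u] + 1:
                pred[w].append(u)
    return dist, pred

def shortest_paths(pred, s, d):
    """Enumerate all shortest s-d paths by walking the predecessor DAG."""
    if d == s:
        return [[s]]
    return [p + [d] for u in pred[d] for p in shortest_paths(pred, s, u)]

def betweenness(adj):
    """B(v): sum over ordered pairs (s, d) of the fraction of shortest
    s-d paths containing v, counting v as source but not as destination
    (so single-hop routes are included)."""
    B = {v: 0.0 for v in adj}
    for s in adj:
        dist, pred = bfs_paths(adj, s)
        for d in adj:
            if d == s or d not in dist:
                continue
            paths = shortest_paths(pred, s, d)
            for path in paths:
                for v in path[:-1]:        # exclude the destination d
                    B[v] += 1.0 / len(paths)
    return B

# Check against the star network used later in the paper (N = 5):
# centre (N-1)^2 = 16, each ray N-1 = 4.
star = {'c': ['r1', 'r2', 'r3', 'r4'],
        'r1': ['c'], 'r2': ['c'], 'r3': ['c'], 'r4': ['c']}
B = betweenness(star)
```

The assertions also confirm the sum property: for this star the average path length is 1.6, so the betweennesses must total N(N-1)*1.6 = 32.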

Load and congestion at a node
The average information flow arriving at node v is [24,26,27]

\lambda_v = \frac{F(\gamma, N) B(v)}{N(N-1)},    (4)

where F(\gamma, N) is the flow generated per unit time by the whole network. The flow is a function of the rate of packet production at a node, \gamma, and network size, N. If \mu_v is the output flow, then the node will get congested if its input flow is greater than its output flow, \lambda_v > \mu_v. The onset of congestion therefore occurs at the critical value

\lambda_v = \mu_v.    (5)

We consider two cases:

1. Each node v produces packets at a rate \gamma_v = \gamma, distributed evenly between the N-1 destinations. In this case total flow in the network increases with network size: F(\gamma, N) = N\gamma. If node v is the first to get congested in the network, it follows from (5) that this will occur when the packet production rate reaches the critical value [24]

\gamma^*_v = \frac{\mu_v (N-1)}{B(v)}.    (6)

In terms of betweenness, congestion will occur when (see [13,14,17,18,26])

B(v) = \frac{\mu_v (N-1)}{\gamma^*}.    (7)

2. Only K of the nodes produce packets (all at rate \gamma), distributing flow evenly amongst the N-1 possible destination nodes. The total flow in the network is then F(\gamma, N) = K\gamma, where K is a constant; that is, total flow in the network is independent of network size. The average arrival rate of packets at node v is

\lambda_v = \frac{K \gamma B(v)}{N(N-1)}.    (8)

The corresponding critical load and betweenness for node v are

\gamma^*_v = \frac{\mu_v N(N-1)}{K B(v)}    (9)

and

B(v) = \frac{\mu_v N(N-1)}{K \gamma^*}.    (10)
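The two critical-rate expressions are simple enough to encode directly. The sketch below is ours (the function names and the star numbers are illustrative, not from the paper); the network-wide critical rate is the minimum of gamma*_v over all nodes.

```python
def critical_rate_case1(B_v, mu_v, N):
    """Case 1 (every node generates traffic, F = N*gamma):
    congestion at node v sets in at gamma*_v = mu_v (N-1) / B(v)."""
    return mu_v * (N - 1) / B_v

def critical_rate_case2(B_v, mu_v, N, K):
    """Case 2 (only K nodes generate traffic, F = K*gamma):
    gamma*_v = mu_v N (N-1) / (K B(v))."""
    return mu_v * N * (N - 1) / (K * B_v)

# Star with N = 5 and unit output flows: the centre (betweenness 16)
# congests first, at gamma* = 4/16 = 0.25; a ray (betweenness 4) only
# at gamma* = 1.0.  With K = N, case 2 reduces to case 1.
gamma_centre = critical_rate_case1(16, 1.0, 5)
gamma_ray = critical_rate_case1(4, 1.0, 5)
```
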

Avalanches and betweenness
An avalanche in one of our networks would occur in the following way. When a node became congested all edges connected to that node would be removed. After removal of this node and its edges, loads would be recalculated. The load on other nodes might increase sufficiently for them to also get congested: these nodes also would be removed from the network and loads would again be recalculated. The process would continue until no nodes in the network were overloaded.
Considering the first case of section 2.2 in which the total flow in the network increases as the network grows: equation (7) holds, that is, node v will get congested when B(v) = \mu_v (N-1)/\gamma^*. After node v and its links are removed, node w will become congested when B'(w) = \mu_w (N'-1)/\gamma^*, where B'(w) is the betweenness of node w in the reduced network and N' is the size of the reduced network. Hence B(v) and B'(w) must satisfy

\frac{B'(w)}{B(v)} = \frac{\mu_w (N'-1)}{\mu_v (N-1)}.    (11)

If we consider the case \mu_v = 1 for all v, then a lower bound for the betweenness of the network is obtained from (3). In this case the vertex with the highest critical load is the vertex with the largest betweenness, so if the average betweenness in the network is \bar{B} = (1/N) \sum_v B(v) = (N-1)\bar{\ell} and the maximum betweenness is B_max = max{B(v), v ∈ V}, then we have a lower bound to B_max [19]: B_max ≥ (N-1)\bar{\ell}. From (7) we can obtain an upper bound to the maximum betweenness by noticing that the vertex with the largest betweenness will be congested if its load is greater than or equal to \gamma^*; in this case B_max ≤ (N-1)/\gamma^*. A similar argument may be applied in the second case of section 2.2. Here the total flow is constant, independent of network size. The betweennesses in the original and reduced networks are related by

\frac{B'(w)}{B(v)} = \frac{\mu_w N'(N'-1)}{\mu_v N(N-1)}.    (12)

As before, if we take \mu_v = 1 for all nodes v, then (N-1)\bar{\ell} ≤ B_max ≤ N(N-1)/(K\gamma^*).
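The remove-and-recalculate loop described above can be sketched for case 1 of section 2.2. This is an illustrative reimplementation of ours (assuming unit output flows in the demonstration), not the authors' simulation code; on a star the avalanche completes in a single stage, since removing the centre disconnects every remaining pair.

```python
from collections import deque

def betweenness(adj):
    """B(v) over ordered pairs; v counted as source, not destination
    (the convention of section 2.1). Brute force, small graphs only."""
    B = {v: 0.0 for v in adj}
    for s in adj:
        dist, pred = {s: 0}, {v: [] for v in adj}
        q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
                if dist[w] == dist[u] + 1:
                    pred[w].append(u)
        def paths(d):
            return [[s]] if d == s else [p + [d] for u in pred[d] for p in paths(u)]
        for d in adj:
            if d != s and d in dist:
                ps = paths(d)
                for path in ps:
                    for v in path[:-1]:     # exclude the destination
                        B[v] += 1.0 / len(ps)
    return B

def cascade(adj, mu, gamma):
    """Case 1 avalanche: remove every node whose arrival rate
    gamma*B(v)/(N-1) exceeds its output flow mu[v], recompute loads on
    the reduced network, and repeat until no node is overloaded."""
    adj = {v: list(nb) for v, nb in adj.items()}   # work on a copy
    failed = []
    while len(adj) > 1:
        N = len(adj)
        B = betweenness(adj)
        over = [v for v in adj if gamma * B[v] / (N - 1) > mu[v]]
        if not over:
            break
        for v in over:
            failed.append(v)
            for w in adj.pop(v):
                adj[w].remove(v)
    return failed

# Star demo: gamma just above the centre's critical rate of 0.25
# overloads the centre only; the rays survive as isolated nodes.
star = {'c': ['r1', 'r2', 'r3', 'r4'],
        'r1': ['c'], 'r2': ['c'], 'r3': ['c'], 'r4': ['c']}
mu = {v: 1.0 for v in star}
```
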

Building catastrophes
To build a catastrophic network we follow the avalanche process in reverse. Starting with one or more small core networks, we build the network a node at a time. The process is illustrated in figure 1. Square nodes are congestion nodes, required to fail in the avalanche. n of these congestion nodes have been added to the network at level n. At level n + 1 the (n + 1)th congestion node is added (the grey square). To satisfy the conditions for an avalanche we require this node to fail. The condition for this is \gamma^*_{n+1} \leq \gamma^*_n, where \gamma^*_n and \gamma^*_{n+1} are the critical packet production rates for the network as it is at level n and at level n + 1. For our purposes we want \gamma^*_{n+1} \approx \gamma^*_n. Apart from changing the output flow \mu_{n+1} of the new node, we can affect the load either by adding links between the new node (grey square) and the original network or by adding a new node that connects with the new congestion node and/or any of the other nodes (grey circles) introduced at level n + 1. We carry on adding new nodes and links, following these rules, until the condition \gamma^*_{n+1} \leq \gamma^*_n is satisfied. We then continue to level n + 2, where the next square node is added.
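The full procedure searches among candidate link placements at each level; a degenerate but checkable illustration of the growth condition (ours, not the paper's) is a star grown one ray at a time with unit output flows. Using the star formulas quoted in the examples below, the centre's betweenness is (n-1)^2 in an n-node star, so the critical rate falls monotonically and the avalanche condition holds automatically at every level.

```python
def star_critical_rate(n):
    """Critical packet production rate of an n-node star with unit
    output flows: centre betweenness is (n-1)^2, so
    gamma*_n = (n-1)/(n-1)^2 = 1/(n-1)."""
    return (n - 1) / float((n - 1) ** 2)

# Growing the star never violates gamma*_{n+1} <= gamma*_n: adding a
# ray only loads the centre further, so the critical rate can only drop.
rates = [star_critical_rate(n) for n in range(3, 12)]
```
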

Examples
A possible starting network is a star network. For a star of N nodes, the betweenness of the rays of the star (as defined in (2)) is B_r = N - 1; for the centre of the star the betweenness is B_c = (N - 1)^2. The centre of the star will become congested at the critical packet production rate \gamma^*_c = (N - 1)/B_c = 1/(N - 1). In figure 2 the maximum betweenness (that is, the betweenness of congestion node n) is plotted against network size. In this case the output flow \mu_v is assumed constant for all nodes. Squares show the maximum betweenness at each step of the growth; circles represent the theoretical upper bound. At each step in the building of the network the program searches for a network satisfying \gamma^*_{n+1} \approx \gamma^*_n with the constraint that \gamma^*_{n+1} \leq \gamma^*_n (or the avalanche will not occur). Finding a network satisfying these conditions was not always possible, especially in the case of figure 2(a). A network that becomes congested at the target \gamma^* does not always exist and becomes harder to find as the network grows. This explains the divergence of B_max from the upper limit in figure 2(a). The upper bound is followed closely in figure 2(b), so the maximum betweenness is approximately proportional to the square of the network size in this case.
In figure 3 we show histograms of the degrees of the nodes in the network. As in figure 2, figure 3(a) shows data for a network in which total flow grows with network size; in figure 3(b) the total flow is independent of network size. In the first case there is not much variation in node degree. In the second case the degree distribution is skewed and has an exponential shape for low degree values, so this is a heterogeneous network in terms of node degree.
It is also possible to construct networks in which the output flow µ is not the same for all nodes. This makes it possible to build networks in which the majority of node failures triggered by the avalanche happen at nodes having a large output flow; the remainder at nodes having a small output flow. This simulates man-made networks such as the Internet or power grids where a main server or an electrical substation has a large output flow and consequently more 'importance' in the network. Failure may begin with a node with high centrality (measured by betweenness), but the next node to fail in the avalanche may have a relatively small centrality, yet be fundamental to the propagation of the avalanche. In this case the node's betweenness does not reflect its importance in the cascade sequence. This behaviour is illustrated in figure 4. Here nodes failed in the sequence A, B, C, D even though A, C and D can handle twice the flow B can. The radii of the nodes in the figure are proportional to the square root of their betweennesses. Clearly the order of failure is unrelated to the betweenness of a node.

Conclusions
We have presented a simple mechanism for building networks designed to fail catastrophically. The failure of nodes in the network is related to the node's output flow rate. The technique can be used to construct networks with nodes that have differing output flows, so we can produce a network with nodes that have low topological importance (or centrality), but are crucial in the avalanche-producing catastrophe.
In the case where total flow in the network increases linearly with network size, we find that the network formed has a small amount of heterogeneity in node degree distribution. This shows that catastrophic failure does not only occur in highly heterogeneous networks like the Internet. If all nodes have similar loads and are close to their failure threshold, then cascade failure is also possible in almost homogeneous networks.
The next stage in the work is to modify the technique so that the generated networks have more realistic topologies. In addition, our method currently applies to bufferless networks; we intend to extend it to account for queueing at nodes, as occurs in packet data networks. There is a need for more rigorous theoretical results to accompany this future work.
There are many other ways to extend our method. Other measures of centrality representing different flow mechanisms could be used, and routing mechanisms other than shortest path [13]- [17] might be considered. Another possibility is to allow the creation of edges between nodes that do not get congested.
Finally, we make the comment that usually, as the name implies, catastrophic failures are unwanted and efforts are made to prevent their occurrence. However, there are circumstances in which this property is desirable. In vehicle and shop windows, for example, tempered glass is used, partly because it is stronger, but also because it has the property of shattering into much safer small pieces when broken. In cases like this catastrophic failure might be seen as being 'engineered into' the material. Another example in which catastrophic failure would be desirable is that of criminal networks where one person's capture may result in the collapse of the whole network. These types of total catastrophic failure are similar to that seen in our current model.