Structural constraints in complex networks

We present a link rewiring mechanism to produce surrogates of a network in which both the degree distribution and the rich-club connectivity are preserved. We consider three real networks: the autonomous system (AS)-Internet, protein interaction and scientific collaboration. We show that for a given degree distribution, the rich-club connectivity is sensitive to the degree-degree correlation and, conversely, the degree-degree correlation is constrained by the rich-club connectivity. In particular, in the case of the Internet, the assortative coefficient is always negative and a minor change in its value can reverse the network's rich-club structure completely, while fixing the degree distribution and the rich-club connectivity restricts the assortative coefficient to such a narrow range that a reasonable model of the Internet can be produced by considering mainly the degree distribution and the rich-club connectivity. We also comment on the suitability of using the maximal random network as a null model to assess the rich-club connectivity in real networks.


Introduction
In graph theory the degree $k$ is defined as the number of links a node has. The degree distribution $P(k)$ provides a global view of a network's structure and is one of the most studied topological properties. Many complex networks are scale-free because they exhibit a power-law degree distribution, i.e. $P(k) \sim k^{-\gamma}$, $\gamma > 1$ [1,2,3,4,5,6,7,8]. A more complete description of a network's structure is obtained from the joint degree distribution $P(k, k')$ [9,10,11], which is the probability that a randomly selected link connects a node of degree $k$ with a node of degree $k'$. The degree distribution can be obtained from the joint degree distribution: $P(k) = (\bar{k}/k) \sum_{k'} P(k, k')$, where $\bar{k}$ is the average degree.
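The prefactor $\bar{k}/k$ in this relation can be motivated by counting link ends. The following is only a sketch of that argument. It assumes that $P(k, k')$ is symmetric and normalised so that $\sum_{k,k'} P(k, k') = 1$, and it writes $N$ for the number of nodes and $L$ for the number of links; this normalisation is an assumption made here for illustration.

```latex
% The N P(k) nodes of degree k carry k N P(k) link ends,
% out of N \bar{k} = 2L link ends in the whole network.
\sum_{k'} P(k, k') \;=\; \frac{k\, N\, P(k)}{N\, \bar{k}} \;=\; \frac{k\, P(k)}{\bar{k}}
\qquad\Longrightarrow\qquad
P(k) \;=\; \frac{\bar{k}}{k} \sum_{k'} P(k, k') .
```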
The joint degree distribution characterises the degree correlation [12,13] between two nodes connected by a link. In practice, however, $P(k, k')$ can be difficult to measure, in particular for a finite-size, scale-free network [14]. Nevertheless, the degree-degree correlation can be inferred from the average degree of the nearest neighbours of $k$-degree nodes [10,15,16], which is a projection of the joint degree distribution given by

$$k_{nn}(k) = \frac{\bar{k}}{k P(k)} \sum_{k'} k' P(k, k').$$

If the nearest-neighbours average degree $k_{nn}(k)$ is an increasing function of $k$ then the network is assortative, where nodes tend to attach to alike nodes, i.e. high-degree nodes to high-degree nodes and low-degree nodes to low-degree nodes. If $k_{nn}(k)$ is a decreasing function of $k$ then the network is disassortative, where high-degree nodes tend to connect with low-degree nodes.

A network's degree-degree correlation, or mixing pattern, can also be summarised by a single scalar called the assortativity coefficient $\alpha$, $-1 \le \alpha \le 1$ [12],

$$\alpha = \frac{L^{-1} \sum_{j>i} k_i k_j a_{ij} - \left[ L^{-1} \sum_{j>i} \tfrac{1}{2} (k_i + k_j) a_{ij} \right]^2}{L^{-1} \sum_{j>i} \tfrac{1}{2} (k_i^2 + k_j^2) a_{ij} - \left[ L^{-1} \sum_{j>i} \tfrac{1}{2} (k_i + k_j) a_{ij} \right]^2},$$

where $L$ is the total number of links, $k_i$ and $k_j$ are the degrees of nodes $i$ and $j$, and $a_{ij}$ is an element of the network's adjacency matrix, with $a_{ij} = 1$ if nodes $i$ and $j$ are connected by a link and $a_{ij} = 0$ otherwise [17]. For an uncorrelated network $\alpha = 0$, for an assortative network $\alpha > 0$ and for a disassortative network $\alpha < 0$.

In some scale-free networks the best connected nodes, the rich nodes, tend to be very well connected between themselves. A rich-club is the set of nodes $R_{>k}$ with degrees larger than a given degree $k$. The connectivity between members of the rich-club is measured by the rich-club connectivity [18], which is defined as the ratio of the number of links $E_{>k}$ shared by the nodes in the set $R_{>k}$ to the maximum possible number of links that the rich nodes can share,

$$\phi(k) = \frac{2 E_{>k}}{|R_{>k}| \, (|R_{>k}| - 1)},$$

where $|R_{>k}|$ is the number of nodes in the set $R_{>k}$ [17,19]. The rich-club connectivity as a function of the degree is a global property of a network. It describes the interrelationship between subsets of nodes.
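The three quantities defined above can be evaluated directly from an edge list. The following is a minimal sketch, assuming an undirected simple graph stored as a list of node pairs; the function names and the toy star graph are illustrative choices made here and are not part of the data sets studied in this paper.

```python
from collections import defaultdict


def degrees(edges):
    """Map each node to its degree, for an undirected simple graph given as an edge list."""
    deg = defaultdict(int)
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


def knn(edges):
    """Average nearest-neighbour degree k_nn(k), returned as {k: k_nn(k)}."""
    deg = degrees(edges)
    neighbour_sum = defaultdict(float)  # summed neighbour degrees seen from degree-k nodes
    link_ends = defaultdict(int)        # number of link ends attached to degree-k nodes
    for u, v in edges:
        for a, b in ((u, v), (v, u)):
            neighbour_sum[deg[a]] += deg[b]
            link_ends[deg[a]] += 1
    return {k: neighbour_sum[k] / link_ends[k] for k in link_ends}


def assortativity(edges):
    """Assortativity coefficient alpha, summing over links (undefined for regular graphs)."""
    deg = degrees(edges)
    L = len(edges)
    prod = sum(deg[u] * deg[v] for u, v in edges) / L
    mean = sum(0.5 * (deg[u] + deg[v]) for u, v in edges) / L
    sq = sum(0.5 * (deg[u] ** 2 + deg[v] ** 2) for u, v in edges) / L
    return (prod - mean ** 2) / (sq - mean ** 2)


def rich_club(edges, k):
    """Rich-club connectivity phi(k); None when fewer than two nodes have degree > k."""
    deg = degrees(edges)
    rich = {n for n, d in deg.items() if d > k}
    if len(rich) < 2:
        return None
    e_rich = sum(1 for u, v in edges if u in rich and v in rich)
    return 2 * e_rich / (len(rich) * (len(rich) - 1))


# Toy example: a star with one hub and four leaves (perfectly disassortative).
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
print(knn(star))            # the hub's neighbours have degree 1, the leaves' neighbour has degree 4
print(assortativity(star))  # every link joins the hub to a leaf
print(rich_club(star, 0))   # 4 links shared among 5 nodes
```

For the star, $k_{nn}(k)$ decreases with $k$ and $\alpha$ is negative, so both measures agree that the mixing is disassortative.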
The rich-club connectivity is a different projection of the joint degree distribution [19],

$$\phi(k) = \frac{N \bar{k} \sum_{k'=k+1}^{k_{max}} \sum_{k''=k+1}^{k_{max}} P(k', k'')}{\left[ N \sum_{k'=k+1}^{k_{max}} P(k') \right] \left[ N \sum_{k'=k+1}^{k_{max}} P(k') - 1 \right]},$$

where $N$ is the total number of nodes and $k_{max}$ is the maximum degree in the network. The rich-club connectivity, the nearest-neighbours average degree and the assortative coefficient are not trivially related.

Our motivation here is twofold. First, we study whether the description of a network using $P(k)$ and $\phi(k)$ produces a reasonable model of a real network. We consider three real networks: the AS-Internet, the protein interaction and the scientific collaboration. Our approach is to create, from a real network, surrogate networks with the same $P(k)$, or even the same $\phi(k)$, as the original network, and then to compare the properties of the surrogates with those of the original network. Second, we are interested in the properties of the surrogates themselves, in particular the maximal random case of a network, as it has been used as a 'null model' to assess network properties.

Link Rewiring Algorithms
We create surrogate networks by using the link rewiring algorithms [20,21].

Maximal Cases I: Preserving $P(k)$
The broad degree distribution $P(k)$ is an important characteristic of complex networks and it should be preserved by any link rewiring process [22]. Figure 1 shows that four nodes with degrees $k_1 > k_2 > k_3 > k_4$ can be connected by two links in three possible wiring patterns. One can see that reconnecting a pair of links from one wiring pattern to another preserves the degree of individual nodes and therefore preserves the degree distribution $P(k)$. It is possible to obtain different kinds of surrogate networks by rewiring links in the following ways.
• Maximal random case I: randomly choose a pair of links and swap two of their end nodes. This is equivalent to reconnecting the four end nodes using a wiring pattern chosen at random. The process is repeated a sufficiently large number of times.
• Maximal assortative case I: reconnect a pair of links in the assortative wiring pattern (see figure 1(a)) and repeat the process until all link pairs are assortatively wired.
• Maximal disassortative case I: similarly, reconnect all pairs of links using the disassortative wiring pattern (see figure 1(b)).
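The maximal random case I in the list above can be sketched as follows. This is a minimal illustration, not the implementation used to obtain the results in this paper. It assumes the network is an undirected simple graph given as an edge list, and it rejects swaps that would create a self-loop or a duplicate link; that rejection rule, the function names and the `max_attempts` guard are assumptions introduced here.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Sorted list of node degrees for an undirected edge list."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return sorted(deg.values())


def rewire_random(edges, n_swaps, seed=None):
    """Apply up to n_swaps degree-preserving random link swaps and return the new edge list."""
    rng = random.Random(seed)
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # hypothetical guard for graphs with few valid swaps
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)   # pick a pair of links at random
        a, b = edges[i]
        c, d = edges[j]
        if rng.random() < 0.5:                    # choose which end nodes are exchanged
            c, d = d, c
        if len({a, b, c, d}) < 4:                 # links share a node: skip this pair
            continue
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        if new1 in present or new2 in present:    # assumption: keep the graph simple
            continue
        present.discard(frozenset((a, b)))
        present.discard(frozenset((c, d)))
        present.update((new1, new2))
        edges[i], edges[j] = (a, d), (c, b)       # each node keeps exactly its degree
        done += 1
    return edges


# Toy example: a ring of ten nodes with two chords.
ring = [(i, (i + 1) % 10) for i in range(10)] + [(0, 5), (2, 7)]
surrogate = rewire_random(ring, n_swaps=100, seed=1)
print(degree_sequence(surrogate) == degree_sequence(ring))  # True: degrees are preserved
```

The maximal assortative and disassortative cases would replace the random choice of pattern with a fixed target pattern; they are not shown here.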

Maximal Cases II: Preserving Both $P(k)$ And $\phi(k)$
It is possible to modify the link-rewiring process such that the rich-club connectivity is preserved as well. For a given degree $k$ the rich-club connectivity $\phi(k)$ depends on the number of links $E_{>k}$ shared by the nodes belonging to the set $R_{>k}$. Any rewiring between nodes belonging to $R_{>k}$, or between nodes outside $R_{>k}$, will not change $E_{>k}$, hence $\phi(k)$ will remain the same. As shown in figure 1, $E_{>k_1}$, $E_{>k_2}$, $E_{>k_3}$ and $E_{>k_4}$ in the disassortative wiring (figure 1(b)) and the neutral wiring (figure 1(c)) are the same, because the link $e_1$ only and always belongs to $E_{>k_4}$, and the other link $e_2$ only and always belongs to $E_{>k_3}$ and $E_{>k_4}$. This means that when reconnecting a pair of links between the disassortative wiring and the neutral wiring, $\phi(k)$ remains unchanged for all degrees. This allows us to obtain a different set of maximal cases for a network while preserving both the network's $P(k)$ and $\phi(k)$.
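The invariance argued above can be checked on a small example. The sketch below assumes that the three wiring patterns of figure 1 pair the four nodes as $(k_1, k_2), (k_3, k_4)$ for the assortative wiring, $(k_1, k_4), (k_2, k_3)$ for the disassortative wiring and $(k_1, k_3), (k_2, k_4)$ for the neutral wiring. The degree values and function names are hypothetical. For each pattern it counts how many of the two links join nodes that both have degree larger than $k$, i.e. the contribution of this link pair to $E_{>k}$.

```python
def wirings(n1, n2, n3, n4):
    """Three ways to join four nodes (ordered by decreasing degree) with two links."""
    return {
        "assortative": [(n1, n2), (n3, n4)],
        "disassortative": [(n1, n4), (n2, n3)],
        "neutral": [(n1, n3), (n2, n4)],
    }


def links_in_club(links, deg, k):
    """Count the given links whose end nodes both have degree larger than k."""
    return sum(1 for u, v in links if deg[u] > k and deg[v] > k)


# Hypothetical degrees with k1 > k2 > k3 > k4.
deg = {"n1": 9, "n2": 6, "n3": 4, "n4": 2}
profiles = {
    name: [links_in_club(links, deg, k) for k in range(max(deg.values()) + 1)]
    for name, links in wirings("n1", "n2", "n3", "n4").items()
}
for name, profile in profiles.items():
    print(name, profile)
```

The disassortative and neutral profiles coincide for every $k$, whereas the assortative profile differs for $k_3 \le k < k_2$, which is why only swaps between the disassortative and neutral wirings leave $\phi(k)$ untouched.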
• Maximal random case II: if a chosen pair of links is assortatively wired, it is discarded and a new pair of links is selected; otherwise the four end nodes are reconnected using either the disassortative wiring or the neutral wiring at random.
• Maximal assortative case II: if a pair of links is not assortatively wired, the four nodes are reconnected using the neutral wiring, which produces a more assortative mixing than the disassortative wiring. The process is repeated for all pairs of links.
• Maximal disassortative case II: if a pair of links is not assortatively wired, the four nodes are reconnected using the disassortative wiring. The process is repeated for all pairs of links.

Table 1 describes the data sets and some of their topological properties. Figure 2(a) shows that the three networks have a power-law decay in $P(k)$. The degree distribution of the Internet is well approximated by $P(k) \sim k^{-\gamma}$, $\gamma \simeq 2.24$ [?], and it exhibits a fat tail where the maximum degree, $k_{max} = 2070$, is larger than the power-law cut-off degree $k_{cut} = 1573$. The degree distributions of the protein interaction and the scientific collaboration deviate from a strict power law and have short tails. Figure 2(b) shows that the scientific collaboration exhibits the assortative mixing behaviour, which is common in social networks. The Internet and protein interaction exhibit the disassortative mixing behaviour, which is typical for technological and biological networks. The mixing behaviours are also confirmed by evaluating the assortative coefficient of the networks (see $\alpha$ in table 1).

Figure 2(c) shows that the three data sets exhibit different rich-club structures. Rich nodes in the disassortative Internet are significantly more tightly interconnected with each other than in the assortative scientific collaboration. Only the Internet contains a rich-club clique where the top 16 richest nodes are fully connected with each other (see $n_{clique}$ in table 1). One can see that an assortative network does not always exhibit a strong rich-club structure, nor does a disassortative network always lack a rich-club structure. Indeed high-degree nodes have very large numbers of links and only a few of them are enough to provide the connectivity to other high-degree nodes, whose number is anyway small [5].

Table 1. Three real networks considered are: (a) the Internet network at the autonomous system (AS) level [5,15,23,24,25,26,27] from data collected by CAIDA [28], in which nodes represent Internet service providers and links represent connections among them; (b) the protein interaction network [6,29] of the yeast Saccharomyces cerevisiae (http://dip.doe-mbi.ucla.edu/); and (c) the scientific collaboration network [30,31], in which nodes represent scientists and a connection exists if they coauthored at least one paper in the archive. The three networks contain multiple components. In this paper we study the giant component of the networks. We show the following properties: the number of nodes $N$ and links $L$ in the giant component, the average degree $\bar{k} = 2L/N$, the power-law exponent $\gamma$ obtained by fitting $P(k)$ with $k^{-\gamma}$ for degrees between 6 (the average degree) and 40, the maximum degree $k_{max}$, the power-law cut-off degree $k_{cut} = N^{1/(\gamma - 1)}$ [9], the assortative coefficient $\alpha$, the rich-club connectivity $\phi(k_{>40})$ between nodes with degrees larger than 40, the rich-club exponent $\theta$ obtained by fitting $\phi(k)$ with $k^{\theta}$ for degrees between 6 and 40, the size of the rich-club clique $n_{clique}$, the average shortest path length $\ell$, and the average shortest path length expected in a random graph $\ell^{*} = \ln N / \ln \bar{k}$ [9].

Internet
A relevant metric of a network is the average shortest path length $\ell$ between all nodes. As shown in table 1, the average shortest path length in the Internet is significantly smaller than the average shortest path length expected in a random graph of the same size. The Internet is so small [32] because it exhibits both a strong rich-club structure and a strong disassortative mixing behaviour. While members of the rich-club are tightly interconnected with each other and collectively function as a 'super' traffic hub for the network, the disassortative mixing ensures that the majority of the network nodes, the peripheral low-degree nodes, are always near the rich-club core. Thus a typical shortest path between two peripheral nodes consists of three hops: the first hop is from the source node to a member of the rich-club, the second hop is between two club members and the final hop is to the destination node. One can see that a combination of the degree-degree correlation and the rich-club connectivity can also explain the distribution of the hierarchical path [33] and the short cycles [20] in a network.

Figure 3 shows the range of the assortative coefficient $\alpha$ obtained by the link rewiring algorithms preserving the degree distribution (case I) against that obtained by preserving both the degree distribution and the rich-club connectivity (case II). The maximal random case of a real network is averaged over 40 surrogate networks, each of which is obtained by repeating the appropriate link rewiring process $1000 \times L$ times, where $L$ is the total number of links in the network.
For case I preserving $P(k)$, the maximal random rewiring of the protein interaction and the scientific collaboration almost decorrelates the networks, and the assortative and disassortative rewiring can produce surrogate networks in the range from assortative to disassortative. This is in contrast to the Internet, where the maximal random case is almost as disassortative as the original data. In fact all the surrogate networks produced by rewiring the Internet are disassortative: the assortative coefficient is always negative and its value is restricted to a very small range. This behaviour of the Internet is due to the restriction of having a finite network that has a power-law decay in its degree distribution and whose maximum degree is larger than the cut-off degree [9,14].

Figure 3. Range of the assortative coefficient $\alpha$ of the three networks under study obtained by the link rewiring algorithms preserving $P(k)$ (case I) compared with that obtained by preserving both $P(k)$ and $\phi(k)$ (case II). The inset shows an enlargement for the Internet. The standard deviation for a maximal random case is smaller than the symbol representing it.
For case II preserving both $P(k)$ and $\phi(k)$, the range of $\alpha$ is narrower than in case I, where only $P(k)$ is fixed. This result confirms the analysis by Krioukov and Krapivsky [34] that the rich-club connectivity constrains a network's degree-degree correlation. In the case of the Internet, the assortative coefficient is restricted to an even smaller range. This observation suggests that a reasonable model of a real network can be produced by modelling the degree distribution and the rich-club connectivity, e.g. the Positive-Feedback Preference (PFP) model [27,35,36] of the Internet.

Figure 4 shows the rich-club connectivity of the three networks, each compared with its three maximal cases I obtained by preserving $P(k)$. The rich-club connectivity changes dramatically due to the rewiring. For all the maximal assortative networks there is a notable increase of $\phi(k)$ throughout all degrees, and all contain a fully connected rich-club clique which consists of nodes with degrees larger than 78, 48 and 46 for the Internet, the protein interaction and the scientific collaboration respectively. For all the maximal disassortative networks there is a complete collapse of the rich-club structure such that not a single link is shared between nodes with degrees larger than 23. This suggests that two networks with the same degree distribution can have very different rich-club connectivity. In other words, the degree distribution does not constrain the rich-club connectivity. The rich-club connectivity is sensitive to the change of a network's degree-degree correlation. For the Internet, a minor change in the assortative coefficient within the narrow range of $\alpha \in (-0.275, -0.218)$ could reverse the rich-club structure completely. This highlights the importance of measuring the rich-club connectivity when evaluating a network model.

Discussion
The maximal random network obtained by preserving $P(k)$ has been used to discern whether the existence of an interaction between two proteins is due to chance or not [6]. To do so, the probability that two nodes share a link in the protein interaction network is compared against the probability that the same two nodes share a link in the maximal random network. The maximal random network is used as a null model because in this case it is almost a decorrelated network (see figure 3).
Recently the maximal random network has also been used as a null model to detect the origin of the rich-club connectivity in real networks [19]. The argument is that if the rich-club connectivity of the original network is the same as that of the maximal random network then the rich-club connectivity was created by chance; otherwise there was an 'organisational principle' responsible for the existence (or the lack) of the rich-club structure. In the case of the Internet, the original network and the maximal random network have similar rich-club connectivity (see the red and short-dashed lines in figure 4(a)), so the conclusion in Refs. [19,38] was that 'hubs in the Internet ... are not tightly interconnected' and 'the Internet does not have an oligarchic structure whereas, for example, scientific collaborations do'. However, as shown in figure 2(c), the Internet does contain a well connected rich-club core and we do not need more statistical analysis to support this observation.
To understand the problem of using the maximal random network of the Internet as a null model, one needs to realise that the maximal random network in this case is not an uncorrelated network. On the contrary, it is a strongly correlated network and is almost as disassortative as the Internet. Rich nodes in both the original network and the maximal random network are tightly interconnected, and the similarity between the rich-club connectivity of the two networks does not imply that the Internet lacks a rich-club structure.
Notice that the maximal random network for the Internet with $P(k)$ and $\phi(k)$ both fixed becomes more disassortative than the original network (and more disassortative than the maximal random network with only $P(k)$ fixed, see the inset in figure 3). This suggests that the rich-club structure depends strongly on the nature of the degree-degree correlation and that it was not formed by chance. This strong dependence on the tail of the degree distribution ($k_{max}$) and the degree-degree correlation has also been noted in the estimates of large cliques that appear in random scale-free networks [39]. A more detailed analysis of the null model of the rich-club connectivity will be published elsewhere.

Conclusions
The rich-club connectivity and the degree-degree correlation describe the global structure of a network from different perspectives. We show that for a given degree distribution, the rich-club connectivity is sensitive to the degree-degree correlation and, conversely, the degree-degree correlation is constrained by the rich-club connectivity. In particular, for the case of the Internet, the assortative coefficient is always negative and a minor change in its value can reverse the network's rich-club structure completely; when both the degree distribution and the rich-club connectivity are fixed, the assortative coefficient is restricted to such a narrow range that a reasonable model of the Internet can be produced by considering mainly the degree distribution and the rich-club connectivity.
We also clarify some misinterpretations that have appeared in the literature which use the maximal random case as a null model to assess the rich-club connectivity in real networks. We remark that some care is needed to avoid reaching misleading conclusions, in particular when studying the Internet.