Paper The following article is Open access

A k-shell decomposition method for weighted networks


Published 24 August 2012 © IOP Publishing and Deutsche Physikalische Gesellschaft
Citation: Antonios Garas et al 2012 New J. Phys. 14 083030, DOI 10.1088/1367-2630/14/8/083030


Abstract

We present a generalized method for calculating the k-shell structure of weighted networks. The method takes into account both the weight and the degree of a network, in such a way that in the absence of weights we recover the shell structure obtained by the classic k-shell decomposition. In the presence of weights, we show that the method is able to partition the network in a more refined way, without the need for any arbitrary threshold on the weight values. Furthermore, by simulating spreading processes using the susceptible-infectious-recovered model in four different weighted real-world networks, we show that the weighted k-shell decomposition method ranks the nodes more accurately, by placing nodes with higher spreading potential into shells closer to the core. In addition, we demonstrate our new method on a real economic network and show that the core calculated using the weighted k-shell method is more meaningful from an economic perspective when compared with the unweighted one.


Content from this work may be used under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

The continuously growing interest in complex network science has resulted over the last few years in novel methods of analysis for a large number of complex systems in various scientific fields [1–7]. The fundamental view of this interdisciplinary approach is that large complex systems can be described as complex networks (or graphs in mathematics terminology) where the nodes (or vertices) represent the system's interacting elements and the links (or edges) represent their interactions. This unified view was used in the analysis of social [7–9], biological [10–13], physiological [14], technological [15, 16], climate [17–19], economic [20–23] and financial systems [24, 25]. In combination with the technological advances that made enormously detailed data available, we are now able to understand and model the evolution of dynamical processes, such as epidemic outbreaks and information spreading [26–30].

Even the earliest empirical works in this field made it clear to researchers that the topology of a network affects its properties. For example, networks with broad degree distributions are more robust to random failures, but are fragile under intentional attacks [31–35]. Nowadays, there is a growing body of literature trying to understand global properties of a network by focusing on the properties of individual nodes, and their connectivity patterns [36]. Of course, the role of individual nodes has a profound relation with the evolution of any dynamical process, and with the evolution of the network itself. For example, very popular individuals in a social network (i.e. individuals with a large number of connections) usually attract more attention and increase their connectivity even more. While it is clear that such processes affect the evolution of the network topology, we can imagine that such individuals could assume key roles in the case of disease spreading, etc.

It is clear that questions such as 'who are the most important nodes in the network?' are natural to ask. Such questions can be addressed using centrality measures, which are the most frequently used measures when it comes to quantitative network analysis. However, there is a variety of centrality measures aiming to address the question of node 'importance'. For example, there is the degree centrality (or just the degree of a node, i.e. the number of its links), the eigenvector centrality [37], the betweenness centrality [38], the closeness centrality [39], etc. In this paper, we focus on a centrality measure based on the notion of k-cores which is a fundamental concept in graph theory [40] when it comes to ranking the centrality of nodes in a complex network. Such a ranking was applied in many real networks [21, 41–48], allowing a thorough investigation of their structure, while highlighting the role of various topology-dependent processes.

One major limitation of most centrality measures, including the k-core decomposition method, is that they are designed to work on unweighted graphs. However, in practice, real networks are weighted, and their weights describe important and well-defined properties of the underlying systems. In a weighted network, nodes have (at least) two properties that can characterize them, their degree and their weight. Since weights are properties of the network's links, the node's weight is calculated as the sum of the weights of all links attached to that node. These two properties, even though in some cases they are correlated, are in general independent. As a result, nodes with high degree can have small weight (i.e. they have many connections to other nodes but the links of these connections have small weights), while there could also be nodes with small degree and high weight. Situations where the weights play an important role occur, for example, in economic or trade networks. In such networks, the weights are related to some measured property (such as trade flow, capital flow, etc), and in many cases one wishes to focus on nodes with high weights that are (usually) the most important players. Thus, in such systems the presence of nodes with high degree and relatively small weights may influence the results obtained by methods that are based only on the degree. In such cases, two main approaches have been used, with both having their own drawbacks. Under the first approach, one completely neglects the weights and performs the analysis on the unweighted network, but in doing so one chooses to neglect an important property of the network. The second approach is to consider only links with weights above some (usually arbitrarily chosen) threshold value and filter out the rest. The drawback of this approach is the selection of a proper cutoff value, which may remove important high degree nodes with links of low weights (below the threshold), and as we will discuss later, this could have significant impact on the results. Additionally, by neglecting links below a threshold, the network becomes sparser with some nodes getting disconnected and not considered by the applied method afterwards.

In this paper, we aim to overcome these limitations by introducing a generalized method for calculating the k-shell structure of weighted networks. The paper is organized as follows. First we discuss the standard k-shell decomposition method, and right after that we introduce our generalized version. Next we apply both methods on real networks and present their results. Subsequently, we compare in more detail the performance of both methods in ranking nodes according to their importance when it comes to spreading processes, and finally we present the conclusions.

2. The unweighted k-shell decomposition method

The k-core/k-shell decomposition method partitions a network into sub-structures that are directly linked to centrality [49]. This method assigns an integer index, ks, to each node that is representative of the location of the node in the network, according to its connectivity patterns. Nodes with low/high values of ks are located at the periphery/center of the network. This way, the network is described by a layered structure (similar to the structure of an onion), revealing the full hierarchy of its nodes. The innermost nodes belong to a structure called the core or 'nucleus' of the network, while the remaining nodes are placed into more external layers (k-shells).

A more detailed description of how a network is divided into this k-shell structure is as follows (see figure 1). First we remove recursively from the network all nodes with degree k = 1, and we assign the integer value ks = 1 to them. This procedure is repeated iteratively until there are only nodes with degree k ⩾ 2 left on the network. Subsequently, we remove all nodes with degree k = 2 and assign to them the integer value ks = 2. Again, this procedure is repeated iteratively until there are only nodes with degree k ⩾ 3 left on the network, and so on. This routine is applied until all nodes of the network have been assigned to one of the k-shells. This is how the original k-shell decomposition method works, which, as described above, does not consider the weights of the links at all; therefore, from now on we will call it the unweighted k-shell decomposition method (Uk-shell).
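The pruning routine described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation; the function name `k_shell` and the edge-list input format are assumptions made for the example.

```python
from collections import defaultdict


def k_shell(edges):
    """Return {node: ks} for an undirected, unweighted edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            adj[u].add(v)
            adj[v].add(u)

    degree = {n: len(nbrs) for n, nbrs in adj.items()}
    remaining = set(adj)
    shell = {}
    k = 1
    while remaining:
        # Keep stripping nodes whose current degree is <= k until none are left.
        to_remove = [n for n in remaining if degree[n] <= k]
        while to_remove:
            for n in to_remove:
                if n not in remaining:
                    continue
                shell[n] = k
                remaining.discard(n)
                for nb in adj[n]:
                    if nb in remaining:
                        degree[nb] -= 1
            to_remove = [n for n in remaining if degree[n] <= k]
        k += 1
    return shell


# Hypothetical toy graph: a 4-clique (1-4), node 5 tied to two clique
# members, and node 6 hanging off node 5.
toy_edges = [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4),
             (5, 1), (5, 2), (6, 5)]
toy_shells = k_shell(toy_edges)
```

In this toy graph node 6 falls in shell 1, node 5 in shell 2, and the clique forms the core with ks = 3.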


Figure 1. Illustration of the layered structure of a network, obtained using the k-shell decomposition method. The nodes between the two outer rings compose shell 1 (ks = 1), while the nodes between the two inner rings compose shell 2 (ks = 2). The nodes within the central ring constitute the core, in this case ks = 3.


3. The weighted k-shell decomposition method

Here we propose a generalization of the k-shell decomposition method, which we call the weighted k-shell decomposition method (Wk-shell). This method applies the same pruning routine that was described earlier, but is based on an alternative measure for the node degree. This measure considers both the degree of a node and the weights of its links, and we assign to each node a weighted degree, k'. The weighted degree of a node i is defined as

$k'_{i}=\left[k_{i}^{\alpha }\left(\sum _{j}^{k_{i}}{w_{ij}}\right)^{\beta }\right]^{\frac{1}{\alpha +\beta }}, \qquad (1)$

where ki is the degree of node i and $\sum _{j}^{k_{i}}{w_{ij}}$ is the sum over all its link weights. In this paper, we discuss only the case when α = β = 1, which treats the weight and the degree equally. The full exploration of the parameter space is beyond our scope and is left for future work. Therefore, for what follows, $k'_{i}=\sqrt {k_{i} \sum _{j}^{k_{i}}{w_{ij}}}$ .
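As a small numerical illustration of the weighted degree, the sketch below evaluates k' from a node's list of link weights. The helper name `weighted_degree` is hypothetical, and the general form with exponents α and β, taken here as $[k^{\alpha }(\sum w)^{\beta }]^{1/(\alpha +\beta )}$, is an assumption chosen so that α = β = 1 reduces to the square-root expression given in the text.

```python
def weighted_degree(link_weights, alpha=1.0, beta=1.0):
    """Weighted degree k' of one node, given the weights of its links.

    Assumed general form: (k**alpha * (sum of weights)**beta) ** (1 / (alpha + beta)).
    With alpha = beta = 1 this is sqrt(k * sum of weights), as in the text.
    """
    k = len(link_weights)
    if k == 0:
        return 0.0
    return (k ** alpha * sum(link_weights) ** beta) ** (1.0 / (alpha + beta))


# Three unit-weight links: k' equals the plain degree.
k_unit = weighted_degree([1, 1, 1])
# One link of weight 3: k' = sqrt(1 * 3), larger than the plain degree of 1.
k_strong = weighted_degree([3])
```

With unit weights the weighted degree coincides with the ordinary degree, while a single strong link lifts k' above 1.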

Using the above approach in the case of unweighted networks, where wij = 1, the weighted degree is equivalent to the node degree (k' ≡ k), and we recover the same network partitioning as with the Uk-shell decomposition method. However, in order that a typical weighted link be regarded as of unit weight before we calculate k' using equation (1), we perform the following steps. First, we normalize all the weights by their mean value 〈w〉; next we divide the resulting weights by their minimum value; finally, we discretize them by rounding to the closest integer. This way the minimum link weight is equal to 1 (see footnote 4).
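The three preprocessing steps can be sketched as follows. The function name `normalize_weights` and the dictionary input are assumptions for illustration, and the weights are assumed to be strictly positive. Rounding half up is used here instead of Python's built-in `round`, which rounds halves to the nearest even integer.

```python
import math


def normalize_weights(weights):
    """Map {link: weight} to integer weights whose minimum is 1.

    Steps: divide by the mean, divide by the resulting minimum,
    then round to the closest integer. Assumes all weights are > 0.
    """
    mean_w = sum(weights.values()) / len(weights)
    scaled = {link: w / mean_w for link, w in weights.items()}
    min_w = min(scaled.values())
    return {link: int(math.floor(w / min_w + 0.5)) for link, w in scaled.items()}


# Hypothetical raw weights on three links.
raw = {("a", "b"): 0.5, ("b", "c"): 1.0, ("a", "c"): 1.7}
discrete = normalize_weights(raw)
```

For positive weights the mean cancels in the second step, so the outcome equals dividing by the minimum weight directly.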

In figure 1, we illustrate schematically the layered structure obtained by applying the Uk-shell decomposition method in a graph. In order to highlight the weaknesses of the unweighted method, let us suppose that the network is weighted. For simplicity, we assume that all link weights are equal to 1, except for the weight of the link between nodes A and B, which is wAB = 3. As illustrated in figure 1, the node B is located at the periphery of the network, even though it is strongly connected to one of the core nodes. In real networks such a strong link (three times the capacity of other links) means that this particular node is of more importance for the core, but this is not depicted in the layered structure calculated by the classical unweighted approach, since this node will be placed in the outermost shell (ks = 1). However, if we apply the Wk-shell decomposition method, then node B is assigned to ks = 2 that is one shell away from the core of the network, highlighting its actual importance.
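The effect described in this example can be reproduced on a small hypothetical graph (not the graph of figure 1). The sketch below prunes nodes by their weighted degree with α = β = 1, recomputed on the links that remain. Rounding k' to the closest integer before comparing it with the shell index is an assumption of this sketch; the function name `weighted_k_shell` is likewise made up.

```python
import math


def weighted_k_shell(weighted_edges):
    """Return {node: ks} using k' = sqrt(k * sum of link weights).

    Assumption: k' is rounded to the closest integer before pruning.
    """
    adj = {}
    for u, v, w in weighted_edges:
        adj.setdefault(u, {})[v] = w
        adj.setdefault(v, {})[u] = w

    def k_prime(node, remaining):
        ws = [w for nb, w in adj[node].items() if nb in remaining]
        return int(math.floor(math.sqrt(len(ws) * sum(ws)) + 0.5))

    remaining = set(adj)
    shell = {}
    while remaining:
        # The next shell index is the smallest weighted degree still present.
        k = min(k_prime(n, remaining) for n in remaining)
        while True:
            peel = [n for n in remaining if k_prime(n, remaining) <= k]
            if not peel:
                break
            for n in peel:
                shell[n] = k
            remaining.difference_update(peel)
    return shell


# Hypothetical graph: core clique A, C, D, E with unit weights;
# B hangs off A with weight 3, F hangs off C with weight 1.
edges = [("A", "C", 1), ("A", "D", 1), ("A", "E", 1),
         ("C", "D", 1), ("C", "E", 1), ("D", "E", 1),
         ("A", "B", 3), ("C", "F", 1)]
shells = weighted_k_shell(edges)
```

Here B and F both have degree 1, so the unweighted method would place both in shell 1. With weights, F stays in shell 1 while B moves to shell 2, one shell away from the core at ks = 3.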

4. Application to real networks

In order to compare the results obtained from the Uk-shell and the Wk-shell decomposition method, we used as case studies the following four real networks:

  • (i) Corporate ownership network (CON). This is an economic network linking 206 different countries. It is constructed [21] using the 616 000 direct or indirect subsidiaries of the 4000 world corporations with the highest turnover, based on the 2007 version of the ORBIS database obtained from the Bureau van Dijk Electronic Publishing (BvDEP) (see footnote 5). The network is weighted, and its weights represent the business ties among countries [21].
  • (ii) The collaboration network of scientists working in network science (SCIE). This network contains the co-authorship relations of scientists working on network theory and experiment, as compiled by Newman [50]. The network is weighted, and its weights are assigned as described in [51].
  • (iii) The neural network of the nematode C. elegans (CEL). This network was compiled by Watts and Strogatz [52] using the original experimental data of White et al [53]. It is a weighted representation of the neural network of C. elegans.
  • (iv) The US air transportation network (AIR). This is a weighted network obtained by considering the 500 US airports with the largest amount of traffic from publicly available data [54]. Nodes represent US airports and edges represent air travel connections among them. It reports the anonymized list of connected pairs of nodes and the weight associated with the edge, expressed in terms of the number of available seats on the given connection on a yearly basis.

In table 1, we provide some detailed statistical properties of the above networks. For our analysis, if not stated otherwise, when we talk about the network we refer to the largest connected component (LCC), and whenever we discuss network properties these are calculated from the LCC.

Table 1. Statistical properties of the networks used in our analysis. Here $N_{N}$ is the number of nodes, $N_{E}$ is the number of edges, 〈k〉 is the average degree of the network nodes, d is the diameter, C is the clustering coefficient [52] and B is the network's betweenness [38, 55]. If the original network is disconnected, we only consider its LCC.

Network   N_N   N_E    〈k〉     d    C      B
CON       206   2886   28.0    4    0.38   94.6
SCIE      379   914    4.82    17   0.43   952.9
CEL       297   2345   15.8    5    0.18   215.4
AIR       500   2980   11.92   7    0.35   496.7

In table 2, we compare the network hierarchies obtained by applying the Uk-shell and the Wk-shell decomposition method. We observe that the Wk-shell method yields a more refined partitioning (a larger number of k-shells) of the networks. This means that by applying this method we obtain more detailed information about the networks' internal structure, which is similar to using a high-resolution microscope to observe small-size structures of a larger system.

Table 2. Comparison of the network hierarchies obtained by the Uk-shell and Wk-shell decomposition methods. Here $s^{U}$ and $s^{W}$ are the total numbers of k-shells, while $n_{c}^{U}$ and $n_{c}^{W}$ are the total numbers of nodes in the cores obtained using the Uk-shell and the Wk-shell, respectively. $N_{C}$ is the number of common nodes in both cores, $N_{UW}$ is the fraction of nodes of the core obtained by the Uk-shell that also belong to the core obtained by the Wk-shell and $N_{WU}$ is the fraction of nodes of the core obtained by the Wk-shell that also belong to the core obtained by the Uk-shell.

Network   s^U   s^W   n_c^U   n_c^W   N_C   N_UW   N_WU
CON       28    87    41      11      11    0.27   1
SCIE      8     10    9       13      9     1      0.69
CEL       10    21    119     26      26    0.22   1
AIR       29    257   35      31      28    0.8    0.9
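The overlap columns of table 2 follow directly from the two core node sets. As a minimal sketch, with a hypothetical helper `core_overlap` and placeholder node labels (only the set sizes are taken from the CON row):

```python
def core_overlap(core_u, core_w):
    """Return (common count, fraction of U-core in W-core, fraction of W-core in U-core)."""
    common = core_u & core_w
    return len(common), len(common) / len(core_u), len(common) / len(core_w)


# Placeholder labels reproducing the sizes of the CON row:
# 41 nodes in the unweighted core, 11 in the weighted core, all 11 shared.
core_u = set(range(41))
core_w = set(range(11))
n_common, frac_uw, frac_wu = core_overlap(core_u, core_w)
```

With these sizes the fractions are 11/41 ≈ 0.27 and 11/11 = 1, matching the CON entries.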

Furthermore, for three out of the four studied networks the core obtained with the Wk-shell contains a smaller number of nodes, while these nodes are almost entirely part of the core obtained by the Uk-shell. This means that the weighted method in most cases is able to split the cores obtained by the unweighted method further and to identify which are the most central of the central nodes.

In figure 2, we plot the degrees of the nodes according to the k-shell they belong to (expressed as the distance from the core of the network). The node ranking is obtained using the Wk-shell method for all the four different networks described above. As shown in figure 2, the degree is highly (and nonlinearly) correlated with the position of the node in the k-shell structure, but there are particular cases where the trend is not monotonic. This means that there are nodes with high degree that may not be as central to the network as one would expect; this is in line with our discussion for the example network of figure 1.


Figure 2. The average degree of all nodes in each shell, obtained using the Wk-shell decomposition method. The shaded area highlights the full range of the degree values in each shell. The shells are ranked according to their distance from the core, and the error bars are showing the standard deviation. Insets: zoom to distances closer to the core for networks with a large number of shells.


4.1. A detailed example: analysis of the core of an economic network

Next we compare the core of the Uk-shell and the Wk-shell decomposition methods applied to the global CON studied in [21]. The CON connects 206 countries around the globe, using as links the ownership relations within large companies. If companies listed in country A have subsidiary corporations in country B, there is a link connecting these two countries directed from country A to country B. The weight of the link, wAB, equals the number of the subsidiary corporations in country B controlled by companies of country A.

Using the Uk-shell decomposition method, as shown in table 2 and figure 3, we identify a core of 41 countries. However, we expect that in the current state of the global economy, a smaller set of countries are the major players (G8, G20, etc). In order to reduce the size of the core and to highlight which are the potentially more important nodes of this network by using the classic k-shell decomposition method, a cutoff value of wc = 100 was assumed by Garas et al [21]. It was shown that the network remaining after filtering out the links with weight below wc = 100 contains only 66 out of the original 206 nodes. Within this reduced network, a core formed by the following 12 countries was identified: the United States of America (US), the United Kingdom (GB), France (FR), Germany (DE), the Netherlands (NL), Japan (JP), Sweden (SE), Italy (IT), Switzerland (CH), Spain (ES), Belgium (BE) and Luxembourg (LU). In figure 3 the evolution of the core and network size of the CON is shown, as a function of the weight cutoff value wc.
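To make the thresholding step concrete, the sketch below drops every link whose weight is below a cutoff and reports which nodes survive. The function name `filter_by_cutoff` and the toy weights are invented for illustration and are not data from the CON.

```python
def filter_by_cutoff(weighted_edges, w_c):
    """Keep links with weight >= w_c; return the kept links and surviving nodes."""
    kept = [(u, v, w) for u, v, w in weighted_edges if w >= w_c]
    nodes = {n for u, v, _ in kept for n in (u, v)}
    return kept, nodes


# Invented toy network with five nodes.
toy = [("a", "b", 150), ("a", "c", 120), ("c", "d", 40), ("b", "e", 90)]
kept_links, kept_nodes = filter_by_cutoff(toy, 100)
```

In this toy case nodes d and e lose all their links and vanish from the filtered network, which is the drawback of the cutoff approach noted in the introduction.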


Figure 3. Changes in the CON network structure when using different weight cutoff values wc. Panels (A)–(C) show the network snapshots around the central region for wc = 3, 75 and 150, respectively. The size of the nodes is proportional to their degree. (D) Evolution of the core size as a function of wc (after Garas et al [21]). (E) Fraction of nodes and links of the original network that remain for different wc values.


Using the Wk-shell decomposition method, we obtain the layered structure of the network including all the 206 nodes, without using any arbitrary cutoff parameter. The core of the network obtained with this method consists of the following 11 countries: US, GB, FR, DE, NL, JP, Canada (CA), IT, CH, ES and BE. Comparing these two cores we find a striking similarity. The only differences are that CA appears in the core calculated using our new weighted k-shell approach, while SE and LU have moved to the second innermost layer. These differences can be well understood considering that CA is a major economy; it is part of G7, and all the other six members of G7 are already part of the core. Furthermore, CA outperforms SE and LU in terms of population and other macroeconomic indicators, such as total imports/exports and GDP. It is thus natural to conclude that the core obtained using the Wk-shell decomposition method is more meaningful from an economic perspective, since it groups together some of the largest (developed) global economies.

5. Dynamics: shell positioning and spreading potential

In recent years, models such as the susceptible-infectious-recovered (SIR) model [56] have been used extensively in network research in order to explore epidemic spreading [27, 56–58], economic crisis spreading [21] as well as information and rumor spreading [26, 28] in social processes. In such processes the topology of the network is not the only thing that matters; the position of the node where the spreading begins plays an important role as well. In the recent work of Kitsak et al [48], it was shown that the spreading power of a node cannot be predicted solely based on its degree. A better measure is its actual position in the network, as it is described by the k-shell where it belongs.

Using this perspective, it is reasonable to assume that a k-shell partitioning method provides us with a more accurate node ranking for representing the nodes' spreading power. In addition, since the individual nodes are grouped in k-shells, it is reasonable to assume that every k-shell should contain nodes with similar spreading power. In what follows, we will use these assumptions to evaluate and compare the performance of the Uk-shell and Wk-shell decomposition methods.

We modeled the spreading process by applying the SIR model on all the networks described above. However, since we are interested in the weights of the network, we used a version of the SIR model which takes into account the weight of the links that mediate the spreading. This model was originally introduced to simulate the spreading of an economic crisis [21]; for this model the probability of infection is different for every link and is calculated by

$p_{ij}\propto m\,\frac {w_{ij}}{\tilde {w}_{j}}, \qquad (2)$

where wij is the weight of the link that connects the origin node i with the destination node j, and $\tilde {w}_{j}$ is the total weight ($\tilde {w}_{j}=\sum _{i}w_{ij}$ ) of the destination node j. The factor m is a free amplification parameter that can determine, for example, the severity of a crisis, how infectious a virus is, the importance of a rumor, etc. In what follows, we will call this model weighted SIR (W-SIR).

The modeling procedure of the W-SIR is the following. Initially, we assign all nodes to be susceptible (S) to an infection. Next, one node, i, is chosen and is assumed to be infected (I). This node will infect all its neighboring nodes with probability pij during the first time step. This causes all infected nodes to switch their status from S to I, while the node that initiated this process changes to the recovered state (R), and can no longer infect other nodes or become infected. At every consecutive time step the process is repeated, and all the infected nodes are trying to infect their susceptible (S) neighbors in the network. The process lasts until there are no infected nodes left in the network.
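A single realization of this procedure might look like the sketch below. It is not the authors' code: the function name `w_sir` and the dict-of-dicts adjacency format are made up, and the infection probability is assumed to be m·wij divided by the total weight of node j, capped at 1.

```python
import random


def w_sir(adj, seed, m, rng):
    """Run one W-SIR realization and return the final infected fraction.

    adj: {node: {neighbor: weight}} for an undirected weighted network.
    Assumption: p_ij = min(1, m * w_ij / total weight of node j).
    """
    total_w = {j: sum(adj[j].values()) for j in adj}
    state = {n: "S" for n in adj}
    state[seed] = "I"
    infected = [seed]
    while infected:
        newly_infected = []
        for i in infected:
            for j, w in adj[i].items():
                if state[j] == "S" and rng.random() < min(1.0, m * w / total_w[j]):
                    state[j] = "I"
                    newly_infected.append(j)
            state[i] = "R"  # each node spreads for one time step, then recovers
        infected = newly_infected
    return sum(1 for s in state.values() if s == "R") / len(adj)


# Hypothetical three-node chain a - b - c.
chain = {"a": {"b": 2}, "b": {"a": 2, "c": 1}, "c": {"b": 1}}
fraction = w_sir(chain, "a", m=1.0, rng=random.Random(42))
```

Averaging the returned fraction over many runs started from the same seed gives that node's spreading score. Every node that was ever infected ends in state R, so the count of recovered nodes includes the seed.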

For each individual node we performed 100 realizations of the W-SIR model, and we calculated the average infected fraction of the network for different values of m∈[0,10]. This fraction is used as a score in order to rank the nodes according to their spreading potential. We restricted ourselves to values of m in this interval, as for much larger m values the role of individual nodes is no longer important, and an epidemic outbreak emerges no matter where the infection starts. Next, we partitioned the network using the Uk-shell and the Wk-shell decomposition methods, and ranked the obtained k-shells according to their distance from the core. By calculating the average infected fraction that results from an epidemic starting separately from all nodes of every individual k-shell, we estimated the shell's spreading potential.

In figure 4, we study how the average infected fraction changes versus the distance of each k-shell from the core of the network for both methods. We find that, in general, the central k-shells obtained by the Wk-shell method are more able to initiate a severe outbreak in comparison with the central k-shells obtained using the Uk-shell method. This result is robust for all networks used in this study, and for different values of the parameter m. The above finding means that the Wk-shell decomposition method positions the nodes with the higher average spreading potential in shells closer to the core.


Figure 4. Average infected fraction of a k-shell versus the shell's distance from the core of the network.


Next, we tested how homogeneous the obtained k-shells are with respect to the spreading potential of the nodes they contain. To do so, for a given value of the parameter m we calculated, for every k-shell, the standard deviation, σ, of the infected fraction (spreading potential) of its nodes. Next we calculated the average value over all the shells, 〈σ〉, and we plotted it versus m (figure 5). We find that the average standard deviation of the spreading potential using W-SIR is always lower when we partition the network using the Wk-shell method than when we partition it using the Uk-shell method. This means that the Wk-shell method gives more homogeneous k-shells, where all nodes in the shell have similar importance for the dynamical process in question.
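The shell-level statistics can be computed from two dictionaries, one mapping nodes to shells and one mapping nodes to their average infected fraction. The sketch below is illustrative only: the name `shell_stats` is hypothetical, and the use of the population standard deviation (rather than the sample one) is an assumption, since the text does not say which was used.

```python
from statistics import mean, pstdev


def shell_stats(shell, score):
    """Return ({ks: (mean score, std of score)}, average std over shells).

    shell: {node: ks}; score: {node: average infected fraction}.
    Assumption: population standard deviation within each shell.
    """
    groups = {}
    for node, ks in shell.items():
        groups.setdefault(ks, []).append(score[node])
    per_shell = {ks: (mean(vals), pstdev(vals)) for ks, vals in groups.items()}
    avg_sigma = mean(sigma for _, sigma in per_shell.values())
    return per_shell, avg_sigma


# Invented scores for four nodes in two shells.
toy_shell = {"a": 1, "b": 1, "c": 2, "d": 2}
toy_score = {"a": 0.1, "b": 0.3, "c": 0.5, "d": 0.5}
per_shell, avg_sigma = shell_stats(toy_shell, toy_score)
```

The per-shell mean corresponds to the shell's spreading potential, and the average of the per-shell standard deviations corresponds to 〈σ〉.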


Figure 5. Average over all shells of the standard deviation of the spreading potential of nodes within a k-shell, 〈σ〉, versus m.


As a final step, and given that the Wk-shell method performs better in positioning the nodes according to their W-SIR spreading potential in weighted graphs, it is interesting to further explore the role of the weights in this process. To do so, we created ten realizations of the CON network with shuffled weights, and we performed 100 runs of the W-SIR model on every one of these ten networks. Next, we calculated the average spreading potential per k-shell using the infected fraction obtained by the implementation of W-SIR on the network with shuffled weights. As shown in figure 6, in the shuffled case the k-shells become significantly more inhomogeneous, and their 〈σ〉 is always larger than the 〈σ〉 obtained for the original, unshuffled network. This procedure highlights the role of the weights in the process, since if the weights did not play any role these two curves would collapse into one.
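One simple way to build such a shuffled realization is to permute the weights among the existing links while leaving the topology untouched. The text does not spell out the shuffling procedure, so this is an assumed reading, and the name `shuffle_weights` is hypothetical.

```python
import random


def shuffle_weights(weighted_edges, rng):
    """Return a copy of the edge list with the weights randomly permuted.

    Assumption: links stay in place and only their weights are reassigned.
    """
    weights = [w for _, _, w in weighted_edges]
    rng.shuffle(weights)
    return [(u, v, w) for (u, v, _), w in zip(weighted_edges, weights)]


# Invented toy edge list.
original = [("a", "b", 5), ("b", "c", 1), ("c", "d", 3), ("a", "d", 2)]
shuffled = shuffle_weights(original, random.Random(7))
```

This keeps every node's degree and the overall weight distribution fixed, so any change in 〈σ〉 comes from where the weights sit rather than from the topology.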


Figure 6. Comparison of 〈σ〉 versus m for two different configurations of the CON. Wk-shell—W-SIR is the original case (also shown in figure 5) where the nodes' spreading potential is obtained by applying the W-SIR in the original network. Wk-shell—(Sh)W-SIR is a case when we calculated the nodes' spreading potential by applying the W-SIR on the ten realizations of the CON with shuffled weights.


6. Conclusion

In summary, we presented a generalized k-shell decomposition method (Wk-shell) that considers the link weights of networks, without applying any arbitrary cutoff threshold on their value. The method recovers the shell structure obtained by the classic k-shell decomposition in the absence of weights, but when weights are present, it is able to partition the network in a more refined way. In its general formulation, our method allows us to vary the importance assigned to either the node weights or the node degree, by adjusting the exponents α and β of equation (1). While in this paper we did not fully explore the parameter space, we would like to stress that this additional flexibility could provide a more accurate ranking for various applications. Here, using α = β = 1 we showed that the partitioning obtained by the Wk-shell method is particularly meaningful in terms of the spreading potential of the nodes. We applied a weighted version of the SIR model to four different networks, and showed that, compared with the Uk-shell method, the Wk-shell method places nodes with higher spreading potential in the core or in shells closer to the core.

Acknowledgments

SH thanks the European EPIWORK and LINC projects, the Israel Science Foundation, the Office of Naval Research (ONR), the Deutsche Forschungsgemeinschaft (DFG) and the Defense Threat Reduction Agency (DTRA) for financial support. AG acknowledges financial support from the Swiss National Science Foundation (project no. 100014 126865).

Footnotes

  • (4) We also tested the effect of the normalization by dividing it by the minimum weight, and the results we obtained in terms of node positioning with or without the normalization were similar.

  • (5) Bureau van Dijk Electronic Publishing (BvDEP), http://www.bvdep.com/.
