Structure and growth of weighted networks

We develop a simple theoretical framework for the evolution of weighted networks that is consistent with a number of stylized features of real-world data. In our framework, the Barabási–Albert model of network evolution is extended by assuming that link weights evolve according to a geometric Brownian motion. Our model is verified by means of simulations and real-world trade data. We show that the model correctly predicts the intensity and growth distribution of links, the size–variance relationship of the growth of link weights, the relationship between the degree and strength of nodes, and the scale-free structure of the network.


Introduction
Graph theory has been used to describe a vast array of real-world phenomena, but only recently has attention shifted from binary to weighted graphs, from both an empirical [1,2,3,4,5,6] and a theoretical perspective [7,8,9]. The empirical literature has established a number of robust stylized facts that apply to phenomena as different as Internet traffic, airport connections, and international trade. In particular, it has been shown that weighted graphs display (i) a power-law connectivity distribution P(K), with a finite-size truncation [10,3]; (ii) skewed distributions of link weights P(w) and of node strengths P(W), where the strength of a node is the sum of the weights of its links [11,12]; and (iii) a power-law relation between node strength W and node degree K, $W \propto K^{\theta}$, with θ ranging between 1.3 and 1.5 [10,13].
In this paper we present a simple stochastic model of proportionate growth of both the number and the weight of links that describes the structure and evolution of weighted networks and accounts for the regularities listed above. In our setup we extend the Barabási–Albert (BA) model [14] to weighted network dynamics. This is done by exploiting the theoretical framework recently put forward by Stanley and co-authors to explain the scaling distribution of fluctuations in complex systems [15,16,17].
We test our model using data on the network of international trade flows, a prototypical example of a real-world network that is inherently weighted. International trade flows have traditionally been analyzed in the context of the so-called gravity model [18], which relates bilateral flows to the size of countries and the distance between them. However, one of the main limitations of this approach is its inability to capture the large fraction of zeros in the matrix of bilateral links. Although this issue has recently been addressed within standard economic theory [19], graph theory has been applied because it naturally accommodates this feature of the data.
We selected the international trade network (ITN) as a testbed for our model for the following reasons. First, the ITN has already been extensively investigated [5,6,20,21,22,23,24], and previous work on the ITN provides a rich set of empirical regularities. For instance, we know that the link weight distribution of the ITN is log-normal [23,24], whereas the growth rates of link weights display fat tails [24]. Second, the relationship between node strength and degree is crucial in the economic literature on the ITN, since it is related to the interplay between the intensive and extensive margins of trade, which is key to explaining trade flows [25]. Third, despite the structural inertia of the ITN, the huge volatility of trade flows after the 2008 global financial crisis has recently attracted a great deal of attention. Our theoretical framework provides an explanation for the relationship between node centrality and the variance of network flows.
The paper is organized as follows. Section 2 presents the model and its most important predictions. We then test the model using data on the ITN (Section 3) and simulations (Section 4). Finally, in the last section we draw some conclusions and outline possible directions for future research.

The model
Barabási and Albert [14] proposed a simple stochastic model of network growth based on preferential attachment, which accounts for many of the stylized facts observed in real-world networks. The increasing interest in weighted networks calls for an extension of the original BA model that accounts for the large heterogeneity of link weights [7,9]. The route we take here exploits the theoretical framework recently put forward by Stanley and co-authors [16] to deal with the growth dynamics of complex systems. We show that our model accurately matches the structural properties that characterize a number of real-world weighted networks.
We therefore propose a generalized version of the BA model to describe the dynamics and growth of weighted networks, modeling them as a set of links of different weights between nodes. In particular, we assume that link weights grow according to a geometric Brownian motion (also known as Gibrat's law of proportionate effect [26]), so that the expected growth rate of a link weight is independent of its current level.
The key assumptions of the model are the following [14,27,16]:
1. The network begins at time t = 0 with $N_0$ nodes, each with a self-loop. At each time step t ∈ {1, …, M}, a new link between two nodes arises, so the number of links at time t (excluding the self-loops, which are used only for initialization) is $m_t = t$. We write $K_i(t)$ for the number of links of node i at time t (the node degree). The nodes connected by the link formed at time t are identified as follows. With probability a, the new link is assigned to a new source node, whereas with probability 1 − a it is allocated to an existing node i. In the latter case, node i is chosen with probability $p_i(t) = K_i(t-1)/(2t)$. The other endpoint j is chosen symmetrically, with j ≠ i: with probability a the new link is assigned to a new target node, while with probability 1 − a it is allocated to an existing node with probability $p_j(t) = K_j(t-1)/(2t - K_i(t-1))$ if j ≠ i and $p_j(t) = 0$ otherwise. Hence, at each time t this rule identifies the pair of distinct nodes to be linked.
2. At time t each existing link between nodes i and j has weight $w_{ij}(t) > 0$, where $K_i$, $K_j$ and $w_{ij}$ are independent random variables. At time t + 1 the weight of each link is increased or decreased by a random factor $x_{ij}(t)$, so that $w_{ij}(t+1) = w_{ij}(t)\, x_{ij}(t)$. The shocks and the initial link weights are drawn from distributions with finite mean and standard deviation.
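The link-formation rule of Assumption 1 can be sketched as a short simulation. This is a minimal illustration, not the authors' code: the function name `grow_network`, counting each initial self-loop as one unit of degree, and normalizing by the current total degree (which the text approximates by 2t) are our assumptions.

```python
import random


def grow_network(n0, m, a, seed=0):
    """Hypothetical sketch of Assumption 1: add one link per time step.

    n0 -- number of initial nodes (n0 >= 2); each starts with a self-loop,
          counted here as one unit of degree (a bookkeeping assumption)
    m  -- number of time steps, i.e. links added after initialization
    a  -- probability that an endpoint of the new link is a brand-new node
    Returns (degrees, edges); edges excludes the initial self-loops.
    """
    if n0 < 2:
        raise ValueError("need at least two initial nodes")
    rng = random.Random(seed)
    degrees = [1] * n0
    edges = []

    def pick_endpoint(exclude=None):
        # With probability a, the endpoint is a new node entering the network.
        if rng.random() < a:
            degrees.append(0)
            return len(degrees) - 1
        # Otherwise pick an existing node with probability proportional to its
        # degree; the already chosen endpoint gets zero weight, so i != j.
        weights = [0 if k == exclude else d for k, d in enumerate(degrees)]
        return rng.choices(range(len(degrees)), weights=weights)[0]

    for _ in range(m):
        i = pick_endpoint()
        j = pick_endpoint(exclude=i)
        degrees[i] += 1
        degrees[j] += 1
        edges.append((i, j))
    return degrees, edges
```

With a = 0 the node count stays fixed at n0, which is the no-entry regime discussed next; a histogram of the returned degrees is the simulated counterpart of P(K).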
Thus we assume that each link weight grows in time according to a random multiplicative process. Moreover, the two processes governing link formation and weight growth are assumed to be independent. We therefore combine a preferential attachment mechanism (Assumption 1) with an independent geometric Brownian motion of link weights (Assumption 2). In this way we obtain a generalization of the BA setup capable of accounting for the growth of weighted networks.
Based on the first assumption we derive the degree distribution P(K) [14,28]. In the absence of entry of new nodes (a = 0), the probability distribution of the number of links at large t, i.e. the distribution P(K), is exponential:
$$P(K) \approx \frac{1}{\bar{K}} \exp\left(-\frac{K}{\bar{K}}\right) \tag{1}$$
where $\bar{K} = 2t/N_0$ is the average number of links per node, which grows linearly with time. If a > 0, P(K) becomes a Yule distribution, which behaves as a power law for small K:
$$P(K) \sim K^{-\varphi},$$
where $\varphi = 2 + a/(1-a) \geq 2$, followed by the exponential decay of Eq. (1) for large K. Hence, in the limit of large t, when a = 0 (no entry) P(K) converges to an exponential, whereas when a > 0 and small the connectivity distribution converges to a power law with an exponential cutoff [15].
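The two limiting forms of P(K) can be evaluated directly. A small sketch with hypothetical helper names, assuming only the exponential form $P(K) \approx \exp(-K/\bar{K})/\bar{K}$ with $\bar{K} = 2t/N_0$ for a = 0, and the Yule exponent φ = 2 + a/(1 − a) for a > 0:

```python
import math


def mean_degree(t, n0):
    """Average number of links per node without entry: K_bar = 2t / N0."""
    return 2 * t / n0


def exponential_pk(k, k_bar):
    """Large-t degree distribution for a = 0: P(K) ~ exp(-K / K_bar) / K_bar."""
    return math.exp(-k / k_bar) / k_bar


def yule_exponent(a):
    """Exponent phi = 2 + a / (1 - a) of the power-law body P(K) ~ K**(-phi)."""
    if not 0 <= a < 1:
        raise ValueError("the entry probability a must lie in [0, 1)")
    return 2 + a / (1 - a)


# Example: 10,000 links among 100 initial nodes, and a 5% entry probability.
k_bar = mean_degree(10_000, 100)  # 200.0 links per node on average
phi = yule_exponent(0.05)         # about 2.05
```

Note that φ stays close to 2 for the small entry rates relevant here, and diverges as a approaches 1.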
Using the second assumption we can compute the growth rate of node strength. The strength of node i is given by
$$W_i(t) = \sum_{j} w_{ij}(t),$$
where the sum runs over the $K_i(t)$ links of node i, and its growth rate is measured as $g = \ln\left(W(t+1)/W(t)\right)$. The resulting distribution of the growth rates of node strength is
$$P(g) = \sum_{K=1}^{\infty} P(K)\, P(g \mid K),$$
where P(K) is the connectivity distribution derived above and P(g|K) is the conditional distribution of the growth rates of nodes with a given number of links, which is determined by the distributions P(w) and P(x).
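The strength $W_i = \sum_j w_{ij}$ and its log growth rate g are straightforward to compute from two snapshots of an undirected weighted edge list. A sketch under our own assumptions: edges are stored as a dict mapping (i, j) pairs to weights, self-loops are absent, and the helper names are hypothetical. Pooling the g values of all nodes gives an empirical counterpart of the mixture of P(g|K) over P(K), while grouping them by degree samples P(g|K) itself.

```python
import math
from collections import defaultdict


def node_strengths(weights):
    """W_i = sum of w_ij over the links of node i (undirected, no self-loops)."""
    strength = defaultdict(float)
    for (i, j), w in weights.items():
        strength[i] += w
        strength[j] += w
    return dict(strength)


def strength_growth_rates(weights_t, weights_t1):
    """g_i = ln(W_i(t+1) / W_i(t)) for nodes with positive strength at both dates."""
    w0 = node_strengths(weights_t)
    w1 = node_strengths(weights_t1)
    return {
        i: math.log(w1[i] / w0[i])
        for i in w0
        if w0[i] > 0 and w1.get(i, 0.0) > 0
    }


def growth_rates_by_degree(weights_t, weights_t1):
    """Group g_i by degree K_i(t): samples from the conditional P(g | K)."""
    degree = defaultdict(int)
    for i, j in weights_t:
        degree[i] += 1
        degree[j] += 1
    groups = defaultdict(list)
    for i, g in strength_growth_rates(weights_t, weights_t1).items():
        groups[degree[i]].append(g)
    return dict(groups)
```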
Fu and colleagues [16] found an analytical solution for the distribution of growth rates P(g) in the limit a → 0 and t → ∞. For small g, P(g) behaves like a Laplace distribution,
$$P(g) \approx \frac{1}{\sqrt{2V_g}} \exp\left(-\frac{\sqrt{2}\,|g|}{\sqrt{V_g}}\right),$$
while for large g it has power-law tails, $P(g) \sim g^{-3}$, which are eventually truncated for g → ∞ by the distribution P(x) of the growth rate of a single link.
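The small-g (Laplace-like) approximation can be checked numerically. The sketch below assumes only the formula $P(g) \approx \exp(-\sqrt{2}|g|/\sqrt{V_g})/\sqrt{2V_g}$ and an illustrative value $V_g = 0.04$; it verifies that the density integrates to one and that $V_g$ is its variance. It does not reproduce the $g^{-3}$ tails of the full solution.

```python
import math


def laplace_pg(g, v_g):
    """Small-g approximation: P(g) ~ exp(-sqrt(2)|g| / sqrt(V_g)) / sqrt(2 V_g)."""
    return math.exp(-math.sqrt(2.0) * abs(g) / math.sqrt(v_g)) / math.sqrt(2.0 * v_g)


def midpoint_integral(f, lo, hi, n=200_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(f(lo + (k + 0.5) * h) for k in range(n))


v_g = 0.04  # illustrative variance of the growth rates (an assumption)
total = midpoint_integral(lambda g: laplace_pg(g, v_g), -5.0, 5.0)
variance = midpoint_integral(lambda g: g * g * laplace_pg(g, v_g), -5.0, 5.0)
```

The tent shape of ln P(g) around zero is what distinguishes this approximation from a Gaussian, whose log-density is a parabola.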
A further implication of the second assumption concerns the distribution of link weights P(w). The proportional growth process (Assumption 2) implies that P(w) converges to a log-normal. Node strength W is therefore the sum of K log-normally distributed random variables. Since the log-normal distribution is not stable under aggregation, the distribution of node strength P(W) is multiplied by a stretching factor that, depending on the distribution of the number of links P(K), can lead to a Pareto upper tail [29].
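Why proportional growth produces log-normal weights can be seen by simulating Assumption 2: since ln w(t) is a sum of i.i.d. terms ln x, it is approximately normal after many steps even if the shocks x are not. In this sketch the shocks are uniform on [0.8, 1.2] and all weights start at 1; both are illustrative assumptions, since the model only requires finite mean and variance.

```python
import math
import random
import statistics


def simulate_weights(n_links, steps, low=0.8, high=1.2, seed=0):
    """Evolve w(t+1) = w(t) * x(t) with i.i.d. uniform shocks x (an assumption)."""
    rng = random.Random(seed)
    weights = []
    for _ in range(n_links):
        w = 1.0  # common initial weight, for simplicity
        for _ in range(steps):
            w *= rng.uniform(low, high)
        weights.append(w)
    return weights


def skewness(values):
    """Sample skewness m3 / m2**1.5."""
    mean = statistics.fmean(values)
    m2 = statistics.fmean((v - mean) ** 2 for v in values)
    m3 = statistics.fmean((v - mean) ** 3 for v in values)
    return m3 / m2 ** 1.5


weights = simulate_weights(n_links=2000, steps=200)
log_weights = [math.log(w) for w in weights]
# ln w is close to symmetric (normal), while w itself is strongly right-skewed.
print(round(skewness(log_weights), 2), round(skewness(weights), 2))
```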
Moreover, the model implies a negative relationship between node strength and the variance of its growth rate. Specifically, it implies an approximate power law for the standard deviation of growth rates, $\sigma(g) \propto W^{-\beta(W)}$, where β(W) is an exponent that depends weakly on the strength W. In particular, β = 0 for small values of W, β = 1/2 for W → ∞, and β ≈ 0.2 is a good approximation over a wide range of intermediate values of W [17].
Finally, the model also yields a prediction on the relation between the degree K and the strength W of each node. In Section 4 we show that, since link weights are drawn from a log-normal distribution, and given the skewness of this distribution, the law of large numbers is slow to take effect. In other words, the probability of drawing a large link weight increases with the number of draws, which generates a positive power-law relationship between W and K for small K.

The Empirical Evidence
To test our model we use the NBER-United Nations Trade Data [31], available through the Center for International Data at UC Davis. This database provides bilateral trade flows among countries over 1962–2000, disaggregated at the level of commodity groups (4-digit level of the Standard International Trade Classification, SITC). Data are in thousands of US dollars and, for product-level flows, transactions below a threshold of 100,000 dollars are not recorded. Note that disaggregated data are not always consistent with country-level trade flows: in a number of cases we observe no 4-digit transaction between two countries but nevertheless find positive total trade, and vice versa. Since we take the number of products traded between any two countries as the empirical counterpart of the number of transactions, to avoid inconsistencies we compute total trade by aggregating commodity-level data.
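The aggregation step can be sketched as follows: total trade for each country pair is the sum of its commodity-level flows, and the number of distinct 4-digit SITC codes gives the number of products traded. The record layout (exporter, importer, SITC code, value in thousands of US dollars), the function name, and the treatment of pairs as directed exporter-to-importer flows are assumptions for illustration and need not match the actual layout of the NBER-UN files.

```python
from collections import defaultdict


def aggregate_pairs(records):
    """Aggregate commodity-level records into per-pair (total value, product count).

    records -- iterable of (exporter, importer, sitc4_code, value) tuples,
               with value in thousands of US dollars (hypothetical layout)
    """
    totals = defaultdict(float)
    products = defaultdict(set)
    for exporter, importer, code, value in records:
        if value <= 0:
            continue  # only positive flows count as links
        pair = (exporter, importer)
        totals[pair] += value
        products[pair].add(code)
    # Totals are built from the same commodity records that define the product
    # count, so positive total trade never coexists with zero recorded products.
    return {pair: (totals[pair], len(products[pair])) for pair in totals}


records = [
    ("USA", "CAN", "0011", 500.0),
    ("USA", "CAN", "0012", 250.0),
    ("USA", "CAN", "0011", 50.0),
    ("FRA", "DEU", "7810", 1000.0),
]
pairs = aggregate_pairs(records)
```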
In this section we test the predictions of our model, while in the following section we use the data to calibrate the simulations and check the ability of the model to replicate real-world phenomena by comparing simulated and actual trade flows. We already know from previous work [24] that the main features of the ITN are broadly consistent with our model. Here we look in more detail at some specific characteristics of the ITN. Figure 1 shows that the distribution P(K), i.e. the number of 4-digit SITC products traded by countries, is a power law with an exponential cutoff.
The main plot displays the probability distribution on a log-log scale, where the power law appears as the straight-line body and the exponential cutoff as the right tail. The inset presents the same distribution on a semi-log scale: here the exponential part of the distribution becomes a straight line, which magnifies what happens to the distribution as K grows large. As discussed in Section 2, the power-law distribution of K hints at a moderate entry of new nodes into the network. Indeed, 17 new countries enter the ITN during the observed period, mostly due to the collapse of the Soviet Union and Yugoslavia.
Moving to the weighted version of the network, one can look at the distribution of positive link weights, as measured by bilateral trade flows at the commodity level, P(w), as well as at the distribution of total country trade, or node strength, P(W). Figure 2 shows the complementary cumulative distribution of trade flows on a log-log scale, both for product-level transactions and for aggregate flows, for 1997 (other years display the same behavior). Both distributions show the parabolic shape typical of the log-normal distribution, in line with previous findings [23,24]. As predicted, upon aggregation the power-law behavior of the upper tail becomes more pronounced [29]. However, this departure from log-normality concerns a very small number of observations (0.16% for commodity flows and 2.21% for aggregate flows), since only a few new nodes (countries) enter the network over time.
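Complementary cumulative distributions like those in Figure 2 can be computed from raw flow values as below. On log-log axes a log-normal sample traces a downward-bending parabola, whereas a Pareto tail appears as a straight line. This is a generic sketch with a hypothetical function name, not the procedure used to draw the figure.

```python
from bisect import bisect_left


def ccdf(values):
    """Empirical complementary CDF: pairs (x, fraction of positive values >= x)."""
    xs = sorted(v for v in values if v > 0)
    n = len(xs)
    result = []
    for x in sorted(set(xs)):
        # bisect_left counts the values strictly below x, so ties are handled.
        result.append((x, (n - bisect_left(xs, x)) / n))
    return result
```

Plotting ln x against the logarithm of the second element of each pair gives the log-log view used for Figure 2.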
As for the growth of trade flows, Figure 3 shows the empirical distribution P(g) together with the maximum likelihood fit of Eq. (4), as well as the Generalized Exponential Distribution (GED, with shape parameter 0.7224). The growth of node centrality, as measured by strength W, follows the same law as the fluctuations in the size of complex systems [16,35]. This is not surprising, since the size of an airport can be measured by the number of passengers that travel through it, and the size of a firm in terms of sales is given by the sum of the values of the products it sells. Thus the theoretical framework of Stanley and colleagues [16] complements and completes the BA proportional growth model in the case of weighted networks.
As discussed in Section 2, our model implies a negative relationship between node strength and the variance of its growth rate. Figure 4 plots the standard deviation of the annual growth rates of node strength, σ(g), against their initial magnitude W. The standard deviation of growth rates exhibits a power-law relationship $\sigma(g) \propto W^{-\beta}$ with β ≈ 0.2, as predicted by the model [17]. If node strength were the sum of many independent components of comparable size, the central limit theorem would imply β = 1/2; the smaller empirical exponent means that the fluctuations of the most intense trade relationships are larger than the central limit theorem would suggest.
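The exponent β can be estimated by binning nodes by log strength, computing the standard deviation of growth rates within each bin, and regressing ln σ(g) on ln W. The sketch below uses illustrative choices of our own (equal-width bins in ln W, ordinary least squares, at least three observations per bin), not necessarily those behind Figure 4, and checks itself on synthetic data with a known β = 0.2.

```python
import math
import random
import statistics


def ols_slope(xs, ys):
    """Slope of the least-squares line through the points (x, y)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


def size_variance_beta(sizes, growth_rates, n_bins=10):
    """Estimate beta in sigma(g) ~ W**(-beta) from log-binned data."""
    logs = [math.log(w) for w in sizes]
    lo, hi = min(logs), max(logs)
    width = (hi - lo) / n_bins
    bins = [[] for _ in range(n_bins)]
    for lw, g in zip(logs, growth_rates):
        bins[min(int((lw - lo) / width), n_bins - 1)].append((lw, g))
    xs, ys = [], []
    for members in bins:
        if len(members) < 3:
            continue  # too few points for a stable standard deviation
        xs.append(statistics.fmean([lw for lw, _ in members]))
        ys.append(math.log(statistics.stdev([g for _, g in members])))
    return -ols_slope(xs, ys)


# Synthetic check: growth rates whose standard deviation is exactly W**(-0.2).
rng = random.Random(42)
sizes = [math.exp(rng.uniform(0.0, 10.0)) for _ in range(50_000)]
rates = [w ** -0.2 * rng.gauss(0.0, 1.0) for w in sizes]
beta_hat = size_variance_beta(sizes, rates)
```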
All in all, our model accurately predicts the growth and weight distributions of trade flows, the number of commodities traded, and the size–variance relationship of trade flows. We conclude that a stochastic model combining proportional growth of the number of links with an independent proportional growth process of link weights can reproduce most of the observed structural features of the world trade web, and that it should be taken as a stochastic benchmark against which to test the explanatory power of alternative theories of the evolution of international trade, and of weighted networks in general. In the next section we compare the structure of random networks generated according to our model with the real-world trade network.
The Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) tests are used to decide whether a sample comes from a population with a specific distribution. Both tests quantify a distance between the empirical distribution function of the sample and the cumulative distribution function of the reference distribution; the AD test gives more weight to the tails than the KS test. More detailed information is available in [32,33].
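Both statistics are short computations on the sorted sample. The sketch below, with function names of our own, assumes the reference CDF is fully specified in advance; when its parameters are estimated from the same data (as with a maximum likelihood fit), the standard critical values no longer apply and p-values are usually obtained by parametric bootstrap instead.

```python
import math


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov D = sup |F_n(x) - F(x)| for a continuous reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The empirical CDF jumps from (i-1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ad_statistic(sample, cdf):
    """Anderson-Darling A^2; the log terms weight deviations in the tails more."""
    xs = sorted(sample)
    n = len(xs)
    f = [cdf(x) for x in xs]
    s = sum(
        (2 * i - 1) * (math.log(f[i - 1]) + math.log(1.0 - f[n - i]))
        for i in range(1, n + 1)
    )
    return -n - s / n


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1]."""
    return min(max(x, 0.0), 1.0)
```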

Simulation Results
Based on the assumptions in Section 2, we generate a set of random networks and compare them with real-world data in order to test the predictive capability of our theoretical framework. We proceed in two steps. First, we generate the unweighted network according to the first set of assumptions. Next, we assign link weights by randomly sampling K values from a log-normal distribution P(w) whose parameters are obtained through a maximum likelihood fit of the real-world data. We model a system in which at every time t a new link is added, representing the possibility of exchanging one product with a trading partner. We slightly modify the original setting to allow new links to be assigned randomly rather than in proportion to node connectivity. Thus, in our simulations the parameter a governs the entry of new nodes according to Assumption 1, whereas the parameter b is the probability that a new link is assigned randomly. With probability a the new link is assigned to a new source node, whereas with probability 1 − a it is allocated to an existing node i. In the latter case, node i is now chosen with probability $p_i(t) = (1-b)\,K_i(t-1)/(2t) + b/N_{t-1}$, where $N_{t-1}$ is the number of nodes at time t − 1. The target of the new link is chosen symmetrically, with j ≠ i.
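The modified selection rule mixes preferential and uniform attachment. A minimal sketch with hypothetical function names: it normalizes the preferential part by the current total degree (which the text writes as 2t), returns None to signal that a new node should be created, and enforces j ≠ i by zeroing the weight of the first endpoint.

```python
import random


def selection_probs(degrees, b):
    """p_i = (1 - b) K_i / sum(K) + b / N over the N existing nodes."""
    n = len(degrees)
    total = sum(degrees)
    return [(1 - b) * k / total + b / n for k in degrees]


def choose_endpoint(degrees, a, b, rng, exclude=None):
    """Pick an endpoint for the new link, or return None for a new node.

    With probability a the endpoint is a new node; otherwise an existing node is
    drawn from selection_probs, with the node `exclude` ruled out so that i != j.
    """
    if rng.random() < a:
        return None
    probs = selection_probs(degrees, b)
    if exclude is not None:
        probs[exclude] = 0.0  # random.choices does not need normalized weights
    return rng.choices(range(len(degrees)), weights=probs)[0]
```

Setting b = 0 recovers pure preferential attachment, while b = 1 gives a uniform choice among existing nodes.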
By tuning the two model parameters a and b we generate networks with different connectivity distributions of trade links P(K). In particular, without entry (a = 0) and with completely random allocation of opportunities (b = 1) one obtains a random graph characterized by a Poisson connectivity distribution [36], whereas allowing entry (a > 0) makes P(K) exponentially distributed. Keeping a positive entry rate but assigning opportunities according to preferential attachment (b = 0), the model yields a power-law connectivity distribution with an exponential cutoff, which is more pronounced the higher the number of initial nodes $N_0$. In the limiting case in which entry of new nodes is ruled out (a = 0), the connectivity distribution tends toward a Bose-Einstein (geometric) distribution.
We compare the structure of random scale-free model networks with the real-world trade network in 1997. Since the structure of the network is highly stable over time, results do not change substantially if we compare simulations with the real-world network in other years. In the first stage, we generate one million networks with a and b both ranging from 0 to 1. We simulate random networks of 166 nodes (countries) and 1,079,398 links (number of different commodities traded by two countries); the number of commodities traded is taken as a proxy for the number of transactions. Next we select the random networks that best fit the real-world pattern in terms of correlation, as measured by the Mantel r test, and of connectivity distribution.
Figure 5 reports the value of the Mantel test for networks with 0 ≤ b ≤ 1 and an entry rate a implying the entry of 0 to 66 countries. The Mantel correlation statistic reaches a peak of 0.88 (p < 0.01) in preferential attachment regimes (b = 0). However, the Mantel test does not discriminate among different entry regimes. We next compare the connectivity distribution of simulated networks with the real-world distribution of the number of traded commodities P(K) by means of the Kolmogorov-Smirnov (KS) goodness-of-fit test. Figure 6 confirms that the best fit is obtained for purely preferential attachment networks (b = 0). Figure 7 shows that our model best reproduces the connectivity distribution with an entry rate a > 0 that implies the entry of 14–18 countries. This closely corresponds to the empirically observed number of new countries. We conclude that a simple proportional growth model with mild entry can account for the distribution of the number of commodities traded by each pair of countries.
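The Mantel statistic is the Pearson correlation between corresponding off-diagonal entries of two matrices, with significance assessed by jointly permuting the rows and columns of one of them. A plain-Python sketch of that standard procedure follows; the number of permutations, the one-sided p-value, and the use of both triangles (to allow directed trade matrices) are our assumptions, not details reported in the text.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(a, b, permutations=999, seed=0):
    """Return (r, p) for the Mantel test between square matrices a and b."""
    n = len(a)
    cells = [(i, j) for i in range(n) for j in range(n) if i != j]
    x = [a[i][j] for i, j in cells]

    def entries(order):
        # Relabel the nodes of b: rows and columns are permuted together.
        return [b[order[i]][order[j]] for i, j in cells]

    r_obs = pearson(x, entries(list(range(n))))
    rng = random.Random(seed)
    order = list(range(n))
    hits = 0
    for _ in range(permutations):
        rng.shuffle(order)
        if pearson(x, entries(order)) >= r_obs:
            hits += 1
    # One-sided p-value; the observed labeling counts as one permutation.
    return r_obs, (hits + 1) / (permutations + 1)
```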
By introducing the value of transactions, we can show that the model generates the observed relationship between the intensive and extensive margins of trade. Figure 8 depicts the relationship between total trade flows (W) and the number of trade links maintained by each country (K). Empirically, we proxy the number of transactions by the number of products traded by each country. Figure 8 displays the relationship that emerges from the 1997 trade data and confirms that there is a positive correlation between the two variables. The slope of the interpolating line in double logarithmic scale (1.33) reveals a positive relationship between the number of commodities traded and total trade value of the form $W \propto K^{\theta}$ with θ ≈ 1.33.
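The exponent θ is the slope of a straight-line fit of ln W on ln K. A minimal sketch, assuming an unweighted least-squares fit on country-level pairs (K, W); whether the interpolating line in Figure 8 was fitted to raw or binned data is not specified here, so this is only one possible choice, and the function name is ours.

```python
import math


def power_law_exponent(k_values, w_values):
    """OLS slope of ln W on ln K, i.e. theta in W ~ K**theta."""
    xs = [math.log(k) for k in k_values]
    ys = [math.log(w) for w in w_values]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic check on exact power-law data with theta = 1.33.
ks = [1, 2, 5, 10, 50, 100, 500]
ws = [3.0 * k ** 1.33 for k in ks]
theta = power_law_exponent(ks, ws)
```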
The curve displays an upward departure in the upper tail. This can be explained by noting that the product classification we use imposes a ceiling on the number of products a country can trade, since there are only around 1,300 4-digit categories (vertical dotted line). Apart from the upper decile of the distribution, the simulated network shows exactly the same dependence between the magnitude and the number of transactions. This may seem surprising, considering that the model assumes two independent growth processes for the number of transactions K and their values w. However, the law of large numbers works poorly for skewed distributions such as the log-normal. Given a random number of transactions with a finite expected value, if their values are repeatedly sampled from a log-normal, then as the number of links increases the average link weight will tend to approach and stay close to the expected value (the population average). However, this is true only for large K, whereas according to the distribution P(K) the vast majority of nodes have few links (small K). The higher the variance of the growth process of link weights, the larger K has to be before convergence toward $W = \bar{w} K^{\theta}$ with θ = 1, where $\bar{w}$ is the expected link weight, becomes visible, as predicted by the law of large numbers. Thus only the largest countries approach this critical threshold. In sum, our simulations show that the model can account for the relationship between K and W that has been observed in many real-world weighted networks [10,13].
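The slow convergence of sums of log-normal weights can be illustrated with a small Monte Carlo experiment: draw K i.i.d. log-normal link weights, sum them to get W, and regress the average ln W on ln K. With a large log-standard deviation (σ = 2, an illustrative assumption), the fitted slope over small K exceeds 1 even though K and w are independent; with a narrow distribution (σ = 0.1) it is essentially 1. Function names and parameter values are ours.

```python
import math
import random
import statistics


def mean_log_strength(k, sigma, reps, rng):
    """Average of ln W, where W is the sum of k i.i.d. log-normal(0, sigma) weights."""
    return statistics.fmean(
        math.log(sum(rng.lognormvariate(0.0, sigma) for _ in range(k)))
        for _ in range(reps)
    )


def strength_degree_slope(ks, sigma=2.0, reps=2000, seed=7):
    """Least-squares slope of mean ln W against ln K for the given degrees."""
    rng = random.Random(seed)
    xs = [math.log(k) for k in ks]
    ys = [mean_log_strength(k, sigma, reps, rng) for k in ks]
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


degrees = [1, 2, 4, 8, 16, 32, 64]
slope_skewed = strength_degree_slope(degrees, sigma=2.0)
slope_narrow = strength_degree_slope(degrees, sigma=0.1)
```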

Discussion and Conclusions
Using a simple model of proportionate growth and preferential attachment, we are able to replicate some of the main topological properties of real-world weighted networks. In particular, we provide an explanation for the power-law distribution of connectivity, as well as for the fat tails of the distribution of the growth rates of link weights and node strength. Additionally, the model matches the log-normal distribution of positive link weights (trade flows in the present context) and the negative relationship between node strength and the variance of its growth fluctuations, $\sigma(g) \propto W^{-\beta}$ with β ≈ 0.2. The main contribution of the paper is an extension of the BA model to weighted networks. Moreover, we provide further evidence that such a unifying stochastic framework is able to capture the dynamics of a vast array of complex systems [16].
Further refinements of our model will involve investigating its ability to match other topological properties of networks, such as assortativity and clustering.

Figure 1: Distribution of the number of products traded, 1997. Double logarithmic scale (main plot) and semi-logarithmic scale (inset)

Figure 2: The distribution of link weights and node strength in 1997. Complementary cumulative distribution of node strength P(W) (aggregate flows) and link weights P(w) (commodity flows), and their power-law fits (dashed lines) [34]

Figure 3: Distribution of the growth rates of aggregate trade flows P (g)

Figure 4: Size–variance relationship between node strength W (trade values) and the standard deviation of its growth rate σ(g), double logarithmic scale

Figure 5: Mantel test comparing simulated and real networks

Figure 6: Kolmogorov-Smirnov goodness-of-fit test for different entry rates and probabilities of random assignment

Figure 7: Kolmogorov-Smirnov goodness-of-fit test for different entry rates in a pure preferential attachment regime (b = 0)

Figure 8: The relationship between the number of products traded and trade value. Double logarithmic scale. Simulated (black) and real-world (red) data, mean and one standard deviation in each direction. The dashed line represents the reference line $W \propto K^{\theta}$ with θ ≈ 1.33

Table 1: Kolmogorov-Smirnov (KS) and Anderson-Darling (AD) goodness-of-fit tests for the distribution of growth rates of trade flows P(g)