
Triadic closure dynamics drives scaling laws in social multiplex networks

Peter Klimek and Stefan Thurner

Published 7 June 2013 © IOP Publishing and Deutsche Physikalische Gesellschaft
Citation: Peter Klimek and Stefan Thurner 2013 New J. Phys. 15 063008, DOI 10.1088/1367-2630/15/6/063008

Abstract

Social networks exhibit scaling laws for several structural characteristics, such as the degree distribution, the scaling of the attachment kernel and the clustering coefficients as a function of node degree. A detailed understanding of whether and how these scaling laws are inter-related is still missing, let alone whether they can be understood through a common dynamical principle. We propose a simple model for stationary network formation and show that the three scaling relations follow as natural consequences of triadic closure. The validity of the model is tested on multiplex data from a well-studied massive multiplayer online game. We find that the three scaling exponents observed in the multiplex data for the friendship, communication and trading networks can simultaneously be explained by the model. These results suggest that triadic closure could be identified as one of the fundamental dynamical principles in social multiplex network formation.

Content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

Social networks often exhibit statistical structures that manifest themselves in scaling laws, which can be quantified through a set of characteristic exponents. Arguably the three most relevant scaling laws for network formation concern the probability that a new node links to an existing node as a function of the degree of that existing node, the degree distribution, and the clustering coefficients of nodes as a function of their degree. In particular, the probability for a node to acquire a new link, the attachment kernel Π(k), often scales with the node degree k [1, 2] as

$$\Pi(k) \propto k^{\gamma}. \tag{1}$$

The degree distribution of social networks, i.e. the probability of finding a node with a given degree k, P(k), often shows features of exponential, fat-tailed distributions [3, 4] or something in between, depending on the type of social interaction [5, 6]. They can be parameterized conveniently by the q-exponential [7, 8]

$$P(k) \propto e_q(-bk) \equiv \left[1 - (1-q)\,bk\right]^{1/(1-q)}, \tag{2}$$

with q being a parameter that determines an asymptotic scaling exponent 1/(1 − q) and b > 0 a scale parameter. A third scaling law, which is ubiquitous in social networks [5, 6, 9, 10], is observed for the clustering coefficients c(k) as a function of node degree,

$$c(k) \propto k^{-\beta}. \tag{3}$$
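As a small illustration of the q-exponential form used for the degree distribution, the following Python sketch evaluates e_q(x) and an unnormalised P(k). The function names and the scale parameter `b` are assumptions introduced here for illustration; only the asymptotic exponent 1/(1 − q) and the q → 1 exponential limit are taken from the text.

```python
import math

def q_exponential(x: float, q: float) -> float:
    """e_q(x) = [1 + (1 - q) x]^(1 / (1 - q)); equals exp(x) in the limit q -> 1."""
    if abs(q - 1.0) < 1e-12:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # For the arguments x <= 0 used here this only happens for q < 1,
        # where the distribution is cut off (assumed convention).
        return 0.0
    return base ** (1.0 / (1.0 - q))

def degree_weight(k: float, q: float, b: float) -> float:
    """Unnormalised degree distribution P(k) ~ e_q(-b k); b is an assumed scale."""
    return q_exponential(-b * k, q)

# For q > 1 the tail decays like k^(1/(1-q)); for q = 1 it is a plain exponential.
tail_ratio = degree_weight(2e8, 1.2, 0.1) / degree_weight(1e8, 1.2, 0.1)
```

With q = 1.2, doubling a very large k multiplies the weight by roughly 2^(1/(1−q)) = 2^(−5), which is the power-law tail referred to above.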

Despite the overwhelming empirical evidence for the scaling laws in equations (1)–(3), it is still undecided whether they share a common dynamical origin, and whether and how their characteristic exponents are related to each other. For example, for growing network models, where new nodes are constantly added and link to already existing nodes through a preferential attachment rule [3], a relation between the scaling exponents of the degree distribution and of the attachment kernel, γ, has been found [11]. However, these models cannot explain the observed scaling of the clustering coefficients. Moreover, the preferential attachment process [3] requires global information (the degrees of all nodes in the network) to establish a new social tie, which is clearly an unrealistic assumption for most social networks. To overcome this problem, growth and preferential attachment mechanisms have been extended by local network formation rules [12–14, 16], where a node's linking dynamics depends only on its neighbors or second neighbors. One such local rule that is extremely relevant for social network formation is the principle of triadic closure [17, 18], which means that the probability of a new link closing a triad is higher than the probability of it connecting any two nodes. Scaling laws for the degree distribution [13], degree distribution and clustering coefficients [14, 15] and preferential attachment [16] have been reproduced in the context of specific models using triadic closure. Although it is instructive to see how a combination of growth, preferential attachment and clustering processes gives rise to the three scaling laws above, this does not help us to understand whether the existence and possible inter-relations of the three exponents can emerge from a single underlying dynamical origin, and to what extent this common origin is an actual feature of real social network formation processes.

Less is known about relations between characteristic exponents in non-growing, stationary networks [7, 19]. It has been shown that triadic closure is related to scaling laws for the degree distribution and clustering coefficients in the stationary case [20–23].

Here we study a simple model that simultaneously explains the three scaling laws in equations (1)–(3) based on the process of triadic closure in non-growing networks. This process introduces a mechanism from which preferential attachment emerges, leads to fat-tailed degree distributions and induces scaling of the clustering coefficients with node degrees. The model is validated with the data from a social multiplex, i.e. a superposition of several social networks labeled by α with adjacency matrices Mα, defined on the same set of nodes [24]. The model can be fully calibrated with the multiplex data and explains three observed characteristic exponents for three different sub-networks of the multiplex.

1. Results

1.1. Model specification

The model is built around the process of triadic closure, the principle that links tend to be created between nodes that share a neighbor. The model includes the addition and removal of nodes. The network is initialized with N nodes, each node having one link to a randomly chosen node. The dynamics is completely specified by an iteration of the following steps, starting at time step t:

  • (i)  
    Pick a node i at random. If i has fewer than two links, create a link between i and a randomly chosen node and continue with step (iii). If i has two or more links, choose one of its neighbors at random, say node j, and continue with step (ii).
  • (ii)  
    With probability r (triadic closure parameter), create a link between j and another randomly chosen neighbor of i, say k. With probability 1 − r, create a link between j and a node randomly chosen from the entire network, see figure 1.
  • (iii)  
    With probability p (node-turnover parameter) remove a randomly chosen node from the network along with all its links and introduce a new node linking to m randomly chosen nodes. Then continue with time-step t + 1.

Figure 1. Node i (with two or more links) and one of its neighbors j are randomly selected. With probability r, the process of triadic closure takes place (the triad consists of i, j, k); with probability 1 − r, j links to a random node.

For p > 0, nodes have a finite lifetime, which implies that the network reaches a stationary state where the total number of links L(t) and the network measures Π(k), P(k) and c(k) fluctuate around steady-state levels. The model is a variant of the model proposed in [20], which is contained as the special case r = 1 in the above protocol. Our model can also be seen as a stationary version of the connecting nearest-neighbors model in [14]. Combinations of triadic closure and random edge attachment have also been studied in growing [13, 15] and weighted [22] networks. Reaching a stationary state is independent of m. The model is completely specified by four parameters, N, r, p and m.
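The protocol in steps (i)–(iii) can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name, the reuse of a removed node's label for the newly introduced node, and the silent skipping of self-links and already existing links are all choices made here, since the text does not specify them.

```python
import random

def simulate_triadic_closure(n_nodes, r, p, m, steps, seed=0):
    """Run the three-step protocol and return adjacency sets {node: set(neighbors)}.

    Assumes n_nodes > m + 1 so that random partners always exist.
    """
    rng = random.Random(seed)
    adj = {i: set() for i in range(n_nodes)}

    def add_link(a, b):
        # Assumption: self-links and duplicate links are silently ignored.
        if a != b:
            adj[a].add(b)
            adj[b].add(a)

    def random_other(a):
        b = rng.randrange(n_nodes)
        while b == a:
            b = rng.randrange(n_nodes)
        return b

    # Initialization: every node gets one link to a randomly chosen node.
    for i in range(n_nodes):
        add_link(i, random_other(i))

    for _ in range(steps):
        # Step (i): pick a random node i.
        i = rng.randrange(n_nodes)
        neighbors = sorted(adj[i])
        if len(neighbors) < 2:
            add_link(i, random_other(i))
        else:
            j = rng.choice(neighbors)
            # Step (ii): triadic closure with probability r, else a random link.
            if rng.random() < r:
                k = rng.choice([x for x in neighbors if x != j])
                add_link(j, k)
            else:
                add_link(j, random_other(j))
        # Step (iii): with probability p, replace a random node by a new one.
        if rng.random() < p:
            v = rng.randrange(n_nodes)
            for u in adj[v]:
                adj[u].discard(v)
            adj[v] = set()  # label v now stands for the newly introduced node
            others = [x for x in range(n_nodes) if x != v]
            for u in rng.sample(others, m):
                add_link(v, u)
    return adj

adj = simulate_triadic_closure(n_nodes=200, r=0.6, p=0.1, m=1, steps=5000, seed=1)
mean_degree = sum(len(s) for s in adj.values()) / len(adj)
```

After a transient, `mean_degree` can be monitored over further steps to check that the number of links fluctuates around a steady level, as described above.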

1.2. Estimation of model parameters

Social ties are often established between two individuals who are introduced by a mutual acquaintance. Other modes of social tie formation, such as random encounters, may not lead to triadic closure. Step (ii) in the above protocol captures these two linking processes. Ties also change because people enter and leave social circles; for example, they change workplaces, move to different cities or change their hobbies. This is incorporated in step (iii). To calibrate the model to a real social multiplex network Mα with Nα nodes and Lα links, the stationarity assumption has to be checked and the parameters for triadic closure r and node turnover p have to be estimated. Consider the average number of nodes entering ($\Delta n^{+}_{\alpha}$) and leaving ($\Delta n^{-}_{\alpha}$) the network Mα per time unit. For stationarity to hold, we demand

$$\Delta n^{+}_{\alpha} - \Delta n^{-}_{\alpha} \ll \Delta n^{+}_{\alpha},\ \Delta n^{-}_{\alpha}, \tag{4}$$

i.e. the net growth rate is much smaller than the rates at which nodes enter or leave the network. The triadic closure parameter rα can be directly measured as the ratio between the number of links in network Mα which, at their creation, close at least one triangle and the total number of created links. The node-turnover parameter p can be estimated by demanding that the number of links in the model and in the real network are the same. To see this, note that one adds on average $\Delta l^{+}$ and removes $\Delta l^{-}$ links per time step. Stationarity means that $\Delta l^{+} = \Delta l^{-}$. Because one link is created at each time step in either step (i) or (ii) and, with probability p, m links are added in step (iii), we have $\Delta l^{+} = 1 + pm$. Denoting the average degree by $\bar k = \frac{2L}{N}$, with probability p one removes on average $\bar k$ links in step (iii), so that $\Delta l^{-} = p \bar k$. Equating the two rates and solving for p, the turnover parameter pα that calibrates the model to a network Mα is

$$p_{\alpha} = \frac{1}{\bar k_{\alpha} - m}. \tag{5}$$

The model is initialized with Nα nodes and the dynamics follows the protocol with parameters rα and pα. After a transient phase the number of links fluctuates around Lα, and the scaling exponents γ,q and β approach stationary values.
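The measurement of the triadic closure parameter described above (the fraction of newly created links that close at least one triangle at the time of their creation) might be sketched as follows. The input format, a time-ordered list of node pairs, and the function name are hypothetical; node and link removals, which occur in the real data, are ignored in this simplified version.

```python
def triadic_closure_ratio(link_events):
    """Fraction of new links that close at least one triangle when created.

    link_events: iterable of (u, v) pairs in order of creation (assumed format).
    Simplification: links are never removed; repeated links and self-links are skipped.
    """
    adj = {}
    closing = 0
    total = 0
    for u, v in link_events:
        neighbors_u = adj.setdefault(u, set())
        neighbors_v = adj.setdefault(v, set())
        if u == v or v in neighbors_u:
            continue
        total += 1
        if neighbors_u & neighbors_v:
            # u and v already share a neighbor, so the new link closes a triad.
            closing += 1
        neighbors_u.add(v)
        neighbors_v.add(u)
    return closing / total if total else float("nan")

# Links 1-2 and 2-3 exist when 1-3 is created, so 1-3 closes the triad (1, 2, 3).
r_estimate = triadic_closure_ratio([(1, 2), (2, 3), (1, 3), (3, 4)])
```

In this toy sequence one of four created links closes a triangle, giving an estimate of 0.25.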

Calibration of the model requires complete, time-resolved topological information Mα(t) over a large number of link-creation processes. Suitable data are available for example in the social multiplex network of the online game 'Pardus' [6, 25–28]; see the Methods section. Table 1 summarizes key features of Mα, including the number of nodes Nα and links Lα for the Pardus friendship (α = 1), communication (α = 2) and trade (α = 3) networks. Table 1 also lists the average degree $\bar k_{\alpha }$, as measured on the last day of the observation record, and the average number of nodes entering ($\Delta n^{+}_{\alpha}$) and leaving ($\Delta n^{-}_{\alpha}$) per day, confirming that the networks are in fact stationary in the sense of equation (4). Estimates for r and p are also shown in table 1.

Table 1. Summary of network measures and model results. For the Pardus friendship (α = 1), communication (comm., α = 2) and trade (α = 3) networks, the number of nodes Nα, links Lα, average degree $\bar k_{\alpha }$ and average number of nodes entering and leaving the network per day, $\Delta n^{+}_{\alpha}$ and $\Delta n^{-}_{\alpha}$, are shown. The results of the calibration of the model to the empirical networks, r and p, are given together with the fit results of the parameters γ, q and β for the data and the model.

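Column groups: network features (α to $\Delta n^{-}_{\alpha}$), parameters (rα, pα), exponents for data and model (γ to βmod).

| Type | α | Nα | Lα | $\bar k_{\alpha }$ | $\Delta n^{+}_{\alpha}$ | $\Delta n^{-}_{\alpha}$ | rα | pα | γ | γmod | q | qmod | β | βmod |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Friends | 1 | 4547 | 21 622 | 9.5 | 24.26 | 23.07 | 0.58 | 0.12 | 0.88(4) | 0.77(2) | 1.16(1) | 1.116(2) | 0.69(3) | 0.66(3) |
| Communication | 2 | 2810 | 9420 | 6.7 | 110.2 | 109.4 | 0.57 | 0.18 | 0.84(1) | 0.76(2) | 1.24(1) | 1.148(3) | 0.59(3) | 0.78(3) |
| Trade | 3 | 4514 | 31 475 | 13.9 | 58.58 | 56.19 | 0.80 | 0.08 | 0.83(1) | 0.80(1) | 1.073(1) | 1.102(1) | 0.63(3) | 0.60(3) |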
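The entries of table 1 can be cross-checked with a short calculation. Setting the link creation rate 1 + pm equal to the link removal rate p k̄ gives p = 1/(k̄ − m), with k̄ = 2L/N. The sketch below applies this to the tabulated Nα and Lα. The value m = 1 is an assumption: the text does not state which m was used for calibration, but this choice reproduces the tabulated pα to two decimals. The dictionary layout and function name are illustrative only.

```python
# Values copied from table 1 (nodes, links, nodes entering and leaving per day).
networks = {
    "friendship":    {"N": 4547, "L": 21622, "n_in": 24.26, "n_out": 23.07},
    "communication": {"N": 2810, "L": 9420,  "n_in": 110.2, "n_out": 109.4},
    "trade":         {"N": 4514, "L": 31475, "n_in": 58.58, "n_out": 56.19},
}

M_ASSUMED = 1  # assumption: not stated in the text for the calibration runs

def calibrate(net, m=M_ASSUMED):
    """Return (average degree, turnover parameter, relative net growth)."""
    k_bar = 2 * net["L"] / net["N"]           # average degree
    p = 1 / (k_bar - m)                       # from 1 + p*m = p*k_bar
    growth = (net["n_in"] - net["n_out"]) / net["n_out"]  # stationarity check
    return k_bar, p, growth

results = {name: calibrate(net) for name, net in networks.items()}
for name, (k_bar, p, growth) in results.items():
    print(f"{name:14s} k_bar={k_bar:.1f} p={p:.2f} net growth/turnover={growth:.3f}")
```

The relative net growth stays below about 5% of the turnover rate in all three networks, which is the sense in which the stationarity condition is met.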

1.3. Characteristic exponents

Simulation results for the values of the characteristic exponents γ, q and β in the model depend on the parameters p and r, as shown in figure 2. We fix $N = 10^3$ and m = 0. Results are averaged over 500 realizations for each parameter pair (p, r). All three scaling exponents, equations (1)–(3), can be explained by the model.

Figure 2. Dependence of scaling exponents γ, q and β on the model parameters p and r. (a) γ becomes closer to one for high p or r and is confined to the interval 0 < γ < 1. (b) q is large for small p and large r and approaches one for large p. (c) β is close to zero for r close to zero and approaches β = 1 for large values of p and r.

Model exponents for γ fall in the range 0 < γ < 1, depending on p and r, figure 2(a). Exponent γ is close to one for high p and high r. The preferential attachment associated with triadic closure is therefore sub-linear. The dependence of the exponent q on both p and r is shown in figure 2(b). Note that for q = 1 the q-exponential is equivalent to the exponential. Values of q above (below) one indicate that the distribution decays slower (faster) than the exponential. For small p and large r, q is significantly larger than one and degree distributions are fat tailed. For large p the values of q approach one, independent of r. Values for β are close to zero for r = 0 or p going to 0. β approaches a plateau at β = 1 for high values of p and r; see figure 2(c).

For the experimental validation of the model, figure 3 shows the attachment kernel Πα(kα), degree distribution Pα(kα) and clustering coefficients cα(kα) for the three sub-networks Mα of the empirical multiplex data. They are compared with the respective distributions of the calibrated model (results averaged over 20 realizations). Data and model results are logarithmically binned; a version of figure 3 showing raw data can be found in the supplementary information (available from stacks.iop.org/NJP/15/063008/mmedia).

Figure 3. Network scaling exponents of the social multiplex can be explained by the calibrated model. Results are shown for the Pardus friendship (α = 1, left column), communication (α = 2, middle column) and trade (α = 3, right column) networks. All data are logarithmically binned. Top row: the attachment kernels scale sub-linearly with the node degrees in each case for data (γ) and model (γmod). Curves for data and model are barely distinguishable from each other. Middle row: degree distributions for α = 1,2,3 and best fits of a q-exponential, for data (q) and model (qmod). Bottom row: the scaling of the average clustering coefficients as a function of degree is compared between data and model. Fits for β and βmod yield almost the same results for friends and trades, with comparably larger deviations for the communication network. The model results for cα(kα) show an upward curvature for high kα.

The observed preferential attachment in the data is in good agreement with model results for each network Mα; see the top row of figure 3. We find exponents of γ = 0.88(4) for the data and γmod = 0.77(2) in the model for the friendship network, γ = 0.84(1), γmod = 0.76(2) for communication and γ = 0.83(1), γmod = 0.80(1) for trade. Data and model curves for Πα(kα) are barely distinguishable from each other. The degree distributions of data and model are fitted with exponents q = 1.16(1) and qmod = 1.116(2) for α = 1, q = 1.24(1) and qmod = 1.148(3) for α = 2, and q = 1.073(1) and qmod = 1.102(1) for α = 3. Results are shown in the middle row of figure 3. Data and model show similar scaling of the average clustering coefficient of nodes cα(kα) as a function of their degree kα; see the bottom row of figure 3. For friendships (α = 1) we find β = 0.66(3); for the model βmod = 0.69(3). For communication (α = 2) the data yield β = 0.59(3); the model gives βmod = 0.78(3). For trade (α = 3) there is good agreement between data and model with β = 0.63(3) and βmod = 0.60(3), respectively. The model results for cα(kα) show a curvature and are not straight lines. Comparing the curves for α = 1, 2, 3 suggests that this curvature increases with the average degree $\bar k_{\alpha }$. Values for βmod should be interpreted as first-order approximations for the slopes of these curves. Results for the exponents γ, q and β for data and model are summarized in table 1.

2. Discussion

We reported strong evidence that the process of triadic closure may play an even more fundamental role in social network formation than previously anticipated [17, 18]. Given that all model parameters can be measured in the data, it is remarkable that three important scaling laws are simultaneously explained by this simple triadic closure model. Since exponents γ, q and β are sensitive to choices of the model parameters p and r, the agreement between data and model is even more remarkable.

The Pardus multiplex data contain three other social networks, where links express negative relationships between players, such as enmity, attacks and revenge [6]. Triadic closure is known not to be a good network formation process for negative ties: 'the enemy of my enemy is in general not my enemy' [29]. It was shown that the probability of triadic closure between three players is one order of magnitude smaller for enmity links than for friendship links in the Pardus multiplex [6, 25]. The model is therefore not suited to describe network formation processes of links expressing negative sentiments.

The findings in the current model also compare well to several empirical facts about real-world social networks. Sub-linear preferential attachment has been reported in scientific collaboration networks and the actor co-starring network ($\Pi(k) \propto k^{0.79}$ and $\propto k^{0.81}$, respectively [2]). Degree distributions of many social networks often fall between exponential and power-law distributions [3–5, 25, 30], and scaling of the average clustering coefficients as a function of degree has been observed in the scientific collaboration and actor networks with values for $c(k) \propto k^{-0.77}$ and $\propto k^{-0.31}$, respectively (when the same fitting as in figure 3 is applied). Mobile phone and communication networks give $\propto k^{-1}$ [31].

In the Pardus dataset, players are removed if they choose to leave the game or if they are inactive for some time [25]. In the mobile communication, actor and collaboration networks, a link is established by a single action (phone call, movie or publication) and persists from then on. Note that our model addresses the empirically relevant case where node-turnover rates ($\Delta n^{+}_{\alpha}$, $\Delta n^{-}_{\alpha}$) are significantly larger than the effective network growth rate ($\Delta n^{+}_{\alpha} - \Delta n^{-}_{\alpha}$). For growing networks (without node deletion) it has been shown that sub-linear preferential attachment (γ < 1) leads to degree distributions with power-law tail with an exponent proportional to γ [11]. Something similar can be observed in the present model. If we keep the node-turnover parameter p fixed and decrease the triadic closure parameter r, figures 2(a) and (b) show that γ decreases and q approaches one. The network is then dominated by randomly created links. However, if we fix r = 1 (only triadic closure, no random links) and increase p, figures 2(a) and (b) show that q approaches one despite an increase in γ. An increase of the node-turnover parameter p implies a shorter lifetime for individual nodes and hence a shorter time in which they may acquire new links. Consequently, the degree distribution only has a substantial right skew if both p ≲ 0.25 and r ≳ 0.5 hold.

3. Methods

3.1. Multiplex data

The Pardus dataset allows us to continuously track all actions of more than 370 000 players in an open-ended, virtual, futuristic game universe where players interact in a multitude of ways to achieve their self-posed goals, such as accumulating wealth and influence. Players can establish friendship links, exchange one-to-one messages (similar to phone calls) and trade with each other. We focus on three sub-networks (friendship, communication and trade) of the multiplex, over 1 year from September 2007 to September 2008. Network label α = 1 refers to the friendship network, α = 2 to the communication network and α = 3 to the trade network. In the friendship network a node is present on a given day if at least one friendship link to another node exists on that day. A node is removed if the player either leaves the game or has no friendship link. The same holds for the message and trade networks, where a link exists between two nodes on day t if at least one message (trade) is exchanged within the period of six days, $\left [ t-6,t \right ]$. For details of structural and dynamical properties of the Pardus multiplex, see [6, 25–28].

To measure the degree distributions Pα(kα) and clustering coefficients cα(kα), we use the adjacency matrix of the networks Mα on the last day of the data record. The preferential attachment probability Πα(kα) is measured by counting (over the entire observation period) the number of link-creation events, in which a node with degree k acquires a new link, and then dividing this by the average number of nodes with degree k, where the average is again taken over the observation period.
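The attachment-kernel measurement described above might look as follows in Python. This is a simplified sketch: the event format and function name are hypothetical, the average number of nodes with degree k is taken over link-creation events instead of over days, and link or node removals are ignored.

```python
from collections import Counter

def attachment_kernel(n_nodes, link_events):
    """Estimate Pi(k): link acquisitions at degree k per average node of degree k.

    link_events: time-ordered (u, v) pairs of newly created links between
    distinct nodes labeled 0..n_nodes-1 (assumed format, no repeated links).
    """
    degree = [0] * n_nodes
    nodes_with_degree = Counter({0: n_nodes})
    events_at_degree = Counter()   # how often a node of degree k gained a link
    occupancy_sum = Counter()      # sum over events of the number of degree-k nodes
    steps = 0
    for u, v in link_events:
        # Record the degree histogram just before the link is created.
        for k, count in nodes_with_degree.items():
            occupancy_sum[k] += count
        steps += 1
        for node in (u, v):
            k = degree[node]
            events_at_degree[k] += 1
            nodes_with_degree[k] -= 1
            nodes_with_degree[k + 1] += 1
            degree[node] = k + 1
    return {k: events_at_degree[k] / (occupancy_sum[k] / steps)
            for k in events_at_degree}

# Three nodes, two link creations: 0-1, then 1-2.
kernel = attachment_kernel(3, [(0, 1), (1, 2)])
```

In the toy example, degree-0 nodes acquire three links while on average two such nodes are present, and the single degree-1 acquisition is divided by an average of one degree-1 node.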

3.2. Fitting procedures

Power-law fits (least squares) to the logarithms of the logarithmically binned data in figure 3 are shown for γ, for 2 < k(α) < 100, and for β over the range 5 < k(α) < 100, for each α, for data and model. The reported errors are the standard deviations of the coefficients. For the degree distributions the data are also logarithmically binned and fitted over the entire range k(α) > 0 in figure 3 with equation (2). The coefficients are obtained as maximum likelihood estimates, and reported errors correspond to the 95% confidence intervals. For better comparison and to diminish the effect of outliers, data and model results for Πα(kα) are normalized over the range kα ⩽ 100. Higher values correspond to data outliers, often due to the behavior of non-serious players.
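The power-law fitting step can be illustrated with a small, self-contained sketch: logarithmic binning followed by an ordinary least-squares fit to the logarithms over a restricted degree range. The binning resolution (`bins_per_decade`), the use of the geometric mean of k as bin center and the function names are assumptions made here; the text does not specify these details, and the maximum likelihood fit of the q-exponential is not covered.

```python
import math

def log_bin(ks, ys, bins_per_decade=5):
    """Average y in logarithmically spaced bins of k; returns (centers, means)."""
    groups = {}
    for k, y in zip(ks, ys):
        if k > 0 and y > 0:
            index = math.floor(math.log10(k) * bins_per_decade)
            groups.setdefault(index, []).append((k, y))
    centers, means = [], []
    for index in sorted(groups):
        points = groups[index]
        # Geometric mean of the degrees in the bin (assumed choice of bin center).
        centers.append(math.exp(sum(math.log(k) for k, _ in points) / len(points)))
        means.append(sum(y for _, y in points) / len(points))
    return centers, means

def fit_power_law(ks, ys, k_min, k_max):
    """Least-squares fit of log y = intercept + slope * log k for k_min < k < k_max."""
    points = [(math.log(k), math.log(y))
              for k, y in zip(ks, ys) if k_min < k < k_max and y > 0]
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Synthetic clustering curve c(k) = 3 k^(-0.7), fitted over 5 < k < 100.
ks = list(range(1, 200))
cs = [3.0 * k ** -0.7 for k in ks]
raw_slope, _ = fit_power_law(ks, cs, 5, 100)
binned_slope, _ = fit_power_law(*log_bin(ks, cs), 5, 100)
```

For this synthetic input the unbinned fit recovers the slope −0.7, i.e. β = 0.7, and the binned fit stays close to it; the small difference comes from averaging within bins.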

Acknowledgments

This work was supported by Austrian Science Fund FWF P23378 and EU FP7 projects MULTIPLEX No. 317532 and LASAGNE No. 318132. We thank B Fuchs and M Szell for help with data issues.
