Optimal timescale for community detection in growing networks

Matúš Medo; An Zeng; Yi-Cheng Zhang; Manuel S Mariani

doi:10.1088/1367-2630/ab413f

1. Introduction

Many systems that are of interest for social science, information science, and data mining can be represented as complex networks that are not static but grow with time. For example, the global scientific output grows at a fast and accelerating pace, which results in growing citation networks of scientific papers that represent our accumulated scientific knowledge [1, 2]. The World Wide Web [3] and social contact networks [4, 5] grow as well. The presence of growth challenges traditional network analysis [6, 7] and makes it essential to develop and validate time-aware methods to achieve a solid understanding of the structure of these systems [8]. In particular, extensive research has shown that the inclusion of temporal information into network analysis has a dramatic impact on long-studied problems such as community detection [9–11], node ranking [11–13], dynamics control [14], and spreading phenomena [15–17].

This article focuses on one of the fundamental problems in network science, the detection of communities [18], which has received enormous attention from diverse research areas, including physics [18], computer science [19], ecology [20], neuroscience [21], and social science [6, 22], among others. While the problem is not uniquely defined [18], it can be generically described as the problem of determining whether there exists a meaningful partition of the network nodes into groups of nodes. The problem (also known as clustering in computer science [19, 23]) has a long history, and it has been traditionally addressed using structural, static network analysis.

A popular approach to community detection is to maximize a function, called modularity, which quantifies how much the total number of intra-community edges deviates from its expected value under a null model that preserves the individual nodes' number of connections [24]. Modularity has been studied from many viewpoints, and it is widely-recognized as a standard tool in network analysis [18]. Despite past research and the wide use of modularity optimization in a broad range of contexts, we still lack a systematic understanding of its behavior and performance in growing networks where time and aging phenomena are fundamental [25–27]. Albeit modularity has been used in such systems in its original form [28–30], the results can be expected to be suboptimal as modularity neglects the vital time information. A multi-layer form of modularity has been developed that can take into account network snapshots at various times [31, 32]. However, when we wish to apply a multi-layer approach to identify relevant communities in growing networks, we face an impasse: existing works assume layered input data [31–34] and thus they do not consider the question of how to divide an arbitrary time-stamped network into layers. Addressing this question requires to choose an appropriate observation timescale, i.e. the temporal duration for each layer [5, 35, 36]. This choice is essential because different timescales might reveal substantially different community structures, which in turn might lead to different conclusions on the large-scale organization of the system.

In this work, we derive analytically a criterion to estimate when a time-aggregated, static view of a growing network ceases to be sufficient for effective community detection through standard modularity maximization. When this criterion is not met, the detected communities are strongly determined by node age and therefore in discordance with the network's actual community structure. We introduce the observation timescale τ_O, divide the input network into subsequent layers of temporal duration τ_O each, construct a corresponding multi-layer modularity function, and use the resulting community detection method on diverse synthetic and real datasets. Remarkably, we find that the observation timescale ${\tau }_{O}^{* }$ that best uncovers the ground-truth communities in synthetic data is tightly related to the inherent aging timescale τ_S of the system's growth dynamics: ${\tau }_{O}^{* }\simeq {\tau }_{S}$ . We use both synthetic and real data to show that different choices of the temporal resolution parameter lead to very different detected communities and conclusions on their statistical significance. Our results provide clear guidelines for data-driven calibration of multi-layer community detection techniques. Beyond the particular problem of community detection, the connection between the observation timescale τ_O used for structural analysis and the system's intrinsic timescale τ_S is relevant to the general problem of analyzing the structure and function of the broad variety of networks that evolve in time.

2. Impact of network growth on modularity

Before detailing the multi-layer community detection method, we start by demonstrating how temporal effects impair the ability of the traditional modularity maximization to uncover the community structure of growing networks with aging. Since aging is common for information networks where connections between the items are usually directed (such as citations in a scholarly citation network and followers in a social network), we use here the formalism of directed networks. A similar analysis is possible for undirected and bipartite networks.

2.1. Modularity

The classical Newman–Girvan modularity function [37] has been adapted to quantify the quality of a partition into communities in directed networks [38, 39] as

$\begin{eqnarray}&&Q=\displaystyle \frac{1}{m}\displaystyle \sum _{i,j}\left({A}_{{ij}}-\displaystyle \frac{{k}_{i}^{{\rm{out}}}{k}_{j}^{{\rm{in}}}}{m}\right)\delta ({c}_{i},{c}_{j}),\end{eqnarray} \tag{ 1 }$

where m is the number of network links, A_ij is an element of the adjacency matrix which is 1 if node i points to node j and zero otherwise, k_i^out and k_iⁱⁿ are respectively the out- and in-degree of node i, c_i denotes the community of node i, δ(c_i, c_j) is the Kronecker delta which is 1 when c_i = c_j and zero otherwise, and the summation indexes i and j run over all N network nodes. Equation (1) is referred to as static modularity from now on. The task is to find the network partition that maximizes static modularity which thus serves as the objective function. Among the various existing approaches to modularity optimization, the Louvain algorithm [40] is a popular choice.

The negative term in equation (1) represents the expectation of A_ij for a random network that has the same in- and out-degree sequence as the original network. However, such randomization is of limited use in growing networks where time plays an important role in the nodes' connection patterns [26, 27]: it neglects the original network's temporal properties and, consequently, it can violate the network's fundamental temporal constraints. In often-studied citation networks, for example, it generates 'unphysical' randomized networks where papers can also cite future papers [41]. To demonstrate the problem of standard modularity, and later to assess its modification suitable for growing networks, we set up a simple network growth model based on the classical preferential attachment process with aging [26].

2.2. Model for growing networks with community structure

The model assumes that each node belongs to one of two ground-truth communities and preferentially (to a degree that can be tuned in the model) links to other nodes in the same community. The ground-truth community of node i, C_i, is chosen at random. The model can be easily extended to a case with more communities of various relative size. There are initially n₀ nodes that are all assumed to appear at time 0. The network then grows in time steps t = 1, ..., N − n₀. In each time step, one node is introduced in the network; the final number of nodes is thus N. Each introduced node creates k^out outgoing links to the existing nodes. The probability that node i points to an existing node j is

$\begin{eqnarray}&&{P}_{i\to j}=\displaystyle \frac{{X}_{{ij}}({k}_{j}^{{\rm{in}}}+1){{\rm{e}}}^{-(i-j)/{\rm{\Theta }}}}{{\displaystyle \sum }_{j}{X}_{{ij}}({k}_{j}^{{\rm{in}}}+1){{\rm{e}}}^{-(i-j)/{\rm{\Theta }}}},\end{eqnarray} \tag{ 2 }$

where k_jⁱⁿ is the current indegree of node j, exponential aging controlled by the parameter Θ is assumed, and the general preference for links between nodes i and j is encoded in the term X_ij which is described below. If node j has been chosen by node i before, the choice is repeated. In this way, there is at most one directed link between any two nodes in the network. Small Θ values result in 'short' links that connect the newly introduced node with other recently introduced nodes. As Θ increases, aging slows down and becomes negligible when Θ ≫ N. We refer to table 1 for a summary of the notation for all model parameters and network properties.

Table 1. Network model summary. Adopted notation for the model parameters and the resulting network properties.

Variable type	Variable	Meaning
Model parameter	n₀	Initial number of nodes
	N	Final number of nodes
	k^out	Outdegree of the introduced nodes
	Θ	Aging parameter
	μ	Community-mixing parameter

Network property	m	Number of links
	f_B	Fraction of inter-community links
	$\overline{k}$	Average degree

Note that equation (2) for simplicity omits the fitness term that is crucial to control the resulting network degree distribution [26]. We consider various model variants, including a variant with heterogeneous node fitness values, in appendix C. The community structure is introduced in the model by assuming

$\begin{eqnarray}&&{X}_{{ij}}=\mu [1-\delta ({C}_{i},{C}_{j})]+(1-\mu )\delta ({C}_{i},{C}_{j}),\end{eqnarray} \tag{ 3 }$

where C_i and C_j are the ground-truth communities of i and j, respectively. X_ij is 1 − μ if nodes i and j are in the same ground-truth community and μ if they are not. As a result, the number of links between the communities grows with μ. Other benchmark models for community detection, such as the Lancichinetti–Fortunato–Radicchi model [42], allow by construction to directly set the fraction of inter-community links in model networks. In our growth model, instead, μ only influences the preference for intra-community links in the network growth. The resulting fraction of inter-community links, f_B, therefore emerges by an interplay of this preference, the pool of available target nodes, and aging. Figure 1 shows that f_B grows with μ in a nonlinear yet monotonous manner. In our numeric simulations, we achieve the desired f_B values by choosing the appropriate μ(Θ).

**Figure 1.** The relation between the community-mixing parameter μ and the resulting fraction of inter-community links f_B. In the described network model, f_B shows a nonlinear yet monotonous dependence on μ. As an illustration, we show here the fraction of links between communities f_B as a function of μ for model data with ${n}_{0}=10,N=512,{k}^{{\rm{out}}}=10$ , and two ground-truth communities.
Download figure:
Standard image High-resolution image

**Figure 1.** The relation between the community-mixing parameter μ and the resulting fraction of inter-community links f_B. In the described network model, f_B shows a nonlinear yet monotonous dependence on μ. As an illustration, we show here the fraction of links between communities f_B as a function of μ for model data with ${n}_{0}=10,N=512,{k}^{{\rm{out}}}=10$ , and two ground-truth communities.
Download figure:
Standard image High-resolution image

2.3. Breakdown of static modularity in growing networks

Figure 2(A) shows the result of modularity maximization in a toy model network. We see that, despite the two true communities being visually well separated, the result is markedly wrong as the nodes are essentially clustered by their appearance time. To understand why modularity fails to recognize the true communities, we focus now on the simple case where the two ground-truth communities are of similar size and perfectly separated (in the model, this is achieved by setting μ = 0). For the correct partition into two communities that are identical with the ground-truth communities, the modularity contribution from true community k is

$\begin{eqnarray}&&{Q}_{k}=\displaystyle \frac{1}{m}\left(\displaystyle \sum _{i,j\in {{\mathsf{T}}}_{k}}{A}_{ij}-\displaystyle \frac{{k}_{i}^{{\rm{o}}{\rm{u}}{\rm{t}}}{k}_{j}^{{\rm{i}}{\rm{n}}}}{m}\right)=\displaystyle \frac{1}{m}\left(\displaystyle \frac{m}{2}-\displaystyle \frac{{\left(m/2\right)}^{2}}{m}\right)=\displaystyle \frac{1}{4}.\end{eqnarray} \tag{ 4 }$

The total modularity of the correct division is thus 1/2. We now study the impact of dividing one correct community into two parts with sizes N₁ and N₂ (N₁ + N₂ = N/2). Modularity contribution from these two parts that cover true community k is

$\begin{eqnarray}&&{Q}_{k}^{{\prime} }=\displaystyle \frac{1}{m}\left(\displaystyle \frac{m}{2}-{\rm{\Delta }}m-\displaystyle \frac{{\left(\overline{k}{N}_{1}\right)}^{2}}{m}-\displaystyle \frac{{\left(\overline{k}{N}_{2}\right)}^{2}}{m}\right).\end{eqnarray} \tag{ 5 }$

Here, Δm represents the 'loss' of links that are in the true community k but do not contribute to ${Q}_{k}^{{\prime} }$ because they run between the two parts that we now consider (i.e. they cross the partition boundary). The third and the fourth term represent the sum of expectation terms which goes over all ${N}_{1}^{2}$ pairs of nodes in the two parts into which k is subdivided. The smallest sum of expectation terms is achieved when N₁ = N₂ = N/4 (i.e. the true community is divided into two parts of the same size) when we obtain

$\begin{eqnarray}&&{Q}_{k}^{{\prime} }=\displaystyle \frac{1}{m}\left(\displaystyle \frac{m}{2}-{\rm{\Delta }}m-\displaystyle \frac{m}{8}\right)=\displaystyle \frac{3}{8}-\displaystyle \frac{{\rm{\Delta }}m}{m},\end{eqnarray} \tag{ 6 }$

where we used $m=\overline{k}N$ .

**Figure 2.** The breakdown of modularity in growing networks. (A) Temporal confinement of communities identified by the standard static modularity in a model network with fast aging ( $N=60,{\rm{\Theta }}=3,{k}^{{\rm{out}}}=10,\mu =0.05;$ the resulting fraction of links between the communities is f_B = 0.24). The horizontal position of nodes is given by their appearance time, the vertical position is given by their true community affiliation, and node color marks the community assigned by modularity maximization. We see that static modularity is essentially insensitive to the true communities and clusters the nodes by their appearance time. (B)–(D) The behavior of static modularity on model data with different aging timescales (in all simulations k^out = 5 and μ = 0, hence f_B = 0; results are averaged over 100 model realizations and the shaded areas visualize the 10th–90th percentile range). Albeit the two ground-truth communities are perfectly separated in the model data, from some network size, modularity optimization yields inferior results with increasing modularity (panel B) yet decreasing NMI (panel C). The loss of NMI is due to the number of identified communities n_C which is correctly two until some N but then it starts to grow (panel D). The vertical lines mark the analytically estimated thresholds of 4Θ for the three plotted Θ values.
Download figure:
Standard image High-resolution image

**Figure 2.** The breakdown of modularity in growing networks. (A) Temporal confinement of communities identified by the standard static modularity in a model network with fast aging ( $N=60,{\rm{\Theta }}=3,{k}^{{\rm{out}}}=10,\mu =0.05;$ the resulting fraction of links between the communities is f_B = 0.24). The horizontal position of nodes is given by their appearance time, the vertical position is given by their true community affiliation, and node color marks the community assigned by modularity maximization. We see that static modularity is essentially insensitive to the true communities and clusters the nodes by their appearance time. (B)–(D) The behavior of static modularity on model data with different aging timescales (in all simulations k^out = 5 and μ = 0, hence f_B = 0; results are averaged over 100 model realizations and the shaded areas visualize the 10th–90th percentile range). Albeit the two ground-truth communities are perfectly separated in the model data, from some network size, modularity optimization yields inferior results with increasing modularity (panel B) yet decreasing NMI (panel C). The loss of NMI is due to the number of identified communities n_C which is correctly two until some N but then it starts to grow (panel D). The vertical lines mark the analytically estimated thresholds of 4Θ for the three plotted Θ values.
Download figure:
Standard image High-resolution image

At this point, we can understand why a division of a single ground-truth community into more parts can increase modularity. If the average degree $\overline{k}$ is fixed, the number of links m is proportional to the number of nodes N. At the same time, aging suppresses the formation of 'long' links in the network and therefore implies an upper bound on the number of links between the two parts, Δm. As the network grows, the negative term Δm/m therefore decreases and, thanks to a higher absolute term, ${Q}_{k}^{{\prime} }$ thus eventually exceeds Q_k. At this point, modularity maximization results in dividing the true community k into two parts. As the network grows further, the divisions continue and the number of identified communities grows (see figure 2(D)).

We now estimate the network size at which the first division occurs. We do so assuming the exponential aging that we use in our numerical simulations. If we disregard preferential attachment, the probability that a link created under exponential aging targets a node introduced n steps ago is approximately $\exp (-n/{\rm{\Theta }})/{\rm{\Theta }}$ (we assume that Θ is large, so summation over individual n values can be replaced with integration that eventually yields the normalization factor Θ). The first node after the partition boundary must necessarily point all its outgoing links across the boundary. For a node n steps after the partition boundary, the fraction of boundary-crossing links created by this note is approximately ${\sum }_{i=n}^{\infty }\exp (-i/{\rm{\Theta }})/{\rm{\Theta }}\approx \exp (-n/{\rm{\Theta }})$ . The total number of boundary-crossing links is then obtained by multiplying with k^out and summing over all n values. In the summation, each n values carries the weight 1/2 because there are two communities and we count the boundary-crossing links in only one of them. Hence

$\begin{eqnarray}&&\overline{{\rm{\Delta }}m}\approx \displaystyle \frac{{k}^{{\rm{out}}}}{2}\displaystyle \sum _{n=1}^{\infty }{{\rm{e}}}^{-n/{\rm{\Theta }}}\approx \displaystyle \frac{{\rm{\Theta }}{k}^{{\rm{out}}}}{2}.\end{eqnarray} \tag{ 7 }$

When the initial number of nodes, n₀, is small, ${k}^{{\rm{out}}}\approx \overline{k}$ . The inequality ${Q}_{k}^{{\prime} }\gt {Q}_{k}$ can be now solved for N, yielding

$\begin{eqnarray}&&N\gt 4{\rm{\Theta }}.\end{eqnarray} \tag{ 8 }$

When this condition is met, network divisions into four (or more) parts are preferred to the correct division into two ground-truth communities.

Our analytically-derived criterion predicts accurately the breakdown of modularity in numeric experiments in the absence of inter-community links (figure 2, panels B–D). In particular, when the criterion defined by equation (8) is met, the optimal modularity is larger than the value (Q = 0.5) expected for a partition with two perfectly-separated communities (figure 2(B)), the Normalized Mutual Information (see appendix A for the definition) between detected and ground-truth communities is significantly smaller than one (figure 2(C)), and the number of detected communities is larger than two (figure 2(D)). In other words, in growing networks above some network size, modularity optimization fragments the ground-truth communities into smaller communities that are mostly determined by the nodes' age.

3. Community detection in growing networks

To resolve the limitations of static modularity, we propose the temporal modularity quality function building on the recently-introduced Dynamic Configuration Model (DCM) for growing networks [41] which proposes a way of randomizing time-stamped networks whilst approximately preserving the time evolution of each node's degree.

3.1. Multi-layer modularity

To define the temporal modularity function, we divide the network's links by time into L layers of an equal number of links; layers l = 1 and l = L contain the earliest and latest links, respectively. Possible ties (several links created at the same time) can be solved by ordering them at random. Since ties are scarce in the real networks analyzed here, their impact is marginal. The numbers of outgoing and incoming links established by node i in layer l are denoted as ${\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}$ and ${\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}$ , respectively (note that ${\sum }_{l}{\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}={k}_{i}^{{\rm{out}}}$ and ${\sum }_{l}{\rm{\Delta }}{k}_{i,l}^{{\rm{in}}}={k}_{i}^{{\rm{in}}}$ ). The total number of links created in layer l is m_l ( ${\sum }_{l}{m}_{l}=m$ ). Note that requiring the layers to have the same number of links is just one of the possible choices. Other simple choices are layers of an equal number of newly introduced nodes, and layers of equal physical timespan (we discuss the latter in section 5). While different real networks can, in principle, require different ways of constructing the layers, the current choice of an equal number of links has the advantage of producing layers of comparable statistical power.

With the constraint of given degree increase sequences ${\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}$ and ${\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}$ , the expectation of A_ij can be written as

$\begin{eqnarray}&&\langle {A}_{{ij}}{\rangle }_{{\rm{DCM}}}=\displaystyle \sum _{l=1}^{L}\displaystyle \frac{{\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}{\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}}{{m}_{l}},\end{eqnarray} \tag{ 9 }$

where we assume that the connections between the nodes are random within each layer. This expectation value can be used to replace the time-ignoring expectation value ${k}_{i}^{{\rm{out}}}{k}_{j}^{{\rm{in}}}/m$ in equation (1) and obtain temporal modularity

$\begin{eqnarray}&&{Q}_{T}(L)=\displaystyle \frac{1}{m}\displaystyle \sum _{i,j}\left({A}_{{ij}}-\displaystyle \sum _{l=1}^{L}\displaystyle \frac{{\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}{\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}}{{m}_{l}}\right)\delta ({c}_{i},{c}_{j}),\end{eqnarray} \tag{ 10 }$

where L = 1 recovers static modularity defined by equation (1). Similarly to static modularity, temporal modularity is negative when each node constitutes an independent community and it can increase by favorable changes in the community assignment; the result of its maximization is thus non-trivial and depends on the network structure. Unlike previous work on communities in multilayer networks [31], we assume that node community affiliation does not change over time. If necessary, this assumption can be relaxed. We adapt the Louvain algorithm [40] to optimize temporal modularity (see appendix B for details). In the toy example from figure 2(A), temporal modularity partitions the nodes correctly for any L from 4 to 20 (see figure 3).

**Figure 3.** Comparing modularity and temporal modularity in a small network. Partitions obtained by maximizing static directed modularity (top row) and the temporal modularity with 9 temporal layers (bottom row) in the model network from figure 2(A). The choice L = 9 follows from the observation timescale criterion described in section 3.2.
Download figure:
Standard image High-resolution image

Equation (10) can be viewed as a special case of the previously introduced multi-layer modularity [31, 32] where we constrain node affiliation to be fixed in time. While those studies assumed the layered structure of the data to be given, we obtained equation (10) by construction from continuously growing network data. As a result, the number of layers L can be freely varied, and it is thus important to study how to choose it in practice. This problem was not investigated in previous studies on multi-layer generalizations of modularity [31, 32] where the division of the network into layer was assumed to be given a priori, which is not the case for a generic time-stamped dataset.

One expects that the choice of L is linked with the network's aging speed: while one layer may suffice when aging is slow or even absent, many layers are needed when aging is fast. To measure the aging speed, we measure the median span of links. Assuming that the nodes are labeled by the time of appearance and denoting the out-going and in-coming node of edge n as e_n^out and e_nⁱⁿ, respectively, the system's median link span τ_S can be computed as the median of $| {e}_{n}^{{\rm{out}}}-{e}_{n}^{{\rm{in}}}|$ , that is

$\begin{eqnarray}&&{\tau }_{S}:= \mathrm{median}\{| {e}_{n}^{{\rm{out}}}-{e}_{n}^{{\rm{in}}}| \}.\end{eqnarray} \tag{ 11 }$

This timescale can be now compared with the average layer timespan

$\begin{eqnarray}&&{\tau }_{O}:= N/L\end{eqnarray} \tag{ 12 }$

which defines the observation timescale of the new community detection method (see table 2 for a summary of all the timescales and related variables considered in this article). If each layer covers time much longer than the aging timescale ( ${\tau }_{O}\gg {\tau }_{S}$ ), the temporal effects 'average' out and we can expect the results to be similar to those obtained with static modularity. By contrast, if layers are many and each of them contains only a handful of links ( ${\tau }_{O}\ll {\tau }_{S}$ ), temporal modularity is expected to be hampered by statistical fluctuations. We thus expect temporal modularity to work best at an intermediate range of τ_O values; we will determine the optimal timescale below.

Table 2. Timescales and related parameters. Summary of the notation adopted in this paper. Note that the nodes are labeled by their order of appearance.

Variable	Meaning
Θ	Aging parameter in the model
L	Number of layers
${\tau }_{O}:= N/L$	Observation timescale (layer duration)
e_n^out	Outgoing node of edge n
e_nⁱⁿ	Incoming node of edge n
τ_S	Detected link-based timescale (median of $\| {e}_{n}^{{\rm{out}}}-{e}_{n}^{{\rm{in}}}\|$ )
τ_J	Similarity-based timescale, extending [36]

To evaluate community detection results on model data, we compute Normalized Mutual Information (NMI) between the detected partitions and the ground-truth communities. Motivated by the tendency of static modularity to produce temporally confined communities, we also compute the size-weighted average time span Ω of the detected communities which is related to the age difference between the oldest and the most recent nodes in each identified community. The higher the Ω, the less temporally confined are the identified communities. The advantage of this metric is that it only concerns the properties of the detected communities and it does not involve any notion of ground-truth communities which cannot be uniquely defined in real data [43]. We will thus use Ω to evaluate partitions in real networks, which will help us to establish a bridge between model-based observations and real data. Details for these two metrics and other metrics that further support our findings are described in appendix A.

3.2. The optimal timescale of temporal modularity

Results for model data with various aging timescales are shown in figure 4. While panels (A, B) explore the parameter plane (τ_O, Θ), panels (C, D) use τ_O/τ_S on the horizontal axis. As can be seen, both the NMI and Ω show a peak around τ_O/τ_S ≈ 1, in particular when aging is sufficiently fast (Θ ≪ N). The system's intrinsic timescale τ_S thus directly determines the optimal value of ${\tau }_{O}^{* }$ for the temporal modularity's layers. The loss of temporal modularity's efficiency when the observation timescale is too long compared to the system's timescale ( ${\tau }_{O}/{\tau }_{S}\gtrsim 1$ ) is particularly fast. When τ_O/τ_S ≲ 1 (i.e. layers are shorter than the system timescale), we can observe performance plateaus that end when the number of nodes per layer becomes too small (in our simulations 10 nodes per layer or less) and the results become hampered by insufficient statistics. Panels (B, D) further show that the community time span, Ω, can be indeed used to distinguish between the large-τ_O regime where communities are mainly determined by time and thus of limited time span (right side of the figure), an intermediate regime with 'long' communities that are independent of time (hence they can reflect the network's structural information), and finally the noisy small-τ_O regime that again yields shorter sub-optimal communities.

**Figure 4.** The optimal timescale for community detection in growing networks. (A), (B) Community detection results for model data as a function of the observation timescale τ_O and the model aging parameter Θ: NMI (A) and the average community timespan (B). τ_O = 1024 corresponds to static modularity. The dashed lines mark the intrinsic system timescale τ_S corresponding to given Θ. Model parameters: N = 1024, n₀ = 10, k^out = 5, f_B = 0.1; results are averaged over 100 model realizations. (C), (D) To better appreciate the relation between ${\tau }_{S}$ and the optimal τ_O, data from panels (A), (B) are plotted here as a function of τ_O/τ_S. Full circles mark the right-most points of each curve which correspond to the results obtained with static modularity. These results show that the optimal NMI between the detected and ground-truth communities is achieved for τ_O ≃ τ_S; (2) the least time-confined communities (i.e. the communities with the largest Ω values) are observed for τ_O ≃ τ_S.
Download figure:
Standard image High-resolution image

Additionally, figure 5 shows results for two other evaluation metrics: the average number of detected communities and the average effective number of detected communities. We see that τ_O/τ_S ≈ 1 leads to community divisions with fewer communities (both in absolute terms as well as measured by the effective number of communities ${n}_{\mathrm{eff}}$ ) than other choices of τ_O. Results for other model settings and variants (see the figures in appendix C) further confirm that choosing τ_O = τ_S is optimal or nearly-optimal in many circumstances. The system's intrinsic timescale τ_S is thus an important connection between community detection using temporal modularity and the system's intrinsic properties. Since τ_S is based on studying the temporal properties of individual links, in the following, we refer it as link-based timescale.

Let us illustrate how the link-based timescale can be used to overcome the modularity breakdown illustrated in our initial example (figure 2(A)). For this network, the link-based timescale is τ_S = 7. Setting the observation timescale τ_O to 7 corresponds to choosing L = N/τ_O ≈ 8.6 layers in temporal modularity given by equation (10). In figure 3, we rounded that up to 9 layers, leading to the perfect community detection result.

3.3. Link-based and similarity-based timescale detection: a comparative analysis

In the previous analysis, we have defined the system's timescale using the median link time span τ_S. We now aim to compare this timescale detection criterion against an existing principled method for timescale detection in complex systems [36], which we refer to as similarity-based timescale detection. The original method introduced in [36] constructs iteratively layers ${{ \mathcal S }}_{n}=[{t}_{n},{t}_{n}+{\rm{\Delta }}{t}_{n})$ by maximizing the Jaccard similarity between the sets of events that occurred in pairs $({{ \mathcal S }}_{n},{{ \mathcal S }}_{n+1})$ of consecutive layers (the maximization is with respect to the layer duration Δt_n). The original method thus can, in principle, detect layers of heterogeneous lengths. However, as our synthetic networks feature a homogeneous timescale, we consider a variant of the method that aims to detect a single timescale. For given layer duration τ, we compute the average Jaccard similarity between pairs of consecutive layers. The layer duration that leads to the maximal average similarity is selected.

The key element in the similarity-based timescale detection is the definition of an event. Since [36] studies temporal networks, an event naturally is a temporal network link that occurs in layer n. In our case of a growing network, a link between two nodes appears at most once—event sets composed of links would be therefore disjoint and their similarity would be zero for every layer duration τ. We thus extend the original method and assume that an event is a node receiving a link in layer n (as in [36], we consider unweighted events, i.e. it does not matter how many links a node has received). Denoting ${{ \mathcal S }}_{n}$ the set of nodes that receive at least one link in layer n, the Jaccard similarity of layers n and $n+1$ is $J=| {{ \mathcal S }}_{n}\cap {{ \mathcal S }}_{n+1}| /| {{ \mathcal S }}_{n}\cup {{ \mathcal S }}_{n+1}|$ . The resulting timescale obtained by maximizing the average layer similarity is referred to as τ_J.

Our simulation results show that the two timescales, τ_S and τ_J, have similar values across the whole range of the model aging parameter Θ, and link-based timescales tend to be longer than the similarity-based timescales (figure 6(A)). More importantly, when the detected timescales are used to set the temporal modularity's layer duration, the detected communities are in much better agreement with the ground-truth communities than when static modularity is used. The two timescales yield similar NMI values across the whole range of Θ values except for the two smallest Θ values where the similarity-based timescales performs better than the link-based timescale. It has to be nevertheless noted that the link-based timescale is considerably simpler and amendable to analytical solutions than the similarity-based timescale.

**Figure 6.** Comparing the performance of different timescale detection procedures. Results are obtained on model data with parameters as in figure 4 and the model aging parameter Θ varied in the range [1, 1024]. The lines and the shaded areas indicate, respectively, the means and the standard deviation values computed from 100 model realizations.
Download figure:
Standard image High-resolution image

4. Significance analysis

To assess the statistical significance of the detected communities, we compare the results obtained on the model networks with those on model networks where links are randomized within each layer. To this end, we implement the Dynamic Configuration Model (DCM, [41]) using the layer division described before (see equation (10) and its justification): all network links are sorted by the time of their appearance and divided into L layers of equal size. In each layer l, the ${\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}$ outgoing stubs of the nodes are matched with the ${\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}$ incoming stubs of the nodes by choosing the candidate nodes preferentially by the number of remaining stubs. If the matched nodes correspond to an already existing link or a self-loop, the matching is repeated. In the end, there may be a small number of stubs that cannot be matched but this number is usually negligible. The number of layers in the DCM randomization is chosen depending on the system's timescale as N/τ_S rounded down.

Denoting the highest modularity values achieved on the original and DCM-randomized data as ${Q}_{T}{(L)}^{{\rm{orig}}}$ and ${Q}_{T}{(L)}^{{\rm{DCM}}}$ , respectively, the significance of the detected communities can be evaluated by computing the z-score

$\begin{eqnarray}&&{z}^{{\rm{DCM}}}=\displaystyle \frac{{Q}_{T}{\left(L\right)}^{{\rm{orig}}}-E[{Q}_{T}{\left(L\right)}^{{\rm{DCM}}}]}{\sigma [{Q}_{T}{\left(L\right)}^{{\rm{DCM}}}]},\end{eqnarray} \tag{ 13 }$

where $E[{Q}_{T}{(L)}^{{\rm{DCM}}}]$ and $\sigma [{Q}_{T}{(L)}^{{\rm{DCM}}}]$ denote the mean and the standard deviation of ${Q}_{T}{(L)}^{{\rm{DCM}}}$ over multiple realizations of the DCM randomization (we use 100 realizations). When z is small, the identified communities are not significant as their modularity is similar (or even worse) to the maximal modularity achievable in randomized data that are by construction free of any community structure. The higher the z value, the more significant the network partition (commonly used significance thresholds for z-score are two and three).

We use bootstrap to estimate the uncertainty of the estimated z scores. Denoting the estimated standard deviation of the resulting z-score as σ[z], we can compare the significance of community partitions A and B by evaluating the normalized z-score difference

$\begin{eqnarray}&&\delta z:= \displaystyle \frac{{z}_{A}-{z}_{B}}{\sqrt{\sigma {[{z}_{A}]}^{2}+\sigma {[{z}_{B}]}^{2}}}\end{eqnarray} \tag{ 14 }$

which compares the difference between the partitions' z-score with the combined uncertainty of the z-score estimates. Positive δz means that partition A is more significant than partition B. When δz is large, the observed significance-difference is significant itself. In figures 8(C), (F), we use the threshold of 3 to decide between subsets where temporal modularity yields significantly less or significantly more significant communities than static modularity.

For the toy example from figure 3, the described significance analysis shows that the result obtained at L = 9 is much more significant than that obtained with static modularity (the z-scores are 15 and 2.6, respectively). Importantly, the usual significance analysis using the Configuration Model (which, similarly as static modularity, does not take time information into account and it is thus inappropriate in the current setting) deems the incorrect partition obtained with static modularity as highly significant (its z-score is 28). This is an example of a static method that not only produces a misleading result but also confirms its statistical significance.

Significance analysis results for the basic model setting from figure 4 are shown in figure 7. We see that: (1) the detected communities are statistically significant when τ_O ≈ τ_S, (2) the communities detected by static modularity are less significant or even not significant when aging is fast; this indicates that these communities are largely determined by node degree dynamics that is preserved by the DCM and not by higher-order structural patterns, (3) a too short observation timescale is also damaging to the significance of the detected communities. We compute also the average NMI between the originally identified communities and the communities identified in randomized data. Figure 7(B) shows that the difference between the communities identified in the original and the randomized data is greatest when τ_O ≈ τ_S, which confirms that temporal modularity is then most sensitive to the actual network structure.

5. Implications for real networks

As τ_O ≃ τ_S proves to be an optimal choice of the temporal modularity function in model networks, two natural questions arise: how different are the communities detected by temporal modularity optimization (with τ_O = τ_S) from those detected by static modularity maximization in real growing networks? Do the results obtained with temporal modularity call for a revision of our conclusions on the significance of the community structure of growing networks based on static modularity maximization?

To investigate the relation between temporal and static modularity maximization in real data, we use subsets of the news citation dataset that was used in [44] to analyze the backbone of the citation network, and subsets of the American Physical Society (APS) citation data from years 1893–2013; the subsets correspond to specific newspapers and PACS codes, respectively (see appendix D for details). We focus on three main properties: (1) the average community lifespan Ω of the detected communities—we compare Ω₁ achieved by static modularity with Ω_T achieved by temporal modularity; (2) the NMI between the communities detected by static and temporal modularity (since the true partition is unknown, we cannot directly evaluate the 'correctness' of the obtained communities); (3) the normalized z-score difference between the partitions obtained with temporal and static modularity, respectively.

Figures 8(A), (D) show that the ratio Ω_T/Ω₁ is generally larger than one, confirming that the communities detected by temporal modularity have a longer time span than those detected by static modularity. Importantly, Ω_T/Ω₁ decreases with τ_S: as aging becomes faster, the communities detected by temporal modularity become more 'stretched' over time compared to those detected by static modularity. This is in qualitative agreement with our results on model data where the increase of Ω by temporal modularity grows with the aging speed (figure 4(D)). Figures 8(B), (E) show that the NMI between the communities by temporal modularity and those by static modularity is substantially smaller than one. Furthermore, the communities detected by the two methods tend to differ more for larger networks. Figures 8(C), (F) show that the communities by temporal modularity tend to be more statistically significant than the communities by static modularity. This indicates that properly including temporal information into the detection algorithm can substantially alter the conclusions on the significance of the community structure in growing networks. The analyzed real growing networks tend to be more 'temporally modular' than modular: factoring out temporal patterns allows us to reveal a richer level of organization than possible with static modularity alone.

**Figure 8.** Implications of temporal effects for community detection in empirical growing networks. We report the community detection results for the empirical News subsets (top) and the APS subsets (bottom); see appendix D the data description. The columns show that, in order: (1) the average community time span Ω_T obtained with temporal modularity using τ_O = τ_S is longer than the static modularity timespan Ω₁, and the ratio Ω_T/Ω₁ tends to decrease with the relative aging speed τ_S/N (Spearman correlation −0.77 and −0.78 for the news and the APS data, respectively). (2) NMI between communities obtained with temporal and static modularity tends to decrease with the number of nodes (Spearman correlation −0.77 and −0.86 for the news and the APS data, respectively). (3) Communities obtained with temporal modularity tend to be more significant than those obtained with static modularity (the light and dark part of each column visualize the subsets with the number of nodes below and above the median for given datasets, respectively, indicating that the difference is even bigger for larger networks).
Download figure:
Standard image High-resolution image

Recent literature [45] suggests that 'event time' defined by the number of nodes characterizes the decay of attention in citation networks better than 'real time'. We have nevertheless considered an alternative to the equal layer size construction introduced in section 3.1 where real time is used instead to both define the layers as well as to measure the intrinsic system timescale τ_S which is then based on the median of real time differences between the appearance times of the out-going and the in-coming node. The obtained results qualitatively agree with those presented in figure 8 without strong evidence in favor of either of the two temporal modularity constructions. Further research is necessary to map the settings where either of the two constructions is preferrable, and to understand what distinguishes them.

6. Discussion

The implications of our work are multi-fold. It has become increasingly clear [8, 10, 16] that to properly detect structural patterns in time-evolving systems, we need time-aware network analysis methods. While methods based on multi-layer representations of time-evolving networks are potentially powerful [31, 32, 34], they call for the fundamental question of how to choose the temporal resolution at which to look at the system [35, 36]. Importantly, looking at a given system with different observation timescales can reveal different structural phenomena, such as different group organizations in social systems [5] and different behavioral patterns in communication networks [35], among others. Our work sets out to determine the optimal observation timescale ${\tau }_{O}^{* }$ for community detection based on a multi-layer generalization of the modularity function, which we called temporal modularity.

We found, both analytically and numerically, that modularity maximization yields unsatisfactory results in networks where aging decays sufficiently quickly. We found that the optimal observation timescale ${\tau }_{O}^{* }$ of the multi-layer modularity function is determined by the system aging timescale: ${\tau }_{O}^{* }\simeq {\tau }_{S}$ . Different choices of τ_O lead indeed to sub-optimal performance in the reconstruction of ground-truth communities in model data, and to communities that are more strongly determined by node age in real data. The optimal timescale to look at the system is therefore close to the inherent timescale of the system's growth dynamics. This supports the idea that analyzing the structure of a time-evolving network requires to first understand the properties of the dynamical process that generated that network. Note that τ_S is not bound to the problem of community detection and characterizes the general aging pattern of a given network.

We proposed a simple procedure to detect communities in networks that organically grow over time. We suggested a novel metric, the link-based timescale τ_S, to characterize the temporal patterns of a network, and demonstrated that this timescale should be used in the community detection process. When the aging timescale is short, the communities detected with our new framework are dramatically different from those detected with standard time-ignoring (static) approaches such as modularity maximization. Our comparison of the link-based timescale τ_S against similarity-based timescale τ_J based on previous literature [36] reveals that the two timescales perform similarly for most parameter values, yet τ_S is conceptually simpler. In general, different timescale-detection methods can perform well for different tasks, and additional research is needed to study the performance of various timescales in different scenarios.

Our work paves the way for the search of community detection methods best suited to growing networks. Inspired by the equivalence between modularity and the long-known Stochastic Block Model [46, 47], an approach based on the dynamic stochastic block model [48] can be compared against temporal modularity on our growing benchmark graphs. Besides, methods based on higher-order networks representations of temporal networks [10] or consensus dynamics [49] can be tested within our framework. Various modifications (e.g. multiple communities of variable size) may improve our model to generate growing networks with aging and community structure, and the resulting models might be tested to explain the evolution of real growing networks [50]. While we focused here on growing networks, generalizing our results to temporal networks where links can appear or disappear over time [51] is another open direction. Besides, cultural markets [52] and E-commerce systems [53] can be represented with bipartite networks of users and products that grow over time. Adapting our approach to bipartite networks is thus a problem with many potential practical applications.

To conclude, we found that communities detected with temporal modularity are statistically more significant than those detected with standard modularity in the majority of analyzed empirical growing networks, which suggests that including temporal information into the community detection algorithm can unveil a richer large-scale organization than that uncovered by static methods. This calls for a note of caution on the use of the popular modularity maximization and, more generally, static community detection algorithms. Our findings demonstrate that if we analyze a time-evolving system, the communities detected by modularity maximization might be strongly influenced by the age of the nodes. A better way to detect communities in time-evolving networks is to first measure the typical timescale of the dynamic mechanisms that generated that network, and then exploit this information to analyze the system's structural patterns at that timescale.

Acknowledgments

MM, MSM and YCZ conceived the original idea. MM and MSM designed the study. AZ and MM acquired the data. MM and AZ performed numeric simulations. MM and MSM analyzed the results. MM and MSM wrote the manuscript.

All datasets necessary to reproduce the results in this paper will be publicly available.

This work is supported by the Swiss National Science Foundation (Grant No. 200020-156188). MM acknowledges the support from the National Natural Science Foundation of China (Grant No. 11850410444). AZ acknowledges the support from the National Natural Science Foundation of China (Grant No. 61603046). MSM acknowledges financial support from the University of Zurich through the URPP Social Networks, the Swiss National Science Foundation (Grant No. 200021-182659), and the UESTC professor research start-up (Grant No. ZYGX2018KYQD215).

Appendix A.: Community division evaluation metrics

Normalized Mutual Information (NMI) is a standard evaluation metric in community detection research [18, 54]. Denoting the sets of nodes comprising the detected communities as ${{ \mathcal C }}_{1},\ldots ,{{ \mathcal C }}_{D}$ and the sets of nodes comprising the ground-truth communities as ${{ \mathcal G }}_{1},\ldots ,{{ \mathcal G }}_{T}$ , the normalized mutual information between the two community partitions is computed as

$\begin{eqnarray}&&{\rm{NMI}}=\displaystyle \frac{-2{\displaystyle \sum }_{i=1}^{D}{\displaystyle \sum }_{j=1}^{T}| {{ \mathcal C }}_{i}\cap {{ \mathcal G }}_{j}| \mathrm{log}\tfrac{N| {{ \mathcal C }}_{i}\cap {{ \mathcal G }}_{j}| }{| {{ \mathcal C }}_{i}| | {{ \mathcal G }}_{j}| }}{{\displaystyle \sum }_{i=1}^{D}| {{ \mathcal C }}_{i}| \mathrm{ln}\tfrac{| {{ \mathcal C }}_{i}| }{N}+{\displaystyle \sum }_{j=1}^{T}| {{ \mathcal G }}_{j}| \mathrm{ln}\tfrac{| {{ \mathcal G }}_{j}| }{N}},\end{eqnarray} \tag{ A.1 }$

where N is the total number of nodes. The terms that are of the kind 0 ln 0 are ignored in the summations.

Average community time span (Ω) is introduced to measure the time confinement of communities. For a detected community k, we compute the 80th and 20th percentile of node IDs in the community, and define the community time span Ω_k as the difference between the two values. (The difference between the maximal and minimal node ID in the community would be more prone to outlier nodes.) The overall average time span is computed as a weighted mean of Ω_k with weights given by the community sizes

$\begin{eqnarray}&&{\rm{\Omega }}=\displaystyle \frac{{\displaystyle \sum }_{k=1}^{D}| {C}_{k}| {{\rm{\Omega }}}_{k}}{{\displaystyle \sum }_{k=1}^{D}| {C}_{k}| }.\end{eqnarray} \tag{ A.2 }$

Based on figure 1 in the main text and the accompanying discussion, we hypothesize that in systems with fast aging, optimization of static modularity leads to time-constrained communities with time span that is comparatively smaller than the time span of communities identified with temporal modularity.

For a given network partition one can evaluate the number of detected communities, n_C. Since the distribution of community sizes can be uneven, we also compute the effective number of detected communities (also known as the inverse Herfindahl index)

$\begin{eqnarray}&&{n}_{\mathrm{eff}}=\displaystyle \frac{{\left({\displaystyle \sum }_{k}| {{\mathsf{C}}}_{k}| \right)}^{2}}{{\displaystyle \sum }_{k}| {{\mathsf{C}}}_{k}{| }^{2}},\end{eqnarray} \tag{ A.3 }$

where $| {{\mathsf{C}}}_{k}|$ is the size of detected community k. When one community contains almost all network nodes, ${n}_{\mathrm{eff}}\to 1$ . When all D detected communities have the same size, ${n}_{\mathrm{eff}}=D$ .

Appendix B.: Algorithm for optimizing temporal modularity

We maximize the temporal modularity defined by equation (10) by following the steps used by the Louvain algorithm that was originally proposed to maximize the standard Newman–Girvan modularity for undirected networks [40]. This algorithm is a 'greedy' optimization algorithm which means that only configuration changes that increase the objective function are accepted. The algorithm proceeds as follows. Each node is initially in its own community (c_i = i). In the first step, individual nodes move from one community to another. In particular, we choose a node at random and search for the community, which yields the largest increase of Q_T(L) if the node joins it. After moving the node to the best community, or after determining that there is no modularity-increasing move, we proceed with another node chosen at random. When no single node can be moved, optimization step one ends. Note that if a node that moves from a community is the last node of this community, the community effectively disappears. In this way, the number of communities progressively decreases during the first optimization step.

The second optimization step is similar but instead of individual nodes, we probe merging of entire communities. In particular, we choose a community at random and search for the community that yields the largest increase of Q_T(L) upon merging with the chosen community. When no pair of communities can be merged, optimization step two ends and the final partition of nodes is reported.

To speed up the computation, the inner sum in equation (10), ${\sum }_{l=1}^{L}{\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}{\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}/{m}_{l}$ , can be precomputed and stored in memory. Since the optimization algorithm contains randomness (we probe the nodes and communities, respectively, in random order), it is possible to run the algorithm several times and output the solution that yields the highest value of Q_T(L). In our simulations, we always use 10 independent algorithm runs.

Appendix C.: Additional results on model data

Results for all evaluated metrics in the parameter plane (τ_O, Θ) are shown here as heatmaps. This complements figure 2 in the main text which shows NMI and Ω for one model setting. Figure 4 in the main text shows the basic setting $N=1024,{n}_{0}=10,{k}^{{\rm{out}}}=5$ and f_B = 0.1. In addition to the observations mentioned in the main text, we see that choosing τ_O ≈ τ_S yields the smallest number of detected communities which is yet another positive contribution of temporal modularity. The agreement between the optimal τ_O and the system intrinsic timescale τ_S holds also when the communities are more interconnected (see figure C1). Naturally, the results of community detection are then worse in comparison with denser networks or those whose community structure is less noisy.

**Figure C1.** As figure 4 in the main text but f_B = 0.2 (i.e. the two ground-truth communities are more interconnected).
Download figure:
Standard image High-resolution image

The basic model that we use for all simulations in the main text and all results above is based on preferential attachment and aging; differently from [26], for simplicity there is no node fitness. Figures C2, C3 and C4 show results for model data with substantial variations to the model that are achieved by modifying equation (2) in the main text:

(i)
model with aging, no preferential attachment, and no fitness: ${P}_{i\to j}\sim {X}_{{ij}}{{\rm{e}}}^{-(i-j)/{\rm{\Theta }}}$ ,
(ii)
model with aging and exponentially distributed node fitness, no preferential attachment: ${P}_{i\to j}\sim {X}_{{ij}}{\eta }_{j}{{\rm{e}}}^{-(i-j)/{\rm{\Theta }}}$ where node fitness values η are exponentially distributed in the range $[1,\infty )$ ;
(iii)
same as the original model but exponential aging is replaced with power-law aging: ${P}_{i\to j}\sim {X}_{{ij}}({k}_{j}^{{\rm{in}}}+1)/(1+{[(i-j)/{\rm{\Theta }}]}^{2})$ ;

In all three cases, the community structure-inducing term X_ij remains the same as defined by equation (3) in the main text. As can be seen in figure C2–C4, these modifications do not alter any of our main results.

**Figure C3.** As figure 4 in the main text but preferential attachment has been removed from the model and nodes are assigned fitness that is exponentially distributed.
Download figure:
Standard image High-resolution image

**Figure C4.** As figure 4 in the main text but exponential aging has been replaced with power-law aging.
Download figure:
Standard image High-resolution image

Appendix D.: Real datasets

To support our findings, we use two different real datasets that can be represented with growing directed monopartite networks: a news dataset [44] and a citation dataset of papers published by the American Physical Society (APS).

D.1. News data

The news citation dataset was used to analyze the backbone of the citation network and its impact on network centrality metrics [44]. The dataset consists of news published by various outlets (newspapers and televisions) and citations among them. Since most citations are among the news published by the same outlet (91%), we treat articles published by individual outlets as independent datasets. For each subset, self-loops and nodes that do not belong in the giant component are discarded. Only subsets with 500 edges or more are included in the further analysis (there are 22 of them). Table D1 summarizes the basic properties of the analyzed news subsets.

Table D1. Basic characteristics of the datasets corresponding to various news outlets in the dataset from [44]; only the giant component is kept, subsets with less than 500 edges in the giant component are discarded. The shown characteristics: the number of nodes (N), the number of edges (m), mean degree (μ), and the ratio N/τ_S which, as we argue, can be used to determine the number of layers for temporal modularity.

News outlet	N	m	μ	N/τ_S
Guardian	24 122	45 327	1.88	111.2
Die Welt	26 417	37 610	1.42	118.5
Die Zeit	17 788	31 381	1.76	157.4
Washington Post	17 355	29 092	1.68	39.3
CBS	12 852	22 953	1.79	210.7
Der Spiegel	15 018	21 413	1.43	300.4
Los Angeles Times	10 116	20 220	2.00	12.6
Independent	12 054	17 510	1.45	88.6
Telegraph	10 840	17 091	1.58	73.7
New York Times	7656	12 500	1.63	52.1
International Business Times	7127	10 069	1.41	41.7
Basler Zeitung	3704	6798	1.84	39.0
Neue Zürcher Zeitung	4093	5832	1.42	14.2
Toronto Star	2327	3590	1.54	61.2
Sky News	1667	2407	1.44	119.1
BBC	1736	2288	1.32	115.7
Al Jazeera	1384	1818	1.31	28.8
Süddeutsche Zeitung	942	1171	1.24	94.2
Canadian Broadcasting Corporation	854	1012	1.19	22.5
United Press International	629	950	1.51	27.3
New Yorker	500	722	1.44	35.7
Atlantic	460	646	1.40	19.2

D.2. APS dataset

We use the APS citation data that were obtained on our request from https://journals.aps.org/datasets. Our dataset contains all papers published by the APS from 1893 until December 2013 and the links among them. Importantly, paper metadata contains paper PACS codes which allow us to construct subsets of the original dataset in a controlled way (see https://journals.aps.org/PACS for details on the Physics and Astronomy Classification Scheme, PACS). The original data comprise 539 974 papers and 5 992 897 citations among them. Majority of the papers have some PACS codes assigned (404 999 out of 539 974; most of the papers without PACS codes were published before 1977 when the PACS classification scheme was introduced). The PACS codes have a three-level hierarchy (for example, code '89.75.-k' represents 'Complex systems'). To construct subsets, we use the two top levels (for example, '89.75.*') which results in larger subsets than when complete PACS codes are used. Every paper that has a given two-level PACS code is considered as part of the subset. For each subset, self-loops and nodes that do not belong in the giant component are discarded. To create the subsets, we chose two-level PACS codes whose aging speed N/τ_S covers a broad range of values. Only subsets with at least 1000 edges are used for further analysis (there are 27 of them). Table D2 summarizes the basic properties of the analyzed APS subsets.

Table D2. Basic characteristics of the datasets corresponding various top-two-level PACS codes in the APS citation data until December 2013; only the giant component is kept, subsets with less than 1000 edges in the giant component are discarded. The shown characteristics: the number of nodes (N), the number of edges (m), mean degree (μ), and the ratio N/τ_S which, as we argue, can be used to determine the number of layers for temporal modularity.

PACS code	N	m	μ	N/τ_S
42.50-*	17 890	147 261	8.23	6.3
03.67-*	12 491	122 464	9.80	4.3
03.75-*	8786	116 448	13.25	5.1
98.80-*	9496	97 721	10.29	6.8
74.70-*	7078	51 389	7.26	10.7
14.60-*	3983	37 751	9.48	8.7
25.75-*	3604	34 646	9.61	6.4
74.60-*	3785	31 123	8.22	4.9
71.27-*	4999	23 603	4.72	5.7
04.25-*	1939	23 201	11.97	5.8
04.30-*	2109	22 505	10.67	5.8
95.35-*	2347	19 303	8.22	6.3
61.30-*	3584	19 261	5.37	5.9
72.25-*	2953	16 174	5.48	4.4
61.50-*	3044	9746	3.20	9.5
45.70-*	1750	8741	4.99	4.2
68.37-*	2858	8169	2.86	4.2
72.80-*	2624	6966	2.65	9.3
75.75-*	1585	5702	3.60	4.2
87.16-*	1337	4007	3.00	3.9
03.70-*	810	3302	4.08	7.9
81.15-*	1169	3290	2.81	5.1
52.27-*	855	3279	3.84	4.1
68.65-*	1126	2692	2.39	8.5
33.20-*	976	2322	2.38	8.8
61.70-*	598	1403	2.35	3.8
36.20-*	489	1053	2.15	9.4

Optimal timescale for community detection in growing networks

Article metrics

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Impact of network growth on modularity

2.1. Modularity

2.2. Model for growing networks with community structure

2.3. Breakdown of static modularity in growing networks

3. Community detection in growing networks

3.1. Multi-layer modularity

3.2. The optimal timescale of temporal modularity

3.3. Link-based and similarity-based timescale detection: a comparative analysis

4. Significance analysis

5. Implications for real networks

6. Discussion

Acknowledgments

Appendix A.: Community division evaluation metrics

Appendix B.: Algorithm for optimizing temporal modularity

Appendix C.: Additional results on model data

Appendix D.: Real datasets

D.1. News data

D.2. APS dataset

Optimal timescale for community detection in growing networks

Article metrics

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Impact of network growth on modularity

2.1. Modularity

2.2. Model for growing networks with community structure

2.3. Breakdown of static modularity in growing networks

3. Community detection in growing networks

3.1. Multi-layer modularity

3.2. The optimal timescale of temporal modularity

3.3. Link-based and similarity-based timescale detection: a comparative analysis

4. Significance analysis

5. Implications for real networks

6. Discussion

Acknowledgments

Appendix A.: Community division evaluation metrics

Appendix B.: Algorithm for optimizing temporal modularity

Appendix C.: Additional results on model data

Appendix D.: Real datasets

D.1. News data

D.2. APS dataset