Paper The following article is Open access

Optimal timescale for community detection in growing networks

, , and

Published 30 September 2019 © 2019 The Author(s). Published by IOP Publishing Ltd on behalf of the Institute of Physics and Deutsche Physikalische Gesellschaft
, , Citation Matúš Medo et al 2019 New J. Phys. 21 093066 DOI 10.1088/1367-2630/ab413f

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

1367-2630/21/9/093066

Abstract

Time-stamped data are increasingly available for many social, economic, and information systems that can be represented as networks growing with time. The World Wide Web, social contact networks, and citation networks of scientific papers and online news articles, for example, are of this kind. Static methods can be inadequate for the analysis of growing networks as they miss essential information on the system's dynamics. At the same time, time-aware methods require the choice of an observation timescale, yet we lack principled ways to determine it. We focus on the popular community detection problem which aims to partition a network's nodes into meaningful groups. We use a multi-layer quality function to show, on both synthetic and real datasets, that the observation timescale that leads to optimal communities is tightly related to the system's intrinsic aging timescale that can be inferred from the time-stamped network data. The use of temporal information leads to drastically different conclusions on the community structure of real information networks, which challenges the current understanding of the large-scale organization of growing networks. Our findings indicate that before attempting to assess structural patterns of evolving networks, it is vital to uncover the timescales of the dynamical processes that generated them.

Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 3.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Many systems that are of interest for social science, information science, and data mining can be represented as complex networks that are not static but grow with time. For example, the global scientific output grows at a fast and accelerating pace, which results in growing citation networks of scientific papers that represent our accumulated scientific knowledge [1, 2]. The World Wide Web [3] and social contact networks [4, 5] grow as well. The presence of growth challenges traditional network analysis [6, 7] and makes it essential to develop and validate time-aware methods to achieve a solid understanding of the structure of these systems [8]. In particular, extensive research has shown that the inclusion of temporal information into network analysis has a dramatic impact on long-studied problems such as community detection [911], node ranking [1113], dynamics control [14], and spreading phenomena [1517].

This article focuses on one of the fundamental problems in network science, the detection of communities [18], which has received enormous attention from diverse research areas, including physics [18], computer science [19], ecology [20], neuroscience [21], and social science [6, 22], among others. While the problem is not uniquely defined [18], it can be generically described as the problem of determining whether there exists a meaningful partition of the network nodes into groups of nodes. The problem (also known as clustering in computer science [19, 23]) has a long history, and it has been traditionally addressed using structural, static network analysis.

A popular approach to community detection is to maximize a function, called modularity, which quantifies how much the total number of intra-community edges deviates from its expected value under a null model that preserves the individual nodes' number of connections [24]. Modularity has been studied from many viewpoints, and it is widely-recognized as a standard tool in network analysis [18]. Despite past research and the wide use of modularity optimization in a broad range of contexts, we still lack a systematic understanding of its behavior and performance in growing networks where time and aging phenomena are fundamental [2527]. Albeit modularity has been used in such systems in its original form [2830], the results can be expected to be suboptimal as modularity neglects the vital time information. A multi-layer form of modularity has been developed that can take into account network snapshots at various times [31, 32]. However, when we wish to apply a multi-layer approach to identify relevant communities in growing networks, we face an impasse: existing works assume layered input data [3134] and thus they do not consider the question of how to divide an arbitrary time-stamped network into layers. Addressing this question requires to choose an appropriate observation timescale, i.e. the temporal duration for each layer [5, 35, 36]. This choice is essential because different timescales might reveal substantially different community structures, which in turn might lead to different conclusions on the large-scale organization of the system.

In this work, we derive analytically a criterion to estimate when a time-aggregated, static view of a growing network ceases to be sufficient for effective community detection through standard modularity maximization. When this criterion is not met, the detected communities are strongly determined by node age and therefore in discordance with the network's actual community structure. We introduce the observation timescale τO, divide the input network into subsequent layers of temporal duration τO each, construct a corresponding multi-layer modularity function, and use the resulting community detection method on diverse synthetic and real datasets. Remarkably, we find that the observation timescale ${\tau }_{O}^{* }$ that best uncovers the ground-truth communities in synthetic data is tightly related to the inherent aging timescale τS of the system's growth dynamics: ${\tau }_{O}^{* }\simeq {\tau }_{S}$. We use both synthetic and real data to show that different choices of the temporal resolution parameter lead to very different detected communities and conclusions on their statistical significance. Our results provide clear guidelines for data-driven calibration of multi-layer community detection techniques. Beyond the particular problem of community detection, the connection between the observation timescale τO used for structural analysis and the system's intrinsic timescale τS is relevant to the general problem of analyzing the structure and function of the broad variety of networks that evolve in time.

2. Impact of network growth on modularity

Before detailing the multi-layer community detection method, we start by demonstrating how temporal effects impair the ability of the traditional modularity maximization to uncover the community structure of growing networks with aging. Since aging is common for information networks where connections between the items are usually directed (such as citations in a scholarly citation network and followers in a social network), we use here the formalism of directed networks. A similar analysis is possible for undirected and bipartite networks.

2.1. Modularity

The classical Newman–Girvan modularity function [37] has been adapted to quantify the quality of a partition into communities in directed networks [38, 39] as

Equation (1)

where m is the number of network links, Aij is an element of the adjacency matrix which is 1 if node i points to node j and zero otherwise, kiout and kiin are respectively the out- and in-degree of node i, ci denotes the community of node i, δ(ci, cj) is the Kronecker delta which is 1 when ci = cj and zero otherwise, and the summation indexes i and j run over all N network nodes. Equation (1) is referred to as static modularity from now on. The task is to find the network partition that maximizes static modularity which thus serves as the objective function. Among the various existing approaches to modularity optimization, the Louvain algorithm [40] is a popular choice.

The negative term in equation (1) represents the expectation of Aij for a random network that has the same in- and out-degree sequence as the original network. However, such randomization is of limited use in growing networks where time plays an important role in the nodes' connection patterns [26, 27]: it neglects the original network's temporal properties and, consequently, it can violate the network's fundamental temporal constraints. In often-studied citation networks, for example, it generates 'unphysical' randomized networks where papers can also cite future papers [41]. To demonstrate the problem of standard modularity, and later to assess its modification suitable for growing networks, we set up a simple network growth model based on the classical preferential attachment process with aging [26].

2.2. Model for growing networks with community structure

The model assumes that each node belongs to one of two ground-truth communities and preferentially (to a degree that can be tuned in the model) links to other nodes in the same community. The ground-truth community of node i, Ci, is chosen at random. The model can be easily extended to a case with more communities of various relative size. There are initially n0 nodes that are all assumed to appear at time 0. The network then grows in time steps t = 1, ..., N − n0. In each time step, one node is introduced in the network; the final number of nodes is thus N. Each introduced node creates kout outgoing links to the existing nodes. The probability that node i points to an existing node j is

Equation (2)

where kjin is the current indegree of node j, exponential aging controlled by the parameter Θ is assumed, and the general preference for links between nodes i and j is encoded in the term Xij which is described below. If node j has been chosen by node i before, the choice is repeated. In this way, there is at most one directed link between any two nodes in the network. Small Θ values result in 'short' links that connect the newly introduced node with other recently introduced nodes. As Θ increases, aging slows down and becomes negligible when Θ ≫ N. We refer to table 1 for a summary of the notation for all model parameters and network properties.

Table 1.  Network model summary. Adopted notation for the model parameters and the resulting network properties.

Variable type Variable Meaning
Model parameter n0 Initial number of nodes
  N Final number of nodes
  kout Outdegree of the introduced nodes
  Θ Aging parameter
  μ Community-mixing parameter
Network property m Number of links
  fB Fraction of inter-community links
  $\overline{k}$ Average degree

Note that equation (2) for simplicity omits the fitness term that is crucial to control the resulting network degree distribution [26]. We consider various model variants, including a variant with heterogeneous node fitness values, in appendix C. The community structure is introduced in the model by assuming

Equation (3)

where Ci and Cj are the ground-truth communities of i and j, respectively. Xij is 1 − μ if nodes i and j are in the same ground-truth community and μ if they are not. As a result, the number of links between the communities grows with μ. Other benchmark models for community detection, such as the Lancichinetti–Fortunato–Radicchi model [42], allow by construction to directly set the fraction of inter-community links in model networks. In our growth model, instead, μ only influences the preference for intra-community links in the network growth. The resulting fraction of inter-community links, fB, therefore emerges by an interplay of this preference, the pool of available target nodes, and aging. Figure 1 shows that fB grows with μ in a nonlinear yet monotonous manner. In our numeric simulations, we achieve the desired fB values by choosing the appropriate μ(Θ).

Figure 1.

Figure 1. The relation between the community-mixing parameter μ and the resulting fraction of inter-community links fB. In the described network model, fB shows a nonlinear yet monotonous dependence on μ. As an illustration, we show here the fraction of links between communities fB as a function of μ for model data with ${n}_{0}=10,N=512,{k}^{{\rm{out}}}=10$, and two ground-truth communities.

Standard image High-resolution image

2.3. Breakdown of static modularity in growing networks

Figure 2(A) shows the result of modularity maximization in a toy model network. We see that, despite the two true communities being visually well separated, the result is markedly wrong as the nodes are essentially clustered by their appearance time. To understand why modularity fails to recognize the true communities, we focus now on the simple case where the two ground-truth communities are of similar size and perfectly separated (in the model, this is achieved by setting μ = 0). For the correct partition into two communities that are identical with the ground-truth communities, the modularity contribution from true community k is

Equation (4)

The total modularity of the correct division is thus 1/2. We now study the impact of dividing one correct community into two parts with sizes N1 and N2 (N1 + N2 = N/2). Modularity contribution from these two parts that cover true community k is

Equation (5)

Here, Δm represents the 'loss' of links that are in the true community k but do not contribute to ${Q}_{k}^{{\prime} }$ because they run between the two parts that we now consider (i.e. they cross the partition boundary). The third and the fourth term represent the sum of expectation terms which goes over all ${N}_{1}^{2}$ pairs of nodes in the two parts into which k is subdivided. The smallest sum of expectation terms is achieved when N1 = N2 = N/4 (i.e. the true community is divided into two parts of the same size) when we obtain

Equation (6)

where we used $m=\overline{k}N$.

Figure 2.

Figure 2. The breakdown of modularity in growing networks. (A) Temporal confinement of communities identified by the standard static modularity in a model network with fast aging ($N=60,{\rm{\Theta }}=3,{k}^{{\rm{out}}}=10,\mu =0.05;$ the resulting fraction of links between the communities is fB = 0.24). The horizontal position of nodes is given by their appearance time, the vertical position is given by their true community affiliation, and node color marks the community assigned by modularity maximization. We see that static modularity is essentially insensitive to the true communities and clusters the nodes by their appearance time. (B)–(D) The behavior of static modularity on model data with different aging timescales (in all simulations kout = 5 and μ = 0, hence fB = 0; results are averaged over 100 model realizations and the shaded areas visualize the 10th–90th percentile range). Albeit the two ground-truth communities are perfectly separated in the model data, from some network size, modularity optimization yields inferior results with increasing modularity (panel B) yet decreasing NMI (panel C). The loss of NMI is due to the number of identified communities nC which is correctly two until some N but then it starts to grow (panel D). The vertical lines mark the analytically estimated thresholds of 4Θ for the three plotted Θ values.

Standard image High-resolution image

At this point, we can understand why a division of a single ground-truth community into more parts can increase modularity. If the average degree $\overline{k}$ is fixed, the number of links m is proportional to the number of nodes N. At the same time, aging suppresses the formation of 'long' links in the network and therefore implies an upper bound on the number of links between the two parts, Δm. As the network grows, the negative term Δm/m therefore decreases and, thanks to a higher absolute term, ${Q}_{k}^{{\prime} }$ thus eventually exceeds Qk. At this point, modularity maximization results in dividing the true community k into two parts. As the network grows further, the divisions continue and the number of identified communities grows (see figure 2(D)).

We now estimate the network size at which the first division occurs. We do so assuming the exponential aging that we use in our numerical simulations. If we disregard preferential attachment, the probability that a link created under exponential aging targets a node introduced n steps ago is approximately $\exp (-n/{\rm{\Theta }})/{\rm{\Theta }}$ (we assume that Θ is large, so summation over individual n values can be replaced with integration that eventually yields the normalization factor Θ). The first node after the partition boundary must necessarily point all its outgoing links across the boundary. For a node n steps after the partition boundary, the fraction of boundary-crossing links created by this note is approximately ${\sum }_{i=n}^{\infty }\exp (-i/{\rm{\Theta }})/{\rm{\Theta }}\approx \exp (-n/{\rm{\Theta }})$. The total number of boundary-crossing links is then obtained by multiplying with kout and summing over all n values. In the summation, each n values carries the weight 1/2 because there are two communities and we count the boundary-crossing links in only one of them. Hence

Equation (7)

When the initial number of nodes, n0, is small, ${k}^{{\rm{out}}}\approx \overline{k}$. The inequality ${Q}_{k}^{{\prime} }\gt {Q}_{k}$ can be now solved for N, yielding

Equation (8)

When this condition is met, network divisions into four (or more) parts are preferred to the correct division into two ground-truth communities.

Our analytically-derived criterion predicts accurately the breakdown of modularity in numeric experiments in the absence of inter-community links (figure 2, panels B–D). In particular, when the criterion defined by equation (8) is met, the optimal modularity is larger than the value (Q = 0.5) expected for a partition with two perfectly-separated communities (figure 2(B)), the Normalized Mutual Information (see appendix A for the definition) between detected and ground-truth communities is significantly smaller than one (figure 2(C)), and the number of detected communities is larger than two (figure 2(D)). In other words, in growing networks above some network size, modularity optimization fragments the ground-truth communities into smaller communities that are mostly determined by the nodes' age.

3. Community detection in growing networks

To resolve the limitations of static modularity, we propose the temporal modularity quality function building on the recently-introduced Dynamic Configuration Model (DCM) for growing networks [41] which proposes a way of randomizing time-stamped networks whilst approximately preserving the time evolution of each node's degree.

3.1. Multi-layer modularity

To define the temporal modularity function, we divide the network's links by time into L layers of an equal number of links; layers l = 1 and l = L contain the earliest and latest links, respectively. Possible ties (several links created at the same time) can be solved by ordering them at random. Since ties are scarce in the real networks analyzed here, their impact is marginal. The numbers of outgoing and incoming links established by node i in layer l are denoted as ${\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}$ and ${\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}$, respectively (note that ${\sum }_{l}{\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}={k}_{i}^{{\rm{out}}}$ and ${\sum }_{l}{\rm{\Delta }}{k}_{i,l}^{{\rm{in}}}={k}_{i}^{{\rm{in}}}$). The total number of links created in layer l is ml (${\sum }_{l}{m}_{l}=m$). Note that requiring the layers to have the same number of links is just one of the possible choices. Other simple choices are layers of an equal number of newly introduced nodes, and layers of equal physical timespan (we discuss the latter in section 5). While different real networks can, in principle, require different ways of constructing the layers, the current choice of an equal number of links has the advantage of producing layers of comparable statistical power.

With the constraint of given degree increase sequences ${\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}$ and ${\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}$, the expectation of Aij can be written as

Equation (9)

where we assume that the connections between the nodes are random within each layer. This expectation value can be used to replace the time-ignoring expectation value ${k}_{i}^{{\rm{out}}}{k}_{j}^{{\rm{in}}}/m$ in equation (1) and obtain temporal modularity

Equation (10)

where L = 1 recovers static modularity defined by equation (1). Similarly to static modularity, temporal modularity is negative when each node constitutes an independent community and it can increase by favorable changes in the community assignment; the result of its maximization is thus non-trivial and depends on the network structure. Unlike previous work on communities in multilayer networks [31], we assume that node community affiliation does not change over time. If necessary, this assumption can be relaxed. We adapt the Louvain algorithm [40] to optimize temporal modularity (see appendix B for details). In the toy example from figure 2(A), temporal modularity partitions the nodes correctly for any L from 4 to 20 (see figure 3).

Figure 3.

Figure 3. Comparing modularity and temporal modularity in a small network. Partitions obtained by maximizing static directed modularity (top row) and the temporal modularity with 9 temporal layers (bottom row) in the model network from figure 2(A). The choice L = 9 follows from the observation timescale criterion described in section 3.2.

Standard image High-resolution image

Equation (10) can be viewed as a special case of the previously introduced multi-layer modularity [31, 32] where we constrain node affiliation to be fixed in time. While those studies assumed the layered structure of the data to be given, we obtained equation (10) by construction from continuously growing network data. As a result, the number of layers L can be freely varied, and it is thus important to study how to choose it in practice. This problem was not investigated in previous studies on multi-layer generalizations of modularity [31, 32] where the division of the network into layer was assumed to be given a priori, which is not the case for a generic time-stamped dataset.

One expects that the choice of L is linked with the network's aging speed: while one layer may suffice when aging is slow or even absent, many layers are needed when aging is fast. To measure the aging speed, we measure the median span of links. Assuming that the nodes are labeled by the time of appearance and denoting the out-going and in-coming node of edge n as enout and enin, respectively, the system's median link span τS can be computed as the median of $| {e}_{n}^{{\rm{out}}}-{e}_{n}^{{\rm{in}}}| $, that is

Equation (11)

This timescale can be now compared with the average layer timespan

Equation (12)

which defines the observation timescale of the new community detection method (see table 2 for a summary of all the timescales and related variables considered in this article). If each layer covers time much longer than the aging timescale (${\tau }_{O}\gg {\tau }_{S}$), the temporal effects 'average' out and we can expect the results to be similar to those obtained with static modularity. By contrast, if layers are many and each of them contains only a handful of links (${\tau }_{O}\ll {\tau }_{S}$), temporal modularity is expected to be hampered by statistical fluctuations. We thus expect temporal modularity to work best at an intermediate range of τO values; we will determine the optimal timescale below.

Table 2.  Timescales and related parameters. Summary of the notation adopted in this paper. Note that the nodes are labeled by their order of appearance.

Variable Meaning
Θ Aging parameter in the model
L Number of layers
${\tau }_{O}:= N/L$ Observation timescale (layer duration)
enout Outgoing node of edge n
enin Incoming node of edge n
τS Detected link-based timescale (median of $| {e}_{n}^{{\rm{out}}}-{e}_{n}^{{\rm{in}}}| $)
τJ Similarity-based timescale, extending [36]

To evaluate community detection results on model data, we compute Normalized Mutual Information (NMI) between the detected partitions and the ground-truth communities. Motivated by the tendency of static modularity to produce temporally confined communities, we also compute the size-weighted average time span Ω of the detected communities which is related to the age difference between the oldest and the most recent nodes in each identified community. The higher the Ω, the less temporally confined are the identified communities. The advantage of this metric is that it only concerns the properties of the detected communities and it does not involve any notion of ground-truth communities which cannot be uniquely defined in real data [43]. We will thus use Ω to evaluate partitions in real networks, which will help us to establish a bridge between model-based observations and real data. Details for these two metrics and other metrics that further support our findings are described in appendix A.

3.2. The optimal timescale of temporal modularity

Results for model data with various aging timescales are shown in figure 4. While panels (A, B) explore the parameter plane (τO, Θ), panels (C, D) use τO/τS on the horizontal axis. As can be seen, both the NMI and Ω show a peak around τO/τS ≈ 1, in particular when aging is sufficiently fast (Θ ≪ N). The system's intrinsic timescale τS thus directly determines the optimal value of ${\tau }_{O}^{* }$ for the temporal modularity's layers. The loss of temporal modularity's efficiency when the observation timescale is too long compared to the system's timescale (${\tau }_{O}/{\tau }_{S}\gtrsim 1$) is particularly fast. When τO/τS ≲ 1 (i.e. layers are shorter than the system timescale), we can observe performance plateaus that end when the number of nodes per layer becomes too small (in our simulations 10 nodes per layer or less) and the results become hampered by insufficient statistics. Panels (B, D) further show that the community time span, Ω, can be indeed used to distinguish between the large-τO regime where communities are mainly determined by time and thus of limited time span (right side of the figure), an intermediate regime with 'long' communities that are independent of time (hence they can reflect the network's structural information), and finally the noisy small-τO regime that again yields shorter sub-optimal communities.

Figure 4.

Figure 4. The optimal timescale for community detection in growing networks. (A), (B) Community detection results for model data as a function of the observation timescale τO and the model aging parameter Θ: NMI (A) and the average community timespan (B). τO = 1024 corresponds to static modularity. The dashed lines mark the intrinsic system timescale τS corresponding to given Θ. Model parameters: N = 1024, n0 = 10, kout = 5, fB = 0.1; results are averaged over 100 model realizations. (C), (D) To better appreciate the relation between ${\tau }_{S}$ and the optimal τO, data from panels (A), (B) are plotted here as a function of τO/τS. Full circles mark the right-most points of each curve which correspond to the results obtained with static modularity. These results show that the optimal NMI between the detected and ground-truth communities is achieved for τO ≃ τS; (2) the least time-confined communities (i.e. the communities with the largest Ω values) are observed for τO ≃ τS.

Standard image High-resolution image

Additionally, figure 5 shows results for two other evaluation metrics: the average number of detected communities and the average effective number of detected communities. We see that τO/τS ≈ 1 leads to community divisions with fewer communities (both in absolute terms as well as measured by the effective number of communities ${n}_{\mathrm{eff}}$) than other choices of τO. Results for other model settings and variants (see the figures in appendix C) further confirm that choosing τO = τS is optimal or nearly-optimal in many circumstances. The system's intrinsic timescale τS is thus an important connection between community detection using temporal modularity and the system's intrinsic properties. Since τS is based on studying the temporal properties of individual links, in the following, we refer it as link-based timescale.

Figure 5.

Figure 5. The number of detected communities in growing networks. The average number of detected communities (A) and the average effective number of detected communities (B) for model data. Model parameters and other settings as in figure 4.

Standard image High-resolution image

Let us illustrate how the link-based timescale can be used to overcome the modularity breakdown illustrated in our initial example (figure 2(A)). For this network, the link-based timescale is τS = 7. Setting the observation timescale τO to 7 corresponds to choosing L = N/τO ≈ 8.6 layers in temporal modularity given by equation (10). In figure 3, we rounded that up to 9 layers, leading to the perfect community detection result.

3.3. Link-based and similarity-based timescale detection: a comparative analysis

In the previous analysis, we have defined the system's timescale using the median link time span τS. We now aim to compare this timescale detection criterion against an existing principled method for timescale detection in complex systems [36], which we refer to as similarity-based timescale detection. The original method introduced in [36] constructs iteratively layers ${{ \mathcal S }}_{n}=[{t}_{n},{t}_{n}+{\rm{\Delta }}{t}_{n})$ by maximizing the Jaccard similarity between the sets of events that occurred in pairs $({{ \mathcal S }}_{n},{{ \mathcal S }}_{n+1})$ of consecutive layers (the maximization is with respect to the layer duration Δtn). The original method thus can, in principle, detect layers of heterogeneous lengths. However, as our synthetic networks feature a homogeneous timescale, we consider a variant of the method that aims to detect a single timescale. For given layer duration τ, we compute the average Jaccard similarity between pairs of consecutive layers. The layer duration that leads to the maximal average similarity is selected.

The key element in the similarity-based timescale detection is the definition of an event. Since [36] studies temporal networks, an event naturally is a temporal network link that occurs in layer n. In our case of a growing network, a link between two nodes appears at most once—event sets composed of links would be therefore disjoint and their similarity would be zero for every layer duration τ. We thus extend the original method and assume that an event is a node receiving a link in layer n (as in [36], we consider unweighted events, i.e. it does not matter how many links a node has received). Denoting ${{ \mathcal S }}_{n}$ the set of nodes that receive at least one link in layer n, the Jaccard similarity of layers n and $n+1$ is $J=| {{ \mathcal S }}_{n}\cap {{ \mathcal S }}_{n+1}| /| {{ \mathcal S }}_{n}\cup {{ \mathcal S }}_{n+1}| $. The resulting timescale obtained by maximizing the average layer similarity is referred to as τJ.

Our simulation results show that the two timescales, τS and τJ, have similar values across the whole range of the model aging parameter Θ, and link-based timescales tend to be longer than the similarity-based timescales (figure 6(A)). More importantly, when the detected timescales are used to set the temporal modularity's layer duration, the detected communities are in much better agreement with the ground-truth communities than when static modularity is used. The two timescales yield similar NMI values across the whole range of Θ values except for the two smallest Θ values where the similarity-based timescales performs better than the link-based timescale. It has to be nevertheless noted that the link-based timescale is considerably simpler and amendable to analytical solutions than the similarity-based timescale.

Figure 6.

Figure 6. Comparing the performance of different timescale detection procedures. Results are obtained on model data with parameters as in figure 4 and the model aging parameter Θ varied in the range [1, 1024]. The lines and the shaded areas indicate, respectively, the means and the standard deviation values computed from 100 model realizations.

Standard image High-resolution image

4. Significance analysis

To assess the statistical significance of the detected communities, we compare the results obtained on the model networks with those on model networks where links are randomized within each layer. To this end, we implement the Dynamic Configuration Model (DCM, [41]) using the layer division described before (see equation (10) and its justification): all network links are sorted by the time of their appearance and divided into L layers of equal size. In each layer l, the ${\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}$ outgoing stubs of the nodes are matched with the ${\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}$ incoming stubs of the nodes by choosing the candidate nodes preferentially by the number of remaining stubs. If the matched nodes correspond to an already existing link or a self-loop, the matching is repeated. In the end, there may be a small number of stubs that cannot be matched but this number is usually negligible. The number of layers in the DCM randomization is chosen depending on the system's timescale as N/τS rounded down.

Denoting the highest modularity values achieved on the original and DCM-randomized data as ${Q}_{T}{(L)}^{{\rm{orig}}}$ and ${Q}_{T}{(L)}^{{\rm{DCM}}}$, respectively, the significance of the detected communities can be evaluated by computing the z-score

Equation (13)

where $E[{Q}_{T}{(L)}^{{\rm{DCM}}}]$ and $\sigma [{Q}_{T}{(L)}^{{\rm{DCM}}}]$ denote the mean and the standard deviation of ${Q}_{T}{(L)}^{{\rm{DCM}}}$ over multiple realizations of the DCM randomization (we use 100 realizations). When z is small, the identified communities are not significant as their modularity is similar (or even worse) to the maximal modularity achievable in randomized data that are by construction free of any community structure. The higher the z value, the more significant the network partition (commonly used significance thresholds for z-score are two and three).

We use bootstrap to estimate the uncertainty of the estimated z scores. Denoting the estimated standard deviation of the resulting z-score as σ[z], we can compare the significance of community partitions A and B by evaluating the normalized z-score difference

Equation (14)

which compares the difference between the partitions' z-score with the combined uncertainty of the z-score estimates. Positive δz means that partition A is more significant than partition B. When δz is large, the observed significance-difference is significant itself. In figures 8(C), (F), we use the threshold of 3 to decide between subsets where temporal modularity yields significantly less or significantly more significant communities than static modularity.

For the toy example from figure 3, the described significance analysis shows that the result obtained at L = 9 is much more significant than that obtained with static modularity (the z-scores are 15 and 2.6, respectively). Importantly, the usual significance analysis using the Configuration Model (which, similarly as static modularity, does not take time information into account and it is thus inappropriate in the current setting) deems the incorrect partition obtained with static modularity as highly significant (its z-score is 28). This is an example of a static method that not only produces a misleading result but also confirms its statistical significance.

Significance analysis results for the basic model setting from figure 4 are shown in figure 7. We see that: (1) the detected communities are statistically significant when τO ≈ τS, (2) the communities detected by static modularity are less significant or even not significant when aging is fast; this indicates that these communities are largely determined by node degree dynamics that is preserved by the DCM and not by higher-order structural patterns, (3) a too short observation timescale is also damaging to the significance of the detected communities. We compute also the average NMI between the originally identified communities and the communities identified in randomized data. Figure 7(B) shows that the difference between the communities identified in the original and the randomized data is greatest when τO ≈ τS, which confirms that temporal modularity is then most sensitive to the actual network structure.

Figure 7.

Figure 7. The statistical significance of the communities detected by the temporal modularity maximization. We compare our previous results against randomized data: the z-score (A) and average NMI between the original and the communities detected in networks randomized with the Dynamic Configuration Model [41] (B). Model parameters as in figure 4; results are averaged over 100 model realizations, each of them is randomized ten times.

Standard image High-resolution image

5. Implications for real networks

As τO ≃ τS proves to be an optimal choice of the temporal modularity function in model networks, two natural questions arise: how different are the communities detected by temporal modularity optimization (with τO = τS) from those detected by static modularity maximization in real growing networks? Do the results obtained with temporal modularity call for a revision of our conclusions on the significance of the community structure of growing networks based on static modularity maximization?

To investigate the relation between temporal and static modularity maximization in real data, we use subsets of the news citation dataset that was used in [44] to analyze the backbone of the citation network, and subsets of the American Physical Society (APS) citation data from years 1893–2013; the subsets correspond to specific newspapers and PACS codes, respectively (see appendix D for details). We focus on three main properties: (1) the average community lifespan Ω of the detected communities—we compare Ω1 achieved by static modularity with ΩT achieved by temporal modularity; (2) the NMI between the communities detected by static and temporal modularity (since the true partition is unknown, we cannot directly evaluate the 'correctness' of the obtained communities); (3) the normalized z-score difference between the partitions obtained with temporal and static modularity, respectively.

Figures 8(A), (D) show that the ratio ΩT1 is generally larger than one, confirming that the communities detected by temporal modularity have a longer time span than those detected by static modularity. Importantly, ΩT1 decreases with τS: as aging becomes faster, the communities detected by temporal modularity become more 'stretched' over time compared to those detected by static modularity. This is in qualitative agreement with our results on model data where the increase of Ω by temporal modularity grows with the aging speed (figure 4(D)). Figures 8(B), (E) show that the NMI between the communities by temporal modularity and those by static modularity is substantially smaller than one. Furthermore, the communities detected by the two methods tend to differ more for larger networks. Figures 8(C), (F) show that the communities by temporal modularity tend to be more statistically significant than the communities by static modularity. This indicates that properly including temporal information into the detection algorithm can substantially alter the conclusions on the significance of the community structure in growing networks. The analyzed real growing networks tend to be more 'temporally modular' than modular: factoring out temporal patterns allows us to reveal a richer level of organization than possible with static modularity alone.

Figure 8.

Figure 8. Implications of temporal effects for community detection in empirical growing networks. We report the community detection results for the empirical News subsets (top) and the APS subsets (bottom); see appendix D the data description. The columns show that, in order: (1) the average community time span ΩT obtained with temporal modularity using τO = τS is longer than the static modularity timespan Ω1, and the ratio ΩT1 tends to decrease with the relative aging speed τS/N (Spearman correlation −0.77 and −0.78 for the news and the APS data, respectively). (2) NMI between communities obtained with temporal and static modularity tends to decrease with the number of nodes (Spearman correlation −0.77 and −0.86 for the news and the APS data, respectively). (3) Communities obtained with temporal modularity tend to be more significant than those obtained with static modularity (the light and dark part of each column visualize the subsets with the number of nodes below and above the median for given datasets, respectively, indicating that the difference is even bigger for larger networks).

Standard image High-resolution image

Recent literature [45] suggests that 'event time' defined by the number of nodes characterizes the decay of attention in citation networks better than 'real time'. We have nevertheless considered an alternative to the equal layer size construction introduced in section 3.1 where real time is used instead to both define the layers as well as to measure the intrinsic system timescale τS which is then based on the median of real time differences between the appearance times of the out-going and the in-coming node. The obtained results qualitatively agree with those presented in figure 8 without strong evidence in favor of either of the two temporal modularity constructions. Further research is necessary to map the settings where either of the two constructions is preferrable, and to understand what distinguishes them.

6. Discussion

The implications of our work are multi-fold. It has become increasingly clear [8, 10, 16] that to properly detect structural patterns in time-evolving systems, we need time-aware network analysis methods. While methods based on multi-layer representations of time-evolving networks are potentially powerful [31, 32, 34], they call for the fundamental question of how to choose the temporal resolution at which to look at the system [35, 36]. Importantly, looking at a given system with different observation timescales can reveal different structural phenomena, such as different group organizations in social systems [5] and different behavioral patterns in communication networks [35], among others. Our work sets out to determine the optimal observation timescale ${\tau }_{O}^{* }$ for community detection based on a multi-layer generalization of the modularity function, which we called temporal modularity.

We found, both analytically and numerically, that modularity maximization yields unsatisfactory results in networks where aging decays sufficiently quickly. We found that the optimal observation timescale ${\tau }_{O}^{* }$ of the multi-layer modularity function is determined by the system aging timescale: ${\tau }_{O}^{* }\simeq {\tau }_{S}$. Different choices of τO lead indeed to sub-optimal performance in the reconstruction of ground-truth communities in model data, and to communities that are more strongly determined by node age in real data. The optimal timescale to look at the system is therefore close to the inherent timescale of the system's growth dynamics. This supports the idea that analyzing the structure of a time-evolving network requires to first understand the properties of the dynamical process that generated that network. Note that τS is not bound to the problem of community detection and characterizes the general aging pattern of a given network.

We proposed a simple procedure to detect communities in networks that organically grow over time. We suggested a novel metric, the link-based timescale τS, to characterize the temporal patterns of a network, and demonstrated that this timescale should be used in the community detection process. When the aging timescale is short, the communities detected with our new framework are dramatically different from those detected with standard time-ignoring (static) approaches such as modularity maximization. Our comparison of the link-based timescale τS against similarity-based timescale τJ based on previous literature [36] reveals that the two timescales perform similarly for most parameter values, yet τS is conceptually simpler. In general, different timescale-detection methods can perform well for different tasks, and additional research is needed to study the performance of various timescales in different scenarios.

Our work paves the way for the search of community detection methods best suited to growing networks. Inspired by the equivalence between modularity and the long-known Stochastic Block Model [46, 47], an approach based on the dynamic stochastic block model [48] can be compared against temporal modularity on our growing benchmark graphs. Besides, methods based on higher-order networks representations of temporal networks [10] or consensus dynamics [49] can be tested within our framework. Various modifications (e.g. multiple communities of variable size) may improve our model to generate growing networks with aging and community structure, and the resulting models might be tested to explain the evolution of real growing networks [50]. While we focused here on growing networks, generalizing our results to temporal networks where links can appear or disappear over time [51] is another open direction. Besides, cultural markets [52] and E-commerce systems [53] can be represented with bipartite networks of users and products that grow over time. Adapting our approach to bipartite networks is thus a problem with many potential practical applications.

To conclude, we found that communities detected with temporal modularity are statistically more significant than those detected with standard modularity in the majority of analyzed empirical growing networks, which suggests that including temporal information into the community detection algorithm can unveil a richer large-scale organization than that uncovered by static methods. This calls for a note of caution on the use of the popular modularity maximization and, more generally, static community detection algorithms. Our findings demonstrate that if we analyze a time-evolving system, the communities detected by modularity maximization might be strongly influenced by the age of the nodes. A better way to detect communities in time-evolving networks is to first measure the typical timescale of the dynamic mechanisms that generated that network, and then exploit this information to analyze the system's structural patterns at that timescale.

Acknowledgments

MM, MSM and YCZ conceived the original idea. MM and MSM designed the study. AZ and MM acquired the data. MM and AZ performed numeric simulations. MM and MSM analyzed the results. MM and MSM wrote the manuscript.

All datasets necessary to reproduce the results in this paper will be publicly available.

This work is supported by the Swiss National Science Foundation (Grant No. 200020-156188). MM acknowledges the support from the National Natural Science Foundation of China (Grant No. 11850410444). AZ acknowledges the support from the National Natural Science Foundation of China (Grant No. 61603046). MSM acknowledges financial support from the University of Zurich through the URPP Social Networks, the Swiss National Science Foundation (Grant No. 200021-182659), and the UESTC professor research start-up (Grant No. ZYGX2018KYQD215).

Appendix A.: Community division evaluation metrics

Normalized Mutual Information (NMI) is a standard evaluation metric in community detection research [18, 54]. Denoting the sets of nodes comprising the detected communities as ${{ \mathcal C }}_{1},\ldots ,{{ \mathcal C }}_{D}$ and the sets of nodes comprising the ground-truth communities as ${{ \mathcal G }}_{1},\ldots ,{{ \mathcal G }}_{T}$, the normalized mutual information between the two community partitions is computed as

Equation (A.1)

where N is the total number of nodes. The terms that are of the kind 0 ln 0 are ignored in the summations.

Average community time span (Ω) is introduced to measure the time confinement of communities. For a detected community k, we compute the 80th and 20th percentile of node IDs in the community, and define the community time span Ωk as the difference between the two values. (The difference between the maximal and minimal node ID in the community would be more prone to outlier nodes.) The overall average time span is computed as a weighted mean of Ωk with weights given by the community sizes

Equation (A.2)

Based on figure 1 in the main text and the accompanying discussion, we hypothesize that in systems with fast aging, optimization of static modularity leads to time-constrained communities with time span that is comparatively smaller than the time span of communities identified with temporal modularity.

For a given network partition one can evaluate the number of detected communities, nC. Since the distribution of community sizes can be uneven, we also compute the effective number of detected communities (also known as the inverse Herfindahl index)

Equation (A.3)

where $| {{\mathsf{C}}}_{k}| $ is the size of detected community k. When one community contains almost all network nodes, ${n}_{\mathrm{eff}}\to 1$. When all D detected communities have the same size, ${n}_{\mathrm{eff}}=D$.

Appendix B.: Algorithm for optimizing temporal modularity

We maximize the temporal modularity defined by equation (10) by following the steps used by the Louvain algorithm that was originally proposed to maximize the standard Newman–Girvan modularity for undirected networks [40]. This algorithm is a 'greedy' optimization algorithm which means that only configuration changes that increase the objective function are accepted. The algorithm proceeds as follows. Each node is initially in its own community (ci = i). In the first step, individual nodes move from one community to another. In particular, we choose a node at random and search for the community, which yields the largest increase of QT(L) if the node joins it. After moving the node to the best community, or after determining that there is no modularity-increasing move, we proceed with another node chosen at random. When no single node can be moved, optimization step one ends. Note that if a node that moves from a community is the last node of this community, the community effectively disappears. In this way, the number of communities progressively decreases during the first optimization step.

The second optimization step is similar but instead of individual nodes, we probe merging of entire communities. In particular, we choose a community at random and search for the community that yields the largest increase of QT(L) upon merging with the chosen community. When no pair of communities can be merged, optimization step two ends and the final partition of nodes is reported.

To speed up the computation, the inner sum in equation (10), ${\sum }_{l=1}^{L}{\rm{\Delta }}{k}_{i,l}^{{\rm{out}}}{\rm{\Delta }}{k}_{j,l}^{{\rm{in}}}/{m}_{l}$, can be precomputed and stored in memory. Since the optimization algorithm contains randomness (we probe the nodes and communities, respectively, in random order), it is possible to run the algorithm several times and output the solution that yields the highest value of QT(L). In our simulations, we always use 10 independent algorithm runs.

Appendix C.: Additional results on model data

Results for all evaluated metrics in the parameter plane (τO, Θ) are shown here as heatmaps. This complements figure 2 in the main text which shows NMI and Ω for one model setting. Figure 4 in the main text shows the basic setting $N=1024,{n}_{0}=10,{k}^{{\rm{out}}}=5$ and fB = 0.1. In addition to the observations mentioned in the main text, we see that choosing τO ≈ τS yields the smallest number of detected communities which is yet another positive contribution of temporal modularity. The agreement between the optimal τO and the system intrinsic timescale τS holds also when the communities are more interconnected (see figure C1). Naturally, the results of community detection are then worse in comparison with denser networks or those whose community structure is less noisy.

Figure C1.

Figure C1. As figure 4 in the main text but fB = 0.2 (i.e. the two ground-truth communities are more interconnected).

Standard image High-resolution image

The basic model that we use for all simulations in the main text and all results above is based on preferential attachment and aging; differently from [26], for simplicity there is no node fitness. Figures C2, C3 and C4 show results for model data with substantial variations to the model that are achieved by modifying equation (2) in the main text:

  • (i)  
    model with aging, no preferential attachment, and no fitness: ${P}_{i\to j}\sim {X}_{{ij}}{{\rm{e}}}^{-(i-j)/{\rm{\Theta }}}$,
  • (ii)  
    model with aging and exponentially distributed node fitness, no preferential attachment: ${P}_{i\to j}\sim {X}_{{ij}}{\eta }_{j}{{\rm{e}}}^{-(i-j)/{\rm{\Theta }}}$ where node fitness values η are exponentially distributed in the range $[1,\infty )$;
  • (iii)  
    same as the original model but exponential aging is replaced with power-law aging: ${P}_{i\to j}\sim {X}_{{ij}}({k}_{j}^{{\rm{in}}}+1)/(1+{[(i-j)/{\rm{\Theta }}]}^{2})$;

In all three cases, the community structure-inducing term Xij remains the same as defined by equation (3) in the main text. As can be seen in figure C2C4, these modifications do not alter any of our main results.

Figure C2.

Figure C2. As figure 4 in the main text but preferential attachment has been removed from the model; the attractiveness of nodes to new links is thus determined solely by their age.

Standard image High-resolution image
Figure C3.

Figure C3. As figure 4 in the main text but preferential attachment has been removed from the model and nodes are assigned fitness that is exponentially distributed.

Standard image High-resolution image
Figure C4.

Figure C4. As figure 4 in the main text but exponential aging has been replaced with power-law aging.

Standard image High-resolution image

Appendix D.: Real datasets

To support our findings, we use two different real datasets that can be represented with growing directed monopartite networks: a news dataset [44] and a citation dataset of papers published by the American Physical Society (APS).

D.1. News data

The news citation dataset was used to analyze the backbone of the citation network and its impact on network centrality metrics [44]. The dataset consists of news published by various outlets (newspapers and televisions) and citations among them. Since most citations are among the news published by the same outlet (91%), we treat articles published by individual outlets as independent datasets. For each subset, self-loops and nodes that do not belong in the giant component are discarded. Only subsets with 500 edges or more are included in the further analysis (there are 22 of them). Table D1 summarizes the basic properties of the analyzed news subsets.

Table D1.  Basic characteristics of the datasets corresponding to various news outlets in the dataset from [44]; only the giant component is kept, subsets with less than 500 edges in the giant component are discarded. The shown characteristics: the number of nodes (N), the number of edges (m), mean degree (μ), and the ratio N/τS which, as we argue, can be used to determine the number of layers for temporal modularity.

News outlet N m μ N/τS
Guardian 24 122 45 327 1.88 111.2
Die Welt 26 417 37 610 1.42 118.5
Die Zeit 17 788 31 381 1.76 157.4
Washington Post 17 355 29 092 1.68 39.3
CBS 12 852 22 953 1.79 210.7
Der Spiegel 15 018 21 413 1.43 300.4
Los Angeles Times 10 116 20 220 2.00 12.6
Independent 12 054 17 510 1.45 88.6
Telegraph 10 840 17 091 1.58 73.7
New York Times 7656 12 500 1.63 52.1
International Business Times 7127 10 069 1.41 41.7
Basler Zeitung 3704 6798 1.84 39.0
Neue Zürcher Zeitung 4093 5832 1.42 14.2
Toronto Star 2327 3590 1.54 61.2
Sky News 1667 2407 1.44 119.1
BBC 1736 2288 1.32 115.7
Al Jazeera 1384 1818 1.31 28.8
Süddeutsche Zeitung 942 1171 1.24 94.2
Canadian Broadcasting Corporation 854 1012 1.19 22.5
United Press International 629 950 1.51 27.3
New Yorker 500 722 1.44 35.7
Atlantic 460 646 1.40 19.2

D.2. APS dataset

We use the APS citation data that were obtained on our request from https://journals.aps.org/datasets. Our dataset contains all papers published by the APS from 1893 until December 2013 and the links among them. Importantly, paper metadata contains paper PACS codes which allow us to construct subsets of the original dataset in a controlled way (see https://journals.aps.org/PACS for details on the Physics and Astronomy Classification Scheme, PACS). The original data comprise 539 974 papers and 5 992 897 citations among them. Majority of the papers have some PACS codes assigned (404 999 out of 539 974; most of the papers without PACS codes were published before 1977 when the PACS classification scheme was introduced). The PACS codes have a three-level hierarchy (for example, code '89.75.-k' represents 'Complex systems'). To construct subsets, we use the two top levels (for example, '89.75.*') which results in larger subsets than when complete PACS codes are used. Every paper that has a given two-level PACS code is considered as part of the subset. For each subset, self-loops and nodes that do not belong in the giant component are discarded. To create the subsets, we chose two-level PACS codes whose aging speed N/τS covers a broad range of values. Only subsets with at least 1000 edges are used for further analysis (there are 27 of them). Table D2 summarizes the basic properties of the analyzed APS subsets.

Table D2.  Basic characteristics of the datasets corresponding various top-two-level PACS codes in the APS citation data until December 2013; only the giant component is kept, subsets with less than 1000 edges in the giant component are discarded. The shown characteristics: the number of nodes (N), the number of edges (m), mean degree (μ), and the ratio N/τS which, as we argue, can be used to determine the number of layers for temporal modularity.

PACS code N m μ N/τS
42.50-* 17 890 147 261 8.23 6.3
03.67-* 12 491 122 464 9.80 4.3
03.75-* 8786 116 448 13.25 5.1
98.80-* 9496 97 721 10.29 6.8
74.70-* 7078 51 389 7.26 10.7
14.60-* 3983 37 751 9.48 8.7
25.75-* 3604 34 646 9.61 6.4
74.60-* 3785 31 123 8.22 4.9
71.27-* 4999 23 603 4.72 5.7
04.25-* 1939 23 201 11.97 5.8
04.30-* 2109 22 505 10.67 5.8
95.35-* 2347 19 303 8.22 6.3
61.30-* 3584 19 261 5.37 5.9
72.25-* 2953 16 174 5.48 4.4
61.50-* 3044 9746 3.20 9.5
45.70-* 1750 8741 4.99 4.2
68.37-* 2858 8169 2.86 4.2
72.80-* 2624 6966 2.65 9.3
75.75-* 1585 5702 3.60 4.2
87.16-* 1337 4007 3.00 3.9
03.70-* 810 3302 4.08 7.9
81.15-* 1169 3290 2.81 5.1
52.27-* 855 3279 3.84 4.1
68.65-* 1126 2692 2.39 8.5
33.20-* 976 2322 2.38 8.8
61.70-* 598 1403 2.35 3.8
36.20-* 489 1053 2.15 9.4
Please wait… references are loading.