Optimal timescale for community detection in growing networks

Time-stamped data are increasingly available for many social, economic, and information systems that can be represented as networks growing with time. The World Wide Web, social contact networks, and citation networks of scientific papers and online news articles, for example, are of this kind. Static methods can be inadequate for the analysis of growing networks as they miss essential information on the system’s dynamics. At the same time, time-aware methods require the choice of an observation timescale, yet we lack principled ways to determine it. We focus on the popular community detection problem which aims to partition a network’s nodes into meaningful groups. We use a multi-layer quality function to show, on both synthetic and real datasets, that the observation timescale that leads to optimal communities is tightly related to the system’s intrinsic aging timescale that can be inferred from the time-stamped network data. The use of temporal information leads to drastically different conclusions on the community structure of real information networks, which challenges the current understanding of the large-scale organization of growing networks. Our findings indicate that before attempting to assess structural patterns of evolving networks, it is vital to uncover the timescales of the dynamical processes that generated them.


I. INTRODUCTION
Many complex networks of interest for social science, information science and data mining are not static but grow with time.For example, the global scientific output grows at a fast and accelerating pace, which results in growing citation networks of scientific papers that represent our accumulated scientific knowledge [1,2].The World Wide Web [3] and social networks [4] grow as well.The presence of growth challenges traditional network analysis and graph theoretical methods [5,6], making it essential to develop and validate time-aware methods to achieve a solid understanding of the structure of these systems [7].In particular, extensive research has shown that the inclusion of temporal information into network analysis has a dramatic impact on long-studied problems such as community detection [8][9][10], node ranking [10][11][12], dynamics control [13], and spreading phenomena [14][15][16].
This article focuses on one of the fundamental problems in network science, the detection of communities [17], which has received enormous attention from diverse research areas, including physics [17], computer science [18], ecology [19,20], neuroscience [21], and social science [5,22], among others.While the problem is not uniquely defined [17], it can be generically described as the problem of determining whether there exists a meaningful partition of the network nodes into groups of nodes.The problem (also known as clustering in computer science [18]) has a long history, and it has been traditionally addressed by means of structural, static network analysis.* matus.medo@unifr.chA popular approach to community detection is to maximize a function, called modularity, which quantifies how much the total number of intra-community edges deviates from its expected value under a null model that preserves the individual nodes' number of connections [23].Modularity has been studied from many viewpoints: designing fast algorithms for its optimization [24,25], benchmarking its performance in uncovering groundtruth communities in artificial networks [26][27][28], generalizing it to multilayer and evolving networks [29,30], unveiling its limitations as a quality function of network partition in communities [31][32][33], quantifying the statistical significance of the found communities [34], and pinning down the relation between modularity optimization and the likelihood maximization of the long-known degree-corrected stochastic block model [35].Modularity optimization has become a standard tool in network analysis, and it has been applied to networks as diverse as scientific citation networks [36,37], ecological networks [20], human brain networks [38], and even human skills networks [39].
Despite past research and the wide use of modularity optimization in a broad range of contexts, we still lack a systematic understanding of its behavior and performance in growing networks where time and aging phenomena are fundamental [40][41][42].Nonetheless, modularity is used in such systems either in its original form [36,37] or in a multilayer form where layers correspond to various time intervals [29,30].The former approach can be expected to be suboptimal as it neglects the vital time information.The latter multilayer approach allows to jointly analyze different types of interactions between the network nodes [43,44]; from a temporal perspective, one divides the observed links into layers that group together links created at a sim-ilar time [45].The multilayer approach to community detection has proven to be powerful to identify welldefined functional modules in protein-protein interaction networks [46], gene co-expression networks [46], brain networks [45], human anatomical networks [47], among others.
However, when we wish to apply a multi-layer approach to identify relevant communities in growing networks, we face an impasse: Existing works assume layered input data [29,30,45,48] and thus they does not consider the question of how to divide an arbitrary time-stamped network into layers.Addressing this question requires to choose an appropriate observation timescale, i.e., the temporal duration for each layer [49][50][51].Such choice is essential because different timescale choices might reveal substantially different community structures, which in turn might lead us to different conclusions on the large-scale organization of the system.
In this work, we derive a criterion to estimate when a time-aggregated, static view of a growing network ceases to be sufficient for effective community detection through standard modularity maximization.When this criterion is not met, the detected communities are strongly determined by node age.We divide the input network into subsequent layers of temporal duration τ L each, construct a corresponding multi-layer modularity function, and use the resulting community detection method on diverse synthetic and real datasets.Remarkably, we find that the optimal layer duration τ * L is tightly related to the inherent aging timescale τ S of the system's growth dynamics: τ * L τ S .We use both synthetic and real data to show that different choices of the temporal resolution parameter lead to very different detected communities and conclusions on their statistical significance.
Our results connect the optimal timescale τ * L at which to detect communities in a given system with the system's inherent aging timescale τ S .The results have important implications for time-evolving systems where the community structure plays an important role.In particular, they imply that standard modularity optimization is likely to provide misleading communities in growing networks, and it provides clear guidelines for data-driven calibration of multi-layer community detection techniques.Besides addressing the particular problem of community detection, the connection between the timescale τ L used for structural analysis and the system's intrinsic timescale τ S is relevant to the general problem of analyzing the structure and function of networks that evolve in time.

A. The impact of network growth on modularity
Before detailing the multi-layer community detection method, we start by demonstrating how temporal effects impair the ability of the traditional modularity maxi-mization to uncover the community structure of growing networks with aging.Since aging is common for information networks where connections between the items are usually directed (such as citations in a scholarly citation network and followers in a social network), we use here the formalism of directed networks.Similar analysis is possible for undirected and bipartite networks.The classical Newman-Girvan modularity function [52] has been adapted to quantify the quality of a partition into communities in directed networks [53,54] as where m is the number of network links, A ij is the adjacency matrix which is 1 if node i points to node j and zero otherwise, k out i and k in i are respectively the out-and in-degree of node i, c i denotes the community of node i, δ(c i , c j ) is the Kronecker delta which is 1 when nodes c i = c j and zero otherwise, and the summation indexes i and j run over all N network nodes.Eq. ( 1) is referred to as static modularity from now on.The task is to find the network partition that maximizes static modularity which thus serves as the objective function.Among the various existing approaches to modularity optimization, the Louvain algorithm [24] is a popular choice for this task.
The negative term in Eq. ( 1) represents the expectation of A ij for a random network that has the same inand out-degree sequence as the original network.However, such randomization is of limited use in growing networks where time plays an important role in the nodes' connection patterns [41,42]: It neglects the original network's temporal properties and, consequently, it can violate the network's fundamental temporal constraints.In often-studied citation networks, for example, it generates "unphysical" randomized networks where papers can also cite future papers [55].To illustrate the problem of standard modularity, and later to assess its modification suitable for growing networks, we set up a simple network growth model based on the classical preferential attachment process with aging [41] and it furthermore assumes that each node belongs to one of two ground-truth communities (see Sec. IV A for details).The model can be easily extended to a case with more communities.
An illustration of how static modularity fails on model networks with fast aging is provided by Fig. 1 where despite a small fraction of links between the two communities (f B = 0.23), the correct partition remains unnoticed and the nodes are essentially divided by age in three communities.In fact, static modularity does not always find the correct partition in growing networks even when the two communities are perfectly separated (f B = 0): From some network size on, the global maximum of Q is achieved at four (or more) communities instead of the planted two.We find analytically (see Sec. IV C) that when f B = 0, the critical network size at which fourcommunity partitions start to be have higher modularity The horizontal position of nodes is given by their appearance time, the vertical position is given by the nodes' true community affiliation, and node color marks the community assigned by modularity maximization.While temporal modularity is able to correctly uncover the true community structure of this network (see Fig. S1 in SI), static modularity is essentially insensitive to the true communities and clusters nodes by the time of their appearance.(B-D) Behavior of static modularity on model data with different aging timescales (in all simulations k out = 5 and µ = 0, hence fB = 0; results are averaged over 100 model realizations and the shaded areas visualize the 10th-90th percentile range).Despite the two planted communities being perfectly separated in the model data, when N Θ, modularity optimization yields inferior results with increasing modularity (panel B) but decreasing N M I (panel C).As shown in panel (D), the loss of NMI is due to the number of identified communities nC which is correctly two until some N but then starts to grow.The vertical lines mark the analytically estimated thresholds of 4Θ for the three plotted Θ values.
than two-community partitions is N ≈ 4Θ (see Sec. IV C for details); this is confirmed by Fig. 1.In other words, when aging is sufficiently fast (Θ N/4), static modularity favors incorrect partitions with more than two communities.

B. Multi-layer modularity
To resolve the limitations of static modularity, we propose the temporal modularity quality function building on the recently-introduced Dynamic Configuration Model (DCM) for growing networks [55] which proposes a way of randomizing time-stamped networks whilst approximately preserving the time evolution of each node's degree.We divide the network's links by time into L layers of equal number of links; layers l = 1 and l = L contain the earliest and latest links, respectively.[56] The numbers of outgoing and incoming links established by node i in layer l are denoted as ∆k out i,l and ∆k in j,l , respectively (note that l ∆k out i,l = k out i and l ∆k in i,l = k in i ).The total number of links created in layer l is m l (again l m l = m).With the constraint of given degree increase sequences ∆k out i,l and ∆k in j,l , the expectation of A ij can be written as where we assume that the connections between the nodes are random within each layer.This expectation value can be used to replace the time-ignoring expectation value k out i k in j /m in Eq. ( 1) and obtain temporal modularity where setting L = 1 recovers static modularity defined by Eq. ( 1).Similarly to static modularity, temporal modularity is zero when each node constitutes an independent community; the result of its maximization is thus nontrivial and depends on the network structure.Unlike previous work on communities in multilayer networks [29], we assume that node community affiliation does not change over time; If necessary, this assumption can be relaxed.We adapt the Louvain algorithm [24] to optimize temporal modularity (see SI for details).In the toy example from Fig. 1, temporal modularity partitions the nodes correctly for any L from 3 to 20 (see Fig. S1 in SI).
Eq. ( 3) can be viewed as a special case of the previously introduced multi-layer modularity [29,30] where we constrain node affiliation to be fixed in time.By contrast to previous studies [29,30] which assumed the layered structure of the data to be given, we have obtained Eq. ( 3) by construction from continuously growing network data.As a result, the number of layers L can be freely varied, and it is thus important to study how to choose it in practice.This problem was not investigated in previous studies on multi-layer generalizations of modularity [29,30] where the division of the network into layer was assumed to be given a priori, which is not the case for a generic time-stamped dataset.
One expects that the choice of L is linked with the network's aging speed: While one layer may suffice when aging is slow or even absent, many layers are needed when aging is fast.To measure the aging speed, we measure the median time span of links.Assuming that the nodes are labeled by the time of appearance and denoting the out-and in-coming node of edge n as e out n and e in n , respectively, the system's median link time span τ S can be computed as the median of |e out n − e in n |.This timescale can be now compared with the average layer timespan τ L := N/L which defines the characteristic timescale of the new community detection method.If each layer covers time much longer than the aging timescale (τ L τ S ), the temporal effects "average" out and we can expect the results to be similar to those obtained with the static modularity.By contrast, if layers are many and each of them contains only a handful of links (τ L τ S ), temporal modularity is expected to be hampered by statistical fluctuations.We thus expect temporal modularity to work best at an intermediate range of τ L values; we will determine the optimal timescale in the following.
To evaluate community detection results on model data, we compute the normalized mutual information (NMI) between the found partitions and the ground truth communities [17].Motivated by the tendency of static modularity to produce temporally confined communities, we also compute the size-weighted average time span Ω of the found communities which is related to the age difference between the oldest and the most recent nodes in each identified community (see Methods for details).The higher the Ω, the less temporally confined are the identified communities.The advantage of this metric is that it only concerns the properties of the detected communities and it does not involve any notion of ground-truth communities which cannot be uniquely defined in real data [57].We will thus use Ω to evaluate partitions in real networks, which will help us to establish a bridge between model-based observations and real data.Details for these two metrics and other partition metrics that further support our findings are described Sec.S3 in SI.
Results for model data with various aging timescales are shown in Fig. 2. To facilitate the evaluation of the aforementioned criterion for the optimal number of nodes per layer, we use τ L /τ S on the horizontal axis.Results confirm our insights into the behavior of temporal modularity; its loss of efficiency when method's time-scale is too long compared to the system's time-scale (i.e., system aging is fast) (τ L /τ S 1) is particularly fast.Remarkably, both the NMI and Ω show a peak around τ L /τ S ≈ 1: The system's intrinsic timescale τ S directly determines the optimal temporal duration τ * L for the temporal modularity's layers.When τ L /τ S 1, we can observe performance plateaus that end when the number of nodes per layer becomes too small (in our simulations 10 nodes per layer or less).Panel (B) further shows that the community time span can be indeed used to distinguish between the large τ L regime where communities are mainly determined by time and thus of limited time span (right side of the figure), an intermediate regime where communities are independent of time and we can thus hope that they capture network's structural information, and finally the noisy small τ L regime that again yields shorter sub-optimal communities (see Fig. S4 for other evaluation metrics, average number of found communities and the average effective number of found communities).Again, when aging is sufficiently fast, Ω peaks around τ L τ S .Results for other model parameter values and other model settings shown (Figs.S6-10) further confirm that choosing τ L = τ S is optimal or nearlyoptimal in many circumstances.The system's intrinsic timescale τ S is thus an important connection between the new community-detection algorithm and the system's intrinsic properties.By randomizing links in each layer (which, as an extension of the traditional Configuration model to time-evolving networks, helps to preserve the network's fundamental temporal patterns [55]), we ascertained that the detected communities are statistically significant for all Θ values when τ L is set to τ S (see Methods and SI).

C. Implications for real networks
As τ L τ S proves to be an optimal choice of the temporal modularity function in model networks, two natural questions arise: How different are the communities detected by temporal modularity optimization (with τ L = τ S ) with respect to those detected by static modularity maximization, in real growing networks?Do the results obtained with temporal modularity call for a revision of our conclusions on the significance of the community structure of growing networks based on static modularity maximization?
To investigate the relation between temporal and static modularity maximization in real data, we use subsets of the news citation dataset that was used in [58] to analyze the backbone of the citation network, and subsets of the American Physical Society (APS) citation data from years 1893-2013; the subsets correspond to specific newspapers and PACS codes, respectively (see SI for details).We focus on three main properties: (1) the average community lifespan Ω of the detected communities -we com- Full circles mark the right-most points of each curve which correspond to the result obtained with the static modularity.These results show that the optimal NMI between the detected and ground-truth communities is achieved for τL τS; (2) the least time-confined communities (i.e., the communities with the largest Ω values) are observed for τL τS.
pare Ω 1 achieved by static modularity with Ω T achieved by temporal modularity; (2) the NMI between the communities detected by static and temporal modularity; (3) the difference between the z-scores associated with the maximal static modularity Q and temporal modularity Q T ; the z-scores are obtained by comparing the observed maximal values with the maximal values observed in randomized networks with the same degree time-series for all the individual nodes (see Methods for details).
Figs. 3A-B show that the ratio Ω T /Ω 1 is generally larger than one, confirming that the communities detected by temporal modularity have a longer time span than those by static modularity.Importantly, Ω T /Ω 1 decreases with τ S : the faster node aging, the more "stretched" over time the communities by temporal modularity compared to those by static modularity.This is in qualitative agreement with our results on model data where the increase of Ω by temporal modularity grows with the aging speed (Fig. 2).Figs.3C-D show that the NMI between the communities by temporal modularity and those by static modularity is significantly smaller than one; the communities by the two methods tend to differ more for larger networks.Figs.3E-F show that the communities by temporal modularity tend to be more statistically significant than the communities by static modularity.This indicates that properly including temporal information into the detection algorithm can substantially alter the conclusions on the significance of the community structure in growing networks.The analyzed real growing networks tend to be more "temporally modular" than modular: factoring out temporal patterns allows us to reveal a richer level of organization than possible with static modularity alone.

III. DISCUSSION
The implications of our work are multi-fold.It has become increasingly clear [7,9,15] that to properly detect structural patterns in time-evolving systems, we need time-aware network analysis methods.While methods based on multi-layer representations of time-evolving networks are potentially powerful [29,30,45], they call for the fundamental question of how to choose the temporal resolution at which to look at the system [49,50].Importantly, looking at a given system with different observation timescales can reveal different structural phenomena, such as different group organizations in social The average community time span ΩT obtained with temporal modularity using τL = τS is longer than the static modularity timespan Ω1, and the ratio ΩT /Ω1 decreases with relative aging speed τS/N (Spearman correlation between τS/N and ΩT /Ω1 is below −0.7 for both the news and APS data; see Sec.V. in SI for details).
(2) The difference between communities obtained with the temporal and static modularity tends to increase with the network size (here represented by the number of nodes N ); (3) Communities obtained with the temporal modularity tend to be more significant than those obtained with the static modularity.
systems [51] and different behavioral patterns in communication networks [49], among others.Our work set out to determine the optimal temporal resolution τ * M for community detection based on a multi-layer generalization of the modularity function, which we called temporal modularity.
We found, both analytically and numerically, that modularity maximization yields unsatisfactory results in networks where aging decays sufficiently quickly.We found that the optimal timescale τ * M of the multi-layer modularity function is determined by the system aging timescale: τ * M τ S .Different choices of τ L lead indeed to sub-optimal performance in the reconstruction of ground-truth communities in model data, and to com-munities that are more strongly determined by node age in real data.The optimal timescale to look at the system is therefore close to the inherent timescale of the system's growth dynamics.This supports the idea that analyzing the structure of a time-evolving network requires to first understand the properties of the dynamical process that generated that network.
Beyond the proposed temporal modularity, our work paves the way to the search for community detection methods best suited to growing networks.Inspired by the equivalence between modularity and the long-known Stochastic Block Model [35,59], an approach based on the dynamic stochastic block model [60] can be compared against temporal modularity on our growing benchmark graphs.Besides, methods based on higher-order networks representations of temporal networks [9] or consensus dynamics [61] can be tested within our framework.Various modifications (e.g., individual node fitness [41]) may improve our model to generate growing networks with aging and community structure, and the resulting models might be tested to explain the evolution of real growing networks [62].While we focused here on growing networks, generalizing our results to temporal networks where links can appear or disappear over time [63] is a fascinating challenge for future research.In addition, cultural markets [64] and E-commerce systems [65] can be represented with bipartite networks of users and products that grow over time.Adapting our approach to bipartite networks is thus a problem with many potential practical applications.
Besides, we found that communities detected with temporal modularity are statistically more significant than those detected with standard modularity in the majority of analyzed empirical growing networks, which suggests that including temporal information into the community detection algorithm can unveil a richer largescale organization than that uncovered by static methods.This calls for a note of caution on the use of the popular modularity maximization and, more generally, static community detection algorithms.Our findings demonstrate that if we analyze a time-evolving system, the communities detected by modularity maximization might be strongly influenced by the age of the nodes.A better way to detect communities in time-evolving networks is to first measure the typical timescale of the dynamic mechanisms that generated that network, and then exploit this information to analyze the system's structural patterns at that timescale.

A. Growing network model with community structure
There are initially n 0 nodes that are all assumed to appear at time 0. The network grows in time steps t = 1, . . ., N − n 0 .In each time step, one node is introduced in the network; the final number of nodes is therefore N .Each introduced node creates k out outgoing links to the existing nodes.The probability that node i points to an existing node j is where k in j is the current indegree of node j, exponential aging with the timescale Θ is assumed, and the general preference for links between nodes i and j is encoded in the term X ij .[66] Eq. ( 4) for simplicity omits the fitness term that is crucial to control the resulting network degree distribution [41].The community structure is introduced by assuming where C i and C j are the ground truth communities of i and j, respectively.X ij is 1−µ if nodes i and j are in the same ground truth community and µ if they are not.As a result, the number of links between the communities grows with µ.The interplay between the preference for nodes from the same community and aging is not trivial and results in a non-linear (but nevertheless monotonous) relationship between µ and the fraction of links between the communities (see Fig. S1).In simulations based on model data, we choose such µ(Θ) that a desired value f B of the fraction of links between the communities is achieved.

B. Community division evaluation metrics
Normalized Mutual Information (NMI) is a standard evaluation metric in community detection research [17,67].Denoting the sets of nodes comprising the found communities C 1 , . . ., C F as C 1 , . . ., C F and the sets of nodes comprising the ground-truth communities G 1 , . . ., G T as G 1 , . . ., G T , the normalized mutual information between the two community partitions is computed as where N is the total number of nodes.Average community time span (Ω).For a found community k, we compute the 80th and 20th percentile of node IDs in the community, and define the time span Ω k of this community as the difference between the two values.(Simply computing the difference between the maximal and minimal node ID in the community would be more prone to outlier nodes.)Overall average time span is computed as a weighted mean of Ω k with weights given by the community sizes

C. Breakdown of static modularity in growing networks
We focus here on the simple case where the two planted communities are of similar size and perfectly separated (in the model this case is achieved with µ = 0).For the correct partition in two communities that are identical with the planted communities, modularity contribution from true community k is (8) Modularity of the correct division is thus 1/2.We now study the division of this community in two parts with sizes N 1 and N 2 (N 1 + N 2 = N/2).Modularity contribution from these two communities that cover true community k is Here the second term represents the "loss" of links that are in true community k but they do not contribute to Q k because they run between the two partitions (they cross the partition boundary).The third and fourth term represent the sum of expectation terms which goes over all N 2 1 pairs of nodes in the two communities into which k is subdivided.The smallest sum of expectation terms is achieved when N 1 = N 2 = N/4 (i.e., the true community is divided in two parts of the same size) when we achieve where we used m = zN .This formula explains why division in more communities can be advantageous: the number of links crossing the dividing boundary, ∆m, is a function of Θ and due to aging it does not grow with network size without bounds.On the other hand, the number of network's links, m, is directly proportional to network size.As the network grows, the negative term ∆m/m decreases and Q k thus eventually exceeds Q k : modularity starts dividing the true community k.To understand, at which network size N this occurs, we need to estimate ∆m which can be done easily in the case of exponential aging that we use in our numerical simulations.A node that has been introduced in the system just behind the boundary must direct all its outgoing links across the boundary.Due to aging, these links can be considered to be distributed among the last Θ nodes.Later nodes establish less links that cross the boundary, until eventually node introduced in Θ steps establishes no such link.The expected total number of boundary-crossing links, ∆m, can be find by summing the contributions from these Θ nodes as The inequality Q k > Q k can be now solved with respect to N , yielding Hence, the transition from the correct solution where the two true communities are found to sub-optimal solutions where one or both of these communities is further divided is expected to occur at N ≈ 4Θ.Fig. 1 agrees well with this result.

D. Empirical datasets
To further support our findings, we use two different datasets that can be represented with directed monopartite networks: a news dataset [58] and a citation dataset of papers published by the American Physical Society (APS).Basic properties of the analyzed subsets are summarized in tables in SI.
a. News data.The news citation dataset was used to analyze the backbone of the citation network and its impact on network centrality metrics [58].The dataset consists of news published by various outlets (newspapers and televisions) and citations among them.Since most citations are among the news published by the same outlet (91%), we treat articles published by individual outlets as independent datasets.For each subset, we keep only the articles that belong in the giant component.Only subsets with 500 edges or more are included in the further analysis (there are 22 of them).
b. APS data.We use the APS citation data that were obtained on our request at https://journals.aps.org/datasets.Our dataset contains all papers published by the APS from 1893 until December 2013 and the links among them.Importantly, paper metadata contains paper PACS codes which allows us to construct subsets of the original dataset in a controlled way.The original data comprise 539,974 papers and 5,992,897 citations among them.Majority of papers (404, 999 out of 539, 974; most of the papers without PACS codes were published before 1977 when the PACS classification scheme was introduced) are assigned on average 3 PACS codes.The PACS codes have a three-level hierarchy (for example, code "89.75.-k" represents "Complex systems").To construct subsets, we use the two top levels (for example, "89.75.*") which results in larger subsets than when complete PACS codes are used.Every paper that has a given two-level PACS code is considered as part of the subset, from which eventually only the giant component is kept.Only subsets with at least 500 edges are used for the further analysis (there are 28 of them).

E. Significance analysis
To assess the statistical significance of the detected communities, we compare the results obtained on the model networks with those on model networks where links are randomized within each layer.This recently introduced randomization technique, referred to as the Dynamic Configuration Model (DCM) [55], preserves the temporal dynamics of all nodes' degree values (see SI for details).Denoting the highest modularity values achieved on the original and DCM-randomized data as Q T (L) orig and Q T (L) DCM , respectively, significance of the found communities can be evaluated by computing the z-score where E[Q T (L) DCM ] and σ[Q T (L) DCM ] denote the mean and standard deviation of Q T (L) DCM over multiple realizations of the DCM randomization (we use 100 realizations).When z is small, the identified communities are not significant as their modularity is similar (or even worse) to the maximal modularity achievable in randomized data that are free of any community structure.The higher the z value, the more significant the network partition (commonly used significance thresholds for zscore are two and three).
We use bootstrap to estimate the uncertainty of the estimated z scores (see SI for details).Denoting the esti-mated standard deviation of the resulting z-score as σ[z], we can compare the significance of community partitions A and B (as we do in Figure 3E,F by comparing the significance of partitions obtained with static modularity with those obtained with temporal modularity) by evaluating the normalized z-score difference ) which compares the difference between the partitions' zscore with the combined uncertainty of z-score estimates.
Positive δz means that partition A is more significant than partition B.
In SI, we discuss also a different randomization approach which preserves the network structure and randomizes only the node appearance order.The choice L = 9 follows from the criterion described in the main text: The median link time span is 7 which implies that the appropriate number of layers is around 60/7 = 8.57.The horizontal position of nodes is given by their appearance time, and the vertical position is given by the nodes' true community affiliation.Node color marks the community assigned by maximizing the respective modularity function.While temporal modularity is able to correctly uncover the true community structure of this network (the same correct partition is obtain for any L ∈ {4, . . ., 20}), static modularity fails as it is essentially insensitive to the true communities and clusters nodes by the time of their appearance.Significance analysis based on the Dynamic Configuration Model (see Section S3 for details) shows that only temporal modularity yields a partition that is much more significant (its z-score is 15 which far exceeds the usual significance thresholds of two and three) than the partition obtained with static modularity (whose z-score is 2.6).Importantly, the usual static significance analysis (which, similarly as static modularity, is inappropriate in this dynamic setting) based on the standard Configuration Model deems the parition obtained with static modularity as highly significant (its z-score is 28); this is an example of a static method not only producing a misleading result but also confirming its significance.

S2. BREAKDOWN OF STATIC MODULARITY IN GROWING NETWORKS
We focus here on the simple case where the two planted communities are of similar size and perfectly separated (in the model this case is achieved with µ = 0).For the correct partition in two communities that are identical with the planted communities, modularity contribution from true community k is Modularity of the correct division is thus 1/2.We now study the division of this community in two parts with sizes N 1 and N 2 (N 1 + N 2 = N/2).Modularity contribution from these two communities that cover true community k is Here the second term represents the "loss" of links that are in true community k but they do not contribute to Q k because they run between the two partitions (they cross the partition boundary).The third and fourth term represent the sum of expectation terms which goes over all N 2 1 pairs of nodes in the two communities into which k is subdivided.The smallest sum of expectation terms is achieved when N 1 = N 2 = N/4 (i.e., the true community is divided in two parts of the same size) when we achieve where we used m = zN .This formula explains why division in more communities can be advantageous: the number of links crossing the dividing boundary, ∆m, is a function of Θ and due to aging it does not grow with network The fraction of links between communities fB as a function of µ for model data with n0 = 10, N = 512, k out = 10, and two planted communities.While strongly non-linear when the aging is fast (small Θ), the dependence is monotonous and thus allows to determine the µ value that is necessary to achieve a desired value of fB.Other benchmark models for community detection, such as the Lancichinetti-Fortunato-Radicchi model [26], allow by construction to directly set the fraction of inter-community links in model networks.In our growth model, instead, µ only influences the preference for intra-community links in the model's growth equation and the resulting fraction of inter-community links results by an interplay of this preference and the pool of available target nodes that are created by the network growth.size without bounds.On the other hand, the number of network's links, m, is directly proportional to network size.As the network grows, the negative term ∆m/m decreases and Q k thus eventually exceeds Q k : modularity starts dividing the true community k.To understand, at which network size N this occurs, we need to estimate ∆m which can be done easily in the case of exponential aging that we use in our numerical simulations.A node that has been introduced in the system just behind the boundary must direct all its outgoing links across the boundary.Due to aging, these links can be considered to be distributed among the last Θ nodes.Later nodes establish less links that cross the boundary, until eventually node introduced in Θ steps establishes no such link.The expected total number of boundary-crossing links, ∆m, can be find by summing the contributions from these Θ nodes as The inequality Q k > Q k can be now solved with respect to N , yielding Hence, the transition from the correct solution where the two true communities are found to sub-optimal solutions where one or both of these communities is further divided is expected to occur at N ≈ 4Θ. Figure 1 in the main text (panels B-D) agrees well with this result.

S3. EVALUATION METRICS AND COMPARISON WITH PARTITIONS OF RANDOMIZED NETWORKS
For any network partition, the most immediate metric is the number of found communities n C .Since the distribution of community sizes can be very uneven, we also compute the effective number of found communities (also known as the inverse Simpson or inverse Herfindahl index) where |C k | is the size of found community k.When one community contains almost all network nodes, n eff → 1.When all F found communities have the same size, n eff = F .To evaluate how confined in time are the found communities, we introduce their average time span Ω.For found community k, we compute the 20th and 80th percentile of node IDs in the community, and define the time span Ω k of this community as the difference between the two values.(Recall that we assume that nodes are labeled in the order of their appearance, hence the difference of node IDs is a suitable measure of the time span between the nodes.)We then compute the size-weighted average of community time span Ω k to obtain the resulting average community time span Based on Fig. 1 in the main text and the accompanying discussion, we hypothesize that in systems with fast aging, optimization of static modularity leads to time-constrained communities with time span that is comparatively smaller than the time span of communities identified with temporal modularity.
In model data, we know the ground truth and can thus evaluate the found communities by comparing them with the ground truth communities.Normalized mutual information (NMI, [68]) is an information-theoretic metrics that is popular in the field of community detection.It computes the overlap between the G ground truth communities T 1 , . . ., T G and the F found communities C 1 , . . ., C F .Denoting |C k ∩ T l | = m kl , l m kl = x k and k m kl = y l , the metric is defined as where 0 log 0 = 0 by convention.

S3.1. Comparison with results on randomized data
To probe how strongly the network structure determines the found communities and whether these communities are significant, we compare the modularity maximization results with results obtained on randomized data.We consider two distinct randomizations.In the first one, the network's links are randomized using the Dynamic Configuration Model (DCM, [55]).The DCM is similar to the standard Configuration Model (CM, [69]) which preserves the node inand out-degree values (in a directed network) and matches the corresponding in-and out-stubs of nodes at random.By contrast, the DCM acknowledges that node connection patterns are not time-invariant and the idea of matching node stubs at random thus cannot be applied universally.Instead, the DCM first divides links by age into a given number of layers and essentially realizes the CM in each individual layer.As a result, the DCM preserves not only node degree but also the sequence of degree increase values in each layer-upon a suitable choice of the number of layers, it is thus able to simultaneously preserve the network's temporal properties and randomize its structural properties [55].We implement the DCM using the layer division described in the main text (see Eq. ( 3) and its justification): all network links are sorted by the time of their appearance and divided into L layers of equal size.In each layer l, the ∆k out i,l outgoing stubs of the nodes are matched with the ∆k in j,l incoming stubs of the nodes by choosing the candidate nodes preferentially by the number of remaining stubs.If the matched nodes correspond to an already existing link or to a self-loop, the matching is repeated.At the end, there may be a small number of stubs that cannot be matched but this number is usually negligible.The number of layers is chosen so that each layer's span N/L matches the system timescale which is defined as the median link timespan τ S .
Our results show (see Fig. S5) that: (1) the found communities are statistically significant when τ L ≈ τ S , (2) the communities found by static modularity are less significant or even not significant when aging is fast-this indicates that these communities are largely determined by node degree dynamics that is preserved by the DCM and not by higher-order structural patterns, (3) the difference between communities in the original and randomized data is greatest when τ L ≈ τ S , which confirms that temporal modularity is then most sensitive to the actual network structure.One can also consider unchanged model networks where only the order of node arrival is randomized.Results show (Fig. S5) that when aging is fast and L > 1, the resulting z score is negative (up to −190) which indicates that useful information is lost by randomizing the node order of arrival.
Another measurement that can be carried out using randomized networks is that of similarity between the partitions obtained on original and randomized data, respectively.For each DCM realization, we evaluate NMI between the newly found and original communities, and finally compute the average value N M I DCM of this metric.
The second randomization approach preserves the network structure and randomizes only the node appearance order.This is inspired by [70] where the authors use a similar procedure to randomize nodes' geographical locations whilst keeping their connections unchanged-we now do the same with the temporal dimension replacing their spatial dimension.On thus-randomized networks, one can maximize temporal modularity and then compare the results with the results on the original data.In this way we obtain the z-score z N O and the average normalized mutual information N M I N O (here N O stands for node order that is randomized in the process).

S4. ADDITIONAL RESULTS ON MODEL DATA
Results for all evaluated metrics in the parameter plane (Θ, L) are shown here as heatmaps.This complements Figure 2 in the main text which shows NMI and Ω for one model setting.Figure S3 shows the basic setting N = 1024, n 0 = 10, k out = 5 and f B = 0.1.In addition to the observations mentioned in the main text, we see that choosing N 1 ≈ τ yields the smallest number of found communities which is yet another positive contribution of temporal modularity.Figure S4 shows results of comparisons with randomized data for the same set of parameters.
The agreement between the optimal N 1 and network timescale τ holds also when the network is more sparse (see Figure S5) or when the communities are more interconnected (see Figure S6).Naturally, the results of community detection are then worse in comparison with networks that are more dense or those whose community structure is less noisy.
The basic model that we use for all simulations in the main text and all results above is based on preferential attachment and aging; differently from [41], for simplicity there is no node fitness.Figures S7, S8 and S9 show results for model data with substantial variations to the model: (1) model with aging, no preferential attachment, and no fitness, (2) model with aging and exponentially distributed node fitness, no preferential attachment, (3) same as the  original model but exponential aging is replaced with power-law aging.As can be seen in the figures below, these modifications do not alter any of our main results.I. Basic characteristics of the datasets corresponding to various news outlets in the dataset from [58]; only the giant component is kept, subsets with less than 500 edges in the giant component are discarded.The shown characteristics are: number of nodes (N ), number of edges (E), mean degree (z), and the ratio N/τS which, as we argue, can be used to determine a suitable number of layers for temporal modularity.

S5. REAL DATA
To further support our findings, we use two different datasets that can be represented with directed monopartite networks: a news dataset [58] and a citation dataset of papers published by the American Physical Society (APS).

S5.1. News dataset
The news citation dataset was used to analyze the backbone of the citation network and its impact on network centrality metrics [58].The dataset consists of news published by various outlets (newspapers and televisions) and citations among them.Since most citations are among the news published by the same outlet (91%), we treat articles published by individual outlets as independent datasets.For each subset, we keep only the articles that belong in the giant component.Only subsets with 500 edges or more are included in the further analysis (there are 22 of them).Table I summarizes the basic properties of the analyzed subsets.

S5.2. APS dataset
We use the APS citation data that were obtained on our request at https://journals.aps.org/datasets.Our dataset contains all papers published by the APS from 1893 until December 2013 and the links among them.Importantly, paper metadata contains paper PACS codes [71] which allows us to construct subsets of the original dataset in a controlled way.The original data comprise 539,974 papers and 5,992,897 citations among them.Majority of papers are assigned on average 3 PACS codes (404, 999 out of 539, 974; most of the papers without PACS codes were published before 1977 when the PACS classification scheme was introduced).The PACS codes have a three-level hierarchy (for example, code "89.75.-k" represents "Complex systems").To construct subsets, we use the two top levels (for example, "89.75.*") which results in larger subsets than when complete PACS codes are used.Every paper that has a given two-level PACS code is considered as part of the subset, from which eventually only the giant component is kept.Only subsets with at least 1000 edges are used for the further analysis (there are 27 of them).Table II summarizes the basic properties of the analyzed subsets.

S5.3. Results on real data
To assess the results of temporal modularity on real data, we measure the mean community time span Ω T achieved with temporal modularity and compare it with the mean community time span Ω 1 of static modularity (which corresponds to setting L = 1 in temporal modularity).In Figure 3 in the main text (panels A and B), the ratio Ω T /Ω 1 is plotted against τ S /N which is a relative characteristics of the aging speed in the network (when τ S is large, aging is slow; when τ S becomes comparable with N , aging becomes negligible).We further study the overlap (measured with the NMI) between the communities obtained with static and temporal modularity, respectively (Figure 3, panels C and D).Table III shows Spearman correlation values between these two summary characteristics of temporal modularity communities and basic network characteristics that display significant correlation.We can see that the relative community time span is chiefly determined by the relative aging speed (fast aging implies big relative community time span) and the community overlap is determined by the network size (in large networks, the difference between static and temporal tends to be big).

FIG. 1
FIG. 1. (A)Temporal confinement of communities identified by the standard static modularity in a model directed network with fast aging (N = 60, Θ = 3, k out = 10, µ = 0.05).The horizontal position of nodes is given by their appearance time, the vertical position is given by the nodes' true community affiliation, and node color marks the community assigned by modularity maximization.While temporal modularity is able to correctly uncover the true community structure of this network (see Fig.S1in SI), static modularity is essentially insensitive to the true communities and clusters nodes by the time of their appearance.(B-D) Behavior of static modularity on model data with different aging timescales (in all simulations k out = 5 and µ = 0, hence fB = 0; results are averaged over 100 model realizations and the shaded areas visualize the 10th-90th percentile range).Despite the two planted communities being perfectly separated in the model data, when N Θ, modularity optimization yields inferior results with increasing modularity (panel B) but decreasing N M I (panel C).As shown in panel (D), the loss of NMI is due to the number of identified communities nC which is correctly two until some N but then starts to grow.The vertical lines mark the analytically estimated thresholds of 4Θ for the three plotted Θ values.

FIG. 2
FIG. 2. (A, B) Community detection results for model data as a function of the method timescale τL and the model aging parameter Θ: NMI (A) and the average community timespan (B).τL = 1024 corresponds to static modularity.The dashed lines mark the intrinsic system timescale τS corresponding to given Θ.Model parameters: N = 1024, n0 = 10, k out = 5, fB = 0.1; results are averaged over 100 model realizations.(C, D) To better appreciate the relation between τS and τL, data from panels (A,B) are plotted here as a function of τL/τS.Full circles mark the right-most points of each curve which correspond to the result obtained with the static modularity.These results show that the optimal NMI between the detected and ground-truth communities is achieved for τL τS; (2) the least time-confined communities (i.e., the communities with the largest Ω values) are observed for τL τS.

FIG. 3 .
FIG.3.Community detection results for empirical growing networks from News (left) and the APS (right) -see Methods for all dataset details.The rows show that, in order:(1) The average community time span ΩT obtained with temporal modularity using τL = τS is longer than the static modularity timespan Ω1, and the ratio ΩT /Ω1 decreases with relative aging speed τS/N (Spearman correlation between τS/N and ΩT /Ω1 is below −0.7 for both the news and APS data; see Sec.V. in SI for details).(2)The difference between communities obtained with the temporal and static modularity tends to increase with the network size (here represented by the number of nodes N ); (3) Communities obtained with the temporal modularity tend to be more significant than those obtained with the static modularity.
FIG. S1.Partitions obtained by maximizing static directed modularity (top row) and the temporal modularity with 8 temporal layers (bottom row) in a small model network (N = 60, Θ = 3, k out = 10, µ = 0.07; the resulting fraction of links between the communities is 0.24).The choice L = 9 follows from the criterion described in the main text: The median link time span is 7 which implies that the appropriate number of layers is around 60/7 = 8.57.The horizontal position of nodes is given by their appearance time, and the vertical position is given by the nodes' true community affiliation.Node color marks the community assigned by maximizing the respective modularity function.While temporal modularity is able to correctly uncover the true community structure of this network (the same correct partition is obtain for any L ∈ {4, . . ., 20}), static modularity fails as it is essentially insensitive to the true communities and clusters nodes by the time of their appearance.Significance analysis based on the Dynamic Configuration Model (see Section S3 for details) shows that only temporal modularity yields a partition that is much more significant (its z-score is 15 which far exceeds the usual significance thresholds of two and three) than the partition obtained with static modularity (whose z-score is 2.6).Importantly, the usual static significance analysis (which, similarly as static modularity, is inappropriate in this dynamic setting) based on the standard Configuration Model deems the parition obtained with static modularity as highly significant (its z-score is 28); this is an example of a static method not only producing a misleading result but also confirming its significance.
FIG. S2.The fraction of links between communities fB as a function of µ for model data with n0 = 10, N = 512, k out = 10, and two planted communities.While strongly non-linear when the aging is fast (small Θ), the dependence is monotonous and thus allows to determine the µ value that is necessary to achieve a desired value of fB.Other benchmark models for community detection, such as the Lancichinetti-Fortunato-Radicchi model[26], allow by construction to directly set the fraction of inter-community links in model networks.In our growth model, instead, µ only influences the preference for intra-community links in the model's growth equation and the resulting fraction of inter-community links results by an interplay of this preference and the pool of available target nodes that are created by the network growth.

8 N
FIG. S3. Results for basic evaluation metrics on model data: NMI, CCE, average community timespan Ω, number of found communities nC , and the effective number of found communities n eff .The dashed line visualizes the optimal number of nodes per community (N1) obtained using median link length.(Model parameters are N = 1024, n0 = 10, k out = 5 and fB = 0.1; results are averaged over 100 model realizations.) FIG. S5.As Fig. S3 but k out = 3 (i.e., the resulting network is more sparse).
FIG. S7.As Fig.S3but preferential attachment has been removed from the model; the attractiveness of nodes to new links is thus determined solely by their age.
FIG. S8.As Fig. S3 but preferential attachment has been removed from the model and nodes are assigned fitness that is exponentially distributed in the range [1, ∞) (model equation thus becomes Pi→j ∝ fje −(i−j)/Θ where fj is fitness of node j).