Topological analysis of traffic pace via persistent homology

We develop a topological analysis of robust traffic pace patterns using persistent homology. We develop Rips filtrations, parametrized by pace, for a symmetrization of traffic pace along the (naturally) directed edges in a road network. Our symmetrization is inspired by recent work of Turner (2019 Algebr. Geom. Topol. 19 1135–1170). Our goal is to construct barcodes which help identify meaningful pace structures, namely connected components or ‘rings’. We develop a case study of our methods using datasets of Manhattan and Chengdu traffic speeds. In order to cope with the computational complexity of these large datasets, we develop an auxiliary application of the directed Louvain neighborhood-finding algorithm. We implement this as a preprocessing step prior to our main persistent homology analysis in order to coarse-grain small topological structures. We finally compute persistence barcodes on these neighborhoods. The persistence barcodes have a metric structure which allows us to both qualitatively and quantitatively compare traffic networks. As an example of the results, we find robust connected pace structures near Midtown bridges connecting Manhattan to the mainland.


Introduction
Today's road networks are challenged with an excess of congestion and faced with an array of routing algorithms, apps, and other tools which can easily lead to unexpected and emergent behaviors. On the other hand, with more data now available, it is becoming possible to think of big-data approaches to understanding large-scale mobility problems and compare cities [AMSW+17,DM15,FPV+13,GCW16,ZOXY16,ZUZ14].
Our effort here is to understand the topology of pace. Road infrastructure, passenger travel demands, and real-time information systems all can interact to create complex behaviors on road networks. An important analytical challenge is to construct robust and coarse descriptions of traffic behavior which can be used to assess and compare these complex networks. Our interest here is to apply some recent methods of topological data analysis to understand emergent behaviors in the presence of slow roads. We hope that these techniques will lead to new ways of characterizing mobility patterns.

Outline
We begin in section 3 with a summary of previous work and the history of topological data analysis. The background material begins in section 4, where we review the fundamental construction which allows us to pass from directed graphs to topological objects with higher-order structural information. We note that the assignment of a Rips complex to a directed network is an intermediate step in order to ultimately assign a collection

Topological data analysis
A recurring motif in the development of mathematics is the development of tools to identify objects which are invariant under different perspectives. This has become increasingly interesting in computational analyses of big data, where large datasets may represent some underlying structure. The field of topological data analysis has developed a number of methods for addressing challenges of this sort and doing so in robust ways.
Originally developed by Edelsbrunner, Letscher, and Zomorodian in [ELZ02], persistent homology is a method for identifying certain topological features of a dataset in the presence of 'topological noise'. Persistent homology seeks to associate a family of topological invariants, in the form of vector spaces, with a dataset, and to also identify a measure of significance of each invariant (see [CM17]) by means of perturbation with respect to a parameter.
The original use case of persistent homology was point clouds sampled from the surface of an object. Using Vietoris-Rips complexes, persistent homology allows one to reliably reconstruct the topology of the surface, even in the presence of sampling noise, and to quantify how robust that reconstruction is to the sampling noise via a barcode [Ghr08].
Our interest here is to develop an application of persistent homology to analysis of traffic networks. As with point cloud data, we want to understand macroscopic topological structures from local data; the topological structures of interest here are pace networks where a driver may find it difficult, or alternately easy, to avoid slow roads. A simplified version of this was developed in [WSK+17], which ignored the effect of directionality on streets (see also [FP20] for another application of persistent homology to urban data). Our goal here is a treatment which properly addresses the effect of directionality. Similar methods have also appeared in [LFW+15], but the metric space of barcodes makes no appearance in that work.
To understand the development of our ideas in a meaningful way, we consider traffic in Manhattan from the dataset of [DW16,DMA+16]. This dataset is large and complex enough (section 6) to allow for a range of traffic behavior. We do find some locally coherent structures, particularly near connections between Manhattan and the mainland. Some of our conclusions are in section 9. In order to compare structures between cities, we also consider traffic from Chengdu using the dataset of [GZDG19a].

Rips complexes of directed networks
To associate topological signatures to a directed network, we first associate a family of simplicial complexes (topological spaces) to the network. While it is true that an (undirected) graph already has the structure of a topological space, there is a major problem with using graph topology alone. As topological spaces graphs have no built-in notion of direction, and the topological invariants calculated by viewing a graph as a topological space do not see the directed nature of the graph. For example, the null space and left null space of a directed adjacency matrix carry information about weakly connected components and undirected cycles respectively rather than their strong/directed counterparts. Defining the notion of a directed topological space leads one down the complicated path of sheaves of preorders or topological spaces together with a collection of distinguished directed paths, see for example [Gra03]. For this reason, we instead build simplicial complexes (topological spaces) from a directed graph which during construction use the directed nature of the graph and whose associated topological invariants change if one changes the direction of links in the graph. Furthermore, rather than choosing a single threshold which we consider slow, we use the paradigm of persistence [ELZ02] to bundle a collection of spaces into a single topological object which simultaneously encapsulates the connectivity of a network at many different pace thresholds. The idea of considering a family of thresholds in order to get a notion of robust emergent features in traffic is not new, and has already appeared in [LFW+15]. While loc. cit. considers emergent features arising from functionally connected components of a traffic network, we consider both connected components and higher order topological information (cycles). We discuss the relationship between our methods and those of [LFW+15] in more detail in subsection 4.1.
Our starting point is a weighted directed graph D = (V, E, W), informally (at the moment) corresponding to a road network with intersections V, links E, and traffic pace W. Roads are naturally directed, so E consists of ordered pairs (v, v′) of distinct vertices which correspond to a link from v to v′. A two-way link between v and v′ corresponds to two edges, (v, v′) and (v′, v). Each edge e ∈ E has a positive weight w_e denoting traffic pace (time/distance; the reciprocal of speed) on the link. Section 6 will give some summary statistics of pace, distance, and travel time along links in our dataset. Pace is an appropriate indicator of road speeds, even though pace is not additive along links (the pace along a path consisting of several links is not the sum of the paces along the links).
If the link between v and v′ is bi-directional, the paces w_(v,v′) and w_(v′,v) in the two directions need not be the same. To avoid degeneracy, we assume that the graph is strongly connected: any destination vertex can be reached from any origin vertex via a directed path.
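Topological data analysis depends on a notion of 'nearness' of a collection of points. For traffic pace analysis, two points are 'near' if there is a fast directed path joining them (by contrast, for point clouds, the notion of nearness is simply Euclidean distance). More formally, for distinct vertices v and v′, let
$$d(v, v') = \min\Big\{ \sum_{e \in p} w_e \,:\, p \text{ is a directed path from } v \text{ to } v' \Big\} \tag{1}$$
be the minimum sum of paces over all paths from v to v′. For consistency, we also define
$$d(v, v) = 0 \tag{2}$$
for all v ∈ V. Since pace is not additive along links in a road network (unlike trip time or length of a path), d is a synthetic distance which is a reasonable indicator of slow roads. For v, v′, and v″ in V, we have d(v, v″) ≤ d(v, v′) + d(v′, v″) (the cheapest cost from v to v″ is at most the cost of paths which go through v′). Also, since the w_e's are positive, d(v, v′) > 0 whenever v ≠ v′.
To capture the directional nature of our network, we will build upon [Tur19] and consider symmetrizations of distance given by
$$\operatorname{sym}_{+}(d)(v, v') = \max\{d(v, v'),\, d(v', v)\}, \qquad \operatorname{sym}_{-}(d)(v, v') = \min\{d(v, v'),\, d(v', v)\}. \tag{3}$$
The symmetrization sym+(d) is a metric: it is symmetric, it is positive on distinct vertices, and
$$\operatorname{sym}_{+}(d)(v, v'') \le \max\{d(v, v') + d(v', v''),\; d(v'', v') + d(v', v)\} \le \max\{\operatorname{sym}_{+}(d)(v, v') + \operatorname{sym}_{+}(d)(v', v''),\; \operatorname{sym}_{+}(d)(v'', v') + \operatorname{sym}_{+}(d)(v', v)\} = \operatorname{sym}_{+}(d)(v, v') + \operatorname{sym}_{+}(d)(v', v'')$$
(the last equality following since both terms in the maximum are the same), which is the triangle inequality.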
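As a minimal sketch of the synthetic distance and its two symmetrizations, the following computes d with Dijkstra's algorithm on a toy network. It assumes that sym+(d) and sym−(d) are the maximum and minimum of the two directed distances; the function names, the edge dictionary format, and the three-node example are hypothetical and not taken from the datasets.

```python
import heapq
from math import inf


def pace_distance(edges, source):
    """Dijkstra: minimum sum of paces from `source` to every vertex."""
    adjacency = {}
    nodes = set()
    for (u, v), pace in edges.items():
        adjacency.setdefault(u, []).append((v, pace))
        nodes.update((u, v))
    dist = {v: inf for v in nodes}
    dist[source] = 0.0
    heap = [(0.0, source)]
    while heap:
        d_u, u = heapq.heappop(heap)
        if d_u > dist[u]:
            continue  # stale heap entry
        for v, pace in adjacency.get(u, []):
            if d_u + pace < dist[v]:
                dist[v] = d_u + pace
                heapq.heappush(heap, (dist[v], v))
    return dist


def symmetrize(edges):
    """Return d, sym_plus (max of both directions), sym_minus (min)."""
    nodes = sorted({x for edge in edges for x in edge})
    d = {u: pace_distance(edges, u) for u in nodes}
    sym_plus = {(u, v): max(d[u][v], d[v][u]) for u in nodes for v in nodes}
    sym_minus = {(u, v): min(d[u][v], d[v][u]) for u in nodes for v in nodes}
    return d, sym_plus, sym_minus


# Hypothetical one-way triangle a -> b -> c -> a with paces 1, 2, 4.
toy_edges = {("a", "b"): 1.0, ("b", "c"): 2.0, ("c", "a"): 4.0}
d, sym_plus, sym_minus = symmetrize(toy_edges)
print(d["a"]["b"], d["b"]["a"])  # direct link versus the long way round
print(sym_plus[("a", "b")], sym_minus[("a", "b")])
```

In this toy network the link from a to b is fast, but returning from b to a requires the two slower links, so the two symmetrizations of the same pair differ considerably.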
On the other hand, sym − (d) is not a metric.
Counterexample 4.1. Consider the diagram so the triangle inequality does not hold.
Given a pairwise notion of 'proximity' such as sym ± (d), we can construct Rips filtrations which describe how bigger and bigger 'neighborhoods' (defined according to sym ± (d)) fill up the entire space. For sym + (d) these neighborhoods will correspond to nodes which are 'close' in terms of bi-directional travel times, while for sym − (d), the nodes will be 'close' in at least one direction. The work of [Tur19] clarifies the structure of these Rips complexes built from sym − (d). Symmetric notions of proximity simplify the construction of these neighborhoods, but this too can in fact be relaxed (Dowker [CM18] persistence diagrams, for example, might be appropriate to identify robust origin or destination regions). See also [FP19] for yet other examples of persistence in geospatial data.
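To make the failure of the triangle inequality for sym−(d) concrete, here is a small numerical check. The three-node network below is a hypothetical example chosen for illustration (it is not the diagram of counterexample 4.1), and it assumes that sym−(d) is the minimum of the two directed distances and sym+(d) the maximum.

```python
from itertools import product
from math import inf

# Hypothetical network: a <-> b <-> c, fast towards b and slow away from it.
nodes = ["a", "b", "c"]
paces = {("a", "b"): 1.0, ("b", "a"): 10.0,
         ("c", "b"): 1.0, ("b", "c"): 10.0}

# Floyd-Warshall: the first factor of the product (k) is the outermost loop.
d = {(u, v): 0.0 if u == v else paces.get((u, v), inf)
     for u, v in product(nodes, repeat=2)}
for k, i, j in product(nodes, repeat=3):
    d[(i, j)] = min(d[(i, j)], d[(i, k)] + d[(k, j)])

sym_minus = {(u, v): min(d[(u, v)], d[(v, u)])
             for u, v in product(nodes, repeat=2)}
sym_plus = {(u, v): max(d[(u, v)], d[(v, u)])
            for u, v in product(nodes, repeat=2)}

lhs = sym_minus[("a", "c")]
rhs = sym_minus[("a", "b")] + sym_minus[("b", "c")]
print(lhs, rhs, lhs <= rhs)  # the last value is False for this network
```

Both a and c are close to b in one direction, but every route between a and c must traverse one slow link, so sym−(d)(a, c) exceeds the sum of the other two values. The same network satisfies the triangle inequality under sym+(d).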
An understanding of the topology of networks is often built upon simplices. A triangle (a two-simplex) with vertices v, v′, and v″ is denoted as [v, v′, v″], a line (a one-simplex) with vertices v and v′ is denoted as [v, v′], and a point (a zero-simplex) v is denoted as [v]. Subsimplices are naturally defined: for example, [v, v′] and [v] are subsimplices of [v, v′, v″]. Higher-dimensional simplices can be similarly defined, but they will not play a role in our analysis.
Definition 4.2 (Simplicial complex). Given a vertex set V• ⊂ V, an abstract simplicial complex Δ with vertex set V• is a collection of simplices which is closed under the operation of taking nonempty subsets; i.e., if K ∈ Δ and L is a subsimplex of K, then L ∈ Δ.
In other words, the collection {[v], [v′], [v″], [v, v′], [v, v″], [v′, v″], [v, v′, v″]} is the simplicial complex formed by a triangle with vertices v, v′, and v″.

Definition 4.3 (Rips complex).
For ε > 0, define the two-skeleton of the Rips complex as
$$R^{\varepsilon}_{\pm}(D) = \{[v] : v \in V\} \;\cup\; \{[v, v'] : \operatorname{sym}_{\pm}(d)(v, v') < \varepsilon\} \;\cup\; \{[v, v', v''] : \max\{\operatorname{sym}_{\pm}(d)(v, v'),\, \operatorname{sym}_{\pm}(d)(v, v''),\, \operatorname{sym}_{\pm}(d)(v', v'')\} < \varepsilon\}, \tag{4}$$
where v, v′, and v″ are distinct vertices. This is clearly a simplicial complex; if sym±(d)(v, v′) < ε for all v and v′ in some V• ⊂ V, then sym±(d)(v, v′) < ε for all v and v′ in any nonempty subset of V•. Simplicial complexes are formally defined as finite subsets of a given vertex set; since V is itself finite, all subsets are also necessarily finite. Counterexample 4.1 in particular shows a distance-preserving embedding may not exist; this precludes using Cech complexes (which have nicer theoretical properties) instead of the Rips complexes.
Looking now at the Rips filtration R±(D) def = {R^ε_±(D); ε > 0}, we see that R^ε_±(D) becomes larger as ε increases; more pairs of points satisfy the distance criterion. If ε < ε′, then there is a canonical inclusion from R^ε_±(D) into R^ε′_±(D). Understanding the geometry of this map (via algebraic topology) gives us a rigorous framework for 'unfolding' the topology of the network. We will formalize this in section 5.
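The two-skeleton of a Rips complex can be sketched directly from a symmetric table of pairwise distances. The snippet below is illustrative only: it assumes the strict threshold "distance < ε" for including an edge, and the function name, the four-node distance table, and the key convention (pairs listed in the order of `nodes`) are hypothetical.

```python
from itertools import combinations


def rips_two_skeleton(nodes, dist, eps):
    """Zero-, one- and two-simplices whose pairwise distances are below eps.

    `dist` maps pairs (u, v), with u before v in `nodes`, to a symmetric
    distance such as a symmetrized path distance.
    """
    vertices = [(v,) for v in nodes]
    edges = [pair for pair in combinations(nodes, 2) if dist[pair] < eps]
    edge_set = set(edges)
    # A triangle is included exactly when all three of its edges are.
    triangles = [tri for tri in combinations(nodes, 3)
                 if all(pair in edge_set for pair in combinations(tri, 2))]
    return vertices, edges, triangles


# Hypothetical symmetric distances on four nodes.
nodes = ["a", "b", "c", "d"]
dist = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 2.0,
        ("a", "d"): 5.0, ("b", "d"): 5.0, ("c", "d"): 5.0}

for eps in (1.5, 2.5, 6.0):
    _, edges, triangles = rips_two_skeleton(nodes, dist, eps)
    print(eps, len(edges), len(triangles))
```

As ε grows the complexes are nested: every edge or triangle present at a smaller threshold is still present at a larger one, which is the inclusion used to build the filtration.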
We will be interested in 'complex' road networks, where |V| is large (see our case study of section 6). While the Rips complex gives us an easily-implementable definition of 'neighborhood' (see [CM17] and [CdSO12] for a detailed discussion of Rips and Cech complexes), the total number of simplices in the Rips complex grows exponentially with the number of vertices [Zom10]. Nevertheless, our interest in connected sets of congested traffic, and 'beltways' of faster streets surrounding congested areas allows us to focus only on zero, one, and two-simplices of the Rips complex; these computations will be at most cubic in the number of vertices [Zom10].

A comparison to percolation methods
In [LFW+15], for a traffic network G, the authors assign to each link at a given time the percentage of maximal velocity, where maximal velocity is computed as the 95th percentile of velocity measurements on the link on a given day. They then vary a threshold q from 0 to 1, and include only links whose percentage of maximal velocity is above the threshold q. After computing strongly connected components, the authors obtain a family of clusters which merge over time. The associated algebraic object keeping track of these merges is exactly the strongly connected persistence module from definition 26 of [Tur19].
Although [LFW+15] is similar in spirit to our work, there are some fundamental differences. We instead choose to work with the symmetrization of the path distance sym +/− (d) (another approach from [Tur19]) because we want to study higher dimensional homological information, namely data about cycles in a network. Rather than thresholding links in the original graph based on the weight of the link alone, we create a simplicial complex which has links (one-simplices) between nodes if a driver can travel at a high speed for the entire journey from one node to another.

Persistent homology
Although the homology of a simplicial complex or topological space has an abstract definition, the computation of homology with field coefficients comes down to calculating the row spaces and null spaces (image and kernel) of some matrices. For example, the null space of the edge-node adjacency matrix of an undirected graph is the zeroth cohomology group of the graph (which is equivalent to the zeroth homology in our setting), while the left null space is the first homology group. In the spirit of [LFW+15], we are more interested in the functional properties (i.e. traffic pace) of a traffic network rather than the structural properties of the underlying graph.
The key insight of persistent homology is that one can simultaneously compute the homology of a family of simplicial complexes while keeping track of how the relationships between the complexes induce relationships in homology in an algorithmic fashion. While spectral sequences have a long history of being used to solve such problems, spectral sequences do not in any practical sense yield an algorithm for computing homology. At the end of this section, we recall the metric structure on the collection of persistence diagrams, which allows one to use a topological notion of distance when comparing the persistent homology of Rips complexes associated to directed graphs.
The inclusion of Rips complexes into each other as ε varies in (4) leads to quantitative barcodes which capture the stability of topological structures. Taking the homology of each complex with coefficients in the field Z/2Z, we get, in each homological degree k, the persistence module
$$H_k\big(R^{\varepsilon_1}_{\pm}(D)\big) \to H_k\big(R^{\varepsilon_2}_{\pm}(D)\big) \to \cdots \to H_k\big(R^{\varepsilon_M}_{\pm}(D)\big) \tag{5}$$
for ε_1 < ε_2 < · · · < ε_M, which breaks into a direct sum of persistence intervals [ZC05], which are chain complexes of the form
$$0 \to \cdots \to 0 \to \mathbb{Z}/2\mathbb{Z} \to \cdots \to \mathbb{Z}/2\mathbb{Z} \to 0 \to \cdots \to 0, \tag{6}$$
where the copies of Z/2Z correspond to the ε_m's. In practice, the relevant values of ε are the values of sym±(d)(v, v′) as v and v′ range over V. Persistent homology outputs a collection of persistence intervals which shed light on pace structures which are in some sense robust.
We can visualize a persistence interval in a number of ways. The two most common are the persistence barcode, which plots a horizontal bar starting at ε_i and ending at ε_m, and the persistence diagram, which plots the point (ε_i, ε_m) (a persistence diagram is a collection of (birth, death) pairs in the plane together with the diagonal Δ; see [CM17] for details). Furthermore, we can compute an explicit collection of homology generators for each value of ε, and highlight these generators via a visualization of the one-skeleton (a graph) of the Rips complex R^ε(sym±(d)). See also appendix A for a detailed example of a persistent homology calculation.
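For degree zero, the persistence intervals can be read off with a union-find pass over the sorted pairwise distances, since H_0 only tracks when connected components merge. This is a minimal sketch under stated assumptions: every vertex is born at 0, a component dies at the distance value at which it merges into another, and the function name and toy distances are hypothetical. It is not a general persistent homology implementation (H_1 requires matrix reduction).

```python
def h0_barcode(nodes, dist):
    """Persistent H_0 bars (birth, death) of a Rips filtration.

    `dist` maps unordered pairs, given once as (u, v), to a symmetric distance.
    """
    parent = {v: v for v in nodes}

    def find(v):
        # Follow parent pointers to the component representative,
        # shortening the path along the way.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    bars = []
    for (u, v), w in sorted(dist.items(), key=lambda item: item[1]):
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v   # two components merge: one bar ends
            bars.append((0.0, w))
    bars.append((0.0, float("inf")))  # the surviving component never dies
    return bars


# Hypothetical symmetric distances on four nodes.
nodes = ["a", "b", "c", "d"]
dist = {("a", "b"): 1.0, ("b", "c"): 1.0, ("a", "c"): 2.0,
        ("a", "d"): 5.0, ("b", "d"): 5.0, ("c", "d"): 5.0}
print(h0_barcode(nodes, dist))
```

In the toy example, a, b, and c merge at threshold 1 while d stays isolated until threshold 5, so the bar ending at 5 is the long-lived component that a pace barrier would produce.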
Our goal is to use explicit generators of persistent homology to draw conclusions about traffic routing and pace. We view separations of the Rips complex into connected components as being due to traffic obstacles (high pace edges or inherent lack of connectivity in the traffic graph), and we capture the connected components of the Rips complex at various pace thresholds using persistent H_0. The generators of H_1(R^ε(sym+(d))) will capture bi-directional beltways around congested areas. Studying uni-directional traffic is slightly more involved. By the definition of sym−(d), the first persistent homology group H_1(R^ε(sym−(d))) will capture fast routes around congested areas which are undirected cycles but not necessarily directed cycles. It follows that fast uni-directional cycles will (up to change of basis) be a subset of the generators returned. Furthermore, fast undirected cycles can give useful information about where to build bypasses in order to construct a proper beltway around slow roads: if the majority of the edges in an undirected cycle correspond to a single orientation, then building bypasses over the other edges yields a fast uni-directional beltway. Starting in section 9, we will develop a case study of Manhattan.
As for obtaining generators which are optimal in some sense: because we are using Z/2Z coefficients, the optimal generator problem is NP-hard (even for H_1; see [CF11]). For this reason we do not pursue optimal generators here.
We believe that the tools of topological data analysis can be of use in comparing traffic behavior in different cities or environments (viz. seasons, before-and-after comparisons). The Wasserstein distance [CM17] allows us to compare persistence diagrams and thus simultaneously compare topological signatures at various scales.
The Wasserstein one-distance between persistence diagrams dgm_1, dgm_2 is defined as
$$W_1(\mathrm{dgm}_1, \mathrm{dgm}_2) = \inf_{\text{matchings } m} \; \sum_{(p, q) \in m} \lVert p - q \rVert. \tag{7}$$
Here a matching m of two persistence diagrams dgm_1, dgm_2 is a subset m ⊆ dgm_1 × dgm_2 such that for every p ∈ dgm_1 − Δ and q ∈ dgm_2 − Δ, the sets ({p} × dgm_2) ∩ m and (dgm_1 × {q}) ∩ m each contain a single pair.
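The matching in this definition can be illustrated by brute force on very small diagrams. The sketch below makes two assumptions that are not stated in the text: distances between points are measured in the supremum norm, and an unmatched point is paired with its nearest point on the diagonal Δ. The function name is hypothetical, and the factorial running time makes this suitable only for diagrams with a handful of points.

```python
from itertools import permutations


def wasserstein_1(dgm1, dgm2):
    """Brute-force Wasserstein one-distance between two small diagrams.

    Each diagram is a list of (birth, death) pairs. Every point is either
    matched to a point of the other diagram or sent to the diagonal.
    """
    n, m = len(dgm1), len(dgm2)
    size = n + m  # pad each side with diagonal slots for the other's points

    def cost(i, j):
        p = dgm1[i] if i < n else None  # None stands for a diagonal slot
        q = dgm2[j] if j < m else None
        if p is None and q is None:
            return 0.0
        if p is None:
            return (q[1] - q[0]) / 2.0  # sup-norm distance to the diagonal
        if q is None:
            return (p[1] - p[0]) / 2.0
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    return min(sum(cost(i, sigma[i]) for i in range(size))
               for sigma in permutations(range(size)))


print(wasserstein_1([(1.0, 5.0)], [(1.0, 4.0)]))  # bars matched to each other
print(wasserstein_1([(1.0, 5.0)], []))            # bar matched to the diagonal
```

For diagrams of realistic size one would replace the enumeration of permutations by an assignment solver; the cost structure stays the same.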

Data and setup
To ground our efforts, we develop our ideas around the dataset of [DMA+16], which gives us hourly estimates of the travel times along different links in Manhattan. Our model of road networks is given by Open Street Map (circa February 26, 2020), which lists 4496 nodes (intersections) and 9720 links in Manhattan. The dataset of [DMA+16] was obtained by reverse-engineering traffic speeds (and counts) from taxi origin-destination pairs, assuming that taxi drivers universally minimize travel times. From that dataset, we use hourly (24 h a day) traffic estimates in of (8). Our estimated travel time dataset then contains 14 243 122 total traffic speed estimates. Further restricting this data to only those links which begin and end in Manhattan, we obtain data for 5491 (roughly 56%) of the 9720 links in our graph. Links with no travel times are removed from the graph. Figure 39 shows how the estimated travel times and speeds (i.e., link distance, as given by Open Street Map, divided by estimated time) vary throughout the hours of the day. The geographic distribution of the pace (the reciprocal of speed) is given in figure 40. These speed estimates serve as a proxy for actual travel speeds (which of course may differ from posted speed limits). The speed estimates of [DMA+16] may in fact be incomplete at any given time; a given link may have had no traffic, or any 'optimizing' taxi driver would have avoided it (due to, e.g., congestion). We resolve these deficiencies by averaging across days to get an hour-of-day estimate on each edge. In other words, if the dataset has estimates of the travel time of some link at 8 AM on two different days as 50.84 s and 30.0 s, we will define the travel time on that link to be 40.42 s. If travel times are stationary on a daily basis, this will approximate the true travel times.
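The hour-of-day averaging described above amounts to grouping estimates by (link, hour) and taking the mean over days. A minimal sketch follows; the record layout (link identifier, hour, travel time in seconds) and the function name are hypothetical, and the two 8 AM values are the ones quoted in the text.

```python
from collections import defaultdict


def hour_of_day_average(records):
    """Average travel time per (link, hour) across all days in `records`.

    `records` is an iterable of (link_id, hour, travel_time_seconds);
    missing estimates are simply absent from the iterable.
    """
    totals = defaultdict(lambda: [0.0, 0])  # (link, hour) -> [sum, count]
    for link, hour, seconds in records:
        accumulator = totals[(link, hour)]
        accumulator[0] += seconds
        accumulator[1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


records = [
    ("link_1", 8, 50.84),  # 8 AM on one day
    ("link_1", 8, 30.0),   # 8 AM on another day
    ("link_1", 9, 42.0),   # hypothetical 9 AM estimate
]
averages = hour_of_day_average(records)
print(averages[("link_1", 8)])
```

A link with no estimate at a given hour on any day never appears in the output, which is why consistently missing estimates can still disconnect the graph.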
Notwithstanding this averaging, consistently missing speed estimates may lead to disconnected components of Manhattan. Let the weighted directed graph
In order to demonstrate how our algorithm changes in the face of differing graph topology, we also study the traffic network of Chengdu, China using the dataset [GZDG19b]. The dataset [GZDG19b] estimates traffic speeds at different time periods using GPS data from floating vehicles, ignoring links with few or no GPS readings. After averaging traffic speeds along different time periods to obtain average speed and time estimates, we obtain a strongly connected directed graph ChengG• with 1902 nodes and 5943 links shown in figure 2. New York City and Chengdu were ranked 52 and 65, respectively, in the worldwide TomTom traffic index rankings in 2019 [Tom].

Louvain algorithm
The complexity of the large road networks poses some significant computational challenges. For any suffi- vertices; for ε > 0 sufficiently large, the Rips complex is the entire simplex on V. Persistent homology has cubic complexity (it is essentially Gaussian elimination [ZC05]), requiring on the order of $2^{3|V|} = 2^{3 \times 1087} \approx 10^{981}$ computations for ManG• (in the worst case).
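The order of magnitude of this worst-case count can be checked with a short calculation that converts the power of two into a power of ten. The variable names are illustrative; the value 1087 for |V| is the one quoted in the text.

```python
from math import log10

num_vertices = 1087                         # |V| for ManG•, as quoted in the text
bit_exponent = 3 * num_vertices             # exponent of 2 in the worst-case count
decimal_exponent = bit_exponent * log10(2)  # the same count as a power of 10

print(bit_exponent)           # 3261
print(int(decimal_exponent))  # 981
```

A count of this size is far beyond direct computation, which is the motivation for coarse-graining the graph before computing persistent homology.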
To approximate the theoretical calculations with tractable ones, we will first use the Louvain algorithm [BGLL08] to coarse-grain the graph into statistically similar neighborhoods. We note that the Louvain algorithm has been extended to handle directed networks in [DP15]. The Louvain algorithm seeks to find a partition P of V which maximizes a directed modularity Q(P), in which d^in_v and d^out_v are, respectively, the weighted in- and out-degree of a vertex v, and v ∼_P v′ if both v and v′ are in the same set in the partition of V. We start by assigning each node in V to its own community (i.e., begin with the |V| communities in the trivial partition). Given a partition P and a vertex v, we can update P based on v by trying to increase Q(P) by moving v to each of the different sets in P; if we can increase Q(P) by doing so, we get a new partition. A pass through V corresponds to sequentially updating the partitions based on each v in a random ordering of V. Letting P_n be the partition after the nth pass through V (with P_0 being the trivial partition of V), we get a (stochastic) dynamical system on partitions of V. After n* passes, the partitions remain constant (i.e., the Louvain algorithm reaches a fixed point).
The (directed) Louvain algorithm is a greedy stochastic algorithm. It is believed to run in O(|E|) time (see also [Tra15]). Informally, we can split d^in_v and d^out_v into internal (intra-community) and external (inter-community) connections. The Louvain algorithm seeks to increase internal connections while decreasing external connections (see [BGLL08] for motivation).
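The greedy pass described above can be sketched as follows. This is a simplified illustration rather than the implementation used here: it assumes the standard directed modularity, in which the weight of each intra-community edge is compared with the product of the out-degree of its tail and the in-degree of its head divided by the total weight; it recomputes Q from scratch for each trial move; it visits nodes in a fixed rather than random order; and it omits the aggregation of communities into super-nodes. The weight dictionary and function names are hypothetical.

```python
def directed_modularity(weights, community):
    """Directed modularity of the partition given by `community`.

    `weights` maps directed edges (u, v) to non-negative weights and
    `community` maps every node to a community label.
    """
    total = sum(weights.values())
    d_out, d_in = {}, {}
    for (u, v), w in weights.items():
        d_out[u] = d_out.get(u, 0.0) + w
        d_in[v] = d_in.get(v, 0.0) + w
    q = 0.0
    for u in community:
        for v in community:
            if community[u] == community[v]:
                expected = d_out.get(u, 0.0) * d_in.get(v, 0.0) / total
                q += weights.get((u, v), 0.0) - expected
    return q / total


def louvain_pass(weights, community, order):
    """One pass: move each node to the community that most increases Q."""
    moved = False
    for v in order:
        best_label = community[v]
        best_q = directed_modularity(weights, community)
        for label in set(community.values()):
            trial = dict(community)
            trial[v] = label
            q = directed_modularity(weights, trial)
            if q > best_q + 1e-12:  # accept strict improvements only
                best_label, best_q = label, q
        if best_label != community[v]:
            community[v] = best_label
            moved = True
    return moved


# Hypothetical graph: two directed triangles joined by one link each way.
toy = {(0, 1): 1.0, (1, 2): 1.0, (2, 0): 1.0,
       (3, 4): 1.0, (4, 5): 1.0, (5, 3): 1.0,
       (2, 3): 1.0, (5, 0): 1.0}
community = {v: v for v in range(6)}  # trivial partition
q_start = directed_modularity(toy, community)
while louvain_pass(toy, community, order=list(range(6))):
    pass  # repeat passes until no node moves (a fixed point)
q_end = directed_modularity(toy, community)
print(q_start, q_end, community)
```

Because a node moves only when Q strictly increases, the passes terminate; the partition reached is a local optimum and can depend on the visiting order.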

Community simplification
If we have a partition P of V, we construct a simplified graph ManG_P = (V_P, E_P, W_P). Let V_P be the partition P itself. For C and C′ in P and v ∈ C and v′ ∈ C′, define d_{C∪C′}(v, v′) as the cheapest cost to go from v to v′, where paths are restricted to C ∪ C′ (if no such paths exist, we set d_{C∪C′}(v, v′) to be ∞). The set E_P will consist of ordered pairs (C, C′) of distinct elements of P (i.e., points in This gives a measure of how quickly one can directly travel between two communities. We will see in our case study that this coarse-graining still allows us to capture interesting behavior.
Once we have weights between communities, we can copy the development of section 4 and construct a (directional) distance d_L between communities (analogous to (1) and (2); note that (9) is a distance between points, whereas d_L is a distance between communities), a symmetrization sym±(d_L) (analogous to (3)), R^ε_±(ManG_P), and then persistence module and persistence intervals (as in (5) and (6)) on ManG_P.

Louvain communities
The Louvain algorithm gives an evolving (dynamically defined) set of communities {P_n; n = 0, 1, . . .}, and we then have ManG_{P_n} and ChengG_{P_n} (with ManG_{P_0} = ManG•) (see figures 3 and 4). The Louvain algorithm should coarse-grain various aspects of ManG, and we will be able to draw some conclusions in our case study from how persistent homology of these communities (see section 9) evolves as we iterate the Louvain algorithm (and consider coarser and coarser P_n); see also section 8. Tables 1 and 2 in the supplementary appendix show how the sizes of the graphs change as we increase the number of iterations.

Intuition and simulations
Consider the problem of finding detours around slow roads (where a slow road is one with high pace/low speed). For example, one might be planning a city bus route, where one needs to find a cyclical path which covers a decent portion of the city and travels along non-congested roads. For specificity, consider three vertices v_1, v_2, and v_3 in V (figure 5). The simplicial complex generated by the routes between these vertices is
$$\{[v_1],\, [v_2],\, [v_3],\, [v_1, v_2],\, [v_2, v_3],\, [v_1, v_3]\}. \tag{10}$$
If the various edge distances (generalized paces) d(v, v′) for distinct v and v′ in {v_1, v_2, v_3} are low (i.e., are not congested), then (for the purposes of choosing a route) it does not matter how we route around the triangle with vertex set {v_1, v_2, v_3}; any choice will yield a route with low pace (other metrics like total length may of course differ). To reflect that the choice does not matter, we fill in the triangle; i.e., add the two-simplex [v_1, v_2, v_3] to (10). This makes the space of choices topologically contractible. Essentially, we are filling in triangles of 'fast' routes. Since sym+(d) is in fact a symmetric distance function, this leads to an equivalence relation on paths (see [BGK15]; two paths are equivalent if their difference is homologous to 0).
Persistent homology gives us a framework for identifying robustness of routing decisions by varying a pace parameter ε telling us when to fill in triangles. In our above example, we will fill in the triangle [v_1, v_2, v_3] when the pairwise distances between its vertices are all less than ε (i.e., fill in triangles with pace in some way smaller than ε). If ε = 0 no triangles are filled in (all roads are congested), and if ε > max_{v,v′∈V} d(v, v′), all triangles are filled in (no roads are congested according to ε). As ε increases, meaningful cyclical routes will appear and merge, with the longest lasting and fastest detours being the most impactful for routing around slow roads (see the below definition 8.2 of impact). Mathematically, as ε increases, we obtain a simplicial complex whose first homology group H_1 will have generators which represent cyclical detours around congested areas. In other words, adding two-simplices allows us to focus on the topological features which correspond to meaningful choices.
In this section, we will consider several carefully selected synthetic road networks to experimentally verify that several aspects of our approach work as intended. We begin by giving an explicit description of one-cycles in the Rips complex (with Z/2Z coefficients). While in general a cycle is defined as an element in the kernel of some linear transformation, the following lemma should hopefully give the reader a visual image of the abstract notion of a cycle as a collection of loops in a graph. Recall that the first homology group H_1(R) of a simplicial complex R is the quotient group of the cycles (elements in the kernel of the boundary map ∂_1 : R_1 → R_0) by the boundaries (elements in the image of the boundary map ∂_2 : R_2 → R_1). See appendix A for more details.
Proof. Every two-regular subgraph gives rise to a cycle by definition of the boundary map and the fact that we're using Z/2Z coefficients. By the handshaking lemma, the edge-node adjacency matrix M of a two-regular graph is square. Because the graph is connected, ker(M^T) has rank one, and thus the rank-nullity theorem implies that ker(M) also has rank one. The kernel is thus generated by (1 . . . 1)^T, so that the cycle is irreducible.
Assume now that we have an irreducible cycle. By definition the degree of every vertex in the cycle is even, and by irreducibility the subgraph corresponding to the cycle is connected. Say for the sake of contradiction that some node has degree at least 4. By the handshaking lemma, |E| ≥ |V| + 1. The edge-node adjacency matrix M of the cycle thus has kernel of rank at least one. We claim it must have a kernel of rank at least two, contradicting irreducibility. By definition of M and the fact that we use Z/2Z coefficients, the vector (1 . . . 1) is in the left kernel of M. Thus M^T has rank at most |V| − 1. It follows that M has rank at most |V| − 1, so by the rank-nullity theorem ker(M) has rank at least |E| − (|V| − 1) ≥ 2.
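The rank computations in this argument can be checked numerically on small graphs by Gaussian elimination over Z/2Z. The sketch below stores each row of the node-by-edge incidence matrix as an integer bitmask; the function names and the two example graphs (a four-cycle, and a 'figure eight' of two triangles sharing a node of degree 4) are hypothetical illustrations.

```python
def rank_mod2(rows):
    """Rank over Z/2Z of a matrix whose rows are given as integer bitmasks."""
    rows = list(rows)
    rank = 0
    while rows:
        pivot = rows.pop()
        if pivot:
            rank += 1
            lowest_bit = pivot & -pivot
            # Clear the pivot's lowest set bit from every remaining row.
            rows = [r ^ pivot if r & lowest_bit else r for r in rows]
    return rank


def cycle_space_dimension(num_nodes, edges):
    """Dimension of the kernel of the node-by-edge incidence matrix mod 2."""
    rows = [0] * num_nodes
    for j, (u, v) in enumerate(edges):
        rows[u] |= 1 << j  # edge j touches node u
        rows[v] |= 1 << j  # edge j touches node v
    return len(edges) - rank_mod2(rows)


# Connected two-regular graph: a four-cycle.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
# Two triangles sharing node 0, which therefore has degree 4.
figure_eight = [(0, 1), (1, 2), (2, 0), (0, 3), (3, 4), (4, 0)]

print(cycle_space_dimension(4, square))        # 1
print(cycle_space_dimension(5, figure_eight))  # 2
```

The four-cycle has a one-dimensional kernel, spanned by the all-ones vector, while the figure eight has a two-dimensional kernel because it splits into two smaller cycles, matching the two halves of the proof.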
We want to slightly modify the notion of persistence (the length of a bar in the persistence barcode) to capture the length of a bar relative to its starting point (pace, in this case). While the persistence intervals of (6) are scale-specific (i.e., lengths will change, for example, if distance is measured in kilometers as opposed to miles), we want to capture the fact that the persistence interval (20, 30) lasts 50% of its original starting point (of 20).
Definition 8.2 (Impact). For a cycle with persistence interval (a, b), we define log(b/a) = log b − log a, i.e., the logarithm of the persistence relative to its starting point, to be the impact of the cycle.
Figure 6 shows a strongly connected graph G with a bi-directional outer beltway and uni-directional inner beltway. The pace along the edges of the outer green beltway is 1, while the pace along the inner violet edges is 20, making it more efficient to travel along the outer beltway. The barcode corresponding to the Rips filtration of sym+(d) (as in section 6) is shown in the left panel of figure 7. The H_1 portion of the barcode is expanded and shown with a corresponding choice of homology generator and the H_1 impact in figure 8. As expected, the outer beltway corresponds to a generator of the persistent homology which is born early and dies early (the top gray bar). While this generator seems insignificant (i.e. not persistent) in the original barcode, it is clearly the highest impact cycle as shown by the right panel. This supports the intuition that high impact cycles should correspond to large, speedy beltways.
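The scale-free nature of impact can be illustrated with a few lines of arithmetic. The sketch assumes that the impact of a persistence interval (a, b) is log(b/a), which is one reading of 'the logarithm of the persistence' relative to its starting point; the function name and the example intervals are hypothetical.

```python
from math import log


def impact(birth, death):
    """Assumed impact of a persistence interval: log(death / birth)."""
    if birth <= 0 or death <= birth:
        raise ValueError("expected 0 < birth < death")
    return log(death / birth)


late_long_bar = (20.0, 30.0)   # length 10, born late
early_short_bar = (1.0, 2.0)   # length 1, born early
print(impact(*late_long_bar), impact(*early_short_bar))

# Rescaling units (for example, miles to kilometers) leaves impact unchanged.
factor = 1.609344
print(impact(20.0 * factor, 30.0 * factor))
```

Although the first bar is ten times longer, the second has the larger impact because it doubles its birth value. This is the sense in which a beltway that is born early and dies early can still be the highest impact cycle.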

Simulations
Informally, the Louvain algorithm should allow us to coarse-grain a graph (and thus allow more tractable calculations) without losing too much of the information available in persistent homology. The Louvain algorithm also reduces visual 'noise' by condensing nodes with short lifespans in the H_0 barcode. We can see this more clearly by constructing in figure 7 the persistence barcode after every pass of the Louvain algorithm through V.
By contrast, figure 9 shows the persistent homology of figure 6 using the Rips filtration of sym − (d) (rather than sym + (d)). We still observe the speedy outer beltway as a high impact cycle, but now we also observe the slow inner beltway as a cycle with fairly large impact.
To get some more perspective on sym − (d), let us modify figure 6 by making the outer beltway unidirectional (clockwise), giving us G of figure 10. The graph is still strongly connected, and the persistence diagram of sym − (d) is shown in figure 11. We now only detect the bi-directional inner beltway with sym + (d), as can be seen in figure 12. This is because the outer beltway dies (is in the image of the boundary operator) as soon as it is born. Informally, if a generator has high impact in the sym − (d) but low impact in sym + (d), the generator is likely to be uni-directional.
Another way to think about the difference between sym − (d) and sym + (d) is to consider a bi-directional cycle graph with uniform edge weights and then make a single edge uni-directional. With sym − (d), the distance matrix will remain unchanged. However, with sym + (d), the distance between the vertices bounding the unidirectional edge will increase drastically, as one will have to travel every other edge in the cycle to get back to the adjacent vertex. This implies that the bar corresponding to the cycle in the sym + (d) filtration will have length zero, since every other pair of vertices has distance strictly less than the maximum distance between the vertices bounding the uni-directional edge (hence the cycle will already be a boundary when it is born); see appendix A.
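This thought experiment is easy to reproduce numerically. The sketch below builds a bi-directional cycle on five nodes with unit paces, removes one direction of a single edge, and compares the two symmetrizations before and after. It assumes sym+(d) and sym−(d) are the maximum and minimum of the two directed distances; the function names and the choice of five nodes are illustrative.

```python
from itertools import product
from math import inf


def shortest_paths(n, edges):
    """Floyd-Warshall all-pairs distances on nodes 0..n-1."""
    d = [[0.0 if i == j else edges.get((i, j), inf) for j in range(n)]
         for i in range(n)]
    for k, i, j in product(range(n), repeat=3):  # k is the outermost loop
        if d[i][k] + d[k][j] < d[i][j]:
            d[i][j] = d[i][k] + d[k][j]
    return d


def symmetrizations(n, edges):
    d = shortest_paths(n, edges)
    plus = [[max(d[i][j], d[j][i]) for j in range(n)] for i in range(n)]
    minus = [[min(d[i][j], d[j][i]) for j in range(n)] for i in range(n)]
    return plus, minus


n = 5
two_way = {}
for i in range(n):
    j = (i + 1) % n
    two_way[(i, j)] = 1.0
    two_way[(j, i)] = 1.0

one_way = dict(two_way)
del one_way[(1, 0)]  # the edge between 0 and 1 now runs only from 0 to 1

plus_before, minus_before = symmetrizations(n, two_way)
plus_after, minus_after = symmetrizations(n, one_way)

print(minus_before == minus_after)         # True: sym- is unchanged
print(plus_before[0][1], plus_after[0][1])  # 1.0 4.0
```

After the change, returning from node 1 to node 0 requires the other four edges, so sym+(d)(0, 1) jumps from 1 to 4 and exceeds the value for every other pair.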

Persistent homology of Manhattan traffic
With our goal of using persistence to understand the topology of pace, let us look at the data of section 6.
In agreement with the simulations of section 8:
• H 1 (R + (ManG P n )) captures bi-directional emergent beltways in traffic; see subsection 9.1.
• H 1 (R − (ManG P n )) captures uni-directional emergent beltways in traffic; see subsection 9.2.
• High impact cycles are preserved by our Louvain preprocessing step (see figure 19).

Several conclusions stem from this analysis:
• Entry and exit points in Manhattan (bridges and tunnels) cause traffic slowdowns at the entry points,
• Traffic seems to obey the lower posted speed limits near Thompson Park, creating a bi-directional beltway,
• There is a uni-directional emergent beltway around lower Manhattan, and
• 59th street is so slow that it acts as a barrier between lower and central Manhattan.

Bi-directional cycles: persistent homology of R + (ManG P n )
The Louvain algorithm takes five passes over ManG before converging, giving us five successively coarser partitions of the starting graph. From each of these partitions we can construct a Rips complex R + (ManG P n ), n = 0, . . . , 4. In order to compute these Rips complexes in a reasonable time, we focus on n ∈ {2, 3, 4}. Figure 13 shows the persistence barcodes for R + (ManG P n ) for n ∈ {2, 3, 4}. We see that none of the cycles (corresponding to green bars in figure 13) survive more extreme preprocessing by the Louvain algorithm, indicating that the beltways found are probably low impact. We confirm this in figure 14, which plots the H 1 barcode of R + (ManG P 2 ) together with the shape and impact of the cycles, coded by color.

'Low' impact is admittedly subjective, although interpreting it to mean an impact bar whose length is less than $\frac{1}{4}\log$ of the maximal finite death time of the generators of H 0 gives a consistent definition for our results. A more rigorous definition of low impact could result from a detailed analysis of the impact of persistent H 1 of random directed graphs with a fixed (positive real valued) degree distribution. We will not pursue this here.

Figure 15 positions the generators of persistent H 1 for R + (ManG P 2 ) on a map. Each node of a cycle in this condensed graph is placed at the centroid of the corresponding community in P n . The green generator near Alphabet City surrounds Thompson Park, where the posted road speeds are 5 mph slower than on the surrounding roads [Cit16]. This cycle is a small beltway around Thompson Park due to the slower speeds near the park. From figure 1, we see that our traffic graph does not contain enough data from the West Village to identify the slower road speeds in that area.
Finally, we want to decompress the data to see what these cycles and connected components look like in the original graph. Figure 16 shows cycles in ManG, produced as follows: we find the closest node (ties broken randomly) in ManG to each node in the Rips complex, then compute a shortest path (ties broken randomly) between consecutive nodes of the cycle (we must first choose an orientation of the cycle, since shortest paths are not symmetric). This cycle should be thought of as the path a fast driver would take if they were to start from home and make a sequence of trips, picking each passenger up at the location where the previous passenger was dropped off, then return home at the end of the day. As such, these cycles may have inner loops or may require the driver to backtrack as they move to the next location. The condensed graph can be viewed as a quotient space of the original graph, and as such a lift of a cycle to the original graph is not unique.

Figure 17 shows the two most persistent connected components in the Manhattan traffic graph. We can infer that crossing 59th street is incredibly slow compared to average Manhattan traffic. Figure 18 shows the connected components in the Manhattan traffic graph earlier in the filtration. We see that the locations where bridges and tunnels enter Manhattan seem to cause divides between connected components. Unwinding the definition of our filtration, this indicates that the roads which cross entryways are relatively slow. In other words, traffic entering Manhattan island obstructs traffic crossing in an orthogonal direction.

Uni-directional cycles: persistent homology of R − (ManG P n )
Figure 19 shows the persistence barcodes for R − (ManG P n ) for n ∈ {2, 3, 4}. There are two major differences from the persistence barcode using R + (ManG P n ). First, there are 3 rather than 2 connected components which survive longest, and second, there is a cycle (a generator of H 1 ) even after 4 applications of the Louvain preprocessing step. This should indicate that this cycle has higher impact.
We confirm this in figure 20, which plots the shape of the cycles together with their persistence and impact (color coded). The cycle of highest impact appears to be the dark purple cycle. We can see from figure 21 that it corresponds to the area surrounding much of downtown Manhattan. The fact that this cycle is not a generator of the persistent homology of R + (ManG P n ) indicates that it is not bi-directional. Lifting both orientations (clockwise/counter-clockwise) of the cycle to the original graph as in figure 22, we see that the counter-clockwise direction appears to follow a circular pattern, while the clockwise direction follows an incredibly convoluted path with a lot of backtracking. Unfortunately, investigating ManG as shown in figure 1, we see that the reason for the difference between the counter-clockwise and clockwise orientations is a combination of missing travel time data along Park Avenue and the fact that the east-west streets in this region of Manhattan are one-way.
Recall that homology generators of H 1 (R − (ManG P n )) can correspond to zig-zags rather than oriented cycles. That is, edges in a homology generator can switch between clockwise and counter-clockwise orientation as we move around the cycle, depending on which direction is faster. It is thus necessary to check the homology generators to see how closely they resemble an honest uni-directional beltway. We isolate the cyclic route corresponding to the highest impact cycle in figure 23 in order to demonstrate how important the orientation of a cycle can be. The red points correspond to locations that the cycle is required to visit; specifically, they correspond to the nodes in ManG which are closest to the nodes in the highest impact cycle in ManG P 2 . From this representation we can immediately see which stops along the cycle are significantly 'out of the way'. For example, in the path with counter-clockwise orientation (right panel in figure 23) we see that the corner near the intersection of 9th Avenue and 30th street is hard to reach from the intersection of 46th street and 9th Avenue, and consequently should be left out of any planned route going counter-clockwise around the city. The paths in figure 23 allow us to determine where to add a bypass in order to obtain a proper uni-directional beltway.

Figure 24 shows the three most persistent connected components in R − (ManG P n ). This is similar to the persistent connected components in R + (ManG P n ), with the exception that the black and light green connected components merge at almost the same time as the light green and dark green connected components. Comparing the filtration levels in figure 24 to the filtration levels in figure 17, we see that crossing 59th street is much faster in one direction than the other. Because the only roads in our data connecting these components are Lexington Ave (one way, north to south) and York Ave, which is close to the entrance of the Queensboro Bridge, it is faster in our graph to travel north to south than south to north across 59th St.

Figure 25 shows the connected components at an earlier stage in the filtration. Similarly to the connected components of R + (ManG P n ), we see that entries into the city seem to split connected components.
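The lifting procedure used for figures 16, 22 and 23 (snap each condensed node to its closest node in the original graph, choose an orientation, then join consecutive stops by directed shortest paths) can be sketched as follows. This is only an illustration under stated assumptions: the four-node one-way ring, the coordinates, and the function names are hypothetical, distances to centroids are taken to be Euclidean, and ties are broken deterministically rather than randomly.

```python
import heapq
import math

def nearest_node(point, coords):
    """Node of the original graph closest to a condensed-node position."""
    return min(coords, key=lambda v: math.dist(point, coords[v]))

def dijkstra_path(adj, source, target):
    """Shortest directed path and its cost in a weighted adjacency dict."""
    dist, prev, heap = {source: 0.0}, {}, [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            if d + w < dist.get(v, math.inf):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if target not in dist:
        raise ValueError("target not reachable from source")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]

def lift_cycle(cycle_points, coords, adj, reverse=False):
    """Lift a condensed cycle to a closed route in the original graph."""
    stops = [nearest_node(p, coords) for p in cycle_points]
    if reverse:
        stops = stops[::-1]  # the other orientation of the same cycle
    route, total = [stops[0]], 0.0
    for a, b in zip(stops, stops[1:] + stops[:1]):
        path, cost = dijkstra_path(adj, a, b)
        route.extend(path[1:])
        total += cost
    return route, total

# Hypothetical toy network: a one-way ring a -> b -> c -> d -> a.
coords = {"a": (0.0, 0.0), "b": (1.0, 0.0), "c": (1.0, 1.0), "d": (0.0, 1.0)}
adj = {"a": {"b": 1.0}, "b": {"c": 1.0}, "c": {"d": 1.0}, "d": {"a": 1.0}}
centroids = [(0.1, 0.0), (0.9, 0.1), (1.0, 0.9), (0.1, 1.0)]

forward_route, forward_cost = lift_cycle(centroids, coords, adj)
reverse_route, reverse_cost = lift_cycle(centroids, coords, adj, reverse=True)
print(forward_route, forward_cost)
print(len(reverse_route), reverse_cost)
```

On a one-way ring the two orientations give very different lifts: one follows the ring directly, while the other must travel almost all the way around the ring between each pair of consecutive stops, which is the kind of backtracking visible in the convoluted orientation of figure 22.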

Using persistent homology to compare traffic across cities: a study of Chengdu
Our goals now are to demonstrate how persistent homology is affected by the topology of the underlying traffic network and to show how persistent homology provides a framework for comparing traffic across cities. We use traffic data from Chengdu, China described in section 6 in order to compare the grid-like network of Manhattan to a more ring-like network. We focus on the filtration sym − (d) in order to streamline the discussion and highlight methods of comparison.
We produce the same figures as for Manhattan, with the exception of the impact barcode, which we omit in order to increase the visibility of the plotted cycles. Figure 26 contains the total persistence barcode of the Chengdu traffic network at three different levels of granularity. As expected, in a traffic network with structural beltways, we find more prominent emergent functional beltways which survive coarse-graining of the graph. Because the pace distribution of Chengdu is more concentrated than the distribution for Manhattan (figure 27), one expects the bars in its persistence barcodes to be shorter. Figure 28 shows the shape of the persistent H 1 generators of ChengG P 2 . Compared to Manhattan, there are significantly more emergent beltways, and the persistent emergent beltways connect far more nodes in the Rips complex. Indeed, we see from figure 29 that the cycles cover large portions of the city, indicating that the structural beltways are working as intended. The fact that the two orientations of lifts of the generating cycles to the traffic network in figure 30 are so similar indicates that the generating cycles in the barcode are bi-directional. We can see this in particular when examining a lift of the most persistent cycle with two different orientations in figure 31. Finally, figure 32 shows that there are two persistent components of Chengdu traffic near the Jinjiang theater and Dongguang street residential district.

Figure 33 shows the Wasserstein one-distance between both the H 0 and H 1 barcodes of Chengdu and Manhattan at each partition level. Note that the distances in the H 0 confusion matrix are globally higher than the distances in the H 1 confusion matrix because there are more bars in the H 0 barcode: see formula (7). Just by looking at the networks of Manhattan and Chengdu, we might expect a significant difference in the H 1 barcode, since this measures emergent ring structures. Indeed, we can see that the differences between the H 1 barcodes for Manhattan and Chengdu are greater than any of the differences within a single city occurring because of changes in granularity.
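To make the comparison of barcodes concrete, here is a minimal sketch of a Wasserstein one-distance between two small barcodes. Formula (7) is not reproduced in this section, so the sketch assumes a common convention: finite bars only, the L-infinity ground metric between (birth, death) pairs, and the option of matching any bar to the diagonal at cost half its length. The brute-force search over matchings and the function name `wasserstein_1` are illustrative choices suitable only for toy inputs; a practical computation would use an assignment solver.

```python
from itertools import permutations

def wasserstein_1(bars_a, bars_b):
    """1-Wasserstein distance between two small barcodes of finite bars.

    Assumed conventions (not taken from the paper): L-infinity ground
    metric, and unmatched bars are sent to the diagonal.
    """
    n, m = len(bars_a), len(bars_b)
    size = n + m  # each side is padded with diagonal slots

    def ground(p, q):
        return max(abs(p[0] - q[0]), abs(p[1] - q[1]))

    def to_diagonal(p):
        return (p[1] - p[0]) / 2.0

    cost = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            if i < n and j < m:
                cost[i][j] = ground(bars_a[i], bars_b[j])
            elif i < n:
                cost[i][j] = to_diagonal(bars_a[i])   # bar of A left unmatched
            elif j < m:
                cost[i][j] = to_diagonal(bars_b[j])   # bar of B left unmatched
            # diagonal slot matched to diagonal slot costs nothing

    # Exhaustive search over perfect matchings (toy sizes only).
    return min(
        sum(cost[i][perm[i]] for i in range(size))
        for perm in permutations(range(size))
    )

barcode_a = [(0.0, 4.0), (1.0, 2.0)]
barcode_b = [(0.0, 3.0)]
print(wasserstein_1(barcode_a, barcode_b))

# More bars means more terms in the sum, hence larger distances.
print(wasserstein_1([(0.0, 1.0)] * 3, []))
```

Because the distance is a sum over matched bars, a barcode with many bars (such as H 0 here) tends to produce larger values than one with few bars, even when the individual bars are similar.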

Conclusions
By computing the persistence barcode of traffic networks, we provide a new means of both qualitative and quantitative comparison of city traffic at various levels of granularity. First, we show that coarse-graining a traffic network using the Louvain algorithm preserves important topological features by studying two handcrafted networks. We demonstrate how to read data from persistent homology generators in section 9, showing the disruptive effect that bridges and tunnels have on Manhattan traffic and identifying a surprising emergent beltway. We confirm that significant emergent beltways in traffic survive our coarse-graining process by computing the barcode of the Chengdu traffic network. Finally, we show how to extend the ideas of [LFW+15] to embed traffic networks into a metric space of persistence barcodes to allow for quantitative comparison between cities. We see that the structural beltways of Chengdu allow for significantly more emergent beltways than in Manhattan, and the Wasserstein distance between H 1 barcodes quantitatively verifies this intuition.

Figure 34. An example directed traffic graph with four intersections, two one-way links, and two bi-directional links.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://doi.org/10.13012/B2IDB-4900670_V1.

Appendix A. Persistent homology: an example
To explain some of our topological tools for the nonspecialist, let us consider a simple weighted directed graph D (as in section 4).
Consider figure 34, whose weights can be written in matrix form; for example, w (1,2) = 1. In this matrix form, • denotes either a pair of vertices not connected by an edge (e.g., there is no edge from node 1 to node 3) or a diagonal pair of vertices. The distance function d of (1) can also be written in matrix form. We summarize the filtered Rips complex in figure 35. As usual, the dimension of a simplex is one less than the length of the tuple describing the simplex. The persistence degree is the index of the start filtration in the ordered filtration set; i.e., the degree of the grading of the simplex as a basis element of the graded Z/2Z[t]-module associated to our Rips complex as in [ZC05].
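In detail, let $C_i(R^{\varepsilon}_+(D); \mathbb{Z}/2\mathbb{Z})$ denote the $\mathbb{Z}/2\mathbb{Z}$-vector space with basis given by the $i$-simplices in $R^{\varepsilon}_+(D)$. For example, $C_0(R^{0}_+(D); \mathbb{Z}/2\mathbb{Z})$ is the four-dimensional vector space $(\mathbb{Z}/2\mathbb{Z})^{\oplus 4}$, with basis given by the vertices of the graph. There is a linear (over $\mathbb{Z}/2\mathbb{Z}$) boundary map
$$\partial_i \colon C_i(R^{\varepsilon}_+(D); \mathbb{Z}/2\mathbb{Z}) \to C_{i-1}(R^{\varepsilon}_+(D); \mathbb{Z}/2\mathbb{Z}), \qquad \partial_i [v_0, \dots, v_i] = \sum_{j=0}^{i} [v_0, \dots, \widehat{v_j}, \dots, v_i],$$
where $\widehat{v_j}$ means that the vertex $v_j$ is omitted. We have disregarded the usual alternating sign in the definition of the boundary map since we are considering $\mathbb{Z}/2\mathbb{Z}$ coefficients. If we apply this to all the simplices in figure 35 and compute the kernel of $\partial_i$ modulo the image of $\partial_{i+1}$, we recover the homology of $R^{3.5}_+(D)$.

To deal with the filtration, the authors of [ZC05] note that each $C_i(R^{\varepsilon}_+(D); \mathbb{Z}/2\mathbb{Z})$ can be decomposed as a direct sum, over the filtration values $\varepsilon_j \le \varepsilon$, of the spans of the $i$-simplices that first appear at $\varepsilon_j$. Since $C_i(R^{\varepsilon_j}_+(D); \mathbb{Z}/2\mathbb{Z}) \subseteq C_i(R^{\varepsilon_{j+1}}_+(D); \mathbb{Z}/2\mathbb{Z})$, these inclusions define the action of a formal variable $t$. That is, multiplication by $t$ moves simplices into a higher filtration, keeping track of the fact that if the distance between nodes is less than $\varepsilon_j$, it is also less than $\varepsilon_{j+1}$. As the boundary map $\partial_i$ can only decrease filtration, we can promote $\partial_i$ to a graded map by defining
$$\partial_i \sigma = \sum_{j=0}^{i} t^{\deg \sigma - \deg \sigma_j}\, \sigma_j,$$
where $\sigma_j$ is the face of $\sigma$ obtained by omitting its $j$th vertex. Here the degree $\deg$ of a simplex is the index of its filtration in the ordered filtration set. With a graded boundary map in hand, one can compute a homogeneous basis of its kernel and image, and use these to determine the persistence intervals (the fact that persistence intervals are well defined relies on the structure theorem for finitely generated modules over a principal ideal domain).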
We now proceed to compute the persistent homology of the Rips complex defined by figure 35. Because there are no −1-simplices, ∂ 0 = 0.
We can perform column operations to put the matrix of ∂ 1 in column echelon form. From this representation we can read off three pieces of information: a basis for the kernel, Z 1 , a basis for the image, B 0 , and the filtration at which the generators of Z 0 are killed. In particular, by looking at the degree of the pivots, we read off three persistence intervals (0, 2.5) corresponding to the generators [1], [3], [4] of H 0 , and one interval (0, ∞) corresponding to the generator in the non-pivot row, [2].
Next, we calculate persistent H 1 by applying similar techniques to ∂ 2 .
Because ∂ 1 ∘ ∂ 2 = 0, some linear algebraic calculations as in [ZC05] tell us that we can represent ∂ 2 with respect to the basis of Z 1 . Again using column operations to reduce, we read off the persistence intervals by looking at the filtration of the generator of Z 1 corresponding to a given row and the degree of the pivot in that row. We see that the H 1 persistence intervals are (2.5, 3.5), (3.5, 3.5), (3.5, 3.5). Abstractly, the process can continue to simplices of higher degree (e.g., H 2 ), but we only use H 0 and H 1 in our efforts.
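The column-reduction procedure described in this appendix can be sketched in a few lines. The sketch below uses the standard left-to-right reduction of the Z/2Z boundary matrix of a Rips filtration rather than the graded-module bookkeeping of [ZC05], and it runs on a hypothetical symmetric distance matrix: a four-node cycle with side length 2.5 and diagonal length 3.5. This is not the matrix of figure 34, which is not reproduced here; it was chosen so that the same filtration values appear. All function names are illustrative.

```python
from itertools import combinations

def rips_simplices(dist, max_dim=2):
    """(filtration value, vertices) pairs, sorted by value, dimension, vertices."""
    n = len(dist)
    simplices = []
    for k in range(1, max_dim + 2):
        for verts in combinations(range(n), k):
            value = max((dist[a][b] for a, b in combinations(verts, 2)), default=0.0)
            simplices.append((value, verts))
    simplices.sort(key=lambda s: (s[0], len(s[1]), s[1]))
    return simplices

def persistence_intervals(dist, max_dim=2):
    """Persistence intervals in dimensions 0 .. max_dim - 1 over Z/2Z."""
    simplices = rips_simplices(dist, max_dim)
    index = {verts: i for i, (_, verts) in enumerate(simplices)}
    # Column j holds the indices of the codimension-one faces of simplex j.
    columns = [
        set() if len(verts) == 1
        else {index[verts[:i] + verts[i + 1:]] for i in range(len(verts))}
        for _, verts in simplices
    ]
    pivot_owner = {}  # lowest nonzero row -> column that owns this pivot
    paired = set()
    intervals = {d: [] for d in range(max_dim)}
    for j, col in enumerate(columns):
        # Add earlier reduced columns (mod 2) until the pivot is new or col is zero.
        while col and max(col) in pivot_owner:
            col ^= columns[pivot_owner[max(col)]]
        if col:
            i = max(col)
            pivot_owner[i] = j
            paired.update((i, j))
            dim = len(simplices[i][1]) - 1
            intervals[dim].append((simplices[i][0], simplices[j][0]))
    # Zero columns that are never a pivot row give infinite intervals.
    for j, col in enumerate(columns):
        dim = len(simplices[j][1]) - 1
        if not col and j not in paired and dim < max_dim:
            intervals[dim].append((simplices[j][0], float("inf")))
    return intervals

# Hypothetical symmetrized distances: sides 2.5, diagonals 3.5.
dist = [
    [0.0, 2.5, 3.5, 2.5],
    [2.5, 0.0, 2.5, 3.5],
    [3.5, 2.5, 0.0, 2.5],
    [2.5, 3.5, 2.5, 0.0],
]
intervals = persistence_intervals(dist)
print(sorted(intervals[0]))
print(sorted(intervals[1]))
```

For this toy matrix the reduction yields three finite H 0 intervals and one infinite one, plus one H 1 interval of positive length and two of length zero, the same pattern as the intervals quoted above.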

Appendix B. Supplementary figures
See figures 39 and 40 and tables 1 and 2.