The spindle approximation of network epidemiological modeling

Understanding the dynamics of spreading and diffusion on networks is of critical importance for a variety of processes in real life. However, predicting the temporal evolution of diffusion on networks remains challenging as the process is shaped by network topology, spreading non-linearities, and heterogeneous adaptation behavior. In this study, we propose the ‘spindle vector’, a new network topological feature, which shapes nodes according to the distance from the root node. The spindle vector captures the relative order of nodes in diffusion propagation, thus allowing us to approximate the spatiotemporal evolution of diffusion dynamics on networks. The approximation simplifies the detailed connections of node pairs by only focusing on the nodal count within individual layers and the interlayer connections, seeking a compromise between efficiency and complexity. Through experiments on various networks, we show that our method outperforms the state-of-the-art on BA networks with an average improvement of 38.6% on the mean absolute error. Additionally, the predictive accuracy of our method exhibits a notable convergence with the pairwise approximation approach with the increasing presence of quadrangles and pentagons in WS networks. The new metric provides a general and computationally efficient approach to predict network diffusion problems and is of potential for a large range of network applications.


Introduction
Network diffusion modeling is a priority for predicting information propagation, guiding epidemic responses, etc. During diffusion processes, contagious elements such as infectious diseases or information propagate according to the network topology. As a ubiquitous process in networks, diffusion plays a key role in applications ranging from information propagation [1], epidemic spreading [2,3], and adaptation to new technologies [4], to changes in behavior [5][6][7][8]. From the microscopic perspective, diffusion propagation among individual nodes is highly affected by their properties and contacts, and possibly by whom they are connected to. From the mesoscopic perspective, diffusion on networks depends on the local wiring of the network, characteristics of the carrier, and other environmental or spatial factors [9]. From the global perspective, the large-scale network structure, interventions, and the evolution of the pathogens or information being spread affect the outcome of the diffusion process. This interplay of factors at various levels makes the analysis of network diffusion highly challenging.
Various models [10][11][12][13] have been proposed to characterize such diffusion processes, where the most straightforward and intuitive scenario is epidemic-like models in which the disease is transmitted from infectious individuals (I) to susceptible ones (S) when they interact with each other. Compartmental models [14][15][16], such as SIS, SIR, SEIR, etc., are typically applied to study epidemic-like diffusion phenomena and predict dynamics under the simplifying assumption that all individuals interact with each other with the same probability [17][18][19]. A wave of research has focused on the effects of network structure on spreading dynamics, including the effects of degree distributions, clustering, assortativity and communities on epidemic spreading velocity [20], epidemic size [21], and epidemic thresholds [22].
From a theoretical perspective, the heterogeneous mean-field [23] (HMF) approach assumes that nodes with the same degree are statistically equivalent, and the quenched mean-field [24] (QMF) approach describes the full network structure using the adjacency matrix. Although QMF improves predictions of outbreak sizes and thresholds relative to HMF, it overestimates the infection probability of susceptible vertices because of dynamic correlations among the infectious states of vertices, generated by the wave of infection spreading through the network [25][26][27]. The weakness of the QMF approach is remedied, e.g., by the dynamic message-passing [28] (DMP) or the pairwise approximation [14] (PA) approach. Although these approaches provide more accurate epidemic size and threshold predictions, their numerical solution is time-consuming, which is an obstacle to wide application.
The key dynamic characteristic of a vertex on networks for our purposes is its role within the hierarchical structure of the network [29]. The hierarchical structure of a root node describes the immediate neighborhood and high-order neighborhoods [30] up to a maximal distance. The distribution of the number of nodes in the hierarchical structure from a specific node is characterized by a very rapid growth for short distances, followed by an exponential decay [31]. Similar to the k-shell [32], the hierarchical shell l of a network is defined as the set of nodes that are at distance l from a randomly chosen root node. A discriminative and computationally efficient metric based on the hierarchical structure has been successfully utilized to measure the heterogeneity of a network in terms of connectivity distances, thus distinguishing and quantifying network dissimilarities [33,34]. Building on this dissimilarity metric, several studies have emerged, such as influential node identification [35], diffusion capacity of interconnected networks [36], filtering and compression of weighted networks [37], link prediction in multiplex networks [38], and evaluation of community vulnerability [39]. The structure of the hierarchical shell is clearly important for understanding the network's diffusion properties, since propagation occurs shell after shell [40]. However, its application to diffusion prediction has been largely ignored, and whether the diffusion-wise hierarchical shell provides more information for spreading prediction has not been investigated.
In this work, we propose a novel theoretical scheme, the spindle approximation (SA), to approximate network diffusion. SA uses the hierarchical structure characterized via the so-called spindle vector and obtained by the breadth-first-search (BFS) process. The BFS process starts at the root node and visits all nodes at the current hierarchical layer before moving on to the nodes at the next hierarchical layer. For the SA, we decompose the epidemic-like diffusion process and find that the diffusion tree is always a possible spanning tree of the hierarchical structure and that the infection sequence of the SI model shows an outbreak pattern similar to the spindle vector. These similarities support the idea of basing diffusion modeling on hierarchical structures. We pursue this idea by asking two questions. First, how should the hierarchical structure be utilized to approximate spreading on networks? Second, is the approximation better than other methods for modeling the spreading dynamics? SA resolves a network as a spindle-shaped multilayered network according to the hierarchical structure and considers effective infections from the former layer and the same layer. To reduce the complexity of SA, we simplify it by mean-field theory to an approximation theory called SA*, which assumes that the number of neighbors is equivalent for nodes in the identical layer and estimated as the average connections. We find that the hierarchical structure enables SA* to outperform the QMF, DMP and PA methods on BA networks, and that the less complex SA* provides more powerful prediction than SA in most WS and BA networks. The successful application of hierarchical structure in infection prediction calls for more analysis of the hierarchical structure.

Hierarchical structure and network diffusion
At a given time in a diffusion process, the number of nodes in close contact with the infectious nodes is the fundamental predictor for the immediate propagation of the infection. For example, an infection starting from a source v will first propagate to some of its nearest neighbors, then to those that are two steps away from v, and so on. In the most extreme scenario, if the infection follows the SI model with an infection probability of β = 1, the spreading becomes the BFS process and is characterized by the hierarchical structure. The number of infected nodes at time t will be the number of nodes at most t steps away from v. To further exemplify the relation between the hierarchical structure and diffusion evolution, we focus on BFS processes and virus transmission shaped by the SI model starting from vertices v1 and v2 on the same network (see figures 1(a) and (b)). The hierarchical structure shows a pattern similar to the infection sequence. Specifically, both BFS processes and virus transmission originating from v1 show two peaks, while those starting from v2 have only one peak. More importantly, the distributions are more aligned when β is closer to one (see figures 1(c) and (d)).

Spindle vector
The hierarchical structure characterizes a BFS process and is also critical for describing how diffusion will occur. In most networks, elements in this sequence usually start with small numbers, i.e. the degree of the root vertex, increase until the peak and then reduce to a small number again at the most distant neighbors. We describe this pattern as a 'nodal spindle' and name $\vec{S}_p^i$ as a new metric characterizing the diffusion capability of vertex i. Its entries are $n^i_{L_i}$, the fraction of vertices in layer $L_i$, and the root vertex is viewed as layer 0. Different roots may give rise to various characteristics of hierarchical structure, or distinct $\vec{S}_p^i$, in the identical network. To integrate the overall hierarchical structure and describe the diffusion capability of networks, we average $\vec{S}_p^i$ over all nodes and get the 'network spindle', which is characterized by $\vec{S}_p$ (shown in figure 2). Here $\sum \vec{S}_p^i$ is the summation operator of vectors with various lengths, and $L_{max}$ is the total number of layers. Specifically, $\vec{S}_p(j)$ denotes the average number of vertices in layer j over all nodes, where $\Theta(B_j)$ ($j = 0, 1, \ldots, L_{max}$) is the set of nodes from which layer j can be reached through the BFS process.
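The nodal and network spindles can be computed with one BFS per root. The following is a minimal Python sketch rather than the authors' implementation. It assumes a connected graph given as an adjacency dictionary, layer fractions normalized by N, and zero-padding of shorter nodal spindles before averaging; the function names are hypothetical.

```python
from collections import deque

def bfs_layer_counts(adj, root):
    """Number of nodes in each BFS layer from root (layer 0 is the root)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        counts[d] += 1
    return counts

def nodal_spindle(adj, root):
    """Fraction of all vertices found in each layer around root."""
    n = len(adj)
    return [c / n for c in bfs_layer_counts(adj, root)]

def network_spindle(adj):
    """Average of all nodal spindles (assumption: shorter ones are zero-padded)."""
    n = len(adj)
    spindles = [nodal_spindle(adj, r) for r in adj]
    total = [0.0] * max(len(s) for s in spindles)
    for s in spindles:
        for j, x in enumerate(s):
            total[j] += x
    return [x / n for x in total]

# Toy example: a path graph 0-1-2-3.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
spindle_from_1 = nodal_spindle(path, 1)
spindle_of_path = network_spindle(path)
```

On this toy path, the spindle rooted at node 1 is widest at layer 1, while the network spindle sums to one because every root partitions all vertices into layers.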
As a variation of nodal distance distribution, the nodal spindle does not always illustrate a unimodal distribution, such as for the center node of a star network. However, the network spindle, aggregating all nodal spindles, always depicts two narrow poles which can be described as a spindle. To unify the metric for nodes and networks, we extend the definition of distance distribution to spindle in the broad sense. More importantly, the network spindle incorporates some intrinsic characteristics of networks, such as the average degree (⟨k⟩), the average distance (Λ), node betweenness ($B_v$) and edge betweenness ($B_e$), which are proved in appendix D.

Lemma 1. The total number of layers L
Lemma 4. The area under the curve of $\vec{S}_p$ approximates the Λ of networks, i.e.

SA and SA * methods on synthetic networks
The SA methodology is centered on modeling the epidemic propagation from neighboring nodes in the preceding and current layers, utilizing detailed information about node-specific connections within these layers. The SA* variant homogenizes nodes within the same layer and estimates the neighbors in the former and identical layer as the average connections using the mean-field method. Given that the maximal nodal distance within a network is typically significantly smaller than the network's magnitude, SA and SA* notably curtail the number of requisite equations to fewer than D. This reduction stands in stark contrast to the N equations necessitated in QMF, N + 2E in DMP, and N + 4E in the PA approach. Further, SA* adopts a mesoscopic viewpoint for epidemic approximation, centering its focus on the quantification of nodes across each layer and the interlayer as well as intralayer connections. This approach deliberately eschews the intricate mapping of pairwise nodal connections, offering a more generalized yet insightful perspective into epidemic dynamics.
We use a BA network with N = 1000 and m = 2 to first test the performance of our method compared with QMF, DMP and PA, in predicting epidemic propagation originating from a single node with the minimum degree (see figure 3). The ground truth is the average infection originating from the node on an SI process with β = 1/3 over 100 Monte Carlo simulations. The layouts of the nodal spindle structure, i.e. the hierarchical layer, and the degree-based layer of the corresponding network are shown in figures 3(a) and (b). The approximations from theoretical methods shown in figure 3(c) indicate that SA and SA* perform better than QMF, and that SA* is even more powerful than DMP and PA, which take the dynamical correlations among the states of the nodes into consideration. Specifically, SA* provides the most precise prediction with the mean absolute error (MAE) equaling 0.0113, smaller than those of the SA, QMF, DMP and PA methods, i.e. $\mathrm{MAE}_{SA} = 0.0282$, $\mathrm{MAE}_{QMF} = 0.0358$, $\mathrm{MAE}_{DMP} = 0.0147$, $\mathrm{MAE}_{PA} = 0.0147$. We repeat the approximation with 100 different source nodes and average the MAE to assess the performance of the proposed method. The synthetic networks include WS networks with different rewiring probabilities ($p_r^{WS} \in [0.01, 0.05, 0.1, 0.5, 0.9]$) and BA networks with different numbers of edges for newly attaching nodes ($m \in [2, 3, 4, 5]$). Experiments on synthetic networks indicate that the SA and SA* methods predict the temporal evolution of diffusion dynamics on BA networks better than on WS networks, and that SA* is more efficient than SA on most BA and WS networks. For the SA method, the MAE on BA networks ranges from 0.0134 to 0.0383 with an average of 0.0234, much smaller than that on WS networks, which ranges from 0.0126 to 0.2058 with an average of 0.0966. For the SA* method, the average MAE on BA and WS networks equals 0.0154 and 0.0906, respectively. More importantly, SA* outperforms QMF, DMP and PA in all BA networks with an average improvement of 38.6% on MAE with 95% confidence interval [36.7%, 40.6%], while SA* can only produce more precise predictions than QMF on WS networks with a rewiring probability higher than 0.5 (see appendix A tables A1 and A2). Theoretically, WS networks with high rewiring probability usually induce a low clustering coefficient, reducing the connections within the same layer shaped by the nodal spindle. The few connections decrease the complexity of correlations among the states of neighboring nodes, thus facilitating the propagation prediction by SA*, which partially deals with the correlations that are missing in QMF.
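The evaluation protocol described above (a Monte Carlo average as ground truth, scored by MAE) can be sketched as follows. This is an illustrative Python sketch under stated assumptions: a synchronous discrete-time SI update in which each infected node independently infects each susceptible neighbor with probability β per step, and an MAE averaged over time steps. The function names, the toy graph and the seed are hypothetical.

```python
import random

def simulate_si(adj, source, beta, steps, rng):
    """One discrete-time SI run; returns the infected fraction at t = 0..steps."""
    infected = {source}
    curve = [len(infected) / len(adj)]
    for _ in range(steps):
        newly = set()
        for u in infected:
            for v in adj[u]:
                # Each infected-susceptible contact transmits with probability beta.
                if v not in infected and rng.random() < beta:
                    newly.add(v)
        infected |= newly
        curve.append(len(infected) / len(adj))
    return curve

def mean_curve(adj, source, beta, steps, runs=100, seed=0):
    """Average infected fraction over independent runs (Monte Carlo ground truth)."""
    rng = random.Random(seed)
    total = [0.0] * (steps + 1)
    for _ in range(runs):
        for t, x in enumerate(simulate_si(adj, source, beta, steps, rng)):
            total[t] += x
    return [x / runs for x in total]

def mae(pred, truth):
    """Mean absolute error between two curves (assumption: averaged over time steps)."""
    return sum(abs(p - q) for p, q in zip(pred, truth)) / len(truth)

# Toy example: a path graph 0-1-2-3 with the source at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
bfs_like = simulate_si(path, 0, 1.0, 3, random.Random(1))
truth = mean_curve(path, 0, 1 / 3, 10)
```

With β = 1 every contact transmits, so the run reproduces the BFS layer-by-layer growth; for smaller β the averaged curve is what a theoretical approximation would be compared against via `mae`.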
In addition, we also check the performance of SA on scale-free networks with power-law exponent γ > 3 using the uncorrelated configuration model [41] with the minimum degree m = 4. Results illustrate that all approximations yield larger MAE when predicting the epidemic propagation process, while SA* is consistently superior to QMF and inferior to PA and DMP in these networks, which is compatible with the findings on WS networks (see appendix B figure B1).

SA and SA * methods on empirical networks
Next, we illustrate the performance of SA and SA* under the SI epidemic process on the Windsurfers [42] and Ego-Facebook [43] networks. As shown in figure 4, SA* provides the best prediction of the epidemic from the root with the minimum degree on both densely and sparsely connected networks. We repeat the approximation with 100 different source nodes for empirical networks. Our experiments demonstrate that SA* outperforms SA in all 10 empirical networks. For the SA* method, MAE ranges from 0.0106 to 0.0455 with an average of 0.0255, smaller than that of the SA method, which ranges from 0.0181 to 0.0471 with an average of 0.0298 (see appendix A table A3). More importantly, SA* demonstrates comparable or superior performance to QMF, and shows predictions that are not significantly worse than, and in some cases even superior to, PA and DMP. Specifically, SA* outperforms QMF in 60% of real networks with an average improvement of 7.8% on MAE, while it surpasses PA in 30% of real networks (see table 1 and appendix A table A3).

Sensitivity analysis
To understand the performance of SA and SA*, we further investigate how performance changes with the network topological features. Specifically, we measure MAE for networks with varying average distance (Λ), modularity (Q) and clustering coefficients (C). Λ and Q in BA networks and C in PLC networks (scale-free networks with adjustable clustering coefficients) all have a linear positive influence on the MAE of all approximating methods, while those in WS networks illustrate a nonlinear positive effect (see figure 5). Beyond the triangles described by C, we find that the numbers of quadrangles ($Q_{ua}$) and pentagons ($P_{en}$) show a negative effect on the performance of all approximating methods. In addition, the superiority of SA* increases with the number of quadrangles and pentagons in BA networks, and the epidemic prediction of SA* is nearly identical to that of the PA method on WS networks with more than 1000 quadrangles or 3000 pentagons (see figure 6). Large $Q_{ua}$ and $P_{en}$ are associated with a low clustering coefficient on WS networks (see appendix A table A2), which in turn explains the observation. Theoretically, considering the relationship $\Lambda \approx \sum_{l=1}^{L_{max}} l\,\vec{S}_p(l)$, a large Λ usually produces a flat and long distribution of $\vec{S}_p$ in WS networks and induces the accumulation of prediction error. Moreover, the nonlinear positive relationship between Q, C and Λ in WS networks explains the effect of Q and C on MAE. In BA networks, the small C described by $C \sim (\ln t)^2 / t$ and the short Λ induced by $\Lambda \sim \ln N / \ln\ln N$ respectively imply sparse connections within the same layer and reduce the accumulation of error, thus supporting the better prediction. In addition, quadrangles and pentagons indicate weak dynamic correlation among neighbors because of the lack of connections within the same layer under the projection of the spindle structure.
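The relation $\Lambda \approx \sum_{l} l\,\vec{S}_p(l)$ used above can be checked numerically. The sketch below is a hedged illustration, assuming a connected graph, layer fractions normalized by N, and Λ defined as the mean shortest-path length over ordered pairs of distinct nodes. Under these assumptions the first moment of the network spindle equals Λ(N − 1)/N, which approaches Λ for large N. The function names are hypothetical.

```python
from collections import deque

def distances_from(adj, root):
    """BFS shortest-path distances from root to every reachable node."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def average_distance(adj):
    """Mean shortest-path length over ordered pairs of distinct nodes."""
    n = len(adj)
    total = sum(sum(distances_from(adj, r).values()) for r in adj)
    return total / (n * (n - 1))

def spindle_first_moment(adj):
    """Sum over layers l of l * S_p(l), with S_p the root-averaged layer fraction."""
    n = len(adj)
    moment = 0.0
    for r in adj:
        for d in distances_from(adj, r).values():
            # A node at distance d adds 1/N to layer d of this root's spindle,
            # and each root carries weight 1/N in the network spindle.
            moment += d / (n * n)
    return moment

# Toy example: a ring of 50 nodes.
n_ring = 50
ring = {i: [(i - 1) % n_ring, (i + 1) % n_ring] for i in range(n_ring)}
lam = average_distance(ring)
moment = spindle_first_moment(ring)
```

For the ring, the two quantities differ only by the factor (N − 1)/N = 0.98, which illustrates why the area-weighted spindle is described as approximating Λ rather than equaling it.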
In addition, to understand the effect of infection rate on diffusion prediction, we investigate how the performance of the proposed methods changes with β. Generally, SA is more powerful than SA* when β = 1/10, and SA or SA* significantly outperforms DMP, QMF and PA on most BA networks regardless of β. For WS networks, SA* illustrates a non-significant improvement compared to other methods, except for slight superiority to PA in several cases (see figure 7).

Conclusions and outlook
Diffusion on networks propagates through the direct interaction among vertices, and its temporal evolution is determined by the distance of vertices from the source. Interestingly, the spindle vector is capable of quantifying the hierarchical structure from a source and describing the feature of the propagation chain, thus providing an interesting view of the diffusion dynamics. Based on this new network feature, the proposed SA and SA* methods demonstrate their superiority over the state-of-the-art in predicting the temporal evolution of spreading dynamics on many synthetic and empirical networks. This work extends the existing literature on diffusion prediction in two ways. First, we demonstrate that the hierarchical structure of networks exhibits the intrinsic diffusion pattern, and define the nodal spindle vector and network spindle vector to quantify the diffusion ability of vertices and networks. Second, we propose a novel approximation method based on the spindle vector and successfully seek a compromise between efficiency and complexity. Notably, the SA* approach is based only on the count of nodes in each BFS layer and the connections between different layers, ignoring the whole network information. More importantly, it outperforms the state-of-the-art on BA networks in diffusion prediction, especially the PA method, which is based on details about the evolution of the pair node states. Although we have tried to assess the characteristics of hierarchical structure embedded in the nodal or network spindle, more thorough analyses are required to reveal the correlation and high-order topology described by the spindle vector. In line with the statistical analysis, the application of hierarchical structure to network dynamics will support a brand-new view of mesoscopic structure on network functionality. In addition, the extension of the hierarchical structure to high-order networks modeled by hypergraphs and simplicial complexes may also be beneficial. The study is limited to SI epidemic-like processes on several synthetic and empirical networks, supporting the significant outperformance of SA* in propagation prediction on BA networks. However, understanding the propagation dynamics from the perspective of hierarchical structure may require the construction of hybrid diffusion models. Analyzing the epidemic thresholds of the SIS and SIR models based on the hierarchical structure will be our next step.

Dataset
Synthetic networks. We choose generative network models that are common and reflect several topological properties of real-world networks (summarized in appendix C table C1).
Small-world networks (WS) [44]. The WS model explains the coexistence of a high clustering coefficient and a short distance (small-world behavior) by rewiring links with probability $p_r^{WS}$ in a ring lattice with non-overlapping connections to the K nearest neighbors of each node. This construction allows us to tune the graph between a symmetric state with long distances ($p_r^{WS} = 0$) and a disordered state with short distances ($p_r^{WS} = 1$).
Scale-free networks (BA) [45]. BA is a model that generates random scale-free networks using a preferential attachment mechanism. It aims to explain the existence of highly heterogeneous degree distributions in real networks. It generates a graph by attaching new nodes along with m edges, which are attached to existing nodes in proportion to their degree.
Scale-free networks with adjustable clustering coefficients (PLC) [46]. The PLC model can diversify the average clustering in scale-free BA networks with an extra step in which the new node is linked, with probability $p_\Delta^{BA}$, to a random neighbor of the node it has just connected to, so as to form triangles. The model takes $(m - 1)p_\Delta^{BA}$ as the parameter to control clusters. This construction improves BA networks in the sense that it enables a higher average clustering to be attained if desired.
Real-world networks. We use 10 empirical networks with different characteristics to test the performance of the proposed method. The self-loops, directionality, and edge weights are all removed, and we only consider the giant connected components of the networks. The topological features of these empirical networks are summarized in appendix C table C2. Note that N, M, D, ⟨k⟩, Λ, C, Q respectively denote the number of vertices, the number of links, the diameter, the average degree, the average distance, the average clustering coefficient and the modularity.

SA on network diffusion
We let G = (V, E) denote an undirected and unweighted graph. Here V is the vertex set and E is the set of edges (unordered pairs of vertices). Furthermore, N denotes the number of nodes and M the number of edges. We consider the spread of epidemics on G described by the SI model with a single source j. Since we are interested in the relationship between the nodal spindle vector $\vec{S}_p^j$ and the fraction of infection $I^j$, we turn to the probability of a randomly chosen individual in layer i being infected at time t. The proposed method separates nodes according to nodal spindle layers from j, which can be regarded as a standard metapopulation model [47] where vertices belonging to the same layer are gathered as a subpopulation with heterogeneous connections and subpopulations are connected in the spindle structure.
A susceptible vertex k in layer i remains susceptible until time t if t is numerically smaller than i, which means that no infection has reached layer i, since the transmission advances one layer at a time along the spindle. Supposing that neighbors of k are independent, the probability of getting infected by its $d_k(t)$ infectious neighbors at time t is thus $1 - (1 - \beta)^{d_k(t)}$. Extending the definition to the case of spindle layers, infectious neighbors of k at time t are comprised of those in the preceding layer $d_k^-(t)$, the same layer $d_k^o(t)$, and the following layer $d_k^+(t)$. Connections between layer i and i − 1 will be traversed twice when we focus on layer i and i − 1 respectively, causing repeated infections that can be compensated by ignoring $d_k^+(t)$ for each vertex. Therefore, the number of infected individuals in layer i at time t, $F_i^j(t)$, is determined by connections to infection in the preceding layer i − 1 and the identical layer i at time t − 1. Since the contagious item cannot reach vertices in layer i when i > t, $F_i^j(t)$ evolves accordingly, where $\Theta_i$ is the assembly of vertices in layer i and $1 - \rho_k^t$ is the probability of k being in the susceptible state at time t.
Furthermore, we assume that the infected probabilities of susceptible nodes at time t within the same layer i are identical and equal to the fraction of infectious vertices in layer i. The infectious neighbors of vertex k in the preceding and the same layer are respectively estimated from $k_k^-$ and $k_k^o$, which correspond to the in-layer and mid-layer degree of vertex k. These relations summarize the SA method, and the total infection at time t aggregates the layers up to $L_{max}$, the total number of layers.
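For orientation, one plausible way to write the SA relations described in this section is given below. This is an assumed reconstruction from the prose, not the authors' typeset equations: the symbol $P_k(t)$, the use of $n_i$ for the number of vertices in layer i, the t → t + 1 indexing, and the normalization of $I^j(t)$ by N are all assumptions.

```latex
% Assumed reconstruction; symbols follow the surrounding text where available.
\begin{align}
P_k(t) &= 1 - (1-\beta)^{d_k(t)}, \qquad
d_k(t) = d_k^-(t) + d_k^o(t) + d_k^+(t), \\
\rho_k^t &= \frac{F_i^j(t)}{n_i}, \qquad
d_k^-(t) = k_k^- \,\frac{F_{i-1}^j(t)}{n_{i-1}}, \qquad
d_k^o(t) = k_k^o \,\frac{F_i^j(t)}{n_i}, \\
F_i^j(t+1) &= F_i^j(t) + \sum_{k \in \Theta_i} \bigl(1-\rho_k^t\bigr)
  \Bigl[1 - (1-\beta)^{d_k^-(t) + d_k^o(t)}\Bigr], \\
I^j(t) &= \frac{1}{N} \sum_{i=0}^{L_{max}} F_i^j(t).
\end{align}
```

In this reading, $d_k^+(t)$ is dropped from the exponent of the layer update to avoid counting inter-layer connections twice, and the initial condition places the single infection in layer 0.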

The simplified SA * on network diffusion
The prediction of the SA method requires information about the in-layer and mid-layer degree of each node, which adds to the complexity of the method. Therefore, we simplify it by mean-field theory to an approximation theory called SA*. Specifically, we assume that the in-layer and mid-layer degrees of nodes within the same layer i are identical and equal to the average connections of each node, $k_k^- = E_i^-/n_i$ and $k_k^o = E_i^o/n_i$, where $E_i^-$ and $E_i^o$ represent the connections between layer i − 1 and i, and those within layer i. SA* is then expressed in terms of the following quantities: $n_i$ is the number of nodes in layer i originating from node j, $(1 - p_i^t)$ is the probability of a node in layer i being in the susceptible state at time t, and $1 - (1 - \beta)^{(E_i^- p_{i-1}^t + E_i^o p_i^t)/n_i}$ is the probability that a node will be infected by its neighbors, where $(E_i^- p_{i-1}^t + E_i^o p_i^t)/n_i$ represents the infected neighbors. The total infection is obtained by aggregating over all layers.
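The SA* recursion described above can be sketched in a few lines of Python. This is a minimal illustration under assumptions, not the authors' code: the per-layer update is taken to be $p_i^{t+1} = p_i^t + (1 - p_i^t)\,[1 - (1-\beta)^{(E_i^- p_{i-1}^t + E_i^o p_i^t)/n_i}]$, $E_i^o$ is counted as the number of edges inside layer i, the total infection is $\sum_i n_i p_i^t / N$, and the graph is connected. The function names are hypothetical.

```python
from collections import deque

def layer_stats(adj, root):
    """Per-layer node counts n, edges to the previous layer, and edges within the layer."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    depth = max(dist.values()) + 1
    n = [0] * depth
    e_prev = [0] * depth
    e_same_ends = [0] * depth
    for u, du in dist.items():
        n[du] += 1
        for v in adj[u]:
            if dist[v] == du - 1:
                e_prev[du] += 1
            elif dist[v] == du:
                e_same_ends[du] += 1      # every intra-layer edge is seen from both ends
    e_same = [x // 2 for x in e_same_ends]
    return n, e_prev, e_same

def sa_star(adj, root, beta, steps):
    """Infected fraction at t = 0..steps under the assumed SA* layer recursion."""
    n, e_prev, e_same = layer_stats(adj, root)
    total = sum(n)
    p = [0.0] * len(n)
    p[0] = 1.0                            # the root (layer 0) is the initial infection
    curve = [sum(ni * pi for ni, pi in zip(n, p)) / total]
    for _ in range(steps):
        new_p = p[:]
        for i in range(1, len(n)):
            infected_nbrs = (e_prev[i] * p[i - 1] + e_same[i] * p[i]) / n[i]
            new_p[i] = p[i] + (1.0 - p[i]) * (1.0 - (1.0 - beta) ** infected_nbrs)
        p = new_p
        curve.append(sum(ni * pi for ni, pi in zip(n, p)) / total)
    return curve

# Toy example: a path graph 0-1-2-3 with the source at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
curve_full = sa_star(path, 0, 1.0, 3)
curve_half = sa_star(path, 0, 0.5, 2)
```

Only one state variable per layer is tracked, which is why the number of equations is bounded by the number of layers rather than by N.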

Discrete version of QMF method on network diffusion
As for the SI epidemic dynamics approximated by QMF, an infected node tries to transmit the disease to its neighbors with probability β per unit time. This forms a Markov chain where the probability of a node being infected depends only on the last time step. Specifically, the susceptible node i is infected at time t by at least one neighbor with probability $(1 - q_i(t))(1 - p_i(t))$, so the discrete-time version of the evolution of the probability of infection of any node i reads $p_i(t+1) = p_i(t) + (1 - q_i(t))(1 - p_i(t))$, where $p_i(t)$ is the probability that node i is infected at time t, and $q_i(t) = \prod_{j}(1 - \beta a_{ji} p_j(t))$ is the probability of node i not being infected by any neighbor. Here $a_{ji}$ represents the adjacency relationship between nodes j and i: if they are connected, $a_{ji} = 1$; otherwise, $a_{ji} = 0$. The fraction of infection at time t can be expressed as $I(t) = \frac{1}{N}\sum_{i} p_i(t)$.
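The discrete-time QMF recursion for the SI model can be sketched as follows. This is an illustrative Python sketch assuming the standard form $p_i(t+1) = p_i(t) + (1 - q_i(t))(1 - p_i(t))$ with $q_i(t) = \prod_j (1 - \beta a_{ji} p_j(t))$, and an infected fraction given by the mean of $p_i(t)$; the function name and toy graph are hypothetical.

```python
def qmf_si(adj, source, beta, steps):
    """Infected fraction at t = 0..steps under the discrete-time QMF recursion."""
    p = {i: 0.0 for i in adj}
    p[source] = 1.0
    curve = [sum(p.values()) / len(adj)]
    for _ in range(steps):
        new_p = {}
        for i in adj:
            # q_i: probability that no neighbor transmits to i during this step.
            q = 1.0
            for j in adj[i]:
                q *= 1.0 - beta * p[j]
            new_p[i] = p[i] + (1.0 - q) * (1.0 - p[i])
        p = new_p
        curve.append(sum(p.values()) / len(adj))
    return curve

# Toy example: a path graph 0-1-2-3 with the source at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
qmf_curve = qmf_si(path, 0, 0.5, 3)
```

Because each node treats its neighbors' infection probabilities as independent, infection probability can flow back along the edge it arrived on, which is the source of the overestimation attributed to QMF.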

Discrete version of DMP method on network diffusion
In the DMP approach, a node designated as being in a 'cavity' state is precluded from transmitting an infection to its adjacent nodes, while still remaining susceptible to infection from them. This framework inherently incorporates dynamic correlations among the states of neighboring nodes. The temporal evolution of the probability that a given node i remains susceptible at time t, denoted as $p_S^i(t)$, is articulated through the following quantities. Here, $\theta^{j \to i}(t)$ symbolizes the probability that the disease has not traversed the edge j → i up to time t, and $\Phi_i$ signifies the set of neighbors of node i. The updating rule for $\theta^{j \to i}(t)$ involves $\phi^{j \to i}(t)$, the probability that the disease has not been transmitted through the edge j → i up to time t and node j is infected at time t, which follows its own update mechanism.
New J. Phys. 26 (2024) 043027

J Mou et al
In this expression, $p_S^{i \setminus k}(t)$ represents the probability that node i remains susceptible upon disregarding any infection originating from its neighbor node k. The right-hand side of (22) delineates two distinct scenarios: the first term describes the situation where node j infects node i with a rate β at time t if it is infected at time t − 1; the second term reflects the probability of node j being infected given it was susceptible at time t − 1.
By excluding node k from $\Phi_i$ in (20), the cavity probability $p_S^{i \setminus k}(t)$ is obtained. To complete the recursive updating rules, the initial conditions are set using $\delta_j^0$, which indicates the initial status of node j: if it is infectious, $\delta_j^0 = 1$; otherwise, $\delta_j^0 = 0$. The fraction of infection at time t within the DMP framework is computed from the resulting node probabilities.
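A compact Python sketch of a discrete-time DMP recursion for the SI model is given below. It is a hedged illustration, not the paper's equations: it assumes the commonly used message-passing form with no recovery, namely $\theta^{j\to i}(t+1) = \theta^{j\to i}(t) - \beta\,\phi^{j\to i}(t)$, $\phi^{j\to i}(t+1) = (1-\beta)\,\phi^{j\to i}(t) + p_S^{j\setminus i}(t) - p_S^{j\setminus i}(t+1)$, and $p_S^i(t) = p_S^i(0)\prod_{j\in\Phi_i}\theta^{j\to i}(t)$, with $\theta(0) = 1$ and $\phi^{j\to i}(0) = \delta_j^0$. The function name is hypothetical.

```python
def dmp_si(adj, source, beta, steps):
    """Infected fraction at t = 0..steps under an assumed DMP recursion for SI."""
    ps0 = {i: 0.0 if i == source else 1.0 for i in adj}   # P_S^i(0) = 1 - delta_i^0
    edges = [(j, i) for i in adj for j in adj[i]]          # directed messages j -> i
    theta = {e: 1.0 for e in edges}
    phi = {(j, i): 1.0 - ps0[j] for (j, i) in edges}       # phi^{j->i}(0) = delta_j^0

    def cavity():
        # out[(i, k)]: probability that i is susceptible when neighbor k is ignored.
        out = {}
        for (k, i) in edges:
            prod = ps0[i]
            for j in adj[i]:
                if j != k:
                    prod *= theta[(j, i)]
            out[(i, k)] = prod
        return out

    cav = cavity()
    curve = [1.0 - sum(ps0.values()) / len(adj)]
    for _ in range(steps):
        for e in edges:
            theta[e] -= beta * phi[e]
        new_cav = cavity()
        for (j, i) in edges:
            # Keep untransmitted infections, and add j's fresh infection probability
            # computed in the cavity graph where i cannot infect j.
            phi[(j, i)] = (1.0 - beta) * phi[(j, i)] + cav[(j, i)] - new_cav[(j, i)]
        cav = new_cav
        susceptible = 0.0
        for i in adj:
            prod = ps0[i]
            for j in adj[i]:
                prod *= theta[(j, i)]
            susceptible += prod
        curve.append(1.0 - susceptible / len(adj))
    return curve

# Toy example: a path graph 0-1-2-3 with the source at node 0.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
dmp_curve = dmp_si(path, 0, 0.5, 3)
```

On a tree such as this path, the cavity construction removes the back-flow of infection probability, so the curve matches the exact SI expectation (0.625 at t = 3 for β = 0.5). The cost is one pair of messages per directed edge, in line with the N + 2E equation count quoted for DMP.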

Discrete version of PA method on network diffusion
In the PA approach, the focal point is the interplay of joint and conditional probabilities associated with each link, formulated through epidemic link equations. For a given link connecting nodes i and j, the probability of node i remaining in a susceptible state is expressed as $P_i^S = P_{ij}^{SS} + P_{ij}^{SI}$. Here, $P_{ij}^{SS}$ represents the joint probability that both node i and node j are susceptible, while $P_{ij}^{SI}$ signifies the joint probability of node i being susceptible and node j being infected. In a similar fashion, the probability of node i being infected is deduced as $P_i^I = P_{ij}^{II} + P_{ij}^{IS}$. Integrating these restrictions, an equation can be formulated for each node i, in which $q_i(t)$ delineates the probability of node i evading infection through any pairwise interaction with its neighbors. To fully capture the dynamics of the system, L additional equations are required, each corresponding to a specific link. These equations account for the probability of a link connecting two nodes in the infected state II, transitioning from one of the four potential states: SS, SI, IS, II. In these link equations, $q_{ij}(t)$ specifies the probability of node i not being infected by any neighboring node other than j through a link. Finally, the fraction of infection at time t, as predicted by the PA approach, is determined from these quantities.

Finally, for the discrete distribution, the area under the curve is the integration of $\vec{S}_p$. According to the definition of $\vec{S}_p(j)$, Λ =

Proof 5. In [48], the authors denote generating functions for the degree distribution and the excess degree distribution of a network respectively, and explicitly give the generating function for the number of neighbors at any distance d. In particular, if the degree distribution of the tree network were Poisson with mean c, the mean number of second neighbors equals the difference between the second moment and the first moment of the degree. We extend the claim to the spindle vector, and get $\vec{S}_p(2) = (\langle k^2 \rangle - \langle k \rangle)/N$.

Figure 1. Similar patterns of hierarchical structure and SI dynamics. In a highly clustered network, the node sequence of hierarchical structure (HS), i.e. the number of neighbors that are high-order steps away from the root, can qualitatively indicate the times of diffusion outbreaks, and gradually approximate the spatio-temporal evolution of diffusions with β quantitatively. (a) and (b) map the same network as different multilayered networks according to the hierarchical structure starting from v1 and v2, respectively. The root node is denoted as layer 0, and different colors correspond to various layers. (c) and (d) show the distribution of infected nodes with various β, and the similarity with HS.

Figure 2. Illustration of nodal spindles and the network spindle of the sample network in figure 1. Nodal spindles shape the network according to the distance of nodes to the root. The width of the spindle profiles reflects the number of nodes at that layer. The network spindle aggregates all nodal spindles and describes the distribution of nodal distance.

Figure 3. Comparison of different epidemic approximation methods on a BA network with N = 1000, m = 2. SA* produces the spreading process most similar to the Monte Carlo simulation (MC) originating from the node with the minimum degree under the SI model with β = 1/3. (a) illustrates the layout of the nodal spindle structure, which treats the original network as a multilayer one, with different colors indicating different layers. Light cyan and gray edges display the connections between layers and within layers, respectively. Node size indicates degree centrality in the underlying network. (b) shows the degree-based layered layout. Nodes are equally distributed on concentric circles whose radius represents degree centrality. Node size and color denote vertex degree. (c) shows the cumulative distribution of infections under the PA, QMF, DMP, SA and SA* approximations and the MC simulation with β = 1/3.

Figure 4. Comparison of different epidemic approximation methods on empirical sample networks. SA* provides the best prediction of an epidemic starting from the root with the minimum degree on both densely and sparsely connected networks under the SI model with β = 1/3. (a) and (b) are the regular layouts of the Windsurfers and Ego-Facebook networks. The color and size of vertices represent degree centrality in (a), while color denotes community structure in (b). (c) and (d) show the network spindles of the corresponding networks. (e) and (f) illustrate the performance of PA, QMF, DMP, SA and SA* in predicting propagation. The insets illustrate the corresponding nodal spindle-shaped structures.

Figure 6. The effect of quadrangles and pentagons on approximation methods in synthetic networks. The numbers of quadrangles (Qua) and pentagons (Pen) have a negative effect on the performance of all methods. The advantage of SA* over PA increases with Qua and Pen on BA networks, while on WS networks the prediction of SA* steadily approaches that of PA as Qua and Pen increase. Gap in (c) and (f) denotes the difference in MAE between PA and SA*.

Figure 7. The effect of β on prediction accuracy, measured by MAE. SA or SA* significantly outperforms DMP, QMF and PA on most BA networks regardless of β, while SA* shows a non-significant improvement compared to other methods. The MAE is plotted along the polar axis of these radar plots. The enclosed area reflects the prediction error of the corresponding method, so a larger area means worse performance. In (b) and (c), the DMP and PA methods have the same MAE.

Appendix D. Proofs for Lemmas of spindle vector

Lemma 1. The total number of layers $L_{max}$ is the diameter of the network $D$, i.e. $L_{max} = |\vec{S}_p| = D$.

Proof 1. The diameter of a network, $D$, is defined as the maximum distance between any pair of nodes, i.e. $D = \max_{ij} d_{ij}$. The total number of layers depends on the longest BFS spanning tree, which determines the length of the network spindle vector, i.e. $L_{max} = |\vec{S}_p| = \max_{ij} d_{ij}$. Therefore, $L_{max} = |\vec{S}_p| = D$.

Lemma 2. $\vec{S}_p(0)$ is the reciprocal of $N$, i.e. $\vec{S}_p(0) = 1/N$.

Proof 2. Each root node $i$ is viewed as layer 0 in $\vec{S}^{i}_p$, so $n^{i}_0 = 1/N$. According to the definition of $\vec{S}_p(j)$, $\vec{S}_p(0) = \sum_{i \in \Theta(B_0)} \frac{n^{i}_0}{N} = 1/N$, because layer 0 is reachable for all vertices.

Lemma 3. $\vec{S}_p(1)$ is a linear function of $\langle k \rangle$, specifically $\vec{S}_p(1) = \langle k \rangle / N$.

Proof 3. The number of vertices in layer 1 is the degree of each root, so $n^{i}_1 = k_i / N$. According to the definition of $\vec{S}_p(j)$, $\vec{S}_p(1) = \sum_{i \in \Theta(B_1)} \frac{n^{i}_1}{N} = \langle k \rangle / N$.

Lemma 4. The area under the curve of $\vec{S}_p$ approximates $\Lambda$ of the network, i.e. $\sum_{j=0}^{L_{max}} j\,\vec{S}_p(j) = \frac{N(N-1)}{N^2} \Lambda$.

Proof 4. Firstly, $N n^{i}_j$ is the number of vertices at a distance of $j$ from the root node $i$, so $N \sum_{i \in \Theta(B_j)} n^{i}_j$ is the number of (ordered) pairs of vertices that are a distance of $j$ apart. Secondly, $\sum_{j=0}^{L_{max}} \left( j N \sum_{i \in \Theta(B_j)} n^{i}_j \right)$ corresponds to the sum of the distances over all connections. $\Lambda$ is defined as the average distance over all connections, i.e. $\Lambda = \frac{\sum_{j=0}^{L_{max}} \left( j N \sum_{i \in \Theta(B_j)} n^{i}_j \right)}{N(N-1)}$.
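As a sanity check of Lemmas 1 to 4, the following minimal sketch builds the network spindle vector of a small graph by breadth-first search and compares it with the stated closed forms. It assumes that $\vec{S}_p(j)$ is the average over all roots $i$ of $n^{i}_j$, where $n^{i}_j$ is the fraction of nodes at distance $j$ from root $i$. The toy graph and all function names are hypothetical and not taken from the paper.

```python
from collections import deque
import math

# Hypothetical toy graph: a 4-cycle (0-1-2-3) with a tail 3-4-5.
graph = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2, 4], 4: [3, 5], 5: [4]}


def layer_counts(adj, root):
    """Return counts[j] = number of nodes at shortest-path distance j from root."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    counts = [0] * (max(dist.values()) + 1)
    for d in dist.values():
        counts[d] += 1
    return counts


def network_spindle(adj):
    """Assumed definition: S_p(j) = (1/N) * sum_i n^i_j, with n^i_j = counts_i[j] / N."""
    n = len(adj)
    all_counts = [layer_counts(adj, root) for root in adj]
    spindle = [0.0] * max(len(c) for c in all_counts)
    for counts in all_counts:
        for j, c in enumerate(counts):
            spindle[j] += c / (n * n)
    return spindle


def average_distance(adj):
    """Mean shortest-path length over all ordered pairs of distinct nodes."""
    n = len(adj)
    total = sum(j * c for root in adj for j, c in enumerate(layer_counts(adj, root)))
    return total / (n * (n - 1))


N = len(graph)
spindle = network_spindle(graph)
mean_degree = sum(len(nb) for nb in graph.values()) / N
area = sum(j * s for j, s in enumerate(spindle))

print("largest layer index:", len(spindle) - 1)         # Lemma 1: the diameter D
print("S_p(0) vs 1/N:", spindle[0], 1 / N)               # Lemma 2
print("S_p(1) vs <k>/N:", spindle[1], mean_degree / N)   # Lemma 3
print("area vs (N-1)/N * Lambda:", area, (N - 1) / N * average_distance(graph))  # Lemma 4
```

In this sketch the list also stores layer 0, so the diameter corresponds to its largest index rather than to its length. The graph is assumed to be connected, in which case the entries of the spindle vector sum to one.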
Figure 5. The effect of topological features on approximation methods in synthetic networks. The average distance (Λ), modularity (Q) and clustering coefficient (C) show a linear and a nonlinear positive influence on the MAE of all methods on BA and WS networks, respectively. Considering the non-significant clustering structure in BA networks, we analyze the effect of C in PLC networks in (c).

Table C2. Basic statistics of empirical networks.