Identifying critical nodes in complex networks based on neighborhood information

The identification of important nodes in complex networks has always been a prominent topic in the field of network science. Nowadays, the emergence of large-scale networks has sparked our research interest in complex network centrality methods that balance accuracy and efficiency. Therefore, this paper proposes a novel centrality method called Spon (Sum of the Proportion of Neighbors) Centrality, which combines algorithmic efficiency and accuracy. Spon only requires information within the three-hop neighborhood of a node to assess its centrality, thereby exhibiting lower time complexity and suitability for large-scale networks. To evaluate the performance of Spon, we conducted connectivity tests on 16 empirical unweighted networks and compared the monotonicity and algorithmic efficiency of Spon with other methods. Experimental results demonstrate that Spon achieves both accuracy and algorithmic efficiency, outperforming eight other methods, including CycleRatio, collective influence, and Social Capital. Additionally, we present a method called W-Spon to extend Spon to weighted networks. Comparative experimental results on 10 empirical weighted networks illustrate that W-Spon also possesses advantages compared to methods such as I-Core and M-Core.


Introduction
As an emerging interdisciplinary field, complex systems have been receiving increasing attention from the academic community [1]. In real-life scenarios, there are numerous complex systems. To better understand the fundamental principles governing interactions among entities, researchers have abstracted the entities within complex systems as nodes and the connections between entities as edges, thereby forming complex networks [2]. These diverse complex networks encompass almost every aspect of human life and even the natural world [3], including widely observed biological networks [4], power systems [5], social networks [6], and even virus transmission networks [7], among others [8]. Due to the heterogeneity of network structures, a small number of nodes often play a dominant role in complex networks, and certain important nodes can significantly influence the structure and functionality of the network [9]. Therefore, the fast and accurate identification of key nodes in the network is an urgent and significant problem.
Over time, an increasing number of methods for mining important nodes have been proposed. In the initial stages of complex network research, classical methods such as degree centrality [10] and semi-local centrality (SLC) [11] were introduced, which gained widespread application and achieved notable results. However, as our understanding of complex networks deepened, along with the expansion of network scale and complexity, new challenges emerged in the field of important node mining. Confronted with progressively complex network structures, researchers have put forth numerous new methods for mining important nodes [9,12]. For instance, the K-Shell (KS) method [13] determines node centrality based on its position in the network.

Related works
Since the emergence of network science, the identification of important nodes has been one of the key research areas in the study of complex networks. The academic community has devised various methods to address the problem of identifying important nodes in complex networks.
Koene was the first to propose degree centrality [10], suggesting that the centrality of a node is related to its degree, which is the simplest and somewhat effective method for assessing node centrality. Subsequently, Chen et al introduced SLC [11], which considers the centrality of a node based on both its direct and indirect neighbors. This approach improved the accuracy of identifying important nodes but sacrificed efficiency. In recent years, several new methods have been proposed. Tulu et al combined node degree with community structure to determine node centrality, which not only identifies important nodes but also identifies hub nodes within communities [18]. Fei et al proposed a new approach to address the issue of eccentricity centrality being susceptible to extreme values. They believed that the strength of a node is related to the attraction between nodes. They defined node strength as the sum of interaction forces between nodes in the network, which is a method that combines local information and shortest paths [19]. Dai et al optimized eigenvector centrality (EC) by introducing a computation method that measures both a node's self-influence and its neighbors' contributions. They improved the accuracy of EC by expanding the measurement scope of node neighborhood information [20]. Wang et al designed a sorting method called EFFC based on network topological changes. They defined the average reciprocal of the shortest distances between all pairs of nodes as the network's information propagation efficiency. Then, by removing a node and evaluating the change in information propagation efficiency before and after removal, they assessed the node's centrality [21].
In addition to mining important nodes based on network information, optimizing the KS method is also an effective approach. KS is a simple method that determines the centrality of nodes based on their position [13,22]. However, KS only assigns nodes to different shells without distinguishing the centrality of nodes within the same shell. Therefore, improving this limitation can enhance the performance of the KS method. Zeng and Zhang differentiated the centrality of different nodes by assessing the contribution of the deleted nodes and their connections to the network. This approach, named 'Mixed Degree Decomposition' method, was proposed [23]. Li et al designed a KS optimization method called CN, which distinguishes the centrality of nodes based on different neighborhood information, thereby addressing the limitations of KS and improving its accuracy [24]. Wang et al proposed the KSIF method, which distinguishes the centrality of different nodes by differentiating the iterative information generated during KS decomposition [25]. Bae and Kim developed the CNC+ method, which distinguishes the centrality of different nodes based on the degrees of neighboring nodes and their core numbers, thereby improving the limitations of KS [26]. This is also a KS optimization method that distinguishes the centrality of nodes based on neighborhood information.
In addition to these methods, ranking node centrality based on random walks is also a viable approach. Among them, PageRank is a well-known and classical method. The PageRank algorithm constructs a network based on the relationships between web pages and distinguishes the importance of different websites through random walks [27]. Following that, Lü et al proposed LeaderRank [28], where they introduced a node 'g' with bidirectional links to all other nodes in the network, transforming the original network into a strongly connected network and improving the performance of PageRank. Li et al further enhanced LeaderRank by using biased random walks instead of standard random walks, thus improving its performance [29]. In recent years, some new works have emerged. Lin and Zhang introduced a novel random walk method called NBCRW based on non-backtracking centrality, where walkers are more likely to visit neighbors with high non-backtracking centrality [30]. Oettershagen et al proposed a new centrality measure called temporal walk centrality, which quantifies node centrality by measuring the ability of nodes to acquire and distribute information in a temporal network. They argued that information does not necessarily propagate along the shortest paths but rather through random walks that satisfy temporal constraints within the network [31]. De Meo et al creatively interpreted node navigability as the property that any node in the graph can be reached through short walks, leading to the concept of potential gain centrality. This centrality measure unifies various walk-based centrality metrics in complex networks [32].
As pointed out by Lu et al, weighted networks carry richer information compared to unweighted networks [9]. For example, studying social networks with weighted information helps in measuring and analyzing the complex functionality and evolution of real-world societies [33]. Analyzing the topology of large-scale economic networks with trade volumes as weights allows us to identify vulnerable nodes in an economic network, contributing to the stability and robustness of a specific economic system [34,35]. Therefore, assessing node centrality in weighted networks is an important aspect of complex network research. In recent years, several methods for determining node centrality in weighted networks have been proposed. For instance, Li et al, as mentioned earlier [29], developed a method specifically for weighted networks. Additionally, Xu and Wang addressed the issue of dynamic changes in network scenarios by proposing a weighted network centrality measure called Adaptive LeaderRank (ALR), which combines the biased weighting mechanism of H-index with the idea of LeaderRank. Compared to previous algorithms, ALR is well-suited for scenarios with dynamically changing network structures [36]. Garas et al extended the KS method and introduced a weighted version called Weighted KS, which considers both node degree and link weights [37]. This method follows the same principle as the traditional KS method, with the difference being that Weighted KS utilizes weighted degrees instead of node degrees. Following a similar approach, Wu et al proposed a unified framework for extending the KS method [38]. They also replaced node degrees with weighted degrees and employed the iterative node removal process of the traditional KS method to determine node centrality.

Proposed method
This section primarily discusses the node centrality measures involved in the experiments conducted in this paper. In section 3.1, various node centrality measures in complex networks are introduced, which will be used as baseline methods to evaluate the performance of Spon. Section 3.2 provides the specific details of the proposed Spon Centrality.

Collective influence (CI)
CI is a method proposed by Morone and Makse for identifying highly influential nodes in complex networks [39]. CI measures the influence of a node by quantifying the damage suffered by the giant connected component of the network after removing that particular node. Its definition is as follows:

CI(i) = (k_i − 1) Σ_{j ∈ D(i,l)} (k_j − 1). (1)

In equation (1), D(i, l) represents the set of nodes in the network that surround node i and belong to the ball of radius l, k_i denotes the degree of node i, and l is a predefined value, typically set to 3 in large and medium-sized networks or 2 in small networks.
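As a minimal sketch of equation (1) (not the authors' implementation), CI can be computed with a breadth-first search that collects the nodes exactly l hops from i; the graph below is a hypothetical toy example stored as an adjacency dict.

```python
from collections import deque

def collective_influence(G, i, l):
    """CI(i) = (k_i - 1) * sum of (k_j - 1) over nodes j exactly l hops from i.

    G is an adjacency dict {node: set(neighbours)}; the set D(i, l) is
    found with a breadth-first search truncated at depth l.
    """
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        if dist[u] == l:            # do not expand beyond the ball of radius l
            continue
        for v in G[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    surface = [j for j, d in dist.items() if d == l]
    return (len(G[i]) - 1) * sum(len(G[j]) - 1 for j in surface)

# Hypothetical toy graph: path 1-2-3-4 with an extra leaf 5 attached to node 2.
G = {1: {2}, 2: {1, 3, 5}, 3: {2, 4}, 4: {3}, 5: {2}}
print(collective_influence(G, 2, 1))  # (3-1) * ((1-1) + (2-1) + (1-1)) = 2
```

Note that leaves contribute nothing to the sum, since their factor (k_j − 1) vanishes.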

Cycle ratio (CR)
Fan et al proposed a novel node centrality measure called CycleRatio based on the statistical analysis of cycle structures [40]. CycleRatio measures the extent to which a node participates in the shortest cycles involving other nodes. The shortest cycle refers to the cycle with the minimum length that includes the given node. Therefore, CycleRatio can be defined as follows:

r_i = 0 if c_ii = 0, and r_i = Σ_{j: c_ij > 0} c_ij / c_jj otherwise. (2)

In equation (2), c_ij represents the number of shortest cycles in the network that pass through both nodes i and j, while c_ii represents the number of shortest cycles in the network that pass through node i.

Social capital (SC)
Zhou et al proposed Social Capital, a succinct measure of local centrality to identify fast influencers in complex networks [41]. They argue that if a node has a high degree and many of its neighbors are also high-degree nodes, then that node occupies an important position in the network. The definition of Social Capital (SC) is as follows:

SC(i) = k_i + Σ_{j ∈ Γ(i)} k_j. (3)

In equation (3), k_i represents the degree of the current node, while k_j represents the degree of a neighboring node j ∈ Γ(i).
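Equation (3) is a one-line computation; a minimal sketch on a hypothetical toy graph:

```python
def social_capital(G, i):
    """SC(i) = k_i + sum of the degrees of i's first-order neighbours."""
    return len(G[i]) + sum(len(G[j]) for j in G[i])

# Hypothetical toy graph: path 1-2-3-4.
G = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(social_capital(G, 2))  # 2 + (1 + 2) = 5
```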

H-index (HI)
Initially, the HI was used to evaluate the quantity and quality of academic output for individual researchers [42]. However, this concept was later extended to the domain of complex networks by Lü et al, leading to a new method for mining important nodes in complex networks [43]. More recently, Wu et al proved that all KS indices can be seen as a certain steady state of the HI series when n tends to infinity [38]. The HI determines the centrality of a node by considering the degrees of its neighboring nodes, and it is defined as follows:

HI(i) = H(k_{v1}, k_{v2}, . . ., k_{vj}). (4)

In this equation, v1, v2, . . ., vj represent the neighbors of node i, and k_{v1} represents the degree of neighbor v1. H is a function that returns a value h, indicating that the set {k_{v1}, k_{v2}, k_{v3}, . . ., k_{vj}} contains h values greater than or equal to h.
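The operator H is the familiar h-index applied to the list of neighbour degrees; a minimal sketch (hypothetical toy graph, not the authors' code):

```python
def h_index(values):
    """Largest h such that at least h of the values are >= h."""
    values = sorted(values, reverse=True)
    h = 0
    for rank, v in enumerate(values, start=1):
        if v >= rank:
            h = rank
        else:
            break
    return h

def hi_centrality(G, i):
    """HI of node i: the h-index of its neighbours' degrees."""
    return h_index([len(G[j]) for j in G[i]])

# Hypothetical star graph: centre 0 with four leaves.
# The centre's neighbour degrees are [1, 1, 1, 1], so its HI is 1.
G = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(hi_centrality(G, 0))  # 1
```

The star illustrates the low monotonicity discussed later: a hub surrounded only by leaves receives the same HI as a leaf itself.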

KS
KS is a highly classical node centrality measure [13]. It is based on a simple and computationally efficient principle. However, it has a significant limitation in terms of its monotonicity, which is due to the recursive removal of nodes in the network with degrees less than or equal to k. This rough approach classifies all nodes with degrees less than or equal to k into the same centrality level, making it difficult to distinguish the centrality of different nodes at a more nuanced level.
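To make the peeling principle concrete, here is a minimal sketch of KS decomposition (not a reference implementation) on a hypothetical toy graph; note how all three triangle nodes land in the same shell, which is exactly the monotonicity limitation described above.

```python
def k_shell(G):
    """Assign each node its shell index by iteratively stripping nodes
    whose residual degree is <= k, for k = 1, 2, ..."""
    G = {u: set(vs) for u, vs in G.items()}  # work on a copy
    shell = {}
    k = 0
    while G:
        k += 1
        removed = True
        while removed:                       # cascade within the same k
            removed = False
            for u in [u for u, vs in G.items() if len(vs) <= k]:
                shell[u] = k
                for v in G[u]:
                    G[v].discard(u)
                del G[u]
                removed = True
    return shell

# Hypothetical toy graph: triangle 1-2-3 with a pendant node 4 on node 1.
G = {1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
print(k_shell(G))  # {4: 1, 1: 2, 2: 2, 3: 2}
```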

Closeness centrality (CC)
CC determines the centrality of a node by considering the average shortest path length from that node to all other nodes in the network [44]. In other words, nodes that are closer to other nodes are considered more important. The CC is defined as follows:

CC(i) = (n − 1) / Σ_{j ≠ i} d_ij, (5)

where n is the number of nodes in the network and d_ij represents the shortest path distance from node i to node j.
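For an unweighted graph the distances are obtained by breadth-first search; a minimal sketch assuming a connected hypothetical toy graph:

```python
from collections import deque

def closeness(G, i):
    """CC(i) = (n - 1) / sum of shortest-path distances from i,
    with distances computed by BFS (graph assumed connected)."""
    dist = {i: 0}
    queue = deque([i])
    while queue:
        u = queue.popleft()
        for v in G[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return (len(G) - 1) / sum(dist.values())

# Hypothetical toy graph: path 1-2-3; the middle node is closest to all others.
G = {1: {2}, 2: {1, 3}, 3: {2}}
print(closeness(G, 2))  # 2 / (1 + 1) = 1.0
```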

EC
The core idea of EC is that the centrality of a node is a function of the centrality of its neighboring nodes [45]. In other words, the more important the neighboring nodes connected to a node, the more important the node itself. Let x_i denote the centrality score of node i; then we have

x_i = c Σ_j a_ij x_j. (6)

In equation (6), c represents a proportionality constant, and a_ij = 1 when node i is connected to node j (a_ij = 0 otherwise). Let x = (x_1, x_2, . . ., x_n)^T. After multiple iterations to reach a steady state, it can be expressed as follows:

A x = c^{−1} x. (7)

In equation (7), x is the eigenvector corresponding to the eigenvalue c^{−1} of matrix A.
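The fixed point of equations (6) and (7) is usually reached by power iteration. The sketch below adds each node's own score every round (a standard shift, not part of the original definition, that prevents oscillation on bipartite graphs) and tests it on a hypothetical toy path whose exact eigenvector is proportional to (1, √2, 1).

```python
def eigenvector_centrality(G, iters=200):
    """Power iteration: repeatedly set x_i <- x_i + sum of neighbours'
    scores, then renormalise by the maximum entry. The shift (adding x_i)
    keeps the dominant eigenvector of A while avoiding sign flips."""
    x = {u: 1.0 for u in G}
    for _ in range(iters):
        x_new = {u: x[u] + sum(x[v] for v in G[u]) for u in G}
        norm = max(x_new.values())
        x = {u: s / norm for u, s in x_new.items()}
    return x

# Hypothetical toy graph: path 1-2-3.
G = {1: {2}, 2: {1, 3}, 3: {2}}
x = eigenvector_centrality(G)
print(round(x[2] / x[1], 3))  # ≈ 1.414, i.e. sqrt(2)
```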

SLC
The SLC was proposed by Chen et al [11]. SLC takes into account not only the direct neighbor information of a node but also the first- and second-order neighbor information of its direct neighbors. Therefore, SLC effectively considers the fourth-order neighborhood of a node to assess its importance. The definition of SLC is as follows:

Q(j) = Σ_{k ∈ Γ(j)} N(k),  SLC(i) = Σ_{j ∈ Γ(i)} Q(j). (8)

In equation (8), N(k) represents the number of nodes that can be reached within two steps from node k (its nearest and next-nearest neighbors), Γ(i) denotes the set of first-order neighbor nodes of node i, and Γ(j) represents the set of first-order neighbor nodes of node j.
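A minimal sketch of the three-level sum in equation (8), on a hypothetical toy path (not the authors' code):

```python
def slc(G, v):
    """Semi-local centrality sketch:
    N(w)   = number of nodes reachable from w within two steps,
    Q(u)   = sum of N(w) over u's neighbours,
    SLC(v) = sum of Q(u) over v's neighbours."""
    def N(w):
        reach = set(G[w])
        for u in G[w]:
            reach |= G[u]
        reach.discard(w)            # w itself does not count
        return len(reach)

    def Q(u):
        return sum(N(w) for w in G[u])

    return sum(Q(u) for u in G[v])

# Hypothetical toy graph: path 1-2-3-4-5; the middle node scores highest.
G = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
print(slc(G, 3))  # Q(2) + Q(4) = 6 + 6 = 12
```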

Spon centrality
Consider a simple undirected and unweighted graph G(V, E), where V is the set of nodes in the graph and E is the set of edges. For a node v ∈ V, let N_v1 be the set of its first-order neighbors, and let N_v2 be the set of nodes reached from v after two hops. The neighbor proportion PN_v of node v is defined as follows:

PN_v = |N_v2| / |N_v1|. (9)

Based on this, we can define the sum of neighbor proportions SPN_v for node v as follows:

SPN_v = Σ_{u ∈ N_v1} PN_u. (10)

From the above definitions, the computation of Spon involves two steps. The first step is to calculate the PN values for all nodes, and the second step is to sum the PN values of the first-order neighbors of each node to obtain its SPN value. We consider a higher SPN value to indicate a higher centrality of the node. As mentioned earlier, a higher PN value implies that a node can reach a larger number of 'second-order' neighbors through a small number of first-order neighbors, indicating its potential influence. Since similar nodes tend to cluster together, a node whose neighbors include many such nodes is itself more likely to be a high-influence node. Moreover, in our definition, N_v2 contains the nodes reached by starting from node v and making two hops, which means that node v itself is included in addition to its second-order neighbors; this provides important information. Additionally, if a node belongs to both the N_v1 and N_v2 sets, its contribution is counted twice, because a node can directly influence the centrality of another node as a first-order neighbor and also indirectly influence it as a second-order neighbor. We therefore need to consider and integrate both pieces of information. Figure 1 illustrates an example of the Spon procedure. Taking Node 3 as an example, Node 1 and Node 5 are the immediate neighbors of Node 3, forming the set N_31 = {1, 5}. Starting from Node 3, the nodes reachable in two steps are Node 2, Node 3, and Node 4, resulting in the set N_32 = {2, 3, 4}. Substituting these values into equation (9), we obtain PN_3 = 3/2 = 1.5. Similarly, we get PN_1 = 2 and PN_5 = 2. Substituting these values into equation (10), we get SPN_3 = PN_1 + PN_5 = 4.0. The calculation process for the other nodes follows the same procedure.
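The two-step procedure behind equations (9) and (10) can be sketched as follows (a minimal illustration on a hypothetical toy graph, not the authors' implementation); the two-hop set is taken as the endpoints of all length-two walks, so each node re-enters its own set through any neighbour, matching the worked example above.

```python
def spon(G):
    """Spon scores in two passes:
    PN_v  = |N_v2| / |N_v1|, with N_v1 the neighbours of v and N_v2 the
            endpoints of all two-hop walks from v (v itself included);
    SPN_v = sum of PN_u over v's first-order neighbours u."""
    pn = {}
    for v in G:
        two_hop = set()
        for u in G[v]:
            two_hop |= G[u]          # v re-enters via any of its neighbours
        pn[v] = len(two_hop) / len(G[v])
    return {v: sum(pn[u] for u in G[v]) for v in G}

# Hypothetical toy graph: path 1-2-3-4; the interior nodes come out on top.
G = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(spon(G))  # {1: 1.0, 2: 3.0, 3: 3.0, 4: 1.0}
```

Only the PN values of a node's direct neighbours enter its score, which is why Spon never needs information beyond the three-hop neighbourhood.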

Experiments
This section provides an overview of the experiments and methodologies employed to evaluate the performance of Spon. Section 4.1 introduces the empirical unweighted networks dataset used in this study. Section 4.2 presents the robustness and monotonicity metrics, which are standard approaches for comparing the performance of centrality measures. Section 4.3 showcases the experimental results of Spon Centrality and provides a discussion of these results.

Robustness metric
The connectivity test is a prominent method for evaluating the effectiveness of algorithmic ranking in terms of node and edge importance, and one of the most classical approaches for assessing accuracy. The connectivity test involves sequentially removing nodes from the ranking set and recording the changes in the size of the largest connected component in the remaining network. If the size of the largest connected component in the residual network decreases rapidly, the network structure collapses quickly during the node removal process, demonstrating the algorithm's ability to efficiently identify important nodes.

Table 1. Basic topological features of the network data. The left column is the name of the network; N and E denote the number of nodes and edges of the network, respectively; ⟨K⟩ and K_max denote the average and maximum degree of the network, respectively; and C and r denote the clustering coefficient and assortativity coefficient of the network.
The robustness metric R is commonly used to measure network connectivity [55]. R is defined as follows:

R = (1/|N|) Σ_{n=1}^{|N|} |N_n| / |N|.

Here, N represents the set of nodes in the network, and N_n denotes the set of nodes in the largest connected component of the residual network after the nth node removal. The ratio |N_n| / |N| is the robustness measure r. In fact, the robustness metric R quantifies the area enclosed by the curve of r-values observed during the node removal process and the coordinate axes. A smaller area indicates better performance of the node centrality algorithm, while a larger area suggests poorer performance; thus, a smaller R value implies a faster collapse. The normalization factor 1/|N| ensures that R values can be compared across networks of different sizes.
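The procedure can be sketched as follows (a minimal illustration on a hypothetical toy graph with a given removal order, not the evaluation harness used in the paper):

```python
def giant_fraction(G, n):
    """Size of the largest connected component divided by the original n."""
    seen, best = set(), 0
    for s in G:
        if s in seen:
            continue
        stack, comp = [s], set()
        while stack:
            u = stack.pop()
            if u in comp:
                continue
            comp.add(u)
            stack.extend(G[u] - comp)
        seen |= comp
        best = max(best, len(comp))
    return best / n

def robustness(G, removal_order):
    """R = (1/|N|) * sum over removal steps of the fraction of nodes left
    in the largest connected component of the residual network."""
    G = {u: set(vs) for u, vs in G.items()}  # work on a copy
    n = len(G)
    total = 0.0
    for u in removal_order:
        for v in G[u]:
            G[v].discard(u)
        del G[u]
        total += giant_fraction(G, n)
    return total / n

# Hypothetical toy graph: path 1-2-3-4, removing the interior nodes first.
G = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(robustness(G, [2, 3, 1, 4]))  # (0.5 + 0.25 + 0.25 + 0) / 4 = 0.25
```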

Monotonicity
A good node centrality method should uniquely determine the centrality of each node rather than roughly categorizing nodes. Methods with low monotonicity, such as KS or HI, may assign the same centrality to nodes whose centralities in fact differ subtly. Therefore, this paper employs the monotonicity metric M [56] to measure the monotonicity of the Spon method:

M(L) = (1 − Σ_i s_i (s_i − 1) / (|V| (|V| − 1)))².

Here, |V| represents the number of nodes in the network, L denotes the ranking list generated by the node centrality ranking method, and s_i represents the number of nodes sharing rank i in the ranking list. The monotonicity metric M ranges over [0, 1], where a value closer to 1 indicates stronger monotonicity.
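The metric only needs the multiset of tied group sizes; a minimal sketch:

```python
from collections import Counter

def monotonicity(scores):
    """M = (1 - sum_i s_i(s_i - 1) / (|V|(|V| - 1)))^2, where s_i is the
    number of nodes sharing rank i; M = 1 means every node receives a
    distinct centrality value."""
    n = len(scores)
    tie_mass = sum(s * (s - 1) for s in Counter(scores).values())
    return (1 - tie_mass / (n * (n - 1))) ** 2

print(monotonicity([0.1, 0.4, 0.3, 0.2]))  # 1.0  (all values distinct)
print(monotonicity([1, 1, 2]))             # (1 - 2/6)^2 = 4/9 ≈ 0.444
```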

Correlation analysis
Before delving into the ability of each method to identify important nodes, we first use correlation metrics to examine whether Spon utilizes more information to determine the centrality of nodes. In fact, correlation metrics are valuable indicators that are often overlooked but hold significant potential: a lower correlation between the proposed method and a baseline method may indicate that the proposed method can offer insights beyond that baseline [40]. In this study, we employ Kendall's Tau (τ) metric to measure the correlation between the ranking lists of important nodes obtained from different methods. Figure 2 displays the correlation matrices between different centrality metrics in four networks: Dolphins, Jazz, Minnesota, and HepPh. The correlation matrices for all networks are presented in appendix A. From figure 2 and appendix A, it is evident that in the majority of networks the τ values between Spon and CI exceed 0.7, and in some networks even surpass 0.8. This indicates a strong similarity in the node centrality rankings of Spon and CI. Furthermore, in most networks the τ values between Spon and HI also exceed 0.7, suggesting a substantial similarity between the rankings of Spon and HI. In certain networks, Spon exhibits high correlation with KS, and there is also significant correlation among KS, HI, and CI. Therefore, the node centrality ranking generated by Spon does not differ significantly from those of CI, HI, and KS, indicating that Spon does not draw on substantially different information than these three methods.
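For reference, Kendall's τ between two score sequences can be computed directly from concordant and discordant pairs. The sketch below implements the simple tau-a variant; the paper's τ may use a tie-corrected form such as tau-b.

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a between two score sequences over the same nodes:
    (concordant pairs - discordant pairs) / total pairs."""
    pairs = list(combinations(range(len(x)), 2))
    s = 0
    for i, j in pairs:
        d = (x[i] - x[j]) * (y[i] - y[j])
        s += 1 if d > 0 else (-1 if d < 0 else 0)
    return s / len(pairs)

print(kendall_tau([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0  (identical order)
print(kendall_tau([1, 2, 3, 4], [40, 30, 20, 10]))  # -1.0 (reversed order)
```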
However, in the majority of networks, SC, SLC, CC, and EC exhibit extremely high correlation among themselves, while showing lower correlation with Spon. This suggests that the ranking of node centrality produced by Spon differs significantly from that generated by the aforementioned four metrics. On the other hand, the ranking results produced by SC, SLC, CC, and EC are more similar to each other. Consequently, compared to SC, SLC, CC, and EC, Spon employs additional information to evaluate node centrality.
Among the aforementioned metrics, we are particularly interested in the differences between Spon and Social Capital. Social Capital determines node centrality from the sum of a node's degree and the degrees of its first-order neighbors, while the definition of Spon was given earlier in the text. Although there is conceptual overlap between the two, Spon and Social Capital still differ: (1) both SC and Spon involve the first-order neighbors of a node in their calculations, but Social Capital uses the degrees of the first-order neighbors, whereas Spon does not consider those degrees directly. (2) Social Capital calculates the sum of the degrees of a node and its neighboring nodes, whereas Spon sums, over a node's neighbors, the ratio of each neighbor's two-hop reach to its number of first-order neighbors. (3) Social Capital considers information up to the second-order neighbors at most, while Spon can consider information up to the third-order neighbors.

Connectivity test
We evaluate the accuracy of these methods by observing the changes in robustness measure r during the node removal process and the magnitude of the robustness metric R. Figure 3 demonstrates the accuracy performance of Spon in the Dolphins, Jazz, Minnesota, and HepPh networks (the accuracy performance of each network is shown in appendix B). Figure 3, along with appendix B, illustrates the collapse process in the 16 empirical networks through the removal of nodes using Spon and other important node identification methods. In the experiment, we score the nodes using Spon and other methods, rank the nodes based on the scores, and then sequentially remove nodes according to their rankings. From the graph, it can be observed that the robustness curve of Spon is positioned at the bottom in 15 of the 16 networks, and the area enclosed by the curve and the axes is the smallest. This indicates that Spon only requires the removal of a relatively small number of nodes to cause a widespread collapse of the network. Overall, compared to other methods, Spon induces network collapse more rapidly, indicating its ability to identify important nodes in the network. Table 2 presents a comparison of the R values for different methods on the 16 datasets. The results show that Spon has the lowest R value in 15 datasets and ranks second in only one dataset. In general, Spon performs the best in identifying important nodes.

Monotonicity
We evaluated the monotonicity of Spon and other methods using rank distribution plots and the monotonicity measure M (rank distribution plots for each network are shown in appendix C). The x-axis of a rank distribution plot represents the node ranks, while the y-axis represents the number of nodes sharing the same rank. Therefore, if a flat line can be drawn along the bottom of the rank distribution plot, the method has better monotonicity. Figure 4 presents the rank distribution plots for different methods on the Dolphins, Jazz, Minnesota, and HepPh networks, while table 3 provides the results for the monotonicity measure. Based on table 3, figure 4, and appendix C, it can be observed that KS and HI exhibit the poorest monotonicity, while Spon, EC, CC, SC, and SLC demonstrate the highest monotonicity. Among these methods, EC achieves the highest average monotonicity, approaching 1. SLC follows closely behind, and Spon ranks third, with only a marginal difference of 0.003 in average monotonicity compared to EC and a difference of 0.0008 compared to SLC. Both EC and SLC rely on the centrality of neighboring nodes to determine the centrality of the target node, and EC involves multiple iterations in its computation, which explains their superior monotonicity performance.

CPU time
The time complexity of Spon is O(n²), which is relatively low. However, comparing time complexity alone does not clearly demonstrate the efficiency differences of different methods on large-scale networks. Therefore, in this study, we implemented these methods using the same programming language and coding style on the same computer to compare the CPU time consumed by each method at runtime. The computer used an AMD R7-5800 CPU, a 64-bit processor with 8 cores and a clock speed of 3.40 GHz. We used Python as the programming language, which is commonly used in scientific research. The CPU time results are presented in table 4. Appendix D provides a comparison graph of CPU time, allowing for a more intuitive understanding of the efficiency differences among the methods. From table 4 and appendix D, it can be observed that in the majority of networks, the SC, KS, Spon, and SLC algorithms exhibit the highest efficiency, especially in large-scale networks. Among them, SC and KS demonstrate the highest algorithmic efficiency, requiring the least CPU runtime, followed by Spon and SLC, although the gap between Spon and the former two is not significant. In fact, SC only considers the degrees of a node and its first-order neighbors, and its computation involves simple addition. Spon, by contrast, not only considers first-order and 'second-order' neighborhood information but also calculates the SPN value of each node. Additionally, KS simply removes nodes with degree less than or equal to k iteratively, which is evidently more efficient than Spon. However, Spon significantly outperforms SC in accuracy and surpasses KS in both monotonicity and accuracy, while its efficiency does not lag far behind SC and KS. Therefore, Spon combines accuracy and algorithmic efficiency, making it more advantageous overall.

Sensitivity of Spon
In the previous experiments, we discussed the performance of Spon; in this section, we examine its sensitivity. Although Spon achieves the highest accuracy in the majority of networks, it must be acknowledged that Spon performs poorly on certain specific network structures. Consider the network depicted in figure 5. From the description in section 3.2, Spon determines node centrality based on neighborhood information, which can lead to inaccuracies in certain special network structures. Taking figure 5(a) as an example, according to the definition, PN_i = 13/4, PN_j = 1, and PN_k = 4. Consequently, we can calculate SPN_i = 1 × 4 = 4, SPN_j = (13/4) + 4 × 3 = 61/4, and SPN_k = 1 × 1 = 1. In this case, node j is identified as the most important node, which is counterintuitive, as node i should be the most important. Similar issues exist in figure 5(b), the Jazz network.
The reason behind this phenomenon is that node j's neighborhood includes a highly influential node i, while node i's neighborhood lacks such a node. As a result, node j, which is connected to the important node i, is recognized as important, while node i, connected to non-important nodes, is recognized as non-important. One possible improvement is to include the PN_i value of node i itself when calculating SPN_i. This would address the issue but would fundamentally change the Spon method, as its core principle is to determine node centrality based on neighborhood information rather than the node's own information.
Another possible improvement approach is to consider both neighborhood information and node information simultaneously. For example, node centrality could be determined based on PN when SPN > PN, and based on SPN when SPN ⩽ PN. This approach would also address the issue but would make Spon more complex. Therefore, resolving this limitation is one of our key focuses for future work.
Apart from the network structure, changes in network edges can also have an impact on the accuracy of Spon. To investigate the influence of edge variations on the accuracy of Spon, we randomly removed 10%, 20%, 30%, 40%, and 50% of the edges in the network, observing the scores of Spon and whether there were any changes in important nodes. The edge removal actions were performed independently for each percentage. Put simply, we used five identical graphs and removed 10%, 20%, etc, of the edges separately, observing the changes in Spon scores, rather than removing 10%, 20%, etc, of the edges based on the previous removal (as this would certainly lead to a decrease in Spon scores).
In all 16 networks, the highest-scoring Spon nodes hardly changed. In other words, removing less than half of the edges hardly affected the highest central nodes in the network. Figure 6 illustrates the score variations of the Top-5 nodes in terms of centrality scores in the Dolphins, Jazz, Minnesota, and HepPh networks. From figure 6 and appendix E, it can be observed that as the number of removed edges increased, the Spon scores showed a gradual decline, but there were also instances of score rebound. The reason for this phenomenon might be that the sets of removed edges in the two experiments were not the same, resulting in score rebounds. However, overall, the degree of score rebound was not significant, and Spon scores still exhibited a declining trend.

Figure 6. The trend of Spon scores on the Dolphins, Jazz, Minnesota, and HepPh networks. The x-axis represents the proportion of removed edges relative to the total number of edges, while the y-axis represents the Spon scores of the Top-5 nodes. 'Top1' indicates the node with the highest Spon score in the network, 'Top2' represents the node with the second-highest Spon score, and so on.

W-Spon centrality
The concept of Spon can also be applied to undirected weighted networks. In this section, we propose a simple method for extending Spon to handle weighted networks, which we refer to as W-Spon. The following is the definition of W-Spon in undirected weighted networks.
Consider a simple undirected weighted graph $G(V, E, W)$, where $V$ is the set of nodes, $E$ the set of edges, and $W$ the set of weights assigned to the edges. Let $w_e$ denote the weight of edge $e$, and let $N_{v1}$ be the first-order neighborhood set of a node $v \in V$. The weighted neighborhood proportion $\mathrm{WPN}_v$ of node $v$ is defined as

$$\mathrm{WPN}_v = \frac{s_v}{\sum_{j \in N_{v1}} s_j},$$

where $s_v$ is the strength of node $v$, defined as the sum of the weights of the edges directly connected to $v$, and the denominator is the sum of the strengths of $v$'s immediate neighbors. Based on these definitions, the weighted sum of the neighborhood proportions for node $v$, denoted $\mathrm{WSPN}_v$, is

$$\mathrm{WSPN}_v = \sum_{j \in N_{v1}} \mathrm{WPN}_j.$$

Compared with Spon, W-Spon changes the definition of the neighborhood proportion: the ratio of neighbor counts is replaced by the ratio of sums of edge weights, yielding the WSPN value of node $v$. Figure 7 provides an example calculation of W-Spon.
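The two steps above can be sketched directly; this is a minimal illustration on a hypothetical toy graph stored as a dict of dicts (`adj[v][j]` is the weight of edge $(v, j)$), not the authors' implementation:

```python
# Minimal sketch of W-Spon on a weighted graph stored as a dict of dicts.
def w_spon(adj):
    # s_v: strength of v = sum of weights of edges incident to v
    strength = {v: sum(nbrs.values()) for v, nbrs in adj.items()}
    # WPN_v = s_v / (sum of strengths of v's first-order neighbors)
    wpn = {}
    for v, nbrs in adj.items():
        denom = sum(strength[j] for j in nbrs)
        wpn[v] = strength[v] / denom if denom else 0.0
    # WSPN_v = sum of WPN_j over v's first-order neighbors
    return {v: sum(wpn[j] for j in nbrs) for v, nbrs in adj.items()}

adj = {
    1: {2: 2.0, 3: 1.0},
    2: {1: 2.0, 3: 1.0},
    3: {1: 1.0, 2: 1.0},
}
scores = w_spon(adj)  # node 3 has the weakest strength but two strong neighbors
```

Here nodes 1 and 2 each have strength 3 and node 3 has strength 2, so node 3 inherits the largest WSPN value from its two strong neighbors.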
Similar to Spon, the time complexity of W-Spon is dominated by two steps: computing the WPN values and summing the WPN values of neighboring nodes. Computing a node's WPN value requires the strengths of the node and of its first-order neighbors, which depends on the number of neighbors, so the first step runs in $O(n^2)$ in the worst case. The second step is the same as in Spon, giving a total complexity of $O(n^2)$ for W-Spon.

Baseline methods for weighted networks
To test the feasibility of the W-Spon method described above, we compared its accuracy, monotonicity, and CPU running time with those of the following centrality methods for weighted networks.

W-Core (W-Core)
W-Core [37, 38] is an extension of the KS decomposition method; its core idea is to use weighted degrees in place of the original node degrees and then apply the KS method recursively to remove nodes. The weighted degree $k'_i$ of node $i$ is defined as

$$k'_i = \left( k_i^{\alpha} \left( \sum_{j}^{k_i} w_{ij} \right)^{\beta} \right)^{\frac{1}{\alpha + \beta}}. \tag{15}$$

In equation (15), $k_i$ represents the degree of node $i$, and $\sum_{j}^{k_i} w_{ij}$ is the sum of the weights of the edges to all first-order neighbors of node $i$, which corresponds to the node's strength. $\alpha$ and $\beta$ are tuning parameters that regulate the contributions of degree and weight to node centrality. In this study, we adopt the parameter selection of Garas, i.e. $\alpha = \beta$.
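With $\alpha = \beta$, equation (15) reduces to a geometric mean of degree and strength. A minimal sketch, assuming the same dict-of-dicts toy representation as above (not the reference implementation of [37, 38]):

```python
# Sketch of the W-Core weighted degree of equation (15);
# with alpha = beta = 1 it is sqrt(k_i * s_i).
def w_core_degree(adj, alpha=1.0, beta=1.0):
    out = {}
    for i, nbrs in adj.items():
        k_i = len(nbrs)            # plain degree of node i
        s_i = sum(nbrs.values())   # strength: sum of incident edge weights
        out[i] = (k_i**alpha * s_i**beta) ** (1.0 / (alpha + beta))
    return out

adj = {1: {2: 4.0}, 2: {1: 4.0, 3: 1.0}, 3: {2: 1.0}}  # hypothetical toy graph
kprime = w_core_degree(adj)  # node 2: degree 2, strength 5
```

The KS decomposition is then run as usual, with `kprime` replacing the plain degree at each pruning step.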

M-Core (M-Core)
M-Core [38] is also an extension of the KS decomposition method, and its core idea is similar to that of W-Core: it replaces node degrees with a specific weighted degree and employs the KS method to recursively remove nodes. In the weighted degree of M-Core, given in equation (16), $\omega_{ij}$ denotes the edge weight between node $i$ and node $j$, and $N(i)$ denotes the set of neighboring nodes of node $i$; the threshold $D_{\text{m-core}}$, given in equation (17), is defined in terms of $\omega^{*}$, the average weight of all edges in the weighted graph $G$.

I-Core (I-Core)
I-Core [38] belongs to the same family as M-Core. Its weighted degree is given in equation (18), and the corresponding threshold $D_{\text{i-core}}$ is defined in equation (19) in terms of $\beta_{\text{infec}}$, the infection rate derived from the SIR propagation model.

Betweenness centrality (BC)
BC [57] is also a highly classical centrality algorithm; in this paper, we choose its weighted version for the performance comparison. Weighted BC is defined as

$$BC(i) = \sum_{s \neq i \neq t} \frac{n_{st}(i)}{g_{st}}, \tag{20}$$

where $n_{st}(i)$ represents the number of shortest paths between the node pair $(s, t)$ that pass through node $i$, and $g_{st}$ represents the total number of shortest paths connecting $(s, t)$; in the weighted case, the length of a path is the sum of its edge weights.
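Equation (20) can be illustrated by brute force on a tiny toy graph; this enumerates all simple paths per pair, which is only viable for toy examples (practical implementations use Brandes' algorithm):

```python
from itertools import permutations

# Brute-force illustration of equation (20): for each pair (s, t), keep the
# minimum-weight simple paths and count the fraction passing through node i.
def all_simple_paths(adj, s, t, path=None):
    path = (path or []) + [s]
    if s == t:
        yield path
        return
    for j in adj[s]:
        if j not in path:
            yield from all_simple_paths(adj, j, t, path)

def weighted_bc(adj, i):
    score = 0.0
    for s, t in permutations([v for v in adj if v != i], 2):
        paths = list(all_simple_paths(adj, s, t))
        weight = lambda p: sum(adj[a][b] for a, b in zip(p, p[1:]))
        best = min(map(weight, paths))
        shortest = [p for p in paths if weight(p) == best]
        g_st = len(shortest)                  # g_st
        n_st = sum(i in p for p in shortest)  # n_st(i)
        score += n_st / g_st
    return score / 2  # undirected: each unordered pair is counted twice

# path graph 1-2-3: node 2 lies on the only shortest path between 1 and 3
adj = {1: {2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
bc2 = weighted_bc(adj, 2)
```

On this path graph, node 2 intermediates the single pair (1, 3), so its score is 1, while the endpoints score 0.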
In addition to these, the remaining two comparison methods are weighted EC [58] and weighted HI [59].

Connectivity test
Appendix F shows the performance of W-Spon compared with the other methods on 10 empirical weighted networks. It can be observed that in 8 of the 10 networks, the robustness curve of W-Spon lies at the bottom and encloses the smallest area with the axes; W-Spon therefore causes the network to collapse more quickly. Table 6 compares the robustness R values of the different methods on the 10 datasets. The results indicate that W-Spon attains the smallest R value on 8 datasets and ranks second on the remaining 2. Overall, W-Spon identifies important nodes more accurately.
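The robustness measure R used here (in the sense of Schneider et al.) removes nodes in descending centrality order and averages the relative size of the giant component after each removal; smaller R means faster collapse. A minimal sketch on a hypothetical toy graph (dict of neighbor sets):

```python
# Robustness R: average giant-component fraction over all removal steps.
def giant_size(adj, removed):
    alive = set(adj) - removed
    best, seen = 0, set()
    for s in alive:
        if s in seen:
            continue
        comp, stack = set(), [s]          # BFS/DFS over surviving nodes
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            stack.extend(j for j in adj[v] if j in alive and j not in comp)
        seen |= comp
        best = max(best, len(comp))
    return best

def robustness(adj, ranking):
    n = len(adj)
    removed, total = set(), 0.0
    for v in ranking:                      # highest-centrality node first
        removed.add(v)
        total += giant_size(adj, removed) / n
    return total / n

# star graph: removing the hub first collapses the network immediately
adj = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
R = robustness(adj, [0, 1, 2, 3])
```

A ranking that targets the hub first yields a small R, which is exactly the behavior the connectivity test rewards.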

Monotonicity
Similarly, we use the monotonicity metric to measure the ability of W-Spon and the other methods to capture subtle differences in node rankings. Appendix G presents the rank-distribution plots for the different methods, and table 7 provides their monotonicity values. The monotonicity of W-Spon is weaker than that of I-Core and EC; however, the gap between W-Spon and these two methods is minimal, and W-Spon is more accurate than both. Therefore, W-Spon still maintains an overall advantage.
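The monotonicity metric commonly used in this literature (due to Bae and Kim) penalizes tied scores: M = 1 when all scores are distinct and M = 0 when every node receives the same score. A minimal sketch, assuming that definition:

```python
from collections import Counter

# Monotonicity M = (1 - sum_r N_r(N_r - 1) / (N(N - 1)))^2,
# where N_r is the number of nodes sharing rank (score) r.
def monotonicity(scores):
    n = len(scores)
    ties = Counter(scores.values())
    same = sum(n_r * (n_r - 1) for n_r in ties.values())
    return (1.0 - same / (n * (n - 1))) ** 2

m_distinct = monotonicity({1: 0.1, 2: 0.2, 3: 0.3, 4: 0.4})  # all distinct
m_tied = monotonicity({1: 0.5, 2: 0.5, 3: 0.5, 4: 0.5})      # all tied
```

A method with higher M distinguishes more node pairs and thus produces a finer-grained ranking.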

CPU time
Following the same procedure as for Spon, we implemented these methods on the same computer, using the same programming language and coding style, and compared the CPU time consumed by each method. The CPU runtime results are presented in table 8. Running W-Spon requires more CPU time than M-Core and I-Core, while on some networks it requires less time than HI, EC, and W-Core. On large-scale networks, BC takes by far the longest time, differing significantly from the other methods. Since W-Spon traverses the edge set during implementation, it has no advantage in this respect; however, it achieves higher accuracy and monotonicity than the aforementioned methods. Considering the overall performance, it is therefore feasible to extend Spon to weighted networks following the approach of W-Spon.
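The timing protocol can be sketched as follows; `process_time` measures CPU time rather than wall-clock time, so methods are compared on the same footing regardless of system load (a generic sketch, not the authors' harness):

```python
import time

# Measure the CPU time consumed by one centrality method on one graph.
def cpu_time(fn, *args, **kwargs):
    t0 = time.process_time()
    fn(*args, **kwargs)
    return time.process_time() - t0

elapsed = cpu_time(sorted, range(10**5))  # substitute a centrality method here
```

Averaging several runs per method and network further reduces measurement noise.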

Conclusion
This paper proposes a node-centrality identification method called Spon, based on neighborhood information. The method first calculates the neighbor proportion (PN) for each node and then aggregates the PN values of its neighboring nodes to determine node centrality; as a result, Spon exhibits very low time complexity. We evaluated Spon against eight centrality metrics, including CycleRatio, CI, and Social Capital, and a series of experiments demonstrates that Spon performs well in terms of accuracy, monotonicity, and algorithmic efficiency. Additionally, this paper presents W-Spon, a method that extends Spon to weighted networks, and compares it with six other methods including W-Core, M-Core, and I-Core; W-Spon also shows advantages over these methods.

Despite these advantages, it must be acknowledged that Spon still has certain limitations. First, it may perform poorly on certain specific network structures. Second, Spon still relies on the network topology for its computations, which may be restrictive. Overcoming these limitations and designing another method that combines algorithmic efficiency and accuracy remains one of our main future research directions. In particular, both Spon and W-Spon are designed for undirected graphs; extending the Spon method to directed graphs or even temporal networks is another direction for future work. We will continue to pursue new methods that are innovative, accurate, and efficient.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).