The recoverability of network controllability with respect to node additions

Network controllability is a critical attribute of dynamic networked systems. Investigating methods to restore network controllability after network degradation is crucial for enhancing system resilience. In this study, we develop an analytical method based on degree distributions to estimate the minimum fraction of required driver nodes for network controllability under random node additions after the random removal of a subset of nodes. The outcomes of our method closely align with numerical simulation results for both synthetic and real-world networks. Additionally, we compare the efficacy of various node recovery strategies across directed Erdős–Rényi (ER) networks, swarm signaling networks (SSNs), and directed Barabási–Albert (BA) networks. Our findings indicate that the most efficient recovery strategy for directed ER networks and SSNs is the greedy strategy, which considers node betweenness centrality. Similarly, for directed BA networks, the greedy strategy focusing on node degree centrality emerges as the most efficient. These strategies outperform recovery approaches based on degree centrality or betweenness centrality, as well as the strategy involving random node additions.


Introduction
Network controllability has been extensively investigated [1], particularly due to its applicability to various complex systems that can be represented as networks. These include domains such as power grids [2], transportation systems [3], and telecommunication systems [4]. Controllability represents an important characteristic of such systems, affording the ability to achieve varied control objectives. For example, manipulating approximately 17% of the neurons in the C. elegans worm can elicit coordinated body responses, while controlling 5% of a swarm of honeybees can guide the swarm to new destinations [5]. However, controllability can falter in the face of malicious attacks or natural catastrophes, resulting from the failure of system components [6]. To improve the resilience of the system against attacks [7], bolstering its robustness becomes paramount. Moreover, there is a pressing need to explore strategies for efficiently restoring failed components to ensure controllability within the system [8].
Network controllability discussed in this research pertains to the concept of structural controllability within directed networks, which do not contain self-loops. In the domain of control theory, a system is considered controllable if it can transition from an initial state to any desired state in a finite time by applying external inputs [9]. Lin introduced the notion of structural controllability [10], where a system exhibiting structural controllability maintains a high likelihood of controllability even after modifying the weights of interconnections. Exploring the intricate interplay between network topology and controllability, Liu et al [11] devised the framework of structural controllability for directed networked systems. This framework focuses on injecting specific external input nodes to achieve full system controllability. Importantly, it is worth noting that network controllability differs from the widely recognized concept of 'pinning controllability' [5]. The latter explores methods for driving the system to specific states by manipulating specific nodes. For example, network synchronization explores whether all nodes can exhibit identical dynamic trajectories [12], and investigations of network consensus problems [13] aim to determine strategies to guide all nodes towards the same state.
Errors or attacks within a system can cause a degradation in network performance [14]. Effective and efficient recovery of networks after attacks has gained considerable attention [15]. For example, Shang has explored local recovery strategies from a network percolation perspective [16], as well as strategies for restoring consensus in nonlinear multiagent systems [8]. Moreover, He et al [17] have defined network recoverability as a network's capacity to revert to a desired state after facing disruptions. In the context of network recoverability with a focus on network controllability, Chen et al [18] explored efficient recovery strategies after random link removals. Their study revealed that the greedy recovery strategy outperforms degree-based and eigenvector-based recovery strategies. Additionally, they introduced an analytical method based on degree distributions to predict network controllability during recovery. However, their investigation did not include effective strategies for recovering nodes after node failures, a scenario frequently observed in real-life networks. Addressing this gap, our study aims to predict network controllability under random node additions and explore efficient strategies for node recovery in network controllability.
Since network recovery is the reverse process of attacking or disrupting networks, we can derive various recovery strategies by drawing insights from the attack process. Researchers have explored efficient strategies to undermine network controllability and methods to forecast network controllability under attack scenarios. Targeted attacks are generally more detrimental than random attacks [6]. Pu et al [19] demonstrated that node removals based on node degrees are more harmful than random node attacks in directed Erdős–Rényi (ER) and scale-free (SF) networks. Directed ER networks are synthetic networks generated by randomly placing directed links, while directed SF networks are generated to ensure that the degree distributions follow power-law distributions. Wang et al [20] found that intentionally attacking bridge links, whose removal can disconnect the network, disrupts network controllability more effectively than link removals based on node degrees and distances in directed ER and SF networks. Critical nodes and links are identified based on their propensity to increase the number of driver nodes required for network controllability after removals, where driver nodes are defined as nodes where external inputs are injected [11]. Building on this, Lou et al [21] developed a hierarchical attack framework that incorporates critical nodes or links. In this framework, nodes or links are removed based on categorical priorities, and within the same category, nodes or links with higher centrality values are removed first. Their findings showed that this attack framework is more destructive than strategies that solely leverage centrality features like node degrees or betweenness when targeting nodes or links. Given that attack strategies considering degree and betweenness are extensively investigated, we aim to explore the effectiveness of different degree-based and betweenness-based recovery strategies in terms of network controllability after node removals.
Several techniques have been employed to predict the minimum number of driver nodes required under attack scenarios, including regression models, analytical methods using degree distributions, and machine learning approaches. Sun et al [6] introduced linear regressions for removal fractions less than $l_c$ and quadratic regressions for fractions greater than $l_c$ (where $l_c$ is the fraction of critical links) to approximate the fraction of driver nodes for random and targeted link removals based on critical links. Liu et al [11] proposed an analytical method based on degree distributions to estimate the minimum fraction of driver nodes. Chen et al [18] presented an analytical method using degree distributions for random link removals. Dhiman et al [22] utilized an artificial neural network to predict the minimum number of driver nodes under link-targeted removals, outperforming analytical methods based on critical links. Lou et al [23] predicted controllability robustness under targeted attacks by utilizing convolutional neural networks, which process the adjacency matrix as a grayscale image. The performance of regression models is worse than that of the analytical methods using degree distributions and of machine learning approaches. While machine learning methods require training data, analytical methods offer time and computational cost savings. This encourages us to develop an analytical method based on degree distributions to predict network controllability during the random node recovery process.
In this study, we focus on the recovery process of network controllability after random node removals. We propose an analytical method based on degree distributions to approximate the number of driver nodes required during random node additions. To validate our analytical approach, we apply it to both synthetic and real-world networks. Additionally, we investigate six other node recovery strategies on synthetic networks: degree-based recovery strategy, betweenness-based recovery strategy, updated degree-based recovery strategy, updated betweenness recovery strategy, greedy degree-based recovery strategy, and greedy betweenness-based recovery strategy. To measure the efficiency of a recovery strategy for network controllability, we utilize two modified recoverability indicators of the recovery process [21,24].
The remainder of the paper is organized as follows. Section 2 provides a detailed description of the networks utilized in this study. Section 3 outlines the attack scenario and the recovery strategies employed, introduces network controllability, and presents the two recoverability indicators used in this study. Section 4 presents the analytical approximation for network controllability under random node additions and, based on the recoverability indicators, compares and evaluates the efficiency of different recovery strategies on synthetic networks. The last section of the article is dedicated to conclusions and discussions.

Network data
In this study, we evaluate the effectiveness and efficiency of our proposed methods by applying them to synthetic networks and real-world communication networks.

Synthetic networks
The synthetic networks under investigation comprise directed ER networks, swarm signaling networks (SSNs), directed Barabási–Albert (BA) networks and directed SF networks.

(i) Directed ER networks
Directed ER networks with N nodes are constructed by randomly placing directed links between any two nodes with a given probability $p_{ER}$. Both the in-degree and out-degree distributions of the generated directed ER network follow the Poisson distribution. In this research, two directed ER networks have been generated with N = 500, $p_{ER}$ = 0.007 and N = 1000, $p_{ER}$ = 0.004, respectively, where the average total degree is 7 and 8, correspondingly.

(ii) SSNs
In this study, we employ the topology of SSNs proposed and developed by [25,26]. The SSN exhibits a regular out-degree distribution, while its in-degree distribution follows a Poisson distribution. To generate SSNs, we specify two parameters: the number of nodes N and the out-degree value k. Each node randomly creates k outgoing links to other nodes. Specifically, we generated SSNs with N = 500, k = 2 and N = 500, k = 4. The total average degree is 4 and 8, respectively.

(iii) Directed BA networks
To generate a directed BA network, we first generate an undirected BA network [27] by giving two parameters: the number of nodes N and the number of links m with which each new node preferentially attaches to existing nodes with high degrees. The initial network is a star network with m + 1 nodes. Once the undirected BA network is established, we proceed to randomize the orientations of links, thereby transforming the network into a directed structure. We generated directed BA graphs with N = 500, m = 2 and N = 500, m = 4, respectively, where the total average degree is 4 and 8, respectively.

(iv) Directed SF networks
SF networks have power-law degree distributions, which are characterized by a specific power-law exponent γ and the minimum value of the degree α. To generate SF networks, we first generate a power-law degree sequence using the Python package powerlaw [28]. Next, we use the configuration model [29] to generate a digraph and remove self-loop links. To ensure that the generated network conforms to the power-law distribution, we use the same Python package to fit the degree distributions. We only keep generated networks for which the difference between the prescribed exponent and the average fitted power-law exponent of the in-degree and out-degree distributions is smaller than 0.01. In this study, we choose two SF networks with 10 000 nodes, one with γ = 2.3, α = 3 and the other one with γ = 3, α = 3. The average total degrees are around 22 and 10.3, respectively.
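The generation procedures for the ER, SSN and BA topologies described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the code used in the study: the function names, the edge-set representation and the choice of node 0 as the hub of the initial star are assumptions made for this sketch, and the SF networks (which rely on the powerlaw package and the configuration model) are not covered.

```python
import random

def directed_er(n, p, rng):
    """Directed ER graph: each ordered pair (u, v) with u != v is linked with probability p."""
    return {(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < p}

def swarm_signaling_network(n, k, rng):
    """SSN sketch: every node creates k outgoing links to k distinct other nodes."""
    edges = set()
    for u in range(n):
        others = [v for v in range(n) if v != u]
        for v in rng.sample(others, k):
            edges.add((u, v))
    return edges

def directed_ba(n, m, rng):
    """Undirected BA growth from a star with m + 1 nodes, followed by random orientation."""
    undirected = [(0, v) for v in range(1, m + 1)]  # star seed (hub = node 0, an assumption)
    # Each node appears in `stubs` once per incident link, so sampling from
    # `stubs` selects an existing node with probability proportional to its degree.
    stubs = [x for edge in undirected for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(stubs))
        for t in targets:
            undirected.append((new, t))
            stubs.extend((new, t))
    # Randomize the orientation of every link.
    return {(u, v) if rng.random() < 0.5 else (v, u) for u, v in undirected}
```

With N = 500 and m = 2 this BA sketch yields 2 + 497 × 2 = 996 links, that is, an average total degree of about 4, in line with the value quoted above.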

Real-world networks
For the real-world networks, we choose 202 communication networks from the Internet Topology Zoo data set [30], whose number of nodes ranges from 11 to 754. To change undirected communication networks into directed networks, we assign the direction of each link from the source node to the target node, based on the node attribute (source node or target node) [6]. The properties of the 202 communication networks, in terms of number of nodes, number of links and average degree, are depicted in figure 1. Apart from the small and medium-sized communication networks, we incorporate an additional seven larger directed networks obtained from the network data repository [31] and the SNAP dataset collection [32]. These selected networks originate from diverse domains, such as the world wide web (WebSpam [33] and Indochina [34]), a Wikipedia adminship election dataset (Wiki Vote [35]), a retweet network dataset (Qatif [36]), an E-mail network dataset (Email Eu core [37]), and internet peer-to-peer network datasets (p2p Gnutella25 [38] and p2p Gnutella08 [39]). Essential details such as the number of nodes (N) and links (L) and the average degree ($d_{av}$) of these seven larger networks are presented in table 1.

Attack and recovery scenarios
In this study, the network attack process is executed iteratively. At each time step, a node is uniformly and randomly chosen and subsequently removed. Concurrently, the links connected to other nodes are eliminated when the selected node is removed. We stop removing nodes when 15% of the nodes are removed from the network.
In the recovery phase, we employ seven distinct recovery strategies within our investigation: random recovery strategy, degree-based recovery strategy, betweenness-based strategy, updated degree recovery strategy, updated betweenness recovery strategy, greedy-degree recovery strategy, and greedy-betweenness recovery strategy. When implementing these strategies, we focus on restoring the nodes. At each step, a single node, selected according to the recovery strategy, is recovered along with its previously removed links that connect to nodes still present in the attacked graph. We persist in adding back the removed nodes until the original network is fully restored.
The random recovery strategy involves selecting a node uniformly and randomly from the set of removed nodes at each step. This chosen node is then added to the attacked network. On the other hand, the degree-based recovery strategy relies on the degree information derived from the initial graph (i.e. the graph prior to the attack). The procedure entails ranking the removed nodes based on their degree values in the original network. Throughout the recovery phase, these nodes are gradually reintroduced to the network in accordance with their degree ranks and their original connections. Similarly, the betweenness recovery strategy is rooted in betweenness centrality in the original graph. Nodes that have been removed are ranked according to their betweenness values as calculated from the original network. Nodes with higher betweenness centrality rankings are afforded higher priority during the recovery process, and they are added into the network earlier, including their original connections with other existing nodes in the attacked graph.
The updated degree recovery strategy involves selecting a removed node at each time step that, upon reintegration into the attacked network, will possess the highest degree compared to other removed nodes undergoing the same process. In cases where multiple nodes would have the same highest degree after reintegration, their degrees in the original network are compared. The node with the highest original degree is prioritized for the addition. Should the degrees in the original network be equal, a random selection between the nodes is made. The updated betweenness recovery strategy follows a comparable approach. The key distinction lies in the use of betweenness values instead of degree values at each step for selecting the node to be reintroduced into the network.
The greedy-degree recovery strategy operates by selecting, in each step, the removed node whose reintegration reduces the minimum number of driver nodes the most. If multiple nodes offer the same potential reduction in the minimum number of driver nodes, the original degrees of the removed nodes are compared. The node with the higher initial degree is given priority for reintegration. If removed nodes yield an equal reduction in the minimum number of driver nodes and have identical initial degrees, a random selection determines which node is added back. The greedy-betweenness recovery strategy follows a similar approach. However, instead of relying on initial degrees as a determining factor for reintegration, the initial betweenness values of the removed nodes are used.
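To make the difference between the static and the updated rankings concrete, the two degree-based strategies can be sketched as follows. This is an illustrative sketch only: the function names and the edge-set representation are assumptions, degree is taken as the total (in plus out) degree, and the betweenness-based and greedy variants are not shown.

```python
import random

def total_degree(node, edges, present):
    """Number of links between `node` and the nodes contained in `present`."""
    return sum(1 for u, v in edges
               if (u == node and v in present) or (v == node and u in present))

def degree_based_order(edges, removed, all_nodes):
    """Static ranking: removed nodes sorted by their degree in the original graph."""
    return sorted(removed, key=lambda x: -total_degree(x, edges, all_nodes))

def updated_degree_order(edges, removed, all_nodes, rng):
    """At each step, add the removed node with the highest degree after
    reintegration; ties are broken by original degree, then at random."""
    present = set(all_nodes) - set(removed)
    pending, order = set(removed), []
    while pending:
        best = max(pending, key=lambda x: (total_degree(x, edges, present),
                                           total_degree(x, edges, all_nodes),
                                           rng.random()))
        order.append(best)
        pending.remove(best)
        present.add(best)
    return order
```

The two orders can differ when a removed node has a high original degree but most of its neighbours were removed as well: the static ranking restores it first, whereas the updated ranking defers it until enough of its neighbours are back.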

Network controllability
Consider a linear, time-invariant networked system composed of N nodes, governed by the following equation:

$$\frac{\mathrm{d}x(t)}{\mathrm{d}t} = A x(t) + B u(t). \tag{1}$$

Here, the N × 1 vector $x(t) = (x_1(t), \ldots, x_N(t))^{\mathrm{T}}$ represents the state of each node. The matrix A, with dimensions N × N, characterizes the connections between nodes with corresponding strength. Furthermore, the N × M matrix B serves as the input matrix, indicating which nodes are under direct control through the M × 1 input vector $u(t)$.
A linear, time-invariant networked system is considered controllable if its node states can be manipulated to reach any desired state within a finite time by applying a set of external inputs. The Kalman rank criterion provides a way to determine controllability, where the rank of the controllability matrix $[B, AB, A^{2}B, \ldots, A^{N-1}B]$ should be equal to N for the system to be fully controllable [9]. To gain an understanding of the Kalman rank criterion, we can derive the formal solution of equation (1) with an initial condition of $x(0) = 0$ as $x(t) = \int_{0}^{t} e^{A(t-\tau)} B u(\tau)\,\mathrm{d}\tau$. By expanding $e^{A(t-\tau)}$ into a series, we can deduce that $x(t)$ is a linear combination of the columns of the matrix $[B, AB, A^{2}B, \ldots, A^{N'}B, \ldots]$. According to the Cayley–Hamilton theorem, for $N' > N$, the rank of the matrix $[B, AB, A^{2}B, \ldots, A^{N'}B, \ldots]$ is equivalent to the rank of the controllability matrix $[B, AB, A^{2}B, \ldots, A^{N-1}B]$. Consequently, if the rank of the controllability matrix is less than N, it implies that the columns of the matrix $[B, AB, A^{2}B, \ldots, A^{N'}B, \ldots]$ cannot span the state space of dimension N entirely. In such cases, an input $u(t)$ cannot be found to steer $x(0)$ to an arbitrary state $x(t)$ [40].
In practical applications, the implementation of the Kalman rank criterion poses challenges due to the requirement of obtaining information about the network's interaction strengths and the involvement of computationally intensive calculations, especially for large-scale networks. To mitigate these challenges, Lin [10] introduced the concept of structural controllability. Additionally, Liu et al [11] presented the maximum matching method and the minimum inputs theorem to determine the minimum number of nodes (driver nodes) that must be controlled to ensure controllability. To determine the count of driver nodes, a directed network should first be transformed into a bipartite network. Subsequently, a maximum matching edge set can be derived using the maximum matching algorithm [41], consisting of $N_M$ directed edges without shared source nodes or end nodes. The end nodes of the matching edges are termed matched nodes, while the remaining nodes are unmatched. The minimum number $N_D$ of driver nodes is calculated as follows:

$$N_D = \max\{N - N_M,\, 1\}. \tag{2}$$
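The maximum-matching computation of the number of driver nodes can be illustrated with a short sketch. For brevity it uses a simple augmenting-path search rather than the Hopcroft–Karp algorithm cited above; the function name and the representation of the network (nodes labelled 0 to N − 1, links as ordered pairs) are assumptions of this sketch. The recursion makes it suitable only for small graphs.

```python
def min_driver_nodes(n, edges):
    """Return N_D = max(N - N_M, 1), where N_M is the size of a maximum matching
    in the bipartite representation (source copies on one side, target copies
    on the other) of a directed graph with nodes 0..n-1."""
    successors = {u: [] for u in range(n)}
    for u, v in edges:
        successors[u].append(v)
    matched_source = {}  # matched_source[v] = u means link u -> v is in the matching

    def try_augment(u, seen):
        # Look for an augmenting path that starts at the source copy of u.
        for v in successors[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in matched_source or try_augment(matched_source[v], seen):
                matched_source[v] = u
                return True
        return False

    n_matched = sum(try_augment(u, set()) for u in range(n))
    return max(n - n_matched, 1)
```

For example, a directed path on four nodes needs a single driver node, whereas a star in which one hub points to three leaves needs three, because only one of the hub's links can belong to the matching.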

Analytical approximations of the number of driver nodes
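According to Liu et al [11], under the assumptions of no self-loops and absence of degree correlations among nodes, for a directed network represented by G(N, L) with N nodes and L links, the minimum fraction of driver nodes can be approximated by using the generating functions of the in- and out-degree distributions ($G_{in}(x)$ and $G_{out}(x)$, respectively) as well as of the excess in- and out-degree distributions ($H_{in}(x)$ and $H_{out}(x)$, respectively). These generating functions are defined as follows:

$$G_{in}(x) = \sum_{k_{in}=0}^{\infty} P_{in}(k_{in})\, x^{k_{in}}, \qquad G_{out}(x) = \sum_{k_{out}=0}^{\infty} P_{out}(k_{out})\, x^{k_{out}},$$

$$H_{in}(x) = \frac{G_{in}'(x)}{G_{in}'(1)} = \frac{\sum_{k_{in}=1}^{\infty} k_{in} P_{in}(k_{in})\, x^{k_{in}-1}}{\sum_{k_{in}=1}^{\infty} k_{in} P_{in}(k_{in})}, \qquad H_{out}(x) = \frac{G_{out}'(x)}{G_{out}'(1)} = \frac{\sum_{k_{out}=1}^{\infty} k_{out} P_{out}(k_{out})\, x^{k_{out}-1}}{\sum_{k_{out}=1}^{\infty} k_{out} P_{out}(k_{out})},$$

where $k_{in}$ and $k_{out}$ correspond to the in- and out-degree, respectively, and $P_{in}(\cdot)$ and $P_{out}(\cdot)$ signify the in- and out-degree probability distributions, respectively. The minimum fraction of driver nodes is then expressed [11] through these generating functions evaluated at four auxiliary quantities $\omega_1$, $\omega_2$, $\hat{\omega}_1$ and $\hat{\omega}_2$, which satisfy a set of self-consistent equations, and through k, which denotes half of the average degree and is equal to the average in-degree and the average out-degree, $k = \langle k_{in} \rangle = \langle k_{out} \rangle$.
Under node removals, the driver nodes can be classified into two categories. The first category consists of $N_D$ driver nodes controlling the remaining network, while the second category includes $N_r$ removed nodes, with the assumption that each removed node should be controlled separately. We define the fraction of driver nodes $n_D$ as $n_D = \frac{N_D + N_r}{N}$. After randomly removing a fraction p of nodes in the network, $N_r = pN$ and the fraction of driver nodes $n_D$ satisfies $n_D = p + \frac{N_D}{N}$.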
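For reference, the expression of Liu et al [11] referred to in this section can be written out explicitly. The form below is a reconstruction in the notation used here, under the assumption that k is the average in-degree (half of the average total degree); with a different convention for k the coefficient of the last term changes accordingly, so it should be checked against [11] before being relied upon.

```latex
n_D = \frac{1}{2}\Big\{ \big[G_{in}(\omega_2) + G_{in}(1-\omega_1) - 1\big]
      + \big[G_{out}(\hat{\omega}_2) + G_{out}(1-\hat{\omega}_1) - 1\big]
      + k\,\big[\hat{\omega}_1(1-\omega_2) + \omega_1(1-\hat{\omega}_2)\big] \Big\},

\omega_1 = H_{out}(\hat{\omega}_2), \qquad
\omega_2 = 1 - H_{out}(1-\hat{\omega}_1), \qquad
\hat{\omega}_1 = H_{in}(\omega_2), \qquad
\hat{\omega}_2 = 1 - H_{in}(1-\omega_1).
```

Eliminating $\hat{\omega}_1$ from the second and third relations gives $1 - \omega_2 - H_{out}(1 - H_{in}(\omega_2)) = 0$, which is the condition on $\omega_2$ that appears in the network-specific results. As a sanity check for a directed ER network, where $G_{in} = G_{out} = H_{in} = H_{out} = e^{-k(1-x)}$, the expression reduces to $n_D = \omega_1 - \omega_2 + k\,\omega_1(1-\omega_2)$, which approaches $e^{-k}$ for large k.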

Analytical approximations under random node removals and random node additions
Based on the research of Shao et al [42], the generating function following the random removal of a fraction p of nodes is analogous to the initial generating function but with the modified argument $\tilde{x} = p + (1 - p)x$. Consequently, the generating functions of the in- and out-degree, as well as of the excess in- and out-degree, are updated as follows after randomly removing a proportion p of nodes:

$$\tilde{G}_{in}(x) = G_{in}(p + (1-p)x), \qquad \tilde{G}_{out}(x) = G_{out}(p + (1-p)x),$$

$$\tilde{H}_{in}(x) = H_{in}(p + (1-p)x), \qquad \tilde{H}_{out}(x) = H_{out}(p + (1-p)x).$$

Next, we use equations (4) and (6) to acquire the minimum fraction of driver nodes $n_D$ after randomly removing a fraction p of nodes, where $\omega_1$, $\omega_2$, $\hat{\omega}_1$ and $\hat{\omega}_2$ satisfy the self-consistent equations written in terms of the updated generating functions, and k is half of the average degree, equal to the average in-degree and the average out-degree.
In this study, we will denote a network perturbation, either a node removal or a node addition, as a challenge. This study uses challenge K to denote the number of manipulations under node removals or additions, where a manipulation is a node removal or a node addition. During the removal process, challenge K means that a fraction $p = \frac{K}{N}$ of nodes has been removed. Hence K = 0 corresponds to the graph in the initial state before the attack. The minimum fraction of driver nodes at challenge K is then obtained by setting $p = \frac{K}{N}$, with the auxiliary quantities satisfying equation (9). Under random node additions, suppose the total number of removed nodes (challenges) during the attack process is $K_a$. The total number of nodes added back at challenge K is then $K - K_a$, and the fraction of removed nodes p at challenge K is equal to $p = \frac{2K_a}{N} - \frac{K}{N}$. Therefore, during random additions, the minimum fraction of driver nodes at challenge K is obtained by setting $p = \frac{2K_a - K}{N}$, again with the auxiliary quantities satisfying equation (9).
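(i) Directed ER networks
Both the in-degree distribution $P_{in}(k_{in})$ and the out-degree distribution $P_{out}(k_{out})$ of ER networks follow a Poisson distribution with average degree k. Therefore, the generating functions of the in-degree and out-degree are as follows:

$$G_{in}(x) = G_{out}(x) = e^{-k(1-x)}.$$

The minimum fraction of driver nodes $n_D$ at challenge K under random removals in the ER networks can be obtained by inserting these generating functions into equations (7), (10) and (9), where $\omega_2$ satisfies $1 - \omega_2 - \tilde{H}_{out}(1 - \tilde{H}_{in}(\omega_2)) = 0$ and $\tilde{H}_{in}$ and $\tilde{H}_{out}$ denote the excess generating functions updated for the removed fraction p. The minimum fraction of driver nodes $n_D$ at challenge K under random additions follows in the same way, with p taken as the fraction of removed nodes during additions.

(ii) SSNs
In SSNs with N nodes and average in-degree and out-degree equal to k, the in-degree distribution resembles a Poisson distribution with mean value k and the out-degree distribution follows a Dirac delta function. As a result, the generating functions of the in-degree and out-degree distributions can be denoted as follows:

$$G_{in}(x) = e^{-k(1-x)}, \qquad G_{out}(x) = x^{k}.$$

Based on equations (7), (10) and (9), the minimum fraction of driver nodes $n_D$ at challenge K under random removals can be calculated by inserting these generating functions, and the minimum fraction of driver nodes $n_D$ at challenge K under random additions is obtained in the same way with the corresponding value of p.

(iii) SF networks
For SF networks, we suppose the in-degree distribution and out-degree distribution both follow the pure power-law distribution with minimum degree a and exponent γ, which can be denoted as follows:

$$P_{in}(k) = P_{out}(k) = \frac{k^{-\gamma}}{\zeta(\gamma, a)}, \qquad k \geqslant a,$$

where $\zeta(\gamma, a) = \sum_{n=0}^{\infty} (n + a)^{-\gamma}$ is the Hurwitz Zeta function. The average degree satisfies $k = \frac{\zeta(\gamma - 1, a)}{\zeta(\gamma, a)}$. Correspondingly, the generating functions can be obtained by

$$G_{in}(x) = G_{out}(x) = \frac{x^{a}\, \Phi(x, \gamma, a)}{\zeta(\gamma, a)}, \qquad H_{in}(x) = H_{out}(x) = \frac{x^{a-1}\, \Phi(x, \gamma - 1, a)}{\zeta(\gamma - 1, a)},$$

where $\Phi(z, s, \alpha) = \sum_{n=0}^{\infty} \frac{z^{n}}{(n + \alpha)^{s}}$ is the Lerch transcendent function. Together with equations (7), (10) and (9), the minimum fraction of driver nodes $n_D$ at challenge K under random removals can be calculated with $\omega_2$ satisfying $1 - \omega_2 - \tilde{H}_{out}(1 - \tilde{H}_{in}(\omega_2)) = 0$. The minimum fraction of driver nodes $n_D$ at challenge K under random additions can be acquired in the same way, with the same condition on $\omega_2$ and the corresponding value of p.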
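For directed ER networks the analytical curve can be evaluated numerically with a few lines of code. The sketch below is an illustration under stated assumptions, not the authors' implementation: it assumes that random node removal turns the Poisson mean k into k(1 − p), that every removed node counts as a driver node, and that the relevant solution of the self-consistent equations is the smallest one (reached here by fixed-point iteration starting from zero). The function names are hypothetical.

```python
import math

def removed_fraction(K, K_a, N):
    """Fraction p of removed nodes at challenge K: K/N during removals
    (K <= K_a) and (2*K_a - K)/N while nodes are added back."""
    return K / N if K <= K_a else (2 * K_a - K) / N

def er_driver_fraction(k_half, p_removed, iterations=10_000, tol=1e-14):
    """Approximate n_D for a directed ER network whose average in-degree is
    k_half, after a fraction p_removed of the nodes has been removed at random."""
    z = k_half * (1.0 - p_removed)  # mean in-/out-degree of the remaining network
    w1 = 0.0
    for _ in range(iterations):
        # Fixed point of w1 = H(w2) with 1 - w2 = H(1 - w1), H(x) = exp(-z (1 - x)).
        new_w1 = math.exp(-z * math.exp(-z * w1))
        converged = abs(new_w1 - w1) < tol
        w1 = new_w1
        if converged:
            break
    w2 = 1.0 - math.exp(-z * w1)
    n_remaining = w1 - w2 + z * w1 * (1.0 - w2)  # driver fraction of the remaining network
    # Removed nodes are counted as drivers; the rest is rescaled to the original size N.
    return p_removed + (1.0 - p_removed) * n_remaining
```

Because p during additions mirrors p during removals, evaluating `er_driver_fraction(k_half, removed_fraction(K, K_a, N))` for K from 0 to 2K_a produces a curve that is symmetric around K = K_a in this approximation.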

Recoverability indicators
To facilitate the comparison of various recovery strategies, recoverability indicators are necessary. One such indicator is the recovery energy E, as proposed by Sun et al [24]. We adopt the recovery energy as a measure of recoverability. The recovery energy accumulates the minimum number of required driver nodes over the challenges of the recovery process, where $K_a$ is the number of challenges occurring during the attack process.
Taking inspiration from the robustness metric suggested in [21,43] to assess the effectiveness of attack strategies, our study introduces a recoverability metric denoted by R to quantify the recoverability. This robustness metric quantifies the impact of an attack strategy by averaging the network controllability at each step during an attack [21]. Similarly, we compute the recoverability metric R for a recovery strategy by averaging the network controllability at each step during the recovery process. This allows us to quantify the performance of different recovery strategies with respect to network controllability. Since the value of $K_a$ remains constant for different recovery strategies for a given network, the ranking of recovery strategies remains consistent across both recoverability indicators. The physical significance of the recovery energy lies in its representation of the total minimum number of required driver nodes throughout the recovery process. A higher recovery energy signifies a greater demand for driver nodes during recovery, whereas a lower recovery energy implies that network controllability can be regained with fewer driver nodes.
In essence, recovery strategies with a lower recovery energy E or a lower recoverability metric R are deemed more efficient in restoring network controllability.
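The relation between the two indicators can be illustrated with a small sketch. The exact summation limits and normalization are assumptions here: the sketch takes as input the list of driver-node fractions recorded at each challenge of the recovery process, sums them for E, and averages them for R. The function names are hypothetical.

```python
def recovery_energy(n_d_recovery):
    """E: accumulated driver-node fraction over the recorded recovery challenges
    (the summation range is an assumption of this sketch)."""
    return sum(n_d_recovery)

def recoverability_metric(n_d_recovery):
    """R: average driver-node fraction over the recorded recovery challenges."""
    return sum(n_d_recovery) / len(n_d_recovery)

# Two hypothetical recovery curves of equal length (same number of challenges).
curve_a = [0.40, 0.30, 0.20, 0.10]
curve_b = [0.40, 0.35, 0.30, 0.10]
# Both indicators rank curve_a as the more efficient recovery.
ranking_by_E = recovery_energy(curve_a) < recovery_energy(curve_b)
ranking_by_R = recoverability_metric(curve_a) < recoverability_metric(curve_b)
```

Since both curves span the same number of challenges, R is simply E divided by a constant, which is why the two indicators always produce the same ranking for a given network.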

Validations of the analytical method
To validate the proposed analytical method for random removals and additions, we conducted simulations on both synthetic and real-world networks to determine the minimum fraction of required driver nodes. Each simulation realization involved a sequence of attacking and recovering the network, with this process repeated 10 000 times. During the attack phase, a single node was randomly removed at each step, and the recalculated minimum fraction of necessary driver nodes for network controllability was recorded. Subsequently, in the recovery phase, we reintroduced one removed node along with its original connections at each step and recalculated the minimum fraction of driver nodes. The recovery process concluded upon the restoration of all initially removed nodes. For synthetic networks characterized by specific parameters, a new network was generated for each simulation realization. However, for real-world networks, the same network was employed across all realizations.
In figure 2, we present the simulation results and analytical predictions of the proposed method on synthetic networks. The results of random node removal and addition are displayed in blue and red, respectively. The analytical approximations are represented by dashed lines, and the algorithm results are presented with solid lines. Our analysis shows that the analytical approximations accurately predict the minimum fraction of driver nodes in most synthetic networks, except for SF networks with N = 10 000, γ = 2.3, and a = 3, where a gap is observed between the dashed lines and solid lines.
We also observe discrepancies between simulation and analytical results for three real-world networks, which are depicted in figure 3, where the solid lines represent the simulation results and the dashed lines represent the analytical results. To further explore the reasons for these discrepancies, we conduct two experiments on each of the three networks.
In the first experiment, we apply a degree-preserving rewiring strategy, in which we maintain the original degree distribution of each network and randomly rewire two links in the graph. We then recalculate the minimum fraction of driver nodes and repeat this process for 10 000 iterations. We record the minimum fraction of driver nodes for each iteration and present the results in the form of a box plot. In the second experiment, we generate 10 000 graphs using the degree distributions obtained from the real-world networks by employing the configuration model [29]. We then calculate the minimum fraction of driver nodes for each generated graph and represent the results as a box plot.
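One rewiring step of the first experiment can be sketched as a directed double-edge swap: two links a → b and c → d are replaced by a → d and c → b, which leaves every in-degree and out-degree unchanged. The sketch below is an assumption about how such a step may be implemented (the text does not specify the details); in particular, rejecting swaps that would create self-loops or duplicate links is a choice made here.

```python
import random

def degree_preserving_rewire(edges, rng, max_tries=1000):
    """Return a copy of `edges` in which two links a->b and c->d have been
    replaced by a->d and c->b, keeping all in- and out-degrees unchanged."""
    edge_list = list(edges)
    edge_set = set(edges)
    for _ in range(max_tries):
        (a, b), (c, d) = rng.sample(edge_list, 2)
        new_1, new_2 = (a, d), (c, b)
        # Reject swaps that would create self-loops or duplicate links.
        if a == d or c == b or new_1 in edge_set or new_2 in edge_set:
            continue
        edge_set -= {(a, b), (c, d)}
        edge_set |= {new_1, new_2}
        return edge_set
    return edge_set  # no valid swap found: the graph is returned unchanged
```

Applying this step repeatedly and recomputing the minimum fraction of driver nodes after each swap yields the distribution summarized by the box plot.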
As depicted in figure 4, the results of both experiments reveal that the minimum fraction of driver nodes can vary for networks with the same in- and out-degree distributions. Additionally, we observe that the mean value of the minimum number of driver nodes in the networks generated by the configuration model is equivalent to that obtained using the analytical approximation method. It should be noted that the analytical approximation method represents the expected value of the minimum number of driver nodes for graphs that have the same in-degree and out-degree distributions. Conversely, a real-world network is merely a single instance of the networks that satisfy the specific in-degree and out-degree distributions. This fundamental difference between the analytical and real-world networks accounts for the gaps between the simulation and analytical results.

Figure 2.
The minimum fraction of driver nodes during random node removals and random node additions in synthetic networks. The blue lines depict node removals, while the red lines represent node additions. The solid lines are obtained by simulations. The blue and red dashed lines are the analytical approximations under random node removals and additions, respectively. We use nDavg to denote the mean minimum fraction of driver nodes in the simulations at each challenge and use nDavg to denote the analytical values of the minimum fraction of driver nodes at each challenge.

Analytical method with shifting
In order to reduce the discrepancies between the predicted and simulated values, we propose to adjust our original analytical model by applying a shift. First, we determine the exact value of the minimum fraction of driver nodes $n_D[0]'$ by applying the maximum matching algorithm. The shifting term β is then calculated as the difference between $n_D[0]'$ and the original analytical approximation $n_D[0]$ at challenge K = 0, that is, $\beta = n_D[0]' - n_D[0]$, and it is added to the analytical approximation at every challenge. Consequently, if the shifted analytical result of the minimum fraction of driver nodes at a particular

Figure 3.
The minimum fraction of driver nodes nD during random node removals and random node additions in three real-world networks. The blue lines depict node removals while the red lines represent node additions. Both are obtained by the maximum matching algorithm over 10 000 realizations. The blue dashed lines represent the analytical approximations under node removals and the red dashed lines represent the analytical approximations under node additions. nDavg presents the mean minimum fraction of driver nodes using the algorithm at each challenge and nDavg presents the analytical values of the minimum fraction of driver nodes at each challenge.

Figure 4. Rewiring links and generating graphs using the configuration model can yield different values for the minimum fraction of driver nodes. 'Rewiring' denotes the rewiring results, and 'CFM' denotes the results for the configuration model. The analytical results obtained using the in-degree and out-degree distributions are represented by the grey dashed lines. The mean minimum fraction of driver nodes calculated by the algorithm for networks after rewiring and for networks generated using the configuration model is indicated by the red lines.

Figure 5. The minimum fraction of driver nodes nD during random node removals and random node additions in three real-world networks and one SF network after shifting. The blue lines depict node removals, while the red lines represent node additions. Simulation results (solid lines) are obtained by the maximum matching algorithm over 10 000 realizations. The blue dashed lines are the shifted analytical approximations under node removals, while the red dashed lines are the shifted analytical approximations under node additions. nDavg denotes the mean minimum fraction of driver nodes in the simulations at each challenge and nDavg denotes the shifted analytical values of the minimum fraction of driver nodes at each challenge.
It should be noted that the analytical method with shifting incurs an additional computational cost due to the use of the maximum matching algorithm, which has a time complexity of O(L√N) [41]. Here, L denotes the number of links and N the number of nodes in the network.
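As a rough illustration of this extra computation, the exact minimum number of driver nodes of a directed network can be obtained from a maximum matching through the standard relation N_D = max(N − |M*|, 1), where |M*| is the size of a maximum matching. The sketch below is a minimal Python example under that assumption. The function names are hypothetical, and for brevity it uses plain augmenting paths, which run in O(N·L) time, rather than the O(L√N) algorithm cited above.

```python
from typing import Iterable, List, Tuple


def max_matching_size(n: int, edges: Iterable[Tuple[int, int]]) -> int:
    """Size of a maximum matching in the bipartite view of a directed graph.

    Each link u -> v connects the 'out' copy of u to the 'in' copy of v.
    Plain augmenting paths are used here (an assumption made for brevity).
    The search is recursive, so very large networks may need an iterative
    variant or a higher recursion limit.
    """
    succ: List[List[int]] = [[] for _ in range(n)]
    for u, v in edges:
        succ[u].append(v)

    # match_in[v] = u means that link u -> v belongs to the matching.
    match_in = [-1] * n

    def try_augment(u: int, seen: List[bool]) -> bool:
        for v in succ[u]:
            if seen[v]:
                continue
            seen[v] = True
            # Use v if it is free, or if its current partner can be re-matched.
            if match_in[v] == -1 or try_augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    return sum(try_augment(u, [False] * n) for u in range(n))


def min_driver_fraction(n: int, edges: Iterable[Tuple[int, int]]) -> float:
    """Minimum fraction of driver nodes, n_D = max(N - |M*|, 1) / N."""
    matched = max_matching_size(n, list(edges))
    return max(n - matched, 1) / n


if __name__ == "__main__":
    path = [(0, 1), (1, 2), (2, 3)]   # directed path on 4 nodes
    star = [(0, 1), (0, 2), (0, 3)]   # one hub pointing to 3 leaves
    print(min_driver_fraction(4, path))  # 0.25
    print(min_driver_fraction(4, star))  # 0.75
```

A directed path needs a single driver node, whereas in the star only one leaf can be matched through the hub, so three of the four nodes must be driven.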
We show the results of the adjusted model for three real-world networks and SF(2.3, 3) in figure 5, where the prediction results are much better than those before shifting. Then we validate the shifting method using the dataset comprising 202 small-scale real-world networks from the Topology Zoo and seven large-scale networks. Validation involves calculating the absolute mean error (AME) and root mean square error (RMSE) between the results obtained from simulations and the analytical results, both before and after applying shifting. AME is defined as the absolute difference between the simulation and the analytical results. Additionally, we calculate the proportion of challenges where RMSE was smaller than 5%, denoted as P_{RMSE⩽0.05}. We compare the results before and after shifting to demonstrate the effectiveness of the method.
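The error measures above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, AME is taken as the mean of the absolute differences over all challenges, and the proportion is computed over a list of RMSE values with one entry per evaluated case, without fixing what a case is.

```python
import math
from typing import Sequence


def absolute_mean_error(sim: Sequence[float], ana: Sequence[float]) -> float:
    """Mean of |simulation - analytical| over all challenges (assumed form)."""
    return sum(abs(s - a) for s, a in zip(sim, ana)) / len(sim)


def root_mean_square_error(sim: Sequence[float], ana: Sequence[float]) -> float:
    """Square root of the mean squared difference over all challenges."""
    return math.sqrt(sum((s - a) ** 2 for s, a in zip(sim, ana)) / len(sim))


def proportion_within(rmse_values: Sequence[float], threshold: float = 0.05) -> float:
    """Share of RMSE values that do not exceed the threshold."""
    return sum(r <= threshold for r in rmse_values) / len(rmse_values)


if __name__ == "__main__":
    simulated = [0.5, 0.6, 0.7]    # illustrative values, not data from the paper
    analytical = [0.4, 0.6, 0.9]
    print(absolute_mean_error(simulated, analytical))
    print(root_mean_square_error(simulated, analytical))
    print(proportion_within([0.01, 0.05, 0.2]))
```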
We present the results of the shifted model for the Topology Zoo dataset in a histogram (figure 6), where the results before and after shifting are depicted in orange and blue, respectively. Furthermore, we observe an improvement in the approximations for the seven large graphs by comparing the results obtained before and after shifting (table 2). The results for the shifted model exhibit smaller AME and RMSE values, and higher P_{RMSE⩽0.05} values, thus indicating the effectiveness of the shifting method. Besides using the shifting term, we can also use the ratio of n_D[0]′ to n_D[0] as a scaling factor to construct a rescaled model. However, the results for the rescaled model for the Topology Zoo and the seven large-scale networks are worse than those for the shifted model. This discrepancy could be attributed to the rescaled model's heightened sensitivity to scaling factors at each point, as opposed to the fixed modifications offered by the shifted model. The results for the rescaled model are reported in appendix A.
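The two corrections can be compared with a short Python sketch. It assumes that the shifted value at challenge k is n_D[k] + β with β = n_D[0]′ − n_D[0], and that the rescaled value is γ·n_D[k] with γ = n_D[0]′/n_D[0]. The function names and the sample numbers are hypothetical.

```python
from typing import List, Sequence


def shifted_model(analytical: Sequence[float], exact_initial: float) -> List[float]:
    """Add the constant offset beta = n_D[0]' - n_D[0] to every challenge."""
    beta = exact_initial - analytical[0]
    return [value + beta for value in analytical]


def rescaled_model(analytical: Sequence[float], exact_initial: float) -> List[float]:
    """Multiply every challenge by gamma = n_D[0]' / n_D[0]."""
    gamma = exact_initial / analytical[0]
    return [gamma * value for value in analytical]


if __name__ == "__main__":
    analytical = [0.20, 0.30, 0.50]  # illustrative analytical curve
    exact_initial = 0.25             # illustrative exact value at challenge 0
    print(shifted_model(analytical, exact_initial))
    print(rescaled_model(analytical, exact_initial))
```

Both corrections agree with the exact value at challenge 0. The shift changes every later point by the same amount, while the rescaling changes each point in proportion to its analytical value, which is one way to read the sensitivity mentioned above.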

The efficiency of recovery strategies
We adopt two recoverability indicators to assess the effectiveness of distinct recovery strategies. For each synthetic network category with different sets of parameters, such as the directed ER network with parameters N = 500, p_ER = 0.007, we generate a total of 10 000 networks. For each network instance, we proceed by randomly removing 15% of the nodes and subsequently employ diverse recovery strategies to restore the network. During the recovery phase, we reintroduce one node at each step in accordance with the chosen recovery method. We then recompute the minimum fraction of driver nodes required for network controllability, until all previously removed nodes are reinstated.
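The procedure above can be sketched as a simple loop. The Python example below is a minimal illustration, not the implementation used for the experiments: all names are hypothetical, only the random recovery strategy is shown (other strategies would supply a different `pick_next` callback), the minimum number of driver nodes is computed with plain augmenting-path matching, and the fraction is normalised by the number of nodes currently present, which is an assumption.

```python
import random
from typing import Callable, Dict, List, Sequence, Set, Tuple

Edge = Tuple[int, int]


def driver_fraction(nodes: Set[int], edges: Sequence[Edge]) -> float:
    """Minimum driver fraction of the subgraph induced by `nodes`.

    Assumption: the count is divided by the number of nodes currently present.
    """
    succ: Dict[int, List[int]] = {u: [] for u in nodes}
    for u, v in edges:
        if u in nodes and v in nodes:
            succ[u].append(v)
    match_in: Dict[int, int] = {}  # match_in[v] = u: link u -> v is matched

    def augment(u: int, seen: Set[int]) -> bool:
        for v in succ[u]:
            if v in seen:
                continue
            seen.add(v)
            if v not in match_in or augment(match_in[v], seen):
                match_in[v] = u
                return True
        return False

    matched = sum(augment(u, set()) for u in nodes)
    return max(len(nodes) - matched, 1) / len(nodes)


def random_pick(remaining: List[int], present: Set[int], rng: random.Random) -> int:
    """Random recovery strategy: choose any removed node uniformly."""
    return rng.choice(remaining)


def recovery_trajectory(
    n: int,
    edges: Sequence[Edge],
    removed_fraction: float,
    pick_next: Callable[[List[int], Set[int], random.Random], int],
    rng: random.Random,
) -> List[float]:
    """Remove nodes at random, then add them back one per step.

    Returns the minimum driver fraction after each re-addition.
    """
    all_nodes = list(range(n))
    remaining = rng.sample(all_nodes, round(removed_fraction * n))
    present = set(all_nodes) - set(remaining)
    trajectory = []
    while remaining:
        node = pick_next(remaining, present, rng)
        remaining.remove(node)
        present.add(node)  # links of the node return together with it
        trajectory.append(driver_fraction(present, edges))
    return trajectory


if __name__ == "__main__":
    ring = [(i, (i + 1) % 20) for i in range(20)]  # toy directed ring
    print(recovery_trajectory(20, ring, 0.15, random_pick, random.Random(0)))
```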
Next, we compute the mean value of the minimum fraction of driver nodes across the 10 000 networks for each specific recovery strategy at every step. Subsequently, we sum these mean values to derive the recovery energy associated with the recovery strategy. For the various recovery strategies employed on synthetic networks, we present the corresponding recovery energy in figure 7. The recoverability metric R is summarized in table 3. In the case of directed ER networks and SSNs, the greedy-betweenness recovery strategy demonstrates the lowest recovery energy, followed by the greedy-degree recovery strategy. The remaining recovery strategies, ranked in order of increasing recovery energy, are the updated betweenness recovery strategy, updated degree recovery strategy, betweenness-based recovery strategy, degree-based recovery strategy, and random recovery strategy.
Regarding the directed BA network, the degree-related recovery strategies outperform the betweenness-related recovery strategies. The order of recovery energy for the different recovery strategies, from lowest to highest, is as follows: greedy-degree recovery strategy, greedy-betweenness recovery strategy, updated degree recovery strategy, updated betweenness recovery strategy, degree-based recovery strategy, betweenness-based recovery strategy, and random recovery strategy. It is worth highlighting that the performance improvements brought about by the updated degree (or betweenness) recovery strategy are not substantial when compared to the performance of the corresponding degree-based (or betweenness-based) recovery strategy. On the other hand, the greedy-degree (or greedy-betweenness) recovery strategy significantly enhances performance in comparison to the degree-based (or betweenness-based) recovery strategy. The recovery strategy outcomes for small-sized networks, as presented in appendix B, are consistent with the results discussed here.
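The recovery energy described at the start of this section is a plain aggregation: average the minimum fraction of driver nodes over all network realizations at each step, then sum over the steps. The Python sketch below illustrates this with hypothetical names and made-up numbers, and assumes that every realization has the same number of recovery steps.

```python
from typing import List, Sequence


def recovery_energy(trajectories: Sequence[Sequence[float]]) -> float:
    """Sum over recovery steps of the mean driver fraction across realizations.

    trajectories[i][k] is the minimum driver fraction of realization i
    after the k-th node has been added back.
    """
    n_steps = len(trajectories[0])
    mean_per_step: List[float] = [
        sum(run[k] for run in trajectories) / len(trajectories)
        for k in range(n_steps)
    ]
    return sum(mean_per_step)


if __name__ == "__main__":
    # Two illustrative realizations with three recovery steps each.
    strategy_a = [[0.4, 0.3, 0.2], [0.6, 0.3, 0.2]]
    strategy_b = [[0.5, 0.4, 0.2], [0.6, 0.4, 0.2]]
    print(recovery_energy(strategy_a))
    print(recovery_energy(strategy_b))
```

A strategy whose curve drops earlier accumulates less energy, which is why a lower value corresponds to a more efficient recovery in the ranking above.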

Conclusion and discussion
In this study, we have introduced an analytical approach based on degree distributions to estimate the minimum fraction of driver nodes needed for achieving network controllability through random node additions. We have also employed two recoverability indicators to assess the efficiency of seven recovery strategies after random node removals. These strategies include the random recovery strategy, degree-based recovery strategy, betweenness-based recovery strategy, updated degree recovery strategy, updated betweenness recovery strategy, greedy-degree recovery strategy, and greedy-betweenness recovery strategy. Upon analysis, we have observed a difference between our initial analytical predictions and simulation results in both synthetic and real-world networks. To address this inconsistency, we propose an adjustment to the original method to align the outcomes more closely. Regarding the seven recovery strategies, we have determined that the greedy-betweenness recovery strategy demonstrates superior efficiency in directed ER networks and SSNs, while the greedy-degree recovery strategy proves most efficient in directed BA networks.
With the investigation into approximating network controllability under random node additions complete, future research could develop analytical techniques for estimating network controllability under various recovery strategies. For instance, Wang and Kooij [44] have laid the groundwork for potential analytical methods to approximate network controllability under targeted node additions based on degree. Furthermore, considering the additional computational cost associated with the shifted model, there is potential for enhancing its efficiency. One promising avenue is the exploration of algorithms with lower complexity to calculate the initial minimum number of driver nodes, thereby optimizing the performance of the shifted model.
Moreover, considering that cycles play a critical role in network controllability [45], we can consider the method proposed by Fan et al [46] to measure node centrality based on cycles, which could lead to the development of a cycle ratio recovery strategy, potentially offering improved recovery efficiency. In addition, the concept of the l-shell of a given node, defined as the set of nodes at a distance l from the focal node [42], presents an intriguing avenue for further research. Exploring localized attacks and subsequent recovery strategies based on the shell distance l could offer insights into strategies that leverage localized information. These investigations hold the potential to deepen our understanding of the efficacy of diverse recovery methods and contribute to the evolution of more efficient network recovery techniques in the context of network controllability.

Appendix A
If the analytical result of the minimum fraction of driver nodes at a particular challenge k is denoted as n_D[k], the rescaled result n_D[k]′ can be computed as n_D[k]′ = γ n_D[k], where γ = n_D[0]′/n_D[0]. We present the results of the rescaled model for the Topology Zoo dataset in a histogram (figure A1), where the results obtained before and after rescaling are depicted in orange and blue, respectively. We observe a noticeable improvement in the approximations for the seven large real-world graphs by comparing the results obtained before and after rescaling (table A1). The results for the rescaled model exhibit smaller AME and RMSE values, and higher P_{RMSE⩽0.05} values, thus indicating the effectiveness of the rescaling method. However, compared to the results using the shifted model, the prediction improvements are slightly smaller.

Figure 1. Properties of 202 networks from the Internet Topology Zoo data set.

Figure 2. The minimum fraction of driver nodes during random node removals and random node additions in synthetic networks. The blue lines depict node removals, while the red lines represent node additions. The solid lines are obtained by simulations. The blue and red dashed lines are the analytical approximations under random node removals and additions, respectively. We use nDavg to denote the mean minimum fraction of driver nodes in the simulations at each challenge and use nDavg to denote the analytical values of the minimum fraction of driver nodes at each challenge.

Figure 6. Results before and after shifting for the Topology Zoo data set. Clearly, the shifted model exhibits better performance for this data set.
The rescaled model uses the ratio of n_D[0]′ and n_D[0] as a scaling factor, i.e. γ = n_D[0]′/n_D[0], so that n_D[k]′ = γ n_D[k].

Figure 7. Recovery energy for different recovery strategies in synthetic networks. The bars 'Rand', 'Deg', 'Bet', 'Deg-up', 'Bet-up', 'Greedy-deg' and 'Greedy-bet' present the recovery energy of the random, degree-based, betweenness-based, updated degree, updated betweenness, greedy-degree and greedy-betweenness recovery strategies, respectively. The number above each bar gives the recovery energy of the corresponding strategy.

Figure A1. The method based upon rescaling has better approximation performance for the Topology Zoo data set than the original analytical model.

Table 1. Properties of seven real-world networks.

Table 2. Results for the shifted model for seven large-scale real-world networks.

Table 3. The recoverability metric R for different recovery strategies for different kinds of synthetic networks. 'Rand' denotes the random recovery strategy; 'Deg' the degree-based recovery strategy; 'Bet' the betweenness-based recovery strategy; 'Deg-up' the updated degree recovery strategy; 'Bet-up' the updated betweenness recovery strategy; 'Greedy-deg' the greedy-degree recovery strategy; and 'Greedy-bet' the greedy-betweenness recovery strategy.

Table A1. The results of the rescaled model for seven large-scale real-world networks.