Acquaintance immunization with limited knowledge of network structure

Optimal and efficient immunization of large networks remains a challenging task. Many theories and approaches have been suggested, but most of them require complete knowledge of the underlying network structure. Here, we study a targeted immunization strategy that accounts for the fact that knowledge of the network structure is often limited. Previous work has suggested 'acquaintance' immunization, where rather than immunizing a randomly selected individual, one of that individual's acquaintances is immunized. We generalize acquaintance immunization to the case where, rather than selecting a random acquaintance, we examine the degrees of n acquaintances and immunize the one with the highest degree. We develop and solve an analytic framework for this model and verify it with extensive numerical simulations. We determine the critical percolation threshold p_c and the size of the giant component, P_∞, for arbitrary degree distributions. We also apply our immunization strategy to real-world networks and determine the variation of p_c with increasing n. We find that our new approach improves on both acquaintance immunization and random immunization using limited knowledge.

Within network science, the goal of immunization has been to disconnect nodes into separate components such that no virus is able to spread between them [10,11]. To achieve this goal, percolation theory from statistical physics has been used, with the network connectivity being measured after a fraction of nodes or links are immunized (removed) [12,13]. It has been shown that random immunization is generally inefficient on real-world networks, which tend to have a scale-free (SF) degree distribution [11,14], whereas targeted immunization can quickly reduce the susceptible portion of the network [15][16][17][18]. At the same time, most targeted immunization strategies assume full knowledge of the network structure [18][19][20][21][22][23][24][25], and while effort has been devoted to developing algorithms for reconstructing networks [26,27], this remains a challenging area. In contrast, several approaches have attempted to understand how targeted immunization can be carried out with limited knowledge, including using a percolation-based theoretical framework [28][29][30][31][32]. One particular strategy that requires minimal knowledge of the network is acquaintance immunization [2], which leverages the 'friendship paradox' [33]. Specifically, acquaintance immunization selects a random individual and, rather than immunizing them, immunizes one of their neighbors in the network.
Here, we consider a generalization of acquaintance immunization that improves its effectiveness in immunizing the network. Rather than assuming only knowledge of the selected node and choosing a random neighbor of it, we compare the degrees of the node itself and n − 1 of its neighbors and immunize the node of highest degree among these n. We note that in this model n is assumed to be small, generally less than 5, and certainly much smaller than N, the number of nodes in the network. This model still requires only local knowledge of the network. The information on the degrees of several neighbors could be gleaned by a group of public health investigators, each provided with some number of immunizing doses, who are sent out to optimally immunize the network. In contrast to the acquaintance immunization of [2], where the investigators would just ask an individual to name a neighbor, here the investigators could ask the individual to name n − 1 friends and then guess which of these has the most connections. In this manner, only diffuse and local information about the network is needed. Another possible setting is one where some fraction of individuals are sent doses and, instead of being asked to give the dose to a neighbor, are told to assess who is the most connected among n − 1 neighbors and themselves, and then give the dose to that person.
After implementing this model, we examine, both analytically and numerically, the size of the largest connected component P_∞, representing the fraction of individuals that the virus can spread to, and the critical point p_c, below which spread is curtailed. We solve our model for arbitrary degree distributions and then analyze the specific cases of Erdős-Rényi (ER) networks and SF networks. Furthermore, we show that by comparing the degrees of only a few individuals, n ≈ 3, we can considerably improve the rate of immunization, to the point where less than 30% of individuals need to be immunized to stop the spread on an SF network. Finally, our approach is more efficient than both previous work on acquaintance immunization [2] and recent work on immunization with limited information where n individuals were selected randomly [28].

Model description
Let G(V, E) be a graph where V and E are the sets of nodes and edges respectively, and N = |V| is the number of nodes in the graph. In our model, at each stage, an immunizing agent has limited knowledge about the neighborhood of a particular node. For instance, the agent might be able to ask about the degrees of a set of nodes in the neighborhood of node i, denoted as Γ_i = {d_{u_1}, . . . , d_{u_n}}, where d_{u_j} is the degree of node u_j and n = |Γ_i| is the knowledge index, or number of nodes whose degrees are known. In our model, the nodes whose degrees are known are node i itself and n − 1 of its neighbors. For simplicity, we assume that n is the same at each stage of the targeted immunization process.
In figure 1, we illustrate our immunization strategy, acquaintance immunization with limited knowledge (AILK), on an example network. The knowledge index in this example is n = 3, and the node initially selected, the index node, is u_1. We then evaluate the degrees of n − 1 randomly chosen neighbors of u_1, such that we ask for the degrees of nodes in the set Γ = {u_1, u_2, u_3}. We then identify the node with highest degree, in this case u_3, and select it to be immunized.
More generally, we assume that the agent randomly selects a node i and obtains information on n − 1 neighbors of node i, as well as information on i itself. The agent selects the node with highest degree among the n nodes to immunize (if multiple nodes share the highest degree, we randomly select one of them). This procedure is repeated until a 1 − p fraction of nodes is removed from the network. We assume that the degree d of every node in the network satisfies d ∈ [k_min, k_max].
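The selection rule described above can be sketched in a few lines of Python. This is a minimal illustration, not the simulation code used for the results in this paper: the function name `ailk_immunize`, the adjacency-dictionary input, the use of degrees from the original network, the convention that a query landing on an already immunized node is simply wasted, and the `max_queries` guard are all assumptions of the sketch.

```python
import random

def ailk_immunize(adj, n, num_remove, seed=0, max_queries=10**7):
    """Pick nodes to immunize with the AILK rule (illustrative sketch).

    adj: dict mapping each node to the set of its neighbours.
    n: knowledge index (the index node plus up to n - 1 of its neighbours).
    Returns the set of immunized nodes and the number of queries used.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    degree = {u: len(adj[u]) for u in nodes}  # assumption: original degrees
    immunized = set()
    queries = 0
    # The guard stops the loop if the target cannot be reached, e.g. when
    # some nodes can never win the degree comparison.
    while len(immunized) < num_remove and queries < max_queries:
        queries += 1
        index = rng.choice(nodes)  # randomly selected index node
        neighbours = list(adj[index])
        # Degrees known to the agent: the index node and n - 1 random neighbours.
        known = [index] + rng.sample(neighbours, min(n - 1, len(neighbours)))
        top = max(degree[u] for u in known)
        # Ties for the highest degree are broken uniformly at random.
        target = rng.choice([u for u in known if degree[u] == top])
        immunized.add(target)  # assumption: repeated hits are wasted queries
    return immunized, queries
```

On a star graph with n = 2, for instance, every query compares a leaf with the hub (or the hub with a leaf), so the hub is immunized by the first query. Setting n = 1 reduces the rule to random immunization of the index node.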

Critical point p c
We begin by deriving the critical point p_c, where for any p < p_c the network is sufficiently immunized such that viral spread is contained. We note that this derivation builds upon the derivation on pages 164-166 of [34], where p_c was derived for ordinary acquaintance immunization.
Figure 1. Schematic illustration of acquaintance immunization with limited knowledge (AILK). In this example network we consider the case where the knowledge index is n = 3. (left) At the initial stage, a randomly chosen node u_1, outlined with a bold solid circle, is identified by the immunizing agent, and its degree is found, in this case k = 4. (right) In AILK, the agent then obtains the degrees of n − 1 = 2 other nodes that are neighbors of node u_1, and the node with the largest degree among these n = 3 is selected for immunization. In our example, among the set {u_1, u_2, u_3} the agent selects node u_3 to be immunized. This procedure is iterated until the desired 1 − p fraction of nodes is removed.

We start with the simplest non-trivial case n = 2 and examine the following branching process. Consider a random edge leading to a node u with degree k on layer l, i.e. at distance l from the initially selected node. Let $a_k$ denote the event that a node with degree k is not removed. Then $n_l(a_k)$ denotes the number of nodes with degree k on layer l which are not removed. Node u has k − 1 edges (excluding the incident edge) connected to the next layer l + 1. Randomly following one of these edges, it will encounter a node of degree k′ with probability $p(k'|k,a_k)$, so that

$$n_{l+1}(a_{k'}) = \sum_k n_l(a_k)\,(k-1)\,p(k'|k,a_k)\,p(a_{k'}|k',k,a_k). \qquad (1)$$

Using Bayes' rule, equation (1) becomes

$$n_{l+1}(a_{k'}) = \sum_k n_l(a_k)\,(k-1)\,\frac{p(a_k|k,k')\,p(k'|k)}{p(a_k|k)}\,p(a_{k'}|k',k,a_k).$$

Assuming no degree-degree correlations, $p(k'|k) = k'P(k')/\langle k\rangle$.

Now, recall that the number of nodes in the network is N, and we would like to immunize a fraction f of nodes. Let $\omega_p(k',k)$ be the probability that a specific node with degree k′ is not selected by one of its neighbors with degree k in all Nf immunizing attempts. This satisfies [equation missing], where θ is the real Heaviside step function, θ(x) = 1 for x ⩾ 0 and θ(x) = 0 otherwise. If the degree of the neighbor is not known, then we can take the average over k,

$$\omega_p(k') = \langle \omega_p(k',k) \rangle_k.$$

Thus the probability that a node with degree k′ is not removed during the branching process is

$$p(a_{k'}|k') = \omega_p^{k'}(k').$$

For a node with a neighbor having degree k, the probability that this node with degree k′ is not removed is

$$p(a_{k'}|k',k) = \omega_p(k',k)\,\omega_p^{k'-1}(k').$$

Substituting the above equations into equation (1), we have [equation (7) missing]. To derive the above critical condition, we approximated [expression missing]. Note that the summation on the right side is independent of k′, hence we can write $n_l(k)$ in the form of [equation missing], where $c_l$ is a constant associated with layer l. Substituting this into equation (7), we have [equation missing]. The condition that the branching process continues is [equation (9) missing]. The critical threshold $f_c$ is obtained when the left side of equation (9) equals 1. Then, the critical point $p_c$ is [equation missing], while $f_c$ satisfies the condition [equation missing], where P(k) is the degree distribution and $\omega_{f_c}$ is [definition missing], as in [2].

Interestingly, for the case of general n, we arrive at a simple expression. First, we define [equation missing]. We then let M be the kernel matrix, where [definition missing]. In this case the condition for the critical point is

$$\rho(M) = 1,$$

where ρ(M) is the spectral radius of matrix M (the largest eigenvalue of matrix M) and $p_c = 1 - \sum_k P(k)\,\omega_{f_c,n}^{k}(k)$ (see appendix 'Spectral radius' for details).

Giant component P ∞ near the critical point
Above we have derived the probability that a node with degree k is not removed after Nf queries, i.e. $p(a_k|k)$.
Here we use the generating function framework to obtain the final giant component size $P_\infty$ [14]. Let $G_0(x) = \sum_k P(k)x^k$ be the generating function for the network. After Nf queries, the generating function of the remaining network is

$$F(x) = \sum_k P(k)\,Q_{2,p}(k)\,x^k,$$

where $Q_{2,p}(k)$ is the probability that nodes with degree k are not removed after Nf queries. This quantity $Q_{2,p}$ is equal to $p(a_k|k)$ derived earlier, i.e. $Q_{2,p}(k) = \omega_p^{k}(k)$.
Then the generating function of the remaining network is

$$F(x) = \sum_k P(k)\,\omega_p^{k}(k)\,x^k.$$

The probability that randomly following an edge leads to a node with degree k is $P_1(k) = kP(k)/\langle k\rangle$. The giant component $P_\infty$ then follows from [equations missing], where $u = 1 - F_1(1) + F_1(u)$. Together with equation (11), $Q_{2,p}(k)$ can be estimated as [equation missing]. Here [definition missing], and f satisfies the equation $p = 1 - \sum_k P(k)\,\omega_f^{k}(k)$. More details and the general case for n > 2 can be found in appendix 'Spectral radius'.

Random networks

ER networks
In this section we study AILK on ER networks. First, we determine the giant component P_∞. Using equation (19), the giant component P_∞ can be solved numerically for a given p. In figure 2(a), we plot the giant component P_∞ as a function of 1 − p for n = 1 (random immunization) and n = 2, 3. We find that as n increases the giant component decreases for a given level of immunization. In figure 2(b), we show similar plots for fixed n but varying ⟨k⟩, finding that increasing the mean degree of the network leads to a larger giant component. In both plots we observe excellent agreement between our analytical results and simulations near the critical threshold.
Next, we study the critical behavior under AILK. Surprisingly, we find that one does not need a very large knowledge index n to achieve the same effect as targeted immunization with complete information, as seen in figure 2(c). Even for n = 2 or other small values of n, AILK represents a considerable improvement over the random case (n = 1). The critical threshold p_c increases with n, with the largest change occurring between n = 1 and n = 2, corresponding to the difference between no knowledge and some knowledge. The benefits of further information accrue more slowly, however, and p_c quickly levels off.
In figure 2(d), we consider the critical threshold p c as a function of mean degree ⟨k⟩ on ER networks with fixed n. Unsurprisingly, increasing the degree decreases p c as more immunizations are needed to segregate the network.
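As a point of reference for the n = 1 case, random immunization on an ER network reduces to ordinary site percolation, for which the self-consistent equation for u has a simple closed form. The sketch below solves it by fixed-point iteration. It covers only this random baseline, where every node survives with the same probability p, and is not the AILK solution for n ⩾ 2; the function name and the iteration count are choices made for illustration.

```python
import math

def er_random_giant_component(mean_degree, p, iterations=5000):
    """Giant component fraction for random removal on an ER network.

    For Poisson degrees G0(x) = G1(x) = exp(<k>(x - 1)). If every node is
    kept with probability p, then u solves u = 1 - p + p * G1(u) and the
    giant component is P_inf = p * (1 - G0(u)).
    """
    u = 0.0
    for _ in range(iterations):
        # Fixed-point update of the self-consistent equation for u.
        u = 1.0 - p + p * math.exp(mean_degree * (u - 1.0))
    return p * (1.0 - math.exp(mean_degree * (u - 1.0)))
```

For ⟨k⟩ = 4 the returned value vanishes for p below 1/⟨k⟩ = 0.25 and grows continuously above it. Convergence of the plain iteration becomes slow very close to the threshold, so a larger `iterations` value or a root finder would be preferable there.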

SF networks
We next turn to SF networks. We consider the degree distribution with maximum and minimum degree cut-offs such that [35]

$$P(k) \propto k^{-\gamma}, \qquad k_{\min} \le k \le k_{\max}.$$

In the simulations, we generate SF networks of size N = 10^6 using the configuration model from [4]. Similar to the ER network, we can substitute the degree distribution of the SF network into equation (19) and calculate the giant component P_∞ numerically. In figure 3(a), we obtain the giant component P_∞ of SF networks under AILK with different knowledge indices, finding that the simulation results agree well with the theoretical results. As the knowledge index n increases, SF networks show a more significant gap in p_c than ER networks. In figure 3(b), we find that around 30% of nodes in an SF network have to be immunized using our strategy with n = 3 to stop the spread.
We next compare both the giant component in figure 4(a) and the value of p_c in figure 4(b) for acquaintance immunization [2], random immunization under limited knowledge [28], and our current work on AILK on SF networks. We see that AILK immunizes the network significantly faster than immunization with limited knowledge where a random node is selected (as opposed to acquaintances) [28], and also immunizes the network faster than ordinary acquaintance immunization [2]. AILK requires slightly more effort than acquaintance immunization, as the randomly selected node must be asked not only about the identity of a friend, but also about the relative popularity of several friends. However, the additional level of effort is reasonable and the improvement in immunization efficiency is likely to be worthwhile.
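An SF network with degree cut-offs can be generated along the following lines. This is a small sketch rather than the exact procedure of [4]: the function name `sf_configuration_model`, the parity fix on the first node, and the choice to drop self-loops and merge repeated edges are assumptions made for illustration.

```python
import random

def sf_configuration_model(num_nodes, gamma, k_min, k_max, seed=0):
    """Build a simple graph whose degrees follow P(k) ~ k^-gamma on
    [k_min, k_max], using stub matching (illustrative sketch)."""
    rng = random.Random(seed)
    ks = list(range(k_min, k_max + 1))
    weights = [k ** (-gamma) for k in ks]  # unnormalized power-law weights
    degrees = rng.choices(ks, weights=weights, k=num_nodes)
    if sum(degrees) % 2:
        # The number of stubs must be even; this may exceed k_max by one.
        degrees[0] += 1
    # One stub per unit of degree, then pair stubs uniformly at random.
    stubs = [u for u, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    adj = {u: set() for u in range(num_nodes)}
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:  # assumption: self-loops dropped, multi-edges merged
            adj[a].add(b)
            adj[b].add(a)
    return adj
```

Dropping self-loops and repeated edges slightly lowers the realized degrees of the largest hubs, an effect that becomes negligible when k_max is much smaller than the number of nodes.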

Real-world networks
Finally, we test our model on six real-world networks in figure 5. They are (a) the autonomous computer networks from the Skitter project [36], (b) the co-authorship network (dblp) [37], (c) the YouTube video-sharing network [37], (d) the social friendship network from Gowalla [38], (e) the co-authorship network of the DBLP computer science bibliography [39] and (f) a network of Linux (v3.16) source code file inclusion [39]. The details of the network structure are shown in table 1. We use the peak of the second largest component to characterize the critical point p_c. For a given n, the box diagram shows the distribution of the critical point over 200 independent realizations, plotted together with the theoretical results for an SF network with the corresponding γ. We find that the critical point p_c in empirical data is generally consistent with theoretical predictions that consider only the degree distribution.
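Locating the critical point from the peak of the second largest component can be sketched as follows. The helper names `component_sizes` and `critical_point_from_second_peak` are hypothetical, and components are recomputed from scratch after every removal, which is simple but far too slow for networks of the size studied here; it is meant only to make the criterion concrete.

```python
from collections import deque

def component_sizes(adj, removed):
    """Sizes of connected components among non-removed nodes, largest first."""
    seen = set(removed)
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:  # breadth-first search over surviving nodes
            u = queue.popleft()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
        sizes.append(size)
    return sorted(sizes, reverse=True)

def critical_point_from_second_peak(adj, removal_order):
    """Remaining fraction p at which the second largest component peaks
    along the given removal sequence."""
    total = len(adj)
    removed = set()
    best_p, best_second = 1.0, 0
    for count, node in enumerate(removal_order, start=1):
        removed.add(node)
        sizes = component_sizes(adj, removed)
        second = sizes[1] if len(sizes) > 1 else 0
        if second > best_second:  # keep the first occurrence of the maximum
            best_p, best_second = 1.0 - count / total, second
    return best_p
```

Here `removal_order` would be the sequence of nodes immunized by the strategy under study, and repeating the measurement over independent realizations gives a distribution of estimates for p_c.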

Discussion
Here we defined and explored acquaintance immunization with limited knowledge (AILK), a new immunization approach combining acquaintance immunization and limited knowledge of a network. The model assumes that at each step a seed node is randomly selected from the network and its degree information is obtained. Then n − 1 nodes are randomly selected from the seed node's neighbors, and the node with largest degree among the total known n nodes is immunized. Importantly, this novel approach, while requiring more information than ordinary acquaintance immunization, does not require full knowledge of the network, rather only local knowledge of a node and a few neighbors. We find that AILK lends itself to analytic treatment, allowing us to calculate the giant component and critical threshold analytically, which we then confirmed via simulations. Using the branching process and the combined probability method, we first derived the equations of critical threshold point p c for the case of n = 2, and developed a method to estimate the giant component. Simulations showed excellent agreement with the analytic results. Surprisingly, our results showed that even if only a few of the neighbors' degrees are examined and the knowledge index (n) is low, the improvement in p c can be considerable. For SF networks, while random immunization would require almost all individuals to be immunized (p c → 1), for AILK with n = 3 we find that only 30% of individuals need to be immunized. This also represents an ≈5% improvement over ordinary acquaintance immunization which does not take the effort to ask the individual about the degrees of their neighbors.
This new immunization strategy could help public health authorities reduce spreading during epidemics and better immunize large populations. Furthermore, our findings could have broader applications beyond just public health and immunizations against epidemics. For example, our strategy could also be used in immunizing computer networks from viruses through software updates or disrupting terrorist networks by asking individuals to identify their most connected contact.
In addition, several extensions could be made to our initial work here. For example, one could consider including community structure in the underlying networks [40]. If the connections both within and between the communities are Poisson (leading to ER-like communities) then the results will be the same as our results for ER networks and our analytic results will hold. In contrast, in many cases the interconnected nodes are of interest [25,41,42] and how our model could incorporate potential limited knowledge of a node's interconnections would be an interesting extension. Likewise, considering our immunization strategy on multiplex networks would be a worthwhile extension [43,44]. Finally, another important extension could incorporate the fact that individuals might have 'noisy' knowledge about the degrees of their neighbors and thus estimates of the neighbor with the highest degree might not always be correct.

Data availability statement
The data cannot be made publicly available upon publication because no suitable repository exists for hosting data in this field of study. The data that support the findings of this study are available upon reasonable request from the authors.

Appendix. General case: n

Spectral radius
First, we introduce the spectral radius for a vector iterative system. Let $\lambda_1, \lambda_2, \cdots, \lambda_J$ be the eigenvalues of matrix M in strictly decreasing order of magnitude, i.e. $|\lambda_i| > |\lambda_j|$ for $i < j$, and let $v_1, v_2, \cdots, v_J$ be the corresponding eigenvectors. In graph theory, $\lambda_1$ (the largest eigenvalue) is called the spectral radius of M, denoted ρ(M). For the vector $n_0$ in a J-dimensional linear space, there exists a linear expression of $n_0$ in terms of the eigenvectors $v_1, \cdots, v_J$ [equation missing]. Thus we have the two equations [equations missing].
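The spectral radius of a non-negative matrix can be estimated numerically by applying the matrix repeatedly to a positive vector, which mirrors the layer-to-layer iteration of a branching process. The following is a minimal sketch, assuming a dense list-of-lists matrix with a unique dominant eigenvalue; the function name `spectral_radius` is hypothetical and the matrix in the usage note is arbitrary, not the kernel matrix M of the model.

```python
def spectral_radius(matrix, iterations=200):
    """Estimate the largest eigenvalue of a non-negative square matrix
    by power iteration on n_{l+1} = M n_l (illustrative sketch)."""
    size = len(matrix)
    vec = [1.0] * size  # start from a strictly positive vector
    growth = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(size))
               for i in range(size)]
        norm = sum(new)
        if norm == 0.0:
            return 0.0  # the iteration died out completely
        growth = norm / sum(vec)  # layer-to-layer growth factor
        vec = [x / norm for x in new]  # renormalize to avoid overflow
    return growth
```

For the matrix [[1, 2], [3, 4]] the estimate converges to (5 + √33)/2, its largest eigenvalue. A growth factor above 1 means the iterated vector keeps growing from layer to layer, while a value below 1 means it decays.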

Critical point p c
We adapt the derivation of acquaintance immunization (mainly on pages 164-167 of [34]) to find the critical point $p_c$ for general n. Let A represent the event that a node with degree k′ is not removed as a result of being a neighbor of a node with degree k in Nf queries. The probability of A is

$$P(A) = \exp\big(-f\phi_n(k',k)\big).$$

Taking the average over k, the probability that a node with degree k′ is not removed after Nf queries is

$$p(a_{k'}|k') = \big\langle \exp\big(-f\phi_n(k',k)\big) \big\rangle_k^{\,k'},$$

where $\langle \cdot \rangle_k$ means taking the average over k. Accordingly, the probability that a node with degree k′ is not removed provided that a neighbor's degree is known to be k is

$$p(a_{k'}|k',k) = \exp\big(-f\phi_n(k',k)\big)\,\big\langle \exp\big(-f\phi_n(k',k'')\big) \big\rangle_{k''}^{\,k'-1}.$$

Let $\omega_{f,n}(k') = \langle \exp(-f\phi_n(k',k)) \rangle_k$. Using Bayes' rule, and substituting the above equations into equation (7), the equation of the branching process is

$$n_{l+1}(a_{k'}) = \sum_k n_l(a_k)\,(k-1)\,P_1(k')\,\frac{\omega_{f,n}^{k'-1}(k')}{\omega_{f,n}(k)}\,\exp\big(-f(\phi_n(k,k')+\phi_n(k',k))\big). \qquad (A.7)$$

Let us rewrite equation (A.7) in matrix form as [equation missing], where [definitions of the iterated vector and of the kernel matrix M missing; the surviving fragments involve $n_l(a_k)$, $\omega_{f,n}(k)$ and $(k-1)P_1(k')\,e^{-f(\phi_n(k,k')+\phi_n(k',k))}$]. Thus the critical point $p_c$ satisfies: $\exists\, l \in \mathbb{N}$, $f_c \in \mathbb{R}$, s.t. [equation missing], where ⊘ represents the element-wise division operator for the matrix. Suppose that initially all nodes are remaining in the network. We thus start with an initial vector [equation missing].

Giant component P ∞
Let $Q_{n,p}(k)$ be the probability that a node with degree k is not removed after Nf queries. The expression for this is [equation missing]. Then, letting $F(x) = \sum_k P(k)\,Q_{n,p}(k)\,x^k$, the giant component is [equation missing], where $u = 1 - F_1(1) + F_1(u)$.