Epidemic modelling requires knowledge of the social network

‘Compartmental models’ of epidemics are widely used to forecast the effects of communicable diseases such as COVID-19 and to guide policy. Although it has long been known that such processes take place on social networks, the assumption of ‘random mixing’ is usually made, which ignores network structure. However, ‘super-spreading events’ have been found to be power-law distributed, suggesting that the underlying networks may be scale free or at least highly heterogeneous. The random-mixing assumption would then produce an overestimation of the herd-immunity threshold for a given $R_0$, and a (more significant) overestimation of $R_0$ itself. These two errors compound each other, and can lead to forecasts greatly overestimating the number of infections. Moreover, if networks are heterogeneous and change in time, multiple waves of infection can occur, which are not predicted by random mixing. A simple SIR model simulated on both Erdős–Rényi and scale-free networks shows that details of the network structure can be more important than the intrinsic transmissibility of a disease. It is therefore crucial to incorporate network information into standard models of epidemics.

Throughout the recent COVID-19 pandemic, 'compartmental models', such as the SIR or SEIR models, were widely used to forecast the likely number of infections, hospitalisations and deaths from the disease under different scenarios [1][2][3], particularly as a guide to making 'non-pharmaceutical interventions' (NPIs) [4,5]. However, doubts arose as to their predictive power [6]. In the UK, for example, the large waves of infections expected to occur in the absence of NPIs, in both the summer and winter of 2021, failed to materialise [7].
While these models can take account of many details of how the specific disease spreads, they usually make the assumption of 'random mixing': that any individual can infect any other [8]. However, people are in fact connected according to a social network [9]. It has been known for decades that network topology has important effects on spreading processes [8][9][10][11][12][13][14], but in practice it is difficult to gather data on this web of contacts. Moreover, accounting for the network explicitly with agent-based modelling may be computationally prohibitive at the scale of, say, a whole country. Hence, random mixing (albeit with a degree of structure captured by the inclusion of different groups of people) remains the standard assumption [15].
Social networks of various kinds have been found to be highly heterogeneous, in that the degree, $k$, of vertices (i.e. the number of contacts of each person) follows a distribution with a high variance [16,17]. Scale-free networks, in which this distribution is a power law, $p(k) \sim k^{-\alpha}$, with $\alpha$ usually between 2 and 3, are an extreme example of this. For instance, a network of sexual contacts was observed to follow this rule with $\alpha \simeq 2.4$ [18]. Although we do not have detailed information on the network of contacts underlying the spread of respiratory viruses, we do know that, in the early stages of an epidemic, COVID-19 is driven largely by 'super-spreading events' (SSEs) [19], with one estimate suggesting that fewer than 10% of infectious individuals accounted for 80% of infections [20]. The importance of SSEs was also shown in the case of SARS [21]. Moreover, Fukui and Furukawa [22] found that the distribution of these SSEs (that is, of the number of individuals infected on each occasion) followed power laws in the cases of SARS, MERS and COVID-19. This suggests that the underlying networks have highly heterogeneous degree distributions, which would be consistent with other studies of social networks [16][17][18].

* s.johnson.4@bham.ac.uk
This letter uses an agent-based version of an SIR model to illustrate how the random-mixing assumption can lead to very large errors in the total number of people predicted to become infected in a given epidemic 'wave', if the network is scale free. This is not just because, for a given basic reproduction number, $R_0$, the 'herd immunity threshold' (HIT) is generally lower on a scale-free network; but, more significantly, because the initial rapid growth in infections in the scale-free case can lead to an overestimation of $R_0$. As we shall see, the combination of these two effects can produce a random-mixing forecast of over 80% of the population becoming infected, when in fact only 20% are affected before the epidemic dies down naturally.
Conversely, once an epidemic has reached herd immunity, the random-mixing assumption predicts that the population is safe from further waves unless immunity wanes. However, if the networks are scale free and change in time, multiple waves can occur even as individual immunity is maintained.
The COVID-19 pandemic involved multiple waves of infection in countries with quite different levels of stringency in their NPIs. This seems more compatible with a process taking place on time-varying, heterogeneous networks than with the predictions of random-mixing models.

II. THE EFFECT OF THE NETWORK
Consider a network in which each vertex represents an agent, and edges are contacts which potentially allow for contagion of a transmissible disease. We compare two different topologies here: Erdős–Rényi (ER) networks, in which the edges are placed entirely at random among the vertices [23]; and scale-free (SF) networks. The latter are constructed by drawing desired degrees from a distribution $p(k) \sim k^{-\alpha}$, and using the 'configuration model' to place the edges [9]. A 'structural cut-off' is imposed, such that $k < \sqrt{\langle k\rangle N}$, where $\langle k\rangle$ is the mean degree and $N$ the number of vertices. For the parameters used here, $\langle k\rangle = 5$ and $N = 10^4$, the maximum degree in the SF case is therefore 223. This is not unrealistically high for COVID-19 contact networks, since some SSEs saw over 100 people apparently infected by a single individual within a few hours. In both cases we consider undirected networks for simplicity, although directionality has been found to have an important influence on spreading processes [24]. The random-mixing assumption is a good mean-field description of the Erdős–Rényi case. However, as discussed, the scale-free network may be a better model for a real web of social contacts.
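The two network ensembles described above can be generated along the following lines. This is a minimal pure-Python sketch under our own assumptions, not the code behind the figures. The lower degree bound `k_min` of the power law is a hypothetical parameter that the text does not specify; one would tune it so that $\langle k\rangle \approx 5$. Self-loops produced by stub matching are discarded and multi-edges are merged, which slightly lowers some realised degrees. The ER ensemble is drawn as a $G(N, M)$ graph with exactly $M = N\langle k\rangle/2$ edges, a standard variant of the independent-edge construction. All function names are illustrative.

```python
import math
import random


def structural_cutoff(mean_k, n):
    """Largest integer degree strictly below sqrt(<k> N)."""
    return math.ceil(math.sqrt(mean_k * n)) - 1


def power_law_degrees(n, alpha, k_min, k_max, rng):
    """Draw n degrees with p(k) proportional to k**(-alpha) on k_min..k_max.

    k_min is a hypothetical knob (not given in the text) used to set <k>.
    """
    ks = list(range(k_min, k_max + 1))
    weights = [k ** (-alpha) for k in ks]
    degrees = rng.choices(ks, weights=weights, k=n)
    # The configuration model needs an even number of edge stubs:
    # redraw one random vertex's degree until the total is even.
    while sum(degrees) % 2:
        degrees[rng.randrange(n)] = rng.choices(ks, weights=weights)[0]
    return degrees


def configuration_model(degrees, rng):
    """Pair edge stubs uniformly at random; drop self-loops, merge multi-edges."""
    stubs = [v for v, k in enumerate(degrees) for _ in range(k)]
    rng.shuffle(stubs)
    adj = [set() for _ in degrees]
    for a, b in zip(stubs[::2], stubs[1::2]):
        if a != b:
            adj[a].add(b)
            adj[b].add(a)
    return [sorted(nbrs) for nbrs in adj]


def erdos_renyi_gnm(n, mean_k, rng):
    """G(N, M) random graph with M = round(N <k> / 2) distinct edges."""
    m = round(n * mean_k / 2)
    edges = set()
    while len(edges) < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.add((min(a, b), max(a, b)))
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


if __name__ == "__main__":
    rng = random.Random(42)
    n, mean_k = 10**4, 5
    k_max = structural_cutoff(mean_k, n)  # 223 for these values
    sf = configuration_model(power_law_degrees(n, 2.5, 2, k_max, rng), rng)  # k_min = 2 is hypothetical
    er = erdos_renyi_gnm(n, mean_k, rng)
    for name, adj in (("SF", sf), ("ER", er)):
        degs = [len(a) for a in adj]
        print(name, "mean degree", sum(degs) / n, "max degree", max(degs))
```

For $\langle k\rangle = 5$ and $N = 10^4$, `structural_cutoff` returns 223, the largest integer below $\sqrt{5 \times 10^4} \approx 223.6$.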
The epidemic is described by the following model. Every agent $v_i$ has a state $z_i(t)$ at discrete time $t$, which can take one of three values: S, I or R (Susceptible, Infectious or Recovered). If there is an edge between $v_i$ and $v_j$, and if $z_i(t) = I$ and $z_j(t) = S$, then with probability $\beta$ we set $z_j(t+1) = I$ (i.e. $v_j$ is infected by $v_i$). If $z_i(t) = I$, then $z_i(t+\tau) = R$ for all $\tau \geq 1$ (i.e. every agent recovers after one time step, and thereafter cannot change state, as though either immune or deceased). Agents are updated in parallel at every time step. This is a very simple version of an SIR model, with no allowance made for heterogeneity in transmission times, infectiousness or other features, nor for different categories of agents, such as asymptomatic individuals or children. Moreover, parallel updating is not always a good approximation for a continuous-time process [25], which would be simulated more realistically with a Gillespie algorithm [26]. The purpose of the model here is merely to highlight how knowledge of the network is crucial even in the simplest of settings.
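The update rule above can be written as a short parallel-update loop. This is a minimal sketch, assuming the network is stored as adjacency lists (`adj[v]` is the list of neighbours of agent `v`); the names `sir_step` and `run_sir` are ours. Each Infectious agent independently tries every Susceptible neighbour with probability $\beta$, so a Susceptible agent with $m$ Infectious neighbours is infected with probability $1 - (1-\beta)^m$. All state changes are applied only after every Infectious agent has been processed, which is what parallel updating means here.

```python
import random

S, I, R = 0, 1, 2  # Susceptible, Infectious, Recovered


def sir_step(adj, state, beta, rng):
    """Advance the parallel-update SIR model by one time step, in place."""
    infectious = [v for v, z in enumerate(state) if z == I]
    newly_infected = set()
    for v in infectious:
        for u in adj[v]:
            # each Infectious-Susceptible contact transmits independently with prob. beta
            if state[u] == S and u not in newly_infected and rng.random() < beta:
                newly_infected.add(u)
    # apply all changes only after every agent has been examined (parallel update)
    for v in infectious:
        state[v] = R  # recover after one step; never changes state again
    for u in newly_infected:
        state[u] = I


def run_sir(adj, beta, seeds, rng, t_max=10_000):
    """Return Infectious and Recovered fractions at t = 0, 1, ... until no agent is Infectious."""
    n = len(adj)
    state = [S] * n
    for v in seeds:
        state[v] = I
    frac_i, frac_r = [], []
    for _ in range(t_max):
        frac_i.append(state.count(I) / n)
        frac_r.append(state.count(R) / n)
        if frac_i[-1] == 0:
            break
        sir_step(adj, state, beta, rng)
    return frac_i, frac_r


if __name__ == "__main__":
    path = [[1], [0, 2], [1, 3], [2, 4], [3]]  # the path 0-1-2-3-4
    frac_i, frac_r = run_sir(path, beta=1.0, seeds=[0], rng=random.Random(0))
    print(frac_i)  # with beta = 1 the infection walks along the path one vertex per step
    print(frac_r)
```

The final Recovered fraction, `frac_r[-1]`, is the quantity $\rho$ discussed below; averaging such runs over many network realisations gives curves of the kind shown in Fig. 1.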
Consider the situation where initially all agents are Susceptible except for one randomly chosen agent, which is made Infectious at time $t = 0$. If the mean degree of the network is $\langle k\rangle$, at $t = 1$ the expected number of Infectious agents will be $\langle k\rangle\beta$, so the basic reproduction number will be

$$R_0 = \langle k\rangle\,\beta. \qquad (1)$$

The expected mean degree of the newly Infectious agents, however, is not $\langle k\rangle$, but $\langle k^2\rangle/\langle k\rangle$, where $\langle k^2\rangle$ is the second moment of the degree distribution. (This is an instance of the 'friendship paradox': on average, your friends have more friends than you [27].) So, taking into account that one of the contacts of each newly Infectious agent is the originally Infectious vertex, the effective reproduction number, $R_t$, at the next time step ($t = 1$) is

$$R_1 = \left(\frac{\langle k^2\rangle}{\langle k\rangle} - 1\right)\beta = \left(\langle k\rangle + \frac{\sigma^2}{\langle k\rangle} - 1\right)\beta, \qquad (2)$$

where $\sigma^2$ is the variance of the degree distribution, $p(k)$. If the network is an Erdős–Rényi random graph, this is a Poisson distribution, so $\sigma^2 = \langle k\rangle$ and $R_1 = R_0$. However, if degrees are distributed more heterogeneously, as in a scale-free network, then $R_1 > R_0$ [9]. In other words, the epidemic accelerates as it reaches more highly connected vertices (hubs).
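Equations (1) and (2), $R_0 = \langle k\rangle\beta$ and $R_1 = (\langle k^2\rangle/\langle k\rangle - 1)\beta$, can be evaluated directly from any degree sequence. The sketch below (function name ours) does so. Note that for a regular graph, where $\sigma^2 = 0$, Eq. (2) gives $R_1 = R_0 - \beta$, so the early acceleration requires $\sigma^2 > \langle k\rangle$, i.e. degrees more dispersed than Poisson.

```python
def reproduction_numbers(degrees, beta):
    """Return (R0, R1) from Eqs. (1) and (2) for a degree sequence and infection probability beta."""
    n = len(degrees)
    mean_k = sum(degrees) / n
    mean_k2 = sum(k * k for k in degrees) / n
    r0 = mean_k * beta                  # Eq. (1): a random vertex has <k> contacts
    r1 = (mean_k2 / mean_k - 1) * beta  # Eq. (2): a neighbour has <k^2>/<k> contacts, one already used
    return r0, r1


if __name__ == "__main__":
    star = [1] * 9 + [9]                # star graph: one hub joined to nine leaves
    print(reproduction_numbers(star, beta=0.5))       # R0 = 1.8*beta, R1 = 4.0*beta
    print(reproduction_numbers([5] * 100, beta=0.48))  # regular graph: R1 = R0 - beta
```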
On the other hand, the epidemic acts as a targeted attack on the network: by infecting the hubs early on, it removes edges more rapidly in a heterogeneous network than in a homogeneous one, with the result that, in the heterogeneous case, fewer vertices may end up becoming Infectious before the epidemic peters out. This is an instance of a more general effect whereby, if susceptibility and infectiousness are both heterogeneously distributed and positively correlated in a population, the 'herd immunity threshold' (HIT), i.e. the proportion of the population that has been infected when $R_t$ drops below one, is lower than we would expect from the standard equation $\mathrm{HIT} = 1 - 1/R_0$, which follows from the assumption of random mixing [28].
Figure 1 shows averages over time series for the proportion of agents that are Infectious (panel A) or Recovered (B), for three different scenarios. The dark blue circles correspond to Erdős–Rényi random graphs with $\langle k\rangle = 5$. The infection probability is $\beta = 0.48$ so, according to Eq. (1), $R_0 = 2.4$. Eventually about 88% of agents become infected. This is fairly close to the prediction of 81% for COVID-19 infections in the UK and the US made in March 2020 by the group led by Prof. Neil Ferguson [4] for the case in which no NPIs were introduced, based on an estimate of $R_0 = 2.4$, despite the much greater sophistication of their model.
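For comparison, the random-mixing benchmarks for $R_0 = 2.4$ can be computed in a few lines: the threshold $\mathrm{HIT} = 1 - 1/R_0$, and the classic final-size relation $\rho = 1 - e^{-R_0\rho}$ for the fraction ever infected. On an ER graph with $R_0 = \langle k\rangle\beta$, the same relation describes the giant cluster of bond percolation with occupation probability $\beta$. For this reason it should land close to the 88% quoted above, well above the HIT of about 58%. The sketch solves the relation by plain fixed-point iteration, a choice made for brevity; the function names are ours.

```python
import math


def herd_immunity_threshold(r0):
    """Random-mixing HIT = 1 - 1/R0 (no threshold to reach when R0 <= 1)."""
    return max(0.0, 1.0 - 1.0 / r0)


def final_size(r0, tol=1e-12, max_iter=100_000):
    """Fraction ever infected under random mixing: solve rho = 1 - exp(-R0 rho).

    Fixed-point iteration from rho = 1 converges to the non-zero root when
    R0 > 1, and towards 0 when R0 < 1.
    """
    rho = 1.0
    for _ in range(max_iter):
        updated = 1.0 - math.exp(-r0 * rho)
        if abs(updated - rho) < tol:
            return updated
        rho = updated
    return rho


if __name__ == "__main__":
    r0 = 2.4
    print(f"HIT        = {herd_immunity_threshold(r0):.3f}")
    print(f"final size = {final_size(r0):.3f}")
```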
The light blue triangles in Fig. 1 are for the same parameter values ($\langle k\rangle = 5$ and $\beta = 0.48$), but now the networks are scale free, with exponent $\alpha = 2.5$. The curve now grows significantly faster and peaks at a higher value, yet also falls more quickly, eventually infecting a smaller proportion of the population (72%) than in the ER case.

FIG. 3. Proportion of agents ever infected, $\rho$, against estimated basic reproduction number, $R_0^e$, from Eq. (4), for SF networks with exponent $\alpha = 2$ (dark blue circles) and $\alpha = 3$ (light blue triangles), and for ER random graphs (red diamonds). Different values for the same network correspond to the different values of $\beta$ used in Fig. 2 A and B. All other parameters as in Fig. 2.
The red diamonds also correspond to SF networks with $\alpha = 2.5$, but now $\beta = 0.12$. In this case, the curve initially follows a very similar trajectory to the ER network with $\beta = 0.48$; but it peaks earlier at a lower value, and goes on to infect only 20% of the population.
This example illustrates how two different scenarios (high transmissibility on a homogeneous network, and low transmissibility on a heterogeneous one) can initially follow very similar epidemic curves, yet go on to have markedly different outcomes.

III. MISMEASURING R0
In practice, it is not usually possible to obtain the value of $R_0$ from contact tracing. Rather, scientists estimate this number from the rate at which infections grow in the early stages of the epidemic, together with assumptions about the incubation period and duration of infectiousness [4,29]. For instance, if one assumes that each Infectious individual infects $R_0$ others after a period $\tau$, and the number of Recovered is low enough that one can assume exponential growth, then the number of Infectious individuals at time $t$ is

$$I(t) = I(0)\,R_0^{t/\tau}. \qquad (3)$$

Imagine a group of scientists living in a scale-free world who observed an epidemic growing, in its early stages, as the red diamonds of Fig. 1. If they assumed random mixing and estimated $R_0$ from Eq. (3), they would find that $R_0 \simeq 2.4$. Their model, even if quite sophisticated in other ways, may well then predict that the epidemic would evolve similarly to the dark blue circles. Moreover, if NPIs were then imposed, and the curve went on to peak earlier than forecast and well before the expected HIT, it would be natural to assume that $R_t$ had fallen below one thanks to the NPIs. Only if an epidemic were allowed to spread without added NPIs would it become apparent that the model's predictions were significantly wrong.
Figure 2 (A) shows the proportion of the population who have been infected after the wave has passed, $\rho = \lim_{t\to\infty} R(t)$, against $\beta$ for SF networks with $\alpha = 2$ and 3, and for ER networks. At low $\beta$, the epidemic reaches more agents on the SF networks, since the process does not percolate on ER networks for $\beta\langle k\rangle < 1$. However, for larger $\beta$ the epidemic reaches more agents on more homogeneous networks (i.e. the HIT is lower on SF networks [22]).
Figure 2 (B) shows the 'estimated $R_0$', or $R_0^e$. Using Eq. (3) and bearing in mind that in this model $\tau = 1$, this is defined as

$$R_0^e = \left[\frac{I(t)}{I(0)}\right]^{1/t}. \qquad (4)$$

In other words, $R_0^e$ is akin to the value of $R_0$ that a group of scientists might estimate from observations of the doubling time in the early stages of the epidemic. For ER networks, which are equivalent to random mixing, $R_0^e$ will be very close to $R_0$, as given by Eq. (1) ($R_0^e \simeq \langle k\rangle\beta$). However, we shall see that, for SF networks, $R_0^e$ can be significantly higher than this value ($R_0^e > \langle k\rangle\beta$). Thus, estimates of the transmissibility of a disease based on changes in the number of cases can be wrong if the underlying social network is heterogeneous.
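With $\tau = 1$, inverting the exponential-growth law $I(t) = I(0)\,R_0^{t/\tau}$ of Eq. (3) gives an estimate of the form $R_0^e = [I(t)/I(0)]^{1/t}$. The sketch below computes it from a list of Infectious counts, together with the corresponding doubling time $\tau \ln 2/\ln R_0$. The counts are hypothetical, and the window `t` is a free choice of ours rather than the one used for Fig. 2. On a scale-free network the growth factor per step is not constant (Eq. (2) gives $R_1 > R_0$), so the estimate depends on the window and is pulled upward by the early acceleration.

```python
import math


def estimated_r0(infectious, t, tau=1.0):
    """Invert Eq. (3): R0 = (I(t) / I(0)) ** (tau / t), with I a list of counts per step."""
    return (infectious[t] / infectious[0]) ** (tau / t)


def doubling_time(r0, tau=1.0):
    """Time for I to double under Eq. (3): tau * ln 2 / ln R0 (requires R0 > 1)."""
    return tau * math.log(2) / math.log(r0)


if __name__ == "__main__":
    counts = [50, 120, 288]  # hypothetical early counts growing by a factor 2.4 per step
    print(estimated_r0(counts, t=2))
    print(doubling_time(2.4))
```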
Fig. 2 (C) shows again the eventual proportion of infected agents, but now against $\alpha$ for SF networks, for different values of $\beta$. As $\alpha$ decreases, $\beta$ has less of an effect on the reach of the epidemic, suggesting that the intrinsic transmissibility of a disease is less significant if the network is highly heterogeneous. Fig. 2 (D) shows $R_0^e$ against $\alpha$ for SF networks. The estimated reproduction number is always greater the more heterogeneous the network, and in the $\beta = 0.1$ case the value of $\alpha$ can even determine whether $R_0^e$ is greater or less than one.
Another way of viewing these results is to plot $\rho$ against $R_0^e$, as in Fig. 3. On the ER networks, $\rho$ is very sensitive to $R_0^e$, as in random-mixing models. But as degree heterogeneity increases, this sensitivity decreases. For example, on the SF networks with $\alpha = 2$, there is a range for which a doubling in $R_0^e$ leads to barely a 20% increase in the proportion infected. Hence, if the network is highly heterogeneous, the estimated $R_0$ is very sensitive to $\beta$, yet the number of people who will become infected is not. In other words, it becomes more important to gain knowledge about the network than about the intrinsic transmissibility of the disease.

IV. MULTIPLE WAVES
Once an epidemic has petered out naturally, it is often assumed that herd immunity must have been achieved, and that the population is no longer vulnerable unless immunity wanes or transmissibility increases significantly. However, when the HIT is low thanks to the heterogeneity of the social network, a large pool of susceptible individuals may still remain even after a first 'wave' of infection. As long as the structure of the network is unchanged, the population will indeed have herd immunity. But if this structure is altered, the population may become vulnerable to subsequent waves of infection.
Figure 4 compares time series for ER and SF networks, as in Figure 1, but now at times $t = 15$ and $t = 30$ the network structure is replaced with a new one, and the epidemic is re-seeded by switching a small number of Susceptible agents to Infectious (all Recovered agents remain Recovered). Figures 4 A and B show the proportions of Infectious and Recovered agents, respectively. Once the epidemic has died down in the ER case, replacing the network with a new version and re-seeding the epidemic has virtually no effect, since there are insufficient remaining Susceptible agents for a new wave to occur. However, in the SF case, a new wave is seen every time the network is changed, albeit with each wave being smaller than the last. In panels A and B the transmissibility is constant ($\beta = 0.4$ and $0.1$ for the ER and SF networks, respectively). Figures 4 C and D, however, show time series for which, in addition to the network structure being changed, the transmissibility is increased to $\beta = 0.6$ (at $t = 15$) and $\beta = 1$ (at $t = 30$) for the ER networks, and to $\beta = 0.2$ (at $t = 15$) and $\beta = 0.4$ (at $t = 30$) for the SF networks. In the ER case there are still no further waves of infection. However, in the SF case there are now subsequent waves of increasing size.
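The rewiring protocol of Fig. 4 can be sketched as a loop that, at each switch time, redraws the whole network from the same ensemble, possibly changes $\beta$, and turns a few Susceptible agents Infectious while leaving Recovered agents untouched. This is a minimal sketch with hypothetical names. The graph factory shown is a $G(N, M)$ stand-in for the ER case; the SF case would be obtained by passing a generator of scale-free graphs (for example, a configuration model with power-law degrees) as `make_graph`. The sizes and schedule in the usage example are illustrative rather than those of Fig. 4.

```python
import random

S, I, R = 0, 1, 2  # Susceptible, Infectious, Recovered


def random_graph(n, mean_k, rng):
    """Stand-in generator: G(N, M) graph with M = round(N <k> / 2) distinct edges."""
    m = round(n * mean_k / 2)
    edges = set()
    while len(edges) < m:
        a, b = rng.randrange(n), rng.randrange(n)
        if a != b:
            edges.add((min(a, b), max(a, b)))
    adj = [[] for _ in range(n)]
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)
    return adj


def reseed(state, n_seeds, rng):
    """Set up to n_seeds randomly chosen Susceptible agents to Infectious."""
    susceptible = [v for v, z in enumerate(state) if z == S]
    for v in rng.sample(susceptible, min(n_seeds, len(susceptible))):
        state[v] = I


def run_with_rewiring(make_graph, n, beta_at, n_seeds, t_end, rng):
    """Parallel-update SIR in which the network is redrawn and re-seeded at each switch time.

    beta_at maps each switch time (including t = 0) to the transmissibility used
    from then on. Recovered agents stay Recovered across switches.
    Returns the Recovered fraction at every step t = 0..t_end.
    """
    state = [S] * n
    adj, beta = make_graph(rng), beta_at[0]
    reseed(state, n_seeds, rng)
    recovered = []
    for t in range(t_end + 1):
        if t > 0 and t in beta_at:
            adj, beta = make_graph(rng), beta_at[t]  # new contacts, same parameters
            reseed(state, n_seeds, rng)
        recovered.append(state.count(R) / n)
        infectious = [v for v, z in enumerate(state) if z == I]
        newly = {u for v in infectious for u in adj[v]
                 if state[u] == S and rng.random() < beta}
        for v in infectious:
            state[v] = R
        for u in newly:
            state[u] = I
    return recovered


if __name__ == "__main__":
    rng = random.Random(0)
    make_er = lambda r: random_graph(2000, 5, r)
    waves = run_with_rewiring(make_er, 2000, {0: 0.4, 15: 0.4, 30: 0.4}, 50, 45, rng)
    print([round(x, 3) for x in waves[::5]])
```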
In real life, should we expect the social networks behind epidemics to change? Certain connections may be quite stable, such as those between work colleagues, while others are transitory, say among people who happen to be attending the same event. The COVID-19 pandemic involved several waves of infection, something variously attributed to more infectious variants of the virus, changing NPIs or waning immunity. However, Figure 4 shows that an underlying network which is both heterogeneous and time-varying is enough to produce several waves, even when previous ones died down naturally.
V. CONCLUSION
While we may not have detailed information on the web of contacts underlying a process such as a COVID-19 epidemic, we know that social networks of various kinds have been found to be highly heterogeneous [9], and that super-spreading events for this and similar diseases appear to be power-law distributed [22]. A heterogeneous topology, such as a scale-free network, may therefore be a better null model than the assumption of random mixing.
The epidemic model used here is very simple and devoid of any realistic parameters. But there is no obvious reason to believe that the greater sophistication of the compartmental models often used to guide public health policy would annul the effects reported here. In any case, this could be explored by implementing versions of such models on networks. Another caveat is that in this model recovered agents can never again become infected. In reality, we know that people can be reinfected with diseases such as COVID-19, either because of waning immunity or new variants. Multiple waves of infection are thus often attributed to changing levels of individual immunity. However, we have seen that a changing network structure, if heterogeneous, can also lead to multiple waves even when individual immunity is maintained.
If these results do carry over to more realistic scenarios, then it is crucial to gather data on the networks of contacts on which epidemics play out, and to adapt existing compartmental models either to correct for network topology or to take it into account explicitly. It may be the case that estimating $\langle k^2\rangle/\langle k\rangle$ in a social network is in fact easier than inferring the mean degree, since methods such as respondent-driven sampling suffer from a bias towards more highly connected individuals [30]. Further research is also needed to elucidate to what extent social networks change in time and how this affects epidemics [31].
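Why $\langle k^2\rangle/\langle k\rangle$ might be the easier quantity to estimate: if a sampling scheme reaches people through their contacts, each person is reached roughly in proportion to their degree, and the average degree of people reached this way is $\langle k^2\rangle/\langle k\rangle$, the same combination that appears in Eq. (2). The sketch below idealises this as uniform sampling of edge endpoints; real respondent-driven sampling is more involved, so treat it as an illustration of the identity rather than of the survey method. Function names are ours.

```python
import random


def moment_ratio(degrees):
    """Exact <k^2>/<k> from a full degree sequence (the 1/N factors cancel)."""
    return sum(k * k for k in degrees) / sum(degrees)


def endpoint_estimate(adj, n_samples, rng):
    """Mean degree of uniformly sampled edge endpoints; its expectation is <k^2>/<k>."""
    stubs = [v for v, nbrs in enumerate(adj) for _ in nbrs]  # v listed deg(v) times
    return sum(len(adj[rng.choice(stubs)]) for _ in range(n_samples)) / n_samples


if __name__ == "__main__":
    star = [[1, 2, 3, 4], [0], [0], [0], [0]]
    print(moment_ratio([len(nbrs) for nbrs in star]))         # 20 / 8 = 2.5
    print(endpoint_estimate(star, 10_000, random.Random(0)))  # close to 2.5
    print(sum(len(nbrs) for nbrs in star) / len(star))        # mean degree, 1.6
```

In the star, contact-based sampling overstates the mean degree (1.6) but recovers $\langle k^2\rangle/\langle k\rangle = 2.5$, which is what enters Eq. (2).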
In any case, the effects reported here suggest that: a) each 'wave' of a disease such as COVID-19 may infect fewer people than we would otherwise assume, even in the absence of NPIs, thanks to network heterogeneity; b) if networks are heterogeneous and change in time, this can lead to multiple waves of infection that would not be predicted by random-mixing models; and c) NPIs focused on avoiding super-spreading events are likely to be particularly efficacious at suppressing the epidemic.
Other network properties, such as efficiency [32], assortativity [33], directionality [34] or spatial aspects [35], may also be as relevant as degree heterogeneity. A well-defined community structure, in particular, can have an important effect [36]. Ultimately, epidemics are yet another example of how the architecture of complex systems is fundamental to their dynamical behaviour [11,12,22].

FIG. 2. Proportion of agents ever infected, $\rho$, against probability of infection, $\beta$, for SF networks with $\alpha = 2$ (dark blue circles) and $\alpha = 3$ (light blue triangles), and for ER random graphs (red diamonds) (Panel A). Estimated value of the basic reproduction number, $R_0^e$, from Eq. (4) against $\beta$ on the same networks (B). Proportion infected, $\rho$, against SF exponent $\alpha$ for infection probability $\beta = 0.6$ (dark blue circles), $\beta = 0.3$ (light blue triangles) and $\beta = 0.1$ (red diamonds) (C). And $R_0^e$ against $\alpha$ for the same values of $\beta$ (D). All agents are initially Susceptible except for 50 randomly chosen to be set to Infectious. All other parameters as in Fig. 1.

FIG. 4. Time series for proportions of agents in the Infectious (panels A and C) and Recovered (B and D) states for ER random graphs (blue circles) and SF networks with exponent $\alpha = 2.2$ (red diamonds). At time $t = 0$ all agents are Susceptible, except for 50 randomly chosen agents set to Infectious. At times $t = 15$ and $t = 30$, the networks are replaced with new ones, randomly generated with the same network parameters, and 50 randomly chosen Susceptible agents are set to Infectious. Panels A and B: transmissibility is constant at $\beta = 0.4$ in the ER case and $\beta = 0.1$ in the SF case. Panels C and D: transmissibility is increased at times $t = 15$ and $t = 30$. In the ER case, $\beta = 0.4$ until $t = 15$, $\beta = 0.6$ until $t = 30$, and $\beta = 1$ thereafter. In the SF case, $\beta = 0.1$ until $t = 15$, $\beta = 0.2$ until $t = 30$, and $\beta = 0.4$ thereafter. All other parameters as in Fig. 1.