How contact patterns destabilize and modulate epidemic outbreaks

The spread of a contagious disease clearly depends on when infected individuals come into contact with susceptible ones. Such effects, however, have remained largely unexplored in the study of epidemic outbreaks. In particular, it remains unclear how the timing of contacts interacts with the latent and infectious stages of the disease. Here, we use real-world physical proximity data to study this interaction and find that the temporal statistics of actual human contact patterns i) destabilize epidemic outbreaks and ii) modulate the basic reproduction number. We explain both observations by distinct aspects of the observed contact patterns. On the one hand, we find the destabilization of outbreaks to be caused by temporal clustering of contacts leading to over-dispersed offspring distributions and increased probabilities of otherwise rare events (zero- and super-spreading). Notably, our analysis enables us to disentangle previously elusive sources of over-dispersion in empirical offspring distributions. On the other hand, we find the modulation of R0 to be caused by a periodically varying contact rate. Both mechanisms are a direct consequence of the memory in contact behavior, and we showcase a generative process that reproduces these non-Markovian statistics. Our results point to the importance of including non-Markovian contact timings into studies of epidemic outbreaks.


I. INTRODUCTION
As contagious diseases are passed on through contacts, the number of secondary infections depends crucially on the contact patterns of infectious individuals.These contact patterns encode relevant information such as the number of interaction partners and contact timing.However, the majority of prevailing models for disease spread prioritize simpler descriptions that neglect these aspects -despite evidence from studies that show the effects of contact patterns to be crucial for disease spread: Structurally, when interaction partners are modeled by a static complex network [1], the network structure affects disease spread through the occurrence of hubs [2,3], multiscale link communities [4] and influential spreaders [5].Dynamically, real-world interaction times generally follow a non-Markovian process (in contrast to commonly assumed memoryless processes), which influences epidemics through the occurrence of bursts [6,7] and daily and weekly variations in human interaction [8,9].
Thus, for a better understanding of disease spread through human contacts, a complete description of timevarying interactions in the form of so-called temporal networks [10,11] seems necessary.However, constructing detailed temporal networks from real-world contacts requires extensive amounts of recorded data, which in principle can be collected in field studies [12][13][14] but such data are notoriously limited in duration and system size.Although such limitations of real-world data can be partly remedied by generating surrogate data [15,16], it is often unclear to which extent they represent the real system.An unsolved task is thus to generate surrogate data that mimics temporal statistics and individual variations of human contact data.
Here, we address this gap by identifying and isolating features of contact behaviour that affect epidemic outbreaks using a novel analysis of real-world contact data.Instead of characterizing full epidemic outbreaks on a large (likely under-determined) temporal network, we develop an effective description through potentially infectious encounters that propagates statistics of contacts to statistics of disease spread.This approach avoids treating microscopic (non-linear) network effects [17][18][19] and allows us to focus on how contact patterns statistically affect epidemic outbreaks.Our analysis reveals two main mechanisms: (i) contact clustering destabilizes outbreaks by increasing the dispersion of offspring distributions and the probability of zero-spreading events, and (ii) temporal variation of the contact rate modulates the mean basic reproduction number, R 0 , due to an interference between contact patterns and disease progression.Finally, we showcase a non-Markovian process that faithfully reproduces the temporal statistics and their effect on disease spread as a proof of principle for a new class of generative models for surrogate data that mimic human contact patterns.

II. RESULTS
To shed light on the interplay of contact patterns and epidemic outbreaks, we analyse proximity data from the Copenhagen Networks Study [20] and from SocioPatterns [21].We filter each individuals' contacts by distance and duration, and define encounters as their starting times (see Methods).The resulting encounter trains are a point-process-like representation that captures the non-Markovian statistics of the underlying contact patterns (Fig. 1).The importance of these non-Markovian statistics can be seen when comparing to randomized encounter trains.In these surrogate data, encounter times are uniformly reassigned within the duration of the experiment -which preserves the number of encounters per train, i.e., the inter-individual variability (Fig. 1c), but destroys any temporal structure (Fig. 1d).
In order to quantify the effect of contact patterns on epidemic outbreaks, we focus on potential secondary infections and assume a contagious disease that can be transmitted only during infectious encounters.Further, if τ is the time elapsed since infection, infected individuals undergo a (non-infectious) latent period τ ∈ [0, T lat ) and an infectious period τ ∈ [T lat , T lat + T inf ) during which 2all n inf encounters are potentially infectious.We estimate n inf by considering every encounter in the data set as a potential start for an infection (Fig. 2a, see Methods).As we show in Fig. 2b, empirical contact patterns increase both the probability of very few potentially infectious contacts n inf (related to zero-spreading events) as well as very many n inf (related to super-spreading events) when compared to randomized controls.This increase in variability influences whether a single infection results in an epidemic outbreak or not.

Human contact patterns destabilize epidemic outbreaks
To demonstrate the effect of empirical contact patterns on epidemic outbreaks, we map the probability distribution of n inf to an offspring distribution using a twostep, data-driven branching process (Fig. 2c): Each infected individual first generates encounters according to the empirical P (n inf ), and then infects each of them independently with probability p inf resulting in binomialdistributed secondary infections.Taking the expectation value yields the offspring distribution Similar to empirical distributions from contact tracing [22], P (x) can be well described by a negative binomial distribution (Fig. 2d) with mean x = R 0 = p inf n inf and variance (x − R 0 ) 2 = R 0 + αR 2 0 , where α is the dispersion parameter that characterizes the increase in variance relative to a Poisson distribution.
Our data analysis provides a systematic approach to identify sources of the dispersion observed in empirical offspring distributions [23][24][25].We analyze step by step the dispersion occurring because of human contact patterns, and how it depends on T lat , T inf and R 0 (Fig. 2e, left to right).For a completely randomized control, where encounters are uniformly distributed across trains and time, we consistently find Poissonian offspring distributions (Fig. 2d, gray), with low dispersion (α = 0), independent of the three disease parameters (Fig. 2e, gray triangles).When including variability of contact rates into the control, while still randomizing within trains, the dispersion of the offspring distribution increases (α ≈ 0.3) but remains mostly independent of disease parameters (yellow circles).Lastly, when also including the precise timing of human contact patterns, offspring distributions show large dispersion that depends on T lat and T inf (blue symbols).In particular, dispersion is strongest for short T inf but decays as T inf increases.Hence, part of the empirical dispersion can be attributed to variability of contact rates between individuals, but non-Markovian tim- Real-world contact patterns increase the probability of rare spreading events, which increase the extinction probability of an epidemic outbreak.a: Sketch of estimating n inf from encounter trains for a disease model with T lat and T inf .b: P (n inf ) depends on T lat for human encounter trains (red, blue) but not for randomized controls (gray, yellow), which underestimate both zero-spreading as well as super-spreading events.c: Data-driven branching model where n inf are drawn from P (n inf ) and infected independently with probability p inf .d: The resulting offspring distribution P (x) can be fitted by a negative binomial, yielding an estimate for the dispersion parameter α. e: Dispersion as a function of T lat , T inf and R0.f: The extinction probability of an epidemic outbreak depends on R0 (left, fixed T inf = 3 days) and on α (right, all combinations of T lat and T inf for given R0), and is larger for human contact patterns compared to randomized controls.
ing of human contact patterns causes a severe increase -for realistic parameters roughly by a factor of two.
From P (x) we derive the extinction probability p ext , defined as the fraction of outbreaks that asymptotically end up in the absorbing state of zero infections (Fig. 2f).It can be calculated using the probability generating function, π(θ) = ∞ x=0 P (x)θ x , as the smallest θ * that solves θ * = π(θ * ) [26].In addition to the anticipated monotonic decrease of p ext = θ * with increasing R 0 , we find that extinction is more likely for actual human contact patterns (red, blue) than for randomized controls (gray, yellow).Moreover, for fixed R 0 , we find that an increased dispersion α due to human contact patterns non-linearly increases p ext (Fig. 2f, right).
Summarizing, we find that the non-Markovian timing of human contact patterns can be a strong source of variability, relevant to explain the over-dispersion of empirical offspring distributions.In particular, increasing the dispersion for a fixed R 0 increases the probability of zerospreading events (Fig. 2d, blue vs gray), and results in a higher extinction probability (Fig. 2f) -in other words, the non-Markovian temporal structure of human contact patterns destabilizes epidemic outbreaks.

Interplay between contact pattern and disease progression modulates basic reproduction number
As highlighted in Fig. 2b, n inf depends on T lat for human encounter trains.This is at first glance surprising, because for memory-less processes n inf is proportional to T inf but independent of T lat .Hence, in the following, we systematically vary T lat and T inf to study how the interplay between human contact patterns and disease progression affects n inf (Fig. 3).
Considering a fixed T inf = 3 days (Fig. 3a) and scanning T lat leads to a periodic modulation of n inf from human encounter trains (black, dashed) around the constant estimate from randomized trains (yellow).Thus, we consider n inf relative to randomized (Fig. 3b), which accounts for the trivial increase of n inf with increasing T inf .For small T inf < 1 day, we find daily modulations as a function of T lat , with regions below-randomized (blue) and above-randomized (red).This effect diminishes for larger T inf , where we find extended, triangular regions with interfaces located at T lat + T inf = 7 days and T lat = 7 days.We thus find periodic modulations of n inf on the scale of days (small T lat ) and weeks (large T lat ).
In the following, we uncover the origin of these periodic modulations using what we call conditional encounter rate Ψ(τ ).In short, Ψ(τ ) describes the average rate of encounters conditioned on an initial encounter (Fig. 3c).Considering the initial encounter as an infection, Ψ(τ ) measures the rate of potentially infectious encounters but neglects variability and dispersion.We find that Ψ(τ ) features a peak at 0 (which implies strong clustering [6]) and the anticipated periodic modulations between high and low encounter rates (which cause a time-dependent secondary attack rate).Both, the initial peak and modulations are again lost for randomized controls (yellow line).
Note that we can directly obtain an estimate of n inf for FIG. 3. Real-world contact patterns modulate the pace of epidemic spread as a function of latent period.a: Absolute n inf are periodically modulated with T lat for human encounter trains but not for randomized (yellow).b: Relative n inf /n rand inf reveals periodic modulations on both daily and weekly scale in the full (T lat , T inf )-plane.c: Ψ(τ ) features daily and weekly modulations for non-Markovian human encounter trains but is constant for randomized encounter trains (yellow).Assuming the initial encounter to be an infection, this explains the modulations of n inf by combinations of T lat and T inf for which the integral (shaded areas) is dominated by valleys (blue) or peaks (red).d: Continuous-time branching model, where encounter times are generated from Ψ(τ ) and infected with constant probability p inf .e: Starting from an initial I0 = 100 random infections in [−T lat − T inf , 0), the (average) number of infections grows exponentially.The growth rate λ for timeindependent encounter times in a fixed T inf is expected to decrease trivially with T lat .If not constant, Ψ(τ ) modulates λ and causes regimes of slower-than-random (blue) or faster-than-random (red) growth of infections.
a particular disease progression (T lat , T inf ) by integrating Ψ(τ ) over the infectious period (Fig. 3c, shaded areas).Reconsidering the previous examples, this explains the lower n inf for T lat = 2 days (blue area covering the valley) and the larger n inf for T lat = 6 days (red area covering the 7-day peak).The examples showcase that n inf is determined by the alignment of the infectious period with regions of low or high Ψ(τ ).
Consequently, since n inf is related to R 0 , the interplay between contact patterns and disease progression modulates the pace of epidemic spread.To illustrate this, we construct a continuous-time branching process, where each exposed individual generates encounters according to Ψ(τ ).During the infectious period, encounters again have a probability p inf to become infected (Fig. 3d).Assuming an outbreak that survived the initial generations, we prepare the system with 100 random initial infections in the interval [−T lat − T inf , 0].The resulting time evolution of daily new cases shows clear exponential growth, where the growth rate λ trivially decreases with the generation time and, thus, T lat (Fig. 3e).However, this expected decrease of λ for memoryless encounter timings (yellow) is modulated in the model due to modulations in Ψ(τ ), which results in slower-than-random (blue) or faster-than-random (red) growth.
Summarizing, human contact patterns cause a dependence of n inf on T lat that modulates R 0 and thereby the growth rate of an epidemic outbreak.

Destabilization and modulation of epidemic spread can be attributed to specific temporal statistics of contact patterns
After illustrating that non-Markovian statistics can destabilize and modulate epidemic outbreaks, it seems natural to ask how they can be included in models of disease spread.In such models it is common to approximate encounter times between individuals as memoryless (Poisson) processes [1].Assuming independence, these processes can be merged to result in encounter trains with Poisson statistics -the same statistics as our randomized encounter trains.In the following, we construct encounter trains with non-Markovian statistics and identify three specific features of contact patterns that are necessary to reproduce the relevant statistics of encounters.As a proof of principle, we showcase a novel inhomogeneous Weibull-renewal process that is constrained by data and reproduces all salient features (Fig. 4

, top row):
i) Focusing on temporal statistics, the encounter rate ρ(t) averaged across individuals and weeks is timedependent but cyclostationary; ρ(t) repeats in a weekly cycle with differences between day and night, and be- FIG. 4. Specific temporal statistics of surrogate point processes can be related to specific characteristics of disease spread.Top: Tailored renewal process that captures time-dependent ρ(t) (first column), heavy-tailed P (δt) (second column), and heterogeneous P (ntrain) (third column), reproduces core characteristics such as the modulation of n inf (fourth column, cf.Fig. 3 and Supplementary Material for additional figures for Ψ(τ )) and the increased probability of zero-spreading in P (n inf ) (fifth column, cf.Fig. 2).Center: An inhomogeneous Poisson process with only time-varying, cyclostationary ρ(t) and heterogeneous P (ntrain) reproduces the modulations in n inf , but underestimates the probability of low n inf due to a lack of clustering and the corresponding large inter-cluster times in P (δt).Bottom: A Weibull-renewal process with contact clustering from P (δt) and heterogeneous P (ntrain) reproduces the high probability of rare outcomes in P (n inf ), but cannot reproduce the modulations in n inf due to a lack of cyclic temporal structure.If, additionally, one relaxes the constraint of heterogeneous P (ntrain) and considers statistically identical trains, then both surrogate processes underestimate the probability of large n inf related to super-spreading (see Supplementary Material).
ii) The distribution of inter-encounter intervals P (δt) has high probability for small δt and a heavy tail of nonvanishing probability for large δt (second column).Because this tail corresponds to long periods without any encounter, it causes the high probability of n inf ≈ 0 (last column) that strongly contributes to the destabilization of epidemic outbreaks.P (δt) is dominantly shaped by the clustering of human contacts and can be well approximated by a Weibull distribution [7,27].Accordingly, a Weibull-renewal processes (last row) well reproduces P (δt) and P (n inf ), but it does not have a timevarying ρ(t) and cannot reproduce the period modulations of Ψ(τ ) and n inf .
iii) Encounter rates vary between individuals (third column).This variability can be attributed to intrinsic differences in contact behavior (cf.Fig. 2, gray vs yellow) and is partly captured by the degree distribution of the contact network [28].Recall that such acrossindividual variability is crucial to reproduce the heavy tails of P (n inf ) and offspring distributions (see also Supplemental Fig. S3 for generative processes where individuals share a common rate).
Clearly, models of disease spread can benefit from a generative process that reproduces those relevant features of human contact patterns, such as the tailored Weibull-renewal process showcased here.However, although our process reproduces all discussed features, it is built heuristically, and future work is needed to construct microscopic models that give rise to cyclostationary rates with clustering in a principled way, while remaining mathematically tractable.

III. DISCUSSION
We analyzed real-world human contact patterns and found that their non-Markovian timings shape epidemic spread in two important ways.Firstly, they increase the over-dispersion of offspring distributions, compared to random (Poisson) contact patterns, which (i) leads to more zero-and super-spreading events, and (ii) decreases the probability of an epidemic outbreak from an initial infection.While clustering is typically associated with super-spreading events, it inevitably causes periods of low contact rate that increase the probability of zero-spreading events.The resulting increase in extinction probability (despite super-spreading) is consistent with previous results, where individual variation of R 0 captured the over-dispersion of empirical offspring distributions [23].Still, the sources of this variation remained poorly understood, with candidates ranging from environmental factors (behavior, seasonality) to intrinsic ones (viral load, susceptibility) [29].Here, we disentangled two sources based on contact patterns and identified heterogeneous contact rates and the non-Markovian timing of contacts as relevant factors for over-dispersion in disease transmissibility.
Secondly, human contact patterns non-trivially modulate the pace of epidemic spread depending on the latent period, which we attribute to time-dependent but cyclostationary encounter rates.A cyclostationary rate leads to periods of statistically high and low encounter rates conditioned on a potential infection.How these periods align with the infectious period is affected by the latent period and determines whether the number of potential secondary infections, and in turn R 0 , increases or decreases.This modulation of R 0 can thus be understood as a resonance following either a constructive or destructive interference between a periodically changing contact rate and the disease progression.This resonance is a new mechanism to explain the previously observed slow-down or speed-up of diffusion processes on temporal networks due to non-Markovian characteristics [30].
In the main manuscript we focus on deterministic disease progression with fixed periods (T lat , T inf ), but we also considered non-deterministic disease progression with gamma-distributed periods [31][32][33]; the results are summarized in the Supplementary Material (Fig. S1).We find our main conclusion verified for nondeterministic disease progression: the probability of zerospreading events is reliably higher for human contact patterns compared to randomized; however, the modulation of n inf with T lat is smeared out with increasing variability in the period durations.Thus, in the unrealistic (but commonly adopted) limit of exponentially distributed periods, human contact patterns still reduce the robustness of outbreaks but no longer modulate the pace of epidemic spread.
To reproduce the relevant temporal features of human contact patterns, we introduced non-Markovian contact dynamics in the form of Weibull-distributed interencounter intervals (clustering) or inhomogeneous encounter rates (cyclostationarity).Previous studies of non-Markovian disease spread [7] found that clustering drastically affects the epidemic threshold for T lat = 0, which is caused by the high frequency of small interencounter intervals [34] that, in our context, manifests as a near-zero peak in the conditional encounter rate (Fig. 3).
Although it was shown that some non-Markovian models can be mapped onto effective Markov models [35,36], our results suggest that the non-Markovian and cyclostationary features of human contact patterns make a similar general mapping elusive.This highlights the necessity for generative models that are non-Markovian, yet well understood and simple enough to find broad use in epidemic modeling and beyond.
Our work is a first step towards providing such models.We identified temporal statistics of real-world contact data that affect disease spread, and faithfully reproduced them with our tailored Weibull-renewal process.Thereby, our work provides an accessible pathway towards including non-Markovian statistics into spreading processes, in general, and paves the way to systematically study their non-equilibrium physics.

METHODS
Extracting contacts from real-world physical proximity data: Consider data composed of a list of co-locations (physical proximity) described by the tuple (timestamp, user id A, user id B).We first sort the colocation times into unique lists for all id pairs (A,B) and (B,A).For each valid A, we then iterate over its list of (A,B) and merge co-location times that span consecutive time steps to construct pairwise contacts with starting time s and duration D. Combining these contacts yields a list of contacts {(s i , D i )} A for each participant A.
From the lists of contacts, we construct a pointprocess-like representation for each participant that we call encounter train (see Fig. 1).Throughout the manuscript, an encounter refers to the starting time of a contact.The encounter train of participant A is the time-sorted list of all contact starting times s i and can formally be written as The main data set from the Copenhagen Networks Study [12,20] is based on Bluetooth signals between phones of individuals that participated in the study.The published data is a list of interactions described by the tuple (timestamp, A, B, RSSI), where user id B can be negative if the interaction is with a device outside of the study or an empty scan, and RSSI is the received (Bluetooth) signal strength indicator.The RSSI can be considered as a proxy for interaction distance, especially since all participants used the same device [37], with an RSSI ≈ −80 dBm corresponding to a distance of about 2 m ± 0.5 m.Since the data provides a maximal RSSI per time window, we consider RSSI < −80 dBm to indicate interactions to be further apart than 2 m throughout the full time window [37], and exclude them.Consequently, we filter the raw data to only include those interactions that are within the study (user id B ≥ 0) and have RSSI ≥ −80 dBm.The data set covers a duration of t max = 28 days, with a time step of 5 min, for 675 encounter trains.
Average time-dependent encounter rate ρ(t): Because encounter trains are a point-process-like representation, we can define an encounter rate as the number of encounters in a window of size ∆t.Assuming statistical independence between weeks and between participants, we determine the average time-dependent encounter rate ρ(t) by averaging the number of encounters in a time windows of size ∆t = 1 h throughout the week (i.e.first hour of a Sunday until last hour of a Saturday) across weeks of the experiment and across participants.Statistical errors are calculated on the level of participants using delete-m jackknife error analysis.
Inter-encounter interval δt: To study temporal clustering and contact bursts, we measure the interval δt between consecutive encounter times.Since we are interested in the encounter statistics, each encounter has the same statistical weight independent of its encounter train origin.Consequently, the distribution P (δt) is simply the distribution over all observed intervals.To estimate statistical errors, we take into account that the number of encounters n j differs between individual trains (hence also the number of inter-encounter intervals n j − 1), and evaluate statistical errors on the level of observed intervals using delete-m j jackknife error analysis with m j = n j − 1.
Conditional encounter rate Ψ(τ ): To investigate how contact patterns interact with disease spread, we measure the encounter rate Ψ(τ ) upon a hypothetical infection from an encounter at τ = 0. To construct Ψ(τ ), we iterate over all encounters to measure the timedependent encounter rate with temporal resolution of the experiment, starting from the encounter time, i.e, τ = t − s i = 0, until τ = τ max (we typically chose τ max = 10 days) or, if t max −s i < τ max , until τ = t max −s i .We then average over all these time-dependent encounter rates taking into account their different lengths.To estimate statistical errors, we take into account that the number of encounters n j differs between individual trains by using delete-m j jackknife error analysis with m j = n j .
Disease model: We consider a disease that progresses in three discrete states upon infection: exposedinfectious-recovered.The duration T lat within the exposed state is called latent period and the duration T inf within the infectious state is called infectious period.For our main results, we consider the simple and intuitive case of a deterministic disease progression, where these periods are always the same.This corresponds to drawing the periods from delta distributions, which is quite different to commonly employed approximations that draw periods from exponential distributions (as expected for Poisson processes that describe many state transitions, from radioactive decay to chemical reactions).To control that our results also apply to non-deterministic disease progression, we repeated our analysis for the more realistic case of gamma-distributed periods [31][32][33] and obtained consistent results (Supplemental Material).
Potentially infectious encounters n inf : To avoid assumptions on the probability of infection upon encounter, we introduce potentially infectious encounters as the number of encounters that occur during the infectious period of a hypothetical disease.For the deterministic disease progression, we can enumerate the statistics by iterating over all encounters of the data set.For each encounter s i , we check whether the disease progression still fits into the experimental duration t max (s i + T lat + T inf ≤ t max ), and if true, estimate n inf as the number of subsequent encounter s j for which T lat < s j −s i < T lat +T inf .For the non-deterministic disease progression, we need to sample disease realizations (see Supplemental Material).Statistical errors are calculated again on the level of encounters using the delete-m j jackknife analysis with m j = n j .
Branching process with empirical distribution: To estimate the survival probability from the empirical distribution of potentially infectious encounters, P (n inf ), we construct a discrete-time data-driven branching process (Fig. 2c).In a first step, each infection causes X ∼ P (n inf ) potentially infectious encounters.In a second step, each of these encounters can cause a secondary infection with probability p inf , such that the number of secondary infections is binomial, Y ∼ B(X, p inf ).From Z t infections in generation t, we thus obtain Z t=1 = Zt i=1 Y i infections in the next generation.
Continuous-time branching process with inhomogeneous contacts: To study the pace of epidemic spread, we construct a continuous-time branching process that captures the conditional encounter rates but neglects interactions between infected.Here, each infected individual generates independent encounter trains starting from the initial infection time as inhomogeneous Poisson processes with a time-dependent rate given by the conditional encounter rate (Fig. 3d).Only those encounters that occur during the infectious period cause secondary infections with a chosen probability p inf .Every secondary infection then generates a new encounter train and so on.For simplicity, we restrict our example to deterministic diseases with fixed latent and infectious periods.
Point process models to approximate human contact patterns: To disentangle the effect of distinct features of human contact patterns on the statistics of encounters, we constructed point-process models that captured (i) the distribution of rates across individuals, (ii), a time-dependent average encounter rate, and (iii), the distribution of inter-encounter intervals, or a combination thereof (see Supplementary Material for comparison of combinations).
To reproduce the inter-individual variability, we consider the same number of encounter trains as present in the data and weight each train with their relative rates, i.e, w i = n train,i / n train , where • is the average across trains.
To reproduce a time-dependent encounter rate ρ(t), we employ thinning [38]: From a hidden process with rate ρ i = max t [ρ(t)] we accept events at time t with probability p i (t) = ρ(t)/ρ i .This procedure can formally only be applied for memory-less hidden processes, i.e., Poisson processes, in which case it results in an inhomogeneous Poisson process.To further reproduce heterogeneous rates in the inhomogeneous Poisson process, we fix p(t) but rescale the rates of the hidden processes, To reproduce the empirical distribution of interencounter intervals, we construct a Weibull-renewal process: Inter-encounter intervals are drawn from a Weibull distribution with scale parameter λ and shape parameter k.The Weibull distribution was parameterized by a fit to the data yielding (k, λ) = (0.3690, 3030).To further reproduce heterogeneous rates in the Weibull-renewal process, we notice that the mean rate of a Weibull-renewal process is given by ρ −1 , such that we can simply choose k i = k and λ i = λ/w i .
To combine all features in a single model, we construct a tailored renewal process: Weibull-renewal processes with heterogeneous rates and additional (heuristic) thinning.We start with a set of hidden Weibull-renewal processes with k i = k, λ i = λ/w i , and time-dependent acceptance probability p(t) with time-average p(t).The mean rate of each process is ρ i = p(t)w i /λΓ(1 + 1/k).Since we cannot fit (k, λ) of the hidden process, we further constrain the parameters with the mean rate from data, i.e., ρ(t) = ρ i = p(t)/λΓ(1 + 1/k), with w = 1 by construction.Since ρ(t)/p(t) = max t [ρ(t)], we thus −1 , such that k remains the only free parameter.We obtained our best estimate of k by minimizing the Kullback-Leibler divergence [39] between the distribution tails (δt 0.5 days) of model and empirical P (δt), finding k ≈ 0.24.
Jackknife error estimation: To estimate statistical errors of our results, we use jackknife error estimation while carefully taking into account the size of the leftout data set.The basic idea of the jackknife method is to estimate from some data X = {x 1 , . . ., x g } the variance of an observable Ô = f (X) using a systematic resampling approach [40].Jackknife samples O j are generated by systematically leaving out data, e.g., Ôj = f (X ) with X  = {x 1 , ...., x j−1 , x j+1 , ...x g }.Importantly, here each x j can be a block of (differently many) data points.While typically theses blocks have the same size m (delete-m jacknife), they could have different sizes m j (delete-m j jackknife), which will be relevant for some of our cases.From the jackknife samples, one can show that bias-reduced estimators of the mean and variance are given by [41] where h j = ( g j=1 m j )/m j , and Ô = f (X) is the naive estimate.For blocks of equal size, m j = m, we have h j = g and this simplifies to In our case, the data X is the set of all encounter trains and in the resampling step we leave out individual encounter trains.Since trains include differently many encounters, this can results in removing blocks of different sizes.In particular, all observables that derive from the number of encounters, e.g., n inf or P (n inf ), require the delete-m j analysis, Eq. ( 3), to estimate the statistical error.On the other hand, for observables that depend on time-binned data, e.g., the time-dependent rate, each encounter train has the same size given by the number of time bins during the recording such that the delete-m analysis, Eq. ( 4), is sufficient to estimate the statistical error.

SUPPLEMENTARY MATERIAL
In the Supplementary Material, we provide additional controls to verify the robustness of our results in the main manuscript.

A. Non-deterministic disease progression
In the main manuscript, we have focused on deterministic disease progression, where latent and infectious periods had a precise duration.For a more realistic view, we want to allow for the latent and infectious periods to vary from case to case (Fig. S1).To that end, we draw the periods T lat and T inf from a gamma distribution, but we keep the mean duration fixed1 .The case-to-case variability is then parameterized through the shape parameter k (Fig. S1a).
In particular, we fix the mean T i by choosing the scale θ = T i /k such that The shape parameter allows us to interpolate between a delta distribution (k → ∞) and an exponential distribution (k = 1), as commonly assumed in computational epidemiology for mathematical tractability [1].
Clinically observed distributions of periods between disease states are neither delta distributed nor exponential distributed and may be best described by distributions with a clear peak but vanishing probability at zero, such as log-normal distributions or gamma distributions with shape parameters in between (1, ∞).On the one hand, deltadistributed periods seem like a convenient but unrealistic simplification.On the other hand, exponentially distributed FIG.S1.Increased variability of disease-stage periods does not affect conclusions on robustness of outbreaks but weakens the modulation of spreading pace.a, b: To include person-to-person variability, we draw both latent and infectious periods from gamma distributions characterized by the shape parameter k, which interpolates between exponential (k = 1) and delta (k → ∞) distributions.For small k, T lat and T inf differ in duration from realization to realizationacross individuals, the probability to be infectious at a given time is smeared out.As k increases, the periods vary less around their expected value and the disease progression eventually becomes deterministic.c: Probability distributions of potentially infectious encounters n inf for k → ∞ (top) and k = 10 (bottom).When randomizing trains, the probability of zero-infectious encounters is suppressed.d: Mean number of potentially infectious encounters n inf as a function of k for the two examples from the main manuscript (T inf = 3 days and T lat = 2 or 6 days).For k → ∞, we recover the result for deterministic disease progression (dashed lines), where the latent period induces a notable difference between the two examples.For smaller k, the resonance effects remain relevant but the difference decreases, see also Fig. S2.
periods may appear more realistic, however, they imply an artificially high probability of short durations, which in turn leads to realizations where the infectious period either starts shortly upon infection or has close-to-zero duration (example traces in Fig. S1a).In fact, it has been argued already in the past that gamma distributions are more realistic [31][32][33], see for example empirical distributions of latent periods for COVID-19 [43], such that more realistic shape parameters could be in the range h ∈ [5,20] which is between delta and exponential.To investigate how case-to-case variability affects the number of potentially infectious encounters n inf , we again consider the probability distribution P (n inf ) (Fig. S1c), and revisit the two examples (T inf = 3 days with either T lat = 2 days or T lat = 6 days).
For the delta-disease (k → ∞, top), the red (T lat = 6 days) and blue (T lat = 2 days) distributions exhibit a peak at zero, they are broad, and have a long tail.Comparing red and blue, we find that the latent period determines the height of the peak at zero as well as the position of the bulk distribution, and, thereby, determines the mean number of potentially infectious encounters (dashed vertical lines).The mean values clearly differ.Again comparing with the randomized encounter trains (yellow line), we find no peak at zero, a shorter tail, and no dependence on the latent period (the respective randomized lines fall on top of each other).Importantly, a peak at zero implies that the infected individual does not pass on the infection, so that the disease becomes more likely to die out if case numbers are low.
Changing to the non-deterministic disease progression (k = 10, Fig. S1c bottom), the distribution from randomized data is barely affected.However, the red and blue distributions are more similar to each other, but also broader and smoother than for the delta-disease.This can be explained by gamma-distributed periods acting as a smoothing kernel along both dimensions of Fig. S1f, where variability in the infectious period directly affects n inf , while variability in the latent period affects n inf through the resonance effect.Consequently, we expect that with decreasing shape parameter k, the mean number of potentially infectious encounters becomes independent of the mean latent period.Indeed, when considering n inf as a function of k, we find that the estimates for our two examples approach each other, as k decreases (Fig. S1d).
Note that our analysis has a lower bound in k once realizations of disease progression (latent + infectious period) cannot find sufficiently many initial encounters to fit into the finite duration of the experiment (4 weeks for CNS).However, using other examples with smaller latent and infectious periods (where we can acquire enough statistics), we show that the two extreme cases meet for k ≈ 1 (see Fig. S2).

B. Non-deterministic disease progression with fast disease stages
With the example of the main text, we were not able to sample the for small k values (more variability across disease realizations) because the 28-day duration of the data becomes too short once the periods of disease progression are close-to exponentially distributed (k → 1).
To avoid this issue and to illustrate shorter timescale, we here compare with another hypothetical example of a "faster" disease progression, where T inf = 0.5 days and T lat is either 1 or 1.5 days (Fig. S2).In this case, expected periods are very short so that even realizations with periods that severely exceed their expected value fit into the 28-day duration.
We find a few noteworthy aspects: First, the absolute value of potentially infectious encounters n inf is much lower for faster disease progression.This is due to the much shorter infectious period.However, the relative deviation from the randomized surrogate data is consistent between the two examples.Second, we now find that the longer latent period (1.5 vs. 1.0) leads to fewer potentially infectious encounters (red vs. blue).This is just a result of the chosen latent periods: choosing an even longer latent period would result in a respective decrease of n inf (e.g.2.0 vs 1.5).Third, as expected, as k → 1, the estimates of n inf overlap for different latent periods, because individual disease realizations become very variable (cf.Fig. S1a).

FIG. S2.
Example with smaller latent and infectious periods provides additional insight on low-k regime of nondeterministic disease progression.Here, we compare the example from the main manuscript ("slow", T inf = 3 days and T lat either 2 or 6 days) with a hypothetical "fast" disease progression (T inf = 0.5 days and T lat either 1 or 1.5 days) and show both the mean number of potentially infectious encounters as a function of the shape parameter k of the gamma-distributed latent and infectious period (top) as well as their distributions for selected k.Due to the finite duration of the recording, the accessible low-k regime is determined by the mean latent and infectious period, because for low k large periods quickly exceed the finite duration.For faster disease progression (smaller latent and infectious period), we observe modulations on smaller timescales and, in addition, reach the low-k regime of exponentially distributed periods (k = 1) commonly assumed in epidemiological simulations.As one can see more clearly for faster disease progression, the mean number of potentially infectious encounters approach each other in this low-k regime, which can only be anticipated from the results for slower disease progression.This implies that modulations will not be present for exponentially distributed latent and infectious periods but only for more realistic non-exponentially distributed ones with higher k.

C. Simple point processes cannot fully reproduce temporal features
As we noted in the main text, each of the considered simple generative models is insufficient to reproduce the full set of observed temporal features of contact patterns.Here, we provide a more complete overview of processes and the features they reproduce (Fig. S3).We identified three relevant features: i) a time-varying, cyclostationary encounter rate ρ(t) (first column) ii) a heavy-tailed inter-encounter-interval distribution P (δt) (second column) iii) encounter rates vary between individuals (third column) Feature i is responsible for modulation of the conditional encounter rate Ψ(τ ) (fourth column).Combined with disease progression, this causes modulations of n inf and distributions P (n inf ) that vary with T lat and T inf (last two columns).Feature ii is responsible for zero-spreading events, which destabilize epidemic outbreaks and manifest in the peak of P (n inf ) for n inf ≈ 0 (last column).Feature iii is responsible for super-spreading events, which correspond to the tail of P (n inf ) and, thus, can cause a systematic shift of n inf .FIG. S3.Simple generative processes are not capable of reproducing all aspects of the full real-world contact statistics.An inhomogeneous Poisson process can reproduce cyclostationary ρ(t) but fails to reproduce sufficiently heavy-tailed P (δt).A Weibull-renewal process reproduces heavy-tailed P (δt) but fails to reproduce the cyclostationary ρ(t).Generative processes need to reproduce variability between individuals P (ntrain) to produce long-tailed distributions P (n inf ).In order to reproduce the data in all the considered aspects, we had to design a tailored Weibull-renewal process with homogeneous rates.

D. Control regarding continuous contribution of participants
In the main manuscript, we use the full published data set of the Copenhagen Networks Study [20], covering the physical proximity data of 675 participants.Upon closer inspection, there are periods both at the beginning and the end of the experiment without entries for some of these 675 participants.Since entries also occur for Bluetooth signals with unknown devices, this may indicate irregularities in the contact behavior of some of the participants, e.g., incomplete participation of individuals.
In order to make sure that our results are not affected by such boundary effects, we reanalyzed the data and included only the contact trains of those individuals for which any Bluetooth signal was recorded on both the first and last day of the study (Fig. S4).Technically, we achieved this easily by restricting our analysis to those IDs for which timestamps were recorded within the first day (timestamp < 1 • 24 • 60 • 60s) and the last day (timestamp > 27 • 24 • 60 • 60s), reducing the data set to 533 contact trains.
This control analysis fully supports our quantitative results from the main manuscript (Fig. S4) such that we can rule out artifacts from boundary effects of incomplete participation.In particular, we observe a matching weekly structure of the encounter rate (Fig. S4a), a matching distribution of inter-encounter intervals that can be fitted with a Weibull distribution (Fig. S4b), and a matching conditional encounter rate (Fig. S4c).Consequently, both mean potentially infectious encounters for deterministic disease progression (Fig. S4d and f) as well as for non-deterministic disease progression (Fig. S4e and g) match our main results.

E. Analysis of an alternative data set
To further test the robustness of our results, we repeated our analysis on a completely independent data set.Here, we consider contact data recorded at one of the office buildings of the French Institute for Public Health Surveillance "InVS" [44].This data is recorded with a different technique, namely so-called near-field chips that only record signals in close proximity ( 5 m) and avoid to threshold the Bluetooth signal.Moreover, the temporal resolution of contacts is 20 s as opposed to 5 min in the main manuscript.In addition, the data is recorded for a different social group, namely adults within an office building.Last, the data is recorded in another country (France) by a different collaboration (SocioPatterns).The data set spans 2 weeks of recording 145 participants (two thirds of the staff agreed to participate).
The analysis of this completely independent data set provides completely consistent results to those presented in the main manuscript (Fig. S5).When comparing the results, we have to highlight that the available statistics for this data set are much smaller due to smaller duration and smaller number of participants.However, we clearly see the expected weekly structure in the encounter rate (which is here again dominated by working days because of office hours), the distribution of inter-encounter intervals that is well described by a Weibull distribution, as well as the typical conditional encounter rate with a peak at 7 days.Consequently, also the results for (delta) disease progression are consistent with our main findings on the modulation of potential secondary infections (Fig. S5d).Results on nondeterministic disease progression are confined to shorter latent and infectious periods due to the shorter experimental duration.We conclude that the additional data set fully supports our results in the main manuscript.

FIG. S6.
Infectious encounters for external infections.In our analysis in the main manuscript, we preserved the temporal features by constraining disease onsets to available encounters.a: Distribution of potentially infectious encounters for non-deterministic disease progression (k = 10).6 days latent period.b: Comparison of the mean ninf between 6 days latent period (dark) and 2 days latent period (light).c: Same as a), but 2 days latent period.

FIG. 1 .
FIG. 1. Real-world contacts represented as encounter trains.a: A contact between two individuals is defined as ongoing co-location during consecutive time steps.Focusing on contagious disease transmission, we only consider contacts closer than 2 meters and longer than 15 minutes.b: The start times of the remaining contacts per individual form their encounter train.A raster plot of such encounter trains shows a clear temporal structure of human contact patterns.c, d: Randomization preserves the number of encounters per train but destroys temporal structure.
FIG. 2.Real-world contact patterns increase the probability of rare spreading events, which increase the extinction probability of an epidemic outbreak.a: Sketch of estimating n inf from encounter trains for a disease model with T lat and T inf .b: P (n inf ) depends on T lat for human encounter trains (red, blue) but not for randomized controls (gray, yellow), which underestimate both zero-spreading as well as super-spreading events.c: Data-driven branching model where n inf are drawn from P (n inf ) and infected independently with probability p inf .d: The resulting offspring distribution P (x) can be fitted by a negative binomial, yielding an estimate for the dispersion parameter α. e: Dispersion as a function of T lat , T inf and R0.f: The extinction probability of an epidemic outbreak depends on R0 (left, fixed T inf = 3 days) and on α (right, all combinations of T lat and T inf for given R0), and is larger for human contact patterns compared to randomized controls.
FIG. S4.Control regarding continuous contribution of participants.Here, we excluded those trains that did not have any encounters during the first or last day.This results in 533 instead of 675 trains.Panels match key figures from the main manuscript.Results are consistent.The fitted Weibull distribution has shape parameter 0.3739 and scale parameter 3161.

FIG. S5 .
FIG. S5.Main results for other data set.Because the recording lasted "only" two weeks, the duration of disease that can be sampled were limited to the fast disease progression.The fitted Weibull distribution has shape parameter 0.225 and scale parameter 675.0.