This site uses cookies. By continuing to use this site you agree to our use of cookies. To find out more, see our Privacy and Cookies policy.
Paper The following article is Open access

Coalescent processes emerging from large deviations

Published 3 April 2024 © 2024 The Author(s). Published on behalf of SISSA Medialab srl by IOP Publishing Ltd
, , Citation Ethan Levien J. Stat. Mech. (2024) 033501 DOI 10.1088/1742-5468/ad2dda

1742-5468/2024/3/033501

Abstract

The classical model for the genealogies of a neutrally evolving population in a fixed environment is due to Kingman. Kingman's coalescent process, which produces a binary tree, emerges universally from many microscopic models in which the variance in the number of offspring is finite. It is understood that power-law offsprings distributions with infinite variance can result in a very different type of coalescent structure with merging of more than two lineages. Here, we investigate the regime where the variance of the offspring distribution is finite but comparable to the population size. This is achieved by studying a model in which the log offspring sizes have stretched exponential tails. Such offspring distributions are motivated by biology, where they emerge from a toy model of growth in a heterogeneous environment, but also from mathematics and statistical physics, where limit theorems and phase transitions for sums over random exponentials have received considerable attention due to their appearance in the partition function of Derrida's random energy model (REM). We find that the limit coalescent is a β-coalescent—a previously studied model emerging from evolutionary dynamics models with heavy-tailed offspring distributions. We also discuss the connection to previous results on the REM.

Export citation and abstract BibTeX RIS

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

Evolution is, in large part, shaped by a tension between two opposing 'forces': neutral genetic drift and selection. Neutral genetic drift refers to random changes in the genetic composition of a population due to chance events in the deaths and reproduction of individuals. Selection, in contrast, results from a deterministic bias towards fitter individuals. A major objective of evolutionary biology is to determine the relative impact of these forces. In order to achieve this, we need to understand how different microscopic mechanisms manifest in macroscopic observables, such that the frequency of a genotype or the shape of the genealogical tree.

Much of our understanding of the interplay between genetic drift and selection comes from the Wright–Fisher model [NW18, DD08] or its counterpart with overlapping generations, the Moran model. In both models, the source of noise is the random sampling of individuals from generation to generation. In the large population size limit, this sampling noise has a variance which is inversely proportional to N, the population size. When genetic drift is the sole source of changes in the composition of the population—that is, in neutrally evolving populations—N sets the characteristic time-scale of the evolutionary dynamics.

The true population size is rarely related in a simple way to the variance in genotype frequency, as predicted by the Wright–Fisher model. Therefore, we usually think of an N as an effective population size measuring the overall strength of genetic drift, rather than the literal number of cells. This has motivated the question: what determines the effective population size? Of particular interest is the question of how non-genetic variation between individuals shapes the effective population size [JLA23, LMKA21]. J. H. Gillespie was one of the first to address this question. He studied a model in which the number of offspring from each individual is a random variable having finite variance, showing that the effective population size is obtained by scaling the true population size by the variance in offspring [Gil73, SK12, Sch15].

We now have a more general and mathematically rigorous understanding of this problem based on the Cannings model [Can74]. Starting with N labeled cells, trajectories of the Cannings model are constructed by generating an exchangeable random vector $(\nu_{1,k},\dots,\nu_{N,k})$ satisfying $\sum_{i = 1}^N \nu_{i,k} = N$ for each $k \in \mathbb{N}$. $\nu_{i,k}$ represents the number of offspring in generation k that descend from individual i in the $(k-1)$th time-step. We will henceforth omit the subscript k and it should be understood that all quantities associated with a generation are implicitly dependent on k 2 . Inspired by [Sch03, Hal18, OH21], we will focus on the particular case where νi is obtained by first generating i.i.d. random variables $\{U_i\}_{i = 1}^N$ representing the number of offspring of each individual before resampling and then sampling the resulting offspring pool, without replacement, to obtain the individuals in the next generation. Conditional on the offspring numbers $U_1,\dots,U_N$, νi follows a hypergeometric distribution:

Equation (1)

This formulation bears a close resemblance to the model studied by Gillespie [Gil73] and is an example of the generalized Wright–Fisher model defined in [DEP11].

One limit theorem for the Cannings model concerns the behavior of genotype frequencies over long time-scales in large populations. By long time-scales, we mean on the order of the coalescent time, defined as the average number of generations. We must travel backwards in time to find a common ancestor for two randomly selected individuals of the same generation. The coalescent time is equivalent to the effective population size in some cases, but is more general in the sense that the definition does not require a mapping to the Wright–Fisher model. Under the assumption that the variance in the number of offspring produced by each individual is finite, the time-rescaled genotype frequencies converge as $N \to \infty$ (in the Skorokhod sense—see [Ker22, EK09]) to the well-known Wright–Fisher diffusion [MS01, Gil74]. In the Wright–Fisher diffusion, the change in the frequency, X(t) of a genotype over a time interval $\textrm{d}t\ll 1$ is normally distributed with mean zero and variance $X(t)(1-X(t))\textrm{d}t$.

There is a limit theorem, due to Kingman [Kin82], for the genealogical trees in the Cannings model. These trees can be generated by a coalescent process which, roughly speaking, specifies which lineages have merged k generations back in time from our original sample. Under the condition that the genotype frequencies converge to the Wright–Fisher diffusion, the time-rescaled coalescent process converges to a continuous time process known now as the Kingman coalescent. The genealogical tree produced by this process is a binary tree where pairs of lineages merge at a rate one and, importantly, the probability of more than two lineages merging at a single instant is zero. Similar results exist for other microscopic models (e.g. the Moran process) and a large body of work focuses on understanding how the microscopic details of the process shape the coalescent time or effective population size [Gil74, JLA23, Cha09, Gil04, Gil01].

The Wright–Fisher diffusion/Kingman models are not always sufficient to capture the dynamics of real evolution. This is due to, for example, multiple merger coalescents appearing in experimental data [TL14, SHJ19]. For multiple merger coalescents, there is a non-negligible probability to observe more than two individuals sharing a common ancestor in a single unit of time (e.g. a generation in the model) on time-scales of the order of the coalescent time. In [Sch03], Schweinsberg explored the question of whether such genealogies can emerge from neutral evolution by studying the Cannings model with power-law distributions. A number of papers have since investigated the role of the power-law tail offspring distribution in generating multiple merger coalescents and non-diffusive genotype frequency fluctuations [Hal18, OH21, CGCSWB22]. The genotype frequency dynamics which emerge from power law offspring—known as Λ-Fleming–Viot processes—are in general non-diffusive processes having discontinuous sample paths. With the notable exception of [CGCSWB22], most previous work has focused on how multiple merger coalescents emerge when the variance in offspring is infinite. The assumption of infinite variance is a mathematical convenience, which may be justified in some cases, e.g. for certain models of dormancy [CGCSWB22, WV19] and rare mutations [LD43]. What remains unclear is which coalescent processes emerge when the offspring distribution has a variance which is finite, yet large enough relative to the population size to give rise to multiple merger coalescents.

In this paper, in order to understand the role of large but finite offspring variability, we investigate the limit processes which emerge when the population size and offspring variability are simultaneously taken to be large. Similar limits were investigated in [CGCSWB22], but we focus on a more specific scaling between population size and offspring variability, which allows us to obtain precise descriptions of the limit coalescents. The offspring distribution we consider and our scaling assumption are both inspired by prior work on the random energy model (REM) of disordered systems.

Our main result (theorem 2) states that the limit processes emerging from the genealogies of the Cannings model under this scaling limit, called β-coalescents, are the same as the coalescent processes emerging from power law offspring. Our model is parameterized by a scaling rate which is analogous to the temperature of the REM. Just as in the REM, we find that there are two critical points. Below the lower critical point there is no continuous time limit process, while between the two critical points one finds multiple merger coalescents. However, while the lower critical point corresponds exactly to the lower critical point of the REM marking the transition to the 'frozen' state, the upper critical point does have an obvious interpretation in the context of the REM (which would be to separate the regimes of strong and weak-self averaging of the partition function). Our results complement and expand upon the existing connection between coalescent theory and the REM, which was made by Bolthausen and Sznitman in [BS98].

1.1. Organization of this paper

This paper is organized as follows. In section 2.2 we describe the model under consideration, which is a particular instance of the Cannings model. In sections 2.1 and 2.3 we review some of the background of coalescent theory and review what is known about this model. In section 3 we present our main result, which concerns the limiting coalescent when both the population size and offspring variation are taken to be large. We also discuss the relationship between our results and [CGCSWB22]. Section 4 is devoted to the REM and the connection between our results and the thermodynamic limit of the REM.

2. Background

Throughout this paper, we use the following standard notation. $g \sim f$ means $f/g \to 1$, $a\wedge b = \min\{a,b\}$, $a \vee b = \max\{a,b\}$, $(n)_k = n(n-1)\cdots (n-k+1)$ and $[n] = \{1,2,\dots,n\}$.

2.1. Characterization of limit coalescent processes

For a realization of the Cannings model with population size N, the corresponding discrete coalescent process on a sample of size n, denoted by $(\Psi_{n,k}^N)_{k\unicode{x2A7E}0}$, describes the genealogical tree obtained by following each sampled individual's ancestors back through time and grouping branches when individuals share a common ancestor. This process is shown in figure 1. We can understand $\Psi_{n,k}^N$ as the state of this tree k generations back from the time our original sample was taken. To define $\Psi_{n,k}^N$ more mathematically, let ${\mathcal{P}}_n$ denote the space of partitions on $[n] = \{1,\dots,n\}$; that is, each element of ${\mathcal{P}}_n$ is a set of disjoint subsets of $[n]$ whose union is $[n]$. Then $\Psi_{n,k}^N$ is a Markov chain taking values in ${\mathcal{P}}_n$ and $\Psi_{n,0}^N$ is the partition into singletons: $\Psi_{n,0}^N = \{\{1\},\{2\},\dots,\{n\}\}$. For k > 0, indices $i,j \in [n]$ are in the same block of the partition if and only if the ith and jth individuals in the original sample share an ancestor k generation in the past.

Figure 1.

Figure 1. (left) A simulation of the Cannings model. Squares indicate individuals at the beginning of each generation and the protruding lines are their offspring. The thick red lines indicate an example of a discrete genealogy, or coalescent $\Psi_{5,k}^5$, obtained from a sample of the final population of labeled cells. For example, $\Psi_{5,1}^5 = \{\{1\},\{2,3\},\{4,5\}\}$. (right) A larger simulation of a continuous time coalescent process which is obtained in the limit of the model on the left.

Standard image High-resolution image

The continuous time coalescent processes which emerge as limits of $\Psi_{n,k}^N$ from any exchangeable Cannings model have been characterized in [MS01]. The authors consider the large-N limit of the time-scaled coalescent process

where cN is the probability that two randomly selected individuals in one generation share an ancestor in the previous generation. Observe that $c_N^{-1}$ is simply the coalescent time, since the number of generations to the most recent common ancestor of two random selected individuals from the current generation follows a geometric distribution with parameter cN . Recall that $\{\nu_i\}$ are the offspring numbers after resampling the total offspring pool (see equation (1)). The chance that two individuals are both descendants of the first individual in the previous generation is $(\nu_1/N)(\nu_1-1)/(N-1)$. Multiplying by N and averaging over all possible $\{\nu_i\}_{i = 1}^N$ gives

Equation (2)

We emphasize that, given the distribution of Ui , it is not straightforward to compute cN because it has a nonlinear dependence on the sum SN .

The precise notion of convergence considered is in the Skorohod sense 3 – see [EK09] for details. If $(\Psi_n(t))_{t\unicode{x2A7E}0}$ converges to a continuous time process $(\Psi_{n}^N(t))_{t\unicode{x2A7E}0}$ in the Skorohod sense, the possible transitions that can occur in the process $\Psi_n(t)$ involve $\sum_{i = 1}^rk_i$ of the $n = \sum_{i = 1}^rk_i+s$ blocks in ${\mathcal{P}}_n$ merging into r lineages by collapsing into groups of sizes $k_1,\dots,k_r$ while s lineages remain unchanged. Such events are called $(n,k_1,\dots,k_r;s)$ collisions. The rates of these collisions, which uniquely determine the law of $\Psi_{n}(t)$, will be denoted by $\lambda_{n,k_1,\dots,k_r;s}$ for n > 2.

In order to relate the merge rates $\lambda_{n,k_1,\dots,k_r;s}$ to the distribution of νi , one notes that after conditioning on $\{\nu_i\}_{i = 1}^{N_{\zeta}}$ the chance that r groups of sizes $k_1,\dots,k_r$ each descended from individuals $1,\dots,r$ in the previous generation is

It follows that

Equation (3)

In particular, if this limit exists and $c_{N}^{-1}\to \infty$, the time-rescaled coalescent $\Psi_{n}^{N}(t)$ converges to a process $\Psi_{n}(t)$ with merger rates given by equation (3). The rates for s > 0 can be obtained via a certain recursive relation discussed in [Sch01].

Of particular relevance for our results are those coalescent processes for which there are multiple mergers, but not simultaneous multiple mergers—in this case $\lambda_{n,k_1,\dots,k_r,s} = 0$ unless r = 1. Such coalescents are known as Λ-coalescents and are defined by the merger rates

Equation (4)

Any rates defined in this way will satisfy the consistency condition

Equation (5)

which is proved in [Pit99]. The intuition behind equation (5) is as follows: the difference between the rate to see k mergers in a sample of size n and n + 1 is entirely caused by mergers of k + 1 lineages in the sample of n + 1, since these look like k mergers when restricted to the sample of size n. Pitman also proved that any triangular array $\{\lambda_{n,k}\}_{n = 1,\dots,\infty,k = 1,\dots,n}$ satisfying equation (5) has a representation in terms of a positive measure $\Lambda:[0,1]\to \mathbb{R}_{\unicode{x2A7E}0}$ via the relation

Equation (6)

Thus, there is a correspondence between the coalescent process and positive measures on the unit interval.

A simple criterion for the convergence to a Λ-coalescent is found by noting that if $\lambda_{2,2,2;s} = 0$ then there are no simultaneous multiple mergers, since these would contribute to this rate. Therefore, if $c_N \to 0$ and

Equation (7)

the limit process of the genealogy $(\Psi_{n,k}^{N})_{k\unicode{x2A7E}0}$ is a Λ-coalescent [MS01]. The following proposition, which follows from propositions 1 and 3 of [Sch03], summarizes these observations.

Proposition 1. If $c_{N} \to 0$ and equation (7) is satisfied, then as $N \to \infty$, $(\Psi_n^N(t))_{t\unicode{x2A7E}0} = (\Psi_{n,\lfloor t /c_{N} \rfloor}^{N})_{t\unicode{x2A7E}0}$ converges (in the Skorohod sense) to a Λ-coalescent $(\Psi_{n}(t))_{t\unicode{x2A7E}0}$ with merger rates $\lambda_{n,k}$ given by equation (4) for k = n and equation (5) otherwise.

2.2. Exponential offspring model

In order to study the situation where the variation in offspring numbers is finite, but large relative to the population size, we set

Equation (8)

where $\{\Phi_i\}_{i = 1}^N$ are i.i.d. and ζ is a deterministic parameter. Since we are interested in the large noise limit, we will eventually take $\zeta \to \infty$. In the remainder of this paper, in order to avoid ambiguity, we will replace i with 1 when referencing elements of an exchangeable random vector. Note that, technically, U1 should take values in $\mathbb{Z}$; however, in the large noise limit the distinction is not relevant. Moreover, taking $\zeta \to \infty$ implies $\mathbb{E}[U_1]\to \infty$, so we can assume that

Equation (9)

Offspring distribution of the form (8) were also considered in [SJHW23] within the context of a model including heritability and [CGCSWB22] in a model of dormancy.

It is biologically sensible to work with variation on an exponential scale whenever the organisms in question proliferate exponentially between bottlenecks. In this context, $\zeta \Phi_i$ is the product of the duration of growth between bottlenecks and the time-averaged growth rate for the offspring of the ith cell. For example, imagine an asexual population that is subject to successive cycles of growth and dilution in a heterogeneous environment. An example is a microbial pathogen, such as Mycobacterium tuberculosis [Gag18, FVJH+21]. Suppose that after passing through a bottleneck, each of the N cells occupies a spatially distinct location. Then, due to heterogeneity in the environment, the offspring of each cell will proliferate at different rates, $\Phi_i$. If ζ is the duration of the growth phase, the total number of offspring from the ith cell (before resampling) will be of the form (8). Alternatively, one could imagine that it is the duration of the growth phase which is random—for example, due to a period of dormancy before growth—while the growth rates are fixed. In this case we would use ζ to denote the growth rates and $\Phi_i$ the duration of the growth phase. Such a dormancy model was studied in [CGCSWB22].

We focus on the case where Φ1 has a stretched exponential distribution, which essentially means that the large deviations of Φ1 are scale-invariant. Specifically, let

We assume $W(\phi)$ is smooth and bounded as φ → 0 and, inspired by [Eis83, BABM05, RAT15], assume

Equation (10)

as $\phi \to \infty$. The pre-factor of $1/q$ is arbitrary, since this can be absorbed into ζ, but this particular form leads to some more elegant analytical formula. Note that we can always set $\mathbb{E}[\Phi_1] = 0$ without loss of generality, since the contribution from ${\mathrm{e}}^{\zeta \mathbb{E}[\Phi_1]}$ cancels in the ratio $U_1/S_N$. Note that offspring distributions satisfying equations (9) and (10) include the special cases where U1 is exponential $(q = 1)$ and lognormal $(q = 2)$.

2.3. Known results for heavy-tailed offspring distributions

We now return to the Cannings model with offspring distributions given by equation (10). When q = 1, $\Phi_i$ has exponential tails and the offspring sizes, Ui , have power law tails

Equation (11)

for $\alpha_1 = 1/\zeta$ and some constant C > 0. The large N limit of the Cannings model with power law offspring (and ζ fixed) is covered by the main result of [Sch03] and theorem 1.3 of [CGCSWB22], which we have reproduced below in a slightly abbreviated form.

Theorem 1 (Theorem 4 from [Sch03]). Assuming equation (11) holds, then as $N \to \infty$ we have

  • When $2 \unicode{x2A7D} \alpha_1$, $\Psi_{n,\lfloor t /c_N \rfloor}^N$ converges to the Λ-coalescent with $\Lambda = \delta$, which is called the Kingman coalescent.
  • For $1\unicode{x2A7D}\alpha_1\lt2$, $\Psi_{n,\lfloor t /c_N \rfloor}^N$ converges a Λ-coalescent where Λ given by a β-distribution $\textrm{Beta}(2-\alpha_1,\alpha_1)$. It follows from equation (6) that the merger rates are
    Equation (12)
    where $B(a,b) = \Gamma(a)\Gamma(b)/\Gamma(a+b)$ is the beta function. This process is called a β-coalescent with parameter α1.
  • For $\alpha_1\lt1$, lineages will coalesce in $O(N^0)$ and hence there is a discrete time limit coalescent. We refer to [Sch03] for a detailed description of this process.

This theorem is closely related to the generalized central limit theorem (GCLT) for the sum SN —see [Hal18, OH21]. Recall that the GCLT tells us that there are two critical points for the limit law of SN , one where the central limit theorem (CLT) breaks down (α1 = 2) and another where the law of large numbers (LLN) breaks down (α1 = 1), see e.g. [Ami20b, Ami20a, Nol20]. In theorem 1, these critical points correspond to the appearance of multiple mergers and the disappearance of any continuous time limit process, respectively.

3. Limit coalescent for large but finite offspring variability

3.1. Scaling assumption

Our main result concerns the case where q > 1. We reiterate that in this case the variance is finite for all ζ and, therefore, in the large N limit (with ζ fixed) the convergence is to the Wright–Fisher diffusion for the allele frequencies and Kingman coalescent for the genealogies. For any fixed N, if ζ is large enough the Wright–Fisher diffusion/Kingman models are of course going to provide a very poor approximation. In order to better understand exactly what happens when ζ is large, but finite, we set $N = N_{\zeta}$ where Nζ grows with ζ in such a way that there is a well-defined limit process as $\zeta \to \infty$. As discussed in section 4, the appropriate scaling is related to the thermodynamic limit of the REM [Eis83, BABM05].

To state our scaling assumption, we define the cumulant generating function,

To simplify some formulas later on, we define $A_{\zeta} \equiv \mathbb{E}[S_{\zeta}] = N_{\zeta}{\mathrm{e}}^{r_{\zeta}}$ with

The relationship between rζ and $W^*(z)$ is crucial for our analysis. Using the Laplace method [DB81], which amounts to evaluating $\mathbb{E}[{\mathrm{e}}^{\zeta \Phi_i}]$ where the integrand attains its maximum, it can be shown that rζ is asymptotically the convex conjugates of $W^*$. As a result, these functions are related according to the Legendre transform [Tou05]:

It then follows from equation (10) that

Equation (13)

where q' is the so-called dual exponent $q^{^{\prime}} = q/(q-1)$. The equivalence between equations (13) and (10) is a special case of Kasahara–de Bruijn's exponential Tauberian theorem [Mik99].

It follows from equation (13) that all the moments of Ui grow as $\zeta^{q^{^{\prime}}}$, since

Intuitively, if $\ln N_{\zeta}$ grows slower than $\zeta^{q^{^{\prime}}}$ then Nζ cannot keep up with the variation as ζ increases and no continuous time limit process will exist. This serves as motivation for the scaling assumption,

Equation (14)

Here, τ is a control parameter closely related to temperature in the REM. Limit theorems for the sum Sζ under this scaling assumption are proved in [BABM05]. Much like in the GCLT, Sζ has two critical points, one where the CLT breaks down and one where the LLN breaks down. Moreover, the limit law of Sζ is α-stable (see theorem 4 in section 4), which strongly suggests a connection between the limit coalescent processes obtained from offspring distributions of the form (8) under the scaling assumption (14) and those described by theorem 1. We note that sums over random exponentials also appear in statistical applications [LGA20, RAT15, SSO12].

3.2. Main result

The following theorem generalizes theorem 1 to the case q > 1. Interestingly, we find that the β-coalescent emerges universally from exponentially large offspring variation.

Theorem 2. Assume equations (10) and (14) hold with q > 1. Let

Equation (15)

and $\Psi_{n,k}^{\zeta} = \Psi_{n,k}^{N_{\zeta}}$. Then

  • For $1\lt\alpha_q\lt2$, as $\zeta \to \infty$ the discrete coalescent process $(\Psi_{n,\lfloor t /c_{\zeta}\rfloor}^{\zeta})_{t\unicode{x2A7E}0}$ converges to a β-coalescent with parameter αq .
  • For $\alpha_q \gt2$, $(\Psi_{n,\lfloor t /c_{\zeta}\rfloor}^{\zeta})_{t\unicode{x2A7E}0}$ converges to the Kingman coalescent.

In figure 2, the region $1\lt\alpha_q\lt2$ is plotted for various values of q. By moving upwards through these regions, we obtain β-coalescents. As expected, for larger q, the region becomes more tilted towards the right, since a larger value of ζ is needed to obtain a continuous-time limit process for the same N. As q → 1, the region becomes a vertical strip between $1/\zeta = 1$ and $1/\zeta = 2$; hence, theorem 1 is retrieved in this limit. The regime $\alpha_q \unicode{x2A7D} 1$ requires a different treatment not covered by this result, but is closely related to calculations of the Gibbs measure for the REM at low temperature, as we discuss in section 4.

Figure 2.

Figure 2. (left) The region $1\lt\alpha\lt2$ in $\ln N-\zeta$ space for different values of q. (right) A diagram of the idea behind the definition of α.

Standard image High-resolution image

We will discuss theorem 2 in section 5. Here, we provide an outline of the derivation that will be useful when making the comparison to results for the REM. Following [Sch03], note that we can replace ν1 with $NU_1/S_N$ when computing averages (this is justified by lemma 3). Therefore,

Equation (16)

An LLN for Sζ (lemma 2) allows us to replace Sζ with $U_1 + N_{\zeta}{\mathrm{e}}^{r_{\zeta}}$ (this is justified by lemma 6). Hence, letting $f_U(u)$ denote the density of Ui and replacing sums with integrals (this is justified rigorously by [Sch03, lemma 12]), we have

Equation (17)

Equation (18)

To obtain the last equation we have changed variables $z = u/A_{\zeta}$. If fU is a power law, then the integral can be evaluated and is given in terms of Γ-functions—this is one way to prove theorem 1, although a different approach is taken in [Sch03]. The essence of our argument is that when we evaluate this integral we can replace $f_U(zA_{\zeta})$ with ${\mathrm{e}}^{W^*(\zeta^{-1}(\ln A_{\zeta} z))}$ and then neglect the higher-order terms in

Here, we have used the relations

to simplify the exponents of ζ. Since the coefficient of $\ln z$ is independent of ζ, fU may be approximated by a decaying power law with exponent αq , defined by equation (15). Making this replacement and evaluating the integral leads to the following lemma, which says that $\mathbb{E}\left[(\nu_1)_k\right]$ is dominated by the event $U_1\gt N$ when $k\gt\alpha_q$.

Lemma 1. For $1\lt\alpha_q\lt k$, $k \in {\mathbb{N}}$ we have

Equation (19)

If $\alpha_q\lt2$, then from lemma 1 we have

and then from equation (4) and another application of lemma 1,

These are precisely the merger rates of the β-coalescent and due to the consistency condition (5), uniquely determine the rates $\lambda_{n,k}$ for k < n.

On the other hand, when $\alpha_q \gt k$, $\mathbb{E}[\nu^k]$ is no longer dominated by the tail and we can replace Sζ with Aζ in equation (16), leading to

Thus if $\alpha_q\gt 2$,

Equation (20)

In this regime, cζ decays slower than $N^{1-n}\mathbb{E}[\nu_1^n]$ for all n > 2 and hence all the $\lambda_{n,n}$ except $\lambda_{2,2}$ vanish.

3.3. Relationship to previous results for the dormancy model

We now remark on the relationship between theorem 2 and the results of [CGCSWB22]. In their model, it is assumed that at the beginning of each generation (referred to as the spring in [CGCSWB22]), individuals experience a period of dormancy during which no reproduction occurs. At random times, individuals awaken from dormancy and reproduce according to a Yule process—that is, new individuals are spawned from existing ones at a (deterministic) rate until the end of the generation. The authors also allow for a period (called the summer) during which all individuals are awake and reproducing, although we will neglect this for the present discussion.

To make the connection to our model, let ζ be the rate at which cells divide after the period of dormancy has ended and let $\Phi_i$ denote the duration of the ith cell's growth phase (i.e. the difference between the total time between bottlenecks and the dormancy period). In the dormancy model, it follows from properties of Yule processes that $U_1|\Phi_1$ is a geometric random variable with parameter ${\mathrm{e}}^{-\zeta \Phi_1}$, hence

and $E[U_1|\Phi_1] = {\mathrm{e}}^{\zeta \Phi_1}$. In contrast, in our model U1 is deterministic after conditioning on Φ1. This distinction should have no effect on the limit coalescent under our scaling assumption, since $\left(1 - {\mathrm{e}}^{-\zeta \Phi_1} \right)^{\lfloor {\mathrm{e}}^{\zeta \phi} \rfloor}$ can be replaced with $1_{\Phi_1\gt\phi}$ in the large ζ limit, and therefore

At least heuristically, this justifies the replacement of U1 with ${\mathrm{e}}^{\zeta \Phi_1}$ when calculating the asymptotic behavior of the coalescent process. However, we have neglected variation $U_1|\Phi_1$ in order to simplify the derivations in the present paper.

The main results of [CGCSWB22] concern the limit genealogies and are closely related to ours. Theorem 1.3 concerns the situation where the period of growth after dormancy is exponentially distributed and is therefore very similar to theorem 1 in the present paper. Their more general result, stated below, characterizes the possible forms of Λ which can emerge as limits of the discrete coalescents in the dormancy model.

Theorem 3 (Proposition 1.8 and theorem 1.7 from [CGCSWB22]). Suppose Ui is of the form (8). For any Λ-coalescent that emerges as the limit of $(\Psi_{n,\lfloor t/c_{\zeta} \rfloor}^{\zeta})_{t\unicode{x2A7E}0}$ in the dormancy model described above, Λ has the form

where δx is the point mass at x and h is a probability density on (0, 1) with the representation

Equation (21)

for a completely monotone function g satisfying

Our main result, theorem 2, says that under the scaling assumption (14) the limit coalescent is of the form described by theorem 3 with $g(v) = v^{-1-\alpha_q}$ and $b_0 = b_1 = 0$.

4. Interpretation in the REM

4.1. Background on the REM

The REM was introduced by Derrida in [Der81] as a toy model of spin glasses. As with other models of magnetic systems, the state space is the n-hypercube, ${\mathcal{C}}_n \equiv \{-1,1\}^{n}$ and each configuration $\sigma \in {\mathcal{C}}_n$ is assigned an energy, Eσ . When in equilibrium with a reservoir of temperature T, the steady-state distribution of configurations is given by the Boltzmann distribution $P_{\sigma} \propto {\mathrm{e}}^{-\beta E_{\sigma}}$ where $\beta^{-1} = T$ is the inverse temperature (assuming $k_{\mathrm{B}} = 1$ for notational simplicity). The quantity of central interest in (equilibrium) statistical mechanics is the free energy,

Equation (22)

It is said that the thermodynamic limit exists if the limit of $F_n/n$ exists and, by differentiating the free energy density, $\psi \equiv -\lim_{n \to \infty} F_n/(\beta n)$, with respect to temperature (or other model parameters), various thermodynamic relations are obtained [Gol18, Bov06]. In spin glass models, Eσ is itself taken to be a random variable and the free energy density is computed from the mean free energy, $\mathbb{E}[F_n]$. The hope is always that the thermodynamics emerging from a random energy field are typical of systems with very unstructured energy landscapes. For mathematical convenience $\{E_{\sigma}\}_{\sigma \in {\mathcal{C}}_n}$ is usually taken to be a Gaussian random field and the simplest such model is of course found by taking Eσ to be i.i.d. In order for the thermodynamic limit to exist in this case, the variance of Eσ must grow proportional to n—this is precisely Derrida's REM, which can be understood as the limit of a very rugged energy landscape.

Since there are no correlations between energies, we can abandon the hypercube structure and identify the state space with $[2^n]$, writing Ei for the ith energy level. If $E_i/\sqrt{n}$ has variance $1/2$ and $\Phi_i$ are Gaussian with unit variance, we can map the partition function of the REM to the total number of offspring in the Cannings model by setting

where

Note that the negative sign in front of Ei is inconsequential because we have assumed $\mathbb{E}[\Phi_i] = 0$ and can simply change the sign in equation (10).

Interestingly, even in this simple model one finds a phase transition, which occurs at a critical value $\beta_c = 2\sqrt{\ln2}$ or in our notation, τ = 1 (with q = 2). The nature of this transition can be understood by examining the average number of configurations for which $E_i \in [n\varepsilon,n(\varepsilon + {\mathrm{d}}\varepsilon)]$, which we denote by ${\mathcal{N}}({\mathrm{d}}\varepsilon)$ [MM09]. Loosely speaking, ${\mathcal{N}}({\mathrm{d}}\varepsilon)$ is on the order of [Kis15]

This approximation makes sense only when $\varepsilon \lt \beta_c/\sqrt{2}$, since for large n there are virtually no configurations with $\varepsilon \gt\beta_c/\sqrt{2}$. With these heuristics, one can identify two regimes for the entropy density, $s(\varepsilon) \equiv \lim_{n \to \infty} \frac{1}{n}\ln {\mathcal{N}}({\mathrm{d}}\varepsilon)$: $s(\varepsilon) = \beta_c^2/2 - \varepsilon ^2$ when $|\varepsilon| \unicode{x2A7D} \sqrt{2}\beta_c$ and $s(\varepsilon) = -\infty$ when $|\varepsilon|\gt\sqrt{2}\beta_c$. Meanwhile, the free energy density can be obtained as

and hence $\psi(\beta)$ and $s(\varepsilon)$ are convex conjugates of each other 4 . A short calculation using the Laplace method yields Derrida's result:

Equation (23)

When $\beta\unicode{x2A7D}\beta_c$ the variation in energy is small enough that an LLN holds (lemma 2 below). Indeed, one way to arrive at $\psi(\beta)$ in the high-temperature phase is to replace $\mathbb{E}[\ln S]$ with $\ln \mathbb{E}[S]$, since

In this regime, we say that the system is self-averaging. On the other hand, when $\beta\gt\beta_c$, the system enters a so-called 'frozen' phase where the partition function is dominated by the extremal statistical weights.

The arguments above can be made rigorous and generalized to any exponent q > 1—for example, see [Eis83]. In particular, we have the following LLN.

Lemma 2 (Theorem 2.1 from [BABM05]). Let q > 1 and assume

Then for all ε > 0,

In the context of the Cannings model, the self-averaging of Sζ allows us to make the approximation used in equation (17) and ensures there is a continuous time limit process.

4.2. Limit theorem for the partition function

A richer mathematical structure of the REM is revealed by a careful study of the fluctuations in the partition function. This is closely related to the GCLT for i.i.d. sums over scale-invariant random variables, where one can distinguish between regimes of weak and strong self-averaging. The former refers to the case where there is an LLN, but no CLT for the sum. The precise limit theorem for Gaussian energy distributions is stated in [Bov06] and the extension to distributions of the form (10) can be found in [BABM05]. We now state (a slightly watered down) version of their result.

Theorem 4 (Theorem 2.3 from [BABM05]). Let

Equation (24)

and

Equation (25)

where Bζ is defined by

Equation (26)

  • For $1\lt\tilde{\alpha}_q\lt2$, Zζ converges in distribution to an α-stable random variable $Z_{\tilde{\alpha}}$ for which the characteristic function is
    Equation (27)
    where the parameter, $\tilde{\alpha}_q$, is given by equation (24).
  • For $\tilde{\alpha}_q\gt2$, Uζ obeys a central limit theorem, meaning that it converges to a Gaussian when rescaled by the standard deviation.

Briefly, the α-stable distributions mentioned in this result are defined by the property that they are equal in distribution to linear combinations of realizations of themselves: Z is α-stable if and only if there are constant $c,d,e$ such that $Z = cZ_1 + dZ_2 + e$ for two random variables Z1 and Z2 equal in distribution to Z. The GCLT states that α-stable distributions arise as limits of sums of i.i.d. random variables whose variances are not necessarily finite [Ami20b, Zol86]. Hence, our result plays the role of theorem 4 for the Cannings model by expanding theorem 1 to the 'large but finite' regime.

Much like theorem 2, the idea behind theorem 4 is that in the transition regime ($1\lt\tilde{\alpha}\lt2$), the sum will be dominated by the maximum. From equations (14) and (10), we have

The left-hand side will approach one as $\zeta \to \infty$, so in order to obtain a well-defined limit we need to look at deviations on a scale Bζ , which is increasing with ζ. To this end, we replace u with $uB_{\zeta}$ and make the approximation

Equation (28)

In order for this to have a large ζ limit, the first two terms must cancel, indicating that Bζ should satisfy equation (26). Since $B_{\zeta} = O(\zeta^{q^{^{\prime}}})$, it follows from equation (26) that the coefficient of $\ln u$ in equation (28) is independent of ζ and given by

Equation (29)

which simplifies to equation (24). This plays the role of α in our analysis of the coalescent process. The two parameters agree only at $\alpha = \tilde{\alpha} = 1$, which is the transition to the frozen state; see figure 3. Interestingly, this implies that the regime of weak-self averaging in theorem 4 ($1\lt\tilde{\alpha}_q\lt2$) does not exactly correspond to the regime of multiple merger coalescents in theorem 2 ($1\lt\alpha_q\lt2$), although the two coincide in the limit q → 1.

Figure 3.

Figure 3.  α compared to $\tilde{\alpha}$ as a function of the tail exponent q for different values of τ. The lower limit of the plot is the critical point α = 1 where both expressions agree. Above this point $\alpha\gt\tilde{\alpha}$ for all q.

Standard image High-resolution image

4.3. Gibbs measure

In addition to the partition function, there is an interest in understanding fluctuations in the statistical weights

Equation (30)

in the REM and other disordered systems models. In the large-ζ limit, these weights approach a measure on the infinite dimensional hypercube ${\mathcal{C}}_{\infty} = \{-1,1\}^{\mathbb{Z}}$ called the Gibbs measure—see [RAS15] for an introduction to this formalism. The question of how the Gibbs measure fluctuates between replicates of the energies is closely related to the coalescent process, since $\{U_i/S\}_{i = 1,\dots,N_{\zeta}}$ (asymptotically) have the same distribution as Pσ when the configurations are projected onto the one dimensional lattice $[2^n] = \{1,\dots,2^n\}$. The projected Gibbs measure approaches the Lebesgue measure on the unit interval in the high-temperature regime [Bov06]. However, the main focus in this context of the REM appears to have been the low-temperature regime, $\tilde{\alpha}_q\lt1$.

There is already an established connection between coalescent processes and the Gibbs measure via Derrida's generalized REM (GREM). In this model, the energies are no longer independent, but drawn from a Gaussian random field on ${\mathcal{C}}_n$ whose correlation function depends on a certain ultrametric distance between configurations—see for a precise description [Ber09]. This ultrametric distance induces a hierarchical structure to the configuration space. Ruelle gave a mathematical formulation of Derrida's models [Rue87] and the connection to coalescent processes was made in [BS98] in the context of their abstract cavity method. The authors show that by sampling configurations from the Gibbs measure and constructing a genealogical tree based on the hierarchies induced by the distance, one obtains (up to a time-change) a continuous time coalescent process now referred to as the Bolthausen-Sznitman coalescent. These processes is nothing but the β coalescent with $\tilde{\alpha}_q = 1$.

To understand what theorem 2 tells us about Pσ , suppose we sample two configurations i and j from the equilibrium distribution and define the replica overlap

Equation (31)

It is not too difficult to see that $\mathbb{E}[\rho_{1,2}]$ is asymptotic to the inverse coalescent time, cζ : First, notice that conditional on $\{\Phi_i\}_{i \in [N]}$ the distribution of $\varrho_{1,2}$ is (see [DM21])

Equation (32)

The joint distribution of the terms in the sum is asymptotic to that of $\{(\nu_i/N_{\zeta})^2\}_{i\in [N]}$. Therefore, recalling equation (2), we can see that $\mathbb{E}[\rho_{1,2}] \sim c_{\zeta}$. It is well known that in the high-temperature regime ($\tilde{\alpha}\gt1$)

Equation (33)

while in the low-temperature phase Derrida derives an expression in terms of the Beta functions.

Now suppose we sample configurations of the Gibbs measure from n replicates of the REM and consider the event that these configurations are not unique and instead $k_i\gt1$ come from configuration i for $i = 1,\dots,r$ with $\sum_{i = 1}^rk_i = n$. Such events are simply the analogues of $(k_1,\dots,k_r;0)$-collisions defined in section 2.1, and they occur with probability

Equation (34)

This is asymptotic to the expression appearing in the definition of $\lambda_{n,k_1,\dots,k_r;s}$ (equation (3)) and in the REM is related to the higher-order replica overlaps. We can then ask what the chance of these events is when we take $1/c_{\zeta}$ replicates, which yields $\lambda_{n,k_1,\dots,k_r;s}$. Therefore, theorem 2 tells us that there is a phase transition in the typical composition of our samples at the critical point α = 2. As we have explained above, this transition happens at a lower temperature (higher β) than the breakdown of the CLT for the partition function at $\tilde{\alpha}_q = 2$.

5. Proof of theorem 2

In order to simplify the notation in the proofs, we set $\alpha = \alpha_q$.

Proof of lemma 1. The idea of the proof is that most of the contributions to the moments of ν1 come from the event $\nu_1 \gt A_{\zeta}$. Let $f_{\Phi_1}$ denote the density of Φ1; hence

Define

By lemmas 4 and 6 in appendix A,

where we have changed variables $z = u/A_{\zeta}$. This is a step which breaks down for $\alpha\unicode{x2A7D} 1$, since lemma 6 uses lemma 2.

Now define $R_{\zeta}(z)$ and $K_{\zeta}(z)$ by

Equation (35)

Equation (36)

where

Equation (37)

Observe that $\hat{\alpha}_{\zeta}\sim \alpha$ and by equation (10) there is a constant C' such that

for large enough ζ.

Because $R_{\zeta}(z)\gt0$ for large enough ζ for all z,

Equation (38)

Equation (39)

Assume ζ is large enough that $|\hat{\alpha}_{\zeta}-\alpha|\lt\varepsilon$. Then

Equation (40)

As ε → 0, using that the logarithmic term vanishes,

Equation (41)

Equation (42)

Similarly, for large enough ζ and any L

Equation (43)

Equation (44)

Equation (45)

We want to take $L = L_{\zeta} \to \infty$ as $\zeta \to \infty$ in such a way that $R_{\zeta}(L_{\zeta})$ vanishes . To this end, we note that

Equation (46)

and by the definition of $W^*$ (equation (10)),

Equation (47)

As a result, the coefficient of $(\ln L_{\zeta}/\zeta)^n$ in the expansion of $R_{\zeta}(L_{\zeta})$ grows as a power law in ζ with exponent

In particular, if $\ln L_{\zeta} = \zeta^{c}$, then

where $n\unicode{x2A7E} 1$. Provided we pick $c \in (0,1/(q-1))$, $R_{\zeta}(L_{\zeta})$ will vanish as $\zeta \to \infty$. In summary, we have

Equation (48)

Finally, note that

Equation (49)

so

Equation (50)

 □

If $\alpha\gt k$, the integral derived above diverges and the bounds are no longer useful. The divergence comes from the very small values of z (meaning small Φ relative to Aζ ) when we make the linear approximation. In this case, we have

Equation (51)

Equation (52)

Note that $\left(k(1+\tau)- k^{q^{^{\prime}}}\right)r_1^*\unicode{x2A7E} W^*((1+\tau)r_1^*)$. In fact, these are equal exactly at $\alpha_q = 2$.

Proof of theorem 2. Lemma 1 implies

Equation (53)

Simplifying the exponent, we have

Equation (54)

Equation (55)

The prefactor of $\zeta^{q^{^{\prime}}}$ evaluates to 0 at α = 1. Since q > 1,

Equation (56)

so that $c_{\zeta}^{-1} \to \infty$ for α > 1.

The merger rates in equation (12) follow from lemma 1, so it remains to check (7). Applying lemmas 1 and 5,

Equation (57)

Equation (58)

Notice that we are again using the idea that the expectation is dominated by the event ${\mathrm{e}}^{\zeta \Phi_1}\gt A_{\zeta}$, since the final expression above is asymptotic to $(A^{^{\prime}})^2 N_{\zeta}^4\mathbb{P}(e^{\zeta \Phi_1}\gt A_{\zeta})^2$. It follows that

Equation (59)

When α > 2, lemma 1 and equation (20) implies

Equation (60)

It follows that $\lambda_{3,3}\to0$ and by equation (5) $\lambda_{n,n} = 0$ for n > 3. □

6. Discussion

In this article we have studied the asymptotics of genealogies in the Cannings model when both the population sizes and offspring variation are simultaneously taken to be large. Our results are not covered by previous results for scale-invariant offspring distribution, since in that setting the offspring variation is infinite for finite N. Our analysis rests on a certain scaling assumption under which the total number of offspring produced in a generation is equivalent to the partition function of the REM and the offspring numbers νi are related to the Gibbs measure.

Our main finding is that the β-coalescent—a previous studied model of coalescence in populations where variation in offspring is infinite—also emerge from models where the tail of the offspring distribution is thin, but large fluctuations are not too unlikely. This is related to the existing limit theorems for the Cannings model proved in [Sch03] in the same way that the fluctuation theorem in [BABM05] is related to the GCLT for i.i.d. sums. Our results do not describe the critical point and low-temperature regime, although at least for q = 2 these can likely be deduced from previous results on the REM—see [Bov06]. The limit coalescent processes are the discrete-time Ξ-coalescents with simultaneous multiple mergings of lineages described in [MS01].

Biologically, theorem 2 suggests the β-coalescent serves as a more universal description of neutral evolution in the presence of highly skewed offspring distributions than might be expected. This also indicates that little information about the demographic structure of a population is contained in the coalescent process itself.

Acknowledgments

I am thankful to Daniel Weissman, Takashi Okada, Linh Huynh, Oskar Hallatschek and Ariel Amir for helpful discussions and feedback on early versions of this work. Three anonymous referees provided detailed feedback which greatly improved the quality of this manuscript. Finally, I would like to thank the organizers of the conference 'Evolution Beyond the Classical Limits' held at the Banff International Research Station (BIRS), where preliminary work on this manuscript was carried out.

Appendix: Additional Lemmas

We will assume $U_i\gt1$ and therefore $S_N\gt N$. It is straightforward to obtain the results for the more general case where only equation (9) holds.

Lemma 3. For $0\lt \varepsilon \lt x$

Equation (A.1)

and

Equation (A.2)

Proof. Following the proof of [Sch03, theorem 4], we condition on $\{X_i\}$ to obtain

Using that $\nu_1|\{X_i\}$ has a hypergeometric distribution with parameters $(S,X_1,N)$,

Since T grows sub-exponentially with N,

which is the upper bound.

The lower bound is obtained similarly using the variable $1_{\{X_1/S\unicode{x2A7E} x+\varepsilon\}}$. □

Lemma 4 (Similar to lemma 6 of [Sch03]). For $r\unicode{x2A7E} 1$ and $k_1,\dots,k_r\unicode{x2A7E} 2$,

Proof. Let $M_{k_1,k_2}^i$ be the event that individuals k1 through k2 descent from individual i in the previous generation and set

As $N \to \infty$ (or $\zeta \to \infty$),

Since $S_N \unicode{x2A7E} N$,

Equation (A.3)

which implies the result. □

Lemma 5 (Similar to Lemma 8 of [Sch03]). For each $k\unicode{x2A7E}1$ there is a constant Bk such that

Proof. By lemma 4,

By the LLN

for large enough ζ; therefore,

Equation (A.4)

Equation (A.5)

The result follows from

Equation (A.6)

 □

Lemma 6. Assume equations (10) and (14). Then for $1\lt\alpha$

Proof. The idea of the proof is based on [Sch03] combined with lemma 2. By the lemma 2, for any ε > 0 and δ such that $(1-\delta)\mathbb{E}[X_{i,N}]\gt1$, we can pick ζ large enough that

Equation (A.7)

If δ and N are such that equation (A.7) holds,

and

Equation (A.8)

The result follows after taking $\varepsilon,\delta \to 0$. □

Footnotes

  • The term generation is used to refer to a time-step in the Cannings model.

  • Let ${\mathbb{D}}_{{\mathcal{P}}_n}[0,\infty)$ denote the space of cádlág functions (right continuous functions whose left limit exits) from $[0,\infty)$ to ${\mathcal{P}}_n$. The results of [MS01] concern instances where $(\Psi_{n}^N(t))_{t\unicode{x2A7E}0}$ has a weak limit ${\mathbb{D}}_{{\mathcal{P}}_n}[0,\infty)$ under the Skorohod toplogy J1.

  • In the terminology of large deviation theory, $\phi(\beta)$ is the scaled cumulant generating function associated with the normalized energy ε, whose large deviation rate function is $s(\varepsilon)$ [Tou09].

Please wait… references are loading.