Coalescent processes emerging from large deviations

The classical model for the genealogies of a neutrally evolving population in a fixed environment is due to Kingman. Kingman's coalescent process, which produces a binary tree, universally emerges from many microscopic models in which the variance in the number of offspring is finite. It is understood that power-law offspring distributions with infinite variance can result in a very different type of coalescent structure with merging of more than two lineages. Here we investigate the regime where the variance of the offspring distribution is finite but comparable to the population size. This is achieved by studying a model in which the log offspring sizes have a stretched exponential form. Such offspring distributions are motivated by biology, where they emerge from a toy model of growth in a heterogenous environment, but also mathematics and statistical physics, where limit theorems and phase transitions for sums over random exponentials have received considerable attention due to their appearance in the partition function of Derrida's Random Energy Model (REM). We find that the limit coalescent is a $\beta$-coalescent -- a previously studied model emerging from evolutionary dynamics models with heavy-tailed offspring distributions. We also discuss the connection to previous results on the REM.


Introduction
Evolution is, in large part, shaped by a tension between two opposing "forces": neutral genetic drift and selection. Neutral genetic drift refers to random changes in the genetic composition of a population due to chance events in the deaths and reproduction of individuals. Selection, in contrast, results from a deterministic bias towards fitter individuals. A major objective of evolutionary biology is to determine the relative impact of these forces. In order to achieve this, we need to understand how different microscopic mechanisms manifest in macroscopic observables, such as the frequency of a genotype or the shape of the genealogical tree.
Much of our understanding of the interplay between genetic drift and selection comes from the Wright-Fisher model [NW18,DD08], or its counterpart with overlapping generations, the Moran model. In both models, the source of noise is the random sampling of individuals from generation to generation. In the large population size limit this sampling noise has a variance which is inversely proportional to $N$, the population size. When genetic drift is the sole source of changes in the composition of the population (that is, in neutrally evolving populations), $N$ sets the characteristic time-scale of the evolutionary dynamics.
The true population size is rarely related in a simple way to the variance in genotype frequency, as predicted by the Wright-Fisher model. Therefore, one usually thinks of $N$ as an effective population size measuring the overall strength of genetic drift, rather than the literal number of cells. This has motivated the question: what determines the effective population size? Of particular interest is the question of how nongenetic variation between individuals shapes the effective population size [JLA23,LMKA21]. J.H. Gillespie was one of the first to address this question by considering a model in which the number of offspring from each individual is a random variable having finite variance. He showed that in this context the effective population size is obtained by scaling the true population size by the variance in offspring [Gil73,SK12,Sch15].
We now have a more general and mathematically rigorous understanding of this problem which is based on the Cannings model [Can74]. Beginning with $N$ labeled cells, trajectories of the Cannings model are constructed by generating an exchangeable random vector $(\nu_{1,k}, \ldots, \nu_{N,k})$ satisfying $\sum_{i=1}^{N} \nu_{i,k} = N$ for each $k \in \mathbb{N}$. Here $\nu_{i,k}$ represents the number of offspring in generation $k$ which descend from individual $i$ in the $(k-1)$th time-step. We will henceforth omit the subscript $k$, and it should be understood that all quantities associated with a generation are implicitly dependent on $k$.‡ Inspired by [Sch03,Hal18,OH21], we will focus on the particular case where $\nu_i$ is obtained by first generating iid random variables $\{U_i\}_{i=1}^{N}$ representing the number of offspring of each individual before resampling, and then sampling the resulting offspring pool, without replacement, to obtain the individuals in the next generation. Conditional on the offspring numbers $U_1, \ldots, U_N$, $\nu_i$ follows a hypergeometric distribution:
$$P(\nu_i = m \mid U_1, \ldots, U_N) = \binom{U_i}{m}\binom{S_N - U_i}{N - m} \bigg/ \binom{S_N}{N}, \qquad S_N = \sum_{j=1}^{N} U_j. \qquad (1)$$
This formulation bears a close resemblance to the model studied by Gillespie [Gil73] and is an example of a Generalized Wright-Fisher model as defined in [DEP11].
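The resampling step described above can be sketched in a few lines. This is a minimal illustration, not the construction used in the proofs: the geometric offspring law is a hypothetical stand-in chosen only so that $U_i \geq 1$ and the pool is never smaller than $N$, and the function name `cannings_generation` is ours.

```python
import numpy as np

def cannings_generation(offspring_counts, rng):
    """Resample N individuals, without replacement, from the offspring pool.

    offspring_counts[i] plays the role of U_i; the result is (nu_1, ..., nu_N).
    """
    U = np.asarray(offspring_counts, dtype=np.int64)
    N = U.size
    if U.sum() < N:
        raise ValueError("offspring pool is smaller than the population size")
    # Given U, the joint law of (nu_1, ..., nu_N) is multivariate hypergeometric,
    # so each nu_i is marginally hypergeometric.
    return rng.multivariate_hypergeometric(U, N)

rng = np.random.default_rng(0)
N = 1000
U = rng.geometric(0.2, size=N)  # hypothetical offspring law with U_i >= 1
nu = cannings_generation(U, rng)
print(nu.sum(), nu.max())
```

NumPy's sampler limits the total pool size, so for the exponentially large $U_i$ considered later one would instead work with the weights $U_i/S_N$ directly.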
One limit theorem for the Cannings model concerns the behavior of genotype frequencies over long time-scales in large populations. By long time-scales, we mean on the order of the coalescent time, defined as the average number of generations we must travel backwards in time to find a common ancestor for two randomly selected individuals in the same generation. The coalescent time is equivalent to the effective population size in some cases, but is more general in the sense that the definition does not require a mapping to the Wright-Fisher model. Under the assumption that the variance in the number of offspring produced by each individual is finite, the time-rescaled genotype frequencies converge as $N \to \infty$ (in the Skorokhod sense; see [Ker22,EK09]) to the well-known Wright-Fisher diffusion (WFD) [MS01,Gil74]. In the WFD, the change in the frequency $X(t)$ of a genotype over a time interval $dt \ll 1$ is normally distributed with mean zero and variance $X(t)(1 - X(t))\,dt$.
There is a similar limit theorem, due to Kingman [Kin82], for the genealogical trees in the Cannings model. These trees can be generated by a stochastic process known as a coalescent process which, roughly speaking, specifies which lineages have merged $k$ generations back in time from our original sample. Under the condition that the genotype frequencies converge to the WFD, the time-rescaled coalescent process converges to a continuous time process now known as the Kingman coalescent. The genealogical tree produced by this process is almost surely a binary tree in which each pair of lineages merges at rate one and, importantly, the probability of more than two lineages merging in a single instant is zero. Similar results exist for other microscopic models (e.g. the Moran process) and a large body of work focuses on understanding how the microscopic details of the process shape the coalescent time, or effective population size [Gil74,JLA23,Cha09,Gil04,Gil01].
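As an illustration of the pairwise-merger dynamics, the following sketch simulates the time to the most recent common ancestor under the Kingman coalescent. It relies only on the standard fact that, with $k$ lineages present and each pair merging at rate one, the next merger occurs after an exponential time with rate $k(k-1)/2$. The function name and sample sizes are illustrative choices.

```python
import numpy as np

def kingman_tmrca(n, rng):
    """Time for n lineages to coalesce into one under pairwise mergers at rate one."""
    t = 0.0
    for k in range(n, 1, -1):
        rate = k * (k - 1) / 2.0          # number of pairs among k lineages
        t += rng.exponential(1.0 / rate)  # waiting time until the next binary merger
    return t

rng = np.random.default_rng(0)
n = 10
samples = [kingman_tmrca(n, rng) for _ in range(20_000)]
mean_tmrca = float(np.mean(samples))
# The exact expectation is sum_k 2 / (k (k - 1)) = 2 (1 - 1/n).
print(mean_tmrca, 2.0 * (1.0 - 1.0 / n))
```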
The WFD/Kingman models are not always sufficient to capture the dynamics of real evolution. This is due to, for example, multiple merger coalescents appearing in experimental data [TL14,SHJ19]. In multiple merger coalescents there is a non-negligible probability to observe more than two individuals sharing a common ancestor in a single unit of time (e.g. a generation in the model) on time-scales of the order of the coalescent time. In [Sch03], Schweinsberg explored the question of whether such genealogies can emerge from neutral evolution by studying the Cannings model with power-law distributions. A number of papers have since investigated the role of power-law-tailed offspring distributions in generating multiple merger coalescents and non-diffusive genotype frequency fluctuations [Hal18,OH21,CGCSWB22]. The genotype frequency dynamics which emerge from power law offspring, known as Λ-Fleming-Viot processes, are non-diffusive processes which, in general, have discontinuous sample paths. With the notable exception of [CGCSWB22], most previous work has focused on how multiple merger coalescents emerge when the variance in offspring is infinite. The assumption of infinite variance is a mathematical convenience, which may be justified in some cases, e.g. for certain models of dormancy [CGCSWB22, WV19] and rare mutations [LD43]. What remains unclear is which coalescent processes emerge when the offspring distribution has a variance which is finite, yet large enough relative to the population size to give rise to multiple merger coalescents.
In this paper, in order to understand the role of large but finite offspring variability, we investigate the limit processes which emerge when the population size and offspring variability are simultaneously taken to be large. Similar limits were investigated in [CGCSWB22], but we focus on a more specific scaling between population size and offspring variability, which allows us to obtain precise descriptions of the limit coalescents. The offspring distribution we consider and our scaling assumption are both inspired by prior work on the Random Energy Model (REM) of disordered systems.
Our main result (Theorem 2) says that the limit processes emerging from the genealogies of the Cannings model under this scaling limit, called β-coalescents, are the same as the coalescent processes emerging from power law offspring. Our model is parameterized by a scaling rate which is analogous to the temperature of the REM. Just as in the REM, we find there are two critical points. Below the lower critical point there is no continuous time limit process, while between the two critical points one finds multiple merger coalescents. However, while the lower critical point corresponds exactly to the lower critical point of the REM marking the transition to the "frozen" state, the upper critical point does not have the obvious interpretation in the context of the REM (which would be to separate the regimes of strong and weak self-averaging of the partition function). Our results complement and expand upon the existing connection between coalescent theory and the REM, which was made by Bolthausen and Sznitman in [BS98].

Organization of this paper
This paper is organized as follows. In Section 2.1 we describe the model under consideration, which is a particular instance of the Cannings model. In Sections 2.2 and 2.3 we review some background on coalescent theory and what is known about this model. In Section 3 we present our main result, which concerns the limiting coalescent when both the population size and offspring variation are taken to be large. We also discuss the relationship between our results and [CGCSWB22]. Section 4 is devoted to the Random Energy Model and the connection between our results and the thermodynamic limit of the REM.

Background
Throughout this paper, we use the following standard notation.

Exponential offspring model
In order to study the situation where variation in offspring numbers is finite, but large relative to the population size, we set
$$U_i = e^{\zeta \Phi_i}, \qquad (2)$$
where $\{\Phi_i\}_{i=1}^{N}$ are iid and $\zeta$ is a deterministic parameter. Since we are interested in the large noise limit, we will eventually take $\zeta \to \infty$. In the remainder of this paper, in order to avoid ambiguity we will often replace $i$ with 1 when referencing elements of an exchangeable random vector. Note that, technically, $U_1$ should take values in $\mathbb{Z}$; however, in the large noise limit the distinction is not relevant. Moreover, taking $\zeta \to \infty$ implies $E[U_1] \to \infty$, so we can assume that (3) Offspring distributions of the form of Equation 2 were also considered in [SJHW23] within the context of a model including heritability and in [CGCSWB22] in a model of dormancy.
It is biologically sensible to work with variation on an exponential scale whenever the organisms in question proliferate exponentially between bottlenecks. In this context, $\zeta\Phi_i$ is the product of the duration of growth between bottlenecks and the time-averaged growth rate for the offspring of the $i$th cell. For example, imagine an asexual population that is subject to successive cycles of growth and dilution in a heterogeneous environment. An example is a microbial pathogen, such as Mycobacterium tuberculosis [Gag18, FVJH+21]. Suppose that after passing through a bottleneck, each of the $N$ cells occupies a spatially distinct location. Then, due to heterogeneity in the environment, the offspring of each cell will proliferate at different rates, $\Phi_i$. If $\zeta$ is the duration of the growth phase, the total number of offspring from the $i$th cell (before resampling) will be of the form of Equation 2. Alternatively, one could imagine that it is the duration of the growth phase which is random (for example, due to a period of dormancy before growth) while the growth rates are fixed. In this case we would use $\zeta$ to denote the growth rate and $\Phi_i$ the duration of the growth phase. Such a dormancy model was studied in [CGCSWB22].
We focus on the case where $\Phi_1$ has a stretched exponential distribution, which essentially means the large deviations of $\Phi_1$ are scale-invariant. To be precise, let We assume $W(\phi)$ is smooth and bounded as $\phi \to 0$ and, inspired by [Eis83,BABM05], assume
$$W(\phi) \sim \frac{\phi^q}{q} \qquad (4)$$
as $\phi \to \infty$. The pre-factor of $1/q$ is arbitrary, since it could be absorbed into $\zeta$, but this particular form leads to more elegant analytical formulas. Note that we can always set $E[\Phi_1] = 0$ without loss of generality, since the contribution from $e^{\zeta E[\Phi_1]}$ cancels in the ratio $U_1/S_N$. Note that offspring distributions satisfying Equations 3 and 4 include the special cases where $\Phi_1$ has exponential tails ($q = 1$), so that $U_1$ has a power law tail, and where $U_1$ is lognormal ($q = 2$).
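To make the exponential offspring model concrete, the sketch below draws stretched exponential variables and computes the family fractions $U_i/S_N$. It assumes, purely for illustration, that the tail is exactly $P(\Phi_1 > \phi) = e^{-\phi^q/q}$ for $\phi \geq 0$ (the text only requires this asymptotically) and does not center $\Phi_1$, which is harmless because a common shift cancels in $U_i/S_N$. The function names are ours.

```python
import numpy as np

def sample_phi(q, size, rng):
    """Draw Phi with tail P(Phi > x) = exp(-x**q / q) for x >= 0 (assumed exact form)."""
    v = rng.random(size)                      # uniform on [0, 1)
    return (-q * np.log1p(-v)) ** (1.0 / q)   # inverse-transform sampling

def offspring_weights(phi, zeta):
    """Return U_i / S_N for U_i = exp(zeta * Phi_i), computed without overflow."""
    log_u = zeta * np.asarray(phi, dtype=float)
    w = np.exp(log_u - log_u.max())           # subtracting the max leaves the ratios unchanged
    return w / w.sum()

rng = np.random.default_rng(1)
phi = sample_phi(q=2.0, size=10_000, rng=rng)
for zeta in (0.5, 2.0, 8.0):
    share = offspring_weights(phi, zeta).max()
    print(f"zeta={zeta}: largest family fraction = {share:.4f}")
```

For a fixed sample, the largest family fraction increases with $\zeta$, which is the sense in which large $\zeta$ corresponds to large offspring variability.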

Characterization of limit coalescent processes
For a realization of the Cannings model with population size $N$, the corresponding discrete coalescent process on a sample of size $n$, denoted by $(\Psi^N_{n,k})_{k \geq 0}$, describes the genealogical tree obtained by following each sampled individual's ancestors back through time and grouping branches when individuals share a common ancestor. This process is shown in Figure 1. We can intuitively understand $\Psi^N_{n,k}$ as the state of this tree $k$ generations back from the time our original sample was taken. To define $\Psi^N_{n,k}$ more mathematically, let $\mathcal{P}_n$ denote the space of partitions on $[n] = \{1, \ldots, n\}$. Then $\Psi^N_{n,k}$ is the $\mathcal{P}_n$-valued random variable for which $i$ and $j$ are in the same block of the partition if and only if the $i$th and $j$th individuals in the original sample share an ancestor $k$ generations in the past.
The continuous time coalescent processes which emerge as limits of $\Psi^N_{n,k}$ from any exchangeable Cannings model have been characterized in [MS01]. The authors consider the large-$N$ limit of the time-scaled coalescent process
$$\Psi^N_n(t) = \Psi^N_{n,\lfloor t/c_N \rfloor},$$
where $c_N$ is the probability that two randomly selected individuals in one generation share an ancestor in the previous generation. Observe that $c_N^{-1}$ is simply the coalescent time, since the number of generations to the most recent common ancestor of two randomly selected individuals from the current generation follows a geometric distribution with parameter $c_N$. Conditional on $\{\nu_i\}$, the chance for two individuals to both be descendants of the first individual in the previous generation is $(\nu_1/N)(\nu_1 - 1)/(N - 1)$. Multiplying by $N$ and averaging over all possible $\{\nu_i\}_{i=1}^{N}$ gives
$$c_N = \frac{E[\nu_1(\nu_1 - 1)]}{N - 1}.$$
We emphasize that, given the distribution of $U_i$, it is not straightforward to compute $c_N$ because it has a nonlinear dependence on the sum $S_N$. The precise notion of convergence considered is in the Skorohod sense§; see [EK09] for details. If $(\Psi^N_n(t))_{t \geq 0}$ converges to a continuous time process $(\Psi_n(t))_{t \geq 0}$ in the Skorohod sense, the possible transitions that can occur in the process $\Psi_n(t)$ involve $\sum_{i=1}^{r} k_i$ of the $n = \sum_{i=1}^{r} k_i + s$ blocks in $\mathcal{P}_n$ merging into $r$ lineages by collapsing into groups of sizes $k_1, \ldots, k_r$ while $s$ lineages remain unchanged. Such events are called $(n, k_1, \ldots, k_r; s)$-collisions. The rates of these collisions, which uniquely determine the law of $\Psi_n(t)$, will be denoted by $\lambda_{n,k_1,\ldots,k_r;s}$ for $n > 2$.
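The pair coalescence probability can be estimated by Monte Carlo once one can sample the offspring vector. The sketch below does this for the neutral Wright-Fisher model, where $(\nu_1, \ldots, \nu_N)$ is multinomial with equal weights and $c_N = 1/N$ exactly, which makes it a convenient sanity check. The function name and the parameter values are illustrative choices.

```python
import numpy as np

def pair_coalescence_prob(nu):
    """Probability, given nu, that two distinct sampled individuals share a parent."""
    nu = np.asarray(nu, dtype=float)
    N = nu.sum()
    return float(np.sum(nu * (nu - 1.0)) / (N * (N - 1.0)))

rng = np.random.default_rng(0)
N = 100
reps = 5_000
# Wright-Fisher reproduction: each child picks its parent uniformly at random.
estimates = [
    pair_coalescence_prob(rng.multinomial(N, np.full(N, 1.0 / N)))
    for _ in range(reps)
]
c_N_hat = float(np.mean(estimates))
print(c_N_hat, 1.0 / N)  # the two numbers should be close
```

Replacing the multinomial draw with a sampler for the exponential offspring model gives a numerical handle on $c_N$ in regimes where no closed form is available.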
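In order to relate the merger rates $\lambda_{n,k_1,\ldots,k_r;s}$ to the distribution of $\nu_i$, one notes that after conditioning on $\{\nu_i\}_{i=1}^{N}$ the chance that $r$ groups of sizes $k_1, \ldots, k_r$ each descended from individuals $1, \ldots, r$ in the previous generation is
$$\frac{(\nu_1)_{k_1} \cdots (\nu_r)_{k_r}}{(N)_{k_1 + \cdots + k_r}},$$
where $(x)_k = x(x-1)\cdots(x-k+1)$ denotes the falling factorial. Averaging over $\{\nu_i\}$, accounting for the choice of the $r$ ancestors, and rescaling time by $c_N$ leads one to consider the limit
$$\lambda_{n,k_1,\ldots,k_r;0} = \lim_{N \to \infty} \frac{E\left[(\nu_1)_{k_1} \cdots (\nu_r)_{k_r}\right]}{N^{k_1 + \cdots + k_r - r}\, c_N}. \qquad (6)$$
In particular, if this limit exists and $c_N^{-1} \to \infty$, the time-rescaled coalescent $\Psi^N_n(t)$ converges to a process $\Psi_n(t)$ with merger rates given by Equation 6. The rates for $s > 0$ can be obtained via a certain recursive relation discussed in [Sch01].
§ Let $D_{\mathcal{P}_n}[0, \infty)$ denote the space of càdlàg functions (right continuous functions whose left limits exist) from $[0, \infty)$ to $\mathcal{P}_n$. The results of [MS01] concern instances where $(\Psi^N_n(t))_{t \geq 0}$ has a weak limit in $D_{\mathcal{P}_n}[0, \infty)$ under the Skorohod topology $J_1$.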
Of particular relevance for our results are those coalescent processes for which there are multiple mergers, but not simultaneous multiple mergers; in this case $\lambda_{n,k_1,\ldots,k_r;s} = 0$ unless $r = 1$. Such coalescents are known as Λ-coalescents and are defined by the merger rates $\lambda_{n,k}$ at which a given group of $k$ out of $n$ lineages merges, with
$$\lambda_{n,n} = \lim_{N \to \infty} \frac{E[(\nu_1)_n]}{N^{n-1}\, c_N}. \qquad (7)$$
Any rates defined in this way will satisfy the consistency condition
$$\lambda_{n,k} = \lambda_{n+1,k} + \lambda_{n+1,k+1}, \qquad (8)$$
which is proved in [Pit99]. The intuition behind Equation 8 is as follows: the difference between the rate of a given $k$-merger in a sample of size $n$ and in a sample of size $n + 1$ is entirely caused by mergers of $k + 1$ lineages in the sample of $n + 1$, since these look like $k$-mergers when restricted to the sample of size $n$. Pitman also proved that any triangular array $\{\lambda_{n,k}\}_{n=1,\ldots,\infty,\,k=1,\ldots,n}$ satisfying Equation 8 has a representation in terms of a positive measure $\Lambda$ on $[0, 1]$ via the relation
$$\lambda_{n,k} = \int_0^1 x^{k-2}(1 - x)^{n-k}\, \Lambda(dx). \qquad (9)$$
Thus, there is a correspondence between coalescent processes and positive measures on the unit interval.
A simple criterion for the convergence to a Λ-coalescent is found by noting that if $\lambda_{2,2,2;s} = 0$ then there are no simultaneous multiple mergers, since these would contribute to this rate. Therefore, if $c_N \to 0$ and
$$\lim_{N \to \infty} \frac{E[(\nu_1)_2 (\nu_2)_2]}{N^2 c_N} = 0 \qquad (10)$$
the limit process of the genealogy $(\Psi^N_{n,k})_{k \geq 0}$ is a Λ-coalescent [MS01]. The following proposition, which follows from Propositions 1 and 3 of [Sch03], summarizes these observations.
Proposition 1. If $c_N \to 0$ and Equation 10 is satisfied, then as $N \to \infty$, $(\Psi^N_n(t))_{t \geq 0} = (\Psi^N_{n,\lfloor t/c_N \rfloor})_{t \geq 0}$ converges (in the Skorohod sense) to a Λ-coalescent $(\Psi_n(t))_{t \geq 0}$ with merger rates $\lambda_{n,k}$ given by Equation 7 for $k = n$ and Equation 8 otherwise.

Known result for q = 1
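We now return to the Cannings model with offspring distributions given by Equation 4. When $q = 1$, $\Phi_i$ has exponential tails and the offspring sizes, $U_i$, have power law tails
$$P(U_i > u) \sim C u^{-\alpha_1}$$
for $\alpha_1 = 1/\zeta$ and some constant $C > 0$. The large $N$ limit of the Cannings model with power law offspring (and $\zeta$ fixed) is covered by the main result of [Sch03], which we state below in an abbreviated form.
Theorem 1. Assuming Equation 2.3 holds, then as $N \to \infty$ we have
• For $\alpha_1 \geq 2$, $(\Psi^N_{n,\lfloor t/c_N \rfloor})_{t \geq 0}$ converges to the Kingman coalescent.
• For $1 \leq \alpha_1 < 2$, $(\Psi^N_{n,\lfloor t/c_N \rfloor})_{t \geq 0}$ converges to a Λ-coalescent where $\Lambda$ is given by a β distribution $\mathrm{Beta}(2 - \alpha_1, \alpha_1)$. It follows from Equation 9 that the merger rates are
$$\lambda_{n,k} = \frac{B(k - \alpha_1, n - k + \alpha_1)}{B(2 - \alpha_1, \alpha_1)},$$
where $B(a, b) = \Gamma(a)\Gamma(b)/\Gamma(a + b)$ is the beta function. This process is called a β-coalescent with parameter $\alpha_1$.
• For $\alpha_1 < 1$, lineages will coalesce in $O(N^0)$ generations and hence there is no continuous time limit process for any rescaling of time. We refer to [Sch03] for a detailed description of this process.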
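For readers who want to experiment with the β-coalescent, the sketch below evaluates its merger rates. It uses the standard consequence of Equation 9 for $\Lambda = \mathrm{Beta}(2 - \alpha, \alpha)$, namely $\lambda_{n,k} = B(k - \alpha, n - k + \alpha)/B(2 - \alpha, \alpha)$, and checks the consistency relation $\lambda_{n,k} = \lambda_{n+1,k} + \lambda_{n+1,k+1}$ numerically. The function names are ours, and the parameter is restricted to $1 < \alpha < 2$ for simplicity.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Logarithm of the beta function B(a, b) = Gamma(a) Gamma(b) / Gamma(a + b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_coalescent_rate(n, k, alpha):
    """Rate at which a given group of k out of n lineages merges, for 1 < alpha < 2."""
    if not 1.0 < alpha < 2.0:
        raise ValueError("this sketch only covers 1 < alpha < 2")
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    return exp(log_beta(k - alpha, n - k + alpha) - log_beta(2.0 - alpha, alpha))

alpha = 1.5
lhs = beta_coalescent_rate(5, 3, alpha)
rhs = beta_coalescent_rate(6, 3, alpha) + beta_coalescent_rate(6, 4, alpha)
print(lhs, rhs)  # consistency condition: the two values agree up to rounding
```

Working with `lgamma` rather than the gamma function itself keeps the computation stable for large sample sizes $n$.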
This theorem is closely related to the Generalized CLT (GCLT) for the sum $S_N$; see [Hal18,OH21]. Recall that the GCLT tells us there are two critical points for the limit law of $S_N$, one where the CLT breaks down ($\alpha_1 = 2$) and another where the LLN breaks down ($\alpha_1 = 1$); see e.g. [Ami20b, Ami20a, Nol20]. In Theorem 1, these critical points correspond to the appearance of multiple mergers and the disappearance of any continuous time limit process, respectively.
Limit coalescent for large but finite offspring variability

Scaling assumption
Our main result concerns the case where $q > 1$. We reiterate that in this case the variance is finite for all $\zeta$, and therefore in the large $N$ limit (with $\zeta$ fixed) the convergence is to the WFD for the allele frequencies and the Kingman coalescent for the genealogies. For any fixed $N$, if $\zeta$ is large enough the WFD/Kingman models will of course provide a very poor approximation. In order to better understand exactly what happens when $\zeta$ is large, but finite, we set $N = N_\zeta$ where $N_\zeta$ grows with $\zeta$ in such a way that there is a well-defined limit process as $\zeta \to \infty$. The appropriate scaling is related to the thermodynamic limit of the REM [Eis83,BABM05]; see Section 4.
To state our scaling assumption, we define the cumulant generating function, To simplify some formulas later on, we define The relationship between $r_\zeta$ and $W^*(z)$ is crucial for our analysis. Using the Laplace method [DB81], which amounts to evaluating the integral defining $E[e^{\zeta\Phi_i}]$ where the integrand attains its maximum, it can be shown that $r_\zeta$ is asymptotically the convex conjugate of $W^*$. As a result, these functions are related according to the Legendre transform [Tou05]: It then follows from Equation 4 that where $q'$ is the so-called dual exponent $q' = q/(q - 1)$. The equivalence between Equations 12 and 4 is a special case of Kasahara-de Bruijn's exponential Tauberian theorem [Mik99].
It follows from Equation 12 that all the moments of $U_i$ grow as $\zeta^{q'}$, since Intuitively, if $\ln N_\zeta$ grows slower than $\zeta^{q'}$ then $N_\zeta$ cannot keep up with the variation as $\zeta$ increases and no continuous time limit process will exist. This serves as motivation for the scaling assumption, Here, $\tau$ is a control parameter closely related to temperature in the Random Energy Model. Limit theorems for the sum $S_\zeta$ under this scaling assumption are proved in [BABM05]. The authors show that, much like in the GCLT, $S_\zeta$ has two critical points, one where the CLT breaks down and one where the LLN breaks down. Moreover, the limit law of $S_\zeta$ is scale-invariant (see Theorem 4 in Section 4), which strongly suggests a connection between the limit coalescent processes obtained from offspring distributions of the form of Equation 2 under the scaling assumption of Equation 13, and those described by Theorem 1.

Main result
The following theorem generalizes Theorem 1 to the case $q > 1$. Interestingly, we find that the β-coalescent emerges universally from exponentially large offspring variation.
• For $\alpha_q > 2$, $(\Psi^\zeta_{n,\lfloor t/c_\zeta \rfloor})_{t \geq 0}$ converges to the Kingman coalescent.
In Figure 2, the region $1 < \alpha_q < 2$ is plotted for various values of $q$. It is by moving upwards through these regions that we obtain β-coalescents. As expected, for larger $q$, the region becomes more tilted towards the right, since a larger value of $\zeta$ is needed to obtain a continuous-time limit process for the same $N$. As $q \to 1$, the region becomes a vertical strip between $1/\zeta = 1$ and $1/\zeta = 2$, hence Theorem 1 is retrieved in this limit. The regime $\alpha_q \leq 1$ requires a different treatment not covered by this result, but is closely related to calculations of the Gibbs measure for the REM at low temperature, as we discuss in Section 4.
We will prove Theorem 2 in Section 5. Here, we provide an outline of the derivation which will be useful when making the comparison to results for the REM. Following [Sch03], note that we can replace $\nu_1$ with $N U_1/S_N$ when computing averages (this is justified by Lemma 3). Therefore A To obtain the last equation we have changed variables $z = u/A_\zeta$. If $f_U$ is a power law, then the integral can be evaluated and is given in terms of Γ-functions; this is one way to prove Theorem 1, although a different approach is taken in [Sch03]. The essence of our argument is that when we evaluate this integral we can replace $f_U(zA_\zeta)$ with $e^{W^*(\zeta^{-1}(\ln A_\zeta z))}$ and then neglect the higher order terms in Here, we have used the relations $(q - 1)(q' - 1) = 1$ and $(q' - 1)q = q'$ to simplify the exponents of $\zeta$. Since the coefficient of $\ln z$ is independent of $\zeta$, $f_U$ is approximately a decaying power law with exponent $\alpha_q$, defined by Equation 14. Making this replacement and evaluating the integral leads to the following lemma, which says that $E[(\nu_1)^k]$ is dominated by the event $U_1 > N$ when $k > \alpha_q$.
Lemma 1. For $1 < \alpha_q < k$, $k \in \mathbb{N}$ we have If $\alpha_q < 2$, then from Lemma 1 we have and then from Equation 7 and another application of Lemma 1, These are precisely the merger rates of the β-coalescent and, due to the consistency condition (Equation 8), uniquely determine the rates $\lambda_{n,k}$ for $k < n$.
On the other hand, when $\alpha_q > k$, $E[\nu_1^k]$ is no longer dominated by the tail and we can replace $S_\zeta$ with $A_\zeta$ in Equation 15, leading to In this regime, $c_\zeta$ decays slower than $N^{1-n} E[\nu_1^n]$ for all $n > 2$ and hence all the $\lambda_{n,n}$ except $\lambda_{2,2}$ vanish.

Relationship to the result of [CGCSWB22]
We now remark on the relationship between Theorem 2 and the results of [CGCSWB22]. In their model, it is assumed that at the beginning of each generation (referred to as the spring in [CGCSWB22]), individuals experience a period of dormancy during which no reproduction occurs. At random times, individuals awaken from dormancy and reproduce according to a Yule process; that is, new individuals are spawned from existing ones at a (deterministic) rate until the end of the generation. The authors also allow for a period (called the summer) during which all individuals are awake and reproducing, although we will neglect this for the present discussion.
To make the connection to our model, let $\zeta$ denote the rate at which cells divide after the period of dormancy has ended and let $\Phi_i$ denote the duration of the $i$th cell's growth phase (i.e. the difference between the total time between bottlenecks and the dormancy period). In the dormancy model, it follows from properties of Yule processes that $U_1 \mid \Phi_1$ is a geometric random variable with parameter $e^{-\zeta\Phi_1}$, hence
$$P(U_1 > e^{\zeta\phi} \mid \Phi_1) = \left(1 - e^{-\zeta\Phi_1}\right)^{\lfloor e^{\zeta\phi} \rfloor}.$$
In contrast, in our model $U_1$ is deterministic after conditioning on $\Phi_1$. This distinction should have no effect on the limit coalescent under our scaling assumption, since $(1 - e^{-\zeta\Phi_1})^{\lfloor e^{\zeta\phi} \rfloor}$ can be replaced with $1_{\Phi_1 > \phi}$ in the large $\zeta$ limit, and therefore
$$P(U_1 > e^{\zeta\phi}) \approx P(\Phi_1 > \phi).$$
At least heuristically, this justifies the replacement of $U_1$ with $e^{\zeta\Phi_1}$ when calculating the asymptotic behavior of the coalescent process. However, we have neglected variation in $U_1 \mid \Phi_1$ in order to simplify the derivations in the present paper.
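The replacement of the geometric tail by an indicator can be checked numerically. The sketch below evaluates the conditional tail of a geometric random variable on $\{1, 2, \ldots\}$ with parameter $e^{-\zeta\Phi_1}$ at the level $e^{\zeta\phi}$; the parametrization of the geometric law and the function name are assumptions made for this illustration.

```python
import math

def geometric_tail(zeta, phi_1, phi):
    """P(U_1 > exp(zeta * phi) | Phi_1 = phi_1) for U_1 geometric with parameter exp(-zeta * phi_1).

    Assumes phi_1 > 0 so that the parameter lies strictly between 0 and 1.
    """
    p = math.exp(-zeta * phi_1)
    m = math.floor(math.exp(zeta * phi))
    # (1 - p)**m, written with log1p to stay accurate when p is tiny.
    return math.exp(m * math.log1p(-p))

zeta = 20.0
above = geometric_tail(zeta, phi_1=1.0, phi=0.5)  # Phi_1 > phi: tail close to 1
below = geometric_tail(zeta, phi_1=0.5, phi=1.0)  # Phi_1 < phi: tail close to 0
print(above, below)
```

The transition sharpens as $\zeta$ grows, which is the sense in which the geometric noise becomes negligible on the exponential scale.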
The main results of [CGCSWB22] concern the limit genealogies and are closely related to ours. Their Theorem 1.3 concerns the situation where the period of growth after dormancy is exponentially distributed and is therefore very similar to Theorem 1 in the present paper. Their more general result, stated below, characterizes the possible forms of $\Lambda$ which can emerge as limits of the discrete coalescents in the dormancy model.
Theorem 3 (Proposition 1.8 and Theorem 1.7 from [CGCSWB22]). Suppose $U_i$ are of the form of Equation 2. For any Λ-coalescent that emerges as the limit of $(\Psi^\zeta_{n,\lfloor t/c_\zeta \rfloor})_{t \geq 0}$ in the dormancy model described above, $\Lambda$ has the form where $\delta_x$ is the point mass at $x$ and $h$ is a probability density on $(0, 1)$ with the representation for a monotone function $g$ satisfying
Our main result, Theorem 2, says that under the scaling assumption of Equation 13 the limit coalescent is of the form described by Theorem 3 with $g(v) = v^{-1-\alpha_q}$ and $b_0 = b_1 = 0$.

Background on the REM
The REM was introduced by Derrida in [Der81] as a toy model of spin glasses. As with other models of magnetic systems, the state space is the $n$-hypercube, $C_n \equiv \{-1, 1\}^n$, and each configuration $\sigma \in C_n$ is assigned an energy, $E_\sigma$. When in equilibrium with a reservoir of temperature $T$, the steady-state distribution of configurations is given by the Boltzmann distribution $P_\sigma \propto e^{-\beta E_\sigma}$, where $\beta = T^{-1}$ is the inverse temperature (assuming $k_B = 1$ for notational simplicity). The quantity of central interest in (equilibrium) statistical mechanics is the free energy, It is said that the thermodynamic limit exists if the limit of $F_n/n$ exists, and by differentiating the free energy density, $\psi \equiv -\lim_{n \to \infty} F_n/(\beta n)$, with respect to temperature (or other model parameters) various thermodynamic relations are obtained [Gol18,Bov06]. In spin glass models, $E_\sigma$ is itself taken to be a random variable, and the free energy density is computed from the mean free energy, $E[F_n]$. The hope is always that the thermodynamics emerging from a random energy field are typical of systems with very unstructured energy landscapes. For mathematical convenience $\{E_\sigma\}_{\sigma \in C_n}$ is usually taken to be a Gaussian random field, and the simplest such model is of course found by taking $E_\sigma$ to be iid. In order for the thermodynamic limit to exist in this case, the variance of $E_\sigma$ must grow proportionally to $n$; this is precisely Derrida's REM, which can be understood as the limit of a very rugged energy landscape.
Since there are no correlations between energies, we can abandon the hypercube structure and identify the state space with $[2^n]$, writing $E_i$ for the $i$th energy level. Following previous conventions, we suppose that $E_i/n$ has variance 1/2. Then, when $\Phi_i$ are Gaussian with unit variance, by setting we see that the partition function in the REM with these parameters is equal to the total number of offspring in the Cannings model Note that the negative sign in front of $E_i$ is inconsequential because we have assumed $E[\Phi_i] = 0$ and can simply change the sign in Equation 4. Interestingly, even in this simple model one finds a phase transition, which occurs at a critical value $\beta_c = \sqrt{2 \ln 2}$, or in our notation, $\tau = 1$ (with $q = 2$). The nature of this transition can be understood by examining the average number of configurations for which $E_i \in [n\varepsilon, n(\varepsilon + d\varepsilon)]$, which we denote by $\mathcal{N}(d\varepsilon)$ [MM09]. Loosely speaking, $\mathcal{N}(d\varepsilon)$ is on the order of [Kis15] 2 This approximation makes sense only when $\varepsilon < \beta_c/\sqrt{2}$, since for large $n$ there are virtually no configurations with $\varepsilon > \beta_c/\sqrt{2}$. With these heuristics, one can identify two regimes for the entropy density, $s(\varepsilon) \equiv \lim_{n \to \infty}$ Meanwhile, the free energy density can be obtained as and hence $\psi(\beta)$ and $s(\varepsilon)$ are convex conjugates of each other.∥ A short calculation using the Laplace method yields Derrida's result: Intuitively, when $\beta \leq \beta_c$ the variation in energies is small enough that the LLN can be applied to the partition function. Indeed, one way to arrive at $\psi(\beta)$ in the high temperature phase is to replace In this regime, we say that the system is self-averaging. On the other hand, when $\beta > \beta_c$, the system enters a so-called "frozen" phase where the partition function is dominated by the extremal statistical weights.
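Self-averaging in the high temperature phase is easy to observe numerically. The sketch below samples a REM partition function and compares $n^{-1}\ln Z_n$ with its known limit. It assumes the textbook normalization $E_i = \sqrt{n}\,g_i$ with $g_i$ standard Gaussian, for which the limit is $\ln 2 + \beta^2/2$ when $\beta \leq \sqrt{2\ln 2}$ and $\beta\sqrt{2\ln 2}$ otherwise; this normalization may differ by constant factors from the convention used above, and the function names are ours.

```python
import numpy as np

def rem_log_partition_density(n, beta, rng):
    """Sample (1/n) ln Z_n with Z_n = sum_i exp(-beta * E_i), E_i = sqrt(n) * g_i (assumed normalization)."""
    energies = np.sqrt(n) * rng.standard_normal(2 ** n)
    x = -beta * energies
    m = x.max()
    return float((m + np.log(np.exp(x - m).sum())) / n)  # log-sum-exp, shifted by the max

def rem_limit(beta):
    """Large-n limit of (1/n) ln Z_n under the same normalization."""
    beta_c = np.sqrt(2.0 * np.log(2.0))
    if beta <= beta_c:
        return float(np.log(2.0) + beta ** 2 / 2.0)  # self-averaging (high temperature) phase
    return float(beta * beta_c)                      # frozen (low temperature) phase

rng = np.random.default_rng(3)
value = rem_log_partition_density(16, 0.5, rng)
print(value, rem_limit(0.5))
```

In the frozen phase the convergence in $n$ is much slower, so a comparison at sizes this small is only qualitative there.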
The arguments above can be generalized to the sum $S_\zeta$ for any $q > 1$; for example, see [Eis83]. Of central importance for the application to coalescent theory is the following LLN, which tells us the partition function of the REM (with non-Gaussian energy distribution) is self-averaging in the high temperature regime.
Lemma 2 (Theorem 2.1 from [BABM05]). Let $q > 1$ and assume Then for all $\varepsilon > 0$,
In the context of the Cannings model, the self-averaging of $S_\zeta$ allows us to make the approximation used in Equation 16 and ensures there is a continuous time limit process.

Limit theorem for the partition function
A richer mathematical structure of the REM is revealed by a careful study of the fluctuations in the partition function. This is closely related to the GCLT for iid sums over scale-invariant random variables, where one can distinguish between regimes of weak and strong self-averaging. The former refers to the case where there is a LLN, but no CLT for the sum. The precise limit theorem for Gaussian energy distributions is stated in [Bov06] and the extension to distributions of the form of Equation 4 can be found in [BABM05]. We now state a slightly watered-down version of their result.
Theorem 4 (Theorem 2.3 from [BABM05]). Let and where $B_\zeta$ is defined by
• For $1 < \tilde{\alpha}_q < 2$, $Z_\zeta$ converges in distribution to an α-stable random variable $Z_\alpha$ for which the characteristic function is where the parameter, $\tilde{\alpha}_q$, is given by Equation 23.
• For $\tilde{\alpha}_q > 2$, $U_\zeta$ obeys a central limit theorem, meaning that it converges to a Gaussian when rescaled by the standard deviation.
Briefly, the α-stable distributions mentioned in this result are defined by the property that they are equal in distribution to linear combinations of independent copies of themselves: $Z$ is stable if, for independent random variables $Z_1$ and $Z_2$ equal in distribution to $Z$ and any constants $a, b > 0$, there are constants $c > 0$ and $d$ such that $aZ_1 + bZ_2$ is equal in distribution to $cZ + d$. The GCLT states that α-stable distributions arise as limits of sums of iid random variables whose variances are not necessarily finite [Ami20b,Zol86]. Hence, our result plays the role of Theorem 4 for the Cannings model by extending Theorem 1 to the "large but finite" regime.
Much like Theorem 2, the idea behind Theorem 4 is that in the transition regime (1 < α < 2), the sum will be dominated by the maximum. From Equation 13 and Equation 4, we have The left-hand side will approach one as ζ → ∞, so in order to obtain a well-defined limit we need to look at deviations on a scale $B_\zeta$, which is increasing with ζ. To this end, we replace u with $uB_\zeta$ in the exponent of Equation 27 and make the approximation In order for this to have a large-ζ limit, the first two terms must cancel, which requires $B_\zeta$ to satisfy Equation 25; the coefficient of ln u in Equation 27 is then independent of ζ and given by Equation 28, which simplifies to Equation 23. This plays the role of α in our analysis of the coalescent process. The parameter of Theorem 4 and the parameter $\alpha_q$ of Theorem 2 agree only where both equal 1, which is the transition to the frozen state; see Figure 3. Interestingly, this implies that the regime of weak self-averaging in Theorem 4 (1 < αq < 2) does not exactly correspond to the regime of multiple merger coalescents in Theorem 2 (1 < $\alpha_q$ < 2), although the two coincide in the limit q → 1.
Figure 3. The parameters of Theorem 4 and Theorem 2 compared as functions of the tail exponent q (horizontal axis) for different values of τ. The lower limit of the plot is the critical point where both expressions equal 1. Above this point the two differ, with the same strict ordering for all q.

Gibbs measure
In addition to the partition function, there is interest in understanding fluctuations in the statistical weights in the REM and other models of disordered systems. In the large-ζ limit these weights approach a measure on the infinite-dimensional hypercube $C_\infty = \{-1, 1\}^{\mathbb{Z}}$ called the Gibbs measure; see [RAS15] for an introduction to this formalism. The question of how the Gibbs measure fluctuates between replicates of the energies is closely related to the coalescent process, since $\{U_i/S\}_{i=1,\dots,N_\zeta}$ (asymptotically) have the same distribution as $P_\sigma$ when the configurations are projected to the one-dimensional lattice $[2^n] = \{1, \dots, 2^n\}$. The projected Gibbs measure approaches the Lebesgue measure on the unit interval in the high temperature regime [Bov06]. However, the main focus in the context of the REM appears to have been the low temperature regime, αq < 1.
There is already an established connection between coalescent processes and the Gibbs measure via Derrida's generalized REM (GREM). In this model, the energies are no longer independent, but drawn from a Gaussian random field on $C_n$ whose correlation function depends on a certain ultrametric distance between configurations; see [Ber09] for a precise description. This ultrametric distance induces a hierarchical structure on the configuration space. Ruelle gave a mathematical formulation of Derrida's models [Rue87], and the connection to coalescent processes was made in [BS98] in the context of their abstract cavity method. The authors show that by sampling configurations from the Gibbs measure and constructing a genealogical tree based on the hierarchies induced by the distance, one obtains (up to a time-change) a continuous time coalescent process now referred to as the Bolthausen-Sznitman coalescent. This process is nothing but the β-coalescent with αq = 1.
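For concreteness, the merger rates of the Beta(2 − α, α)-coalescent can be tabulated numerically. The sketch below assumes the standard parametrization from the coalescent literature, in which any fixed group of k out of b lineages merges at rate B(k − α, b − k + α)/B(2 − α, α) for 0 < α < 2. This normalization may differ from the one used in Equation 6, and the function names are hypothetical. At α = 1 the formula reduces to the Bolthausen-Sznitman rates.

```python
import math


def log_beta(x: float, y: float) -> float:
    """Logarithm of the Beta function B(x, y) for x, y > 0."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def beta_coalescent_rate(b: int, k: int, alpha: float) -> float:
    """Rate at which a fixed group of k out of b lineages merges.

    Assumption: Beta(2 - alpha, alpha)-coalescent in its standard
    parametrization, with 0 < alpha < 2 and 2 <= k <= b.
    """
    if not 0.0 < alpha < 2.0:
        raise ValueError("alpha must lie in (0, 2)")
    if not 2 <= k <= b:
        raise ValueError("need 2 <= k <= b")
    return math.exp(log_beta(k - alpha, b - k + alpha) - log_beta(2.0 - alpha, alpha))


# Bolthausen-Sznitman case (alpha = 1) for a sample of b = 4 lineages.
for k in (2, 3, 4):
    print(k, beta_coalescent_rate(4, k, 1.0))
```

A useful sanity check is the consistency relation of Λ-coalescents: the rate for k of b lineages equals the sum of the rates for k of b + 1 and k + 1 of b + 1 lineages.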
To understand what Theorem 2 tells us about $P_\sigma$, suppose we sample two configurations i and j from the equilibrium distribution and set which Derrida refers to as the replica overlap. It is not too difficult to see that $E[\rho_{1,2}]$ is asymptotic to the inverse coalescent time, $c_\zeta$. First, notice that conditional on $\{\Phi_i\}_{i\in[N]}$ the distribution of $\rho_{1,2}$ is (see [DM21]) The joint distribution of the terms in the sum is asymptotic to that of . Therefore, recalling Equation 5, we can see that It is well known that in the high temperature regime (α > 1) while in the low temperature phase Derrida derives an expression in terms of Beta functions. Now suppose we sample configurations of the Gibbs measure from n replicates of the REM and consider the event that these configurations are not unique and instead $k_i > 1$ come from configuration i for i = 1, …, r with $\sum_{i=1}^{r} k_i = n$. Such events are simply the analogues of the $(k_1, \dots, k_r; 0)$-collisions defined in Section 2.2, and they occur with probability This is asymptotic to the expression appearing in the definition of $\lambda_{n,k_1,\dots,k_r;s}$ (Equation 6) and in the REM is related to the higher order replica overlaps. We can then ask what the chance of these events is when we take $1/c_\zeta$ replicates, which yields $\lambda_{n,k_1,\dots,k_r;s}$. Therefore, Theorem 2 tells us there is a phase transition in the typical composition of our samples at the critical point α = 2. As we have explained above, this transition happens at a lower temperature (higher β) than the breakdown of the CLT for the partition function at αq = 2.
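The collision events described above can be made concrete for a fixed realization of the weights. The sketch below is a minimal illustration under the assumption that the n samples are independent draws from one fixed set of normalized weights (the conditional picture, before averaging over the disorder). The function names `pair_overlap` and `collision_probability` are hypothetical, and the brute-force sum over ordered tuples of distinct configurations is only practical for small systems.

```python
import math
from itertools import permutations


def pair_overlap(weights):
    """Probability that two independent draws land on the same configuration."""
    total = sum(weights)
    return sum((w / total) ** 2 for w in weights)


def collision_probability(weights, ks):
    """Probability of a (k_1, ..., k_r) collision for a fixed grouping of draws.

    The first k_1 draws all hit one configuration, the next k_2 draws all hit
    a different configuration, and so on. The sum runs over ordered tuples of
    r distinct configurations.
    """
    total = sum(weights)
    p = [w / total for w in weights]
    return sum(
        math.prod(p[i] ** k for i, k in zip(idx, ks))
        for idx in permutations(range(len(p)), len(ks))
    )


weights = [3.0, 1.0]
print(pair_overlap(weights))                  # 0.75**2 + 0.25**2
print(collision_probability(weights, (2,)))   # same event, written as a collision
```

With a single group of size two the collision probability coincides with the pair overlap, and averaging these quantities over realizations of the weights gives the unconditional probabilities discussed in the text.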

Proof of Theorem 2
In order to simplify notation in the proofs, we set α = $\alpha_q$.
Proof of Lemma 1. The idea of the proof is that most of the contribution to the moments of $\nu_1$ comes from the event ν By Lemma 6 and Lemma 4 in Appendix A, where we have changed variables $z = u/A_\zeta$. This is the step which breaks down for α ≤ 1, since Lemma 6 uses Lemma 2. Now define $R_\zeta(z)$ and $K_\zeta(z)$ by where Observe that $\alpha_\zeta \sim \alpha$ and by Equation 4 there is a constant C′ such that As ε → 0, using that the logarithmic term vanishes, Similarly, for large enough ζ and any We can expand $R_\zeta$ as Therefore, by the definition of $W^*$ (Equation 4), which means the coefficient of $\ln L_\zeta/\zeta$ grows as a power law in ζ with exponent Finally, note that so If α > k, the integral derived above diverges and the bounds are no longer useful. The divergence comes from the very small values of z (meaning small Φ relative to $A_\zeta$) when we make the linear approximation. In this case, we have ). In fact, these are equal exactly at $\alpha_q$ = 2.

Discussion
In this article we have studied the asymptotics of genealogies in the Cannings model when both the population size and the offspring variation are simultaneously taken to be large. Such limits are not covered by previous results for scale-invariant offspring distributions, since in that setting the offspring variation is infinite for finite N. Our analysis rests on a certain scaling assumption under which the total number of offspring produced in a generation is equivalent to the partition function of the REM and the offspring numbers $\nu_i$ are related to the Gibbs measure. As with the REM, competition between fluctuations in growth rates (energies in the REM) and averaging over an increasing system size (our log population size) leads to a form of weak self-averaging and anomalous scaling of the coalescent time.
Our main finding is that the β-coalescent, a previously studied model of coalescence in populations where variation in offspring is infinite, also emerges from models where the tail of the offspring distribution is thin, but large fluctuations are not too unlikely. This is related to the existing limit theorems for the Cannings model proved in [Sch03] in the same way that the fluctuation theorem in [BABM05] is related to the GCLT for iid sums. Our result does not describe the critical point and low temperature regime, although at least for q = 2 these can likely be deduced from previous results on the REM; see [Bov06]. The limit coalescent processes are the discrete-time Ξ-coalescents with simultaneous multiple mergings of lineages described in [MS01].
Biologically, Theorem 2 suggests the β-coalescent serves as a more universal description of neutral evolution in the presence of highly skewed offspring distributions than might be expected.This also indicates that little information about the demographic structure of a population is contained in the coalescent process itself.
By the law of large numbers The result follows after taking ε, δ → 0.

Figure 1. (left) A simulation of the Cannings model. Squares indicate individuals at the beginning of each generation and the protruding lines are their offspring. The thick red lines indicate an example of a discrete genealogy, or coalescent $\Psi^{5}_{5,k}$, obtained from a sample of the final population of labeled cells. For example, $\Psi^{5}_{5,1}$ = {{1}, {2, 3}, {4, 5}}. (right) A larger simulation of a continuous time coalescent process which is obtained in the limit of the model on the left.

Figure 2. (left) The region 1 < α < 2 in ln N − ζ space for different values of q. (right) A diagram of the idea behind the definition of α.
large-ζ limit, the first two terms must cancel, indicating that $B_\zeta$ should satisfy Equation 25. Since $B_\zeta = O(\zeta^{q'})$, it follows from Equation 25 that the coefficient of ln u in Equation 27 is independent of ζ and given by
αq $= \frac{1}{\zeta}\,(W^*)'\!\left(\frac{\ln B_\zeta}{\zeta}\right)$ (28)