Abstract
In sexual population, recombination reshuffles genetic variation and produces novel combinations of existing alleles, while selection amplifies the fittest genotypes in the population. If recombination is more rapid than selection, populations consist of a diverse mixture of many genotypes, as is observed in many populations. In the opposite regime, which is realized for example in the facultatively sexual populations that outcross in only a fraction of reproductive cycles, selection can amplify individual genotypes into large clones. Such clones emerge when the fitness advantage of some of the genotypes is large enough that they grow to a significant fraction of the population despite being broken down by recombination. The occurrence of this 'clonal condensation' depends, in addition to the outcrossing rate, on the heritability of fitness. Clonal condensation leads to a strong genetic heterogeneity of the population which is not adequately described by traditional population genetics measures, such as linkage disequilibrium. Here we point out the similarity between clonal condensation and the freezing transition in the random energy model of spin glasses. Guided by this analogy we explicitly calculate the probability, Y, that two individuals are genetically identical as a function of the key parameters of the model. While Y is the analog of the spin-glass order parameter, it is also closely related to rate of coalescence in population genetics: two individuals that are part of the same clone have a recent common ancestor.
Genetic diversity is the fodder for natural selection and the fuel of evolution. It is generated by mutations and by recombination, which reshuffles genomes and thereby accelerates the exploration of the space of genotypes. The latter consists of all of the 2L possible combinations of the genetic variants, a.k.a. alleles, present at L (biallelic) polymorphic loci. Of course the number of polymorphic loci, L, itself changes as new mutations arise forming new polymorphisms while 'older' polymorphisms disappear from the population. The population itself consists of N individuals, which sample only a small fraction of the possible genotypes, i.e. N ≪ 2L. The dynamics in genotype space is therefore highly stochastic.
Of particular importance are those genetic polymorphisms that affect the fitness of individuals, the fitness being defined as the expected number of offspring in the next generation. Selection on its own would amplify the number of high fitness individuals and condense the population into a few 'clones' comprising a large fraction of the population. In populations of sexually reproducing organisms, the growth of such clones and the subsequent decline of genetic diversity are checked by recombination. Two parents, if chosen from different clones, produce offspring that are distinct from either parent. In obligate sexually reproducing species, the formation of clones is prevented since reproduction is coupled to recombination and no parent can produce genetically identical offspring. Many species, in particular microbial species and plants, can reproduce both by clonal reproduction (e.g., budding in yeast, selfing or vegetative reproduction in plants) or by sexual propagation. Such facultatively sexual species display a great variety in their mode of propagation, the frequency r of outcrossing, and the heritability of fitness. The latter is very important in sexual populations, as it determines to what extent recombinant offspring benefit from the same fitness advantages that made their parents successful.
The aim of this paper is to describe quantitatively the competing tendencies of natural selection and recombination with regard to genetic diversity, focusing on facultatively sexual organisms. The competition of natural selection with recombination is the dominant mechanism of evolution on relatively short time scales, on which mutational input is negligible compared to diversification by recombination. This situation is particularly relevant to adaptation following a major outcrossing event, or within a so called hybrid zone, where diverged genotypes have come together to generate a hybrid population. As this hybrid population continues to breed within itself, it can give rise to a bout of rapid adaptation, as beneficial alleles from both original populations are combined to form novel fit genotypes that spread within the hybrid population [34].
We will focus on the probability, Y, that two random individuals sampled from the population have the same genotype, i.e., are clones of each other. This quantity is important for population genetics, since it characterizes genetic diversity (its inverse is a measure of the number of dominant clones) as well as the dynamics of coalescence. Whenever two individuals are part of the same clone, they share a recent common ancestor, such that Y is proportional to the rate of pair coalescence. In the canonical theory of neutral coalescence, this rate is equal to the inverse population size. We will find here that Y, and with it the rate of coalescence, is determined by the clonal structure rather than the population size at low outcrossing rates.
Much of our analysis is presented in the context of a facultatively sexual population, where the evolving entities are individuals and their genotypes. Note, however, that some of our considerations also hold for contiguous segments of chromosomal DNA that are short enough to undergo only infrequent recombination even in obligatory sexual reproduction. In that case, we would be interested in the probability of a given chromosomal segment to be identical for a random pair of individuals drawn from the population. In this context, Y is the homozygosity of the population at this extended locus at which many different alleles segregate (this is of course a much weaker condition than clonal relation of whole genomes). A complementary view of this probability relates it to the 'haplotype diversity', i.e., the number and population distribution of distinct genomic sequences for the chromosomal segment in question. We shall return to this important question in the discussion.
In an earlier publication [29], we have shown that clones are absent in the so called quasi-linkage equilibrium (QLE) 'phase' corresponding to frequent outcrossing limit but appear in a regime of small outcrossing frequency r < rc, with rc depending on the complexity of the fitness landscape (i.e. the extent of fitness additivity) and weakly dependent on N. We will review this finding and present a more detailed analysis of the time dependence of this condensation phenomenon, as well as its quantitative dependence on fitness additivity and hence heritability. Furthermore, we also study the extent of clonal condensation in a steady state where the fitness distribution is moving towards higher fitness at a constant velocity. We will put these results into the context of the random energy model (REM) of Statistical Physics, introduced and solved by Bernard Derrida [8, 13, 23]. In fact, Y is closely related to the Parisi order parameter and the onset of clonality is closely related to the spin-glass transition observed in simple models of disordered media such as the REM.
Below, we will first draw the analogy between the dynamics of selection in finite populations and the REM. This analogy is particularly simple for r = 0. Whereas the condensation transition in the REM occurs below a certain critical temperature, the transition to clonal population structure occurs beyond a certain critical time, t > tc(N). Hence the population genetic analog of temperature will be the inverse time. We shall then generalize the model in order to include (facultative) recombination and fitness landscapes with varying degrees of epistasis, i.e., genetic interactions. The results for mixed epistatic and additive models enables us to set the analysis into the context of the 'traveling wave' approximation, which has recently emerged as a powerful representation of adaptive population dynamics in genetically diverse populations [6, 9, 14, 32, 38, 42]. Our results therefore provide insight into how recombination and epistasis affect the dynamics and structure of adapting population waves and define conditions under which genetic diversity is maintained or lost.
1. Natural selection and the random energy model
In the absence of recombination or mutation, the frequency of any individual genotype increases if its fitness lies above the population mean and decreases otherwise. Identifying fitness with relative growth rate and ignoring stochastic effects, the expected number of individuals ni(t) with genotype gi and fitness Fi obeys:
where is the time derivative of ni(t) (we assume ni(0) = 1) and the mean fitness 〈F〉t is defined by averaging over the whole population. Defining the rate of growth of clones by differential fitness (relative to the population mean) ensures constant population size: ∑ini(t) = N. Since the mean fitness term is shared by every genotype in the population, the frequency of a particular genotype is given by
where Z(t) is a time dependent normalization constant known as partition function. Hence the distribution of clones in the population has the Boltzmann form, where inverse time plays the role of temperature. The dynamics of the population will now depend on the initial distribution of fitness values Fi. In the generic case where fitness depends on many loci, each one giving a small contributions to fitness, the density of fitnesses ρ(F) = (0,σ2) is approximately Gaussian (with zero average and variance σ2) and the statistics of clones in the populations is identical to that of the REM [8]. For small t population averages are dominated by the vicinity of the peak of ρ(F), with many individual genotypes contributing. However, for t > tc, the dominant contribution shifts all the way to the leading edge , which corresponds to the maximum fitness sampled from the Gaussian ρ(F) in a population of size N. This means that with increasing time, i.e., decreasing 'effective temperature', the population shifts to fitter and fitter genotypes and eventually, for t > tc, 'condenses' into the fittest. This condensation phenomenon manifests itself in a non-negligible probability Y that two randomly chosen individuals from the population have the same genotype. The latter is equal to the average squared genotype frequency in the population at time t. The average participation ratio is obtained by averaging over the fitness values {Fi} of the N initial genotypes
where we have used the integral representation of and the fact that the Fi are the i.i.d. random variables sampled from ρ(F). This calculation is carried out explicitly in [23, 24] and leads to the following quantitative result: in the limit of large populations, 〈Yt〉 is given by
with . Figure 1 shows how 〈Yt〉, measured in a computer simulation (see below), increases in time and compares to the REM prediction. Note that the sharp transition exhibited in (4) is realized only in the limit of large logN, which is hard to achieve both in reality and in a numerical simulation. In both cases one expects to find a crossover rather than a sharp transition. Nevertheless, considering the idealized N → ∞ limit provides a very useful scaffold for the analysis, and the REM also allows us to compute the 1/logN corrections and therefore determine the detailed nature of the crossover.
Figure 1. The participation fraction 〈Yt〉 in the absence of recombination as a function of time. For each N, the time axis is rescaled by . The fitness variance σ2 = 0.0025 and is included in the expression for tc to account for stochastic effects. The solid black line shows the N → ∞ asymptotic result predicted by the REM (equation (4)). The convergence of the numerical results to this asymptotic result with increasing N is slow because it is governed by logN. Each curve is averaged over 1000 runs.
Download figure:
Standard imageRecombination and heritability
Sexual reproduction mixes the genetic material from two parents and thereby produces new genotypes from existing genetic variation. To account for recombination, we modify the equation describing the evolution of genotypes as follows:
where n(g) is the number of individuals with genome g, and K(g|g',g'') accounts for the probability that genotype g is assembled by recombination from the parental genotypes g' and g''. In the absence of recombination, the only relevant characteristic of a genotype is its fitness, and we could study the evolution of the population along the fitness coordinate instead of on the hypercube of possible genotypes. To achieve a similar simplification with recombination we need to know how the fitness of recombinant offspring relates to that of the parents to map equation (5). The correlation between parental and offspring traits is known as heritability and depends on the underlying genetic architecture. If a trait depends on many loci in the genome in an additive manner, i.e., different loci make independent contributions to the trait, the trait value of offspring will be approximately Gaussian distributed around the parental mean. Such traits are called highly heritable since the correlation between trait values of parents and offspring is high. Conversely, if a trait depends on specific combinations of alleles at many loci (epistasis), these combinations will be disrupted with high probability in sexual reproduction. Such traits have a low heritability in sexual reproduction since the correlation between parents and offspring is low.
The trait we are mainly interested in here is fitness, which in general involves many phenotypes and depends on the environment. We shall set aside the issues associated with fluctuating environment and possible time dependence of selection, and focus on how fitness depends on the genotype. As already noted, the genotype space is a high dimensional hypercube, g = (s1,...,sL), and for present purposes, the fitness is a complicated (but fixed) function of the genotype parameterized as
where the terms fisi define the additive contribution of locus i to fitness, while higher order terms correspond to epistatic interactions. The relation of the fitness of an offspring relative to that of its parents and the density of states, i.e., the distribution of fitness over all possible genomes, is illustrated in figure 2.
Figure 2. Heritable and non-heritable contributions to fitness. Left: the two panels illustrate heritable and non-heritable fitness functions. The black Gaussian represents the density of states over all 2L possible genotypes, while the arrows indicate the fitness of parents P1,2 sampled from this density of states. For additive fitness functions (top), the fitness distribution of recombinant offspring of parents P1,2 (red curve) is centered around the mid-parent value, i.e., fitness is heritable. For completely epistatic fitness functions (bottom), the offspring fitness is a random sample from the density of states and therefore not heritable. Right: fitness is a sum of additive (heritable) and epistatic (non-heritable) components and selection amplifies individuals with large F = A + E. In sexual reproduction, only the additive component of fitness is heritable, so only the additive fitness increases in time with rate v.
Download figure:
Standard imageThe higher the order of the interactions, the less likely it is that a particular set of loci that contributes to one parent is inherited uninterrupted. As a consequence, the heritability of interaction terms goes down with increasing order. Interaction terms of high order are essentially independent of the parents. This is reminiscent of high order spin-glass models, where the energies of any two configurations that differ at a macroscopic number of spins are uncorrelated. The large p limit of such p-spin glasses is the random energy model, where the energy of each configuration is an independent draw from a Gaussian distribution [8, 13].
2. The model
In general the genetic architecture of fitness is expected to be complex, with additive contributions as well as epistatic contributions of various orders. It is very instructive, though, to consider a simplified model that includes a heritable component and a random epistatic component. Specifically, let us assume that fitness can be decomposed into an additive component A and an epistatic component E. If two individuals with fitness F1 = A1 + E1 and F2 = A2 + E2 produce an offspring, its additive component, A, is drawn from a Gaussian with variance around the parental mean (A1 + A2)/2, while its epistatic component, E, is independent from that of the parents and is drawn from a Gaussian centered around 0 with variance . Hence the recombination kernel of this model depends only on the additive fitness of the parents
For much of the analysis below, we will simplify this model even further and assume that the additive fitness of recombinant genotypes is independent of that of the parents and simply drawn from a Gaussian with variance centered around the population mean 〈A〉. This distribution is expected if the distribution of A in the population is approximately Gaussian. This simplified model is easier to analyze, while displaying the same qualitative behavior, as has been checked by numerical simulations. The joint distribution of A and E in the population then evolves according to
The ratio of σA and σE define the extent of fitness heritability in the process of recombination. We define 'heritability', h, which will be one of the key independent parameters of the model, as:
To illustrate the behavior of the model at different recombination rates and different heritabilities, we implemented it as a computer simulation. The computer program keeps track of clones with fitness Ai and Ei and population size ni. At each generation, the size of a clone is updated by a Poisson distributed number with mean ni eFi−〈F〉−r+C, where C = (1 − N/N0) is a density regulating term. A clone is deleted if its size is 0. In addition, at each generation, Nr new clones of size 1 are seeded, with additive fitness drawn from a Gaussian (〈A〉,σA) and epistatic fitness drawn from
(0,σE). Alternatively, we can impose a 'velocity" of the additive fitness by drawing its value from
(vt,σA) instead of
(〈A〉,σA).
Due to the stochastic nature of reproduction, the majority of all initial genotypes will rapidly die out, even if very fit. Of the N initial genotypes, only a fraction ∼σ remains, where is a measure of the typical strength of selection. Similarly, of the clones produced by recombination, only a fraction σ 'establishes'. Those clones that do establish are on average larger than the deterministic expectation by a factor of σ−1. We will neglect these factors in most of the formulas below. The dominant effect of this stochasticity can be accounted for by rescaling N to σN inside logarithms. We will reinstantiate this correction in comparisons to simulations when necessary. These stochastic effects are of minor importance for the phenomena we study here since they are overshadowed by the randomness inherent in the fitness and seeding time of new clones.
3. Results: structure of adapting populations
Depending on the rate of recombination r and the degree of fitness heritability h, the model formulated above exhibits very different behaviors. Before we present a formal characterization of the population structure and the dynamics of evolution, it is instructive to discuss the dynamics of the model as observed in simulations.
Figure 3 shows the distribution of additive and epistatic fitness at two different times for scenarios where additive (left) or epistatic (right) fitness dominates. If fitness is predominantly additive, the population adapts and moves towards high additive fitness as long a new genotypes are generated by recombination. The velocity is given by the variance of the population along the additive direction. If most of the variance is along the epistatic direction, as in the right panel, the adaptation of the population is much slower. In addition, large fractions of the population tend to be condensed into a small number of clones, as we will discuss at greater length below. As the variance along the additive direction decreases to zero, so does the velocity.
Figure 3. The distribution and dynamics of additive and epistatic fitness in the population at moderate recombination rates r = σ. Panel A: additive fitness dominates (, ). Panel B: epistatic fitness dominates (, ). In both panels, σ2 = 0.0025 and N = 105.
Download figure:
Standard imageThe population structure and dynamics at different recombination rates are illustrated in figure 4. The figure shows the composition of the population as a function of time for different recombination rates and different heritabilities. In each panel, the population was initialized as a diverse sample from the density of states and was subsequently allowed to evolve via selection and recombination. Hence each panel shows a transient, which gives way to a steady state behavior. Each genotype in the population is assigned a specific color, and individuals are ordered according to fitness in each time slice, with the fittest individuals at the bottom.
Figure 4. Structure of population for different recombination rates and heritabilities; see text for discussion. In all panels, σ2 = 0.0025 and N = 104.
Download figure:
Standard imageThe left column of figure 4 shows evolution governed by an all additive fitness function (σA = σ, σE = 0) for different recombination rates. In this case, the population moves steadily to higher fitness since new genotypes, fitter than their parents, are constantly produced. At low recombination rates, the population consists of a small number of clones that arise at the high fitness edge (bottom part of the panel), grow as time progresses, and shrink again once they fall behind in fitness (i.e., when they move towards the center of the panel). As the recombination rate is increased to values comparable to σ, large clones cease to exist. Most of the population is made from nearly unique genotypes that are short lived. This pattern becomes even more pronounced at larger recombination rates. The high recombination limit of this dynamics can be understood in mean field theory (MFT), described in section 3.1. The low recombination limit is described in greater detail in section 3.4.
The right column of figure 4 shows the opposite limit, when fitness is not heritable but completely epistatic, corresponding to a fitness function with random high order interactions. In this case, we do not expect the population to move towards higher fitness indefinitely since the epistatic fitness is non-heritable. At low recombination rates, we expect that the fittest genotype in the population will grow while generating new recombinants distributed around zero fitness. With the growing genotypes, the mean epistatic fitness will increase until the selection on the fittest genotype, Emax − 〈E〉, equals the rate at which it produces recombinants. This behavior is clearly seen in all four panels with h2 = 0 on the right. Since the recombination rate increases from top to bottom, the size at which the fittest genotype stabilizes decreases, while the 'dust'-like fraction of the population that consists of short-lived unfit recombinants increases. Occasionally, recombination generates exceptionally fit genotypes, which have a chance of replacing the previous record holder. As time progresses, these records become rarer and rarer according to the well known results on record dynamics5. The clone structure and its dynamics for the fully epistatic case will be discussed in section 3.2. The high recombination limit can again be understood within mean field theory; see section 3.1.
The center column of figure 4 shows the clone structure and dynamics of a case of intermediate heritability. At low recombination rate, the population is dominated by clones, and clones exist even when the additive population is only dust. However, unlike the h2 = 0 case, no clone dominates forever, but new clones are continuously established. The velocity in this regime will depend critically on how often such new clones are produced and on how much they advance the additive fitness. We will investigate this case below in section 3.4.
3.1. Quasi-linkage equilibrium (QLE) and its breakdown
At large r, this model admits a factorized solution P(A,E) = ϑ(A,t)ω(E) with
derived in [29]. This solution travels towards higher additive fitness with a velocity equal to the variance of the additive fitness of recombinant offspring, while the epistatic fitness has a steady distribution, where 〈E〉 adjusts itself to normalize the distribution. This factorization is a hallmark of quasi-linkage equilibrium [17, 27, 43], where additive components evolve independently, while epistatic components are in a quasi-steady balance between selection and recombination. However, this factorized solution breaks down as soon as there are individuals with epistatic fitness larger than r + 〈E〉. In that case this solution is no longer normalizable, and additive and epistatic components cease to be independent.
Figure 5 illustrates this condensation behavior. The smooth distribution ω(E) is a deformed Gaussian which diverges at E = r + 〈E〉. Beyond r + 〈E〉, the population consists of growing clones whose size depends on when and where they were seeded.
Figure 5. The population consists of mediocre recombinant genotypes (dust) and leading clones. A sketch of the population structure, indicating the smooth quasi-static distribution ω(E) at E < 〈E〉 + r and the exponentially growing clones with E > 〈E〉 + r.
Download figure:
Standard imageIn the following we will characterize the population structure and study how the velocity in the direction of increasing additive fitness depends on parameters. We will begin by studying the purely epistatic case with recombination, which extends the REM by continuous seeding of novel recombinant genotypes. We will then study the condensation phenomenon in the presence of additive fitness, where the population forms a traveling pulse in fitness.
3.2. Clonal condensation for zero heritability
Suppose we start with a diverse sample of size N from the distribution of epistatic fitness and inject new recombinant genotypes with rate Nr. After a time t, the population consists of N initial clones and a Poisson distributed number of new clones. A clone of age τi and fitness Fi = Ei will have approximately size
where the mean fitness 〈E〉t' can be thought of as a Lagrange parameter that keeps the overall population size constant. To calculate 〈Yt(r,h)〉 we have to average over the fitness of the N initial clones and over the fitnesses and seeding times of all subsequently produced recombinant genotypes:
where B0(z) is the average of over the Ei of initial genotypes (note that 'initial' means that τi = t). Br(z) is the corresponding average over recombinant clones, which in addition are averaged over their age τi; see appendix A for derivation. Similarly, C0(z) and Cr(z) are averages of e−zni(t) over the Ei and τi. The sum over k—the number of recombinants generated in time t—can be performed easily. These integrals are evaluated in the appendix B in the limit of large N. One obtains an approximation
valid for r ∼ rc, with
We observe that the participation ratio substantially deviates from zero only for r < rc and only upon reaching t > tc(r), where for r ≈ rc
This provides us with estimates of the critical time of condensation tc(r) and the critical rc below which condensation occurs. Condensation is delayed compared to the case of r = 0 and the critical time (analogous to inverse critical temperature) diverges as the recombination rate approaches its critical value rc. Of course, this divergence only occurs in the limit of going to infinity. In practice, with realistic population sizes N, one observes not a transition, but merely a crossover between different regimes of behavior. More elaborate expressions for tc and rc at finite N can be found in appendix B. Note that the values of tc and rc themselves scale with the square root of the logarithm of N.
〈Yt(r,0)〉 is shown in figure 6(A) for different recombination rates. With increasing recombination rate, the early plateau of 〈Yt(r,0)〉 is reduced and condensation is delayed. Each individual run condenses rapidly once the dominant clone is large enough that it often recombines with itself, which leaves the genotype intact. Also, the lack of condensation for r > rc is clearly seen. Figure 6(B) shows the inverse time to condensation for different recombination rates and population sizes, confirming equation (15) for r ≈ rc. This transition with increasing r was identified in [29]. Intuitively, the divergence of the time to condensation is due to the fact that the growth rate of best genotypes decreases with increasing recombination load r. As soon as E − r is smaller than zero for all existing genotypes, all clones are short lived and no condensation can occur. Similar behavior has been observed in populations with heterozygote advantage or disadvantage [1, 11].
Figure 6. Panel A shows 〈Yt(r,0)〉 for different ratios of r/σ for fully epistatic fitness functions as a function of time relative to the critical time with r = 0. The curves shown are averages over many runs which exhibit substantial stochasticity. Recombination delays condensation and reduces 〈Yt(r,0)〉 early on. The model used to produce the data in this plot accounts for the fact that recombination of identical parents does not produce a novel genotype, which results in rapid condensation once one clone becomes macroscopic. Parameters N = 16 000, σ2 = 0.005, and h2 = 0. Panel B shows how the condensation time tc(r) diverges as r approaches . For times and recombination rates below the lines, the population is condensed into clones; above the lines, no genotype is populated by a macroscopic fraction of the population. Here, tc(r) is defined empirically by 〈Ytc(r)〉 = 0.1. The solid black line indicates the prediction of equation (15) for r ≈ rc.
Download figure:
Standard imagePopulation dynamics for r < rc, including t > tc, has the nature of a 'records process' [20, 40]. As time progresses, more and more genotypes are sampled from the density of states and tested by selection. As a consequence, the population will come across fitter and fitter genotypes, resulting in a record process with Nrt trials. Even if initially no genotype with E − r > 0 is present, such a genotype will eventually be found and result in condensation. On the other hand, any finite population will eventually reach a final 'record' (with fitness Ef), giving rise to the clone that will eventually take over the whole population. This is because a clone rapidly fixates once it is large enough to frequently recombine with itself. To prevent its fixation a new record would have to be created with the fitness advantage ΔE > τ(Ef−r)2/logN within the time delay of τ, which is very unlikely beyond a certain Ef.
3.3. Traveling solutions for additive fitness
In the opposite limit when h2 = 1, the population moves towards higher fitness with a velocity that is given by the additive variance for sufficiently large r. It will be useful to parameterize the ratio between the velocity v and the scale of selection σ2 as γ = v/σ2. At high recombination rates, we have γ = h2, while we generally expect γ < h2 at low recombination rates. In contrast to the case h2 = 0, no aging dynamics is observed for h2 > 0. Instead, old genotypes are constantly replaced, and dominant genotypes in the population have a finite characteristic age. Hence the initial genotypes are rapidly forgotten and we can restrict the analysis to recombinant genotypes.
Genotypes seeded at the high fitness nose of the population distribution can grow large and dominate the participation ratio 〈Y〉. Since 〈Y〉 is closely related to the rate of coalescence, it is instructive to calculate 〈Y〉 explicitly assuming v = σ2 before considering the case v < σ2 at low r or h2 < 0.
In the steady state, a clone is specified by its age τj and its initial fitness Aj. Assuming v = σ2, we then find
To evaluate 〈Y(r,1)〉, we have to evaluate the integrals Br(z) and Cr(z) that appear in equation (12) and are defined in the appendix A, see equation (A.9). The integrals involve averaging over the initial fitness Ai and all possible ages τi. For h2 = γ = 1, one finds Cr(z) ≈ z because relatively young and small clones dominate. In contrast, Br(z) has a significant contribution from rare old clones, which dominate because of their large size (through the factor). The evaluation of the integrals is detailed in appendix C with the result equation (C.5). We obtain
The first term is of order 1 and corresponds to the contribution of young clones. Its exact value depends on the details of the stochastic dynamics, such as the offspring distribution. The second term is the contribution from old large clones. The participation ratio therefore becomes
For small r, 〈Y(r,1)〉 does not scale as N−1. In other words, the larger the population is, the larger are the clones it is composed of, and those largest clones dominate 〈Y〉. This result is in agreement with arguments made for rapid coalescence in facultatively sexual populations in [30, 36, 37]. Figure 7(A) shows 〈NY(r)〉 as a function of . As soon as , Y(r) increases and no longer scales with N, as predicted by the mean field calculation above. The figure also shows the explicit expression in equation (18) as dashed lines. Note, however, that the calculation of 〈Y(r,1)〉 involved several approximations where was assumed large. As a result, the accuracy of the prefactor is low, and we have dropped all non-exponential parts, while enforcing 〈Y(r,1)〉 = 1/N at the crossover.
Figure 7. Panel (A): the rescaled average participation fraction 〈NY(r)〉 as a function of . For large r, we find N〈Y(r)〉 ∼ 1, while N〈Y(r)〉 ≫ 1 for small r. The dashed lines indicate the prediction by equation (18), which is expected to be valid for . The theory curves are scaled such that Y = N−1 at the crossover. Panel (B) plots the reduction of velocity observed in simulations against the numerical solution of equation (19) where γ = v/σ2.
Download figure:
Standard imageThe derivation above is valid for . For smaller r, the velocity drops below the high recombination limit, γ = 1, and the fitness distribution in the population stops being Gaussian. In this regime, the velocity has to be calculated self-consistently by matching the rate of production of new extremely fit clones to the overall velocity of the population. This problem has been studied by Rouzine and Coffin [35, 37] in the context of HIV evolution.
A simplified version of the arguments of Rouzine and Coffin [35] is given in appendix D. With γ = v/σ2, we find
This implicit equation for γ can either be solved numerically or, in the case of very small r, iteratively. The result is similar to that obtained by Rouzine and Coffin [35]; differences are due to the difference in the model used. The numerical solution is compared to simulation data in figure 7(B). Convergence with increasing N is slow and equation (19) has a solution only for small r, for which we have little data.
The velocity of the traveling wave falls below σ2 as soon as a few large clones dominate the population. In this case, not only Br(z,h) but also Cr(z,h) are dominated by those large clones, and 〈Y(r,h)〉 behaves differently. We show in the appendix that 〈Y(r,h)〉 ∼ 1 − γ is of order 1 (see equation (C.7)). This is in agreement with [35], who found that the typical number of large clones is ∼logrNσ ∼ Y(r,h)−1.
3.4. Traveling wave solutions at intermediate heritabilities
In the high recombination limit, the population moves towards higher fitness with velocity regardless of any epistatic fitness contributions. However, we have seen above that a clonal structure can emerge even in the absence of epistasis if recombination rates are low enough. Intuition and the simulation in figure 4 show us that this clonal structure is more pronounced in the presence of epistasis and persists at high recombination rates.
The reason for this behavior is the fact that the fitness variance of the recombinant offspring is larger than the velocity . In this case, fit genotypes grow above the traveling Gaussian envelope and generate macroscopic clones.
Figure 4 shows that, at low recombination rate and heritability, the population is always dominated by a few large clones with high non-heritable fitness components, which produce a large number of (on average) non-fit offspring. Rarely, such offspring are very fit, and replace their predecessors. The probability that offspring are fitter than their parents, and with it the velocity, increases dramatically with h2 and r. Conversely, the size of the fit clones and the participation fraction decreases with h2 and r. Simulation results for Y and v in the steady state are shown in figure 8 for different values of h2 and r. In the following, we will rationalize and quantify the observed behavior. To begin with, we will assume a constant velocity and calculate Y. Again, these calculations are detailed in appendix C.
Figure 8. Clonal condensation as a function of recombination rate and heritability. Panel (A): ratio of the traveling wave velocity to the total variance. At low heritability and low recombination rates, the average velocity of the traveling wave is much lower than the additive variance. The equality of velocity and additive variance promised by Fisher's fundamental theorem is recovered in the high recombination limit. Panel (B) shows the corresponding participation ratio log〈Y∞(r,h)〉. Low velocity goes along with high 〈Y∞(r,h)〉. Panel (C) shows the coefficient of variation of the time trace Yt, i.e., std(Yt)/mean(Yt). Yt fluctuates strongly at intermediate recombination rates between the two critical lines identified in the main text and indicated by the white lines. In all panels, N = 106 and σ2 = 0.0025. The additive fitness is drawn from a Gaussian centered around the current mean additive fitness.
Download figure:
Standard imageAfter successful establishment, a clone will grow approximately deterministically according to
where Fj = Aj + Ej is the sum of the additive fitness Aj, measured relative to vt when the clone was born, and Ej is the epistatic fitness. The advancing mean additive fitness is accounted for by vτj, while 〈E〉 is the mean epistatic fitness. The size of the clone peaks at age (where we have defined ) and then decreases until the clone disappears. The maximal size of the clone is . Since the fittest genotypes in the population have a fitness above the mean, they grow larger than N if
which gives us a first indication of the breakdown of the mean field solution, where . We confirm this again by calculating the integrals Cr(z,v) and Br(z,v) in appendix C; see equation (C.4).
Furthermore, we calculate 〈Y(r,h2)〉 in appendix C and find that 〈Y(r,h2)〉 starts to be larger than N−1 as soon as
These two conditions on define two critical recombination rates rc and r* at which different features of the mean field solution break down. Note, however, that this expression is not valid close to v = 0, since we have assumed a traveling wave with clones of finite lifetime.
Both of these two lines seem to play an important role in the behavior of the population. Between the two lines, we observe a coexistence between condensed populations and non-condensed populations, which results in large fluctuations of Y(t). An example trajectory of Y(t) is shown in figure 9. For a limited amount of time, the population is condensed with large Y and does not move in the additive direction. It then becomes unstuck and adapts for a while before getting stuck again. The time average of Y is dominated by these intermittent condensates in this meta-stable region and is larger than N−1 as soon as . Good simulation evidence for the different transition lines, however, is difficult to obtain because of substantial sub-leading corrections.
To investigate the model in the absence of this stick–slip behavior, we simulated a variant of the model where additive fitness is drawn from a distribution that steadily moves towards higher fitness with velocity v, as is assumed in the calculations. The time averaged 〈Y(r,h)〉 is shown in figure 9(B) and compared to the predicted onset of condensation by equation (C.7) (black lines).
Figure 9. Panel (A) shows a time trace of Yt(r,h) which exhibits strong fluctuations. For part of the time, the population is dominated by a single clone and does not adapt. After a while, this clone is superseded and the population starts moving again. Panel (B): participation ratio for different heritabilities and different population sizes at an imposed velocity of (the transition is sharper if v is adjusted self-consistently). Note that above the critical recombination rate, 〈Y〉 is of order N−1. The analytical predictions (equation (C.7)), up to the prefactor which was fixed to ensure Y(rc,h2) = N−1, are shown as black lines. Dotted lines: N = 104, dashed: N = 105, solid: N = 106, σ2 = 0.0025.
Download figure:
Standard image4. Discussion
4.1. Summary
Correlations between different parts of the genome are usually referred to as linkage disequilibrium, suggesting that due to genetic linkage, i.e., a high probability of coinheritance, the allele frequencies at different loci are not independent. Here, we have used a different measure to characterize correlations in the population. Instead of looking at correlations between individual loci, we have characterized the distribution of clone sizes, or equivalently haplotype frequencies, in adapting populations. Our analysis is not restricted to additive fitness functions, but we have also analyzed a simple model where fitness is partly heritable and partly random.
While macroscopic condensation, 〈Y∞〉 ∼ (1), sets in only for in the absence of epistasis, we observe condensation of the population at recombination rates of order σ in presence of epistasis. Here, σ is the standard deviation in fitness and sets the strength of selection. The reason for this behavior is the fact that the velocity at which the population adapts is smaller than the fitness variance of the recombinant offspring. The additional epistatic variance allows the seeding of new clones far ahead of the population mean, which is only slowly catching up. Hence fit clones can out-grow any traveling Gaussian. At the same time, condensed clones cause the average epistatic fitness to be significantly greater than 0. Since this epistatic fitness is lost upon outcrossing, the population has a tendency to partition into a few fit clones with high epistatic fitness and a large number of recombinant genotypes with random epistatic fitness. This coexistence between 'condensed' and 'dust' phases is seen in figure 4 in the panels on the right. As long as the heritability, i.e., the fraction of additive variance, is larger than zero, the population seeds new clones even at low recombination rates and the rate of coalescence will be given by 〈Y〉 times the characteristic turn-over rate of clones.
In the absence of any additive variance, the observed behavior is quite different. In this case, the fitness function is completely random (a.k.a. House-of-cards model [18], or random epistasis/energy model [8, 29]). The only way the population adapts is by sampling fitter and fitter individuals from the same distribution. In other words, the population dynamics amounts to a record process where the total number of samples taken increases as N(1 + rt). Records establish and grow with the rate E − 〈E〉 − r. One additional complication that is of particular importance in the case h2 = 0 is the fact that whenever a clone recombines with itself, it does not generate a novel genotype. This has the tendency to shut off recombination and stabilize clones as soon as they grow large, resulting in a rapid loss of genetic diversity. In a previous publication [29], some of us have studied the onset of condensation in a more descriptive manner. Here, we have extended this work by explicitly calculating Y, both during an initial transient as well as in a steady state where variance is maintained.
The model we have used is extremely simplistic, and one might wonder about its relation to real world populations. Nevertheless, it accounts for a number of features of real populations such as heritabilities between 0 and 1, outcrossing, and mimics a large number of loci in the sense of quantitative genetics. These features give rise to qualitatively different dynamical regimes, which will also be observable in more realistic models. Some facultatively sexual populations are in fact remarkably close to this simple 'E and A' model. Many plants and microbial populations are facultative outcrossers. In the event of outcrossing, a large number of crossovers on many chromosomes produces many independently inherited genomic segments. HIV, for example, recombines via template switching following coinfection at an outcrossing rate of a few per cent [3, 28]. In each of these outcrossing events, roughly 10 crossovers are observed [21]. If populations are polymorphic at many loci, the resulting offspring distribution is approximately described by equation (7). We have made a further simplification by assuming that the fitness of a recombinant offspring is independent of its parents and simply drawn from a comoving distribution. This assumption is expected to approximate recombination processes where the offspring and parent fitness decorrelate rapidly over a few outcrossing events, as for example in equation (7) [32]. Note, however, that loci close together on the chromosome decorrelate only slowly.
The other dramatic simplification made above was the partitioning of fitness into additive and random components corresponding to high order epistatic interactions. More generally we expect to find epistatic interactions of various orders, which are heritable to different extents. However, the 'E and A' model should not be thought of as a parameterization of the genotype fitness map, but rather as a partitioning into variance that can be explained by the best fitting additive model, and the remaining variance [10, 27]. The best fitting additive model is in general time dependent, and the heritability can change as the allele frequencies change [44]. We know rather little about the genetic architecture of fitness, which justifies the use of such simple models. In specific scenarios where the genotype–phenotype map is known, more detailed modeling should be guided by the general conclusions drawn from the 'E and A' model.
The variation that we assume is always present among the recombinant offspring, it is ultimately fueled by mutations. The balance between the influx of beneficial mutations and selection in facultatively sexual populations has been investigated in [7, 32]. Even in sexual populations, the dynamics of mutation can be strongly affected by the selection through the clonal structure of the population.
4.2. Why is Y important?
The participation ratio, Y, is exactly the probability for two individuals to be genetically identical. Therefore, it immediately gives a measure of the clonal structure of the population. If the two genotypes are identical, they have had a common ancestor in the recent past. Hence, 〈Y〉 is proportional to the rate of coalescence, and as soon as 〈Y〉 is no longer proportional to N−1, coalescence is greatly accelerated relative to a neutral model. It is well known that selection accelerates coalescence since fit individuals have more offspring and dominate future generations. Recombination tends to reduce the effects of linked selection since it decouples different regions of the genome. We have calculated the rate of coalescence in our model, which is set by a balance between selection and recombination. We have shown that there is a critical recombination rate, where recombination is overwhelmed by selection and the population structure changes qualitatively.
In the case of additive fitness functions, we have found that , which is in agreement with earlier work [30, 37]. In this case, the population consists of clones apparent as little 'bubbles" in the representation of figure 4. Any such bubble originates from a common ancestor generations in the past, and 〈Y〉 is the probability that two individuals belong to the same 'bubble'. This bubble coalescent is similar to ideas developed for structured populations [33] or the fitness class coalescent [45]. Note, however, that the clone size distribution is very long tailed and the bubble coalescent is not in the universality class of the Kingman coalescent [19], but possibly of Bolthausen–Sznitman type [4, 25, 30].
Genetic identity between some of the individuals reduces the effect of outcrossing, since identical parents produce identical offspring. Since the probability of such an occurrence is equal to Y, the effective rate of recombination in the partly clonal population is reff = r(1 − Y(reff,h)). Hence, strictly speaking our calculations of Y(r,h) deep in the clonal condensation regime must be taken through a self-consistency step, which replaces r that was hereto an independent variable by a dependent variable reff. This however would not change our estimates for rc(h) and tc(r,h), which are defined by the point of first emergence of clones (when Y rises above O(1/N)). Going beyond the MFT description of recombination, one may define a participation ratio density ϒ(F) in terms of which Y = ∫dF ϒ(F), which picks up the fitness dependence of Y: individuals with relatively high fitness are much more likely to be clonally related.
The significance of Y is not limited to the case of exact genetic identity within clones. In particular, mutations would introduce additional polymorphic loci into the 'clones' that were the focus of our study. Yet the genetic structure of the population introduced by clonal condensation still survives: one only needs to distinguish between high-frequency polymorphisms, which are being reshuffled by recombination as approximated by our model, and the low-frequency new polymorphisms due to recent (on the time scale of clonal growth) mutations. The latter would appear on the background that is clonal with respect to the high-frequency alleles. Thus the 'clones' emerging at low r should be thought of as haplotypes [22] and the 'clonal condensation' is the process that suppresses the number of haplotypes on small length scales.
The participation ratio can be readily generalized to allow for a degree of genetic distance within a pair of individuals. As currently defined Y = 〈δ(||g − g'||)〉 (where ||g − g'|| stands for the genetic distance between g and g'). This is immediately recognizable as a special case of the Parisi order parameter q(x) = 〈δ(||g − g'|| − x)〉 [23]. The latter therefore provides an interesting representation of the haplotypic structure of populations. It would be interesting to understand whether more realistic models of fitness landscapes (with low order, rather than random epistasis) would generate a more complex hierarchical structure of q(x) than the simple 'dust/clone' dichotomy found in our simply model. The relation between the REM and Sherrington–Kirkpatrick models of spin glasses [39] could provide useful guidance and ultimately yield better understanding of haplotype distributions and recombinant coalescents [16, 26, 41].
4.3. Future directions
The analysis of 'clonal condensation' presented above can and should be extended in several ways. Within the confines of the model considered, one may want to obtain a better understanding of the 'mixed phase' lying between the two transition lines identified in figure 8. This phase is characterized by large fluctuations in clone size distribution, and hence in Y, even in the approximation imposing a fixed traveling wave velocity v. In reality, the population sets its own instantaneous rate of change of average fitness, which depends sensitively on the fitness of the leading clones and changes with time as new fitness 'records' are established by fresh recombinants. We have already described in figure 9 the 'stick–slip' dynamics, which is characteristic of the mixed phase regime. (Needless to say, the existence of the mixed phase region corresponds to the 1st order nature of the clonal condensation transition for h ≠ 0,1.) A fully quantitative description of this behavior would require going beyond MFT. So far, attempts to describe fluctuations in the dynamics of adaptation have been few and far apart [4, 14, 31]: a quantitative description of the 'stick–slip' progress of adaptation would represent a major step forward.
Another necessary extension of the model involves mutational influx. A non-zero mutation rate would provide an influx of genetic variation and define true statistically stationary states corresponding to adaptive traveling waves [6, 9, 32, 38, 42] or, in the presence of both deleterious and beneficial mutations, to a dynamic mutation selection balance [12]. In that case, emergent clones become 'fuzzy' as they accumulate mutations, and the participation ratio should be replaced by a more general Parisi order parameter as explained above. The result should provide interesting quantitative insight into the expected genetic structure of facultatively sexual populations.
Perhaps the most interesting and important extension of the present work would involve a generalization of the model and its analysis to linear genomes and obligatory sexual populations. In contrast to the current model where recombination freely re-assorts the loci, a more realistic linear chromosome model would describe recombination in terms of relatively infrequent crossover. This would naturally tie the frequency of recombination to the length of the segment considered. We expect that on sufficiently short scales, where recombination is infrequent, a tendency for haplotype condensation would be manifest if the population is diverse enough. Whether epistasis plays a significant role in the condensation process will depend on the distribution of epistatic interactions along the genome [27]. If there is a strong tendency of mutations to interact with mutations nearby [5], the heritability decreases as smaller and smaller segments are considered. This could result in condensation of mutations into 'super-alleles'. However, the embedding of the haplotype in question into a larger genome gives rise to complications related to Hill–Robertson interference [2, 15, 16]. Transient associations with other genomic regions will either boost (the hitch-hiking process) or suppress (background selection) the spread of a haplotype in the population. This reduces the efficacy of selection and gives rise to a stochastic dynamics with rather different properties than genetic drift [30]. Bridging the different scales and resulting dynamical regimes represent an important challenge for the future.
4.4. Conclusion
In conclusion, we stress the distinction between the QLE and 'clonal condensation' regimes. In the QLE regime, recombination is sufficiently rapid to overwhelm any clonal amplification due to selection, and correlations between alleles at different loci are relatively weak. In this regime, the correlations between loci are well described by linkage disequilibrium, which measures population averaged pair correlation of loci. By contrast, clonal condensation is a non-perturbative, large deviation from linkage equilibrium (under which loci are completely uncorrelated), which in particular results in a stratification of the population depending on its fitness: clones appear predominantly in the upper reaches of the fitness distribution. Strong heterogeneity among individuals along the fitness axis is not captured by measuring linkage disequilibrium and other traditional approaches. Understanding strong interactions in multi-locus systems requires new ideas and tools. We have found simple models such as the REM to be a very useful source of insight into these non-trivial aspects of population genetics.
Acknowledgments
We are grateful for stimulating discussions with H Teotonio, A Dayarian, I Rouzine, D Fisher, M Desai and M Lynch and would like to thank T Kessinger for careful reading of the manuscript. RAN is supported by an ERC starting grant 260686 and BIS acknowledges support of the HFSP RFG0045/2010.
Appendix A.: Participation ratio for clonal condensation
Following its definition for the REM we define the participation ratio for a set of clones as the sum of squared frequencies
where
describes the size of clone i created a time τi in the past with fitness Fi. Due to stochastic effects, most clones go extinct early on, and the fraction σ that remains is on average larger by a factor of σ−1. We will ignore this correction and reinstantiate it later when comparing to simulations. The rate of growth of the clone is controlled by the 'replication rate': the fitness relative to the time dependent average fitness of the population 〈F〉(t) minus the recombination rate. The average over clones 〈⋯〉 implies averaging over Fi and τi. It can be decomposed into the average over the initial population (of individuals present at t = 0 with τi = t) and the average over subsequent recombinants that are generated via Poisson process with rate Nr. Hence
where we have defined
which expresses the average over the initial population and
which gives the contribution of all recombinants. If we further define
we arrive at the following expression for the participation fraction:
At sufficiently long times, rt ≫ 1, the population is dominated by new recombinants so that B0 and C0 may be neglected in comparison with Br and Cr.
The growth and decay of clones, equation (A.2), depends critically on the dynamics of the mean fitness, which in turn depends on the heritability of fitness. We have discussed the different cases (h2 = 0,1 and intermediate heritabilities) in the main text and will present the calculations pertaining to them in separate appendices below. To streamline the calculation, we will rescale all rates with σ (r → r/σ, A → A/σ, E → E/σ, F → F/σ) and multiply times with σ (t → tσ, τ → τσ). Consequently, v now equals γ. We will reinstantiate σ in the final results to facilitate comparison with formulas in the main text. In addition, the following integrals will prove useful:
where the latter holds only for a < b.
Appendix B.: Participation ratio for the fully epistatic h = 0 case
The fully epistatic case with recombination or mutation is the closest relative of the REM in population genetics, known in the population genetics literature as the 'house-of-cards' model [18]. In this model, every new genotype is drawn from the same distribution, which we will take to be the standard normal distribution, that is, fitness is non-heritable.
Let us consider rt ≫ 1 so that enough recombinants are produced to dominate over the initial set of genotypes and evaluate Cr(z,t) and Br(z,t). Furthermore, let us assume that the mean fitness is constant (we will revisit this assumption later). The size of a clone with age τ is then simply , and Cr(z,t) is given by
Our strategy in approximating this integral will be to identify the region where and expand the e−Φ exponential in that region; outside that region we shall neglect the exponential compared to 1 (in the square bracket). Defining , we have
where and we have along the way assumed that t > E∗(t), which is realized for . As we shall see below, the dominant contribution to Y comes from the region z−1 ∼ N, so that in the low r regime () we have . At the end of the calculation we shall check that t > t∗ condition is satisfied.
At short times the second term in our expression for Cr,t(z) is clearly smaller than the first, so that the dz integration in the expression for Y is controlled by the e−Nz factor, which set the scale for z at z ∼ N−1. Let us next estimate the time for which the second term in the expression for Cr,t(z) becomes larger than the first provided that z ≈ N−1. We obtain
Let us define and ; then
For this can only be satisfied if , which diverges in the large N limit. Below rc, but still close to it, we can approximate the crossover , and the crossover condition becomes
which is an approximation valid for .
It remains to calculate the participation ratio as a function of t asymptotically for t ≫ tc and for t ≈ tc. To do so, we need to evaluate the B-integral and perform dz integration in the integral representation for Y(t). The B-integral can be obtained by differentiating Cr(z,v) twice with respect to z, which evaluates approximately to
Hence
where
and
so that
which upon substitution into equation (B.11) yields the final result
This result deviates from zero for , consistent with our expected tc for r ≈ rc. For r > rc the participation ratio stays small for all t.
Appendix C.: Participation ratio for the general traveling wave
As before, the participation fraction is given by an integral over clones seeded in the past. As opposed to the fully epistatic case, however, we now consider a finite velocity, which limits the lifetime of clones. A clone seeded at time τ in the past at fitness A above the mean has a size
where . New clones get seeded with rate Nr, and to calculate 〈Y〉, we need to evaluate the integrals Cr(z,v) and Br(z,v) as defined in equations (A.7) and (A.9). In contrast to the case discussed above with v = 0, a finite v causes clones to have a limited lifetime, even if they are initially very fit.
The integral Cr(z,v) over F and τ has one contribution from young (small) clones with average fitness, in which case znj is small. This young contribution evaluates to
Here we have evaluated only the contribution from young clones and neglected the fact that there is a growing quadratic term in the exponent. The latter is due to old clones; we will evaluate this term next. If v < 1, the integrand starts growing again with τ and θ until it is cutoff at logz−1 = θτ − vτ2/2. For a given τ, this cutoff is at θ* = τ−1logz−1 + vτ/2, and the integrand has a peak at θ* if . In this case, we expand the integrand around θ*, i.e., , and find for the old clones
The remainder can be evaluated by assuming that the non-exponential parts are slowly varying. The integrand peaks roughly at logz−1/τ*2 = v/2 or . This translates into , and the curvature is 2F*logz−1/τ*3 = vF*/τ* ≈ v2. Hence
Br(z,v) can be evaluated by differentiating C twice, which yields
Appendix. Additive fitness functions
In the additive case with v = 1, the cutoff Cr(z,v) is dominated by the young clones for any recombination rate. Its second derivative, however, is dominated by the contribution from old clones if . In this regime, we find
In the last step, we have dropped all preexponential factors and reinstantiated σ. This correction approximately accounts for the fact that only a fraction σ of the clones that are seeded are successful.
For larger r, even the Br(z,h) term is dominated by young clones, and Y ∼ N−1. This behavior holds while the recombination rate is larger than . For smaller r, the velocity starts to deviate from 1. In this case, Cr(z,h) is dominated by old clones. For large enough populations, the powers of z vary much more quickly than all other terms, which we denote collectively by A(z), and we can approximate the integral as
where in the last step we reintroduced σ by replacing v with γ = v/σ2. Along the way, we have assumed , which is consistent with the assumption of small r made above.
Appendix. Epistatic fitness function
In cases with heritabilities 0 < h2 < 1, we generally have v < 1 and can be in a regime where Cr(z,v) ≈ z, while Br(z,v) is dominated by old clones. In this case, we have
This starts to be of order N−1 as soon as . This is similar to the condition identified in [29] based on the factorization of the mean field solution. Note, however, that this expression does not hold near the v = 0 line, since it is derived assuming a steady traveling wave.
Appendix D.: Velocity in the low recombination limit
With sufficiently frequent recombination, the mean fitness in the population increases with a rate v given by the variance in additive fitness among recombinant offspring. At lower recombination rates, however, the fitness variance in the population gets reduced below that of the distribution of recombinants. With reduced additive variance, v decreases. This effect has been studied in [35] for a model with only additive fitness, and we reproduce this argument for our simplified recombination model.
We will again work in units where the variance of the fitness distribution of recombinant genotypes is 1 and measure fitness a relative to the instantaneous mean fitness in the population. The invariant fitness distribution P(a) in the comoving frame a = A − vt is governed by
where the last term accounts for the injection of recombinant offspring. This equation has the solution
Note that this solution becomes negative for a > ac, and only the part a < ac is of interest. The zero crossing ac marks the position of the most fit individuals in the population, and its value will depend on the population size.
To determine the velocity of a finite population, Rouzine and Coffin [35] compared the rate at which new genotypes ahead of ac (records) are created to the speed at which the bulk of the population moves towards higher fitness.
Within the model, recombinant genotypes are generated with rate rN and drawn from a Gaussian distribution with unit variance. Hence genotypes with a > ac are produced at rate
Such a genotype seeded ac ahead of the mean has a chance acσ to establish and, if established, advances the nose by 1/ac. We therefore find for the speed of the nose
To solve for vnose = v and ac, we need an additional relation which can be obtained from the condition that P(a) is normalized. In the limit of small r, we can approximate equation (D.2)
The approximation is valid for ac(1 − v) ≫ r; otherwise the population distribution is a simple Gaussian, and we find v ≈ 1. Note that this condition is the same as the one we encountered above in the context of whether Cr(z,v) is dominated by young or old clones. For ac(1 − v) ≫ r, Cr(z,v) and equivalently the normalization of P(a) are dominated by the old clones. The normalization condition on P(a) now requires
which yields
Solving equation (D.4) for and equating the result with equation (D.7) results in
After rearranging and replacing v by γ = v/σ2, we obtain the expression given in equation (19).
Appendix E.: Mean field solution
Above, we discussed the large r solution to the equation
where we have introduced the symbol ρ(E) for the distribution of epistatic fitness. At large r, this model admits a factorized solution P(A,E) = ϑ(A,t)ω(E) with
The epistatic part needs both to be normalized, and C needs to equal the mean of the distribution ω(E). Those two are linked: when one is true, so is the other. Hence
Hence the constant C, adjusted to achieve normalization, equals the population mean of E.
Footnotes
- 5
Note that a slightly different dynamics is observed if recombination of individuals with the same genotype do not produce a novel genotype. In this case the effective outcrossing rate of large clones goes down and the population quickly condenses into a unique clone.