Ultrametric identities in glassy models of natural evolution

Spin-glasses constitute a well-grounded framework for evolutionary models. Of particular interest for (some of) these models is the lack of self-averaging of their order parameters (e.g. the Hamming distance between the genomes of two individuals), even in asymptotic limits, much as like what happens to the overlap between the configurations of two replica in mean-field spin-glasses. In the latter, this lack of self-averaging is related to a peculiar behavior of the overlap fluctuations, as described by the Ghirlanda–Guerra identities and by the Aizenman–Contucci polynomials, that cover a pivotal role in describing the ultrametric structure of the spin-glass landscape. As for evolutionary models, such identities may therefore be related to a taxonomic classification of individuals, yet a full investigation on their validity is missing. In this paper, we study ultrametric identities in simple cases where solely random mutations take place, while selective pressure is absent, namely in flat landscape models. In particular, we study three paradigmatic models in this setting: the one parent model (which, by construction, is ultrametric at the level of single individuals), the homogeneous population model (which is replica symmetric), and the species formation model (where a broken-replica scenario emerges at the level of species). We find analytical and numerical evidence that in the first and in the third model nor the Ghirlanda–Guerra neither the Aizenman–Contucci constraints hold, rather a new class of ultrametric identities is satisfied; in the second model all these constraints hold trivially. Very preliminary results on a real biological human genome derived by The 1000 Genome Project Consortium and on two artificial human genomes (generated by two different types neural networks) seem in better agreement with these new identities rather than the classic ones.

Here we will focus on the framework provided by spin-glass theory to Natural Evolution, which has attracted much interest in the past decades and nowadays represents an insightful and solid branch of modern disordered statistical mechanics.In this picture genomic randomness plays a crucial role; take for instance the adaptive walks approach, where Natural Evolution is modelled as a two-step stochastic process: i. the genotype of a species undergoes random mutations, ii.its newborns with higher fitness are preserved (see e.g.[21]).Then, as pointed out by Eigen, the compromise between replication efficiency and frequency of mutations in evolutionary dynamics is conceptually close to the compromise between energy minimization and entropy maximization in statistical mechanics, moreover, the error threshold in the mutation rate in the former mimics the thermal noise of the latter [22,23].Also, to quantify the genetic variability within a population, we can introduce a proximity measure q between pairs of individuals that plays like an order parameter (mirroring the replica overlap) and, just like in disordered systems, two distinct averages can be implemented, namely the population average (mirroring the thermal average) and the process average (mirroring the quenched average) [24].Therefore, one can consider the population-average of q, which in general depends on time, and inspect whether its average over a long time stretch exhibits vanishing fluctuations.If this is the case we have a self-averaging structure of the model (in such a way that its main features could be captured by the quasi-species1 limiting description), and, if not, we have a non-self-averaging structure, that is a hallmark of complex systems.
In this context it is also worth recalling Wright's Adaptive Landscape (see e.g.[35]), Fisher's Fundamental Theorem of Natural Selection and Kimura's Neutral Theory [36] which constitute fundamental steps [37].Along these lines, the development of a disordered statistical mechanical theory for Natural Evolution was started by Leuthäusser [30] and Tarazona [23] and a spin-glass setting was pursued by Derrida, Higgs, Franz, Peliti, Sellitto, and Serva, just to name a few (see e.g.[24,22,[31][32][33][34] and references therein).More specifically, we recognize two classes of models: those where both random mutations and selective pressure are involved (also referred to as rugged landscape models) and those where evolution is driven only by random mutations (also referred to as flat landscape models).Reference models for the former are the P-spin-glass [21], the random energy model (REM) [22] and the Hopfield model [23], while for the latter we mention the One Parent Model (OPM) [24], the Homogeneous Population Model (HPM) [33], and the Specie Formation Model (SFM) [31,32].The OPM is asexuated and its order parameter lacks self-averaging, while its sexuated counterpart, the HPM, is self-averaging, unless a threshold in the similarity between the two genomes that are matching to reproduce is introduced and this case corresponds to the SFM.Notably, in the latter, the presence of a similarity threshold yields a persistent, spontaneous formation and extinction process at the level of species with consequent breakdown of self-averaging.
The behaviour of the order-parameter fluctuations in spin-glass models has been extensively studied, starting from the fully-connected Sherrington-Kirkpatrick (SK) model [38][39][40][41], to its generalizations (see e.g., [43][44][45][46][47][48][49][50][51][52][53][54][55][56][57][58][59][60] and Sec.1.2 for more details), including the rugged landscape models mentioned above.There, the order parameters are proved to be non-self-averaging and their momenta satisfy a class of non-trivial identities known as Ghirlanda-Guerra and Aizenman-Contucci (the latter are actually a family of identities that is a subset of the Ghirlanda-Guerra ones).Thus, in the context of Natural Evolution, the presence of both random mutation and selective pressure seems to be associated to the break-down of self-averaging with the momenta of the order parameter obeying some constraints.The validity of such constraints in the case of flat landscape models is still an open question that deserves attention.In fact, we recall that Ghirlanda-Guerra identities played a pivotal role in Panchenko's proof of Parisi ultrametricity in the SK model [61][62][63] and ultrametricity, in turns, covers a key role in Natural Evolution (think for instance at the taxonomic classification in Biology).In this work we prove analytically for the OPM the validity of a new class of identities and find numerical evidence for their validity also for the SFM for which, instead, classic identities seem to be violated.

The harmonic oscillator of spin glasses: Sherrington-Kirkpatrick model
The SK model [1,2,64,65] is defined in terms of the pairwise Hamiltonian where the symmetric couplings J = {J ij } N,N i<j are 1 2 N (N − 1) i.i.d.random variables sampled from P(J ij ) = N (0, 1) 2 and the interacting units are N Ising spins S = {S 1 , ..., S N } ∈ {−1, +1} N .For a given inverse temperature β ∈ R + and for a quenched coupling setting J , we introduce the Boltzmann-Gibbs measure P N,β (S|J ), the partition function Z N,β (J ), and the quenched free-energy F N,β that read as where the expectation E is over the possibile realizations of J drawn from P. Next, for a generic observable O(S), we define the following averages Due to frustration among the spins in the network, once the temperature is lowered beyond a critical one 1/β c , the free-energy landscape of this system gets spontaneously rugged and minima hierarchically split one into another recursively; consequently, spins tend to freeze in configurations displaying no long-range ferromagnetic-like order.Then, a natural measure of (any) internal organization of the system is a similarity measure between the spin configurations obtained for two replicas of the system characterized by the same realization of disorder J , namely two configurations sampled from the same distribution P N,β (S|J ).In particular, the (simplest) order parameter is the two-replica overlap that is nothing but the normalized scalar product between the configurations, corresponding to two replicas labelled as a, b, and denoted as S a and S b .In the high-temperature region, spins behave independently of each other, replica configurations are uncorrelated and the overlap distribution is a Dirac delta peaked at zero, however, beyond β c and in the thermodynamic limit N → ∞, there emerge non-zero values for q ab , such that the overlap distribution is a (possibly infinite) sum of Dirac's deltas at these values (i.e. the so-called Parisi plateau), and the whole distribution P β (q) is retained as order parameter.Thus, as ergodicity breaks down, this model breaks also the permutational invariance among its replicas giving rise to the well-known phenomenon of replica symmetry breaking (RSB): this was suggested as an ansatz by Giorgio Parisi in the eighties [1,2] and then mathematically proved twenty years laters by Francesco Guerra [66] and Michel Talagrand [67] for the expression of free energy and by Dmitry Panchenko [61][62][63] for the hierarchical organization of its valleys, i.e. ultrametricity.Remarkably, Panchenko's proof is significantly based on the peculiar fluctuations of the overlap as summarized by the Ghirlanda-Guerra identities [41], vide infra.Indeed among the most striking features of the emergent order of the SK model at low temperature lies the spontaneous ultrametric organization of its pure states, resembling taxonomic ordering in Natural Evolution, as for instance captured by the 3-replicas and 4-replicas overlap joint distributions P β (q 12 , q 13 ) and P β (q 12 , q 34 ) that read as P β (q 12 , q 13 ) = 1 2 P β (q 12 )P β (q 13 ) + 1 2 P β (q 12 )δ (q 12 − q 13 ) , ( P β (q 12 , q 34 ) = 2 3 P β (q 12 )P β (q 34 ) + 1 3 P β (q 12 )δ (q 12 − q 34 ) . (1.9) The first expression highlights that, when considering three replicas of the system, it turns out that either two of their overlaps are independent, or they are identical and these two outcomes happen with the same probability; the second expression confirms that, even when looking at overlaps between two distinct couples of replicas, hence considering four replicas, such a correlation remains strong.
As a consequence of ultrametricity, along the past two decades a number of constraints on overlap fluctuations in the low temperature regime of spin glasses have been obtained in a mathematically controllable settings and, among these ensembles of families, the most famous ones are certainly the Ghirlanda-Guerra identities [41], whose simplest expressions read as as well as their linear counterpart (where we get rid of q 2 12 2 by substitution in the two equations above), obtained independently by Aizenman and Contucci [38] via stochastic stability (and later with several other techniques [58,59,39,40]), whence the first identity of the family reads as q 4  12 − 4 q 2 12 q 2 23 + 3 q 2 12 q 2 34 = 0. (1.12) Although the SK model remains the archetype of spin glasses, several variations on theme have appeared in the Literature, possibly relaxing its mean-field fully-connected nature.For instance, its version on random graphs (known as Viana-Bray model [42][43][44][45]) was studied finding ultrametric fluctuations, that naturally generalize Ghirlanda-Guerra and Aizenman-Contucci identities (and reduce to the latter whenever the coordination number of the graph approached the network size).and P-spin models [46][47][48] The same holds for models with higher-order interactions (known as P-spin models [46][47][48]), even in the diverging number of interactions (known as random energy model, REM [47]), up to extensions as neural networks (e.g., the Hopfield model) [52], and beyond [68,[55][56][57][58][59][60].Further, more abstract representations of the SK model, as for instance the Random Overlap Structures (ROSt) by Aizenman, Sims and Starr [49,50] and its diluted RaMOSt counterpart, also exhibit Ghirlanda-Guerra fluctuations [69,51].It is thus rather natural to further inspect the validity of these ultrametric constraints in glassy models of Natural Evolution.

Ultrametric fluctuations in glassy evolutionary models without selective pressure
Models such as Gardner's P-spin glass [70], Derrida's REM [71] or Hopfield's associative memory [23] have been shown to be plausible models for Natural Evolution under random mutations and selective pressure (see e.g.[22,21,72]), also, they are well-known to exhibit overlap fluctuations that respect both the Ghirlanda-Guerra and the Aizenman-Contucci identities.However, moving to models of Natural Evolution taking place in flat landscapes nothing has been said so far on the validity of these ultrametric constraints, a possible difficulty in answering this question possibly lays in the absence of an Hamiltonian representation for these models.In the following we inspect the three best-known models in this context, that is the One Parent Model (OPM, that is a model for asexual reproduction exhibiting, by construction, RSB on the scale of single progenies), the Homogeneous Population Model (HPM, that is a basic model for sexual reproduction, where reproduction may involve two parents regardless their genetic distance and it is replica-symmetric) and the Species Formation Model (SFM, that is much as the previous model where, crucially, a threshold in string similarity for dating is introduced and the latter turns the evolution of the model to be RSB and at the level of species rather than single genomes).We will show that for the OPM both the Ghirlanda-Guerra and the Aizenman-Contucci identities are violated and we prove the existence of another family of identities that is instead respected.Extending the same analysis on the HPM returns a rather simple scenario where all the identities are trivially respected (as anticipated since the model is replica-symmetric).Next, we tested (numerically) all the three families of ultrametric constraints on the SFM: a finite-size-scaling analysis suggests that they are expected to hold in the suitable limits (i.e., the infinite genome limit and large population limit), with the new class of identities being the ones minimally violated by the finite size effects.Driven by this last finding, we close this section by inspecting whether these constraints are fulfilled on actual genomes, focusing on a sample of the biological human genome and two artificial genomes and we find that the scenario depicted for the SFM is preserved also in these realistic settings.
The simplifying assumptions that we preserve along the paper are those of the original manuscripts (see e.g., [24]), namely • while Evolution takes place the population size is preserved and set equal to M ; • each individual a ∈ {1, ..., M } is represented by a string of N bits {S a 1 , S a 2 , ..., S a N }, with N constant during the evolutionary process, which can be interpreted as the genome of the individual a3 ; • the genome are subjected to mutations and we will focus on point-mutations4 that happen at constant mutation rate (among different generations) and independently of a given locus (i.e., the unit of the genome that mutates): we thus associate one-to-one to each genotype a phenotype5 .
• the dynamics is parallel: at each iteration all the individuals in the populations are removed and replaced by their offsprings.
With these simplifications the state of the population at given time t can be described by specifying the genome of all the individuals {S a i (t)} N i=1 .The natural measure of genetic distance between two individuals a and b is the Hamming distance where q ab is the overlap between the genomes of the individuals a and b and mirrors the overlap between the spin configurations of two replicas.Analogously, the M × M matrix q evaluated at a given time t provides a snapshot of the population structure at that time.Interestingly, it can be proved that, in the N → ∞ limit, the three flat landscape models under consideration can be simulated by directly looking at the evolution of q rather than dealing with the set of genomic sequences [31,32].

The One Parent Model (OPM)
In the OPM studied by Derrida and Peliti [24] we consider a population Ω, made up of a fixed number, M , of individuals reproducing synchronously and asexually, whose genome at generation t is encoded by a N -bits vector S a (t) ∈ {−1, +1} N for a = 1, ..., M .At each generation t, all the individuals are removed, and a new generation is formed by offsprings of the previous individuals.More precisely, each individual a ∈ Ω is randomly associated to a parent G(a) ∈ Ω and the genome S a (t) is taken identical to that of its parent S G(a) (t−1) at the previous generation t−1 except for random mutations, as specified by where µ ∈ R + tunes the mutation probability; the subscript highlights "1" that we are comparing individuals separated by one generation and, in the following, the expectation related to P 1 shall be referred to as E 1 .As for the mapping a → G(a), it is assumed that G(a) is chosen independently and uniformly in Ω for each individual and at each generation.Therefore, for any individual a ∈ Ω, its ancestors over the previous t generations are given by the sequence {G(a), G 2 (a), ..., G t (a)} = Γ t (a).
As remarked in Sec. 1, analogously to spin glasses, we have two averages: at each generation t we can take the average of any quantity involving the individual of the whole population Ω (population average • ) but, as this quantity may fluctuate even for an infinitely large population Ω according to the particular mapping sequence (Γ t ) which has taken place, we should consider also the average of these quantities taken over all possible realizations of the reproduction process (process average • ).Crucially, the process average can be obtained by averaging over the temporal unfolding of the process for a sufficiently long time stretch, since the time sequence (Γ t ) of mappings belonging to different time intervals are independent, but this has to be done with care, properly inspecting the typical decorrelation time of the stochastic process (an analysis that we perform in the next subsection).
Specifically, at generation t, the population average, that we denote with q t , can be obtained by means of the following where Thus, q t fluctuates in time about a mean value that we denote with q and which can be expressed as q = q P (q)dq, where P is the overlap distribution averaged over time.It can be proven [24] that, in the limit N 1, the time-averaged overlap distribution depends only on the parameter λ := 1 4M µ and it is Numerical estimate for the overlap probability density function P (q) for a population of M = 50 individuals averaged over 10 5 generations.Several genome sizes are tested and depicted in different colors as explained in the legend, while the dashed curve corresponds to the theoretical probability distribution given by (2.6).As expected, if λ < 1 mutations are likely and the overlap concentrates on values q ≈ 0, if λ > 1 mutations are rare and the overlap concentrates on values q ≈ 1, while if λ = 1 the probability distribution becomes uniform.
such that q = q P (q)dq = λ λ + 1 . ( Notice that for λ < 1 the distribution is peaked at q = 0, for λ = 1 the distribution is uniform in the interval 0 < q ≤ 1, and as λ exceeds 1 the peaks is at q = 1.As shown in Figure 1, the agreement between theoretical predictions and simulation outcomes is pretty good already for relatively small sizes and the overlap is better and better as N is made larger.Remarkably, the broad distribution of the overlap q highlights the non-self-averaging nature of the order parameter in the OPM.Indeed, even in the infinite genome-size limit, one has [24] q 2 − q 2 = 0. (2.8)

Exponential decay of correlations for efficient sampling
If we let the system evolve for a time t 1, the last generation will be made up of individuals with a unique common ancestor with probability one in the asymptotic limit [24], hence it is possible to find an expression for the decay of genome-correlations between individuals at a given time and their common ancestor, as a function of time.Let us start evaluating the expectation value of the overlap between a parent S G(a) (t) and the corresponding offspring at t + 1: where in the last line we exploited the fact that loci are independent.We can demonstrate that the expectation value of the overlap between a parent and the corresponding offspring at t + ∆t is where the average E ∆t is performed over the distribution P ∆t which generalizes (2.2).We prove this by induction: first we observe that q ∆t=1 = e −2µ is true, next we assume that q ∆t = e −2µ∆t is true for any ∆t and we check that this is sufficient to ensure that also q ∆t+1 = e −2µ(∆t+1) is true.In the following, in order to lighten the notation, we shall we set t = 0 without loss of generality and we shall drop the superscript labelling the individual without any loss of information: the individual we are referring to is a or its ancestor at the generation specified by time dependence of S.
Let use observe that q ∆t can be written as and, analogously, q ∆t+1 can be written as By the law of total probability we can write Recalling that we reach (2.17) By direct substitution of the last equation into (2.14)we get and, recalling the definition of q ∆t given in (2.13), the last equation gets In figure 2, the exponential decay of the overlap between the ancestor and its offsprings as a function of time is shown along with the related family tree.

A new class of ultrametric identities
As stressed in Sec.2.1, in the OPM the genome overlap is non self-averaging and q 2 − q 2 = 0.
However, following spin-glass theory, one may wonder whether other kinds of relation hold among overlap momenta.As recalled in Sec. 2, in the SK and many variations of its, the Ghirlanda-Guerra and Upper panel: number of individuals generated by the ancestor as a function of the generation; in this particular evolution, all the individuals present at time t = 13 turn out to stem from the same common ancestor and other branches that have not survived are omitted in this plot.The colormap highlights the overlap between the various individuals and the ancestor.Lower panel: time decay of the overlap between the ancestor and its offsprings; the numerical estimate (solid line) is consistent with the theoretical estimate e −2µt (dashed line); the shadow represents three standard deviations evaluated by repeating the process 100 times.
Aizenman-Contucci identities are preserved even under replica-symmetry-breaking. Here, to inspect the validity of these identities we study numerically the following quantities ε GG = q 4  12 − 2 (q 12 q 13 ) 2 + q 2 12 2 (2.20) ε AC = q 4 12 − 4 (q 12 q 13 ) 2 + 3 q 2 12 q 2 34 (2.21) where ε GG,AC,SA are interpreted as a measure of possible violation.Remarkably, since our inspection is only based on numerics, at finite population size and along a finite time span, in order to verify if non-null values of ε GG,AC,SA are intrinsic or, rather, stem from finite-size effects, we will perform a finite-size-scaling: if the extent of ε is non-decreasing by increasing the system size, we will have a signature for the breakdown of the related identity.

23
+ Perm(1, 2, 3) q 12 , q 13 , q 23 ∈ (0, 1] Errors measured for the various ultrametric identities as a function of the mean overlap for the OPM and for different values of M ; the process average is taken over a run of 10 2 × M generations.Note that, as M (and the time span, accordingly) grows, the error on self-averaging (εSA), on Ghirlanda-Guerra identities (εGG) and on Aizenman-Contucci polynomials (εAC) does not decrease, while the errors ε1, ε2 are robustly vanishing.As M is made larger and larger P (q, t) gets monomodal and peaked at q t = 1/2, thus highlighting a replica symmetric behavior of the specie evolution in the HPM.Errors measured for the various ultrametric identities as a function of the mean overlap for the HPM and for different values of M ; the process average is taken over a run of 10 2 × M generations Note that, as M grows, all the errors depicted tend to approach the horizontal axes as expected since the overlap is self-averaging in this model.
OPM model if the overlap between the parents G(a) and G(b) of two individuals, a and b, is q G(a)G(b) then the expectation value of the overlap of a and b is If N is infinite this becomes a deterministic rule for updating the overlap matrix.There is an equivalent rule for updating the overlap matrix for the HPM in the limit N → ∞.The pair of spins S a i S b i is inherited from one of the four combinations of parents of the two individuals with equal probability, therefore with q aa = 1 always.It can be proven that the variance of q t vanishes in the limit M → ∞, thus q t is self-averaging in the HPM, in particular lim M →∞ q = λ 1+λ .Figure 4 shows results about the overlap distribution for long simulations of the HPM model with λ = 1 and, accordingly, lim M →∞ q = 1/2: the various rows show the overlap distribution P (q, t) sampled at different times and by inspecting its variance as a function of M it can be shown that it scales as 1/M hence it is expected to disappear for large population size M such that P (q, t) gets concentrated around q = 1/2, showing that selfaveraging of the overlap is respected hence proving that the model behaves in a replica symmetric manner.Consistently, in figure 5 we show that the errors on the ultrametric identities approaches zero as M is made larger and larger.

The Specie Formation Model (SFM)
The SFM introduced by Higgs and Derrida [31,32] is nothing but the two-parents model of Serva and Peliti with a threshold for mating q min .Specifically, it is defined in the same ways as the HPM earlier, except that the first parent G 1 (a) of individual a is chosen randomly from the previous generation whereas the second G 2 (a) is chosen only from those individuals having an overlap q G1(a)G2(a) greater  .Distribution of the overlap P (q, t) of the SFM for λ = 1.0, qmin = 0.65, M = 2000.Note that, at difference w.r.t. the previous replica-symmetric HPM, here there actually are new specie formation and extinction as the presence of several peaks in the P (q, t) evidences.Further, beyond the big one on the left (that is exponentially collapsing toward zero as time goes on), the erratic presence of these small peaks confirms that the SFM gives rise to an evolutive scenario strongly resembling the paining by replica-symmetry-breaking of the low temperature spin glasses.
than a cutoff value q min6 .In the absence of a cutoff, thus in the case of HPM, there is a natural mean value of the overlap q = λ/(1 + λ), in such a way that, if the SFM q min > λ/(1 + λ), the system is highly perturbed by the introduction of the threshold and it can never reach its natural equilibrium state: π άντ α ρε ι as in the low temperature regime of spin glasses.A corroboration of this picture appears in figure 6 where we show the distribution of the overlap P (q, t) at different times: the peaks appearing above the threshold q min > 0.65 correspond to the overlaps of the new species that have formed (and their disappearance to their extinction) while the large peak below the threshold exponentially collapses toward zero (since such values of the overlap are lower than q min no interbreeding is possibile between them and, consequently, the peak must be vanishing).

Numerical inspection of ultrametric constraint's violation in the SFM
We now study numerically the validity of all the ultrametric identities as well as of the self-averaging, by measuring the related errors ε.Specifically, we set λ = 1 in such a way that the expected value in the HPM is q = 1 2 and we vary q min ∈ [0, 1].We simulate the evolution over a population of size M and a time span 10 2 × M , and we collect data for ε 1 , ε 2 , ε GG , ε SA , ε AC that are plotted in Figure 7 versus q min .As expected, when q min < 1 2 , the threshold does not involve significant effects At the numerical level, the new set of identities keep holding also for the SFM, while all the other constraints seem to be violated.
with respect to the HPM and a replica-symmetric scenario is recovered with errors ε close to zero.Conversely, the region min > 1 2 is non-trivial and there emerge differences between the errors.As for the classical identities for the variance, the related errors (ε SA,GG,AC ) are non-vanishing, and their values grows with q min without any robust trend with respect to the size M ; as for the new identities, the related errors (ε 1,2 ) remain close to zero.

Numerical inspection of ultrametric constraint's violation in biological and artificial human genomes
In this Section we look for any evidence of the relations discussed before in actual genomic datasets.
To this aim we tested all the ultrametric identities and the self-averaging property on the biological genome collected by The 1000 Genomes Project Consortium [73] and on artificial genomes generated by two neural neworks (a Generativa Adversarial network and a Restricted Boltzmann machine) [74]; notably, the latter have already proved to reproduce correctly allele frequencies, linkage disquilibrium, pairwise haplotype distances and population structure.
As in [74,75], we consider a population of M = 2504 individuals (∼ 5 • 10 3 haplotypes) spanning N = 805 Single Nucleotide Polymorphism (SNPs) 7 from [73], which reflect a high proportion of the population structure present in the whole dataset [74,76].The various fluctuation relations are evaluated by splitting the dataset of M individuals into √ M groups: the population average • is carried out by identifying distinct replica indices with distinct individuals within the same group.In contrast, the process average • is carried out by performing an arithmetic mean over the different evaluations of each group.Regarding the finite size scaling in N it has been carried out by selecting a common subset of size N of the genome variable (−1, +1) for each individual. 7Single nucleotide polymorphisms are the most common type of genetic variation among people: each SNP represents a difference in a single nucleotide (e.g., a SNP may replace the nucleotide cytosine C with the nucleotide thymine T in a certain stretch of DNA). .Finite-size-scaling on genome length testing the various ultrametric identities for the real human genome (HUMAN, taken from [73]) and two artificial (taken from [74]) generated rispectively with a Generative Adversarial Network (GAN) and a Restricted Botlzmann machine (RBM).In these settings still our new family of identities seems to be respected, while mild violations of the others persist (despite their violation is minimal, i.e. εAC ∼ εGG ∼ O(10 −3 )).
Results are collected in Figure 8 and show that also in these structured datasets the new set of identities is better respected w.r.t. the classical ones (despite the violation of the latter is minimal in these settings).

Conclusions
The non-self-averaging behavior of the order parameter in spin-glass models is a peculiar, intensivelystudied feature which can be described in terms of a set of relations connecting the fluctuations of the order parameter.Driven by strong analogies between Natural Evolution and statistical mechanics of disordered systems, we investigated the validity of these ultrametric relations and the existence of other kinds of relations focusing on three stochastic models of evolving populations in flat landscapes.These models are the fairly standard ones in the Literature on Natural Evolution without selective pressure, that is (i) the One Parent Model (OPM) -where reproduction is asexual and the distribution of genetic distances lacks self-averaging -(ii) the Homogeneous Population Model (HPM) -where reproduction is sexual and with random mating (i.e., regardless the genetic distance) and thus results in a replica symmetric picture where the genetic distance between pairs of individuals has vanishing fluctuations in the thermodynamic lmiit (hence the adjective homogeneous in the model's name) -and (iii) the Specie Formation model (SFM) where reproduction is still sexual but with a threshold on the required similarity among mating genomes before duplication.The latter represents the most interesting case as it is the closest to biology and it spontaneously gives rise to a complex dynamics reaching a steady state with new species that are continuously and spontaneously generated and suppressed during the evolutionary process.Further, while in the first model the evolutionary tree is assumed and it is related to single descendants from common ancestors, in the latter the evolutionary tree emerges and it works at the level of species rather than single elements.Focusing on fluctuations in the genetic distances between individuals, as far as the OPM is concerned, after checking that self-averaging is absent in this model statistics, we have shown by a finite-sizescaling argument that nor the Ghirlanda-Guerra identities neither the Aizenman-Contucci polynomials are respected.On the other hand, we were able to prove a new class of identities that are indeed respected also in our finite-size numerical checks.For HPM, as it is replica symmetric, all these constraints are equally guarantee to converge to zero in the asymptotic limit but they do not convey actual information.Then, dealing with the SFM, our identities continue to hold, being only mildly affected by finite-size effects.
As a final test we focused on human gemomes: we considered the real biological dataset taken from the 1000 genome project consortium and two synthetic datasets on artificial genomes generated by neural networks and, for all these three cases the scenario depicted by the SFM seems to be here as well: the new set of ultrametric identities is sharply respected while mild violations affect both Ghirlanda-Guerra identities as well as Aizenman-Contucci polynomials.
As models of Natural Evolution under selective pressure, namely Darwinian Evolution, are known to display standard Ghirlanda-Guerra fluctuations (from the Franz-Peliti-Sellitto model, i.e., the P → ∞ limit of the Kauffman-Levin P-spin-glass model, or the REM in a spin-glass jargon to the equaltrap model analyzed by Leuthäusser and Tarazona the Hopfield model in spin glass jargon), a similar analysis to the present one should be conducted also in these settings to better infer the role covered by Natural Selection (beyond random mutation) in shaping evolutionary taxonomies because, at present, these new findings seem to be in better agreement with Kimura Theory of Neutral Evolution: we plan to report soon on this topic.

. 6 )Figure 1 .
Figure1.Numerical estimate for the overlap probability density function P (q) for a population of M = 50 individuals averaged over 10 5 generations.Several genome sizes are tested and depicted in different colors as explained in the legend, while the dashed curve corresponds to the theoretical probability distribution given by (2.6).As expected, if λ < 1 mutations are likely and the overlap concentrates on values q ≈ 0, if λ > 1 mutations are rare and the overlap concentrates on values q ≈ 1, while if λ = 1 the probability distribution becomes uniform.
Figure2.Upper panel: number of individuals generated by the ancestor as a function of the generation; in this particular evolution, all the individuals present at time t = 13 turn out to stem from the same common ancestor and other branches that have not survived are omitted in this plot.The colormap highlights the overlap between the various individuals and the ancestor.Lower panel: time decay of the overlap between the ancestor and its offsprings; the numerical estimate (solid line) is consistent with the theoretical estimate e −2µt (dashed line); the shadow represents three standard deviations evaluated by repeating the process 100 times.

Figure 3 .
Figure 3. Errors measured for the various ultrametric identities as a function of the mean overlap for the OPM and for different values of M ; the process average is taken over a run of 10 2 × M generations.Note that, as M (and the time span, accordingly) grows, the error on self-averaging (εSA), on Ghirlanda-Guerra identities (εGG) and on Aizenman-Contucci polynomials (εAC) does not decrease, while the errors ε1, ε2 are robustly vanishing.

Figure 4 .
Figure 4. Distribution of the overlap P (q, t) of the HPM at λ = 1.0 and for different values of M as shown in the legend.As M is made larger and larger P (q, t) gets monomodal and peaked at q t = 1/2, thus highlighting a replica symmetric behavior of the specie evolution in the HPM.

Figure 5 .
Figure 5. Errors measured for the various ultrametric identities as a function of the mean overlap for the HPM and for different values of M ; the process average is taken over a run of 10 2 × M generations Note that, as M grows, all the errors depicted tend to approach the horizontal axes as expected since the overlap is self-averaging in this model.

Figure 6
Figure 6.Distribution of the overlap P (q, t) of the SFM for λ = 1.0, qmin = 0.65, M = 2000.Note that, at difference w.r.t. the previous replica-symmetric HPM, here there actually are new specie formation and extinction as the presence of several peaks in the P (q, t) evidences.Further, beyond the big one on the left (that is exponentially collapsing toward zero as time goes on), the erratic presence of these small peaks confirms that the SFM gives rise to an evolutive scenario strongly resembling the paining by replica-symmetry-breaking of the low temperature spin glasses.

Figure 7 .
Figure 7. Errors measured for the various ultrametric identities as a function of the mating threshold qmin for the SFM and for different values of M ; the process average is taken over a run of 10 2 × M generations.At the numerical level, the new set of identities keep holding also for the SFM, while all the other constraints seem to be violated.

Figure 8
Figure 8. Finite-size-scaling on genome length testing the various ultrametric identities for the real human genome (HUMAN, taken from[73]) and two artificial (taken from[74]) generated rispectively with a Generative Adversarial Network (GAN) and a Restricted Botlzmann machine (RBM).In these settings still our new family of identities seems to be respected, while mild violations of the others persist (despite their violation is minimal, i.e. εAC ∼ εGG ∼ O(10 −3 )).