New probability distribution describing emergence in state space

We revisit the pairing model of state spaces with new emergent states introduced in J. Phys. A: Math. Theor. 51 375002, 2018. We facilitate our analysis by introducing a simplified pairing model consisting of balls that are able to form pairs but have no internal structure. For both the simplified and the original model we compute exactly the probability distribution for observing a state with n_p pairs. We show that this distribution satisfies a large deviation principle with speed n ln(n). We present closed-form expressions for a variety of statistical quantities, including moments and marginal distributions.


Introduction
The focal point of complexity science is the study of the emergent structures generated by the interactions between the components of a given system. Depending on the nature of these interactions, the number of available states W (N ) accessible to the fully interacting system will exhibit different functional dependencies on the number of components N .
In statistical mechanics the state space of the N component system is given by the Cartesian product of the state spaces of the individual components. For this reason W(N) will be exponential in N; asymptotically it grows as W(N) ∼ O(k^N) for a real, positive constant k, as in the Ising model where W(N) = 2^N.
If the interdependence between the components is able to freeze out some of the Cartesian product states and make these states inaccessible, W(N) may grow slower than exponentially; some examples of this case are described in [1].
In contrast, if new collective states become possible as a result of the intercomponent interaction, W(N) will grow faster than exponentially. One may, e.g., think of the formation of hydrogen molecules out of hydrogen atoms. Hydrogen molecules exhibit properties that are not intrinsic to hydrogen atoms. Taking into account the emergent properties of hydrogen molecules as new collective states, the state space of a compound system (atoms and molecules) will grow faster than an exponential function.
No all-inclusive definition of emergence exists. Here we focus on situations where interaction generates entirely new states, different from the Cartesian combinations of single particle states. From this perspective the hydrogen molecule is a new emergent state. To this end we study the statistics of a minimal model of such cases, namely the pairing model introduced in [2]. The pairing model has been studied in the context of relating the N dependence of W(N) to generalised entropies in a number of publications, see [3,4,5,6,7,8,9,10,11]. One may think of the components of the pairing model as coins. Each coin can either show head or tail. In addition to the single component states, head and tail, two coins can form a paired state.
As a result W(N) grows faster than exponentially, and the dependence on N is controlled by a recursive equation, which we use to compute statistics of the paired and unpaired components. This allows us to present large deviation expressions with a speed different from N, namely of the form P_N(x) ≍ exp[−N ln N I(x)].
From now on, to reduce clutter, we will use lower case n to denote the system size. We will introduce two probability distributions, P_n(n_p) and P_n(n_p, n_h), where the number of coins showing head is denoted by n_h and the number of pairs by n_p. We point out that most of the properties of these distributions can be computed in closed form, thereby facilitating statistical modelling and further analytical investigation.
Moreover, we will show that both distributions P_n(n_p) and P_n(n_p, n_h) satisfy a large deviation property. We recall that a probability distribution satisfies a Large Deviation Property (LDP) [12] if the limit lim_{n→∞} −(1/a_n) ln P_n(X_n) exists for a random variable X_n, a sequence of distributions P_n, and a sequence of positive numbers a_n, called the speed, for n ∈ {1, 2, ...}, that tends to ∞.
The large deviation speeds encountered in standard statistical mechanics, corresponding to state space sizes W(n) exponential in n, are usually linear, a_n = n. However, non-linear speeds have been studied in the large deviation literature; see for example [13,14,15].
The n ln n speed becomes particularly interesting when one realises that a system of pairing coins corresponds to a graph where each node has either degree zero or one. This is the simplest member of the more general class of non-trivial sparse networks, for which the form of the LDP still remains open, though considerable progress has been achieved in recent years [16,17,18] (e.g. see [17] for an introduction).
In the sparse graph case, quadratic speed, a_n = n^2, is needed to ensure the existence of non-trivial large deviation bounds [17]. However, these bounds do not fully solve the ultra-sparse network problem. For instance, for an Erdős-Rényi graph G(n, p) in which p → 0 as n → ∞, so that sampled random networks are ultra-sparse, the problem has not been fully resolved. Because of the indicated resemblance between the pairing model and ultra-sparse networks, the results of the present paper are relevant to the LDP for sparse graphs and, in particular, suggest that the speed n ln n may be of importance to such networks.
To find the large deviation distributions of P_n(n_p) and P_n(n_p, n_h) for a random configuration of size n, let us denote the ratio of the number of paired elements by

m_n = (1/n) Σ_{i=1}^n δ_{X_i,p},

where δ_{X_i,p} is the Kronecker delta, equal to one if X_i is in a pair state, and the ratio of the number of head states by

s_n = (1/n) Σ_{i=1}^n δ_{X_i,1},

where δ_{X_i,1} is equal to one for head states. For m_n and s_n, we shall derive the large deviation probabilities P_n(m_n) and P_n(m_n, s_n) with speed n ln n in the forthcoming sections.
The statistics also contain quantities described by an LDP with the usual linear speed n. This happens when the relative abundance of non-paired to paired elements is kept fixed while n increases.
The rest of the paper is organised as follows. The next section describes the pairing model, presents the recursive relation for the state space volume W(n), and discusses a number of random variables and their probability distributions. Sec. 3 defines a family of binomial-like distributions to formalise our discussion of the large deviation expressions presented in Sec. 4. Sec. 5 presents results for the case where the system size grows in relation to the number of interactions. Sec. 6 derives moments of the distributions in closed form. This is followed by a discussion of expressions for marginal distributions in Sec. 7. We present the summary in the last section.
Finally, for completeness, we include a number of tables in Appendix A listing the distributions and their properties. See Tables A1, A2, A3 and A4.

The model and its recursive state space volume
Think of the components of the pairing model as coins. Each coin can either show head or tail, or form a paired state with another coin. The state space volume of n coins is determined by the following recursive rule [2]

W(n) = 2 W(n-1) + (n-1) W(n-2),  W(0) = 1, W(1) = 2. (4)

The asymptotically leading behaviour of the solution to this equation is given by [2]

W(n) ∼ exp[(n/2) ln n + O(n)],

i.e. faster than exponential. Despite the model's simplicity, to the best of our knowledge, the probability distributions describing the configurations in terms of their number of heads, tails and paired coins have not been reported in the literature.
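The recursion can be cross-checked against the direct combinatorial count: choose 2i coins to pair, match them in (2i-1)!! ways, and give each of the remaining n-2i coins one of two faces. A minimal Python sketch (the initial conditions W(0) = 1, W(1) = 2 are our reading of the model):

```python
from math import comb

def df(k):
    """Double factorial, with the convention (-1)!! = 1."""
    return 1 if k <= 0 else k * df(k - 2)

def W_recursive(n):
    """C-Model state space volume via W(n) = 2 W(n-1) + (n-1) W(n-2)."""
    W = [1, 2]  # W(0) = 1 (empty system), W(1) = 2 (head or tail)
    for m in range(2, n + 1):
        W.append(2 * W[m - 1] + (m - 1) * W[m - 2])
    return W[n]

def W_direct(n):
    """Direct count: 2i paired coins, (2i-1)!! pairings, 2^(n-2i) face choices."""
    return sum(comb(n, 2 * i) * df(2 * i - 1) * 2 ** (n - 2 * i)
               for i in range(n // 2 + 1))

for n in range(11):
    assert W_recursive(n) == W_direct(n)
print([W_recursive(n) for n in range(6)])  # [1, 2, 5, 14, 43, 142]
```

The agreement of the two counts for all small n is a quick consistency check on the recursion.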
It turns out to be instructive to introduce a variation of the model that is even simpler than the original pairing model. Let us refer to the original version as the Coins Model (C-Model): it consists of n coins, each of which can be in one of two states, head or tail, or in a paired state together with another coin. The C-Model is introduced and discussed in [2].
Balls Model (B-Model): consisting of n structureless objects, we will call them balls. Each ball can occupy one single particle state only -stand-alone state-, but any two balls are able to combine and form a paired state. The B-Model is introduced for the first time in this paper.
It is illuminating to analyse the pairing in a way similar to the usual binomial analysis of Bernoulli variables. Where the binomial distribution corresponds to binary random variables, we will look at binomial-like distributions suitable for the analysis of random variables that, besides the binary single component states, are able to form pairs. In the next section we carry out this analysis, starting with the simpler B-Model before we turn to the C-Model.

The B-Model
Let S_n denote the set of all possible configurations of the B-Model. The state space volume of n balls is determined by the rule

W(n) = W(n-1) + (n-1) W(n-2),  W(0) = W(1) = 1.

The number of pairs, denoted by n_p, is a random variable on the ensemble of configurations of the B-Model, S_n. From the outset, we assume equal probability among configurations that consist of the same number of pairs. Denote a specific configuration with i pairs by c_i, i ∈ {0, 1, ..., ⌊n/2⌋}. Partitioning S_n by the number of pairs introduces the disjoint subsets

S_i := {c ∈ S_n : c has i pairs},  such that  S_n = ∪_i S_i.

In other words, S_i is the subset of configurations with i pairs. Next, let p_i denote the probability of the event set S_i, p_i := P(S_i). Moreover, since the p_i are probabilities, the normalization condition requires

Σ_{i=0}^{⌊n/2⌋} p_i = 1. (10)

The cardinality of the subset S_i is

|S_i| = C(n, 2i) (2i-1)!!, (13)

where C(n, 2i) is the binomial coefficient; note that the double factorial counts the number of distinguishable pairings. In the case of the uniform distribution, we must have p_i = |S_i| / W(n). In contrast, the probability of observing a specific configuration c_i ∈ S_i is defined by q_i := P(c_i). Using the assumption of equal probability among configurations with the same number of pairs, p_i must be equal to q_i times the cardinality of the subset S_i. Thus

p_i = q_i C(n, 2i) (2i-1)!!,

and consequently, the normalization condition (10) reads

Σ_{i=0}^{⌊n/2⌋} q_i C(n, 2i) (2i-1)!! = 1. (15)
Table 2. Another S_4 distribution.
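The cardinalities |S_i| = C(n, 2i)(2i-1)!! must sum to the B-Model state space volume. A quick numerical sketch, assuming the B-Model recursion W(n) = W(n-1) + (n-1)W(n-2) with W(0) = W(1) = 1 (these are the partial-pairing, or involution, numbers):

```python
from math import comb

def df(k):
    """Double factorial, with (-1)!! = 1."""
    return 1 if k <= 0 else k * df(k - 2)

def S_card(n, i):
    """|S_i| = C(n, 2i) (2i-1)!!: choose the 2i paired balls, then match them."""
    return comb(n, 2 * i) * df(2 * i - 1)

def W_B(n):
    """B-Model state space volume via W(n) = W(n-1) + (n-1) W(n-2)."""
    a, b = 1, 1  # W(0), W(1)
    for m in range(2, n + 1):
        a, b = b, b + (m - 1) * a
    return b

n = 8
assert sum(S_card(n, i) for i in range(n // 2 + 1)) == W_B(n)
print([W_B(m) for m in range(7)])  # [1, 1, 2, 4, 10, 26, 76]
```

For n = 4 this reproduces the subsets underlying Tables 1 and 2: |S_0| = 1, |S_1| = 6, |S_2| = 3, summing to W(4) = 10.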

The C-Model
The set S_n denotes the set of all possible configurations of the C-Model; for n coins, the cardinality of S_n is given recursively by (4). Recall that for the B-Model we assumed equal probability for the configurations in S_i. Similarly, for the C-Model, the number of pairs in each configuration partitions S_n into disjoint subsets. In addition, each configuration has (n - 2i) coins in a non-paired state, and n_h is the random variable that denotes the number of coins in the head state. Let j denote the number of head states in a specific configuration c_i. For the C-Model, we assume equal probability among configurations with the same number of pairs and head states. Consequently, to obtain the probability distribution for the C-Model when, for a single coin, the probability of observing a head state is ρ and that of a tail state is 1 - ρ, this new parameter must be included as well. Furthermore, we denote by S_ij the disjoint subsets of S_i with j coins in the head state (n_h = j), such that

S_ij := {c ∈ S_i : c has j coins in head state}, (17)

and S_i = ∪_j S_ij. Given that S_ij is a subset of S_i, we define the conditional probability of the event S_ij given S_i, i.e. the probability of observing configurations with j heads given i pairs, as p_{j|i} := P(S_ij | S_i). There are C(n-2i, j) distinct configurations for selecting j heads among the n - 2i non-paired coins, and therefore

p_{j|i} = C(n-2i, j) ρ^j (1-ρ)^{n-2i-j}.

Note that the normalization condition for the conditional distribution, given i, satisfies

Σ_{j=0}^{n-2i} p_{j|i} = 1.

Finally, the probability of the event S_ij, i.e. of observing configurations with i pairs and j heads, is p_ij := P(S_ij), and according to the probability chain rule it can be written as

p_ij = p_{j|i} p_i.

Hence, the probability of observing a specific configuration c_ij ∈ S_ij with i pairs and j heads must be

q_ij = p_ij / |S_ij| = q_i ρ^j (1-ρ)^{n-2i-j}.

This argument is schematically represented in figure (2) as a probability tree.
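The chain-rule construction can be checked numerically: the joint weights must normalise to one, and marginalising over the head count must recover the B-Model pair distribution. A sketch, assuming the joint form P_n(n_p = i, n_h = j) ∝ C(n,2i)(2i-1)!! C(n-2i,j) ρ^j (1-ρ)^{n-2i-j} r^{⌊n/2⌋-i} with normalisation constant c_n(r):

```python
from math import comb, isclose

def df(k):
    return 1 if k <= 0 else k * df(k - 2)

def c_norm(n, r):
    """c_n(r) = sum_i C(n,2i) (2i-1)!! r^(floor(n/2) - i)."""
    m = n // 2
    return sum(comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1))

def P_joint(n, i, j, r, rho):
    """C-Model: P_n(n_p = i, n_h = j) = p_i * p_{j|i} by the chain rule."""
    m = n // 2
    p_i = comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) / c_norm(n, r)
    p_j_given_i = comb(n - 2 * i, j) * rho ** j * (1 - rho) ** (n - 2 * i - j)
    return p_i * p_j_given_i

n, r, rho = 7, 1.5, 0.3
total = sum(P_joint(n, i, j, r, rho)
            for i in range(n // 2 + 1) for j in range(n - 2 * i + 1))
assert isclose(total, 1.0)
# marginalising over n_h recovers the B-Model pair distribution:
for i in range(n // 2 + 1):
    marg = sum(P_joint(n, i, j, r, rho) for j in range(n - 2 * i + 1))
    b = comb(n, 2 * i) * df(2 * i - 1) * r ** (n // 2 - i) / c_norm(n, r)
    assert isclose(marg, b)
print("normalisation and marginal checks pass")
```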

Closed form of p_i and p_ij
We shall see that, aside from the head state parameter ρ, a single parameter is enough to describe the pairing probabilities q_i. Let us for a moment consider the B-Model, i.e. balls in stand-alone or paired states. First consider just two balls: as depicted in figure (3), there are two configurations in the set S_2. The normalisation condition in (15) reduces to

q_0 + q_1 = 1,

and for the ratio we define

r := q_0 / q_1.

Since r denotes the ratio between the probabilities for not making a pair and for making one, r < 1 favours pairs over stand-alone states, r = 1 corresponds to equal probability, and r > 1 favours stand-alone states over pairs. For n = 3, the number of possible pairs is the same as for two balls, in other words n_p ∈ {0, 1}. Indeed, the difference between n = 3 and n = 2 lies in the degenerate configurations for n_p = 1: there are three distinguishable configurations, each with equal probability.
A similar relationship holds for all consecutive even and odd numbers, since the range of pair numbers n_p ∈ {0, 1, ..., n} is the same for system sizes 2n and 2n + 1; in fact ⌊2n/2⌋ = ⌊(2n+1)/2⌋ = n. To stress the dependence of q_i on the number of balls n, we replace q_i by q_i^(n). Since q_i^(n) is the probability for forming i specific pairs amongst the n balls, we have q_i^(n) ∝ r^{⌊n/2⌋-i}. We normalise over the total number W(n) of configurations by introducing a normalisation constant defined as

c_n(r) := Σ_{i=0}^{⌊n/2⌋} C(n, 2i) (2i-1)!! r^{⌊n/2⌋-i}, (30)

which allows us to write the probability for a specific state with i pairs amongst n balls as

q_i^(n) = r^{⌊n/2⌋-i} / c_n(r). (31)

For the B-Model we arrive at the following expression for the probability distribution

P_n(n_p = i) = C(n, 2i) (2i-1)!! r^{⌊n/2⌋-i} / c_n(r), (32)

whereas for the C-Model we obtain a similar result that includes the head and tail states,

P_n(n_p = i, n_h = j) = C(n, 2i) (2i-1)!! C(n-2i, j) ρ^j (1-ρ)^{n-2i-j} r^{⌊n/2⌋-i} / c_n(r). (33)

3.4. Finding the normalisation constant c_n(r)

We now derive some properties of the normalisation constant c_n(r) that was introduced in (30). In Appendix D, we obtain recursive relations for the even and odd normalization constants,

c_{2n}(r) = r c_{2n-1}(r) + (2n-1) c_{2n-2}(r),
c_{2n+1}(r) = c_{2n}(r) + 2n c_{2n-1}(r). (34)

The result for even numbers resembles the recursive rule (4) for the state space volume, with a factor r in place of 2. See Table 3 for the first eight iterations of c_n(r). These are very similar to Hermite polynomials as functions of r, though all their coefficients are positive; Appendix B presents the relation between the Hermite polynomials and the normalization constant.

Table 3. Normalization constant for the first eight values of n.

n    c_n(r)
2    r + 1
3    r + 3
4    r^2 + 6r + 3
5    r^2 + 10r + 15
6    r^3 + 15r^2 + 45r + 15
7    r^3 + 21r^2 + 105r + 105
8    r^4 + 28r^3 + 210r^2 + 420r + 105
9    r^4 + 36r^3 + 378r^2 + 1260r + 945

We must stress that the recursive relation (34) is satisfied by the normalisation constant for arbitrary non-negative r. This result is more general than the state space volume recursion: the state space structure is encoded not only in a single recursive rule but also in the normalisation constant of the probability distribution, parametrised by r and n.
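The recursive relations for c_n(r) can be checked against the direct sum in (30) and against the rows of Table 3. A minimal sketch (the value of r is arbitrary):

```python
from math import comb, isclose

def df(k):
    return 1 if k <= 0 else k * df(k - 2)

def c_direct(n, r):
    """c_n(r) computed directly from its defining sum."""
    m = n // 2
    return sum(comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1))

r = 2.7  # arbitrary non-negative r
for n in range(1, 10):
    c2n = c_direct(2 * n, r)
    assert isclose(c2n, r * c_direct(2 * n - 1, r) + (2 * n - 1) * c_direct(2 * n - 2, r))
    c2n1 = c_direct(2 * n + 1, r)
    assert isclose(c2n1, c2n + 2 * n * c_direct(2 * n - 1, r))
# spot-check two rows of Table 3 at r = 2.7:
assert isclose(c_direct(4, r), r**2 + 6 * r + 3)
assert isclose(c_direct(9, r), r**4 + 36 * r**3 + 378 * r**2 + 1260 * r + 945)
print("recursions match the direct sum")
```

At r = 1 the sum collapses to the uniform count, c_n(1) = W(n) of the B-Model, which is another useful sanity check.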
In a statistical mechanics context, the normalisation constant c 2n (r), is known as the partition function. Hence Eq. (34) describes the composition law of the partition function in the state space [2,3,9].
Also, from the definition of c_n(r) in Eq. (30), the asymptotic leading terms for constant r (and, more generally, r ≪ n) are

ln c_n(r) = (n/2) ln n - n/2 + √(rn) + O(ln n). (35)

The above asymptotic identity is derived directly from the result reported in [2].
Recall that r ∈ [0, ∞); when r is of the same order as n or greater, we obtain a different asymptotic leading term for c_n(r), expressed through a function g(ε) of the scaled variable ε = r/(2n). See Appendix J for the details.

Large deviation property
The mathematical definition of the large deviation property (LDP) is presented in Appendix C. It should suffice here to recall that a sequence of probability measures {P_n; n = 1, 2, ...} is said to have a large deviation property if there exist a sequence of positive numbers a_n (the speed), for n ∈ {1, 2, ...}, that tends to ∞, and a function I(x) (the rate function) such that the limit

lim_{n→∞} -(1/a_n) ln P_n(X_n = x) = I(x)

exists. Denote the state of a ball in the B-Model at site l, or of a coin in the C-Model, by the random variable X_l. Then, for a particular configuration, the random variables n_p (the number of pairs) and n_h (the number of heads) are calculated as

n_p = (1/2) Σ_{l=1}^n δ_{X_l,p},  n_h = Σ_{l=1}^n δ_{X_l,1},

where δ_{X_l,p} is the Kronecker delta, equal to one if X_l is in a pair state, whereas δ_{X_l,1} is equal to one for head states. In addition, the large deviation random variables of interest are defined as

m_n = 2 n_p / n  and  s_n = n_h / n.

In the continuum limit, n → ∞, both variables take values in R [19].
Appendix E finds the logarithm of P_n(m_n) in (E.3) as

ln P_n(m_n) = -(n ln n) (1 - m_n)/2 + n H̃_r(m_n) + o(n).

It is interesting to see the resemblance between the O(n) terms and the Shannon entropy [20]. For 0 ≤ p ≤ 1, the Shannon entropy of a binary variable with probability p is

H(p) = -p ln p - (1-p) ln(1-p),

and along the same lines we define H̃_r(m_n), which collects the entropy-like O(n) terms together with the r-dependent contributions, to rewrite the last result as

P_n(m_n) ≍ exp( -n ln n [ (1 - m_n)/2 - H̃_r(m_n)/ln n ] ).

Consequently, the large deviation limit is

lim_{n→∞} -(1/(n ln n)) ln P_n(m_n) = (1 - m_n)/2.

Notice that the speed is a_n = n ln n. Hence, the large deviation probability of the B-Model obtains as

P_n(m_n) ≍ exp[ -n ln n I(m_n) ]

for the rate function

I(m_n) = (1 - m_n)/2.

When n ln n is not appreciably larger than n, we need to include the correction due to the O(n) terms, i.e. retain the O(1/ln n) term above. Although the large deviation rate function is (1 - m_n)/2, for practical purposes this correction must be included. For instance, suppose the number of elements is of the order of Avogadro's number, n = 10^23. Then n ln n = 53 × 10^23 is only 53 times larger than n, and H̃_r(m_n)/ln n can be comparable in magnitude to (1 - m_n)/2.
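The slow O(1/ln n) convergence of the finite-size rate towards (1 - m_n)/2 can be seen numerically from the exact distribution, computed in log space to avoid overflow. A sketch, assuming P_n(n_p = i) ∝ C(n,2i)(2i-1)!! r^{⌊n/2⌋-i}:

```python
from math import lgamma, log, exp

def log_weight(n, i, r):
    """ln of C(n,2i)(2i-1)!! r^(floor(n/2)-i), the unnormalised weight of n_p = i."""
    return (lgamma(n + 1) - lgamma(n - 2 * i + 1) - i * log(2) - lgamma(i + 1)
            + (n // 2 - i) * log(r))

def log_P(n, i, r):
    """ln P_n(n_p = i), normalised via a log-sum-exp over all pair numbers."""
    w = [log_weight(n, k, r) for k in range(n // 2 + 1)]
    mx = max(w)
    return log_weight(n, i, r) - (mx + log(sum(exp(x - mx) for x in w)))

r, m_target = 1.0, 0.5                       # look at m_n = 2 n_p / n = 0.5
for n in (100, 1000, 10000):
    i = int(m_target * n / 2)
    rate = -log_P(n, i, r) / (n * log(n))
    print(n, round(rate, 4))                 # creeps towards (1 - 0.5)/2 = 0.25
```

Even at n = 10^4 the empirical rate is visibly below 0.25, illustrating why the O(1/ln n) correction matters in practice.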
For the C-Model, Appendix E calculates (E.5) as

ln P_n(m_n, s_n) = -(n ln n) (1 - m_n)/2 - n I_ρ(s_n) + o(n),

where ρ is the probability of observing a head state, and I_ρ(s_n) is the Bernoulli rate function [19,21],

I_ρ(x) = x ln(x/ρ) + (1 - x) ln((1 - x)/(1 - ρ)).

Furthermore, the large deviation limit is

lim_{n→∞} -(1/(n ln n)) ln P_n(m_n, s_n) = (1 - m_n)/2,

and the rate function is again I(m_n, s_n) = (1 - m_n)/2.

5. The probability distributions for the limiting case of constant r/(2n)

For both P_n(m_n) and P_n(m_n, s_n), the minimum of the rate function is at m_n = 1, independent of r and ρ. In other words, in the thermodynamic limit (n → ∞) all the balls or coins are in a paired state. Recall that the parameter r is the relative abundance of non-pairs to pairs in both the B-Model and the C-Model; seemingly, it does not affect the final distribution of states in the thermodynamic limit.
To explain this result, we refer to the degeneracy of the number of pairs in Sec. 3. In the thermodynamic limit, the term C(n, 2n_p) (2n_p - 1)!! grows fastest when n_p = n/2. So, using the partitions of the configuration space introduced in Sec. 3, for any finite value of r the cardinality of the subset S_{i=n/2} is much larger than that of any other subset, and their volumes are negligible in comparison to this single subset. Consequently, in the limit, the probability of observing configurations in S_{i=n/2} approaches one.
As we shall see, in one special setting the n ln n speed is replaced by the usual linear speed n. Let us consider r to be proportional to n; recall r ∈ [0, ∞), with r > 1 assigning higher probability to stand-alone states. We will examine the large deviation probabilities of the same random variables when

ε := r/(2n)

is kept constant as n → ∞. This idea is similar to retrieving the Poisson distribution from the binomial distribution when ρn = λ is kept constant. First we need to find the asymptotic leading term of ln c_n(2εn), i.e. the normalization constant with r replaced by 2εn; note that (35) was written for constant r and is not applicable to this case. Appendix J finds the asymptotic leading term of ln c_n(2εn): its O(n ln n) part is (n/2) ln n, with O(n) corrections expressed through the function g(ε) defined there, and with different expressions for ε ≤ e and ε > e. Next, using ln c_n(2εn) and replacing r by 2εn in ln P_n(m_n) in (42), the terms of order O(n ln n) cancel each other, and for the B-Model one obtains

P_n(m_n) ≍ exp[ -n I_1(m_n; ε) ].

Similarly, for the C-Model and P_n(m_n, s_n), (51) yields

P_n(m_n, s_n) ≍ exp[ -n I_2(m_n, s_n; ε) ],

with the explicit forms of the rate functions I_1 and I_2 following from the calculation in Appendix J. Notice that the logarithms of these two probability distributions are of order O(n).
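The crossover to the linear-speed regime can be illustrated numerically: with r = 2εn the mean pairing fraction stabilises at a value below one, whereas for fixed r it drifts towards one. A sketch using the exact first moment ⟨m_n⟩ = (n-1) c_{n-2}(r)/c_n(r), computed in log space (the tolerances and sample sizes below are ours):

```python
from math import lgamma, log, exp

def log_c(n, r):
    """ln c_n(r), with c_n(r) = sum_i C(n,2i)(2i-1)!! r^(floor(n/2)-i)."""
    m = n // 2
    w = [lgamma(n + 1) - lgamma(n - 2 * i + 1) - i * log(2) - lgamma(i + 1)
         + (m - i) * log(r) for i in range(m + 1)]
    mx = max(w)
    return mx + log(sum(exp(x - mx) for x in w))

def mean_m(n, r):
    """Exact first moment <m_n> = (n-1) c_{n-2}(r) / c_n(r)."""
    return (n - 1) * exp(log_c(n - 2, r) - log_c(n, r))

eps = 0.5
print("r = 2*eps*n :", [round(mean_m(n, 2 * eps * n), 3) for n in (100, 1000, 10000)])
print("r fixed = 1 :", [round(mean_m(n, 1.0), 3) for n in (100, 1000, 10000)])
# the first row stabilises below one (linear-speed regime),
# while the second drifts towards one (n ln n regime)
```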
6. The closed form of the P_n(n_p) moments

As we shall see, the moments of P_n(n_p), i.e. of the B-Model, can be expressed in closed form. Although the moments of the C-Model share the same feature, we do not report them here.

First moment
In Appendix F, in (F.8) and (F.9), we find the first moment of P_n(n_p) for even and odd system sizes as

⟨n_p⟩_{2n} = n (2n-1) c_{2n-2}(r) / c_{2n}(r),  ⟨n_p⟩_{2n+1} = n (2n+1) c_{2n-1}(r) / c_{2n+1}(r).

Recall that c_n(r) is the Hermite-like polynomial with positive coefficients, evaluated at r; indeed, the first moment is proportional to the ratio of two such polynomials. For m_n = 2 n_p / n, equation (F.11) finds

⟨m_n⟩_r = (n-1) c_{n-2}(r) / c_n(r).
For constant r and r ≪ n, the asymptotic expansion of ⟨m_n⟩_r follows from the ratio of the asymptotic leading terms of c_{n-2}(r) and c_n(r). Using (35), we get

c_{n-2}(r)/c_n(r) ∼ exp[-√(r/n)]/(n-1),

and therefore

⟨m_n⟩_r ∼ exp[-√(r/n)].
Notice that ⟨m_n⟩_r → 1 as n → ∞. This is anticipated, as the minimum of the large deviation rate function in (48) is at m_n = 1, independent of r.
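The closed form (F.11) and its asymptotic estimate can be checked exactly with rational arithmetic. A sketch, assuming the distribution P_n(n_p = i) ∝ C(n,2i)(2i-1)!! r^{⌊n/2⌋-i}:

```python
from fractions import Fraction
from math import comb, exp, sqrt

def df(k):
    return 1 if k <= 0 else k * df(k - 2)

def c_n(n, r):
    m = n // 2
    return sum(comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1))

def mean_m_direct(n, r):
    """<m_n> = sum_i (2i/n) P_n(n_p = i), taken directly from the distribution."""
    m = n // 2
    w = [comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1)]
    return sum(Fraction(2 * i, n) * wi for i, wi in enumerate(w)) / sum(w)

r = Fraction(3, 2)
for n in [6, 7, 20, 21]:
    # closed form (F.11) holds exactly, for even and odd n alike
    assert mean_m_direct(n, r) == (n - 1) * c_n(n - 2, r) / c_n(n, r)
n = 200
print(round(float((n - 1) * c_n(n - 2, r) / c_n(n, r)), 4),
      round(exp(-sqrt(float(r) / n)), 4))   # closed form vs exp(-sqrt(r/n))
```

The two printed numbers agree to a few parts in a thousand already at n = 200.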
When r ∼ n or r ≫ n, the asymptotic leading term of c_n(r) in (57) is given by a different asymptotic expression. A numerical study of that result shows that its estimate is more accurate for values of r that are not close to en. Finally, the large deviation expectation of m_n with respect to P(m_n), i.e. in the regime where r/(2n) = ε is kept constant, follows from the rate function I_1(m_n; ε).

Other moments
Appendix F finds the k-th moment with respect to P_n(n_p) in (F.18), expressed through coefficients a_i^(k) that satisfy a recursive relation; see Appendix F for the details.

A relation among first moments with different sizes
In this part, we show an identity that relates the k-th moment of a system of size n to first moments of systems with smaller sizes. In Appendix G, it is shown in (G.4) that

⟨n_p^k⟩_n = Σ_{i=1}^k a_i^(k) ⟨n_p⟩_n ⟨n_p⟩_{n-2} ⟨n_p⟩_{n-4} ⋯ ⟨n_p⟩_{n-2i+2},

where a_i^(k) is defined in (69). Note that all the summand terms are first moments for smaller system sizes; e.g. the average ⟨n_p⟩_{(k)} is taken with respect to P_k(n_p). To find the variance of the random variable n_p with respect to P_n(n_p), the second moment writes as

⟨n_p^2⟩_n = ⟨n_p⟩_n (1 + ⟨n_p⟩_{n-2}).
Hence,

Var[n_p]_n = ⟨n_p^2⟩_n - ⟨n_p⟩_n^2 = ⟨n_p⟩_n (1 + ⟨n_p⟩_{n-2} - ⟨n_p⟩_n).
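The second-moment identity and the resulting variance formula can be verified exactly for small systems. A sketch with rational arithmetic, assuming the B-Model distribution P_n(n_p = i) ∝ C(n,2i)(2i-1)!! r^{⌊n/2⌋-i}:

```python
from fractions import Fraction
from math import comb

def df(k):
    return 1 if k <= 0 else k * df(k - 2)

def moments(n, r):
    """Return <n_p> and <n_p^2> with respect to P_n(n_p), exactly."""
    m = n // 2
    w = [comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1)]
    Z = sum(w)
    m1 = sum(i * wi for i, wi in enumerate(w)) / Z
    m2 = sum(i * i * wi for i, wi in enumerate(w)) / Z
    return m1, m2

r = Fraction(2)
for n in [4, 5, 8, 13]:
    m1_n, m2_n = moments(n, r)
    m1_small, _ = moments(n - 2, r)
    assert m2_n == m1_n * (1 + m1_small)                    # second-moment identity
    assert m2_n - m1_n**2 == m1_n * (1 + m1_small - m1_n)   # variance form
print("moment identities hold exactly")
```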

Probability generating function
Appendix H finds the probability generating functions for the random variables n_p and n_h. Recall that the normalization constant c_n(r) is a Hermite-like polynomial of degree ⌊n/2⌋, evaluated at r. For n_p, the probability generating function is

G_{n_p}(s) = ⟨s^{n_p}⟩ = s^{⌊n/2⌋} c_n(r/s) / c_n(r),

where c_n(r/s) is the normalization constant evaluated at r/s.
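This generating function identity can be verified exactly; the prefactor s^{⌊n/2⌋} written above is our reading of the missing equation, and the check below confirms it for even and odd n:

```python
from fractions import Fraction
from math import comb

def df(k):
    return 1 if k <= 0 else k * df(k - 2)

def c_n(n, r):
    m = n // 2
    return sum(comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1))

def pgf_direct(n, r, s):
    """E[s^{n_p}] summed directly over the distribution P_n(n_p)."""
    m = n // 2
    w = [comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1)]
    return sum(s ** i * wi for i, wi in enumerate(w)) / sum(w)

r, s = Fraction(2), Fraction(1, 3)
for n in [4, 5, 9, 12]:
    assert pgf_direct(n, r, s) == s ** (n // 2) * c_n(n, r / s) / c_n(n, r)
print("pgf identity verified")
```

The identity follows because multiplying the weight of n_p = i by s^i is equivalent to replacing r by r/s inside c_n, up to the overall power of s.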
Similarly, the joint probability generating function for the random variables n_p and n_h is

G(s, u) = ⟨s^{n_p} u^{n_h}⟩ = s^{n/2} c_n( r (ρu + 1 - ρ)^2 / s ) / c_n(r)   (even n);

the normalization constant in the numerator is evaluated at r(ρu + 1 - ρ)^2/s.

7. Marginal distributions of P_n(n_p) and P_n(n_p, n_h)

The B-Model marginal distribution is defined for a single random variable, say X_l, the function that projects the state of the ball at site l onto its domain, where S_n is the state space of the B-Model; X_l = 0 corresponds to a paired state and X_l = 1 to a stand-alone ball.
For the C-Model, in the case of a single coin in a head or tail state in the state space S_n, X_l = 1 for the head state, X_l = 0 for the paired state, and X_l = -1 for the tail state. To find the marginal distribution, it suffices to sum the probabilities over the configurations in which X_l has the same value,

P_n(X_l = x) = Σ_{c ∈ S_n : X_l(c) = x} P_n(c). (77)

Therefore, partitioning the state space according to the state of the element at site l is the key to finding the marginals. Intuitively, a marginal is invariant with respect to l, i.e. identical for all l; in other words, l is an arbitrary site.

Marginal distribution of P n (n p )
Let us start with the B-Model. In Sec. 3, S_i was introduced as the subset of S_n with i pairs. Next, partition the elements of S_i into two disjoint subsets, S_i^(1) and S_i^(2), such that S_i = S_i^(1) ∪ S_i^(2): for every configuration in S_i^(1), the ball at site l is in a stand-alone state, whilst for configurations belonging to S_i^(2), the ball at l is in a paired state. After that, we partition S_i^(2) further into subsets S_{i,k}^(2), where S_{i,k}^(2) contains the configurations with a pair between sites k and l. This partitioning of S_n allows us to rewrite the marginal sum in (77) as

P_n{the site l is in a stand-alone state} = P_n(X_l = 1) = Σ_i P(S_i^(1)).

From (13) and (31), the probability of observing a single configuration c_i ∈ S_i is equal to r^{⌊n/2⌋-i}/c_n(r), so we need to find the cardinalities of S_i^(1) and S_{i,k}^(2) in order to find P(S_i^(1)) and P(S_{i,k}^(2)), respectively.

From its definition, S_i^(1) contains i pairs while the site l is in a stand-alone state. Indeed, we need to choose the 2i paired balls among n-1 candidates, and there are (2i-1)!! distinguishable pairings among the i pairs. It follows that the cardinality of S_i^(1) is

|S_i^(1)| = C(n-1, 2i) (2i-1)!!.

For S_{i,k}^(2), one of the pairs has already been selected, so there are 2i-2 balls to be chosen from n-2 candidates, with (2i-3)!! pairings among them. This results in

|S_{i,k}^(2)| = C(n-2, 2i-2) (2i-3)!!.

Appendix I shows the details of the calculation, and finds

P_n(X_l = 1) = 1 - ⟨m_n⟩_r = 1 - (n-1) c_{n-2}(r)/c_n(r).

To find the large deviation marginal, i.e. for n ≫ 1, the asymptotics of the normalization constant give, for r < en,

P_n(X_l = 1) ≈ 1 - exp[-√(r/n)],

while for r ≥ en a different expression, based on the asymptotics of Appendix J, applies. When we apply the same argument to P_n(n_p, n_h), we must sum over the head and tail states too. Doing so, we get

P_n(X_l = 1) = ρ (1 - ⟨m_n⟩_r),  P_n(X_l = -1) = (1 - ρ)(1 - ⟨m_n⟩_r),  P_n(X_l = 0) = ⟨m_n⟩_r,

with the corresponding large deviation marginals for n ≫ 1 obtained by inserting the asymptotic form of ⟨m_n⟩_r.
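The marginal sum over the subsets S_i^(1) can be checked exactly against the first-moment expression: summing C(n-1,2i)(2i-1)!! weights reproduces 1 - ⟨m_n⟩. A sketch in rational arithmetic:

```python
from fractions import Fraction
from math import comb

def df(k):
    return 1 if k <= 0 else k * df(k - 2)

def c_n(n, r):
    m = n // 2
    return sum(comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1))

def P_standalone(n, r):
    """P_n(X_l = 1) = sum_i |S_i^(1)| q_i^(n), with |S_i^(1)| = C(n-1,2i)(2i-1)!!."""
    m = n // 2
    num = sum(comb(n - 1, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1))
    return num / c_n(n, r)

r = Fraction(5, 2)
for n in [4, 5, 10, 11]:
    mean_m = (n - 1) * c_n(n - 2, r) / c_n(n, r)
    assert P_standalone(n, r) == 1 - mean_m
print("marginal equals 1 - <m_n>")
```

This is consistent with exchangeability: the probability that a fixed site is paired equals the expected fraction of paired sites, ⟨m_n⟩.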

Summary
We have analysed the statistics of two simple models by use of combinatorial arguments: one consisting of balls with no internal structure (B-Model), and another consisting of coins with the single particle states head and tail (C-Model). In both cases new emergent pair states can form.
The simplicity of the models permits a detailed analysis of the probability distribution P n (n p ) for observing n p pairs in the B-model and the probability distribution P n (n p , n h ) for observing n p pairs together with n h heads in the C-model.
Both probability distributions satisfy large deviation principles with speed equal to n ln n. We derived correction terms of order O(n), which make important contributions alongside the leading O(n ln n) terms except in the ultimate asymptotic limit.
When pairing is gradually frozen out by tuning the ratio r = q_0/q_1 between the non-pairing (q_0) and the pairing (q_1) probabilities such that r grows proportionally to the increasing number of balls n, the probability distribution for pairs satisfies a large deviation principle with the usual speed n.
Both the original coin model introduced in [2] and the new ball model introduced in the present paper can be seen as minimalistic, paradigmatic models for the statistical analysis of systems with emergent structures. For example, the C-Model can be viewed as an Ising model in which pairs of Ising spins can combine to form new states.
For reference, we have included Appendix A with tables listing statistical properties, which may be useful as reference points for studies of more involved models of state spaces with new emergent many-component states. See Tables A1-A4.

Table A1. The probability distribution for pairs and non-paired states.

Table A2. The probability distribution including head and tail states.

Table A3. The probability distribution P(m_n). Large deviation form: P(m_n) ≍ exp[-n I_1(m_n; ε)], with m_n = (1/n) Σ_{i=1}^n δ_{X_i,p}.

Table A4. The probability distribution P(m_n, s_n). Large deviation form: P(m_n, s_n) ≍ exp[-n I_2(m_n, s_n; ε)], with m_n = (1/n) Σ_{i=1}^n δ_{X_i,p} and s_n = (1/n) Σ_{i=1}^n δ_{X_i,1}.

Appendix B. The Hermite polynomials and the normalization constant

Recall the definition of the Hermite polynomials. We define Hermite polynomials with positive coefficients, say H_n^+, by taking the absolute values of the coefficients of the (probabilists') Hermite polynomials; the normalization constant c_n(r) is then H_n^+ evaluated at √r, up to a power of r.

Appendix C. The large deviation property

In [12], Definition II.3.1, the large deviation property (LDP) is defined as follows. Let Ω be a complete separable metric space, and B(Ω) the Borel σ-field of Ω. A sequence of probability measures {P_n; n = 1, 2, ...} is said to have a large deviation property if there exist a sequence of positive numbers {a_n; n = 1, 2, ...} (speed) that tends to ∞, and a function I(x) (rate function) that maps Ω into [0, ∞], such that the following hold:

(i) I(x) is lower semicontinuous on Ω and has compact level sets.
(ii) lim sup n→∞ ln P n (K)/a n ≤ − inf x∈K I(x) for each closed set K in Ω.
(iii) lim inf n→∞ ln P n (G)/a n ≥ − inf x∈G I(x) for each open set G in Ω.
Then any Borel subset A of Ω that is an I-continuity set has the limit [12]

lim_{n→∞} (1/a_n) ln P_n(A) = - inf_{x ∈ A} I(x).

The set A is an I-continuity set if

inf_{x ∈ cl(A)} I(x) = inf_{x ∈ in(A)} I(x),

where cl(A) and in(A) denote the closure and interior of A, respectively.

Appendix D. Finding the normalization constant's recursive relation
To find the normalization constant c_n(r) defined in (30), we introduce the polynomials f_n(x) through a generating function. For both even and odd numbers, the normalization constants can be rewritten as

c_{2n}(r) = r^n f_{2n}(1/r),  c_{2n+1}(r) = r^n f_{2n+1}(1/r). (D.2)

Summing both sides of (D.1) constructs F_1(Y; x) as a generating function whose Y^{2n} coefficient is f_{2n}(x)/(2n)!; similarly, F_2(Y; x) collects the odd powers. Adding F_1(Y; x) and F_2(Y; x) together results in a generating function for even and odd powers,

F(Y; x) = Σ_{n≥0} f_n(x) Y^n / n! = exp( x Y^2/2 + Y ). (D.5)

F(Y; x) is an analytic, infinitely differentiable function. Hence, the coefficients of the Taylor expansion of F(Y; x) at Y = 0 give f_n(x),

f_n(x) = F^(n)(0; x). (D.6)

Since the first derivative of F(Y; x) = exp[(xY^2)/2 + Y] relates recursively to F itself, ∂_Y F = (xY + 1) F, repeatedly taking derivatives finds the recursive equation for F^(n+1)(Y; x) as

F^(n+1)(Y; x) = (xY + 1) F^(n)(Y; x) + n x F^(n-1)(Y; x),

and for Y = 0,

f_{n+1}(x) = f_n(x) + n x f_{n-1}(x).

Plugging in (D.6), it is necessary to rewrite f_{n+1}(x) for odd and even numbers separately, as shown in (D.2). This means that the governing recursive equations are

c_{2n}(r) = r c_{2n-1}(r) + (2n-1) c_{2n-2}(r),
c_{2n+1}(r) = c_{2n}(r) + 2n c_{2n-1}(r). (D.11)
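The generating function construction can be verified numerically: expanding exp(xY^2/2 + Y) in Y and rescaling by r^{⌊n/2⌋} must reproduce c_n(r) term by term. A sketch in exact rational arithmetic, using the coefficient recurrence implied by F' = (xY + 1)F:

```python
from fractions import Fraction
from math import comb, factorial

def df(k):
    return 1 if k <= 0 else k * df(k - 2)

def f_poly(N, x):
    """f_n(x) = n! [Y^n] exp(x Y^2/2 + Y), via the recurrence from F' = (xY + 1) F."""
    F = [Fraction(1)]                       # Taylor coefficients of F(Y; x)
    for k in range(N):
        prev = F[k - 1] if k >= 1 else Fraction(0)
        F.append((F[k] + x * prev) / (k + 1))
    return [factorial(n) * F[n] for n in range(N + 1)]

r = Fraction(7, 3)
f = f_poly(9, 1 / r)
for n in range(2, 10):
    m = n // 2
    c_direct = sum(comb(n, 2 * i) * df(2 * i - 1) * r ** (m - i) for i in range(m + 1))
    assert r ** m * f[n] == c_direct        # (D.2): c_n(r) = r^(floor(n/2)) f_n(1/r)
print("generating function reproduces c_n(r)")
```

The Taylor-coefficient recurrence F_{k+1} = (F_k + x F_{k-1})/(k+1) is exactly the relation f_{n+1} = f_n + n x f_{n-1} after multiplying through by n!.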

Appendix E. Large deviation probabilities
Let us start with P_n(n_p) in (32). Using Stirling's approximation for ln(n!), we obtain an expansion of ln P_n(n_p) whose leading behaviour is

ln P_n(n_p) = -ln c_n(r) + n_p ln n + O(n). (E.3)

When one divides ln P_n(m_n) by -(n ln n), the LDP limit exists as n → ∞. Repeating the same for P_n(m_n, s_n) in (33), one finds (E.5), in which ρ is the probability of observing a head state and I_ρ(s_n) is the Bernoulli rate function [19,21]. The large deviation limit exists and is equal to (1 - m_n)/2.

Appendix F. Finding moments of the distribution P_{2n}(n_p)

First, let us find the first moment of P_{2n}(n_p). Note that we use 2n instead of n for notational convenience, and later write the final result for general n. The first moment of P_{2n}(n_p) is defined as

⟨n_p⟩_{2n} = Σ_i i P_{2n}(n_p = i),

where in the step before the last one, we plug in the generating function F(Y; x) of (D.5). Taking the derivative of F(Y; x) with respect to x provides

d/dx f_{2n}(x) = n(2n-1) f_{2n-2}(x);

we can say that the derivative with respect to x moves the polynomial f_{2n}(x) to f_{2n-2}(x) times n(2n-1). Therefore, since (D.2) defined c_{2n-2}(r) = r^{n-1} f_{2n-2}(1/r), we find

⟨n_p⟩_{2n} = n(2n-1) c_{2n-2}(r)/c_{2n}(r). (F.8)

This is the first moment for a system with even size 2n. Similarly, for an odd system size we find

⟨n_p⟩_{2n+1} = n(2n+1) c_{2n-1}(r)/c_{2n+1}(r). (F.9)

Also, m_n = 2n_p/n is the ratio of the number of elements in a paired state to the system size n. Combining both results asserts

⟨m_n⟩_r = (n-1) c_{n-2}(r)/c_n(r). (F.11)

To find the k-th moment, one applies the operator x d/dx k times to f_{2n}(x); after careful application of the operator and some bookkeeping, the k-th moment ⟨n_p^k⟩_{2n} obtains in the form (F.18), with coefficients a_i^(k) satisfying a recursive relation.

Appendix H. The probability generating functions

The normalization constant in the numerator evaluates at r(ρu + 1 - ρ)^2/s.

Appendix I. The Marginal sums
Using the cardinality of S_i^(1) in (83), the marginal for X_l = 1 follows, where we use the definition of the normalization constant and the expectation of n_p in the last step. Similarly, using the cardinality of S_{i,k}^(2) in (84), the marginal for X_l = 0 follows.

Appendix J. Asymptotic leading terms of the normalization constant

Let t_{2n}(k) denote the k-th summand in the definition of c_{2n}. Since k is non-negative, the maximum of t_{2n}(k) is at k_max, found by treating t_{2n}(k) as a continuous function of k. To avoid cluttering the notation, we denote the relevant bracket by g and write the sum accordingly. The sum in the definition of c_{2n}(2εn) is concentrated at its maximum. We find the asymptotic leading term of t_{2n}(k) as follows: first we simplify it using Stirling's approximation, n! ≈ √(2πn)(n/e)^n, and then evaluate it at k_max. The first ratio, 1/√(k(n-k)), is approximated accordingly. We must emphasise that we started the sum from k = 1 instead of k = 0. To justify this, observe that using (J.3) we derive the ratio of the two summands for k = 0 and k = 1,

t_{2n}(0)/t_{2n}(1) = (1/(4n)) × ((n-1)!/n!) × (0!/2!) = 1/(8n^2). (J.17)

Asymptotically this means t_{2n}(0) ≪ t_{2n}(1) as n → ∞, which justifies the exclusion of k = 0. Finally, we need to estimate the asymptotic leading term of the sum Σ_{k=1}^n (√e εn/k)^{2k}. Similar to the argument reported in [2], the sum can be estimated by an integral. Still, the free parameter ε introduces two different behaviours that need subtle consideration. Let us call the summand ψ(k). Similar to what we have already done to find k_max, we treat ψ(k) as a continuous function, and to find its maximum with respect to k we take the derivative. The maximum of the sum is concentrated at k_max = εn; thus the bound ε > e must be considered in finding the asymptotic leading term. Continuing the estimation by an integral, we divide the range of ε into two regions:

• ε > e: in this case the sum can be estimated by its largest term.