Replica methods for loopy sparse random graphs

I report on the development of a novel statistical mechanical formalism for the analysis of random graphs with many short loops, and processes on such graphs. The graphs are defined via maximum entropy ensembles, in which both the degrees (via hard constraints) and the adjacency matrix spectrum (via a soft constraint) are prescribed. The sum over graphs can be done analytically, using a replica formalism with complex replica dimensions. All known results for tree-like graphs are recovered in a suitable limit. For loopy graphs, the emerging theory has an appealing and intuitive structure, suggests how message passing algorithms should be adapted, and what is the structure of theories describing spin systems on loopy architectures. However, the formalism is still largely untested, and may require further adjustment and refinement.


Introduction
Networks and graphs are increasingly popular and effective tools for visualising and modelling large and complex processes and 'big' data sets.We know that many important graphical structures in the world (biological networks, computing and communication networks, resource grids, lattices in physics, etc) are not tree-like; they tend to have many short loops, and we know that processes on graphs are affected significantly by the presence of such loops.It is therefore problematic that most of our tools and algorithms for analysing (processes on) finitely connected graphs, such as cavity methods [1], belief propagation type algorithms [2,3,4], and conventional replica analyses [5,6], require topologies that are locally tree-like.Some methods were extended with loop corrections [7,8,9,10], but all tend to fail for graphs with many short loops.The exceptions, solvable spin models or spectrum calculations for loopy graphs, all rely on some special property of either the dynamics or the graph topology, and are thus non-generic: these include spherical models [11], one-and two-dimensional Ising systems [12,13], loopy immune models that can be mapped to tree-like systems [14], and trees of loopy modules [15,16].
In this paper I report on ongoing research aimed at the development of a new and more general statistical mechanical method that removes the restriction to tree-like graphs.It is designed to handle analytically ensembles of large and sparse random graphs with prescribed degree sequences and prescribed loop statistics (via their adjacency spectra), and stochastic processes on such graphs.It is based on an alternative flavour of the replica method, with imaginary replica dimensions, and produces explicit closed equations in the infinite size limit, leading to Shannon entropies and expressions for spectra of ensembles of sparse loopy graphs.The familiar equations describing tree-like graphs are recovered as a simple limiting case.

Definitions
We study simple nondirected N -node graphs characterised by adjacency matrices c = {c ij }, with c ij ∈ {0, 1}, c ij = c ji and c ii = 0 for all (i, j).They are drawn randomly from maximum entropy ensembles p(c), in which we prescribe the values of all degrees k i (c) = j c ij and the eigenvalue spectrum ̺(µ|c), subject to dµ µ̺(µ|c Since the spectrum controls the statistics of closed paths of all lengths via its moments, we are not limited to tree-like graphs.Modulo isomorphisms, many graphs are already determined by their spectra [17,18,19], and without external fields the free energy of spin systems on loopy graphs depends only on the loop statistics [20], so we expect that graphs from such ensembles can be tailored very effectively to model complex real-world systems.We impose the degrees as hard constraints, and the spectrum as a soft constraint, so We can write the relevant sums over graphs in terms of the generating function The equation from which to solve the Lagrange parameter ̺(µ), i.e. ̺(µ) = c p(c)̺(µ|c), and the ensemble entropy per node S = −N −1 c p(c) log p(c), are both expressed in terms of (2): We could alternatively start by choosing ̺(µ), and view the first equation of (3) as a tool for calculating the associated spectrum.Locally tree-like graphs correspond to ̺(µ) = 0.For ̺(µ) = α 3 µ 3 with α 3 > 0 we get loopy random graphs constrained by the degrees and the density of triangles.Adding higher order terms to ̺(µ) means controlling higher order closed path statistics.Imposing the full spectrum means constraining the numbers of closed paths of all lengths.The core of our problem is how to do analytically the sum over graphs in (2).

Calculation of the generating function
We can write (2) as an average over an Erdös-Rènyi graph ensemble [21] with average degree k = 1 N i k i .Since the probabilities p ER (c) of this ensemble depend on c only via i<j c ij , we can use the short-hand f (c The factors induced by degree constraints are harmless.Our problem is the dependence on c via ̺(µ|c).To handle this we use the following identity which can be derived in a few lines from the spectrum formula of [22], and was first presented in [23]: where z indicates complex conjugation, and with the integrals Z(µ|c) = IR N dφ exp(−1 2 iφ • [c− µ1 I]φ).We can now proceed via the replica method.We evaluate the graph average for integer {n µ , m µ }, and take the limits to imaginary values via analytical continuation 1 .The tricky factor then becomes a product of integrals: We can now do the graph summations.We abbreviate φ i = {φ i µ,αµ } and ψ i = {ψ i µ,βµ }, with φ i •φ j = µ αµ≤nµ φ i µ,αµ φ j µ,αµ and ψ i •ψ j = µ βµ≤mµ ψ i µ,βµ ψ j µ,βµ , and we introduce a matrix M with entries M µ,α;µ ′ ,α ′ = µδ µµ ′ δ αα ′ .Upon inserting (5) into (2), writing degree constraints in integral form via , and using (6), we find that the order parameter of our problem is the distribution ), which we introduce into our formulae by inserting for each (φ, ψ, ω), with ω ∈ [−π, π]: Upon writing the δ-functions in integral form, we then obtain the following path integral representation of Φ[̺] which is for large N evaluated via steepest descent, with lim N →∞ ǫ N = 0: Ψ[P, P] = i dφdψdω P(φ, ψ, ω)P(φ, ψ, ω) Working out the saddle-point equations of (9) shows that (8) can be written in the form Here

Order parameter equations and spectrum formula
We work out the functionals A[{π}] and B[{π, π ′ }].Starting from (14,22) or (24,25) one derives Similarly we can work out the order parameter equations (15) for W[{x, y}] and (20) for the normalisation factor C. To compactify the result we introduce the short-hand Here F [x 1 , . . ., x k ] and G[x 1 , y 1 , . . ., x k , y k ] are the functions defined in (29,30).Now Finally we turn to our spectrum equation (23), which for the density (26) becomes with

Reflection symmetry in the origin
Our equations are invariant under W[{x, y}] → W[{x, −y}], which represents reflection in the origin of the complex plane of the centres of the π(φ|x, y).Assuming the true saddle-point to be of the invariant form W[{x, y}] = W[{x}]δ[{y}] implies that we consider the functions π(φ|x, y) to be centred at the origin, and we will find that this is a property that holds in the limit of treelike ensembles.It simplifies our equations to with and The spectrum becomes Continuous bifurcations of states with W[{x, y}] = W[{x}]δ[{y}] can be located via a Guzai (i.e.functional moment) expansion [6].One can show that such bifurcations do occur, but it is not yet clear whether they correspond to physically relevant transitions.

Regular graphs
For regular graphs our formulae simplify considerably.Here we find the order parameter equation (34) for W[{x, y}] and the spectrum reducing to with 6. Interpretation and simple tests of the theory 6.1.Interpretation and link with message passing algorithms In the limit ̺ → 0, where C = A[{x, y}] = 1 for all {x, y}, formula (34) acquires the standard form of message passing algorithms on tree-like graphs.Also the more complex structure of (34) can be interpreted as describing the stationary state of a message passing process, but now extended with the nontrivial message acceptance probabilities (modulo a multiplicative constant).This is similar to the re-weighting of solutions in e.g.[38].
In tree-like graphs we accept each proposed message {x, y}.This interpretation shows us how to solve (34,35) numerically, and how we should adapt belief propagation and other message passing algorithms [2,3,4], to make these exact for stochastic processes on loopy graphs.

Locally tree-like graphs
Our theory should recover established results on locally tree-like graphs upon setting ̺ = 0.This reduces our ensemble to a maximum entropy one in which only the degrees are prescribed.Now A[{x, y}] = B[{x, y; x ′ , y ′ }] = C = 1, and we indeed recover from ( 17) the correct ensemble entropy density of tree-like ensembles with prescribed degrees [39], with lim N →∞ ǫ N = 0: The spectra of locally tree-like graphs are also recovered correctly, with solutions of the simple form W[{x, y}] = W[{x}]δ[{y}] and real-valued {x}.Our present formalism predicts that in which For k-regular tree-like graphs our order parameter equation (50) and the spectrum (49) become For k = 1 equation (51) immediately leads to ̺(µ) = 1 2 δ(µ − 1) + 1 2 δ(µ + 1), which is indeed the spectrum of a graph consisting of N/2 disconnected 2-node components (the only possible regular graph with k = 1).Equation (51) can be solved analytically also for k > 1.To find the spectrum we only need the distribution W(x|µ) of each individual x(µ), for which we find: Inserting this result into (52) leads us to an integral that can again be evaluated analytically, and recovers the correct spectrum of [40] for k-regular treelike graphs: For graphs with arbitrary degree distributions p(k) we can show that from (50) indeed follows the equation obtained in [41].Thus our formalism passes the available tests in the tree-like limit.
7. Current and next stage of the research programme 7.1.Equations for order parameters and spectra of loopy ensembles We now also have explicit equations for the analysis of graphs with extensively many short loops, where ̺(µ) = 0.The order parameter equation and the spectrum are given by (44,45).In the tree-like limit we had W[{x, y}] = W[{x}]δ[{y}] with Im[x(µ)] ↑ 0 for all µ.If we make the ansatz that also in the presence of loops we still have W[{x, y}] = W[{x}]δ[{y}] with Im[x(µ)] ↑ 0 for all µ, and limit ourselves to regular graphs, we can simplify our order parameter equation to with in which It is not difficult to prove that (58) is normalised correctly.However, testing the validity of our equations in full is nontrivial for a number of reasons.First, there appear to exist no benchmark solutions yet, in the form of exact and independent asymptotic spectrum derivations for ensembles of the form (1) with ̺ = 0, against which to test our predictions.Second, solving the order parameter equations numerically for ̺ = 0 is nontrivial, as it involves population dynamics algorithms in which we propagate functions rather than fields, and with nontrivial acceptance probabilities, that are quite hard to equilibrate accurately without sacrificing eigenvalue resolution.Thirdly, generating random graphs from (1) numerically, in order to then obtain their spectra by numerical diagonalisation, is itself nontrivial [42]; for relatively simple choices like ̺(µ) = α 3 µ 3 +α 4 µ 4 reliable algorithms and code do exist, but these ensembles are notorious for their complex phase transitions [43,44,45,46].For instance, for regular graphs with k = 3 and ̺(µ) = α 3 µ 3 one observes in numerical simulations that, modulo finite size effects, the spectrum is of the form in which γ ∈ [0, 1] and ̺(µ) is McKay's k = 3 formula (55) for tree-like regular random graphs.One confirms that µ = 0 and µ 2 = 3 for any γ, as required.The second term in (60) is the spectrum of a fully connected simple graphlet of size four.Hence (60) describes graphical phase separation; the system increases the number of triangles by disconnecting a fraction γ of the nodes from the treelike bulk, to form γN/4 disconnected 4-node cliques.Similarly, For regular graphs with k = 3 and ̺(µ) = α 4 µ 4 one observes in numerical simulations spectra of the form in which again ̺(µ) is McKay's formula (55) for k = 3.Also here µ = 0 and µ 2 = 3 for any γ.The second term of (61) is the spectrum of the six node graphlet with adjacency matrix a ij = δ i,j+1 + δ i,j−1 + δ i,j+3 + δ i,j−3 (i, j mod 6), which is a 6-node clique from which all triangles are removed.Here the system increases the number of squares by disconnecting a fraction γ of the nodes from the treelike bulk, to form γN/6 suitable disconnected 6-node graphlets.
With the available numerical evidence at this moment, consisting of numerical solutions of order parameter equations and spectra for ensembles with ̺(µ) = α 3 µ 3 + α 4 µ 4 (obtained via population dynamics with nontrivial acceptance probabilities), together with spectra measured in numerical graph simulations, satisfactory agreement has not yet been obtained.Preliminary population dynamics computations suggest that the above phase separation phenomena are not yet captured.Yet the structure of the theory seems elegant and intuitive, with the richness to exhibit complex phase transition phenomenology, and it is correct in the limit ̺ → 0. The reason for the residual disagreement has not yet been identified.Apart from mundane explanations (e.g.imprecise or nonequilibrated numerical algorithms, either in population dynamics or graph generation, or saddle-points that are not of the assumed form), a possible candidate lies in the usage of complex logarithms.Some of the identities used in the present derivations, such as log exp(Z) = Z or log(ZW ) = log(Z) + log(W ), hold for complex values only if we can steer away from the cut in the complex plane.A more careful re-derivation may well reveal extra terms that for some reason become irrelevant for ̺ → 0 (where we know the theory works) but may be important for ρ = 0. To this author, the structure of the equations simply feels right as a description of loopy graph ensembles, and he is optimistic that the remaining discrepancy will soon be removed.

Processes on loopy graphs
Looking ahead towards the use of the presently proposed method for the modelling of spin systems on loopy graphs, one can see that the calculation of the disorder-averaged free energy density of such systems will involve further (non-complex) replicas, and requires the evaluation of the following more complicated generating function: with K = βJ and the n-replicated spin vectors σ i = (σ 1 i , . . ., σ n i ).Again also this quantity can be calculated, the limit N → ∞ can be taken, and one finds a more complex closed set of equations in which now the graph order parameters and the spin order parameters are entangled.If we assume spin replica symmetry, in addition to graph replica symmetry, the final RS order parameter will take the form W[{x, y}, v], in which v denotes an effective field.It is encouraging to confirm that upon assuming the physical saddle-point to be of the form W[{x, y}, v] = W[{x, y}]W[v] (i.e. the statistical features of the process decouple from those of the graph) we indeed recover the previous equations of this paper for W[{x, y}], and the solution of [47] for the transition temperature on tree-like graphs (where indeed one would expect such decoupling to apply).