Glass--like transition described by toppling of stability hierarchy

Building on the work of Fyodorov (2004) and Fyodorov and Nadal (2012) we examine the critical behaviour of population of saddles with fixed instability index $k$ in high dimensional random energy landscapes. Such landscapes consist of a parabolic confining potential and a random part in $N\gg 1$ dimensions. When the relative strength $m$ of the parabolic part is decreasing below a critical value $m_c$, the random energy landscapes exhibit a glass-like transition from a simple phase with very few critical points to a complex phase with the energy surface having exponentially many critical points. We obtain the annealed probability distribution of the instability index $k$ by working out the mean size of the population of saddles with index $k$ relative to the mean size of the entire population of critical points and observe toppling of stability hierarchy which accompanies the underlying glass-like transition. In the transition region $m=m_c + \delta N^{-1/2}$ the typical instability index scales as $k = \kappa N^{1/4}$ and the toppling mechanism affects whole instability index distribution, in particular the most probable value of $\kappa$ changes from $\kappa = 0$ in the simple phase ($\delta>0 $) to a non-zero value $ \kappa_{\max} \propto (-\delta)^{3/2}$ in the complex phase ($\delta<0$). We also show that a similar phenomenon is observed in random landscapes with an additional fixed energy constraint and in the $p$-spin spherical model.


Introduction
Low-dimensional random fields for a long time held a prominent role in sciences as many natural phenomena are described through statistics of random fields defined on either space or space-time [1] with typical dimensionalities N = 2, 3 or 4. On the other hand, only recently high-dimensional spaces (N ≫ 1) received attention with the advent of machine learning [2] or in studies of complex systems like spin glasses and large biomolecules [3]. In such applications, the dimensionality N is identified with the number of degrees of freedom and the field itself is interpreted as either energy function (spin glasses, proteins) or loss function (machine learning). As was pointed out and utilized numerous times [4,5,6], the task is always to find a configuration of the system such that the energy (or loss) function is minimized.
Although this task is straightforward in low dimensions, its successful completion for non-convex surfaces in high-dimensional spaces is notoriously intractable and is a NPhard problem. As a result, heuristic methods, e.g., algorithms which work approximately but whose robustness remain elusive, are widespread and remain the main choice of tool for practitioners. A well-known example is the stochastic gradient descent (SGD) algorithm derived from a gradient descent method for solving convex problems.
Almost all problems with minimizing non-convex functions are ultimately related to the structure of their surface which varies just as the Earth's landscape does with plateaus, valleys and peaks only with a greater degree of variability. On top of that, the landscape described is very high-dimensional and thus heavily impeding our intuition. Despite these difficulties, insights into behaviour of energy/loss landscapes are crucial in understanding of the success of algorithms like SGD. In this context, the most natural approach is to study quantities related to stationary points of the function by either counting their total number or study their respective positions.
In recent years, perhaps the most startling case of a successful approach to nonconvex optimization is that of Deep Neural Networks (DNNs). Not only the success of training DNNs through SGD is surprising, also such models do not suffer from overfitting despite being extremely overparametrized, generalize well to unseen examples and do not get stuck in local minima [7]. One possible explanation of these features [6] is provided by inspecting statistics of stationary points of the DNN loss function based on an explicit link to spherical spin glass models. This approach along with related works [8,9] offers a possible explanation of trainability aspects of DNNs through analyzing structure of stationary points. Our work extends this line of inquiry through a probabilistic approach to the description of populations of stationary points having a fixed instability index.
The aim of our work is to offer a detailed analysis of the stationary points in the energy landscapes of systems undergoing glass-like transition. The existing body of work on this subject mostly focuses on parameter ranges far from the critical threshold, both in the topologically non-trivial phase (where stationary points are exponentially abundant) and in the topologically trivial phase (where, typically, there are very few stationary points) [10,11,12,13]. To the best of our knowledge, the only work providing insights into what happens near the critical threshold is one by Fyodorov and Nadal [14] who counted minima. We complement this study by tracking the relative sizes of populations of minima, maxima as well as saddle points with any given instability index k (number of unstable directions) near the critical threshold. To this end, we work out fractional probabilities p k for a stationary point to have a given instability index k, an approach introduced in [15]. These probabilities are quotients of the population size of the stationary points with index k to the total number of the stationary points. The picture that is emerging from our analysis offers an alternative explanation of the glass-like transition based on toppling of stability hierarchy of populations of stationary points. In this context, the stability hierarchy refers to a steady decrease of p k when the instability index k is increasing, so that local minima are the most likely stationary points, while toppling refers to a sudden change when the most likely stationary points are those with a fixed number of unstable directions k max > 0. As we show in the current work, this phenomenon which marks a global change in the underlying random landscape is shared by random landscapes, its energy constrained variant and by the paradigmatic spherical p-spin model.

Main results
In order to gain insights into statistics of stationary points in the transition region, we employ the paradigmatic model [16,10] of random energy landscape where a random scalar field is coupled to a parabolic confining potential with the transition driven by the coupling strength µ > 0: Here x is a vector in N-dimensional state-space and V (x) is isotropic homogeneous Gaussian vector field with zero mean value and covariance function The main feature of model (1) is the existence of two distinct phases in the thermodynamic limit N → ∞ with a sharp transition region between the two phases at µ c = f ′′ (0) [10]. Introducing the rescaled coupling strength for large values of m the system is in a topologically trivial phase whereby the parabolic confining potential dominates the energy landscape (the probability of finding more than one stationary point is zero in the limit N → ∞). As m decreases below the critical threshold m c = 1 the energy landscape becomes highly complex which manifests itself in an exponential explosion of stationary points -the (average) total number of stationary points and the number of local minima grow exponentially as the dimension of the state space N increases. This transition is frequently called glass-like as it is closely related to one found in spherical spin-glasses [17]. In a natural way, the energy landscape (1) gives rise to a gradient flow which is defined by the differential equationẋ Then the stationary points x * of E(x), i.e., the points where ∇E vanishes, are the equilibria (fixed points) of the gradient flow (4). In this picture, if x * is a point of local minimum of E(x) (i.e., all eigenvalues of the Hessian (∂ i ∂ j E(x)) ij at x = x * are positive) then the equilibrium x * is asymptotically stable. That is, a small displacement from x * in any direction results in the system asymptotically returning back to x * . If x * is a saddle and k is the number of negative eigenvalues of the Hessian of E(x) at x = x * then the equilibrium at x * will have N − k stable directions. Displacement along these directions will result in the system asymptotically returning back. In this way, the index k is a measure of instability of the equilibrium, and we shall call it the instability index.
The higher the value of k is, the fewer stable directions there will be at the equilibrium. Obviously, local maxima, i.e., stationary points where k = N, are most unstable.

Populations of stationary points
In this work, we go beyond counting few sub-populations (like minima) of stationary points in absolute terms and instead work out relative or fractional probability distributions of stationary points with fixed instability index. This approach enables novel, refined questions like: • Given a randomly sampled stationary point, what would be its most likely instability index? How does a full probability distribution of indices look like?
• How does the probability distribution of indices change in the vicinity of transition region?
Such pertinent enquiries can be made by a local agent (like an SGD algorithm or glassy system looking for a configuration minimizing its energy) probing the landscape and encountering saddles on its way. For example, it was argued in ref. [18] that the abundance of saddles with a large number of stable directions leads to slowing down the gradient descent dynamics in high dimensional energy landscapes due to the dominance of borders in high dimensions: at low temperatures the system is trapped for long times near borders (ridges) of basins of attraction of local minima and the gradient descent is determined mainly by nearby saddles.
In what follows, we utilize three interrelated counting statistics: • N k , the number of stationary points with instability index k; • N eq = N k=0 N k , the total number of stationary points; • N (k) = k n=0 N n the number of stationary points with instability index up to k.
The quotient of the first two counting statistics is the relative frequency N k /N eq of saddles with instability index k, whilst the last one is the associated cumulative frequency distribution N (k) /N eq . If x * is a stationary point of E(x) drawn at random from the entire population of stationary points then, as was shown in [15], the probability for x * to have k unstable directions is given by the average of N k /N eq over the realizations of the random field V (x): Therefore, N k /N eq and N (k) /N eq are, respectively, the probability density function (pdf) and cumulative distribution function of the instability index k.
In the context of the above two questions, calculating both averages is a natural starting point. This seems a prohibitively difficult task, and, instead, we set out to analyse the annealed probabilities where enumerator and denominator are averaged separately and combined afterwards.
To justify a connection between the right-hand side in (5) and its annealed counterpart p k , counting statistics in both enumerator and denominator ought to have a selfaveraging property in the limit of high dimensionality limit. The recent works [19,20,21] addressed this type of question for the pure p-spin spherical model whose energy landscape falls in the same class as the random energy model and gave an affirmative answer for equilibria with sufficiently low energy or with finite instability index k. This gives rise to a hope that for some classes of coupling fields the annealed picture will resemble the quenched one. While the task of identifying such classes of coupling fields is a challenging open problem which deserves further investigations, we think that the annealed probabilities deserve a closer look. The picture which is emerging from our analysis of these probabilities exhibits some interesting features and is described below.

Toppling of stability hierarchy
As was mentioned above, one manifestation of the phase transition in the random energy model (1)-(2) is the exponential explosion in the number of stationary points. In the limit of high dimensionality N → ∞, the complexity exponents associated, respectively, with the total number of stationary points and the number of local minima, are positive for every m < 1, Diagram illustrating the toppling mechanism in the random energy landscape model (1) -(2) in the limit of high dimensionality N ≫ 1. The diagram on the left-hand side depicts the four scaling regions of parameter m (3). The vertical array of five plots on the right-hand side depicts the annealed distribution of the instability index k in each of these regions (there are two plots for the toppling region). The plot at the top depicts the annealed probabilities p k (6) in the simplicity region and the plot below it depicts p k in the hierarchy region. In these two regions the typical values of k are finite and, correspondingly, the plots are discrete. In the toppling and complexity regions the typical values of k scale with, correspondingly, N 1/4 and N . There, the annealed distribution of k is best described via the cumulative probability P k = k n=0 p n . The corresponding densities, p (c) (κ) = d dκ P κN 1/4 and p (d) (κ) = d dκ P κN , are depicted in the bottom three plots. In the toppling region the annealed density p (c) (κ) undergoes a gradual change when the value of m decreases below the critical threshold m c = 1 from being a monotone decreasing function of κ (the likeliest saddles are local minima) to being a unimodal function (the likeliest saddles have κ max N 1/4 unstable directions, κ max = 4 √ 2 3π (−δ) 3/2 ). The plots were produced using analytic expressions presented in the third column in Table 1 and evaluated at parameter values m > 1 in plot a), δ = 0.8 in plot b), δ = 0.2 and δ = −0.8 in plot c), and m = 0.6 (solid line) and m = 0 (dashed line) in plot d). Plot b) was produced with the help of numerical package [22]. and vansih for every m > 1 [10,14]. However, as can be seen from (8) at the critical threshold these two counting statistics develop the exponential growth on different scales: the width of the transition region for N eq is N −1/2 , whilst the width of the transition region for N 0 is N −1/3 . When one analyzes the relative sizes of populations of stationary points near the critical threshold, such as the annealed probabilities p k and P k (6), the microscopic scales get superimposed. This suggests that the glassy transition in model (1)-(2) has several distinct transition regions which we will now describe.
It is convenient to encode the scaling regimes by the formula The parameter values β = 1/2 and β = 1/3 define the two aforementioned microscopic scales and the parameter value β = 0 defines a global scale. Based on these three natural scales, we can identify four distinct scaling regions of change consisting of two microscopic and two macroscopic scales, see Figure 1 and Table 1: In this region the parabolic confining potential dominates in the limit N → ∞. The instability index k is a discrete variable and k = 0 corresponds to a minima. With probability asymptically close to 1 the system has only one stationary point which is a local minimum, and so p k = 1 in the limit N → ∞ if k = 0 and p k = 0 otherwise. This is illustrated in plot a), Figure 1.
b) Hierarchy region, m = 1 + δ/N 1/3 , δ > 0 As the value of m is decreasing and getting closer to the critical threshold m c = 1, the system exits the simplicity region and enters a hierarchy region. In this region the mean number of saddles N eq is increasing as m gets closer to m c (but staying finite in the limit of high dimensionality) and the deviations of the instability index k from zero become larger. This results in a flow of indices away from k = 0 and the emergence of groups of increasingly more unstable saddles as evidenced by nonzero values of N k for k > 0, see Table 1. Consequently, the annealed probability distribution of the instability index k develops a non-zero tail extending to finite values of k and displaying the hierarchy of stability p 0 > p 1 > p 2 > . . . with the most likely stationary point being a local minimum. This is illustrated in plot b), Figure  1. Note that typical values of the instability index k do not scale with N in the limit of high dimensionality, i.e. k takes finite values k = 0, 1, 2, . . . . c) Toppling region, m = 1 + δ/N 1/2 , δ ∈ R As the value of m is decreasing further and getting closer to the critical threshold, the flow of indices away from zero becomes stronger and their distribution flatter, i.e., the difference between p k and p k+1 is becoming smaller and smaller. This microscopic mechanism is fundamental and eventually leads to a macroscopic change which occurs in the region m = 1 + δ/N 1/2 . It can be shown that in this region the total number of stationary points scales as N 1/4 and so do the typical values of the instability index k. It is then natural to introduce rescaled instability index κ = k/N 1/4 which becomes a continuous random variable in the limit of high dimensionality. It is instructive to inspect the dependence of the rescaled index density d dκ P κN 1/4 on parameter δ, see Figure 1 and Table 1. For every fixed δ > 0, this density is monotonically decreasing function of κ in the interval 0 ≤ κ < ∞ and, hence, the hierarchy developed in region b) persists. The critical threshold m c = 1 which corresponds to δ = 0 on this microscopic scale is the tipping point. For every fixed δ < 0, the index density is a unimodal function of κ attaining its maximum value at κ max = 4 √ 2 3π (−δ) 3/2 . The loss of monotonicity means that the most likely stationary point is no longer a local minimum but instead a saddle with κ max N 1/4 unstable directions. In other words, the hierarchy of stationary point populations is broken and transition to complex phase starts which eventually produces a typical (i.e. most probable) stationary point with non-zero instability index. d) Complexity region, 0 < m < 1 As the scaled coupling strength m is decreasing further below the critical threshold, the system enters the topologically non-trivial phase where the total number of stationary points is exponential in N, see (8). In this region, the typical values of k scale with N and thus k = κN. The resulting index density d dκ P κN has a highly localized peak at Figure 1. As one would expect κ max (1) = 0 and κ max (0) = 1/2, so that when the parabolic confining potential is turned off completely, the most likely instability index is N/2. This of course makes complete sense as for a pure random field the stable and unstable directions are equally likely.
Closed form expressions for the fractional probabilities (6) in the limit of high dimensionality N → ∞ are written down explicitly in Table 1 while all the derivations can be found in Appendix A.

Universality of the toppling mechanism
Although in this work we describe the glass-like transition via the toppling of stability hierarchy for the toy model (1), we believe this mechanism holds more generally. To support this claim, in Appendix B we report on an analogous mechanisms driving the phase transition in the model (1) with an additional fixed energy constraint [12] and in and Appendix C we do the same for the p-spin spherical model [23,24,13,8]. Below we provide a brief summary of our findings.
First, consider the same random energy landscape model (1) -(2) as before but now at a fixed energy level E 0 , We shall call this model the fixed energy model or the constrained random landscape model. It has two parameters. One is the coupling strength µ and the other one is the energy level E 0 . Similarly to the unconstrained model, one can investigate the complexity exponent Σ eq associated with the total number of stationary points n eq at Table 1. Annealed cumulative distribution P k = N (k) / N eq of the instability index k in the random energy landscape model (1)-(2) in the limit of high dimensionality Ai(t)dt with Ai denoting the Airy function, and ρ sc (λ) = 2 π √ 1 − λ 2 are the eigenvalue densities at the edge and in the bulk of the eigenvalue distribution in the GOE, and F n (λ) is the cdf of the top (n + 1)-st eigenvalue in the GOE. The complexity exponent Σ eq is given in (8) and c = 3π/(4 √ 2) 2/3 . The probability densities shown in the third column are plotted in Figure 1.
energy level E 0 , Σ eq = lim N →∞ 1 N ln n eq , and, more generally, the relative (to the total number) average sizes of the population of saddles at energy level E 0 with instability index k. The latter is described by the annealed cumulative distribution function P k = n (k) / n eq of the instability index k, where n (k) is the number of stationary points at energy level E 0 with instability index up to k.
Instead of extensive parameters µ and E 0 it is more convenient to use their intensive versions, the rescaled coupling strength m (3) and the rescaled energy level The complexity exponent Σ eq can be obtained in a closed form. Referring the reader to Appendix B for details, here we focus on the macroscopic phase diagram which emerges from this calculation, see the left plot in Figure 2. The macroscopic phase space is defined by the zero level line of the complexity exponent Σ eq in the (m, ǫ 0 )-plane. This line consists of two curves (defined in (B.8)) and a straight line below which the random energy landscape has no stationary points for any m > 0. The macroscopic toppling region around the critical point, defined in (13), is depicted in the plot on the right. The dotted line is the boundary line along which toppling of the hierarchy of stability happens, see items a), (b), c) and d) above and Figure 1 where All three elements intersect at a point (m, ǫ 0 ) * ,max = (1, − 1 2q ). The complexity region in the fixed energy model is the area in the (m, ǫ 0 )-plane which is bounded by the straight line m = 0 and the two curves (11). For all parameter values inside this region the complexity exponent Σ eq is positive and the random energy landscape (1) -(2) has, on average, exponentially many stationary points at energy levels in the interval ǫ − (m) < ǫ 0 < ǫ + (m). With the exception of the straight line (12), the complexity exponent Σ eq is negative outside the complexity region. This is the two-dimensional simplicity region. For all parameter values (m, ǫ 0 ) in this region the probability for the random energy landscape to have at least one stationary point is exponentially small unless ǫ 0 = − 1 2q in which case the landscape typically has one stationary point (minimum) at each m > 1.
The glass-like transition from the simplicity to complexity phases happens at the critical point (m, ǫ 0 ) * ,max . In what follows we focus only on the toppling region, depicted in the right plot in Figure 2, which is an area of linear size O( 1 √ N ) around this point: The toppling mechanism manifests itself in the form of annealed cumulative distribution of instability index: where c q = 2q 2 +3 is a constant and The behaviour of function (14) is driven by the sign of ∆ q (δ, ǫ) so the condition ∆ q (δ, ǫ) = 0 describes a boundary line along which toppling takes place (the dotted line in right plot of Figure 2). In the (δ, ǫ)-plane the transition from the simplicity to the complexity regions will occur along any path that starts to the right of the boundary line ǫ = qδ in the region δ ≫ 1, enters the cone bounded by two straight lines and then continues inside this cone to the region of δ ≪ −1. When we take into account only such meaningful paths, we recreate the toppling mechanism in two-dimensional phase space. In particular, the system develops the most probable non-zero instability on crossing the line ǫ = qδ (dotted line in the right plot in Figure 2) while the main features of the annealed probability density of the instability index mirrors that of the unconstrained model presented in Figure 1. Furthermore, connection with the unconstrained model is evident as the annealed probability density in the toppling region in this model model have the same functional form as (14) (compare with the relevant entries of Table 1). Now, consider the p-spin spherical model defined by an energy function where x is an N + 1 dimensional vector constrained to lie on the sphere N +1 i=1 x 2 i = N and p ≥ 2 is a positive integer. Symmetric coupling matrix J and random external field h i are both drawn from Gaussian distributions with vanishing means and variances There, the annealed distribution of k is best described via the cumulative probability P k = n n=0 p n . The corresponding density p(κ) = d dκ P κN is depicted in the bottom three plots. In the toppling region p(κ) undergoes a gradual change when the parameter B increases above the critical threshold B = 0 from being a monotone decreasing function on the interval [0, 1/2] (the likeliest saddles are local minima and maxima) to being a monotone increasing function (the likeliest saddles have N/2 unstable directions). The plots were produced using analytic expressions presented in the third column in Table 2 and evaluated at specific parameter values . Plot (b) was produced with the help of numerical package [22].
In Appendix C we both recall known results and summarize new calculations enabling calculation of asymptotic forms of annealed probabilities across regions a)d), see Table 2 for a summary. In Figure 3 we plot these probabilities in all four regions as a function of an effective variable combining variances J, σ and the parameter p. Importantly, although at first the resulting picture might not resemble Figure 1 plotted for the toy model (1), it is due to development of a dual hierarchy resulting in likewise toppling of both hierarchies simultaneously. Table 2. Annealed distribution P k = N (k) / N eq of the instability index k in the spherical spin-glass model (15)- (16) in the limit of high dimensionality N ≫ 1. The first two scaling regimes are expanded for two different parameter ranges of k due to topological constraint linking stability and instability in the spherical model. ρ edge (λ), ρ sc (λ) and F n (λ) are as defined in Table 1 and Q x is the quantile function of the semicircular law, The mean total number of saddles N eq in all four regimes was obtained in ref. [13].
Density of distribution Total number of distribution P k saddles N eq In contrast to the toppling mechanism of model (1) where around a single minimum in the simple region one hierarchy is developed, in the spherical model, by topological reasons, the simple phase consists instead of two stationary points -a minimum and a maximum. These in turn produce two disjoint stability hierarchies and eventually, in the toppling region both hierarchies are toppled and merged together to create a joint density centered around scaled instability index κ = 1/2. Complexity region is trivial and centered around κ = 1/2 so the toppling mechanism happens on a smaller scale.

Methods
This section provides an outline of the approach we employ to calculate the fractional probabilities (6) in the limit of high dimensions. Technical details of our calculations can be found in Appendices. For simplicity, we only discuss the unconstrained random energy landscape model (1) - (2). Our approach to the other two models, the fixed energy landscape model and the p-spin spherical model is similar and we only provide references to key results relevant to these two models.
The key observation that enables calculation of the fractional probabilities (6) in the limit of high dimensionality is a relation between the mean number N k of stationary points with instability index k = 0, 1, 2, . . . , ... and the probability distribution of the top (k + 1)-st eigenvalue in the Gaussian Orthogonal Ensemble of random matrices. For random energy landscapes (1) - (2) in N-dimensions this relation reads where ρ (k+1) N +1 (λ) is the probability density function of the top (k + 1)-st eigenvalue in GOE N +1 , the Gaussian Orthogonal Ensemble of matrices of size (N + 1) × (N + 1), and m is the rescaled coupling strength (3). To the best of our knowledge, relation (18) has not been stated in the literature apart from the case of local minima (k = 0) [14]. We derive this relation in Appendix A. Analogous formulae for the fixed energy and the p-spin spherical models expressing the mean number of stationary points with instability index k in terms ρ N +1 (λ) applicable, depending on the values of m, either in the bulk, at the spectral edge, or beyond the spectral edge in the large deviations region. Note that ρ N +1 (λ) is the usual mean eigenvalue density: for an infinitesimal δλ, the probability to find an eigenvalue of the GOE N +1 matrix in the interval (λ, λ + δλ) is given by ρ N +1 (λ)δλ.
Asymptotic analysis in the simplicity region can be performed by making use of the large deviation rate function for ρ N +1 (λ), see [14,10]. In this region, asymptotically N 0 = N eq and no approximation of the partial eigenvalue density ρ (k+1) N +1 (λ) is needed. Asymptotic analysis in the hierarchy region can be performed by making use of the Tracy-Widom approximation for eigenvalues at the spectral edge: where F k (σ) is the limiting cumulative distribution function of the appropriately scaled top (k + 1)-st eigenvalue in GOE N in the limit N → ∞ [25,26]. Asymptotic analysis in the toppling and complexity regions can be performed by making use of the Gaussian approximation for eigenvalues at the spectral edge and in the bulk [27,28] where (20) holds with the mean value µ k = q k and variance σ 2 k = log N 2N (1−q 2 k ) expressed in terms of the k-th quantile q k of the Wigner's semicircle law.
The utility of approximations (19) - (20) is in that they allow for an asymptotic evaluation of the integral on the right-hand side in (18). In this way one obtains the basic building blocks which in turn comprise distinct fractional probabilities for each scaling regime in the random energy landscape model. For details we refer the reader to Appendix A. Using a similar approach, the fractional probabilities are obtained for the fixed energy model in Appendix B and for the p-spin spherical model in Appendix C.

Conclusions
In this work we propose a detailed picture of the glass-like transition in the random energy landscape model described through the lens of populations of stationary points. To this end, we work out fractional probabilities of populations of stationary points with fixed number of unstable directions. These fractional probabilities can also be thought of as representing the annealed probability distribution of the instability index of a stationary point picked up at random from the totality of all stationary points. The behaviour of fractional probabilities changes as the system transits from the simple phase with typically very few stationary points to the complex phase with a multitude of stationary points. When exiting the simple phase, the system develops a hierarchy of stability defined as monotonic behaviour of fractional probabilities -the most probable stationary points are local minima, then stationary points with one unstable direction, and so on. This order of stationary points breaks down as the system enters the complex phase, the process that we refer to as the toppling of stability hierarchy. In the vicinity of the transition, we identify toppling as breaking the monotonicity of fractional probabilities of populations or, equivalently, as a change in the hierarchy where the local minima cease to be the most probable stationary points.
Although our analysis is based mainly on one toy model, we argue that the discussed toppling mechanism is likely to be a universal feature of glass-like transition. To this end, we analyse the constrained random energy landscape model (1) with the fixed energy constraint modifying the vicinity of the glassy transition and the spherical spin-glass model (15) introducing a toppling of dual hierarchy. In both cases we find the same toppling of stability hierarchy driving transition between the simple and complex phases. We believe this is a generic feature -considered models are all linked to properties of the underlying random matrices which in turn have known universal properties [29]. At the same time, we would like to emphasize that the transition picture is dependent on two reasonable yet simplifying assumptions of the annealed approximation (6) and the zerotemperature limit. Abandoning these assumptions can introduce new phenomena. As an example, although the toppling phenomenon is also expected to happen for the mixed spherical p-spin model considered in [30], working at finite temperatures introduces new dynamical behaviour.
Finally, systems with asymmetric couplings have recently attracted considerable interest due to their relevance in various contexts, e.g., neural networks or biological and ecological systems. Our approach relies on the exact relation (18) between the average number of saddles with instability index k and the density of the (k + 1)-th eigenvalue of the random matrix in the Kac-Rice integral. This allows for a quantitative of analysis of the transition region in the annealed approximation. In the asymmetric setting no such relation is unknown. This makes extending our approach to the asymmetric settings, such as non-gradient random flows (cf (4)) challenging. Away from the transition region, one can use tools of the large deviation theory for non-Hermitian matrices and work out the annealed density of the instability index for the random flow (21) [15]. Interestingly, in the left tail of the transition region this density is given by exactly the same functional expression as one found in this paper in the toppling region. Therefore, one may expect the toppling mechanism at work for the random flows (21) and their variations [31,32]. However, it is unclear whether the intuition based on our annealed calculations in the symmetric case could be helpful in the asymmetric world at large. For example, recent work [33] on the generalized Lotka-Volterra model with random symmetric interactions argues that adding weak asymmetric interactions completely wipes out marginally stable states and replaces locally stable states with chaotic attractors.

Acknowledgments
We thank F. Bornemann for sharing with us the package RMTFredholmToolbox [22] used to calculate Tracy-Widom distributions.

Appendix A. Random energy landscape model
In this appendix we provide details of our calculations in the case of the random landscape model (1).
Appendix A.1. Calculating N k or the mean number of stationary points with index k Counting statistics for populations of stationary points are given in terms of a formal density where the summation is over all stationary points x (k) * with fixed instability index k. This density is in turn expressed by the celebrated Kac-Rice formula where Heaviside function Θ k is equal to 1 when the Hessian ∂ ij E(x) has exactly k negative eigenvalues and 0 otherwise. We defined three basic counting statistics: • the total number of stationary points, N eq = N k=0 ρ k (x)dx ; • the number of stationary points with instability index k, N k = ρ k (x)dx; • the number of stationary points with instability index up to k, N (k) = k n=0 ρ n (x)dx.
Firstly, we combine the density of stationary points ρ k (x) introduced in (A.1) with the multidimensional Kac-Rice formula: where E(x) is the random function introduced in (1) and the Heaviside function Θ k outputs 1 when the Hessian has exactly k negative eigenvalues and 0 otherwise. Averaging is taken wrt. random field V .
Following [10], the averaging factorize into two terms since the derived fields ∂ i V and ∂ ij V decouple as evidenced by the vanishing of their cross-correlations ∂ i V ∂ kl V = 0: The first term is re-expressed using ∂ ij E = µδ ij + ∂ ij V and an x independent matrix M: where the average δ(∂V (x) + M) V is computed by first representing a multidimensional delta function in terms of the Fourier integral and then integrating out the V dependent term by using an identity [TrP ∂V ] 2 = µ 2 c N (2TrP 2 + (TrP ) 2 ) with µ c = f ′′ (0). Due to the Gaussianity of the field V , the integral does not depend on x and is given by which is the joint pdf for the random matrix M that we average over in (A.3). This random matrix is closely related to the GOE since M is real and symmetric. Although the second term ∼ (TrM) 2 is non-standard, the formula can be recast into a proper GOE through introduction of an additional Gaussian integration and a trivial rescaling. The result reads: where the constant factor N 2π is specified in the µ → ∞ limit while the average is taken over the GOE with joint pdf given by Since the first term (A.5) is independent of x, we integrate out the space-like variable x as long as f ′ (0) < 0. Lastly, we bring together both terms (A.5) and (A.6) and find: where z t = µ + µ c t and the matrix-averaged term reads: is a generalization to k = 0 of equation found in [14]. The Heaviside function is a symmetrized product each for eigenvalues of M: It conditions the matrix M to have exactly N − k positive and k negative eigenvalues.
To compute K k,N we follow the standard approach of random matrix theory [29] and change integration variables from matrix elements M ij to its eigenvalues λ i : The binomial is a combinatorial factor resulting from the symmetrization of the Heaviside step function while z N = c N 2 N π − N(N+1) is the new normalization arising by integrating out the eigenvectors. In Appendix A.2 we derive the formula: N +1 is probability density function of finding the (k + 1)-th largest eigenvalue of a random matrix of size (N + 1) × (N + 1) and C N = . We plug it back to (A.7) with rescaling t → √ 2t − m and obtain the final form given in (18): We readily calculate also formula for the cumulative variant N (k) as a sum: We stress that both formulas (A.11) and (A.12) are exact.
Appendix A.2. Derivation of (A.10) We establish a relation (A.10): . We start off from the l.h.s. given by (A.9): We integrate out the Heaviside functions, rescale λ i = 2µ 2 c N µ i and set z = y 2µ 2 c N to find: with the rescaled quantity given by: This formula is related to the probability that exactly k eigenvalues lie inside an interval J = (y, +∞) (Definition 8.1 in [34]): . Taking a derivative of E N gives the pdf of the k-th largest eigenvalue ρ (k) N : In particular, setting k = 0 gives the pdf of the largest eigenvalue. These formulas are found by using (A.14) as a probability distribution. We sum first k + 1 terms to obtain density ρ On the other hand, the derivative of a single probability function E N +1 is related to the quantitiesκ k,N through: . Due to the telescopic property of the derivative, we sum up k + 1 terms in order to find a formula for a singleκ k,N : where constant prefactor is given by Appendix A.3. Calculating asymptotic forms of N k and N (k) across the transition In this section we derive the asymptotic approximations of the averages (A.11) and (A.12) in four regions detailed in Section 2.2 and Figure 1. All results are summarized in Table 1.
We then calculate the integral (A.11) through the saddle point method. To this end, we find the saddle from g ′ (s; δ) = 0 as s * = √ 2 and expand all terms around s = s * + 1 is the family of Tracy-Widom distributions [25,26] for the (k + 1)-th largest eigenvalue of the GOE. Cumulative mean (A.12) is given by a sum of k + 1 contributions: Region c) m = 1 + δ/N 1/2 , δ ∈ R and k = κN 1/4 . In this region, we use an approximate result found in [27,28]. With k = κN γ and for γ ∈ (0, 1), the (k + 1)-th largest eigenvalue of GOE is asymptotically distributed as: and variance σ k = 2 log k N 1/3 (12πk) 2/3 . To calculate the cumulative mean (A.12) we first calculate the prefactor as: From g ′ (s; δ) = 0 we find the saddle-point at s * = √ 2 and expand integrand around s = s * + σ/ √ N: Now we turn to the asymptotic form of the sum inside the integral: in the general case γ ∈ (0, 1). Since we inspect the N → ∞ asymptotics, we approximate the sum by the Euler-Maclaurin formula: , . In the next step we change variables λ 2/3 = x, λ 1/3 dλ = 3 2 xdx: wheref (x; s) = f (λ = x 3/2 ; s). Since we will eventually expand the sum around s = s * + σ/ √ N , a natural scale is such thatf (x; . This happens when 2/3(γ − 1) = −1/2 or the powers of N in both terms agree. From now on we set γ = 1/4: and must lie within the integration interval x * ∈ (0, κ 2/3 ) otherwise its leading order contribution vanishes. We set and compute the resulting integral: Since θ(x * ) = θ(−σ), θ(κ 2/3 − x * ) = θ(σ + c 2 κ 2/3 ),f 0 (x * ; σ) = 0 andf ′′ 0 (x * ; σ) = 2c 1 c 2 σ. Finally, we obtain We combine (A.22) and (A.23) and plug them back to (A.12) which result in the final formula given in Table 1: . For completeness, we also take its derivative: Appendix A.3.4. Region d) 0 < m < 1, m ∈ O(1) and k = κN. Lastly, we consider the complexity region with an extensive index variable. In this case we use a result found in [27,28] valid for k = κN: . The quantile parameter q k = t −1 (k/N) is the inverse cdf of the Wigner's semicircle law: It is expressed in terms of an inverse incomplete beta function defined through We follow essentially the same steps as in the toppling region c), the sum in (A.12) is asymptotically given by: We find an approximate value for the integral by the saddle point method. First, there are three saddles: where λ 0 * is the extremum. Leading order asymptotics is found when λ 0 * ∈ (0, κ), otherwise the integral is subleading. We expand λ = λ 0 and, after integrating out the x variable, find We evaluate some of the terms given above: We bring these factors together and the sum S κN ( √ Ns) is equal to which is the Wigner's semicircle law truncated at s = √ 2Q κ . We plug back above formula to (A.12): Lastly, the large N contribution to this integral reads: while the prefactor is c N m −N N ∼ 2Ne −N/2−N ln m . Together, they form the final result given in Table 1: where Σ eq (m) = 1 2 (m 2 − 1) − ln m. The corresponding pdf is found by differentiation: where we used d dκ θ(m − Q κ ) = δ(κ − t(m)). Closely related formula was found by a different approach in [11,12].
Appendix A.4. Calculating asymptotic forms of N eq across the transition To obtain the mean number of all stationary points we can either follow the same route as in the previous section when deriving the fixed index case or simply use the fact that it is a special case of the cumulative distribution (A.12). For k = N, we find N eq = N (N ) since the sum of the individual eigenvalues pdf's summed over all eigenvalues gives the total density N k=0 ρ (k+1) N +1 = ρ N +1 : We compute the asymptotics of (A.29) in all four regions detailed in Section 2.2 and Figure 1. Due to similarities between (A.29) and (A.11), (A.12), in many steps we will reuse formulas obtained in Appendix A.3.
The spectral density ρ N is again approximated using the Wigner's semicircle law: The formula above does not depend on the parameter σ so in (A.33) we are left with a Gaussian integral The asymptotic formula for the prefactor of N eq reads:  Table 1: where Σ eq (m) = 1 2 (m 2 − 1) − ln m. The exponential part of this formula was calculated in (18) of [10].

Appendix B. Random energy landscape model with constraint E 0 = E(x)
In this section we consider a variant of the toy model (1): where we restrict to solutions of a fixed energy E 0 . In particular, we introduce the means n k , n eq and n (k) analogous to the quantities introduced in Section 2.1 but with an additional term δ E 0 − E(x) . Since both n eq and n (k) are easily derived from the cumulative mean n k , in what follows we consider only the latter quantity. Firstly, we write down its definition: where the E 0 dependence is explicitly stated. The mean number of stationary points with instability index k N k is related to (B.2) through an integral N k = dE 0 n k (E 0 ) . By essentially the same steps as in Appendix A.1 (also [12]), we arrive at the formula: The cumulative mean n k given by (B.3) is in a form resembling the corresponding quantity for the unconstrained toy model N k given by (18) as closely as possible. The only additional factor is the geometric function G N as the only term where the energy ǫ 0 enters into the formula. We can check that indeed dE 0 G N = 1 and (B.3) is reduced to the previously studied (18). Also, we reduce to (18) in the limit q → ∞ where also G N → 1.
Lastly, we readily find an expression for the number of all stationary points at a fixed energy E 0 : We turn to inspecting the phase space which, in contrast to the toy model (1), is twodimensional as we vary both the coupling strength m and the energy level ǫ 0 . We first study the behaviour of N , in what follows we calculate only the integral with F (s, R; m, ǫ 0 ) = f (s; m)+g(R; m)+h(s, R; m, ǫ 0 ). We first inspect the macroscopic phase space based on analysis of (B.4) as summarized in Figure 2.
where ∆(m, ǫ 0 ) = 2(1 + q 2 ) + q 2 (mq − ǫ 0 ) 2 . The correct saddles were chosen so that both R sp > 0 and s sp > 0. The latter condition s sp > 0 is true for values of m ∈ (0, m c ) with We expand the integral I(m, ǫ 0 ) around s = s sp + σ/N 1/2 , R = R sp + ρ/N 1/2 : where F sp = F (s sp , R sp ) and we skipped the N-independent prefactor to focus on the complexity exponent : with the complexity exponent given by Σ < eq (m, ǫ 0 ) = −F (s sp , R sp ) with function F defined in (B.4). Since m > 0 is positive, when m c becomes negative, the leading contribution to n eq vanishes. From m c = 0 we find an energy threshold (ǫ 0 ) th = − 1+2q 2 2q below which complexity exponent is always negative (gray dot in the left plot in Figure  2).
An implicit equation defines two curves ǫ 0 = ǫ ± (m) (11) in the (ǫ 0 , m)-plane (see the left plot in Figure  2) where the complexity exponent Σ eq changes sign and we move between simple and complex regions. Points on these curves are denoted by ((ǫ 0 ) * , m * ). In particular, for value (ǫ 0 ) * ,max = − 1 2q (dashed line in the left plot in Figure 2) sign change happens at m * ,max = 1.
For ǫ 0 = (ǫ 0 ) * ,max and in the large q limit, we recreate the complexity exponent for and compute the resulting integral I(m, ǫ 0 ): where F was defined in (B.4). From now on, we track only the exponential part. From ∂ s F = 0, ∂ R F = 0 we find the relevant saddles as where from s ′ sp > √ 2 we read off the condition m > m c with m c given by (B.6). The exponential contribution to the integral is I(m, ǫ 0 ) ∼ e −N Fsp with F sp = F (s ′ sp , R ′ sp ) + φ(s ′ sp ). The mean total number is given by . Function F sp = 0 vanishes for ǫ 0 = − 1 2q and m > m c . For different values of ǫ 0 we find F sp > 0 and so the complexity exponent is negative.
Appendix B.1.3. Microscopic scale in the vicinity of (m * ,max , (ǫ 0 ) * ,max ). We expand the complexity exponents Σ < eq and Σ > eq at the boundary m = m c : where from explicit formulas for both complexity exponents we find two first terms in the expansion equal Σ < eq (m c ) = Σ > eq (m c ), Σ < eq (m c ) ′ = Σ > eq (m c ) ′ and the discontinuity happens for the quadratic term Σ < eq (m c ) ′′ = Σ > eq (m c ) ′′ . Hence, the proper microscopic scaling is m = m c + δ/N 1/2 , ǫ 0 = (ǫ 0 ) c + ǫ/N 1/2 with points related by m c = 1 + 1+2q(ǫ 0 )c which we deal with in what follows. In the right plot of Figure 2 we graph a detailed picture of phase space near this critical point.

The result reads
, and parameters a 1 , a 2 defined before. For all values of m besides the critical value m = m * ,max = 1, the exponent function ∆ 0 < 0 is negative so the overall mean number of stationary points n eq is exponentially small in N. Hence, along the line m = m c besides m = 1, the fractional probabilities (6) loose its interpretation when the counting function in the denominator is vanishing asymptotically with N.

Appendix B.2. Calculating cumulative means n (k)
We calculate the cumulative mean n (k) as a sum of (B.3): We have already found asymptotic approximations of these sums in the two relevant cases with k = κN 1/4 for (A.23) and k = κN in (A.27).
Appendix B.2.1. Macroscopic scale m < m c and k = κN. We use the result of (A.27) where Q κ is the quantile function defined as the inverse cdf of the Wigner's semicircle law (A.26). We essentially follow the same steps as in Appendix A.3.3 where the asymptotics of N (k) were found. We combine it with the same saddle point analysis as in the derivation of n eq asymptotics: where the saddle point s sp is given by (B.5).

Appendix B.3. Calculating probabilities P k
We can calculate the probabilities P k provided by both total n eq and cumulative n (k) means for m < m c and around m = m * ,max . These correspond to regions c) and d) introduced in the paper and describe the toppling and complexity regimes respectively. Since the aim of this Appendix is to focus on the toppling mechanism in the fixed energy model (B.1), we altogether skip the simplicity and hierarchy regions a) and b) which are analogous to the unconstrained model. Appendix B.3.1. Region c) m = m * ,max + δ/N 1/2 , ǫ 0 = (ǫ 0 ) * ,max + ǫ/N 1/2 and k = κN 1/4 . We use (6) for the annealed probabilities and simply calculate the ratio of (B.10) and (B.12): where the parameters read .
The corresponding probability formula for the unconstrained toy model (1) given in Table 1 has the same functional form with different parameters C 2 , ∆. An easy check ensures that these parameters reduce in the large q limit lim q→∞ C 2 = c 2 and lim q→∞ ∆ = √ 2δ. The tipping point for the toppling mechanism in this model is when ∆ changes sign or when δ = ǫ/q. The maximum instability index in this model reads . Therefore, the underlying mechanism has the same characteristics as in the unconstrained toy model (1) shown in Figure 1.
Appendix B.3.2. Region d) m < m c . The ratio of (B.7) and (B.11) gives the probability in the complexity region: θ ssp∈(0, and ∆(m, ǫ 0 ) = 2(1 + q 2 ) + q 2 (mq − ǫ 0 ) 2 . We recast the conditions on the saddle point into that for m, ǫ and so s sp > 0 means m > 0, s sp < √ 2 is translated to m < m c while s sp > √ 2Q κ means m > m + with m + known only implicitly. For m ∈ (0, m c ), the denominator in (B.13) is always equal to 1 and moreover we can invert the second inequality θ(s sp > √ 2Q κ ) = θ κ − t(s sp / √ 2) with the Wigner's semicircle law cdf given by (A.26): As a check, we study q → ∞ limit where lim q→∞ s sp = √ 2m which again reduces to the formula found in Table 1. Finally, the cdf reads P κN (m, ǫ 0 ; q) ∼ θ κ − t(s sp / √ 2) while its pdf is a delta function: Appendix C. p-spin spherical model In this section we consider statistics of stationary points in a variant of p-spin spherical model following closely [13]. The energy function we consider reads: where x is now an N+1 dimensional vector constrained to lie on the sphere N +1 i=1 x 2 i = N and p ≥ 2 is an integer. The symmetric coupling matrix J and the random external field h i are both drawn from a Gaussian distribution with mean and variance given by: If we treat the energy itself as a random field, its covariance structure reads: with the correlation functionf (u) = J 2 p u p + σ 2 u. We follow [13] and arrive at a formula for the mean counting function: wherec J,p,σ = 2 √ π Γ( N+1 2 ) 2 N (J 2 + σ 2 ) −N/2 and z t = t J 2 p + σ 2 . Average K ′ is given by where H is a matrix drawn from the GOE with jPDF P (H) ∼ exp − N 4J 2 (p−1) TrH 2 . Lastly, we use a relation (A.10) with replacement µ 2 c → J 2 (p − 1) stemming from a different normalization of the random matrix and find: . We plug it back, rescale t = s 2J 2 (p−1) J 2 p+σ 2 and find . Lastly we introduce a single parameter where the constant reads c ′ We observe that for p ≥ 2, the parameter B ∈ −1, p−2 p . For completeness we also write down the total and the cumulative number of stationary points: Formula for N eq • agrees with (41) of [13]. and total instability k = N due to topological constraints present in the system. In particular, this constraint is most clearly manifested in the total number of all stationary points N eq • being equal to 2 in the simple region a). Closer inspection conducted in [13] reveals that stationary points in the simple region are necessarily a total minimum with instability index k = 0 and a total maximum with k = N. This fact has consequences also in the toppling region c).
Appendix C.4. Calculating probabilities P k With the knowledge of N (k) • and N eq • the annealed probabilities, (6) , are easily computed from the results in the previous two sections. The results are summarized in Table 2. Aforementioned topological constraint is manifested across all regions as symmetries of pdfs upon substitutions k → N − k and κ → 1 − κ. All results in this section are presented in Figure 3 where the symmetry is evident.