
On the atypical solutions of the symmetric binary perceptron


Published 26 April 2024 © 2024 The Author(s). Published by IOP Publishing Ltd
Citation: Damien Barbier et al 2024 J. Phys. A: Math. Theor. 57 195202. DOI: 10.1088/1751-8121/ad3a4a


Abstract

We study the random binary symmetric perceptron problem, focusing on the behavior of rare high-margin solutions. While most solutions are isolated, we demonstrate that these rare solutions are part of clusters of extensive entropy, heuristically corresponding to non-trivial fixed points of an approximate message-passing algorithm. We enumerate these clusters via a local entropy, defined as a Franz–Parisi potential, which we rigorously evaluate using the first and second moment methods in the limit of a small constraint density $\alpha$ (corresponding to vanishing margin $\kappa$) under a certain assumption on the concentration of the entropy. This examination unveils several intriguing phenomena: (i) we demonstrate that these clusters have an entropic barrier in the sense that the entropy as a function of the distance from the reference high-margin solution is non-monotone when $\kappa \unicode{x2A7D} 1.429 \sqrt{-\alpha/\log{\alpha}}$, while it is monotone otherwise, and that they have an energetic barrier in the sense that there are no solutions at an intermediate distance from the reference solution when $\kappa \unicode{x2A7D} 1.239 \sqrt{-\alpha/\log{\alpha}}$. The critical scaling of the margin $\kappa$ in $\sqrt{-\alpha/\log\alpha}$ corresponds to the one obtained from the earlier work of Gamarnik et al (2022, arXiv:2203.15667) for the overlap-gap property, a phenomenon known to present a barrier to certain efficient algorithms. (ii) We establish using the replica method that the complexity (the logarithm of the number of clusters of such solutions) versus entropy (the logarithm of the number of solutions in the clusters) curves are partly non-concave and correspond to very large values of the Parisi parameter, with the equilibrium being reached when the Parisi parameter diverges.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 license. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

1.1. Background and motivation

We consider the symmetric binary perceptron (SBP), introduced in [4], where we let ${\boldsymbol{G}} = (\boldsymbol{g}_a)_{a = 1}^M$ be a collection of M i.i.d. standard Gaussian random vectors in $\mathbb{R}^N$, with $M = \lfloor\alpha N \rfloor$ for a fixed $\alpha>0$. For $\kappa>0$, we consider the set of binary solutions ${\boldsymbol{x}} \in \{-1,+1\}^N$ to the system of linear inequalities

$\big| \langle {\boldsymbol{g}}_a , {\boldsymbol{x}} \rangle \big| \unicode{x2A7D} \kappa \sqrt{N} \quad \textrm{for all } a \in \{1,\ldots,M\}. \qquad$ Equation (1)

We denote the set of solutions by $S({\boldsymbol{G}}, \kappa)$, and its cardinality by

$Z({\boldsymbol{G}}, \kappa) := \big| S({\boldsymbol{G}}, \kappa) \big|. \qquad$ Equation (2)
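As a concrete illustration of the definitions in (1) and (2), the set $S({\boldsymbol{G}},\kappa)$ and its cardinality can be computed by brute force for very small N (a sketch for illustration only; the function name is ours, and the enumeration cost is exponential in N):

```python
import itertools
import math

def count_solutions(G, kappa):
    """Cardinality of S(G, kappa): the number of x in {-1,+1}^N with
    |<g_a, x>| <= kappa * sqrt(N) for every constraint row g_a of G."""
    N = len(G[0])
    count = 0
    for x in itertools.product((-1, 1), repeat=N):
        if all(abs(sum(g[i] * x[i] for i in range(N))) <= kappa * math.sqrt(N)
               for g in G):
            count += 1
    return count
```

Note that the constraints are invariant under ${\boldsymbol{x}} \mapsto -{\boldsymbol{x}}$, so solutions come in pairs and the count is always even.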

It was shown by Aubin et al [4] that $S({\boldsymbol{G}},\kappa)$ is nonempty with high probability if and only if $\kappa \gt \kappa_{\tiny \mathrm{SAT}}(\alpha)$ where $\kappa_{\tiny \mathrm{SAT}}(\alpha)$ is defined by the equation

$\operatorname{\mathbb{P}}\big( |Z| \unicode{x2A7D} \kappa_{\tiny \mathrm{SAT}}(\alpha) \big) = 2^{-1/\alpha}, \quad Z \sim N(0,1). \qquad$ Equation (3)

Moreover, in the limit of small $\alpha$ we have

$\kappa_{\tiny \mathrm{SAT}}(\alpha) = \big(1+o(1)\big) \sqrt{\pi/2}\; 2^{-1/\alpha}. \qquad$ Equation (4)
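A minimal numerical sketch (our code, not from the paper) of how $\kappa_{\tiny \mathrm{SAT}}(\alpha)$ can be evaluated, assuming equation (3) is the condition $\operatorname{\mathbb{P}}(|Z| \unicode{x2A7D} \kappa) = 2^{-1/\alpha}$ for $Z \sim N(0,1)$:

```python
import math

def p(kappa):
    """P(|Z| <= kappa) for Z ~ N(0,1)."""
    return math.erf(kappa / math.sqrt(2.0))

def kappa_sat(alpha):
    """Solve p(kappa) = 2**(-1/alpha) for kappa by bisection
    (assumes the defining relation stated in the lead-in)."""
    target = 2.0 ** (-1.0 / alpha)
    lo, hi = 0.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if p(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

For small α this reproduces the scaling $\kappa_{\tiny \mathrm{SAT}}(\alpha) = \Theta(2^{-1/\alpha})$ noted in the remark after theorem 1, since $p(\kappa) \approx \sqrt{2/\pi}\,\kappa$ for small κ.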

Our main interest is in investigating the possibility of finding solutions efficiently when $\kappa \gt \kappa_{\tiny \mathrm{SAT}}(\alpha)$.

Mézard and Krauth [25] showed in their seminal work using the non-rigorous replica method [31] that the solution landscape of the one-sided perceptron (where there is no absolute value in the constraints (1)) is dominated by isolated solutions lying at large mutual Hamming distances, a structure sometimes called 'frozen replica symmetry breaking' [17, 22, 23, 29, 40]. From the mathematics point of view, the frozen replica symmetry breaking prediction was proven true for the SBP in works by Perkins and Xu [34] and Abbé et al [1], who showed that for all $\kappa \gt \kappa_{\tiny \mathrm{SAT}}(\alpha)$, a solution drawn uniformly at random from $S(\boldsymbol{G},\kappa)$ is isolated with high probability, in the sense that it is separated from any other solution by a Hamming distance linear in N.

This type of landscape property has been traditionally associated with algorithmic hardness, with the rationale that an algorithm performing local moves is unlikely to succeed in the face of such extreme clustering, as argued, for instance, by Zdeborová and Mézard [39], or Huang and Kabashima [22]. In some problems, this predicted algorithmic hardness was confirmed empirically, e.g. [39, 40]. In other problems, a prominent example being the binary perceptron (symmetric or not), it is known that certain efficient heuristics are able to find solutions for α small enough as a function of $\kappa$ [6, 7, 16, 18, 19, 23, 24]. Statistical physics studies of the neighborhood of the solutions returned by efficient heuristics have put forward the intriguing observation that in the binary perceptron problem, a dense region of other solutions surrounds the ones which are returned [5, 9, 10]. This means that efficient algorithms may be drawn to rare, well connected subset(s) of $S(\boldsymbol{G},\kappa)$. Moreover, these efficient algorithms fail to return a solution when α becomes large, suggesting the existence of a computational phase transition in the binary perceptron (symmetric or not).

For the symmetric version of the problem, this state of affairs has been partially elucidated in two recent mathematical works: in [2], Abbé et al show the existence of clusters of solutions of linear diameter for all $\kappa \gt \kappa_{\tiny \mathrm{SAT}}(\alpha)$, and maximal diameter for $\alpha$ small enough. In a different direction, Gamarnik et al [21] established an almost sharp result in the regime of small $\alpha$, stating the following: there exist constants $c_0, c_1 \gt 0$ such that for α small enough,

  • if $\kappa \unicode{x2A7E} c_0 \sqrt{\alpha}$ then a certain online algorithm of Bansal and Spencer [12] finds a solution in $S({\boldsymbol{G}},\kappa)$, and
  • if $\kappa \unicode{x2A7D} c_1 \sqrt{-\alpha/\log(\alpha)}$ then $S({\boldsymbol{G}},\kappa)$ exhibits an overlap-gap property ruling out a wide class of efficient algorithms.

We mention that the positive result which holds for $\kappa \unicode{x2A7E} c_0 \sqrt{\alpha}$ is established in the case where the constraint matrix G is Rademacher instead of Gaussian; nevertheless, the same result is expected in the Gaussian case.

Baldassi et al [8] suggest that this computational transition can be probed by studying the monotonicity properties of the local entropy of solutions around atypical solutions ${\boldsymbol{x}}_0$ as a function of the distance from this solution. One can interpret the results of [8] as evidence towards a conjecture that finding a solution is computationally easy precisely when there exist some rare solutions around which this local entropy is monotone in the distance and that the problem becomes hard when this local entropy develops a local maximum at some distance r0 from the reference solution ${\boldsymbol{x}}_0$. If such a conjecture is correct, then it must agree with the above-mentioned finding of Gamarnik et al [21] in the regime of small $\alpha$. This question motivated the present work.

Another gap in the physics literature we elucidate in this work relates to the fact that the replica method on the one-step replica symmetry breaking level so far has not managed to find clusters of solutions in the binary perceptron. Indeed, the method can count rare clusters as long as they correspond to fixed points of a corresponding message-passing algorithm, see e.g. [38]. Parallels between the 1RSB calculation and the analysis of solutions with a monotonic local entropy have been put forward in [5, 9, 10], but not in the form where one writes the standard 1RSB equations and shows that they have a solution corresponding to rare subdominant clusters. We show that the standard 1RSB framework actually does present such solutions which describe subdominant clusters of extensive entropy, and we give likely reasons why these solutions were missed in past investigations.

1.2. Summary of our results

1.2.1. Local entropy around high margin solutions.

We define and study a notion of local entropy around solutions which are typical at some margin $\kappa_0 \lt \kappa$. While typical solutions at $\kappa_0$ are isolated from each other, it was shown in [2] that they belong to connected components of solutions at margin $\kappa$ having a linear diameter in N. Here, we show that these solutions are surrounded by exponentially many solutions at margin $\kappa$.

Consistently with the statistical physics literature, we say that there is a cluster of extensive entropy around a reference solution ${\boldsymbol{x}}_0$ when the local entropy as a function of the distance achieves a local maximum at some distance from ${\boldsymbol{x}}_0$. We show that for a certain range of $\kappa$ typical solutions at margin $\kappa_0$ have extensive entropy clusters around them. We define the entropy of these clusters as the value of the entropy at a local maximum. An analogous investigation of local entropy around large margin solutions was performed in [11] for the one-sided binary perceptron using the replica method.

In our case, the symmetry of the constraints (1) allows us to derive simpler formulas for the local entropy in the regime of small $\alpha$, essentially via a first moment method. This is due to the present model being contiguous to a corresponding simpler planted model in which the first and second moment computations can be conducted. We show that under a certain assumption on the concentration of the entropy of the SBP, while for any constant value of $\alpha$ the second moment is exponentially larger than the square of the first moment, the exponent of the ratio of these quantities, when normalized by N, tends to zero in the limit of small $\alpha$.

The resulting entropy of these clusters is plotted in figure 1 for various values of $\kappa$ and $\kappa_0$ in the $\alpha$ → 0 limit. We observe that at a certain margin $\kappa_\textrm{entr}(\kappa_0)$ the entropy curve stops because the local entropy curve becomes monotone in the distance for $\kappa \gt \kappa_\textrm{entr}(\kappa_0)$. As discussed above, the existence of reference solutions such that the local entropy curve is monotone was speculated to provoke the onset of a region of parameters where finding solutions is algorithmically easy. In this paper, we show the existence of solutions (those typical at $\kappa_0$) for which the local entropy is monotone, and hence we do not expect the problem to be computationally hard for $\kappa \gt \kappa_\textrm{entr}(\kappa_0)$. In figure 1 we see that the smallest $\kappa$ where this happens is $\kappa_\textrm{entr} \equiv \min_{\kappa_0} \kappa_\textrm{entr}(\kappa_0) = \kappa_\textrm{entr}(\kappa_0 = \kappa_{\tiny \mathrm{SAT}})$. For this reason, a large part of this investigation is devoted to the case $\kappa_0 = \kappa_{\tiny \mathrm{SAT}}(\alpha)$.

Figure 1.

Figure 1. Entropy of clusters that exist at margin κ around a typical solution at margin κ0. We focus on the small α limit where the margins are rescaled as $\kappa = \tilde{\kappa}\sqrt{-\alpha/\log(\alpha)}$ and $\kappa_0 = \tilde{\kappa}_0\sqrt{-\alpha/\log(\alpha)}$. The dashed line corresponds to the envelope of the ending points for all values of κ0, i.e. to the entropies at which the clusters (non-trivial AMP/TAP fixed points) disappear. In particular, we observe that the clusters with $\tilde{\kappa}_0 = \tilde \kappa_\textrm{SAT} = 0$ disappear first, thus marking a threshold of $\tilde{\kappa}_\textrm{entr} \approx 1.429$ above which the so-called 'wide-flat-minima' of [5, 11] exist.


Motivated by these findings, we then study the local entropy of solutions that are at a Hamming distance Nr from the solution planted at $\kappa_0 = \kappa_{\tiny \mathrm{SAT}}(\alpha)$. This is akin to the Franz–Parisi potential as studied in the physics of spin glasses [20]. Here, we compute this potential around a typical solution at $\kappa_0$. Our findings, again in the regime of small $\alpha$, are summarized in figure 2(left), where it is apparent that the local entropy as a function of the distance r from a reference solution is monotone when $\kappa \unicode{x2A7E} \tilde{\kappa}_\textrm{entr} \sqrt{-\alpha/\log(\alpha)}$ and has a local maximum at an intermediate distance r0 when $\kappa \lt \tilde{\kappa}_\textrm{entr} \sqrt{-\alpha/\log(\alpha)}$, with $\tilde{\kappa}_\textrm{entr} \approx 1.429$ given by implicit equations (92) and (93). We also show that no solutions can be found in an interval of distances from the reference solution when $\kappa \lt \tilde{\kappa}_\textrm{ener} \sqrt{-\alpha/\log(\alpha)}$ with $ \tilde{\kappa}_\textrm{ener} \approx 1.239$ given by the implicit equations (87) and (88).

Figure 2.

Figure 2. On the left panel we plot the local entropy $\phi_1(r) = h(1-2r) + \alpha \varphi_1(1-2r)$ in the small α limit as a function of the rescaled distance $\tilde{r} = -4r\log(\alpha)/\alpha$; see section 2. Each curve corresponds to a different value of $\kappa$ (with $\tilde{\kappa} = \kappa/\sqrt{-\alpha/\log(\alpha)}\,$), where we have set $\kappa_0 = \kappa_{\textrm{SAT}}(\alpha)$ and thus $\kappa_0/\kappa{\rightarrow}0$ with $\alpha \rightarrow 0$. First, we highlight the value $\tilde{\kappa}_\textrm{ener}$ above which the entropy remains positive for any distance $\tilde{r}\gt0$. Then, we have a second critical value for $\tilde{\kappa}$ that we label $\tilde{\kappa}_\textrm{entr}$. It corresponds to the minimum value of $\tilde{\kappa}$ for which the local entropy has a local maximum with $\tilde{r}\neq 0$. On the right panel we plot the complexity $\Sigma[\tilde{\kappa}_0]$ as a function of the local entropy $s[\tilde{\kappa}_0,\tilde{\kappa}]$ (with $\tilde{\kappa}_0 = \kappa_0/\sqrt{-\alpha/\log(\alpha)}\,$). In this context, the complexity counts, at the exponential scale, the number of possible planted configurations at $\kappa_0$. The entropy corresponds to the local entropy $\phi_1(r)$ evaluated at its local maximum in $\tilde{r}$ for fixed $\tilde{\kappa}_0$ and $\tilde{\kappa}$. To obtain an O(1) scaling of the complexity in the low α limit we subtracted $\Sigma_o = \log(2)+\alpha\log(-\alpha/\log(\alpha))$. Finally, for $\tilde{\kappa}\gt\tilde{\kappa}_\textrm{entr}$ a range of values of $\tilde{\kappa}_0$ yields no local maximum for the local entropy. This explains the sudden stop in the right-hand tail of the purple and brown curves (for which $\tilde{\kappa}\gt\tilde{\kappa}_\textrm{entr}$).


From these results, we note the existence of a logarithmic gap in $1/\alpha$ between the value of κ where the local entropy curve becomes monotone and the value where the Bansal–Spencer algorithm is proved to succeed, in the regime of small α. It is an interesting open problem to close this gap, either by showing that efficient algorithms can find solutions for all $\kappa \unicode{x2A7E} \tilde{\kappa}_\textrm{entr} \sqrt{-\alpha/\log(\alpha)}$ or by showing that the local entropy approach is not indicative of algorithmic hardness.

1.2.2. The 1RSB computation of the complexity curve.

We note that in the statistical physics literature, clusters as defined above are also associated with a fixed point of the approximate message passing (AMP) algorithm or equivalently the Thouless–Anderson–Palmer (TAP) equations. The cluster entropy can thus be computed as the Bethe entropy corresponding to the AMP/TAP fixed point that is reached by AMP run at κ and initialized in one of the typical solutions at margin κ0. For $\kappa \gt \kappa_\textrm{entr}(\kappa_0)$ the AMP/TAP iteration converges to the same fixed point as would be reached from a random initialization, corresponding to an entropy covering the whole space of solutions. Using this relation, the onset of a region where algorithms may be able to find these solutions is then related to the existence of solutions such that an AMP/TAP iteration initialized at these points converges to the same fixed point as if the iteration was initialized uniformly at random from $S({\boldsymbol{G}},\kappa)$. Indeed, it was observed empirically that solutions found by efficient algorithms always have such a property of AMP/TAP or the belief propagation algorithm converging to the same fixed point as from a random initialization [15, 28].

In the existing statistical physics literature, using the replica method on the one-step replica symmetry breaking level, researchers so far have not found clusters of solutions of extensive entropy in the binary perceptron. This is a point of concern as this method is supposed to count all clusters of solutions corresponding to the TAP/AMP fixed points, including the rare non-equilibrium ones [17, 26, 30, 32, 38]. This a priori casts doubt on the efficacy of the replica method and the validity of its predictions for the number of clusters of a given size, since the method misses a large part of the phase space (unless some explicit conditioning is done as in [9, 10]).

We propose, based on the replica method, that the answer to this question lies in the properties of the complexity (the logarithm of the number of clusters) versus entropy (the logarithm of the number of solutions in the clusters) $\Sigma(s)$. We observe that the numerical value of the complexity is rather large compared to the entropy. The slope of $\Sigma(s)$ gives the value of the so-called Parisi parameter x, which is therefore rather large: $x \gg 1$. Since the value of x describing the equilibrium properties of the system is always between 0 and 1, it is not that surprising that the literature has not investigated solutions of the replica equations corresponding to $x \gg 1$. When we consider a large range of values of x in the standard 1RSB equations for the SBP [4], we obtain the $\Sigma(s)$ depicted in figure 2 (right). We then provide an argument that leads us to conjecture that in the small α limit, the curve $\Sigma(s)$ corresponds to the one we obtain via the approach of planting at κ0. Thus, even though, in general, by planting we construct only some of the rare clusters, it seems that in fact we construct the most frequent ones in the limit of small α.
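The statement that the slope of $\Sigma(s)$ gives the Parisi parameter is the usual Legendre structure of the 1RSB cluster-counting formalism (a schematic reminder in our notation, not a formula from this paper): weighting each cluster of internal entropy $s_c$ by $e^{N x s_c}$ selects

```latex
\frac{1}{N}\log \sum_{\text{clusters } c} e^{\,N x\, s_c}
\;\simeq\; \sup_{s}\bigl[\,\Sigma(s) + x\, s\,\bigr],
\qquad
\Sigma'\bigl(s^{*}(x)\bigr) = -x ,
```

so the portion of $\Sigma(s)$ describing the rare clusters of small internal entropy has a steep slope and is reached only for $x \gg 1$.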

Another property that we unveil is related to the fact that the curve $\Sigma(s)$ is usually expected to be concave. The non-concave parts were so far considered 'unphysical' in the literature (e.g. figure 8 in [14] or figure 5 in [38]). We show in our present work that the so-called 'unphysical branch' of the replica/cavity prediction is actually not 'unphysical' in the SBP and that it reproduces the curve $\Sigma(s)$ obtained from the local entropy calculation at small α and small internal cluster entropy. Moreover, we show that some of the relevant parts of the curve $\Sigma(s)$ cannot be obtained in the usual iterative way of solving the 1RSB equations at a fixed value of the Parisi parameter x. To access this part of the curve we need to adjust the value of x adaptively in every step when solving the 1RSB fixed point equations iteratively.

1.3. Organization of the paper and the level of rigour

The rest of the paper is organized as follows: section 2 defines the local entropy and states the main theorem 1 in the small α limit. Section 3 introduces the planted model and its contiguity to the original model, a key element of the proof. Section 4 contains the moment computations in the planted model, ending with the proof of theorem 1. In section 5 we use the result of theorem 1 and study the properties of the asymptotic formula of the local entropy in the small α limit. In section 6 we study the one-step-replica-symmetry breaking solution of the SBP and its relation to the local entropy. This section investigates general values of α, not only the small α limit. Finally, we conclude in section 7.

Sections 2–4 are fully mathematically rigorous. In section 5 we analyze the resulting local entropy formula heuristically, solving the corresponding fixed point equations numerically, and deriving the numerical values for the energetic and the entropic thresholds. In section 6 we rely on the replica method which is well-accepted and widely used in theoretical statistical physics but not rigorously justified from the mathematical standpoint.

2. Definitions and main theorem

In this paper, the local entropy is defined around a solution satisfying the SBP inequalities (1) with a stricter margin κ0. More precisely, for $\kappa_0 \unicode{x2A7D} \kappa$, let ${\boldsymbol{x}}_0 \in S(\boldsymbol{G}, \kappa_0)$, and let $Z({\boldsymbol{x}}_0 , \kappa , r)$ be the number of solutions $\boldsymbol{y}\in S(\boldsymbol{G}, \kappa)$ which are at Hamming distance $N r$ from ${\boldsymbol{x}}_0$:

$Z({\boldsymbol{x}}_0 , \kappa , r) = \Big| \Big\{ \boldsymbol{y} \in S({\boldsymbol{G}}, \kappa) \,:\, d_{\mathrm{H}}({\boldsymbol{x}}_0 , \boldsymbol{y}) = \lfloor N r \rfloor \Big\} \Big|. \qquad$ Equation (5)

We then define the local entropy function as the (truncated) logarithm of Z averaged over the choice of ${\boldsymbol{x}}_0$ and the disorder G :

$\phi_{N,\delta}(r) = \frac{1}{N} \operatorname{\mathbb{E}}\big[ \log_{N\delta} Z({\boldsymbol{x}}_0 , \kappa , r) \big], \qquad$ Equation (6)

where $\log_{N\delta}(x) = \max\{\log (x),N \delta\}$, δ > 0. This truncation to the logarithm is technically convenient, following [33, 36]. Note that for $\kappa_0 = \kappa$, the fact that there are no solutions at a distance less than $r_0N$ around ${\boldsymbol{x}}_0$ for some $r_0 = r_0(\kappa,\alpha)$ with high probability [34] implies that $\phi_{N,\delta}(r) = \delta + o_N(1)$ for all $r \lt r_0$, and so $\lim_{\delta \to 0} \lim_{N\to \infty} \phi_{N,\delta}(r) = 0$ for $r\lt r_0$. However, as we increase κ starting from κ0, new nearby solutions are expected to emerge. These are the solutions which are counted by $\phi_{N,\delta}(r)$. This, of course, does not contradict the frozen-1RSB property of $S({\boldsymbol{G}},\kappa)$ since ${\boldsymbol{x}}_0$ is not typical in $S({\boldsymbol{G}},\kappa)$.

We show that under a certain concentration condition, assumption 1 stated in section 3, the local entropy $\phi_{N,\delta}(r)$ is given in the limit $N \to \infty$ followed by α → 0 then δ → 0 by a simple formula which corresponds to the first moment bound (i.e. annealed entropy) in the corresponding planted model of the SBP. We define the binary entropy function

$h(m) = -\frac{1+m}{2} \log\left(\frac{1+m}{2}\right) - \frac{1-m}{2} \log\left(\frac{1-m}{2}\right), \qquad$ Equation (7)

and

$\varphi_1(m) = \operatorname{\mathbb{E}}\Big[ \log \operatorname{\mathbb{P}}\Big( \big| m Z_0 + \sqrt{1-m^2}\, Z \big| \unicode{x2A7D} \kappa \;\Big|\; Z_0 \Big) \Big], \qquad$ Equation (8)

where the outer expectation is taken with respect to $Z_0 \sim N(0,1)$ conditioned on the event $|Z_0| \unicode{x2A7D} \kappa_0$, and $Z \sim N(0,1)$ independently of Z0. (Z0 has p.d.f. f; equation (12)).
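A small numerical sketch of this quantity (our code; it assumes equation (8) is the conditional-probability form $\operatorname{\mathbb{E}}\log\operatorname{\mathbb{P}}(|m Z_0 + \sqrt{1-m^2}\,Z| \unicode{x2A7D} \kappa \mid Z_0)$ consistent with the description above):

```python
import math

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def varphi1(m, kappa, kappa0, n=2000):
    """Midpoint-rule evaluation of E log P(|m*Z0 + sqrt(1-m^2)*Z| <= kappa | Z0),
    with Z0 ~ N(0,1) conditioned on |Z0| <= kappa0 and Z ~ N(0,1) independent."""
    s = math.sqrt(1.0 - m * m)
    step = 2.0 * kappa0 / n
    total, weight = 0.0, 0.0
    for i in range(n):
        z0 = -kappa0 + (i + 0.5) * step
        w = math.exp(-0.5 * z0 * z0)  # truncated-Gaussian density (normalization cancels)
        prob = Phi((kappa - m * z0) / s) - Phi((-kappa - m * z0) / s)
        total += w * math.log(prob)
        weight += w
    return total / weight
```

At m = 0 the conditional probability does not depend on Z0, so $\varphi_1(0) = \log p(\kappa)$ with $p(\kappa) = \operatorname{\mathbb{P}}(|Z| \unicode{x2A7D} \kappa)$, matching the identity used in the proof of lemma 4.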

Theorem 1. For $m \in (-1,1)$, $r = (1-m)/2$ and any sequence $\kappa = \kappa(\alpha) \to 0$ as α → 0 such that $\alpha \log(1/\kappa) \to 0$, under assumption 1 we have

Equation (9)

Remark. Observe that for α small, we have $\kappa_{\tiny \mathrm{SAT}}(\alpha) = \Theta(2^{-1/\alpha})$, therefore the condition on κ and α in the theorem can be interpreted as $\kappa \gg \kappa_{\tiny \mathrm{SAT}}(\alpha)$.

The proof of theorem 1 can be found in section 4.

3. The planted model and contiguity

The analysis of the local entropy is achieved via a planted model where ${\boldsymbol{x}}_0$ is drawn uniformly at random from the hypercube $\{-1,+1\}^N$ and then the constraint vectors ${\boldsymbol{g}}_a$ are drawn from the Gaussian distribution conditional on ${\boldsymbol{x}}_0$ being a satisfying configuration, i.e. conditional on ${\boldsymbol{x}}_0 \in S({\boldsymbol{G}},\kappa_0)$.

More precisely, we fix the reference (planted) vector ${\boldsymbol{x}}_0\in \{-1,+1\}^N$ and for each $a \in \{1,\ldots,M\}$ we independently draw Gaussian random vectors ${\boldsymbol{g}}_a$ conditioned on the event that

$\big| \langle {\boldsymbol{g}}_a , {\boldsymbol{x}}_0 \rangle \big| \unicode{x2A7D} \kappa_0 \sqrt{N}. \qquad$ Equation (10)

Equivalently, we can write

${\boldsymbol{g}}_a = \left( {\boldsymbol{I}}_N - \frac{{\boldsymbol{x}}_0 {\boldsymbol{x}}_0^{\top}}{N} \right) \tilde{\boldsymbol{g}}_a + w_a \frac{{\boldsymbol{x}}_0}{\sqrt{N}}, \qquad$ Equation (11)

where $(\tilde{\boldsymbol{g}}_a)_{a = 1}^M$ are independent $N(0, {\boldsymbol{I}}_N)$ random vectors and ${\boldsymbol{w}} = (w_a)_{a = 1}^M$ has mutually independent coordinates, independent of $(\tilde{{\boldsymbol{g}}}_a)_{a = 1}^M$, and distributed as $N(0,1)$ r.v.'s conditioned to be smaller than κ0 in absolute value, i.e. they have a p.d.f.

$f(w) = \frac{e^{-w^2/2}\, \mathbf{1}\big\{ |w| \unicode{x2A7D} \kappa_0 \big\}}{\int_{-\kappa_0}^{\kappa_0} e^{-u^2/2}\, \mathrm{d}u}. \qquad$ Equation (12)
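A sketch of this sampling procedure (our code; we rejection-sample the truncated Gaussian $w_a$ and assume the orthogonal-decomposition form described around equation (11)):

```python
import math
import random

def planted_row(x0, kappa0, rng):
    """One constraint vector g_a ~ N(0, I_N) conditioned on |<g_a, x0>| <= kappa0*sqrt(N):
    replace the component of a fresh Gaussian along x0 by w*x0/sqrt(N), where
    w ~ N(0,1) is conditioned on |w| <= kappa0 (the p.d.f. f of equation (12))."""
    N = len(x0)
    gt = [rng.gauss(0.0, 1.0) for _ in range(N)]    # \tilde{g}_a ~ N(0, I_N)
    c = sum(gt[i] * x0[i] for i in range(N)) / N    # coefficient of gt along x0
    while True:                                     # rejection sampling of w
        w = rng.gauss(0.0, 1.0)
        if abs(w) <= kappa0:
            break
    return [gt[i] - c * x0[i] + w * x0[i] / math.sqrt(N) for i in range(N)]
```

By construction $\langle {\boldsymbol{g}}_a , {\boldsymbol{x}}_0 \rangle = w_a \sqrt{N}$, so the planted constraint holds exactly; rejection sampling is only practical when κ0 is not too small.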

We let $\operatorname{\mathbb{P}}_{\tiny{pl}}$ be the distribution of the pair $({\boldsymbol{G}},{\boldsymbol{x}}_0)$ as per the description above, equation (10), and $\operatorname{\mathbb{P}}_{\mathrm{\tiny rd}}$ be their distribution according to the original model where ${\boldsymbol{G}} \in \mathbb{R}^{M \times N}$ is an array of standard Gaussian vectors and ${\boldsymbol{x}}_0$ is drawn uniformly at random from $S({\boldsymbol{G}},\kappa)$, conditional on the latter being non-empty. We denote by $\operatorname{\mathbb{E}}_{\tiny{pl}}$ and $\operatorname{\mathbb{E}}_{\tiny rd}$ the associated expectations. A simple computation reveals that the ratio of $\operatorname{\mathbb{P}}_{\tiny{pl}}$ to $\operatorname{\mathbb{P}}_{\tiny rd}$ is given by

$\frac{\mathrm{d} \operatorname{\mathbb{P}}_{\tiny{pl}}}{\mathrm{d} \operatorname{\mathbb{P}}_{\tiny rd}}\big({\boldsymbol{G}},{\boldsymbol{x}}_0\big) = \frac{Z({\boldsymbol{G}},\kappa_0)}{\operatorname{\mathbb{E}}\big[ Z({\boldsymbol{G}},\kappa_0) \big]}. \qquad$ Equation (13)

It was shown in [2] in the case of binary disorder where G has independent Rademacher entries that the above likelihood ratio has constant order log-normal fluctuations for all $\kappa_0 \gt \kappa_{\tiny \mathrm{SAT}}(\alpha)$; this implies in particular that $\operatorname{\mathbb{P}}_{\tiny rd}$ and $\operatorname{\mathbb{P}}_{\tiny{pl}}$ are mutually contiguous, meaning that for any sequence of events En (in the common probability space of $\operatorname{\mathbb{P}}_{\tiny rd}$ and $\operatorname{\mathbb{P}}_{\tiny{pl}}$), $\operatorname{\mathbb{P}}_{\tiny rd}(E_n) \to 0$ if and only if $\operatorname{\mathbb{P}}_{\tiny{pl}}(E_n) \to 0$, see for instance [37, lemma 6.4]. In other words, any high-probability event under the planted distribution $\operatorname{\mathbb{P}}_{\tiny{pl}}$ is also a high-probability event under the original distribution $\operatorname{\mathbb{P}}_{\tiny \mathrm{rd}}$.

Contiguity allows us to compute the local entropy in the planted model, where ${\boldsymbol{x}}_0$ is uniformly distributed over $\{-1,+1\}^N$ instead of $S({\boldsymbol{G}},\kappa_0)$, and then to transfer the result of this computation to the original model. While contiguity is currently not known to hold when the disorder is Gaussian, Perkins and Xu [34] showed that under a certain numerical assumption (see assumption 1 therein),

$\frac{1}{N} \log Z({\boldsymbol{G}},\kappa) \longrightarrow \log 2 + \alpha \log \operatorname{\mathbb{P}}\big( |Z| \unicode{x2A7D} \kappa \big), \quad Z \sim N(0,1), \qquad$ Equation (14)

in $\operatorname{\mathbb{P}}_{\tiny rd}$-probability for all $\kappa \gt \kappa_{\tiny \mathrm{SAT}}(\alpha)$. As observed in [3, 34] this implies the weaker statement that events of probability $e^{-c N}$, c > 0 under $\operatorname{\mathbb{P}}_{\tiny{pl}}$ are of probability $o_N(1)$ under $\operatorname{\mathbb{P}}_{\tiny rd}$. This turns out to be sufficient for our purposes. This argument is used to prove lemma 2 below.

In addition to the above, we require a concentration property of the restricted partition function $Z({\boldsymbol{x}}_0,\kappa,r)$ with respect to the disorder G , which we state in more general form as follows: let $a_j\lt b_j$, $1 \unicode{x2A7D} j \unicode{x2A7D} M$ be two sequences of real numbers, let $m \in [-1,1]$ and consider the partition function

$Z = \sum_{\substack{{\boldsymbol{x}} \in \{-1,+1\}^N \\ \langle {\boldsymbol{x}}, \mathbf{1} \rangle = \lfloor N m \rfloor}} \prod_{j = 1}^{M} \mathbf{1}\left\{ a_j \unicode{x2A7D} \frac{\langle {\boldsymbol{g}}_j , {\boldsymbol{x}} \rangle}{\sqrt{N}} \unicode{x2A7D} b_j \right\}, \qquad$ Equation (15)

where ${\boldsymbol{g}}_j$ are i.i.d. standard Gaussian random vectors in $\mathbb{R}^N$.

Assumption 1. For any δ > 0, $m \in [-1,1]$ and sequences $(a_j), (b_j)$ as above, there exists a constant C > 0 depending only on δ and $\Delta := \max_j (b_j-a_j)$ such that for all t > 0,

$\operatorname{\mathbb{P}}\left( \left| \frac{1}{N} \log_{N\delta} Z - \operatorname{\mathbb{E}}\left[ \frac{1}{N} \log_{N\delta} Z \right] \right| \unicode{x2A7E} t \right) \unicode{x2A7D} C\, e^{-N \min\{t^2 , t\}/C}. \qquad$ Equation (16)

In models of disordered systems where the free energy is a smooth function of the Gaussian disorder, this concentration follows from general principles of Gaussian concentration of Lipschitz functions, see e.g. [13]. In particular, a stronger version of the above assumption (with no truncation to the logarithm and where the decay on the right-hand side is sub-Gaussian for all t > 0) holds for the SK and p-spin models at any positive temperature, and for the family of U-perceptrons where the activation function U is positive and differentiable with bounded derivative. However, in our case the hard constraints defining the model make concentration far less obvious. Currently, exponential concentration of the truncated log-partition function is known for the half-space model i.e. the one-sided perceptron [36], and for the more general family of U-perceptrons which includes the SBP model under study here, albeit with a non-optimal exponent in N on the right-hand side of equation (16), and with an additional slowly vanishing term on the right-hand side; see [33, proposition 4.5]. (The latter paper also studies concentration and the sharp-threshold phenomenon for more general disorder distributions.) For our purposes, an essential feature is exponential decay in $N\theta(t)$ where $\theta: \mathbb{R}_{+} \to \mathbb{R}_{+}$ is any increasing function with $\theta(0) = 0$. We assume $\theta(t) = \min\{t^2,t\}$ in the above since this is the sub-exponential tail which is expected, but this is not crucial to the proof. Establishing the above assumption is an interesting mathematical problem on its own and goes beyond the scope of this paper.
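The general principle alluded to here is Gaussian concentration for Lipschitz functions (the Borell–Tsirelson–Ibragimov–Sudakov inequality); a standard statement, included for the reader's convenience: if $F:\mathbb{R}^n \to \mathbb{R}$ is L-Lipschitz and ${\boldsymbol{g}} \sim N(0,{\boldsymbol{I}}_n)$, then

```latex
\operatorname{\mathbb{P}}\Bigl( \bigl| F({\boldsymbol{g}}) - \operatorname{\mathbb{E}} F({\boldsymbol{g}}) \bigr| \unicode{x2A7E} t \Bigr)
\;\unicode{x2A7D}\; 2\, e^{-t^2 / (2 L^2)} \qquad \text{for all } t > 0.
```

The hard indicator constraints in the partition function (15) make $\log Z$ non-Lipschitz in the disorder, which is why assumption 1 does not follow directly from this principle.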

In the planted model, the local entropy takes the simplified form

$\phi^{\tiny{pl}}_{N,\delta}(r) = \frac{1}{N} \operatorname{\mathbb{E}}_{\tiny{pl}}\big[ \log_{N\delta} Z({\boldsymbol{x}}_0 , \kappa , r) \big], \qquad$ Equation (17)

where the expectation is with respect to ${\boldsymbol{x}}_0$ taken uniformly in $\{-1,+1\}^N$ and the conditional distribution ${\boldsymbol{G}} | {\boldsymbol{x}}_0 $ is given by equation (11). We now show that under assumption 1, $\phi_{N,\delta}(r)$ and $\phi^{\tiny{pl}}_{N,\delta}(r)$ are close:

Lemma 2. Under assumption 1 we have for all $r \in (0,1)$,

$\lim_{N \to \infty} \big| \phi_{N,\delta}(r) - \phi^{\tiny{pl}}_{N,\delta}(r) \big| = 0. \qquad$ Equation (18)

Proof. We define the random variable $X = (1/N) \log_{N\delta} Z({\boldsymbol{x}}_0, \kappa, r)$. We have $ \operatorname{\mathbb{E}}_{\tiny{pl}}[X] = \phi^{\tiny{pl}}_{N,\delta}(r)$ and $\operatorname{\mathbb{E}}_{\tiny rd}[X] = \phi_{N,\delta}(r)$.

Now for t > 0 fixed, we consider the event $A = \big\{\big|X -\operatorname{\mathbb{E}}_{\tiny{pl}}[X] \big| \unicode{x2A7D} t\big\}$. Under the planted model $\operatorname{\mathbb{P}}_{\tiny{pl}}$ we may assume that ${\boldsymbol{x}}_0 = \mathbf{1}$ by symmetry of the Gaussian distribution. Therefore by assumption 1 (with $\Delta = (1-2r)\kappa$) we have $\operatorname{\mathbb{P}}_{\tiny{pl}}(A^c) \unicode{x2A7D} e^{-c N}$, $c = c(t)\gt0$. We show that this combined with (14) implies $\operatorname{\mathbb{P}}_{\tiny rd}(A^c) = o_N(1)$. Indeed for any ε > 0,

Equation (19)

Equation (20)

where the $o_N(1)$ bound on the second term follows from (14). Taking $\varepsilon = c/2$ shows that $\operatorname{\mathbb{P}}_{\tiny rd}(A^c) = o_N(1)$. Further, observe that $0 \unicode{x2A7D} X \unicode{x2A7D} \log 2$, $\operatorname{\mathbb{P}}_{\tiny rd}$-almost surely. Therefore we have

$\big| \phi_{N,\delta}(r) - \phi^{\tiny{pl}}_{N,\delta}(r) \big| = \big| \operatorname{\mathbb{E}}_{\tiny rd}[X] - \operatorname{\mathbb{E}}_{\tiny{pl}}[X] \big| \unicode{x2A7D} t + \log 2 \cdot \operatorname{\mathbb{P}}_{\tiny rd}(A^c) = t + o_N(1). \qquad$ Equation (21)

The claim follows by letting t → 0 after $N \to \infty$. □

4. Moment estimates in the planted model

Now we aim to calculate the limit of $\phi^{\tiny{pl}}_N(r)$ as $N\to \infty$ for small α. To this end we evaluate the first two moments of $Z({\boldsymbol{x}}_0, \kappa, r)$ and show that the second moment is only larger than the square of the first moment by an exponential factor which shrinks as α → 0. Then we show that $\phi^{\tiny{pl}}_N(r)$ is close to its annealed approximation using assumption 1.

We first need to define two auxiliary functions. For a jointly distributed pair of discrete random variables $(\theta_1,\theta_2)$ let $h(\theta_1,\theta_2)$ be their Shannon entropy. For $m,q \in (-1,1)$ we define the function

Equation (22)

where $Z_0 \sim f$ and the pair $(Z_1,Z_2)$ is a centered bivariate Gaussian vector independent of Z0 with covariance

$\mathrm{Cov}(Z_1 , Z_2) = \begin{pmatrix} 1 & \frac{q - m^2}{1 - m^2} \\ \frac{q - m^2}{1 - m^2} & 1 \end{pmatrix}. \qquad$ Equation (23)

Theorem 3. Let ${\boldsymbol{w}} = (w_a)_{a = 1}^M$ be as in equation (11). For $m \in (-1,1)$, $r = (1-m)/2$ we have

$\lim_{N \to \infty} \frac{1}{N} \log \operatorname{\mathbb{E}}\big[ Z({\boldsymbol{x}}_0 , \kappa , r) \,\big|\, {\boldsymbol{w}} \big] = h(m) + \alpha\, \varphi_1(m), \qquad$ Equation (24)

Equation (25)

where $\varphi_1$ is defined in equation (8), and the inner maximization in equation (25) is over the joint distribution of two $\{-1,+1\}$-valued random variables $(\theta_1,\theta_2)$ such that $\operatorname{\mathbb{E}}[\theta_1] = \operatorname{\mathbb{E}}[\theta_2] = m$ and $\operatorname{\mathbb{E}}[\theta_1\theta_2] = q$.
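Since the two means and the correlation determine the law of a pair of $\{-1,+1\}$-valued random variables uniquely, the entropy term appearing here is an explicit function of (m, q). A quick numerical check of this and of the sub-additivity bound used in the proof of lemma 4 below (helper names are ours):

```python
import math

def binary_entropy(m):
    """Shannon entropy of a single ±1 variable with mean m."""
    pp, pm = (1.0 + m) / 2.0, (1.0 - m) / 2.0
    return -sum(p * math.log(p) for p in (pp, pm) if p > 0)

def joint_entropy(m, q):
    """Shannon entropy h(theta1, theta2) of the unique pair of ±1 variables with
    E[theta1] = E[theta2] = m and E[theta1*theta2] = q; the joint law is
    P(a, b) = (1 + a*m + b*m + a*b*q) / 4 for a, b in {-1, +1}."""
    probs = [(1 + 2 * m + q) / 4.0, (1 - q) / 4.0,
             (1 - q) / 4.0, (1 - 2 * m + q) / 4.0]
    return -sum(p * math.log(p) for p in probs if p > 0)
```

Here `joint_entropy(m, m*m)` equals `2 * binary_entropy(m)` (independent components), and the joint entropy is strictly smaller for other admissible q, which is exactly the equality case of the sub-additivity bound (29).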

The proof of the above theorem relies on a standard use of Stirling's formula, and is postponed to the end of this section. At this point, if the right-hand side of equation (25) were equal to twice the right-hand side of equation (24), a mild concentration argument would allow us to conclude that $\phi^{\tiny{pl}}_N(r)$ is given by equation (24) in the large N limit. This equality would follow if the value $q = m^2$ were a maximizer in equation (25). This does not appear to be the case for any values of $\alpha,\kappa_0,\kappa$. However, we show that the difference vanishes as α → 0. Let

Equation (26)

Equation (27)

Lemma 4. Assume $\kappa_0\lt1$ and $\kappa^2 \unicode{x2A7E} \kappa_0^2/(1-\kappa_0^2)$. Then for all $m \in (-1,1)$,

Equation (28)

where $p(\kappa) = \operatorname{\mathbb{P}}\big(|Z| \unicode{x2A7D}\kappa\big)$, $Z \sim N(0,1)$. In particular the above difference tends to zero whenever $\alpha \to 0, \kappa \to 0$ with $\alpha \log(1/\kappa) \to 0$, and $\kappa_0 \ll \kappa$.

Proof. We first remark that by sub-additivity of the entropy,

Equation (29)

with equality if and only if the pair $(\theta_1,\theta_2)$ is independent, i.e. if $q = m^2$. Moreover we remark that for all $q \in [-1,1]$,

Equation (30)

where the lower bound follows from the Gaussian correlation inequality [27, 35] (with equality if $(Z_1,Z_2)$ are independent, i.e. $q = m^2$) and the upper bound from the Cauchy–Schwarz inequality (with equality if $Z_1 = Z_2$, i.e. q = 1).
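This sandwich is easy to probe numerically. Below is a small Monte Carlo sketch (an illustration with arbitrary parameters, not code from the paper): for a centered Gaussian pair with unit variances and correlation ρ, which plays the role of $(q-m^2)/(1-m^2)$ after standardization, the joint probability of the two slabs lies between the product of the marginals and the single marginal:

```python
import math
import random

random.seed(0)

def mc_joint(rho, kappa, n=200_000):
    """Monte Carlo estimate of P(|Z1| <= kappa, |Z2| <= kappa) for a
    centered Gaussian pair with unit variances and correlation rho."""
    hits = 0
    c = math.sqrt(1.0 - rho * rho)
    for _ in range(n):
        u, v = random.gauss(0, 1), random.gauss(0, 1)
        z1, z2 = u, rho * u + c * v        # correlated standard Gaussians
        if abs(z1) <= kappa and abs(z2) <= kappa:
            hits += 1
    return hits / n

kappa = 1.0
p_single = math.erf(kappa / math.sqrt(2.0))   # P(|Z| <= kappa)
p_joint = mc_joint(rho=0.6, kappa=kappa)
# Gaussian correlation inequality / Cauchy-Schwarz sandwich:
#   p_single**2 <= p_joint <= p_single
```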

Using the bounds (29) and (30) we have

Equation (31)

whence,

Equation (32)

It remains to show that ϕ1 is a non-decreasing function so that $\varphi_1(m) \unicode{x2A7E} \varphi_1(0) = \log p(\kappa)$. A simple computation of the derivative of ϕ1 reveals that

Equation (33)

where Z0 has p.d.f. (12), and the expectation is taken with respect to Z0. Using $a_{\pm}^{^{\prime}}(m) = \frac{-Z_0 \pm m\kappa}{(1-m^2)^{3/2}}$, the numerator of the above expression can be written as follows:

Equation (34)

We will show that the above display is non-negative for all values of Z0. First, note that this expression is even as a function of Z0, so we may assume $Z_0\unicode{x2A7E}0$ without loss of generality. Since $Z_0 \unicode{x2A7D} \kappa_0$ a.s., the expression is non-negative when $m \unicode{x2A7E} \kappa_0/\kappa$. It remains to consider the case $m \lt \kappa_0/\kappa$. Processing the numerator further we obtain

Equation (35)

Equation (36)

Equation (37)

From the bound $\tanh(x) \unicode{x2A7D} x$ for $x \unicode{x2A7E} 0$ we see that

Equation (38)

This is non-negative as long as $1 - \kappa_0^2/(1-m^2) \unicode{x2A7E} 0$. Since $m \lt \kappa_0/\kappa$, this is verified when $1-(\kappa_0/\kappa)^2 \unicode{x2A7E} \kappa_0^2$, i.e. when $\kappa^2\unicode{x2A7E} \kappa_0^2/(1-\kappa_0^2)$. □
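To make the conclusion of lemma 4 concrete, one can evaluate the bound $\alpha \log(1/p(\kappa))$ numerically under the scaling $\kappa = \tilde{\kappa}\sqrt{-\alpha/\log\alpha}$ used later in section 5; a quick sketch (with an arbitrary choice $\tilde{\kappa} = 1$, and $p(\kappa) = \operatorname{erf}(\kappa/\sqrt{2})$):

```python
import math

def p_margin(kappa):
    """p(kappa) = P(|Z| <= kappa), Z ~ N(0, 1)."""
    return math.erf(kappa / math.sqrt(2.0))

kappa_tilde = 1.0   # arbitrary fixed rescaled margin
vals = []
for alpha in (1e-2, 1e-4, 1e-8, 1e-16):
    kappa = kappa_tilde * math.sqrt(-alpha / math.log(alpha))
    vals.append(alpha * math.log(1.0 / p_margin(kappa)))
# vals decreases towards 0: the bound (28) vanishes under this scaling
```

Indeed $p(\kappa) \approx \kappa\sqrt{2/\pi}$ for small κ, so $\alpha\log(1/p(\kappa)) \approx \tfrac{1}{2}\alpha\log(1/\alpha) \to 0$.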

Next, we are ready to prove the main result of this section:

Theorem 5. Under the assumptions of theorem 1 we have

Equation (39)

where the limit in α is such that $\alpha \to 0, \kappa \to 0$ with $\alpha \log(1/\kappa) \to 0$.

We see that theorem 1 follows from theorem 5 and lemma 2. Now we prove theorem 5:

Proof. We write $Z = Z({\boldsymbol{x}}_0, \kappa, r)$. All probabilities and expectations are taken under $\operatorname{\mathbb{P}}_{\tiny{pl}}$. For fixed $t,t^{^{\prime}}\gt0$ to be chosen later we define the events

Equation (40)

Equation (41)

First, we note that by Jensen's inequality,

Equation (42)

Equation (43)

Equation (44)

Equation (45)

Since by theorem 3, $\frac{1}{N}\log \operatorname{\mathbb{E}}_{\tiny{pl}}\big[Z \,|\, \boldsymbol{w} \big] \to \phi_1(m)$ almost surely as $N\to \infty$, we have by dominated convergence,

Equation (46)

Next, under $A \cap B \cap C$ we have

Equation (47)

Now the goal is to show that $\operatorname{\mathbb{P}}_{\tiny{pl}}(A\cap B \cap C) \gt0$. Let w be such that $C \cap D$ holds. It follows by the Paley–Zygmund inequality that

Equation (48)

Equation (49)

Equation (50)

Equation (51)

Equation (52)

where the last inequality follows from $C \cap D$. From lemma 4 we have $\phi_2(m)-2\phi_1(m) \unicode{x2A7D} \alpha \log(1/p(\kappa))$ when $\kappa^2 \unicode{x2A7E} \kappa_0^2/(1-\kappa_0^2)$. Next by theorem 3 we have $\operatorname{\mathbb{P}}_{\tiny{pl}}(C \cap D) \unicode{x2A7E} 1/2$ for N large enough (it is actually $1-o_N(1)$). It follows that

Equation (53)

On the other hand, by our concentration assumption 1, $\operatorname{\mathbb{P}}_{\tiny{pl}}(B^c) \unicode{x2A7D} \exp(-N \min\{t^{\prime 2}/K^2,$ $t^{\prime}/K\})$, where $K = K(\delta,\Delta)$ with $\Delta = (1-2r)\kappa$ is the constant appearing in the assumption. By a union bound we then have

Equation (54)

Now we choose $t = \frac{1}{3}(2\delta+\alpha \log(1/p(\kappa)))$ and $t^{\prime} = t_N^{\prime}$ such that $\min\{t_N^{\prime 2}/K^2,t_N^{\prime}/K\} = (\log 16) /N + 2\alpha \log(1/p(\kappa)) + 4\delta$. We obtain

Equation (55)

Therefore the bound (47) holds with this choice of parameters:

Equation (56)

and we obtain

Equation (57)

Letting $\alpha \to 0,\kappa \to 0$ such that $\alpha \log(1/\kappa)\to0$ and then δ → 0 concludes the proof. □

Proof of theorem 3. Let us start with the first moment. First, we have

Equation (58)

Equation (59)

We further have

Equation (60)

Equation (61)

where $Z \sim N(0,1)$ independently. Using Stirling's formula, we obtain

Equation (62)

An application of the strong law of large numbers yields the formula in equation (24).
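The Stirling step can be checked directly: $\frac{1}{N}\log\binom{N}{rN}$ converges to the binary entropy $-r\log r - (1-r)\log(1-r)$. A minimal numerical check (illustrative only):

```python
import math

def log_binom(n, k):
    """log of the binomial coefficient C(n, k), via log-Gamma."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

def h2(r):
    """Binary entropy in nats."""
    return -r * math.log(r) - (1 - r) * math.log(1 - r)

r = 0.3
errs = [abs(log_binom(n, round(r * n)) / n - h2(r))
        for n in (100, 1_000, 10_000, 100_000)]
# errs shrinks like log(n)/n, confirming the Stirling estimate
```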

We now calculate the second moment:

Equation (63)

Equation (64)

Fix $m,q \in [-1,1]$ and three vectors ${\boldsymbol{x}}_0, {\boldsymbol{x}}^1$, ${\boldsymbol{x}}^2$ such that $\langle{\boldsymbol{x}}^1,{\boldsymbol{x}}^2\rangle = N q$, and $\langle{\boldsymbol{x}}^i,{\boldsymbol{x}}_0\rangle = N m$, for $i = 1,2$. Then as before,

Equation (65)

Equation (66)

where the pair $(Z_1,Z_2)$ is defined as in equation (23). Furthermore, by symmetry we can assume that ${\boldsymbol{x}}_0 = \mathbf{1}$, and we define the set

Equation (67)

We have

Equation (68)

Therefore

Equation (69)

Next we compute the size of $C(m,q)$. We can write

Equation (70)

where $\binom{N}{{{\boldsymbol{k}}}} = \binom{N}{k_{+,+},k_{+,-},k_{-,+},k_{-,-}}$ is the multinomial coefficient and the sum is restricted to those integers satisfying

Equation (71)

Equation (72)

Equation (73)

Equation (74)

Using Stirling's formula we find

Equation (75)

where the maximization is as in equation (25). (The correspondence being that $\operatorname{\mathbb{P}}(\theta_1 = \epsilon,\theta_2 = \epsilon^{^{\prime}}) = k_{\epsilon,\epsilon^{^{\prime}}}/N$.)
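The Stirling estimate behind equation (75) says that $\frac{1}{N}\log\binom{N}{{\boldsymbol{k}}}$ approaches the Shannon entropy of the empirical distribution ${\boldsymbol{k}}/N$. A quick self-contained check (illustrative, with an arbitrary four-point distribution standing in for $k_{\epsilon,\epsilon^{\prime}}/N$):

```python
import math

def log_multinomial(counts):
    """log of the multinomial coefficient N! / prod(k!)."""
    out = math.lgamma(sum(counts) + 1)
    for k in counts:
        out -= math.lgamma(k + 1)
    return out

def shannon(probs):
    """Shannon entropy (nats) of a probability vector."""
    return -sum(p * math.log(p) for p in probs if p > 0)

N = 100_000
probs = (0.4, 0.3, 0.2, 0.1)            # plays the role of k / N
counts = [round(p * N) for p in probs]
err = abs(log_multinomial(counts) / N - shannon(probs))
# err is O(log(N)/N), i.e. negligible at exponential scale
```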

Moreover, letting $\theta(w) : = \log \operatorname{\mathbb{P}}( | m w +Z_i | \unicode{x2A7D} \kappa \, , i = 1,2 \, | \, w)$, the average $X_N = \frac{1}{N} \sum_{a = 1}^M \theta(w_a)$ has a sub-Gaussian tail in N, i.e. $\operatorname{\mathbb{P}}(|X_N - \operatorname{\mathbb{E}}[X_N]| \unicode{x2A7E} t) \unicode{x2A7D} 2 e^{-Nt^2/(2C)}$ for some constant C > 0, by the Azuma–Hoeffding inequality. Since the maximum in equation (69) is taken over no more than $2N+1$ values, we can let $t = t_N \to 0$ slowly with N such that $\sum_{N} N e^{-Nt_N^2/(2C)}\lt\infty$. The Borel–Cantelli lemma and continuity allow us to conclude the proof. □
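This concentration step can be visualized numerically. In the sketch below (an illustration with arbitrary m, κ, α, and with $Z_1, Z_2$ taken independent, i.e. the $q = m^2$ slice, so that the conditional probability factorizes) the empirical fluctuations of $X_N$ shrink like $1/\sqrt{N}$, consistent with the sub-Gaussian tail:

```python
import math
import random

random.seed(1)

def Phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def theta(w, m=0.5, kappa=1.0):
    # log P(|m w + Z_i| <= kappa, i = 1, 2 | w); Z_1, Z_2 independent here
    # (the q = m^2 slice), so the probability factorizes into a square
    p = Phi(kappa - m * w) - Phi(-kappa - m * w)
    return 2.0 * math.log(max(p, 1e-300))

def X_N(N, alpha=0.5):
    M = int(alpha * N)
    return sum(theta(random.gauss(0, 1)) for _ in range(M)) / N

def empirical_std(N, trials=200):
    xs = [X_N(N) for _ in range(trials)]
    mu = sum(xs) / trials
    return math.sqrt(sum((x - mu) ** 2 for x in xs) / trials)

s_small, s_big = empirical_std(200), empirical_std(3200)
# fluctuations of X_N shrink like 1/sqrt(N): expect s_big ~ s_small / 4
```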

5. Analysing the local entropy and its thresholds

In theorem 1 we showed that the local entropy $\phi_{N,\delta}(r)$ is asymptotically given by the formula $\max\{0,\phi_1(r)\}$ in the limit $N\to \infty$, α → 0 then δ → 0, where

Equation (76)

We will now focus on the analysis of this function. In the appendix we derive the local entropy for generic values of these parameters and show a posteriori how we can recover the limit presented above.

A first step to simplify our analysis is to rewrite $\varphi_1(m)$ in the following fashion

Equation (77)

where $\mathcal{N}_{\kappa_0} = \operatorname{\mathbb{P}}(|Z_0|\unicode{x2A7D} \kappa_0)$, and we let $\mathcal{D}Z_0 = \frac{e^{-Z_0^2/2}}{\sqrt{2\pi}}\mathrm{d} Z_0$ denote the Gaussian measure. We recall that the error function is

Equation (78): $\operatorname{erf}(x) = \frac{2}{\sqrt{\pi}}\int_0^{x} e^{-t^2}\, \mathrm{d} t$.

In fact, when $\alpha\ll 1$ the local entropy is a non-trivial function for only a restricted range of parameters κ and m. For this to happen the entropic and energetic contributions have to be comparable. This leads us to introduce a rescaling of the form

Equation (79)

in order to have both $\varphi_1(m)$ and h(m) contribute at order $\mathcal{O}(\alpha)$ to the local entropy when $\alpha\ll 1$. This indicates that we can restrict our analysis to the regime where $1-m\ll 1$. Consequently, the entropic term simplifies to

Equation (80)

Then, using this rescaling we obtain the simplified form of the local entropy and the equation for its local maxima (at $\tilde{r}\neq 0$)

Equation (81)

Equation (82)

with again $\mathcal{N}_{\tilde{\kappa}_0} = \operatorname{\mathbb{P}}(|Z_0|\unicode{x2A7D} \tilde{\kappa}_0)$. The presence of this local maximum in the potential tells us that there is a cluster of atypical solutions with margin κ around each typical configuration with margin κ0. In the following, we will denote by $s[\tilde{\kappa}_0,\tilde{\kappa}]$ the local entropy evaluated at this maximum.

In figure 1 we display the behavior of the local entropy $s[\tilde{\kappa}_0,\tilde{\kappa}]$ as a function of $\tilde{\kappa}_0$ and $\tilde{\kappa}$. As outlined by the dashed line, clusters exist only for a finite span of values for $\tilde{\kappa}$, which depends on the margin $\tilde{\kappa}_0$ of the reference vector ${\boldsymbol{x}}_0$. Defining $\tilde{\kappa}_\textrm{entr}(\tilde{\kappa}_0)$ as the critical value of $\tilde{\kappa}$ for which clusters disappear, we see from the figure that $\tilde{\kappa}_\textrm{entr}\equiv \textrm{min}_{\tilde{\kappa}_0} \tilde{\kappa}_\textrm{entr}(\tilde{\kappa}_0) = \tilde{\kappa}_\textrm{entr}(\tilde{\kappa}_0 = 0)$. In other words, the first clusters to disappear are the ones formed around a reference vector at $\tilde{\kappa}_0 = 0$. In particular, this corresponds to planting at $\tilde{\kappa}_0 = \tilde{\kappa}_{\tiny \mathrm{SAT}}$ as we have

Equation (83)

Since clusters are associated with AMP/TAP fixed points, $\tilde{\kappa}_\textrm{entr}$ corresponds to the margin above which the AMP/TAP initialized close to a typical solution with margin $\kappa_0 = {\kappa}_{\tiny \mathrm{SAT}}$ converges to the same fixed point as would be reached from a random initialization. The existence of solutions from which AMP/TAP converges to this trivial fixed point was linked to the onset of a region where algorithms may be able to find solutions. More precisely, numerical evidence in the literature suggests that solutions that are found by efficient algorithms do not correspond to other AMP/TAP fixed points than the one reached from random initialization [15, 28].

As shown in figure 2, in which we plant at $\kappa_0 = \kappa_{\tiny \mathrm{SAT}}$, the local entropy exhibits two interesting thresholds at distinct values of $\tilde{\kappa}$ (for fixed α). The first is the value of $\tilde{\kappa}$ above which the potential remains positive for all m; we will refer to it as the energetic threshold, $\tilde{\kappa} = \tilde{\kappa}_\textrm{ener}(\tilde{\kappa}_0)$. In other words, above this critical margin we can find solutions to the SBP with margin κ at any distance from the reference vector. The second critical value of $\tilde{\kappa}$ corresponds to the loss of the local maximum at m ≠ 0 in the potential. This is the entropic threshold mentioned earlier, $\tilde{\kappa} = \tilde{\kappa}_\textrm{entr}(\tilde{\kappa}_0)$.

In the two following sections, we focus our analysis on these two thresholds in the case $\tilde{\kappa}_0 = 0$. Again, this choice is justified by the fact that the energetic and entropic thresholds occur first when planting at $\tilde{\kappa}_0 = \kappa_{\tiny \mathrm{SAT}} \to_{\alpha \to 0} 0$, i.e.

Equation (84)

Equation (85)

Similarly to the entropic threshold, we will use in the following the shorthand $\tilde{\kappa}_\textrm{ener}(0) = \tilde{\kappa}_\textrm{ener}$.

5.1. Energetic threshold

The energetic threshold occurs when the local entropy $\phi_1(r)$ is negative in a range of intermediate distances. This means that we want to find the exact point at which the minimum of the entropy (excluding m = 1) is zero. We start by setting $\tilde{\kappa}_0 = 0$ in equation (81) to obtain the simplified form of the local entropy

Equation (86)

The potential then vanishes when

Equation (87)

and the right-hand side of the above equation has a maximum at

Equation (88)

Finally, solving the two previous equations yields the set of values $\{\tilde{\kappa}_\textrm{ener},\tilde{r}\}$ at which the potential ceases to be negative for any value of the magnetization m. Numerically we obtain

Equation (89)

Equation (90)

5.2. Entropic threshold

The entropic threshold occurs when the non-trivial local maximum of the free entropy ceases to exist. We recall that the local entropy for $\tilde{\kappa}_0 = 0$ reads

Equation (91)

with a non-trivial local maximum obtained by solving the fixed point equation

Equation (92)

Again, the right-hand side of the previous equation has a maximum at

Equation (93)

Finally, solving the two previous equations numerically, we obtain

Equation (94)

Equation (95)

5.3. Complexity versus entropy

In this section, we focus on the relation between the complexity of the clusters around the high-margin solutions and their local entropy. We define the complexity as the logarithm of the number of clusters around solutions at margin κ0, normalized by N, and we recall that the local entropy of a cluster is the value of the local entropy $\phi_1(r = \frac{1-m}{2})$ at the nearest local maximum to the reference solution. By contiguity to the planted model, the clusters of solutions with margin $\kappa\gt\kappa_0$ living around two different planted configurations are distant, since the reference configurations are nearly orthogonal with high probability. Thus, heuristically, counting their exponential number (or complexity) simply consists of enumerating the number of typical solutions at κ0 we can plant.

Taking these considerations into account, the clusters obtained have a complexity that depends solely on κ0, while their local entropy is a function of κ and κ0. Fixing κ while tuning κ0 enables us to scan across sets of clusters with different complexities and local entropies, all containing atypical solutions of the SBP with margin κ. More specifically, the complexity is

Equation (96)

and the entropy of a cluster is

Equation (97)

in which m is evaluated with the fixed-point equation

Equation (98)

Using the rescaling from the previous section, we can finally write these two functions at leading order as α → 0:

Equation (99)

Equation (100)

where we recall that $\tilde{r}$ is evaluated with

Equation (101)

and

Equation (102)

The right panel of figure 2 displays several curves of complexity $\Sigma[\tilde{\kappa}_0]$ as a function of the local entropy $s[\tilde{\kappa}_0,\tilde{\kappa}]$ for fixed values of $\tilde{\kappa}$. Three regimes can be outlined for $\kappa \lt \kappa_\textrm{entr}$. First, for $s[\tilde{\kappa}_0,\tilde{\kappa}]\approx 0$, we have locally convex curves (and $\tilde{\kappa}_0-\tilde{\kappa} = o(1)$). This result is quite surprising, as such $\Sigma(s)$ curves are usually fully concave [14, 38]. Then, the curve becomes concave while $s[\tilde{\kappa}_0,\tilde{\kappa}] = \mathcal{O}(\alpha)$ and $\tilde{\kappa}_0-\tilde{\kappa} = \mathcal{O}(1)$. In this regime, the complexity continues to scale as $\Sigma[\tilde{\kappa}_0]-\Sigma_o = \mathcal{O}(\alpha)$ and the local entropy is upper bounded by $s[0,\tilde{\kappa}]$. Finally, if we set $\tilde{\kappa}_0\ll \tilde{\kappa}$ (i.e. $\kappa_0 = o\left(\sqrt{-\alpha/\log(\alpha)}\right)\,$) the complexity jumps from $\Sigma[\tilde{\kappa}_0]\approx\Sigma_o$ to $\Sigma[\tilde{\kappa}_0] = 0$. In this case, the entropy remains fixed (at first order) at $s[\tilde{\kappa}_0,\tilde{\kappa}] = s[0,\tilde{\kappa}]$. We sketch these three regimes for the complexity versus entropy curves in figure 3. For $\tilde{\kappa} \gt \tilde{\kappa}_\textrm{entr}$ only the first regime exists, since for small enough κ0 the local maximum of the potential disappears.


Figure 3. Sketch of the behavior of the complexity as a function of the local entropy. First, close to $s[\tilde{\kappa}_0,\tilde{\kappa}] = 0$, we observe a regime in which the curve is convex. Then, as $\tilde{\kappa}_0$ is lowered (keeping $\tilde{\kappa}_0-\tilde{\kappa} = \mathcal{O}(1)$) the curve becomes concave. The complexity remains such that $\Sigma[\tilde{\kappa}_0]-\Sigma_o = \mathcal{O}(\alpha)$ while the entropy increases up until the upper bound $s[0,\tilde{\kappa}]$. Finally, as we set $\tilde{\kappa}_0\ll\tilde{\kappa}$, the entropy remains constant at $s[0,\tilde{\kappa}]$ while the complexity drops to zero.


6. Analysis of the clustered structure through the replica method

6.1. The 1-RSB free energy

In this section, we show how the clustered structures we obtained with the planting approach can also be observed via the ordinary 1-RSB computation [32]. For this we will consider the set of solutions $S({\boldsymbol{G}}, \kappa)$ of the unbiased SBP. In particular, we will consider its cardinality

Equation (103): $Z(\kappa) = \big|S({\boldsymbol{G}}, \kappa)\big|$

and its total entropy function as the logarithm of Z averaged over the disorder G

Equation (104)

So as to perform the average over the disorder we will use the replica trick [32]. This trick takes the form of

Equation (105): $\operatorname{\mathbb{E}}_{{\boldsymbol{G}}}\big[\log Z(\kappa)\big] = \lim_{n\to 0}\frac{1}{n}\log \operatorname{\mathbb{E}}_{{\boldsymbol{G}}}\big[Z^n(\kappa)\big]$

where each of the n introduced copies of the system is called a replica. This technique enables us to shift from a computation where the interactions are random and the replicas decoupled to one where the replicas interact through deterministic couplings. With this approach, the rest of the computation mainly consists of evaluating the quantity $\operatorname{\mathbb{E}}_{{\boldsymbol{G}}}\left[ Z^n( \kappa )\right]$ at a fixed point of the overlap matrix $Q\in \mathrm{I\!R^{n\times n}}$, where

Equation (106): $Q_{ab} = \frac{1}{N}\langle {\boldsymbol{x}}^a,{\boldsymbol{x}}^b \rangle$
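The replica identity $\operatorname{\mathbb{E}}[\log Z] = \lim_{n\to 0}\frac{1}{n}\log \operatorname{\mathbb{E}}[Z^n]$ underlying this computation can be illustrated on a toy partition function with log-normal disorder, where both sides are computable exactly (a sketch unrelated to the SBP itself; here $\operatorname{\mathbb{E}}[\log Z] = \mu$ while $\frac{1}{n}\log\operatorname{\mathbb{E}}[Z^n] = \mu + n\sigma^2/2$):

```python
import math
import random

random.seed(2)

# toy "partition function": Z = exp(mu + sigma * g), g ~ N(0, 1)
mu, sigma = 1.3, 0.7
samples = [math.exp(mu + sigma * random.gauss(0, 1)) for _ in range(100_000)]

quenched = sum(math.log(z) for z in samples) / len(samples)   # E[log Z]

def replica_expression(n):
    """(1/n) log E[Z^n], estimated on the same samples."""
    return math.log(sum(z ** n for z in samples) / len(samples)) / n

gaps = [abs(replica_expression(n) - quenched) for n in (1.0, 0.5, 0.1, 0.01)]
# gaps shrinks as n -> 0: the replica limit recovers E[log Z]
```

At n = 1 the replica expression is the annealed average $\log \operatorname{\mathbb{E}}[Z]$, which overestimates the quenched one by $\sigma^2/2$; the gap closes linearly in n.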

Moreover, as the constraints on the overlaps are introduced in the following fashion

Equation (107)

we will also have to evaluate $\operatorname{\mathbb{E}}_{{\boldsymbol{G}}}\left[ Z^n( \kappa )\right]$ at a fixed point of the matrix $\widehat{Q}$. In more detail, the computation consists of evaluating

Equation (108)

with

Equation (109)

The computation of $\operatorname{\mathbb{E}}_{{\boldsymbol{G}}}\left[ Z^n( \kappa )\right]$ with the one-step replica symmetry breaking (1-RSB) ansatz implies the following form for the matrices Q and $\widehat Q$

Equation (110)

With this ansatz equation (108) boils down to

Equation (111)

with

Equation (112)

Equation (113)

For more details on the calculation steps used to derive $\phi^\textrm{1-RSB}$ we refer the interested reader to the first appendix of [4]. Before moving on with the analysis of the 1-RSB potential, a first simplification consists in taking into account a symmetry of the in/out channels: $\phi^{\kappa}_\textrm{out}[\omega,V] = \phi^{\kappa}_\textrm{out}[-\omega,V]$ and $\phi_\textrm{in}[B] = \phi_\textrm{in}[-B]$. Indeed, this symmetry implies that optimizing the potential yields the solution $q_0 = \widehat{q}_0 = 0$. Thus, in the following, we will always take this solution. The remaining fixed-point equations to verify are then

Equation (114)

Equation (115)

With these definitions, the entropy and complexity of the clusters can be determined at the fixed point as

Equation (116)

6.2. The 1RSB solution at finite α

When it comes to solving the 1RSB equations, we focus in this subsection on α = 0.5 as a representative value not close to zero; the corresponding satisfiability threshold is $\kappa_{\tiny \mathrm{SAT}}(\alpha = 0.5) = 0.319$. We obtained four branches of solutions when solving the fixed-point equations (114) and (115) with respect to q1 and $\widehat{q}_1$ for the 1-RSB potential (and browsing through values of the Parisi parameter x). Two of these solutions are unstable under the iteration scheme

Equation (117)

while the remaining two are stable. When browsing different values of x, we also observe a threshold value of κ at which the overall behavior of these fixed points changes. We will call this value $\kappa_\textrm{break}(\alpha = 0.5)\approx 0.455$. In figure 4 (left panel) we plot the complexity Σ as a function of the entropy s for the four branches. When tuning x, each solution describes a trajectory that we highlight with either a dashed line (unstable fixed point) or a full line (stable fixed point). One key question arising from these results is how to select the fixed-point branch that corresponds to the actual clusters of solutions in the problem. First, we clearly need to restrict to non-negative Σ and non-negative s. Moreover, we know that the correct equilibrium state is given by the solution where the total entropy

Equation (118)

is maximized. For the present model, this happens for s = 0, where the (negative) slope of the $\Sigma(s)$ curve is infinite. This can be seen by noting that the slope of the curve $\Sigma(s)$ is everywhere much smaller than −1. We recall that this slope is equal to −x, where x is the Parisi parameter, as explained in equation (116). We highlight this equilibrium point with a colored dot in the left panel of figure 4. In particular, this point $\Sigma(0)$ corresponds to the equilibrium frozen 1RSB solution of the SBP problem, with a value matching the one computed in [4]. We note that this criterion for equilibrium is rather unusual among models where the 1RSB solution has been evaluated. Usually, either both $\Sigma\gt0$ and s > 0 at the point where the slope is −1 (i.e. x = 1), corresponding to the so-called dynamical 1RSB phase, or the maximum is achieved when $\Sigma = 0$ at a slope −x with $0\lt x \lt1$, corresponding to the so-called static 1RSB phase. Here, we observe the equilibrium being achieved for $x\rightarrow +\infty$, corresponding to frozen 1RSB at equilibrium. Finally, we observe that for $\kappa\gt\kappa_\textrm{break} \approx 0.455$ (still considering α = 0.5) the curve $\Sigma(s)$ for positive values of both s and Σ breaks into two branches. Consequently, there is a finite range of values of the entropy s for which we do not obtain any fixed point. The meaning of such a gap is unclear, but it appears in other problems and their 1-RSB solutions [39].
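The stability criterion invoked above (whether a fixed point attracts or repels the iteration) is the standard one for iterated maps: a fixed point $x^*$ of $x \leftarrow f(x)$ is stable when $|f^{\prime}(x^*)| \lt 1$. A one-dimensional toy illustration (not the SBP saddle-point equations themselves):

```python
def iterate(f, x0, steps):
    """Apply the map f repeatedly, starting from x0."""
    x = x0
    for _ in range(steps):
        x = f(x)
    return x

# toy map f(x) = x^2 with fixed points x* = 0 (|f'(0)| = 0 < 1: stable)
# and x* = 1 (|f'(1)| = 2 > 1: unstable under iteration)
f = lambda x: x * x
near_stable = iterate(f, 0.2, steps=60)       # collapses onto x* = 0
near_unstable = iterate(f, 1.001, steps=12)   # flies away from x* = 1
```

The same dichotomy is what distinguishes the full and dashed branches in figure 4, with the scalar map replaced by the two-dimensional update of equation (117).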


Figure 4. We plot in these two panels the complexity Σ as a function of the entropy s, in both cases α = 0.5. On the left panel, we plot all the branches obtained when solving the saddle-point equations with respect to q1 and $\widehat{q}_1$ for the 1-RSB potential (and browsing through values for the Parisi parameter x). As a guide-to-the-eye we emphasize each of the four branches of solutions with either a full or dashed line. The full lines correspond to stable fixed points regarding the iteration scheme of equation (117), while dashed ones correspond to unstable fixed points. We highlighted with colored dots the fact that certain branches stop at s = 0 at a value of Σ corresponding to the equilibrium solutions. To reach this point, the Parisi parameter x has to be set to infinity. On the right panel, we compare the results obtained by the branches yielding this equilibrium complexity with the ones obtained with the previous planting method. We added colored dots when the fixed point (either of the planted or the 1-RSB saddle-point equations) with maximum entropy s was obtained for $\Sigma = 0$.


In the right panel of figure 4, we plot the complexity Σ as a function of the entropy s, selecting the branch that is an analytic continuation of the equilibrium $\Sigma(0)$ point, and compare it to the one obtained via the planting approach. For this comparison, we need the local entropy in the planted model at finite values of α, which is derived in the appendix. We note that the two complexities agree exactly at s = 0, as expected, because at s = 0 both complexities correspond to the total number of solutions at that κ. For s > 0, the two complexities have a similar shape, being clearly convex for small values of s. We note again that the overall values of Σ are larger than the values of s, meaning that the slope takes a rather large value in the whole range of those curves. We recall that in the context of the 1-RSB computation this slope is equal to −x, where x is the Parisi parameter. Taking its value much larger than one is not common in other models for which 1RSB was studied. This is likely the reason why these extensive-size clusters were not described earlier in the literature for the binary perceptron. We further see that the 1RSB complexity, when it exists, is strictly larger than the one obtained via planting, as expected, since via planting we obtain only some of the clusters of solutions, whereas the 1RSB computation should be able to count all of them. Then, when $\kappa \gt \kappa_\textrm{break}$, the planted model predicts the existence of clusters with an internal entropy that lies inside the fixed-point gap of the 1-RSB approach. This indicates that the 1RSB solution does not fully describe the space of solutions in this case. This may have several causes. For example, we may have missed a branch of fixed points in our analysis of the 1-RSB potential. Or this region may involve a replica ansatz with further symmetry breaking. Or perhaps these rare clusters simply cannot be obtained with a replica computation.
Finally, when $\kappa\gt\kappa_\textrm{ener}(\alpha = 0.5)\approx 0.499$, the curves $\Sigma(s)$ obtained with planting stop at some positive values of Σ and s and thus look again qualitatively similar to the portion of the curve $\Sigma(s)$ that is obtained from 1RSB by analytically continuing from the equilibrium $\Sigma(s = 0)$ point.

Overall, the 1RSB approach evaluated at sufficiently large values of the Parisi parameter x identifies clusters of extensive size in parts of the solution space corresponding to a convex curve that is unstable under iterations of the 1RSB fixed-point equations. These curves are partly compatible with the complexity obtained from planting. Yet there are still regions of $\kappa, s$ for which we obtain extensive clusters of solutions from the planting procedure but not from the 1RSB. The reason behind this paradox is left for future work.

For small α, the situation actually becomes clearer; we discuss this case in the next section.

6.3. The α → 0 and $x\rightarrow+\infty$ limit

We now focus, as in the first part of the paper, on the regime of small α. Using our results from the planting computation, and anticipating a similar behavior in the 1RSB, we can deduce the behavior of the Parisi parameter x in the low-α limit. Indeed, like the 1-RSB computation, the planting approach probes clustered solutions. It also allows for computing their complexity Σ and local entropy s, see equations (99) and (100). As mentioned above, in the context of a 1-RSB computation we have $\partial\Sigma/\partial s = -x$. Thus, if we plug in the entropy and complexity from equations (99) and (100) we can compute $\partial\Sigma/\partial s$ and estimate the Parisi parameter.

By doing so we obtain two regimes in which the slope $\partial\Sigma/\partial s$ becomes infinite in the low-α limit. First, when $\tilde{\kappa}_0\ll\tilde{\kappa}$ the entropy remains constant at first order in α while the complexity roughly jumps from $\Sigma_o$ to zero, see the left panel in figure 2. This indicates that to recover these states with the 1-RSB computation we should set $x\gg1$. The second regime in which we observe an infinite slope corresponds to $\tilde{\kappa}_0\approx\tilde{\kappa}$. Indeed, we have $\vert\partial \Sigma/ \partial s \vert\sim (\tilde{\kappa}-\tilde{\kappa}_0)^{-1}$ close to $\tilde{\kappa}_0 = \tilde{\kappa}$. Consequently, if we want to probe the clusters with almost zero local entropy we should also set $x\gg1$.

A last piece of information given by the planted model is that these clusters (in the two regimes mentioned above) correspond to a limit where $q = m^2\approx 1$. Thus, if we impose the same condition in the 1-RSB fixed-point equations, setting $q_1\approx1$ in equation (115) implies $\widehat q_1\gg 1$. Therefore, in order to find these clusters we will not only set $x\gg 1$ but also take $q_1\approx 1$ and $\widehat q_1\gg 1$.

The first regime mentioned above will be referred to as the maximum entropy regime, while the second will be referred to as the minimum entropy regime. First, we see that the entropic contribution simplifies identically in both regimes. Indeed, when setting $\widehat q_1\gg1$ we obtain

Equation (119)

which then yields

Equation (120)

As we will see in the following subsections, the simplification for the energetic term will be regime-dependent.

6.3.1. Maximum entropy regime.

For the maximum entropy regime, we can again go back to the results from the planting model to help us make the correct approximation. We know, for example, that we should have $1-q_1\sim \kappa^2$ in the low α limit (as both quantities have the same scaling in α). This implies that we should have

Equation (121)

Therefore, with $x\gg1$, we will approximate the energetic term with a saddle-point method. In other words, we will compute

Equation (122)

where Z0 corresponds to the maximum of $\phi^{\kappa}_\textrm{out}[\sqrt{q_1}z,1-q_1]$; for this function we have Z0 = 0. Performing the saddle-point approximation around Z0 = 0 we obtain

Equation (123)

with

Equation (124)

Finally, if we combine the simplification of both the entropic and energetic contributions the total entropy becomes

Equation (125)

and its fixed point equations are, at first order in x,

Equation (126)

Equation (127)

Now, we are able to draw a direct parallel with the planted system at low α and $\kappa_0\ll \kappa$. Indeed, if we use the correspondence $q_1\equiv m^2$ and $(x-1)\widehat{q}_1\equiv \widehat{m}$, equations (126) and (127) are nothing but the fixed-point equations for the planted model (see equations (A23) and (A24)). This indicates a posteriori that the 1-RSB calculation enables us to recover the same clusters as in the planted model where we have set $\kappa_0\ll \kappa$. To make the identification between the two approaches even more direct we can focus on the entropy and the complexity of this 1-RSB fixed-point. We obtain at first order in x that

Equation (128)

and

Equation (129)

Thus, at first order in x these clustered states have exactly the same entropy and complexity as the ones from the planted system with $\kappa_0\ll\kappa$ (and $\alpha\ll 1$).

6.3.2. Minimum entropy regime.

In this section, we want to probe clusters with a very small entropy. Keeping κ fixed, this means that we have to set q1 extremely close to one, eventually with $1-q_1\ll\kappa$, in order to reach zero local entropy. This scaling between q1 and κ is incompatible with the saddle-point approximation we performed in the maximum entropy regime, as equation (121) is no longer verified. In this case, we instead have to introduce the asymptotic expansion

Equation (130)

where we used the identity

Equation (131)

We then estimate the range of values of z ($z\in[-Z_0,Z_0]$) over which $e^{x\phi^{\kappa}_\textrm{out}[\sqrt{q_1}z,1-q_1]}$ remains finite. In other words, we compute the value of z at which the function $e^{x\phi^{\kappa}_\textrm{out}[\sqrt{q_1}z,1-q_1]}$ equals an arbitrary value $1/C$,

Equation (132)

where $W_0(\cdot)$ is the branch-0 Lambert W function. This computation thus shows that $e^{x\phi^{\kappa}_\textrm{out}[\sqrt{q_1}z,1-q_1]}$ jumps from 1 to any arbitrary fraction $1/C$ exactly at

Equation (133)
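The only property of $W_0$ used here is its defining relation $W_0(z)\,e^{W_0(z)} = z$ on the principal branch. A minimal numerical sketch of this function via Newton iteration (purely illustrative; the actual argument of $W_0$ appearing in equation (132) depends on $q_1$, κ and the arbitrary constant C, and is not reproduced here):

```python
import math

def lambert_w0(z, tol=1e-12):
    """Principal branch W_0 of the Lambert function, solving w*exp(w) = z.

    Simple Newton iteration, valid for z > -1/e. Illustrative sketch only;
    it is not tied to the specific argument arising in equation (132).
    """
    w = math.log1p(z) if z > -0.5 else z  # rough starting point
    for _ in range(100):
        e = math.exp(w)
        step = (w * e - z) / (e * (w + 1.0))
        w -= step
        if abs(step) < tol:
            break
    return w

w = lambert_w0(2.5)  # satisfies w * exp(w) = 2.5
```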

In this limit, we thus have for the energetic contribution

Equation (134)

Finally, putting together the simplified entropic and energetic contributions to the 1-RSB potential, we obtain

Equation (135)

and the corresponding fixed point equations are (at first order in x)

Equation (136)

Equation (137)

The combination of these two fixed-point equations implies $\kappa\gg\sqrt{2(1-q_1)\log x}$. Consequently, the term $x^2(1-q_1)\widehat{q}_1$ can be neglected and the 1-RSB free energy reduces to

Equation (138)

We thus recover the case of the planted system at $\kappa_0 = \kappa$ as we have

Equation (139)

Equation (140)

In [4] the authors showed, via a more standard computation, that these equilibrium configurations (exhibiting a frozen 1-RSB structure) can also be obtained by imposing q0 = 0, q1 = 1 and x = 1.

In figure 5 we display, for several values of $\tilde{\kappa} = \kappa\sqrt{-\log(\alpha)/\alpha}$, the complexity as a function of the entropy. The light-colored full lines correspond to the results obtained with the planted model. The dashed and dotted lines correspond respectively to the maximum and minimum entropy fixed-point branches of the 1-RSB free energy. To obtain them we solved the fixed-point equations in each regime for large but finite Parisi parameter x. As shown by the previous computations, at each end of the curve the planting approach closely matches one of the $x\rightarrow+\infty$ regimes. This leads us to conjecture that in the limit of small α the planted and 1-RSB $\Sigma(s)$ curves match exactly. In other words, the planting approach actually captures the dominant clusters for each size s.
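As a rough illustration of how such saddle-point equations can be handled numerically at large but finite x, a generic damped fixed-point iteration can be used; the update map F below is a placeholder, not the actual right-hand sides of equations (136) and (137):

```python
import numpy as np

def solve_fixed_point(F, z0, damping=0.5, tol=1e-10, max_iter=10000):
    """Damped fixed-point iteration z <- (1 - damping) * F(z) + damping * z.

    F stands in for the right-hand sides of the saddle-point equations
    (e.g. the updates for q_1 and hat q_1); damping closer to 1 helps
    convergence when the updates become stiff at large Parisi parameter x.
    """
    z = np.asarray(z0, dtype=float)
    for _ in range(max_iter):
        z_new = (1.0 - damping) * np.asarray(F(z)) + damping * z
        if np.max(np.abs(z_new - z)) < tol:
            return z_new
        z = z_new
    return z

# toy example: the map F(z) = cos(z) has a unique fixed point near 0.739
z_star = solve_fixed_point(lambda z: np.cos(z), np.array([0.0]))
```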


Figure 5. We plot the complexity Σ as a function of the entropy s for several values of $\tilde{\kappa}$. The dashed and dotted curves correspond to the maximum and minimum entropy regimes and are obtained by optimizing the potentials of equations (125) and (135) over q1 and $\widehat{q}_1$ with large but finite values of the Parisi parameter x. To compare the 1-RSB computation with the planting approach, the full curves correspond to the complexity and entropy obtained with the planting computation.


7. Conclusion and discussion

We study the local entropy in the SBP problem around solutions planted at a smaller margin. Our results are rigorous in the limit of small α, conditional on the concentration of a certain entropy. We identify clusters of solutions of extensive entropy as local maximizers of this local entropy. We identify two thresholds, κener and κentr, that we consider of particular interest. κentr is the smallest κ at which the planted clusters and the corresponding maximum disappear, thus presumably melting into an extended structure that may be accessible to efficient algorithms. κener is a value above which there are solutions at all distances from the planted solutions, and as such is an upper bound on the overlap-gap-property threshold.

We then investigated the 1RSB solution of the SBP problem and showed how it allows us to identify extensive clusters of solutions without introducing concepts beyond those already present in the canonical 1RSB computation. It suffices to consider large values of the Parisi parameter x and both convex and concave parts of the $\Sigma(s)$ curve. We discuss how the equilibrium frozen-1RSB is recovered in the $x \to \infty$ limit. While this resolves some open questions about the 1RSB solution for binary perceptrons, we conclude that the 1RSB calculation is incomplete at finite α, as we did not find solutions corresponding to all the extensive clusters identified by the planting procedure.

We further showed that while, in general, the planting procedure we study does not describe all the rare clusters, in the limit of small α the $\Sigma(s)$ obtained via planting appears to coincide exactly with the one obtained from the 1RSB. This leads us to conjecture that in the limit of small α the planting actually describes almost all clusters of a given size.

Acknowledgments

We acknowledge funding from the Swiss National Science Foundation grants OperaGOST (Grant Number 200390) and SMArtNet (Grant Number 212049). We also thank David Gamarnik, Carlo Lucibello and Riccardo Zecchina for enlightening discussions on these problems.

Data availability statement

All data that support the findings of this study are included within the article (and any supplementary files).

Appendix: The local entropy in the planted model

A.1. The local entropy for generic parameters α, κ0 and κ

While in the main text we derive the local entropy rigorously at small α, in this appendix we give the expressions for a generic value of α. These we have not established rigorously; instead we resort to the replica method applied to the contiguous planted system. Following [4], the replica symmetric (RS) free energy for the planted binary perceptron is

Equation (A1)

with

Equation (A2)

Equation (A3)

and where $\mathcal{D}z$ (as well as $\mathcal{D}x$ later) represents an integration against a standard normal variable. Moreover, in the following we will take $P_{\kappa_0}[w] = e^{-w^2/2}\mathbf{1}\{\vert w \vert \unicode{x2A7D} \kappa_0\}/\mathcal{N}_{\kappa_0}$, where $\mathcal{N}_{\kappa_0}$ is the prefactor normalizing the distribution. This distribution is given by the typical clusters dominating the probability measure $P^{{\boldsymbol{g}}}_{\kappa_0}[{\boldsymbol{x}}]$. The variable $x_0$ corresponds to the planted configuration and can take the values ±1 indifferently.
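As an aside, the normalization $\mathcal{N}_{\kappa_0}$ of this truncated Gaussian has a closed form in terms of the error function, $\mathcal{N}_{\kappa_0} = \int_{-\kappa_0}^{\kappa_0} e^{-w^2/2}\,\mathrm{d}w = \sqrt{2\pi}\,\mathrm{erf}(\kappa_0/\sqrt{2})$. A short sketch checking this against direct quadrature (purely illustrative):

```python
import math

def normalization(kappa0):
    """Normalization N_{kappa_0} of exp(-w^2/2) restricted to |w| <= kappa_0.

    Closed form: sqrt(2*pi) * erf(kappa_0 / sqrt(2)).
    """
    return math.sqrt(2.0 * math.pi) * math.erf(kappa0 / math.sqrt(2.0))

def numeric(kappa0, n=100000):
    """Trapezoidal quadrature of exp(-w^2/2) over [-kappa_0, kappa_0]."""
    h = 2.0 * kappa0 / n
    vals = [math.exp(-(-kappa0 + i * h) ** 2 / 2.0) for i in range(n + 1)]
    return h * (sum(vals) - 0.5 * (vals[0] + vals[-1]))
```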

This free energy has saddle points for a set of parameters $\{q,\widehat{q},m,\widehat{m}\}$ verifying

Equation (A4)

Equation (A5)

Equation (A6)

Equation (A7)

In figure 6 we plot the complexity, see equation (96), as a function of the local entropy of the planted model. As explained in section 5.3, the entropy is obtained by evaluating $\phi^{\kappa_0,\kappa}_\textrm{planted}$ at its only non-trivial saddle point over $\{q,m,\widehat{q},\widehat{m}\}$. The complexity counts, at the exponential scale, the typical solutions of the binary perceptron at α and κ0, see equation (96), which corresponds to the number of ways of planting the configuration. The results displayed in this figure are obtained for $\alpha = 10^{-2}$.


Figure 6. We plot the complexity $\Sigma[\kappa_0]$ as a function of the local entropy $s[\kappa_0,\kappa]$ for $\alpha = 10^{-2}$ and $\kappa\in[0.03,0.06]$ using the planting approach. The function $s[\kappa_0,\kappa]$ is obtained by computing the non-trivial optimizer $\{q,m,\widehat{q},\widehat{m}\}$ for the local free entropy $\phi^{\kappa_0,\kappa}_\textrm{planted}$ and evaluating its entropy, see equations (A1)–(A7). The complexity corresponds to the exponential number of typical solutions for the binary perceptron at α and κ0, see equation (96).


A.2. Results in the low α limit

In this section, we focus on the planted model in the limit $\alpha\ll 1$. It can be shown that in this case the non-trivial saddle point of the planted free energy satisfies $\widehat{q}\ll\widehat{m}$, $q-m^2\ll 1-m$ and $1-m = o(1)$. We first rewrite the energetic contribution as

Equation (A8)

with the change of variable

Equation (A9)

and integrating over $B^\top$. Recalling that we probe a saddle point with $q-m^2\ll1-m$ and $1-m = o(1)$ (i.e. $1-q = o(1)$), we have for the energetic contribution

Equation (A10)

and the saddle point equations over q and m become

Equation (A11)

Equation (A12)

Now with $q-m^2 = o(1)$ the energetic contribution becomes

Equation (A13)

and we obtain using the saddle point with respect to m

Equation (A14)

Injecting this solution into equation (A12) yields $\widehat{q} = o(1/(1-m)^{3/2})$, while equation (A14) implies $\widehat{m} = \mathcal{O}(1/(1-m)^{3/2})$. Finally, setting $\widehat{q}\ll\widehat{m}$ in equations (A4) and (A5) we can derive

Equation (A15)

Equation (A16)

In a nutshell, by setting $q-m^2\ll1-m$ and $1-m = o(1)$ we obtain $\widehat{m} = \mathcal{O}(1/(1-m)^{3/2})$ and $\widehat{q} = o(1/(1-m)^{3/2})$. Then, we showed that $\widehat{q}\ll \widehat{m}$ implies $q-m^2\ll1-m$. At this stage we still need to check that closing equations (A14) and (A16) indeed yields $1-m = o(1)$.

In fact, using $q-m^2\ll 1-m$, $1-m = o(1)$, $\widehat{q}\ll\widehat{m}$ along with equation (A16) we have

Equation (A17)

Then, the planted free energy is a non-trivial function only for a restricted range of the parameters κ and m, namely when the entropic and energetic contributions compete with each other. This leads us to introduce a rescaling of the form $1-{m}^2 = -\alpha \tilde{r} /\log(\alpha)$, $\kappa_0 = {\tilde{\kappa}_0} \sqrt{-\alpha /\log(\alpha)}$ and $\kappa = {\tilde{\kappa}} \sqrt{-\alpha /\log(\alpha)}$. For $\alpha\ll 1$, this rescaling enables us to check directly that $1-m = o(1)$, since $\alpha/\log(\alpha)\ll1$. In other words, if a solution of the saddle-point equations (A14) and (A16) exists, it satisfies $1-m = o(1)$ in the low-α limit. If we now rewrite the above planted free energy with this rescaling we obtain

Equation (A18)

Equation (A19)

with

Equation (A20)
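The claim that the rescaling automatically enforces $1-m = o(1)$ can also be checked numerically; a minimal sketch (the choice $\tilde{r} = 1$ is arbitrary):

```python
import math

def one_minus_m(alpha, r_tilde=1.0):
    """1 - m under the rescaling 1 - m^2 = -alpha * r_tilde / log(alpha).

    Illustrates that 1 - m vanishes as alpha -> 0 for any fixed r_tilde
    (r_tilde = 1 is an arbitrary choice for this check).
    """
    one_minus_m2 = -alpha * r_tilde / math.log(alpha)
    return 1.0 - math.sqrt(1.0 - one_minus_m2)

# 1 - m shrinks monotonically as alpha decreases
values = [one_minus_m(a) for a in (1e-1, 1e-2, 1e-4, 1e-8)]
```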

A last simplification can be made by setting $\tilde{\kappa}_0 = 0$; this condition holds, for example, when planting at $\kappa_{\tiny \mathrm{SAT}}$, since $\kappa_{\tiny \mathrm{SAT}}\sqrt{-\log(\alpha)/\alpha}\rightarrow 0$ as α is sent to zero. In this case the integration over B can be dropped and we obtain for the local entropy and its saddle-point equation

Equation (A21)

Equation (A22)

As a final note, and in order to draw a parallel with the 1-RSB computation, we point out that the saddle-point equations (A14) and (A16) become, after setting $m\rightarrow 1$ ($\widehat m\gg1$) and $\kappa_0 = \kappa_{\tiny \mathrm{SAT}}\rightarrow0$,

Equation (A23)

Equation (A24)
