On the Atypical Solutions of the Symmetric Binary Perceptron

We study the random symmetric binary perceptron problem, focusing on the behavior of rare high-margin solutions. While most solutions are isolated, we demonstrate that these rare solutions are part of clusters of extensive entropy, heuristically corresponding to non-trivial fixed points of an approximate message-passing algorithm. We enumerate these clusters via a local entropy, defined as a Franz-Parisi potential, which we rigorously evaluate using the first and second moment methods in the limit of a small constraint density α (corresponding to a vanishing margin κ), under a certain assumption on the concentration of the entropy. This examination unveils several intriguing phenomena: i) We demonstrate that these clusters have an entropic barrier, in the sense that the entropy as a function of the distance from the reference high-margin solution is non-monotone when κ ≤ 1.429 √(−α/log α), while it is monotone otherwise, and that they have an energetic barrier, in the sense that there are no solutions at an intermediate distance from the reference solution when κ ≤ 1.239 √(−α/log α). The critical scaling of the margin κ as √(−α/log α) matches the one obtained in the earlier work of Gamarnik et al. [20] for the overlap-gap property, a phenomenon known to present a barrier to certain efficient algorithms. ii) We establish using the replica method that the complexity (the logarithm of the number of clusters of such solutions) versus entropy (the logarithm of the number of solutions in the clusters) curves are partly non-concave and correspond to very large values of the Parisi parameter, with equilibrium being reached when the Parisi parameter diverges.


I. INTRODUCTION

A. Background and Motivation
We consider the symmetric binary perceptron (SBP), introduced in [45]. Let G = (g_a)_{a=1}^M be a collection of M i.i.d. standard Gaussian random vectors in ℝ^N, with M = ⌊αN⌋ for a fixed α > 0. For κ > 0, we consider the set of binary solutions x ∈ {−1, +1}^N to the system of linear inequalities |⟨g_a, x⟩| ≤ κ√N for all 1 ≤ a ≤ M. (1) We denote the set of solutions by S(G, κ). It was shown by Aubin, Perkins and Zdeborová [45] that S(G, κ) is nonempty with high probability if and only if κ > κ_SAT(α), where κ_SAT(α) is defined by an implicit equation; moreover, in the limit of small α we have κ_SAT(α) = Θ(2^{−1/α}). Our main interest is in investigating the possibility of finding solutions efficiently when κ > κ_SAT(α). Krauth and Mézard [66] showed in their seminal work, using the non-rigorous replica method [72], that the solution landscape of the one-sided perceptron (where there is no absolute value in the constraints (1)) is dominated by isolated solutions lying at large mutual Hamming distances, a structure sometimes called "frozen replica symmetry breaking" [58,63,64,70,82]. From the mathematical point of view, the frozen replica symmetry breaking prediction was proven true for the SBP in works by Perkins and Xu [75] and Abbé, Li and Sly [42], who showed that for all κ > κ_SAT(α), a solution drawn uniformly at random from S(G, κ) is isolated with high probability, in the sense that it is separated from any other solution by a Hamming distance linear in N.
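To make the setting concrete, the following self-contained sketch enumerates S(G, κ) by brute force for a small system. The symbols N, M, α, κ and the acceptance test |⟨g_a, x⟩| ≤ κ√N follow the definitions above, while all numerical values are illustrative choices of ours, far from any asymptotic regime.

```python
import itertools

import numpy as np

def is_solution(G, x, kappa):
    # SBP constraints: |<g_a, x>| <= kappa * sqrt(N) for every row g_a of G
    N = len(x)
    return bool(np.all(np.abs(G @ x) <= kappa * np.sqrt(N)))

rng = np.random.default_rng(0)
N, alpha, kappa = 14, 0.5, 1.0          # illustrative toy sizes
M = max(1, int(alpha * N))              # M = floor(alpha * N)
G = rng.standard_normal((M, N))         # i.i.d. standard Gaussian constraint vectors

# brute-force enumeration of S(G, kappa) over the hypercube {-1, +1}^N
S = [x for x in itertools.product([-1, 1], repeat=N)
     if is_solution(G, np.array(x), kappa)]
print(len(S))
```

Note that the constraints are symmetric under x ↦ −x, so solutions come in pairs; exhaustive search is feasible only for N of a few dozen at most, which is one reason the paper's analysis proceeds via moment computations instead.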
This type of landscape property has traditionally been associated with algorithmic hardness, with the rationale that an algorithm performing local moves is unlikely to succeed in the face of such extreme clustering, as argued, for instance, by Zdeborová and Mézard [81], or Huang and Kabashima [63]. In some problems, this predicted algorithmic hardness was confirmed empirically, e.g. [81,82]. In other problems, a prominent example being the binary perceptron (symmetric or not), it is known that certain efficient heuristics are able to find solutions for α small enough as a function of κ [47,48,57,59,60,64,65]. Statistical physics studies of the neighborhood of the solutions returned by efficient heuristics have put forward the intriguing observation that in the binary perceptron problem, a dense region of other solutions surrounds the ones which are returned [46,50,51]. This means that efficient algorithms may be drawn to rare, well-connected subset(s) of S(G, κ). Moreover, these efficient algorithms fail to return a solution when α becomes large, suggesting the existence of a computational phase transition in the binary perceptron (symmetric or not).
For the symmetric version of the problem, this state of affairs has been partially elucidated in two recent mathematical works. In [43], Abbé et al. show the existence of clusters of solutions of linear diameter for all κ > κ_SAT(α), and of maximal diameter for α small enough. In a different direction, Gamarnik et al. [62] established an almost sharp result in the regime of small α, stating the following: there exist constants c_0, c_1 > 0 such that for α small enough, • if κ ≥ c_0 α then a certain online algorithm of Bansal and Spencer [53] finds a solution in S(G, κ), and • if κ ≤ c_1 √(−α/log(α)) then S(G, κ) exhibits an overlap gap property ruling out a wide class of efficient algorithms.
We mention that the positive result which holds for κ ≥ c 0 α is established in the case where the constraint matrix G is Rademacher instead of Gaussian; nevertheless, the same result is expected in the Gaussian case.
Baldassi et al. [49] suggest that this computational transition can be probed by studying the monotonicity properties of the local entropy of solutions around atypical solutions x_0 as a function of the distance from this solution. One can interpret the results of [49] as evidence towards a conjecture that finding a solution is computationally easy precisely when there exist some rare solutions around which this local entropy is monotone in the distance, and that the problem becomes hard when this local entropy develops a local maximum at some distance r_0 from the reference solution x_0. If such a conjecture is correct, then it must agree with the above-mentioned finding of Gamarnik et al. [62] in the regime of small α. This question motivated the present work.
Another gap in the physics literature we elucidate in this work relates to the fact that the replica method at the one-step replica symmetry breaking level has so far not managed to find clusters of solutions in the binary perceptron. Indeed, the method can count rare clusters as long as they correspond to fixed points of a corresponding message-passing algorithm, see e.g. [80]. Parallels between the 1RSB calculation and the analysis of solutions with a monotonic local entropy have been put forward in [46,50,51], but not in the form where one writes the standard 1RSB equations and shows that they have a solution corresponding to rare subdominant clusters. We show that the standard 1RSB framework does in fact admit such solutions, which describe subdominant clusters of extensive entropy, and we give likely reasons why these solutions were missed in past investigations.

a. Local entropy around high margin solutions:
We define and study a notion of local entropy around solutions which are typical at some margin κ_0 < κ. While typical solutions at κ_0 are isolated from each other, it was shown in [43] that they belong to connected components of solutions at margin κ having a diameter linear in N. Here, we show that these solutions are surrounded by exponentially many solutions at margin κ.
Consistently with the statistical physics literature, we say that there is a cluster of extensive entropy around a reference solution x_0 when the local entropy as a function of the distance achieves a local maximum at some distance from x_0. We show that for a certain range of κ, typical solutions at margin κ_0 have extensive-entropy clusters around them. We define the entropy of these clusters as the value of the entropy at a local maximum. An analogous investigation of the local entropy around large-margin solutions was performed in [52] for the one-sided binary perceptron using the replica method.
In our case, the symmetry of the constraints (1) allows us to derive simpler formulas for the local entropy in the regime of small α, essentially via a first moment method. This is due to the present model being contiguous to a corresponding simpler planted model in which the first and second moment computations can be conducted. We show that, under a certain assumption on the concentration of the entropy of the SBP, while for any constant value of α the second moment is exponentially larger than the square of the first moment, the exponent of the ratio of these quantities, when normalized by N, tends to zero in the limit of small α.
The resulting entropy of these clusters is plotted in Fig. 1 for various values of κ and κ_0 in the α → 0 limit. We observe that at a certain margin κ_entr(κ_0) the entropy curve stops, because the local entropy curve becomes monotone in the distance for κ > κ_entr(κ_0). As discussed above, the existence of reference solutions such that the local entropy curve is monotone was speculated to provoke the onset of a region of parameters where finding solutions is algorithmically easy. In this paper, we show the existence of solutions (those typical at κ_0) for which the local entropy is monotone, and hence we do not expect the problem to be computationally hard for κ > κ_entr(κ_0). In Fig. 1 we see that the smallest κ where this happens is κ_entr ≡ min_{κ_0} κ_entr(κ_0) = κ_entr(κ_0 = κ_SAT).

FIG. 1. Entropy of clusters that exist at margin κ around a typical solution at margin κ_0. We focus on the small-α limit where the margins are rescaled as κ = κ̂ √(−α/log(α)) and κ_0 = κ̂_0 √(−α/log(α)). The dashed line corresponds to the envelope of the ending points for all values of κ̂_0, i.e. to the entropies at which the clusters (non-trivial AMP/TAP fixed points) disappear. In particular, we observe that the clusters with κ̂_0 = κ̂_SAT = 0 disappear first, thus marking a threshold of κ̂_entr ≈ 1.429 above which the so-called "wide-flat-minima" of [46,52] exist.

For this reason, a large part of this investigation is devoted to the case κ_0 = κ_SAT(α). Motivated by these findings, we then study the local entropy of solutions that are at a Hamming distance Nr from the solution planted at κ_0 = κ_SAT(α). This is akin to the Franz-Parisi potential as studied in the physics of spin glasses [61]. Here, we compute this potential around a typical solution at κ_0. Our findings, again in the regime of small α, are summarized in Fig. 2 (left), where it is apparent that the local entropy as a function of the distance r from a reference solution is monotone when κ ≥ κ̂_entr √(−α/log(α)) and has a local maximum at an intermediate distance r_0 when κ < κ̂_entr √(−α/log(α)), with κ̂_entr ≈ 1.429 given by the implicit equations (92) and (93). We also show that no solutions can be found in an interval of distances from the reference solution when κ < κ̂_ener √(−α/log(α)), with κ̂_ener ≈ 1.239 given by the implicit equations (87) and (88).
From these results, we note the existence of a logarithmic gap in 1/α between the value of κ where the local entropy curve becomes monotone and the value where the Bansal-Spencer algorithm is proved to succeed, in the regime of small α. It is an interesting open problem to close this gap, either by showing that efficient algorithms can find solutions for all κ ≥ κ̂_entr √(−α/log(α)), or by showing that the local entropy approach is not indicative of algorithmic hardness.
b. The 1RSB computation of the complexity curve: We note that in the statistical physics literature, clusters as defined above are also associated with fixed points of the approximate message passing (AMP) algorithm, or equivalently of the Thouless-Anderson-Palmer (TAP) equations. The cluster entropy can thus be computed as the Bethe entropy corresponding to the AMP/TAP fixed point that is reached by AMP run at κ and initialized in one of the typical solutions at margin κ_0. For κ > κ_entr(κ_0), AMP/TAP converges to the same fixed point as would be reached from a random initialization, corresponding to an entropy covering the whole space of solutions. Using this relation, the onset of a region where algorithms may be able to find these solutions is then related to the existence of solutions such that an AMP/TAP iteration initialized at these points converges to the same fixed point as if the iteration were initialized uniformly at random from S(G, κ). Indeed, it was observed empirically that solutions found by efficient algorithms always have such a property of AMP/TAP or the belief propagation algorithm converging to the same fixed point as from a random initialization [56,69].
In the existing statistical physics literature, using the replica method at the one-step replica symmetry breaking level, researchers have so far not found clusters of solutions of extensive entropy in the binary perceptron. This is a point of concern, as this method is supposed to count all clusters of solutions corresponding to the TAP/AMP fixed points, including the rare non-equilibrium ones [58,67,71,73,80]. This a priori casts doubt on the efficacy of the replica method and the validity of its predictions for the number of clusters of a given size, since the method misses a large part of the phase space (unless some explicit conditioning is done, as in [50,51]). We propose, based on the replica method, that the answer to this question lies in the properties of the complexity (the logarithm of the number of clusters) versus entropy (the logarithm of the number of solutions in the clusters) curve Σ(s). We observe that the numerical value of the complexity is rather large compared to the entropy. The slope of Σ(s) gives the value of the so-called Parisi parameter x, which is therefore rather large: x ≫ 1. Since the value of x describing the equilibrium properties of the system is always between 0 and 1, it is not that surprising that the literature has not investigated solutions of the replica equations corresponding to x ≫ 1. When we consider a large range of values of x in the standard 1RSB equations for the SBP [45], we obtain the Σ(s) depicted in Fig. 2 (right). We then provide an argument that leads us to conjecture that in the small α limit, the curve Σ(s) corresponds to the one we obtain via the approach of planting at κ_0. Thus even though, in general, by planting we construct only some of the rare clusters, it seems that we in fact construct the most frequent ones in the limit of small α.
Another property that we unveil relates to the fact that the curve Σ(s) is usually expected to be concave. The non-concave parts were so far considered "unphysical" in the literature (e.g. Fig. 8 in [55] or Fig. 5 in [80]). We show in the present work that the so-called "unphysical branch" of the replica/cavity prediction is actually not unphysical in the SBP, and that it reproduces the curve Σ(s) obtained from the local entropy calculation at small α and small internal cluster entropy. Moreover, we show that some of the relevant parts of the curve Σ(s) cannot be obtained in the usual iterative way of solving the 1RSB equations at a fixed value of the Parisi parameter x. To access this part of the curve, we need to adjust the value of x adaptively at every step when solving the 1RSB fixed point equations iteratively.

C. Organization of the paper and the level of rigour
The rest of the paper is organized as follows: Section II defines the local entropy and states the main Theorem 1 in the small α limit. Section III introduces the planted model and its contiguity to the original model, a key element of the proof. Section IV contains the moment computations in the planted model, ending with the proof of Theorem 1. In Section V we use the result of Theorem 1 and study the properties of the asymptotic formula for the local entropy in the small α limit. In Section VI we study the one-step replica symmetry breaking solution of the SBP and its relation to the local entropy; this section investigates general values of α, not only the small α limit. Finally, we conclude in Section VII.
Sections II to IV are fully mathematically rigorous. In Section V we analyze the resulting local entropy formula heuristically, solving the corresponding fixed point equations numerically and deriving the numerical values of the energetic and entropic thresholds. In Section VI we rely on the replica method, which is well-accepted and widely used in theoretical statistical physics but not rigorously justified from the mathematical standpoint.

II. DEFINITIONS AND MAIN THEOREM
In this paper, the local entropy is defined around a solution satisfying the SBP inequalities (1) with a stricter margin κ_0. More precisely, for κ_0 ≤ κ, let x_0 ∈ S(G, κ_0), and let Z(x_0, κ, r) be the number of solutions y ∈ S(G, κ) which are at Hamming distance Nr from x_0. We then define the local entropy function φ_N,δ(r) as the (truncated) logarithm of Z averaged over the choice of x_0 and the disorder G, where log^N_δ(x) = max{log(x), Nδ}, δ > 0. This truncation of the logarithm is technically convenient, following [74,78]. Note that for κ_0 = κ, the fact that with high probability there are no solutions at distance less than r_0 N around x_0, for some r_0 = r_0(κ, α) [75], implies that φ_N,δ(r) = δ + o_N(1) for all r < r_0, and so lim_{δ→0} lim_{N→∞} φ_N,δ(r) = 0 for r < r_0. However, as we increase κ starting from κ_0, new nearby solutions are expected to emerge. These are the solutions counted by φ_N,δ(r). This, of course, does not contradict the frozen-1RSB property of S(G, κ), since x_0 is not typical in S(G, κ).
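As a toy illustration of Z(x_0, κ, r) (not part of the paper's proof), the sketch below picks a reference solution x_0 ∈ S(G, κ_0) by exhaustive search over a tiny instance and tabulates the number of κ-solutions at each Hamming distance d = Nr from it; all sizes and margins are arbitrary choices of ours.

```python
import itertools

import numpy as np

rng = np.random.default_rng(0)
N, M, kappa0, kappa = 12, 5, 0.8, 1.5   # toy sizes; kappa0 <= kappa as in the text
sqrtN = np.sqrt(N)
cube = [np.array(b) for b in itertools.product([-1, 1], repeat=N)]

# resample the disorder until S(G, kappa0) is nonempty (terminates quickly at these sizes)
while True:
    G = rng.standard_normal((M, N))
    S0 = [x for x in cube if np.all(np.abs(G @ x) <= kappa0 * sqrtN)]
    if S0:
        break

x0 = S0[int(rng.integers(len(S0)))]     # reference solution at the stricter margin

# counts[d] = Z(x0, kappa, d/N): number of kappa-solutions at Hamming distance d from x0
counts = np.zeros(N + 1, dtype=int)
for y in cube:
    if np.all(np.abs(G @ y) <= kappa * sqrtN):
        counts[int((y != x0).sum())] += 1
print(counts)
```

Since κ_0 ≤ κ, the reference x_0 is itself a κ-solution, so counts[0] = 1; and because y is a solution exactly when −y is, the distance profile is symmetric, counts[d] = counts[N − d].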
We show that under a certain concentration condition (Assumption 1, stated in Section III), the local entropy φ_N,δ(r) is given, in the limit N → ∞ followed by α → 0 and then δ → 0, by a simple formula which corresponds to the first moment bound (i.e., the annealed entropy) in the corresponding planted model of the SBP. We define the binary entropy function H(x) = −x log x − (1 − x) log(1 − x), and the function ϕ_1, where the outer expectation in its definition is taken with respect to Z_0 ∼ N(0, 1) conditioned on the event |Z_0| ≤ κ_0, and Z ∼ N(0, 1) independent of Z_0. (Z_0 has p.d.f. f; Eq. (12).) Remark. Observe that for α small, we have κ_SAT(α) = Θ(2^{−1/α}); therefore the condition on κ and α in the theorem can be interpreted as κ ≫ κ_SAT(α).
The proof of Theorem 1 can be found in Section IV.

III. THE PLANTED MODEL AND CONTIGUITY
The analysis of the local entropy is achieved via a planted model where x 0 is drawn uniformly at random from the hypercube {−1, +1} N and then the constraint vectors g a are drawn from the Gaussian distribution conditional on x 0 being a satisfying configuration, i.e., conditional on x 0 ∈ S(G, κ 0 ).
More precisely, we fix the reference (planted) vector x_0 ∈ {−1, +1}^N and for each a ∈ {1, …, M} we independently draw Gaussian random vectors g_a conditioned on the event that |⟨g_a, x_0⟩| ≤ κ_0 √N. Equivalently (Eq. (10)), the conditioned vectors can be written in terms of independent N(0, I_N) random vectors (g_a)_{a=1}^M and a vector w = (w_a)_{a=1}^M with mutually independent coordinates, independent of (g_a)_{a=1}^M, distributed as N(0, 1) random variables conditioned to be smaller than κ_0 in absolute value, i.e., with the p.d.f. f of Eq. (12).
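A concrete way to sample the planted disorder, under our reading of Eqs. (10)-(12): replace the component of each row g_a along x_0 by a truncated Gaussian w_a obtained via rejection sampling. The decomposition g̃_a = g_a − (⟨g_a, x_0⟩/N) x_0 + (w_a/√N) x_0 below is our reconstruction of the elided display and should be checked against Eq. (10).

```python
import numpy as np

def sample_truncated_normal(rng, kappa0, size):
    # rejection-sample N(0,1) variables conditioned on |w| <= kappa0 (the p.d.f. f of Eq. (12))
    out = np.empty(size)
    filled = 0
    while filled < size:
        z = rng.standard_normal(size)
        z = z[np.abs(z) <= kappa0]
        take = min(len(z), size - filled)
        out[filled:filled + take] = z[:take]
        filled += take
    return out

def planted_disorder(rng, x0, M, kappa0):
    # draw rows g_a ~ N(0, I_N) conditioned on |<g_a, x0>| <= kappa0 * sqrt(N):
    # swap the component of g_a along x0 for a truncated Gaussian w_a
    N = len(x0)
    G = rng.standard_normal((M, N))
    w = sample_truncated_normal(rng, kappa0, M)
    overlap = G @ x0 / N                      # component along x0, since <x0, x0> = N
    Gt = G - np.outer(overlap, x0) + np.outer(w / np.sqrt(N), x0)
    return Gt, w

rng = np.random.default_rng(0)
N, M, kappa0 = 50, 25, 0.5
x0 = rng.choice([-1, 1], size=N)
Gt, w = planted_disorder(rng, x0, M, kappa0)
print(bool(np.all(np.abs(Gt @ x0) <= kappa0 * np.sqrt(N) + 1e-9)))  # True by construction
```

By construction ⟨g̃_a, x_0⟩ = √N w_a, so the planted x_0 satisfies all constraints at margin κ_0, which is the defining property of the planted model.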
We let ℙ_pl be the distribution of the pair (G, x_0) as per the description above, Eq. (10), and ℙ_rd be their distribution according to the original model, where G ∈ ℝ^{M×N} is an array of standard Gaussian vectors and x_0 is drawn uniformly at random from S(G, κ_0), conditional on the latter being non-empty. We denote by 𝔼_pl and 𝔼_rd the associated expectations. A simple computation reveals the ratio of ℙ_pl to ℙ_rd. It was shown in [43], in the case of binary disorder where G has independent Rademacher entries, that this likelihood ratio has constant-order log-normal fluctuations for all κ_0 > κ_SAT(α); see [77] for a similar result with Gaussian disorder, for κ_0 close to κ_SAT(α). This implies in particular that ℙ_rd and ℙ_pl are mutually contiguous, meaning that for any sequence of events E_N (in the common probability space of ℙ_rd and ℙ_pl), ℙ_rd(E_N) → 0 if and only if ℙ_pl(E_N) → 0; see for instance [79, Lemma 6.4]. In other words, any high-probability event under the planted distribution ℙ_pl is also a high-probability event under the original distribution ℙ_rd. Contiguity allows us to compute the local entropy in the planted model, where x_0 is uniformly distributed over {−1, +1}^N instead of S(G, κ_0), and then transfer the result of this computation to the original model. In our case a result slightly weaker than contiguity is sufficient: Perkins and Xu [75] showed that, under a certain numerical assumption (see Assumption 1 therein), a certain limit holds in ℙ_rd-probability for all κ > κ_SAT(α). As observed in [44,75], this implies the weaker statement that events of probability e^{−cN}, c > 0, under ℙ_pl have probability o_N(1) under ℙ_rd. This turns out to be sufficient for our purposes, and the argument is used to prove Lemma 2 below.
In addition to the above, we require a concentration property of the restricted partition function Z(x_0, κ, r) with respect to the disorder G, which we state in the following more general form: let a_j < b_j, 1 ≤ j ≤ M, be two sequences of real numbers, let m ∈ [−1, 1], and consider the partition function defined in terms of i.i.d. standard Gaussian random vectors g_j in ℝ^N.
Assumption 1. For any δ > 0, m ∈ [−1, 1] and sequences (a_j), (b_j) as above, there exists a constant C > 0, depending only on δ and ∆ := max_j(b_j − a_j), such that for all t > 0 the concentration bound of Eq. (16) holds. In models of disordered systems where the free energy is a smooth function of the Gaussian disorder, this concentration follows from general principles of Gaussian concentration of Lipschitz functions, see e.g. [54]. In particular, a stronger version of the above assumption (with no truncation of the logarithm, and where the decay on the right-hand side is sub-Gaussian for all t > 0) holds for the SK and p-spin models at any positive temperature, and for the family of U-perceptrons where the activation function U is positive and differentiable with bounded derivative. However, in our case the hard constraints defining the model make concentration far less obvious. Currently, exponential concentration of the truncated log-partition function is known for the half-space model, i.e., the one-sided perceptron [78], and for the more general family of U-perceptrons, which includes the SBP model under study here, albeit with a non-optimal exponent in N on the right-hand side of Eq. (16), and with an additional slowly vanishing term on the right-hand side; see [74, Proposition 4.5]. (The latter paper also studies concentration and the sharp-threshold phenomenon for more general disorder distributions.) For our purposes, the essential feature is exponential decay in N θ(t), where θ : ℝ_+ → ℝ_+ is any increasing function with θ(0) = 0. We assume θ(t) = min{t², t} in the above, since this is the sub-exponential tail which is expected, but this is not crucial to the proof. Establishing the above assumption is an interesting mathematical problem in its own right and goes beyond the scope of this paper.
In the planted model, the local entropy takes a simplified form φ^pl_N,δ(r), where the expectation is taken with respect to x_0 uniform in {−1, +1}^N and the conditional distribution G | x_0 given by Eq. (11). We now show that under Assumption 1, φ_N,δ(r) and φ^pl_N,δ(r) are close.

Lemma 2. Under Assumption 1 we have, for all r ∈ (0, 1), lim_{N→∞} |φ_N,δ(r) − φ^pl_N,δ(r)| = 0.

Proof. We define the random variable X = (1/N) log^N_δ Z(x_0, κ, r). We have 𝔼_pl[X] = φ^pl_N,δ(r) and 𝔼_rd[X] = φ_N,δ(r). Now, for t > 0 fixed, we consider the event A = {|X − 𝔼_pl[X]| ≤ t}. Under the planted model ℙ_pl we may assume that x_0 = 1 by symmetry of the Gaussian distribution. Therefore, by Assumption 1 (with ∆ = (1 − 2r)κ), we have ℙ_pl(A^c) ≤ e^{−cN}, c = c(t) > 0. We show that this, combined with (14), implies ℙ_rd(A^c) = o_N(1). Indeed, for any ϵ > 0, one has a bound whose second term is o_N(1) by (14); taking ϵ = c/2 shows that ℙ_rd(A^c) = o_N(1). Further, observe that 0 ≤ X ≤ log 2, ℙ_rd-almost surely. The claim follows by letting t → 0 after N → ∞.

IV. MOMENT ESTIMATES IN THE PLANTED MODEL
We now aim to calculate the limit of φ^pl_N(r) as N → ∞ for small α. To this end we evaluate the first two moments of Z(x_0, κ, r) and show that the second moment is larger than the square of the first moment only by an exponential factor which shrinks as α → 0. We then show that φ^pl_N(r) is close to its annealed approximation using Assumption 1. We first need to define two auxiliary functions. For a jointly distributed pair of discrete random variables (θ_1, θ_2), let h(θ_1, θ_2) be their Shannon entropy. For m, q ∈ (−1, 1) we define a function of (m, q), where Z_0 ∼ f and the pair (Z_1, Z_2) is a centered bivariate Gaussian vector, independent of Z_0, with covariance given in Eq. (23).

Theorem 3. Let w = (w_a)_{a=1}^M be as in Eq. (11).
where ϕ_1 is defined in Eq. (8), and the inner maximization in Eq. (25) is over the joint distribution of the pair (θ_1, θ_2). The proof of the above theorem relies on a standard use of Stirling's formula and is postponed to the end of this section. At this point, if the right-hand side of Eq. (25) were equal to twice the right-hand side of Eq. (24), a mild concentration argument would allow us to conclude that φ^pl_N(r) is given by Eq. (24) in the large N limit. This equality would follow if the value q = m² were a maximizer in Eq. (25). This does not appear to be the case for any values of α, κ_0, κ. However, we show that the difference is vanishing when α → 0 (Lemma 4), where p(κ) = ℙ(|Z| ≤ κ), Z ∼ N(0, 1). In particular, the above difference tends to zero whenever α → 0, κ → 0 with α log(1/κ) → 0, and κ_0 ≪ κ.
Proof. We first remark that, by sub-additivity, the joint Shannon entropy is at most the sum of the marginal entropies, with equality if and only if the pair (θ_1, θ_2) is independent, i.e., if q = m². Moreover, we remark that for all q ∈ [−1, 1] a two-sided bound holds, where the lower bound follows from the Gaussian correlation inequality [68,76] (with equality if (Z_1, Z_2) are independent, i.e., q = m²) and the upper bound from Cauchy-Schwarz (with equality if Z_1 = Z_2, i.e., q = 1). The claimed estimate follows from the bounds (29) and (30). It remains to show that ϕ_1 is a non-decreasing function, so that ϕ_1(m) ≥ ϕ_1(0) = log p(κ). A simple computation of the derivative of ϕ_1 yields an expression in which Z_0 has p.d.f. (12) and the expectation is taken with respect to Z_0. We will show that the numerator of this expression is non-negative for all values of Z_0. First, note that this expression is even as a function of Z_0, so we may assume Z_0 ≥ 0 without loss of generality. Since Z_0 ≤ κ_0 a.s., the expression is non-negative if m ≥ κ_0/κ. Now consider the remaining case m < κ_0/κ. Processing the numerator further and using the bound tanh(x) ≤ x for x ≥ 0, we see that the expression is non-negative in this case as well, which concludes the proof.

We are now ready to prove the main result of this section.

Theorem 5. Under the assumptions of Theorem 1 we have the stated convergence, where the limit in α is such that α → 0, κ → 0 with α log(1/κ) → 0.
We see that Theorem 1 follows from Theorem 5 and Lemma 2. We now prove Theorem 5.

Proof. We write Z = Z(x_0, κ, r). All probabilities and expectations are taken under ℙ_pl. For fixed t, t′ > 0, to be chosen later, we define the events B, C, D. First, we note the upper bound given by Jensen's inequality. Since, by Theorem 3, (1/N) log 𝔼_pl[Z | w] → ϕ_1(m) almost surely as N → ∞, we obtain the lim sup bound by dominated convergence. Next, on the intersection of these events the last inequality follows from C ∩ D, and Lemma 4 controls the resulting difference. On the other hand, by our concentration Assumption 1, ℙ_pl(B^c) ≤ exp(−N min{t′²/K², t′/K}), where K = K(δ, ∆) and ∆ = (1 − 2r)κ is the constant appearing in the assumption; we then apply a union bound. Now we choose t = (1/3)(2δ + α log(1/p(κ))) and t′ = t′_N such that min{t′²_N/K², t′_N/K} = (log 16)/N + 2α log(1/p(κ)) + 4δ. With this choice of parameters the bound (47) holds, and we obtain the matching lim inf bound. Letting α → 0, κ → 0 such that α log(1/κ) → 0, and then δ → 0, concludes the proof.
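Several quantities in this section are expressed through p(κ) = ℙ(|Z| ≤ κ) for Z ∼ N(0, 1), which equals erf(κ/√2) by a standard identity; this is easy to sanity-check numerically (the sample size below is an arbitrary choice of ours).

```python
import math

import numpy as np

def p(kappa):
    # p(kappa) = P(|Z| <= kappa), Z ~ N(0, 1), via the error function
    return math.erf(kappa / math.sqrt(2))

rng = np.random.default_rng(0)
z = rng.standard_normal(2_000_000)
for kappa in (0.1, 0.5, 1.0):
    mc = float(np.mean(np.abs(z) <= kappa))   # Monte Carlo estimate of p(kappa)
    print(kappa, p(kappa), mc)
```

For small κ one has p(κ) ≈ κ√(2/π), so log(1/p(κ)) grows like log(1/κ); this is the combination α log(1/κ) that is required to vanish in Theorem 5.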
Proof of Theorem 3. Let us start with the first moment, where Z ∼ N(0, 1) independently. Using Stirling's formula and an application of the strong law of large numbers, we obtain the formula in Eq. (24). We now calculate the second moment. Fix m, q ∈ [−1, 1] and three vectors x_0, x_1, x_2 such that ⟨x_1, x_2⟩ = Nq and ⟨x_i, x_0⟩ = Nm for i = 1, 2. Then, as before, the conditional second moment can be written in terms of the pair (Z_1, Z_2) defined as in Eq. (23). Furthermore, by symmetry we can assume that x_0 = 1, and we define the set C(m, q). Next we compute the size of C(m, q). We can write it as a sum of multinomial coefficients, where the sum is restricted to those integers satisfying the overlap constraints. Using Stirling's formula, we find that the exponential rate is given by the maximization as in Eq. (25). The relevant quantities concentrate with deviation probability at most e^{−N t²/(2C)} for some constant C > 0 by the Azuma-Hoeffding inequality. Since the maximum in Eq. (69) is taken over no more than 2N + 1 values, we can let t = t_N → 0 slowly with N such that Σ_N e^{−N t²_N/(2C)} < ∞. The Borel-Cantelli lemma and continuity allow us to conclude the proof.
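The proof above converts multinomial coefficients into Shannon entropies via Stirling's formula: (1/N) log(multinomial) → h(p) up to O(log N / N) corrections. A quick numerical check (sizes and probabilities are our arbitrary choices):

```python
import math

def log_multinomial(counts):
    # log( n! / (c_1! ... c_k!) ) computed exactly via lgamma
    n = sum(counts)
    return math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in counts)

N = 10_000
p = [0.2, 0.3, 0.5]
counts = [int(N * pi) for pi in p]          # (2000, 3000, 5000), sums to N
H = -sum(pi * math.log(pi) for pi in p)     # Shannon entropy in nats
approx = log_multinomial(counts) / N
print(H, approx)                             # agree to about 1e-3
```

The residual discrepancy comes from the polynomial (in N) prefactors of Stirling's formula, which are subexponential and hence irrelevant at the level of the entropy rates used in the proof.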

V. ANALYSING THE LOCAL ENTROPY AND ITS THRESHOLDS
Having shown in Theorem 1 that the local entropy φ_N,δ(r) is asymptotically given by the formula max{0, φ_1(r)} in the limit N → ∞, α → 0, then δ → 0, we now focus on the analysis of this function. In App. A we derive the local entropy for generic values of these parameters and show a posteriori how we can recover the limit presented above. A first step to simplify our analysis is to rewrite ϕ_1(m), where N_κ_0 = ℙ(|Z_0| ≤ κ_0), and we let DZ_0 = e^{−Z_0²/2} dZ_0/√(2π) denote the Gaussian measure. We recall that the error function is erf(x) = (2/√π) ∫_0^x e^{−t²} dt. In fact, when α ≪ 1 the local entropy is a non-trivial function only in a restricted range of the parameters κ and m: the entropic and energetic contributions have to be comparable. This leads us to introduce the rescaling (1 − m)/2 = −α r̂/log(α), κ_0 = κ̂_0 √(−α/log(α)), and κ = κ̂ √(−α/log(α)), (79) in order to have both ϕ_1(m) and h(m) contributing at order O(α) to the local entropy when α ≪ 1. This first indicates that we can restrict our analysis to the regime 1 − m ≪ 1; consequently, the entropic term simplifies. Then, using this rescaling, we obtain the simplified form of the local entropy and the equation for its local maxima (at r̂ ≠ 0), again with N_κ̂_0 = ℙ(|Z_0| ≤ κ̂_0). The presence of this local maximum in the potential tells us that there is a cluster of atypical solutions with margin κ around each typical configuration with margin κ_0. In the following, we denote by s[κ_0, κ] the local entropy evaluated at this maximum.
In Fig. 1 we display the behavior of the local entropy s[κ_0, κ] as a function of κ̂_0 and κ̂. As outlined by the dashed line, clusters exist only for a finite span of values of κ̂, which depends on the margin κ̂_0 of the reference vector x_0. Defining κ̂_entr(κ̂_0) as the critical value of κ̂ at which clusters disappear, we see from the figure that κ̂_entr ≡ min_{κ̂_0} κ̂_entr(κ̂_0) = κ̂_entr(κ̂_0 = 0). In other words, the first clusters to disappear are the ones formed around a reference vector at κ̂_0 = 0. In particular, this corresponds to planting at κ_0 = κ_SAT, as we have κ̂_SAT = 0 in the small-α limit. Since clusters are associated with AMP/TAP fixed points, κ̂_entr corresponds to the margin above which AMP/TAP initialized close to a typical solution with margin κ_0 = κ_SAT converges to the same fixed point as would be reached from a random initialization. The existence of solutions from which AMP/TAP converges to this trivial fixed point was linked to the onset of a region where algorithms may be able to find solutions. More precisely, numerical evidence in the literature suggests that solutions found by efficient algorithms do not correspond to AMP/TAP fixed points other than the one reached from random initialization [56,69].
As shown in Fig. 2, in which we plant at κ_0 = κ_SAT, the local entropy exhibits two interesting thresholds at distinctive values of κ (for fixed α). One is the value of κ above which the potential remains positive for all m; we refer to it as the energetic threshold, κ = κ_ener(κ_0). In other words, above this critical margin we can find solutions to the symmetric binary perceptron with margin κ at any distance from the reference vector. The second critical value of κ corresponds to the loss of the local maximum at m ≠ 0 in the potential. This is the entropic threshold mentioned earlier, κ = κ_entr(κ_0).
In the following two sections, we focus our analysis on these two thresholds in the case $\hat\kappa_0 = 0$. Again, this choice is justified by the fact that the energetic and entropic thresholds occur first when planting at $\hat\kappa_0 = \hat\kappa_{\rm SAT} \to 0$ as $\alpha \to 0$, i.e.

$\hat\kappa_{\rm ener}(0) = \min_{\hat\kappa_0} \hat\kappa_{\rm ener}(\hat\kappa_0)\,.$ (84)

Similarly to the entropic threshold, we will use in the following the shorthand $\hat\kappa_{\rm ener}(0) = \hat\kappa_{\rm ener}$.

A. Energetic threshold
The energetic threshold occurs when the local entropy $\varphi_1(\hat r)$ is negative in a range of intermediate distances. This means that we want to find the exact point at which the minimum of the entropy (excluding m = 1) is zero. We start by setting $\hat\kappa_0 = 0$ in Eq. (81) to obtain the simplified form of the local entropy. The potential then vanishes when its value and its derivative with respect to $\hat r$ are simultaneously zero. Solving these two equations, we obtain the pair $\{\hat\kappa_{\rm ener}, \hat r\}$ at which the potential stops being negative for any value of the magnetization m. Numerically we obtain $\hat\kappa_{\rm ener} \approx 1.239$, i.e. $\kappa_{\rm ener} \approx 1.239\sqrt{-\alpha/\log(\alpha)}$.
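The numerical determination of $\hat\kappa_{\rm ener}$ amounts to a tangency condition: at threshold, the interior minimum of the local entropy touches zero. A compact way to state the system solved above is:

```latex
% Tangency condition defining the energetic threshold:
% the interior minimum of the local entropy touches zero.
\begin{aligned}
\varphi_1(\hat r^\ast) &= 0, \\
\left.\frac{\partial \varphi_1}{\partial \hat r}\right|_{\hat r = \hat r^\ast} &= 0,
\end{aligned}
\qquad \text{solved jointly for } (\hat\kappa_{\mathrm{ener}}, \hat r^\ast).
```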

B. Entropic threshold
The entropic threshold occurs when the local maximum at m ≠ 0 of the free entropy ceases to exist. We recall that the local entropy for $\hat\kappa_0 = 0$ reads as above. Solving numerically the two previous equations, we obtain $\hat\kappa_{\rm entr} \approx 1.429$, i.e. $\kappa_{\rm entr} \approx 1.429\sqrt{-\alpha/\log(\alpha)}$.
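At this threshold the local maximum merges with the adjacent minimum, which is the usual saddle-node condition; the system solved numerically can be summarized as:

```latex
% Saddle-node condition defining the entropic threshold:
% the local maximum and the adjacent minimum of the local entropy merge.
\begin{aligned}
\left.\frac{\partial \varphi_1}{\partial \hat r}\right|_{\hat r = \hat r^\ast} &= 0, \\
\left.\frac{\partial^2 \varphi_1}{\partial \hat r^2}\right|_{\hat r = \hat r^\ast} &= 0,
\end{aligned}
\qquad \text{solved jointly for } (\hat\kappa_{\mathrm{entr}}, \hat r^\ast).
```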

C. Complexity versus entropy
In this section, we focus on the relation between the complexity of the clusters around the high-margin solutions and their local entropy. We define the complexity as the logarithm of the number of clusters around solutions with margin κ₀, normalized by N, and we recall that the local entropy of a cluster is the value of the local entropy $\varphi_1(r = \frac{1-m}{2})$ at the local maximum nearest to the reference solution. By contiguity to the planted model, the clusters of solutions with margin κ > κ₀ living around two different planted configurations are distant from each other, since the reference configurations are nearly orthogonal with high probability. Thus, heuristically, counting their exponential number (or complexity) simply amounts to enumerating the number of typical solutions at κ₀ that we can plant.
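Since the first moment is known to be tight for the SBP, the exponential number of typical κ₀-solutions — and hence, heuristically, the complexity of the planted clusters — can be read off from the annealed entropy. A minimal numeric sketch (the function name is ours):

```python
from math import erf, log, sqrt

def annealed_entropy(kappa0: float, alpha: float) -> float:
    """(1/N) log E|S(G, kappa0)| = log 2 + alpha * log P(|Z| <= kappa0),
    with Z a standard Gaussian. For the SBP the annealed value matches
    the quenched entropy (the first moment is tight)."""
    return log(2.0) + alpha * log(erf(kappa0 / sqrt(2.0)))
```

As a sanity check, the satisfiability threshold is where this entropy vanishes; at α = 0.5 the root reproduces the value κ_SAT(0.5) ≈ 0.319 quoted later in the text.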
Taking these considerations into account, the resulting clusters have a complexity that depends solely on κ₀ while their local entropy is a function of κ and κ₀. Fixing κ while tuning κ₀ enables us to scan across sets of clusters with different complexities and local entropies, all containing atypical solutions of the symmetric binary perceptron with margin κ. More specifically, the complexity and the entropy of a cluster follow, where m is evaluated with the fixed-point equation (98). Using the rescaling from the previous section, we can finally write these two functions at leading order in α → 0, where we recall that $\hat r$ is evaluated with the corresponding fixed-point condition; see Fig. 2 (right panel).

VI. ANALYSIS OF THE CLUSTERED STRUCTURE THROUGH THE REPLICA METHOD A. The 1-RSB free energy
In this section, we show how the clustered structures we obtained with the planting approach can also be observed via the ordinary 1-RSB computation [73]. For this we consider the set of solutions S(G, κ) of the unbiased symmetric binary perceptron. In particular, we consider its cardinality and its total entropy, defined as the logarithm of Z averaged over the disorder G. To perform the average over the disorder we use the replica trick [73], which takes the form

$\mathbb{E}_G[\log Z] = \lim_{n \to 0} \frac{1}{n} \log \mathbb{E}_G[Z^n]\,,$

where each of the n introduced copies of the system is called a replica. This technique enables us to shift from a computation in which the interactions are random and the replicas decoupled to one in which the replicas interact through deterministic couplings. With this approach, the rest of the computation mainly consists in evaluating the quantity $\mathbb{E}_G[Z^n(\kappa)]$ at a fixed point of the overlap matrix $Q \in \mathbb{R}^{n \times n}$, with $Q_{ab} = \langle x^a, x^b \rangle / N$. Moreover, as the constraints on the overlaps are introduced via conjugate parameters, we also have to evaluate $\mathbb{E}_G[Z^n(\kappa)]$ at a fixed point of the conjugate matrix $\hat Q$. The computation of $\mathbb{E}_G[Z^n(\kappa)]$ with the 1-step replica symmetry breaking (1-RSB) ansatz implies a specific form for the matrices $Q$ and $\hat Q$. With this ansatz, Eq. (108) boils down to the 1-RSB potential $\varphi_{1\text{-RSB}}$. For more details on the calculation steps used to derive $\varphi_{1\text{-RSB}}$ we refer the interested reader to the first appendix of [45].
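For reference, the standard 1-RSB parametrization of the overlap matrix (n replicas split into n/x blocks of size x) reads:

```latex
% 1-RSB ansatz: replicas a, b in the same block of size x share overlap q_1,
% replicas in different blocks share q_0, and the diagonal is 1.
Q_{ab} =
\begin{cases}
1 & a = b,\\
q_1 & a \neq b \text{ in the same block},\\
q_0 & \text{otherwise},
\end{cases}
\qquad
\hat Q \text{ parametrized identically by } (\hat q_0, \hat q_1).
```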
Before moving on with the analysis of the 1-RSB potential, a first simplification consists in taking into account a symmetry of the in/out channels. Indeed, this symmetry implies that optimizing the potential yields the solution $q_0 = \hat q_0 = 0$. Thus, in the following, we always take this solution. The remaining equations to be verified at the fixed point then involve only $q_1$ and $\hat q_1$. With these definitions, the entropy and complexity of the clusters can be determined at the fixed point.

B. The 1RSB solution at finite α
When it comes to solving the 1RSB equations, we focus in this subsection on α = 0.5 as a representative value not close to zero; the corresponding satisfiability threshold is $\kappa_{\rm SAT}(\alpha = 0.5) \approx 0.319$. We obtained four branches of solutions when solving the fixed-point equations (114, 115) with respect to $q_1$ and $\hat q_1$ for the 1-RSB potential (and browsing through values of the Parisi parameter x). Two of these solutions are unstable under the iteration scheme while the remaining two are stable. When browsing different values of x, we also observe a threshold value of κ at which the overall behavior of these fixed points changes. We call this value $\kappa_{\rm break}(\alpha = 0.5) \approx 0.455$. In Fig. 4 (left panel) we plot the complexity Σ of the four branches as a function of their entropy s. When tuning x, each solution describes a trajectory that we highlight with either a dashed (unstable fixed point) or a full line (stable fixed point). One key question arising from these results is how to select the fixed-point branch that corresponds to the actual clusters of solutions in the problem. First, we clearly need to restrict to non-negative Σ and non-negative s. Moreover, we know that the correct equilibrium state is given by the solution where the total entropy is maximized. For the present model, this happens at s = 0, where the (negative) slope of the Σ(s) curve is infinite; indeed, the slope of the curve Σ(s) is much smaller than −1 over the whole range. We recall that this slope is equal to −x, where x is the Parisi parameter, as explained in Eq. (116). We highlight this equilibrium point with a colored dot in the left panel of Fig. 4.
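The statement that the slope of Σ(s) equals −x is the usual Legendre structure of the replicated free entropy; in one common convention (consistent with Eq. (116)) it reads:

```latex
% Legendre relation between the replicated free entropy Phi(x),
% the complexity Sigma(s), and the internal entropy s of a cluster.
\Phi(x) = \max_{s}\left[\Sigma(s) + x\, s\right],
\qquad
s = \frac{\partial \Phi}{\partial x},
\qquad
\Sigma = \Phi - x\,\frac{\partial \Phi}{\partial x},
\qquad
\frac{\partial \Sigma}{\partial s} = -x .
```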
In particular, this point Σ(0) corresponds to the equilibrium frozen 1RSB solution of the SBP problem, with a value matching the one computed in [45]. We note that this criterion for equilibrium is rather unusual compared to other models where the 1RSB solution was evaluated. Usually, either the maximum is achieved with both Σ > 0 and s > 0 at slope x = 1, corresponding to the so-called dynamical-1RSB phase, or it is achieved at Σ = 0 with a slope −x for some 0 < x < 1, corresponding to the so-called static-1RSB phase. Here, we observe the equilibrium being achieved for x → +∞, corresponding to frozen-1RSB at equilibrium. Finally, we observe that for $\kappa > \kappa_{\rm break} \approx 0.455$ (still considering α = 0.5) the curve Σ(s), for positive values of both s and Σ, breaks into two branches. Consequently, there is a finite range of values of the entropy s in which we do not obtain any fixed point. The meaning of such a gap is unclear, but it also appears in the 1-RSB solution of other problems [81].
FIG. 4. In both panels we plot the complexity Σ as a function of the entropy s, at α = 0.5. On the left panel, we plot all the branches obtained when solving the saddle-point equations with respect to $q_1$ and $\hat q_1$ for the 1-RSB potential (and browsing through values of the Parisi parameter x). As a guide to the eye we emphasize each of the four branches of solutions with either a full or a dashed line. The full lines correspond to fixed points that are stable under the iteration scheme of Eq. (117), while the dashed ones correspond to unstable fixed points. The colored dots highlight that certain branches stop at s = 0 at a value of Σ corresponding to the equilibrium solutions; to reach this point, the Parisi parameter x has to be sent to infinity. On the right panel, we compare the results obtained from the branches yielding this equilibrium complexity with those obtained with the previous planting method. We added colored dots where the fixed point (either of the planted or of the 1-RSB saddle-point equations) with maximum entropy s is obtained at Σ = 0.
In the right panel of Fig. 4, we plot the complexity Σ as a function of the entropy s, selecting the branch that is an analytic continuation of the equilibrium Σ(0) point, and compare it to the one obtained via the planting approach. For this comparison, we need the local entropy in the planted model at finite values of α, which is derived in appendix A. We note that the two complexities agree exactly at s = 0, as expected, because at s = 0 both complexities correspond to the total number of solutions at that κ. For s > 0, the two complexities have a similar shape, being clearly convex for small values of s. We note again that the overall values of Σ are larger than the values of s, meaning that the magnitude of the slope is rather large over the whole range of these curves. We recall that in the context of the 1-RSB computation this slope is equal to −x, where x is the Parisi parameter. Values of x much larger than one are not common in other models for which 1RSB was studied. This is likely the reason why these extensive-size clusters were not described earlier in the literature on the binary perceptron. We further see that the 1RSB complexity, when it exists, is strictly larger than the one obtained via planting, again as expected, since via planting we obtain only some of the clusters of solutions whereas the 1RSB computation should be able to count all of them. Then, when $\kappa > \kappa_{\rm break}$, the planted model predicts the existence of clusters with an internal entropy that lies inside the fixed-point gap of the 1-RSB approach. This indicates that the 1RSB solution does not fully describe the space of solutions in this case. This may have many causes: we may have missed a branch of fixed points in our analysis of the 1-RSB potential; this region may involve a replica ansatz with further symmetry breaking; or perhaps these rare clusters simply cannot be obtained with a replica computation. Finally, when $\kappa > \kappa_{\rm ener}(\alpha = 0.5) \approx 0.499$, the curves Σ(s) obtained with planting stop at some positive values of Σ and s, and thus look again qualitatively similar to the portion of the curve Σ(s) obtained from 1RSB by analytic continuation from the equilibrium Σ(s = 0) point.
Overall, the 1RSB approach evaluated at sufficiently large values of the Parisi parameter x identifies clusters of extensive size, in parts of the solution space corresponding to a convex Σ(s) curve that is unstable under the iterations of the 1RSB fixed-point equations. These curves are partly compatible with the complexity obtained from planting. Yet there are still regions of κ and s for which we obtain extensive clusters of solutions from the planting procedure but not from the 1RSB. The reason behind this paradox is left for future work.
For small α, the situation actually becomes clearer. We discuss this case in the next section.
C. The α → 0 and x → +∞ limit

We now focus, as in the first part of the paper, on the regime of small α. Using our results from the planting computation, and anticipating a similar behavior in the 1RSB, we can deduce the behavior of the Parisi parameter x in the low-α limit. Indeed, like the 1-RSB computation, the planting approach probes clustered solutions. It also allows for computing their complexity Σ and local entropy s, see Eqs. (99) and (100). As mentioned above, in the context of a 1-RSB computation we have ∂Σ/∂s = −x. Thus, if we plug in the entropy and complexity from Eqs. (99) and (100), we can compute ∂Σ/∂s and estimate the Parisi parameter.
By doing so we obtain two regimes in which the slope ∂Σ/∂s becomes infinite in the low-α limit. First, when $\hat\kappa_0 \ll \hat\kappa$ the entropy remains constant at first order in α while the complexity roughly jumps from $\Sigma_o$ to zero, see the left panel of Fig. 2. This indicates that to recover these states with the 1-RSB computation we should set x ≫ 1. The second regime in which we observe an infinite slope corresponds to $\hat\kappa_0 \approx \hat\kappa$. Indeed, we have $|\partial\Sigma/\partial s| \sim (\hat\kappa - \hat\kappa_0)^{-1}$ close to $\hat\kappa_0 = \hat\kappa$. Consequently, if we want to probe the clusters with almost zero local entropy, we should also set x ≫ 1 to obtain this regime.
A last piece of information given by the planted model is that these clusters (in the two regimes mentioned above) correspond to a limit where $q = m^2 \approx 1$. Thus, if we impose the same condition in the 1-RSB fixed-point equations, setting $q_1 \approx 1$ in Eq. (115) implies $\hat q_1 \gg 1$. Therefore, in order to find these clusters, we will not only set x ≫ 1 but also take $q_1 \approx 1$ and $\hat q_1 \gg 1$.
The first regime mentioned above will be referred to as the maximum entropy regime, while the second will be referred to as the minimum entropy regime. First, we note that the entropic contribution can be simplified identically in both regimes: setting $\hat q_1 \gg 1$ simplifies it directly. As we will see in the following subsections, the simplification of the energetic term is regime-dependent.

Maximum entropy regime
For the maximum entropy regime, we can again go back to the results from the planting model to help us make the correct approximation. We know, for example, that we should have $1 - q_1 \sim \kappa^2$ in the low-α limit (as both quantities have the same scaling in α). Therefore, with x ≫ 1, we approximate the energetic term with a saddle-point method: we evaluate the integral by expanding around $Z_0$, the maximizer of $\varphi^{\kappa}_{\rm out}[\sqrt{q_1}\, z, 1 - q_1]$; with this function we have $Z_0 = 0$. Performing the saddle-point approximation around $Z_0 = 0$ yields the simplified energetic term. Finally, combining the simplifications of the entropic and energetic contributions, we obtain the total entropy and its fixed-point equations at first order in x, Eqs. (126) and (127). Now, we are able to draw a direct parallel with the planted system at low α and $\kappa_0 \ll \kappa$. Indeed, if we use the correspondence $q_1 \equiv m^2$ and $(x - 1)\hat q_1 \equiv \hat m$, Eqs. (126, 127) are nothing but the fixed-point equations of the planted model (see Eqs. A23, A24). This indicates a posteriori that the 1-RSB calculation enables us to recover the same clusters as in the planted model with $\kappa_0 \ll \kappa$. To make the identification between the two approaches even more direct, we can focus on the entropy and the complexity of this 1-RSB fixed point: at first order in x, these clustered states have exactly the same entropy and complexity as those of the planted system with $\kappa_0 \ll \kappa$ (and α ≪ 1).
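The saddle-point step used here is the standard Laplace approximation for x ≫ 1, written for a generic exponent f with a non-degenerate interior maximum (here $f(z) = \varphi^{\kappa}_{\rm out}[\sqrt{q_1}\, z, 1-q_1]$ and $z_0 = Z_0 = 0$):

```latex
% Laplace approximation of a Gaussian integral with a large parameter x:
% Dz = e^{-z^2/2} dz / sqrt(2 pi), and z_0 maximizes f with f''(z_0) < 0.
\int \mathcal{D}z\; e^{x f(z)}
\;\underset{x \to \infty}{\simeq}\;
\frac{e^{\,x f(z_0) - z_0^2/2}}{\sqrt{x\,|f''(z_0)|}},
\qquad z_0 = \operatorname*{argmax}_{z} f(z).
```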

Minimum entropy regime
In this section, we want to probe clusters with a very small entropy. Keeping κ fixed, this means that we have to set $q_1$ extremely close to one, and eventually $1 - q_1 \ll \kappa^2$, in order to go down to zero local entropy. This scaling between $q_1$ and κ is incompatible with the saddle-point approximation we performed for the maximum entropy regime, as Eq. (121) is no longer verified. In fact, in this case, we have to introduce an asymptotic expansion, where we used the identity $\mathrm{erf}(z) = 1 - \frac{e^{-z^2}}{\sqrt{\pi}\, z}\left(1 + O(z^{-2})\right)$ as $z \to \infty$. We then estimate the interval of values of $z$ ($z \in [-Z_0, Z_0]$) for which $e^{x \varphi^{\kappa}_{\rm out}[\sqrt{q_1} z,\, 1-q_1]}$ remains finite. In other words, we compute the value of $z$ at which this function equals an arbitrary value $1/C$, where $W_0(\cdot)$ is the Lambert function with branch index $k = 0$. This computation thus shows that $e^{x \varphi^{\kappa}_{\rm out}[\sqrt{q_1} z,\, 1-q_1]}$ jumps from 1 to any arbitrary fraction $1/C$ exactly at this value of $z$, which gives the energetic contribution in this limit. Finally, putting together the simplified entropic and energetic contributions to the 1-RSB potential, we obtain

$\varphi_{1\text{-RSB}} \approx -\frac{x \hat q_1}{2} + \frac{x(1-x)\, q_1 \hat q_1}{2} + \alpha \log \mathrm{erf}\!\left[\frac{\kappa - \sqrt{2(1-q_1)\log x}}{\sqrt{2 q_1}}\right] + \log(2) + \frac{x^2 \hat q_1}{2} + x\, e^{-2(x-1)\hat q_1}\,,$ (135)

and the corresponding fixed-point equations include, at first order in x,

$\frac{x^2 \hat q_1}{2} = \alpha\, \frac{e^{-\kappa^2/2}}{\mathrm{erf}\!\left(\frac{\kappa}{\sqrt{2}}\right)} \sqrt{\frac{2\log x}{\pi (1-q_1)}}\,.$ (136)

The combination of these two fixed-point equations implies $\kappa \gg \sqrt{2(1-q_1)\log x}$. Consequently, the term $x^2(1-q_1)\hat q_1$ can be neglected and the 1-RSB free energy boils down to the planted potential: we thus recover the case of the planted system at $\kappa_0 = \kappa$. In [45] the authors showed, in a more standard computation, that these equilibrium configurations (verifying a frozen 1-RSB structure) can also be obtained by imposing $q_0 = 0$, $q_1 = 1$ and $x = 1$.
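The large-argument expansion of the error function invoked above can be checked numerically; the following sketch (ours) keeps only the leading order, whose relative accuracy improves rapidly as the argument grows:

```python
import math

def erf_tail_asymptotic(z: float) -> float:
    """Leading-order large-z asymptotic of the error function:
    erf(z) ~ 1 - exp(-z^2) / (z * sqrt(pi))."""
    return 1.0 - math.exp(-z * z) / (z * math.sqrt(math.pi))
```

At z = 2 the approximation is already accurate to a few parts in ten thousand, and at z = 3 to about one part in a million, which is why it controls the q₁ → 1 regime where the erf argument diverges.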
In Fig. 5 we display, for several values of $\hat\kappa = \kappa/\sqrt{-\alpha/\log(\alpha)}$, the complexity as a function of the entropy. The light-colored full lines correspond to the results obtained with the planted model. The dashed and dotted lines correspond respectively to the maximum and minimum entropy fixed-point branches of the 1-RSB free energy. To obtain them we solved the fixed-point equations in each regime for large but finite Parisi parameter x. As shown by the previous computations, each end of the curve displays a close match between the planting approach and one of the x → +∞ regimes. This leads us to conjecture that in the limit of small α the planted and 1-RSB Σ(s) curves match exactly. In other words, the planting approach actually captures the dominant clusters for each size s.

VII. CONCLUSION AND DISCUSSION
We study the local entropy in the SBP problem around solutions planted at a smaller margin. Our results are rigorous in the limit of small α, conditional on an assumption about the concentration of a certain entropy. We identify clusters of solutions with extensive entropy as local maximizers of this local entropy, and two thresholds, $\kappa_{\rm ener}$ and $\kappa_{\rm entr}$, that we consider of particular interest. $\kappa_{\rm entr}$ is the smallest κ at which the planted clusters and the corresponding local maximum disappear, thus presumably melting into an extended structure that may be accessible to efficient algorithms. $\kappa_{\rm ener}$ is the value above which there are solutions at all distances from the planted solutions, and as such it is an upper bound on the overlap-gap-property threshold.
We then investigated the 1RSB solution of the symmetric binary perceptron problem and showed how it allows us to identify extensive clusters of solutions without introducing concepts that are not already present in the canonical 1RSB computation: it suffices to consider large values of the Parisi parameter x and both the convex and concave parts of the Σ(s) curve. We discussed how the equilibrium frozen-1RSB is recovered in the x → ∞ limit. While this resolves some open questions about the 1RSB solution for binary perceptrons, we conclude that the 1RSB calculation is incomplete at finite α, as we did not find solutions corresponding to all the extensive clusters identified by the planting procedure.
We further showed that while, in general, the planting procedure we study does not describe all the rare clusters, in the limit of small α it seems that the Σ(s) obtained via planting is exactly the same as the one obtained from the 1RSB.This leads us to conjecture that in the limit of small α the planting actually describes almost all clusters of a given size.

FIG. 2. On the left panel we plot the local entropy $\varphi_1(r) = h(1-2r) + \alpha\,\phi_1(1-2r)$ in the small-α limit as a function of the rescaled distance $\hat r = -4r\log(\alpha)/\alpha$; see Section II. Each curve corresponds to a different value of $\hat\kappa$ (with $\hat\kappa = \kappa/\sqrt{-\alpha/\log(\alpha)}$), where we have set $\kappa_0 = \kappa_{\rm SAT}(\alpha)$ and thus $\kappa_0/\kappa \to 0$ as α → 0. First, we highlight the value $\hat\kappa_{\rm ener}$ above which the entropy remains positive for any distance $\hat r > 0$. Then, we have a second critical value of $\hat\kappa$ that we label $\hat\kappa_{\rm entr}$; it corresponds to the minimum value of $\hat\kappa$ for which the local entropy has a local maximum with $\hat r \neq 0$. On the right panel we plot the complexity $\Sigma[\kappa_0]$ as a function of the local entropy $s[\kappa_0, \kappa]$ (with $\hat\kappa_0 = \kappa_0/\sqrt{-\alpha/\log(\alpha)}$). In this context, the complexity corresponds to the exponential number of possible planted configurations at $\kappa_0$. The entropy corresponds to the local entropy $\varphi_1(r)$ evaluated at its local maximum in $r$ for fixed $\hat\kappa_0$ and $\hat\kappa$. To obtain an O(1) scaling of the complexity in the low-α limit we subtracted $\Sigma_o = \log(2) + \alpha\log\sqrt{-\alpha/\log(\alpha)}$. Finally, for $\hat\kappa > \hat\kappa_{\rm entr}$ a span of values of $\hat\kappa_0$ yields no local maximum of the local entropy; this explains the sudden stop of the right-hand tail of the purple and brown curves (for which $\hat\kappa > \hat\kappa_{\rm entr}$).

Now the goal is to show that $\mathbb{P}_{\rm pl}(A \cap B \cap C) > 0$. Let $w$ be such that $C \cap D$ holds. It follows from the Paley-Zygmund inequality that

$\mathbb{P}_{\rm pl}(A \mid w) = \mathbb{P}_{\rm pl}\!\left(\max\{e^{N\delta}, Z\} \ge \max\{e^{N\delta}, \mathbb{E}_{\rm pl}[Z \mid w]/2\} \,\middle|\, w\right)$ (48)

FIG. 3. Sketch of the behavior of the complexity as a function of the local entropy, with $s[\kappa_0, \kappa]$ the non-trivial local maximum obtained by solving the fixed-point equation. First, close to $s[\kappa_0, \kappa] = 0$, we observe a regime in which the curve is convex. Then, as $\hat\kappa_0$ is lowered (keeping $\hat\kappa_0 - \hat\kappa = O(1)$), the curve becomes concave. The complexity remains such that $\Sigma[\kappa_0] - \Sigma_o = O(\alpha)$ while the entropy increases up to the upper bound $s[0, \kappa]$. Finally, as we set $\hat\kappa_0 \ll \hat\kappa$, the entropy remains constant at $s[0, \kappa]$ while the complexity drops to zero.

FIG. 5. We plot the complexity Σ as a function of the entropy s for several values of $\hat\kappa$. The dashed and dotted curves correspond to the maximum and minimum entropy regimes and are obtained by optimizing the potentials of Eqs. (125) and (135) over $q_1$ and $\hat q_1$ with large but finite values of the Parisi parameter x. In order to compare the 1-RSB computation with the planting approach, the full curves show the complexity and entropy obtained with the planting computation.