Eigenvector dreaming

Among the performance-enhancing procedures for Hopfield-type networks that implement associative memory, Hebbian unlearning (HU), also known as dreaming, stands out for its simplicity and its lucid biological interpretation. However, it does not easily lend itself to a clear analytical understanding. Here we show that HU can be efficiently described in terms of the evolution of the spectrum and the eigenvectors (EVs) of the coupling matrix. Specifically, we find that HU barely changes the EVs of the coupling matrix, whereas the benefits of the procedure can be ascribed to an intuitive evolution of the spectrum. We use these ideas to design novel dreaming algorithms that are computationally effective and analytically far more transparent than the original scheme.


I. INTRODUCTION
Consider a fully connected network of N binary variables {S_i = ±1}, i ∈ [1, ..., N], linked by couplings J_ij. The network is endowed with a dynamics

S_i(t + 1) = sign( Σ_{j≠i} J_ij S_j(t) ),   (1)

which can be run either in parallel (i.e. synchronously) or in series (i.e. asynchronously, in a predetermined or in a random order) over the indices i. This kind of network can be used as an associative memory device, namely for reconstructing an extensive number P = αN of binary patterns {ξ_i^µ = ±1}, µ ∈ [1, ..., P], called memories. In this work we will focus on i.i.d. memories, generated with probability P(ξ_i^µ = ±1) = 1/2. We consider a recognition process based on initializing the network dynamics to a configuration similar enough to one of the memories, and iterating eq. (1) asynchronously until a fixed point is reached. The network performs well if such asymptotic states are similar enough to the memories. Whether this is the case depends on the number of patterns one wants to store and on the choice of the coupling matrix J.

Hebb's learning prescription [1],

J_ij = (1/N) Σ_{µ=1}^{P} ξ_i^µ ξ_j^µ,   J_ii = 0,   (2)

used in the seminal work of Hopfield [2], allows retrieving memories up to a critical capacity α_c^H ∼ 0.14 [3]. In this model, even when α < α_c^H, memories are not perfectly recalled: the state of the system always presents a small finite fraction of misaligned spins. This feature is linked to the value of the minimum stability ∆_min, defined as

∆_min = min_{i,µ} ∆_i^µ,   (3)

where the stability ∆_i^µ is defined by

∆_i^µ = ξ_i^µ ( Σ_{j≠i} J_ij ξ_j^µ ) / ( Σ_{j≠i} J_ij^2 )^{1/2}.   (4)

The value of the stability tells us whether spin i is aligned or not with the local field generated by memory µ. As soon as ∆_min > 0, the memories themselves become fixed points of the dynamics [4], allowing error-less retrieval when the dynamics is initialized close enough to one of them.
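As a minimal numerical illustration of these definitions, the following sketch builds a Hebbian coupling matrix and computes the stabilities of the memories. The sizes and the seed are arbitrary choices, and the stabilities are normalized here by the row norms of J, a common convention.

```python
import numpy as np

rng = np.random.default_rng(0)
N, P = 200, 20                             # alpha = P/N = 0.1, below alpha_c^H ~ 0.14
xi = rng.choice([-1.0, 1.0], size=(P, N))  # i.i.d. binary memories

# Hebb's rule: J_ij = (1/N) sum_mu xi_i^mu xi_j^mu, with zero diagonal.
J = xi.T @ xi / N
np.fill_diagonal(J, 0.0)

# Stabilities Delta_i^mu: memory-aligned local fields, normalized by the
# row norms of J.
fields = xi @ J.T                          # fields[mu, i] = sum_j J_ij xi_j^mu
delta = xi * fields / np.linalg.norm(J, axis=1)

print(delta.shape, delta.mean())           # (P, N); mean stability ~ 1/sqrt(alpha)
```

Even at this low load the minimum of `delta` is typically negative, which is the small fraction of misaligned spins mentioned above.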
Several techniques have been developed to build better-performing coupling matrices, i.e. to reduce the retrieval error and to increase both the critical capacity and the size of the basins of attraction to which the memories belong [5][6][7][8][9]. One such technique is Hebbian Unlearning.

II. HEBBIAN UNLEARNING (HU)
Inspired by the brain functioning during REM sleep [10], the unlearning algorithm [11][12][13][14] is a training procedure for the coupling matrix J, leading to error-less retrieval and increased critical capacity in a symmetric neural network. The coupling matrix is built according to the following iterative procedure:

Algorithm 1 Hebbian Unlearning
Initialize J using Hebb's rule eq. (2)
for D = 1 to Dmax do
  1- Initialize the network on a random configuration.
  2- Run the asynchronous dynamics eq. (1) until convergence to a fixed point S*.
  3- Update the couplings: J_ij ← J_ij − (ϵ/N) S*_i S*_j, keeping J_ii = 0.
end for
The learning rate ϵ and the number of dreams D_max are free parameters of the algorithm. Algorithm 1 does not change the diagonal elements of the coupling matrix, which are fixed to J_ii = 0. For sufficiently small values of the learning rate, below the critical load α < α_c^HU ∼ 0.6 the evolution of ∆_min follows a non-monotonic curve as a function of D_max, as illustrated in fig. 1. The number of dreams D = D_in marks the point where ∆_min crosses 0: here all the memories are fixed points of the dynamics. Two other points, D = D_top and D = D_fin, are shown in the plot, corresponding respectively to the maximum of ∆_min and to the point where ∆_min becomes negative again. The scaling of (D_in, D_top, D_fin) was studied in [14].
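The procedure above can be sketched in a few lines of Python. This is a sketch of the standard HU step as described in the text (relax a random configuration to a fixed point of the asynchronous dynamics, then apply an anti-Hebbian update); the 1/N normalization of the learning rate is a convention assumed here.

```python
import numpy as np

def hu_dream(J, eps, rng):
    """One Hebbian-unlearning dream: relax a random configuration to a
    fixed point of the asynchronous dynamics, then unlearn it."""
    N = J.shape[0]
    S = rng.choice([-1.0, 1.0], size=N)
    flipped = True
    while flipped:                          # asynchronous sweeps until a fixed point
        flipped = False
        for i in rng.permutation(N):
            s = 1.0 if J[i] @ S >= 0.0 else -1.0
            if s != S[i]:
                S[i], flipped = s, True
    J = J - (eps / N) * np.outer(S, S)      # anti-Hebbian update on the fixed point
    np.fill_diagonal(J, 0.0)                # the diagonal stays fixed to J_ii = 0
    return J

rng = np.random.default_rng(1)
N, P = 100, 20
xi = rng.choice([-1.0, 1.0], size=(P, N))
J = xi.T @ xi / N                           # Hebbian initialization
np.fill_diagonal(J, 0.0)
for _ in range(10):                         # a few dreams
    J = hu_dream(J, eps=0.01, rng=rng)
```

Note that the update preserves the symmetry of J, so the asynchronous dynamics keeps converging to fixed points throughout the procedure.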
In addition to error-less retrieval, when α < α_c^HU dreaming creates large basins of attraction around the memories. This can be measured in terms of the retrieval map

m_f(m_0) = (1/N) Σ_i ⟨ ξ_i^µ S_i^µ(∞) ⟩‾.   (5)

Here, S⃗^µ(∞) is the stable fixed point reached when the dynamics is initialized to a configuration S⃗^µ(0) having overlap m_0 with a given memory ξ⃗^µ. The overbar denotes the average over different realizations of the memories and ⟨•⟩ the average over different realizations of S⃗^µ(0). We show in fig. 2 the retrieval map for N = 1000 and α = 0.4. The performance of HU is best at D = D_in. Interestingly, as discussed in [14], the curves relative to Gardner's optimal symmetric perceptron [4, 5] and to unlearning at D = D_in coincide with good accuracy.
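The retrieval map can be estimated numerically along these lines. This is a sketch: the flip-probability initialization (each spin flipped independently with probability (1 − m_0)/2, which gives an expected overlap m_0) and the parameter values are illustrative choices.

```python
import numpy as np

def relax(J, S, rng, max_sweeps=200):
    """Asynchronous dynamics until a fixed point (or a sweep budget) is reached."""
    N = S.size
    for _ in range(max_sweeps):
        flipped = False
        for i in rng.permutation(N):
            s = 1.0 if J[i] @ S >= 0.0 else -1.0
            if s != S[i]:
                S[i], flipped = s, True
        if not flipped:
            break
    return S

def retrieval_overlap(J, memory, m0, rng):
    """m_f for one memory: start at expected overlap m0, relax the dynamics,
    and measure the final overlap with the memory."""
    flip = rng.random(memory.size) < (1.0 - m0) / 2.0
    S = relax(J, np.where(flip, -memory, memory), rng)
    return S @ memory / memory.size

rng = np.random.default_rng(2)
N, P = 200, 10                              # alpha = 0.05: deep in the retrieval phase
xi = rng.choice([-1.0, 1.0], size=(P, N))
J = xi.T @ xi / N
np.fill_diagonal(J, 0.0)
mf = retrieval_overlap(J, xi[0], m0=0.9, rng=rng)
```

Averaging `retrieval_overlap` over memories and over initial conditions, for a grid of m_0 values, reproduces a curve of the kind shown in fig. 2.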

III. TWO NOVEL DREAMING ALGORITHMS
An interesting interpretation of the HU algorithm emerges when analyzing the evolution of the spectrum and of the eigenvectors of the coupling matrix J during the dreaming procedure. Before dreaming, the spectrum of J is of the Marchenko-Pastur type [15], and the N-dimensional vector space is split between a degenerate (N − P)-dimensional eigenspace orthogonal to all the memories, and a P-dimensional space spanned by the memories, which is itself split into non-degenerate eigenspaces. Fig. 3 focuses on the evolution under dreaming of the ranked spectrum of J. The evolution of the ranked spectrum indicates that HU targets, and reduces, the largest eigenvalues of the coupling matrix, while all other eigenvalues are increased by a constant amount at every dream, maintaining a traceless coupling. This leads to a plateau at the high end of the ranked spectrum. Fig. 4 follows the evolution of the eigenvalues during the dreaming procedure, with each line colored according to the overlap ω between the corresponding eigenvector and its counterpart in the initial coupling matrix. For clarity, only lines corresponding to overlaps larger than 0.1 are shown. As the dreaming procedure unfolds, the majority of the eigenvectors do not change much (blue lines), and lines do not cross. This means that the eigenvalues evolve continuously, while the corresponding eigenvectors barely change. The highest and lowest parts of the ranked spectrum, on the other hand, show some crossing of lines, and low values of the overlaps (in red). This is due to the eigenvalues becoming almost equal, leading to an effectively degenerate eigenspace, corresponding to the plateau in fig. 3.
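The initial spectral structure is easy to verify numerically: with Hebb's rule the diagonal of (1/N) Σ_µ ξ⃗^µ (ξ⃗^µ)^T equals α exactly, so setting J_ii = 0 amounts to subtracting α from every eigenvalue of a Wishart-type matrix, leaving an (N − P)-fold degenerate eigenvalue at −α. A small check, with arbitrary sizes:

```python
import numpy as np

rng = np.random.default_rng(3)
N, P = 400, 80                              # alpha = 0.2
xi = rng.choice([-1.0, 1.0], size=(P, N))
J = xi.T @ xi / N                           # Hebb's rule; diagonal is exactly alpha
np.fill_diagonal(J, 0.0)                    # equivalent to subtracting alpha * identity

lam = np.linalg.eigvalsh(J)                 # ranked spectrum, ascending order
alpha = P / N
n_degenerate = int(np.sum(np.isclose(lam, -alpha)))
print(n_degenerate == N - P)                # prints True: (N-P)-fold degenerate eigenspace
```

The remaining P eigenvalues are the non-zero Wishart eigenvalues shifted down by α, i.e. the Marchenko-Pastur-type bulk visible in fig. 3 before dreaming.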
These observations suggest the following alternative algorithm.
A. Eigenvector dreaming

Algorithm 2 EVdreaming
Initialize J using Hebb's rule eq. (2)
for D = 1 to Dmax do
  1- Find an orthonormal basis of eigenvectors ζ^µ of J.
  2- Select the eigenvector ζ^{u_D} with the largest absolute eigenvalue.
  3- Update J_ij ← J_ij − ϵ ζ_i^{u_D} ζ_j^{u_D}.
  4- Reset the diagonal to zero: J_ii ← 0.
end for

In this algorithm, the update of the couplings reduces the value of the highest eigenvalue by an amount ϵ, leaving the eigenvectors unchanged. Resetting the diagonal to zero, on the other hand, increases the value of every eigenvalue by a stochastic amount (see section III B), and also modifies the eigenvectors. Each step of this algorithm is based on the spectrum of the current coupling matrix. Note that this algorithm could be implemented using purely local rules, by iterating the synchronous update σ^{t+1} = f(Jσ^t) with f(x) = x/||x||_2, which converges towards the eigenvector of J with the largest eigenvalue.
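A sketch of one EVdreaming step in Python, using the power iteration mentioned above to find the eigenvector whose eigenvalue is largest in absolute value (iteration counts and sizes are arbitrary choices):

```python
import numpy as np

def top_eigenvector(J, n_iter=1000, seed=0):
    """Power iteration sigma <- J sigma / ||J sigma||_2, which converges to
    the eigenvector of J whose eigenvalue is largest in absolute value."""
    v = np.random.default_rng(seed).standard_normal(J.shape[0])
    for _ in range(n_iter):
        v = J @ v
        v /= np.linalg.norm(v)
    return v

def ev_dream_step(J, eps):
    """One step of EVdreaming: subtract eps * zeta zeta^T for the selected
    eigenvector zeta, then reset the diagonal to zero."""
    zeta = top_eigenvector(J)
    J = J - eps * np.outer(zeta, zeta)
    np.fill_diagonal(J, 0.0)
    return J

rng = np.random.default_rng(4)
N, P = 200, 40
xi = rng.choice([-1.0, 1.0], size=(P, N))
J0 = xi.T @ xi / N
np.fill_diagonal(J0, 0.0)
J1 = ev_dream_step(J0, eps=0.05)
print(np.linalg.eigvalsh(J1)[-1] < np.linalg.eigvalsh(J0)[-1])  # top eigenvalue lowered
```

Because the outer product ζζ^T is invariant under ζ → −ζ, the sign ambiguity of the power iteration is harmless.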

B. Initial Eigenvector dreaming
An even simpler dreaming procedure, which does reproduce the qualitative features of HU (specifically, the centrality of the spectrum evolution and the marginality of the eigenvector evolution), is obtained by modifying the coupling matrix on the basis of the eigenvectors of the initial coupling matrix J_H, as listed in algorithm 3. We call this procedure Initial Eigenvector dreaming (IEVdreaming). This algorithm is simple enough that it can be analyzed in some detail.

C. A first analysis of IEVdreaming
As a first approach, imagine removing step 5 of the iterative process, and simply setting the diagonal to zero after the for cycle. The resulting J reads

J = J_H − ϵ Σ_{d=1}^{D} ζ^{u_d} (ζ^{u_d})^T + ϵ D ⟨(ζ_i^{u_d})^2⟩ 𝟙 + η,   (6)

where the average ⟨(ζ_i^{u_d})^2⟩ is computed over the statistics generated by the choice of the eigenvector u_D to be dreamed at each step, given the realization of the disorder (i.e. the values of the eigenvectors ζ_i^µ), and η collects the fluctuations of the diagonal around its mean. Since the eigenvectors of a Wishart matrix are isotropically distributed on the (N − 1)-dimensional sphere, one has that ⟨(ζ_i^{u_d})^2⟩ = 1/N. The result is then

J = J_H − ϵ Σ_{d=1}^{D} ζ^{u_d} (ζ^{u_d})^T + (ϵD/N) 𝟙 + η,   (7)

where

η_ij = δ_ij ϵ Σ_{d=1}^{D} [ (ζ_i^{u_d})^2 − 1/N ].   (8)

All terms but η preserve the eigenvectors of J_H. The η correction changes both the eigenvectors and the eigenvalues of the coupling matrix, and, assuming that η is small enough, we can compute those changes perturbatively.
In particular, the degenerate eigenspace corresponding to the low-eigenvalue plateau will be split by corrections λ → λ + δλ_i, i = 1, ..., N − P, given by the N − P eigenvalues of the matrix

A_{ab} = (ζ^a)^T η ζ^b,   (9)

where the eigenvectors ζ^a, ζ^b all belong to the low-eigenvalue degenerate plateau (any orthonormal set of eigenvectors is equivalent). In the thermodynamic limit, the impact of η on J becomes negligible, as shown in fig. 5. The x-axis represents N; the y-axis represents the eigenvalues of the A matrix, eq. (9), divided by the absolute height of the low plateau. In the thermodynamic limit, all curves tend to zero, showing that the corrections become negligible compared to the low-plateau value. Some insight into this behavior can be gained by considering the statistics of the diagonal elements of η. Their average is zero, by definition. If the ζ_i^{u_d} involved in eq. (8) were a finite number, they could be treated as i.i.d. normal variables N(0, 1/N), and the statistics of η could be heuristically understood as proportional to a χ^2 distribution, whose variance scales as 1/N (this is not exact, since not every eigenvector is dreamed the same number of times). Since we are dreaming an extensive number of eigenvectors, the ζ_i^{u_d} are not independent (for one thing, they are constrained by the normalization Σ_{µ=1}^{N} (ζ_i^µ)^2 = 1). Intuitively, though, this has the effect of reducing the variance of η_ii. Hence, the χ^2 scaling is an upper bound for the size of η, which goes to zero. Given this, the dreaming procedure is described by the simple update rule

λ_µ ← λ_µ + ϵ/N − ϵ δ_{µ, u_D},   (10)

applied at each dream to the eigenvalues of the fixed initial eigenvectors. This algorithm is very inexpensive from the computational point of view, since one does not need to compute the eigenvectors multiple times.
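IEVdreaming then reduces to bookkeeping on the spectrum alone. A sketch of the resulting update, as described in the text: per dream, the dreamed eigenvalue drops by ϵ while every eigenvalue gains ϵ/N from the diagonal correction, so each dream changes the trace by −ϵ + N·(ϵ/N) = 0 and the spectrum stays traceless.

```python
import numpy as np

def iev_dream(lam, eps, n_dreams):
    """Spectrum-only IEVdreaming: at each dream, pick the eigenvalue that is
    largest in absolute value, lower it by eps, then raise every eigenvalue
    by eps/N (the effect of removing the mean diagonal element)."""
    lam = lam.astype(float).copy()
    N = lam.size
    for _ in range(n_dreams):
        u = int(np.argmax(np.abs(lam)))
        lam[u] -= eps
        lam += eps / N                      # net trace change per dream: 0
    return lam

rng = np.random.default_rng(5)
N, P = 200, 40
xi = rng.choice([-1.0, 1.0], size=(P, N))
J = xi.T @ xi / N
np.fill_diagonal(J, 0.0)                    # traceless Hebbian coupling
lam0 = np.linalg.eigvalsh(J)
lam = iev_dream(lam0, eps=0.01, n_dreams=500)
print(abs(lam.sum()))                       # stays ~ 0: the trace is conserved
```

Running this update progressively flattens the high part of the ranked spectrum into a plateau, as in fig. 3.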
Whether the correction to the diagonal elements of J is carried out at each step of the algorithm or at the end affects the choice of the eigenvector that gets dreamed. If the correction is carried out at the end, the negative degenerate plateau quite soon becomes higher in absolute value than the high plateau (we call this inversion). Then the algorithm starts selecting eigenvectors from the low plateau, which are orthogonal to the memories, with no effect on the stabilities. On the other hand, the choice made in algorithm 3 reproduces the qualitative behavior of HU in an analytically simple setting, since taking out the diagonal at each step decreases the absolute value of the low negative plateau while increasing the absolute value of the positive plateau, delaying the inversion.

IV. ALGORITHM PERFORMANCE
In fig. 6 we compare the evolution of ∆_min under EVdreaming and IEVdreaming: the two algorithms display very similar performance before the inversion point D_inv (marked by dots on the curves in fig. 6). This also indicates that IEVdreaming is indeed a good model of EVdreaming. Both also display the same qualitative behavior as HU. In fig. 6, crosses on the curves indicate when the algorithms first start dreaming the lowest eigenvalue of the high portion of the ranked spectrum. This condition corresponds to the highest portion of the ranked spectrum becoming a plateau. In our new procedures this instant is very close to D_top. After D_top, IEVdreaming and EVdreaming display a plateau in the stability curve, which lasts until the inversion point. After the inversion point, which experimentally happens first in EVdreaming, the two procedures display different behaviors, since the dynamics becomes very sensitive to the eigenvectors being dreamed. The behavior of IEVdreaming is detailed in section V.
In fig. 7 we compare the different algorithms in terms of the retrieval map, at D = D_in, where the performance is optimal. The quantitative differences between the algorithms visible in the ∆_min profiles are reduced to virtually no difference as far as the retrieval map is concerned. Below the critical load, wide basins of attraction are produced around the memories.
Defining the critical capacity of an algorithm α_c as the highest load such that ∆_min > 0 is reached before D_inv, we can measure α_c for the new procedures.

At each dream, the change in the ranked spectrum consists of an increase of every eigenvalue by ϵ/N, due to the correction of the diagonal elements of J, and a decrease by ϵ of the dreamed eigenvalue, as per eq. (10). Prior to D_top, i.e. before the high part of the ranked spectrum is completely flattened into a plateau, the evolution of the spectrum can be characterized analytically, while δ(D) can be determined numerically from the area A(D) enclosed by the high part of the ranked spectrum. Similar geometrical reasoning for D > D_top leads to even simpler equations, eqs. (17) and (18). Given these relations, D_top and D_inv can be determined. These theoretical results for D_top and D_inv are compared to the results of the numerical simulations in fig. 9, with excellent agreement. In IEVdreaming, the evolution of the stabilities is determined exclusively by the evolution of the spectrum of J, since the eigenvectors do not change.
The stabilities can be written in terms of the eigenvalues λ_ν and of the coordinates w_ν^µ of the memories in the basis of the eigenvectors. After D_top, when the spectrum is composed of two plateaus P_±, this expression simplifies and, as a consequence of eqs. (17) and (18), is constant. This explains the plateaus in fig. 6.
For α < 0.5, one has D_inv = P/ϵ and λ_l(D_inv) = λ_{1−α}(D_inv) = 0. This means that at D_inv we have J = 0. In numerical simulations, given the finite value of ϵ, this never happens. Instead, from D_inv on, the network dreams every eigenvector of the high plateau, making it smaller than the low plateau, and then every eigenvector in the low plateau. Over N dreams, all eigenvectors have been dreamed once: each eigenvalue has been decreased once by ϵ and increased N times by ϵ/N, restoring it to its initial value. This is reflected in a periodic behavior of ∆_min, which oscillates (see fig. 6). For α > 0.5, on the other hand, the inversion happens with well-separated plateaus, λ_l(D_inv) < 0 < λ_{1−α}(D_inv). Hence, around D_inv, when the high plateau and the low plateau become closer than ϵ in absolute value, the network starts dreaming one eigenvector of the low plateau. At each dream the corresponding eigenvalue is made even smaller, i.e. bigger in absolute value, and the network gets stuck dreaming it repeatedly. Asymptotically, this eigenvector (orthogonal to the memories) dominates the coupling matrix, leading again to zero stability, without oscillations (see fig. 6).
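The restoration argument above (−ϵ once, plus ϵ/N at each of N dreams, for a net change of zero per eigenvalue) can be checked directly with the spectrum update rule; a minimal sketch with an arbitrary traceless spectrum:

```python
import numpy as np

rng = np.random.default_rng(6)
N, eps = 50, 0.01
lam = rng.standard_normal(N)
lam -= lam.mean()                           # start from a traceless spectrum
lam0 = lam.copy()
for u in range(N):                          # dream each eigenvector exactly once
    lam[u] -= eps                           # the dreamed eigenvalue drops by eps
    lam += eps / N                          # the diagonal correction lifts all by eps/N
print(np.allclose(lam, lam0))               # prints True: the spectrum is restored
```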

VI. CONCLUSIONS
In this paper we unveiled an interesting feature of Hebbian Unlearning, namely the fact that the eigenvectors of the coupling matrix do not change significantly during the algorithm, so that the improvement in recognition performance is mostly due to a modification of the spectrum. Starting from this observation, we have proposed two new effective unlearning algorithms, Eigenvector dreaming and Initial Eigenvector dreaming, which emphasize the splitting of the learning problem into a trivial eigenvector evolution and a non-trivial spectrum evolution. IEVdreaming is the simplest of the two, being computationally efficient and easy to control analytically. IEVdreaming turns out to give a very good description of EVdreaming, and a qualitatively good description of HU. Finally, in our new algorithms we found a strong correlation between the moment when the lowest eigenvalue of the high plateau starts being dreamed and the moment when the algorithm stops increasing the minimum stability ∆_min. This correlation, which follows from simple analytical arguments in the case of IEVdreaming, is also present, to a lesser extent, in HU.

FIG. 1. The minimum stability ∆_min as a function of the normalized number of dreams, for different values of α. The threshold ∆ = 0 is indicated by the gray dotted line. For α < 0.59, ∆_min crosses zero at D_in, peaks at D = D_top and then becomes negative again at D = D_fin. Where appropriate, the three relevant numbers of dreams are indicated: D = D_in by an "x", D = D_top by a dot, D = D_fin by a "+". All measurements are averaged over 50 realizations of the network. N = 400, ϵ = 10^{-2}.

FIG. 2. Retrieval map m_f(m_0) for the unlearning algorithm at the three relevant steps indicated in fig. 1, and before unlearning. All measurements are averaged over 10 realizations of the network. N = 1000, α = 0.4, ϵ = 10^{-2}. The performance of the algorithm is maximal at D = D_in.
FIG. 3. On the y-axis, the value of the eigenvalues; on the x-axis, their ranking. Curves of different colors correspond to measures of the ranked spectrum taken after different numbers of dreams. Before dreaming, the spectrum is of the Marchenko-Pastur type. HU progressively flattens the high portion of the ranked spectrum.

Algorithm 3 IEVdreaming
1- Initialize J using Hebb's rule eq. (2).
2- Find an orthonormal basis of eigenvectors ζ^µ of the initial coupling matrix.
for D = 1 to Dmax do
  3- Consider the most recent coupling matrix J_{D−1}, and select the eigenvector ζ^{u_D} with the largest absolute eigenvalue.
  4- Update J_ij ← J_ij − ϵ ζ_i^{u_D} ζ_j^{u_D}.
  5- Remove the average value of the diagonal elements of J: J_ii ← J_ii + ϵ/N.
end for

FIG. 5. Dispersion of the corrections to the low-plateau eigenvalues, divided by the low-plateau eigenvalue, at D_top, as a function of N, for different values of α. As the system size is increased, the corrections become negligible compared to the low-plateau eigenvalue.
FIG. 6. Evolution of ∆_min while iterating the different dreaming procedures, for several values of α. N = 400, ϵ = 0.001. D_top is indicated by a cross, D_inv by a dot. The new algorithms have very similar performance before D_inv, indicating that IEVdreaming is indeed a good model of EVdreaming.
V. CHARACTERIZATION OF IEVDREAMING
In the case of IEVdreaming, both the value of D_top and that of D_inv can be computed analytically. Let us denote by λ_l(D) the height of the low plateau, by λ_{1−α}(D) the height of the lowest eigenvalue in the high part of the ranked spectrum, and by δ(D) the distance between the highest and the lowest eigenvalues of the high part of the ranked spectrum.

FIG. 7. Retrieval map for the various dreaming procedures, at D = D_in, where the attraction basins are the largest. N = 400, α = 0.4, ϵ = 0.01. The different curves coincide, suggesting that our new dreaming procedures capture the essence of HU.

FIG. 9. Comparison between the analytical estimates and the simulations for D_inv and D_top as a function of α. Parameters for the simulations: N = 1000, ϵ = 0.001. The agreement is excellent, as finite-size effects are already small at this size.
VII. ACKNOWLEDGMENTS
EM acknowledges funding from the PRIN funding scheme (2022LMHTET - Complexity, disorder and fluctuations: spin glass physics and beyond) and from the FIS (Fondo Italiano per la Scienza) funding scheme (FIS783 - SMaC - Statistical Mechanics and Complexity: theory meets experiments in spin glasses and neural networks) of the Italian MUR (Ministry of University and Research). MM acknowledges financial support by the