Cycle equivalence classes, orthogonal Weingarten calculus, and the mean field theory of memristive systems

It has been recently noted that for a class of dynamical systems with explicit conservation laws, represented via projector operators, the dynamics can be understood in terms of lower-dimensional equations. This is the case, for instance, of memristive circuits. Memristive systems are an important class of devices with wide-ranging applications in electronic circuits, artificial neural networks, and memory storage. We show that such mean-field theories can emerge from averages over the group of orthogonal matrices, interpreted as cycle-preserving transformations applied to the projector operator describing Kirchhoff's laws. Our results provide insights into the fundamental principles underlying the behavior of resistive and memristive circuits, and highlight the importance of conservation laws for their mean-field theories. In addition, we argue that our results shed light on the nature of the critical avalanches observed in quasi-two-dimensional nanowires as boundary phenomena.


I. INTRODUCTION
With the increasing interest in developing neuromorphic computers, it is crucial to understand physical devices that exhibit structures and functionalities similar to those of biological neural networks. The nonlinear interactions and complex connectivity of biological neuronal networks are well-known characteristics [1]. Yet, it is still a mystery why low-dimensional representations of certain rather complex dynamical systems exist [2,3], even if only in certain regimes. This study sheds light on the existence of such representations in the context of memristive circuits, which we use as a toy model for more complex neuromorphic systems [4].
Circuits composed of nanodevices with memory are at the forefront of neuromorphic computing research, as their behavior often mimics the synaptic plasticity observed in biological neuronal circuits. However, a comprehensive theory that effectively describes the behavior of these circuits is currently lacking. Memristive devices are a promising area of research for the development of next-generation computing systems [5,6]. These devices are passive 1-port resistive components that have the ability to remember past voltages and currents, and can change their resistance based on the history of the input signals [7,8]. The experimental existence of switching in physical systems dates back to the late 1960s [9], but the connection to memristive behavior was made just over a decade ago [10-12]. One peculiar feature of memristive devices is that their resistance changes between two limiting values, $R_{on}$ and $R_{off}$ (or, analogously, between two conductance values). The development of circuits of memristive devices has become an important area of research, as it enables the creation of neuromorphic devices that can support the existing von Neumann architecture [5,6,13] in a variety of tasks more amenable to an analog computing approach. From an experimental perspective, nanowire networks have emerged as a promising material for the fabrication of disordered memristive networks. They exhibit reversible resistance-switching behavior when subjected to an external electric field, making them ideal for use in memristive devices. Additionally, silver nanowires are low-cost, have high aspect ratios, and can be synthesized using a variety of techniques, making them highly versatile. The avoided crossings between the nanowires act as tunneling junctions [14,15], and for coated silver nanowires the phenomenon of filament formation is the main driver behind the memristive effects that emerge [14,16-18]. Their behavior is particularly similar to that of neuronal circuits: first and foremost, their connectivity strongly resembles that of a neuronal connectome. In addition, the formation of filaments and their effective memristive behavior strongly parallels the plasticity of neuronal circuits. In this respect, memristive networks are an area of research in physics that parallels the study of neuronal networks in biology. For instance, memristive networks have Lyapunov functions [19], similar to recurrent neural networks [20]. Studying memristive circuits can thus provide important insights into the non-trivial dynamics of large assemblies of neuronal networks.
As an example of this, it became clear only recently that certain disordered circuits with memory exhibit a certain regularity [7,10,21-23] in their dynamical behavior when the applied voltage is constant in time. This is somewhat surprising, given that these are rather nonlinearly interacting systems, characterized by conservation laws that lead to effective nonlocal (although exponentially bounded) interactions [24]. Such mean-field techniques have also been tested in experimentally viable systems such as nanowire connectomes [25]. Although it is well known that circuits composed of purely memristive devices must also be memristive in 2-probe experiments [8], it is less obvious why the collective behavior should be so similar to that of a single device, in particular when the network of devices is disordered. This observation is not only numerical or model-based: it has recently been shown to hold also in the experimental setup of silver nanowires [25], where a mean-field theoretical description of a disordered network of nanowires could fit the potentiation and depression of the conductance. Following these results, the author proposed that there is a class of dynamical systems that, thanks to the presence of projector operators in their dynamics, have a well-defined mean-field theory [26]. However, rigorous results could be obtained only for simple projector operators.
On a side note, understanding the phenomenology of memristive networks has direct applications for the employment of these physical systems as reservoir computing devices at the edge of chaos, an idea proposed in the early 1990s [27,28] which has seen direct application in memristive circuits [29-32]. In fact, memristive systems seem to perform better near critical transitions [33,34]. In the case of the brain, the idea that it is in a constantly critical state was proposed in the mid-1990s [35-40].
However, in the case of memristive circuits, one can have both first- and second-order dynamical transitions in the conductance state. Avalanches of memristive switching are a phenomenology also discussed in the literature [41-43], both theoretically and experimentally. It is worth pointing out, however, that these switching phenomena are strongly due to what we wish to call here boundary phenomena: they depend on how the memristive devices dynamically approach the boundary of their resistance or conductance values. We instead call a bulk phenomenon one in which the switching is purely due to the presence of hysteresis, e.g., a pinched hysteresis loop. Boundary phenomena are based on a rapid switching of memristive devices between two resistive states; bulk phenomena are instead due only to the presence of memory. This is, for instance, the case of the recent symmetry-breaking transition of [44]. There, a first-order transition in the conductance state is purely due to the presence of hysteresis, rather than to the phenomenology of the boundary of the device. In fact, the presence or absence of this transition depends only on the ratio $r = R_{off}/R_{on}$, which at present is the best indicator of a bulk-induced transition. Previously, a mean-field theory for this type of transition was developed in a series of works [25,26,44]. Other mean-field theories have appeared in the literature [19,45,46], describing Ising-like behavior. The present manuscript is an attempt to overcome some of the limitations of previous works by incorporating the circuit properties into the mean-field theory: we derive a mean-field theory for bulk transitions of memristive networks by averaging only within a subclass of circuits with the same number of cycles. This is important for experimental reasons. We know, for instance, that in quasi-two-dimensional materials these second-order transitions are only a crossover [25]. One reason is that the mean-field theoretical results implicitly assume the homogeneity of the network. We thus wish to go beyond this approximation, applying techniques that keep track of the density of cycles in the network. This result is important, even if for a toy model, in order to develop techniques to explain the phenomenology of more realistic systems.
The paper is organized as follows. First, we establish a connection between cycle-preserving transformations and the group of orthogonal transformations, introducing an equivalence relation on the cycle space. We then utilize the Weingarten calculus to average over orthogonal transformations, leading to a mean-field theory that extends previous results and renormalizes the ratio $r$. We show that the cycle density directly affects the transition from low to high conductance. Conclusions follow.

II. CYCLE EQUIVALENCE CLASSES AND MEMRISTIVE CIRCUITS
A. Circuits, Kirchhoff's laws, and projector operators

In circuits, Kirchhoff's laws are manifestations of the conservation of physical quantities such as charge and energy. Mathematically, these can be expressed via the introduction of projection operators [21,47], i.e., matrices $\Omega$ satisfying the constraint $\Omega^2 = \Omega$ and directly connected to the circuit topology. For instance, for a resistive circuit made of identical resistances of value $r$, each in series with a voltage generator, Ohm's law for the network can be expressed as
$$\vec i = \frac{1}{r}\,\Omega\,\vec s, \qquad (1)$$
where $\vec s$ is the collection of voltages of the generators connected in series to each resistance, while $\vec i$ contains the branch currents. The value of the current is connected to the effective graph resistance [48]. For a circuit of identical resistors, if one picks two nodes $a, b$ and applies a unit voltage, the effective resistance $R(a,b)$ is equivalent to the inverse of the effective current that flows through the generator. The effective graph resistance is
$$R_{eff} = \sum_{a<b} R(a,b). \qquad (2)$$
The graph resistance can be evaluated via the graph Laplacian, defined as the matrix $Q_{ij} = d_i\delta_{ij} - A_{ij}$, where $A_{ij} = 1$ if $(i,j)$ is an edge of the graph and $0$ otherwise, and $d_i$ is the degree of node $i$. Then, if $e_a$ is the vector with all zeros and a $1$ in position $a$, we have that
$$R(a,b) = (e_a - e_b)^t\, Q^{-1}\, (e_a - e_b),$$
where $Q^{-1}$ is intended as the pseudo-inverse of $Q$, i.e., the matrix obtained by diagonalizing $Q$ and inverting the diagonal matrix of eigenvalues only on the non-zero eigenvalues. It follows that
$$R_{eff} = V \sum_{\lambda_i \neq 0} \frac{1}{\lambda_i}.$$
It is known, using the spectrum of the Laplacian, that for a complete graph $R_{eff} = V - 1$.
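As a quick numerical check of these formulas, the NumPy sketch below (the helper names and the example graph are our own choices, not from the paper) computes $R(a,b)$ from the pseudo-inverse of the Laplacian and verifies that for the complete graph on $V$ nodes one finds $R(a,b) = 2/V$ and a total effective graph resistance of $V - 1$:

```python
import numpy as np

def graph_laplacian(V, edges):
    """Laplacian Q: degree on the diagonal, -1 for each edge (i, j)."""
    Q = np.zeros((V, V))
    for i, j in edges:
        Q[i, j] -= 1
        Q[j, i] -= 1
        Q[i, i] += 1
        Q[j, j] += 1
    return Q

def effective_resistance(Q, a, b):
    """R(a, b) = (e_a - e_b)^t Q^+ (e_a - e_b), with Q^+ the pseudo-inverse."""
    e = np.zeros(Q.shape[0])
    e[a], e[b] = 1.0, -1.0
    return e @ np.linalg.pinv(Q) @ e

# Complete graph on V nodes: R(a, b) = 2/V for every pair, and the
# total effective graph resistance (the Kirchhoff index) is V - 1.
V = 5
edges = [(i, j) for i in range(V) for j in range(i + 1, V)]
Q = graph_laplacian(V, edges)
R_ab = effective_resistance(Q, 0, 1)
R_eff = sum(effective_resistance(Q, i, j) for i, j in edges)
```

The pseudo-inverse handles the zero mode of the Laplacian (the all-ones vector), exactly as described in the text.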
FIG. 1. Cycles in graphs. Given an oriented graph, we assign an orientation to the edges and to each cycle. Given a spanning tree T, the fundamental cycles are associated to the edges not in T.

B. Graph theoretic formalism
The underlying assumption of (1) is that the voltage generators $s_i$ are in series with the resistances $i$, while the circuit can be represented as a graph with $E$ edges. Given the branch currents and a certain orientation of the (fundamental) graph cycles $1, \dots, L$, we can obtain the so-called cycle matrix $A$ of the circuit, of size $L \times E$, such that $\Omega = A^t(AA^t)^{-1}A$, where $t$ denotes the transpose. The fundamental cycles can be obtained by picking any spanning tree $T$ and associating a fundamental cycle to each edge in the complement of $T$ [49].
Although the details of the derivation of $\Omega$ from the circuit topology are beyond the scope of this paper, where $\Omega$ will be kept generic and unrelated to any underlying graph, it is worth stressing its relation to the graph structure of the circuit. We begin by introducing some concepts from graph theory before defining the projector operator $\Omega$. A walk on a directed graph $G$ is a sequence of vertices, denoted as $v_1, v_2, \dots, v_n$, such that for every pair of consecutive vertices in the sequence there exists an edge that connects them. Formally, for every $i$ we have $(v_i, v_{i+1}) \in E(G)$, where $E(G)$ is the edge set of the graph $G$.
A cycle is a walk $W = v_0 v_1 \dots v_n$ such that $n \geq 3$, $v_0 = v_n$, and the vertices $v_i$, $0 < i < n$, are distinct from each other and from $v_0$. An example of a cycle is shown in Figure 1. The space spanned by the edges of the graph has a vector-space structure. The cycle space is the subspace of the edge space that is spanned by all the cycles of the graph.
A cycle matrix $A$ is a matrix whose columns form a basis of the cycle space. For example, if $\vec c_1, \dots, \vec c_n$ is a set of column vectors forming a basis of the cycle space, then the cycle matrix is $A = (\vec c_1, \dots, \vec c_n)$. Finally, the projector operator $\Omega$ on the cycle space of the graph is defined as $\Omega = A(A^T A)^{-1}A^T$. The issue with this formalization is that calculating the cycle matrix is cumbersome, as it is a nonlocal quantity. Evaluating $\Omega$ is easier if one uses a local quantity instead, such as the local node connectivity. Let $G$ be a directed graph. One way of representing $G$ is by specifying where each of its edges starts and where it ends. It is convenient to do this using a matrix, called the incidence matrix. Each column of the incidence matrix represents an edge: if the first edge starts at vertex 1 and ends at vertex 2, the first column of the matrix has entry 1 in the first row and entry $-1$ in the second row; all the other entries in the first column are 0, because none of the other vertices are part of that edge. By continuing this process for every edge, we obtain the incidence matrix $B$ of the given graph. More formally, if a graph $G$ has $v$ vertices and $e$ edges, then the incidence matrix $B$ of $G$ is the $v \times e$ matrix (i.e., a matrix with $v$ rows and $e$ columns) whose entry $(i,j)$ is defined as
$$B_{ij} = \begin{cases} 1 & \text{if } v_i \text{ is the initial vertex of the edge } e_j, \\ -1 & \text{if } v_i \text{ is the terminal vertex of the edge } e_j, \\ 0 & \text{otherwise.} \end{cases} \qquad (4)$$
If $B$ is an incidence matrix, we can define the projector operator $\Omega_{B^T} = B^T(BB^T)^{-1}B$. If we try to compute $\Omega_{B^T}$ from this definition, we find that the inverse $(BB^T)^{-1}$ does not exist in general. This can be solved either by considering the reduced incidence matrix (obtained by removing a row from the original incidence matrix) or by taking the pseudoinverse of the expression instead of the "regular" inverse.
The following useful identity connects the projector operator $\Omega$ (based on the cycle matrix $A$) to the projector operator $\Omega_{B^T}$ (based on the incidence matrix $B$):
$$\Omega + \Omega_{B^T} = I,$$
where $I$ is the identity matrix. This duality between the incidence and cycle matrices is a manifestation of cycle and nodal analysis in circuit theory. The relationship is based on the fact that $\Omega$ and $\Omega_{B^T}$ are mutually orthogonal projectors, and on the identity $AB^t = 0$, related to the conservation of energy and known as Tellegen's theorem [24].
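These definitions are easy to verify numerically. The sketch below (NumPy; the example graph, a square with one diagonal, is our own choice) builds the oriented incidence matrix $B$, obtains $\Omega_{B^T}$ via the pseudo-inverse, and checks that the complementary projector has trace $L = E - V + 1$:

```python
import numpy as np

# Oriented incidence matrix B (V x E) of a small circuit graph:
# a square 0-1-2-3-0 with one diagonal (0, 2).
V, edges = 4, [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
E = len(edges)
B = np.zeros((V, E))
for j, (u, v) in enumerate(edges):
    B[u, j], B[v, j] = 1.0, -1.0

# Projector on the cut space; the pseudo-inverse handles the singular BB^t.
Omega_Bt = B.T @ np.linalg.pinv(B @ B.T) @ B
# Duality: the cycle-space projector is the complement, Omega = I - Omega_Bt.
Omega = np.eye(E) - Omega_Bt

L = E - V + 1   # number of fundamental cycles of a connected graph
```

Both the idempotence $\Omega^2 = \Omega$ and the duality $\Omega + \Omega_{B^T} = I$ hold to machine precision, with $\mathrm{Tr}(\Omega) = L = 2$ for this graph.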

C. Orthogonal group and cycle preserving transformations
In essence, the projector $\Omega$ captures the part of a vector that can be represented as a combination of cycles in the graph, and discards any components that do not lie within the cycle space. Let us then introduce a set of transformations that preserve its spectrum. Cycle-preserving transformations can be thought of as transformations in which an edge is disconnected from the graph and reconnected between two other nodes where no edge was present (we consider simple graphs for simplicity). An example of such a transformation is shown in Fig. 2. As we can see, the transformation preserves the total number of edges.
Let us now introduce a very simple result that is nevertheless key to the rest of this manuscript. The significance of the following lemma lies in its ability to establish a connection between edge-preserving graph transformations and the orthogonal group. Despite its simplicity, it provides the fundamental insight that forms the basis for our further analyses and conclusions.
Proposition 1. Let $G = (E, V)$ be a graph and let $\theta$ be a cycle-preserving transformation. If $\Omega_{A(G)}$ is the projector operator on $\mathcal{L} = \text{Span}(A)$, we have
$$\Omega_{A(\theta(G))} = O^t\,\Omega_{A(G)}\,O$$
for some orthogonal matrix $O$.
We say that two graphs $G$ and $G'$ are conjugate if $\Omega$ and $\Omega'$ are related by an orthogonal transformation. It is easy to see that such transformations in fact induce an equivalence relation. This simple result will go a long way in this paper, as it allows us to perform averages over the orthogonal transformations and derive a mean-field result in each conjugacy class, i.e., averaging over graphs with the same number of cycles.
To see a direct application of this result, consider again the solution for a resistive circuit, eqn. (1). If two graphs representing the resistive circuits are conjugate, then because of Proposition 1 we know that
$$\Omega' = O^t\,\Omega\,O,$$
which implies immediately that
$$\vec i' = \frac{1}{r}\,\Omega'\,\vec s' = \frac{1}{r}\,O^t\Omega O\,\vec s',$$
from which we get that
$$O\,\vec i' = \frac{1}{r}\,\Omega\,(O\,\vec s'):$$
up to a rotation of currents and voltages, the two are identical. This implies that there is a class of circuits whose currents and applied voltages can be mapped to each other via an orthogonal transformation. This is why we can introduce an equivalence class induced by such transformations. We can then classify the number of such equivalence classes.
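This mapping is straightforward to illustrate numerically. In the hedged sketch below (NumPy; a generic rank-$L$ projector stands in for a concrete cycle projector, and the Haar-sampling helper is our own), the currents of the conjugate circuit are exactly the rotated currents of the reference circuit:

```python
import numpy as np

rng = np.random.default_rng(0)

def haar_orthogonal(n, rng):
    """Sample O(n) with the Haar measure via sign-corrected QR."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

E, L, r = 6, 2, 1.0
# A generic rank-L projector, standing in for a cycle projector of the class.
M = rng.standard_normal((E, L))
Omega = M @ np.linalg.pinv(M.T @ M) @ M.T

s = rng.standard_normal(E)
i = Omega @ s / r                      # eqn (1) for the reference circuit

O = haar_orthogonal(E, rng)
Omega_c = O.T @ Omega @ O              # conjugate circuit
i_c = Omega_c @ (O.T @ s) / r          # rotated sources: s' = O^t s
```

Since $\Omega' O^t \vec s = O^t\Omega\vec s$, the conjugate currents satisfy $\vec i' = O^t\vec i$, as claimed.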
Using the equivalence relation, we have the following lemma for circuits.

Proposition 2. Let G be a graph on $V$ nodes representing a circuit, i.e., there are no nodes of degree 1 and G is connected. Then there are at most $\frac{1}{2}(V^2 - 3V + 2)$ and at least $(V - V \bmod 2)/2$ fundamental cycles, respectively. This classifies the equivalence classes at fixed number of nodes.
Proof. The number of equivalence classes is the number of possible spectral configurations of $\Omega$, which in the most general case is $\Lambda(\Omega) = \{1^L, 0^{E-L}\}$. Circuits must satisfy certain relationships between the number of cycles, the number of fundamental faces, and the number of vertices. First, note that the ordering of the spectrum is not important, as we can permute the eigenvalues via a permutation matrix $P$, and then introduce the orthogonal matrix $O \to POP^t$ to reorder them. Since $POP^t$ is still an orthogonal matrix, permutations of the spectrum belong to the same class. Thus the eigenvalue ordering is not important, but only the number of 1's and 0's, the spectral fingerprint. The number of fundamental cycles is $L = E - |T|$, where $|T|$ is the cardinality of a spanning tree. Because the graph of a circuit is connected, a spanning tree always exists, with number of edges $|T| = V - 1$ (i.e., we do not consider forests). It is easy to see that the number of leaves is even if $V \bmod 2 = 0$ and odd if $V \bmod 2 = 1$. Let us assume $V = 2k$ (even) or $V = 2k+1$ (odd), with $k \geq 1$. Then we need to add at least $k$ edges between the leaves to form $k$ cycles; thus the minimum number of fundamental cycles is $k$. We can then add one edge at a time until we obtain a complete graph. Then, for a graph with $V = 2k$ (or $2k+1$) nodes, we have at most $E - V + 1 = \frac{1}{2}(V^2 - 3V + 2)$ cycles (the complete graph) and at least $k$ cycles (the graph in which we added the minimum number of edges to remove the leaves from the spanning tree).
Corollary 1.All graphs with fixed number of edges and vertices G = (E, V) representing a circuit belong to the same equivalence class under ∼.We call this equivalence class C(E, V).
Proof. This is a consequence of Propositions 1 and 2. If $V$ is fixed and $E$ is fixed, then $L = E - V + 1$; thus for all $G$ such that $|E| = E$ and $|V| = V$, the number of fundamental cycles is fixed. All graphs in the same equivalence class are then related to each other via local edge moves, as in Fig. 2, which preserve the number of edges and nodes.
The previous results establish that, in order to remain in the same equivalence class, it is sufficient to perform moves that preserve the number of edges, given the number of vertices. It follows that if $\Omega \sim \Omega'$, then the number of edges and the number of vertices must be identical. An example of such equivalence-class partitioning is shown in Fig. 3.
Before we continue, let us comment on the relation between the number of cycles, edges, and nodes. In particular, it will be important to calculate the ratio $L/E$. For a connected graph, the cardinality of a spanning tree is equal to $V - 1$, and thus
$$\frac{L}{E} = \frac{E - V + 1}{E} = 1 - \frac{V}{E} + \frac{1}{E}.$$
Because of the handshaking lemma, we know that $V\bar d = 2E$, from which we obtain $\frac{V}{E} = \frac{2}{\bar d}$, where $\bar d$ is the average degree. We then have
$$\frac{L}{E} = 1 - \frac{2}{\bar d} + \frac{1}{E}.$$
We see from the equation above that for complete graphs this ratio goes to one.
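A one-line numerical check of this ratio (Python; the function name is ours):

```python
import numpy as np

def cycle_edge_ratio(V, E):
    """L/E = 1 - V/E + 1/E = 1 - 2/<d> + 1/E for a connected graph."""
    return 1 - V / E + 1 / E

# Complete graphs: E = V(V-1)/2, average degree <d> = V - 1;
# the ratio approaches one as V grows.
ratios = [cycle_edge_ratio(V, V * (V - 1) // 2) for V in (4, 10, 100, 1000)]
```

For $V = 5$, $E = 10$ one recovers $L/E = (10 - 5 + 1)/10 = 0.6$, and the complete-graph sequence above approaches one.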

D. Averaging over the equivalence class
One question we might then ask is: what is the average current vector on the equivalence class determined by cycle isomorphism? What we can do is average over all the orthogonal matrices which span an equivalence class $C(E, V)$, and write
$$\langle\vec i\,\rangle_C = \frac{1}{r}\,\big\langle O^t\Omega O\big\rangle_C\,\vec s. \qquad (12)$$
We can identify one element of the class $C(E, V)$, e.g. the graph $G_0$, and associate it to the identity element $O_0 = I \in O(E)$, the group of orthogonal matrices of size $E$. For generic graphs, such an average is over a finite number of elements. As an example, consider Fig. 3. For $V = 5$, $E = 5$, there is a unique representative of this equivalence class. For $V = 5$, $E = 6$, there are instead 5 representatives. This means that there is, up to permutations of nodes and edges, only a finite number of orthogonal matrices to average over, and thus the average above is a finite sum. Let $S(E, L)$ be the number of elements in $C(E, V)$. Then, the average of eqn. (12) can be written as
$$\big\langle O^t\Omega O\big\rangle_C = \frac{1}{S(E,L)}\sum_{j=1}^{S(E,L)} O_j^t\,\Omega\,O_j,$$
and thus we obtain that
$$\langle\vec i\,\rangle_C = \frac{1}{r\,S(E,L)}\sum_{j=1}^{S(E,L)} O_j^t\,\Omega\,O_j\,\vec s.$$
For finite $V, E$ we are not able to perform such an average in this paper. However, a possibility is to perform the average when the number of edges and loops goes to infinity; in this case, the number of orthogonal matrices also goes to infinity. A mathematical question is then: what is the measure in the limit of $E, L$ going to infinity? Our working assumption going forward will be that such an average converges to an average over the continuous group of orthogonal matrices. This is a rather non-trivial statement: in the following it will be our working assumption, and in the worst-case scenario our approximation. The average over continuous orthogonal matrices can be performed using the Haar measure, i.e., we perform the replacement
$$\lim_{E,L\to\infty}\frac{1}{S(E,L)}\sum_{j=1}^{S(E,L)}(\cdot)\;\to\;\int_{O(E)}(\cdot)\,dO,$$
where $dO$ is the continuous Haar measure over the orthogonal group [50]. Since the group $O(E)$ is equipped with a Haar probability measure, we can define the average
$$\langle f(O)\rangle_O = \int_{O(E)} f(O)\,dO.$$
Performing this average is then equivalent to averaging over random orthogonal matrices with a constant measure over $O(E)$.
To motivate this approach, let us consider a very simple example, based on the exact current solution of a resistive network, eqn. (8). We have an expression of the form
$$\langle\vec i\,\rangle_O = \frac{1}{r}\,\big\langle O^t\Omega O\big\rangle_O\,\vec s,$$
where the average is performed via the Haar measure over the orthogonal group. Let us anticipate one result here: the operation above is the first-order isospectral twirling of the matrix $\Omega$, and if one is acquainted with these methods, it is immediate to see that $\langle O^t\Omega O\rangle_O \propto I$. The reason why this is the case will be clear in the next section, where we describe in detail the techniques used to perform these averages.
We will just state here, for the sake of clarity, that such an average gives
$$\big\langle O^t\Omega O\big\rangle_O = \frac{\text{Tr}(\Omega)}{E}\,I.$$
Above, $E$ is the size of the matrix $O$, which in our case is the number of edges of the graph, and thus of resistors. The quantity $\text{Tr}(\Omega)$ is the matrix trace, simply the sum of the eigenvalues of $\Omega$. We know from the introduction that, because $\Omega$ is a projector on the space of fundamental cycles, the sum of the 1's and 0's in the spectrum is simply $L$. It follows that
$$\langle\vec i\,\rangle_O = \frac{1}{r}\,\frac{L}{E}\,\vec s.$$
We see that the average over rotations of the matrix $\Omega$ leads to a trace, which contains information about the topological properties of the network, such as the number of cycles. The quantity above is an average over all possible real projectors with the same spectrum, and it provides information on the expected size of the current as a function of graph-topological quantities, such as the number of fundamental cycles and the number of edges/resistors. Thus, this result suggests that our system of currents is equivalent to a set of uncoupled resistances of effective value $\tilde r = r\frac{E}{L}$. To see whether we can make sense of this, consider $E$ resistances in parallel. We know from earlier discussions that for complete graphs the ratio $L/E$ goes to one. This is the same result obtained from the effective resistance analysis of [48] using eqn. (2). We thus expect our Haar average to be informative for very dense circuits.
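The first-moment identity $\langle O^t\Omega O\rangle_O = \frac{L}{E}I$ can be checked by Monte Carlo (NumPy sketch; the projector is a generic rank-$L$ one, and Haar sampling uses the standard sign-corrected QR construction):

```python
import numpy as np

rng = np.random.default_rng(1)

def haar_orthogonal(n, rng):
    """Sample O(n) with the Haar measure via sign-corrected QR."""
    Q, R = np.linalg.qr(rng.standard_normal((n, n)))
    return Q * np.sign(np.diag(R))

E, L = 6, 2
M = rng.standard_normal((E, L))
Omega = M @ np.linalg.pinv(M.T @ M) @ M.T   # generic rank-L projector

# Monte Carlo estimate of the first-order isospectral twirl <O^t Omega O>.
n_samples = 5000
avg = np.zeros((E, E))
for _ in range(n_samples):
    O = haar_orthogonal(E, rng)
    avg += O.T @ Omega @ O
avg /= n_samples
# Exact result: Tr(Omega)/E * I = (L/E) * I.
```

The empirical average converges to $(L/E)I$ up to Monte Carlo noise of order $n_{\text{samples}}^{-1/2}$.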
The second question one might ask is whether this result is typical. This means asking what the probability is that, given two isospectral graphs, a certain measure between the two current vectors is far from a certain value. For instance, let us pick the squared norm
$$\Delta = \vec i\cdot\vec i = \frac{1}{r^2}\,\vec s^{\,t}\,O^t\Omega O\,O^t\Omega O\,\vec s.$$
We can expand the expression above and write it as
$$\Delta = \frac{1}{r^2}\,\vec s^{\,t}\,O^t\Omega O\,\vec s,$$
where we used $\Omega^2 = \Omega$. We can average this expression again, and obtain
$$\langle\Delta\rangle_O = \frac{L}{E}\,\frac{\vec s\cdot\vec s}{r^2}.$$
There are now two possible cases. First, consider $\vec s\cdot\vec s = O(1)$, i.e., not scaling with $E$. In the limit in which $E \gg L$, or $E \simeq L$, such a result is typical because of concentration inequalities such as the Chebyshev inequality. The first is the case when, for instance, we have a single cycle and a large number of resistors; the second when the graph is extremely dense, for instance a complete graph, in which $L \propto E$. Using this technique, we can thus make estimates about the typical behavior of a certain system for very dense graphs.
While resistive circuits might not be considered as interesting as other physical systems, the point of our paper is that similar techniques can also be used in the case of dynamical resistive circuits, such as memristive circuits. Unfortunately, these are typically more complicated, and as we show in this paper, even for the simplest model of memristive dynamics, exact results can be obtained only in the case of dense graphs and in the asymptotic limit. The question is then: what happens in the case of memristive circuits?

E. Toy model of memristive circuit dynamics
Let us now discuss the explicit equation of memristive dynamics for linear, current-controlled devices, as a toy model for the conductance transitions. The equation of motion for a circuit of memristors has been derived in a previous study [24,44]. It models a flow network that adheres to current and energy conservation laws, and the dynamics of its edges are bounded within an interval. A memristor with memory can be described by an effective resistance that depends on an internal parameter $x$. For example, $TiO_2$ memristors can be approximated by the functional form $R(x) = R_{on}(1-x) + xR_{off}$, where $R_{on} < R_{off}$ are the limiting resistances, and $x \in [0,1]$ physically represents the size of the oxygen-deficient conducting layer [10]. The internal memory parameter $x$ evolves, to the lowest order of description, according to a simple equation:
$$\frac{dx}{dt} = -\alpha x + \frac{R_{on}}{\beta}\,I,$$
with hard boundaries at $x = 0$ and $x = 1$. Here, $\alpha$ and $\beta$ are the decay constant and the effective activation voltage per unit of time, respectively, and they determine the timescales of the dynamical system. Many extensions of this basic model have been considered in the literature. For example, diffusive effects near the boundaries can be approximated by removing the hard boundaries and multiplying by a window function [51-53]. Nonlinear conductive effects can also be included by replacing $I$ with a function $f(x, I)$ or introducing new parameter dependencies, such as temperature in the case of thermistors [54]. Comparisons between these models [55-58] show that many of them are more faithful to the precise I-V curves of physical devices, but most share the basic pinched-hysteresis phenomenology of the linear model. In analytical work, it is often assumed that the dynamics are linear in the currents in order to study the behavior in a wide context.
For a single memristor under an applied voltage $S$, Ohm's law $S = R(x)I$ can be used to obtain an equation for $x(t)$ in adimensional units $\tau = \alpha t$, given by:
$$\frac{dx}{d\tau} = -x + \frac{s}{1 + \chi x} = -\partial_x V(x, s),$$
where $\chi = (R_{off} - R_{on})/R_{on}$ and $s = \frac{S}{\alpha\beta}$, with $0 \leq \chi \leq 1$ in physically relevant cases, and $V(x, s)$ represents an effective potential.
The dynamics of this one-dimensional system is described by an ODE, and is fully characterized by gradient descent in the potential:
$$V(x, s) = \frac{x^2}{2} - \frac{s}{\chi}\ln(1 + \chi x).$$
The equation above, in the case of a circuit of purely memristive devices each in series with a voltage generator, is generalized to a set of coupled ODEs, written compactly as [44]
$$\frac{d\vec x}{dt} = -\alpha\vec x + \frac{1}{\beta}\left(I + \chi\,\Omega X\right)^{-1}\Omega\,\vec S, \qquad (26)$$
where $X = \text{diag}(\vec x)$. We can see that eqn. (26) is written for arbitrary network topologies, as these enter only via $\Omega$.
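Assuming the reconstructed form of the dynamics above (this is a sketch under that assumption, not code from the paper; parameter values are arbitrary), the network equation can be integrated with a simple Euler scheme with clipping at the hard boundaries; for a single memristor ($\Omega = 1$) the trajectory relaxes to the interior fixed point $x^*(1 + \chi x^*) = s$:

```python
import numpy as np

def simulate_network(Omega, s, chi, x0, dtau=1e-3, n_steps=20000):
    """Euler integration of dx/dtau = -x + (I + chi*Omega*X)^(-1) Omega s,
    with the memory variables clipped to the hard boundaries [0, 1]."""
    E = len(x0)
    x = x0.copy()
    for _ in range(n_steps):
        rhs = np.linalg.solve(np.eye(E) + chi * Omega @ np.diag(x), Omega @ s)
        x = np.clip(x + dtau * (-x + rhs), 0.0, 1.0)
    return x

# Single memristor: Omega = [[1]] reduces to dx/dtau = -x + s/(1 + chi*x),
# whose interior fixed point solves x*(1 + chi*x*) = s.
chi, s = 0.9, 0.5
x = simulate_network(np.eye(1), np.array([s]), chi, np.array([0.1]))
```

Since the fixed point lies in the interior of $[0,1]$ for these parameters, the clipping never activates and the residual of the fixed-point condition vanishes.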
To see why $\Omega$ is connected to Kirchhoff's laws for memristive devices, consider again eqn. (26). It is easy to see that the transformation $\vec S \to \vec S + (I - \Omega)\vec k$, for arbitrary vectors $\vec k$, leaves the dynamics invariant, since $\Omega(I - \Omega) = 0$. This is a manifestation of Kirchhoff's laws [24], which are however also present in eqn. (1). However, in memristive devices described by eqn. (26) this is a subset of a larger invariance set: transformations which act simultaneously on the sources and on the resistance states, such that the time derivative is invariant.
After a bit of algebra, one obtains a generalized invariance relationship, valid for any vector $\vec k$, in which sources and resistance states are transformed together. To see the origin of this generalized invariance, consider for instance resistances $R_1, \dots, R_n$ in series with $k$ generators $s_i$. We have
$$I = \frac{\sum_i s_i}{\sum_j R_j}.$$
To preserve the current, we can change the $R_j$ and $s_j$ in such a way that this ratio is preserved. For networks of memristors, the generalized relationship extends this simple invariance.

F. Existing results on the mean field theories and limitations
The property that we are interested in, in this paper, is the mean-field behavior of eqn. (26). Interestingly, it has been shown that in the laminar regime the dynamics of the whole system can be described in terms of a single dynamical scalar order parameter $x$.
The first result of this type was based on a random-matrix-theory approximation of the resolvent. Recent studies have provided evidence for statistical regularities in the resolvent of large matrices, suggesting that it can be approximated by an effectively one-dimensional matrix that is universal at zeroth order in the limit of weak correlations [59]. This result has broad applicability across various domains of complexity science [60]. By defining the vector $\vec f = \Omega\vec x$ and introducing the matrix $\tilde A = (I + \chi\Omega X)^{-1}$, we can establish the approximate zeroth-order relation
$$\tilde A \approx \frac{1}{1 + \chi x_{cg}}\,I,$$
where $\chi$ is a parameter, $X$ is the diagonal matrix of the variables $x_i$, and $N$ is the size of the matrix. This approximation allows us to express the dynamics in terms of the effective one-dimensional variable $x_{cg}$, defined as the coarse-grained average of $\vec f(\vec x)$ via the operator $\langle\cdot\rangle_{mean} = N^{-1}\sum_{i=1}^N(\cdot)_i$. The resulting effective dynamics can be written as
$$\frac{dx_{cg}}{dt} = -\alpha x_{cg} + \frac{1}{\beta}\,\frac{\bar S}{1 + \chi x_{cg}} + L(\vec x),$$
where $\alpha$, $\beta$, and $\bar S$ are parameters, and $L(\vec x)$ represents an effective force arising from imperfect coarse-graining. Notably, this dynamics resembles that of a single memristor, with the parameter $s = \frac{S}{\alpha\beta}$ replaced by the mean-field value $\bar s = \frac{\langle\Omega\vec S\rangle_{mean}}{\alpha\beta}$ at the zeroth-order approximation. The effective potential develops two minima when the mean-field parameter $\bar s$ crosses a certain threshold.
Instead, in [25] the experimental potentiation-depression fit of the two-probe conductance was based on a different approximation, namely the mean-field approximation of the matrix inverse which minimizes the Frobenius norm, i.e., the scalar $x$ such that $\|(I + \chi\Omega X)^{-1} - (I + \chi\Omega x)^{-1}\|_F^2$ is minimized; this is a cruder but effective approximation. The circuit connectivity was then shown to be reabsorbed into effective parameters that can be fit a posteriori. More recently, in [26] it was shown that memristive circuits described by (26) belong to a larger class of dynamical systems called PEDS (projective embeddings of dynamical systems), in which fixed points of the larger system can be mapped to fixed points of a lower-dimensional dynamical system using the relationship $\Omega^2 = \Omega$. However, also there, exact results could be obtained only for very specific types of projector operators, which are essentially a slight generalization of a mean-field coarse-graining operator.
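As a minimal illustration of why mean-field coarse graining can be exact for the simplest projectors, take $\Omega = J/N$ (the uniform rank-one projector) with a uniform initial condition: assuming dynamics generated by $(I + \chi\Omega X)^{-1}\Omega\vec s$ with linear decay (our reading of eqn. (26)), the full network trajectory coincides with the scalar one (NumPy sketch; all parameter values are our own):

```python
import numpy as np

N, chi, s_val = 8, 0.9, 0.5
Omega = np.full((N, N), 1.0 / N)      # uniform rank-1 projector, Omega^2 = Omega
s = np.full(N, s_val)

def step_network(x, dtau):
    # Full network: dx/dtau = -x + (I + chi*Omega*X)^(-1) Omega s
    rhs = np.linalg.solve(np.eye(N) + chi * Omega @ np.diag(x), Omega @ s)
    return np.clip(x + dtau * (-x + rhs), 0.0, 1.0)

def step_scalar(x, dtau):
    # Coarse-grained scalar: dx/dtau = -x + s/(1 + chi*x)
    return min(max(x + dtau * (-x + s_val / (1.0 + chi * x)), 0.0), 1.0)

x_net = np.full(N, 0.1)
x_cg = 0.1
for _ in range(5000):
    x_net = step_network(x_net, 1e-3)
    x_cg = step_scalar(x_cg, 1e-3)
```

Because a uniform state stays uniform under $\Omega = J/N$, the coarse-grained variable obeys the single-memristor equation exactly, with no residual force from imperfect coarse-graining.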
We see then that in both cases the topological properties of the network become less important. This is somewhat unsatisfactory, as we do expect the topological properties of the network to matter. Thus, in what follows, we will average while preserving the global properties of the network, which in our case means preserving the spectral properties of the matrix $\Omega$.

A. von Neumann series and its tensor representation
The key difference between the purely resistive and the purely memristive case is that the dynamics of the purely memristive case contains an infinite sum, with terms of the form O^t Ω O X. To simplify the calculations that follow, we note that the memristive equation can be written in terms of the dimensionless time variable τ = αt and of O, and we perform the following change of variable: x = e^{−τ} g. This simply removes the term −αx in the equation, and allows us to work temporarily with the infinite sum only. We now write the von Neumann series for the matrix inverse, in powers of χ, as It is thus immediate to see that performing the average over the orthogonal group, as we set out to do at the beginning of this manuscript, is slightly more complicated than in the resistive case. Let us now perform the following matrix manipulations, using the properties of tensor products, which we introduce in App. A. These techniques, commonly used in quantum information, are borrowed from linear algebra applied to tensor products of linear operators. Let us assume here that the A_i's are linear operators on R^E. Since we are dealing with matrices of the form (O^t Ω O X)^k, let us rewrite this matrix power in a way that allows us to perform the average using well-known results.
Let us begin with the simplest non-trivial case, referring to App. A for an introduction to the tensor product. We have Above, the trace is a partial trace over the second tensor factor, and the matrix S is the swap operator.
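As a quick numerical check of this kind of identity (not code from this paper; the helper names `swap_operator` and `partial_trace_2` and the tensor-index ordering are our conventions), one can verify that tracing out the second factor against the swap operator collapses a tensor product into an ordinary matrix product:

```python
import numpy as np

def swap_operator(E):
    """Swap operator S on R^E (x) R^E: S (a (x) b) = b (x) a."""
    S = np.zeros((E * E, E * E))
    for i in range(E):
        for j in range(E):
            S[i * E + j, j * E + i] = 1.0
    return S

def partial_trace_2(M, E):
    """Partial trace over the second tensor factor of an E^2 x E^2 matrix."""
    # reshape to (i1, i2, j1, j2), then set i2 = j2 = k and sum over k
    return np.einsum('ikjk->ij', M.reshape(E, E, E, E))

rng = np.random.default_rng(0)
E = 4
A, B = rng.standard_normal((E, E)), rng.standard_normal((E, E))
S = swap_operator(E)

# Tr_2[ S (A (x) B) ] reproduces an ordinary matrix product
lhs = partial_trace_2(S @ np.kron(A, B), E)
rhs = B @ A
```

With this convention the partial trace of S(A ⊗ B) yields the product BA; the swap operator itself squares to the identity.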
It follows that we can write where above, with an abuse of notation, we wrote at each step S_{i,i+1}, intending that it swaps only the i-th and (i+1)-th lines and leaves the other lines intact. Let us call . Graphically, the identity above can be seen in Fig. 4, with the swap operators acting on each tensor index. We now note that, then similarly for Ω^⊗k, O^⊗k, and G^⊗k. We have then that Using this formalism, we then have The equation above is written in the form of a twirl. A twirl is an expression of the form [61] where M_k is a certain linear operator on (R^E)^⊗k resulting from the average. In tensorial graphical form, the expression is shown in Fig. 5. The internal index contractions will be shown as blue lines, and the indices external to the twirl in red. We now ask the following question: what is the average behavior of the circuit when we take the isospectral twirling average with respect to the orthogonal operator O? The techniques used to perform these averages are well established [62][63][64][65].
We have then that The goal is now to calculate the operators M_k.

B. Haar measure, Weingarten calculus and mean field theory
In the present manuscript, we are concerned with performing averages of the form over the orthogonal group O(E). This is a subset of M_E(C), the algebra of E × E complex matrices with the standard matrix multiplication, and it is a compact topological space. Haar's theorem [50] states that for any locally compact Hausdorff topological group there exists a left-translation-invariant measure and a right-translation-invariant measure, each unique up to a positive multiplicative constant. In the case of a compact group, these measures coincide and are known as the Haar measure. Let G be a group. A representation R of G on H is defined as a group homomorphism R : G → GL(H) [66]. Given R, we can construct the k-fold tensor representation of G on H^⊗k by acting with R(g)^⊗k for any g ∈ G. It can be observed that if R is a valid representation, then its k-fold tensor product also forms a representation. The Haar average over the k-fold tensor product can be performed using the procedure of Weingarten calculus [62,67,68], which we briefly review here. Consider a bounded operator A on the Hilbert space H, and let O be an orthogonal operator chosen uniformly at random from the orthogonal group O(E). We are interested in particular in the isospectral twirling. The 2k-isospectral twirling of X is defined as [61]: where Wg^O(στ⁻¹, E) denotes the Weingarten function of the group O(E). For the case of the orthogonal group, B_σ and B_τ are operators associated with elements σ and τ of the Brauer algebra, respectively, which we describe in a moment. Eqn. (42) will be motivated in a moment by showing the contractions explicitly, with the explicit expression for the Weingarten functions in terms of the Brauer elements given further below.
The Brauer algebra, denoted here by P_2k, is a mathematical structure introduced by Richard Brauer in the 1930s, with connections to both algebra and representation theory. One interesting feature of the Brauer algebra is its connection to pairings in set theory: it can be used to study pairings of elements from two sets in a systematic way. This connection arises from the fact that the Brauer algebra can represent certain operations on partitions, the combinatorial objects that encode the ways in which a set can be divided into smaller subsets. In set theory, a pairing is a function or relation that maps pairs of elements from two sets to a third set; the Brauer algebra provides a way to describe and analyze these pairings when the sets have a certain kind of symmetry. Examples of Brauer pairings between 2k elements are shown in Fig. 6.
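The pairings underlying P_2k are perfect matchings of 2k points, and there are (2k − 1)!! of them, which is why one pairing appears at k = 1 and three at k = 2 below. A small enumeration sketch (the function name `pairings` is ours):

```python
from itertools import islice

def pairings(elements):
    """Enumerate all perfect pairings (matchings) of an even-sized list."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for i, partner in enumerate(rest):
        # pair `first` with each possible partner, then pair up the rest
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

# number of pairings of 2k elements for k = 1, 2, 3: (2k-1)!! = 1, 3, 15
counts = [sum(1 for _ in pairings(list(range(2 * k)))) for k in (1, 2, 3)]
```

For k = 2 this reproduces exactly the three pairings σ1, σ2, σ3 listed later in the text.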
In the following, we will introduce a coloring scheme to better understand this average. As we can see in Fig. 5, the twirling involves external and internal indices, the latter being contracted between O^t Ω and O. We will denote internal indices in blue and external indices in red. Another way of expressing eqn. (42) is by making the indices explicit. The average over the orthogonal group is given by the formula Above, we have the following definition: The function above is an index-explicit definition of the operators B_σ involved in eqn. (42). Using the formula above, the isospectral twirling is written in the form where we defined We see then that one of the summations over the Brauer algebra elements is associated with external indices (red, τ), while the other is associated with internally contracted indices (blue, σ). The Weingarten function depends only on the element α = στ⁻¹. The element α is essentially the superposition of σ and τ: in Fig. 6, for instance, we simply overlay two Brauer elements on top of each other. To obtain the Weingarten matrix elements, we first construct the Gram matrix Gm(σ, τ) as follows [62,63]. At order 2k, let m be the number of Brauer pairings. The Gram matrix is m × m, and the matrix element associated to σ, τ is E^{c(σ,τ)}, where c(σ, τ) is the number of closed cycles obtained by overlaying σ and τ on the same diagram. Then, the Weingarten matrix Wg^O(στ⁻¹, E) is the corresponding element of the pseudo-inverse of the matrix Gm(σ, τ).
We now have all the elements to perform the average. We need to examine the effect of the two summations and the delta functions on the indices of the product of Ω's. Let us first consider the case k = 1. In this case there are only two elements, and thus the only available pairing is σ = τ = {1, 2}.
Since there is only one component, we have This is shown in Fig. 7 (top).
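The k = 1 result, namely that the twirl of Ω reduces to (Tr(Ω)/E) I, can be checked by direct Monte Carlo sampling of Haar-random orthogonal matrices. A sketch, with a random rank-L projector standing in for Ω (the QR-based sampler and all names are ours):

```python
import numpy as np

def haar_orthogonal(E, rng):
    """Sample a Haar-random orthogonal matrix via QR of a Gaussian matrix."""
    Z = rng.standard_normal((E, E))
    Q, R = np.linalg.qr(Z)
    return Q * np.sign(np.diag(R))   # sign fix makes the distribution Haar

rng = np.random.default_rng(1)
E, L = 4, 2
# rank-L projector standing in for the cycle projector Omega
V = np.linalg.qr(rng.standard_normal((E, L)))[0]
Omega = V @ V.T

n_samples = 20000
avg = np.zeros((E, E))
for _ in range(n_samples):
    O = haar_orthogonal(E, rng)
    avg += O.T @ Omega @ O
avg /= n_samples

# first-order twirl: <O^t Omega O> = (Tr(Omega)/E) I
prediction = (np.trace(Omega) / E) * np.eye(E)
```

The sample average approaches Tr(Ω)/E · I = (L/E) I, here 0.5 I, up to Monte Carlo noise.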
Let us now focus on the case k = 2. This is already not as simple as the case k = 1. The pairings on 4 elements are now 3: σ1 = {{1, 3}, {2, 4}}, σ2 = {{1, 4}, {2, 3}}, σ3 = {{1, 2}, {3, 4}}. Thus, as we see in Fig. 7 (bottom), the average is projected over three operators, and we have where we now need to calculate As we have seen in the previous example, the quantity results in contractions over the indices j_s, while the rest are summed over; it is thus immediate that these are traces and products of traces of Ω. The contractions are shown in Fig. 8. They correspond to the elements σ1 → Tr(Ω)², σ2 → Tr(Ω²), σ3 → Tr(Ω²), where we used the fact that Ω is a symmetric matrix. However, these must be multiplied by the orthogonal Weingarten functions, which are now nontrivial. The table of Weingarten coefficients can be obtained by composing σ and τ; this is simplified by the observation that τ⁻¹ = τ, since these are pairings. The orthogonal Weingarten function is the pseudo-inverse of the Gram matrix, whose elements are Gm(σ, τ) = E^{c(σ,τ)}, where c(σ, τ) is the number of connected components of the graph resulting from the composition of σ and τ. The number of connected components is read off, inspecting Fig. 9, from the Gram matrix below. We thus find that, ordering the columns and rows according to σ1, σ2, σ3 described above, the inverse of the Gram matrix yields the following Weingarten matrix for k = 2.
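The k = 2 Gram matrix and its pseudo-inverse can be computed in a few lines: overlaying a pairing with itself gives two closed loops (entry E²), while overlaying two distinct pairings gives one loop (entry E). A sketch (the helper name is ours; the closed forms quoted in the comments are the standard orthogonal Weingarten values at this order, not equations from this paper):

```python
import numpy as np

def weingarten_k2(E):
    """Orthogonal Weingarten matrix at k = 2.

    The three pairings of {1,2,3,4} give a 3x3 Gram matrix with E^2 on the
    diagonal (two closed loops) and E off the diagonal (one closed loop).
    """
    Gm = np.full((3, 3), float(E))
    np.fill_diagonal(Gm, float(E) ** 2)
    return np.linalg.pinv(Gm)

E = 10
Wg = weingarten_k2(E)
# standard closed forms for the orthogonal group at k = 2:
#   diagonal:     (E+1) / (E (E-1) (E+2))
#   off-diagonal:   -1  / (E (E-1) (E+2))
diag = (E + 1) / (E * (E - 1) * (E + 2))
offd = -1.0 / (E * (E - 1) * (E + 2))
```

The pseudo-inverse of the Gram matrix reproduces these closed forms exactly, consistent with the structure described in the text (identical diagonal entries, identical off-diagonal entries).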
The Weingarten matrix is a symmetric matrix with identical elements on the diagonal and, for k = 2, identical elements off the diagonal. Denoting the diagonal and off-diagonal elements accordingly, we then have Note that we use here the notation in which input indices are at the bottom, while output indices are at the top. We note that δ_{j1,j3} δ_{j2,j4} ≡ I^⊗2, δ_{j1,j4} δ_{j2,j3} ≡ S, while δ_{j1,j2} δ_{j3,j4} ≡ Π; in quantum information theory, Π is proportional to the projector onto the maximally entangled Bell state between the two copies of a Hilbert space H. In braket notation, if one defines |Φ+⟩ as the Bell state between the two Hilbert spaces, then one has Π = E|Φ+⟩⟨Φ+|.
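The operator identities invoked here (S² = I, Π² = EΠ, and Π = E|Φ+⟩⟨Φ+|) are cheap to verify numerically; a minimal sketch with our index conventions:

```python
import numpy as np

E = 3
I2 = np.eye(E * E)

# S: delta_{j1,j4} delta_{j2,j3} (swap);  Pi: delta_{j1,j2} delta_{j3,j4}
S = np.einsum('il,jk->ijkl', np.eye(E), np.eye(E)).reshape(E * E, E * E)
Pi = np.einsum('ij,kl->ijkl', np.eye(E), np.eye(E)).reshape(E * E, E * E)

# Bell state |Phi+> = E^{-1/2} sum_i |ii>
phi = np.eye(E).reshape(E * E) / np.sqrt(E)
```

One checks that S squares to the identity, that Π satisfies Π² = EΠ and Tr Π = E, and that Π equals E times the projector onto |Φ+⟩.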

C. Perturbative average
Obtaining exact expressions up to an arbitrary order is rather cumbersome. For this reason, let us focus on expressions up to order k = 2. In this case, we have
FIG. 8. Contractions of the internal lines at the second order of the twirling. We see that the pairing σ1 contributes a term Tr(Ω)², while σ2 and σ3 both contribute Tr(Ω²).
We see from the expression above that, in order to perform this calculation, we need to calculate explicitly Using the methodology described in the previous section, it is easy to see that M_1 = (Tr(Ω)/E) I.
We consider the average up to second order in χ, eqn. (53). Let us note that since Ω² = Ω, we have Tr(Ω²) = Tr(Ω). From eqn. (51), we know instead that The result of these contractions can be seen in a graphical representation in Fig. 11. Inserting these expressions into eqn. (57), we obtain Using the assumption about the size of the system, i.e. that E ≫ 1, and the fact that the graph is loop dense, i.e. Tr(Ω) = L ≫ 1, we have We now define x̄(t) = (1/E) Σ_{j=1}^E x_j(t). It is easy to see then that e^{−τ} Tr(G)/E = x̄. Summing the left and right equations, and performing the mean-field replacement s_i → s̄, we then obtain the mean-field approximation; using the fact that L/E² → 0, we obtain lim From summing over each index on the left, and dividing by E, we obtain where We introduce var(x) = ⟨x²⟩ − x̄². Using this definition, and introducing ρ = L/E, we then have, for E ≫ 1, that which is the first equation used in the main text. We see, however, that this equation is not closed, as it implicitly involves ⟨x²⟩. We then note that from which, using the mean-field approximation again for s̄, and summing appropriately on the left, we get which now depends on higher moments, implying a tower of coupled equations. We can impose closure by assuming that We will impose closure at k = 3, taking ⟨x³⟩ = r₃ constant. Note also that and thus, using var(x) = ⟨x²⟩ − x̄², we get Thus, we obtain the set of coupled differential equations The question is whether these equations have physical fixed points. It is not hard to see, however, that there is no attractive fixed point of the dynamical system at finite x̄ and v. This is shown in Fig.
10, where we plot the phase portrait of the dynamics. Note that the dynamics should be constrained to the box x ∈ [0, 1]. Moreover, from the Bhatia-Davis inequality [69], since 0 ≤ g_i ≤ 1, we have from the identity var In particular or However, the trajectories are not constrained to this box, and thus this constraint must be imposed on the variance. More importantly, we can see that the only fixed point is a saddle, which is unphysical, as we know that there must be an attractive fixed point. Modifying the parameters of the equation only changes the location of this fixed point, not the spectrum of the Jacobian. We interpret this failure as signaling the need for non-perturbative results, which we can obtain only from the asymptotics of the Weingarten calculus.
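The Bhatia-Davis bound used above, var(g) ≤ (M − ḡ)(ḡ − m) for variables confined to [m, M], is easy to sanity-check on random samples (a sketch; the sampling setup is ours):

```python
import numpy as np

rng = np.random.default_rng(2)

# Bhatia-Davis: for samples g_i in [0, 1], var(g) <= (1 - mean(g)) * mean(g)
violations = 0
for _ in range(200):
    g = rng.uniform(0.0, 1.0, size=50)
    if g.var() > (1.0 - g.mean()) * g.mean() + 1e-12:
        violations += 1
```

Since the inequality is a theorem for bounded variables, no violations occur; in the text this is what constrains the variance variable v once x̄ is confined to [0, 1].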

D. Asymptotic regime
As we have seen, the results obtained in perturbation theory by expanding in powers of χ are interesting but unsatisfactory. In order to obtain non-perturbative results, we are then forced to consider the asymptotic regime of the Weingarten calculus. In particular, we will use the following result [70]: This can be understood intuitively from the definition of the Gram matrix. The dominant elements of the Gram matrix are in fact on the diagonal, as these always contribute E^k, while the off-diagonal elements are of the form Inserting this expression into eqn. (45), we then obtain [65] Let us now look at the expressions Let us call B_τ the operator corresponding to the element τ of the Brauer algebra associated with the contractions ∆_τ(j). Using eqn. (75), we obtain the asymptotic expression Now, note that we can write Tr(Ω^{j_s(τ)}), We now want to argue that if L(E) grows as a function of E, then we have which is the expression we need in order to derive the large-E mean field theory. We now want to show why this is the dominant contribution for each term at order χ^k. Let us now discuss quantities of the form Tr(G^j). First, note that |x_i| ≤ 1, and thus, since α ≥ 0, we have and thus 0 ≤ ḡ ≤ 1. Note that in general Tr(G^j) = Σ_i g_i^j. Proposition Proof. Since both the x_i and the g_i are constrained to the interval [0, 1], this result holds for both G and X. Note that, since g_i ∈ [0, 1], we have g_i^r ≤ g_i for all r ≥ 1. It follows that [71,72] Tr(G^r) ≤ Tr(G) ≤ E. Let us now analyze terms of the form for τ ≠ e. This trace can be written in the form Now note that since m_j ≥ 1, we have k − m₀ − s ≥ 0.
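The asymptotic statement Wg(στ⁻¹, E) ≈ E^{−k} δ_{στ} can be checked at k = 2 directly from the Gram-matrix construction: rescaling by E² sends the diagonal element to 1 and the off-diagonal element to 0 as E grows. A sketch (`weingarten_k2` is our helper, repeated here so the snippet is self-contained):

```python
import numpy as np

def weingarten_k2(E):
    """k = 2 orthogonal Weingarten matrix from the Gram matrix pseudo-inverse."""
    Gm = np.full((3, 3), float(E))
    np.fill_diagonal(Gm, float(E) ** 2)
    return np.linalg.pinv(Gm)

# Wg(sigma tau^{-1}, E) ~ E^{-k} delta_{sigma,tau}: rescale by E^k = E^2
Es = (10, 100, 1000)
scaled_diag = [E ** 2 * weingarten_k2(E)[0, 0] for E in Es]
scaled_offd = [E ** 2 * abs(weingarten_k2(E)[0, 1]) for E in Es]
```

The rescaled diagonal tends to 1 while the rescaled off-diagonal decays like 1/E, which is exactly the suppression of the non-identity Brauer elements used in the argument above.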
Note that we assume ḡ < 1, which is the interesting case, as ḡ = 1 means that all memristors are at the fixed point and the system is not evolving. As a result, we have shown that every loop contribution, for ḡ < 1, is sub-dominant with respect to the identity element of the Brauer algebra. Since the identity is the only term not containing loops, the result follows in the limit E → ∞.

It then follows that lim
where c_k is a constant. The question now is whether we can swap the series with this limit. Note that the (scalar) series converges uniformly on x ∈ [0, 1). Similarly, the von Neumann series for (I − A)⁻¹ converges uniformly if |λ| < 1 for all λ ∈ Λ(A). Note then that since |x_i| ≤ 1 and χ < 1, we have Λ(χ O^t Ω O X) ⊂ R(0, χ), where R(0, a) is the disc in the complex plane of radius a centered at zero. Using Proposition 3, the contribution due to the Ω traces is obtained. Then, at order k, the dominant operator contribution is of the form G^k Tr(Ω)^k. Going back to the X variables, we have then obtained the final result lim which proves the proposition.
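The interchange argument relies on the von Neumann series converging whenever the spectral radius is below one; this is quickly confirmed numerically (a sketch, with an arbitrary test matrix rescaled to spectral radius 0.9):

```python
import numpy as np

rng = np.random.default_rng(3)
E = 5
A = rng.standard_normal((E, E))
A *= 0.9 / np.max(np.abs(np.linalg.eigvals(A)))   # force spectral radius 0.9

# truncated von Neumann series sum_k A^k vs. the exact inverse
series = np.zeros((E, E))
term = np.eye(E)
for _ in range(500):
    series += term
    term = term @ A

exact = np.linalg.inv(np.eye(E) - A)
```

With spectral radius 0.9, 500 terms already agree with the exact inverse to machine precision; for radius ≥ 1 the series would diverge, which is why the condition Λ(χ O^t Ω O X) ⊂ R(0, χ) with χ < 1 is needed.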
We use the proposition above to obtain the average dynamics over each cycle class C(E, L). We have A result which refines those of [44,59], although for a particular type of random matrices and only in the large-E limit.
Because G is a diagonal matrix, this implies that each memristive device is decoupled from the others on average. We can then analyze the behavior of the system by looking at each equation independently, and we get for the x_i variables that where L = Tr(Ω) < E is the number of cycles in the graph. We have L = E if the number of memristors is equal to the number of cycles, and thus if all memristors are completely decoupled; in this case Ω = I and equation (88) is in fact exact. Note at this point that we have the effective mean field potential We see the resemblance between eqn. (89) and eqn. (32). The equation is identical as long as we replace Let us now make some comments on the effective value of χ, using eqn. (11) introduced earlier. In the limit E → ∞ for a connected graph, we obtain that where d̄ is the average degree. This result allows us to connect the geometrical properties of the graph to the transition properties. For a connected graph, d̄ ≥ 2, which implies 0 ≤ 1 − 2/d̄ < 1. As a result, we have that if d̄ ∝ V, as in the case of a complete graph, then in the limit E → ∞ we have χ_eff → χ. For planar graphs, instead, we have d̄ < 6 as a consequence of Euler's formula, and it follows that 0 ≤ χ_eff < (2/3)χ. At large values of s, the potential has two competing minima only for values of ρ/χ above 0.8. However, for quasi-planar graphs we know that these values are not attainable. This provides an explanation for the absence of first-order bulk-induced transitions [44] in the effective two-probe conductance of silver nanowire experiments [16,25,42].
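The scaling argument above can be made concrete: for a connected graph the number of fundamental cycles is L = E − V + 1, so the cycle density ρ = L/E ≈ 1 − 2/d̄ with d̄ = 2E/V. A sketch contrasting a complete graph (d̄ grows with V, so ρ → 1) with a planar square grid (d̄ → 4, so ρ → 1/2, safely below the planar bound 2/3); the helper `cycle_density` is ours:

```python
def cycle_density(V, E_edges):
    """rho = L / E with L = E - V + 1 fundamental cycles (connected graph)."""
    return (E_edges - V + 1) / E_edges

# complete graph K_V: E = V(V-1)/2, average degree d = V - 1 grows with V
rho_complete = [cycle_density(V, V * (V - 1) // 2) for V in (10, 100, 1000)]

# n x n square grid (planar, d -> 4): n^2 nodes, 2n(n-1) edges
rho_grid = [cycle_density(n * n, 2 * n * (n - 1)) for n in (10, 100, 1000)]
```

The complete-graph density tends to 1 while the grid density saturates near 1/2, below the planar ceiling 1 − 2/6 = 2/3; this is the geometric obstruction invoked for quasi-planar nanowire networks.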

IV. CONCLUSIONS
In conclusion, this manuscript utilized techniques from the orthogonal Weingarten calculus to derive the mean field theory of memristive systems, with the specific application in mind of nanowires of low dimensionality. The results obtained shed light on the behavior of quasi-two-dimensional systems, revealing that they exhibit only a crossover, and not a first-order switching transition in conductance, as a bulk phenomenon. This finding provides important insights into the fundamental physics governing memristive behavior in nanowires, and suggests that first-order transitions are possibly a boundary phenomenon, i.e. driven by avalanches in switching near the boundary for each memristive device. Of course, although our results apply to a simple toy model, the techniques developed here can be used in more realistic cases.
Furthermore, the developed technique has broader applicability in the theory of neuromorphic devices, as it draws connections with the physics of brain-like materials. This implies that the derived mean field theory can be extended to understand, and potentially design, other neuromorphic devices beyond nanowires, opening up new possibilities for the development of advanced electronic devices with brain-inspired functionalities.
The findings presented in this paper contribute to the understanding of memristive systems and their behavior in quasi-two-dimensional systems, while also highlighting the broader applicability of the developed mean field theory to the field of neuromorphic devices. These results have the potential to impact the design and development of future electronic devices with applications in areas such as artificial intelligence, cognitive computing, and brain-computer interfaces. Further research and experimental validation of the proposed theory in various material systems are warranted to fully comprehend the potential of this approach in advancing the field of memristive devices and their applications. Additionally, the connection between memory and ergodicity in brain-inspired devices is worth noting in the context of the derived mean field theory of memristive systems using cycle classes. The understanding of how memory is encoded and processed in neuromorphic devices is crucial for the development of advanced brain-inspired computing systems [33,42]. The insights obtained from the developed mean field theory provide valuable information on the role of boundary-induced transitions, such as critical avalanches, and ergodicity.
The findings suggest that the mean field theory can offer a deeper understanding of the interplay between memory, ergodicity, and conductance switching behavior in quasi-two-dimensional memristive systems such as nanowire connectomes. This knowledge can be leveraged to design more efficient and reliable neuromorphic devices that mimic the memory-processing mechanisms of brain-like devices.

FIG. 2. Example of an isospectral transformation for the projector operator Ω. The number of fundamental cycles is preserved, and thus |Span(A)| and the number of edges are preserved.

FIG. 3. Examples of partitioning into equivalence classes for V = 5 for graphs representing circuits. As we can see, the elements in each class are related by an edge-preserving transformation. We characterize each equivalence class as C(E, L), or alternatively C(E, V).

Such an operator arises from the linearization of the expectation values of polynomials of O^t X O. We see immediately that the operator M_k(Ω) in eqn. (38) is in the form of an isospectral twirling. The Haar average will be computed by the Weingarten function method for the orthogonal group. The Haar measure has the properties ∫dO = 1 and dO = d(OO′) = d(O′O) for any orthogonal matrix O′ ∈ O(E), which are referred to as the left (right) invariance of the Haar measure. The general formula to compute the Haar average is given in [62], and for the orthogonal group reads:

FIG. 7. Graphical representation of the orthogonal twirling at the first and second order. At first order, the only contribution is proportional to the identity operator.

FIG. 9. The Gram matrix Gm from which the Weingarten coefficients are derived via a pseudo-inverse. The element Gm(σ, τ) is E raised to the number of loops produced by superposing σ and τ [63].

Σ_{s=1}^{c(τ)} j_s(τ) = k, (78) where c(τ) is the number of connected components of τ. Since we have Ω^k = Ω, the expression above is maximized when c(τ) = k, which is true when τ is the identity of the Brauer algebra. As a result, we have proven the following. Proposition 3. If Tr(Ω) = L > 1, then max_τ Tr[B_τ Ω^⊗k] = Tr(Ω)^k, obtained for τ = e, the identity in the Brauer algebra.

FIG. 11. Tensorial representation of the partial trace occurring after the orthogonal twirl average at order χ². As we can see, the first and third terms give G², while the middle term contributes Tr(G) G.
Σ_j m_j = k, where s is the number of closed loops in the Brauer diagram resulting from the partial trace, and m_j is the number of G's inserted in loop j. Using eqn. (81), we have

FIG. 12. Effective mean field potential as a function of x and ρ/χ for the values (a) s = 0.1, (b) s = 0.23, and (c) s = 0.3. As we can see, for values of ρ/χ above ≈ 0.8, the potential develops two competing minima as we increase s, the effective voltage. For lower values, instead, the single minimum moves smoothly as a function of s.
s_i → s̄ and χ → χ L/E, and thus we can simply analyze the behavior of the effective potential as a function of voltage, as we did in previous papers.
field potential as a function of ρ/χ and three values of s is shown in Fig. 12. As we can see, for low values of s the minimum is located at x = 0 for all values of χ. However, for larger values of s the potential has a minimum at intermediate values of x.
Let O be an E × E orthogonal matrix, i.e., such that O^t = O⁻¹. The set of orthogonal matrices forms a group, which we call O(E). For any orthogonal matrix O, there exists an inverse O⁻¹ = O^t such that OO^t = O^t O = I. In fact, if O_1, O_2 are orthogonal, it is easy to see that O_1 O_2 is also orthogonal, and the identity matrix is associated to the identity in the group.