Ergodicity, lack thereof, and the performance of reservoir computing with memristive networks

Networks composed of nanoscale memristive components, such as nanowire and nanoparticle networks, have recently received considerable attention because of their potential use as neuromorphic devices. In this study, we explore the connection between ergodicity and performance in memristive and nanowire networks, showing that reservoir devices improve when these networks are tuned to operate at the edge between two global stability points. The lack of ergodicity is associated with the emergence of memory in the system. We measure the level of ergodicity using the Thirumalai-Mountain metric, and we show that in the absence of ergodicity, two memristive systems show improved performance when utilized as reservoir computers (RC). In particular, we highlight that it is also important to let the system synchronize to the input signal in order for the performance of the RC to exhibit improvements over the baseline.


INTRODUCTION
Memristive networks are electronic circuits that use memristive devices, a type of resistive switching memory element. These networks store and recall information by modifying the devices' resistance. They find applications in computer memory, neuromorphic computing, and signal processing [1]. Memristive networks offer benefits such as high density, low power consumption, and potential for high-speed operation. Simultaneously, there is growing interest in alternative approaches to computation and optimization in response to the rapidly increasing demands on computing [2]. Various proposals have emerged to address this challenge, some of which involve the use of oscillators or frequency-domain encoding [3][4][5][6][7], leveraging near- or in-memory computation [8][9][10][11][12][13], and exploring memcomputing [12,13]. These innovative techniques aim to provide more efficient solutions for complex optimization problems [3-5, 12, 14-18]. However, understanding large assemblies of memristive devices within non-equilibrium statistical mechanics remains a challenge. Recent research has focused on exploring the geometric and statistical properties of nanowire networks, where electrical junctions exhibit memristive behavior due to the interplay between tunneling and filament formation phenomena [19,20].
Conductive nano-filament formation at nanowire-nanowire junctions creates a memristive device, while quantum tunneling contributes additional nonlinearity to the switching behavior, making the dynamics more complex [21]. As most of the voltage drop occurs at the junctions, a basic model for the conductance evolution of these networks involves an assembly of memristive devices with voltage or current generators. The dynamic behavior of these systems is currently being studied, especially regarding bias-induced conductance transitions [21][22][23][24].
Various conductance transitions have been observed in memristive devices, often characterized by transient unstable dynamics of the memristive components and their internal memory parameter. One well-known transition, which has been termed "rumbling", has been analytically identified in the simplest model of memristive networks [23]. A similar transition has also been observed in nanowire networks [21]. This transition involves the system effectively moving between different minima of an effective potential and is marked by bursts of transient positive Lyapunov exponents. It arises from the coexistence of multiple low-dimensional equilibrium points in the dynamics. A mean-field theory for such systems can be obtained by mapping them to a PEDS (projective embedding of dynamical systems) [25]. This is relevant to our study as we aim to identify non-ergodic behavior associated with these transitions.
In certain memristive systems, as the applied voltage increases, an effective mean potential develops with multiple minima. As one minimum becomes dominant, the system undergoes a rapid chaotic transition towards this emerging stable fixed point. This transition is distinct from the conventional Landau picture of symmetry breaking with bifurcation, as it arises from the competition between two minima.
While the understanding of these transitions is clearer in simplified memristive device models, it becomes less evident in more realistic systems. In this paper, we aim to describe the dynamic behavior of these transitions in two systems using ergodicity measures. Ergodicity, a concept in statistical mechanics and thermodynamics, pertains to the long-term behavior of a system. It states that the time average of a system over an extended period is equivalent to the ensemble average of the system. In essence, it implies that the system reaches a steady state, and its long-term average behavior aligns with the average behavior across many realizations with the same initial conditions.
Ergodicity plays a crucial role in various natural sciences, such as physics and chemistry, enabling an understanding of system evolution, equilibrium attainment, and the prediction of long-term behavior based on short-term observations. In the context of memory-dependent systems, the lack of ergodic behavior signifies (hard) ergodicity breaking, which occurs when symmetry breaking transpires in thermodynamic systems [26]. Typically, physical systems operate within a regime where ergodicity holds for a subset of the possible phase-space states. Quantifying ergodicity and gaining insight into the dynamical state of a physical system is vital for understanding the system's operational regime. One of the goals of this paper is to test the hypothesis that computational performance improves near a transition point.
To test this hypothesis, we will use reservoir computing (RC) as our computational model [27], which has recently been shown to be universal [28]. It has been reported, for instance, that the "edge of chaos" may be important for the performance of RC [29], but it is important to stress that critical states have been reported both for biological neuronal networks in the brain [30] and for other artificial neural networks [31].
This article is organized as follows: in section II we introduce the general definition of ergodicity and how it is related to memory; in section III we introduce the two memristive systems we are considering in this study; in section IV we give the definition of the Thirumalai-Mountain (TM) metric for the two memristive systems, and test the RC task results for both in terms of ergodicity breaking. Conclusions follow.

II. MEMORY, ERGODIC CONVERGENCE AND THE THIRUMALAI-MOUNTAIN METRIC
Ergodicity and the emergence of memory are closely intertwined concepts within the realms of statistical physics and complexity science. Ergodicity refers to a system's property of uniformly exploring all possible states in its phase space over an extended period. In essence, it characterizes a system's behavior as it traverses its accessible states in a time-averaged manner. Memory, on the other hand, denotes the influence of past states on the current state of a system.
In many systems, the emergence of memory is closely connected to the violation of ergodicity. When a system is ergodic, its behavior remains independent of its past history, exhibiting no memory effect. However, when ergodicity is violated, the system may display persistent or long-term correlations, resulting in the emergence of memory.
A notable example of the relationship between ergodicity and memory can be observed in spin glasses, disordered magnetic systems that exemplify complex dynamics with persistent or long-term correlations that foster the emergence of memory. The behavior of spin glasses is often described through two-time correlation functions, which measure correlations between spin configurations at different points in time.
Studying two-time correlation functions in spin glasses sheds light on system dynamics, including the emergence of memory and ageing phenomena. In particular, the auto-correlation function reveals details about the system's relaxation dynamics. Slow decays observed in these functions signify the violation of ergodicity and the emergence of memory, while the dependence on both the measurement time t and the waiting time t_w highlights the ageing phenomenon in spin glasses [32] and other frustrated systems [33,34]. In this sense, ergodicity refers to a system's property whereby the long-term behavior can be deduced from a single, extended observation of the system, rather than relying on multiple independent realizations.
According to Boltzmann's hypothesis, the trajectories of any dynamical system in its phase space eventually evolve into regions where macroscopic properties reach thermodynamic equilibrium [35]. The ergodic hypothesis states that ensemble averages and time averages coincide as time progresses. This means that the ensemble-averaged value of an observable, denoted as ⟨g⟩, can be obtained by averaging its values over time during the observable's evolution. Mathematically, for the observable g(t) we have:

$$\langle g \rangle = \lim_{T \to \infty} \frac{1}{T} \int_0^T g(t)\, dt.$$

It is important to note that various definitions of ergodicity exist in the physics and mathematics literature [36][37][38], with significant distinctions. For example, in Markov chains, ergodic behavior requires a strongly connected transition graph, which is rarely the case. In practice, a system can exhibit ergodicity within a specific subset of its phase space. The concept of "effective ergodicity" was introduced to describe scenarios where the system rapidly samples coarse-grained regions [37].
A measure of effective ergodic convergence is based on the observation that certain components of a system exhibit identical average characteristics at thermal equilibrium. These characteristics are determined by an observable defined on the system's phase space, denoted as Γ. To assess the effective ergodic behavior of the observable, it is necessary to estimate its average value using an ensemble approach, such as a thermal ensemble. This estimation is typically performed using the Thirumalai-Mountain (TM) g-fluctuating metric, denoted as Ω_g(t).
The TM g-fluctuating metric, introduced by Thirumalai and Mountain [37,39,40], quantifies the difference between the time-averaged value of the observable g(Γ) per component and its instantaneous ensemble average. At a given time t, the TM g-fluctuating metric is expressed as:

$$\Omega_g(t) = \frac{1}{N} \sum_{j=1}^{N} \left[ \bar g_j(t) - \langle g(t) \rangle \right]^2.$$

Here, $\bar g_j(t) = \frac{1}{t}\int_0^t g_j(t')\,dt'$ represents the time-averaged value per component, and ⟨g(t)⟩ denotes the instantaneous ensemble average, defined as:

$$\langle g(t) \rangle = \frac{1}{N} \sum_{j=1}^{N} g_j(t).$$

This definition assumes that g(Γ) serves as a suitable physical order parameter, effectively characterizing the system's behavior. The definition of the instantaneous ensemble average will depend on the considered system, as will be shown in Sec. IV.
The rate of ergodic convergence can be quantified using the derivative of the effective ergodic convergence, denoted as Ω′_g. In the diffusive regime, this quantity is given by

$$\frac{\Omega_g(t)}{\Omega_g(0)} \simeq \frac{1}{D_G t}. \qquad (4)$$

When the system is instead in a sub-diffusive or super-diffusive regime, the power of the relaxation in time changes from 1 to a different exponent. Coarse-graining of the phase space leads to the clustering of the system's accessible states, making the concept of effective ergodicity more applicable. Effective ergodicity is achieved when the system uniformly explores the coarse-grained regions within a finite time [39]. Here, D_G represents the diffusion coefficient associated with the property being studied, and Ω_g refers to the effective ergodic convergence. This definition aligns with other notions of ergodicity in cases where diffusion follows a power law [41]. The rate of ergodic convergence, determined using the TM metric, provides an estimate of the system's ergodicity. The behavior of the rate, as described by Eq. (4), indicates the system's attainment of effective ergodicity. For example, if the inverse of the metric scales linearly with time, the system reaches ergodicity in a diffusive manner: 1/Ω_g(t) → D_G t, where D_G, fixed by the initial value Ω_G(0), represents the diffusion coefficient of the property G. It is thus generally expected that 1/Ω_g(t) ∝ t. When the inverse of Ω_g exhibits a linear relationship with time, it indicates that all points in the phase space are equally likely, resembling the behavior of Brownian motion. This approach has been applied in various contexts, such as simple liquids [37], earthquake fault networks [42,43], and the Ising model [44].
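As a concrete illustration of the metric, the following minimal sketch applies the TM construction to an ensemble of Ornstein-Uhlenbeck trajectories, an ergodic, diffusive test case; the process and all parameter values are our illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def tm_metric(traj):
    """Thirumalai-Mountain fluctuation metric Omega_g(t).

    traj: array of shape (N, T), N components observed over T steps.
    Returns (1/N) * sum_j (gbar_j(t) - <g(t)>)^2, where gbar_j(t) is the
    running time average of component j and <g(t)> the instantaneous
    ensemble average.
    """
    N, T = traj.shape
    t = np.arange(1, T + 1)
    running_avg = np.cumsum(traj, axis=1) / t      # gbar_j(t)
    ens_avg = traj.mean(axis=0)                    # <g(t)>
    return ((running_avg - ens_avg) ** 2).mean(axis=0)

# Ergodic test case: an Ornstein-Uhlenbeck ensemble relaxing to equilibrium.
N, T, dt = 200, 5000, 0.01
x = rng.normal(0.0, 1.0, size=N)
traj = np.empty((N, T))
for k in range(T):
    x += -x * dt + np.sqrt(2 * dt) * rng.normal(size=N)
    traj[:, k] = x

omega = tm_metric(traj)
# Diffusive ergodic convergence: the metric decays roughly as 1/t.
print(bool(omega[-1] < omega[0]))
```

For a non-ergodic system the metric would instead plateau or become non-monotonic, which is the diagnostic used throughout the rest of the paper.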

III. MODELS OF MEMRISTIVE DEVICES
We would like to introduce the notion of memristive device that we will use in the following. We will consider two models of memristive device. The first is a resistive, current-controlled memristive device, which can be described by equations of the form

$$V = R(x)\, I, \qquad \frac{dx}{dt} = f(x, I),$$

where x is the internal memory (state) variable. The second model we consider is a conductance-based, voltage-controlled device, of the form

$$I = G(x)\, V, \qquad \frac{dx}{dt} = f(x, V).$$

Both models satisfy the pinched-hysteresis property of the I-V curve, characteristic of memristive devices [45].
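The pinched-hysteresis property can be checked numerically. The sketch below integrates a current-controlled device with a TiO₂-like linear resistance function (introduced in the next subsection) under a sinusoidal drive and verifies that the current vanishes whenever the voltage does, so the I-V loop is pinched at the origin; all parameter values are illustrative:

```python
import numpy as np

# Current-controlled memristive device: V = R(x) I with
# R(x) = R_on (1 - x) + R_off x. Parameters are illustrative only.
R_on, R_off, alpha, beta = 1.0, 10.0, 0.1, 1.0

def simulate(V_drive, dt=1e-3):
    """Integrate dx/dt = R_on I / beta - alpha x, with x clipped to [0, 1]."""
    x, I_out = 0.5, []
    for V in V_drive:
        R = R_on * (1 - x) + R_off * x
        I = V / R
        x = np.clip(x + dt * (R_on * I / beta - alpha * x), 0.0, 1.0)
        I_out.append(I)
    return np.array(I_out)

t = np.arange(0, 20, 1e-3)
V = np.sin(np.pi * t)                 # periodic drive
I = simulate(V)
# Pinched hysteresis: whenever V = 0, I = 0 as well, even though the
# resistance R depends on the drive history through x.
print(bool(np.all(np.abs(I[np.abs(V) < 1e-6]) < 1e-4)))
```

The memory enters only through R(x), so the instantaneous relation I = V/R forces the loop through the origin, which is exactly the fingerprint property cited above.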

A. Memristive network toy model
In [46], the dynamical equation for a circuit of memristors was derived under the assumption of a resistive, current-controlled device. The resistance function used, R(x) = R_on(1 − x) + x R_off, approximates TiO₂ memristors, where R_on and R_off represent the limiting resistances, and x ∈ [0, 1] is the internal memory parameter, the state variable describing the size of the oxygen-deficient conducting layer. At the lowest order, the evolution of the internal memory parameter can be described by a simple equation with hard boundaries:

$$\frac{dx}{dt} = \frac{R_{\text{on}}}{\beta}\, I - \alpha x, \qquad x \in [0, 1].$$

Here, α and β are the decay constant and the effective activation voltage per unit of time, respectively, which determine the timescales of the system.
While the model presented above provides a simple description of a polar resistive device, various extensions have been explored in the literature.For instance, to account for diffusive effects near the boundaries, some models remove the hard boundaries and introduce a window function [47,48].Although these models better capture the detailed IV curves of physical devices, they still exhibit the fundamental pinched hysteresis behavior observed in the linear model.
In adimensional units (τ = αt), the equation for x(τ) in a single memristor device under an applied voltage S can be derived using Ohm's law (S = RI) [23,49]. The resulting equation is:

$$\frac{dx}{d\tau} = -x + \frac{s}{1 - \chi x}.$$

[Fig. 1 caption fragment: "...to a metastable state which acts as the boundary between the attracting stable points for a single memristive device, while the shading represents the tunneling region for the network of devices."]
Here, χ = (R_off − R_on)/R_off and s = S/(αβ), with 0 ≤ χ ≤ 1 in relevant physical cases. The dynamics of the system, represented by a single memristor device, are fully characterized by gradient descent in the effective potential given by [23]:

$$V(x) = \frac{x^2}{2} + \frac{s}{\chi}\log(1 - \chi x).$$

The potential exhibits two minima separated by a barrier, as depicted in Fig. 1 (top), for s = 0.15 and s = 0.24 with χ = 0.9. The range of s for the existence of a barrier is limited, and when χ approaches 1, the local minimum can move inside the domain [0, 1], leading to the emergence of an unstable fixed point (i.e., the peak of the barrier). Consequently, two basins of attraction and locally stable minima are formed. A pictorial phase diagram illustrating this behavior is shown in Fig. 1 (bottom), highlighting the critical voltage points at which it occurs. Below a critical voltage point V_c, only one fixed point is present. At V_c, a new fixed point emerges, but it is metastable in the presence of noise (dashed red line). At an intermediate point V_c < V_m < V_c*, the two metastable points have equal energy, representing the switching point where the original stable fixed point becomes metastable. At higher values, V = V_c*, the first fixed point disappears by merging with the barrier, which flattens. Ultimately, only one fixed point remains at higher values.
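The bistable structure of this potential can be verified numerically. The sketch below counts the minima of V(x) on [0, 1], boundaries included since the memory variable has hard walls, for χ = 0.9 and several values of the drive s (the specific s values outside the bistable window are our choices):

```python
import numpy as np

chi = 0.9

def potential(x, s):
    # Effective single-memristor potential V(x) = x^2/2 + (s/chi) log(1 - chi x)
    return 0.5 * x**2 + (s / chi) * np.log(1.0 - chi * x)

def count_minima(s, n=20001):
    """Count local minima of the potential on [0, 1], including the
    boundaries (the dynamics has hard walls at x = 0 and x = 1)."""
    x = np.linspace(0.0, 1.0, n)
    v = potential(x, s)
    interior = (v[1:-1] < v[:-2]) & (v[1:-1] < v[2:])
    count = int(interior.sum())
    count += int(v[0] < v[1]) + int(v[-1] < v[-2])   # boundary minima
    return count

# Below the bistable window: one minimum; inside it (e.g. s = 0.24, as in
# Fig. 1): two minima separated by a barrier; above it: one minimum again.
print(count_minima(0.05), count_minima(0.24), count_minima(0.30))
```

The interior fixed points solve x(1 − χx) = s, so the barrier exists only while this quadratic has real roots inside the domain, which reproduces the limited range of s discussed above.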
For a network of memristors of this type, the mean-field potential exactly resembles that of the single memristor defined above. For details, see Supp. Mat. A. The key difference is that the mean-field potential is only an approximation when the system is high dimensional: e.g., there is now an effective metastable region, as shown in Fig. 1, instead of two stable minima, when one of the minima is lower than the other. Such effective "tunneling" can be explained using the theory developed in [25], i.e., the fact that local mean-field maxima become saddle points in high dimensions. Here we will use this fact to study effective ergodicity breaking in memristive networks.
In fact, these two regimes can be characterized via the Thirumalai-Mountain (TM) metric introduced before. In Fig. 2 we show the TM metric as a function of time for s = 0.18 (top), s = 0.24 (center), and s = 0.25 (bottom), numerically calculated for a circuit of N = 50 memristors with parameters α = β = 1 and χ = 0.9. As we can see, in the first case the TM metric relaxes as a power law with an approximate intermediate exponent p ∈ [−2, −1.5], typical of a diffusive regime. For s = 0.24, we see instead behavior typical of weak ergodicity breaking, e.g. a transient non-monotonicity of the TM metric, associated with the trajectories effectively "tunneling" through the mean-field barrier. This behavior is indeed transient, as for s = 0.25 the TM metric again relaxes as a power law.
Using the toy model, we can thus pinpoint such non-monotonicity of the TM metric to a transient transition between two stable asymptotic states and the effective symmetry breaking of the potential. We can therefore immediately identify the symmetry breaking of the potential as the culprit of this transient non-ergodic behavior.

B. Nanowire networks
We now consider a more realistic model of memristive networks, which can be associated with self-assemblies of silver nanowires. Via established bottom-up self-assembly techniques, one can readily synthesize nanowire networks (NWNs) [22,50]. These NWNs typically have a 2D spatial distribution of randomly oriented nanowires that are interconnected by cross-point MIM junctions. The NWNs we consider have densities of 10 junctions/µm² and 0.5 nanowires/µm². Device electrodes can be deposited onto the substrate using a mask, from which the conductance measurements can be readily performed.

[Fig. 2 caption fragment: "...with the sample average calculated using the mean-field value introduced in (13). We observe that, in the regime in which the potential has a single minimum (V < V_c), the TM metric decays rapidly. Instead, for higher voltages in which two minima are present (V_c < V < V_c*), the TM metric does not decay. This is a symptom of non-ergodic behavior."]

This bio-inspired structure is difficult to design and fabricate using top-down techniques. Self-assembled NWNs exhibit topological properties, such as small-world propensity and modularity, that are similar to biological neural networks and distinct from random and grid-like networks [51]. Unlike the fully connected bipartite networks of artificial neural networks, small-world networks have local connectivity and short path lengths, making them relatively sparse. Although small-worldness is necessary for important functional properties, such as synchronizability and information flow, it alone cannot explain the diverse range of dynamics across networks that exhibit this structural property.
A model for the simulation of realistic NWNs has been introduced previously in the literature [21,24,52]. Fig. 3 (a) shows a visualization of a simulated nanowire network containing 1000 nanowires and 6877 junctions. Self-assembly is modeled by distributing nanowires on a 3 × 3 µm² 2D plane, with their centers uniformly sampled from [0, 3] and orientations uniformly sampled from [0, π]. The lengths of the nanowires are sampled from a gamma distribution (mean = 100 nm, standard deviation 10 nm), based on experimental measurements [22]. In theoretical studies, as illustrated in Fig. 3 (b), the NWN is transformed into the corresponding graphical representation, where nodes represent nanowires and edges are the cross-point junctions.
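The self-assembly step can be sketched as follows: sample segments with uniform centers and orientations and gamma-distributed lengths, then detect cross-point junctions as segment intersections. The geometry routine and all parameters are our illustrative choices, and with the quoted length scale the junction count will not match the 6877 junctions of the network used in the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

def make_network(n_wires=1000, size=3.0, mean_len=0.1, std_len=0.01):
    """Sample nanowires as line segments on a size x size (um^2) plane."""
    centers = rng.uniform(0.0, size, (n_wires, 2))
    theta = rng.uniform(0.0, np.pi, n_wires)
    shape = (mean_len / std_len) ** 2        # gamma shape from mean and std
    scale = std_len ** 2 / mean_len          # gamma scale
    half = 0.5 * rng.gamma(shape, scale, n_wires)
    d = half[:, None] * np.stack([np.cos(theta), np.sin(theta)], axis=1)
    return centers - d, centers + d          # segment endpoints per wire

def cross2(o, a, b):
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def segments_cross(p1, q1, p2, q2):
    """Proper segment intersection via orientation signs."""
    d1, d2 = cross2(p1, q1, p2), cross2(p1, q1, q2)
    d3, d4 = cross2(p2, q2, p1), cross2(p2, q2, q1)
    return (d1 * d2 < 0) and (d3 * d4 < 0)

p, q = make_network()
c, r = 0.5 * (p + q), 0.5 * np.linalg.norm(q - p, axis=1)
# Prefilter candidate pairs by center distance before the exact test.
D = np.linalg.norm(c[:, None] - c[None], axis=-1)
ii, jj = np.where(np.triu(D < r[:, None] + r[None], k=1))
junctions = [(i, j) for i, j in zip(ii, jj)
             if segments_cross(p[i], q[i], p[j], q[j])]
print(len(junctions) > 0)
```

The junction list is then exactly the edge list of the graphical representation of Fig. 3 (b), with one node per wire.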
In this work, all simulation results for nanowire networks are generated using a network comprised of 1000 nanowires and 6877 junctions. All variables, except the adjacency matrix A, are time-dependent. A model for the conductance of a single junction, associated with the filament length, is provided in the supplementary material, see Sec. A. Each junction evolves as a voltage bias is continuously applied to the network. The modified nodal analysis approach is applied to the graphical representation to solve Kirchhoff's voltage and current conservation laws at each time step [53]. This is equivalent to the method used for the derivation of the exact network equation for the toy model, Eq. (A1). Although the NWN model is based on polymer-coated Ag nanowires, with memristive junction internal dynamics that differ from those of the toy model (based on metal-oxide memristors), the network dynamics are similar and one should think of the two models as equivalent from a physical perspective.
For the purpose of using a NWN as a reservoir, a Mackey-Glass time-series signal with delay parameter τ = 17 is delivered to a source electrode as the input voltage signal. Before implementing the time-series prediction task using RC, a DC input of varying duration is applied to the NWN to initialize the internal state of the network and prepare it for RC. We refer to this pre-initialization protocol as "priming the system" [54]. Fig. 4(a) shows the reservoir's conductance (blue curve) as a function of the DC input length T_0. The shaded region represents the general conductance-transition regime, identified from previous studies [21,24], and the dashed line at T_0 = 2.17 s represents when the first conductance pathways form between the source and drain nodes. The internal state of the network for different T_0 is visualized in Fig. 4(b). In cases where the reservoir is under-activated (T_0 < 2 s), the majority of memristive components remain inactive, resulting in insufficient dynamics from the network. When the reservoir is over-activated (T_0 > 8 s), the internal dynamics of the system become saturated, limiting the system's capacity to process additional information. The conductance-transition regime from Fig. 4(a) corresponds to an intermediate dynamical state of the reservoir, where conductance paths first span the network and the internal state of the system produces dynamical features that are more diverse than at other activation times.

IV. ERGODICITY, RESERVOIR COMPUTING AND BISTABILITY
A. Toy model

Reservoir computing using the model of [55] was studied for the first time in [56], among other passive circuits. A brief recap of the procedures behind reservoir computing is provided in Supp. Mat. B.
Here, we have used a similar scheme for the reservoir, employing a memristive circuit comprising N = 50 idealized memristors.
We have considered parameter values α = β = 1 and χ = 0.9, for which the system experiences symmetry breaking and where we expect its dynamics to become non-ergodic, see Sec. III A. This is the regime where the reservoir is at the edge of stability, as defined in [29], where reservoirs can in general have optimal performance, although this is not always guaranteed. Weak ergodicity breaking in a dynamical system is associated with a strongly chaotic regime [41]. We will come back to this point later. We have also added a small noise value σ = 0.01. The equations (A2) were numerically integrated with a time step dt = 0.1.
For the input signal V(t) we have used a Mackey-Glass time series with parameters r = 0.2, γ = 0.1, τ = 17, and n = 10. Our input signal was given by

$$V_{\text{in}}(t) = a\, V(t) + b, \qquad (11)$$

with a ∈ [0, 10] a multiplicative factor of the input V(t) and b ∈ [−0.4, 1] a bias parameter. An example of the input signal is shown in Fig. 6 (top). The parameter b represents the bias input to the network, while a is the amplitude.
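For reproducibility, a Mackey-Glass drive with these parameters can be generated with a simple Euler scheme; the step size and the constant initial history below are our choices, not specified in the paper:

```python
import numpy as np

def mackey_glass(n_steps=20000, dt=0.1, r=0.2, gamma=0.1, tau=17.0, n=10,
                 v0=1.2):
    """Euler integration of dV/dt = r V(t-tau)/(1 + V(t-tau)^n) - gamma V(t)."""
    delay = int(round(tau / dt))
    v = np.empty(n_steps + delay)
    v[:delay + 1] = v0                  # constant history, an arbitrary choice
    for k in range(delay, n_steps + delay - 1):
        vd = v[k - delay]               # delayed value V(t - tau)
        v[k + 1] = v[k] + dt * (r * vd / (1.0 + vd**n) - gamma * v[k])
    return v[delay:]

V = mackey_glass()
a, b = 5.0, 0.2                         # illustrative amplitude and bias
u = a * V + b                           # input signal delivered to the network
print(bool(0.0 < V.min() and V.max() < 2.0))
```

The scaling step mirrors the linear transformation of the input above: b shifts the drive relative to the transition point, while a sets the excursion around it.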
For each value of a and b, we performed a Mackey-Glass reconstruction task using the internal memory states as the dynamical variables for the RC. The rMSE, which we use as a measure of the performance of the RC, is shown in Fig. 6. We can see from the figure that varying a leads to a reduced value of the rMSE right around where the rumbling transition is present. The difference in the quality of the task can be observed, for the Mackey-Glass time series, in Fig. 5, for the toy model described in App. A.
At a value of b = 0.2, a plot of the rMSE is shown in Fig. 7. As we can see, the value of the rMSE decreases as a function of a, meaning that the larger the magnitude of the input signal, the better the performance; this happens, however, only when b is carefully chosen near the transition point. The reason can be inferred by analyzing the response of the internal memory values, reported in the Appendix as a function of the parameter b below, near, and above the transition point, for a = 0.2. When b is such that the potential has a single minimum, the value of the memory oscillates in the vicinity of that minimum. When, however, b is such that the potential has two minima, e.g. in the symmetry-breaking regime, at sufficiently large amplitudes the memory values start to oscillate between the two minima. This implies that the time evolution of the system is much more dynamic in the symmetry-broken phase, which effectively results in an improvement in the performance of the RC. Increasing the value of the amplitude leads to the internal memory values fluctuating more prominently between the two stable states. This is also shown in Fig. 7 (b), where we plot the TM metric evaluated on the response of the system in the small (Amp = 1) and large (Amp = 10) amplitude regimes. In the first case the system is still ergodic, while for Amp = 10 the TM metric does not converge to zero.
Furthermore, we have calculated the TM metric for the memory variable x near the transition bias b = 0.2, for varying values of the amplitude a, as

$$\Omega_x(t) = \frac{1}{N} \sum_{j=1}^{N} \left[ \bar x_j(t) - \langle x(t) \rangle \right]^2. \qquad (12)$$

As the ensemble average ⟨x(t)⟩ we have considered the mean-field value x_cg, defined as [23]

$$x_{\text{cg}} = \frac{1}{N} \sum_{i=1}^{N} x_i. \qquad (13)$$

We have chosen the mean-field value x_cg as a natural order parameter that resembles, in definition, the general meaning of an ensemble average. This is in fact one of the advantages of studying this model first: in this case, we know the details of the order parameter for the whole memristive network.

B. Nanowire networks
The nanowire connectome dynamics has been extensively explored across various studies. Prior research has highlighted that nanowire networks (NWNs) showcase brain-like dynamics, demonstrating optimal information storage and processing capabilities at conductance transition points [21,23,24]. More recently, a dynamical mean-field theoretical technique for polymer-coated Ag nanowires has uncovered emergent dynamical features [57] such as transitions. In the context of the two-terminal setup, these transitions are commonly observed within a regime termed the 'edge of formation' [21]. This 'edge of formation' is delineated by the activation of memristive components, but precedes an exponential surge in the formation of parallel paths. As depicted in Fig. 4, this regime establishes a select few high-conductance current paths between the two electrodes (a phenomenon known as 'winner-takes-all') [58]. Within this state, the internal dynamics of the NWN intricately map the input signal to a diverse feature space.
In the case of the toy model, previous analytical studies provide a comprehensive understanding of both the system dynamics and the order parameters to be employed [23,49], which allows us to apply a simplification for the TM metric. The same technique cannot be utilized for the nanowire model, however, since it is more realistic and cannot be characterized in the same way. For that reason, the collective conductance of the nanowire network is used as the order parameter, determined from the conductances of the individual memristive junctions and the underlying circuitry shaped by the connectivity. Meanwhile, the findings derived from the toy model will play an important role in interpreting the nanowire results.

Thirumalai-Mountain metric
We now wish to link computational performance and ergodicity in the case of NWNs as well, and we again use the TM metric to probe the effective ergodicity of these dynamical systems; in particular, Eq. (12) is also used to calculate the TM metric for NWNs. However, the meaning associated with each quantity is closer to the definition of the metric used in the original work on supercooled liquids [37]. The overall observable of interest is the two-probe conductance, defined earlier for nanowires, which is a scalar. The time average is calculated using the global conductance, while the collective conductance is employed as an ensemble average across various realizations of the initial conditions:

$$\langle G(t) \rangle = \frac{1}{M} \sum_{i=1}^{M} G_i(t),$$

where G_i(t) is the two-probe conductance of a particular realization at time t, M is the number of realizations, and ⟨G(t)⟩ is the collective, effective conductance of the network between the two points.
For the time-average quantity, note that we select a single element of the ensemble to evaluate the time average. The ensemble of different realizations is generated by randomly perturbing the filament levels of the junctions in the network:

$$\Lambda_i = \Lambda_0 (1 + \delta),$$

where Λ_i is the parameter controlling the length of the filaments for all junctions, Λ_0 is the initial condition of the filament, and the random variable δ is sampled from a uniform distribution over (−0.1, 0.1). Thus, effectively, we are sampling over the initial conditions of the system. The TM metric as a function of the bias is shown in Fig. 8 (b). As we can see, and consistently with the case of the toy model, for small and large values of the bias the metric decays. At intermediate values of the bias, at the edge of the transition between the low- and high-conductance states, where the conductance synchronizes with the input, the TM metric fails to decay. This is exactly the regime in which the conductance transition occurs, and it is the signature of a special state of the nanowire network, highly synchronized with the input.

Reservoir Computing
For the realistic nanowire network model above, the implementation of RC involved the two-probe conductance, with designated input and readout nodes. Such a construction has already been considered in the literature [54,59]. The fitting task for the Mackey-Glass time series was performed using NWNs under the RC framework, with two nanowire nodes in the network selected as the source and the drain. The MG signal was linearly transformed as described in Eq. (11) and delivered to the network as input, while the same MG signal 5 steps ahead was employed as the target. The task can be broken down into three phases:

1. Priming: A 2 V DC input of varying length T_0 was applied to drive the internal state of the network. The first 1000 data points of the MG time series were subsequently delivered to wash out the influence of the initial state.

2. Training: The effective conductance of the network corresponding to t = 1000-4000 ms was measured and multiplexed using the virtual-node technique [60] to provide training features (see details in the Appendix). The readout layer was trained using linear regression with a ridge parameter r = 0.01.

3. Testing: The effective conductance during t = 4000-5000 ms was measured and multiplexed in the same fashion as in the training phase, and the trained readout layer was applied to make predictions. The performance of the reservoir can thus be evaluated accordingly.
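The training and testing phases above can be sketched end-to-end. Since we do not simulate the nanowire network here, a noisy sine stands in for the measured conductance; the virtual-node multiplexing and the ridge readout are the parts being illustrated, and all names, signals, and parameter values are our own illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(2)

def virtual_nodes(g, n_virtual=10):
    """Time-multiplex a scalar readout: each block of n_virtual consecutive
    samples becomes one feature vector (a sketch of the virtual-node idea)."""
    n_rows = len(g) // n_virtual
    return g[:n_rows * n_virtual].reshape(n_rows, n_virtual)

def train_ridge(X, y, ridge=0.01):
    """Linear readout: solve (X^T X + ridge * I) w = X^T y."""
    A = X.T @ X + ridge * np.eye(X.shape[1])
    return np.linalg.solve(A, X.T @ y)

# Stand-in 'conductance': a noisy sine. Target: the same signal shifted
# ahead in time, mimicking the steps-ahead prediction target of the paper.
t = np.arange(0, 500, 0.1)
g = np.sin(t) + 0.05 * rng.normal(size=len(t))
X = virtual_nodes(g)                     # feature matrix, one row per step
y = np.sin(t[::10] + 0.5)                # target, half a radian ahead
split = int(0.8 * len(X))                # train on 80%, test on the rest
w = train_ridge(X[:split], y[:split])
rmse = np.sqrt(np.mean((X[split:] @ w - y[split:]) ** 2))
print(bool(rmse < 0.1))
```

Only the linear readout w is trained; the "reservoir" (here the stand-in signal, in the paper the driven NWN conductance) is left untouched, which is the defining feature of the RC framework.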
We consider two regimes: one in which the system is initialized away from the voltage transition point between the high- and low-conductance states, and one in which the system sits at the boundary between the two. As we can see from Fig. 9 (top), the RMSE of the RC model behaves very similarly to that observed for the toy model in Fig. 6. At values of the amplitude close to the transition point, where the Thirumalai-Mountain metric fails to converge to zero, the physical RC performance peaks. Our intuition is that a system tuned to the edge of a transition is more prone to synchronization [29] with the input signal. This, combined with existing knowledge [54] of the average "priming" time it takes for the system to reach synchronization with the signal, shows that the combination of input time and choice of input voltage leads to optimal performance. We can see the difference between the tuned and non-tuned network in Fig. 10.
FIG. 9: RMSE of the RC prediction task with the simulated nanowire system as a function of driving. Top: RMSE as a function of the bias and the amplitude of the input signal. As we can see, the minimum is located exactly at the numerically observed transition point in bias.
Bottom: log RMSE as a function of the priming time T0 and the amplitude of the input. We see that the optimal results are obtained after a certain initial time T0 ≈ 2 s.

V. CONCLUSIONS
Memory effects are a critical aspect of many physical and biological systems, and they have been shown to play a vital role in the behavior of complex systems. Ergodicity, meanwhile, is a property describing the degree to which a system explores its phase space. In recent years, physical systems with memory, such as nanowire or nanoparticle connectomes, memristive devices, and other nanoscale devices, have become increasingly important candidates as substrates for synthetic intelligent devices, e.g. brain-like physical material. In particular, there are strong indications, both in theoretical models and in experiments, that these devices exhibit conductance transitions of both first and second order. These transitions have been linked to the edge of chaos, a concept that refers to the boundary between ordered and chaotic behavior in complex systems.
The present study explores the interplay between ergodicity breaking and memory in two models of memristive devices. The first model we studied is a toy model introduced in the literature to understand analytically the properties of purely memristive networks; it has been instrumental in understanding their nonequilibrium properties, such as Lyapunov functions [61] and conductance transitions [23]. The second is a more realistic model for memristive networks composed of nanowires [21, 24, 54, 62], used to understand the properties of polymer-coated Ag nanowires [22, 63, 64]. In both cases, we studied the Thirumalai-Mountain metric [37] to understand how the systems relax when driven by different inputs, and in particular the ergodic properties of these systems. Both in the case of the toy model, for which the conductance transition has been studied analytically using a variety of techniques [23, 25, 65-67], and in the case of the nanowire model [21], we found that the Thirumalai-Mountain metric signals a lack of ergodicity near the voltage bias value where the conductance transitions are expected. This is particularly interesting in view of the recent observation that the edge of instability can be linked to a computational advantage [29, 68]. A similar result is observed in this paper. We found, however, that this is not necessarily true unless the dynamical system under study is synchronized to the input signal, as suggested in [29]. In fact, the improved performance occurs only after a transient period ("priming") that allows the nanowire network to synchronize to the input signal, as shown in Fig. 9. The results described above are consistent across two different models: the toy model, in which conductance transitions can be understood quantitatively and analytically, and the more realistic nanowire model, able to capture the experimentally observed conductance in Ag nanowires. These results thus suggest that there might be an underlying common theory explaining these transitions.
In conclusion, by connecting memory effects, ergodicity, and the edge of chaos, we have identified a set of principles that can be used to create more effective computational models. Our research has shown for the first time that non-ergodic behavior can be linked to the effectiveness of reservoir computing, leading to new approaches for developing more advanced and efficient computational tools. We believe that our findings will inspire further investigations into the connections between memory effects, ergodicity, and the edge of chaos, and will lead to new developments in the field of machine learning and beyond. In future work, we will discuss the application of the ideas developed in this paper to meta-plasticity with memristive systems [69, 70], in particular in view of the recent results on the meta-plasticity of nanowire networks shown in [64].
Reservoir computing is a popular approach to machine learning that utilizes a fixed, randomly connected network of nodes, known as a reservoir, to generate temporal patterns in response to an input signal. The reservoir acts as a dynamical system that maps input signals to high-dimensional feature spaces, where a readout layer can then be trained to classify or predict the desired output. This approach is particularly effective for processing time-varying data and has been applied to a range of tasks, such as speech recognition, natural language processing, and control systems. One of the key advantages of reservoir computing is that the reservoir can be predesigned and does not require tuning during the training phase, reducing the complexity of the learning process. Furthermore, the reservoir can be implemented using a variety of physical substrates, such as electronic circuits, optical systems, and biological neural networks, making it a versatile approach that can be adapted to a wide range of applications.
The training of a reservoir computer involves finding a set of optimal readout weights that map the reservoir states to the desired output. This can be formulated as a linear regression problem, where the readout weights W_out are estimated by minimizing the mean squared error between the target output y(t) and the estimated output ŷ(t):

MSE = (1/N) Σ_{i=1}^{N} [y(t_i) − ŷ(t_i)]²,

where x(t_i) is the reservoir state at time step t_i, and N is the total number of training samples. The solution to this optimization problem can be obtained using the regularized pseudo-inverse of the reservoir state matrix X:

W_out = (X^t X + αI)^{−1} X^t Y,

with t representing vector or matrix transposition, where Y is the target output matrix and α is a regularization parameter that prevents overfitting. The reservoir state matrix is defined as X = [x(t_1), …, x(t_N)]^t, with one reservoir state per row, and the target output matrix as Y = [y(t_1), …, y(t_N)]^t. Once the readout weights are estimated, they can be used to generate the output for new input signals by computing

ŷ(t_i) = W_out^t x(t_i),

where ŷ(t_i) is the estimated output at time t_i. In the following, we will assume that x(t) are either the internal memory states or the conductance states.
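The ridge-regularized readout can be sketched in a few lines. Here the reservoir states and targets are synthetic (a random state matrix and a known linear readout plus noise), so that the recovered weights can be checked; the dimensions and seed are ours.

```python
import numpy as np

rng = np.random.default_rng(0)
N, D = 500, 20                       # training samples, reservoir dimension
X = rng.normal(size=(N, D))          # reservoir state matrix, row i = x(t_i)^t
w_true = rng.normal(size=D)          # synthetic "ground truth" readout
Y = X @ w_true + 0.01 * rng.normal(size=N)   # noisy targets y(t_i)

alpha = 0.01                         # regularization parameter
# W_out = (X^t X + alpha I)^{-1} X^t Y, solved without forming the inverse
W_out = np.linalg.solve(X.T @ X + alpha * np.eye(D), X.T @ Y)

y_hat = X @ W_out                    # readout applied to the reservoir states
rmse = np.sqrt(np.mean((Y - y_hat) ** 2))
```

Solving the normal equations with `np.linalg.solve` rather than explicitly inverting X^t X + αI is numerically preferable and equivalent.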
In the case of a single measured conductance, in order to multiply the number of virtual nodes, we used the multiplexing technique introduced in [60]. The time series is time-multiplexed at a given sampling frequency and parsed into N vectors of length K, such that NK is the length of the time series. The elements of these vectors at locations mK + l, for integers l ∈ [1, K] and m ∈ [0, N − 1], are then treated as K different internal virtual nodes. The regression then follows the same scheme as in the previous section.
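A minimal sketch of this parsing, reading the series row-wise so that each row supplies one sample of K virtual nodes (the function name and the exact indexing convention are our reading of the text):

```python
import numpy as np

def multiplex(series, K):
    # Parse a time series into N vectors of K virtual nodes: virtual node l
    # of vector m is the sample at position m*K + l (our indexing convention).
    series = np.asarray(series)
    N = len(series) // K
    return series[: N * K].reshape(N, K)

# Toy conductance trace of length 12 -> 3 samples x 4 virtual nodes
features = multiplex(np.arange(12.0), K=4)
```

The resulting (N, K) feature matrix plays the role of the reservoir state matrix X in the regression above.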
Reservoir computing has been successfully applied in various physical systems, including opto-electronic and nanoscale systems such as silver nanowire networks (NWNs), to implement neuromorphic computing. NWNs are particularly interesting for this application due to their memristive behavior and a complex network topology that resembles that of biological neural networks. By leveraging the dynamics of NWNs as a reservoir, the input signals can be mapped to a high-dimensional state space through a random projection, followed by a simple readout operation to obtain the desired output. This allows for the implementation of complex computational tasks, such as pattern recognition, time-series prediction, and control, in a highly efficient and parallel manner. In this way, reservoir computing offers a promising approach to harness the capabilities of NWNs for neuromorphic computing, and silver nanowire learning represents an exciting area of research at the intersection of nanotechnology and machine learning.
To be considered a feasible reservoir, a dynamical system must satisfy the condition that its state approaches a nontrivial function of the input trajectory in the long-time limit. This can be expressed in terms of two requirements [56]. Fading memory: if the system were started from two different initial conditions while driven with the same input trajectory u, the system's trajectories should eventually converge to the same state; this implies that the system has a finite temporal memory. State separation: different input sequences should drive the system into different trajectories; that is, if the same initial condition were driven with two different input trajectories, the resulting reservoir trajectories must be sufficiently different. These properties can be proven analytically for the memristive reservoir toy model. Together, these conditions ensure that the state of the reservoir becomes a function of the input trajectory and that this function carries information about the input trajectory. In the case of memristive systems, the memristive junctions act as the dynamic elements of the reservoir, and the network dynamics arise from the collective behavior of the junctions. The input signal drives the memristive junctions to switch between high and low resistance states, resulting in a temporal sequence of network states. The final state of the network can be used as a high-dimensional feature vector for further processing. In the case of silver nanowire networks, the nanowires themselves act as the memristive elements, and their collective behavior gives rise to the network dynamics. The input signal drives the nanowires to switch between metallic and insulating states, resulting in a temporal sequence of network states that can be used for further processing.
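Both requirements can be illustrated on a deliberately simple contracting map (not the paper's memristive model; the map and its parameter a are our toy choice). Because the map is a contraction for a < 1, trajectories from different initial conditions converge under a common input (fading memory), while different inputs produce distinct trajectories (state separation).

```python
import numpy as np

def drive(x0, u, a=0.9):
    # Toy leaky reservoir map x_{t+1} = tanh(a * x_t + u_t); a < 1 makes
    # the map contracting, which yields fading memory.
    xs, x = [], x0
    for ut in u:
        x = np.tanh(a * x + ut)
        xs.append(x)
    return np.array(xs)

rng = np.random.default_rng(2)
u1 = rng.normal(size=200)
u2 = rng.normal(size=200)

# Fading memory: same input, different initial conditions -> states converge
gap = abs(drive(1.0, u1)[-1] - drive(-1.0, u1)[-1])

# State separation: same initial condition, different inputs -> distinct trajectories
separated = not np.allclose(drive(1.0, u1), drive(1.0, u2))
```

The gap between the two trajectories shrinks geometrically (roughly like a^t), which is exactly the "finite temporal memory" the fading memory property requires.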
As we saw above, reservoir computing requires the fading memory property for the dynamics of a reservoir, which, expressed in physical terms, is equivalent to the system tending to some degree of effective ergodicity. The scope of this manuscript is to investigate fading memory and to understand how quickly a reservoir "forgets" its initial conditions, using the Thirumalai-Mountain metric as our benchmark. The goal is to determine the optimal level of memory capacity for different system parameters. However, there are limitations to using ergodicity and the ergodic theorem, since ensemble averages are only defined for systems at equilibrium. Nonetheless, we can still consider the case of systems locally at equilibrium.
The original formulation of the TM metric aimed to predict the dynamics of a system numerically simulated on a computer without having to calculate the Lyapunov exponents of systems with a high number of variables. This is particularly relevant when dealing with large-scale systems, where calculating Lyapunov exponents can be computationally expensive.
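The TM fluctuation metric compares the time average e_j(t) of an observable along each ensemble trajectory with the ensemble mean of those time averages: Ω(t) = (1/M) Σ_j [e_j(t) − ē(t)]². For an effectively ergodic system Ω(t) decays to zero (roughly like 1/t); a plateau signals broken ergodicity. A minimal numerical sketch on synthetic trajectories (the ensembles here are our stand-ins, not the paper's memristive dynamics):

```python
import numpy as np

def tm_metric(trajs):
    # Thirumalai-Mountain fluctuation metric for an ensemble of trajectories.
    # trajs: (M, T) array, one trajectory of the observable per ensemble member.
    t = np.arange(1, trajs.shape[1] + 1)
    running = np.cumsum(trajs, axis=1) / t       # time averages e_j(t)
    ens_mean = running.mean(axis=0)              # ensemble average of e_j(t)
    return np.mean((running - ens_mean) ** 2, axis=0)

rng = np.random.default_rng(1)
# Effectively ergodic case: all members fluctuate around a common mean,
# so Omega(t) decays roughly like 1/t.
omega_e = tm_metric(rng.normal(size=(50, 2000)))
# Broken ergodicity: each member fluctuates around its own fixed offset,
# so Omega(t) plateaus at the variance of the offsets.
omega_n = tm_metric(rng.normal(size=(50, 2000)) + rng.normal(size=(50, 1)))
```

In the paper's setting, `trajs` would hold the effective conductance (or internal memory states) recorded across driving realizations.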
Let us now briefly introduce the time series we used for our benchmarks. The Mackey-Glass time series is a widely used benchmark for testing the performance of time-series prediction models. It is a nonlinear, nonautonomous, and nonperiodic time series that exhibits chaotic dynamics. The series is defined by the following delay differential equation:

dx/dt = β x(t − τ) / (1 + x(t − τ)^n) − γ x(t),

where x(t) is the value of the time series at time t, and β, γ, n, and τ are parameters that control the dynamics of the system. The Mackey-Glass time series exhibits a range of dynamic behaviors, depending on the values of these parameters. For example, for β = 0.2, γ = 0.1, τ = 17, and n = 10, the time series exhibits chaotic dynamics with a positive Lyapunov exponent.
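The series can be generated numerically; a minimal sketch using forward-Euler integration with a unit time step (the step size, initial history, and helper name are our choices, not the paper's):

```python
import numpy as np

def mackey_glass(n_steps, beta=0.2, gamma=0.1, tau=17, n=10, dt=1.0, x0=1.2):
    # Forward-Euler integration of
    # dx/dt = beta * x(t - tau) / (1 + x(t - tau)^n) - gamma * x(t)
    delay = int(tau / dt)                 # history length for the delayed term
    x = np.zeros(n_steps + delay)
    x[: delay + 1] = x0                   # constant initial history
    for t in range(delay, n_steps + delay - 1):
        xd = x[t - delay]                 # delayed value x(t - tau)
        x[t + 1] = x[t] + dt * (beta * xd / (1.0 + xd ** n) - gamma * x[t])
    return x[delay:]

series = mackey_glass(5000)               # chaotic regime for these defaults
```

A smaller dt (or a higher-order integrator) gives a more accurate trajectory; dt = 1 is a common choice for the RC benchmark.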

FIG. 1: Top: Evolution of the mean-field potential for the toy model as a function of voltage. Bottom: Phase diagram of the toy model, and the appearance and disappearance of stable points. The intermediate region in voltage between V_c and V_c* is characterized by the coexistence of two stable points. The buffer region corresponds to a metastable state which acts as the boundary between the attracting stable points for a single memristive device, while the shading represents the tunneling region for the network of devices.

FIG. 2: Thirumalai-Mountain metric as a function of time and amplitude, calculated for the stochastic memristive model with noise, with the sample average calculated using the mean-field value introduced in (13). We observe that, in the regime in which the potential has a single minimum (V < V_c), the TM metric decays rapidly. Instead, for higher voltages, in which two minima are present (V_c < V < V_c*), the TM metric does not decay. This is a symptom of non-ergodic behavior.

FIG. 3: Example of a nanowire network generated with the random wire model. (a) Simulated NWN with 1024 nanowires and 6877 junctions. (b) Graphical representation of the NWN in (a).

FIG. 4: (a) Collective conductance of the network between the source and drain nodes with a DC input of varying pulse width. The dashed red line indicates the onset of formation of a high-conductance current pathway between the source and drain nodes. (b) Visualization of network activation levels for different pre-initialization pulse widths T0.

FIG. 5: Reservoir computing prediction task with the toy model, for b = 0.2, and a = 0.01 (a) and a = 10 (b), respectively. As we can see, the regime of optimal prediction corresponds to the regime in which the system's Thirumalai-Mountain metric does not converge to zero.

FIG. 6: MSE of the Mackey-Glass reconstruction as a function of the parameters a (amplitude) and b (bias).

FIG. 7: (a) Reservoir computing prediction error as a function of the amplitude of the input signal, while the bias is fixed at the transition point. We can see that the rMSE decreases as a function of the amplitude. (b) Thirumalai-Mountain metric as a function of time for various amplitudes. We see that for increasing amplitudes the metric ceases to converge to zero, indicating an effective non-ergodic behavior. In the intermediate regime, we see oscillations due to the fact that the system jumps from one minimum to the other.

FIG. 8: (a) Nanowire system simulated conductance, using the Mackey-Glass time series as input, for various values of the bias voltage b. As we can see, for small and for large values of the bias the conductance decays or grows, respectively, to its asymptotic value. At intermediate values of the bias, the conductance oscillates. (b) Thirumalai-Mountain metric for the effective conductance of the nanowire network.

FIG. 10: Fitting result for different biases, for amp = 4.9 V, T0 = 2 s. As we can see, near the transition the performance of the RC prediction task increases dramatically. The red dashed line represents the divide between the training and test sets.