New J. Phys. 9 (2007) 109
doi:10.1088/1367-2630/9/4/109
PII: S1367-2630(07)38155-X

Neural networks with transient state dynamics

Claudius Gros1

Institute of Theoretical Physics, J.W. Goethe University Frankfurt, 60438 Frankfurt, Germany

1 http://itp.uni-frankfurt.de/~gros

Received 28 November 2006
Published 30 April 2007

Abstract. We investigate dynamical systems characterized by a time series of distinct semi-stable activity patterns, as they are observed in cortical neural activity patterns. We propose and discuss a general mechanism allowing for an adiabatic continuation between attractor networks and a specific adjoined transient-state network, which is strictly dissipative. Dynamical systems with transient states retain functionality when their working point is autoregulated—avoiding prolonged periods of stasis or drifting into a regime of rapid fluctuations. We show, within a continuous-time neural network model, that a single local updating rule for online learning allows simultaneously (i) for information storage via unsupervised Hebbian-type learning, (ii) for adaptive regulation of the working point and (iii) for the suppression of runaway synaptic growth. Simulation results are presented; the spontaneous breaking of time-reversal symmetry and link symmetry are discussed.


1. Introduction

Dynamical systems are often classified with respect to their long-time behaviours, which might be, e.g., chaotic or regular [1]. Of special interest are attractors, cycles and limit cycles, as they determine the fate of all orbits starting within their respective basins of attraction.

Attractor states play a central role in the theory of recurrent neural networks, serving the role of memories with the capability to generalize and to reconstruct a complete memory from partial initial information [2]. Attractor states in recurrent neural networks face however a fundamental functional dichotomy, whenever the network is considered as a functional subunit of an encompassing autonomous information processing system, viz an autonomous cognitive system [3]. The information processing comes essentially to a standstill once the trajectory closes in at one of the attractors. Restarting the system `by hand' is a viable option for technical applications of neural networks, but not within the context of autonomously operating cognitive systems.

One obvious way out of this dilemma would be to consider only dynamical systems without attractor states, i.e. with a kind of continuously ongoing `fluctuating dynamics', as illustrated in figure 1, which might possibly be chaotic in the strict sense of dynamical system theory. The problem is then, however, the decision-making process. Without well-defined states, which last for certain minimal periods, the system has no definite information-carrying states onto which it could base the generation of its output signals. It is interesting to note, in this context, that indications for quasi-stationary patterns in cortical neural activity have been observed [4]–[6]. These quasi-stationary states can be analysed using multivariate time-series analysis, indicating self-organized patterns of brain activity [7]. Interestingly, studies of EEG recordings have been interpreted in terms of brain states showing aperiodic evolution states going through sequences of attractors that on access support the experience of remembering [8]. These findings suggest that `transient-state dynamics', as illustrated in figure 1, might be of importance for cortical firing patterns.


Figure 1. Illustration of fluctuating (top) and transient-state dynamics (bottom).

It is possible, from the viewpoint of dynamical system theory, to consider transient states as well-defined periods when the orbit approaches an attractor ruin. With a transient attractor, or attractor ruin, we denote here a point in phase space which could be turned continuously into a stable attractor when tuning certain of the parameters entering the evolution equations of the dynamical system. The dynamics slows down close to the attractor ruin and well-defined transient states emerge within the ensemble of dynamical variables. The notion of transient-state dynamics is related conceptually to chaotic itinerancy (see [9] and references therein), a term used to characterize dynamical systems for which chaotic high-dimensional orbits stay intermittently close to low-dimensional attractor ruins for certain periods. Instability due to dynamic interactions or noise is necessary for the appearance of chaotic itinerancy.

Having argued that transient-state dynamics might be of importance for a wide range of real-world dynamical systems, the question is then how to generate this kind of dynamical behaviour in a controllable fashion and in a manner applicable to a variety of starting systems. That is, we are interested in neural networks which generate transient-state dynamics in terms of a meaningful time series of states approaching predefined attractor ruins arbitrarily closely.

The approach we will follow here is to start with an original attractor neural network and then to transform the set of stable attractors into transient attractors by coupling to auxiliary local variables, which we denote `reservoirs', governed by long timescales. We note that related issues have been investigated in the context of discrete-time, phase-coupled oscillators [10], for networks aimed at language processing in terms of `latching transitions' [11, 12], and in the context of `winnerless competitions' [13]–[15]. Further examples of neural networks capable of generating a time series of subsequent states are neural networks with time-dependent asymmetric synaptic strengths [16] or dynamical thresholds [17]. We also note that the occurrence of spontaneous fluctuating dynamics has been studied [18], especially in relation to the underlying network geometry [19].

An intrinsic task of neural networks is to learn and to adapt to incoming stimuli. This implies, for adaptive neural networks, a continuous modification of their dynamical properties. The learning process could consequently take the network, if no precautions are taken, out of its intended working regime, the regime of transient-state dynamics. Here we will show that it is possible to formulate local learning rules which keep the system in its proper dynamical state by optimizing continuously its own working point. To be concrete, let us denote with \(\bar{t}\) the average duration of quasi-stable transient states and with \(\Delta t\) the typical time needed for the transition from one quasi-stationary state to the next. The dynamical working point can then be defined as the ratio \(\Delta t/\bar{t}\).

These timescales, \(\bar{t}\) and \(\Delta t\), result, for the network of cortical neurons, from the properties of the individual neurons, which are essentially time-independent, and from the synaptic strengths, which are slow dynamical variables subject to Hebbian-type learning [20]. It then follows that the modifications of the inter-neural synaptic strengths have a dual functionality: on one side they are involved in memory storage tasks [20], and on the other side they need to retain the working point in the optimal regime. Here we show that this dual functionality can be achieved within a generalized neural network model. We show that working-point optimization is obtained when the Hebbian learning rule is reformulated as an optimization procedure, resulting in a competition among the set of synapses leading to an individual neuron. The resulting learning rule turns out to be closely related to rules found to optimize the memory-storage capacity [21].

2. Model

2.1. Clique encoding

Neural networks with sparse coding, viz with low mean firing rates, have very large memory-storage capacities [22]. Sparse coding results, in extremis, in a `one-winner-take-all' configuration, for which a single unit encodes exactly one memory. In this limit the storage capacity is, however, reduced again and linearly proportional to the network size, as in the original Hopfield model [23]. Here we opt for the intermediate case of `clique encoding'. A clique is, in terms of graph theory, a fully interconnected subgraph, as illustrated in figure 2 for a 7-site network. Clique encoding corresponds to a `several-winners-take-all' set-up. All members of the winning clique mutually excite each other while suppressing the activities of all out-of-clique neurons to zero.


Figure 2. Geometry and simulation results for a small, 7-site network. Left panel: the links with \(w_{ij} > 0\), containing six cliques, (0, 1), (0, 6), (3, 6), (1, 2, 3) (which is highlighted), (4, 5, 6) and (1, 2, 4, 5). Right panel: as a function of time, the activities \(x_i(t)\) (solid lines) and the respective reservoirs \(\varphi_i(t)\) (dashed lines) for the transient-state dynamics (4, 5, 6) → (1, 2, 3) → (0, 6) → (1, 2, 4, 5). For the parameter values see section 2.6.
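As a small consistency check of the clique structure listed in the caption of figure 2, the following sketch rebuilds the link set from the six cliques and enumerates the maximal cliques again. It assumes that the excitatory links are exactly those implied by the listed cliques; the helper names `build_adjacency` and `maximal_cliques` are illustrative and not part of the original work.

```python
from itertools import combinations

# Cliques listed in the caption of figure 2 (7-site network).
CLIQUES = [(0, 1), (0, 6), (3, 6), (1, 2, 3), (4, 5, 6), (1, 2, 4, 5)]


def build_adjacency(cliques, n_sites):
    """Link every pair of sites sharing a clique (assumed to be all links)."""
    adj = {i: set() for i in range(n_sites)}
    for clique in cliques:
        for a, b in combinations(clique, 2):
            adj[a].add(b)
            adj[b].add(a)
    return adj


def maximal_cliques(adj):
    """Enumerate all maximal cliques with the basic Bron-Kerbosch recursion."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(tuple(sorted(r)))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}  # v has been fully explored
            x = x | {v}  # exclude v from later branches

    expand(set(), set(adj), set())
    return sorted(found)


adjacency = build_adjacency(CLIQUES, 7)
n_links = sum(len(nbrs) for nbrs in adjacency.values()) // 2
cliques_found = maximal_cliques(adjacency)
print(n_links, cliques_found)
```

Under this assumption the graph has no additional maximal cliques beyond the six that were used to build it.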

We note that the number of cliques can be very large. For illustration let us consider a random Erdős–Rényi graph with N vertices and linking probability p. The overall number of cliques containing Z vertices is then statistically given by

\[\binom{N}{Z}\, p^{Z(Z-1)/2}\left(1-p^{Z}\right)^{N-Z}, \tag{1}\]

where \(p^{Z(Z-1)/2}\) is the probability of having Z sites of the graph fully interconnected by \(Z(Z-1)/2\) edges and where the last term is the probability that every single vertex of the \(N-Z\) out-of-clique vertices is not simultaneously connected to all Z sites of the clique.
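The clique count described above can be evaluated numerically. The sketch below assumes that the prefactor is the binomial coefficient counting the ways to choose Z of the N vertices; the function name `expected_cliques` and the example values N = 100, p = 0.1 are hypothetical and only serve as an illustration.

```python
from math import comb


def expected_cliques(n, z, p):
    """Expected number of Z-site cliques in an Erdos-Renyi graph G(n, p).

    comb(n, z)            ways to choose the Z candidate sites (assumed prefactor)
    p ** (z*(z-1)/2)      all links among the Z sites are present
    (1 - p**z) ** (n-z)   no outside vertex is linked to all Z sites
    """
    n_edges = z * (z - 1) // 2
    return comb(n, z) * p ** n_edges * (1.0 - p ** z) ** (n - z)


# Hypothetical example: 100 sites, linking probability 0.1.
for z in range(2, 6):
    print(z, expected_cliques(100, z, 0.1))
```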

Networks with clique encoding are especially well suited for transient-state dynamics, as we will discuss further below, and are biologically plausible. Extensive sensory preprocessing is known to occur in the respective cortical areas of the brain [20], leading to representations of features and objects by individual neurons or small cell assemblies. In this framework a site, viz a neural centre, of the effective neural network considered here corresponds to such a small cell assembly and a clique to a stable representation of a memory, by binding together a finite set of features extracted by the preprocessing algorithms from the sensory input stream.

2.2. Continuous time dynamics

For our study of possible mechanisms of transient-state dynamics in the context of neural networks we consider \(i = 1, \ldots, N\) artificial neurons with rate encoding \(x_i(t)\) and continuous time \(t \in [0, \infty)\). Let us comment briefly on the last point. The majority of research in the field of artificial neural networks deals with the case of discrete time \(t = 0, 1, 2, \ldots\) [20]. We are however interested, as discussed in the introduction section, in networks exhibiting autonomously generated dynamical behaviours, as they typically occur in the context of complete autonomous cognitive systems. We are therefore interested in networks whose update rules are compatible with the interaction with other components of a cognitive system. Discrete time updating is not suitable in this context, since the resulting dynamical characteristics (i) depend on the choice of synchronous versus asynchronous updating and (ii) are strongly influenced when effective recurrent loops arise due to the coupling to other components of the autonomous cognitive system. We therefore consider and study here a model with continuous time.

2.3. Neural network model

We denote the state variables encoding the activity level by \(x_i(t)\) and assume them to be continuous variables, \(x_i \in [0,1]\). Additionally, we introduce for every site a variable \(\varphi_i(t) \in [0,1]\), termed `reservoir', which serves as a fatigue memory facilitating the self-generated time series of transient states. We consider the following set of differential equations:

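\[\dot x_i = (1-x_i)\,\Theta(r_i)\,r_i + x_i\,\Theta(-r_i)\,r_i, \tag{2}\]

\[r_i = \sum_{j=1}^{N}\left[f_w(\varphi_i)\,\Theta(w_{ij})\,w_{ij} + z_{ij}\,f_z(\varphi_j)\right]x_j, \tag{3}\]

\[\dot\varphi_i = \Gamma_\varphi^+\,(1-\varphi_i)\,(1-x_i/x_{\rm c})\,\Theta(x_{\rm c}-x_i) - \Gamma_\varphi^-\,\varphi_i\,\Theta(x_i-x_{\rm c}), \tag{4}\]

\[z_{ij} = -|z|\,\Theta(-w_{ij}). \tag{5}\]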

We now discuss some properties of (2)–(5), which are suitably modified Lotka–Volterra equations.

1.  

Normalization. Equations (2)–(4) respect the normalization \(x_i, \varphi_i \in [0,1]\), due to the prefactors \(x_i\), \((1-x_i)\), \(\varphi_i\) and \((1-\varphi_i)\) in equations (2) and (4), for the respective growth and depletion processes. \(\Theta(r)\) is the Heaviside step function: \(\Theta(r < 0) = 0\) and \(\Theta(r > 0) = 1\).

2.  

Synaptic strength. The synaptic strength is split into an excitatory contribution \(\propto w_{ij}\) and an inhibitory contribution \(\propto z_{ij}\), with \(w_{ij}\) being the primary variable: the inhibition \(z_{ij}\) is present only when the link is not excitatory, see equation (5). We have used \(z \equiv -1\), viz \(|z| = 1\), throughout the paper, which then defines the inverse reference unit for the time development.

3.  

Winners-take-all network. Equations (2) and (3) describe, in the absence of a coupling to the reservoir via \(f_{z/w}(\varphi)\), a competitive winners-take-all neural network with clique encoding. The system relaxes towards the next attractor made up of a clique of Z sites \((p_1, \ldots, p_Z)\) connected via excitatory links \(w_{p_i,p_j} > 0\ (i,j = 1, \ldots, Z)\).

4.  

Reservoir functions. The reservoir functions \(f_{z/w}(\varphi) \in [0,1]\) govern the interaction between the activity levels \(x_i\) and the reservoir levels \(\varphi_i\). They may be chosen as washed-out step functions of sigmoidal form (footnote 2), with a suitable width \(\Gamma_\varphi\) and inflection points \(\varphi_{\rm c}^{(w/z)}\), see figure 3.

5.  

Reservoir dynamics. The reservoir levels of the winning clique deplete slowly, see equation (4) and figure 2, and recover only once the activity level \(x_i\) of a given site has dropped below \(x_{\rm c}\), which defines a site to be active when \(x_i > x_{\rm c}\). The factor \((1 - x_i/x_{\rm c})\) occurring in the reservoir growth process, see the rhs of (4), serves to stabilize the transition between two subsequent memory states. When the activity level \(x_i\) of a given centre i drops below \(x_{\rm c}\), it cannot be reactivated immediately; the reservoir cannot fill up again for \(x_i \lesssim x_{\rm c}\), due to the factor \((1 - x_i/x_{\rm c})\) in (4).

6.  

Separation of timescales. A separation of timescales is obtained when the \(\Gamma_\varphi^{\pm}\) are much smaller than the typical strength of an active excitatory link, i.e. of a typical \(w_{ij} > 0\), leading to transient-state dynamics. Once the reservoir of a winning clique is depleted, it loses, via \(f_z(\varphi)\), its ability to suppress other sites and the mutual intra-clique excitation is suppressed via \(f_w(\varphi)\).

7.  

Absence of stationary solutions. There are no stationary solutions with \(\dot{x}_i = 0 = \dot\varphi_i\ (i = 1, \ldots, N)\) for equations (2) and (4), whenever the \(\Gamma_\varphi^{\pm} > 0\) do not vanish and for any non-trivial coupling functions \(f_{w/z}(\varphi) \in [0,1]\).
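The properties listed above can be explored with a minimal numerical sketch. It integrates equations (2)–(5) with an explicit Euler scheme for the 7-site network of figure 2, using the forms of the equations implied by the prefactors and derivatives quoted in the text. Several ingredients are assumptions rather than the author's implementation: the arctan shape of the reservoir functions, the sigmoid parameters (`PHI_C_W`, `PHI_C_Z`, `WIDTH`, `F_W_MIN`), the step size and the initial condition. Whether the plateaus of figure 2 are reproduced depends on these choices.

```python
import math

N = 7
CLIQUES = [(0, 1), (0, 6), (3, 6), (1, 2, 3), (4, 5, 6), (1, 2, 4, 5)]
W_ACTIVE, W_INACTIVE, Z_ABS = 0.12, -0.01, 1.0     # values quoted in the text
GAMMA_PLUS, GAMMA_MINUS, X_C = 0.015, 0.005, 0.85  # values quoted in the text
# Assumed sigmoid parameters (not given in this section).
PHI_C_W, PHI_C_Z, WIDTH, F_W_MIN, F_Z_MIN = 0.7, 0.15, 0.05, 0.05, 0.0


def theta(r):
    """Heaviside step function with theta(0) = 0."""
    return 1.0 if r > 0 else 0.0


def sigmoid(phi, phi_c, f_min):
    """Arctan step rescaled so that f(0) = f_min and f(1) = 1 (assumed form)."""
    def a(p):
        return math.atan((p - phi_c) / WIDTH)
    return f_min + (1.0 - f_min) * (a(phi) - a(0.0)) / (a(1.0) - a(0.0))


def make_links():
    """Excitatory matrix: active inside cliques, negative baseline elsewhere."""
    w = [[W_INACTIVE] * N for _ in range(N)]
    for clique in CLIQUES:
        for i in clique:
            for j in clique:
                w[i][j] = W_ACTIVE
    for i in range(N):
        w[i][i] = 0.0  # no self-coupling: w_ii = 0 = z_ii
    return w


def growth_rates(x, phi, w, f_w, f_z):
    """Equation (3), with z_ij taken from equation (5)."""
    fw = [f_w(p) for p in phi]
    fz = [f_z(p) for p in phi]
    rates = []
    for i in range(N):
        total = 0.0
        for j in range(N):
            z_ij = -Z_ABS * theta(-w[i][j])
            total += (fw[i] * theta(w[i][j]) * w[i][j] + z_ij * fz[j]) * x[j]
        rates.append(total)
    return rates


def derivatives(x, phi, w, f_w, f_z):
    """Right-hand sides of equations (2) and (4)."""
    r = growth_rates(x, phi, w, f_w, f_z)
    dx = [(1 - x[i]) * theta(r[i]) * r[i] + x[i] * theta(-r[i]) * r[i]
          for i in range(N)]
    dphi = [GAMMA_PLUS * (1 - phi[i]) * (1 - x[i] / X_C) * theta(X_C - x[i])
            - GAMMA_MINUS * phi[i] * theta(x[i] - X_C) for i in range(N)]
    return dx, dphi


def simulate(x, phi, w, f_w, f_z, steps, dt=0.1):
    """Explicit Euler integration; dt is small enough to keep x, phi in [0, 1]."""
    for _ in range(steps):
        dx, dphi = derivatives(x, phi, w, f_w, f_z)
        x = [xi + dt * d for xi, d in zip(x, dx)]
        phi = [p + dt * d for p, d in zip(phi, dphi)]
    return x, phi


def f_w(p):
    return sigmoid(p, PHI_C_W, F_W_MIN)


def f_z(p):
    return sigmoid(p, PHI_C_Z, F_Z_MIN)


w = make_links()
x0 = [0.9 if i in (4, 5, 6) else 0.1 for i in range(N)]  # start near (4, 5, 6)
x_end, phi_end = simulate(x0, [1.0] * N, w, f_w, f_z, steps=5000)
print([round(v, 3) for v in x_end])
```

Passing constant functions returning 1 for `f_w` and `f_z` decouples activities and reservoirs, which is a convenient way to inspect the attractors of the original network.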


Figure 3. Left panel: illustration of the reservoir functions \(f_{z/w}(\varphi)\), see equation (3), of sigmoidal form, see footnote 1, with respective turning points \(\varphi_{\rm c}^{(w/z)}\), a width \(\Gamma_\varphi\) and a minimal value \(f_z^{(\min)} = 0\). Right panel: distribution of the synaptic strength for the inhibitory links \(z_{ij} < -|z|\) and the active excitatory links \(0 < w_{ij} < w\) leading to clique encoding. Note that w is not a strict upper bound, due to the optimization procedure (11). The shaded area just below zero is related to the inactive \(w_{ij}\), see equations (5) and (11).

When decoupling the activities and the reservoir by setting \(f_{w/z}(\varphi) \equiv 1\) one obtains stable attractors with \(x_i = 1/0\) and \(\varphi_i = 0/1\) for sites belonging/not-belonging to the winning clique, compare figure 4.


Figure 4. The attractors of the original network, viz when the coupling to the reservoir is turned off by setting \(f_{w/z}(\varphi) \equiv 1\), correspond to \(x_i = 1/0\) and \(\varphi_i = 0/1\ (i = 1, \ldots, N)\) for members/non-members of the winning clique. A finite coupling to the local reservoirs \(\varphi_i\) leads to orbits \(\{x_i(t), \varphi_i(t)\}\) which are attracted by the attractor ruins for short timescales and repelled for long timescales. This is due to a separation of timescales, as the time evolution of the reservoirs \(\varphi_i(t)\) occurs on timescales substantially slower than that of the primary dynamical variables \(x_i(t)\).

In figure 2 the transient-state dynamics resulting from equations (2)–(5) is illustrated. Presented in figure 2 are data for the autonomous dynamics in the absence of external sensory signals; we will discuss the effect of external stimuli further below. We present in figure 2 only data for a very small network, containing seven sites, which can be easily represented graphically. We have also performed extensive simulations for very large networks, containing several thousands of sites, and found stable transient-state dynamics.

2.4. Role of the reservoir

The dynamical system discussed here represents in first place a top-down approach to cognitive systems and a one-to-one correspondence with cortical structures is not intended. The set-up is however inspired by biological analogies and we may identify the sites i of the artificial neural network described by equation (2) not with single neurons, but with neural assemblies or neural centres. The reservoir variables varphii(t) could therefore be interpreted as effective fatigue processes taking place in continuously active neural assemblies, the winning coalitions.

It has been proposed [24] that the neural coding used for the binding of heterogeneous sensory information in terms of distinct and recognizable objects might be temporal in nature. Within this temporal coding hypothesis, which has been investigated experimentally [25], neural assemblies fire in phase, viz synchronously, when defining the same object and asynchronously when encoding different objects. There is a close relation between objects and memories in general. An intriguing possibility is therefore to identify the memories of the transient-state network investigated in the present approach with the synchronously firing neurons of the temporal coding theory. The winning coalition is characterized by high reservoir levels which would then correspond to the degree of synchronization within the temporal encoding paradigm and the reservoir depletion time \(\sim 1/\Gamma_\varphi^-\) would correspond to the decoherence time of the object binding neurons.

We note that this analogy can however not be carried too far, since synchronization is at its basis a cooperative effect, the reservoir levels describing on the other side single-unit properties. In terms of a popular physics phrase one might speak of a `poor man's' approach to synchronization, via coupling to a fatigue variable.

2.5. Dissipative dynamics

The reason for the observed numerical and dynamical robustness of the model can be traced back to the relaxational nature of its dynamics. For short timescales we can consider the reservoir variables \(\{\varphi_i\}\) to be approximately constant and the system relaxes into the next clique attractor ruin. Once close to a transient attractor, the \(\{x_i\}\) are essentially constant, viz close to one/zero, and the reservoir slowly depletes. The dynamics is robust against noise, as fluctuations affect only details of both relaxational processes, but not their overall behaviour.

To be precise we note that the phase space contracts with respect to the reservoir variables, namely

\[\sum_i \frac{\partial \dot\varphi_i}{\partial\varphi_i} = -\sum_i\left[\Gamma_\varphi^+(1-x_i/x_{\rm c})\Theta(x_{\rm c}-x_i)+\Gamma_\varphi^-\Theta(x_i-x_{\rm c})\right] \leqslant 0, \qquad \forall x_i\in[0,1],\]

where we have used (4). We note that the diagonal contributions to the link matrices vanish, \(z_{ii} = 0 = w_{ii}\), and therefore \(\partial r_i/\partial x_i = 0\). The phase space consequently contracts also with respect to the activities,

\[\sum_i {\partial \dot x_i\over\partial x_i} = \sum_i [\Theta(-r_i) -\Theta(r_i)] r_i \leqslant\ 0,\]

where we have used (2). The system is therefore strictly dissipative, in the absence of learning and external perturbations, leading to the observed numerically robust behaviour.
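The sign of the second sum can be made explicit with one intermediate step. As a short worked derivation, assuming that equation (2) has the form \(\dot x_i = (1-x_i)\Theta(r_i)r_i + x_i\Theta(-r_i)r_i\) implied by the prefactors named in section 2.3, and using \(\partial r_i/\partial x_i = 0\), each term reads

\[\frac{\partial \dot x_i}{\partial x_i} = -\Theta(r_i)\,r_i + \Theta(-r_i)\,r_i = -|r_i|,\]

since only the first term contributes for \(r_i > 0\) and only the second for \(r_i < 0\). Summing over all sites gives

\[\sum_i \frac{\partial \dot x_i}{\partial x_i} = -\sum_i |r_i| \leqslant 0,\]

with equality only when all growth rates vanish. In the same way, every term of the reservoir sum is non-negative before the overall minus sign: it equals \(\Gamma_\varphi^+(1-x_i/x_{\rm c}) > 0\) for \(x_i < x_{\rm c}\) and \(\Gamma_\varphi^- > 0\) for \(x_i > x_{\rm c}\).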

2.6. Strict transient-state dynamics

The self-generated transient-state dynamics shown in figure 2 exhibits well-characterized plateaus in the \(x_i(t)\), since small values have been used for the depletion and the growth rates of the reservoir, \(\Gamma_\varphi^- = 0.005\) and \(\Gamma_\varphi^+ = 0.015\). The simulations presented in figure 2 were performed using \(w_{ij} = 0.12\) for all nonzero excitatory interconnections.

We define a dynamical system to have `strict transient-state dynamics' if there exists a set of control parameters allowing the transient states to be turned adiabatically into stable attractors. Equations (2)–(5) fulfil this requirement: for \(\Gamma_\varphi^- \to 0\) the average duration \(\bar{t}\) of the steady-state plateaus observed in figure 2 diverges.

Alternatively, by selecting appropriate values for \(\Gamma_\varphi^-\) and \(\Gamma_\varphi^+\), it is possible to regulate the `speed' of the transient-state dynamics, an important consideration for applications. For a working cognitive system, such as the brain, it is enough that the transient states are stable just for a certain minimal period needed to identify the state and to act upon it. Anything longer would just be a `waste of time'.
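The dependence of the plateau duration on the depletion rate can be illustrated with a rough estimate. While a site is active, the depletion term of equation (4) reduces to an exponential decay of its reservoir, so the time needed to reach a given reservoir level scales as \(1/\Gamma_\varphi^-\). In the sketch below the function name `depletion_time` and the stopping level `phi_stop` are hypothetical; the level at which a clique actually loses stability depends on the reservoir functions and is not specified here.

```python
import math


def depletion_time(gamma_minus, phi_start=1.0, phi_stop=0.15):
    """Time for an active site's reservoir to decay from phi_start to phi_stop.

    While x_i > x_c the reservoir obeys dphi/dt = -gamma_minus * phi,
    hence phi(t) = phi_start * exp(-gamma_minus * t).
    phi_stop is a hypothetical threshold, not a value from the text.
    """
    return math.log(phi_start / phi_stop) / gamma_minus


# Halving the depletion rate doubles the estimated plateau duration.
for gamma in (0.005, 0.0025, 0.001):
    print(gamma, depletion_time(gamma))
```

The estimate diverges for a vanishing depletion rate, in line with the adiabatic continuation to stable attractors discussed in this section.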

2.7. Universality

We note that the mechanism for the generation of stable transient-state dynamics proposed here is universal in the sense that it can be applied to a wide range of dynamical systems in a frozen state, i.e. which are determined by attractors and cycles.

Physically, the mechanism we propose here is to embed the phase space \(\{x_i\}\) of an attractor network into a larger space, \(\{x_i, \varphi_j\}\), by coupling to additional local slow variables \(\varphi_i\). Stable attractors are transformed into attractor ruins since the new variables allow the system to escape the basin of the original attractor \(\{x_i = 1/0, \varphi_j = 0/1\}\) (for in-clique/out-of-clique sites) via local escape processes which deplete the respective reservoir levels \(\varphi_i(t)\). Note that the embedding is carried out via the reservoir functions \(f_{z/w}(\varphi)\) in equation (3) and that the reservoir variables keep a slaved dynamics (4) even when the coupling is turned off by setting \(f_{z/w}(\varphi) \to 1\) in equation (3).

This mechanism is illustrated in figure 4. Locality is an important ingredient for this mechanism to work. The trajectories would otherwise not come close to any of the attractor ruins again, viz to the original attractors, being repelled by all of them with similar strengths, and fluctuating dynamics of the kind illustrated in figure 1 would result.

2.8. Cycles and time-reversal symmetry

The systems illustrated in figures 2 and 5 are very small and the transient-state dynamics soon settles into a cycle of attractor ruins, since there are no incoming sensory signals considered in the respective simulations. For networks containing a larger number of sites, the number of attractors can, however, be very large, and so can the resulting cycle length. We performed simulations for a 100-site network, containing 713 clique-encoded memories. We found no cyclic behaviour even for sequences containing up to 4400 transient states. We note that the system does not necessarily retrace its own trajectory once a given clique is stabilized for a second time, an event which needs to occur in any finite system, the reason being that the distribution of reservoir levels is in general different when a given clique is revisited.


Figure 5. Geometry and simulation results for a cyclic 9-site network with symmetric excitatory links \(w_{ij} = w_{ji}\). Left: the links with \(w_{ij} > 0\), containing six cliques, (0, 1), (1, 2, 3) (which is highlighted), (3, 4), (4, 5, 6), (6, 7) and (7, 8, 0). Right: as a function of time, the activities \(x_i(t)\) for the cyclic transient-state dynamics (1, 2, 3) → (7, 8, 0) → (4, 5, 6) → ···. For the parameter values see section 2.6. Both directions (clockwise/anticlockwise) of `rotation' are dynamically possible and stable, the actual direction being determined by the dynamical initial conditions.

We note that time-reversal symmetry is `spontaneously' broken in the sense that repetitive transient-state dynamics of type

\[ ({\rm clique}\ A) \to ({\rm clique}\ B)\to ({\rm clique}\ A) \to ({\rm clique}\ B)\to \cdots\]

does not generally arise. The reason is simple. Once the first clique is deactivated its respective reservoir levels need a certain time to fill up again, compare figure 2. Time-reversal symmetry would however be recovered in the limit \(\Gamma_\varphi^+ \gg \Gamma_\varphi^-\), i.e. when the reservoirs would be refilled much faster than depleted.

2.9. Reproducible sequence generation

Animals need to generate sequences of neural activities for a wide range of purposes, e.g. for movements or for periodic internal muscle contractions, the heartbeat being a prime case. These sequences need to be successions of well-defined firing patterns, usable to control actuators, viz the muscles. The question then arises under which condition a dynamical system generates reproducible sequences of well-defined activity patterns, i.e. controlled time series of transient states [26, 27].

There are two points worth noting in this context.

1.  

The dynamics described by equations (2)–(4) works fine for randomly selected link matrices \(w_{ij}\), which may or may not change with time. In particular one can select the cliques specifically in order to induce the generation of a specific succession of transient states; an example is presented in figure 5. The network is capable, as a matter of principle, of robustly generating large numbers of different sequences of transient states. For geometric arrangements of the network sites, and of the links \(w_{ij}\), one finds waves of transient states sweeping through the system.

2.  

In section 3 we will discuss how appropriate \(w_{ij}\) can be learned from training patterns presented to the network by an external teacher. We will concentrate in section 3 on the training and learning of individual memories, viz of cliques, but suitable sequences of training patterns could be used also for learning temporal sequences of memories.

3. Autonomous online learning

An external stimulus, \(\{b_i^{\rm (ext)}(t)\}\), influences the activities \(x_i(t)\) of the respective neural centres. This corresponds to a change of the respective growth rates \(r_i\),

\[r_i \to r_i + f_z(\varphi_i)\, b_i^{\rm (ext)}(t), \tag{6}\]

compare equation (3), where \(f_z(\varphi_i)\) is an appropriate coupling function, depending on the local reservoir level \(\varphi_i\). When the effect of the external stimulus is strong, namely when \(f_z b_i^{\rm (ext)}\) is strong, it will in general lead to an activation \(x_i \to 1\) of the respective neural centre i. A continuously active stimulus does not convey new information and should, on the other hand, lead to habituation, having a reduced influence on the system. A strong, continuously present stimulus leads to a prolonged high activity level \(x_i \to 1\) of the involved neural centres, leading via (4) to a depletion of the respective reservoir levels, on a timescale given by the inverse reservoir depletion rate, \(1/\Gamma_\varphi^-\). Habituation is then mediated by the coupling function \(f_z(\varphi_i)\) in (6), since \(f_z(\varphi_i)\) becomes very small for \(\varphi_i \to 0\), compare figure 3. The effect of habituation incorporated in (6) therefore allows the system to turn its `attention' to other competing stimuli, with novel stimuli having a higher chance to affect the ongoing transient-state dynamics.
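The habituation effect described above can be illustrated with a short sketch in which the external stimulus enters the growth rate weighted by the coupling function of the local reservoir. The arctan form of `f_z` and its parameters `phi_c` and `width` are assumptions made for illustration; only the minimal value 0 (figure 3) and the stimulus strength 3.6 (figure 6) are taken from the text.

```python
import math


def f_z(phi, phi_c=0.15, width=0.05, f_min=0.0):
    """Assumed arctan sigmoid, rescaled so that f_z(0) = f_min and f_z(1) = 1."""
    def a(p):
        return math.atan((p - phi_c) / width)
    return f_min + (1.0 - f_min) * (a(phi) - a(0.0)) / (a(1.0) - a(0.0))


def stimulated_rate(r_i, phi_i, b_ext):
    """Growth rate including the external stimulus, weighted by f_z(phi_i)."""
    return r_i + f_z(phi_i) * b_ext


# A constant stimulus loses its influence as the reservoir of the site depletes.
for phi in (1.0, 0.5, 0.2, 0.1, 0.0):
    print(phi, stimulated_rate(0.0, phi, 3.6))
```

With a full reservoir the stimulus acts with its full strength, while for a depleted reservoir its contribution vanishes, so that a persistent stimulus gradually stops driving the site.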

We now provide a set of learning rules allowing the system to acquire new patterns on the fly, viz during its normal phase of dynamical activity. The alternative, modelling networks having distinct periods of learning and of performance, is of widespread use for technical applications of neural networks, but it is not of interest in our context of continuously active cognitive systems.

3.1. Short- and long-term synaptic plasticities

There are two fundamental considerations for the choice of synaptic plasticities adequate for neural networks with transient-state dynamics.

1.  

Learning is a very slow process without a short-term memory. Training patterns need to be presented to the network over and over again until substantial synaptic changes are induced [20]. A short-term memory can speed up the learning process substantially, as it stabilizes external patterns and hence gives the system time to consolidate long-term synaptic plasticity.

2.  

Systems using sparse coding are based on a strong inhibitory background: the average inhibitory link-strength \(|z|\) is substantially larger than the average excitatory link-strength \(\bar{w}\),

\begin{equation}
|z| \gg\ \bar w.
\end{equation}

It is then clear that gradual learning is effective only when it affects dominantly the excitatory links: small changes of large parameters do not lead to new transient attractors, nor do they influence the cognitive dynamics substantially.

It then follows that it is convenient to split the synaptic plasticities into two parts,

\[w_{ij} = w_{ij}^{\rm S} + w_{ij}^{\rm L}, \tag{7}\]

where the \(w_{ij}^{\rm S}\) and \(w_{ij}^{\rm L}\) correspond to the short-term and to the long-term synaptic plasticities, respectively.

3.2. Negative baseline

Equation (5), \(z_{ij} = -|z|\,\Theta(-w_{ij})\), implies that the inhibitory link-strength is either zero or \(-|z|\), but is not changed directly during learning, in accordance with the above discussion. We may therefore consider two kinds of `excitatory link strengths'.

1.  

Active: an active \(w_{ij} > 0\) is positive and enforces \(z_{ij} = 0\) for the same link, via equation (5).

2.  

Inactive: an inactive \(w_{ij} < 0\) is slightly negative; we use \(w_{ij} = W_{\rm L}^{(\min)} < 0\) as a default. It enforces \(z_{ij} = -|z|\) for the same link, via equation (5), and does not contribute to the dynamics, since the excitatory links enter as \(\Theta(w_{ij})\) in (3).

When \(w_{ij}\) acquires, during learning, a positive value, the corresponding inhibitory link \(z_{ij}\) is turned off via equation (5) and the excitatory link \(w_{ij}\) determines the value of the respective term, \(f_w(\varphi_i)\,\Theta(w_{ij})\,w_{ij} + z_{ij}\,f_z(\varphi_j)\), in equation (3). We have used a small negative baseline of \(W_{\rm L}^{(\min)} = -0.01\) throughout the simulations.
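The switching between inhibitory and excitatory links can be sketched in a few lines. The code below evaluates the link term quoted above for a few values of the excitatory strength; the function names `inhibitory_strength` and `link_term` are illustrative, and the reservoir functions are set to 1 by default for simplicity.

```python
Z_ABS = 1.0      # |z|, the value used throughout the paper
W_L_MIN = -0.01  # negative baseline of inactive excitatory links


def theta(r):
    """Heaviside step function with theta(0) = 0."""
    return 1.0 if r > 0 else 0.0


def inhibitory_strength(w_ij):
    """Equation (5): inhibition is present only if the link is not excitatory."""
    return -Z_ABS * theta(-w_ij)


def link_term(w_ij, f_w_i=1.0, f_z_j=1.0):
    """Contribution of one link to the growth rate r_i, per unit activity x_j."""
    return f_w_i * theta(w_ij) * w_ij + inhibitory_strength(w_ij) * f_z_j


for w_ij in (W_L_MIN, 0.005, 0.12):
    print(w_ij, inhibitory_strength(w_ij), link_term(w_ij))
```

Even a very small positive value of the excitatory strength removes the full inhibition of the link, which is why the sign of the strength, rather than its magnitude, decides whether two sites can belong to the same clique.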

3.3. Short-term memory dynamics

It is reasonable to have a maximal possible value \(W_{\rm S}^{(\max)}\) for the short-term synaptic plasticities. We consider therefore the following Hebbian-type learning rule:

Equation (8)

\(w_{ij}^{\rm S}(t)\) increases rapidly, with rate \(\Gamma_{\rm S}^+\), when both the pre- and the post-synaptic neural centres are active, viz when their respective activities are above \(x_{\rm c}\). Otherwise it decays to zero, with a rate \(\Gamma_{\rm S}^-\). The coupling functions \(f_z(\varphi)\) preempt prolonged self-activation of the short-term memory. When the pre- and the post-synaptic centres are active long enough to deplete their respective reservoir levels, the short-term memory is shut off via the \(f_z(\varphi)\). We have used \(\Gamma_{\rm S}^+ = 0.1\), \(\Gamma_{\rm S}^- = 0.0005\), \(W_{\rm S}^{(\max)} = 0.02\) and \(x_{\rm c} = 0.85\) throughout the simulations.
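The qualitative behaviour described above can be sketched numerically. Since equation (8) is not reproduced in this text, the rule implemented below is only an assumed form that matches the verbal description: saturating growth towards the maximal value when both centres are active, gated by the reservoir couplings, plus a slow decay (taken here to act at all times, which is an assumption). The function names `dw_short` and `evolve`, the step size and the protocol are illustrative.

```python
GAMMA_S_PLUS, GAMMA_S_MINUS = 0.1, 0.0005  # rates quoted in the text
W_S_MAX, X_C = 0.02, 0.85                  # values quoted in the text


def theta(r):
    """Heaviside step function with theta(0) = 0."""
    return 1.0 if r > 0 else 0.0


def dw_short(w_s, x_i, x_j, fz_i, fz_j):
    """Assumed Hebbian-type rule for the short-term link strength."""
    both_active = theta(x_i - X_C) * theta(x_j - X_C)
    growth = GAMMA_S_PLUS * (W_S_MAX - w_s) * fz_i * fz_j * both_active
    decay = GAMMA_S_MINUS * w_s
    return growth - decay


def evolve(w_s, x_i, x_j, fz_i, fz_j, t_total, dt=0.1):
    """Explicit Euler integration over a period with fixed activities."""
    for _ in range(int(round(t_total / dt))):
        w_s += dt * dw_short(w_s, x_i, x_j, fz_i, fz_j)
    return w_s


w_on = evolve(0.0, 1.0, 1.0, 1.0, 1.0, t_total=200.0)    # both centres active
w_off = evolve(w_on, 0.0, 0.0, 1.0, 1.0, t_total=200.0)  # activity removed
print(w_on, w_off)
```

In this sketch the link saturates just below the maximal value on a timescale of the inverse growth rate and relaxes only slowly once the activity is gone, reflecting the separation between the two rates.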

In figure 6 we present the time evolution of some selected \(w_{ij}^{\rm S}(t)\), for a simulation using the network illustrated in figure 2. The short-term memory is activated in three cases.

1.  

When an existing clique, viz a clique encoded in the long-term memory \(w_{ij}^{\rm L}\), is activated, as is the case for (0, 1) in the data presented in figure 6, the respective intra-clique \(w_{ij}^{\rm S}\) are also activated. This behaviour is a side effect since, for the parameter values chosen here, the magnitudes of the short-term link-strengths are substantially smaller than those of the long-term link-strengths.

2.  

During the transient-state dynamics there is a certain overlap of a currently active clique with the subsequent active clique. For this short time span the short-term plasticities \(w_{ij}^{\rm S}\) for synapses linking these two cliques get activated. An example is the link (2, 4) for the simulation presented in figure 6.

3.  

When external stimuli act on two sites not connected by an excitatory long-term memory link \(w_{ij}^{\rm L}\), the short-term plasticity \(w_{ij}^{\rm S}\) makes a qualitative difference. It transiently stabilizes the corresponding link and the respective link becomes a new clique (i, j) either by itself, or as part of an enlarged and already existing clique. An example is the link (3, 6) for the simulation presented in figure 6. Note however that, without subsequent transferral into the long-term memory, these new states would disappear with a rate \(\Gamma_{\rm S}^-\) once the external stimulus causing them is gone.

The last point is the one of central importance, as it allows for temporal stabilization of new patterns present in the sensory input stream.

Figure 6. The time evolution of the short-term memory, for some selected links w_{i,j}^{\rm S} and the network illustrated in figure 2, without the link (3, 6). The transient states are (0, 1) → (4, 5, 6) → (1, 2, 3) → (3, 6) → (0, 6) → (0, 1). An external stimulus at sites (3) and (6) acts for t in [400, 410] with strength b^{\rm (ext)} = 3.6.

3.4. Long-term memory dynamics

Information processing dynamical systems retain their functionalities only when they keep their dynamical properties within certain regimes; they need to regulate their own working point. For the type of systems discussed here, exhibiting transient-state dynamics, the working point is, as discussed in the Introduction section, defined as the time Δt the system needs for a transition from one quasi-stationary state to the subsequent one, relative to the length \bar t of the individual quasi-stationary states, which is given by 1/\Gamma_\varphi^-.

The cognitive information processing within neural networks occurs on short to intermediate timescales. For these processes to work well, the mean overall synaptic plasticities, viz the average strength of the long-term memory links w_{ij}^{\rm L}, need to be regulated homeostatically. The average magnitude of the growth rates r_i, see equation (3), determines the time Δt needed to complete a transition from one winning clique to the next transient state. It therefore constitutes a central quantity regulating the working point of the system, since \bar t\sim 1/\Gamma_\varphi^- is fixed: the reservoir depletion rate \Gamma_\varphi^- is not affected by learning processes, which affect exclusively the inter-neural synaptic strengths.

The bare growth rates r_i(t) are quite strongly time-dependent, due to the time-dependence of the post-synaptic reservoirs entering the reservoir function f_w(\varphi_i), see equation (3). The effective incoming synaptic signal strength

Equation (9)

which is independent of the post-synaptic reservoir \varphi_i, is a more convenient local control parameter. The working point of the cognitive system is optimal when the effective incoming signal is, on the average, of comparable magnitude r^{\rm (opt)} for all sites,

\[\tilde r_i\ \to\ r^{\rm (opt)}. \qquad (10)\]

The long-term memory has two tasks: to extract and encode patterns present in the external stimulus, equation (6), via unsupervised learning and to keep the working point of the dynamical system in its desired range. Both tasks can be achieved by a single local learning rule,

Equation (11)

Equation (12)

where \Delta \tilde r_i\,=\,r^{\rm (opt)}-\tilde r_i\;. For the numerical simulations we used \Gamma_{\rm L}^{\rm (opt)}=0.0008, W_{\rm L}^{\rm (min)}=-0.01 and r^{\rm (opt)}=0.2. We now comment on some properties of these evolution equations for w_{ij}^{\rm L}(t).

1. Hebbian learning. The learning rule (11) is local and of Hebbian type. Learning occurs only when the pre- and the post-synaptic neurons are active, viz when their respective activity levels are above the threshold x_{\rm c}. Weak forgetting, i.e. the decay of seldom used links, is governed by (12). The function d(w_{ij}^{\rm L}) determines the functional dependence of forgetting on the actual synaptic strength; we have used d(w_{ij}^{\rm L})=\theta(w_{ij}^{\rm L})w_{ij}^{\rm L} for simplicity.

2. Synaptic competition. When the effective incoming signal \tilde{r}_i is weak (strong) relative to the optimal value r^{\rm (opt)}, the active links are reinforced (weakened), with W_{\rm L}^{\rm (min)} being the minimal value for the w_{ij}^{\rm L}. The baseline W_{\rm L}^{(\min)} is slightly negative, compare figures 3 and 7. The Hebbian-type learning then takes place in the form of a temporal competition among incoming synapses: frequently active incoming links will gain strength, on average, at the expense of rarely used links.

3. Fast learning of new patterns. In figure 7 the time evolution of some selected w_{ij}^{\rm L} is presented. A simple input pattern is learned by the network. In this simulation the learning parameter \Gamma_{\rm L}^{\rm (opt)} has been set to a quite large value such that learning occurs in one step (fast learning).

4. Suppression of runaway synaptic growth. When a neural network is exposed repeatedly to the same, or to similar, external stimuli, unsupervised learning generally leads to uncontrolled growth of the involved synaptic strengths. This phenomenon, termed `runaway synaptic growth', can also occur in networks with continuous self-generated activities, when similar activity patterns are auto-generated over and over again. Both kinds of synaptic runaway growth are suppressed by the proposed link-dynamics (11).

5. Negative baseline. Note that w_{ij}=w_{ij}^{\rm S}+w_{ij}^{\rm L} enters the evolution equation (3) as \theta(w_{ij}). We can therefore distinguish between active (w_{ij} > 0) and inactive (w_{ij} < 0) configurations, compare figure 3. The negative baseline W_{\rm L}^{\rm (min)} < 0 entering (11) then allows for the removal of positive links and provides a barrier against small random fluctuations, compare section 3.2.
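The properties listed above can be collected into a short sketch. The expressions below are one possible form consistent with the verbal description (Hebbian gating by x_{\rm c}, reinforcement or weakening according to the sign of \Delta\tilde r_i, a weakening term that vanishes at W_{\rm L}^{\rm (min)}, and forgetting proportional to d(w)=\theta(w)w); they are an assumption and should not be read as a transcription of equations (11) and (12). The function names are hypothetical, and the conditions under which forgetting is applied are deliberately left out, since they are not spelled out in the text above.

```python
GAMMA_L_OPT = 0.0008  # learning rate Gamma_L^(opt)
GAMMA_L_MINUS = 0.1   # forgetting rate Gamma_L^- (value used for figures 7 and 8)
W_L_MIN = -0.01       # negative baseline W_L^(min)
R_OPT = 0.2           # optimal effective incoming signal r^(opt)
X_C = 0.85            # activity threshold x_c


def hebbian_term(w_l, r_tilde_i, x_i, x_j):
    """Assumed working-point term for the link j -> i (rate of change of w_l)."""
    if not (x_i > X_C and x_j > X_C):
        return 0.0  # Hebbian gating: both sites must be active
    delta = R_OPT - r_tilde_i
    if delta > 0:
        # Incoming signal too weak: reinforce the active link.
        return GAMMA_L_OPT * delta
    # Incoming signal too strong: weaken, with the drive vanishing at W_L_MIN.
    return GAMMA_L_OPT * delta * (w_l - W_L_MIN)


def forgetting_term(w_l):
    """Decay with d(w) = theta(w) * w: only positive links are forgotten."""
    return -GAMMA_L_MINUS * w_l if w_l > 0 else 0.0
```

In this form a link can never be pushed below W_{\rm L}^{\rm (min)} by the weakening term, which is one simple way of realising the negative baseline of point 5.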

During a transient state we have x_i → 1 for all vertices belonging to the winning coalition and x_j → 0 for all out-of-clique sites, leading to

\[\tilde{r}_i \approx\ \sum_{j} w_{i,j},\qquad j\in\mbox{active\ sites},\]

compare equation (9). The working-point optimization rule (10), \tilde r_i \to r^{\rm (opt)}, is therefore equivalent to a local normalization condition enforcing the sum of active incoming link-strengths to be constant, i.e. site-independent. This rule is closely related to a mechanism of self-regulation of the average firing rate of cortical neurons proposed by Bienenstock et al [28].
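The normalization condition can be checked numerically for a toy link matrix. The sketch below evaluates the sum of incoming link-strengths from the active sites, as in the approximation above; the 7-site matrix, the uniform link value and the helper name `active_incoming_sum` are hypothetical choices made for illustration only.

```python
import numpy as np

R_OPT = 0.2  # optimal effective incoming signal r^(opt)


def active_incoming_sum(w, active_sites):
    """Sum of w[i, j] over the active sites j, for every site i."""
    active = np.zeros(w.shape[0], dtype=bool)
    active[list(active_sites)] = True
    return w[:, active].sum(axis=1)


# Hypothetical 7-site example: clique (4, 5, 6) with uniform intra-clique links.
n_sites = 7
clique = (4, 5, 6)
w = np.zeros((n_sites, n_sites))
for i in clique:
    for j in clique:
        if i != j:
            w[i, j] = R_OPT / (len(clique) - 1)  # 0.1 per link

r_tilde = active_incoming_sum(w, clique)
```

In this toy setting each member of the three-site clique receives two links of strength 0.1, so that the condition \tilde r_i = r^{\rm (opt)} = 0.2 is met. A two-site clique would need a single link of strength 0.2 to reach the same value.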

Figure 7. The time evolution of the long-term memory, for some selected links w_{i,j}^{\rm L} and the network illustrated in figure 2, without the link (3, 6). The transient states are (0, 1) → (4, 5, 6) → (1, 2, 3) → (3, 6) → (0, 6) → (0, 1). An external stimulus at sites (3) and (6) acts for t in [400, 410] with strength b^{\rm (ext)} = 3.6. The stimulus pattern (3, 6) has been learned by the system, as w_{3,6}^{\rm L} and w_{6,3}^{\rm L} turned positive during the learning interval ≈[400, 460]. The learning interval is substantially longer than the bare stimulus length due to activation of the short-term memory. The decay of certain w_{ij}^{\rm L} in the absence of an external stimulus is due to forgetting (12), which should normally be a very weak effect, but for which a sizeable \Gamma_{\rm L}^-=0.1 has been chosen here for illustrative purposes.

3.5. Online learning

The neural network we consider here is continuously active, independent of whether there is sensory input via equation (6) or not, the reason being that the evolution equations (2) and (4) generate a never-ending time series of transient states. It is a central assumption of the present study that continuous and self-generated neural activity is a condition sine qua non for modelling overall brain activity or for developing autonomous cognitive systems [3].

The evolution equations for the synaptic plasticities, namely (8) for the short-term memory and (11) for the long-term memory, are part of the dynamical system, viz they determine the time evolution of w_{ij}^{\rm S}(t) and of w_{ij}^{\rm L}(t) at all times, irrespective of whether external stimuli are presented to the network via (6) or not. The evolution equations for the synaptic plasticities therefore need to fulfil, quite in general for a continuously active neural network, two conditions.

(a) Under training conditions, namely when input patterns are presented to the system via (6), the system should be able to modify the synaptic link-strengths accordingly, such that the training patterns are stored in the form of new memories, viz cliques representing attractor ruins and leading to quasi-stationary states.

(b) In the absence of input the ongoing transient-state dynamics will lead constantly to synaptic modifications, via (8) and (11). These modifications must not induce qualitative changes, such as the autonomous destruction of existing memories or the spontaneous generation of spurious new memories. New memories should be acquired exclusively via training by external stimuli.

In the following section we will present simulations in order to investigate these points. We find that the evolution equations formulated in this study conform with both conditions (a) and (b) above, due to the optimization principle for the long-term synaptic plasticities in equation (11).

4. Simulations

We have performed extensive simulations of the dynamics of the network with ongoing learning, for systems with up to several thousands of sites. We found that the dynamics remains long-term stable even in the presence of continuous online learning governed by equations (8) and (11), exhibiting semi-regular sequences of winning coalitions, as shown in figure 2. The working point is regulated adaptively and no prolonged periods of stasis or trapped states were observed in the simulations; neither did periods of rapid or uncontrolled oscillations occur.

Any system with a finite number of sites N and a finite number of cliques settles in the end, in the absence of external signals, into a cyclic series of transient states. Preliminary investigations of systems with N ≈ 20–100 resulted in cycles spanning on the average a finite fraction of the set of all cliques encoded by the network. This is a notable result, since the overall number of cliques stored in the network can easily be orders of magnitude larger than the number of sites N itself, compare equation (1). Detailed studies of the cyclic behaviour for autonomous networks will be presented elsewhere.

4.1. Learning of new memories

Training patterns \{p_1,\ldots,p_Z\} presented to the system externally via r_i \to r_i +f_w(\varphi_i)b_i^{\rm (ext)}(t), for i\in\{p_1,\ldots,p_Z\}, are learned by the network via activation of the short-term memory for the corresponding intra-pattern links. In figures 6 and 7 we present a case study. The ongoing internal transient-state dynamics is interrupted at time t = 400 by an external signal which activates the short-term memory, see figure 6. Note that the short-term memory is activated both by external stimuli and internally whenever a given link becomes active, i.e. when both pre- and post-synaptic sites are active simultaneously. The internal activation does not, however, lead to the internal generation of spurious memories, since internally activated links belong in any case to one or more already existing cliques.

In figure 7 we present the time development of the respective long-term synaptic modifications, w_{ij}^{\rm L}(t). The parameters for learning chosen here allow for fast learning; the pattern corresponding to the external signal, retained temporarily in the short-term memory, is memorized in one step, viz the corresponding w_{36}^{\rm L}(t) becomes positive before the transition to the next clique takes place. For practical applications smaller learning rates might be more suitable, as they allow the learning of spurious signals generated by environmental noise to be avoided.

In table 1 we present the results for the learning of two networks with N = 20 and N = 100 from scratch. The initial networks contained only two connected cliques, in order to allow for a nontrivial initial transient-state dynamics; all other links were inhibitory. Learning by training and storage of the externally presented patterns, using the same parameters as for figures 6 and 7, is nearly perfect. In our tests the learning rate could be chosen over a very wide range. Here the training phase was completed, for the 100-site network, by t=5\times10^4. Coming back to the discussion in section 3.5, we then conclude that the network fulfils condition (a) formulated there, being able to store training patterns efficiently as attractor ruins in the form of cliques.

Table 1. Learning results for systems with N sites, N_links excitatory links and N_2, ..., N_6 cliques containing 2, ..., 6 sites. N_tot is the total number of memories to be learned. N_l and N_par denote the number of memories learned completely and partially, respectively.

N    N_links  N_2  N_3  N_4  N_5  N_6  N_tot  N_l  N_par
20   104      1    10   42   11   1    65     60   3
100  901      26   563  122  2    0    713    704  7
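As a simple consistency check, the entries of table 1 can be summed and converted into learning fractions. The sketch below uses only the numbers from the table; reading N_tot - N_l - N_par as the number of memories not learned at all is an interpretation made here, not a statement from the table caption.

```python
# Rows of table 1: clique counts N_2..N_6, total, learned completely/partially.
rows = {
    20: {"cliques": [1, 10, 42, 11, 1], "n_tot": 65, "n_l": 60, "n_par": 3},
    100: {"cliques": [26, 563, 122, 2, 0], "n_tot": 713, "n_l": 704, "n_par": 7},
}

summary = {}
for n_sites, row in rows.items():
    # The clique counts should add up to the total number of memories.
    assert sum(row["cliques"]) == row["n_tot"]
    fraction_learned = row["n_l"] / row["n_tot"]
    remaining = row["n_tot"] - row["n_l"] - row["n_par"]
    summary[n_sites] = (fraction_learned, remaining)

for n_sites, (fraction, remaining) in summary.items():
    print(f"N = {n_sites}: {fraction:.1%} learned completely, {remaining} remaining")
```

For both networks the clique counts add up to N_tot, and more than 92% (N = 20) and 98% (N = 100) of the memories are learned completely.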

4.2. Link-asymmetry

We note that the Hebbian learning via the working-point optimization, equation (11), leads to the spontaneous generation of asymmetries in the link matrices, viz to w_{ij}^{\rm L}\ne w_{ji}^{\rm L}, since the synaptic plasticity depends on the post-synaptic growth rates.

In figure 8 we present, for two simulations, the distribution of the link-asymmetry w_{ij}^{\rm L}-w_{ji}^{\rm L} for all positive w_{ij}^{\rm L}, for the 100-site network of table 1, at time t=5\times10^5. The distributions shown in figure 8 are particular realizations of steady-state distributions, viz they did not change appreciably for wide ranges of total simulation times.

(i) In the first simulation the network had been trained from scratch. The set of 713 training patterns was presented to the network for t\in[0,5\times10^4]. After that, for t\in[5\times10^4,5\times10^5], the system evolved freely. A total of 3958 transient states had been generated at time t=5\times10^5, but the system had nevertheless not yet settled into a cycle of transient states, due to the ongoing synaptic optimization, equations (8) and (11). There were 661 cliques remaining at t=5\times10^5, as the link competition had led to the suppression of some seldom used links.

(ii) In the second simulation, uniform and symmetric starting excitatory links w_{ij}^{\rm L}\to 0.12 had been set by hand at t = 0, for all intra-clique links. The same N = 100 network as in (i) was used and the simulation ran in the absence of external stimuli. All 713 cliques were still present at t=5\times10^5, despite the substantial reorganization of the link-strength distribution, from the initial uniform to the stationary distribution shown in figure 8. A total of 4123 transient states had been generated in the course of the simulation, without the system entering into a cycle.

For both simulations all evolution equations, namely (2) and (4) for the activities and reservoir levels, as well as (8) for the short-term memory and (11) and (12) for the long-term memory, determined the dynamics for all times t\in[0,5\times 10^5]. The difference between (i) and (ii) lies in the way the memories are determined: via training by external stimuli, equation (6), as in (i), or by hand as in (ii).
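The link-asymmetry statistic used for figure 8 is straightforward to evaluate once a long-term link matrix is available. The following is a minimal sketch; the 3 × 3 matrix is a made-up example, not simulation data, and the helper name `link_asymmetry` is hypothetical.

```python
import numpy as np


def link_asymmetry(w_l):
    """Return w_ij - w_ji for all off-diagonal pairs with w_ij > 0."""
    diff = w_l - w_l.T
    off_diagonal = ~np.eye(w_l.shape[0], dtype=bool)
    return diff[(w_l > 0) & off_diagonal]


# Made-up long-term link matrix with a slightly negative inhibitory baseline.
w_l = np.array([
    [0.00, 0.12, -0.01],
    [0.10, 0.00, 0.05],
    [-0.01, 0.08, 0.00],
])

asym = link_asymmetry(w_l)
counts, edges = np.histogram(asym, bins=5)  # binned distribution of asymmetries
```

Each positive link contributes one entry, so a pair of mutually excitatory links appears twice, with opposite signs. A perfectly symmetric matrix would give a distribution concentrated at zero.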

Figure 8. The link-asymmetry w_{ij}^{\rm L}-w_{ji}^{\rm L} for the positive w_{ij}^{\rm L} for a 100-site network with 713 cliques at time t=5\times10^5, corresponding to about 4500 transient states. Left panel: after learning from scratch. Training was finished at t\approx5\times10^4. Right panel: starting with w_{i,j}^{\rm L}\to 0.12 for all links belonging to one or more cliques.

Comparing the two link distributions shown in figure 8, we note the overall similarity, a consequence of the continuously acting working-point optimization. The main differences turn up for small link-strengths, since these two simulations started from opposite extremes (vanishing/strong initial excitatory links). The details of the link distribution shown in figure 8 depend sensitively on the parameters. For the results shown in figure 8 we used, for illustrative purposes, \Gamma_{\rm L}^-=0.1, which is a very large value for a parameter regulating weak forgetting. We also performed simulations with \Gamma_{\rm L}^-=0, the other extreme, and found that the link-asymmetry distribution was somewhat more scattered.

Coming back to the discussion in section 3.5, we then conclude that the network fulfils condition (b) formulated there, since essentially no memories acquired during the training phase were destroyed, and no spurious new memories were spontaneously created, during the subsequent free evolution.

5. Conclusions

We have investigated a series of issues regarding neural networks with autonomously generated transient-state dynamics. We have presented a general method allowing us to transform an initial attractor network into a network capable of generating an infinite time series of transient states. The resulting dynamical system has strictly contracting phase space, with a one-to-one adiabatic correspondence between the transient states and the attractors of the original network.

We have then discussed the problem of homeostasis, namely the need for the system to regulate its own working point adaptively. We formulated a simple learning rule for unsupervised local Hebbian-type learning, which solves the homeostasis problem. We note here that this rule, equation (11), is similar to learning rules shown to optimize the overall storage capacity for discrete-time neural networks [22].

We have studied a continuous-time neural network model using clique encoding and showed that this model is very suitable for studying transient-state dynamics in conjunction with ongoing learning-on-the-fly for a wide range of learning conditions. Both fast and slow online learning of new memories are compatible with the transient-state dynamics self-generated by the network.

Finally we turn to the interpretation of the transient-state dynamics. Examination of a typical time series of subsequently activated cliques, such as the one shown in figure 2, reveals that the sequence of cliques is not random. Every single clique is connected to its predecessor via excitatory links; they are said to be `associatively' connected [29]. The sequence of subsequently active cliques can therefore be viewed, cum grano salis, as an `associative thought process' [29]. The possible use of such processes for cognitive information processing, however, still needs to be investigated.

References

[1] Katok A, Hasselblatt B and Mendoza L 1995 Introduction to the Modern Theory of Dynamical Systems (Cambridge: Cambridge University Press)
[2] Hopfield J J 1982 Neural networks and physical systems with emergent collective computational abilities Proc. Natl Acad. Sci. USA 79 2554
[3] Gros C 2007 Autonomous dynamics in neural networks: the dHAN concept and associative thought processes Cooperative Behaviour in Neural Systems (Ninth Granada Lectures) ed P L Garrido, J Marro and J J Torres AIP Conf. Proc. 887 129–138
[4] Abeles M et al 1995 Cortical activity flips among quasi-stationary states Proc. Natl Acad. Sci. USA 92 8616
[5] Kenet T, Bibitchkov D, Tsodyks M, Grinvald A and Arieli A 2003 Spontaneously emerging cortical representations of visual attributes Nature 425 954
[6] Ringach D L 2003 States of mind Nature 425 912
[7] Hutt A and Riedel H 2003 Analysis and modelling of quasi-stationary multivariate time series and their application to middle latency auditory evoked potentials Physica D 177 203
[8] Freeman W J 2003 Evidence from human scalp electroencephalograms of global chaotic itinerancy Chaos 13 1067
[9] Kaneko K and Tsuda I 2003 Chaotic itinerancy Chaos 13 926
[10] Timme M, Wolf F and Geisel T 2002 Prevalence of unstable attractors in networks of pulse-coupled oscillators Phys. Rev. Lett. 89 154105
[11] Treves A 2005 Frontal latching networks: a possible neural basis for infinite recursion Cogn. Neuropsych. 22 276
[12] Kropff E and Treves A 2006 The complexity of latching transitions in large scale cortical networks Nat. Comput. doi:10.1007/s11047-006-9019-3
[13] Rabinovich M, Volkovskii A, Lecanda P, Huerta R, Abarbanel H D I and Laurent G 2001 Dynamical encoding by networks of competing neuron groups: winnerless competition Phys. Rev. Lett. 87 068102
[14] Seliger P, Tsimring L S and Rabinovich M I 2003 Dynamics-based sequential memory: winnerless competition of patterns Phys. Rev. E 67 011905
[15] Rabinovich M I, Varona P, Selverston A I and Abarbanel H D I 2006 Dynamical principles in neuroscience Rev. Mod. Phys. 78 1213
[16] Sompolinsky H and Kanter I 1986 Temporal association in asymmetric neural networks Phys. Rev. Lett. 57 2861
[17] Horn D and Usher M 1989 Neural networks with dynamical thresholds Phys. Rev. A 40 1036
[18] Metzler R, Kinzel W, Ein-Dor L and Kanter I 2001 Generation of unpredictable time series by a neural network Phys. Rev. E 63 056126
[19] Paula D R, Araújo A D, Andrade J S, Herrmann H J and Gallas J A C 2006 Periodic neural activity induced by network complexity Phys. Rev. E 74 017102
[20] Arbib M A 2002 The Handbook of Brain Theory and Neural Networks (Cambridge, MA: MIT Press)
[21] Chechik G, Meilijson I and Ruppin E 2001 Effective neuronal learning with ineffective Hebbian learning rules Neural Comput. 13 817
[22] Okada M 1996 Notions of associative memory and sparse coding Neural Netw. 9 1429
[23] Amit D J, Gutfreund H and Sompolinsky H 1985 Storing infinite numbers of patterns in a spin-glass model of neural networks Phys. Rev. Lett. 55 1530
[24] von der Malsburg C and Schneider W 1986 A neural cocktail-party processor Biol. Cybern. 54 29
[25] Gray C M, König P, Engel A K and Singer W 1989 Oscillatory responses in cat visual cortex exhibit inter-columnar synchronization which reflects global stimulus properties Nature 338 334
[26] Huerta R and Rabinovich M 2004 Reproducible sequence generation in random neural ensembles Phys. Rev. Lett. 93 238104
[27] Rabinovich M I, Huerta R, Varona P and Afraimovich V S 2006 Generation and reshaping of sequences in neural systems Biol. Cybern. 95 519
[28] Bienenstock E L, Cooper L N and Munro P W 1982 Theory for the development of neuron selectivity: orientation specificity and binocular interaction in visual cortex J. Neurosci. 2 32
[29] Gros C 2005 Self-sustained thought processes in a Dense Associative Network (Springer Lecture Notes in Artificial Intelligence (KI2005) vol 3698) p 375 (Preprint q-bio.NC/0508032)

Notes

Note 2. The reservoir functions have the form of generalized Fermi functions. A possible mathematical implementation for f_{\alpha}(\varphi), with \alpha = w, z, which we used, is

\[f_{\alpha}(\varphi)\ =\ f_{\alpha}^{(\min)}+\left(1-f_\alpha^{(\min)}\right)\frac{{\rm atan}[(\varphi-\varphi_{\rm c}^{(\alpha)})/\Gamma_\varphi] -{\rm atan}[(0-\varphi_{\rm c}^{(\alpha)})/\Gamma_\varphi]}{{\rm atan}[(1-\varphi_{\rm c}^{(\alpha)})/\Gamma_\varphi] -{\rm atan}[(0-\varphi_{\rm c}^{(\alpha)})/\Gamma_\varphi]},\]

with \varphi_{\rm c}^{(z)}=0.15, \varphi_{\rm c}^{(w)}=0.7, \Gamma_\varphi=0.05, f_w^{\rm (min)}=0.1 and f_z^{\rm (min)}=0.
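The formula of this note translates directly into code. The sketch below implements it with the parameter values quoted above; the function name `reservoir_function` and the dictionary layout are hypothetical choices made here.

```python
import math

GAMMA_PHI = 0.05  # width parameter Gamma_phi

# (phi_c, f_min) for the two reservoir functions f_w and f_z
PARAMS = {"w": (0.7, 0.1), "z": (0.15, 0.0)}


def reservoir_function(phi, alpha):
    """Generalized Fermi function f_alpha(phi) for alpha in {'w', 'z'}."""
    phi_c, f_min = PARAMS[alpha]
    lower = math.atan((0.0 - phi_c) / GAMMA_PHI)
    upper = math.atan((1.0 - phi_c) / GAMMA_PHI)
    # Rescaled arctangent: 0 at phi = 0 and 1 at phi = 1.
    scaled = (math.atan((phi - phi_c) / GAMMA_PHI) - lower) / (upper - lower)
    return f_min + (1.0 - f_min) * scaled


for phi in (0.0, 0.15, 0.7, 1.0):
    print(phi, reservoir_function(phi, "w"), reservoir_function(phi, "z"))
```

By construction f_{\alpha}(0) = f_{\alpha}^{(\min)} and f_{\alpha}(1) = 1, with a smooth step of width \Gamma_\varphi centred at \varphi_{\rm c}^{(\alpha)}.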
