Learning capacity and function of stochastic reaction networks

Biochemical reaction networks are expected to encode an efficient representation of the function of cells in a variable environment. It is thus important to understand how these networks learn and implement such representations. A first step in this direction is to characterize the function and learning capabilities of basic artificial reaction networks. In this study, we consider multilayer networks of reversible reactions that connect two layers of signal and response species through an intermediate layer of hidden species. We introduce a stochastic learning algorithm that updates the reaction rates based on the correlations between the reaction products and the responses. Our findings indicate that the function of networks with random reaction rates, as well as their learning capacity for random signal-response activities, is critically determined by the numbers of reactants and reaction products. Moreover, the stored patterns exhibit different levels of robustness and quality as the reaction rates deviate from their optimal values in a stochastic model of defect evolution. These findings can help to identify network modules that are better suited to specific functions, such as amplifiers or dampeners, or to the learning of biologically relevant signal-response activities.


Introduction
Chemical transformations play a central role in the field of chemistry and encompass a vast array of diverse chemical reactions [1]. In living organisms, biochemical reaction networks are responsible for a wide range of biological functions, such as metabolism and signal transduction [2]. Decoding the design principles underlying these complex biochemical reactions is required for life-inspired reaction engineering and understanding the emergence of life on Earth.
A fundamental feature of biochemical reaction networks is the coexistence of complexity and robustness in these self-organizing, self-regulating systems [3]. Homeostasis is a recurring theme in biology, with a long history going back to Claude Bernard, who emphasized the importance of maintaining 'le milieu intérieur' [4]. In biological organisms, regulated variables often robustly adapt to environmental perturbations [5]. Furthermore, robustness is often a desired feature of engineered reaction networks [6]. It is thus important to see how chemical networks can learn and implement representations of a given function.
Hebb's rule is a natural mechanism of learning in (artificial) neural networks, especially in models of associative memory [7]. The back-propagation algorithm provides an efficient strategy for training deep (multilayer) feed-forward networks [8,9]. The function and learning performance of these networks are well studied, and many powerful mathematical tools and concepts have been developed; see e.g. [10-19] and references therein. Regarding reaction networks, we know that stochastic reaction networks are Turing universal and can compute any computable function with any nonzero error probability [20-22]. There are many interesting works that use chemical species and reactions as computational resources to implement the basic gates and circuits needed for universal computation [23-26]. An important problem here is the reconstruction of a reaction network, or the learning of the reaction rates, that reproduces a set of empirical or expected fluxes [27-39]. It is usually easier in this setting to work with deterministic differential equations (or their steady states) in the thermodynamic limit, ignoring the intrinsic fluctuations and correlations in the species numbers of a finite system.
Currently, our understanding of the learning capacity of stochastic chemical reactions is limited. Previous research into the function and learning of reaction networks has mostly focused on deterministic models in the thermodynamic limit, where the number of species and the volume approach infinity while the concentrations remain finite. In this limit, the dynamics of the system is described by a set of coupled ordinary differential equations for the average concentrations, assuming for instance mass-action or Michaelis-Menten kinetics. A mathematical treatment of the space of solutions and functions that can be exhibited by these equations is presented in [40]. The computational power of deterministic reaction systems is demonstrated in [20]. Similarities with neural networks were exploited in [37-39] to map the dynamics of the two systems onto each other and to implement different neural circuits with a reaction network. An elementary cellular signalling pathway is trained in [27] to display a specific signal-response pattern. The learning of linearly separable and inseparable problems is simulated in [30,34] using only chemical species and reactions.
The first step in studying the learning capacity of stochastic chemical reactions is to characterize the function and learning capabilities of basic artificial reaction networks. In this article, we are interested in stochastic reaction networks in finite systems, where fluctuations are not negligible. These networks can compute any computable function with a nonzero error probability [21,22]. The function of random neural networks and deterministic reaction networks is studied in [14-16,41]. Here we carry out a similar study for random networks of stochastic reactions. Efficient simulation of stochastic reaction networks is in general a computationally hard task, but good approximate algorithms exist for simulating these systems [42-45]. The learning capacity of multilayered neural networks has been studied in [10-12] using the methods of statistical physics. In this work, we investigate the learning performance of stochastic reaction networks, which, as far as we are aware, has not been studied before. The stability or robustness of deterministic reaction networks in a changing environment is discussed in [4,46,47]. In [48], the difference between robustness in deterministic and stochastic networks is illustrated. Here, we examine how robust the patterns stored in a stochastic reaction network are to mutations in the reaction rates, using a simple model of disease evolution introduced in [49].
Here, we focus on simple networks of reversible reactions and provide a proof-of-concept analysis of the learning process in stochastic chemical reaction networks. We estimate the learning capacity and its dependence on reaction properties and network topology. We also assess the quality and robustness of the stored patterns. In the following, we start with the model definitions in section 2 and present the results of numerical simulations in section 3. The results are reported in three subsections dealing with the networks' functions and capacities, the learning of random signal-response patterns, and the robustness of the stored patterns. Concluding remarks are given in section 4. Overall, our findings provide insights relevant to engineering reaction networks with desired learning capacities.

Preliminaries
We consider a well-stirred mixture of N interacting species connected by M reversible chemical reactions (i.e. 2M irreversible reactions). We write $0 \leq X_i(\tau) \leq X_{\max}$ for the number of molecules of species $i = 1, \ldots, N$ at time $\tau$. No conservation law for the number or mass of the molecules is assumed here. The maximum number $X_{\max}$ scales with the system volume V. In general, a reaction $a = 1, \ldots, 2M$ is represented by its stoichiometric coefficients $\nu_a^-(i)$ and $\nu_a^+(i)$. When the reaction occurs, the reactant/product species on the left/right hand side of the reaction are consumed/generated and $X_i$ changes by $\nu_a^+(i) - \nu_a^-(i)$. We use $\nu_a^-$ and $\nu_a^+$ for the sets of reactants and products of reaction a, respectively.
In the following, we assume that the coefficients $\nu_a^-(i), \nu_a^+(i) \in \{0, 1\}$. Moreover, we consider reactions that connect at most two distinct reactant species to two other distinct product species. A reaction of type Rmn involves m reactants and n products. In the reaction networks studied here, we have only one-to-one (R11), one-to-two (R12), two-to-one (R21), or two-to-two (R22) reactions. A reaction network is called an Rmn network if all its reactions are of type Rmn. The dynamics is modeled by a continuous-time Markov chain in which each reaction has a chance to occur in a small time interval $d\tau$ with a given probability [50]. The propensity function $\eta_a(X)$ times $d\tau$ gives the probability that reaction a occurs in the time interval $(\tau, \tau + d\tau)$ given the numbers of molecules $X(\tau)$. To be more precise, the propensity function depends only on the numbers of molecules of the species on the left-hand side of the reaction, $\nu_a^- = \{i : \nu_a^-(i) > 0\}$. In the mass-action kinetics with $\nu_a^-(i), \nu_a^+(i) \in \{0, 1\}$, we have
$$\eta_a(X) = \kappa_a \prod_{i \in \nu_a^-} x_i(\tau),$$
where $x_i(\tau) = X_i(\tau)/X_{\max}$ and the reaction rates $\kappa_a \propto X_{\max}$. The latter ensures that the average concentrations $\langle x_i \rangle$ are well defined in the thermodynamic limit $V, X_{\max} \to \infty$. In practice, and in numerical simulations, we work with finite and fixed values of V and $X_{\max}$. Specifically, in the following we set $V = X_{\max}$.
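As a concrete illustration, the following minimal Python sketch encodes an irreversible reaction by its reactant and product sets and evaluates the mass-action propensity defined above. The names `Reaction` and `propensity` are our own and not from the paper:

```python
from dataclasses import dataclass

import numpy as np

@dataclass
class Reaction:
    """An irreversible reaction with unit stoichiometric coefficients."""
    reactants: tuple[int, ...]  # species indices in nu_a^-
    products: tuple[int, ...]   # species indices in nu_a^+
    rate: float                 # kappa_a

def propensity(reaction: Reaction, X: np.ndarray, X_max: int) -> float:
    """Mass-action propensity eta_a = kappa_a * prod_{i in nu_a^-} x_i."""
    x = X / X_max  # concentrations x_i = X_i / X_max
    return reaction.rate * float(np.prod(x[list(reaction.reactants)]))
```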

Network
The definition of a reaction network comprises three components: the set of species or reactants (nodes), the reactions connecting the species (edges), and the reaction rates (couplings). In what follows, we consider multilayered reaction networks that start with one layer of signals and end with one layer of responses. Figure 1 depicts a Petri net representation of the reaction networks. Assuming that the signal in a biochemical reaction system is encoded in the concentrations of the signal species, let S be the set of signal variables, a subset of the species variables $I = \{i : i = 1, \ldots, N\}$. The signal variables are externally driven to maintain a fixed activity level that is independent of the other variables. This scenario can occur in the homeostatic regulation of cellular reaction networks or when a biochemical reaction system is manipulated with drugs. In addition, we have the disjoint subset of response variables $R \subset I$; these variables can appear on either side of a reaction. The numbers of signal and response variables are denoted by $N_S = |S|$ and $N_R = |R|$, respectively. Note that a (reversible) reaction in these networks only connects species in two consecutive layers. The signal and response species do not appear in the hidden layers, and each species corresponds to a single node in the network. $N_h$ denotes the number of variables in the hidden layers. The hidden species process (e.g. filter or amplify) and transfer the signals to the response species (e.g. from surface receptors to the nucleus of a cell). As in artificial neural networks, the hidden layers of species are expected to enhance the expressivity of multilayered reaction networks. We assume that the number of signals is larger than the number of responses, to model the redundancy of pathways to the responses. Considering these points, in the following we work with the largest networks of stochastic reactions that are computationally tractable with reasonable statistics. We shall see how the structure and the nature of the reactions determine the function of such reaction networks. A sketch of how a fully connected layer of such reactions can be constructed is given below.
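For concreteness, a fully connected layer of R21 reactions can be enumerated as in the following Python sketch, which reuses the hypothetical `Reaction` class from the previous sketch; the function name and the use of a NumPy random generator are our own assumptions:

```python
from itertools import combinations

import numpy as np

def fully_connected_r21(layer_in: list[int], layer_out: list[int],
                        kappa_max: float, rng: np.random.Generator) -> list[Reaction]:
    """All reversible R21 reactions between two layers, as pairs of
    irreversible reactions with independent uniform random rates."""
    reactions = []
    for i, j in combinations(layer_in, 2):      # two distinct reactants
        for k in layer_out:                     # one product
            forward = Reaction(reactants=(i, j), products=(k,),
                               rate=rng.uniform(0, kappa_max))
            backward = Reaction(reactants=(k,), products=(i, j),
                                rate=rng.uniform(0, kappa_max))
            reactions += [forward, backward]
    return reactions
```

Swapping the roles of reactants and products in the same loop yields the corresponding fully connected R12 layer.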

Dynamics
To simulate the time evolution of a stochastic reaction network, the stochastic simulation algorithm (or Gillespie algorithm) can be used [42]. Suppose that a reaction occurred at time $\tau$. We then need to know the time $\tau_a$ to the next reaction and its index a. Here we assume that the time interval $d\tau$ is small enough that at most one reaction can occur within it. This defines a Poisson process in which the probability of a reaction occurring is also independent of the other reactions. Thus, assuming the system is in state $X(\tau)$, the joint probability distribution of the two random variables $(\tau_a, a)$ can be expressed as
$$\rho(\tau_a, a) = \eta_a(X) e^{-\eta(X) \tau_a}, \qquad \eta(X) = \sum_{a=1}^{2M} \eta_a(X).$$
To generate samples of these random variables, the Gillespie algorithm operates as follows:
• compute the $\eta_a$ and $\eta$ given $X(\tau)$;
• extract $\tau_a$ and a from the probability distribution $\rho(\tau_a, a)$:
  * draw two independent random numbers $u_1, u_2$ from the uniform distribution in (0, 1);
  * then $\tau_a = \frac{1}{\eta} \log(1/u_1)$ and a is the smallest integer satisfying $\sum_{a'=1}^{a} \eta_{a'} > u_2 \eta$.
Note that the numbers of molecules of the externally driven (signal) species are fixed and do not follow the above dynamics even if they are involved in a reaction. Moreover, we impose the constraint $0 \leq X_i \leq X_{\max}$ for all species. We take $X_{\max} = 1000$ in all the following numerical simulations; this is on the order of the number of molecules present in a cell. When $X_{\max}$ (and hence the $X_i$) takes very small values, large finite-size fluctuations occur and the uncertainty in the number of molecules dominates. On the other hand, for very large values of $X_{\max}$ and extensive $X_i$, the relative importance of fluctuations around the mean values is negligible, and one recovers the results of deterministic models (differential equations) for the average concentrations. One expects qualitatively different behaviors in these two limits. Interesting behaviors are usually exhibited in the region separating the two regimes, where real cells function. Given an initial state $X(0)$ for the numbers of molecules and the magnitudes of the $\kappa_a$, the above algorithm generates a sequence of states in which a single reaction occurs at each step. An iteration of the dynamics is defined as 2M steps. $X(t)$ represents the state of the system, i.e. the number of molecules of each species, after t iterations. In the following, the number of iterations is chosen large enough to ensure that the system has reached the stationary state of the dynamics. Samples of $X(t)$ are then extracted to obtain, e.g. the average concentrations and the correlations between the concentrations of two species. Throughout the following sections, this algorithm is repeatedly employed to obtain this statistical information for a given realization of the reaction rates. A minimal implementation of one step is sketched below.
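The following self-contained Python sketch implements one Gillespie step as described above. It builds on the hypothetical `Reaction` and `propensity` helpers from section 2; the clamping of the driven signal species and of the bound $0 \leq X_i \leq X_{\max}$ follows the constraints stated in the text:

```python
import numpy as np

def gillespie_step(X, reactions, signal_species, X_max, rng):
    """Advance the state X by one reaction event; return the waiting time."""
    etas = np.array([propensity(r, X, X_max) for r in reactions])
    eta = etas.sum()
    if eta == 0.0:
        return np.inf  # no reaction can occur
    u1, u2 = rng.uniform(size=2)
    tau_a = np.log(1.0 / u1) / eta                       # exponential waiting time
    a = int(np.searchsorted(np.cumsum(etas), u2 * eta))  # smallest a with cumsum > u2 * eta
    r = reactions[a]
    for i in r.reactants:
        if i not in signal_species:  # externally driven species stay fixed
            X[i] = max(X[i] - 1, 0)
    for i in r.products:
        if i not in signal_species:
            X[i] = min(X[i] + 1, X_max)
    return tau_a
```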

Signal-response patterns
Let us consider a reaction network that displays responses $r^\mu$ in the presence of signals $s^\mu$. Here $\{s^\mu, r^\mu\}$ denote the expected activation values of the signal and response variables for a set of P patterns indexed by $\mu = 1, \ldots, P$. The signals take the specified values $s_i^\mu = 0, 1$ in pattern $\mu$; that is, for pattern $\mu$ the average concentration satisfies $|c_i^\mu - s_i^\mu| < \frac{1}{2} - \delta$. The interval $2\delta$ defines the size of the uncertainty gap between the two states. Similarly, for the response variables, $c_j^\mu > \frac{1}{2} + \delta$ means that $r_j^\mu = 1$, and $c_j^\mu < \frac{1}{2} - \delta$ is equivalent to $r_j^\mu = 0$. In the following, we set $\delta = 1/6$. Note that any species, including signals and responses, can be active or inactive depending on its concentration: it is active for concentrations greater than $\frac{1}{2} + \delta$ and inactive for concentrations less than $\frac{1}{2} - \delta$. For concentrations between the two limits, the species is considered neutral. The uncertainty gap $2\delta$ should be larger than the standard deviations of the concentrations, so that the active and inactive states are well defined. A small helper implementing this three-state encoding is shown below.
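This three-state encoding can be written as a one-line helper; a minimal Python sketch with $\delta = 1/6$ (the function name is ours):

```python
def activity(c: float, delta: float = 1.0 / 6.0) -> int | None:
    """Classify a mean concentration as active (1), inactive (0), or neutral (None)."""
    if c > 0.5 + delta:
        return 1
    if c < 0.5 - delta:
        return 0
    return None  # inside the uncertainty gap
```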
Later in this study, to investigate the function space of a random reaction network, we compute the responses (output) for all possible signal activities (input), given a realization of the random reaction rates. The number of such signal-response patterns is $2^{N_S}$. In the learning part of the study, however, we look for a stochastic reaction network that stores P randomly and uniformly chosen signal-response patterns. In both cases, the reaction network structure remains fixed, and only the reaction rates $\kappa_a$ vary.

Network capacity and function
To characterize the function of a reaction network, we study the statistics of the response concentrations $\{x_j^\mu : j \in R\}$ for all the $2^{N_S}$ possible signal patterns [14,15]. The average concentration $c_j^\mu = \langle x_j^\mu \rangle_t$ denotes the time average of the concentration in the stationary state for signal pattern $s^\mu$. The results are obtained for an ensemble of random and independent reaction rates $\kappa_a$, uniformly distributed in $(0, \kappa_{\max})$. This ensemble simply delimits the range of models investigated here; it does not mean that in reality (or at equilibrium) the reaction rates are distributed uniformly and independently. That is, we take a random sample of the reaction rates $\kappa$ and run the Gillespie algorithm for signal activity $s^\mu$ to obtain the time averages $c_j^\mu(\kappa)$. This process is repeated for different realizations of $\kappa$ to estimate the ensemble averages over the space of the reaction rates. In the following, we consider the time-averaged concentrations of the response variables for a signal pattern $\mu$,
$$c_R^\mu = \frac{1}{N_R} \sum_{j \in R} \langle c_j^\mu(\kappa) \rangle_\kappa .$$
The time-averaged correlations are
$$c_{RR}^\mu = \frac{2}{N_R(N_R - 1)} \sum_{j < j' \in R} \left\langle \langle x_j^\mu x_{j'}^\mu \rangle_t - c_j^\mu(\kappa) c_{j'}^\mu(\kappa) \right\rangle_\kappa .$$
Note that the averages with respect to time and with respect to random realizations of the reaction rates are denoted by $\langle \cdot \rangle_t$ and $\langle \cdot \rangle_\kappa$, respectively. Given a signal pattern $\mu$, we are also interested in the average polarization of the responses, which measures how much the response concentrations deviate from the neutral value 1/2,
$$p_R^\mu = \frac{1}{N_R} \sum_{j \in R} \langle |2 c_j^\mu(\kappa) - 1| \rangle_\kappa .$$
The overlap of the responses for two random and independent sets of reaction rates $\kappa$ and $\kappa'$ is denoted by
$$q_R^\mu = \frac{1}{N_R} \sum_{j \in R} \langle (2 c_j^\mu(\kappa) - 1)(2 c_j^\mu(\kappa') - 1) \rangle_{\kappa, \kappa'} .$$
This quantity measures the changes in the responses induced by varying the reaction rates. We also check the overlap of the responses for two different signal patterns $\mu$ and $\mu'$, as a measure of the variation of the responses with the input patterns,
$$q_R^{\mu\mu'} = \frac{1}{N_R} \sum_{j \in R} \langle (2 c_j^\mu(\kappa) - 1)(2 c_j^{\mu'}(\kappa) - 1) \rangle_\kappa .$$
To characterize the signal patterns, we define the activity level and the overlap of two signal patterns,
$$m_S^\mu = \frac{1}{N_S} \sum_{i \in S} (2 s_i^\mu - 1), \qquad q_S^{\mu\mu'} = \frac{1}{N_S} \sum_{i \in S} (2 s_i^\mu - 1)(2 s_i^{\mu'} - 1).$$
These quantities provide the basic statistics of the input signals. Note that $m_S^\mu$ is related to the number of active signals in pattern $\mu$ and is distinct from the mean concentration $c_i^\mu(\kappa)$ of a signal variable. A numerical sketch of these estimators is given at the end of this passage.

Let us start with the function of fully connected one-layer reaction networks with iid reaction rates $\kappa_a$ uniformly distributed in $(0, \kappa_{\max})$. A fully connected network of Rmn reactions is a network in which all possible reactions of type Rmn are present between two consecutive layers of the network. In all the following numerical simulations we set $\kappa_{\max} = 0.01 X_{\max}$: proportional to $X_{\max}$, but small enough to allow us to simulate the stochastic dynamics of the species in a reasonable computational time.

Table 1. Response of the one/two-layer reaction networks. The average concentrations $c_R^\mu$ and correlations $c_{RR}^\mu$ of the response variables in a fully connected network of $N_S = 10$ signals and $N_R = 5$ responses. The number of hidden variables is $N_h = 5$ in the two-layer network. The results are averaged over independent realizations of the reaction rates for all possible signal patterns. The number of iterations is t = 2000 for the R11, R12 networks and t = 4000 for the R21, R22 networks. (Columns: Reactions; Concentration (one/two-layer); Correlation (one/two-layer).)

We sample a random realization of the reaction rates from the above distribution and run the Gillespie algorithm for t iterations to reach the stationary state of the dynamics. The quantities of interest are then sampled, and the process is repeated for different realizations of the reaction rates to obtain the statistics of the stationary state. We checked that the results are not qualitatively sensitive to small changes in the value of $\kappa_{\max}$ or in the numbers of signals, responses, and hidden variables. This is expected to hold also for different distributions of the reaction rates with nearly the same means and variances. However, we needed more iterations to reach the stationary state of the networks with the R21 and R22 reactions (t = 4000) than with the R11 and R12 reactions (t = 2000). Table 1 reports the average activity and correlation of the response variables in such networks. We observe that the activity is close to 1/2 for both the R11 and R22 networks, whereas the R12 and R21 networks have high and low average activities, respectively. The response variables are nearly uncorrelated, ignoring the very small correlations of order ±0.01. Similar behaviors are observed in table 1 for the two-layer networks, where the effects of the R12 and R21 reactions are enhanced. This means that, on average, the latter networks work as an amplifier and a dampener, respectively, irrespective of the inputs and the reaction rates. Moreover, one can enhance these effects by increasing the number of layers in the network. In fact, when the number of reactants is larger than the number of products, the probability of the direct reaction is smaller than that of the reverse one, according to the mass-action kinetics. That is why for R21 reactions the concentration of responses is smaller than 1/2, whereas for R12 reactions it is the opposite. Figures 2 and 3 display the statistics of the signal and response variables in the same networks. On average, the response polarization $p_R^\mu$ and the overlaps $q_R^\mu, q_R^{\mu\mu'}$ are larger for the R12 and R21 networks than for the R11 and R22 networks. That is, the responses are closer to 0 and 1, and more similar to each other for different patterns and reaction rates. Moreover, in the R11 and R22 networks, $p_R^\mu$ and $q_R^\mu$ are nearly symmetric around the signal activity $m_S^\mu = 0$, where half the signals are active and the other half inactive. On the other hand, the two quantities $p_R^\mu$ and $q_R^\mu$ increase with $m_S^\mu$ in the R12 network and decrease with $m_S^\mu$ in the R21 network. Figures 2 and 4 show the above statistics for the two-layer reaction networks, which are qualitatively similar to the one-layer networks. In words, in both the one- and two-layer networks, the R12 and R21 reactions favor the 'on' and 'off' states, respectively. On the other hand, the two states are balanced in networks with the R11 and R22 reactions, and one can switch between these states by changing the number of active signals. De Palma et al [16] demonstrated that random neural networks tend to exhibit a bias towards simple functions. Interestingly, we observe that random reaction networks are very similar to random neural networks in supporting simple functions.
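As an illustration, the ensemble statistics defined above can be estimated from sampled stationary concentrations, as in the following Python sketch; the array layout and the function name are our own assumptions:

```python
import numpy as np

def response_statistics(c):
    """Estimate c_R, p_R, and q_R for one signal pattern.

    c: array of shape (n_kappa, N_R) holding the time-averaged response
       concentrations c_j(kappa), one row per sampled rate realization.
    """
    c_R = c.mean()                      # mean response activity
    p_R = np.abs(2 * c - 1).mean()      # polarization around 1/2
    m = 2 * c - 1
    g = m @ m.T / c.shape[1]            # overlaps between rate realizations
    n = c.shape[0]
    q_R = (g.sum() - np.trace(g)) / (n * (n - 1))  # distinct pairs only
    return c_R, p_R, q_R
```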
The learning capacity of a reaction network can be quantified as follows. Consider the ensemble of random and independent patterns $\{s^\mu, r^\mu\}$ and define the success probability Pr(P) of storing P such patterns. The network capacity is defined by the threshold $P_c$ at which the success probability approaches zero, $\Pr(P > P_c) \to 0$. Alternatively, one can look at the number of solutions in the reaction space $\kappa = \{\kappa_a : a = 1, \ldots, 2M\}$,
$$N_\kappa = \int \mathcal{D}\kappa \prod_{\mu=1}^{P} I^\mu(\kappa),$$
where $\int \mathcal{D}\kappa$ denotes an integral over the reaction space $0 < \kappa_a < \kappa_{\max}$. The indicator function $I^\mu(\kappa) = 1$ if pattern $\mu$ is stored, and $I^\mu(\kappa) = 0$ otherwise. We say that pattern $\mu$ is stored if the signal values $s^\mu$ result in the associated responses $r^\mu$ under the system dynamics. The capacity is given by the maximum number of patterns at which the average entropy $\langle \log N_\kappa \rangle$ vanishes [12]. Consider a one-layer network of the R11 and R21 reactions with a single response variable ($N_R = 1$). Given the signals $s_i^\mu$, the average concentration $c_j^\mu$ of the response variable in the thermodynamic limit is governed by the deterministic equation
$$\frac{d c_j^\mu}{d\tau} = \sum_{a \in A_\downarrow} \kappa_a \prod_{i \in \nu_a^-} c_i^\mu - c_j^\mu \sum_{a \in A_\uparrow} \kappa_a .$$
Here the $c_i^\mu$ are the average concentrations of the signal variables, constrained by $|c_i^\mu - s_i^\mu| < \frac{1}{2} - \delta$. Moreover, $A_\downarrow$ refers to the set of reactions directed from the upper layer to the lower layer; similarly, $A_\uparrow$ is the set of reverse reactions. In the stationary state, a solution in the reaction space should satisfy the following constraints:
$$c_j^\mu = \sum_{a \in A_\downarrow} \tilde{\kappa}_a \prod_{i \in \nu_a^-} c_i^\mu, \quad (15)$$
$$|c_j^\mu - r_j^\mu| < \frac{1}{2} - \delta, \quad (16)$$
$$|c_i^\mu - s_i^\mu| < \frac{1}{2} - \delta, \quad (17)$$
where $\tilde{\kappa}_a = \kappa_a / \sum_{a' \in A_\uparrow} \kappa_{a'}$. Now consider a one-layer network of the R12 and R22 reactions with only two response variables ($N_R = 2$). The deterministic equations for the average concentrations of the two response variables in the thermodynamic limit have the analogous form, with the single response concentration replaced by the product of the two response concentrations. Here, by symmetry, the two responses have the same average concentration. Then, by taking $(c_j^\mu)^2$ as the response, one obtains the same equations as equations (15)-(17) for the R11 and R21 networks. Therefore, in the following, we consider only the R11 and R21 reaction networks with a single response variable.
We take P random and independent signal-response patterns $\{s^\mu, r^\mu\}$; that is, for each $\mu = 1, \ldots, P$ we set $s_i^\mu = 0, 1$ and $r^\mu = 0, 1$ with equal probability. Then we use the least-squares method to check whether there is a solution for the reaction rates that satisfies equations (15)-(17) for every $\mu$. Figure 5 displays the probability of having a solution, obtained by solving the above equations for at least 500 independent realizations of the patterns and the reaction networks. For a fair comparison, besides the fully connected R11 and R21 networks, we also report results for a sparse R21 network with $M = N_S$ reversible reactions, in which only a random subset of all possible R21 reactions is present. A sketch of this feasibility check is given below.
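A minimal sketch of this feasibility check in Python with scipy, under our reading of equations (15)-(17): the signal concentrations are fixed at $c_i^\mu = s_i^\mu$ (allowed by constraint (17)), the normalized rates $\tilde{\kappa}_a$ are the optimization variables, and the response constraint (16) enters through hinge residuals. The function name is hypothetical:

```python
import numpy as np
from scipy.optimize import least_squares

def storable(signals, responses, reactant_sets, delta=1.0 / 6.0):
    """Check if normalized rates exist so that every pattern satisfies
    |c_j^mu - r^mu| < 1/2 - delta, with c_j^mu given by equation (15).

    signals: (P, N_S) binary array; responses: (P,) binary array;
    reactant_sets: list of reactant index tuples, one per downward reaction.
    """
    n = len(reactant_sets)

    def residuals(kappa_tilde):
        res = []
        for s, r in zip(signals, responses):
            # stationary response, eq. (15), with c_i^mu set to s_i^mu
            c_j = sum(k * np.prod(s[list(idx)])
                      for k, idx in zip(kappa_tilde, reactant_sets))
            # hinge residual: zero once the pattern constraint (16) holds
            res.append(max(0.0, abs(c_j - r) - (0.5 - delta)))
        return res

    # normalized rates tilde-kappa are only required to be positive
    sol = least_squares(residuals, x0=np.full(n, 0.25), bounds=(0.0, np.inf))
    return np.allclose(sol.fun, 0.0, atol=1e-8)
```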
In figure 5 we observe that the number of stored patterns increases faster with $N_S$ in the R21 reaction networks than in the R11 network. More precisely, the number of stored patterns scales as $N_S^\nu$, where the exponent $\nu \simeq 0.5$ in the R21 reaction networks is nearly twice that of the R11 network. The probability of having a solution of course increases with the number of signals in both networks. But this probability grows faster with $N_S$, and therefore with the number of reactions, in the R21 networks than in the R11 network. This means that the two-signal interactions introduced by the R21 reactions help to capture some of the correlations in the signal patterns. It is well known that statistical models with two-body interactions are rich enough to capture the strong correlations observed in complex systems and phase transitions. Note that these results are for deterministic reaction networks. To study the learning of stochastic reactions in a multilayer network, in the following we resort to a stochastic learning algorithm.

Learning the patterns
Consider P signal-response patterns $\{s^\mu, r^\mu\}$ and a stochastic reaction network with reaction rates $\kappa_a(0) \in (0, \kappa_{\max})$. We assume that the network structure is fixed. All variables except the signal variables (in S) are free to change within $(0, X_{\max})$ under the system dynamics. The aim is to gradually change the network parameters to find rates that store the above signal-response activities. We say the system has learned pattern $\mu$ if the signal values $s_i^\mu$ result in the corresponding response concentrations, that is, $|c_j^\mu - r_j^\mu| < \frac{1}{2} - \delta$ for all $j \in R$. In practice, reaction rates can be altered by introducing catalysts, or by changing the catalytic capacity of an enzyme (via mutations or regulators) in a biochemical reaction. More precisely, this can be done by changing the concentration of an enzyme, or by changing the enzyme itself, as happens during evolution (on large time scales).
Given the reaction network's structure, we aim to identify the reaction rates that maximize the objective function
$$E = -\frac{1}{P} \sum_{\mu=1}^{P} E(c^\mu : r^\mu), \quad (20)$$
where $E(c^\mu : r^\mu)$ measures the deviation from the expected responses $r^\mu$ given the $s^\mu$,
$$E(c^\mu : r^\mu) = \frac{1}{N_R} \sum_{j \in R} (c_j^\mu - r_j^\mu)^2 . \quad (21)$$
By minimizing this energy function for each pattern $\mu$, the system is pushed towards responses that are close to the expected ones for that pattern. The idea is to update the reaction rates according to the sign of the linear correlations between the products of a reaction, $2x_i^\mu - 1$ for $i \in \nu_a^+$, and the deviations in the responses, $(2r_j^\mu - 1)(2x_j^\mu - 1)$ for $j \in R$; a positive correlation means that increasing the products would result in a larger overlap between the observed responses and the expected ones.
We start with random initial parameters and run the system dynamics (i.e. the Gillespie algorithm) for $t_{eq}$ iterations to compute the objective function E(0). Then, for $n = 1, \ldots, T$, we do the following for each pattern $\mu$ (randomly selected):
• compute the linear correlations $c_{ij}^\mu$ from the statistics of pattern $\mu$ in a time interval $\Delta t$;
• for each reaction a compute $\Delta_a^\mu = \sum_{i \in \nu_a^+} \sum_{j \in R} (2r_j^\mu - 1) c_{ij}^\mu$;
• use $\Delta_a^\mu$ to move the reaction rate $\kappa_a$ from its current optimal value:
  * with probability 1/2 change $\kappa_a \to \kappa_a + \epsilon \Delta_a^\mu$, otherwise change it in a random direction;
• run the dynamics for another $t_{eq}$ iterations to determine the value of the objective function E(n), and accept the changes in the $\kappa_a$ if the objective function increases.
A compact sketch of this update loop is given at the end of this subsection. We use the stochastic dynamics of section 2.3 and the above learning algorithm to store random and uncorrelated patterns of signals and responses. That is, the values $\{s_i^\mu = 0, 1 : i \in S\}$ and $\{r_j^\mu = 0, 1 : j \in R\}$ are chosen independently and with equal probability for $\mu = 1, \ldots, P$. The signal patterns here are distinct. Recall that the response concentrations $c_j^\mu$ are determined by the reaction rates $\kappa$ and the signal pattern $s^\mu$. The aim of learning is to make the response concentrations $c_j^\mu$ as close as possible to the $r_j^\mu$ for all patterns $\mu$. We checked that the above procedure works when we start from random reaction networks with known signal-response patterns: we obtain the responses for a number of input signals in a network with known reaction rates and then try to reconstruct the parameters with the above algorithm. Moreover, the algorithm performs better than a zero-temperature Monte Carlo search, which shows that using the correlations to update the reaction rates is relevant here. For the parameters of the algorithm we take $t_{eq} = 2000$-$4000$, $\Delta t = 1000$, and $\epsilon = 0.02$-$0.2$. Figure 6 shows the average number of stored patterns in the one- and two-layer networks as the number of learning iterations increases. We observe that the networks are able to learn, on average, more than half of the presented patterns (P) after a few tens of iterations. However, except for the case of a single pattern (P = 1), the learning is far from perfect, with a large gap separating the cases $P \geq 2$. This holds for both the R11 and R21 networks, as shown in figure 6. Note that here we are working with random signal-response patterns. From the study of random reaction networks in section 3.1, we know that it is easier to store patterns that are closer to the expected functions of the networks. For instance, patterns in which the activity of the responses is low are better represented by the R21 reaction networks. Figure 7 displays the average number of stored patterns $\langle P_s \rangle$ after T = 2000 learning iterations as a function of P. We see that $\langle P_s \rangle$ approaches the value expected from a random network after a small number of patterns. As before, the R21 reaction network exhibits better storage performance than the R11 network; that is, the chance of storing more than half of the presented patterns decreases more rapidly with P in the R11 network. This also holds for the sparse R21 network with the same number of reactions as the R11 network.
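The loop above can be condensed into the following Python sketch. Here `run_dynamics` (Gillespie equilibration returning stationary response means) and `linear_correlations` (species-response correlations estimated over $\Delta t$) are hypothetical helpers, the objective follows our reconstructed equations (20) and (21), and the clipping of rates to $(0, \kappa_{\max})$ is our assumption:

```python
import numpy as np

def objective(reactions, patterns, run_dynamics):
    """Negative mean squared deviation of the responses (cf. equations (20), (21))."""
    E = 0.0
    for s, r in patterns:
        c = run_dynamics(reactions, s)  # stationary response concentrations
        E -= np.mean((np.asarray(c) - np.asarray(r)) ** 2)
    return E / len(patterns)

def learn(reactions, patterns, run_dynamics, linear_correlations,
          T=2000, eps=0.1, kappa_max=10.0, rng=None):
    """Correlation-guided stochastic learning of signal-response patterns."""
    rng = rng or np.random.default_rng()
    E_old = objective(reactions, patterns, run_dynamics)
    for n in range(T):
        for mu in rng.permutation(len(patterns)):     # patterns in random order
            s, r = patterns[mu]
            c_ij = linear_correlations(reactions, s)  # c_ij[i, j]: species i vs response j
            old_rates = [rx.rate for rx in reactions]
            for rx in reactions:
                # Delta_a^mu = sum over products i, responses j of (2 r_j - 1) c_ij
                delta = sum((2 * rj - 1) * c_ij[i, j]
                            for i in rx.products for j, rj in enumerate(r))
                step = eps * delta if rng.random() < 0.5 else eps * rng.uniform(-1, 1)
                rx.rate = float(np.clip(rx.rate + step, 0.0, kappa_max))
            E_new = objective(reactions, patterns, run_dynamics)
            if E_new > E_old:
                E_old = E_new                         # keep the move
            else:
                for rx, k in zip(reactions, old_rates):
                    rx.rate = k                       # revert
    return reactions
```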

Robustness
Chemical reaction systems like cellular signaling networks may undergo aging or disease processes that consist of a gradual accumulation of defects. A defect could, for instance, alter the concentration of an enzyme, which indirectly controls the reaction rate of a biochemical transformation, or even modify the structure of the reaction network by introducing new reactions into the system. Here we consider only the former defects, which are simply represented by small changes in the reaction rates of a network. In this section, we want to see how robust the patterns stored in the above stochastic reaction networks are against random variations in the reaction rates.
To model such a process, we begin with the stationary state of an optimal (so-called 'healthy') reaction network in which a random signal-response pattern is stored (i.e. P = 1). The value of the associated objective function is denoted by $E_{old}$. The time evolution of a disease is then modeled in the following way [49]. For $t_d = 1, \ldots, t_{max}$ do:
• each reaction rate changes with (mutation) probability $\alpha(t_d)$ from $\kappa_a$ to $\kappa_a \pm \delta\kappa$;
• compute the new objective function $E_{new}$;
• accept the changes in the reaction rates if the objective function increases; otherwise, accept the changes with probability $\exp(\beta(t_d) \Delta E)$. If the changes are accepted, set $E_{old} = E_{new}$.
The mutation probability $\alpha(t_d)$ is expected to increase with the disease time step $t_d$. On the other hand, the parameter $\beta(t_d)$, which controls the acceptance probability, is expected to decrease with $t_d$. Here $\beta(t_d) \geq 0$ plays the role of an inverse temperature in a reverse simulated annealing algorithm [51]. While the probability $\alpha(t_d)$ controls the rate of local changes ('mutations') in the reaction network, the global parameter $\beta(t_d)$ determines the system's susceptibility to such local mutations ('immunity'). The direct simulated annealing algorithm, in which $\alpha(t_d)$ decreases and $\beta(t_d)$ increases with time, can also be employed as a learning algorithm.
Note that in each time step $t_d$, every reaction rate has a chance to increase or decrease by $\delta\kappa$ with equal probability. In the following numerical simulations, we set $\delta\kappa = 0.5$, and as before we limit the reaction rates to $\kappa_a \in (0, 10)$. Every time a change in the reaction rates is proposed, we run the Gillespie algorithm for $t_{eq} = 2000$-$4000$ iterations to reach the stationary state, where the new objective function is estimated. A sketch of this defect-evolution loop is given below.
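A minimal Python sketch of this defect-evolution (reverse annealing) loop, reusing the hypothetical `objective` helper from the learning sketch; the mutation and acceptance schedules `alpha(t_d)` and `beta(t_d)` are user-supplied functions:

```python
import numpy as np

def disease_evolution(reactions, patterns, run_dynamics, alpha, beta,
                      t_max=1000, d_kappa=0.5, kappa_max=10.0, rng=None):
    """Mutate reaction rates at rate alpha(t_d); accept harmful changes
    with probability exp(beta(t_d) * dE) (reverse simulated annealing)."""
    rng = rng or np.random.default_rng()
    E_old = objective(reactions, patterns, run_dynamics)
    for t_d in range(1, t_max + 1):
        old_rates = [rx.rate for rx in reactions]
        for rx in reactions:
            if rng.random() < alpha(t_d):             # mutate this rate
                sign = 1 if rng.random() < 0.5 else -1
                rx.rate = float(np.clip(rx.rate + sign * d_kappa,
                                        0.0, kappa_max))
        E_new = objective(reactions, patterns, run_dynamics)
        dE = E_new - E_old
        if dE > 0 or rng.random() < np.exp(beta(t_d) * dE):
            E_old = E_new                             # accept the mutations
        else:
            for rx, k in zip(reactions, old_rates):   # reject: revert
                rx.rate = k
    return reactions
```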
In figure 8, we check the robustness of the stored patterns against disease evolution in the one-layer networks. The learning algorithm of section 3.2 was used to store a single random pattern in T = 2000 learning iterations. We see that the survival time of the stored pattern is longer in the R11 network than in the R21 network for different rates of disease evolution. Thus, the basin of attraction of the stored pattern should be larger in the R11 network. Moreover, the transition from the healthy state to the disease state becomes sharper as the size of the R11 network increases, whereas in the R21 network the transition becomes smoother as the number of variables grows. This shows that the stored pattern in the R11 network is well separated from other spurious states in the reaction-rate space. Similar results for the two-layer networks are reported in figure 9. In addition to the above observations, here the system displays a two-stage behavior when the time scales of $\alpha(t_d)$ and $\beta(t_d)$ are different. More precisely, a change in the mutation probability in a two-layer network has a considerable effect on the system even for large values of $\beta$. This means that the space of reaction rates around the stored pattern differs between the one-layer and two-layer networks. The difference could be due to the fact that learning two-layer networks with the R21 reactions is more difficult than learning the one-layer networks.

Conclusion and discussion
In this paper, we studied the function of multilayer networks of reversible and stochastic reactions connecting subsets of signal, response, and hidden species. We used elementary reactions Rmn connecting m = 1, 2 reactants to n = 1, 2 products and employed the Gillespie algorithm to simulate the stochastic dynamics of the species. We observed that the typical response function of the R11 and R22 networks with uniformly random reaction rates is concentrated around the activity c = 0.5. In contrast, the R12 and R21 reaction networks displayed high and low response activities, respectively, with greater response polarization and larger overlap between the responses than the R11 and R22 networks. This effect was enhanced in the two-layer networks of the same reactions. Moreover, we found that the average polarization and similarity of the responses in the R12 (R21) network increase (decrease) with the number of active signals, in contrast to the symmetric and non-monotonic behavior of the R11 and R22 networks. In summary, it seems that the function space of random reaction networks is represented by simple functions, much like random neural networks [16]. To observe more complex functions, it would be essential to introduce recurrent reactions connecting reactant and product species in the hidden layers of the networks.
We found that the average number of stored patterns in deterministic reaction networks grows faster with the number of signals in the R21 reaction networks than in the R11 networks. This highlights the role of the two-signal interactions induced by the R21 reactions. It remains to be seen how much this effect changes as the number of hidden layers increases. We also introduced a stochastic learning algorithm to store a number of random signal-response activities in the stochastic reaction networks. The algorithm was based on the correlations between the products of a reaction and the responses. Here, we found it very difficult to store an extensive number of random signal-response patterns in both the one- and two-layer networks, at least with the above learning algorithm. Still, the R21 reaction network performed better than the R11 network at learning the random patterns. The patterns in the R11 networks are, however, more robust to random changes in the reaction rates in a stochastic process of defect evolution. Moreover, the two-layer R21 reaction networks are more sensitive to changes in the mutation probability, perhaps due to the appearance of other metastable states in the learning process.
In this study, we considered elementary reactions which, in contrast to interactions in artificial neural networks, are always excitatory in the sense that more reactants result in more products. It is of course possible to model inhibitory interactions effectively by introducing auxiliary species into the system [34]. Adding this feature to the reactions is expected to enhance the learning performance of these networks, at the expense of working with a larger number of species. In addition, the stochastic algorithm that we used to store the signal-response patterns can be improved to obtain better lower bounds on the learning capacity of artificial reaction networks. For instance, one could employ more efficient algorithms than the Gillespie algorithm for simulating the system dynamics [43-45].
Note that the reactions in this work are stochastic, which sets this study apart from other studies on learning deterministic reaction networks [29,30,34,39]. The stochastic nature of the reactions increases the computational complexity of the problem but also provides more information about the probability distribution of the species. We used this information, for example, in the learning algorithm of section 3.2 to update the reaction rates according to the correlations between the products of a reaction and the response species. Information about the statistical dependencies of the species, which is absent in deterministic models, can be used in a learning algorithm to devise more elaborate strategies for updating the reaction rates. In the appendix, we use a similar strategy to store random patterns of signals and responses in an enzymatic reaction network.
The purpose of this study was to provide a theoretical framework for future applications, such as diagnostic problems. For instance, suppose we have access to the history of concentration variations of the species in a reaction network that is developing an undesired, so-called 'disease', state. The initial reaction network could be in a 'healthy' state that stores several signal-response patterns. To diagnose the developing disease from such dynamic data, one must first understand how the initial system of interacting species represents the stored patterns, as well as how robust these healthy states are to relevant mutations in the reaction network. A signal transduction reaction network, like the one studied here, enables us to investigate disease dynamics and the effectiveness of diagnostic algorithms in a simple benchmark model.

Data availability statement
All data that support the findings of this study are included within the article (and any supplementary files).

Acknowledgment
This work was performed using the Academic Leiden Interdisciplinary Cluster Environment (ALICE), the compute resources provided by Leiden University.

Appendix. Learning of a stochastic enzymatic reaction network
In this section, we provide an example of learning by changing the concentrations of enzymes in a stochastic enzymatic reaction network. We consider reactions that connect a substrate to a product through the formation of a complex,
$$S + E \leftrightarrow C \leftrightarrow P + E. \quad (A1)$$
Figure 10 schematically displays a reaction network of $N_S$ substrates (signals) connected by the same number of distinct enzymes and complexes to a single product (response). Here, in contrast to the main text, we try to store random patterns of signals and responses by changing only the concentrations of the enzymes. As before, the kinetics is given by the mass-action law, but with fixed reaction rates $\kappa_a$ distributed randomly and uniformly in (0, 10). The final results are averaged over independent realizations of the reaction rates and the stochastic dynamics. The learning algorithm is the same as the one presented in section 3.2, except that here we use the correlations between the enzymes and the product to change the enzyme concentrations. We begin with random initial numbers of enzymes and run the Gillespie algorithm for $t_{eq}$ iterations to compute the objective function E(0) (from equations (20) and (21)). Then, for $n = 1, \ldots, T$, we do the following for each pattern $\mu$:
• compute the linear correlations $c_{ij}^\mu$ from the statistics of pattern $\mu$ in a time interval $\Delta t$;
• for each enzyme i compute $\Delta_i^\mu = \sum_{j \in R} (2r_j^\mu - 1) c_{ij}^\mu$;
• use $\Delta_i^\mu$ to move the number of enzymes $X_i$ from its current optimal value:
  * with probability 1/2 change $X_i \to X_i + \epsilon \Delta_i^\mu$, otherwise change it in a random direction;
• run the dynamics for another $t_{eq}$ iterations to determine the value of the objective function E(n), and accept the changes in the $X_i$ if the objective function increases.
Figure 11 shows the average number of stored patterns $P_s$ when we try to store P random patterns in this way. The parameters used in the algorithm are $t_{eq} = 3000$, $\Delta t = 1000$, and $\epsilon = 0.02 X_{\max}$. Note that the reaction rates are fixed at random values and are not tuned. Therefore, learning is more difficult here than in the case studied in the main text. A sketch of this enzyme-level update is given below.
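The enzyme-level variant differs from the earlier learning sketch only in what is updated. A minimal Python version, again with a hypothetical `linear_correlations` helper and an objective function `objective_fn` analogous to the one above (here evaluated at fixed reaction rates as a function of the enzyme copy numbers):

```python
import numpy as np

def learn_enzymes(X_enz, patterns, objective_fn, linear_correlations,
                  T=2000, eps=20.0, X_max=1000, rng=None):
    """Store patterns by adjusting enzyme copy numbers at fixed reaction rates.

    eps = 20.0 corresponds to epsilon = 0.02 * X_max with X_max = 1000.
    """
    rng = rng or np.random.default_rng()
    E_old = objective_fn(X_enz)
    for n in range(T):
        for s, r in patterns:
            c_ij = linear_correlations(X_enz, s)  # c_ij[i, j]: enzyme i vs response j
            old = X_enz.copy()
            for i in range(len(X_enz)):
                # Delta_i^mu = sum_j (2 r_j - 1) c_ij
                delta = sum((2 * rj - 1) * c_ij[i, j] for j, rj in enumerate(r))
                step = eps * delta if rng.random() < 0.5 else eps * rng.uniform(-1, 1)
                X_enz[i] = float(np.clip(X_enz[i] + step, 0, X_max))
            E_new = objective_fn(X_enz)
            if E_new > E_old:
                E_old = E_new          # accept the move
            else:
                X_enz[:] = old         # revert
    return X_enz
```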