Learning systems of ordinary differential equations with Physics-Informed Neural Networks: the case study of enzyme kinetics

Physics-Informed Neural Networks (PINNs) are a type of function approximator that combines data-driven supervised learning of the dynamics of a physical system with the mathematical equations of the physical laws governing that system. PINNs have the benefit of being data-driven to train a model, but also of being able to assure consistency with the physics and to extrapolate accurately beyond the range of data that is currently accessible. As a result, PINNs can provide models that are more reliable while using less data. Specifically, the PINN objective is to learn the solutions of a system of equations by using supervised learning on the available data and incorporating the knowledge of physical laws and constraints into the training process. However, while solving a single differential equation with a PINN may be relatively simple, solving systems of coupled differential equations may not be. In this study, I present a neural network model specialized in solving the differential equations of enzyme kinetics, whose main characteristic is that of being a simple demonstrative case of a system of coupled equations. The study focuses mainly on the theoretical aspects of the definition of a physics-informed loss function and shows a case study that highlights the challenges still to be overcome in solving systems of coupled differential equations.


Introduction and background
Artificial neural networks have been utilised over the last few decades to solve problems in a variety of practical disciplines, including computer vision, natural language processing, and many others. Another very interesting application has recently emerged in the scientific machine learning (ML) field: the solution of differential equations (DEs) using artificial neural networks, employing a method known as Physics-Informed Neural Networks (PINNs).
Usually, traditional mathematical physics models are a set of equations defined by a domain expert, who parametrizes them to best fit the behaviour of the system of interest. For instance, one builds a fluid dynamics model with the use of equations for drag, lift, gravity, thrust, etc., and parametrizes the model to closely match a particular fluid-dynamic device. On the other hand, the purely data-driven neural network technique involves utilising supervised learning with a neural network to try to learn the model from data gathered from a particular system. PINNs are located at the junction of the two. In addition to using data-driven supervised neural networks to learn the model, physics equations are provided to the model to promote consistency with the system's known physics. PINNs benefit from being data-driven to learn a model, from being able to assure compliance with the physics, and from extrapolating accurately beyond the range of data that is currently accessible. As a result, PINNs can provide more reliable models with less data.
PINNs were originally introduced by Raissi et al. [1], and today they are beginning to be used more and more in mathematical models of light and wave propagation (nonlinear Schrödinger equation, Korteweg-de Vries equation), fluid motion (Navier-Stokes equations), models of dynamics involving rate equations [2], and optimal control problems. The use of PINNs is currently being studied as a potential replacement for existing numerical techniques. Due to the recent advent of this type of neural network, the literature is not yet massive, but it reports particularly important pivotal works such as, e.g., [3,4,5,6,7,8,9,10,11,12,13,14,15,16]. A comprehensive review can also be found in [17]. The resolution of differential equations through the use of neural networks, on the other hand, has been proposed in some works preceding those on PINNs and in other works contemporary with them, such as [18,19,20,21,22,23,24]. Codes for solving differential equations with neural networks have also recently become available, e.g. NeuroDiffEq [25], nnde [26], PyDEns [27], NeuroDiffEq [28,29], DeepXDE [30,31], and the scripts and various codes available in [32]. In this study, I use the Matlab Deep Neural Network Toolbox package implementing the Lagaris et al. neural network model [18,33].
I present a PINN that solves a set of ordinary differential equations describing models of enzymatic kinetics. The essential ideas of PINNs are reviewed in this introduction, where it is also demonstrated how to create a PINN from scratch to solve a first-order ordinary differential equation (ODE).
PINNs are founded on two essential characteristics of neural networks [34], as follows. (i) Neural networks are universal function approximators [35]; a sufficiently large neural network can therefore approximate any function. (ii) Automatic differentiation (AD) makes it simple and affordable to calculate the derivatives (of any order) of a neural network output with respect to any of its inputs (and, of course, with respect to the model parameters during back-propagation). Indeed, AD is what first made neural networks so effective and successful.
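As a minimal illustration of point (ii), the following sketch (my own example, not part of the code of [33]) uses the dlarray and dlfeval machinery of the Matlab Deep Learning Toolbox to recover the derivative of y = x^2 at x = 3 by automatic differentiation:

x0 = dlarray(3);
[y, dydx] = dlfeval(@fAndGrad, x0);   % returns y = 9 and dydx = 6

function [y, dydx] = fAndGrad(x)
    % Evaluate y = x^2 and its derivative dy/dx inside the AD trace.
    y = x.^2;
    dydx = dlgradient(y, x);
end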
Let us consider the ODE
$$\frac{dy}{dx} = -2y, \qquad y(0) = 1. \tag{1}$$
This ODE has the analytic solution $y(x) = e^{-2x}$. A custom loss function of the neural network ($L_{NN}$) that penalizes deviations from satisfying the ODE and the initial condition is defined as a weighted sum of the ODE loss and the initial-condition loss:
$$L_{NN} = \frac{1}{N}\sum_{i=1}^{N}\left(\frac{dy_{NN}}{dx}\bigg|_{x_i} + 2\,y_{NN}(x_i)\right)^{2} + k\,\bigl(y_{NN}(0)-1\bigr)^{2},$$
where $y_{NN}$ is the solution of the ODE as output of the neural network, and $k$ is a real-valued tunable parameter. Assuming the reader is familiar with the concepts of a neural network and a deep neural network and the terminology attached to them, I report in Tables 1, 2, 3, and 4 the slightly modified code for the solution of the ODE in (1). Each code line is preceded by an explanatory comment. Figure 1 shows the structure of the neural network.
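A loss of this form can be implemented, for instance, as follows. This is a minimal sketch modeled on the approach of [33]; the function name modelLoss and the details of the residual computation are my assumptions, not the verbatim code of Tables 3 and 4:

function [loss, gradients] = modelLoss(net, X, k)
    % Forward pass: y_NN at the collocation points X.
    y = forward(net, X);
    % dy/dx by automatic differentiation; higher-order derivatives are
    % enabled because the loss is differentiated again w.r.t. the weights.
    dy = dlgradient(sum(y, "all"), X, EnableHigherDerivatives=true);
    % Mean squared residual of the ODE y' + 2y = 0.
    lossODE = mean((dy + 2*y).^2, "all");
    % Penalty on the initial condition y(0) = 1.
    y0 = forward(net, dlarray(0, "CB"));
    lossIC = mean((y0 - 1).^2, "all");
    % Weighted sum, as in the definition of L_NN above.
    loss = lossODE + k*lossIC;
    % Gradients with respect to the learnable parameters, for the optimizer.
    gradients = dlgradient(loss, net.Learnables);
end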
A single run of the code in Tables 1, 2, 3, and 4 generates Figures 2 and 3, where the loss function in logarithmic scale and the comparison of the approximate solution with the analytic one are shown, respectively. In this experiment, the approximate solution deviates from the analytical solution as one moves away from the point defining the initial condition y(0) = 1. The average of 50 runs has brought out the poor accuracy and unsatisfactory predictive power of the classical neural network approach for solving differential equations (and, on the other hand, justifies the very recent burgeoning literature on the subject). In particular, it shows how the specification of the initial condition alone is not a sufficient constraint to guarantee satisfactory accuracy. A fortiori, when the differential equation describes a physical system, it is necessary to incorporate as much knowledge as possible about the phenomenology of the system. In the case of systems of coupled differential equations, the possibility of instructing the neural network with knowledge of the physical laws governing the phenomenon under study becomes even more important, since not only does it allow for greater accuracy, but it also makes it possible to manage, within certain limits of complexity of the system's dynamics, the possible coupling between the equations. We shall see in this study an example involving enzyme kinetics.

Table 1: Defining the neural network. Code slightly modified from [33].

% Initialization of the layer graph
lgraph = layerGraph;
% Definition of the layers of the network.
% A fully connected layer multiplies the input by a weight matrix
% and then adds a bias vector.
% A sigmoid layer applies a sigmoid function to the input such that
% the output is bounded in the interval (0,1).
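The layer definitions themselves can be sketched as follows, consistently with the comments above and with the structure shown in Figure 1; the hidden-layer width of 10 is an assumption made for illustration:

% A possible layer stack matching the comments above and Figure 1.
layers = [
    featureInputLayer(1, "Name", "input")    % scalar input x
    fullyConnectedLayer(10, "Name", "fc1")   % weight matrix + bias
    sigmoidLayer("Name", "sig1")             % first sigmoid, output in (0,1)
    fullyConnectedLayer(1, "Name", "fc2")
    sigmoidLayer("Name", "sig2")             % the "second sigmoid layer"
    ];
lgraph = addLayers(lgraph, layers);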
Table 2: Specifying the training options. Code slightly modified from [33].
% Create a dlnetwork object from the layer graph.
dlnet = dlnetwork(lgraph);
% Train for 15 epochs with a mini-batch size of 100.
numEpochs = 15;
miniBatchSize = 100;
% Specify a learning rate of 0.5, a learning rate drop factor of 0.7,
% a learning rate drop period of 5, and a momentum of 0.9.
initialLearnRate = 0.5;
learnRateDropFactor = 0.7;
learnRateDropPeriod = 5;
momentum = 0.9;
% Weight the initial-condition term of the loss function with icCoeff.
% This coefficient sets the proportionate contribution of the initial
% condition to the loss.
icCoeff = 8;
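With these options, the custom training loop can be sketched as follows. This is a minimal sketch of the loop structure of [33], reusing the modelLoss function sketched earlier; the collocation grid and the variable names are my assumptions:

% Collocation points on the solution domain (domain choice is an assumption).
x = linspace(0, 2, 10000);
velocity = [];
learnRate = initialLearnRate;
for epoch = 1:numEpochs
    idx = randperm(numel(x));                % shuffle the points each epoch
    for i = 1:miniBatchSize:numel(x)
        batch = x(idx(i:min(i+miniBatchSize-1, numel(x))));
        X = dlarray(batch, "CB");            % format: channel x batch
        % Evaluate the physics-informed loss and its gradients.
        [loss, gradients] = dlfeval(@modelLoss, dlnet, X, icCoeff);
        % Update the network with stochastic gradient descent with momentum.
        [dlnet, velocity] = sgdmupdate(dlnet, gradients, velocity, ...
            learnRate, momentum);
    end
    % Drop the learning rate every learnRateDropPeriod epochs.
    if mod(epoch, learnRateDropPeriod) == 0
        learnRate = learnRate * learnRateDropFactor;
    end
end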

Enzyme kinetics
Regulatory enzymes, typically multi-subunit proteins, have binding sites for reactants and products on the catalytic subunits and for activators and inhibitors on the regulatory subunits. In this case study, I consider a two-subunit enzyme (EE) that converts a substrate (S) into a product (P). Each subunit's turnover rate and binding constant depend on whether or not the other subunit is bound to S. The set of reactions is
$$\mathrm{EE} + \mathrm{S} \rightleftharpoons \mathrm{EES} \longrightarrow \mathrm{EE} + \mathrm{P},$$
$$\mathrm{EES} + \mathrm{S} \rightleftharpoons \mathrm{SEES} \longrightarrow \mathrm{EES} + \mathrm{P}.$$
Applying the rapid-equilibrium approximation [36] to the enzyme-substrate complexes, the following equalities are found.

$$K_M^{(1)} = \frac{[\mathrm{EE}][\mathrm{S}]}{[\mathrm{EES}]}, \qquad K_M^{(2)} = \frac{[\mathrm{EES}][\mathrm{S}]}{[\mathrm{SEES}]}.$$
$K_M^{(1)}$ is the Michaelis constant for the enzyme's interaction with the first substrate molecule it binds; it is inversely proportional to the enzyme's affinity for the first substrate. Likewise, the relationship between $K_M^{(2)}$ and the enzyme's affinity for the second substrate molecule to bind is inverse. Most enzymes that have been investigated thus far have $K_M$ values that typically fall between $10^{-3}$ and $10^{-6}$ molar (1 mM to 1 µM).
The reaction rate equation then becomes
$$v = \frac{d[\mathrm{P}]}{dt} = \frac{V_{max}^{(1)}\dfrac{[\mathrm{S}]}{K_M^{(1)}} + V_{max}^{(2)}\dfrac{[\mathrm{S}]^2}{K_M^{(1)}K_M^{(2)}}}{1 + \dfrac{[\mathrm{S}]}{K_M^{(1)}} + \dfrac{[\mathrm{S}]^2}{K_M^{(1)}K_M^{(2)}}}. \tag{13}$$
Two limiting cases of Eq. (13), known as the Hill equation and the substrate inhibition equation, are as follows.
(i) The Hill equation is obtained when $K_M^{(1)} \to \infty$ and $K_M^{(2)} \to 0$ such that $K_M^{(1)}K_M^{(2)} = K^2 = $ constant, and is expressed by
$$v = \frac{V_{max}[\mathrm{S}]^2}{K^2 + [\mathrm{S}]^2},$$
with $V_{max} = V_{max}^{(2)}$.
(ii) Substrate inhibition is obtained when $V_{max}^{(2)} \to 0$, such that $V_{max}^{(1)} = V_{max} = $ constant, and is expressed by
$$v = \frac{V_{max}[\mathrm{S}]}{K_M^{(1)} + [\mathrm{S}] + \dfrac{[\mathrm{S}]^2}{K_M^{(2)}}}.$$
In the next section, I present the code and the results of the training of a neural network that solves the system of equations in (13) and is instructed by the physical law of conservation as in Eq. (10). It all boils down, at least as long as the system of equations is not too large and the physical phenomenon under study is known well enough that the relationships between the variables in the equations can be specified, to defining an appropriate loss function.

Neural network to learn enzyme kinetics
I define the loss function as in Eq. (16), which combines the residuals of the rate equations in (13) with the conservation constraint of Eq. (10).
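A sketch of such a physics-informed loss for the coupled system is given below. It is illustrative only: the function name kineticsLoss, the parameter struct, the use of a two-output network for [S](t) and [P](t), and the conservation penalty [S] + [P] = S0 standing in for Eq. (10) are my assumptions, rather than the exact form of Eq. (16):

function [loss, gradients] = kineticsLoss(net, T, p, k)
    % Network outputs: the rows approximate [S](t) and [P](t).
    Y = forward(net, T);
    S = Y(1,:);
    P = Y(2,:);
    % Time derivatives via automatic differentiation.
    dS = dlgradient(sum(S, "all"), T, EnableHigherDerivatives=true);
    dP = dlgradient(sum(P, "all"), T, EnableHigherDerivatives=true);
    % Reaction rate v([S]) of Eq. (13).
    v = (p.V1*S./p.K1 + p.V2*S.^2./(p.K1*p.K2)) ./ ...
        (1 + S./p.K1 + S.^2./(p.K1*p.K2));
    % Residuals of the rate equations d[S]/dt = -v and d[P]/dt = +v.
    lossODE = mean((dS + v).^2, "all") + mean((dP - v).^2, "all");
    % Conservation penalty standing in for Eq. (10): [S] + [P] = S0.
    lossCons = mean((S + P - p.S0).^2, "all");
    loss = lossODE + k*lossCons;
    gradients = dlgradient(loss, net.Learnables);
end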
Considering the logarithm of equation (21), in the limit of large [S] ([S] > 40 mole), the right-hand side of Eq. (22) is a constant C = 8, and therefore the time behaviour of the enzyme concentration is linear. On the other hand, for values of [E] between 0 and 1, the time derivative of [E] depends linearly on [S], to a good approximation, with an angular coefficient given by the concentration of [E]. This fact implies that the loss function would become an integro-differential equation, which is rather complex to solve using a neural network. Figures 5 and 6 show, respectively, the graph of the numerical solution of the rate equation (13) and the progress of the loss function, as obtained with the neural network defined in Figure 1 and Table 1, with the loss function as in Eq. (16). Comparing Figure 5 with Figure 8, I find that the neural network trained according to this loss function reproduces the results obtained by classical numerical integration methods with good approximation.
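For reference, a classical numerical baseline analogous to Figure 8 can be obtained in Matlab along the following lines; here ode15s plays the role of the lsoda routine used in [37], and the parameter values are placeholders, not those of the study:

% Placeholder kinetic parameters (not the values used in the study).
V1 = 1;  V2 = 2;  K1 = 5;  K2 = 5;  S0 = 10;
% Rate law v([S]) of Eq. (13).
v = @(S) (V1*S/K1 + V2*S^2/(K1*K2)) / (1 + S/K1 + S^2/(K1*K2));
% System y = [S; P] with d[S]/dt = -v and d[P]/dt = +v.
rhs = @(t, y) [-v(y(1)); v(y(1))];
[t, y] = ode15s(rhs, [0 50], [S0; 0]);   % stiff solver, analogous to lsoda
plot(t, y); legend("[S]", "[P]"); xlabel("t");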

Conclusions
With a relatively simple example, I have shown in this study the challenges posed by solving systems of differential equations using neural networks, and I have proposed to address these challenges by integrating into the loss function knowledge of the physical laws of the dynamics that the system of equations is intended to describe. This integration increases the accuracy of the neural network, but another problem remains open and in need of general solution strategies. I am referring to the case of coupled equations, which in "physics-informed" approaches can lead to differential loss functions that are not easy to solve except over certain intervals of the domain. Ideas to solve this problem are beginning to enter the pipeline, and the study of this area certainly deserves to be continued. For example, the definition of the loss function, even when informed by physical laws, remains a delicate point, as the graphs in this study also show. It would indeed be desirable to have a loss function that is stable around the minimum value, in order to have a stable learning process and, consequently, stable accuracy.
Finally, it is worth mentioning that numerous conventional numerical approaches can be used to find approximate solutions of ODE problems without a closed-form solution, but an ODE can also be solved by a neural network. The fact that this method yields differentiable approximate solutions in a closed analytic form is just one of its many benefits.

Figure 1 :
Figure 1: Structure of the neural network defined in Tables 1-4.

Figure 2 :
Figure 2: Logarithm of the loss function for a single run of the code in Tables 1, 2, 3, and 4 for the solution of the equation y′ = −2y, with y(0) = 1.

Figure 3 :
Figure 3: Logarithm of the neural network accuracy for a single run of the code in Tables 1, 2, 3, and 4 for the solution of the equation y ′ = −2y, with y(0) = 1.

Figure 4 :
Figure 4: Average of the logarithm of the neural network accuracy over 50 runs of the code in Tables 1, 2, 3, and 4 for the solution of the equation y′ = −2y, with y(0) = 1. The solution calculated by the neural network (the approximate solution, in the legend) is just over one standard deviation away from the analytical solution.
I trained the network with a different expression of [EE] = [EE](t) in the loss function, depending on whether the value of the enzyme concentration was greater or less than 1. I assumed a linear time course of the enzyme for concentration values not exceeding 1. This approximation avoids obtaining an integro-differential loss function and is supported by the fact that, since [S] ≪ [E], the variations of [S] with respect to [E] can be considered linear. I used the following values for the hyper-parameters.

Figure 5 :
Figure 5: Solution of Eq. (13) for the substrate calculated by the neural network with a loss function as in Eq. (16).The values of the y-axis are re-scaled between 0 and 1 by the second sigmoid layer.

Figure 6 :
Figure 6: Loss function during the process of solution of Eq. (13) for the product by the neural network with a loss function as in Eq. (16).

Figure 7 :
Figure 7: Numerical solution of Eq. (13) obtained with the lsoda solver in the R package [37].

Figure 8 :
Figure 8: Numerical solution of Eq. (13) obtained with the lsoda solver in the R package [37], rescaled between 0 and 1 for a better comparison with Figure 5.