Data-Driven Reconstruction of Stochastic Dynamical Equations based on Statistical Moments

Stochastic processes are encountered in many contexts, ranging from generation sizes of bacterial colonies and service times in a queueing system to displacements of Brownian particles and frequency fluctuations in an electrical power grid. If such processes are Markov, then their probability distribution is governed by the Kramers-Moyal (KM) equation, a partial differential equation that involves an infinite number of coefficients, which depend on the state variable. The KM coefficients must be evaluated based on measured time series for a data-driven reconstruction of the governing equations for the stochastic dynamics. We present an accurate method of computing the KM coefficients, which relies on computing the coefficients' conditional moments based on the statistical moments of the time series. The method's advantages over state-of-the-art approaches are demonstrated by investigating prototypical stochastic processes with well-known properties.


Introduction
The analysis of time series reflecting the dynamics of complex systems, ranging from physics via engineering to medicine and the neurosciences, needs to be based on assessing the interactions and strengths of random forces operating within the systems. Modeling of such systems and time series has been pursued for decades [1,2,3,4,5]. Time series with a finite Markov-Einstein scale can be modeled by a data-driven reconstruction of the governing equations, leading to stochastic differential equations (SDEs). These consist of deterministic and stochastic terms, with the latter corresponding to the fluctuating behavior of the measured time series. As described below, the parameters of the SDE, known as Kramers-Moyal (KM) coefficients, are estimated directly from the measured data [3,4,5].
Not only does the modeling aim to understand the physics of the system and to extract as much information from the data as possible, but it also aims to make probabilistic predictions, namely to address the question: given the state of a system at time t, what is the probability of finding it in a particular state at time t + τ? If the state of a system at time t depends statistically only on its state at the previous time step, but not on earlier ones, then the stochastic process is a Markov process. A given time series may be considered Markov above the Markov-Einstein scale, which can be estimated directly from observations via statistical tests [3,4,5].
It is known that the probability distributions, both marginal and conditional, of Markov processes are governed by a partial differential equation that is first order in time and contains an infinite series of derivatives with respect to the state variable x. The equation is known as the Kramers-Moyal equation [1,2] and is given by

∂p(x, t|x₀, t₀)/∂t = Σ_{l=1}^{∞} (−∂/∂x)^l [D^(l)(x, t) p(x, t|x₀, t₀)] ,    (1)

subject to the initial condition p(x, t₀|x₀, t₀) = δ(x − x₀). Here, the D^(l)(x, t) are determined by the conditional moments of the transition probability density, which, if all known, identify that probability density completely. They are the aforementioned KM coefficients and are given by

D^(l)(x, t) = lim_{τ→0} (1/(l! τ)) K^(l)(x, t, τ) ,    (2)

with K^(l)(x, t, τ) defined as

K^(l)(x, t, τ) = ⟨[x(t + τ) − x(t)]^l⟩|_{x(t)=x} ,    (3)

and with ⟨· · ·⟩ denoting averaging over the conditional distribution. State-of-the-art procedures for estimating the conditional moments K^(l)(x, t, τ) and, hence, the KM coefficients involve histogram-based [5] and kernel-based regression [6,7,8] as well as the maximum likelihood method [9,10,11], with the first two methods estimating the conditional probability distribution function (PDF) p(x′, t + τ|x, t) for given x and x′ in order to calculate the average in Eq. (3). Other methods are the unbiased estimation method [12] and the statistical inference approach [13]; see also [14] for double kernel-based regression for unevenly sampled time series. In general, estimating the conditional PDF using histogram-based regression with a fixed binning, given a time series with a finite number of data points, leads to a sparse matrix (for p(x′, t + τ|x, t)) and, therefore, does not provide reliable results for the KM coefficients. Furthermore, kernel-based regression requires an a priori selection of the kernel and its bandwidth to compute the conditional moments, which represents the method's main obstacle to yielding accurate results. In addition to such limitations, due to the sparsity of the matrix of conditional PDFs, such procedures (histogram/kernel) are applicable in one and rarely in two dimensions [3,15,16]. On the other hand, similar to other optimization problems, maximum likelihood estimates can be sensitive to the choice of the starting values and are computationally expensive. To overcome such limitations, we introduce in this paper a new approach, the moments method, for computing the KM coefficients using the short-time correlation functions and statistical moments of a time series.
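To make the sparsity problem concrete, the following minimal Python sketch (our own illustration; the function name, bin count, and minimum-occupancy threshold are arbitrary choices, not taken from any of the cited implementations) estimates a conditional moment by averaging increments within bins of the state variable:

```python
import numpy as np

def km_conditional_moment_hist(x, order=1, tau=1, n_bins=40):
    """Histogram-based estimate of K^(l)(x, tau) = <(x(t+tau)-x(t))^l | x(t)=x>.

    Bins the state variable and averages the l-th increment within each bin.
    Sparsely populated bins yield unreliable estimates and are discarded,
    which is exactly the sparse-matrix problem discussed in the text.
    """
    increments = (x[tau:] - x[:-tau]) ** order
    states = x[:-tau]
    edges = np.linspace(states.min(), states.max(), n_bins + 1)
    idx = np.clip(np.digitize(states, edges) - 1, 0, n_bins - 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    K = np.full(n_bins, np.nan)
    for b in range(n_bins):
        mask = idx == b
        if mask.sum() > 10:          # skip sparsely populated bins
            K[b] = increments[mask].mean()
    return centers, K
```

Bins with too few samples must be dropped (here returned as NaN), which degrades the estimate precisely in the tails; the moments method introduced in this paper avoids the binning step entirely.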

Estimation of Kramers-Moyal conditional moments using statistical moments
If y_l(t) = [x(t + τ) − x(t)]^l represents the l-th increment of x(t), and assuming that the conditional moments in Eq. (3) can be approximated by a polynomial of order l′ (see the discussion below for the choice of l′), one can write

K^(l)(x, t, τ) ≃ Σ_{i=0}^{l′} ϕ_i x^i ,    (4)

where we omitted the dependence of ϕ_i on t and τ to simplify notation. We find that the coefficients ϕ₀, ϕ₁, …, ϕ_{l′} are the solution of a system of linear equations (see Appendix A)

⟨x^k y_l(t)⟩ = Σ_{i=0}^{l′} ϕ_i ⟨x^{k+i}⟩ ,   k = 0, 1, …, l′ ,    (5)

where ⟨x^k⟩ is the statistical moment of x(t) of order k. The left-hand side of Eq. (5) involves terms such as ⟨x^i (x(t + τ) − x(t))^l⟩, which require the estimation of short-time correlation functions ⟨x^i x(t + τ)^j⟩ and of statistical moments ⟨x^{i+j}⟩. Therefore, to compute the KM coefficients, it is necessary to have knowledge of the short-time correlation functions and statistical moments of the time series. By solving the matrix equation for ϕ₀, ϕ₁, …, ϕ_{l′}, the conditional moments and the KM coefficients of order l are determined by [cf. Eq. (2)]

D^(l)(x, t) = lim_{τ→0} (1/(l! τ)) Σ_{i=0}^{l′} ϕ_i(τ) x^i .    (6)

The ϕ coefficients are functions of the time lag τ. They are estimated (ϕ̂) for different time lags τ = (1, 2, 3, …) dt, and a linear regression is performed over the first four time lags; the slopes of the linear regressions yield the ϕ coefficients. Here, dt is the sampling interval of the time series x(t). For a non-stationary process, the moments matrix on the right-hand side and the vector on the left-hand side of Eq. (5) are time-dependent, implying that the coefficients ϕ_i can also depend explicitly on time (see Appendix B, also for a validation of our method from an analytical perspective).
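A minimal sketch of this procedure (our own simplification, assuming a stationary series so that plain sample averages replace the moments; the function name and the fixed four-lag regression follow the description above):

```python
import numpy as np
from math import factorial

def km_moments_method(x, order=1, poly_order=3, dt=0.01, n_lags=4):
    """Moments-method estimate of the KM coefficient D^(order)(x).

    For each lag tau = k*dt, the polynomial coefficients phi_i of
    K^(order)(x, tau) ~ sum_i phi_i x^i solve the linear system
      <x^a (x(t+tau) - x(t))^order> = sum_i phi_i <x^(a+i)>,  a = 0..poly_order,
    built from sample moments (stationarity assumed). A linear fit of each
    phi_i over the first n_lags lags gives its small-tau slope, and
    D^(order)(x) = (1/order!) sum_i slope_i x^i.
    """
    phis = np.empty((n_lags, poly_order + 1))
    for k in range(1, n_lags + 1):
        y = (x[k:] - x[:-k]) ** order
        xs = x[:-k]
        # moments matrix M[a, b] = <x^(a+b)>, right-hand side <x^a y>
        M = np.array([[np.mean(xs ** (a + b)) for b in range(poly_order + 1)]
                      for a in range(poly_order + 1)])
        rhs = np.array([np.mean((xs ** a) * y) for a in range(poly_order + 1)])
        phis[k - 1] = np.linalg.solve(M, rhs)
    taus = dt * np.arange(1, n_lags + 1)
    slopes = np.polyfit(taus, phis, 1)[0]        # slope of each phi_i vs tau
    return lambda xv: np.polyval(slopes[::-1], xv) / factorial(order)
```

Note that no conditional PDF is ever constructed: the binning of state space is replaced by one small linear solve per time lag.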
For a given time series, the order l′ of the polynomial is selected based on whether x^{2l′} p(x) can be resolved, especially at the tails of p(x). By "resolving", we mean that for large values of x, given a sufficient number of data points, estimates of x^{2l′} p(x) will be reliable. Polynomials of higher order l′ require a larger number of data points in order to safely estimate the statistical moments ⟨x^{2l′}⟩. For a given number of data points n, increasing l′ beyond some upper bound l′_c will no longer resolve x^{2l′} p(x) and, therefore, one should stop the expansion in powers of x at order l′_c. In Appendix C, we determine numerically the upper order of the moments, l′_c, for the three examples presented in the following. The proposed method is also applicable to non-polynomial KM coefficients, such as fractional or sinusoidal functions; examples of non-polynomial expressions in Eq. (3) include the host-immune-tumor model (also known as the dynamical model of cancer growth) and the Kuramoto-Sakaguchi phase-oscillator model with higher-order interactions [17], to name but a few.
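A simple way to probe such an upper bound numerically is to check how stable the estimate of ⟨x^{2l′}⟩ is across blocks of the series. The following heuristic is our own rough criterion for illustration, not the exact procedure of Appendix C:

```python
import numpy as np

def max_resolvable_order(x, max_order=6, n_splits=10, rel_tol=0.2):
    """Heuristic estimate of the largest polynomial order l' whose moment
    <x^(2l')> is resolved by the available data.

    Splits the series into n_splits blocks and requires the relative spread
    of the block-wise moment estimates to stay below rel_tol; the first
    order that fails sets the upper bound l'_c.
    """
    blocks = np.array_split(np.asarray(x), n_splits)
    for lp in range(1, max_order + 1):
        m = np.array([np.mean(b ** (2 * lp)) for b in blocks])
        if np.std(m) / np.mean(m) > rel_tol:
            return lp - 1
    return max_order
```

For light-tailed data this returns a comfortably high order, while heavy-tailed data (whose high moments are dominated by rare extreme samples) fail the stability check at low order.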
Before we demonstrate the advantages of the proposed method over state-of-the-art approaches, we briefly discuss the truncation of the KM equation (1) to determine the number of KM coefficients that should be derived in the case of an unknown process. The KM equation offers three possibilities: (i) truncating the expansion at l = 1 represents a deterministic process; (ii) truncating at l = 2 results in the Fokker-Planck equation for diffusion processes; and (iii) including all terms up to l → ∞. Checking the condition D^(l)(x, t) = 0 for l ≥ 3 in a given time series is not a straightforward task, as it requires knowledge of the KM conditional moments K^(l)(x, t, τ) in powers of the sampling time τ. Lehnertz et al. have shown that for a process to be classified as diffusive, the condition K^(4)(x, t, τ) ≃ 3(K^(2)(x, t, τ))² must be satisfied [24]. If this holds, then the first two KM coefficients are sufficient to describe the measured stochasticity of the time series. Otherwise, the dynamics can be approximated using a jump-diffusion process, which requires knowledge of the KM coefficients of orders 1, 2, 4, and 6 [23,5].
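The quoted criterion is pointwise in x; as a quick global proxy one can compare the unconditional fourth and second moments of the increments (our own simplification of the criterion in [24]):

```python
import numpy as np

def diffusion_ratio(x, tau=1):
    """Ratio <y^4> / (3 <y^2>^2) of increments y = x(t+tau) - x(t).

    This is an unconditional proxy for the pointwise diffusion criterion
    K^(4) ~ 3 (K^(2))^2: values near 1 are consistent with a diffusion
    process, while values well above 1 indicate jump discontinuities.
    """
    y = x[tau:] - x[:-tau]
    return np.mean(y ** 4) / (3.0 * np.mean(y ** 2) ** 2)
```

For a pure diffusion the small-τ increments are nearly Gaussian, so the ratio is close to one; Poisson-type jumps inflate the fourth moment and push the ratio well above one.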
In what follows, we reconstruct one-dimensional stochastic dynamical equations from three prototypical stochastic processes with well-known properties, of which two examples represent diffusion processes and the third one is a jump-diffusion process.
We generated synthetic time series by numerical simulation of the corresponding dynamical equations using the adaptive integration method for Itô SDEs [18,19]. Then, we estimated the KM coefficients D^(l)(x, t) of the time series using four methods, namely
histogram-based and kernel-based regressions, maximum likelihood estimation, and the moments method proposed in this paper. For each example, the standard error of the mean of the KM coefficients was computed based on 100 realizations of the applied noises and with n = 10⁶ data points. The mean-squared errors (MSE) of the estimates, as well as the coefficients of determination (R² score) between estimated and actual values, for the drift and diffusion coefficients (the first two KM coefficients) and the jump parameters, i.e., jump amplitude and jump rate, are reported in Table 1.
Example 1: Diffusion processes: Ornstein-Uhlenbeck process. The Pawula theorem [20] states that the KM expansion can be truncated at l = 2 if D^(4)(x, t) vanishes. The KM equation then reduces to the Fokker-Planck equation, with the state variable x(t) describing a diffusion process. In that case, x(t) is governed by the Langevin equation, which (using the Itô description) has the form

dx(t) = D^(1)(x, t) dt + √(2 D^(2)(x, t)) dW(t) ,    (7)

where D^(1)(x, t) and D^(2)(x, t) are the drift and diffusion coefficients, respectively, and {W(t), t ≥ 0} is a scalar Wiener process. We consider the Langevin equation (7) with D^(1)(x) = −x and D^(2)(x) = 1, and integrate it with dt = 0.01 to generate synthetic time series. Our findings for D^(1)(x) and D^(2)(x) are shown in Fig. 1. For the linear process considered, the MSEs of D^(1)(x) and D^(2)(x) estimated with the moments method are smaller (with R² closer to unity) than the respective values estimated with the other methods (Table 1). For an analytical solution of Eq. (5) for the coefficients ϕ_i and the KM coefficients, as well as for the condition to stop increasing the order of moments, see Appendices B and C.
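For readers who wish to reproduce such a series, a fixed-step Euler-Maruyama sketch is sufficient (the paper itself uses the adaptive Itô scheme of [18,19]; this simplified integrator is our own substitution):

```python
import numpy as np

def euler_maruyama(drift, diff, x0=0.0, n=100000, dt=0.01, seed=0):
    """Fixed-step Euler-Maruyama integrator for the Ito SDE
    dx = D1(x) dt + sqrt(2 D2(x)) dW.

    drift and diff are callables returning D^(1)(x) and D^(2)(x).
    """
    rng = np.random.default_rng(seed)
    dw = np.sqrt(dt) * rng.standard_normal(n - 1)   # Wiener increments
    x = np.empty(n)
    x[0] = x0
    for i in range(n - 1):
        x[i + 1] = x[i] + drift(x[i]) * dt + np.sqrt(2.0 * diff(x[i])) * dw[i]
    return x

# Ornstein-Uhlenbeck example: D1(x) = -x, D2(x) = 1
x_ou = euler_maruyama(lambda x: -x, lambda x: 1.0)
```

With D^(1)(x) = −x and D^(2)(x) = 1, the stationary distribution is Gaussian with zero mean and unit variance, which provides a quick sanity check on the integrator.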
Example 2: Diffusion processes: the noisy genetic model. As a second example from the family of diffusion processes, we consider a nonlinear multiplicative process, namely, the noisy genetic model from population genetics [21], which exhibits a noise-induced transition from unimodal to bimodal probability distribution functions with increasing noise intensity. The noisy genetic model is described by the one-variable diffusive SDE [22] (see Appendix D)

dx(t) = [α − x + β₀ x(1 − x)] dt + √2 σ x(1 − x) dW(t) ,    (8)

where x ∈ [0, 1] and α ∈ (0, 1) stands for the mutation rate. Here, β₀ denotes the gene-selection factor and, independent of α and β₀, the deterministic part of the dynamical system (8) has only one equilibrium point in the corresponding interval, which is stable. Moreover, the noise intensity σ is a real constant.
There is a critical value σ²_c such that for σ² < σ²_c, the stationary solution of the corresponding Fokker-Planck equation, p_s(x), has only one extremum, namely a maximum. When σ² > σ²_c, the stationary probability distribution has three extrema, with x₂ being a minimum and x₁ and x₃ each being a maximum. Therefore, when σ² ≈ σ²_c, the probability distribution becomes flat, followed by a transition to bimodal distribution functions for more intense noise.
We integrate Eq. (8) with dt = 0.01 to generate synthetic time series for noise intensities below, near, and above σ²_c; our results for the drift and diffusion coefficients are summarized in Fig. 2. We find that for all values of σ², the MSEs corresponding to the proposed moments method are comparably smaller (with R² closer to unity), as presented in Table 1.
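A corresponding simulation sketch for this example, using the drift and diffusion coefficients quoted in the caption of Fig. 2 (the Euler-Maruyama discretisation and the clipping of the state to [0, 1] are our own choices):

```python
import numpy as np

def simulate_genetic(sigma2, beta0=3.0, n=200000, dt=0.01, seed=3):
    """Euler-Maruyama sketch of the noisy genetic model with
    D1(x) = 1/2 - x + beta0 x(1-x) and D2(x) = sigma^2 x^2 (1-x)^2,
    i.e. dx = D1 dt + sqrt(2 D2) dW.

    The state is clipped to [0, 1] to keep the discrete scheme inside the
    state space; the multiplicative noise vanishes at both boundaries.
    """
    rng = np.random.default_rng(seed)
    s = np.sqrt(sigma2)
    x = np.empty(n)
    x[0] = 0.5
    dw = np.sqrt(dt) * rng.standard_normal(n - 1)
    for i in range(n - 1):
        d1 = 0.5 - x[i] + beta0 * x[i] * (1.0 - x[i])
        g = np.sqrt(2.0) * s * x[i] * (1.0 - x[i])
        x[i + 1] = np.clip(x[i] + d1 * dt + g * dw[i], 0.0, 1.0)
    return x
```

Increasing σ² spreads the stationary distribution away from the single deterministic equilibrium, which is the noise-induced transition described above.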
Example 3: Jump-diffusion process with a bistable potential.It has become evident that a non-vanishing D (4) (x, t) is a signature of jump discontinuities in a time series [23,24].
In this case, one needs the KM coefficients of at least order six, D^(6)(x, t) (and in some cases up to order eight [5]), to estimate the jump amplitude and rate, and the KM equation will contain all terms up to l → ∞ [25,26,27,28,23,29,30,31,32,33,34].
For non-vanishing D^(4)(x, t), one models a time series as a jump-diffusion process, which is given by an (Itô) stochastic dynamical equation [5,23]

dx(t) = D^(1)(x, t) dt + D(x, t) dW(t) + ξ dJ(t) ,    (9)

where {W(t), t ≥ 0} is a scalar Wiener process, D^(1)(x, t) is the drift coefficient, D(x, t) is the diffusion function, and J(t) is a Poisson jump process. The jump rate λ(x) can be state-dependent, and the jump size ξ is assumed to have a symmetric distribution with finite even-order statistical moments ⟨ξ^{2j}⟩ for j ≥ 1. For a jump-diffusion process (9), all functions and parameters can be determined from measured time series by estimating the KM conditional moments [23,5] (see Appendix E). For simplicity, we assume that the jump size ξ has a Gaussian PDF with variance σ²_ξ. In that case, ⟨ξ^{2k+1}⟩ = 0 and ⟨ξ^{2k}⟩ = (2k)!/(2^k k!) ⟨ξ²⟩^k. We consider a dynamical system with a cubic nonlinearity (D^(1)(x, t) = x − x³) with additive white noise, D(x, t) = √0.75, jump amplitude σ²_ξ = 1, and jump rate λ = 0.3. The deterministic part of the dynamics has three fixed points: an unstable one at 0 and stable ones at ±1. In Fig. 3, we show the numerical values for the reconstructed drift and diffusion coefficients, as well as the jump amplitude and jump rate. We find that the MSEs corresponding to the proposed moments method are comparably smaller, see Table 1.
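A simulation sketch for this example (assuming a standard Euler discretisation in which the diffusion function multiplies dW directly and the Poisson jumps are approximated by a Bernoulli draw per step; these discretisation choices are ours):

```python
import numpy as np

def simulate_jump_diffusion(n=200000, dt=0.01, lam=0.3, sigma_xi2=1.0, seed=1):
    """Euler sketch of the jump-diffusion example:
    dx = (x - x^3) dt + sqrt(0.75) dW + xi dJ,
    with jump rate lam and Gaussian jump sizes xi ~ N(0, sigma_xi2).

    A jump occurs in a time step with probability lam*dt (Bernoulli
    approximation of the Poisson process, valid for lam*dt << 1).
    """
    rng = np.random.default_rng(seed)
    gauss = np.sqrt(dt) * rng.standard_normal(n - 1)
    jumps = (rng.normal(0.0, np.sqrt(sigma_xi2), n - 1)
             * (rng.random(n - 1) < lam * dt))
    x = np.empty(n)
    x[0] = 1.0
    for i in range(n - 1):
        x[i + 1] = (x[i] + (x[i] - x[i] ** 3) * dt
                    + np.sqrt(0.75) * gauss[i] + jumps[i])
    return x
```

The jumps leave a clear signature in the increment statistics: the fourth moment of the increments strongly exceeds three times the squared second moment, which is exactly the non-diffusive fingerprint discussed above.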

Conclusions
We introduced a new method to estimate the conditional averages of increments (x(t + τ) − x(t))^l, with l = 0, 1, …, that are needed for a data-driven reconstruction of stochastic dynamical equations from experimental time series. The method determines a conditional average in terms of statistical moments of the measured time series. It replaces the estimation of sparse conditional probability distributions with the solution of a system of linear equations, which has polynomial time complexity. Using three paradigmatic linear and nonlinear stochastic processes, we demonstrated that the mean-squared errors (coefficients of determination) of KM coefficients estimated with the proposed method are smaller (larger) than the respective values obtained with current state-of-the-art methods.
While the examples presented in this work focus on stationary moments, we acknowledge the potential extension of our theoretical approaches to non-stationary
processes (as discussed in Appendix B). We note that our method has not been implemented for non-stationary data at this stage. However, we are actively working on proving the method's applicability to time-dependent Fokker-Planck equations and on expanding its scope to non-stationary time series. We intend to provide further updates and findings in the near future. As a final remark, we note that the proposed method is easily extendable to multivariate time series, which will enable one to characterize pairwise as well as higher-order interactions in the dynamics [17]. Our moments method provides a way to rapidly and reliably estimate the functions and parameters needed for a reconstruction of the dynamical equations of complex systems.

Multiplying Eq. (A.7) by x, x², …, x^{l′} and integrating over x, we obtain a system of linear equations.

As we mentioned in the main text in the first example, the Ornstein-Uhlenbeck (OU) process can be written as a Langevin equation of the form

dx(t) = −θx(t) dt + µ dW(t) ,    (B.1)

with drift and diffusion coefficients given by D^(1)(x, t) = −θx and 2D^(2)(x, t) = µ².
Here, µ and θ are positive constants. For θ > 0, the process (B.1) has a stationary probability distribution [5]. The mean of the process can be set to zero, and the covariance of x(t) and x(s) in the stationary state can be written as [5]

cov(x(t), x(s)) = (µ²/2θ) e^{−θ|t−s|} .

The statistical moments of the OU process are then ⟨x^{2k+1}⟩ = 0 and ⟨x^{2k}⟩ = (2k − 1)!! (µ²/2θ)^k, where the double factorial (2k − 1)!! is the product of all odd integers from 1 up to 2k − 1. To demonstrate our moments method, let us assume that the conditional KM moments are a cubic function of x; having the analytical forms of the covariance and the statistical moments, the matrix elements of the system of linear equations (Eq. A.9) can then be determined. For the first-order KM moment (to obtain the drift coefficient), we set y ≡ y₁(t, τ) = x(t + τ) − x(t), which determines the vector on the l.h.s. of Eq. (B.4). The solution for the parameter vector of this system of linear equations is ϕ₁ = e^{−θτ} − 1 with all other ϕ_i = 0, which means K^(1) = (e^{−θτ} − 1) x(t), and the drift coefficient is D^(1)(x) = lim_{τ→0} K^(1)/τ = −θx. For the second KM moment (to obtain the diffusion coefficient), we set y ≡ y₂(t, τ) = (x(t + τ) − x(t))², which yields the corresponding l.h.s. vector of Eq. (B.4) and the solution of the linear system. Note that the moments matrix on the r.h.s. of Eq. (B.4) is not a function of y and does not change. The resulting diffusion coefficient is D^(2)(x) = lim_{τ→0} K^(2)/(2τ) = µ²/2.

Example 2: Non-stationary process, Wiener process.
As an example of a non-stationary process, we consider the Wiener process, whose derivative is white noise. It is known that the Wiener process has a vanishing drift and a constant diffusion coefficient equal to 1/2 [5]. The covariance of a Wiener process is

cov(x(t), x(s)) = min(t, s) ,    (B.11)

where min(t, s) is the minimum of the two chosen times (t, s). With x(0) = 0 and t₀ = 0, the statistical moments of the Wiener process are given by ⟨x^{2k+1}⟩ = 0 and

⟨x^{2k}⟩ = (2t)^k Γ(k + 1/2)/√π ,

where Γ(x) is the Euler gamma function. Here, ⟨· · ·⟩ denotes ensemble averaging.
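Both Appendix B results can be checked numerically; the following sketch (our own verification, not part of the paper) solves the moments linear system with the analytical OU moments and tests the even Wiener moments against a sampled ensemble:

```python
import numpy as np
from math import gamma, sqrt, pi, exp

# Check 1: OU result phi = (0, e^{-theta tau} - 1, 0, 0), using the
# stationary moments <x^(2k)> = (2k-1)!! (mu^2/2theta)^k and the Gaussian
# identity <x^k x(t+tau)> = e^{-theta tau} <x^(k+1)>.
theta, mu, tau, lp = 1.0, 1.0, 0.05, 3
v = mu ** 2 / (2.0 * theta)                   # stationary variance

def ou_moment(k):
    """Stationary moment <x^k> of the OU process (zero for odd k)."""
    if k % 2:
        return 0.0
    out = 1.0
    for j in range(1, k, 2):                  # (k-1)!! = 1*3*...*(k-1)
        out *= j
    return out * v ** (k // 2)

M = np.array([[ou_moment(a + b) for b in range(lp + 1)] for a in range(lp + 1)])
c = exp(-theta * tau) - 1.0
rhs = np.array([c * ou_moment(a + 1) for a in range(lp + 1)])
phi = np.linalg.solve(M, rhs)                 # expect (0, c, 0, 0)

# Check 2: ensemble test of <x(t)^(2k)> = (2t)^k Gamma(k+1/2)/sqrt(pi)
# for the Wiener process, using W(t) ~ N(0, t).
rng = np.random.default_rng(2)
t, n_paths = 2.0, 200000
xt = rng.normal(0.0, np.sqrt(t), n_paths)
wiener_ok = all(
    abs(np.mean(xt ** (2 * k)) / ((2 * t) ** k * gamma(k + 0.5) / sqrt(pi)) - 1) < 0.1
    for k in (1, 2, 3)
)
```

The first check reproduces the analytical solution exactly (up to floating-point error), since the moments matrix is built from closed-form Gaussian moments rather than data.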

Figure 1. Upper panel: Excerpt of an exemplary time series of an Ornstein-Uhlenbeck process (Eq. (7)) with drift and diffusion coefficients D^(1)(x) = −x and D^(2)(x) = 1. Lower panels: Theoretical (dashed lines) and estimated drift coefficients using histogram-based and kernel-based regressions, maximum likelihood estimation (MLE), and the proposed moments method. We used the best-tuned values (minimum MSE and maximum R² score) for the number of bins and the bandwidth for the histogram-based and kernel-based regressions, which we determined to be M = 45 and h ≃ 0.3 (with a Gaussian kernel). We chose the orders of the polynomials for estimating the drift and diffusion coefficients to be l′ = 3 and l′ = 4, respectively (cf. Appendix C). Error bars are the standard error of the mean, estimated from 100 realizations of synthetic time series.

Figure 2. Top to bottom: Excerpts of exemplary time series of the noisy genetic model (Eq. (8)) with drift and diffusion coefficients D^(1)(x) = 1/2 − x + β₀ x(1 − x) (with β₀ = 3) and D^(2)(x) = σ² x² (1 − x)² for σ² < 2.84 (top), σ² ≈ 2.84 (middle), and σ² > 2.84 (bottom). Theoretical (dashed lines) and estimated drift and diffusion coefficients using the four methods. The best-tuned values for the number of bins and for the kernel bandwidth were M = 41 and h ≃ 0.3 (with a Gaussian kernel). The orders of the polynomials for estimating the drift and diffusion coefficients were l′ = 3 and l′ = 4, respectively (cf. Appendix C). Error bars are the standard error of the mean (SEM), estimated from 100 realizations of the synthetic time series.

Figure 3. Top: Excerpt of an exemplary time series of a jump-diffusion process (Eq. (9)) with drift and diffusion coefficients D^(1)(x) = x − x³ and D(x) = √0.75 ≈ 0.866, and with jump amplitude σ²_ξ = 1 and jump rate λ = 0.3. Theoretical (dashed lines) and estimated functions and parameters of the jump-diffusion model using the four methods are shown in the lower panels. The best-tuned values for the number of bins and the kernel bandwidth were M = 61 and h ≃ 0.3 (with a Gaussian kernel). We expanded the KM coefficients D^(4)(x, t) and D^(6)(x, t) with up to fourth- and sixth-order polynomials (l′ = 4 and l′ = 6), respectively (cf. Appendix C). Error bars are the standard error of the mean, estimated from 100 realizations of the synthetic time series. Note that we do not report results for MLE here and in Table 1, since a theoretical expression for the short-time propagator p(x′, t + τ|x, t), which is required for MLE, is not known for jump-diffusion processes, in contrast to the general Langevin equation [10].

which proves relation (5) in the main text.

Appendix B. Analytical solutions

Example 1: Stationary process, Ornstein-Uhlenbeck process.
To find the diffusion coefficient, one has to solve the following set of linear equations with y ≡ y₂(t, τ) = (x(t + τ) − x(t))²:

Figure C1. Plots of x² p(x), …, x¹⁰ p(x), estimated from time series of example 1 for different n dt. Error bars are estimated from 10 realizations of the generated time series.

Figure C4. Figure C5.
Figure C4. Dependence of the statistical moments ⟨x^{2k}⟩, with k = 1, 2, …, 5, on n dt, from 10 realizations, for the three examples of the main text. The middle panel is for example 2 with σ₁² < σ²_c.

Table 1. Mean-squared errors (MSEs) of estimates and R² scores between estimated and actual values of the drift and diffusion coefficients, as well as of the jump rate and jump amplitude, for the examples considered. Results are given for the histogram-based and kernel-based regressions, maximum likelihood estimation (MLE), and the proposed moments method. Since no theoretical expression for the short-time propagator p(x′, t + τ|x, t) is known for jump-diffusion processes, MLE could not be applied there. MSEs and R² scores were estimated in the interval x ∈ [−2, 2] for the Ornstein-Uhlenbeck process, x ∈ [0.1, 1] for the noisy genetic model, and x ∈ [−1.5, 1.5] for the jump-diffusion process.