Reconstructing complex system dynamics from time series: a method comparison

Forough Hassanibesheli; Niklas Boers; Jürgen Kurths

doi:10.1088/1367-2630/ab9ce5

1. Introduction

A complex system is a system composed of highly interconnected components in which the collective property of an underlying system cannot be described by dynamical behavior of the individual parts alone. That is to say, the emergent behavior of a system is not evident in the components by themselves, but only in the system as a whole. Examples of this include lasers, fluids, financial markets, biological and social systems, but also Earth system and in particular climate dynamics variables [1–4]. Typically, complex systems are governed by nonlinear interactions and intricate fluctuations, and to retrieve the dynamics of a system from time series of only a few observed variables, it is required to characterize and assess interactions between deterministic tendencies and random fluctuations.

For complex systems, with large numbers of degrees of freedom interacting on various time scales, deriving explicit time-evolution equations remains both conceptually and computationally challenging. Many approaches from time series analysis and stochastic modeling have been proposed to model the behavior of complex systems based on observed time series by separating the system's behavior between observed macroscopic and hidden microscopic scales. Furthermore, it is typically assumed that these scales can be separated into slow and fast dynamics, respectively. Stochastic differential equations (SDEs) are then used as a result of the impossibility to determine all necessary processes and scales in comprehensive numerical models. In SDEs the macroscopic variables are modeled explicitly while the microscopic ones are represented by noise terms. The intuition underlying such approaches is to identify a small number of variables which are assumed to characterize the underlying dynamics, and solve the governing motion equation for the reduced-dimensional system in terms of SDEs. For instance master stability function has been extensively used to analyze the stability of the synchronous state in dynamical systems (both stochastic and deterministic) [5, 6].

During recent years, interest in modeling stochastic dynamics using Langevin equations (LEs) has elevated in many disciplines [7], such as investigating turbulent cascades [8], describing nano-friction fluctuations [9], or modeling molecular dynamics [10] and financial systems [11]. A commonly used, non-parametric method for deriving the deterministic drift and stochastic diffusion terms of an LE is based on the Kramers–Moyal (KM) coefficients, which correspond to the first and second order moments containing the spatial and temporal dependencies of the underlying dynamics [12, 13].

In the LE-approach, the dependencies between slow and fast variables are assumed to be negligible as a consequence of the separation of time scales. However, this assumption is not always well-grounded, for instance in Brownian motion [14], when the mass of particles is comparable with that of the surrounding particles, the random force does not obey white noise behavior anymore. Generalized Langevin equations (GLEs) have been proposed as a modification of LEs with no constraint of time scale separability [15]. A GLE equation can be retrieved from times series using projection operator techniques [15–17] and linear response theory [12]. In this study, we obtain the GLE model of a system within the projection operator framework. This technique, developed by Mori [15] and Zwanzig [18](MZ), provides a mathematically consistent foundation for closed, reduced-dimension models. It is restrained, however, to Hamiltonian dynamical systems around thermodynamic equilibrium. This formalism applies a projection operator P that projects the whole system dynamics onto the subspace of slow variables, and projects out the remaining degree of freedom. As a consequence of this dimension reduction, non-Markovian terms arise explicitly in the resulting equation of motion, representing interactions between the slow (observed) and the fast (hidden) variables.

Another dimension-reduction approach, belonging to a class of multivariate stochastic models, is empirical model reduction (EMR) [19, 20]. This methodology, which can be understood as an extension of the classical linear inverse model (LIM) approach [21], has been extensively used in modeling observed climate variables [22] and in several cases can successfully reproduce their statistical characteristics. In constructing the EMR model, it is assumed that the time evolution of a system can be approximated by information embedded in the observed time series with only a coarse knowledge about a system's dynamics.

When fitting a prescribed drift term with free parameter to a real-world series, the residual will typically not be white neither in space nor in time; i.e., it will be state-dependent and exhibit serial correlations. To cope with this problem, the EMR approach iteratively models the time derivative of the residual as a linear function of the state of the observed variable and the residual itself, until the final residual is approximately independent (see methods section below). Substantively, this stochastic model constructs the evolution of macroscopic variables by introducing hidden components representing microscopic variables. In the parameterization procedure, hidden variables are approximated by the past observed variables, which constitutes a non-Markovian closure [20].

The main purpose of this work is to systematically investigate the statistical properties of different dynamical systems using the three aforementioned methods (KM, MZ, and EMR). In section 2, we shortly introduce the employed reduction methods. In section 3 we present a detailed comparison of the performance of these methods based on different kinds of synthetic and real-world time series, and discuss them briefly. Concluding remarks are provided in section 4.

2. Methods

2.1. Langevin equation

In the early twentieth century Paul Langevin proposed a quantitative description of random motion of colloidal particles suspended in a fluid, which has been known as a key problem of non-equilibrium statistical mechanics. This theory's applicability later has been extended to express the dynamical behavior of varieties of macroscopic systems without genuine particle ontologies. The original Langevin equation is a stochastic differential equation, which represents the time evolution of a subset of degree of freedoms containing both frictional and random forces that are associated with the fluctuation–dissipation theorem (FDT) [12].

Consider a system for which the evolution of the macroscopic states x(t) obeys the following equation of motion:

$\begin{equation}\frac{\mathrm{d}x\left(t\right)}{\mathrm{d}t}=f\left(x,t\right)+g\left(x,t\right)\eta \left(t\right),\end{equation} \tag{ 1 }$

where f(x, t) and g(x, t)η(t) represent the deterministic force (e.g., friction and gravity) and stochastic forces (e.g., noise and chaotic particle interactions in many-body systems), respectively. Here, η(t) is conventionally a stationary, δ-correlated Gaussian process with zero mean: 〈η(t)〉 = 0 and 〈η(t)η(t')〉 = δ(t − t'). The presence of δ-correlated noise indicates that the Langevin process is a Markov process. For the class of stochastic processes following equation (1), the drift f and the diffusion g can be directly estimated from the measured data without any prior knowledge about the internal dynamics of a system using the Kramers–Moyal (KM) coefficients [23, 24]. The nth KM coefficient Dⁿ(x, t) can be understood in terms of the transition probability densities in the limit of dt → 0. Formally, the KM coefficients are defined as:

$\begin{equation}{D}^{n}\left(x,t\right)=\frac{1}{n!}{\mathrm{lim}}_{\mathrm{d}t\to 0}\frac{1}{\mathrm{d}t}\langle {\left(x\left(t+\mathrm{d}t\right)-x\left(t\right)\right)}^{n}{\vert }_{x\left(t\right)=x}\rangle .\end{equation} \tag{ 2 }$

In this study, to calculate conditional moments from time-discrete data, we employed the Nadaraya–Watson estimator [25, 26]:

$\begin{equation}\frac{1}{\mathrm{d}t}\langle {\left({x}_{\left(i+1\right)\mathrm{d}t}-{x}_{i\mathrm{d}t}\right)}^{j}{\vert }_{{x}_{i}=x}\rangle =\frac{\sum _{i=1}^{n}K\left(\frac{\left({x}_{i\mathrm{d}t}-x\right)}{h}\right){\left({x}_{\left(i+1\right)\mathrm{d}t}-{x}_{i\mathrm{d}t}\right)}^{j}}{\sum _{i=1}^{n}K\left(\frac{\left({x}_{i\mathrm{d}t}-x\right)}{h}\right)\mathrm{d}t}.\end{equation} \tag{ 3 }$

The kernel function K is here assumed to be Gaussian and accordingly the bandwidth h is determined using Silverman's rule of thumb: $h=1.06\sigma {N}^{\frac{-1}{5}}$ where σ is the standard deviation of the time series under investigation.

The numerical discretization of the LE, in Itô's interpretation of stochastic integration, is as follows:

$\begin{equation}\mathrm{d}x\left(t\right)={D}^{1}\left(x,t\right)\mathrm{d}t+\sqrt[]{{D}^{2}\left(x,t\right)}\mathrm{d}W\left(t\right),\end{equation} \tag{ 4 }$

where dW denotes the increments of a Wiener process (known as a stochastic process with stationary independent normally distributed increments). The numerical integration of the equation (4) can be easily implemented using the Euler–Maruyama scheme with accuracy of order $\sqrt[]{{\Delta}t}$ .

$\begin{equation}x\left(t+{\Delta}t\right)=x\left(t\right)+{D}^{1}\left(x,t\right){\Delta}t+\sqrt[]{{D}^{2}\left(x,t\right){\Delta}t}\eta \left(t\right).\end{equation} \tag{ 5 }$

Although the KM method has been applied successfully in many fields, the accuracy of estimation can be affected by finite-time effects since the estimation of drift and diffusion coefficients is sensitive to the sampling frequency of the underlying time series [27, 28]. Using the adjoint Fokker–Planck equation (AFPE) [29], we can derive D¹ and D² analytically for example for an Ornstein–Uhlenbeck (OU) process, and compare to the discrete-time estimates to judge on their accuracy:

$\begin{equation}\frac{\partial }{\partial t}{P}^{+}={D}^{1}\frac{\partial }{\partial x}{P}^{+}+{D}^{2}\frac{{\partial }^{2}}{\partial {x}^{2}}{P}^{+}.\end{equation} \tag{ 6 }$

With a special initial condition p⁺(x, 0) = (x − X) we can solve the AFPE analytically. For instance, in the case of an OU process (containing 10⁶ points with sampling frequency of 10⁻²) subjected to additive and multiplicative noise, respectively, we compared the first and second order KM coefficients with those obtained from AFPE. As can be concluded from figure 1, the values of the KM coefficients estimated from discrete times series according to the equation (2) and the corresponding exact solutions are in a good agreement. One can also use the second-order expansion for moments [30] to check the robustness of estimated drift and diffusion coefficients for above systems.

Figure 1. Refer to the following caption and surrounding text. — **Figure 1.** Comparison of the estimated drift (D1) and diffusion (D2) coefficients with corresponding exact solutions for an OU process containing 10⁶ points with sampling frequency of 10⁻². (a) Estimated D1 and D2 (red) and the values obtained from the analytical solution via AFPE (blue) for an OU process with additive noise. (b) Same as (a) for an OU process subject to the multiplicative noise 1 + x².
Download figure:
Standard image High-resolution image

2.2. Generalized Langevin equation

The Langevin equation is the simplest approximation to describe the dynamics of non-equilibrium systems. As remarked earlier, the random force is assumed to behave like a Gaussian white noise, which does not lead to a good approximation if the time scale of the macroscopic variables is not much longer than that of the microscopic variables. This issue also requires a modification of the fluctuation dissipation theorem in which the correlation function of fluctuations is proportional to the memory of the frictional force. Therefore, generalized Langevin equations (GLE) have been proposed to account for long-range correlations and memory effects of complex systems that do not exhibit strong time scale separation. This generalized equation satisfies a certain fundamental consistency condition which relates the memory function to the auto-correlation of the stochastic force.

One way to derive a GLE is by means of the Mori–Zwanzig (MZ) formalism [17, 31], a projection-based dimension reduction method that redefines a set of ordinary differential equations into a reduced system with a time-independent Hamiltonian as long as the system is close to equilibrium. The MZ formalism assumes that a macroscopic system can be well described by projecting the full microscopic dynamics of a system onto the space of macroscopic variables. The resulting equation consists of three terms; the first, local term defines the self-interactions of the macroscopic variables, the second, non-local term describes memory dependencies of the macroscopic variables, and the last term represents the residual force associated with the fast variables that is typically approximated by white noise:

$\begin{equation}\frac{\mathrm{d}x\left(t\right)}{\mathrm{d}t}={\Omega}x\left(t\right)-{\int }_{{t}_{0}}^{t}K\left(t-{t}^{\prime }\right)x\left({t}^{\prime }\right)\mathrm{d}{t}^{\prime }+R\left(t,{t}_{0}\right).\end{equation} \tag{ 7 }$

By multiplying both sides of equation (7) with x(t₀) and taking the average [32–34], we arrive at the following expression:

$\begin{equation}\left\langle \frac{\mathrm{d}x\left(t\right)}{\mathrm{d}t},x\left({t}_{0}\right)\right\rangle =\langle {\Omega}x\left(t\right),x\left({t}_{0}\right)\rangle -{\int }_{{t}_{0}}^{t}K\left(t-{t}^{\prime }\right)\langle x\left({t}^{\prime }\right),x\left({t}_{0}\right)\rangle \mathrm{d}{t}^{\prime }+\langle R\left(t,{t}_{0}\right),x\left({t}_{0}\right)\rangle \end{equation} \tag{ 8 }$

where 〈R(t)x(t₀)〉 = 0, because the fast variable of the underlying system is uncorrelated with the slow variable. We can rewrite equation (8) in form of an Volterra integro-diffential equation [35, 36] for the autocorrelation function C(t) = 〈x(t), x(t₀)〉:

$\begin{equation}\frac{\mathrm{d}C\left(t\right)}{\mathrm{d}t}={\Omega}C\left(t\right)-{\int }_{{t}_{0}}^{t}K\left(t-{t}^{\prime }\right)C\left({t}^{\prime }\right)\mathrm{d}{t}^{\prime }.\end{equation} \tag{ 9 }$

The memory kernel K can then be calculated by solving this equation with known dC/dt and C. Note that the discrete version of the integral ${\int }_{{t}_{0}}^{t}K\left(t-{t}^{\prime }\right)C\left({t}^{\prime }\right)\mathrm{d}{t}^{\prime }$ is $h{\sum }_{j=0}^{n-1}K\left({t}_{n-1}-{t}_{j}\right)C\left({t}_{j}\right)$ , where we applied the composite left rectangular formula, with t_j = t₀ + hj using equal spacing h. Thus having obtained estimates for the deterministic non-Markovian and the memory term on the right-hand side of equation (7), we are able to determine the random force. The non-Markovian deterministic part on the right-hand side of the equation (7), corresponding to the drift, is nonlinear for systems not close to equilibrium and Ωx(t) is thus replaced with $F\left(\right.x\left(t\right)$ with a nonlinear function F(x(t)). In such cases, we estimate this term using the first KM coefficient and estimate the memory kernel [34] via:

$\begin{equation}\frac{\mathrm{d}C\left(t\right)}{\mathrm{d}t}=\langle F\left(x\left(t\right)\right),x\left({t}_{0}\right)\rangle -{\int }_{{t}_{0}}^{t}K\left(t-{t}^{\prime }\right)C\left({t}^{\prime }\right)\mathrm{d}{t}^{\prime }.\end{equation} \tag{ 10 }$

Note that because we are considering discrete numerical time series, we are in practice using the following evolution equation:

$\begin{equation}x\left(n\right)=x\left({t}_{n-1}\right)+h\left[F\left(x\left({t}_{n-1}\right)\right)-h\sum _{j=0}^{n-1}K\left({t}_{n-1}-{t}_{j}\right)x\left({t}_{j}\right)+R\left({t}_{n-1}\right)\right].\end{equation} \tag{ 11 }$

2.3. Empirical model reduction

The primary goal of spatio-temporal modeling of complex systems is to characterize the properties of underlying dynamics with minimal assumptions about non-sufficient observations. Over the last few years, linear and nonlinear inverse stochastic modeling approaches have been intensively developed and applied to obtain reduced models that can explain the statistics of a full system [21, 37]. The linear inverse model (LIM) approach assumes that the relevant dynamics can be decomposed into linear deterministic and additive Gaussian random fluctuations terms. Although the linear simplification can be considered as a decent tool for describing dynamical systems close to equilibrium, not surprisingly, most of real-world systems exhibit nonlinear behavior. Hence, the empirical model reduction (EMR) [19] has been proposed as a nonlinear generalization of LIM approaches.

The general form of the EMR approach is as follows: first, model the increments dx_k = x_k+1 − x_k of an observed variable x as a nonlinear function of x plus a residual, typically via a least-squares minimization yielding the parameters of f:

$\begin{equation}\mathrm{d}{x}_{k}=f\left({x}_{k}\right)\mathrm{d}t+{r}_{k}^{0} \mathrm{d}t\end{equation} \tag{ 12 }$

Second, model the residual as a linear function of x_k and ${r}_{k}^{0}$ and repeat this until the remaining residual has vanishing autocorrelation:

$\begin{equation*}\begin{matrix}\mathrm{d}{r}_{k}^{0}={b}^{1}\left[{x}_{k},{r}_{k}^{0}\right]\mathrm{d}t+{r}_{k}^{1}\mathrm{d}t\hfill \\ \hfill \dots {.\,.}\hfill \\ \hfill \mathrm{d}{r}_{k}^{n}={b}^{n}\left[{x}_{k},{r}_{k}^{0},\dots ,{r}_{k}^{n}\right]\mathrm{d}t+{r}_{k}^{n+1}\mathrm{d}t.\hfill \end{matrix}\end{equation*}$

Here, $\mathrm{d}{r}_{k}^{n}$ equals ${r}_{k+1}^{n}-{r}_{k}^{n}$ with time index k and n denoting the regression level. The presence of these hidden variables that explicitly depend on the past values of slow variables brings forth 'memory' effects. Incorporating these adaptive number of memory steps resembles the MZ formalism. Opposed to the MZ method, which provides solutions for the dynamics of a system considering the full memory via the kernel K(t) function, the EMR offers approximate solutions based on the iterative procedure described above. We refer to Kondrashov et al [20] for further details and in particular explanations of the relationship between EMR and the MZ formalism.

In equation (12), in fact, we substitute the deterministic term f by the first KM coefficient. This term can be a polynomial with different orders depending on the process dynamics. For instance, to reconstruct a stochastic process that takes place in a double-well potential V(x), we first estimate the deterministic part via the first KM coefficient and then approximate the stochastic term using the recursive procedure.

3. Results and discussion

In the following, we present our results of KM, MZ, and EMR performance for reconstructing various dynamical systems with different levels of complexity. The performance of these three methods is examined in terms of statistical properties such as probability density functions (PDFs) and ACFs of large sets of time series simulated with the derived SDE models.

3.1. Unimodal processes

3.1.1. Synthetic time series

We commence by considering the dynamics of the stationary Gaussian–Markov Ornstein–Uhlenbeck (OU) process:

$\begin{equation}\frac{\mathrm{d}x\left(t\right)}{\mathrm{d}t}=\theta \left(\mu -x\left(t\right)\right)+\sigma \eta \left(t\right).\end{equation} \tag{ 13 }$

The OU process is known to be mean-reverting, i.e., the drift coefficient controls the forcing of the state variable x(t) back to its mean μ. We generate 10⁶ data points with θ = 1, μ = 0, σ = 0.5 and time sampling Δt = 0.01.

In order to numerically integrate equation (13), we apply the Euler–Maruyama method. We recall that for stochastic integration, there are two different approaches, Itô and Stratonovich. In the case of the OU process with an additive noise, these two interpretations are equivalent and in this study we mostly employ the Itô method.

In this study the deterministic terms of simulated time series constructed by the three aforementioned inverse modeling methods (KM, MZ, and EMR) are estimated directly from the first KM coefficient. Figure 2 displays the statistical properties of the time series generated using the original OU process as well as corresponding, randomly chosen time series obtained from the three different methods, in comparison with the original time series. The PDFs are obtained as averages over 1000 simulated time series and the error bars indicate the deviations ±σ around the averages for the 1000 realizations. We calculate the mean squared error (MSE) as an error metric to evaluate the performance of the KM, MZ, and EMR methods in reconstructing the statistical properties of underlying systems. The summary of MSE values can be found in tabels 1 and 2. For the specific settings investigated in the case of equation (13), all three SDE models perfectly reproduce the statistic of the underlying dynamics. Here the auto-correlation exponentially decays and the memory coefficients of GLE are zero except at τ = 1, indicative of a Markovian process.

Table 1. Summary of mean squared error (MSE) between PDFs of different systems and corresponding averaged PDF of simulated time series over several realizations.

Time series	KM	MZ	EMR
OU additive noise	0.000 03	0.000 32	0.000 03
OU multiplicative noise	0.0013	0.0021	0.004
OU colored noise	0.002	0.000 01	0.000 01
S&P500	0.05	0.03	0.06
Niño-3	0.01	0.001	0.001
DW additive noise	0.0004	0.000 42	0.0021
DW multiplicative noise	0.0008	0.0009	0.001
[Ca²⁺]	0.004	0.009	0.009

Table 2. The table illustrates MSE (as an error metric) between ACFs of original systems and corresponding average ACFs of simulated time series.

Time series	KM	MZ	EMR
OU additive noise	0.000 13	0.000 21	0.000 34
OU multiplicative noise	0.000 38	0.0003	0.000 27
OU colored noise	0.07	0.000 04	0.000 13
S&P500	0.002	0.001	0.0008
Niño-3	0.01	0.003	0.006
DW additive noise	0.000 15	0.000 02	0.000 31
DW multiplicative noise	0.0009	0.005	0.001
[Ca²⁺]	0.026	0.004	0.004

Figure 2. Refer to the following caption and surrounding text. — **Figure 2.** A stochastic process with linear drift and additive noise. (a) Original (red) and randomly chosen simulated time series based on KM, MZ, and EMR methods (from top to bottom, respectively). (b) Summary statistics (PDFs in the left column and ACFs in the right column) of original and simulated time series derived from 1000 sample time series reconstructed by the three stochastic models (KM, MZ, and EMR), from top to bottom as indicated in the legend. The original system is an Ornstein–Uhlenbeck process (according to equation (13)) with Δt = 0.01 whose statistical features are shown in red color.
Download figure:
Standard image High-resolution image

Dynamical processes with multiplicative noise are ubiquitous in the natural sciences and beyond [38–40]. For a system subject to an additive noise the deterministic potential corresponds to stochastic steady-state potentials through ${V}_{s}=\frac{{V}_{d}}{\sigma }$ , where σ is the intensity of fluctuations that are independent of the system state x. It implies that, in response to the stochastic force, the system exhibits stochastic fluctuations from its deterministic attractor. In the case of a multiplicative noise, a noise-induced drift becomes visible and consequently a new attractor basin is generated that does not exist in the absence of state-dependent fluctuations. In equation (13) we substitute the additive noise with a multiplicative noise (i.e., the diffusion term is multiplied by 1 + x²) whose intensity depends on the instantaneous value of the state variable x(t):

$\begin{equation}\frac{\mathrm{d}x\left(t\right)}{\mathrm{d}t}=\theta \left(\mu -x\left(t\right)\right)+\left(1+{x}^{2}\right)\eta \left(t\right)\end{equation} \tag{ 14 }$

As it can be concluded from figure 3, all three methods work very well in reproducing the statistical features of the linear system also when exposed to symmetric multiplicative noise.

Figure 3. Refer to the following caption and surrounding text. — **Figure 3.** A stochastic process with linear drift and multiplicative noise (1 + x²). (a) Original (red) and randomly chosen simulated time series based on KM, MZ, and EMR methods (from top to bottom respectively). (b) Statistical properties of the observed and simulated Ornstein–Uhlenbeck subject to the multiplicative noise. The left column illustrates the PDFs for simulated results obtained as averages over 1000 realization with uncertainties in blue color. The right column displays the ACFs of ensembles of simulated time series constructed based on the three different stochastic models (KM, MZ, and EMR, from top to bottom).
Download figure:
Standard image High-resolution image

For more comparison, we analyze a system subject to asymmetric noise, which results in a skewed distribution. We thus multiply the noise term in equation (13) by (1 + x) (instead of (1 + x²)) and evaluate the statistical properties of the system. According to figure 4, the KM model exhibits better performance in comparison to the two other models, since it can directly estimate the (in this case asymmetric) state-dependent noise from the time series.

Figure 4. Refer to the following caption and surrounding text. — **Figure 4.** A stochastic process with linear drift and asymmetric multiplicative noise. (a) Original (red) and randomly chosen simulated time series based on KM, MZ, and EMR methods (from top to bottom respectively). (b) Statistical properties of the observed and simulated Ornstein–Uhlenbeck subject to the asymmetric noise (1 + x). The left column illustrates the PDFs for simulated results obtained as averages over 1000 realization with uncertainties in blue color. The right column displays the ACFs of ensembles of simulated time series constructed based on the three different stochastic models (KM, MZ, and EMR, from top to bottom).
Download figure:
Standard image High-resolution image

So far we considered additive and multiplicative noise, with η itself given by Gaussian white noise. But in many physical and biological systems, fluctuations exhibit some degree of correlation that cannot be satisfied by white noise. Therefore we substitute the stochastic term of equation (13) with a first order autoregressive process (AR(1)) and investigate the performance of the three aforementioned methods to derive SDEs in the presence of colored noise.

An AR(1) process is given by:

$\begin{equation}{Y}_{t+1}=\alpha {Y}_{t}+\eta \left(t\right),\end{equation} \tag{ 15 }$

where η is a Gaussian white noise process with zero mean and constant variance. According to figure 5(b), the statistical properties of simulated time series constructed via the KM method can not perfectly follow the original ones in this case. From tables 1 and 2 it can also be concluded that MZ and EMR perform better (with smaller MSEs). This is expected, since the presence of colored noise implies a deviation from the white-noise assumption of the LE and the KM method to derive it.

Figure 5. Refer to the following caption and surrounding text. — **Figure 5.** A stochastic process with linear drift and colored noise. (a) Original (red) and randomly chosen simulated time series based on KM, MZ, and EMR methods (from top to bottom). (b) Summary statistics (PDFs and ACFs) of original time series and simulated time series derived from 1000 sample time series reconstructed by the three stochastic models (KM, MZ, and EMR) from top to bottom as indicated in the legend. The original system is an Ornstein–Uhlenbeck process with colored noise constructed by a first-order autoregressive process (AR(1)) with α = 0.5, whose statistical features are shown in red color.
Download figure:
Standard image High-resolution image

It is also of interest to evaluate the performance of these SDE models when the system exhibits some short-term memory in the deterministic part. For this purpose we consider a stochastic delay differential equation given by:

$\begin{equation}\frac{\mathrm{d}x\left(t\right)}{\mathrm{d}t}=\alpha x\left(t\right)+\beta x\left(t-\tau \right)+R\left(t\right).\end{equation} \tag{ 16 }$

The random force R is given by serially correlated noise produced by a second-order autoregressive (AR(2)) process in which the current value depends on the two previous values. As can be observed from figure 6, the EMR and MZ methods yield significantly better approximations of the PDF and ACF than the KM method for this system with short-term memory in the drift and colored noise. This provides an instructive example for situations where the MZ and EMR approaches clearly outperform the KM approach (as theoretically expected).

Figure 6. Refer to the following caption and surrounding text. — **Figure 6.** Statistical properties of the observed (red) and simulated (blue) time series from a delayed OU process, with tau = 2, subject to correlated noise produced by an AR(2) process. The simulated time series were constructed using KM, MZ, and EMR methods (from top to bottom), averaged over 1000 different realizations.
Download figure:
Standard image High-resolution image

3.1.2. Real-world time series

In recent years, quantifying stochastic dynamics of financial time series (e.g. stock prices and stock market indices) through stochastic differential equations in order to describe their evolution has attracted considerable attention. In this study we analyze the weekly S&P500 stock index for the time span of 35 years (1950–1985) and analyze its distinctive statistical properties. Here the stock return price $\mathrm{ln} \frac{p\left(t+\delta t\right)}{p\left(t\right)}$ represents the state variable x(t). The statistical results of the simulated time series estimated by the three inverse modeling methods is shown in figure 7. According to the KM coefficients the deterministic part of the dynamics is described by a linear function of the state variable, while the stochastic term exhibits nonlinear behavior.

Figure 7. Refer to the following caption and surrounding text. — **Figure 7.** Summary statistics of S&P500 stock index. (a) Original (red) and randomly chosen simulated time series based on KM, MZ, and EMR methods (from top to bottom respectively). (b) The average PDFs (left column) and ACFs (right column) of simulated time series over 1000 realizations (blue) created by KM, MZ, and EMR models from left to right, respectively. The statistics plotted in red correspond to the original weekly S&P500 stocks index for the time span of 35 years (1950–1985).
Download figure:
Standard image High-resolution image

As mentioned above, we exert the first KM coefficient also to approximate the deterministic term of the GLE and EMR models. In this first real-world case, the approximation of the PDFs is not perfect anymore. The results reveal that the MZ formalism outperforms the two other methods in terms of the ACF, with very good agreement to the original return series (figure 7(b), second row, second column). It is worth mentioning that the statistical properties of financial time series exhibit a time-scale dependence and long-range correlations [41, 42]. Therefore we conclude that the memory effects of recent returns occurring in different time scales (from minutes to several days) can be modeled well by the MZ technique.

Another empirical time series that exhibits uni-modal variability is the Niño-3 index [43], which is one of several El Niño-Southern oscillation (ENSO) indicators in regard to tropical Pacific sea surface temperatures (SST). During the last decades, understanding the mechanisms underlying ENSO variability and prediction of future fluctuations has attracted substantial attention [44–47]. ENSO describes variations in temperature and pressure in the eastern Pacific ocean and has significant impacts on global climate variability. We reconstruct the Niño-3 monthly sea surface temperature (SST) indices averages across (5N–5S, 150–90W) from 1891 to 2015 using the KM, MZ, and EMR inverse modeling approaches. The non-Gaussian behavior of Niño-3 indicates a nonlinear process, quantified by the positive skewness of the SST distribution, which may reside in the interaction of oceanic variables of interest and the atmospheric fast forcing [48]. This kind of interactions brings about memory effects into the system dynamics that should not be expected to be fully captured by models established based on Markovian assumptions. Figure 8(b) displays the PDFs and ACFs of the resulting simulated time series. Although all three inverse methods produce similarly skewed and heavy-tailed distributions, comparing the AFCs, MZ achieves a higher accuracy approximation than KM and EMR. We repeated this procedure for the Niño-4 index, and also in that case found that the MZ approach outperforms the other two methods. According to these results, the most significant characteristic of these real-world processes leading to ENSO variability is the presence of serial correlations connected to internal interactions between observed (slow) and unobserved (fast) variables which can be captured best by the MZ approach.

Figure 8. Refer to the following caption and surrounding text. — **Figure 8.** Summary statistics of Niño-3 monthly sea surface temperature (SST) indices. (a) Original (red) and randomly chosen simulated time series based on KM, MZ, and EMR methods (from top to bottom respectively). (b) Comparison of model performance of KM, MZ, and EMR in reconstructing dynamics underlying ENSO variability. The figure depicts the statistical properties of the observed (red) and simulated (blue) monthly Niño-3 SST indices from 1891 to 2015.
Download figure:
Standard image High-resolution image

In order to investigate whether these results are prone to overfitting, we first calibrated the three SDE models on the first half of the time series and validated the parameters by comparing the resulting time series statistics with the ones of the second half of the time series for both systems (Niño-3 and S&P500, respectively). The results are presented in figures 9(a) and (b). It can be inferred that the longer-term variations in the ACF are still captured to some degree by the MZ method. For the ENSO case, the slow variations of the ACF correspond to low-frequency variability present in this time series, which is not noise but an important part of (internal) climate variability. The fact that the reproduction of these slow variability modes is less accurate when only calibrating on the first half of the time series is due to the fact that capturing these slow variability modes is harder when considering only a shorter part of the time series. For the S&P500 time series however, the comparably faster fluctuations of the ACF most likely correspond to noise. The very good approximation of the ACF when applying the MZ formalism to the full S&P500 time series could hence be the result of overfitting when deriving the full kernel K.

Figure 9. Refer to the following caption and surrounding text. — **Figure 9.** Comparison between ACFs obtained from calibrating different models (KM, MZ, EMR) on the first half of (a) Niño-3 monthly SST and (b) S&P500 time series only, and ACFs from the second half of these time series (red).
Download figure:
Standard image High-resolution image

3.2. Bimodal processes

3.2.1. Synthetic time series

Up until now we studied processes with a unique mode; however, there are various stochastic dynamics in natural systems that do not exhibit uni-modality. We start with generating a synthetic process in which a particle moves in a double-well (DW) potential at ${\pm}\sqrt[]{\theta }$ , driven by additive Gaussian noise

$\begin{equation}\frac{\mathrm{d}X\left(t\right)}{\mathrm{d}t}=\theta \left(x\left(t\right)-{x}^{3}\left(t\right)\right)+\sigma \eta \left(t\right).\end{equation} \tag{ 17 }$

Here we construct the stationary time series with θ = 1, σ = 0.5 and Δt = 0.01. For all three inverse models the specific values of the drift terms are again derived from the first-order KM coefficients from the original time series. The non-Markovian term in the GLE derived via the MZ formalism, which accounts for interactions between the slow and fast variables, was estimated from the information encoded in the ACF [32, 34] (equation (7)). Considering the Markov property of the underlying system (17), the kernel function K of the MZ approach is zero (as theoretically expected). The left column in figure 10 illustrates the average PDFs of model-simulated time series in comparison with original time series, for 1000 realizations, and the right column displays the associated ACFs. It can be seen that these three stochastic models can mimic the key features of the original time series very well.

Figure 10. Refer to the following caption and surrounding text. — **Figure 10.** A stochastic process with nonlinear drift and additive noise. (a) Original (red) and simulated time series based on KM, MZ, and EMR methods (from top to bottom respectively). (b) Comparison of the numerical simulation of a process in a double-well potential subject to an additive Gaussian noise with intensity σ = 0.5, 10⁶ data points and a sampling interval of 0.01. The plot displays PDFs and ACFs of the original (red) and simulated (blue) time series obtained from three different stochastic models (KM, MZ, and EMR as indicated in the legend), averaged over 1000 realizations.
Download figure:
Standard image High-resolution image

In contrast, in the presence of a multiplicative noise, estimating the underlying dynamic is not trivial anymore. We substitute the additive noise in the previous example with $\sqrt[]{1+{x}^{2}}$ . As it can be observed in figure 11), the average ACFs deviate from the original ones for the MZ and EMR methods, while the KM method still achieves an excellent approximation of the PDF and ACF.

Figure 11. Refer to the following caption and surrounding text. — **Figure 11.** A stochastic process with nonlinear drift and multiplicative noise. (a) Original (red) and simulated time series based on KM, MZ, and EMR methods (from top to bottom respectively). (b) Statistical properties of the observed (red) and simulated (blue) time series related to a particle motion in a double-well potential, subject to multiplicative noise with diffusion $\sqrt[]{1+{x}^{2}}$ . The simulated time series were constructed using KM, MZ, and EMR methods (from top to bottom), averaged over 1000 different realizations.
Download figure:
Standard image High-resolution image

**Figure 11.** A stochastic process with nonlinear drift and multiplicative noise. (a) Original (red) and simulated time series based on KM, MZ, and EMR methods (from top to bottom respectively). (b) Statistical properties of the observed (red) and simulated (blue) time series related to a particle motion in a double-well potential, subject to multiplicative noise with diffusion $\sqrt[]{1+{x}^{2}}$ . The simulated time series were constructed using KM, MZ, and EMR methods (from top to bottom), averaged over 1000 different realizations.
Download figure:
Standard image High-resolution image

3.2.2. Real-world time series

Identification and understanding the triggering mechanism of the Dansgaard–Oeschger events, abrupt temperature increases during the last glacial period (roughly from 115.000 years to 12.000 years BP) has attracted considerable attention in the past years [49–53]. These rapid and strong warming events (up to 16k within few decades [54]), followed by slow-paced cooling phases are inferred from the content of stable isotope composition of water δ¹⁸O in different Greenland ice cores. These proxy records allow to empirically reconstruct the temperature profile over the last 120.000 years with high temporal resolution. In this study we investigate the high-resolution (20 years-average) Ca²⁺ (interpreted as a proxy for atmospheric circulation patterns), collected from the NGRIP ice core on the GICC05 time scale [55]. Because of the substantially better signal-to-noise ratio, we focus here on the Ca²⁺ time series between 60ka and 30ka b2k. We apply the KM, MZ, and EMR approaches to reproduce dynamical and statistical properties of the underlying dynamics of [Ca²⁺].

Since the integration of the diffusion term of an SDE is not uniquely defined, two different frameworks for stochastic calculus have been proposed, namely the Itô and the Stratonovich scheme (see appendix A for further details). Up to now, we have interpreted the SDEs in the Itô sense, for which a consistent discretization scheme is the Euler–Maruyama method with straightforward white-noise increments.

In contrast, in the Stratonovich calculus white noise is approximated by continuously fluctuating noise with finite memory, which may be more suitable for approximating real-world time series. Indeed, it has been argued that the Stratonovich scheme is more appropriate for continuous physical systems, such as ocean and atmospheric circulation systems [56]. A well-defined SDE discretization scheme that is consistent with the Stratonovich interpretation is the Heun method, which has been shown to perform with higher precision for a dynamical system model of paleoclimate variability [57].

We compared the performances of both stochastic calculi and corresponding discretization schemes (Euler–Maruyama and Heun, respectively) for the Ca²⁺ ice core time series. According to figure 12, the Stratonovich calculus performs better than the Itô calculus in terms of approximating the PDF. This likely corresponds to the intrinsic properties of the Stratonovich interpretation in which fast quantities with finite correlation time, as observed in continuous physical systems such as climate variables, can be approximated as white noise. The better performance obtained using the Stratonovich calculus is in agreement with observations made in earlier studies [56, 57].

Figure 12. Refer to the following caption and surrounding text. — **Figure 12.** Comparison between PDFs obtained from the Itô scheme (left panel) and the Stratonovich scheme (right panel) for the Ca²⁺ ice core time series.
Download figure:
Standard image High-resolution image

Figure 13 shows the statistical properties (PDFs and ACFs) of the observed and simulated [Ca²⁺] time series in the interval 60.000 years to 30.000 years before the year 2000 AD. Because of the large amplitude of the [Ca²⁺] concentration variations, the calculations were conducted in natural logarithmic scale [50]. The drift term, again directly derived from the KM coefficient for all three inverse methods, exhibits a nonlinear behavior with two stable equilibrium states, which reflects the switching mechanism between cold stadials and warmer interstadials.

Figure 13. Refer to the following caption and surrounding text. — **Figure 13.** Comparison between observed and simulated [Ca²⁺] time series from Greenland ice cores. (a) Original (red) and simulated time series based on KM, MZ, and EMR methods (from top to bottom respectively). (b) Statistical properties of the observed and simulated [Ca²⁺] time series in the interval between 60ka and 30ka b2k. The PDF of the original data is shown in red color while simulated time series obtained from KM, MZ, and EMR models (from top to bottom) can be found in blue color. The PDFs for the three stochastic models are averaged over 1000 realization and therefore considerably smoothed.
Download figure:
Standard image High-resolution image

The results presented in figure 13 show that the modeled time series for all three methods could accurately reproduce the bimodality of the observed PDFs, although it should be noted that the LE derived with the KM approach approximates the observed PDF better than the GLE and EMR approaches: the depth of the two potential wells in MZ and EMR are shallower than the observed PDF suggests. It is clear from the results illustrated in the right column of the figure 10(b) that the MZ and EMR methods have better performances to construct the underlying autocorrelation structure of the [Ca²⁺].

4. Conclusions

In the last decades, there have been considerable methodological advances in understanding and quantitatively modeling the behavior of complex systems. Among others, stochastic differential equations (SDEs) are a promising approach for this challenge in situations where only a few variables are actually measured. For some typical example settings, we have studied the performance of three methods to derive SDEs—the Kramer–Moyal approach to derive Langevin equations (LE) with potentially multiplicative noise, the Mori–Zwanzig approach to derive generalized Langevin equation (GLE) including a non-Markovian term, and the empirical model reduction (EMR) approach to derive GLEs—for various synthetic and real-world time series. Since the microscopic dynamics of complex systems are often not accessible, these methods regard the unobserved, fast variables as random fluctuations and investigate the properties of underlying systems from the evolution of only a few macroscopic, comparably slow observables. In this study corresponding numerical simulations of all three inverse methods have been examined in terms of PDFs and ACFs of the simulated time series, as metrics for assessing the models' performance. We generally observed a nearly optimal performance of all three approaches for unimodal Markovian systems. For unimodal, non-Markovian systems, the MZ and EMR strongly outperformed the KM approach as theoretically expected. It is thus not possible to draw a firm conclusion about which SDE deviation method is in general superior in comparison to the others. According to our results, the performance of SDE models strongly relies on the effects of memory on the underlying dynamics. We could show that LEs (as derived by the KM approach) obtain better results for systems with weak memory contribution but asymmetric multiplicative noise, since it can directly estimate the state-dependent noise with higher precision. On the other hand, for systems with memory effects and colored-noise forcing, it is essential to take into account the non-Markovian closure terms. Hence, the MZ and EMR approaches can be considered more reliable in reconstructing the dynamics of systems exhibiting strong memory effects. In these two methods, the interactions between observed and unobserved variables are taken into consideration in terms of memory effects. That is, the EMR approach incorporates adaptive numbers of memory steps, whilst the MZ method considers the full memory of a system via the kernel K. With regards to prospective investigations, we suggest that the adaptability of the three investigated SDE models in reproducing dynamical characteristics of multi-variate time series should be further investigated. For example, Boers et al [50] introduced a two-dimensional stochastic delay differential equation model as an approximation of the GLE to analyze the temperature and dust proxy records from Greenland ice cores. Their results revealed the role of the coupling strength between the two variables, where the model has problems in reproducing the bimodality in absence of coupling terms, and also shows that the explicit consideration of memory terms helps to reproduce the asymmetry observed in those temperature and dust proxy records. Furthermore, it should be of interest and high practical relevance for real-world systems to consider an adaptive stochastic differential equation framework that can reconstruct the underlying dynamics of complex systems driven by Lévy noise.

Acknowledgments

This work has been financially supported by the German Academic Exchange Service (DAAD) under project no. 57299294. NB acknowledges funding by the Volkswagen foundation. This is TiPES contribution #25; the TiPES (Tipping Points in the Earth System) project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No. 820970. JK was supported by the project RF Government Grant 075-15-2019-1885.

Appendix A.: Stochastic calculus

Equation (1) with a given initial condition x(t₀) = x₀ has a unique solution which satisfies the following integral form:

$\begin{equation}x\left(t\right)=x\left({t}_{0}\right)+{\int }_{{t}_{0}}^{t}f\left(x\left({t}^{\prime }\right)\right)\mathrm{d}{t}^{\prime }+{\int }_{{t}_{0}}^{t}g\left(x\left({t}^{\prime }\right)\right)\eta \left({t}^{\prime }\right) \mathrm{d}{t}^{\prime }.\end{equation} \tag{ A1 }$

A Wiener process is non-smooth and nowhere differentiable, hence translating the second integral in the equation (A1) using a conventional Riemann sum is not uniquely defined. To interpret the noise term, two different formulations of stochastic calculus have been introduced for computing the solutions of SDEs; known as Itô and Stratonovich calculi, respectively [58, 59].

In the Itô prescription, the evaluations of the function g(x(t)) are uncorrelated with the (infinitesimal) increments of the Wiener process dW(t). In fact, the Itô integral is defined as the limit of a left Riemann sum (where the function g(x(t)) is evaluated at the left of the interval [t, t + Δt]) with an Itô correction. The resulting stochastic integration in the Itô scheme is given:

$\begin{equation}{\int }_{{t}_{0}}^{t}W\left(s\right)\mathrm{d}W\left(s\right)\mathrm{d}s=\frac{1}{2}\left[\left({W}^{2}\left(t\right)-{W}^{2}\left({t}_{0}\right)\right)-\left(t-{t}_{0}\right)\right].\end{equation} \tag{ A2 }$

Although the Itô convention does not preserve the chain rule of calculus, employing Itô's Lemma maintains the martingale property. Owing to this property, the Itô calculus is used extensively in finance [60].

The most common alternative to the Itô integral that does satisfy the chain rule of classical calculus is the Stratonovich scheme. From that point of view, a function can be evaluated at the midpoint of the time interval [t, t + Δt]. Because the midpoint selection rule is associated with the finite noise autocorrelation [61], the martingale property does not hold. In contrast to the Itô, the Stratonovich calculus approximates the Wiener process as the limit of a correlated process when the correlation time approaches zero:

$\begin{equation}{\int }_{{t}_{0}}^{t}W\left(s\right){\circ}\mathrm{d}W\left(s\right)=\frac{1}{2}\left({W}^{2}\left(t\right)-{W}^{2}\left({t}_{0}\right)\right).\end{equation} \tag{ A3 }$

This approximation leads to difficulties e.g. for the calculation of expectation values, since stochastic variables and noise are not independent: 〈x(t)η(t)〉 ≠ 0. It should be underlined that the Itô and Stratonovich calculi have the same solution if their drift terms fulfill the following relationship, which is called Itô–Stratonovich drift correction:

$\begin{equation}{f}_{\mathrm{S}}\left(x\left(t\right)\right)={f}_{\mathrm{I}}\left(x\left(t\right)\right)-\frac{1}{2}\left(\right.g\left(x\left(t\right)\right)\frac{\partial g\left(x\left(t\right)\right)}{\partial x},\end{equation} \tag{ A4 }$

where f_I denotes the drift of the Itô calculus and f_S the drift of the Stratonovich calculus.

Even though both interpretations are mathematically consistent, one obvious question that may arise is which interpretation is the right one for desribing or approximating a particular set of physical processes. In order to answer this, we have to look at the origin of the noise in the stochastic system. It has been shown that, if the relaxation time of a system is large enough in comparison with the noise correlation time, then the Itô interpretation is appropriate. On the other hand, if the noise is colored, i.e., it has finite correlation time, the limiting SDE must be treated in the Stratonovich framework [61–63].

As noted above, different stochastic calculi (Itô or Stratonovich) are associated with different kinds of discritization for numerical integration [64]. It must hence be stated upfront which stochastic calculus is going to be considered. The simplest and most widely used discretization scheme to numerically integrate SDEs is the Euler–Maruyama method, which converges to the Itô interpretation:

$\begin{equation}x\left(t+{\Delta}t\right)=x\left(t\right)+f\left(x\left(t\right)\right){\Delta}t+g\left(x\left(t\right)\right)\eta \left(t\right)\sqrt{{\Delta}t}.\end{equation} \tag{ A5 }$

Another numerical method that we used in this paper (for approximating the Ca²⁺ ice core time series), called Heun method [65], leads to the Stratonovich scheme. This method is an example of a predictor-corrector method in which the predictor is calculated by a simple Euler type integration as follows:

$\begin{equation}\hat{x}\left(t+{\Delta}t\right)=x\left(t\right)+f\left(x\left(t\right)\right){\Delta}t+g\left(x\left(t\right)\right)\sqrt{{\Delta}t}\eta \left(t\right)\end{equation} \tag{ A6 }$

$\begin{equation}x\left(t+{\Delta}t\right)=x\left(t\right)+\frac{1}{2}\left(\right.f\left(x\left(t\right)\right)+f\left(\hat{x}\left(t+{\Delta}t\right)\right){\Delta}t+\frac{1}{2}\left(\right.g\left(x\left(t\right)\right)+g\left(\hat{x}\left(t+{\Delta}t\right)\right)\sqrt{{\Delta}t}\eta \left(t\right).\end{equation} \tag{ A7 }$