Extending the extended dynamic mode decomposition with latent observables: the latent EDMD framework

Bernard O Koopman proposed an alternative view of dynamical systems based on linear operator theory, in which the time evolution of a dynamical system is analogous to the linear propagation of an infinite-dimensional vector of observables. In recent years, several works have shown that finite-dimensional approximations of this operator can be extremely useful for applications such as prediction, control, and data assimilation. In particular, a finite-dimensional Koopman representation of a dynamical system avoids the problems caused by nonlinearity in classical state-space models. In this work, the identification of finite-dimensional approximations of the Koopman operator and its associated observables is expressed through the inversion of an unknown augmented linear dynamical system. The proposed framework can be regarded as an extended dynamic mode decomposition that uses a collection of latent observables. The use of a latent dictionary applies to a large class of dynamical regimes, and it provides new means for deriving appropriate finite-dimensional linear approximations to high-dimensional nonlinear systems.


Introduction
Koopman operator theory (Koopman 1931) states that any nonlinear dynamical system can be lifted by a time-invariant nonlinear transformation into a space where the time evolution of the system can be described by linear methods. In recent years, a combination of theoretical (Mezić 2005, 2013, 2015, Budišić et al 2012) and numerical (Schmid 2010, Williams et al 2015) efforts, as well as the increasing availability of data, promoted this formalism into one of the leading data-driven identification techniques. Modern Koopman theory aims at finding, from data, special measurements of the state space that provide global linear models of non-linear, and potentially high-dimensional, dynamical systems. These representations may, however, be hard to obtain for complex systems. In practice, data-driven approximations such as the extended dynamic mode decomposition (EDMD) (Williams et al 2015) assume some finite-dimensional basis of functions that spans an (approximately) invariant Koopman subspace.
In this work, we wish to further expand the EDMD methodology to account for latent observables of the state variables. Specifically, we assume the finite-dimensional Koopman subspace to possibly be spanned by a known dictionary plus a collection of latent functions that form the latent dictionary. The evaluation of these latent functions is performed on a given realization of the observations. We discuss two methods for this proposed latent EDMD (LEDMD) framework, which can be used to derive Koopman approximations for both deterministic and stochastic dynamical systems. The latter relies on a state space model formulation with Gaussian uncertainties.

Deterministic dynamical systems and Koopman operator theory

Background on Koopman operator theory for deterministic systems
Let us assume a continuous, autonomous, s-dimensional ODE governing the state variable z_t. This dynamical system also generates measurements x_t ∈ R^n in the following state space model:

ż_t = f(z_t), (1a)
x_t = h(z_t). (1b)

When considering the dynamical equation (1a) and given an initial condition z_t0, the solution of this equation for an interval t ∈ [t_0, t_f] can be written as

z_t = Φ_t(z_t0), (2)

where Φ_t(z_t0) ∈ L with L ⊂ R^s. We may also define a discrete state space model as follows:

z_{k+1} = T(z_k), (3a)
x_k = h(z_k), (3b)

where T can be the application of a given integration scheme to (1a). Koopman (1931) introduced a new operator-based formalism, in which the evolution of a dynamical system is determined by following a set of observables of the state variable z_t. Koopman proved that, when considering an infinite-dimensional Hilbert space of observables F, the time evolution of the dynamics is governed by a linear Koopman operator. Formally, let g ∈ F : L −→ C be a complex-valued observable of the dynamical system (3a). The collection of all these observables forms a linear vector space, on which the Koopman operator K is defined as follows:

Kg = g ∘ T, (4)

where ∘ is the composition operator. For time-continuous dynamical systems, we define a one-parameter semi-group of Koopman operators {K^t}_{t>0} as follows:

K^t g = g ∘ Φ_t. (5)

We also denote by A = lim_{t→0} (K^t − I)/t the infinitesimal generator of the Koopman semi-group, which satisfies, assuming uniform continuity on a suitable Banach space (Engel et al 2000):

(d/dt) K^t g = A K^t g. (6)

The Koopman operator K is a linear operator that advances observables in time. From a modeling perspective, going from an ODE to a Koopman operator formulation trades the nonlinear complexity of the dynamical operator f for a linear operator representation based on an infinite-dimensional and non-linear set of observables.

Data-driven approximation of Koopman operator
Finding a data-driven approximation of Koopman representations consists in the definition, from a sequence of N + 1 measurements {x_k}_{k=1}^{N+1}, of a finite-dimensional collection of observables that can be propagated linearly in time. It is important to note that, in this work, the measurements x_k are distinguished from the states z_k. Indeed, in real applications, we are not guaranteed to observe the full state vector. A data-driven identification of Koopman representations thus brings another layer of complexity: the finite-dimensional set of observables is defined as a function of the measurements x_k and not of the state z_k. In this situation, we need to ensure that the projection (1b) does not impact the data-driven Koopman representation.
The literature on data-driven approximation of the Koopman operator mainly follows two paths. A first path is written in the language of the dynamic mode decomposition (DMD) (Schmid 2010). The DMD was first introduced to find low-rank spatio-temporal coherent structures of complex dynamics. In the language of Koopman, the DMD computes a Koopman approximation when considering the observables as the measurements x_k of the state space (Rowley et al 2009). The EDMD (Williams et al 2015) was then introduced to generalize the DMD algorithm to some (non-linear) functions of the measurements. Numerous bases of functions were explored in the literature, ranging from polynomial representations (Brunton et al 2016a) up to multi-layer perceptrons (Takeishi et al 2017a). DMD-type algorithms were also shown to converge to a Galerkin approximation of the Koopman operator (Williams et al 2015). The combination of these theoretical results, as well as the simplicity of the DMD algorithm, motivated several developments of the method. For instance, the piDMD method (Baddoo et al 2023) was developed to enforce known symmetries on the DMD approximation, and the mpEDMD (Colbrook 2022) guarantees that the EDMD approximation is measure preserving. Furthermore, in order to avoid the computation of spurious pairs of eigenvalues/eigenvectors, the ResDMD method (Colbrook et al 2023) was developed to assess and validate the accuracy of the eigenvalue/eigenvector pairs outputted by a DMD procedure. Several works also aimed at approximating the Koopman generator (Mauroy and Goncalves 2016, Klus et al 2020) using DMD-type algorithms; such methods can include sparsity priors on the infinitesimal generator even when the corresponding Koopman operator is not sparse. However, this technique can be subject to closure issues, especially for complex systems with no prior knowledge about the dynamics (and their non-linearities). The measurement noise may also have a strong influence on EDMD estimates.
A number of works have since proposed efficient estimates of the DMD output under noisy conditions. By combining the DMD with the Kalman filter (Jiang and Liu 2022), such methods can both provide an efficient separation of signal and noise and an approximation of the stochastic Koopman operator.
A second path explores deep learning models to identify the non-linear transformations of the measurements that may lead to suitable Koopman observables (Lusch et al 2018, Yeung et al 2019, Azencot et al 2020, Rice et al 2020). Note further that, when considering partial observations of the state space variable, delay embedding coordinates offer a simple class of observables that can unfold (under some conditions on the parameters of the delay embedding) the structure of the underlying dynamics. These delay embedding observables were shown to be extremely efficient in the linearization of periodic and quasi-periodic dynamical systems (Arbabi and Mezic 2017). Their exploitation was also demonstrated in the decomposition of chaotic dynamics (Brunton et al 2017), further considering an additional forcing term. These types of representations have led to significant advances in the field, including convergence results under asymptotic regimes (Arbabi and Mezic 2017, Zhen et al 2022). However, in practice, the definition of their parameterization, especially the delay embedding (Kamb et al 2020), as well as their exploitation in computing dissipative eigenvalues of the Koopman operator (Arbabi and Mezic 2017), remains an active research topic.

Inverse problem formulation
We begin by outlining the EDMD method (Williams et al 2015) before introducing the proposed LEDMD framework.

EDMD
The main idea behind the EDMD is to estimate a finite-dimensional approximation of the Koopman operator K given a dictionary of functions and a dataset of snapshot pairs. Formally, we start by choosing a dictionary:

D_M = {ψ_1, ψ_2, . . . , ψ_M}, (7)

where ψ_i ∈ F : L −→ C for i = 1, 2, . . . , M. We also define the vector-valued function Ψ_M : L −→ C^{M×1}:

Ψ_M(z) = [ψ_1(z), ψ_2(z), . . . , ψ_M(z)]^T. (8)

The span of D_M is a subspace of F, and can be written as follows:

F_{D_M} = span{ψ_1, ψ_2, . . . , ψ_M}. (9)

By definition, we can write any function ϕ ∈ F_{D_M} as:

ϕ = a_M^T Ψ_M, with a_M ∈ C^{M×1}. (10)

If we apply the Koopman operator to the observable ϕ we have:

Kϕ = ϕ ∘ T = a_M^T (Ψ_M ∘ T). (11)

If we assume that the subspace spanned by D_M is invariant under the action of the Koopman operator, we can write:

Ψ_M ∘ T = K_M Ψ_M, (12)

then, under this assumption, we can define an exact finite-dimensional Koopman operator K_M ∈ C^{M×M}. The action of this finite-dimensional Koopman operator is defined as:

Kϕ = a_M^T K_M Ψ_M. (13)

In practice, the subspace spanned by D_M is not invariant to the Koopman operator. This implies that the action of the Koopman operator on some element of F_{D_M} will not lie exactly in F_{D_M}, i.e. Kϕ = b_M^T Ψ_M + r with r ∈ F. This yields the following relation:

Kϕ = a_M^T K_M Ψ_M + r. (14)

To determine K_M, the EDMD minimizes the following cost function given a dataset of snapshot pairs {(u_k, v_k)}_{k=1}^{N}, with v_k = T(u_k):

J(K_M) = Σ_{k=1}^{N} ∥Ψ_M(v_k) − K_M Ψ_M(u_k)∥², (15)

where ∥ · ∥ denotes the Frobenius norm. Equation (15) is a least squares problem, and the solution that minimizes (15) is:

K_M = Ψ(V) Ψ(U)^+, (16)

where Ψ(U) = [Ψ_M(u_1), . . . , Ψ_M(u_N)], Ψ(V) = [Ψ_M(v_1), . . . , Ψ_M(v_N)], and Ψ(U)^+ represents the pseudoinverse of Ψ(U).
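To make the estimator concrete, the least-squares solution of (15) can be sketched in a few lines of Python. The dynamics (a linear map z ↦ µz), the monomial dictionary and the sample size below are illustrative assumptions, not taken from the experiments of this paper:

```python
import numpy as np

# Minimal EDMD sketch: snapshot pairs of the linear map z -> mu * z,
# lifted with a monomial dictionary D_M = {z, z^2, z^3}.
rng = np.random.default_rng(0)
mu = 0.9
U = rng.uniform(-1.0, 1.0, size=(200, 1))   # samples u_k
V = mu * U                                   # v_k = T(u_k)

def Psi(Z):
    # each column of the returned array is one observable of the dictionary
    return np.hstack([Z, Z**2, Z**3])

PsiU, PsiV = Psi(U), Psi(V)                  # N x M snapshot matrices (rows = samples)

# K_M minimizes sum_k ||Psi(v_k) - K_M Psi(u_k)||^2, i.e. K_M = Psi(V) Psi(U)^+
K_M = (np.linalg.pinv(PsiU) @ PsiV).T        # M x M Koopman matrix

# here the dictionary spans an exactly invariant subspace, since
# z o T = mu z, z^2 o T = mu^2 z^2, z^3 o T = mu^3 z^3
print(np.round(K_M, 3))
```

Because this toy subspace is exactly invariant, the recovered matrix is diagonal with entries µ, µ² and µ³, whose eigenvalues are Koopman eigenvalues of the map.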
Overall, EDMD requires a dataset of snapshot pairs, {(u_k, v_k)}_{k=1}^{N}, as well as a dictionary of observables, D_M. Furthermore, it assumes that the subspace spanned by D_M is (nearly) invariant under the action of K. Theoretical guarantees on the convergence of K_M to the true Koopman operator were studied in the limit of infinite data and an infinite number of observables. Specifically, under some technical assumptions on the sampling of the measurements and the form of the Hilbert space F, Klus et al (2015), Williams et al (2015) and Korda and Mezić (2018) showed that the matrix K_M converges to K_{D_M}, the Galerkin projection of the Koopman operator on the subspace F_{D_M}. Furthermore, Korda and Mezić (2018) studied the convergence of K_M to K as M goes to infinity in the strong operator topology. In practice, both N and M are finite, and in this setup, the choice of a relevant dictionary is still an open question. To further generalize the EDMD algorithm, we thus propose to consider a latent dictionary. This latent dictionary helps generate observables that make the Koopman approximation more accurate.

EDMD with a latent dictionary

Proposed framework
Within the LEDMD framework, the data-driven derivation of numerical approximations of the Koopman operator is stated as an EDMD approximation with latent observables. These latent observables are used to account for both (i) situations where the observation operator is not an embedding and (ii) closure issues related to the choice of the dictionary in the EDMD framework. In the LEDMD approach, we do not assume access to a dataset of snapshot pairs {(u_k, v_k)}_{k=1}^{N}. We only assume observations of the state space model (3), which we conveniently put in the following dataset:

X = {(x_k, x_{k+1})}_{k=1}^{N}. (17)

Similarly to the EDMD, we start by choosing a dictionary D_M (7). This dictionary operates on measurements of the state space model (3). We also assume the existence of some latent dictionary D_W, written similarly to D_M as follows:

D_W = {ψ^l_1, ψ^l_2, . . . , ψ^l_W}. (18)

Given D_M and D_W, we also define the following vector-valued functions. We start with Ψ_M, stated as the EDMD observables, Ψ_M : L −→ C^{M×1}:

Ψ_M(z) = [ψ_1(z), . . . , ψ_M(z)]^T. (19)

We also define Ψ_W, stated as the latent observables, Ψ_W : L −→ C^{W×1}:

Ψ_W(z) = [ψ^l_1(z), . . . , ψ^l_W(z)]^T. (20)

Finally, we define Ψ_dE, the vector-valued LEDMD observables:

Ψ_dE(z) = [Ψ_M(z)^T, Ψ_W(z)^T]^T, (21)

where

d_E = M + W. (22)

We also define the matrix G ∈ R^{M×dE} as:

G = [I_M, 0_{M×W}], (23)

so that Ψ_M = G Ψ_dE. If we assume that the observation operator H is a vector-valued function with elements in F, then the span of D_dE = D_M ∪ D_W is a subspace of F, and can be written as follows:

F_{D_dE} = span{ψ_1, . . . , ψ_M, ψ^l_1, . . . , ψ^l_W}. (24)

Similarly to the EDMD, K_dE, an approximation of the Koopman operator, verifies:

Kϕ ≈ a_dE^T K_dE Ψ_dE, (25)

where ϕ ∈ F_{D_dE} and a_dE ∈ C^{dE×1}. In the EDMD framework, the approximate Koopman operator K_dE is defined as the minimizer of the following cost:

J(K_dE) = Σ_{k=1}^{N} ∥Ψ_dE(z_{k+1}) − K_dE Ψ_dE(z_k)∥². (26)

The above EDMD cost function assumes direct measurements of the state variables z_t. Furthermore, it assumes that both the dictionaries of non-linear functions D_M and D_W are known and span an (approximately) invariant Koopman subspace. In more realistic settings, we more likely have access solely to measurements x_t of a dynamical system. Even though we may define our basis of observables on an embedding of the measurements, as in delay embedding approaches (Kamb et al 2020), an arbitrary choice of a dictionary of observables does not systematically generate a good approximation of the Koopman operator. We thus propose to minimize cost function (26) solely given measurements of a dynamical system and without any knowledge of the latent dictionary of observables D_W. For this purpose, we reformulate the action of the Koopman operator on a function ϕ_M ∈ F_{D_M}. By definition, we have:

ϕ_M = a_M^T Ψ_M = a_M^T G Ψ_dE, (27)

with a_M ∈ C^{M×1}. The action of the Koopman operator on ϕ_M verifies:

Kϕ_M = a_M^T (Ψ_M ∘ T) = a_M^T G (Ψ_dE ∘ T). (28)

Since the subspace spanned by D_dE is not invariant to the Koopman operator, the approximate Koopman operator K_dE includes a residual term r as follows:

Kϕ_M = a_M^T G K_dE Ψ_dE + r. (29)

Given the dataset of snapshot pairs {(x_k, x_{k+1})}_{k=1}^{N}, the LEDMD considers the following cost function:

J_LEDMD(K_dE, {Ψ_W(z_k)}_{k=1}^{N}) = Σ_{k=1}^{N−1} ∥Ψ_M(x_{k+1}) − G K_dE Ψ_dE(z_k)∥². (30)

If D_W and z_k are known, the LEDMD framework falls back to the standard EDMD problem described above, and the minimizer of J_LEDMD can be defined similarly to (15) using the pseudoinverse. However, since D_W and z_k are not known, we propose to minimize J_LEDMD with respect to both K_dE and {Ψ_W(z_k)}_{k=1}^{N}:

arg min_{K_dE, {Ψ_W(z_k)}_{k=1}^{N}} J_LEDMD(K_dE, {Ψ_W(z_k)}_{k=1}^{N}). (31)

Practical optimization problem
In this work, a regularization term is used to account for the action of K_dE on the latent observables. Specifically, we numerically minimize the following objective function:

J(K_dE, {Ψ_W(z_k)}_{k=1}^{N}) = Σ_{k=1}^{N−1} ( ∥Ψ_M(x_{k+1}) − G K_dE Ψ_dE(z_k)∥² + β ∥Ψ_dE(z_{k+1}) − K_dE Ψ_dE(z_k)∥² ), (32)

where β is a weighting parameter. The term ∥Ψ_dE(z_{k+1}) − K_dE Ψ_dE(z_k)∥² may be regarded as a regularization term, ensuring that the inference of the latent observables {Ψ_W(z_k)}_{k=1}^{N} of the vector of observables Ψ_dE is not solved independently for each time step. The overall idea of the proposed framework is sketched in figure 1.
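As a minimal sketch of this joint minimization, the gradient-descent loop below infers a single latent observable together with a 2 × 2 Koopman matrix from scalar measurements of a hidden two-dimensional rotation. Here the first component of Ψ_dE is pinned to the measurement (a hard-constraint variant of the first term of the objective), and the data, learning rate and iteration count are all assumptions of this sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

# hypothetical measurements: the first component of a hidden 2-D rotation
theta = 0.1
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
z, X = np.array([1.0, 0.0]), []
for _ in range(201):
    X.append(z[0])                      # measurement x_k = z_1 only
    z = R @ z
X = np.array(X)
N = len(X) - 1

# Psi_dE(z_k) = [x_k, w_k]: the latent values w_k are free variables (d_E = 2)
W = rng.normal(scale=0.1, size=N + 1)
K = np.eye(2)

def residuals(K, W):
    Psi = np.stack([X, W], axis=1)      # (N+1) x 2 matrix of observables
    return Psi[1:] - Psi[:-1] @ K.T     # r_k = Psi_dE(z_{k+1}) - K Psi_dE(z_k)

J0 = np.sum(residuals(K, W) ** 2)
lr = 1e-3                               # step size (assumption of this sketch)
for _ in range(5000):
    Psi = np.stack([X, W], axis=1)
    r = residuals(K, W)
    gK = -2.0 * r.T @ Psi[:-1]          # gradient w.r.t. the Koopman matrix
    gW = np.zeros_like(W)
    gW[1:] += 2.0 * r[:, 1]             # w_k as prediction target in r_{k-1}
    gW[:-1] += -2.0 * (r @ K)[:, 1]     # w_k as predictor in r_k
    K -= lr * gK
    W -= lr * gW

J1 = np.sum(residuals(K, W) ** 2)
err = np.mean(residuals(K, W)[:, 0] ** 2)   # one-step measurement error
print(J0, J1, err)
```

The loop decreases the objective while adjusting both the operator and the latent time series; in a full LEDMD implementation the same gradients would typically be obtained by automatic differentiation.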
Interpretation of the LEDMD: this formulation can be interpreted as searching for the most relevant set of latent observables Ψ_W that makes the set of observables Ψ_dE evolve linearly in time. It is worth noting that this formulation naturally extends to cases where the measurements x_t do not form an embedding of the state of the dynamics z_t. Indeed, the set of latent observables is written as the solution of an optimization problem, and not as an explicit mapping of the observations. The LEDMD can also easily be extended to account for physical constraints, similarly to DMD-type algorithms (Colbrook 2022, Baddoo et al 2023), by including these constraints as regularization terms in (30).
Benefits of the numerical optimization: since the optimization is carried out numerically, we can further benefit from the numerical computation of the approximate Koopman operator as well as the corresponding latent observables. In particular, it becomes straightforward to formulate the Koopman operator as the matrix exponential of an infinitesimal generator. Such a formulation makes it simple to constrain the eigenvalues of the approximate Koopman operator to be unitary (when relevant, for example, in conservative systems). This formulation is simply derived by considering K = e^{∆tA}, where A is the approximate infinitesimal generator of the Koopman operator and ∆t is the time step. Figure 1 shows an example where the infinitesimal generator is optimized. The proposed framework is highlighted on partial observations of the state variables of a non-linear system with an equilibrium point. These partial measurements are embedded into a higher-dimensional space of observables, assumed to be invariant to a linear Koopman operator. Both the observables ψ^l_1 and ψ^l_2, as well as the infinitesimal generator of the Koopman operator, are solutions of an optimization problem with respect to the prediction of the measurements. The output of this inverse problem is a linear model that can be used to simulate/predict new measurements.
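For instance, with a skew-symmetric parameterization of the generator (an illustrative choice, not necessarily the paper's implementation), K = e^{∆tA} is orthogonal and its eigenvalues automatically lie on the unit circle:

```python
import numpy as np
from scipy.linalg import expm

dt = 0.1
P = np.random.default_rng(2).normal(size=(3, 3))  # unconstrained parameters
A = P - P.T               # skew-symmetric generator: A^T = -A
K = expm(dt * A)          # one-step Koopman matrix K = e^{dt A}

# K is orthogonal, so every eigenvalue has unit magnitude
print(np.abs(np.linalg.eigvals(K)))
```

Optimizing over the unconstrained matrix P then keeps the spectral constraint satisfied at every gradient step, with no projection needed.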
Forecasting applications: because the latent observables Ψ_W of the vector of observables Ψ_dE are outputs of an optimization problem, forecasting applications must be approached with caution to account for a relevant choice of the initial conditions of Ψ_W. In practice, given a trained LEDMD model, forecasting measurements x_k relies on forecasting the entire set of observables Ψ_dE. The latter amounts to finding an initial condition for the latent observables Ψ_W and then propagating these observables by the action of the approximate Koopman operator. This issue can be addressed by finding the most relevant latent observables that minimize the forecasting cost of a predefined sequence of EDMD observables Ψ_M. Specifically, given a new sequence of measurements x_k, k ∈ {N_1, . . . , N_2}, forecasting this new series of measurements for k > N_2 is carried out by following the optimization problem (32), where we infer the initial condition of Ψ_W using the following minimization:

arg min_{{Ψ_W(z_k)}_{k=N_1}^{N_2}} Σ_{k=N_1}^{N_2−1} ∥Ψ_M(x_{k+1}) − G K_dE Ψ_dE(z_k)∥². (33)

Here, we only minimize with respect to the latent observables Ψ_W, given the trained Koopman operator K_dE. This minimization relates to a variational assimilation issue with partially observed states and known dynamical and observation operators (Lynch and Huang 2010).
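When the trained operator is held fixed, the initial latent value enters the rollout linearly, so the minimization above reduces to ordinary least squares. The sketch below illustrates this on a toy case where the "trained" K_dE is taken to be a known 2 × 2 rotation and a single latent value w_0 is inferred from a short measurement window (all numerical values are illustrative assumptions):

```python
import numpy as np

theta, phi = 0.2, 0.7
K = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])   # stand-in trained operator

ks = np.arange(10)
x = np.cos(ks * theta + phi)        # new measurement window x_k, k = 0..9

# prediction: x_k = e_1^T K^k [x_0, w_0]^T, which is linear in w_0
a, b = [], []
for k in ks:
    Kk = np.linalg.matrix_power(K, k)
    a.append(Kk[0, 1])                  # coefficient multiplying w_0
    b.append(x[k] - Kk[0, 0] * x[0])    # residual part independent of w_0
a, b = np.array(a), np.array(b)
w0 = (a @ b) / (a @ a)                  # closed-form 1-D least squares

# in this noiseless toy case the hidden coordinate is sin(k*theta + phi)
print(w0, np.sin(phi))
```

Once w_0 is estimated, forecasts for k > N_2 follow by repeated application of K to the completed observable vector.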

LEDMD and the stochastic Koopman operator
Hereafter, we describe the extension of the proposed framework for the derivation of finite-dimensional approximations of the stochastic Koopman operator.

Stochastic Koopman operator
Let us consider a state space model driven by a random dynamical system (RDS):

z_{k+1} = T(θ^k ω, z_k), (34a)
x_k = h(z_k), (34b)

with (L, Σ_L, µ_L) and (Ω, Σ_Ω, µ_Ω, θ) the probability spaces associated with L and Ω, respectively. θ : Ω −→ Ω is a measure-preserving base flow. The noise process w_k driving the RDS is assumed to be independent of z_0, z_1, . . . , z_k.
We can define a one-step stochastic Koopman operator K^Ω, associated with the RDS (34), as:

(K^Ω g)(z) = E_ω [ g(T(ω, z)) ], (35)

with g ∈ F : L −→ C a complex-valued observable of the dynamical system. Similarly to the deterministic case, if a finite dictionary D_dE of elements of F spans a subspace F_dE ⊂ F that is (approximately) invariant to the stochastic Koopman operator, we can define a finite-dimensional approximation K_Ω of K^Ω as the projection of K^Ω onto F_dE.

Approximation of the stochastic Koopman operator
When considering the approximation of the stochastic Koopman operator from data, the standard EDMD (with a correct choice of the dictionary) has been explored in various works. Specifically, when, in addition to ergodicity, the noise process is assumed to be white, Takeishi et al (2017b) showed that the eigenvalues produced by the standard EDMD algorithm converge to those of the stochastic Koopman operator. However, for noisy observables, the output of the EDMD algorithm is biased (Dawson et al 2016). Several works proposed variants of the DMD algorithm to correct the influence of noise on the estimation of the approximate Koopman operator in both the deterministic (Dawson et al 2016, Jiang and Liu 2022) and stochastic (Takeishi et al 2017b, Wanner 2020, Jiang and Liu 2022) settings. Yet, a fundamental issue regarding the choice of the dictionary of observables still persists. We will show that the proposed LEDMD framework is relevant for handling RDSs. Assuming a Gaussian noise e_k, we combine the LEDMD with a Kalman filter, following Jiang and Liu (2022), to produce an estimate of the stochastic Koopman operator together with estimates of the process and observation noises. Furthermore, using the LEDMD in a Kalman filter allows for highly irregular observations that cannot be handled by simple DMD inversions.

LEDMD for the stochastic Koopman operator
The proposed LEDMD model can be used to derive the stochastic Koopman operator, given partial observations of the state of the system and assuming an incomplete basis of observables. Similarly to the LEDMD in the deterministic case, two dictionaries of functions D_M and D_W are considered. The dictionary D_M acts on measurements of the RDS, while D_W spans the latent observables. We assume that the span of D_dE = D_M ∪ D_W is (approximately) invariant to the action of the stochastic Koopman operator, which gives:

Ψ_dE(z_{k+1}) = K_Ω Ψ_dE(z_k) + e_k, (37)

where e_k is the process noise. The derivation of K_Ω and the latent observables Ψ_W must be considered with care to account for both the process noise and some potential measurement noise. The learning criterion proposed in (32) is likely to fail in this setting. To circumvent this issue, similarly to Jiang and Liu (2022), we propose to formulate the LEDMD as a state space model. In this model, the measurement equation can easily be defined by writing the restriction of the full state observable to the measurements x_k, Ψ_O : L −→ C^{n×1}:

Ψ_O(z) = [ψ^o_1(z), . . . , ψ^o_n(z)]^T, with ψ^o_i = e*_i ∘ h, (38)

with e*_i, i = 1, . . . , n, the ith unit vector in R^n. In this work, we assume that ψ^o_i ∈ F_{D_dE} for all i = 1, . . . , n, so that we can write:

x_k = H Ψ_dE(z_k) + ν_k, (39)

where H = BG. Here, B ∈ R^{n×M} is some appropriate matrix of weights that satisfies Ψ_O = B Ψ_M, and the vector ν_k accounts for the measurement error, with covariance matrix R_k.
Equations (37) and (39) form a linear state space model. Assuming in this setup that the uncertainties are Gaussian, i.e. e_k ∼ N(0, Q_k) and ν_k ∼ N(0, R_k), and that the span of D_dE is invariant to the action of the stochastic Koopman operator, the Kalman filter represents an optimal estimator, in the least squares sense, of the sequence of vector-valued observables Ψ_dE(z_k) given the measurements x_k. Standard Kalman filtering parameterizations can then be used to infer the approximate stochastic Koopman operator and the covariances Q_k and R_k.

Kalman filter for the estimation of the stochastic LEDMD
The Kalman filter is applied to equations (37) and (39) to derive an estimate of the distribution p(Ψ_dE(z_k) | x_1, . . . , x_k). Formally, given the initial moments Ψ_dE(z_1)^a and P^a_1, the mean Ψ_dE(z_k)^a and covariance P^a_k can be computed recursively as follows:

Ψ_dE(z_k)^f = K_Ω Ψ_dE(z_{k−1})^a,
P^f_k = K_Ω P^a_{k−1} K_Ω^T + Q_k,
G_k = P^f_k H^T (H P^f_k H^T + R_k)^{−1},
Ψ_dE(z_k)^a = Ψ_dE(z_k)^f + G_k (x_k − H Ψ_dE(z_k)^f),
P^a_k = (I − G_k H) P^f_k, (40)

where G_k is the Kalman gain. The superscripts f and a refer to the forecasting and filtering (analysis) phases of the Kalman filter, respectively.
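A minimal NumPy implementation of these recursions is sketched below, on a simulated two-dimensional observable state-space model; the operator, H, Q and R are illustrative stand-ins for trained quantities:

```python
import numpy as np

rng = np.random.default_rng(3)
d_E, n, theta = 2, 1, 0.3
K_Om = np.array([[np.cos(theta), -np.sin(theta)],
                 [np.sin(theta),  np.cos(theta)]])   # stand-in for K_Omega
H = np.array([[1.0, 0.0]])                           # H = B G: observe Psi_M only
Q, R = 0.01 * np.eye(d_E), 0.1 * np.eye(n)

# simulate the linear state space model on the observables
psi, xs = np.array([1.0, 0.0]), []
for _ in range(200):
    psi = K_Om @ psi + rng.multivariate_normal(np.zeros(d_E), Q)
    xs.append(H @ psi + rng.multivariate_normal(np.zeros(n), R))

# Kalman filter: forecast / analysis recursions
m, P, loglik = np.zeros(d_E), np.eye(d_E), 0.0
for x in xs:
    m, P = K_Om @ m, K_Om @ P @ K_Om.T + Q           # forecast step
    S = H @ P @ H.T + R                              # innovation covariance
    G_k = P @ H.T @ np.linalg.inv(S)                 # Kalman gain
    innov = x - H @ m
    loglik += -0.5 * (innov @ np.linalg.solve(S, innov)
                      + np.log(np.linalg.det(S)) + n * np.log(2 * np.pi))
    m = m + G_k @ innov                              # analysis step
    P = (np.eye(d_E) - G_k @ H) @ P

print(loglik)
```

The accumulated innovation log-likelihood computed inside the loop is exactly the quantity maximized in the parameter estimation described next.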

Optimization of the parameters of the stochastic LEDMD:
The use of the Kalman filter to estimate the parameters of a given state space model (here Ψ_dE, K_Ω, Q_k and R_k) is classical in the state space modeling literature (Tandeo et al 2018). The optimization of K_Ω, Q_k and R_k must be considered with care to ensure a correct derivation of the covariances Q_k and R_k and of the stochastic Koopman operator K_Ω. We consider the maximization of the likelihood of the observations, which provides a direct objective function to be optimized using gradient descent. Formally, and under the assumption of the state space model of equations (37) and (39), the log-likelihood of the observations writes:

log p(x_1, . . . , x_N) = Σ_{k=1}^{N} log N(x_k ; H Ψ_dE(z_k)^f, H P^f_k H^T + R_k). (41)

Equation (41) includes crucial information regarding the parameters of the filtering scheme. It can be maximized numerically to derive the correct values for Ψ_dE, K_Ω, Q_k and R_k. Furthermore, under some assumptions on the covariances Q_k and R_k, Jiang and Liu (2022) showed that the maximization of the likelihood of the observations converges to the standard DMD formulation in the deterministic case. It also converges to the noise-corrected DMD (Dawson et al 2016) when considering deterministic dynamics with a small measurement noise level. For arbitrary noises, the maximization of (41) does not have a known analytical solution. We thus propose to maximize (41) numerically using gradient descent. Specifically, the observables Ψ_dE(z_k) and the parameters K_Ω, Q_k and R_k are estimated as follows:
• Step 1: Initialize K_Ω, Q_k and R_k.
• Step 2: Given measurements {x_k}_{k=1}^{N}, estimate the posterior distribution p(Ψ_dE(z_k) | {x_i}_{i=1}^{k}) using the Kalman filter.

• Step 3: Minimize the negative log-likelihood of (41) with respect to K_Ω, Q_k and R_k.
• Step 4: Repeat steps 2 and 3 until convergence.
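Steps 1 to 4 can be illustrated on a scalar toy model, where the inner Kalman filter is wrapped into a negative log-likelihood that a generic optimizer minimizes over a single dynamics parameter (SciPy's bounded scalar minimizer stands in for the gradient descent of the text; all numerical values are assumptions of this sketch):

```python
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(4)
a_true, q, r = 0.95, 0.05, 0.1

# simulate a scalar state space model: psi_{k+1} = a psi_k + e_k, x_k = psi_k + nu_k
psi, xs = 1.0, []
for _ in range(300):
    psi = a_true * psi + rng.normal(0.0, np.sqrt(q))
    xs.append(psi + rng.normal(0.0, np.sqrt(r)))

def neg_loglik(a):
    # Step 2: run the Kalman filter, accumulating the innovation log-likelihood (41)
    m, P, ll = 0.0, 1.0, 0.0
    for x in xs:
        m, P = a * m, a * a * P + q               # forecast
        S = P + r                                 # innovation variance (H = 1)
        ll += -0.5 * ((x - m) ** 2 / S + np.log(2 * np.pi * S))
        g = P / S                                 # Kalman gain
        m, P = m + g * (x - m), (1.0 - g) * P     # analysis
    return -ll

# Step 3: minimize the negative log-likelihood w.r.t. the dynamics parameter
res = minimize_scalar(neg_loglik, bounds=(0.0, 1.5), method="bounded")
print(res.x)   # estimate of a_true
```

In the full stochastic LEDMD, the same likelihood is optimized jointly over the matrix-valued K_Ω, Q_k and R_k rather than a single scalar.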
The likelihood (41) measures how well a model fits the observations. It is known in the data assimilation literature as the model evidence. It is also often used in model selection (Carson et al 2018), as it provides a direct measure of the accuracy of a given state space model. In this work, and under the hypothesis of Gaussian noise uncertainties, this likelihood can be used to evaluate the effectiveness of the stochastic LEDMD approximation under different dictionaries of observables. In the next experimental section, focus is given to the evaluation of the dimension of the approximate Koopman operator. We show that the likelihood is maximized for Koopman approximations of sufficient dimension.

Application examples
The LEDMD is tested on deterministic and stochastic dynamical systems in the following section. In the appendix, we also present further experiments on nonlinear oscillations and on an airline passengers dataset.

Equilibrium points
Let us consider the following system of differential equations:

ż_1,t = µ z_1,t,
ż_2,t = α (z_2,t − z²_1,t). (42)

Equation (42) is a nonlinear ODE with an equilibrium point at the origin, for which the derivation of a linear conjugate is instructive. Equation (42) admits a three-dimensional closed-form linear Koopman representation (Brunton et al 2016a), obtained by choosing as set of observables the variables z_1, z_2 and z²_1. Formally, considering z_3 = z²_1, we may rewrite (42) as:

ż_1,t = µ z_1,t,
ż_2,t = α (z_2,t − z_3,t),
ż_3,t = 2µ z_3,t. (43)

Full observations of the state space. In this first experiment, we consider as measurements the full state vector of (42), i.e. H = I_2, x_t^T = [z_1,t, z_2,t] (please refer to appendix B for a description of the numerical simulation and sampling of (42)). Figure 2 illustrates the performance of the LEDMD framework in the identification of a linear model that perfectly matches the non-linear dynamics. Our framework is tested here with D_M = {e*_1, e*_2} and D_W = {ψ^l_1}. This setup generates observables Ψ_dE = [z_1, z_2, ψ^l_1(z_t)]^T ∈ R^{3×1} (d_E = 3), i.e. a single latent observable is concatenated to the observations. It is worth noting that neither the dynamical model nor its non-linearities are known by the proposed framework, as the observable ψ^l_1(z_t) and the approximate Koopman operator K_dE are solutions of an optimization problem that minimizes the forecasting error of the observations. The EDMD algorithm using, in addition to the observed states, the additional observable z²_1 leads to a closed-form Koopman translation of the non-linear differential equation (42). Indeed, as shown by equation (43), this ODE can be analytically linearized with this set of observables. However, selecting a bad observable, for instance z²_2, drastically changes the EDMD performance, as illustrated in figure 2. Choosing the right finite set of observables is key in data-driven Koopman representations, and this experiment highlights this aspect.

Figure 2. Simulations of (42) given full observations of the state variables: (a) the LEDMD model; (b) the EDMD algorithm with a correct choice of the observables (i.e. [z_1, z_2, z²_1]^T); (c) the EDMD with a wrong set of observables (i.e. [z_1, z_2, z²_2]^T). In each figure, the colors correspond to simulations from different initial conditions; the lines correspond to the non-linear dynamics and the dots represent the data-driven Koopman simulations.
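The closed-form representation (43) can be checked numerically: integrating the non-linear system (42) and propagating the lifted observables (z_1, z_2, z_1²) with the matrix exponential of the linear generator give matching trajectories (parameter values and step sizes below are arbitrary choices for illustration):

```python
import numpy as np
from scipy.linalg import expm

mu, alpha = -0.05, -1.0       # illustrative parameters
dt, n_steps = 0.01, 500
z0 = np.array([1.0, 0.5])

def f(z):
    # non-linear dynamics (42)
    return np.array([mu * z[0], alpha * (z[1] - z[0] ** 2)])

# RK4 integration of the non-linear ODE
z, traj = z0.copy(), [z0.copy()]
for _ in range(n_steps):
    k1 = f(z); k2 = f(z + 0.5 * dt * k1)
    k3 = f(z + 0.5 * dt * k2); k4 = f(z + dt * k3)
    z = z + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    traj.append(z.copy())
traj = np.array(traj)

# linear Koopman generator of (43) on observables (z1, z2, z3 = z1^2)
A = np.array([[mu,    0.0,    0.0],
              [0.0, alpha, -alpha],
              [0.0,   0.0, 2 * mu]])
K = expm(dt * A)              # exact one-step propagator

psi = np.array([z0[0], z0[1], z0[0] ** 2])
lin = [psi.copy()]
for _ in range(n_steps):
    psi = K @ psi
    lin.append(psi.copy())
lin = np.array(lin)

err = np.max(np.abs(lin[:, :2] - traj))
print(err)
```

The discrepancy is at the level of the integrator error, confirming that the lifted linear model reproduces the non-linear flow exactly.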
Partial observations of the state space. We now only consider partial observations of the state of (42), i.e. x_t = z_2,t (H = e*_2).
Similarly to the previous experiment, the LEDMD framework is tested with D_M = {e*_2} and D_W = {ψ^l_1, ψ^l_2}. In this configuration, two latent observables are concatenated to the observation. This application example is depicted in figure 1. Figure 3(a) illustrates the DMD algorithm applied to measurements of z_2. Including high-order polynomial non-linearities tends to worsen the DMD approximation, as illustrated in figure 3(b). This is due to the fact that the non-linearity involves the unobserved component z_1. The LEDMD model is able to derive the most relevant latent observables that linearize the system, given only partial measurements. Figure 4 provides a simulation example of the LEDMD framework.

LEDMD on the shallow water equation (SWE)
Chaotic systems are typical examples where the observables space of the Koopman operator is infinite-dimensional. However, numerous chaotic systems admit, in addition to a chaotic signature, several periodic and quasi-periodic modes, making suitably chosen linear models relevant for short-term forecast applications.
We consider modeling chaotic dynamics governed by the SWE. Observations of the sea surface elevation η_k are considered as measurements, i.e. x_k = η_k (please refer to appendix B for a detailed description of the numerical simulation of the SWE). In this experiment, the LEDMD framework is built as follows. We construct Ψ_M from the empirical orthogonal function (EOF) decomposition of the measurements, i.e. D_M is built with the M = 100 principal eigenvectors of the covariance of the measurements, and D_W = {ψ^l_1, . . . , ψ^l_W} with W = 600. This setup generates observables Ψ_M(x_k) = E^T x_k, with E the matrix formed by the eigenvectors of the covariance matrix of the data.

Figure 4. Simulation of (42) using the LEDMD model constructed on partial observations of the state variables. The colors correspond to simulations from different initial conditions; the lines correspond to the non-linear dynamics of z_2,t and the dots represent the LEDMD simulation. Even with partial measurements of the non-linear dynamics (42), the proposed framework perfectly matches the ground truth.

Figure 5 illustrates the forecasting performance with respect to the true state, the projection of the true state on the EOF basis, and Hankel-DMD-based algorithms. This state-of-the-art model is built on a singular value decomposition (SVD) of delay embedding coordinates of the EOF components (we use the same EOF decomposition as the one described above for the LEDMD) (Kamb et al 2020). The delay embedding is computed, for every EOF component, using a lag equal to one time step. We test three different embedding dimensions, d_E1 = 700, d_E2 = 10 000 and d_E3 = 20 000. These embedding configurations generate three different Hankel DMD models that we refer to as HDMD_dE1, HDMD_dE2 and HDMD_dE3. In all these Hankel DMD models, the dimension of the SVD is set to 150, which accounts for over 95% of the total variance of the delay embedding representations.
The qualitative analysis of figure 5 shows that the proposed architecture outperforms all Hankel-DMD-based models by generating eddies that are closer to those of the true state. These observations are validated through the computation of the RMSE at each prediction time step (figure 6). Interestingly, even though increasing the embedding dimension of the Hankel matrix improves the DMD representations based on these observables, the proposed architecture leads to better results with a smaller dimension of the set of observables.

Stochastic LEDMD representations
We now shift our attention to the stochastic version of the LEDMD framework. Let us consider the following linear system in R^4: z_{k+1} = A z_k + e_k, with e_k ∼ N(0, σ_z^2 I) and σ_z^2 = 1. We consider as observations 2000 samples of the first component of the state vector z_k with additive Gaussian noise, i.e. x_{k+1} = B z_{k+1} + b_k with B = [1, 0, 0, 0] and b_k ∼ N(0, σ_x^2), σ_x^2 = 1.5. We consider six different stochastic Koopman operators with increasing d_E, i.e. {K_Ω,dE}_{dE=1}^{6}. These operators share the same EDMD observables D_M = {e*_1} and have an increasing number of latent observables in D_W (from no latent observables to five). We show in figure 7 the evolution of the log-likelihood of the observations for each tested stochastic Koopman approximation. Overall, this score converges for the models {K_Ω,dE}_{dE>3}, which suggests that a dimension of 4 is enough for modeling the observations. We also illustrate in figure 8 the estimated eigenvalues of each model with respect to the true eigenvalues of A. When d_E < 4, the stochastic Koopman approximation is only able to match the eigenvalues of A with no imaginary parts, which explains the low log-likelihood score of figure 7 when compared to higher dimensions. At d_E = 4, the associated Koopman approximation has enough eigenvalues to correctly match those of the true RDS, which explains the increase in the log-likelihood score. At higher dimensions, K_Ω,dE=5 and K_Ω,dE=6 have similar likelihood scores but with extra spurious eigenvalues.
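The log-likelihood score used in this experiment is the standard Gaussian likelihood of a linear state-space model, computable with a Kalman filter. The sketch below is our own illustration: the matrix A (two damped rotation blocks, giving complex eigenvalue pairs) is an assumption standing in for the paper's system, and `kalman_loglik` is not the paper's code.

```python
import numpy as np

def kalman_loglik(x, A, B, Q, R, m0, P0):
    """Gaussian log-likelihood of scalar observations x under the linear
    state-space model z_{k+1} = A z_k + e_k, x_k = B z_k + b_k."""
    m, P, ll = m0.copy(), P0.copy(), 0.0
    for xk in x:
        m = A @ m                       # predict mean
        P = A @ P @ A.T + Q             # predict covariance
        v = float(xk - (B @ m)[0])      # innovation
        S = float(B @ P @ B.T + R)      # innovation variance
        ll += -0.5 * (np.log(2 * np.pi * S) + v * v / S)
        K = (P @ B.T) / S               # Kalman gain, shape (4, 1)
        m = m + K[:, 0] * v             # update mean
        P = P - K @ (B @ P)             # update covariance
    return ll

def rot(theta, r):                      # damped 2x2 rotation block
    return r * np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

A = np.zeros((4, 4)); A[:2, :2] = rot(0.3, 0.95); A[2:, 2:] = rot(1.1, 0.9)
B = np.array([[1.0, 0.0, 0.0, 0.0]])
Q, R = 1.0 * np.eye(4), np.array([[1.5]])  # sigma_z^2 = 1, sigma_x^2 = 1.5

rng = np.random.default_rng(2)
z, xs = np.zeros(4), []
for _ in range(2000):
    z = A @ z + rng.normal(0.0, 1.0, size=4)
    xs.append(float(B @ z) + rng.normal(0.0, np.sqrt(1.5)))
ll = kalman_loglik(np.array(xs), A, B, Q, R, np.zeros(4), np.eye(4))
```

A model whose spectrum cannot represent the complex eigenvalue pairs of A will leave structured innovations `v`, lowering this score, which is the effect visible in figure 7.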

Discussion
Introduced about a century ago, the Koopman operator formalism describes the evolution of a state sequence through the linear propagation of an infinite-dimensional set of observables. Finding finite-rank approximations of this operator has motivated a tremendous amount of work centered around several questions. One of them concerns which observables to use in order to avoid loosely trading the complexity of a potentially non-linear system for that of a higher-dimensional linear one.
In this context, several dictionary-based families of observables have been investigated, ranging from non-linear polynomial expansions to autoencoders and deep learning. In the present work, a different perspective is considered. Instead of fixing a dictionary (or a family of dictionaries) and solving for the finite-dimensional approximation of the Koopman operator, we write some of the Koopman observables as solutions to an optimization problem. This avoids imposing any unnecessary constraint on the observable space other than those induced by the optimization of the forecasting cost of the observations. Through different numerical experiments, the proposed framework appears very efficient for the data-driven derivation of finite-dimensional Koopman representations of dynamical systems. Whereas most state-of-the-art algorithms heavily rely on the selection of a family of basis functions, the proposed architecture can tackle several dynamical regimes, given both full and partial measurements of the state space.
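The principle of solving for latent observables rather than fixing a dictionary can be sketched as a joint minimization of the one-step forecasting cost over both the operator K and the latent sequence W. The gradient-descent toy below is our own illustration of this idea (function name, data and optimizer are assumptions, not the paper's solver):

```python
import numpy as np

def ledmd_fit(psi, d_lat, n_iter=1000, lr=1e-3, seed=0):
    """Jointly fit a linear operator K and a latent observable sequence W
    by gradient descent on the one-step forecasting cost of the augmented
    observables y_k = [psi_k; w_k]."""
    rng = np.random.default_rng(seed)
    n, M = psi.shape
    W = 0.1 * rng.normal(size=(n, d_lat))     # free latent observables
    K = np.eye(M + d_lat)
    costs = []
    for _ in range(n_iter):
        Y = np.concatenate([psi, W], axis=1)  # augmented observables
        R = Y[1:] - Y[:-1] @ K.T              # residuals r_k = y_{k+1} - K y_k
        costs.append(float(np.sum(R ** 2)))
        gK = -2.0 * R.T @ Y[:-1]              # dJ/dK
        gY = np.zeros_like(Y)                 # dJ/dy_k = 2 r_{k-1} - 2 K^T r_k
        gY[1:] += 2.0 * R
        gY[:-1] -= 2.0 * R @ K
        K -= lr * gK / n                      # scale by trajectory length
        W -= lr * gY[:, M:]                   # only the latent block is free
    return K, W, costs

t = np.arange(300) * 0.1
psi = np.cos(t)[:, None]                      # partial measurement of a rotation
K, W, costs = ledmd_fit(psi, d_lat=1)
```

Here the single latent observable is free to recover the missing quadrature component (the sine), which a fixed one-dimensional dictionary on the measurement alone cannot provide.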
When compared to standard non-linear model identification techniques (Brunton et al 2016b, Chen et al 2018, Ouala et al 2020), point (or discrete) spectrum approximations of Koopman, such as the proposed LEDMD framework and most of the state-of-the-art literature, cannot represent the asymptotic behavior of chaotic dynamical systems. However, from an application perspective, finding linear models of non-chaotic dynamics, as well as simple predictive models of chaotic ones, is highly valuable. In this context, investigating the relevance of the LEDMD in applications such as control and data assimilation is a promising perspective. Building end-to-end trainable control/data assimilation algorithms based on this architecture should allow learning dynamical priors and linear embeddings from a direct application-oriented cost (such as a data assimilation cost in state reconstruction, or an energy/performance-based cost in the context of control). Promoting sparsity in the proposed framework is also an important perspective. As shown in figure 1, the optimized approximation of the infinitesimal generator of Koopman contains terms that are close to zero, which suggests that encoding a sparsity prior into the LEDMD may help to promote generalizability.
When the EDMD dictionary D_M spans an (approximately) invariant subspace of the Koopman operator, the latent observables Ψ_W can be omitted and the LEDMD reduces to a standard EDMD method. The numerical experiments show that stacking the latent observables onto the vector of EDMD observables directs the LEDMD approximation towards an invariant subspace, especially when considering unknown dynamics and partial observations of the state variables. However, applying a numerical optimization instead of the standard inversion used in the EDMD forgoes the convergence results (in the limit of infinitely many data samples) of the EDMD to the Galerkin projection of the Koopman operator. This motivates studying the convergence properties and conditions of the LEDMD.
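For reference, the "standard inversion used in the EDMD" is the least-squares solution K = A G^+ with G = Ψ_X Ψ_X^T and A = Ψ_Y Ψ_X^T. A minimal sketch (toy data and naming are ours); on a trajectory of a linear system with an identity dictionary, it recovers the true propagator:

```python
import numpy as np

def edmd(psi_x, psi_y):
    """Least-squares EDMD operator: minimizes ||psi_y - psi_x K^T||_F,
    i.e. K = A G^+ with G = psi_x^T psi_x and A = psi_y^T psi_x
    (rows of psi_x, psi_y are dictionary evaluations at successive times)."""
    G = psi_x.T @ psi_x
    A = psi_y.T @ psi_x
    return A @ np.linalg.pinv(G)

# toy check: a damped rotation, identity dictionary
A_true = 0.95 * np.array([[np.cos(0.2), -np.sin(0.2)],
                          [np.sin(0.2),  np.cos(0.2)]])
Z = [np.array([1.0, 0.5])]
for _ in range(50):
    Z.append(A_true @ Z[-1])
Z = np.array(Z)
K = edmd(Z[:-1], Z[1:])
```

The LEDMD replaces this closed-form inversion by an optimization over both K and the latent part of the dictionary, which is exactly why the classical EDMD convergence argument no longer applies directly.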
The approximation of the stochastic Koopman operator, treated in this work, further reveals an extremely important aspect, implicit to the proposed representation. Learning an LEDMD model in a stochastic fashion allows for two distinct levels of approximation within the proposed framework, namely (i) the deterministic parameters, i.e. the approximate Koopman operator K_Ω, and (ii) the stochastic components encoded through the process and observation noise covariances. When jointly learnt, these two components can trade off the complexity of given measurements and may dissociate stochastic from deterministic behaviors within a signal. Investigating such aspects on more real-world problems may require relaxing the Gaussianity assumption on the noises.

Data availability statement
The data that support the findings of this study are openly available at the following URL/DOI: https://github.com/CIA-Oceanix/Augmented_Koopman.

This dynamical system is widely used in state-of-the-art data-driven Koopman representations and can be translated into a Koopman linear model using several state-of-the-art algorithms. We show in this experiment that our proposed framework is also relevant in this context. Considering the true states as observations, i.e. with H = M = I_2, we trained the proposed framework with d_E = 100 on a simulated trajectory of size 5000 (the trajectory was computed using the LSODA ODE solver (Hindmarsh 1983) with a sampling rate h = 0.1). The Hankel-EDMD algorithm was tested with a lag embedding of a single time step and an embedding dimension d_E = 100. The dimension of the SVD is set to 16, which accounts for over 99% of the total variance of the delay-embedding representation. The forecasting performance of the proposed model is shown in figure 9.

A.2. Air passenger time series
The previous experiment motivated the evaluation of the proposed model in forecasting real quasi-periodic signals. In this context, we consider the international Airline Passengers prediction problem. The data range from January 1949 to December 1960, with 144 observations in units of 1000 passengers. The first 100 data points were used as training data and we tested our approach on the remaining 44 observations. Figure 10 illustrates the forecasting performance of the proposed framework with respect to the Hankel-DMD framework.
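The Hankel-DMD baseline for a univariate series can be sketched as follows: delay-embed the training samples, fit a linear map on the embedded coordinates by least squares, and roll it forward. We use a synthetic trend-plus-seasonality series as a stand-in for the passenger data (the function and data are illustrative, not the paper's exact pipeline):

```python
import numpy as np

def hankel_forecast(series, d_embed, n_ahead):
    """Fit a linear map on delay-embedded coordinates of a univariate
    series and roll it forward to forecast (Hankel-DMD-style sketch)."""
    v = np.asarray(series, dtype=float)
    cols = len(v) - d_embed
    X = np.stack([v[i:i + d_embed] for i in range(cols)])           # (cols, d)
    Y = np.stack([v[i + 1:i + 1 + d_embed] for i in range(cols)])   # shifted
    K, *_ = np.linalg.lstsq(X, Y, rcond=None)                       # Y ≈ X K
    window, out = v[-d_embed:].copy(), []
    for _ in range(n_ahead):
        window = window @ K            # propagate the delay vector one step
        out.append(window[-1])         # its last entry is the new sample
    return np.array(out)

t = np.arange(144)
y = 100 + 2 * t + 20 * np.sin(2 * np.pi * t / 12)  # synthetic stand-in series
pred = hankel_forecast(y[:100], d_embed=24, n_ahead=44)
```

Because this synthetic series lies exactly in a low-dimensional shift-invariant subspace, the linear delay model forecasts it near-perfectly; the real passenger series is harder, which is what figure 10 probes.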

Appendix B. Generation of the data and parameterization of the non-linear equations
The data we use in our experiments are simulated by solving the following systems of differential equations: • The parameters of the dynamical system (42) are set in the experiments to µ = −1 and α = −10. The training set is a single trajectory, simulated using the LSODA ODE solver (Hindmarsh 1983) from t = 0 up to t = 6 and sampled at h = 0.01. The initial condition of this training set is z_0^T = [12.0, −1.0]. The test set consists of a collection of 289 trajectories, simulated and sampled similarly to the training sequence, but starting at different initial conditions with z_{1,0} = −40, −35, . . ., 40 and z_{2,0} = −400, −350, . . ., 400.
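The training trajectory above can be reproduced with a standard LSODA integration. System (42) itself is not reproduced in this excerpt, so the right-hand side below assumes the canonical slow-manifold example common in the Koopman literature, consistent with the quoted parameters µ and α; treat it as an illustrative stand-in rather than the paper's exact equations.

```python
import numpy as np
from scipy.integrate import solve_ivp

MU, ALPHA = -1.0, -10.0   # parameters quoted for system (42)

def rhs(t, z):
    # assumed slow-manifold form: dz1/dt = mu z1, dz2/dt = alpha (z2 - z1^2)
    return [MU * z[0], ALPHA * (z[1] - z[0] ** 2)]

t_eval = np.linspace(0.0, 6.0, 601)          # t = 0 .. 6, sampled at h = 0.01
sol = solve_ivp(rhs, (0.0, 6.0), [12.0, -1.0], method="LSODA",
                t_eval=t_eval, rtol=1e-8, atol=1e-10)
traj = sol.y.T                               # training trajectory, shape (601, 2)
```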
• The SWE dynamics used in this work are governed by the following set of equations: