Machine learning of hidden variables in multiscale fluid simulation

Solving fluid dynamics equations often requires the use of closure relations that account for missing microphysics. For example, when solving equations related to fluid dynamics for systems with a large Reynolds number, sub-grid effects become important and a turbulence closure is required, and in systems with a large Knudsen number, kinetic effects become important and a kinetic closure is required. By adding an equation governing the growth and transport of the quantity requiring the closure relation, it becomes possible to capture microphysics through the introduction of ``hidden variables'' that are non-local in space and time. The behavior of the ``hidden variables'' in response to the fluid conditions can be learned from a higher fidelity or ab-initio model that contains all the microphysics. In our study, a partial differential equation simulator that is end-to-end differentiable is used to train judiciously placed neural networks against ground-truth simulations. We show that this method enables an Euler equation based approach to reproduce non-linear, large Knudsen number plasma physics that can otherwise only be modeled using Boltzmann-like equation simulators such as Vlasov or Particle-In-Cell modeling.


Introduction
Machine learning for partial differential equations (PDEs) has received much attention with an emphasis towards computational fluid dynamics [1,2,3,4].Fluid equations describe conservation laws for bulk properties of matter -such as mass, energy, average flow velocity -under the continuum assumption.The equation set usually requires a closure, which involves additional information that represents some "microphysics" that is not captured by the continuum model.For example, because of turbulence on small scales, or granularity because of the nature of the media being made up of particles / molecules / cells etc. an equation may be introduced that represents the relationship between bulk properties, such as a constituent relation (material response to a force) or equation of state (a thermodynamic equation relating bulk properties) While analytic closure models based on equilbrium states have existed at least since the introduction of hydrodynamic equations, the abundance of data and of data-generation mechanisms has prompted a re-examination of the existing methods in light of data-driven models that provide closure for the coupled system of equations.This approach is particularly relevant in systems where physics occurs on length-and time-scales smaller than that which is resolved by the simulations.
Similarly, there are other systems where the fluid approximation itself breaks down because of effects involving an extended phase-space of states; for example, including momentum states where physical effects from the interaction between molecules or particles of a certain velocity become important.In these cases, increasing the resolution of the simulated domain arbitrarily does not recover the missing kinetic physics.Capturing kinetic physics using a first-principles approach requires a fully kinetic description such as solving the Boltzmann equation.Such effects may be included in a fluid model through constituent relations etc. that depend on the local conditions that may be derived or learned somehow from a kinetic model or experimental data etc.An example of this is the introduction of modified transport coefficients for nonlocal heatflow in fusion plasma [5].More recently, the application of machine learning to model turbulence has a growing body of research (see references within refs.[4,6]).
The issue with any relation derived from local bulk fluid conditions is that it retains no "memory" of microphysics effects from other locations; the effect is localized in space and time.But it is often the case that nonlocality is important.For example in the case of kinetic flows, fast particle streams will affect transport far, in both time and space, from a region that generates them.
In this article, we expand the method of machine learned closure to learn variables that themselves obey a PDE that describes the space-time variation of the microphysics.We refer to these as "hidden variables" as they do not describe bulk variables of the fluid system of equations but instead represent some physics that is "hidden" from the fluid equations.The original term "hidden variables" was introduced by Bohm to try to explain quantum behavior in terms of some unobservable underlying variables [7].Here, the meaning is slightly different as although our "hidden variable" indeed describes physics that is unobservable (within the fluid description), the variable itself is not thought to be fundamental but just an ad-hoc reduced fidelity approximation of the fundamental behavior which can be learned from an ab-initio model.We specifically look at the dynamics of Landau damping of excited plasma wavepackets in the presence of trapped hot electrons, which is described by the higher fidelity Vlasov-Poisson equations.
Our fluid model using a Landau closure, similar to that in Ref. [8], that is modified to include a hidden variable representing the effect of the hot electron population is able to replicate the fully kinetic behavior.The structure of paper is as follows.Section 2 discusses the inclusion of kinetic effects in a new differentiable fluid simulator using a "hidden variable" approach.Section 3 introduces the specific problem to address of Landau damping of excited plasma wavepackets and the basic equations to be solved.Section 4 introduces the new "hidden variable" and its evolution.Section 5 presents the results and comparison with full Vlasov simulation.Finally, we conclude in Section 6.

Inclusion of kinetic effects in a fluid code
The Boltzmann equation is a PDE that governs the evolution of the distribution function of particles in phase space and is given by where f = f (t, x, p) is the distribution function in (x, p) phase space, and F are the forces, and the right-hand-side describes collisional effects.Velocity-space moments of eq. 1 give an infinite chain of equations for the conservation of particle density, momentum, and energy.With appropriate closure conditions, these form the Euler equations in the inviscid case, or the Navier-Stokes equations in the viscous regime, depending on the treatment of the collisional term.If the particles are charged, this moment-driven approach gives the plasma fluid equations.In all cases, the reduction of a three configuration and three velocity phase-space down to a three-dimensional configuration space averages over the details contained in velocity-space.This approximation is justified in systems where the Knudsen number is small such that the particles reach local thermal equilibrium (LTE) and velocity-space effects, called kinetic effects, are negligible.The Knudsen number is defined by the ratio between the mean-free-path of the particles and the size of the system.There are many important systems where the Knudsen number is not small, and the thermal equilibrium approximation does not hold.For example, in micro-or nano-scale flows where the system size is small, as well as rarified gases such as the high-altitude atmosphere that is experienced by spacecraft where the mean-free-path is large.
In plasmas, kinetic effects are important in the Scrape-Off Layer (SOL) in magnetic fusion plasmas [9], in the low-density gas-fill in inertial fusion plasmas [9], and in laser-and plasma-based accelerator experiments where the plasma density is low and the physics is weakly-collisional.In such settings, it becomes important to retain the influence of kinetic effects in order to perform accurate modeling of the system dynamics.Another example is in plasma based particle acceleration [10], in particular the phenomena of wavebreaking or trajectory crossing in plasma waves [11].Fluid simulations of plasma accelerators have the benefit of having smooth distributions compared to typical particle-in-cell simulations, but cannot model crucial electron trapping processes because of phase space mixing.
Previous attempts at the inclusion of kinetic effects into plasma fluid modeling have leveraged analytical methods [8,12,13,14].Modeling kinetic physics using datadriven methods has either been attempted towards benchmarking their performance against known analytical methods [15] or is restrictive in the geometries to which it can be applied [16].In this work, we develop a machine-learned model that is applied to model a phenomenon with no known analytical representation.It is trained in a small geometry and successfully translated to a much larger domain.
To accurately model the desired kinetic effects in this work, we find that a model that is non-local in space and time is necessary.In other words, we find that a model that has memory is required.To accomplish this, we use an additional transport equation for a hidden variable with learned coefficients for the source term.Because we leverage a hidden variable that can persist over time, it enables the reproduction of the kinetic effect that requires non-locality in time.Since our hidden variable can also be transported, it can be leveraged to reproduce non-local behavior in space.We train the model in a reduced, idealized system against numerical solutions of the collisionless Boltzmann equation.Because our model is a self-consistent system of coupled PDEs with carefully constructed asymptotic behavior, we show that it not only extends to larger systems but can also be used to model phenomena that the neural networks have never previously seen.
In order to maximize the likelihood of developing models that are stable over long time-scales, it is important to develop models either using symbolic or sparse regression techniques [17,18,19] or via differentiable simulators [20,21].The former allow the user to craft and analyze symbolic formulations and ensure their stability.In the most general case, however, formulating a library of terms that may enable accurate modeling is a challenging task, and using a function approximator such as a neural network provides a method by which to circumvent that challenge.Training neural network models in-situ with simulations, rather than only using a single time-step, helps ensure the stability of the learned representation.This paradigm is visualized in fig. 1.
To increase the likelihood of learning stable models, we choose to develop a differentiable fluid simulator for plasma physics called ADEPT §.While differentiable simulators have been used to construct models for computational fluid dynamics [20,21,6], and for physics discovery in plasma kinetics [22], to our knowledge, this is the first differentiable simulator of computational plasma fluid dynamics.The code is publicly available along with the dataset introduced in this work.

Problem Statement -Landau damping of excited plasma wavepackets
Plasmas are systems where an ionized gas exhibits collective behavior through electrostatic or electromagnetic fields.These fields are capable of sustaining oscillations and waves which leads to a discipline rich in non-linear kinetic behavior.The kinetic behavior arises due to wave-particle interactions, where particles moving at the phase § Automatic-Differentiation-Enabled Plasma Transport

Evolve Fluid Equations
x N  The same parameters are fed to the differentiable fluid simulator that includes the machine learned hidden variable.This system is evolved in time over the same duration as the kinetic simulation.The observables from each simulation are used for calculating the loss function.The loss, and more importantly, the gradient of the loss with respect to the weights of the function approximator is computed.That gradient is used by an optimization algorithm to update the weights of the hidden variable model.The updated weights are used in the next batch of simulations.In practice, we precalculate the reference simulations.velocity of the wave can exchange energy with the wave.One important example of this is Landau damping [23], where an excited plasma wave will damp even in the absence of dissipative forces.This inherently requires a kinetic description to describe the physics of the wave-particle interaction driven damping mechanism.The linear version of Landau damping can be included via an analytic closure relation, which we do here.We extend that implementation to include the nonlinear version of this effect in a fluid model through the introduction of a "hidden variable" representing the trapped electron population.We start by describing the simple one dimensional fluid model.

Fluid model
The relevant equations are the simplest form of the moment equations for electron plasma transport that are necessary for illustrating our method.The two-moment equations are given by where n = n(t, x), u = u(t, x), p = p(t, x) are the density, velocity, and pressure, respectively.We use an adiabatic equation of state as a closure for the equation set, where γ = (f + 2)/f is the adiabatic index, with f the number of degrees of freedom, which is set to 3.0 for one-dimensional problems.T is the electron temperature and is assumed to be constant and uniform.E = E(t, x) is the electrostatic field given by solving Poisson's equation, ∂ x E = (Zn i − en)/ϵ 0 where n i is the ion density.Because this problem requires electron timescales, it is satisfactory to assume ions are stationary and form a neutralizing background.

Base Model -Linear Landau Damping
During linear Landau damping, particles absorb energy from the electric field and are accelerated.Because individual particle trajectories are no longer modeled in the fluid approximation, modeling Landau damping requires a modification to include the microphysics.In ref. [8], a Fourier space operation is used to provide the microphysics closure.
We include Landau damping similarly by adding a damping term to the momentum equation given by eq. 3. The modified momentum equation is given by where * is the convolution operator defined by [a * b](x) ≡ a(x ′ − x)b(x ′ )dx ′ and ν L is the wavenumber dependent Landau damping rate.Since ν L can be calculated for a range of wavenumbers using numerical rootfinders, we pre-compute the ν L for the computational domain, which serves as a convolutional filter that represents the wave-number dependent damping rate.We verify that this implementation reproduces Landau damping by comparing the results against those from a Vlasov code.We also verify that the analytical linear dispersion relation arising from by eqs. 2 and 5 recovers the Landau damping rate from the imaginary part of the kinetic dispersion relation.

Training -Non-linear Landau damping
When the electric field amplitude is large, particles exchange energy with the electric field, and eventually become trapped in the potential of the wave.The electric field and particles reach a time-averaged steady state during which Landau damping is suppressed and the electric field is not damped.In a weakly-collisional plasma, Landau damping does not completely disappear, but is reduced significantly, and the amount of reduction is a function of the collision frequency, field amplitude, and the wavenumber of the field.This is a distinctly non-linear and kinetic effect that is not captured by the fluid model.A model equation that enables the field amplitude dependent suppression of Landau damping can be given by where ν L is now a function of the wavenumber dependent field amplitude.An analytical solution for that term is not available, but a representation can be machine learned via tuning its weights and biases θ.However, it is readily seen that this implementation is unable to recover effects that are non-local in space, that is, effects that require nearby information, such as wavepacket etching.The formulation, by introducing a spatiotemporally dependent "hidden variable" that we use instead is detailed in the next section.

Testing -Motivating non-local physics through wavepacket etching
In the previous subsections, we have been discussing phenomenon that occurs in a idealized periodic system of a single wavelength.In finite-length wavepackets of O(10) wavelengths, a kinetic effect occurs where the damping rate becomes increasingly nonlocal because it depends on past dynamics.The effect on the observable is that the back of the wavepacket is damped faster than the front.It was shown in ref. [24] that the reason behind this is that resonant electrons are accelerated by the wave and travel faster than the group velocity of the wave.Because the electrons in the back of the wave are no longer trapped, non-linear Landau damping no longer occurs and the back of the wave resumes linear behavior.On the other hand, since the electrons from the back are traveling to the front of the wave, the front of the wave continues to experience non-linear effects.For this reason, the back of the wave damps faster than the front.The Euler equations do not discriminate between the resonant electrons and the bulk fluid.Therefore, they are unable to capture this effect and require further modification.

A spatiotemporally non-local model of Landau Damping
We modify eq.6 by adding an auxiliary "hidden variable" δ = δ(t, x) that represents the resonant electrons.From this section on we write the equations in normalized form using electrostatic units where v → v/v th , x → x/λ D , t → ω p t, m → m/m e , q → q/e, and E → eE/m e v th ω p [22].Here, v th is the thermal velocity, ω p is the plasma frequency, and m e , and e, are the electron mass, and charge, respectively.Since we seek to minimize Landau damping in their presence, we assume a sigmoidal dependence, similar to that in ref. [25], and the resulting equation is given by The growth and transport of δ is given by This model makes use of the knowledge that, since δ represents the electrons in phase with the wave, δ must be transported at the phase velocity.It also uses a source term inspired by linearizing the Vlasov-Boltzmann equation.Because δ is initialized at zero, the proportionality factor of the source term is what is approximated via a machine learned function.

ADEPT -Automatic-Differentiation-Enabled Plasma Transport
To train the model in eq. 9, we need a simulator that can model the set of equations given by where the final term in eq.11 represents the non-local kinetic effect that is added to the typical Euler equations.The closure relation for that term is given by eqs.14, 15, 16 where ν g is the key coefficient that must be learned from data.   is the normalized value of the electric field amplitude of the kth mode number.We choose this formulation for normalizing the electric field to avoid possible problems with taking the logarithm of zero.

Solver and Simulations
We solve this set using a pseudo-spectral, finite-volume method with a fourth order time-stepper [26] in a spatially periodic domain.The gradients are computed using Fourier transforms.Similarly, Poisson's equation is also solved spectrally.All the other operations, e.g.advection, are performed in real space.

Training -Suppression of Landau
where f = f (t, x, v) is the electron distribution function, Z = 1 is the atomic charge, and n i = 1 is the ion density.Details for this solver are given in refs.[27,22] The fixed parameters are t max = 500ω −1 p , ∆t = 0.5, N x = 64, N v = 2048.In all simulations, we simulate a single mode wave.The parameters of the temporal envelope of the external forcing term are t w = 40ω −1 p , t r = 5ω −1 p , t c = 40ω −1 p .The implementation of the forcing term is given in ref. [22].

Fluid Modeling
The fluid simulations used for training the model have different spatial and temporal resolutions, given by N x = 16, ∆t = 0.025, but are performed over the same spatial and temporal domains as the kinetic simulations.
Since we seek to replicate the behavior of a single-mode electron plasma wave with our fluid model, our training objective function is to minimize the mean squared error between the timeseries from the Vlasov and fluid simulations.Formally, the loss function is given by where ⃗ p is the parameter vector that consists of unique combinations of the parameters listed in eqs.19, 20, and 21, |n 1 | is the time history amplitude of the first mode of the density profile with the subscripts of f and V correspond to fluid and Vlasov, respectively.It is important to note that we compare the logarithm of the timeseries in the loss function.This is because the wave amplitude can be damped over many orders of magnitude.Without leveraging the logarithm, the wave amplitudes at the end of the simulation will be negligible in comparison to that near the beginning and would effectively be ignored by the gradient descent algorithm.Training was performed using A10G GPUs.We implement data parallel training by running N batch simulations in parallel and averaging their gradients on the primary node that is running the optimization algorithm.We use ADAM with a learning rate of 0.004 as the optimization algorithm.We stop training after the performance has saturated for many epochs.Figure 4 shows the result of two sample simulations.We choose to show the result of these two sets of parameters because they are representative examples of two extremes that the machine learning model must be able to handle.
In fig.4(a), we show the results from a simulation with ν ee =, k 0 = 0.34, a 0 = 10 −6 .The two simulations agree well.This is a parameter regime where linear Landau damping is expected, and therefore, the theory from sec.3.2 applies.Therefore, theory dictates that to reproduce the desired behavior, δ must remain small.Figure 4(b) shows that δ ≈ 10 −5 as the theory suggests it should.Because we train using the logarithm, as stated in eq.23, we are able to capture the damping over five orders of magnitude.
In fig.4(c), we show the results from a simulation with ν ee = 10 −5 , k 0 = 0.32, a 0 = 10 −2 .The two simulations agree well.In this parameter regime, kinetic theory dictates that there is a complex interplay between the collisions, represented by ν ee , and the field amplitude, represented by a 0 .Here we expect that to reproduce the desired behavior, δ must be some finite quantity that appreciably, but not completely suppresses the damping.Figure 4(d) shows that δ ≈ 7 and the damping is suppressed by a factor of 50.Now that we have been able to learn a microphysics model, we use the coupled set of PDEs along with the newly acquired model weights to simulate a novel geometry and verify its performance qualitatively.In the previous section, we trained on a dataset of systems where a single wavelength wave was excited in a box of one wavelength in length.To assess the performance of our learned model in unseen geometries, we perform simulations where the box size is increased by a factor of 100 and a 20 wavelength long, finite-length wave is modeled.This tests the ability of the machine learned model combined with the hidden variable approach that was trained on a limited geometry to generalize to larger length scales and different boundary conditions.To verify the fidelity, we compare the results of the learned fluid simulations to their kinetic counterparts acquired by using the Vlasov-Boltzmann solver.

Testing -Extending to unseen geometries and modeling unseen phenomena
In refs.[24,22], it has been shown that finite-length wavepackets damp nonuniformly due to a non-linear, kinetic phenomenon.In fig.5(a), we show an example of this phenomenon.Figure 5(b) shows that if a local implementation of Landau damping is used, for example, like the one given by eq.6, the non-uniform damping will not be reproduced because the damping rate will be the same along the length of the wavepacket.In contrast, fig.5(c) shows that the implementation using the machinelearned dynamical hidden variable approach is much more capable of reproducing the results from the ab-initio simulation.Figure 5(d) confirms that δ grows to account for the suppressed damping, and then is transported to account for the non-locality in space and time.In this work, we seek to highlight the fact that because we choose to embed the machine learning model into a set of coupled PDEs that are capable of generalizing, we can train the model on a simple system and can scale the learned physics to a significantly more complex domain.In this case, we scale a single-wavelength periodic system to a finite-length wavepacket in an open boundary simulation but this model could be very well be extended to multiple spatial dimensions.If the use-case were to acquire as accurate a model as possible in order to perform multi-scale modeling of an actual experiment, one could and should train the model on as many geometries as is feasible.

Conclusion
The marriage of numerical PDE solvers and deep learning driven models is more than coincidental because they are both products of numerical computing.The advent of scalable and performant scientific computing libraries that are capable of automatic differentiation, e.g.JAX and Julia, has enabled a seamless interchange of ideas between the two paradigms where the lines between each continue to blur.
In this work, we show how the two paradigms can be intimately woven together and leveraged towards performing multiscale modeling where the equation set does not natively contain all the necessary microphysics, and where the microphysics model itself has proven to be challenging to construct using conventional methods.By training against a dataset with a relatively simple geometry and parameters, we are able to learn a model that can emulate spatio-temporally non-local physics over broad parameter ranges and geometries, physics which can otherwise only be described by computationally expensive higher-dimensional PDE solvers.The key to preserving the spatio-temporally non-local dynamics is providing a dynamical system for a hidden variable that governs the physics.By allowing for growth and transport of the hidden variable, we enable non-locality in time and space.
We envision that a model that leverages machine-learned hidden variables can be used for various phenomenon inside and outside of plasma physics.For example, hot electrons created from laser-plasma instabilities are deleterious towards successful fusion experiments.Accurate modeling of their dynamics and energetics in fully integrated hydrodynamics simulations of fusion experiments can help substantially improve the predictive power of simulations and improve the results of fusion experiments.

Data Availability Statement
The data that support the findings of this study are openly available at the following URL: https://github.com/ergodicio/adept

Conflict of Interest
Authors declare no competing interests.

Figure 1 :
Figure 1: A single step in the training loop is illustrated.A batch of physical parameters is chosen.(Bottom) Those parameters are then fed to a kinetic simulator and an observable is calculated.(Top)The same parameters are fed to the differentiable fluid simulator that includes the machine learned hidden variable.This system is evolved in time over the same duration as the kinetic simulation.The observables from each simulation are used for calculating the loss function.The loss, and more importantly, the gradient of the loss with respect to the weights of the function approximator is computed.That gradient is used by an optimization algorithm to update the weights of the hidden variable model.The updated weights are used in the next batch of simulations.In practice, we precalculate the reference simulations.

Figure 2 :
Figure 2: The training data are single mode simulations of plasma waves that are excited by a short duration driver using the fully-kinetic, Vlasov formulation.The training involves reproducing these excitations with the simulations using the modified fluid equations.

Damping in large amplitude plasma waves 5 . 1 . 1 .
Dataset from kinetic simulations We train our model against simulations collected from a Vlasov-Boltzmann equation solver.The equations solved by the fullykinetic solver are

Figure 3 :
Figure 3: The training and validation loss is plotted against the number of epochs.

5. 1 . 3 .
Assessment We use a 90%/10% training and validation split.In fig.3, we show the mean squared error over the training epochs for training and validation.

Figure 4 :
Figure 4: (a) A small amplitude simulation.In this simulation, δ is not needed to modify the damping rate because the system exhibits linear dynamics.Accordingly, (b) shows that δ remains small.(c) A large amplitude simulation with non-linear Landau damping.δ is needed to modify the damping rate.The linear fluid model is incapable of reproducing the desired physics.To recover the non-linear effects, the simulation leverages δ and (d) shows that δ ≈ 7.

Figure 5 :
Figure 5: (a) shows the ground-truth density profile over space and time for a large amplitude excitation.(b) shows that using a local model for Landau damping is insufficient in modeling this excitation.(c) shows the result of a simulation that uses machine-learned δ(t, x) model which shows better agreement and reproduction of the non-local damping.(d) shows that δ grows to 3 in the case of this large amplitude perturbation to account for the non-locality.It is important to note that the scale length over which the model is trained in fig. 4 is significantly different than the scale length to which the model is applied here.

Figure 6 :
Figure 6: (a) shows the ground-truth density profile over space and time for a small amplitude excitation.In contrast to fig. 5, (b) and (c) agree well here because at small amplitudes, the damping remains local and the non-local model is expected to converge to the local result.Correspondingly, (d) shows that δ remains small.