Stokesian processes: inferring Stokes flows using physics-informed Gaussian processes

We develop a probabilistic Stokes flow framework, using physics-informed Gaussian processes, which can be used to solve both forward and inverse flow problems with missing and/or noisy data. The physics of the problem, specified by the Stokes and continuity equations, is exactly encoded into the inference framework. Crucially, this means that we do not need to explicitly solve the Poisson equation for the pressure field, as a physically meaningful (divergence-free) velocity field is automatically selected. We test our method on a simple pressure-driven flow problem, i.e. flow through a sinusoidal channel, and compare against standard numerical methods (finite element and direct numerical simulations). We obtain excellent agreement, even when solving inverse problems given only sub-sampled velocity data on low-dimensional sub-spaces (i.e. one component of the velocity on 1D domains to reconstruct 2D flows). The proposed method will be a valuable tool for analyzing experimental data, where noisy and/or missing data are the norm.


Introduction
Stokes flow, characteristic of flows at small length scales (e.g. biological flows and flows in porous media) and/or of large molecular weight polymeric fluids, is ubiquitous in the life and applied sciences. There exist general and well-established numerical methods for solving such problems, such as boundary element methods (BEM) [1, 2] and finite element methods (FEM) [3]. However, due to their computational cost, these methods are not particularly well suited to problems with complex/moving boundaries. For such cases, e.g. for simulating the dynamics of dense particle dispersions in a Newtonian fluid in the small Reynolds number limit, the Stokesian dynamics (SD) method provides the current state-of-the-art [4]. Unfortunately, SD does not generalize to non-Newtonian host fluids. Alternatively, one can employ direct numerical simulations (DNS), and solve the Navier-Stokes equation. However, for low-Reynolds-number flows, this necessitates the use of increasingly smaller time-steps, which dramatically increases the calculation cost. Furthermore, when analyzing experimental data, where only noisy and/or partial information is available, and boundary conditions are not always fully specified, these standard methods are incredibly cumbersome to use. In other words, these methods are well suited for solving forward problems, but they are generally unsuited to inverse problems. For such cases, physics-informed machine-learning (PIML) methods have shown great promise.
Indeed, interest in, and applications of, PIML have seen dramatic growth in the past five years, in particular within the fluid dynamics community, where experiments and simulations generate a substantial amount of data that is not trivial to analyze. Most work, however, has focused on developing physics-informed neural networks (PINNs) for high-Reynolds-number (turbulent) flows [5-8]. The reason for using neural networks (NNs) lies in their ability to approximate solutions to (non-linear) PDEs [9, 10]. Example PINN applications in fluid mechanics include inferring the velocity and pressure fields from flow measurements [10, 11], learning solutions for potential flow past a cylinder, the Taylor-Green vortex, or thermal convection [12], and generating solutions to the steady Navier-Stokes equations [13]. It is also possible to learn non-linear operators directly, leading to the development of physics-informed neural operators (PINOs), which are more flexible than PINNs and can dramatically increase their performance [14-18]. However, with both approaches (PINN or PINO) the physics is typically included by defining a custom loss function, which should vanish (after training) if the solution encoded by the NN satisfies the underlying equations and constraints (e.g. Navier-Stokes, incompressibility, boundary conditions). As such, the physics is only satisfied approximately. The possibility of building custom NN architectures that exactly satisfy the underlying equations by construction remains out of reach at present. Furthermore, while the use of Bayesian NNs makes it possible to incorporate uncertainty estimates [19, 20], it is not always clear how to specify an appropriate prior, and they come with an extremely high computational cost.
Gaussian processes (GPs) offer an alternative probabilistic machine-learning framework to NNs [21-23]. GPs provide a probability distribution over functions, making it trivial to incorporate noisy and/or missing data. In addition, they are closed under linear transformations, which means that they can be used to perform exact inference for the solution of linear differential equations. They have been used, e.g. to learn the constitutive relations of polymer melts [24-28], to discover partial differential equations from noisy measurements (i.e. infer the coefficients of a given equation) [29], and to learn the Hamiltonians of conservative or dissipative systems from noisy and sparse data [30], among other applications. Here, we use physics-informed GPs to establish a generic framework for solving forward/inverse Stokes flow problems. The fact that the Stokes equation is linear makes it ideal for GPs, as we can leverage all their benefits without requiring any approximations [23]. In contrast, if we were to solve the Navier-Stokes equation, we would need to linearize the advection term [31]. Here, we use a GP regression framework in which the governing equations are directly encoded into the kernel functions. In this way, we are able (on average) to exactly satisfy the physics of the problem, here specified by the Stokes (force-balance) equation and the continuity equation. We are able to trivially solve both direct and inverse problems, e.g. solving for the flow given the boundary conditions, or reconstructing the flow given partial measurements of the velocity and possibly the pressure.
The use of physics-informed GPs to solve differential equations has seen many notable developments. Among them are the works of Raissi, Perdikaris, Karniadakis and colleagues. In one of their earlier works, they showed how to use GPs to learn the solutions of linear differential equations using noisy measurements [32]. They demonstrated their method on a fractional differential equation, an integro-differential equation, and a reaction-diffusion equation, all in 1D. Further work considered how to use noisy multi-fidelity data to solve such linear equations [33, 34], as well as extensions to non-linear differential equations, where it becomes necessary to linearize the underlying operators [29, 31]. Recently, Pförtner et al presented a detailed examination of GP regression as a general framework for solving linear PDEs; in particular, they provide a rigorous proof of the equivalence of GPs with weighted residual methods (e.g. finite element, spectral, and generalized Galerkin methods) [35]. To the best of our knowledge, no attempt has been made to develop GPs as a generic Stokes flow solver. A related work is that of Michelén Ströfer et al, who used GPs to infer the eddy viscosity from solutions of the Reynolds-averaged Navier-Stokes equation [36]. Their setup, while similar to ours, requires both velocity and pressure measurements. In contrast, by encoding both the Stokes and continuity equations into our inference, we show that it is possible to reconstruct a flow field given only partial velocity information (the appropriate pressure field is simultaneously inferred).

GP
A GP defines 'a collection of random variables, any finite number of which have a joint Gaussian distribution' [21]. Given that they are universal function approximators, they provide a convenient Bayesian framework for making inferences on functions, e.g. solving regression or classification problems. A GP for a function f(x), conditioned on input locations X = [x_1, x_2, ..., x_N], provides the joint distribution for the function values f(X) = [f(x_1), f(x_2), ..., f(x_N)] in the form of a multi-variate Gaussian [21, 22]

  p(f | X, θ, I) = N(f | µ, K),  (1)

with µ and k the average and kernel functions, which together completely specify the model

  µ(x) = ⟨f(x)⟩,  k(x, x′; θ) = ⟨δf(x) δf(x′)⟩,  (2)

where δf = f − ⟨f⟩, and θ are the kernel hyperparameters (to be inferred from the data). In what follows, we will usually drop the explicit X and θ dependence and use the shorthand notation f ∼ N(µ, K) to denote a GP, where

  µ_i = µ(x_i),  K_ij = k(x_i, x_j; θ).  (3)

Without loss of generality, we can always take the average function µ to be zero [21]. We will use the square exponential kernel

  k(x, x′) = γ² exp[−(x − x′)²/(2l²)]  (4)

as the basic 1D kernel from which all other (composite) kernels are derived. Note that this kernel function contains only two hyperparameters, l and γ, controlling the length-scale and amplitude of the correlations, respectively. Kernels for higher-dimensional spaces can be obtained from a direct sum or product of lower-dimensional kernels [21]. For example, for the 2D spaces considered here we have

  k_+(x, x′) = k(x, x′) + k(y, y′),  (5)
  k_×(x, x′) = k(x, x′) k(y, y′),  (6)

where x = (x_1, x_2) = (x, y). Unless otherwise stated, all our calculations are obtained using the product kernel construction k_×, with a squared exponential along each dimension. For these (2D product) correlation functions there are two separate length-scale hyperparameters l_x and l_y, but only a single amplitude hyperparameter γ. We note that more sophisticated kernels can be used, e.g. kernels defined using the Ehrenpreis-Palamodov fundamental principle [37] or using deep kernel learning [38, 39]. GPs allow us to 'learn' a function by inferring its values at unknown 'test' positions X⋆, given known values f(X) at 'training' positions X.
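As a concrete illustration, the following is a minimal sketch of the squared exponential kernel of equation (4) and the product construction of equation (6), written in Python+JAX (the language of our implementation). The function names and the vmap-based Gram-matrix helper are our own, not part of the published code.

```python
import jax.numpy as jnp
from jax import vmap

def k_se(x, xp, gamma, l):
    # 1D squared exponential: k(x, x') = gamma^2 exp(-(x - x')^2 / (2 l^2))
    return gamma**2 * jnp.exp(-0.5 * ((x - xp) / l) ** 2)

def k_prod(x, xp, gamma, lx, ly):
    # 2D product kernel k_x(x, x') = k(x1, x1') k(x2, x2'); the amplitude is
    # applied once, giving a single gamma but two length scales (lx, ly)
    return k_se(x[0], xp[0], gamma, lx) * k_se(x[1], xp[1], 1.0, ly)

def gram(kfun, X, Xp, *theta):
    # Covariance matrix K[i, j] = kfun(X[i], Xp[j], *theta)
    return vmap(lambda a: vmap(lambda b: kfun(a, b, *theta))(Xp))(X)
```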
Formally, GPs can be used to exactly compute the posterior distribution for f⋆ = f(X⋆), given f = f(X). Consider the joint GP containing both the training and test datasets,

  (f, f⋆) ∼ N(0, [[K, Qᵀ], [Q, K⋆⋆]]),  (7)

with K = K(X, X), Q = K(X⋆, X), and K⋆⋆ = K(X⋆, X⋆); the distribution for f⋆, conditioned on f, is another GP given by [21, 22]

  f⋆ | f ∼ N(Q K⁻¹ f, K⋆⋆ − Q K⁻¹ Qᵀ),  (8)

where (·)ᵀ denotes a matrix transpose. This inference can also be performed in the presence of noisy data. Assuming a normally distributed measurement error in f, this is accomplished by adding a diagonal term σ_i² to the corresponding correlation matrix K. This additional variance can either be a known quantity (e.g. experimentally measured) or an additional unknown hyperparameter to be inferred from the data. GPs are closed under linear transformations: if f ∼ N(µ_f, K_ff) and g ∼ N(µ_g, K_gg) (assumed independent), then for any linear operators A and B, and any function c, the following linear combination results in another GP [40]

  Af + Bg + c ∼ N(Aµ_f + Bµ_g + c, A K_ff Aᵀ + B K_gg Bᵀ).  (9)

This linearity allows us to perform exact inference for Stokes flow problems. Finally, the 'learning' for a GP is performed by finding the 'optimal' hyperparameters, given the training data. This is defined in terms of the posterior distribution [21, 41, 42]

  p(θ | f, X, I) ∝ p(f | X, θ, I) p(θ | I),  (10)

where I denotes all prior information and model hypotheses (e.g. average and kernel functions), p(f | X, θ, I) is the GP likelihood (given by equation (1)), and p(θ | I) is the prior for θ. A full Bayesian solution would average the model predictions over this distribution for the model hyperparameters; for simplicity, we instead use the maximum a posteriori estimate

  θ⋆ = argmax_θ p(θ | f, X, I).  (11)

In practice, we obtain the optimal hyperparameters θ⋆ using a Nelder-Mead downhill simplex method (500 iterations). In addition, we have added a small artificial noise ϵ = 10⁻³ to the training data covariance matrix, in order to improve the numerical stability of the matrix inversion, i.e. K → K + ϵ²I [21].
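The conditioning rule of equation (8), including the jitter term K → K + ϵ²I, amounts to a few lines of linear algebra. A minimal sketch (our own helper, using a Cholesky factorization, as our implementation does):

```python
import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve

def gp_posterior(K, Q, K_ss, f, eps=1e-3):
    # Posterior mean and covariance of f* | f (equation (8)), with jitter
    # eps^2 I added to the training covariance for numerical stability
    c = cho_factor(K + eps**2 * jnp.eye(K.shape[0]), lower=True)
    mean = Q @ cho_solve(c, f)            # Q K^{-1} f
    cov = K_ss - Q @ cho_solve(c, Q.T)    # K** - Q K^{-1} Q^T
    return mean, cov
```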

Stokesian processes (SPs)
The Stokes equations for an incompressible fluid are given by the force/momentum balance and continuity equations [43-45]

  −∇p + η∇²u + f = 0,  (12)
  ∇·u = 0,  (13)

with p the pressure, u the velocity, f the body forces, and η the fluid viscosity. For what follows it is convenient to express these equations in component form as

  f_α = D_α p − η L u_α,  (14)
  s = D_α u_α = 0,  (15)

where D_α and L are the partial derivative and Laplacian operators, respectively, and the Einstein summation convention is employed. For a standard flow problem, with known boundary values for the velocity and pressure, as well as the body forces, the appropriate GP prior, containing all relevant fields/variables, is given by

  (u⋆, p⋆, u, p, f, s) ∼ N(0, [[K⋆⋆, Qᵀ], [Q, K]]),  (16)

where u⋆ and p⋆ denote the values of the velocity and pressure that are to be inferred (i.e. the test points), while u, p, f, and s denote the values of the velocity, pressure, body force, and velocity divergence at the training points (see figure 1). The correlation matrices can be written as, with w⋆ = (u⋆, p⋆) and w = (u, p, f, s),

  K⋆⋆ = ⟨δw⋆ δw⋆ᵀ⟩,  (17)
  Q = ⟨δw⋆ δwᵀ⟩,  (18)
  K = ⟨δw δwᵀ⟩.  (19)

Depending on the dimensionality and the problem at hand, the vectors/matrices used to define the GP of equation (16) will have a block-like structure. Specifically, for the 2D flow problems considered here, we specify the components of the velocity, u_1 and u_2, at their respective sets of training points. We can use different training points for the different velocity components, but unless stated otherwise, we will use the same training and test locations for all components of vector-valued functions. Furthermore, we note that the correlation functions required to compute K⋆⋆ and Q are exactly the same as those used for K (i.e. u and u⋆ refer to the same velocity function), the only difference being the positions at which the correlations are evaluated (training points X or test points X⋆). In principle, we would require a separate kernel function to specify each pair of correlations. However, as the different variables are all linearly related through the Stokes and continuity equations, we can express all correlation functions in terms of only d(d + 1)/2 velocity-velocity correlations k_{u_α u_β}, d velocity-pressure correlations k_{u_α p}, and a single pressure-pressure correlation function k_{pp} (d the spatial dimension). These derived correlation functions are given by

  k_{u_α f_β} = D′_β k_{u_α p} − η L′ k_{u_α u_β},  (20)
  k_{p f_β} = D′_β k_{pp} − η L′ k_{p u_β},  (21)
  k_{f_α f_β} = D_α D′_β k_{pp} − η D_α L′ k_{p u_β} − η L D′_β k_{u_α p} + η² L L′ k_{u_α u_β},  (22)
  k_{u_α s} = D′_β k_{u_α u_β},  (23)
  k_{p s} = D′_β k_{p u_β},  (24)
  k_{f_α s} = D_α D′_β k_{p u_β} − η L D′_β k_{u_α u_β},  (25)
  k_{ss} = D_α D′_β k_{u_α u_β},  (26)

where a prime (′) indicates that the operator should act on the second argument of the kernel function, i.e. (D′_α k)(x, y) = ∂k(x, y)/∂y_α. The basic k_{u_α u_β}, k_{u_α p}, and k_{pp} kernel functions, which are used to define all other field correlations (equations (20)-(26)), are defined as products of 1D square exponential functions, as given in equations (4) and (6). The derived correlation functions (i.e. those involving the governing equations, f_α and/or s) are not square exponentials. We note that it is possible to reduce the complexity of the optimization problem by considering independent Gaussian process priors for the velocity and pressure variables [29, 31], e.g. by setting the velocity-pressure cross-correlations to zero, k_{u_x p} = k_{u_y p} = 0. In fact, preliminary results show that this type of construction can improve the convergence of the optimization without significantly degrading the accuracy of the predictions; however, we will not consider such approximations in this work.
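Since the derived correlations (20)-(26) are obtained by applying D′_β and L′ to the base kernels, they can be generated mechanically with automatic differentiation rather than by hand. A hedged sketch of this idea follows; the helper names are ours, and the base kernels are assumed to be two-argument functions with their hyperparameters already bound (e.g. via functools.partial on k_prod above).

```python
import jax
import jax.numpy as jnp

def D_prime(k, beta):
    # (D'_beta k)(x, x') = d k(x, x') / d x'_beta
    return lambda x, xp: jax.grad(k, argnums=1)(x, xp)[beta]

def L_prime(k):
    # (L' k)(x, x') = trace of the Hessian with respect to the second argument
    return lambda x, xp: jnp.trace(jax.hessian(k, argnums=1)(x, xp))

def k_us(k_uu_row):
    # Equation (23): k_{u_alpha s} = sum_beta D'_beta k_{u_alpha u_beta},
    # with k_uu_row = [k_{u_alpha u_x}, k_{u_alpha u_y}]
    return lambda x, xp: sum(D_prime(k_uu_row[b], b)(x, xp)
                             for b in range(len(k_uu_row)))

def k_uf(k_up, k_uu, beta, eta=1.0):
    # Equation (20): k_{u_alpha f_beta} = D'_beta k_{u_alpha p}
    #                                     - eta L' k_{u_alpha u_beta}
    return lambda x, xp: (D_prime(k_up, beta)(x, xp)
                          - eta * L_prime(k_uu)(x, xp))
```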
Finally, the inferred solution to this generic Stokes flow problem is given by equation (8), which now takes the form

  (u⋆, p⋆) | (u, p, f, s) ∼ N(Q K⁻¹ w, K⋆⋆ − Q K⁻¹ Qᵀ),  w = (u, p, f, s),  (27)

where f = s = 0, and u and p are the known values of the velocity and pressure at the given training points (e.g. the boundary values). The optimal hyperparameters θ⋆ are obtained by maximizing the corresponding log-posterior distribution, see equation (10), which is given as

  log p(θ | u, p, f, s, X, I) = log p(u, p, f, s | X, θ, I) + log p(θ | I) + const.  (28)

By definition, the first term on the right-hand side of equation (28) is the log-likelihood of the GP prior N(0, K); the second term is the prior for the N_θ = 3 (d(d + 1)/2 + d + 1) = 18 hyperparameters (2 × 6 correlation length-scales and six amplitude parameters). For this last term we use independent Jeffreys priors [41, 46]

  log p(θ | I) = −Σ_i log θ_i + const.  (29)

This probabilistic formulation of the Stokes flow problem allows us to trivially handle noisy and/or missing data, as well as mixed boundary conditions. In particular, we are not required to specify the values of the velocity and pressure fields; we can also specify their derivatives, or the difference between their values at different positions. The latter is particularly useful to enforce periodic boundary conditions, or a pressure difference ∆p across the channel, both of which will be used in what follows (see appendix B).
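For concreteness, the hyperparameter optimization can be sketched as follows; build_K (assembling the full training covariance from the kernels) and y_train are hypothetical stand-ins for the assembly step, and the log-θ parametrization is our own choice for keeping the hyperparameters positive while searching.

```python
import jax.numpy as jnp
from jax.scipy.linalg import cho_factor, cho_solve
from scipy.optimize import minimize

def neg_log_posterior(log_theta, y, build_K, eps=1e-3):
    # Negative of equation (28): Gaussian log-likelihood of N(0, K) plus
    # independent Jeffreys priors, -log p(theta|I) = sum_i log theta_i
    theta = jnp.exp(log_theta)
    K = build_K(theta) + eps**2 * jnp.eye(y.size)
    c, low = cho_factor(K, lower=True)
    nll = (0.5 * y @ cho_solve((c, low), y)
           + jnp.sum(jnp.log(jnp.diag(c)))        # 0.5 log det K
           + 0.5 * y.size * jnp.log(2.0 * jnp.pi))
    return float(nll + jnp.sum(log_theta))

# MAP estimate via 500 Nelder-Mead iterations, as described in the text
res = minimize(neg_log_posterior, x0=jnp.zeros(18), args=(y_train, build_K),
               method="Nelder-Mead", options={"maxiter": 500})
```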

Setup: pressure driven flow through a sinusoidal channel
To test our proposed SP framework, we consider the inference of a pressure-driven flow through a sinusoidal channel in 2D (see figure 2). The channel is symmetric with respect to the y = 0 axis, with walls at y = ±w(x)/2; the width w is a function of position x, given by

  w(x) = ⟨w⟩ + 2a sin(2πx/L).  (30)

The channel is characterized by its oscillation amplitude a and the dimensionless parameter ϵ, defined as the ratio of the mean width ⟨w⟩ and the wavelength L,

  ϵ = ⟨w⟩/L.  (31)

While the exact solution is not known, approximate analytical solutions have been obtained as a series expansion in ϵ [47, 48]. To fully specify the problem, the following boundary conditions are imposed: no-flux and no-slip at the walls,

  u(x, ±w(x)/2) = 0,  (32)

with periodic boundary conditions for the velocity u, together with a constant pressure difference ∆p between the inlet (x = 0) and outlet (x = L),

  u(0, y) = u(L, y),  p(0, y) − p(L, y) = −∆p.  (33)

These can be re-expressed using the shifted difference operators (see appendix B) as (S₁ᴸ u_α)(0, y) = 0 and (S₁ᴸ p)(0, y) = −∆p.
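To make the setup concrete, a short sketch of the geometry and of how the shifted-difference conditions enter at the kernel level is given below. The phase of the wall profile is our assumption (chosen so that the maximum width falls at x = L/4, consistent with figure 2 and the cylinder placement in the flow-past-a-cylinder problem); the helper names are ours.

```python
import jax.numpy as jnp

L, a, w_mean = 2.5, 0.2, 1.0   # non-dimensional channel parameters

def width(x):
    # Sinusoidal width profile, equation (30); walls sit at y = +/- w(x)/2
    return w_mean + 2.0 * a * jnp.sin(2.0 * jnp.pi * x / L)

def wall_points(n):
    # No-slip training locations on the upper and lower walls
    x = jnp.linspace(0.0, L, n)
    return jnp.concatenate([jnp.stack([x,  0.5 * width(x)], axis=1),
                            jnp.stack([x, -0.5 * width(x)], axis=1)])

def shift_left(k, R):
    # Correlation of a shifted difference with an unshifted field (appendix B):
    # k_{Sa,b}(x, x') = k(x, x') - k(x + R, x')
    return lambda x, xp: k(x, xp) - k(x + jnp.asarray(R), xp)

# Periodicity along x is then imposed by training the shifted variable to
# zero, e.g. (S_1^L u_alpha)(0, y) = 0, using shift_left(k_uu, (L, 0.0)),
# where k_uu is a (hypothetical) bound two-argument velocity kernel.
```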
We will compare our inferred SP results with the 4th-order approximate solution [47, 48]. To aid in the comparison, all quantities are non-dimensionalized using as characteristic units the viscosity η, the average channel width ⟨w⟩, and the flow rate Q for the corresponding flat parallel-plate flow problem, given by the Poiseuille equation [43]

  Q = −∆p⟨w⟩³/(12ηL).  (34)

The characteristic units for length, velocity, pressure, and body force are then ⟨w⟩, Q/⟨w⟩, ηQ/⟨w⟩², and ηQ/⟨w⟩³, respectively. All results shown here have been non-dimensionalized using these characteristic units, for η = ⟨w⟩ = Q = 1. We consider the flow problem specified by the (non-dimensional) values L = 2.5, a = 0.2, and ∆p = −30. Furthermore, for our calculations, unless otherwise stated, we have redefined the pressure as p → p + x∆p/L, such that the corresponding term in the Stokes equation becomes −∇p → −∇(p + x∆p/L) = −∇p − e_x∆p/L (e_x the unit vector along the x-axis). In this way, the pressure drop driving the flow can be considered as a constant external driving force, i.e. f = −e_x∆p/L, with the (redefined) p now periodic along x by construction, i.e. (S₁ᴸ p)(0, y) = 0.

Flow inference given full information
We first consider the direct problem, i.e. that of inferring the solution to the completely specified flow problem. We compare the SP solution against the corresponding FEM and DNS results, as well as the approximate analytical solution. Given that the FEM directly solves the Stokes problem, we consider it to be the exact/reference solution, and use it when evaluating the error of our SP prediction.
For the SP solution, we use as training points the stick boundary conditions u = 0 for the velocity at the walls, the periodic boundary conditions along the x-direction for the velocity and pressure, i.e. (S₁ᴸ u)(0, y) = (S₁ᴸ p)(0, y) = 0, and the force balance f = −(∆p/L) e_x and solenoidal condition s = 0 inside the domain. Thus, the appropriate SP prior is of the form

  (u⋆, p⋆, u_b, S₁ᴸ u, S₁ᴸ p, f, s) ∼ N(0, [[K⋆⋆, Qᵀ], [Q, K]]),  (35)

with the relevant posterior SP given by

  (u⋆, p⋆) | (u_b = 0, S₁ᴸ u = 0, S₁ᴸ p = 0, f = −(∆p/L) e_x, s = 0) ∼ N(µ⋆, Σ⋆).  (36)

For simplicity, we have not written out the explicit form of the average and covariance; they are given by equation (8). In contrast to the basic SP formulation of equations (16) and (27), here we assume no knowledge of the pressure; only the pressure gap ∆p between the inlet/outlet is used, which we introduce as a constant forcing term f. In addition, we train on the periodicity of the velocity u and pressure p. We have used a total of N_train = 1180 training points: 169 to specify the boundary values (stick boundary conditions at the walls and periodic boundary conditions for the velocity and the pressure) and 1011 for the governing equations (force balance and continuity). The training points were chosen to uniformly sample the corresponding computational domain, shown in figure 3. Likewise, for the test/inference points, we use a uniform grid of N_test = 5184 points to infer the solution.
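A sketch of how the training targets and the block covariance are assembled for this fully specified problem is shown below, using the point counts of figure 3 and the gram helper sketched earlier; blocks is a hypothetical table mapping variable pairs, e.g. ('u_x', 's'), to the corresponding (derived) correlation functions.

```python
import jax.numpy as jnp

dp, Lch = -30.0, 2.5
n_wall, n_per, n_bulk = 62, 15, 337   # point counts of figure 3

y_train = jnp.concatenate([
    jnp.zeros(2 * n_wall),          # u = 0 at the walls (x and y components)
    jnp.zeros(2 * n_per),           # (S u)(0, y) = 0: periodic velocity
    jnp.zeros(n_per),               # (S p)(0, y) = 0: periodic pressure
    jnp.full(n_bulk, -dp / Lch),    # f_x = -dp/L = 12 in the bulk
    jnp.zeros(n_bulk),              # f_y = 0
    jnp.zeros(n_bulk),              # s = div u = 0
])

def assemble_K(blocks, variables, points, theta):
    # Stack the pairwise Gram matrices into the full training covariance
    rows = [[gram(blocks[(a, b)], points[a], points[b], *theta)
             for b in variables] for a in variables]
    return jnp.block(rows)
```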
For the training, we perform a grid-based search over the initial hyperparameters. Let γ_1 be the initial guess for the variance scale of the u-u and p-p correlations, γ_2 the initial guess for the u-p correlations, and l the initial guess for the length-scale hyperparameter of all correlations. We train/optimize the system over a grid (γ_1, log l) of initial hyperparameter values, defined over 0 ⩽ γ_1 ⩽ 1.4 and −2.8 ⩽ log l ⩽ −0.6 (grid spacing of 0.2), with γ_2 = 0. That is, for each set of initial values (γ_1, γ_2, log l), we perform a full optimization run over the 18 hyperparameters. For this direct problem, we have chosen the results that yield the maximum log-posterior. We use the corresponding initial hyperparameters (γ_1 = 1.2, γ_2 = 0, log l = −1.2) as initial values for all subsequent SP inferences on this flow geometry (we have not reused the final optimized values).
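The grid-restart procedure amounts to the following loop; pack_theta is a hypothetical helper expanding (γ_1, γ_2, l) into the full 18-component hyperparameter vector, and neg_log_posterior_theta is a (hypothetical) θ-space variant of the objective sketched earlier.

```python
import itertools
import numpy as np
from scipy.optimize import minimize

# Each grid point (gamma_1, log l), with gamma_2 = 0, seeds a full
# Nelder-Mead run; the run with the best (maximum) log-posterior is kept.
best = None
for g1, logl in itertools.product(np.arange(0.0, 1.4 + 1e-9, 0.2),
                                  np.arange(-2.8, -0.6 + 1e-9, 0.2)):
    theta0 = pack_theta(g1, 0.0, np.exp(logl))
    res = minimize(neg_log_posterior_theta, theta0, args=(y_train, build_K),
                   method="Nelder-Mead", options={"maxiter": 500})
    if best is None or res.fun < best.fun:
        best = res
```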
The FEM solution is obtained using the finite element computational software FEniCS [49, 50], an open-source computing platform for solving partial differential equations. The number of mesh vertices was 3893, the number of elements was 7204, and the number of iterations required by the FEM solver was 10 000. Finally, the DNS solution of the full Navier-Stokes equation is obtained using the smoothed profile method [51], for a sufficiently small Reynolds number Re ≃ 10⁻². In this case, we use a rectangular simulation box, discretized on a square grid of dimensions 256 × 160 (40 960 points), for a lattice spacing of ∆x = 9.76 × 10⁻³. The Navier-Stokes equation is solved using a pseudo-spectral method in space, and a first-order exponential time-differencing method in time [51]. A time-step of ∆t = 1.93 × 10⁻⁶ was used, determined by the momentum-diffusion time-scale and chosen to minimize the numerical error [51, 52]. The total number of steps was 1.2 × 10⁵, at which point the system is in the steady state.
Results obtained using the SP, as well as the reference FEM solution, and the corresponding relative and absolute errors, are given in figure 4. We obtain excellent agreement with the FEM results, with a maximum absolute error of |u_α^FEM − u_α^SP| ≃ 10⁻², which is seen in the fast-flow regions, where the channel width is smallest (u_x ≃ 1). The relative errors can become very large in regions where the velocity goes to zero, particularly for u_y, but are on average 0.54% for u_x and 4.1% for u_y. For a more convenient and direct comparison, we have plotted the corresponding one-dimensional velocity profiles at four characteristic positions along the channel, see figure 5. Here we also compare against the DNS calculations and the semi-exact solution. All three numerical results, FEM, DNS, and SP, yield almost indistinguishable solutions, at least on the scale of the plot. The small deviations of the DNS solution are due to the smearing out of the fluid-wall interface inherent to the smoothed profile method (essential for performing efficient simulations in the presence of moving boundaries). The semi-exact solution, obtained as a series expansion to fourth order in ϵ, shows significant deviations, particularly for the y component of the velocity at points of high curvature in the wall boundaries.
Finally, we note that the accuracy of the solution is directly related to the number of training points used in the inference, as evidenced by figure 6, which shows the results obtained using roughly one-half and one-third of the (base) high-resolution data of N_train = 1180. The regions of high/low accuracy are similar among the three resolutions, but the magnitude of the error steadily decreases as the number of training points is increased. Together, these results clearly demonstrate that SPs are able to accurately predict Stokes flows, even under complex boundary conditions.

Flow inference from sparse samples
Next, to show that the SP framework is able to reconstruct the flow given sparsely sampled training data, we infer the solution given values of the fluid velocity at 40 randomly chosen points inside the channel (obtained from the reference FEM solution), without specifying the stick boundary conditions at the walls (only the periodicity of the solution is used). To enforce the pressure periodic boundary conditions and the governing equations, we use exactly the same training points as above, illustrated in figure 3. Thus, the SP posterior is again given by equation (36). The training/optimization of the kernel hyperparameters is carried out over 500 steps, using the procedure described above. The (negative) log-posterior, refer to equation (11), is plotted in figure 7 as a function of training steps. For steps ≳ 300, while the loss is still decreasing, the rate of change is considerably smaller compared to early times. More importantly, we clearly observe that the SP predictions are not very sensitive to the change in hyperparameters. We have observed similar behavior for all other problems considered here. In contrast, PINNs show a strong dependence on the hyperparameters, and require considerably more iteration steps to arrive at a physically meaningful solution (see appendix C). The reconstructed flow is given in figure 8, together with the relative and absolute error, and the SP prediction uncertainty. Near the velocity training points, the prediction error is of the same order of magnitude as that obtained from the inference using the full set of boundary conditions (figure 4(d)). As expected, this prediction error increases as one moves away from the training points. However, even for such a sparsely sampled solution, we obtain good overall agreement, i.e. the correct flow profile is recovered, as shown by comparing the velocity contour lines. Furthermore, we obtain good correspondence between the error of the solution and the SP prediction uncertainty (provided by the posterior covariance matrix of equation (36)). This is further evidence of the robustness of the proposed approach, since the solution error can only be computed if the exact solution is already known. Here, we see that the SP uncertainty can be effectively used as an estimate of the reliability of the inference, and likewise for identifying regions where the density of training points should be increased. This ability to transparently handle uncertainties, in both the training and the prediction, is one of the main benefits of our Gaussian-process-based method over conventional deep-learning approaches (see appendix C for a comparison with a PINN solution).
To verify that the inference indeed satisfies the physics of the problem, as it should since the Stokes and continuity equations have been directly encoded into the correlation matrices, we have plotted the inference of f (force balance) and s (continuity) in figure 9. As expected, at the training points the governing equations are exactly satisfied (on average). Given the relatively dense distribution of training points for f and s, this results in excellent agreement throughout the domain, at least away from the boundaries. This can be easily seen by plotting the inferred values as a function of the height along the channel, as in figure 10. The predicted (average) values coincide with the exact solution at the training points, with small-amplitude oscillations between points, which increase sharply at the walls. In conclusion, the SP provides an excellent interpolation of the governing equations, but the accuracy of the inference near the boundary is slightly reduced, as no training points were included at the wall. Finally, we consider how the error varies with the number of randomly sampled velocity training points, for a fixed number of training points on the governing equations and the periodic boundary conditions. For the case considered above, using 40 velocity training points, the mean absolute errors between the SP prediction and the FEM solution were 7.9 × 10⁻³ for u_x and 4.3 × 10⁻³ for u_y. The corresponding mean relative errors were 1.1 × 10⁻¹ for u_x and 4.8 × 10⁻¹ for u_y. Figure 11 shows the dependence of the average and maximum absolute error of u on the number of velocity training points. For ≲ 50 training points, the error noticeably decreases as the number of training points is increased. Beyond this, the decay in the error becomes much weaker, with that of u_y roughly constant at ≃ 10⁻². The difference in the error between u_x and u_y is likely due to the difference in the velocity profiles, as u_y shows a more complex structure, with both positive and negative flow regions, whereas u_x is strictly positive. An inspection of the error maps shows that the error is highly localized near the boundaries/walls, at locations with no training points. Understanding the convergence properties of such a probabilistic flow solver, in terms of the number and arrangement of the training points, remains an open question.

Flow inference from partial data: 1C on 1D domains
We now show that the SP can infer the solution to Stokes flow problems given incomplete boundary conditions together with a reduced set of velocity measurements. In particular, we are interested in situations where the velocities are known only on lower-dimensional manifolds, as is typical for particle image velocimetry experiments [53, 54]. For this, we consider an idealized situation where only one component of the velocity is given, along 1D domains. All other training points (i.e. those for the governing equations) are identical to those of figure 3. In particular, the x-component of the velocity is sampled at 8 + 4 = 12 points, along lines of constant x (i.e. vertical slices), corresponding to the points of maximum (x_1 ≃ 0.636) and minimum (x_2 ≃ 1.91) width. Likewise, the y-component of the velocity is sampled at 5 + 2 × 15 + 6 = 41 points, along lines of constant y (i.e. horizontal slices), at four different heights y_1 ≃ −0.573, y_2 ≃ −0.233, y_3 ≃ 0.276, and y_4 ≃ 0.615. Our results are presented in figure 12, showing excellent agreement with the reference solution. As before, we obtain good quantitative agreement between the prediction errors and the prediction uncertainty, showing the robustness of the SP. For reference, the mean relative errors between the SP results and the FEM solution were 14% for u_x and 37% for u_y. As before, high relative errors are concentrated near regions where the velocity tends to zero.
Remarkably, we are also able to effectively infer the location of the boundary from the zero-level contours of the velocity, as shown in figure 13. In contrast to all previous results, computing these velocity contours requires that we infer the velocity within the wall domain. For this, we have used an inference grid defined over the rectangular bounding box, of height equal to the maximum channel width ⟨w⟩ + 2a. We note that the inferred velocity inside the wall domain is not physical. In particular, as no training points are given inside the wall, the SP will use the correlation information learned within the fluid domain to extrapolate into the wall domain. To obtain the correct behavior, it would be necessary to incorporate the rigid-body constraints, as well as to consider more sophisticated (e.g. non-stationary) kernels.

Pressure drop inference: flow past a cylinder
We now consider a more complex flow problem, by introducing a cylinder/disk inside the sinusoidal channel. The radius of the cylinder is R = 0.195⟨w⟩, and it is placed at the point of maximum height, x = L/4. The flow parameters are the same as above, in particular ∆p = −30. The inference problem we solve is the following: given randomly sampled velocities inside the fluid domain, infer the pressure drop that is driving the flow. For this, we use a similar training setup to that of figure 8, except that no information is given regarding the pressure drop; instead, we use f = 0 and remove the periodicity constraint for p, which is now the 'total' pressure (i.e. it contains the hidden pressure drop driving the flow). Thus, the relevant SP prior is

  (u⋆, p⋆, u, S₁ᴸ u, f, s) ∼ N(0, [[K⋆⋆, Qᵀ], [Q, K]]),  (37)

with only the velocity remaining periodic, (S₁ᴸ u_α)(0, y) = 0.
The corresponding posterior follows from equation (8). The training point configuration is given in figure 14. Note that (1) no pressure information is given, and (2) we directly infer the pressure difference across the channel (as a function of height y). We use the reference FEM velocities to train this SP, considering 100, 200, 400, and 800 random velocity points inside the domain. We repeat the grid-based optimization over initial hyperparameters, with γ_1 = 0.4, γ_2 = 0, and log l = −2 providing the maximum log-posterior. The inference results for the pressure drop ∆p(y) = p(L, y) − p(0, y) = −(S₁ᴸ p)(0, y) are shown in figure 15, as relative errors with respect to the true value of ∆p = −30. Within the inner channel, −0.4 ≲ y ≲ 0.4, the relative error is ≲ 5%. The error increases at the walls, but is still only ≃ 10% for all but the lowest-resolution case, where it is ≃ 20%.
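Since the pressure drop is itself a linear functional of the latent fields, it can be inferred directly by treating the shifted pressure (S₁ᴸ p)(0, y) as an extra test variable. A hedged sketch, reusing the helpers above: k_Sp, k_SpSp, K_train, points, variables, and theta_opt are all hypothetical names for the corresponding assembled quantities.

```python
import jax.numpy as jnp

# Test locations (0, y) across the channel inlet
y_test = jnp.linspace(-0.5, 0.5, 64)
X_star = jnp.stack([jnp.zeros_like(y_test), y_test], axis=1)

# Cross-covariances of (S p) with every training variable, built from the
# pressure kernels via shift_left (see the sketch in the setup section)
Q_row = jnp.hstack([gram(k_Sp[b], X_star, points[b], *theta_opt)
                    for b in variables])
K_star = gram(k_SpSp, X_star, X_star, *theta_opt)

mean_Sp, cov_Sp = gp_posterior(K_train, Q_row, K_star, y_train)
dp_of_y = -mean_Sp                          # Delta p(y), cf. figure 15
two_sigma = 2.0 * jnp.sqrt(jnp.diag(cov_Sp))
```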

Discussion and conclusion
We have shown that SPs can infer the solution of 2D Stokes flow problems in complex geometries. In particular, the inference works even if the boundary conditions are not fully specified, e.g. from sparse samples of the velocity data. Crucially, as the physics of the problem is directly encoded into the inference framework, by expressing the correlations given by the Stokes and continuity equations, the pressure is not required for the inference; it is automatically computed. We have shown the accuracy of the method for solving a pressure-driven flow problem in a sinusoidal channel when specifying (1) the velocity boundary conditions, (2) randomly sampled velocity training points inside the domain, and (3) one velocity component along 1D domains (1C1D). In addition, we have also considered the problem of flow past a cylinder (within the sinusoidal channel), and how to infer the pressure drop driving the flow from random velocity measurements. We obtained very good agreement in all cases, with the error decreasing as the number of training points is increased. Furthermore, the physical equations are exactly satisfied (on average) at the training points. This is in stark contrast to conventional PINNs, which include the governing equations in the loss function, and thus can only satisfy the constraints approximately.
In terms of calculation costs, when solving fully specified (forward) problems, standard methods such as FEM are still faster than SPs at this point. However, when considering 3D moving-boundary problems, the required re-meshing for the FEM will dramatically increase the computing time, and our proposed SP, which can use arbitrary meshes, has the potential to outperform the FEM. In any case, the full potential of SPs is realized when tackling inverse problems (e.g. flow reconstruction from noisy data), where standard methods cannot be applied.
Future work will focus on studying the convergence properties of our probabilistic flow solver, in order to understand how the number and arrangement of the training points affect the prediction accuracy. In addition, we will extend the current method to 3D, which will require it to scale to O(10⁶) training points, at least. For this, we expect to replace the currently employed Cholesky decomposition for computing the (inverse) matrix-vector products with the BBMM algorithm [55]. Finally, we will also incorporate moving boundaries into the inference, to consider the dynamics of colloidal dispersions, as well as biological flows. This will allow us to provide a principled (Stokes) flow framework for analyzing experimental data, taking into account the known physical laws, as well as the uncertainties inherent to the experimental measurements.

Appendix A. Computational cost
For the SP inference, the total training cost is ≲ 20 min. Using the same set of test/training points, the time required to generate the PINN solution (2-3 hidden layers with 32-256 neurons per layer), over N_steps = 3 × 10⁵ steps, was ≃ 40 min (see appendix C). The reference FEM and DNS calculations (on a single core) required ≃ 0.5 min and 5 min, respectively.
We note that our custom SP inference code is written in Python+JAX [56], which provides composable automatic differentiation and just-in-time (jit) compilation capabilities for running on GPUs. However, our current implementation is limited in two key aspects. First, we use a Cholesky decomposition to perform the matrix-inverse calculations, which scales as O(n³). Second, the outer-most calculation loop, required to build the SP (physics-informed) kernel, is written as a plain Python 'for' loop, which prevents us from fully jit-compiling the entire code. As future work, we will replace the Cholesky decomposition with the BBMM method, which scales as O(n²), and can be efficiently parallelized on multi-GPU setups, allowing for millions of training points [55]. In addition, we will also rewrite the inference code to be fully JAX-compatible and jittable.

Appendix B. Periodic boundary conditions
To impose periodic boundary conditions on the inferred solutions in particular, or to fix the difference between function values at different points in space in general, it is convenient to introduce the shifted difference operator S, defined as

  (Sᴿ a)(x) = a(x) − a(x + R),  (B1)

where a is an arbitrary field, and R an arbitrary vector in the input domain. Given the linearity of the shift operator, averages µ_a = ⟨a⟩ and correlations k_ab = ⟨δa δb⟩ of these shifted functions are trivially computed,

  µ_{Sᴿa}(x) = µ_a(x) − µ_a(x + R),  (B2)
  k_{Sᴿa, b}(x, x′) = k_ab(x, x′) − k_ab(x + R, x′),  (B3)
  k_{a, Sᴿb}(x, x′) = k_ab(x, x′) − k_ab(x, x′ + R),  (B4)
  k_{Sᴿa, Sᴿb}(x, x′) = k_ab(x, x′) − k_ab(x + R, x′) − k_ab(x, x′ + R) + k_ab(x + R, x′ + R).  (B5)

Periodic boundary conditions along dimension α (with period L) are expressed as (e_α the unit vector along α)

  (S_αᴸ a)(x) ≡ (S^{Le_α} a)(x) = 0.  (B6)

In practice, it is enough to specify the values at the domain boundaries. For the 2D channels considered in this work, periodic along the x dimension (length L), periodicity of the velocity field is imposed as

  (S₁ᴸ u_α)(0, y) = u_α(0, y) − u_α(L, y) = 0.  (B7)

Likewise, to specify a constant pressure drop ∆p, we would have

  (S₁ᴸ p)(0, y) = p(0, y) − p(L, y) = −∆p.  (B8)

Due to the linearity of S, it is trivial to include such boundary conditions in our generic Stokes solver, by adding as additional training points the values of u_b = Su and p_b = Sp at arbitrary points along x = 0. Correlations between these shifted velocities and pressures and all other non-shifted variables are computed using equations (20)-(26) and (B2)-(B5). All correlations are still expressed in terms of the same velocity-velocity, velocity-pressure, and pressure-pressure correlations.

Appendix C. PINN solution
We use a feed-forward neural network with two input units (x, y), three hidden layers of 64 neurons each, and three output units (v_x, v_y, p). A hyperbolic tangent activation function is used for all layers except the output. Automatic differentiation is used to construct the residual functions (e_α) required to build the loss function. A schematic diagram of the network is shown in figure 16. The network parameters are trained using the ADAM algorithm [57] with default parameters α = 10⁻³, β_1 = 0.9, β_2 = 0.999 and ϵ = 10⁻⁸, as implemented in the Optax library [58].
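A minimal sketch of this PINN in JAX+Optax is given below, matching the stated architecture and loss (equal weights, l_2 norm); the initialization scheme and the way the residuals are packed are our own choices, not taken from the paper.

```python
import jax
import jax.numpy as jnp
import optax

def init_params(key, sizes=(2, 64, 64, 64, 3)):
    # (x, y) -> three tanh hidden layers of 64 units -> (v_x, v_y, p)
    keys = jax.random.split(key, len(sizes) - 1)
    return [(jax.random.normal(k, (m, n)) / jnp.sqrt(m), jnp.zeros(n))
            for k, m, n in zip(keys, sizes[:-1], sizes[1:])]

def net(params, xy):
    h = xy
    for W, b in params[:-1]:
        h = jnp.tanh(h @ W + b)
    W, b = params[-1]
    return h @ W + b                    # (v_x, v_y, p)

def residuals(params, xy, f_ext, eta=1.0):
    # Stokes and continuity residuals, built with automatic differentiation:
    # e_alpha = -D_alpha p + eta L u_alpha + f_alpha, plus e_div = div u
    J = jax.jacfwd(lambda z: net(params, z))(xy)    # shape (3, 2)
    H = jax.hessian(lambda z: net(params, z))(xy)   # shape (3, 2, 2)
    lap_u = jnp.trace(H[:2], axis1=1, axis2=2)      # Laplacian of (v_x, v_y)
    mom = -J[2] + eta * lap_u + f_ext
    div = J[0, 0] + J[1, 1]
    return jnp.append(mom, div)

def loss(params, xy_eq, f_ext, xy_data, u_data):
    # Equal weights and an l2 norm, as stated in the text
    r = jax.vmap(lambda z: residuals(params, z, f_ext))(xy_eq)
    u = jax.vmap(lambda z: net(params, z)[:2])(xy_data)
    return jnp.mean(r**2) + jnp.mean((u - u_data)**2)

opt = optax.adam(1e-3)   # defaults: b1=0.9, b2=0.999, eps=1e-8
```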
We train the network for 3 × 10⁵ steps, at which point the loss is ≃ 10⁻³. PINN predictions for the problem of reconstructing the flow field from randomly sampled velocities, using exactly the same set of training/test points as in section 3.3, are shown in figures 17-19. The loss as a function of training step is given in figure 17, together with the u_y predictions at two stages during the training, at 10⁴ steps and 3 × 10⁵ steps. Not surprisingly, predictions obtained using the early-time network parameters are quite poor. This is a direct consequence of the way in which the physics is encoded, and is in stark contrast to the SP results shown in figure 7. For what follows, we will refer only to results obtained using the final parameter values. The velocity field is qualitatively reproduced, but the predictions are clearly inferior to those of the GP-based SP method, see figure 8. More importantly, figure 19 clearly shows the inability of the PINN to satisfy the governing equations, even at the location of the training points (compare figures 10 and 19). All ML calculations (SP and PINN) were performed on the same hardware; however, the PINN took ≃ 20× longer to train compared to the SP. Increasing the number of PINN training steps can improve the predictions, in particular those for the velocity, but the deviations in the governing equations remain.
The loss function we have used is not the only choice, and it is possible that better results could be obtained by tuning the relative weights of the different loss terms (we use equal weights), or by changing the norm (we use an l_2 norm). In fact, the same is true of our choice of kernel for the GP inference. However, given the nature of the learning problem, this is unlikely to resolve the issues seen with the PINNs. In contrast, the SP inference (1) exactly satisfies the governing equations at the training points (within the fixed tolerance ϵ) and (2) allows one to transparently incorporate uncertainties. Furthermore, because of the way the physics is encoded in each of the methods, directly into the structure of the correlation matrices and as constraints in the posterior probability distribution (SP), or indirectly through the loss (PINN), the SP inference is also seen to be faster.

Figure 1. Schematic representation of the training and test points used for the Stokes flow inference. The training points include all known values for the velocity and pressure (e.g. boundary conditions), as well as the constraint points used to impose the physical equations, here the force balance and divergence-free conditions, equations (12) and (13). The resulting posterior distribution can be used to infer the value of the velocity and/or pressure fields at arbitrary positions.

Figure 2. Sinusoidal wall geometry for the pressure-driven flow problem.

Figure 3. The N_train = 1180 training points used to infer the flow between sinusoidal walls. Since we are considering a 2D problem, the training points for the Stokes equation and the velocities are pairs of values for the x and y components. (a) Training points for the governing equations: 337 × 2 points for the Stokes equation, f = −(∆p/L) e_x, and 337 points for the equation of continuity, s = 0. Note that we use the same locations for all points, though this is not required. (b) Training points for the velocity at the wall: 62 × 2 points for the no-slip boundary conditions, u_b = 0. (c), (d) Training points for the periodic boundary conditions specifying the difference between inlet/outlet values: 15 × 2 points for the velocity, (S₁ᴸ u_α)(0, y) = 0, and 15 points for the pressure, (S₁ᴸ p)(0, y) = 0. As before, we use the same locations for specifying both the velocity and pressure boundary conditions. The pressure gap ∆p driving the flow has been included as a constant external force, f_x = −∆p/L = 12, such that the pressure p appearing in the Stokes equation is periodic.

Figure 4. Prediction results for the pressure-driven flow between sinusoidal walls; left (right) panels show the data for the x (y) component of the velocity. (a) The reference FEM solution, (b) the SP inferred solution, (c) contour plots for the FEM (solid) and SP (dashed) solutions, (d) the relative error, and (e) the absolute error. The yellow vertical lines in the top panels indicate the 1D domains over which the u_x (u_y) velocities are plotted in figure 5.

Figure 5. Velocity profiles for (left) u_x and (right) u_y, as a function of height y, at four characteristic locations along x. Results are shown for the SP inference, as well as the (reference) FEM and DNS solutions, together with the fourth-order analytical solution.

Figure 6. (a)-(c) Relative and (d)-(f) absolute error of the SP predictions, with respect to the reference FEM calculations, for three different resolutions N_train. (a), (d) Errors using the base high-resolution data of figure 4, (b), (e) medium-resolution data, and (c), (f) low-resolution data.

Figure 7. Negative log-posterior as a function of training steps, when inferring the flow given 40 randomly chosen training velocities. The insets on the right show the flow predictions for u_y obtained using the hyperparameters at steps 20 and 500. The full solution is given in figure 8, for the final hyperparameter values. The green symbols indicate the location of the training points for the velocity; all other training points (for the governing equations and boundary conditions) are the same as in figure 3.

Figure 8. Prediction results given 40 randomly sampled training velocities. The green symbols indicate the location of these velocity training points; all other training points (for the governing equations and boundary conditions) are the same as in figure 3. (a) SP prediction, (b) SP/FEM velocity contours, (c) relative and (d) absolute error between the SP and FEM solutions, and (e) the SP prediction uncertainty, given by twice the standard deviation (i.e. the width of the 95% highest probability density region).

Figure 9. Deviations in the prediction results for the governing equations, δa = a_infer − a_exact, with respect to the exact solution, given 40 randomly sampled training velocities, corresponding to the data of figure 8. (a), (b) Forcing term in the Stokes equation f, and (c) the continuity equation s = ∇·u. The theoretical solution is f_exact = −e_x∆p/L = (−∆p/L, 0) = (12, 0) and s_exact = 0, from which we see that the relative error (for f_x) is ≲ 1%.

Figure 10. Deviations in the prediction results for the governing equations, δa = a_infer − a_exact, as a function of channel width, given 40 randomly sampled training velocities. Left (right) panels show the results at positions near the point of maximum (minimum) width. (a) Forcing terms in the Stokes equation, δf = f + e_x∆p/L, and (b) the continuity equation, i.e. the divergence of the inferred velocity field, δs = s = ∇·u. The shaded regions represent the ±2σ interval, i.e. the 95% highest probability region. The non-zero uncertainty at the training points (clearly seen in the δs data) is due to the artificial noise ϵ ≃ 10⁻³ added to the training data covariance to improve the numerical stability of the matrix inversion.

Figure 11. Absolute error as a function of the number of velocity training points. We plot both the maximum error and the average over all prediction test points. To ensure reliability, the predictions are repeated 50 times, each using a different set of randomly sampled training points. The symbols (lines) show the corresponding averages; the shaded region indicates the +2σ interval (i.e. the upper bound of the 95% highest probability density region).

Figure 12. Prediction results given one component of the velocity along 1D domains as training points. The vertical/horizontal arrows show the locations of the u_x/u_y training points. (a) SP prediction, (b) SP/FEM velocity contours, (c) relative and (d) absolute error between the SP and FEM solutions, and (e) the SP prediction uncertainty, given by twice the standard deviation (i.e. the width of the 95% highest probability density region).

Figure 13. Zero-level velocity contours used to infer the location of the wall. Dashed (blue) and dotted (red) lines show the u_y = 0 and u_x = 0 contour lines.

Figure 14. The training points used to infer the pressure drop for flow past a cylinder in a sinusoidal channel. (a) Training points for the governing equations: 427 × 2 points for the Stokes equation, f = 0, and 427 points for the equation of continuity, s = 0. (b) Random training points for the velocity: N_u × 2, for N_u = 100 (shown here), 200, 400, and 800. (c) Training points for the periodic boundary conditions: 16 × 2 points for the velocity, (S₁ᴸ u_α)(0, y) = 0. In contrast to previous inferences, here no information is given regarding the pressure drop.

Figure 15. Relative error in the prediction results for the pressure drop ∆p, given randomly sampled velocities, for flow past a cylinder inside a sinusoidal channel. Results are obtained using 100, 200, 400, and 800 randomly placed velocity training points inside the domain. The shaded regions represent the ±2σ interval, i.e. the 95% highest probability region.


Figure 16. Schematic representation of the PINN used to solve the Stokes flow problem.

Figure 17. PINN loss as a function of training steps, for the problem of inferring the flow given randomly sampled velocities (indicated by the green symbols), corresponding to the GP results of figure 7. The insets on the right show the PINN flow predictions for u_y obtained using the network parameters at steps 10⁴ and 3 × 10⁵. The full solution is given in figure 18, for the final parameter values.

Figure 18. PINN velocity predictions for the problem of inferring the flow given randomly sampled velocities, corresponding to the SP results of figure 8. (a) The reference FEM solution, (b) the PINN prediction, (c) contour plots for the FEM (solid) and PINN (dashed) solutions, (d) the relative error, and (e) the absolute error.

Figure 19. Deviations in the PINN predictions for the governing equations, corresponding to the SP results of figure 10.