A fast, matrix-based method to perform omnidirectional pressure integration

Experimentally-measured pressure fields play an important role in understanding many fluid dynamics problems. Unfortunately, pressure fields are difficult to measure directly with non-invasive, spatially resolved diagnostics, and calculations of pressure from velocity have proven sensitive to error in the data. Omnidirectional line integration methods are usually more accurate and robust to these effects as compared to implicit Poisson equations, but have seen slower uptake due to the higher computational and memory costs, particularly in 3D domains. This paper demonstrates how omnidirectional line integration approaches can be converted to a matrix inversion problem. This novel formulation uses an iterative approach so that the boundary conditions are updated each step, preserving the convergence behavior of omnidirectional schemes while also keeping the computational efficiency of Poisson solvers. This method is implemented in Matlab and also as a GPU-accelerated code in CUDA-C++. The behavior of the new method is demonstrated on 2D and 3D synthetic and experimental data. Three-dimensional grid sizes of up to 125 million grid points are tractable with this method, opening exciting opportunities to perform volumetric pressure field estimation from 3D PIV measurements.


Introduction
The hydrodynamic pressure field plays an important role in the dynamics of the flow in many fluid mechanics problems.In particular, the correlation of the pressure fluctuations with the velocity field significantly influences the transport and evolution of turbulence.Additionally, detecting and quantifying pressure wave sources in the flow is an important part of acoustics and efforts to minimize noise sources, while the interaction of the pressure field with solid bodies such as planes and bridges plays a role in the design of engineered structures.However, the measurement of these fluctuations has historically been difficult without resorting to the use of invasive pressure probes or surface-mounted sensors or ports, and the measurement of pressure fluctuations and their gradients throughout a spatially-resolved field has been impractical.In particular, correlated velocity and pressure measurements needed to explore turbulence theory and models for computational simulations of flows have been lacking.To overcome this, researchers have turned to calculation of pressure data from experimentallymeasured velocity fields (van Oudheusden, 2013), for which methodologies such as Particle Image Velocimetry (PIV) have become quite mature and able to supply both planar and volumetric velocity sets in either snapshots or time-resolved sequences.
Examining the Navier-Stokes equations it is quickly apparent that, mathematically at least, the computation of pressure P from a known velocity field ⃗ u and density field ρ should be a straightforward process.For this introduction we will concentrate on incompressible flows, but similar methods can be applied to compressible flows as well (see van Oudheusden (2008) and the discussion in Section 4.2).Here, ρ is density, t is time, and µ is dynamic viscosity.
The most obvious approach would be to calculate the instantaneous pressure gradients from estimates of the material derivative of the velocity field, either including or ignoring the viscous terms.These estimates are lumped as a "source term" ⃗ f (⃗ u), and will have a different form when the flow is compressible or when the average pressure is sought.Integrating these pressure gradients in space from some known starting pressure to the remaining points in the experimental domain would then allow for the estimation of the pressure field up to a constant.
Since the pressure field is a scalar field, the integration of the pressure gradient should be path-independent.Therefore, it is a sensible choice to build an algorithm that integrates the pressure gradient field from different directions and averages the results to minimize the effect of localized experimental error.These methods evolved from averages of a small number of redundant paths toward efforts to average all possible straight line integrals within the domain, with the latter forms often being known as omnidirectional pressure integration schemes (Liu and Moreto, 2020).Iterative application of this family of techniques allows convergence of the solution without requiring explicit specification of boundary conditions.
Alternately, if the divergence of the above equation is taken, a Poisson equation relating the pressure field to velocities can also be developed, which has the advantage of having many efficient computational implementations for its solution, assuming that the boundary conditions are known.
In a series of papers, Pan, Whitehead and their collaborators demonstrated that, for Poisson solvers, experimental error on the final solution interacts with the geometry of the domain, how the measurements are sampled, the type of boundary conditions chosen, and the location and type of the error that is present in the data, among other factors (Pan et al., 2016;Faiella et al., 2021;Nie et al., 2022).Many of these same concerns apply to other solvers as well, as demonstrated in recent work by Sperotto et al. (2022) and Zhang et al. (2022).
Thus, in practice solving the Pressure-Poisson equation could lead to solutions that are sensitive to error in the measured velocity fields, with high levels of noise amplification.Charonko et al. (2010) showed that in general, Poisson-based solvers are more sensitive to measurement error than omnidirectional integration methods.In particular, work by Faiella et al. (2021) showed that the sensitivity to experimental error in Poisson-based solvers can largely be attributed to the boundary-value problem nature of the approach, with the known pressure boundaries (Dirichlet) being superior to the more commonly-used pressure gradient boundaries (Neumann).Additional work by Liu and Moreto (2020) showed that omnidirectional schemes control this effect by iteratively converging the most-physically relevant pressure boundary conditions based on the entire measured domain before solving for the interior points in the last iteration.Specifically, when the boundary conditions extracted from the final iteration of a converged omnidirectional integration solver were used as Dirichlet boundaries in a Poisson solver, the results were essentially identical to the omnidirectional method.This suggests that it is the improved handling and minimization of the pressure error on the domain boundaries that is the reason that the omnidirectional integration schemes are in general more accurate than Poisson solvers.
However, modern fluid dynamics experiments are moving toward volumetric measurement techniques which better capture the full three-dimensional behavior of the flow and which, in theory, should also provide better estimates of the true pressure field.Unfortunately, even in 2D the omnidirectional pressure integration schemes are considerably more expensive than Poisson solvers, and in 3D the difference in computational and memory requirements becomes even more pronounced, and often prohibitive.As a result, few researchers have embraced omnidirectional pressure integration despite its potential superiority in accuracy.
In this paper, we will explore an alternate perspective on the mathematics behind the omnidirectional pressure integration schemes.We will then show how the approach can be recast as an implicit, Poisson-like problem while preserving the iterative convergence of the boundary conditions that powers its improved accuracy.The approach will be demonstrated in synthetic and experimental fields of varying complexity, and is adaptable to both 2D and 3D flow data.Error analysis indicates that the new method demonstrates equivalent or better error levels as the conventional omnidirectional pressure integration, while maintaining computational efficiency equivalent, per iteration, to a Poisson solver.Finally, details will be provided on how to apply this technique to arbitrarily masked grids, where regions of the grid may contain missing data due to lack of particles, laser reflections or occlusions caused by the presence of an aerodynamic model.

Method Concept
In order to convert the explicit omnidirectional scheme into an implicit formulation, we first examine how the line integrals are constructed.For a discrete integration, the pressure at each new point, P C , is computed from the pressure at the adjacent point, P a , plus the pressure gradient along the path, ⃗ r.
The collection of discrete cells with pressure gradient information (computed from velocity field data) that make up a path across the domain can be defined in multiple ways (Figure 1).In this work, we will consider two approaches, shown here in 2D for simplicity.Both approaches considered assume a rectangular, evenly-spaced grid.Irregular grids will not be considered throughout this work.
The first type of path, herein dubbed "Face-Crossing scheme" (Figure 1 (a)), is made up of all cells that a given path ⃗ r crosses, and all integrations are made across connected cell faces.The integral path starts from a boundary point B and extends to one of the interior cells C. The second type of path examined, dubbed "Cell-Centered scheme" (Figure 1 (b)), consists of the cells with the closest centers.In this case, integrations may be across adjacent cell faces or diagonals.In 2D, if the ray angle θ is steeper than the cell diagonal (i.e., θ > atan(∆y/∆x)), then the cell-centered integration path passes through each y coordinate once and the x coordinate is rounded to the nearest cell for a given ray.Conversely, if the ray angle is less steep than the diagonal (θ < atan(∆y/∆x)), then the path passes through each x coordinate once and the y coordinate is rounded instead.For the full omnidirectional pressure calculation, the pressure in any cell C will be found from the average of all line integrals that cross the cell (Figure 1 (c)), which we will determine using the parallel-ray omnidirectional approach (Liu et al., 2016).In that method, line integrals of pressure are found for closely spaced rays separated by a shift s and oriented along multiple angles θ.
The proposed method extends the rotating parallel ray approach presented by Liu et al. (2016) by taking the limit where the number of line integrals is infinite.Thus, we consider all possible ray displacements s and all possible ray angles θ.As an example, let's analyze the specific case where the rays coming from the east cell E are being counted, as depicted in Figure 2.
Boundary Rays Definition of the ray counting process for the continuous integration case in an interior cell.(b) An adjacent cell can be treated as a boundary or as an interior point depending on whether the ray begins outside the boundary.
If the cell E is an interior cell (i.e., none of the rays at any angle θ come from a boundary), then the rays can be "counted" through the integration of a connectivity variable A E : where R CE = 1 if C and E are connected by a path given some s and θ; and 0 otherwise.The inner integral in s can be replaced by the projection of the face connecting C and E (in 2D, the interface is an edge), which has height ∆y in the direction θ.Also for the E cell, the valid integration bounds for θ are −π/2 < θ < π/2, and the integral A E becomes: Note in Figure 2 (a) that all rays coming from the cell E implement the same equation: , where f x (j) is the x component of the source function evaluated at the cell j and the pressure P n+1 j represents the pressure of the cell j evaluated at the (n + 1) th iteration of the algorithm.The discretization of the source term ⃗ f in this implementation is a second-order accurate approximation at the cell face, which is an averaging between the values at the center of the cells C and E. For an interior cell, Equation 5defines the ray count A n+1 E , whereas A n E = 0 because no pressure values from the previous iteration are required.
However, when the cell E shares one or more faces with a boundary, then the rays intersecting the boundary faces will effectively begin at the cell E, meaning cell E is the boundary cell where the integration starts.Therefore, the pressure value used for cell E has to come from the previous (n th ) iteration, as depicted in Figure 2 (b).For the "zeroth" iteration, the values can be initialized to zero (Liu and Moreto, 2020) or any other arbitrary value.Since for the boundary cell case some rays use pressure values from the (n + 1) th iteration and some rays use values from the n th iteration, the ray count A i E (where i is the iteration number) will be different than Equation 5. Depending on the location of cell C, the values of A i E will vary.As an example, the values for the boundary topology depicted in Figure 2 (b) are: ∆y cos θdθ Given the forms of the integrals, the "ray count" A i j is effectively the average of edge length (∆x or ∆y) projected at all angles θ that come from cell j and use the pressure value from iteration i.For the "face-crossing" scheme, we can repeat the application of Equation 5to the other faces to find the total projection area from all rays at all angles to be: due to the symmetry of the problem.We can then define an implicit equation that implements the averaging process from all rays by performing a weighted average of the finite-difference approximations of Equation 3 from all adjacent cells: where is the updated value of the pressure at a given cell C at the next (n + 1) th iteration, and the iteration i can take the values i = {n, n+1}.The cell j can take the values j = {C, E, W, N, S, F, B} for the "face-crossing" scheme, representing the cardinal directions East, West, North, South, Front and Back, respectively.For the cell-centered scheme, the cardinal directions can also take diagonal values j = [{p}, {pq}, {pqr}] representing all 27 cells in a 3×3 cube surrounding cell C, including itself.j is also a directional subscript, so ∆E = ∆x and ∆W = −∆x, and similarly for the other orthogonal and diagonal directions.The source function f j (at direction j) in the last summation is averaged between cells j and C to approximate the source value at the corresponding face, following a second-order accurate approximation.
Equation 9 simply describes the ray averaging process in the omnidirectional integration approach.The weights w i j represent the fraction of the rays that implement each subequation (Equation 3), utilizing the values of pressure from the appropriate neighboring cells depending on which direction the ray is coming from.These weights are defined as: except for w n+1 C , which is defined as: This is because P n+1 C is already counted in the left-hand side of Equation 9.The remaining weights are computed from the ray integrals A i j as previously discussed.Note that Equation 9 can be defined for all cells in the domain, forming a sparse matrix form: where [W i ] are weight matrices and {P i } are the pressure vectors at iteration i.The source term vector {S} represents the last summation in Equation 9 and is computed only once based on the velocity measurements from PIV.For a minimal example of how Equation 12would be constructed for a 3×3 domain, the reader is directed to the appendix of this manuscript.Furthermore, the weight matrices [W i ] can be precomputed for a given grid topology and can include the effect of cells with missing data (say, from regions not illuminated by the laser or containing occlusions/reflections).A method to consider any topology of missing cells will be detailed in the following sections.
The solution method proposed follows the outline provided in Algorithm 1.This algorithm was implemented in Matlab as a CPU code (only 2D) and in CUDA-C++ as a GPU code (2D and 3D).For the Matlab implementation, the weight matrices [W i ] are stored as sparse matrices and Equation 12 is solved using Matlab's default linear equation system solver (mldivide or "backslash" operator ).In the GPU CUDA-C++ implementation the Conjugate Gradient (CG) method is used to solve Equation 12.
For the GPU implementation, a total of 16 scalar fields with the mesh size are required (7 for the CG solver, 6 for the weight matrices, 3 for the source field x, y, z components).Stored as doubles, the code requires 16 • 8N bytes of GPU memory, where N = N x N y N z is the total number of cells in the domain.On the Nvidia RTX A4000 GPU used in this work, which has 16GB of memory and retails at ∼$1000 at the time of writing, a mesh size of at most N = 512 3 = 134 million cells can be computed.Since the CG solver could potentially Algorithm 1 Outline of the matrix-based omnidirectional integration algorithm Require: Source field ⃗ f from PIV data Compute: Source vector {S} (Eq.9, terms using ⃗ f ) Compute: c jk and w j i (details in Section 2.2) Compute: Weight matrices [W i ] (Eq.10) Initialize: {P n=0 } = 0; while Residual > Threshold do Solve: Eq. 12 for {P n+1 } Compute: Residual (Eq.40 or 41) {P n } ← {P n+1 } end while compute the weight matrices on-the-fly, it is possible to trade off computational time to increase the size of the domain, requiring only 10 scalar fields to be stored in memory.This possibility was not explored, since a domain of 134 million cells was deemed sufficient for most Tomographic PIV applications currently being considered by the experimental fluids community.

Weight Computation for Arbitrary Boundary Shapes
When working with experimental data from PIV/PTV, it is common to have cells in the 2D or 3D grid that are masked out due to missing data.Missing PIV vectors can occur when laser reflections from a model surface are present in the field of view; when a model surface is occluding the view of the flow; or when the laser sheet is not sufficiently expanded to illuminate the entire field of view of the cameras.Alternatively, in Stereoscopic PIV and Tomographic PIV there can also be missing vectors where the views from all cameras do not overlap.A typical flow field with vectors masked out due to the presence of an aerodynamic model is exemplified in Figure 3 (a).
Vectors could also be missing inside the field of view due to post-processing or thresholding (say, if the correlation coefficient does not meet a validation criterion), as shown in Figure 3 (b).This case will not be considered in this work, vectors missing due to vector post-processing will be filled with an interpolation algorithm.However, the approach that will be described in this section is agnostic to the nature of the missing vectors and should still handle these cases.Note Figure 3 encodes the missing vectors as NaN (Not a Number) values, shown as a deep blue color in the contours.NaN values will encode missing vectors throughout this manuscript.
When vectors are missing due to masking (Figure 3 (a)), the integration of a pressure ray needs to be performed starting at the edges of the masked out region.Furthermore, the masked region can have concave regions where some of the rays will start and end without ever reaching the borders of the domain.Therefore, a more general approach to compute the weights w j i considering arbitrarily masked border shapes is necessary to deal with realistic PIV datasets, especially in 3D volumes where the masked borders can have a large number of combinations of missing data topologies that would be intractable to handle manually in code.
Here a method will be introduced for the computation of the weights for arbitrary domain shapes.Only the "face-crossing" scheme will be described in the body of this manuscript, for both 2D and 3D grids.The "cell-centered" scheme is described in the supplementary material only for 2D grids, as the 3D grid integrals were exceedingly cumbersome to compute in the "cell-centered" scheme with arbitrary boundaries.

Arbitrary boundary weights for "face-crossing" scheme in 2D
In order to compute the weights corresponding to interior cells with potentially arbitrary boundaries, let's consider the source field ⃗ f has already been computed and it contains NaN values at some cells.If a cell has a NaN neighbor, rays coming from the NaN direction must be treated as boundaries and therefore must be assigned to w n j (i.e., the n th iteration).In the face crossing scheme, a ray coming from the east (E) cell will have a path that passes through the east cell as well as one and only one of its neighbors (i.e., it must pass through either EN , EE or ES in Figure 4 (a)).
We can define a boolean value b j for each cell, defining whether there is missing data in the corresponding cell for the source field ⃗ f : We also define bj = 1 − b j as the boolean NOT operation on b j to define where data is available instead.Thus, when considering all possible ray angles θ, we need to assess six possible cases depending on where the rays come from and whether the corresponding adjacent cells have missing data or not.The cases are enumerated in Figure 4 (b).For example, case A represents rays coming at an angle such that all rays come from the ES cell but some rays miss the C cell (and therefore must not be counted).Case B represents rays coming at an angle such that all rays coming from the ES cell also enter the C cell.Case C is the complementary case, where the rays at the same angle come from the EE cell and cross the C cell.Cases D, E and F are complementary cases to C, B and A, respectively, but involving the EN cell instead.
For each of the cases, a different mathematical expression representing the projection of the edge highlighted in blue can be defined.The total ray projection for the (n + 1) th iteration then becomes: where α = atan(∆y/∆x).Note that bE multiplies the entire expression, to indicate that if there is missing data in the E cell then this entire weight evaluates to zero.When the data in the E cell is missing, then the current cell being evaluated (cell C) must be the starting point for all rays going westward and therefore is itself a boundary.Therefore, there must also be a boundary contribution from the cell C of the form:

∆y cos θdθ
This form considers all four cells surrounding cell C, and the constants 2∆x and 2∆y are calculated the same way as shown in Equation 5. We can evaluate the integrals in Equation 14to obtain the expressions: where the constants c jk are defined as: and ∆d = ∆x 2 + ∆y 2 .These constants come from evaluating the integrals in Equation 14.
Similarly, the ray integral corresponding to the boundary values for iteration n has the following form: To compute the weights corresponding to the west (W ) cell, one simply swaps E and W in Equations 16 and 18.For the north (N ) and south (S) cells, the constant c xx also needs to be swapped to c yy : For a uniform rectangular grid, the constants c xx , c yy and c xy can be trivially precomputed using Equation 17.Then, the ray counting process to determine A i j reduces to a conditional sum for each cell in the domain using the values b j that define whether data for the function ⃗ f is available in the corresponding neighboring cells.The weights w i j then are found using Equation 10.

Arbitrary Boundary Weights for "Face-Crossing" Scheme in 3D
In the three-dimensional case, the computation of weights is significantly more involved, and therefore full detail of the integrals computated will be provided in the supplementary material.In this manuscript, we describe the key steps involved in the weight computation in 3D.As the rays now come from all spherical angles, we have to compute double integrals with curved bounds.The complexity of the integrals is also increased because the projections of the neighboring faces onto the faces of the cell C are no longer lines, but triangles, trapezoids or rectangles depending on the values of the spherical angles.
The problem is still tractable, as will be demonstrated herein.Not all integrals will have a closed form, thus requiring them to be numerically computed.However, this computation only occurs once and only a handful of coefficients needs to be stored in memory.Similarly to the 2D case, this method has the advantage of treating all possible combinations of boundary conditions in 3D, which is combinatorially intractable.In the 2D "face-crossing" scheme, there are 2 12 = 4096 possible combinations of boundary conditions.In 3D, the number of combinations is 2 32 ≈ 4 billion.Evidently, most combinations in 3D would be degenerate cases, but it is still better to have a system that can treat any possible combination of boundary conditions than having to treat each boundary condition possibility within a preprogrammed case structure.
First, we will find the value of A tot , considering all rays coming from all directions.For this, we will define a spherical coordinate system with polar angle θ and azimuthal angle ϕ, according to the right-handed coordinate system described in the ISO 80000-2:2019 convention.The definitions of the angles θ and ϕ are depicted in Figure 5 (a).Counting the total number of rays crossing the north face (N ), which has a positive y normal, is straightforward.Defining a unit vector in the ray direction r: The projection of the face N in the direction r is found by multiplying the surface area of N (∆x∆z) by the dot product ŷ • r.To find the total projection from all spherical angles A N , one needs to integrate: The factor of 4 is present due to the symmetry of the problem and the defined bounds of the angles θ and ϕ.Alternatively, one could make the upper bounds of both angles π and remove the factor of 4, yielding the same results.This integral can be evaluated analytically: Given the symmetry of the problem, the total projection for the other directions (A E , A W , A F , A B , A S ) can be found by swapping the grid spacings ∆x and ∆z by the appropriate values.Doing this yields a total projection area of: Within the same theme of utilizing a boolean value b j to define whether a cell j has NaN as a source term ⃗ f (and therefore must be treated as a boundary), we can evaluate the area integrals for the interior and boundary contributions in 3D in a similar form as it was performed in 2D: The pattern is also the same, where (N and S), (W and E) and (F and B) are swappable in the expressions above, and the corresponding constants c jk must correspond to the appropriate directions (i.e., y for N and S, x for E and W , and z for F and B).This means a total of 6 constants c jk must be calculated.
For the far faces (j = k), the area projection is displayed in Figure 5 (b), where P 1 , P 2 are points on one of the edges of the far face and P ′ 1 , P ′ 2 are the projections of these points on the N face through the r direction.Generalizing the projection for arbitrary coordinates (j, k, l) results in the following integral that finds the contribution of the far face to the area projection A: where ∆j is the grid spacing in the j direction and ∆ℓ 1 and ∆ℓ 2 are the grid spacings in the remaining two directions.Furthermore, ∆ℓ 1,c = ∆ℓ 1 cos ϕ and ∆ℓ 1,s = ∆ℓ 1 sin ϕ.Although ∆ℓ 1 and ∆ℓ 2 do not seem to be interchangeable in Equation 26, the result from the integral is identical when ∆ℓ 1 and ∆ℓ 2 are swapped.Further details on how to achieve the integral form of Equation 26 can be found in the supplementary material.
For the faces forming dihedrals (i.e., for c jk where j ̸ = k), the projections on the N face can form triangles or trapezoids, depending on the values of the angles θ and ϕ.There are three cases that must be considered, enumerated in Figure 5 (c).These cases result in four integrals: where ∆ℓ represents the grid spacing in the direction not considered (ℓ ̸ = {j, k}) and ∆k c = ∆k cos ϕ and ∆k s = ∆k sin ϕ.
Note Equation 28 has a closed form, whereas the remaining integrals need to be computed numerically.The forms provided are the simplest analytical expressions the authors were able to derive.Although not straightforwardly evident at a first glance, c jk = c kj .This is because the projection of face j onto face k for any viewing angle always has the same area as the projection of face k onto face j.Once again, further details on the computation of the integrals can be found in the supplementary material.
Although the integrals in Equations 29, 30, 31 and 26 must be numerically computed, this computation only needs to be performed once prior to running the matrix omnidirectional solver.Computing all integrals to 1 × 10 −7 precision takes about 3 seconds for arbitrary ∆x, ∆y and ∆z, which was deemed a reasonable investment for solving an arbitrarily-spaced fully three-dimensional grid.For common ratios of ∆x, ∆y and ∆z, these integrals can also be precomputed and stored.A special case is shown in the next section.

3D "Face-Crossing" Constants for Isotropic Grids
For the special case where the grid spacing is equal in all directions (i.e., ∆x = ∆y = ∆z = ∆), we can perform further simplification of the integrals and attain a closed form for c jk , which then can be added to the solver code as a fixed constant for isotropic grids.Below we can find the values of these constants: Numerically, these constants have the following values:

Method Accuracy and Computational Complexity
In this section, the matrix-based omnidirectional pressure integration method described in Section 2 will be tested with synthetic datasets where the ground-truth pressure field is known.Noise will be added to the velocity vectors to assess the uncertainty behavior of the pressure field solutions for the method proposed and for previous methods present in the literature.

Taylor Vortex (2D)
First, a decaying Taylor vortex is considered, which was previously used by Charonko et al. (2010) to assess error propagation in pressure integration algorithms.The Taylor vortex has an analytical solution for pressure that enables the definition of a ground truth.The velocity field of the decaying Taylor vortex centered at the origin is defined in cylindrical coordinates as: where u θ is the tangential velocity, r is the radial position and H is a measure of the angular momentum in the vortex.The pressure field for this vortex can be found by solving the Navier-Stokes equations: where P ∞ is an integration constant and is defined as the far-field pressure.
In order to generate velocity fields ⃗ u noisy contaminated with spatially uncorrelated noise, a noisy vector field ⃗ n with unit magnitude and random angle is sampled from an uniform angular distribution and scaled by a factor s. It is then multiplied pointwise by the magnitude of the instantaneous velocity vectors ||⃗ u||, following a previous work (Charonko et al., 2010): The scaling factor s is then varied between 0 and 0.1 (0% and 10% noise intensity) for this analysis.In all cases examined, a POD-based denoising scheme retaining only the most energetic modes in the time series was used to reduce noise in the velocity fields, following the previous work by Charonko et al. (2010).Modes with less than 1% of the total energy were discarded, resulting in approximately 6 modes retained for pressure reconstruction.
Five pressure reconstruction algorithms were tested in this part of the work, namely: (i) Pressure-Poisson solver with Neumann boundaries; (ii) Omnidirectional integration, "virtual boundary" (Liu and Katz, 2006); (iii) Omnidirectional integration, "rotating parallel ray" (Liu et al., 2016); (iv) Matrix-based omnidirectional solver, "face-crossing" scheme; (v) Matrix-based omnidirectional solver, "cell-centered' scheme; For the Pressure-Poisson case (i), we defined an additional Dirichlet boundary condition at one single mesh point at a corner of the domain to ensure the problem is well-defined.
To define the source terms, the "standard" approach as defined in Charonko et al. (2010) was used.This method used the full Navier-Stokes equations for Neumann boundaries, and direct calculation of the divergence of each term for the source term, preserving the timederivatives and viscous terms with no application of an incompressible flow assumption.This was chosen for consistency with the forms used in the omni-directional solvers, but as shown in Charonko et al. (2010) and Nie et al. (2022), alternate forms that reduce numerical error propagation can sometimes result in better performance.
For the "virtual boundary" case (ii), a total of 2N x N y rays were used in the boundary, whereas for the "rotating parallel ray" approach (iii) a ray spacing ∆s/∆x = 0.4 was used and ∆θ was four times the value provided in formula (73) in Liu and Moreto (2020).
To compare computational efficiency and accuracy of the algorithms tested, a 2D version of the algorithms described is implemented on a square domain of variable number of grid points containing the Taylor vortex flow field.The omnidirectional integration algorithms (ii) and (iii) are implemented in Matlab using the parallel computing toolbox, and 8 nodes of an Intel Xeon Gold 5218R CPU (2.10 GHz) were used to accelerate the computations.The algorithms requiring a matrix solver (i), (iv), (v) employed Matlab's default linear equation system solver (mldivide or "backslash" operator ).Finally, in the error tests considered in this section, a total of 26 Taylor vortex snapshots at times spanning 0.05 < t/T 0 < 0.3 using a 101 2 grid that spans from −3 < x/L 0 < 3 and −3 < y/L 0 < 3 were employed.A summary of the RMS pressure errors, as a percentage of the maximum absolute pressure in the field, are presented in Figure 6 (a) for all solvers considered.It becomes quickly evident that the pressure-Poisson solver presents very poor accuracy, providing incorrect results with an RMS error of ∼ 5% of the peak suction even when the underlying velocity field contains no noise.The error of the pressures obtained by the pressure-Poisson solver then shoots up very quickly as a function of velocity error.The two omnidirectional methods ("virtual boundary" and "parallel ray") perform better than the pressure Poisson method, providing a somewhat linear relationship with the velocity error with a zero crossing at approximately zero pressure error.However, we can see the statistics from both methods fluctuate for different velocity errors, due to the limited number of rays employed during the integration.We did not run more cases for converged error statistics for the traditional omnidirectional methods, as they are highly time intensive.The matrix omnidirectional methods proposed in this manuscript, on the other hand, offer the lowest errors; the "cellcentered" approach performing slightly better.Furthermore, the statistics from the matrix methods are better converged, as the continuous integrals utilized mimic the limit of infinite rays.
When the computational cost to evaluate pressure from the velocity fields is considered, there are several points that will be further elaborated throughout this manuscript.For this first analysis on the 2D Taylor vortex, we consider the computational time required to compute one iteration of each one of the five methods examined in Figure 6 (b) for a variety of different grid sizes ranging from 11 2 to 2001 2 in 2D.We also include in Figure 6 (b) the computational times for the GPU implementation in 2D and 3D (for grids of 51 3 to 501 3 ), shown as dot symbols.The only method that requires a single iteration to converge is the pressure-Poisson method, and all other methods will require multiple iterations to converge.Less than 30 iterations were required for the Taylor vortex problem examined in this section for all omnidirectional methods, but further discussion is presented in Section 3.4 regarding the convergence behavior for more complex cases, where hundreds of iterations may be required to achieve satisfactory convergence.In our numerical experiments, however, the matrix omnidirectional methods always converge in less iterations to a given residual than the traditional omnidirectional methods.
Figure 6 (b) demonstrates how significant the computational time reduction is for the matrix omnidirectional methods discussed herein, requiring a logarithmic scale to fit the times from the different methods.One iteration of the matrix methods requires slightly more time to solve in Matlab than the pressure-Poisson equation, but the times are of the same order.On the other hand, the traditional omnidirectional integration schemes require about 20 times longer per iteration for a given grid size in 2D for the discretization chosen here.The number of iterations required to convergence of the pressure to a certain criterion is at least the same as the matrix methods proposed herein.
When we consider our GPU implementation of the 2D matrix omnidirectional solver, we observe a ∼ 3-3.5× speedup in comparison to our Matlab implementation.For 3D grids, we only implemented the matrix omnidirectional solver in the GPU, and curiously we find that the 3D grids solve significantly faster for a given precision in the CG solver for the same number of grid points, averaging about ∼ 9× faster than a 2D grid with the same total number of grid points.We have not explored the underlying reason for this drop in computational complexity, but it is likely related to the number of iterations required for the CG solver to propagate information through the domain, which is of the order of the largest dimension in the domain.However, this effect is particularly exciting because it means that a speedup of approximately 3 orders of magnitude (∼ 500-1000× faster) can be achieved when compared to 2D grids with the traditional omnidirectional methods, which will enable the tackling of large 3D grids that are becoming highly relevant to the experimental fluids community.
Although a 3D implementation of the traditional omnidirectional integration was not currently examined in this work, an important time complexity scaling is evident by examination of the work presented by Wang et al. (2019): Using N to denominate the largest size of the grid between the x, y and z directions (assuming a box-shaped domain), the number of rays cast is proportional to the projection area of the domain box analyzed at a given spherical angle (i.e., ∼ N 2 ).The number of operations required to compute each integral is ∼ N .Furthermore, the number of spherical angles required to tesselate the enclosing sphere to a given grid spacing is proportional to the area of the enclosing sphere (∼ N 2 ).Therefore, the number of computations required per iteration must be t ∼ O(N 5 ), which is a rather strong scaling.We find in our empirical tests of our 3D GPU implementation of the matrix omnidirectional method presented herein that the time complexity scaling is t ∼ O(N 4 ), as shown in Figure 6 (b), which is significantly more favorable for larger grids.Note that the total number of points in the 3D grid scales as ∼ N 3 .
A more direct comparison with the performance described by Wang et al. (2019) can be performed by comparing the performance of the graphics cards used in both studies.The RTX A4000 used in our work is capable of 19.2 TFLOPS at single-precision, whereas the RTX 2080 Ti used by Wang et al. (2019) is capable of 13.45 TFLOPS.We performed our calculations in double-precision, which is reported by Wang et al. (2019) to take approximately 3 minutes per iteration for a 100 × 47 × 38 grid in their machine.A cutout of the JHU turbulence data described in Section 3.2 was generated with the same 100 × 47 × 38 grid size, and one iteration of the omnidirectional matrix "face-crossing" scheme takes on average 0.082 s.Therefore, the GPU implementation of our method for 3D grids is estimated to be about 1500× faster than the state of the art (Wang et al., 2019) at this specific grid size and likely even faster for larger grid sizes due to the more favorable time complexity scaling of t ∼ O(N 4 ).

JHU Isotropic Turbulence Data (3D)
In this section, the performance of the omnidirectional matrix method will be assessed for a more complex flow field for which the exact values of pressure and velocity are known.The "forced isotropic turbulence" direct numerical simulation (DNS) dataset provided by the Johns Hopkins University (Li et al., 2008) was chosen for this analysis.More specifically, the "isotropic1024coarse" database was sampled at the 40 th frame (time = 0.2535) and adjacent frames for the computation of the time derivatives.
Grid resolutions of 50 3 , 100 3 , 200 3 and 500 3 were considered in this study.The data cutouts taken are not decimated or downsampled to the specified grid resolutions considered to minimize truncation error in the derivative computations.Instead, the cutouts have the same grid spacing as the original data set and therefore the 50 3 box is truncated from a corner of the original 1024 3 set and is four times smaller in physical size than the 200 3 box.This also enables the demonstration of how the omnidirectional matrix method handles incomplete data boundaries, since none of the cases analyzed comprise of periodic boundaries.As boundary conditions are not defined in the omnidirectional matrix method, this poses no additional implementation challenge.
While sampling the JHU turbulence dataset, we noticed that the momentum equations are not balanced when computing the finite difference gradients of the available velocity/pressure fields in an Eulerian grid, even when including the forcing terms; for accuracy levels from second-order to sixth-order.This slight imbalance appears to be related to the fact the original database was solved with a spectral method.We believe that this effect acts as an unnecessary confounding factor that can contaminate the obtained pressure fields but does not pertain to the accuracy of the pressure solver we propose, but to missing information in the underlying velocity field.Therefore, in order to generate noisecontaminated velocity fields based on the correct underlying pressure field, we perform the following operations: (i) Generate the uncontaminated source field ⃗ f 0 based on the true pressure with central finite differences: ⃗ f 0 = ∇P truth ; (ii) Compute the source term based on the convective derivative ⃗ f u = −ρ(D⃗ u/Dt) based on the velocity fields using second-order accurate central finite differences; (iii) Compute a residual source term ⃗ r = ⃗ f 0 − ⃗ f u ; (iv) Add noise to the velocity field (⃗ u noisy ) and compute a noise-contaminated source term ⃗ f u,n = −ρ(D⃗ u noisy /Dt); (v) Correct the source term with the residual: ⃗ f 0,n = ⃗ f u,n + ⃗ r; (vi) Integrate the noise-contaminated source field ⃗ f 0,n to obtain pressure.
With this process, the source field ⃗ f 0,n computed from the velocity field contaminated with noise of zero magnitude matches ∇P truth and has no error related to the approximated derivatives.The noise-contaminated velocity fields ⃗ u noisy are generated differently than the previous section, as the fields are three dimensional.The equation below adds multiplicative noise to the velocity fields scaled by a factor s, between 0 and 0.1 (0% and 10% noise intensity).Note the noise added in Equation 39 is slightly different than Equation 38 in that it only rescales the vectors without adding a component perpendicular to them.Due to the usage of a single field to estimate pressure, no noise reduction scheme (such as the POD-denoising used in the previous section) was employed.Slice showing pressure contours obtained with the matrix omnidirectional method ("face-crossing" scheme) for the JHU turbulence database for the 200 3 box case (a-e) and 500 3 box case (f-j).5% noise corresponds to 1 standard deviation in the raw velocity vectors.
⃗ u noisy = ⃗ u(1 + s⃗ n) (39) Figure 7 displays the contours of pressure for two turbulence box sizes, 200 3 and 500 3 , which were iterated to a 10 −3 momentum residual as defined in Equation 41 and further discussed in Section 3.4.In each set of cases, the slice is taken at the middle of the respective domain.We can compare the true pressure in Figure 7 (a, f) against the results obtained with the matrix omnidirectional method at 0% noise (b, g) and 5% noise (d, i).The error fields (truth minus "matrix-omni" result) are shown in Figure 7 (c, e, h, j) in a tighter color scale.It is evident from the contours in Figure 7 that the matrix omnidirectional results closely approximate the true pressure fields in both cases even with the presence of fairly large velocity noise.A velocity field noise with a 5% standard deviation is representative of the quality that might be achieved with particle image velocimetry for the more challenging experiments where illumination intensity is very low.Nevertheless, we still observe the shape and magnitude of the contours in the noisy cases is close to the true pressure, up to quite significant detail.Several cases with varying velocity noise were tested with this data set and the RMS pressure error is plotted in Figure 8 as a function of velocity noise for various grid sizes.We note a similar linear trend as a function of velocity noise as observed in the 2D Taylor vortex shown in Section 3.1, however the RMS pressure error is smaller in the JHU turbulence case.This may be related to the more uniformly distributed magnitudes of the velocity vectors in the JHU database, which reduces the error of the velocity gradients required to compute the source terms ⃗ f .The generally low error achieved for different grid sizes demonstrates the ability of the matrix omnidirectional method proposed herein to provide accurate pressure estimates from PIV velocity fields even for very large 3D mesh sizes.

Limit of Infinite Rays
In previous sections, we argued that the matrix omnidirectional method we propose simply takes the "rotating parallel ray" approach to the limit of infinite rays.From the integral formulation proposed in Section 2.1, we define the matrix equation weights considering the infinite number of rays coming from all possible angles θ between 0 and 2π and all possible ray shifts crossing any given cell in the domain.Therefore, the results from the computations using the "rotating parallel ray" approach with increasingly smaller spacing between rays and increasingly more angles should approach the result from the matrix method proposed herein.This behavior is evidenced in Figure 9, where the "rotating parallel ray" algorithm is run on the 2D Taylor vortex data with 1% velocity error on a 101 2 grid for 30 iterations of the Matlab implementation.The blue solid line, showing the RMS pressure error as a percentage of the peak suction at the core of the vortex, evidences this convergence behavior as the number of rays (integration paths) is increased.However, note the logarithmic xscale, demonstrating a large number of integration paths is required to really achieve this convergence.Evidently, this comes at a high computational cost, which is evidenced by the red solid line in Figure 9. Compared to the time required to run the 30 iterations of the matrix omnidirectional method (shown as a red dashed line, ∼ 2.07 s), the cost to run the "rotating parallel ray" algorithm to the same accuracy as the matrix omnidirectional method is very high, of order 10 minutes, even for a small 101 2 domain.

Method Convergence
As discussed in Section 2, Equation 12 is an iterative update equation for the pressure field.It updates both boundary nodes and internal nodes for the (n + 1) th iteration.Here we discuss the observed convergence properties of this update method.
In order to define convergence, two metrics were examined.The pressure update metric is defined as: where || • || 2 is the L2 norm of the field considered.Alternatively, the residual of Equation 12(momentum residual) can be used as a convergence metric: where the denominator using only the source term assumes the first residual of Equation 12was computed with a zero initial guess for {P n=0 } = 0.The momentum residual of Equation 41is readily available in the CG solver implementation, as it needs to be calculated when the CG solver is initialized.
Although both residuals could be used for assessing convergence, the momentum residual ε R was found to be a slightly more conservative convergence criterion (i.e., its decay rate is the same as ε P but its magnitude is slightly larger) and therefore it will be the quantity used for all discussions through this manuscript.
Due to the high computational cost of the traditional omnidirectional integration method, previous references typically use a small number of iterations of the algorithm.For example, McClure and Yarusevych (2017) used 5 iterations in their examination of a 2D slice of a DNS cylinder flow database, whereas Liu and Katz (2006) claim to have observed convergence in less than 10 iterations in a 2D PIV measurement of a cavity shear flow.In the 3D implementation of the omnidirectional integration method by Wang et al. (2019), only 3-4 iterations were used, which was deemed to have reached convergence by the authors.With the implementation proposed herein, the effect of the number of iterations on convergence can be examined for a very large number of iterations, as the computational cost per iteration is small even for large grids.

Effect of Pressure Magnitude at the Boundaries
First, we examine the convergence behavior for the 2D decaying Taylor vortex at a specified time (t/T 0 = 0.1) and varying the magnitude of the true pressure values at the boundary by changing the extents of the domain considered under the same grid resolution of 101 2 and no noise contamination of the underlying source fields.
The "pressure magnitude" here is defined as the norm of the true pressures at the boundary ||P || b .It stands to reason that the further the final pressures at the boundary are from the initial guess, the longer it will take for the algorithm to update the boundary values to within a given residual.As the initialization of the pressure term {P n=0 } = 0 has zero norm, this effect can be captured by the norm of the boundary pressures ||P || b , which is evidenced in Figure 10.If the domain sampled from the Taylor vortex is very large, then the pressures at the boundary are close to zero (i.e., their norm ||P || b → 0).Conversely, with a smaller windowing of the domain, the pressures at the boundary will have values further from zero, and their norm grows larger.As evidenced in Figure 10 (a), increasing the norm of the target boundary pressure slows down the convergence of the algorithm.The trend is clearer when the convergence criterion is fixed, as shown in Figure 10  This effect may explain why some of the former references observed rapid convergence in the traditional omnidirectional integration method.If the pressure field boundaries include a large region with pressures close to the ambient (or "far field") pressure, then a fast convergence rate is attained because the boundary values do not need to be significantly updated.This also serves as a guideline for experimentation; i.e., the convergence rate will be faster when the far field is included in the velocity field measurements.However, this is not possible when the domain includes walls where a pressure distribution is present, as will be demonstrated with real data in Section 4.

Grid Size
Now considering fields with complex pressure boundaries, we evaluate the effect of grid size on the number of iterations required to attain convergence.In this section, the same JHU isotropic turbulence data set examined in Section 3.2 is considered with different grid sizes.The convergence behavior is summarized in Figure 11.In the 2D cases shown in Figure 11 (a), the 2D slices are taken from the center of the 3D box of same size.It is evidenced in both Figure 11 (a) and (b) that the convergence behavior is somewhat complex with an initial polynomial phase and a rapid exponential convergence at the latter iterations.The "knee" of the curves where the exponential phase begins seems to be different for each case analyzed, resulting in crossing of the convergence lines from each case.Generally, however, the convergence is faster for the smaller grids at the smaller residuals ε R and less than 10 3 iterations is required for most cases considered to converge to ε R < 10 −4 .Curiously, the number of iterations required to compute the 3D cases (which include far more boundary points) is comparable to the 2D cases.The wall clock time required for the 2D cases, however, is far smaller; taking 310 seconds to run 1,000 iterations for the 400 2 grid and 2863 seconds to run 1,000 iterations for the 200 3 grid using the GPU implementation of this algorithm.

Relationship Between Residual and Error
When the source field {S} used as an input to the omnidirectional matrix algorithm is contaminated by noise, it may be unnecessary to fully converge the pressure integration.Therefore, it is worthwhile to examine the relationship between the residual convergence and the error in the pressure fields for noisy cases to better understand where satisfactory convergence is attained.Figure 12 (a) presents the residual convergence for the 100 3 grid case from the JHU dataset for three different noise magnitude cases: 0% noise, 2% noise and 5% noise.The first observation is that the cases with larger noise converge to lower residuals ε R earlier than the uncontaminated case.Figure 12 (b) demonstrates that the addition of noise in the underlying velocity fields makes the more converged cases plateau to a given RMS error in pressure when compared to the true pressure.For example, although ε R = 10 −5 results in an RMS pressure error of 0.15% for the uncontaminated case, a ε R ∼ 10 −4 is sufficient to attain the best pressure error of ∼ 1% for the 2% noise case and ε R ∼ 10 −3 is sufficient to attain the RMS error of ∼ 2% for the 5% noise case.Note that under these conditions, somewhere between 100 and 500 iterations would already be sufficient for convergence, as shown in Figure 12 (a), and therefore a momentum residual of ε R ∼ 10 −3 to 10 −4 may be an appropriate stopping criterion for this algorithm under practical conditions where noise is present in the underlying velocity fields.

Wall-clock Time Convergence
It is also worth mentioning that the use of the Conjugate Gradient algorithm to solve the matrix equation, as implemented in the GPU version of this algorithm, presents the added advantage of enabling the use of the previous iteration's pressure field as a first guess for the CG algorithm.This significantly speeds up the solution time at the latter iterations of the algorithm, as is demonstrated in Figure 13 for the 100 3 grid from the JHU isotropic turbulence dataset.As it can be seen, after iteration ∼ 300 the time per iteration slowly decreases, reaching a negligible baseline of ∼ 6 milliseconds after iteration 1500 (corresponding to the various kernel launch overheads in the GPU).The peaks in the iteration times (blue lines) are due to periodic saving of the intermediary fields every 50 iterations.

Application Examples
In this section, a few example datasets from previous works by the authors are explored to demonstrate the ability of this technique to accurately reconstruct pressure from PIV measurements without a priori knowledge of the pressures at the boundaries, by only using the free stream pressure as the integration constant.

Planar PIV, Slanted cylinder flow, center plane slice
First, we consider the bluff body wake of a cylinder with a slanted afterbody examined by Zigunov et al. (2020) at the slant angle of ϕ = 45 • and diameter-based Reynolds number of Re D = 25, 000.At these conditions, this flow presents an interesting hysteresis behavior, where the same Reynolds number supports two distinct wake flow topologies.In the "vortex state", the flow separates at the leading edge of the slanted surface and reattaches on the slanted surface further downstream, forming a separation bubble and a counter-rotating vortex pair.In the "wake state", the flow separates at the same point but does not reattach, forming a more traditional fully-separated bluff body wake.The "wake state" is the default state for this wake at low Reynolds numbers, but if the Reynolds number exceeds a threshold value (of about 33,000), then the "vortex state" forms and persists if the free stream velocity is then reduced (Zigunov et al., 2022), allowing a range of Reynolds numbers to display two different flow topologies at the same Re D .
The surface pressure distribution was measured using an Omega PX653 pressure transducer with a range of 0-0.1 inH2O, resulting in a C p uncertainty of ±0.03 for the conditions analyzed.Five hundred planar PIV velocity fields were acquired at the center plane (streamwise-vertical) at a rate of 10 Hz, enabling the approximation of the Reynolds stresses required to compute the average pressure field.
The average pressure field is calculated by using the Reynolds-Averaged Navier Stokes equations, dropping the viscous terms: Since the underlying velocity fields were from planar PIV, the out-of-plane derivatives and velocity components were considered equal to zero.As the plane chosen is the symmetry plane of the flow, this is true for the mean quantities but not for the fluctuations.
The forcing vector field f i is computed using Equation 42 and integrated using the matrix omnidirectional method using the "face-crossing" scheme with our GPU implementation.
The PIV vector fields are 427×360 (∼154k grid points) in size and masks were applied to remove regions without valid vectors (∼87k grid points contain valid data).The first iterations took approximately 0.4 seconds, but about 3,000 iterations were required to converge the boundaries; taking about 16 minutes to converge the pressure field with the matrix method.
Although the number of iterations and convergence time was large for this case, the results presented in Figure 14 for the "vortex regime" demonstrate the motivation for attaining convergence of the boundaries in the matrix omnidirectional method.The contours of pressure obtained with the matrix omnidirectional method in Figure 14 (a) are clearly a different field than the results from the pressure-Poisson method in Figure 14 (b), which was obtained by setting the boundary conditions to Neumann boundaries (∇P In both cases, the pressure at the top-right corner of the field, which is the closest to a free stream value, is set to C p = 0. In the comparison with the transducer measurements shown in Figure 14 (c), it becomes clear that the matrix omnidirectional method presents a better match without any a priori knowledge (i.e., the only information provided was the C p = 0 value at the top-right corner after pressure was integrated).The magnitude of the surface pressures is off by approximately ∆C p = 0.3 − 0.5 in the pressure-Poisson method.Furthermore, the low-pressure region towards the trailing edge z/D ≈ 1 has incorrect topology in the pressure-Poisson solution of Figure 15 (b), incorrectly assigning a suction zone to the trailing edge flow.These aspects demonstrate the limitations of using the pressure-Poisson equation to obtain pressure estimates from PIV fields.Although the pressure-Poisson solution may seem adequate at a first glance without any other reference pressure field for comparison, it may lead to misleading interpretations about the physics of a problem.Therefore, the computational time investment for solving the omnidirectional matrix equation through a large number of iterations is likely justified if more accurate and robust pressure estimates are to be extracted from experimental PIV fields.
For the sake of completeness, the pressure field for the "wake regime" of this problem is also provided in Figure 15 (a), only for the matrix omnidirectional method.Note the remarkable agreement with the surface measurements in Figure 15 (b), likely due to the lower surface-normal pressure gradients.In other words, the pressure is not varying as much away from the surface as in the "vortex regime", minimizing the effect of the missing vectors near the surface (due to laser reflections) and enabling a fairer comparison between the surface pressure from transducers and near-surface pressure from PIV.Note the pressure field topology in Figure 15 (a) matches the expected distribution for a fully separated bluff body wake, with the entirety of the wake presenting a moderate suction and the pressure recovering to free stream values towards the trailing edge.

Scanning SPIV, Supersonic Double-Fin Flow
In this application we demonstrate the possibility of applying this technique to obtain a 3D average pressure field for a compressible flow at M ∞ = 2 over a double-fin.The PIV fields are obtained using the newly-proposed "scanning stereoscopic PIV" (SSPIV) technique (Zigunov et al., 2023).This large volume (53 × 66 × 100 mm) with the <1µm particles required for this supersonic flow is currently outside of the typical range for a tomographic PIV setup, which motivates the choice of this technique.In brief, the SSPIV technique comprises of a continuously scanning stereoscopic PIV system where the entire acquisition optics are slowly scanned throughout the interrogation volume.A stack of about 800 planes is then acquired over the course of a run, and each plane is assigned to a z location according to the timing of the laser Q-switch in relationship to a simultaneously-measured linear potentiometer attached to the traverse mechanism.This allows for the calculation of the average velocities and approximation of the Reynolds stresses by performing a spatiotemporal average, where PIV planes from z locations within a threshold (± 1 mm) of a given z coordinate are averaged.Further details about the acquisition technique are provided in Zigunov et al. (2023), and further details on the flow and analysis of the 3D flow structures observed can be found in Seckin et al. (2023b).As a final note, we point out that the grid spacing is different in the in-plane directions and the out-of-plane direction, which makes the computation of the rectangular grid coefficients provided in Section 2.2.2 particularly useful.Before discussing the implementation details, we note that 30 planes are used for each spatiotemporal average plane, and to attain better statistical convergence of the averages and fluctuations the vectors on a 5 × 5 × 30 grid centered at every grid point are used.More planes could not be acquired due to the limited blowdown time of the facility.This number of vectors is sufficient to converge the mean quantities, as discussed by Zigunov et al. (2023), but the Reynolds stresses are likely not converged.The contribution of the Reynolds stresses in this flow field is small, and a comparison between the pressure fields with and without including the Reynolds stresses was performed and negligible difference is observed.For this demonstration, the results are reported with the Reynolds stress terms included in the computation of the source field.
To obtain the 3D average pressure field measurements from PIV we use the compressible RANS equations, as described by van Oudheusden (2008).This requires three key, but reasonable assumptions: (1) adiabatic flow, (2) ideal gas and (3) negligible effect of density fluctuations.The latter assumption (3) is reasonable for moderate Mach numbers.The ideal gas assumption (2) is also valid for this flow, as the minimum compressibility factor is estimated to be P V /RT = 0.998 (Su, 1946).Given assumptions (1) and ( 2), we can find the temperature field T from the velocity field by knowing the total temperature T ∞ : (T ∞ = 173 K from isentropic relations) where γ is the specific heat ratio and V is the velocity magnitude.Once the temperature field is estimated, the following conservative form of the compressible RANS momentum equations is used: (van Oudheusden, 2008) where R is the ideal gas constant and u i u j = ūi ūj + u ′ i u ′ j .Equation 44 can be integrated using the matrix omnidirectional method to obtain ln( P /P ∞ ), and therefore P /P ∞ .
A few slices of the pressure field obtained with the matrix omnidirectional method, comprising of 212 × 266 × 101 = 5.6 million grid points, are presented in Figure 16 (a,  c, d).The spatial coordinates are normalized with respect to the fin spacing W = 63.5 mm.A comparison slice at y/W = 0.2 showing the pressure-Poisson solution with Neumann boundary conditions is also provided in Figure 16 (b), showing how far off the Poisson equation solution is, even though the same reference point at {x/W, y/W, z/W } = {0, 0.2, 0} was used in both cases, being enforced to P/P ∞ = 1.
Before further discussion is provided on the pressure fields shown in Figure 16 (a, c, d) is provided, we first demonstrate that the pressure fields obtained by the omnidirectional matrix method are a good representation of the true pressures.In Figure 17 (a) the pressures obtained by solving the omnidirectional matrix equation are presented at the center-plane, at the closest positions from the surface of the model (∼ 5 mm away from the surface), and compared with the pressures measured at the surface by a set of pressure taps connected to a Kulite KMPS-1-32-10psid pressure scanner (details in Seckin et al. (2023a)).The pressures from the omnidirectional matrix method (blue line) match very well with the measured surface pressures, being spatially displaced towards the downstream direction as they are inside the lambda shock structure and miss some of the boundary layer effects.The pressures from the Poisson solver, however, are not a match with our measurements.Different boundary conditions for the pressure-Poisson solver were tested (only considering the information known a priori, i.e., the free stream pressure), none of which yielded any better results.In Figure 17 (b), a comparison of different residual cutoffs is performed for the dual fin case for the omnidirectional matrix method.For the residual ϵ R = 10 −4 , 1072 iterations were required, taking 27 minutes to converge.The most converged case tested, at ϵ R = 10 −5 , took 2013 iterations (40 minutes).The convergence behavior at the boundary is a scaling of the entire pressure field towards the converged value.The same behavior was observed for the slanted cylinder flow presented in Section 4.1.
Although the time required to converge (27 minutes) is fairly long for one field (the average pressure field), the mesh size is far larger than was previously explored in past studies.The pressure fields displayed in Figure 16 (a, c, d) are data that was previously unavailable for this problem coming exclusively from experimental measurements, and highlights the main features of this flow with unprecedented detail.Features such as the oblique fin-tip shocks and reflected shocks are revealed to have a three-dimensionality due to the open top surface of this double-fin geometry.An expansion fan at the more downstream locations (x/W ∼ 1.2 − 1.4) is also revealed.Close to the floor surface where the shock interacts with the boundary layer, a large lambda shock is observed at the center plane.Especially for the double-fin problem, these features are difficult to see with a shadowgraph apparatus due to the difficult optical access owing to the presence of the fins.

Conclusion
In this work, we demonstrate a new approach for performing omnidirectional pressure integration in experimentally-acquired velocity fields.The approach presented considers all possible ray displacements and ray shifts through a double projection integral that computes partial weights for the neighboring cells of a uniformly-spaced rectangular grid.This approach recasts the omnidirectional ray integration method into a matrix inversion problem, which can be solved far more efficiently.We demonstrate this approach in two synthetic data sets and two experimental data sets, one of which is a compressible flow field with shocks and expansion waves.We find that the error in the reconstructed pressure is significantly lower in noisy velocity fields than the previous approaches described in the literature, including the omnidirectional ray integration this approach is based upon.Furthermore, we find that the computational cost for solving the matrix omnidirectional equation for one iteration is similar to solving the pressure-Poisson equation.However, many iterations are required to converge the boundary pressures for the matrix omnidirectional method.On the other hand, we find that the number of iterations required to converge the boundaries is always lower than the current implementations of the omnidirectional ray integration approach, since our approach takes the limit of ray integration to an infinite number of rays.The computational cost of the matrix omnidirectional approach is ∼ 20 times lower in 2D grids and potentially > 1000 times lower in 3D grids, with a more favorable scaling of O(N 4 ) instead of O(N 5 ).We successfully demonstrate this method on a grid up to 500 3 (125 million points) in size, which was previously unattainable with ray integration approaches.
In our demonstration of this approach with two experimental data sets, we see that a fairly large number of iterations is required to converge realistic data where some of the boundaries are near walls.However, the converged pressure fields match well with experimentally-measured pressures at selected locations, providing confidence that the fields obtained are an accurate representation of the real pressure fields and opening exciting new possibilities for volumetric pressure estimation in experimental compressible and incompressible flows.

Appendix: Example of matrix equation construction for a 3x3 domain
In this appendix, we show details of how the matrix Equation 12 is constructed for a simple 2D domain consisting of square cells (∆x = ∆y = ∆) of size 3×3 cells using the "facecrossing" method depicted in Figure 1 (a).The example domain is depicted in Figure 18 for reference.To compute the weights w in the weight matrices [W ] i , we need to know the ray counters A i j for each of the nine cells in the domain and the total ray count for each cell (A tot ).From Equation 8, we find that A tot = 8∆ for this domain comprising of square cells.
To compute the ray counters for the different boundary types encountered in this domain, we first precompute the constants c using Equation 17: For a 2D domain in the "face-crossing" scheme, there are 10 ray counters A i j for each cell.See Equations 11,15,16,18,19 and 20 for the equation forms used in their calculations.For the corner cell with index (1) in Figure 18, these coefficients are: Note all coefficients sum to A tot (in this case, 8∆), which is always required.The weights w i j are then trivially computed from A i j using Equation 10.For all other cells, we provide the coefficients for this example on Table 1.
If a cell within the domain was missing data, the coefficients for the neighboring cells, as well as the neighbor's neighbors, would change.As an example, if cell (1) contained a NaN value, then cells (2), (3), (4), ( 5) and ( 7) would have different values for the A i j coefficients.All other cells would be unaffected.
The matrix equation following Equation 12 is already too large in this example to be written with all coefficients.However, using Table 1 and Equation 10, we can express the matrix equation for this example as the following pentadiagonal matrix:  where w n E,5 , for example, is the weight corresponding to cell 5's eastern neighbor for iteration n.Note that the right hand side of Equation 48 can always be calculated, even for an initial guess, and is a vector.Also note that in the 3D case the matrix would be heptadiagonal.The terms in the source vector {S} in the right-hand side have the form provided in Equation 9. Two examples are provided below: Solving Equation 48for an initial guess {P } 0 would provide one iteration of update for {P } 1 .The value of {P } 1 can then be placed on the right-hand side of Equation 48 to solve for {P } 2 , and so forth.{P } n+1 appears to converge for this scheme in all tests we have performed, and we have never observed divergence with a starting guess of {P } 0 = 0.However, we have not examined nor proved that this is always true for any initial guess, and the proof of convergence for the general case is left as future work.

Figure 1 .
Figure 1.Graphical description of the integration schemes discussed.

Figure 3 .
Figure 3. Simplified example of a flow field with masking (a) and masking plus missing vectors (b) [Not considered in this work]

Figure 4 .
Figure 4. Graphical description of the arbitrary weight computation considering the cells neighboring the adjacent cell E.

Figure 5 .
Figure 5. Coordinate system definition (a), projection of far face N N on face N (b) and projection of dihedral face N F on face N (b).

Figure 6 .
Figure 6.(a) Accuracy of pressure solvers as a function of noise introduced in the velocity field.(b) Time required for 1 iteration of the algorithms considered, including the GPU implementation.
Figure7.Slice showing pressure contours obtained with the matrix omnidirectional method ("face-crossing" scheme) for the JHU turbulence database for the 200 3 box case (a-e) and 500 3 box case (f-j).5% noise corresponds to 1 standard deviation in the raw velocity vectors.

Figure 8 .
Figure8.Effect of velocity noise on pressure RMS error for the 3D JHU turbulence data set for the GPU implementation of the matrix omnidirectional "face-crossing" scheme.

Figure 9 .
Figure9.Taking the "rotating parallel ray" approach to infinite rays approaches the matrix omnidirectional result.

Figure 10 .
Figure 10.(a) Convergence of the residual as a function of pressure iterations for the 2D Taylor vortex at different domain crop factors.||P || b is the norm of the pressures at the boundaries.(b) Iterations to a fixed convergence criterion as a function of ||P || b .
(b), where ε R was fixed to 10 −4 and the number of iterations to convergence was counted for several values of ||P || b .The number of iterations to convergence appears to be a logarithmic function of the boundary norm ||P || b .

Figure 11 .
Figure 11.Convergence of the residual as a function of grid size for (a) 2D slices and (b)3D subvolumes of the JHU isotropic turbulence data set.

Figure 12 .
Figure 12.(a) Convergence of the residual as a function of pressure iterations.(b) Effect on error against ground truth for a 100 3 grid from the JHU dataset.

Figure 13 .
Figure 13.Total wall clock time and time per iteration for the GPU implementation to solve the 100 3 grid.

Figure 14 .
Figure 14.(a,b) Contours of center-plane pressure obtained from PIV measurements for the slanted cylinder model for the "vortex regime" using the matrix omnidirectional method (a) and pressure-Poisson method (b).(c) Comparison with surface measurements with traditional transducers.

Figure 15 .
Figure 15.(a) Contours of center-plane pressure obtained for "wake regime" for the matrix omnidirectional method.(b) Comparison of pressures at the surface.

Figure 16 .
Figure 16.Pressure contours at different slices for the (a, c, d) matrix omnidirectional solution and comparison with (b) pressure-Poisson with Neumann boundary conditions.Dashed white lines in (a) indicate slice locations for (c, d).Dashed blue line in (c, d) indicates fin location.Flow is left to right.

Figure 17 .
Figure 17.Comparison of the pressures obtained from PIV measurements with pressure scanner pressures.(a) Comparing pressure-Poisson and matrix omnidirectional.(b) Convergence behavior for the matrix omnidirectional method.

Figure 18 .
Figure 18.Example domain for equation construction.NaN's indicate locations where data is not available (i.e., boundaries).Small blue numbers are the value of the boolean b j for the neighboring cell.Numbers in black at the cell center are the cell index.

Table 1 .
Coefficients A i j for the simple 3×3 grid example.