KFVM-WENO: A high-order accurate kernel-based finite volume method for compressible hydrodynamics

This paper presents a fully multidimensional kernel-based reconstruction scheme for finite volume methods applied to systems of hyperbolic conservation laws, with a particular emphasis on the compressible Euler equations. Non-oscillatory reconstruction is achieved through an adaptive order weighted essentially non-oscillatory (WENO-AO) method cast into a form suited to multidimensional reconstruction. A kernel-based approach inspired by radial basis functions (RBF) and Gaussian process (GP) modeling, which we call KFVM-WENO, is presented here. This approach allows the creation of a scheme of arbitrary order of accuracy with simply defined multidimensional stencils and substencils. Furthermore, the fully multidimensional nature of the reconstruction allows for a more straightforward extension to higher spatial dimensions and removes the need for complicated boundary conditions on intermediate quantities in modified dimension-by-dimension methods. In addition, a new simple-yet-effective set of reconstruction variables is introduced, which could be useful in existing schemes with little modification. The proposed scheme is applied to a suite of stringent and informative benchmark problems to demonstrate its efficacy and utility. A highly parallel multi-GPU implementation using Kokkos and the message passing interface (MPI) is also provided.


INTRODUCTION
The numerical solution of systems of hyperbolic conservation laws has been a vigorous subject of research for several decades now. A notable feature of hyperbolic conservation laws is their ability to simultaneously support complicated - but otherwise smooth - solutions and discontinuous solutions. The nonlinear nature of hyperbolic laws can turn initially smooth flows into non-smooth flows with shocks and discontinuities. We seek to numerically solve hyperbolic conservation laws in divergence form,

∂U/∂t + ∇ · F(U) = S(U), (1)

where U is a vector of conserved variables, F is a convex flux tensor, and S is a vector of source terms. In particular, we focus on the compressible Euler equations, defined by

U = (ρ, ρu, E)^T, F(U) = (ρu, ρu ⊗ u + pI, (E + p)u)^T, (2)

where ρ is the density, u the velocity, p the pressure, and E the total energy. Closely related kernel-based approaches include the RBF-CWENO method of Hesthaven et al. (2020) and the optimal recovery finite volume method (FVM) from Sonar (1996).
Finite difference methods (FDMs) readily generalize to higher dimensions by applying baseline one-dimensional schemes in a dimension-by-dimension manner (e.g., see Reyes et al. (2019)), which easily preserves the target one-dimensional solution accuracy in higher dimensions. The simplicity of achieving accuracy in higher spatial dimensions is mainly due to the fact that finite difference methods evolve pointwise quantities. In comparison, finite volume methods (FVMs) evolve volume-averaged (or cell-averaged) quantities, which complicates the design of reconstruction schemes beyond second order in multiple spatial dimensions (McCorquodale & Colella 2011; Zhang et al. 2011; Balsara et al. 2009; Balsara 2009; Bourgeois & Lee 2022). As such, numerical solutions from multidimensional finite volume methods with naive dimension-by-dimension spatial reconstruction will be limited to second order (Zhang et al. 2011; Buchmüller & Helzel 2014a; Lee et al. 2017).
A few key challenges arise in the development of "multidimensional polynomial" WENO schemes. Several practitioners have used multivariate polynomials to design novel high-order schemes (Balsara et al. 2009; Semplice et al. 2016). In general, however, multidimensional interpolation/reconstruction suffers from well-known complications wherein not all stencils result in a solvable linear system (e.g., see Section 3.5 in Reeves et al. (2022)). Moreover, the principal design choice of the grid stencil requires that the full local stencil be configured as small as possible while allowing the reconstructed polynomial to retain the desired degree for accuracy. The full local stencil should also be symmetric with respect to the grid, to avoid having any preferred direction and to allow decomposition into symmetrically placed substencils. The size and shape of the substencils also need to support the desired k-th degree multivariate polynomial reconstruction compatible with the full local stencil. Matching the full stencil and substencil sizes to the dimension of polynomial spaces while satisfying the symmetry requirements is generally impossible, and all of these issues become increasingly challenging as the degree k increases. Perhaps the simplest (sub)stencil configuration would be to use rectangular stencils that allow reconstruction in a tensor product basis, although this approach comes at the cost of larger stencils than necessary to reach a given order of accuracy. Alternatively, one could reconstruct polynomials in the least squares sense by using more cells per (sub)stencil than the dimension of the corresponding space of polynomials. See more discussion in Bourgeois & Lee (2022); Reeves et al. (2022).
Instead, one can bring one-dimensional reconstruction schemes into higher dimensions by adding transverse corrections to the baseline one-dimensional schemes for each additional direction (Buchmüller & Helzel 2014b). In two dimensions these schemes use cell-averaged quantities to reconstruct face-averaged quantities, which are subsequently used as input to a second reconstruction in the transverse direction to obtain point values on the cell faces. In three dimensions three separate reconstructions are needed (hence increasing computational expense), cascading from cell averages to face averages, then face averages to line averages, and finally line averages to point values. Centered at cell faces, the resulting point values, referred to as Riemann states, are by design constructed to attain high-order accuracy and numerical stability in multiple dimensions. These Riemann states can be passed to any Riemann solver, resulting in a pointwise-defined numerical flux at each cell face center. Similar transverse corrections can be run in reverse to generate the necessary face-averaged fluxes from point-valued fluxes. See more details in Buchmüller & Helzel (2014b).
An alternative can be designed that only applies the transverse corrections to arrive at multiple pointwise-defined Riemann states along each transverse direction on each face, and uses them to subsequently evaluate the flux integrals over each cell face through a quadrature rule (Zhang et al. 2011). As before, one uses a one-dimensional reconstruction scheme to build face-averaged quantities from the known cell-averaged quantities. In two dimensions a subsequent transverse reconstruction can be made and evaluated at several points on the face. In three dimensions the same cascade of reconstructions as described above must be done, but it needs to be repeated at multiple quadrature points on each 2D face to generate the appropriate pointwise-defined Riemann states.
These approaches unfortunately suffer from several essential drawbacks: (i) reconstruction of the Riemann states now requires multiple passes over the data, hence requiring extra loads from memory; (ii) near the domain boundaries, the intermediate face-averaged (and edge-averaged in 3D) quantities need to be filled in accordance with the boundary conditions, which may be non-trivial; and (iii) implementations relying on distributed memory parallelism need to either perform additional communications to pass these intermediate quantities between neighboring subdomains, or localize communication by utilizing much larger overlaps and repeating work adjacent to the local subdomain. In two dimensions, these drawbacks complicate the implementation mildly but are otherwise manageable. In three dimensions, however, they become increasingly problematic. These issues are all exacerbated if adaptive mesh refinement is to be used, though it is certainly possible (Buchmüller et al. 2016).
Taking a kernel-based approach to reconstruction alleviates many of the aforementioned issues. The full stencil and substencils can be chosen with size and symmetry as the primary motives. The accuracy of reconstruction (or interpolation) on any (sub)stencil is determined by which polynomials can be reproduced there by the chosen kernel and stencil. Kernel-based reconstruction can be seen as a generalization of the least squares approach to polynomial reconstruction, where the polynomial spaces assigned to each (sub)stencil are implicitly defined (Schaback & Wendland 2006; Wendland 2004). Fully multidimensional reconstruction is thus easy to formulate and produces pointwise-defined Riemann states that are obtained directly from cell-average data. This removes the need for any intermediate quantities, which greatly simplifies the parallel implementation and reduces reconstruction to a single pass over the data.
The present study builds on a series of recent studies exploring the use of radial basis functions (RBFs) and Gaussian process (GP) modeling as an alternative to traditional polynomial-based approaches. The first shock-capturing finite volume GP algorithm, called GP-WENO, was introduced in Reyes et al. (2018a); this study put forward a new way of designing a versatile selectable-order property in modeling the 1D Euler and magnetohydrodynamics (MHD) equations. In addition to the newly proposed one-dimensional GP reconstruction scheme therein, this work also refactored the conventional L2-based a priori shock-handling smoothness indicators for polynomial WENO reconstruction (Jiang & Shu 1996) into an alternative genuinely designed for polynomial-free GP reconstruction. Similarly, the RBF-CWENO method introduced in Hesthaven et al. (2020) (and the related one-dimensional scheme from Bigoni & Hesthaven (2017a)) combined radial basis functions and polynomials and proposed novel smoothness indicators that treated these different components separately. The GP paradigm has also been successfully developed in a finite difference method (FDM) setting. Interested readers can refer to the work by Reyes et al. (2019), where a fully three-dimensional conservative GP-WENO finite difference scheme solves a stringent set of benchmark problems for the Euler equations with improved solution accuracy and computational efficiency.
It is worth mentioning that the present study is limited to uniform Cartesian grids. Obviously, much simpler and cheaper finite difference methods (FDMs) could be used on these structured grids, and we acknowledge that the numerical examples presented herein could be solved with a lighter computational workload with FDMs. Regardless, we emphasize that the design of multidimensional finite volume methods remains a necessary task, as they are much easier to incorporate into solvers utilizing adaptive mesh refinement (AMR) and into solvers utilizing unstructured meshes. Both AMR and unstructured meshes lie beyond the scope of the present study, but the presented scheme is designed with possible extensions to these cases in mind. The few items that truly rely on the underlying uniform mesh will be called out. Naturally, demonstrating the efficacy of the proposed scheme in the simpler case of a uniform mesh is a necessary step towards constructing more complex solvers in the future.
The aforementioned schemes all employ a priori nonlinear limiting, which aims to suppress numerical oscillations before they can form. Unlike these a priori GP-WENO methods, a two-dimensional a posteriori GP-MOOD (Multidimensional Optimal Order Detection) finite volume scheme has been studied and reported in Bourgeois & Lee (2022). GP-MOOD adopts a set of high-order GP reconstruction methods (e.g., 3rd, 5th, and 7th order) in place of the polynomial reconstructions in the original polynomial-MOOD approach (Clain et al. 2011; Diot et al. 2012, 2013; Diot 2012). The resulting method is a strongly positivity-preserving GP-based solver for shock-dominant compressible flows. This new GP-MOOD further improves upon the ingredients of the original MOOD limiting in the polynomial MOOD methods (Clain et al. 2011; Diot et al. 2012, 2013; Diot 2012) by introducing a new Compressibility-Shock Detection (CSD) switch that strikes a good balance between numerical accuracy and diffusivity. Moreover, this CSD switch can improve Discontinuous Galerkin (DG) methods that adopt the polynomial MOOD shock detection as subcell-based limiting to switch from DG to an alternative shock-stable method (e.g., FVM) at shocked cells (Dumbser et al. 2014).
There have been other GP applications beyond its role in hyperbolic solvers. In the recent work by Reeves et al. (2022), the GP-WENO method was extended to a 3rd-order prolongation algorithm in finite volume adaptive mesh refinement (AMR) simulations using AMReX (Zhang et al. 2019). As a non-fluid-dynamics application, GP interpolation was shown to be an improved mathematical tool for upsampling optical characters from coarse resolutions to fine resolutions (Reeves et al. 2020).
This paper introduces the KFVM-WENO (kernel-based finite volume method with WENO) scheme and proceeds as follows. Section 2 develops the kernel-based reconstruction scheme that is central to this work. Section 3 incorporates WENO into the reconstruction scheme to handle discontinuous data, deliberates on the choice of variables for reconstruction, and closes with an adaptation of the KXRCF indicator (Krivodonova et al. 2004) to flag when nonlinear WENO limiting is required. Section 4 documents the incorporation of the self-adjusting positivity-preserving limiter detailed in Balsara (2012). Finally, the method in full is reviewed in Section 5, followed by a suite of benchmark problems in Section 6.

KERNEL-BASED RECONSTRUCTION
The first major component of a finite volume method for the solution of systems of conservation laws is the reconstruction of the state within each cell, which can then be evaluated to find high-order accurate pointwise-valued Riemann states along the boundary of the cell (or cell interface). This reconstruction, U(x), should simultaneously provide an accurate approximation of the true state and remain well-behaved in the presence of shocks or other discontinuities. In the present study, we pose this reconstruction problem as optimal recovery in a reproducing kernel Hilbert space (RKHS) (Hesthaven et al. 2020; Schaback & Wendland 2006; Sonar 1996; Rasmussen & Williams 2005). This approach yields a form that is mostly dimension-independent and allows great flexibility in the choice of stencils. We note that related but distinct formulations have been examined previously (Reyes et al. 2018b; Guo & Jung 2017; Liu & Jiao 2016; Aboiyar et al. 2010).
By convention, we refer to this RKHS approximation as reconstruction (or generalized interpolation) when it is designed to match cell averages, and note that other, more general input data could also be supplied. In this section, a kernel-based method for reconstruction is presented. The reconstruction presented here is linear with respect to the local stencil input data (i.e., it lacks nonlinear limiting) and is hence inappropriate for use near shocks. This is resolved in Section 3.

Asymmetric reconstruction
Consider a set of finite volumes Ω_h ⊂ R^d for h = 1, . . ., N, each with volume ||Ω_h|| measured in the standard Euclidean L^2 norm. Let us denote the cell averaging functional for Ω_h with respect to x ∈ Ω_h by λ_h^{(x)}, defined as

λ_h^{(x)} f = (1 / ||Ω_h||) ∫_{Ω_h} f(x) dx. (3)

The given data for reconstruction are cell averages of f(x), which we gather into the vector g whose entries are g_h = λ_h^{(x)} f(x). Asymmetric kernel reconstruction seeks an approximant f̃(x) of the form

f̃(x) = Σ_{l=1}^{N} a_l K(x, x_l) + Σ_{v=1}^{D} b_v x^{α_(v)}, (4)

which is a highly accurate approximation to the underlying function f(x). More specifically, we aim to find f̃(x*) ≈ f(x*) representing pointwise Riemann states at x*.
Here, K(x, y) : R^d × R^d → R is a symmetric positive definite kernel function (Schaback & Wendland 2006; Fasshauer & McCourt 2015b), and x_l denotes the center of the l-th cell. In this work we take the kernel function to be the squared exponential

K(x, y) = exp( -||x - y||^2 / (2ℓ^2) ), (5)

where the hyperparameter ℓ is a length scale.
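The integrals of this kernel over cells, needed below to assemble the reconstruction systems, have a closed form for rectangular cells: the squared exponential factors dimension-by-dimension, so only a one-dimensional integral involving the error function is required. A minimal Python sketch under our own (hypothetical) function names, not tied to any released KFVM-WENO code:

```python
import math

def se_kernel_1d(x, y, ell):
    """Squared exponential kernel K(x, y) = exp(-(x - y)^2 / (2 ell^2)) in 1D."""
    return math.exp(-(x - y) ** 2 / (2.0 * ell ** 2))

def se_cell_average_1d(a, b, c, ell):
    """Exact cell average of K(x, c) over [a, b] via the error function.

    Integrating exp(-(x - c)^2 / (2 ell^2)) from a to b gives
    ell * sqrt(pi/2) * (erf((b - c)/(sqrt(2) ell)) - erf((a - c)/(sqrt(2) ell))).
    """
    s = ell * math.sqrt(math.pi / 2.0)
    integral = s * (math.erf((b - c) / (math.sqrt(2.0) * ell))
                    - math.erf((a - c) / (math.sqrt(2.0) * ell)))
    return integral / (b - a)
```

In d dimensions the cell average over a rectangular cell is simply the product of d such one-dimensional averages, one per coordinate.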
The second summation in Eq. (4) augments the kernel expansion in the first summation with an additional polynomial expressed in the monomial basis. The monomials x^{α_(v)} span the space P_p^d of polynomials with maximum total degree p in d spatial dimensions. Each α_(v) is a multi-index (a d-tuple α ∈ N_0^d applied componentwise as a power, with |α| = α_1 + α_2 + . . .), where the subscript (v) indexes the D = dim P_p^d separate monomials. Here and throughout, parenthetical subscripts on boldface symbols (e.g., (v) on α_(v)) label one such vector or multi-index from a larger collection, while bare subscripts on plain symbols (e.g., h on g_h) indicate individual entries of a vector, matrix, or multi-index.
The polynomial term in Eq. (4) is not strictly required when using kernels such as the squared exponential, though it would be necessary for well-posedness of the reconstruction problem when using conditionally positive definite kernels such as thin-plate splines (Schaback & Wendland 2006; Flyer et al. 2016). However, the inclusion of this polynomial term is still useful, as it allows the asymptotic accuracy of the reconstruction process to be maintained independent of the hyperparameter ℓ (Flyer et al. 2016). This allows ℓ to be fixed at moderate lengths to avoid problems with extremely ill-conditioned linear systems. The inclusion of the polynomial term also forgoes any need for temporary elevations of precision, as present in Reyes et al. (2018a), or the use of elaborate stable inversion schemes (Fasshauer & McCourt 2012; Fornberg & Wright 2004; Wright & Fornberg 2017a).
Two important observations are in order. First, the polynomial term does not restrict stencil selection in the way seen in pure high-order piecewise polynomial schemes. The only requirement is that the included space of polynomials be unisolvent over the stencil; in practice this means picking a stencil first and subsequently choosing the maximum polynomial degree p to maintain unisolvence (see Flyer et al. (2016) for a more fundamental discussion of this approach). Second, the form of the approximant f̃(x) in Eq. (4) is very generic and places little to no restriction on the stencil layout or the shape of cells. These properties afford great flexibility in the choice of KFVM-WENO's multidimensional stencil (see more discussion in Section 3.1), relieving us from the stencil-related issues described in Section 1.
Enforcing that Eq. (4) matches the given cell-averaged data g = (g_1, . . ., g_h, . . ., g_N)^T requires that the coefficient vectors a and b satisfy

Q a + P b = g. (7)

The entries of the N × N kernel matrix Q correspond to the integrals of the kernel function K anchored at each x_l, l = 1, . . ., N, over each cell Ω_h, i.e.,

Q_{hl} = λ_h^{(x)} K(x, x_l). (8)

Similarly, the entries of the N × D matrix P are the integrals of the monomials x^{α_(v)} over each cell Ω_h, i.e.,

P_{hv} = λ_h^{(x)} x^{α_(v)}. (9)

At present the linear system Eq. (7) is underdetermined since there are N + D unknowns in a and b but only N equations. To resolve this issue, we follow the same approach used in the RBF literature (e.g., see Bigoni & Hesthaven (2017b)) and further require that Eq. (4) exactly reproduce polynomials spanned by the monomials present in the second summation. More specifically, this means that the set of coefficients a ∈ R^N must lie orthogonal to the polynomial space, and hence must satisfy P^T a = 0. This forces the first summation in Eq. (4) to only fit data that lies outside the span of the monomial terms. Putting things together, the coefficients a and b are found by solving the (N + D) × (N + D) block linear system

[ Q    P ] [ a ]   [ g ]
[ P^T  0 ] [ b ] = [ 0 ]. (10)

Returning to Eq. (4) and assuming a and b are now known, the approximant f̃ ≈ f can be evaluated at a point x* via

f̃(x*) = Σ_{l=1}^{N} a_l K(x*, x_l) + Σ_{v=1}^{D} b_v (x*)^{α_(v)} (11)
       = T^T a + S^T b (12)
       = [ T^T  S^T ] [ a ; b ], (13)

where the entries of T and S are respectively given by

T_l = K(x*, x_l), (14)
S_v = (x*)^{α_(v)}. (15)

However, building and solving the block system in Eq. (10) every time reconstruction is needed would be very computationally expensive. Fortunately, as hinted by Eq. (13), most of the work being done in this process is independent of the data g that varies temporally and spatially. Considering the terms other than the data vector g yields that a reconstruction vector r could instead be computed as the solution of

[ Q    P ]^T [ r ]   [ T ]
[ P^T  0 ]   [ w ] = [ S ], (16)

which gives

f̃(x*) = [ T^T  S^T ] [ a ; b ] = [ r^T  w^T ] [ g ; 0 ] (17)
       = r^T g + w^T 0. (18)

This reduces the evaluation of the approximant in Eq. (13) to a single dot product,

f̃(x*) = r^T g. (19)

Note that the weights w associated with the monomial terms are not explicitly needed for anything. This construction can be viewed as an optimization problem in the associated RKHS wherein w are the Lagrange multipliers from the polynomial exactness constraints. Since w is not directly used, the cost of producing one such point value depends only on the stencil size N and is independent of the maximal polynomial degree p. Each evaluation point x* will need its own reconstruction vector. The introduction of WENO in Section 3 will further require different reconstruction vectors for each substencil. Once a mesh configuration is determined, the reconstruction vectors r can easily be precomputed in an initialization step for our KFVM-WENO solver as a whole. This means that reconstruction to a point value only requires a single dot product (see Eq. (19)) between vectors whose size is determined by the stencil. For the uniform grids considered here we only need to compute one set of reconstruction vectors for a generic stencil and its substencils, detailed in Section 3. For unstructured grids a set of reconstruction vectors would need to be cached for each cell in the mesh, as no two cells are likely to have identical stencils. Fortunately, the calculation of the various reconstruction vectors is trivially parallelizable and needs to be done only once during initialization.
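To make the procedure concrete, the following self-contained one-dimensional Python sketch assembles the block matrix, solves its transpose for a reconstruction vector r, and checks polynomial exactness by reconstructing the cell averages of x^3 at a cell face. The names and the toy five-cell stencil are ours, not the paper's implementation:

```python
import math

def se_cell_avg(a, b, c, ell):
    """Exact average of exp(-(x - c)^2 / (2 ell^2)) over the cell [a, b]."""
    s = ell * math.sqrt(math.pi / 2.0)
    return s * (math.erf((b - c) / (math.sqrt(2.0) * ell))
                - math.erf((a - c) / (math.sqrt(2.0) * ell))) / (b - a)

def gauss_solve(A, rhs):
    """Dense Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [r] for row, r in zip(A, rhs)]
    for col in range(n):
        piv = max(range(col, n), key=lambda i: abs(M[i][col]))
        M[col], M[piv] = M[piv], M[col]
        for i in range(col + 1, n):
            f = M[i][col] / M[col][col]
            for j in range(col, n + 1):
                M[i][j] -= f * M[col][j]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def reconstruction_vector(cells, xstar, ell, pmax):
    """Solve the transposed saddle-point system so that f(xstar) ~= r . g."""
    N, D = len(cells), pmax + 1
    centers = [0.5 * (a + b) for a, b in cells]
    n = N + D
    M = [[0.0] * n for _ in range(n)]
    for h, (a, b) in enumerate(cells):
        for l in range(N):
            M[h][l] = se_cell_avg(a, b, centers[l], ell)   # kernel block Q
        for v in range(D):
            pv = (b ** (v + 1) - a ** (v + 1)) / ((v + 1) * (b - a))
            M[h][N + v] = pv                               # monomial block P
            M[N + v][h] = pv                               # constraint block P^T
    Mt = [[M[j][i] for j in range(n)] for i in range(n)]   # Q need not be symmetric
    rhs = [math.exp(-(xstar - c) ** 2 / (2.0 * ell ** 2)) for c in centers]  # T
    rhs += [xstar ** v for v in range(D)]                                    # S
    return gauss_solve(Mt, rhs)[:N]

# Five unit cells centered on the origin; reconstruct at the right face x* = 1/2.
cells = [(i - 0.5, i + 0.5) for i in range(-2, 3)]
r = reconstruction_vector(cells, 0.5, 1.0, 3)
# Cell averages of f(x) = x^3; polynomial exactness implies r . g = (1/2)^3.
g = [(b ** 4 - a ** 4) / (4.0 * (b - a)) for a, b in cells]
value = sum(ri * gi for ri, gi in zip(r, g))
```

Applying the same r to the cell averages of any polynomial up to degree pmax returns its exact point value at x*; for general data the accuracy is set by the stencil and kernel, as discussed above.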

Ties to Gaussian Processes
We note that this reconstruction method bears similarity to evaluations of the updated posterior predictive mean of a Gaussian process (GP) in the zero-noise limit (Reyes et al. 2018b, 2019; Bourgeois & Lee 2022; Fasshauer & McCourt 2015a; Rasmussen & Williams 2005; Bishop 2007). Note that in general

λ_h^{(x)} K(x, x_l) ≠ λ_l^{(x)} K(x, x_h), (20)

hence the matrix Q in Eq. (8) is not symmetric. However, cell averaging functionals λ_h^{(x)} could also be incorporated into the kernel expansion in Eq. (4), thus symmetrizing the resulting system. In doing so, Q can be interpreted as a covariance matrix for a Gaussian process, as was done to great success in Reyes et al. (2018b, 2019); Bourgeois & Lee (2022).
We make another consequential remark on GP reconstruction. Viewing the reconstruction procedure through the lens of Gaussian processes could provide insight into the optimization of the length scale ℓ, and possibly into uncertainty quantification via an updated posterior covariance kernel (Rasmussen & Williams 2005; Bishop 2007). As length scale optimization and uncertainty quantification lie beyond the scope of this work, the remainder of this paper will only concern the deterministic interpretation of kernel reconstruction presented in the preceding section.

NONLINEAR RECONSTRUCTION
The reconstruction scheme discussed so far is linear with respect to the input data, and will unavoidably generate unwanted oscillations in the vicinity of shocks and discontinuities. In this section, a weighted essentially non-oscillatory (WENO) scheme is developed for these kernel-based reconstructions to control such oscillations. The great flexibility of kernel-based reconstruction allows many different stencil and substencil configurations.

Stencils and substencils
We consider "spherical" stencils of radius R = 2, 3; see Figs. 1 and 2. These stencils are unisolvent (i.e., uniquely solvable) over monomials of total degree 3 and 5, respectively, and hence yield reconstructions accurate at least to order O(∆^{2R}), where ∆ is the grid spacing. Careful selection of the hyperparameter ℓ can yield higher accuracy, tending towards O(∆^{2R+1}), which we discuss further in Section 5.1. Examples of two-dimensional stencils can be seen in Fig. 1 and Fig. 2 for R = 2 and R = 3, respectively.
Let S_0 = {(i, j) ∈ Z^2 : i^2 + j^2 ≤ R^2} denote the full circular stencil of radius R, indexed relative to the central cell where (i, j) = (0, 0) and reconstruction is being performed. This full stencil S_0 is then broken into N_S = 5 substencils S_q, including one central substencil of smaller radius, S_1, and four other biased substencils, S_2, . . ., S_5.

Figure 1. The full radius-2 stencil S_0 is shown on the left with its five substencils, S_q, q = 1, . . ., 5, on the right. The cell with the diamond in each (sub)stencil indicates the central cell containing x* where reconstruction is performed. The full stencil S_0 has 13 cells, while each of the substencils has five.

Stencils in three dimensions are formed in precisely the same way. First the full stencil is set as S_0 = {(i, j, k) ∈ Z^3 : i^2 + j^2 + k^2 ≤ R^2}, again indexed relative to the central cell where (i, j, k) = (0, 0, 0). This stencil is then split into N_S = 7 substencils, including one central substencil of smaller radius, S_1, and six other biased substencils, S_2, . . ., S_7. For a given point x* in the cell where point values are sought, Eq. (7) through Eq. (19) furnish a different reconstruction vector for each (sub)stencil. We denote the q-th such reconstruction vector by r_{(q)} on S_q for each q = 0, . . ., N_S.

Figure 2. The full radius-3 stencil S_0 is shown on the left with its five substencils, S_q, q = 1, . . ., 5, on the right. The cell with the diamond in each (sub)stencil indicates the central cell containing x* where reconstruction is being performed. The full stencil S_0 has 29 cells, the central substencil S_1 has 13 cells, and each of the remaining biased substencils, S_2, . . ., S_5, has 10.
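The stencil index sets are straightforward to enumerate programmatically. A small sketch (our own helper, assuming the i^2 + j^2 ≤ R^2 criterion stated above), which reproduces the 13-cell count of the radius-2 stencil:

```python
from itertools import product

def spherical_stencil(R, dim=2):
    """Offsets (i, j[, k]) whose squared distance from the central cell is <= R^2."""
    return [idx for idx in product(range(-R, R + 1), repeat=dim)
            if sum(c * c for c in idx) <= R * R]
```

The same helper serves in three dimensions by passing dim=3; substencils would then be carved out of this full index set.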
Note that the (sub)stencils, as written, consist of collections of unit cells (i.e., cell size being unity) rather than cells matching the grid scale. The integrals required for filling the matrix Q in Eq. (8) and the sample vector T in Eq. (14) can easily be calculated over these unit cells. This change of variables only requires that the length scale ℓ be replaced by ℓ/∆ inside the kernel; it naturally cancels out the pre-factor that scales by the cell volume and decouples the calculation of the reconstruction vectors r_{(q)} from the description of the grid. Additionally, this reformulation will be exploited to generate grid-independent smoothness indicators in Section 3.3.
Furthermore, these particular choices of substencils are the only remaining item that is specific to uniform grids, though there would be no great difficulty in selecting substencils on unstructured grids.

Adaptive order WENO
A traditional WENO method leverages only the substencils (i.e., excluding the full stencil S_0) to recover the action of the full stencil S_0 indirectly through cleverly chosen nonlinear weights, ω_q, that tend towards appropriately chosen linear weights, γ_q, when the contained data is smooth. Mathematically speaking, recovering the action of the full stencil is only possible when the 0-th reconstruction vector r_{(0)} on S_0 lies in the span of the remaining reconstruction vectors, i.e., r_{(0)} ∈ span{r_{(q)} : q = 1, . . ., N_S}, assuming the reconstruction vectors of the substencils, r_{(q)}, q = 1, . . ., N_S, have been appropriately extended by zero-padding to match the dimension of r_{(0)}. The exact linear weights (if found) would then satisfy r_{(0)} = Σ_{q=1}^{N_S} γ_q r_{(q)}. Unfortunately, however, the full stencil reconstruction vector r_{(0)} does not lie in the span of the remaining reconstruction vectors.
Instead, inexact linear weights could be found by solving a least-squares problem (e.g., see Reyes et al. (2018a) for a 1D GP-WENO in FVM and Reyes et al. (2018b) for a multidimensional GP-WENO in FDM). Unfortunately, we have observed a degradation in accuracy when applying this approach to multidimensional FVM reconstruction; hence the least-squares approach is not appropriate for the current study.
Alternatively, the adaptive order WENO method (WENO-AO) employs the full stencil S_0 directly and selects weights solely to provide stability (Balsara et al. 2016b). Given a vector of cell averages g^{(q)} of f(x) supported on the q-th (sub)stencil, KFVM-WENO forms the reconstruction (see also Appendix A in Reyes et al. (2019))

f̃(x*) = (ω_0 / γ_0) ( r_{(0)}^T g^{(0)} - Σ_{q=1}^{N_S} γ_q r_{(q)}^T g^{(q)} ) + Σ_{q=1}^{N_S} ω_q r_{(q)}^T g^{(q)}. (21)

Note that if ω_q → γ_q for all q in the absence of discontinuities on the highest order full stencil S_0, the coefficient on the first term tends to one and the term in parentheses tends to zero. Hence, when the nonlinear weights ω_q approach the linear weights γ_q, the highest accuracy reconstruction r_{(0)}^T g^{(0)} is selected, so full accuracy can be maintained independently of how the γ_q are chosen. Alternatively, if some substencils contain a discontinuity, then those weights will fall to zero, thereby minimizing the contribution from those substencils. In this case, the full stencil will obviously also contain the discontinuity, so ω_0 → 0, leaving only the substencils with smooth data to participate in the reconstruction.
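A small sketch of this adaptive-order blending, following the WENO-AO form of Balsara et al.; the scalar point values below stand in for the r^T g products, and the helper name is ours:

```python
def weno_ao_combine(values, gammas, omegas):
    """Adaptive-order blend of (sub)stencil point values.

    values[0]/gammas[0]/omegas[0] belong to the full stencil S_0; the rest to
    the substencils.  When omegas equals gammas the result is exactly values[0].
    """
    # High-order remainder: full-stencil value minus the linear substencil blend.
    hi = values[0] - sum(g * v for g, v in zip(gammas[1:], values[1:]))
    # Nonlinearly weighted substencil contribution.
    lo = sum(w * v for w, v in zip(omegas[1:], values[1:]))
    return (omegas[0] / gammas[0]) * hi + lo
```

With smooth data (omegas close to gammas) the full-stencil value is recovered; if the full-stencil weight collapses to zero, only the smooth substencils contribute.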

Smoothness indicators and weights
The nonlinear weights, including the unnormalized ω̃_q and the associated normalized ω_q, and the smoothness indicators β_q within them, must be specified appropriately for the reconstruction in Eq. (21) to simultaneously provide accurate reconstructions for smooth data and limited reconstructions for non-smooth data. In this section, we introduce multidimensional smoothness indicators β_q on each S_q for our KFVM-WENO, which measure the relative smoothness of the reconstructed function f̃_q in Eq. (4) over each S_q in a genuinely multidimensional way.
We design these indicators β_q so that they approximate the square of a scaled H^p semi-norm of f̃_q, where we set

|f̃_q|^2_{H^p} = Σ_{1≤|α|≤p} ∆^{2|α|-1} ∫_Ω ( ∂^{|α|} f̃_q(x) / ∂x^α )^2 dx (22)
              = ∆^{d-1} Σ_{1≤|α|≤p} ∫_Ω̂ ( ∂^{|α|} f̃_q(x̂) / ∂x̂^α )^2 dx̂. (23)

Here, Ω is the central cell where reconstruction is being performed and α is a multi-index.
The factor ∆^{2|α|-1} is present in the first formula, Eq. (22), to make the quantity in the parentheses independent of the grid scale, mimicking the use of undivided differences in standard polynomial WENO methods (see, for example, Jiang & Shu 1996). The second line, Eq. (23), makes the change of variables x = x̂∆ (whereby Ω̂ is introduced accordingly), which absorbs one factor of ∆ and places this in a formulation similar to the discussion regarding reconstruction vectors in Section 3.1.
The scaled H^p semi-norm in the second line, Eq. (23), cannot be integrated in closed form. To define β_q such that β_q ≈ |f̃_q|^2_{H^p}, we use a simple midpoint quadrature rule,

β_q = Σ_{1≤|α|≤p} ( ∂^{|α|} f̃_q / ∂x̂^α (0) )^2, (24)

where the coordinate system has been shifted to place the center of the cell where reconstruction is occurring at the origin, as denoted by 0 in Eq. (24). The evaluation of the partial derivatives of f̃_q in Eq. (24) from the given cell-average data g^{(q)} on the q-th (sub)stencil proceeds in much the same way as the reconstruction of Riemann states along the cell interface in Eqs. (11) to (13), except that we now need a differential version of the sample vector to compute those partial derivatives in place of the integral version in Eq. (14). This is accomplished by replacing the sample vector T in Eq. (14) by the differential version

T_l = ∂^{|α|} K(x, x_l) / ∂x^α |_{x=0}, (25)

and similarly replacing S by derivatives of the monomial terms. In addition, we follow a similar process to Eqs. (16) to (18) to obtain a separate reconstruction vector r required for each partial derivative ∂^{|α|}/∂x^α.
Increasing the maximum order of differentiation, p in Eq. (23), improves the detection of discontinuous data and gives a more diffusive scheme overall. However, the calculation of these smoothness indicators is the most expensive part of the reconstruction process, and hence p should be chosen as small as possible while still avoiding the appearance of unwanted oscillations. Here we set p = R, yielding 5 (or 9) derivatives in two dimensions for radius 2 (or 3) stencils and 9 (or 19) derivatives in three dimensions for radius 2 (or 3) stencils. We have observed that these orders of derivatives fully suffice to suppress oscillations, while higher orders incur higher computational costs for little to no benefit.
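The derivative counts quoted above follow from enumerating the multi-indices with 1 ≤ |α| ≤ p; a quick check (our own helper):

```python
from itertools import product

def derivative_multi_indices(p, dim):
    """All multi-indices alpha with 1 <= |alpha| <= p in `dim` spatial dimensions."""
    return [a for a in product(range(p + 1), repeat=dim) if 1 <= sum(a) <= p]
```

For p = R = 2 in two dimensions this yields the 5 derivatives quoted above, and similarly for the other cases.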
Finally, the nonlinear weights ω_q are generated from the smoothness indicators β_q in the standard WENO-JS form (Jiang & Shu 1996). The unnormalized weights for each stencil are calculated from the β_q, where ϵ is included to avoid division by zero, and are then normalized to obtain the so-called normalized weights ω_q. There are several ways to set a value for ϵ. First, ϵ can be set to a fixed small number independent of the grid scale, as is done in many WENO schemes (see, e.g., Jiang & Shu (1996)). In contrast, numerous follow-up studies have shown that ϵ must vary with the grid scale to maintain high-order accuracy for many schemes, particularly in the resolution of critical points (see, e.g., Henrick et al. (2005); Cravero et al. (2018)). Here we take the former route and set ϵ = 10^−40, which provides excellent robustness in the vicinity of very strong shocks. We have also tested a grid-dependent scaling satisfying ϵ ∼ ∆^R and observed improved accuracy in some cases, but no general improvement; hence a constant ϵ is chosen in the current study. Moreover, in Section 3.5, we introduce indicators that allow nonlinear limited WENO reconstruction to be bypassed entirely, so the simpler and more robust choice of a single fixed ϵ does not degrade the accuracy of the method as a whole.
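The weight construction can be sketched as follows, assuming the standard Jiang & Shu (1996) unnormalized form γ_q/(ϵ + β_q)² with linear weights γ_q (the exact formula is an assumption here, though it is the cited WENO-JS choice):

```python
import numpy as np

def weno_js_weights(beta, gamma_lin, eps=1e-40):
    """WENO-JS nonlinear weights from smoothness indicators `beta` and
    linear (optimal) weights `gamma_lin`."""
    beta = np.asarray(beta, dtype=float)
    gamma_lin = np.asarray(gamma_lin, dtype=float)
    w = gamma_lin / (eps + beta) ** 2   # unnormalized weights
    return w / w.sum()                  # normalized weights

# Smooth data: comparable beta, weights stay near the linear ones.
print(weno_js_weights([1e-6, 1e-6, 1e-6], [0.1, 0.6, 0.3]))
# Rough data on stencil 0: its large beta collapses its weight.
print(weno_js_weights([1.0, 1e-6, 1e-6], [0.1, 0.6, 0.3]))
```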

Variables for reconstruction
The multidimensional WENO reconstruction procedure detailed in the prior subsections is only defined for scalar fields; thus, reconstructing the Riemann states requires that the cell average states be separated into scalar fields. It is well known that performing limited reconstruction directly on the conservative variables leads to oscillatory results in the vicinity of shocks. As such, the cell averages of the conservative variables throughout the stencil need to be converted to another form prior to reconstruction.
Generically, for a system of N conservation laws, one can fix a transformation matrix Φ of size N × N. Denoting the vector of cell-averaged conservative variables over cell Ω_h by ⟨U⟩_h, a conversion from ⟨U⟩_h to a choice of reconstruction variable ⟨W⟩_h is done by ⟨W⟩_h = Φ ⟨U⟩_h (Eq. 28). Reconstruction then proceeds over ⟨W⟩ component-wise, yielding a pointwise estimate of the state, w_s, in the transformed variables.
The pointwise defined Riemann state u_s can be obtained as u_s = Φ^{−1} w_s (Eq. 29). Crucially, Φ must be constant throughout the stencil for the accuracy of the reconstruction to be maintained. In the following two subsections, we describe two ways to choose the transformation matrix Φ. Section 3.4.1 briefly overviews the most popular choice, characteristic variables, which is computationally expensive and restricted to governing systems where these variables are known. In Section 3.4.2, we address these issues by introducing a new approach using so-called linearized primitive variables.
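The role of a fixed Φ can be illustrated with a toy sketch: for any linear reconstruction operator (here simply the stencil mean, a stand-in), transforming, reconstructing, and transforming back recovers exactly what reconstruction in the original variables would give; it is the nonlinear WENO weights that make the choice of variables matter. The matrix Φ below is an arbitrary invertible placeholder, not one from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
Phi = rng.normal(size=(5, 5)) + 5.0 * np.eye(5)  # arbitrary invertible N x N
Phi_inv = np.linalg.inv(Phi)

U_avg = rng.normal(size=(9, 5))   # cell averages over a hypothetical 9-cell stencil
W_avg = U_avg @ Phi.T             # <W>_h = Phi <U>_h, applied cell by cell
w_s = W_avg.mean(axis=0)          # the stencil mean stands in for a linear
u_s = Phi_inv @ w_s               # reconstruction operator; map back via Phi^{-1}

# A fixed Phi commutes with any *linear* reconstruction; only the nonlinear
# WENO weights make the choice of variables matter.
assert np.allclose(u_s, U_avg.mean(axis=0))
```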

Characteristic variables
It is widely known in the computational fluid dynamics community that the best results come from limited reconstruction in characteristic variables; see, e.g., Van Leer (2006). For each fixed direction η and reference state ⟨U⟩, the flux Jacobian A is evaluated from the η-directional flux F_η. Then the eigendecomposition A = RΛL is computed, and the transformation matrices are set as Φ = L and Φ^{−1} = R. The reference state ⟨U⟩ is chosen as the cell average in the central cell of the stencil. Note that since the eigenvectors of the flux Jacobian depend nonlinearly on the components of the reference state ⟨U⟩, this only provides a second-order accurate approximation of the wave structure within the central cell; however, the limited accuracy of the wave structure does not influence the accuracy of the reconstruction, as all that matters is that Φ be fixed over the whole stencil and that Φ^{−1} be its exact inverse. This characteristic reconstruction serves to optimally separate the originally coupled local flow variables into nearly independent, decoupled characteristic components, so that the influence of a discontinuity in one component can be isolated from the other components during reconstruction.
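A minimal numerical illustration, using the textbook one-dimensional Euler flux Jacobian rather than the paper's multidimensional notation; the eigendecomposition is computed with a generic solver and verified to diagonalize A:

```python
import numpy as np

gamma = 1.4

def flux_jacobian_1d(rho, u, p):
    """Flux Jacobian A = dF/dU for the 1D Euler equations (textbook form,
    written in terms of the total specific enthalpy H)."""
    E = p / (gamma - 1.0) + 0.5 * rho * u**2
    H = (E + p) / rho
    return np.array([
        [0.0, 1.0, 0.0],
        [0.5 * (gamma - 3.0) * u**2, (3.0 - gamma) * u, gamma - 1.0],
        [u * (0.5 * (gamma - 1.0) * u**2 - H), H - (gamma - 1.0) * u**2, gamma * u],
    ])

# Reference state (playing the role of the central cell average).
rho0, u0, p0 = 1.0, 0.5, 1.0
A = flux_jacobian_1d(rho0, u0, p0)
lam, R = np.linalg.eig(A)        # columns of R are right eigenvectors
L = np.linalg.inv(R)             # so Phi = L and Phi^{-1} = R
a = np.sqrt(gamma * p0 / rho0)   # sound speed

# The wave speeds are u - a, u, u + a, and L A R is diagonal.
assert np.allclose(np.sort(lam.real), np.sort([u0 - a, u0, u0 + a]))
assert np.allclose(L @ A @ R, np.diag(lam))
```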

Linearized primitive variables
A critical drawback to using characteristic variables is that the nonlinear weights within the WENO reconstruction must be recalculated whenever the direction η changes. On structured three-dimensional grids, these nonlinear weights need to be calculated multiple times for each cell: once in the x−direction for the reconstruction of the (i ± 1/2, j, k) Riemann states, again in the y−direction for the (i, j ± 1/2, k) Riemann states, and once again in the z−direction for the (i, j, k ± 1/2) Riemann states. This is of course not an issue for one-dimensional reconstruction schemes, since they only act in a single fixed direction, but for multidimensional reconstruction it is beneficial to avoid these repeated calculations. The situation worsens further on unstructured grids, where the nonlinear weights must be re-evaluated for each and every face of the cell.
A cost-effective alternative is available in the primitive variables, which provide much better reconstruction than conservative variables in the vicinity of shocks, with reduced oscillations. Being directionally independent, they are more computationally friendly than characteristic variables, needing only half or a third as many nonlinear weight calculations in two and three dimensions, respectively (with even greater savings on unstructured grids). Obviously, the primitive variables depend nonlinearly on the conservative variables; naively converting all cell-averaged conservative variables ⟨U⟩_h in the stencil to primitive variables will reduce the accuracy to second order immediately and irreversibly. To address this, we introduce a new approach below.
Let V denote the primitive variables associated with U, which for the compressible Euler equations are V = (ρ, u_1, u_2, u_3, p), and again set ⟨U⟩ as a reference state in conservative variables matching the local cell average. The irreversible reduction to second-order accuracy can be ameliorated by linearizing the map from conservative to primitive variables around the reference state as V ≈ V(⟨U⟩) + (∂V/∂U)|_{⟨U⟩} (U − ⟨U⟩) (Eq. 31). This gives an approximate primitive state V corresponding to any given conservative state U. By linearity, Eq. (31) can be directly averaged over a cell to obtain an approximation of the average primitive state simply by using a cell-averaged conservative state as the input. This can be applied over the entire stencil to generate approximate cell average values of the primitive variables ⟨V⟩_h from each ⟨U⟩_h. We emphasize that the produced cell averages are only approximate, and do not constitute a high-order representation of the averaged primitive variables.
The constant part of Eq. (31) has no influence on the nonlinear weights, since it disappears under the derivatives in Eq. (24). Dropping the constant part and applying Eq. (28), we get ⟨W⟩_h = (∂V/∂U)|_{⟨U⟩} ⟨U⟩_h as the cell-averaged values of the linearized primitive variables for reconstruction. Reconstruction can now proceed component-wise over these variables, resulting in the pointwise variable w_s. Following Eq. (28) and Eq. (29), this establishes the transformation matrices Φ = (∂V/∂U)|_{⟨U⟩} and Φ^{−1} = (∂U/∂V)|_{⟨U⟩} (Eq. 35), where the full forms are given in Eq. (A2) and Eq. (A1), respectively. We also note that finding Φ and Φ^{−1} is substantially simpler than finding the eigendecomposition of the flux Jacobian. The entries of both transformation matrices are available simply by differentiating the maps between the conservative and primitive variables. In particular, the inverse transformation matrix is directly available without performing any symbolic or numerical inversion. This approach may be interesting for more complicated systems of conservation laws, e.g., the ideal magnetohydrodynamics equations, radiation (magneto)hydrodynamics, multi-physics multi-fluid equations, or systems with non-convex fluxes. As a final note, the transformation matrices themselves contain a large number of zeros, and the actual conversions can be done more cheaply by leveraging this fact. This is in contrast to the characteristic variables, where the transformation matrices are dense and full.
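The sketch below derives Φ = ∂V/∂U and Φ^{−1} = ∂U/∂V for the 3D Euler system by differentiating u_i = m_i/ρ and p = (γ − 1)(E − |m|²/(2ρ)); this is our own derivation in the spirit of Eq. (35), not a copy of the paper's appendix. The product check confirms that no numerical inversion is needed, and the sparsity mentioned in the text is visible in the many zero entries:

```python
import numpy as np

gamma = 1.4

def dV_dU(rho, u, p):
    """Phi = dV/dU for 3D Euler with V = (rho, u1, u2, u3, p) and
    U = (rho, m1, m2, m3, E); a sketch, not the paper's Eq. (A2)."""
    u = np.asarray(u, dtype=float)
    P = np.zeros((5, 5))
    P[0, 0] = 1.0
    for i in range(3):
        P[1 + i, 0] = -u[i] / rho        # d(m_i/rho)/d rho
        P[1 + i, 1 + i] = 1.0 / rho      # d(m_i/rho)/d m_i
    P[4, 0] = 0.5 * (gamma - 1.0) * (u @ u)
    P[4, 1:4] = -(gamma - 1.0) * u
    P[4, 4] = gamma - 1.0
    return P

def dU_dV(rho, u, p):
    """Phi^{-1} = dU/dV, available directly without any matrix inversion."""
    u = np.asarray(u, dtype=float)
    Q = np.zeros((5, 5))
    Q[0, 0] = 1.0
    for i in range(3):
        Q[1 + i, 0] = u[i]               # d(rho u_i)/d rho
        Q[1 + i, 1 + i] = rho            # d(rho u_i)/d u_i
    Q[4, 0] = 0.5 * (u @ u)
    Q[4, 1:4] = rho * u
    Q[4, 4] = 1.0 / (gamma - 1.0)
    return Q

rho, u, p = 1.2, [0.3, -0.4, 0.1], 0.9
assert np.allclose(dV_dU(rho, u, p) @ dU_dV(rho, u, p), np.eye(5))
```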

Local smoothness indicators
The described WENO procedure is effective, but considerably more expensive than directly forming an unlimited high-accuracy reconstruction using the whole stencil. On the other hand, there are relatively few cells where the nonlinear limited WENO reconstruction is truly needed. For example, the study in Bourgeois & Lee (2022) suggests that, even in highly compressible, shock-dominant 2D tests where the initial Mach number reaches as high as 800 (e.g., see Section 6.4), less than 10% of all cells need nonlinear limiting, while the remaining 90% or more evolve well without any expensive nonlinear limiting. Hence, if a relatively cheap indicator can flag the cells that actually need WENO, then the cheaper unlimited reconstruction can be used elsewhere. MOOD-style schemes (e.g., Clain et al. (2011); Bourgeois & Lee (2022)) take this idea to the extreme: unlimited reconstruction is attempted everywhere, and order reduction is applied systematically to counteract spurious oscillations near discontinuities. MOOD schemes are inherently a-posteriori in their action, which can cause challenging load-balancing issues when they are incorporated into highly parallel solvers.
The KXRCF indicator introduced in Krivodonova et al. (2004), and the related limiter of Fu & Shu (2017), operate in an a-priori manner as flags for troubled cells in discontinuous Galerkin (DG) methods. The underlying mechanism for this indicator is related to the superconvergence properties of DG methods in the presence of outflows. Fortunately, the overall idea of the KXRCF indicator is general enough to adapt to the present finite volume method, and could potentially be applied more broadly to other a-priori limited finite volume methods. Crucially, this indicator is much cheaper to calculate than WENO is to apply, so the added cost of evaluating the KXRCF indicator over all cells is generally negligible. Operationally, if a significant fraction of cells are marked for WENO limited reconstruction, the indicator could be deactivated and WENO reconstruction performed everywhere.
The essential idea behind the KXRCF indicator is to flag cells that exhibit large jumps in some value across cell faces shared with neighboring cells. Unlimited reconstruction is first applied to all cells to generate all Riemann states throughout the domain. Let s index all quadrature points on all faces of a given cell, and let superscripts (−) and (+) indicate states obtained from the current cell and the relevant adjacent neighbor, respectively. Then the absolute jump of an indicator variable representing some quantity q at the s-th quadrature point is |q_s^(−) − q_s^(+)|. Similarly, let ⟨Q⟩ denote the same quantity evaluated using the cell average. For our purposes here, we use the entropy q = p/ρ^γ as the indicator variable, which then yields ⟨Q⟩ ≈ ⟨p⟩/⟨ρ⟩^γ. For smooth data the jumps across faces should be small, ideally O(∆^{2R}), while for rough data the jumps should be large. Hence, when the condition in Eq. (36) holds for some power m, the cell likely does not need any form of nonlinear limiting. Alternatively, when the relative jump in q is large at any quadrature point, the cell is flagged as needing WENO. This form is more conservative than that designed in Krivodonova et al. (2004), but works quite well in the context of the present finite volume method. The power m on ∆ in Eq. (36) controls the sensitivity of the indicator, i.e., larger powers flag more cells as needing WENO, while smaller powers flag fewer cells, permitting larger jumps. For the current study, the power m = 3/2 was chosen experimentally, as it works well across a wide range of flow conditions and does not appear to need any problem-specific tuning.
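A sketch of the flag for a single cell, assuming the condition in Eq. (36) compares the largest entropy jump over the face quadrature points against ∆^m times the cell-average entropy (the precise form of the threshold is an assumption on our part):

```python
import numpy as np

gamma = 1.4

def needs_weno(p_minus, rho_minus, p_plus, rho_plus, p_avg, rho_avg, dx, m=1.5):
    """Flag one cell for WENO. Arrays hold values at all of the cell's face
    quadrature points, from this cell (-) and its neighbors (+)."""
    q_minus = p_minus / rho_minus**gamma   # entropy from this cell's states
    q_plus = p_plus / rho_plus**gamma      # ... and from the neighbors'
    Q_avg = p_avg / rho_avg**gamma         # cell-average entropy
    jump = np.max(np.abs(q_minus - q_plus))
    return bool(jump > dx**m * Q_avg)      # large relative jump -> flag

dx = 1.0 / 128
ones = np.array([1.0, 1.0])
smooth = needs_weno(ones, ones, np.array([1.0 + 1e-8, 1.0]), ones, 1.0, 1.0, dx)
rough = needs_weno(ones, ones, np.array([2.0, 1.0]), ones, 1.0, 1.0, dx)
print(smooth, rough)  # a tiny jump passes, a large one is flagged
```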

POSITIVITY PRESERVATION
The WENO method discussed so far is adequate for many shock problems. However, problems exhibiting very strong shocks or near-vacuum states may encounter negative density and/or pressure values during evolution. The high Mach number astrophysical jets shown in Section 6.4 are one such example.
To ameliorate issues with negativity, the above method can be combined with the positivity-preserving limiter introduced in Balsara (2012), which is similar to the methods described in Hu et al. (2013); Zhang & Shu (2010). The innovation in Balsara (2012) is the selection of density and pressure bounds in a data-dependent manner, which reduces the impact of extra parameters needing problem-specific tuning.
The current method differs from Balsara (2012) only in that no reconstructions are performed in the cell interior to test for positivity, unless source terms are included in the problem. This reduces the computational cost of the limiter, as those interior states would otherwise serve no purpose, but it does mean that the limiter is no longer provably positivity-preserving. Within the implementation, these additional states can easily be limited by including a zero source term if desired.
A brief description of the limiter as applied in three dimensions is given here for completeness; we refer the reader to Balsara (2012) for a more detailed discussion. In what follows, let the index s label the individual quadrature points on all faces of the cell in consideration, such that u_s is the pointwise reconstructed Riemann state at the s-th quadrature point. As in Section 3.4, let ⟨U⟩ be the averaged state in the cell where the limiter is being applied, and let the grid indices (i, j, k) be defined relative to this cell.

Density and pressure bounds
The limiter does not enforce any a-priori chosen bounds on the density and pressure, e.g., there are no fixed hard floor and/or ceiling values. Instead, these bounds are chosen in a way dependent on the local flow conditions around the cell where limiting is being performed. To do this, we begin by constraining the pointwise density value of each Riemann state from above and below such that ρ_s ∈ [ρ_min, ρ_max] with ρ_min > 0, while the pointwise pressure value is constrained only from below such that p_s ≥ p_min > 0. Using the relative cell index convention for simplicity, these bounds depend upon the local cell average densities and pressures, where the constant κ and the flattener η adjust how strict the bounds are by tending to unity only in compressive regions, all of which are set as in Balsara (2012). Note that the pressures used throughout this section are retrieved directly using cell average values rather than pointwise values, and are hence only second-order accurate.

Application of the limiter
With the allowable bounds on density and pressure known, the pointwise Riemann states on the boundary of each cell are modified by the positivity-preserving limiter in two stages as follows. The limiter first ensures that the density is bounded above and below such that ρ_s ∈ [ρ_min, ρ_max]. The size of the needed correction θ_ρ is determined from the extremal densities among the Riemann states, and the Riemann states are corrected by blending toward the cell average, u_s ← ⟨U⟩ + θ_ρ (u_s − ⟨U⟩). There are two important observations to make here. First, on a given cell (i, j, k), all components of all Riemann states are modified in the same way to ensure consistency. Second, there will be no modification to the states if all densities lie in the desired range to begin with. All Riemann states will now have valid densities, but the pressures may not yet be bounded below by p_min. As discussed in Balsara (2012); Hu et al. (2013); Zhang & Shu (2010), the convexity of the physically admissible region allows this minimum pressure to be enforced by again rewriting the states as a convex combination of their current values and the cell average value ⟨U⟩, as in Eq. (45), though now the selection of θ_p is slightly more elaborate than that of θ_ρ. In this case, the required correction for the s-th quadrature point, denoted θ_p;s, appears as a root of Eq. (46). As demonstrated in Balsara (2012), this is really a quadratic equation in terms of θ_p;s, which could be solved analytically. However, for the sake of a uniform implementation, we elect to solve Eq. (46) using a bisection approach, as in the magnetohydrodynamics portion of Balsara (2012). This allows the future inclusion of other equations of state, for which Eq. (46) would fail to be quadratic. After solving Eq. (46) for θ_p;s at each quadrature point where the pressure minimum was violated, the overall correction θ_p is selected as the smallest admissible root, and Eq. (45) is applied. As before, all components of all Riemann states are modified in unison, and no correction is applied if all pressures appearing in the Riemann states already satisfy p_s ≥ p_min. Additionally, this correction naturally leaves the densities bounded appropriately. As a final note, the presentation here only considers the Riemann states on the boundaries of a cell, but interior states could easily be included in the set {u_s}.
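The two-stage limiter can be sketched as below for an ideal gas; the density scaling and the bisection loop are our own minimal stand-ins for the corrections described above, not the paper's implementation:

```python
import numpy as np

gamma = 1.4

def pressure(U):
    """Ideal-gas pressure from a conservative state U = (rho, m1, m2, m3, E)."""
    rho, m1, m2, m3, E = U
    return (gamma - 1.0) * (E - 0.5 * (m1**2 + m2**2 + m3**2) / rho)

def limit_states(U_avg, U_s, rho_min, rho_max, p_min, tol=1e-12):
    """Two-stage positivity sketch: scale all Riemann states toward the cell
    average until densities lie in [rho_min, rho_max], then bisect each
    violating state for the pressure floor instead of solving the quadratic."""
    d = U_s - U_avg
    # Stage 1: a single scaling factor theta_rho enforcing the density bounds.
    theta = 1.0
    for ds in d:
        rho_s = U_avg[0] + ds[0]
        if rho_s > rho_max:
            theta = min(theta, (rho_max - U_avg[0]) / ds[0])
        elif rho_s < rho_min:
            theta = min(theta, (rho_min - U_avg[0]) / ds[0])
    U_s = U_avg + theta * d
    # Stage 2: bisect g(t) = p(U_avg + t (u_s - U_avg)) - p_min, using that
    # the cell average itself satisfies p(U_avg) > p_min.
    theta_p = 1.0
    for ds in U_s - U_avg:
        if pressure(U_avg + ds) >= p_min:
            continue
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            mid = 0.5 * (lo + hi)
            lo, hi = (mid, hi) if pressure(U_avg + mid * ds) >= p_min else (lo, mid)
        theta_p = min(theta_p, lo)
    return U_avg + theta_p * (U_s - U_avg)

U_avg = np.array([1.0, 0.0, 0.0, 0.0, 2.5])        # average state with p = 1
U_s = np.array([[0.05, 0.0, 0.0, 0.0, 0.02],       # near-vacuum density
                [0.5, 2.0, 0.0, 0.0, 0.5]])        # negative-pressure state
out = limit_states(U_avg, U_s, rho_min=0.1, rho_max=2.0, p_min=1e-3)
assert out[:, 0].min() >= 0.1 - 1e-9               # densities back in bounds
assert min(pressure(u) for u in out) >= 1e-3 - 1e-9
```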

OVERVIEW AND IMPLEMENTATION
All of the building blocks of the proposed scheme have now been defined, and we gather them into an explicit step-by-step description. The application of the solver proceeds in three main stages, where the first and second stages are performed once as part of the initial setup, after which the last stage is performed repeatedly to evolve the solution until the final simulation time. The three main stages are: (i) constructing the stencils and reconstruction vectors, (ii) evaluating the initial conditions into cell average quantities, and finally, (iii) advancing the cell averages through time.
We elaborate on each of these stages in the following subsections.

Stencils and weight vectors
To prepare the stencils for use in each uniform grid simulation, one needs to: 1. Enumerate all cells in the full stencil, S_0, and the remaining substencils, S_q, q = 1, . . ., N_S.
2. Form reconstruction vectors, r^(q), for each face quadrature point x_s relative to each (sub)stencil using Eqs. (16) to (18). The quadrature points on each face are set using a Gauss-Legendre quadrature rule with R points per dimension. This yields 2R-th order accuracy, matching that of the reconstruction scheme.
3. Form reconstruction vectors for all partial derivatives needed to evaluate the smoothness indicators in Eq. ( 24) relative to each (sub)stencil.
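The face quadrature in step 2 can be sketched with a generic Gauss-Legendre rule mapped onto a face of width ∆; an R-point rule integrates polynomials up to degree 2R − 1 exactly, consistent with the stated 2R-th order accuracy:

```python
import numpy as np

def face_quadrature(R, dx):
    """R-point Gauss-Legendre nodes/weights mapped from [-1, 1] onto a face
    coordinate spanning [-dx/2, dx/2]."""
    xi, w = np.polynomial.legendre.leggauss(R)
    return 0.5 * dx * xi, 0.5 * dx * w

pts, wts = face_quadrature(R=2, dx=0.1)
assert np.isclose(wts.sum(), 0.1)   # the rule integrates constants exactly
# A 2-point rule is exact for polynomials up to degree 3 (order 2R = 4):
h = 0.05
exact = 2.0 * h**3 / 3.0            # integral of x^3 + x^2 over [-h, h]
assert np.isclose(np.sum(wts * (pts**3 + pts**2)), exact)
```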
The choice of the hyperparameter ℓ in the kernel Eq. (5) deserves further discussion. Kernel-based interpolation with the squared exponential kernel can be used without the trailing monomial terms in Eq. (4). In this case, maintaining a consistent order of accuracy requires that the hyperparameter ℓ be fixed independent of the grid scale. As a result, the kernel matrix in Eq. (10) becomes increasingly ill-conditioned as the grid is refined, and a number of elaborate stable inversion schemes have been developed to handle this issue (Fornberg & Wright 2004; Wright & Fornberg 2017b; Fornberg et al. 2013). Alternatively, including the monomial terms in Eq. (4) yields guaranteed convergence rates independent of ℓ. Larger values of ℓ are desirable, as discussed in Flyer et al. (2016). A good way to proceed, then, is to set the monomial degree as high as possible, whereby one can increase ℓ as far as allowed without resorting to the more elaborate stable inversion schemes that avoid (near) singular matrix inversions (Fornberg & Wright 2004; Wright & Fornberg 2017b; Fornberg et al. 2013). This strategy with the monomial terms is the one we take here, and over all of the various stencils we have found that ℓ = 5∆ is a good baseline choice. Larger values are possible, but none of the presented results are particularly sensitive to this choice.
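The conditioning issue can be seen directly: on a small 1D stencil (a toy, not the paper's multidimensional stencils), the condition number of the squared-exponential kernel matrix grows rapidly as ℓ increases relative to the spacing:

```python
import numpy as np

def kernel_cond(ell, dx, n=5):
    """Condition number of the squared-exponential kernel matrix on a 1D
    n-point stencil with spacing dx."""
    x = dx * np.arange(n)
    K = np.exp(-0.5 * (x[:, None] - x[None, :])**2 / ell**2)
    return np.linalg.cond(K)

# Conditioning deteriorates quickly as ell grows relative to the spacing,
# which is why the monomial terms (or stable inversion schemes) are needed.
print(kernel_cond(1.0, 1.0), kernel_cond(5.0, 1.0))
```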

Evaluation of initial conditions
The initial conditions must be set accurately if the succeeding evolution is to be meaningful, e.g., see Bourgeois & Lee (2022). Here the initial cell averages are filled using a tensor-product Gauss-Legendre rule on each cell. As for the quadrature points on faces, we use a rule with R points per dimension (see bullet point 2 in Section 5.1).

Time advancement
We now describe the overall solution advancement of the KFVM-WENO solver, whose high-order spatial solutions are temporally evolved by multistage Runge-Kutta (RK) type integrators. The implementation of these time integrators ultimately comes down to being able to evaluate all spatial terms for some given state. The evaluation of the spatial terms proceeds as follows: 1. Fill the ghost cells in accordance with the boundary conditions.
2. Calculate Riemann states at each quadrature point on cell faces (see below).
3. Apply the positivity preserving limiter on the Riemann states, following Section 4.
4. Populate Riemann states outside of the physical domain in accordance with the boundary conditions.
5. Call an approximate Riemann solver at each flux quadrature point.
6. Integrate fluxes and update cell averages in accordance with the chosen RK method.
Naturally, the second step of calculating the high-order KFVM-WENO Riemann states contains most of the contributions of this work. This proceeds as follows: 1. Reconstruct unlimited Riemann states on all cells using cell averages from the full stencils S_0.
2. Evaluate the KXRCF indicator from Section 3.5, and flag cells needing WENO.
3. For all flagged cells do the following: (a) Construct the transformation matrix Φ from Eq. (35) using the central cell average data as the reference state.
(b) Project the average state for each cell in S_0 onto the linearized primitive variables via Eq. (28).
(c) Reconstruct pointwise values of the linearized primitive variables at all face quadrature points via the KFVM-WENO reconstruction detailed in Section 3.2.
(d) Project the pointwise linearized primitive values back to conservative variables via Eq. (29), yielding the Riemann states.

Parallel code implementation
The presented scheme has been implemented using Kokkos (Trott et al. 2022; Edwards et al. 2014) for shared memory parallelism, combined with the message passing interface (MPI) for distributed memory parallelism. This allows for straightforward use of multiple GPUs for high degrees of parallelism. Internally, the reconstruction process makes extensive use of the batched linear algebra routines from the Kokkos-Kernels package (Trott et al. 2021). We encourage the interested reader to consult our code for more details (May 2024).

NUMERICAL RESULTS
The proposed scheme is evaluated against a variety of benchmark problems, several of which are known in the literature to be very stringent. Below, we briefly overview what is expected in each test problem in this section.
First, we experimentally verify the expected orders of accuracy through a convergence study using the isentropic vortex problem (Shu 1998; Spiegel et al. 2015) in Section 6.1. The Sod shock tube problem (Sod 1978) is then solved in grid-aligned and tilted configurations in Section 6.2, as a first evaluation of the shock handling capabilities and to test whether the method has any inherent preference for grid-aligned phenomena.
With those two fundamental tests done, we move to more challenging problems. Section 6.3 presents a Richtmyer-Meshkov instability driven by a Mach 3 shock wave, as described in Samtaney (2003). Next, we solve two very stringent problems in Section 6.4, where we consider the high Mach astrophysical jet problems from Balsara (2012).
Finally, two problems with physical viscosity are solved. The Taylor-Green vortex is shown in Section 6.5 and compared against the validated benchmark data from the International Workshop on High-Order CFD Methods (2021), and Section 6.6 considers a viscous version of the Rayleigh-Taylor instability as described in Shi et al. (2003).
All problems are solved using explicit Runge-Kutta time integrators with error-estimate-based step size selection in addition to the more standard CFL-constrained step sizes. To this end, we utilize the [3S]*+ and [4S]*+ time integrators from Ranocha et al. (2022), which also provides an excellent discussion of error-based step size selection. For problems with very strong shocks, we also consider the RK-SSP(4,3) method of Kraaijevanger (1991) with the embedded RK-SSP(3,2) method described in Fekete et al. (2022), the combination of which is hereafter denoted SSP(4,3,2).
The flux integrals are evaluated using tensor-product Gauss-Legendre quadrature rules with R points per dimension, as discussed in Section 3. The length scale hyperparameter in Eq. (5) is always set to ℓ = 5∆. All problems use the newly proposed linearized primitive variables from Section 3.4.2. As for Riemann solvers, the HLLC+ approximate Riemann solver from Chen et al. (2020) is applied to all problems apart from the astrophysical jets, which use the more stable HLL (Harten et al. 1983; Toro 2013) approximate Riemann solver. In each case, wavespeeds are estimated following Batten et al. (1997).
Table 1. Experimental orders of convergence (EOC) for the described method with the radius R = 2 scheme (left) and the R = 3 scheme (right), as tested on the isentropic vortex problem.

Isentropic vortex
The isentropic vortex is one of the few nonlinear problems in the literature whose initial condition serves as a smooth, exact solution to the compressible Euler equations. By design, the isentropic vortex evolves in a fully nonlinear manner. These features make it an ideal candidate for experimentally validating the convergence rate of a given method. Herein we set up the problem as discussed in Spiegel et al. (2015), which uses a domain of Ω = [−10, 10]² with periodic boundary conditions and the initial conditions given therein, where r = 1 is the vortex radius and the rotation rate ω is set accordingly. After evolving to the final time t = 20, the vortex returns to its initial position, and the accumulated error during the run can be found by comparing the initial and final states. In Table 1, the experimental L¹ convergence rates of the method are demonstrated for radius R = 2 and R = 3 stencils, respectively. In all cases the [4S]*+ time integrator (Ranocha et al. 2022) is used with the tolerances set as atol = rtol = 10^−6, which dominate over a maximal CFL of 1.0. This tolerance is tight enough to ensure that the spatial error dominates in every case.
In Table 1, we display the obtained L¹ errors and associated experimental order of convergence (EOC) rates, where E_c and E_r are the L¹-norm errors on a coarse resolution and on the next refined resolution (e.g., if E_c is measured on 32², E_r is on 64²) for the radius R = 2 and R = 3 schemes.
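The EOC entries follow from successive errors as EOC = log₂(E_c/E_r) when the resolution doubles; a sketch with made-up errors decaying at exactly 4th order:

```python
import numpy as np

def eoc(E_c, E_r):
    """Experimental order of convergence from errors on a coarse grid and on
    a grid with doubled resolution."""
    return float(np.log2(E_c / E_r))

# Illustrative (made-up) errors decaying at exactly 4th order:
errors = [1e-3, 1e-3 / 16, 1e-3 / 256]
rates = [eoc(errors[i], errors[i + 1]) for i in range(len(errors) - 1)]
print(rates)  # -> [4.0, 4.0]
```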
For radius R = 2, we expect at least 4th-order accuracy. Indeed, for resolutions beyond 64 × 64 we see convergence rates matching or exceeding 4th order. As noted in Section 2.1, the polynomial tail in Eq. (4) is not necessary to obtain a well-behaved and accurate scheme, but rather allows smaller choices of ℓ and a better conditioned linear system in Eqs. (16) to (18). Indeed, the kernel part of the expansion still plays an active role in improving the accuracy of the overall scheme, and better than 4th-order accuracy is observed in Table 1. Similarly, for radius R = 3, we expect at least 6th-order accuracy, and again we observe orders near or beyond the expected rate after a resolution of 64 × 64.

Sod shock tube
The Sod shock tube problem (Sod 1978) is a staple benchmark for any numerical method treating the compressible Euler equations. We consider this problem in two configurations. First, in the standard grid-aligned formulation, we take the domain to be Ω = [0, 1] × [0, 0.04] with outflow conditions in the x−direction and periodic conditions in the y−direction. The initial conditions are piecewise constant, with ρ_L = 1 and p_L = 1 set for x < 0.5, ρ_R = 0.125 and p_R = 0.1 otherwise, and zero velocity throughout.
Second, we consider a tilted configuration where the same solution evolves oblique to the grid by the angle θ ≈ 26.5651° (i.e., tan θ = 1/2), to test whether there is any preference for grid-aligned phenomena. Following the idea in Kawai (2013); Lee et al. (2021), the domain is set as Ω = [0, √5] × [0, 2√5] with periodic conditions in both directions. The tilted coordinate is x_∥ = (2x + y)/√5 with 0 ≤ x_∥ ≤ 4; the left state is initialized in the regions 0 ≤ x_∥ ≤ 0.5, 1.5 < x_∥ ≤ 2.5, and 3.5 < x_∥ ≤ 4, while the right state is imposed elsewhere. In this configuration, the final density and pressure extracted along the line 0 ≤ x_∥ ≤ 1 will match those of the grid-aligned configuration.
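The tilted initialization can be sketched directly from the description above (the helper names are hypothetical):

```python
import numpy as np

def x_parallel(x, y):
    """Tilted coordinate x_par = (2x + y) / sqrt(5), spanning [0, 4]."""
    return (2.0 * x + y) / np.sqrt(5.0)

def is_left_state(x, y):
    """Left (high-pressure) state regions of the tilted Sod setup."""
    xp = x_parallel(x, y)
    return xp <= 0.5 or 1.5 < xp <= 2.5 or 3.5 < xp <= 4.0

assert is_left_state(0.0, 0.0)                   # x_par = 0 -> left state
assert not is_left_state(np.sqrt(5.0) / 2, 0.0)  # x_par = 1 -> right state
```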
These problems are solved with both the radius R = 2 and R = 3 schemes to the final time t = 0.2, using the SSP(4,3,2) time integrator with tolerances atol = rtol = 10^−4 and a maximum CFL of 1.25. For the grid-aligned configuration, a grid with spacing ∆ = 1/100 is used. For the tilted configuration, a comparable resolution is achieved by using a grid with spacing ∆ = √5/250, which yields 100 cells along the line 0 ≤ x_∥ ≤ 1 where the solution is extracted. The resulting density traces can be seen in Fig. 3, and all schemes produce excellent results in comparison to the exact solution. The inset shows that all schemes resolve the contact discontinuity in roughly three cells, with only a small overshoot from the R = 3 scheme in the tilted configuration. Overall, the present scheme shows minimal preference between features irrespective of the underlying grid configuration.

Richtmyer-Meshkov instability
Here we consider a Richtmyer-Meshkov instability similar to the unmagnetized case in Samtaney (2003). A right-traveling shock impinges on an oblique density jump, and the initially straight interface is bent as the shock diffracts through it. As time progresses, secondary Kelvin-Helmholtz instabilities are excited along the density interface. The domain is Ω = [−1/2, 11/2] × [0, 1] with fixed inflow conditions at x = −1/2, extrapolation conditions at x = 11/2, and reflecting conditions in the y−direction. The problem is parameterized by the shock Mach number M_a and the density ρ_D of the gas downstream of the interface; the initial density, pressure, and x−velocity are prescribed accordingly. As in Samtaney (2003), we set the parameters M_a = 3 and ρ_D = 2. In Fig. 4, we show the density fields around the interface at the final time of t = 3.33 on grids with spacing ∆ = 1/512, as produced by the radius R = 2 and R = 3 schemes, respectively. In both cases, the SSP(4,3,2) time integrator is used with tolerances atol = rtol = 10^−3 and a maximum CFL of 1.0. The density profiles shown in Fig. 4 qualitatively agree with the results presented in Samtaney (2003), despite the latter making extensive use of adaptive mesh refinement. We also note that the cost of activating KXRCF is negligible. The average time to execute one right hand side evaluation (which includes all reconstruction, limiting, Riemann solves, etc.) was 0.044 seconds with the indicator active and 0.050 seconds without it. The overhead of applying the indicator and doing a two-pass reconstruction is easily worthwhile, and would only become more useful in three dimensions or with larger stencils.

Astrophysical jets
Here we consider the high-density and low-density astrophysical jet problems from Balsara (2012); Bourgeois & Lee (2022); Ha & Gardner (2008), as well as an extension of the low-density jet to three dimensions. The domain is Ω = [0, 1/2] × [0, 3/2] in two dimensions and Ω = [0, 1/2] × [0, 3/2] × [0, 1/2] in three dimensions. Reflecting conditions are applied on the lower boundaries of x (for 2D and 3D) and z (for 3D only), outflow conditions are applied on the upper boundaries of x, y (for 2D and 3D) and z (for 3D only), and mixed inflow/outflow conditions are applied on the lower y boundary. The highly compressible extragalactic jet enters the ambient flow through a narrow slit, configured as r² = x² < 0.05² for 2D and r² = x² + z² < 0.05² for 3D, where the inflow conditions are imposed. Apart from the narrow slit, the remainder of the lower y boundary is set to simple outflow extrapolation conditions.
In all cases, the injected jet is initialized with density ρ = γ and unit pressure p = 1 and all runs are solved using the HLL approximate Riemann solver and the SSP(4,3,2) time integrator with tolerances of atol = rtol = 10 −2 and a maximum CFL of 1.0.
In the high-density case, the jet enters with a y−velocity of 800 into an initially quiescent background with density ρ = γ/10 and unit pressure, making the jet's density ten times higher than the background density. In the low-density case, the jet moves with a y−velocity of 100 and is ten times lighter than the ambient background, which has density ρ = 10γ and unit pressure. All other velocity components are zero in all cases. These configurations yield jet Mach numbers of 800 and 100 for the high-density and low-density cases, respectively.
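The quoted Mach numbers follow directly from the injected state: with ρ = γ and p = 1, the sound speed c = √(γp/ρ) = 1, so the Mach number equals the injection speed. A small illustrative sketch (the helper names are ours, and the value of γ is an assumption since the jet sections do not state it):

```python
import math

GAMMA = 1.4  # assumed ratio of specific heats for illustration

def sound_speed(rho, p, gamma=GAMMA):
    """Ideal-gas sound speed c = sqrt(gamma * p / rho)."""
    return math.sqrt(gamma * p / rho)

def in_slit_2d(x, r=0.05):
    """Inflow slit on the lower-y boundary in 2D: r^2 = x^2 < 0.05^2."""
    return x * x < r * r

# Injected jet state: rho = gamma, p = 1 -> c = 1, so Mach = injection speed.
c_jet = sound_speed(GAMMA, 1.0)
mach_high = 800.0 / c_jet  # high-density jet
mach_low = 100.0 / c_jet   # low-density jet
```

Note that c = 1 holds for any γ here, since ρ = γ cancels the γ in the sound-speed formula.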
In two dimensions, both the high-density and low-density jets are solved using the radius R = 3 scheme to final times of t = 0.002 and t = 0.04, respectively, on grids with spacing ∆ = 1/512; the resulting logarithmic density fields can be seen in Fig. 6. In three dimensions, we consider only the more stringent low-density jet, using the radius R = 2 scheme on a grid with spacing ∆ = 1/384. The logarithmic density field at the final time t = 0.035 can be seen in Fig. 7.
As seen in Fig. 6 and Fig. 7, the overall evolution of the jets, the flow dynamics surrounding the jet envelopes, and the internal flow structures such as Kelvin-Helmholtz instabilities in the "cocoon" region are all well captured and show excellent agreement with the results in Balsara (2012); Bourgeois & Lee (2022); Ha & Gardner (2008). These results confirm that the present schemes are capable of simulating highly compressible flows with shock strengths well beyond the typical supersonic, hypersonic, and high-hypersonic regimes.

Taylor-Green vortex
The Taylor-Green vortex studied in Taylor & Green (1937) furnishes a prototypical transition-to-turbulence and turbulence-decay problem. Initially smooth, large-scale vortices break down into smaller and smaller vortices and eventually decay fully. The Taylor-Green vortex is well studied in the literature, especially with respect to high-order methods, and is therefore an ideal test case for our schemes.
We consider the Taylor-Green vortex at Reynolds number Re = 1,600, following its prescription as a challenge problem for incompressible flow solvers (e.g., see Van Rees et al. (2011) for a comparison study between a particle-mesh vortex method and a pseudo-spectral method) as well as in several code-to-code comparison workshops on high-order methods for computational fluid dynamics.
Reference scales ρ, U, L are fixed for the density, velocity, and length, respectively. The induced time scale is τ = L/U, and the pressure scale is chosen to be P = ρU². After nondimensionalizing relative to these reference scales, we solve the problem in a triply periodic box on the domain Ω = [−π, π]³, with initial conditions corresponding to a constant initial temperature, ρ = γ Ma² p. We follow the guidance in the problem statement from one of the CFD workshops held at the NASA Glenn Research Center (see, for instance, the description of case C3.3 at https://www1.grc.nasa.gov/research-and-engineering/hiocfd/) and set the Mach number as low as Ma = 0.1 to approximate incompressible flow.
The nondimensional governing equations retain the vector of conserved quantities U and the inviscid fluxes F from Eq. (2), augmented by viscous fluxes built from the scaled stress tensor and the thermal conduction, where the Prandtl number has been fixed at Pr = 0.71. Since this paper has the compressible Euler equations as its primary focus, we elect to discretize these viscous terms with standard second-order accurate finite differences for simplicity. This is justified at this high Reynolds number, although future extensions to high-accuracy viscous terms would be interesting. The problem is solved using a radius R = 2 stencil on a grid with spacing ∆ = π/192, yielding 384³ cells in total. The Q = 0.1 isosurface of the Q−criterion, colored by velocity magnitude, at non-dimensional time t = 8 is shown in Fig. 8. Our result can be compared directly with the results presented in Giangaspero et al. (2015).
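For concreteness, the sketch below uses the standard Taylor-Green initialization associated with the HiOCFD case statement. The exact initial-condition equations are omitted above, so this form (including the background pressure p₀ = 1/(γ Ma²)) is an assumption consistent with the constant-temperature relation ρ = γ Ma² p, not a transcription of the paper's setup:

```python
import numpy as np

GAMMA = 1.4   # assumed ratio of specific heats
MACH = 0.1    # Ma = 0.1, as in the text

def taylor_green_ic(x, y, z):
    """Standard Taylor-Green initial condition on [-pi, pi]^3
    (assumed form; nondimensional with unit reference scales)."""
    u = np.sin(x) * np.cos(y) * np.cos(z)
    v = -np.cos(x) * np.sin(y) * np.cos(z)
    w = np.zeros_like(x)
    p0 = 1.0 / (GAMMA * MACH**2)  # assumed background pressure
    p = p0 + (np.cos(2 * x) + np.cos(2 * y)) * (np.cos(2 * z) + 2.0) / 16.0
    rho = GAMMA * MACH**2 * p     # constant initial temperature
    return rho, u, v, w, p
```

With this choice the background density is unity and the velocity field is divergence-free, consistent with the near-incompressible regime targeted by Ma = 0.1.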
The Taylor-Green vortex is considered a challenging code-to-code verification benchmark for a wide range of high-order schemes. As is standard for this challenge problem, the fully converged high-resolution data available from the International Workshop on High-Order CFD Methods (2021) is used to provide quantitative validation of the present scheme. These reference results were obtained using a pseudo-spectral scheme for the incompressible Navier-Stokes equations on a grid of size 512³; details can be found in the associated problem statement.
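The comparison quantity, the kinetic-energy dissipation rate −dÊ_k/dt, is typically obtained by differencing the volume-averaged kinetic-energy time series. A minimal sketch (the helper names are ours, not the paper's implementation):

```python
import numpy as np

def mean_kinetic_energy(rho, u, v, w):
    """Volume-averaged kinetic energy <rho |v|^2> / 2 over the box."""
    return np.mean(0.5 * rho * (u * u + v * v + w * w))

def dissipation_rate(t, ek):
    """-dE_k/dt from a time series via second-order central differences
    in the interior (np.gradient)."""
    return -np.gradient(ek, t)
```

In practice Ê_k is sampled at each accepted time step and differenced in post-processing, so the temporal resolution of the series, not the differencing stencil, limits the accuracy near the dissipation peak.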
In Fig. 9 we compare the (non-dimensional) kinetic energy dissipation rate, −dÊ_k/dt, from the present scheme to the reference data. We observe excellent agreement between our scheme and the reference data through the peak dissipation rate at time t ≈ 9, with only minor discrepancies through the decay process after the peak.

Rayleigh-Taylor instability
Here we consider a two-dimensional viscous Rayleigh-Taylor instability. The domain and initial conditions follow the setup from Shi et al. (2003), though an additional physical viscosity is now added. The governing equations follow from the previous section (see Eq. (58)) with an added source term in Eq. (59).
Gravity is taken to point in the positive y−direction, and we nondimensionalize as before. Finally, the x−direction boundaries are supplied with reflecting conditions, and the y−direction boundaries are held fixed at the initial density and pressure with zero velocity. The source terms are constant and contain only quantities for which cell averages are already available, namely the density and y−momentum, so the averaged source ⟨S⟩ is trivial to find for each cell. However, the implementation allows arbitrary user-provided source terms, so these gravitational sources are treated identically: as described in Section 4, internal states can be reconstructed within each cell on a tensor-product set of Gauss-Legendre nodes, and the source term is evaluated at these states and subsequently integrated.
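Averaging a source term over a cell by evaluating it at tensor-product Gauss-Legendre nodes can be sketched as follows. The node count and the sample source field are our own illustrative choices:

```python
import numpy as np

def cell_average_source(source, xc, yc, dx, dy, nq=3):
    """Approximate <S> over a dx-by-dy cell centered at (xc, yc) via
    tensor-product Gauss-Legendre quadrature on [-1, 1] x [-1, 1]."""
    xi, wi = np.polynomial.legendre.leggauss(nq)
    avg = 0.0
    for gx, wx in zip(xi, wi):
        for gy, wy in zip(xi, wi):
            x = xc + 0.5 * dx * gx
            y = yc + 0.5 * dy * gy
            avg += wx * wy * source(x, y)
    return avg / 4.0  # weights sum to 2 in each direction

# Example: a gravity-like source S = rho(x, y) * g on the y-momentum,
# with a hypothetical smooth density field.
g = 1.0
rho = lambda x, y: 2.0 + 0.1 * np.sin(x) * np.cos(y)
s_avg = cell_average_source(lambda x, y: rho(x, y) * g, 0.0, 0.0, 0.1, 0.1)
```

With nq nodes per direction the quadrature is exact for polynomials up to degree 2nq − 1, which comfortably matches the order of the reconstruction for modest nq.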
The problem is evolved to the final time of t = 2.5 using the [3S]*+ time integrator with tolerances atol = rtol = 10^−3 and a maximum CFL number of 1.25. In Fig. 10, we display the final density fields obtained by KFVM-WENO with radius R = 2 and R = 3 stencils on a grid spacing of ∆ = 1/1024. Morphologically, with the explicit physical viscosity, a smooth leading cap is observed with no secondary Kelvin-Helmholtz type instabilities along its interface, and similarly there are no secondary instabilities along the central column. Secondary Kelvin-Helmholtz instabilities are visible on the inner region of the rising cap, and their structure is consistent over a range of grid resolutions. The two hooks (or arms) at the lowest part of the cap, as well as the position and shape of the roll-up just above them, are converged and appear the same for all resolutions above ∆ = 1/256. The results are also consistent between the R = 2 and R = 3 schemes, with at most a 2% difference in density in corresponding cells. This convergent behavior stands in opposition to results obtained from inviscid solvers that exclude explicit viscosity and rely only on numerical dissipation: in that case there is no agreed-upon solution for the method to converge to. Indeed, finer and finer scale structures appear each time the grid is refined or the numerical dissipation is lowered by using a larger stencil (see Shi et al. (2003), for instance). Furthermore, methods with sufficiently low numerical dissipation are prone to breaking symmetry due to the non-associativity of floating-point addition (Fleischmann et al. 2019). While fixes for this issue are available for dimension-by-dimension schemes, it remains unclear how one would avoid non-associativity errors in multidimensional reconstruction without drastically increasing the computational cost. Using a physical viscosity, as done here, avoids all of these problems by providing a single agreed-upon solution to converge to, and yields more scientifically meaningful results.
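The floating-point non-associativity mentioned above is easy to demonstrate: summing the same values in a different order can change the rounded result, which is why reduction order matters for symmetry preservation.

```python
# Floating-point addition is not associative: (a + b) + c may differ
# from a + (b + c) once rounding occurs.
a, b, c = 1e16, -1e16, 1.0
left = (a + b) + c   # 0.0 + 1.0 -> 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the 1.0 is lost
```

In a multidimensional reconstruction, the accumulation order over a stencil depends on loop structure and parallel decomposition, so two mirror-symmetric cells can receive bitwise-different reconstructions.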

CONCLUSION
This paper proposes a multidimensional adaptive-order WENO finite volume method using kernel-based reconstruction. The use of kernel-based reconstruction allows great flexibility in the choice of stencils and substencils in multiple spatial dimensions. We showed that our non-polynomial, kernel-based design simplifies the implementation of high-order finite volume schemes in multiple dimensions by reconstructing all pointwise Riemann states along cell boundaries directly from cell-average data, eliminating the need to define boundary conditions for intermediate quantities as in modified dimension-by-dimension schemes.
Furthermore, alternative reconstruction variables, dubbed linearized primitive variables, are proposed as a simplification over characteristic variables; they are simpler to define and implement. Crucially, they provide direction-independent information, which allows the same nonlinear weights within the WENO method to be used for the reconstruction of all Riemann states on all faces of a given cell. Alongside these new reconstruction variables, we also proposed a straightforward adaptation of the KXRCF troubled-cell indicator for use in finite volume schemes, allowing WENO to be sidestepped entirely in most cells. The calculation of nonlinear weights is the most expensive part of the whole scheme, so the use of these variables and the troubled-cell indicator together provides a significant reduction in computational cost.
Finally, the proposed scheme is evaluated against a variety of stringent and illustrative benchmark problems.The method simultaneously demonstrates high-order nonlinear accuracy on smooth flows, robust behavior in the face of strong shocks, and minimal preference for grid-aligned phenomena over non-aligned phenomena.

Figure 3. Shown is a trace of the density in the Sod shock tube problem as obtained from four different cases. The solid black line shows the exact analytical solution. The dashed lines with circle and square markers show the results of the radius R = 2 and R = 3 schemes in the grid-aligned configuration. The dotted lines with cross and triangle markers show the results of the radius R = 2 and R = 3 schemes in the grid-tilted configuration. The inset shows a zoom-in of the region near the contact discontinuity.

Figure 4. Shown is the density field for the Richtmyer-Meshkov instability at the final time of t = 3.33 as solved by the radius R = 2 scheme (top) and the radius R = 3 scheme (bottom) on a grid with spacing ∆ = 1/512. We display the view zoomed into the region [5/2, 11/2] × [0, 1] to highlight the interface.

Figure 5. Shown in black are the cells flagged for WENO reconstruction at the final time for the radius R = 2 scheme. The view has been zoomed in to match Fig. 4.

Figure 6. Shown are the logarithmic density fields for the high-density (left) and low-density (right) astrophysical jets in two dimensions at the final times of t = 0.002 and t = 0.04, respectively. In both cases radius R = 3 stencils were used on grids with spacing ∆ = 1/512.

Figure 7. The left panel shows the logarithmic density field for the low-density astrophysical jet in three dimensions at the final time of t = 0.035 as solved by the radius R = 2 scheme on a grid with spacing ∆ = 1/384. The right panel shows the corresponding numerical Schlieren image defined as ln (1 + |∇ρ|).

Figure 8. The Q = 0.1 isosurface of the Q−criterion, colored by velocity magnitude, is shown for the Taylor-Green vortex at non-dimensional time t = 8 on a grid with spacing ∆ = π/192.

Figure 9. The evolution of the kinetic energy dissipation rate is compared against the fully converged reference data from the International Workshop on High-Order CFD Methods (2021).

Fr = U/√(gL) is the Froude number. We set the domain as Ω = [0, 1/4] × [0, 1]. Initially, the density is set to ρ_high = 2 for y < 1/2 and ρ_low = 1 otherwise, the pressure is set as in Shi et al. (2003), and the ratio of specific heats is fixed at γ = 5/3. The Froude number is set as Fr = 1, the Prandtl number remains Pr = 0.71, and the Reynolds number is set to Re = 20,000. The instability is seeded by a small vertical velocity perturbation given as v = −0.025 √(γp/ρ) cos(8πx).

Figure 10. Shown is the density field for the viscous Rayleigh-Taylor instability at the final time of t = 2.5 with Reynolds number Re = 20,000 and Froude number Fr = 1. These results were obtained with the radius R = 3 scheme on a grid with spacing ∆ = 1/1024, though they are insensitive to both grid resolution and stencil radius.