MAESTROeX: A Massively Parallel Low Mach Number Astrophysical Solver

Duoming Fan; Andrew Nonaka; Ann S. Almgren; Alice Harpole; Michael Zingale

doi:10.3847/1538-4357/ab4f75

1. Introduction

Many astrophysical flows are highly subsonic. In this regime, sound waves carry sufficiently little energy that they do not significantly affect the convective dynamics of the system. In many of these flows, modeling long-time convective dynamics are of interest, and numerical approaches based on explicit compressible hydrodynamics are intractable, even on modern supercomputers. Our approach to this problem is to use a low Mach number model where sound waves are eliminated from the governing equations while retaining compressibility effects due to, e.g., nuclear energy release, stratification, compositional changes, and thermal diffusion. When the Mach number (the ratio of the characteristic fluid velocity over the characteristic sound speed; Ma = U/c) is small, the resulting system can be numerically integrated with much larger time steps than a compressible model. Specifically, the time step increase is at least a factor of ∼1/Ma larger. Each time step is more computationally expensive due to the presence of additional linear solves, but for many problems of interest the overall gains in efficiency can easily be an order of magnitude or more.

Low Mach number models that do not contain acoustic waves have been developed for a variety of contexts including combustion (Day & Bell 2000), terrestrial atmospheric modeling (Durran 1989; O'Neill & Klein 2014; Duarte et al. 2015), and elastic solids (Abbate et al. 2017). For astrophysical applications, a number of approaches to modeling low Mach number flows have been developed in recent years. One of the more similar approaches to ours is Lin et al. (2006); however, this approach is only first-order accurate and does not account for atmospheric expansion. There are several other approaches that retain the effects of acoustic wave propagation with various strategies to efficiently and robustly handle the low Mach number limit. In the reduced speed of sound technique and related methods, the speed of sound is artificially reduced by including a suitable scaling factor in the continuity equation, reducing the restriction on the size of the time step (Rempel 2005; Hotta et al. 2012; Wang et al. 2015; Takeyama et al. 2017; Iijima et al. 2019). There are semi-implicit all-Mach number solvers, where the Euler equations are split into an acoustic part and an advective part (Kwatra et al. 2009; Degond & Tang 2011; Cordier et al. 2012; Haack et al. 2012; Happenhofer et al. 2013; Chalons et al. 2016; Padioleau et al. 2019). The fast acoustic waves are then solved using implicit time integration, while the slow material waves are solved explicitly. There are also fully implicit time integration codes for the compressible Euler equations (Kifonidis & Müller 2012; Viallet et al. 2015; Goffrey et al. 2017). The MUSIC code uses fully implicit time integration for the compressible Euler equations, which therefore allows for arbitrarily large time steps. To resolve potential pressure scaling issues in very low Mach number flow in explicit or implicit schemes, another approach is to use preconditioned all-Mach number solvers (Miczek et al. 2015; Barsukow et al. 2016) to properly capture pressure fluctuations in the low Mach number regime (see Guillard & Nkonga 2017 for discussion). The numerical flux is multiplied by a preconditioning matrix that reduces the stiffness of the system at low Mach numbers, while retaining the correct scaling behavior in an acoustic limit of the Euler equations that retains ${ \mathcal O }$ (Ma) pressure fluctuations.

Previously, we developed the low Mach number astrophysical solver, MAESTRO. MAESTRO is a block-structured, Cartesian grid finite-volume, adaptive mesh refinement (AMR) code that has been successfully used for many years for a number of applications, detailed below. Unlike several of the references above, MAESTRO is not an all-Mach solver, but is suitable for flows where the Mach number is small (∼0.1 or smaller). Furthermore, the low Mach number model in MAESTRO is specifically designed for, but not limited to, astrophysical settings with significant atmospheric stratification. This includes full spherical stars, as well as planar simulations of dynamics within localized regions of a star. The numerical methodology relies on an explicit Godunov approach for advection, a stiff ordinary differential equation solver for reactions (VODE; Brown et al. 1989), and multigrid-based linear solvers for the pressure-projection steps. Thus, the time step is limited by an advective CFL constraint based on the fluid velocity, not the sound speed. Central to the algorithm are time-varying, one-dimensional stratified background (or base) state density and pressure fields that are held in hydrostatic equilibrium. The base state density couples to the full state solution through buoyancy terms in the momentum equation, and the base state pressure couples to the full state solution by constraining the evolution of the thermodynamic variables to match this pressure. The time-advancement strategy uses Strang splitting to integrate the thermodynamic variables, a second-order projection method to integrate the velocity subject to a divergence constraint, and a velocity-splitting scheme that uses a radially averaged velocity to hydrodynamically evolve the base state. The original MAESTRO code was developed in the pure-Fortran 90 FBoxLib software framework, whereas MAESTROeX is developed in the C++/F90 AMReX framework (AMReX Development Team et al. 2019; Zhang et al. 2019).

The key numerical developments of the original MAESTRO algorithm are presented in a series of papers which we refer to as Papers I–V:

1.
In Paper I (Almgren et al. 2006a), we derive the low Mach number equation set for stratified environments from the fully compressible equations.
2.
In Paper II (Almgren et al. 2006b), we incorporate the effects of atmospheric expansion through the use of a time-dependent background state.
3.
In Paper III (Almgren et al. 2008), we incorporate reactions and the associated coupling to the hydrodynamics.
4.
In Paper IV (Zingale et al. 2009), we describe our treatment of spherical stars in a three-dimensional Cartesian geometry.
5.
In Paper V (Nonaka et al. 2010), we describe the use of block-structured AMR to focus spatial resolution in regions of interest.

Since then, there have been many scientific investigations using MAESTRO, which have included additional algorithmic enhancements. Topics include:

1.
The convective phase preceding Chandrasekhar mass models for SNe Ia (Nonaka et al. 2011; Zingale et al. 2011; Malone et al. 2014a).
2.
Convection in massive stars (Gilet et al. 2013; Gilkis & Soker 2016).
3.
Sub-Chandrasekhar white dwarfs (Zingale et al. 2013; Jacobs et al. 2016).
4.
Type I X-ray bursts (Malone et al. 2011, 2014b; Zingale et al. 2015).

In this paper, we present new algorithmic methodology that improves upon Paper V in a number of ways. First, the overall temporal algorithm has been greatly simplified without compromising second-order accuracy. The key design decisions were to eliminate the splitting of the velocity into average and perturbational components, and also to replace the hydrodynamic evolution of the base state with a predictor-corrector approach. Not only does this greatly simplify the dynamics of the base state, but this treatment is more amenable to higher-order multiphysics coupling strategies based on method-of-lines integration. In particular, schemes based on deferred corrections (Dutt et al. 2000) have been used to generate high-order temporal integrators for problems of reactive flow and low Mach number combustion (Pazner et al. 2016; Nonaka et al. 2018). Second, we explore the effects of alternative spatial mapping routines for coupling the base state and the Cartesian grid state for spherical problems. Finally, we examine the performance of our new MAESTROeX implementation in the new C++/F90 AMReX public software library (AMReX Development Team et al. 2019; Zhang et al. 2019). MAESTROeX uses MPI+OpenMP parallelism and scales well to over 10,000 MPI processes, with each MPI process supporting tens of threads. The resulting code is publicly available on GitHub (https://github.com/AMReX-Astro/MAESTROeX), uses the Starkiller-Astro microphysics libraries (The StarKiller Microphysics Development Team et al. 2019, https://github.com/starkiller-astro), as well as AMReX (https://github.com/AMReX-Codes/amrex).

The rest of this paper is organized as follows. In Section 2 we review our model for stratified low Mach number astrophysical flow. In Section 3 we present our numerical algorithm in detail, highlighting the new temporal integration scheme as well as spatial base state mapping options. In Section 4 we validate our new approach and examine the performance of our algorithm on full spherical star problems used in previous scientific investigations. We conclude in Section 5.

2. Governing Equations

Low Mach number models for reacting flow were originally derived using asymptotic analysis (Rehm & Baum 1978; Majda & Sethian 1985) and used in terrestrial combustion applications (Knio et al. 1999; Day & Bell 2000). These models have been extended to nuclear flames in astrophysical environments using adaptive algorithms in space and time (Bell et al. 2004). In Papers I–III, we extended this work and the atmospheric model by Durran (1989) by deriving a model and algorithm suitable for stratified astrophysical flow. We take the standard equations of reacting, compressible flow, and recast the equation of state (EOS) as a divergence constraint on the velocity field. The resulting model is a series of evolution equations for mass, momentum, and energy, subject to an additional constraint on velocity. The evolution equations are

$\begin{eqnarray}&&\displaystyle \frac{\partial (\rho {X}_{k})}{\partial t}=-{\rm{\nabla }}\cdot (\rho {X}_{k}{\boldsymbol{U}})+\rho {\dot{\omega }}_{k},\end{eqnarray} \tag{ 1 }$

$\begin{eqnarray}&&\displaystyle \frac{\partial {\boldsymbol{U}}}{\partial t}=-{\boldsymbol{U}}\cdot {\rm{\nabla }}{\boldsymbol{U}}-\displaystyle \frac{{\beta }_{0}}{\rho }{\rm{\nabla }}\left(\displaystyle \frac{\pi }{{\beta }_{0}}\right)-\displaystyle \frac{\rho -{\rho }_{0}}{\rho }g{{\boldsymbol{e}}}_{r},\end{eqnarray} \tag{ 2 }$

$\begin{eqnarray}&&\displaystyle \frac{\partial (\rho h)}{\partial t}=-{\rm{\nabla }}\cdot (\rho h{\boldsymbol{U}})+\displaystyle \frac{{{Dp}}_{0}}{{Dt}}+\rho {H}_{\mathrm{nuc}}.\end{eqnarray} \tag{ 3 }$

Here ρ, ${\boldsymbol{U}}$ , and h are the mass density, velocity, and specific enthalpy, respectively, and X_k are the mass fractions of species k with associated production rate ${\dot{\omega }}_{k}$ and energy release per time per unit mass ${H}_{\mathrm{nuc}}$ . The species are constrained such that ${\sum }_{k}{X}_{k}=1$ giving $\rho ={\sum }_{k}(\rho {X}_{k})$ and

$\begin{eqnarray}&&\displaystyle \frac{\partial \rho }{\partial t}=-{\rm{\nabla }}\cdot (\rho {\boldsymbol{U}}).\end{eqnarray} \tag{ 4 }$

The total pressure is decomposed into a one-dimensional hydrostatic base state pressure, ${p}_{0}={p}_{0}(r,t)$ , and a dynamic pressure, $\pi =\pi ({\boldsymbol{x}},t)$ , such that $p={p}_{0}+\pi$ and $| \pi | /{p}_{0}\,={ \mathcal O }({\mathrm{Ma}}^{2})$ (we use ${\boldsymbol{x}}$ to represent the Cartesian coordinate directions of the full state and r to represent the radial coordinate direction for the base state). One way to mathematically think of the difference between p₀ and π is that π controls the velocity evolution in a way that forces the thermodynamic variables $(\rho ,h,{X}_{k})$ to evolve in a manner that is consistent with the EOS and p₀.

By comparing the momentum Equation (2) to the momentum equation used in Equation (2) in Paper V, we note that we are using a formulation that enforces conservation of total energy in the low Mach number system in the absence of external heating or viscous terms (Klein & Pauluis 2012; Vasil et al. 2013). We have previously validated this approach in modeling sub-Chandrasekhar white dwarfs using MAESTRO (Jacobs et al. 2016). We also define a one-dimensional base state density, ${\rho }_{0}={\rho }_{0}(r,t)$ , that represents the lateral average (see Section 3.1) of ρ and is in hydrostatic equilibrium with p₀, i.e.,

$\begin{eqnarray}&&{\rm{\nabla }}{p}_{0}=-{\rho }_{0}g{{\boldsymbol{e}}}_{r},\end{eqnarray} \tag{ 5 }$

where $g=g(r,t)$ is the magnitude of the gravitational acceleration and ${{\boldsymbol{e}}}_{r}$ is the unit vector in the outward radial direction. Here β₀ is a density-like variable that carries background stratification, defined as

$\begin{eqnarray}&&{\beta }_{0}(r,t)={\rho }_{0}(0,t)\exp \left({\int }_{0}^{r}\displaystyle \frac{1}{{\overline{{\rm{\Gamma }}}}_{1}{p}_{0}}\displaystyle \frac{\partial {p}_{0}}{\partial r^{\prime} }{dr}^{\prime} \right),\end{eqnarray} \tag{ 6 }$

where ${\overline{{\rm{\Gamma }}}}_{1}$ is the lateral average of ${{\rm{\Gamma }}}_{1}=d(\mathrm{log}p)/d(\mathrm{log}\rho ){| }_{s}$ (evaluated with entropy, s, constant). We explored the effect of replacing ${{\rm{\Gamma }}}_{1}$ with ${\overline{{\rm{\Gamma }}}}_{1}$ as well as a correction term in Paper III. Thermal diffusion is not discussed in this paper, but we have previously described the modifications to the original algorithm required for implicit thermal diffusion in Malone et al. (2011); inclusion of these effects in the new algorithm presented here is straightforward.

Mathematically, Equations (2) and (3) must still be closed by the EOS. This is done by taking the Lagrangian derivative of the EOS for pressure as a function of the thermodynamic variables, substituting in the equations of motion for mass and energy, and requiring that the pressure is a prescribed function of altitude and time based on the hydrostatic equilibrium condition. See Papers I and II for details of this derivation. The resulting equation is a divergence constraint on the velocity field,

$\begin{eqnarray}&&{\rm{\nabla }}\cdot ({\beta }_{0}{\boldsymbol{U}})={\beta }_{0}\left(S-\displaystyle \frac{1}{{\overline{{\rm{\Gamma }}}}_{1}{p}_{0}}\displaystyle \frac{\partial {p}_{0}}{\partial t}\right).\end{eqnarray} \tag{ 7 }$

The expansion term, S, incorporates local compressibility effects due to compositional changes and heat release from reactions,

$\begin{eqnarray}&&S=-\sigma \displaystyle \sum _{k}{\xi }_{k}{\dot{\omega }}_{k}+\displaystyle \frac{1}{\rho {p}_{\rho }}\displaystyle \sum _{k}{p}_{{X}_{k}}{\dot{\omega }}_{k}+\sigma {H}_{\mathrm{nuc}},\end{eqnarray} \tag{ 8 }$

where ${p}_{{X}_{k}}\equiv {\left.\partial p/\partial {X}_{k}\right|}_{\rho ,T,{X}_{j,j\ne k}}$ , ${\xi }_{k}\equiv {\left.\partial h/\partial {X}_{k}\right|}_{p,T,{X}_{j,j\ne k}}$ , ${p}_{\rho }\,\equiv {\left.\partial p/\partial \rho \right|}_{T,{X}_{k}}$ , and $\sigma \equiv {p}_{T}/(\rho {c}_{p}{p}_{\rho })$ , with ${p}_{T}\equiv {\left.\partial p/\partial T\right|}_{\rho ,{X}_{k}}$ and ${c}_{p}\equiv {\left.\partial h/\partial T\right|}_{p,{X}_{k}}$ is the specific heat at constant pressure.

To summarize, we model evolution equations for momentum, mass, and energy, Equations (2) and (3), subject to a divergence constraint on the velocity, Equation (7), and the hydrostatic equilibrium condition, Equation (5).

3. Numerical Algorithm

3.1. Spatial Discretization

The spatial discretization and AMR methodology remains unchanged from Paper V. We now summarize some of the key points here before describing the new temporal integrator in the next section. We recommend the reader review Section 3 of Paper V for further details.

We shall refer to local atmospheric flows in two and three dimensions as problems in "planar" geometry, and full-star flows in three dimensions as problems in "spherical" geometry. The solution in both cases consists of the Cartesian grid solution and the one-dimensional base state solution. Figure 1 illustrates the relationship between the base state and the Cartesian grid state for both planar and spherical geometries in the presence of spatially adaptive grids. One of the key numerical modules is the "lateral average," which computes the average over a layer of constant radius of a Cartesian grid variable and stores the result in a one-dimensional base state array. In planar geometries, this is a straightforward arithmetic average of values in cells at a particular height since the base state cell centers are in alignment with the Cartesian grid cell centers. However for spherical problems, the procedure is much more complicated. In Section 4 of Paper V, we describe how there is a finite, easily computable set of radii that any three-dimensional Cartesian cell center can map to. Specifically, for every three-dimensional Cartesian cell, there exists an integer m such that the distance from the cell center to the center of the star is given by

$\begin{eqnarray}&&{\hat{r}}_{m}={\rm{\Delta }}x\sqrt{0.75+2m}.\end{eqnarray} \tag{ 9 }$

**Figure 1.** (Left) For multilevel problems in planar geometry, we force a direct alignment between the base state cell centers and the Cartesian grid cell centers by allowing the radial base state spacing to change with space and time. (Right) For multilevel problems in spherical geometry, since there is no direct alignment between the base state cell centers and the Cartesian grid cell centers, we choose to fix the radial base state spacing across levels. Reprinted from Paper V (Nonaka et al. 2010).
Download figure:
Standard image High-resolution image

Figure 2 is a two-dimensional illustration (two dimensions is chosen in the figure for ease of exposition; this mapping is only used for three-dimensional spherical stars) of the relationship between the Cartesian grid state and the one-dimensional base state array. We compute the lateral average by first summing the values in all the cells associated with each radius, dividing by the number of contributing cells to obtain the arithmetic average, and then using quadratic interpolation to map this data onto a one-dimensional base state. Previously, for spherical problems MAESTRO only allowed for a base state with constant Δr (typically equal to Δx/5).

The companion "fill" module maps a base state array onto the full Cartesian grid state. For planar problems, direct injection can be used due to the perfect alignment of the base state and Cartesian grid state. For spherical problems, quadratic interpolation of the base state is used to assign values to each Cartesian cell center.

In this paper we explore a new option to retain an irregularly spaced base state to eliminate mapping errors from the "fill" module. For the lateral average, as before we first sum the values in all the cells associated with each radius and divide by the number of contributing cells to obtain the arithmetic average. However we do not interpolate this onto a uniformly spaced base state and retain the use of this irregularly spaced base state. The advantage with this approach is that the fill module does not require any interpolation. A potential benefit to eliminating the mapping error is to consider a spherical star in hydrostatic equilibrium at rest. In the absence of reactions, the star should remain at rest. The buoyancy forcing term in the momentum equation contains $\rho \mbox{--}{\rho }_{0}$ . With the original scheme, interpolation errors in computing ρ₀ by averaging would cause artificial acceleration in the velocity field due to the interpolation error from the Cartesian grid to and from the radial base state. By retaining the radial base state as an irregularly spaced array, the truncation error in the mapping scheme is completely eliminated. The only source of error is due to machine precision effects resulting from averaging a large number of values, which is not significant over the course of any simulation. We note that Δr decreases as the base state moves further from the center of the star, which results in far more total cells in the irregularly spaced array than the previous uniformly spaced array.

3.2. Temporal Integration Scheme

We now describe the new temporal integration scheme, noting that it can be used for the original base state mapping (with uniform base state grid spacing) as well as the new irregularly spaced base state mapping. Previously we adopted an approach where we split the velocity into a base state component, ${w}_{0}(r,t)$ , and a local velocity $\widetilde{{\boldsymbol{U}}}({\boldsymbol{x}},t)$ , so that

$\begin{eqnarray}&&{\boldsymbol{U}}=\widetilde{{\boldsymbol{U}}}({\boldsymbol{x}},t)+{w}_{0}(r,t){{\boldsymbol{e}}}_{r},\end{eqnarray} \tag{ 10 }$

where ${{\boldsymbol{e}}}_{r}$ is the normal vector in the outward radial direction. We used w₀ to provide an estimate of the base state density evolution over a time step. This resulted in some unnecessary complications to the temporal integration scheme including base state advection modules for density, enthalpy, and velocity, as well as more cumbersome split velocity dynamics evolution equations. Our new temporal integration scheme uses full velocities for scalar and velocity advection, and only uses the above splitting to satisfy the velocity divergence constraint due to boundary considerations at the edge of the star. This results in a much simpler numerical scheme than the one from Paper V since we use the velocity directly rather than more complex terms involving the perturbational velocity. Additionally, the new scheme uses a simpler predictor-corrector approach to the base state density and pressure that no longer requires evolution equations and numerical discretizations to update the base state, greatly simplifying the algorithm while retaining the same overall second-order accuracy in time.

At the beginning of each time step we have the cell-centered Cartesian grid state, ${({\boldsymbol{U}},\rho {X}_{k},\rho h)}^{n}$ , and nodal Cartesian grid state, ${\pi }^{n-1/2}$ , and base state ${({\rho }_{0},{p}_{0})}^{n}$ . At any time, the associated density, composition, and enthalpy can be trivially computed using, e.g.,

$\begin{eqnarray}&&{\rho }^{n}=\displaystyle \sum _{k}{\left(\rho {X}_{k}\right)}^{n},\quad {X}_{k}^{n}={\left(\rho {X}_{k}\right)}^{n}{/\rho }^{n},\quad {h}^{n}={\left(\rho h\right)}^{n}{/\rho }^{n}.\end{eqnarray} \tag{ 11 }$

Temperature is computed using the EOS,³ e.g., $T\,=T(\rho ,{p}_{0},{X}_{k})$ , where p₀ has been mapped to the Cartesian grid using the fill module, and ( ${\overline{{\rm{\Gamma }}}}_{1},{\beta }_{0})$ are similarly computed from $(\rho ,{p}_{0},{X}_{k});$ see Appendix A of Paper I and Appendix C of Paper III for details on how β₀ is computed.

The overall flow of the algorithm begins with a second-order Strang splitting approach to integrate the advection-reaction system for the thermodynamic variables $(\rho {X}_{k},\rho h)$ , followed by a second-order projection methodology to integrate the velocities subject to a divergence constraint. Within the thermodynamic variable update we use a predictor-corrector approach to achieve second-order accuracy in time. To summarize:

1.
In Step 1 we react the thermodynamic variables over the first Δt/2 interval.
2.
In Steps 2–4 we advect the thermodynamic variables over Δt. Specifically, we compute an estimate for the expansion term, S, compute face-centered, time-centered velocities that satisfy the divergence constraint, and then advect the thermodynamic variables.
3.
In Step 5 we react the thermodynamic variables over the second Δt/2 interval.⁴
4.
In Steps 6–8 we redo the advection in Steps 2–4 but are able to use the trapezoidal rule to time-center certain quantities such as S, ρ₀, etc.
5.
In Step 9 we redo the reactions from Step 5 beginning with the improved results from Steps 6–8.
6.
In Steps 10–11 we advect the velocity, and then project these velocities so they satisfy the divergence constraint while updating π.

There are a few key numerical modules we use in each time step.

1.
Average $[\phi ]\to [\overline{\phi }]$ computes the lateral average of a quantity over a layer at constant radius r, as described above in Section 3.1.
2.
Enforce HSE $[{\rho }_{0}]\to [{p}_{0}]$ computes the base state pressure, p₀, from a base state density, ρ₀ by integrating the hydrostatic equilibrium condition in one dimension. This follows equation (A10) in Paper V, noting that for the irregularly spaced base state case, Δr is not constant, where Δr_j+1/2 = r_j+1−r_j for cell face with index j+1/2. The base state pressure remains equal to a constant value at the location of a prescribed cutoff density outward for the entire simulation.
3.
React State $[{(\rho {X}_{k})}^{\mathrm{in}},\,{(\rho h)}^{\mathrm{in}},\,{p}_{0}]\,\to \,[{(\rho {X}_{k})}^{\mathrm{out}},\,{(\rho h)}^{\mathrm{out}},\,(\rho \dot{\omega }),(\rho {H}_{\mathrm{nuc}})]$ uses the multistep variable-coefficient Adams–Moulton and Backward Differentiation Formula methods in the VODE (Brown et al. 1989) package to integrate the species and enthalpy due to reactions over Δt/2 by solving
$\begin{eqnarray}\begin{array}{rcl}\displaystyle \frac{{{dX}}_{k}}{{dt}} & = & {\dot{\omega }}_{k}(\rho ,{X}_{k},T);\\ \displaystyle \frac{{dT}}{{dt}} & = & \displaystyle \frac{1}{{c}_{p}}\left(-\displaystyle \sum _{k}{\xi }_{k}{\dot{\omega }}_{k}+{H}_{\mathrm{nuc}})\right).\end{array}\end{eqnarray} \tag{ 12 }$
The inputs are the species, enthalpy, and base state pressure, and the outputs are the species, enthalpy, reaction rates, and nuclear energy generation rate. See Paper III for details.

Each time step is constrained by the standard advective CFL condition,

$\begin{eqnarray}&&{\rm{\Delta }}t={\sigma }^{\mathrm{CFL}}\mathop{\min }\limits_{i}({\rm{\Delta }}x/{U}_{i}),\end{eqnarray} \tag{ 13 }$

where for our simulations we typically use ${\sigma }^{\mathrm{CFL}}\sim 0.7$ and the minimum is taken over all spatial directions over all cells. For simulations that are initialized with zero velocity, we use the buoyancy force in the momentum equation to give a proper timescale to start the simulation; see Section 3.4 in Paper III for details.

In stratified low Mach number models, due to the extreme variation in density, the velocity can become very large in low density regions at the edge of the star. These large velocities can severely affect the time step, so throughout Papers II–V, we have employed two techniques to help control these dynamics without significantly affecting the dynamics in the convective region. First, we use a cutoff density technique, where we hold the density constant outside a specified radius (typically near where the density is ∼4 orders of magnitude smaller than the largest densities in the simulation). Second, we employ a sponge technique where we artificially damp the velocities near and beyond the cutoff region. For more details, refer to Paper V and the previous references cited within.

Beginning with ${({\boldsymbol{U}},\rho {X}_{k},\rho h)}^{n}$ , ${\pi }^{n-1/2}$ , and ${({\rho }_{0},{p}_{0})}^{n}$ , the temporal integration scheme contains the following steps:

Step 1: react the thermodynamic variables over the first Δt/2 interval. Call React State [(ρX_k)ⁿ, (ρh)ⁿ, ${p}_{0}^{n}$ ] → [ ${\left(\rho {X}_{k}\right)}^{\left(1\right)},{\left(\rho h\right)}^{\left(1\right)}$ , ${\left(\rho {\dot{\omega }}_{k}\right)}^{\left(1\right)},{\left(\rho {H}_{\mathrm{nuc}}\right)}^{\left(1\right)}$ ].
Step 2: compute the time-centered expansion term, ${S}^{n+1/2,\star }$ .We compute an estimate for the time-centered expansion term in the velocity divergence constraint. Following Bell et al. (2004), we extrapolate to the half-time using S at the previous and current time steps,
$\begin{eqnarray}&&{S}^{n+1/2,\star }={S}^{n}+\displaystyle \frac{{\rm{\Delta }}{t}^{n}}{2}\left(\displaystyle \frac{{S}^{n}-{S}^{n-1}}{{\rm{\Delta }}{t}^{n-1}}\right).\end{eqnarray} \tag{ 14 }$
Note that in the first time step we average S⁰ and S¹ from the initialization step.
Step 3: construct a face-centered, time-centered advective velocity, ${{\boldsymbol{U}}}^{\mathrm{ADV},\star }$ .The construction of face-centered time-centered states used to discretize the advection terms for velocity, species, and enthalpy, are performed using a standard multidimensional corner transport upwind approach (Colella 1990; Saltzman 1994) with the piecewise-parabolic method one-dimensional tracing (Colella & Woodward 1984). The full details of this Godunov advection approach for all steps in this algorithm are described in Appendix A of Zingale et al. (2015). Here we use Equation (2) to compute face-centered, time-centered velocities, ${{\boldsymbol{U}}}^{\mathrm{ADV},\dagger ,\star }$ . The † superscript refers to the fact that the predicted velocity field does not satisfy the divergence constraint,
$\begin{eqnarray}&&{\rm{\nabla }}\cdot \left({\beta }_{0}^{n}{{\boldsymbol{U}}}^{\mathrm{ADV},\star }\right)={\beta }_{0}^{n}\left[{S}^{n+1/2* * ,\star }-\displaystyle \frac{1}{{\overline{{\rm{\Gamma }}}}_{1}^{n}{p}_{0}^{n}}{\left(\displaystyle \frac{\partial {p}_{0}}{\partial t}\right)}^{n-1/2}\right].\end{eqnarray} \tag{ 15 }$
We project ${{\boldsymbol{U}}}^{\mathrm{ADV},\dagger ,\star }$ onto the space of velocities that satisfy the constraint to obtain ${{\boldsymbol{U}}}^{\mathrm{ADV},\star }$ . Each projection step in the algorithm involves the solution of a variable-coefficient Poisson solve using multigrid. Note that we still employ velocity-splitting as described by Equation (10) for this step in order to enforce the appropriate behavior of the system near the edge of the star as determined by the cutoff density. The details of this "MAC" projection are provided in Appendix.
Step 4: advect the thermodynamic variables over a time interval of ${\rm{\Delta }}t.$
- A.
  Update $(\rho {X}_{k})$ using a discretized version of
  $\begin{eqnarray}&&\displaystyle \frac{\partial (\rho {X}_{k})}{\partial t}=-{\rm{\nabla }}\cdot (\rho {X}_{k}{\boldsymbol{U}}),\end{eqnarray} \tag{ 16 }$
  where the reaction terms have been omitted since they were already accounted for in React State. The update consists of two steps:
  - i.
    Compute the face-centered, time-centered species, ${(\rho {X}_{k})}^{n+1/2,\mathrm{pred},\star }$ , for the conservative update of ${(\rho {X}_{k})}^{(1)}$ using a Godunov approach (Zingale et al. 2015). As described in Paper V, for robust numerical slope limiting we predict ${\rho }^{{\prime} n}={\rho }^{n}-{\rho }_{0}^{n}$ and X_kⁿ to faces and here we spatially interpolate ${\rho }_{0}^{n}$ to faces to assemble the fluxes.
  - ii.
    Evolve (ρX_k)⁽¹⁾ → (ρX_k)^(2),⋆ using
    $\begin{eqnarray}\begin{array}{rcl}{\left(\rho {X}_{k}\right)}^{(2),\star } & = & {\left(\rho {X}_{k}\right)}^{\left(1\right)}\\ & & -{\rm{\Delta }}t\left\{{\rm{\nabla }}\cdot \left[{{\boldsymbol{U}}}^{\mathrm{ADV},\star }{\left(\rho {X}_{k}\right)}^{n+1/2,\mathrm{pred},\star }\right]\right\}.\end{array}\end{eqnarray} \tag{ 17 }$
- B.
  Update ρ₀ by calling Average $[{\rho }^{(2),\star }]\to [{\rho }_{0}^{n+1,\star }]$ .
- C.
  Update p₀ by calling Enforce HSE $[{\rho }_{0}^{n+1,\star }]\,\to \,[{p}_{0}^{n+1,\star }]$ .
- D.
  Update the enthalpy using a discretized version of equation
  $\begin{eqnarray}&&\displaystyle \frac{\partial (\rho h)}{\partial t}=-{\rm{\nabla }}\cdot (\rho h{\boldsymbol{U}})+\displaystyle \frac{{{Dp}}_{0}}{{Dt}}+\rho {H}_{\mathrm{nuc}},\end{eqnarray} \tag{ 18 }$
  again omitting the reaction terms since we already accounted for them in React State. This equation takes the form:
  $\begin{eqnarray}\begin{array}{rcl}\displaystyle \frac{\partial (\rho h)}{\partial t} & = & -{\rm{\nabla }}\cdot (\rho h{\boldsymbol{U}})+\displaystyle \frac{\partial {p}_{0}}{\partial t}\\ & & +({\boldsymbol{U}}\cdot {{\boldsymbol{e}}}_{r})\displaystyle \frac{\partial {p}_{0}}{\partial r}.\end{array}\end{eqnarray} \tag{ 19 }$
  For spherical geometry, we solve the analytically equivalent form,
  $\begin{eqnarray}\begin{array}{rcl}\displaystyle \frac{\partial (\rho h)}{\partial t} & = & -{\rm{\nabla }}\cdot (\rho h{\boldsymbol{U}})+\displaystyle \frac{\partial {p}_{0}}{\partial t}\\ & & +{\rm{\nabla }}\cdot ({\boldsymbol{U}}{p}_{0})-{p}_{0}{\rm{\nabla }}\cdot {\boldsymbol{U}}.\end{array}\end{eqnarray} \tag{ 20 }$
  The update consists of two steps:
  - i.
    Compute the face-centered, time-centered enthalpy, ${(\rho h)}^{n+1/2,\mathrm{pred},\star },$ for the conservative update of ${(\rho h)}^{(1)}$ using a Godunov approach (Zingale et al. 2015). As described in Paper V, for robust numerical slope limiting we predict ${(\rho h)}^{{\prime} n}\,={(\rho h)}^{n}-(\rho h{)}_{0}^{n}$ to faces, where $(\rho h{)}_{0}^{n}$ is obtained by calling Average $[{(\rho h)}^{n}]\to [(\rho h{)}_{0}^{n}]$ , and here we spatially interpolate $(\rho h{)}_{0}^{n}$ to faces to assemble the fluxes.
  - ii.
    Evolve (ρh)⁽¹⁾ → (ρh)^(2),⋆ using
    $\begin{eqnarray}\begin{array}{rcl}{\left(\rho h\right)}^{\left(2\right),\star } & = & {\left(\rho h\right)}^{\left(1\right)}-{\rm{\Delta }}t\left\{{\rm{\nabla }}\cdot \left[{{\boldsymbol{U}}}^{\mathrm{ADV},\star }{\left(\rho h\right)}^{n+1/2,\mathrm{pred},\star }\right]\right\}\\ & & +{\rm{\Delta }}t\displaystyle \frac{{{Dp}}_{0}}{{Dt}}\end{array}\end{eqnarray} \tag{ 21 }$
    where here
    $\begin{eqnarray}\begin{array}{r}\displaystyle \frac{{Dp}_{0}}{Dt}=\left\{\begin{array}{cc}{\textstyle \tfrac{{p}_{0}^{n+1,\ast }-{p}_{0}^{n}}{{\rm{\Delta }}t}}+\left({{\boldsymbol{U}}}^{{\rm{A}}{\rm{D}}{\rm{V}},\star }\cdot {{\boldsymbol{e}}}_{r}\right){\left({\textstyle \tfrac{{\rm{\partial }}{p}_{0}}{{\rm{\partial }}r}}\right)}^{n} & ({\rm{p}}{\rm{l}}{\rm{a}}{\rm{n}}{\rm{a}}{\rm{r}})\\ {\textstyle \tfrac{{p}_{0}^{n+1,\ast }-{p}_{0}^{n}}{{\rm{\Delta }}t}}+\left[{\rm{\nabla }}\cdot \left({{\boldsymbol{U}}}^{{\rm{A}}{\rm{D}}{\rm{V}},\star }{p}_{0}^{n}\right)-{p}_{0}^{n}{\rm{\nabla }}\cdot {{\boldsymbol{U}}}^{{\rm{A}}{\rm{D}}{\rm{V}},\star }\right] & ({\rm{s}}{\rm{p}}{\rm{h}}{\rm{e}}{\rm{r}}{\rm{i}}{\rm{c}}{\rm{a}}{\rm{l}})\end{array}\right.,\end{array}\end{eqnarray} \tag{ 22 }$
    and ${p}_{0}^{n+1/2}=({p}_{0}^{n}+{p}_{0}^{n+1,* })/2$ .
Step 5: react the thermodynamic variables over the second Δt/2 interval. Call React State [(ρ X_k)^(2),⋆, (ρh)^(2),⋆, ${p}_{0}^{n+1,\star }]$ → [(ρX_k)^n+1,⋆, (ρh)^n+1,⋆, ${\left(\rho {\dot{\omega }}_{k}\right)}^{n+1,\star },{\left(\rho {H}_{\mathrm{nuc}}\right)}^{n+1,\star }].$
Step 6: compute the time-centered expansion term, ${S}^{n+1/2,\star }$ . First, compute ${S}^{n+1,\star }$ with
$\begin{eqnarray}&&{S}^{n+1,\star }={\left(-\sigma \displaystyle \sum _{k}{\xi }_{k}{\dot{\omega }}_{k}+\displaystyle \frac{1}{\rho {p}_{\rho }}\displaystyle \sum _{k}{p}_{{X}_{k}}{\dot{\omega }}_{k}+\sigma {H}_{\mathrm{nuc}}\right)}^{n+1,\star }.\end{eqnarray} \tag{ 23 }$
Then, define
$\begin{eqnarray}&&{S}^{n+1/2}=\displaystyle \frac{{S}^{n}+{S}^{n+1,\star }}{2},\end{eqnarray} \tag{ 24 }$
Step 7: construct a face-centered, time-centered advective velocity, ${{\boldsymbol{U}}}^{\mathrm{ADV}}$ . The procedure to construct ${{\boldsymbol{U}}}^{\mathrm{ADV},\dagger }$ is identical to the Godunov procedure for computing ${{\boldsymbol{U}}}^{\mathrm{ADV},\dagger ,\star }$ in Step 3, but uses the updated value ${S}^{n+1/2}$ rather than ${S}^{n+1/2,\star }$ . The † superscript refers to the fact that the predicted velocity field does not satisfy the divergence constraint,
$\begin{eqnarray}&&\begin{array}{l}\quad {\rm{\nabla }}\cdot \left({\beta }_{0}^{n+1/2}{{\boldsymbol{U}}}^{\mathrm{ADV}}\right)={\beta }_{0}^{n+1/2}\\ \quad \times \,\left[{S}^{n+1/2}-\displaystyle \frac{1}{{\overline{{\rm{\Gamma }}}}_{1}^{n+1/2}{p}_{0}^{n+1/2}}{\left(\displaystyle \frac{\partial {p}_{0}}{\partial t}\right)}^{n+1/2}\right],\end{array}\end{eqnarray} \tag{ 25 }$
with
$\begin{eqnarray}\begin{array}{rcl}{\beta }_{0}^{n+1/2} & = & \displaystyle \frac{{\beta }_{0}^{n}+{\beta }_{0}^{n+1,\star }}{2},\\ {\overline{{\rm{\Gamma }}}}_{1}^{n+1/2} & = & \displaystyle \frac{{\overline{{\rm{\Gamma }}}}_{1}^{n}+{\overline{{\rm{\Gamma }}}}_{1}^{n+1,\star }}{2}.\end{array}\end{eqnarray} \tag{ 26 }$
we project ${{\boldsymbol{U}}}^{\mathrm{ADV},\dagger }$ onto the space of velocities that satisfy the constraint to obtain ${{\boldsymbol{U}}}^{\mathrm{ADV}}$ using a MAC projection (see Appendix).
Step 8: advect the thermodynamic variables over a time interval of ${\rm{\Delta }}t.$
- A.
  Update $(\rho {X}_{k})$ . This step is identical to Step 4A except we use the updated values ${{\boldsymbol{U}}}^{\mathrm{ADV}}$ and ${\rho }_{0}^{n+1,\star }$ rather than ${{\boldsymbol{U}}}^{\mathrm{ADV},\star }$ and ${\rho }_{0}^{n}$ . In particular:
  - i.
    Compute the face-centered, time-centered species, ${(\rho {X}_{k})}^{n+1/2,\mathrm{pred}}$ , for the conservative update of ${(\rho {X}_{k})}^{(1)}$ using a Godunov approach (Zingale et al. 2015). Again, we predict ${\rho }^{{\prime} n}={\rho }^{n}-{\rho }_{0}^{n}$ and X_kⁿ to faces but here we spatially interpolate $({\rho }_{0}^{n}+{\rho }_{0}^{n+1,* })/2$ to faces to assemble the fluxes.
  - ii.
    Evolve (ρX_k)⁽¹⁾ → (ρX_k)⁽²⁾ using
    $\begin{eqnarray}\begin{array}{rcl}{\left(\rho {X}_{k}\right)}^{\left(2\right)} & = & {\left(\rho {X}_{k}\right)}^{\left(1\right)}\\ & & -{\rm{\Delta }}t\left\{{\rm{\nabla }}\cdot \left[{{\boldsymbol{U}}}^{\mathrm{ADV}}{\left(\rho {X}_{k}\right)}^{n+1/2,\mathrm{pred}}\right]\right\},\end{array}\end{eqnarray} \tag{ 27 }$
- B.
  Update ρ₀ by calling Average $[{\rho }^{(2)}]\to [{\rho }_{0}^{n+1}]$ .
- C.
  Update p₀ by calling Enforce HSE $[{\rho }_{0}^{n+1}]\to [{p}_{0}^{n+1}]$ .
- D.
  Update the enthalpy. This step is identical to Step 4D except we use the updated values ${{\boldsymbol{U}}}^{\mathrm{ADV}}$ , ${\rho }_{0}^{n+1}$ , $(\rho h{)}_{0}^{n+1}$ , and ${p}_{0}^{n+1}$ rather than ${{\boldsymbol{U}}}^{\mathrm{ADV},\star },{\rho }_{0}^{n}$ , $(\rho h{)}_{0}^{n}$ , and p₀ⁿ. In particular:
  - i.
    Compute the face-centered, time-centered enthalpy, ${(\rho h)}^{n+1/2,\mathrm{pred}},$ for the conservative update of ${(\rho h)}^{(1)}$ using a Godunov approach (Zingale et al. 2015). Again, we predict ${(\rho h)}^{{\prime} n}={(\rho h)}^{n}-(\rho h{)}_{0}^{n}$ to faces but here we spatially interpolate $[(\rho h{)}_{0}^{n})+(\rho h{)}_{0}^{n+1,* }]/2$ to faces to assemble the fluxes.
  - ii.
    Evolve (ρh)⁽¹⁾ → (ρh)⁽²⁾.
    $\begin{eqnarray}\begin{array}{rcl}{\left(\rho h\right)}^{\left(2\right)} & = & {\left(\rho h\right)}^{\left(1\right)}-{\rm{\Delta }}t\left\{{\rm{\nabla }}\cdot \left[{{\boldsymbol{U}}}^{\mathrm{ADV}}{\left(\rho h\right)}^{n+1/2,\mathrm{pred}}\right]\right\}\\ & & +{\rm{\Delta }}t\displaystyle \frac{{{Dp}}_{0}}{{Dt}}\end{array}\end{eqnarray} \tag{ 28 }$
    where here
    $\begin{eqnarray}\begin{array}{r}\displaystyle \frac{{Dp}_{0}}{Dt}=\left\{\begin{array}{lc}{\textstyle \tfrac{{p}_{0}^{n+1}-{p}_{0}^{n}}{{\rm{\Delta }}t}}+\left({{\boldsymbol{U}}}^{{\rm{A}}{\rm{D}}{\rm{V}}}\cdot {{\boldsymbol{e}}}_{r}\right){\left({\textstyle \tfrac{{\rm{\partial }}{p}_{0}}{{\rm{\partial }}r}}\right)}^{n+1/2} & ({\rm{p}}{\rm{l}}{\rm{a}}{\rm{n}}{\rm{a}}{\rm{r}})\\ {\textstyle \tfrac{{p}_{0}^{n+1}-{p}_{0}^{n}}{{\rm{\Delta }}t}}+\left[{\rm{\nabla }}\cdot \left({{\boldsymbol{U}}}^{{\rm{A}}{\rm{D}}{\rm{V}}}{p}_{0}^{n+1/2}\right)-{p}_{0}^{n+1/2}{\rm{\nabla }}\cdot {{\boldsymbol{U}}}^{{\rm{A}}{\rm{D}}{\rm{V}}}\right] & ({\rm{s}}{\rm{p}}{\rm{h}}{\rm{e}}{\rm{r}}{\rm{i}}{\rm{c}}{\rm{a}}{\rm{l}})\end{array}\right.,\end{array}\end{eqnarray} \tag{ 29 }$
    and ${p}_{0}^{n+1/2}=({p}_{0}^{n}+{p}_{0}^{n+1})/2$ .
Step 9: react the thermodynamic variables over the second Δt/2 interval. Call React State [(ρX_k)⁽²⁾, (ρh)⁽²⁾, ${p}_{0}^{n+1}$ ] → [(ρX_k)ⁿ⁺¹, (ρh)ⁿ⁺¹, ${\left(\rho {\dot{\omega }}_{k}\right)}^{n+1},{\left(\rho {H}_{\mathrm{nuc}}\right)}^{n+1}$ ].
Step 10: define the new-time expansion term, Sⁿ⁺¹.
- A.
  Define
  $\begin{eqnarray}&&{S}^{n+1}={\left(-\sigma \displaystyle \sum _{k}{\xi }_{k}{\dot{\omega }}_{k}+\sigma {H}_{\mathrm{nuc}}+\displaystyle \frac{1}{\rho {p}_{\rho }}\displaystyle \sum _{k}{p}_{{X}_{k}}{\dot{\omega }}_{k}\right)}^{n+1}.\end{eqnarray} \tag{ 30 }$
Step 11: update the velocity. First, we compute the face-centered, time-centered velocities, ${{\boldsymbol{U}}}^{n+1/2,\mathrm{pred}}$ using a Godunov approach (Zingale et al. 2015). Then, we update the velocity field ${{\boldsymbol{U}}}^{n}$ to ${{\boldsymbol{U}}}^{n+1,\dagger }$ by discretizing Equation (2) as
$\begin{eqnarray}&&\begin{array}{l}{{\boldsymbol{U}}}^{n+1,\dagger }={{\boldsymbol{U}}}^{n}-{\rm{\Delta }}t\left[{{\boldsymbol{U}}}^{\mathrm{ADV}}\cdot {\rm{\nabla }}{{\boldsymbol{U}}}^{n+1/2,\mathrm{pred}}\right]\\ \quad -\,{\rm{\Delta }}t\left[\displaystyle \frac{{\beta }_{0}^{n+1/2}}{{\rho }^{n+1/2}}{\rm{\nabla }}\left(\displaystyle \frac{{\pi }^{n-1/2}}{{\beta }_{0}^{n-1/2}}\right)\right.\\ \quad \left.+\,\displaystyle \frac{\left({\rho }^{n+1/2}-{\rho }_{0}^{n+1/2}\right)}{{\rho }^{n+1/2}}{g}^{n+1/2}{{\boldsymbol{e}}}_{r}\right],\end{array}\end{eqnarray} \tag{ 31 }$
where
$\begin{eqnarray}\begin{array}{rcl}{\rho }^{n+1/2} & = & \displaystyle \frac{{\rho }^{n}+{\rho }^{n+1}}{2},\\ {\rho }_{0}^{n+1/2} & = & \displaystyle \frac{{\rho }_{0}^{n}+{\rho }_{0}^{n+1}}{2}.\end{array}\end{eqnarray} \tag{ 32 }$
Again, the † superscript refers to the fact that the updated velocity does not satisfy the divergence constraint,
$\begin{eqnarray}&&\begin{array}{l}{\rm{\nabla }}\cdot \left({\beta }_{0}^{n+1}{{\boldsymbol{U}}}^{n+1}\right)\\ =\,{\beta }_{0}^{n+1}\left[{S}^{n+1}-\displaystyle \frac{1}{{\overline{{\rm{\Gamma }}}}_{1}^{n+1}{p}_{0}^{n+1}}{\left(\displaystyle \frac{\partial {p}_{0}}{\partial t}\right)}^{n+1/2}\right].\end{array}\end{eqnarray} \tag{ 33 }$
We use an approximate projection to project ${{\boldsymbol{U}}}^{n+1,\dagger }$ onto the space of velocities that satisfy the constraint to obtain ${{\boldsymbol{U}}}^{n+1}$ using a "nodal" projection. This projection necessarily differs from the MAC projection used in Step 3 and Step 7 because the velocities in those steps are defined on faces and ${{\boldsymbol{U}}}^{n+1}$ is defined at cell centers, requiring different divergence and gradient operators. Furthermore, as part of the nodal projection, we also define a nodal new-time perturbational pressure, ${\pi }^{n+1/2}$ . Refer to Appendix for more details.

This completes one step of the algorithm.

To initialize the simulation we use the same procedure described in Paper III. At the beginning of each simulation, we define $({\boldsymbol{U}},\rho {X}_{k},\rho h)$ . We set initial values for ${\boldsymbol{U}},\rho {X}_{k}$ , and $\rho h$ and perform a sequence of projections (to ensure the velocity field satisfies the divergence constraint) followed by a small number of steps of the temporal advancement scheme to iteratively find initial values for ${\pi }^{n-1/2}$ and S⁰ and S¹ for use in the first time step.

Our approach to AMR is algorithmically the same as the treatment described in Section 5 of Paper V; we refer the reader there for details. MAESTROeX supports refinement ratios of 2 between levels. We note that for spherical problems, AMR is only available for the case of a uniformly spaced base state.

4. Performance and Validation

4.1. Performance and Scaling

We perform weak scaling tests for simulations of convection preceding ignition in a spherical, full-star Chandrasekhar mass white dwarf. The simulation setup remains the same as reported in Section 3 of Nonaka et al. (2011) and originally used in Zingale et al. (2011), and thus we emphasize that these scaling tests are performed using meaningful, scientific calculations. Here, we perform simulations using ${256}^{3},{512}^{3},{768}^{3},{1024}^{3},{1280}^{3}$ , and 1536³ grid cells on a spatially uniform grid (no AMR). We divide each simulation into 64³ grids, so these simulations contain between 64 grids (256³) and 13,824 grids (1536³). These simulations were performed using the NERSC cori system on the Intel Xeon Phi (KNL) partition. Each node contains 68 cores, each capable of supporting up to four hardware threads (i.e., a maximum of 272 hardware threads per node). For these tests, we assign four MPI tasks to each node, and 16 OpenMP threads per MPI process. Each MPI task is assigned to a single grid, so our tests use between 64 and 13,824 MPI processes (i.e., between 1024 and 221,184 total OpenMP threads). For 64³ grids we discovered that using more than 16 OpenMP threads did not decrease the wallclock time due to a lack of work available per grid; in principle, one could use larger grids, fewer MPI processes, and more threads per MPI process to obtain a flatter weak scaling curve; however, the overall wallclock time would increase except for extremely large numbers of MPI processes (beyond the range we tested here). Thus, the more accurate measure of weak scaling is to consider the number of MPI processes, since the scaling plot would look virtually identical for larger thread counts. Note that the largest simulation used roughly 36% of the entire computational system.

In the left panel of Figure 3 we compare the wallclock time per time step as a function of total core count (in this case, the total number of OpenMP threads) for the original FBoxLib-based MAESTRO implementation to the AMReX MAESTROeX implementation. These tests were performed using the original temporal integration strategy in Nonaka et al. (2010), noting that the new temporal integration with and without the irregular base state gives essentially the same results. We also include a plot of MAESTROeX without base state evolution. Comparing the original and new implementations, we see similar scaling results except for the largest simulation, where MAESTROeX performs better. We see that the increase in wallclock time from the smallest to largest simulations is roughly 42%. We also note that without base state evolution, the code runs 14% faster for small problems, and scales much better with wallclock time from the smallest to largest simulation increasing by only 13%. This is quite remarkable since there are three linear solves per time step (two cell-centered Poisson solves used in the MAC projection, and a nodal Poisson solve used to compute the updated cell-centered velocities). Contrary to our prior assumptions, the linear solves are not the primary scaling bottleneck in this code. In the right panel of Figure 3, we isolate the wallclock time required for these linear solves and see that (i) the linear solves only use 20%–23% of the total computational time, and (ii) the increase in the solver wallclock time from the smallest to largest simulations is only 28%. Further profiling reveals that the primary scaling bottleneck is the average operator. The averaging operator requires collecting the sum of Cartesian data onto one-dimensional arrays holding every possible mapping radius. This amounts to at least 24,384 double precision values (for the 256³ simulation) up to 883,584 values (for the 1536³ simulation). The averaging operator requires a global sum reduction over all processors, and the communication of this data is the primary scaling bottleneck. For the simulation with base state evolution, this averaging operator is only called once per time step (as opposed to 14 times per time step when base state evolution is included). The difference in total wallclock times with and without base state evolution is almost entirely due to the averaging. Note that as expected, advection, reactions, and calls to the EOS scale almost perfectly, since there is only a single parallel communication call to fill ghost cells.

4.2. White Dwarf Convection

To explore the accuracy of the new temporal algorithm, we now analyze in detail three-dimensional, full-star calculations of convection preceding ignition in a white dwarf. Again, we refer the reader to Nonaka et al. (2011) and Zingale et al. (2011) for setup details. We implement both uniformly and irregularly spaced base state with the new temporal algorithm, while only uniform base state spacing is used in the original algorithm. As in Section 3 of Nonaka et al. (2011), we choose the peak temperature and peak Mach number as the two diagnostics to compare the simulations. Figure 4 shows the evolution of both peak temperature and peak Mach number until time of ignition on a single-level grid with resolution of 256³. The simulation using the new temporal scheme with uniformly spaced base state gives the same qualitative results as the original scheme, and predicts a similar time of ignition (t = 7810 s compared to t = 7850 s for original algorithm). The simulation using the new temporal scheme with irregularly spaced base state displays a slightly different peak temperature behavior during the initial transition period $t\lt 150\,{\rm{s}}$ , which results in the difference between the curves post transition. We strongly suspect that this is a result of using a different initial model file (the resolution near the center of the star is much coarser with the irregular spacing than the uniform spacing). Fortunately, the simulation with irregular base state spacing still follows the same trend as with uniform spacing, and the star is shown to ignite at an earlier time t = 6840 s.

**Figure 4.** (Left) Peak temperature, T_peak, and (right) peak Mach number in a white dwarf until time of ignition at resolution of 256³ for three different MAESTROeX algorithms.
Download figure:
Standard image High-resolution image

Figure 5 shows the peak temperature evolution over the first 1000 s on two grids of differing resolutions, 256³ and 512³. Limited allocations prevented us from running this simulation further. As previously suspected, the simulation using irregularly spaced base state agrees much closer with the results from using uniform spacing as the resolution increases. This is most likely due to the increased resolution of the initial model, which more closely matches the uniformly spaced counterpart. This is especially important when computing the base state pressure from base state density, which is particularly sensitive to coarse resolution near the center of the star.

In terms of efficiency, all three simulations on the 256³ single-level grid were run on Cori haswell with 64 processors and 8 threads per core and their run times were compared. As a result of simplifying the algorithm by eliminating the evolution equations for the base state density and pressure, the simulation using the new temporal algorithm took only 6.75 s per time step with uniformly spaced base state, which is 13% faster than the 7.77 s per time step when using the original scheme. However, we do observe a 25% increase in run time of 9.72 s when using irregularly spaced base state with the new algorithm. This can be explained by the irregularly spaced base state array being much larger in size than its uniformly spaced counterpart, and thus require additional communication and computation time. One possible strategy to significantly reduce the run time is to consider truncating the base state beyond the cutoff density radius.

4.3. AMR Performance

We now test the performance of MAESTROeX for adaptive, three-dimensional simulations to track localized regions of interest over time. Figure 6 illustrates the initial grid structures with two levels of refinement for both planar and spherical geometries where the grid is refined according to the temperature and density profiles, respectively. For each of the problems we tested, the single-level simulation was run using the original temporal scheme and the adaptive simulations using the original and new temporal algorithms. We want to show that the adaptive simulation can give similar results to the single-level simulation and in a more computationally efficient manner.

**Figure 6.** Initial grid structures with two levels of refinement. The red, green, and black lines represent grids of increasing refinement. (Left) Profile of $T\mbox{--}\bar{T}$ for a hot bubble in a white dwarf environment. (Right) Region of star where ρ ≥ 10⁵ g cm⁻³ in a full white dwarf star simulation.
Download figure:
Standard image High-resolution image

In the planar case, we use the same problem setup for a hot bubble rising in a white dwarf environment as described in Section 6 of Paper V. Here we use a domain size of 3.6 × 10⁷ cm by 3.6 × 10⁷ cm by 2.88 × 10⁸ cm, and allow the grid structure to change with time. The single-level simulation at a resolution of 128² × 1024 was run on Cori haswell with 48 processors and took approximately 33.5 s per time step (averaged over 10 time steps) using either the original or new temporal algorithm. The adaptive simulation has a resolution of 32² × 256 at the coarsest level, resulting in the same effective resolution at the finest level as the single-level simulation. We tag cells that satisfy $T\mbox{--}\bar{T}\gt 3\times {10}^{7}$ K as well as all cells at that height. The adaptive run took only 3.7 s per time step, and this 89% decrease in run time is mostly due to the fact that initially only 6.25% of the cells (1,048,576 out of 128² × 1024 cells) are refined at the finest level. Figure 7 shows a series of planar slices of the temperature profile at time intervals of 1.25 s, and verifies that the adaptive simulation is able to capture the same dynamics as the single-level simulation at much lower computational cost.

**Figure 7.** Time-lapse cross-section of a hot bubble in a white dwarf environment at t = 0, 1.25, and 2.5 s for (left) single-level simulation, and (right) adaptive simulation at the same effective resolution. The red, green, and black boxes indicate grids of increasing resolution.
Download figure:
Standard image High-resolution image

We continue to use the full-star white dwarf problem described in Section 4.2 to test adaptive simulations on spherical geometry. The adaptive grid is refined twice by tagging the density at ρ > 10⁵ g cm⁻³ on the first level and ρ > 10⁸ g cm⁻³ on the second level. These tagging values have been shown to work well previously in Paper V, but we have found that the code may encounter numerical difficulties when the tagging values are too close to each other in subsequent levels of refinement. The simulation on a single-level grid of 512³ resolution took 12.7 s per time step (again averaged over 10 time steps). The adaptive grid has a resolution of 128³ at the coarsest level and an effective resolution of 512³ at the finest level. On this grid, 27.8% of the cells (4,666,880 out of 256³ cells) are refined at the first level and 5.3% (7,176,192 out of 512³ cells) at the second. The adaptive simulation took 5.61 s, resulting in more than a factor of two in speedup. Both simulations are computed to t = 2000 s and we choose to use the peak temperature as the diagnostic to compare the results. Figure 8 shows the evolution of the peak temperature for all three runs and shows that the adaptive simulation gives the same qualitative result as the single-level simulation. We do not expect the curves to match up exactly because the governing equations are highly nonlinear, and slight differences in the solution caused by solver tolerance and discretization error can change the details of the results. Each simulation was run on Cori haswell with 512 processors and 4 threads per core.

**Figure 8.** Peak temperature, T_peak, in a white dwarf from t = 0 to 2000 s for grids with effective resolution of 512³. We can see that the adaptive grids with two levels of refinement give very similar solution trends compared to the single-level grid.
Download figure:
Standard image High-resolution image

5. Conclusions and Future Work

We have developed a new temporal integrator and spatial mapping options into our low Mach number solver, MAESTROeX. The new AMReX-enabled code scales well on large fractions of supercomputers with multicore architectures. Future software enhancements will include GPU implementation. In particular, the AMReX-based companion code, the compressible CASTRO code (Almgren et al. 2010), has recently ported hydrodynamics and reactions to GPUs (A. S. Almgren et al. 2019, in preparation). We plan to leverage the newly implemented mechanisms for offloading compute kernels to GPUs inside of the AMReX software library itself. Our future scientific investigations include convection in massive rotating stars (Heger et al. 2000), the convective Urca process in white dwarfs (Willcox et al. 2016), solar physics (Wood & Brummell 2018), and magnetohydrodynamics (Wood et al. 2011; Wood & Hollerbach 2015). Our future algorithmic enhancements include more accurate and higher-order multiphysics coupling strategies based on spectral deferred corrections (Dutt et al. 2000; Bourlioux et al. 2003). This framework has been successfully used in terrestrial combustion (Pazner et al. 2016; Nonaka et al. 2018)

The work at LBNL was supported by the U.S. Department of Energy's Scientific Discovery Through Advanced Computing (SciDAC) program under contract No. DE-AC02-05CH11231. The work at Stony Brook was supported by DOE/Office of Nuclear Physics grant DE-FG02-87ER40317 and through the SciDAC program DOE grant DE-SC0017955. This research used resources of the National Energy Research Scientific Computing Center (NERSC), a U.S. Department of Energy Office of Science User Facility operated under Contract No. DE-AC02-05CH11231.

Facility: NERSC - .

Software: AMReX (AMReX Development Team et al. 2019; Zhang et al. 2019), StarKiller Microphysics (The StarKiller Microphysics Development Team et al. 2019), MAESTROX (Fan et al. 2019).

Appendix: Projection Details

To enforce the divergence constraint in Steps 3, 7, and 11, we use a projection method analogous to the methods originally developed for incompressible flow (Bell et al. 1989; Almgren et al. 1998). The basic idea is to decompose the velocity field into a part that satisfies the divergence-satisfying component and a curl-free (gradient of a scalar field) component by solving a variable-coefficient Poisson equation for the scalar field. The details for the MAC projection in Step 3 and Step 7 are given in Appendix B of Paper III. The details of the nodal projection in Step 11 are given in Section 3.2 of Paper III. We note that in the nodal projection, the gradient of the scalar field is used to update the perturbational pressure, π.

Based on our past experience in the MAESTRO project, we have found it useful to split the velocity dynamics into a perturbational and base state velocity,

$\begin{eqnarray}&&{\boldsymbol{U}}=\widetilde{{\boldsymbol{U}}}({\boldsymbol{x}},t)+{w}_{0}(r,t){{\boldsymbol{e}}}_{r},\end{eqnarray} \tag{ 34 }$

solve for each term separately, and immediately combine them to find a full velocity that satisfies the constraint. We take that approach here, primarily because it allows us to enforce a boundary condition on w₀ at the edge of the star (i.e., the cutoff density location where we hold density constant). Namely, to enforce that r²w₀ remain constant is difficult to do when solving for the full velocity. This is demonstrated in Figure 9 (right) where the velocity magnitude is observed to incorrectly increase outside the cutoff density radius when we solve for the full velocity in the nodal projection. The resulting peak temperature also dips significantly as seen in Figure 9 (left), presumably because the dynamics of the overall expansion of the star are not being captured correctly.

**Figure 9.** When solving for the full velocity ${\boldsymbol{U}}$ in the nodal projection step instead of the split velocity formulation, we illustrate (left) the peak temperature, T_peak in a white dwarf at resolution of 256³ during the early evolution of the star and (right) the magnitude of velocity at early time. We observe much larger velocities outside the density cutoff region, ρ < 10⁵ g cm⁻³, that are not seen when we split the velocity dynamics.
Download figure:
Standard image High-resolution image

**Figure 9.** When solving for the full velocity ${\boldsymbol{U}}$ in the nodal projection step instead of the split velocity formulation, we illustrate (left) the peak temperature, T_peak in a white dwarf at resolution of 256³ during the early evolution of the star and (right) the magnitude of velocity at early time. We observe much larger velocities outside the density cutoff region, ρ < 10⁵ g cm⁻³, that are not seen when we split the velocity dynamics.
Download figure:
Standard image High-resolution image

In practice, we solve the constraint over the lateral average,

$\begin{eqnarray}&&{\rm{\nabla }}\cdot \left({\beta }_{0}\widetilde{{\boldsymbol{U}}}\right)={\beta }_{0}(S-\overline{S})\end{eqnarray} \tag{ 35 }$

and separately solve for w₀ using

$\begin{eqnarray}&&{\rm{\nabla }}\cdot ({\beta }_{0}{w}_{0}{{\boldsymbol{e}}}_{r})={\beta }_{0}\left(\overline{S}-\displaystyle \frac{1}{{\overline{{\rm{\Gamma }}}}_{1}{p}_{0}}\displaystyle \frac{\partial {p}_{0}}{\partial t}\right).\end{eqnarray} \tag{ 36 }$

To solve for $\widetilde{{\boldsymbol{U}}}$ we use a projection method, which involves the solution of a variable-coefficient Poisson solver to extract the curl-free component of the unprojected velocity, leaving a velocity field that satisfies the divergence constraint. Note that MAESTRO contains alternate low Mach number formulations that conserve total energy in stratified systems, with minimal changes to the code (see Appendix A of Jacobs et al. 2016 for details). To find w₀ we integrate in one dimension using the procedure in Appendix B of Paper V, keeping in mind that the base state spacing (Δr) should be computed using the appropriate cell-edge and cell-center locations when using irregularly spaced base state. Note that in this approach, we estimated the time-derivative of the pressure in part by examining how laterally averaged ρ' = ρ−ρ₀ changed over time (quantified by ${\eta }_{\rho }=\overline{\rho ^{\prime} {\boldsymbol{U}}\cdot {{\boldsymbol{e}}}_{r}}$ ). After evolving the species (Steps 4A/8A), we compute this term as reported in Paper V. For example, after Step 8A, we define a radial cell-centered ${\eta }_{\rho }^{n+1/2}$ ,

1.
For planar geometry, ${\eta }_{\rho }=\overline{\rho ^{\prime} ({\boldsymbol{U}}\cdot {{\boldsymbol{e}}}_{r})}$ ,
$\begin{eqnarray}&&{\eta }_{\rho }^{n+1/2}={Average}\displaystyle \sum _{k}\left[\left({{\boldsymbol{U}}}^{\mathrm{ADV}}\cdot {{\boldsymbol{e}}}_{r}\right){\left(\rho {X}_{k}\right)}^{n+1/2,\mathrm{pred}}\right].\end{eqnarray} \tag{ 37 }$
2.
For spherical geometry, first construct ${\eta }_{\rho }^{\mathrm{cart},n+1/2}\,={[\rho ^{\prime} ({\boldsymbol{U}}\cdot {{\boldsymbol{e}}}_{r})]}^{n+1/2}$ on Cartesian cell centers using:
$\begin{eqnarray}&&{\eta }_{\rho }^{\mathrm{cart},n+1/2}=\left[\left(\displaystyle \frac{{\rho }^{(1)}+{\rho }^{(2)}}{2}\right)-\left(\displaystyle \frac{{\rho }_{0}^{n}+{\rho }_{0}^{n+1}}{2}\right)\right]\cdot \left({{\boldsymbol{U}}}^{\mathrm{ADV}}\cdot {{\boldsymbol{e}}}_{r}\right).\end{eqnarray} \tag{ 38 }$
Then,
$\begin{eqnarray}&&{\eta }_{\rho }^{n+1/2}={\bf{Average}}\left({\eta }_{\rho }^{\mathrm{cart},n+1/2}\right).\end{eqnarray} \tag{ 39 }$

MAESTROeX: A Massively Parallel Low Mach Number Astrophysical Solver

Article metrics

Permissions

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Governing Equations

3. Numerical Algorithm

3.1. Spatial Discretization

3.2. Temporal Integration Scheme

4. Performance and Validation

4.1. Performance and Scaling

4.2. White Dwarf Convection

4.3. AMR Performance

5. Conclusions and Future Work

Appendix: Projection Details

Footnotes

MAESTROeX: A Massively Parallel Low Mach Number Astrophysical Solver

Article metrics

Permissions

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Governing Equations

3. Numerical Algorithm

3.1. Spatial Discretization

3.2. Temporal Integration Scheme

4. Performance and Validation

4.1. Performance and Scaling

4.2. White Dwarf Convection

4.3. AMR Performance

5. Conclusions and Future Work

Appendix: Projection Details

Footnotes