THE AGORA HIGH-RESOLUTION GALAXY SIMULATIONS COMPARISON PROJECT. II. ISOLATED DISK TEST

Ji-hoon Kim; Oscar Agertz; Romain Teyssier; Michael J. Butler; Daniel Ceverino; Jun-Hwan Choi; Robert Feldmann; Ben W. Keller; Alessandro Lupi; Thomas Quinn; Yves Revaz; Spencer Wallace; Nickolay Y. Gnedin; Samuel N. Leitner; Sijing Shen; Britton D. Smith; Robert Thompson; Matthew J. Turk; Tom Abel; Kenza S. Arraki; Samantha M. Benincasa; Sukanya Chakrabarti; Colin DeGraf; Avishai Dekel; Nathan J. Goldbaum; Philip F. Hopkins; Cameron B. Hummels; Anatoly Klypin; Hui Li; Piero Madau; Nir Mandelker; Lucio Mayer; Kentaro Nagamine; Sarah Nickerson; Brian W. O’Shea; Joel R. Primack; Santi Roca-Fàbrega; Vadim Semenov; Ikkoh Shimizu; Christine M. Simpson; Keita Todoroki; James W. Wadsley; John H. Wise; (for the AGORA Collaboration)

doi:10.3847/1538-4357/833/2/202

1. INTRODUCTION

Decades of strenuous effort by computational astrophysicists have propelled numerical experiments to become one of the most widely used tools in theorizing how galaxies form in the universe. Numerical experiments are often the only means to put our theory to a test, the result of which we can compare with observational data to validate the model's feasibility. Since the success of galaxy formation theory is predicated on robust numerical experiments, it is only reasonable that we apply the same scientific standard of reproducibility to galaxy formation simulations. In other words, it should be considered as a fundamental principle that researchers must not establish findings from a single numerical experiment as scientific knowledge. Only after the result is reproduced independently by other researchers and proven not to be an isolated incidence can we build any conclusive theory about how galaxies actually form in the universe.

However, the task of replicating galaxy simulations or, equivalently, comparing simulations between codes, has not received high priority.⁴⁰ Instead, the task is considered complex and time-consuming because one needs to ensure that identical physics is used in an identical initial condition (IC) with identical runtime settings. This is sometimes perceived as tedious and unrewarding for early-career researchers. In fact, the lack of reproducibility checks is not unique to the field of numerical galaxy formation (e.g., Open Science Collaboration 2015; Nature Survey 2016). And its cause is not simply an unwillingness of only computational astrophysicists, either (Everett & Earp 2015). Rather, addressing the system (or the lack thereof) which checks the reproducibility of simulations would require a collective action by the entire community. It cannot be simply about asking individual researchers to release their data dumps, but should be about building a system that incentivizes simulations published in a reproducible manner. It should also be about assembling an infrastructure that reduces the cost of reproducibility checks, on which simulations are verified routinely and effortlessly (Begley & Ioannidis 2015; Nosek et al. 2015).

The AGORA High-resolution Galaxy Simulations Comparison Project (Assembling Galaxies Of Resolved Anatomy) is the collective response by the numerical galaxy formation community to such a challenge. Since its first meeting in 2012 at the University of California at Santa Cruz, the AGORA Collaboration has aimed to compare galaxy-scale numerical experiments on a variety of code platforms with state-of-the-art resolution. Our shared goal is to ensure that physical assumptions are responsible for any success in galaxy formation simulations, rather than artifacts of particular implementations. Through a multi-platform approach from the beginning, we strive to improve all our codes by "increasing the level of realism and predictive power of galaxy simulations and the understanding of the feedback processes that regulate galaxy metabolism" (Kim et al. 2014), and by doing so to find solutions to long-standing problems in galaxy formation. Because the interplay between numerical resolution and subgrid modelings of stellar physics is crucial in galaxy-scale simulations, we require that simulations be designed with state-of-the-art resolution, ≲100 pc, which is currently allowed within realistic computational cost bounds.

In the Project's flagship paper (Kim et al. 2014), we explained the philosophy behind the Project and detailed the publicly available Project infrastructure we have put together. We also described the proof-of-concept test, in which we field-tested our infrastructure with a dark matter-only cosmological zoom-in simulation, finding a robust convergence between participating codes. More than 140 researchers from over 60 academic institutions worldwide have since agreed to take part in the Collaboration, many of whom having been actively engaged in working groups and sub-projects.⁴¹ The cohort of numerical codes participating in the Project currently include, but are not limited to in future studies: the Lagrangian smoothed particle hydrodynamics codes (SPH; Gingold & Monaghan 1977; Lucy 1977; Monaghan 1992) Changa, Gadget, Gasoline, and Gear, and the Eulerian adaptive mesh refinement codes (AMR; Berger & Oliger 1984; Berger & Colella 1989) Art-I, Art-II, Enzo, and Ramses, and the mesh-free finite-volume Godunov code Gizmo (see Section 5 for information on each code).

In this second report of our continuing endeavor, we use an isolated Milky Way-mass galaxy simulation to compare nine widely used state-of-the-art gravito-hydrodynamics codes. As in all comparison studies in AGORA, the participating codes share a common IC (i.e., generated by Makedisk; see Section 2), common physics models (e.g., radiative cooling and UV background provided by the standardized package Grackle; see Section 3.1; Bryan et al. 2014; Kim et al. 2014; Smith et al. 2016),⁴² and common analysis platform (i.e., yt toolkit; Turk et al. 2011).⁴³ We adopt spatial resolution of 80 pc that resolves the scale height of the disk. This helps the codes to be less dependent on phenomenological prescriptions of sub-resolution processes which are inevitably introduced in low-resolution (>kpc) simulations. As modern galaxy formation simulations with state-of-the-art resolution and physics prescriptions become more and more computationally expensive, it is timely that we compare high-resolution isolated disk simulations to check how successfully these galaxies are reproduced by their peers.⁴⁴ Readers should note that our intention is not to identify a "correct" or "incorrect" code, but to focus instead on juxtaposing the codes for physical insights and learn how much scatter one should expect among modern numerical tools in the field (see Section 7 for more discussion).

The remainder of this paper is organized as follows. In Section 2 we explain the isolated disk IC used in the study. The common input physics and runtime parameters required in the participating codes are discussed in Sections 3 and 4, respectively. Then Section 5 describes nine hydrodynamics codes that participated in this comparison. Section 6 presents the results of our comparison focusing on similarities and discrepancies discovered in various multi-dimensional analyses. Finally in Section 7 we summarize our findings and conclude the paper with remarks on future work. We will also stress the importance of collaborative and reproducible research in the numerical galaxy formation community the AGORA Project strives to promote.

2. INITIAL CONDITION

In this section we describe the Milky Way-mass isolated IC we adopt in this study. While this IC is part of a set of disk ICs generated for AGORA simulations that were first introduced in Section 2.2 of the Project flagship paper (Kim et al. 2014), we briefly explain its important structural properties for completeness.⁴⁵

The disk galaxy IC with properties characteristic of Milky Way-mass galaxies at redshift $z\sim 1$ is generated with a privately shared version of Makedisk (Springel et al. 2005).⁴⁶ ^,⁴⁷ The IC has the following components (see also Table 1 and Figure 1): (1) a dark matter halo with ${M}_{200}=1.074\times {10}^{12}\,{M}_{\odot }$ , ${R}_{200}=205.5\,\mathrm{kpc}$ and circular velocity of ${v}_{{\rm{c}},200}=150\,\mathrm{km}\,{{\rm{s}}}^{-1}$ that follows the Navarro–Frenk–White (Navarro et al. 1997) profile with concentration parameter c = 10 and spin parameter $\lambda =0.04$ ; (2) an exponential disk with ${M}_{{\rm{d}}}=4.297\times {10}^{10}\,\,{M}_{\odot }$ , scale length ${r}_{{\rm{d}}}=3.432\,\mathrm{kpc}$ and scale height ${z}_{{\rm{d}}}=0.1\,{r}_{{\rm{d}}}$ that is composed of 80% stars and 20% gas in mass (i.e., ${f}_{\mathrm{gas}}={M}_{{\rm{d}},\mathrm{gas}}/{M}_{{\rm{d}}}=0.2$ ); (3) a stellar bulge with ${M}_{{\rm{b}},\star }=4.297\times {10}^{9}\,\,{M}_{\odot }$ that follows the Hernquist profile ( ${M}_{{\rm{b}},\star }/{M}_{{\rm{d}}}=0.1;$ Hernquist 1990). The disk or bulge stars in the IC do not contribute to the feedback budget.

**Figure 1.** 0 Myr snapshots of the isolated Milky Way-mass galaxy simulations by nine participating codes. Disk gas surface densities in a $30\,\mathrm{kpc}$ box, edge-on (top) and face-on (bottom), produced with the common analysis toolkit `yt`. For visualizations of the particle-based codes hereafter (Figures 1–3, 14–15, 32, 34, and 35)—but not in any other analyses except these figures—`yt` uses an in-memory octree on which gas particles are deposited using smoothing kernels. Comparing 0 Myr snapshots—dumped immediately after each code reads in the IC—is to check the exact identity of ICs interpreted by each code. See Section 6.1 for more information on this figure, and Section 5 for descriptions of participating codes in this comparison. The high-resolution versions of this figure and article are available at the Project website, http://www.AGORAsimulations.org/.
Download figure:
Standard image High-resolution image

**Figure 1.** 0 Myr snapshots of the isolated Milky Way-mass galaxy simulations by nine participating codes. Disk gas surface densities in a $30\,\mathrm{kpc}$ box, edge-on (top) and face-on (bottom), produced with the common analysis toolkit `yt`. For visualizations of the particle-based codes hereafter (Figures 1–3, 14–15, 32, 34, and 35)—but not in any other analyses except these figures—`yt` uses an in-memory octree on which gas particles are deposited using smoothing kernels. Comparing 0 Myr snapshots—dumped immediately after each code reads in the IC—is to check the exact identity of ICs interpreted by each code. See Section 6.1 for more information on this figure, and Section 5 for descriptions of participating codes in this comparison. The high-resolution versions of this figure and article are available at the Project website, http://www.AGORAsimulations.org/.
Download figure:
Standard image High-resolution image

Table 1. Initial Condition Characteristics^a

	Dark Matter Halo	Stellar Disk	Gas Disk	Stellar Bulge
Density profile	Navarro et al. (1997)	Exponential	Exponential	Hernquist (1990)
Structural properties	${M}_{200}=1.074\times {10}^{12}\,{M}_{\odot }$ , ${v}_{{\rm{c}},200}=150\,\mathrm{km}\,{{\rm{s}}}^{-1}$ ,	${M}_{{\rm{d}},\star }=3.438\times {10}^{10}\,{M}_{\odot }$ ,	${M}_{{\rm{d}},\mathrm{gas}}=8.593\times {10}^{9}\,{M}_{\odot }$ ,	${M}_{{\rm{b}},\star }=4.297\times {10}^{9}\,{M}_{\odot }$ ,
	${R}_{200}=205.5\,\mathrm{kpc}$ , c = 10, $\lambda =0.04$	${r}_{{\rm{d}}}=3.432\,\mathrm{kpc}$ , ${z}_{{\rm{d}}}=0.1\,{r}_{{\rm{d}}}$	${f}_{\mathrm{gas}}=0.2$	${M}_{{\rm{b}},\star }/{M}_{{\rm{d}}}=0.1$
Number of particles	10⁵	10⁵	10⁵	$1.25\times {10}^{4}$
Particle mass	${m}_{\mathrm{DM}}=1.254\times {10}^{7}\,{M}_{\odot }$	${m}_{\star ,\mathrm{IC}}=3.437\times {10}^{5}\,{M}_{\odot }$	${m}_{\mathrm{gas},\mathrm{IC}}=8.593\times {10}^{4}\,{M}_{\odot }$	${m}_{\star ,\mathrm{IC}}=3.437\times {10}^{5}\,{M}_{\odot }$

Note.

^aFor detailed explanations on these parameters, see Section 2.

Download table as: ASCII Typeset image

Among the three resolution choices provided in Kim et al. (2014), here we employ a "low-resolution" IC which has 10⁵ particles each for the halo, the stellar disk, and the gas disk, and $1.25\times {10}^{4}$ particles for the bulge. The initial gas temperature in the disk is set to 10⁴ K, not to the specific internal energy computed by Makedisk. The initial metal fraction in the gas disk is 0.02041.⁴⁸ When the gas disk is initialized on mesh-based codes (Art-I, Art-II, Enzo, and Ramses), instead of using the particles provided by Makedisk, we require that the participants use an analytic density profile of

$\begin{eqnarray}&&{\rho }_{{\rm{d}},\mathrm{gas}}(r,z)={\rho }_{0}{e}^{-r/{r}_{{\rm{d}}}}\cdot {e}^{-| z| /{z}_{{\rm{d}}}}\end{eqnarray} \tag{ 1 }$

with ${\rho }_{0}={M}_{{\rm{d}},\mathrm{gas}}/(4\pi {r}_{{\rm{d}}}^{2}{z}_{{\rm{d}}})$ , where r is the cylindrical radius and z is the vertical height from the disk plane. To set up a disk in a centrifugal equilibrium we also ask that the participants utilize the rotational velocity profile binned from an actual gas particle distribution within $| z| \lt {z}_{{\rm{d}}}$ .⁴⁹ In mesh-based codes, we additionally include a uniformly low-density gas halo with ${n}_{{\rm{H}}}={10}^{-6}\,{\mathrm{cm}}^{-3}$ and zero initial velocity, since they cannot have cells with zero density. The halo is initially set to 10⁶ K and zero metallicity. Note that this gaseous halo does not exist in particle-based codes (SPH codes or Gizmo).

It is worth noting one point about the common disk IC adopted here. As readers may find in Section 6.6, in our experiment that includes radiative cooling, star formation and feedback ("Sim-SFF"; see Section 3), only $\sim {10}^{9}\,{M}_{\odot }$ additional stars form in 500 Myr in all codes on average. When compared with the stellar and gas components present in the IC, this means only a $\sim 3 \%$ increase in stellar mass, and a $\sim 12 \%$ decrease in gas mass. As the discussion in Section 6 should make clear, given this relatively small change in stellar mass, it is expected that the stellar feedback in our experiment will be very inefficient.

3. COMMON PHYSICS: Sim-noSF AND Sim-SFF

We now describe the common physics employed in our experiment including equilibrium gas cooling, metagalactic UV background, star formation, and energy and metal yields by supernovae. Note that the common physics adopted here is a variation of the common physics model recommended in all AGORA simulations by default; see Section 3 of Kim et al. (2014). For the present study all participating code groups are asked to run two simulations starting from the identical IC: (1) "Sim-noSF" with radiative gas cooling but without star formation or feedback, and (2) "Sim-SFF" with radiative cooling, star formation and feedback. In Section 3.1, we first list the gas physics that are common in both Sim-noSF and Sim-SFF. Then in Section 3.2, the subgrid prescriptions of stellar physics for Sim-SFF are explained.

3.1. Gas Physics: Radiative Cooling, UV Background, and Pressure Floor

The rate at which the gas in our galaxy radiatively cools is determined by AGORA's standard chemistry and cooling library Grackle (Bryan et al. 2014; Kim et al. 2014; Smith et al. 2016, see footnote 42). For this study, the equilibrium cooling version of Grackle is interfaced with each participating code, either via Grackle's original interface or via N. Gnedin's auxiliary API.⁵⁰ In the chosen equilibrium cooling mode, Grackle follows tabulated cooling rates pre-computed by the photoionization code Cloudy (Ferland et al. 2013).⁵¹ The pre-computed look-up table also includes metal cooling rates for solar abundances, $1\,{Z}_{\odot }$ , as a function of gas number density and temperature. These metal cooling rates are then scaled linearly with metallicity which is followed in our simulations as a separate passive scalar.⁵² We also adopt metagalactic UV background radiation at z = 0 by Haardt & Madau (2012) provided by Grackle. For the difference between the chosen UV background model and previous calculations such as Haardt & Madau (1996) or Faucher-Giguère et al. (2009), we refer the readers to Section 3.3 of Kim et al. (2014).

Lastly, a non-thermal Jeans pressure floor is applied to stabilize the scales of the smoothing length (particle-based codes) or the finest cell (mesh-based codes) against unphysical collapse and to avoid artificial fragmentation due to unresolved pressure gradient (Truelove et al. 1997; Robertson & Kravtsov 2008). In practice, it is achieved by enforcing that the local Jeans length ${\lambda }_{\mathrm{Jeans}}$ be sufficiently resolved with the finest resolution elements at all times. That is,

$\begin{eqnarray}&&{\lambda }_{\mathrm{Jeans}}={N}_{\mathrm{Jeans}}{\rm{\Delta }}x\end{eqnarray} \tag{ 2 }$

where ${\rm{\Delta }}x=80\,\mathrm{pc}$ is the adopted spatial resolution (finest cell size or softening length; see Section 4.1) and ${N}_{\mathrm{Jeans}}=4$ is the Jeans number adopted from Truelove et al. (1997). This gives the required pressure floor value as

$\begin{eqnarray}&&{P}_{\mathrm{Jeans}}=\displaystyle \frac{1}{\gamma \pi }{N}_{\mathrm{Jeans}}^{2}G{\rho }_{\mathrm{gas}}^{2}{\rm{\Delta }}{x}^{2}\end{eqnarray} \tag{ 3 }$

where G is the gravitational constant, $\gamma =5/3$ is the adiabatic index, and ${\rho }_{\mathrm{gas}}$ is the gas density. Note that ${N}_{\mathrm{Jeans}}$ is not necessarily equal to the parameter controlling the pressure support in each code. For actual parameter choices for selected codes, see Appendix A. For implementations using polytropes in Art-II and Ramses, see Sections 5.2 and 5.4, respectively.

3.2. Stellar Physics: Star Formation, and Energy, Mass, and Metal Yields from Core-collapse Supernovae

In addition to the gas physics described in the previous section, Sim-SFF incorporates subgrid models for star formation and supernova feedback. First, a parcel of gas above the threshold ${n}_{{\rm{H}},\mathrm{thres}}=10\,{\mathrm{cm}}^{-3}={\rho }_{\mathrm{gas},\mathrm{thres}}/{m}_{{\rm{H}}}$ produces stars at a rate that follows the local Schmidt law as

$\begin{eqnarray}&&\displaystyle \frac{d{\rho }_{\star }}{{dt}}=\displaystyle \frac{{\epsilon }_{\star }{\rho }_{\mathrm{gas}}}{{t}_{\mathrm{ff}}}\end{eqnarray} \tag{ 4 }$

where ${\rho }_{\star }$ is the stellar density, ${t}_{\mathrm{ff}}={(3\pi /(32G{\rho }_{\mathrm{gas}}))}^{1/2}$ is the local free-fall time, and ${\epsilon }_{\star }=1 \%$ is the star formation efficiency per free-fall time. We caution that the parameter ${\epsilon }_{\star }$ is not necessarily equal to the star formation efficiency parameter found in each code (e.g., for Changa and Gasoline, see Appendix B). For a new star particle to spawn, it should have at least the mass of a gas particle in the IC, ${m}_{\mathrm{gas},\mathrm{IC}}=8.593\times {10}^{4}\,{M}_{\odot }$ . Note that ${n}_{{\rm{H}},\mathrm{thres}}$ adopted in this experiment is for this particular run only, and represents where the Jeans polytrope intersects with a typical $T-\rho$ equation of state in our disks (see Sections 5.2 and 5.4).⁵³

New star particles inject energy, mass, and metals back into the interstellar medium (ISM) through core-collapse (Type II) supernovae. Assuming the AGORA standard Chabrier (2003) initial mass function (IMF) and that stars with masses between 8 and $40\,\,{M}_{\odot }$ explode as Type II supernovae, one Type II supernova occurs per every $91\,{M}_{\odot }$ stellar mass formed (see Section 3.5 of Kim et al. 2014). With the AGORA recommended fitting formulae Equations (4)–(6) of Kim et al. (2014) and the assumed IMF, this single burst is found to release $2.63\,{M}_{\odot }$ of metals⁵⁴ and $14.8\,{M}_{\odot }$ of gas (including metals). Per every $91\,{M}_{\odot }$ stellar mass, these metal and mass are instantaneously deposited into its surrounding after a delay time of 5 Myr, along with a net thermal energy of 10⁵¹ erg.

We note that exact deposit schemes for energy, mass, and metals are left at each participant's discretion. We do not intend to overly specify a single common deposit scheme which will need to be inevitably different from one code to another (e.g., between mesh-based codes and particle-based codes), as we argued in Section 3.8 of Kim et al. (2014). Nevertheless, for all mesh-based codes (Art-I, Art-II, Enzo, and Ramses), the same strategy was chosen: thermal energy, mass, and metals are added to the cell where a 5 Myr old star particle sits at the time of explosion, and to this cell only. For particle-based codes (Changa, Gasoline, Gadget-3, Gear, and Gizmo), each code's deposit scheme is discussed in detail in Section 5. In future AGORA projects, we plan to calibrate different feedback schemes against observations and against one another. We refer the readers to Section 7 for more discussion on this future work.

4. COMMON RUNTIME PARAMETERS

Here we review the runtime parameters each group is required to adopt, such as gravitational softening and hydrodynamic smoothing lengths for particle-based codes and refinement thresholds for mesh-based codes.

4.1. Gravitational Softening Length and Finest Mesh Size

For all codes the gas mass resolution in hydrodynamics needs to be set as close as possible to ${m}_{\mathrm{gas},\mathrm{IC}}=8.593\times {10}^{4}\,{M}_{\odot }$ . Assuming that we wish to resolve a self-gravitating clump with 64 of these resolution elements, the corresponding Jeans length scale becomes

$\begin{eqnarray}&&{\lambda }_{\mathrm{Jeans}}=2{\left(\displaystyle \frac{64{m}_{\mathrm{gas},\mathrm{IC}}}{(4\pi /3){n}_{{\rm{H}},\mathrm{thres}}}\right)}^{1/3}=348.7\,\mathrm{pc},\end{eqnarray} \tag{ 5 }$

and therefore, from Equation (2) we choose a spatial resolution of 80 pc. This value is used as the finest cell size ${\rm{\Delta }}x$ for mesh-based codes, and as the gravitational softening length ${\epsilon }_{\mathrm{grav}}$ for particle-based codes. For all particle-based codes taking part in the present study, gravity is softened according to the cubic spline kernel (e.g., Equation (A1) of Hernquist & Katz 1989). For readers interested in the actual parameter choices, in Appendix C we examine the meanings of relevant parameters in different particle-based codes.

4.2. Minimum Hydrodynamical Smoothing Length

For particle-based codes (including Gizmo; see Section 5.9 and footnote ⁶⁷), we require that the hydrodynamical smoothing lengths for collisional particles do not drop below 20% of the gravitational softening lengths. Unlike the gravitational softening kernel, exact smoothing kernel choices differ from code to code, and are detailed for each of the particle-based codes in Section 5. We also refer the readers interested in the actual parameter choices to Appendix C again.

4.3. Refinement Strategy

We recommend to mesh-based code groups that a cell be split into 8 child cells once the cell contains more mass than ${m}_{\mathrm{gas},\mathrm{IC}}=8.593\times {10}^{4}\,{M}_{\odot }$ (1 gas particle mass in the IC of particle-based codes), or 8 collisionless particles (disk/bulge star particles in the IC with ${m}_{\star ,\mathrm{IC}}=3.437\times {10}^{5}\,{M}_{\odot }$ , or dark matter particles with ${m}_{\mathrm{DM}}=1.254\times {10}^{7}\,{M}_{\odot }$ ). This causes the grids to be refined in a fashion similar to the Lagrangian behavior of particle-based codes, and keeps the ratio of collisionless particle numbers to gas cells approximately unity on average. However, exact refinement strategies differ slightly from code to code, and are detailed for each of the mesh-based codes in Section 5.⁵⁵ We continue to refine the grids down to the resolution limit ${\rm{\Delta }}x=80$ pc (see Section 4.1) where the non-thermal pressure floor kicks in (see Section 3.1).

5. PARTICIPATING CODES

In this section we introduce the nine gravito-hydrodynamics codes taking part in this test, focusing in particular on hydrodynamics solvers, refinement schemes for mesh-based codes (Art-I, Art-II, Enzo, and Ramses), and supernova feedback implementations for particle-based codes (Changa, Gasoline, Gadget-3, Gear, and Gizmo). We leave out details that are commonly adopted across platforms such as gas cooling (Section 3.1) or star formation (Section 3.2), or that were included in the AGORA flagship paper such as gravitational dynamics (Section 5 of Kim et al. 2014). We also point out that the codes involved in future AGORA studies are not necessarily limited to the ones described herein.

5.1. Art-I

In Art-I, differential equations of fluid dynamics are integrated using a shock-capturing Eulerian method described in Khokhlov (1998). It uses a second-order accurate Godunov solver (Godunov 1959) that evaluates Eulerian fluxes by solving the Riemann problem at every cell interface (Colella & Glaz 1985). Left and right states of the Riemann problem are obtained by piecewise linear interpolation (van Leer 1979). In contrast to other versions of Art (see Section 5.2.1 of Kim et al. 2014), Art-I with distinctive star formation and feedback recipes (e.g., Ceverino & Klypin 2009; Ceverino et al. 2014) have been developed by A. Klypin and collaborators.

The octree-based, multi-level adaptive mesh allows users to control the grid structure at the individual cell level. For this comparison, the Art-I group uses a 128³ root grid covering a ${(1.304\mathrm{Mpc})}^{3}$ box, then achieves an ∼80 pc cell size at maximum 7 levels of refinement. The mass thresholds above which a cell is adaptively refined into an oct of 8 child cells are ${m}_{\mathrm{gas},\mathrm{IC}}=8.593\times {10}^{4}\,{M}_{\odot }$ and ${m}_{\star ,\mathrm{IC}}=3.437\times {10}^{5}\,{M}_{\odot }$ for gas and collisionless particles, respectively.⁵⁶ For the supernova feedback scheme to deposit the thermal energy adopted by all mesh-based codes, we refer the readers to Section 3.2.

5.2. Art-II

Art-II solves the gravito-hydrodynamics equations using a particle-mesh + Eulerian AMR approach. Art-II features MPI parallelization for distributed memory machines, flexible time-stepping hierarchy, and a variety of unique physics modules (e.g., Gnedin & Kravtsov 2011; Agertz et al. 2013) developed by N. Gnedin, A. Kravtsov and collaborators.

For the present study, starting from a uniform 128³ root grid covering ${(1.311\mathrm{Mpc})}^{3}$ , cells are refined up to 7 additional levels to reach the finest size of 80 pc. Spherical regions of 4 (6, 10) root grid cells radius around the box center are always refined to at least 3 (2, 1) additional levels relative to the root grid. The (de-)refinement procedure consists of three steps. First, cells are marked for refinement if the gas mass in the cell exceeds $0.6\,{m}_{\mathrm{gas},\mathrm{IC}}=0.6\times 8.593\times {10}^{4}\,{M}_{\odot }$ , or if the cell contains two or more dark matter and/or star particles that were present in the IC. We then use a diffusion step to also mark neighboring cells for refinement and thus smooth the shape of the regions to be refined. By contrast, cells with gas masses below $0.2\,{m}_{\mathrm{gas},\mathrm{IC}}$ or without particles are marked for de-refinement provided they also satisfy a number of additional constraints. Finally, cells are refined (de-refined) by splitting them into 8 (by merging 8 child cells).

The pressure floor implemented in Art-II affects cells at the highest level of refinement by modifying the gas pressure values that enter the Riemann solver (i.e., not the actual pressure or temperature fields) with

$\begin{eqnarray}&&{P}_{\mathrm{cell}}={\mathtt{\max }}({P}_{\mathrm{Jeans}},\,{P}_{\mathrm{gas}})\end{eqnarray} \tag{ 6 }$

$\begin{eqnarray}&&={\mathtt{\max }}({n}_{{\rm{H}}}{k}_{{\rm{B}}}{T}_{\mathrm{Jeans}},\,{P}_{\mathrm{gas}})\end{eqnarray} \tag{ 7 }$

where ${P}_{\mathrm{cell}}$ is the value entering the Riemann solver, ${P}_{\mathrm{gas}}$ is the gas pressure field in the simulation, ${k}_{{\rm{B}}}$ is the Boltzmann constant, ${n}_{{\rm{H}}}={\rho }_{\mathrm{gas}}/{m}_{{\rm{H}}}$ , and ${T}_{\mathrm{Jeans}}={T}_{{\rm{J}}}({n}_{{\rm{H}}}/{n}_{{\rm{H}},{\rm{J}}})$ with ${T}_{{\rm{J}}}=1800\,{\rm{K}}$ and ${n}_{{\rm{H}},{\rm{J}}}=8\,{\mathrm{cm}}^{-3}$ . This polytrope choice is designed to match the common prescription Equation (3) with ${N}_{\mathrm{Jeans}}\simeq 4$ . For the supernova feedback scheme to deposit the thermal energy adopted by all mesh-based codes, see Section 3.2.

5.3. Enzo

Enzo is a block-structured adaptive mesh code, developed by an open-source, community-driven approach (Bryan & Norman 1997; O'Shea et al. 2004; Bryan et al. 2014).⁵⁷ Among a variety of solver choices, for this comparison the third-order accurate piecewise parabolic method is selected to reconstruct the left and right states of the Godunov problem (Colella & Woodward 1984; Bryan et al. 1995), along with a Harten–Lax–van Leer with contact (HLLC) Riemann solver (Toro et al. 1994). A maximum 30% of the required Courant–Friedrichs–Lewy (CFL) timestep is used to advance fluid elements; i.e., CFL safety factor = 0.3. In addition to solving the conservation equations for mass, momentum and energy, the equation for internal energy is also solved in parallel, and the conservative or non-conservative formulation is adaptively selected based on a local estimate of the energy truncation errors. This ensures that the gas temperature remains physical, even in highly supersonic regions.

The Enzo group uses a 64³ initial root grid covering a ${(1.311\mathrm{Mpc})}^{3}$ simulation box, then achieves 80 pc resolution with maximum 8 levels of refinement. The mass thresholds above which a cell is refined by factors of two in each axis are ${m}_{\mathrm{gas},\mathrm{IC}}\,=8.593\times {10}^{4}\,{M}_{\odot }$ and $8\,{m}_{\star ,\mathrm{IC}}=8\times 3.437\times {10}^{5}\,{M}_{\odot }$ for gas and collisionless particles, respectively (see footnote 56). The non-thermal pressure floor Equation (3) is used to modify the gas pressure inside the Riemann solver, but not to alter the actual gas energy field. For the supernova feedback scheme adopted by all mesh-based codes, we refer the readers to Section 3.2.

5.4. Ramses

Ramses is an octree-based adaptive mesh code featuring an unsplit second-order accurate Monotone Upstream-centered Scheme for Conservation Laws (MUSCL) Godunov scheme for the gaseous component (Teyssier 2002).⁵⁸ For this comparison, Ramses group uses a ideal gas equation of state with $\gamma =5/3$ , along with the HLLC Riemann solver (Toro et al. 1994) and the MinMod slope limiter (Roe 1986). The CFL safety factor for controlling the time step is set to 0.5. The dual energy formalism adopted in Enzo simulations (Section 5.3) is also used in Ramses runs.

For this study, starting from a uniform 128³ root grid covering ${(320\mathrm{kpc})}^{3}$ , cells are refined up to 5 additional levels to achieve an ∼80 pc cell size. The refinement process works as follows. First, new refinement is triggered on a cell-by-cell basis if the baryonic mass (gas + newly formed stars) exceeds ${m}_{\mathrm{gas},\mathrm{IC}}=8.593\times {10}^{4}\,{M}_{\odot }$ , or if the number of dark matter and/or star particles that are present in the IC exceeds 8.⁵⁹ We then mark additional cells by performing a mesh smoothing operation, expanding the initial area by one cell width in every direction. When new cells are created or old cells destroyed, density, momentum and internal energy are used as averaging and interpolating variables, thereby preventing a grid point with spurious temperature.

In Ramses, the gas pressure field includes the non-thermal pressure support term given by a temperature polytrope ${T}_{\mathrm{Jeans}}\ =\mu {T}_{{\rm{J}}}({n}_{{\rm{H}}}^{\prime }/{n}_{{\rm{H}},{\rm{J}}})$ with mean molecular weight μ, ${n}_{{\rm{H}}}^{\prime }={\rho }_{\mathrm{gas}}{X}_{{\rm{H}}}/{m}_{{\rm{H}}}$ , ${T}_{{\rm{J}}}=1800\,{\rm{K}}$ , ${n}_{{\rm{H}},{\rm{J}}}=8\,{\mathrm{cm}}^{-3}$ and ${X}_{{\rm{H}}}=0.76$ .⁶⁰ As in Art-II, this polytrope approximately matches the common pressure support prescription Equation (3). Newly created star particles in Ramses have a fixed mass of ${m}_{\mathrm{gas},\mathrm{IC}}$ , but they are spawned with a Poisson probability distribution whose parameters are designed to mimic the local Schmidt law, Equation (4). Lastly, for the common supernova feedback scheme adopted by all mesh-based codes, we refer the readers to Section 3.2.

5.5. Changa

Changa is a reimplimentation of Gasoline (see Section 5.6) in the Charm++ runtime system.⁶¹ Charm++ (Kale & Krishnan 1996, pp. 175–213)⁶² enables the overlap of computation and communication and provides adaptive load balancing infrastructure, allowing Changa to scale to hundreds of thousands of processor cores (Menon et al. 2015). The hydrodynamics in Changa closely follows that of Gasoline. SPH forces are calculated using the method of Ritchie & Thomas (2001), and energy is diffused using the scheme of Shen et al. (2010), both of which providing a more accurate treatment of multi-phase ISM. Timesteps are determined by the minimum of an acceleration and a CFL criterion. Furthermore, the timesteps of neighbors are kept within a factor of 2 of each other as in Saitoh & Makino (2009) in order to accurately integrate highly supersonic flows.

For this work, a kth nearest-neighbor algorithm is used to find the ${N}_{\mathrm{smooth}}=64$ nearest neighbors which are smoothed with the Wendland C4 kernel (Dehnen & Aly 2012) to determine hydrodynamic properties. Unlike conventional versions of Changa or Gasoline, the supernovae thermal energy, mass, and metals are directly distributed to the 64 neighboring gas particles.⁶³ Gas particles that are neighbors of particles that will explode as a supernova in their next timestep are put on timesteps suitable for their post-supernova thermal energy, preventing them from being on a much smaller timestep required in the CFL condition. Grackle cooling is implemented but it does not self-consistently account for the PdV work or other external sources of energy, a requirement for Changa and Gasoline's energy integration. Therefore, we split the energy integration into a half timestep of Grackle cooling, then a full timestep of PdV heating, and finally a second half timestep of cooling.

5.6. Gasoline

Gasoline is a massively parallel SPH code, first described in Wadsley et al. (2004), that has subsequently been updated with modern SPH features. It contains a subgrid model for turbulent mixing of metals and energy (e.g., Shen et al. 2010), a timestep limiter by Saitoh & Makino (2009) (see Section 5.5), and a geometric density estimator for SPH force expressions (see Section 2.4 of Keller et al. 2014, for a latest detailed description of the code and its performance).

For the current work, the Gasoline group uses a Wendland C4 smoothing kernel (Dehnen & Aly 2012) with ${N}_{\mathrm{smooth}}=200$ neighbors.⁶⁴ The same feedback scheme as Changa's (Section 5.5) is implemented, smoothed with the Wendland C4 kernel over 64 neighbors (not ${N}_{\mathrm{smooth}}=200$ ) to better match the amount of mass heated by feedback events with other particle-based codes. Gas particles that receive feedback compute their required CFL timestep at the timestep prior to receiving feedback, which helps to prevent numerical instability and overcooling. Grackle cooling is implemented by applying a half timestep of cooling, then a full timestep of external PdV heating, followed by a final half timestep of cooling, as in Changa (Section 5.5).

5.7. Gadget-3

Gadget-3 is an updated version of Gadget-2, a cosmological tree-particle-mesh SPH code that was originally developed by V. Springel (Springel et al. 2001; Springel 2005).⁶⁵ Gadget-3 has important updates from Gadget-2, such as domain decomposition and dynamic tree reconstruction which may slightly alter the N-body dynamics. The Gadget-3 code used in this comparison is a modified version of the original Gadget-3 by K. Nagamine and his collaborators, which includes pressure-entropy formulation by Hopkins (2013), time-dependent artificial viscosity, variable smoothing lengths, among others (e.g., Choi & Nagamine 2012; Thompson et al. 2014; Aoyama et al. 2016).

For the present study, the Gadget-3 group adopts a quintic spline smoothing kernel (Morris 1996) with ${N}_{\mathrm{ngb}}=64$ . The implementation of supernova feedback is based on an updated version of Todoroki (2014) that largely follows a Sedov–Taylor blast wave method outlined in Stinson et al. (2006, 2013, but not their cooling shutoff model). The exact model used in the current work is fully described in Aoyama et al. (2016) but, in brief, the implementation comprises the following steps. Every time a star particle explodes, we compute the "shock radius" based on Chevalier (1974) and McKee & Ostriker (1977), and then find the gas particles within the radius. We then inject thermal energy and metal yields into the identified gas particles within the shock radius, weighted by the SPH spline kernel. Finally, we note that the results of this version of Gadget-3 are not representative of all the Gadget-3 codes in the community, because some of the results are strongly dependent on the detailed implementations of baryonic physics, such as star formation and feedback.

5.8. Gear

Gear is a self-consistent, fully parallelized, chemo-dynamical tree SPH code (Revaz & Jablonka 2012) which is built on the publicly available Gadget-2 code (see Section 5.7; Springel 2005). The simulations reported here are run with the improvements dicussed in Revaz et al. (2016), including the pressure-entropy formulation proposed by Hopkins (2013), individual and adaptive time-stepping schemes (Durier & Dalla Vecchia 2012), artificial viscosity (Monaghan & Gingold 1983) supplemented with the Balsara switch ( ${f}_{{ij}}$ from Balsara 1995), and particle-based time-dependent viscosity coefficient (Rosswog et al. 2000).

For this study, the standard cubic spline smoothing kernel (Monaghan & Lattanzio 1985) in Gadget-2 is used with ${N}_{\mathrm{ngb}}=50$ . The feedback energy, mass, and metal injection into the ISM is implemented following the standard SPH scheme. The implementation comprises the following steps. Every time a star particle explodes, we first find the nearest gas particles, according to the weighted number of neighbors as defined in Springel & Hernquist (2002). A desired number of neighbors ${N}_{\mathrm{ngb}}=50$ is used. Then we inject thermal energy and yields into the neighboring gas particles, weighted by the SPH spline kernel. Grackle cooling is performed after the kick step, once gas particles have eventually received supernova feedback energy and once the size of the next timestep is known. The adiabatic cooling/heating is first applied, and then the radiative one provided by Grackle.

5.9. Gizmo

Gizmo (Hopkins 2015) is a new mesh-free Godunov code based on discrete tracers, aimed at capturing the advantages of both Lagrangian and Eulerian techniques.⁶⁶ The numerical scheme implemented in Gizmo, initially proposed by Lanson & Vila (2008), follows the implementation of Gaburov & Nitadori (2011) and relies on the discretization of the Euler equations of hydrodynamics among a set of discrete tracers. Unlike in the moving mesh technique, where the volume is partitioned by a Voronoi tessellation, Gizmo distributes the volume fraction assigned to the tracers through a kernel function. For the current work, Gadget's standard cubic spline smoothing kernel is used with ${N}_{\mathrm{ngb}}=32$ . Note, unlike SPH codes, these tracers only represent unstructured cells, sharing an "effective face" with the neighboring cells.⁶⁷ The Riemann problem is then solved across these faces using a Godunov method as in mesh-based codes, to accurately resolve shocks without artificial dissipation terms. Unlike mesh-based codes, these cells are not fixed in space and time, resulting in the scheme's Lagrangian behavior with intrinsically adaptive resolution. When the time evolution of the common face between two cells is considered, we use the second-order accurate Meshless Finite Mass method (described in Hopkins 2015). Gizmo's time-stepping scheme is fully adaptive, and closely follows Gadget-3 or Arepo (Springel 2010). It also includes a timestep limiter by Saitoh & Makino (2009) (see Section 5.5).

Gizmo's gravity solver is based on the tree algorithm inherited from Gadget-3, itself descending from Gadget-2 (see Section 5.7; Springel 2005). Gravitational softenings in Gizmo can be fixed or fully adaptive, but in the reported runs fixed softening length is used matching SPH codes. To model supernova feedback, the Gizmo simulations shown here adopts a similar feedback strategy used in Gear (Section 5.8). That is, energy, mass and metals are distributed among the neighboring gas particles/cells in a kernel-weighted fashion, but with ${N}_{\mathrm{ngb}}=32$ . For star particles, timesteps are constrained to prevent supernovae from exploding in the timestep when the stars formed. Lastly, we caution that different feedback implementations using Gizmo in the literature adopt different algorithms to distribute supernova feedback energy (e.g., Hopkins et al. 2014, by the FIRE Collaboration).

6. RESULTS

In this section, we lay out the results of the first isolated disk galaxy comparison by the AGORA Collaboration. We focus on similarities and discrepancies discovered by comparing 500 Myr snapshots of the participating simulations in two setups: Sim-noSF and Sim-SFF. As defined in Section 3, Sim-noSF refers to a run with radiative gas cooling but without star formation or feedback, and Sim-SFF refers to a run with radiative cooling, star formation and supernova feedback.

In our simulation analyses, a key role has been played by the AGORA recommended community-driven analysis platform yt (Turk et al. 2011; Turk & Smith 2011; Turk 2013, see footnote 43). It natively processes data from all nine participating simulation codes discussed in this paper, plus many other modern astrophysics codes such as Athena (Stone et al. 2008), Flash (Fryxell et al. 2000), Gadget-3-sphs (Read & Hayfield 2012), Nyx (Almgren et al. 2013), and Orion (Truelove et al. 1998), to name a few. Interested readers may try a unified, publicly available yt script employed in the present analyses that has been developed throughout the progress of this study.⁶⁸ ^,⁶⁹ We also plan to make data sets used in the present study publicly available in the near future (see Section 7 for more information).

6.1. Gas Disk Morphology

We first examine the morphology of gas disks evolved in each of the codes in our experiments. In Figures 1–3, we compile nine panels that exhibit the results of the isolated disk galaxy simulations, first with radiative gas cooling but without star formation or feedback, Sim-noSF, and second with star formation and feedback, Sim-SFF, by the nine participating codes. Each panel displays the disk gas surface density in a $30\,\mathrm{kpc}$ box centered on the location of maximum gas density within 1 kpc from the center of gas mass. This centering criterion is adopted in all subsequent figures and plots. For visualizations of the particle-based codes hereafter (Figures 1–3, 14–15, 32, 34 and 35)—but not in any other analyses except these figures—yt uses an in-memory octree on which gas particles are deposited using smoothing kernels. The resolution of this octree governs the resolution of produced images. If more than eight particles are in an oct, that oct is refined into 64 child octs (i.e., yt parameters ${\mathtt{n}}\_{\mathtt{ref}}=8$ and ${\mathtt{over}}\_{\mathtt{refine}}\_{\mathtt{factor}}=2$ ), providing compatible or better image resolution than a typical SPH visualization. The densities are assigned to the octree in a scatter step. That is, we first calculate a particle's smoothing length, and then add the particle's density contribution to all cell centers of the octree cells that are within the particle's smoothing sphere.

We asked every code to output the state of the simulation immediately after it was initialized—so-called "0 Myr snapshot"—to allow ourselves to directly compare whether the IC generation was successful and consistent among codes. This exercise has been strenuously carried out for all the analyses items presented in Sections 6.1–6.3 and 6.7, enabling us to correct inconsistently initialized simulations early in the study. One such example, the surface density comparison of 0 Myr snapshots, is shown in Figure 1. A clear distinction in gas disk initialization between mesh-based and particle-based codes can be seen in this figure. To model the gas disk, SPH particles or Gizmo's discrete tracers are generated by drawing random numbers from the distribution function given by an analytic density profile. This by definition results in Poisson noise in the disk surface density shown here. Readers can also observe slight differences between mesh-based and particle-based codes in how the density field is represented in their calculations. By the nature of reconstructing the density from the positions of particles, the particle-based codes may smooth out the strong density contrast in the IC at the edge of the initial gas disk.

Interestingly, in Figures 2 and 3 at 500 Myr, there are other subtle differences noticeable between mesh-based and particle-based codes as well as within these sub-groups. While the peak densities and filamentary structures in the disks are very similar across all codes, it is noticeable that the mesh-based codes typically show lower densities in the inter-arm regions of the disks. The typical densities in those inter-arm regions—while not containing much mass—may differ by as much as an order of magnitude between mesh-based and particle-based codes (see also Section 6.7 and Figure 35 for a related discussion on spatial resolution). Another distinguishing aspect among the participating codes is the number of dense clumps formed. This is true in both the simulations with and without star formation and feedback (see also Section 6.4 and Figures 21 and 23 for a related comparison of newly formed stellar clumps).

**Figure 2.** 500 Myr composite of gas surface densities from *Sim-noSF* with radiative gas cooling but without star formation or supernova feedback. Each frame is centered on the galactic center—location of maximum gas density within 1 kpc from the center of gas mass. For visualizations of the particle-based codes hereafter (Figures 1–3, 14–15, 32, 34, and 35)—but not in any other analyses except these figures—`yt` uses an in-memory octree on which gas particles are deposited using smoothing kernels. See Section 5 for descriptions of participating codes in this comparison, and Section 6.1 for a detailed explanation of this figure. Compare with Figure 14. Simulations performed by: Daniel Ceverino (Art-I), Robert Feldmann (Art-II), Mike Butler (Enzo), Romain Teyssier (Ramses), Spencer Wallace (Changa), Ben Keller (Gasoline), Jun-Hwan Choi (Gadget-3), Yves Revaz (Gear), and Alessandro Lupi (Gizmo). The high-resolution versions of this figure and article are available at the Project website, http://www.AGORAsimulations.org/.
Download figure:
Standard image High-resolution image

**Figure 3.** Same as Figure 2 but for *Sim-SFF* with star formation and feedback. See Section 6.1 for a detailed explanation of this figure. See also Section 3.2 for the common star formation prescription and the guideline for supernova feedback, and Section 5 for the exact deposit scheme of thermal feedback energy implemented in each code. Compare with Figures 15, 21, 29, 32, 34, and 35.
Download figure:
Standard image High-resolution image

Figures 4 and 5 are the cylindrically binned gas surface density profiles for Sim-noSF and Sim-SFF, respectively. The cylindrical radius is defined as the distance from the galactic center. Raw particle fields are used for profiles of the particle-based codes, not the interpolated or smoothed fields constructed in yt. In other words, the total mass in each cylindrical annulus is divided by its area so that each gas particle contributes only to a bin in which its center falls. As noted earlier, the gas surface densities show a high degree of correspondence across all codes in both Sim-noSF and Sim-SFF. All nine profiles agree very well within a factor of a few at all radii (averaged fractional deviation for $2\lt r\lt 10$ kpc in Sim-SFF is 32.2% or 0.121 dex),⁷⁰ and can be approximated by exponential profiles at radii >1.5 kpc. Note that due to the gas consumed by star formation, the gas surface density slightly decreases from Figures 4 to 5, the latter of which could be compared with Figures 22 (newly formed stellar surface density) and 27 (star formation rate (SFR) surface density). Aside from this small decrease in density, there is little impact of supernova feedback from contrasting Sim-noSF and Sim-SFF. The inefficiency of stellar feedback is partly related to a only small increase in stellar mass in the first 500 Myr of evolution (see Sections 2 and 6.6 for more discussion).

**Figure 4.** Cylindrically binned gas surface density profiles at 500 Myr for *Sim-noSF* without star formation or feedback. The cylindrical radius is from the galactic center (location of maximum gas density within 1 kpc from the center of gas mass; in all analyses for particle-based codes hereafter) except the graphical visualizations such as Figures 1–3—raw particle fields are used, not the interpolated or smoothed fields constructed in `yt`. Shown in the bottom panel is the fractional deviation from the mean of these profiles. See Section 6.1 for more information on this figure. The y-axis range of the top panel is kept identical among Figures 4–7 and 22 for easier comparison.
Download figure:
Standard image High-resolution image

**Figure 5.** Same as Figure 4 but for *Sim-SFF* with star formation and feedback. Compare with Figures 22 and 27.
Download figure:
Standard image High-resolution image

Displayed in Figures 6 and 7 are the vertically binned gas surface density profiles for Sim-noSF and Sim-SFF, respectively. The height is defined as the absolute vertical distance from the x–y disk plane centered on the galactic center, i.e., $| {z}_{{\rm{i}}}-{z}_{\mathrm{center}}|$ . Again, for particle-based codes raw particle fields are used to produce the profiles, not the interpolated or smoothed fields in yt. Note a smaller range in x-axes than in the previous figure (only one tenth of Figures 4 and 5). The surface densities begin to diverge above ∼0.6 kpc from the disk plane, but below that substantial agreement appears (averaged fractional deviation for $z\lt 0.6$ kpc in Sim-SFF is 30.4% or 0.115 dex). There is no systematic difference between mesh-based and particle-based codes, similar to what we find in radial density profiles, Figures 4 and 5.

**Figure 6.** Vertically binned gas surface density profiles at 500 Myr for *Sim-noSF* without star formation or feedback. The height is the absolute vertical distance from the x–y disk plane centered on the galactic center, $| {z}_{{\rm{i}}}-{z}_{\mathrm{center}}|$ . Shown in the bottom panel is the fractional deviation from the mean of these profiles. The y-axis range of the top panel is kept identical among Figures 4–7 and 22 for easier comparison.
Download figure:
Standard image High-resolution image

$| {z}_{{\rm{i}}}-{z}_{\mathrm{center}}| $ — **Figure 6.** Vertically binned gas surface density profiles at 500 Myr for *Sim-noSF* without star formation or feedback. The height is the absolute vertical distance from the x–y disk plane centered on the galactic center, $| {z}_{{\rm{i}}}-{z}_{\mathrm{center}}|$ . Shown in the bottom panel is the fractional deviation from the mean of these profiles. The y-axis range of the top panel is kept identical among Figures 4–7 and 22 for easier comparison.
Download figure:
Standard image High-resolution image

**Figure 7.** Same as Figure 6 but for *Sim-SFF* with star formation and feedback.
Download figure:
Standard image High-resolution image

Figures 8 and 9 show the cylindrically binned, mass-weighted average of gas vertical heights for Sim-noSF and Sim-SFF, respectively. As defined before, the height is an absolute vertical distance from the disk plane. Thus, each line in Figures 8 and 9 represents the averaged $| {z}_{{\rm{i}}}-{z}_{\mathrm{center}}|$ as a function of cylindrical radius. Combined with Figures 6 and 7, these plots provide insight into the thicknesses of the gas disks in each of the codes. Again, no systematic difference is found between mesh-based and particle-based codes.

**Figure 8.** Cylindrically binned, mass-weighted averages of gas vertical heights for *Sim-noSF* without star formation or feedback. The height is the absolute vertical distance from the x–y disk plane centered on the galactic center, $| {z}_{{\rm{i}}}-{z}_{\mathrm{center}}|$ . Shown in the bottom panel is the fractional deviation from the mean of these profiles. See Section 6.1 for an explanation on how this figure is made.
Download figure:
Standard image High-resolution image

**Figure 9.** Same as Figure 8 but for *Sim-SFF* with star formation and feedback.
Download figure:
Standard image High-resolution image

6.2. Gas Disk Kinematics

Here we examine the kinematics of gas disks that are evolved using each of the participating codes. First, in Figures 10 and 11 we show the gas rotation velocity curves for Sim-noSF and Sim-SFF, respectively. The curve reveals a mass-weighted average of gas rotational velocity of gas cells/particles, as a function of cylindrical radius. The cylindrical radius and rotational velocity are defined with respect to the galactic center (see Section 6.1 for our adopted centering scheme). For mesh-based codes, only the dense enough gas cells ( ${\rho }_{\mathrm{gas}}\gt {10}^{-25}\,{\rm{g}}\,{\mathrm{cm}}^{-3}$ ) are considered in order to minimize the contribution of the gaseous halo which is present only in mesh-based codes (see Section 2 for more information about the halo gas distribution in our IC). From these two figures it is clear that there exists a very good agreement on gas kinematics in all the codes, as good as within a few percent at ∼10 kpc for both Sim-noSF and Sim-SFF (averaged fractional deviation for $2\lt r\lt 10$ kpc in Sim-SFF is 2.8% or 0.012 dex). Discrepancies in the central region (<1.5 kpc) are a result of determining the galactic center that may constantly shift its position.

**Figure 10.** Gas rotation velocity curves at 500 Myr for *Sim-noSF* without star formation or feedback. The cylindrical radius and rotational velocity are with respect to the galactic center—location of maximum gas density within 1 kpc from the center of gas mass. Shown in the bottom panel is the fractional deviation from the mean of these profiles. See Section 6.2 for a detailed explanation on how this figure is made.
Download figure:
Standard image High-resolution image

**Figure 11.** Same as Figure 10 but for *Sim-SFF* with star formation and feedback. Compare with Figure 24.
Download figure:
Standard image High-resolution image

Figures 12 and 13 reveal the gas velocity dispersion curves for Sim-noSF and Sim-SFF, respectively. The velocity dispersion quantifies the residual velocity components of gas other than the rotational velocity found in Figures 10 and 11. In other words, each line in Figures 12 and 13 denotes a square root of a mass-weighted average of ${({v}_{{\rm{i}}}-{v}_{\mathrm{rot}}(r))}^{2}$ , as a function of cylindrical radius. As in Figures 10 and 11, for mesh-based codes only dense enough cells are used to compute the dispersion. Again a good agreement is found in the velocity dispersion between all codes within a few tens of percent for both Sim-noSF and Sim-SFF (averaged fractional deviation for $2\lt r\lt 10$ kpc in Sim-SFF is 17.8% or 0.071 dex). Larger variations in the central region (<1.5 kpc) are partly due to the center determination, just like in rotation velocity curves, Figures 10 and 11. The discrepancies are also produced by vertical movement of gas in the inner disk, which is captured in mesh-based codes (Art-I, Art-II, Enzo, and Ramses) but not as well in particle-based codes (Changa, Gasoline, Gadget-3, Gear, and Gizmo) given our very particular choice of 80 pc resolution. This can be clearly seen in the bottom panels of Figures 12 and 13, in which we plot the ratio of vertical velocity dispersion (z-direction) to total velocity dispersion to illustrate the contribution by vertical movement of gas. From these figures, we find that a significant portion of gas velocity dispersion in the inner disk measured in mesh-based codes are driven by vertical movement, but not in particle-based codes (see Section 6.7 and Figure 35 for a related discussion on spatial resolution).

**Figure 12.** Gas velocity dispersion curves at 500 Myr for *Sim-noSF* without star formation or feedback. The velocity dispersion is the square root of mass-weighted averages of ${({v}_{{\rm{i}}}-{v}_{\mathrm{rot}}(r))}^{2}$ . Shown in the middle panel is the fractional deviation from the mean of these profiles. In the bottom panel we plot the ratio of *vertical* velocity dispersion (z-direction) to *total* velocity dispersion. See Section 6.2 for a detailed explanation on how this figure is made. The y-axis range of the top panel is kept identical among Figures 12, 13, and 25 for easier comparison.
Download figure:
Standard image High-resolution image

**Figure 12.** Gas velocity dispersion curves at 500 Myr for *Sim-noSF* without star formation or feedback. The velocity dispersion is the square root of mass-weighted averages of ${({v}_{{\rm{i}}}-{v}_{\mathrm{rot}}(r))}^{2}$ . Shown in the middle panel is the fractional deviation from the mean of these profiles. In the bottom panel we plot the ratio of *vertical* velocity dispersion (z-direction) to *total* velocity dispersion. See Section 6.2 for a detailed explanation on how this figure is made. The y-axis range of the top panel is kept identical among Figures 12, 13, and 25 for easier comparison.
Download figure:
Standard image High-resolution image

**Figure 13.** Same as Figure 12 but for *Sim-SFF* with star formation and feedback. Compare with Figure 25.
Download figure:
Standard image High-resolution image

6.3. Thermal Structure of the Interstellar Medium

In Figures 14 and 15, we compile 9 panels of the density-square-weighted gas temperature projections for Sim-noSF and Sim-SFF, respectively.⁷¹ Each panel is for the central $30\,\mathrm{kpc}$ box (see Section 6.1 for our adopted centering scheme). Readers should note that, by design, only mesh-based codes initially include a gaseous halo with low density and high temperature ( ${T}_{\mathrm{gas}}={10}^{6}\,{\rm{K}};$ colored dark red), but not the particle-based codes. All of these codes show similar features without star formation and feedback in Figure 14. We continue a related discussion on Sim-noSF using Figures 16 and 18 later in this section.

**Figure 14.** 500 Myr composite of density-square-weighted gas temperature projections, edge-on (top) and face-on (bottom), for *Sim-noSF* with radiative gas cooling but without star formation or supernova feedback. See Section 5 for descriptions of participating codes in this comparison, and Section 6.3 for a detailed explanation of this figure. Compare with Figure 2. The high-resolution versions of this figure and article are available at the Project website, http://www.AGORAsimulations.org/.
Download figure:
Standard image High-resolution image

**Figure 15.** Same as Figure 14 but for *Sim-SFF* with star formation and feedback. See Section 6.3 for a detailed explanation of this figure. See also Section 3.2 for the common star formation prescription and the guideline for supernova feedback, and Section 5 for the exact deposit scheme of thermal feedback energy implemented in each code. Compare with Figure 3.
Download figure:
Standard image High-resolution image

**Figure 16.** 500 Myr composite of two-dimensional probability distribution function of density and temperature for the gas within 15 kpc from the galactic center in *Sim-noSF* without star formation or feedback. Colors represent the total gas mass in each two-dimensional bin. A gaseous halo—low-density, high-temperature gas in the upper left corner of each panel—exists only in mesh-based codes, but not in particle-based codes (SPH codes or Gizmo). To guide the eye, we use a thick dashed line in each panel to plot the mean temperature in each density bin for Changa. The thin dotted diagonal lines denote the slope of constant pressure process, and the thin dotted–dashed diagonal lines that of constant entropy process. Note that different versions of Grackle are interfaced with different codes (Grackle v2.1 in Changa, Gasoline, Gadget-3, and Gizmo, vs. Grackle v2.0 or below in Art-I, Art-II, Enzo, Ramses, and Gear). See Section 6.3 for more information on this figure. The high-resolution versions of this figure and article are available at the Project website, http://www.AGORAsimulations.org/.
Download figure:
Standard image High-resolution image

But slight differences in supernova feedback implementation—even within a common guideline—may give rise to different features in the temperature map; Figure 15. For example, small hot bubbles are visible in Changa and Gear, but not in Gasoline or Gadget-3 in which the common supernova feedback schemes are implemented slightly differently.⁷² We also note the chimneys of hot gas departing from the disk in Enzo and Ramses, clearly visible in the edge-on maps of Figure 15. This gas ejecta from the disk as a result of supernova feedback is not as evident in Art-I, Art-II, or particle-based codes, although some hot bubbles are seen in Changa and Gear. We caution that the spatial resolution employed in this paper is only 80 pc. It is not as high as the resolutions in some of the modern zoom-in cosmological simulations, and may not be enough for particle-based codes to resolve the chimney-like structure above and below the disk (see Section 6.7 and Figure 35 for a related discussion on spatial resolution). We refer the readers to a continued discussion on Sim-SFF using Figures 17 and 19 later in this section.⁷³

**Figure 17.** Same as Figure 16 but for *Sim-SFF* with star formation and feedback. We use a thick dashed line in each panel to plot the mean temperature in each density bin in Changa's *Sim-noSF* run (same as in Figure 16). See Section 6.3 and the caption of Figure 16 for a detailed explanation of this figure. Compare with Figure 33.
Download figure:
Standard image High-resolution image

To better understand what we find in Figures 14 and 15, we show the two-dimensional probability distribution functions (PDFs) of gas density and temperature in Figures 16 and 17 for Sim-noSF and Sim-SFF, respectively. We consider gas within 15 kpc from the galactic center. Colors represent the total gas mass in each two-dimensional bin. As explained in Section 6.1, raw particle fields are used for the PDFs of particle-based codes, not the smoothed fields constructed by yt. Note that a gaseous halo, represented by low-density, high-temperature gas in the upper left corner of each panel, is designed to exist only in mesh-based codes, but is absent in particle-based codes (SPH codes and Gizmo; see Section 2). It is therefore not the intended scope of this paper to compare this hot halo or circumgalactic medium between codes. To guide the eye, we plot the mean temperature in each density bin from Changa's Sim-noSF run with a thick dashed line in each panel (in both Figures 16 and 17; in the range of $[{10}^{-26},{10}^{-21}]\,{\rm{g}}\,{\mathrm{cm}}^{-3}$ ). Changa's mean profile is close to the "mean of means" of these nine codes, thus it helps to compare the PDFs' relative positions between codes. The thin dotted diagonal lines denote the slope of constant pressure process, and the thin dotted–dashed diagonal lines that of constant entropy process.

Overall, all codes exhibit similar behaviors in Figure 16 when without star formation and feedback, just like the broad similarity observed in Figure 14. A clear branch of gas is visible in all codes extending toward higher-density, lower-temperature, owing to the common treatment of cooling by the Grackle library (see also Figure 18, and a related discussion later in this section). But, as noted in Section 3.1, due to varying Grackle versions participating codes are interfaced with (Grackle v2.1 in Changa, Gasoline, Gadget-3 and Gizmo, versus Grackle v2.0 or below in Art-I, Art-II, Enzo, Ramses and Gear), the cooling rates differ slightly from code to code even at the same density, temperature, and metal fraction. See Section 2 (footnote ⁴⁸) or Section 3.1 (footnote ⁵²) for more information. Any remaining discrepancy is attributable to the difference in how each code sub-cycles its cooling module in the hydrodynamics calculation.

**Figure 18.** Gas density probability distribution function at 500 Myr for *Sim-noSF* without star formation or feedback. Note that a gaseous halo—low-density tails toward the left side of this plot—exists only in mesh-based codes, but not in particle-based codes (SPH codes or Gizmo). Shown in the bottom panel is the fractional deviation from the mean of these profiles. See Section 6.3 for more information on this figure.
Download figure:
Standard image High-resolution image

Now we compare the density–temperature PDFs when star formation and supernova feedback are included in Figure 17. All codes successfully lower the fraction of low-temperature, high-density gas by forming stars and then injecting their feedback energy (see also Figures 18–20, and a related qualitative discussion later in this section). However, notable differences exist between codes as to how gas reacts to the supernova feedback. For example, in Changa and Gear some gas is leaving the aforementioned high-density branch toward higher temperature (up to 10⁶ K) due to supernova feedback, but not in Gasoline, Gadget-3 and Gizmo. The high-temperature, high-density gas seen in Changa and Gear is associated with the small hot bubbles discussed in Figure 15. As explained earlier, these discrepancies in particle-based codes are attributed to different numerical implementations of the common feedback physics. Also noticeable is the hot gas being ejected from the disk as a result of feedback particularly in Enzo and Ramses, seen as a broader distribution of hot gas in Figure 17 compared to Figure 16.

For a more qualitative comparison of the ISM thermal structure, shown in Figures 18 and 19 are the gas mass distributions along the density axis (density PDF) for Sim-noSF and Sim-SFF, respectively, simply derived from Figures 16 and 17 above. Note again that a gaseous halo, represented by low-density tails toward the left side of these plots, is by design included only in the mesh-based codes, but not in the particle-based codes. In Figure 18, when without star formation and feedback all codes show similar distributions within a factor of a few difference in the density range $[{10}^{-25},{10}^{-22}]\,{\rm{g}}\,{\mathrm{cm}}^{-3}$ . A notable deviation is that three particle-based codes, Gadget-3, Gear and Gizmo, hold more mass at density above ${10}^{-22}\,{\rm{g}}\,{\mathrm{cm}}^{-3}$ than the rest of the codes do. However, in Figure 19, now in a more realistic setup including star formation and feedback, no clear systematic difference exists between mesh-based and particle-based codes. While the agreement in the range $[{10}^{-25},{10}^{-22}]\,{\rm{g}}\,{\mathrm{cm}}^{-3}$ is again within a factor of a few (averaged fractional deviation in ${10}^{-25}\lt \rho \lt {10}^{-22}\,{\rm{g}}\,{\mathrm{cm}}^{-3}$ is 28.6% or 0.109 dex), the codes diverge from one another up to more than an order of magnitude at density above ${10}^{-22}\,{\rm{g}}\,{\mathrm{cm}}^{-3}$ , translating into differences in clumping properties (Figure 23) and SFRs (Figure 26).

**Figure 19.** Same as Figure 18 but for *Sim-SFF* with star formation and feedback. The thick dashed line denotes the star formation threshold density, ${n}_{{\rm{H}},\mathrm{thres}}=10\,{\mathrm{cm}}^{-3}$ .
Download figure:
Standard image High-resolution image

**Figure 19.** Same as Figure 18 but for *Sim-SFF* with star formation and feedback. The thick dashed line denotes the star formation threshold density, ${n}_{{\rm{H}},\mathrm{thres}}=10\,{\mathrm{cm}}^{-3}$ .
Download figure:
Standard image High-resolution image

Then, Figure 20 plots the code-by-code mass change in each density bin from Figures 18 to 19. This plot aims to measure the effect of star formation and supernova feedback by subtracting the density probability distribution of Sim-noSF from that of Sim-SFF. As noted in our discussion of Figure 17, Sim-SFF lowers the fraction of high-density gas by forming stars at above ${n}_{{\rm{H}},\mathrm{thres}}=10\,{\mathrm{cm}}^{-3}={\rho }_{\mathrm{gas},\mathrm{thres}}/{m}_{{\rm{H}}}$ (see Section 3.2) and then injecting their feedback energy. This impact is more evident in the bottom panel of Figure 20, where we show the sign-preserving logarithm, symlog(), of the mass change, making smaller changes more discernible.⁷⁴ Without exception, gas masses at density above ${\rho }_{\mathrm{gas},\mathrm{thres}}=1.67\times {10}^{-23}\,{\rm{g}}\,{\mathrm{cm}}^{-3}$ (thick dashed line) are reduced by star formation and feedback, even if inefficiently. The gas is either consumed by star formation or redistributed by thermal supernova feedback to a less dense region below ${\rho }_{\mathrm{gas},\mathrm{thres}}$ away from star-forming sites.

**Figure 20.** Code-by-code mass change in each density bin from Figures 18 to 19. This plot measures the effect of star formation and feedback by subtracting the density probability distribution function of *Sim-noSF* from that of *Sim-SFF*. The y-axis spans from $-7\times {10}^{8}\,\,{M}_{\odot }$ to $+3\times {10}^{8}\,\,{M}_{\odot }$ in a linear scale. Shown in the bottom panel is the sign-preserving logarithm of the mass difference in order to make smaller changes to stand out.⁷⁴ The thick dashed line denotes the star formation threshold density, ${n}_{{\rm{H}},\mathrm{thres}}=10\,{\mathrm{cm}}^{-3}$ . See Section 6.3 for more information on this figure.
Download figure:
Standard image High-resolution image

6.4. Stellar Disk Morphology

In this section, we study the morphology of stellar disks formed in Sim-SFF. Figure 21 shows the distributions of newly formed star particles in Sim-SFF for each of the 9 codes. Only the newly formed star particles are drawn, not the disk or bulge stars that were present in the IC. Star particles present in the IC are excluded from all "stellar" particle analyses hereafter. Each frame is centered on the galactic center, defined in Section 6.1 as the location of peak gas density within 1 kpc from the center of gas mass. This center almost always coincides approximately with the center of the most massive stellar clump. Colors represent the total newly formed stellar masses in each two-dimensional bin. The bottom rows also highlights clumps of newly formed stars (more discussion on stellar clumps later in this section).

**Figure 21.** 500 Myr composite of newly formed star particle distributions, edge-on (top) and face-on (bottom), for *Sim-SFF* with star formation and feedback. Only the newly formed star particles are drawn, not the disk or bulge stars that were present in the IC. Colors represent the total newly formed stellar mass in each two-dimensional bin. We also highlight the clumps of newly formed star particles identified by the friends-of-friends (FOF) algorithm with circles. Clumps with masses below $2.6\times {10}^{6}\,{M}_{\odot }$ and the most massive clump found by FOF (always associated with the stellar bulge at the galactic center) are excluded. See Section 6.4 for a detailed explanation of this figure. Compare with Figures 3, 15, and 35. The high-resolution versions of this figure and article are available at the Project website, http://www.AGORAsimulations.org/.
Download figure:
Standard image High-resolution image

Figure 22 depicts surface densities of newly formed star particles—excluding star particles present in the ICs—for Sim-SFF, calculated in cylindrically symmetric radial bins. While there are differences up to a factor of a few among some codes (e.g., between Gadget-3 and Gasoline, due to their different rates of galaxy-wide star formation shown in Figure 26; averaged fractional deviation for $2\lt r\lt 10$ kpc is 53.9% or 0.187 dex), all the lines can be well fit by an exponential disk profile at radii >1.5 kpc. Occasional fluctuations visible in the profiles (e.g., at ∼6.5 kpc in Gizmo) are due to dense stellar clumps located at these particular radii.

In order to compare the distribution of newly formed star particles and the level of disk fragmentation between different codes, we identify clumps in the distribution of newly formed star particles using the friends-of-friends (FOF) algorithm (Efstathiou et al. 1985).⁷⁵ We only consider clumps with newly formed stellar masses above $2.6\times {10}^{6}\,{M}_{\odot }$ (equivalent to 30 times the mass of star particles present in the IC, $30\,{m}_{\star ,\mathrm{IC}}$ ). The most massive clump found by FOF is excluded since it is always associated with the stellar bulge at the galactic center, but we do not explicitly remove gravitationally unbound clumps (which may have overestimated the number of clumps). Identified clumps are marked with circles in the bottom row of Figure 21, where the radius of each circle indicates the clump's virial radius. We also show the cumulative mass functions of newly formed stellar clumps in Figure 23. It is worth noting that very similar clump distributions and mass functions are discovered when we identify clumps using the HOP algorithm (Eisenstein & Hut 1998) in lieu of FOF. The general trends for clumps discussed below are largely independent of the clump finder.

**Figure 23.** Cumulative mass function of newly formed stellar clumps at 500 Myr for *Sim-SFF*. Clumps with masses below $2.6\times {10}^{6}\,{M}_{\odot }$ and the most massive clump found by FOF (always associated with the stellar bulge at the galactic center) are excluded. See Section 6.4 for more information on this figure, including an additional test Gizmo-ps2 for which Gizmo's Jeans pressure support is increased by a factor of 2.
Download figure:
Standard image High-resolution image

In all codes, the majority of newly formed stellar clumps have masses below ${10}^{7.5}\,{M}_{\odot }$ , and there is a relatively sharp decline in the number of clumps toward higher masses.⁷⁶ From these figures, it is also clear that nearly formed stellar clumps are the most prevalent in the Gizmo run, and less so in the Ramses run than in other codes. The relatively large number of stellar clumps in Gizmo is not a transient feature, but consistently observed across snapshots until 1 Gyr. This is related to the fact that Gizmo produces the most clumpy gas disk among the codes even with star formation and feedback in Sim-SFF (see Figure 3). While preserving all the common elements in comparison (such as pressure floor or resolution; described in Sections 3 and 4), the Gizmo group has carried out extensive tests with other simulation parameters to check what most dictates the level of fragmentation (e.g., smoothing kernel, ${N}_{\mathrm{ngb}}$ , Grackle version, slope limiter, dual energy formalism; but always within conventional norms), finding that the different parameter choices do not qualitatively alter the mass function. However, we note that increasing the Jeans pressure floor by as little as 100%—well within the uncertainty in the geometry prefactor of Jeans pressure support equation, Equation (3)—and reverting to Grackle v2.0—where metal cooling rates are slightly lower compared to v2.1 because of the different solar metallicity definition—entirely removes the discrepancy (see the black dashed line in Figure 23 labeled Gizmo-ps2). A possible explanation for the stronger fragmentation in Gizmo is that the "effective" gravitational resolution in Gizmo is slightly higher compared to other codes (as observed by Few et al. 2016), resulting from a combination of different choices in their implementation (e.g., slope limiter, gradient estimator, density estimator).⁷⁷

6.5. Stellar Disk Kinematics

Following the analysis shown in Section 6.2 for the gas disk kinematics, here we study the kinematics of stellar disks formed by each code in Sim-SFF. In Figure 24 we show the rotation velocity curves for newly formed star particles in Sim-SFF. As in gas rotation velocity curves of Figures 11, each line represents a mass-weighted average of stellar rotational velocities as a function of cylindrical radius. As in Section 6.4, only the newly formed star particles are considered for these profiles, not the disk or bulge stars in the IC. Just as in the gas rotation curves, the stellar rotation velocities show a high degree of similarity among the different codes, as good as within a few percent at certain radii (averaged fractional deviation for $2\lt r\lt 10$ kpc is 2.5% or 0.011 dex). Disagreements seen at radius >12 kpc for some codes such as Art-II and Gasoline are attributed to a small number statistics near the edge of a stellar disk.

**Figure 24.** Stellar rotation velocity curves at 500 Myr for *Sim-SFF* with star formation and feedback. The cylindrical radius and rotational velocity are with respect to the galactic center—location of maximum gas density within 1 kpc from the center of gas mass. Only the newly formed star particles are considered. See Section 6.5 for a detailed explanation on how this figure is made. Compare with Figure 11. The y-axis range of the top panel is kept identical among Figures 10, 11, and 24 for easier comparison.
Download figure:
Standard image High-resolution image

In Figure 25 we analyze the velocity dispersion curves for newly formed star particles in Sim-SFF. As in gas velocity dispersion curves of Figures 13, this plot quantifies the residual velocity components other than the rotational velocity computed in Figure 24. Once again, a good agreement is found that all codes lie within a few tens of percent from one another at radii <10 kpc (averaged fractional deviation for $2\lt r\lt 10$ kpc is 11.2% or 0.046 dex). When compared with Figures 12 and 13, the agreement is particularly better in the central region (<1.5 kpc). Note that when we plot the ratio of vertical velocity dispersion to total velocity dispersion in the bottom panel of Figure 25, the systematic discrepancy found in Figure 13 between mesh-based and particle-based codes at radii <1.5 kpc no longer exists. This confirms our assertion in Section 6.2 that the said discrepancy in Figure 13 is due to vertical gas movement captured only in mesh-based codes.

6.6. Star Formation Relation

In this section, we compare the SFRs of different codes in Sim-SFF and check whether we reproduce the observed relation between gas surface density and SFR surface density. In the following discussion, SFRs are time-averaged over the past 20 Myr and derived based on the ages of newly formed star particles in the 500 Myr snapshots.

Figure 26 displays the evolution of the galaxy-wide SFR of each run by 500 Myr. For most codes, SFRs increase at early times, reach a maximum at 200–300 Myr after the start of the simulation, and then plateaus at later times. The SFRs evolve smoothly without evidence for strong bursts. The SFRs are within a factor of ∼3 from one another at all times (averaged fractional deviation for $50\lt t\lt 500$ Myr is 32.8% or 0.123 dex), and for most codes the values settle at 2–4 $\,{M}_{\odot }\,{\mathrm{yr}}^{-1}$ over most of the simulated time. Some differences are noteworthy. For example, Gasoline and Enzo predict somewhat lower SFRs, especially at intermediate times, while the SFR for Gizmo never plateaus or begins to decline, but reaches a maximum of $\sim 6\,\,{M}_{\odot }\,{\mathrm{yr}}^{-1}$ at 500 Myr. Gadget-3 produces the most stellar mass in this time period, but its SFR does not further grow after ∼300 Myr. The total stellar mass formed in 500 Myr ranges from $0.8\times {10}^{9}\,{M}_{\odot }$ in Gasoline to $2.4\times {10}^{9}\,{M}_{\odot }$ in Gadget-3. Gizmo's efficient and clumpy star formation is discussed in detail in Section 6.4 and Figure 23. We note again that increasing the Jeans pressure floor in Gizmo by a factor of 2—well within the uncertainty in the geometry prefactor of Jeans pressure support equation, Equation (3)—and reverting to Grackle v2.0 entirely removes Gizmo's discrepancy (see the black dashed line in Figure 26 labeled Gizmo-ps2; compare with Figure 23). In addition to the systematic tests the Gizmo group performed, the Gadget-3 group has also tried a run with a slight variation in the treatment of supernova feedback that would increase the number of gas particles inside the "shock radius" (Section 5.7). We find that this slight variation indeed produces significantly lower SFR for Gadget-3, closer to Enzo's value (results not shown here). Our tests strongly suggest that the SFR evolution is highly sensitive to the details of the numerical implementation of the common subgrid physics, including pressure floor and feedback prescriptions.

Now, to better understand the differences in SFR, we plot in Figure 27 the SFR surface densities as functions of cylindrical distance from the galactic center. Again, SFRs are estimated using the newly formed star particles that are younger than 20 Myr old. The agreement between the codes is generally encouraging, especially outside the central 0.5 kpc. We note, however, that the SFR within 0.5 kpc constitutes a large fraction of the total galactic SFR. Enzo and Gasoline have significantly lower SFRs in the central region of the galaxy, thus explaining the difference seen in Figure 26. In contrast, much of the excess star formation in Gadget-3 and Gizmo takes place at large radii and is likely related to the formation of larger numbers of stellar clumps, as seen in Figures 21 and 23. For example, a ∼6.5 kpc peak in Gizmo's SFR profile in Figure 27 coincides with a peak at the same radius in its stellar surface density profile in Figure 22.

When measured on galactic scales (∼kpc), the gas surface density and SFR surface density are tightly correlated (e.g., Schmidt 1959; Kennicutt 1989, 1998; Kennicutt et al. 2007; Bigiel et al. 2008). Consequently, the relation between these two observables, the Kennicutt–Schmidt (KS) relation, is frequently used to calibrate the modeling of star formation in galaxy-scale simulations. In Figure 28 we show the KS relation for Sim-SFF, where gas and SFR surface densities are measured within cylindrical annuli, as computed in Figures 5 and 27, respectively. Only annuli with nonzero averaged SFR surface densities are considered. The thick black dashed line denotes a best observational fit by Equation (8) in Kennicutt et al. (2007), $\mathrm{log}\,{{\rm{\Sigma }}}_{\mathrm{SFR}}=1.37\,\mathrm{log}\,{{\rm{\Sigma }}}_{\mathrm{gas}}-3.78$ , for a spatially resolved patches in M51a. The blue hatched contour marks the observed sub-kpc patches in nearby galaxies taken from Figure 8 of Bigiel et al. (2008), where their hydrogen surface density is multiplied by 1.36 to account for helium to match the total gas density in our simulations (see their Section 2.3.1).⁷⁸

As Figure 28 reveals, all participating codes predict a KS relation that agrees well with one another within a factor of a few, and with observed nearby disk galaxies in Bigiel et al. (2008). In particular, by combining star formation and thermal supernova feedback, most codes match both the normalization of the observed relation and the characteristic "threshold" value of the gas surface density ( $\sim 10\,\,{M}_{\odot }\,{\mathrm{pc}}^{-2}$ ) below which star formation becomes less efficient. However, it should be noted that our simulations do not include multi-phase gas physics that explicitly models the transition between atomic and molecular hydrogen at $\sim 10\,\,{M}_{\odot }\,{\mathrm{pc}}^{-2}$ (e.g., Krumholz et al. 2009). More investigation may be needed to check how the apparent change in slope seen here is affected by our choice of subgrid physics (e.g., star formation efficiency ${\epsilon }_{\star }$ , or thermal feedback energy budget; see Section 3.2). Figure 28 also highlights that there are some differences between the codes. SFR surface densities of Gasoline and Enzo lie slightly below the other codes at a given gas surface density, while Gadget-3 and Gizmo show higher SFRs than the rest of the codes. These differences are generally in line with what was observed in global SFR of Figure 26.

In order to better match the observational technique by Bigiel et al. (2008), one may consider using sub-kpc patches to generate the KS relation rather than cylindrical annuli. In Figures 29 and 30 we present mock observations of gas surface densities and SFR surface densities for Sim-SFF at 500 Myr. For mesh-based codes in Figure 29, their panels in Figure 3 are degraded to 750 pc resolution. For gas particles for particle-based codes in Figure 29 and young star particles of age less than 20 Myr in Figure 30, we use the cloud-in-cell (CIC) scheme to deposit particle masses on to a uniform two-dimensional grid with 750 pc resolution.⁷⁹ This resolution matches Bigiel et al. (2008)'s reported working resolution. The dynamic range of the color axis are set to 3 orders of magnitude in both Figures 29 and 30, in order to help readers to see if the gas depletion times are similar among pixels.

**Figure 29.** 500 Myr snapshot of face-on gas surface densities for *Sim-SFF* at 750 pc resolution. For particle-based codes, surface densities are estimated by depositing gas particles via the cloud-in-cell (CIC) scheme on to a two-dimensional uniform grid with 750 pc resolution. This image could be considered as a degraded version of Figure 3 although a different deposit algorithm is used for particle-based codes. See Section 6.6 for a detailed explanation on how this figure is made.
Download figure:
Standard image High-resolution image

**Figure 30.** 500 Myr snapshot of face-on star formation rate surface densities for *Sim-SFF* at 750 pc resolution. SFR surface densities are estimated by depositing the newly formed star particles that are younger than 20 Myr old on to a uniform grid with 750 pc resolution. See Section 6.6 for a detailed explanation on how this figure is made. The dynamic range of the color axis (3 orders of magnitude) are kept identical among Figures 29 and 30 to help see if the gas depletion times are similar among pixels.
Download figure:
Standard image High-resolution image

Then we identify all 750 × 750 pc patches with nonzero SFR surface density, and plot them in Figure 31 on the same KS plane as Figure 28. As an example, all such patches found in the Changa run are shown as gray triangles. For all other codes, only 80% percentile contours are drawn. Again, all participating codes reproduce the slope and normalization of the observed KS relation well. But slight differences in SFR surface densities exist. Contours for Gasoline and Enzo lie below the pack, while Gadget-3 and Gizmo's contours sit above all other codes at all gas surface densities. These findings are consistent with what we see in Figures 26 and 28.

6.7. Other Comparisons: Metal Fraction, Disk Elevation, and Spatial Resolution

In Figure 32 we present the projections of density-square-weighted gas metal fraction for Sim-SFF with star formation and feedback. The metal fraction we show here is simply the ratio of metal density to total gas density, and it is not in units of ${Z}_{\odot }$ , in order to minimize any confusion caused by Grackle 2.0 versus 2.1 implementations (see footnotes ⁴⁸ and ⁵²). The color axis spans from 0.01 to 0.04 in a linear scale. The edge-on views of mesh-based codes, particularly Enzo and Ramses, show high metallicity filaments flowing out of the disk, carrying metals into the embedding halo (see also Figure 15). However, as noted in Section 2, a gaseous halo exists only in mesh-based codes, but not in particle-based codes (SPH codes and Gizmo). Therefore, it is not the intended scope of this paper to compare the metal content of the halo which, by design, is captured only in mesh-based codes. In the face-on views within the disk we find qualitatively similar results across codes, with denser gas (corresponding to star-forming regions) tending to be more metal-enriched. Significant differences in the morphology of metal distribution exist, however. Differences in numerical implementations of the stellar feedback model are responsible for such discrepancies (see Section 6.3 for more discussion).⁷³

**Figure 32.** 500 Myr composite of density-square-weighted gas metal fraction projections, edge-on (top) and face-on (bottom), for *Sim-SFF* with star formation and feedback. Colors represent the ratio of metal density to total gas density (not in units of ${Z}_{\odot }$ ). The color axis spans from 0.01 to 0.04 in a linear scale. See Section 6.7 for a detailed explanation of this figure. Compare with Figures 3, 15, and 21. The high-resolution versions of this figure and article are available at the Project website, http://www.AGORAsimulations.org/.
Download figure:
Standard image High-resolution image

Figure 33 shows the mass-weighted averages of gas metal fraction in Sim-SFF on a two-dimensional density–temperature plane for gas within 15 kpc of the galactic center. Raw particle fields are used for particle-based codes, not the interpolated or smoothed fields constructed in yt. Note again that a gaseous halo represented by low-density, high-temperature gas in the upper left corner of each panel exists only in mesh-based codes. For particle-based codes, high-density, low-temperature gas has higher metallicity because of its correspondingly higher SFR and thus metal enrichment. For mesh-based codes, on the other hand, high metal fraction is found in low-density, high-temperature gas as well, which is contaminated by hot metal-enriched materials dispersed by supernova feedback.⁸⁰ These two observations are related to how well metals get mixed in each code. Readers may have noticed the difference between mesh-based and particle-based codes already in Figure 32 by focusing on mixing of the metals. With neither the halo gas nor a sophisticated metal mixing scheme in place, metal-enriched gas in particle-based codes tends to stay near the dense star-forming sites that provided the metals.

In Figure 34 is the density-weighted average of gas elevations from the x–y disk plane for Sim-SFF with star formation and feedback. That is, averages of $({z}_{{\rm{i}}}-{z}_{\mathrm{center}})$ such that a positive (negative) value indicates the gas along that line of sight is located above (below) the disk plane on average. For particle-based codes, we use the reconstructed density field from yt's in-memory octree on which gas particles are deposited using smoothing kernels (see Section 6.1). This figure helps to visualize and estimate the warping of the gas disk (e.g., Levine et al. 2006). All participating codes produce largely flat disks, with vertical offsets less than ±1 kpc. Yet, it is also true that all codes show some levels of coherent warping or flaring along the disk plane. This strongly suggests that all these codes are able to resolve vertical instabilities.

Finally, Figure 35 compares the sizes of spatial resolution elements each code imposes on the galactic disk of Sim-SFF at 500 Myr. For mesh-based codes, this is a projection of gas cell sizes along different lines of sight, weighted by (cell volume)⁻² so that the maximal resolution element along that line of sight could stand out. For particle-based codes, this is a projection of gas particle sizes, defined as ${({m}_{\mathrm{gas}}/{\rho }_{\mathrm{gas}})}^{1/3}$ , smoothed on to yt's octree (the same octree yt used in other edge-on/face-on visualizations; see Section 6.1 for more information on the octree), weighted by (particle volume)⁻², defined as ${({m}_{\mathrm{gas}}/{\rho }_{\mathrm{gas}})}^{-2}$ . The color axis spans from 10 to 10³ pc in a logarithmic scale, with highest resolution shown in dark blue. The green color in mesh-based codes marks the finest mesh size permitted, 80 kpc, while the blue color in the spirals and clumps of particle-based codes can be associated with the minimum smoothing length permitted, $0.2\times 80$ pc. Because of its Lagrangian nature, particle-based codes best demonstrate their strengths in dense clumps and spirals. Meanwhile, with its flexible refinement strategy, mesh-based codes may best utilize their strengths in the contact regions with high density contrast, such as above and below the disk. For example, in the edge-on views of Figure 35, the green-colored high resolution region covering the disk is thicker in mesh-based codes than in particle-based codes.⁸¹

7. DISCUSSION AND CONCLUSION

Using an isolated Milky Way-mass galaxy simulation, we compared results from 9 state-of-the-art gravito-hydrodynamics codes widely used in the numerical community (Section 5). We utilized the infrastructure we have built for the AGORA High-resolution Galaxy Simulations Comparison Project. For the common ICs for these isolated galaxy simulations we used the ones generated by Makedisk (Section 2, see footnote 46). We also adopted the common physics models (e.g., radiative cooling and UV background by the standardized package Grackle; Section 3.1, see footnote 42) and common analysis toolkit yt(see footnote 43), both of which are publicly available. Subgrid physics such as pressure floor, star formation prescription, supernova feedback energy, and metal production have been meticulously constrained across participating codes (Sections 3.1 and 3.2). Strenuous efforts have also been made to ensure the consistency between the parameters that control resolutions of the codes (Section 4).

With numerical accuracy that resolves the disk scale height—high-order numerical methods in modern simulation codes combined with high spatial resolution—we find that the codes overall agree well with one another in many dimensions, including: gas and stellar surface densities, gas and stellar rotation curves and velocity dispersions, gas density and temperature distribution functions, disk vertical heights, newly formed stellar clumps, SFRs, and KS relations (Section 6). Quantities such as velocity dispersions are very robust (e.g., gas and newly formed stellar velocity dispersions agree within a few tens of percent at all radii) while other measures like newly formed stellar clump mass functions show more significant variation (differ by up to a factor of ∼3). In Table 2 we summarize the relative differences between codes for the main observables studied in this report. Some discrepancies can be understood as systematic differences between codes, for example, between mesh-based and particle-based codes in the low-density region (Figures 2–3 and Section 6.1), and between more diffusive and less diffusive schemes in the high-density tail of the density distribution (Figure 19 and Section 6.3). The latter translates into differences in clumping properties (Figure 23) and SFRs (Figure 26) of different codes. These intrinsic code differences are not as serious as some might have mistakenly extrapolated from previous code comparisons (e.g., Scannapieco et al. 2012), and are generally small compared to the variations in numerical implementations of the common subgrid physics such as supernovae feedback. Our experiment reveals the remarkable level of agreement between different modern simulation tools despite their codebases having evolved largely independently for many years. It is also reassuring that our computational tools are more sensitive to input physics than to intrinsic differences in numerical schemes, and that predictions made by the participating numerical codes are reproducible and likely reliable. If adequately designed in accordance with our proposed common parameters (e.g., cooling, metagalactic UV background, stellar physics, resolution; see Sections 3 and 4), results of a modern high-resolution galaxy formation simulation are likely robust.

Table 2. Relative Differences between Codes in Main Observables (for Sim-SFF run)^a

	Level of Agreement between Codes	Figures	Relevant Sections
Gas surface density (cylindrically binned)	averaged fractional deviation for $2\lt r\lt 10$ kpc = 32.2% (0.121 dex) (see footnote ⁷⁰)	Figure 5	Section 6.1
Gas surface density (vertically binned)	averaged fractional deviation for $z\lt 0.6$ kpc = 30.4% (0.115 dex)	Figure 7	Section 6.1
Gas average vertical height	averaged fractional deviation for $2\lt r\lt 10$ kpc = 19.1% (0.076 dex)	Figure 9	Section 6.1
Gas rotation velocity	averaged fractional deviation for $2\lt r\lt 10$ kpc = 2.8% (0.012 dex)	Figure 11	Section 6.2
Gas velocity dispersion	averaged fractional deviation for $2\lt r\lt 10$ kpc = 17.8% (0.071 dex)	Figure 13	Section 6.2
Gas density probability distribution	averaged fractional deviation in ${10}^{-25}\lt \rho \lt {10}^{-22}\,{\rm{g}}\,{\mathrm{cm}}^{-3}=28.6$ % (0.109 dex), up to more than an order of magnitude difference at $\rho \gt {10}^{-22}\,{\rm{g}}\,{\mathrm{cm}}^{-3}$	Figure 19	Section 6.3
Newly formed stellar surface density	averaged fractional deviation for $2\lt r\lt 10$ kpc = 53.9% (0.187 dex)	Figure 22	Section 6.4
Newly formed stellar clump mass function	all data points lie within a factor of ∼3 from the mean at each mass	Figure 23	Section 6.4
Newly formed stellar rotation velocity	averaged fractional deviation for $2\lt r\lt 10$ kpc = 2.5% (0.011 dex)	Figure 24	Section 6.5
Newly formed stellar velocity dispersion	averaged fractional deviation for $2\lt r\lt 10$ kpc = 11.2% (0.046 dex)	Figure 25	Section 6.5
Galaxy-wide star formation rate	averaged fractional deviation for $50\lt t\lt 500$ Myr = 32.8% (0.123 dex)	Figure 26	Section 6.6
KS relation (azimuthally averaged)	all data points lie within a factor of ∼3 from the mean at each ${{\rm{\Sigma }}}_{\mathrm{gas}}$	Figure 28	Section 6.6
KS relation (patch-averaged)	all data points lie within a factor of ∼3 from the mean at each ${{\rm{\Sigma }}}_{\mathrm{gas}}$	Figure 31	Section 6.6

Note.

^aFor descriptions of observables, we refer the readers to relevant figures and sections.

Download table as: ASCII Typeset image

It is worth briefly noting a few points about our study presented in this paper. (1) During the course of the present study, we have developed and field-tested important pieces of the AGORA infrastructure such as the common IC, common physics models, and common analysis toolkit. In particular, it should be noted that all the analyses in Section 6 are carried out with common yt scripts that are nearly independent of simulation codes. This common analysis platform approach has repeatedly proven its strength in AGORA comparisons including this study, significantly reducing the cost needed to hack the codes that any one researcher might not be familiar with, and allowing moving straight to science-driven comparisons of underlying physical properties. (2) While we find that the 500 Myr snapshot we used for the comparison is representative of each simulation, we caution that any similarity or discrepancy found here may not be universally the case at every single epoch. In an ongoing study using the same suite of simulations presented here, but multiple snapshots up to 2 Gyr, we are systematically checking if any conclusion drawn in this work is challenged by the fact that we compared a snapshot of a galaxy at a single epoch.⁸² (3) Comparison studies in the AGORA Collaboration including this work are not intended to decide which numerical implementation is a "correct" one. The problem we are trying to numerically solve, i.e., galaxy formation, does not have a well-defined solution at a given resolution that every code is expected to converge to. Thus, it is never our intention to identify a "correct" or "incorrect" code, nor even a "better" or "worse" code. Instead, we aim to determine how much scatter one should expect among different numerical implementations in a particular problem of galaxy formation, given nominally similar physics and runtime parameters.

We plan to further investigate our isolated disk galaxy simulations in other interesting dimensions, such as disk stability, bulge-to-disk decomposition, spiral and bar formation, and mass inflow and outflow, among others. As mentioned in Section 3.2, we also intend to calibrate feedback schemes against observations (e.g., metrics such as galactic fountain, outflows, mass-loading, fraction of hot/warm/cold gas, and main sequence star formation) and against one another. While we complete analyses for these ongoing efforts, we are aiming to publicly release the 500 Myr snapshots used in the present paper from all participating codes in 2017 January (tentatively) through the AGORA Project website (see footnote 41). This is to allow any interested party in the numerical galaxy formation community to be able to compare their own simulations with the AGORA snapshots, using our publicly available common yt script if needed.⁶⁸

Finally, we emphasize the role the AGORA Project played in promoting collaborative and reproducible research in the numerical galaxy formation community. Over the past four years we have collaboratively formed a one-of-a-kind platform where members of the numerical community can work together and verify one another's work. Not only have we successfully built a common, publicly available infrastructure fully encompassing all the components to run galaxy-scale simulations in a reproducible manner—ICs, physics packages, calibrated runtime parameters, analysis pipeline, and data storage—but we also have founded an open forum where members could talk to and learn from one another. This Project has become a great experiment in itself in which it was continuously shown how beneficial a platform like this could be for any scientific community. Through workshops and teleconferences, and via common languages and infrastructure built together, Project participants were able to better understand other codes, and improve their own. Participants found an optimal set of simulation parameters that makes their code to be best compatible with others. We came to understand how seemingly identical parameters differ in their meanings in different codes, and how seemingly different parameters have in fact identical meanings. In some comparisons, numerical errors were discovered and fixed in participating codes. The AGORA framework, now tested with the common physics and subgrid models, are serving as a launchpad to initiate astrophysically motivated comparisons aimed at raising the predictive power of galaxy simulations, especially as we run the zoom-in cosmological simulations outlined in our flagship paper (Kim et al. 2014). In the coming years, we expect AGORA to continue to provide a sustainable and fertile platform on which numerical experiments are readily validated and cross-calibrated, and ambitious multi-platform collaborations are forged.

The authors of this paper thank the members of the AGORA Collaboration who are not on the author list but have provided helpful suggestions throughout the progress of the paper, including John Forbes. We also thank Volker Springel for providing the original versions of Gadget-3 and Makedisk to be used in the AGORA Project. We gratefully acknowledge the financial and logistical support from the University of California High-Performance AstroComputing Center (UC-HiPACC) during the annual AGORA Workshops held at the University of California Santa Cruz from 2012 to 2016. This research also used resources of the National Energy Research Scientific Computing Center (NERSC), a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. The publicly available Enzo and yt codes used in this work are the products of collaborative efforts by many independent scientists from numerous institutions around the world. Their commitment to open science has helped make this work possible. Ji-hoon Kim acknowledges support from NASA through an Einstein Postdoctoral Fellowship, grant PF4-150147, and support from the Moore Center for Theoretical Cosmology and Physics at Caltech. A part of his computing time was provided by Extreme Science and Engineering Discovery Environment (XSEDE) allocation TG-AST140064. XSEDE is supported by National Science Foundation (NSF) grant No. ACI-1053575. He is also grateful for the support from the computational team at SLAC National Accelerator Laboratory during the usage of the clusters for the simulation analysis. Oscar Agertz acknowledges support from STFC consolidated grant ST/M000990/1 and the Swedish Research Council grant 2014-5791. Daniel Ceverino acknowledges support from the European Research Council (ERC) via the ERC Advanced Grant STARLIGHT Project No. 339177. Robert Feldmann acknowledges support in part by NASA through Hubble Fellowship grant HF2-51304.001-A awarded by the Space Telescope Science Institute, which is operated by the Association of Universities for Research in Astronomy, Inc., for NASA, under contract NAS 5-26555, in part by the Theoretical Astrophysics Center at UC Berkeley, and by NASA ATP grant 12-ATP-120183. Alessandro Lupi acknowledges support by the ERC Project No. 267117 (PI J. Silk). Tom Quinn acknowledges partial support by the NSF through grants No. AST-1514868 and AST-1311956 and NASA Hubble grant HST-AR-13264. Changa simulations where run on resources provided by XSEDE and NASA Pleiades. Robert Thompson, Matthew Turk and Nathan Goldbaum acknowledge support by the Gordon and Betty Moore Foundation's Data-Driven Discovery Initiative through grant GBMF4561 (PI M. Turk), and by the NSF through grant No. ACI-1535651. Tom Abel acknowledges partial support by the Kavli Foundation. Sukanya Chakrabarti acknowledges support by the NSF through grant No. AST-1517488. Piero Madau acknowledges support by the NSF through grant No. AST-1229745 and by NASA through grant NNX12AF87G. Kentaro Nagamine and Ikkoh Shimizu acknowledge support from the JSPS KAKENHI grant No. JP26247022. Some of the Gadget-3 simulations were carried out on the XC30 machine at the Center for Computational Astrophysics, National Astronomical Observatory of Japan. Brian O'Shea acknowledges support from NASA through grants NNX12AC98G, NNX15AP39G, and NASA Hubble theory grants HST-AR-13261.01-A and HST-AR-14315.001-A. He was also supported in part by the sabbatical visitor program at the Michigan Institute for Research in Astrophysics at the University of Michigan in Ann Arbor, and gratefully acknowledges their hospitality. Joel Primack acknowledges support from STScI through grant HST-GO-12060.12-A-004, and NASA Advanced Supercomputing for Pleiades time on which Art-I simulations were run. Christine Simpson acknowledges support from the ERC under ERC-StG grant EXAGAL-308037 and from the Klaus Tschira Foundation. John Wise acknowledges support by the NSF through grants No. AST-1333360 and AST-1614333 and NASA Hubble theory grants HST-AR-13895 and HST-AR-14326.

APPENDIX A: PRESSURE FLOOR PARAMETERS IN SELECTED CODES

We caution that the definition of ${N}_{\mathrm{Jeans}}$ in our Jeans pressure floor recommendation, Equation 3(a),

$\begin{eqnarray}&&{P}_{\mathrm{Jeans}}=\displaystyle \frac{1}{\gamma \pi }{N}_{\mathrm{Jeans}}^{2}G{\rho }_{\mathrm{gas}}^{2}{\rm{\Delta }}{x}^{2}\end{eqnarray} \tag{ 3a }$

is different from another widely used formula in particle-based codes, Equation (1) of Hopkins et al. (2011),

$\begin{eqnarray}&&{P}_{\mathrm{Jeans}}=\displaystyle \frac{1.2}{\gamma }{N}_{\mathrm{Jeans}}^{2/3}G{\rho }_{\mathrm{gas}}^{2}{h}_{\mathrm{sml}}^{2}.\end{eqnarray} \tag{ 8 }$

Our choice of ${N}_{\mathrm{Jeans}}=4$ in Equation 3(a) (see Section 3.1) is equivalent to ${N}_{\mathrm{Jeans}}=8.75$ in Equation (8) if the smoothing length ${h}_{\mathrm{sml}}$ is replaced with a fixed number ${\rm{\Delta }}x=80$ pc. For Gear, the runtime parameter JeansMassFactor controls the Jeans pressure support and it equates to ${N}_{\mathrm{Jeans}}$ in Equation (8), to be set to 8.75. For Changa and Gasoline, the relevant runtime parameter is dResolveJeans, but it is not equal to ${N}_{\mathrm{Jeans}}$ in Equation (8), but to the entire prefactor $1.2\,{N}_{\mathrm{Jeans}}^{2/3}{\gamma }^{-1}$ . Therefore, to follow the AGORA recommendation in Section 3.1, dResolveJeans should be set to $1.2\times {(8.75)}^{2/3}\times {(5/3)}^{-1}=3.06$ , along with replacing ${h}_{\mathrm{sml}}$ with 80 pc. For some codes, one of the pressure floor formulae is simply hardcoded without a tunable runtime parameter: Equation (8) in Gadget-3, and Equation (3a) in Gizmo. The parameter MinimumPressureSupportParameter in Enzo refers to ${N}_{\mathrm{Jeans}}^{2}$ in Equation (3a) above (see Grid $\_$ SetMinimumSupport.C). Thus, to follow the AGORA recommendation it should be set to 16. For pressure floor implementations using polytropes in Art-II and Ramses, see Sections 5.2 and 5.4, respectively.

APPENDIX B: STAR FORMATION EFFICIENCY PARAMETERS IN Changa AND Gasoline

Unlike in other codes, the parameter which controls the star formation efficiency in Changa and Gasoline, CStar ( ${c}_{\star }$ defined in Equation 3(a) of Katz 1992), is not equal to ${\epsilon }_{\star }$ in our Equation (4) of Section 3.2. This is due to a difference in the denominator of the Schmidt formula: ${t}_{{\rm{g}}}={(1/(4\pi G{\rho }_{\mathrm{gas}}))}^{1/2}$ in Equation 3(a) of Katz (1992), versus ${t}_{\mathrm{ff}}={(3\pi /(32G{\rho }_{\mathrm{gas}}))}^{1/2}$ in our Equation (4). Therefore, in order to follow the AGORA recommendation in Section 3.2, CStar should be set to ${\epsilon }_{\star }{({t}_{{\rm{g}}}/{t}_{\mathrm{ff}})=0.01\times (32/12{\pi }^{2})}^{1/2}=0.0052$ in Changa and Gasoline.

APPENDIX C: SOFTENING AND SMOOTHING PARAMETERS IN PARTICLE-BASED CODES

Throughout this paper we define the gravitational softening length ${\epsilon }_{\mathrm{grav}}$ (Section 4.1) as the equivalent Plummer softening length. This equates to the parameters Softening[Gas/Halo/Disk/Stars] in Gadget-3, Gear and Gizmo. It is however different from the typical definition of spline size h beyond which the gravitational force becomes exactly Newtonian (h as defined in Equation (4) of Springel 2005). For Gadget-3, Gear and Gizmo, the spline size h is equal to $2.8\,{\epsilon }_{\mathrm{grav}}=2.8\times 80=224$ pc (see ForceSoftening[i] in gravtree.c). Note that the parameter MinGasHsmlFractional that controls the minimum hydrodynamical smoothing length (Section 4.2) is defined as a fraction of h, not of ${\epsilon }_{\mathrm{grav}}$ . Therefore, in order to set the minimum smoothing length to $0.2\,{\epsilon }_{\mathrm{grav}}$ , MinGasHsmlFractional should be set to $0.2/2.8=0.0714$ .

By contrast, the spline size parameter dSoft in Changa and Gasoline is not equal to the equivalent Plummer softening length, but to $h/2$ in the Springel (2005) definition of h above.⁸³ Therefore, in order to have ${\epsilon }_{\mathrm{grav}}=80$ pc in Changa or Gasoline runs, dSoft should be set to $2.8\,{\epsilon }_{\mathrm{grav}}/2=224/2=112$ pc. One should keep in mind that the parameter which controls the minimum hydrodynamical smoothing length, dhMinOverSoft, is defined as a fraction of dSoft, not of ${\epsilon }_{\mathrm{grav}}$ . Therefore, in order to set the minimum smoothing length to $0.2\,{\epsilon }_{\mathrm{grav}}$ , dhMinOverSoft should be set to $0.2\times (2/2.8)=0.143$ .