First Sagittarius A* Event Horizon Telescope Results. V. Testing Astrophysical Models of the Galactic Center Black Hole

In this paper, we provide a first physical interpretation for the Event Horizon Telescope (EHT)'s 2017 observations of Sgr A*. Our main approach is to compare resolved EHT data at 230 GHz and unresolved non-EHT observations from radio to X-ray wavelengths to predictions from a library of models based on time-dependent general relativistic magnetohydrodynamics (GRMHD) simulations, including aligned, tilted, and stellar wind-fed simulations; radiative transfer is performed assuming both thermal and non-thermal electron distribution functions. We test the models against 11 constraints drawn from EHT 230 GHz data and observations at 86 GHz, 2.2 $\mu$m, and in the X-ray. All models fail at least one constraint. Light curve variability provides a particularly severe constraint, failing nearly all strongly magnetized (MAD) models and a large fraction of weakly magnetized (SANE) models. A number of models fail only the variability constraints. We identify a promising cluster of these models, which are MAD and have inclination $i \le$ 30$^\circ$. They have accretion rate $(5.2$-$9.5)\times10^{-9}M_\odot$yr$^{-1}$, bolometric luminosity $(6.8$--$9.2)\times10^{35}$ erg s$^{-1}$, and outflow power $(1.3$--$4.8)\times10^{38}$ erg s$^{-1}$. We also find that: all models with $i \ge$ 70$^\circ$ fail at least two constraints, as do all models with equal ion and electron temperature; exploratory, non-thermal model sets tend to have higher 2.2 $\mu$m flux density; the population of cold electrons is limited by X-ray constraints due to the risk of bremsstrahlung overproduction. Finally we discuss physical and numerical limitations of the models, highlighting the possible importance of kinetic effects and duration of the simulations.


Introduction
The center of the Milky Way contains a massive compact object that is likely a supermassive black hole (Do et al. 2019;Gravity Collaboration et al. 2019). The putative black hole is surrounded by hot plasma that is visible across 17 decades in electromagnetic frequency. Hereafter we will use Sgr A * to refer to the supermassive black hole candidate and the hot plasma.
Sgr A * is one of the most studied objects on the sky, both observationally and theoretically. A key characteristic of the Sgr A * system is its extremely low overall luminosity with respect to the Eddington limit. The low luminosity suggests that matter falls onto Sgr A * 's central object in the form of a radiatively inefficient/advection-dominated accretion flow (RIAF/ADAF, as proposed by Ichimaru 1977;Rees et al. 1982;Narayan & Yi 1994, 1995a, 1995bNarayan et al. 1996Narayan et al. , 1998Yuan & Narayan 2014) rather than in the form of a radiatively efficient thin disk (Shakura & Sunyaev 1973). Since the nearly flat radio spectrum of Sgr A * is similar to radio spectra observed in jets from active galactic nuclei, it has also been suggested that the majority of the Sgr A * emission could be produced by a jet launched by an accreting black hole rather than matter falling through the black hole event horizon (Falcke et al. 1993;Falcke & Markoff 2000).
GRMHD simulations of ADAFs show that ADAF-like inflows are not unique. In particular, two dramatically different modes are observed, depending on the magnetic flux interior to the black hole equator: the standard and normal evolution (SANE) mode, in which the midplane magnetic field pressure is less than the gas pressure and magnetic fields are turbulent, and the magnetically arrested disk (MAD) mode, in which magnetic fields are strong and organized and can even disrupt accretion. An outstanding question about Sgr A * is whether the flow is in MAD or SANE mode, or possibly in a third mode that results from wind-fed accretion ).
The energy distribution of electrons in the emitting plasma is also not known. Because emission is driven by the synchrotron process, this is critical in determining the observational appearance of the source. In particular, the energy per electron may increase with latitude in the flow, leading to a jet or outflow that outshines an equatorial inflow.
The question of whether emission is dominated by an inflow or outflow is intimately tied to the problem of what drives an outflow, if there is one. In GRMHD simulations of black hole accretion the strength of the outflow depends sensitively on the black hole spin (e.g., Event Horizon Telescope Collaboration et al. 2019c, hereafter M87 * Paper V; Narayan et al. 2022). At 1 large spin GRMHD simulations produce powerful jets driven by extraction of black hole spin energy via the Blandford & Znajek (1977) process. A spatially resolved study of Sgr A * may thus also constrain the black hole spin and provide direct evidence for black hole energy extraction.
A major difficulty in determining the nature of Sgr A * radio emission is caused by the interstellar scattering screen that distorts our view of the Galactic center up to λ ∼ 1 mm wavelengths (see Johnson et al. 2018;Psaltis et al. 2018;Issaoun et al. 2019, and references therein). The Event Horizon Telescope (EHT) is a very long baseline interferometric (VLBI) experiment operating at 230 GHz or wavelength λ = 1.3 mm (see Event Horizon Telescope Collaboration et al. 2019b, hereafter M87 * Paper II, for an introduction to the instrument). EHT operates at high enough frequency to penetrate the scattering screen, with angular resolution sufficient to directly image structures in the immediate vicinity of the black hole event horizon.
In 2017 April the EHT observed Sgr A * (among other sources, including the core of the M87 galaxy; see Event Horizon Telescope Collaboration et al. 2019a, hereafter M87 * Paper I) and produced the first ever horizon-scale images of the source. We report the results of these observations in Event Horizon Telescope Collaboration et al. This paper is structured as follows. Section 2 describes our main assumptions, a one-zone source model, and a standard simulation and synthetic image library used to model nearhorizon emission from Sgr A * . Our model library assumes that general relativity is valid and the spacetime around Sgr A * is described by the Kerr metric (Kerr 1963). A discussion of Sgr A * observations in the context of alternative theories of gravity can be found in Paper VI. Our model library is based on time-dependent GRMHD simulations that, combined with general relativistic radiative transfer models, result in images and broadband spectra of the models. The library of simulated images was used in Paper III and Paper IV, to validate the Sgr A * EHT imaging and parameter estimation algorithms. In Section 3, we describe the observational constraints that are used in the present work to test theoretical models of Sgr A * . These data compose a subset of EHT 2017 observations and other non-EHT historical or other data. In Section 4, we describe model scoring procedures and use our model library to infer physical properties of the Sgr A * system. We discuss model limitations, results in the context of previous studies, and the outlook for future Sgr A * theoretical research directions in Section 5. Finally, we conclude in Section 6. This paper is supplemented with several appendices. Appendix A discusses numerical details of our simulations. Appendix B discusses the impact of physical and numerical effects on the model variability. Appendix C summarizes the results of applying constraints to our fiducial models in an extended set of figures.

Basic Assumptions
We assume that the mass of and distance to Sgr A * are We also assume that Sgr A * is a black hole described by the Kerr metric. The dimensionless spin, a * ≡ Jc/GM 2 , is a free parameter with − 1 < a * < 1, where J, G, and c are the black hole spin angular momentum, gravitational constant, and speed of light, respectively. Following M87 * Paper V, we use a * > 0 to indicate that the angular momenta of the accretion flow and black hole are parallel (the accretion flow is "prograde") and a * < 0 to indicate that the angular momenta of the accretion flow and black hole are antiparallel ("retrograde"). 152 The implied characteristic length The expected diameter of the black hole shadow is ( ) GM c D 2 27 2 for a * = 0. For |a * | > 0 the shadow is noncircular and its size and shape depend on a * and inclination i (the angle between the line of sight and the spin axis); its width can be as small as 9GM/(c 2 D) for a * = 1 and i = 90° ( Bardeen 1973).
If the emitting plasma is ionized hydrogen, then the Eddington luminosity an extremely small Eddington ratio.

One-zone Model
Here we motivate the more complicated models that follow using a simple one-zone model, following M87 * Paper V and one-zone models developed in the literature over many decades (e.g., Falcke 1996).
Consider a uniform sphere of plasma with radius r = 5GM/c 2 , comparable to the observed size of Sgr A * at 230 GHz (Paper III; Paper IV), with uniform magnetic field oriented at π/3 to the line of sight. In turbulent astrophysical plasmas, it is common for the gas pressure to be comparable to the magnetic pressure, so we set n i k B T i + n e k B T e = B 2 /(8π), where T i ≡ ion temperature, T e ≡ electron temperature, k B ≡ Boltzmann constant, and B ≡ magnetic field strength. The plasma is collisionless (as shown below), and it is plausible that the ions are preferentially heated, so we assume T i = 3T e . If the ions are subvirial by a factor of 3, as commonly seen in relativistic MHD simulations, i.e., (3/2)kT i ∼ (1/3)(1/2)(GMm p /r), then the ions are nonrelativistic and the electrons are relativistic, with Θ e ≡ k B T e /(m e c 2 ) ∼ 10.
Assuming a thermal electron distribution function (eDF) and therefore a thermal synchrotron emissivity j ν (e.g., Leung et al. 2011) and assuming optically thin emission, the flux density from a uniform sphere F ν = (4/3)πr 3 j ν D −2 10 23 Jy. Requiring This is consistent with a similar one-zone model fit to archival Sgr A * millimeter spectra (Bower et al. 2019). The synchrotron optical depth τ sync = rj ν /B ν ; 0.4, where B ν is the Planck function, so the optically thin approximation is marginal.
The one-zone model has electron-scattering optical depth τ e = σ T n e r ; 2 × 10 −6 , and thus the Compton parameter ( ) t t = Q  y 16 max , 0.003 e e e 2 2 is small. Synchrotron cooling therefore dominates Compton cooling.
The synchrotron cooling timescale for electrons t cool ≡ u e /Λ, where u e = 3n e kT e is the electron internal energy and ( ) L Q  B e n c m 5.4 e e e 2 4 2 3 2 is the synchrotron cooling rate for a thermal population of electrons with Θ e  1 (see Appendix A in Mościbrodzka et al. 2011; finite optical depth reduces Λ). Therefore, t cool = 2.3 × 10 4 s ; 1.1 × 10 3 GM/c 3 , which is longer than the inflow timescale r/v r ∼ r 3/2 . This suggests that radiative cooling can be neglected in the plasma models. 153 More detailed calculations confirm this estimate (Chael et al. 2018;Yoon et al. 2020). 154 The one-zone model solution implies that the mean free path to Coulomb scattering is large compared to GM/c 2 , i.e., the source plasma is collisionless. At Θ e ∼ 1, for example, the electron−electron Coulomb scattering cross section is comparable to the Thomson cross section, and the mean free path is therefore t -GM c e 1 2 . The electron−ion Coulomb scattering mean free path is even longer, and the electrons and ions are therefore poorly coupled. This is consistent with our assumption that the ions and electrons can have different temperatures (Shapiro et al. 1976;Ichimaru 1977;Rees et al. 1982) and motivates consideration of nonthermal (unrelaxed) eDFs (see Özel et al. 2000;Chan et al. 2009;Mościbrodzka et al. 2014;Davelaar et al. 2018;Chatterjee et al. 2021;Cruz-Osorio et al. 2021;El Mellah et al. 2021;Scepi et al. 2022;Fromm et al. 2022).

Numerical Models
The one-zone model is too simple for comparison with EHT data. In particular, it does not predict EHT image morphology, and it fails to model emission that arises outside the nearhorizon region, including 86 GHz emission and X-ray emission. Steady spherical accretion models (e.g.,  go one step beyond the one-zone model, incorporating relativistic gravity and a radially extended flow. Steady, disklike (RIAF) accretion models in the Kerr metric go still further and include rotation and departures from spherical symmetry (e.g., Broderick et al. 2009;Huang et al. 2009;Pu & Broderick 2018). Steady phenomenological models do not, however, self-consistently capture fluctuations in the flow. That requires either a statistical model (Lee & Gammie 2021) or a time-dependent numerical simulation. Here we use numerical simulations, adopt an ideal GRMHD model for the flow, employ simple parameterized models to assign an eDF, and solve the radiative transfer equation along geodesics to produce simulated images.
Unless stated otherwise, the initial conditions for the GRMHD simulations are constant angular momentum hydrodynamic equilibrium tori (Fishbone & Moncrief 1976), with orbital angular momentum that is parallel or antiparallel to the black hole spin. The tori are seeded with a weak, poloidal magnetic field. The simulations use varying torus pressure maximum radius (from ∼ 15 GM/c 2 to 40 GM/c 2 ), peak temperature, adiabatic index, and initial field configurations. These variations permit us to test the robustness of our results (see Appendix A).
The torus initial conditions are motivated by the notion that the near-horizon flow, where most of the emission is generated (M87 * Paper V), relaxes to a statistically steady state that is nearly independent of the flow at larger radius. This notion is challenged in the stellar-wind-fed models of Ressler et al. (2020b), which are included in our study.
All simulations are run in Kerr−Schild-like coordinates, which are regular on the horizon. Unless stated otherwise, boundary conditions are outflow at the inner boundary, which is located inside the event horizon, and outflow at the outer boundary, which is located at  r GM c 1000 max 2 . Most simulations are evolved to t final 30,000 GM/c 3 .
Once the evolution has started, the magnetorotational instability (MRI; Balbus & Hawley 1992) and possibly other instabilities, such as, for MAD models, magnetic Rayleigh-Taylor instabilities (Marshall et al. 2018), drive the torus to a turbulent, fluctuating state. Defining P gas ≡ gas pressure and P mag ≡ B 2 /(8π) ≡ magnetic pressure, the standard accretion flow models can be divided by latitude into three zones: (i) an equatorial inflow, (ii) a midlatitude disk wind/corona with β ≡ P gas /P mag ∼ 1, and (iii) a polar "funnel" with σ ≡ B 2 /(4πρc 2 ) ? 1.
The magnetic flux through the horizon, characterized by 1 2 (Φ BH ≡ magnetic flux interior to the black hole equator, º  M mass accretion rate), divides the outcome into two states: the MAD state (e.g., Bisnovatyi-Kogan & Ruzmaikin 1974;Igumenshchev et al. 2003;Narayan et al. 2003;Tchekhovskoy et al. 2011) andthe SANE state (e.g., De Villiers et al. 2003;Gammie et al. 2003;Narayan et al. 2012). MAD models have f ∼ f crit ∼ 60. 155 In MAD models, magnetic flux accretes onto the hole until f  f crit . Accretion of additional flux leads to flux expulsion events so that the flow maintains f ∼ f crit . Our SANE models, in contrast, typically have f ∼ 1.
We consider two GRMHD simulations with initial conditions that differ from the fiducial aligned torus: strongly magnetized non-MAD tilted torus simulations (Liska et al. 2018;Chatterjee et al. 2020), and a simulation in which Sgr A * is fed directly by winds from stars in its vicinity ). The wind-fed simulations result in a mode of accretion that is similar to MAD but typically has lower mean angular momentum and is less well organized. The wind-fed models have a * = 0.
The GRMHD simulation library is summarized in Table 1. Figure 1 shows a few examples of GRMHD simulations for an aligned SANE, an aligned MAD, a tilted torus, and a wind-fed simulation. These simulations vary in numerical method and in numerical resolution. We present more information on numerical methods in Appendices A and B.
The gas temperature profile is a critical feature of the GRMHD simulations. Figure 2 shows the time-and azimuthaveraged profiles of the midplane dimensionless gas temperature P/(ρc 2 ) in a set of aligned GRMHD simulations. The temperature profiles exhibit trends with spin and magnetic state (MAD or SANE) that drive many of the trends seen in the radiative models: MAD models are a factor of several hotter than SANE models, and both MAD and SANE become hotter as a * increases.

Radiative Transfer Model
Synthetic images are generated from the GRMHD simulations in a radiative transfer step. The transfer step requires (i) a model for the eDF, (ii) assignment of a density scale to the GRMHD simulation, (iii) the inclination i (angle between the torus angular momentum and the line of sight), and (iv) a numerical integration performed as a post-processing step that assumes that the plasma evolution is unaffected by radiation.
Electron Distribution Function.-Thermal models have electron energies distributed according to the Maxwell-Jüttner distribution function: where K 2 is a modified Bessel function of the second kind and γ is the electron Lorentz factor. Recall Θ e = k B T e /(m e c 2 ), which is determined by the ion−electron temperature ratio R ≡ T i /T e : Here u and ρ are the internal energy density and rest-mass density from the GRMHD simulation, respectively, and we have assumed that the ions are nonrelativistic with adiabatic index 5/3 and the electrons are relativistic with adiabatic index 4/3. Thermal models are motivated by the idea that waveparticle scattering drives partial relaxation of the eDF, even though Coulomb scattering is ineffective. The temperature ratio depends on a balance between microphysical dissipation, radiative cooling, and fluid transport. Models for collisionless dissipation vary widely in their predictions for the ratio of heat deposited in ions and electrons but depend most strongly on the local magnetic field strength. This motivates a prescription in which the temperature ratio depends solely on the plasma β ≡ P gas /P mag (Chan et al. 2015b). We adopt the same model as M87 * Paper V and Event Horizon Telescope Collaboration et al. 2021a, hereafter M87 * Paper VIII, where 1 13 i e high 2 2 low 2 (Mościbrodzka et al. 2016) and b ≡ β/β crit . This model has three free parameters: β crit , R low , and R high . We fix R low = 1 (consistent with the long cooling time in Sgr A * ; see discussion in Event Horizon Telescope Collaboration et al. 2021a) and β crit = 1, but we allow R high to vary from 1 to 160. When R high ? 1, emission is shifted away from the midplane and toward the poles. In nonthermal models, the eDF has a power-law tail extending to high energy. We explore two implementations: (i) a power-law distribution function with power-law index p and upper and lower limits g min and g ; max and (ii) a so-called κ distribution function, inspired by observations of the solar wind and by results of collisionless plasma simulations (e.g., Kunz et al. 2015, and references therein) which has width parameter w and power-law index parameter κ. Evidently, any eDF assignment scheme is an approximation since the eDF depends in general on both local conditions and particle histories. Notice that we also assume that the eDF is isotropic and neglect electron−positron pairs.
Once the eDF is specified, the radiative transfer coefficients (emissivities, absorptivities, and rotativities) can be readily calculated; see Marszewski et al. (2021) for a recent summary.
Model Scaling.-With the exception of the stellar-wind-fed simulations, the GRMHD simulations considered in this work contain a characteristic speed, c, but are otherwise scale-free; they set GM = c = 1. Physical scales are assigned during the radiative transfer step. The black hole mass M fixes the length unit GM/c 2 and time unit GM/c 3 . Because the GRMHD simulations are non-self-gravitating, one is free to set a density scale, or equivalently the accretion rate  M or plasma mass scale .
The plasma mass scale parameter  controls the plasma emissivity and the plasma optical depth and thus the source brightness. We adjust  iteratively until the time-averaged 230 GHz flux densities of the models are within a few percent of the 2.4 Jy mean observed during the 2017 campaign. Notice that, in this work, model parameters are always varied at constant time-averaged millimeter flux density.
Radiative Transfer Calculation.-Given an eDF, density scale , inclination i, and radiative transfer coefficients based on local properties of the plasma, the emergent radiation is obtained by integrating the radiative transfer equation. We use two classes of numerical methods: observer-to-emitter raytracing to generate synthetic images (ipole, Mościbrodzka & Gammie 2018;BHOSS, Younsi et al. 2012), and emitter-toobserver Monte Carlo to generate spectral energy distributions (SEDs, using grmonty; Dolence et al. 2009).
All radiative transport calculations are carried out using the fast-light approximation, in which plasma variables are read from a GRMHD output file at constant Kerr−Schild time and are assumed not to change during ray-tracing. Including lighttravel time effects in the model introduces minor changes to light curves and images (Dexter et al. 2010;Mościbrodzka et al. 2021). Further detail on numerical methods is given in Appendix A.1. Comparisons of radiative transfer codes (Gold et al. 2020;B. Prather et al. 2022, in preparation) show that differences between codes do not contribute substantially to the error budget.
SEDs are produced for narrow bins in inclination angle. At each inclination, the SED is averaged over azimuth. The SED includes synchrotron, bremsstrahlung, and Compton scattering.
We find that 2.2 μm emission is usually dominated by synchrotron, but occasionally 2.2 μm synchrotron is so weak that Compton scattering dominates. We also find that the X-ray can be dominated by either Compton scattering or bremsstrahlung, with the latter dominating in models with a large population of cold electrons at large radius. Figures 3 and 4 show examples of model images and multiwavelength SEDs from our library.
The GRMHD simulation-derived temperatures are unreliable in regions where σ ≡ B 2 /(8πρc 2 ) is large because truncation error in integration of the total energy equation produces large fractional errors in temperature. All radiative transfer models therefore set the emissivity, absorptivity, and inverse Compton scattering cross sections to 0 for the regions with σ > σ cut = 1.

Summary of Sgr A * Model Library
A summary of radiative transfer calculations is given in Table 2. The entire image library contains six simulation sets, ∼1.8 million images at each of 86 GHz, 230 GHz, and 2.2 μm, and ∼1.3 million SEDs. The images and SEDs together occupy about 50 TB.
We refer to the thermal, R high models as "fiducial" models and the remainder as "exploratory" models that test the effect of incorporating changes in the eDF or initial conditions. Nearly all exploratory models (exceptions are described in the discussion) are imaged over 5 × 10 3 GM/c 3 , in comparison to 10 4 GM/c 3 for the fiducial models. The sampling noise in the exploratory models is therefore larger than in the fiducial models, and thus they cannot be tested as rigorously.
The library contains multiple, redundant models for the fiducial models and variable-κ models. This provides some control over the systematic uncertainties associated with variations in GRMHD simulation setup and algorithms.

Observational Constraints
Sgr A * is one of the most frequently observed objects on the sky: it has been observed with a slew of telescopes over five decades in time and 17 decades in electromagnetic frequency. We must select a manageable subset of these data to constrain our models. In doing so, we have attempted to identify constraints (i) that are believed to be uncorrelated, so that each tests a distinct aspect of the model; (ii) that use data that can be simulated with the models; (iii) that are based on EHT 2017 230 GHz VLBI data or that are based on emission produced within or close to the 230 GHz emitting region; and (iv) that are observed contemporaneously or near contemporaneously with the EHT 2017 campaign.
The selected constraints are described in detail below. In brief, the 11 constraints can be divided into three classes. The first class uses EHT data and compares estimates of source size, morphology of the visibility amplitude (VA) distribution, and three parameters of the best-fit m-ring image model (five constraints). The second class uses non-EHT data, including 86 GHz flux density and source size, the median 2.2 μm flux density, and the X-ray luminosity (four constraints). The third class considers variability, including the 230 GHz sourceintegrated variability and the VA variability based on EHT data (two constraints).
The selected constraints are heterogeneous, and it is not yet possible to combine them in a consistent, fully satisfactory way. Indeed, uncertainties in the data and the models are not well enough understood to make that possible. In this first analysis we set a pass/fail criterion for each constraint and consider the implications of various combinations of constraints.
As the number of constraints increases, so does the probability of wrongly rejecting a model. Consider a set of N constraints, and for each assign a probability p that the model is consistent with the data. The model is rejected if p < p c . Then, the probability that the model is wrongly rejected by a single constraint is p c . Applying all N constraints, the probability that the model is wrongly rejected is ( ) -p 1 1 ; c N for N = 11 and p c = 0.01, this is ≈ Np c ; 10%. Each of N constraints must therefore be able to reject a model with probability = 1/N, or the model scoring is meaningless.
The confidence with which a model can be evaluated is limited by sampling noise. Many constraints (e.g., 86 GHz flux density) compare an observation to a distribution of synthetic observations from a model. Time series of synthetic observations are not yet well characterized, but most have a correlation time τ ∼ few × 100 GM/c 3 . If the model decorrelates on timescales longer than τ, then a model of duration T yields ∼ T/τ independent samples, 156 and thus a fractional error in moments of the distribution ∼ (T/τ) −1/2 = 0.18(T/15,000) −1/2 (τ/500) 1/2 . Increasing the number of constraints, then, requires increasing the duration of the GRMHD simulations.
Evidently the models have significant sampling noise, which we control for in part by using three redundant fiducial models. Nevertheless, one should not attach too much significance to the success or failure of individual models.

EHT Observational Constraints
We test the models against EHT interferometric data in three ways. First, we compare an estimate of the source size ("second moment") against an estimate based on short-baseline VAs. Second, we check the location of the first minimum and the longbaseline values of the VAs ("VA morphology"). Finally, using a variant of a procedure from Paper IV, we compare fits for the diameter, width, and asymmetry of an m-ring (a parameterized image-plane model, "m-ring constraints") to distributions based on synthetic data generated from the model library.

230 GHz VLBI Pre-image Size
The source size can be characterized using the second moments of the source image on the sky. The second moments in the image domain map to second derivatives of the visibilities near zero baseline in the (u, v) domain, so shortbaseline VAs can be used to directly estimate the source size.
This procedure is used in Paper II to set an upper limit of 95 μas FWHM and lower limit of 38 μas FWHM for the second moment along a direction through the source corresponding to the orientation of the short baselines (SMT-LMT and ALMA-LMT). This is done without any assumption about the structure of the source and is therefore quite permissive.
These limits do not include scattering. The scattering kernel is estimated to have 16.2 μas FWHM along the relevant EHT baselines. To descatter the sky image size, we subtract this value in quadrature, which produces a scattering-corrected 93.6 μas FWHM upper limit and 34.4 μas FWHM lower limit.
To score a model, we evaluate the second-moment tensor for each simulated 230 GHz image and find its eigenvalues , where λ maj and l min are the major-and minor-axis FWHM, respectively. The image is deemed compliant if there exists any position angle (PA) for which the second moment would satisfy the size constraints, i.e., it is compliant if for any λ such that l l l   min maj , λ lies between the scattering-corrected upper and lower limits. We reject models with compliance fraction <0.01.

230 GHz VLBI Visibility Amplitude Morphology
The second constraint provides a morphological check on the VAs. We ask two questions of each model snapshot: (i) is the first minimum in the visibilities-"the null"-at about the right place, and (ii) are the long-baseline VAs comparable to the data? The null locations and long-baseline amplitudes are sensitive to the source structure. For example, if the source is a simple, circularly symmetric ring of finite width, then the location of the first minimum depends only on the ring diameter, while the VAs on long baselines depend mainly on ring width. GRMHD models are more complicated, with fluctuations in the null locations and long-baseline amplitudes (e.g., Medeiros et al. 2018; M87 * Paper V).
We compare with data from April 7, which have the best (u, v) coverage near the minima in the VAs. The first visibility minimum in both the N-S and E-W directions in the data always occurs between 2.5 and 3.5 Gλ (see Paper II, for details). For the long-baseline interval between 6 and 8 Gλ in the data, the VAs have 4% of the zero-baseline flux. One complication when comparing models to data on long baselines is the effect of interstellar scattering. Diffractive scattering effectively convolves the image with a smooth kernel and can reduce the amplitudes to ∼50% of their descattered values in the 6-8 Gλ range; refractive scattering, on the other hand, introduces noise at all baselines of order 0.5%-3%, depending on the characteristics of the scattering screen Psaltis et al. 2018).
To apply this constraint, we compute the VA of each model snapshot along PAs of 0°, 45°, 90°, and 135°(because of Hermitian symmetry we need only consider PAs in the 0°-180°r ange). We find the first minimum numerically and compute the median VAs between 6 and 8 Gλ. We classify a snapshot as compliant if (i) for at least one PA the first minimum falls between 2.5 and 3.5 Gλ and (ii) at no PA do the median VAs exceed 4%/50% = 0.08 of the zero-baseline flux. We reject models with compliance fraction <1%.

230 GHz M-ring Fitting
Following Paper IV, we fit an m-ring image-plane model to snapshots from EHT data and from simulated data and then compare the distributions of fit parameters.
The m-ring is a δ function in radius with diameter d multiplied by a truncated (up to m = 3; notice that Paper IV truncates at m = 4) Fourier series, convolved with a Gaussian of width w. The model also contains a centered Gaussian component, with amplitude and width as free parameters, to absorb large-scale emission and emission interior to the ring. 157 The m-ring model has 10 parameters. 158 We use three of the parameters that are well constrained and physically interpretable: the m-ring diameter d, the m-ring width w (FWHM of the convolving Gaussian), and the m = 1 relative amplitude β 1 (the 156 In what follows we must sometimes estimate how many independent samples are available in a time series. Rather than estimating τ model by model, we uniformly assume τ ; 500GM/c 3 . The analysis is insensitive to this choice. "asymmetry"). For more details about the m-ring model see Section 4.3 of Paper IV.
We fit the m-ring independently to snapshots consisting of 2-minute intervals of EHT data (this averaging interval is consistent with that used in Paper IV). Over these short intervals, we approximate the source as static. Uncertainties in the fitted m-ring parameters are dominated by the limited baseline coverage during these snapshots rather than by calibration uncertainties or thermal noise. Because snapshots that are close in time sample nearly identical baselines, they do not provide additional model constraints.
To compare fitted m-ring parameters from EHT data to those from synthetic data, we select a subset of ten 120 s scans that have detections on more than ten baselines and integration times at all stations > 40 s. The selected scans are as widely separated in time as possible so that they sample distinct baseline coverage, with an average separation of ; 1240 s ; 60 GM/c 3 , which is small compared to the VA correlation time in the models . Note that the selected scans overlap with those found in Farah et al. (2022). Only small changes in model selection were observed if any one scan was removed from the comparison. The data were descattered before fitting, that is, the VAs were divided by the scattering kernel.
The maximum likelihood m-ring parameters for the 10 selected EHT scans are listed in Table 3. Evidently the fit parameters are noisy. The fits for d range from 39 to 84 μas, for w from 9 to 21 μas, and for β 1 from 0.04 to 0.48. The variation in fit parameters could be caused by source variability, thermal noise, or gain variations. In the models the main driver of fit variations is source variability.
For the models, we read in a series of model images, generate synthetic data for each image for each scan at four PAs (0°, 45°, 90°, 135°), and fit m-rings to the synthetic data. This produces a distribution of m-ring parameters for each model.
The synthetic data are generated as follows. A model image I (x, y) is Fourier transformed to complex visibilities V(u, v) with an assumed PA and then sampled on baselines i drawn from the comparison scan, V i ≡ V(u i , v i ). Normally distributed thermal noise δV th,i with amplitude based on telescope performance during the scan is added, and multiplicative, normally distributed noise with unit variance N is added to crudely model gain corrections:˜( . We set ò = 0.05, but no substantial changes in fit parameters were observed for ò = 0.02. We then fit to the VAs |˜| V i and closure phases. 159 We sample the model images once per 500 GM/c 3 , which is comparable to a correlation time. A model with a 15,000 GM/c 3 imaging window thus produces 30 fits per scan per PA. In comparing the models to the data, we (i) generate the distribution of fit parameters at each PA; (ii) use a Kolmogorov-Smirnov (K-S) test to compare the distribution of ∼300 synthetic data fits with the distribution of 10 observational fits, and obtain a p-value (what is the probability they are drawn from the same underlying distribution?); (iii) average the p-values over the four sampled PAs (i.e., marginalize over PA; the models do not show a significant PA preference); and (iv) reject the model if p < 0.01.

Non-EHT Constraints
In addition to the EHT data, the SED of Sgr A * is well constrained in Paper II and thus potentially useful for model selection. We limit comparison to three bands: 86 GHz, 2.2 μm, and X-ray.

86 GHz Flux
The Global Millimeter VLBI Array (GMVA) observed Sgr A * on 2017 April 3, just 3 days ( ; 13,000 GM/c 3 ) before the EHT campaign. Issaoun et al. (2019) estimate that the compact flux during this observation was F 86 = 2.0 ± 0.2 Jy (2σ errors). Notes. Summary of the EHT Sgr A * GRMHD simulation library. The last column is N 1 × N 2 × N 3 , with coordinate x 1 monotonic in radius, x 2 monotonic in colatitude θ, and x 3 proportional to longitude f. The first four entries use aligned torus initial conditions. The last two entries are tilted accretion models and two realizations of To test the models, we compute a library of 86 GHz images for all GRMHD snapshots for all models and integrate over them to obtain F 86 . We assume normally distributed measurement errors with σ = 0.1 Jy and convolve the F 86 distribution for each model with the resulting Gaussian. We reject models where the value of the error-broadened  (95% confidence; Issaoun et al. 2021). We adopt the latter analysis.
We compute the major-axis FWHM for each image in the 86 GHz image library. We assume normally distributed errors with σ = 6 μas and convolve the model major-axis distribution with the normal distribution. We reject models with error-broadened CDF < 1% or >99% at 146 μas.
Our synthetic 86 GHz images have a 800 μas field of view. A 200 μas field of view cuts off enough emission that the major axis is biased downward in many models by ∼ 20%. Increasing the field of view beyond 800 μas has negligible effect.

2.2 μm Median Flux Density Constraint
Sgr A * has a quiescent and a flaring component in the nearinfrared (NIR), with flares occurring a few times per day (1 day ; 4, 200 GM/c 3 ; Witzel et al. 2018). Since there is as yet no generally accepted model for NIR flares, we accept models that do not produce flares (indeed, none of our models reliably produce flares, even those with nonthermal eDFs). Our working hypothesis is that the models can be made to produce flares by introducing a process that accelerates a small fraction of electrons into an NIR-bright tail of the eDF. If the model overproduces quiescent 2.2 μm emission, however, then we reject it.
Sgr A * had a median 2.2 μm flux = 0.8 ± 0.3 mJy in 2017 (Gravity Collaboration et al. 2020a; see Table 1). The median flux density likely overestimates the median quiescent flux density since it includes flares.
We compute the model median 2.2 μm flux density using one of two procedures. If a full SED-which includes Compton scattering-is available, then we use it. The SEDs are generated by the grmonty Monte Carlo code Wong et al. 2022). If a full SED is not available (see Table 2), then we compute a 2.2 μm image that includes only synchrotron emission (synchrotron absorption is negligible at 2.2 μm for Sgr A * ).
A rigorous model evaluation procedure would correct for the upward bias in median quiescent flux density from flares and allow for errors in the model and observed median flux density, but these refinements are sufficiently uncertain that, instead, we set a conservative threshold of 1.0 mJy and reject the model if its median 2.2 μm flux density exceeds the threshold.

X-Ray Luminosity Constraints
Sgr A * flares in the X-ray less than about once per day (see Yuan et al. 2018, and references therein). Chandra observations during the 2017 campaign suggest a conservative upper limit on the median (quiescent) νL ν at 6 keV of 10 33 erg s −1 (Paper II).
As for the model 2.2 μm flux density, we estimate | n n n= L h 6 keV in two ways. The SED, which incorporates Compton scattering and bremsstrahlung, is used if it is available. If the SED is not available, then we compute an X-ray image that includes only bremsstrahlung (which dominates the X-ray emission in thermal SANE models with R high = 40,160) enabling us to eliminate a few additional models.
We reject the model if its median | n > n n= L h 6 keV -10 erg s 33 1 .

Variability
Sgr A * shows variability on a wide range of timescales. This is expected: fluctuations in stellar wind feeding at the scale of the S stars plausibly introduce long-timescale variations, while turbulence at smaller radii, down to the scale of the event horizon, introduces a spectrum of shorter-timescale variations. Quantitative comparison of observed variability to the models is therefore a potentially powerful tool for model selection.
We consider two variability measures: one characterizes variability in the 230 GHz light curves (Wielgus et al. 2022), and a second characterizes variability of VAs in EHT data (Paper IV; Broderick et al. 2022).

230 GHz Light Curves
We compare variability in the models to light-curve observations of Sgr A * from 2005 to 2017 using the 3 hr modulation index M 3 , where M ΔT ≡ σ ΔT /μ ΔT , σ ΔT is the standard deviation measured over an interval ΔT (in hours), and μ ΔT is the mean measured over the same interval.
Following Chan et al. (2015a), we use M ΔT because it is easy to describe, easy to compute, commonly used in the literature (in the X-ray astronomy literature it is "rms %"), and closely related to the structure function, since the expectation value for s T 2 is given by an integral over the structure function (see D. Lee et al. 2022, in preparation).
We use ΔT = 3 hr ( ∼ 530 GM/c 3 ) because it is long enough to be comparable to the characteristic timescale measured in damped random walk fits to the ALMA light curve (see Table  10 of Wielgus et al. 2022) but short enough that the model light curves provide a sample that is large enough to be constraining. In extracting a sample of M 3 from the light curves, we use as many 3 hr segments as possible, equally spaced away from the light-curve endpoints and each other, and calculate M 3 on each segment. We treat consecutive measurements of M 3 as independent, consistent with the minimal correlation expected for a damped random walk (D. Lee et al. 2022, in preparation).
We must select an observed distribution of M 3 . The April 7 data alone provide only a weak constraint because there are only three samples. The M 3 measured from EHT 2017 observations on April 5-11 provide seven samples, while the M 3 measured from all available light curves longer than 3 hr, including earlier SMA and CARMA data (the "historical distribution"; see Wielgus et al. 2022), yield 42 samples. The 2017 distribution is consistent with being drawn from the historical distribution, although April 6 has one of the quietest segments on record, and April 11 one of the most variable. We selected the historical distribution and note that the 2017 distribution rejects slightly more models but leads to identical conclusions.
For each model we use a two-sample K-S test to estimate the probability p that the model and observed M 3 values are drawn from the same underlying distribution. We reject the model if p < 0.01.
Through the K-S test, the strength of the M 3 constraint depends on the number of data and model samples. The fiducial  models have duration 10 4 or 1.5 × 10 4 GM/c 3 (18 or 28 samples), whereas most exploratory models have duration 5 × 10 3 GM/c 3 (nine samples). The M 3 constraint is therefore weaker for the exploratory models: an exploratory model that passes the constraint may be more variable than a fiducial model that fails.

EHT Structural Variability
Fluctuations in the spatial structure of the source produce fluctuations in the VAs. Here we compare the power spectrum of structural variability from EHT observations with predictions from GRMHD models.
A nonparametric technique to measure the variance of the spatially detrended VAs at a location in the (u, v)-plane is described in Broderick et al. (2022) and briefly summarized here. We use EHT observations of Sgr A * from April 5, 6, 7, and 10 (April 11 was excluded). To remove correlations associated with variations in the total flux, we normalize the VA data with the contemporaneous intrasite light curve . The light-curve-normalized VAs are then linearly detrended, and variances are computed and azimuthally averaged ). The resulting is a measure of the fractional structural variability as a function of baseline length |u|. The (| |) s u var 2 is included in an inflated error budget when making images of and fitting models to the 2017 EHT observations of Sgr A * (Paper III).
We measured this quantity from the GRMHD simulations (see Georgiev et al. 2022, for details). For all simulations reported here, s var 2 is well approximated by a broken power law with parameters that are nearly universal among simulations. The s var 2 is measured over a 4-day period, which is longer than the typical model duration. We therefore expect that model values will be biased downward compared to the data. Furthermore, each GRMHD simulation can only give one draw from a distribution that is broader than if the simulation spanned 4 days. This secondary effect negates the downward bias, which is further unimportant, as we do not exclude models for being not variable enough. To measure the larger broadness of the distribution, we use multiple simulations with the same parameters and subdivide the analysis of long simulations into windows. The uncertainties in the measurement from the GRMHD simulations due to simulation resolution, the fast-light approximation, and code differences are small compared to the uncertainty due to the variability of s var 2 due to short simulations ). The measured s var 2 is well characterized by a power law for 2 Gλ < |u| < 6 Gλ . For comparison with the models presented here, we distill the s var 2 to two numbers: the amplitude a 4 2 at 4 Gλ and a power-law index b. Because the normalization is done in the center of the fit range, the estimated ( ) = - a log 3.4 0.1 10 4 2 and b = 2.4 ± 0.8 are essentially uncorrelated.
Model predictions for a 4 2 and b are computed using the power spectral densities from Georgiev et al. (2022). 160 The anisotropic diffractive scattering kernel from Johnson et al. (2018) is applied to (| |) s u var 2 and averaged over relative orientations of the major axis of the scattering kernel and Note. Summary of the EHT Sgr A * model library. All models are imaged at 86 GHz, 230 GHz, and 2.2 μm, and some (Column (5)) also have SEDs. For the wind-fed accretion model the viewing angle is set by the stellar orbits and R high is set so the model matches the observed 230 GHz flux; R high = 13 and 28 for models with weak and strong stellar wind magnetizations, respectively ). 160 Georgiev et al. (2022) gives the power spectral density of the complex visibility,ˆ(| |) á ñ P u , rather than the VA, and thusŝ = á ñ P 2 var 2 . 11 the black hole spin. These estimates are then azimuthally averaged, and the parameters a 4 2 and b are determined from a least-squares linear fit to (| |) s u var 2 in 2 Gλ < |u| < 6 Gλ. For each model the fits for a 4 2 and b are done separately on each window of length 5 × 10 3 GM/c 3 , giving at most three measurements for most models. This makes a direct comparison with the measured value difficult, as the model distribution is poorly constrained. Georgiev et al. (2022) estimates that the typical width of a model distribution is We can obtain a rough estimate for how the models fare compared to the measurement by taking the mean across windows, assuming that the width of the distribution is σ = 0.1, and comparing this with the observed distribution under the assumption that both are distributed normally. We reject models with error-broadened CDF <1% or >99% at ( ) =a log 3.4 10 4 2 .

Fiducial Models
We start with the fiducial models. Recall that these have aligned (prograde or retrograde) accretion flows, thermal eDFs, and electron temperature assigned according to the R high model, as in M87 * Paper V, and include the KHARMA, BHAC, and H-AMR model sets listed at the top of Table 2.
A set of plots showing how the three, redundant fiducial model sets fare for each constraint is provided in Appendix C. Table 4 summarizes the fraction of fiducial KHARMA, BHAC, and H-AMR models that pass each constraint.

EHT Constraints
Second Moment.-Without assuming a ring, the EHT data allow a wide range of second moments. The second moment constraint passes 98% of all models. Here and in what follows, the quoted passing fraction for the model describes the fraction of points in parameter space for which the existing model sets (KHARMA, BHAC, and when present H-AMR) agree that the model passes the constraint. In short, nearly all fiducial models are about the right size once we use the 230 GHz to fix the mass unit . The few rejected models are a * 0, face-on, SANE models with R high = 1. These models have extended emission on scales large compared to the critical impact parameter = b GM c 27 c 2 . The right panel of Figure 5 shows an example of one of these failed models. The left panel shows an example of a passing model.
Visibility Amplitude Morphology.-The VA morphology constraint tests the null location and long-baseline VAs. Figure 6 shows an example of a passing and failing model. The constraint disfavors edge-on models at positive spin and a few large-R high SANE models. This is mainly because the edge-on models contain bright spots, corresponding to the approaching side of the rotating accretion flow, and faint rings, so the first nulls get washed out by the bright features. The VA morphology constraint passes 79% of all models.
M-ring Fits.-The m-ring asymmetry, diameter, and width are treated as separate constraints. Recall that we compare the distribution from the data to that from the model using a twosample K-S test.
The asymmetry parameter is typically not well constrained. Many rejected models are at high inclination and have a * = 0.94. These models have asymmetries that are large and detectable because Doppler boosting concentrates emission in an equatorial spot on the approaching side of the disk. The asymmetry parameter constraint passes 91% of all models.
The m-ring diameter, which depends on the diameter of the shadow and the ring width, is better constrained than the asymmetry parameter and varies systematically from model to model. The ring diameter constraint passes 54% of all models.
Most of the models that fail are low-inclination models with ring diameters that are too large. Only two BHAC models fail because the ring diameter is too small. Most of the rejected models are low-inclination models at a * < 0.
The m-ring width w is the most tightly constrained of the three m-ring parameters. Although the closure phases constrain w as well, it is easiest to see how w affects VAs at long baselines. For example, for a circularly symmetric ring the VAs are a Bessel function multiplied by a Gaussian with width ∼ 1/ w. Increasing w therefore decreases the amplitude of the long baselines. Figure 7 shows examples of models that pass and fail the m-ring width constraint. Figure 8 summarizes the pass/fail status of the fiducial models for the m-ring width. All rejected models have median w that is below the median of the data, ; 17.5 μas. The rejected models include all MAD models at a * 0 and all edge-on (i = 90°) models in the KHARMA, BHAC, and H-AMR fiducial models. MAD models exhibit a strong trend toward smaller w as i increases. SANE models exhibit a similar but weaker trend. The SANE model images have higher optical depth, broader rings, and more substructure than the MAD models. Their w distributions are typically broad, with mode well below 17.5 μas. Only for a * = 0.94, where the optical depth is lower owing to higher temperatures in the emitting region, do most of the models exhibit a sharply peaked w distribution centered at 17.5 μas.
EHT Constraint Summary.-We can combine all EHT constraint cuts with a logical and operation. The results are summarized in Figure 9. Evidently EHT data alone are capable of discriminating between models. The edge-on (i = 90°) models all fail, with some failing m-ring width, diameter, asymmetry, and the VA morphology constraint. The cuts clearly favor a * > 0 models, with a few exceptions. There are two clusters of models that do not fail any constraints in any models: positive-spin MAD models at low inclination, and positive-spin SANE models, also at low inclination.

Non-EHT Constraints
86 GHz Flux Density.-In a simplified picture Sgr A * 's millimeter flux is produced in a photosphere that decreases in size as frequency increases. Because optical depth is not large at 230 GHz (∼0.4 in the one-zone model) and the source structure is complicated (the optical depth varies across the image), the simplified picture is imprecise. Nevertheless, 86 GHz emission is on average produced at larger radius than 230 GHz emission, and the 86 GHz source size is larger than the 230 GHz source size. The ratio of 86 GHz to 230 GHz flux density is therefore sensitive to the radial structure of the source plasma. Figure 28 records the results of applying this constraint. Most R high = 1 models, both MAD and SANE, fail the 86 GHz flux density test. The 86 GHz flux density is quite sensitive to R high . For example, SANE a * = 0.5, i = 70°, 90°models are too bright at R high = 1 and too dim at R high = 10. This suggests that there are passing models in between, and that the parameter space is not sampled densely enough. Finally, the 86 GHz flux constraint strongly favors MAD models over SANE models in all three fiducial model sets.
86 GHz Major Axis.-As for the 86 GHz flux, the 86 GHz size is sensitive to optical depth as a function of radius in the source plasma. Figure 29 in Appendix C shows the full results of applying this constraint.
The 86 GHz size is sensitive to inclination. For example, the SANE, a * = 0, R high = 40 models are too small at low inclination and too large when seen edge-on, because the edge-on models have prominent limb-brightened jet walls that are visible to 100 μas. The 86 GHz size constraint passes only 58% of models and is therefore one of the tightest constraints.
The physical picture for 86 GHz source size is complicated, as is the extraction of the constraint itself from observations. Notice that (i) two different values for the 86 GHz intrinsic source size have been reported in the literature (see Section 3.2.2), (ii) scattering is 7 times stronger at 86 GHz than at 230 GHz, (iii) scattering must be subtracted accurately to obtain the intrinsic source size, and (iv) the error bars for the 86 GHz source size are narrow and this plays a key role in determining the strength of the constraint.
2.2 μm Median Flux Density.-2.2 μm photons are produced by the synchrotron process from electrons on the high-energy end of the eDF. For the one-zone model with B = 30 G and Θ e = 10, the mean Lorentz factor is γ = 30 and the synchrotron critical frequency ν crit = γ 2 eB/(2πm e c) ; 80 GHz. Emission at 2.2 μm is produced by electrons with Lorentz factor γ ; 10 3 , so 2.2 μm flux density is sensitive to Θ e and B. Both increase toward the horizon, and field strength is nearly independent of latitude, so 2.2 μm photons are produced at small radius in regions where Θ e is highest.
The sensitivity to Θ e implies that 2.2 μm flux density will be highest for models with higher temperatures. For SANEs the midplane gas temperature, and therefore electron temperature in the R high prescription, increases with a * , so the highest 2.2 μm flux density is at positive a * .
The sensitivity to B implies that 2.2 μm flux density will be highest for parameters with stronger fields. B depends on the GRMHD flow configuration and also on the accretion rate, which is fixed by the observed F 230 , so when all else is equal the 2.2 μm flux density is highest when the accretion rate is largest. The dependence of accretion rate on model parameters is discussed in Section 5.5. In brief, for SANE models the accretion rate declines as a * increases and R high decreases. For MAD models the accretion rate dependence on a * and R high is relatively weak.
Finally, the 2.2 μm flux density is also sensitive to inclination. A combination of Doppler boosting and the rapid falloff in emissivity in the NIR means that at large inclination lower-frequency emission from the approaching side of the accretion flow is boosted into the NIR, and thus 2.2 μm flux is higher at high inclination. Figure 10 shows sample SEDs from our model library, where the left panel is a model that passes the 2.2 μm flux limit and the right panel is a model that fails. Models that pass the 2.2 μm flux limit are shown in Appendix C in Figure 30. The rejected SANE models (7% rejected by all of KHARMA, BHAC, and H-AMR) tend to be at high inclination: their images are dominated by a bright spot on the approaching side of the disk. The rejected MAD models (53%) include nearly all models at R high = 1 and R high = 10, where Θ e tends to be larger, and the majority of high-inclination models, where the effect of Doppler boosting is largest.
We find that some models are Compton dominated at 2.2 μm. For example, a * = − 0.94 SANE models become optically thin at relatively low frequency as R high goes to 1, and thus synchrotron emission drops off rapidly as frequency increases. When the synchrotron is weak enough, the underlying bump of Comptonized millimeter photons dominates.
X-ray Luminosity.-X-ray production in fiducial models is typically dominated by Compton upscattering of thermal synchrotron photons. In the first Compton bump νL ν is thus proportional to the y-parameter t Q y 16 e e 2 , where τ e is a characteristic electron-scattering optical depth and Θ e is a typical dimensionless electron temperature. At R high = 1 the X-ray band lies in the first Compton bump, while at larger R high the bumps move to lower energy because the bulk of the Thomson depth is in the midplane where Θ e ∝ 1/R high .
We find that in a few large-R high SANE models, however, X-ray emission is dominated by bremsstrahlung (synchrotron never dominates the X-ray in thermal models). Bremsstrahlung emissivity j ν,b ∝ n 2 , so at fixed temperature bremsstrahlung increases rapidly with density. Notice that µ Q n j b e , 1 2 for Θ e > 1 and Qe 1 2 for Θ e < 1, so cool disks enhance bremsstrahlung. Bremsstrahlung therefore dominates Compton in models with high density and low temperature, i.e., some models with large R high (see Section 5).
In models with bremsstrahlung-dominated X-ray emission the median radius of emission is ;20GM/c 2 . Although the models are equilibrated at this radius, the X-ray luminosity may be partially contaminated by emission from unequilibrated plasma at larger radii. Because the fiducial models start with a torus of finite radial extent, however, they are also missing bremsstrahlung emission from outside the initial torus. A full assessment of the associated uncertainty requires large, long runs. Notice that because bremsstrahlung arises at large radii, it varies more slowly than the synchrotron and Comptonupscattered X-ray emission and is therefore potentially distinguishable (Neilsen et al. 2013).
The left panel of Figure 10 shows a model that passes the X-ray flux limit, while the right panel shows a model that fails. The X-ray cuts are shown in Appendix C, Figure 31. Some large-R high SANE models fail owing to excess bremsstrahlung, although there is notable disagreement between BHAC and Note. The m-ring fits to selected 120 s scans from April 7. Column (2) gives UTC in hours for the observation. Columns (3)- (5) give best-fit parameters for the m-ring diameter, width, and asymmetry parameter, respectively.
KHARMA for SANE X-ray fluxes. MAD models that fail have small R high and are Compton dominated in the X-ray. Nearly all R high = 1 MAD models fail the X-ray constraint, as do many at R high = 10. This is because the midplane Θ e increases as R high goes to 1. Since the midplane contributes most of the electronscattering optical depth, small-R high models have the largest yparameter and are at greatest risk of overproducing X-rays. Summary of Non-EHT Constraints.-Applying only non-EHT constraints leaves 6% of models as shown in Figure 11. The surviving models are the result of applying a heterogeneous and noisy set of constraints using a hard cutoff, which somewhat obscures the underlying physical picture. Nevertheless, the surviving 13 models are all MAD and all have R high > 10. All but two have i < 70°. This leaves a cluster of surviving MAD models at large R high and low to moderate inclination.

Variability
Variability is central to the interpretation of EHT observations of Sgr A * : an 8 hr observation of Sgr A * lasts 1400 GM/c 3 , a timescale over which most models vary substantially. In contrast, an 8 hr observation of M87 * is ∼ GM/c 3 , and on this timescale M87 * hardly varies at all.
Recall that we consider two variability constraints, one on the 230 GHz light curve and the other on 230 GHz VAs. We find that SANE models are less variable than MAD models. Only 3.5% of models, all SANE, pass both variability constraints. A possible interpretation of this result is that the models are missing a physical ingredient that would reduce variability, and this is discussed in Section 5.
Modulation Index.-The distributions of 3 hr modulation index (M 3 ) across all fiducial SANE models, across all fiducial MAD models, and across the historical data set are shown in Figure 12. The plot also shows distributions for individual models with the lowest and highest median M 3 .
The M 3 cuts are summarized in Appendix C, Figure 34. We find that (i) as a group, the fiducial models are more variable than the data; (ii) the MAD models are more variable than SANE models; (iii) 11 individual models pass the constraint for all fiducial model sets, and these are exclusively SANE models; (iv) there are some differences between variability in the fiducial model sets, with H-AMR models notably more variable than KHARMA and BHAC models; and (v) the pass fractions for the fiducial model sets are 20% for KHARMA, 27% for BHAC, and 7% for H-AMR. The modulation index is the tightest single constraint on the models. at 2-6 Gλ of the models is generally in good agreement with the value measured from the 2017 EHT campaign (excluding April 11). The amplitude a 4 2 , however, varies depending on the model. Figure 13 shows the distribution of ( ) a b , 4 2 from the EHT observation, along with the distributions across all fiducial models. For a single model, the number of measurements of ( ) a b , 4 2 is equal to the number of windows for that model (three in most cases). The koral models appear more variable because they include only R high = 20 MAD models at various spins.
The models tend to be more variable than the observations, with face-on models performing better than edge-on models. For SANE models, R high = 10 tends to be more variable than others. For MAD models, there is a slight preference for lower R high .
Long-duration koral Models.-We have imaged a set of MAD models run with the koral code out to ∼ 100, 000 GM/c 3 . These long-duration models have R high = 20, which lies off our fiducial model parameter grid. They enable us to assess the importance of integration time for application of the constraints and provide a more accurate distribution for, e.g., M 3 .
The koral models are discussed in Appendix B. In brief, we find no evidence for significantly different variability when comparing the first and second half of the koral runs, consistent with no long-term evolution of the variability. We also find no significant differences when comparing the koral runs to nearby models on the fiducial model parameter grid. Notice that in Figure 13 the koral models are more variable than the other model sets only because the other model sets contain lower-variability SANE models.

Summary of Constraints on Fiducial Models
None of the fiducial models survive the full gauntlet of constraints. The pass fractions for individual constraints for the BHAC, KHARMA, and H-AMR fiducial models are listed in Table 4. M 3 is the most severe constraint, followed by the m-ring width constraint. Together the variability constraints pass only 4% of fiducial models and prefer SANEs, which are less variable than MADs, while the remaining constraints prefer MAD models.
It is likely that the models are physically incomplete. It is also possible, however, that one of the constraints is measured incorrectly, that one of the constraints is applied incorrectly, or that one of the constraints is poorly predicted for numerical reasons. To investigate this, we identify all models that fail only one constraint in Table 5. We find that the critical constraints are 86 GHz size, m-ring diameter, and M 3 . Notice that there is overlap between KHARMA and BHAC in MAD models that fail the M 3 constraint. The H-AMR models fare significantly worse than the KHARMA and BHAC models in the M 3 constraint: only 7% of models pass. The remaining models all fail at least one additional constraint, leading to their exclusion from Table 5.

Exploratory Models
Next, we go beyond the fiducial models and consider the exploratory models, which include aligned models that use an alternative scheme for assigning temperatures to a thermal eDF, aligned models with a power-law component or κ component in the eDF, tilted models, and stellar-wind-fed models. Unless stated otherwise, exploratory models are imaged over only 5 × 10 3 GM/c 3 , yielding weaker constraints. In all cases we focus on how the exploratory models differ from the fiducial models.

Critical Beta Model
The R high prescription provides a convenient, one-parameter model for assigning electron temperatures, but here is a vast function space of possible alternative parameterizations. One well-motivated choice is the critical beta model (Anantua et al. 2020b), which sets T e = T e (R) and (11)). This "critical beta" model has two parameters, f and β c . We consider a single point in the parameter space: 14 f = 0.5, β c = 1. Compared to the R high temperature prescription, the main new characteristic of the critical beta models is that the electron-to-ion temperature ratio approaches 0 at high β instead of 1/R high .
We have run all tests except X-ray for the critical beta models. The 2.2 μm flux is calculated by imaging only and therefore does not include Compton scattering.
All critical beta models fail the non-EHT constraints, with the 86 GHz size constraint rejecting most models as too small. The variability constraints pass 23% of the models. No models survive the combined EHT and non-EHT constraints even if variability constraints are excluded. Notice that this does not imply that critical beta models are ruled out, since we have only tested a single point in the f, β crit parameter space.

Thermal Plus p = 4 Power-law Models
So far we have assumed a thermal eDF (Equation (11)). Fully kinetic simulations and resistive MHD predict that reconnection in current sheets within the accretion flow and in the jet sheath leads to the acceleration of particles to higher energies, resulting in the emergence of a power-law tail (e.g., Sironi et al. 2021, and references therein). Such acceleration events are thought to be the origin of NIR and X-ray flares detected in Sgr A * . Here we do not address flare mechanisms but seek to constrain the contribution of nonthermal electrons to the quiescent emission of Sgr A * .
Below we assume different forms of the eDF assuming that a fraction of the electron population is accelerated into a nonthermal tail. There are multiple ways of doing this, but we will continue to assume that the eDF depends instantaneously on local conditions and set the accretion rate so that the 230 GHz time-averaged compact flux is 2.4 Jy.
First, we consider a hybrid thermal/power-law distribution using H-AMR/BHOSS. Since we are modeling quiescent emission, we assume a steep power-law index of p = 4 with a constant nonthermal acceleration efficiency ò = n e, power-law /n e, thermal = 0.1, typical of PIC simulations (e.g., Sironi et al. 2015;Crumley et al. 2019). Following Chatterjee et al. (2021), the power-law tail is stitched to the thermal core by choosing the minimum Lorentz factor limit of the power law, g min , to be at the peak of the Maxwellian component. The upper end of the power law is set to g 10 5 min (see Equation (14)). The temperature of the thermal component is set by the R high prescription (Equation (13)). We find that the accretion rate is slightly smaller than for corresponding thermal models, consistent with a small contribution from the powerlaw component to the 230 GHz total intensity.
230 GHz VLBI Pre-image Size.-Hybrid thermal/powerlaw models have larger 230 GHz VLBI pre-image sizes compared to their purely thermal counterparts. This is because the power-law component of the eDF allows high-energy electrons in weak magnetic fields at distances more than a few gravitational radii (i.e., larger than the typical emission radius of the 230 GHz images) to contribute to the total image. However, the extension in the images is much smaller for MAD models, with most MAD images displaying an increase in size of <10%.
86 GHz Flux and Image Size.-In general, the R high = 1 models produce too much 86 GHz flux. Since the lower limit of the power law g min is directly affected by the local electron temperature, the highest-energy electrons are located in the jet sheath where T i ≈ T e . Indeed, this is why SANE models produce more 86 GHz flux when nonthermal electrons are introduced, especially at larger R high values. On the other hand, MAD thermal and mixed thermal/nonthermal models behave similarly, as the bulk of the emission is produced in the inner disk.
The 86 GHz image sizes for the hybrid H-AMR models are, on average, larger than their thermal-only counterparts, similar to the 230 GHz image sizes. The higher-energy electrons of a hybrid thermal/power-law population emit at higher frequencies than their thermal core, thereby extending the image size. This effect increases the image size of MAD models by only a few percent.
2.2 μm Constraint.-The addition of the power-law tail increases the flux at 2.2 μm, and thus the GRAVITY-based 2.2 μm median flux density threshold of 1.0 mJy provides a strong constraint on the power-law index and the acceleration efficiency. In brief, 59% of the power-law models, especially R high = 1 and 40 MAD models, are ruled out by the 2.2 μm constraint.
Summary.-Overall, H-AMR hybrid thermal/power-law models behave quite differently from their thermal counterparts. For the thermal models, both EHT and non-EHT constraints are equally successful in ruling out models, with 22% passing for each constraint set. For the power-law model set non-EHT constraints pass 39% of models while EHT constraints pass 10% of models. This disparity occurs for two reasons: (i) introducing nonthermal electrons pushes the 86 GHz image size to the acceptable range, as thermal models typically exhibit small image sizes; and (ii) the m-ring width is found to be smaller for the hybrid models. This could be due to a change in the gas density scaling that is required to match the 230 GHz flux. Nonthermal models require a smaller normalization value, meaning a smaller electron number density as compared to the corresponding thermal models. A decrease in the number density lowers the optical depth, leading to a thinner photon ring. For the initial 5000 GM/c 3 survey, two mid-inclination power-law models survive: a SANE a * = 0.94 model and a MAD a * = 0.5, R high = 1 model (see Table 6),     Next, we consider a model in which all electrons are in a κ eDF, which has a thermal core and a power-law tail. We set κ = 5 everywhere, motivated by Kunz et al. (2016), who found κ = 5 to be a good fit to the ion DF in a 3D hybrid simulation of MHD turbulence. A similar application of κ eDFs with fixed κ values for Sgr A * can be found in Davelaar et al. (2018). The power-law tail has p = κ − 1 = 4, and at high frequency νL ν ∼ ν s , where s = 2 − κ/2 = − 1/2. 161 We image BHAC GRMHD simulations from 25 to 30 kM using BHOSS (Younsi et al. 2012. The accretion rate required to obtain 2.4 Jy is smaller than for the thermal models. This implies that many of the κ = 5 models are optically thin at 230 GHz and show thinner rings than their thermal counterpart (see first and second rows in Figure 14).
230 GHz Size and Light-curve Variability.-We find that the κ = 5 models produce results that are generally consistent with the BHAC thermal models. Especially at 230 GHz we find similar passing fractions for the 230 GHz source sizes. A total of 92% of the κ = 5 models pass the size constraint, compared to 98% for the thermal models. This can be explained mainly by SANE models at small R high , which are larger than the thermal models. Variability is almost completely unaffected by the κ distribution. We find that 29% are in agreement with the M 3 constraint, compared to 27% for the thermal models. The κ = 5 models have a higher M 3 for a small number of SANE R high 40 models. However, since the M 3 constraint is computed for a time window of length only 5000 GM/c 3 , a factor of three shorter than for the thermal models, this increase does not increase the fraction of models ruled out by this constraint.
Visibility Amplitude Morphology.-The κ = 5 models are optically thinner than the corresponding thermal models and typically show a thin, bright ring feature. As a consequence, only 59% of the κ = 5 models pass the VA morphology constraint, while the passing fraction for the thermal models is 84%. Similarly, due to the change in optical depth, only 55% of the nonthermal models are in agreement with the VA morphology, in contrast to 72% of the thermal models.
M-ring Fits.-The m-ring constraints on diameter, width, and asymmetry are passed by 71%, 3%, and 73% of the κ = 5 models, respectively. Except for the diameter, all pass fractions are smaller than for the thermal models (65%, 21%, and 95% for diameter, width, and asymmetry, respectively). The slightly larger pass fraction for the diameter could be affected by the shorter time window used for the κ models as compared to the thermal ones. However, the low fraction for the m-ring width can be explained by the optical depth of the κ models. Most of the κ models are optically thinner than their thermal counterpart, which leads to a finer, brighter ring structure, and this is picked up by the m-ring fitting (see Figure 14).
86 GHz Source Size.-For MAD models the size of the κ = 5 models does not change. This can be explained by the fact that most of the emission is produced in the midplane. For the SANE models we find two different behaviors: the source size increases for R high < 40 and decreases for R high 40, especially for positive black hole spins and high inclinations (compare first and last panel in the bottom row of Figure 14). This change in size is consistent between the images at 230 GHz and at 86 GHz. The passing fraction for the κ = 5 models drops to 29% as compared to 59% for the thermal models.

2.4
230 GHz p 1 2 . This leads to an 86 GHz flux density ∼10 Jy, which is far above the 86 GHz flux constraint of 2 ± 0.2 Jy. Consequently, the passing fraction for the κ = 5 models drops to 12%, compared to 68% for the thermal models. Figure 9. Combined EHT constraints (logical and) including the second moment, VA morphology, and m-ring fit constraints. Green indicates that the KHARMA, BHAC, and H-AMR fiducial models pass, yellow that one or two of the fiducial models fail, and red that all three fail. The inclination coverage is not uniform: BHAC and KHARMA models cover all five inclinations, while H-AMR models cover i = 10°, 50°, and 90°only.
2.2 μm Constraint.-All MAD models fail the 2.2 μm constraint, as do SANE models with R high > 1. This can be explained by the power-law tail of the κ eDF (see Equation (15)) as compared to the exponential behavior in the thermal eDF (see Equation (11)). Only 14% of the κ models pass, in contrast to 70% of the thermal models. Evidently the NIR flux density provides a powerful constraint on any nonthermal component in the eDF.

Mixed Thermal/κ Model
Next, we consider a mixed thermal/nonthermal eDF, with the nonthermal component following the κ DF with κ = 3.5. At high frequency νL ν ∼ ν s with s = 2 − κ/2 = 1/4, similar to what is seen in 2.2 μm flares (Hornstein et al. 2007). For this model set the GRMHD simulations use BHAC and the imaging uses BHOSS.
The fraction of nonthermal electrons is assumed to depend on σ and β. The emissivity  Evidently ò → 0 in the disk while ò → ε in the jet. Since we remove emission at σ > σ cut = 1, the nonthermal electrons are confined to the jet sheath. We set s = 0.01 min and vary the base efficiency, ε, over 0.05, 0.1, and 0.2. At each ε we generate a model set spanning the same parameter space as the thermal models (see Table 2) and normalize the accretion rate using the standard procedure (see Section 2).
The mass accretion rate required to obtain 2.4 Jy at 230 GHz only changes on average around 1.5% as compared to the thermal models. This small variation in the mass accretion rate reveals the fact that most of the emission at 230 GHz is created from the thermal part of the hybrid eDF, consistent with the small fraction of nonthermal particles added (ε = 0.05, 0.1, and 0.2). For MAD models, 230 GHz emission is mostly produced in the disk region (see M87 * Paper V and Figure 8 in Wong et al. 2022, for a 3D rendering). Thus, the images are unaffected by the nonthermal particles, which are located in the jet.
For SANE models, increasing R high pushes the emission toward the jet sheath, which increases the source size for high spins and large R high . However, the effect of nonthermal particles on the image is minor because most of the emission is still produced by thermal electrons with temperature set by the R high prescription (compare first and third panels in the top row of Figure 14).
The passing fraction for the 230 GHz image size is 98%, independent of ε, consistent with the thermal models. We find that 47% of the models are in agreement with the 230 GHz variability constraint. This passing fraction is larger than for the thermal models (27%), due to the shorter time window (5000 GM/c 3 ) considered for the nonthermal models, in contrast to 15,000 GM/c 3 for the thermal ones.
Visibility Amplitude Morphology and Variability.-Since 230 GHz images of the κ = 3.5 models with variable efficiency are similar to the thermal models (see previous paragraph), the fraction of passing models for the VA morphology are comparable. The three nonthermal models have an average passing fraction for the VA morphology of 80%, whereas 84% of the thermal models pass. The 4 Gλ VA variability constraint passes 72% of the models for both thermal and nonthermal eDF.
M-ring Fits.-Given that including nonthermal particles via the equations presented in Equation (17) does not change the image structure and variability properties of the 230 GHz images, the M-ring fits provide the same passing fractions for the diameter (65%), width (22%), and asymmetry (95%).
86 GHz Source Size and Flux.-The 86 GHz source is only slightly affected by the addition of nonthermal particles as compared to the thermal models. Only the SANE models with R high 40 and a * > 0 produce 86 GHz image sizes larger than Figure 10. Non-EHT flux density constraint example. Left: passing model with SED close to the measured 86 GHz point and below the quiescent 2.2 μm and X-ray points. Right: failing model with inconsistent (strongly rising) millimeter wavelength spectral index, overproduction of 2.2 μm due to strong Comptonization, and overproduction of X-rays by bremsstrahlung. the thermal SANE models. This effect can be seen in the first and third panels in the bottom row of Figure 14. Notice the increased flux density in the jet sheath in the difference image (blue color). This trend increases with the efficiency and is reflected in the decreasing pass fraction: 56% (for ε = 0.05, 0.1) and 55% (ε = 0.2) as compared to thermal models (59%). A similar trend is found for the 86 GHz flux density. The nonthermal particles are mainly located in the jet and thus contribute to the 86 GHz flux. Again, jet-dominated high-spin SANE models typically fail the 86 GHz flux constraint. With increasing efficiency, i.e., adding more nonthermal particles, the pass fraction decreases, with 67% passing at ε = 0.05, 66% passing at ε = 0.1, and 63% passing at ε = 0.2, compared to a pass fraction of 68% for the thermal models.
2.2 μm Constraint.-The 2.2 μm flux density increases for all models. For SANE models, except R high = 1, the addition of nonthermal particles leads to overproduction of 2.2 μm photons. For MAD models, all models overproduce at 2.2 μm for ε 0.05. As noted above, 2.2 μm emission is produced from the tail of the eDF. The thermal eDF tail decreases exponentially, while the κ eDF tail decreases as a power law, so the increase in 2.2 μm flux density is unsurprising. This is a general feature of the nonthermal models: 2.2 μm observations sharply limit the allowed population of nonthermal electrons.

Variable-κ Model
The high-energy variability observed in many astrophysical sources, including the Galactic center, may be associated with magnetic reconnection. Particle-in-cell (PIC) simulations have found that the slope of the nonthermal tail depends on σ and β (see, e.g., Ball et al. 2018). Here we consider a κ eDF model in which κ and w vary following the prescription of Ball et al.
We use emissivities and absorptivities from Pandya et al. (2016), computed numerically for the interval 3 < κ 8. For κ > 8 we substitute a thermal eDF. As in the fiducial models, we turn off emission at σ > 1.
We find that the mass accretion rate needed to obtain 〈F 230 〉 = 230 GHz is on average 4% larger than for the thermal models, and thus the variable-κ models have slightly higher optical depth.
230 GHz Size.-The disk region is dominated by thermal electrons (i.e., large κ), while the jet sheath has the lowest κ. Therefore, the 230 GHz source size of the variable-κ models is similar to the thermal ones, and no difference in pass fraction is found. A total of 98% of both models are in agreement with the 230 GHz size estimate (see the first and second panels in the top row of Figure 14).
Visibility Amplitude Morphology and Variability.-For the null location of the variable-κ models we find no difference to their thermal counterparts, and for both ∼80% pass this constraint. However, there is a clear discrepancy between the variable-κ and thermal models regarding the VA morphology. Only 60% of the κ models pass the 4 Gλ VA variability constraint, in contrast to 72% of the thermal models.
86 GHz Source Size and Flux.-The pass fractions for the 86 GHz source size constraint are comparable for thermal (59%) and variable-κ models (55%). Given that most of the variable-κ models are optically thicker than their thermal counterparts, the 86 GHz flux is on average lower, which increases the passing fraction from 68% (thermal eDF) to 75% (variable κ eDF). Figure 11. Combined non-EHT constraints (logical and). Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail. 20 2.2 μm Constraint.-Including nonthermal particles via the κ eDF increases the 2.2 μm flux as compared to the thermal eDF. In our prescription κ decreases (more high-energy electrons) as σ increases. In MAD models σ is systematically larger than in SANE models. Consistent with this, we find that most of the MAD models fail the 2.2 μm constraint, whereas in SANE models κ is large and the passing fraction for SANE models is almost indistinguishable from their thermal counterparts. In total, 35% of the variable-κ models pass the 2.2 μm constraint, compared to 55% for models including only thermal particles.
X-Ray Constraint.-Including nonthermal particles via the variable κ eDF reduces the pass fraction from 61% (for thermal eDF) to 35%. The κ eDF provides a larger population of seed photons in the NIR and at higher energies that can be boosted into the X-ray by a single scattering event.
Trends across BHAC and H-AMR Models for Variable-κ Models.-For the variable-κ models we have redundant models from BHAC and H-AMR (see Table 1). Both model sets show similar trends for all constraints (see Table 6).

Summary of Constraints on Nonthermal Models
In Table 6 we list the pass fractions for BHAC and H-AMR models using different eDFs. Most nonthermal eDF models produce little change compared to the thermal models for most constraints. The 86 GHz size and flux, which are the most important non-EHT constraints, are only marginally affected by the addition of nonthermal electrons. This behavior is obtained especially for eDFs, which mainly add nonthermal particles in the jet, while the disk is populated by thermal ones. In our case this setup is given for variable κ and κ = 3.5 with variable efficiency and is consistent between BHAC and H-AMR models (see Table 6).
If nonthermal particles are included also in the disk, either via a power law with slope p = 4 stitched to a thermal distribution or via a κ = 5 distribution, then there are some variations in pass fractions as compared to the abovementioned eDFs. For the power-law models the addition of nonthermal electrons increases the 86 GHz size by an average of 50%. However, the pass fractions with respect to the thermal models are not changed. In contrast, the κ = 5 model pass fractions decrease by 20% compared to the thermal models. For p = 4 and κ = 5, fewer models pass the 230 GHz m-ring width, with a consistent decrease by ∼20% for both models. Interestingly, the other m-ring constraints, i.e., the diameter and asymmetry, are not affected by the addition of nonthermal particles. This can be explained by the finer and brighter ring feature found in p = 4 and κ = 5 models connected to their smaller optical depth compared to their thermal counterparts (see Table 6).
In general, the fraction of models passing M 3 increases with the addition of nonthermal particles, independent of the prescriptions of the eDF. This is due to the shorter duration of the exploratory runs and not an actual reduction in variability. The main characteristic of the nonthermal models is the increase of 2.2 μm and X-ray flux densities. However, in a large fraction of models this leads to overproduction of 2.2 μm or X-ray flux and the pass fractions are reduced.
Six nonthermal models pass all 11 constraints (see Table 7). These models are one H-AMR MAD p = 4 model with a * = 0.5 seen under an inclination i = 50°and R high = 1 and a high spinning SANE model with a * = 0.94 at an inclination of i = 50°with R high = 40. From the BHAC variable κ three models are in agreement with all constraints, namely, spin a * = 0.5 at inclination i = 10°at R high = 80 and 160 and a model with the same spin seen under a slightly larger angle of  i = 30°with R high = 160. The last of the six survivors is a BHAC SANE model with variable efficiency of ε = 0.05 with an inclination of 10°and R high = 10. These models share a common low inclination angle i 60°and positive spin. We note that the MAD models coincide with the cluster of thermal models found for both BHAC and KHARMA models (see Section 4.1.4).We also show the nonthermal models that fail only one constraint in Table 8.

Tilted Models
Aligned models are a special case: in general, one expects that the spin angular momentum of the black hole and the orbital angular momentum of the accretion flow are misaligned. Here we consider misaligned flows around an a * = 15/16 black hole from Liska et al. (2018) and Chatterjee et al. (2020).
All aligned models considered so far produce either a SANE or MAD accretion flow. The tilted disk model initial conditions, however, produce a strongly magnetized near-MAD outcome with dimensionless magnetic fluxes between 25 and 50, a state we describe as IN-SANE. We consider three GRMHD simulations with tilt 0°, 30°, and 60°.
The tilted models exhibit a warped disk due to Lense-Thirring precession. The time-averaged disk and jet are therefore nonaxisymmetric. Since the inner and outer disks have different orientations, it is necessary to specify the coordinate axis of the observer. We consider three observer inclinations with respect to the outer disk at a single azimuth of 0°(for more details, see Chatterjee et al. 2020). 162 The 230 GHz pre-image size of edge-on large-R high models increases slightly for the tilt-60°compared to the aligned case. This occurs because the inner jet is warped and creates an extended image. This effect is also seen in the 86 GHz image size. On the other hand, the 86 GHz flux varies little with tilt despite the presence of a boosted jet component at large tilt angles.
Variability increases with tilt. In tilted disks, accretion occurs via thin plunging streams (e.g., Fragile et al. 2007), where electrons in the shocked flow can be heated to relativistic temperatures (e.g., Dexter & Fragile 2013;Generozov et al. 2014;White et al. 2019), forming localized, fluctuating hot spots more easily than in aligned disks and increasing flux variability Bauer et al. 2022). Nevertheless, 20/27 models pass the M 3 constraint because of the short duration of the tilted models, which provide fewer M 3 samples than the fiducial models.
The 2.2 μm flux density also increases with tilt. The 2.2 μm flux exceeds the 1.0 mJy limit for all three tilts, with a few exceptions, e.g., R high = 160 models at 10°inclination, which makes it difficult to favor the aligned case over the tilted one. Furthermore, misalignment destroys the axisymmetric nature of the accretion flow. The current model set covers a small parameter space in inclination and R high . A thorough exploration of the source azimuthal angle with respect to the observer is left to future studies.
To summarize: for the model set considered here tilt primarily affects variability and the 2.2 μm flux density, tending to increase both and thus shifting acceptable aligned models into rejected models as tilt angle increases. These trends are consistent with those observed by White & Quataert (2022).

Stellar-wind-fed Models
The accretion models of Ressler et al. (2020aRessler et al. ( , 2020bRessler et al. ( , 2018 track plasma from magnetized stellar winds down to the event horizon and provide a self-consistent picture of the origin of both gas and magnetic fields in the accreting plasma in Sgr A * . The resulting inflow does not fully circularize, so the models provide a distinct alternative to the fiducial models, which assume that the torus initial conditions relax to an astrophysically accessible state for the inner accretion flow. In the wind-fed models the density of the wind is fixed, so the 230 GHz flux density is matched to observations by varying R high instead. Note. Models that pass all but one constraint. Since no model passes all constraints, these represent the parameters that are closest to being consistent with observations. We use two versions of the model: one in which the stellar wind magnetization is low (β = 10 6 ), and a second in which the magnetization is high (β = 10 2 ). R high is adjusted until each model has the observed time-averaged 230 GHz flux density, with R high = 13 (β = 10 6 ) and R high = 28 (β = 10 2 ).
Both wind-fed models produce rings that are too narrow, failing the m-ring width test. In addition, both are too bright at 86 GHz and fail the M 3 test, although they are quieter than MAD models and close to the cutoff.
Both non-EHT and EHT constraints have the power to test wind-fed models. It is not possible to draw broad conclusions about the viability of the wind-fed models in general, however, since the two models tested here contain only a single spin (a * = 0) and all use the R high thermal eDF model.

The Goldilocks Model and Polarization
The fiducial models cover a regular grid in parameter space. In one instance, for KHARMA models, adjacent points in parameter space fail only one constraint for opposite reasons: the SANE, a * = 0.5, R high = 40, i = 10°models fail because the 86 GHz image is too small, while the i = 30°models fail because the 86 GHz image is too large, suggesting that a model with intermediate inclination would pass all constraints.
We analyzed an intermediate-inclination model at i = 20°. This "goldilocks" model passes all constraints (we did not compute the X-ray luminosity, but the neighboring i = 10°and i = 30°models pass the X-ray constraint).
We have imaged a series of KHARMA models with inclinations between 10°and 30°. The cause of the inclination sensitivity is an extended, 86 GHz bright jet. At i = 10°the jet is nearly parallel to the line of sight and the source size is dominated by the accretion flow, but as i increases, the jet, which is radially extended, begins to extend past the accretion flow and dominate the source size.
Despite this success, we regard the goldilocks model as unpromising for two reasons. First, the 10°and 30°BHAC models fail the X-ray constraint, and the neighboring 10°and 50°H-AMR models fail several constraints.
In addition, the goldilocks model is likely underpolarized at 230 GHz. The source-integrated linear polarization of Sgr A * at the time of the 2017 campaign was between 6.9% and 7.5% (Goddi et al. 2021), consistent with historical measurements. We will consider linear and circular polarizations in future papers, but our preliminary finding is that the goldilocks model has linear polarization ≈1%, which is far too low. We find that the best-bet region MAD models, considered below, have linear polarization compatible with observations.

Testing Exploratory Models That Pass
Six exploratory models, considered in Section 4.2, pass all constraints. Although they appear promising, these six models are imaged for only 5000 GM/c 3 and thus have been tested more weakly than the fiducial thermal models.
To evaluate the effect of run duration, we imaged the six passing exploratory models for 15,000 GM/c 3 . All failed one or more constraints, which are listed in Table 7. Evidently adding a population of nonthermal electrons does not provide a consistent way of transforming failing fiducial models to passing models.
The pass/fail status of the model can be sensitive to the length of integration, and it is important to image the models for at least 15,000 GM/c 3 . In connection with this, we note that the koral models, which were imaged at 230 GHz for ∼ 10 5 GM/c 3 , were generally consistent with the fiducial models but provide tighter M 3 constraints (see Appendix B). The constraints that are most sensitive to model duration are M 3 (all models failed M 3 after being extended) and the m-ring fits (two failed m-ring width, one failed m-ring diameter).

Origin of Variability Excess
Approximately 84% (KHARMA), 73% (BHAC), and 97% (H-AMR) of the fiducial models fail one or both variability Note. Pass fractions for BHAC and H-AMR models for various eDFs. Note that the thermal models are run for 10,000 GM/c 3 and 15,000 GM/c 3 for BHAC and H-AMR, respectively, while the nonthermal models are only run for 5000 GM/c 3 . This results in a much lower pass fraction in the thermal models for the light-curve variability constraint and all three m-ring constraints.
constraints. This naturally leads one to ask whether there is an observational or modeling problem with these constraints. For example, it is possible that a fraction f ext of the 230 GHz flux density is in an extended structure (e.g., a jet) that is slowly varying, unresolved by ALMA, and resolved out by EHT. The observed M 3 would then be smaller than the true M 3 for the compact source by a factor of 1/(1 + f ext ). The 4 Gλ amplitude variability is normalized by the light curve, and thus a 4 would be suppressed by a similar factor. Diffuse emission on scales larger than the VLBI images ( ∼ 100 μas) and smaller than connected element interferometer measurements ( ∼ 100 mas) is difficult to constrain. The EHTC Figure 14. Influence of the eDF on the image structure for a SANE model with spin a å = 0.5 seen under a viewing angle of 30°using R high = 40 at t = 25,000 GM/ c 3 . In the first and third rows the panels show the image structure, from left to right, of a thermal, variable kappa κ(σ, β), κ = 3.5 (fixed) with efficiency ε = 0.2, and κ = 5 (fixed) everywhere eDF at 230 GHz and at 86 GHz. Notice the increased field of view for the 86 GHz images. The second and fourth rows show the difference between thermal image and the different nonthermal eDFs.
imaging strategy involves a self-calibration step that assumes no diffuse structure on scales between the zero baseline and the shortest VLBI baselines (Paper III). However, longer-wavelength VLBI observations place constraints on any emission on these scales under the assumption of flat or steep spectra. For instance, 230 and 43 GHz VLBI observations that use the Los Alamos-Pie Town-VLA baselines probe scales of ∼1 to 10 mas and demonstrate no inconsistency in closure amplitudes and closure phases with a symmetric, two-dimensional Gaussian model (Bower et al. 2004). VLBI observations at 3 mm wavelength are also fully consistent with a two-dimensional Gaussian model with an upper limit of ∼ 10 mJy, or approximately 1% of the total flux density (Brinkerink et al. 2019). Additionally, a dust contribution is constrained by shorterwavelength ALMA observations that find a flat or slightly falling spectrum up to 900 GHz (Bower et al. 2019). A substantial diffuse dust contribution would only be consistent if its properties were tuned to match a steeply falling compact synchrotron spectrum, which would likely be inconsistent with the substantially variable far-infrared component of Sgr A * emission von Fellenberg et al. 2018).
Future observations with short-baseline coverage such as that provided by the Kitt Peak−to−SMT baseline will enable tighter constraints on any diffuse component. If confirmed, the presence of diffuse flux would require renormalization of the models and reevaluation of the constraints. A reduction of ∼30% in the compact flux would lead many SANE models to fail the M 3 constraint because they would be not variable enough and would lead most MAD models to pass the M 3 constraint. Since~ F M 230 2 , a 15% change in model density normalization, and therefore in optical depth, would suffice.
The variability excess might also be caused by physical incompleteness of the models. Collisionless effects and radiative cooling could both reduce variability.
We model the accreting plasma as an ideal fluid when in fact it is collisionless, with Coulomb mean free path large compared to GM/c 2 . This is less worrisome than one might think: electrons and ions are confined to helical orbits around field lines, implying an effective mean free path perpendicular to the field lines of order the gyroradius ∼ 56Θ e [B/(30 G)] −1 cm = GM/c 2 . The mean free path parallel to field lines is still long but may be limited by scattering off electromagnetic field fluctuations excited by kinetic instabilities rather than Coulomb scattering.
Future global kinetic general relativistic PIC simulations may be able to test how well the ideal fluid model describes collisionless accretion flows. Meanwhile, a relativistic fluid model incorporating small collisionless corrections was developed by Chandra et al. (2015) and studied numerically by Foucart et al. (2017). The leading-order corrections are conduction and pressure anisotropy (i.e., viscosity). The effect of conduction and viscosity on our constraints is not known, but it is known that viscosity reduces turbulent stress in SANE models , consistent with a reduction in variability. It is also plausible that conduction smooths out temperature maxima, possibly also leading to a reduction in variability.
Our models neglect radiative cooling. Cooling is fastest where the electron temperature is highest, so cooling has the potential to blunt local maxima in temperature (Yoon et al. 2020). If local maxima with short cooling times contribute significantly to variability, then cooling could reduce M 3 . Selfconsistent cooling requires integration of an electron energy equation (e.g., Ressler et al. 2015) and assignment of a density scale (mass unit or accretion rate) when the GRMHD simulation is run, vastly increasing computational cost.
Another possibility is that self-consistent electron heating reduces variability. In Appendix B we consider a set of such models from Dexter et al. (2020). These models show a variability excess that is similar to the fiducial models.
The R high prescription contains a parameter R low that determines the ion−electron temperature ratio at low β, which we have consistently set to 1. Increasing R low models the effect of rapid cooling in low-β regions. In Appendix B we consider R low up to 10 in a sample of four models and show that it does not systematically reduce variability.
The variability excess might also be caused by numerical inaccuracies in radiative transfer, truncation error in the GRMHD integrations (limited resolution), limited simulation duration, or misspecification of the adiabatic index. These are considered in Appendices A and B. We find no evidence that any of these effects is producing the variability excess. A more extensive study on how these effects could alter the variability is warranted but outside the scope of this paper.

Best-bet Region without Variability
From the preceding discussion it is possible that a combination of extended flux, viscosity, cooling, and numerical limitations affects model variability enough to compromise the variability constraints. Notice that a relatively small change in variability, ∼30% in M 3 , is sufficient to promote many of the MAD models from failing to passing. If we exclude the variability constraints, then, which models are favored? Figure 15 (which also appears as Figure 33 in Appendix C) shows the result of applying all constraints except structural and flux variability to the fiducial models. Most negative-spin models are ruled out, and two MAD, positive-spin, lowinclination, large-R high models pass all constraints for KHARMA and BHAC (the H-AMR simulations do not include i = 30°). Nearby models in parameter space are close to passing in the sense that they pass for one or more of KHARMA, BHAC, and H-AMR. 163 We will call the green region of parameter space in Figure 15 the best-bet region. The models in this part of parameter space perform well and explain nearly all the data.
Given the uncertainty associated with the variability excess and the possibility of missing physical ingredients in the model, the existence of the best-bet region cannot be regarded as evidence that Sgr A * has positive spin and low inclination. Given the large and uncertain set of constraints, however, it is remarkable that any models perform as well as these do. The models in the best-bet region merit additional analysis, and it is interesting to ask what they predict for future observations.

Fiducial Model Accretion Rate and Outflow Power
What is the accretion rate in Sgr A * ? The time-averaged r 163 If we use the more permissive 86 GHz size constraint from Issaoun et al.
(2019), we find two new best-bet models: one MAD model close to the best-bet region at a * = 0.94, i = 10°, and R high = 160; and one SANE model at a * = 0.5, i = 30°, and R high = 10.
25 at the event horizon; the quantity in parentheses is the inward rest-mass flux density. Figure 16 (top panels) shows  M in solar masses per year for the KHARMA and BHAC thermal models. The accretion rates follow immediately once the models are normalized so that 〈F 230 〉 = 2.4 Jy.
MAD models accrete at 10 −9 to 10 −8 M e yr −1 , while SANE models have a broader range, 10 −9 to 10 −6 M e yr −1 .  M is an increasing function of R high , which is a result of the thermal synchrotron emissivity increasing with increasing electron temperature, so that at fixed flux density lower electron temperature models (higher R high ) have higher  M. SANE models exhibit a stronger dependence on R high than MAD models. The emission in the SANE models is predominantly from regions with large β, where the electron temperature is regulated by R high . MAD models produce more of their emission in regions with β ∼ 1, where R high only has a weak effect on electron temperature (see M87 * Paper V, Figure 4). For both SANE and MAD models,  M decreases with increasing spin. This follows from the dependence of temperature on spin shown in Figure 2-higher spin models have higher gas temperature, which in the R high model implies higher electron temperature.
Retrograde SANE, R high = 40 and 160 models produce the largest˜--  M M 10 yr 6 1 . These models have a high midplane density of cool electrons that, in KHARMA models, overproduce X-ray emission through bremsstrahlung and are therefore ruled out. Critical beta models have accretion rates that lie between the fiducial model values for R high = 10 and R high = 40 for the selected critical beta parameter values ( f = 0.5, β crit = 1).
How do our  M compare with earlier estimates? Linear polarization and Faraday rotation measurements at millimeter and submillimeter (submm) wavelengths (Agol 2000;Quataert & Gruzinov 2000;Bower et al. 2003;Macquart et al. 2006;Marrone et al. 2006aMarrone et al. , 2006b) and X-ray emission (Baganoff et al. 2003;Wang et al. 2013) combined with semianalytic models predict˜- M 10 9 to 10 −7 M e yr −1 . The broad range of values is due to the differences in regions of radio emission in the theoretical models that are considered (ADAFs: Narayan et al. 1998;Yuan et al. 2003;jet models: Falcke et al. 1993;Falcke & Markoff 2000;ADAF+jet models: Yuan et al. 2002). All our MAD model  M fall within the range of these historical observational estimates. In contrast, almost all SANE models with larger R high parameter (except SANE a * = 0.94, which is one of our best models) have an  M that is inconsistent with earlier estimates.
All fiducial models produce outflows in the polar regions. In many cases the outflows can be divided into a slower, denser disk wind and a relativistic, high-σ Poynting jet. The outflows have a power that is comparable to or larger than the bolometric luminosity. What is the outflow power P out ?
First, we must define P out . There are a number of competing definitions in the literature; we set , 21 t r r out poles where "poles" indicates θ < 1 rad or θ > (π − 1)rad. We include only those θ where the time-and azimuth-averaged energy flux is outward. The integral is evaluated at r = 100 GM/c 2 . P out includes power in the relativistic Poynting jet, if present, and in the slower, denser disk wind. Figure 16 (bottom panels) shows P out for the fiducial KHARMA and BHAC models. As expected, the outflow power increases with the magnitude of black hole spin. We find that SANE models have 10 35 erg s −1  P out  10 38 erg s −1 . Evidently P out increases with R high , as expected from the behavior of  M: higher- M models have stronger magnetic fields. We find that MAD models have 10 37 erg s −1  P out  10 39 erg s −1 and are, on average, more powerful and less sensitive to R high .
Many high-spin or large-R high models have P out ∼ 10 38 erg s −1 . An outflow with this power could produce dramatic observable effects in the dense interstellar medium (ISM) of the Galactic center. For instance, the X-ray transient CXOGC J174540.0 −290031, located only 0.1 pc from Sgr A * and with an estimated jet power of ∼ 10 36 erg s −1 , produced a compact bipolar lobe at radio wavelengths with peak flux densities near 100 mJy (Bower et al. 2005). A more continuous outflow, however, might clear out a substantial volume of space, making identification of any interaction with the ISM less certain. Nevertheless, there have been a number of large-scale features that have been suggested as the result of interaction of a jet with the ISM (e.g., Li et al. 2013;Cecil et al. 2021).

Electron Distribution Function
One of the central uncertainties in modeling Sgr A * is eDF assignment. Do the surviving models have anything to say about the eDF?
In our fiducial models, thermal eDFs with equal ion and electron temperature (R high = 1) are ruled out, in most cases by more than one constraint. For MAD models this is easily explained: Comptonization is strong at R high = 1, and X-rays are overproduced. For MAD models,  M and therefore the electron-scattering optical depth τ es are insensitive to R high . Since the amplitude of the first Compton bump is t µ = Q y 16 e 2 es , the high Θ e at R high = 1 produces a large X-ray flux. For SANE models the situation is more complicated (see Appendix C), with m-ring width rejecting many R high = 1 models at a * 0, and 86 GHz flux and size rejecting the rest. The latter is a consequence of model electron temperature reaching a maximum at large a * and small R high , so that optical depth is a minimum and therefore so is the 86 GHz image size.
In some fiducial SANE models the sense of the X-ray constraint is reversed: bremsstrahlung is strong and X-rays are overproduced where R high is large. Again, this is easily understood: when R high is large, the midplane electrons are cold, the accretion rates (and therefore n 2 ) are high, and the bremsstrahlung emissivity is large. More generally, the X-rays provide a strong constraint on the presence of dark (subrelativistic) electrons, which are otherwise undetectable in millimeter wavelength emission or absorption, although they can produce strong Faraday rotation.
We have tested a large set of nonthermal models, which have a power-law tail on the eDF. Although integration times for the nonthermal models are too short to provide strong model constraints, there are trends that emerge from the existing data.
First, the 230 GHz images are relatively insensitive to the presence of nonthermal electrons for models in which most of the nonthermal electrons are introduced in and near the outflow region. This is encouraging: the 230 GHz image is generated by electrons in an approximately thermal core of the eDF and is relatively insensitive to the behavior of the tails.
Second, as one might expect, the 2.2 μm flux density is an increasing function of nonthermal electron density. In many models (e.g., the variable efficiency models of Section 4.2.3) the addition of a power-law tail changes a thermal model that passes the 2.2 μm test into a nonthermal model that fails.
Third, it is important to understand that many of the nonthermal models we use are linked to the R high prescription in some way. For example, the κ distribution function contains a width parameter w, and this is set using an R high -like prescription with width depending on β. Our nonthermal models are only a few points in a vast function space of possible nonthermal parameterizations, with none of the models considered allowing for an electron energy density that depends on plasma history and instantaneous plasma state.

Collisionless Plasma Effects
The mean free path to Coulomb scattering for particles is typically larger than or comparable to the system size in Sgr A * , rendering its plasma collisionless. The GRMHD simulations used in this work describe a collisional system, whereas a firstprinciples modeling of the collisionless plasma requires a fully kinetic treatment. General relativistic (radiative) kinetic simulations are crucial for dynamically probing the electron temperature, effects of nonthermal distribution functions, and pressure anisotropy and their interplay with radiation in collisionless plasma in the accretion disk and jet. While global general relativistic kinetic simulations cannot be performed with full physical separation between microscopic plasma scales (the particle's Larmor radius r L , and plasma skin depth d e ) and macroscopic scales (the gravitational radius r g ), they can achieve the right hierarchy of scales (r g ? d e ? r L ) for magnetized plasmas (Chen et al. 2018;Levinson & Cerutti 2018;Parfrey et al. 2019;Chen & Yuan 2020;Crinquand et al. 2020;Kisaka et al. 2020;Bransgrove et al. 2021;Crinquand et al. 2021). Even in GRMHD, it is computationally challenging to resolve plasma heating processes powering the observed radiation in a converged manner. It is not yet feasible to resolve dissipation at the smallest scales of the turbulent cascade or the interplay between turbulence and reconnection at a similar level to that in local box simulations (Riquelme et al. 2012;Hoshino 2013Hoshino , 2015Kunz et al. 2016;Zhdankin et al. 2017;Comisso & Sironi 2018;Inchingolo et al. 2018;Zhdankin et al. 2019;Nättilä & Beloborodov 2021;Chernoglazov et al. 2021). However, Porth et al. (2019) and H. Olivares et al. (2022 in preparation) show that the global accretion dynamics (mass accretion rate, magnetic flux on the horizon, and MRI quality factor) are converging between the different simulations in this work. Kinetic processes in the (near-)collisionless plasma may increase the effective particle collision rate (see, e.g., Kunz et al. 2016). Deviations from the infinitely conducting ideal fluid approximation may alter the thermodynamics of the flow (see, e.g., Foucart et al. 2017, and Appendix C1). Some aspects of (near-)collisionless plasma dynamics can be described with nonideal effects (e.g., viscosity, resistivity, heat conduction, pressure anisotropy) in GRMHD simulations of black hole accretion (e.g., Bugli et al. 2014;Chandra et al. 2015;Foucart et al. 2016;Chandra et al. 2017;Foucart et al. 2017;Qian et al. 2018;Ripperda et al. 2019;Vourellis et al. 2019;Ripperda et al. 2020;Nathanail et al. 2021;. For example, the first efforts have recently been made with highresolution global GRMHD simulations to capture heating through magnetic reconnection in the largest current sheets in the system (Nathanail et al. 2020;Ripperda et al. 2020; Note. Exploratory models that pass all constraints when computed to 5000 GM/c 3 . When extended to 15,000 GM/c 3 , each model fails one or more constraints.  Chashkina et al. 2021;Nathanail et al. 2021;Ripperda et al. 2022).

Positrons
So far we have considered only ion−electron plasmas, but pairs can be produced in the 230 GHz emission region through pair discharges or through so-called pair drizzle.
The importance of pairs has been assessed using phenomenological models (Anantua et al. 2020a;Emami et al. 2021) and depends sensitively on the efficiency with which a reservoir of magnetic energy can be converted into pairs. If this efficiency is large, then pairs can substantially increase intensity in the jet region.
Production of pairs through the drizzle process is weak in Sgr A * because its luminosity is low. Wong et al. (2021; see also Mościbrodzka et al. 2011) estimate the drizzle pair density of Sgr A * to be 10 −8 cm −3 . This is well below the Goldreich −Julian density ∼ a * Bc 2 /(4πeGM) ∼ 10 −2 a * [B/(30 G)] cm −3 required to screen electric fields, suggesting that pair discharges are likely. If pair discharges serve only to raise the pair density to the Goldreich−Julian density, however, then the jet is unlikely to outshine the accretion flow, where the magnetic field strength is similar to that in the jet but the characteristic number density is ∼ 10 6 cm −3 . The maximum conceivable impact of pairs on EHT observations is obtained by converting about half of the magnetic energy density into pairs with Lorentz factor ∼30 so that in Sgr A * 's ∼30 G magnetic field the emissivity of the resulting pairs peaks close to 230 GHz. Then, the pair density ∼ 10 6 cm −3 and the jet might compete with the accretion flow.

Outlook
Except for a brief discussion in Section 5.1, our analysis omits polarization. Future analyses will test our models against integrated polarization of Sgr A * (Goddi et al. 2021) and against polarized imaging, as was done for M87 * (M87 * Paper VII; M87 * Paper VIII).
Our analysis also omits discussion of one of the main observational features of Sgr A * : the NIR and X-ray flares, for which there is as yet no consensus model. Our analysis is built on the notion that the NIR flares, at least, could be produced by accelerating a small fraction of the electron population into a nonthermal tail. The increase in 2.2 μm flux density with nonthermal population seen in Section 4 is consistent with this notion.
The agreement between our results on the source size and orientation and those of the GRAVITY Collaboration (Gravity Collaboration et al. 2018) also support the similarity in the spatial distribution of the electrons producing NIR flares and those responsible for the 230 GHz emission. Gravity Collaboration et al. (2018) find that the NIR flares originate from a region ≈40-50 μas from the black hole, only slightly larger than the diameter of the m-ring fit to the 230 GHz image. Moreover, the combined results of our analysis point toward a low observer inclination, again consistent with the GRAVITY results, although the GRAVITY results are based on a model with a hot spot orbiting at 40°from the plane of the sky.
In M87 * we were able to identify a sense of rotation of the source from the asymmetry of the observed ring and the orientation of the large-scale jet. For Sgr A * we have not yet been able to identify a preferred PA for the source or measure the amplitude of source asymmetry, perhaps because it is small (Paper IV). The sparse baseline coverage from 2017 sharply limits our ability to detect asymmetry, but the 2021 observation already had better coverage, and future EHT campaigns will add even more stations. Unless the source is aligned and i ≈ 0°, all our models (which have a definite sense of rotation) predict that the brightest point on the ring is produced by Doppler boosting and should lie on the approaching side of the accretion flow. The sense of rotation, determined either from the helicity of spiral features in the flow or from tracking rotation of bright spots, could be compared to the clockwise motion of NIR flares observed by GRAVITY (Gravity Collaboration et al. 2018).
Our analysis shows the value of simultaneous measurements. Simultaneous or near-simultaneous GMVA observations, in particular, provide a powerful constraint on the models. The eDF is a major source of uncertainty in our analysis, and since the submm and 2.2 μm flux densities are most sensitive to the eDF, future analyses should incorporate submm data and future EHT campaigns should seek near-simultaneous submm observations.
Our analysis provides some guidance for future numerical modeling of Sgr A * . It is clear that models require long integrations (30,000 GM/c 3 ) to provide a converged characterization of the source. We also note that there are regions in parameter space where we may not have sampled densely enough-where, for example, the model is too large and too small at adjacent points in parameter space. An example of this is the 86 GHz size measurement, which is sensitive to inclination, as seen for the goldilocks model. We found that being able to compare three simulation pipelines helped us identify numerical sensitivities and saved us from error on a number of occasions. One point raised in these comparisons is that the 2.2 μm flux density is sensitive to where emission is cut off in high-σ regions of the flow-the socalled σ cut parameter. This point merits future investigation.
Throughout this work we have assumed that the mass and distance to Sgr A * are known. This assumption could be relaxed and the models checked for consistency with the stellar orbit measurements of mass and distance. Phenomenological accretion flow models are particularly well suited to this type of study since they are inexpensive to compute (e.g., Broderick et al. 2009).

Conclusions
We have made a first comparison of the EHT 2017 Sgr A * data to a state-of-the-art library of ideal GRMHD models. The models assume that the mass and distance to Sgr A * are known and that the central object is a black hole described by the Kerr metric. We use multiple simulation pipelines and find that, for a given model configuration, independent simulations are remarkably consistent (Appendix A).
The model parameters are as follows: whether the horizon magnetic field is strong or weak (MAD or SANE, respectively); the black hole spin a * ; and the inclination angle i between the line of sight and the accretion flow orbital angular momentum vector. The eDF also has one or more parameters. In our "fiducial" model set, run with three independent codes, the eDF is determined using the so-called R high prescription (Section 2). We have also considered exploratory models with alternate eDF prescriptions and alternate initial conditions.

28
We have selected and applied 11 heterogeneous observational constraints. Six derive directly from EHT VLBI data, two derive from 86 GHz VLBI observations with the GMVA, one from variability of the 230 GHz light curve, and one each from the 2.2 μm flux density and the X-ray luminosity.
Five structural constraints derive from EHT VLBI data. When combined, these constraints reject about 75% of our fiducial models. The EHT cut favors a * 0 and avoids edgeon (i = 90°) models and models with equal ion and electron temperatures (R high = 1). We are not able to constrain the source PA owing to sparse baseline coverage. The 2017 EHT observations are, nevertheless, quite constraining. New EHT observations with additional antennas will be even more constraining.
The strongest EHT-derived constraint is m-ring width. The physical interpretation of m-ring fits is challenging because fitting is done after the model is observed with limited baseline coverage and limited temporal sampling. Nevertheless, some interpretation is possible. For example, there is a trend toward lower width at higher inclination that eliminates many of the edge-on models. This can be understood since many edge-on models have higher peak brightness temperature than face-on models owing to Doppler boosting of emission from the approaching side of the disk. The flux density, which must average 2.4 Jy, is approximately proportional to the solid angle of the source multiplied by a typical brightness temperature, so when brightness temperature is higher, solid angle must be smaller. If the source is a ring of fixed radius, then higher brightness temperature implies a narrower ring.
Four constraints derive from non-EHT data that are contemporaneous or near contemporaneous. Combined, the non-EHT constraints reject 94% of fiducial models. The non-EHT cut favors strongly magnetized (MAD) models and eliminates most models at i > 50°(consistent with interpretations of GRAVITY results; Gravity Collaboration et al. 2020b), and it also eliminates all models with equal ion and electron temperatures. These results highlight the value of continued multiwavelength monitoring of Sgr A * .
The non-EHT constraints, like the EHT-derived constraints, exhibit complicated but interpretable trends across parameter space. A full discussion of fiducial model trends will be explored in later papers, but as an example consider the 86 GHz size constraint. At R high = 1, the mean 86 GHz FWHM of SANE models decreases as inclination increases. The origin of this trend is very similar to the origin of the trend in m-ring width with inclination. At R high = 1 the electrons are hot and the source is optically thin. The peak brightness temperature of edge-on models is higher than that of face-on models owing to Doppler boosting of emission from the approaching side of the disk, and thus at fixed flux density the source becomes smaller as inclination increases. At R high = 160, by contrast, the trend is reversed and the mean 86 GHz FWHM of SANE models increases as inclination increases. In R high = 160 SANE models the electrons are cool in the disk midplane and most emission arises along the walls of the jet (see Figure 4 of M87 * Paper V). The equatorial plane is relatively opaque and Doppler-boosted emission is hidden. In face-on models one is looking down the jet and the source appears relatively small; in edge-on models the jet is extended perpendicular to the line of sight and the source appears larger. Interestingly, MAD models exhibit only weak trends in 86 GHz size with inclination, in part because MAD models are more optically thin than SANE models and also because MAD models are more slowly rotating than SANE models, weakening Doppler boosting. Sgr A * is variable but is not as variable as we expected based on our fiducial models. We have used two tests to compare the variability of models and data. One characterizes variability in the 230 GHz light curve (including simultaneous ALMA data), and the other characterizes structural variability expressed through fluctuations in the VAs. The light-curve variability is the tightest of all 11 constraints: it rejects 95% of our fiducial models. We find that strongly magnetized (MAD) models are more variable than weakly magnetized (SANE) models, and, grouped together, both SANE and MAD models are more variable than the data. The structural variability constraint measures the slope and amplitude of the power spectrum of the VA variability. Remarkably, we find that the power spectrum slope is consistent for all models, while the power spectrum amplitude is consistent for 43% of fiducial models.
The higher variability of the MAD models compared to the SANE models is a consequence of the quasi-regular magnetic flux expulsion events that are a defining feature of the MAD Figure 15. Combined constraints without structural or flux variability. Green indicates that the KHARMA, BHAC, and (for i = 10°, 50°, 90°) H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all models fail. models. Magnetic flux through the horizon builds up until it exceeds a threshold and then escapes over a few dynamical times through the surrounding accretion flow. As flux is escaping, strongly magnetized low-density regions push aside plasma in the region close to the black hole that produces most of the 230 GHz emission, and this produces large fluctuations in the 230 GHz flux density. The SANE models, by contrast, exhibit relatively steady accretion through the equatorial plane.
The failure of nearly all fiducial models to match the lightcurve variability is interesting. It may signal the presence of extended, slowly varying structure that is resolved out by EHT, or it may signal that future models need to incorporate collisionless effects (potentially modeled as viscosity and conductivity) or a more sophisticated treatment of electron thermodynamics including cooling. In addition, different initial geometries and polarities of the magnetic field could lead to more slowly varying structures (Nathanail et al. 2021). If, when combined, these effects were to reduce M 3 by 30%, then many MAD models would be consistent with the data.
None of the fiducial models survive the full gauntlet of 11 constraints. If we set aside both variability constraints, however, there two fiducial models that pass the remaining nine constraints in all simulation pipelines (a few more survive in one model set but not the other). These models in the "bestbet region" are strongly magnetized (MAD) and have R high = 160, positive spin, and low inclination, with (a * , i) = (0.5, 30°) and (0.94, 30°). They have accretion rates ( =  M 5.2-9.5) × 10 −9 M e yr −1 , which are consistent with earlier estimates and overlap with accretion rates in wind-fed models, ∼ 10 −8 M e yr −1 ). The a * = 0.5 MAD with i = 30°model is presented in Figure 17 as snapshots and in Figure 18 as a weighted average, a convolved average, and a reconstructed average image from synthetic data. One of the reconstructed average images from the 2017 EHT observations is shown in the rightmost panel in Figure 18 for comparison.
We produced synthetic SEDs, and therefore bolometric luminosities L bol , for all fiducial models. Typically L bol is dominated by a synchrotron bump in the submm, and for the best-bet region it is (6.8-9.8) × 10 35 erg s −1 ; the corresponding radiative efficiency is (1.3-3.0) × 10 −3 . The maximum radiative efficiency over the entire fiducial model set is 0.08 (for a MAD, a * = 0.94, R high = 1 model), which is necessary but not sufficient to justify our neglect of radiative cooling in the GRMHD evolution.
All our fiducial models produce bipolar outflows, and for each we measured the outflow power P out , defined in Section 5. Consistent with earlier work we find that outflow power is higher for strongly magnetized (MAD) models than for comparable weakly magnetized (SANE) models and increases by more than an order of magnitude from a * = 0 to |a * | = 0.94. For models in the best-bet region, P out = (1.3-4.8) × 10 38 erg s −1 , corresponding to an outflow efficiency ( )  P Mc out 2 of 0.25-1.6. Such large outflow efficiency is only possible if energy is extracted from a spinning black hole via the mechanism proposed by Blandford & Znajek (1977). It is an open question how these powerful outflows might interact with incoming gas in a self-consistent Figure 16. Comparison of the dependence on spin a * and high end of the temperature ratio R high for the accretion rate  M and outflow power P out in the SANE and MAD cases. Top: accretion rate  M for fiducial models. Since  M depends weakly on inclination i, we show only i = 50°. Bottom: outflow power P out for fiducial models at i = 50°. The colors and markers vary with R high . The dashed lines correspond to KHARMA thermal models, while the dotted lines indicate BHAC thermal models.
accretion model that follows plasma over a larger range in radius than our fiducial models. It is also an open question whether the outflow power could be detected in the dense but crowded galactic center environment. Notice that this outflow luminosity is comparable to the spin-down luminosity of the Crab pulsar.
All fiducial models assume a particular parameterization for the eDF (the R high prescription), use a common initial setup (a magnetized torus), and assume that the black hole spin vector and torus orbital angular momentum are aligned or antialigned. To partially control for the errors introduced by these assumptions, we have included a set of exploratory models. These include several eDF prescriptions, a wind-fed model that tracks accretion from stellar winds down to the scale of the horizon, and tilted disk models in which the black hole spin and torus angular momentum are misaligned.
Our nonthermal models are remarkably similar to their thermal counterparts. For the limited set of nonthermal eDF prescriptions we consider here the 230 GHz image structure differs very little for models in which the nonthermal electrons are introduced mainly in the jet. The 230 GHz variability is not detectably different than corresponding thermal models. 164 The 86 GHz size and flux density, which are the most restrictive non-EHT constraints, are not detectably affected by the addition of nonthermal electrons for most nonthermal models (except κ = 5 models). Nonthermal electrons consistently increase the 2.2 μm flux density over similar thermal models, however. Accelerating even a small fraction of the electron population into a nonthermal tail risks overproducing 2.2 μm emission. The 2.2 μm (and submm through mid-IR) flux density therefore provides the strongest eDF constraints. Future EHT analyses would benefit from incorporating submm constraints (e.g., Bower et al. 2019), and because model submm SEDs are highly variable, the submm and 2.2 μm data should be as close to simultaneous as possible.
The stellar-wind-fed models of Ressler et al. (2020b) feature the best-motivated treatment of boundary and initial conditions for Sgr A * models. They differ from our torus-initialized fiducial models in that they follow plasma from its ejection from stars on known orbits down to the event horizon. We have imaged these models using an R high prescription for the electron temperature, with R high adjusted in the otherwise parameter-free models to produce the correct time-averaged 230 GHz flux density. The two models considered here, both with a * = 0, fail the 86 GHz flux, m-ring width, and M 3 constraints. This does not imply that wind-fed models are ruled out; they clearly merit further investigation with longer integrations over a broader range of eDFs and a * .
In general, black hole accretion flows can be tilted in the sense that the orbital angular momentum of the disk and the spin angular momentum of the hole are misaligned. Tilted disks Figure 17. 230 GHz full-resolution snapshots from a fiducial model in the best-bet region. This model passes 10/11 constraints. The different panels are snapshots taken from the "best time," when the synthetic observation has good (u, v) coverage. have not until now been included in EHT analyses because (i) it is conceivable that accretion flows align either by consistently oriented long-term accretion or by some analog of the Bardeen −Petterson effect (Bardeen & Petterson 1975) and (ii) the tilted disk parameter space is larger than the aligned disk parameter space by two dimensions: the tilt angle and the longitude of the observer. We considered models with tilt 30°and 60°, observed at a single longitude. The integrations were too short (3000 GM/c 3 ) to provide strong constraints on tilt, but we find that the m-ring width test is particularly sensitive to tilt and rejects a progressively larger fraction of the models as tilt increases at the single observing longitude studied here. Tilted models clearly merit further investigation.
Our fiducial models and variable κ nonthermal models have been run with independent GRMHD codes and imaged with independent radiative transfer codes. The outcomes are largely consistent (see Appendix A for details). The code comparisons were valuable and helped identify multiple issues in the independent simulation sets. The consistency between codes is remarkable given the complexity of the modeling process and the scope for error. Tracking down the remaining discrepancies (e.g., in the 2.2 μm flux density) and developing a quantitative error budget is an essential but difficult task for the future. This paper is dedicated to the memory of John F. Hawley, whose pioneering work on black hole accretion flows made this paper possible. We are grateful to an anonymous referee whose comments significantly improved this paper.  (Oliphant 2007), matplotlib (Hunter 2007).

A.1. Consistency of Radiative Transfer Simulations
Two studies have been undertaken within the EHT Collaboration to evaluate the consistency of radiative transfer codes. Figure 19. Correlation between BHAC and KHARMA models for nine model constraints. The horizontal axis is the constraint value from BHAC/BHOSS, and the vertical axis shows the constraint value from KHARMA/ipole. Each point corresponds to a single model, with the width of the distribution shown by the error bars. See text for details.
The first, Gold et al. (2020), evaluated the consistency between general relativistic ray-traced radiative transfer (GRRT) codes when tracing geodesics and when integrating the unpolarized radiative transfer equation. Gold et al. compare BHOSS and ipole, which are the two transfer codes used in this paper, and also compare to grtrans, raptor, odyssey, gray2, and raikou. Code consistency was found to be excellent, with sub-percent-level variations between codes when run with standard numerical parameters, i.e., without accuracy parameters tuned for consistency.
The second, B. Prather et al. (2022, in preparation), evaluates code performance when imaging GRMHD simulation output and when integrating the equations of polarized radiative transfer. Prather et al. include ipole, grtrans, odyssey, and raptor. Code consistency was also found to be excellent.
Uncertainty in the radiative transfer calculation is therefore unlikely to contribute significantly to the model error budget.

A.2. GRMHD Simulations Consistency and Convergence
As evident in Table 1, the thermal models have been calculated for an identical parameter space from two different codes, namely, KHARMA and BHAC for the GRMHD simulations and ipole and BHOSS codes for the GRRT calculations. This allows us to perform an in-depth comparison between the different numerical methods used in this work, in addition to the EHTC code comparison projects Gold et al. 2020).
In Figure 19 we show the correlation between the thermal KHARMA and BHAC models for constraints where we have predictions from both models. The top row shows, from left to right, the 230 GHz flux density, M 3 , and the 230 GHz image size obtained from image moments. Since we normalize the 230 GHz images to an average flux of 2.4 Jy within a time window of 5000 M (28.5 hr for Sgr A * ), the scatter around this value is small. The deviation from an ideal correlation reflects the precision and number of GRMHD snapshots included during normalization procedure.
The correlation in M 3 spreads over ΔM 3 = 0.75, which serves as a measure of intracode (e.g., MAD vs. SANE accretion) and intercode (BHAC vs. KHARMA) differences. Despite these differences, the models show a strong correlation throughout the investigated models and parameter space.
We also find a strong correlation between models and codes for the image size computed from image moments, i.e., secondmoment analysis.
The middle row presents the correlation plots for the 86 GHz flux density (left), the 86 GHz image size using second moments (middle), and the NIR flux (right). The 86 GHz flux and 86 GHz image size exhibit a shift toward larger values for the BHAC models. This difference can be explained by the larger field of view used for the BHAC models at 86 GHz during the radiative transfer calculations. Thus, more extended structure and therefore a larger total flux are included in the BHAC models. This affects mainly models with large inclinations i 70°and jet-dominated emission models (R high 40).
The NIR fluxes show a tight correlation over four orders of magnitude and systematically larger flux for the BHAC models for low NIR fluxes ( ( )<log NIR Jy 7   10 ). These fluxes are far below the NIR constraints of ∼ 1 mJy, and therefore they do not affect the passing or failing of the models. In the thermal models the NIR flux is generated from the tail of the eDF and is thus very sensitive to the electron temperature. Small differences in the distribution and value of the electron temperature between the two codes explain the observed decorrelation at very low NIR flux.
The correlation between models for the m-ring parameters is presented in the third row of Figure 19. The correlation of the diameter of the m-ring is plotted in the left panel. The spread covers nearly the same extent as the 230 GHz image size (top row, right panel); however, the scatter in the correlation is larger. The same is true for the width of the m-ring (middle panel in the last row of Figure 19). Compared to the diameter and width of the m-ring, the asymmetry of the m-ring is less correlated (right panel). Notice that horizontal and vertical limits in the asymmetries occur because the parameter hits the boundary of the allowed range, which is 0.5.
The smaller correlation of the m-ring parameters as compared to the other parameters presented in Figure 19 is a consequence of the noisy nature of the m-ring fits. Still, the distributions are quite symmetric under reflection across the diagonal, so the models are at least not biased with respect to each other. Notice also that these plots do not capture all the information that is contained in the distribution of m-ring parameters, just the central value.
We are somewhat surprised by the strength of the correlations seen in Figure 19. The range of each constraint is substantially larger than the width of the correlation, so the variations between models are real, detectable, and reproducible with independent codes. The question of the origin of the systematic offsets between models for some constraints (e.g., in the NIR) is interesting but beyond the scope of this paper.

Appendix B Variability of GRMHD Models
Nearly all models fail to recover the variability of Sgr A * in 230 GHz flux density as measured by M 3 . In this appendix we discuss and dismiss four possible causes for this variability excess.

B.1. Effect of Resolution
For the 50% change in resolution considered in the comparison shown in the preceding appendix (between KHARMA and BHAC simulations) we find no evidence for systematic changes in M 3 with resolution. This is not a large range in resolution, however, and much higher resolution simulations (Ripperda et al. 2020(Ripperda et al. , 2022Nathanail et al. 2021) show the emergence of qualitatively new structures (plasmoids) in current sheets that could affect 230 GHz variability. A deeper study of the resolution dependence of variability is clearly warranted but is beyond the scope of this paper.

B.2. Simulation Duration
The fiducial models are evolved for ∼ 30,000 GM/c 3 . Figure 20 compares M 3 distributions from a fiducial KHARMA simulation to a koral model with similar parameters that was evolved and imaged for approximately three times longer. The M 3 distributions have similar mean and standard deviation regardless of time interval chosen for comparison.

B.3. Effect of R low
The R high prescription (Equation (13)) has three free parameters: R high , R low , and β crit . In the main text the R high parameter is varied while R low and β crit are set to unity.
The R low parameter determines the electron temperature in regions of low β, i.e., in and near the funnel. Increasing R low mimics rapid electron cooling. We are particularly interested in the effect of increasing R low on M 3 . Figure 21 shows the M 3 distribution for a set of four KHARMA models with four values of R low (1, 2, 5, and 10). Evidently the M 3 distribution does not exhibit a clear trend with R high and is still inconsistent with the observed distribution even at R low = 10.
B.4. Effect of Self-consistent Electron Heating Ressler et al. (2015) provide a formulation to model electron thermodynamics during the fluid evolution. Numerical dissipation at the grid scale sources entropy generation and is used to heat the electrons based on a microphysical, subgrid heating prescription. Local fluid and electromagnetic variables are used to compute the electron entropy, which, along with the ideal gas equation of state, can be converted into a temperature Θ e . This approach allows computing the electron temperature at each time step of the simulation rather than post-processing, as is done in the R high and critical-β prescriptions.
We consider three subgrid heating models that prescribe the partition of dissipated energy into electrons and ions. Howes (2010) computed the ratio of ion to electron heating due to dissipation of Alfvénic turbulent cascade, while Werner et al. (2017) and Rowan et al. (2017) considered magnetic reconnection as the source of energy dissipation at subgrid scales. These studies provide approximate fitting formulae for the ion-to-electron heating rate Q i /Q e based on local ion-toelectron temperature ratio T i /T e and local magnetic field strength-parameterized by σ or β.
We use a subset of the simulations analyzed in Dexter et al. (2020). These include MAD and SANE accretion flows at spins a * = 0, + 1/2, + 15/16. We compute the 3 hr modulation index M 3 , over the time interval 5000-10,000 GM/c 3 . The average M 3 values are comparable to similar R high models, with SANE reconnection models exhibiting a slightly reduced variability as compared to the corresponding turbulent heating models. However, the M 3 distribution is still inconsistent with the historical data.

B.5. Effect of Fluid Adiabatic Index
We expect the ions and electrons in hot accretion flows to be thermally decoupled and the resulting plasma to be twotemperature (Shapiro et al. 1976;Quataert 1998;Sadowski et al. 2016;Chael et al. 2018;Ryan et al. 2018). The electrons are relativistic and can be modeled as a fluid with an adiabatic index Γ e = 4/3, while the ions are nonrelativistic with adiabatic index Γ i = 5/3.
The adiabatic index of the fluid assumes a value between Γ e and Γ i dictated by the thermodynamics of the ions and electrons (see Figure 4 in Sadowski et al. 2016). If the electrons and ions have equal temperature, then in the relativistic electron/nonrelativistic ion regime the fluid adiabatic index is 13/9.
Our simulations are not fully consistent in their treatment of the adiabatic index. All use a fixed Γ ad , but some set Γ ad = 4/3 while others use 13/9 or 5/3. Two-temperature simulations can self-consistently evolve adiabatic indices of electrons and ions and compute the net fluid adiabatic index with contributions from both species (Sadowski et al. 2016). These two-temperature simulations often show variation of the adiabatic index with polar angle, with the fluid energy dominated by hot electrons near the poles (Γ = 4/3) and by cooler ions and electrons in the midplane (Γ = 5/3).
We evaluate the effect of Γ ad on light-curve variability by comparing M 3 for thermal, GRMHD simulations with varying Γ ad . This includes MAD models with Γ ad = 13/9 (see Section B.2 and Narayan et al. 2022) and SANE models with Γ ad = 5/3. The models exhibit light-curve variability similar to the fiducial models, and all have M 3 distributions that are inconsistent with the historical data.

Appendix C Pass/Fail Plots
The full set of constraint results for the fiducial models is presented below in graphical form.
We start with the EHT constraints. Figure 22 shows the 230 GHz 2nd moment constraint and Figure 23 shows the null location constraint. Figures 24-26 show m-ring diameter, width, and asymmetry constraints, respectively. Figure 27 combines all the EHT constraints listed above.

38
The Astrophysical Journal Letters, 930:L16 (49pp), 2022 May 10 Event Horizon Telescope Collaboration et al. Figure 27. Combined EHT constraints. Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail. Figure 28. 86 GHz flux constraints. Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail. Figure 29. 86 GHz size constraints. Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail.

39
The Astrophysical Journal Letters, 930:L16 (49pp), 2022 May 10 Event Horizon Telescope Collaboration et al. Figure 31. X-Ray luminosity constraints. Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail. Figure 32. Combined non-EHT constraints. Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail.

40
The Astrophysical Journal Letters, 930:L16 (49pp), 2022 May 10 Event Horizon Telescope Collaboration et al. Figure 33. Combined constraints without structural or flux variability. Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail. Figure 34. M 3 constraints. Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail. Figure 35. EHT structural variability constraints. Green indicates that the KHARMA, BHAC, and H-AMR models pass, yellow that one or two of the fiducial models fail, and red that all three fail.
We then show the non-EHT constriants. Figures 28 and 29 are the 86 GHz flux and size constraints, respectively. Figures 30 and 31 show the 2.2 μm and x-ray constraints. Figure 32 combines all the non-EHT constraints. Figure 33 shows the combined constraints without structural or flux variability. Finally, Figures 34 and 35 show the M 3 constraints and structural variability constraints, respectively. These results are summarized in Section 4.1.4.
The full set of constraint results for the fiducial models is presented below in graphical form. We start with the EHT constraints. Figure 22 shows the 230 GHz 2nd moment constraint and Figure 23 shows the null location constraint. Figures 24-26 show m-ring diameter, width, and asymmetry constraints, respectively. Figure 27 combines all the EHT constraints listed above. We then show the non-EHT constriants. Figures 28 and 29 are the 86 GHz flux and size constraints, respectively. Figures 30 and 31 show the 2.2 μm and x-ray constraints. Figure 32 combines all the non-EHT constraints. Figure 33 shows the combined constraints without structural or flux variability. Finally, Figures 34 and 35 show the M3 constraints and structural variability constraints, respectively. These results are summarized in Section 4.1.4. In each plot the green models pass the constraint or constraints for KHARMA, BHAC, and H-AMR versions of the model, while the red models fail the constraint or constraints for KHARMA, BHAC, and H-AMR. The yellow models have different results for the different codes. Notice that H-AMR models are available only for i = 10°, 50°, 90°; at 30°and 70°o nly the KHARMA and BHAC models are compared.