Quantifying the Effects of Known Unknowns on Inferred High-redshift Galaxy Properties: Burstiness, the IMF, and Nebular Physics

The era of the James Webb Space Telescope ushers stellar populations models into uncharted territories, particularly at the high-redshift frontier. In a companion paper, we apply the Prospector Bayesian framework to jointly infer galaxy redshifts and stellar populations properties from broad-band photometry as part of the UNCOVER survey. Here we present a comprehensive error budget in spectral energy distribution (SED) modeling. Using a z phot > 9 sample, we quantify the systematic shifts stemming from various model choices in inferred stellar mass, star formation rate (SFR), and age. These choices encompass different timescales for changes in the star formation history (SFH), non-universal stellar initial mass functions (IMF), and the inclusion of variable nebular abundances, gas density and ionizing photon budget. We find that the IMF exerts the strongest influence on the inferred properties: the systematic uncertainties can be as much as 1 dex, 2–5 times larger than the formal reported uncertainties in mass and SFR; and importantly, exceed the scatter seen when using different SED fitting codes. This means that a common practice in the literature of assessing uncertainties in SED-fitting processes by comparing multiple codes is substantively underestimating the true systematic uncertainty. Highly stochastic SFHs change the inferred SFH by much larger than the formal uncertainties, and introduce ∼ 0 . 8 dex systematics in SFR and ∼ 0 . 3 dex systematics in average age. Finally, employing a flexible nebular emission model causes ∼ 0 . 2 dex systematic increase in mass, comparable to the formal uncertainty. This paper constitutes one of the initial steps toward a complete uncertainty estimate in SED modeling.


INTRODUCTION
Since the initial data releases from the James Webb Space Telescope (JWST), rapid progress has been made in photometrically identifying z > 9 candidates, leading to a new census of the early stages of galaxy formation (e.g., Finkelstein et al. 2022;Naidu et al. 2022;Donnan et al. 2023;Harikane et al. 2023).Among the surprising findings is the apparent over-abundance of high-redshift sources in comparison to theoretical predictions, or extrapolation of lower-redshift luminosity functions (Bouwens et al. 2023;Castellano et al. 2023;Ferrara et al. 2023;Mason et al. 2023;Mauerhofer & Dayal 2023;Pérez-González et al. 2023).Spectroscopic follow-ups of some of the candidates generally confirm their high-redshift nature, but also reveal lower-redshift interlopers (Arrabal Haro et al. 2023;Curtis-Lake et al. 2023;Wang et al. 2023b).
Two questions immediately emerge from the early searches for z > 9 candidates: the reliability of photometric redshifts and uncertainties in their key properties as inferred by stellar population synthesis (SPS) models.The flexibility in SPS modeling enabled by Bayesian SED fitting codes (Chevallard & Charlot 2016;Carnall et al. 2018;Johnson et al. 2021) means that these systematics can be investigated in detail.In a companion paper (Wang et al. 2023c), we release an SPS catalog, consisting of ∼ 50,000 redshifts and stellar population properties for sources in the Abell 2744 galaxy cluster field, as part of the UNCOVER survey (Bezanson et al. 2022).The UNCOVER SPS catalog is a large-scale application of the Prospector Bayesian framework (Johnson et al. 2021), and is representative of the state-of-theart in SPS modeling.Importantly, it provides a baseline for evaluating alternative SPS model choices in an exotic early universe now made eminently accessible by JWST.
SPS models include multiple ingredients, such as an initial mass function (IMF), stellar isochrones, and stellar spectra for the construction of simple stellar populations, SFH models and stellar metallicity models for the construction of composite stellar populations, a model for dust attenuation and emission, and nebular continuum and line emission (see Conroy 2013 for a review, and Section 5.2 of Wang et al. 2023c for a discussion in the context of the UNCOVER SPS catalog).Even * NHFP Hubble Fellow with data of exquisite quality and wavelength coverage at z ≲ 3, the dependence of inferred parameters on modeling assumptions is already visible (Pacifici et al. 2023).At z ≳ 10, SPS models have yet to undergo robust tests.This implies that the dominant uncertainty for high-redshift measurements is not usually measurement uncertainty, but instead systematic uncertainty which can potentially be orders of magnitudes higher than the former.The implications of different stellar libraries and functional forms of SFHs for JWST observations have been studied in some length (Whitler et al. 2023).This work focuses on a different set of systematics, encompassing the following three model choices.
First, a non-universal IMF.The relative numbers of stars as a function of the stellar mass influence most of the observable properties of stellar populations.The strongest effect is expected on the inferred stellar mass: at rest-frame optical/ultraviolet (UV) wavelengths for rising SFHs, we only observe M ≳ 20M ⊙ stars (e.g., Figure 5 in Conroy 2013), meaning that a considerable fraction of the reported stellar mass relies on an extrapolation from the assumed IMF shape.Despite such importance, a complete theoretical understanding of the origin of the shape of the IMF is still lacking (see Bastian et al. 2010;Smith 2020 for recent reviews).Introduced in Salpeter (1955), the functional form of the IMF above the solar mass scale remains remarkably consistent with a slope of ∼ −1.3.Since then, follow-up studies of the Galactic regions suggest deviations from the nominal IMF (Kroupa 2001;Chabrier 2003), but it is typically assumed to be constant cross time and environment.There is, however, mounting evidence in the nearby universe against a universal IMF (Treu et al. 2010;Conroy & van Dokkum 2012;Geha et al. 2013;McWilliam et al. 2013;van Dokkum et al. 2017).Recent studies have claimed that IMF variations may be particularly relevant in the context of early star-forming activities (Chon et al. 2022;Katz et al. 2022;Pacucci et al. 2022;Steinhardt et al. 2023).In this work we aim to quantify the systematic effects of IMFs that cover a reasonable range as suggested by observational studies.
Second, outshining.It has long been known that bursty star formation causes light from recent star formation to outshine that of older stellar populations (Papovich et al. 2001).Here burstiness refers to the pattern of sporadic star-forming activities interspersed with less active or "quiescent" phases.Such stochasticity is expected at high redshift as the feedback timescale approaches or exceeds the dynamical time of the system, decreasing and delaying the regulatory effect on the star formation rate (SFR) (Tacchella et al. 2016;Faucher-Giguère 2018;Dome et al. 2024).Observations also hint at bursty SFHs (Caputi et al. 2007;Smit et al. 2015;Guo et al. 2016;Looser et al. 2023).A recent bursty star formation in the detected high-redshift galaxies is also strongly motivated from their extremely blue UV slopes, which are explained by ejection of dust due to sudden enhancement of radiation pressure (Tsuna et al. 2023;Ziparo et al. 2023).The recent SFH significantly influences galaxy SEDs across the electromagnetic spectrum and particularly in the rest-frame UV, meaning that the inference of galaxy properties depends on the assumed level of burstiness (Furlanetto & Mirocha 2022;Endsley et al. 2023).Various previous studies have investigated the effect of varying SFHs and priors (Carnall et al. 2018;Leja et al. 2019a;Suess et al. 2022;Duan et al. 2023).We focus on assumed burstiness in the context of nonparametric SFHs, and test it by varying the prior on the change in SFRs in adjacent age bins.Our "bursty" prior represents a more extreme case than what is implied by the bursty continuity prior in Tacchella et al. (2022).This is an intentional choice as the goal is to perform a sensitivity test rather than propose a new physical model.
Third, nebular physics.Line and continuum emission can make up a significant fraction of the total flux for stars at low metallicity and at young ages, up to 50% of the flux in broadband filters for highly star-forming galaxies (Pacifici et al. 2015), implying increased importance at high redshift (Charlot & Longhetti 2001;Anders & Fritze-v. Alvensleben 2003;Reines et al. 2010;Smit et al. 2014).The most sophisticated approach of incorporating nebular physics into SED fitting thus far relies on a model grid pre-computed using photoionization simulations (Gutkin et al. 2016;Byler et al. 2017).While such an approach better accounts for variations in the recent SFH and non-solar metallicity than analytic prescriptions (Fioc & Rocca-Volmerange 1999;Leitherer et al. 1999;Mollá et al. 2009), the accessible parameter space is still limited.The diverse chemical environment that may be present at early epochs (e.g., Bunker et al. 2023;Katz et al. 2023), and the possibility of strong optical nebular emission lines at z < 6 mimicking JWST/NIRCam photometry of high-redshift galaxies (McKinney et al. 2023;Zavala et al. 2023) renew the interest in the modeling of nebular emission with increased flexibility in elemental abundance ratios and separate parameterizations of gas-phase number density and ionizing photon density, while retaining consistency with the ionizing flux emitted by the model stellar populations.We do so by integrating a neural net emulator for the spectral synthetic code Cloudy (Chatzikos et al. 2023), dubbed Cue (Li et al. in prep.), into Prospector.
The structure of this paper is as follows.Section 2 summarizes the JWST observations, including our sample of z > 9 candidates.Section 3 details our fiducial as well as new modeling approaches regarding the IMF, SFH prior, and nebular physics.Section 4 presents the effects of new models on the inferred parameters.We discuss the error budget as well as possible ways forward in Section 5, and conclude in Section 6.

Fiducial Redshifts and Stellar Population Properties
Here we only briefly reiterate the modeling components for completeness.All parameters, including redshift, are inferred jointly using the Prospector inference framework (Johnson et al. 2021), adopting the MIST stellar isochrones (Choi et al. 2016;Dotter 2016) and MILES stellar library (Sánchez-Blázquez et al. 2006) from FSPS (Conroy & Gunn 2010).A non-parametric form is assumed for the SFH, defined by mass formed in 7 logarithmically-spaced time bins (Prospector-α; Leja et al. 2017).Only the contribution of obscured active galactic nuclei (AGN) is considered following Nenkova et al. (2008a,b), and parameterized by the normalization and dust optical depth in the mid-infrared (Conroy et al. 2009;Leja et al. 2018).A mass function prior, and a dynamic SFH(M, z) prior are included to optimize the photometric inference over the wide parameter space covered by deep JWST surveys (Wang et al. 2023d).Lensing magnification, computed from the public v1.1 UNCOVER lensing maps by Furtak et al. (2023), is performed on-the-fly to ensure consistency with the scaledependent priors.The model consists of 18 free parameters, and sampling is performed using the dynamic nested sampler dynesty (Speagle 2020) with a speed up in model generation enabled by a neural net emulator dubbed parrot (Mathews et al. 2023).

High-redshift Candidates
We select z phot > 9 candidates directly from the newly-public SPS catalog, demonstrating the quality of the catalog as well as the exciting parameter space now accessible via deep JWST surveys.Specifically, the criteria are 1.Redshift posteriors in the Prospector-β (Wang et al. 2023d) fits The first criterion defines our selection-criterion 1a imposes the redshift cut, whereas 1-b decreases the probability of low-redshift interpolators given the multimodality commonly seen in the posterior distribution.The rest of the criteria is to ensure the purity of the sample: χ 2 red < 5 is applied to ensure a reasonable fit; that is, to remove data defects in the photometric catalog that are difficult to be described by SPS models.The S/N cut in the photometry is determined empirically after visual inspection of the 31 candidates given the first 4 criteria.
Our selection eventually yields 18 candidates, which are listed in Table 1.In the same table, we cross-match our sample to existing candidates in the literature, and include available spectroscopic redshifts as well.We note that the spectroscopic redshifts are systematically lower than the photometric redshifts by ∼ 1.5 − 2σ, which may be a manifestation of the "Eddington bias" claimed in Serjeant & Bakx (2023).Further details on this sample are supplemented in the Appendix, which include the new candidates from this paper, a possible artifact, and discrepant photometric and spectroscopic redshifts.
In principle, the cutoff based on the maximumlikelihood redshifts is unnecessary if the posterior mass is located correctly and consistently in the full sample.However, as it is still in early stage of photometry calibration for JWST, we impose this criterion for a cleaner sample.In the future, removing this would provide a more independent approach for selecting high-redshift candidates, complementing the traditional color selection technique (e.g., Castellano et al. 2022;Atek et al. 2023b).
We now proceed to investigate the effects of various modeling choices on the inferred redshifts and galaxy properties.It is worth noting that in order to reach a definitive conclusion regarding the signals of interest of this paper, relatively high S/N spectra with medium to high-resolution would likely be necessary.For instance, rest-frame near-infrared spectra with S/N ≳ 100 are needed for placing constraints on the IMF in the local universe (Conroy & van Dokkum 2012;van Dokkum & Conroy 2012).Given the difficulty in acquiring such data at high redshift, the scope of this paper is restricted to exploring the systematic effects on inferred properties when fitting broad-band photometry.

VARYING KNOWN UNKNOWNS IN STELLAR POPULATION INFERENCE
While the current inference framework is highly flexible, the known unknowns in the SPS modeling may induce systematic shifts in the inferred parameters comparable to or larger than the uncertainties in the fiducial model.The companion paper (Wang et al. 2023c) discusses some of the assumptions that are particularly relevant to the UNCOVER SPS catalog.This work further examines the influences of a steeper/flatter IMF, bursty star formation, and more flexible nebular emission modeling, as summarized in Figure 1.We elaborate on both fiducial and alternative models below.

Initial Mass Function
The fiducial IMF is taken from Chabrier (2003).We consider two alternative IMFs described by a power law of slope x over the full mass range as Two variations are examined, x = 1.8 and x = 2.8, covering a reasonable range as suggested from observations (Cappellari et al. 2012;Martín-Navarro et al. 2015;van Dokkum et al. 2017).For the main analyses, we define all the IMFs over the interval 0.08 < M ⋆ < 120 M ⊙ for a controlled experiment.However, the minimum (m l ) and maximum stellar mass cutoffs remain uncertain.The low mass end of the IMF is particularly difficult to constrain since these stars are largely invisible outside the Milky Way.We thus additionally consider two cases where m l = 0.5M ⊙ and m l = 1M ⊙ (Kroupa 2001;Chabrier 2003).
Since we intend to test the scenario in which the IMF varies across cosmic time, and changing the IMF affects the scale of the predicted spectrum, it is necessary to modify the mass prior in Wang et al. (2023d) accordingly.We find the normalization factor by minimizing the χ 2 between spectra predicted by different IMFs, and simply scale the mass prior by this factor.Strictly speaking, IMF(z) describes instantaneous SFR, but stellar mass is determined by effective IMF from all stars formed.We thus note that our modification on the mass The gray dashed curve separating AGNs and starburst galaxies is taken from Kauffmann et al. (2003).
prior serves only as an approximation to mitigate the bias that may be induced by the scale difference among different IMFs.The fact that we assume the same IMF over the entire history justifies this simplification.
Effectively, the same stellar mass function prior is applied to all fits.Therefore, our choice of the prior has no influence on the systematic effects to be quantified as these are the relative shifts in the inferred parameter with respect to the fiducial values.In addition, the mass prior is constructed by using the observed mass functions between 0.2 < z < 3 from Leja et al. (2020); for z > 3, we adopt the nearest-neighbor solution, i.e., the z = 3 mass functions, in the absence of reliable highresolution rest-frame optical selected mass functions at z > 3.As mentioned in Wang et al. (2023d), this choice allots a conservatively high probability for yetto-be-discovered populations of high-mass, high-redshift galaxies.

Timescales for Changes in SFH
All SFHs are modeled non-parametrically via mass formed in 7 logarithmically-spaced time bins (Leja et al. 2017).In our fiducial setup, we use the dynamic SFH prior from Wang et al. (2023d).The expectation values of the logarithmic SFR ratios in adjacent time bins are based on the observed cosmic SFR density in the empirical UniverseMachine model (Behroozi et al. 2019), favoring rising SFHs in the early universe and falling SFHs in the late universe, and adjusted as a function of mass to reflect downsizing (Cowie et al. 1996;Thomas et al. 2005).The distribution of the logarithmic SFR ratios is modeled as a student-t distribution.
An additional bursty SFH is generated by replacing the student-t distribution with a uniform prior spanning from -5 to 5.This means that the SFR is effectively uncorrelated between the time bins, and can jump up to 10 orders of magnitude in a single time step.Our prior is similar in concept to the bursty continuity prior in Tacchella et al. (2022), but represents a more extreme bursty scenario.

Nebular Physics
Prospector models the nebular emission selfconsistently via a pre-computed Cloudy grid (Byler et al. 2017).The parameter space is limited for a practical reason: grid-based approaches necessarily suffer from the curse of dimensionality.
We replace the grid with a neural net emulator, dubbed Cue (Li et al. in prep.), which emulates the continuum and line emission from a single H ii region based on the spectral synthetic code Cloudy (version 22.00; Chatzikos et al. 2023).Cue is tailored to flexibly model exotic chemistry and unusual ionizing properties of galaxies, and is well-suited for galaxies in the early universe.It models the ionizing input spectra as a flexible 4-part piecewise-continuous power-law, along with freedom in gas density, total ionizing photon budget, O/H, C/O, and N/O.While in principle this power law approximation allows us to infer the ionizing spectrum separately from the model stellar spectra, here we fit this piecewise power law to the stellar ionizing spec-tra from the model for each spectrum generated.With the exception of carbon and nitrogen, the element abundances are scaled with the oxygen abundance, based on solar abundances and dust depletion factor specified in Dopita et al. (2000).
The newly gained flexibility, however, does not a priori forbid unlikely nebular abundances.Certain chemical ratios are preferred on galactic scales due to the production rate of various elements; for example, N/O as a function of oxygen abundance has a well-defined shape following from the secondary production of nitrogen.We thus place a Gaussian prior on the N/O ratio against oxygen abundance based on the observed trend.The Gaussian mean and standard deviation are estimated from the data compiled by Groves et al. (2004).In the low oxygen abundance regime where no data is available, we take values from the model grid in Gutkin et al. (2016).

RESULTS
The effect of these changes on the stellar mass, SFR, and stellar age are summarized in Figures 2, where these key parameters inferred from alternative models are plotted against the fiducial values.Furthermore, the preference of a given model by the data is quantified via the Bayesian evidence in Figure 4.The results regarding each model are presented in more detail below.

Alternative Initial Mass Functions
Changing the slope of the IMF induces a systematic shift in the inferred stellar mass and SFR by ∼ 0.3 − 1 dex, as suggested in Figure 2. It has a negligible effect on the inferred stellar age.The effects of varying the minimum stellar mass cutoff is shown in the same way in Figure 3.More specifically, assuming a top-heavy (x = 1.8)IMF systematically decreases the inferred stellar mass and SFR by ≳ 0.3 dex; increasing m l from 0.08 M ⊙ to 0.5 or 1 M ⊙ further enhances the systematic shifts by ≳ 0.1 dex.Assuming a top-light (x = 2.8) IMF increases the inferred stellar mass and SFR by ∼ 1 dex; increasing m l , however, significantly decreases the systematic shifts.With x = 2.8, m l = 1M ⊙ , we recover the fiducial Chabrier masses and SFRs.Our results are consistent with Woodrum et al. (2023), in which different IMFs are used to fit a sub-sample in the JADES photometric catalog (Eisenstein et al. 2023;Rieke et al. 2023).
Importantly, the Bayes factors (Trotta 2008), plotted in Figure 4, indicate that the data are almost completely agnostic to the different forms of IMFs.Further intuition can be gained from examining a single fit.This is illustrated in Figure 5, where the posterior median model photometry and spectra from different IMF cases are plotted.All models can fit the observed photometric data equally well, and the model spectra only differ marginally from each other.The longer wavelength range covered by JWST/MIRI may offer additional distinguishing power.Similar results are found for the rest of the sample.

Bursty Star Formation Prior
The bursty SFH prior of this paper causes negligible systematic changes in the inferred stellar masses.However, it tends to increase the uncertainty in the inferred SFR averaged over the most recent 10 Myr, SFR 10 , with a subset of the fits having these recent SFRs that significantly deviate from their fiducial values in Figure 2. In contrast, the inferred SFRs averaged over a longer time scale of the most recent 100 Myr, SFR 100 , are much less affected.Additionally, the bursty prior leads to a systematic decrease in the inferred average stellar ages of all galaxies, of order 0.2 − 0.5 dex.
The left panel of Figure 6 shows the inferred SFHs for the object where the most dramatic change in SFR is observed.The increase in the stochasticity in the SFH due to the bursty prior is clearly visible, and at almost all lookback times the star formation rate is systematically changed by much larger than the size of the formal uncertainties.
Similar to the IMF case, diagnostics based on the Bayes factor, as well as the median model photometry and spectra show that the observed photometric data have little power to discriminate between bursty and smooth SFHs in general.This is in spite of the fact that here we contrast extreme opposite scenarios of an extremely bursty prior with a relatively smooth prior.We note that this null result differs from that of a recent study, which claims that stochastic SFHs are found at z ∼ 9 (Ciesla et al. 2023) using the JADES photometric catalog (Eisenstein et al. 2023;Rieke et al. 2023).Given the relatively weak constraining power of broadband photometry coupled with the uncertain redshifts, more informative data, in particular spectroscopic observations, are likely required to reach a definitive conclusion.

Flexible Nebular Emission Modeling
The more flexible nebular emission modeling, enabled by the Cloudy emulator, Cue, causes differences in the inferred masses and SFRs that are within 1σ of the formal uncertainties.However, consistent increases in the masses and SFRs are found of order ∼ 0.2 dex.We note that this mass offset is more significant that an extrapolation from studies at lower redshift (e.g., the ∼ 0.12 dex difference found in Li & Leja 2022), which is a natural consequence of the z > 9 population being more sensitive to nebular modeling than the regular galaxy population.
It is also a curious finding that the scatter in the model comparison plot of Figure 4 is noticeably larger than those of the above two cases-this suggests that, unlike the IMF and SFH timescales, nebular properties may well be able to be constrained from broad-band photometry, and this constraining power would be substantially increased by medium-band photometry.
To gain more insights into the influence of the flexible model, we show the emission line ratios calculated from posterior samples for one of the candidates in Figure 6.The recent measurement of GN-z11 is included for reference (Bunker et al. 2023).Interestingly, the line     ratios of GN-z11 seem to occupy a space that is better sampled by Cue.We note that this particular galaxy is claimed to host a bright AGN, which also explains its peculiar abundance (Maiolino et al. 2023;Scholtz et al. 2023).Line emission due to AGN are capable of being modeled with Cue; however, here we assume the ionizing spectra of our galaxies are dominated by stars, and use Cue such that the ionizing input follows the stellar ionizing input.We opt not to allow full freedom in the ionizing spectra since the photometric data lack the constraining power to distinguish between the stellar and AGN contributions.The ability to sample the parameter space occupied by GN-z11 thus indicates that the emulator is flexible enough to model a stellar ionizing spectra mimicking an AGN spectrum.These results suggest that while more flexible nebular models have a relatively less strong effect on bulk properties inferred from SED-fitting than key uncertainties like the IMF and SFH timescales, they will be very important in interpreting more detailed spectroscopic observationsand they can be observationally constrained.We discuss the various implications from these results in the following section.

Systematic Uncertainties from Fixed Parameters in SED-modeling
Two immediate conclusions can be reached from Section 4: first, changing the IMF and the SFH prior produces substantial systematic effects on some of the key stellar population parameters; second, the broad-band photometric observations available from JWST cannot distinguish between the model choices.
It is thus important to ask, given the reasonable model choices and data, what the "real" uncertainties are for high-redshift galaxies.One way to address this question is by comparing the systematic uncertainties to the reported formal uncertainties.Here a formal uncertainty refers to the 16th-84th percentiles of posterior distributions from a single fit, where the different model choices cannot yet be marginalized.We compare the two kinds of uncertainties as follows.
For each fit, we subtract the medians from the posteriors, which are then stacked to provide an estimate of the average formal uncertainty.The resulting distribution for each of the inferred parameters is well approximated by a Gaussian function.We quantify the systematic uncertainty in a similar manner, except that we subtract the median calculated from the fiducial fit from the posteriors.A Gaussian function is fitted to the stacked posteriors, of which the mean corresponds to the average systematic shift, and the standard deviation indicates the range of possible systematic shifts.The black curve is a Gaussian fit to the stacked posteriors with median subtracted for the Chabrier stellar masses.The fit suggests that the typical reported uncertainty, σ, is about 0.20 dex.The orange (green) curve is a Gaussian fit to the stacked posteriors of masses assuming a x = 1.8 (x = 2.8) IMF, with median of the Chabrier masses subtracted.The mean, -0.38 (0.97), describes the typical systematic shift after changing the IMF, whereas the standard deviation, 0.31 (0.16), indicates the spread in the systematics.Overplotted as the gray shade is the scatter driven by using different SED fitting codes (Pacifici et al. 2023).The other two plots contrast the uncertainties and the systematics in the SFRs and stellar ages, respectively.Critically, the systematics in the inferred stellar mass and SFR due to different IMF choices can be at least comparable to and often larger than the standard uncertainties (posterior moments), and the scatter seen when fitting the an object with different codes.(Middle panel) The same as the upper panel, but showing the systematics introduced by adopting a bursty prior instead of a rising SFH prior (Wang et al. 2023d).The stochasticity in the SFH leads to a large spread in the SFR posteriors.The typical systematic uncertainty in SFR average over the most recent 10 Myr is of order ∼ 0.84 dex, and with a large spread of ±1.47 dex.However, SFR averaged over the most recent 100 Myr is less prone to the change in priors.The typical systematic uncertainty decreases to 0.05 dex, with a spread of ±0.50 dex.The typical systematic uncertainty in age is −0.31 ± 0.34 dex.(Lower panel) The typical systematic uncertainty in inferred mass introduced by modeling the nebular physics with more flexibility while retaining consistency with the model stellar populations is comparable (∼ 0.16 dex) to the formal uncertainty.A similar systematic uncertainty in SFR (∼ 0.15 dex) is also observed.
The typical formal and systematic uncertainties of stellar mass, SFR, and stellar age are reported in Figure 7.For the case of a non-universal IMF, the usually unaccounted systematic uncertainty in stellar mass (0.4 ± 0.3 dex for a flatter, top-heavy IMF, and 1.0 ± 0.2 dex for a steeper, top-light IMF) can be 2-5 times larger than the formal uncertainty (∼ 0.2 dex).The same applies to SFR, where the formal uncertainty is found to be ∼ 0.2 dex, but the systematic uncertainties are 0.3 ± 0.3 dex and 1.0 ± 0.1 dex for a top-heavy and a top-light IMF, respectively.In addition, theoretical models have indicated even more exotic IMF shapes at high redshift (e.g., a flat IMF in Chon et al. 2022), or possible environmental dependences (Gunawardhana et al. 2011;Jeřábková et al. 2018;Yan et al. 2019).Here we only intend to cover the range of the IMF slopes suggested by observations.
Critically, a steep IMF, motivated on both theoretical (Chabrier et al. 2014) and observational grounds (Conroy et al. 2017;van Dokkum et al. 2023), leads to systematics that always dominate over the error budget from all other uncertainties normally considered.These values are also greater than scatter caused by fitting a sample of galaxies with different SED-fitting codes, which are found to be ∼ 0.1 dex and ∼ 0.2 dex for stellar mass and SFR, respectively (Pacifici et al. 2023).This means that a common practice in the literature of assessing uncertainties in SED-fitting processes by comparing multiple codes is substantively underestimating the true uncertainty, as these codes typically use similar assumptions for the IMF, the smoothness of the SFH, and nebular emission.
We note that the minimum and the maximum stellar mass cutoffs, which are held fixed in the above analyses, may further affect the systematic uncertainties.Of particular interest is the low-mass end of the power law, as for a steeper, top-light IMF, the stellar mass would be dominated by stars around the lower limit of the IMF.Thus far, we focus on the discussion on the varying IMF slopes because low-mass stars make up a negligible fraction of the total light.We additionally include Figure 8 showing the systematic uncertainties with increasing lower mass limits to illustrate the degeneracy between m l and inferred stellar mass.For the top-light IMF, assuming m l = 0.5M ⊙ , the systematic uncertainties of stellar mass and SFR decrease to 0.3 ± 0.2 dex, comparable to the formal uncertainties; assuming m l = 1M ⊙ , the inferred parameters exhibit negligible differences comparing to the fiducial Chabrier values.It is, however, also easy to imagine performing a similar analysis for IMFs with increasing hidden mass, either owing to lower m l or steeper slopes at the low-mass end.These alternatives are difficulty to be ruled out as low-mass stars would be largely invisible outside the Milky Way, and would increase the systematic uncertainties.For the top-heavy IMF, the results are less sensitive to the lower limit.The stellar masses are systematically decreased by 0.5 ± 0.4 (0.6 ± 0.4) dex, and the SFRs are similarly decreased by 0.4 ± 0.4 dex, if The same analysis is performed for the SFH priors which assume different timescales for changes.Also plotted in Figure 7, we find that stochastic SFHs, which are expected at high redshift (Tacchella et al. 2016;Faucher-Giguère 2018;Legrand et al. 2022;Dome et al. 2024), produce significant systematics in SFR averaged over the most recent 10 Myr (0.8 ± 1.5 dex), and in stellar age (0.3 ± 0.3 dex).The formal uncertainties in these two parameters are ∼ 0.2 dex and ∼ 0.3 dex, respectively.It is worth noting that the systematics in the SFR can be mitigated by averaging the SFR over a longer time scale.Using the most recent 100 Myr, the systematic uncertainty in SFR decreases to 0.1 ± 0.5 dex.In addition, it may be surprising to see that the bursty prior does not lead to significant systematics in the inferred stellar mass (0.0 ± 0.4 dex).This is likely because of the construction of our fiducial SFH prior from Wang et al. (2023d), which has been shown to recover stellar mass reasonably well even in the presence of very bursty SFHs from simulations (Narayanan et al. 2023).
In contrast to the substantial systematic shifts found in the above two cases, a more flexible nebular emission model results in systematic uncertainties in mass and SFR (0.2 ± 0.3 dex) that are comparable to the formal uncertainty (∼ 0.2 dex).The smaller systematic changes mean that even in the case of very high emission line equivalent widths expected in the first galaxies, the assumption of solar-scaled nebular abundance patterns, which are not at all expected to hold at early times, do not cause factor-of-two or more systematics in the inferred masses, SFRs, and ages.Furthermore, the larger scatter in the Bayesian evidences and the differences in the observed line ratios suggest that this is a solvable systematic with future observations-spectroscopy, or even photometry with relatively higher spectral resolution.
A counterintuitive finding, however, is that the estimates for stellar mass and SFR of one galaxy, ID 14088, using the model with increased flexibility are much stronger constrained than the original model.In addition, the Bayesian evidence strongly favors the original model for this particular galaxy.Upon examination, the fiducial fit has a bimodal posterior distribution for the stellar-phase metallicity, but one of the solutions is disfavored by Cue, thus decreasing the error bars.A possible reason is that while Cue finds a marginally better single mode with the increased flexibility, the likelihood of this mode is not high enough to compensate for the added four parameters.
Finally, we note that the contribution from AGN can have significant influence on the stellar mass estimates (D 'Silva et al. 2023).In this work, we only model the mid-infrared emission from the AGN, and find that our sample has low AGN contribution.The mean AGN to galaxy bolometric luminosity is ∼ 0.004.However, we caution that the wavelength coverage of our photometric data is not expected to process constraining power for the particular model considered.The AGN modeling in Prospector will be improved in future works.

Implications for Early Galaxy Formation and Possible Ways Forward
Given the significant systematics in parameters inferred by SED-fitting combined with the lack of shortterm prospects for observational constraints on said parameters, one could argue for including these systematic uncertainties in our error budget for all galaxies.Indeed, the IMF-induced systematics are often larger than the uncertainties and systematics usually considered in the stellar mass functions (Behroozi et al. 2010;Conroy 2013;Speagle et al. 2014;Grazian et al. 2015;Santini et al. 2015;Carnall et al. 2018;Leja et al. 2019b;Furtak et al. 2021).
More recently, various papers have challenged the validity of galaxy formation models or ΛCDM in light of newly discovered galaxy populations with JWST (e.g., Casey et al. 2023;Labbé et al. 2023;Xiao et al. 2023).A non-universal IMF can serve as a possible explanation for the overly massive galaxies, as pointed out in Boylan-Kolchin (2023); Steinhardt et al. (2023).Taking into account the systematic uncertainties found in this paper can, for example, bring the star formation efficiency expected from the Labbé et al. (2023) sample into 1σ consistency with a reasonable value of ≲ 0.32 at z > 8, and that expected from the Xiao et al. (2023) sample to ≲ 0.2 at z > 5; thus removing the need for exotic cosmological models (e.g., Gong et al. 2023;Padmanabhan & Loeb 2023;Parashari & Laha 2023).This is intriguing given that the modifications in the power spectrum necessary for solving the tension with ΛCDM can induce conflicts at lower redshifts (Sabti et al. 2023).
Unfortunately, observationally constraining the IMF beyond the Milky Way is a known challenge.Thus far most of the meaningful constraints come from independent measurements on the mass-to-light ratio from gravitational lensing (Treu et al. 2010;van Dokkum et al. 2023) or dynamical modeling (Cappellari et al. 2013;Posacki et al. 2015), or from direct observational signatures of low-mass stars in high S/N, R ∼ 3000 near-IR spectra (Conroy & van Dokkum 2012;van Dokkum et al. 2017).These approaches are difficult to generalize to a statistical sample at cosmological distance.
Therefore, it seems necessary to fully incorporate the systematic uncertainties in the error budget.This is, however, non-trivial via conventional techniques.To start, including IMF as a free parameter requires rebuilding the stellar library in every fit, which makes it infeasible to sample the likelihood surface within a reasonable amount of time.Possible solutions include machine-learning-accelerated SED fitting of individual galaxies (Hahn & Melchior 2022;Wang et al. 2023a), or population-level inference (Alsing et al. 2023;Li et al. 2024).For the former, once a suitable training set is constructed, it becomes straightforward to marginalize over IMF variations.The latter approach directly makes the population distribution the inference objective, which drastically increases the computational efficiency.
The over-abundance of high-redshift galaxies is another interesting discovery from early JWST observations (e.g., Yung et al. 2024).Bursty star formation has been invoked as a possible explanation (Pallottini & Ferrara 2023;Shen et al. 2023;Sun et al. 2023).While we find that adapting a bursty prior can fit the photometric observations equally well, the current data do not offer evidence for or against such expectation.Understanding the prevalence of bursty star formation, and quantifying the resulting selection effects (i.e., galaxies oscillating in and out of the observable sample due to SFR variability) likely requires at least spectroscopy in order to measure the different SFR and SFH indicators that can constrain burstiness.Perhaps the most convincing evidence would stem from exploring population distributions of these spectral indicators.
As for nebular physics, the Cloudy emulator, Cue, allows for the exploration of a wide parameter space while retaining consistency with stellar populations; particularly relevant to this work is the BPT region occupied by GN-z11, which is sparsely sampled in the standard Cloudy grid (Byler et al. 2017).This indicates that, with more informative spectroscopic or medium-band photometric data, the flexible model is a promising approach to characterize the exotic nebular conditions in the early universe.
Having discussed the systematic effects on the inferred galaxies properties, we would like to add that none of the alternative models leads to dramatic changes in the inferred photometric redshifts.For the z > 9 range studied in this work, the photometric redshifts are almost en-tirely determined by the Lyman break.The Cue parameterization does not influence the broad-band photometry substantial enough to change the results.This means that the uncertainties in the photometric redshifts are mainly driven by the modeling of Lyman-α and its interaction with the intergalactic medium, although nebular emission lines such as Hα have been shown to be able to resemble the Lyman break in photometric data by spectroscopic follow-ups (Arrabal Haro et al. 2023).In the near future, we plan to incorporate a phenomenological model to describe the radiative transfer process (e.g., damping wings; Curtis-Lake et al. 2023).With that being said, the transfer of resonant lines in gas is an intricate process, the accurate description of which requires numerical radiative transfer simulations (e.g., Gronke et al. 2017;Michel-Dansac et al. 2020).There is much room for improvement to fully capture the complexity of the propagation of Lyman-α.
Finally, we note that the systematics analyzed for the z phot > 9 sample of this paper are expected to be present across redshifts, albeit with varying degrees of importance.On the observational front, JWST has been revealing new galaxy populations and establishing statistical samples, reaching uncharted regions in the cosmic history.Sophisticated modeling of galaxy populations that encapsulates the full uncertainties, analogous to the framework proposed for redshift distribution inference in Alsing et al. (2023), would be invaluable in forming a coherent narrative of galaxy evolution over the observed dynamic range.

CONCLUSIONS
In this paper, we quantify the systematic uncertainties from different model choices on inferred high-redshift galaxy properties.We conduct our experiment on a z > 9 sample selected from the publicly available UN-COVER SPS catalog (Wang et al. 2023c).As a largescale application of the Prospector Bayesian inference framework, where redshifts and stellar population parameters are inferred jointly, the UNCOVER SPS catalog provides a baseline against which the fundamental assumptions in SED modeling can be tested.In particular, we investigate the three model choices as follows.
First, we fit the photometry assuming a flatter/steeper IMF in addition to the fiducial Chabrier IMF.The photometric data are firmly unable to distinguish between particular forms of IMF.We find that a flatter IMF (x = 1.8) tends to systematically decrease the inferred stellar mass by ∼ 0.4 dex, and SFR by ∼ 0.3 dex.A steeper (x = 2.8) IMF typically increase the inferred stellar mass and SFR both by ∼ 1.0 dex.These values are notably larger than the reported uncertainty of ∼ 0.2 dex on mass and SFR from fits assuming a fixed IMF, and also the scatter found when fitting data using different SED codes (∼ 0.1 dex in mass and ∼ 0.2 dex in SFR; Pacifici et al. 2023).However, these results can be sensitive to the lower mass limits of the IMF, which increase or decrease the systematic uncertainties depending on the change in the amount of hidden mass.Observationally, this is hard to measure, since the total integrated light is dominated by massive stars.Nevertheless, taken together with the lack of IMF constraints at high redshift and the motivations for a non-universal IMF on observational as well as theoretical grounds, we infer that the usually reported uncertainties on stellar mass and SFR are likely to be underestimated.
Second, we adopt an extremely bursty star formation prior, in contrast to the fiducial rising SFH prior proposed in Wang et al. (2023d).Similar to the varying IMF case, the Bayes factor shows no particular preference given the photometric data.The bursty prior leads to > 1 dex deviations from the fiducial SFRs averaged over the most recent 10 Myr in some cases, however the large systematics can be mitigated by averaging the SFRs over longer timescales, e.g., 100 Myr.The typical systematic decrease is ∼ 0.8 dex in inferred SFR 10 , which is greater than the formal uncertainty of ∼ 0.2 dex, and ∼ 0.1 dex in SFR 100 .Therefore, care must be taken when attempting to infer burstiness by comparing SFRs averaged over different time scales.Additionally, the bursty prior results in marginally younger stellar ages for the full population, of an order of ∼ 0.3 dex, comparable to the formal uncertainty.It is worth emphasizing that, while being able to fit the observations equally well, the inferred SFR assuming the bursty prior is systematically changed by much larger than the size of the formal uncertainties at almost all lookback times.This means that the inference on SFH from broad-band photometry is prior-dominated rather than likelihooddominated; in other words, choosing the appropriate prior is critical for SFH studies, as pointed out in Leja et al. (2019a); Suess et al. (2022); Tacchella et al. (2022).
Third, we implement a Cloudy emulator (Cue; Li et al. in prep.)into Prospector, which allows us to model the exotic nebular conditions in the early universe while retaining consistency with the assumed stellar populations.This results in systematic uncertainties in mass and SFR (∼ 0.2 dex) that are comparable to the formal uncertainty.Interestingly, utilizing Cue also leads to better explored parameter space on the BPT diagram where GN-z11 resides.This implies that flexible nebular emission modeling would be valuable for both for describing and interpreting chemical evolution given more informative (higher resolution) data.
To conclude, the new redshift frontier established by JWST pushes stellar population models into new, exciting, and largely uncalibrated regimes.This work represents a first step toward a comprehensive error budget in inferred galaxy properties from SED modeling, paving the way for an accurate accounting of the intricate processes governing galaxy formation and evolution.

Figure 1 .
Figure 1.Model choices tested in this paper.(Left) The fiducial Chabrier IMF is plotted in black.A flat (x = 1.8)IMF is shown in orange, whereas a steep (x = 2.8) IMF is shown in green.f is the ratio of the numbers of low-mass (0.08 < M⊙ < 0.5) to massive (5 < M⊙ < 120) stars.(Middle) Prior distributions of specific SFRs for a galaxy observed at z = 10 are plotted as functions of lookback time in Gyr.The fiducial rising SFH prior (Wang et al. 2023d) is illustrated by the black curves, while the extremely bursty prior is illustrated by the cyan curves.Shading indicates 16th and 84th quantiles.(Right) The BPT diagram demonstrates the model coverage in the standard Cloudy grid (Byler et al. 2017) and in the more flexible model enabled by Cue.The gray dashed curve separating AGNs and starburst galaxies is taken fromKauffmann et al. (2003).

Figure 2 .
Figure 2. Changes in inferred parameter due to different model choices.The stellar mass, SFR, and mass-weighted age (MWA) are plotted against the fiducial values in each panel.(Upper panel) The comparison between results assuming an IMF with x = 1.8 (x = 2.8) and the fiducial Chabrier IMF are shown in orange (green).Significant systematic shifts in stellar mass and SFR are found.(Middle panel) The fiducial model assumes a rising SFH prior (Wang et al. 2023d), while the alternative model assumes a bursty prior.The SFRs averaged over the most recent 10 Myr (SFR10, unfilled circles show significant scatter driven by the stochasticity in the SFH, but the SFRs averaged over the most recent 100 Myr (SFR100, filled circles) are less affected.The a bursty prior also cause a systematic decrease in stellar age.(Lower panel) The fiducial model computes nebular emission and continuum from a pre-computed Cloudy grid (Byler et al. 2017), while the alternative model uses a Cloudy emulator dubbed Cue.The extra flexibility enabled by Cue systematically increases the inferred mass and SFR by ∼ 0.1 − 0.2 dex, similar in size to the 1σ measurement uncertainties indicated by the error bars.

Figure 3 .
Figure 3. Same as Figure 2, but showing the changes in the inferred parameters with increasing minimum stellar mass cutoffs, m l , of a given IMF.

Figure 4 .
Figure 4. Model comparison.Logarithm of the Bayes factor comparing different model choices is calculated for each candidate.A value of B * /B > (<) 1 represents an increase (decrease) of the support in favor of the alternative model given the observed data(Trotta 2008).The boundaries between different shading indicate weak evidence (|B * /B| = 1), moderate evidence (|B * /B| = 2.5), and strong evidence (|B * /B| = 5), respectively.In most cases, the data are agnostic of the model choices.

Figure 5 .
Figure 5. Model photometry and spectra for one of the galaxies as an example.(a) Photometric data are shown as gray squares.Median model spectra assuming the different IMFs are plotted as functions of observed wavelengths in the left panel, and the shading indicates the 16th and 84th quantiles.Median model photometries are over-plotted in the same colors.The right panel is the same, but compares the fiducial model with one which assumes a bursty prior.In all cases, the model photometric points fit the data equally well, and appear to be identical, whereas the model spectra exhibits marginal difference.(b) Same as the upper panel, but including the longer wavelength range covered by JWST/MIRI.The lower panels here show the differences between the photometries of the alternative models and the fiducial model, normalized by the 1σ uncertainty in the fiducial model photometry.

Figure 6 .
Figure6.Inferred properties for one of the galaxies as an example.(Left) The inferred SFH assuming a rising SFH prior(Wang et al. 2023d) is plotted as a function of lookback time in Gyr in black, whereas that assuming a bursty prior is plotted in cyan.The bursty prior increases stochasticity as expected, and importantly, the SFR is systematically changed by much larger than the size of the formal uncertainties shown in gray shading at almost all lookback times.(Right) Emission line ratios calculated from posterior samples based on the standard Cloudy grid(Byler et al. 2017) are shown in black, whereas those from the Cloudy emulator, Cue, are shown in purple.The measurement of GN-z11 is included for reference(Bunker et al. 2023).Cue enables the exploration of a wider parameter space in nebular conditions.

Figure 7 .
Figure7.Formal reported uncertainties vs. systematics.(Upper panel) The black curve is a Gaussian fit to the stacked posteriors with median subtracted for the Chabrier stellar masses.The fit suggests that the typical reported uncertainty, σ, is about 0.20 dex.The orange (green) curve is a Gaussian fit to the stacked posteriors of masses assuming a x = 1.8 (x = 2.8) IMF, with median of the Chabrier masses subtracted.The mean, -0.38 (0.97), describes the typical systematic shift after changing the IMF, whereas the standard deviation, 0.31 (0.16), indicates the spread in the systematics.Overplotted as the gray shade is the scatter driven by using different SED fitting codes(Pacifici et al. 2023).The other two plots contrast the uncertainties and the systematics in the SFRs and stellar ages, respectively.Critically, the systematics in the inferred stellar mass and SFR due to different IMF choices can be at least comparable to and often larger than the standard uncertainties (posterior moments), and the scatter seen when fitting the an object with different codes.(Middle panel) The same as the upper panel, but showing the systematics introduced by adopting a bursty prior instead of a rising SFH prior(Wang et al. 2023d).The stochasticity in the SFH leads to a large spread in the SFR posteriors.The typical systematic uncertainty in SFR average over the most recent 10 Myr is of order ∼ 0.84 dex, and with a large spread of ±1.47 dex.However, SFR averaged over the most recent 100 Myr is less prone to the change in priors.The typical systematic uncertainty decreases to 0.05 dex, with a spread of ±0.50 dex.The typical systematic uncertainty in age is −0.31 ± 0.34 dex.(Lower panel) The typical systematic uncertainty in inferred mass introduced by modeling the nebular physics with more flexibility while retaining consistency with the model stellar populations is comparable (∼ 0.16 dex) to the formal uncertainty.A similar systematic uncertainty in SFR (∼ 0.15 dex) is also observed.

Figure 10 .
Figure 10.A possible artifact.ID 42658 only appears in the first epoch, and falls in the detector gap in the second epoch.

Table 1 .
z phot > 9 Candidates Studied in this Paper

Table 2 .
(Fujimoto et al. 2023; and Spectroscopic Redshifts NIRSpec/Prism confirmation conducted during the second phase of the UNCOVER survey(Fujimoto et al. 2023; Price et   al. in prep).