Validating posteriors obtained by an emulator when jointly-fitting mock data of the global 21-cm signal and high-z galaxy UV luminosity function

Although neural-network-based emulators enable efficient parameter estimation in 21-cm cosmology, the accuracy of such constraints is poorly understood. We employ nested sampling to fit mock data of the global 21-cm signal and high-$z$ galaxy ultraviolet luminosity function (UVLF) and compare for the first time the emulated posteriors obtained using the global signal emulator ${\tt globalemu}$ to the `true' posteriors obtained using the full model on which the emulator is trained using ${\tt ARES}$. Of the eight model parameters we employ, four control the star formation efficiency (SFE), and thus can be constrained by UVLF data, while the remaining four control UV and X-ray photon production, and the minimum virial temperature of star-forming halos ($T_{\rm min}$), and thus are uniquely probed by reionization and 21-cm measurements. For noise levels of 50 and 250 mK in the 21-cm data being jointly-fit, the emulated and `true' posteriors are consistent to within $1\sigma$. However, at lower noise levels of 10 and 25 mK, ${\tt globalemu}$ overpredicts $T_{\rm min}$ and underpredicts $\gamma_{\rm lo}$, an SFE parameter, by $\approx3-4\sigma$, while the `true' ${\tt ARES}$ posteriors capture their fiducial values within $1\sigma$. We find that jointly-fitting the mock UVLF and 21-cm data significantly improves constraints on the SFE parameters by breaking degeneracies in the ${\tt ARES}$ parameter space. Our results demonstrate the astrophysical constraints that can be expected for global 21-cm experiments for a range of noise levels from pessimistic to optimistic, and also the potential for probing redshift evolution of SFE parameters by including UVLF data.


INTRODUCTION
A promising tool for probing the physics of the early Universe is the 21-cm cosmological signal arising from the neutral hydrogen gas that permeated the intergalactic medium (IGM) before, during, and after the formation of the first stars and galaxies (Madau et al. 1997; for reviews see Furlanetto et al. 2006; Bera et al. 2023). The spin-flip transition in neutral hydrogen emits radiation at a rest frequency of 1420.4 MHz ($\approx 21$ cm), which has been redshifted to low radio frequencies ($\lesssim 200$ MHz, corresponding to redshifts $z \gtrsim 6$) due to cosmic expansion and encodes the high-redshift evolution of the IGM. The 21-cm signal has both an anisotropic component (power spectrum) and an isotropic, sky-averaged component (global signal; Shaver et al. 1999), whose brightness temperature is measured as a differential temperature relative to the Cosmic Microwave Background (CMB) radiation.
Corresponding author: J. Dorigo Jones (johnny.dorigojones@colorado.edu)
An unambiguous detection of the global 21-cm signal has the potential to reveal the true astrophysical and cosmological properties associated with the Dark Ages ($z > 30-40$), Cosmic Dawn (CD; $10 \lesssim z \lesssim 40$), and the Epoch of Reionization (EoR; ending by $z \approx 6$). However, the global 21-cm signal is particularly difficult to detect due to the presence of significant foreground emission from the Milky Way that is $4-6$ orders of magnitude brighter than the underlying signal, making a robust Bayesian forward modelling approach necessary to properly recover and exploit the global 21-cm signal (e.g., Bernardi et al. 2016; Liu & Shaw 2020; Shen et al. 2022).
Radio telescopes on Earth have provided some constraints on the 21-cm power spectrum (e.g., Paciga et al. 2011; Mertens et al. 2020; Trott et al. 2020; Garsden et al. 2021; The HERA Collaboration et al. 2022) and global 21-cm signal (e.g., Bowman et al. 2018; Singh et al. 2018, 2022). The claimed EDGES detection (Bowman et al. 2018) has been met with skepticism (see e.g., Hills et al. 2018; Bradley et al. 2019; Tauscher et al. 2020; Sims & Pober 2020), particularly because of the systematics involved with measuring the global signal and, more recently, because it has been found to be in tension with the non-detection published by SARAS 3 (Singh et al. 2022). To properly recover the underlying global 21-cm signal, the beam-weighted foreground (i.e., foreground emission convolved with the antenna beam) and instrumental systematics must be carefully fitted and removed (e.g., Rapetti et al. 2020; Hibbard et al. 2020; Tauscher et al. 2021; Pagano et al. 2022; Murray et al. 2022; Anstey et al. 2023; Hibbard et al. 2023). Radio frequency interference (RFI) is a large systematic due to artificial and ionospheric terrestrial contamination, which can be avoided by measuring the 21-cm signal from the pristine radio environment of the far side of the Moon. Upcoming NASA Commercial Lunar Payload Services (CLPS) missions ROLSES (2023, at the lunar south pole; Burns et al. 2021b) and LuSEE-Night (early 2026, on the far side; Bale et al. 2023) will lay the path for future lunar far side radio telescope arrays capable of measuring the 21-cm global signal and power spectrum (e.g., FARSIDE, Burns et al. 2021a; and FarView, Polidan et al. 2022).
Physically-motivated models for the global 21-cm signal have various astrophysical and cosmological parameters that affect the shape of the signal. Multiple studies have attempted to constrain such model parameters when fitting a measured global 21-cm signal via a Bayesian, likelihood-based approach (e.g., Monsalve et al. 2018; Mirocha & Furlanetto 2019; Monsalve et al. 2019; Qin et al. 2020; Bevins et al. 2022a, 2023). In this work, we perform a similar Bayesian parameter estimation analysis for eight astrophysical parameters using the publicly available model ARES (Accelerated Reionization Era Simulations$^{1}$; Mirocha 2014; Mirocha et al. 2017) by fitting mock data of the global 21-cm signal and numerically sampling the full posterior distribution of these parameters via nested sampling. We examine the improvement in constraining power on these parameters when jointly-fitting mock data of the high-$z$ galaxy rest-frame ultraviolet (UV) luminosity function (LF) in addition to the global 21-cm signal. We present the first nested sampling constraints on ARES parameters when fitting a mock global 21-cm signal and UVLF that are calibrated to real UVLF data. In doing so, we forecast the level of astrophysical constraints that can be expected for different noise levels of global 21-cm experiments in combination with UVLF data.
$^{1}$ https://github.com/mirochaj/ares; v0.9; git commit hash: fd77c4a86982d25fdad790d717f8bf5eecff4eb8
The recent development of neural-network-based emulators for the global 21-cm signal, such as globalemu (Bevins et al. 2021, v1.8.0, Zenodo, doi:10.5281/zenodo.8178850), 21cm-VAE (Bye et al. 2022), and 21cmEMU (Breitman et al. 2023, which also emulates other quantities such as the 21-cm power spectrum and the UVLF), enables fast, efficient parameter estimation when fitting the global signal. To our knowledge, there is currently no study that shows a direct comparison of the parameter estimates obtained when using an emulator versus the corresponding full model of the global 21-cm signal in the likelihood. The accuracy of an emulator is determined by computing the root mean squared error (RMSE) between model (i.e., simulated) and network (i.e., predicted) data realizations in a test set, while a fully Bayesian parameter inference and model comparison analysis is much more computationally demanding and yields a formal comparison of the posteriors (Trotta 2008).
Parameter estimation using a full model of the global signal in the likelihood is computationally expensive for most existing models. Most global 21-cm signal models are semi-numerical and generate a realization of the signal on the order of minutes to hours (Thomas et al. 2009; Santos et al. 2010; Mesinger et al. 2011; Fialkov & Barkana 2014; Ghara et al. 2015, 2018; Murray et al. 2020; Schneider et al. 2023; Schaeffer et al. 2023; Hutter et al. 2023), which hinders the ability to perform an analysis that requires on the order of $10^5$ likelihood evaluations. In contrast, the semi-analytical code ARES generates a realization of the global 21-cm signal on the order of seconds, owing its speed primarily to the fact that it evolves the mean radiation background directly as opposed to averaging over large cosmological volumes. Therefore, we use ARES in a Bayesian nested sampling analysis to obtain the 'true' posterior distributions and for the first time directly compare them to the emulated posteriors from globalemu.
We generate the mock global 21-cm signal and high-$z$ UVLF using ARES with fiducial parameter values that are calibrated to the Bouwens et al. (2015) UVLF at $z = 5.9$ (Mirocha et al. 2017). We emphasize that the basic ARES UVLF model we employ accurately fits UVLFs at $z \approx 6-10$ obtained by either HST or JWST (Mirocha & Furlanetto 2023), and so our results would not change if we were to fit mock data calibrated to newer JWST UVLF measurements at these redshifts. However, given early indications of a departure from the predictions of HST-based models at $z \gtrsim 10$ (see, e.g., Naidu et al. 2022; Lovell et al. 2023; Donnan et al. 2023; Finkelstein et al. 2023; Harikane et al. 2023; Mason et al. 2023b; Boylan-Kolchin 2023; Bouwens et al. 2023), fitting JWST UVLFs at $z \gtrsim 10$ would require non-trivial changes to the UVLF model we employ (Mirocha & Furlanetto 2023). We defer such analysis to future work (see also Zhang et al. 2022).
To summarize, we pursue three main goals: (1) numerically sample the full posterior distribution of eight astrophysical parameters in ARES, which control the star formation efficiency and UV and X-ray photon production per unit star formation in galaxies, when fitting mock global 21-cm signal data with varying noise levels; (2) validate and examine the accuracy of the posteriors obtained by our version of the publicly available neural network emulator globalemu that we trained with ARES; and (3) study the constraints from jointly-fitting high-$z$ galaxy UVLF mock data along with the simulated global 21-cm signal.
In Section 2, we describe our methods for obtaining marginalized posterior distributions via nested sampling when fitting mock data of the global 21-cm signal and UVLF. We also describe the training of the globalemu neural network and the generation of the mock data being fit. In Section 3, we present the results from nested sampling analyses, primarily comparing the posteriors obtained when using the emulator globalemu in the likelihood versus the full model ARES, and also examining the effect on posteriors when jointly-fitting with the high-$z$ galaxy UVLF mock data. Finally, we summarize our results and conclusions in Section 4.

ANALYSIS
In this section, we describe our analysis method for obtaining the posterior distributions for eight astrophysical parameters in ARES when fitting a mock global 21-cm signal plus statistical noise. The main steps to define our Bayesian analysis are: (1) selecting a sampling method, (2) selecting a fiducial model for the global 21-cm signal, and (3) generating mock data by adding to the simulated global signal a noise realization at a statistical error level corresponding to a given integration time. We also train a neural network to emulate the ARES global signal model and study its accuracy versus the full ARES model in producing realizations of the signal.
Note that for this work, we are not concerned with systematic uncertainties such as the beam-weighted foreground, radio frequency interference (RFI; either from terrestrial contamination or the instrument), and environmental horizon and surface conditions (for studies on such effects, see e.g., Singh et al. 2018;Kern et al. 2020;Bassett et al. 2020;Hibbard et al. 2020;Bassett et al. 2021;Pagano et al. 2022;Leeney et al. 2022;Murray et al. 2022;Anstey et al. 2023;Hibbard et al. 2023).

Likelihood
Bayesian inference allows us to estimate the posterior distribution $P(\theta|D, m)$ of a set of parameters $\theta$ in a model $m$, given observed data $D$ with priors $\pi$ on the parameters (also written $P(\theta|m)$). This is achieved via Bayes' theorem:

$$P(\theta|D, m) = \frac{\mathcal{L}(\theta)\,\pi(\theta)}{\mathcal{Z}}, \tag{1}$$

where $\mathcal{L}$ is the likelihood function, or the probability of the data given the parameters of the model (also written $P(D|\theta, m)$), and the normalizing factor $\mathcal{Z}$ is the Bayesian evidence, or marginal likelihood over the priors (also written $P(D|m)$), which can be used for model comparison. For all of the fits performed in this paper, we sample from a multi-variate log-likelihood function assuming Gaussian-distributed noise:

$$\log \mathcal{L}(\theta) = -\frac{1}{2}\,[D - m(\theta)]^{\rm T}\, C^{-1}\, [D - m(\theta)] + {\rm const}, \tag{2}$$

where $C$ is the noise covariance matrix of the data, which we assume to be diagonal. In this paper, we fit mock data realizations for the global 21-cm signal ($D_{21}$) and the UVLF ($D_{\rm UVLF}$) instead of real data, although for the latter, the mock data are calibrated to real measurements of the high-$z$ galaxy UVLF (see Section 2.6). Hence, we know the input, or fiducial, values of the parameters whose posteriors we numerically sample and can evaluate the validity of the sampling methods and the accuracy of the ARES model and globalemu emulator based on the expectation of marginalized posterior distributions around the fiducial parameter values.
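As an illustration of the Gaussian log-likelihood with a diagonal noise covariance described above, the following minimal Python sketch may be helpful. The function name `gaussian_loglike` is hypothetical and is not part of ARES, globalemu, or MultiNest; terms that do not depend on the parameters are dropped, which does not affect the sampled posterior.

```python
import numpy as np


def gaussian_loglike(data, model, sigma):
    """Gaussian log-likelihood for a diagonal covariance, up to a constant.

    data, model : arrays of the same shape (e.g., brightness temperature in mK)
    sigma       : scalar or array of 1-sigma noise levels (same units as data)
    """
    data = np.asarray(data, dtype=float)
    model = np.asarray(model, dtype=float)
    # A scalar noise level is broadcast to every data point.
    sigma = np.broadcast_to(np.asarray(sigma, dtype=float), data.shape)
    resid = data - model
    # With C = diag(sigma^2), the quadratic form reduces to a sum of
    # squared, noise-normalized residuals.
    return -0.5 * np.sum((resid / sigma) ** 2)
```

In a sampler, `model` would be the signal evaluated at a proposed parameter vector, so that the function is called once per likelihood evaluation.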

Combined Constraints
To better realize the constraints that are achievable from global 21-cm signal experiments, in addition to fitting only the mock global signal, we also perform joint-fits that combine the model constraining powers from the global signal and high-$z$ galaxy UVLF mock data. Using Equation 2, we construct separate log-likelihood functions for the global 21-cm signal and the UVLF. For the joint-fits, we form a log-likelihood by adding both individual log-likelihoods (see, e.g., Chatterjee et al. 2021; Bevins et al. 2023):

$$\log \mathcal{L}_{\rm joint}(\theta) = \log \mathcal{L}_{21}(\theta) + \log \mathcal{L}_{\rm UVLF}(\theta). \tag{3}$$

We evaluate the separate log-likelihood functions at the same set of parameters using the same priors to sample the full posterior distribution, as the models we employ for the 21-cm signal, $m_{21}(\theta)$, and for the UVLF, $m_{\rm UVLF}(\theta)$, are both generated using the ARES framework (see Section 2.4). For the global 21-cm signal likelihood, the noise covariance matrix $C_{21}$ is a diagonal array of constant values corresponding to the square of the estimated noise level $\sigma_{21}$. For the UVLF likelihood, the main diagonal elements of $C_{\rm UVLF}$ are set by the errors on the $z = 5.9$ UVLF data by Bouwens et al. (2015) (see Section 2.6).
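The joint log-likelihood described above can be sketched as follows. This is a minimal illustration under stated assumptions: `model_21` and `model_uvlf` are hypothetical callables standing in for the 21-cm and UVLF models, and the toy models in the usage lines are placeholders rather than ARES output.

```python
import numpy as np


def loglike_diag(data, model, sigma):
    """Gaussian log-likelihood with diagonal covariance, up to a constant."""
    resid = (np.asarray(data, dtype=float) - np.asarray(model, dtype=float))
    return -0.5 * np.sum((resid / np.asarray(sigma, dtype=float)) ** 2)


def joint_loglike(theta, model_21, model_uvlf,
                  data_21, sigma_21, data_uvlf, sigma_uvlf):
    """Sum of the 21-cm and UVLF log-likelihoods at the same parameters."""
    logl_21 = loglike_diag(data_21, model_21(theta), sigma_21)
    logl_uvlf = loglike_diag(data_uvlf, model_uvlf(theta), sigma_uvlf)
    return logl_21 + logl_uvlf


# Toy usage: two placeholder models that share one parameter vector.
toy_model_21 = lambda theta: theta[0] * np.ones(3)
toy_model_uvlf = lambda theta: theta[1] * np.ones(2)
example = joint_loglike((1.0, 2.0), toy_model_21, toy_model_uvlf,
                        np.zeros(3), 1.0, np.zeros(2), 2.0)
```

Adding the log-likelihoods corresponds to treating the two data sets as statistically independent, which is the assumption implicit in the joint-fit described in the text.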

Nested Sampling
We employ the Bayesian inference method of nested sampling (Skilling 2004; for reviews see Ashton et al. 2022; Buchner 2023). Conceptually, nested sampling algorithms converge on the best parameter estimates by iteratively removing regions of the prior volume with lower likelihood. Nested sampling computes both the evidence and posterior samples simultaneously (by recasting the multi-dimensional evidence integral into a one-dimensional integral), whereas Markov Chain Monte Carlo (MCMC) samplers calculate only the posterior.
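To make the idea concrete, the following is a deliberately simple nested sampling loop in Python for a one-dimensional toy problem. It is a sketch only and is not how MultiNest or PolyChord are implemented: new live points are found by plain rejection sampling from the prior, the prior-volume shrinkage is approximated by its expected value, and all function names are hypothetical. For this toy problem (flat prior on $[-5, 5]$, unit Gaussian likelihood) the evidence is analytically close to 0.1, so the returned estimate should scatter around that value.

```python
import numpy as np


def gauss_loglike(x):
    """Log of a unit-variance Gaussian likelihood centred on zero."""
    return -0.5 * x**2 - 0.5 * np.log(2.0 * np.pi)


def uniform_prior_draw(rng):
    """Draw one point from a flat prior on [-5, 5]."""
    return rng.uniform(-5.0, 5.0)


def toy_nested_sampling(loglike, prior_draw, n_live=200, n_iter=1000, seed=0):
    """Return (evidence, samples, normalized weights) from a basic loop."""
    rng = np.random.default_rng(seed)
    live = np.array([prior_draw(rng) for _ in range(n_live)])
    live_logl = np.array([loglike(x) for x in live])

    dead, dead_weights = [], []
    x_prev = 1.0  # prior volume still enclosed by the live points
    for i in range(1, n_iter + 1):
        worst = int(np.argmin(live_logl))
        logl_min = live_logl[worst]
        x_now = np.exp(-i / n_live)  # expected shrinkage of the prior volume
        dead.append(live[worst])
        dead_weights.append(np.exp(logl_min) * (x_prev - x_now))
        x_prev = x_now

        # Replace the worst point with a prior draw of higher likelihood
        # (rejection sampling; production samplers are far more efficient).
        while True:
            cand = prior_draw(rng)
            cand_logl = loglike(cand)
            if cand_logl > logl_min:
                break
        live[worst], live_logl[worst] = cand, cand_logl

    # Add the contribution of the remaining live points.
    dead.extend(live)
    dead_weights.extend(np.exp(live_logl) * x_prev / n_live)

    weights = np.array(dead_weights)
    evidence = weights.sum()
    return evidence, np.array(dead), weights / evidence


evidence, samples, weights = toy_nested_sampling(gauss_loglike,
                                                 uniform_prior_draw)
```

The discarded ('dead') points and their weights double as weighted posterior samples, which is why a single run yields both the evidence and the posterior.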
In general, Monte Carlo methods like nested sampling and MCMC are computationally expensive because they require many likelihood evaluations to sample the converged posterior distributions. We choose nested sampling instead of MCMC because the former is designed to better constrain complex parameter spaces with "banana"-shaped curved degeneracies and/or multi-modal distributions (Buchner 2023). Another likelihood-based method that has been applied to parameter estimation of the global 21-cm signal is Fisher-matrix analysis (Liu et al. 2013; Muñoz et al. 2020; Hibbard et al. 2022; Mason et al. 2023a), which assumes multi-variate Gaussian posterior distributions and requires only $\mathcal{O}(N)$ likelihood evaluations for $N$ parameters being sampled. Fisher analysis is efficient but provides an accurate description only when the posteriors are symmetric, Gaussian, and uni-modal (e.g., Trotta 2008; Ryan et al. 2023); however, Fisher matrix generalizations exist (Heavens 2016), such as adding higher order matrices (Sellentin et al. 2014). There are also 'likelihood-free' inference methods (also called simulation-based inference, see Cranmer et al. 2020), which have been shown to provide accurate posteriors at a relatively low computational cost (Prelogović & Mesinger 2023).
Two nested sampling algorithms in particular have been used in 21-cm cosmology and have been shown to efficiently sample posterior distributions: MultiNest (Feroz & Hobson 2008; Feroz et al. 2009, 2019) and PolyChord (Handley et al. 2015a,b, v1, Zenodo, doi:10.5281/zenodo.3598030). In both MultiNest and PolyChord, an initial number of 'live' points, $n_{\rm live}$, are generated in the prior volume, which are used to eventually converge on the best parameter estimates, but the two nested sampling algorithms differ in how they replace live points. For a more in-depth comparison of MultiNest and PolyChord see Section 2 of Lemos et al. (2023). Because of their different approaches for replacing live points, MultiNest and PolyChord are known to perform differently depending on the number of dimensions, or the number of parameters being constrained (see Fig. 4 in Handley et al. 2015a). We primarily utilize MultiNest for our analyses, and we show for one joint-fit that the two nested sampling algorithms converge on roughly the same result for the same $n_{\rm live}$ but with MultiNest being much more efficient than PolyChord for constraining eight astrophysical parameters in ARES (see Section 3.1).

[Figure 1 caption, partially recovered] ... 1 for the parameter ranges). Shown in bolded blue is the fiducial global 21-cm signal to which we add Gaussian-distributed noise at different levels to form the mock 21-cm data sets that we fit. The mock UVLF data that we add in our joint-fits are also generated by ARES using the same fiducial parameter values (see Table 1) that were obtained via calibration to the Bouwens et al. (2015) $z = 5.9$ UVLF by Mirocha et al. (2017), as described in Section 2.6.

Modeling the Global 21-cm Signal and UVLF
To simulate the global 21-cm signal (and high-$z$ galaxy UVLF), we use the physically-motivated, semi-analytical code ARES, which is the union of a 1D radiative transfer code developed in Mirocha et al. (2012) and a uniform radiation background code described in Mirocha (2014). ARES outputs realizations of the global 21-cm signal and galaxy LF in just seconds, which makes it computationally feasible to perform direct parameter estimation using the full model ARES rather than an emulator in the likelihood of a nested sampling analysis. Although ARES contains cosmological parameters that affect the shape of the Dark Ages trough, in this work we focus on demonstrating the astrophysical constraints that are achievable when fitting the Cosmic Dawn and reionization redshift ranges. For high-$z$ galaxies, the observed LF probes the rest-frame UVLF, $\phi(M_{\rm UV})$, and so the UVLF model primarily depends on the star formation rate of massive, young stars. The ARES model is motivated by studies of the high-$z$ galaxy LF based on abundance matching, and the fiducial model ignores dust extinction (which has a minor impact on the conversion between the observed and intrinsic LF at $z \gtrsim 6$) and suggested redshift evolution of the star formation efficiency (SFE). ARES assumes a multi-color disk (MCD) spectrum for high-mass X-ray binaries (HMXBs; Mitsuda et al. 1984) and uses the BPASS version 1.0 single-star models for continuous star formation to derive the UV photon production efficiency (Eldridge & Stanway 2009).
For full descriptions of how ARES models the galaxy UVLF and the global 21-cm signal, see Section 2 of Mirocha et al. (2017). Here we provide a brief description of the UVLF model to highlight the SFE parametrization. The two components required to calculate the UVLF are (1) the intrinsic luminosity $L$ of galaxies as a function of dark matter (DM) halo mass $M_h$, and (2) the DM halo mass function (HMF) (i.e., the number of DM halos per mass bin per co-moving volume of the Universe). The HMF has been well-studied (e.g., Press & Schechter 1974; Bond et al. 1991; Murray et al. 2013), and in ARES it is calculated a priori in lookup tables using an analytical construct that assumes halos form by spherical collapse. The luminosity of each halo can be written in terms of the star formation rate, which is itself the product of the SFE, $f_\star$, and the baryon mass accretion rate (MAR). The MAR is derived directly from the HMF (see, e.g., Furlanetto et al. 2017; Mirocha et al. 2021), and so all that is needed to calculate the UVLF is a parametrization for the SFE. Here, as in Mirocha et al. (2017), we assume the SFE is a double power law in $M_h$:

$$f_\star(M_h) = \frac{2 f_{\star,0}}{\left(M_h/M_p\right)^{-\gamma_{\rm lo}} + \left(M_h/M_p\right)^{-\gamma_{\rm hi}}}, \tag{4}$$

where $f_{\star,0}$ is the peak SFE at mass $M_p$, and $\gamma_{\rm lo}$ and $\gamma_{\rm hi}$ are the power-law indices at low and high masses, respectively. We sample the full posterior distribution of eight parameters, including the four SFE parameters ($f_{\star,0}$, $M_p$, $\gamma_{\rm lo}$, and $\gamma_{\rm hi}$) and four other astrophysical parameters: $c_X$, $f_{\rm esc}$, $T_{\rm min}$, and $\log N_{\rm HI}$. The production and release of X-ray photons in galaxies is controlled by $c_X$ and $\log N_{\rm HI}$; the escape of UV photons is controlled by $f_{\rm esc}$; and the minimum virial temperature which determines the number of collapsed star-forming halos is controlled by $T_{\rm min}$. In Table 1, we summarize these eight parameters and give the flat prior ranges used in the nested sampling analyses and also when training the globalemu network on ARES mock global 21-cm signals.
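A double power-law SFE of the kind described above can be sketched in Python as follows. The specific functional form used here is an assumption: it is normalized so that the SFE equals $f_{\star,0}$ at $M_h = M_p$ and so that the logarithmic slopes approach $\gamma_{\rm lo}$ and $\gamma_{\rm hi}$ at low and high masses. The function name and the numerical values in the usage lines are illustrative placeholders and are not taken from Table 1.

```python
import numpy as np


def sfe_double_power_law(m_h, f_star0, m_p, gamma_lo, gamma_hi):
    """Double power-law star formation efficiency versus halo mass.

    Assumed form: f = 2 f0 / [(M/Mp)^(-g_lo) + (M/Mp)^(-g_hi)],
    so that f(Mp) = f0, f ~ M^g_lo for M << Mp, and f ~ M^g_hi for M >> Mp
    (for g_lo > 0 > g_hi).
    """
    x = np.asarray(m_h, dtype=float) / m_p
    return 2.0 * f_star0 / (x ** (-gamma_lo) + x ** (-gamma_hi))


# Illustrative (placeholder) parameter values.
params = dict(f_star0=0.05, m_p=2.8e11, gamma_lo=0.49, gamma_hi=-0.61)
halo_masses = np.logspace(8, 13, 6)  # solar masses
sfe = sfe_double_power_law(halo_masses, **params)
```

With a positive $\gamma_{\rm lo}$ and a negative $\gamma_{\rm hi}$, the efficiency rises toward $M_p$ and declines above it, which is the behaviour the four SFE parameters are meant to capture.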
The flexible ARES parameter space allows us to set wide, uninformative priors over these free parameters that are still physically meaningful. In order to facilitate a complete exploration of the prior volume, for four parameters, $c_X$, $T_{\rm min}$, $f_{\star,0}$, and $M_p$, we sample from their prior ranges uniformly in log10-space, as shown in Table 1. Our prior ranges are centered on some empirically-motivated values (see Section 2.6 for description of fiducial parameter values), but we give multiple orders of magnitude on either side of those values to accommodate potentially dramatic departures at high $z$ and to capture the full resulting converged posterior distributions (see Section 3). One of our main goals is to directly compare the posteriors when using an emulator for the global 21-cm signal versus when using the full model on which the emulator was trained. In the next sub-section, we describe the construction of the training set for the emulator and directly assess the accuracy of the emulated signals compared to the 'true,' input ones.
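Nested samplers typically draw points in the unit hypercube and map them to physical parameters through a prior transform. The sketch below shows one way to implement flat priors that are uniform either in linear space or in log10-space, as described above. The function name, argument layout, and the bounds in the usage lines are hypothetical; the actual prior ranges are those listed in Table 1.

```python
import numpy as np


def prior_transform(cube, bounds, log_flags):
    """Map unit-hypercube coordinates to parameter values.

    cube      : sequence of values in [0, 1], one per parameter
    bounds    : sequence of (lower, upper) prior limits
    log_flags : True where the prior is uniform in log10-space
    """
    cube = np.asarray(cube, dtype=float)
    theta = np.empty_like(cube)
    for i, ((lo, hi), is_log) in enumerate(zip(bounds, log_flags)):
        if is_log:
            # Uniform in log10(parameter) between log10(lo) and log10(hi).
            log_lo, log_hi = np.log10(lo), np.log10(hi)
            theta[i] = 10.0 ** (log_lo + cube[i] * (log_hi - log_lo))
        else:
            # Uniform in the parameter itself.
            theta[i] = lo + cube[i] * (hi - lo)
    return theta


# Placeholder bounds: one log-uniform and one linear-uniform parameter.
example_theta = prior_transform([0.5, 0.25],
                                bounds=[(1e2, 1e6), (0.0, 1.0)],
                                log_flags=[True, False])
```

Sampling uniformly in log10-space gives equal prior weight to each decade, which helps the sampler explore ranges that span several orders of magnitude.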

Emulating ARES with globalemu
We employ the publicly available global 21-cm signal emulator globalemu (Bevins et al. 2021) for our analyses, though other emulators for the signal do exist such as 21cmVAE (Bye et al. 2022), 21cmGEM (Cohen et al. 2020) [...] 1. Each signal spans the redshift range $z = 6-55$ with a redshift spacing of $\Delta z = 0.1$, similar to Bevins et al. (2022a). The training set that is ultimately used to train the globalemu network used for analyses presented in this work contains 24,000 mock signals. A representative subset of this training set is shown in Figure 1. We also generated training sets of sizes 5,000, 10,000, and 20,000, which all resulted in less accurate trained networks. The marginal improvement of 10% in the RMSE of the resulting trained network obtained when using a training set of size 24,000 compared to 20,000, however, indicates that increasing the number of global signals in the training set above 24,000 would not significantly affect our results. In addition, we also created a so-called test set of 2,000 global signals using ARES and the same parameter ranges as used for the training set. Importantly, the test set is completely separate from the training set and is used to determine the accuracy of the trained globalemu network.
Using the 24,000-signal ARES training set, we train five globalemu networks each with a different network architecture. We test a similar, although less comprehensive, grid of architectures as those tested in Bevins et al. (2021) (see their Figure 8): [8,8,8], [64,64], [16,16,16,16], [16,16,16], [32,32,32]; where the values of each component in a given bracket are the numbers of nodes in each hidden layer, and the number of components in each bracket is the number of layers. The network stops learning once the loss function does not improve by $10^{-5}$ within the last twenty epochs of training, which ensures the trained network is as accurate as possible for the chosen network architecture. For the data pre-processing step that is required before training the network (see section 4 of Bevins et al. 2021), we turn off the astrophysics-free-baseline (AFB) subtraction and resampling options because we find that they have a slightly negative impact on the accuracy of the resulting trained network. The lack of benefit from the pre-processing steps may be due to the fact that the 'astrophysics free' Dark Ages comprises a small portion of our simulated signals.
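The stopping rule described above can be illustrated with a short Python sketch. This is one plausible reading of the criterion (stop when the best loss in the last twenty epochs is not at least $10^{-5}$ lower than the best loss seen before them) and is not taken from the globalemu source code; the function name `should_stop` is hypothetical.

```python
def should_stop(loss_history, min_delta=1e-5, patience=20):
    """Return True if the loss has not improved by min_delta in `patience` epochs.

    loss_history : list of loss values, one per completed epoch
    """
    if len(loss_history) <= patience:
        # Not enough epochs yet to compare against an earlier best value.
        return False
    best_before = min(loss_history[:-patience])
    best_recent = min(loss_history[-patience:])
    return (best_before - best_recent) < min_delta


# Toy usage: a steadily decreasing loss keeps training,
# while a loss that has flattened out triggers the stop.
decreasing = [1.0 - 0.01 * i for i in range(50)]
flattened = [1.0] * 10 + [0.5] * 25
```

A patience window of this kind avoids stopping on a single noisy epoch while still ending training once the loss has effectively plateaued.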
We determine the accuracy of each trained network by evaluating them at the parameter values of the 2,000 ARES signals in the test set and comparing the resulting emulated signals to their corresponding 'true' signals. The top panel of Figure 2 shows a subset of the test set (in black) plotted along with the corresponding emulations (in red) generated by the globalemu network used for analyses presented in this work. The bottom panel of Figure 2 shows the residuals between the emulated and 'true' signals in the test set. We find that the network architecture of [32,32,32] gives the lowest mean RMSE of 1.25 mK (with a maximum RMSE of 18.5 mK) between the 2,000 emulated and 'true' signals (see the horizontal dotted, red line in the bottom panel of Figure 2), while the other network architectures gave mean RMSEs ranging from 1.8 mK to 4.5 mK. Network training for the architecture [32,32,32] took 10 hours, as performed on a 2018 MacBook Pro with a 6-core i9 processor and 32 GB of memory. The mean RMSE of 1.25 mK is comparable to or better than those achieved in other studies that trained globalemu on large training sets (e.g., Bevins et al. 2022a,b, 2023), and Bevins et al. (2021) also found [32,32,32] to give the lowest mean RMSE of the trained network.
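The per-signal RMSE and its mean and maximum over a test set can be computed as in the following Python sketch. The array names and the tiny arrays in the usage lines are placeholders; in practice each row would hold one 'true' or emulated global signal sampled on the common redshift grid.

```python
import numpy as np


def rmse_per_signal(true_signals, emulated_signals):
    """RMSE of each emulated signal with respect to its 'true' counterpart.

    Both inputs have shape (n_signals, n_redshifts); the result has
    shape (n_signals,) and the same units as the signals (e.g., mK).
    """
    diff = (np.asarray(emulated_signals, dtype=float)
            - np.asarray(true_signals, dtype=float))
    return np.sqrt(np.mean(diff**2, axis=1))


# Toy usage with two placeholder signals of four points each.
true_toy = np.zeros((2, 4))
emulated_toy = np.array([[3.0, 3.0, 3.0, 3.0],
                         [0.0, 0.0, 0.0, 2.0]])
rmse = rmse_per_signal(true_toy, emulated_toy)
mean_rmse, max_rmse = rmse.mean(), rmse.max()
```

Reporting both the mean and the maximum is useful because a low mean RMSE can hide a small number of poorly emulated signals.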
The efforts described above to optimize the accuracy of the ARES-trained globalemu network provide robustness to the accuracy limits determined in Section 3. Even so, the small but nonzero RMSE of the trained network should contribute to bias in the resulting emulated parameter constraints. We briefly investigate this by determining whether or not there is a correlation in the test set between the depth of a signal's Cosmic Dawn (CD) trough and the accuracy of its corresponding emulation. We find no statistically significant correlation between CD trough depth and the mean, median, or maximum emulation residual, obtaining Kendall rank and Pearson correlation coefficients between −0.1 and −0.6 with p-values all $< 10^{-3}$. Therefore, we infer that emulated posterior biases are not correlated with emulation residuals in the CD trough depth, although we defer to future work a detailed investigation of the relationship between RMSE network uncertainties and the accuracy of the emulator model constraints.

Mock Data
For all analyses, we fit the same mock data realization for the global 21-cm signal and galaxy UVLF at $z = 5.9$ generated by ARES using a fiducial set of parameter values, $\theta_0$ (see Figure 1 and Table 1). The fiducial values used for the four parameters that the UVLF is sensitive to (i.e., the four SFE parameters: $f_{\star,0}$, $M_p$, $\gamma_{\rm lo}$, and $\gamma_{\rm hi}$; see Equation 4) were determined empirically via calibration to the $z = 5.9$ UVLF measured by Bouwens et al. (2015) (see Mirocha et al. 2017 for details on this calibration). For the other four astrophysical parameters that we constrain (i.e., the four 'non-SFE' parameters: $c_X$, $f_{\rm esc}$, $T_{\rm min}$, and $\log N_{\rm HI}$) we use typical, physically-motivated fiducial values based on observations or simulations.
Because the non-SFE parameters have no effect on the ARES UVLF model, their values are not constrained by the UVLF calibration procedure. In particular, the fiducial value for $c_X$ is motivated by studies of low-$z$ star-forming galaxies (e.g., Mineo et al. 2012), and the fiducial value for $\log N_{\rm HI}$ is motivated by simulations (e.g., Das et al. 2017). The differences in our fiducial values for $f_{\rm esc}$ and $\log N_{\rm HI}$ compared to those used in Mirocha et al. (2017) result in our fiducial mock global 21-cm signal (see the blue curve in Figure 1) having a Cosmic Dawn trough that is located at the same frequency but is $\approx 50$ mK deeper.
The fiducial mock global 21-cm signal is created in the same manner as the training set (i.e., $z = 6-55$ with step $\Delta z = 0.1$), and the mock galaxy UVLF is created at the same ten magnitudes as the UVLF at $z = 5.9$ measured by Bouwens et al. (2015). Therefore, the mock UVLF that we fit is a collection of ten data points that resembles the actual $z = 5.9$ UVLF measured by Bouwens et al. (2015), but with small vertical offsets from the real data points due to the UVLF calibration procedure that allows us to identify the input model parameters (see the left panel of Fig. 2 [...]).
The noise that we add to the fiducial mock 21-cm signal is Gaussian-distributed with a standard deviation noise estimate $\sigma_{21}$. For our analyses, we test five different 21-cm noise levels (including the optimistic, fiducial, and pessimistic scenarios used for the REACH radiometer in de Lera Acedo et al. 2022): $\sigma_{21} = 5$ mK or 10 mK (referred to as 'optimistic'), $\sigma_{21} = 25$ mK or 50 mK (referred to as 'standard'), and $\sigma_{21} = 250$ mK (referred to as 'pessimistic'). We also note that the noise added to $D_{21}$ is constant in frequency space, whereas in practice, the noise on the measured global 21-cm signal is expected to decrease with increasing frequency according to the radiometer equation. It has been suggested that such frequency dependence has little impact on the derived parameter constraints (Bevins et al. 2022b), but full treatment is left for future work. For the UVLF, we use the error reported for the $z = 5.9$ UVLF data from Bouwens et al. (2015).
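The construction of a mock 21-cm data set with frequency-independent Gaussian noise can be sketched as follows. The redshift grid and the 1420.4 MHz rest frequency follow the text, whereas `toy_global_signal` is a hypothetical Gaussian trough that merely stands in for the fiducial ARES signal; its depth, centre, and width are placeholders.

```python
import numpy as np

NU_21_MHZ = 1420.4  # rest-frame frequency of the 21-cm line in MHz


def redshift_grid(z_min=6.0, z_max=55.0, dz=0.1):
    """Evenly spaced redshift grid including both end points."""
    n_points = int(round((z_max - z_min) / dz)) + 1
    return np.linspace(z_min, z_max, n_points)


def toy_global_signal(z, depth_mk=-100.0, z_centre=17.0, width=3.0):
    """Placeholder absorption trough in mK (NOT an ARES signal)."""
    return depth_mk * np.exp(-0.5 * ((z - z_centre) / width) ** 2)


def make_mock_data(signal_mk, sigma_mk, seed=0):
    """Add one realization of constant-variance Gaussian noise."""
    rng = np.random.default_rng(seed)
    return signal_mk + rng.normal(0.0, sigma_mk, size=signal_mk.shape)


z = redshift_grid()
nu_mhz = NU_21_MHZ / (1.0 + z)  # observed frequency of each redshift bin
fiducial = toy_global_signal(z)
mock_25mk = make_mock_data(fiducial, sigma_mk=25.0)
```

Fixing the random seed keeps the noise realization identical between runs, so that fits using different models in the likelihood are compared on exactly the same mock data.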

RESULTS
In this section, we present the results of fitting mock global 21-cm signal data, with and without mock high-$z$ galaxy UVLF data, using various noise levels to be expected from 21-cm experiments. For the astrophysical modelling, we employ either an ARES-trained globalemu network or the full ARES model.
We first discuss the posteriors obtained when jointly-fitting the mock 21-cm and UVLF data (Section 3.1, Figures 3 to 5), followed by those obtained when separately fitting the individual data sets (Section 3.2, Figure 6), and lastly we discuss the concept of posterior consistency in our results (Section 3.3). Because the joint-fits produce unimodal posteriors with well-behaved means, we focus primarily on the posteriors from joint-fits when comparing globalemu and ARES.
We determine the accuracy of the ARES-trained globalemu model by comparing the mean (see top panel of Figure 5) or the shape (see Appendix A) of the emulated posterior distributions to those of the 'true,' full ARES posteriors.Note that this comparison is driven by the global 21-cm signal since globalemu does not emulate the UVLF, which we continue to model with ARES.To our knowledge, the recently released 21cmEMU (Breitman et al. 2023) is the only publicly available emulator that includes the UVLF; see, however, Kern et al. (2017) for a more general emulator.
For most fits, we find it necessary to use more than the default number of initial live points in MultiNest of $n_{\rm live} = 400$ in order to fully sample the posterior and obtain convergence (see Table 2 for details on fits performed). For the choice of sampling efficiency (i.e., the ratio of points accepted to points sampled), we use the value recommended for parameter estimation in MultiNest, 0.8, and for the evidence tolerance, the recommended, default value of tol = 0.5. All of the triangle plots shown in this paper were generated using the Python module corner.py (Foreman-Mackey 2016, v2.0.0, Zenodo, doi:10.5281/zenodo.53155) with 100 bins and a Gaussian smoothing kernel of 2. For selected cases, we verified that increasing the number of bins did not change the results presented. The resulting posteriors were plotted using the samples and weights output by the converged nested sampling runs.
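Because nested sampling returns weighted samples, summary statistics such as posterior means must account for the weights. The following Python sketch computes weighted means and standard deviations for each parameter; the function name is hypothetical and the two-sample input is a placeholder, not output from any of the fits described here.

```python
import numpy as np


def weighted_mean_std(samples, weights):
    """Weighted mean and standard deviation of each parameter.

    samples : array of shape (n_samples, n_params)
    weights : array of shape (n_samples,), need not be normalized
    """
    samples = np.asarray(samples, dtype=float)
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()  # normalize to unit total weight
    mean = weights @ samples
    var = weights @ (samples - mean) ** 2
    return mean, np.sqrt(var)


# Toy usage: two samples of a single parameter with weights 1 and 3.
toy_mean, toy_std = weighted_mean_std([[0.0], [2.0]], [1.0, 3.0])
```

The same samples and weights can be passed to a plotting routine that accepts per-sample weights, so that the plotted contours and the quoted means are derived from identical inputs.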

Jointly-fitting 21-cm and UVLF Mock Data
In Figure 3, we present the posteriors obtained from a joint-fit using either globalemu or ARES, for a 'standard' noise level of $\sigma_{21} = 25$ mK in the 21-cm data being fit. We present this as the main joint-fit result because we find that $\sigma_{21} = 25$ mK gives the least biased mean parameter values for ARES with respect to the fiducial ones and therefore provides the best representation of the accuracy limits of the globalemu model with respect to ARES. In Figure 4, we compare the 1D posteriors obtained from joint-fits using ARES for three characteristic 21-cm noise levels. In Figure 5, we summarize the biases between the emulated and 'true' posteriors, as well as between the 'true' posteriors and the fiducial values, for the five tested 21-cm noise levels. We present the full posterior distributions obtained from joint-fits for $\sigma_{21} = 50$ mK and 250 mK in Appendix B.
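To make the role of the 21-cm noise level concrete, the sketch below forms mock data by adding Gaussian-distributed noise to a fiducial signal at each of the five tested noise levels. The fiducial signal here is a hypothetical Gaussian absorption trough, not an ARES model, and the frequency grid, function names, and random seed are all assumptions chosen for illustration.

```python
import math
import random

def toy_global_signal(freq_mhz, depth_mk=-100.0, center_mhz=80.0, width_mhz=10.0):
    """Hypothetical Gaussian absorption trough in mK (a stand-in, not ARES)."""
    return depth_mk * math.exp(-0.5 * ((freq_mhz - center_mhz) / width_mhz) ** 2)

def make_mock_data(signal_mk, sigma_21_mk, rng):
    """Add independent Gaussian noise of standard deviation sigma_21_mk (mK)."""
    return [t + rng.gauss(0.0, sigma_21_mk) for t in signal_mk]

# Assumed frequency grid: 50 to 150 MHz in 1 MHz steps.
freqs = [50.0 + i for i in range(101)]
fiducial = [toy_global_signal(f) for f in freqs]

rng = random.Random(42)  # fixed seed so the mock data are reproducible
noise_levels_mk = (5.0, 10.0, 25.0, 50.0, 250.0)  # the five tested levels
mocks = {s: make_mock_data(fiducial, s, rng) for s in noise_levels_mk}
```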
As expected, because the four SFE parameters ($f_{\star,0}$, $M_{\rm p}$, $\gamma_{\rm lo}$, and $\gamma_{\rm hi}$) directly determine the UVLF model (Section 2.4), their posteriors are well-constrained when adding the UVLF to the 21-cm data. For joint-fits, the four SFE posteriors are unimodal and centered on the fiducial value, which is not the case when fitting only the 21-cm data, as we discuss in Section 3.2. Interestingly, the bimodalities in the 1D posteriors for $M_{\rm p}$ and $\gamma_{\rm lo}$ when fitting only the 21-cm data disappear when adding the UVLF data in the joint-fit, showing that the combination of both data sets can break degeneracies in the ARES parameter space and reduce biases.
Comparing the emulated distributions (in red) and 'true' distributions (in black) in Figure 3, we see that the globalemu model produces posteriors similar to those of the ARES model, both in shape (see Appendix A) and mean (top panel of Figure 5), with a few exceptions discussed below. In Figure 5, we summarize the two different types of parameter biases discussed, for the different 21-cm noise levels tested: emulation bias (Equation 5) and true bias (Equation 6). Emulation bias refers to the accuracy of the emulated posterior parameter means, $\mu_{\tt globalemu}$, with respect to the 'true' posterior parameter means, $\mu_{\rm ARES}$, and true bias refers to the accuracy of $\mu_{\rm ARES}$ with respect to the fiducial parameter value, $\theta_0$. We note that these two biases provide all of the information necessary to evaluate the accuracy of globalemu and ARES, and that defining a third bias between $\mu_{\tt globalemu}$ and $\theta_0$ does not further aid our results.
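One way to see why a joint-fit tightens the SFE constraints is to write the joint log-likelihood as the sum of a 21-cm term and a UVLF term, so that parameters poorly constrained by one data set can still be pinned down by the other. The sketch below assumes independent Gaussian errors within each data set and independence between the two data sets; these are illustrative assumptions, and the function names are hypothetical.

```python
import math

def gaussian_loglike(data, model, sigma):
    """Log-likelihood for independent Gaussian errors.

    `sigma` may be a single number (same noise for every point) or a
    sequence with one value per data point.
    """
    if isinstance(sigma, (int, float)):
        sigma = [sigma] * len(data)
    return sum(
        -0.5 * ((d - m) / s) ** 2 - 0.5 * math.log(2.0 * math.pi * s * s)
        for d, m, s in zip(data, model, sigma)
    )

def joint_loglike(data_21, model_21, sigma_21, data_uvlf, model_uvlf, sigma_uvlf):
    """Joint log-likelihood, assuming the two data sets are independent."""
    return (gaussian_loglike(data_21, model_21, sigma_21)
            + gaussian_loglike(data_uvlf, model_uvlf, sigma_uvlf))
```

In this picture, the UVLF term depends only on the SFE parameters, while the 21-cm term depends on all eight parameters.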
We therefore define and compute an emulation bias as the difference in the emulated and 'true' posterior parameter means divided by the standard deviation of the 'true' posterior, $\sigma_{\rm ARES}$:

$$\text{emulation bias} = \frac{|\mu_{\tt globalemu} - \mu_{\rm ARES}|}{\sigma_{\rm ARES}}. \qquad (5)$$

In the same manner, we define a true bias between an ARES posterior parameter mean and its fiducial value:

$$\text{true bias} = \frac{|\mu_{\rm ARES} - \theta_0|}{\sigma_{\rm ARES}}. \qquad (6)$$
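The two bias statistics can be sketched in a few lines of Python. The absolute value is an assumption made here so that the biases are non-negative, as they are quoted in the text; the function names and the toy numbers are hypothetical and are not results from this work.

```python
def emulation_bias(mu_emulated, mu_true, sigma_true):
    """Offset of the emulated posterior mean from the 'true' posterior mean,
    in units of the 'true' posterior standard deviation."""
    return abs(mu_emulated - mu_true) / sigma_true

def true_bias(mu_true, theta_fiducial, sigma_true):
    """Offset of the 'true' posterior mean from the fiducial value,
    in units of the 'true' posterior standard deviation."""
    return abs(mu_true - theta_fiducial) / sigma_true

# Toy example (hypothetical numbers): an emulated mean 0.3 above a 'true'
# mean whose posterior has a standard deviation of 0.1.
example_emulation_bias = emulation_bias(4.6, 4.3, 0.1)
example_true_bias = true_bias(4.3, 4.25, 0.1)
```

In this toy case the emulation bias is about 3 and the true bias about 0.5, i.e. the emulated posterior would be offset even though the 'true' posterior recovers the fiducial value.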
We find that, in general, the emulation bias decreases as the 21-cm noise level increases. For $\sigma_{21} = 50$ mK and 250 mK, all parameters' emulation biases are $\leq 1$ (marked with a black, horizontal line in Figure 5), while at lower noise levels ($\sigma_{21} = 5$ mK, 10 mK, and 25 mK) the emulation bias rises above 1 for certain parameters. For $\sigma_{21} = 10$ and 25 mK, $T_{\rm min}$ and $\gamma_{\rm lo}$ have emulation biases of $3-4$, and for $\sigma_{21} = 5$ mK, $T_{\rm min}$ and $f_{\rm esc}$ have even higher emulation biases of $\approx 6-10$, while the emulation bias of $\gamma_{\rm lo}$ drops below 1.
The relatively high emulation biases on $T_{\rm min}$ and $\gamma_{\rm lo}$ are due only to the globalemu posteriors being less accurate, given that the true biases on $T_{\rm min}$ and $\gamma_{\rm lo}$ are low (see the bottom panel of Figure 5). In contrast, the high emulation bias on $f_{\rm esc}$ at 5 mK is influenced by the high true bias on the ARES posterior for $f_{\rm esc}$. We find that true biases $\geq 1$ at low 21-cm noise levels for $f_{\rm esc}$ and $T_{\rm min}$ also exist when using other samplers such as PolyChord (see Figure 4) and emcee (Foreman-Mackey et al. 2013), and so we infer that these biases could be due to limitations of the sampling algorithms in producing unbiased constraints at very low noise levels. Future work could further explore sampling biases at such low noise levels by using other algorithms such as dynesty (Speagle 2020) and in particular UltraNest (Buchner 2016, 2019, 2021), which was created for the purpose of mitigating bias in complex posteriors.
We can also compare the final evidences output from the nested sampling analyses and compute the Bayes factor (i.e., the ratio of evidences, or difference of log evidences) to select the favored model given the data and priors (Trotta 2008). For $\sigma_{21} = 25$ mK, the Bayes factor between globalemu and ARES is 0.6; for 50 mK, it is 2.7; and for 250 mK, 1.2. The natural logarithm of these Bayes factors being $< 1$ indicates that there is no preference for one model over the other in fitting the mock data (see e.g., Kass & Raftery 1995; Jeffreys 1998; Trotta 2008).
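The arithmetic behind this comparison is short enough to check directly. The sketch below takes the three Bayes factors quoted above, computes their natural logarithms, and applies the $< 1$ criterion; the helper `log_bayes_factor`, which works from two log evidences, is a hypothetical name added for illustration.

```python
import math

def log_bayes_factor(log_z_a, log_z_b):
    """Natural log of the Bayes factor Z_a / Z_b, from two log evidences."""
    return log_z_a - log_z_b

# Bayes factors quoted in the text, keyed by 21-cm noise level in mK.
quoted_bayes_factors = {25: 0.6, 50: 2.7, 250: 1.2}

# Natural logarithms: roughly -0.51, 0.99, and 0.18, respectively.
ln_b = {noise: math.log(b) for noise, b in quoted_bayes_factors.items()}

# No model is preferred if the magnitude of ln(B) is below 1.
no_preference = {noise: abs(value) < 1.0 for noise, value in ln_b.items()}
```

Note that the 50 mK case sits close to the threshold, since $\ln 2.7 \approx 0.99$.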
In Figure 4, we see that as $\sigma_{21}$ increases, the non-SFE posteriors become less constrained around the fiducial value, except for $\log N_{\rm HI}$, which is unconstrained at all noise levels. At high/pessimistic $\sigma_{21}$, the 21-cm data provides much less constraining power, which causes degeneracies in the 8-dimensional ARES parameter space to grow larger (i.e., the space becomes flatter). This subsequently widens the posterior distributions for those parameters that are most sensitive to the 21-cm data (see also Section 3.3). For the 21-cm noise level of 250 mK, the true biases are $\approx 1$ for $c_X$ and $T_{\rm min}$, and $\approx 2.5$ for $f_{\rm esc}$. In contrast, for $\sigma_{21} = 10$ mK, 25 mK, and 50 mK, there is no true bias $\geq 1$, except for $T_{\rm min}$ at 50 mK and $f_{\rm esc}$ at 10 mK, which each have a true bias of $\approx 2$ (see bottom panel of Figure 5).
As briefly mentioned, we performed one joint-fit using the PolyChord nested sampling algorithm to compare the result to an equivalent joint-fit using MultiNest. In Figure 4, the posteriors from PolyChord are shown as dotted green histograms, and the equivalent posteriors from MultiNest are shown in solid yellow. For the 21-cm data being fit we assume the optimistic noise level $\sigma_{21} = 10$ mK, and for the UVLF we assume twice the error on the $z = 5.9$ UVLF measurements from Bouwens et al. (2015) (i.e., '2xB+15'). We use the '2xB+15' UVLF error instead of 'B+15' because this allows the PolyChord run to converge in a more reasonable amount of time. In addition, we find that doing so has no effect on the non-SFE posteriors and only slightly increases the width of the SFE posteriors. We find close agreement between the posterior distributions and final evidences (see Table 2) obtained when using PolyChord versus those when using MultiNest.
Comparing the two runs, we find that PolyChord required 28 times more likelihood evaluations to reach roughly the same result (with an acceptance rate of 0.38% versus 8.7% for MultiNest; see Table 2). PolyChord, however, is expected to become more efficient than MultiNest for a larger number of parameters (Handley et al. 2015a), and could thus be a better choice for 21-cm analyses including additional free parameters to account for systematics such as the beam-weighted foreground, RFI, sub-surface conditions, etc.

Fitting Individual Mock Data Sets
In Figure 6, we present the posterior distributions when separately fitting our individual mock data sets. When fitting only the 21-cm data, using either the full ARES model (in black) or the ARES-trained globalemu model (in red) for $\sigma_{21} = 50$ mK, the posterior presents large degeneracies and in general larger true biases than the corresponding joint-fit at the same $\sigma_{21}$ (shown in Figure B.1). In particular, for the SFE parameters, bimodalities and degeneracies exist when fitting only the global signal that are removed when jointly-fitting the UVLF (see Section 3.1). Among the four SFE parameters, $M_{\rm p}$ and $\gamma_{\rm hi}$ are the least constrained when fitting only the 21-cm data. This is expected because these two parameters control the brightest sources, which contribute relatively little to the global photon budget, making the global signal rather insensitive to these parameters and motivating the inclusion of the UVLF data to aid these constraints. In addition, even though the posteriors of the non-SFE parameters, $c_X$, $f_{\rm esc}$, $T_{\rm min}$, and $\log N_{\rm HI}$, remain largely the same after adding the UVLF data, the joint-fit does significantly reduce the presence of long tails in these parameters, in particular for $f_{\rm esc}$ and $T_{\rm min}$.
When only fitting the UVLF data (green posterior in Figure 6), we find, as expected, strong constraints on the SFE parameters and a lack of constraints on the rest. This is because the ARES UVLF model only depends on the four SFE parameters and is independent of the other four. The green posteriors, together with the black or red posteriors in Figure 6, illustrate how jointly-fitting the UVLF with the 21-cm data is expected to break significant degeneracies in this parameter space, to obtain the tight constraints shown in Figure 3.
Comparing the red and black constraints from the 21-cm data in Figure 6, we find that using the ARES-trained globalemu model produces rather similar 1D and 2D posterior distributions to those from the full ARES model, with all emulation biases $< 1$, except for $f_{\rm esc}$, which has an emulation bias of $\approx 1$. As stated in Table 2, the runs using globalemu and ARES reach nearly the same final evidence, further demonstrating the agreement between the two results. This close agreement shows that globalemu is able to represent the ARES parameter space more easily when the constraints are significantly weaker with respect to those from the joint-fit with the UVLF data.

Posterior Consistency
Bayesian consistency of a posterior distribution is the concept that as the number of data observations grows, the posterior distribution converges on the truth (Schwartz 1965). A posterior is considered consistent if it eventually concentrates on the true parameter value as the number of degrees of freedom in the data vector increases to infinity. As shown in Figure 4, we observe posterior consistency when comparing the 1D posteriors obtained for decreasing levels of the 21-cm noise: larger integration times result in posteriors generally becoming more peaked around the input, fiducial values (marked by blue lines). As briefly mentioned in Section 3.1, for lower integration times (i.e., higher $\sigma_{21}$), the 21-cm data provides relatively little constraining power, which increases the covariance in the multi-dimensional parameter space, producing probability density biases. As expected from Bayesian consistency, we thus find that the posteriors are more biased from their fiducial values at increasing noise levels.
Posterior consistency is most apparent for these four parameters: $c_X$, $f_{\rm esc}$, $T_{\rm min}$, and $\gamma_{\rm lo}$. Their pessimistic noise level posteriors ($\sigma_{21} = 250$ mK; gray in Figure 4) are clearly not centered on their fiducial values, presenting a relatively slow 'rate of convergence,' while the three SFE parameters $f_{\star,0}$, $M_{\rm p}$, and $\gamma_{\rm hi}$ have faster rates of convergence and thus require less integration time to concentrate on their input, fiducial values. As also shown in the triangle plots above, $\log N_{\rm HI}$ remains largely unconstrained for all the noise levels, though globalemu still accurately emulates its posterior.
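A toy example helps illustrate the idea. The sketch below estimates the mean of a Gaussian with known noise under a flat prior, for which the posterior is Gaussian with mean equal to the sample mean and standard deviation $\sigma/\sqrt{N}$. This is a textbook stand-in, not the 8-parameter ARES problem; the values of `truth` and `sigma`, the seed, and the function name are assumptions.

```python
import math
import random

def posterior_of_mean(data, sigma):
    """Posterior mean and standard deviation for the mean of Gaussian data
    with known noise `sigma`, assuming a flat prior."""
    n = len(data)
    return sum(data) / n, sigma / math.sqrt(n)

rng = random.Random(0)   # fixed seed for reproducibility
truth, sigma = 1.0, 0.5  # hypothetical true value and per-sample noise

results = []
for n in (10, 1000, 100000):
    data = [rng.gauss(truth, sigma) for _ in range(n)]
    post_mean, post_std = posterior_of_mean(data, sigma)
    results.append((n, post_mean, post_std))

# Posterior widths shrink as more data are included.
widths = [post_std for _, _, post_std in results]
```

Averaging more samples plays the same role as a longer integration time, which lowers the effective noise and lets the posterior concentrate on the true value.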

CONCLUSIONS
In this paper, we present the 1D and 2D posterior distributions for eight astrophysical parameters in ARES obtained when fitting mock data of the global 21-cm signal and/or the high-$z$ galaxy UVLF via nested sampling. We compare for the first time the posteriors obtained from a global 21-cm signal emulator to those obtained using the full model on which it is trained, at various 21-cm noise levels. Use of an emulator such as globalemu is desirable as it speeds up model evaluations by several orders of magnitude, but the accuracy of such constraints is poorly understood. In ARES, the eight parameters employed control the star formation efficiency (SFE) and the efficiency of UV and X-ray photon production per unit star formation in galaxies (see Table 1).
We assess the accuracy of the parameter constraints obtained by an ARES-trained globalemu network and determine for which parameters and 21-cm noise levels globalemu is biased compared to ARES. We test optimistic, standard, and pessimistic 21-cm noise levels ranging between $\sigma_{21} = 5$ mK and 250 mK to show the astrophysical constraints that can be expected for non-systematics-limited 21-cm experiments. We optimize the accuracy of the trained globalemu network by testing multiple network architectures and training set sizes, obtaining a mean RMSE between the emulated and true ARES signals in the test set of 1.25 mK.
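For reference, a mean RMSE of this kind can be computed as sketched below, assuming the RMSE is taken over the frequency channels of each test signal and then averaged over the test set. That ordering, the function names, and the toy signals are assumptions made for illustration.

```python
import math

def rmse(emulated, true):
    """Root-mean-square error between one emulated and one true signal (mK)."""
    if len(emulated) != len(true):
        raise ValueError("signals must have the same number of channels")
    return math.sqrt(sum((e - t) ** 2 for e, t in zip(emulated, true)) / len(true))

def mean_rmse(emulated_set, true_set):
    """Average of the per-signal RMSE over a test set."""
    values = [rmse(e, t) for e, t in zip(emulated_set, true_set)]
    return sum(values) / len(values)

# Toy test set of two three-channel signals (hypothetical values in mK).
true_signals = [[0.0, -50.0, -10.0], [0.0, -120.0, 20.0]]
emulated_signals = [[1.0, -49.0, -11.0], [0.0, -118.0, 20.0]]
toy_mean_rmse = mean_rmse(emulated_signals, true_signals)
```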
We find that adding the UVLF to the 21-cm data provides significant improvements to the constraints on the four SFE parameters, and it has little to no effect on the constraints on the non-SFE parameters.These results imply that combining 21-cm observations with HST and JWST measurements of the UVLF at different redshifts may provide key insights into the suggested redshift evolution of the star formation efficiency and the degree of stochasticity.
The ARES-trained globalemu model produces relatively accurate posteriors with respect to the 'true' ARES model at the tested 21-cm noise levels, both in shape and mean, with the following exceptions. $T_{\rm min}$ and $\gamma_{\rm lo}$ present significant emulation biases at $\sigma_{21} = 25$ mK or lower, for which globalemu overpredicts $T_{\rm min}$ and underpredicts $\gamma_{\rm lo}$ by $\approx 3-4\sigma$ (see the top panel of Figure 5, and Figure 3 for the full posterior distributions); the exception is $\sigma_{21} = 5$ mK, where $\gamma_{\rm lo}$ has a negligible bias. For noise levels of $\sigma_{21} = 50$ mK and 250 mK, the globalemu emulator reproduces the posterior means found by ARES at the 68% confidence level for all eight parameters (see the top panel of Figure 5, and Appendix B for the full posterior distributions).
When examining the 1D posteriors obtained from joint-fits at various noise levels in Figure 4, we find that as the noise in the 21-cm data decreases, the 1D posteriors become more concentrated around their input, fiducial values, as expected for 'posterior consistency.' For standard noise levels of $\sigma_{21} = 25$ mK and 50 mK, the true biases for all parameters are $< 1$, except at $\sigma_{21} = 50$ mK, where $T_{\rm min}$ has a true bias of $\approx 1.5$. For the pessimistic noise level of $\sigma_{21} = 250$ mK, three parameters ($c_X$, $T_{\rm min}$, and $f_{\rm esc}$) have 'true' ARES posterior means that are $\approx 1-3\sigma$ away from their fiducial value (i.e., have 'true biases' $\approx 1-3$; see the bottom panel of Figure 5). This indicates a slow rate of convergence for these parameter fits and the need for a longer integration time to achieve posteriors centered around the true value.
In summary, this work provides insights on the statistical constraints that are achievable from global 21-cm measurements in combination with high-$z$ UVLF data when using an emulator. We obtain strong constraints on eight ARES parameters when jointly-fitting such data using either the full ARES model or an ARES-trained globalemu model. The most accurate ARES constraints are achieved for a 21-cm noise level of 25 mK, where all eight ARES parameter means are within $1\sigma$ of their fiducial values. At this noise level, however, globalemu overpredicts $T_{\rm min}$ and underpredicts $\gamma_{\rm lo}$. For larger noise levels of 50 and 250 mK, while in general the true biases increase, the emulated and true posteriors match more closely such that their parameter means are within $1\sigma$ of each other.

We thank the anonymous reviewer for their detailed comments that helped improve the manuscript. We thank Harry Bevins for useful discussions. This work was directly supported by the NASA Solar System Exploration Research Virtual Institute cooperative agreement 80ARC017M0006. This work was also partially supported by the Universities Space Research Association via D.R. using internal funds for research development. We also acknowledge support by NASA grant 80NSSC23K0013. J.M. was supported by an appointment to the NASA Postdoctoral Program at the Jet Propulsion Laboratory/California Institute of Technology, administered by Oak Ridge Associated Universities under contract with NASA. This work utilized the Blanca condo computing resource at the University of Colorado Boulder. Blanca is jointly funded by computing users and the University of Colorado Boulder.

Software: This research relies heavily on the python (Van Rossum & Drake Jr 1995) open source community, in particular, numpy (Harris et al. 2020), matplotlib (Hunter 2007), scipy (Virtanen et al. 2020), and jupyter (Kluyver et al. 2016). This research also utilized MultiNest (Feroz & Hobson 2008; Feroz et al. 2009, 2019), PolyChord (Handley et al. 2015a,b), and globalemu (Bevins et al. 2021).

Note to Table 2: The information provided for each fit includes the noise level of the mock 21-cm signal ($\sigma_{21}$) and/or UVLF ($\sigma_{\rm UVLF}$) being fit, the number of initial live points used ($n_{\rm live}$), and the final output metrics, including the evidence (log Z), the total number of likelihood evaluations ($n_{\rm evaluations}$), the acceptance rate ($f_{\rm accept}$), and the average CPU-time required per evaluation (sec./eval.). 'B+15' denotes that the UVLF error used is the same as that of the $z = 5.9$ UVLF data by Bouwens et al. (2015) (see Section 2.6). All fits shown were performed using MultiNest, except for one joint-fit for which we used PolyChord, the result of which is consistent with the equivalent MultiNest fit (see Figure 4). The fit using PolyChord required over an order of magnitude more computational time to converge compared to the equivalent MultiNest fit, and so we used twice the 'B+15' UVLF error to aid convergence in a reasonable amount of time without significantly affecting the results (see Section 3.1).

Figure 1. Representative subset of the training set (10% out of 24,000 total) containing mock global 21-cm signals generated by ARES when varying eight astrophysical parameters. The full training set was used to train globalemu (see Table 1 for the parameter ranges). Shown in bolded blue is the fiducial global 21-cm signal to which we add Gaussian-distributed noise at different levels to form the mock 21-cm data sets that we fit. The mock UVLF data that we add in our joint-fits are also generated by ARES using the same fiducial parameter values (see Table 1) that were obtained via calibration to the Bouwens et al. (2015) $z = 5.9$ UVLF by Mirocha et al. (2017), as described in Section 2.6.

Figure 2. Top: Representative subset of the test set (200 out of 2,000) generated by ARES ('true' global signals; black, dashed curves) and the corresponding subset of emulations from the globalemu network (solid, red curves) trained on the ARES training set using the architecture [32, 32, 32].Bottom: Differences between the emulated and 'true' signals in the top panel (i.e., emulation residuals), with color depicting the depth of the Cosmic Dawn (CD) trough of the respective signal.The horizontal dotted, red line indicates the mean RMSE of 1.25 mK between the emulated and 'true' signals in the full test set (see Section 2.5).
in Mirocha et al. (2017) for a comparison of the fiducial ARES UVLF model and the Bouwens et al. (2015) UVLF).

Figure 3. Marginalized 1D and 2D posterior distributions for eight astrophysical parameters in ARES when jointly-fitting mock global 21-cm signal and UVLF data. These eight parameters control the SFE and the UV and X-ray photon production in galaxies (see Table 1). The red posterior is obtained using the ARES-trained globalemu network model, and the black posterior is obtained using the full ARES model. Blue vertical and horizontal lines indicate the input, or fiducial, parameter values used to generate the mock data being fit (see Table 1), which are calibrated to real observations of the UVLF (see Section 2.6). The statistical noise in the 21-cm data being fit is $\sigma_{21} = 25$ mK, which among the five tested we find gives the most accurate 'true' ARES posteriors with respect to the fiducial parameter values, and also highlights for which parameters globalemu obtains biased constraints (see also Figure 5). The UVLF data noise is the same as the error on the $z = 5.9$ UVLF measurements from Bouwens et al. (2015). Contour lines in the 2D histograms represent the 95% confidence levels, and density colormaps are shown. Axis ranges are zoomed-in with respect to the full prior ranges given in Table 1. See Table 2 for further details on each fit.

Figure 4. Marginalized 1D posterior distributions when jointly-fitting mock global 21-cm signal and UVLF data using the full ARES model, for three different 21-cm noise levels: 10 mK (optimistic), 25 mK (standard), and 250 mK (pessimistic). These eight parameters control the SFE and the UV and X-ray photon production in galaxies (see Table 1). Blue vertical lines indicate the input, or fiducial, parameter values used to generate the mock data (see Section 2.6). The dotted, green histograms result from using PolyChord with $\sigma_{21} = 10$ mK and match well the corresponding distributions obtained by using MultiNest. The noise on the mock UVLF being fit is the same as the error on the $z = 5.9$ UVLF measurements from Bouwens et al. (2015), except for the posteriors for 10 mK shown here, for which we used twice the UVLF error to allow for a reasonable convergence time of the PolyChord run (see Section 3.1). The posteriors for 25 mK and 250 mK are the same as those in Figures 3 and B.2, respectively. Axis ranges are zoomed-in from the full prior ranges given in Table 1.

Figure 5. Top: Emulation bias (number of standard deviations, see Equation 5) between globalemu and ARES for different noise levels of the mock 21-cm data being jointly-fit with the mock UVLF data. Generally, the emulation bias decreases as the 21-cm noise level increases. For $\sigma_{21} = 50$ mK and 250 mK, the emulation biases are $< 1$ for all eight parameters, as indicated by the horizontal black line. The emulation biases for $\gamma_{\rm lo}$, $T_{\rm min}$, and $f_{\rm esc}$ can be significantly higher than the rest for certain lower 21-cm noise levels. Bottom: True bias (Equation 6) between ARES and the fiducial parameter values, for the same joint-fits. True bias is lowest at 25 mK ($< 1$ for all parameters), and increases at high and low 21-cm noise levels due to increased uncertainty and difficulty in sampling, respectively (see Section 3.1). As also discussed in the text, note that the high emulation bias on $f_{\rm esc}$ at 5 mK is dominated by its high true bias.

Figure 6. Marginalized 1D and 2D posterior distributions obtained when fitting either mock global 21-cm signal data (red and black) or mock UVLF data (green). All is the same as in Figure 3, except that the statistical noise in the 21-cm data being fit is $\sigma_{21} = 50$ mK, and the axis ranges are the full prior ranges given in Table 1. See Table 2 for further details on each fit.
The results from each fit listed in Table 2 are presented in Section 3 and Appendix B (see Figures 3 to 6 and Figures B.1 and B.2), except for the $\sigma_{21} = 25$ mK global-signal-only fits.

Table 1. Astrophysical parameters in ARES to be fit with mock global 21-cm signal and high-$z$ UVLF data.

$c_X$: normalization of X-ray luminosity-SFR relation; Log unif. prior over $[10^{36}, 10^{41}]$ erg s$^{-1}$ (M$_\odot$ yr$^{-1}$)$^{-1}$; fiducial value $2.6 \times 10^{39}$.

... released 21cmEMU (Breitman et al. 2023); we leave a comparison of the posteriors obtained from different global 21-cm signal emulators to future work. To obtain a trained globalemu neural network that accurately emulates ARES, we first create a large training set of simulated global 21-cm signals generated by ARES and then train globalemu on this training set. For the latter step, we test multiple network architectures (i.e., different numbers of nodes and hidden layers composing the network; see Bevins et al. 2021 for a detailed description of the network). To create the training set, we generate global 21-cm signals from ARES by drawing random values from the parameter ranges given in Table

Table 2. Summary of key nested sampling analyses.

Columns: Type of mock data being fit | Model used in likelihood | $\sigma_{21}$ (mK) | $\sigma_{\rm UVLF}$ (mag$^{-1}$ cMpc$^{-3}$) | $n_{\rm live}$ | log Z | $n_{\rm evaluations}$ | $f_{\rm accept}$ | sec./eval.