BayeSED-GALAXIES I. Performance test for simultaneous photometric redshift and stellar population parameter estimation of galaxies in the CSST wide-field multiband imaging survey

The forthcoming CSST wide-field multiband imaging survey will produce seven-band photometric spectral energy distributions (SEDs) for billions of galaxies. Effectively extracting astronomical information from such massive SED datasets relies on techniques for both SED synthesis (or modeling) and SED analysis (or fitting). We evaluate the performance of the latest version of the BayeSED code, combined with SED models of increasing complexity, for simultaneously determining the photometric redshifts and stellar population parameters of galaxies in this survey. Using an empirical statistics-based mock galaxy sample without SED modeling errors, we find that the random observational errors in the photometry are a more important source of error than parameter degeneracies or the Bayesian analysis method and tool. Using a Horizon-AGN hydrodynamical simulation-based mock galaxy sample with SED modeling errors in the star formation histories (SFHs) and dust attenuation laws (DALs), we find that simple, typical assumptions lead to significantly worse parameter estimation when only CSST photometry is used. SED models with more flexible (or complicated) forms of SFH/DAL do not necessarily lead to better estimation of the redshift and stellar population parameters. We discuss the selection of the best SED model by means of Bayesian model comparison in different surveys. Our results reveal that Bayesian model comparison based on the Bayesian evidence may favor SED models of different complexity when photometry from different surveys is used. Meanwhile, the SED model with the largest Bayesian evidence tends to give the best parameter estimation, and this trend is clearer for photometry with greater discriminative power.


INTRODUCTION
Understanding the complex ecosystem of stars, interstellar gas and dust, and supermassive black holes in galaxies is one of the most important challenges in modern astrophysics (National Academies of Sciences, Engineering, and Medicine 2021). The new generation of space and ground-based telescopes and the corresponding large surveys will provide vast amounts of multi-band data for understanding these cosmic ecosystems and all the complex physical processes involved. For example, the James Webb Space Telescope (JWST, Rieke et al. 2005; Gardner et al. 2006; Beichman et al. 2012) is able to detect the earliest stages of galaxies in the infrared at unprecedented depths, and is expected to provide decisive observations of the first generation of stars and galaxies (Beichman et al. 2012; Robertson 2022). Meanwhile, forthcoming deep and wide-field surveys with the Chinese Space Station Telescope (CSST, Zhan 2011, 2018, 2021), the Euclid Space Telescope (Laureijs et al. 2011; Joachimi 2016), the Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST, Ivezić et al. 2019; Breivik et al. 2022), and the Nancy Grace Roman Space Telescope (NGRST, Green et al. 2012; Dore et al. 2019) will provide multi-band photometric and spectroscopic information for billions of galaxies. In particular, the CSST wide-field multiband imaging survey is set to image approximately 17,500 square degrees of the sky in the NUV, u, g, r, i, z, and y bands over about 10 years of orbital time, aiming to achieve a 5σ limiting magnitude of 26 (AB mag) or deeper for point sources in the g and r bands. How to effectively and reliably measure the redshifts and the properties of the various physical components of galaxies from the resulting huge amount of photometric spectral energy distribution (SED) data has become an urgent task. A new generation of SED synthesis and analysis methods and tools is strongly needed to effectively extract physical information from these massive datasets of observational SEDs.
SED synthesis and SED analysis of galaxies are two opposite yet unified aspects of the same problem. The reliability and efficiency of the SED synthesis and analysis methods and tools directly determine the reliability and efficiency of physical information extraction from massive multi-wavelength datasets. For the SED synthesis of galaxies, the evolutionary synthesis of stellar populations has been the core method since the pioneering works of Tinsley & Gunn (1976) and Tinsley (1978). Nowadays, stellar population synthesis models such as BC03 (Bruzual & Charlot 2003), M05 (Maraston 2005), FSPS (Conroy et al. 2009), and BPASS (Eldridge & Stanway 2009), among others, are widely used in studies of the formation and evolution of galaxies. However, important uncertainties remain in almost all ingredients of galaxy SED synthesis models (Conroy et al. 2009, 2010; Conroy & Gunn 2010; Conroy 2013), such as the stellar initial mass function (IMF) (Padoan et al. 1997; Hoversten & Glazebrook 2008; van Dokkum 2008; Bastian et al. 2010; Cappellari et al. 2012; Ferreras et al. 2013; Gennaro et al. 2018), the physics of stellar evolution (Thomas & Maraston 2003; Zhang et al. 2005; Maraston et al. 2006; Han et al. 2007; Marigo et al. 2008; Bertelli et al. 2008; Brott et al. 2011; Hernández-Pérez & Bruzual 2013), stellar spectral libraries (Coelho 2009; Choi et al. 2019; Knowles et al. 2019; Coelho et al. 2020; Knowles et al. 2021; Yan et al. 2019), the complex star formation and metallicity enrichment histories (SFHs and MEHs) (Debsarma et al. 2016; Iyer et al. 2019; Carnall et al. 2019; Leja et al. 2019; Aufort et al. 2020; Wang & Lilly 2020; Iyer et al. 2020; Côté et al. 2016; Maiolino & Mannucci 2019; Valentini et al. 2019), the reprocessing by interstellar gas and dust (Draine 2003, 2010; Galliano et al. 2018; Kewley et al. 2019; Salim & Narayanan 2020; Tacconi et al. 2020), and the possible contribution from active galactic nuclei (AGNs) at the centers of galaxies (Antonucci 1993, 2012; Netzer 2015; Hickox & Alexander 2018; Brown et al. 2019a,b; Lyu & Rieke 2022). Different choices of these model ingredients can lead to very different estimates of the redshifts and physical parameters of galaxies, as well as to different and even conflicting conclusions about galaxy formation and evolution. Therefore, the proper selection of these model ingredients is an essential step in any SED analysis of galaxies (Han & Han 2019; Han et al. 2020).
For the SED analysis of galaxies, Bayesian methods have been widely adopted in the last decade. For example, widely used and actively developed SED fitting codes such as MAGPHYS (da Cunha et al. 2008), CIGALE (Noll et al. 2009; Boquien et al. 2019), GalMC (Acquaviva et al. 2011), BayeSED (Han & Han 2012, 2014), BEAGLE (Chevallard & Charlot 2016), Prospector (Leja et al. 2017), BAGPIPES (Carnall et al. 2018), and ProSpect (Robotham et al. 2020) are all based on Bayesian methods. In addition, a long list of new SED fitting codes, such as MCSED (Bowman et al. 2020), piXedfit (Abdurro'uf et al. 2021), gsf (Morishita 2022), and Lightning (Doore et al. 2023), among others, have been built along the same lines. The application of Bayesian methods means that the SED analysis of galaxies is treated as a more general Bayesian inference problem, rather than as the chi-square minimization-based optimization problem previously known as SED fitting. For the parameter estimation of a given SED model, the Bayesian approach provides the complete posterior probability distribution of the parameters as the solution to the SED analysis problem, which is computationally more demanding but allows a more formal and simultaneous estimation of parameter values and their uncertainties. More importantly, for the selection of model ingredients, the Bayesian approach also provides the Bayesian evidence, which can be regarded as a quantified Occam's razor for effective model selection.
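For reference, the posterior and the evidence mentioned above are the two sides of Bayes' theorem. In standard notation (the symbols are ours, not taken from the BayeSED papers), with parameters θ of an SED model M and photometric data d:

```latex
P(\boldsymbol{\theta} \mid \mathbf{d}, M)
  = \frac{\mathcal{L}(\mathbf{d} \mid \boldsymbol{\theta}, M)\,\pi(\boldsymbol{\theta} \mid M)}{\mathcal{Z}(\mathbf{d} \mid M)},
\qquad
\mathcal{Z}(\mathbf{d} \mid M)
  = \int \mathcal{L}(\mathbf{d} \mid \boldsymbol{\theta}, M)\,\pi(\boldsymbol{\theta} \mid M)\,\mathrm{d}\boldsymbol{\theta},
\qquad
B_{12} = \frac{\mathcal{Z}(\mathbf{d} \mid M_1)}{\mathcal{Z}(\mathbf{d} \mid M_2)}.
```

Because Z averages the likelihood over the prior, a model whose extra parameters are not required by the data spreads its prior mass over poorly fitting regions and receives a lower evidence; the Bayes factor B12 compares two models on the same data in exactly this way.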
A noteworthy difference among Bayesian SED analysis tools is that earlier tools (e.g. MAGPHYS and CIGALE) are based on irregular or regular grid searches, while the newer generation of tools (e.g. GalMC and BayeSED) are based on more advanced random sampling techniques such as Markov Chain Monte Carlo (MCMC, Sharma 2017; Hogg & Foreman-Mackey 2018) and Nested Sampling (NS, Skilling 2006; Buchner 2021; Ashton et al. 2022). The advantage of the grid-based Bayesian approach is that an SED library with regular or irregular model grids needs to be built in advance only once, and the prior probabilities can be set more freely during this procedure. The library can then be used to analyze a galaxy sample of any size without generating new SEDs. However, the SED library needs to be very large to allow reasonable parameter estimation for all galaxies in the sample, especially with regular grids, where the number of required grid points increases dramatically with the number of free parameters. In contrast, a sampling-based Bayesian approach allows a more detailed and efficient exploration of the parameter space for each galaxy and a finer reconstruction of the posterior, leading to more reliable parameter estimates. However, the theoretical SED synthesis then needs to be done in real time and repeated many times, which can be very computationally expensive for very large samples of galaxies. Fortunately, much more efficient SED synthesis can be achieved with the help of machine learning techniques. For example, in Han & Han (2014) we employed artificial neural network (ANN) and K-nearest neighbor (KNN) searching techniques to speed up the sampling-based Bayesian approach. The combination of sampling-based Bayesian inference and machine learning techniques enables detailed Bayesian SED analysis of very large samples of galaxies (Han et al. 2019). Although the training phase of machine-learning-based SED synthesis can be very time-consuming, especially for very complex SED models with many free parameters and for the accurate synthesis of high-resolution SEDs, this approach is very promising with more advanced training techniques (Alsing et al. 2020; Gilda et al. 2021; Qiu & Kang 2022; Hahn & Melchior 2022) and is worth further exploration.
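The grid-based approach described above can be sketched as follows. This is a minimal illustration, not how MAGPHYS or CIGALE are implemented: it assumes a precomputed library of model fluxes with a prior weight per grid point and a Gaussian likelihood, and the random "library" and the names `grid_posterior` and `posterior_mean` are hypothetical stand-ins for synthesized SEDs.

```python
import numpy as np

def grid_posterior(lib_fluxes, lib_prior, obs_flux, obs_err):
    """Normalized posterior weight of every model in a precomputed SED library.

    lib_fluxes : (n_models, n_bands) model fluxes of the library
    lib_prior  : (n_models,) prior weight of each grid point
    obs_flux, obs_err : (n_bands,) observed fluxes and 1-sigma errors
    """
    chi2 = np.sum(((lib_fluxes - obs_flux) / obs_err) ** 2, axis=1)
    # Gaussian likelihood up to a constant: log L = -chi2 / 2
    log_w = np.log(lib_prior) - 0.5 * chi2
    w = np.exp(log_w - log_w.max())   # subtract the maximum to avoid underflow
    return w / w.sum()

def posterior_mean(weights, lib_param):
    """Posterior mean of any parameter tabulated on the same grid."""
    return np.sum(weights * lib_param)

# Toy library (hypothetical numbers): 1000 models in 7 bands
rng = np.random.default_rng(0)
lib_z = rng.uniform(0.0, 4.0, 1000)               # redshift attached to each model
lib_fluxes = rng.uniform(0.1, 10.0, (1000, 7))    # stands in for synthesized SEDs
lib_prior = np.full(1000, 1.0 / 1000)             # flat prior over the grid
obs_err = np.full(7, 0.1)
obs_flux = lib_fluxes[42] + rng.normal(0.0, 0.1, 7)   # "galaxy" drawn from model 42

w = grid_posterior(lib_fluxes, lib_prior, obs_flux, obs_err)
print(np.argmax(w), posterior_mean(w, lib_z), lib_z[42])
```

The same library serves every galaxy and only the chi-square evaluation is repeated, which is why the cost is dominated by the library size rather than by SED synthesis.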
For the study of galaxy formation and evolution, the ideal SED synthesis and analysis tool should be able to simultaneously account for the contributions of stars, interstellar gas and dust, and AGN components, and to provide accurate and efficient estimates of the redshift and the physical properties of all components. In practice, however, it is very difficult, if not impossible, to fully satisfy all of these requirements. Therefore, a good SED synthesis and analysis tool should strike a reasonable balance among them. This is what we have been trying to achieve during the development of the BayeSED code (Han & Han 2012, 2014, 2019; Han et al. 2019, 2020). In this work, we rigorously test the performance of the latest version of the BayeSED code, combined with SED models of increasing complexity, for simultaneous photometric redshift and stellar population parameter estimation of galaxies, in preparation for the analysis of the forthcoming massive datasets from the CSST wide-field multiband imaging survey and others.
We begin in §2 by introducing the methods we employ to generate empirical statistics-based (§2.1) and hydrodynamical simulation-based (§2.2) mock catalogs of galaxies, to model observational errors (§2.3), and to select the samples (§2.4) used for the performance test. In §3, we briefly describe the Bayesian approach to photometric SED analysis, including parameter estimation (§3.1) and model selection (§3.2). In particular, in §3.3 we introduce some runtime parameters of the MultiNest algorithm, the core engine of BayeSED, which need to be properly tuned to improve the performance of BayeSED. In §4, we present the results of the performance test for the case without SED modeling errors, using an empirical statistics-based mock galaxy sample. In §5, employing the simplest SED model, we present the results of the performance test for the case with SED modeling errors in the SFH and DAL of galaxies, using a Horizon-AGN hydrodynamical simulation-based mock galaxy sample for a CSST-like imaging survey. In §6, we discuss the effectiveness of more flexible (or complex) forms of SFH and DAL for improving simultaneous redshift and stellar population parameter estimation in CSST-like (§6.1), CSST+Euclid-like (§6.2), and COSMOS-like (§6.3) surveys, which have increasing discriminative power. We also discuss the relation between the metrics of parameter estimation quality and the Bayesian evidence, and how this relation depends on the survey. Finally, a summary of our results and conclusions is presented in §7.

BAYESIAN PHOTOMETRIC SED SYNTHESIS WITH BAYESED
The SED synthesis (or modeling) module is an essential part of any Bayesian SED fitting code. In BayeSED-V3, we have added more functions for SED synthesis, especially for simulating mock observations of galaxies in a Bayesian way. This is not only crucial for the current work, but also lays the foundation for future applications of machine learning and simulation-based Bayesian inference methods in Bayesian SED fitting (Hahn & Melchior 2022; Hahn et al. 2022). For this work, we use empirical statistics-based (§2.1) and hydrodynamical simulation-based (§2.2) methods to generate mock photometric catalogs, add noise with a simple magnitude limit-based approach (§2.3), and select a sample (§2.4) similar to those of previous works for the performance tests in the next two sections. We introduce these steps in more detail below.

Empirical statistics-based photometric mock catalog
The first method generates a mock photometric catalog by randomly drawing samples from the parameter space of a particular SED model, subject to constraints from some empirical statistical properties of galaxies. The sampling is performed with MultiNest, the same nested sampler used in the Bayesian SED analysis mode of BayeSED. A sample from this catalog will be used in §4 to test the performance of redshift and stellar population parameter estimation in the case where the SED modeling is perfect, since exactly the same SED modeling method will be used in its Bayesian SED analysis.

SED modeling
As in Han & Han (2019), the SED of a galaxy is modeled as the luminosity of starlight from stellar populations of varying ages and metallicities, transmitted through the interstellar medium (ISM) and the intergalactic medium (IGM) to the observer. Specifically, the luminosity emitted at wavelength λ by a galaxy of age t can be written as

$$L_\lambda(t) = \int_0^t \mathrm{d}t'\,\psi(t-t')\,S_\lambda[t', Z(t-t')]\,T^{\rm ism}_\lambda(t,t'), \qquad (1)$$

where $\psi(t-t')$ is the star formation history (SFH) describing the SFR as a function of the time $t-t'$, and $S_\lambda[t', Z(t-t')]$ is the luminosity emitted per unit wavelength per unit mass by a simple stellar population (SSP) of age $t'$ and metallicity $Z(t-t')$.
$T^{\rm ism}_\lambda(t,t')$ is the transmission function of the ISM (Charlot & Longhetti 2001), which has two components:

$$T^{\rm ism}_\lambda(t,t') = T^{+}_\lambda(t,t')\,T^{0}_\lambda(t,t'), \qquad (2)$$

where $T^{+}_\lambda(t,t')$ and $T^{0}_\lambda(t,t')$ are the transmission functions of the ionized gas and the neutral ISM, respectively. The transmission through ionized gas can be modeled with a photoionization code such as CLOUDY. However, we set $T^{+}_\lambda(t,t') = 1$ in this work to be consistent with the hydrodynamical simulation-based catalog (§2.2). A detailed modeling of $T^{+}_\lambda(t,t')$ with CLOUDY, accounting for the combined effects of starlight absorption, nebular line emission, ionized continuum emission, and possible emission from warm dust within HII regions, will be presented in a companion paper. Meanwhile, the transmission function of the neutral ISM is described by a simple time-independent dust attenuation law (DAL) applied uniformly to the whole galaxy.
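A minimal numerical sketch of the luminosity integral over SSP ages, assuming SSP spectra tabulated on an age grid and replacing the integral over t′ with the trapezoidal rule. The age grid, flat spectra, and grey transmission below are placeholders rather than BC03 outputs, and `galaxy_luminosity` is a hypothetical name, not a BayeSED function.

```python
import numpy as np

def galaxy_luminosity(t_age, ssp_ages, ssp_spectra, sfh, transmission):
    """Trapezoidal approximation to the luminosity integral over SSP ages t'.

    t_age        : galaxy age t [yr]
    ssp_ages     : (n_age,) increasing SSP ages t' [yr]
    ssp_spectra  : (n_age, n_wave) SSP luminosity per unit wavelength per unit mass,
                   already evaluated at the metallicity Z(t - t') of each age
    sfh          : callable returning psi at a time since the onset of star formation
    transmission : (n_age, n_wave) ISM transmission T(t, t'); only the neutral-ISM
                   part matters here, since T+ = 1 for the ionized gas in this work
    """
    keep = ssp_ages <= t_age                   # stars cannot be older than the galaxy
    ages = ssp_ages[keep]
    psi = sfh(t_age - ages)                    # SFR at the epoch these stars formed
    f = psi[:, None] * ssp_spectra[keep] * transmission[keep]
    dt = np.diff(ages)[:, None]
    return np.sum(0.5 * (f[1:] + f[:-1]) * dt, axis=0)

# Toy check with made-up inputs: constant SFH, flat SSP spectra, grey 50% transmission
ages = np.linspace(0.0, 1e9, 101)
spectra = np.ones((101, 5))
trans = np.full((101, 5), 0.5)
L = galaxy_luminosity(1e9, ages, spectra, lambda t: np.ones_like(t), trans)
print(L)
```

With a constant SFH and flat spectra the integral reduces to 0.5 × 10⁹ in every wavelength bin, which makes the toy case easy to check by hand.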
$Z(t-t')$ is the stellar metallicity as a function of the time $t-t'$, which describes the chemical evolution history (CEH) of the galaxy. In previous works, we assumed a time-independent metallicity, i.e. $Z(t-t') = Z_0$, as in many galaxy SED fitting codes. To properly account for the evolution of stellar metallicity, we additionally employ a linear SFH-to-metallicity mapping model (Driver et al. 2013; Robotham et al. 2020; Thorne et al. 2021; Alsing et al. 2023).
Generally, the main ingredients of our SED modeling of galaxies are the SSP model, SFH, CEH, and DAL. In this work, as in the construction of the Horizon-AGN hydrodynamical simulation-based catalog (§2.2), we use the SSP model from the widely used stellar population synthesis model of Bruzual & Charlot (2003), assuming a Chabrier (2003) stellar IMF. The SFH of galaxies is typically parameterized in the exponentially declining form $\mathrm{SFR}(t) \propto e^{-t/\tau}$ (hereafter the τ model). The τ model only describes the SFH of a galaxy as a closed box without inflow of pristine gas or outflow of processed gas, in which gas is converted into stars at a rate proportional to the remaining gas mass with a fixed efficiency (Schmidt 1959; Tinsley 1980). It has been widely discussed in the literature that this simple assumption may lead to systematically biased estimates of stellar population parameters, especially for galaxies at z ≳ 2 (Lee et al. 2009, 2010; Reddy et al. 2012; Ciesla et al. 2017; Carnall et al. 2018). Therefore, more flexible and physically motivated forms of SFH have been suggested to improve the measurement of galaxy SFHs and the estimation of their stellar population parameters and photometric redshifts (Pacifici et al. 2012; Ciesla et al. 2017; Iyer & Gawiser 2017; Carnall et al. 2019; Leja et al. 2019; Iyer et al. 2019; Lower et al. 2020; Suess et al. 2022).
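A sketch of one linear SFH-to-metallicity mapping, assuming (in the spirit of the linear mass-metallicity mappings in the works cited above, e.g. ProSpect) that the metallicity rises linearly with the fraction of stellar mass already formed, from `z_init` to `z_final`. The exact parameterization used in BayeSED may differ; the function and parameter names are hypothetical.

```python
import numpy as np

def linear_metallicity_history(times, sfr, z_init, z_final):
    """Metallicity that rises linearly with the fraction of stellar mass formed.

    times : (n,) increasing time since the onset of star formation [yr]
    sfr   : (n,) star formation rate psi at those times [Msun/yr]
    Returns Z at each time: z_init before any mass has formed, z_final at the end.
    """
    # Cumulative stellar mass formed, by trapezoidal integration of psi
    dm = 0.5 * (sfr[1:] + sfr[:-1]) * np.diff(times)
    mass_formed = np.concatenate([[0.0], np.cumsum(dm)])
    frac = mass_formed / mass_formed[-1]
    return z_init + (z_final - z_init) * frac

# tau model, psi ∝ exp(-t / tau) with tau = 1 Gyr, over 5 Gyr
t = np.linspace(0.0, 5e9, 501)
psi = np.exp(-t / 1e9)
Z = linear_metallicity_history(t, psi, z_init=1e-4, z_final=0.02)
print(Z[0], Z[250], Z[-1])
```

Because most of the mass in a τ model forms early, Z approaches `z_final` long before the end of the history; a rising SFH would instead keep Z low until late times.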
In the present work, we employ three extensions of the τ model with different complexity. The first one is

$$\psi(t) \propto t^{\beta}\,e^{-t/\tau}, \qquad (5)$$

which is simply an extended form of the delayed-τ model (Lee et al. 2010). The typical τ model and the delayed-τ model are just two special cases of this model (hereafter the β-τ model), with β = 0 and β = 1, respectively. The second one is the β-τ model combined with a quenching (or rejuvenation) component (Ciesla et al. 2016):

$$\psi(t) \propto \begin{cases} t^{\beta}\,e^{-t/\tau}, & t \le t_{\rm trunc} \\ r_{\rm SFR}\times\psi(t=t_{\rm trunc}), & t > t_{\rm trunc} \end{cases} \qquad (6)$$

where $t_{\rm trunc}$ is the time when star formation is quenched ($r_{\rm SFR} < 1$) or rejuvenated ($r_{\rm SFR} > 1$), and $r_{\rm SFR}$ is the ratio between $\psi(t > t_{\rm trunc})$ and $\psi(t = t_{\rm trunc})$:

$$r_{\rm SFR} = \frac{\psi(t > t_{\rm trunc})}{\psi(t = t_{\rm trunc})}. \qquad (7)$$

This model (hereafter the β-τ-r model) is a further extension of the β-τ model, the latter being the special case with $r_{\rm SFR} = 1$. The third one is the double power-law model (Diemer et al. 2017; Carnall et al. 2018; Alsing et al. 2023) combined with a quenching (or rejuvenation) component:

$$\psi(t) \propto \begin{cases} \left[\left(\frac{t}{t_*}\right)^{\alpha} + \left(\frac{t}{t_*}\right)^{-\beta}\right]^{-1}, & t \le t_{\rm trunc} \\ r_{\rm SFR}\times\psi(t=t_{\rm trunc}), & t > t_{\rm trunc} \end{cases} \qquad (8)$$

where α and β are the falling and rising slopes, respectively, and $t_*$ is related to the time at which star formation peaks, defined as $t_* \equiv \tau\,t_{\rm age}$ for a galaxy of age $t_{\rm age}$. A major advantage of the double power-law model is that it decouples the rising and falling parts of the SFH. Therefore, this model (hereafter the α-β-τ-r model) is even more flexible than the β-τ-r model.
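The three SFH families described above can be sketched as plain functions of the time since the onset of star formation. This is a minimal illustration with unnormalized SFHs (in practice the normalization would be fixed by the stellar mass); the function names and example parameter values are hypothetical.

```python
import numpy as np

def sfh_beta_tau(t, tau, beta):
    """beta-tau model: psi ∝ t^beta exp(-t/tau).
    beta = 0 gives the tau model and beta = 1 the delayed-tau model."""
    t = np.asarray(t, dtype=float)
    return t**beta * np.exp(-t / tau)

def sfh_double_power_law(t, tau, alpha, beta, t_age):
    """Double power law: psi ∝ [(t/t*)^alpha + (t/t*)^-beta]^-1, with t* = tau * t_age.
    alpha is the falling slope and beta the rising slope; use t > 0."""
    x = np.asarray(t, dtype=float) / (tau * t_age)
    return 1.0 / (x**alpha + x ** (-beta))

def truncated(shape, t, t_trunc, r_sfr, **params):
    """Quenching (r_sfr < 1) or rejuvenation (r_sfr > 1) component:
    psi(t) = shape(t) for t <= t_trunc and r_sfr * shape(t_trunc) afterwards."""
    t = np.asarray(t, dtype=float)
    psi_trunc = shape(t_trunc, **params)
    return np.where(t <= t_trunc, shape(t, **params), r_sfr * psi_trunc)

# Time since the onset of star formation [yr], avoiding t = 0 for the power law
t = np.linspace(0.1e9, 10e9, 100)
psi_tau = sfh_beta_tau(t, tau=1e9, beta=0.0)                         # tau model
psi_btr = truncated(sfh_beta_tau, t, 5e9, 0.1, tau=2e9, beta=1.0)    # beta-tau-r
psi_abtr = truncated(sfh_double_power_law, t, 8e9, 2.0,
                     tau=0.3, alpha=3.0, beta=3.0, t_age=10e9)       # alpha-beta-tau-r
```

Writing the quenching/rejuvenation step as a wrapper around any SFH shape mirrors how the β-τ-r and α-β-τ-r models share the same truncation rule and differ only in the underlying shape.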
The dust attenuation law (DAL) is another very important ingredient of galaxy SED modeling (Walcher et al. 2011; Conroy 2013). When deriving the photometric redshift and physical properties of galaxies from their photometric or spectroscopic observations, a universal DAL in the form of a simple uniform screen is commonly assumed. However, different choices of the universal law may lead to very different estimates of the photometric redshift and physical parameters of galaxies (Pforr et al. 2012, 2013; Salim & Narayanan 2020). In particular, many studies show that dust attenuation curves differ substantially from galaxy to galaxy (Kriek & Conroy 2013; Reddy et al. 2015; Salmon et al. 2016; Salim & Boquien 2019; Shivaei et al. 2020); therefore, as expected on theoretical grounds, there is no universal DAL (Witt & Gordon 2000; Seon & Draine 2016; Narayanan et al. 2018; Lower et al. 2022). From a detailed study of the dust attenuation curves of about 230,000 individual galaxies in the local universe, using photometric data covering UV to IR bands, Salim et al. (2018) presented new forms of attenuation laws suitable for normal star-forming galaxies, high-z analogs, and quiescent galaxies (see also Noll et al. 2009). In this work, we additionally employ this new form of DAL, which is parameterized as

$$k_{\lambda,{\rm mod}} = k_{\lambda,{\rm Cal}}\,\frac{R_{V,{\rm mod}}}{R_{V,{\rm Cal}}}\left(\frac{\lambda}{0.55\,\mu{\rm m}}\right)^{\delta} + D_\lambda, \qquad (9)$$

where $k_{\lambda,{\rm Cal}}/R_{V,{\rm Cal}}$ is the Calzetti et al. (2000) DAL with $R_{V,{\rm Cal}} = 4.05$. The power-law term with exponent δ is introduced to deviate from the slope of the Calzetti et al. (2000) DAL, and $R_{V,{\rm mod}}$ is the δ-dependent ratio of total to selective extinction for the modified law.
The term $D_\lambda$ is introduced to add a UV bump. The relationship between $R_{V,{\rm mod}}$ and δ is given by

$$R_{V,{\rm mod}} = \frac{R_{V,{\rm Cal}}}{(R_{V,{\rm Cal}}+1)(4400/5500)^{\delta} - R_{V,{\rm Cal}}}. \qquad (10)$$

The UV bump, following a Drude profile (Fitzpatrick 1986), is represented as

$$D_\lambda = \frac{E_b\,(\lambda\gamma)^2}{(\lambda^2-\lambda_0^2)^2 + (\lambda\gamma)^2}, \qquad (11)$$

with amplitude $E_b$, fixed central wavelength $\lambda_0 = 0.2175\,\mu$m, and width $\gamma = 0.35\,\mu$m. In total, we consider six different combinations of SFH, CEH, and DAL with increasing complexity. A summary of these models, their parameters, and their priors is given in Table 1. Finally, we also include the effect of IGM absorption following Madau (1995). Other, more recent treatments of IGM absorption are also available in BayeSED. However, exploring the effects of different IGM absorption models on redshift and stellar population parameter estimation is beyond the scope of this work and would not change the conclusions given here.
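The modified Salim et al. (2018) law described above can be sketched as below. The Calzetti et al. (2000) coefficients are the published ones (nominally valid from 0.12 to 2.2 µm); the conversion A_λ = A_V k_λ / R_V,mod for a uniform screen is our assumption of how the curve is applied, and all function names are hypothetical.

```python
import numpy as np

R_V_CAL = 4.05

def k_calzetti(lam_um):
    """Calzetti et al. (2000) curve k(lambda), lambda in micron, nominally 0.12-2.2 um."""
    lam = np.asarray(lam_um, dtype=float)
    x = 1.0 / lam
    blue = 2.659 * (-2.156 + 1.509 * x - 0.198 * x**2 + 0.011 * x**3) + R_V_CAL
    red = 2.659 * (-1.857 + 1.040 * x) + R_V_CAL
    return np.where(lam < 0.63, blue, red)

def r_v_mod(delta):
    """delta-dependent total-to-selective ratio of the modified law."""
    return R_V_CAL / ((R_V_CAL + 1.0) * (4400.0 / 5500.0) ** delta - R_V_CAL)

def drude_bump(lam_um, e_b, lam0=0.2175, gamma=0.35):
    """Drude-profile UV bump with amplitude e_b, centre lam0 and width gamma (micron)."""
    lam = np.asarray(lam_um, dtype=float)
    return e_b * (lam * gamma) ** 2 / ((lam**2 - lam0**2) ** 2 + (lam * gamma) ** 2)

def k_modified(lam_um, delta, e_b):
    """Calzetti curve tilted by (lambda / 0.55 um)^delta, rescaled by R_V,mod / R_V,Cal,
    plus the UV bump."""
    lam = np.asarray(lam_um, dtype=float)
    return (k_calzetti(lam) * r_v_mod(delta) / R_V_CAL * (lam / 0.55) ** delta
            + drude_bump(lam, e_b))

def attenuated_flux(flux, lam_um, a_v, delta, e_b):
    """Uniform screen (assumption): A_lambda = A_V * k_lambda / R_V,mod."""
    a_lam = a_v * k_modified(lam_um, delta, e_b) / r_v_mod(delta)
    return flux * 10.0 ** (-0.4 * a_lam)
```

With δ = 0 and E_b = 0 the modified law collapses to the Calzetti curve, while δ < 0 steepens it in the UV; this is what lets the single family span curves from starburst-like to steeper, SMC-like shapes.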

Galaxy population modeling
To model the galaxy population, we need to specify the joint probability distribution that characterizes its statistical properties. These statistical properties result from the complex physical processes that occur during the formation and evolution of galaxies. In this work, we employ some widely discussed empirical statistical properties of galaxies to model the galaxy population phenomenologically. It should be mentioned that these statistical properties carry large uncertainties, and we do not attempt to use the most up-to-date results for all of them; other choices of the statistical properties of galaxies would not change the conclusions of this work. Similar to Tanaka (2015) (see also Alsing et al. 2023), we assume that the joint probability distribution $P(G, z)$ of the stellar population parameters and redshift of the galaxy population can be factorized into a product of (conditional) distributions (Equation 12), which are described in turn below. The joint distribution of stellar mass and redshift is defined through $\Phi(M_*, z)$, the unnormalized stellar mass function, and $dV(z)$, the differential comoving volume element. We employ the recent measurement of the stellar mass function and its redshift evolution from Leja et al. (2020), and a WMAP-5 (Spergel et al. 2003) cosmology for the comoving volume element. Following Tanaka (2015), we assume that $P({\rm SFR} \mid M_*, z)$ can be expressed as the sum of two Gaussians representing the two distinct sequences formed by star-forming and quiescent galaxies, where the mean SFR of star-forming galaxies, ${\rm SFR}_{\rm SF}(M_*, z)$, and the fraction of quiescent galaxies (Behroozi et al. 2013) are both given as functions of stellar mass and redshift. As in Tanaka (2015), the dust attenuation, expressed as the optical depth $\tau_V$, is assumed to correlate positively with SFR, with a scatter $\sigma_{\tau_V} = 0.5$. We then use the relation between $\tau_V$ and $A_V$ to obtain the distribution of $A_V$. The probability of the age of a galaxy is described conditionally on its stellar mass and redshift, such that a massive galaxy is unlikely to be young, whereas a low-mass galaxy of the same age has a high probability. Finally, the probability of the mass-weighted stellar metallicity $Z_{\rm MW}$ is modeled with a scatter $\sigma_{\log(Z_{\rm MW})} = 0.1$ around the redshift-dependent stellar mass-metallicity relation of Ma et al. (2016), which is predicted using high-resolution cosmological zoom-in simulations from the Feedback In Realistic Environments (FIRE) project (Hopkins et al. 2014).
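The two-sequence SFR model and the τ_V to A_V conversion above can be sketched as follows. We assume the two Gaussians are defined in log SFR; the sequence means, widths, and quiescent fraction in the example are placeholders for a single (M*, z) bin, not the values used in this work, and the function names are hypothetical. The conversion A_V = 2.5 log10(e) τ_V ≈ 1.086 τ_V is the standard relation between optical depth and attenuation in magnitudes.

```python
import numpy as np

def p_log_sfr(log_sfr, log_sfr_sf, log_sfr_q, f_q, sigma_sf, sigma_q):
    """Two-Gaussian density for P(log SFR | M*, z).

    log_sfr_sf, log_sfr_q : mean log SFR of the star-forming and quiescent sequences
    f_q                   : fraction of quiescent galaxies at this (M*, z)
    """
    def gauss(x, mu, sig):
        return np.exp(-0.5 * ((x - mu) / sig) ** 2) / (np.sqrt(2.0 * np.pi) * sig)
    return ((1.0 - f_q) * gauss(log_sfr, log_sfr_sf, sigma_sf)
            + f_q * gauss(log_sfr, log_sfr_q, sigma_q))

def sample_log_sfr(rng, n, log_sfr_sf, log_sfr_q, f_q, sigma_sf, sigma_q):
    """Draw n values: pick the quiescent sequence with probability f_q, then its Gaussian."""
    quiescent = rng.random(n) < f_q
    mu = np.where(quiescent, log_sfr_q, log_sfr_sf)
    sig = np.where(quiescent, sigma_q, sigma_sf)
    return rng.normal(mu, sig), quiescent

def a_v_from_tau_v(tau_v):
    """A_V = 2.5 log10(e) tau_V ≈ 1.086 tau_V."""
    return 2.5 * np.log10(np.e) * np.asarray(tau_v, dtype=float)

# Hypothetical values for one (M*, z) bin
rng = np.random.default_rng(0)
log_sfr, is_q = sample_log_sfr(rng, 100_000, 1.0, -1.0, 0.3, 0.3, 0.5)
print(is_q.mean(), a_v_from_tau_v(1.0))
```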
To generate the empirical statistics-based mock catalog of galaxies, we employ the MultiNest algorithm to draw samples from the joint probability distribution of the stellar population parameters and redshift of the galaxy population, using $P(G, z)$ in Equation 12 as the likelihood function. In addition, to simulate a magnitude-limited sample, we can set the likelihood function to 0 whenever the magnitude in a given band is fainter than a given limit. Since sampling points with zero likelihood are ignored by MultiNest, the resulting posterior sample can be used to build a magnitude-limited sample of mock galaxies that obeys the physical constraints from the empirical statistical properties of galaxies. More details about the selection of the magnitude-limited sample are presented in §2.4. The mock catalog can be built from the posterior sample of redshift and all physical parameters given by MultiNest. However, this is a weighted sample (Yallup et al. 2022), which cannot be directly used as a mock sample of galaxies. To build a more realistic mock sample of galaxies, we use a bootstrap resampling method to obtain an unweighted sample.
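The two tricks described above, a magnitude cut imposed through the likelihood and the conversion of a weighted posterior sample into an unweighted one, can be sketched as follows. The resampling shown is weighted sampling with replacement (multinomial resampling), one common way to implement the bootstrap step; the function names and toy values are hypothetical and do not reflect the BayeSED or MultiNest interfaces.

```python
import numpy as np

def log_likelihood_mag_limited(log_p_gz, mag, mag_limit):
    """log P(G, z) used as the log-likelihood, with zero likelihood (log L = -inf)
    for sources fainter than the magnitude limit."""
    return log_p_gz if mag <= mag_limit else -np.inf

def unweighted_resample(samples, weights, n_draw, rng):
    """Convert a weighted posterior sample into an unweighted one by drawing rows
    with replacement, each with probability proportional to its weight."""
    p = np.asarray(weights, dtype=float)
    p = p / p.sum()
    idx = rng.choice(len(p), size=n_draw, replace=True, p=p)
    return samples[idx]

# Toy weighted sample: 5 points with columns (z, log M*); hypothetical values
rng = np.random.default_rng(1)
samples = np.array([[0.5, 9.0], [1.0, 10.0], [1.5, 10.5], [2.0, 11.0], [3.0, 9.5]])
weights = np.array([0.0, 0.5, 0.0, 0.5, 0.0])
mock = unweighted_resample(samples, weights, 1000, rng)
print(mock[:5])
```

Each row of `mock` can then be treated as one galaxy, so repeated rows are expected for high-weight points.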
In total, we have built six mock catalogs of galaxies by employing SED models with different combinations of SFH, CEH, and DAL of increasing complexity, as listed in Table 1. The priors of redshift and stellar population parameters employed for each model are listed in the same table. In Figure 1, we show the joint distributions of redshift and physical parameters of the six empirical statistics-based mock galaxy populations. Although the same set of empirical statistics is employed, different SED models lead to slightly different distributions of parameters, especially for redshift and galaxy age. This is likely due to the different mapping relations from free parameters to derived parameters. For example, different forms of SFH may lead to different relations between the age of a galaxy and its recent SFR.

Hydrodynamical simulation-based photometric mock catalog
The second method generates a mock photometric catalog from an SED library built by post-processing galaxies from a hydrodynamical simulation. This catalog will be used in §5 to test the performance of redshift and stellar population parameter estimation in the case where the SED modeling is imperfect, since the SED modeling method employed in the Bayesian SED analysis is very different from the one used to build it.
We start from the rest-frame spectra of galaxies produced using the light-cone from the cosmological hydrodynamical simulation Horizon-AGN (Dubois et al. 2014). The computation of these spectra, which accounts for the complex star formation and metal enrichment histories of Horizon-AGN galaxies and consistently includes dust attenuation, is described in detail by Laigle et al. (2019) and Davidzon et al. (2019). The dust attenuation of galaxies in the Horizon-AGN simulation is modeled for each stellar particle, assumed to be an SSP, by using the gas metal mass distribution as a proxy for the dust mass distribution and assuming a constant dust-to-metal mass ratio (Laigle et al. 2019). To obtain the amount of extinction at a given wavelength, the Weingartner & Draine (2001) model of Milky Way dust grains with $R_V = 3.1$ and the prominent 2175 Å graphite bump is employed in post-processing the simulated galaxies. As mentioned in Laigle et al. (2019), the overall attenuation curve becomes less steep and the bump tends to be reduced when the contributions of all the SSPs are summed to obtain the resulting galaxy spectrum. They also noticed that the averaged attenuation curve in the Horizon-AGN simulation cannot be well reproduced by either the Calzetti et al. (2000) or the Arnouts et al. (2013) model. The more flexible form of DAL given by Equation 9 is more likely to reproduce the attenuation curves of galaxies in the Horizon-AGN simulation. To isolate possible differences in the convolution with filter response functions, observational error modeling, and treatment of IGM absorption, we choose to convert the rest-frame spectra of mock galaxies into mock photometry with BayeSED, instead of using their virtual photometry directly. The treatment of IGM absorption is the same as in §2.1.1. Therefore, the differences between the empirical statistics-based (§2.1) and hydrodynamical simulation-based photometric mock catalogs are driven only by the different SFHs, CEHs, and DALs of the mock galaxies and their different distributions of redshift and physical parameters.

Observational error modeling
Modeling realistic flux errors is crucial for a meaningful performance test of redshift and stellar population parameter estimation. Here, we introduce the method we employ to compute the flux errors of mock galaxies and to perturb their fluxes accordingly. The flux error for a wavelength band i with an Nσ AB magnitude limit $m_{{\rm lim},i} = -2.5\log_{10}(F_{{\rm lim},i}) + 23.9$ is computed from the flux limit $F_{{\rm lim},i}$ and the systematic flux error $\sigma_{F,i,{\rm sys}}$, both in units of µJy. From the relation between the magnitude error and the signal-to-noise ratio, $\sigma_m \approx 1.08574/{\rm SNR}$ with ${\rm SNR} = F/\sigma_F$, the systematic flux error follows from a systematic magnitude error as $\sigma_{F,i,{\rm sys}} \approx F\,\sigma_{m,{\rm sys}}/1.08574$. As in Cao et al. (2018), we assume a systematic magnitude error $\sigma_{m,{\rm sys}} = 0.02$. The final mock flux is obtained by perturbing the original flux with Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2_{F,i})$. In practice, the magnitude limit may have a dispersion $\sigma_{m,{\rm lim},i}$ for galaxies of different sizes, so the magnitude limit actually used is drawn from the Gaussian distribution $\mathcal{N}(m_{{\rm lim},i}, \sigma^2_{m,{\rm lim},i})$. In this work, we set $\sigma_{m,{\rm lim},i} = 0.1$. We have generated three sets of mock catalogs for CSST-like, Euclid-like, and COSMOS-like surveys, respectively. A summary of the adopted depths in all bands of the three surveys is given in Table 2. The response functions and the modeled relations between magnitude and magnitude error are shown in the panels of Figure 2 for the 7 CSST bands, 3 Euclid bands, and 26 COSMOS bands, respectively. To separate the effects of observational errors on the accuracy of parameter estimation, we also generated two additional sets of mock catalogs without observational errors. In this case (the no-noise case in Figure 2), the magnitude errors are all fixed to 0.01, but the photometry is not perturbed accordingly.
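A sketch of the noise model described above. The AB zero point (F in µJy) and the 1.08574 = 2.5/ln 10 factor follow the text; we assume that the limit-driven term F_lim/N and the systematic term add in quadrature, one common choice, and take N = 5 to match the 5σ depths of Table 2. The function names and default arguments are hypothetical.

```python
import numpy as np

MAG_ERR_CONST = 2.5 / np.log(10.0)   # ≈ 1.08574, so sigma_m ≈ 1.08574 / SNR

def flux_from_ab_mag(mag):
    """AB magnitude to flux in microJansky: m = -2.5 log10(F / uJy) + 23.9."""
    return 10.0 ** (-0.4 * (np.asarray(mag, dtype=float) - 23.9))

def flux_error(flux, mag_lim, n_sigma=5.0, sigma_m_sys=0.02):
    """Assumed form: limit term F_lim / N and systematic term added in quadrature."""
    sigma_lim = flux_from_ab_mag(mag_lim) / n_sigma      # noise of an N-sigma detection
    sigma_sys = flux * sigma_m_sys / MAG_ERR_CONST        # from sigma_m ≈ 1.08574 / SNR
    return np.sqrt(sigma_lim**2 + sigma_sys**2)

def perturb_fluxes(flux, mag_lim, rng, n_sigma=5.0, sigma_m_lim=0.1, sigma_m_sys=0.02):
    """Scatter the magnitude limit per source, then add Gaussian noise to the flux."""
    flux = np.asarray(flux, dtype=float)
    mag_lim_src = rng.normal(mag_lim, sigma_m_lim, size=flux.shape)
    sigma = flux_error(flux, mag_lim_src, n_sigma, sigma_m_sys)
    return flux + rng.normal(0.0, sigma), sigma

rng = np.random.default_rng(0)
true_flux = flux_from_ab_mag(np.array([20.0, 24.0, 26.0]))
noisy_flux, sigma = perturb_fluxes(true_flux, mag_lim=26.0, rng=rng)
print(true_flux / sigma)   # SNR per source
```

Under this form, faint sources reach SNR ≈ N at the limit, while bright sources saturate at SNR ≈ 1.08574/0.02 ≈ 54 because of the systematic floor.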
Table 2. A summary of the adopted depths in all bands for CSST-like, Euclid-like, and COSMOS-like mock observations.
b The 5σ depths for extended sources in the CSST wide-field multiband imaging survey (Gong et al. 2019). The CSST deep survey can be at least 1 mag deeper; the results of performance tests for the deep survey will be presented in future works.

Sample selection
To test the performance of BayeSED, we selected two sets of galaxy samples, with $K_s < 24.7$ (Laigle et al. 2019) and $i^+ < 25$ (Cao et al. 2018), from the empirical statistics-based mock catalog (§2.1) and the hydrodynamical simulation-based mock catalog (§2.2), respectively. The first set of samples is obtained directly with BayeSED combined with SED models of different complexity, using the method presented in §2.1.2.
The second sample is selected from the Horizon-AGN hydrodynamical simulation-based photometric catalog for the COSMOS-like configuration, which contains 789,354 galaxies. We find that a sample of 10,000 galaxies is large enough to obtain stable results for the performance tests presented in §4 and §5. The redshift and magnitude distributions of the two samples are presented in Figure 3. When different SED models of different complexity are employed with the same set of empirical statistics, the resulting empirical statistics-based samples show some differences, especially in the redshift distribution. This is likely due to the different mapping relations from physical parameters to photometry and from free parameters to derived parameters for different SED models. Overall, the hydrodynamical simulation-based sample is consistent with the empirical statistics-based samples.
We attribute the differences between the two sets of samples to the different modeling of the SFH, CEH and DAL of galaxies, and to their different distributions of redshift and physical parameters.

BAYESIAN PHOTOMETRIC SED ANALYSIS WITH BAYESED
The general method for applying Bayesian inference to the photometric SED analysis of galaxies is the same as in Han & Han (2014, 2019). In this section, we introduce some special aspects of Bayesian parameter estimation (§3.1) and model selection (§3.2) that are relevant to the current work.

Bayesian parameter estimation
For the Bayesian analysis of the mock data generated in the last section, we employ the same SED modeling procedure and the same priors for free parameters as in §2.1.1, together with the commonly used Gaussian likelihood function. The performance of this Bayesian analysis, including its speed and quality, is crucial for the analysis of large samples of galaxies in the big-data era. We therefore need some metrics to quantify the performance of parameter estimation, which is the main subject of this work.
While the speed of parameter estimation can be easily quantified by the running time, its quality requires dedicated metrics. Similar to Acquaviva et al. (2015), we use three metrics to quantify the quality of parameter estimation. The bias, which characterizes the median separation between the predicted and the true values, is defined as:

$$\mathrm{BIA} = \mathrm{median}(\Delta x), \tag{27}$$

while the precision, which describes the scatter between the predicted and the true values, is defined as:

$$\sigma_{\mathrm{NMAD}} = 1.48 \times \mathrm{median}\left(\left|\Delta x - \mathrm{median}(\Delta x)\right|\right), \tag{28}$$

where ∆x = (x_phot − x_true)/(1 + x_true) for redshift (Ilbert et al. 2009; Dahlen et al. 2013; Salvato et al. 2018), and ∆x = (x_phot − x_true)/(x_true^max − x_true^min) for other parameters. The median-based definitions make these metrics less sensitive to outliers (sources with unexpectedly large errors). The fraction of outliers is defined as:

$$\mathrm{OLF} = \frac{N(|\Delta x| > 0.15)}{N_{\mathrm{total}}}. \tag{29}$$
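For concreteness, the three quality metrics can be computed with a few lines of NumPy. This is a sketch under two assumptions: the conventional NMAD normalization factor of 1.48, and an outlier threshold of |Δx| > 0.15, as is common in the photometric-redshift literature. The function names are hypothetical.

```python
import numpy as np

def normalised_errors(x_phot, x_true, is_redshift=False):
    """Delta x as defined in the text (redshift vs. other parameters)."""
    x_phot, x_true = np.asarray(x_phot, float), np.asarray(x_true, float)
    if is_redshift:
        return (x_phot - x_true) / (1.0 + x_true)
    return (x_phot - x_true) / (x_true.max() - x_true.min())

def quality_metrics(x_phot, x_true, is_redshift=False, outlier_cut=0.15):
    """Return (BIA, sigma_NMAD, OLF) for one parameter."""
    dx = normalised_errors(x_phot, x_true, is_redshift)
    bia = np.median(dx)
    sigma_nmad = 1.48 * np.median(np.abs(dx - bia))  # 1.48: assumed NMAD factor
    olf = np.mean(np.abs(dx) > outlier_cut)           # 0.15: assumed outlier cut
    return bia, sigma_nmad, olf
```

Because BIA and σNMAD are medians, a single catastrophic outlier (the 0.6 offset in the test below) barely moves them, while it is fully counted in OLF.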

Bayesian model selection
An important advantage of nested sampling-based algorithms, such as MultiNest, over MCMC-based methods is the ability to carry out simultaneous parameter estimation and model selection. While the main subject of this work is parameter estimation, it is also interesting to explore the effects of the MultiNest sampling parameters (namely, nlive and efr) on the computation of the Bayesian evidence, the quantity that is crucial for Bayesian model selection.
In Han & Han (2019), we presented a mathematical framework to discriminate between different assumptions about the SSP, DAL and SFH in the SED modeling of galaxies, based on the Bayesian evidence for a sample of galaxies. In this work, since the SSP model employed in the generation of the mock data is the same as that employed in their Bayesian analysis, we do not need to consider different choices of SSP, which significantly simplifies the problem. We focus on the computation of the Bayesian evidence for the SED modeling of a sample of galaxies with the SSP, SFH, and DAL all assumed to be universal (i.e., an M(ssp_0, sfh_0, dal_0)-like model; see Section 5.1 of Han & Han 2019). The sample Bayesian evidence in this case (as in Equation 33 of Han & Han 2019) is:

$$BE \equiv p(\{d_g\}\,|\,M(ssp_0, sfh_0, dal_0), I) = \prod_{g=1}^{N_g} p(d_g\,|\,M(ssp_0, sfh_0, dal_0), I), \tag{30}$$

where N_g is the number of galaxies in the sample. Although the detailed SFH and DAL of different galaxies can vary significantly, the sample Bayesian evidence computed in this manner remains valuable for identifying the most efficient combination of SFH and DAL for analyzing a vast sample of galaxies, such as the one provided by the CSST wide-field imaging survey.
In practice, the natural logarithm of the Bayesian evidence is commonly used for Bayesian model selection. Therefore, Equation 30 can be rewritten as:

$$\ln(BE) = \sum_{g=1}^{N_g} \ln\left(p(d_g\,|\,M(ssp_0, sfh_0, dal_0), I)\right), \tag{31}$$

where ln(p(d_g | M(ssp_0, sfh_0, dal_0), I)), the natural logarithm of the Bayesian evidence for an individual galaxy, can be directly obtained in BayeSED with MultiNest. However, the individual Bayesian evidences estimated with MultiNest contain errors, and a more rigorous Bayesian model selection should account for their propagation. In our case, the error of the sample Bayesian evidence ln(BE) is simply taken as the sum of the errors for individual galaxies, which are also provided by MultiNest.
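The reason for working in logarithms is easy to demonstrate. For a sample of 10,000 galaxies, the product of the individual evidences underflows double precision, whereas the sum of their logarithms does not. The sketch below uses made-up per-galaxy values, not BayeSED output. Following the text, it sums the per-galaxy errors linearly. The function name is hypothetical.

```python
import numpy as np

def sample_log_evidence(ln_z_gal, ln_z_err_gal):
    """Log form of Equation 30: sum per-galaxy ln Z; error = plain sum of errors."""
    return float(np.sum(ln_z_gal)), float(np.sum(ln_z_err_gal))

# Hypothetical per-galaxy MultiNest outputs for a 10,000-galaxy sample
rng = np.random.default_rng(0)
ln_z_gal = rng.normal(-2.0, 0.5, size=10_000)
ln_z_err = np.full(10_000, 0.3)

ln_be, ln_be_err = sample_log_evidence(ln_z_gal, ln_z_err)
direct_product = np.prod(np.exp(ln_z_gal))  # underflows to 0.0 in double precision
print(ln_be, ln_be_err, direct_product)
```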
The minimum χ² method is also widely used for model selection. For Gaussian observational errors, ln L = −χ²/2 + C, where the constant C depends only on the observational errors. The minimum χ² and the natural logarithm of the maximum likelihood therefore differ only by a factor of −1/2 and this constant. The sample maximum likelihood (as in Equation 32 of Han & Han 2019) is:

$$ML = \prod_{g=1}^{N_g} \max_{\theta_g}\left[p(d_g\,|\,\theta_g, M(ssp_0, sfh_0, dal_0), I)\right]. \tag{32}$$

Then, the natural logarithm of the sample maximum likelihood is:

$$\ln(ML) = \sum_{g=1}^{N_g} \ln\left(\max_{\theta_g}\left[p(d_g\,|\,\theta_g, M(ssp_0, sfh_0, dal_0), I)\right]\right), \tag{33}$$

where ln(max_θg[p(d_g | θ_g, M(ssp_0, sfh_0, dal_0), I)]), the natural logarithm of the maximum likelihood for an individual galaxy, can be directly obtained in BayeSED with MultiNest. As with model selection based on the Bayesian evidence, only the difference in ln(ML) between models matters for model selection. Therefore, model selection with ln(ML) is equivalent to that with the minimum χ². In §6, we discuss the difference between the two model selection methods.
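The relation between the Gaussian likelihood and χ² can be checked numerically. In this sketch the fluxes and function names are hypothetical. The normalization term depends only on the observational errors, so it cancels when two models are compared on the same data. The difference in ln L then equals −Δχ²/2.

```python
import numpy as np

def chi_square(flux_obs, flux_err, flux_model):
    """chi^2 for independent Gaussian errors."""
    flux_obs, flux_err, flux_model = (
        np.asarray(a, float) for a in (flux_obs, flux_err, flux_model)
    )
    return float(np.sum(((flux_obs - flux_model) / flux_err) ** 2))

def gaussian_log_likelihood(flux_obs, flux_err, flux_model):
    """ln L = -chi^2 / 2 - sum_i ln(sqrt(2 pi) sigma_i)."""
    norm = float(np.sum(np.log(np.sqrt(2.0 * np.pi) * np.asarray(flux_err, float))))
    return -0.5 * chi_square(flux_obs, flux_err, flux_model) - norm

# Two hypothetical best-fit models for the same observed fluxes
obs, err = [1.0, 2.0, 3.0], [0.1, 0.2, 0.3]
model_a, model_b = [1.1, 2.0, 3.0], [1.0, 2.4, 3.0]
d_lnl = gaussian_log_likelihood(obs, err, model_a) - gaussian_log_likelihood(obs, err, model_b)
d_chi2 = chi_square(obs, err, model_a) - chi_square(obs, err, model_b)
print(d_lnl, -0.5 * d_chi2)
```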

Runtime parameters of MultiNest algorithm
As the Bayesian inference engine of BayeSED, MultiNest has several runtime parameters whose values strongly affect the performance of BayeSED for redshift and stellar population parameter estimation of galaxies. Here, we briefly introduce the meaning of these runtime parameters.
Nested sampling (NS; Skilling 2004, 2006) is a Monte Carlo (MC) method primarily designed for the efficient computation of the Bayesian evidence, and it also yields posterior inference as a by-product. It therefore provides a way to carry out simultaneous Bayesian parameter estimation and model selection. As an algorithm built on the NS framework, MultiNest (Feroz & Hobson 2008; Feroz et al. 2009) is notable for its efficiency in sampling from posteriors that may contain multiple modes and/or degeneracies. It has been further improved by the implementation of importance nested sampling (INS; Cameron & Pettitt 2014; Feroz et al. 2019), which increases the efficiency of evidence computation. The latest version of BayeSED employs version 3.12 of MultiNest, which includes the INS implementation.
Similar to most nested sampling algorithms, MultiNest explores the posterior distribution by maintaining a fixed number of samples drawn from the prior distribution, called live points (see also Higson et al. 2019; Speagle 2020 for newer methods using a variable number). It iteratively replaces the point with the lowest likelihood (the dead point) with another point drawn from the prior but with a higher likelihood. Although many runtime parameters of MultiNest can be set in BayeSED, only two of them are of particular importance, since they largely determine the accuracy and computational cost of the MultiNest algorithm, and therefore of BayeSED. The first is the total number of live points (nlive), which determines the effective sampling resolution. The second is the target sampling efficiency (efr), which determines the ratio of points accepted to those sampled. Generally, a larger nlive and a lower efr lead to more accurate posteriors and evidence values, but at a higher computational cost. The optimal values of nlive and efr should be problem-dependent, although efr = 0.8 and 0.3 are recommended by the authors of MultiNest for parameter estimation and evidence evaluation, respectively.
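To illustrate what nlive controls, the toy nested sampler below estimates the evidence of a one-dimensional Gaussian likelihood under a uniform prior on [−5, 5]. For this case ln Z ≈ ln(0.1) analytically. It is only a sketch of the basic NS loop described above, not MultiNest. Here the replacement point is found by brute-force rejection sampling from the whole prior. MultiNest instead draws it from within (enlarged) bounding ellipsoids, which is where efr plays its role. The stopping tolerance `tol` and all names are assumptions made for illustration.

```python
import numpy as np

SIGMA = 0.5          # width of the toy Gaussian likelihood
LO, HI = -5.0, 5.0   # uniform prior range; analytic Z ~= 1 / (HI - LO) = 0.1

def log_likelihood(x):
    """Normalised 1-D Gaussian likelihood centred on zero."""
    return -0.5 * (x / SIGMA) ** 2 - np.log(np.sqrt(2.0 * np.pi) * SIGMA)

def toy_nested_sampling(nlive=400, tol=1e-3, seed=1):
    """Return an estimate of ln Z using a fixed number of live points."""
    rng = np.random.default_rng(seed)
    live = rng.uniform(LO, HI, nlive)          # live points drawn from the prior
    live_logl = log_likelihood(live)
    log_z = -np.inf                            # running ln Z
    log_x = 0.0                                # ln of the remaining prior volume
    shrink = np.log1p(-np.exp(-1.0 / nlive))   # ln(1 - e^{-1/nlive})
    # Continue while the live points could still change Z by more than tol * Z
    while live_logl.max() + log_x > log_z + np.log(tol):
        worst = np.argmin(live_logl)           # the dead point
        # Dead-point weight: L_dead * (X_i - X_{i+1}), with X_{i+1} = X_i e^{-1/nlive}
        log_z = np.logaddexp(log_z, live_logl[worst] + log_x + shrink)
        log_x -= 1.0 / nlive
        # Replace it by a prior draw with a higher likelihood (brute-force rejection)
        threshold = live_logl[worst]
        while True:
            trial = rng.uniform(LO, HI, 1024)
            trial_logl = log_likelihood(trial)
            ok = np.flatnonzero(trial_logl > threshold)
            if ok.size > 0:
                live[worst], live_logl[worst] = trial[ok[0]], trial_logl[ok[0]]
                break
    # Add the contribution of the remaining live points: X_final * mean(L_live)
    log_mean_l = np.logaddexp.reduce(live_logl) - np.log(nlive)
    return np.logaddexp(log_z, log_x + log_mean_l)

print(toy_nested_sampling(nlive=400), np.log(0.1))
```

Because the prior volume shrinks by roughly e^(−1/nlive) per iteration, a larger nlive gives a finer sampling resolution and a smaller statistical error on ln Z, at the cost of more likelihood evaluations.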
In this work, we explore the effects of nlive and efr on the estimation of photometric redshift and stellar population parameters. The results are presented in §4.1 and §4.2, respectively.

RESULTS OF PERFORMANCE TESTS USING EMPIRICAL STATISTICS-BASED MOCK GALAXY SAMPLE
In this section, we present the results of performance tests of photometric redshift and stellar population parameter estimation using the empirical statistics-based mock galaxy sample for the CSST wide-field multiband imaging survey. The SED model employed in the Bayesian parameter estimation is exactly the same as that used to generate the mock observations. The error of parameter estimation therefore arises mainly from the random error in the data, parameter degeneracies, the stochastic nature of the MultiNest sampling algorithm, and other potential errors in the BayeSED code. To separate out the effect of the random photometric error in the data, we consider two cases, with and without random noise added to the photometric data. Besides, to find the optimal runtime parameters, we have considered six different choices of the target sampling efficiency (efr) and eight choices of the number of live points (nlive) for the MultiNest sampling algorithm. Furthermore, we compare the performance of SED models with increasing complexity in terms of running time and quality of parameter estimation.

Photometric redshift estimation
The results of performance tests for photometric redshift estimation are shown in Figure 4. In Figure 4(a), we show the results for only the simplest SED model (SFH=τ,-CEH,DAL=Cal+00) employed in this work. As shown in the top right panel of this figure, in the case without noise, there is a clear anti-correlation between the computation time (or the sampling resolution nlive) and the error σNMAD (defined in Equation 28) of photometric redshift estimation. Meanwhile, a smaller efr makes the error converge faster with increasing nlive. There is a clear lower limit of σNMAD at about 0.006. As shown in the top left panel of Figure 4(a), in the case with noise, the error of photometric redshift estimation does not always decrease with the sampling resolution nlive. For efr = 0.1, the lowest error (≲ 0.056) of redshift estimation is obtained at nlive ≈ 25. When nlive > 25, the error of redshift estimation starts to increase with nlive and finally converges to ∼0.058. This is most likely due to overfitting to the noise added to the mock data.
The middle two panels of Figure 4(a) show the relation between the computation time (or nlive) and the bias (defined in Equation 27) of photometric redshift estimation. In the case with noise, the bias varies with computation time (or nlive) in almost the opposite way to the error σNMAD. However, the bias of photometric redshift estimation is generally very small, and is almost zero in the noise-free case.
The bottom two panels of Figure 4(a) show the relation between the computation time (or nlive) and the fraction of outliers OLF (defined in Equation 29) of photometric redshift estimation. As for σNMAD, in the noise-free case there is a clear anti-correlation between the computation time (or nlive) and OLF, with a lower limit of about 0.002. In the case with noise, OLF follows the same trend as σNMAD. For efr = 0.1, the lowest OLF (≲ 0.215) of redshift estimation is also obtained at nlive ≈ 25. When nlive > 25, the OLF of redshift estimation starts to increase with nlive and finally converges to ∼0.225.
… 2018), the number of free parameters is greater than the number of photometric data points (7 for the CSST imaging survey), as shown in Table 1.
In Figure 4(b), we show the results for all six SED models with increasing complexity. Here, only the results with efr = 0.1 are shown. In the case without noise, as shown in the top right panel of this figure, the error σNMAD of photometric redshift estimation tends to converge to a larger value when a more complicated SED model is employed. This is expected, since more complicated SED models have more free parameters and thus suffer from more severe parameter degeneracies. More complicated SED models also require longer running times. The bias of redshift estimation is always very small, regardless of the SED model employed. In general, when a more complicated SED model is employed, the OLF of redshift estimation also increases significantly and decreases much more slowly with increasing nlive.
In the case with noise, as shown in the left panels of Figure 4(b), the results are somewhat more complicated. For the three simplest SED models, the error σNMAD of photometric redshift estimation clearly increases with model complexity. However, when more complicated forms of SFH are considered, σNMAD starts to decrease with increasing model complexity, although not very significantly. Meanwhile, the most complicated SED model (SFH=α-β-τ-r,+CEH,DAL=Sal+18) leads to the smallest absolute bias, although the bias is very small in all cases. The behavior of OLF is similar to that of σNMAD. Regardless of the SED model employed, when nlive ≳ 25, both σNMAD and OLF of redshift estimation start to increase, and then slowly decrease to a stable value.

Stellar population parameter estimation
In this subsection, we show the results of performance tests for photometric stellar population parameter estimation. Although estimates of many stellar population parameters are available, we only show the results for stellar mass and SFR, two of the most important physical parameters for studying the formation and evolution of galaxies.

Stellar mass
The results of performance tests for stellar mass estimation are shown in Figure 5. In Figure 5(a), we show the results for only the simplest SED model (SFH=τ,-CEH,DAL=Cal+00) employed in this work. As shown in the top right panel of this figure, in the case without noise, there is also a clear anti-correlation between the computation time (or the sampling resolution nlive) and the error σNMAD of photometric stellar mass estimation. The dependence of the σNMAD of stellar mass on efr is similar to that of photometric redshift. There is also a clear lower limit of σNMAD at about 0.04. As shown in the top left panel of Figure 5(a), in the case with noise, the error of stellar mass estimation likewise does not always decrease with the sampling resolution nlive. For efr = 0.1, the lowest error (∼0.1130) of stellar mass estimation is also obtained at nlive ≈ 25. When nlive > 25, the error of stellar mass estimation increases only slightly with nlive. The error of stellar mass estimation is about twice that of photometric redshift estimation. The bias of stellar mass estimation is also larger, but still very small compared with σNMAD. In the noise-free case, there is also a clear anti-correlation between the computation time (or nlive) and the OLF of stellar mass estimation, with a lower limit of about 0.03. In the case with noise, for efr = 0.1, the lowest OLF (≲ 0.285) of stellar mass estimation is also obtained at nlive ≈ 25. When nlive > 25, the OLF of stellar mass estimation likewise increases only slightly with nlive. The OLF of stellar mass estimation is slightly larger than that of photometric redshift estimation.
In Figure 5(b), we show the results for all six SED models with increasing complexity, where only the results with efr = 0.1 are shown. In both cases, with and without noise, as shown in the top panels of this figure, the error σNMAD of photometric stellar mass estimation tends to converge to a larger value when a more complicated SED model is employed. The same is true for the OLF of photometric stellar mass estimation. The behavior of the bias is somewhat different, but the bias is generally very small compared with σNMAD. Besides, in the case with noise, for the most complicated SED model (SFH=α-β-τ-r,+CEH,DAL=Sal+18) used in this work, the σNMAD, bias and OLF of stellar mass estimation increase significantly when nlive > 100. This is a clear indication of overfitting to the noise in the data. In general, more complicated SED models lead to lower-quality stellar mass estimates.

Star-formation rate
The results of performance tests for SFR estimation are shown in Figure 6. In Figure 6(a), we show the results for only the simplest SED model (SFH=τ,-CEH,DAL=Cal+00) employed in this work. As for photometric redshift and stellar mass estimation, in the case without noise there is also a clear anti-correlation between the computation time (or the sampling resolution nlive) and the error σNMAD of photometric SFR estimation. The dependence of the σNMAD of SFR on efr is similar to that of photometric redshift and stellar mass. There is also a clear lower limit of σNMAD at about 0.02. As shown in the top left panel of Figure 6(a), in the case with noise, the error of SFR estimation clearly increases when the sampling resolution nlive ≳ 25. Generally, the error of SFR estimation is slightly smaller than that of stellar mass estimation. The bias of SFR estimation is also slightly smaller, and negligible with respect to σNMAD. In the noise-free case, the relation between the computation time (or nlive) and the OLF of SFR estimation is somewhat different from that of photometric redshift and stellar mass. Even with nlive = 500, the OLF of SFR estimation does not yet seem to converge, and its lower limit appears to be near 0.1. In the case with noise, the OLF of SFR estimation converges much faster, to about 0.255, when efr = 0.1. This is slightly smaller than that of stellar mass estimation.
In Figure 6(b), we show the results for all six SED models with increasing complexity, where only the results with efr = 0.1 are shown. In both cases, with and without noise, as shown in the top panels of this figure, the error σNMAD of SFR estimation tends to converge to a larger value when a more complicated SED model is employed. It is also more sensitive to the choice of SED model than that of stellar mass. The same is true for the OLF of SFR estimation. The behavior of the bias is somewhat different, but the bias is generally very small compared with σNMAD. Besides, in the case with noise, for the four most complicated SED models used in this work, the σNMAD, bias and OLF of SFR estimation increase significantly with nlive. This is an even clearer indication of overfitting to the noise in the data. In general, more complicated SED models lead to lower-quality SFR estimates.

Computation of Bayesian evidence
In this subsection, we present the results of performance tests for the computation of the Bayesian evidence, a quantity that is crucial for Bayesian model selection.
In Figure 7(a), we show the results for only the simplest SED model (SFH=τ,-CEH,DAL=Cal+00) employed in this work. As shown in the top and middle panels of this figure, the Bayesian evidence computed with importance sampling is more stable than that computed without it, especially in the case with noise. Hence, hereafter, and especially in §5 and §6, all Bayesian evidences are computed with importance sampling. The value of the Bayesian evidence increases with the number of live points (nlive), which determines the effective sampling resolution. In all cases, it eventually converges to a stable value when nlive is very large, and a smaller sampling efficiency (efr) leads to faster convergence. A good balance between the speed and quality of Bayesian evidence estimation can be achieved with the MultiNest runtime parameters efr = 0.1 and nlive = 50.
On the other hand, as shown in the bottom panels of Figure 7(a), the error of the Bayesian evidence decreases with nlive, while a larger sampling efficiency (efr) also leads to faster convergence. However, unlike the value of the Bayesian evidence, its error converges more slowly with increasing nlive in all cases. As a result, with efr = 0.1 and nlive = 50, the error of the Bayesian evidence would be overestimated. A much larger nlive seems to be required to obtain a more reliable estimate of the error of the Bayesian evidence with MultiNest. This would be very computationally expensive and unsuitable for the analysis of massive photometric data. In practice, however, this may not be a serious issue, since an overestimated error of the Bayesian evidence only leads to a more conservative conclusion in model comparison. We simply need to keep this in mind.
Figure 6. As in Figure 4, but for the star-formation rate estimation. The quality of star-formation rate estimation, in terms of σNMAD, BIA and OLF, is slightly better than that of stellar mass estimation, but even more sensitive to the choice of SED model. In the case with noise, for the four most complicated SED models used in this work, the σNMAD, bias and OLF of SFR estimation increase significantly with nlive. This is an even clearer indication of overfitting to the noise in the data. In general, more complicated SED models lead to lower-quality SFR estimates.

In Figure 7(b), we show the results for all SED models with increasing complexity, where only the results with efr = 0.1 are shown. Although the data used for the computation of the Bayesian evidence differ between SED models, the value of the Bayesian evidence clearly decreases with increasing SED model complexity. This is expected. Since the same SED model is employed for the generation of the mock data and for their Bayesian analysis, the mock data can always be interpreted well. However, a more complicated SED model is penalized for spreading its prior probability over a larger parameter space, of which only a small fraction is useful for the given mock data.
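This penalty on more complicated models can be seen in a one-parameter toy problem. A single datum d = 1 with Gaussian error σ = 0.1 is fitted with a uniform prior of width 4 or 40. Both priors contain the best fit, so the maximum likelihood is identical, but the wider prior lowers ln Z by ln 10. The numbers and names are hypothetical and unrelated to the actual SED models.

```python
import math

def log_evidence_uniform_prior(d, sigma, width):
    """ln Z for one datum d ~ N(theta, sigma^2) with theta ~ U(-width/2, width/2)."""
    def phi(x):  # standard normal CDF
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    mass = phi((width / 2 - d) / sigma) - phi((-width / 2 - d) / sigma)
    return math.log(mass / width)

def log_max_likelihood(sigma):
    """Best fit theta = d lies inside both priors, so ln L_max is the same."""
    return -math.log(math.sqrt(2.0 * math.pi) * sigma)

ln_z_narrow = log_evidence_uniform_prior(1.0, 0.1, 4.0)
ln_z_wide = log_evidence_uniform_prior(1.0, 0.1, 40.0)
print(log_max_likelihood(0.1), ln_z_narrow, ln_z_wide)
```

Here the "wider" prior plays the role of a more flexible model. It fits equally well at its best point, but it wastes prior mass on parameter values that the data do not support.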

RESULTS OF PERFORMANCE TESTS USING HYDRODYNAMICAL SIMULATION-BASED MOCK GALAXY SAMPLE
In this section, we present the results of performance tests of photometric redshift and stellar population parameter estimation using the hydrodynamical simulation-based mock galaxy sample for a CSST-like imaging survey. Only the results obtained with the simplest SED model are shown. In §6, we discuss the effects of more flexible SFH and DAL for CSST-like, CSST+Euclid-like and COSMOS-like surveys.
As mentioned in §2.2, the generation of this mock galaxy sample accounts for the complex SFH and metal enrichment of Horizon-AGN galaxies, and consistently includes dust attenuation. However, for the Bayesian analysis of this more theoretical mock galaxy sample, we first employ the widely used assumptions about the SFH (exponentially declining), the metal enrichment history (constant but free), and dust attenuation (a uniform foreground dust screen with the Calzetti et al. (2000) DAL). The results in this section will help us quantify the systematic errors resulting from these simplified assumptions.
Besides, as mentioned in §2.4, galaxies in the mock sample used here are selected with Ks < 24.7 and i+ < 25.
In the literature, it is quite common to exclude some pathological cases with a χ² selection (Davidzon et al. 2017; Caputi et al. 2015; Laigle et al. 2019) before presenting the results of performance tests. However, no such cut is made here, because the pathological cases are precisely what we want to investigate.

Photometric redshift estimation
In Figure 8, we investigate the performance of BayeSED combined with the simplest SED model in estimating the photometric redshifts of the hydrodynamical simulation-based mock galaxy sample for a CSST-like imaging survey. As a reference, panel a of this figure shows the ideal case without observational noise or SED modeling errors. In this case, the sources of error are parameter degeneracies, the stochastic nature of the MultiNest sampling algorithm, and other potential errors in the BayeSED code. As clearly shown, the total error from all of these sources is very small. Comparing panels a and b of this figure, with only the observational noise added, the σNMAD of photometric redshift estimation increases eightfold and the OLF increases by more than a factor of forty. The bias also increases, but is negligible with respect to σNMAD. Comparing panels a and c, with only the error from imperfect SED modeling added, the σNMAD of photometric redshift estimation increases by more than a factor of three and the OLF decreases slightly. Besides, there are some additional systematic patterns in the relation between the true and estimated photometric redshifts. The bias also increases and becomes comparable to σNMAD. In general, observational noise is the more important source of error for the photometric redshift estimation of galaxies, although imperfect SED modeling is also very important.
Finally, as shown in panel d of Figure 8, when all sources of error are included, the σNMAD of photometric redshift estimation increases to 0.097, the OLF increases to 0.264, and the bias becomes 0.003. The systematic patterns seen in panel c appear to be masked by the added noise. The algorithm struggles to estimate the photometric redshifts correctly using only the seven-band photometries of a CSST-like imaging survey, especially for galaxies at redshifts above one.

Stellar population parameter estimation
In Figure 9, we investigate the performance of BayeSED combined with the simplest SED model in estimating the stellar population parameters of the hydrodynamical simulation-based mock galaxies for a CSST-like imaging survey. As a reference, panel a of this figure shows the ideal case without observational noise or SED modeling errors. In this ideal case, the σNMAD, bias and OLF of stellar mass estimation are 0.047, 0.005 and 0.052, respectively. As shown in panel b, with only the observational noise added, they become 0.115, 0.02 and 0.283, respectively. As shown in panel c, with only the error from imperfect SED modeling added, they become 0.103, 0.003 and 0.141, respectively. Comparing panels a and c, the performance of stellar mass estimation is severely affected by the simplified assumptions in the SED modeling. However, comparing panels b and c, observational noise is the more important source of error for the photometric stellar mass estimation of galaxies, although imperfect SED modeling is also very important. Finally, as shown in panel d of this figure, when all sources of error are included, the σNMAD of photometric stellar mass estimation increases to 0.135, the bias to 0.034, and the OLF to 0.341. The algorithm has even more difficulty estimating the stellar mass correctly using only the seven-band photometries of a CSST-like imaging survey.
Similarly, Figure 10 shows the performance of BayeSED combined with the simplest SED model in estimating the SFR of the hydrodynamical simulation-based mock galaxy sample for a CSST-like imaging survey. Compared with the stellar mass estimation in Figure 9, the photometric SFR estimation is even more severely affected by the simplified assumptions in the SED modeling. Indeed, comparing panels b and c of this figure, it is clear that imperfect SED modeling is the more important source of error for the photometric SFR estimation of galaxies, although observational noise is also very important. Overall, estimating the SFR correctly using only the seven-band photometries of a CSST-like imaging survey is even more difficult.

DISCUSSION
We compare the results of performance tests for simultaneous photometric redshift and stellar population parameter estimation using the empirical statistics-based mock galaxy sample (§4) and the hydrodynamical simulation-based mock galaxy sample (§5), especially those presented in Figures 8, 9 and 10. This comparison makes clear that the simple typical assumptions about the SFH and DAL of galaxies severely impact the performance of photometric parameter estimation for a CSST-like imaging survey. This is not very surprising, since the SFHs and MEHs of galaxies in cosmological hydrodynamical simulations such as Horizon-AGN (Volonteri et al. 2016; Beckmann et al. 2017a; Kaviraj et al. 2017; Beckmann et al. 2017b) are much more complex and diverse (see also Iyer et al. 2020) than the simple assumptions employed in the preceding Bayesian analysis of the photometric mock data.
In this section, we discuss the effects of more flexible forms of SFH and DAL on the performance of simultaneous photometric redshift and stellar population parameter estimation of galaxies. As in Han & Han (2012, 2014, 2019) (see also Salmon et al. 2016; Dries et al. 2016, 2018; Lawler & Acquaviva 2021), we mainly employ the Bayesian model comparison method to compare six different combinations of these model ingredients with increasing complexity (see Table 1 for details). In addition to the CSST-like survey (§6.1), where only the photometries from seven broad bands are available, we also discuss the results obtained using mock data for CSST+Euclid-like (§6.2) and COSMOS-like (§6.3) surveys, which have increasing discriminative power.

Effects of more flexible SFH and DAL for CSST-like survey
In Table 3, we present a summary of the Bayesian evidences, maximum likelihoods and metrics of the quality of parameter estimation from the Bayesian analysis of the hydrodynamical simulation-based mock galaxy sample for a CSST-like survey. The analysis employs six different combinations of SFH and DAL with increasing complexity, for the cases with and without noise. The same results are shown more clearly in Figure 11.

Model comparison
In the case without noise, as shown in the top left panel of Figure 11, the simplest model "SFH=τ,-CEH,DAL=Cal+00" has the lowest Bayesian evidence, ln(BE) = −88042 ± 5936. With the additional consideration of metallicity evolution, the Bayesian evidence of the model "SFH=τ,+CEH,DAL=Cal+00" increases to ln(BE) = −74128 ± 5973. Then, with the adoption of the DAL of Salim et al. (2018), the Bayesian evidence of the model "SFH=τ,+CEH,DAL=Sal+18" increases significantly to ln(BE) = 58878 ± 5823. Evidently, the DAL of Salim et al. (2018) is a much better choice than that of Calzetti et al. (2000) for the hydrodynamical simulation-based mock galaxy sample in a CSST-like survey. Furthermore, with a more complicated β-τ form of SFH, the Bayesian evidence of the model "SFH=β-τ,+CEH,DAL=Sal+18" decreases slightly to ln(BE) = 56343 ± 5780. In fact, these two SED models ("SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18") have the largest Bayesian evidences, which are comparable within the error bars; they are neither the simplest nor the most complex models. With a quenching (or rejuvenation) component added to the SFH, the Bayesian evidence of the model "SFH=β-τ-r,+CEH,DAL=Sal+18" clearly decreases to ln(BE) = 34213 ± 5867. It seems that, while rejuvenation or rapid quenching events may happen in some galaxies, this additional SFH component is not very effective for most of the galaxies in the sample. Finally, with an even more flexible double power-law form of SFH, the Bayesian evidence of the model "SFH=α-β-τ-r,+CEH,DAL=Sal+18" decreases slightly to ln(BE) = 33044 ± 5800. However, these last two SED models are comparable within the error bars. On the other hand, it is worth mentioning that model selection with the maximum likelihood (or equivalently the minimum χ²) leads to similar results.

Table 3. A summary of Bayesian evidences (BE), maximum likelihoods (ML) and the metrics of the quality of parameter estimation from the Bayesian analysis of the hydrodynamical simulation-based mock galaxy sample for a CSST-like survey, employing six SED models with increasing complexity in the form of SFH and DAL, for the cases with and without noise.
Figure 11. The Bayesian evidences (BE), maximum likelihoods (ML) and metrics of photometric redshift (red star), stellar mass (blue square) and SFR (green circle) estimation from the Bayesian analysis of the hydrodynamical simulation-based mock galaxy sample for a CSST-like survey. Six SED models with increasing complexity in the forms of SFH and DAL are employed (0: "SFH=τ,-CEH,DAL=Cal+00", 1: "SFH=τ,+CEH,DAL=Cal+00", 2: "SFH=τ,+CEH,DAL=Sal+18", 3: "SFH=β-τ,+CEH,DAL=Sal+18", 4: "SFH=β-τ-r,+CEH,DAL=Sal+18", 5: "SFH=α-β-τ-r,+CEH,DAL=Sal+18"), for the cases with (bottom panels) and without noise (top panels). In the case without noise, the simplest model "SFH=τ,-CEH,DAL=Cal+00" has the lowest Bayesian evidence, ln(BE) = −88042 ± 5936. Meanwhile, the SED models "SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18", which are neither the simplest nor the most complex models, have the largest Bayesian evidences, ln(BE) = 58878 ± 5823 and ln(BE) = 56343 ± 5780, which are comparable within the error bars. Interestingly, the same two models also give the highest-quality parameter estimates. Model selection with the maximum likelihood (or equivalently the minimum χ²) leads to similar results. In contrast, in the case with noise, the two simplest SED models "SFH=τ,-CEH,DAL=Cal+00" and "SFH=τ,+CEH,DAL=Cal+00" have the largest Bayesian evidences, ln(BE) = −18871 ± 2649 and ln(BE) = −18661 ± 2649, which are comparable within the error bars. Notably, model selection with the maximum likelihood (or equivalently the minimum χ²) leads to exactly the opposite result, always favoring the more complex (or flexible) models. As in the case without noise, the two simplest SED models also give the highest-quality parameter estimates. Accurate estimation becomes increasingly difficult from photometric redshift to stellar mass to SFR, and the latter two are also more sensitive to the choice of SED model. Generally, the quality of parameter estimates is closely related to the level of Bayesian evidence, which is especially clear in the more realistic case with noise. In fact, the quality of parameter estimation, especially of stellar mass and SFR, decreases significantly with increasing SED model complexity. This is similar to the case with perfect SED modeling shown in Figures 4, 5 and 6, and is likely caused by the more severe parameter degeneracies of the more flexible SED models. Clearly, in the more realistic case with noise, model selection with the maximum likelihood (or equivalently the minimum χ²) is inconsistent with the measured quality of parameter estimation. Since direct measurements of metrics such as NMAD, BIA and OLF are usually unavailable, Bayesian model comparison with the Bayesian evidence can be used to find the SED model that is not only the most efficient but also gives the best parameter estimates.
In the case with noise, as shown in the bottom left panel of Figure 11, the two simplest SED models "SFH=τ,-CEH,DAL=Cal+00" and "SFH=τ,+CEH,DAL=Cal+00" have the largest Bayesian evidences, ln(BE) = −18871 ± 2649 and ln(BE) = −18661 ± 2649, respectively. Although the latter, which additionally accounts for metallicity evolution, appears slightly better, the two Bayesian evidences are comparable within the error bars. With the adoption of the DAL of Salim et al. (2018), the Bayesian evidence of the model "SFH=τ,+CEH,DAL=Sal+18" then decreases significantly to ln(BE) = −26098 ± 2956, exactly the opposite of the situation without noise. It is likely that the more complicated form of DAL does not provide a better fit to the noisier data. With the more complicated β-τ form of SFH, the Bayesian evidence of the model "SFH=β-τ,+CEH,DAL=Sal+18" appears to decrease further to ln(BE) = −28141 ± 2904, although it is comparable with the former within the error bars.
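The comparisons above amount to log Bayes factors, i.e. differences of ln(BE) between models, judged against their uncertainties. The text does not state how "comparable within the error bars" is decided, so the sketch below assumes that the quoted ± errors are independent and Gaussian and therefore add in quadrature; the helper names are hypothetical and the numbers are the CSST-like values with noise quoted above.

```python
import math


def log_bayes_factor(ln_be_a, err_a, ln_be_b, err_b):
    """Return ln(BE_a / BE_b) and its 1-sigma uncertainty.

    Assumes the two ln(BE) errors are independent and Gaussian, so they
    are combined in quadrature (an illustrative assumption).
    """
    return ln_be_a - ln_be_b, math.hypot(err_a, err_b)


def comparable(ln_be_a, err_a, ln_be_b, err_b, n_sigma=1.0):
    """True if the evidence difference lies within n_sigma combined errors."""
    diff, err = log_bayes_factor(ln_be_a, err_a, ln_be_b, err_b)
    return abs(diff) <= n_sigma * err


# ln(BE) and errors for the CSST-like mock sample with noise, as quoted above.
models = {
    "SFH=tau,-CEH,DAL=Cal+00": (-18871, 2649),
    "SFH=tau,+CEH,DAL=Cal+00": (-18661, 2649),
    "SFH=tau,+CEH,DAL=Sal+18": (-26098, 2956),
    "SFH=beta-tau,+CEH,DAL=Sal+18": (-28141, 2904),
}

reference = "SFH=tau,+CEH,DAL=Cal+00"
for name, (ln_be, err) in models.items():
    if name == reference:
        continue
    diff, sigma = log_bayes_factor(ln_be, err, *models[reference])
    print(f"{name:30s} dln(BE) = {diff:7d} +/- {sigma:5.0f} ({diff / sigma:+.2f} sigma)")
```

Under these assumptions, the two Calzetti-based models differ by only about 0.06 combined standard deviations, whereas switching to the Salim et al. (2018) DAL lowers ln(BE) by roughly 1.9 combined standard deviations.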
With a quenching (or rejuvenation) component added to the SFH, the Bayesian evidence of the model "SFH=β-τ-r,+CEH,DAL=Sal+18" clearly decreases to ln(BE) = −35048 ± 3020, similar to the case without noise. Finally, with the even more flexible double power-law form of SFH, the Bayesian evidence of the model "SFH=α-β-τ-r,+CEH,DAL=Sal+18" increases slightly to ln(BE) = −33983 ± 2869, although it is comparable with the former within the error bars. In contrast, model selection with maximum likelihood (or, equivalently, minimum χ²) leads to exactly the opposite result, always favoring the more complex (or flexible) models.
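The opposite behavior of the two criteria follows from the Occam penalty built into the evidence: for nested models the maximum likelihood can only grow as free parameters are added, whereas the evidence averages the likelihood over the prior and is penalized when the extra prior volume is not needed by the data. The toy sketch below is not an SED model and not part of BayeSED; Model A fixes the mean of Gaussian data to zero, Model B frees it with a uniform prior, and the data, σ and prior width are assumptions chosen for illustration.

```python
import math


def log_likelihood(y, mu, sigma):
    """Gaussian log-likelihood of data y for mean mu and known sigma."""
    chi2 = sum((yi - mu) ** 2 for yi in y) / sigma ** 2
    return -0.5 * len(y) * math.log(2.0 * math.pi * sigma ** 2) - 0.5 * chi2


def std_normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def compare_nested(y, sigma=1.0, half_width=10.0):
    """Toy nested-model comparison.

    Model A: mean fixed at 0 (no free parameter).
    Model B: mean free, uniform prior on [-half_width, half_width].
    Returns (ln ML_A, ln ML_B, ln Z_A, ln Z_B).
    """
    n = len(y)
    ybar = sum(y) / n
    ln_ml_a = log_likelihood(y, 0.0, sigma)
    ln_ml_b = log_likelihood(y, ybar, sigma)  # best-fit mean is the sample mean
    ln_z_a = ln_ml_a  # no free parameters: evidence equals likelihood
    # Model B: the likelihood is Gaussian in the mean with width s, so the
    # prior-weighted integral is analytic (a truncated Gaussian integral).
    s = sigma / math.sqrt(n)
    mass = std_normal_cdf((half_width - ybar) / s) - std_normal_cdf((-half_width - ybar) / s)
    ln_z_b = ln_ml_b + math.log(math.sqrt(2.0 * math.pi) * s * mass) - math.log(2.0 * half_width)
    return ln_ml_a, ln_ml_b, ln_z_a, ln_z_b


for data in ([0.1, -0.1, 0.2, 0.0], [3.0, 3.1, 2.9, 3.0]):
    ml_a, ml_b, z_a, z_b = compare_nested(data)
    print(f"data={data}: ln ML_B - ln ML_A = {ml_b - ml_a:.3f}, ln Z_B - ln Z_A = {z_b - z_a:.3f}")
```

For data consistent with zero, Model B attains the higher maximum likelihood but the lower evidence; for data far from zero, the evidence switches to Model B. By analogy, noisy photometries may be unable to justify the extra prior volume of the more flexible SFH and DAL forms.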

Parameter estimation
In the right three panels of Figure 11, we show the three metrics of the quality of photometric redshift, stellar mass and SFR estimation for the different SED models. In general, in the case without noise, the SED models "SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18", which are exactly the two with the largest Bayesian evidences, give the highest-quality parameter estimates. In the case with noise, the two simplest SED models "SFH=τ,-CEH,DAL=Cal+00" and "SFH=τ,+CEH,DAL=Cal+00", which are also the two with the largest Bayesian evidences, give the highest-quality parameter estimates. We discuss these results in more detail below.
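The three metrics can be computed from the estimated and true values of each parameter. The sketch below uses NumPy and common conventions that are assumptions here: a (1 + z_true) normalization for redshift only, the 1.48 factor of the normalized median absolute deviation, the mean of the residuals as the bias, and a configurable outlier threshold. The exact definitions adopted earlier in this paper may differ.

```python
import numpy as np


def estimation_metrics(estimated, true, redshift_like=False, outlier_cut=0.15):
    """Return (sigma_NMAD, bias, outlier fraction) for one parameter.

    For redshift the residual is normalized by (1 + z_true); for log-scaled
    quantities such as log M* or log SFR the plain difference is used.
    The 1.48 factor, the mean-based bias and the outlier cut are common
    conventions assumed here, not necessarily the paper's definitions.
    """
    est = np.asarray(estimated, dtype=float)
    tru = np.asarray(true, dtype=float)
    delta = est - tru
    if redshift_like:
        delta = delta / (1.0 + tru)
    # Normalized median absolute deviation: robust to catastrophic outliers.
    sigma_nmad = 1.48 * np.median(np.abs(delta - np.median(delta)))
    bias = np.mean(delta)
    olf = np.mean(np.abs(delta) > outlier_cut)
    return sigma_nmad, bias, olf


z_true = [1.0, 1.0, 1.0, 1.0]
z_phot = [1.02, 0.98, 1.0, 1.6]  # the last object is a catastrophic outlier
print(estimation_metrics(z_phot, z_true, redshift_like=True))
```

The median-based σ_NMAD is barely affected by the single outlier, which is instead captured by the OLF and, to a lesser extent, by the mean bias.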
The detailed results of photometric redshift estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 12. In the case without noise, comparing the results in panel c of Figure 8 with those in panel a of Figure 12 shows that, with the additional consideration of metallicity evolution and the adoption of the DAL of Salim et al. (2018), the σ_NMAD of photometric redshift estimation is clearly reduced while the bias and OLF increase only slightly. Meanwhile, the systematic patterns in the former results are also largely reduced. However, as shown in panel c of Figure 12, additionally employing the more complicated β-τ form of SFH reduces σ_NMAD only slightly, while the bias and OLF remain exactly the same. Besides, as shown in Table 3 and Figure 11, the other two, even more complicated, forms of SFH lead to a similar quality of photometric redshift estimation. In the case with noise, the best two models are quite similar in the quality of photometric redshift estimation, and the quality tends to decrease, although not very significantly, with increasing SED model complexity.
The detailed results of photometric stellar mass estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 13. In the case without noise, comparing the results in panel c of Figure 9 with those in panel a of Figure 13 shows that, with the additional consideration of metallicity evolution and the adoption of the DAL of Salim et al. (2018), the σ_NMAD of photometric stellar mass estimation is reduced, although the bias and OLF somewhat increase. With the more complicated β-τ form of SFH, as shown in panel c of Figure 13, the quality of photometric stellar mass estimation improves further. However, as shown in Table 3 and Figure 11, the quality of photometric stellar mass estimation clearly decreases as the SED model complexity increases further. In the case with noise, the best two models give exactly the same quality of photometric stellar mass estimation, and the quality decreases even more clearly with increasing SED model complexity.
The detailed results of photometric SFR estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 14. In the case without noise, comparing the results in panel c of Figure 10 with those in panel a of Figure 14 shows that, with the additional consideration of metallicity evolution and the adoption of the DAL of Salim et al. (2018), the systematic bias and OLF of photometric SFR estimation are largely reduced while σ_NMAD increases slightly. With the more complicated β-τ form of SFH, as shown in panel c of Figure 14, the quality of photometric SFR estimation reaches its best. However, as shown in Table 3 and Figure 11, the additional quenching (or rejuvenation) component of the β-τ-r form of SFH leads to a much worse quality of photometric SFR estimation. Finally, the most complicated α-β-τ-r form of SFH leads to a slightly better SFR estimation. In the case with noise, the best two models are also very similar in the quality of photometric SFR estimation, while the quality decreases significantly with increasing SED model complexity.
Accurate estimation becomes progressively more difficult from photometric redshift to stellar mass to SFR, and the latter two are also more sensitive to the choice of SED model. Generally, the quality of parameter estimates is closely related to the level of Bayesian evidence, which is especially clear in the more realistic case with noise. Meanwhile, model selection with maximum likelihood (or, equivalently, minimum χ²), which in the case with noise always favors the more complex (or flexible) models, is inconsistent with the measured quality of parameter estimation. In practice, direct measurements of the quality of parameter estimation, as indicated by NMAD, BIA and OLF, are usually unavailable. Therefore, Bayesian model comparison with Bayesian evidence can be used to find the best SED model, which is not only the most efficient but also gives the best parameter estimation.

Effects of more flexible SFH and DAL for CSST+Euclid-like survey
The results of both model comparison and parameter estimation depend strongly on the datasets used, which may have very different discriminative powers. In Figure 15, we show an example of the 1D and 2D posterior probability distribution functions (PDFs) of free parameters obtained from the Bayesian analysis of the photometric data of a mock galaxy in the CSST-like, CSST+Euclid-like, and COSMOS-like surveys. Different datasets clearly lead to very different PDFs, owing to their very different discriminative powers. In this section and §6.3, we discuss the effects of more flexible SFH and DAL for the CSST+Euclid-like and COSMOS-like surveys, respectively. The addition of Euclid data extends the wavelength coverage to longer NIR wavelengths than CSST-only data, which should enhance the discriminative power of model comparison and the quality of parameter estimation.
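How a longer wavelength baseline can tighten the posterior and weaken parameter degeneracies can be illustrated with a toy linear model, y = a + b·x, fitted to a few points. This is a hedged analogy rather than the SED model: the abscissae stand in for hypothetical "bands", and with Gaussian noise and flat priors the posterior covariance is the inverse Fisher matrix, (AᵀA/σ²)⁻¹.

```python
import numpy as np


def posterior_cov(x, sigma=1.0):
    """Posterior covariance of (a, b) for y = a + b*x with Gaussian noise
    sigma and flat priors, i.e. the inverse Fisher matrix (A^T A / sigma^2)^-1."""
    x = np.asarray(x, dtype=float)
    design = np.column_stack([np.ones_like(x), x])
    return np.linalg.inv(design.T @ design / sigma ** 2)


def correlation(cov):
    """Correlation coefficient between the two parameters."""
    return cov[0, 1] / np.sqrt(cov[0, 0] * cov[1, 1])


narrow = [1.0, 1.1, 1.2]    # hypothetical short-baseline "bands"
wide = narrow + [2.0, 3.0]  # the same plus two longer-wavelength "bands"

for label, x in [("narrow", narrow), ("wide", wide)]:
    cov = posterior_cov(x)
    print(f"{label:6s} sigma_b = {np.sqrt(cov[1, 1]):.3f}, corr(a, b) = {correlation(cov):.3f}")
```

Adding the two longer-baseline points shrinks σ_b from about 7.1 to about 0.59 and reduces |corr(a, b)| from about 0.997 to about 0.91, qualitatively mirroring the tighter and less degenerate contours described for Figure 15.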
In Table 4, we present a summary of the Bayesian evidences, maximum likelihoods and metrics of the quality of parameter estimation from the Bayesian analysis of the hydrodynamical simulation-based mock galaxy sample for the CSST+Euclid-like survey, obtained by employing six different combinations of SFH and DAL with increasing complexity, for the cases with and without noise. The same results are shown more clearly in Figure 16.

Model comparison
In the case without noise, as shown in the top left panel of Figure 16, the simplest model "SFH=τ,-CEH,DAL=Cal+00" has the lowest Bayesian evidence, ln(BE) = −243527 ± 6307. With the additional consideration of metallicity evolution, the Bayesian evidence of the model "SFH=τ,+CEH,DAL=Cal+00" increases to ln(BE) = −216613 ± 6390. With the adoption of the DAL of Salim et al. (2018), the Bayesian evidence of the model "SFH=τ,+CEH,DAL=Sal+18" then increases significantly to ln(BE) = 72092 ± 6475. Apparently, the DAL of Salim et al. (2018) is also a much better choice than that of Calzetti et al. (2000) for the hydrodynamical simulation-based mock galaxy sample in the CSST+Euclid-like survey. With the more complicated β-τ form of SFH, the Bayesian evidence of the model "SFH=β-τ,+CEH,DAL=Sal+18" decreases slightly to ln(BE) = 70340 ± 6409. As for the CSST-like survey, the latter two SED models ("SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18") have the largest Bayesian evidences, which are comparable within the error bars. With a quenching (or rejuvenation) component added to the SFH, the Bayesian evidence of the model "SFH=β-τ-r,+CEH,DAL=Sal+18" decreases significantly to ln(BE) = 25728 ± 6566. Finally, with the even more flexible double power-law form of SFH, the Bayesian evidence of the model "SFH=α-β-τ-r,+CEH,DAL=Sal+18" decreases slightly to ln(BE) = 19560 ± 6454, which is comparable with the former within the error bars. Model selection with maximum likelihood (or, equivalently, minimum χ²) leads to similar results.
In the case with noise, the two simplest SED models "SFH=τ,-CEH,DAL=Cal+00" and "SFH=τ,+CEH,DAL=Cal+00" have the largest Bayesian evidences, ln(BE) = −31783 ± 3314 and ln(BE) = −31347 ± 3302, respectively. With the adoption of the DAL of Salim et al. (2018), the Bayesian evidence of the model "SFH=τ,+CEH,DAL=Sal+18" then decreases significantly to ln(BE) = −38081 ± 3529. With the more complicated β-τ form of SFH, the Bayesian evidence of the model "SFH=β-τ,+CEH,DAL=Sal+18" appears to decrease further to ln(BE) = −41164 ± 3437, although it is comparable with the former within the error bars. With a quenching (or rejuvenation) component added to the SFH, the Bayesian evidence of the model "SFH=β-τ-r,+CEH,DAL=Sal+18" clearly decreases to ln(BE) = −49796 ± 3561, similar to the case without noise. Finally, with the even more flexible double power-law form of SFH, the Bayesian evidence of the model "SFH=α-β-τ-r,+CEH,DAL=Sal+18" increases slightly to ln(BE) = −49608 ± 3426, although it is comparable with the former within the error bars. However, model selection with maximum likelihood (or, equivalently, minimum χ²) leads to exactly the opposite result, always favoring the more complex (or flexible) models.

Figure 15. An example of the 1D and 2D posterior probability distribution functions of free parameters obtained from the Bayesian analysis of a mock galaxy with CSST-like (grey), CSST+Euclid-like (red), and COSMOS-like (blue) photometric data. The contours show the 1σ, 2σ, and 3σ confidence regions, while the red dashed lines show the ground-truth value of each parameter. The parameters are clearly more tightly constrained, and some degeneracies between them are broken, when using datasets with increasing discriminative power.

Figure 16. The Bayesian evidences (BE), maximum likelihoods (ML) and metrics of photometric redshift (red star), stellar mass (blue square) and SFR (green circle) estimation from the Bayesian analysis of the hydrodynamical simulation-based mock galaxy sample for the CSST+Euclid-like survey, obtained by employing six SED models (0: "SFH=τ,-CEH,DAL=Cal+00", 1: "SFH=τ,+CEH,DAL=Cal+00", 2: "SFH=τ,+CEH,DAL=Sal+18", 3: "SFH=β-τ,+CEH,DAL=Sal+18", 4: "SFH=β-τ-r,+CEH,DAL=Sal+18", 5: "SFH=α-β-τ-r,+CEH,DAL=Sal+18") with increasing complexity in the forms of SFH and DAL, for the cases with (bottom panels) and without (top panels) noise. In the case without noise, the simplest model "SFH=τ,-CEH,DAL=Cal+00" has the lowest Bayesian evidence, ln(BE) = −243527 ± 6307. Meanwhile, the SED models "SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18", which are neither the simplest nor the most complex models, have the largest Bayesian evidences, ln(BE) = 72092 ± 6475 and ln(BE) = 70340 ± 6409, which are comparable within the error bars. As for the CSST-like survey, model selection with maximum likelihood (or, equivalently, minimum χ²) leads to similar results. In contrast, in the case with noise, the two simplest SED models "SFH=τ,-CEH,DAL=Cal+00" and "SFH=τ,+CEH,DAL=Cal+00" have the largest Bayesian evidences, ln(BE) = −31783 ± 3314 and ln(BE) = −31347 ± 3302, which are comparable within the error bars. However, model selection with maximum likelihood (or minimum χ²) leads to exactly the opposite result, always favoring the more complex (or flexible) models. Generally, the quality of parameter estimates is closely related to the level of Bayesian evidence, which is especially clear in the more realistic case with noise. Model selection with Bayesian evidence is still more consistent with the measured quality of parameter estimation than that with maximum likelihood (or minimum χ²). Accurate estimation becomes progressively more difficult from photometric redshift to stellar mass to SFR, and the latter two are also more sensitive to the choice of SED model. In the case without noise, the SED models "SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18", which are exactly the two with the largest Bayesian evidences, give the highest-quality parameter estimates. In the case with noise, the two simplest SED models "SFH=τ,-CEH,DAL=Cal+00" and "SFH=τ,+CEH,DAL=Cal+00", which are also the two with the largest Bayesian evidences, give the highest-quality parameter estimates. All of these results are very similar to those for the CSST-like survey. However, the relative errors of the Bayesian evidences are reduced, especially in the case without noise. Besides, the quality of parameter estimation, especially of stellar mass estimation, is significantly improved, and it varies less strongly with increasing SED model complexity.
In general, all of these results are very similar to those for the CSST-like survey. Model selection with Bayesian evidence is still more consistent with the measured quality of parameter estimation. However, the relative errors of the Bayesian evidences are reduced, especially in the case without noise.

Parameter estimation
In the right three panels of Figure 16, we show the three metrics of the quality of photometric redshift, stellar mass and SFR estimation for the different SED models. In general, in the case without noise, the SED models "SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18", which are exactly the two with the largest Bayesian evidences, give the highest-quality parameter estimates. In the case with noise, the two simplest SED models "SFH=τ,-CEH,DAL=Cal+00" and "SFH=τ,+CEH,DAL=Cal+00", which are also the two with the largest Bayesian evidences, give the highest-quality parameter estimates. We discuss these results in more detail below.
The detailed results of photometric redshift estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 17. Compared with the results for the CSST-like survey in Figure 12, the quality of photometric redshift estimation is clearly improved in both the cases with and without noise. In particular, in the more realistic case with noise, the outliers caused by the misidentification of the Lyman and Balmer break features are largely reduced. This suggests that the inclusion of the J, H and Y bands from Euclid helps to improve photometric redshift estimation. However, the more complicated β-τ form of SFH is not very helpful for improving the quality of photometric redshift estimation.
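The Lyman/Balmer confusion follows from simple redshift arithmetic: two spectral features with rest wavelengths λ₁ and λ₂ land at the same observed wavelength when λ₁(1 + z₁) = λ₂(1 + z₂). The sketch below assumes Lyα at 1216 Å as the effective position of the Lyman break in broad-band photometry and 4000 Å as an approximate position of the Balmer/4000 Å break; the helper names are hypothetical.

```python
LYMAN_ALPHA = 1216.0  # Angstrom, rest frame; effective Lyman break in broad bands (assumption)
BREAK_4000 = 4000.0   # Angstrom, rest frame; approximate Balmer / 4000 A break (assumption)


def observed_wavelength(rest_wavelength, z):
    """Observed-frame wavelength of a rest-frame feature at redshift z."""
    return rest_wavelength * (1.0 + z)


def confused_redshift(z_true, rest_true, rest_other):
    """Redshift at which the feature at rest_other lands at the same observed
    wavelength as the feature at rest_true for a galaxy at z_true."""
    return observed_wavelength(rest_true, z_true) / rest_other - 1.0


for z_hi in (3.0, 4.0, 5.0):
    z_lo = confused_redshift(z_hi, LYMAN_ALPHA, BREAK_4000)
    lam_um = observed_wavelength(BREAK_4000, z_hi) / 1e4
    print(f"z = {z_hi}: Lyman break mimics a 4000 A break at z = {z_lo:.2f}; "
          f"the true 4000 A break lies at {lam_um:.1f} micron")
```

For a galaxy at z ≈ 3, the Lyman break could be mistaken for a 4000 Å break at z ≈ 0.22, while the genuine 4000 Å break of that galaxy lies near 1.6 μm, beyond the CSST wavelength range but within the reach of Euclid-like NIR bands; sampling it is one plausible reason why these outliers are reduced.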
The detailed results of photometric stellar mass estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 18. Compared with the results for the CSST-like survey in Figure 13, the quality of photometric stellar mass estimation is significantly improved in both the cases with and without noise. Apparently, the inclusion of the J, H and Y bands from Euclid is crucial for a more accurate estimation of stellar mass. With the more complicated β-τ form of SFH, the quality of photometric stellar mass estimation is slightly improved. However, as shown in Table 4 and Figure 16, the even more complicated forms of SFH are still not helpful for improving the quality of photometric stellar mass estimation.
The detailed results of photometric SFR estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 19. In the case without noise, compared with the results for the CSST-like survey in Figure 14, the σ_NMAD and OLF of photometric SFR estimation are largely reduced, although the bias increases slightly. However, in the case with noise, the σ_NMAD, bias and OLF of photometric SFR estimation all increase slightly. Thus, the inclusion of the J, H and Y bands from Euclid is not very helpful for improving photometric SFR estimation.

Effects of more flexible SFH and DAL for COSMOS-like survey
The COSMOS-like survey covers many more bands than the CSST-like survey. Although it has no NUV data, it extends to longer wavelengths and includes some intermediate bands (IBs) (Laigle et al. 2016). Since the COSMOS-like mock data have much stronger discriminative power than the CSST-like and CSST+Euclid-like mock data, the Bayesian evidences of different SED models should show much larger differences, and the photometric redshift and stellar population parameter estimation should be better.
In Table 5, we present a summary of the Bayesian evidences, maximum likelihoods and metrics of the quality of parameter estimation from the Bayesian analysis of the hydrodynamical simulation-based mock galaxy sample for the COSMOS-like survey, obtained by employing six different combinations of SFH and DAL with increasing complexity, for the cases with and without noise. The same results are shown more clearly in Figure 20.

Model comparison
In the case without noise, as shown in the top left panel of Figure 20, the simplest model "SFH=τ,-CEH,DAL=Cal+00" has the lowest Bayesian evidence, ln(BE) = −1710604 ± 6811. With the additional consideration of metallicity evolution, the Bayesian evidence of the model "SFH=τ,+CEH,DAL=Cal+00" increases to ln(BE) = −1499618 ± 6997. With the adoption of the DAL of Salim et al. (2018), the Bayesian evidence of the model "SFH=τ,+CEH,DAL=Sal+18" then increases significantly to ln(BE) = 354581 ± 7310. Apparently, the DAL of Salim et al. (2018) is also a much better choice than that of Calzetti et al. (2000) for the hydrodynamical simulation-based mock galaxy sample in the COSMOS-like survey. With the more complicated β-τ form of SFH, the Bayesian evidence of the model "SFH=β-τ,+CEH,DAL=Sal+18" increases further to ln(BE) = 387009 ± 7273. However, with a quenching (or rejuvenation) component added to the SFH, the Bayesian evidence of the model "SFH=β-τ-r,+CEH,DAL=Sal+18" decreases significantly to ln(BE) = 255069 ± 7552, as for the CSST-like and CSST+Euclid-like surveys. Finally, with the even more flexible double power-law form of SFH, the Bayesian evidence of the model "SFH=α-β-τ-r,+CEH,DAL=Sal+18" increases to ln(BE) = 275129 ± 7218. As for the CSST-like and CSST+Euclid-like surveys, model selection with maximum likelihood (or, equivalently, minimum χ²) leads to similar results.

Figure 20. The Bayesian evidences (BE), maximum likelihoods (ML) and metrics of photometric redshift (red star), stellar mass (blue square) and SFR (green circle) estimation from the Bayesian analysis of the hydrodynamical simulation-based mock galaxy sample for the COSMOS-like survey, obtained by employing six SED models (0: "SFH=τ,-CEH,DAL=Cal+00", 1: "SFH=τ,+CEH,DAL=Cal+00", 2: "SFH=τ,+CEH,DAL=Sal+18", 3: "SFH=β-τ,+CEH,DAL=Sal+18", 4: "SFH=β-τ-r,+CEH,DAL=Sal+18", 5: "SFH=α-β-τ-r,+CEH,DAL=Sal+18") with increasing complexity in the forms of SFH and DAL, for the cases with (bottom panels) and without (top panels) noise. In the case without noise, the simplest model "SFH=τ,-CEH,DAL=Cal+00" has the lowest Bayesian evidence, ln(BE) = −1710604 ± 6811. Meanwhile, the SED model "SFH=β-τ,+CEH,DAL=Sal+18", which is neither the simplest nor the most complex model, has the largest Bayesian evidence, ln(BE) = 387009 ± 7273, and gives the highest-quality parameter estimates. As for the CSST-like and CSST+Euclid-like surveys, model selection with maximum likelihood (or, equivalently, minimum χ²) leads to similar results. In the case with noise, the simplest model "SFH=τ,-CEH,DAL=Cal+00" still has the lowest Bayesian evidence, ln(BE) = 99881 ± 4881. Meanwhile, the SED models "SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18" have the largest Bayesian evidences, ln(BE) = 131578 ± 5115 and ln(BE) = 130766 ± 5064, which are comparable within the error bars. Unlike for the CSST-like and CSST+Euclid-like surveys, model selection with maximum likelihood (or minimum χ²) leads to similar results. The two models with the largest Bayesian evidences (or maximum likelihoods) also give the highest-quality parameter estimates. In general, the same, and clearer, results of model comparison are obtained in the cases with and without noise. Meanwhile, in the more realistic case with noise, the more complicated SED models are more favored than for the CSST-like and CSST+Euclid-like surveys. All of these are natural consequences of the much stronger discriminative power of the COSMOS-like survey compared with the CSST-like and CSST+Euclid-like surveys.
In the case with noise, as shown in the bottom left panel of Figure 20, almost the same conclusions about SED model comparison are obtained, although the detailed values of the Bayesian evidences clearly differ. Unlike for the CSST-like and CSST+Euclid-like surveys, model selection with Bayesian evidence and with maximum likelihood (or, equivalently, minimum χ²) leads to similar results. In general, the COSMOS-like survey yields clearer results of SED model comparison. Meanwhile, in the more realistic case with noise, the more complicated SED models are more favored than for the CSST-like and CSST+Euclid-like surveys. This is reasonable, since the COSMOS-like survey has much stronger discriminative power than the CSST-like and CSST+Euclid-like surveys.

Parameter estimation
In the right three panels of Figure 20, we show the three metrics of the quality of photometric redshift, stellar mass and SFR estimation for the different SED models. In both the cases with and without noise, the SED model "SFH=β-τ,+CEH,DAL=Sal+18", which is exactly the one with the largest Bayesian evidence, gives the highest-quality parameter estimates. We discuss these results in more detail below.
The detailed results of photometric redshift estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 21. Compared with the results for the CSST-like survey in Figure 12 and those for the CSST+Euclid-like survey in Figure 17, the quality of photometric redshift estimation is significantly improved in both the cases with and without noise. Meanwhile, in both cases, the best two SED models give identical quality of photometric redshift estimation. In the case without noise, the bias and OLF of photometric redshift estimation are almost zero, while σ_NMAD is only 0.002. In this case, the errors from parameter degeneracies and SED modeling should be largely reduced. This suggests that the contribution of errors from the stochastic nature of the MultiNest sampling algorithm and other potential errors in the BayeSED code should be less than 0.002. In the more realistic case with noise, the outliers caused by the misidentification of the Lyman and Balmer break features are largely resolved. Finally, as for the CSST-like and CSST+Euclid-like surveys, the more complicated SED models are not very helpful for improving the quality of photometric redshift estimation.
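The upper bound quoted above can be made explicit under a simple assumption that the text does not state: that the individual error sources are independent and combine approximately in quadrature. Since each term is non-negative, no single contribution, including the sampler stochasticity and any other code-level errors, can exceed the total:

```latex
\sigma_{\rm NMAD}^2 \simeq \sigma_{\rm sampler+code}^2 + \sigma_{\rm degeneracy}^2 + \sigma_{\rm SED\,model}^2
\quad\Longrightarrow\quad
\sigma_{\rm sampler+code} \lesssim \sigma_{\rm NMAD} = 0.002 .
```

Because NMAD is a robust rather than a strictly additive statistic, this relation should be read only as an approximate bound.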
The detailed results of photometric stellar mass estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 22. Compared with the results for the CSST+Euclid-like survey in Figure 18, the quality of photometric stellar mass estimation is further improved in both the cases with and without noise. Besides, the more complicated β-τ form of SFH makes the quality of photometric stellar mass estimation slightly better. However, as shown in Table 5 and Figure 20, the even more complicated forms of SFH make the quality of photometric stellar mass estimation worse.
The detailed results of photometric SFR estimation obtained with the two models with the largest Bayesian evidences are shown in Figure 23. Compared with the results for the CSST+Euclid-like survey in Figure 19, the quality of photometric SFR estimation is significantly improved in both the cases with and without noise. Besides, the more complicated β-τ form of SFH makes the quality of photometric SFR estimation much better. However, as shown in Figure 20, with a quenching (or rejuvenation) component added to the SFH, the quality of photometric SFR estimation becomes clearly worse. Finally, with the even more flexible double power-law form of SFH, the quality of photometric SFR estimation becomes slightly better.

SUMMARY AND CONCLUSION
In this work, based on the Bayesian SED synthesis and analysis techniques employed in the BayeSED-V3 code, we present a comprehensive and systematic test of its performance for simultaneous photometric redshift and stellar population parameter estimation of galaxies when combined with six SED models with increasing complexity in the forms of SFH and DAL. The main purpose is to systematically analyze the various factors affecting simultaneous photometric redshift and stellar population parameter estimation of galaxies in the context of Bayesian SED fitting, so as to provide clues for further improvement.
To separate the different factors that could contribute to the errors of photometric redshift and stellar population parameter estimation, the empirical statistics-based and hydrodynamical simulation-based approaches have been employed to generate mock photometric samples of galaxies, with and without noise, for the CSST-like, COSMOS-like and CSST+Euclid-like surveys. We compare the performance of photometric parameter estimation with different runtime parameters of the Bayesian analysis algorithm, different assumptions about the SFH and DAL of galaxies, and different observational datasets. Our main findings are as follows.
For the performance tests using the empirical statistics-based mock galaxy sample with idealized SED modeling:
1. The performance of photometric redshift and stellar population parameter estimation, in terms of speed and quality, is sensitive to the runtime parameters (the target sampling efficiency efr and the number of live points nlive) of the MultiNest algorithm.
2. A good balance among speed, quality of parameter estimation, and accuracy of model comparison can be achieved with the MultiNest runtime parameters efr = 0.1 and nlive = 50.
3. With the optimized runtime parameters of MultiNest and the simplest SED modeling, a speed of ∼2 s/obj/cpu (∼10 s/obj/cpu) can be achieved for a detailed Bayesian analysis of photometries from the CSST-like survey, which is sufficient for the analysis of massive photometric data. Meanwhile, a quality of photometric redshift estimation with σ_NMAD = 0.056, BIA = −0.0025, OLF = 0.215, a quality of photometric stellar mass estimation with σ_NMAD = 0.113, BIA = −0.025, OLF = 0.285, and a quality of photometric SFR estimation with σ_NMAD = 0.08, BIA = −0.01, OLF = 0.255 can be achieved.
4. With the optimized runtime parameters of MultiNest, the Bayesian evidence, which is crucial for Bayesian model comparison, can also be well estimated, although its error tends to be overestimated, which may lead to more conservative conclusions about model comparison.
5. The random observational errors in photometries are more important sources of errors than the parameter degeneracies and Bayesian analysis method and tool.
6. More complicated SED models require noticeably longer running times. They also tend to overfit noisy photometries and lead to worse quality of photometric redshift, stellar mass and SFR estimation, which is likely due to more free parameters and more severe parameter degeneracies.
7. The Bayesian evidence clearly decreases with increasing SED model complexity in both the cases with and without noise.
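To put the speed in finding 3 in context, a back-of-the-envelope estimate for a sample of order 10⁹ galaxies (the "billions of galaxies" expected from the CSST survey) is sketched below. It assumes perfectly parallel, independent per-object fits on a hypothetical 10,000-core cluster and ignores I/O and scheduling overheads; the helper names are illustrative only.

```python
SECONDS_PER_HOUR = 3600.0


def cpu_hours(n_objects, seconds_per_object):
    """Total CPU time, in hours, for fitting n_objects independently."""
    return n_objects * seconds_per_object / SECONDS_PER_HOUR


def wall_clock_days(n_objects, seconds_per_object, n_cores):
    """Wall-clock days if the fits are spread perfectly over n_cores."""
    return cpu_hours(n_objects, seconds_per_object) / n_cores / 24.0


n_gal = 1e9     # order of magnitude of the CSST photometric sample
cores = 10_000  # hypothetical cluster size
for t in (2.0, 10.0):  # the s/obj/cpu figures quoted in finding 3
    print(f"{t:4.0f} s/obj: {cpu_hours(n_gal, t):.2e} CPU-h, "
          f"{wall_clock_days(n_gal, t, cores):.1f} days on {cores} cores")
```

Under these assumptions, ∼2 s/obj/cpu corresponds to about 5.6 × 10⁵ CPU-hours, or roughly 2.3 days of wall-clock time, and ∼10 s/obj/cpu to roughly 12 days.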
For the performance tests using the hydrodynamical simulation-based mock galaxy sample without idealized SED modeling:
1. The commonly used simple assumptions about the SFH and DAL of galaxies have a severe impact on the quality of photometric parameter estimation, especially for the CSST-like survey with photometries from only seven broad bands.
2. The performance of both Bayesian parameter estimation and model comparison depends strongly on the discriminative power of the observational photometries. With more informative photometries, clearer results of SED model comparison and higher-quality photometric parameter estimation can be obtained.
3. While SED model comparison with Bayesian evidence may favor SED models with very different complexities when using photometries from different surveys, maximum likelihood (or, equivalently, minimum χ²) tends to favor more complex models. For photometries with strong enough discriminative power, the two methods lead to more consistent results. However, for photometries without strong enough discriminative power, the two methods may lead to contradictory results. In both cases, the results of model selection with Bayesian evidence are more consistent with the measured quality of parameter estimation.
4. In both the cases with and without noise, the additional consideration of metallicity evolution helps to improve the quality of photometric redshift and stellar population parameter estimation of galaxies, and increases the Bayesian evidence of the corresponding SED model.
5. In the case without noise, the DAL of Salim et al. (2018) is a much better choice than that of Calzetti et al. (2000) for the hydrodynamical simulation-based mock galaxy sample in the CSST-like, CSST+Euclid-like and COSMOS-like surveys. However, in the more realistic case with noise, it is favored by Bayesian evidence-based model selection only in the COSMOS-like survey. With maximum likelihood (or, equivalently, minimum $\chi^2$)-based model selection, it may also be favored, but this leads to worse parameter estimation.
6. In the case without noise, more flexible forms of SFH lead to better parameter estimation and increase the Bayesian evidence of the corresponding SED model. However, in the more realistic case with noise, they are favored only in the COSMOS-like survey.
7. With a quenching (or rejuvenation) component added to the SFH, both the quality of parameter estimation and the Bayesian evidence of the corresponding SED model decrease in all cases. Although rejuvenation or rapid quenching events may happen in some galaxies, this additional SFH component is not effective for most galaxies in the hydrodynamical simulation-based mock galaxy sample.
8. The quality of parameter estimation is closely related to the Bayesian evidence: the SED model with the largest Bayesian evidence tends to give the best parameter estimation, and this trend is clearer for photometry with larger discriminative power. With photometry lacking sufficient discriminative power, the quality of parameter estimation, especially of stellar mass and SFR, tends to decrease with increasing SED model complexity.
9. Since direct measurements of the quality of parameter estimation, as given by NMAD, BIA and OLF, are usually unavailable for real data, Bayesian model comparison with Bayesian evidence can be used to find the best SED model, which is not only the most efficient but also gives the best parameter estimation.
10. Accurate estimation becomes increasingly difficult from photometric redshift to stellar mass to SFR, and the latter two are also more sensitive to the choice of SED model.
11. For photometric redshift estimation of galaxies in a CSST-like survey, observational noise is a more important source of error than imperfect SED modeling. However, for photometric stellar mass and SFR estimation, the opposite is true.
12. Combining photometry from CSST-like and Euclid-like surveys helps improve photometric redshift estimation and is crucial for more accurate stellar mass estimation, but is not very useful for SFR estimation.
13. With photometry in 26 bands from a COSMOS-like survey and the same SED model, BayeSED-V3 achieves photometric redshift, stellar mass and SFR estimates of similar quality to those of previous works. In addition, with the 26-band COSMOS-like photometry, more complicated SED models tend to be favored, which is very different from the two cases with photometry from only the CSST-like (7 bands) or CSST+Euclid-like (10 bands) surveys.
We conclude that the latest version of BayeSED is capable of achieving a good balance among speed, the quality of simultaneous photometric redshift and stellar population parameter estimation of galaxies, and the reliability of SED model comparison. This makes it suitable for analyzing existing and forthcoming massive photometric datasets of galaxies from the CSST wide-field multiband imaging survey and others.
In general, the main bottleneck currently limiting the performance of the Bayesian approach to simultaneous photometric redshift and stellar population parameter estimation of galaxies is the reliability of the SED synthesis (or modeling) procedure. Assuming a more flexible model is not a complete solution: we need an SED model that is not only more flexible but also more accurate. This can be achieved by gradually adding more informative priors and physical constraints to the SED synthesis (modeling) procedure of galaxies, which is the subject of future work. Bayesian model selection with Bayesian evidence, a quantified Occam's razor, is very helpful for identifying the best SED model, which is not only the most efficient but also gives the best parameter estimation.
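The contrast between maximum likelihood and Bayesian evidence as a quantified Occam's razor can be illustrated with a toy problem that has nothing to do with SED fitting. The sketch below is a minimal example under hypothetical assumptions: nested polynomial models, Gaussian noise of known σ, and an isotropic Gaussian prior with precision `alpha` on the coefficients. Under these assumptions the marginal likelihood of Bayesian linear regression is analytic. For nested models the maximum log-likelihood can only grow with the number of parameters, whereas the evidence penalizes parameters the data do not require.

```python
import numpy as np


def design_matrix(x, degree):
    """Columns 1, x, ..., x**degree, so lower degrees are nested in higher ones."""
    return np.vander(x, degree + 1, increasing=True)


def log_max_likelihood(x, y, degree, sigma):
    """Gaussian log-likelihood at the least-squares (maximum-likelihood) fit."""
    phi = design_matrix(x, degree)
    w, *_ = np.linalg.lstsq(phi, y, rcond=None)
    resid = y - phi @ w
    return -0.5 * (resid @ resid) / sigma**2 - 0.5 * y.size * np.log(2 * np.pi * sigma**2)


def log_evidence(x, y, degree, sigma, alpha=1.0):
    """Analytic ln Z for y = Phi w + noise, with hypothetical prior w ~ N(0, I / alpha).

    Marginalising over w gives y ~ N(0, C) with C = sigma^2 I + Phi Phi^T / alpha."""
    phi = design_matrix(x, degree)
    cov = sigma**2 * np.eye(y.size) + (phi @ phi.T) / alpha
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (y @ np.linalg.solve(cov, y) + logdet + y.size * np.log(2 * np.pi))


# Hypothetical data: a quadratic sampled at 15 points with Gaussian noise
rng = np.random.default_rng(42)
sigma = 0.1
x = np.linspace(-1.0, 1.0, 15)
y = 1.0 - 2.0 * x + 0.5 * x**2 + rng.normal(0.0, sigma, x.size)

degrees = range(8)
ml = np.array([log_max_likelihood(x, y, d, sigma) for d in degrees])
ev = np.array([log_evidence(x, y, d, sigma) for d in degrees])
for d in degrees:
    print(f"degree {d}: ln L_max = {ml[d]:8.2f}   ln Z = {ev[d]:8.2f}")
```

Because the data are generated from a quadratic, ln L_max keeps rising with polynomial degree while ln Z is expected to level off and then drop once the extra coefficients stop being needed. BayeSED estimates the evidence numerically with MultiNest rather than analytically; the toy only shows why the two selection criteria can disagree.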
The results on simultaneous photometric redshift and stellar population parameter estimation presented in this work are not yet optimal, especially those for CSST. The contributions of nebular line and continuum emission to the SED, which may help break some parameter degeneracies, are still missing in this work. It is also worth mentioning that the results of Bayesian SED model comparison and the metrics (OLF, BIA or NMAD) of parameter estimation depend strongly on the selected sample. In this paper, we have chosen a relatively broad sample to test the overall performance in the CSST wide-field imaging survey. For a differently selected sample designed to answer a more specific scientific question, the results could be different.
Finally, in addition to multi-band photometry, we may need more information from other forms of data, such as slitless spectroscopy and morphological parameters from imaging, to break the severe parameter degeneracies. More advanced methods may be able to exploit all of this information to give better redshift and/or stellar population parameter estimates of galaxies. These will also be the subjects of future work.

Figure 1. The joint distributions of redshift and physical parameters of the empirical statistics-based mock galaxy population produced with BayeSED combined with SED models of different complexity. With the same set of empirical statistics, different SED models lead to slightly different redshift and age distributions, which is likely due to the different mappings from free parameters to derived parameters. Meanwhile, only the two SED models with a quenching component produce a clear region of quiescent galaxies below the star-forming main sequence.

Figure 2. (a) Response functions for CSST and Euclid bands. (b) The modeled relation between magnitude and magnitude error for CSST bands. Sources with SNR < 1 (i.e., $\sigma_{m,i} > 1.08574$) are considered non-detections. A non-detection in band $i$ with $N\sigma$ flux limit $F_{{\rm lim},i}$ and magnitude limit $m_{{\rm lim},i}$ is represented with $F_i = F_{{\rm lim},i}$ and $\sigma_{F,i} = -F_{{\rm lim},i}/N$ (flux case), or $m_i = m_{{\rm lim},i}$ and $\sigma_{m,i} = -1.08574/N$ (magnitude case). These conventions ensure consistent conversion between flux and magnitude data in the BayeSED input file. (c), (d): Same as (a), (b), but for COSMOS bands. For clarity, the twelve intermediate bands (IBs) and two narrow bands (NBs) are not shown.
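The non-detection convention of Figure 2 can be checked by converting between the two representations. The sketch below is a minimal illustration, not BayeSED's input reader: it assumes AB magnitudes and fluxes in μJy (zero point 23.9), and the function and constant names are hypothetical. Because $\sigma_m = (2.5/\ln 10)\,\sigma_F/F$, the flux-case flag $\sigma_{F,i} = -F_{{\rm lim},i}/N$ maps exactly onto the magnitude-case flag $\sigma_{m,i} = -1.08574/N$.

```python
import math

# 2.5 / ln(10) ~= 1.08574 converts a fractional flux error into a magnitude error,
# so sigma_m ~= 1.08574 / SNR and SNR < 1 corresponds to sigma_m > 1.08574.
MAG_ERR_FACTOR = 2.5 / math.log(10.0)

# Hypothetical choice: fluxes in microJansky, for which the AB zero point is 23.9.
AB_ZP_UJY = 23.9


def flux_to_mag(flux, flux_err, zero_point=AB_ZP_UJY):
    """Flux -> AB magnitude. A negative flux_err flags a non-detection
    (flux = N-sigma limit, flux_err = -F_lim / N); the sign is carried over,
    giving mag = m_lim and mag_err = -1.08574 / N."""
    mag = -2.5 * math.log10(flux) + zero_point
    mag_err = MAG_ERR_FACTOR * flux_err / flux
    return mag, mag_err


def mag_to_flux(mag, mag_err, zero_point=AB_ZP_UJY):
    """Inverse of flux_to_mag, preserving the negative-error flag."""
    flux = 10.0 ** (-0.4 * (mag - zero_point))
    flux_err = flux * mag_err / MAG_ERR_FACTOR
    return flux, flux_err


def is_nondetection(mag_err):
    """SNR < 1 criterion from the caption: sigma_m > 1.08574."""
    return mag_err > MAG_ERR_FACTOR


# A 5-sigma limit of 1 microJy stored in flux form, then converted to magnitudes
mag, mag_err = flux_to_mag(1.0, -1.0 / 5)
print(mag, mag_err)
```

Here the limit becomes $m = 23.9$ with $\sigma_m = -1.08574/5$, and converting back recovers $F = 1$ μJy and $\sigma_F = -0.2$ μJy, so the negative-error flag survives the round trip.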

Figure 3. Comparison of the redshift and magnitude distributions of the empirical statistics-based and the Horizon-AGN hydrodynamical simulation-based mock galaxy samples.
Figure 4. Performance test with the empirical statistics-based mock galaxy sample for photometric redshift estimation of galaxies in the CSST wide-field multiband imaging survey. (a) Results for only the simplest SED model (SFH=τ,-CEH,DAL=Cal+00) employed in this work. We consider six different choices of efr (target sampling efficiency), shown by different symbols. For each efr, we consider eight cases with the number of live points (nlive), which determines the effective sampling resolution, equal to 10, 15, 20, 25, 50, 100, 200 and 400. The relations between the computation time (in sec/obj/cpu, using one core of a 2.2 GHz CPU) and the performance metrics σNMAD (top panels), BIA (middle panels) and OLF (bottom panels) are shown for the cases with (left panels) and without (right panels) observational noise in the mock data. In general, larger nlive and smaller efr lead to better redshift estimation, at the cost of longer running time. (b) Results for six different SED models with increasing complexity. Only the results with efr = 0.1 are shown. In general, more complicated SED models require longer running time. They also lead to worse photometric redshift estimation, which is likely due to more severe parameter degeneracies. Indeed, for the last four, more complicated SED models with the DAL of Salim et al. (2018), the number of free parameters is greater than the number of photometric data points (7 for the CSST imaging survey), as shown in Table 1.

Figure 5. As in Figure 4, but for stellar mass estimation. The quality of stellar mass estimation, in terms of σNMAD, BIA and OLF, is worse than that of redshift estimation and more sensitive to the choice of SED model. In the case with noise, for the most complicated SED model used in this work (SFH=α-β-τ-r,+CEH,DAL=Sal+18), the σNMAD, bias and OLF of stellar mass estimation increase significantly when nlive > 100. This is a clear indication of overfitting to the noise in the data. In general, more complicated SED models lead to worse stellar mass estimation.

Figure 7. Performance test of Bayesian evidence estimation with the empirical statistics-based mock galaxy sample for the CSST imaging survey. The relations between the computation time (in sec/obj, using a single 2.2 GHz CPU core) and the natural logarithm of the Bayesian evidence (as computed with Equation 31) and its error (bottom panels) for the whole galaxy sample are shown. Two versions of the Bayesian evidence, with (top panels) and without (middle panels) importance sampling, are shown. (a) Results for only the simplest SED model (SFH=τ,-CEH,DAL=Cal+00) employed in this work. We consider six different choices of efr (target sampling efficiency), shown by different symbols. Results for seven cases with nlive equal to 15, 20, 25, 50, 100, 200 and 400 are shown. The left panels show the results with noisy data, while the right panels show the results with noise-free data. (b) Results for six different SED models with increasing complexity. Only the results with efr = 0.1 are shown. The Bayesian evidence clearly decreases with increasing SED model complexity.
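The dependence of the evidence error on nlive in Figure 7 can be reproduced qualitatively with a toy nested sampler. The sketch below is not MultiNest: it is a minimal, plain nested-sampling loop (Skilling's algorithm) that replaces the worst live point by rejection sampling from the prior, so it has no analogue of efr. The one-dimensional Gaussian likelihood with a U(0, 1) prior is a hypothetical test problem whose true ln Z is ≈ 0. Since each iteration shrinks the prior volume by a factor of about exp(−1/nlive), the scatter of ln Z across runs falls roughly as nlive^(−1/2), while the number of iterations grows linearly with nlive.

```python
import numpy as np


def log_likelihood(theta, mu=0.5, sigma=0.05):
    """Hypothetical normalised 1D Gaussian likelihood; with a U(0, 1) prior, ln Z ~= 0."""
    return -0.5 * ((theta - mu) / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))


def nested_sampling_logz(nlive, rng, tol=1e-3, batch=1024):
    """Plain nested sampling with rejection sampling from the prior (not MultiNest)."""
    live = rng.uniform(0.0, 1.0, nlive)          # initial live points drawn from the prior
    live_logl = log_likelihood(live)
    logz, log_x, i = -np.inf, 0.0, 0             # running ln Z and ln X_0 = ln 1
    # Continue while the live points could still add more than a fraction tol to Z
    while live_logl.max() + log_x >= logz + np.log(tol):
        i += 1
        worst = np.argmin(live_logl)
        logl_min = live_logl[worst]
        log_x_new = -i / nlive                   # expected shrinkage X_i = exp(-i / nlive)
        log_w = np.log(np.exp(log_x) - np.exp(log_x_new))  # shell width X_{i-1} - X_i
        logz = np.logaddexp(logz, logl_min + log_w)
        log_x = log_x_new
        # Replace the worst point by a prior draw with L > L_min (rejection sampling)
        while True:
            cand = rng.uniform(0.0, 1.0, batch)
            cand_logl = log_likelihood(cand)
            ok = np.flatnonzero(cand_logl > logl_min)
            if ok.size:
                live[worst], live_logl[worst] = cand[ok[0]], cand_logl[ok[0]]
                break
    # Remaining live points each carry prior volume X_i / nlive
    return np.logaddexp(logz, log_x + np.logaddexp.reduce(live_logl) - np.log(nlive))


rng = np.random.default_rng(1)
for nlive in (25, 100, 400):
    runs = [nested_sampling_logz(nlive, rng) for _ in range(5)]
    print(f"nlive={nlive:4d}  mean ln Z={np.mean(runs):+.3f}  scatter={np.std(runs):.3f}")
```

As in Figure 7, a larger nlive buys a smaller evidence error at the cost of proportionally more likelihood evaluations; MultiNest's ellipsoidal sampling (controlled by efr) mainly reduces the cost of each replacement step rather than this statistical error.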

Figure 9 .Figure 10 .
Figure 8. Photometric redshift estimation with (right panels) and without (left panels) noise, for the analysis of the empirical statistics-based (top panels) and hydrodynamical simulation-based (bottom panels) mock galaxy samples. The error from imperfect SED modeling is present only in the analysis of the hydrodynamical simulation-based mock galaxy sample. The photometric redshifts ($z_{\rm phot}$) are estimated using the τ model of SFH without metallicity evolution and the Calzetti et al. (2000) DAL. The red solid line indicates the identity, while the red dotted lines indicate the outlier limits, i.e., $|z_{\rm phot} - z_{\rm true}|/(1 + z_{\rm true}) > 0.15$. The results shown are obtained with the MultiNest runtime parameters efr = 0.1 and nlive = 50. In general, observational noise is the more important source of error for photometric redshift estimation of galaxies, although the contribution from imperfect SED modeling is also very important.
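For reference, the three redshift metrics used throughout these figures can be computed as below. This is a sketch using commonly adopted definitions; the exact definitions used in this paper (given in an earlier section not reproduced here), for example whether the bias is a median or a mean, may differ, and the function name is hypothetical. Only the 0.15 outlier threshold is taken from the Figure 8 caption.

```python
import numpy as np


def photoz_metrics(z_phot, z_true, outlier_cut=0.15):
    """Return (sigma_NMAD, bias, outlier fraction) for photometric redshifts.

    Assumed definitions: dz = (z_phot - z_true) / (1 + z_true);
    sigma_NMAD = 1.48 * median(|dz - median(dz)|); bias = median(dz);
    an outlier has |dz| > outlier_cut (0.15, as in the Figure 8 caption)."""
    z_phot = np.asarray(z_phot, dtype=float)
    z_true = np.asarray(z_true, dtype=float)
    dz = (z_phot - z_true) / (1.0 + z_true)
    sigma_nmad = 1.48 * np.median(np.abs(dz - np.median(dz)))
    bias = np.median(dz)
    olf = np.mean(np.abs(dz) > outlier_cut)
    return sigma_nmad, bias, olf


# Four objects at z_true = 1; one catastrophic outlier at z_phot = 2
sigma_nmad, bias, olf = photoz_metrics([1.0, 1.0, 1.0, 2.0], [1.0, 1.0, 1.0, 1.0])
print(sigma_nmad, bias, olf)
```

In this example σNMAD and the bias are both 0 while OLF is 0.25: the median-based statistics ignore the single outlier, which is why the outlier fraction is reported as a separate metric.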

Figure 12 .Figure 13 .Figure 14 .
Figure 12. Photometric redshift estimation from the Bayesian analysis of the hydrodynamical simulation-based mock data for the CSST-like imaging survey, using the two SED models with the largest Bayesian evidence. (a) Compared with the results in panel c of Figure 8, with the additional consideration of metallicity evolution and the adoption of the DAL of Salim et al. (2018), the σNMAD of photometric redshift estimation is clearly reduced, although the bias and OLF are slightly increased. Meanwhile, the systematic patterns in the former results are also largely reduced. (c) Additionally employing the more complicated β-τ form of SFH reduces the σNMAD of photometric redshift estimation only slightly, while the bias and OLF are exactly the same. (b, d) In the case with noise, the best two models give very similar photometric redshift estimates. In addition, there are two clear branches of outliers caused by the misidentification of the Lyman and Balmer break features.
Figure 15. An example of 1D and 2D posterior probability distribution functions of free parameters obtained from the Bayesian analysis of a mock galaxy with CSST-like (grey), CSST+Euclid-like (red) and COSMOS-like (blue) photometric data. The contours show the 1σ, 2σ and 3σ confidence regions, while the red dashed lines show the ground-truth values of each parameter. The parameters are clearly more tightly constrained, and some degeneracies between them are broken, when datasets with increasing discriminative power are used.

Figure 17 .Figure 18 .Figure 19 .
Figure 17. Photometric redshift estimation from the Bayesian analysis of the hydrodynamical simulation-based mock data for the CSST+Euclid-like survey, using the two SED models with the largest Bayesian evidence. Compared with the results for the CSST-like survey in Figure 12, the quality of photometric redshift estimation is clearly improved, both with and without noise. In particular, in the more realistic case with noise, the outliers caused by the misidentification of the Lyman and Balmer break features are largely reduced. This suggests that the inclusion of the Euclid J, H and Y bands helps improve photometric redshift estimation. However, the more complicated β-τ form of SFH does not substantially improve photometric redshift estimation.
Figure 20. The Bayesian evidences (BE), maximum likelihoods (ML) and metrics of photometric redshift (red star), stellar mass (blue square) and SFR (green circle) estimation from the Bayesian analysis of the hydrodynamical simulation-based mock galaxy sample for the COSMOS-like survey, using six SED models with increasing complexity in the forms of SFH and DAL (0: "SFH=τ,-CEH,DAL=Cal+00", 1: "SFH=τ,+CEH,DAL=Cal+00", 2: "SFH=τ,+CEH,DAL=Sal+18", 3: "SFH=β-τ,+CEH,DAL=Sal+18", 4: "SFH=β-τ-r,+CEH,DAL=Sal+18", 5: "SFH=α-β-τ-r,+CEH,DAL=Sal+18"), for the cases with (bottom panels) and without (top panels) noise. In the case without noise, the simplest model "SFH=τ,-CEH,DAL=Cal+00" has the lowest Bayesian evidence, ln(BE) = −1710604 ± 6811. Meanwhile, the SED model "SFH=β-τ,+CEH,DAL=Sal+18", which is neither the simplest nor the most complex, has the largest Bayesian evidence, ln(BE) = 387009 ± 7273, and gives the highest-quality parameter estimates. As for the CSST-like and CSST+Euclid-like surveys, model selection with the maximum likelihood (or, equivalently, the minimum $\chi^2$) leads to similar results. In the case with noise, the simplest model "SFH=τ,-CEH,DAL=Cal+00" still has the lowest Bayesian evidence, ln(BE) = 99881 ± 4881. Meanwhile, the SED models "SFH=τ,+CEH,DAL=Sal+18" and "SFH=β-τ,+CEH,DAL=Sal+18" have the largest Bayesian evidences, ln(BE) = 131578 ± 5115 and ln(BE) = 130766 ± 5064, which are consistent within the errors. Unlike for the CSST-like and CSST+Euclid-like surveys, model selection with the maximum likelihood (or, equivalently, the minimum $\chi^2$) leads to similar results. The two models with the largest Bayesian evidences (or maximum likelihoods) also give the highest-quality parameter estimates. In general, consistent and clearer model comparison results are obtained both with and without noise. Meanwhile, in the more realistic case with noise, more complicated SED models are more strongly favored than for the CSST-like and CSST+Euclid-like surveys. All of these are natural consequences of the much stronger discriminative power of the COSMOS-like survey compared with the CSST-like and CSST+Euclid-like surveys.

Figure 21 .Figure 22 .Figure 23 .
Figure 21. Photometric redshift estimation from the Bayesian analysis of the hydrodynamical simulation-based mock data for the COSMOS-like survey, using the two SED models with the largest Bayesian evidence. Compared with the results for the CSST-like survey in Figure 12 and for the CSST+Euclid-like survey in Figure 17, the quality of photometric redshift estimation is significantly improved, both with and without noise. In both cases, the best two SED models give identical quality of photometric redshift estimation. In the case without noise, the bias and OLF of photometric redshift estimation are almost zero, while σNMAD is only 0.002. Since the errors from parameter degeneracies and SED modeling should be largely reduced in this case, this suggests that the contribution of errors from the stochastic nature of the MultiNest sampling algorithm and other potential errors in the BayeSED code should be less than 0.002. In the more realistic case with noise, the outliers caused by the misidentification of the Lyman and Balmer break features are largely resolved.

Table 1. Summary of SED models, parameters and priors.