Neural Stellar Population Synthesis Emulator for the DESI PROVABGS

The Probabilistic Value-Added Bright Galaxy Survey (PROVABGS) catalog will provide the posterior distributions of physical properties of $>10$ million DESI Bright Galaxy Survey (BGS) galaxies. Each posterior distribution will be inferred from joint Bayesian modeling of observed photometry and spectroscopy using Markov Chain Monte Carlo sampling and the [arXiv:2202.01809] stellar population synthesis (SPS) model. To make this computationally feasible, PROVABGS will use a neural emulator for the SPS model to accelerate the posterior inference. In this work, we present how we construct the emulator using the [arXiv:1911.11778] approach and verify that it can be used to accurately infer galaxy properties. We confirm that the emulator is in excellent agreement with the original SPS model with $\ll 1\%$ error and is $100\times$ faster. In addition, we demonstrate that the posteriors of galaxy properties derived using the emulator are also in excellent agreement with those inferred using the original model. The neural emulator presented in this work is essential in bypassing the computational challenge posed in constructing the PROVABGS catalog. Furthermore, it demonstrates the advantages of emulation for scaling sophisticated analyses to millions of galaxies.


INTRODUCTION
Over the next decade, spectroscopic galaxy surveys such as the Dark Energy Spectroscopic Instrument (DESI; DESI Collaboration et al. 2016;Abareshi et al. 2022), the Prime Focus Spectrograph (PFS; Takada et al. 2014), William Herschel Telescope Enhanced Area Velocity Explorer (WEAVE; Dalton et al. 2012), and the 4-metre Multi-Object Spectroscopic Telescope (4MOST; de Jong et al. 2019) will measure the spectra of millions of galaxies. They will provide statistically powerful samples of galaxies at higher redshifts, low stellar mass ranges, and other less explored regimes. The DESI Bright Galaxy Survey (BGS; Hahn et al. 2022b), for instance, will provide a magnitude-limited galaxy sample out to z ∼ 0.6 and probe dwarf galaxies down to ∼ 10 7 M in the local universe. From the spectra of galaxies in these new regimes, we can measure their detailed physical properties and shed light on their formation and evolution.
We currently measure the physical properties of a galaxy, such as its stellar mass (M * ), star formation rate (SFR), metallicity (Z), and age (t age ), by modeling its spectral energy distribution (SED), or spectra, using stellar population synthesis (SPS; e.g. Bruzual & Charlot 2003;Maraston 2005;Conroy et al. 2009, see Walcher et al. 2011Conroy 2013 for a comprehensive review). Theoretical SPS models are compared to observed spectra using Bayesian inference to accurately quantify parameter uncertainties and degeneracies (Acquaviva et al. 2011;Chevallard & Charlot 2016;Leja et al. 2017;Carnall et al. 2018;Johnson et al. 2021). The Bayesian approach enables marginalization over nuisance parameters that model the effects of observational systematics (e.g. flux calibration). Posteriors of galaxies in a population can also be combined using a Bayesian hierarchical approach for more accurate population inference (e.g. jameskwon@ucsb.edu arXiv:2209.14323v2 [astro-ph.GA] 24 Mar 2023 Leja et al. 2021;Alsing et al. 2022;Leistedt et al. 2022). However, one major limitation of current Bayesian SED modeling methods is the computational resources that they require. Current methods, which use Markov Chain Monte Carlo (MCMC) techniques to sample the posterior, take 10-100 CPU hours per galaxy (e.g. Carnall et al. 2019;Tacchella et al. 2021). This makes it prohibitive to apply state-of-the-art galaxy SED modeling to the next-generation galaxy surveys. Alsing et al. (2020) recently demonstrated that neural emulators can be used to accelerate SED model evaluations by three to four orders of magnitude. With the speed-up of neural emulators, posterior inference can be reduced from hours to minutes per galaxy. In this work, we present a neural emulator specifically designed for the SPS model that will be used to construct the PRObabilistic Value-Added Bright Galaxy Survey (Hahn et al. 2022a, PROVABGS 1 ) catalog. PROVABGS will provide the posterior distributions of physical properties of > 10 million DESI BGS galaxies from joint SED modeling of their optical spectrophotometry. Its SED model is based on an SPS model with a nonparametric star formation history (SFH), a non-parametric metallicity history (ZH) that varies with time, and a flexible dust attenuation prescription. The goal of this work is to demonstrate that our emulator accurately reproduces the SED model (with 1% errors) as well as the posterior on galaxy properties. We begin in Section 2 with a brief description of the PROVABGS SED model. We then present how we design and train our neural emulator in Section 3. In Section 4, we validate the emulator.

STELLAR POPULATION SYNTHESIS
The PROVABGS catalog will use the state-of-the-art Hahn et al. (2022a) SPS model to infer the posteriors of the physical properties of BGS galaxies. In this section, we briefly introduce the PROVABGS SPS model that we emulate.
In the PROVABGS SPS model, a galaxy SED is modeled as a combination of the single stellar populations (SSPs), a group of stars of the same age and metallicity, formed over the lifetime of the galaxy.
The PROVABGS SPS model uses non-parametric prescriptions for its SFH and ZH. They are treated as linear combinations of basis functions derived from applying non-negative matrix factorization (NMF; Lee & Seung 1999;Cichocki & Phan 2009;Févotte & Idier 2011) to the SFHs and ZHs of simulated galaxies in the Illustris cosmological hydrodynamic simulation (Vogelsberger et al. 2014;Nelson et al. 2015). For SFH, the model uses four NMF bases with coefficients, β 1 , β 2 , β 3 , β 4 , and include an SSP starburst component parameterized with f burst and t burst for additional stochasticity. f burst denotes the fraction of the total stellar mass formed during the starburst and t burst is the time of the starburst event. For the ZH that varies with time, the model uses two NMF bases with coefficients γ 1 and γ 2 .
From the SFH and ZH, we calculate the rest-frame luminosity of a galaxy. The SFH and ZH are binned into logarithmically-spaced lookback time (t lookback ) that are truncated at the age of the galaxy. Then, for each t lookback bin, the luminosity of an SSP is computed, based on the metallicity in the bin, using Flexible Stellar Population Synthesis 2,3 (hereafter FSPS; Conroy et al. 2009;Conroy & Gunn 2010) with MIST isochrones (Paxton et al. 2011(Paxton et al. , 2013(Paxton et al. , 2015Choi et al. 2016;Dotter 2016), the Chabrier (2003) initial mass function, MILES stellar library (Sánchez-Blázquez et al. 2006) for the wavelengths between 3800-7100Å and BaSeL library (Lejeune et al. 1997(Lejeune et al. , 1998Westera et al. 2002) outside of that range. The SSP luminosity is attenuated using the two-component dust attenuation by Charlot & Fall (2000). SSPs younger than 100 Myr are first attenuated by birth clouds. Afterward, all SSPs are attenuated by the Kriek & Conroy (2013) attenuation curve. The SSP luminosities are then normalized by the total stellar mass formed in their t lookback bins and then combined to obtain the dust-attenuated SED.
In summary, apart from nuisance parameters, the PROVABGS SPS model has 12 model parameters: total stellar mass, four NMF SFH basis coefficients, two starburst components, two NMF ZH basis coefficients, and three dust attenuation parameters. For the full list of the PROVABGS model parameters and more details on the PROVABGS SPS model, we refer readers to Hahn et al. (2022a).

PROVABGS EMULATOR
The PROVABGS catalog will infer > 10 million posteriors using MCMC sampling. As we later demonstrate, accurately estimating the posterior requires 120,000 SPS model evaluations, given the dimensionality of our SPS model. If we evaluate the PROVABGS SPS model using FSPS, it requires > 10 CPU hours per galaxy and makes PROVABGS computationally infeasible (we will refer to the PROVABGS SPS model evaluated using FSPS as PROVABGS-FSPS). We, therefore, construct a neural emulator for the SPS model that will accelerate the model evaluations. In this section, we describe how we construct the emulator. We follow the neural emulation framework from Alsing et al. (2020). The emulator consists of a neural network that takes SPS model parameters as input and predicts coefficients of a fixed set of N PCA Principal Component Analysis (PCA) basis vectors for galaxy SEDs. The PCA coefficients and basis vectors are then linearly combined to predict the model SED as a function of wavelength. In Alsing et al. (2020), they use a single emulator over the full SED wavelength coverage. In this work, however, we divide the wavelength coverage into four contiguous ranges, 2000-3600, 3600-5500, 5500-7410, and 7410-60000Å, to achieve higher accuracy. The wavelength ranges have 127, 2109, 2113, and 549 spectral bins, respectively. We also use separate emulators for the NMF and burst contributions to the SFH. In total, we use 8 emulators. For each emulator, we use N PCA = {50, 50, 50, 30} PCA components, which we determined through experimentation to achieve 1% PCA reconstruction error throughout the entire wavelength. For the neural networks, we use the same fully-connected architecture and activation function as Alsing et al. (2020). We determine the number of hidden layers and units through experimentation: 8 hidden layers and 256 units for the NMF emulators and 6 hidden layers and 512 units for the burst emulators. Our architecture is deeper than the emulator in Alsing et al. (2020), because we have a more complex SPS model.
To train the emulators, we sample 1 million parameters from the priors in Table 1 of Hahn et al. (2022a) and use PROVABGS-FSPS to generate a model SED for each parameter. We then split the dataset into a training and validation set using a 90/10 split. From the training set, we derive the N PCA basis vectors and use them to transform the model SEDs to PCA coefficients. Before training, the weights of the neural network are randomly initialized from a normal distribution and biases are initially set to zero. Afterward, we train the neural network using the mean squared error between the predicted and true PCA coefficients as the loss function with the Adam optimizer (Kingma & Ba 2014). The initial learning rate is set to 10 −3 . If the loss evaluated on the validation set does not improve after 20 consecutive epochs, the learning rate is gradually decreased following the sequence: 10 −3 , 5 × 10 −4 , 10 −4 , . . . , 5 × 10 −6 , 10 −6 . After the learning rate reaches 10 −6 , we terminate the training when the validation loss does not improve for 20 consecutive epochs.

Validating Emulator SED
In this section, we demonstrate that the emulator accurately reproduces the SEDs computed by PROVABGS-FSPS. To test the emulator, we sample 100,000 parameters from the priors and calculate their SEDs using the emulator and PROVABGS-FSPS. Then, we compute the fractional difference of the SEDs: (f emul. − f fsps )/f fsps , where f emul. and f fsps denotes the emulator and PROVABGS-FSPS SED fluxes, respectively. In Figure 1, we present the fractional errors of the emulator in each of the four wavelength bins. We mark the percentiles of the emulator reconstruction error with different shades (68, 95, 99, 99.9%, from the darkest to lightest). For ∼ 95% of the test SEDs, the fractional error is consistently within < 1% in all wavelength bins. In fact, the SEDs from the emulator and PROVABGS-FSPS agree to a sub-percent level for 99.9% of the test SEDs in all but the first wavelength bin. One may be concerned regarding relatively higher fractional errors at λ < 3000Å; however, we demonstrate in Section 4.2 that this does not significantly impact the accuracy of the posteriors because BGS spectra have lower signal-to-noise ratios (SNRs) near-UV wavelengths (Hahn et al. 2022b). Also, SPS modeling has larger uncertainties in the near-UV due to factors such as metallicities, stellar age, and dust (Conroy 2013;Johnson et al. 2021), which alleviates potential concerns. We also note that since we train separate emulators for the different wavelength bins, the SEDs predicted by our emulator can be discontinuous. However, given the accuracy of our emulator, this discontinuity does not have a significant impact. As an additional illustration of the emulator's accuracy, we present the cumulative distribution of the absolute fractional error averaged over the wavelength 2400 < λ < 10000Å in Figure 2. The shaded regions correspond to the same percentiles as Figure 1. The emulator produces the mean fractional error 1% on > 99.9% of the test SEDs.

Validating Emulator Posterior
Next, we validate the posteriors of galaxy properties derived using the PROVABGS emulator against posteriors derived using PROVABGS-FSPS. We infer the posteriors from synthetic observations of BGS-like spectra from Hahn et al. (2022a). The synthetic spectra are constructed using ∼ 2,000 mock galaxies from the L-galaxies semi-analytic galaxy formation model (Lgal, hereafter; Henriques et al. 2015). They are calculated with FSPS using the SFH and ZH of the simulated galaxies. Then, velocity dispersion, dust attenuation, realistic noise model, and DESI BGS instrumental effect are applied to the spectra to produce realistic DESI-like spectra. We refer readers to Hahn et al. (2022a) for the details about the forward modeling. Out of the ∼ 2,000 Lgal galaxies, we select a representative subset of 93 galaxies. In Figure 3, we present the distribution of the physical properties of these galaxies (red). We focus on the following galaxy properties: total stellar mass (M * ), star formation rate averaged over past 1 Gyr (SFR 1Gyr ), mass-weighted metallicity (Z MW ), and mass-weighted age (t age,MW ). The galaxy properties are computed in the same way as in Hahn et al. (2022a). For reference, we present the galaxy property distribution of the full Lgal samples Figure 3. The distribution of the physical properties of 93 Lgal simulated galaxies that we use for our emulator versus PROVABGS-FSPS posterior comparison (red). We focus on the total stellar mass, average star formation rate over 1 Gyr, mass-weighted metallicity, and mass-weighted stellar age. The black contours mark 68 and 95 percentiles of each property for the entire Lgal samples. The selected Lgal galaxies are a representative sample of the full Lgal sample. We infer the posteriors of galaxy properties using both our emulator and the PROVABGS-FSPS from realistic BGS-like spectra constructed from these simulated galaxies.
in black contours and mark the 68 and 95th percentiles. The selected 93 samples span the full distribution of Lgal galaxy properties. We infer the posteriors of the 93 simulated galaxies from their spectra using a Gaussian likelihood, priors from Hahn et al. (2022a), and MCMC sampling. For our MCMC sampler, we use the Zeus ensemble slice sampler (Karamanis & Beutler 2020;Karamanis et al. 2021). Using PROVABGS-FSPS to compute SEDs, we initialize 30 walkers. We discard the first 1000 iterations as burn-in for 89 of the galaxies. For the other four galaxies, we use 6000 iterations as burn-in due to inaccurate initialization. The 3000 iterations after burn-in are used as samples of the posterior distributions. To infer the posteriors using the emulator, we initialize the walkers at the same positions as the walkers in the PROVABGS-FSPS chains after burn-in. This is to remove any impact of initialization when we later compare the two sets of posterior distributions. We confirm that our posterior samples converge within 3,000 iterations after burn-in using visual inspection of the distributions of walkers and by examining the autocorrelation time. For the latter, we use integrated autocorrelation time (IAT), which is the number of iterations needed to sample two points that are independent of each other. If N iteration > Kτ /N walkers , where τ is the converged IAT estimate and K is a constant, the walkers sampled a sufficient number of independent points to capture posterior distributions. We take K ∼ 50 and calculate the mean of τ for each walker at every 50 iterations following Goodman & Weare (2010). With N walkers = 30, we obtain τ ≈ 1000. Hence, N iteration ≈ 3000 is sufficient.
Next, we compare the posteriors inferred using the emulator and PROVABGS-FSPS. From the posteriors of SPS parameters, we derive posteriors of galaxy properties M * , SFR 1Gyr , Z MW , and t age,MW . In Figure 4, we present the emulator (red) and PROVABGS-FSPS (black) posteriors inferred from the same simulated galaxy. The contours represent the 68, 95, and 99.7 percentiles. The medians of the marginalized 1D posteriors are marked with vertical dashed lines in the histograms. The posteriors derived using the emulator and PROVABGS-FSPS are in excellent agreement. We also compare the medians and 1σ uncertainties of the 1D marginalized posteriors from the emulator and PROVABGS-FSPS for the selected LGal galaxies in Figure 5. From the leftmost panel, we compare log M * , log SFR 1Gyr , log Z MW , log t age,MW . There is excellent agreement on all 93 Lgal galaxies in the posterior medians and uncertainties: we find < 0.02σ discrepancies between the medians of the emulator and PROVABGS-FSPS posteriors.
As the main motivation for the emulator to evaluate the PROVABGS SPS model faster, we profile the computational advantage of the emulator. We randomly sample 1,000 test parameters from priors and profile the elapsed time for the emulator and PROVABGS-FSPS to calculate all 1,000 SEDs. On a 2.6 GHz 6-Core Intel Core i7 CPU, the emulator took an average of 2.3 ms per evaluation, while PROVABGS-FSPS took 240 ms per evaluation. Thus, to sample a posterior for a single galaxy, which requires ∼ 120,000 model evaluations, the emulator takes ∼ 4 minutes while PROVABGS-FSPS takes ∼ 10 hours. The emulator is > 100× faster than the original SPS model.

CONCLUSION
We presented the neural emulator for Hahn et al. (2022a) SPS model that will be used to construct the DESI PROVABGS catalog. The emulator is constructed using the neural emulation framework of Alsing et al. (2020). It consists of a fully-connected neural network that takes SPS model parameters as input and predicts PCA coefficients, which are combined with PCA basis vectors to predict the model SED. The neural network and PCA basis vectors are trained using a training set of 1 million SPS parameters and model SEDs. We use a similar neural network as the emulators in Alsing et al. (2020) and determine their architecture (number of hidden layers and units) through experimentation.
We validate the emulator using a test dataset of 100,000 model SEDs and confirm that the emulator reconstructs the SEDs computed using PROVABGS-FSPS with 1% error for λ > 3000Å. The emulator reproduces the original SPS model with mean absolute fraction errors of < 0.005 on 99.9% of the test SEDs. As further validation of our emulator, we compare posteriors of galaxy properties derived using the emulator and PROVABGS-FSPS. The posteriors are inferred from Bayesian SED modeling of synthetic BGS-like spectra constructed from 93 simulated galaxies in the Lgalaxies semi-analytic model. Based on autocorrelation time analysis, the MCMC sampling converges within 3,000 iterations after burn-in. We find excellent agreement between the emulator and PROVABGS-FSPS posteriors. The medians of the posteriors deviate by < 0.02σ and are in good agreement within the uncertainties. We also compare the computational cost of the emulator and PROVABGS-FSPS. The emulator takes 2-3 ms to model a single SED while the PROVABGS-FSPS requires 300-400 ms. To derive a posterior for a single galaxy, our emulator requires 100× less CPU time, from 10-13 hours to 4-6 minutes. Additional speed-up will be achievable using GPUs (e.g. Alsing et al. 2022) and with more parallel MCMC methods (e.g. affine 4 ). For reproducibility, the codebase for the training, validation, and benchmarking of the emulator is publicly available. 5 Our PROVABGS emulator is an accurate and 100× faster alternative to direct SPS computation using PROVABGS-FSPS. It will be essential for constructing PROVABGS and providing posteriors of galaxy properties for > 10 million BGS galaxies. More broadly, neural emulation offers a way to scale rigorous Bayesian inference to the tens of millions of galaxies that will be observed by upcoming galaxy surveys (e.g. DESI, PFS, WEAVE, 4MOST, and Rubin). Applications of neural emulation, such as the one presented in this work, will enable future galaxy studies to combine the most rigorous and accurate measurements of galaxy properties with the unprecedented statistical power of upcoming observations.