Reconstructing Lyα Fields from Low-resolution Hydrodynamical Simulations with Deep Learning

Hydrodynamical cosmological simulations are a powerful tool for accurately predicting the properties of the intergalactic medium (IGM) and for producing mock skies that can be compared against observational data. However, the need to resolve density fluctuations in the IGM puts a stringent requirement on the resolution of such simulations, which in turn limits the volumes that can be modeled, even on the most powerful supercomputers. In this work, we present a novel modeling method that combines physics-driven simulations with data-driven generative neural networks to produce outputs that are qualitatively and statistically close to the outputs of hydrodynamical simulations employing eight times higher resolution. We show that the Lyα flux field, as well as the underlying hydrodynamic fields, has greatly improved statistical fidelity over a low-resolution simulation. Importantly, the design of our neural network allows for sampling multiple realizations from a given input, enabling us to quantify the model uncertainty. Using test data, we demonstrate that this model uncertainty correlates well with the true error of the Lyα flux prediction. Ultimately, our approach allows for training on small simulation volumes and applying the model to much larger ones, opening the door to producing accurate Lyα mock skies in volumes of Hubble size, as will be probed with DESI and future spectroscopic sky surveys.


Introduction
The neutral hydrogen in the intergalactic medium imprints a characteristic pattern in the absorption spectra of quasars, known as the "Lyα Forest." The Lyα forest is a powerful probe of the thermal state of the intergalactic medium, and also of cosmology. The distribution of the Lyα forest lines depends on the distribution and properties of the intervening gas, which is affected by the background of UV photons and the history of cosmological structure formation (for a recent review, see McQuinn 2016).
However, the real potential of the Lyα forest lies in using the full shapes of three-dimensional power spectra (P3D; McDonald 2003). This probe combines cosmological information from both large and small scales, radically expands the number of available modes, and breaks some of the parameter degeneracies by capturing correlations both along and perpendicular to the line of sight (see, for example, Rorai et al. 2017). This in turn offers the possibility of significantly improving the constraining power on cosmological parameters compared to BAO or P1D. The measurement is difficult and has not been performed so far, because it requires both surveying large volumes and having large numbers of lines of sight. However, the ongoing Dark Energy Spectroscopic Instrument (DESI Collaboration et al. 2016) will be the first survey to enable a measurement of P3D (Font-Ribera et al. 2018), thanks to its quasar density of ∼60 QSO deg⁻² (Chaussidon et al. 2023), approximately three times that of previous surveys (e.g., BOSS and eBOSS). In the future, the Stage V Spectroscopic Facility, which is currently being scoped by the cosmological community (Chou et al. 2022), will further improve the precision of such a measurement.
Inferring parameters from P3D requires not only precise observations but also accurate theoretical modeling, which is difficult because there is no analytic solution for the small-scale evolution of the (baryon) density fluctuations. In order to precisely model the state of the IGM, it is instead necessary to approach the problem numerically. In this respect, hydrodynamic cosmological simulations have led to a consistent description of the IGM in the framework of large-scale structure formation (Cen et al. 1994). However, such simulations are computationally expensive, as they must retain a high spatial resolution of ∼20 h⁻¹ kpc, which is necessary to resolve density fluctuations in the IGM (Lukić et al. 2015; Walther et al. 2021; Chabanier et al. 2023; Doughty et al. 2023) and to accurately capture the Lyα signal, despite the fact that observatories like DESI measure fluctuations in Lyα flux only at scales of ∼1 h⁻¹ Mpc and larger (Karacaylı et al. 2023; Ravoux et al. 2023). Furthermore, constraints on memory limit the size of volumes that can be run in high-resolution simulations; for example, a 1 h⁻¹ Gpc simulation would require ∼50,000³ computational elements, far beyond the computational resources available in the foreseeable future (we note that this requirement applies to both Eulerian and Lagrangian codes; see Chabanier et al. 2023).
In addition to P3D, large volumes enable cross-correlating Lyα with other tracers of matter fluctuations. Because the forest power spectrum is sensitive to small-scale matter fluctuations, whereas the CMB lensing convergence measures the total matter fluctuation along the line of sight, combining the two (Vallinotto et al. 2009, 2011; Chiang & Slosar 2018; Dash & Guha Sarkar 2021) allows us to probe nonlinear structure formation at large scales in a redshift range still poorly tested. Correlating galaxy positions with their Lyα environments is another promising research avenue (Font-Ribera et al. 2013; Bielby et al. 2017; Mukae et al. 2017; Momose et al. 2021a, 2021b) for tracing matter fluctuations, as galaxy formation occurs in preferentially denser environments, while the forest mostly comes from less dense regions. This synergy between the two can also significantly improve the quality of density and initial-conditions reconstruction (Horowitz et al. 2021).
Recently, machine learning, and more specifically deep learning, has shown remarkable potential to assist in solving highly nonlinear problems like this by serving as a surrogate model for complex physics simulations (for a recent review of many deep-learning applications in cosmology, see Huertas-Company & Lanusse 2023). Methods based on deep learning have been used to replace full simulations (He et al. 2019; Mustafa et al. 2019; Feder et al. 2020; Giusarma et al. 2023), to substitute expensive hydrodynamical solvers by inferring gas properties from N-body simulations (Tröster et al. 2019; Thiele et al. 2020; Dai & Seljak 2021; Bernardini et al. 2022; Harrington et al. 2022; Horowitz et al. 2022; Boonkongkird et al. 2023), or, going even deeper into numerical algorithms, to replace chemical network solvers inside hydro simulations (Fan et al. 2022).
Pertinently, a number of works have employed deep learning to super-resolve the fields from coarse simulations, synthesizing fine-scale details without incurring large computational costs. This has been demonstrated successfully in the realm of N-body simulations and the modeling of dark matter halos and small-scale galaxy formation physics (Li et al. 2021; Ni et al. 2021; Schaurecker et al. 2021). While these results are promising, directly applying super-resolution approaches in hydrodynamic simulations for Lyα poses a number of practical challenges. For one, as mentioned previously, the simultaneous requirement of high resolution (∼20 h⁻¹ kpc) and large volumes (∼1 h⁻¹ Gpc) needed for accuracy in Lyα means that direct super-resolution models would generate overwhelmingly large outputs. This drawback is compounded by the fact that, in order to converge on the Lyα statistics at ∼1 h⁻¹ Mpc scales (roughly the observational limit), the simulations must be run at ∼20 h⁻¹ kpc resolution, even though the hydrodynamic outputs themselves are not necessarily needed on scales finer than ∼1 h⁻¹ Mpc for analysis. It is therefore of interest to identify deep-learning approaches that act similarly to super-resolution, providing reconstructions of the physics from fully resolved simulations, while sidestepping these practical constraints in order to enable deployment in survey-scale cosmological volumes.
In this work, we aim to advance the state of the art in deep-learning-assisted modeling of the Lyα forest and demonstrate a methodology for producing a mock sky at high resolution over volumes comparable to that of DESI. We develop a novel approach combining conditional generative adversarial networks (Goodfellow et al. 2014; Mirza & Osindero 2014) with paired sets of hydrodynamic simulations, which allows us to take cheaper, coarse-resolution simulations and correct their outputs to more closely resemble the results of high-resolution, fully converged simulations that would be infeasible to run on very large volumes. We carefully design our network and training task to ensure our approach is lightweight in terms of compute and memory requirements, while still providing high-fidelity results that greatly improve on the low-resolution simulations in terms of qualitative and statistical fidelity. Furthermore, our model can be sampled repeatedly to produce multiple realizations given a single input, and we find that these provide Monte Carlo estimates of model uncertainty that correlate well with prediction error. Finally, we also explore the application of our approach to modeling not just the Lyα forest but also the underlying hydrodynamic fields of interest, such as baryon density and temperature, demonstrating yet again significant improvements over the low-resolution simulations.
This paper is organized as follows. We first discuss the simulations we use to train our network and the methods used to compute the Lyα forest from them in Section 2. Then we describe the design and novelty of our model architecture and training incentives in Section 3. Our results are presented in Section 4, where we first discuss the accuracy of our model on the hydrodynamic field reconstructions and then the statistical accuracy of the observable Lyα absorption features derived from them. Finally, we present conclusions in Section 5.

Simulations
We assemble our training and testing data from pairs of simulations run with the publicly available Nyx code (Almgren et al. 2013; Sexton et al. 2021). The code follows the evolution of dark matter using collisionless self-gravitating Lagrangian particles coupled to a baryonic ideal gas modeled on a uniform Cartesian grid. Although Nyx is capable of employing adaptive mesh refinement (AMR), we do not make use of this functionality here, as the Lyα signal spans the vast low-density volumes and is fully absorbed close to halos, where AMR would be relevant.
To model the Lyα forest, Nyx assumes a primordial gas composition of hydrogen and helium, and it follows the abundances of six species: neutral and ionized hydrogen; neutral, once-, and twice-ionized helium; and free electrons. For these species, all relevant atomic processes, i.e., ionization, recombination, and free-free transitions, are modeled. Heating and cooling source terms are calculated using a subcycled approach in order to avoid running the whole code on the short cooling timescale. It is assumed that all gas elements are optically thin to ionizing photons, and that the ionization state can be fully described by a uniform and time-varying UV background radiation field (Oñorbe et al. 2017). This type of simulation is very common in studies of the intergalactic medium and is used as a forward model in virtually every recent inference work using the Lyα power spectrum (Boera et al. 2019; Walther et al. 2019; Palanque-Delabrouille et al. 2020; Rogers & Peiris 2021; Walther et al. 2021). Finally, the agreement between the Nyx code, which employs the Eulerian approach to hydrodynamics, and the SPH method was recently demonstrated in Chabanier et al. (2023).
The simulations are initialized at z = 200 using the Zel'dovich approximation (Zel'dovich 1970). Transfer functions were generated with the Boltzmann solver code CLASS (Blas et al. 2011). The cosmological parameters are set according to the Planck 2016 model (Planck Collaboration et al. 2016): Ω_b = 0.0487, Ω_m = 0.31, H_0 = 67.5, n_s = 0.96, and σ_8 = 0.83. We simulate volumes of 80 h⁻¹ Mpc on a side, which is the same size that will be used in DESI 1D power spectrum modeling (Walther et al. 2021). We produce two pairs of such simulations, where each pair has a high- and a low-resolution run initialized from the same random initial conditions. The high-resolution simulations have 4096³ elements (particles and grid cells) and are thus fully converged and percent-level accurate, while the low-resolution simulations are much coarser, with only 512³ elements. All simulation runs have identical cosmology, UV background, and numerical parameters, and the two pairs of high- and low-resolution simulations differ only in the choice of random realization. This enables the model to be trained on one pair of simulations and tested on the other.

Derived Fields
We use the software suite gimlet (Friesen et al. 2016) to derive the optical depth and flux fields from our hydrodynamic fields. The optical depth, τ, for Lyα photon scattering is given by

τ_ν = ∫ n_X σ_ν dr,

where n_X is the number density of species X (H I in our case), σ_ν is the cross section of the interaction, and dr is the proper path-length element. We assume a Doppler line profile, which results in the optical depth

τ_ν = (π e² / m_e c) f₁₂ ∫ n_HI φ_D(ν) dr,

where φ_D is the thermally broadened Doppler profile, f₁₂ is the upward oscillator strength of the Lyα resonance transition of frequency ν₀, e is the elementary charge, m_e is the electron mass, and c is the speed of light (Lukić et al. 2015). In the optically thin limit (neglecting scattering effects), τ becomes an entirely local quantity. Finally, the Lyα flux is given by

F = exp(−τ).

Producing Target Fields
The goal of this work is to enable "correcting" the output of a low-resolution hydrodynamical simulation without increasing its memory footprint. In other words, we do not want to super-resolve such a simulation by adding elements of finer resolution, but rather to improve the accuracy of the already resolved scales. We do this in order to enable modeling of survey-level volumes. For instance, a volume that measures 600 h⁻¹ Mpc on a side, which would be on the lower end of survey volumes, would require fields of size 30,720³ to represent in high resolution, if we were to super-resolve it. Clearly, storing outputs of such sizes is not a viable approach in the foreseeable future.
Thus, we must first downsample each field from the 4096³ simulation down to 512³, matching the memory footprint of the low-resolution simulation. This is not a trivial step, because we must not only make accurate coarse representations of fine features (which are represented on grids with 8× finer resolution) but also faithfully preserve the relevant correlations of each field (especially the probability density functions and power spectra). For the target flux, we can simply subsample without introducing statistical artifacts. But for the hydrodynamic fields, which span many orders of magnitude, we find that a special recursive blur-and-subsample method works best at resizing the high-resolution fields down by a factor of 8 while preserving statistical fidelity, which we discuss in detail in Appendix C. We use this resized volume as the training target for our model, and we show sample slices from the high- and low-resolution simulations along with the resized training target and model outputs in Figure 1.

Model Design
Modeling the hydrodynamic fields from high-resolution simulations on a coarse grid presents an interesting design challenge for a deep-learning model. The neural network must model data that exhibit sharp, potentially pixel-scale features, and the mapping from coarse to fine physical fields is inherently uncertain in a nontrivial, data-dependent manner (as in the more conventional direct super-resolution task; e.g., Ning et al. 2021). To address these challenges, we adopt a generative modeling approach, heavily inspired by high-resolution image-to-image mapping research (Jiang et al. 2020a) as well as the StyleGAN family of models (Karras et al. 2019, 2020). Our model architecture can be viewed as a conditional GAN that models the distribution of (resized) high-resolution hydrodynamic fields, given the corresponding fields from the low-resolution simulation. Our design facilitates modeling sharp, localized features as well as sampling multiple realizations of the target field from a single input, allowing direct estimation of model and data uncertainty via ensemble predictions with a single set of trained network weights.

Multiscale Convolutional Architecture
Because our training and inference targets have already been downsampled (see Section 2.2), our model architecture does not need to produce additional resolution elements, so the network outputs can be represented on a spatial grid of the same size as the input. Thus, from a machine-learning perspective, our task bears a strong resemblance to the general idea of image-to-image mapping (Isola et al. 2018), which was first attempted with architectures like the ubiquitous U-Net (Ronneberger et al. 2015) trained with reconstruction (ℒ₁) and adversarial loss functions. In subsequent years, the computer vision community has advanced the models used in this area to handle higher-resolution data via multiscale adversarial training (Wang et al. 2018), as well as to generate more detailed and perceptually realistic outputs via improved incorporation of input features within the network (Park et al. 2019; Jiang et al. 2020a). We base our model architecture on one such successful model in this area, TSIT (Jiang et al. 2020a), which has additionally been demonstrated to show promise in scientific applications (Duncan et al. 2022). TSIT is a fully convolutional model, which can be trained on smaller volumes cropped from a simulation and then deployed on the larger volume during inference, which loosens memory requirements for training (Harrington et al. 2022).
We show a schematic of our model in Figure 2(a). The primary components are a pair of encoder and generator networks, which are composed of convolution-based residual blocks (He et al. 2016) assembled in a symmetric manner, with downsampling/upsampling between each block in the encoder/generator, respectively. The encoder network takes as input the coarse hydrodynamical fields from a low-resolution simulation and extracts features from them at multiple spatial scales via successive residual blocks (ResBlocks) and downsampling operations. The extracted multiscale features are able to encode fine-level details as well as long-range information, both of which are needed to synthesize a physically realistic prediction that is consistent with the low-resolution input. Each ResBlock in the encoder consists of three convolutional layers, one of which forms a learned residual skip connection.
The generator network operates in a coarse-to-fine direction, incorporating extracted features from the encoder as well as sampled latent variables (given by Gaussian noise) at progressively larger spatial scales to form the final prediction. In the following section, we describe in detail how the generator samples latent variables via noise injection, but here we must first describe how the encoder features are incorporated into the generator. This is done using the FADE ResBlocks, proposed in the original TSIT work (Jiang et al. 2020a), which we illustrate in Figure 2(b). The FADE ResBlocks are largely the same as the standard ResBlocks in the encoder network, except for the addition of FADE modules, which take extracted features f_i from the encoder and use them to perform adaptive instance normalization. This design, inspired by the style-transfer literature, allows the generator to flexibly incorporate multiscale features and easily synthesize multiple outputs from a single input. We refer the reader to Jiang et al. (2020a) for further discussion of the advantages of the feature-wise adaptive normalization approach.
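To make the normalization concrete, the sketch below shows a minimal FADE-style module in PyTorch: the generator activation is instance-normalized, then modulated element-wise by scale and shift maps predicted from the matching-resolution encoder features. The layer widths and kernel sizes here are illustrative assumptions, not our exact production settings.

```python
import torch
import torch.nn as nn

class FADE(nn.Module):
    """Feature-adaptive denormalization (sketch): normalize the generator
    activation, then modulate it element-wise with scale and shift maps
    predicted from the matching-resolution encoder feature map."""

    def __init__(self, gen_channels, feat_channels, hidden=64):
        super().__init__()
        self.norm = nn.InstanceNorm3d(gen_channels, affine=False)
        self.shared = nn.Sequential(
            nn.Conv3d(feat_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
        )
        self.to_gamma = nn.Conv3d(hidden, gen_channels, kernel_size=3, padding=1)
        self.to_beta = nn.Conv3d(hidden, gen_channels, kernel_size=3, padding=1)

    def forward(self, x, f):
        # x: generator activation; f: encoder feature map at the same resolution
        h = self.shared(f)
        return self.norm(x) * (1.0 + self.to_gamma(h)) + self.to_beta(h)
```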

Latent Variable Noise Injection
A major goal of our work is to develop a nondeterministic model, such that we can generate multiple predictions of the high-resolution targets given a single input. This allows us to make ensembles of predictions that in aggregate give a sense of the combined model and data uncertainty associated with our neural network approach. The TSIT work does explore multiple ways of achieving this, mainly through either sampling a noise vector at the bottom of the generator network (similar to a standard GAN) or via an additional encoder into a latent space structured like that of a variational autoencoder (VAE).
Through initial experimentation, we found that neither method was sufficient to produce a statistically significant amount of variation in the network outputs. We suspect that part of the problem with these approaches is that they only introduce noise or variability at the "bottom" end of the generator, which represents features at the coarsest spatial scales. In our data, however, it is likely that much of the variation and uncertainty inherent to our coarse-to-fine mapping actually lies in the finest spatial scales. Thus, only introducing noise at the coarsest scales may not give the generator network adequate means to express a full range of variability in outputs (see further discussion of this concern in Section 3.2 of Karras et al. 2019).
To address this concern and produce more stochastic behavior, we leverage multiscale latent variables that inject additive Gaussian noise into the generator at all feature resolutions, similarly to the StyleGAN family of models (Karras et al. 2019). With this approach, the generator is far less dependent on the noise at the bottom of the network to produce variation, so much so that previous works have produced equally high-quality and variable outputs even when simply replacing the bottom-level noise input with a constant, learnable tensor (Karras et al. 2019, 2020) or a downsampled version of the input features (Park et al. 2019). We use the latter in this work, and then rely on the multiscale noise and FADE blocks in the later stages of the generator to produce physically realistic variation in our network outputs.
As the correct scale of noise to inject cannot be known a priori, and indeed the correct scale may change throughout training as the internal network weights and activations drift from random initialization toward their converged values, we choose to let the network learn the proper noise scale as a trainable parameter. Specifically, for each noise injection z_i at a given stage of the generator g_i, we sample a tensor of uncorrelated Gaussian noise from 𝒩(0, s_i), where s_i is learnable. The tensor of noise is directly added to the output of the FADE ResBlock in g_i, and the result is then passed to the subsequent generator stage g_{i+1}. At the beginning of training, we initialize s_i at all stages of the generator by sampling from 𝒩(1, 0.05). Overall, our noise injection approach is most similar to that of Duncan et al. (2022), who found it to work well in producing physically realistic variability in generative weather forecasting.
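A minimal PyTorch sketch of this mechanism is shown below; treating s_i as a per-channel standard deviation is our reading of the scheme, and the exact parameterization in our production model may differ.

```python
import torch
import torch.nn as nn

class NoiseInjection(nn.Module):
    """Additive Gaussian noise with a learnable scale s_i (sketch).
    We treat s_i as the standard deviation of the noise and learn one
    scale per channel; both choices are illustrative assumptions."""

    def __init__(self, channels):
        super().__init__()
        # initialize s_i ~ N(1, 0.05), interpreting 0.05 as a standard deviation
        self.s = nn.Parameter(1.0 + 0.05 * torch.randn(1, channels, 1, 1, 1))

    def forward(self, x):
        # fresh noise on every call, so repeated forward passes yield
        # different realizations from the same input
        return x + self.s * torch.randn_like(x)
```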

Normalization
The hydrodynamic fields produced by Nyx span several orders of magnitude. Features in the baryon density and temperature fields are most easily interpretable in log space. We follow Harrington et al. (2022) and normalize the hydrodynamic fields before passing them to the model, applying logarithmic rescalings to the baryon overdensity ρ and the temperature T (in Kelvin) and a linear rescaling to the velocity v (in cm s⁻¹). These choices roughly constrain each field to the range (−1, 1) such that its features are visible in linear space. The Lyα flux is naturally in the range (0, 1) and does not require additional dedicated normalization. To constrain the output fields to their target ranges, we apply a hyperbolic tangent filter on the final layer (i.e., x′ = tanh(x)) for the hydrodynamic fields and a sigmoid filter for Lyα.
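For illustration, the sketch below shows the general shape of such a normalization; the constants are placeholders chosen for typical IGM values, and the exact transformations are those of Harrington et al. (2022).

```python
import numpy as np

def normalize_fields(rho, T, v):
    """Illustrative normalizations mapping each field roughly into (-1, 1).
    The constants below are placeholders, not the published values from
    Harrington et al. (2022)."""
    rho_n = np.log10(rho) / 4.0        # rho: baryon overdensity, spans decades
    T_n = (np.log10(T) - 4.0) / 2.0    # T: temperature in Kelvin, ~10^4 K typical
    v_n = v / 1.0e8                    # v: line-of-sight velocity in cm/s
    return rho_n, T_n, v_n
```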

Loss Objective Design
We employ multiple loss functions to incentivize our networks to reproduce various visual and statistical qualities present in the data. The dominant component of the total loss used for training is the ℒ₁ distance, which encourages large-scale features to match between the predictions and targets and improves training stability:

ℒ₁ = (1/N) Σ_i |y_i − f(x_i)|,

where N is the number of grid cells and y_i is the target (i.e., "true") field corresponding to the predicted field f(x_i).
A well-known downside of ℒ₁ loss is its tendency to cause blurring of fine-scale features, so it is often augmented with a local or patch-based adversarial loss (Isola et al. 2018). We adopt the well-tested approach from image synthesis applications of using a multiscale adversarial loss ℒ_adv (Wang et al. 2018; Park et al. 2019; Jiang et al. 2020a), which is defined by multiple patch-based discriminator networks that each see differently resized versions of the generated and ground-truth data. We refer the reader to Wang et al. (2018) for a full description and analysis of this configuration; we use their Equation (5), including the feature-matching terms, as our full adversarial loss ℒ_adv. More details on the discriminators are given in Appendix A.
Additionally, because the primary goal of the generator is to reconstruct statistics as accurately as possible, we introduce a spectral loss term, ℒ_fft, in addition to the traditional ℒ₁ loss. ℒ_fft measures the difference between the Fourier transforms of the prediction and target (a stand-in for the power spectrum):

ℒ_fft = (1/N_k) Σ_{|k| < k_max} | |ℱ(f(x))| − |ℱ(y)| |,

where ℱ denotes the 3D fast Fourier transform, truncated to frequencies below k_max = 10 h Mpc⁻¹. The complete loss for the generator is defined as

ℒ_G = ℒ_adv + λ₁ ℒ₁ + λ_fft ℒ_fft,

where λ₁ = 100 and λ_fft = 6 are hyperparameters weighting the influence of ℒ₁ and ℒ_fft. After weighting, the relative scales of the loss terms are roughly ℒ₁ ∼ 20, ℒ_fft ∼ 10, ℒ_GAN ∼ 5, and ℒ_feat ∼ 10, where ℒ_GAN and ℒ_feat are the adversarial and feature-matching components of ℒ_adv. When λ₁ was set much lower, we observed that the adversarial model would occasionally infer realistic-looking volumes, but ones that did not match the ground truth as closely.
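As an illustration, a minimal PyTorch sketch of such a spectral loss term is given below; the exact norm and normalization of our production ℒ_fft may differ from this reading of the description above.

```python
import torch

def spectral_loss(pred, target, box_size=80.0, k_max=10.0):
    """Sketch of the L_fft term: compare Fourier amplitudes of prediction
    and target below k_max (in h/Mpc). The norm and normalization here are
    illustrative assumptions."""
    n = pred.shape[-1]
    k1d = 2.0 * torch.pi * torch.fft.fftfreq(n, d=box_size / n,
                                             device=pred.device)
    kx, ky, kz = torch.meshgrid(k1d, k1d, k1d, indexing="ij")
    mask = torch.sqrt(kx**2 + ky**2 + kz**2) < k_max  # truncate high frequencies
    fp = torch.fft.fftn(pred, dim=(-3, -2, -1))
    ft = torch.fft.fftn(target, dim=(-3, -2, -1))
    return (fp.abs() - ft.abs()).abs()[..., mask].mean()
```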

Results
To evaluate our model, we apply it to a separate volume at the same redshift that was not used for training (and with different random realizations of the initial conditions than those used for the training volume). As is visually apparent in Figure 1, the reconstructed fields inferred by our model for this test simulation contain much sharper features than the low-resolution simulation, and the dynamic range of each field resembles the high-resolution field much more closely. At very small scales, both our model outputs and the resized target fields used to train them contain minor differences with respect to the reference high-resolution simulation, which we take to be the "ground truth." In the following sections, we analyze in detail the statistical properties of our reconstructed fields and assess how closely their summary statistics match those of the test simulation, using a variety of probes. We focus first on our model's direct predictions of Lyα, evaluating statistical properties of individual samples as well as ensemble predictions for quantifying model uncertainty. Finally, we also evaluate some statistical properties of the underlying hydrodynamic fields predicted by our model.

Lyα Fields
We show a visualization of the reconstructed Lyα field in Figure 1. This example 2D slice runs parallel to the line of sight, so that the redshift-space distortions are visible along the x-axis.
As seen in Figure 1, the reconstructed Lyα field is visually of high spatial fidelity and closely agrees with the resized target field. Agreement is especially good in diffuse regions and filaments, where densities are lower and there is more uniform motion along the line of sight. Conversely, the dense, hot regions surrounding galaxy clusters seem to be more difficult to handle, since the redshift-space distortions amplify any discrepancies between the low-resolution and high-resolution simulations. These dense galaxy clusters make up an exceedingly small fraction of space by volume, so our model may struggle there due to a relative lack of training data.
The most commonly used statistics for evaluating the accuracy of Lyα flux are the power spectrum (P(k)) and the probability density function (PDF). We use the gimlet code (Friesen et al. 2016) to calculate these statistics, as well as the P(k) and PDF of the baryon hydrodynamic fields (density, velocity, and temperature). These statistics are calculated along sightlines or "skewers" that pass through the entire volume, parallel to one of the axes.

The Mean and Probability Density Function
The mean flux reconstructed by our model is only 2% lower than that of the high-resolution target. In the following analysis, we rescale all simulations to the same mean flux, 〈F〉 = 0.6835. This is the mean flux calculated from our high-resolution training simulation, which is also in agreement with observational measurements (e.g., Becker & Bolton 2013; Gaikwad et al. 2021).
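Rescaling to a common mean flux is conventionally done by scaling the optical depth; the sketch below assumes that standard procedure (e.g., Lukić et al. 2015) rather than our exact pipeline.

```python
import numpy as np
from scipy.optimize import brentq

def rescale_mean_flux(tau, target_mean=0.6835):
    """Find A such that <exp(-A * tau)> equals the target mean flux, then
    return the rescaled flux field. Applying the scaling factor to tau is
    the standard convention; we assume it here."""
    residual = lambda A: np.exp(-A * tau).mean() - target_mean
    A = brentq(residual, 1e-4, 1e4)  # bracketing interval is an assumption
    return np.exp(-A * tau)
```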
The probability density function, Pr(F), is defined such that its integral over the full range of F is equal to one,

∫₀¹ Pr(F) dF = 1.

We show the PDF of our model's Lyα predictions compared against those of the high- and low-resolution simulations in Figure 3. We find that the PDF of our reconstructed Lyα is within 10% of the true value and thus offers a substantial improvement over the low-resolution simulation, whose PDF is off by as much as 50%.

The 1D Power Spectrum
The power spectrum of the Lyα forest offers a promising route to measuring both the properties of the IGM and cosmological parameters. The P1D is one of the most important observables that can be predicted by our cosmological models. We define the flux overdensity as

δ_F = F / 〈F〉 − 1,

where 〈F〉 is the mean flux with respect to the entire volume in redshift space. In the case of P1D, the power spectrum along a given skewer is given by

P₁D(k) = L δ̃_F(k) δ̃_F*(k),

where δ̃_F is the Fourier transform of δ_F along the skewer of length L, and δ̃_F* is its complex conjugate. We plot the mean power spectrum 〈P(k)〉 averaged over modes with magnitude k from all skewers, as done in Lukić et al. (2015).
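For reference, a minimal sketch of this measurement on a gridded flux cube might look as follows; the discrete-FT normalization is one standard convention, and gimlet's internals may differ.

```python
import numpy as np

def p1d_from_skewers(flux, box_size=80.0):
    """Sketch of the 1D flux power spectrum: Fourier-transform the flux
    overdensity along each skewer (last axis) and average |delta_F(k)|^2
    over all skewers, with P(k) = L |delta_F(k)|^2."""
    n = flux.shape[-1]
    delta = flux / flux.mean() - 1.0                # flux overdensity
    delta_k = np.fft.rfft(delta, axis=-1) / n       # normalized transform
    pk = box_size * np.abs(delta_k) ** 2            # per-skewer power
    k = 2.0 * np.pi * np.fft.rfftfreq(n, d=box_size / n)  # h/Mpc
    return k, pk.reshape(-1, pk.shape[-1]).mean(axis=0)   # skewer average
```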
We show comparisons between the generated and true P1D in Figure 3. Our reconstructed Lyα P1D agrees with the high-resolution test volume to within 5% for wavenumbers smaller than 10 h Mpc⁻¹, after which the resized target we trained the model on begins to diverge due to aliasing. Although accurate reconstruction of P1D is not the main target of our method, given that it can be reproduced by "brute force" hydrodynamical simulations, it is very reassuring that this important statistic, which is used in so many inference works (Boera et al. 2019; Walther et al. 2019; Palanque-Delabrouille et al. 2020; Rogers & Peiris 2021; Walther et al. 2021, to mention a few), can be precisely reconstructed from otherwise quite inaccurate low-resolution hydrodynamical simulations that yield an incorrect Lyα P1D even at low wavenumbers.

The 3D Power Spectrum
Unlike the hydrodynamic fields, the flux field is distorted along the line of sight due to redshift-space distortions. It is therefore not appropriate to average the flux power over all three dimensions, given this anisotropy. Instead of averaging over spherical shell bins of wavenumber k, a common way of describing an anisotropic power spectrum is in terms of k and μ, the cosine of the angle between the wavevector k and the line of sight:

μ = k_∥ / |k|.

Therefore, the 3D power spectrum (P3D) we present is binned in k and μ, rather than k alone, resulting in P(k, μ). We separate this family of power spectra into four bins in μ: [0, 0.25), [0.25, 0.5), [0.5, 0.75), and [0.75, 1]. The last bin is most nearly parallel to the line of sight, while the first bin is almost perpendicular to it.
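A minimal sketch of such (k, μ) binning is shown below; the bin edges and FT normalization are illustrative and do not necessarily match gimlet's conventions.

```python
import numpy as np

def p3d_k_mu(delta_flux, box_size=80.0, n_mu=4, n_k=20):
    """Sketch of the anisotropic flux power spectrum binned in (k, mu),
    with the line of sight along the last axis."""
    n = delta_flux.shape[0]
    delta_k = np.fft.fftn(delta_flux) / delta_flux.size
    power = box_size**3 * np.abs(delta_k) ** 2
    kf = 2.0 * np.pi * np.fft.fftfreq(n, d=box_size / n)
    kx, ky, kz = np.meshgrid(kf, kf, kf, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)
    mu = np.divide(np.abs(kz), kmag, out=np.zeros_like(kmag), where=kmag > 0)
    k_edges = np.logspace(-1, 1, n_k + 1)             # 0.1-10 h/Mpc (assumed)
    k_idx = np.digitize(kmag, k_edges) - 1
    mu_idx = np.minimum(np.digitize(mu, np.linspace(0, 1, n_mu + 1)) - 1,
                        n_mu - 1)                     # put mu = 1 in last bin
    pk = np.zeros((n_k, n_mu))
    counts = np.zeros((n_k, n_mu))
    valid = (k_idx >= 0) & (k_idx < n_k)
    np.add.at(pk, (k_idx[valid], mu_idx[valid]), power[valid])
    np.add.at(counts, (k_idx[valid], mu_idx[valid]), 1.0)
    return k_edges, np.divide(pk, counts, out=np.zeros_like(pk),
                              where=counts > 0)
```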
We show the 3D power spectrum in these four bins in Figure 3. For all four μ bins, our reconstructed Lyα is within 10% of the true P3D for wavenumbers smaller than 10 h Mpc⁻¹. The improvement is greatest in the fourth bin, 0.75 ≤ μ ≤ 1, where the low-resolution simulation is off by >50% at most scales. This bin is also the most important for line-intensity mapping, because it represents the spectral power most nearly along the line of sight. The substantial accuracy improvements offered by our reconstructed P3D open the door to building accurate Lyα mock skies in Gpc volumes, using 4096³-8192³ grids.

Uncertainty Quantification
As described in Section 3.2, we generate probabilistic ensemble forecasts by incorporating noise into the network at multiple scales. This approach allows for sampling repeated predictions to form a probabilistic ensemble forecast for a single region of space, even when using a single set of input fields, which gives us a measure of the uncertainty of our model's prediction.
Figure 4(a) presents an example ensemble forecast, illustrating the mean and interquartile range per grid cell across a set of 16 unique ensemble members. Qualitatively, the results demonstrate higher variability (uncertainty) in challenging locations such as the edges of absorption features, and lower variability in the centers of large transmission or absorption regions. This aligns with expectations, as the areas adjacent to regions of zero flux are the most sensitive to small changes in the hydrodynamic fields and to the discrepancy between low- and high-resolution simulations. We chose the interquartile range to represent the model uncertainty, as the distribution is non-Gaussian, but using the variance or min-max range produced similar correlations.
As shown in Figure 4, the predictive error of the ensemble mean is greatest in roughly the same regions where the variability is greatest. Figure 4(b) shows the cell-wise correlation between the true error (which we know for the case of test data) and the variability of the ensemble as measured by the interquartile range. The error correlates with the ensemble variability according to a power law, meaning we can safely treat that variability as a surrogate for the cell-wise error. This suggests that it is possible to use the ensemble variability to estimate the uncertainty associated with the cell-wise flux prediction. This uncertainty can later be propagated through the scientific pipeline as needed.
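In practice, forming the ensemble and its uncertainty proxy amounts to a few lines; the sketch below assumes a trained generator (model) whose noise injection remains active at inference time.

```python
import numpy as np
import torch

@torch.no_grad()
def ensemble_stats(model, x, n_members=16):
    """Draw repeated stochastic predictions from a single input and
    summarize them per cell: the ensemble mean is the prediction, and
    the interquartile range serves as the uncertainty proxy."""
    samples = torch.stack([model(x) for _ in range(n_members)])
    arr = samples.cpu().numpy()
    q25, q75 = np.percentile(arr, [25, 75], axis=0)
    return arr.mean(axis=0), q75 - q25   # mean, IQR per grid cell
```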

Hydrodynamic Fields
Inspecting the baryon density fields in Figure 1 by eye, we see that our reconstruction matches the target field almost everywhere. The most prominent features missing from the baryon density reconstruction are the smallest filaments, especially those nearest the large clusters. These filaments are so thin that they are only a couple of grid cells wide in the target field and are often not visible at all in the low-resolution simulation, making them challenging to model.
For most regimes, including the regime relevant for the IGM and the Lyα forest, the reconstructed temperature field is accurate as well. However, our model struggles to capture the complexity of thermal shocks surrounding galaxy clusters, such as the two bright points in Figure 1. These turbulent features result from the accretion of matter from filaments onto galaxy halos and are commonly found around large gravitationally collapsed structures. The high-resolution simulation is able to accurately capture the complex fluid dynamics of these shocks, but the low-resolution simulation cannot resolve this detail and blurs them into much wider and more diffuse clouds. Translating these diffuse clouds into complex, turbulent shocks presents a challenge due to the difference in information content between the two fields. Our model is able to shrink the diameter of the reconstructed shock closer to the target, but it is unable to fully represent the turbulent details, which are not present in the input data. This cell-wise reconstruction error in temperature can be seen in the width of the distribution in the high-temperature regime of Figure 6.
As shown in Figure 5, the PDF of the baryon density reconstructed by our model is closer to the target field than the input field virtually everywhere. The PDF also reveals that our reconstructed baryon density field is notably more diverse than the low-resolution input, as it contains grid cells with more extreme values than its input (both more dense and more diffuse), making it even more similar to the target field.
The agreement of the PDF for temperature is less impressive than for density. Even so, our model offers a substantial improvement over the low-resolution Nyx simulation, in particular in the tails of the temperature PDF.

Gas Phase Statistics
The properties of the gas that makes up the intergalactic medium (IGM) have been well studied, and we wish to compare our reconstructed results to the expected statistics. In particular, we hope to faithfully reproduce the phase distribution of the IGM.
We plot the density-temperature distribution in Figure 7. As a result of the downsampling scheme used to generate our targets (see Appendix C), much of the diversity in the gas density-temperature distribution is reduced in our reconstruction. The physics of our simulations does not capture processes in the condensed galaxy-forming regions and may not have enough statistical accuracy in the warm-hot intergalactic medium. However, those regimes are of little consequence for the Lyα forest. Our method does preserve the power-law relationship of the diffuse gas:

T = T₀ (ρ/ρ̄)^(γ−1).

Here, T₀ is the temperature at the low-density end of the power law and γ determines its slope. The regime that follows this power law represents the majority of the gas and is by far the most important for predicting the Lyα flux. This power law is fit using the gimlet code, and the resulting parameters for each volume are shown in Figure 7. The power-law parameters produced by the low-resolution simulation are off by more than 10%, while our method reconstructs these parameters to within about 1%. Measurements of T₀ and γ at different redshifts provide insights into the thermal and ionization history of the IGM, as both H and He II reionizations are expected to significantly affect these parameters (see, for example, Walther et al. 2019; Gaikwad et al. 2021).
The region of phase space below this power law is unpopulated in both the low- and high-resolution simulations, but it is populated in our reconstructed result and in the target volume used for training. This erroneous population is introduced by an artifact of our resampling scheme (described in Appendix C). We have chosen a methodology that best preserves the spectral qualities of the high-resolution data but introduces some more subtle statistical artifacts like this one. In future work, we hope to avoid this issue, but because we are using a target that is shrunk by a factor of 8, it is not realistic to expect that all statistics can be preserved. This erroneous population accounts for very little of the total volume of our reconstruction and has little impact on the properties of the derived Lyα fields.

Conclusion
In this work, we have presented a generative deep-learning method for translating between low- and high-resolution simulations of hydrodynamic structure formation. We have demonstrated that this translation provides an accurate reconstruction over a wide range of scales and captures a number of important statistical properties. Our model output can match the "true" statistical properties of the target hydrodynamical fields in the regimes relevant to the observables, and it outperforms the low-resolution input simulation in all regimes. The observable, Lyα flux, derived from our model output is within a few percent of the "real" simulation's power spectra up to wavenumbers of ∼10 h Mpc⁻¹.
The scales accurately captured by this method go well beyond the resolution of the DESI survey (Walther et al. 2021).
Our method provides a far less computationally expensive alternative to high-resolution hydrodynamic simulations when generating mock survey data.
Our network can be conveniently trained on a relatively small volume for some given cosmological and astrophysical parameters, and then used to reconstruct an arbitrarily large volume. As shown in Figure 8, even an 80 h⁻¹ Mpc sample is sufficient to accurately train the model. This feature enables significant computational savings when generating larger mock catalogs.
Using noise injection in our generator, we can bridge the difference in spectral power and information content between the input and target volumes. Moreover, this noise injection allows us to better understand the predictive uncertainty of the reconstructed outputs: the cell-wise diversity of an ensemble of reconstructed outputs correlates with regions of greater inference error. This provides us with a means of estimating the uncertainty of our output, which can be propagated through the cosmological analysis pipeline.
There are some notable shortcomings in the hydrodynamic field reconstructions, in particular in and around the larger clusters and shocks. While these regions represent a very small percentage of the Universe by volume and have only a marginal effect on the Lyα statistics used for cosmology, they are of significant astrophysical interest. Because these regimes are especially dependent on subgrid dynamics, they are poorly captured by the low-resolution simulation and present the greatest challenge to our machine-learning model. However, our method of refining the hydrodynamic fields from low-resolution simulations presents a notable improvement over previous machine-learning methods that reconstructed the hydrodynamic fields from N-body dark matter alone and could not capture these shock fronts at all (Harrington et al. 2022; Horowitz et al. 2022). Because we begin with realizations of the hydrodynamic fields, our model does not need to guess where these shocks are.

Figure 7. The density-temperature distribution of the gas in our model predictions compared against the reference distributions from the low-resolution, high-resolution, and resized simulations. For each output, we plot the best-fit power-law relationship between density and temperature, and we find our model provides the most accurate T₀ and γ estimates when compared against the high-resolution simulation.
Appendix B Chunked Inference
To satisfy memory limitations, inference is performed on one small piece of the desired volume at a time: we break the volume up into much smaller "chunks" that can be inferred individually and in parallel.
The neural network has less information and context available to it when inferring the cells closest to the edge of an input volume, causing it to falter and predict erroneous features. This leads to "edge effects," which are noticeable artifacts, and overall greater inference error around the edges of a predicted volume.
These edge-effect artifacts would appear throughout the whole simulated volume once the inferred chunks are stitched back together. To prevent this, we make each chunk slightly wider than needed and later trim off the overlapping regions. In practice, we tile the field with chunks of 256³ cells, but to each inside face we add an extra 32 "ghost" cells, which are removed after inference. This greatly decreases the overall error and minimizes discontinuities between chunks.
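A minimal sketch of this chunked-inference scheme is given below; for simplicity it pads every face using the periodicity of the simulation box, a slight simplification of the inside-face scheme described above, and model_fn stands in for a call to the trained network.

```python
import numpy as np

def infer_in_chunks(model_fn, volume, chunk=256, ghost=32):
    """Tile a periodic volume into chunk^3 pieces, pad each with `ghost`
    cells per face, infer, and trim the ghosts. `model_fn` maps an array
    to an array of the same shape."""
    n = volume.shape[0]
    out = np.empty_like(volume)
    for i0 in range(0, n, chunk):
        for j0 in range(0, n, chunk):
            for k0 in range(0, n, chunk):
                # periodic indexing handles chunks at the box edge
                idx = lambda s0: np.arange(s0 - ghost, s0 + chunk + ghost) % n
                sub = volume[np.ix_(idx(i0), idx(j0), idx(k0))]
                pred = model_fn(sub)
                out[i0:i0 + chunk, j0:j0 + chunk, k0:k0 + chunk] = \
                    pred[ghost:-ghost, ghost:-ghost, ghost:-ghost]
    return out
```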

Appendix C Resizing Methodology
Finding a method to resize large volumes presented a unique challenge. In typical image-processing circumstances, different resizing methodologies do not result in vastly different outputs. However, our circumstance is unlike common image problems, as our signal is populated by very thin features and spans many orders of magnitude (not to mention that our application requires resizing by a factor of 8), so we must be especially cautious. It is thus important that we preserve the statistics of our target fields as precisely as possible, and we have experimented with several different resizing methods.

C.1. Subsampling
The most naive method for resizing a volume is to simply extract an evenly spaced subset of the high-resolution field. In our case, this means selecting one out of every 512 cells to represent that region of space. We sampled from about the center of each such box using field = field[3::8, 3::8, 3::8].
This virtually guarantees that the probability distribution of each field's values is preserved, but it can entirely disrupt the power spectrum. Under subsampling, the overall power contained in the power spectrum is preserved, so any power at frequencies above the new Nyquist frequency is aliased and disruptively shifted into lower wavenumbers (Trefethen 2019). This distorts the power spectrum, especially in the case of baryon density, where there are large amounts of structure and spectral power on scales beyond the Nyquist limit.

C.2. Local Mean
Another plausible candidate for resizing our fields is to take the mean of each local cube of 8³ cells using skimage.transform.downscale_local_mean().
Unlike subsampling, this method preserves the power spectrum at small scales. It is not perfect, but because it smooths the fields as it shrinks them, the local mean operation does not introduce erroneous jagged features. However, because each local neighborhood is pushed toward its average, this method inevitably skews the probability distribution toward the most common values. We prefer a method that minimizes changes to both of these statistics.

C.3. The Gaussian Pyramid
We can find a middle ground between the two methods above by combining a subsampling routine with a Gaussian blur. If the spatial reduction required in each dimension were only a factor of 2 or so, a single low-pass filter and resample would likely suffice. However, for our purposes, we wish to reduce the volume's scale by a factor of 8 or more in each dimension. To achieve this while best preserving the field's power spectrum, we apply multiple passes of a filter-then-resample scheme to gently resize our training data in a way that is more sensitive to the diverse scale space.
For a factor of 8, the Gaussian pyramid is executed as three successive passes, each of which applies a Gaussian low-pass filter to the field and then subsamples it by a factor of 2 in each dimension, as sketched below.
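A minimal sketch of this scheme, assuming one-cell Gaussian blurs and periodic boundaries (both our assumptions), is:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def pyramid_downsample(field, factor=8, sigma=1.0):
    """Recursive blur-and-subsample: at each level, apply a Gaussian
    low-pass filter and keep every other cell, halving the grid until
    the requested factor (a power of 2) is reached. The blur width per
    level and the periodic boundary handling are assumptions."""
    levels = int(np.log2(factor))
    for _ in range(levels):
        field = gaussian_filter(field, sigma=sigma, mode="wrap")  # periodic box
        field = field[::2, ::2, ::2]
    return field
```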

C.4. Comparison
We plot the probability density functions and power spectra for each of these methods in Figure 9. The subsampling and pyramid methods can each be applied in eight unique ways (choosing among eight central cells for subsampling), so we can establish a kind of dispersion, shown as the shaded regions in the plot. We choose the pyramid method to train on, as the subsampling routine distorts the baryon density's power spectrum and the local mean distorts the temperature PDF.
The orange line, representing our Gaussian pyramid method, is close to the high-resolution statistics in all cases. Subsampling (blue) may better preserve the PDF, but it introduces errors of over an order of magnitude in the power spectrum. Because of this, we settle on the Gaussian pyramid scheme to generate our training data.

Figure 1. Comparison of simulated and predicted hydrodynamic fields and Lyα flux. The first column is the low-resolution simulation fed as input to our model, which is trained to reconstruct features from the high-resolution simulation (fourth column) using a version of it resized to an 8× coarser grid as the training target (third column). Our model predictions are shown in the second column, demonstrating clear improvement over the low-resolution simulation.

Figure 2. Part (a): Schematic of our model architecture. We pass as input the low-resolution density, velocity, and temperature fields. These are processed by a convolutional encoder that extracts features at multiple scales, which are then used by the generator to reconstruct the target density, temperature, and Lyα flux fields. We inject learned multiscale noise in the generator, enabling uncertainty quantification by making our predictions stochastic. Part (b): The FADE residual block submodule in the generator. The FADE module performs element-wise denormalization by modulating the normalized activation using a learned affine transformation (adapted from Figure 3 in Jiang et al. 2020a). Part (c): To satisfy memory limitations, inference is performed on small chunks of the desired volume, one at a time. To reduce edge effects, the chunks overlap with ghost cells that are removed after inference is performed. This is discussed in more depth in Appendix B.

Figure 3. The probability density (top left panel) and power spectrum (top right panel) of Lyα flux in redshift space in four cases: the high- and low-resolution Nyx simulations, the resized target, and our network output. The uncertainty band around our output shows the min-max range of an ensemble of 16 predictions. The lower row shows the dimensionless 3D flux power spectrum in four μ bins. The top left is mostly parallel to the line of sight, and the bottom right panel shows the power spectrum mostly perpendicular to the line of sight. The gray band in the residual highlights the 10% error range around the "true" high-resolution target. Our output (in purple) shows clear improvement over the low-resolution Nyx simulation.

Figure 4. Part (a): Comparison between model uncertainty (fourth panel) and ensemble prediction error (third panel). Model uncertainty is defined as the width of the interquartile range of an ensemble of 16 predictions. The error is defined as the absolute difference between the target field (first panel) and the ensemble mean (second panel). The regions of large uncertainty of an ensemble of outputs coincide with the regions of greatest prediction error. Part (b): Correlation between the model uncertainty (interquartile range) of an ensemble of reconstructions and the error of their mean in redshifted flux. The line shows a best-fit power law with an exponent of 0.92.

Figure 5. The probability density functions of the predicted density and temperature fields compared against the reference distributions from the training target and high-resolution simulation. Reconstructing hydrodynamic fields is not the focus of our method, but we nevertheless provide an improvement over the low-resolution fields. A discussion of how the target density field has been distorted can be found in Appendix C.

Figure 6. Cell-wise correlation between the reconstructed and target density and temperature fields. The diagonal line shows one-to-one correspondence. Our model struggles with the high-temperature shock regions, so the spread is wider at the high end of the temperature range.

Figure 9. Hydrodynamic field statistics for each resize method applied to our validation field.