Deep learning reconstruction of three-dimensional galaxy distributions with intensity mapping observations

Line intensity mapping is emerging as a novel method that can measure the collective intensity fluctuations of atomic/molecular line emission from distant galaxies. Several observational programs with various wavelengths are ongoing and planned, but there remains a critical problem of line confusion; emission lines originating from galaxies at different redshifts are confused at the same observed wavelength. We devise a generative adversarial network that extracts designated emission line signals from noisy three-dimensional data. Our novel network architecture allows two input data, in which the same underlying large-scale structure is traced by two emission lines of H$\rm \alpha$ and [OIII], so that the network learns the relative contributions at each wavelength and is trained to decompose the respective signals. After being trained with a large number of realistic mock catalogs, the network is able to reconstruct the three-dimensional distribution of emission-line galaxies at $z = 1.3-2.4$. Bright galaxies are identified with a precision of 84%, and the cross-correlation coefficients between the true and reconstructed intensity maps are as high as 0.8. Our deep-learning method can be readily applied to data from planned space-borne and ground-based experiments.


INTRODUCTION
The large-scale distribution of galaxies carries rich information on the structure and the evolution of the Universe and on how galaxies are formed from early through to the present day. Line intensity mapping (LIM) is aimed at measuring large-scale intensity fluctuations of line emissions from galaxies and intergalactic gas. Complementary to traditional galaxy surveys, LIM covers a broad spectral range and detects signals from essentially all emission sources residing in a large cosmological volume . It is thus possible to make a structural "map" of the Universe by a single observation. There have already been a few successful experiments that detect hydrogen 21-cm line (Chang et al. 2010;Ali et al. 2015), and observations targeting other emission lines such as CO/ [Cii] and Lyα/Hα/[Oiii] are ongoing (e.g., Keating et al. 2020;Concerto Collaboration et al. 2020;Cleary et al. 2021) or planned (e.g., Doré et al. 2014Doré et al. , 2018. LIM can efficiently survey a large observational volume, and the data from LIM are well suited, for instance, to study the formation and evolution of galaxies (Breysse et al. 2016;Keating et al. 2016) as well as geometry and the matter content of the Universe (see, e.g., Doré et al. 2014). LIM can also be used to study the reionization by combining with 21 cm observations of the inter-galactic medium (Dumitru et al. 2019;Moriwaki et al. 2019).
A key process in the analysis of LIM data is to separate the contributions from different emission lines originating from sources at different redshifts. Let us consider two emission lines with rest-frame wavelengths λ 1 and λ 2 . If they are emitted at redshifts z 1 and z 2 that satisfy λ 1 (1 + z 1 ) = λ 2 (1 + z 2 ) = λ o , they are observed at the same wavelength λ o , appearing as "interlopers" to each other. Cross-correlation analyses are proposed to solve this line confusion problem (e.g., Visbal & Loeb 2010), and there are several other statistical methods (e.g., Gong et al. 2014;Cheng et al. 2016). It is technically challenging to isolate the contribution of a particular emission line and to infer the intensity distribution, but a successful direct reconstruction of the threedimensional distribution of the emission sources would enhance the constraining power in cosmological studies as well as studies on the galaxy formation and evolution. If the contamination of interloper lines can be removed, we are able to analyze the large-scale structure accurately (see e.g., Fonseca et al. 2017) and also constrain the galaxy population by using methods such as the voxel intensity distribution (Breysse et al. 2017).
Customized convolutional neural networks (CNNs) have been developed and applied to separate different emission line signals and to effectively de-noise a map (Moriwaki et al. 2020(Moriwaki et al. , 2021, but such applications are limited to two-dimensional images without spectral information. Cheng et al. (2020) devise a reconstruction method that makes use of spectral analysis. Their algorithm effectively extracts the source galaxies with multiple emission lines brighter than a few times the noise level, but fainter signals still remain difficult to be detected. In this Letter, we propose to utilize the spectral information in an efficient manner so that a "machine" can learn the correlation of multiple emission lines at different wavelengths. It is possible to perform a full three-dimensional reconstruction by using a LIM observation with a broad wavelength coverage. This finally enables the reconstruction of the three-dimensional cosmic structure with LIM.

METHOD
We primarily consider NASA's SPHEREx 1 mission to be launched in 2024 and identify the two brightest emission lines Hα 6563Å and [Oiii] 5007Å as our target to be detected by SPHEREx. We do not consider the other interlopers such as [Oii] and Hβ. While the other lines' intensities are likely to be subdominant, they can also carry additional information in the spectral domain. Our method can easily be adjusted to deal with more than two emission lines, although the time needed for training may increase.

Training data
To generate mock observation catalogs for training and test, we use a publicly available code pinocchio (Monaco et al. 2013) that populates a large cosmological volume with dark matter halos 2 . We configure past light-cones of a hypothetical observer by arranging several simulation outputs to fill the volume (Figure 1). We set the simulation box size to 690h −1 comoving Mpc and the aperture of the light cone to 1.5 deg. The minimum halo mass considered is 2 × 10 11 h −1 M . We have confirmed that the presence of smaller haloes does not affect the total intensity significantly nor the intensity distribution. We carefully choose the line-of-sight direction of the light cone so that any galaxy does not appear more than once in the redshift range of our interest.
To assign line luminosities to the galaxies (haloes), we use halo mass-to-line luminosity relations computed in our previous study (Moriwaki et al. 2020) based on the results of cosmological hydrodynamics simulation Illus-trisTNG (Nelson et al. 2019). We assign the luminosities by assuming that the line luminosities of haloes in a halo mass bin M i follow an asymmetric normal distribution with different variances on the larger and smaller side than the most frequent luminosity value L i . This assigning process produces similar scatter in the halo mass-toline luminosity relations as that of IllustrisTNG. Both the Hα and [Oiii] line luminosities are approximately proportional to the star formation rate, but the derived Hα /[Oiii] ratio varies over a factor of ten because the [Oiii] luminosity depends also on the properties of the interstellar medium such as metallicity and ionization parameter. We find that the line ratios of our catalog haloes are also scattered in a similar way as that computed with IllustrisTNG.
The middle and bottom panels of Figure 1 show the intensity distributions of Hα and [Oiii] on a past lightcone. We adopt the spatial and spectral resolutions and the noise levels of SPHEREx 3 . The angular resolution is 0.1 arcmin and the spectral resolution is approximately constant (R ∼ 40) over the wavelength range of our interest. 4 Note that the corresponding physical length to the angular resolution is much smaller than that of the spectral resolution. At z = 1.5, for instance, 0.1 arcmin corresponds to 52 kpc, while R ∼ 40 corresponds to 47.2 Mpc. We add Gaussian noise to make realistic mock catalogs. The noise level is about two orders of magnitude larger than the mean intensities of line emissions (top panel of Figure 1), and thus detecting diffuse emitters, whose signals originates from galaxies from z = 1.3 to 2.4. The data cube within an angular size of 6.4 (the size of the yellow boxes) are used for training. Note that we have adopted larger angular resolution for visibility in this figure than the actual resolution of our training data. sources that are distributed over the entire intensity field is difficult even with our machine learning method.
For training, we generate 500 independent light-cones with 1.5 deg aperture over λ obs = 1.0µm − 2.5 µm using pinocchio. The wavelength range corresponds to 32 spectral bins of SPHEREx as shown in Figure 1. To reduce computational cost, we generate input data with 64 × 64 angular pixels. This corresponds to a field-ofview of 6.4 × 6.4 with an angular resolution of 0.1 arcmin. From each light cone, we randomly extract 100 such small volumes. Then a total of 50,000 mock observational data cubes are generated. As discussed in the following section, we use two portions of the mock observational data with different wavelength ranges (indicated by orange and yellow boxes in Figure 1) as input to the neural networks.

Network
We use a conditional generative adversarial network (cGAN; Isola et al. 2016) to perform the threedimensional reconstruction. In particular, we adopt conditional Wasserstein GAN (WGAN; Arjovsky et al. 2017). WGAN is known to increase training stability and the diversity of generated data (Foster 2019). We have four 3D convolutional neural networks: two generators, G 1 and G 2 , that reconstruct Hα and [Oiii] signals from observed data and corresponding two critics 5 , D 1 and D 2 , that distinguish true and reconstructed images. Each generator consists of four convolution layers followed by four de-convolution layers (Figure 2), whereas the critic consists of four de-convolution layers. The networks also include skip connections (Isola et al. 2016), dropout (Srivastava et al. 2014), and batch normalization (Ioffe & Szegedy 2015).
The most important information to be learned by the generators is the co-existence of multiple emission lines at different wavelengths. To make it easier for the generators to learn that the two emission lines are always observed with a separation of ∆λ obs = (λ Hα − λ [OIII] ) × (1 + z), we arrange the architecture so that the generators receive a pair of observed data cubes as an input. The cubes are covered by sixteen SPHEREx wavelengths filters from 1.48 µm to 2.19 µm, and 1.14 µm to 1.68 µm, which correspond to 1.25 z 2.4 of Hα and [Oiii] lines, respectively. The input cubes, denoted by x 1 and x 2 , are indicated by the orange and yellow boxes in Figure 1. By giving the two data cubes arranged such that the two emission lines from the same source appear at the same pixel, we let the generators learn the consistent co-existence of the two lines. Figure 2. The architecture of the generator that takes two feature maps (data cubes) as an input and consists of four shared convolution layers, followed by four deconvolution layers.
The critics also receive two data cubes as an input: either a pair of the observed and reconstructed data, (x i , G i (x 1 , x 2 )), or a pair of observed and true data, (x i , y i ), where y i is the true data cubes of Hα (i = 1) or [Oiii] (i = 2) that cover the same wavelength range as The networks are trained to optimize two loss functions defined by where the indices i = 1, 2 correspond to Hα and [Oiii], and λ i is a hyperparameter which we set λ 1 = λ 2 = 100 after some experiments. The objective of the generators (critics) is to decrease (increase) the loss functions. Another important building block of WGAN is the Lipschitz constraint imposed on the critics, which prevents the outputs of the critics from changing abruptly. To en-force the constraint, we adopt the same approach as in the original proposal by Arjovsky et al. (2017) in which they clip the weights of the critic to lie within a small range of [−0.01, 0.01]. We build our network using Tensorflow. We use Adam optimizer (Kingma & Ba 2014) with a learning rate of 0.0002 for training, set the batch size to be 50, and run 50 epochs on a single Nvidia Titan RTX GPU.

RESULT
To measure the performance of our WGAN, we generate an additional set of 1000 light-cones that are independent of the training data. We randomly choose an area of 0.85 deg × 0.85 deg from each light-cone and divide it into 8×8 cubes with the same size as the training data. The prepared test data are given to the generator of our WGAN. Finally, we reconstruct intensity cubes by combining 8 × 8 outputs.
In Figure 3, we show an example of the true and reconstructed Hα intensity distributions from z = 1.3 to 2.4. The large-scale galaxy distribution is reproduced accurately in 3D, despite the large noise level (see Figure 1). Pixel-by-pixel comparison shows remarkably good agreement between the true and reconstructed maps (Figure 4). Our network reconstructs the brightest sources accurately, and thus the underlying large-scale distribution is also well reproduced. Diffuse sources are not well reproduced because of the large observational noise considered in our study. This can also be seen in the point distribution function (Figure 5). The bright ends are reproduced, but the WGAN appears to have learned that it is optimal to regard faint pixels just as noise-dominated. The vertical lines are the noise level of SPHEREx averaged over 16 wavelength bins of the input data cubes, σ n = 2.25 × 10 −6 (upper), 3.06 × 10 −6 erg/s/cm 2 /sr (bottom). Figure 5 indicates that the effective limit of our machine learning reconstruction is a few-σ n . This is similar to the result of Cheng et al. (2020), who show that the CO line signals from similar redshifts are reconstructed down to a fewσ n level. Detecting diffuse "clouds" would be extremely difficult unless the observational noise is significantly reduced in future experiments. It should be noted here that the weaker [Oiii] signals are also accurately reconstructed, even though the bright end of the observed PDF is dominated by foreground Hα intensities.
We count the numbers of the pixels with intensities larger than 3-σ n in true (N true ) and reconstructed (N rec ) maps. We then compute the recall, N X /N true , and the precision, N X /N rec , where N X is the number of pixels that are detected and matched in both the true and reconstructed maps. The recall and the precision are 0.67 and 0.84 for Hα, and the corresponding values for [Oiii] are 0.78 and 0.68. We estimate that the typical intensities of [Oiii] are roughly half of Hα at the same observed wavelength, and our previous study shows that the detection performance degrades for such weaker lines when only two-dimensional data is used for the machine learning analysis (Moriwaki et al. 2020). The impressive reproducibility of the [Oiii] distribution in the present study can be attributed to the inclusion of the spectral information, as we discuss in the following.
To quantify the reconstruction accuracy of the largescale distribution, we compute the cross-correlation co- where P X is the cross-power spectrum and P true and P rec are the auto-power spectra of the true and reconstructed maps. We find that a high reconstruction performance with r ∼ 0.8 at k = 0.3 arcmin −1 for both Hα and [Oiii] has been achieved over the wide redshift range. This is consistent with the point source detection accuracy discussed above. To examine if the spatial clustering information is used along with the spectral information, we perform an additional test. We randomly shuffle the pixels of the test data and get rid of the angular correlation in the signals while preserving the spectral correlation. We then input the shuffled data into our network. The test result shows that the network still achieves high reproducibility; the bright pixels (> 10 −5 erg/s/cm 2 /sr) are reproduced with similar precision of ∼ 0.6 − 0.8 for both the lines. This implies that our network emphasizes the spectral information (emission line features) more than the spatial correlation information. We note that we consider a small area of 6.4 × 6.4 arcmin 2 for the reconstruction in this study. With the finest resolution achievable for our available computational resources, we are able to represent point sources but the particular configuration does not allow incorporating large-scale clustering features. In our previous study (Moriwaki et al. 2021), we showed that the information on the large-scale clustering is more properly used when the training data are generated with a sufficiently large area. Clearly, there is room for improvement in our method. In our future work, we will use data set with larger dimensions so that a machine can learn both the spectral information and the large-scale clustering of galaxies.

SUMMARY
We have developed, for the first time, neural networks that extract signals of two emission lines from noisy data obtained in LIM observations. Our 3D WGAN makes use of the information on the co-existence of two emission lines in a given pair of data cubes. It is able to reconstruct the bright sources when trained with a large number of mock observational maps that are closely configured for the SPHEREx experiment. Our method can be extended and applied to LIM observations at any other wavelengths. Once we can extract the individual signals, the reconstructed data can be used for cosmological/astrophysical parameter estimate, cross-correlation analysis, and planning follow-up observations.