Cosmic Velocity Field Reconstruction Using AI

Ziyong Wu; Zhenyu Zhang; Shuyang Pan; Haitao Miao; Xiaolin Luo; Xin Wang; Cristiano G. Sabiu; Jaime Forero-Romero; Yang Wang; Xiao-Dong Li

doi:10.3847/1538-4357/abf3bb

1. Introduction

The large-scale structure (LSS) of the universe is a key observational probe to study the physics of dark matter, dark energy, gravity, and cosmic neutrinos. In the next 10 years, stage IV surveys, including DESI⁵ , EUCLID⁶ , LSST⁷ , WFIRST⁸ , and CSST, will begin to map out an unprecedented large volume of the universe with extraordinary precision. It is of critical importance to have statistical tools that can reliably extract the physical information in the LSS data.

The peculiar velocities of the galaxies, sourced by the "initial" inhomogeneities, is an excellent probe for the physics of the LSS, enabling us to better study or measure such quantities as the redshift space distortions (Kaiser 1987; Jackson 1972), baryon acoustic oscillations (Eisenstein et al. 2005, 2007), the Alcock–Paczynski effect (Alcock & Paczyński 1979; Li et al. 2014, 2015, 2016; Ramanah et al. 2019), the cosmic web (Bardeen et al. 1986; Hahn et al. 2007; Forero-Romero et al. 2009; Hoffman et al. 2012; Forero-Romero et al. 2014; Fang et al. 2019), the kinematic Sunyaev–Zeldovich effect (Sunyaev & Zeldovich 1972, 1980), and the integrated Sachs–Wolfe effect (Sachs & Wolfe 1967; Rees & Sciama 1968; Crittenden & Turok 1996).

Observationally, the measurement of the peculiar velocities is a difficult task, as it requires redshift independent determination of the distance, which is usually accomplished via distance indicators such as type Ia supernovae (Phillips 1993; Riess et al. 1997; Radburn-Smith et al. 2004; Turnbull et al. 2012; Mathews et al. 2016), the Tully–Fisher relation (Tully & Fisher 1977; Masters et al. 2006, 2008), and the fundamental plane relation (Dressler et al. 1987; Djorgovski & Davis 1987; Springob et al. 2007). As an alternative approach, one can "reconstruct" the cosmic velocity field from the density field based on their relationship described by theories. Here the difficulty is the complexity caused by the nonlinear evolution of the structures. Numerous works have been done in this direction. For more details, one can check Nusser et al. (1991), Bernardeau (1992), Zaroubi et al. (1995), Croft & Gaztanaga (1997), Bernardeau et al. (1999), Kudlicki et al. (2000), Branchini et al. (2002), Mohayaee & Tully (2005), Lavaux et al. (2008), Bilicki & Chodorowski (2008), Kitaura et al. (2012), Wang et al. (1809), Jennings & Jennings (2015), and Ata et al. (2017).

Recently machine-learning algorithms, especially those based on deep neural networks, are becoming promising toolkits for the study of complex data that are difficult to solve by traditional methods. So far, this technique has been applied to almost all subfields of cosmology, including weak gravitational lensing (Schmelzle et al. 2017; Gupta et al. 2018; Springer et al. 2020; Fluri et al. 2019; Jeffrey et al. 2020; Merten et al. 2019; Peel et al. 2019; Tewes et al. 2019), the cosmic microwave background (Caldeira et al. 2018; Rodriguez et al. 2018; Perraudin et al. 2019; Münchmeyer & Smith 2019; Mishra et al. 2019), the large-scale structure (Ravanbakhsh et al. 2017; Lucie-Smith et al. 2018; Modi et al. 2018; Berger & Stein 2019; He et al. 2019; Lucie-Smith et al. 2019; Pfeffer et al. 2019; Ramanah et al. 2019; Tröster et al. 2019; Zhang et al. 2019; Mao et al. 2021; Pan & Liu 2020), gravitational waves (Dreissigacker et al. 2019; Gebhard et al. 2019), cosmic reionization (La Plante & Ntampaka 2018; Gillet et al. 2019; Hassan et al. 2019, 2020; Chardin et al. 2019), and supernovae (Lochner et al. 2016; Moss 2018; Ishida et al. 2019; Li et al. 2019; Muthukrishna et al. 2019). For more details, one can refer to Mehta et al. (2019), Jennings et al. (2019), Carleo et al. (2019), Ntampaka et al. (2019), and the references therein.

In this paper, we apply deep-learning techniques to reconstruct the velocity field from the dark matter density field. This converts the reconstruction problem to a nonlinear mapping between the two fields, which is achieved via a deep neural network with a U-net style architecture. This paper is organized as follows. In Section 2, we introduce the data set and data processing methods we use. In Section 3, we discuss our neural network, including the construction of our neural network, the selection of parameters and details of training, etc. Section 4 presents the main results, and Section 5 represents the conclusion and discussion.

2. Training and Testing Data Sets

The training and testing samples are generated using the COmoving Lagrangian Acceleration (COLA) code (Tassev et al. 2013). COLA computes the evolution of dark matter particles in a frame that is comoving with observers following trajectories predicted by the Lagrangian perturbation theory (LPT), in order to accurately deal with the small-scale structures, without sacrificing the accuracy of large scales. Being hundreds of times faster than N-body simulations, it still maintains a good accuracy from very large to highly nonlinear scales.

We generate a set of 14 simulations, assuming a ΛCDM cosmology Ω_m = 0.31, Ω_b = 0.05, σ₈ = 0.83, n_s = 0.96, and H₀ = 67.77 km·⁻¹ Mpc⁻¹. Each of the simulations is run within a cube with a volume of ${(512{h}^{-1}\mathrm{Mpc})}^{3}$ using 512³ dark matter particles, having a mean separation of 1h⁻¹Mpc per dimension. The output at z = 0 is then used for the main part of our analysis.

The cloud-in-cell (CIC) algorithm is adopted for constructing the density and momentum fields from the outputs. Since the momentum has three dimensions, for each sample we need to construct three fields describing p_x, p_y, and p_z, respectively. The division of the momentum and density fields then leads to three velocity fields, i.e., v_x( x ), v_y( x ), and v_z( x ).⁹ For all fields, we choose a resolution of (2 h⁻¹ Mpc )³, corresponding to 256³ voxels.

In practice, we further split the density and momentum/velocity voxels into smaller subcubes before feeding them to the neural network. We take such a process based on the following considerations:

1.
Learning a larger cube requires a larger number of neurons or layers in the network, making the training more difficult and expensive.
2.
Dealing with large fields is limited by memory constraints, especially if GPUs are used in the training process.
3.
By using small cubes as training samples, we force the neural network to focus on interpreting and predicting the small-scale, nonlinear patterns in the velocity fields. The large-scale velocity field, which can be easily estimated using perturbation theory, is not our focus.

To avoid possible inaccuracy and complexity brought by the boundary effects, the neural network is designed to map the density fields into momentum fields having a smaller size. For each momentum field, we take a 240³ voxel subfield from it, cut the subfield into 1728 20³-voxel subcubes, and set the subcubes as the targets (i.e., outputs) of the neural network. The inputs of the network are a series of 32³ voxel density fields sharing the same centers with those momentum fields. In this way, 75% voxels (lying near the outer boundary of the density fields) serve as adjacent points, for the purpose of enhancing accuracy 1.

**Figure 1.** Left panel: the input of our neural network is a 32³ voxel density field (blue), while the output is a 20³ voxel velocity field (red) located around the center of the input. This choice reduces boundary effects. Right panel: a series of overlapped input fields yield to nonoverlapped outputs, which can be spliced back to build up larger cubes.
Download figure:
Standard image High-resolution image

Furthermore, since the density values span three orders of magnitude, it is difficult for the neural network to establish an accurate mapping. Thus we use the following logarithmic transform to mitigate this problem

$\begin{eqnarray}&&\tilde{\rho }(x,y,z)=\mathrm{ln}(\rho (x,y,z)+1).\end{eqnarray} \tag{ 1 }$

By using their log values, we greatly decrease the variance. Moreover, the distribution of large-scale structure density is close to the lognormal distribution, so we can use the above expression to convert it into an approximate normal distribution (Falck et al. 2012; Neyrinck et al. 2009; Kitaura & Angulo 2012).

The 32³ voxel fields are split into training, verification, and test sets, among which the training set accounts for 60%, the verification set accounts for 30%, and the test set accounts for 10% of the total data. The single batch number for the training is set as 6.

3. Neural Network Architecture

We adopt a "U-net" style architecture, which is built upon the convolutional network and modified in a way that it has better performance in imaging analysis.

As mentioned in the previous section, the entire simulation box was divided into 32³ voxel subcubes with a size of ${(64{h}^{-1}\mathrm{Mpc})}^{3}$ , and mapped into 20³ voxel velocity fields. 64 h⁻¹ Mpc is large enough to capture the nonlinear features in the field, while reducing the input data complexity, thus reducing the required number of neurons. Accordingly, the overall structure of our network is designed as follows (see Figure 2):

1.
First, the input 32 voxel density fields are fed into two convolution layers, which convolve the inputs and pass the resulting feature fields to the next-level layers. To capture the abundant features in the 3D LSS, each layer has 128 filters, while each filter has a shape of 3³. The latter configuration is adopted throughout our network. These two convolution layers are designed to have zero-padding and 1-stride (in what follows "same convolution"), so that their outputs have the same dimension to their inputs.
2.
Then, the feature fields are convolved by 128 3³-filters, but using a stride of 2. Therefore, the outputs are reduced to the size of 16³. In this step we use the convolution (stride = 2) to effectively decrease the dimensions of the feature maps, and thus reduce the number of parameters to learn and the amount of computation performed in the network.
3.
To further extract features and compress them, the 16³ voxel feature fields are then processed by two same convolution and one convolution (stride = 2), for further feature extraction and compression. Here the three layers have as many as 256 filters, as we expect more features when entering a deeper-level regime.
4.
The outputs of the previous layers, i.e., 256 8³ voxel fields, are passed to two same convolution layers having 512 3³ filters in each, to further extract features.
5.
After that, a series of deconvolution layers are placed to conduct "inverse convolution" and achieve reconstruction. The 512 8³ voxel fields are first deconvolved by 256 3³ filters to produce 16³ voxel fields, then convolved by 256 3³ filters for further information extraction, and finally deconvolved by 128 3³ filters to recover 32³ voxel fields. The deconvolution is achieved via transpose convolution layers¹⁰ with stride 1.
6.
Finally, the 32³ voxel feature fields are passed to six convolution layers without padding (in what follows "valid convolution"). In each valid convolution the 3³-filters decrease the size of the data by 2, so the final output has a shape of 20³. They are passed to a deconvolution layer with three 3³ filters and stride 1 to build up a 20³ voxel cube with three-dimensional velocity as the final output.

**Figure 2.** The overall structure of our network is designed similarly to the "U-net" style architecture, which is built upon the convolutional network and modified in a way that it has better performance in imaging analysis. It consists of 15 convolution layers and 2 deconvolution layers, and maps the 32³ voxel density field (64 h⁻¹ Mpc) to the 20³ voxel velocity or momentum fields (40 h⁻¹ Mpc). A lot of detailed designs are adopted to guarantee the performance of the network.
Download figure:
Standard image High-resolution image

In summary, the network is composed of a series of convolution and deconvolution layers and has a symmetric structure. It can be generally considered as an encoder network followed by a decoder network. In this way, it not only identifies features at the pixel level, but projects the features learned at different stages of the encoder onto another pixel space.

A lot of detailed designs are adopted to guarantee the performance of the network. We summarize them as follows:

1.
In the decoder part, we adopted transpose convolution, instead of upsampling, as the deconvolution layer. Compared with the latter design, transpose convolution does a much better job in dealing with the nonlinearities in the fields. Based on the same consideration, in the encoder part, we also use transpose convolution, instead of max- or mean-pooling, to reduce the data.
2.
After each convolution layer we place one BatchNormalization (BN) layer and one activation layer. The former one is added to prevent the overfitting of the model, reduce the training cost, and improve the training speed. The latter one, for which we use rectified linear unit (ReLU) $f(x)=\max ({\rm{x}},0)$ , is crucial for the neural network, since it brings nonlinearity into the system.
3.
Each deconvolution is followed by a cropping layer, to match the shape of the preceding encoder convolutional density field so as to meet the concatenate condition. We crop both sides with the same pixel to guarantee each side has the same weight to the velocity field.
4.
After every deconvolution, we concatenate the higher resolution feature fields from the encoder network with the deconvolved features, in order to better learn representations in the following convolutions. Since the decoder is a sparse operation, we need to fill in more details from earlier stages.
5.
During the training, we randomly shuffled the input training samples of each epoch to prevent the effect of overfitting due to the similarity of adjacent fields.

4. Result

In the following we compare the neural network outputs with the input truth and the linear perturbation theory expectations. As mentioned in the previous subsection, in order to suppress the boundary effect in the training, the output of the neural network is a 20³-voxel field, located in the center of the 32³-voxel input field. Here we have already put together those subcubes in a larger field (Figure 1).

4.1. Pixel-to-pixel Comparison

Figure 3 shows three slices selected from the testing samples. They all have a size 40 h⁻¹ Mpc × 40 h⁻¹ Mpc and a thickness 2 h⁻¹ Mpc. In all figures, we show the original "truth" velocity field, the predictions of the neural network and the linear perturbation theory, and also their residuals to the original velocity field. Plotted in the lower-left corners are the density fields based on which the velocity fields are derived.

$| \cos \theta | $ — **Figure 3.** Three slices selected from the testing sample. In regions where two bulks of matter collide and merge, the velocity is highly nonlinear. For each slice, we plot the velocity field of original input (top left), U-net prediction (top middle), and perturbation theory prediction (top right). In the bottom row, we also plot the corresponding density field (bottom left), residual of U-net prediction (bottom middle), and PT prediction (bottom right). In the right column, we plot the histograms of v, ∣v_resiual∣/∣v_true∣, and $| \cos \theta |$ . All these suggest that the performance of neural network is much better.
Download figure:
Standard image High-resolution image

In all cases it is clear that the neural network achieves a better performance than the linear perturbation theory:

1.
The linear perturbation theory works well in the regime where the density and velocity is low (e.g., see the lower-right corner of the middle and lower panels). In the lower-right corner of the lowest panel, the performance of the perturbation theory is even better than the neural network, possibly because the latter puts most effort on predicting the nonlinear regions.
2.
The linear perturbation theory completely fails in the nonlinear regions with relatively large density and velocity. But the neural network still works well in these regions.
3.
The most interesting cases are those corresponding to merging situations where two regions with opposing bulk velocities collide into each other. This is shown in the lower-left part of the uppermost panel, the upper-left corner of the middle panel, and the left part of the lowest panel. While in these regions the perturbation theory completely fails, the neural network still works well in reconstructing the velocities.

To quantify the performance of the neural network, for all slices we plot the corresponding histograms of ∣v∣, ∣v_redisual∣/∣v_true∣, and $\cos \theta$ , where θ is the angle between the original and the predicted velocities.

We find the neural network correctly recovers the distribution of ∣v∣. However, the linear perturbation theory tends to overpredict the velocity in the dense regions. In the three slices, when checking the distribution of ∣v∣, the original fields give

$\begin{eqnarray}&&| v| =428\pm 188,186\pm 103,432\pm 208\ \mathrm{km}\,{{\rm{s}}}^{-1}\ (\mathrm{original}),\end{eqnarray} \tag{ 2 }$

while the neural network predictions give

$\begin{eqnarray}&&| v| =376\pm 194,177\pm 106,368\pm 193\,{\rm{k}}{\rm{m}}\,{{\rm{s}}}^{-1}\,({\rm{U}}-{\rm{n}}{\rm{e}}{\rm{t}}).\end{eqnarray} \tag{ 3 }$

In comparison, the linear perturbation theory predictions are

$\begin{eqnarray}&&| v| =649\pm 400,386\pm 354,797\pm 576\ \mathrm{km}\,{{\rm{s}}}^{-1}\ (\mathrm{PT}).\end{eqnarray} \tag{ 4 }$

Comparing ∣v_residual∣, the neural network results are

$\begin{eqnarray}&&| {v}_{\mathrm{residual}}| =126\pm 72,65\pm 41,152\pm 70\ \mathrm{km}\,{{\rm{s}}}^{-1}\ ({\rm{U}}-\mathrm{net}),\end{eqnarray} \tag{ 5 }$

while the linear perturbation theory yields

$\begin{eqnarray}&&| {v}_{\mathrm{residual}}| =281\pm 349,298\pm 299,407\pm 488\ \mathrm{km}\,{{\rm{s}}}^{-1}\ (\mathrm{PT}).\end{eqnarray} \tag{ 6 }$

The latter results are much worse. The residual velocities of the neural network results are 3−4 times smaller than the linear perturbation theory results.

Finally, the neural network performs better than linear theory in predicting the directions of the flows. They have $| \cos \theta | =0.93\pm 0.19,0.9\pm 0.27$ , 0.89 ± 0.26 for the three slices, while in the case of linear perturbation theory the results are 0.92 ± 0.26, 0.49 ± 0.69, and 0.92 ± 0.22. The neural network results are closer to 1 and with a smaller standard deviation.

Similar result can be seen in Figure 4. In the nonlinear regime, linear perturbation theory completely fails, while the U-net architecture can still correctly recover the momentum. When checking the distribution of ∣p∣, the original fields give

$\begin{eqnarray}&&| p| =2371\pm 4613\ \mathrm{km}\,{{\rm{s}}}^{-1}\ (\mathrm{original}),\end{eqnarray} \tag{ 7 }$

while the neural network predictions give

$\begin{eqnarray}&&| p| =2415\pm 4361\ \mathrm{km}\,{{\rm{s}}}^{-1}\ ({\rm{U}}-\mathrm{net}).\end{eqnarray} \tag{ 8 }$

In comparison, the linear perturbation theory predictions are

$\begin{eqnarray}&&| p| =1030\pm 2318\ \mathrm{km}\,{{\rm{s}}}^{-1}\ (\mathrm{PT}).\end{eqnarray} \tag{ 9 }$

In the middle panel, we show that the neural network also performs much better in reconstructing the curl of the momentum field.

**Figure 4.** We compare the momentum, momentum curl, and velocity divergence. Similar to Figure 3, we plot the field of original input (top left), U-net prediction (top middle), perturbation theory prediction (top right), corresponding density field (bottom left), residual of U-net prediction (bottom middle), and PT prediction (bottom right) as well as the histograms of quantities, their residuals, and angle. We find the U-net prediction is almost the same as the truth one, while the linear perturbation theory prediction loses many detailed structures. These all suggest that the performance of the neural network is much better.
Download figure:
Standard image High-resolution image

Another important quantity to characterize is the divergence of the velocity field, given its relevance to study superclusters and the cosmic web (Hoffman et al. 2012; Peñaranda-Rivera et al. 2020). So we analyze the divergence of the velocity field predicted by the neural network and compare it with the linear perturbation theory. We find again that the neural network outperforms linear perturbation theory. In Figure 4, the divergence of velocity field predicted by the neural network is similar to the real one, while the linear perturbation theory has a larger variance.

In addition, we also made a cell-to-cell comparison of the δ–θ_v and δ–Θ_p distribution of the truth field and the U-net or PT predicted fields, in a 480 h⁻¹ Mpc box with cell-size 2 h⁻¹ Mpc. Figure 5 shows that the scattering pattern of the U-net predicted field is basically consistent with that of the truth field. In comparison, the PT method leads to a significantly wrong δ–θ_v distribution, and also seriously overpredicts the curl value of many cells.

**Figure 5.** Distribution of the velocity divergence θ_v and the x-direction momentum curl Θ_p, along with the density contrast δ. From left to right, we show the results calculated using the truth field and the fields predicted by the U-net and PT methods, in a 480 h⁻¹ Mpc box with cell-size 2 h⁻¹ Mpc.
Download figure:
Standard image High-resolution image

4.2. Power Spectrum

We now proceed to check the clustering properties of the fields. The most commonly used statistics in cosmological studies are the two-point correlation function measured in configuration space, or the power spectrum measured in Fourier space. In what follows, we compute the two-point correlation function and power spectra of specific quantities defined as

$\begin{eqnarray}\begin{array}{rcl}{\xi }_{{AA}}(| {\boldsymbol{r}}| ) & = & \left\langle {\delta }_{A}\left({{\boldsymbol{r}}}^{{\prime} }\right){\delta }_{A}\left({{\boldsymbol{r}}}^{{\prime} }+{\boldsymbol{r}}\right)\right\rangle \\ {P}_{{AA}}(| {\boldsymbol{k}}| ) & = & \displaystyle \int {{\rm{d}}}^{3}{\boldsymbol{r}}{\boldsymbol{\xi }}(r){e}^{i{\boldsymbol{k}}\cdot {\boldsymbol{r}}},\end{array}\end{eqnarray} \tag{ 10 }$

where the angle bracket represents the average of the whole sample, and A denotes the physical quantities we choose to investigate. In this analysis, the following power spectrum is taken into account,

$\begin{eqnarray}&&{P}_{| v| | v| },{P}_{{\theta }_{v}{\theta }_{v}},{P}_{{pp}},{P}_{{{\rm{\Theta }}}_{p}{{\rm{\Theta }}}_{p}},\end{eqnarray} \tag{ 11 }$

where $| v| =\sqrt{{v}_{x}^{2}+{v}_{y}^{2}+{v}_{z}^{2}}$ , θ_v ≡ △ · v , p = ∣v∣δ is the momentum, and Θ_p ≡ △ × p . In order to compare the difference between the reconstructed field and the actual field, we define

$\begin{eqnarray}&&T(k)=\displaystyle \frac{{P}_{\mathrm{predicted}}(k)}{{P}_{\mathrm{true}}(k)}\end{eqnarray} \tag{ 12 }$

to characterize the difference between the reconstructed and true fields. All measures are conducted in ${(120{h}^{-1}\mathrm{Mpc})}^{3}$ boxes constructed from the testing samples.

Figure 6 shows the ${P}_{| v| | v| },{P}_{{\theta }_{v}{\theta }_{v}},{P}_{{pp}},{P}_{{{\rm{\Theta }}}_{p}{{\rm{\Theta }}}_{p}}$ of U-net and PT methods and their residuals to the actual power spectrum. Tables 1–4 compare the ratio of the power spectrum of PT and U-Net in different physical quantities mentioned above to the real power spectrum at k = 0.2, 0.6, and 1.0. When checking the results of P_{∣v∣∣v∣}, the neural network much better recovers its value in the quasi nonlinear regime of k ≳ 0.2 h Mpc⁻¹. In particular, we find ≲20% discrepancy in P_{∣v∣∣v∣} within the range of 0.2 h Mpc⁻¹ ≲ k ≲ 1.4 h Mpc⁻¹. The largest discrepancy occurs at k ≃ 0.272 h Mpc⁻¹, corresponding to a T(k) of 0.801. In contrast, the perturbation theory result always has a discrepancy of T(k) ≃ 1.8–5.3.

Table 1. Values of P_{∣v∣∣v∣}/P_true, Sampled at k = 0.2, 0.6, and 1.0

k (hMpc⁻¹)	0.2	0.6	1.0
P_{∣v∣∣v∣}/P_true, linear perturbation theory	2.259	5.245	3.516
P_{∣v∣∣v∣}/P_true, U-net	0.818	1.123	1.011

Download table as: ASCII Typeset image

Table 2. Values of P_pp/P_true, Sampled at k = 0.2, 0.6, and 1.0

k (hMpc⁻¹)	0.2	0.6	1.0
P_pp/P_true, linear perturbation theory	20.860	44.746	43.479
P_pp/P_true, U-net	1.068	0.920	0.929

Download table as: ASCII Typeset image

Table 3. Values of ${P}_{{\theta }_{v}{\theta }_{v}}/{P}_{\mathrm{true}}$ , Sampled at k = 0.2, 0.6, and 1.0

k (hMpc⁻¹)	0.2	0.6	1.0
${P}_{{\theta }_{v}{\theta }_{v}}/{P}_{\mathrm{true}}$ , linear perturbation theory	1.584	7.917	20.731
${P}_{{\theta }_{v}{\theta }_{v}}/{P}_{\mathrm{true}}$ , U-net	0.956	0.996	1.011

Download table as: ASCII Typeset image

Table 4. Values of ${P}_{{{\rm{\Theta }}}_{p}{{\rm{\Theta }}}_{p}}/{P}_{\mathrm{true}}$ , Sampled at k = 0.2, 0.6, and 1.0

k (hMpc⁻¹)	0.2	0.6	1.0
${P}_{{{\rm{\Theta }}}_{p}{{\rm{\Theta }}}_{p}}/{P}_{\mathrm{true}}$ , linear perturbation theory	14.806	24.086	18.065
${P}_{{{\rm{\Theta }}}_{p}{{\rm{\Theta }}}_{p}}/{P}_{\mathrm{true}}$ , U-net	1.048	0.907	0.95

Download table as: ASCII Typeset image

Similar results are found when comparing the other two power spectra. In P_pp we find a ≲8.2% discrepancy within the range of 0.2 h Mpc⁻¹ ≲ k ≲ 1.4 h Mpc⁻¹. The largest discrepancy, at k ≃ 0.816 h Mpc⁻¹, corresponds to a T(k) of 0.918, while the perturbation theory result always has a discrepancy of T(k) ≃ 15–47.

We find ≲18% discrepancy in ${P}_{{\theta }_{v}{\theta }_{v}}$ within the range of 0.2 h Mpc⁻¹ ≲ k ≲ 1.4 h Mpc⁻¹. The largest discrepancy, at k ≃ 1.335 h Mpc⁻¹, corresponds to a T(k) of 0.818, while perturbation theory consistently exhibits a discrepancy of T(k) ≃ 1.3–22.

There is a discrepancy of ≲9.2% in ${P}_{{{\rm{\Theta }}}_{p}{{\rm{\Theta }}}_{p}}$ within the range of 0.2 h Mpc⁻¹ ≲ k ≲ 1.4 h Mpc⁻¹. The largest discrepancy is seen at k ≃ 0.604 h Mpc⁻¹ with a T(k) value of 0.907, while perturbation theory has a discrepancy of T(k) ≃ 11–31.

It is worth noting here that since the outputs of our U-net only have a size of 40 h⁻¹ Mpc, the spatial sampling limits our ability to accurately recover the large-scale power spectra at k < 0.2 h Mpc⁻¹. This can be improved by making corrections on large scales, or simply increase the sizes of the input or output fields. Since the major focus of this work is to check the capability of the neural network in predicting small-scale, nonlinear velocity fields, we will not discuss this issue in detail.

5. Discussion and Conclusions

In this paper, we applied a deep-learning technique to reconstruct the velocity field from the dark matter density field, which has a resolution of 2 h⁻¹ Mpc. To this end we implement a "U-net" neural network, consisting of 15 convolution layers and 2 deconvolution layers with 48,690,307 parameters. The network maps the 32³-voxel input density field to velocity and momentum fields having size of 20³, so as to avoid boundary effects.

We find that the neural network manages to reconstruct the velocity and momentum fields and even outperforms the results from linear perturbation theory. The superiority of the neural network is more pronounced in regions where the density is relatively large and the nonlinear processes dominate. In particular, in regions where mergers take place, linear perturbation theory completely fails, while the neural network successfully recovers the velocity structure.

By conducting pixel-to-pixel comparison between the predicted velocity fields and underlying true fields, we find that the neural network can reasonably recover the distribution of ∣v∣, having discrepancy of ∣v_residual∣ ≲ 150 km s⁻¹, while for the perturbation theory results we find ∣v_residual∣ ≃ 300–400 km s⁻¹. The neural network also predicts well the directions of the velocities compared to the true velocities.

When analyzing the clustering properties of the fields, the neural network can well recover the amplitude and shape of P_{∣v∣∣v∣}, whose error ranges from 1% to ≲10% within the range of 0.2 ≲ k ≲ 1.5. Similarly, the error of P_pp is ≲8.2%, ${P}_{{\theta }_{v}{\theta }_{v}}$ is ≲17%, and ${P}_{{{\rm{\Theta }}}_{p}{{\rm{\Theta }}}_{p}}$ is ≲9.2% at the range of 0.2 ≲ k ≲ 1.4 All these results are much better than the linear perturbation theory results.

As a proof-of-concept study, our analysis demonstrates the ability of deep neural networks to reconstruct the nonlinear velocity and momentum fields from density fields. The neural network can even handle regions of shell-crossing, which is notoriously difficult within perturbation theory approaches. At the same time, there is still much room for improvement in the accuracy of the neural network, via further optimizing the architecture, enlarging the number of the training samples, or adding follow-up neural networks to fit the residuals and perform corrections.

The reconstructed peculiar velocity fields can be used for a number of studies, such as BAO reconstructions, RSD analyses, kinematic Sunyaev–Zeldovich (kSZ), supercluster analysis and the cosmic web construction. We will continue to work on this direction so that the machine-learning technique can be reliably applied to real observational data and help us uncover more of the mysteries of the universe.

We thank Kwan-Chuen Chan, Yin Li, Jie Wang, Le Zhang, and Yi Zheng for helpful discussions. This work is supported by the National SKA Program of China No. 2020SKA0110401. X.D.L. acknowledges the support from the NSFC grant (No. 11803094) and the Science and Technology Program of Guangzhou, China (No. 202002030360). C.G.S. acknowledges financial support from the National Research Foundation of Korea (NRF; #2020R1I1A1A01073494). J.F.-R. acknowledges support from COLCIENCIAS Contract No. 287-2016, Project 1204-712-50459. Y.W. is supported by NSFC grant No.11803095 and NSFC grant No.11733010. We acknowledge the use of Tianhe-2 supercomputer. We also acknowledge the use of the Kunlun cluster, a supercomputer owned by the School of Physics and Astronomy, Sun Yat-Sen University.

Cosmic Velocity Field Reconstruction Using AI

Article metrics

Permissions

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Training and Testing Data Sets

3. Neural Network Architecture

4. Result

4.1. Pixel-to-pixel Comparison

4.2. Power Spectrum

5. Discussion and Conclusions

Footnotes

Cosmic Velocity Field Reconstruction Using AI

Article metrics

Permissions

Share this article

Author e-mails

Author affiliations

ORCID iDs

Dates

Abstract

1. Introduction

2. Training and Testing Data Sets

3. Neural Network Architecture

4. Result

4.1. Pixel-to-pixel Comparison

4.2. Power Spectrum

5. Discussion and Conclusions

Footnotes