Learning neutrino effects in Cosmology with Convolutional Neural Networks

Measuring the sum of the three active neutrino masses, $M_\nu$, is one of the most important challenges in modern cosmology. Massive neutrinos imprint characteristic signatures on several cosmological observables, in particular on the large-scale structure of the Universe. In order to maximize the information that can be retrieved from galaxy surveys, accurate theoretical predictions in the non-linear regime are needed. Currently, one way to achieve those predictions is by running cosmological numerical simulations. Unfortunately, producing those simulations requires high computational resources -- several hundred to a few thousand core-hours for each neutrino mass case. In this work, we propose a new method, based on a deep learning network, to quickly generate simulations with massive neutrinos from standard $\Lambda$CDM simulations without neutrinos. We computed multiple relevant statistical measures of the deep-learning generated simulations and conclude that our approach is an accurate alternative to the traditional N-body techniques. In particular, the power spectrum is reproduced to within $\simeq 6\%$ down to non-linear scales of $k = 0.7$ h/Mpc. Finally, our method allows us to generate massive neutrino simulations 10,000 times faster than the traditional methods.


INTRODUCTION
The measurement of the mass and the ordering of neutrinos constitutes a necessary step toward understanding the physics of those particles and the beyond-Standard-Model mechanism behind them. The discovery of neutrino oscillations (Fukuda et al. 1998; Ahmad et al. 2002; Araki et al. 2005; Adamson et al. 2008; Ahn et al. 2012; Abe et al. 2012, 2014; Forero et al. 2014) has robustly established that at least two out of the three active neutrinos (electron, muon and tau) are massive. However, oscillation experiments only provide bounds on the neutrino mass squared differences. In the minimal neutrino scenario, the best-fit values for the solar and atmospheric mass splittings are $\Delta m^2_{21} \equiv m^2_2 - m^2_1 \simeq 7.6 \times 10^{-5}$ eV$^2$ and $|\Delta m^2_{31}| \equiv |m^2_3 - m^2_1| \simeq 2.5 \times 10^{-3}$ eV$^2$ (Gonzalez-Garcia et al. 2014; Forero et al. 2014; Esteban et al. 2017). Since the sign of the atmospheric mass splitting is still unknown, there are two possible orderings of the neutrino masses: normal ($\Delta m^2_{31} > 0$) and inverted ($\Delta m^2_{31} < 0$). The lower limits on the sum of neutrino masses set by oscillation experiments are $M_\nu \gtrsim 0.06$ eV in the normal hierarchy and $M_\nu \gtrsim 0.1$ eV in the inverted hierarchy. The study of the endpoint energy of electrons produced in $\beta$-decay places an upper limit on the total neutrino mass at the level of $M_\nu \lesssim 1.1$ eV (Kraus et al. 2005; Aker et al. 2019).

Corresponding author: Elena Giusarma, egiusarm@mtu.edu
Cosmology provides an independent tool to constrain the neutrino masses and ordering. Neutrinos are the second most abundant particles in the Universe, after photons, and leave distinctive signatures on several cosmological observables. They strongly affect the background evolution of the Universe, as well as the evolution of cosmological perturbations. Light massive neutrinos are relativistic in the early Universe and contribute to the radiation energy density. At late times, they become non-relativistic, contributing to the total matter density of the Universe. Non-relativistic neutrinos behave as Hot Dark Matter (HDM), possessing relatively large thermal velocities compared to any other massive particles. They thus cluster only on scales larger than their free-streaming length, suppressing the growth of perturbations on smaller scales. The presence of massive neutrinos also affects the anisotropies of the Cosmic Microwave Background (CMB), as these particles may turn non-relativistic around the decoupling period, as well as the gravitational lensing of the CMB. The current upper bound on $M_\nu$ is obtained by combining different cosmological data sets. In particular, the tightest constraints on $M_\nu$, which arise from combining CMB temperature and polarization anisotropy measurements with different observations of the large-scale structure of the Universe, range from 0.22 eV to 0.12 eV at 95% confidence level (Palanque-Delabrouille et al. 2015; Giusarma et al. 2016; Vagnozzi et al. 2017; Zennaro et al. 2018; Giusarma et al. 2018; Planck Collaboration et al. 2018, 2019).
Future cosmological surveys, such as Euclid (Laureijs et al. 2011), DESI (Aghamousa et al. 2016), WFIRST (Spergel et al. 2015), VRO (Ivezić et al. 2019), and CMB-S4 (Abazajian et al. 2016), are expected to improve the constraints on the cosmological parameters. In particular, upcoming galaxy surveys will provide one of the most important sources of information to determine the neutrino masses and their ordering. To achieve those goals, accurate theoretical predictions for the spatial distribution of matter and luminous tracers in the presence of massive neutrinos will be crucial. A powerful tool for obtaining rigorous predictions in cosmology is cosmological simulations. In particular, N-body simulations are numerical simulations in which cosmological matter fluctuations are evolved under gravity alone. They allow us to make comparisons between observations and theory, they are used to generate mock galaxy catalogs and to compute error covariance matrices, and they are also used to optimize observational strategies. Over the past years, a large number of N-body simulations including massive neutrinos have been proposed and developed in the literature. Those simulations have made it possible to study the impact of neutrino masses on clustering on fully non-linear scales in real space (Bird et al. 2012; Lesgourgues & Pastor 2012; Villaescusa-Navarro et al. 2013; Villaescusa-Navarro et al. 2014; Peloso et al. 2015), on the clustering and abundance of halos and cosmic voids (Castorina et al. 2014; Castorina et al. 2015; Massara et al. 2015) and, finally, on the clustering of matter in redshift space (Villaescusa-Navarro et al. 2018; Bel et al. 2019). However, simulations are computationally expensive. The development of new computational methods is thus needed to accelerate this process.
In the last decade, deep learning approaches have been applied to many different fields of research, achieving outstanding results, especially in computer vision and image recognition challenges (Krizhevsky et al. 2012). Deep learning is a branch of machine learning that makes use of deep neural networks. An increasing number of studies are also adopting these methods for a variety of problems in cosmology. For example, Ravanbakhsh et al. (2017), He et al. (2018) and Ntampaka et al. (2019) applied convolutional neural networks (CNNs) to estimate the cosmological parameters directly from simulated dark matter distributions and the distribution of photons in the Cosmic Microwave Background; He et al. (2019) built a deep CNN to predict the non-linear structure formation of the Universe from simple linear perturbation theory; Rodriguez et al. (2018) and Zamudio-Fernandez et al. (2019) used Generative Adversarial Network (GAN) models to synthesize samples of the cosmic web and predict high-resolution 3D distributions of cosmic neutral hydrogen; Ramanah et al. (2019) introduced a novel halo painting network to map the cosmic density fields of dark matter to realistic halo distributions; Hezaveh et al. (2017) exploited the use of CNNs to accurately estimate strong gravitational lensing parameters; finally, List et al. (2019) applied conditional Generative Adversarial Networks (cGANs) to predict how changing the underlying physics alters the simulation results.
In this paper, we apply a modified version of the deep neural network proposed by He et al. (2019) to establish the mapping between simulations with massless and massive neutrinos. The important advantage of our approach is its ability to produce complex numerical simulations from the standard ones in a few minutes using a modern Graphics Processing Unit (GPU), with the same statistical properties as the standard N-body techniques.
The paper is organized as follows. In Section 2 we briefly describe the cosmological simulations we have used in this work. We then present a description of the training process and the quantitative results in Sections 3 and 4. Finally, we draw our conclusions in Section 5.

N-BODY SIMULATION DATA
In this section, we briefly describe the N-body simulations used in our work. We refer the reader to Villaescusa-Navarro et al. (2018); Villaescusa-Navarro et al. (2019) for further details on the simulations.
We use two subsets of the HADES simulations (Villaescusa-Navarro et al. 2018), the precursor of the QUIJOTE simulations (Villaescusa-Navarro et al. 2019): the standard $\Lambda$CDM simulations (without neutrinos), and simulations with massive neutrinos. The values of the cosmological parameters are: $\Omega_m = \Omega_{\rm cdm} + \Omega_b + \Omega_\nu = 0.3175$, $\Omega_b = 0.049$, $h = 0.6711$, $\Omega_\Lambda = 0.6825$, $n_s = 0.9624$ and $A_s = 2.13 \times 10^{-9}$. The simulations with massive neutrinos have $M_\nu = 0.15$ eV, and since $A_s$ is fixed in both sets, the value of $\sigma_8$ differs: $\sigma_8 = 0.833$ and $\sigma_8 = 0.798$ for massless and massive neutrinos, respectively. All the simulations are run in a periodic box of size 1 $h^{-1}$ Gpc and follow the evolution of the cold dark matter (CDM) particles and, only in the case of simulations with massive neutrinos, the neutrino particles, from $z = 99$ to $z = 0$. We use 100 independent realizations for each model, containing the same number of cold dark matter and neutrino particles ($N_{\rm cdm} = N_\nu = 512^3$). We focus our analysis on redshift $z = 0$, corresponding to the present time.
In order to speed up the process of training the neural network and to avoid GPU memory problems, we split each of the 100 simulations into sub-cubes of size $32^3$ voxels in Lagrangian space, corresponding to regions of size 62.5 $h^{-1}$ Mpc. Each realization thus contains 4,096 sub-cubes, making a total of 409,600 sub-cubes for each simulation model. We then split the realizations of each model into three chunks: 70% for training (70 realizations), 20% for validation (20 realizations) and 10% for testing (10 realizations). After the training and the testing, we concatenate the sub-cubes of each of the 10 realizations in the test data into a cube of size 1 $h^{-1}$ Gpc and compute the relevant summary statistics. Following He et al. (2019), we train our model using the displacement field rather than the density field (see He et al. (2019) for more details). The displacement field is defined as
$$\vec{d} = \vec{x}_f - \vec{x}_i,$$
where $\vec{x}_f$ is the final position of the particles (at redshift $z = 0$) and $\vec{x}_i$ is the Lagrangian position of the same particles on a uniform grid. The input and target of the neural network are the displacement fields from simulations without and with massive neutrinos, respectively. Our benchmark model consists of the Zel'dovich approximation (hereafter ZA) applied to massive neutrino models.
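As an illustration, the displacement-field construction and the sub-cube splitting described above can be sketched as follows. This is a minimal NumPy sketch, not the paper's actual pipeline: the function names, array shapes, and the assumption that particles are ordered on the Lagrangian grid are ours.

```python
import numpy as np

def displacement_field(x_final, x_lagrangian, box_size=1000.0):
    """Displacement d = x_f - x_i, wrapped for the periodic box.

    x_final, x_lagrangian: arrays of particle positions (h^-1 Mpc),
    assumed ordered on the Lagrangian grid (hypothetical layout).
    """
    d = x_final - x_lagrangian
    # Account for particles that crossed the periodic boundary.
    d[d > box_size / 2] -= box_size
    d[d < -box_size / 2] += box_size
    return d

def split_into_subcubes(field, sub=32):
    """Cut an (N, N, N, 3) field into (sub, sub, sub, 3) chunks;
    for N = 512 and sub = 32 this yields 16^3 = 4096 sub-cubes."""
    n = field.shape[0] // sub
    return [field[i*sub:(i+1)*sub, j*sub:(j+1)*sub, k*sub:(k+1)*sub]
            for i in range(n) for j in range(n) for k in range(n)]
```

The periodic wrap matters because a particle near one face of the box can end up near the opposite face, which would otherwise register as a spurious displacement of order the box size.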
The ZA is a significantly faster approximation to full N-body simulations, obtained from first-order Lagrangian perturbation theory with neutrinos. The reason we use the Zel'dovich approximation and not second-order Lagrangian perturbation theory (2LPT) is that it is currently unknown how to compute 2LPT in massive neutrino models. In particular, there is no estimate for the two key quantities, the second-order scale-dependent growth factor and growth rate, necessary to calculate the 2LPT displacements in the presence of massive neutrinos.
We train our neural network using the mean absolute error loss function, given by the mean of the absolute differences between the target displacement field and the predicted values:
$$L = \frac{1}{N} \sum_{i=1}^{N} \left| \vec{d}^{\,t}_i - \vec{d}^{\,p}_i \right|,$$
where $\vec{d}^{\,t}_i$ is the true displacement field (from massive neutrino simulations), $\vec{d}^{\,p}_i$ is the prediction from the D³M, and $N$ is the total number of particles.
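In PyTorch this mean absolute (L1) loss can be written in one line; the sketch below is illustrative, and the tensor shapes are our assumption for batches of displacement sub-cubes:

```python
import torch

def displacement_l1_loss(d_pred, d_true):
    """Mean absolute error between predicted and true displacement
    fields; d_pred, d_true: e.g. (batch, 3, 32, 32, 32) tensors."""
    return torch.mean(torch.abs(d_pred - d_true))

# Equivalently, PyTorch's built-in criterion:
criterion = torch.nn.L1Loss()
```

Both forms average over every component of every particle's displacement in the batch.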
The total number of trainable parameters is $8.4 \times 10^6$.
Figure 1 shows the projected density field of N-body simulations with massive neutrinos at $z = 0$ (top-left panel), together with the residuals between the target and the input (top-center panel) and between the target and the output of our neural network (top-right panel). From the top-left panel, we can clearly see the presence of filaments, surrounded by large underdense regions (dark regions) called voids. The presence of massive neutrinos in the simulations induces a suppression of the growth of density perturbations on small scales, visible in the matter power spectrum.
The bottom-left panel of Figure 1 shows the PDF of the residuals from "Target − Input" (blue line) and "Target − D³M" (red line). By computing a Q-Q plot test, we established that both curves approximately follow a Gaussian distribution. However, the blue curve exhibits a shift induced by the presence versus absence of neutrinos in the simulations. In order to verify whether the statistical variations between the simulations are reflected in the D³M prediction, we also consider a different simulation realization in the training data with $M_\nu = 0.15$ eV (see the bottom-right panel of Figure 1). In particular, by comparing the residuals "D³M − realization 1" (red line), "Target − realization 1" (blue line) and "realization 1 − realization 2" (green line), we note that the three residual distributions present the same shape and width. This is an indication that the density residuals coming from our D³M model have statistical properties similar to those of simulations with the same neutrino mass.

Training time
One standard cosmological simulation without neutrinos currently requires ∼500 CPU hours to complete. The addition of massive neutrinos increases the simulation time to ∼700 CPU hours. The key advantage of the approach we describe in this work is its ability to reduce the additional computational cost of producing massive neutrino simulations. In particular, it requires ∼10 GPU hours to train the network and only a few minutes to generate one simulation with massive neutrinos: a substantial gain in computing time compared to standard N-body techniques. We run our CNN model on two NVIDIA P100 (16 GB) GPUs, which allows us to speed up the training time by a factor of 2.5.

RESULTS
In this section, we present the main results obtained using the D³M architecture and compare them with the standard N-body simulations and the benchmark model. We quantify the agreement between our model and the numerical simulations by considering four different summary statistics: the overdensity distribution function, the power spectrum, the bispectrum, and the void size function. By doing this, we can compare the properties of the fully 3-dimensional density distributions. We compute the considered statistics by concatenating the sub-cubes of each of the ten realizations in the test data and calculating the mean summary statistics over the ten simulations.

1-D Probability distribution function
We compute the overdensity distribution of the cold dark matter for the N-body simulations and the D³M prediction. We first place the particles on a grid with $512^3$ cells, and we smooth the density field with a top-hat filter on a scale of 10 $h^{-1}$ Mpc. We then calculate the distribution as the fraction of cells with overdensity lying within a given interval. The probability distribution function provides information on the distribution of CDM overdensities in the cells.
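The steps above can be sketched with NumPy as follows. This is a simplified illustration, not the paper's code: we assume nearest-grid-point mass assignment (the actual assignment scheme is not stated), apply the spherical top-hat window in Fourier space, and histogram the smoothed overdensity.

```python
import numpy as np

def overdensity_pdf(positions, box=1000.0, grid=128, R=10.0, bins=50):
    """PDF of the smoothed overdensity field (illustrative sketch)."""
    # Nearest-grid-point density estimate on a grid^3 mesh.
    H, _ = np.histogramdd(positions % box, bins=(grid,) * 3,
                          range=[(0, box)] * 3)
    delta = H / H.mean() - 1.0
    # Spherical top-hat window in Fourier space:
    # W(x) = 3 (sin x - x cos x) / x^3, with x = |k| R.
    k = 2 * np.pi * np.fft.fftfreq(grid, d=box / grid)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    x = np.sqrt(kx**2 + ky**2 + kz**2) * R
    with np.errstate(divide="ignore", invalid="ignore"):
        W = 3 * (np.sin(x) - x * np.cos(x)) / x**3
    W[0, 0, 0] = 1.0  # k = 0 limit of the window is 1
    delta_s = np.fft.ifftn(np.fft.fftn(delta) * W).real
    # Normalized histogram of 1 + delta.
    hist, edges = np.histogram(1 + delta_s, bins=bins, density=True)
    return hist, edges
```

A production measurement would use a higher-order assignment scheme (e.g. cloud-in-cell) and the full $512^3$ grid quoted in the text; the structure of the calculation is the same.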
The results are depicted in Figure 2, where we show (upper panel) the comparison of the overdensity distribution among the standard simulations (the input, red dashed-dotted line), the massive neutrino simulations (the target, black dashed line) and our D³M model (blue line). The bottom panel of Figure 2 shows the residual of the neural network with respect to the target. The residual is approximately 0 for $0.3 < \delta < 6$ and differs from zero by 15% and 6% at $\delta \approx 0.15$ and $\delta \approx 9$, respectively. Those results indicate good agreement between the D³M and the target.

Power Spectrum
The most widely used statistic in cosmology to extract information from cosmological observations is the 2-point correlation function $\xi(r)$, defined as the excess probability, compared with a random distribution of galaxies, of finding a pair of galaxies at a given separation. The Fourier transform of the 2-point correlation function is the power spectrum, $P(k)$:
$$\langle \delta(\vec{k})\, \delta^*(\vec{k}') \rangle = (2\pi)^3 P(k)\, \delta_D(\vec{k} - \vec{k}'),$$
where $k$ is the wavenumber of a fluctuation, related to the wavelength $\lambda$ by $k = 2\pi/\lambda$, $\delta(\vec{k})$ is the Fourier transform of the overdensity at position $\vec{r}$, defined as $\delta = \rho/\bar{\rho} - 1$ with $\bar{\rho}$ denoting the mean density, and $\delta_D$ is the Dirac delta function. The power spectrum is the quantity predicted directly by theories for the formation of large-scale structure, and in the case of a Gaussian density field, it provides a complete statistical description of its fluctuations.
In order to quantify the performance of our model against the target, we compute the transfer function $T(k)$, defined as the square root of the ratio of the two power spectra:
$$T(k) = \sqrt{\frac{P_{\rm pred}(k)}{P_{\rm target}(k)}},$$
where $P_{\rm pred}(k)$ and $P_{\rm target}(k)$ are the power spectra of the density field predicted by the D³M and of the target, respectively.
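A spherically averaged $P(k)$ estimator and the transfer function can be sketched as follows. This is a simplified illustration under our own conventions (no mass-assignment deconvolution or shot-noise subtraction), not the estimator used in the paper:

```python
import numpy as np

def power_spectrum(delta, box=1000.0, n_bins=20):
    """Spherically averaged P(k) of an overdensity grid (sketch)."""
    grid = delta.shape[0]
    # FFT with volume normalization, then |delta_k|^2 / V.
    dk = np.fft.fftn(delta) * (box / grid) ** 3
    pk3d = (np.abs(dk) ** 2 / box ** 3).ravel()
    k = 2 * np.pi * np.fft.fftfreq(grid, d=box / grid)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2).ravel()
    # Average |delta_k|^2 in spherical shells of |k|.
    kbins = np.linspace(2 * np.pi / box, k.max(), n_bins + 1)
    which = np.digitize(kmag, kbins)
    pk = np.array([pk3d[which == i].mean() if np.any(which == i) else 0.0
                   for i in range(1, n_bins + 1)])
    return 0.5 * (kbins[1:] + kbins[:-1]), pk

def transfer_function(pk_pred, pk_target):
    # T(k) = sqrt(P_pred / P_target); unity means perfect agreement.
    return np.sqrt(pk_pred / pk_target)
```

For example, a prediction whose power is four times the target's at some $k$ gives $T(k) = 2$ there.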
Figure 3 shows the average power spectrum and the transfer function of the density field over ten simulations. The red and black lines in the top panel correspond to the power spectra of N-body simulations without and with massive neutrinos. We note that massive neutrinos induce a suppression of the power spectrum on small scales (larger $k$). This is the most significant effect of neutrino masses on cosmological observables. Neutrinos, being hot thermal relics, possess large velocity dispersions. Consequently, the non-relativistic neutrino overdensities only cluster on scales larger than their free-streaming length, reducing the growth of cold dark matter fluctuations on small scales.
The blue and green lines depict the power spectra from the D³M and the benchmark model (Zel'dovich approximation). Notice that the latter reproduces the power spectrum accurately only on very large scales ($k < 0.05$ h/Mpc). This is what we expect, since we are using the Zel'dovich approximation and not 2LPT (see subsection 3.1).
We find that the density distribution from the D³M sample has, on large scales, an average power spectrum which is very close in amplitude and shape to that of the N-body simulations with neutrinos (the target). On smaller scales, the power spectrum of our model departs from that of the target. To quantify this deviation, we focus on the bottom panel of Figure 3, which shows the transfer function as a function of wavenumber for the benchmark model, the D³M and the input. For an entirely accurate prediction, the transfer function would be 1. The transfer function of the ZA is close to one up to $k < 0.05$ h/Mpc and then decreases, as expected. On the other hand, the transfer function of the D³M prediction is approximately 1 for $0.03 < k < 0.5$ h/Mpc, differs from unity by 3.5% on scales $k \approx 0.55$ h/Mpc, and the discrepancy increases to 6.3% as $k$ approaches 0.7 h/Mpc. Those results suggest that our model manages to accurately predict the power spectrum with massive neutrinos from large to intermediate scales. On smaller scales ($k > 0.7$ h/Mpc) the prediction starts to deviate from the target. This is not a surprise, since those are the scales where the effects of non-linear evolution are important and the mapping becomes highly non-trivial.

Bispectrum
The inflationary scenario predicts Gaussian initial conditions. In that case, the statistical properties of the Universe's density field should be fully characterized by the 2-point correlation function (2PCF) or the power spectrum. However, on small scales, non-linear gravitational instability induces non-Gaussian signatures in the mass distribution, which contain information on the nature of gravity and the dark matter. The lowest-order statistical tool to describe a non-Gaussian field is the three-point correlation function (3PCF) or, equivalently, its Fourier transform, the bispectrum, defined as
$$\langle \delta(\vec{k}_1)\, \delta(\vec{k}_2)\, \delta(\vec{k}_3) \rangle = (2\pi)^3\, B(k_1, k_2, k_3)\, \delta_D(\vec{k}_1 + \vec{k}_2 + \vec{k}_3).$$
The bispectrum plays an important role in cosmology because it carries information about the spatial coherence of large-scale structures and can place strong constraints on models of structure formation. The presence of massive neutrinos induces a suppression of the amplitude of the bispectrum (Ruggeri et al. 2018). Furthermore, Hahn et al. (2019) have shown that the bispectrum is a powerful probe for constraining the total neutrino mass, since it can break the degeneracies between $M_\nu$ and the other cosmological parameters.
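For a single triangle configuration $(k_1, k_2, \theta)$, an FFT-based estimator in the spirit of the standard shell-averaging approach can be sketched as follows. This is our own simplified illustration, not the paper's estimator: the binning width, normalization convention, and function names are assumptions.

```python
import numpy as np

def bispectrum(delta, k1, k2, theta, dk=0.05, box=1000.0):
    """Shell-averaged bispectrum for one triangle shape (sketch)."""
    grid = delta.shape[0]
    k = 2 * np.pi * np.fft.fftfreq(grid, d=box / grid)
    kx, ky, kz = np.meshgrid(k, k, k, indexing="ij")
    kmag = np.sqrt(kx**2 + ky**2 + kz**2)
    # Closing side of the triangle with angle theta between k1 and k2.
    k3 = np.sqrt(k1**2 + k2**2 + 2 * k1 * k2 * np.cos(theta))
    dk_field = np.fft.fftn(delta)

    def shell(kc):
        # Real-space field keeping only modes with |k| in a thin shell,
        # plus the same for a unit field (to count closed triangles).
        mask = np.abs(kmag - kc) < dk / 2
        return (np.fft.ifftn(dk_field * mask).real,
                np.fft.ifftn(mask.astype(float)).real)

    d1, n1 = shell(k1)
    d2, n2 = shell(k2)
    d3, n3 = shell(k3)
    n_tri = np.sum(n1 * n2 * n3)
    if n_tri == 0:
        return 0.0
    # Mean product over closed triangles, converted to physical units
    # (simplified FFT-convention factor; an assumption of this sketch).
    return np.sum(d1 * d2 * d3) / n_tri * box**6 / grid**9
```

The product of the three shell-filtered real-space fields, summed over the grid, picks out exactly the Fourier triplets satisfying $\vec{k}_1 + \vec{k}_2 + \vec{k}_3 = 0$, which is why this avoids an explicit loop over mode triplets.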
Figure 4 shows the average bispectrum over ten simulations for our D³M (blue line), the benchmark model (green line), the input (red line) and the target (black line). We have selected values of $k_1 = 0.15$ h/Mpc and $k_2 = 0.25$ h/Mpc and varied the angle between those two vectors. Those scales correspond to the maximum wavenumbers typically adopted in cosmology to infer the cosmological parameters from galaxy power spectrum measurements.
We find that the mean relative bispectrum residual of our model compared to the target is 0.4%. This suggests that the D³M model can generate neutrino simulations with correct higher-order statistics in the mildly non-linear regime. Notice that, on the other hand, the benchmark model is not able to reproduce the bispectrum of the target. This is not surprising, as the validity of the Zel'dovich approximation is limited to large scales.

Void Abundance
Voids are the most under-dense regions of our Universe and, together with halos, filaments, and walls, constitute the large-scale structure of the Universe, known as the cosmic web. Voids enclose a large amount of cosmological information, since they fill most of the volume of the Universe. Massive neutrinos modify the properties of cosmic voids, such as their number, size and shape. This occurs because the large thermal velocities of massive neutrinos prevent their evacuation from cosmic voids. Consequently, the additional mass within voids affects their evolution.
The last statistical tool we consider in this section is the void size function, defined as the abundance of voids as a function of the void radius. In the presence of massive neutrinos, the abundance of large voids is suppressed. This occurs because the additional neutrino mass inside the voids slows down their evolution and makes voids smaller and denser (Massara et al. 2015; Kreisch et al. 2019; Pisani et al. 2019). Here we use the algorithm described in Banerjee & Dalal (2016) to identify voids in the galaxy distribution of our D³M, the target and the input. In the upper panel of Figure 5, we compare the average void size function over ten realizations from the D³M (blue line), the input (red line), and the target (black line). We can see the suppression of the void abundance induced by massive neutrinos (blue and black lines) at larger radii. The bottom panel shows the transfer function as a function of radius for the input and the neural network model. We note the good agreement in the void size function between the D³M and the target up to $R \approx 20\ h^{-1}$ Mpc. The discrepancy from unity grows to about 8% as the radius increases towards 30 $h^{-1}$ Mpc. This indicates that our approach is also able to capture void information at the relevant scales.

Additional tests
Although our model reproduces the target's power spectrum within 6.3% up to scales $k \approx 0.7$ h/Mpc, the accuracy degrades on smaller non-linear scales. In order to improve the model's efficiency at those scales, we perform some additional tests. We first tune the hyperparameters of the neural network. In particular, we increase the number of layers, change the learning rate and the batch size, and apply the mean square error (MSE) loss function.
We then test different convolutional neural network architectures. Specifically, we implement the D³M by including an Inception module after the first block in the downsampling path and before the last block in the expansive path. The Inception network (Szegedy et al. 2014) uses convolutions of different sizes to capture details at varied scales. In our case, we are interested in extracting detailed features on smaller scales. This motivates the use of the Inception module at the beginning and at the end of the D³M. In particular, we use the original version of the Inception architecture, with 3 different filter sizes (1 × 1, 3 × 3, 5 × 5) and max pooling layers. The outputs are then concatenated and passed as input to the second/last block of the D³M.
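A naive Inception-style block for 3-D fields, in the spirit of Szegedy et al. (2014), can be sketched in PyTorch as follows. This is an illustration, not the exact module used in our tests; the channel counts and the 1 × 1 projection after the pooling branch are our assumptions.

```python
import torch
import torch.nn as nn

class Inception3D(nn.Module):
    """Parallel 1x1x1, 3x3x3, 5x5x5 convolutions plus a max-pool
    branch, concatenated along the channel dimension (sketch)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        branch = out_ch // 4  # assume out_ch divisible by 4
        self.b1 = nn.Conv3d(in_ch, branch, kernel_size=1)
        self.b3 = nn.Conv3d(in_ch, branch, kernel_size=3, padding=1)
        self.b5 = nn.Conv3d(in_ch, branch, kernel_size=5, padding=2)
        self.pool = nn.Sequential(
            nn.MaxPool3d(kernel_size=3, stride=1, padding=1),
            nn.Conv3d(in_ch, branch, kernel_size=1))

    def forward(self, x):
        # All branches preserve the spatial size, so their outputs
        # can be concatenated on the channel axis.
        return torch.cat([self.b1(x), self.b3(x), self.b5(x),
                          self.pool(x)], dim=1)
```

The different kernel sizes let the block respond to features at several spatial scales at once, which is the motivation stated above for placing it at the entry and exit of the network.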
The results are depicted in Figure 6. We compare the transfer function of the additional tests we performed with that of the D³M architecture used in our analysis. We can see that these tests do not significantly improve the performance of our model on smaller scales. We only find a slight improvement on larger scales when we consider the MSE loss function or a deeper D³M architecture. Moreover, the agreement on small scales is compromised in favor of the agreement on large scales when we use a deeper D³M model. This demonstrates the stability of our model and shows that more accurate trials are needed to improve the predictions on non-linear scales. We leave this exploration to future work. It is lastly worth mentioning that, in order to speed up the learning process of our D³M architecture, we parallelized the training across the data dimension using two GPUs. In particular, to train on multiple GPUs we used PyTorch's data-parallelism techniques. In this scheme, the two GPUs synchronize the model parameters (the weights) to ensure that they are training a consistent model. Unfortunately, such data parallelism suffers from excessive inter-GPU communication overheads due to frequent weight synchronization between GPUs. This process introduces some systematic effects during the training that affect the final prediction (see, for example, the orange dotted line in Figure 6). The development of new parallelization approaches to speed up the training process in deep neural networks and improve the predictions is a major area of research in machine learning (Chen et al. 2018; Krizhevsky 2014; Lin et al. 2018) and is well beyond the scope of our paper.
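The data-parallel setup described above corresponds to PyTorch's `nn.DataParallel`, which splits each batch along the batch dimension across the available devices and synchronizes gradients. The sketch below uses a small placeholder model (an assumption, standing in for the D³M) and falls back to the CPU when no GPUs are present:

```python
import torch
import torch.nn as nn

# Placeholder model standing in for the trained D3M network.
model = nn.Sequential(
    nn.Conv3d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv3d(8, 3, kernel_size=3, padding=1))

# Wrap in DataParallel when two or more GPUs are available:
# each replica runs forward/backward on its slice of the batch,
# and weights are kept synchronized across devices.
if torch.cuda.device_count() > 1:
    model = nn.DataParallel(model, device_ids=[0, 1])
if torch.cuda.is_available():
    model = model.cuda()
```

The frequent weight synchronization this implies is the source of the inter-GPU communication overhead discussed above.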

CONCLUSIONS
Upcoming galaxy surveys, such as Euclid, DESI, DES or LSST, are expected to reach the sensitivity necessary to detect the absolute scale of neutrino masses. To achieve this goal, precise theory predictions are required to analyze the data from those surveys and to understand and model the impact of non-linearities on the matter power spectrum, galaxy bias, and redshift-space distortions in the presence of massive neutrinos. A powerful way to obtain accurate theory predictions is by running numerical cosmological simulations. However, producing those simulations requires high computational resources.
In this work, we have presented a deep learning approach to quickly generate non-standard cosmological simulations with massive neutrinos from standard N-body simulations. We have quantified the performance of our D³M model by considering four summary statistics often used in cosmology: the 1-D PDF, the power spectrum, the bispectrum, and the void abundance. We have shown that our model is able to learn the mapping between the displacement fields in simulations with massless and massive neutrinos, down to scales as small as $k = 0.7$ h/Mpc at $z = 0$, and to reproduce the effect of massive neutrinos on the large-scale structure. Moreover, with our method we can generate massive neutrino simulations several orders of magnitude faster than with the traditional methods, dramatically reducing the computational cost of producing simulations with massive neutrinos.
This work demonstrates that the deep learning approach is an accurate alternative to the traditional N-body techniques, mapping simulations with massive neutrinos directly from simulations with massless neutrinos. The use of such an approach will be particularly valuable in the near future: the need for fast N-body simulations will grow with the upcoming large cosmological surveys such as Euclid and LSST. Our method will thus be crucial for producing fast and large simulation datasets for cosmological analyses.
In future work, we will improve our model to capture the effect of massive neutrinos on smaller non-linear scales by using the QUIJOTE high-resolution simulations (Villaescusa-Navarro et al. 2019). Another direction will be to explore the possibility of generating fast neutrino simulations with an arbitrary mass from standard ΛCDM simulations, or more complex non-standard cosmological simulations that include, for example, modified gravity or primordial non-Gaussianity effects (Peel et al. 2019; Novaes et al. 2015). We leave the investigation of these different effects for future work.
We note that, recently, Zennaro et al. (2019) have presented a new method to rescale massless neutrino simulations into massive neutrino simulations, following a very different approach from the one used in this work. Using the output of that method as the input of our neural network has the potential to improve the accuracy of both our method and the Zennaro et al. (2019) results, as neural networks train faster on residuals. We plan to study this in future work.
Flatiron Institute for his help in solving technical problems with GPUs. The work of EG, FVN, SH and SH is supported by the Simons Foundation. The HADES and QUIJOTE simulations are publicly available at https://franciscovillaescusa.github.io/hades.html and https://github.com/franciscovillaescusa/Quijote-simulations, respectively. The analysis of the simulations and the training of the neural network have been carried out on the Rusty cluster of the Flatiron Institute.

APPENDIX A: METHOD
In this appendix, we give a brief description of convolutional neural networks and of the D³M architecture used in this work.

A.1. Convolutional neural networks
An artificial neural network is an interconnected group of nodes (neurons) that combine to learn complex patterns. Neural networks are modeled after biological neural systems and attempt to allow computers to learn in a manner similar to humans. An artificial neuron receives one or more inputs $x_i$, multiplies them by a set of connection weights $w_i$, and adds a bias $b$. The result of this operation is called the net input:
$$z = \sum_i w_i x_i + b.$$
The output is then calculated by applying the activation function to the net input. The activation function is a non-linear function that determines whether the neuron fires or not. There exist different types of activation functions, but the Rectified Linear Unit (ReLU) is the most widely used in neural networks nowadays. The ReLU activation function is defined as
$$f(z) = \max(0, z).$$
One of the greatest advantages of ReLU is that it does not activate all neurons at the same time: it converts all negative inputs to zero, so the corresponding neurons do not fire. This makes it more computationally efficient than other activation functions.
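The two formulas above combine into a single artificial neuron, sketched here in NumPy (a minimal illustration; the function name is ours):

```python
import numpy as np

def neuron(x, w, b):
    """Net input z = sum_i w_i x_i + b, followed by ReLU activation."""
    z = np.dot(w, x) + b
    # ReLU: negative net inputs are clipped to zero.
    return np.maximum(0.0, z)
```

For instance, inputs (1, 2) with weights (0.5, 0.5) and bias −2 give a net input of −0.5, so the neuron does not fire and outputs 0.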
A particular class of neural networks that can be used to learn spatial features is Convolutional Neural Networks (CNNs). A CNN consists of an input and an output layer, as well as multiple hidden layers. The input consists of images, which computers read as pixels and express as matrices. A convolutional layer contains a set of learnable parameters (also known as filters or kernels). A filter is used to detect the presence of specific features or patterns in the input. The filter is convolved across the input, and a dot product is computed to give an activation map. The feature maps produced by a convolutional layer are then passed to the next layer in the CNN.
One of the main problems we can encounter when training learning systems is a change in the distribution of the outputs of each hidden layer. Each layer's input is affected by the model parameters of all preceding layers. As a consequence, a small change made within one layer can produce a large variation in the inputs of later layers and change the feature-map distribution. This means that later layers need to continuously adapt to the new distribution produced by the previous ones, which can slow down the convergence of the learning algorithm. In order to improve the performance of the neural network, we can make use of the batch normalization technique (Ioffe & Szegedy 2015). Batch normalization (BN) consists in normalizing the output of a previous activation layer during training, so that the input to the activation function across each training batch has a fixed mean (0) and variance (1). Instead of using the entire data set to normalize the activations, the normalization is restricted to each mini-batch in the training process.
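The per-mini-batch standardization described above can be written in a few lines of NumPy. This is a minimal training-mode sketch (no running statistics for inference, and scalar `gamma`/`beta` instead of learned per-feature parameters):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Batch normalization over a mini-batch (training mode):
    each feature column of x (shape: batch x features) is
    standardized to mean 0 / variance 1, then rescaled by the
    learnable parameters gamma and beta."""
    mu = x.mean(axis=0)
    var = x.var(axis=0)
    x_hat = (x - mu) / np.sqrt(var + eps)
    return gamma * x_hat + beta
```

In a full implementation, `gamma` and `beta` are trained alongside the network weights, and running averages of `mu` and `var` are kept for use at inference time.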
A.2. The D$^3$M Architecture

Following He et al. (2019), we use a D$^3$M architecture that consists of three parts: a contracting path, a bottleneck, and an expansive path. Along the contracting path, the spatial resolution is reduced while the feature information is increased. The expansive path increases the spatial resolution and combines it with the feature information. The bottleneck, located between the contracting and expansive paths, allows the model to learn a compressed representation of the input data.
The contracting and expansive paths each consist of blocks of two $3^3$ convolutional layers, while the bottom of the U-Net consists of two convolutional layers. Every convolution is followed by zero padding, BN, and ReLU. As in the U-Net, at all resolution levels except the bottleneck, we concatenate the input of the downsampling layers to the output of the upsampling layers. This transfers the small-scale spatial information from the contracting path to the expansive path, which would otherwise have poor spatial resolution. At the final layer, a $1^3$ convolution is used to map the 64 features to the final 3-D data. We refer the reader to He et al. (2019) and Ronneberger et al. (2015) for a more detailed description of the architecture.

In order to test the robustness of the D$^3$M, we train our neural network on a different neutrino mass case. In particular, while the input still consists of standard simulations without neutrinos, the new target consists of neutrino simulations with mass $M_\nu = 0.6$ eV. The results are depicted in Figure 7, where we show the power spectrum and the transfer function for the input, the target, and the prediction of the neural network. For this neutrino mass case as well, we note very good agreement in the power spectrum between the target and the D$^3$M on large scales and a deviation from the target on small, non-linear scales. In particular, the transfer function is approximately 1 for $0.03 < k < 0.45$ h/Mpc and deviates from unity by about 8% from scales $k \approx 0.5$ h/Mpc. These results suggest that our model is applicable independently of the neutrino mass, ensuring the robustness of our approach.
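For concreteness, the transfer function quoted above and in the figures can be computed directly from two measured power spectra; here we assume the common convention $T(k) = \sqrt{P_{\rm pred}(k)/P_{\rm target}(k)}$, and the spectra below are toy values, not measurements from the simulations:

```python
import numpy as np

def transfer_function(p_pred, p_target):
    # T(k) = sqrt(P_pred / P_target); T(k) = 1 means the predicted
    # clustering matches the target at that scale.
    return np.sqrt(np.asarray(p_pred) / np.asarray(p_target))

k = np.array([0.05, 0.1, 0.3, 0.5])                # wavenumbers in h/Mpc (toy)
p_target = np.array([1.0e4, 8.0e3, 2.0e3, 9.0e2])  # toy target spectrum
p_pred = p_target * np.array([1.0, 1.0, 1.0404, 1.1664])
print(transfer_function(p_pred, p_target))  # -> [1.   1.   1.02 1.08]
```

A growing departure of $T(k)$ from unity toward large $k$, as in this toy example, is the signature of the small-scale deviations discussed in the text.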
Model and Training

We use a modified version of the deep neural network Deep Density Displacement Model (D$^3$M), introduced by He et al. (2019), with 15 convolution plus deconvolution layers. D$^3$M is a generalization of the standard U-Net, first proposed by Ronneberger et al. (2015) for biomedical image segmentation, to work with three-dimensional data. A brief description of the architecture is given in Appendix A.

Figure 1. The top left panel shows the cold dark matter density field at z = 0 from a region of 1000×1000×15 (h/Mpc)$^3$ for a massive neutrino model ($M_\nu = 0.15$ eV). The top-center panel shows the residuals between the target (massive neutrino simulations) and the simulations with massless neutrinos (the input); the top-right panel depicts the residuals between the target and the D$^3$M. The two bottom panels illustrate the PDF of the residual density field, as described in the text.

Figure 2. The top panel shows the normalized probability distribution function of CDM overdensities. The red and black lines show the overdensity distribution for N-body simulations with massless neutrinos (the input of our D$^3$M) and massive neutrinos (the target), respectively. The blue line depicts the D$^3$M results. The bottom panel illustrates the residuals of the D$^3$M with respect to the simulations with massive neutrinos.

Figure 3. The top panel shows the matter power spectrum comparison among the standard N-body simulations (the input of the D$^3$M, red dashed-dotted line), the massive neutrino simulations (the target, black dashed line), the CNN model dubbed D$^3$M (blue line), and the Zel'dovich approximation (the benchmark model, green dotted line). The bottom panel depicts the transfer function for the input, the D$^3$M, and the benchmark model.

Figure 4. The plot shows the bispectrum predicted by the standard $\Lambda$CDM simulations (the input of our architecture, red dashed-dotted line), the massive neutrino simulations (the target, black dashed line), the CNN model (blue line), and the Zel'dovich approximation (the benchmark model, green dotted line). We set $k_1 = 0.15$ h/Mpc and $k_2 = 0.25$ h/Mpc. Notice that the bispectrum of the D$^3$M model is very close to that of the target.

Figure 5. The upper panel shows the void size function for the input (standard N-body simulations, red dashed-dotted line), the target (massive neutrino simulations, black dashed line), and the D$^3$M (blue line). The bottom panel depicts the transfer function for the input and the neural network model.

Figure 6. Transfer function comparison among the D$^3$M (blue line) used in the analysis and the modified versions of the architecture obtained by including the Inception module (pink dashed-dotted line), varying the loss function (grey dashed line), and roughly doubling the number of layers of the D$^3$M (orange dotted line).

Figure 7. The top panel shows the matter power spectrum comparison among the standard N-body simulations (red dashed-dotted line), the new target (black dashed line), and the prediction (magenta line). The bottom panel depicts the transfer function for the input and the D$^3$M.