Fast production of cosmological emulators in modified gravity: the matter power spectrum

We test the convergence of fast simulations based on the COmoving Lagrangian Acceleration (COLA) method for predictions of the matter power spectrum, specialising our analysis in the redshift range 1 ≤ z ≤ 1.65, relevant to high-redshift spectroscopic galaxy surveys. We then focus on the enhancement of the matter power spectrum in modified gravity (MG), the boost factor, using the Dvali-Gabadadze-Porrati (DGP) theory as a test case but developing a general approach that can be applied to other MG theories. After identifying the minimal simulation requirements for accurate DGP boost factors, we design and produce a COLA simulation suite that we use to train a neural network emulator for the DGP boost factor. Using MG-AREPO simulations as a reference, we estimate the emulator accuracy to be of ∼ 3% up to k = 5 h Mpc-1 at 0 ≤ z ≤ 2. We make the emulator publicly available at: https://github.com/BartolomeoF/nDGPemu.


Introduction
With the successful launch of the Euclid1 satellite and the DESI2 collaboration releasing their 1% survey data [1], cosmology is living a very exciting time.The extraordinarily large amount of galaxies that these surveys will observe is fuelling the efforts of the scientific community aimed at accurately modelling the non-linear regime of structure formation to obtain precise cosmological constraints.
One of the key objectives of next-generation surveys is to shed light on the nature of gravity by constraining alternative gravity models.The simplest modified gravity (MG) theories can be described by means of a fifth force mediated by an additional scalar field.Given the stringent constraints coming from solar system tests of gravity, MG theories of interest to cosmology incorporate screening mechanisms which suppress the fifth force in specific environments.Hence, the large-scale structure (LSS) of the Universe provides a unique laboratory to test the validity of GR and explore alternative gravity models.By comparing LSS probes with theoretical predictions, we can place constraints on the deviations from GR and potentially uncover new physics.
Yet, accessing the non-linear regime of structure formation presents significant challenges.Analytical techniques, such as perturbation theory, break down in this regime, and numerical simulations become essential.Cosmological simulations play a vital role in studying the formation and evolution of structures, enabling us to investigate the complex interplay between gravity, dark matter (DM), and baryonic physics.To simulate structure formation on the smallest scales, DM-only N-body simulations are not enough and it is necessary to run hydrodynamical simulations which incorporate the effects of baryonic matter.However, despite the advances in simulating baryonic physics, hydrodynamical simulations show important discrepancies in their small-scales predictions [2], so weak lensing data below baryonic-feedback scales is discarded in cosmological analysis (e.g.see [3,4]).
Furthermore, cosmological simulations are computationally demanding, requiring substantial computational resources and time.To overcome these computational challenges, the development of efficient methods to speed up simulations has been a topic of active research.One such method is the COmoving Lagrangian Acceleration (COLA) method [5], which combines second-order Lagrangian Perturbation Theory (2LPT) with the Particle Mesh N-body technique to accurately reproduce the non-linear dynamics of structure formation while significantly reducing computational costs.In particular, COLA exploits the fact that long-range forces perceived by the N-body particles in a frame comoving with the 2LPT solution are weaker than the ones perceived in the Eulerian frame.This allows COLA simulations to use a coarser time-resolution while providing an accurate description of the matter distribution on large scales [5,6].An interesting alternative approach recently developed in [7], proposes the use of non-symplectic integrators inspired by LPT to recover the correct large-scale evolution in PM simulations without the need to explicitly compute the 2LPT solution.
The potential of the COLA method was already exploited in [8] to produce mock galaxy catalogues for the analysis of the WiggleZ survey [9], where it was shown that COLA simulations can be used to create synthetic galaxy catalogues with comparable accuracy to the full N-body code Gadget [10] on scales k ≤ 0.2 h Mpc −1 .Since then, the COLA method has seen several extensions: • to create more accurate halo catalogues [6], • to produce matter distribution and mock halo catalogues on the light-cone [11,12], • to extend to MG theories the evolution of the matter distribution [13][14][15] and the creation of mock galaxy catalogues [16], • to include the effects of massive neutrinos [17].
Of particular interest to this work is the extension to MG theories presented in [14] and as implemented in the publicly available library FML 3 .This relies on the extension to MG of the 2LPT solutions and to an approximation of the screening mechanisms based on the analytic solution in spherical symmetry [18] which is provided for the Vainshtein and chameleon screening mechanisms.Recently, an alternative technique to approximate the Vainshtein screening [19] has been revised and implemented in a custom version of FML-COLA [20].Despite the speed-up brought by fast simulation methods like COLA, the computational cost of simulations is still too high to allow a fine sampling of the vast theory parameter space that is needed in Monte Carlo searches to infer precise parameter constraints.One possible solution to this problem is to approximate the predictions of cosmological simulations for specific summary statistics making use of machine learning techniques [21,22] or fitting functions (see for instance [23]).The resulting approximating functions are normally referred to as emulators in the field and their key feature is that they provide theoretical predictions at extremely small computational cost.Accurate emulators for the matter power spectrum, one of the most elementary summary statistics that can be predicted thanks to simulations, are available for standard cosmologies [24,25].Matter power spectrum emulators have been developed also in MG gravity theories such as [26][27][28][29][30] in Hu-Sawicki f (R) model [31], and [32] in the Dvali-Gabadadze-Porrati (DGP) model [33].Interesting alternative approaches based on the halo model were also developed to predict the matter power spectrum in MG theories [34][35][36].
Due to the many subtleties that characterise N-body simulations and to the approximate nature of emulators, these tools need to be validated.This can be done for example by means of comparisons between different approaches [24,25,37].In the case of N-body simulations, it is also important to check that the agreement with the other codes is not just due to a coincidence and that the predictions converge to the true answer when increasing resolution.
In this work we push the COLA method to its limits to study the convergence of COLA predictions for the matter power spectrum and its enhancement in MG, the boost factor.We employ the results of this analysis to design and produce an emulator for the matter power spectrum in DGP gravity that we publicly release together with this paper.While in this work we focus on DGP gravity, the framework that we develop is quite general and can be applied to other MG theories.
The remainder of this paper is organised as follows.In section 2 we provide an in-depth analysis of the convergence and validation of COLA simulations for prediction of the DM power spectrum in GR.In section 3 we extend the convergence analysis to the nDGP boost factor for the DM power spectrum and present an emulator for this signal trained on COLA simulations.Finally, we come to our conclusions in section 4.

Convergence tests
Cosmological simulations are subject to errors due to the finite resolution characterising their numerical implementation.In the case of N-body simulations, time-resolution, forceresolution and mass-resolution are important quantities which control the simulation accuracy [24,38,39].To validate simulation results it is necessary to compare the output of simulations against well-established results like analytic predictions or previous simulations already validated (e.g., in [16,40] we validated COLA simulations by comparing them with ELEPHANT simulations).Furthermore, the self-consistency of a simulation technique can be established by performing relative convergence tests, where simulations with different resolutions are compared to understand the reliability of specific simulation settings.
The COLA method exploits the 2LPT and the Particle-Mesh (PM) method to evolve the DM distribution.The use of 2LPT allows us to reduce the time resolution of simulations without losing accuracy on large scales [6].Conversely, increasing the time resolution makes the use of 2LPT less important and with sufficient time resolution, the simulations evolve like PM-only simulations.In this sense, we will refer to COLA simulations with high time resolution as PM simulations in the following sections.This equivalence is important since the accuracy of PM simulations (and therefore of COLA simulations in ΛCDM) is only limited by resolution and they converge to full N-body simulations results with increasing resolution [41].However, when comparing COLA with PM-only algorithms, it is not fair to compare their computational cost using the same number of time-steps as done in [38].In fact, COLA does not necessarily need as many time-steps as PM-only simulations to achieve comparable accuracy.This is evident in the low time-resolution case, where the 2LPT guarantees that COLA provides accurate results on large scales while PM-only simulations already show significant deviations [6].Even in the high time-resolution case, COLA can achieve the same accuracy as PM-only codes with fewer time-steps.Indeed, the latter needs a finer timestepping at high redshift to deal with the rapidly changing growth factor [38] while the 2LPT guarantees good accuracy at high redshift in COLA even with larger time-steps [5,6].For instance, the PM code GLAM needs 147 time-steps to have ∆a/a ≈ 0.014 at z = 0 [38] while COLA needs only ≈ 70 time-steps for the same low-redshift time resolution.
The focus of this section is to validate COLA simulations for the predictions of the matter power spectrum.To do so, we first introduce the cosmological emulators in subsection 2.1, Ratio of Bacco and EE2 power spectra for the cosmology defined in eq (2.1).
then we compare COLA results with full N-body simulations and cosmological emulators in subsection 2.2, and finally, we study the relative-convergence of COLA simulations for the matter power spectrum in subsection 2.3.

Cosmological emulators
Cosmological emulators interpolate the results of computationally expensive cosmological simulations in high-dimensional parameter spaces.Two notable such emulators are the Bacco emulator [25] and the Euclid Emulator 2 (EE2) [24].While these emulators differ on many aspects (e.g. the interpolation techniques employed, the parameter space spanned and the N-body techniques used for the training set) they have been shown to be in agreement with each other within the claimed accuracy [24,42] for several test cosmologies.
Here in particular we are interested in the agreement between these two emulators for a specific cosmology, the one that was used to run the N-body simulation suite that we will discuss in subsection 2.2.In figure 1 we show the ratio of the matter power spectra estimated with Bacco and EE2 at redshifts 0, 0.5 and 1 for the above-mentioned cosmology (see eq 2.1).The agreement is within 2% at all the scales considered.At z = 1, the agreement is well within 1%.This is consistent with the accuracy claimed by the two emulators and gives additional reliability to the emulators' results, in particular at redshift z = 1, that we use in the next subsection to prove the accuracy of COLA simulations for predictions of the matter power spectrum.

Comparison of COLA with emulators and N-body simulations
As reference results to compare COLA with, we make use of the power spectra obtained from a suite of DM-only simulations in GR and nDGP gravity4 ran with MG-AREPO [43,44] which uses the tree-particle-mesh technique for gravitational interactions and a moving mesh to solve the hydrodynamic equations (see references [10,45] for more on the N-body techniques at the base of this code).The simulations were started at redshift z = 127 using the cosmological parameters Ω m,0 = 0.3089 , Ω Λ,0 = 0.6911 , Ω b,0 = 0.0486 , n s = 0.9667 , σ 8 = 0.8159 , h = 0.6774 , and evolved down to redshift z = 0 using 1024 3 particles in a (1Gpc/h) 3 box [46].
To assess the accuracy of COLA simulations we produce a suite of simulations in a L = 512Mpc/h box, with the same cosmology of the Arepo simulations and with different mass resolutions (N 1/3 part = 512, 1024, 2048), time resolutions (N step = 25×[1, 2, 3] = 25, 50, 75) and with force resolutions depending on the mass resolutions (N ).We start the simulations at redshift z = 19 and stop them at redshift z = 1.The choice of 25 time-steps as the low time-resolution case is motivated by the time-step size of typical COLA configurations (∆a ≈ 0.02).In the high time-resolution case, where the late-time evolution is PM-like, 75 time-steps produce a time-step size of ∆a ≈ 0.006.This is the same step size of the reference simulations used in the convergence tests of GLAM at z < 3 [38].
In the next subsection, we will use the highest resolution PM run (N 1/3 mesh = 4096, N step = 75) as the reference to assess the accuracy of all the other runs.Before that, we validate the highest resolution run by comparing its matter power spectrum with the ones from the Bacco emulator and from the Arepo simulations.In Figure 2 we plot the ratio of the matter power spectra of PM and Arepo simulations with the Bacco power spectrum for the cosmology of eq.(2.1), in GR at redshift 1.The first thing that comes to the eye, is that the Arepo power spectrum (green line) departs from Bacco already at k ∼ 0.6h/Mpc, and at 0.02 0.05 0.1 0.2 0. k ∼ 1h/Mpc the discrepancy is already at 4%.We attribute this to the low mass resolutions of these Arepo simulations (M part ≈ 8 • 10 10 h −1 M ⊙ ) [24], and to motivate this claim we add to the comparison the PM simulation for N 1/3 mesh = 2048, N step = 75 (blue line) which has a similar mass resolution of Arepo simulations.This is in good agreement with Arepo up to k ∼ 2h/Mpc.The blue line instead shows that the higher resolution PM run is in sub-percent agreement with Bacco up to k ∼ 2 h Mpc −1 .

Relative convergence of COLA simulations
With all the simulations of the convergence suite, we compare the accuracy of all the different settings using the high-resolution run as the reference.To summarise the results, we show in different figures different mass resolutions, figure 3  respectively on the left and on the right 5 .In each panel, we show the ratio of the power spectra in the COLA run with that of the high-resolution run for the various force and time resolutions described in the legends .
For the N 1/3 part = 512 case, we notice that the results are almost insensitive to the time resolution and do not converge towards the high-resolution result.This is attributed to the fact that the low mass resolution has a strong effect on the scales considered in the comparison, especially for higher redshifts.In addition to this, we notice that the convergence is nonmonotonic: the fact that N 1/3 mesh = 1024 results have significantly larger power than the ones of N 1/3 mesh = 1536 and N 1/3 mesh = 2048 is somehow surprising.From additional tests that we have run, we could identify this behaviour as being due to discreteness effects happening at high redshifts (see [48,49] for recent progress on this topic) when the particles' displacements from the Fourier grid are small in combination with the continuous Fourier space kernels used for the force computation in the PM scheme 6 .While this effect is moderate when starting the simulations at z = 19, it becomes more important when starting the simulations at higher redshift.
For the N 1/3 part = 1024 and 2048 cases, the convergence towards the high-resolution PM result happens only if the force and time resolutions are increased accordingly.This condition is particularly evident in the simulations with N 1/3 part = 1024, where increasing the number of time steps from 25 to 75 keeping the fixed force resolution of N 1/3 mesh = 1024 does not improve the convergence toward the high-resolution run.Similarly, increasing the force resolution above N 1/3 mesh = 2048 but using only 25 time-steps does not improve the agreement with the highest-resolution run.We notice that, in the N 1/3 part = 1024 case, the results for high force and time resolution appear to be more converged at low redshift than at high redshift: this is due to the fact that discreteness effects are more important at early times while force resolution becomes more important at late times.
To assess the convergence of the highest-resolution run used as a reference in the previous figures, we compare in figure 6 the power spectra of lower-resolution simulations with the highest-resolution one, as done previously, but now increasing mass, force and time resolution simultaneously.The agreement between the N  and the highest-resolution run indicates that the latter is converged at sub-percent level up to k ∼ 1 h Mpc −1 .This is in line with the result of the comparison with the Bacco emulator.

Modified gravity boost factor
Another interesting application of COLA simulations is to predict the matter power spectrum in MG [26].As non-linear matter power spectrum emulators are already available, it is possible to obtain the matter power spectrum in MG by multiplying the matter power spectrum in GR with the MG boost factor In general, the MG boost factor depends on MG parameters as well as on cosmological parameters and redshift.In the following, we focus on the DGP model and use COLA simulations to train an emulator for the matter power spectrum in this gravity theory.Nonetheless, the method that we develop is not specific to the DGP model and can be used to produce emulators for other MG theories.
The DGP gravity is a braneworld model in 5-dimensions, which has been thoroughly studied by the modified gravity community due to its interesting cosmological features [51][52][53].This theory has 2 branches of solutions for the Hubble expansion on the brane [51]: a self-accelerating branch with ghost instabilities on a de Sitter background [52,54], and a normal branch which requires fine-tuning in the form of an effective dark energy to recover the observed ΛCDM-like expansion history [55,56].Nonetheless, the normal branch of the DGP theory (nDGP) also presents a non-trivial fifth force dynamics which can be described by the Vainshtein mechanism [57], so this theory was often used as a test-case representative for the class of Vainshtein-screened theories.In particular, several N-body codes were developed to simulate non-linear structure formation in nDGP theory (e.g.DGPM [55], ECOSMOG [58], MG-AREPO [59], MG-GLAM [41]) and two of them, DGPM and ECOSMOG, were showed to give consistent prediction for the matter power spectrum boost-factor in [60].
To develop the nDGP emulator, we start by studying the simulation requirements to accurately predict the nDGP boost factor with COLA and investigate the sensitivity of the boost factor to the theory parameters in subsection 3.1.Then we present the simulations suite and the data processing used to create the emulator's data sets in subsection 3.2 and finally, we discuss the emulator's training and performance in subsection 3.3.

Accuracy and sensitivity
The computational cost of cosmological simulations depends on several parameters.In particular, it is very sensitive to volume and resolution.With this in mind, we want to find the optimal simulation settings that allow us to have the target accuracy while probing large enough scales but without requiring excessive computational resources.
To define the simulation requirements for the nDGP emulator, we start by assessing the accuracy of COLA in predicting the boost factor of nDGP theories against Arepo simulations.To this extent, we run high-resolution PM simulations7 with the same cosmology of Arepo simulations for several values of the nDGP parameter, H 0 r c .We simulate the evolution of 2048 3 particles in a box of side L = 1024 h −1 Mpc using 6048 3 mesh-grids and 75 time-steps from redshift z ini = 127 to z fin = 1.The mass resolutions that we use in the PM simulations is M part ∼ 1 • 10 10 h −1 M ⊙ , roughly 8 times higher than the one of the Arepo simulations.The force resolution is ℓ Force ≡ L/N 1/3 mesh ≈ 0.17 h −1 Mpc and the time resolution is ∆a ≈ 0.006.We run the simulations in GR and nDGP from the same IC.For nDGP simulations we use the 4 values of the H 0 r c parameter (H 0 r c =0.5, 1, 2, 5) used in the Arepo simulations suite.
Figure 7 shows a comparison between the nDGP boost factors of PM and Arepo simulations for the different values of the H 0 r c parameter.We also include in the comparison the nDGP boost factor estimated from linear theory.In the top panel, we show the nDGP boost factors, while in the bottom we show the relative difference between the PM and linear boost factors from Arepo boost factors.On large scales, PM results seem to be more consistent with linear theory than the ones from Arepo simulations.Overall, we find ≲ 0.5% agreement in all gravity models up to k ∼ 5 h Mpc −1 between the high-resolution simulations and Arepo simulations.This means that the screening approximation used in COLA simulations in nDGP theory [14,16], once tuned on full N-body simulations for a single value of H 0 r c , is sub-percent accurate in reproducing the power spectrum boost factor of nDGP theories for a wide range of the parameter H 0 r c and for different redshift values 8 .
The COLA simulations that we run in nDGP use IC produced with the back-scaling approach.The IC are set in such a way that the IC power spectrum is the same for nDGP and GR simulations.This is done by integrating the growth factors from matter domination at redshift z = 499 and by normalising them at the redshift of IC z ini Due to this choice of normalisation, COLA simulations ignore the fact that nDGP can be different from GR also at higher redshift.In particular, COLA simulations are usually started at redshift z ini = 19, while nDGP effects can be significant up to redshift z ∼ 100.We can correct this by multiplying the final power spectrum by 9A lin (z) = which has the effect of re-scaling the amplitude of the power spectrum as if the simulations were started at redshift z = 127.This approach neglects the effects of non-linearity before z ini = 19, but these are sub-dominant due to the weakness of the nDGP fifth force at early times.The importance of this correction is evident from figure 8, where we show that the agreement with Arepo of the nDGP boost factors from PM simulations started at z ini = 19 is improved when applying the linear correction.In fact, the raw results of z ini = 19 PM simulations show 1% deviations from Arepo (left panel), while they are in better than 0.5% agreement with Arepo once corrected using eq (3. 3) (right panel).
Having tested the accuracy of COLA simulations in the PM-limit in reproducing the nDGP boost factor, we want to test the impact of force and mass resolution on the nDGP boost factor in COLA simulations to find the optimal setup to span the theory parameter space running a large number of simulations.To do so, we run more COLA simulations in a box of side L = 512 h Mpc −1 with lower mass-resolution (M part ∼ 8 • 10 10 h −1 M ⊙ , similarly to Arepo simulations), with time-resolution ∆a ≈ 0.02 and with 4 force resolutions: N Before choosing a strategy to sample the parameter space for producing the emulator's training set, we explore the impact that each parameter has on the boost factor.To do so we first investigate the range of values that the parameter H 0 r c should span: for stronger deviations from GR we choose ∼ 50% deviations as the upper limit, while for the other limit, we want to "smoothly" connect to GR so we look for models where the deviation is < 1% (i.e., below our target accuracy).Based on previous results in the literature [62], we select H 0 r c = 0.2 as the strong deviation case and H 0 r c = 20 as the weak deviation case.
The results of [42] show that the response function of the DM power spectrum for changes in cosmological parameters estimated with COLA agrees with cosmological emulators in a wide range of cosmologies when using ℓ Force ≡ L/N 1/3 part = 0.5 h −1 Mpc.We shall use at least the same force resolution here for the boost factor since we want to predict it for a similar range of cosmologies.As an additional test to check the accuracy of the boost factor for the wider range of gravity parameters, we produce high-resolution PM simulations and use them to make a comparison with the boost factors of ℓ Force = 0.5 h −1 Mpc simulations.Using a default box side of 512 h −1 Mpc, we in fact compare the boost factors obtained using 512 3 particles and N 1/3 mesh = 1024 with the ones obtained using 1024 3 particles and N 1/3 mesh = 3072.We show the ratio of the two boost factors for 4 redshift values (legend) and 3 values of the parameter H 0 r c (plot titles) in figure 10.In the most extreme case, the difference in the two 0.95 1.00   boost factors is ≲ 1% up to k = 1 h Mpc −1 (≲ 3% up to k = 5 h Mpc −1 ).This is in line with our target accuracy so we rely on these low-resolution settings for the simulations that we use to conduct the rest of our analysis.
The boost factor of nDGP is strongly dependent on the theory parameter H 0 r c and on the redshift as shown in figure 11, where we plot the boost factors for three values of H 0 r c (titles) and 4 redshift values (legend).However, the cosmological dependence of the nDGP boost factor is quite weak.In support of this statement, we show the effect of changing each cosmological parameter on the nDGP boost factor for H 0 r c = 0.2 in figure 12, H 0 r c = 2 in figure 13 and H 0 r c = 20 in figure 14.These figures not only show that the response is limited to ∼ 7% in the most extreme case (variations of Ω m for H 0 r c = 0.2 at redshift z = 0) but also that its scale dependence is similar across the different cosmological parameters and values of H 0 r c .The amplitude of the effect is smaller for larger H 0 r c values.

Simulations and data sets
We produce two independent simulation suites, one for the training set and the other for the test set.Both training and test set simulations use 512 3 particles to track the evolution of the matter density field in a (512 h −1 Mpc) 3 box.The forces are calculated using 1024 3 mesh grids.The simulations are started at z ini = 19 and evolved up to z fin = 0 using 50 time-steps linearly spaced in the scale factor.We sample the cosmological parameter space using the Latin Hypercube Sampling (LHS) technique [63], a space-filling sampling method widely used in computer experiments to produce near-random sets of points which represent the real variability of multi-dimensional sample spaces.The range of cosmological parameters spanned by the training and test sets is described in table 1.We use 20 cosmologies10 for the training set and 10 for the test set, which are listed in table 2 and table 3 respectively.For the modified gravity parameter H 0 r c we opt for a logarithmic sampling with 21 points in the range [0.2, 20] to produce the training set, and with 10 log-random points in the same range for the test set.The redshift sampling is determined by the time-stepping of COLA simulations.For each cosmology in the training set (test set), we run a vanilla GR simulation and nDGP simulations in all the gravity models of the training set (test set) using the same IC.
Each simulation takes roughly 28 minutes on a single node of the Sciama High-Performance Compute (HPC) cluster using 32 tasks11 .This corresponds to ≈ 15 CPU hours.In total, we ran 22 × 20 = 440 simulations for the training set and 11 × 10 = 110 simulations for the test set.This means that the total computational cost for the simulations is ≈ 8 • 10 3 CPU hours.
In particular, we have used a partition counting 12 nodes with 32 cores each for a total of 384 cores (a typical value for an HPC cluster) obtaining all the power spectra in less than one wall-clock day.
The power spectra are computed by interpolating the DM distribution on a 1024 3 mesh grid with the cloud-in-cell mass-assignment scheme.The power spectra are corrected for the  window function used in the interpolation and for shot noise.We use bins of width ∆k = k f between k min = 1 2 k f and k Nyquist = 5 h Mpc −1 , where k f is the fundamental frequency of the box.Our target accuracy is 1% up to k = 1 h Mpc −1 , scales up to which we have already proven COLA simulations with these specifications to have ∼ 1% reliable determination of the cosmological response function [42].However for practical applications (e.g.weak lensing 0.99 1.00 1.01 predictions) we extend the emulation range to k = 5 h Mpc −1 , where the analysis discussed in the previous section suggests an accuracy of our simulations of ∼ 3% for the nDGP boost factor.Beyond k = 5 h Mpc −1 , baryonic effects are dominant over MG effects and it is still unclear whether we can extract additional information from these small scales [3].As we have seen that mass resolution is a limiting factor at high redshift, we make the conservative 0.999 1.000 1.001  choice of restricting our data sets to the power spectra at z max = 2.We take the ratio of the power spectra in nDGP theories with the respective power spectrum in GR for each redshift and cosmology to obtain the nDGP boost factors.As the nDGP and GR simulations are run from the same IC, the sample variance is largely cancelled out when computing the nDGP boost factors.Since our simulations are started at redshift z ini = 19 we multiply the nDGP boost factors with the linear theory correction in eq.(3.3).Due to the numerical methods used to estimate the power spectrum of the N-body particles, the nDGP boost factors that we obtain are sampled with a linear binning in kspace and are affected by finite resolution noise at small scales.This noise is non-physical and may worsen the interpolation accuracy, therefore we decide to smooth the boost factors.We compare two smoothing techniques: one based on a logarithmic re-sampling of the data, the other based on the Savitzky-Golay filter [64].In the left plot of Fig 15 is shown a comparison of the two smoothing techniques for the central case of H 0 r c = 2 and z = 0.The two techniques give consistent results so we decided to perform the rest of the analysis with the Savitzky-Golay filter.

Parameter
Multi-layer perceptrons (MLP), simple neural networks that we want to employ for the parameter space interpolation, work better when they deal with normalised input values [65], hence we re-scale our parameters accordingly.We map the cosmological parameters in such a way that the minimum and maximum values they assume in parameter space are 0 and 1.The redshift is naturally mapped in the range [ 1  3 , 1] by computing the scale factor.We finally convert the MG parameter H 0 r c into W rc ≡ 0.2 H 0 rc which spans [0.01, 1] for our H 0 r c range (or more precisely it spans [0, 1] as we include the GR trivial results).In the right plot of Fig 15, the distribution of the normalised parameters is summarised in 10 linear bins between 0 and 1.
We convert the boost factor with the formula to give more importance to small deviations from GR than to larger deviations 12 .The simplicity and self-similarity of the nDGP boost factor curve suggest that the 407 k-modes that we are using to represent the boost factor are probably unnecessary.After subtracting the mean value of the boost factor from each element of the dataset, we perform a Principal Component Analysis (PCA) [66] to study if the variance of the training set is significantly larger in some k-modes-space directions than in others.We find that the component with the greatest variance accounts for more than 99% of the variance, while by using the first three components it is possible to describe ∼ 99.98% of the variance.This means that we can reduce the dimensionality of the problem from 407 output values (one for each k bin) to 3 output values losing only ∼ 0.02% of the information.Furthermore, this remapping from the k-space manifold to the 3-dimensional manifold of the principal eigenvalues makes the emulation problem insensitive to the choice of k-binning used to compute the power spectra.

Emulator
We train an MLP with one hidden layer of 100 nodes activated by hyperbolic tangent functions with the limited-memory Broyden-Fletcher-Goldfarb-Shanno [67] optimiser 13 on the training set.This converges in less than 5000 iterations, producing the nDGP emulator that we benchmark on the test set.In the top panel of figure 16 we show some boost factors examples taken from the test set (solid lines) and compared with the emulator's predictions (dashed lines).It is worth noticing that the emulated boost factors are free of high-frequency noise thanks to the smoothing discussed in subsection 3.2.In the bottom panel of figure 16 we show the mean (black dashed line) and variance (1σ and 2σ shaded regions) of the emulation error on the test set together with the residuals for the 30 examples shown in the top panel.The 2σ contours are well within the 1% threshold at all scales.
As an additional test of the emulation accuracy, we compare the emulator's predictions with the boost computed from Arepo simulations in nDGP gravity using 4 values of H 0 r c : 0.5, 1, 2, 5. Thanks to the flexibility of the emulator, unlike in the comparison between PM and Arepo simulations of figure 7, we are able in this case to compare the boosts at each of the 9 output redshifts of the Arepo simulation suite within the interpolation range.The results  are shown in figure 17.The emulator predicts the nDGP boost factor of Arepo with ∼ 2% accuracy in all cases and at all the scales considered.Additionally, we have checked that the emulator performance reaches ∼ 1% accuracy if restricting the redshift range to z ∈ [0.5, 2].This is made possible by the excellent agreement between the nDGP boost factors of COLA and Arepo simulations but also by the linear-theory correction of eq (3.3) applied to the nDGP boost factors of COLA simulations started at redshift z ini = 19.
It is important to notice that even the results of these Arepo simulations might be not perfectly converged, but it is beyond the scope of this paper to assess the accuracy of these full N-body simulations.Furthermore, we have Arepo results for specific values of the modified gravity parameter H 0 r c , so the accuracy of the emulator at the edges of the parameter space cannot be fully assessed.With these caveats, and considering the level of convergence of the COLA predictions for the nDGP boost factor discussed in subsection 3.1, we estimate the accuracy of the emulator to be of ∼ 2% up to k = 1 h Mpc −1 and of ∼ 3% up to k = 5 h Mpc −1 .Once a thorough assessment of the accuracy of the Arepo boost factors used here as ground truth should become available, this should be factored into the error budget for the emulator.
The biggest computational cost behind the nDGP emulator comes from running the COLA simulations necessary to produce the power spectra in GR and nDGP theories (∼ 10 4 CPU hours).The data processing applied to the power spectra in order to produce the training set has a minor impact on the computational cost (∼ 10 CPU minutes).Also, the training of the emulator is relatively fast (∼ 1 CPU hour).Once the model is trained, predictions are extremely fast.On an average single-core CPU, the model can produce ∼ 5000 predictions in a second.The prediction speed of emulators is key to enabling fast cosmological inference with Monte Carlo techniques which require 10 5 − 10 6 evaluations [68][69][70].
As we have shown here, DM power spectrum emulators can be extended to MG models, and in particular to nDGP gravity, with modest computational cost thanks to COLA simulations, enabling accurate and extremely fast predictions of the matter power spectrum up to k = 5 h −1 Mpc.

Conclusions
This article focused on the application of the COLA simulations for predictions of the matter power spectrum in GR and Modified Gravity (MG) theories.We addressed the issue of errors arising from finite resolution in numerical implementations of simulations, including time-resolution, force-resolution, and mass-resolution.Based on our analysis we designed and exploited a COLA simulation suite to extend cosmological emulators to nDGP gravity.Together with this paper, we release the nDGP emulator that will be publicly available at the address: https://github.com/BartolomeoF/nDGPemu.
To study the convergence of COLA simulations in ΛCDM, we argued that COLA converges to PM-only simulations increasing the time resolution, and referred to results obtained in this limit as PM results.By comparing high force-resolution PM simulations with Bacco emulator (trained with high mass-resolution simulations) and low mass-resolution N-body simulations (Arepo) for predictions of the matter power spectrum at redshift z = 1, we highlighted the importance of mass resolution, responsible for ∼ 4% differences at k = 1 h Mpc −1 , while showing that PM simulations reproduce the reference results in both the low massresolution and high-mass-resolution cases with ∼ 1% accuracy up to k ∼ 2 h Mpc −1 .After that, we produced a suite of simulations in ΛCDM with the same cosmology but with different numbers of particles, force grids and time-steps, and using high-resolution PM simulations as a reference, we tested the relative convergence of COLA simulations.We concluded that the power spectra converge towards the reference only when mass-resolution, force-resolution and time-resolution are increased accordingly.
To showcase the potentiality of COLA for extending cosmological emulators to MG theories, we developed an nDGP boost-factor emulator of the matter power spectrum in section 3.By using Arepo predictions as a reference, we investigated the accuracy of COLA simulations for the nDGP boost factor in relation to the resolution settings and identified minimal requirements for our target accuracy.To include MG effects at z > z ini in the backscaling approach, we proposed and validated the linear theory correction of eq (3.3).We also studied the response of the nDGP boost factors to the change of cosmological parameters finding weak cosmological dependence of the nDGP boost factor and confirming that the cosmological dependence becomes weaker for smaller deviations from ΛCDM (larger values of H 0 r c ).In subsection 3.2, we detailed our simulation strategy to compute the nDGP boost factors and examined the computational cost consisting of a total of ≈ 8 • 10 3 CPU hours.After smoothing the signals and remapping the parameters in a unitary interval, we performed a PCA and selected only 3 components while preserving 99.98% of the information.We then used the resulting data set to train a neural network and bench-marked its accuracy on an independent test set (< 1% at all scales considered).As an additional test, we compared the emulator predictions with the Arepo results and found better than 2% agreement at all scales considered.We that COLA simulations can be used to accurately extend matter power spectrum emulators to MG theories up to k = 1 h Mpc −1 (k = 5 h Mpc −1 ) with 2% accuracy (3% accuracy).
The approach developed in this work can be extended to other gravity models.In particular it would be interesting to train an emulator for the matter power spectrum in Horndeski gravity [71,72] which is made possible thanks to the recent extension of the COLA method [15,20,42] to the sub-class of Horndeski theories incorporating the Vainshtein screening mechanism.Furthermore, we plan to extend the convergence of COLA simulations for predictions of halo statistics for which we have already collected promising results.Finally, we aim to develop a suite of COLA-based emulators for several LSS probes, such as the ones studied in [16,40].

Figure 3 .
Figure3.Ratio of the matter power spectrum of simulations with 512 particles for the force and time resolutions shown in the legend with that of the high-resolution run.
, and split each figure into 2 panels for two redshift values z = 1.65, 1

kFigure 4 . 65 N 5 kFigure 5 .
Figure 4. Ratio of the matter power spectrum of simulations with 1024 particles for the force and time resolutions shown in the legend with that of the high-resolution run.

Figure 6 .
Figure 6.Convergence of matter power spectra towards the high-resolution result as mass-, forceand time-resolution are simultaneously increased (as indicated by the colour legend).Different line styles are used to distinguish results at different redshift values as indicated in the line-style legend.

Figure 7 .
Figure 7. Ratio of the nDGP boost factors at redshift z = 1 between PM simulations and Arepo simulations in 4 nDGP gravity models as indicated in the legend.

Figure 8 .
Figure 8. Ratio of the nDGP boost factors at redshift z = 1 between PM simulations started at redshift 19 and Arepo simulations in 4 nDGP gravity models as indicated in the legend.The left panel shows the raw results from simulations, and the right panel shows the results after applying the linear correction in eq (3.3).
(high force resolution).We compare the results for the nDGP boost factor of low mass-resolutions COLA simulations with high-resolutions PM simulations in figure9where we can see that the results are converged at sub-percent level up to k ∼ 1 h Mpc −1 and the force resolution has a significant impact on the boost factor only on very small scales, k > 2 h Mpc −1 .The results of the intermediate cases, N , are consistent with the high-resolution PM simulations within 2% up to k ∼ 5 h Mpc −1 .

Figure 9 .
Figure 9. Ratio of the nDGP boost factors in low mass-resolution COLA simulations with that in high-resolution PM simulations for different force resolutions, 1/3 mesh = 512, N 1/3 mesh = 1024, N 1/3 mesh = 1536 and N 1/3 mesh = 2048.In different panels, we show results for different nDGP gravity models as indicated in the top-left corner of each panel.The black dashed lines show the typical amplitude of the boost factor in the different gravity models for visual reference.

Figure 10 .
Figure 10.Ratio of the nDGP boost factors obtained with low-resolution COLA simulations (N 1/3 part = 512, N 1/3 mesh = 1024) with respect to the same boost factors but obtained with high-resolution PM simulations (N 1/3 part = 1024, N 1/3 mesh = 3072).The legend describes the redshift value of each line.

Figure 11 .
Figure 11.nDGP boost factor for three values of the parameter H 0 r c .The legend in figure 10 describes the redshift value of each line.

Figure 12 .
Figure 12.Change of the nDGP boost factor for H 0 r c = 0.2 due to the variation of each cosmological parameter (specified in each panel) with respect to the reference cosmology.The redshift values of each line are as described in the legend.

Figure 15 .
Figure 15.On the left.Comparison of the two smoothing techniques for the boost factor of nDGP with H 0 r c = 2. On the right.Distribution of the normalised parameters across the training set in 10 bins between 0 and 1: all but W rc and a are uniformly distributed, simplifying the emulation problem.

Figure 16 .
Figure 16.Comparison between boost factors of the test set (solid lines) and emulated boost factors (dashed lines) for 30 examples randomly chosen in the test set.The residuals between emulated and true boost factors (coloured solid lines) are shown in the bottom panel together with the mean (dashed black line) and variance (1σ and 2σ shaded regions) of the emulation error on the test-set.

Figure 17 .
Figure 17.Comparison between emulator predictions (dashed lines) and Arepo results (solid lines) of the nDGP boost factor at 9 redshifts z ∈ [0, 2] for 4 values of the parameter H 0 r c : 0.5 (in red), 1 (in orange), 2 (in blue) and 5 (in green).The bottom panel shows the residuals.
Comparison of PM simulations matter power spectra with Bacco and Arepo power spectra.The high mass-resolution PM run agrees with Bacco up to k ∼ 2h/Mpc within 1%.Arepo does not agree with Bacco but is in better agreement with the low mass-resolution PM run which has approximately the same mass resolution as the Arepo simulation.

Table 1 .
Range of cosmological parameters spanned by the nDGP emulator.

Table 2 .
Figure 13.Same as figure 12 but for H 0 r c = 2 List of cosmologies used to create the training set for the nDGP emulator.
b = 0.06 Figure 14.Same as figure 12 but for H 0 r c = 20

Table 3 .
List of cosmologies used to create the test set for the nDGP emulator.