Cosmological gravity on all scales. Part III. Non-linear matter power spectrum in phenomenological modified gravity

Model-independent tests of gravity with cosmology are important when testing extensions to the standard cosmological model. To maximise the impact of these tests one requires predictions for the matter power spectrum on non-linear scales. In this work we validate the ReACT approach to the non-linear matter power spectrum against a suite of phenomenological modified gravity N-body simulations with a time-varying gravitational constant, covering a wider range of parameter space than previously examined. This vanilla application of ReACT has limited range and precision due to the different concentration-mass relation c(M) that occurs when gravity is modified. We extend this approach with a fitting function for a modified concentration-mass relation, allowing for accurate (1%) computation of the matter power spectrum up k = 2 h Mpc-1 across a substantial range of parameter space. This fitting function allows precision model-independent tests of modified gravity to be carried out using the data from upcoming large scale structure surveys.


Introduction
Over the last two decades, the field of cosmology has entered an era of precision measurements.The current cosmological paradigm, the ΛCDM model contains Cold Dark Matter (CDM) and dark energy (sourced by the cosmological constant Λ) as its main components, but the underlying nature of both remains unknown.The validity of General Relativity (GR) is a crucial assumption in this picture since both CDM and Λ are inferred from solely gravitational observations.In addition, efforts to detect dark matter directly or indirectly have not yet been successful, while quantum field theory predicts a Λ value that is many orders of magnitude larger than that inferred from cosmological measurements.These issues form part of the motivation for modifying the law of gravity and the establishment of a vast space of modified gravity models [1,2] in order to explain the physics attributed to the dark sector.Efficiently testing this model space is a major challenge in cosmology.
The assumption that GR is the correct theory of gravity involves an extrapolation of several orders of magnitude from small scales (compact object mergers and solar-system tests) to the cosmological scales probed by measurements of the Cosmic Microwave Background (CMB).Upcoming experiments such as the Euclid satellite [3] 1 , the Nancy Roman telescope (NRT) [4] 2 , the Vera Rubin Observatory (VRO) [5] 3 and Square Kilometre Array (SKA) 4 [6] will generate an unprecedented volume of data in the so-called non-linear regime, where the density contrast δ 1, i.e., the fluctuation away from the average density of the Universe is large where GR has not been adequately tested.In the context of modified gravity, one requires a parameterisation of modified gravity models [7,8] that encompasses a large region of the parameter space.The lack of a robust tool that takes advantage of the additional constraining power from including non-linear scales to predict cosmological observables across the range of scales for general models of modified gravity results in strict data-cuts that excise non-linear scales from the data analysis, severely limiting constraining power (see, for example [9]).Previous studies of modified gravity in the non-linear regime have primarily been Nbody simulations of specific models such as f (R) or Dvali-Gabadadze-Porrati (DGP) gravity (see e.g.[10,11]).Such studies are useful to understand structure formation as a function of model-parameters, and explore observational signatures in specific theories [12].However, it is not practical to run dedicated N -body simulations for every modified gravity model in order to predict cosmological observables, as the associated computational cost would be too large.In addition, more model-independent approaches allow for a more comprehensive null test of ΛCDM and don't rely on us having already come up with the correct alternative.
One approach for testing gravity in a model-independent way on all scales is described in [13].In this approach, the standard parameterisation in linear theory involving two parameters is extended to all cosmological scales: µ is a change in the strength of gravity, and η is the "slip", a possible difference between the two Newtonian gauge scalar metric potentials.Both of these can in principle be almost arbitrary functions of time and space.In our simulation work we follow "the maximally-phenomenological pixels" approach described in [13] in which the modified gravity parameters consist of independent piece-wise constant functions composed of independent bins in time 5 as this is most suited to our philosophy of model-independence.
In [14] we studied the phenomenology of N -body simulations with binned time-dependent modified gravity parameters.We improved on the previous attempt to study such models [15,16] in which µ was set to be a constant value at z < 50, i.e., constant throughout the simulation redshift range.We focused on the matter power spectrum P (k) measured from the simulations and the weak-lensing observables computed from P (k).We found that the ReACT formalism [17][18][19] is most successful among those available in the literature in reproducing the cosmological observables computed in or from our simulations.We designed a pipeline to compute cosmological observables in the non-linear regime using ReACT.Here, we validate this pipeline in detail across a much wider region of the (phenomenological) modified gravity parameter-space.In particular, we quantify the range of scales any cosmological observable can reliably be computed using ReACT.In addition, there were indications in [14] that this implementation would not be precise enough over a wide enough range of scales to maximise the gain from upcoming surveys, and also that this issue could be alleviated by modifying the concentration-mass relation.We investigate this issue in detail and provide an extension to the fit of [14] that substantially improves the range of precision of the fitting function.Our goal is to validate ReACT in order to run modified gravity forecasts for binned µ(z) and η(z).
The structure of this paper is as follows.In section 2 we review our parameterisation, define the terms we will use to quantify the results from our simulations and present our adapted implementation of ReACT with binned modified gravity functions.We define our pipeline for computing P (k) and the weak-lensing convergence spectrum using ReACT.We also discuss how one can improve the accuracy of ReACT by modifying halo concentrations.We then present the suite of N -body simulations that we have run to validate our ReACT pipeline in section 3.In section 4, we quantify the accuracy and precision of ReACT in predicting P (k) and weak-lensing observables against our simulations both with and without the concentration modification.We discuss the relevance of these results in the literature and the impact they may have for cosmological parameter forecasts in the context of modified gravity for surveys such as Euclid, NRT, VRO and SKA in section and conclude in section 5.

Modified gravity framework
As described in [14], our parameterisation may be formulated as where ρ is the background density, ∆ = δ − ȧ a 3 c 2 k 2 ik i ṽi is the gauge-invariant density contrast in Fourier space, G N is Newton's constant, φ P and ψ P are the standard Newtonian gravitational potentials 6 (normally found to be equal in GR) and µ(a, k) is a dimensionless function of space and time representing a change to the strength of gravity.These equations can be derived from the FLRW metric expanded up the first Post-Friedmann order (which is one step beyond leading order) at which the equations describe structure formation on all scales (see [13] for details).An important consequence of this parameterisation is that the dynamics of massive particles are purely governed by (2.1), i.e., one can build and run dark matter only N -body simulations where the structure formation is completely specified by a single modified gravity parameter, µ.The second parameter η, the so-called slip parameter purely affects photon geodesics and, therefore, can be modelled in post-processing.

Applying the ReACT framework to modified gravity with binned µ
We showed in [14] that the ReACT framework performed significantly better than other candidates from the literature including the standard halofit prescription and the standard halo model in predicting P (k) measured from our simulations.We showed how we can take the publicly available implementation of the ReACT code and apply it to our binned models.We call this implementation the 'vanilla' implementation throughout this paper.
The fundamental idea at the centre of ReACT is that the accuracy of the halo model can be significantly improved by measuring the so-called 'non-linear response of the power spectrum' [20] i.e., the ratio of matter power spectra between two cosmologies that share the same linear power spectrum.The halo model reaction formalism computes the non-linear matter power spectrum for beyond ΛCDM cosmologies from the so-called 'pseudo spectrum', which is the ΛCDM power spectrum with the the initial conditions modified to match the growth in the modified gravity model.The use of the pseudo spectrum ensures that the transition from the 2-halo term to the 1-halo term is smooth.This transition was known to be a problematic region to model accurately for the standard halo model prescription.In this case, the similarities in the mass functions for the two cosmologies, i.e., since the two cosmologies have the same linear clustering, allows the inaccuracy in the transition region to be overcome.This was first shown in [20] and later validated for f (R) and DGP gravity in [21].This idea was implemented into a numerical fitting function dubbed the "halo model reaction" or ReACT [17,19,22].
A key point to note is that in the publicly available implementation of ReACT, two separate parameters, µ and F are used to model the linear and non-linear gravitational clustering, respectively.The linear modifications to the Poisson equation are encoded in µ(a, k), while F is computed by assuming a specific model and applying the spherical collapse formalism to specific modified gravity models with specified screening mechanisms.Given this prescription, the ReACT code computes the reaction from the following input information: • The linear power spectrum at the redshift(s) of interest for the target cosmology.
• The function µ which encodes the linear modification to gravity via the Poisson equation applied to scales over which perturbation theory may be applied.
• The function F that encodes the modification to the non-linear Poisson equation that is relevant for N -body simulations.
• Additional input parameters connected to specific models of modified gravity/established linear theory paramterisations such as the Effective Field Theory (EFT) of dark energy.
However, in our parameterisation the Poisson equation is derived making no distinction between the clustering of matter due to gravity on linear vs non-linear scales (see sec.IV A in [13]).Therefore, in order to make contact with our simulations within the the ReACT framework, one sets F ≡ µ(a, k) − 1.In other words, we ensure that the parameter µ(a, k) specifies both the linear and non-linear clustering in our simulations, which in general is not true for many models considered in the basic ReACT framework.We also switch off/set to their respective ΛCDM values any additional model-specific parameters.
The corrections from the modifications to the Poisson equation are propagated through to the halo model quantities such as the mass function via the spherical collapse model implemented within ReACT.This introduces a slightly different source term in the differential equation for the radius of the halo within spherical collapse.We describe this procedure in appendix A.
We quantify the performance of ReACT in later sections by comparing the matter power spectrum P (k) measured from our simulations against that predicted by the fitting function.In practice, we will compute the so-called halo model 'reaction' term, a dimensionless ratio of halo model quantities given by where P (z, k) is the matter power spectrum in the specified cosmology and P pseudo (z, k) is the ΛCDM matter power spectrum with the initial conditions modified such that the linear growth factor is identical to the the numerator at the redshift of interest.Using eq.2.3, we define the k fail parameter as the smallest wavenumber at which ReACT failed to predict R(k) in our simulations at a specified accuracy threshold of 1%.

Modifying halo model ingredients in ReACT
We found evidence in [14] to suggest that the accuracy of ReACT relative to our simulations may be improved by modifying the halo model parameters within the code.In particular, we modified the halo concentration-mass amplitude parameter.In this work we systematically develop this finding into a modification to ReACT that substantially improves the 'vanilla' ReACT approach.In this subsection we describe this procedure to model c(M ) as a function of µ(z) to maximise the accuracy of ReACT relative to our simulations.We call this implementation the 'extended' implementation of ReACT.
A key ingredient in the halo-model approach in ReACT is the density profile of dark matter halos.The standard profile is the Navarro-Frenk-White (NFW) density profile [23] where r s and ρ s are the scale radius and density, respectively.These are both free parameters within the simple parameterization.In general, the NFW profile is divergent, but one can impose an upper limit on the size of the halo given by the virial radius r = r vir .Integrating the density profile gives the halo mass given by where the NFW concentration parameter c = r vir /r s .The parameterization is usually complemented by a mass-concentration relation, c(M ).This relation is measured from simulations, and it reduces the number of free parameters to one that is defined by the total mass of the halo given a specific background density.Within the vanilla implementation of the ReACT algorithm, the c(M ) relation is specified by where c 0 = 9.0, M * is the characteristic mass computed by imposing a condition on the peak height ν(M * ) = 1 of the Sheth-Tormen mass function and α = 0.13.The c(M ) relation for the ΛCDM model, has been measured from a suite of N -body simulations [24] and is parameterized by where In general, the concentration scales with mass as a weak power law at low redshifts.This trend can be traced back to the hierarchical nature of structure formation.At late  [25] parameters from ( [24]).We note that there is significant scatter as our simulation suite has not been optimised for measuring c(M ).In the right panel, we illustrate that the ReACT c(M ) using the expression in (2.7) agrees with that from (2.6).This allows us to express the concentration-mass relation in our modified gravity simulations as a ratio relative to the ΛCDM ReACT case.times, when smaller halos have already formed and are merging to form larger halos, the inner regions of these large halos are relatively undisturbed by mass accretion.Therefore, as halos become larger, their virial radii become bigger while the core radius remains relatively unchanged.This then leads to more massive halos having lower concentrations in comparison to less massive halos [24].As such, hierarchical structure formation leads one to expect that increasing µ at late redshifts will enhance structure formation at late times, affecting massive halos and thus reducing the ratio of A/A ΛCDM and vice-versa for when µ is increased at early times.We found evidence for this behaviour in [14] and in this work.In addition, we also find additional complexity arising at very low z mp that we couldn't have resolved earlier due to the larger width of our µ bins relative to this work.
In this work, we show that the computation of the full non-linear P (k) with high accuracy in ReACT for generic modified gravity requires modification of the concentration-mass relation in section 4.2.We do this by finding the value of A in (2.7) that gives the best representation of the non-linear power spectrum from the simulation, where this is defined as the representation that has the largest k fail .By performing a range of simulations with different modifications to gravity, we provide a fitting function for the amplitude A of the c(M ) relation as a function of µ(z) validated against the simulations.
We primarily use an accuracy threshold of 1% to compute k fail .However, we note that baryonic feedback has been studied in a recent release of ReACT [18] quoting an accuracy of 3%.Therefore, we also present, both in our abstract and conclusions, the 1% and 3% k fail values.This is particularly relevant for our weak lensing results, where we also compute the corresponding values of fail for the weak-lensing convergence power spectrum (again for 1% and 3% accuracy thresholds), and we reflect on how robust these numbers are across our parameter space (see sec.4.3).
We have optimised our simulation suite to investigate and fit the non-linear matter power spectrum, which is sub optimal for measuring the c(M ) relation from the simulations.As such, we cannot perform a consistency check between the c(M ) in our modified gravity simulations and our fitting function prediction.Since we are only interested in a accurate way of creating the non-linear power spectrum from the linear one, this test is not essential, and we leave it to future work to examine this in detail.
We note that modifications to gravity have been shown to affect the ΛCDM c(M ) relation.It has been shown that in cosmologies with modified growth histories such as smooth dark energy models that the c(M ) relation is modified in order to accurately predict P (k) using the halo model [26].In f (R) gravity, it has been shown that the c(M ) is modified from a simple power law due to the fact that M * contains implicit environmental dependence on halo mass [27][28][29].These findings further justify our choice to improve the fit from "vanilla" ReACT by modifying the c(M ) relation.

N-body simulation suite & coverage of parameter space
In [14] we studied the phenomenology of models with a time-dependent µ by implementing piece-wise constant bins7 in redshift for µ in our simulations.In particular, we focused on the matter power spectrum P (k) in our simulations.We showed that non-linear clustering statistics can exhibit significant variation as a function of µ(z) for the same linear P (k).This analysis revealed that the ReACT formalism was able to accurately predict P (k) in our simulations, albeit for a small region of parameter space.In this work we quantitatively validate the ReACT formalism to understand the range of scales over which it can be used to predict P (k) with specified accuracy and precision.To do this, we need to explore a wide range of modified gravity parameter space.Since we bin µ in redshift, this involves sampling a three dimensional parameter space where the degrees of freedom are the centre of the bin, the bin width, and the value of µ.
As shown in [14], the linear growth factor D(z) between the start and end of a redshift bin is a good quantity to use to characterise the bin width (rather than working directly with a difference in a or z).As such, we continue to work with redshift bins of constant D(z) in ΛCDM (this can be computed analytically in linear theory).We now describe the procedure we followed to fix µ(z) across our simulations.
We designed two sets of simulations to extensively test the performance of ReACT.In simulation set α (presented in table 1) we fixed D(z) and obtained 8 redshift bins.All the bins in these simulations are chosen such that the ΛCDM growth factor associated to them is constant and is D(z max )/D(z min ) = 1.26.We choose the initial value of D(z) to be similar to that associated with the binning in [30] so that we may contextualise our results (and any future results derived from the data products in this work) with parameterised Euclid forecasts.
In set α we pick two different values of µ on each side of µ = 1, with the expectation from our findings in [14] that our results are likely to be symmetric about the line µ = 1.In other words, the matter power spectrum exhibited equal and opposite effects when µ = 1.1 and µ = 0.9, respectively in the same redshift bin.We found both in our previous work and in this analysis that this behaviour persists and is independent of the choice of µ(z), (see fig. 3 for more info).
We also noted in [14] that the performance of ReACT varied with position and width of the bin in question.In particular, we found that the 'maximal' departure from ΛCDM was  1.The simulation parameters for set α of our simulation suite in order to validate ReACT.This choice was designed to mimic the binning in [30] which is likely to be representative of the bin-width typically chosen for a Euclid -like survey.For this set of simulations, we varied µ symmetrically about µ = 1, to test if the symmetry we found in [14] persists.Note that these are all bins with equal ΛCDM growth, i.e., D( 26.This set does not represent the full suite of simulations we run (see fig. 2).Due to the symmetry in the shape of the power spectrum, we only use µ > 1 for set β, to make optimal use of computational resources.
in redshift bins at the extreme end of the redshift range associated to our simulations.This motivates a higher resolution analysis at low z in set β8 In simulation set β, we vary of µ, D(z) and over a range values to further validate ReACT in our approach.We restrict ourselves to µ > 1 (taking advantage of the aforementioned symmetry to save on computational resources) and introduce additional bins at low redshift (see fig. 2).We implement finer sampling of µ at the lower range of z in set β, which reveals that the shape of the P (k) undergoes a dramatic change as the midpoint of the redshift bin approaches z = 0 (see figs. 3, 4 and associated discussion).In order to develop and validate our fitting function for the concentration-mass amplitude parameter A, we ran additional simulations in which we vary the linear growth factor associated with our redshift bins; again, we restrict ourselves to µ > 1.We illustrate the full range of bins we have run in fig. 2.
We compare simulations across different values of µ and D(z) by keeping the bin midpoints constant across the parameter-space.In other words, the endpoints of each redshift bin are varied in order to keep the midpoint the same for all D(z).We will return to this point in the next subsection where we show our fitting function.This procedure allows us to systematically investigate the performance of ReACT for varying bin-widths, epochs and µ(z) in as orthogonal a manner as feasibly possible.Furthermore, as we will see in section 4, keeping the midpoint constant allows a simple interpretation of the variation of A(µ, D(z)).Our full suite of simulations represents an unprecedented exploration of phenomenological modified gravity parameter space on non-linear scales.

Vanilla Implementation
In the previous sections, we have explained the reasons that motivated our choice of using the µ − D(z) parameter space to explore the modified gravity phenomenology in our simulations.In this section, we show the results of measuring the matter power spectrum from our suite of simulations and quantify the performance of ReACT, both without any modifications and with modifications to the halo concentrations.We present R(k) for the different µ(z) presented in table 1 in fig. 3. We note that the symmetry about µ = 1 is preserved irrespective of D(z), µ(z) and the midpoint of the redshift bin.As a consequence of this symmetry, we restrict ourselves to µ > 1 for the rest of our analysis.
We also observe in fig. 3 that the 'departure' from ΛCDM seems to be more dramatic the later µ is switched on.This behaviour is consistent with our earlier results in [14] (but now with greater redshift resolution in the transition regime), where the relative difference to ΛCDM decreases with increase in the redshift at which the modification occurs.Furthermore, the shape of the matter power spectrum changes drastically when µ is switched on in the bin 0 ≤ z ≤ 0.43 compared to the general trend seen in the other bins.This regime is examined in greater detail in set β see fig. 2).We present the matter power spectra from this set of simulations in fig.4, where it is clear that this transition seems to occur around the epoch where the transition to dark energy domination takes place.We will return to this point shortly.We observe symmetry in the power spectra about µ = 1.We also note that the extent to which the power spectra differ from ΛCDM increases as the redshift at which modified gravity is switched on decreases.Consistently with this, we see later that the accuracy of ReACT is better for the power spectra closer to ΛCDM, i.e. for smaller |µ − 1| and higher z mp .
In [14] we hypothesised that the shape of the power spectra can be traced back to halo formation time as a function of its mass.In other words, due to the hierarchical nature of structure formation, the average mass of the halos forming in our simulations is large at low redshift.Therefore, the range of scales affected relative to the opposite case are when µ is modified at low redshift is larger.In other words, there is a transfer of power (from small scales to large scales) that takes place when µ is switched on at late times.We will come back to this point when we discuss our fitting function for halo concentrations, since this tends to reduce the typical concentration of the halos.3 is clearly broken once the Universe enters dark energy domination.This fact will influence the performance of our fitting function (see fig. 6).In the context of maximising the constraining power of future surveys, ensuring that k fail > 1 h Mpc −1 thus requires modifying the halo concentrations consistently with µ(z) (see sec. 2).
In fig. 5, we quantify this point by computing the value of k fail for our simulations in set β.We find that the departure from ΛCDM and, therefore, the accuracy of ReACT is monotonic with µ.We don't see such a clear trend for k fail as a function of D(z).Interestingly, ReACT does much better for the simulations in which µ is modified at 2 ≤ z ≤ 7, across parameter space.As mentioned earlier (see sec.2.2, the additional complexity in structure formation arising from mergers of halos drives down k fail for the lower redshift bins, with exceptions arising in cases where the bin-midpoint is such that the bin straddles the transition period between the two separate regimes of structure formation. 9. We note that the lowest k fail is associated to the smallest redshift bin, i.e., 0 ≤ z ≤ 0.43.As mentioned before, the transition into dark energy domination seems to coincide with the change in shape of the power spectra at z < 0.5 as we previously mentioned (see fig. 4), we see that the trend of how P (k) is modified as a function of µ(z) is broken and indeed reversed in the range 0 ≤ z ≤ 0.9.Due to the accelerated expansion at these late times, structure formation appears to be 'frozen in', and increasing µ simply adds power much more uniformly across a range of quasi-linear to non-linear scales relative to the pseudo-ΛCDM case as seen in fig. 4.
From our analysis, we see that the main challenge in predicting P (k) across our simulations is ensuring a consistent value of k fail across all our simulations.Particularly in the context of maximising the constraining power of future surveys, ensuring that k fail > 1 h Mpc −1 would require a factor > 2 improvement in k fail for the lowest redshift bins.We have already mentioned in sec. 2 that the ReACT P (k) prediction can be improved by modifying the halo concentrations consistently with µ(z).We will now describe how this can be achieved.The k fail values as a function of z mp i.e., the midpoint of the redshift bin at which µ is 'switched on' for our vanilla implementation of ReACT.We see that, as indicated in fig.3, the accuracy of ReACT when µ is closer to GR, i.e., k fail monotonically decreases as µ increases.We note that k fail tends to increase with z mp , but this trend is broken when z mp is in the transition regime from matter domination to dark energy domination.

Extended Implementation
We perform a fit for the ratio of amplitude of the c(M ) parameter relative to the ΛCDM case, that is A/A ΛCDM , required to maximise the k fail for all µ(z).This allows us to capture the varying phenomenology of our simulations more accurately using ReACT.Note that while we are fitting for a single parameter, i.e., the ratio of the amplitudes of c(M ), the actual fit contains multiple parameters (see eq. (4.4)).Following the ReACT philosophy of computing quantities as a ratio relative to ΛCDM, we compute our fitting function for the ratio A/A ΛCDM , thus ensuring our analysis is insensitive to realisation-dependent effects [31].We remind the reader that in order to compute R(k), one needs the modified gravity P (k) as well as the ΛCDM P (k) with equalised linear growth.
As earlier, we use the k fail parameter to quantify the accuracy with which ReACT captures the simulation results.We vary the amplitude A of the c(M ) relation to maximise k fail .We then calculate the ratio A/A ΛCDM as a function of the bin midpoint for each set of simulations in fig. 2. In other words, we model the parameter space across the range of values of 1 ≤ µ ≤ 1.2 and 1.26 ≤ D ≤ 1.45.With the variation of the amplitude of the concentration-mass relation, we are able to achieve k fail > 1.5 h −1 Mpc for an accuracy threshold of 1% and k fail > 3 h Mpc −1 for an accuracy threshold of 3%.
We wish to create a fit for the concentration modification as a function of bin midpoint, and do this as a function of µ and D, thus covering the full phenomenological parameter space we are considering.The general functional form for the fit is given by where z = z mp − z pk and z pk is the redshift at which the power spectrum is computed.We the redshift range into two regimes.In the first regime, i.e., 1.2 ≤ z ≤ 7, we use an exponential function where C 1 < 0 when µ > 1 (and vice-versa for µ < 1).This form ties back to the qualitative prediction that we made from hierarchical structure formation, i.e., modifying µ at later times leads to larger halos and therefore smaller concentrations relative ΛCDM.Note that for calculating the power spectrum at z pk when µ = 1 in a particular bin, A/A ΛCDM has a single constant value that is a function of z pk and the bin parameters {z mp , µ, D} (see appendix C for details).In particular, A/A ΛCDM is not taken as a function of redshift during the integration in redshift that occurs in the halo model calculation within ReACT.Our general fit for z pk could be used to make A/A ΛCDM a function of the redshift in the halo model calculation; we leave an investigation of whether this further improves the accuracy to future work.We find that this trend is broken at very low redshift, i.e., at z mp < 1.We fit this upturn in the behaviour of A/A ΛCDM using a cubic polynomial In the transition region between these two cases (0.7 ≤ z mp ≤ 1.2) we use a cubic spline to interpolate between f 1 and f 2 .Thus, we have a list of parameters that vary as a function of {µ, D}.We find that the C i scale linearly with D at fixed values of µ, and the dependence on µ is a second order polynomial at fixed values of D (see appendix C for more details).We consider µ first and derive the fitting parameters Ci and Ci , as a function of arbitrary µ at the extreme values of D = {1.26,1.35}, respectively.We then linearly interpolate in D to obtain the general form Ĉi in the two-dimensional (µ, D) space.This allows us to explicitly construct the final fitting function for A at z pk = 0 to be ) where we present the explicit functional forms for Ci and Ci in appendix C. Note that equation gives an accurate calculation of the non-linear matter power spectrum at z=0 for arbitrary µ, D(z) in the range 0 ≤ |µ − 1| ≤ 0.2 and 1.26 ≤ D(z) ≤ 1.45.See appendix C for how this is generalised to power spectra at non-zero z and for µ < 1.This general form is implemented in our extended ReACT code.We see that the shape of the fit remains consistent across parameter space, which ties back our discussion regarding the two regimes (see sec. 2).The exponential functional form at z > 1 follows from the hierarchical structure formation (see appendix A) while the strong upturn in the concentration at low redshifts is likely due to the freezing in of structure formation as dark energy domination occurs.
We show the performance of the fitting function described in this section at z = 0 as a function of and D(z) in fig.6.The two redshift regimes are clearly visible in both the panels.Each data point is obtained from the simulations, and the k fail values associated with this 'extended' implementation can be seen in fig.10.The robustness of this fitting function to simulation and cosmological parameters is discussed in appendix B. We also present a qualitative justification of the general observation of the modification at low redshift being more substantial than when µ is modified at high redshift in appendix A.
Interestingly, the sharp upturn in A coincides with the redshift at which Ω Λ starts becoming comparable to the critical density of the Universe.Once dark energy domination is firmly established, structure formation is frozen in and each modification of gravity affects structure formation uniformly across the range of scales of convergence in our simulations.This numerical recipe is to our knowledge the only validated one in the literature for modelindependent modified gravity.
We have already presented the 1% k fail for all of our simulations in fig.10.In particular, we present the k fail for varying accuracy thresholds in appendix C. We find that across all of our simulations, we achieve a 1% k fail ≥ 2 h Mpc −1 .Note that this is considerably improved for the bins with z mp > 1, where k fail > 3 h Mpc −1 can be achieved.Most notably, we see that a factor 2 improvement in k fail with our concentration modification compared to the vanilla ReACT.Finally, k fail values of 3 h Mpc −1 can be achieved to within 3% consistently across all our simulations.

Quantifying the performance of the fit for lensing observations
We now quantify the accuracy and range of validity of our fit for future lensing surveys and deduce fail , analogous to k fail in angular space 10 We compute the weak-lensing convergence Figure 7.We show the ratio of the weak-lensing convergence spectrum computed using the power spectra directly measured from our simulations to that computed from ReACT.The vertical lines are the cut-off in where the disagreement exceeds 1% for each case.In the left panel, we present the most pessimistic scenario, i.e., the µ(z) corresponding to the lowest value of k fail .In the right panel, we present a more 'realistic' scenario with a k fail is around the typical value in our simulations.For larger values of µ the improvement is from ∼ 200 to ∼ 500 − 1000.For smaller µ the improvement is from ∼ 400 to > 1000.
power spectrum from the matter power spectrum and the modified gravity parameters where P δ is the matter power spectrum, η is the second modified gravity parameter that affects the photon geodesics and χ is the comoving angular diameter distance to the source along the line of sight.We do this for the two conservative and realistic scenarios from our simulations.The function g(z) is a filter function that depends on the redshift distribution of the background galaxies and is typically written as We use the standard expression for the source galaxy redshift distribution given by [33] Figure 8.The weak-lensing convergence power spectrum cut-off computed for the bin with the smallest k fail compared to the ΛCDM spectrum with Euclid -like error bars.As presented in table 2, we show the cutoff that DES employed in their beyond-ΛCDM analysis which was restricted to linear scales, 1% and 3% values of fail for µ = 1.2 (red) and µ = 1.1 (blue).We also indicate the regions where baryons are most likely to play a significant role [32].Here, we see explicitly here that the amount of information scales as 2 , and therefore cutting off the C at low value of leads to information loss and therefore smaller constraining power.Note also that the error bars become smaller as one goes from small to large , i.e., from large scales to small scales.
We adopt a Euclid-like binning of the source number density into 10 equi-populated bins according to eq. (4.10) with z 0 = 0.9/ √ 2, α = 2 and β = 3/2, where we have assumed an average source density ng = 30 arcmin −2 [30,[34][35][36][37].The tomographic bins are such that our fitting function may be used across the entire weak-lensing data-set.The projected errorbars are obtained by computing the following [37] where f sky = 0.7 is the fractional sky coverage, σ = 0.21 is the variance of the observed ellipticities and ni is the surface galaxy density of each bin.We compute the C from the matter power spectra using both our simulation snapshots as well as from ReACT.We compare the prediction from vanilla ReACT and with our concentration fitting function implemented in the code with that obtained from the simulations to obtain an fail value up to which we formally validate ReACT for weak-lensing.
While we cannot associate a one-to-one mapping between k fail and fail , we would expect naively that simulations with small values of k fail will be correlated to the ones with low fail values.We report the 1% fail value for this case as our most conservative estimate of the scale up to which one can reliably compute the convergence power spectrum C .However, we also quote the 3% fail as a more 'realistic' estimate of the physical scales that can be accurately modelled with ReACT.Note that our 1% accuracy results may be viewed to be conservative Table 2.We report the value of fail that we find across the range of values of µ in our simulations.We find that the variation in fail is much more significant with µ compared to the linear growth factor D(z). Therefore, we report a conservative 1% fail corresponding to the simulation with the smallest 1% k fail and a 'realistic' 3% fail that corresponds to a more typical value of k fail that can be achieved to within 3% (see appendix C for more details on k fail as a function of µ and accuracy threshold).We pick these numbers to correspond to the bin corresponding to the smallest values of k fail as the limiting value for given value of µ, which is 0.91 ≤ z ≤ 1.47.
because the expected level of accuracy that can be achieved across the parameter space validated in our simulations accounting for baryonic feedback [17] in ReACT is 3%.We show an example of these two cases in fig. 7. Note that applying our extended implementation of ReACT always leads to a significant improvement in the range of scales we can accurately model.In addition, For smaller values of µ, even our vanilla implementation of ReACT is able to achieve sub-3% accuracy.
Table 2 shows that with our vanilla implementation, we are able to consistently model the weak-lensing convergence power spectrum only up to ∼ 500.In comparison, with our extended implementation, we are able to reliably model up to ∼ 1300.To put this in context, the DES analysis that reported 3x2pt constraints [9] on extensions to ΛCDM employed stringent scale-cuts that only data on linear scales survived.This corresponds to an fail ∼ 100 in our analysis.We note that this leads to a significant loss in constraining power.In a recent study [32], it was found that the modelling of baryons limited the constraining power that can be gained on scales corresponding to > 1500.We condense all these results into fig.8, where it is clear the that using our extended formalism allows one to achieve values of fail close to the limit quoted above which baryons will likely dominate the constraining power.We note that we are able to achieve fail within a factor of ∼ 2 of this limit even in the most conservative cases, and indeed achieve fail > 1500 for the vast majority of our simulations.Without our concentration fitting function, we consistently achieve fail < 500.

Conclusion
The wealth of data provided by upcoming surveys such as the Euclid satellite, NRT, VRO and SKA creates an unprecedented opportunity to test modified gravity in a model-independent way with cosmology, which could provide crucial evidence for the nature of the dark sector.The majority of this survey data is on non-linear scales, so robust theoretical predictions for this regime are crucial for maximising the gain from these surveys.
In this work we extend the preliminary investigation in [14] to perform a thorough validation of our ReACT approach for predicting the non-linear matter power spectrum in phenomenological modified gravity.We cover the modified gravity parameter space by treating µ as a piece-wise constant function, and we create a substantial suite of N-body simulations varying the value in each piece, the width of each piece, and the location (in redshift) of each piece.
We analyse the performance of our "vanilla" implementation of ReACT across this simulation suite, finding that we are able to model the weak lensing convergence power spectrum measured in our simulations across parameter space reliably up to ∼ 500.There is a significant loss in constraining power associated to this cutoff, as seen in fig.8. Therefore, we provide a fitting function to improve this.
We find that the concentration-mass relation is one of the key ingredients in ReACT and have derived a fit for the ratio of its amplitude relative to ΛCDM across the MG parameter space.The key results are in equations (4.1), (4.4), generalised to arbitrary redshift in appendix C, which we implement as a modification to the ΛCDM c(M ) relation in our extended version of ReACT.See table 2 for the range of validity of this expression for the matter power spectrum and weak-lensing calculations, given different choices of accuracy level and the region of phenomenological MG parameter space.We are able to achieve 1% accuracy in predicting the matter power spectrum up to k = 1 h Mpc −1 for the extreme case of µ = 1.2 in each of our bins and 3% accuracy up to k = 3 h Mpc −1 .We achieve larger values of k fail for µ = 1.1 and this can be seen in fig.10.We show for a Euclid-like survey, how this translates to a 1% (3%) cut-off of = 900 (1200) for bins with µ = 1.2 and = 1300 (2000) for µ = 1.1.
Most importantly, the tools developed here allow the data from existing and future surveys to be used to constrain phenomenological modified gravity.Performing this analysis robustly will be facilitated by the existing work demonstrating how baryonic effects and massive neutrions can be included in the ReACT approach [22].
Our simulations present a useful dataset from which one can recover the peculiar velocity field and therefore compute the small scale redshift-space-distortions of galaxy clustering [38].Cross-correlating the galaxy-clustering and lensing data can prove to be a powerful tool in the future to probe gravity on non-linear scales.Such an analysis would provide valuable insight into the nature of the pixelised µ approach, which will inform the full time and scale dependent analysis.In other words, we provide a way to produce 3x2pt forecasts and constraints in a model-independent way.
The time-dependent case studied here and in [14] exhibits rich phenomenology on nonlinear scales, and some theories (such as those encompassed by the Parameterised-Post-Newtonian Cosmology approach [39][40][41]), have no scale dependence in µ on non-linear scales.As such a detection or tight constraints on this purely time-dependent case would be a significant step forward in terms of testing gravity on cosmological scales.Nonetheless, now that the time-dependent analysis has been understood and fully validated, this approach can be extended to a wider analysis that includes the extra complexity from additionally allowing scale dependent behaviour of µ. conditions.Since we compute the ratio of matter power spectra with the identical initial conditions, our results are realisation-independent [31].We verified this by computing the R(z, k) for two of our simulations after averaging over three realisations and found identical results.We also investigated the effects of varying the standard cosmological parameters.We obtain the same R(z, k) at both z = 0 and z = 0.5 when we vary cosmological parameters, to the DES Y1 cosmological parameters [42] and the joint Planck-DES Y1 analysis [25].These results justify that the results in the main text are not dependent on the realisations that were examined, or on the Λ CDM parameters.This ensures that the fitting function has a wide range of validity and is suitable for future data analyses.
We make use of the k fail statistic in order to derive the fitting parameters in our extended implementation of ReACT (see sec.4.2).We derive our fitting parameters by imposing 1% accuracy threshold in computing k fail .This might be an overly conservative choice considering that recent release of ReACT that incorporated baryonic modelling [17] reported an accuracy of 3%.In fig. 10, we show the value of k fail we achieve for 1%, 2% and 3% thresholds, respectively for across the range of our simulation parameters (µ, D(z)).We see that our choice of using k fail means that we are most sensitive to variation in µ rather than D(z).This motivates our choice of presenting our weak-lensing results for the extreme values of µ in our simulation suite.

C Concentration Fitting function
In the main text, we derive the fitting function for the amplitude of the concentration-mass relation as a function of z to be an exponential at z > 1 and a cubic polynomial at z < 1.We noted that the fitting parameters vary linearly as a function of D and quadratically as a function of µ (see fig. 11.We thus derived the general form of the fitting parameters at z pk = 0. We now give the explicit functional forms we use to construct the fitting parameters at z pk = 0. We derive the fitting parameters Ci as a function of µ at D = 1.26 to be C i = γ i (µ − 1) + δ i (µ − 1) 2 .We show in table 3 the values of these parameters for the extreme values .We present the k fail parameter with different accuracy thresholds for our simulations across the our parameter-space of (µ, D(z)).This is particularly relevant for the lensing prediction since we would cut-off P (k) at k fail in the computation of the convergence power spectrum.In particular, our choice of 1% in the main text may be conservative, given that baryonic effects may limit the accuracy of ReACT to 3%.We also note that the variation in k fail is substantial between µ = 1.1 and µ = 1.2, whereas the difference is much less significant between the two cases D = 1.26 and D = 1.45.These observations motivate out choice of presenting the corresponding weak-lensing fail as a function of µ for 'conservative' and 'realistic' cases, respectively.We show the symmetry in the shape of the matter power spectrum across µ = 1 is also seen in the fitting function.In the above figure, the data points are derived from the simulations for µ = 1.1 (circles) and µ = 0.9, (crosses) respectively but we only derive the fit for µ = 1.1.We then derive the fit to the µ = 0.9 datapoints by taking the reciprocal of the fit the µ = 1.1 points.
Figure 13.We show the fitting function at z = 0 and z pk = 0.5, where the circular and cross data points represent the measurement of the fit at z mp corresponding to the simulations at z pk = 0 and z pk = 0.5, respectively.We linearly interpolate between the these two curves to obtain the fitting parameters at z pk = 0.25, and indeed at other redshifts to produce the weak-lensing convergence spectra in the main text (see fig. 7).In other words, we do not use any simulation data to produce the curve at z pk = 0.25, and still obtain fail > 800 consistently across our simulation set as well as agreement with the simulations (blue data points).
the concentrationmass relation above for Planck 2018 (TT+TE+low P+BAO)[25] parameters with the c(M ) measured from our simulations and the c(M ) relation in ReACT in fig.1.Moreover, we compare (2.6) and (2.7).Although we find that our simulations are quite noisy -they have not been optimised to measure the c(M ) relation -all are in agreement with each other.In what follows we will use (2.7) to modify the c(M ) relation within ReACT, keeping γ and M 0 constant and varying A.

Figure 1 .
Figure 1.In the left panel, we present the concentration-mass relation measured from our ΛCDM simulations and a fit for the Planck 2018 (TT+TE+low P+BAO)[25] parameters from ([24]).We note that there is significant scatter as our simulation suite has not been optimised for measuring c(M ).In the right panel, we illustrate that the ReACT c(M ) using the expression in (2.7) agrees with that from(2.6).This allows us to express the concentration-mass relation in our modified gravity simulations as a ratio relative to the ΛCDM ReACT case.

Figure 2 .
Figure 2. Locations of the set β simulations in the ∆z − z mp plane: the width of each redshift bin ∆z as a function of the bin midpoint z mp for the different values of D(z) we used in our simulations.We present the redshift bins across all of our simulations and we see that for z > 1 all the bin midpoints coincide.This allows us to use z mp as a reference point to compare results across different values of D(z).This figure contains both set α (red data-points) (see table 1) and set β (blue and brown data points).In set β, we varied D(z) and increased our redshift resolution at z < 1 to sample the resulting change in shape in P (k) for these low-z mp bins.Note that we run simulations with µ = {1.1,1.15, 1.2} for all these bins.
Figure 2. Locations of the set β simulations in the ∆z − z mp plane: the width of each redshift bin ∆z as a function of the bin midpoint z mp for the different values of D(z) we used in our simulations.We present the redshift bins across all of our simulations and we see that for z > 1 all the bin midpoints coincide.This allows us to use z mp as a reference point to compare results across different values of D(z).This figure contains both set α (red data-points) (see table 1) and set β (blue and brown data points).In set β, we varied D(z) and increased our redshift resolution at z < 1 to sample the resulting change in shape in P (k) for these low-z mp bins.Note that we run simulations with µ = {1.1,1.15, 1.2} for all these bins.

Figure 3 .
Figure 3. R(k) at z = 0 from all of our simulations with redshift bins associated with ΛCDM linear growth D = 1.26.We observe symmetry in the power spectra about µ = 1.We also note that the extent to which the power spectra differ from ΛCDM increases as the redshift at which modified gravity is switched on decreases.Consistently with this, we see later that the accuracy of ReACT is better for the power spectra closer to ΛCDM, i.e. for smaller |µ − 1| and higher z mp .

Figure 4 .
Figure 4. R(k) from the simulations with the µ modified in the redshift range containing the transition from matter to dark energy domination.The general trend indicated in fig.3is clearly broken once the Universe enters dark energy domination.This fact will influence the performance of our fitting function (see fig.6).In the context of maximising the constraining power of future surveys, ensuring that k fail > 1 h Mpc −1 thus requires modifying the halo concentrations consistently with µ(z) (see sec. 2).

Figure 5 .
Figure5.The k fail values as a function of z mp i.e., the midpoint of the redshift bin at which µ is 'switched on' for our vanilla implementation of ReACT.We see that, as indicated in fig.3, the accuracy of ReACT when µ is closer to GR, i.e., k fail monotonically decreases as µ increases.We note that k fail tends to increase with z mp , but this trend is broken when z mp is in the transition regime from matter domination to dark energy domination.

Figure 6 .
Figure 6.The fitting function for the ratio of the amplitude of the concentration-mass parameter relative to ΛCDM as a function of z mp .The data points are computed from our simulations at the midpoint of the redshift bin at which µ is switched on in the simulation.In the left panel, we present our fit for varying D(z) and in the right panel, for varying µ.We see that the shape of the fit remains consistent across parameter space, which ties back our discussion regarding the two regimes (see sec. 2).The exponential functional form at z > 1 follows from the hierarchical structure formation (see appendix A) while the strong upturn in the concentration at low redshifts is likely due to the freezing in of structure formation as dark energy domination occurs.

Figure 9 .
Figure 9.We show the solution to the differential equation (A.1) for different redshift bins for a constant value of µ.Clearly, the spherical collapse parameter is more dramatically affected when µ is modified at late times.

Figure 10
Figure 10.We present the k fail parameter with different accuracy thresholds for our simulations across the our parameter-space of (µ, D(z)).This is particularly relevant for the lensing prediction since we would cut-off P (k) at k fail in the computation of the convergence power spectrum.In particular, our choice of 1% in the main text may be conservative, given that baryonic effects may limit the accuracy of ReACT to 3%.We also note that the variation in k fail is substantial between µ = 1.1 and µ = 1.2, whereas the difference is much less significant between the two cases D = 1.26 and D = 1.45.These observations motivate out choice of presenting the corresponding weak-lensing

Figure 11 .
Figure 11.The variation of the fitting parameters as a function of µ at z = 0.

Figure 12 .
Figure 12.We show the symmetry in the shape of the matter power spectrum across µ = 1 is also seen in the fitting function.In the above figure, the data points are derived from the simulations for µ = 1.1 (circles) and µ = 0.9, (crosses) respectively but we only derive the fit for µ = 1.1.We then derive the fit to the µ = 0.9 datapoints by taking the reciprocal of the fit the µ = 1.1 points.
|µ − 1| Vanilla fail Extended fail Vanilla k fail Extended k fail