Quijote-PNG: The Information Content of the Halo Mass Function

We study signatures of primordial non-Gaussianity (PNG) in the redshift-space halo field on nonlinear scales using a combination of three summary statistics, namely, the halo mass function (HMF), power spectrum, and bispectrum. The choice of adding the HMF to our previous joint analysis of the power spectrum and bispectrum is driven by a preliminary field-level analysis, in which we train graph neural networks on halo catalogs to infer the PNG f NL parameter. The covariance matrix and the responses of our summaries to changes in model parameters are extracted from a suite of halo catalogs constructed from the Quijote-png N-body simulations. We consider the three main types of PNG: local, equilateral, and orthogonal. Adding the HMF to our previous joint analysis of the power spectrum and bispectrum produces two main effects. First, it reduces the equilateral f NL predicted errors by roughly a factor of 2 while also producing notable, although smaller, improvements for orthogonal PNG. Second, it helps break the degeneracy between the local PNG amplitude, fNLlocal , and assembly bias, b ϕ , without relying on any external prior assumption. Our final forecasts for the PNG parameters are ΔfNLlocal=40 , ΔfNLequil=200 , ΔfNLortho=85 , on a cubic volume of 1Gpc/h3 , with a halo number density of n¯∼5.1×10−5h3Mpc−3 , at z = 1, and considering scales up to kmax=0.5hMpc−1 .


INTRODUCTION
The presence of a certain degree of non-Gaussianity (NG) in the primordial cosmological perturbation field is a general prediction of both inflationary and other early Universe scenarios.In addition, both the level of the predicted NG signal and the shape of the expected NG signatures are significantly model dependent.This makes primordial non-Gaussianity (PNG) a powerful tool to constrain inflation, or alternative primordial models, and to provide clues about physics at very high energy scales.
From an observational point of view, the challenging aspect of any PNG analysis is that the expected NG signatures are very small and the optimal statistic that maximizes their signal-to-noise ratio is unknown from low-redshift observables.Indeed, to date there has been no experimental detection of a PNG signal, although significant constraints have been placed using Cosmic Microwave Background (CMB) data; the CMB is an ideal observable for PNG studies, since it formed at early times, when cosmological perturbations where still in the linear regime, hence preserving the statistical features of the primordial fluctuation field.The most precise results currently come from the analysis of Planck CMB data, which produced an upper bound on the level of PNG at roughly less than 0.1% than the amplitude of the Gaussian component of the field (Akrami et al. 2020).
The open question is whether and how we can obtain more stringent PNG constraints-or achieve a detection-with future cosmological observations.In this respect, it is known that, after Planck, CMB data have nearly saturated its PNG constraining power, with possible improvements of, at most, a factor ∼ 2 for relevant parameters in a majority of scenarios (Finelli et al. 2018;Abazajian et al. 2019).It is therefore necessary to explore different observables.Galaxy clustering is a natural candidate for two main reasons.First of all, in the limit of weak PNG, the bispectrum (i.e., the 3-point function of the Fourier/harmonic modes) of primordial cosmological perturbations contains most of the non-Gaussian information and the three-dimensional galaxy density field contains more bispectrum modes for NG analysis than the two-dimensional CMB map.Furthermore, some models-notably, those producing a "local type" bispectrum, where the signal peaks on squeezed Fourier mode triangles-generate a characteristic scale dependent signature in the galaxy power spectrum on very large scales (Dalal et al. 2008;Matarrese & Verde 2008;Slosar et al. 2008;McDonald 2008;Giannantonio & Porciani 2010;Desjacques & Seljak 2010a), which can be used to constrain NG.
In both cases there are however some important complications to consider.As far as bispectrum analysis is concerned, the big caveat is that the additional modes in the Large Scale Structure (LSS) bispectrum are in the non-linear regime.Hence, they present a "latetime" component generated by the non-linear gravitational evolution of structures, which is hard to disentangle and much larger than the primordial one.Of course, this late-time 3-point signal is interesting in itself since it carries a lot of information about cosmological parameters and structure evolution (Hahn et al. 2020;Hahn & Villaescusa-Navarro 2021); however, as long as we are focused on PNG, it is a massive source of contamination, with an amplitude ∼ 1000 times larger than the primordial signal of interest.The scale-dependent power spectrum signature on large scales clearly does not present this problem and was considered for a long time a cleaner LSS probe of PNG, although limited to a subset of all possible PNG scenarios.However, a significant issue has been recently re-pointed out also in this area (Reid et al. 2010;Barreira 2020Barreira , 2022)), namely the degeneracy produced by the breaking of the universality relation that was generally used to link the NG galaxy bias parameter b φ to the linear bias parameter b 1 .This is due to halo/galaxy assembly bias effects and, if not addressed in any way, it allows us only to constrain the b φ f NL combination.
A key objective in cosmological PNG studies is thus developing optimal data analysis strategies to overcome, at least partially, the aforementioned issues.As long as the b φ (b 1 ) relation is concerned, an active effort is being put into characterizing it as well as possible via numerical studies of N-body simulations (Barreira 2020(Barreira , 2022;;Lazeyras et al. 2023;Sullivan et al. 2023), in order to produce accurate priors.Another logical line of attack, which we start exploring in this work, is that of going beyond a power spectrum + bispectrum analysis and include extra summary statistics, which could help disentangle the PNG signal from late-time evolution effects.The open question, with no straightforward answer, is of course, which summary statistics are best suited to this purpose?In this paper we explore the halo mass function (HMF) as an interesting candidate.This choice was not casual but was driven by training graph neural networks to perform field-level likelihood-free inference on halo catalogues from Quijote-png simulations.The analysis of the outcome of those calculations led us to the conclusion that the model was extracting information from the abundance of halos, as we explain in section 3.1.Therefore, the halo mass function can be seen as a machine learning-driven statistic that stands ahead of others.
Furthermore, our choice is also justified at a theoretical level, since the HMF has been known for a long time to be sensitive to non-Gaussian initial conditionswhich are able to skew its distribution by changing the abundance of massive halos-and it was proposed as an interesting complementary PNG probe to the bispectrum in a number of papers (Matarrese et al. 2000;Sefusatti et al. 2007;Grossi et al. 2007;Pillepich et al. 2010;Desjacques et al. 2009;Grossi et al. 2009;Desjacques & Seljak 2010b;LoVerde & Smith 2011;Palma et al. 2020).On top of this, a major advantage of the HMF is that it directly depends on the PNG amplitude parameter f NL .Therefore, it does not exhibit the b φf NL degeneracy that affects the scale-dependent power spectrum signature.
This work belongs to the Quijote-png series (Coulton et al. 2023a;Jung et al. 2022a;Coulton et al. 2023b;Jung et al. 2022b), where we aim to build a simulationbased pipeline to optimally extract NG information, pushing our analysis to smaller, non-linear scales.This kind of approach is complementary to a perturbation theory-based, likelihood analysis of power spectrum and bispectrum (Moradinezhad Dizgah et al. 2021;Cabass et al. 2022a,b;D'Amico et al. 2022).See also Giri et al. (2023), for an alternative simulation-based approach, which uses large scale modulation of small scale power.
The paper is structured as follows: in section 2 we briefly describe the simulation dataset used in our analysis; in section 3.1 we describe our preliminary field-level analysis; in section 3.2 we recall and summarize the main methodological aspects of our data analysis pipeline to extract relevant summary statistics and compute the corresponding Fisher matrix; section 3.3 is devoted to a specific discussion of the HMF-the main new ingredient with respect to our previous analyses-and of how we extract it from simulations; our numerical Fisher forecasts are described in section 4, where we also discuss the improvements coming from complementing the initial power spectrum + bispectrum analysis with HMF estimates; finally, we draw our conclusions in section 5.

SIMULATIONS
In this work, we use the publicly available halo catalogues derived from the Quijote suite of N-body simulations (Villaescusa-Navarro et al. 2020).1These simulations have been produced using the codes 2LPTIC (Crocce et al. 2006) and 2LPTPNG (Scoccimarro et al. 2012;Coulton et al. 2023a) 2 to generate initial conditions at z = 127, Gadget-III (Springel 2005) to follow their evolution up to z = 0 and the Friends-of-friends algorithm to identify the halos in each simulation (Davis et al. 1985).
We report the cosmological parameters of these simulations in table 1.As described in section 3.2, we use 15000 simulations at the fiducial cosmology to evaluate covariance matrices, and paired sets of 500 catalogues where one parameter is displaced by a small step from its fiducial value to compute derivatives with respect to all parameters considered in the analyses.As in Coulton et al. (2023b); Jung et al. (2022b), we focus on the following cosmological parameters {σ 8 , Ω m , n s , h}3 and PNG amplitudes {f local NL , f equil NL , f ortho NL }, including a simplified bias parameter M min (the minimum mass of halos included in the analysis).To ensure that the initial condition generation method has not generated unphysical higher-order N-point functions, which could impact the results presented here, we performed further validation of the initial conditions by examining the primordial trispectrum.As is discussed in appendix A, we find no evidence of large, unphysical trispectra in the initial conditions.
We focus our analyses at redshift z = 1, for which all power spectra and (modal) bispectra have been computed in Jung et al. (2022b).Results at lower redshifts, z = 0.5 and z = 0, are also shown in appendix B.

Field-level analysis
As we discussed in the introduction, the problem of finding an optimal summary statistic that minimizes the error bars on a given cosmological or PNG parameter is unsolved.An alternative to using summary statistics is to perform field-level analysis.The goal with this kind of analysis is to maximize the amount of information that can be extracted without relying on summary statistics.While there are many types of methods to perform such analysis, in our case we made use of graph neural networks (GNNs) (Battaglia et al. 2018).The advantages of GNNs over other methods are that they 1) do not impose a cut on scales; 2) symmetries (e.g.rotational and translational invariance) can be easily implemented; and 3) can be more interpretable than other methods.Because of this, we decided to train GNNs to perform field-level likelihood-free inference.
As a starting point, we run 1,000 simulations; each containing 512 3 particles in a periodic box of size 1 h −1 Gpc.Each of those simulations has a different initial random seed but also a different value of f local NL in the range −300, +300.The value of the cosmological parameters was the same in all simulations.We then trained a GNN to perform field-level likelihoodfree inference on the value of f local NL .The architecture and training procedure are the same as those outlined in de Santi et al. (2023); Shao et al. (2022); Villanueva-Domingo & Villaescusa-Navarro (2022).
From this exercise, we found that our model was able to infer the value of f local NL with an error of σ(f local NL ) ∼ 35, at z = 0.In an attempt to understand the behavior of the network, we trained a deep set model (Zaheer et al. 2017) where the only information we made use of the halos was their masses, not their spatial positions.By training such a model, we found that the performance of this model was almost identical to the one of the GNNs.We thus concluded that the network was likely not using the clustering of the halos to perform the inference.Therefore, the network should be using the abundance of halos to infer f local NL .To verify this, we trained a simple model consisting of fully connected layers on the halo mass function of the halo catalogues from the simulations.We found that this model performed almost as well as the GNN.From this exercise, we reached the conclusion that the halo mass function is a summary statistic that contains lots of information, likely more than clustering-based statistics as the GNN did not use those to perform the inference.
We emphasize that we trained the GNN using halo catalogues from simulations that only vary f local NL .Therefore, our results did not account for degeneracies with cosmological parameters that could degrade the constraints, as we shall see below.
This motivated a further analysis-illustrated in the following sections-in which we explicitly extract the power spectrum, bispectrum and HMF from the Quijote dataset, as well as their covariance and response to variations in both cosmological and PNG parameters, in order to perform a full Fisher matrix forecast on nonlinear scales.

Fisher information
In this section, we recall the main ingredients of our Fisher analysis pipeline, which was previously used in Jung et al. (2022b).
The Fisher information matrix, defined as allows us to estimate the variance, σ 2 (θ i ) = (F −1 ) ii , of the optimal unbiased estimator of a given summary statistic s with covariance C assuming the statistic is Gaussian distributed,4 and neglecting the dependence of C itself on parameters (Carron 2013).
In this work, both the covariance and derivatives are computed from the simulations described in section 2. The covariance matrix is evaluated using where n r is the number of realizations at fiducial cosmology (15000 here).Then, to obtain an unbiased estimate of the precision matrix, we apply the Hartlap correction factor (Hartlap et al. 2007) where n s is the length of the summary statistic vector s (note however that this correction is very small here as n s ∼ 10 2 while n r = 15000).The derivatives are calculated using finite difference: where we use the sets of 500 simulations where one parameter θ i is displaced by ±δθ i with respect to its fiducial value.However, it was noticed in Coulton et al. (2023b); Jung et al. (2022b) that this number of realizations was not sufficient to obtain fully converged derivatives of the halo power spectrum and bispectrum, leading to spuriously low predictions when analyzing jointly cosmological parameters and PNG amplitudes.To overcome this issue, a conservative approach to Fisher matrix computations was developed in Coulton & Wandelt (2023); Coulton et al. (2023b), which is based on computing the Fisher matrix from maximally compressed statistics instead of working with the summary statistics directly.
As shown in Heavens et al. (2000); Alsing & Wandelt (2018), the compressed quantity defined by si = conserves all the statistical information about the parameter θ i contained in the data vector s, if s follows a Gaussian likelihood (hence, the same assumption as for the Fisher matrix in eq. 1).This compression uses the same ingredients as for the Fisher matrix computation (covariance and derivatives of s), with the addition of the mean s that is trivial to evaluate from the simulations at fiducial cosmology.Repeating the process for all parameters of interest in θ, one can then compute the Fisher matrix of the compressed statistics s by substituting it to s in eq. ( 1).In practice, one has to separate the initial dataset into two subsets.The first is used to perform the compression (i.e.compute the derivatives in eq. 5) and the second is compressed (i.e.s in eq. 5) and is then used to calculate derivatives ∂s/∂θ i and covariance Ĉ of the compressed statistics, to obtain a conservative estimation of the Fisher matrix.In this work, we use 80% and 20% of the simulations for the two steps respectively, which have been verified to give optimal and numerically stable results.We repeat the procedure for many random splits of the data (between the two steps) and average the results to minimize the intrinsic variance of the method.Finally, as shown in Coulton & Wandelt (2023), computing the following combination of the standard (overoptimistic) and compressed (conservative) Fisher matrices where G corresponds to the geometric mean defined by gives unbiased estimates of Fisher error bars with a much smaller number of simulations.An illustration of the different convergences for the three methods is provided in appendix C.

Halo mass function
In addition to the halo power spectrum and bispectrum, we consider the halo mass function (HMF) defined as the number of dark matter halos per unit of comoving volume per unit of logarithmic mass bins.
We measure it in the Quijote simulations using 15 logarithmic bins corresponding to halo masses M between approximately 2.0 × 10 13 and 4.6 × 10 15 M /h (note however that we do not use the first two bins in the analyses presented in section 4).To be exact, we use the same binning as in Bayer et al. (2021), where the counted halos contain each between 30 and 7000 dark matter particles. 5 In figure 1, we show the impact of the three shapes of PNG on the halo mass function.Both the local and equilateral shapes increase the number of massive halos for a positive f NL value (and decrease it for a negative f NL ) and have very degenerate signatures, while for orthogonal PNG it is the opposite.For less massive halos, the effect of PNG changes sign (with the switch occurring for higher masses for orthogonal PNG, which is the only one which appears in the mass range of the plot at z = 1).This effect was already present on early works on the HMF with PNG simulations (see e.g.LoVerde et al. 2008) and is due to the fact that, at fixed Ω m , more massive halos can only appear at the expense of less massive halos and matter in smaller structures.

Constraints from the HMF
As a preliminary exercise, in figure 2, we show the constraining power of the halo mass function on PNG amplitudes f NL of the three shapes, assuming exactly known cosmological parameters.As expected, the HMF is, in this case, extremely sensitive to the presence of PNG, leading to even tighter constraints than the power spectrum and bispectrum.For example, our Fisher forecast on PNG of the local type is σ(f local NL ) ∼ 30 at z = 0, which is in very good agreement with the GNN and deep sets results σ(f local NL ) 35 (see section 3.1), and is more than two times smaller than the equivalent power spectrum + bispectrum forecasted error bar.
However, it is well known that there are large degeneracies between f NL and several cosmological parame-5 The mass of a halo is given by M = N mp where N is the number of dark matter particle it contains and mp is the mass of a dark matter particle.However, mp depends on the cosmological parameter Ωm, which requires to include the correction term − 1

∂ HMF
∂ lnN when computing the derivative ∂ HMF ∂ Ωm (see Bayer et al. 2021, for details).This derivative can also be evaluated by finite difference, between bins of N .at z = 0 and z = 1.For internal comparison, the derivative with respect a given parameter θ is multiplied by the finite difference ∆θ, used for its numerical estimation (see table 1 for details).The vertical scale is logarithmic, except in the range [−10 −8 , 10 −8 ], where it is linear.Note that, in some cases, we have a change of sign in the fNL derivatives, implying an opposite effect of PNG on the abundance of high mass and low mass halos respectively.This is consistent with previous findings in the literature, as pointed out in the main text.The decreasing behaviour of all derivatives at high M is related to the exponential decay of the HMF in this mass range; note that a plot of the logarithmic derivatives would display clear differences between them, also at high M .The numerical results displayed here have all been cross-validated in the simulation-independent, halo-model based analysis that we describe in section 4.4.
When we jointly analyze all parameters, these degeneracies increase the errors significantly (by roughly one order of magnitude at z = 1, and slightly less at z = 0, where the change of sign of f NL derivative-seen in figure 1-helps distinguish it from the response to variations in other cosmological parameters), making them larger than those achievable from the power spectrum and bispectrum combination.

Joint constraints with the power spectrum and bispectrum
While, as expected, the HMF alone does not produce competitive f NL constraints in comparison with the power spectrum and bispectrum, it does remain interesting to investigate whether a combined analysis of all three statistics can produce significant improvements; this is the main point of the present work.Complementing our previous power spectrum + bispectrum analysis with the HMF can in principle benefit us in two ways.First of all, it directly adds extra information about the f NL parameter; also, it could be useful to help break the ).These constraints are derived from the Quijote suite of halo catalogues at z = 0 and z = 1, each having a 1 (Gpc/h) 3 volume.The solid lines (with triangle markers) are computed for each primordial shape independently, assuming a fixed cosmology (at fiducial values), while for the dash-dotted lines we marginalize over the cosmological parameters σ8 and Ωm.This highlights the large degeneracies between the parameters at the level of the halo mass function.For comparison, we also show the corresponding constraints from the power spectrum and bispectrum (horizontal solid lines and dash-dotted lines for the independent and joint cases respectively), as computed previously in Jung et al. (2022b) (Mmin = 3.2 × 10 13 M /h).
If we consider the unmarginalized HMF results, we see that the fNL constraining power is higher at z = 1 for the local and equilateral case, despite the smaller number of halos at this redshift; this is clearly due to a stronger response of the HMF to variations in fNL at higher redshift, consistent with previous findings (see, e.g., figure 4 in LoVerde et al. 2008).
The shape is due to the change of sign in the fNL derivative at different masses, discussed in the main text and in figure 1.
important degeneracy between f NL and the so-called b φ bias parameter.Before presenting our results, let us review and discuss the latter point in more detail.In the presence of local PNG, the halo density fluctuation field δ h (z) can be written to leading order as follows (Dalal et al. 2008 where δ m is the matter density fluctuation, D(z) is the growth factor and b 1 , b φ are bias parameters, defined respectively as the response of δ h to mass density δ m and primordial potential φ.It is evident, in this relation, that the scale-dependent signature depends on both b φ and f NL , and that the two parameters are completely degenerate.This issue can be avoided if one assumes-as it was generally done-the universality relation between b 1 and b φ , that is where δ c is the critical density for collapse.However, it has been recently pointed out in Barreira (2020Barreira ( , 2022) ) that such a relation does not accurately describe the bias of either galaxies, selected by stellar mass, or halos, selected by concentration.Therefore, b φ is not exactly determined anymore and this reintroduces the b φf NL degeneracy problem.To overcome the issue, different studies have been focusing on using simulations to produce accurate priors on b φ (Lazeyras et al. 2023) and on exploiting the multi-tracer technique (Barreira & Krause 2023;Sullivan et al. 2023;Karagiannis et al. 2023).In the present context, the idea is instead to try and break the degeneracy by exploiting the information in the HMF-which selects all halos in each given mass bin-and its direct dependence on f NL and not on b φ .
For clarity, we split the discussion of our results into two parts: initially, we assume universality in the b φ (b 1 ) relation using eq.( 9) and we measure the sheer extra information content in the HMF, in absence of the b φf NL degeneracy 6 ; later on, we instead treat b φ as a free parameter.

Fixing b φ
The outcome of the first part of the analysis (assuming universality in b φ (b 1 ) is illustrated in figure 3 and 4 (see also table 2).We see that, by adding the HMF, error bars on σ 8 and f equil NL roughly become two times smaller than the power spectrum + bispectrum result.Moreover, there is also a noticeable improvement for Ω m and f ortho NL .For f local NL there is instead no clear improvement; this seems due to the fact that in this case the information content is totally dominated by the power spectrum contribution, via scale dependent bias; such contribution is instead smaller for the orthogonal shape and absent for the equilateral case, making the HMF inclusion more important for these scenarios and especially for the equilateral one.
Note that we consider only halos with masses above ∼ 4 × 10 13 M /h in the HMF, which is larger than the fiducial M min = 3.2×10 13 M /h used to study the power spectrum and bispectrum.This means that the HMF is not sensitive at all to small variations of M min around 6 Or, equivalently, we forecast the power spectrum + bispectrum + HMF constraining power on the b φ f NL parameter combination  the fiducial value.However, through cross-correlated terms with the other summary statistics, error bars on M min are almost two orders of magnitude smaller 7 .In appendix C, we verify the numerical stability of our results by varying the number of simulations used.
It is interesting to check which halo mass range gives the largest contribution to the observed improvements.To this purpose, we repeat the analysis by sub-dividing halos in a "low mass" (4×10 13 < M < 1.7×10 14 M /h), "intermediate mass" (1.7×10 14 < M < 7.5×10 14 M /h) and "large mass" (M > 7.5 × 10 14 M /h) interval and check the contribution of each group separately.Our results are displayed in figure 5 where we see that the low mass range carries most of the information.This is somewhat counter intuitive, since the effect of f NL is expected to be larger in the tails of the distribution.
7 An important caveat here is that it is important to verify whether this conclusion holds when considering a more complex bias model, which includes higher order bias parameters; this will be done as part of a future work on mock galaxy catalogues, by including numerical derivatives with respect to HOD parameters Accounting for the effects of b φ in our methodology is not straightforward, since b φ cannot be explicitly in-cluded as an input parameter in our simulations and this does not allow us to directly compute the numerical derivative ∂s/∂b φ .To circumvent this issue in a simple way and be able to perform a first test of the ability of the HMF to remove degeneracies between b φ and f local NL , we then decide here to work in the conservative assumption that these two parameters are fully degenerate at the level of the halo power spectrum and bispectrum.In other words, we assume that ∂s/∂b φ ∝ ∂s/∂f local NL , where s is either the power spectrum or the bispectrum.
For the HMF, we instead set the derivative with respect to b φ equal to zero, as it does not depend on this parameter, and compute the f local NL derivative as usual.In figure 6, we show the 1-σ Fisher constraints obtained in this assumption and compare them with the "ideal" (b φ fixed) constraints derived in the previous section, for different k max (see also table 2).
The most important result here is that the inclusion of the HMF makes it possible to break the b φ -f local NL degeneracy to a level which allows us to produce meaningful f local NL constraints without resorting to any prior information on b φ .The final f local NL forecast is however degraded by a factor ∼ 2.5 with respect to the idealized, b φ fixed case that was shown in figure 11.In order to achieve this constraining level it is also crucial to include the information from the power spectrum and bispectrum at non-linear scales (k between 0.2 and 0.5 h Mpc −1 ), as it helps break degeneracies with several cosmological parameters (Ω m in particular).
We corroborate our findings with a simulationindependent analysis based on the halo model (for a review, see Cooray & Sheth 2002;Asgari et al. 2023).Within this framework, we describe the HMF and halo power spectrum following Takada & Spergel (2014), up to k max = 0.2 h Mpc −1 .We use the halo mass function and bias from Tinker et al. ( 2010) using directly M 200,m as the mass definition in the mass integration.In the power spectrum analysis of the simulations the halos are considered point-like, thus we use a Dirac delta as halo profile; thanks to the low k max we use, the 2-halo term dominates the signal and this approximation is appropriate.The effect of PNG-here we only consider the local model-is included as a correction to the HMF parametrized according to LoVerde & Smith (2011), and through the scale dependent halo bias shown in equation ( 8).While aware that the M 200,m mass does not match the FOF mass used in the rest of the paper, we still consider as observable the HMF divided in  10 bins logarithmically spaced between 3.2 × 10 13 M /h and 3.2 × 10 15 M /h.We bin the halo power spectrum in 30 bins logarithmically spaced between 6.3 × 10 −3 h Mpc −1 and 0.2 h Mpc −1 .We choose a relatively low k max to ensure that non-linearities are negligible at this stage.In the HMF-halo power spectrum covariance, for which we again follow Takada & Spergel (2014), only the Gaussian terms are included at present.A more refined analysis, including a wider range of scales and masses, the complete covariance, uncertainties on the parametrization of the HMF and, crucially, the bispectrum will be presented in a future work (Ravenni & et al. in prep.).
The results are shown in figure 7, which highlights a very good agreement between our preliminary theoretical computations and the purely simulation-based forecast.This result confirms that a joint analysis including the HMF is an interesting approach, which deserves further investigation and could be adopted as a complementary strategy to those already implemented in the literature to address the b φ -f local NL degeneracy issue.

Removing degeneracies with Planck priors
As highlighted in section 4.2, removing degeneracies of the HMF using the information from the halo power spectrum and halo bispectrum improves significantly the constraints on PNG of the equilateral type.In this section, we push the idea further by assuming strong, but realistic, priors on cosmological parameters, based on CMB measurements from Planck.
We use the same Gaussian likelihood based on the Planck CMB data (Aghanim et al. 2020) 8 in addition to our HMF, power spectrum and bispectrum measurements to derive 1-σ Fisher constraints (see also table 2).For both f local NL and f equil NL it improves these constraints, while the effect is smaller for f ortho NL .Note also that the effect is the strongest when the HMF is also considered in the analysis, meaning it removes degeneracies between PNG and cosmological parameters at the level of the HMF.Concerning numerical convergence with the number of simulations used to compute the derivatives, including these Planck priors also improves it significantly, where only f equil NL is not optimally constrained for the power spectrum + bispectrum case, and all parameters have converged when we add the HMF information.

CONCLUSION
In this work we presented a combined analysis of the power spectrum, bispectrum, and mass function of dark matter halos in the Quijote-png simulation suite.Our main goal was that of verifying whether adding the HMF to our previous joint power spectrum and bispectrum analyses (Coulton et al. 2023a;Jung et al. 2022a;Coulton et al. 2023b;Jung et al. 2022b) could lead to improved constraints on primordial non-Gaussianity.The main underlying reason behind this analysis is that the HMF turned out to be the statistics used by a sophisticated graph neural network when carrying out a preliminary field-level likelihood-free inference calculation.Furthermore, the HMF tail has been known for a long time to be strongly sensitive to PNG.Finally, the HMF not only carries complementary information to the power spectrum and bispectrum, but also does not suffer from the b φ -f local NL , assembly bias-PNG degeneracy that has been recently pointed out in Barreira (2020Barreira ( , 2022) ) as an important issue in the analysis of local PNG.
Our results show that the HMF can indeed play a significant role in tightening the expected PNG bounds and breaking parameter degeneracies, when its contribution is added to those of the power spectrum and bispectrum.In the first part of our analysis we remove a priori the b φ -f local NL degeneracy by assuming universality in the b φ (b 1 ) relation, i.e, we set b φ = 2δ c (b 1 −1).In this case, we see that the HMF is able to improve equilateral f NL constraints by roughly a factor 2 and orthogonal f NL constraints by 15%.Constraints on PNG of the local type are instead unchanged, since in this idealized scenario the local PNG information is dominated by the large scale power spectrum modes, via scale dependent bias.
In the second part of the analysis, we treat instead b φ as a free parameter and assume that the responses of the halo power spectrum and bispectrum to changes in b φ and f local NL are identical, that is, we assume that these two parameters are fully degenerate in a joint analysis of power spectrum and bispectrum.Starting with this setup, we then see that the additional inclusion of the HMF is able to break the b φ -f local NL degeneracy at a significant level, without the need to rely on any prior on b φ or any other external information.More precisely, our final f local NL constraints after marginalizing over b φ and other standard cosmological parameters are now degraded by a factor ∼ 2.5, compared to the ideal case in which b φ is fixed by the universality relation.We confirmed these results with a semi-analytical, halo model based evaluation of the Fisher matrix, in which we restrict ourselves to the power spectrum and HMF, after verifying that for local PNG these two observables give the dominant contributions to the final sensitivity.We note that to achieve the claimed level of precision on f local NL , it is important to include non-linear scales in the analysis, up to k max = 0.5 h Mpc −1 since they help break additional important degeneracies that affect the HMF constraining power.We also stress that Quijote-png simulations have a cosmological volume of 1 (h Gpc) −3 , making it not straightforward to generalize our forecasts to, e.g., a Euclid-like or other coming survey settings.For the same reason, a direct comparison with other forecastssuch as those based on the multi-tracer methodology and placing suitable priors on b φ -are not at the moment easy to make.In a forthcoming publication, Ravenni & et al. (in prep.), we will produce more detailed semianalytical predictions for future surveys, based on the halo model.
The results presented here have to be considered as preliminary also as they rely on a simplified bias model for our tracers and they do not account for systematic effects in the determination of the HMF from actual observations.Indeed, the dark matter mass of a halo is a quantity notoriously difficult to measure observationally, especially for high-redshift objects.Halos are complex and dynamic structures, which are almost exclusively probed by the signal broadcasted by the baryons they host.(Dark) Mass measurements tend to require sophisticated and labor-intensive observations, which is unfeasible for a large number of objects, as needed for the HMF.Moreover, the sample completeness (for the host halo, not the tracers!) need to be known exquisitely well, which may constitute a formidable challenge.Among the most promising approaches are the Sunyaev Zeldovich effect-selected clusters (signal at mm wavelengths) (Mroczkowski et al. 2019), X-ray clusters (Pratt et al. 2019) and (optical) gravitational lensing mass determination (e.g.Murray et al. 2022).For example, cluster catalogs will increase drastically with a suite of forthcoming experiments; eROSITA (Predehl et al. 2021), Simons Observatory (Ade et al. 2019), Euclid (Laureijs et al. 2011), Roman (Akeson et al. 2019) and Rubin (Ivezić et al. 2019).Cluster masses will not be measured directly but inferred through proxies; these proxies, however, will be provided as a product of these surveys, and are expected to be or be made robust and reliable.An important ingredient for any HMF analysis would be to quantify robustly the probability distribution of the proxies as a function of the true halo mass.This can then simply be folded in the error budget and the uncertainty propagated through to the inferred parameters.
The results shown in this paper clearly show that a joint analysis of the HMF, power spectrum and bispectrum of LSS tracers is a promising approach to constrain PNG, hence providing another motivation for further investigation in this direction and for addressing the aforementioned observational issues. .The significance of the detection of the trispectrum in the initial conditions for the three types of primordial non-Gaussianity.This is computed using 200 simulations of each type of primordial non-Gaussianity as τ NL (Kogo & Komatsu 2006).In many inflationary models, τ NL is generated with local non-Gaussianity and thus, the trispectrum seen here is physical.
These trispectra measurements suggest that unphysical higher order N-point functions are not significant in our simulations.

B. ANALYSES AT OTHER REDSHIFTS
We have performed a similar analysis using the Quijote snapshots at z = 0.5 and 0 to verify that our conclusions hold at other lower redshifts.As can be seen in figure 10, this is indeed the case.For all parameters, the relative improvements due to including the halo mass function in the Fisher analysis are of the same order (note however that the difference between the halo power spectrum and bispectrum results is more pronounced at lower redshifts).

C. CONVERGENCE OF NUMERICAL DERIVATIVES
In figure 11, we study the impact of varying the number of simulations used to compute numerical derivatives on the 1-σ Fisher constraints, both with and without including the HMF in the analyses.This shows that the parameters for which the improvement due to the HMF is the largest (i.e.σ 8 and f equil NL ) have also a better numerical convergence with the number of simulations (smaller difference between standard and conservative Fisher methods).

Figure 1 .
Figure 1.The halo mass function derivatives with respect to the parameters σ8, Ωm, f local NL , f equil NL , f ortho NL

Figure 2 .
Figure2.The 1-σ Fisher error bars on fNL (local, equilateral and orthogonal) from the halo mass function, as a function of the maximum mass Mmax of halos considered (Mmin ∼ 4.1 × 10 13 M /h).These constraints are derived from the Quijote suite of halo catalogues at z = 0 and z = 1, each having a 1 (Gpc/h) 3 volume.The solid lines (with triangle markers) are computed for each primordial shape independently, assuming a fixed cosmology (at fiducial values), while for the dash-dotted lines we marginalize over the cosmological parameters σ8 and Ωm.This highlights the large degeneracies between the parameters at the level of the halo mass function.For comparison, we also show the corresponding constraints from the power spectrum and bispectrum (horizontal solid lines and dash-dotted lines for the independent and joint cases respectively), as computed previously inJung et al. (2022b)  (Mmin = 3.2 × 10 13 M /h).If we consider the unmarginalized HMF results, we see that the fNL constraining power is higher at z = 1 for the local and equilateral case, despite the smaller number of halos at this redshift; this is clearly due to a stronger response of the HMF to variations in fNL at higher redshift, consistent with previous findings (see, e.g., figure4inLoVerde et al. 2008).The shape is due to the change of sign in the fNL derivative at different masses, discussed in the main text and in figure1.

Figure 3 .
Figure 3. Ratio of 1-σ Fisher error bars on cosmological parameters and PNG amplitudes from the halo mass function, halo power spectrum and halo bispectrum at z = 1, assuming b φ fixed.This illustrates how including the halo mass function tightens the constraints on several parameters (σ8 and f equil NL in particular).Note that the values of these error bars are given in table 2 and figure 11.

Figure 4 .
Figure 4.The impact of the HMF on the 1-σ constraints on cosmological parameters and PNG amplitudes from the halo power spectrum and bispectrum at z = 1, assuming b φ fixed.However, this is balanced by the fact that, within the analyzed mass range and considering our mass binning choice, we have significantly more halos in the lowest mass interval (∼ 95%, ∼ 5% and 0.01% of the halos are in the low, intermediate and large mass bins respectively).4.4.Breaking the b φ -f local NL degeneracy with the HMF

Figure 5 .
Figure 5.The impact of different mass bins of the HMF on the 1-σ constraints on cosmological parameters and PNG amplitudes from the halo power spectrum and bispectrum at z = 1, assuming b φ fixed.Note that we restrict only the mass range for the HMF.

Figure 6 .
Figure 6.The Halo Mass Function can break the b φ -f local NL degeneracy in the power spectrum and bispectrum.As in figure 3, we show normalized 1-σ Fisher error bars derived from the HMF, halo power spectrum and bispectrum at z = 1.Here, we assumed that f local NL and b φ are fully degenerate at the power spectrum and bispectrum level, while the HMF does not depend on b φ .

Figure 7 .
Figure 7. Similar to figure 6, considering only {σ8, f localNL } and bias parameters.The 1-σ Fisher constraints include the information contained in the HMF and the power spectrum information up to kmax = 0.2 h Mpc −1 computed using the halo model on the left, and from simulations on the right.Note that both methods give σ(f local NL ) ∼ 50 and similar σ(σ8) (less than 20% difference).

Figure 8 .
Figure 8. Similar to figure 3, where we include Planck priors on the cosmological parameters {σ8, Ωm, ns, h} and we assume b φ fixed.
Figure9.The significance of the detection of the trispectrum in the initial conditions for the three types of primordial non-Gaussianity.This is computed using 200 simulations of each type of primordial non-Gaussianity

Table 1 .
The parameters of the Quijote and Quijote-png halo catalogues used in this work.

Table 2 .
The 1-σ constraints on cosmological parameters and PNG amplitudes at z = 1 obtained by combining the information of the halo power spectrum, bispectrum and mass function, each measured from the Quijote and Quijote-png simulations.