Fast Generation of Mock Galaxy Catalogs with COLA

We investigate the feasibility of using the comoving Lagrangian acceleration (COLA) technique to efficiently generate galaxy mock catalogs that can accurately reproduce the statistical properties of observed galaxies. Our proposed scheme combines the subhalo abundance-matching (SHAM) procedure with COLA simulations, using only three free parameters: the scatter magnitude (σ scat) in SHAM, the initial redshift (z init) of the COLA simulation, and the time stride (da) used by COLA. In this proof-of-concept study, we focus on a subset of BOSS CMASS NGC galaxies within the redshift range z ∈ [0.45, 0.55]. We perform GADGET simulation and low-resolution COLA simulations with various combinations of (z init, da), each using 10243 particles in an 800 h −1 Mpc box. By minimizing the difference between COLA mock and CMASS NGC galaxies for the monopole of the two-point correlation function (2PCF), we obtain the optimal σ scat. We have found that by setting z init = 29 and da = 1/30, we achieve a good agreement between COLA mock and CMASS NGC galaxies within the range of 4–20 h −1 Mpc, with a computational cost lower by 2 orders of magnitude than that of the GADGET N-body code. Moreover, a detailed verification is performed by comparing various statistical properties, such as anisotropic 2PCF, three-point clustering, and power spectrum multipoles, which shows a similar performance of the GADGET mock and COLA mock catalogs with the CMASS NGC galaxies. Furthermore, we assess the robustness of the COLA mock catalogs for different cosmological models, demonstrating consistent results in the resulting 2PCFs. Our findings suggest that COLA simulations are a promising tool for efficiently generating mock catalogs for emulators and machine-learning analyses to explore the large-scale structure of the Universe.


INTRODUCTION
The observable universe provides a wealth of information concerning the evolution of cosmology, and the statistics of its large-scale structure (LSS) plays a crucial role in constraining cosmological models.The study of LSS relies on large-scale sky surveys, which can be classified into three main types, i.e. photometric and spectroscopic galaxy surveys, and line-intensity mapping (LIM) surveys.The advancement of precision cosmology has been greatly aided by the availability of precise observational data, coupled with the progress in cosmological simulations and statistical methods.
In recent years, the advancement in Large-Scale Structure (LSS) analysis has required the use of considerably larger sets of simulations.Currently, the prevailing statistical estimators utilized to investigate LSS include the twopoint correlation function (2PCF) (Landy & Szalay 1993), the three-point correlation function (3PCF) (Szapudi & Szalay 1998), and Fourier space multipoles (Feldman et al. 1993;Hand et al. 2017).In order to accurately model the relationship between these statistics and cosmological parameters, it becomes imperative to rely on emulators constructed using extensive sets of simulations (Knabenhans et al. 2019(Knabenhans et al. , 2021;;McClintock et al. 2019;Zhai et al. 2019).Moreover, in recent years machine learning algorithms have increasingly found diverse applications in the analysis of galaxy survey data (Ravanbakhsh et al. 2017;Schmelzle et al. 2017;Ntampaka et al. 2019;Pan et al. 2020;Mao et al. 2020;Lazanu 2021;Wu et al. 2021;Anagnostidis et al. 2022;Makinen et al. 2022;Wu et al. 2023).Therefore, acquiring mock catalogues of galaxies in different cosmologies becomes indispensable for effectively training machine learning models and constraining cosmological parameters.Hence, the generation of ample mock catalogs in different cosmologies is extremely important, irrespective of whether one chooses to employ manually designed statistics as the data summary or adopts machine-learning methods.
The advancement of observations also necessitates extensive simulations.In the next decade, the stage IV surveys such like DESI (DESI Collaboration et al. 2016), EUCLID (Laureijs et al. 2011), Roman (Spergel et al. 2015), LSST (LSST Science Collaboration et al. 2009), CSST (Zhan 2011;Hu Zhan 2021;Miao et al. 2023), Subaru (Aihara et al. 2018) and SKA (Braun et al. 2019;Weltman et al. 2020) are going to map large areas of the sky with unprecedented accuracy and efficiency, resulting in vast amounts of data.Consequently, the development of rapid mock generation techniques becomes crucial in supporting cosmological research that utilizes data from these new-generation sky surveys.
In order to generate mock catalogues that can be compared to observations, it is crucial to employ methods that establish a connection between the distribution of galaxies and the underlying dark matter field.Within the context of LSS analysis, the two traditional employed approaches are the Halo Occupation Distribution (HOD) and the Subhalo Abundance Matching (SHAM) methods.In recent years, several new methods have been proposed.For example, Behroozi et al. (2019Behroozi et al. ( , 2020) ) proposed a method to flexibly and self-consistently determine individual galaxies' star formation rates from their host haloes' potential well depths, assembly histories, and redshifts.In 2022, Wechsler et al. (2022) proposed the ADDGALS technique, which places galaxies within cosmological simulation lightcone outputs, yielding realistic mock galaxies.These mocks are generated using semi-analytic models from numerical simulations (Yung et al. 2022(Yung et al. , 2023;;Bose et al. 2022).These methods play a vital role in bridging the gap between observed galaxy distributions and the underlying dark matter structures.By establishing a connection between galaxies and their corresponding dark matter halos, these techniques not only facilitate follow-up cosmological analysis, but also foster a comprehensive understanding of the LSS and its relationship with galaxy properties.
The concept of HOD originated in the early 2000s with pioneering studies such as Jing et al. (1998); Peacock & Smith (2000); Seljak (2000).Subsequently, it was further refined and explored in Kravtsov et al. (2004), which specifies the probability distribution for the number of galaxies that meet certain criteria (such as a luminosity or stellar mass threshold) within a halo, conditioned on its mass, denoted as P (N |M ).Building upon this, an extended version of the HOD model was proposed by Yang et al. (2003), which establishes a connection between the full distribution of galaxy luminosity and the distribution of DM halos based on the conditional luminosity function.Notably, Yang et al. (2005) utilized this model to calibrate galaxy group finders in magnitude-limited redshift surveys.While the HOD model typically requires 3-5 parameters for a given galaxy sample Zheng et al. (2005); Reddick et al. (2013), the functional form of the HOD can become complex for large galaxy samples analysis, and determining the optimal parameter values to generate mock samples that match with observed statistics can be a time-consuming task.
The SHAM method is grounded on a straightforward assumption that more luminous (or massive) galaxies are hosted by more massive halos.Implementing SHAM entails establishing a mapping between galaxy stellar mass and various halo properties, resulting in the same stellar mass functions (SMFs) but distinct clustering signals.Specifically, SHAM relies on matching the stellar mass of a galaxy (M * ) with the mass of the (sub)halo (M h ).However, a challenge arises from the fact that subhalo masses are subject to intense tidal stripping upon entering larger halos, rendering them unreliable indicators of subhalo size.To address this issue, Kravtsov et al. (2004) proposed employing the maximum circular velocity of subhalos, denoted as V max , as a more robust property for matching galaxies.Furthermore, Shu et al. (2012) suggested that the relationship between galaxies and DM halos is not strictly one-to-one due to inherent physical scatter.Unlike the HOD method, the SHAM approach is largely non-parametric, aside from the scatter factor (as described in Sect.2.4).This basic and intuitive assumption allows for faster and simpler generation of mock galaxy catalogs.
Currently, robust N-body simulation techniques utilize methods such as Particle-Particle Particle-Mesh (P 3 M) or Tree-PM to precisely calculate the gravitational forces acted on each particle, resulting in a typical force resolution of approximately 0.001 h −1 Mpc.For instance, the GADGET N-body/SPH simulation code, which is freely available (Springel et al. 2001;Springel 2005;Springel et al. 2021), is based on the Tree-PM algorithm.However, these techniques have the drawback of requiring numerous timesteps to faithfully simulate structure formation across both large and small scales, thus demanding substantial computational resources.
In contrast, fast mock generation methods offer the capability to simulate the distribution of galaxies in the Universe with significantly reduced computational costs.Currently proposed methods include PINOCCHIO (Monaco et al. 2002), PTHalos (Scoccimarro & Sheth 2002), QPM (White et al. 2013), PATCHY (Kitaura et al. 2013), HALOGEN (Avila et al. 2015), COLA (Tassev et al. 2013), and so on.These methods employ Lagrangian Perturbation Theory (LPT) or the Particle-Mesh (PM) method to simulate the evolution of dark matter particles, followed by halo-finding algorithms or halo bias models to generate mock galaxy catalogues that exhibit statistical properties consistent with observations.These methods provide an efficient alternative for generating mock catalogs while maintaining fidelity to the observed statistics.Among them, COLA is a quasi-N-body method that places particles in a co-moving frame following LPT to simulate the large-scale dynamical evolution, and also uses a full-blown N-body code with the PM algorithm to compute the small-scale dynamics.This distinctive design allows COLA to generate mock catalogues efficiently while maintaining a commendable level of precision in the non-linear clustering regime.As a result, COLA emerges as a promising tool for fulfilling the requirements of mock generation tasks essential for emulator or machine learning analyses.
The COLA method has garnered significant attention in follow-up studies across various disciplines.For instance, Tassev et al. (2015) proposed a spatial extension of the N-body COLA method, enabling zoom-in simulations.Furthermore, Koda et al. (2016a) developed a technique using COLA to generate mock catalogues that incorporate low-mass halos, demonstrating its capability to resolve both massive and low-mass halos.Koda et al. (2016b) employed COLA and HOD methods to create 600 mock galaxy catalogues for the WiggleZ Dark Energy Survey.Additionally, Izard et al. (2018) introduced the ICE − COLA method, which efficiently generates weak lensing maps and halo catalogues in the lightcone, offering a rapid and accurate solution for generating mock catalogues to model galaxy clustering observables.Most recently, Ferrero et al. (2021) utilized ICE − COLA to produce halo lightcone catalogs and applied HOD methods to generate mock galaxy lightcone catalogues for DES Y3 samples.These studies highlight the diverse applications and advancements facilitated by the COLA method and its variants.
In this study, we explore the application of SHAM method to the outputs of the COLA fast simulation technique in order to construct mock galaxy catalogs that accurately reproduce the observed clustering properties.We present our fast mock generation method in Sect.2, which includes a brief explanation of the COLA algorithm, the halo-finder method, and the SHAM technique.Additionally, the observational data utilized in this study are detailed in Sect.2.1.The determination of the optimal parameter used in the SHAM technique is presented in Sect.3. In Sect.4, we compare the clustering properties of mock catalogues generated from GADGET and COLA simulations with observed galaxies.We further present the COLA results in different cosmologies in Sect.4.5.Finally, we summarize our results and conclude in Sect. 5.
In this paper, we employ three different "base" cosmology, listed below, to prevent any confusion.
2) In Sect.2.5, we adopt a base cosmology consistent with the WMAP 5-year data (Komatsu et al. 2009), with the following parameters: Ω m = 0.26, w = −1.0,h = 0.71, in order to maintain consistency with the analysis presented in Li et al. (2016).
3) In Sect.4.5, we assessed the impact of different cosmologies on our proposed scheme by using the Planck 2015 cosmology (Ade et al. 2016a) with the following parameters: Ω m = 0.31, w = −1.0,σ 8 = 0.82, as the basis for various cosmological models, as detailed in Tab. 4.

Observational Data
The Baryon Oscillation Spectroscopic Survey (BOSS) is a part of the SDSS-III (Bolton et al. 2012;Dawson et al. 2012;Eisenstein et al. 2011).BOSS aims to detect the characteristic scale imprinted by baryon acoustic oscillations (BAO) in the early universe by measuring the spatial distribution of luminous red galaxies (LRGs) and quasars (Eisenstein et al. 2001).The high-mass end of SMF is represented by Luminous Red Galaxies (LRGs), making them an ideal group to reproduce using SHAM.
BOSS provides redshift information for approximately 1.5 million galaxies in a sky region of ∼ 10 4 square degrees, divided into two samples: LOWZ and CMASS.The LOWZ sample comprises the brightest and reddest LRGs at z ≤0.4,while the CMASS sample targets galaxies at higher redshifts, many of which are also LRGs.The sky region covered by CMASS NGC galaxies is roughly RA ∈ For the purposes of our study, we focus solely on a subset of the BOSS DR12 CMASS NGC galaxies with a redshift range of z ∈ [0.45, 0.55], simply labelled as CMASS NGC.This enables us to rapidly estimate the accuracy of the mock generation.

COLA simulation
In N-body simulations the equation of motion is often solved using the standard leapfrog Kick-Drift-Kick algorithm.This algorithm discretizes the time evolution operator by applying a set of Kick and Drift operators, given by Here, x i (i = 0, 1, 2, • • • ) denotes the position of a particle at time t i ≡ i∆t, v i+1/2 is the velocity at t i+1/2 ≡ (i+1/2)∆t, and ϕ represents the gravitational potential.
The leapfrog process is a second-order accurate method for discretizing time, but its accuracy degrades for large time steps due to the truncation error from higher-order terms.To accurately integrate cosmological simulations, the time step is typically chosen to be proportional to the Hubble time H −1 (t).However, at high redshifts where the Hubble time is small, the time step must be correspondingly small to avoid additional errors.
COLA 1 , a hybrid approach that combines the second order LPT (2LPT) and N-body algorithms, is an effective solution for simulating dark matter particles.Perturbation theory has been successful in describing large scales, allowing for the linear growth rate to substitute for time integration in N-body simulations.COLA leverages this by using a comoving frame with observers following trajectories calculated in the perturbation theory.This tradeoff between accuracy at small scales and computational speed is achieved without sacrificing accuracy at large scales.
In the framework of COLA, particles evolve in a frame that is comoving with "LPT observers".The process begins with the computation of the initial conditions using 2LPT.Next, particles are evolved along their 2LPT trajectories and a residual displacement is added relative to the 2LPT path.This displacement is then integrated numerically using an N-body solver.Mathematically, where F represents the force on the particle, and the dots represent time derivatives.In LPT, the Eulerian final comoving positions x are related to the initial positions q through a displacement field Ψ : x(t) = q + Ψ (q, t) . (3) If applying 2LPT to the calculation, i.e., x LPT → x 2LPT , then the residual displacement field Ψ res is given by, where Ψ 1 and Ψ 2 are the Zel'dovich and 2LPT displacement fields at the present day (a = 1).Let us define the time operator T as where η is the conformal time, a is the scale factor, and H is the Hubble parameter with H 0 being its value today.
Then the operation of Kick-Drift-Kick algorithm in COLA is given by  1. Simulation settings and the computational costs for the GADGET-2 and COLA simulations.In both simulations, the same cosmological parameters were applied, specifically: Ωm = 0.2951, ΩΛ = 0.7049, Ω b = 0.0468, w = −1.0,σ8 = 0.80, ns = 0.96, and h = 0.6881.The number of DM particles and time steps in simulation are denoted by Npar and Nstep, respectively.The redshift range in the simulations is determined by the initial redshift zinit and the final redshift z final .We will evaluate the performance of COLA by using various values listed for the initial redshift (zinit), the time stride (da), and Nstep of COLA.When comparing the total wall-clock time of GADGET to that of COLA, with a similar number of CPUs, it becomes evident that the computational cost of GADGET simulations is approximately 102 times greater than that of COLA simulations. where Here ∆D n = D n,i+1 − D n,i for n = 1, 2, • • • , which denotes the change in the first and second-order growth factors over the timestep.COLA uses Eq. 6 & Eq. 7 to update the N-body particle positions and velocities, as well as to interpolate the quantities between timesteps for snapshots at redshifts of interest.Unlike the standard N-body methods, COLA relies solely on the PM method and 2LPT, leading to an imprecise force resolution and correspondingly a coarse resolution on small scales.For this reason, the force resolution of COLA simulations needs to be carefully considered when using the halo-finder.
To identify halos, we use the ROCKSTAR 2 (Robust Overdensity Calculation using K-Space Topologically Adaptive Refinement) halo finder (Behroozi et al. 2013a), which employs adaptive hierarchical refinement of Friend-of-Friend groups in six phase-space dimensions and one time dimension.The resulting halo samples from ROCKSTAR include both host halos and subhalos.To mitigate the small-scale uncertainty of COLA, we adjust the force resolution in the ROCKSTAR halo-finder to approach the grid size in our COLA simulation.The modern SHAM method not only requires high-resolution simulations with resolved substructure but also necessitates accurate merger trees to track the path of halos.The evolution history from the merger tree is helpful for finding representative halos throughout the history of the Universe.Once ROCKSTAR creates the particle-based merger trees, we use the "consistent trees" algorithm (Behroozi et al. 2013b), to trace the evolution of halos with redshift.
To assess the feasibility of using COLA, we ran simulations with 1024 3 particles in a box measuring 800 h −1 Mpc, allowing for comparison with GADGET mock under identical conditions.The initial matter power spectrum and transfer function were generated using CAMB (Lewis et al. 2000).Subsequently, in Sect.4.5 we employed the best simulation settings for production of the mock catalogues.A detailed outline of the simulation settings can be found in Tabs. 1 & 4. Comparison of halo mass functions (HMFs) of the simulated halos provided by GADGET simulation and COLA simulations with different settings of (zinit, da), respectively.The theoretical Press-Schechter mass function, labeled as 'PS', is also shown as a black dashed line in each panel.As observed, the HMF derived from COLA agree well with the theoretical prediction and that derived from GADGET.Furthermore, the results from different COLA parameter settings converge.In the low-mass region (M h ≲ 2 × 10 1 2 h −1 M⊙), the simulated HMFs in both and COLA begin to noticeably deviate from the theoretical predication.This deviation is due to the limitations of mass resolution in N-body simulations.
Furthermore, we perform a convergence test in the halo mass function (HMF) according to the simulated halos from and COLA simulations respectively.The results are shown in Fig. 1.As seen, both HMF of GADGET and HMFs of COLA simulations with different combinations of (z init , da) lead to comparable HMFs with the theoretical prediction (Press & Schechter 1974;Bond et al. 1991).By varying z init and da, the HMFs simulated with COLA are highly consistent with the HMF of GADGET.

Stellar mass function
The simulated halo samples correspond to the total galaxy samples covering all stellar mass intervals.However, due to the selection process in measurements, the stellar mass function (SMF) of a single observational survey is incomplete.By combining several galaxy surveys, the complete SMF can be obtained and be described by a fitted function.Typically, the observed SMF of quiescent galaxies is fitted with a double Schechter function (Weigel et  2016), while star-forming galaxies are often described using a single Schechter function (e.g., Li & White (2009); Peng et al. (2012); Muzzin et al. (2013)).The SMF is known to evolve with redshift, but Mitchell et al. (2016) indicates that the relationship between stellar mass and halo mass weakly evolves over the redshift range of 0 < z < 4. In this work, we use the redshift-invariant stellar mass function (SMF) by fitting it into a segmented function within a specific stellar mass interval.Each segment of the SMF satisfies the single Press-Schechter formula, given by The fitting parameters (ϕ c , α, log M c ) were proposed by Rodríguez-Torres et al. ( 2016) and are shown in Tab. 2.
To match the stellar mass catalogues and the BOSS DR12 catalogues, we employ three key quantities, namely PLATE, MJD, and FIBERID, to uniquely identify each galaxy in the BOSS DR12 catalogues.For the CMASS NGC catalogue, we combine the raw released catalogue with the Portsmouth SED-fit DR12 stellar mass catalogue (Maraston et al. 2009) to obtain the galaxy catalogue enriched with stellar mass.We then obtain the SMF of the CMASS NGC catalogue.By combining the complete SMF and the SMF of the CMASS NGC catalogue, we can calculate the downsampling ratios f down for each mass bin, as illustrated in Fig. 2.These ratios are used to generate mock catalogues of the CMASS NGC catalogue.

Subhalo abundance matching procedure
SHAM is a simple and powerful statistical approach for connecting galaxies to subhalos.In its simplest form, given some property of subhalos, such as halo mass or maximum circular velocity, the subhalo number density and the galaxy number density are matched in order to obtain the connection between subhalos and the galaxies that they host.Some SHAM related works focus on fitting the parameters in a function describing the stellar mass vs. halo mass (SMHM) relation in order to minimize the deviation between the model SMF and an observed SMF (Rodríguez-Puebla et al. 2017;Moster et al. 2018;Behroozi et al. 2019).
In this study, our focus does not lie in determining the SMHM relation, since the COLA simulation may result in a non-negligible uncertainty of halo mass.Specifically, our implementation of the SHAM process can be divided into the following steps: (1) For the host halos, we choose the maximum circular velocity, denoted as V max , which represents the highest circular speed attained by test particles within these halos.This property is considered as the halo property for the subhalo abundance matching (SHAM) technique.However, there is an additional complication caused by the significant evolution of subhalos within host halos due to interactions in the dense environments of the larger hosts.Consequently, using V max of a subhalo as a proxy for stellar masses may not yield accurate results.
To address this issue, it has become common practice to assign stellar masses or luminosities to subhalos based on their V peak value, as proposed by Hearin et al. (2013), where V peak represents the largest V max achieved by a subhalo throughout its entire history.
(2) The assumption underlying SHAM is that more luminous (or massive) galaxies are typically found within more massive halos.However, this relationship is not strictly one-to-one, as there exists a natural scatter between galaxies and dark matter halos.To account for this scatter, we employ a method proposed by Rodríguez-Torres et al. ( 2016), for assigning stellar mass to (sub)halos.This involves defining a scattered quantity, denoted as V scat

L
, and the expression is as follows: Bottom panel: downsampling ratio f down for three redshift bins.This ratio is defined as the number density of observed galaxies in a given bin (ϕNGC) divided by the number density of the full galaxy population (ϕtot).Since observations are inherently incomplete, the observed catalogue in each stellar mass bin represents a subset of the full galaxy population in the same sky region.Note that our analysis in this study specifically focuses on catalogues within the redshift range of 0.45 < z < 0.55 (solid blue). with where N is a random number following a Gaussian distribution with zero mean and a standard deviation of σ scat (V L |M * ).
(3) We connect the scattered quantity V scat L of (sub)halos to the stellar mass (M * ) of the central galaxies by assuming a monotonic relationship between them.If the number density of (sub)halos with V scat L matches the number density of galaxies with a stellar mass exceeding M * , then we assume that galaxies with a stellar mass of M * reside at the center of (sub)halos with V scat L .Our study uses a lower limit of stellar mass log 10 M cut * = 11.0.

Mock generation
In this study, we mainly generate mock catalogues in a cosmology using the two different N-body simulation codes.To validate the robustness and effectiveness of the COLA simulation, we examine three different cases with varying COLA settings.In the first case, we maintain the default settings of COLA, i.e., varying z init and adjusting da according to da = 1/(1 + z init ).In the second case, we vary z init while keeping da constant at da = 1/120.In the final case, we keep z init at 29 while altering da.Further details are provided in Tab. 1.The mock catalogues are labeled by their respective N-body method, GADGET mock and COLA mock.Both simulations use 1024 3 DM particles to trace the evolution and are conducted in cubic volumes with periodic boundary conditions and a side length of 800 h −1 Mpc creating 16 snapshots in the redshift range 0.2 < z < 0.7.The particle mass corresponds to 3.9 × 10 10 h −1 M ⊙ .For the COLA N-body method, the force resolution is determined by the simulated box size and the mesh size.In the default setting of COLA the mesh size is chosen as 3 3 times larger than the number of DM particles (N par ), resulting in a force resolution of approximately 260 h −1 kpc in a cubic volume with 800 3 h −3 Mpc 3 for N par = 1024 3 .We then use ROCKSTAR halo-finder and the Merger-Tree technique to identify halos from the DM particle snapshots of the COLA simulation.The input force resolution of ROCKSTAR halo-finder is set as the force resolution of the COLA simulation.The output redshifts of snapshots from the COLA simulation were chosen as follows: z snap = [0.71,0.66, 0.62, 0.57, 0.53, 0.50, 0.46, 0.42, 0.39, 0.36, 0.33, 0.30, 0.27, 0.25, 0.22, 0.20].These redshift values were selected in order to match the redshift range of the SDSS BOSS catalogues.
Applying the SHAM procedure, we generate mock catalogues with statistical properties that match those of observed galaxies in the CMASS NGC dataset.In this proof-of-concept study, we limit our analysis to a narrow redshift range of 0.45 < z < 0.55 to enable a fast comparison between the statistical properties of the COLA mock and GADGET mock catalogues with those of CMASS NGC.
The overall procedure for generating mock galaxies from the snapshot of halos is as follows.
(1) Using the redshift information available in the snapshots, we create periodic replicas of the snapshot of (sub)halos at z = 0.50, and create a sample of objects in a redshift-shell which corresponds to the true redshift range 0.4 < z < 0.6.
(2) To account for the effect of redshift space distortion (RSD) on the halo position within the redshift-shell, we use the following relation: where z represents the real redshift without RSD, while z obs represents the observed redshift after accounting for RSD.The projected velocity of the halo along line-of-sight (LoS) is denoted by v ∥ , and c represents the speed of light.We then update the redshift-shell satisfying the condition that 0.45 < z obs < 0.55.
(3) We select a block of sky region covering RA ∈ [100  ] which corresponds to the sky coverage of CMASS NGC galaxies (as shown in Fig. 3).This selection is made without taking into account any veto masks or fibre collisions in the BOSS DR12 data.
(4) Considering the complete SMF with the lower limit of stellar mass log 10 M cut * = 11.0, the number density of the chosen full galaxies can be obtained, then we apply the SHAM procedure to obtain the (sub)halo catalogue with the same number density.The assumption is that the mock galaxies located at the center of halos have a stellar mass distribution that follows the complete SMF (ϕ tot ) mentioned in Sect.2.3.
(5) Downsampling is necessary due to the inherent incompleteness of observations.This is because the galaxies we observe are a subset of the full galaxy population in the same sky region.As a result, the observed SMF, such as ϕ NGC shown in Fig. 2, is lower than the complete SMF, ϕ tot , described in Sect.2.3.The downsampling ratio f down represents the ratio of ϕ NGC to ϕ tot , as illustrated in Fig. 2.
(6) By applying the above procedures, we can generate a set of halo catalogs with varying σ scat values in the SHAM procedure.In order to quantify the differences between these halo catalogs and the observed CMASS NGC galaxies, we use the two-point clustering statistic ξ 0 (s), as defined in Sect.3, and calculate the corresponding χ 2 value, given by where the ∆ξ i 0 represents the deviation of ξ 0 between the halo catalogues and the CMASS NGC galaxies at the bin s i , while the covariance matrix C is determined from a large set of PATCHY mocks as defined in Eq. 15.It is worth noting that in this paper we apply the unified fiducial cosmology of (Ω m = 0.26, w = −1.0,h = 0.71) to estimate the statistics of mock catalogs.

MultiDark Patchy BOSS DR12 mock catalogues
To accurately estimate the covariance matrix, we use 1,000 MultiDark Patchy BOSS DR12 mock catalogues (Kitaura et al. 2016), labeled as PATCHY mock in this paper.These mock catalogues are generated using approximate gravity solvers and analytical-statistical biasing models, and they are calibrated to a BigMultiDark N-body simulation (Rodríguez-Torres et al. 2016) that employs 3, 840 3 particles in a (2.5 h −1 Gpc) 3 volume.This simulation assumes a ΛCDM cosmology with Ω m = 0.307, Ω b = 0.048, σ 8 = 0.82, n s = 0.96, and h = 0.67.Applying the aforementioned technique in several redshift bins, the resulted mock catalogues can match the redshift evolution of the biased tracers in the BOSS observations, and finally the contiguous lightcone can be created by combining the resulting mocks in several redshift bins.The MultiDark Patchy BOSS DR12 mock catalogues that result from this method accurately reproduced the number density, selection function, and survey geometry of the BOSS DR12 data.The 2PCF of the observational data was reproduced down to a few Mpc scales, with most within 1σ (Kitaura et al. 2016).These mock surveys have been utilized in a series of studies for the statistical analysis of BOSS data (Alam et al. 2017) and references therein).The extensive set of mock catalogues enabled us to perform a robust statistical error estimation.
In general, for a given observable in a particular bin i, say O(i), we can estimate the covariance matrix by computing the sample covariance of simulated mock catalogues through where the sum is performed over N m mock catalogues, O represents the mean value over those mocks, and the index m denotes the m-th realization mock.

DETERMINATION OF THE OPTIMAL VALUE OF SCATTER FACTOR
In this section, we will present the results of determining the optimal scatter factor σ scat of SHAM for different N step in the COAL simulation.The optimal σ scat is derived by minimizing the difference between the COLA mock and CMASS NGC through χ 2 , as defined in Eq. 14.In this study, we propose to calculate χ 2 using the monopole of the 2PCF instead of other statistical quantities.Figure 4.The χ 2 estimation defined by Eq. 14, which is used to quantify the differences between mock catalogs and the observed CMASS NGC galaxies.We employ different COLA settings for comparison.

Monopole of 2PCF
The Landy-Szalay estimator (Landy & Szalay 1993) is used to calculate the two-point correlation function (2PCF) (Davis & Peebles 1983) for both our mock catalogues and observation data.The estimator is defined as follows: where DD, DR, and RR represent the normalized pair counts for galaxy-galaxy, galaxy-random, and random-random samples, respectively.These counts are separated by a distance defined by s ± ∆s and µ ± ∆µ.Here, s is the distance between the pair, and µ = cos(θ), with θ being the angle between the line joining the pair of galaxies and the LoS direction to the target galaxy.The presented statistic measures the anisotropy of the clustering signal.The random catalogue is composed of unclustered points with a number density in redshift space that simulates the radial selection function of the observational data.In order to decrease the statistical variance of the estimator, we construct random catalogues that are ten times larger than the data catalogues.With using the CUTE code3 , we calculate the correlation function ξ(s, µ) for both the CMASS NGC galaxies and the mock catalogues within the redshift range of 0.45 < z < 0.55.The monopole ξ 0 (s) in configuration space is obtained by integrating ξ(s, µ) over each s, as given by Note that due to the large uncertainty of the correlation function at large clustering scales, we restrict the computation of the χ 2 to small clustering scales of s ∈ [4, 20] h −1 Mpc.A smaller value of χ 2 indicates a better match with the observed galaxies.

Fitting results
The resulting χ 2 with varying σ scat are presented in Fig. 4 and Tab. 3. Based on the best scattering parameters (σ best scat ), we obtained the mock catalogs for different simulations.Fig. 5 displays the 2PCF curves at small scales for GADGET mock and COLA mocks with different combinations of (z init , da).It can be observed that the ξ 0 (s) of the GADGET mock exhibits good consistency with that of CMASS NGC within the selected scale range of s ∈ [4, 20] h −1 Mpc.
In Fig. 5, we present the three cases.The left panel corresponds to da = 1/(1 + z init ), with z init values of 29, 49, 59, and 119.In all cases, the ξ 0 (s) values fall within the 2σ region, indicating a good match with CMASS NGC.The middle panel represents the case where z init varies while fixing da = 1/120, also showing good consistency with CMASS NGC.The right panel shows the case where z init is fixed at 29, but da is varied.In this case, the ξ 0 (s) values also fall within the 2σ region, indicating that different time strides da in COLA lead to a similar evolution process of dark matter particles.
Based on these results, we find that: Table 3. Best-fit values of σscat obtained by minimizing the χ 2 values in Eq. 14 for different simulations using the SHAM procedure.The optimal σscat is then used to determine the corresponding χ 2 min , which are shown in the rightmost column.The calculation of ξ0(s) is based on s values ranging from 4 to 20 h −1 Mpc.In all cases, the DOF are 16, determined by the number of bins.• For the GADGET mock, the minimum value of χ 2 min = 24.6 is achieved with Degree-of-Freedom (DOF) = 16 at σ best scat = 0.03, which is consistent with the χ 2 distribution at the 2σ level, since χ 2 min < 27.3, which is calculated as 16 + 2 √ 2 × 16.These results indicate that the reference mock catalogue effectively reproduce the two-point clustering characteristics of the CMASS NGC galaxies.
• The COLA simulations for the case with da = 1/(1 + z init ) and z init set to 29, 59, 49 and 119, yield χ 2 min values that consists with the χ 2 distribution at the 2σ level (χ 2 min < 27.3 when DOF=16).Specifically, the χ 2 min values are 23.2, 23.2, 19.4 and 21.4 for the respective simulations, as listed in Tab. 3. In the case where z init = 49, the χ 2 min of COLA with N step is comparable to that of GADGET mock with the same z init but a significantly larger number of steps, N step = 3676.This suggests that the COLA scheme can perform similarly to GADGET.
• The COLA simulations for the case with the same stride da = 1/120 but varying z init as 29, 59, and 119, respectively, yield χ 2 min values that are consistent with the χ 2 distribution at the 2σ level (χ 2 min < 27.3 with DOF=16).Specifically, the χ 2 min values are 20.8, 19.5 and 21.4 for the respective simulations, as listed in Tab. 3.These findings suggest that in the case of using the COLA N-body method, when starting from the same initial redshift, the accuracy of the mock catalogue remains robust for different s.However, increasing the initial redshift does not automatically guarantee an improved accuracy of the mock catalogue.This phenomenon can be attributed to the accumulation of errors that occur during the numerous steps of low-force resolution computation, particularly in cases with a high initial redshift of simulation.On the other hand, choosing a very large da (or a small N step ) can also lead to inaccuracies in the simulation, since a very small number of time steps may fail to effectively capture the nonlinear evolution that occurs on small clustering scales.Therefore, selecting an appropriate combination of (z init , da) is a critical factor in producing a high-quality mock catalog.
The main objective of this study is to provide an economical method for generating mock catalogues that can accurately reproduce the statistical properties of observed galaxies.Regarding Tab. 3, the analyses indicate that, all tested cases are reasonable choices for the COLA simulation, as they can fit the observed data within a 2σ level.However, for the objectives of this study and considering the computational times listed in Tab. 1, we select (z init = 29, da = 1/30) as our preferred COLA setting.This choice is based on the observation that there is no statistically significant difference among the cases with da = 1/30, 1/60, and 1/120.By using fewer steps, it becomes easier to generate simulation catalogs efficiently and in large quantities, thereby in line with the ultimate goal of this work.In the following sections, through the analysis of various statistical metrics, we will demonstrate that this choice is indeed appropriate and yields results comparable to those obtained using GADGET.
Using the best-fit σ scat for different combinations (z init , da), one can employ our proposed procedure to generate mock catalogues.Visualization of CMASS NGC, GADGET mock, and COLA mock (z init = 29, da = 1/30) is shown in Fig. 6, where we applied the same angular selection as CMASS NGC to the mock catalogues.From this plot, it is evident that both the data and the mocks exhibit the same distribution, with no discernible visual differences apart from cosmic variance.Fig. 7 displays a redshift slice of the COLA mock with (z init = 29, da = 1/30).The plot reveals the distribution of halos and catalogues selected using the SHAM process.Notably, it suggests that the COLA mock catalogues demonstrate a more pronounced clustering feature, closely resembling that observed in galaxies.
In summary, considering the incompleteness of the BOSS DR12 catalogue, by appropriately selecting the parameter σ scat in the SHAM procedure described in Sect.2.4, and choosing (z init = 29, da = 1/30) in COLA, the statistical properties of COLA mocks are similar to those of GADGET mock, and both of them are in good agreement with those of observed galaxies.Thus, our scheme greatly saves computational resources much more compared to the treatment using accurate N-body method, and this efficient mock generation method has broad applications in exploring the LSS of the Universe.For further verification, the subsequent section provides a thorough examination of the statistical properties for COLA simulations with varying z init and da values.

STATISTICAL MEASURES OF THE MOCK CATALOGUES
In this section, we assess the fidelity of the mock catalogues by conducting a comprehensive comparison of various statistical measures derived from both the mock catalogues and the observational data.By examining these statistics, we can judge the quality and accuracy of the mock catalogues.Through this analysis, it will become evident that The gray points represent the full (sub)halos, while the red marks ("×") correspond to the mock galaxies selected using the SHAM procedures.
(z init = 29, da = 1/30) is the optimal choice for accurately reproducing the various statistical properties observed in the data, while minimizing computation time.

The two-point clustering
Based on ξ(s, µ), the 2PCF in ξ(r ∥ , r ⊥ ) can be calculated by projecting the distances s between the pair in LoS direction onto its parallel (r ∥ ) and perpendicular (r ⊥ ) components.In Fig. 8, the left panel shows a comparison of the ξ(r ∥ , r ⊥ ) contours between CMASS NGC and GADGET mock with (z init = 49, N step = 3676).The right panel shows a comparison of the contours between CMASS NGC and COLA mock with (z init = 49, da = 1/50, N step = 50).In each panel, we find that the Finger-of-God (FoG) effects (Jackson 1972) of the mock catalogues are slightly stronger than the observation data in the small r ⊥ range.The difference of the FoG effect between GADGET mock and COLA mock is small, indicating that the velocity distribution of COLA mock approaches that of GADGET mock.
The resulting monopoles ξ 0 (s) for different mock catalogues are shown in Fig. 9.In the first case, with the setting of da = 1/(z init + 1), the two-point statistical properties of GADGET mock and COLA mock for small s values exhibit good consistency with the CMASS NGC galaxies for various values of z init , including (29,49,59,119).In the case of fixing da = 1/120, the two-point statistical properties at small scales of both the GADGET mock and COLA mock with initial redshifts z init = (29, 59, 119) also match well with those of CMASS NGC.In the last case, with the same initial redshift of z init = 29, the two-point statistical properties at small scales of the COLA mock with da = (1/30, 1/60, 1/120) also agree with those of the CMASS NGC galaxies.Moreover, in all three cases, noticeable deviations are observed at large scales due to the substantial sampling variance.

The anisotropic two-point clustering
To investigate the anisotropy using ξ(s, µ), we analyze the dependence of ξ on µ.Following the method presented in Li et al. (2015), we integrate ξ over the interval s min ≤ s ≤ s max , as given by Here a cut is applied on µ such that µ > µ min , to mitigate the geometric effect from the thin redshift shell of 0.45 < z < 0.55.The quantity, ξ ∆s (µ), describes the angle dependence of the two-point clustering, allowing it to estimate the anisotropy of the galaxies.However, the value of ξ at small scales is significantly affected by the FoG effect, which depends on the galaxy bias.This may result in a redshift evolution in ξ ∆s (µ), which is relatively difficult to model.On the other hand, at large scales, the measurement is dominated by noise due to poor statistics.According to Li et al. (2015), the choice of s min = 6 ∼ 10 h −1 Mpc and s max = 40 ∼ 70 h −1 Mpc is appropriate, as it provides reliable, tight and unbiased constraints on cosmological parameters.For the aforementioned purposes, we have chosen s min = 8 h −1 Mpc and s max = 60 h −1 Mpc, and and set µ min = 0.12 in this study.In Fig. 10, the anisotropic clustering patterns of the CMASS NGC, GADGET mock, and COLA mocks are compared using the ξ ∆s (µ) statistic.The GADGET mock catalogue has a similar anisotropy pattern to the observational result, with a relative error that falls within the 2σ uncertainties.Fig. 10 also shows the angular dependence of ξ on µ for the three different COLA settings.In the first case (left), we vary z init while adjusting da according to da = 1/(1 + z init ).In the second case (middle), we increase z init from 29 to 119 while keeping a fixed da of 1/120.We find consistent clustering patterns with that of CMASS NGC at 2σ level.Similarly, in the third case (right), where z init is fixed at 29 and da is varied, the COLA mock catalogues all agree with the observed data in terms of the statistical uncertainty.This observation suggest that the COLA simulation maintains good robustness with respect to changes in da.

Three-point clustering
Furthermore, we can use the three-point clustering analysis to assess the accuracy of our mock data.The three-point correlation function (3PCF) can be expressed as the probability of finding a galaxy in each of the volume elements dV 1 , dV 2 , and dV 3 , given that these elements are arranged in a particular configuration defined by the sides of a triangle (r 1 , r 2 , and r 3 ).This joint probability can be mathematically represented as: The above expression consists of several components: the sum of 2PCFs for each side of the triangle, the complete 3PCF denoted by ξ, and n which represents the mean density of data points.The 3PCF estimator is given by (Szapudi & Szalay 1998), where each term represents the normalized count of triplets in the data (D) and random (R) samples that satisfy a specific triangular configuration of our choice.The function ζ in the isotropic case depends on the three sides of the triangle, denoted as (r 1 , r 2 , r 3 ).In contrast, in the anisotropic case, ζ depends on the three sides of the triangle as well as two angles (θ 1 and θ 2 ) relative to the LoS direction ( ẑ), where cos θ 1 = r1 • ẑ and cos θ 2 = r2 • ẑ.Although one could perform an analysis in the anisotropic 5-parameter space (r 1 , r 2 , r 3 , θ 1 , θ 2 ), this study only considers the isotropic case of 3PCF, reducing 5-parameter space to ζ(r 1 , r 2 , r 3 ).We utilize the GRAMSCI4 (GRAph Made Statistics for Cosmological Information) method, a publicly available code developed by Sabiu et al. (2019), to estimate the threepoint statistics ζ(r 1 , r 2 , r 3 ).This method combines the concepts of KD-trees and graph databases to speed up the calculation of 3PCF, making it more efficient than traditional methods.The resulting 3PCF, as shown in Fig. 11, indicates that the mock catalogues have comparable three-point clustering with that observed in CMASS NGC.In the top panels, the solid line represents the resulting 3PCF from the observed CMASS NGC galaxies, and the shaded area represents the 2σ uncertainties obtained from PATCHY mocks.Based on the three-point clustering, the GADGET mock shows a good match with the observed data.For comparison, we illustrate the three different COLA settings from left to right.The adopted values of z init and da are listed in each panel.In most of the adopted parameter combinations for (z init , da), we observe that the resulting 3PCFs are consistent with those of the observed data from CMASS NGC within the 2σ level.Therefore for saving computing resources, (z init = 29, da = 1/30) is the optimal option for achieving a good consistency with the observed galaxies in terms of 3PCF.

Power spectrum multipoles
In order to estimate the redshift-space clustering of mock catalogues, we apply the FFT-based anisotropic power spectrum estimator proposed by Hand et al. (2017).Starting with the weighted object density field F (r) (Feldman et al. 1993), defined as where n and n s are the number density field for the catalogue of the observed objects and synthetic catalogue of random objects, w(r) is a general weighting scheme and the factor α normalizes the synthetic catalogue to the number density of the observed objects.I is the factor for normalization, defined as I = dr n2 (r)w 2 (r).Without loss of generality, we choose the FKP weights (Feldman et al. 1993) for both the data and random.The FKP method is an optimal approach that accommodates variations in density across a survey, effectively balancing sample variance and shot noise.Thus, each galaxy is assigned a weight of w = w FKP (r) ≡ (1 + n(r)P 0 ) −1 , and we adopt P 0 = 10 4 h −3 Mpc 3 as the fiducial value for a typical galaxy survey.The FFT-based anisotropic power spectrum estimator can be defined as where Ω k is the solid angle of the Fourier mode.P noise ℓ is the shot noise, given by where L ℓ is the ℓ-th order Legendre polynomial, and one can assume that P noise ℓ = 0 for ℓ > 0. F ℓ (k) is the Legendre expansion of the weighted object density field F (r) in Fourier space, given by Utilizing the NBODYKIT5 library (Hand et al. 2018), we calculate the power spectrum multipoles P ℓ (k).In this study, we consider only the multipoles of ℓ = 0, 2, and 4, as higher orders are mostly dominated by noise modes.As mentioned earlier in Fig. 3, we do apply the same angular selection of CMASS NGC to the mock catalogues when calculating the multipoles P ℓ (k).For all power spectrum measurements, we perform a linear binning in the chosen Fourier space with a bin width of ∆k = 0.01 h/Mpc, and the range of the Fourier space chosen for binning is k ∈ [0.002, 0.4] h/Mpc.Fig. 12, 13 and 14 display the comparisons of power spectrum multipoles (ℓ = 0, 2, 4) for CMASS NGC, GADGET mock, and COLA mock with three cases of (z init , da) respectively.As shown in the three figures, the black solid lines represent the results from CMASS NGC, while the shaded region represents the 2σ statistical uncertainties obtained from 1000 PATCHY mocks.
As observed, the monopole P 0 (k), which represents the angular averaged power spectrum, is displayed in the left panels of Fig. 12, 13 and 14.At large scales, when k < 0.1h/Mpc, most of the points for P 0 (k) of GADGET mock and COLA mock have a good consistency with that of CMASS NGC within 2σ level.The GADGET mock and COLA mock demonstrate a comparable ability to reproduce P 0 (k) to that of CMASS NGC.Moreover, when k ∈ [0.1, 0.2] h/Mpc, the monopole of COLA mock approaches the CMASS NGC measurement even more closely than that of GADGET mock.In addition, at small scales, k > 0.2 h/Mpc, P 0 (k) of COLA mock gradually deviates from the real data measurement and the theoretical prediction beyond the 2σ level.This deviation may be due to the relatively low force resolution of the COLA N-body method.
In addition to the monopole, the middle panels of Fig. 12, 13 and 14 present the results for the quadrupole, P 2 (k), which represents the leading anisotropies in the power spectrum in redshift space.At large scales, all of mocks are well consistent with observations within a 2σ uncertainty.However, for k > 0.2 h/Mpc, both GADGET mock and COLA mock have lower P 2 (k) values compared with CMASS NGC.This is likely due to the fact that both simulation mocks cannot reproduce the real RSD effects well at small scales, which is caused by the relatively low resolution of the simulations with only 1024 3 particles in a box with a side length of 800 h −1 Mpc.
Besides, the right panels of Fig. 12, 13 and 14 show the next-to-leading anisotropies represented by the hexadecapole P 4 (k).It can be observed that when k < 0.3 h/Mpc, the P 4 (k) values from CMASS NGC, GADGET mock, and COLA mock are very similar, with most of the values falling within the 2σ region, indicating good consistency with the theoretical prediction.
In summary, based on the analysis of multipoles presented above, it can be concluded that COLA mock catalogs with the used combinations of (z init , da) perform similarly to GADGET mock and can accurately reproduce the power spectra of anisotropies in most cases.

Performance test of varying cosmological parameters
To further assess the impact of different cosmological parameters on our proposed scheme and to validate its performance over various cosmological models, we varied the values of cosmological parameters in the COLA simulations.Using the same SHAM procedure as described in Sect.2.4, we generated COLA mocks for each set of cosmological parameters, with the setting of (z init = 29, da = 1/30).Based on these mock catalogs and their corresponding 2PCFs, we determined the optimal value of the parameter σ scat by minimizing the χ 2 of the 2PCFs through Eq. 14, and the corresponding results are shown in Tab. 4.
In this section, we select the base cosmology with the following parameter values: Ω m = 0.31, Ω b = 0.048, Ω Λ = 0.69, w = −1.0,σ 8 = 0.82, h = 0.68.These values closely approximate the mean constraint derived from the Planck 2015 results (Ade et al. 2016a).It is worth noting that this chosen cosmology differs from the one used in the previous sections of this study.In particular, we varied the values of Ω m , w, and σ 8 with respect to the base cosmological model, resulting in slight changes within the ranges of Ω m ∈ [0.29, 0.30], w ∈ [−1.1, −0.8], and σ 8 ∈ [0.81, 0.84].As indicated in Tab. 4, the best-fit value of σ scat shifts from 0.50 for the base cosmological model to a range of [0.47, 0.62] when different cosmological parameters are varied.This indicates that the values of σ scat do not change significantly, suggesting that the estimate is robust against different cosmological models.
The relative changes of χ 2 min with respect to the statistical uncertainty, defined as ∆χ 2 min /σ where ∆χ 2 min ≡ χ 2 min − χ 2 min (base), are listed in Tab. 4. Note that, the cosmological parameters chosen in the case "Base" are 2σ consistent with the Planck 2015 cosmology results (Ade et al. 2016b), whereas they are slightly different from what we used in the previous sections (see the detailed values in Sect.2.5).
Since DOF = 16 for the fit, according to the χ 2 distribution, the expected standard deviation (σ) is approximately 5.66, calculated as σ ≈ √ 2 × DOF.As shown in Tab. 4, for all cases, the absolute values of the changes in χ 2 min are smaller than the 2σ uncertainty.These results further support the validity of our proposed scheme for generating mock data for various cosmological models.

CONCLUSIONS
Fast mock generation has wide-ranging applications in studying LSS of the Universe and effectively evaluating various theoretical models on cosmological scales in Stage IV surveys.This is particularly crucial for approaches that necessitate numerous mock catalogues, such as machine learning.In this study, we utilized COLA simulations to propose an efficient method for generating mock catalogues rapidly.We have confirmed that the mocks generated through COLA accurately reproduce the observed statistical properties of LSS using various statistical estimators, including 2PCF and 3PCF in redshift space, and the power spectrum multipoles.Meanwhile, we find that the performance of COLA mocks is comparable to that of GADGET simulations.
There are only three free parameters involved in generating COLA mocks: z init , da and σ scat .σ scat is used for the SHAM procedure, while z init represents the initial redshift of COLA simulation and da is the time stride in COLA.Smaller values of da lead to more time-consuming simulations.Using the two-point clustering statistic, we computed the χ 2 values between the mocks and the observed galaxies (BOSS DR12 CMASS NGC catalogue).By minimizing χ 2 , we obtained the optimal value of σ scat for a given combination of (z init , da).
Based on the tests conducted by varying the combination of (z init , da), we have found that, in most cases, COLA can yield good consistency with various statistical quantities of observed galaxies.Especially, in the case of setting z init to 29, 59, and 119, respectively, while keeping da constant (with a chosen value of da = 1/120), based on the isotropic results, the anisotropic two-point clustering, and the three-point clustering, as well as the power spectrum multipoles, we observe that the statistical properties of these resulting COLA mocks are consistent with each other.Therefore, the simulated COLA method demonstrates robustness when varying the initial redshift, while the time stride da remains fixed.Furthermore, taking into account the computational cost of each simulation setup, the optimal choice is (z init = 29, da = 1/30).This choice not only allows for fast mock generation that statistically matches observations, but also minimizes computational demands.4. Performance test of COLA mocks with the setting of (zinit = 29, da = 1/30) by evaluating the effects of varying different cosmological parameters.When a cosmological parameter is changed from the base cosmology, it is written in bold.
We primarily focus on varying three parameters: Ωm (specific values listed in the 2nd column), w (the 3rd column), and σ8 (the 4th column).Depending on the cosmological parameters used in COLA simulation, the distribution of the mock catalogue generated by our SHAM process exhibits slight variations.By minimizing the χ 2 of the 2PCF (based on Eq. 14), we determined the optimal value of σ best scat (the 5th column) that provides the best fit between the simulated and observed 2PCF.In all cases, the optimal values of σ best scat do not change significantly compared with that in the base cosmology despite significant variations in the cosmological parameters.The relative changes of χ 2 min with respect to the statistical uncertainty are presented in the rightmost column, where ∆χ 2 min ≡ χ 2 min − χ 2 min (base).This implies that the relative changes for the different cosmology models are not significantly different from the 2σ uncertainty, where σ ≈ √ 2 × DOF ≃ 5.66 with DOF = 16.These catalogues can be accessed through at this url: https://nadc.china-vo.org/res/r101298/ .
Furthermore, we evaluated the performance of COLA mocks over different cosmological models.By considerably varying Ω m , w, and σ 8 , we observe that the best-fit values of σ scat do not change significantly.Additionally, the changes in the χ 2 values for 2PCF are well below the 2σ uncertainty.These results indicate the validity and robustness of our fast mock generation scheme across different cosmological models.
In summary, our scheme successfully produces mock catalogs that possess similar statistical properties to the observed galaxies.In the future, we plan to generate lightcone catalogs spanning a broad range of redshifts, fulfilling the data requirements for most cosmological studies, instead of focusing on a narrow redshift bin as done in this study.
Figure1.Comparison of halo mass functions (HMFs) of the simulated halos provided by GADGET simulation and COLA simulations with different settings of (zinit, da), respectively.The theoretical Press-Schechter mass function, labeled as 'PS', is also shown as a black dashed line in each panel.As observed, the HMF derived from COLA agree well with the theoretical prediction and that derived from GADGET.Furthermore, the results from different COLA parameter settings converge.In the low-mass region (M h ≲ 2 × 10 1 2 h −1 M⊙), the simulated HMFs in both and COLA begin to noticeably deviate from the theoretical predication.This deviation is due to the limitations of mass resolution in N-body simulations.

Figure 2 .
Figure2.Top panel: the redshift-invariant complete SMF ϕtot (dashed), as described in Sect.2.3, along with the observed SMFs of the CMASS NGC catalogue in three redshift bins (marked) denoted by ϕNGC.Bottom panel: downsampling ratio f down for three redshift bins.This ratio is defined as the number density of observed galaxies in a given bin (ϕNGC) divided by the number density of the full galaxy population (ϕtot).Since observations are inherently incomplete, the observed catalogue in each stellar mass bin represents a subset of the full galaxy population in the same sky region.Note that our analysis in this study specifically focuses on catalogues within the redshift range of 0.45 < z < 0.55 (solid blue).

Figure 3 .
Figure 3.Comparison of the sky coverage.The area shaded in blue represents the sky coverage of CMASS NGC.For simplicity, we do not apply the angular selection of CMASS NGC to the mock catalogues (GADGET and COLA mocks) when estimating the two-point or three-point clustering.This means that the sky coverage of the mocks includes the entire region in red.However, when calculating the multipoles of the power spectrum (see Sect.4.4), we do apply the angular selection of CMASS NGC to the mock catalogues.

Figure 5 .
Figure 5.Comparison of the monopole of 2PCF for GADGET mock, CMASS NGC and COLA mock with different combinations of (zinit, da).We observe a close match between ξ0(s) of GADGET mock and CMASS NGC within the range of s ∈ [4, 20] h −1 Mpc.When zinit is set to 29, 49, and 59, the resulting ξ0(s) values of the COLA mocks all fall within the 2σ region, indicating a good match with CMASS NGC.

Figure 7 .
Figure7.A redshift slice of COLA mock (zinit = 29, da = 1/30) with a mean redshift of zmean = 0.5 and a bin width of ∆z = 0.004 in the sky region of 165 • < RA < 185 • and 25 • < DEC < 45 • .Each side of this region extends approximately 468 h −1 Mpc based on the fiducial cosmology.The gray points represent the full (sub)halos, while the red marks ("×") correspond to the mock galaxies selected using the SHAM procedures.

Figure 8 .Figure 9 .
Figure 8. Contours of the correlation function ξ(r ⊥ , r ∥ ) dependent on the separation parallel (r ∥ ) and perpendicular (r ⊥ ) to the LoS direction.The solid contours show the ξ(r ⊥ , r ∥ ) of the CMASS NGC catalogue in each panel.Left: comparison of ξ(r ⊥ , r ∥ ) between the GADGET mock (dashed) and CMASS NGC (solid).Right: comparison of ξ(r ⊥ , r ∥ ) between the COLA mock with (zinit = 49, da = 1/50, Nstep = 50) (dashed) and CMASS NGC (solid).The contours from the outside to the inside correspond to ξ = (0.25, 0.5, 1, 2, 4, 8), respectively.It can be observed that the 2PCF of the COLA catalog is very similar to that predicted by GADGET, and both of them are able to accurately reproduce the observational result.

Figure 10 .
Figure10.Angular dependence of ξ on µ according to Eq. 18, with s integrated out.The results are presented only for µ > µmin, where µmin = 0.12.For comparison, the gray shaded area represents the 2σ statistical uncertainty estimated using 1000 PATCHY mocks.The bottom panels are the relative errors between the mock catalogs and the observation with respect to the 1σ statistical uncertainty, defined by (ξ mock ∆s − ξ NGC ∆s )/σ.The angular dependence for GADGET mock and the three different COLA settings are displayed from left to right for comparison.

Figure 11 .
Figure 11.Comparison of three-point statistics ζ(r1, r2, r3) between the observation data CMASS NGC and mock catalogs.The shaded area in the top panels represents the 2σ uncertainties originating from the initial randomness, which are estimated using PATCHY mock.r1 and r2 are both fixed at 15.5 h −1 Mpc, and r3 varies from 5 to 31 h −1 Mpc.The bottom panels show the relative errors between the mock catalogs and CMASS NGC with respect to the 1σ statistical uncertainty, defined by (ζ mock − ζ NGC )/σ.Same as in Fig. 10, three different COLA settings are used for comparison.

Figure 12 .
Figure 12.Top: comparison of power spectrum multipoles of ℓ = 0, 2, 4 for CMASS NGC and GADGET mock as well as COLA mock with zinit = 29, 49, 59, 119 when the time stride da = 1/Nstep.The shaded region represents the 2σ statistical uncertainty obtained from measuring 1000 PATCHY mocks for each multipole.Bottom: relative error between the mocks and CMASS NGC with respect to the 1σ statistical uncertainty, defined by (P mock ℓ

Figure 13 .Figure 14 .
Figure 13.Top: comparison of power spectrum multipoles of ℓ = 0, 2, 4 for CMASS NGC and GADGET mock as well as COLA mock with zinit = 29, 59, 119 when da = 1/120.The shaded region represents the 2σ statistical uncertainty obtained from measuring 1000 PATCHY mocks for each multipole.Bottom: relative error between the mocks and CMASS NGC with respect to the 1σ statistical uncertainty, defined by (P mock ℓ

Table 2 .
al. Parameters of the Press-Schechter SMF.Here α and ϕc denote the slope and the normalization, and Mc is the characteristic mass.