A PARAMETERIZED GALAXY CATALOG SIMULATOR FOR TESTING CLUSTER FINDING, MASS ESTIMATION, AND PHOTOMETRIC REDSHIFT ESTIMATION IN OPTICAL AND NEAR-INFRARED SURVEYS

Jeeseon Song; Joseph J. Mohr; Wayne A. Barkhouse; Michael S. Warren; Klaus Dolag; Cody Rude

doi:10.1088/0004-637X/747/1/58

1. INTRODUCTION

Connecting baryonic cosmic structures with dark matter (DM) in the universe is very important in understanding the evolution of large-scale structure. This is the key to understanding the implications of large observational surveys within the context of theoretical and numerical studies of large-scale structure evolution. Within this field of study clusters of galaxies have long been recognized as important laboratories for, e.g., studies of galaxy evolution (Dressler 1980; Butcher & Oemler 1984; Stanford et al. 1998; Lin et al. 2006; Faber et al. 2007; Hansen et al. 2009), intracluster medium enrichment (e.g., Lin & Mohr 2004, more references therein), and critical sign posts for studies of the evolution of the large-scale structure and cosmology through properties such as their mass function (MF; e.g., Evrard 1989; White et al. 1993; Vikhlinin et al. 2009; Vanderlinde et al. 2010) and their clustering (e.g., Miller & Batuski 2001; Hütsi 2010).

Galaxy cluster surveys are becoming one of the key tools for unveiling the nature of the dark energy (Wang & Steinhardt 1998; Haiman et al. 2001). For example, Gladders et al. (2007) recently used a moderate-scale galaxy cluster survey to constrain cosmological parameters using the self-calibration method (Hu et al. 2003; Majumdar & Mohr 2003, 2004). Other analyses using Sloan Digital Sky Survey (SDSS; York et al. 2000) have used a large ensemble of nearby clusters to study cosmology (Rozo et al. 2010). Although these optical surveys delivered large numbers of clusters, the cosmological constraints were only moderately strong. This points toward the need to better understand the systematics of cluster surveys. The key areas of concern include the cluster selection and the cluster mass estimation. Without continued progress in these areas we will not be able to capitalize on the rich cosmological information emerging from, for example, the South Pole Telescope Sunyaev–Zel'dovich effect (SZE) survey (Staniszewski et al. 2009), the Dark Energy Survey, and the eROSITA X-ray survey (Predehl et al. 2010).

Many simulations have successfully reproduced observational features in clusters and galaxies, such as the luminosity function (LF) or the correlation function. This usually has been done in one of two ways: using the framework of halo occupation distribution (HOD; e.g., Berlind & Weinberg 2002; Kravtsov et al. 2004), which can also depend on luminosity (e.g., Yang et al. 2003), or using a semi-analytical model for galaxy evolution in merger history trees (e.g., Kauffmann et al. 1999). In the HOD framework, halos are populated with galaxies assuming a certain HOD, described by a conditional probability P(N∣M) that a halo with mass M contains N galaxies (Berlind & Weinberg 2002). Yang et al. (2003) extended the HOD modeling by labeling the galaxies with luminosities that use the conditional luminosity function (CLF) Φ(L∣M)dL, which gives the probability of finding a galaxy with luminosity in the range of L ± dL/2 as a function of halo mass M. These methods are powerful tools to populate DM particle simulations with galaxies, but one must rely on the adopted shape of the HOD or a specific galaxy evolution model as one of the ingredients of particle simulations. Certain aspects of galaxy evolution models, in particular, are still quite uncertain.

A more recent scheme to create mock catalogs as representations of large sky survey outputs uses the Hubble Volume Simulation (Evrard et al. 2002), where DM particles were chosen to be galaxies based on their local density, regardless of the host halo mass, and luminosities were assigned by the observed luminosity–density relation in SDSS (Wechsler 2004). Another recent example of relating luminosities and galaxies was developed by Vale & Ostriker (2006) using a subhalo catalog. In that study, they introduced a non-parametric model to relate the luminosity of galaxies and the mass of the DM halo or subhalo which hosts it, under the assumption that the luminosity–mass relation is monotonic. Their scheme, however, did not reproduce the observed LF in Lin et al. (2004).

Our objective is to provide *empirically* motivated simulated galaxy catalogs that reproduce many of the galaxy properties in the universe using N-body simulations, so that we can use these catalogs to analyze and calibrate various automated optical and near-IR galaxy cluster analysis tools. We note that this method does not reproduce the luminosity-dependent clustering of galaxies in the universe, and therefore the simulated galaxy catalogs are not appropriate for large-scale structure clustering studies. Our galaxy simulator, by its nature, is also limited by the N-body simulation that we use. Moreover, we want our catalog simulator to allow us to turn the current observational uncertainties into uncertainties on the selection function and mass estimations in the optical. To this end we assign the subhalos from the N-body simulation to be galaxies, and then directly implement observational features of galaxies in various wavebands *without* assuming any HOD, semi-analytical model, or luminosity–mass relation.

This paper is organized as follows. In Section 2, we describe the N-body simulation, including the creation of our subhalo catalog. In Section 3, we introduce our method for assigning galaxy properties to subhalos. In Section 4, we show validation tests on the galaxy catalogs, compared to real data. In Section 5, a cluster finder is tested against the mock catalogs and analyzed. In Section 6, we discuss the reliability of the optical mass indicator, the B_gc parameter. Finally, in Section 7, advantages and limitations of our approach are discussed, as well as future directions of the project. Throughout the paper, we assume the cosmological parameters to be Ω_M = 0.3, Ω_Λ = 0.7, and the Hubble parameter to be H₀ = 70 km s⁻¹ Mpc⁻¹.

2. N-BODY SIMULATION

The catalog simulator requires N-body simulations of sufficient resolution so that subhalo positions and dynamics can be directly extracted. One of the advantages we gain from using subhalo catalogs is that the dynamics of the galaxies within and around the cluster is better followed than in lower resolution simulations where only DM particles are available. Because of this, we emphasize that subhalo properties should be closely examined and compared to observed galaxy distributions before they are used in the galaxy simulator. Throughout the project, we use two distinct N-body simulations with subhalos. Below we describe one of the simulations upon which most of the development is based, while we refer to Dolag et al. (2005) for the details on the other simulation. We note that we use the latter in a limited way, since the field of view of the light cone is quite small (i.e., ${\sim} 8\deg ^2$ ). See Section 4 for its application.

2.1. Halo Catalogs

We carried out a cold dark matter only structure formation simulation with the Hashed Oct-Tree (HOT) algorithm, initially described in Warren & Salmon (1993), running at Los Alamos National Laboratory. We modeled a universe based on the concordance cosmology (Ω_M = 0.3, Ω_Λ = 0.7, H₀ = 70 km s⁻¹ Mpc⁻¹). Initial conditions were derived from transfer functions calculated by CMBFAST (Seljak & Zaldarriaga 1996). This DM simulation was performed with the same code and parameters as the series of simulations described in Warren et al. (2006). The simulation followed a cubic computational volume 816.3 h⁻¹ Mpc across with 1280³ (2.1 billion) particles. Each particle had a mass of 3.63 × 10⁹ M_☉.

DM halos are identified by the friends-of-friends (FoF) method (Davis et al. 1985). The FoF method identifies a set of spatially associated particles, roughly contained within an isodensity surface with a value of

$\begin{eqnarray*} &&\rho (h_{{\rm link}})=\frac{\alpha M_{p}}{(4\pi /3)h_{{\rm link}}^{3}}, \end{eqnarray*}$

where M_p is the particle mass and α is a constant of the order of two. For each object, one can identify how many linked particles it contains. Here the adopted linking length, h_link, is 0.2 times the average inter-particle spacing.
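As a quick worked check of the expression above, the following Python sketch evaluates the FoF isodensity threshold in units of the mean particle density. The function name is hypothetical, and α = 2 is an illustrative assumption, since the text only states that α is of the order of two.

```python
import math


def fof_isodensity_contrast(b=0.2, alpha=2.0):
    """Return rho(h_link) / rho_mean for a linking length h_link = b * l.

    Here l is the mean inter-particle spacing, so rho_mean = M_p / l**3 and
    rho(h_link) / rho_mean = alpha / ((4*pi/3) * b**3); M_p cancels out.
    alpha = 2 is an assumed value ("of the order of two" in the text).
    """
    return alpha / ((4.0 * math.pi / 3.0) * b ** 3)


contrast = fof_isodensity_contrast()
print(f"isodensity threshold ~ {contrast:.1f} x mean density")
```

Under these assumptions, the adopted linking length of 0.2 corresponds to a bounding surface at roughly 60 times the mean density.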

2.2. Subhalo Catalogs

Subhalos are identified independently from halos. The subhalo finding algorithm, called the IsoDen method (Pfitzner et al. 1997), selects all density enhancements. Subhalos are traced down to the smallest clumps that contain at least 10 particles. The IsoDen method calculates the spatial density field of particles to identify local peaks as centers of subhalos. At each center, isodensity surfaces grow to find all particles that belong to the center until any two different surfaces start to touch each other. Once subhalos are identified, the mass of each is estimated by summing the particle masses within 50 kpc of its center. Since the subhalo mass is not the sum of the masses of the particles assigned to the subhalo, a lower limit on the number of particles does not translate into an absolute lower mass limit that clearly differentiates real subhalos from random noise peaks. Instead, we cut the subhalo catalog at a mass limit of 3 × 10¹⁰ M_☉ to allow a more complete subhalo catalog.

Subhalos are then paired up with host halos if they lie within the virial radius r₂₀₀ of the host halo, which encloses a mean density 200 times the critical density of the universe, under the assumption that FoF masses are comparable to spherical overdensity (SO) masses. Since the simulation output box is periodic, subhalos on the edges of the box are also checked for their membership with hosts on the other side of the box. When a subhalo matches with two different host halos, we choose the more massive halo as the host. Although FoF masses (that are given for the host halos) are not exactly the same as SO masses, Lukić et al. (2009) found that the two masses are in good agreement within 5% for most halos (∼85%) in their N-body simulations. The remaining 15% were halos with clear substructure or halos that were not relaxed. We assume that for all of our halos, the agreement between FoF mass and SO mass is good enough for our purpose. We note, however, that this is one of the sources of scatter in the following comparison.
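The host-matching rules described above (membership within r₂₀₀, periodic wrapping, and preference for the more massive host) can be sketched in Python as follows. This is a brute-force illustration under stated assumptions, not the code used for the paper: the function name and array layout are hypothetical, and all lengths are assumed to share the same units as the box size.

```python
import numpy as np


def match_subhalos_to_hosts(sub_pos, host_pos, host_r200, host_mass, box_size):
    """Return the host index for each subhalo, or -1 if it has no host.

    A subhalo belongs to a host if it lies within that host's r200, with
    separations measured using the minimum-image convention of the periodic
    box. If several hosts qualify, the most massive one is chosen.
    """
    sub_pos = np.asarray(sub_pos, dtype=float)
    host_pos = np.asarray(host_pos, dtype=float)
    host_r200 = np.asarray(host_r200, dtype=float)
    host_mass = np.asarray(host_mass, dtype=float)

    # Pairwise separation vectors, wrapped into [-box_size/2, box_size/2].
    delta = sub_pos[:, None, :] - host_pos[None, :, :]
    delta -= box_size * np.round(delta / box_size)
    dist = np.linalg.norm(delta, axis=2)

    inside = dist <= host_r200[None, :]
    # Hosts that do not contain the subhalo get mass -inf, so argmax
    # selects the most massive host among those that do.
    masked_mass = np.where(inside, host_mass[None, :], -np.inf)
    best = np.argmax(masked_mass, axis=1)
    return np.where(inside.any(axis=1), best, -1)
```

The pairwise distance matrix scales as the number of subhalos times the number of hosts, so a catalog of realistic size would need a spatial tree or grid in place of this direct comparison.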

We examine the matched subhalo catalog to see whether the basic properties, such as the subhalo mass functions (SMF) and the radial distributions are consistent with observations of cluster galaxies. For example, in Figure 1, we boost the field SMF by 200/Ω_M in order to compare the shapes of the MF in two different environments inside clusters versus in the field. If the shapes of the two MFs are identical in terms of number density of subhalos per unit mass in two different environments, one would expect them to be identical in Figure 1 after the boosting. The remaining differences in the two MFs shown in the figure are the real population differences between the two species. Here halos with mass greater than 5 × 10¹³ M_☉ turn into clusters that contain member subhalos as cluster member galaxies by cross-matching subhalos with the halos within r₂₀₀ (i.e., no galaxy-free clusters are allowed). The field population includes lower mass halos along with their member subhalos (with halo mass less than the above mass cut), as well as subhalos that were not matched to any halo. In Figure 1, filled circles represent the field population and stars represent cluster members.

One characteristic to note is that there is a cutoff on the low-mass end for subhalos in the field and in clusters that is driven by the mass resolution of the simulation, as denoted by the dotted line in Figure 1. For the purposes of catalog simulation presented here, we choose the mass threshold of subhalos at 3 × 10¹⁰ M_☉ where the two MFs agree better, which allows us to use the bulk of the available population of cluster subhalos. Below that mass threshold, subhalos become incomplete and include more noise peaks in the background. The difference between field and cluster SMF turnover is an indication that subhalos must have been destroyed in dense regions, and particle groups of lower mass than this threshold are not found in sufficient numbers as subhalos, which is discussed more through the end of this section.

Also in Figure 1, we find an overdensity of cluster subhalos as compared to field subhalos after scaling by a factor of 200/Ω_M. In the K_s band, the normalization offset between the cluster LF (Lin et al. 2004) and the field LF (Kochanek et al. 2001) is ∼3.5, which is consistent with the offset in the SMF at masses around 5 × 10¹¹ M_☉ if we assume that differences in the K_s-band mass-to-light ratio between the field and the cluster are not significant. Note that Gao et al. (2004) find an offset in the subhalo abundance distribution in massive halos compared to that in the universe as a whole.

In Figure 2 we show subhalo radial profiles compared with a Navarro–Frenk–White (NFW) profile (Navarro et al. 1997). A total of 12,000 halos with mass greater than 5 × 10¹³ M_☉ are stacked to produce this figure, as well as Figure 1. For the radial profile in Figure 2 these halos are then binned into three mass ranges, where only 3000 random halos are shown. The lowest mass bin, shown with the dashed line, represents halos with mass between 5 × 10¹³ M_☉ and 2.0 × 10¹⁴ M_☉, and the next mass bin, shown with the dot-dashed line, contains halos with mass between 2.0 × 10¹⁴ and 10¹⁵ M_☉, and the highest mass bin contains all remaining halos that reach a mass up to 3.5 × 10¹⁵ M_☉. We compare these radial profiles of the subhalos with the NFW profile (solid lines), with a concentration of 5 and 3. These three subhalo profiles are in reasonable agreement with the NFW profile, but the halos in the highest mass bin, represented with the dashed line in the figure, show a deficit of subhalos in the central region when it is compared to the NFW profile with a concentration of 5 (for DM particles). This deficit has already been noted in previous studies (e.g., Ghigna et al. 2000; De Lucia et al. 2004) and is referred to as "antibiased" relative to the DM in the central regions of the halos. De Lucia et al. (2004) suggest that this is naturally explained as a consequence of the orbital decay combined with dynamical friction and mass loss that subhalos experience in high-density regions. Subhalos that are on orbits that take them through the cluster center are quickly destroyed and soon are no longer distinguishable from the central halo. Also, observed galaxy radial profiles in clusters are generally well fitted by an NFW model, but with a lower concentration of 3 or 4 (e.g., Lin et al. 2003, 2004). In our catalog generator these DM subhalos are used as galaxies, and so any radial properties of the subhalos will be reflected in the resulting galaxy population. 
For testing cluster finders that do not assume a specific shape for the cluster galaxy profile, this mismatch between the radial profiles of the cluster galaxies and the subhalos is not crucial.
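To give a feel for how much the concentration changes the central population, the Python sketch below evaluates the fraction of an NFW-distributed population enclosed within a given fraction of r₂₀₀. It uses the standard three-dimensional NFW enclosed-mass function; the function name is hypothetical, and the comparison is illustrative rather than a reproduction of Figure 2.

```python
import math


def nfw_enclosed_fraction(x, c):
    """Fraction of the population inside r = x * r200 for an NFW profile.

    Uses the 3D enclosed-mass function m(y) = ln(1 + y) - y / (1 + y),
    normalized to unity at r200, where c is the concentration.
    """
    def m(y):
        return math.log(1.0 + y) - y / (1.0 + y)

    return m(c * x) / m(c)


for c in (3.0, 5.0):
    frac = nfw_enclosed_fraction(0.1, c)
    print(f"c = {c:.0f}: {100.0 * frac:.1f}% of members inside 0.1 r200")
```

For these two concentrations, about 5.0% (c = 3) and 7.5% (c = 5) of the members within r₂₀₀ lie inside 0.1 r₂₀₀, which is the sense in which a lower concentration gives a less centrally peaked population.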

3. CATALOG SIMULATOR

Once subhalos are closely examined, halo and subhalo catalogs can be ingested to the galaxy simulator. A two-dimensional sky view from an observer is a projection of a light cone where the volume depends on redshift extent, cosmological parameters, and the angular size or field of view. Ideally, one would like to build a mock catalog from a large enough simulation to include self-consistent cosmic expansion and structure formation evolution along the light cone. In the absence of such a simulated light cone as in the previous section, one can use one or more outputs from a periodic simulation box. In the tests described in the sections below we use a single z = 0 simulation output and stack it as needed to simulate the more distant universe. Therefore, the light cone that we build in this paper does not naturally follow the evolution of the underlying DM large-scale structure. This is not a limitation for the galaxy simulator itself, but rather a limitation of the underlying N-body simulation that we describe in the previous section. In the following paragraph, we describe how to map the light cone onto the sky from a single redshift output box.

To build our light cone we set an observer at a random position inside the box with a random line-of-sight direction—additional boxes are located along the chosen line of sight to simulate the higher redshift universe. This random realization makes it possible to build different mock catalogs from the same simulation. Cluster and galaxy redshifts are assigned using their comoving distances from the observer and their peculiar velocities projected onto the line of sight. The angular diameter distance together with the field of view determines the portion of the simulation that is included in the simulated catalog at each redshift. Once redshifts are determined, galaxy properties are assigned to reproduce several observed quantities, including the LF, the color distribution, and the halo occupation number (HON). In the following sections, we describe how observational constraints are used to guide the catalog simulator.
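The redshift assignment described above can be sketched in Python as follows, assuming the flat cosmology adopted in this paper (Ω_M = 0.3, H₀ = 70 km s⁻¹ Mpc⁻¹). The function names, the tabulate-and-interpolate inversion, and the first-order treatment of the peculiar velocity, 1 + z_obs = (1 + z_cos)(1 + v_los/c), are assumptions of this sketch.

```python
import numpy as np

C_KM_S = 299792.458  # speed of light [km/s]


def build_distance_table(z_max=3.0, n=3001, h0=70.0, omega_m=0.3):
    """Tabulate the comoving distance [Mpc] on a redshift grid (flat LCDM)."""
    z = np.linspace(0.0, z_max, n)
    inv_e = 1.0 / np.sqrt(omega_m * (1.0 + z) ** 3 + (1.0 - omega_m))
    dz = z[1] - z[0]
    # Cumulative trapezoidal integral of dz / E(z).
    steps = 0.5 * (inv_e[1:] + inv_e[:-1]) * dz
    d_c = np.concatenate(([0.0], np.cumsum(steps)))
    return z, (C_KM_S / h0) * d_c


def observed_redshift(d_comoving, v_los, z_grid, d_grid):
    """Combine the cosmological redshift with the line-of-sight velocity.

    d_comoving: comoving distance from the observer [Mpc]
    v_los: peculiar velocity projected on the line of sight [km/s]
    """
    z_cos = np.interp(d_comoving, d_grid, z_grid)
    return (1.0 + z_cos) * (1.0 + np.asarray(v_los) / C_KM_S) - 1.0


z_grid, d_grid = build_distance_table()
print(observed_redshift([1000.0, 1000.0], [0.0, 500.0], z_grid, d_grid))
```

Tabulating the distance once and interpolating avoids a root-finding step per galaxy, which matters when the light cone contains many stacked boxes.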

3.1. Galaxy Luminosity

In principle, we could use the subhalo mass to assign a luminosity. However, the mapping from DM mass to stellar mass (and therefore luminosity) is complex because of the stripping processes that affect the DM halos and stellar populations differently and that depend on the orbits of the galaxies through the cluster. Therefore, rather than this approach, we assign galaxy luminosity randomly to each subhalo in a manner that reproduces the observed LF in the appropriate environment. Because there is no strong mass segregation of subhalos, these two approaches should produce similar results. Moreover, with our approach we parameterize the field and cluster galaxy populations using Schechter LFs, and we can vary those populations systematically to explore the sensitivity of cluster finding and mass estimation to these properties. In addition, with this approach we have the flexibility to change the number density of field galaxies relative to cluster galaxies by altering the luminosity parameters.

When assigning luminosities to galaxies, we follow the observed K_s-band LF as given by Kochanek et al. (2001) for the field population and Lin et al. (2004) for cluster members (for parameter values see Table 1). We do not attempt to eliminate the contribution from the cluster populations to the observed LF in Kochanek et al. (2001) because cluster galaxies account for a small fraction of the whole (i.e., less than 10%), although the exact fraction of cluster galaxies to the field population would depend on the halo mass cut. If the halo mass cut to define "clusters" is high, then this fraction would be much smaller than 10%, while it would increase as one goes down to lower halo mass cut to include "groups." Both studies found a Schechter function as a good fit with the faint end slope fixed for local samples. Within the observed uncertainties, we match the two LFs by adopting K_* with the corresponding faint end slope, α (fixed). Special attention is given in determining luminosities of brightest cluster galaxies (BCGs; special type of galaxies in clusters, see Section 3.2), using the observed relation between L_BCG and M₂₀₀ (Lin & Mohr 2004),

$\begin{equation} \frac{L_{{\rm BCG}}}{10^{11}\,h_{70}^{-2}\,L_{\odot }}=4.9\pm 0.2\left(\frac{M_{200}}{10^{14}\,h_{70}^{-1}\,M_{\odot }}\right)^{0.26\pm 0.04}, \end{equation} \tag{ 1 }$

assuming the scatter in this relation to be Gaussian with 1σ of the observed uncertainty, 0.04.

Table 1. 2MASS LF Parameters

	M_K*	α
Cluster^a	−24.34 ± 0.01	−1.1
Field^b	−24.16 ± 0.05	−1.1

Notes. ^aLin et al. (2004). ^bKochanek et al. (2001).


We impose a flux limit in drawing random luminosity from an LF that corresponds to the limit where the number of subhalos would match the expected number of galaxies in the LF. This flux limit is determined differently for the clusters and the field. For the clusters, we compare the HON in the simulated clusters (the number of subhalos within each host halo) to the HON in real clusters found by Lin et al. (2004),

$\begin{equation} N_{200}=36\pm 3\left(\frac{M_{200}}{10^{14}\,h_{70}^{-1}\,M_{\odot }}\right)^{0.87\pm 0.04}, \end{equation} \tag{ 2 }$

where N₂₀₀ is the number of galaxies within R₂₀₀ brighter than an absolute K-band magnitude of −21.0, approximately M_* + 3. This comparison provides a corresponding flux limit for clusters.

For the field population, we find the survey depth by comparing the total number density of subhalos above the adopted mass cut (i.e., 3 × 10¹⁰ M_☉) in the simulation box to the number density of all galaxies found by Kochanek et al. (2001), obtained by integrating the LF to the survey depth. The resolution of this particular subhalo catalog is sufficient to push the survey depth down to −17 or −18 for clusters (see where the black cluster line intercepts at around that K_s magnitude limit in Figure 3), while the field subhalo number density corresponds to a limit in absolute K_s that is deeper than an SDSS-like survey limit at redshifts of 0.2 and beyond. This mass resolution is good enough to build an SDSS-like catalog for all but the very lowest redshifts. As presented in later sections, this mass resolution limitation at low redshift does not affect cluster finding, because there are already enough galaxies for clusters to be identified at low redshift even with fewer galaxies having cluster member properties. It might, however, be another source of scatter in the mass–observable scaling relation.

**Figure 3.** Depth to which a survey can be simulated, as set by the underlying subhalo population. The black x-axis (in magnitude) and y-axis (in 10¹⁰ M_☉), together with the black solid line (clusters) and the red dot-dashed line (field), show how the lower mass threshold of subhalos included in a mock catalog determines the *K_s* limit by changing the subhalo number density in the simulation. The cluster and field lines are essentially the subhalo mass function curves in absolute *K_s* magnitude space. The blue axes and the blue solid line show the absolute *K_s* magnitude that corresponds to an SDSS-like survey depth of 22.5 in the r band at each redshift. With the adopted subhalo lower mass limit of 3.0 × 10¹⁰ M_☉, indicated by the green dashed line, the survey limit in the K band determined by the subhalo density in clusters (i.e., ∼−19) is deep enough at redshifts greater than ∼0.4, while the same mass limit makes the field population deep enough at redshifts of 0.2 or higher, as shown by the intercepts of the green line with the cluster and field mass function curves.

In determining the flux limit, we evolve K_* of both field and cluster LFs with redshift according to the Bruzual & Charlot model (see Section 3.3). This is equivalent to an assumption that K_* galaxies in *both* clusters and the field are red galaxies (i.e., no recent star formation), which is reasonable because K-band light is less affected by recent episodes of star formation. We explore the effects of different field versus cluster population evolution in Section 3.5 below.

In the process described above, we also use the observed HON redshift evolution. Lin et al. (2006) examined the N–M relation evolution as a function of redshift by combining their local sample with a sample of several dozen clusters extending to redshifts z ∼ 1. Modeling the evolution as

$\begin{equation} N(M,z)=N(M,z=0)(1+z)^{\gamma _{{\rm HON}}}, \end{equation} \tag{ 3 }$

they showed that their data suggest very weak dependence of the HON on redshift. We adopt this form for the HON evolution with a fiducial γ_HON = 0 and vary this parameter to explore the impact of variations in the HON evolution on cluster finding.
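One possible reading of the cluster flux-limit matching in this section is sketched below in Python: a Schechter LF is normalized so that it reproduces Equations (2) and (3) at M_K = −21, and the limiting magnitude is then found at which the integrated LF equals the number of subhalos in the host halo. The function names, the bisection search, the integration bounds, and the neglect of K_* evolution (the caller passes M_* directly) are assumptions of this sketch; the central values of Equation (2) and the cluster LF parameters of Table 1 are used as defaults.

```python
import numpy as np


def hon(m200, z=0.0, gamma_hon=0.0, n0=36.0, slope=0.87):
    """Equations (2) and (3): galaxies brighter than M_K = -21 within r200.

    m200 is in units of h70^-1 M_sun; uncertainties on n0 and slope are ignored.
    """
    return n0 * (m200 / 1e14) ** slope * (1.0 + z) ** gamma_hon


def schechter_counts(m_lim, m_star, alpha, m_bright=-28.0, n=4000):
    """Unnormalized Schechter integral over magnitudes from m_bright to m_lim."""
    m = np.linspace(m_bright, m_lim, n)
    x = 10.0 ** (0.4 * (m_star - m))  # L / L_*
    phi = 0.4 * np.log(10.0) * x ** (alpha + 1.0) * np.exp(-x)
    dm = m[1] - m[0]
    return dm * (phi.sum() - 0.5 * (phi[0] + phi[-1]))  # trapezoid rule


def cluster_mag_limit(n_subhalos, m200, z=0.0, gamma_hon=0.0,
                      m_star=-24.34, alpha=-1.1, m_ref=-21.0):
    """Magnitude limit at which the normalized LF contains n_subhalos galaxies."""
    norm = hon(m200, z, gamma_hon) / schechter_counts(m_ref, m_star, alpha)
    lo, hi = -27.0, -10.0  # assumed search bracket [mag]
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        # Counts grow monotonically as the limit moves to fainter magnitudes.
        if norm * schechter_counts(mid, m_star, alpha) < n_subhalos:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


print(cluster_mag_limit(n_subhalos=100, m200=1e14))
```

A halo holding exactly the number of subhalos predicted by Equation (2) recovers the reference limit of −21, while a halo with more subhalos is assigned a fainter limit. Setting `gamma_hon` away from zero changes the normalization at fixed mass and therefore shifts the limit.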

3.2. Galaxy Type

A recent study (Weinmann et al. 2006) among many others (e.g., Hansen et al. 2009) has measured the fraction of blue galaxies within groups and clusters as a function of halo mass, halocentric radius, and luminosity using a group and cluster catalog that was optically selected from the second SDSS Data Release (DR2) in the redshift range between 0.01 and 0.2 (Yang et al. 2005). Results indicate that the blue fraction is higher in the outskirts of halos and in less massive halos. Gerke et al. (2007) extended the measurement of the blue fraction to higher redshift and to the field. They measured the cluster blue fraction as a function of radius from the cluster center, showing that the blue fraction gradually increased with distance from the center, approaching the blue fraction in the field.

This provides a convenient and simple way to model the galaxy populations in our simulated catalogs. We adopt an approach where clusters and the field contain only two types of galaxies—blue and red, with blue being star-forming galaxies and red being passively evolving galaxies. The variations in color of each of these types of galaxies are described in the next section. We further assume that clusters contain a special type of red galaxy—a BCG—which is the most luminous cluster member galaxy that is typically located near the center of the cluster. In our simulation the only difference in typical red galaxies and BCGs is their luminosity. The effects of active galactic nucleus emission or star formation on the colors of BCGs could be added once these properties are better characterized observationally.

First, we construct a functional form for the blue fraction, f_blue(z), to include variation with halo mass M₂₀₀, distance from the halo center r/r₂₀₀, and galaxy absolute magnitude in the K_s band, $M_{K_s}$ , by assuming that variations in blue fraction associated with these three parameters are separable:

$\begin{equation} f_{{\rm blue}}(z)=g_{{\rm blue}}(M_{K_s})A(M_{200})h_{{\rm blue}}\left(\frac{r}{r_{200}}\right)(1+z)^{\gamma _{{\rm blue}}}, \end{equation} \tag{ 4 }$

where g_blue($M_{K_s}$) is the blue fraction as a function of the galaxy's K_s magnitude, A(M₂₀₀) is a parameter associated with halo mass, h_blue(r/r₂₀₀) is the blue fraction as a function of r/r₂₀₀, and the overall blue fraction evolution with redshift is assumed to be a power law with index γ_blue. We directly adopt the discrete data points for the three functions g_blue($M_{K_s}$), A(M₂₀₀), and h_blue(r/r₂₀₀) from Figures 4, 5, and 8 in Weinmann et al. (2006), respectively. We translate the function g_blue, which is measured in the r band, to the K_s band by assuming a fixed color r − K_s of 4.0 (from the Bruzual & Charlot model). To find the parameter A(M₂₀₀), we integrate the blue fractions over the radial profile and the LF within the virial region. In this process, we adopt the mass limit of log₁₀M₂₀₀ = 11.85 from their data, although our clusters by definition have the lower mass threshold of 5 × 10¹³ M_☉. Once A(M₂₀₀) is determined, each galaxy is assigned a probability of being blue using Equation (4) together with the halo mass, the subhalo position within the halo, and the assigned K_s-band luminosity, by interpolating the discrete data points adopted from Weinmann et al. (2006). Beyond the virial radius we have a field population, which is taken to have a fixed blue fraction of 80% (a tunable parameter based on Gerke et al. 2007).
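The separable form of Equation (4) lends itself to a short Python sketch: interpolate the tabulated factors, multiply them, and draw a random number per galaxy. The function names are hypothetical, A(M₂₀₀) is taken as an input rather than derived, and the two tables below are made-up placeholders used only to exercise the code; they are not the Weinmann et al. (2006) measurements.

```python
import numpy as np

# PLACEHOLDER tables (not measured values): (x, y) pairs with x increasing.
G_TABLE = (np.array([-26.0, -22.0]), np.array([0.2, 0.6]))  # g_blue vs M_Ks
H_TABLE = (np.array([0.0, 1.0]), np.array([0.5, 1.0]))      # h_blue vs r/r200


def blue_probability(mag_ks, r_over_r200, a_m200, z, g_table, h_table,
                     gamma_blue=0.0, field_blue_fraction=0.8):
    """Equation (4) inside r200; a fixed field blue fraction outside."""
    g = np.interp(mag_ks, g_table[0], g_table[1])
    h = np.interp(r_over_r200, h_table[0], h_table[1])
    f_cluster = np.clip(g * a_m200 * h * (1.0 + z) ** gamma_blue, 0.0, 1.0)
    return np.where(np.asarray(r_over_r200) > 1.0, field_blue_fraction, f_cluster)


def assign_blue(prob, rng):
    """Return True for galaxies drawn as blue, given their probabilities."""
    prob = np.asarray(prob)
    return rng.random(prob.shape) < prob


rng = np.random.default_rng(0)
p = blue_probability(np.array([-24.0, -23.0]), np.array([0.5, 1.5]),
                     a_m200=1.0, z=0.0, g_table=G_TABLE, h_table=H_TABLE)
print(p, assign_blue(p, rng))
```

Clipping the product to the range [0, 1] is a choice of this sketch, made so that a steep γ_blue cannot produce probabilities above unity.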

Since Butcher & Oemler (1984) presented evidence that the fraction of blue galaxies in clusters increases with redshift (the BO effect), many studies have attempted to replicate their work through different star formation indicators (e.g., Kodama & Bower 2001; Poggianti et al. 2006). There are, however, many complications involved in attempting to define the evolution of the fraction of blue galaxies (see Gerke et al. 2007 and references therein). A recent study on the evolution of the blue fraction in groups and in the field (Gerke et al. 2007) measured the blue fraction in groups as a function of redshift between 0.75 and 1.3. They found a nearly constant blue fraction in groups in this redshift range, and a rising field blue fraction beyond z = 1. Since we do not find any statistically reliable, self-consistent observational study of the evolution of blue fractions in clusters over redshifts ranging from local to the very distant universe, we parameterize the blue fraction evolution using γ_blue shown above, which allows us to explore the impact of more or less rapid evolution in the galaxy population. With a data set from large, deep sky surveys, such as the Dark Energy Survey, we will be able to constrain this more consistently.

3.3. Galaxy Color

Once every galaxy is assigned a K_s-band magnitude and an identity as either a blue or a red galaxy (note that BCGs are modeled as red galaxies), colors in optical and near-IR bands follow. This is where different types of galaxies (i.e., blue versus red) are differentiated in the final mock catalogs. Red galaxies have a characteristic spectrum with no strong emission lines and a prominent 4000 Å break since their stars have been passively evolving since their last episode of star formation at high redshift. We use a Bruzual & Charlot model (Bruzual & Charlot 2003, hereafter BC03) to assign the colors for the red galaxy population. We assume a single burst of star formation at z = 3 and let the galaxies passively evolve to z = 0 according to a single stellar population (SSP) synthesis model. We use six different models with six distinct metallicities corresponding to different luminosities, allowing us to construct a tilted red sequence. These SSP models are then tuned in metallicity so that the tilt of the color–magnitude relation for the Coma cluster (z = 0.0234) is reproduced. The models provide apparent magnitudes in ugriz JHK_s bands at specific redshifts out to z = 2.98 and at six different luminosities at each redshift. We interpolate using these models to generate model colors at intermediate redshifts and luminosities. Since each of the models contains only a single population of stars, we introduce scatter in the metallicity–luminosity relation to model variations among galaxies (i.e., to introduce intrinsic scatter in the color–magnitude relation). The observationally motivated scatter is 0.075 mag (Barkhouse et al. 2006).

Blue galaxies with current or recent star formation activity have, in general, complex star formation histories (e.g., O'Connell 1997), which makes them more difficult to model. Therefore, we sample colors of blue galaxies directly from a subset of the FLAMEX⁹ database. This database contains photometry in the BRIJK_s bands, as well as for the *Spitzer*/IRAC [3.6] [4.5] [5.8] [8] micron filters, and measured photometric redshifts of 175,000 galaxies. We choose only those galaxies with well-measured fluxes in near-IR bands, leaving us with about 36,000 galaxies. To couple a galaxy with representative colors, we divide the adopted subset of FLAMEX into further subsets according to redshift and K_s-band luminosity. For each simulated galaxy, we choose the subset which is closest to its redshift and K_s-band luminosity, and then randomly select an observed galaxy from that group, with the additional constraint that it be bluer than the corresponding halo's color–magnitude relation at that redshift.

3.4. Observational Effects

Once magnitudes are determined for each galaxy, observational uncertainties (sum of uncertainties from the sources, the sky as background, and the systematic uncertainties from instruments) are applied to this model universe to produce a realistic simulated catalog. The galaxy catalog simulator can choose a specific telescope or instrument with a specific survey depth to add proper observational noise to the galaxy magnitudes. An integration time is determined such that the corresponding signal-to-noise ratio (S/N), which includes both systematic and statistical uncertainties, reproduces the measured uncertainties for the selection of telescope or instrument. The S/N is given by

$\begin{equation} {\rm S/N}=\frac{\sigma _{{\rm src}}^2}{\sqrt{\sigma _{{\rm src}}^2 +\sigma _{{\rm bkg}}^2 + \sigma _{{\rm sys}}^2}}, \end{equation} \tag{ 5 }$

where σ_src and σ_bkg are the Poissonian uncertainties (from the photon counts on detectors) in the measured flux of a source and in the sky brightness (as a background), respectively, while σ_sys is the systematic uncertainty, which is band-dependent for a specific instrument. Each galaxy in a simulated catalog is assigned this total uncertainty (the denominator in Equation (5)), and its "observed" magnitudes are then drawn as Gaussian random deviates about the "perfect" magnitudes, with the total uncertainty as the width of the Gaussian distribution. In this way, we create more realistic uncertainties for each galaxy, and we avoid repeating the same model spectral energy distribution or the same extracted sample from FLAMEX. The chosen survey depth will impose a magnitude cut on galaxies, and typically our final catalogs contain galaxies with S/N > 5 (i.e., σ_mag ⩽ 0.2) as estimated by Equation (5).
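The noise model can be sketched as follows, under two stated assumptions: all terms of Equation (5) are expressed in detector counts, so that the Poisson variances equal the source and background counts, and the scatter is applied in magnitudes through the first-order relation σ_mag ≈ (2.5/ln 10)/(S/N). The function names are illustrative only.

```python
import math
import random


def snr(src_counts, bkg_counts, sigma_sys):
    """Equation (5), with sigma_src^2 = src_counts and sigma_bkg^2 = bkg_counts."""
    return src_counts / math.sqrt(src_counts + bkg_counts + sigma_sys ** 2)


def mag_error(snr_value):
    """First-order magnitude uncertainty for a given S/N (an assumed conversion)."""
    return 2.5 / math.log(10.0) / snr_value


def observe(true_mag, sigma_mag, rng):
    """Draw an 'observed' magnitude as a Gaussian deviate about the true one."""
    return rng.gauss(true_mag, sigma_mag)


rng = random.Random(0)
s = snr(100.0, 200.0, 10.0)            # 100 / sqrt(100 + 200 + 100)
observed_mag = observe(21.5, mag_error(s), rng)
```

With this conversion, S/N = 5 corresponds to σ_mag ≈ 0.22, in line with the σ_mag ⩽ 0.2 quoted for the catalog cut.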

3.5. Variations in Galaxy Populations

One of the advantages of a parameterized catalog simulator that takes galaxy properties as input parameters, as in this work, is that those properties, such as the adopted LF for clusters and the field, the evolution in the HON, and the galaxy color distribution, can be changed to create a different mock catalog from the same simulation output. Once a primary run is defined and constructed using the best observational estimates for input parameters, one can systematically study the effect of each input parameter listed above by changing one parameter at a time to produce another mock catalog and comparing the results to the primary run. This allows us to map systematic changes in the performance of tested algorithms (e.g., cluster finders, richness estimators, redshift estimators) onto cosmological studies in a straightforward way, and in particular to examine the systematic uncertainty of the selection function for the cluster finder that arises from uncertainties in the creation of the mock catalogs. The size of the changes we apply to those galaxy properties is physically motivated, but the exact values for the deviations are not observed quantities. In our simulator, there are five parameters that can be controlled: the blue fractions as a function of physical properties of host clusters and galaxies, the evolution of the fraction of red galaxies within clusters with redshift, the LF of the field population, the evolution of HON with redshift, and the tightness of the color–magnitude relation.
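The one-parameter-at-a-time strategy can be written down compactly. In the sketch below the dictionary keys and the helper name are hypothetical. The primary-run values are those listed in Table 2, and the varied values are the γ_blue and γ_HON deviations quoted for Figures 8 and 9.

```python
# Primary-run parameters (values from Table 2; key names are illustrative).
PRIMARY = {
    "kstar_field": -24.21,
    "kstar_cluster": -24.33,
    "gamma_blue": 0.0,
    "gamma_hon": 0.0,
    "delta_rs": 0.07,
}


def one_at_a_time(primary, variations):
    """Yield (label, params) pairs that each differ from the primary run in one key."""
    for key, values in variations.items():
        for value in values:
            params = dict(primary)
            params[key] = value
            yield f"{key}={value}", params


VARIATIONS = {"gamma_blue": [-1.6], "gamma_hon": [1.0, -1.0]}
runs = list(one_at_a_time(PRIMARY, VARIATIONS))
```

Each entry of `runs` would correspond to one modified mock catalog, to be compared against the catalog built from `PRIMARY`.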

4. AN SDSS-LIKE SIMULATED CATALOG

We use the simulator to produce a flux-limited SDSS-like galaxy catalog by imposing the r-band magnitude limit at 22.2 for S/N ⩾ 5 objects. In the final galaxy catalog, we include positions, redshifts, peculiar velocities, photometric redshifts, magnitudes in $\it {ugriz}$ in the AB system, JHK_s magnitudes, and the [3.6] and [4.5] micron $\it {Spitzer}$ /IRAC bands in the Vega magnitude system. Table 2 shows the input parameter values for the primary run.

Table 2. SDSS-like Catalog Parameters

Property	Parameter	Value
Field luminosity function	K_*	−24.21
Cluster luminosity function	K_*	−24.33
Blue fraction evolution	γ_blue	0.00
HON evolution	γ_HON	0.00
Red sequence scatter	Δ_RS	0.07

The resulting galaxy catalog is then subjected to a series of validation tests which include log N–log S surface density for each band, color–magnitude relations for clusters, galaxy LFs of clusters and the field, HONs, radial profiles, and blue fraction as a function of radius and mass. Some of the basic features of this catalog appear in Figure 4. We note that this particular set of tests is done on a simulated catalog based on another simulation, which has a field of view of approximately $8\deg ^2$ and is based on the gas dynamic simulation by Dolag et al. (2005). This light-cone volume simulation contains a structure formation history that agrees well with observations, whereas the main simulation that we have been using for most of the development is constructed from a single output. The top figure shows the log N–log S for both galaxies in the simulated catalog (solid lines) and those for ${\sim} 12\deg ^2$ of the sky from the SDSS archive (dashed lines). The bright end for the SDSS galaxies is noisy due to the small sky area. It is clear that the two distributions are in good agreement at magnitudes brighter than the flux limit of the simulation (at r ∼ 22.2).

**Figure 4.** Comparison of galaxy distribution in magnitude and redshift space. Top: solid lines represent galaxy counts deg⁻² for a realization of the light cone, with red for g, yellow for r, green for i, and blue for z bands. Dashed lines with the same color sequence exhibit the counterpart distribution for SDSS of about $12\ \deg ^2$ . Bottom: the true redshift distribution of the mock (solid lines) and the measured photometric redshift distribution in SDSS DR7 (dashed line). Both distributions represent fractional galaxy counts in each data set with r < 22.5.

The bottom figure shows the redshift distributions for the same data sets. As noted earlier, here we adopt another simulation to compare the redshift distribution because it includes a light-cone output but has a much smaller volume and therefore poorer statistics. For the purpose of testing the simulated galaxy color distribution, the smaller field of view is not a major issue, but for other tests described below the larger volume and bigger halos and subhalo catalogs are critically important. The solid line shows the distribution of assigned (by their positions and the peculiar velocities) redshifts in the simulation, while the dashed line shows the measured photometric redshift distribution that is available in the public SDSS DR7 data archive (Abazajian et al. 2009). The general trends of the solid and dashed lines agree with each other, but the SDSS galaxies show a somewhat smoother distribution. In principle, one can run a photometric redshift estimator on the simulated galaxy catalog, preferably the same photometric redshift estimator used in this particular comparison on SDSS DR7 data. However, we restrict ourselves to the real redshift in the simulation, rather than the remeasured photometric redshift, in order to enable a direct comparison of the contents of the simulation and the real data by excluding another source of scatter from the photometric redshift estimator itself.

Figure 5 shows an example of the color–magnitude distribution of galaxies within an aperture of radius r₂₀₀ centered on a halo with a mass of 4.5 × 10¹⁴ M_☉ at a redshift of 0.21 (no field galaxies are shown, only red and blue cluster members). The dotted line represents the red sequence at a redshift of 0.2 from the same galaxy models used in the galaxy simulator, while the cluster's red members are marked as red dots to show the good agreement between the assigned galaxy colors and the redshift. To better demonstrate that the galaxy colors are representative of their redshifts, we construct a redshift estimator based on the red sequence of clusters, which in principle is comparable to the redshift estimator that the Red-sequence Cluster Survey uses for their analysis (see Gladders & Yee 2005 for details), and measure redshifts of clusters based on the simulated catalog. We use the same red galaxy models by BC03 that we use in the galaxy simulator described in Section 3.3. The details of the tests of the redshift estimator are described in Section 7. In the tests, we measure a comparable scatter in redshift estimates for simulated clusters compared to our ensemble of clusters based on the SDSS– $\it {Chandra}$ joint data set below z = 0.6. This demonstrates that the simulated galaxy colors are consistent with the real SDSS galaxy colors in clusters with z < 0.6.

5. EVALUATING A RED-SEQUENCE CLUSTER FINDER

In this section, we introduce the first application of the simulated galaxy catalog described above to test a cluster finder: the Voronoi Tessellation and Percolation (VTP) cluster finding algorithm (Barkhouse et al. 2006). This cluster finder code is still undergoing development and testing to optimize the cluster detection parameters (W. A. Barkhouse et al. 2012, in preparation). In the VTP algorithm, clusters are detected as spatial overdensities by the VTP method (Ramella et al. 2001) within redshift shells defined using the expected cluster galaxy colors in the color–magnitude relation. Because the finder searches in color–magnitude space, its output includes redshift estimates as well as positions and richness measurements. The VTP finder has been run on the SDSS DR6 to build a cluster galaxy catalog where these test results are utilized. Completeness and false-positive (contamination) tests show how effectively this finder detects clusters in the simulation. "Halos" refer to the DM halos in the simulation, while "clusters" refer to the systems that are recovered by the cluster finder. This particular simulated galaxy catalog is the primary catalog described in the previous section, resembling SDSS catalogs with flux-limited galaxies at an r-band magnitude of 22.2.

The VTP algorithm first filters the galaxy catalog to select galaxies within 3σ of the intrinsic scatter of the red sequence (1σ = 0.075) at each redshift. The redshift-dependent red-sequence location in color–magnitude space is determined by assuming a stellar synthesis model with a star formation epoch with a burst at z = 5 prior to passive evolution (Kodama & Arimoto 1997). Redshifts from the finder, therefore, can be biased by the assumed model. In order to correct this bias, the raw prediction on redshift by the VTP finder is calibrated with redshifts in the mock catalog, since the mock catalogs use an SSP model with a different star burst formation time (z = 3). In practice, one needs a set of red-sequence models at redshifts that is well tested with a set of real clusters with spectroscopic redshifts. We emphasize that our red-sequence models that go into the galaxy simulator have been widely tested through several projects, including the redshift estimation test presented in Section 7 and Zenteno et al. (2011).

5.1. Completeness s(M₂₀₀, z)

We characterize the VTP cluster selection in terms of the completeness and contamination of the resulting cluster catalog. Note that completeness and contamination measures are sensitive to some degree to $\it {how}$ the matching of the cluster and halo catalogs is done. In this study, we match the VTP cluster candidates and the true DM halos by drawing a boundary for each DM halo with its r₂₀₀ and then selecting all VTP clusters that lie within this boundary. In cases of multiple cluster–halo matches, we select the more massive halo as the true match. The completeness s(M₂₀₀, z) is defined as the fraction of detected halos of that mass and redshift out of all the halos in the mock catalog with that mass and redshift. With increasing redshift, the completeness drops, more dramatically at redshifts of 0.4 or higher, with very few detections at redshifts of 0.6 and higher, which is not surprising for data that are flux-limited at an r-band magnitude of 22.2.
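The matching rule and the completeness definition can be sketched as follows for a single redshift slice. The flat-sky coordinates, the dictionary fields, and the function names are assumptions for illustration, and no redshift criterion is applied here.

```python
import math


def match_clusters_to_halos(clusters, halos):
    """Map each cluster index to the most massive halo whose r200 circle contains it."""
    matches = {}
    for ci, c in enumerate(clusters):
        inside = [
            hi for hi, h in enumerate(halos)
            if math.hypot(c["x"] - h["x"], c["y"] - h["y"]) <= h["r200"]
        ]
        if inside:
            matches[ci] = max(inside, key=lambda hi: halos[hi]["mass"])
    return matches


def completeness(halos, matches, mass_edges):
    """Fraction of halos in each mass bin matched to at least one cluster."""
    detected = set(matches.values())
    fractions = []
    for lo, hi in zip(mass_edges[:-1], mass_edges[1:]):
        in_bin = [i for i, h in enumerate(halos) if lo <= h["mass"] < hi]
        if in_bin:
            fractions.append(sum(i in detected for i in in_bin) / len(in_bin))
        else:
            fractions.append(None)  # empty bin: completeness undefined
    return fractions
```

A usage example with three hypothetical halos and two cluster candidates:

```python
halos = [
    {"x": 0.0, "y": 0.0, "r200": 1.0, "mass": 1e14},
    {"x": 0.5, "y": 0.0, "r200": 1.0, "mass": 5e14},
    {"x": 10.0, "y": 10.0, "r200": 1.0, "mass": 2e14},
]
clusters = [{"x": 0.2, "y": 0.0}, {"x": 50.0, "y": 50.0}]
matches = match_clusters_to_halos(clusters, halos)
fractions = completeness(halos, matches, [5e13, 3e14, 1e15])
```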

Figure 6 shows the completeness of the primary run. Note that the completeness is a strong function of mass with essentially a cutoff at some threshold mass that increases with redshift. The completeness in the lowest redshift bins (〈z〉 ∼ 0.06) shows poorer performance than that in the next two bins (〈z〉 ∼ 0.17 and 0.29), and this may be because these nearest clusters of galaxies are projected onto a much larger portion of the sky as compared to high-redshift systems, effectively decreasing the S/N of a given system. Note also that at these redshifts the limited number of subhalos does not allow us to populate the cluster and field populations to the full depth of the flux-limited surveys, and this may also impact our results. The error bars shown include only Poisson noise and therefore reflect only the statistical uncertainties on our completeness measurements.

The completeness for low-mass halos is lower than for high-mass halos at all redshifts explored here. The main reason for this is that in a final simulated catalog, only halos with mass greater than 5 × 10¹³ M_☉ turn into clusters with cluster galaxy properties, while we keep all the lower mass halos inside the simulation. That means that any halo with mass below this threshold is populated as part of the field population, so the cluster finder does not perform well in the lower mass range. Another contribution to the low completeness in that mass range is the combination of a flux-limited sample of galaxies and a projection effect that smears out the signal from cluster members relative to the background field galaxies. A cluster finder would also lose cluster signals in much deeper data because the noise from the background galaxies increases with the depth of a survey. This result is generally consistent with other optical cluster finders shown in the literature (e.g., Hao et al. 2010). The decreasing completeness for higher redshift bins is expected for an optical cluster finder on a survey like SDSS (i.e., r-band magnitude limit for SDSS is about 22).

5.2. Contamination c(z)

We define the contamination c(z) to be the fraction of detected VTP clusters not matched with DM halos. To measure this we cross-match the VTP clusters with DM halos, requiring a separation equal to or less than each DM halo's r₂₀₀. Clusters with the smallest deviations in both spatial and redshift space are flagged as the best candidates. Clusters with no overlapping halos are taken as contamination. The number of these false clusters relative to the total number of detected clusters determines the contamination. Figure 7 shows the contamination fraction versus the redshift assigned to the contaminating cluster. Clusters at all redshifts that an SDSS-like survey probes, i.e., z < 0.4, show a steady contamination level of around 40%. Note that we do not examine the mass distribution of the contaminating clusters because our only mass estimate in the case of contaminating clusters comes from the measured optical mass indicator B_gc, which is a poor indicator of cluster mass (see Section 6).
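Given a matched/unmatched flag for every detected cluster, the contamination in redshift bins follows directly from this definition. The function name and the input layout in this sketch are assumptions.

```python
def contamination_by_redshift(cluster_z, is_matched, z_edges):
    """Fraction of detected clusters in each redshift bin with no matching halo."""
    result = []
    for lo, hi in zip(z_edges[:-1], z_edges[1:]):
        flags = [m for z, m in zip(cluster_z, is_matched) if lo <= z < hi]
        if flags:
            result.append(sum(1 for m in flags if not m) / len(flags))
        else:
            result.append(None)  # no detections in this bin
    return result


# Hypothetical detections: assigned redshift and whether a halo was matched.
cluster_z = [0.10, 0.15, 0.30, 0.35, 0.38]
is_matched = [True, False, True, True, False]
c_of_z = contamination_by_redshift(cluster_z, is_matched, [0.0, 0.2, 0.4])
```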

**Figure 7.** We plot the contamination fraction c(z) of all VTP clusters found that do not match to any true halo in the simulations. At SDSS depths the VTP cluster catalog exhibits low contamination at z < 0.6.

The overall performance of the VTP cluster finder presented in this version is somewhat poorer than that of maxBCG (Koester et al. 2007), but it is undergoing thorough tests and improvements using simulated and real galaxy catalogs. The exact selection functions shown here are not meant to be the final measure of the performance of the VTP cluster finder. One potential source of contamination in a cluster finder arises when the cluster detections are not merged. That is, for a true massive halo a typical cluster finder will return multiple detections, and these have to be identified and merged. Any non-merged detections will show up as contamination. This is part of the explanation of the high contamination in this cluster finder, which is still being tuned to overcome this difficulty. Another issue to consider is that we have adopted a mass cutoff in our defined cluster halos. Only halos with virial masses M₂₀₀ > 10^13.5 M_☉ are considered as clusters. Any systems of lower mass that trigger the cluster finder are considered as contamination.

5.3. Estimating Systematic Uncertainties on Completeness and Contamination

The completeness and contamination measurements presented above determine the best estimate of the selection function for the VTP finder, which can then be used in the cosmological interpretation of a VTP cluster catalog. An important element of this analysis is to include not only the best estimate of the selection function but also the associated uncertainties. These uncertainties are derived from current observational uncertainties in physical properties of the cluster and field galaxy populations. They are therefore systematic in nature—e.g., an enhanced blue fraction at high redshift will make it systematically more difficult to select clusters at those redshifts. We can use those observational uncertainties to characterize the uncertainties in the VTP selection function, and of course as the observational constraints on optical properties of the cluster and the field galaxy populations tighten, we can improve our simulated catalogs and reduce the systematic uncertainties on the selection function of the cluster finder. We proceed by creating several more simulated catalogs with input parameters varied around the best measured values. Then we run the VTP finder on each of these modified catalogs to determine the selection in each case. By comparing the selection from a modified catalog to that derived from the primary catalog, one can estimate the effect of changing each simulation parameter on the performance of the finder. Below we illustrate how this can be approached by carrying out such an analysis on the VTP cluster finder.

The residual completeness, defined as the completeness of a modified catalog minus the completeness of the primary run, is shown in several figures: one each for tests of the impact of blue fraction (Figure 8), HON evolution (Figure 9), intrinsic scatter in the red sequence (Figure 10), and relative luminosity of the field and cluster populations (Figure 11). In each figure the bottom panel collects the results from each redshift, scales them to the estimated 1σ uncertainty on completeness due to this effect, and omits the uncertainties to improve readability. We note that for cases where the completeness approaches zero, even small changes between runs with different parameters can lead to apparently large fractional changes in the completeness. For this reason, we do not include these points in the plots.

**Figure 8.** Change in completeness due to changes in the blue fraction, which depends on the K-band luminosity, the mass of the host halo, and the galaxy's distance from the host halo center, as well as redshift evolution factor in the form of (1 + z)^γ_blue, where γ_blue is −1.6. Results show the change as a function of mass for each redshift slice considered between the fiducial blue fraction model and one that is perturbed by 1σ in its parameterization. The bottom panel shows the systematic uncertainty in the completeness due to the 1σ observational uncertainty in this blue fraction.

**Figure 9.** Change in completeness due to changes in the evolution of the HON with γ = +1 (diamond) or −1 (triangle). Each panel with different color-coded symbols represents a different redshift bin of which the sequence is shown in the lowest panel. The lowest panel shows the systematic uncertainty in the completeness due to the 1σ observational uncertainty in HON.

**Figure 10.** Change in completeness due to changes in the intrinsic scatter of the cluster red sequence Δ_RS = 0.02–0.03 (blue) and Δ_RS = 0.12–0.15 (red). Again, the bottom panel corresponds to the 1σ uncertainty in the completeness due to the uncertainties in the intrinsic scatter.

**Figure 11.** Change in completeness due to changes in the relationship between the field and cluster luminosity functions (LFs). Red shows the case where the field LF is made brighter, and the other color shows the case where the field LF is made fainter. The bottom panel shows the 1σ uncertainties in the completeness due to remaining uncertainties in the offset between the cluster and field LFs.

In Figure 8, a modified catalog is generated with blue fraction as a function of M₂₀₀, r₂₀₀, and $M_{K_s}$ (see Section 3.2) and with redshift evolution included in the form of (1 + z)^γ_blue. The adopted γ_blue is −1.6, chosen so that the blue fraction at a redshift of 1 is 80%. Because the blue fraction depends on several factors, it is not possible to quote a single value for the blue fraction at a given redshift. The adopted γ_blue is somewhat arbitrary in this exercise, since how the blue fraction in clusters evolves with redshift is still controversial. When there are fewer red galaxies in clusters at high redshift, the selection is not as good as in the primary catalog test. This is shown as dashed lines above the primary catalog completeness shown at zero. Figure 9 shows the comparison in completeness for the HON modification run where the evolutionary factor γ_HON = ±1 is implemented. With +1, shown with a diamond, there are more galaxies in a cluster of a given mass at higher redshift, and this makes it easier to detect these systems. The triangles show the other case with −1, where there are fewer cluster galaxies at higher redshift, and so the finder detects fewer systems. Figure 10 shows two different levels of intrinsic scatter in the red sequence: one with smaller scatter (blue), ∼0.02–0.03, and the other with larger scatter (red), ∼0.12–0.15, than the fiducial scatter in the primary catalog. This test illustrates how the intrinsic scatter of the color–magnitude relation alters the performance of the cluster finder. The smaller the scatter is, the more clusters are recovered by the finder, since the finder relies on the existence of the red sequence to isolate cluster galaxies from the field galaxies projected nearby. In Figure 11, triangle (diamond) symbols represent the case where the K_* of the field LF is brightened (dimmed) compared to that of the cluster LF. The completeness shown in the lower redshift bins is counterintuitive, whereas at high z the change is too small to see any effect. Why the performance of a cluster finder improves when the field galaxies get brighter is currently under investigation, but a possible explanation is that brightened field galaxies at high z contribute to the signals of clusters at lower z. We also note that the changes are at a level of <1%, which could be random fluctuation. Changing the field galaxies to make them brighter or fainter than the cluster galaxies does not have a significant impact on the completeness.

As noted above, the bottom panels of Figures 8, 9, 10, and 11 show the systematic uncertainty in the completeness due to observational uncertainties in the blue fraction in clusters, the HON, the size of the intrinsic scatter in the red sequence, and the field/cluster LF. The runs used in Figures 8, 9, 10, and 11 use characteristic values for the deviations from the best-fit values, which are large enough to see the effect of changes in the performance of the finder. We can then use these measured changes in completeness to estimate the changes one would see for a 1σ shift in each of the simulation parameters. Colors for redshift bins are the same as in Figure 6 and the others. Overall, it is clear that the existing observational uncertainties in the properties of cluster and field galaxy populations out to z ∼ 0.8 do not translate into large systematic uncertainties in the selection function for the VTP finder. To fully characterize the uncertainties in the selection function would require an analysis of the covariances among these effects. Here we are providing a demonstration of how one would proceed (Figure 12). In the limit of small covariances in the impact on the completeness of variations among these key simulation parameters, one can simply take the quadrature sum of the contributions to the uncertainty from each different parameter. This leads directly to an estimate of the systematic uncertainty on the completeness as a function of mass and redshift that could be used in the cosmological analysis of any cluster sample extracted using the finder. In a similar manner, the impact of mock catalog simulator parameters on the contamination could be assessed.
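The rescaling and the quadrature sum can be sketched for a single (mass, redshift) cell. The linear rescaling from the applied deviation to a 1σ shift is an assumption of this sketch, and all numerical values below are hypothetical placeholders rather than measurements from the figures.

```python
import math


def one_sigma_shift(delta_completeness, delta_param, sigma_param):
    """Linearly rescale a measured completeness change to a 1-sigma parameter shift."""
    return abs(delta_completeness) * sigma_param / abs(delta_param)


def combined_uncertainty(shifts):
    """Quadrature sum, appropriate when the contributions are independent."""
    return math.sqrt(sum(s * s for s in shifts))


# Hypothetical inputs: (completeness change, applied deviation, 1-sigma uncertainty).
tests = {
    "blue fraction": (-0.06, 1.6, 0.8),
    "HON evolution": (0.08, 1.0, 0.5),
}
shifts = [one_sigma_shift(*values) for values in tests.values()]
sigma_s = combined_uncertainty(shifts)
```

Repeating this for every mass and redshift bin would give a map of the kind shown in Figure 12. Non-negligible covariances between parameters would require the full covariance matrix instead of the simple sum.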

**Figure 12.** Combined uncertainty in completeness as a function of mass and redshift. Each 1σ uncertainty from the different simulation parameter tests is combined in a quadrature sum assuming contributions from each parameter are independent. Current observational uncertainties on the properties and evolution of cluster galaxy properties translate into typically ∼10% systematic uncertainties in the survey selection function.

6. PERFORMANCE OF THE B_gc MASS ESTIMATOR

Along with finding clusters one must be able to reliably estimate their masses to be able to use the evolution of the MF with redshift to study the nature of the dark energy or cosmic acceleration. Establishing a reliable estimator in optical bands stands as a significant hurdle for optical cluster cosmology studies. This may well be because, unlike in the X-ray and SZE where the signature of the hot gas within the virial region is dramatically enhanced by its higher temperature and density, the galaxies change more gradually as they move from the field to the cluster. In particular, red-sequence galaxies exist not only in clusters, but also in groups and more generally as tracers of the large-scale structure. Because the red-sequence galaxy population comes not only from the cluster virial region but also from the surrounding cluster environment, any signature derived from this population (i.e., the number of red galaxies or the integrated luminosity from this population) seems to be less well correlated with cluster virial mass than measures extracted from the X-ray and SZE. Here we use our simulated catalogs to explore one well-known optical mass estimator.

One measure of the strength or richness of the cluster galaxy population is B_gc, the amplitude of the cluster center–galaxy correlation function. As in the previous sections, we only mean to demonstrate the application of our simulated galaxy catalog in testing and calibrating one of the optical richness estimators in the field, rather than reporting the final answers to the scaling relation or the scatter in that scaling relation.

The B_gc parameter, first pioneered by Longair & Seldner (1979) and later extensively tested by Yee & López-Cruz (1999), is defined as follows:

$\begin{eqnarray} &&B_{{\rm gc}}=N_{{\rm bkg}}\frac{D^{\gamma -3}A_{{\rm gc}}}{I_{\gamma }\Phi [M(m_0,z)]}, \end{eqnarray} \tag{ 6 }$

$\begin{eqnarray} &&A_{{\rm gc}}=\frac{N_{{\rm net}}}{N_{{\rm bkg}}}\,\frac{(3-\gamma)}{2}\,\theta ^{\gamma -1}. \end{eqnarray} \tag{ 7 }$

Here, N_bkg is the background galaxy count down to an apparent magnitude m₀ and Φ[M(m₀, z)] is the integrated LF of galaxies up to the absolute magnitude M corresponding to m₀ at the cluster redshift z. γ is the power-law slope of the correlation function, θ is the angular radius within which the galaxies are counted, I_γ is a constant that depends on the choice of γ, and D is the angular diameter distance to z. As shown in Equation (6), B_gc measurements depend on the LF that is adopted and the background number counts. Yee & López-Cruz (1999) tested the sensitivity of B_gc values to these two parameters and to the magnitude limit down to which the adopted LF was integrated and concluded that they did not strongly affect B_gc values as long as the normalization of the LF was carefully measured.
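For illustration, Equations (6) and (7) can be evaluated as below. Equation (7) is read here as the product (N_net/N_bkg) · (3 − γ)/2 · θ^(γ−1), which is the form obtained by integrating a power-law angular correlation w(θ) = A_gc θ^(1−γ) over a circular aperture. The distinction between the background count inside the aperture (used in Equation (7)) and the background surface density per unit solid angle (used in Equation (6)) is an assumption based on dimensional consistency. The function names, argument names, and units are likewise illustrative, and I_γ is taken as an input rather than computed.

```python
def a_gc(n_net, n_bkg_aperture, theta, gamma):
    """Equation (7): angular correlation amplitude from counts within radius theta (radians)."""
    return (n_net / n_bkg_aperture) * (3.0 - gamma) / 2.0 * theta ** (gamma - 1.0)


def b_gc(a_gc_value, n_bkg_density, d_a, phi, i_gamma, gamma):
    """Equation (6): spatial amplitude from the angular amplitude.

    n_bkg_density : background counts per steradian to the magnitude limit (assumed)
    d_a           : angular diameter distance in Mpc
    phi           : integrated luminosity function in Mpc^-3
    i_gamma       : integration constant for the adopted gamma (supplied by the user)
    """
    return n_bkg_density * d_a ** (gamma - 3.0) * a_gc_value / (i_gamma * phi)


# Round numbers chosen only so the arithmetic is easy to follow.
amplitude = a_gc(n_net=30.0, n_bkg_aperture=10.0, theta=0.01, gamma=2.0)
richness = b_gc(amplitude, n_bkg_density=1000.0, d_a=500.0, phi=0.01,
                i_gamma=2.0, gamma=2.0)
```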

In Figure 13 we plot the richness of the systems in the cluster catalog extracted using the VTP algorithm versus the real mass of the halos in the simulation. As seen in our scatter plot of the B_gc versus M₂₀₀ for 8000 halos in the mock catalog (Figure 13), the scatter in this relation is quite large and the apparent correlation is quite weak. Red symbols in the plot, which represent clusters at redshift higher than 0.4, show that higher redshift clusters tend to have lower B_gc values for their mass. As shown in Equation (7), the measurement of B_gc depends on the net count of galaxies through A_gc (N_net = N_{cluster members} − N_bkg) which decreases with redshift due to the flux-limited nature of the mock catalog. This results in the preference of a lower B_gc for higher redshift systems. We also note that the B_gc measurement by the VTP finder is restricted within the red-sequence slice for each cluster, which results in "quantization" in B_gc values in Figure 13 (upper panel).

Rykoff et al. (2008) report a best-fit relation between X-ray luminosity and their mass indicator (N₂₀₀), using the maxBCG clusters in the redshift range from 0.1 to 0.3 (Koester et al. 2007). N₂₀₀, their richness indicator, is given by the number of E/S0 ridgeline members falling within R₂₀₀ of the BCG and brighter than 0.4 L_*. The best-fit relation in their findings shows an intrinsic scatter σ_ln L = 0.86 ± 0.03, which corresponds to $\sigma _{\log _{10}L} \sim 0.37$ . That, in turn, corresponds to $\sigma _{\log _{10}M} \sim 0.25$ , assuming the scaling relation between X-ray luminosity and mass is a power law of the form L ∝ M^1.5, i.e., log₁₀L ∼ 1.5 log₁₀M (Reiprich & Böhringer 2002). Figure 14 shows the cumulative distribution of B_gc scatter in each B_gc bin, and the cumulative Gaussian distribution with σ of 0.25 (red dotted line). Except for the highest B_gc bin, the cumulative distribution is nearly Gaussian with σ of 0.25. These data support the use of Gaussian scatter in log₁₀M in the optical richness parameter for cosmological studies (e.g., Gladders et al. 2007). Interestingly, our measurement of the intrinsic scatter in the B_gc–mass relation is consistent with the intrinsic scatter estimated for N₂₀₀ that is extracted by leveraging the X-ray properties of galaxy clusters (Rykoff et al. 2008).
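The conversion of the quoted scatter can be written out explicitly. The only inputs are the change of logarithm base and the assumed luminosity–mass slope of 1.5:

```latex
\begin{align}
\sigma_{\log_{10} L} &= \frac{\sigma_{\ln L}}{\ln 10} = \frac{0.86}{2.303} \approx 0.37, \\
\sigma_{\log_{10} M} &= \frac{\sigma_{\log_{10} L}}{1.5} \approx \frac{0.37}{1.5} \approx 0.25 .
\end{align}
```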

**Figure 14.** Cumulative distribution in mass of different cuts in B_gc shifted so that they overlap at the 50th percentile point. Note that the different B_gc cuts correspond to cuts in mass that are well described by a Gaussian with $\sigma _{\log _{10}M}$ of 0.25 (red line). The color scheme is the same as in Figure 13.

7. A RED-SEQUENCE REDSHIFT ESTIMATOR

In this section, we present another application of the simulated galaxy catalog through validation tests of a redshift estimator that has been used in studies of SZE and X-ray selected galaxy clusters (Staniszewski et al. 2009; High et al. 2010; Williamson et al. 2011; Zenteno et al. 2011) and that forms the core of an optical cluster selection method being developed for joint optical and SZE cluster finding (Liu et al. 2011). First we describe how the redshift estimator works and then we discuss its validation tests on the simulated galaxy catalog. In order to validate the performance of the redshift estimator, we also use a set of known clusters with SDSS data, as well as the simulated galaxy catalog presented in this work. We test it on the spectroscopic samples to make sure that the models for red-sequence galaxies that we adopt in the simulator, as well as in the redshift estimator, are consistent with the real red sequence.

In the case of an SZE or X-ray candidate, we examine galaxies within a region of the sky around the known center, choosing either a physical radius of r ∼ 0.8 Mpc or an estimated virial radius r₂₀₀, depending on the application. At each redshift we select only the galaxies with colors appropriate for the red sequence, using the SSP model (the BC model, which is the same model that we use to paint red galaxies in the mock catalog described in Section 3.3). We apply a correction for the expected number of background galaxies with this color at each redshift and thereby obtain a measured overdensity of red-sequence galaxies as a function of redshift along the line of sight toward the X-ray or SZE cluster candidate. All four available filters, g, r, i, and z, are used to look for this red galaxy overdensity over the full redshift range in order to avoid false peaks due to the degeneracy between color and redshift. The degeneracies in the g − r, r − i, and i − z colors are known to generate false peaks at transitional redshifts (i.e., ∼0.35 and ∼0.75 for z < 1), where the 4000 Å break moves out of one of the filters that define a color. We therefore scan through the data with five colors, g − r, g − i, r − i, r − z, and i − z, each applied over a certain redshift interval, so that two sets of colors are available at all redshifts below 1.1. A real overdensity peak can then show up in any of these color combinations (sometimes in more than one color), and we avoid relying exclusively on g − r at redshifts around 0.3–0.35, for example.
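
The scan described above can be sketched in a few lines. This is a minimal illustration for a single color, not the estimator used in this work: the function name `red_sequence_overdensity`, the fixed color tolerance `tol`, the flat (untilted) red sequence, and the toy linear color–redshift relation are all assumptions introduced here, standing in for the SSP model and the measured background.

```python
def red_sequence_overdensity(colors, model_color, z_grid,
                             bg_density, aperture_area, tol=0.05):
    """Background-corrected red-sequence counts at each trial redshift.

    colors        : observed colors (e.g. g - r) of galaxies in the aperture
    model_color   : callable z -> expected red-sequence color
                    (hypothetical stand-in for the SSP model)
    bg_density    : callable z -> expected background galaxies per unit
                    area with that color (hypothetical)
    aperture_area : area of the search aperture, same units as bg_density
    tol           : half-width of the color window (an assumed value)
    """
    net_counts = []
    for z in z_grid:
        target = model_color(z)
        # Count galaxies whose color matches the red sequence at this redshift.
        n_obs = sum(1 for c in colors if abs(c - target) <= tol)
        # Subtract the expected number of background galaxies in the aperture.
        n_bg = bg_density(z) * aperture_area
        net_counts.append(n_obs - n_bg)
    return net_counts


# Toy example: three galaxies near color 1.0 and two unrelated ones.
colors = [1.0, 1.02, 0.98, 1.5, 0.3]
z_grid = [0.1, 0.25, 0.4]
net = red_sequence_overdensity(
    colors,
    model_color=lambda z: 0.5 + 2.0 * z,  # toy relation, not an SSP model
    z_grid=z_grid,
    bg_density=lambda z: 2.0,             # toy constant background
    aperture_area=0.5,
)
```

In practice the same loop would be repeated for each of the five colors over its own redshift interval, so that a peak can be confirmed in more than one color.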

We have tested this approach using 51 X-ray and optically selected clusters that lie within the SDSS DR7 survey region. For each cluster we estimate the cluster mass and virial radius using an X-ray temperature and the mass–temperature scaling relation from Finoguenov et al. (2001). Figure 15 shows the results for A1682, and Figure 16 shows the results for MS1621.5+2640. In the top panel of each figure, the background-corrected net number of galaxies within the cluster's r₂₀₀ is plotted in each redshift bin. A1682 is a cluster with an X-ray temperature of 7.24 keV at z = 0.226, and MS1621.5+2640 has an X-ray temperature of 7.6 keV at z = 0.426. Because the 4000 Å break in the old stellar populations of the cluster galaxies moves out of the g band at about z = 0.35, the peak in the histogram for MS1621.5+2640 shows up in r − z versus z at the appropriate redshift, while A1682 shows a peak in the g − r versus r histogram at the appropriate redshift. The bottom panel in each figure shows the likelihood of each detection, where we assume Poisson noise. The redshift bin with the maximum likelihood is chosen as the initial redshift estimate of the cluster (indicated by a red asterisk in the top panel), and we refine the estimate by fitting a Gaussian function to the overdensity distribution around the initial peak.
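
The two steps at the end of this procedure, a Poisson significance for each bin and a Gaussian refinement of the peak, can be sketched as follows. This is an illustrative version under stated assumptions, not the fitting code used here: `poisson_tail` and `refine_peak` are hypothetical names, the significance is taken to be the Poisson probability that the background alone produces at least the observed count, the initial bin is simply the highest one, and the Gaussian fit is reduced to a three-point parabola in the logarithm of the net counts.

```python
import math


def poisson_tail(n_obs, mu_bg):
    """P(N >= n_obs) for a Poisson background with mean mu_bg."""
    if n_obs <= 0:
        return 1.0
    term = math.exp(-mu_bg)       # P(N = 0)
    cdf = term
    for k in range(1, n_obs):     # accumulate P(N = 1) ... P(N = n_obs - 1)
        term *= mu_bg / k
        cdf += term
    return max(0.0, 1.0 - cdf)


def refine_peak(z_grid, net_counts):
    """Refine the peak redshift with a three-point Gaussian interpolation.

    The log of a Gaussian is a parabola, so the vertex of the parabola
    through the log-counts of the peak bin and its two neighbours gives
    the Gaussian centre. Falls back to the bin centre when this is not
    possible (edge bin, non-positive neighbour, or no curvature).
    Assumes an evenly spaced z_grid.
    """
    i = max(range(len(net_counts)), key=net_counts.__getitem__)
    if i == 0 or i == len(net_counts) - 1:
        return z_grid[i]
    y_lo, y_mid, y_hi = net_counts[i - 1], net_counts[i], net_counts[i + 1]
    if min(y_lo, y_hi) <= 0:
        return z_grid[i]
    a, b, c = math.log(y_lo), math.log(y_mid), math.log(y_hi)
    curvature = a - 2.0 * b + c
    if curvature >= 0:
        return z_grid[i]
    dz = z_grid[i + 1] - z_grid[i]
    return z_grid[i] + 0.5 * (a - c) / curvature * dz


# Toy overdensity: a Gaussian centred at z = 0.226 sampled in bins of 0.02.
z_grid = [0.02 * k for k in range(30)]
net = [20.0 * math.exp(-0.5 * ((z - 0.226) / 0.03) ** 2) for z in z_grid]
z_fit = refine_peak(z_grid, net)
```

A full least-squares Gaussian fit over more bins would be more robust to noise; the three-point form is used here only to keep the sketch self-contained.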

**Figure 15.** Red-sequence overdensity toward A1682 (z = 0.226) within 0.8 Mpc from the X-ray center. Top: The number of galaxies within the area of 0.8 Mpc radius from the center at each redshift and color which is corrected for an estimated number of background galaxies within the same aperture size at each redshift and color. The solid black line is the fitted line assuming the overdensity distribution around the peak is Gaussian. Bottom: The likelihood for an overdensity to be a cluster signal compared to background noise is shown, which is then turned into the probability of being real cluster detection, assuming that the background noise is Poissonian.

**Figure 16.** Background-corrected red-sequence galaxy overdensity for MS1621.5+2640 (z = 0.426), the same as in Figure 15.

In the top panel of Figure 17, we plot the photometric redshift versus the spectroscopic redshift for the full ensemble of 51 clusters, while the bottom panel shows the same test using the simulated galaxy clusters described in this paper. There is good overall agreement, with evidence that our estimates are systematically high: the rms scatter of the photo-z's around the true redshifts (black dotted line) is 2.9%, and the rms is 1.8% once the bias is removed. This systematic bias in redshift could be further reduced by additional tuning of the SSP models used to predict the red-sequence color and evolution, as supported by the test on the simulated galaxy cluster populations, which uses the same SSP models for both the galaxy colors and the redshift estimator.
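
The bias and scatter statistics quoted above could be computed along the following lines. This is a sketch only: the helper name `photoz_stats` is hypothetical, and the normalization of the residual by (1 + z_spec) is an assumption, since the exact definition of the percentage scatter is not spelled out in the text.

```python
import math


def photoz_stats(z_phot, z_spec):
    """Return (bias, rms, rms_debiased) of fractional photo-z residuals.

    Residuals are defined here as (z_phot - z_spec) / (1 + z_spec),
    which is an assumed convention.
    """
    resid = [(zp - zs) / (1.0 + zs) for zp, zs in zip(z_phot, z_spec)]
    n = len(resid)
    bias = sum(resid) / n                                   # mean offset
    rms = math.sqrt(sum(r * r for r in resid) / n)          # scatter about truth
    rms_debiased = math.sqrt(sum((r - bias) ** 2 for r in resid) / n)
    return bias, rms, rms_debiased


# Toy input: two clusters with fractional residuals of 1% and 3%.
bias, rms, rms_debiased = photoz_stats([0.212, 0.442], [0.2, 0.4])
```

With these definitions the three quantities satisfy rms² = bias² + rms_debiased², which is a convenient consistency check on any implementation.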

8. DISCUSSION

We have presented an empirical method for constructing simulated catalogs that rely upon high-resolution DM-only simulations and the observationally constrained properties of cluster and field galaxy populations. This empirical approach is attractive because it offers the possibility to test, improve, and characterize the final performance of optical cluster finders and other tools that are used on real galaxy catalogs. This method can be further tuned as improved observational constraints on cluster and field galaxy populations become available.

In Section 5 we have demonstrated the power of this approach by characterizing the selection function of the VTP cluster finder. We have used the mock catalog to measure the contamination as a function of redshift and the completeness as a function of mass and redshift. The development version of this code performs reasonably well, with characteristic contamination of ∼40% out to z ∼ 0.55, and completeness that increases with mass and reaches characteristic values of around 50%.

An advantage of our method is the ease of modifying the galaxy populations by altering the population parameters such as the blue fraction mass dependence and redshift evolution, the intrinsic scatter in the red sequence, the evolution of the HON, and the relative brightness of field and cluster galaxy populations. We have used this property of the catalog simulator to extract the systematic uncertainties on the selection function using the full range of the catalog parameters that are consistent with current observational data. Sensitivity of the completeness to variations in these parameters is at the ∼5% level for blue fraction, HON, and red-sequence scatter changes, but it does approach ∼15% at certain redshifts. Uncertainties are only at the ∼1% level due to relative brightness changes in the field and cluster LFs. We also have demonstrated how one would combine effects of uncertainties in different parameters on estimating selection functions of a cluster finder, assuming the effects of parameters explored in this work are independent. When they are all combined in quadrature, the uncertainty is at the level of ⩽15%. These uncertainties should be included in the cosmological analysis of optically selected cluster samples. With additional work on the optical properties of cluster galaxy populations, especially in the high-redshift regime, these uncertainties can be reduced to enable the full statistical power of the large optically selected cluster samples to be realized.
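
As an illustration of the quadrature combination, assuming independent contributions and taking the typical values quoted above (∼5% each for the blue fraction, HON, and red-sequence scatter, and ∼1% for the relative brightness) purely as representative inputs rather than measurements at any particular redshift:

$$
\sigma_{\rm tot} = \sqrt{\sum_i \sigma_i^2} \approx \sqrt{5^2 + 5^2 + 5^2 + 1^2}\,\% \approx 8.7\% ,
$$

which lies within the ⩽15% level quoted for the combined uncertainty.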

In addition, we used the ensemble of catalogs to test optical mass estimation (Section 6) and redshift estimation (Section 7). Our analysis shows that the B_gc optical mass estimator is correlated with cluster halo mass but with large scatter. The scatter in mass at fixed B_gc is approximately log normal and about 75% ( $\sigma _{\log _{10}M}\sim 0.25$ ), which is markedly worse performance than X-ray and SZE mass estimators (Mohr et al. 1999; O'Hara et al. 2006; Vikhlinin et al. 2009; Vanderlinde et al. 2010; Andersson et al. 2011). The performance of the red-sequence overdensity redshift estimator is better than 2% once biases possibly associated with a mismatch between the evolution of observed and modeled red sequences are taken into account.

This project highlights the importance of empirical mock catalogs, not only for obtaining an accurate estimate of the selection function for a cluster finding algorithm, but also for characterizing the uncertainties in the selection function. Moreover, a catalog generator is an essential tool during the development of tools for optical cluster finding, mass estimation, and photometric redshift measurement. It is important to note that this approach is not useful for studying the underlying physics of galaxy formation, which must be pursued using direct hydrodynamical simulations or even semi-analytic studies. Also, the simulator presented here builds upon the large-scale subhalo distribution of the underlying N-body simulation. So clearly an interesting next step in this development stream is to extend to larger volume light-cone outputs from other structure formation simulations and to extend the analyses of galaxy populations to higher redshift using deeper survey data sets such as those coming from the Dark Energy Survey.

J.S. acknowledges the support of the DOE grant DE-FG02-95ER40899. J.J.M. acknowledges the support of the Excellence Cluster Universe in Garching.
