Constraining the H i–Halo Mass Relation from Galaxy Clustering

, , , , , , , and

Published 2017 August 31 © 2017. The American Astronomical Society. All rights reserved.
, , Citation Hong Guo et al 2017 ApJ 846 61 DOI 10.3847/1538-4357/aa85e7

Download Article PDF
DownloadArticle ePub

You need an eReader or compatible software to experience the benefits of the ePub3 file format.

0004-637X/846/1/61

Abstract

We study the dependence of galaxy clustering on H i mass using ∼16,000 galaxies with redshift in the range of $0.0025\lt z\lt 0.05$ and H i mass of ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$, drawn from the 70% complete sample of the Arecibo Legacy Fast ALFA survey. We construct subsamples of galaxies with ${M}_{{\rm{H}}{\rm{I}}}$ above different thresholds and make volume-limited clustering measurements in terms of three statistics: the projected two-point correlation function, the projected cross-correlation function with respect to a reference sample, and the redshift-space monopole moment. In contrast to previous studies, which found no/weak H i mass dependence, we find both the clustering amplitudes on scales above a few megaparsecs and the bias factors to increase significantly with increasing H i mass for ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$. For H i mass thresholds below $\sim {10}^{9}\,{M}_{\odot }$, the inferred galaxy bias factors are systematically lower than the minimum halo bias from mass-selected halo samples. We extend the simple halo model, in which the galaxy content is only determined by halo mass, by including the halo formation time as an additional parameter. A model that puts H i-rich galaxies into halos that formed late can reproduce the clustering measurements reasonably well. We present the implications of our best-fitting model on the correlation of H i mass with halo mass and formation time, as well as the halo occupation distributions and H i mass functions for central and satellite galaxies. These results are compared with the predictions from semianalytic galaxy formation models and hydrodynamic galaxy formation simulations.

Export citation and abstract BibTeX RIS

1. Introduction

In the current paradigm of galaxy formation, baryon gas is expected to fall into the gravitational potential wells of dark matter halos, where it can cool, condense, and form stars. A key goal in modern galaxy formation is thus to understand the physical link between galaxies and their host dark matter halos. Indeed, this has been one of the key goals of the large optical imaging and spectroscopic surveys accomplished in the past one and a half decades, such as the Sloan Digital Sky Survey (SDSS; York et al. 2000). Based on these surveys, the abundance and clustering have been measured to high precision for low-redshift galaxies selected by stellar mass and multiband luminosities (e.g., Zehavi et al. 2005; Li et al. 2006, 2009; Yang et al. 2012; Reddick et al. 2013; Skibba et al. 2014). These measurements, together with various halo-based statistical models of galaxy distributions (see, e.g., Jing 1998; Berlind & Weinberg 2002; Yang et al. 2003; Zheng et al. 2005; Zehavi et al. 2011; Guo et al. 2014a, 2015a, 2016), have led to dramatic improvements in our understanding of the relationship between galaxy properties (e.g., stellar mass, luminosity, and color) and dark matter halos.

Compared to the stellar contents of galaxies, our understanding of their cold gas contents lags considerably behind. The gas distribution around galaxies can be used to probe the accretion, star formation, and feedback processes, providing additional tests to galaxy formation models. For high-redshift galaxies, the gas properties are typically studied through analyzing the hydrogen and metal absorption lines in the spectra of background galaxies and quasars (e.g., Steidel et al. 2010). At low redshifts, the gas distribution of galaxies can be mapped out through the 21 cm H i hyperfine emission line. Large surveys of the cold gas content of galaxies through H i have become available only in the past decade. The H i Parkes All-Sky Survey (HIPASS; Meyer et al. 2004) detected ∼5000 extragalactic H i sources out to $z\sim 0.04$ covering the whole southern sky, while the Arecibo Fast Legacy ALFA Survey (ALFALFA; Giovanelli et al. 2005) detected more than 30,000 extragalactic H i sources out to $z\sim 0.06$ in the northern sky. These surveys have enabled us to estimate the H i mass function of the H i-rich galaxies in the local universe (Zwaan et al. 2005; Martin et al. 2010), providing important constraints on the mass density of the atomic neutral hydrogen. However, these early surveys are limited by the effective frequency ranges and can only map the H i distribution in the local universe. In the future, the Square Kilometre Array (SKA)8 project will provide unprecedented opportunities to probe the H i distribution much deeper into the universe. Indeed, the ongoing Australian SKA Pathfinder (ASKAP)9 survey and MeerKAT10 science projects will provide a large amount of H i data in the forthcoming years. For example, the Wide-field ASKAP L-Band Legacy All-Sky Blind Survey (WALLABY; Koribalski 2012) will detect up to 500,000 H i-selected galaxies to a redshift of $z\sim 0.26$, the Westerbork Northern Sky H i Survey (WNSHS; Duffy et al. 2012b) is targeting at the northern sky complimentarily to WALLABY, the Deep Investigations of Neutral Gas Origins (DINGO; Meyer 2009) survey will study the evolution of the gas-rich galaxies to $z\sim 0.4$, the COSMOS H i Large Extragalactic Survey (CHILES; Fernández et al. 2013, 2016) will probe the H i deep field centered at the COSMOS field to $z\sim 0.45$, and the Looking at the Distant Universe with the MeerKAT Array survey (LADUMA; Holwerda et al. 2012) is designed to detect H i to $z\gt 1$.

Galaxy clustering provides a way to help establish the connection between the H i content of galaxies and dark matter halos. A commonly used statistic for quantifying the galaxy clustering is the two-point correlation function (2PCF), which measures the excess probability of finding pairs of galaxies at certain separations with respect to a random distribution. The 2PCF has been measured for the HIPASS and ALFALFA samples (Basilakos et al. 2007; Meyer et al. 2007; Martin et al. 2012; Papastergis et al. 2013). The H i-selected galaxies are found to have low clustering amplitudes, with a linear galaxy bias, ${b}_{{\rm{g}}}$, varying from 0.7 to 0.9 for different galaxy samples. However, the previous measurements of the clustering dependence on the H i mass are still controversial. Based on the same HIPASS galaxy sample, Basilakos et al. (2007) found a much stronger clustering amplitude for galaxies with a larger H i mass, while Meyer et al. (2007) found only very weak dependence. By using the 40% complete catalog of the ALFALFA survey, Papastergis et al. (2013) claimed to find no strong dependence of the clustering amplitudes of H i-selected galaxies on the H i mass over the range of ${M}_{{\rm{H}}{\rm{I}}}\,={10}^{8.5}\mbox{--}{10}^{10.5}\,{M}_{\odot }$. However, we note that there is a trend of H i mass dependence in their measurements of H i mass bin samples (see their Figure 10), but the authors attributed it to the finite-volume effect in the small ALFALFA sample.

It has been found that the H i mass generally increases with galaxy stellar mass, and that the stellar mass in this H i mass range would vary from 107 to ${10}^{12}\,{M}_{\odot }$ (Bothwell et al. 2009; Zhang et al. 2009, 2012; Evoli et al. 2011; Saintonge et al. 2011; McGaugh 2012; Popping et al. 2015). Given the strong dependence of galaxy clustering on stellar mass as well established in recent studies (see, e.g., Li et al. 2006, 2009; Yang et al. 2012; Reddick et al. 2013; Skibba et al. 2014), the no/weak dependence of clustering on H i mass is puzzling, even if a reasonable scatter between the H i and stellar masses is taken into account to weaken the dependence of clustering on H i mass.

Accurate clustering measurements of the H i-selected galaxies are important in constraining the H i–halo mass relation, as the galaxy bias is directly related to their host halo masses. The H i–halo mass relation has been studied in hydrodynamical simulations (see, e.g., Duffy et al. 2012a; Davé et al. 2013; Cunnama et al. 2014; Vogelsberger et al. 2014; Rafieferantsoa et al. 2015; Crain et al. 2017) and semianalytical models (see, e.g., Blitz & Rosolowsky 2006; Obreschkow et al. 2009; Popping et al. 2009, 2014; Fu et al. 2010, 2012, 2013; Lagos et al. 2011; Kim et al. 2017; Xie et al. 2017; Zoldan et al. 2017). The predicted H i–halo mass relations from the various models are highly dependent on the detailed algorithms and have significant differences. In addition, semiempirical models have been developed to match the observed H i mass functions (see, e.g., Popping et al. 2015; Padmanabhan & Kulkarni 2017; Padmanabhan & Refregier 2017; Padmanabhan et al. 2017). While the H i–halo mass relation may be reliably inferred from the spatial clustering of the observed H i-selected galaxies using halo-based statistical models, it has not been well studied so far. Modeling of the H i–halo mass relation would potentially be able to provide new and interesting constraints on galaxy formation models.

In this paper, we use the 70% complete catalog of the H i-selected galaxies in ALFALFA to examine the dependence of clustering on H i mass, as well as to investigate the statistical relation between H i mass and halo mass. When estimating the 2PCFs, we carefully take into account the effect of the ALFALFA sample selection, which depends on both the flux limit and the H i emission line width, as well as the sample variance effect caused by the limited survey volume. This gives rise to the finding of a significant dependence of clustering on the H i mass. We extend the simple subhalo abundance matching (SHAM; e.g., Conroy et al. 2006) model to interpret the observed clustering of the different H i mass samples. Different from the SHAM model adopted in the literature, which links the stellar mass or luminosity of galaxies with the maximum circular velocity or mass of the host halos, our model additionally takes into account the dependence of clustering on halo formation time. We will show that the inclusion of the halo formation time in the modeling is required by the data, enabling our model to successfully fit the observed H i mass dependence of the clustering and thus constrain the H i–halo mass relation.

The structure of the paper is constructed as follows. In Section 2, we describe the galaxy samples and the simulation used in the modeling. The clustering measurements are displayed in Section 3. We introduce our modeling method in Section 4. The model implications are presented in Section 5. We summarize and discuss our results in Sections 6 and 7, respectively. Throughout the paper, we assume a spatially flat Λ cold dark matter cosmology, with ${{\rm{\Omega }}}_{{\rm{m}}}=0.307$, h = 0.678, ${{\rm{\Omega }}}_{{\rm{b}}}=0.048$, and ${\sigma }_{8}=0.823$, consistent with the constraints from Planck (Planck Collaboration 2014) and with the parameters used in the simulation adopted for our modeling (see Section 4).

2. Data

2.1. The ALFALFA Survey and the H i Galaxy Sample

In this paper, we use the most up-to-date public data release, $\alpha .70$ 11 , of the ALFALFA survey (Giovanelli et al. 2005), which is a catalog of 70% of the final survey area. The blind observation of the 21 cm emission line is performed with the 305 m single-dish radio telescope at the Arecibo Observatory, which is very sensitive in the L band of 1.4 GHz with an angular resolution of 3farcm5. The optical counterparts of these H i sources are identified interactively by cross-matching external imaging databases (see details in Haynes et al. 2011). For our analysis, we only select H i detections with secure extragalactic sources, i.e., "Code 1" in the ALFALFA catalog. An H i mass is calculated for each H i source in the catalog from the integrated flux density of the H i emission line and the galaxy luminosity distance (see Equation (1) of Giovanelli et al. 2005).

Although ALFALFA is a blind survey, the resulting sample of the H i-detected sources is not purely flux limited. The detection rate depends on both the H i line flux, ${S}_{\mathrm{int}}$, and the line profile width, W50, as the detector is more sensitive to narrower line profiles than broader ones for a given ${S}_{\mathrm{int}}$. We apply the completeness cut suggested by Haynes et al. (2011, see their Equations (4) and (5)) to select a sample that is more than 50% complete. Figure 1 displays the angular distribution of the ALFALFA sample, where the right ascension (R.A.) and declination (decl.) are the angular positions of the optical counterparts of the H i sources. For simplicity, we only select galaxies in the ALFALFA survey limited to the R.A. and decl. cuts shown as the red boundaries in the figure.

Figure 1.

Figure 1. Angular distribution of galaxies in the ALFALFA $\alpha .70$ sample, separated into the northern (left) and southern (right) galactic caps. For simplicity, we only select galaxies in the ALFALFA survey limited to the simple R.A. and decl. cuts shown as the red boundaries.

Standard image High-resolution image

The redshift of each source in the ALFALFA catalog corresponds to the heliocentric velocity ${v}_{21}\equiv {{cz}}_{21}$ of the H i 21 cm emission line. We convert z21 to the redshift in the cosmic microwave background frame (Fixsen et al. 1996), ${z}_{\mathrm{CMB}}$, for the clustering measurements. Because of the radio frequency interference (RFI) from the Federal Aviation Administration radar at high heliocentric velocities, we further limit the redshift range to $0.0025\lt {z}_{\mathrm{CMB}}\lt 0.05$. Other sources of RFI still contaminate the signals in regions of frequency space corresponding to spherical shells in the survey volume. The average weights of the lost volume due to RFI, ${w}_{\mathrm{RFI}}$, have been estimated for galaxies with different heliocentric velocities in ALFALFA (see Figure 6 of Martin et al. 2010). We will correct for this effect using the random galaxy catalogs, as will be discussed in Section 3.1.

These restrictions produce a sample of 16,313 H i-detected galaxies in the redshift range of $0.0025\lt z\lt 0.05$ and with reliable estimates of the H i mass ranging from 107.1 to ${10}^{10.9}\,{M}_{\odot }$. The sample covers a total area of 4693 ${\deg }^{2}$ in the sky and an effective comoving volume of $1.52\times {10}^{6}\,{\mathrm{Mpc}}^{3}$.

2.2. H i Mass Samples

In order to study the dependence of clustering on H i mass, we have constructed a set of 13 H i mass samples selected by using different minimum masses, with a fixed mass threshold interval of ${\rm{\Delta }}\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })=0.2$. Table 1 lists the detailed sample information, including the total number of galaxies and the average number density (calculated using Equation (5)) of each sample.

Table 1.  Samples of Different H i Mass Thresholds

Sample ${N}_{\mathrm{gal}}$ ${n}_{{\rm{g}}}({h}^{3}\,{\mathrm{Mpc}}^{-3})$
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 8.0$ 16059 136.49 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 8.2$ 15900 113.52 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 8.4$ 15684 89.86 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 8.6$ 15395 68.97 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 8.8$ 14989 51.95 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 9.0$ 14394 39.05 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 9.2$ 13363 26.98 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 9.4$ 11752 17.50 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 9.6$ 9555 10.51 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 9.8$ 6775 5.84 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 10.0$ 3841 2.68 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 10.2$ 1444 0.92 × 10−3
$\mathrm{log}({M}_{{\rm{H}}{\rm{I}}}/{M}_{\odot })\gt 10.4$ 353 0.22 × 10−3

Note. All the H i mass samples cover the same redshift range of $0.0025\,\lt z\lt 0.05$, and the effective comoving volume is $1.52\times {10}^{6}{({h}^{-1}\mathrm{Mpc})}^{3}$. The total number of galaxies ${N}_{\mathrm{gal}}$ and the mean number density ng are also displayed (see Equation (5)).

Download table as:  ASCIITypeset image

The left panel of Figure 2 shows the distribution of the H i-selected galaxies in the redshift (z)–H i mass (${M}_{{\rm{H}}{\rm{I}}}$) plane. The two blue vertical lines indicate the redshift limits of our total sample. The apparent decrease in the number of galaxies around $z\sim 0.052$ is caused by the aforementioned RFI. The right panel presents the number density distribution n(z) for three typical ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples, with the blue, green, and red histograms for ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}$, 109, and ${10}^{10}\,{M}_{\odot }$, respectively. The bumps and dips indicate the influence of local superclusters. For instance, the bump at $z\sim 0.005$ reflects the significant spatial overdensity of galaxies with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$ in the Virgo Cluster. However, galaxies in the Virgo Cluster are generally H i deficient when compared to field galaxies of the same size and morphological type (see, e.g., Solanes et al. 2001; Gavazzi et al. 2005; Chung et al. 2009). The dip at $z\sim 0.02$ is caused by the fact that only very few galaxies in the Coma Cluster have ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ (Giovanelli & Haynes 1985; Magri et al. 1988; Bravo-Alfaro et al. 2000). These large-scale structures have a significant effect on the H i distribution in the local volume, especially for the low-${M}_{{\rm{H}}{\rm{I}}}$ samples, as we will see below. This is actually one of the main reasons why we opt for H i mass thresholds instead of differential mass bins, as the latter samples will be significantly affected by the limited volumes. H i mass threshold samples have the largest available volume of the ALFALFA survey, minimizing the sample variance effect.

Figure 2.

Figure 2. Left: distribution of the H i-selected galaxies as a function of redshift ${z}_{\mathrm{CMB}}$ and H i mass ${M}_{{\rm{H}}{\rm{I}}}$. The two blue vertical lines show the redshift limits of our sample. Right: number density distribution for three typical ${M}_{{\rm{H}}{\rm{I}}}$ samples, with the blue, green, and red histograms for ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}$, 109, and ${10}^{10}\,{M}_{\odot }$, respectively. The number density distribution for the ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$ sample in the SGC is shown as the magenta histogram.

Standard image High-resolution image

In the figure we additionally show the number density distribution of the ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$ sample, but only for the subset in the southern galactic cap (SGC). Both the peak at ∼0.005 and the dip at ∼0.02 become less prominent in comparison to the full sample at the same mass threshold, indicating that the NGC is indeed overwhelmingly dominated by the superclusters, including Virgo and Coma.

The drop in the number density for each of the ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,M$ and ${10}^{9}\,{M}_{\odot }$ samples toward higher redshift reflects the fact that the sample is flux limited, not volume limited (see the left panel). We note that even though each H i mass threshold sample is not volume limited, the clustering measurements will be made in an effectively volume-limited sense, as detailed in Section 3.1.

2.3. SDSS/DR7 Galaxy Sample

Our analysis also makes use of the SDSS (York et al. 2000) Data Release 7 (DR7; Abazajian et al. 2009) Main galaxy sample, to investigate the correlation between the H i sources and the general population of galaxies. The ALFALFA survey covers a much larger area than the SDSS/DR7 in the SGC, but the two samples overlap significantly in the northern galactic cap (NGC). We take the SDSS/DR7 galaxy data from the New York University Value-Added Galaxy Catalog (NYU-VAGC; Blanton et al. 2005), where the effective area of the sample is about $7300\,{\deg }^{2}$ and the galaxies are selected simply by an r-band Petrosian magnitude limit of $r\sim 17.77$. For each galaxy the k + e corrected luminosities in the SDSS ugriz bands and a stellar mass estimated from the SDSS photometry are also provided in this catalog (Blanton & Roweis 2007).

2.4. The SMDPL Simulation

We use catalogs of dark matter halos and subhalos identified from the Small MultiDark simulation of Planck cosmology (SMDPL;12 Klypin et al. 2016) to model the statistical relation between the H i-selected galaxies and host dark matter halos. The SMDPL simulates the evolution of dark matter distribution in a comoving volume of ${400}^{3}\,{h}^{-3}$ Mpc3 with a mass resolution of $9.6\times {10}^{7}{h}^{-1}\,{M}_{\odot }$, assuming cosmological parameters of ${{\rm{\Omega }}}_{m}=0.307$, ${{\rm{\Omega }}}_{b}=0.048$, h = 0.678, ns = 0.96, and ${\sigma }_{8}\,=0.823$. The dark matter halos and subhalos in SMDPL are identified with the ROCKSTAR phase-space halo finder (Behroozi et al. 2013b), which is shown to be efficient and accurate in finding the bound spherical structures (Onions et al. 2012; Knebe et al. 2013). For the current work we use the simulation output at z = 0, which is close to the average redshift of the ALFALFA survey.

3. Measuring the Clustering and Biasing of H i Mass Selected Samples

3.1. Methodology of Clustering Measurements

As described in the previous section, when selecting the ALFALFA galaxy sample, we have applied the completeness limit, which depends on the H i line profile. Therefore, each galaxy in the catalog has a different flux limit ${S}_{\mathrm{int},\mathrm{lim}}$, depending on the width of its H i line profile. Different from galaxy clustering analyses that use volume-limited samples in SDSS DR7 (see, e.g., Zehavi et al. 2011; Guo et al. 2015b), constructing volume-limited samples for low-${M}_{{\rm{H}}{\rm{I}}}$ galaxies in ALFALFA is impractical owing to the limited redshift range and the W50-dependent flux limit. Instead, we measure the clustering of each sample using galaxies in the whole redshift range and correct for the effect of ${S}_{\mathrm{int},\mathrm{lim}}$ following the method laid out in Xu et al. (2016). Such a method makes use of the maximum volume ${V}_{\max }$ accessible to each galaxy, leading to an effectively volume-limited clustering measurement from a flux-limited sample (Xu et al. 2016).

Given the small volume of the ALFALFA survey, the local large-scale structures can potentially affect the clustering measurements even if we use H i mass threshold samples over a relatively large redshift range. We apply the maximum likelihood method (Zwaan et al. 2005) to calculate the effective volume available to each galaxy ${V}_{\mathrm{eff}}$ as the value of ${V}_{\max }$, which takes into account the survey sensitivity limit and the density fluctuations of galaxies caused by the large-scale structure. The details of the method can be found in Appendix B of Martin et al. 2010 (see also Papastergis et al. 2011). This has proved very robust against density fluctuations along the line of sight (LOS; Jones et al. 2016).

For each of our H i mass threshold samples, we use a generalized Landy–Szalay estimator (Landy & Szalay 1993) to measure the redshift-space 2PCFs $\xi ({r}_{{\rm{p}}},{r}_{\pi })$ (Li et al. 2009; Xu et al. 2016),

Equation (1)

where ${r}_{\pi }$ and ${r}_{{\rm{p}}}$ are the separations of galaxy pairs along and perpendicular to the LOS. The data–data (${\mathrm{DD}}^{* }$), data–random (${\mathrm{DR}}^{* }$), and random–random (${\mathrm{RR}}^{* }$) pair counts are calculated as follows:

Equation (2)

Equation (3)

Equation (4)

where ${V}_{{ij}}=\min ({V}_{\mathrm{eff},i},{V}_{\mathrm{eff},j})$, with ${V}_{\mathrm{eff},i}$ and ${V}_{\mathrm{eff},j}$ being the effective volumes accessible to the ith and jth objects, respectively. Note that in Equations (2)–(4) we only include pairs in which both of the galaxies are within the common volume Vij. As stated in the Appendix of Xu et al. (2016), at small rp separations, such a requirement would possibly exclude a small fraction of correlated pairs stretched along the LOS by the effect of peculiar velocities. By comparing to the case of including all galaxy pairs, we quantify this effect to be much smaller than 2% for ${r}_{{\rm{p}}}\lt 1\,{h}^{-1}\,\mathrm{Mpc}$, which is negligible compared to the measurement errors at these scales. For simplicity, we assume, in the above expressions, that the number of galaxies in the random catalog is the same as that in the data catalog. In practice, the random catalog we construct (see below) for each sample is 50 times as large as the data catalog, and all the pair counts are correctly normalized. The generalized Landy–Szalay estimator weighs each galaxy pair according to the common maximum volume that the pair of galaxies can both be observed, i.e., the contribution of each pair to the 2PCF measurement is accounted for in a volume-limited sense. In the end, we achieve an effectively volume-limited 2PCF measurement by making full use of a flux-limited sample (Xu et al. 2016). Compared to a traditional volume-limited sample, the sample variance is reduced in our approach.

The average sample number density ${n}_{{\rm{g}}}$ is estimated as

Equation (5)

This number density is substantially larger than that computed using the number of galaxies listed in Table 1 and the sample comoving volume ($1.55\times {10}^{6}\,{h}^{-3}\,{\mathrm{Mpc}}^{3}$), as the sample is essentially flux limited, which leads to larger incompleteness at larger redshifts.

The random samples are constructed in the following way. First, we generate a large set of random points uniformly distributed in the R.A. and $\sin (\mathrm{decl}.)$ plane within the survey area. The discontinuity in the angular distribution of the α.70 sample (especially the SGC) is taken into account in the random sample with the same angular geometry. We then assign each random point a set of properties (redshift and H i mass, as well as the corresponding ${V}_{\mathrm{eff}}$), which are the same as those of a real galaxy randomly drawn from the full sample of H i galaxies. This method has been verified to be reliable and accurate for wide-angle surveys such as SDSS and ALFALFA (see, e.g., Li et al. 2006; Ross et al. 2012). As the redshifts come directly from the full galaxy sample, the RFI effects seen in the data are automatically built into the random catalog. Next, for each H i mass threshold galaxy sample, we construct the corresponding random sample by selecting from the full random catalog the points with H i masses above the threshold.

To reduce the effect of redshift-space distortion (RSD) caused by galaxy peculiar velocities, we focus on the measurements of the projected 2PCF ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ (Davis & Peebles 1983), defined as

Equation (6)

In practice, the integration runs to a limited LOS separation ${r}_{\pi ,\max }$ instead of infinity in order to reduce the noise at very large separations. To select a reasonable ${r}_{\pi ,\max }$, we have investigated the variation of $\xi ({r}_{{\rm{p}}},{r}_{\pi })$ with ${r}_{\pi }$ for different H i mass threshold samples and found that the value of $\xi ({r}_{{\rm{p}}},{r}_{\pi })$ is close to zero and becomes noisy for most of the samples when ${r}_{\pi }$ is larger than $20\,{h}^{-1}\,\mathrm{Mpc}$. Therefore, to maximize the signal-to-noise ratio (S/N) of the ${w}_{{\rm{p}}}$ measurement, we adopt ${r}_{\pi ,\max }\,=20\,{h}^{-1}\,\mathrm{Mpc}$, and the same value is used in our models. We adopt logarithmic ${r}_{{\rm{p}}}$ bins of a constant width ${\rm{\Delta }}\mathrm{log}{r}_{{\rm{p}}}=0.2$ covering the range from 0.13 to $20.48\,{h}^{-1}\,\mathrm{Mpc}$, and linear ${r}_{\pi }$ bins of width ${\rm{\Delta }}{r}_{\pi }=2\,{h}^{-1}\,\mathrm{Mpc}$ from 0 to $20\,{h}^{-1}\,\mathrm{Mpc}$. The error covariance matrices for ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ are estimated using the jackknife resampling technique with 121 subsamples (Zehavi et al. 2011; Guo et al. 2015b).

3.2. Dependence of Clustering on H i Mass

Figure 3 shows the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements (symbols with error bars) for 12 of the 13 H i mass threshold samples. For comparison, the measurement of the sample with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ is repeated in every panel as a blue solid line. The black line is the predicted ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ for the underlying dark matter distribution at z = 0 assuming the same cosmology. Compared to the dark matter distribution, all the galaxy samples except the one with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10.2}\,{M}_{\odot }$ are significantly anti-biased, showing lower clustering amplitudes at all scales probed. This result is in agreement with previous studies of the clustering of H i-rich galaxies (Basilakos et al. 2007; Meyer et al. 2007; Li et al. 2012; Martin et al. 2012; Papastergis et al. 2013).

Figure 3.

Figure 3. Projected two-point correlation function for 12 of the 13 H i mass threshold samples. In each panel, the symbols with error bars present ${w}_{{\rm{p}}}$ as a function of ${r}_{{\rm{p}}}$ and its error for a given H i mass threshold sample. For comparison, the measurement of the sample with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ is repeated in every panel as a blue solid line. We also show the predicted dark matter distribution as the black line.

Standard image High-resolution image

In contrast to previous results of no dependence of clustering on H i mass (e.g., Meyer et al. 2007; Papastergis et al. 2013), we find the clustering amplitude of the H i-selected galaxies to depend on the H i mass, in different ways at different scales, when the mass threshold exceeds ${10}^{9}\,{M}_{\odot }$. On scales larger than a few megaparsecs, the clustering amplitude at a given scale shows no/little change as the H i mass threshold goes from the lowest value of ${10}^{8}\,{M}_{\odot }$ up to ${10}^{9}\,{M}_{\odot }$, before increasing significantly at higher masses. In the figure, the amplitude of ${w}_{{\rm{p}}}$ at $\sim 10\,{h}^{-1}\,\mathrm{Mpc}$ for the most massive sample (with ${M}_{{\rm{H}}{\rm{I}}}\,\gt {10}^{10.2}\,{M}_{\odot }$) is a factor of ∼4 higher than that of the sample with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ and is comparable to that of the dark matter. It is interesting to note that the slope of the correlation function also presents a systematic trend with H i mass when the threshold exceeds ${10}^{9}\,{M}_{\odot }$, with a flatter shape at higher masses. This effect makes the mass dependence more pronounced at larger scales. For instance, at ${r}_{{\rm{p}}}\sim 10\,{h}^{-1}\,\mathrm{Mpc}$ the increasing amplitude is observed for all the samples with mass threshold above ${10}^{9}\,{M}_{\odot }$, but at ${r}_{{\rm{p}}}\sim 1\,{h}^{-1}\,\mathrm{Mpc}$ the mass dependence becomes obvious only when ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9.8}\,{M}_{\odot }$ or higher.

On scales smaller than a few megaparsecs, the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurement shows no significant dependence on H i mass, except for intermediate- to high-mass samples with thresholds above ${10}^{9.8}\,{M}_{\odot }$. We note that the clustering amplitudes for samples with the H i mass threshold below ${10}^{9}\,{M}_{\odot }$ appear to be enhanced on intermediate scales, from a few ×102 kpc to a few $\,{h}^{-1}\,\mathrm{Mpc}$, when compared to the samples at higher masses. These differences are significant, but they are likely produced by the strong sample variance effect due to the small volume of the samples. Although our method of weighing pairs by $1/{V}_{\max }$ reduces the sample variance effect in comparison to the measurements with the traditional volume-limited sample, the effect may still be non-negligible, as the number of galaxies with ${10}^{8}\,{M}_{\odot }\lt {M}_{{\rm{H}}{\rm{I}}}\lt {10}^{9}\,{M}_{\odot }$ is only 1665 (see Table 1) and most of these galaxies are located at $z\lt 0.015$, where the local superclusters could dominate the clustering power (Figure 2). We will discuss the effect of the limited volume in more detail in the next subsection.

3.3. Robustness of the Clustering Measurements

Sample variance is the dominant source of uncertainty in clustering measurements when the survey volume is small. This is indeed the case for our low-mass samples, as indicated by the large error bars of the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements for the samples with H i mass thresholds below $\sim {10}^{9}\,{M}_{\odot }$ (see Figure 3). Our measurements on large scales and at high masses may also be affected to a certain extent by the sample variance. In this subsection, we perform four analyses to test the robustness of both the clustering measurements and the error estimates. The first three analyses are based on real samples from the ALFALFA and SDSS, while the fourth analysis uses a large set of mock catalogs constructed from a cosmological simulation. With these analyses we aim to better understand the reason behind the intermediate-scale enhancement in the correlation function of low-mass samples, as well as to further verify the mass dependence of the correlation function at ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$.

3.3.1. Finite-volume Effect

We note that Papastergis et al. (2013) found a trend of the clustering dependence on the H i mass by comparing the projected 2PCFs between two samples with the H i mass in the ranges of ${10}^{8.5}$${10}^{9.5}\,{M}_{\odot }$ and ${10}^{9.5}\mbox{--}{10}^{10}\,{M}_{\odot }$ (their Figure 10). But when limiting the higher H i mass sample to the same volume as the lower one, the trend disappears, which they attribute to the finite-volume effect. With the α.70 sample that is about 2.8 times larger than the volume in Papastergis et al. (2013), we perform the same test with the same H i mass ranges to verify our results, which is shown in the left panel of Figure 4. We compare the measurements of the two H i mass samples in a smaller volume of $0.0025\lt z\lt 0.03$ (blue and red circles). The measurement for the higher H i mass sample in $0.0025\lt z\lt 0.05$ is also shown for comparison.

Figure 4.

Figure 4. Left: comparisons of the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements with the H i mass in the ranges of ${10}^{8.5}$${10}^{9.5}\,{M}_{\odot }$ (blue circles) and ${10}^{9.5}$${10}^{10}\,{M}_{\odot }$ (red circles) in a smaller volume of $0.0025\lt z\lt 0.03$. We also show the measurements of the sample ${10}^{9.5}$${10}^{10}\,{M}_{\odot }$ in the redshift range of $0.0025\lt z\lt 0.05$ for comparison (black circles). Right: comparisons of the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements by applying weights according to ${V}_{\max }$ and ${V}_{\mathrm{eff}}$, respectively (see the text).

Standard image High-resolution image

There are two important features in the figure. First, in this smaller volume, we also find a clear trend of H i mass dependence on large scales between the low- and high-mass samples, consistent with Figure 3. Second, the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements for the high-mass sample are not significantly affected by the finite-volume effect. From the redshift–H i mass distribution in Figure 2, the high-mass sample in $0.0025\lt z\lt 0.03$ is almost volume limited, if not considering the dependence on W50. The consistency between the measurements of the high-mass sample in different volumes further verifies our method of using ${V}_{\mathrm{eff}}$ to correct for the sample selection effect. Therefore, the detection of H i mass dependence of the clustering can be attributed to both the larger volume of our sample and the correction for the flux limit in the measurements.

In this paper, we use the effective volume ${V}_{\mathrm{eff}}$ determined from the 2D stepwise maximum likelihood method of Zwaan et al. (2005), rather than the normal maximum volume ${V}_{\max }$ from the survey completeness limit. The use of ${V}_{\mathrm{eff}}$ helps obtain more accurate H i mass number density, which is necessary for the halo modeling. We show in the right panel of Figure 4 the measurements of ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ for three typical H i mass samples using the weights of ${V}_{\mathrm{eff}}$ (circles with error bars) and ${V}_{\max }$ (lines). Except for the slight difference in the sample of ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$, which is within the measurement errors, there is no significant difference in using the two volume weights for higher H i mass samples.

3.3.2. NGC versus SGC

Another straightforward way to investigate the effect of sample variance and the validity of our results is to compare the clustering measurements from the data in the NGC and the SGC. The Virgo and Coma superclusters are both located in the northern galactic hemisphere, while the Perseus–Pisces supercluster, which dominates the southern hemisphere, is mostly outside the survey area of the α.70 sample. The results are shown in Figure 5, where we plot the measurements of ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ for the same H i mass thresholds, but separately for the NGC (red filled circles) and SGC (blue filled circles) data. The measurements with the full data for each mass threshold sample and the projected 2PCF of dark matter are also shown as green and black solid lines, respectively, for comparison.

Figure 5.

Figure 5. Projected 2PCF measurements for galaxies in the NGC (red filled circles) and SGC (blue filled circles). The different panels show the measurements from different H i mass threshold samples. The green line in each panel displays the results from the combined sample of the NGC and SGC, while the black line shows the projected 2PCF of dark matter as in Figure 3.

Standard image High-resolution image

The ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements with the NGC and SGC data generally agree with each other within $1\sigma $ errors for high-mass samples with H i mass thresholds above ${10}^{9}\,{M}_{\odot }$, indicating that the local superclusters are no longer a dominant source of uncertainty when H i mass exceeds this threshold. When galaxies with ${M}_{{\rm{H}}{\rm{I}}}\lt {10}^{9}\,{M}_{\odot }$ are included, however, we find that the NGC samples have stronger clustering at intermediate scales than the corresponding SGC samples, and the effect is stronger for lower masses. At the lowest mass with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$, the ${w}_{{\rm{p}}}$ at ${r}_{{\rm{p}}}\sim 2\,{h}^{-1}\,\mathrm{Mpc}$ obtained from the NGC sample is about four times that obtained from the SGC sample. The figure also reveals that the NGC sample dominates the contribution to the measurements of the full sample at each given mass threshold. If only the SGC is considered for all masses, one would observe very weak or no mass dependence of the clustering for mass thresholds below $\sim {10}^{9}\,{M}_{\odot }$, instead of the declining trend seen in the total and NGC samples.

This analysis confirms our conjecture that the enhancement of the clustering at intermediate scales as seen in Figure 3 is contributed mostly (if not completely) by the galaxies in the NGC, where the several known superclusters dominate the clustering of the low-mass samples. Although the NGC sample covers a volume that is about 2.7 times larger than the SGC sample, the ${w}_{{\rm{p}}}$ errors of the former at low H i mass are comparable to, or even larger than, the latter. This again reflects the dominating role of the local superclusters in the clustering of the NGC galaxies. However, as seen from the redshift distribution of galaxies in the SGC (magenta histogram) in Figure 2, there are also clear imprints of voids and clusters, but not as significant as in the NGC. These voids could potentially lower the measured clustering amplitudes. Since the effect of the local large-scale structures is already taken into account in the ${V}_{\mathrm{eff}}$ weight, we expect the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements from the SGC to be more reliable for the low H i mass samples.

3.3.3. Clustering Relative to an SDSS Reference Sample

So far our results of H i-mass-dependent clustering are presented with the autocorrelation functions, which suffer from sample variance effects given the limited survey volume. The other way to study the H i mass dependence of the clustering with an alleviated sample variance effect is to cross-correlate each H i sample (target sample) with a reference sample. The reference samples for all target samples are selected so that galaxies in them are controlled to have the same distribution in intrinsic properties, but galaxies in each reference sample are weighted to give a similar redshift distribution to the corresponding target sample. We can derive the ratio of bias factors of the target sample and the corresponding reference sample from the target–reference and reference–reference correlation functions measured within the ALFALFA footprint. This ratio is largely free of sample variance. To obtain the bias factor of each target sample from this ratio, we can use the bias factor inferred from the reference–reference correlation function measured in a much larger volume (within the SDSS footprint; see below). In this step, the sample variance from the clustering of the reference sample still exists, but with a smaller magnitude as a result of the larger volume. Overall, with the above procedure, we can achieve measurements of bias factors for the target H i galaxy samples with a smaller sample variance effect.

We apply the cross-correlation technique as follows. First, we construct a volume-limited tracer sample in the SDSS Main galaxy sample, which consists of 27,411 galaxies with r-band absolute magnitudes $-19.5\lt {M}_{r}\lt -18.5$ and redshifts $0.0025\lt z\lt 0.05$. Next, a reference sample is selected from the above tracer sample by applying the geometry of the ALFALFA survey. For a given H i-selected sample, we assign a weight of ${n}_{{\rm{H}}{\rm{I}}}(z)/{n}_{\mathrm{reference}}(z)$ to each galaxy in the reference sample to ensure the same redshift distribution as in the H i sample. Finally, the counterpart of each reference sample in the full SDSS footprint is also similarly constructed. As the SDSS Main sample covers only three narrow stripes in the SGC with a relatively small sky coverage, we concentrate on the NGC for both the ALFALFA target and SDSS reference samples when measuring the cross-correlation functions.

The cross-correlation function between a given H i-selected sample and the corresponding reference sample is estimated using the extended Landy–Szalay estimator as in Zehavi et al. (2002) and Guo et al. (2012),

Equation (7)

where the subscripts "1" and "2" stand for the H i-selected sample and the reference sample, respectively, and ${\rm{D}}$ and ${\rm{R}}$ stand for real and random samples, respectively. The $1/{V}_{\max }$ weight is similarly applied as in Equations (2)–(4), and the galaxy weights in the reference sample are accounted for in doing the pair count. Similarly, we measure the 2PCF of the reference sample. For both the target–reference and reference–reference correlation functions, we integrate $\xi ({r}_{{\rm{p}}},{r}_{\pi })$ along the LOS up to ${r}_{\pi ,\max }=20\,{h}^{-1}\,\mathrm{Mpc}$ to obtain the projected 2PCFs. The relative bias, $b/{b}_{\mathrm{REF}}$, between a given H i sample and the reference sample is expected to follow

Equation (8)

where ${w}_{{\rm{p}},{\rm{H}}{\rm{I}}\times \mathrm{REF}}$ and ${w}_{{\rm{p}},\mathrm{REF}\times \mathrm{REF}}$ denote the projected target–reference and reference–reference two-point correlation functions, respectively. To determine the relative bias from the data, we use the ${w}_{{\rm{p}},{\rm{H}}{\rm{I}}\times \mathrm{REF}}$ and ${w}_{{\rm{p}},\mathrm{REF}\times \mathrm{REF}}$ measurements at ${r}_{{\rm{p}}}\gt 5\,{h}^{-1}\,\mathrm{Mpc}$ and find the best-fitting value of $b/{b}_{\mathrm{REF}}$ by maximizing the likelihood function,

Equation (9)

where

Equation (10)

The quantity with (without) a superscript "*" is the one from the model of Equation (8) (data). The covariance matrix depends on $q=b/{b}_{\mathrm{REF}}$ as

Equation (11)

where ${{\boldsymbol{C}}}_{{\rm{H}}{\rm{I}}\times \mathrm{REF}}$ is the covariance matrix of ${w}_{{\rm{p}},{\rm{H}}{\rm{I}}\times \mathrm{REF}}$, ${{\boldsymbol{C}}}_{\mathrm{REF}\times \mathrm{REF}}$ that of ${w}_{{\rm{p}},\mathrm{REF}\times \mathrm{REF}}$, and ${{\boldsymbol{C}}}_{{\rm{H}}{\rm{I}},{REF}}$ the covariance between ${w}_{{\rm{p}},{\rm{H}}{\rm{I}}\times \mathrm{REF}}$ and ${w}_{{\rm{p}},\mathrm{REF}\times \mathrm{REF}}$.

We also measure the projected 2PCF of the counterpart of each reference sample constructed within the SDSS footprint and infer ${b}_{\mathrm{REF}}$ by comparing to the projected 2PCF of matter. With the ratio $b/{b}_{\mathrm{REF}}$ and ${b}_{\mathrm{REF}}$, we end up with an inference of the bias factor b for each H i galaxy sample. The results are shown as squares in the left panel of Figure 8 below. The mass dependence is clear for galaxies with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$, while at lower H i mass the dependence disappears but with large error bars. The result will be discussed later with bias factors inferred from other methods.

3.3.4. Comparison with Mock Catalogs

The effect of sample variance can be evaluated accurately using N-body simulations with constrained initial conditions to simulate the dark matter distribution in the local universe as proposed in Zavala et al. (2009). For the current work we have constructed a set of 64 mock galaxy catalogs that have the same geometry as the ALFALFA sample (see detailed description in the next section). We do not attempt to obtain and apply constrained initial conditions to our simulation, which needs substantially more work and is not necessary for our purpose. In order to test whether the apparent clustering dependence on the H i mass at the high-mass end could be weakened by potential underestimates of the measurement errors from the jackknife resampling, we select 64 different subvolumes of the SMDPL simulation box and randomly place a virtual observer in each of them. We construct a mock catalog from each subvolume by mimicking the geometry of the ALFALFA survey, and we measure ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ in the same way as above. The correlation function errors caused by sample variance are then estimated from the variations among these mock catalogs. Figure 6 compares the average fractional diagonal errors of the projected 2PCFs as estimated from the jackknife resampling method (red circles) and those from the mock catalogs (blue squares) for the ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10}\,{M}_{\odot }$ sample. The estimates from the two methods agree well with each other at all scales except the largest scale probed (${r}_{{\rm{p}}}\sim 20\,{h}^{-1}\,\mathrm{Mpc}$), where the jackknife error is larger than the mock error by about 40%. We also find the same result when using other H i mass samples. This is in good agreement with previous studies (see Appendix B of Guo et al. 2013, for more details). This demonstrates that the jackknife errors are accurate on most scales probed here, from $\sim 20\,{h}^{-1}\,\mathrm{kpc}$ up to about $20\,{h}^{-1}\,\mathrm{Mpc}$, but more conservative at larger scales.

Figure 6.

Figure 6. Comparison between the average fractional errors of the projected 2PCFs using the jackknife resampling method (red circles) and those estimated from mock galaxy catalogs (blue squares) constructed from subvolumes of a large simulation box for ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10}\,{M}_{\odot }$.

Standard image High-resolution image

The finite sample volume may cause not only the variance from measurement to measurement but also a systematic underestimate of the 2PCF by a constant value, known as the "integral constraint." We use the mock catalogs constructed above to quantify this effect, by comparing the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ averaged from different mocks with the one measured from the whole simulation box. The difference between the two ${w}_{{\rm{p}}}$ measurements on large scales is a measure of the integral constraint, which is $\sim 0.6\,{h}^{-1}\,\mathrm{Mpc}$ for the sample with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$. The difference increases to $\sim 1.5\,{h}^{-1}\,\mathrm{Mpc}$ if only the SGC is considered, explaining the slightly larger clustering amplitudes from the full-area sample than from the NGC alone, as shown in Figure 5. For samples with lower threshold masses, the effective volume drops and the bias factor decreases, which have opposite effects on the magnitude of the integral constraint. We find that the integral constraint shows a mild increase toward low-mass samples, and the data point at $\sim 8\,{h}^{-1}\,\mathrm{Mpc}$ in each top panel of Figure 3 is the only one substantially affected by the integral constraint (the correction is estimated to be at the level of ${\rm{\Delta }}{w}_{{\rm{p}}}=2.5\,{h}^{-1}\,\mathrm{Mpc}$ for the low H i mass samples from mock tests described in the next section). The differences seen between NGC and SGC measurements (Figure 5) are mainly caused by sample variance. Therefore, the H i mass dependence of the clustering on large scales and at high masses as observed above is robust.

3.4. Redshift-space Monopole Moments and the Biasing of the H i-selected Galaxies

In addition to the measurements of projected 2PCFs as presented above, we have also measured the monopole moment ${\xi }_{0}(s)$ of the redshift-space 3D 2PCF for our H i-selected samples, from which we aim to estimate the galaxy bias factor as a function of H i mass. In principle, ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ is preferred over ${\xi }_{0}(s)$ for the purpose of measuring galaxy bias, as the RSD is expected to be minimized in the former. In our case, however, we have adopted a relatively small integration depth with ${r}_{\pi ,\max }=20\,{h}^{-1}\,\mathrm{Mpc}$ in order to achieve the best S/N. This may result in a significant residual RSD effect on our ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements at large scales, leading to biased determination of the galaxy bias factors. Therefore, for the purpose of measuring the galaxy bias factors we opt for ${\xi }_{0}(s)$, which is not affected by the choice of ${r}_{\pi ,\max }$.

Following common practice, we first estimate the redshift-space 3D 2PCF in the form of $\xi (s,\mu )$ for each of our H i mass threshold samples, where ${\boldsymbol{s}}$ is the 3D separation in redshift space, with the amplitude defined by ${s}^{2}={r}_{{\rm{p}}}^{2}+{r}_{\pi }^{2}$, and μ is the cosine of the angle between ${\boldsymbol{s}}$ and the LOS. The redshift-space monopole moment is then obtained by integrating $\xi (s,\mu )$ over the full range of μ,

Equation (12)

We adopt linear bins for μ ranging from 0 to 1 with a constant interval of ${\rm{\Delta }}\mu =0.05$. On linear scales, the redshift-space monopole moment is related to the real-space 2PCF $\xi (r)$ through the following equation (Kaiser 1987; Hamilton 1992):

Equation (13)

where $\beta =f/{b}_{g}$ and ${\xi }_{\mathrm{DM}}$ is the real-space 2PCF of dark matter. The linear growth rate $f\simeq {{\rm{\Omega }}}_{{\rm{m}}}^{0.55}$ at z = 0 for our adopted cosmology. In practice, we use the measurements of ${\xi }_{0}(s)$ at $s\gt 5\,{h}^{-1}\,\mathrm{Mpc}$ to determine the value of $A\equiv {b}_{{\rm{g}}}^{2}(1\,+2\beta /3+{\beta }^{2}/5)$ by minimizing the ${\chi }^{2}$,

Equation (14)

where ${\boldsymbol{C}}$ is the full error covariance matrix of large-scale measurements of ${\xi }_{0}(s)$. The errors on ${b}_{{\rm{g}}}$ are estimated from the probability distribution with ${\rm{\Delta }}{\chi }^{2}=1$. In order to reliably measure the galaxy bias, we have also corrected for the integral constraint effect by using the mock galaxy catalogs (see details in Section 4.4).

Figure 7 displays the measurements of ${\xi }_{0}(s)$ for the different ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples. The corresponding monopole for dark matter distribution is plotted as a solid black line and repeated in every panel for comparison. For samples with mass thresholds below ${10}^{9}\,{M}_{\odot }$, we additionally show the result from the SGC as blue circles. At a given scale the monopole moment increases with increasing mass threshold, and the effect is similarly seen at all scales. This result is broadly consistent with the H i mass dependence of the projected 2PCFs, demonstrating that the mass dependence observed from the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements is not an artifact of the residual RSD effect.

Figure 7.

Figure 7. Similar to Figure 3, but for the redshift-space monopole moment ${\xi }_{0}(s)$. The filled red circles represent the measurements of ${\xi }_{0}(s)$ for the different ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples using the combined sample of the NGC and SGC, while the open blue circles for the low H i mass samples are measured from the SGC. The black solid line in each panel is the corresponding monopole moment predicted for the linear fluctuation field of dark matter (i.e., with a bias factor of b = 1).

Standard image High-resolution image

It is interesting that the SGC and the full area agree very well in the monopole moment measurement at all scales except the largest scales, where the ${\xi }_{0}(s)$ from the SGC becomes more noisy and drops quickly as a result of the limited sample volume. This finding suggests that, for small-volume samples like our H i-selected galaxy samples, the monopole moment is less affected by the sample variance when compared to the projected 2PCF. As discussed above, the sample variance effect in ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$, manifested as the enhancement at intermediate scales in Figure 5, is dominated by the Virgo supercluster in the NGC, which effectively boosts the number of galaxy–galaxy pairs, but with a broadened distribution of the LOS pair separation. It thus enhances the LOS integration of $\xi ({r}_{{\rm{p}}},{r}_{\pi })$, leading to higher ${w}_{{\rm{p}}}$ even at small ${r}_{{\rm{p}}}$. In the case of the monopole moment, the galaxy pairs in Virgo with large LOS separations contribute exclusively to the large-scale measurement, with little effect on the intermediate to small scales. The monopole moments thus provide an additional robustness test on our finding of the H i mass dependence of galaxy clustering, which was obtained above from the measurements of ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$.

The galaxy bias factor ${b}_{{\rm{g}}}(\gt {M}_{{\rm{H}}{\rm{I}}})$ inferred from Equation (13) is plotted as circles in the left panel of Figure 8 as a function of H i mass threshold. The bias factor increases nearly linearly with increasing ${M}_{{\rm{H}}{\rm{I}}}$ threshold, ranging from ∼0.5 at the lowest mass ($\sim {10}^{8.2}\,{M}_{\odot }$) up to ∼0.9 at the highest mass ($\sim {10}^{10.2}\,{M}_{\odot }$). Also shown in the panel are the bias factors estimated based on the galaxy-to-matter ${w}_{{\rm{p}}}$ ratio (triangles) and the cross-correlation technique (squares). The solid curve corresponds to the best-fitting theoretical model (Section 4.4). The ones from cross-correlation and the best-fitting model are lower than other estimates above ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$. At the low-mass end, the large error bars prevent us from seeing a clear trend. Overall the different inferences appear to be consistent, given the error bars.

Figure 8.

Figure 8. Left: galaxy bias factor ${b}_{{\rm{g}}}(\gt {M}_{{\rm{H}}{\rm{I}}})$ inferred from different methods for different ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples. The methods include galaxy-to-matter ${w}_{{\rm{p}}}$ ratio (triangles), cross-correlation with an SDSS reference sample (squares), redshift-space monopole moment (Equation (13)) (circles), and results from an extended SHAM model fit (solid curve). See the text for details. The red (black) horizontal dotted line represents the minimum halo bias factor from the mass (${V}_{\mathrm{peak}}$) dependent dark matter halo clustering. Right: halo bias factor ${b}_{{\rm{h}}}$ as a function of halo mass (dotted curve; Tinker et al. 2005) or ${V}_{\mathrm{peak}}$ (solid curve).

Standard image High-resolution image

For comparison, we show in the right panel of Figure 8 the bias values ${b}_{{\rm{h}}}$ for dark matter halos. The dotted curve shows ${b}_{{\rm{h}}}$ as a function of halo mass from the fitting formula of Tinker et al. (2005), and the solid curve shows ${b}_{{\rm{h}}}$ as a function of ${V}_{\mathrm{peak}}$ from the SMDPL simulation. The halo-mass-dependent bias factor approaches a plateau of ∼0.63 at low mass, and the ${V}_{\mathrm{peak}}$-dependent bias factor reaches a minimum of ∼0.75 around tens of kilometers per second. These two bias values are marked with the red and black dotted lines in the left panel. The galaxy samples with low H i masses show a hint of bias factors lower than expected from low-mass or low-${V}_{\mathrm{peak}}$ halos. It indicates that it would be hard to interpret the clustering of H i galaxies with simple halo-based models. If the result persists with better data from future surveys, it would be remarkable. In Section 4, we will discuss the implications.

3.5. Section Summary

Before going to the next section for theoretical modeling, we briefly summarize our observational results. We have estimated the projected 2PCF ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$, the projected cross-correlation with a reference sample, and the monopole moment ${\xi }_{0}(s)$ of the redshift-space 2PCF, for the 13 ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples. The three types of measurements can all be used to estimate galaxy bias factors. While they suffer from different systematics (e.g., residual RSD effect with the projected correlation functions and inaccuracy of the Kaiser formula in the weakly nonlinear regime with the monopole method), the inferred galaxy bias factors are in broad agreement. We find that the galaxy bias factor increases nearly linearly with increasing threshold ${M}_{{\rm{H}}{\rm{I}}}$. There is also a hint that galaxy bias factors for low H i mass samples are lower than expected from low-mass (or low-${V}_{\mathrm{peak}}$) halos. In the next section, we perform modeling based on the ${w}_{{\rm{p}}}$ measurements. At low H i masses those measurements are affected by sample variance, but as we have demonstrated, the effect is properly included in the error estimation. This results in relatively large errors in ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements for low-mass samples, limiting their constraining power on the theoretical models. For this reason one may want to drop the low-mass samples or the galaxies in the NGC when doing the modeling. However, as pointed out by Norberg et al. (2011), any non-Bayesian massaging of the data is not recommended in clustering analyses, unless one knows the true answer. In fact, as we will show, using only the SGC data would not significantly change our model parameters.

4. Modeling the Clustering of H i-selected Galaxies

With the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements of the ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples, we perform theoretical modeling to investigate the connection between H i galaxies and dark matter halos. We first discuss the difficulty with the commonly adopted form of the HOD model to explain the measurements, and we show that the SHAM model in its simplest form is unable to reproduce the measurements, either. We then extend the SHAM model by including additional parameters related to the assembly history of halos/subhalos. Finally, we present the modeling results based on an extended SHAM model that incorporates halo/subhalo formation time.

4.1. Modeling with the HOD Model

Statistical models of galaxy distribution have been developed to link galaxies of different properties (e.g., luminosity, color, stellar mass, and star formation rate [SFR]) to dark matter halos. These include the halo occupation distribution model (HOD; e.g., Jing 1998; Peacock & Smith 2000; Seljak 2000; Scoccimarro et al. 2001; Berlind & Weinberg 2002; Zheng et al. 2005, 2009; Geach et al. 2012; Leauthaud et al. 2012; Guo et al. 2014b, 2015b; Skibba et al. 2015; Zu & Mandelbaum 2015), the conditional luminosity function model (CLF; e.g., Yang et al. 2003, 2004, 2012; van den Bosch et al. 2013), the SHAM model (e.g., Kravtsov et al. 2004; Vale & Ostriker 2004; Conroy et al. 2006; Shankar et al. 2006; Vale & Ostriker 2006; Wang et al. 2007b; Behroozi et al. 2010; Guo et al. 2010; Moster et al. 2010; Nuza et al. 2013; Rodríguez-Puebla et al. 2013; Sawala et al. 2015), and the halo-based empirical model (e.g., Lu et al. 2014, 2015).

We first make an attempt to model the measurements with the HOD model. For a threshold galaxy sample, the central galaxy occupation is commonly parameterized as (Zheng et al. 2005, 2007)

Equation (15)

where $\mathrm{erf}$ is the error function, ${M}_{\min }$ is the cutoff halo mass of central galaxies at which the occupation number $\langle {N}_{\mathrm{cen}}({M}_{\min })\rangle =0.5$, and ${\sigma }_{\mathrm{log}M}$ characterizes the width of the cutoff profile. The mean occupation function of satellites follows a power law modified by a low-mass cutoff (Zheng et al. 2005, 2007),

Equation (16)

and the number of satellites at fixed halo mass is assumed to follow a Poisson distribution. This five-parameter (${M}_{\min }$, ${\sigma }_{\mathrm{log}M}$, M0, ${M}_{1}^{{\prime} }$, and α) model works well for luminosity-threshold or stellar-mass-threshold samples (e.g., Zehavi et al. 2011).

When we apply the above HOD form to model the ${w}_{{\rm{p}}}$ measurements of the H i galaxies, we find that we could not achieve good fits, with the model ${w}_{{\rm{p}}}$ having too high amplitudes. The main reason lies in the tension between galaxy number density and clustering amplitude (bias factor). For each galaxy sample, the number density of galaxies roughly determines a halo mass threshold, and halos above such a threshold appear to have higher clustering amplitude than galaxies.

One way to reduce the tension is to introduce a duty cycle parameter ${f}_{\mathrm{dc}}$ (i.e., by multiplying this parameter by the above mean occupation function), which can account for the fact that only a fraction of halos at fixed mass can host the observed H i galaxies. With the duty cycle parameter, we can populate H i galaxies into lower-mass halos (with lower bias factor) while still matching the galaxy number density. Indeed, we are able to obtain reasonable fits. However, the fitting results for low-${M}_{{\rm{H}}{\rm{I}}}$ samples appear to be unphysical. For example, the value of ${M}_{\min }$ for the ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$ sample is $1.3\times {10}^{9}\,{M}_{\odot }$. That is, about half of the baryons associated with halos of ${M}_{\min }$, in the sense of applying a global baryon fraction, are in the form of H i gas. For the ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ sample, the result is even more absurd, with ${M}_{\min }=1.2\times {10}^{9}\,{M}_{\odot }$ (i.e., an ${M}_{\min }$ halo would be almost wholly made of H i gas).

Given the difficulty in interpreting the clustering measurements of H i galaxies with a simple HOD parameterization, we turn to the SHAM model.

4.2. Modeling with the SHAM Model

With the SHAM model, we adopt the simplest form as our starting point. The SHAM model assumes that the stellar mass (or luminosity) of a galaxy is an increasing function of the maximum mass or circular velocity ever attained by its halo and derives the relation between stellar mass and halo mass by matching the abundance of galaxies above a given stellar-mass threshold with that of halos/subhalos above a threshold in maximum halo mass or circular velocity. The result agrees well with the stellar–halo mass relation established for central galaxies in groups/clusters in the low-redshift universe (see, e.g., Conroy et al. 2009; Behroozi et al. 2010; Guo et al. 2010; Reddick et al. 2013). Recently the subhalo age distribution matching (SADM) has been proposed as an extension to SHAM to further model galaxy color at fixed stellar mass, assuming that redder galaxies are hosted by halos/subhalos that are formed earlier (Hearin & Watson 2013). These simple models can reasonably reproduce the dependence of clustering on stellar mass (or luminosity) and color (see, e.g., Figure 3 of Hearin & Watson 2013), suggesting a possible connection between galaxy assembly and halo assembly.

We start with the SHAM model, attempting to populate the dark matter halos in the SMDPL simulation with galaxies of different H i masses. We consider two halo parameters: the peak circular velocity (${V}_{\mathrm{peak}}$) over the entire merger history of the (sub)halo, and the halo mass. Following common practice, we use the current halo mass for a main halo supposed to host a central galaxy, and the mass at the last epoch of accretion for subhalos supposed to host satellite galaxies, both referred to as ${M}_{\mathrm{acc}}$ in what follows. We rank-order all the halos/subhalos in the SMDPL simulation by either ${V}_{\mathrm{peak}}$ or ${M}_{\mathrm{acc}}$ and populate halos/subhalos of larger ${V}_{\mathrm{peak}}$ or ${M}_{\mathrm{acc}}$ with galaxies of higher ${M}_{{\rm{H}}{\rm{I}}}$. In Figure 9, the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ predicted by this simple model is compared with the observational measurements for three H i mass thresholds. Results from the ${V}_{\mathrm{peak}}$- and ${M}_{\mathrm{acc}}$-based models are shown as blue and red curves, respectively. Both models predict too strong clustering on all scales and at all H i mass thresholds, except the lowest mass threshold (${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$), for which the ${M}_{\mathrm{acc}}$-based model roughly matches the data. However, this agreement should not be overemphasized: as discussed in the previous section, the enhanced clustering at intermediate scales as seen in the low-mass samples is likely a biased measurement caused by sample variance (see the measurements with the SGC data).

Figure 9.

Figure 9. Projected 2PCFs from two SHAM models with ${V}_{\mathrm{peak}}$ (blue lines) and ${M}_{\mathrm{acc}}$ (red lines) as the halo properties for three H i-selected galaxy samples. The symbols with errors are the data measurements of ALFALFA samples. Blue triangles in the left panel represent measurements from the SGC data alone.

Standard image High-resolution image

In the simplest form of the SHAM model we adopt, no scatter is introduced between galaxy H i mass and halo/subhalo mass (or ${V}_{\mathrm{peak}}$). The failure of the model for the high-${M}_{{\rm{H}}{\rm{I}}}$ samples can be alleviated by introducing such a scatter, which allows a faction of H i galaxies to be populated into lower-mass/${V}_{\mathrm{peak}}$ halos (which have lower halo bias and weaker clustering). We find, however, that a simple inclusion of scatter up to 2 dex in stellar mass does not help for high-${M}_{{\rm{H}}{\rm{I}}}$ samples, presumably caused by the tension between the requirements of halo mass/${V}_{\mathrm{peak}}$ thresholds from galaxy number density and low galaxy clustering amplitude. Neither does scatter help for low-${M}_{{\rm{H}}{\rm{I}}}$ samples, as the large-scale halo bias factor approaches a plateau/minimum toward low mass/${V}_{\mathrm{peak}}$ (${b}_{{\rm{h}}}({M}_{{\rm{h}}})\sim 0.63$ at halo mass below $\sim {10}^{11}\,{M}_{\odot }$, and ${b}_{{\rm{h}}}({V}_{\mathrm{peak}})$ has a minimum of 0.75, as shown in Figure 8). As the galaxy bias factor ${b}_{g}(\gt {M}_{{\rm{H}}{\rm{I}}})$ tends to go below such values for low H i mass threshold samples (left panel of Figure 8), the SHAM model with scatter cannot explain the measured clustering for low ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples. This remains true for any model that assigns H i galaxies based only on the mass of dark matter halos, even if those galaxies are populated into a randomly selected subpopulation of halos. This implies that halos of similar dark matter masses are not randomly populated by H i-selected galaxies. Therefore, to properly model the H i-rich galaxies, one needs to introduce additional halo parameters that are related to the cold gas supply of galaxies, and those associated with the assembly history of halos are natural candidates.

4.3. Extending the SHAM Model by Introducing Additional Halo Parameters

At a fixed halo mass, halo clustering depends on the assembly history (known as the assembly bias), as originally found by Gao et al. (2005) using the Millennium Simulation (Springel et al. 2005) and confirmed by many follow-up studies (see, e.g., Wechsler et al. 2006; Zhu et al. 2006; Gao & White 2007; Jing et al. 2007; Wang et al. 2007a; Dalal et al. 2008; Li et al. 2008; Lacerna & Padilla 2011; Hearin et al. 2016). Halo parameters related to halo assembly history include halo formation time, halo concentration, and halo spin parameter. For halos at a fixed mass below the nonlinear mass for collapse (about $6.5\times {10}^{12}\,{M}_{\odot }$ for our adopted cosmology), those with earlier formation time, higher concentration, or higher spin are more strongly clustered (e.g., Wechsler et al. 2006; Gao & White 2007; Jing et al. 2007). In fact, halo formation time is adopted in the aforementioned SADM model as a parameter to characterize the epoch when the cold gas supply of star formation is deprived and explain the color dependence of clustering for galaxies at fixed luminosity or stellar mass (Hearin & Watson 2013). While halo formation time can be well considered as an additional parameter in this work, we also make attempts with halo concentration and spin.

The formation time of a halo or subhalo is defined as the redshift ${z}_{1/2}$ at which the halo mass first reaches half of the peak value over the whole merger history. The concentration parameter c is given by the ratio between the virial radius ${R}_{\mathrm{vir}}$ and the scale radius ${R}_{{\rm{s}}}$ of the halo. The halo spin parameter λ is defined as (Peebles 1969)

Equation (17)

where J and E are the magnitude of the angular momentum and the total energy of the halo, respectively. All three parameters are available in the catalog of ROCKSTAR halos (Behroozi et al. 2013b) for the simulation we use.

We extend the SHAM model by including one of the three halo assembly parameters in addition to the peak circular velocity ${V}_{\mathrm{peak}}$. Given that the data do not require a sophisticated model yet, we develop a simple procedure to perform the modeling with the extended SHAM. For a given H i-selected galaxy sample, we first preselect halos/subhalos according to the chosen halo assembly parameter. Then with such a selected subpopulation of halos we perform the abundance matching with ${V}_{\mathrm{peak}}$ (assuming no scatter). The halos/subhalos are preselected by imposing an upper bound for the assembly parameter (${z}_{1/2}$, c, or λ), as the samples are essentially in the halo mass regime in which halos with a lower assembly parameter show weaker clustering. By selecting halos with a low assembly parameter, it is possible to cross the plateau in the halo bias–mass relation and thus to explain the low clustering amplitude in the low ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples.

In our simple model, the only free parameter is the upper bound for the chosen halo assembly parameter. We note that the change in halo bias by varying this upper bound comes from a combination of the assembly effect and ${V}_{\mathrm{peak}}$ effect on halo bias. For example, halos of a higher mass on average form later and cluster more strongly. If we lower the upper bound on ${z}_{1/2}$, we select low-mass halos with lower bias (caused by the assembly effect), and at the same time, we also select relatively more halos of higher ${V}_{\mathrm{peak}}$ (higher bias) as they form later. Lowering the bias by the assembly effect by lowering the ${z}_{1/2}$ threshold can eventually be balanced by the inclusion of relatively more halos of higher ${V}_{\mathrm{peak}}$, leading to a lower limit for the clustering amplitude in the model. Similarly, the model also has an upper limit for the clustering amplitude. Our model can be regarded as introducing a specific way of populating galaxies into halos as a function of ${V}_{\mathrm{peak}}$ and the chosen assembly parameter.

In Figure 10 we show the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ from the two extreme models (red and blue curves) with each of the three assembly parameters (${z}_{1/2}$, c, or λ), predicting the minimum and maximum large-scale biases at a given ${M}_{{\rm{H}}{\rm{I}}}$ threshold (${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,M$ and ${10}^{10}\,{M}_{\odot }$ in the top and bottom panels, respectively). The data points in each panel show the measurements (as in Figure 3). Qualitatively, the clustering in the model is considerably weaker for smaller thresholds of ${z}_{1/2}$, c, or λ, and the effect is seen on all scales and for all three halo assembly parameters. This is consistent with the assembly bias effect. Quantitatively, the model based on the halo formation time (${z}_{1/2}$) can reasonably well fit the clustering measurements of both H i-selected galaxy samples. The model with the concentration parameter reproduces the clustering at intermediate to small scales for both samples, but predicts too high clustering amplitudes at scales above a few megaparsecs for the low-${M}_{{\rm{H}}{\rm{I}}}$ sample. The model with the spin parameter generally fails to reproduce the data at all scales and H i masses. Given the results from the above investigation, in what follows we limit our study to the model with halo formation time.

Figure 10.

Figure 10. Models taking into account halo assembly parameter for the samples of ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ (top panels) and ${10}^{10}\,{M}_{\odot }$ (bottom panels), respectively. The symbols in each panel are the measurements in Figure 3, while the red and blue solid lines are the corresponding models with the minimum and maximum values of large-scale bias, respectively (see the text for details).

Standard image High-resolution image

4.4. The SHAM Model with Halo Formation Time Thresholds

For each of the 13 ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples, we obtain the best-fitting threshold of halo formation time ${z}_{1/2}$ by minimizing the ${\chi }^{2}$,

Equation (18)

where ${\boldsymbol{C}}$ is the full error covariance matrix of ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$. As we only have one free parameter, the degree of freedom (dof) is calculated as $\mathrm{dof}={N}_{\mathrm{data}}-1$, where ${N}_{\mathrm{data}}$ is the number of data points in the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurement.

Since the ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples are not independent of each other by selection, the modeling cannot be done independently for each sample. We adopt a straightforward method that models the different ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples coherently. We start with the most massive ${M}_{{\rm{H}}{\rm{I}}}$ threshold sample, applying our model to obtain a best-fitting ${z}_{1/2}$ threshold for the sample, and we assign an H i mass to each halo or subhalo that has a formation time later than the ${z}_{1/2}$ threshold by abundance matching ${M}_{{\rm{H}}{\rm{I}}}$ to ${V}_{\mathrm{peak}}$. We then successively apply the same procedure for lower ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples. For each sample, we keep the H i mass fixed for halos that are already assigned an H i mass in the previous samples, thus assigning an H i mass only to halos that are additionally selected by the ${z}_{1/2}$ threshold of the current sample. After this process is done for all the samples, we essentially obtain a mock galaxy catalog in the simulation box that reproduces the clustering of all the H i-selected galaxies with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$.

In order to appropriately take into account the effect of integral constraint, we have constructed 64 mock catalogs that have the same survey geometry as the ALFALFA sample. To this end, we divide the simulation box into 64 subvolumes, place the virtual observer at the center of each subvolume, and construct a mock catalog out of each subvolume by applying our best-fit model. The LOS velocity of each (sub)halo hosting a mock galaxy is used to account for the RSD effect. We have also applied the W50-dependent line flux limit ${S}_{\mathrm{int}}$ in ALFALFA by using the H i mass and the associated line profile width W50 for each mock galaxy. The average ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ from the 64 mock catalogs is adopted as the model prediction for a given H i mass threshold.

Figure 11 compares the observed ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ (red symbols with error bars) with the model prediction (blue solid lines) for 12 ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples. For comparison, we show the measurements of ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ from the SGC alone, as the blue symbols with error bars, for which we have corrected the effect of the integral constraint according to the average correction estimated from the mock catalogs (see Section 3.3.4). The model reproduces the observed ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ for H i-selected samples with thresholds above ${M}_{{\rm{H}}{\rm{I}}}\sim {10}^{9}\,{M}_{\odot }$. For lower masses, the data points are systematically higher than the model at intermediate scales, which can be largely attributed to the sample variance in the observational sample, as discussed in the previous section. Thus, the discrepancy between the data and model at these low masses should not be overemphasized given the large errors in the observed ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$. Because of the large errors, the model parameters at the low masses are poorly constrained, as we will discuss below.

Figure 11.

Figure 11. Best-fitting model predictions (blue solid lines) of ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ for the different ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples as in Figure 3. We also show the measurements of ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ from the SGC as the blue open circles in each panel of the samples with the H i mass threshold lower than ${10}^{9}\,{M}_{\odot }$, where the effect of integral constraint has been corrected using the differences between the mock measurements in the SGC and the whole sample.

Standard image High-resolution image

It is interesting to see that, although the model parameters are constrained by the whole sample at a given ${M}_{{\rm{H}}{\rm{I}}}$ threshold, the predicted ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ appears to match very well with the data points from the SGC. In fact, we have tested that both the best-fit model parameters and the predicted ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ remain almost unchanged if we perform the fit with only the SGC data instead of the whole sample for the H i mass thresholds below ${10}^{9}\,{M}_{\odot }$. As we use a global halo population in our model, it cannot offer the flexibility to account for the sample variance effect, and this contributes to the insensitivity of the results to the measurements. The sample variance effect should also be reflected in the covariance matrix, which can further contribute to the above insensitivity. That is, in contrast to the measurements, the modeling results are less affected by the sample variance effect. On the other hand, the agreement between the model and the SGC data reinforces the conjecture that the SGC is less affected by sample variance and provides more reasonable measurements of ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ at low H i masses.

Table 2 lists the best-fitting model parameters for all the ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples, including the H i mass threshold of the sample, the reduced ${\chi }^{2}$, the minimum value of ${V}_{\mathrm{peak}}$, the halo formation time threshold, and the satellite fraction ${f}_{\mathrm{sat}}$. The halo formation time threshold is given in terms of the scale factor, ${a}_{1/2}\equiv 1/(1+{z}_{1/2})$. The errors on the halo formation time are given by the variation of ${z}_{1/2}$ around the best-fitting value with ${\rm{\Delta }}{\chi }^{2}=1$. The satellite fraction is around 10%, but slightly larger at the lowest masses.

Table 2.  Best-fitting Models for Different ${M}_{{\rm{H}}{\rm{I}}}$ Threshold Samples

Sample ${\chi }^{2}/\mathrm{dof}$ ${V}_{\mathrm{peak}}$ ${a}_{1/2}$ ${f}_{\mathrm{sat}}$ (%)
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8.0}{M}_{\odot }$ 8.64/9 37.38 0.43 ± 0.07 11.45
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8.2}{M}_{\odot }$ 6.80/9 39.87 0.45 ± 0.08 11.92
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8.4}{M}_{\odot }$ 4.53/9 41.56 0.39 ± 0.04 12.48
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8.6}{M}_{\odot }$ 5.96/9 41.56 0.41 ± 0.06 12.11
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8.8}{M}_{\odot }$ 7.36/9 41.56 0.48 ± 0.06 11.37
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9.0}{M}_{\odot }$ 7.97/11 41.56 0.53 ± 0.06 11.02
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9.2}{M}_{\odot }$ 7.31/11 50.44 0.53 ± 0.06 10.85
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9.4}{M}_{\odot }$ 9.34/11 60.62 0.55 ± 0.02 10.63
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9.6}{M}_{\odot }$ 13.06/11 73.87 0.56 ± 0.02 10.54
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9.8}{M}_{\odot }$ 10.90/11 86.41 0.59 ± 0.04 10.23
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10.0}{M}_{\odot }$ 12.10/11 110.69 0.62 ± 0.02 9.13
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10.2}{M}_{\odot }$ 6.85/10 131.14 0.70 ± 0.02 8.23
${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10.4}{M}_{\odot }$ 14.37/10 150.78 0.80 ± 0.01 10.29

Note. The best-fitting halo formation time ${a}_{1/2}$ (in terms of scale factor) for different ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples. The ${\chi }^{2}/\mathrm{dof}$, minimum value of ${V}_{\mathrm{peak}}$ (in units of $\,\mathrm{km}\,{{\rm{s}}}^{-1}$), and satellite fraction ${f}_{\mathrm{sat}}$ for each model are also shown.

Download table as:  ASCIITypeset image

Figure 12 (left panel) displays the halo formation time threshold ${a}_{1/2}$ as a function of the ${M}_{{\rm{H}}{\rm{I}}}$ threshold. Generally, ${a}_{1/2}$ increases with increasing ${M}_{{\rm{H}}{\rm{I}}}$, ranging from ∼0.4 for ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$ up to ∼0.8 for ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10.4}\,{M}_{\odot }$, which correspond to ${z}_{1/2}\sim 1.5$ and ∼0.25, respectively. The halo mass dependence of ${a}_{1/2}$ is relatively weak at masses below $\sim {10}^{10}\,{M}_{\odot }$ and becomes strong at high masses. For comparison we show the average value of ${a}_{1/2}$ and the $1\sigma $ scatter for halos selected with the same ${V}_{\mathrm{peak}}$ cuts as in our model but without the additional selection by ${a}_{1/2}$. Overall, our model selects younger-than-average halos to host the H i-selected galaxies, with larger differences in ${a}_{1/2}$ at higher H i masses. In the right panel of the same figure we show the ${a}_{1/2}$ threshold as a function of the galaxy number density predicted by our model, which can be described by a power-law relation of ${a}_{1/2}\,=0.682\,{[{n}_{{\rm{g}}}/({10}^{-3}{h}^{3}{\mathrm{Mpc}}^{-3})]}^{-0.088}$. We note that in this fitting formula, ${a}_{1/2}$ exceeds unity at ${n}_{{\rm{g}}}\lt 1.3\times {10}^{-5}\,{h}^{3}\,{\mathrm{Mpc}}^{-3}$; thus, the formula is not applicable for extremely low densities.

Figure 12.

Figure 12. Left: halo formation time threshold ${a}_{1/2}$ as a function of the ${M}_{{\rm{H}}{\rm{I}}}$ thresholds of the galaxy samples. The circles show the best-fitting model predictions. The black solid line is the average halo formation time for the halos selected with the same ${V}_{\mathrm{peak}}$ cut as the best-fitting models but without applying the ${a}_{1/2}$ cut. The shaded area is the $1\sigma $ scatter around the mean. Right: halo formation time threshold ${a}_{1/2}$ as a function of galaxy number density. The red solid line is a power-law fit of ${a}_{1/2}=0.682\,{[{n}_{{\rm{g}}}/({10}^{-3}{h}^{3}{\mathrm{Mpc}}^{-3})]}^{-0.088}$.

Standard image High-resolution image

5. Model Implications and Comparisons with the Literature

With our best-fit model, we can derive various properties and relationships for H i-selected galaxies. We compare the predictions from our model with those from previous work based on different methodologies.

5.1. H i–Halo Mass Relation

Our modeling results give the galaxy–halo relation for a series of ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples, from which we can construct the galaxy–halo relation from galaxies in narrow ${M}_{{\rm{H}}{\rm{I}}}$ bins. Figure 13 displays the ${M}_{{\rm{H}}{\rm{I}}}$ versus halo mass relation as inferred for the central galaxies in our best-fitting model (circles with error bars). We use the halo mass at which the distribution of ${M}_{{\rm{h}}}$ is peaked as the typical halo mass for each ${M}_{{\rm{H}}{\rm{I}}}$ bin. The error bars on ${M}_{{\rm{h}}}$ indicate the mass range of halos that host 68.3% of the H i galaxies in the ${M}_{{\rm{H}}{\rm{I}}}$ bin. The large upper error bar, as seen particularly at large ${M}_{{\rm{H}}{\rm{I}}}$, indicates both that the host halos of H i-rich galaxies span a wide range of halo mass and that H i-rich galaxies are preferentially found in relatively low mass halos at a given ${M}_{{\rm{H}}{\rm{I}}}$.

Figure 13.

Figure 13. H i–halo mass relation for central galaxies. We compare our best-fitting model (open circles with error bars) with six different models, including a halo-based semiempirical model of Popping et al. (2015); three SAMs of Fu et al. (2013), Popping et al. (2014), and Kim et al. (2017); and two hydrodynamical simulation models from EAGLE (Crain et al. 2017) and Illustris (Vogelsberger et al. 2014). For convenience, we display the $1\sigma $ ranges of the H i–halo mass relations from different models as color bands. The dotted line is the prediction from combining the average H i–stellar and stellar–halo mass relations (see text).

Standard image High-resolution image

A noticeable result from our model is the nonmonotonic relation between ${M}_{{\rm{h}}}$ and ${M}_{{\rm{H}}{\rm{I}}}$, in the sense that ${M}_{{\rm{h}}}$ increases with ${M}_{{\rm{H}}{\rm{I}}}$ but drops suddenly at ${M}_{{\rm{H}}{\rm{I}}}\sim {10}^{8.7}\,{M}_{\odot }$, before increasing again at ${M}_{{\rm{H}}{\rm{I}}}\sim {10}^{9.1}\,{M}_{\odot }$. The overall relationship can be accurately described by a triple-power-law function as follows:

Equation (19)

The H i–halo mass relation may be empirically estimated from combining the H i–stellar mass relation (Huang et al. 2012) with the stellar–halo mass relation (Moster et al. 2010), both available from previous studies. The result is shown as the dotted line in Figure 13, which is a monotonically increasing function with relatively flat (steep) slope below (above) ${M}_{{\rm{H}}{\rm{I}}}\sim {10}^{9.5}\,{M}_{\odot }$. This relation agrees roughly but not exactly with our model for ${M}_{{\rm{H}}{\rm{I}}}\lt {10}^{9.5}\,{M}_{\odot }$ in terms of both amplitude and slope. At higher H i masses, this empirical relation predicts much higher halo masses than our model. This may be attributed to the scatter in the H i–stellar and stellar–halo mass relations, which are not considered here. The steep relation at the high-mass end is expected to flatten if the scatter is included, thus becoming more consistent with our model prediction.

The H i–halo mass relation has been obtained in previous studies in different ways. For comparison, Figure 13 shows the H i–halo mass relations predicted by six different models. These include the halo-based semiempirical model by Popping et al. (2015); three different semianalytical models (SAMs) of galaxy formation by Fu et al. (2013), Popping et al. (2014), and Kim et al. (2017); and two hydrodynamical simulations by Crain et al. (2017) (the EAGLE project) and Vogelsberger et al. (2014) (the Illustris project). For clarity, we only show the $1\sigma $ range and omit the average relation for each model. We have corrected the effect of different cosmological parameters adopted in the different models, scaling the halo masses by the ratio of ${{\rm{\Omega }}}_{{\rm{m}}}$ and the H i masses by the ratio of ${{\rm{\Omega }}}_{{\rm{b}}}$, both relative to the value adopted in this work.

The three SAMs are built on the halo merger trees from high-resolution N-body simulations and include the partitioning of cold gas into atomic and molecular phases by applying empirical prescriptions. It is thus not surprising to see the H i–halo mass relations from these models being quite similar to each other at H i masses above $\sim {10}^{8}\,{M}_{\odot }$. At the low-mass end, the SAMs of Fu et al. (2013) and Popping et al. (2014) agree well, but the SAM of Kim et al. (2017) predicts significantly higher halo masses, which may be attributed to the photoionization feedback additionally implemented in order to bring the model to better match the low-mass end of the observed H i mass function (HIMF). The empirical relation from Popping et al. (2015) was obtained for galaxies at z = 0 in two successive steps: first assigning an SFR to each galaxy using the fitting functions from Behroozi et al. (2013a), and then estimating both H i and H2 masses by the combination of an empirical molecular-gas-based SFR relation and a pressure-based molecular gas fraction relation. This result agrees broadly with the SAMs at H i masses below $\sim {10}^{10}\,{M}_{\odot }$. At higher masses, the halo mass predicted by Popping et al. (2015) increases more rapidly with increasing ${M}_{{\rm{H}}{\rm{I}}}$. The authors considered the scatter in the SFR–stellar mass relation but ignored the scatter in the gas-related relations, which might be part of the reason for the steep H i–halo mass relation at high masses.

The two hydrodynamic simulations present similar slopes but significantly different amplitudes in the H i–halo mass relation. For ${M}_{{\rm{H}}{\rm{I}}}\lt {10}^{9}\,{M}_{\odot }$ and at given ${M}_{{\rm{H}}{\rm{I}}}$, the halo mass of H i-selected galaxies in the EAGLE simulation is an order of magnitude higher than the halo mass in the Illustris simulation. In other words, at a fixed halo mass, the EAGLE simulation predicts too low H i mass in galaxies. At ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ the difference decreases with increasing ${M}_{{\rm{H}}{\rm{I}}}$, but still it is as large as ∼0.5 dex even at ${M}_{{\rm{H}}{\rm{I}}}\sim {10}^{10}\,{M}_{\odot }$. The SAMs and the empirical model of Popping et al. (2015) fall in between the EAGLE and Illustris simulations. More work is needed if we are to fully understand the reason behind the large discrepancy between the two simulations, but the relatively poor resolution of the EAGLE simulation could be part of the reason. Empirical models rather than physics models related to cold gas have had to be applied in the EAGLE simulation given the limited resolution, resulting in an HIMF significantly lower than the observed one, as shown in Crain et al. (2017, see their Figure 8; see also Figure 16 below).

Our model agrees quite well with the Illustris simulation for ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$, where the clustering measurements are robust to the sample variance (see discussion in Section 3). At ${M}_{{\rm{H}}{\rm{I}}}\lt {10}^{8.7}\,{M}_{\odot }$, the Illustris simulation predicts a nearly constant halo mass, independent of ${M}_{{\rm{H}}{\rm{I}}}$. In contrast, the halo mass from our model still strongly depends on ${M}_{{\rm{H}}{\rm{I}}}$ at these low H i masses, a behavior that agrees with both the empirical model of Popping et al. (2015) and the SAMs of Fu et al. (2013) and Popping et al. (2014). However, we would like to point out that the agreement with these models at the low-mass end should not be overemphasized, again given the large uncertainties in the clustering measurements as extensively discussed above. Larger and deeper H i surveys are needed to reliably determine the low-mass end of the H i–halo mass relation.

5.2. Halo Occupation Distributions

Figure 14 displays the mean occupation distribution functions inferred from our best-fitting models for three typical ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$ (blue lines), ${10}^{9}\,{M}_{\odot }$ (green lines), and ${10}^{10}\,{M}_{\odot }$ (red lines). The total mean halo occupation function, $\langle N({M}_{{\rm{h}}})\rangle $ (solid lines), is decomposed into contributions from central galaxies, $\langle {N}_{\mathrm{cen}}\rangle $ (dotted lines), and satellite galaxies, $\langle {N}_{\mathrm{sat}}\rangle $ (dashed lines). Because we have only selected halos younger than the formation time threshold ${a}_{1/2}$, it is not surprising that the central occupation numbers for the samples of ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}$ and ${10}^{10}\,{M}_{\odot }$ are smaller than unity even for very massive halos with ${M}_{{\rm{h}}}\sim {10}^{15}\,{M}_{\odot }$.

Figure 14.

Figure 14. Mean occupation functions from the best-fitting models for the three typical ${M}_{{\rm{H}}{\rm{I}}}$ threshold samples with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$ (blue lines), ${10}^{9}\,{M}_{\odot }$ (green lines), and ${10}^{10}\,{M}_{\odot }$ (red lines). The total mean halo occupation function (solid lines) is decomposed into contributions from central galaxies (dotted lines) and satellite galaxies (dashed lines).

Standard image High-resolution image

In traditional HOD models, the characteristic host halo mass for central galaxies with a threshold luminosity or stellar mass is usually given by ${M}_{\min }$, the mass at which the mean occupation number of central galaxies per halo is 0.5 (see Equation (15); Zehavi et al. 2005, 2011; Zheng et al. 2009; White et al. 2011; Guo et al. 2014a, 2015a). In our case, however, the central galaxy occupation does not show a rapid cutoff at the low-mass end, particularly for the samples with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ or higher. For instance, for the sample of ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10}\,{M}_{\odot }$, ${M}_{\min }\sim 3\times {10}^{14}\,{M}_{\odot }$, but most of the central galaxies in this sample live in halos of about ${10}^{11.5}\,{M}_{\odot }$ (see Figure 13). This results from the fact that, by construction, our model tends to put H i galaxies into young, low-mass halos to reduce the large-scale bias. It should be noted that the mean occupation function implied by our model is of limited use for the general HOD modeling, as halos are preselected according to their assembly history, breaking the HOD assumption that the galaxy content depends only on halo mass.

Figure 15 shows the average halo occupation number $\langle N({M}_{{\rm{h}}})\rangle $ (left panel) and the differential probability distribution of halo mass ${dp}/d\mathrm{log}{M}_{{\rm{h}}}$ (right panel) for the ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ sample. The probability distribution of halo mass is derived from the product of $\langle N({M}_{{\rm{h}}})\rangle $ and the halo mass function. We show the results only for the sample with ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ and compare these with the predictions of the same quantities from three models: the SAM of Fu et al. (2013), the EAGLE simulation by Crain et al. (2017), and the Illustris simulation by Vogelsberger et al. (2014). Results are shown separately for central and satellite galaxies. We consider only the three models because data for other models are not available to us. We note that the EAGLE simulation data from Crain et al. (2017) only include the central galaxy population.

Figure 15.

Figure 15. Mean halo occupation number $\langle N({M}_{{\rm{h}}})\rangle $ (left panel) and the differential probability distribution of halo mass ${dp}/d\mathrm{log}{M}_{{\rm{h}}}$ (right panel) for the sample of ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$. The predictions from our best-fitting model are shown as the black lines, while the results from the SAM of Fu et al. (2013) and from the EAGLE and Illustris simulations are shown as the red, blue, and green lines, respectively. The models for the central and satellite galaxies are displayed in solid and dashed curves, respectively.

Standard image High-resolution image

For central galaxies, the SAM of Fu et al. (2013) predicts a bimodal distribution of $\langle {N}_{\mathrm{cen}}({M}_{{\rm{h}}})\rangle $, with two well-separated populations peaked at ${M}_{{\rm{h}}}\sim {10}^{11.5}\,M$ and ${10}^{14.5}\,{M}_{\odot }$, respectively. This bimodal distribution is not seen in the hydrodynamic simulations, in which the H i-selected central galaxies are limited to halos with intermediate to low masses. Neither is the bimodal distribution seen in our model, but our model appears to be in better agreement with the SAM of Fu et al. (2013), in the sense that both models predict a considerable fraction of the most massive halos to be able to host H i-rich galaxies. For satellite galaxies, all the models predict a power-law mean occupation function with halos of higher masses hosting larger numbers of H i-selected galaxies, but on average our model requires the host halos of satellites to be more massive than those in the other models. In our model, the existence of H i-rich satellites in massive halos is possible as long as the halos are formed at a substantially late time. This population must be very rare in observational samples given the extremely low abundance of the most massive halos. In fact, H i emission has been detected in satellite galaxies of massive halos with ${M}_{{\rm{h}}}\sim {10}^{14}\mbox{--}{10}^{15}\,{M}_{\odot }$ (e.g., Catinella et al.2013).

The actual shape of the occupation function at high mass does not have a significant effect on the probability distribution of host halo mass, because the halo mass function decreases rapidly with ${M}_{{\rm{h}}}$ at the massive end. This can be clearly seen from the right panel of Figure 15, where the probability distribution function of host halo mass for central galaxies is concentrated to a narrow range of halo mass, ranging from a few $\times {10}^{9}\,{M}_{\odot }$ to $\sim {10}^{13}\,{M}_{\odot }$, with a peak at $\sim 2\times {10}^{10}\,{M}_{\odot }$. This result is in good agreement with the Illustris simulation, while the distribution of halo mass for central galaxies in both the EAGLE simulation and the SAM of Fu et al. (2013) covers a similarly narrow halo mass range but is peaked at $(2\mbox{--}3)\times {10}^{11}\,{M}_{\odot }$, an order of magnitude higher than predicted by our model. For satellite galaxies, all the models, including ours, span a broad range of halo mass from ${M}_{{\rm{h}}}\sim {10}^{11}\,{M}_{\odot }$ up to $\sim {10}^{15}\,{M}_{\odot }$, although our model shows a slightly different shape in the overall distribution.

5.3. H i Mass Function

In Figure 16 we show the HIMF of galaxies as predicted by our model for the full galaxy population (left panel) and the populations of central (middle panel) and satellite (right panel) galaxies. For comparison, the Schechter function fit of the observational HIMF from ALFALFA (Martin et al. 2010) is plotted as a black solid line and repeated in every panel. By construction the HIMF from our model perfectly matches the observed HIMF for the whole H i galaxy population. Furthermore, our model predicts that the overall HIMF is predominantly contributed by the central galaxy population. The central and satellite populations contribute about 90% and 10%, respectively, of the total abundance at a given ${M}_{{\rm{H}}{\rm{I}}}$, and this result is essentially independent of the H i mass, broadly consistent with the satellite fractions listed in Table 2, which are constant at $\sim 10 \% $.

Figure 16.

Figure 16. H i mass functions of all (left panel), central (middle panel), and satellite galaxies (right panel) from several models. The results from the SAM of Fu et al. (2013), Illustris, and EAGLE are displayed as the red, green, and blue lines, respectively. Our model predictions are shown as the filled circles. We also show for comparison the Schechter function fit from Martin et al. (2010) for all galaxies in the ALFALFA sample as the black line in each panel.

Standard image High-resolution image

Figure 16 also shows the HIMF prediction of the Illustris and EAGLE simulations, as well as the SAM of Fu et al. (2013). Despite the agreement on the H i–halo mass relation between our model and the Illustris simulation for ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ (see Figure 13), the overall abundance of the H i-selected galaxies from Illustris is significantly higher than that of the other models, with largest discrepancies at intermediate ${M}_{{\rm{H}}{\rm{I}}}$, and this is true for both central and satellite galaxies. The EAGLE simulation predicts an HIMF with a much lower amplitude than others, though the prediction is available only for central galaxies. The SAM of Fu et al. (2013) appears to broadly agree with our model, particularly for the central galaxies and the full population. For satellite galaxies, the SAM predicts a steeper slope at the low-mass end, where the HIMF from our model is quite flat. It is not surprising that the SAM can successfully reproduce the overall HIMF, as the model parameters were tuned to do so. It is encouraging that the SAM model and our model show a broad agreement in both the central and satellite components of the HIMF.

6. Conclusions

In this paper, we have investigated the dependence of clustering on the H i content of galaxies using the 70% complete sample of the ALFALFA survey. We select galaxy samples by different H i mass thresholds, ranging from ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{8}\,{M}_{\odot }$ to ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{10.4}\,{M}_{\odot }$, and for each sample we have estimated three clustering statistics: the projected 2PCF ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$, the projected cross-correlation function with respect to a reference sample, and the redshift-space monopole moment ${\xi }_{0}(s)$. We construct a halo-based statistical model in which the H i content of a galaxy depends on both the host dark matter halo mass and the halo formation time, and we use the ${w}_{{\rm{p}}}({r}_{{\rm{p}}})$ measurements to constrain the model parameters. We discuss the H i–halo mass relation and H i mass functions for central and satellite galaxies as inferred from our best-fitting models, and we compare these results with predictions of the same statistics from other models in the literature, including the traditional HOD models, semianalytic models of galaxy formation, and cosmological hydrodynamical simulations.

Our main conclusions can be summarized as follows:

  • 1.  
    In contrast to previous studies that found no significant dependence of clustering on H i mass, we find that the projected 2PCFs depend strongly on the H i mass, in the sense that galaxies of higher H i masses are more strongly clustered on scales above a few megaparsecs than the lower H i mass galaxies. This finding is robust, as we also infer consistent galaxy bias factors from the redshift-space monopole moments and the cross-correlations between the ALFALFA and SDSS galaxy samples.
  • 2.  
    The bias factors of the low H i mass samples are systematically lower than the minimum bias of dark matter halos selected by mass or ${V}_{\mathrm{peak}}$ thresholds. This implies that the relation between H i-rich galaxies and halos depends not only on halo mass or ${V}_{\mathrm{peak}}$, as commonly assumed in traditional halo models, such as the HOD model or the simplest SHAM model.
  • 3.  
    The clustering measurements of the H i-selected samples can be reasonably explained by an extended SHAM model, which includes a parameter related to the halo assembly bias effect in addition to ${V}_{\mathrm{peak}}$. In our model, this parameter is chosen to be the halo formation time.

7. Discussions

Thanks to the most up-to-date ALFALFA $\alpha .70$ sample, we are able to perform an extensive investigation of the H i mass dependence of galaxy clustering. We make full use of the flux-limited sample through weighing galaxy pairs by $1/{V}_{\max }$ to improve the clustering measurements and to achieve effectively volume-limited measurements. We have tested the robustness of our clustering measurements by checking the finite-volume effect, analyzing the NGC and SGC subsamples separately, measuring the clustering relative to a large reference sample, and carrying out comparisons with mock catalogs. We find that the clustering measurements at H i masses below ${10}^{9}\,{M}_{\odot }$ are seriously biased by the existence of the superclusters in the NGC, while the measurements based on the SGC data alone appear to be more consistent with the low bias factors at the low-mass end as obtained from the redshift-space monopole moments, which are less affected by the sample variance when compared to the projected 2PCFs. However, larger samples covering even larger volumes are still needed in order to have unbiased clustering measurements at these low H i masses.

At ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$, where the different clustering statistics provide consistent results, we find the clustering on scales larger than a few megaparsecs to significantly depend on H i mass, with stronger clustering at higher masses. Previous studies have controversial conclusions about this dependence (Basilakos et al. 2007; Meyer et al. 2007; Papastergis et al. 2013), which can at least partially (if not purely) be attributed to the smaller sample sizes and the non-volume-limited clustering measurements. In fact, the H i mass dependence of clustering is naturally expected given the positive correlation of ${M}_{{\rm{H}}{\rm{I}}}$ with stellar mass (e.g., Figure 2 of Huang et al. 2012) and the known stellar-mass dependence of clustering (e.g., Li et al. 2006). The stellar-mass dependence has been studied in depth in the past decade for both low-z galaxies and those at higher redshifts, thanks to the much larger samples from optical spectroscopic or photometric redshift surveys. These studies have provided stringent constraints on the link between galaxies of different stellar masses and their host dark matter halos. In principle, the H i mass dependence of clustering should also provide interesting constraints on the galaxy–halo relationship. Our halo-based model has shown that this is indeed the case.

The model we have proposed in this work is an extended version of the simple SHAM model. In our model the H i content of a galaxy depends not only on the mass (or more accurately ${V}_{\mathrm{peak}}$) of its host halo, as in the SHAM model, but also on the halo assembly history. It has been known that the clustering of halos depends not only on halo mass but also on halo assembly history (Gao et al. 2005; Wechsler et al. 2006; Gao & White 2007; Wang et al. 2007a; Li et al. 2008). If the galaxy formation and evolution tightly track the halo assembly, the above halo assembly bias effect would translate to an effect on galaxy clustering (aka galaxy assembly bias). We have adopted three parameters to quantify halo assembly history—halo formation time, halo spin, and halo concentration—and found the halo formation time to be the best parameter that can reproduce the clustering of H i-rich galaxies at all scales. One interesting prediction of our model is that H i-rich satellite galaxies could reside in massive halos as long as the halos are formed at substantially late times. This makes sense if the satellite galaxies lose their cold gas in a smooth manner. That is, the ram pressure stripping of gas happens slowly, not immediately at accretion as assumed in most of the current semianalytic models. Such smooth gas stripping has been observed in satellite galaxies in low-redshift galaxy clusters (e.g., Zhang et al. 2013) and has recently been investigated in theoretical models (e.g., Luo et al. 2016).

Recent studies of galaxy clustering have attempted to include the effect of halo assembly bias in the SHAM model. By tying galaxy color to halo age, such models can qualitatively reproduce the codependence of clustering on stellar mass and color (Hearin & Watson 2013; Zentner et al. 2014, 2016).13 Apparently, our model is along the same lines as these studies, and the results are not surprising given the close correlation between cold gas fractions of galaxies and their colors.

Observational studies attempting to directly detect the galaxy assembly bias in galaxy clustering have produced controversial results, however. On one hand, Miyatake et al. (2016) and More et al. (2016) analyzed two samples of galaxy clusters that have similar halo masses of $1.9\times {10}^{14}{h}^{-1}\,{M}_{\odot }$, as estimated from weak-lensing signals, and found their halo bias values to differ by a factor of 1.5. This was regarded as observational evidence in support of the halo assembly bias effect. On the other hand, the assembly bias signal observed in the redMaPPer clusters is shown to be strongly contaminated by the projection effect of the cluster membership identification (Zu et al. 2017). In addition, Lin et al. (2016) compared the clustering of early- and late-forming galaxies with a mean halo mass of $\sim 9\times {10}^{11}{h}^{-1}\,{M}_{\odot }$ and found no convincing evidence for galaxy assembly bias. In cosmological simulations, halo assembly bias is most pronounced for low-mass halos with masses below $\sim {10}^{13}\,{M}_{\odot }$, as originally found by Gao et al. (2005). Our model predicts that the H i-rich galaxies detected in the ALFALFA sample are mostly hosted by halos less massive than $9\times {10}^{11}{h}^{-1}\,\,{M}_{\odot }$. Our results imply that the H i content of galaxies may be more tightly correlated with the halo assembly history, compared to quantities derived from optical observations. Next-generation H i surveys might be able to provide better constraints on the correlation between H i gas and halo assembly, i.e., galaxy assembly bias on top of halo assembly bias.

We find that at ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9}\,{M}_{\odot }$ the H i–halo mass relation predicted by our model agrees well with the relation from the Illustris simulation, while the EAGLE simulation and current semianalytic models predict higher halo masses at a given H i mass. However, the Illustris simulation predicts an HIMF that is substantially too high when compared to the observed HIMF from ALFALFA. By construction our model accurately reproduces the observed HIMF for the full sample and additionally predicts the HIMFs for central and satellite galaxies separately. It is interesting to note that the predicted central galaxy HIMF by our model agrees well with that from the SAM of Fu et al. (2013). For the satellite population, the same SAM agrees with our model at the high-mass end of the HIMF but predicts too many galaxies at ${M}_{{\rm{H}}{\rm{I}}}\lt {10}^{10}\,{M}_{\odot }$. We did not examine the clustering properties predicted by these models, but in principle our measurements of the H i-dependent clustering should be helpful for testing and constraining the models. In a recent paper, Zoldan et al. (2017) have analyzed the H i content of galaxies for six different SAMs, finding most of them to agree with the observational paper by Papastergis et al. (2013) in the sense that both the models and the data reveal no significant mass dependence of the clustering at ${M}_{{\rm{H}}{\rm{I}}}\gt {10}^{9.5}\,{M}_{\odot }$. However, the results in Papastergis et al. (2013) are based on a substantially smaller sample and on non-volume-limited clustering measurements. Given the H i mass dependence of the clustering we find here with larger samples and effectively volume-limited measurements, the comparison between the SAM predictions and the observational results needs to be revisited in order to properly test the models.

With much larger samples of H i-selected galaxies in the foreseeable future, we will be able to further verify our model predictions and study the relation between the cold gas and the halo environment in more depth. First, one may make use of optically selected galaxy samples instead of H i samples for the clustering analysis, estimating a pseudo-H i mass for each galaxy based on the tight scaling relations between the H i gas content and the other properties such as color and surface brightness (e.g., Zhang et al. 2009; Li et al. 2012; Teimoorinia et al. 2017). Current photometric H i estimators, which are calibrated with the existing H i surveys, such as ALFALFA and GASS, can provide unbiased H i–stellar mass ratios for galaxies spanning a wide range of gas mass fraction, with scatter of only ∼0.2–0.3 dex. By applying their estimator to a large sample of SDSS galaxies, Li et al. (2012) examined the dependence of galaxy clustering on the H i–stellar mass ratio and found that at a given stellar mass, galaxies with higher gas fractions have weaker clustering amplitudes. We expect the poorly estimated clustering at the low H i mass end with the limited sample size to be well improved, thus providing additional and important constraints on our model, particularly at the low-mass end. The larger sample would also allow us to include not only the H i mass but also the H i–stellar mass ratio in clustering measurements and modeling, a property supposed to be more closely linked to the halo assembly history. Second, we note that our simplified halo model is restrictive by construction, as the accuracy of the clustering measurements from the small volume of the α.70 sample does not allow for strong constraints on more sophisticated and flexible models. The traditional SHAM model usually includes the scatter in the stellar–halo mass relation as a free parameter (see, e.g., Behroozi et al. 2010; Moster et al. 2010; Leauthaud et al. 2012; Nuza et al. 2013; Reddick et al. 2013). In our model, we test the effect of the scatter but do not include it in the adopted model, although part of the scatter between the H i and halo mass is taken into account through the scatter between ${V}_{\mathrm{peak}}$ and halo mass. In addition, in our model we assume a constant halo formation time threshold for each ${M}_{{\rm{H}}{\rm{I}}}$ threshold sample. With higher-precision clustering measurements from future larger data, we expect to be able to better constrain the mapping of the galaxy H i content onto the multidimensional halo parameter space (e.g., halo mass and formation time).

This work is supported by the National Key Basic Research Program of China (nos. 2015CB857003, 2015CB857004). H.G. acknowledges the support of the 100 Talents Program of the Chinese Academy of Sciences and NSFC (nos. 11543003, 11773049). C.L. acknowledges the financial support of NSFC (nos. 11173045, 11233005, 11325314, 11320101002) and the Strategic Priority Research Program "The Emergence of Cosmological Structures" of CAS (grant no. XDB09000000). H.J.M. acknowledges the support from NSF AST-1517528 and NSFC-11673015. We thank the anonymous referee for the helpful comments that improved the presentation of this paper. We thank Jian Fu, Gergö Popping, Robert A. Crain, Han-seek Kim, Emmanouil Papastergis, and Michael G. Jones for kindly providing the data used in this paper.

We gratefully acknowledge the use of the High Performance Computing Resource in the Core Facility for Advanced Research Computing at Shanghai Astronomical Observatory. We acknowledge the Gauss Centre for Supercomputing e.V. (http://www.Gauss-centre.eu) and the Partnership for Advanced Supercomputing in Europe (PRACE, http://www.prace-ri.eu) for funding the MultiDark simulation project by providing computing time on the GCS Supercomputer SuperMUC at Leibniz Supercomputing Centre (LRZ, http://www.lrz.de).

Footnotes

Please wait… references are loading.
10.3847/1538-4357/aa85e7