

THE CLUSTERING CHARACTERISTICS OF H i-SELECTED GALAXIES FROM THE 40% ALFALFA SURVEY


Published 2012 April 13 © 2012. The American Astronomical Society. All rights reserved.
Citation: Ann M. Martin et al 2012 ApJ 750 38, DOI 10.1088/0004-637X/750/1/38


ABSTRACT

The 40% Arecibo Legacy Fast ALFA survey catalog (α.40) of ∼10,150 H i-selected galaxies is used to analyze the clustering properties of gas-rich galaxies. By employing the Landy–Szalay estimator and a full covariance analysis for the two-point galaxy–galaxy correlation function, we obtain the real-space correlation function and model it as a power law, $\xi(r) = (r/r_0)^{-\gamma}$, on scales <10 h−1 Mpc. As the largest sample of blindly H i-selected galaxies to date, α.40 provides detailed understanding of the clustering of this population. We find γ = 1.51 ± 0.09 and $r_0 = 3.3^{+0.3}_{-0.2}$ h−1 Mpc, reinforcing the understanding that gas-rich galaxies represent the most weakly clustered galaxy population known; we also observe a departure from a pure power-law shape at intermediate scales, as predicted in ΛCDM halo occupation distribution models. Furthermore, we measure the bias parameter for the α.40 galaxy sample and find that H i galaxies are severely antibiased on small scales, but only weakly antibiased on large scales. The robust measurement of the correlation function for gas-rich galaxies obtained via the α.40 sample constrains models of the distribution of H i in simulated galaxies, and will be employed to better understand the role of gas in environmentally dependent galaxy evolution.


1. INTRODUCTION

Galaxies selected by their neutral hydrogen are known to be less clustered than their optically selected counterparts (Basilakos et al. 2007; Meyer et al. 2007 for HIPASS) and less likely to be found in dense environments. Given anticipated cosmological uses of 21 cm galaxy redshift surveys, it is important to understand the clustering characteristics of this population of galaxies. Specifically, 21 cm line surveys obtain detections and redshifts concurrently, along with H i mass, reducing their expense and eliminating the need for follow-up observations. Such surveys are also able to probe galaxy populations irrespective of luminosity, stellar mass, or dust extinction. Additionally, such surveys are sensitive to low-luminosity dwarf systems, which tend to be gas-dominated (Geha et al. 2006). Conversely, such surveys are biased against clusters, the most luminous galaxies, and the "red and dead" galaxy population.

Given the lack of large and deep H i-selected galaxy samples to date (the HIPASS main catalog and its northern extension contain, respectively, 4315 and 1002 galaxies; Meyer et al. 2004; Wong et al. 2006), this population, its evolution, and its bias compared to dark matter are poorly understood. The selection of these galaxies is strongly limited in redshift, and targeted observations can only extend to z ∼ 0.2 (Catinella et al. 2008; Freudling et al. 2011), while the Arecibo Legacy Fast ALFA (ALFALFA) survey is limited to z < 0.06. At the same time, this population is poised to become the standard for cosmological measurements based on observations of resolved galaxies as well as intensity mapping. For example, galaxy redshift surveys taking advantage of the 21 cm transition of neutral hydrogen undertaken with instruments like the Square Kilometer Array (SKA) would potentially provide constraints on the dark energy equation of state and its variation with redshift (Abdalla et al. 2010; Myers et al. 2009).

The differences in neutral hydrogen distribution between galaxies in clusters and those in the field are unevenly understood, with proposed solutions spanning from "nature" (i.e., gas-rich galaxies form in low-concentration dark matter halos and/or in underdense environments) to "nurture" (i.e., processes that occur after formation deplete the H i gas from halos, through ram-pressure stripping or galaxy interactions, or enrich H i reservoirs, through cold accretion). The reality is a combination of many processes and initial conditions. Probing the relationship between cold gas mass and other properties known to be anticorrelated with clustering (such as spiral morphology, late type (Norberg et al. 2002), active star formation (Kauffmann et al. 2004), and blue colors (Zehavi et al. 2005)) may help to better articulate the influence of environment on galaxy evolution while also constraining the populations to which future large 21 cm line surveys will be sensitive.

Most work directly related to the clustering of gas-rich galaxies came out of the HIPASS survey. Meyer et al. (2007) and Basilakos et al. (2007) both identified the HIPASS H i-selected sample as the weakest clustering population of galaxies known, but their results regarding the mass dependence of the clustering were in conflict. While the HIPASS team found a statistically insignificant difference between "high" and "low" H i mass galaxies, Basilakos et al. (2007) found that high-mass galaxies clustered more strongly. More recently, Passmoor et al. (2011) compare the ALFALFA and HIPASS projected correlation function and angular correlation function, and find that they are similar but that ALFALFA's sensitivity to low-mass galaxies makes that sample more strongly antibiased relative to dark matter. However, Passmoor et al. (2011) use only the ALFALFA catalogs published in Giovanelli et al. (2007), Saintonge et al. (2008), and Kent et al. (2008) (∼1800 galaxies) despite several other ALFALFA catalogs being available at time of publication; these catalogs include the Virgo cluster and Pisces-Perseus foreground void and cover small volumes, so do not comprise a representative sample. Passmoor et al. (2011) are therefore severely limited in their ability to make broader claims about the population.

The excellent sensitivity and large sample size of the α.40 sample allow us to probe the clustering characteristics of H i-selected galaxies through the two-point galaxy–galaxy correlation function.

In the following sections, we describe our data set (Section 2) and the methodology used to measure the galaxy–galaxy correlation function (Section 3). We then estimate the real-space correlation function, both assuming a power law and by direct inversion, and investigate the impact of methodology choices in Section 4. We compare the ALFALFA clustering results to those found in simulations that have, for the first time, attempted to assign reasonable cold H i gas masses to simulated galaxies, in Section 5, while also discussing the results in context, before concluding in Section 6.

2. DATA SET

2.1. ALFALFA α.40 Sample

The ongoing ALFALFA survey is completing a census of galaxies in the local universe, out to z ∼ 0.06, using the seven-pixel ALFA receiver at the Arecibo Observatory to detect the 21 cm line of neutral hydrogen. Compared to previous blind neutral hydrogen surveys (e.g., HIPASS), ALFALFA's enhanced sensitivity, detection centroiding, volume, and sample size, resulting in a cosmologically representative sample, make it ideally suited for an accurate measurement of the correlation function of gas-rich galaxies.

The sample used here includes the sky coverage of the α.40 sample recently presented in Haynes et al. (2011), referred to as α.40 because it includes the data extracted from coverage of 40% of ALFALFA's skyprint. The statistical completeness and noise characteristics of the ALFALFA source catalog are well understood and have been discussed extensively elsewhere. Further details may be found in Saintonge (2007), Martin et al. (2010), and Haynes et al. (2011), which include discussions of the characteristics of the α.40 sample and the sensitivity of the ALFALFA survey. In particular, Haynes et al. (2011) discuss impacts of various volume restrictions on the derivation of the H i mass function. Here, we summarize the salient points.

Confidently detected sources in ALFALFA are assigned one of three object codes, where Code 1 refers to a reliable extragalactic detection with a high signal-to-noise ratio (S/N; > 6.5). For the sample used here, we neglect the other objects, Code 2 and Code 9 sources; Code 9 sources are high velocity clouds (HVCs) of hydrogen gas in the vicinity of the Milky Way and are thus not extragalactic, whereas Code 2 extragalactic sources have lower S/N and are only included in the catalog because they are corroborated by a known optical source at the same position and redshift. Furthermore, ALFALFA's ability to detect extragalactic signal near its redshift limit is degraded due to a strong source of terrestrial radio frequency interference, the FAA radar at the San Juan airport. We therefore include only objects within 15,000 km s−1, which results in only a modest loss of source counts.

ALFALFA's sensitivity depends not only on the integrated flux, but also on the 21 cm spectrum's profile width W50 (km s−1). Because the mass of an H i source is a function of its distance and integrated flux, integrated flux can be thought of as a proxy for mass. The survey therefore is not volume-, flux-, or mass-limited, and the reconstructed selection function must take this complex sensitivity into account. Thus, when the sample is viewed as the distribution of galaxy masses as a function of distance, as in Figure 3 of Haynes et al. (2011), it is clear that α.40 is sensitive to very low H i mass galaxies nearby but only to significant masses at greater distances.

We refer the reader to Figure 1 of Martin et al. (2010), which displays the dependence of ALFALFA's sensitivity on both integrated flux and profile width. The HIPASS survey recovered sources with the same dependence on these two parameters. ALFALFA is more sensitive than HIPASS, with a 5σ detection limit of 0.72 Jy km s−1  for a source with profile width 200 km s−1  in ALFALFA compared to 5.6 Jy km s−1  for the same source in HIPASS. In Martin et al. (2010), we fit a linear relationship between integrated flux and profile width, with a break at 400 km s−1, which describes the sensitivity of the survey. We use that same relationship and the selection function derived in that work throughout this paper.

The trimmed sample includes only Code 1 objects from the α.40 catalog within 15,000 km s−1, for a total of ∼10,150 galaxies used in measuring the two-point correlation function.

2.2. Selection Function

The distance dependence of the selection function of α.40 was determined using the two-dimensional stepwise maximum likelihood (2DSWML) method, described fully in Appendix B of Martin et al. (2010). 2DSWML is related to the stepwise maximum likelihood (SWML) approach, but modified to account for the survey sensitivity's two-dimensional dependence on both integrated flux and profile width. 2DSWML splits the distribution of galaxy masses and profile widths in the α.40 sample into logarithmic bins, and then calculates the best-fit H i mass function (analogous to a luminosity function) which maximizes the joint likelihood of detecting all galaxies in the sample. This approach simultaneously measures the selection function for each detected galaxy in the sample. In this work, we will use that selection function S(Di) for each galaxy i with a known distance Di in Mpc.

For application to the correlation function, $S(D_i)$ is calculated for every galaxy in the sample and then $S(D)$ is smoothed. This smoothed selection function can additionally be combined with an H i mass function to make predictions regarding the number of galaxies of a given mass that are expected to be found in the survey, or used to predict the redshift distribution of the survey under an assumption of homogeneity. Figure 1, previously published in Haynes et al. (2011), shows a histogram of the α.40 redshift distribution, with peaks and dips representing clusters and voids, respectively, along with an overplotted prediction based on the selection function and a non-clustered universe.


Figure 1. Observed redshift distribution of α.40 galaxies (histogram) compared to the expected distribution (solid line) obtained via the survey's selection function. This figure was previously published as Figure 21 in Haynes et al. (2011).


Disagreements between the prediction and the observations are due both to the existence of large-scale structure in the survey volume and to the loss of survey sensitivity at certain velocities due to radio frequency interference. This contamination is quantified as a percentage of survey coverage at a given heliocentric velocity, or spectral weighting, as discussed in Martin et al. (2010) and earlier publications from the ALFALFA survey. For the purposes described here, the weights map has been translated into the cosmic microwave background (CMB) reference frame in order to most accurately model the predicted ALFALFA galaxy distribution (see Section 3.2). The selection function is used both for the creation of the random samples for estimation of the correlation function, and for the weighting of pair counts in that estimate.

3. METHOD: ESTIMATION OF ξ(r) AND ERROR ANALYSIS

We measure the correlation function, ξ(σ, π), in bins of on-sky (σ) and radial (π) redshift–space separations, using the galaxies' observed velocities. Given the redshift extent of α.40, we have translated measured galaxy velocities from the heliocentric frame of reference to the CMB frame of reference using Lineweaver et al. (1996). For two galaxies i and j, these separations are

Equation (1)

and

Equation (2)

where θ is the angular separation of the two galaxies on the sky, vi and vj are defined in the CMB reference frame, and H0 is expressed in units of h (H0 = 100h km s−1 Mpc−1). We adopt the Davis & Peebles (1983) definitions for σ and π rather than those used in Fisher et al. (1994a), but note that Guzzo et al. (1997) found negligible differences for a sample with similar redshift extent and covering a portion of α.40's survey volume. Because we are using a sample of galaxies in the very local universe, we neglect cosmological corrections to the distances. This choice is further supported by our focus on small relative pair distances (always less than 30 h−1 Mpc) and our interest in projected quantities where such distance errors, already very small in magnitude, are absorbed in the projection.
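As an illustration of the separations just defined, the sketch below computes σ and π for one pair. It assumes the form commonly quoted for the Davis & Peebles (1983) definitions, π = (vi − vj)/H0 and σ = [(vi + vj)/H0] tan(θ/2); the function name `pair_separations` and the use of an absolute value for π are choices made for this example, not details taken from the paper.

```python
import math

H0 = 100.0  # km/s per (h^-1 Mpc), so separations come out in h^-1 Mpc


def pair_separations(v_i, v_j, theta_rad):
    """Return (sigma, pi) in h^-1 Mpc for a single galaxy pair.

    v_i, v_j  : CMB-frame recession velocities in km/s
    theta_rad : angular separation on the sky in radians

    Assumed form (Davis & Peebles 1983 style):
        pi    = |v_i - v_j| / H0
        sigma = (v_i + v_j) / H0 * tan(theta / 2)
    """
    pi_sep = abs(v_i - v_j) / H0
    sigma_sep = (v_i + v_j) / H0 * math.tan(theta_rad / 2.0)
    return sigma_sep, pi_sep


# Two galaxies at the same velocity, separated by 0.02 rad on the sky
sigma, pi_sep = pair_separations(5000.0, 5000.0, 0.02)
```

No cosmological distance correction is applied, matching the low-redshift approximation described in the text.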

Our ultimate goal is to measure ξ(r), the real-space correlation function, through the observables actually available to us, that is, ξ(σ, π). In particular, we are interested in modeling the power-law shape of the correlation function up to ∼10 h−1 Mpc, beyond which point the correlation function is known to diverge from a simple power law.

Since ξ measures not simply the probability distribution of galaxy separations in a sample, but the excess probability compared to a homogeneously distributed sample, estimators compare the observed galaxy distribution to a random distribution designed to reflect the survey's observational limitations but to exclude the effects of large-scale structure. This is straightforwardly accomplished by comparing the number of pairs in (σ, π) separation bins from the observed sample to the pair counts from the random sample. In the sections that follow, we will describe this method and the corresponding error analysis in greater detail.

3.1. Pairwise Estimation

We adopt the Landy–Szalay pairwise estimator (Landy & Szalay 1993) for the correlation function. The Landy–Szalay normalization of data–data (DD), random–random (RR), and data–random (DR) pair counts allows us to construct a random catalog that contains many more objects than the observed data catalog, thereby reducing the introduction of shot noise from the random set. The Landy–Szalay estimator is constructed from these normalized counts:

Equation (3)
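For reference, the standard Landy–Szalay form is ξ = (DD − 2DR + RR)/RR, with each count normalized by the total number of pairs of its type. The sketch below evaluates it for a single separation bin from raw, unweighted pair counts; the function name and the unweighted normalization are assumptions of this example, since the analysis described here applies pair weights.

```python
def landy_szalay(dd, dr, rr, n_data, n_rand):
    """Landy-Szalay estimate of xi in one separation bin.

    dd, dr, rr : raw data-data, data-random, random-random pair counts
    n_data     : number of objects in the data catalog
    n_rand     : number of objects in the random catalog
    """
    # Normalize each count by the total number of pairs of that type.
    dd_norm = dd / (n_data * (n_data - 1) / 2.0)
    dr_norm = dr / (n_data * n_rand)
    rr_norm = rr / (n_rand * (n_rand - 1) / 2.0)
    return (dd_norm - 2.0 * dr_norm + rr_norm) / rr_norm


# A bin in which the data hold twice the pair fraction of the randoms
xi_bin = landy_szalay(dd=9.0, dr=20.0, rr=19.0, n_data=10, n_rand=20)
```

Because the counts are normalized separately, the random catalog can be much larger than the data catalog without changing the expectation value of the estimate.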

Because α.40 is not volume-limited, the pair counts must be weighted so that the measurement is not dominated by galaxies at the peak of the selection function. Following Meyer et al. (2007) and Hawkins et al. (2003), we apply a weighting wij = wi × wj for the contribution of each pair i, j to the Landy–Szalay estimator, given by

Equation (4)

where ND is the number of galaxies in α.40, S(ri) is the selection function measured for α.40 at $r_i = c z_{\mathrm{CMB},i}/H_0$, and

Equation (5)

defined in terms of the redshift–space coordinate $s = \sqrt{\sigma ^2 + \pi ^2}$.

This expression for J3 requires an assumed model for ξ(s), but the final measurement of the correlation function is not sensitive to this assumed input for object weighting; we assume a power-law form

Equation (6)

and we test our robustness by first assuming a fiducial value found for optically selected samples, s0 = 5.0 h−1 Mpc and γ = 1.8, and then iterating to the values of s0 and γ measured for α.40. No statistically significant difference is observed through this iterative process, and we therefore proceed as other authors have, using the fiducial optical values reported here in our J3 weighting. Following Fisher et al. (1994a, 1994b), we apply an artificial cutoff with a maximum value of s = 30 h−1 Mpc in the expression for J3.
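For a power-law ξ(s), the J3 integral has a closed form, which the sketch below evaluates with the fiducial values and the 30 h−1 Mpc cutoff quoted above. It assumes the usual definition J3(s) = ∫ ξ(s') s'² ds' from 0 to s and the minimum-variance weight form w = 1/(1 + 4π n S J3) used by, e.g., Hawkins et al. (2003). The names `j3_power_law`, `pair_weight`, and the mean-density argument `n_mean` are hypothetical, and the exact normalization used in the paper's Equation (4) may differ.

```python
import math


def j3_power_law(s, s0=5.0, gamma=1.8, s_cut=30.0):
    """J3(s) for xi(s) = (s/s0)^-gamma, frozen beyond s_cut (h^-1 Mpc).

    Closed form of the integral of xi(s') s'^2 from 0 to s (gamma < 3):
        J3 = s0^gamma * s^(3 - gamma) / (3 - gamma)
    """
    s_eff = min(s, s_cut)
    return s0 ** gamma * s_eff ** (3.0 - gamma) / (3.0 - gamma)


def pair_weight(s, sel_i, sel_j, n_mean, s0=5.0, gamma=1.8):
    """Weight w_ij = w_i * w_j for a pair at redshift-space separation s.

    sel_i, sel_j : selection function values at the two galaxy distances
    n_mean       : assumed mean number density (hypothetical parameter)
    """
    j3 = j3_power_law(s, s0, gamma)
    w_i = 1.0 / (1.0 + 4.0 * math.pi * n_mean * sel_i * j3)
    w_j = 1.0 / (1.0 + 4.0 * math.pi * n_mean * sel_j * j3)
    return w_i * w_j
```

Galaxies near the peak of the selection function receive smaller weights, so they do not dominate the pair counts.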

3.2. Random Samples

We construct random samples that contain 20 times the number of objects in the α.40 data set. These random samples are carefully designed to include survey selection effects while excluding correlations due to large-scale structure. This is accomplished by predicting the distribution of czCMB from the survey selection and H i mass functions (see Figure 1) and then folding in the loss of volume as a function of velocity due to radio frequency interference, measured from the spectral weights map in Martin et al. (2010). Objects in the random set are randomly assigned a sky position within the right ascension and declination boundaries of α.40 and are then assigned a redshift from this predicted distribution. The resulting redshift distribution for one example instance of the random sample procedure is shown in Figure 2.
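The procedure can be sketched as follows. The example assumes a rectangular RA/Dec footprint and a tabulated velocity distribution, `cz_pdf`, that already includes the selection function and the radio frequency interference weights; both are simplifications, and the function and argument names are hypothetical. Declinations are drawn uniformly in sin(Dec) so that points are uniform per unit solid angle.

```python
import bisect
import math
import random


def draw_random_catalog(n, ra_range, dec_range, cz_centers, cz_pdf, seed=0):
    """Draw n unclustered (ra, dec, cz) points inside a rectangular footprint.

    ra_range, dec_range : (min, max) in degrees
    cz_centers, cz_pdf  : predicted velocity distribution (bin centers and
                          relative counts), assumed to fold in RFI weights
    """
    rng = random.Random(seed)
    total = float(sum(cz_pdf))
    cdf, acc = [], 0.0
    for p in cz_pdf:
        acc += p / total
        cdf.append(acc)
    sin_lo = math.sin(math.radians(dec_range[0]))
    sin_hi = math.sin(math.radians(dec_range[1]))
    catalog = []
    for _ in range(n):
        ra = rng.uniform(ra_range[0], ra_range[1])
        # Uniform in sin(dec) gives uniform density on the sphere.
        dec = math.degrees(math.asin(rng.uniform(sin_lo, sin_hi)))
        # Inverse-CDF sampling of the velocity bin.
        k = min(bisect.bisect_right(cdf, rng.random()), len(cdf) - 1)
        catalog.append((ra, dec, cz_centers[k]))
    return catalog


randoms = draw_random_catalog(
    200, (0.0, 10.0), (4.0, 16.0), [1000.0, 8000.0, 12000.0], [1.0, 0.2, 1.0]
)
```

A velocity bin with zero relative weight (fully flagged by interference) is never drawn, which reproduces dips like those in the random redshift distribution.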


Figure 2. Redshift distribution of the constructed random sample. The dips in the distribution at ∼8000 km s−1  are due to radio frequency interference at the Arecibo Observatory. When data at these frequencies are flagged as bad (and thus ignored in the processing pipeline), it leads to a reduction in the effective search volume at the corresponding velocities, which translates into a reduction in counts in the random samples. See Martin et al. (2010) for a plot of the average relative weight as a function of velocity in α.40.


3.3. Error Analysis

The correlation function is measured in bins of separation. Although the correlation function is expressed as a function of several different coordinates as we iterate toward the real-space correlation function ξ(r), the bin counts and thus the measured correlation functions are correlated with one another in every such coordinate system. Because structures, such as clusters, will contribute an overabundance of pairs to a set of several bins, the measurement in each bin is not independent of the others. In plots of the correlation function shown here, we display the on-diagonal elements of the covariance matrix (i.e., the standard deviations) as uncertainties on each point. However, in order to work with our measurement to estimate the power-law shape of the correlation function of gas-rich galaxies, we must construct a full covariance matrix and take off-diagonal elements into account.

To construct the covariance matrix C, we carry out our pair-counting routine on more than 500 bootstrap resamplings of the data, and a single catalog of random objects is reused in each case. Each of the bootstrap measurements of ξ(σ, π) contains Ng galaxies selected at random from α.40, with replacement. From this set of realizations, we construct the covariance matrices for ξ(s), Ξ(σ) and ξ(r). The covariance between two correlation function bins bl and bm is given by

Equation (7)
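A minimal sketch of the bootstrap covariance estimate follows. It assumes the sample covariance form, C_lm = Σ_k (ξ_k(b_l) − ⟨ξ(b_l)⟩)(ξ_k(b_m) − ⟨ξ(b_m)⟩)/(N − 1) over N bootstrap realizations; the 1/(N − 1) normalization and the helper names are assumptions of this example rather than details confirmed by the text.

```python
import random


def bootstrap_resample(galaxies, rng):
    """Draw len(galaxies) objects at random, with replacement."""
    return rng.choices(galaxies, k=len(galaxies))


def covariance_matrix(realizations):
    """Covariance between bins from a list of bootstrap measurements.

    realizations : list of equal-length lists, one correlation function
                   measurement (one value per bin) per bootstrap sample
    """
    n_real = len(realizations)
    n_bins = len(realizations[0])
    means = [sum(r[l] for r in realizations) / n_real for l in range(n_bins)]
    cov = [[0.0] * n_bins for _ in range(n_bins)]
    for l in range(n_bins):
        for m in range(n_bins):
            cov[l][m] = sum(
                (r[l] - means[l]) * (r[m] - means[m]) for r in realizations
            ) / (n_real - 1)
    return cov


# Three toy "measurements" in two perfectly correlated bins
cov = covariance_matrix([[1.0, 2.0], [3.0, 6.0], [5.0, 10.0]])
```

In practice each realization would come from rerunning the pair counts on one `bootstrap_resample` of the catalog, with the same random catalog reused each time.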

The significant off-diagonal elements of the covariance matrix make it difficult to obtain a power-law model fit by minimizing the χ2 values weighted by the variance. The covariance matrix, however, is not an inescapable quality of the data, but is actually dependent on the basis in which the data are projected. In this case, we have some number of bins Nb representing a set of variables b (bin centers in h−1 Mpc), and can choose to work in an orthonormal basis with Nb coordinate axes in which the covariance matrix C is diagonalized. This basis is defined by the principal component eigenvectors of the measurement, and we borrow elements of principal component analysis in order to obtain model parameter fits and uncertainty estimates.

The principal components are linear combinations of the original Nb variables arranged such that the first principal component corresponds to an orthonormal axis through Nb-dimensional space that explains the largest proportion of variance in the data set. These principal component vectors are defined by the eigenvectors of the covariance matrix of the original data set.

Following Fisher et al. (1994a), we calculate the principal eigenvectors and construct a diagonalizing matrix, R, the columns of which are these eigenvectors, and a new covariance matrix, $\tilde{C}$, projected in the new basis set. Since all of the covariance has been accounted for in the definition of the principal components, $\tilde{C}$ has no off-diagonal elements, and the variance is captured in the on-diagonal elements $\tilde{\sigma }$.

Given $\tilde{C}$ and R, a set of models with varying values for s0 and γ can be projected into the principal component basis via $\tilde{b}_{\rm model} = R^{T} b_{\rm model}$, for comparison to the measured $\tilde{b}$. We find the value of each parameter that minimizes the expression

Equation (8)
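The fit statistic can be illustrated as below, assuming it takes the usual form χ² = Σ (b̃_i − b̃_model,i)² / σ̃_i², summed over principal components, with σ̃_i² the eigenvalues of C. This is a sketch using NumPy's symmetric eigendecomposition, not the authors' code, and the function name is hypothetical.

```python
import numpy as np


def chi2_in_pc_basis(data, model, cov):
    """Chi-squared of a model evaluated in the basis where cov is diagonal.

    data, model : measured and model values in the original bins
    cov         : covariance matrix of the measurement (symmetric,
                  positive definite)
    """
    # Columns of R are the orthonormal eigenvectors of cov; eigvals are
    # the variances along each principal component.
    eigvals, R = np.linalg.eigh(np.asarray(cov, dtype=float))
    resid = R.T @ (np.asarray(data, dtype=float) - np.asarray(model, dtype=float))
    return float(np.sum(resid ** 2 / eigvals))


chi2 = chi2_in_pc_basis([1.0, 0.0], [0.0, 0.0], [[2.0, 1.0], [1.0, 2.0]])
```

The result equals the full-covariance statistic (d − m)ᵀ C⁻¹ (d − m), so working in the rotated basis loses no information while making the bins formally independent.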

Finally, we construct error ellipses to fully describe the likely parameter space of the power-law model for the correlation function (Press et al. 1992).

3.4. Obtaining the Real-space Correlation Function

From the three-dimensional galaxy coordinates available to us, we can construct ξ(σ, π). This calculation of ξ(σ, π) is the fundamental measurement upon which the results presented in the rest of this work are based. The resulting image, shown in Figure 3 with contours overplotted, clearly reveals the redshift–space distortions that lead to the difficulty in estimating the real-space correlation function. The radial coordinate, π, appears weakly stretched at small angular separation σ because of the "finger of God" effect in clusters, though because H i-selected galaxies are known to avoid dense cluster regions, this effect is less prominent for α.40 than for optically selected samples. In the other dimension, π is flattened on large scales because of the coherent motion of galaxies toward attractors. This "squeezing" effect is determined by the clustering bias (with respect to dark matter) of the sample and by the underlying matter density fluctuations. In a future work, we will explore the shape of ξ(σ, π) to model the matter density field.


Figure 3. Two-dimensional correlation function ξ(σ, π) from α.40, measured in h−1 Mpc; brighter colors indicate stronger clustering.


In order to obtain the real-space correlation function ξ(r), we must take the intermediate step of projecting ξ(σ, π) along the π axis (in practice, using the discrete bins of size Δπ), resulting in what is known as the "projected correlation function" and symbolized as Ξ(σ)/σ:

Equation (9)

Following previous work, the maximum value of the integration, πmax, is selected so that the summation is convergent but is kept as low as possible to avoid the introduction of noise from poorly measured large scales. From the original estimation of ξ(σ, π), which counted pairs up to distances of ∼60 h−1 Mpc, we carry the sum along the π axis up to a scale of πmax = 29.7 h−1 Mpc. Further, we confirm that the resulting correlation function is not strongly sensitive to the chosen value of πmax, but extending the integration to scales that are too large for the sample to sufficiently measure introduces scatter and noise into the correlation function estimate.
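The discrete projection can be sketched as below, assuming the standard form Ξ(σ) = 2 Σ ξ(σ, π_i) Δπ with equal-width π bins starting at π = 0; dividing the result by σ gives Ξ(σ)/σ. The function name and the toy bin values are assumptions of this example.

```python
def projected_correlation(xi_row, d_pi, pi_max):
    """Sum xi(sigma, pi) over pi bins, for one fixed sigma bin.

    xi_row : xi(sigma, pi_i) in consecutive pi bins of width d_pi,
             starting at pi = 0
    d_pi   : pi bin width in h^-1 Mpc
    pi_max : upper limit of the sum in h^-1 Mpc

    Returns Xi(sigma) = 2 * sum_i xi(sigma, pi_i) * d_pi.
    """
    n_bins = int(round(pi_max / d_pi))
    return 2.0 * d_pi * sum(xi_row[:n_bins])


# Toy row of xi values in 1 h^-1 Mpc bins, summed out to pi_max = 3
big_xi = projected_correlation([1.0, 0.5, 0.25, 0.1], d_pi=1.0, pi_max=3.0)
```

Repeating the call with several values of `pi_max` is a simple way to check the convergence described above.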

Ξ(σ)/σ is closely related to the function in which we are truly interested, ξ(r), where r is the real-space distance, via

Equation (10)

In order to evaluate the real-space correlation function, some assumptions must be made about its form. Two options are usually explored in the literature: a power-law form, or a stepwise-function form which makes no assumptions about shape but does assume that the binning used well represents an underlying smooth correlation function (i.e., the "direct inversion" method). If we assume a power law of the form $\xi(r) = (r/r_0)^{-\gamma}$, we find

Equation (11)

In Equation (11), the function Γ is the well-known Gamma function. Equation (11) can be recast in terms of fitting parameters:

Equation (12)

Following Meyer et al. (2007), we rearrange Equation (12), obtain the best-fit power law of the form $\Xi(\sigma) = a_1 \, \sigma^{a_2}$ using the χ2 minimization given by Equation (8), and then relate those parameters to r0 and γ which represent the best-fit power law for ξ(r).
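The conversion from fit parameters to r0 and γ can be sketched as below. It assumes the standard power-law result Ξ(σ)/σ = (r0/σ)^γ Γ(1/2) Γ((γ−1)/2) / Γ(γ/2), so that a fit of the form Ξ(σ) = a1 σ^a2 implies γ = 1 − a2 and r0 = [a1 Γ(γ/2) / (Γ(1/2) Γ((γ−1)/2))]^(1/γ). The function names are hypothetical.

```python
import math


def amplitude_factor(gamma):
    """Gamma-function factor relating Xi(sigma)/sigma to (r0/sigma)^gamma."""
    return math.gamma(0.5) * math.gamma((gamma - 1.0) / 2.0) / math.gamma(gamma / 2.0)


def power_law_from_fit(a1, a2):
    """Convert Xi(sigma) = a1 * sigma**a2 into (r0, gamma); requires gamma > 1."""
    gamma = 1.0 - a2
    r0 = (a1 / amplitude_factor(gamma)) ** (1.0 / gamma)
    return r0, gamma


# Round trip: build (a1, a2) from known parameters, then recover them
r0_in, gamma_in = 3.3, 1.51
a1 = r0_in ** gamma_in * amplitude_factor(gamma_in)
a2 = 1.0 - gamma_in
r0_out, gamma_out = power_law_from_fit(a1, a2)
```

The round trip is only a consistency check of the algebra; it says nothing about the uncertainties, which come from the χ2 contours.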

In the next section, we derive and discuss ξ(r) using the mechanisms described in this section.

4. RESULTS: CLUSTERING IN α.40

4.1. Ξ(σ)/σ and ξ(r) Assuming Power-law Model

The projected correlation function Ξ(σ)/σ (recast for the figure and the fitting as Ξ(σ)) is displayed in Figure 4, along with error bars reflecting the on-diagonal elements of the full covariance matrix. The dashed line is the best-fit model obtained by χ2 minimization using the full covariance matrix. In Table 1, we list the parameters for the fit and their uncertainties, along with the fits obtained if only the on-diagonal elements (the standard deviations, σ) are used to carry out the standard least-squares fit. For comparison, we also include the clustering reported by the HIPASS team (Meyer et al. 2007; note that those authors ignored the off-diagonal elements in their error analysis), the clustering found by Basilakos et al. (2007) using the same HIPASS data set, and the clustering of several optically selected samples of interest. We also display the Passmoor et al. (2011) results, which used a small, publicly available early subset of the ALFALFA data.


Figure 4. Projected correlation function Ξ(σ) from α.40. Error bars reflect the on-diagonal elements of the full covariance matrix. The overplotted dashed line is the fit from the full covariance analysis, with γ = 1.51 ± 0.09 and $r_0 = 3.3^{+0.3}_{-0.2}$ h−1 Mpc.


Table 1. Best-fit Correlation Function Power-law Models

| Fitting Method | r0 (h−1 Mpc) | γ |
|---|---|---|
| Full covariance | 3.3 (+0.3, −0.2) | 1.51 (± 0.09) |
| On-diagonal only | 3.2 (± 0.1) | 1.48 (± 0.03) |
| Passmoor ALFALFA^a | 2.3 (± 0.6) | 1.6 (± 0.1) |
| HIPASSa (2007)^b | 3.5 (± 0.3) | 1.47 (± 0.08) |
| HIPASSb^c | 3.3 (± 0.3) | 1.4 (± 0.2) |
| 2dFGRS late-type faint^d | 3.7 (± 0.8) | 1.8 (± 0.1) |
| SDSS bright^e | 6.2 (± 0.2) | 1.85 (± 0.03) |
| SDSS faint^e | 3.5 (± 0.3) | 1.92 (± 0.05) |
| IRAS All-Sky (real space)^f | 3.76 (± 0.20) | 1.66 (± 0.10) |
| QDOT^g | 3.87 (± 0.32) | 1.11 (± 0.09) |
| Pisces-Perseus early types^h | 8.35 (± 0.75) | 2.05 (± 0.10) |
| Pisces-Perseus late types^h | 5.55 (± 0.45) | 1.73 (± 0.08) |

Notes. ^a Passmoor et al. (2011). ^b Meyer et al. (2007). ^c Basilakos et al. (2007). ^d We include the second-faintest sample due to a warning in Norberg et al. (2002) that the faintest (and smallest) sample provides poorly constrained fits. ^e Zehavi et al. (2005). ^f Fisher et al. (1994a). ^g Moore et al. (1994). ^h Guzzo et al. (1997).


Error ellipses are displayed in Figure 5, with the dashed contour giving the 1σ single-parameter uncertainties listed in Table 1.


Figure 5. χ2 contours for γ and r0 (h−1 Mpc). The dashed contour gives the 1σ projected uncertainties on γ and r0 as single free parameters, and the solid contours give joint 1, 2, and 3σ fits, respectively, to the pair of two free parameters.


While both the full covariance analysis and the assumption of bin independence give similar results, the larger uncertainties from the full covariance analysis indicate the need to be conservative. Parameter uncertainties previously reported in the literature (e.g., Meyer et al. 2007) without the off-diagonal elements significantly underestimate the true statistical uncertainties. Even with the greater sensitivity, larger sample size, and deeper redshift range of α.40, the correlation function analysis allows for quite a large range of clustering scenarios.

We confirm the HIPASS result that H i-selected galaxies are among the most weakly clustered known classes of galaxies, most comparable to, but still less clustered than, the faint late-type subsampling in 2dFGRS and the IRAS galaxy redshift survey, which was also biased toward star-forming galaxies. Similar to the results we will present in Section 5 for α.40, the flux-limited sample of IRAS galaxies considered in Fisher et al. (1994a) was found to be antibiased relative to cold dark matter on small scales but unbiased on intermediate scales (∼10 h−1 Mpc) and positively biased on the largest scales (beyond 10 h−1 Mpc). Similarly, the QDOT sample (Moore et al. 1994), also taken from the IRAS parent catalog but based on a lower flux limit and employing a different sampling strategy, was found to be unbiased with respect to dark matter.

Guzzo et al. (1997) provide an interesting comparison to our findings, as that work was also based on an analysis of 21 cm galaxy profiles observed with the Arecibo Observatory, although it used an optically selected, magnitude-limited sample rather than a blindly H i-selected sample as in this case. Their sample also covers a region that partially overlaps with α.40. Guzzo et al. (1997) split their sample by morphological type and determined the variation of clustering strength between early- and late-type (spiral and irregular) galaxies in Zwicky's catalog within the Pisces-Perseus region. They found that the early types were significantly more clustered than the late types, as reflected more generally in Table 1, but their volume-limited sample is significantly more clustered than the H i-selected ALFALFA sample.

Our findings for the clustering of H i-selected galaxies are in agreement with previous results, particularly with our understanding that ALFALFA galaxies tend to be blue, spiral, and late-type galaxies which are already known to be weakly clustered (Norberg et al. 2002; Kauffmann et al. 2004; Zehavi et al. 2005). Apart from the estimates of uncertainties, the clustering of α.40 is in agreement with the HIPASS findings but not with Passmoor et al. (2011), which is not unexpected given the weaknesses of the latter, in particular its extremely small sample size.

4.2. ξ(r) via the Inversion Method and the Shape of the Correlation Function

ξ(r) will only be tidily related to Ξ(σ)/σ if we assume that the underlying physics of galaxy formation dictates a power-law form for ξ(r). In the previous section, we calculated the correlation function of gas-rich galaxies under that assumption, but it is also possible to avoid that assumption and obtain ξ(r) by direct inversion of the projected correlation function Ξ. The inversion method tests the power-law assumption, though it is a noisy measurement and results in large scatter. As an independent test of the shape of the correlation function, ξ(r) determined via this inversion method will be especially useful at scales above ∼2.5 h−1 Mpc, where the correlation function shows features inconsistent with the power-law assumption.

Following Meyer et al. (2007), Hawkins et al. (2003), and Saunders et al. (1992), we take our measurement of Ξ(σ) to represent an underlying step function form with values Ξl in intervals with centers σl, rearrange Equation (10), and interpolate between bins to give

Equation (13)

The sum in Equation (13) is truncated so that σmax = πmax for the value of πmax used in Equation (9).
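As an illustration of this kind of inversion, the sketch below implements the piecewise-linear form commonly attributed to Saunders et al. (1992), ξ(σi) = −(1/π) Σj≥i [(Ξj+1 − Ξj)/(σj+1 − σj)] ln{[σj+1 + (σj+1² − σi²)^1/2] / [σj + (σj² − σi²)^1/2]}, with the sum stopping at the last bin. It is offered under the assumption that Equation (13) takes this form; the function name and the test power law are hypothetical and are not taken from the α.40 analysis.

```python
import math
import numpy as np

def invert_projected(sigma, Xi):
    """Estimate xi(r = sigma_i) from binned Xi(sigma).

    Assumes Xi is linear between bin centers and truncates the sum at
    sigma[-1], so the last bin has no estimate.
    """
    sigma = np.asarray(sigma, dtype=float)
    Xi = np.asarray(Xi, dtype=float)
    n = sigma.size
    xi = np.empty(n - 1)
    for i in range(n - 1):
        s_lo, s_hi = sigma[i:-1], sigma[i + 1:]
        slope = (Xi[i + 1:] - Xi[i:-1]) / (s_hi - s_lo)
        log_term = np.log((s_hi + np.sqrt(s_hi**2 - sigma[i]**2))
                          / (s_lo + np.sqrt(s_lo**2 - sigma[i]**2)))
        xi[i] = -np.sum(slope * log_term) / math.pi
    return xi

# Self-check on an analytic power law (illustrative values, not a fit).
r0, gamma = 3.3, 1.5
sigma = np.logspace(-1, 3, 400)
amp = math.gamma(0.5) * math.gamma((gamma - 1) / 2) / math.gamma(gamma / 2)
Xi = amp * sigma * (r0 / sigma) ** gamma      # projected power law
xi_inv = invert_projected(sigma, Xi)
xi_true = (sigma[:-1] / r0) ** (-gamma)       # input real-space power law
```

With noisy data each ξ(σi) depends on every Ξj at larger separation, which is one way to see why the inversion is more vulnerable to scatter than a direct power-law fit.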

The projected correlation function and ξ(r) obtained by the inversion method are in excellent agreement, as shown in Figure 6, where the points are the inversion ξ(r) with on-diagonal uncertainties and the overplotted dashed line is the best-fit power-law model for the projected correlation function. It is also clear that the inversion method, as remarked earlier, is more vulnerable to variance. Its use is motivated as a check on the assumed shape of the correlation function. Furthermore, because the power-law assumption is known to be useful only on small-to-intermediate scales, the correlation function obtained via the inversion method allows us to extend to large scales, ∼30 h−1 Mpc.

Figure 6. ξ(r) obtained via the inversion method, extended to scales ∼30 h−1 Mpc, with the best-fit power-law model for the projected correlation function overplotted as a dashed line to demonstrate agreement.

By relaxing the power-law assumption, we can further examine divergence from a power-law shape, a well-known phenomenon found in clustering studies of other populations of galaxies. The ALFALFA correlation function shows a "shoulder" at scales of ∼ a few h−1 Mpc, as observed by, e.g., Guzzo et al. (1991), Hawkins et al. (2003), and Zehavi et al. (2004). Under the assumption of an inflationary, cold dark matter universe and using a halo occupation distribution (HOD) model, Zehavi et al. (2004) infer that the well-known shoulder is due to two distinct regimes in which galaxy–galaxy pairs are counted. On large scales, pairs are counted from separate dark matter halos, while on small scales, pairs are counted in the same dark matter halo and are subject to nonlinearity.

Because the galaxies probed by ALFALFA are gas rich, and because an H i-selected sample is biased toward gas-dominated low-mass objects that would be classified as low surface brightness (LSB) dwarfs in an optical survey, we expect that the characteristics of the single-halo regime would differ from those observed in the case of an optically selected sample. For example, tidal interactions and stripping within dense halos, which would decrease the pair counts of H i-selected galaxies, would change the relative contributions of the two regimes. Watson et al. (2011) find that a resulting power-law correlation function, when the contributions from both the one-halo and two-halo regimes are included, is only found under conditions in a narrow mass and redshift range for the general population of galaxies. Given a different HOD model as a function of galaxy properties, the shoulder or break from the power law would be more prominent. The ALFALFA correlation function therefore provides yet another approach from which we can better understand the evolution of H i and the distribution of gas-rich galaxies in the present universe. In a future paper, as the ALFALFA data set continues to grow, we will present an HOD analysis of the ALFALFA correlation function as an extension of the present work.

4.3. Systematics and Methodology

Our estimation of ξ(σ, π) included two choices upon which our results could be dependent. We selected both the logarithmic binning intervals for pair counting and the value of πmax used to project ξ(σ, π) into Ξ(σ). Investigations of alternative schemes suggest that our results are not strongly dependent on either the binning or the choice of πmax, though πmax is specifically selected to lead to a stable solution for ξ without introducing the noise and scatter from scales that are poorly probed by the α.40 sample.
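The sensitivity to πmax can be checked by repeating the projection with different cutoffs. The sketch below assumes the projection takes the usual discrete form Ξ(σ) = 2 Σi ξ(σ, πi) Δπi, summed over line-of-sight bins up to πmax; the function name, array layout, and toy grid are hypothetical.

```python
import numpy as np

def project_xi(xi_grid, pi_edges, pi_max):
    """Project xi(sigma, pi) along the line of sight up to pi_max.

    xi_grid has shape (n_sigma, n_pi); pi_edges has length n_pi + 1 and may
    be linearly or logarithmically spaced. Only bins whose upper edge is at
    or below pi_max are included (an assumed convention).
    """
    xi_grid = np.asarray(xi_grid, dtype=float)
    pi_edges = np.asarray(pi_edges, dtype=float)
    widths = np.diff(pi_edges)
    keep = pi_edges[1:] <= pi_max
    return 2.0 * np.sum(xi_grid[:, keep] * widths[keep], axis=1)

# Toy grid: xi = 1 everywhere, so Xi should equal 2 * pi_max.
toy_xi = np.ones((3, 10))
toy_edges = np.linspace(0.0, 50.0, 11)
print(project_xi(toy_xi, toy_edges, 30.0))
```

Comparing the fitted r0 and γ across several values of pi_max is a simple way to confirm that the chosen cutoff sits in a stable range.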

We consider whether extreme redshift distortions nearby, for small values of czCMB, could be contaminating our results. To test this possibility, we repeat the measurement twice, first excluding galaxies with czCMB < 2000 km s−1 and then excluding those with czCMB < 3000 km s−1. These cuts reduce the sample size to ∼9300 and ∼8900, respectively, but we find no difference in the final fitting parameters r0 and γ. We conclude that there is no advantage to be gained in eliminating nearby galaxies from α.40 for the correlation function analysis.

The expression for J3 in Equation (5) requires an expression for the shape of the correlation function. In calculating ξ(σ, π), there is therefore a presumably small dependence on the as-yet-unknown parameters s0 and γ. As briefly mentioned in Section 3.1, one possible way to avoid any potential problems is to use the fiducial optical sample parameters s0 = 5.0 and γ = −1.8 to calculate the parameters for an H i-selected sample, and then iterate toward a stable solution. In attempts to do this, we find that there is no significant difference between the parameters estimated via these two methods, and confirm that ξ is not dependent on the precise form of J3. Such iteration does not provide any advantage. We demonstrate this in Figure 7, which displays the error contours on the power-law fit parameters for a sample limited to czCMB > 2000 km s−1, and for J3 using the approximate parameters estimated in Section 4.1. The results are very close to those in Figure 5 and the 1σ estimated parameters are identical.
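To make the dependence on the assumed shape concrete, the sketch below evaluates J3 for a power law under the assumption that J3(s) is the integral of ξ(x) x² from 0 to s, with ξ(x) = (x/s0)^(−γ) and γ entered as a positive number (so the fiducial optical slope is 1.8). The pair weight shown is the commonly used minimum-variance form 1/(1 + 4π n J3); whether it matches Equation (5) exactly is an assumption, and both function names are hypothetical.

```python
import math

def j3_power_law(s, s0, gamma):
    """Closed-form J3(s) = s0**gamma * s**(3 - gamma) / (3 - gamma).

    Assumes xi(x) = (x/s0)**(-gamma) with gamma < 3.
    """
    if gamma >= 3.0:
        raise ValueError("integral diverges at x = 0 for gamma >= 3")
    return s0 ** gamma * s ** (3.0 - gamma) / (3.0 - gamma)

def pair_weight(n_density, j3):
    """Assumed minimum-variance weight 1 / (1 + 4 pi n J3)."""
    return 1.0 / (1.0 + 4.0 * math.pi * n_density * j3)

# Compare optical-like and H i-like parameter choices at s = 10 (h^-1 Mpc).
j3_optical = j3_power_law(10.0, 5.0, 1.8)
j3_hi = j3_power_law(10.0, 3.4, 1.5)
print(j3_optical, j3_hi)
```

Since J3 enters only through the weights, changing its parameters reweights the pairs without changing which pairs are counted, which is consistent with the weak dependence reported above.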

Figure 7. χ2 contours for γ and r0 (h−1 Mpc) excluding all galaxies with czCMB < 2000 km s−1. The parameters used in J3 are approximations of the H i-selected r0 (3.4 h−1 Mpc) and γ (−1.5). The dashed contour gives the 1σ projected uncertainties on each parameter, and the solid contours give 1, 2, and 3σ fits, respectively, to the pair of parameters.
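If the contours follow the usual Gaussian convention, the projected 1σ uncertainty on a single parameter corresponds to Δχ2 = 1, while joint 1, 2, and 3σ regions for the pair of parameters correspond to Δχ2 ≈ 2.30, 6.18, and 11.83. The short sketch below computes these levels from the χ2 distribution; that the figure uses exactly these levels is an assumption, and the function name is illustrative.

```python
import math

def delta_chi2(n_sigma, n_params):
    """Delta chi^2 enclosing the Gaussian n_sigma probability.

    Supports 1 parameter (projected error) or 2 jointly fitted parameters.
    For 2 degrees of freedom the chi^2 CDF is 1 - exp(-x/2), which inverts
    in closed form.
    """
    if n_params == 1:
        return float(n_sigma) ** 2
    if n_params == 2:
        p = math.erf(n_sigma / math.sqrt(2.0))
        return -2.0 * math.log(1.0 - p)
    raise ValueError("only 1 or 2 parameters are handled in this sketch")

joint_levels = [delta_chi2(k, 2) for k in (1, 2, 3)]
print(joint_levels)
```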

5. DISCUSSION

5.1. Comparison with Mock Catalogs

The correlation function of gas-rich galaxies has implications for the improvement of galaxy simulations, by providing an observational constraint for the results of simulations. This work will allow a better match between simulations and the observed relationship between gas mass and clustering properties. Simulations are just now progressing to the point where reasonable, realistic cold H i gas masses can be assigned to galaxies. In this section, we will compare the results of the correlation analysis of α.40 with presently available cold dark matter simulations.

We are limited in our ability to compare our observations to simulations by what is available publicly. Martin et al. (2010) took advantage of the Obreschkow et al. (2009, hereafter O09) simulation, which assigned cold gas to galaxies from the De Lucia & Blaizot (2007) catalog of Millennium Simulation galaxies. In that work, we found that O09's simulation provided a reasonable fit to the observed H i mass function. However, this catalog may not be adequate for comparison with an observed correlation function. In particular, O09 caution that the mass resolution of the simulation prevents them from applying their findings to faint, LSB, or low-mass galaxies. Given the known correlations between galaxy type and clustering, and between H i mass and luminosity, it would be difficult to use this catalog to explore the relationship between current simulations and ALFALFA's observations. Furthermore, O09 did not themselves carry out this analysis.

Kim et al. (2011) have provided another option for comparison, using a set of four GALFORM semi-analytical models that treats a range of processes which influence gas reservoirs, including cooling, ram-pressure stripping, mergers, star formation, and supernova feedback. They report results over a range of redshifts, but for this work only their results at z = 0 are of interest. They find that the galaxy–galaxy correlation function of the simulations is consistent with those found for HIPASS, and confirm that their simulation shows gas-rich galaxies as being significantly less clustered than dark matter. Differences between the models and the scales at which those differences are important can be used to highlight potential problems in the assumptions, such as models that overpredict the gas richness of satellites.

In Figure 8, we compare the models presented in Kim et al. (2011) with the observed α.40 correlation function for H i-selected galaxies. The models include that of Bower et al. (2006), labeled as Bow06; a modified version of the same, labeled MHIBow06; a version that uses a slightly different background cosmology that is in better agreement with the Wilkinson Microwave Anisotropy Probe (WMAP) parameters, labeled GpcBow06; and, finally, the model of Font et al. (2008), labeled as Font08. In all models, only galaxies with Mcold > 10^9.5 h−2 M⊙, where MH i = 0.76 Mcold/(1 + 0.04), are included, which matches the HIPASS galaxy selection but may be more massive than would be ideal for matching α.40, which probes very small gas masses (Martin et al. 2010; Haynes et al. 2011). The models are described in detail in Kim et al. (2011), and here we only discuss the main differences that may be relevant for a comparison to α.40.

Figure 8. Models for ξ(σ) from Kim et al. (2011), compared with α.40 (filled points with error bars).

Bow06, MHIBow06, and Font08 all use the Millennium Simulation to track galaxies and halos, while GpcBow06 uses a different method involving merger trees and a large box size. Bow06 and Font08 are able to match optical luminosity functions, but both overpredict the abundance of H i in low-mass galaxies. MHIBow06 was created by adjusting the star formation timescale in Bow06, thereby fixing this excess while maintaining the agreement with optical properties of galaxies. GpcBow06, finally, also has a modified star formation prescription which better fits the H i mass function compared to Bow06.

In Figure 8, it is clear that MHIBow06 and Bow06 fit the observed H i correlation function on small scales, while Font08 drastically overpredicts and GpcBow06 drastically underpredicts the strength of clustering for gas-rich galaxies on those scales. At large scales, both GpcBow06 and Bow06 underpredict the clustering strength, while Font08 and MHIBow06 follow it closely. Although not a perfect match, the MHIBow06 model appears to be most consistent with the clustering of gas-rich galaxies over the full range of accessible scales.

Part of these differences may be due to the mass resolution of the models, given that α.40 probes to significantly lower masses than the HIPASS survey for which these models were designed. What is clear, however, is that α.40 already provides constraints that can begin to differentiate between successful and unsuccessful models, and the full ALFALFA sample should be able to provide very robust constraints for testing simulations that take H i into account.

5.2. The Bias Parameter for H i-selected Galaxies

The bias between any two classes of objects indicates their relative clustering strength. For cosmological purposes, we are interested in comparing the clustering of types of galaxies with the underlying dark matter halo distribution, in order to understand how well future surveys would probe the true (baryonic + dark) mass distribution. The comparison is achieved through the linear bias parameter at z = 0, b0:

Equation (14)

In general, b0 ≠ b0(r), based on the linear theory prediction that b0 is independent of scale. In real galaxies, however, this is expected to become true only above intermediate scales of ∼10 h−1 Mpc. If b0 > 1.0, as is the case for red galaxies which tend to be found in clusters (e.g., Guzzo et al. 1997; Norberg et al. 2002; Zehavi et al. 2005; Li et al. 2006; Swanson et al. 2008), then the distribution is positively biased with respect to the dark matter. For galaxies like H i-selected populations, b0 < 1.0 and they are said to be antibiased.

As a proxy for the underlying dark matter distribution at z = 0, we use the correlation function of dark matter halos from the Millennium Simulation, given in Springel et al. (2005) as a function over the same scales that we are interested in. The Millennium Simulation, however, used an early WMAP estimated value for the parameter σ8 of ∼0.9 that is generally recognized to have been high. We have adjusted our calculations to use the recommended value σ8 = 0.8 from the seven-year WMAP (WMAP7) results (Larson et al. 2011).

In Figure 9, we compare that correlation function to the α.40 observation of ξ(r) for H i-selected galaxies using the inversion method. The dark matter correlation function deviates strongly from a power law on small scales, an indication of the well-known fact that bias is scale dependent. Dark matter is, as expected, significantly more strongly clustered than this particular population of galaxies, but the bias becomes less significant at intermediate and large scales.

Figure 9. Real-space correlation function ξ(r) for dark matter from the Millennium Simulation (solid line, and adjusted for the WMAP7 value of σ8) and for H i-selected galaxies from α.40 (direct inversion method points with error bars, along with the best-fit power law plotted as dashed line; model fit given in Table 1).

In Figure 10, we display the bias parameter as a function of scale. The error bars are based only on the α.40 uncertainties and they assume that there is no uncertainty in the Millennium Simulation's measurement of the correlation function. This figure reflects what we already understand about the clustering properties of H i-selected galaxies: on small scales, the clustering of gas-rich galaxies is weaker, and on ever-larger scales the distribution of gas-rich galaxies begins to more closely reflect the underlying matter distribution. Basilakos et al. (2007) measured the linear bias parameter on large scales for the HIPASS sample using a different technique. Exploring modeled dark matter power spectra for different values of b0, including bias and assuming a concordance cosmology, they identified the most likely bias parameter. They found b0 = 0.7 ± 0.1, in general agreement with the findings for α.40 though we have measured the bias parameter as a function of scale. The preliminary work of Passmoor et al. (2011) is generally consistent with this result, though that work does not capture our finding that the sample becomes unbiased on large scales.
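A scale-dependent bias of this kind can be sketched as follows, assuming the usual definition b0(r) = [ξgal(r)/ξDM(r)]^1/2 and first-order error propagation from the galaxy measurement alone. The σ8 adjustment is modeled here as a simple amplitude rescaling by (0.8/0.9)², which holds strictly only in the linear regime and is an assumption, since the adjustment procedure is not spelled out in the text. The function name and inputs are hypothetical.

```python
import numpy as np

def bias_parameter(xi_gal, xi_gal_err, xi_dm, sigma8_sim=0.9, sigma8_target=0.8):
    """Return b(r) and its uncertainty from galaxy and dark matter xi(r).

    The dark matter curve is rescaled by (sigma8_target / sigma8_sim)**2
    (assumed linear-theory scaling) and treated as having no uncertainty.
    """
    xi_gal = np.asarray(xi_gal, dtype=float)
    xi_gal_err = np.asarray(xi_gal_err, dtype=float)
    xi_dm = np.asarray(xi_dm, dtype=float)
    xi_dm_scaled = xi_dm * (sigma8_target / sigma8_sim) ** 2
    b = np.sqrt(xi_gal / xi_dm_scaled)
    # d(sqrt(x))/dx = 1 / (2 sqrt(x)), so sigma_b = 0.5 * b * sigma_xi / xi
    b_err = 0.5 * b * xi_gal_err / xi_gal
    return b, b_err

# Toy inputs only; these are not values from the alpha.40 measurement.
b, b_err = bias_parameter([1.0, 0.5], [0.2, 0.1], [1.0, 0.5],
                          sigma8_sim=1.0, sigma8_target=1.0)
print(b, b_err)
```

Because the dark matter curve is treated as exact, the returned uncertainties are lower limits on the true error in b(r), in line with the caveat stated above for the error bars.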

Figure 10. Bias parameter b0(r) as a function of scale.

Because of the very limited redshift extent of ALFALFA and the α.40 sample, our work cannot comment on the evolution of clustering or bias for H i-selected objects. However, as a robust measurement of these properties at z = 0, we do provide a benchmark observational constraint with which theoretical models will need to agree. The earliest work comparing the bias of H i at z = 0 to the anticipated evolution, by Basilakos et al. (2007), determined b0 ∼ 0.68 today and predicted b4 would range from ∼2 to 4 by z = 4. Recently, Marín et al. (2010) have used a different, simple bias model, incorporating observational constraints, which relates MH i to MDM to estimate H i masses of Millennium Simulation halos, and then investigated the bias of H i with respect to the halo distribution. At z = 0, they estimate that the overall linear bias parameter on large scales is ∼0.8. Their Figure 6 is more comparable to our Figure 10, and shows the same overall rise of the bias with increasing scale found here. Their models also predict that the bias will rise sharply with redshift, with the linear bias parameter reaching b4 ∼ 2 by z ∼ 4.

The α.40 observations and Marín et al. (2010) predictions have implications for large-scale 21 cm galaxy surveys and intensity mapping projects with such instruments as the SKA. If the theoretical results reflect the true evolution of H i gas in the universe, then these projects can expect strong 21 cm signals at a range of redshifts. Perhaps more importantly, the α.40 observation of the correlation function at low redshift provides a robust baseline constraint for the development of SKA model predictions, and future simulations will need to match both the H i mass function and the correlation function at z ∼ 0.

Because the H i-selected galaxy bias is likely to be strongly dependent on H i mass, with low-mass objects severely antibiased with respect to the underlying dark matter distribution, high-redshift surveys which are sensitive only to the high-mass end of the HIMF should expect to be mildly antibiased at low redshifts and increasingly positively biased at intermediate to high redshifts. We will explore the mass, color, and luminosity dependence of the correlation function as the focus of a future work.

6. SUMMARY AND CONCLUSIONS

We have used the ∼10,150 galaxy α.40 sample to measure the correlation function of H i-selected galaxies in the local universe. We use bootstrap resampling and a full covariance analysis in order to model the real-space correlation function on scales <10 h−1 Mpc as a power law, ξ(r) = (r/r0)−γ. We find that γ = 1.51 ± 0.09 and that the clustering scale length is r0 = 3.3 + 0.3, −0.2 h−1 Mpc. Furthermore, we show using a direct inversion method that the observed α.40 real-space correlation function closely follows this power law. The direct inversion method also allows exploration of the divergence from a single power law, seen as a "shoulder" in the correlation function at scales of ∼ a few h−1 Mpc. Our findings are shown to be robust against the precise form of the weighting used in the pairwise estimation of ξ(σ, π) and the α.40 sample selection criteria. The superior sensitivity of ALFALFA and its high selection function allow us to include the full survey redshift range (cz = 0–15,000 km s−1) without the introduction of significant noise in the analysis.

The clustering of H i-selected galaxies is significantly weaker than the clustering of general populations of optically selected galaxies, and is most closely comparable to samples of faint, late-type, blue and/or star-forming galaxies found in optical (and infrared) surveys. Available models of H i in simulated galaxies are in general agreement with our observations, and the α.40 measurement of the correlation function is robust enough to begin constraining these models.

Finally, we measure the bias parameter for α.40, using the correlation function of dark matter halos from the Millennium Simulation, and find that the small-scale clustering of H i galaxies is severely antibiased with respect to the underlying dark matter distribution. On large scales, the antibiasing becomes only moderate. We suggest that isolating the high-mass galaxies in α.40 will show that this population more closely follows the true mass distribution and that an abundance of low-mass galaxies in underdense voids partially explains the strong antibiasing observed.

The α.40 sample provides, for the first time, a robust measurement of the clustering of H i-selected galaxies, which can be used to provide observational constraints for theoretical models. While gas-rich galaxies are, currently, poorly modeled in N-body and semi-analytic simulations of the universe at z = 0, this situation is likely to change given the results presented here and, especially, the full results when the ALFALFA catalog is complete with a sample of ∼30,000 objects.

The models of Kim et al. (2011), which reproduce the clustering characteristics of the HIPASS sample, can now be exploited to attempt to understand the clustering revealed by ALFALFA galaxies. Conversely, we may find that these models are not able to reliably reproduce the more complex characteristics of α.40, particularly the dependence on galaxy characteristics (e.g., H i mass, color). α.40 can therefore contribute to the improvement of these models, working to close the gap between the extremely detailed optical characteristics of simulated galaxies and the poor understanding of where cold gas fits into the picture. To date, such models could only be loosely compared to H i-selected samples, given the lack of a large survey like ALFALFA and robust measurements of cosmological simulations for these samples. Instead, these models typically focus on fitting the luminosities and stellar characteristics of observed galaxies, which are related to gas reservoirs (since gas fuels star formation), but only indirectly.

If simulations can be adjusted now that robust benchmarks exist for the z = 0 characteristics of H i-selected galaxies (e.g., this work, Martin et al. 2010; Papastergis et al. 2011, and others), this could constrain the allowed evolutionary tracks that the distribution of gas reservoirs may have followed. Furthermore, the clustering of H i-selected galaxies, in particular high-mass galaxies, can be applied to make predictions of the strength of the signal that will be obtained with future intensity mapping projects, which will not resolve individual galaxies but will measure the bulk H i on ∼10 Mpc scales. The α.40 measurement of the H i-selected galaxy bias indicates that, at low redshift, the selection of high-H i mass galaxies over large scales ensures a sample that adequately probes the underlying dark matter distribution.

These findings, and the potential for more robust understanding of the role of gas in galaxy evolution, motivate further work in this area. We will further explore clustering properties of gas-rich galaxies as the focus of future work; E. Papastergis et al. 2012 (in preparation) are analyzing the dependence of ξ(r) on such properties as galaxy color, gas fraction, luminosity, and H i mass. ALFALFA has a distinct advantage in exploring this dependence, given its high sensitivity across five orders of magnitude in H i mass, its blind ability to detect both LSB and extremely large, bright spirals, its coverage of a cosmologically representative volume, and its overall sample size.

The authors gratefully acknowledge the work of the entire ALFALFA collaboration team in observing, flagging, and extracting the catalog of galaxies used in this work. We also thank Han-Seek Kim for kindly providing the correlation function models from GALFORM, presented in Kim et al. (2011), and we acknowledge the efforts of the GALFORM team in developing the code that led to those model results.

This work was supported by NSF grants AST-0607007 and AST-1107390, and by grants from the National Defense Science and Engineering Graduate (NDSEG) fellowship and from the Brinson Foundation. A.M.M. was partially supported by an appointment to the NASA Postdoctoral Program at the LaRC, administered by Oak Ridge Associated Universities through a contract with NASA.

Footnotes

  • The Arecibo Observatory is operated by SRI International under a cooperative agreement with the National Science Foundation (AST-1100968), and in alliance with Ana G. Mendez-Universidad Metropolitana, and the Universities Space Research Association.
