Quick recipes for gravitational-wave selection effects

Accurate modeling of selection effects is a key ingredient to the success of gravitational-wave astronomy. The detection probability plays a crucial role in both statistical population studies, where it enters the hierarchical Bayesian likelihood, and astrophysical modeling, where it is used to convert predictions from population-synthesis codes into observable distributions. We review the most commonly used approximations, extend them, and present some recipes for a straightforward implementation. These include a closed-form expression capturing both multiple detectors and noise realizations written in terms of the so-called Marcum Q-function and a ready-to-use mapping between signal-to-noise ratio (SNR) thresholds and false-alarm rates from state-of-the-art detection pipelines. The bias introduced by approximating the matched filter SNR with the optimal SNR is not symmetric: sources that are nominally below threshold are more likely to be detected than sources above threshold are to be missed. Using both analytical considerations and software injections in detection pipelines, we confirm that including noise realizations when estimating the selection function introduces an average variation of a few %. This effect is most relevant for large catalogs and specific subpopulations of sources at the edge of detectability (e.g. high redshifts).


Introduction
Gravitation-wave (GW) surveys are affected by selection biases.GW selection effects are relatively clean to model compared to those of conventional (i.e.electromagnetic) astronomy because, unlike photons, GWs travel unaffected across the Universe from emission to detection.That said, our detectors are not equally sensitive to compact binaries with different parameters (masses, spins, distance, inclination, etc.), which implies their observational coverage is not uniform.Selection-effect modeling is crucial in GW population studies [1,2] and it is not an exaggeration to say that accurately estimating the probability of detection -commonly referred to as p det -is a key ingredient to the success of GW astronomy as a whole.Indeed, it was shown [3] that modeling errors in the selection function will be (and perhaps already are) the leading limiting factor in our astrophysical inference, or at least that which scales more severely with the number of observed events.
The modern and more accurate approach to estimating selection biases is that of performing software injections using the same pipelines that are used for detection [4].
While accurate, this procedure is computationally expensive, requiring reweighting schemes [5], calibration factors [6], and an effective number of samples that is at least a factor of a few greater than the number of events in the catalog [7].In practice, many astrophysical predictions in the field relies on (semi-)analytical approximations to p det .Seminal work in this direction was presented by Finn and Chernoff [8], who approximated the detection statistics using the optimal signal-to-noise ratio (SNR) and factored out the dependence on the binary extrinsic parameters.Their approach is still widely adopted, with SNR thresholds ρ t ranging from ∼ 8 to ∼ 12 [9].More recent advances include those by Essick [10], who presented a semi-analytical model of GW detectability capturing the finite size of template banks, with all the associated complexities.Additional attempts leveraging both analytical [11] and machine-learning [12,13] techniques have also been explored.One can also calibrate SNR thresholds a posteriori, i.e. using the events that are considered detected [14].
In this paper, we review and extend some of the most commonly used approximations to GW selection effects, notably including noise realizations (Sec.2).For a single detector, we further develop the marginalization over the extrinsic parameters first presented in Ref. [8].For multiple detectors, we show that thresholding the matched filter SNR results in an expression for p det that can be written down in closed form using special functions.In short, it is sufficient to substitute a sharp step function with a so-called Marcum Q-function [15,16,17].We apply our findings to both controlled distributions and LIGO/Virgo injections (Sec.3).Thresholding the matched filter SNR instead of the optimal SNR results in values of p det that are systematically higher, and thus merger rates that are systematically lower.While this effect is of a few %, its impact grows dramatically with the size of the GW catalog [3] and disproportionally affects those specific regions of the parameters space where sources are close to the detection horizon [18,19,20].Our expressions are straightforward to implement and can be used to quickly post-process large samples of simulated sources such as the outputs of astrophysical population-synthesis codes (Sec.4).To facilitate the exploitation of our findings, we also describe a straightforward implementation of the Marcum Q-function for the Python programming language (Appendix A).

SNR thresholds
We organize our calculation in three steps of increasing complexity.In what follows, the symbol θ collectively denotes the intrinsic parameters of a compact binary (e.g.masses, spins) as well as the distance to the sources, while ξ indicates the remaining extrinsic parameters (inclination, sky location, and polarization angle).

Single detector & optimal SNR
Let us first consider the case of a single detector.The optimal SNR is given by where f indicates frequency, h(f ) is the GW strain, and S(f ) is the one-sided power spectral density of the detector.The word "optimal" has sometimes been used in the literature (including by some of the authors [12]) to indicate the SNR of an optimally oriented source.In this paper, we refer to the more common meaning of optimality with respect to noise realizations.A common approximation [8] is to take ρ opt as a detection statistics, i.e. assume that a source is detectable if the optimal SNR is greater than a given threshold ρ t .That is, we write where I is an indicator function equal to one if the condition inside the brackets is true and zero otherwise.* For a given distribution of extrinsic parameters p(ξ), one can compute (with an abuse of notation) For a single detector, the optimal SNR factorizes as follows [8,12] where 0 ≤ ω ≤ 1 is a projection parameter and ρ max (θ) is the optimal SNR of an optimally oriented source (i.e. a binary with face-on inclination located overhead the detector).In particular, one has ξ = {ι, ϑ, ϕ, ψ} and where ι is the binary inclination, ϑ and ϕ are a polar and an azimuthal angle for the sky location, and ψ is the polarization angle.Note that the factorization of Eq. ( 4) breaks down for precessing sources because the inclination of the orbital plane depends on the emitted frequency.That said, this effect is likely to be mild because current GW observations cover ≲ 1 precession cycle.With this factorization, the integral in Eq. ( 3) becomes [8] p det (θ Assuming that the distribution p(ξ) of the extrinsic parameters is known, this implies that selection effects can be estimated by evaluating the complementary cumulative distribution function of ω [8] The simplicity of this approximation is appealing: estimating selection effects reduces to evaluating a single waveform h(f ) for an optimally oriented source with intrinsic parameters θ, calculating ρ max from Eq. ( 1), and evaluating the universal function p(> ω) at ω = ρ t /ρ max .Figure 1 shows the outcome of this procedure assuming that sources are distributed isotropically in the sky (i.e.we take uniform distributions in cos ι, cos ϑ, ϕ, and ψ).To facilitate comparison with the rest of the paper, we show the averaged detectability p det (θ) as a function of ρ max /ρ t instead of its inverse, even though the latter enters Eq. ( 6) directly.For instance, the black curve in the top panel of Fig. 1 indicates that at least ∼ 80% of the binaries with a given set of intrinsic parameters θ will be detectable if at least one of these sources has an SNR that is ∼ 5 times above the detection threshold.

Single detector & matched filter SNR
Because noise realizations, any specific GW source will not be observed with SNR ρ opt but rather with some other value ρ obs .In the standard matched-filtering approach to GW detection, ρ obs is given by the filtered signal (made of both GWs and noise) normalized by its own root-mean-square; see Refs.[21,22].One can thus include the effect of noise realizations in the estimate of p det by thresholding the observed SNR ρ obs instead of ρ opt and computing the expectation value over noise realizations n, i.e.
Assuming the noise in the detector is Gaussian and stationary, the observed SNR is distributed normally around the optimal SNR with unit variance (see Refs. [21,22] but also Ref. [10] for caveats).Because of this property, computing ρ opt reduces to evaluating in Eq. ( 1) and adding a variance of one.One has and thus [23] where erf is the error function.Marginalizing over the extrinsic parameters as above yields Our results are shown in Fig. 1 assuming isotropic sources.Note that, unlike Eq. ( 6), thresholding the observed SNR does not result in a universal function of ρ max (θ)/ρ t but rather a one-parameter family of functions, where the additional parameter is ρ t itself.
In practice, introducing the SNR variance due to noise realizations causes variations in p det that are of O(1%).The effect decreases as ρ t increases: if the threshold is large, it is less likely that introducing a variance of one might turn a detectable event  into a non-detectable event, or vice versa.Broadly speaking, we find that the GW detectability computed by thresholding the matched filter SNR is larger (smaller) than that obtained by thresholding the optimal SNR when signals are weak (loud).From Fig. 1, the transition between these two behaviors takes place at ρ max ≃ 4ρ t .The largest deviations are found at SNRs ρ max ≃ 2ρ t .
The luminosity distance d L of astrophysical objects in the nearby Universe is distributed geometrically, max [24] (but note this will not be true for third-generation detectors [25]).
This implies that the left part of the curve in Fig. 1 where ρ t ≃ ρ max has a disproportionally larger impact on the overall population of detected sources.Therefore, thresholding the optimal SNR instead of the observed SNR has the net effect of underestimating p det (i.e. the residuals in Fig. 1 are positive) and thus overestimating the astrophysical merger rate.
One caveat here is that we did not truncate the Gaussian distribution in Eq. ( 9) to impose ρ obs ≥ 0. This is appropriate as long as the threshold value is much greater than the SNR variance, i.e. ρ t ≫ 1.

Multiple detectors & matched filter SNR
Let us now consider a network of multiple detectors.The network SNR is the root sum square of the individual SNRs, i.e.
where N is the number of interferometers.Each of the ρ obs,i 's is distributed normally around optimal values ρ opt,i with unit variance.The square root of the sum of Gaussian variates with different means and unit variances is distributed according to the noncentral χ distribution (which is also known as the generalized Rayleigh distribution) [26]; see also Ref. [10].The probability density function of ρ obs is where I ν (z) is the modified Bessel function of the first kind and order ν [27].The parameter ρ opt in Eq. ( 13) is root sum square of the individual optimal SNRs calculated as in Eq. (1), i.e.
Note how the detection probability in Eq. ( 13) only depends on the combined quantity ρ opt and not on how this SNR is distributed among the various instruments in the network.
We can now threshold the matched filter SNR ρ obs and compute the expectation value over noise realizations as in Eq. ( 8).We find this can also be written down using special functions.In particular, one has where is the generalized Marcum Q-function of order ν [15,16,17], which is used in the field of digital communications (but see e.g.Refs.[28,29] for some previous appearances in GW astronomy).In words, the generalized Marcum Q-function is the complementary cumulative distribution function of the non-central χ distribution.A two-line Python code to evaluate Eq. ( 16) is described in Appendix A and made available at Ref. [30].In Fig. 2, we evaluate Eq. ( 15) for the case of N = 3 interferometers.As expected, thresholding the observed SNR smoothens the sharp step one would instead obtain when using the optimal SNR as detection statistics, allowing for some finite probability for events below (above) threshold to be detected (missed).It is interesting to note that this effect is not symmetric: binaries with optimal network SNR that is below threshold are more likely to be detected than binaries with optimal network SNR above threshold are to be missed.That is, thresholding the optimal SNR ρ opt instead of the matched filter SNR ρ obs underestimates p det , hence overestimates the intrinsic merger rate.This effect is enhanced by the expected astrophysical SNR probability p(ρ opt ) ∝ ρ −4 opt [24], which implies binaries are more likely to be found below than above threshold.
We further quantify this detection/non-detection asymmetry as follows, borrowing terminology from that of a classification problem [31] where the actual outcome is given by ρ obs > ρ t and the predicted outcome is given by the test ρ opt > ρ t .For a set of sources with SNRs distributed according to p(ρ opt ), there are four mutually exclusive cases: • ρ t < ρ obs , ρ opt or "true positives".These events are flagged as detectable irrespectively of the thresholding strategy.The fraction of sources in this class is • ρ obs , ρ opt < ρ t or "true negatives".These events are flagged as non detectable irrespectively of the thresholding strategy.The fraction of sources in this class is • ρ opt < ρ t < ρ obs or "false negatives".These events are detectable but would be classified as non detectable if one neglects the SNR variance.The fraction of sources in this class is • ρ obs < ρ t < ρ opt or "false positives".These events are non detectable but would be classified as detectable if one neglects the SNR variance.The fraction of sources in this class is Figure 3 shows the fractions FN and FP as a function of the threshold ρ t and the number of detectors N .We consider a population with ρ opt ∈ [1, 100] distributed according to p(ρ opt ) ∝ ρ −4  opt .Both fractions go to zero as ρ t increase, corresponding to the curves of Fig. 2 approaching a step function: imposing a high detectability threshold makes it less likely for sources with a given optimal SNR to be scattered above or below threshold by a specific noise realization.The rate of false negatives is about an order of magnitude larger than that of false positives, confirming our point above.This difference increases with the number of detectors in the network, which is a consequence of Eq. (16).
If needed, one can marginalize Eq. ( 15) over the extrinsic parameters as in Eq. ( 3).Note however that the factorization of Eq. ( 4) is not useful when considering multiple detectors because sources cannot be optimally oriented for all interferometers at the same time.

Applications to LIGO/Virgo
In population studies, the quantity that enters the hierarchical likelihood [1,2] is the expectation value (abusing notation once more) where λ indicates the population parameters and p pop (θ, ξ|λ) is the chosen population model.In case one is only trying to measure the population properties of the intrinsic parameters [4], then p pop (θ, ξ|λ) = p pop (θ|λ)p(ξ) and p det (λ) = p det (θ)p pop (θ|λ)dθ.
In this section, we first compute population-averaged detectabilities on a controlled experiment and then use software injections in real LIGO noise.

Toy population
We estimate the impact of our findings on a toy population of GW events observable by the LIGO/Virgo network.We take a simple population p pop (θ, ξ|λ) where black-hole binaries have source-frame primary masses , source-frame secondary masses m 2 ∈ [5M ⊙ , m 1 ] distributed uniformly, redshifts z ∈ [0, 1] distributed uniformly in comoving volume and source-frame time p(z) ∝ (dV c /dz)/(1 + z), spins magnitudes χ 1,2 ∈ [0, 1] distributed uniformly, and spin directions distributed isotropically.We assume a flat ΛCDM cosmological model with parameters from Ref. [32].For the extrinsic parameters, we take uniform distributions in cos ι, cos ϑ, ϕ, and ψ as above.We consider a three-instrument network made of LIGO Livingston, LIGO Hanford, and Virgo at their design sensitivities [9] and compute optimal SNRs ρ opt,i using the the IMRPhenomX approximant [33].For this example, our threshold is set to ρ t = 12.
Figure 4 shows the resulting distributions of optimal network SNRs ρ opt (θ, ξ) and probabilities of detection p det (θ, ξ).A Monte Carlo estimate of the integral in Eq. ( 21) computed over the entire population returns p det (λ) ∼ 0.027 (we used 10 6 samples, so the error on this number is ∼ 10 −3 ).If one only selects events with |ρ opt (θ, ξ) − ρ t | < 1 (i.e.those close to threshold), we find p det (λ) ∼ 0.496.This should be compared with the analogous value ∼ 0.435 one would instead obtain with a simple cut on the optimal SNR.Marginalizing over noise realizations when estimating selection effects results in a higher p det and mostly affects subpopulations of sources that are close to detection threshold.

Pipeline injections
It is important to remember that the SNR (either optimal or observed) provides approximate information on the GW detectability.The quantity returned by GW  .Distribution of optimal SNR ρopt(θ, ξ) (top) and detectabilities p det (θ, ξ) (bottom) for a toy population of black-hole binaries observable by the LIGO/Virgo network at design sensitivity, assuming a threshold ρt = 12.In the top panel, the green histogram shows the optimal SNRs.The vertical solid (dotted) black lines show the median (90% inter-quantile range) of the probability of detection from Eq. ( 15), indicating the region where the transition between nondetectability to the left and detectability to the right takes place.In the bottom panel, the green histogram shows the detectabilities obtained by thresholding the matched filter SNR ρ obs and the two dashed black lines show the fractions of binaries that would be marked as detectable (p det = 1) and non-detectable (p det = 0) if one were to instead threshold the optimal SNR ρopt.
detection pipelines is the false-alarm rate (FAR), which is indeed the statistics in used state-of-the-art population analyses [4] to both compile the list of events and estimate selection effects (other selection strategies are sometime used, see e.g.p astro in Ref. [34]).Using the population average of Eq. ( 21), we now present a calibration of the SNR threshold that enters our p det approximation to reproduce the response of current detection pipelines as a function of their FARs.
We use software injections performed in real noise from the third FAR t [yr Mapping between SNR threshold ρt and FAR threshold obtained by matching the population-averaged detectability p det (λ).We use software injections into five different GW detection pipelines (colored curves) as well as the minimum FAR across all pipelines (black curves).Solid (dashed) curves are computed using the observed (optimal) SNR when thresholding for detectability.Curves are restricted to the regime of validity of our analysis (see text).To guide the eye, the grey dotted lines indicate FARt = 1/yr and ρt = 11.
LIGO/Virgo/KAGRA observing run [35].They report FAR values for five different detection pipelines and LIGO-only (i.e.N = 2) optimal SNRs for a fiducial population of sources (some of the injections also include Virgo; we neglect this difference and have verified it has a negligible impact on our results).Note their adopted population is different from that of our toy case above (see Ref. [35] for details).We use the injection dataset labeled as "mixture," which is their recommended default.We consider all the injections provided, including cases where the FAR could not be quantified (and is reported as ∞ in the dataset).Note the injections were performed uniformly in time, thus taking into account the duty cycle of the detectors.We compute the population-average detection probability by thresholding their FARs and match it with the population-average detection probability obtained with our p det approach.Figure 5 shows the resulting mapping between the two quantities.We repeat our calculation for each of the five detection pipelines provided in the available dataset [35] as well as by selecting the minimum FAR for each injected source.The latter is in line with the criterion used in Ref. [4] for selecting compact binaries of astrophysical origin.This calculation is not reliable for SNRs that are ≲ 6 because those injections were deemed "hopeless" to save computational time [35].For those values of ρ t , there is a (potentially large) number of missing sources with optimal SNRs below threshold that could have been scattered above threshold by the SNR variance.We thus restrict our analysis to ρ t ≥ 8, which is a few standard deviations away from the "hopeless" cut.
Additionally, some pipelines have minimum and maximum FAR values they attempt to quantify.We estimated these limits from the provided datasets and truncated the curves in Fig. 5 accordingly.These limitations do not significantly impact the minimum FAR calculation across pipelines, at least in the region of interest shown in Fig. 5.
Figure 5 shows that selection effects computed using a minimum FAR threshold of ∼ 1 yr −1 are reproduced by an SNR-based cut with threshold ρ t ≃ 11.We find once more that the bias induced by thresholding ρ opt instead of ρ obs is of a few % and underestimates p det across the entire range: that is, the ρ t threshold for a given FAR obtained when marginalizing over noise realization is larger compared to the case where noise is neglected.The difference between the two treatments is smaller than the typical difference between the various detection pipelines (and indeed current LIGO/Virgo selection criteria combine information from multiple pipelines).The population used in Ref. [35] is deliberately broad, while our results above indicate that the subpopulation of sources that are close to threshold will be affected more significantly by the SNR variance.Confirming this expectation with dedicated pipeline injections is left to future work.
As shown in Fig. 5, we empirically find that the mapping between ρ t and the minimum FAR threshold across the available pipelines is well described by a power law: A simple least-square regression returns p 1 = −1.80 and p 0 = 20.1 when thresholding using ρ obs (i.e. this is a fit to the solid black curve in Fig. 5), and p 1 = −1.72 and p 0 = 18.7 when thresholding using ρ opt (i.e. this is a fit to the dashed black curve in Fig. 5).These fits can be used to quickly filter synthetic distributions according to the desired purity of the resulting GW simulated catalog, though with the important caveat that this relationship was calibrated on a specific population of sources.We stress that Fig. 5 and Eq. ( 22) present a calibration between thresholds, not a mapping of SNR to FAR for a given trigger and pipeline.

Summary
We summarized and extended the most common treatment of GW selection effects, namely that of thresholding the SNR.We focused in particular on the marginalization over extrinsic parameters and noise realizations, considering both single and multiple detectors.
When modeling the filter imposed to observations by the detectors, the simplest strategy is to consider a source "detectable" if its optimal SNR ρ opt is sufficiently large [8].This approach does not take into account the SNR variance induced by noise realizations.As presented here, incorporating such an effect is straightforward and results in a closed-form expression of p det .All one needs to do is substitute the sharp step p det (θ, ξ) = I[ρ opt (θ, ξ) > ρ t ] with the smooth transition p det (θ, ξ) = Q N/2 ρ opt (θ, ξ), ρ t .
Including noise realizations when estimating p det takes care of the conceptual point recently raised in Ref. [20].There the authors argue that a consistent detectability estimate should never make use of the true signal parameters, which are not accessible, but only of the data, and these are inevitably affected by noise.
Our expression is appealing for its simplicity but still approximate.Notable simplifications include (i) assuming that noise is stationary and Gaussian, which is never perfectly the case, (ii) thresholding events using the SNR and not the FAR, and (iii) neglecting errors due to the finite size of our template banks [10].Further improvements include considering the phase-maximized SNR, which can also be written down analytically using special functions [36].
We find that a SNR threshold of ∼ 11 reproduces the selection criterion used in current analyses (FAR < 1 yr −1 for binary black holes [4]) and that the inclusion of noise realizations increases the average detection probability p det by a few %.While nominally modest, this effect becomes increasingly important as the number of detections grows because systematic errors related to selection effects scale faster than linearly with the number of events in the catalog [3].Furthermore, the projected bias is not uniform across the parameter space and disproportionally affects sources at the edge of detectability.For instance, this will be relevant to analyses targeting compact binaries at high redshift [18,19,20] and attempting to discriminate their origin as either astrophysical or cosmological [37,38].
Accurate modeling of selection effects is prominent in both (i) GW population studies, where selection effects enter the hierarchical likelihood, and (ii) the development of astrophysical predictions, where outputs of population-synthesis codes are postprocessed to obtain detectable distributions.We hope the "collection of recipes" we presented here will provide a useful companion to researchers working in either of these two contexts, facilitating the treatment of selection biases in GW astronomy.

Figure 1 .
Figure1.Detection probability p det (θ) averaged over the extrinsic parameters as a function of the optimal, single-detector SNR ρmax(θ) of an optimally oriented source with intrinsic parameters θ.The black-dashed line is obtained by thresholding the optimal SNR.Colored solid lines are obtained by thresholding the matched filter SNR with various thresholds ρt.The top panel shows the detection probability itself, while the bottom panel shows the difference between p det obtained using the matched filter SNR as a detection statistics and p det obtained using the optimal SNR instead.The vertical dotted lines indicate ρmax(θ) = ρt.

Figure 2 .
Figure2.Detectability p det (θ, ξ) of GW signals with given intrinsic and extrinsic parameters for a network of N = 3 detectors as a function of the optimal SNR ρopt(θ, ξ).The black-dashed line is obtained by thresholding the optimal SNR itself, resulting in a step function centered on the threshold ρt.Colored solid curves are obtained by thresholding the observed network SNR according to Eq. (15) assuming various thresholds ρt.

5 Figure 3 .
Figure 3. Fraction of false negative (blue, FN) and false positive (red, FP) detections one would get by thresholding the optimal SNR ρopt instead of the matched filter SNR ρ obs .Both fractions decrease with the SNR threshold ρt and their difference increases increases with the number of detectors N .Dashed, solid, and dotted curves refers to networks of N = 1, 3, and 5 interferometers, respectively.

Figure 4
Figure 4. Distribution of optimal SNR ρopt(θ, ξ) (top) and detectabilities p det (θ, ξ) (bottom) for a toy population of black-hole binaries observable by the LIGO/Virgo network at design sensitivity, assuming a threshold ρt = 12.In the top panel, the green histogram shows the optimal SNRs.The vertical solid (dotted) black lines show the median (90% inter-quantile range) of the probability of detection from Eq. (15), indicating the region where the transition between nondetectability to the left and detectability to the right takes place.In the bottom panel, the green histogram shows the detectabilities obtained by thresholding the matched filter SNR ρ obs and the two dashed black lines show the fractions of binaries that would be marked as detectable (p det = 1) and non-detectable (p det = 0) if one were to instead threshold the optimal SNR ρopt.