Flexible and Accurate Evaluation of Gravitational-wave Malmquist Bias with Machine Learning

Many astronomical surveys are limited by the brightness of the sources, and gravitational-wave searches are no exception. The detectability of gravitational waves from merging binaries is affected by the mass and spin of the constituent compact objects. To perform unbiased inference on the distribution of compact binaries, it is necessary to account for this selection effect, which is known as Malmquist bias. Since systematic error from selection effects grows with the number of events, it will be increasingly important over the coming years to accurately estimate the observational selection function for gravitational-wave astronomy. We employ density estimation methods to accurately and efficiently compute the compact binary coalescence selection function. We introduce a simple pre-processing method, which significantly reduces the complexity of the required machine-learning models. We demonstrate that our method has smaller statistical errors at comparable computational cost than the method currently most widely used allowing us to probe narrower distributions of spin magnitudes. The currently used method leaves 10%–50% of the interesting black hole spin models inaccessible; our new method can probe >99% of the models and has a lower uncertainty for >80% of the models.

Over the past several years, there has been increasing interest in population studies, which seek to measure the distribution of astrophysical parameters such as the mass, spin, and distance of merging compact objects using events observed with Advanced LIGO/Virgo (Aasi et al. 2015;Acernese et al. 2015) (see, e.g., Abbott et al. (2021a); Roulet et al. (2021); The LIGO Scientific Collaboration et al. (2021b) and references therein).To perform unbiased inference on the distribution of astrophysical parameters, it is necessary to account for selection biases when performing population inference, see, e.g., Loredo (2004); Farr et al. (2015); Mandel et al. (2019); Thrane & Talbot (2019); Vitale et al. (2020).The standard method employed in gravitational-wave searches requires computing the total sensitivity of search pipelines for a given population model.Evaluating the sensitivity for different population parameters involves integrating the time-dependent sensitivity to all binary parameters over the whole observing time.
Astrophysical population inference is typically performed using Markov Chain Monte Carlo (MCMC) (Metropolis et al. 1953;Hastings 1970) or nested sam-pling algorithms (Skilling 2004), which can require O(10 3 − 10 7 ) likelihood evaluations for a well converged run, and the selection function must be evaluated on the fly at every iteration.As the binary black hole catalog grows this integral must be evaluated with increasing precision (Farr 2019), and correspondingly increased computational cost.Additionally, as the catalog grows, so does our resolving power, meaning that sub-dominant effects, e.g., the effect of black hole spin on the sensitivity, must be considered.Guaranteeing sufficient precision is especially challenging for narrow population models as the Monte Carlo integrals currently performed are poorly suited to probing these distributions.In this work, we demonstrate that by performing a density estimation step on the set of found injections, we can dramatically increase the efficiency of these calculations, enabling us to probe narrow population models.
The rest of this paper is structured as follows.In the next section, we define some relevant quantities and outline methods for accounting for Malmquist bias in gravitational-wave searches.We then briefly summarize a few preliminaries for gravitational-wave population inference in Section 3. Following this, in Section 4, we describe the problem of density estimation and discuss various commonly used methods.In Section 5 we estimate the gravitational-wave transient selection function as a function of binary parameters using a Gaussian mixture model.After this, we apply our methods to the binary black hole systems identified in The LIGO Scientific Collaboration et al. (2021a).Some closing thoughts are then provided.The notebook that performed the analysis presented here can be found at Talbot (2022).

SENSITIVITY ESTIMATION
Most gravitational-wave population analyses impose a detection threshold on the analyzed triggers to avoid contamination from terrestrial noise sources, for example, demanding that the false-alarm rate of a trigger is less than once per year.Applying this threshold leads to a selection bias in the observed sample.We quantify this by considering the probability that a signal with parameters θ (e.g., binary mass and black hole spin) would surpass our threshold ρ th 1 p det (θ) = ρ>ρ th dd p(d|θ). (1) The integral is over all observed data and p det is the fraction of the data which surpasses the threshold under the assumption that a signal with parameters θ is 1 The specific choice of threshold is irrelevant so long as it is robustly defined.
present.For population analyses, we require the fraction of all sources that are detectable, for a given population model, characterized by parameters, Λ, where p(θ|Λ) is a conditional prior for θ given population (hyper-) parameters Λ, e.g., the shape of the black hole mass distribution.For a detailed derivation of these quantities see, e.g., Finn & Chernoff (1993); Messenger & Veitch (2013); Farr et al. (2015); Tiwari (2018); Thrane & Talbot (2019); Mandel et al. (2019).We emphasize that all population analyses which apply a threshold necessarily have a corresponding selection bias that must be accounted for, including analyses that explicitly model contamination of the sample from terrestrial sources (Gaebel et al. 2019;Galaudage et al. 2020;Roulet et al. 2020).However, see Smith et al. (2020) for a method that avoids thresholds entirely.
The integral over d in Eq. 1 requires that we understand the sensitivity of gravitational-wave searches throughout the observing history.In practice, there are currently two widely used methods to compute this integral: inject simulated signals into the data and see how many of them are recovered by the search pipelines; or use a semi-analytic approximation based on the power spectral density of the interferometers, e.g., Finn & Chernoff (1993).
The former method gives the most faithful representation of the search sensitivity.However, the latter has several computational advantages.Because of the large parameter space that must be covered, the injection and recovery procedure gives us only the parameter values of the found/missed signals, whereas the semi-analytic approach can efficiently generate a numerical value for p det marginalized over specific nuisance parameters.Thus, the semi-analytic approach can also be performed much more computationally cheaply due to the cost of performing and recovering injections.Previous methods to improve the reliability of semi-analytic estimates include calibration of semi-analytic estimates with the output of injection campaigns (Wysocki & O'Shaughnessy 2018) and phenomenological fits, e.g., Fishbach et al. (2018); Veske et al. (2021).
There have been several recent methods to leverage supervised machine learning methods to estimate p det (Gerosa et al. 2020;Wong et al. 2020b).Gerosa et al. use a neural network classifier to give a binary outcome of detectable or not detectable for a given set of binary parameters; this method requires retraining when the threshold is changed.In Wong et al., the authors train a neural network regressor to estimate the signal-to-noise ratio (a commonly used detection threshold), allowing for the threshold to be changed trivially.However, both of these methods require specifying all of the binary parameters in order to evaluate p det .In this work, we use density estimation on the set of found injections to provide a continuous, generative, model for p det in arbitrary subsets of the binary parameters.
The integral over θ in Eq. 2 marginalizes over all the parameters describing the source-15 parameters to completely characterize a quasi-circular binary black hole merger-in addition to any parameters describing the state of the instruments.In practice, many of the parameters are not modeled in current population analyses; the most complex models considered currently fit for the distribution of seven of these parameters, the two component masses, spin magnitudes, spin-tilt angles, and redshift, requiring the evaluation of a sevendimensional integral within each likelihood evaluation.The other parameters are assumed to be well described by the prior distributions used during sampling.These are mostly geometric parameters describing the position and orientation of the binary, although it is possible that some of these parameters may deviate from isotropy.For example, we could search for deviations from isotropy over the sky position, e.g.Payne et al. (2020); Stiskalek et al. (2021), or features in the distribution of the azimuthal spin parameters due be influenced by spin-orbit responances (Schnittman 2004;Gerosa et al. 2018;Varma et al. 2021).This integral is, therefore, recast as a Monte Carlo integral over the set of found injections (Tiwari 2018;Farr 2019) Here, p(θ|Λ 0 ) is the distribution of the injected signals which will depend on the specific analysis, N inj is the total number of injected signals and N found is the number of injections surpassing the threshold.The sum in Equation 3 is over samples drawn from the distribution of found injections To ensure sufficient convergence of the Monte Carlo integral we must have an effective sample size of at least four times the number of observed events (Farr 2019).This means that to fit tightly peaked distributions we need a large number of samples for the distribution of found injections or a continuous representation of p det .Performing more injections to increase the number of recovered injections quickly becomes computationally prohibitive.
In this work, we resolve this issue by performing density estimation using the set of found injections.Using these density estimates, we can directly evaluate p det and/or generate additional samples from the distribution of found injections.

Models
For demonstration purposes, we consider two simple population models from within the gravitational-wave literature.Following Abbott et al. (2021a); Talbot & Thrane (2018), we model the binary black hole mass distribution as a power law in the larger mass, m 1 , between the minimum and maximum mass along with a normally distributed component and a power law in the mass ratio, q = m 2 /m 1 , This is the Power-Law + Peak model in Abbott et al. (2021a) without the low-mass smoothing.We assume that both component spins are drawn from the same distribution.We model the spin magnitudes as following a Beta distribution (Wysocki et al. 2019) We model the distribution of spin orientations as a combination of a truncated half-normal and a uniform distribution (Talbot & Thrane 2017) The factor N ensures that the distribution is properly normalized.This is the Default model in Abbott et al. (2021a).

Likelihood
The standard likelihood used in population inference for gravitational-wave sources in the presence of selection biases is (e.g., Thrane & Talbot (2019); Mandel et al. (2019); Vitale et al. (2020)), Where the product over i runs over the N observed events with data d i .The integral over θ i is typically performed by importance sampling from the single-event posterior distribution for p(θ i |d i ) as is done to calculate P det .We take the publicly available samples from the single-event posterior distributions from Abbott et al. (2019aAbbott et al. ( , 2021b)); The LIGO Scientific Collaboration et al. (2021a).Since the likelihood explicitly depends on P det , calculating this quantity is the main target of this work.This likelihood is then used to explore the posterior distribution for the population parameters given all of the observed data, e.g., in Section 6.3.

DENSITY ESTIMATION
Reconstructing a function or probability density from a finite set of samples from the distribution is a widespread problem in data analysis.For example, injection campaigns to determine the sensitivity of gravitational-wave detectors do not give us a continuous description of the sensitivity, but rather a discrete set of samples from the distribution of found injections.
Density estimation methods can be loosely divided into parametric and non-parametric methods.2Parametric density estimation involves fitting a parameterized phenomenological model to the data.An example of this is the method used to reconstruct the population distribution of binary black holes in this work.To estimate the gravitational-wave selection function we will rely on non-parametric density estimation.
Many methods for nonparametric density estimation are commonly used; however, most traditional methods such as binning or kernel density estimation scale poorly as the dimensionality of the problem increases.More sophisticated density estimation techniques involving the optimization of many parameters, such as Gaussian mixture models or flow-based inference, have proved successful at approximating complex functions in large dimensional spaces, see, e.g., Powell et al. (2019); Gabbard et al. (2019); Green et al. (2020); Green & Gair (2021); Wong et al. (2020aWong et al. ( ,b, 2021) ) for applications in gravitational-wave inference.These models also provide natural ways to generate additional samples from the underlying densities at minimal cost and are therefore sometimes referred to as generative models.
In this work, we approximate p det (θ) using a Gaussian mixture model.A Gaussian mixture model is an unsupervised density estimator that approximates the distribution as a set of multivariate Gaussian distributions each with a unique mean and covariance.The model assumes that the target distribution can be well modeled by a finite sum of multivariate Gaussian distributions Here K is the number of components in the mixture and can be manually tuned, w k is the weight associated with the kth component, µ k and Σ k are the mean vector and covariance matrix for that component.The values of the w k , µ k , and Σ k are optimized using the expectationmaximization algorithm (Webb 1999) to maximize the value of over the training data.
The non-parametric methods discussed above are all examples of unsupervised learning techniques as they only do not require estimates of the target density as inputs.There are also supervised non-parametric density estimation methods that require the target function to be evaluated at the training points; for example, Gaussian process regression and neural network regression have also been applied widely in gravitational-wave data analysis, see, e.g.Graff et al. (2012)

SENSITIVITY TO BINARIES
In this section, we develop methods to evaluate Equations 2 and 1 and generate new samples from the distribution of found injections.We begin by training a function to estimate p det (θ)p(θ|Λ 0 ) using a set of ∼ 8 × 10 4 found injections.We use the same sensitivity data products used in The LIGO Scientific Collaboration et al. (2021b).Specifically, we take the found injections from Advanced LIGO/Virgo's third observing run with a threshold of false alarm rate < 1 yr −1 in any of the search pipelines employed by the LIGO/Virgo collaboration (The LIGO Scientific Collaboration et al. 2021b,a; LIGO Scientific Collaboration and Virgo Collaboration and KAGRA Collaboration 2021a).See the relevant publications and data releases for more details.

Pre-processing
Gravitational-wave parameters are typically only defined over finite domains and many have significant support at the edges, e.g., spin magnitudes are contained in the unit interval and the majority of observed black holes are consistent with being non-spinning.However, the algorithms we use for density estimation work best over an infinite domain without sharp boundaries.Our aim is to transform the found injections such that the transformed samples are drawn from a unit multivariate normal distribution.Therefore, we begin by performing the following mapping to the found injections: 1. Transform the injections from the original distribution to the unit interval.We denote generic transformations as U and discuss specific suitable transformations below.
2. Map the samples from the unit interval to a unit normal distribution using the probit function Φ −1 (Bliss 1934).
Mathematically the full transformation is and the Jacobian is where N is the normal distribution.We consider the four following scaling methods.
Naive.The simplest mapping onto the unit interval is a simple shift and scale from the original domain to the unit interval The mapping is attractive as it can be trivially applied to any parameter and has been used in other applications, e.g., D'Emilio et al. ( 2021).The Jacobian of this transformation is a constant CDF.For some parameters the original distribution may be more complex, however, we may know an analytic form for that distribution.In this case, we perform the mapping using the cumulative distribution function of the injected population In this case the Jacobian is This transformation is appealing as the form of the Jacobian means that evaluating p det is trivial; however, as we will see for parameters that strongly influence the detectability, this mapping does a poor job approximating a unit uniform distribution.
Approximate.In cases where p det is a strong function of θ with a known (or approximately known) functional form we may choose to map onto the unit interval using an approximate expression for the observed cumulative distribution Here pdet (θ) is our analytic approximation to the selection function and the Jacobian In our case, we use pdet (θ) ∝ m 2.35 q 2 so we have p(θ|Λ 0 )p det (θ) ≈ const.We note that the naive and approximate methods give the same transformation for primary mass and aligned spin components.There is a difference in the transformation of the mass ratio.While we only account for the dependence of p det on the primary mass, we could employ a more complex expression, e.g., as discussed in Veske et al. (2021).
Empirical.Finally, we use an approximation to the one-dimensional target distribution.We construct a Gaussian kernel density estimate p(θ) ≈ p(θ|Λ 0 )p det (θ) (20) from the found injections.The standard deviation is chosen using Scott's rule (Scott 1992).In order to account for parameters that have significant support at the edges we apply a reflecting boundary condition to the estimate Using this estimate, we then compute an empirical cumulative distribution function Two-dimensional density plots of the distribution of found injections before (top) and after (lower) applying the transformations described in Section 5.1.Our aim is to transform the data to approximate a unit multivariate normal distribution.In descending order these transformations are naive, CDF, approximate, and empirical (see the main text for definitions).We note that the empirical scaling maps the data most closely onto a unit multivariate normal distribution.The average log-likelihood over the test and training data of the trained Gaussian mixture model as a function of the number of Gaussian components.We note that the performance on the test data set flattens out after ∼ 10 components, while the performance on the training data continues to improve.
The Jacobian for this transformation is and can be trivially evaluated from our kernel density estimate.
In Figure 1 we show the set of found injections in the original (θ) space and each of the transformed spaces.In descending order, the rows are the original data, naive scaling, CDF scaling, approximate scaling, and empirical scaling respectively.Each of the transformations has removed the railing against the boundaries in all the parameters.However, there are visible features remaining, especially in the mass parameters.We note that the empirical scaling most closely transforms the data to an uncorrelated multivariate unit normal.We will use the empirical scaling going forward unless otherwise specified.

Density estimation
Our aim is to take the regularized samples {θ i } and estimate the density D using a Gaussian mixture model.Training is performed by maximizing the mean natural log-density of the test samples as implemented in scikit-learn (Pedregosa et al. 2011).Adding more components will improve the quality of the fit.However, by using too many components we risk over-fitting statistical fluctuations in the training set.To avoid this, we split the samples into a training (80%) and a test (20%) set.The fluctuations in the test set should be independent of those in the training set and therefore we will choose the number of components when the quality of the fit in the test set stops improving when adding more components.
In Figure 2, we show the average log-density over the training and test sample sets for the trained Gaussian mixture models with varying numbers of components.The offset between the two sets of points is simply due to random fluctuations in the train/test split.Other realizations can lead to a smaller mean log-density for the test data set.We note that the performance on the test data set flattens out after ∼ 10 components, while the performance on the training data continues to improve.We, therefore, use 10 components in the remainder of this work to avoid overfitting.In the subsequent sections, we use a 10-component Gaussian mixture model density estimate D trained using all of the found injections {θ i }.

Evaluation
After training the Gaussian mixture model, we can trivially generate new samples from the target distribution by drawing samples θ from the Gaussian mixture model and applying the inverse of Eq. 12.Alternatively, since these are density estimates, we can also directly evaluate the estimated density In practice, we want to evaluate the selection function Here J is the Jacobian from Equation 13and p(θ|Λ 0 ) is the original distribution of injections.We use Eq. 26 as an alternate means of computing Eq. 2 with an equivalent Monte Carlo integral over samples from the population distribution With the empirical mapping described in Section 5.1 we have This can be very efficiently evaluated as required.We note that this method requires an efficient method of generating samples from the population distribution.This can be trivially performed using inverse-transform sampling if the population model has an analytically invertible cumulative distribution function, or is a sum of such distributions.For other population models (for example, those using the low-mass smoothing introduced in Talbot & Thrane (2018) or the redshift distribution introduced in Fishbach et al. (2018) and used in many population analyses, e.g., (Abbott et al. 2019b(Abbott et al. , 2021a)) another method is required.The simplest alternative approach is numerically estimating the inverse cumulative distribution function; this can be implemented for all models at an increased, and perhaps prohibitive, computational cost.In Wong et al. (2020a), the authors train a deep flow-based generative network capable of very efficiently generating samples from the mass model in Talbot & Thrane (2018).Additionally, one could relegate the generation of samples from the population model to an offline pre-processing step by training a deep neural network to estimate P det directly.
In practice, we find that this method requires far fewer samples in the Monte Carlo integral than when resampling the found injections, 5000−10000 samples from the population model versus ∼ 80000 found injections with the same number of effective samples for each method.The number of effective samples is defined slightly differently for the two Monte Carlo methods considered here.For Equation 27we adopt the usual definition (Elvira et al. 2018) However, for Equation 3 a correction is required to account for the initial injections with p det = 0 (Farr 2019).Following Farr (2019), for both of these methods we only allow for samples with N eff > 4N events and marginalize over the statistical uncertainty in P det in the likelihood.

RESULTS
To demonstrate the efficacy of our new method we consider the accuracy and precision of three different methods to compute the population selection function We evaluate P det three times for each this set of samples.
1. Using Equation 3 with the original found injection set.
2. Using Equation 3 with samples generated using our Gaussian mixture model.27with 10000 samples from the population model.

Using Equation
We compare our estimators using 5000 samples drawn from the prior distribution for our population parameters specified in Table 1.These are the parameters describing the models presented in Section 3.1.We note that the prior is specified on the mean and variance of the Beta distribution We highlight one specific limiting case that will most clearly demonstrate the differences between the methods.We note that α χ → 0 indicates a spin distribution that peaks very sharply at zero black hole spin.If α χ < 1, the distribution is singular at a = 0; due to Monte Carlo convergence issues, these singular configurations have not been used in many previous analyses (e.g., Abbott et al. (2019bAbbott et al. ( , 2021a)); The LIGO Scientific Collaboration et al. (2021b).)

Computing P det
While the mean log-density is a suitable metric for training our estimators, we can perform a stronger test by considering how well the population-averaged P det (Λ) compares across a range of Λ values using our Gaussian mixture model.
In Figure 3, we show the distribution P det for these three methods (in order blue, orange, green).In the top and bottom panels, we show samples for non-singular (α χ , β χ > 1) and singular spin magnitude distributions respectively.We note that all the methods agree well for the non-singular distributions.For singular spin distributions, both methods that resample a fiducial set of The distribution of the logarithm of the population-averaged sensitivity P det over the prior used for our training data (see Table 1).In the top panel, we show samples for non-singular spin magnitude distributions (αχ, βχ > 1).In the bottom panel, we use samples for singular spin distributions.In blue and orange, we calculate P det using Equation 3 as in previous analyses using the recovered simulated injections from the LIGO/Virgo collaboration (blue) and samples drawn from our Gaussian mixture model fit to p det (orange).In green, we calculate P det using Equation 27 using 10000 samples from the population model.We note that the methods agree well for the non-singular distributions, however, the old method breaks down for singular spin distributions leading to the difference in calculated P det .
samples (blue and orange) methods differ significantly from our new method.These distributions are very sharply peaked and hence are the natural failure mode of importance sampling.The fact that the blue and orange results agree closely demonstrates that this effect is due to the difference in the construction of the Monte Carlo integral and not due to inaccuracy in the Gaussian mixture model estimate.
In Figure 4, we show the population-averaged sensitivity marginalized overall population parameters except for the logarithm of the Beta distribution α parameter.The color scheme is the same as Figure 3.The fiducial sample reweighting methods underestimate the sensitivity to singular spin distributions compared to methods In blue and orange, we calculate P det using Equation 3 as in previous analyses using the recovered simulated injections from the LIGO/Virgo collaboration (blue) and samples drawn from our Gaussian mixture model fit to p det (orange).In green, we calculate P det using Equation 27 using 10000 samples from the population model.The computed values of P det agree for non-singular spin distributions log 10 αχ > 0 but the sample reweighting method underestimates the sensitivity for singular spin configurations.
that directly evaluate p det .We find that all the methods agree well for the parameters describing the black hole mass distribution.

Monte Carlo Convergence
As described in Farr (2019), in order to have a reliable estimate of P det , we need a sufficient number of effective samples in our Monte Carlo integral.When reweighting a fixed set of recovered injections, this amounts to certain parts of the parameter space being inaccessible to our analyses.By contrast, we can draw arbitrary numbers of samples from our density estimates of p det in order to achieve sufficient convergence.
To demonstrate this, we compute the number of effective samples for each of the population samples used in the rest of this section using the Eqs 2 and 27.For the former, we use the ≈ 80000 found injections during O3 (LIGO Scientific Collaboration and Virgo Collaboration and KAGRA Collaboration 2021a).Throughout we use N events = 69 (BBH events confidently identified in O3.)For the latter, we draw 10 4 samples from the population model.In Figure 5 we show the number of effective samples for the old fiducial sample reweighting method (blue with real injections and orange with samples drawn from our density estimate) and our new method (green) for two population parameters.In the top and bottom panels, we show the parameters that demonstrate the most obvious trend for the old and new methods respectively (α χ and λ m the fraction of primary masses in the Gaussian component).Once again, we note that the blue and orange and results agree closely, indicating the robustness of the Gaussian mixture model fit.
The spread in N eff for the old method is over four orders of magnitude, while for the new method it is only two orders of magnitude.This means that if using a fixed number of samples at each iteration, that fixed number can be smaller for the new method leading to increased computational efficiency.As expected, we see that the efficiency of the fiducial sample reweighting drops significantly for α χ < 1.In contrast, our new method only weakly depends on the parameters of the spin magnitude distribution.We find that when reweighting the found injections we reject ≈ 10% of the non-singular samples and ≈ 50% of the singular samples compared to < 1% of samples using our new method.The new method has more effective samples (smaller uncertainty in > 80% of the space.)

Population inference
We now consider the impact of our different P det evaluation methods on population inference.We analyze the 69 binary black hole mergers with false alarm rate < 1 yr −1 identified in GWTC-3 The LIGO Scientific Collaboration et al. (2021a).We perform population inference three times, once with each of our P det estimation methods using the Bilby-MCMC sampler (Ashton & Talbot 2021).
In Figure 6, we show the inferred astrophysical population of binary black holes when using our two methods to evaluate the selection function P det .In blue, we evaluate Equation 3 using the original simulated injections.In orange, we evaluate Equation 3 using samples from our Gaussian mixture model.In green, we evaluate Equation 27 using our Gaussian mixture model.The solid curves show the mean inferred distribution, while the shaded regions indicate the 90% credible intervals.We find no significant difference between the inferred population using these methods although we note a slight shift away from singular spin configurations when using our new method.This is exactly the region in which the old method underestimates the sensitivity, leading to an overestimated likelihood.(Bottom) The fraction of primary black holes whose mass falls in the Gaussian component of a two-component power-law and Gaussian mixture model.In blue, we show N eff calculated by reweighting injections directly.In orange, we draw a new fixed set of samples using our Gaussian mixture model density estimate and reweight them.In green, we draw samples from the population model and directly evaluate p det (θ) using our Gaussian Mixture Model density estimate.The dashed black line shows the threshold for sufficient convergence of the Monte Carlo integral.We note that the old method is systematically biased away from small αχ (distributions that assign small spins to most black holes) and is unable to probe the region with αχ 0.5.In total, ≈ 40% of the prior volume is inaccessible with the old method, compared to < 1% with our new method.
signals are injected into the data and counting the number of recovered signals.While this method gives an optimal estimate of the performance of searches for signals in real data, they can be difficult to work with and extend to generic population models.In this work, we use these recovered signals to train a density estimate that can be reused to more efficiently compute the sensitivity to arbitrary populations.and cosine spin tilt (bottom) when analyzing the binary black hole mergers identified in GWTC-3 using three different methods for estimating the population-averaged sensitivity, P det : (blue) using the original found injections, (orange) new samples generated using a Gaussian mixture model fit to the found injections, (green) evaluating p det using samples from the population models.The results in blue and green do not include the dependence of P det on the spin distribution.We note that this leads to a slight change in the inferred distribution of spin orientations, when including spin dependence in the selection function there is slightly more preference for isotropic spins, although this is well below statistical uncertainties.The solid curves show the posterior predictive distribution and the shaded regions show the symmetric 90% credible region.
To improve the accuracy of this density estimate we introduce a pre-processing step that improves the convergence.Using this density estimate, we tested three methods to compute the population-level selection function.We found that our new density estimation method matches the previous injection resampling method for population models where the Monte Carlo integrals are well converged.We further demonstrated that our method is able to probe sharply peaked black hole spin distributions far more precisely than the existing method.This method can be more computationally expensive, especially for complex population mod-els; however deep learning surrogate models present a solution to this problem Wong et al. (2020a).
Using our method, it is trivial to compute the fraction of sources which are observed P det by marginalizing over parameters other than those parameterized in the population model, e.g., evaluate P det using the parameters which most directly affect the sensitivity (chirp mass, mass ratio, effective aligned/precessing spin) and model the population in terms of parameters with the most intuitive physical meaning (component masses, spin magnitudes, and orientations).We leave a detailed analysis of the best combination of parameters to use for the density estimation to future work.Our results are consistent with the results presented in The LIGO Scientific Collaboration et al. (2021b); however, the uncertainty on the measured selection function is less in > 80% of the space when using our new method.As the catalog of observed compact binary coalescences grows it will be vital to understand the systematic error in our estimation of the selection function.
Machine learning methods for density estimation are rapidly gaining popularity in the gravitational-wave data analysis community, e.g., Powell et al. (2019); Gabbard et al. (2019); Green et al. (2020); Green & Gair (2021); Wong et al. (2021Wong et al. ( , 2020a,b),b); Cuoco et al. (2020).Most of these methods require the use of complex neural network-based density estimators which require tuning many more free parameters and thus extremely large training data sets.The pre-processing method introduced here removes sharp spectral features, e.g., at prior boundaries, and thus enables high-precision estimation of the target distribution using Gaussian mixture models, rather than having to employ deep learning density estimators.Combining this pre-processing in other density estimation problems may have a similarly simplifying effect.
One limitation of the current method is that the Gaussian mixture model employed in this work provides only a best-fit model and does not provide an indication of uncertainty in the fit over the parameter space.We leave the exploration of density estimation techniques that model this uncertainty, e.g, Bayesian Gaussian mixture models to a future study.
We thank Maya Fishbach for producing mock injections used in an early version of this work.We thank Sylvia Biscoveanu, Tom Dent, Reed Essick, Jacob Golomb, Cody Messick, Richard O'Shaughnessy, Alan Weinstein, Daniel Wysocki, and Salvatore Vitale for useful comments and discussions.This work is supported through the Australian Research Council (ARC) Centre of Excellence CE170100004 and ARC Future Fellowship FT150100281.This is document LIGO-P2000505.This research has made use of data, software, and/or web tools obtained from the Gravitational Wave Open Science Center (https://www.gw-openscience.org),a service of LIGO Laboratory, the LIGO Scientific Collaboration and the Virgo Collaboration.Computing was performed computing clusters at the California Institute of Technology (LIGO Laboratory) supported by National Science Foundation Grants PHY-0757058 and PHY-0823459 and Swinburne University of Technology (OzSTAR).This work used publicly available samples from LIGO Scientific Collaboration & Virgo Scientific Collaboration (2018, 2020a,b); LIGO Scientific Collaboration and Virgo Collaboration and KAGRA Collaboration (2021b,a).A jupyter notebook to fully reproduce the results presented here along with a number of additional diagnostic figures can be found on Github.This work made use of Google Colaboratory.Software used in this work includes: numpy (Harris et al. 2020), scipy (Virtanen et al. 2020), scikit-learn (Pedregosa et al. 2011), matplotlib (Hunter 2007), pandas (pandas development team 2020), cupy (Okuta et al. 2017), Bilby (Ashton et al. 2019;Ashton & Talbot 2021), GWPopulation (Talbot et al. 2019) ; Veitch et al. (2015); Moore et al. (2016); Doctor et al. (2017); Lange et al. (2018); Landry & Essick (2019); Williams et al. (2020); Gerosa et al. (2020); D'Emilio et al. (2021).
Figure 2.The average log-likelihood over the test and training data of the trained Gaussian mixture model as a function of the number of Gaussian components.We note that the performance on the test data set flattens out after ∼ 10 components, while the performance on the training data continues to improve.
Figure 3.The distribution of the logarithm of the population-averaged sensitivity P det over the prior used for our training data (see Table1).In the top panel, we show samples for non-singular spin magnitude distributions (αχ, βχ > 1).In the bottom panel, we use samples for singular spin distributions.In blue and orange, we calculate P det using Equation 3 as in previous analyses using the recovered simulated injections from the LIGO/Virgo collaboration (blue) and samples drawn from our Gaussian mixture model fit to p det (orange).In green, we calculate P det using Equation 27 using 10000 samples from the population model.We note that the methods agree well for the non-singular distributions, however, the old method breaks down for singular spin distributions leading to the difference in calculated P det .

Figure 4 .
Figure4.The population averaged sensitivity marginalized over all population parameters except the logarithm of the Beta distribution α parameter for spin magnitudes.In blue and orange, we calculate P det using Equation 3 as in previous analyses using the recovered simulated injections from the LIGO/Virgo collaboration (blue) and samples drawn from our Gaussian mixture model fit to p det (orange).In green, we calculate P det using Equation 27 using 10000 samples from the population model.The computed values of P det agree for non-singular spin distributions log 10 αχ > 0 but the sample reweighting method underestimates the sensitivity for singular spin configurations.

Figure 5 .
Figure 5. Number of effective samples in the calculation of P det as a function of two population parameters describing the distribution of merging black holes.(Top) One of the parameters describing the black hole spin distribution αχ.(Bottom)The fraction of primary black holes whose mass falls in the Gaussian component of a two-component power-law and Gaussian mixture model.In blue, we show N eff calculated by reweighting injections directly.In orange, we draw a new fixed set of samples using our Gaussian mixture model density estimate and reweight them.In green, we draw samples from the population model and directly evaluate p det (θ) using our Gaussian Mixture Model density estimate.The dashed black line shows the threshold for sufficient convergence of the Monte Carlo integral.We note that the old method is systematically biased away from small αχ (distributions that assign small spins to most black holes) and is unable to probe the region with αχ 0.5.In total, ≈ 40% of the prior volume is inaccessible with the old method, compared to < 1% with our new method.

Figure 6 .
Figure6.The inferred distribution of primary mass (top) and cosine spin tilt (bottom) when analyzing the binary black hole mergers identified in GWTC-3 using three different methods for estimating the population-averaged sensitivity, P det : (blue) using the original found injections, (orange) new samples generated using a Gaussian mixture model fit to the found injections, (green) evaluating p det using samples from the population models.The results in blue and green do not include the dependence of P det on the spin distribution.We note that this leads to a slight change in the inferred distribution of spin orientations, when including spin dependence in the selection function there is slightly more preference for isotropic spins, although this is well below statistical uncertainties.The solid curves show the posterior predictive distribution and the shaded regions show the symmetric 90% credible region.

Table 1 .
Prior distribution for population parameters used in the analysis presented here.U(a, b) indicates a uniform distribution in[a, b].We note that the Beta distribution is parameterized in terms of the mean (µχ) and variance (σ 2 χ ).Additional cuts are imposed such that αχ, βχ > 0.