What You Don’t Know Can Hurt You: Use and Abuse of Astrophysical Models in Gravitational-wave Population Analyses

One of the goals of gravitational-wave astrophysics is to infer the number and properties of the formation channels of binary black holes (BBHs); to do so, one must be able to connect various models with the data. We explore benefits and potential issues with analyses using models informed by population synthesis. We consider five possible formation channels of BBHs, as in Zevin et al. (2021b). First, we confirm with the GWTC-3 catalog what Zevin et al. (2021b) found in the GWTC-2 catalog, i.e., that the data are not consistent with the totality of observed BBHs forming in any single channel. Next, using simulated detections, we show that the uncertainties in the estimation of the branching ratios can shrink by up to a factor of ∼1.7 as the catalog size increases from 50 to 250, within the expected number of BBH detections in LIGO–Virgo–KAGRA's fourth observing run. Finally, we show that this type of analysis is prone to significant biases. By simulating universes where all sources originate from a single channel, we show that the influence of the Bayesian prior can make it challenging to conclude that one channel produces all signals. Furthermore, by simulating universes where all five channels contribute but only a subset of channels are used in the analysis, we show that biases in the branching ratios can be as large as ∼50% with 250 detections. This suggests that caution should be used when interpreting the results of analyses based on strongly modeled astrophysical subpopulations.


INTRODUCTION
Gravitational waves (GWs) emitted by the mergers of compact objects, neutron stars and black holes, encode the properties of their sources including masses, spins, and distance.When one has a large enough dataset, information can be combined from the detected sources to infer properties of the underlying astrophysical process -or processes -that created them.Nearly 100 compact binary coalescences 1 (the large majority of which are BBHs) have been revealed in the data of groundbased GW detectors, LIGO (LIGO Scientific Collaboration et al. 2015) and Virgo (Acernese et al. 2015), up to their third observing run (O3) (The LIGO Scientific Collaboration et al. 2021a;Olsen et al. 2022;Nitz et al.Corresponding author: April Qiu Cheng aqc@mit.edu2023), allowing this type of analysis to be performed.The formation scenarios for compact binaries can be broadly separated in two categories: isolated evolution in the galactic field and dynamical assembly in dense environments such as clusters and AGN disks (see e.g.Mapelli 2021;Mandel & Farmer 2022, for reviews).The overall dataset might contain BBHs formed from a combination of these and other evolutionary pathways.
Ideally, one would like to fully characterize any astrophysical formation channels that contribute sources, as well as their relative abundances (branching ratios).In practice, several approaches have been proposed and followed in the literature.They all have merits and shortcomings, and we quickly review them here, focusing on BBHs, which are the topic of our work.
• Heuristic models -The most straightforward analyses rely on heuristic parametric distributions to describe the astrophysical distributions of black hole parameters.For example, the primary (i.e.most massive) black hole mass distribution can be modeled as a mixture model of a power law and a gaussian (the power law + peak model of Abbott et al. 2023, originally introduced by Talbot & Thrane 2018); the spin orientation distribution as a mixture model of an isotropic component and a component nearly aligned with the orbital angular momentum (Talbot & Thrane 2017;Vitale et al. 2017b); etc.The functional forms might be chosen based on computational expediency, or be inspired by reasonable astrophysical expectations (e.g. a power-law component in the black hole mass function because the masses of progenitor stars are distributed that way (Kroupa 2001)).Meanwhile, mixture models might allow for different sub-populations to be accounted for.
The main potential shortfall of this approach is that if the models are very strong, the resulting posteriors might actually be model-driven, especially for hard-to-measure parameters.This has been shown, for example, in the context of the spin magnitude measurement (Callister et al. 2022;Galaudage et al. 2021;Roulet et al. 2021) and the spin orientation (Vitale et al. 2022a).On the positive side, if a parameter can be reliably measured, it often has a clear connection with a meaningful astrophysical quantity (e.g., the slope of a mass power law).Heuristic models also allow correlations between parameters to be probed in a straightforward manner (Callister et al. 2021;Biscoveanu et al. 2022;Vitale et al. 2022a;Adamcewicz & Thrane 2022;Baibhav et al. 2023).
• Flexible models -Flexible model approaches have been proposed as a way of avoiding the risk of forcing features into the data.In such models, 1D posterior distributions (usually fully marginalized, e.g.p(m 1 |d)), are modeled as splines, Gaussian processes, or using autoregression (Mandel et al. 2017;Vitale et al. 2019;Tiwari 2021;Veske et al. 2021;Tiwari 2022;Rinaldi & Del Pozzo 2022;Edelman et al. 2022Edelman et al. , 2023;;Callister & Farr 2023;Golomb & Talbot 2022).While these approaches are less likely to impose stringent features into the posteriors (though see e.g.Farah et al. 2023) the number of unknown parameters is typically larger than for heuristic models and the model parameters are not usually associated to any specific astrophysical quantity.Furthermore, these methods are not well suited to disentangle sub-populations, and are instead better suited to measuring the overall distribution of parameters.
• Astrophysically informed models -Finally, one may use models obtained directly from the out-put of population synthesis.Given a set of initial conditions and choices for uncertain stellar, binary, and environmental physical parameters, such models make predictions for the anticipated underlying and detectable distribution of compact binaries.These models can be parametrized directly in terms of physically-meaningful quantities (e.g. the onset and evolution of binary mass transfer phases, the strength of supernova natal kicks, the efficiency of angular momentum transport), and correlations between parameters are automatically built in the models, both of which are very strong positive factors.In practice, the range of variations in physical uncertainties, as well as the number and complexity of the differing channels one considers, are often limited by the availability of numerical simulations that thoroughly explore the variation of the output (e.g.distribution of spin magnitude in the population) when one of the physical input parameters (e.g.efficiency of common envelope evolution) is varied.This is due to the fact that most population synthesis algorithms require significant computational resources to run, which implies that they cannot be evaluated "on the fly" for any value of their input parameters but must instead, for example, be evaluated on a sparse grid.Recent efforts have begun to more thoroughly explore compact binary population predictions for individual formation channels over an expansive array of physical and environmental uncertainties (e.g., Broekgaarden et al. 2022).Consideration of multiple formation channels simultaneously and self-consistently proves more difficult given the diversity of codebases needed to model different channels and the unique physics that affects compact binary populations from each channel.The most expansive multi-channel analysis to date was performed in Zevin et al. (2021b), who considered 5 possible formation channels (see Section 2.1 below), parameterized by the spin of quasi-isolated black holes at birth (a proxy for the efficiency of angular momentum transport in massive stars) and the efficiency of the common envelope ejection.By analyzing the 45 confident BBH sources of the penultimate (GWTC-2) LIGO-Virgo-KAGRA collaboration (LVK) catalog, Zevin et al. (2021b) found that the data required more than a single formation channel in order to explain the diversity of GW events and the distribution of parameters of the detected binaries.They also found that the data preferred small natal black hole spins, con-sistent with the fact that most of the LVK BBHs have small spin magnitudes.
In this paper, we analyze both merits and shortcoming of approaches based on models informed by the output of population synthesis codes.First, we repeat the analysis of Zevin et al. (2021b) on the latest LVK catalog (GWTC-3), which comprises 69 BBH with a false alarm rate of less than 1 yr −1 .We find that the model for common envelope evolution can explain up to 95% of the BBHs in the underlying population (while contributing up to 37% of the detectable BBHs).
Then, we create catalogs of simulated BBH signals with parameters drawn from our models, for some assumed values of the common envelope efficiency, quasiisolated natal black hole spins, and branching fractions across channels.We analyze the performance of the analysis as the number of BBH sources in the catalog increases from 50 to 250.We find that for channels that produce higher number of detectable sources (and hence are proportionally more represented in the catalog, even if the true underlying branching fractions are not higher) the uncertainty on the underlying branching fraction can improve by a factor of up to ∼ 1.7 as the number of sources increases to 250.
Next, we study the biases that can be introduced in this type of inference if the analysis is not using a suite of models fully representative of what is actually realized in nature.This problem was first indirectly shown by Franciolini et al. (2022), who added a channel for primordial black hole formation to those used by Zevin et al. (2021b), and obtained that the inference on the fraction of primordial back holes was significantly affected by which of the other channels were included in the analysis.To do so, we generate mock universes where each of the 5 channels contributes some known fraction of the underlying population, and run the inference excluding in turn one of the 5 models.We show how this introduces biases in the inference of the remaining 4 channels' branching fractions.The channels that are most heavily biased are the ones that can most easily produce sources similar to the one channel excluded from the analysis, as well as those with the lowest detection efficiencies.Finally, we generate universes where the totality of the BBH sources are produced by one of the 5 channels, and run the analysis with the same 5 models.We show that while the natal spin can be inferred correctly, it is usually not possible to exclude that more than one channel contributes to the population after 100 events, a caveat to our result from our inference on GWTC-3 data that multiple channels contribute to both the underlying and detected BBH population.
The rest of the paper is organized as follows: in Section 2 we review the basics of hierarchical inference and the models used in this work, and apply these tools to GWTC-3's BBHs.In Section 3 we apply the method to different simulated catalogs in the ideal scenario where models and true populations match.In Section 4 we focus on biases in the inference.We conclude in Section 5.

Hyper-Inference Method
We use hierarchical Bayesian inference on the branching fractions between different astrophysical formation channels of BBHs.Our methods mostly follow the analysis developed by Zevin et al. (2021b), adapting their codebase, Astrophysical Model Analysis and Evidence Evaluation (AMAZE), for our work.Here we outline the essentials of this method, as well as the key differences.
Each of these channels are modelled to predict the 4-dimensional distribution of the BBHs it forms with parameters ⃗ θ = [M c , q, χ eff , z], where M c is the sourceframe chirp mass, q is the mass ratio (defined to be 0 < q ≤ 1), χ eff is the effective dimensionless spin parameter, and z is the redshift; these are constructed into a probability distribution using a 4-dimensional KDE bounded by the physical constraints of each parameter.The models also depend on two additional parameters encoding uncertainties in the physical prescription: χ b , the dimensionless spin of a black hole formed in quasi-isolation directly following core collapse, and α CE , which parameterizes the efficiency of common envelope ejection (see e.g., Ivanova et al. 2013). 2 We note that the choice of natal black hole spin χ b does not set all black holes in a given population to merge with this exact spin because tidal spin-up processes (Qin et al. 2018;Zaldarriaga et al. 2018;Bavera et al. 2021) and hierarchical mergers (Pretorius 2005;González et al. 2007;Buonanno et al. 2008) can increase the spin of black holes that participate in BBH mergers.We assume these two parameters take on a grid of discrete values (χ b ∈ [0.0, 0.1, 0.2, 0.5], α CE ∈ [0.2, 0.5, 1.0, 2.0, 5.0]) over which we compute the models, although α CE only affects the CE channel in our models.A plot of the detection-weighted marginalized model KDEs for α CE = 1.0, χ b = 0.2 is shown in Figure 1 as an example.Further details, including formation models, detection weighting, and mathematical framework, can be found in Zevin et al. (2021b).
Overall, we perform hierarchical inference on 7 are the 5 astrophysical formation channel branching fractions.
The steps involved in perfoming hierarchical inference on GW populations given a set of posterior samples ( ⃗ θ = [M c , q, χ eff , z] in our case) for N obs sources in the pres-2 The choice of these hyperparameters come from the availability of self-consistent astrophysical models; in particular, the globular cluster simulations were run on a grid of discrete values in χ b , and α CE variations only affect the parameter distributions of the CE model, which was less computational expensive to explore additional variations than other models.See Zevin et al. (2021b) for further discussion on the reasoning and motivations for the physical prescriptions considered. 3Due to the restriction i β i = 1 (the branching fractions must add up to 1), we technically only perform inference on 6 hyperparameters.
ence of selection effects have been thoroughly discussed in the literature (Farr et al. 2015;Mandel et al. 2019;Thrane & Talbot 2019;Vitale et al. 2022b;Abbott et al. 2023), and we therefore do not review them here.As in Zevin et al. (2021b) (Kass & Raftery 1995).
We also recover the detectable branching fractions β det , which represent the fraction of detectable BBHs originating from each channel.
These are computed by re-scaling each underlying branching fraction by its detection efficiency, defined as ξ χ b ,αCE , where P det ( ⃗ θ) is the probability of detecting a BBH with parameters ⃗ θ and p( ⃗ θ|µ χ,α j ) is the probability of formation channel j producing a BBH with parameters ⃗ θ, dependent on the model µ j and choice of physical prescription χ b and α CE (Zevin et al. 2021b).We apply this method to both real (Section 2.2) and simulated (Sections 3, 4) BBH data.

Application to GWTC-3
We extend the work of Zevin et al. (2021b) by applying the inference to confident BBH detections up to the second half of the 3rd observing run (O3b).We use the publicly-released priors and posterior samples from the GWTC-2.1 (The LIGO Scientific Collaboration et al. 2021b) and GWTC-3 (The LIGO Scientific Collaboration et al. 2021a) analyses; detection probabilities are calculated for the LIGO-Hanford, LIGO-Livingston, and VIRGO network operating at midhighlatelow sensitivities (Abbott et al. 2018).There are two key differences between our analysis and that of Zevin et al. (2021b).First, we apply a more stringent detection threshold of FAR ≤ 1 yr −1 (Abbott et al. 2023); therefore, events GW190424 180648, GW190514 065416, and GW190909 114149, which were used in the previous analysis, are now excluded.Second, we evaluate the prior at the posterior points analytically rather than by constructing a KDE from prior samples; for further discussion on this point see Appendix A. Finally, as in Zevin et al. (2021b), we opt to exclude GW190521 and GW190814 from the analysis, as their posteriors extend significantly to regions of our model KDEs with little to for our 5 different formation channels with αCE = 1.0, χ b = 0.2.Note the unique features of each formation channel and their dependence on the chosen model χ b = 0.2: while the χ eff marginalized KDE for field channels CE and SMT peak at χ eff = χ b = 0.2, with the CE channel having support for larger χ eff through tidal spin-up, dynamical channels GC and NSC peak at χ eff = 0 due to the independently isotropically oriented spins of the BBHs with symmetric wings to more extreme χ eff values from hierarchical mergers, and the CHE channel has χ eff exceeding χ b = 0.2 due to significant tidal spin-up of both black hole progenitors.The first row is the same as the third row of Figure 1 from Zevin et al. (2021b), except here we have calculated the detection weighting with O4 sensitivities, and the black ticks show the median values of the parameter estimation posteriors for each event in GWTC-2, while the green and red ticks mark the events added to and removed from, respectively, this updated analysis that includes GWTC-3.
no support.Furthermore, in the case of GW190814, it is uncertain whether this system is comprised of two black holes or a neutron star and a black hole.We find that the inclusion of GW190521 does not significantly affect our results; see Franciolini et al. (2022) for an analysis that includes GW190521.Overall, we do hierarchical inference on 68 BBHs, compared to the 45 BBHs considered in Zevin et al. (2021b).Unless otherwise specified, we report results as median and 90% symmetric credible intervals.
Figure 2 shows the posterior distributions on the underlying branching fraction ⃗ β including detections from O3b; the same plot but for the detectable branching fractions can be found in Figure 11 in Appendix B. We find −0.8 % and β SMT = 4.0 +6.5 −3.2 %, indicating strong support for the CE channel dominating the underlying astrophysical population in our set of models.However, there is comparable contribution from all five channels to the detectable BBH population: β det NSC = 20 +17 −13 %, and β det SMT = 24 +24 −19 %.We attribute this difference to the fact that compared to other channels, the CE channel produces less massive black holes distributed at higher redshifts, which therefore are harder to detect; see Figure 1.With 90% (99%) credibility, no single formation channel contributes to more than 49% (61%) of the detectable BBH population.Additionally, over 98% of posterior samples have significant (β det > 10%) contributions from three or more different formation channels.Overall, we find that the CE channel contributes the most to the underlying BBH population; however, as we discuss in Section 4, the posterior for β CE is also the most uncertain and priordominated.We additionally find that a mix of formation channels contributes to the detectable BBH population These results, as well as the overall shapes of the branching fraction posteriors, are consistent with the analysis with GWTC-2 data (Zevin et al. 2021b), which found β CE = 71 +19 −60 %, and that no single channel contributed to more than 70% of the detectable BBH population with 99% confidence.No strong correlations are ap-  parent upon examining a corner plot of the branching fractions.
Turning to the selection of physical prescription hyperparameters χ b and α CE , we find no posterior support for models with χ b > 0.1; there is no significant preference between the χ b = 0.0 and χ b = 0.1 models, with BF χ b =0.1 χ b =0.0 = 1.18.We favor high common envelope efficiencies, with BF αCE=5.0αCE=1.0= 249, and strongly disfavor low common envelope efficiencies, with BF αCE=0.2αCE=1.0= 5 × 10 −5 .These results are also consistent with Zevin et al. (2021b).The main difference is that we obtain stronger constraints, both in singling out preferred hypermodels and in the uncertainties (widths) of the branching fraction hyperposteriors.We attribute this effect to the increase in sample size from GWTC-2 to GWTC-3, as well as the difference in method in evaluating the prior at the posterior points during the inference; see Appendix A for further discussion.

Method
Next, we perform the same inference using simulated BBH observations in a universe where the BBH population exactly follows our models.The motivation for this analysis is to to quantify how uncertainties in the hyperposterior scale with the number of observed events.We first create a mock "universe", i.e. a set of true values for our hyperparameters ⃗ From each formation channel j, we draw n j = β j n BBHs from the population model p(θ|µ χ b , αCE j ), for a total of n = 5 × 10 4 BBHs that form our underlying population.Next, we draw from this underlying population, assigning extrinsic parameters (sky location and inclination) from an isotropic distribution.For each BBH system, we calculate its optimal signal-tonoise ratio ρ opt assuming a network consisting of LIGO-Hanford, LIGO-Livingston, and Virgo operating at O4 low (LIGO) and high (Virgo) sensitivities (Abbott et al. 2018;O'Reilly et al. 2022), and keep only the mock signals with ρ opt ≥ 11; we repeat this process until we have a mock catalog of N obs detections.Then, we perform parameter estimation on the N obs BBHs.For both the SNR calculation and parameter estimation, we use the Bayesian inference software bilby (Ashton et al. 2019) and the IMRPhenomXP waveform approximant (Pratten et al. 2021).Finally, we use these posterior samples for the hierarchical inference analysis outlined in Section 2.1.For consistency, we use the same O4 sensitivities for the detection weighting during the inference as the above computation of mock signal SNRs.
The various mock universes that we use in this paper, along with the sections in which they are discussed, are summarized in Table 1.For this analysis, we choose a fiducial unequal mixture of formation channels with underlying branching fractions  1. True values of the branching fractions for the different universes from which we generate our mock catalogs, along with the number of detections in each mock catalog used in the inference and the sections in which they are discussed.For all universes we pick true values χ b = 0.0 and αCE = 1.0, except the equal branching fraction universe (second row), where we use χ b = 0.2.
To investigate the scaling of the hyperposterior with N obs , for each universe we repeat the inference with N obs = 50, N obs = 150, and N obs = 250.N obs = 250 represents an estimate on the total number of BBH detections anticipated by the end of O4 (Weizmann Kiendrebeogo et al. 2023).

Results
In Figure 3, we plot for our chosen fiducial universe the overall posterior distribution on the underlying branching fractions for different values of N obs (left column), as well as contributions (to scale) from the most favored values of χ b and α CE (columns 2 to 5).First, we do recover the true values of ⃗ β and χ b .For N obs = 250, we find β CE = 28 +26 −21 %, −2.1 % and β SMT = 24 +16 −13 %; the true value of the branching fraction falls within the 90% symmet- ric credible interval of the posterior for all 5 channels.Furthermore, χ b = 0, the chosen true value for χ b , is favored over the next best model, χ b = 0.1, by a Bayes factor of 3.5 × 10 5 .Unlike χ b , the model with α CE equal to the true value is not favored, with a marginal preference for higher values α CE = 5.0 and α CE = 2.0 over the true value α CE = 1.0 with Bayes factors BF αCE=5.0αCE=1.0= 8.9 and BF αCE=2.0αCE=1.0= 2.5.Since α CE only affects the CE channel, it is not too surprising that the inference did not decisively favor any one value, in a universe where most detected BBHs do not come from the CE channel.We note two possible contributing factors to favoring higher values of α CE over the true value.If BBHs formed in the CE channel disperse their envelopes more efficiently (i.e. have higher values of α CE ), the resulting BBHs are (a) Less massive, leading to lower detection efficiencies and therefore larger measurement uncertainties for this channel.This is because models with larger α CE have less low-mass binaries merging within the CE itself, leading to more low-mass BBHs being able to form and merge (Bavera et al. 2021).
(b) Lower spinning, due to the post-CE separations being wider for higher α CE and therefore less susceptible to tidal spin-up (Qin et al. 2018;Zaldarriaga et al. 2018;Bavera et al. 2021); this leads to more detections with χ eff closer to χ b = 0.0, the chosen fiducial value for this universe.In χ eff space, this is a feature degenerate with dynamical channels such as the GC channel, which also produces BBHs with χ eff ≈ 0 due to the BBHs produced having isotropic spin orientations.
This result suggests to view our result in Section 2.2, that higher common envelope efficiencies are favored, with some caution.More notably, we see convergence of the hyperposteriors towards their true values as we increase the number of detections.First, we highlight the narrowing of the branching fraction posteriors from the first row to the third row of Figure 3.We quantify uncertainties in the branching fraction posteriors by the widths of the 90% symmetric credible intervals, and will hereafter use the two phrases interchangeably.These uncertainties decrease by up to 69% as we go from 50 to 250 mock events (see  On the other hand, the inference has increasing support for α CE = 5.0, but even at 250 mock detections, we only weakly prefer it over the true model α CE = 1.0. It is worth noting that the scaling of the uncertainty with N obs varies significantly between formation channels.Some channels have more distinctive features in parameter space (see Figure 1) that make them easier to distinguish during the inference.Additionally, differences in the detection efficiencies of different formation channels likely play a role as well.Figure 5 shows the percent decrease in the uncertainty of branching fraction posteriors from N obs = 50 to N obs = 250 against the detection efficiency of its channel ξ χ b ,αCE for our fiducial values χ b = 0.0 and α CE = 1.0.Channels with lower detection efficiencies, most notably the CE channel, appear to scale much more poorly with N obs than channels with higher detection efficiencies.Indeed, the CE channel tends to produce lower-mass black holes at higher redshifts (due to typically shorter delay), leading to fewer of them being detectable (see Figure 1).We emphasize the effect of the low detection efficiency of the CE channel: despite the fact that 40% of the mock underlying population originates from this channel, only 2, 5, and 17 CE BBHs end up in the mock observations, of the 50, 150, and 250 total detections, respectively.The β CE posterior not only has the largest uncertainty of all formation channels, but also barely decreases in uncertainty as we increase N obs .If we instead examine how the detectable branching fractions scale with N obs in Figure 4, we can see that all detectable branching fractions narrow at similar rates as we increase N obs from 50 (first row) to 250 (third row).Indeed, we can see in Figure 5 that the detectable branching fractions (dotted line) do not exhibit the dependence of the convergence rate on detection efficiency that the underlying branching fractions (solid line) have.Already, we can see some of the difficulties that arise from hierarchical Bayesian inference in the face of large measurement uncertainties, selection effects, and degenerate features in parameter space.A common theme, we will explore these problems in further detail in Section 4.
Finally, we repeat this analysis for a different set of hyperparameters consisting of equal branching fractions between all formation channels (β j = 0.2 for all j), χ b = 0.2, and α CE = 1.0 (see the second row of Table 1).We find similar results in the convergence with N obs .Uncertainties in the underlying branching fraction decrease by up to 47% from N obs = 50 to N obs = 250, and, consistent with our previous results, the uncertainty in β CE does not decrease.We find also that the shrinking of the detectable branching fraction (β det ) posterior uncertainty is more consistent across different channels than the underlying branching fraction.Increasing N obs causes a strong preference for the true value of χ b = 0.2; while we have BF χ b =0.2 χ b =0.1 = 1.0 for N obs = 50, there is no posterior support for χ b ̸ = 0.2 at N obs = 150 and N obs = 250.Similar to the previous example, there is no strong preference for the true value of α CE , although it is slightly favored with BF αCE=1.0αCE=0.5 = 1.7.Figures showing the marginalized posteriors on β and β det for this set of hyperparameters can be found in Appendix B.

BIASES OF POPULATION INFERENCE
Finally, we investigate the biases that may arise when performing hierarchical inference with the methods described above in Section 2.1.We again refer to Table 1 for the chosen true values of the hyperparameters that we present in subsequent sections; we use the same method for hierarchical inference and simulated BBH detections as outlined in Sections 2.1 and 3.1, respectively.In Section 4.1, we perform hierarchical inference on sources originating exclusively from each individual formation channel and examine differences in the recovered posteriors.In Section 4.2, we isolate the effect of the natal black hole spin hyperparameter by comparing the posteriors of universes with different choices of χ b .In Section 4.3, we explore the consequences of doing hierarchical inference with incomplete information (i.e.excluding a channel from the inference) with detections from a mixture of formation channels.Finally, we summarize and discuss our results in Section 4.4.

Inference in Single-Channel Dominated Universes
For each channel j, we perform hierarchical inference on N obs = 100 mock detections in a universe where the entire underlying BBH population originates from channel j (i.e.β j = 1).We choose for our true values of the physical prescription χ b = 0.0 and α CE = 1.0, although the latter choice only affects the CE-dominated universe.Figure 6 shows the branching fraction posteriors and support for different values of χ b for each single-channel dominated universe.
In general, none of the branching fraction posteriors have significant support for β = 1, even for the channel that is actually producing the entirety of the BBHs.This happens because we use a flat, symmetric Dirichlet distribution for our prior, which results in a prior preference for a mixture of channels rather than a single dominating channel; this is illustrated by the gray curves in Figure 6.
The 5th percentiles of the dominating-channel branching fraction posteriors for the CE, CHE, GC, NSC, and SMT-dominated universes are β CE = 94%, β CHE = 62%, β GC = 41%, β NSC = 31%, and β SMT = 51%, respectively.The degree to which we underestimate the contribution from the dominating channel varies significantly depending on the channel.We again highlight the effect of detection efficiency.As we saw in Figure 5, the CE, GC, and SMT channels have the lowest detection efficiencies, especially the CE channel.Across the different universes under consideration, β CE , β GC , and β SMT (first, middle, and last columns, respectively) have larger uncertainties that compete with and take away from the branching fraction posterior of the dominating channel; because of the lower detection efficiencies of these channels, it is difficult to discern whether this channel is nonexistent or if its contribution to the full set of detected observations is minor.The case of β CE is particularly severe: as it does not strongly deviate from the prior, the 95th percentile for β CE is greater than 32% in all universes for which there is no CE contribution to the underlying population.The CE-dominated universe (first row) does not suffer from this effect; as a result, only in that universe do we recover with narrow precision that the dominating channel is indeed dominating.In the opposite case, the NSC channel has the highest detection efficiency.In the NSC-dominated universe (fourth row), the median value of the β NSC posterior is less than 0.5, with large contributions to the BBH population from the channels with lower detection efficiencies (β CE , β GC , and β SMT ).
There are also several interesting features of the spin model selection.First, in the CHE-dominated universe (second row), the inference is not able to select the true natal black hole spin of χ b = 0.0, and instead gives approximately equal support to χ b = 0.0, 0.1 and 0.2, as illustrated by the similar heights of the different colored curves.Recall that while the aligned-spin CE and SMT field channels mostly produce BBHs with χ eff ≈ χ b (although CE has a tail for higher spins from tidal spin-up) and the isotropic-spin GC and NSC dynamical channels produce BBHs scattered around χ eff ≈ 0, the CHE channel uniquely produces BBHs with χ eff ≳ 0.2 irrespective of χ b due to strong tidal spin-up effects (see Figure 1).As a result, the inference is not able to discern between values of χ b between 0 and the true value 0.2.In the universes dominated by dynamical formation channels (GC, third row, and NSC, fourth row), the selection of the true value of χ b = 0.0 is less strong, with non-negligible support for χ b = 0.1 as shown by the green curves.In these universes, most detections have χ eff ≈ 0, and χ b only affects the width of this distribution, a weaker feature to detect.Additionally, the sub-population of hierarchical mergers in these channels help to drive the mild support for χ b > 0; at lower values of χ b , hierarchical mergers occur more readily due to weaker gravitational recoil kicks.Hierarchical mergers can have |χ eff | sig- nificantly greater than zero (Pretorius 2005;González et al. 2007;Buonanno et al. 2008;Kimball et al. 2020), as illustrated by the wings of the marginalized χ eff distribution for the GC and NSC channels in Figure 1.Because hierarchical mergers form a larger sub-population in NSC due to their deeper potential wells and ability to retain post-merger black holes (Miller & Lauburg 2009;O'Leary et al. 2009;Hong & Lee 2015;Antonini & Rasio 2016), this effect is greater for the NSC-dominated universe.Additionally, in the NSC-dominated universe, the β NSC posterior for χ b = 0.1 (green curve) peaks at a higher value than χ b = 0.0 (blue curve).This is because when χ b = 0.1, field channels produce fewer of the χ eff ≈ 0 BBHs that make up the bulk of the population.Finally, although not shown in Figure 6, we comment on the inference on α CE .First, we do favor the true value of α CE = 1.0 for the CE-dominated channel, preferring it over the next most favored model with BF αCE=1.0αCE=2.0= 72, as expected from a universe where all detections are from the CE channel.We also note consistency with the results of Section 3.2 in the recovery of α CE for universes where there is no contribution from the CE channel; there is a small bias towards higher values of α CE .We find BF αCE>1.0αCE<1.0= 1.52, 1.54, 1.68, and 1.58 for the CHE, GC, NSC, and SMT-dominated universes, respectively 4 .

Biases in Spin Inference
To examine the effect of χ b on the hyperposterior, we perform hierarchical inference on different universes with the same underlying branching fractions, but different true values of χ b , using 250 mock detections.To isolate the effect of χ b , we choose an equal mixture of formation channels in the underlying population (β j = 0.2 for all channels j).We choose α CE = 1.0.
Figure 7 shows the marginalized branching fraction posteriors for universes with χ b = 0.0 and χ b = 0.2.Consistent with our results in Section 3.2, we recover the true values of the branching fractions as well as the true value of χ b for both universes.Here, we can see that the selection of χ b is non-linear: it is harder to distinguish between lower black hole spins (i.e.χ b = 0.0 versus χ b = 0.1) than higher spins.While there is no support for other values of χ b in the posterior of the χ b = 0.2 universe, there is still non-negligible support for χ b = 0.1 in the χ b = 0.0 universe, with BF χ b =0.0 χ b =0.1 = 30.This is expected; it is difficult to discern between slowly-spinning and non-spinning populations 4 To be precise, the quantity we calculate is BF due to the inherent measurement uncertainty of GW observations (Baird et al. 2013;Vitale et al. 2014Vitale et al. , 2017a;;Pürrer et al. 2016).
Figure 8 shows a corner plot of the branching fraction posteriors for both universes.Here, we can see the same effect: despite having the same number of mock detections, the data is more informative (yielding a posterior more different from the prior) in the universe with non-zero χ b .

Inference with an Incomplete Set of Populations
Finally, we perform hierarchical inference with one formation channel excluded, such that while all five channels are contributing sources, the inference is only performed with four channels.This analysis is motivated by the fact that any population analysis done on real BBH data likely does its inference with an incomplete set of formation channels; we most likely do not know the totality of all possible BBH formation channels, nor can we model them all accurately and self-consistently.We again consider an equal-mixture branching fraction universe (β = 0.2 between all formation channels), and true values for our physical prescription of α CE = 1.0 and χ b = 0.2. Figure 9 shows the marginalized branching fraction posteriors and support for different values of χ b for the full inference as well as for inferences with one channel excluded.We defer discussion of biases in α CE selection to Appendix B, and focus on χ b and the formation channel branching fractions in this section.
By examining which channels receive more or less posterior support as a result of the inference's incomplete knowledge of formation channels, we can infer the correlations between the branching fractions of different formation channels.For example, when the CE channel is excluded (second row), the increase in branching fraction is spread approximately equally over the other channels, suggesting a negative correlation of β CE with the other branching fractions, as is the prior.On the other hand, when the SMT channel is excluded from the inference (last row), the β CE posterior shifts towards higher values, while support for the other channels decreases.
Such correlations are consistent with the branching fraction corner plot from the full inference (Figure 8); the pink contours are relevant to the set of mock detections discussed in this section.The uncertainties in β CE are the greatest of all channels, and the posterior the least constrained; as such, the marginalized β CE posterior closely follows the prior.As a result, β CE is negatively correlated with all channels, which leads to systematic overestimation when doing inference with a channel excluded.In general, we can see that some branching fraction posteriors (i.e.β NSC and β CHE ) are much better constrained than others and are affected the least by the choice of prior.As seen in Section 4.1, the prior can cause a bias towards higher values of β for channels with lower detection efficiencies and greater uncertainties.
Next, we highlight the effect of incomplete formation channel knowledge on the selection of the natal black hole spin χ b .We see two cases in which the wrong value of χ b is selected.When the CHE channel is excluded (third row), we infer a high natal black hole spin of χ b = 0.5, as indicated by the yellow curve, with a Bayes factor over the true value χ b = 0.2 of BF χ b =0.5 χ b =0.2 = 373.This is due to the high-χ eff BBHs that the CHE channel produces.When the inference does not account for the CHE channel, it tries to explain these highly-spinning CHE black holes with other field-channel BBHs spinning at a higher χ b .Then, in order to still account for the lower χ eff non-CHE BBHs, the branching fraction for the GC and SMT channels are increased and decreased, respectively.We remark that no such adjustment is seen for the NSC and CE channels, despite them having similar features in χ eff space as the GC and SMT channels, respectively.
An opposite effect is seen when excluding the NSC channel (fifth row), which, as noted above, produces BBHs scattered around χ eff = 0 with tails that extend to more positive and negative values due to the presence of hierarchical mergers.The inference has constructed two competing explanations in order to explain these lower-spinning BBHs: low-χ b CE BBHs (represented by the green and blue curves with posterior support for high β CE and low β GC ) and higher-χ b GC BBHs (represented by the pink curve with support for low β CE and high β GC ).This highlights the degeneracy between low-spinning field channels and high-spinning dynamical channels in producing similar features in the population χ eff distribution.This case is especially remarkable because the green and blue curves (χ b = 0.0 and χ b = 0.1) for the β CE posterior bears striking resemblance to the β CE posterior inferred from GWTC-3 data (Figure 2), even though these features are purely an artifact of the inference neglecting a single BBH formation channel.We again remark on the difference between the β CE and β SMT posteriors: despite the fact that both CE and SMT are field channels with the similar features in χ eff space, support for β SMT does not increase (and rather decreases) in the low-χ b model as a result of the exclusion of the NSC channel.One reason why this may occur is that SMT BBHs cannot go through tidal spinup, unlike CE BBHs, and hence accounts for a narrower range of χ eff concentrated around χ b .Therefore, β SMT can be strongly affected by the exclusion of a channel with strong features in χ eff space, especially when the wrong model of χ b is inferred.
As mentioned in the previous paragraph, we also find notable that the inference favors two separate competing explanations when NSC is excluded.The GC and NSC marginalized χ eff KDEs are similar due to the isotropy of spin orientations in these two channels; upon examining the marginalized KDEs of the other three BBH parameters (M c , q, z), the GC channel appears still to have the closest resemblance to the NSC channel.Despite this, the inference does not simply overcompensate the exclusion of the NSC channel by correspondingly increasing β GC , suggesting the influence of higher-dimensional features in parameter space.On the other hand, no such effect is seen when the GC channel is excluded, whose posterior has a simple overcompensation in β NSC and β CE .
Indeed, although we have been able to broadly interpret these posteriors by focusing on different channels' features in χ eff space, there must be other subtle effects at play.We have shown in multiple ways the differences between the β CE and β SMT posteriors and the β GC and β NSC posteriors, despite having similar features in χ eff space.Although the exclusion of each channel has its own unique and interesting consequences, there appears to be a bias for the CE and GC channels, systematically underestimating the CHE and NSC channels.We point again to detection efficiency: of the field and dynamical channels, respectively, the CE and GC channels have by far the lowest detection efficiencies.With the uniform branching fraction spread in our current set of hyperparameters, only 2% of detections are expected to be from the CE channel (versus 30% and 16% from the CHE and SMT channels), and 16% from the GC channel (versus 36% from the NSC channel).
Finally, we have repeated this analysis for the set of hyperparameters used in the convergence analysis in Section 3, with an unequal mixture of channels and χ b = 0.0.In this universe, the bias towards overestimating β GC is more severe due to the lack of spin information from our choice of χ b .Consistent with our previous discussion, there again seems to be a bias towards the CE and GC channels when one channel is excluded.Due to the CHE and NSC channels' high detection efficiencies, the β CHE and β NSC posteriors support their low true values (5% for both channels) with small uncertainties.The corresponding plot of the branching fraction posteriors for this universe can be found in Appendix B.

Discussion
In this section, we conducted several investigations of different biases that arise in hierarchical Bayesian inference based on astrophysical formation models of BBHs.We summarize the key takeaways as follows: 1. Single-channel dominated universes (Section 4.1) • Even when our models of the underlying formation channels are perfectly accurate, at N obs = 100 the data are still relatively uninformative due to information loss from parameter estimation and low detection rates.Thus, results of inference are still influenced by the choice of a flat prior and exhibit bias towards a mixture of formation channels.This caveats our result from Section 2.2 that no single channel dominates the underlying BBH population, from the inference on GWTC-3 data.
• It is difficult to precisely infer the true value of α CE , as it only affects the CE channel, which produces very few detectable BBHs due to its low detection efficiency relative to the other channels considered.Only with mock catalogs where the CE channel dominates the detections can we recover the true value of α CE , as expected.
2. Biases in spin inference (Section 4.2) • It is easier to infer higher values of χ b than lower values.When χ b is low (0.0 or 0.1), one has less spin information, and uncertainties in both the branching fraction posteriors and the selection of the true value of χ b are greater.
3. Inference with incomplete populations (Section 4.3) • Both the branching fraction posteriors and the inferred value of χ b can be heavily affected if the inference is performed without knowledge of all contributing formation channels.Some branching fractions are overestimated or underestimated by a factor of ∼ 3 or more from the exclusion of a formation channel, and ignoring channels that produce particularly high or low χ eff BBHs can cause the inference to strongly select an incorrect value of χ b .
• Degeneracies exist between different sets of hyperparameters that can make it difficult for the inference to discriminate between them.For example, it can be difficult to distinguish between field BBHs with low χ b and dynamical BBHs with higher χ b , due to χ eff being the only spin information used in the inference (which in turn is due to the fact that χ eff is arguably the only spin parameter that can be measured for all BBHs with advanced detectors).There are likely more subtle degeneracies and correlations in higherdimensional parameter space that are difficult to explain from the marginalized distributions, but nonetheless play a role in the inference.
• Inference on the underlying branching fractions can be biased due to the varying detection efficiencies of different channels.In particular, the CE channel has a detection efficiency nearly an order of magnitude below the other channels, causing large measurement uncertainties in β CE and a relatively uninformed (i.e.close to the prior) posterior for β CE .As a result, the exclusion of a channel from the inference usually results in the β CE posterior support extending to higher values; similar effects can be seen in other low detection efficiency channels, such as the GC and SMT channels.

CONCLUSIONS
Understanding the physical processes and formation environments of compact binary mergers is one of the most pressing questions in GW astrophysics.In this paper, we pair the most recent catalog of BBH mergers provided by the LVK with an expansive, self-consistent suite of astrophysical models to investigate the origins of BBH mergers.Consistent with Zevin et al. (2021b), we find that given our set of astrophysical models, multiple formation channels are likely contributing to the observed population (though see Section 4.1 for a caveat).We demonstrate both the predictive power of our inference methodology and its scaling with future detections by generated mock observations with realistic measurement uncertainties from synthetic universes with known branching fractions and physical prescriptions.Perhaps most important, we also demonstrate the pitfalls of this type of inference, particularly how an incomplete census of formation models or incorrect physical assumptions can lead to significant biases in inference.This work should be treated as a cautionary tale for those attempting to understand relevant physical processes leading to compact binary mergers and formation environments of compact binary progenitors, as inference can be severely compromised if models suffer from inaccuracies of incompleteness.
Though the suite of BBH formation channel models used in this work are state-of-the-art and apply selfconsistent physical treatments where possible, they in many ways can be treated as exemplary.Given the numerous uncertainties in massive-star evolution, binary physics, compact object formation, and environmental effects, it is currently impossible to construct models with complete physical accuracy or to fully explore all the uncertainties that impact the source property predictions of population synthesis.Regardless, the biases demonstrated in this analysis are a generic concern when performing inference based on an incomplete or inaccurate set of astrophysical models.We do not suggest that such studies have no utility; compared to population inference that rely on heuristic or flexible models, studies such as these have the benefit of translating directly to physical constraints, albeit requiring proper caveats.Despite the potential issues with such analyses, we anticipate that given the diversity of BBH properties observed to date, the key result of multiple formation channels contributing to the detected population of BBHs remains robust.
A potential concern one might have when considering multiple formation channels for the production of BBH mergers is how the universe could conspire to have multiple distinct formation pathways, governed by unique physics, to contribute to the population of merging BBHs at a similar rate.Occam's razor would suggest that this is an unlikely scenario.However, astrophysical transients have been shown in many instances not to obey this principle (Maccarone 2021).Many channels of BBH formation have predicted rates within the same order-of-magnitude (Rodriguez et al. 2021;Broek-gaarden et al. 2022, see Mandel & Broekgaarden 2022 for a review) and the selection effects inherent to GW detection are certainly capable of causing sources from intrinsically rare channels to be heavily represented in the detected population.Future observations and improved population synthesis routines may help to more robustly disentangle the relative rates of various compact binary formation channels, and thereby have the capability of placing constraints on underlying physical processes.Nonetheless, for the time being we show that it is important to consider the potential biases that can accumulate when accounting for an incomplete picture of compact binary mergers in the universe.
Multiple avenues can be used in tandem with the analyses presented in this work to help expedite the ability of placing robust constraints on compact binary formation channels.In addition to analyses of the full population of BBH mergers, observational signatures from single events that are unique to one or a subset of formation channels will help to place constraints on the relative contribution of various formation channels (e.g., Zevin et al. 2021a).Observational constraints from other probes of compact binary formation outside of GW astronomy, such as electromagnetic surveys of BBH stellar progenitors, astrometric observations of compact object binaries, identification and host association of gammaray bursts and kilonovae, and characterization of pulsar binaries in the Milky Way can all help complement and improve constraints that rely solely on GW observations.Incorporating such information into astrophysical inference will help population analyses using astrophysical simulations remain pertinent and scale with the rapidlygrowing catalog of compact binary merger observations.The posterior samples in the analyses presented in this work, the code for calculating the prior at the posterior points (see Appendix A), and all figures, along with additional figures and the accompanying Jupyter notebook, are available on Zenodo (Cheng 2023).2, showing the marginalized branching fractions inferred from 68 events in the cumulative GWTC-3 catalog, but with the prior at the posterior points evaluated with a KDE, rather than analytically.
Figure 10 shows the marginalized branching fraction posteriors inferred from GWTC-2.1 and GWTC-3 data, but with the prior π(θ k i ) for each event i evaluated at each posterior sample k calculated via a 4-dimensional Gaussian KDE constructed from the prior samples provided from LVK data releases, as in Zevin et al. (2021b).Comparing with Figure 2, determining π(θ k i ) in this way gives rise to some noisy features in the posterior, such as non-trivial support for χ b = 0.2 and χ b = 0.5 (pink and yellow curves, respectively).While there is no support for χ b > 0.1 with analytical evaluation of the prior, with the KDE method we have BF χ b ≤0.1 χ b >0.1 = 4.00.Similarly, preference between different values of α CE is also weaker, with BF αCE=5.0αCE=1.0= 4.24, as opposed to 249.The primary notable features in the posterior as discussed in Section 2.2, however, still remain robust.

B. ADDITIONAL FIGURES
In this appendix, we show additional plots (starting from Figure 11).Details are given in each figure's caption.2, except here we show detectable branching fractions inferred from GWTC-2.1 and GWTC-3 data rather than underlying branching fractions.

Figure 1 .
Figure1.Detection-weighted (top)  and underlying (bottom) marginalized KDEs of our 4 BBH parameters, ⃗ θ = [Mc, q, χ eff , z], for our 5 different formation channels with αCE = 1.0, χ b = 0.2.Note the unique features of each formation channel and their dependence on the chosen model χ b = 0.2: while the χ eff marginalized KDE for field channels CE and SMT peak at χ eff = χ b = 0.2, with the CE channel having support for larger χ eff through tidal spin-up, dynamical channels GC and NSC peak at χ eff = 0 due to the independently isotropically oriented spins of the BBHs with symmetric wings to more extreme χ eff values from hierarchical mergers, and the CHE channel has χ eff exceeding χ b = 0.2 due to significant tidal spin-up of both black hole progenitors.The first row is the same as the third row of Figure1fromZevin et al. (2021b), except here we have calculated the detection weighting with O4 sensitivities, and the black ticks show the median values of the parameter estimation posteriors for each event in GWTC-2, while the green and red ticks mark the events added to and removed from, respectively, this updated analysis that includes GWTC-3.

Figure 2 .
Figure 2. Marginalized branching fractions inferred from 68 BBHs from the cumulative GWTC-3 for each of the five formation channels.Colored curves show the contributions from different values of χ b (top row) and αCE (bottom row), marginalized over αCE and χ b , respectively; the black dotted curve shows the total contribution from all hypermodels.Note the log-scaling of the y-axis; the 90% symmetric credible intervals are marked by vertical lines.

Figure 3 .
Figure 3. Marginalized branching fraction posteriors for 50 (first row), 150 (second row) and 250 (third row) mock detections.Here, branching fraction posteriors of different channels are plotted together as different colored lines, while contributions from different hyper-models are shown in different columns.The first column (from the left) shows the total marginalized branching fraction; the triangles mark the credible symmetric 90% intervals.The second and third columns show the contributions from the two values of χ b (marginalized over αCE) with the most support; the fourth and fifth columns are analogous for αCE.Note that the second through fifth columns do not represent normalized probability distributions; we plot them to scale with their contribution to the total marginalized posterior in the first column, just as with the colored curves in Figure2.True values of the underlying branching fractions (βCE = 40%, βCHE = 5%, βGC = 25%, βNSC = 5%, βSMT = 25%) are shown by the vertical colored lines in the "Overall" column, where βNSC and βSMT are given small artificial offsets, while χ b = 0.0 and αCE = 1.0.

Figure 4 .
Figure 4.The same figure as Figure 3 except with detectable branching fractions rather than underlying.

χ
b increases by nearly 5 orders of magnitude, strongly selecting the true value χ b = 0.0.This is illustrated by the empty plot corresponding to the contribution from χ b = 0.1 for N obs = 250 (third row, middle column), indicating negligible support for competing values of χ b .

Figure 5 .
Figure5.The percent decrease in the branching fraction posterior 90% symmetric credible interval width from 50 to 250 mock detections plotted against the channel's detection efficiency.The black dotted vertical lines mark which points correspond to which formation channel.While the underlying branching fractions (solid line) converge faster with N obs in channels with higher detection efficiencies, no such correlation is seen in the detectable branching fractions (dashed line).

Figure 6 .
Figure6.Marginalized branching fraction posteriors inferred from a catalog of 100 mock detections for a universe where the only BBH formation channel is (top to bottom) CE, CHE, GC, NSC, and SMT.As in the top row of Figure2, different colored curves represent support for different values of χ b marginalized over αCE, while the dotted black curve gives the total marginalized posterior.We have also included the prior in gray.The vertical lines mark the 90% symmetric credible interval.

Figure 7 .Figure 8 .
Figure 7. Marginalized branching fraction posteriors in the same format as Figure 6, except here the rows show two universes (each with mock observations) with different true values of χ b (0.0, top, and 0.2, bottom).The true values of the branching fractions (βj = 20% for all channels j) are given by the solid vertical lines, color-coded by the true value of χ b .

Figure 9 .
Figure 9. Marginalized branching fraction posteriors.The first row shows the results from the full inference in our χ b = 0.2, αCE = 1.0 equal branching-fraction universe with N obs = 250, identical to the second row of Figure 7, while each of the remaining rows shows the result of inference performed without knowledge of one of the formation channels.The format of this plot is the same as Figures 6 and 7.

Figure 10 .
Figure10.The same as Figure2, showing the marginalized branching fractions inferred from 68 events in the cumulative GWTC-3 catalog, but with the prior at the posterior points evaluated with a KDE, rather than analytically.

Figure 11 .
Figure11.The same as Figure2, except here we show detectable branching fractions inferred from GWTC-2.1 and GWTC-3 data rather than underlying branching fractions.
and a CE efficiency of α CE = 1.0.

Table 2 )
. The Bayes Factor in favor of the correct

Table 2 .
Uncertainties in the measurement of the branching fractions with N obs shown in Figures3 and 4. The right-most two columns give the Bayes factors of the true value of χ b and αCE over the most favored competing value for 50, 150, and 250 mock detections (first, second, and third row).The rest of the table shows the underlying and detectable branching fraction uncertainties, as well as their percent changes between the first and third rows.Note that these are credible 90% interval widths in percentage points, not percent uncertainties.