) Population Properties of Compact Objects from the Second LIGO-Virgo Gravitational-Wave Transient Catalog

We report on the population of 47 compact binary mergers detected with a false-alarm rate of < - 1 yr 1 in the second LIGO – Virgo Gravitational-Wave  Transient  Catalog. We observe several characteristics of the merging binary  black  hole ( BBH ) population not discernible until now. First, the primary mass spectrum contains structure beyond a power law with a sharp high-mass cutoff; it is more consistent with a broken  power  law with a break at  - + M 39.7 9.1 20.3 or a power  law with a Gaussian feature peaking at


ABSTRACT
We report on the population of the 47 compact binary mergers detected with a false-alarm rate <1 yr −1 in the second LIGO-Virgo Gravitational-Wave Transient Catalog, GWTC-2. We observe several characteristics of the merging binary black hole (BBH) population not discernible until now. First, the primary mass spectrum contains structure beyond a power-law with a sharp high-mass cut-off; it is more consistent with a broken power law with a break at 39.7 +20.3 −9.1 M , or a power law with a Gaussian feature peaking at 33.1 +4.0 −5.6 M (90% credible interval). While the primary mass distribution must extend to ∼ 65 M or beyond, only 2.9 +3.5 −1.7 % of systems have primary masses greater than 45 M . Second, we find that a fraction of BBH systems have component spins misaligned with the orbital angular momentum, giving rise to precession of the orbital plane. Moreover, 12% to 44% of BBH systems have spins tilted by more than 90 • , giving rise to a negative effective inspiral spin parameter χ eff . Under the assumption that such systems can only be formed by dynamical interactions, we infer that between 25% and 93% of BBH with non-vanishing |χ eff | > 0.01 are dynamically assembled. Third, we estimate merger rates, finding R BBH = 23.9 +14.3 −8.6 Gpc −3 yr −1 for BBH and R BNS = 320 +490 −240 Gpc −3 yr −1 for binary neutron stars. We find that the BBH rate likely increases with redshift (85% credibility), but not faster than the star-formation rate (86% credibility). Additionally, we examine recent exceptional events in the context of our population models, finding that the asymmetric masses of GW190412 and the high component masses of GW190521 are consistent with our models, but the low secondary mass of GW190814 makes it an outlier.
Keywords: gravitational waves 1. INTRODUCTION We analyze the population properties of black holes (BHs) and neutron stars (NSs) in compact binary systems using data from the LIGO-Virgo Gravitational-wave Transient Catalog 2 (GWTC-2; Abbott et al. 2020c). The GWTC-2 catalog combines observations from the first two observing runs (O1 and O2; Abbott et al. 2019b) and the first half of the third observing run (O3a; Abbott et al. 2020c) of the Advanced LIGO (Aasi et al. 2015) and Advanced Virgo (Acernese et al. 2015) gravitational-wave observatories. With the 39 additional candidates from O3a, we have more than quadrupled the number of events from O1 and O2, published in the first LIGO-Virgo Transient Catalog (GWTC-1; Abbott et al. 2019b). Counting only events with false alarm rate (FAR) < 1 yr −1 (as opposed to the less conservative FAR threshold of < 2 yr −1 in GWTC-2), the new combined catalog includes: two binary NS (BNS) events, 44 confident binary black hole (BBH) events, and one NS-BH (NSBH) candidate, which may be a BBH-a topic we discuss below. We define BBH events as systems where both masses are above 3 M at 90% credibility. These 47 events are listed in Table 1. Our chosen FAR threshold ensures a relatively pure sample with only ∼ 1 noise event (see Section 2) and excludes three marginal triggers presented in GWTC-2. Two of these excluded events are BBH candidates (GW190719_215514 and GW190909_114149) and one is an NSBH candidate (GW190426_152155).
The latest observations include a number of individually remarkable events, which invite theoretical speculation while providing challenges to existing models. The observation of high-mass binaries such as GW190521 (Abbott et al. 2020e), which has a primary mass m 1 ∼ 85 M , is in tension with the sharp mass cutoff m max = 40.8 +11.8 −4.4 M inferred from the GWTC-1 detections, forcing us to reconsider models for the distribution of black hole (BH) masses in binary systems (Abbott et al. 2020e,f). Here and in the following, the primary mass m 1 refers to the bigger of the two component masses in the binary, while the secondary mass m 2 refers to the smaller of the two. Along with GW190521, GW190602_175927 and GW190519_153544 also have primary masses above 45 M at > 99% credibility. These high-mass binaries are interesting from a theoretical perspective since they occur in the predicted pair-instability gap (Barkat et al. 1967;Fowler & Hoyle 1964;Heger et al. 2003;Woosley & Heger 2015;Belczynski et al. 2016). Additionally, GWTC-2 includes the first systems with confidently asymmetric component masses, including GW190412 with mass ratio m 2 /m 1 ≡ q = 0.28 +0.12 −0.06 (Abbott et al. 2020a) and GW190814 (Abbott et al. 2020b), with q = 0.112 +0.008 −0.009 . Furthermore, the secondary mass of GW190814, m 2 = 2.59 +0.08 −0.09 M , is near the purported NS-BH gap (Bailyn et al. 1998;Özel et al. 2011;Farr et al. 2011), posing a challenge to our understanding of binary formation (Abbott et al. 2020b;Zevin et al. 2020;Mandel et al. 2021). We can gain insight into these exceptional events by studying them in the context of the larger population of compact binaries (Fishbach et al. 2020b).
In particular, studying the enlarged population of BBH events enables us to investigate how compact binaries form. 1 Several evolutionary channels have been proposed to explain the origin of the compact binary mergers observed with Advanced LIGO and Advanced Virgo. The isolated binary channel may occur via common envelope evolution (e.g., Bethe & Brown 1998;Portegies Zwart & Yungelson 1998;Belczynski et al. 2002;Dominik et al. 2015), the remnants of Population III stars (e.g., Madau & Rees 2001;Inayoshi et al. 2017), or chemically homogeneous stellar evolution (e.g., Marchant et al. 2016;. Evolution via common envelope predicts BBH systems with component masses up to ∼ 50 M , mass ratios in the range 0.3 q < 1, and nearly aligned spins (Kalogera 2000;Mandel & O'Shaughnessy 2010;Dominik et al. 2013;Giacobbo et al. 2017;Eldridge et al. 2017;Olejak et al. 2020).
In the chemically homogeneous scenario, BBH systems are expected to form with nearly equal mass components, in addition to aligned spins and component masses in the range ∼ 20-50 M . In isolated formation scenarios, component BHs form via stellar collapse, and are thus not expected to occur within the pair-instability mass gap, ∼ 50-120 M , but may populate either side of the gap.
Before we continue, we summarize key questions addressed in the previous analysis of GWTC-1 data (Abbott et al. 2019a) and how the inclusion of O3a events affects our findings: 1. Are there BBH systems with component masses 45 M ? Following O1 and O2, we inferred that 99% of BBH systems have primary mass below m 99% ≈ 45 M . Moreover, this limit was consistent with a sharp cutoff at m max = 40.8 +11.8 −4.4 M , hypothesized to correspond to the lower edge of the pair-instability mass gap expected from supernova theory (Woosley & Heger 2015;Heger et al. 2003;Talbot & Thrane 2018). While the GWTC-2 events remain consistent with 97.1 +1.7 −3.5 % of BBH systems having primary masses below 45 M (inferred using the the Power Law + Peak mass model described in Section 3, or "Model C" from Abbott et al. 2019a), high-mass detections such as GW190521, GW190602_175927 and GW190519_153544 imply a non-zero rate of BBH mergers beyond the ∼ 45 M mass limit. We infer that the merger rate for systems with primary masses in the range 45 M < m 1 < 100 M is 0.70 +0.65 −0.35 Gpc −3 yr −1 , consistent with estimates inferred from GWTC-1 (Fishbach et al. 2020b).
2. Is there a preference for nearly equal component masses? All of the GWTC-1 observations are consistent with equal-mass binaries, with mass ratios q ≡ m 2 /m 1 = 1. In O3a, however, we detected two events with mass ratios bounded confidently away from unity: GW190814 and GW190412, though, most binaries are consistent with equal-mass. The NSBH candidate GW190426_152155, if real, also has unequal component masses q = 0.25 +0.41 −0.15 , but is above the FAR threshold for this work.
3. Does the merger rate evolve with redshift? From GWTC-1 we inferred that the BBH merger rate is 53.2 +55.8 −28.2 Gpc −3 yr −1 , assuming a rate density that is uniform in comoving volume. Allowing for a rate that evolves with redshift (Fishbach et al. 2018), we found that the local merger rate is R BBH (z = 0) = 19.7 +57.3 −15.9 Gpc −3 yr −1 , and that, while still consistent with a uniformin-comoving volume model, the rate density is likely increasing with redshift with 93% credibility (Abbott et al. 2019a). With GWTC-2, we are able to more tightly bound the BBH merger rate at R BBH = 23.9 +14.3 −8.6 Gpc −3 yr −1 (assuming the Power Law + Peak mass model and a constantin-comoving-volume merger rate), as well as its evolution with redshift. The data remain consistent with a merger rate that does not evolve with redshift, but prefer a rate that increases with redshift (85% credibility). Using the Power Law redshift evolution model of Section 3, we find that the merger rate evolves slower than the naive expectation of (1 + z) 2.7 from the local (z 1) star formation rate (SFR; Madau & Dickinson 2014) at 86% credibility. 4. How fast do black holes spin? From GWTC-1, we inferred that the BH spin component aligned with the orbital angular momentum is typically small (Abbott et al. 2016a;Farr et al. 2017;Farr et al. 2018;Tiwari et al. 2018;Abbott et al. 2019a;Wysocki et al. 2019a;Fernandez & Profumo 2019;Miller et al. 2020;Roulet et al. 2020). Among the GWTC-1 events, GW151226 is the only event to exhibit unambiguous signs of spin (Abbott et al. 2016b;Vitale et al. 2017a;Kimball et al. 2020a), while a few other events, including GW170729, show a mild preference for spin (Chatziioannou et al. 2019). We were unable to determine if this typical smallness was because the spin vectors are misaligned, because the spin magnitudes are small, or both, although the GWTC-1 data weakly disfavors the scenario in which all spins are perfectly aligned Tiwari et al. 2018;Abbott et al. 2019a;Biscoveanu et al. 2020). With additional data, we can now say confidently that some BBH systems have spins misaligned with the orbital angular momentum. A nonzero fraction of systems have measurable in-plane spin components, leading them to undergo precession of the orbital plane. Additionally, 12% to 44% of BBH systems merge with a negative effective inspiral spin parameter χ eff , see Eq. (5) below, implying that some component spins are tilted by more than 90 • relative to the orbital angular momentum axis. We refer to spins tilted more than 90 • as anti-aligned spins. 2 While some events identified in O3a have individually measurable nonzero spin, they occur infrequently, consistent with our previous estimates. We identify nine out of 44 BBH events in GWTC-2 with a positive effective inspiral spin parameter that excludes zero at greater than 95% credibility. 3 These nine events include both massive BBH systems like GW190519_153544, with a source primary mass m 1 = 66.0 +10.7 −12.0 M (90% credibility, uniform in redshifted mass prior) and less massive BBH systems like GW190728_064510, with m 1 = 12.3 +7.2 −2.2 M . No individual BBH events are observed with confidently negative effective inspiral spin parameter.

5.
What is the minimum black hole mass? Using GWTC-1, we previously inferred that, if there is a low-mass cut-off in the BBH mass spectrum, it is somewhere below 9 M (Abbott et al. 2019a).
With GWTC-2, we tighten the constraints on the minimum BH mass in a BBH system, finding m min < 6.6 M (90% credibility). Furthermore, we find that if the BH mass spectrum extends down to 3 M , it likely turns over at ∼ 7.8 +1.8 −2.0 M . This suggests that there may be a dearth of systems between NS and BH masses ). However, the O3a observation of GW190814 (Abbott et al. 2020b), with a sec-ondary mass m 2 = 2.59 +0.08 −0.09 M , complicates this picture. If the secondary mass is a BH, it would indicate that the BH mass spectrum extends below 3 M , to much lower masses than exhibited by the Galactic X-ray binary population (Bailyn et al. 1998;Özel et al. 2011;Farr et al. 2011, but see also Kreidberg et al. 2012;Thompson et al. 2019;Mandel et al. 2021). Alternatively, the secondary object in GW190814 could be an NS , but it would likely have to be significantly spinning to satisfy constraints on the maximum NS mass (Abbott et al. 2020b;Most et al. 2020;Tan et al. 2020;Essick & Landry 2020;Tews et al. 2020;Zhang & Li 2020). Either way, we find that GW190814 is an outlier with respect to the BBH population and the models we consider in this work. Unless stated otherwise, when presenting results on the BBH population, we exclude GW190814.
The remainder of this paper is organized as follows. In Section 2 we describe the data used in this study and detail how we select events for analysis. In Section 3, we provide a high-level overview of our models for the distributions of binary mass, spins, and redshift. In Section 4, we describe the hierarchical method used to fit population models to data. In Section 5, we present key results and discuss the astrophysical implications. Concluding remarks are provided in Section 6. The Appendix provides additional details regarding the statistical method (Appendix A) and descriptions of models (Appendix B, D, E), outlier analyses and model checking (Appendix C), and other supplementary material, including a discussion of gravitational lensing (Appendix F). The data release for this paper is available online in Abbott et al. (2020d).

DATA AND EVENT SELECTION
Searches for gravitational wave transients in the O3a data identified 39 candidate events with FAR below 2 yr −1 (Abbott et al. 2020c). This FAR cut excludes two BBH candidates and one low-significance NSBH candidate. It is unlikely the results of our BBH analyses would differ qualitatively with the inclusion of these two additional BBH events because they are typical of other, more confident GWTC-2 detections, and including them would lead to a modest 5% increase in the catalog size.
The event list was collated by combining results from several pipelines searching for compact binary mergers using archival data. The search pipelines include GstLAL (Sachdev et al. 2019;Hanna et al. 2020;Messick et al. 2017) and PyCBC (Nitz et al. 2019a;Allen et al. 2012;Allen 2005;Canton et al. 2014;Usman et al. 2016;Nitz et al. 2017), which use template-based matched filtering techniques, and cWB (Klimenko & Mitselmakher 2004;Klimenko et al. 2016), which uses a wavelet-based search that does not assume a physically parameterized signal model. These pipelines, along with two additional pipelines, MBTA (Adams et al. 2016), and SPIIR (Chu 2017), recovered most of the event candidates in lowlatency.
For the population studies presented here, the event list is further restricted to the 36 events with FAR < 1 yr −1 as a means to increase the purity of the sample. This FAR threshold excludes the lower-significance triggers GW190426_152155 (a potential NSBH or BBH candidate), GW190719_215514 and GW190909_114149 that appear in Abbott et al. (2020c). At this FAR threshold, we expect ∼1 noise event, and therefore a contamination fraction of 3%. 4 In this work we assume that all of the event candidates that meet this FAR threshold are of astrophysical origin. For the population analysis of the GWTC-1 BBH events, the selection criteria used for inclusion in the study is the FAR and the probability p astro that the source is of astrophysical origin. This value was only computed for BBHs with FAR < 2 yr −1 in Abbott et al. (2020c), so we simplify our criterion to only select on FAR. The set of GWTC-1 events that pass this FAR threshold is identical to the previous set chosen by FAR and p astro . Therefore, while the selection criteria here are different from our GWTC-1 analysis, the analyzed events are consistent.
In addition to the 36 events from O3a which passed the FAR threshold, the 11 detections presented in GWTC-1 (Abbott et al. 2019b) are also included in the event list used here to infer properties of the underlying population. All 47 events used in this analysis are tabulated in Table 1. Of these systems, 44 have both masses confidently in the BH mass range, with m 1 ≥ m 2 > 3 M . Unless stated otherwise, we restrict BBH population results to these 44 (see also Appendix C.3). Since our statistical framework relies on accurately quantifying the selection effects of our search, we only include events identified in GWTC-2, for which we have measured the search sensitivity; see Appendix A. In particular, the event list does not include candidates identified by independent analyses Venumadhav et al. 2020Venumadhav et al. , 2019Zackay et al. 2019;Nitz et al. 2019bMagee et al. 2019) of the publicly released LIGO and Virgo data (Abbott et al. 2019c;Trovato et al. 2020). Galaudage et al. (2019) and Roulet et al. (2020) suggest that our results are unlikely to change significantly with the inclusion of these events, and in the future it may be possible to include events from independent groups using a unified framework for the calculation of significance (Ashton & Thrane 2020;Pratten & Vecchio 2020;Roulet et al. 2020).
Parameter estimation results for each candidate event (Abbott et al. 2020c) were obtained using the LALInference (Veitch et al. 2015;LIGO Scientific Collaboration 2018), RIFT (Lange et al. 2018;Wysocki et al. 2019b), and Bilby (Ashton et al. 2019;Romero-Shaw et al. 2020b) pipelines, the latter employing the dynesty nested sampling tool (Speagle 2020). The parameter estimation pipelines use Bayesian sampling methods to produce fair draws from the posterior distribution function of the source parameters, conditioned on the data and given a models for the signal and noise (Abbott et al. 2016c).
For the BBH events previously published in GWTC-1, we use the published posterior samples inferred using the IMRPhenomPv2 (Hannam et al. 2014;Khan et al. 2016;Husa et al. 2016;Bohé et al. 2016) and SEOB-NRv3 (Pan et al. 2014;Taracchini et al. 2014;Babak et al. 2017) waveform models. For the new BBH events of GWTC-2, we use waveform approximants that include higher-order multipole content, including the IMRPhe-nomPv3HM (Khan et al. 2020), NRSur7dq4 (Varma et al. 2019) and SEOBNRv4PHM (Bohé et al. 2017;Ossokine et al. 2020). For all events, we average over these waveform families, in contrast to our previously published parameter estimation results for GW190521, which highlighted results from one waveform, NRSur7dq4 (Abbott et al. 2020e). A more complete description of the parameter estimation methods and waveform models used can be found in Section V of Abbott et al. (2020c).

POPULATION MODELS
In this section, we provide a high-level overview of the different population models used in this paper. Each model is given a nickname indicated in boldface. There are three categories of models: models for the shape of the mass distribution (Section 3.1), models for the spin distribution (Section 3.2), and models for the redshift distribution (Section 3.3). Readers interested in the astrophysical results may wish to skip ahead to Section 5. In Fig. 1, we provide graphical representations of each mass model described below. Additional details about each model are provided in Appendix B, including their mathematical definitions, lists of hyper-parameters, and their associated prior distributions.

Black hole mass distribution
• Truncated (4 parameters; Appendix B.1). Our simplest mass model, the distribution of primary masses is a power-law with hard cut-offs at both low (m min ) and high (m max ) masses. The highmass cut-off corresponds to the lower edge of the theorized pair-instability gap in the BH mass spectrum (Woosley & Heger 2015;Heger et al. 2003). The mass ratio distribution is also assumed to be a power law (Kovetz et al. 2017;. In Abbott et al. (2019a), it is referred to as "Model B." This model struggles to accommodate high-mass events like GW190521, necessitating more complicated models.
• Power Law + Peak (8 parameters; Appendix B.2). Similar to the Truncated model, but with two modifications. At low masses we implement a smoothing function to avoid a hard cut-off. At high masses, we include a Gaussian peak, initially introduced to model a pile-up from pulsational pair-instability supernovae (Talbot & Thrane 2018). This model is better able to accommodate the high-mass events that pose a challenge for the Truncated model. In Abbott et al. (2019a), it is referred to as "Model C." • Broken Power Law (7 parameters; Appendix B.3). The same as Truncated except, instead of a single power-law between m min and m max , the model allows for a break in the powerlaw at some mass m break . This model includes the low-mass smoothing function used in the Power Law + Peak model. The high mass feature m break may correspond to the onset of pair-instability, and the second power-law can be thought of as either a gradual tapering off, or a subpopulation of BHs within the pair-instability mass gap. This model accommodates the high-mass events that pose a challenge for the Truncated while providing an alternative to the Power Law + Peak model.  Figure 1. Graphical representations of the various mass distributions described in Section 3.1. Multi Spin, a model of both mass and spin, is similar to the mass distribution of Power Law + Peak, with a sharp lower mass cutoff rather than the smooth low mass turn-on.

Spin Distribution
• Default (4 parameters; Appendix D.1). This parameterization for the component BH spin magnitudes and tilts was previously used to explore the spin distribution of compact binaries in GWTC-1 (Abbott et al. 2019a). The spin of each component BH in a binary is assumed to be independently drawn from the same underlying distribution. The dimensionless spin magnitude is described using a beta distribution as in Wysocki et al. (2019a). The spin tilt distribution from Talbot & Thrane (2017) is a mixture model comprising two components: an isotropic component designed to model dynamically assembled binaries, and a second component in which the spins are preferentially aligned with the orbital angular momentum, as expected for isolated field binaries; the fraction of binaries in the purely-aligned sub-population is denoted ζ. 5 For this latter component, the spin tilt angles are distributed as a truncated Gaussian, with width σ t , that peaks when the BH spin is aligned to the orbital angular momentum. We use this model in concert with the mass models described above.
• Gaussian (5 parameters; Appendix D.2). While the Default spin model is physically inspired, this model, based on that of Miller et al. (2020), allows us to fit the distribution of phenomenological spin parameters χ eff (the effective inspiral spin parameter, Eq. 5) and χ p (the effective precession spin parameter, Eq. 6), assuming that their distribution is jointly described as a bivariate Gaussian. The ensemble properties of χ eff and χ p allow us to con-clude that the BBHs in GWTC-2 exhibit general relativistic spin-induced precession of the orbital plane (χ p > 0), and that some systems have component spins misaligned by more than 90 • (χ eff < 0) relative to the orbital angular momentum.
• Multi Spin (12 spin parameters, 10 mass parameters; Appendix D.3). This model allows for multiple subpopulations of BBH systems with distinct mass and spin distributions. Specifically, this model assumes a Truncated power-law mass distribution with the additional presence of a 2-D Gaussian subpopulation in m 1 and m 2 , truncated such that m 1 ≥ m 2 . While similar to the Power Law + Peak mass model, there is no smooth turn on and the mass ratio distribution is allowed to differ between each subpopulation. Most importantly, the two subpopulations have independently parameterized Default spin distributions. We use this model to test whether the BBH spin distribution varies as a function of mass as expected if higher-mass systems are the products of hierarchical mergers.

Redshift evolution
• Non-Evolving (0 parameters). Our default model posits that the merger rate is uniform in comoving volume.
• Power-law Evolution (1 parameter; Appendix E). Following Fishbach et al. (2018), the merger rate density is described by a power-law in (1 + z) where z is redshift. Given the finite range of Advanced LIGO and Advanced Virgo to BBH mergers, we only expect to constrain the redshift evolution at redshifts z 1 (Abbott et al. 2013). The farthest event in our analysis is likely GW190706_222641, at redshift z = 0.71 +0.32 −0.27 . 3.3 × 10 −2 Table 1. A summary of the events included in this analysis. Asterisks (*) denote binaries in which both components lie in the NS mass range while a dagger ( †) indicates a system in which the nature of the secondary component is unknown. Both of these classes are excluded from our analyses unless explicitly stated. From left to right, the columns show the event name, the 90% credible interval for primary source mass (units of M ), the 90% credible interval for secondary mass (units of M ), and the minimum available FAR. For events detected in more than one CBC detection pipeline, we report the lowest of the available FAR estimate. These credible intervals are obtained using a prior that is uniform in redshifted component mass and comoving volume, as in Table 6 of Abbott et al. (2020c).

METHOD
We adopt a hierarchical Bayesian approach, marginalizing over the properties of individual events to measure parameters of the population models described above; see, e.g., Mandel et al. 2019;Vitale 2020). Given data {d i } from N det gravitational-wave detections, the likelihood of the data given population parameters Λ is (Loredo 2004;Mandel et al. 2019;) (1) Here, N is the total number of events expected during the observation period (including both resolvable and unresolvable signals). Each event is described by a set of parameters θ, the likelihood of which is L(d|θ). The conditional prior π(θ|Λ), meanwhile, is defined by the mass, spin, and redshift models described above in Sec. 3. It serves to quantify the predicted distribution of event parameters θ given the hyper-parameters Λ of the population model. An example of a hyper-parameter is the power-law index α governing primary mass distribution π(m 1 |α) ∝ m −α 1 for the Truncated mass model. Finally, ξ(Λ) is the fraction of binaries that we expect to successfully detect for a population described by hyperparameters Λ. The procedure for calculating ξ(Λ) is described in Appendix A. One of our primary goals in this paper is to constrain the population hyper-parameters describing the distribution of gravitational-wave signals. Given a log-uniform prior on N , one can marginalize Eq. (1) over N to obtain (Fishbach et al. 2018;Mandel et al. 2019;Vitale 2020) To evaluate the single-event likelihood L(d i | θ), we use posterior samples that are obtained using some default prior π ∅ (θ). In this case, integrals over the likelihood can be replaced with weighted averages " . . . " over discrete samples. Equation (2), for example, becomes where the factor of π ∅ (θ) serves to divide out the prior used for initial parameter estimation. We evaluate the likelihoods for population models using the emcee, dynesty, and stan packages (Foreman-Mackey et al. 2013;Speagle 2020;Carpenter et al. 2017;Riddell et al. 2018). The likelihoods are implemented in a variety of software including GWPopulation  and PopModels . The priors adopted for each of our hyper-parameters are described in Appendix B. Throughout this paper, we find it useful to distinguish between the astrophysical distribution of a parameterthe distribution as it is in nature-and the observed distribution of a parameter-the distribution as it appears among detected events due to selection effects. The posterior population distribution for a given model represents our best guess for the astrophysical distribution of some source parameter θ, averaged over the posterior for population parameters Λ: The subscript Λ indicates that we have marginalised over population parameters. Meanwhile, the posterior predictive distribution refers to the population-averaged distribution of source parameters θ conditioned on detection.
In Appendix C, D and E, we provide posterior predictive checks for our population models. These checks consist of comparing simulated sets of N obs "predicted" and "observed" events. For every sample in the model hyper-posterior, we generate a set of predicted events by reweighting our found injections to the population model, and drawing N obs synthetic events. For each hyper-posterior sample, we then generate a set of observed events by reweighting the single-event posteriors of the N obs events to this population prior, and drawing one sample per event. Therefore, the inferred distribution of observed events depends on the population model considered. The first example of such a posterior predictive check is shown in Fig. 19. We calculate the cumulative distribution function for each set of predicted and observed events in the model hyper-posterior, and summarize these curves with 90% credibility bands. The uncertainty in the predicted bands comes from uncertainty in the population hyper-posterior, as well as Poisson fluctuations, because each cumulative distribution curve consists of N obs simulated events.
Phenomenological models such as we employ here can fail if their assumed form does not adequately represent reality or if priors are inappropriately chosen. Inferences from such models are inevitably prone to systematic error, given that the model is unlikely to be a perfect description of reality; for instance, the inferred local rate of BBH mergers is subject to a systematic error associated with our choice of model(s) for the BBH mass distribution. While such errors cannot be eliminated, we take steps to control and minimize their magnitude. First, we carry out posterior predictive checks to make sure our models are consistent with the data. Next, where possible, we check for consistency between models with different assumptions, for example, looking for common features in the Power Law + Peak and Broken Power Law models. Finally, we carry out tests to make sure that our inferences are not artifacts of our model design, for example, by applying the models to simulated uninformative data. Additional information about these tests is provided in the Appendix.

Mass Distribution
In this subsection we report results obtained using the mass models described in Section 3.1 (see Fig. 1). 6 We employ the Default spin model and the Non-Evolving redshift model. The results shown here are marginalized over the hyperparameters of the spin distribution.
A truncated power law fails to fit the high-mass BBH events. The Truncated model applied in Abbott et al. (2019a) measured the sharp high-mass cutoff to be m max = 40.8 +11.8 −4.4 M . When we fit this model to GWTC-2 data, we obtain m max = 78.5 +14.1 −9.4 M , in significant tension (> 99% credibility) with the GWTC-1 result; see Fig. 2. The Truncated model struggles to accommodate the high-mass systems of GWTC-2. At 99% credibility, three events of GWTC-2 have m 1 > 45 M (regardless of whether we use a population-informed prior; Fishbach et al. 2020b). This tension remains (at the > 93% credibility level) even when we exclude the highest-mass event GW190521 (Abbott et al. 2020e,f). The poor fit of the Truncated model is further seen in the posterior predictive check of Fig. 19 in Appendix C, which shows that the Truncated model fails to capture the relative excess of observations with m 1 ∼ 30 M compared to the number of events with m 1 45 M . Our fit to the Truncated model overpredicts the number of observations with m 1 > 45 M relative to the number of observations with 30M < m 1 < 45 M (98% credibility).
The Power Law + Peak, Broken Power Law, and Multi Peak models provide better fits to the shape of the mass distribution, particularly at high masses. Although our updated fit to the mass distribution extends to higher masses than the GWTC-1 fit, we find that 97.1 +1.7 −3.5 % of BBH systems have primary masses  Kimball et al. 2020a). In Table 2, we provide log Bayes factors (log 10 B) comparing the mass models; each Bayes factor is measured relative to the model with the highest Bayesian evidence: Power Law + Peak. In each case we use the Default spin model. For context, log 10 B > 1.5 is often interpreted as a strong preference for one model over another, and log 10 B > 2 as decisive evidence (Jeffreys 1961). While Bayes factors depend on the choice of hyperparameter priors, it is nonetheless possible to see that the Truncated model is disfavored compared to the more complicated models. This inference is driven in part by the fact that-in our posterior predictive checks-94% of the time, the Truncated model overpredicts the number of detections with m 1 > 50 M . See Appendix C for information about the posterior predictive checks. Meanwhile, there is not a strong preference for Power Law + Peak over Broken Power Law or Multi Peak. We currently lack the resolving power to determine whether the deviations from Truncated are best modeled as a break, a Gaussian peak, or two Gaussian peaks. As a further check, we carried out a follow-up analysis using a hybrid Broken Power Law + peak model, which indicated only modest support for a peak on top of the Broken Power Law distribution (log 10 B = 0.79).
There are features in the black hole mass spectrum beyond a power-law. Figure 3 shows the astrophysical merger rate density as a function of primary BH mass for the Truncated, Power Law + Peak, Figure 2. Posterior for the maximum mass using GWTC-1 and fit to the Truncated model (blue), compared to the posterior obtained by adding events from O3a data (red). The two distributions are inconsistent, suggesting the Truncated model is inadequate. The tension between GWTC-1 and GWTC-2 is somewhat alleviated by the exclusion of the high-mass event GW190521 (green). However, there remain several other high-mass events in O3a. The black dashed lines show primary mass posteriors for the three events in which m1 > 45 M at 99% credibility (we employ a prior that is uniform in redshifted masses). These events cause a significant shift in the mmax posterior if we assume a simple power-law fits the data.
Multi Peak and Broken Power Law models. We turn first to the Broken Power Law model (second panel in Fig. 3), which is characterized by two spectral indices, α 1 and α 2 , with p(m 1 ) ∝ m −α1 1 for m 1 < m break , and p(m 1 ) ∝ m −α2 1 above the break. We find the data prefer a break at m break = 39.7 +20.3 −9.1 M ; 90% credible bounds on the location of this break are denoted by the gray vertical band in the second panel of Fig. 3. For masses above the break, the Broken Power Law model prefers a significantly steeper powerlaw slope, from α 1 = 1.58 +0.82 −0.86 before to α 2 = 5.6 +4.1 −2.5 after. Figure 5 shows the joint posterior on α 1 and α 2 . We infer that α 2 > α 1 at credibility 98%. The break aligns with the cutoff m max inferred with GWTC-1 data (Abbott et al. 2019a), and we speculate that the steep drop-off in the merger rate that occurs after m break may be an imprint of PPSN, which are expected to become important for BH masses around 35 M (Woosley & Heger 2015;Heger et al. 2003).
The Power Law + Peak model (third panel of Fig. 3) produces a qualitatively similar fit to the one obtained from the Broken Power Law model. However, a key feature of the Power Law + Peak model is the Gaussian peak at 33.1 +4.0 −5.6 M , denoted by the gray vertical band in the third panel of Fig. 3. Evidence for a peak can be seen in Fig. 6, which shows the posterior for λ peak , the fraction of systems that belong to the Gaussian component. We see that λ peak = 0 (pure power law) is disfavored. It was envisioned (Talbot & Thrane 2018) that the power-law component of the Power Law + Peak model would terminate in the vicinity of this peak to create a high-mass gap. However, in order to  . Constraints on the power-law indices governing the primary mass distribution within the Broken Power Law model. The parameter α1 is the power-law index below the break, which is found to be m break = 39.7 +20.3 −9.1 while α2 is the index above the break. The dashed and solid contours mark the central 50% and 90% posterior credible regions, respectively, under a flat prior on α1, α2 in the range (−4, 12). We rule out with high confidence the hypothesis that α1 = α2, indicated by the dashed diagonal line, finding α2 > α1 with 98% credibility.
accommodate the most massive binaries in GWTC-2, the power-law extends to values of m max = 86 +12 −13 M . While the mass spectrum must extend to these high masses, we find that 99% of primary BH masses lie below m 99% ∼ 60 M ; see Table 3. The astrophysical rate density at ∼ 80M (the primary mass of GW190521) is two orders of magnitude lower than the rate density at ∼ 40M . However, because of selection effects, the posterior predictive distribution skews to much higher masses, as seen in Fig. 4, so that the probability of detecting at least one event with m 1 ≥ 80M after observing 44 BBH events drawn from the Power Law + Peak posterior predictive distribution of Fig 2. We find that λ peak = 0 (which corresponds to no Gaussian peak) is disfavored, supporting the hypothesis that there is a feature in the BH primary mass spectrum.
We cannot determine whether the high-mass events of GWTC-2 belong to a distinct subpopulation rather than a high-mass tail of the normal BBH population. An additional subpopulation may be expected if highmass BHs have a different origin from low mass ones; for example, if they are the products of hierarchical mergers Gerosa & Berti 2017;Chatziioannou et al. 2019;Doctor et al. 2019;Kimball et al. 2020a). Using the Multi Peak model (bottom panel in Fig. 3), which allows for a second high-mass Gaussian component at m 1 > 50 M in addition to the Gaussian component at m 1 40 M , we find that the addition of a second Gaussian peak is not preferred by the data. The Multi Peak model is mildly disfavored compared to Power Law + Peak, with a log 10 B = −0.3 (or Bayes factor B = 0.5) for Multi Peak relative to Power Law + Peak. Motivated by the hypothesis of hierarchical mergers, we consider a variation of the Multi Peak model, in which the location of the second peak is required to be at twice the value of the first peak; that is, µ m,2 = 2µ m,1 (see Appendix B.4. Studying the (µ m,1 , µ m,2 ) panel of the corner plot in Fig. 18, we see that the data mildly prefer the second peak at ≈ 70M , consistent with twice the value of the first peak at ≈ 35M . This "Modified MultiPeak" is mildly preferred over the original version by a Bayes factor of ∼ 2. A similar conclusion is found using the Multi Spin model; as discussed in Section 5.2, we find hints, but no significant evidence for subpopulations with distinct spin distributions. Additional evidence for hierarchical mergers is presented in Kimball et al. (2020b).
Within the framework of the Broken Power Law, Power Law + Peak and Multi Peak models, the most massive event, GW190521, appears to be a normal member of the BBH population in the context of the other GWTC-2 events (see Appendix C.2). The event GW190521 is an outlier if we consider it in the context of GWTC-1 with the Truncated model, but we interpret this as a limitation of the Truncated model (see Fig. 2; Abbott et al. 2020f).
The GWTC-1 detections showed that BHs more massive than ∼ 45 M merge relatively rarely, based on simple extrapolations from below 45 M . With GWTC-2, we are beginning to resolve the shape of the primary BH mass spectrum above 45 M . The implications are not yet clear, but there are intriguing possibilities. One hypothesis is that the events with m 1 > 45 M are simply the high-mass tail of the ordinary BBH population, and do not form through a distinct channel. For example, if the lower edge of the PPSN gap may be modeled as a smooth tapering rather than a sharp cutoff, the feature at m break = 39.7 +20.3 −9.1 may represent the onset of pair-instability. This explanation may pose challenges to our understanding of stellar evolution since the pairinstability cutoff of BH masses at ∼ 40 M is thought to be relatively abrupt Woosley 2017;Farmer et al. 2019), even though its precise location is uncertain Giacobbo et al. 2017;van Son et al. 2020;Farmer et al. 2020;Croon et al. 2020;Marchant & Moriya 2020). If the PPSN cutoff is indeed sharp and all observed BBH systems lie below the PPSN gap, the cutoff must occur at relatively high masses; in the Truncated model, m max = 78.5 +14.1 −9.4 M (or, excluding the most massive event, GW190521, m max = 57.0 +11.9 −6.6 M ). This may have significant implications for nuclear (Farmer et al. 2020) and particle (Croon et al. 2020;Ziegler & Freese 2020) physics.
The BBH primary mass distribution exhibits a global maximum between ∼ 4 and ∼ 10 M . Figure 7 shows the joint posterior for the m min and δ m parameters inferred using the Power Law + Peak and Broken Power Law mass models, including only BBH events with m 2 > 3 M . Recall that, while the Truncated model has a sharp cutoff at m min , the remaining models implement a smooth turn-on of width δ m above m min , causing the mass spectrum to peak and turn over between m min , and δ m + m min (Talbot & Thrane 2018). Using both the Broken Power Law model and the Power Law + Peak model, we find that the primary mass spectrum does not decrease monotonically from 3 M . Rather, it turns over at 7.8 +1.8 −2.0 M (Power Law + Peak model) or 6.02 +0.78 −1.96 M (Broken Power Law model). In other words, the mass distribution must turn over at m 1 > 3 M , with 99.9% credibility (assuming the Power Law + Peak model) or 98.5% credibility (Broken Power Law model). As seen in Fig. 7, if the BH low-mass cutoff is sharp (δ m = 0), then m min 4 M . Conversely, if the BH mass spectrum extends below m min 4 M , an extended turn-on δ m 3 M is required. These results support the existence of a low-mass gap (or dip) between ∼ 2.6 M (the secondary mass of GW190814; the most massive component mass observed below 3 M ) and ∼ 6 M , strengthening results from , although we cannot determine whether the low-mass gap is empty. Posterior distribution for population parameters mmin, the minimum BH mass, and δm, which controls the sharpness of the low-mass cut-off. A sharp cut-off corresponds to δm = 0. Analyzing the 44 BBH events (with the exclusion of GW190814), both models exclude (mmin = 3 M , δm = 0) with > 99% credibility, indicating that the rate drops off at low masses. To varying degrees, both models allow for mmin ≤ 3 M , δm > 0, suggesting that the low-mass gap may not be empty. See Appendices B.3 and B.2 for additional details about the δm, mmin parameters.
Since our models do not permit additional features beyond a smooth turn on to a power law at low masses, they struggle to accommodate GW190814 (with secondary mass at m 2 = 2.59 +0.08 −0.09 ). We can see this by comparing the mass distribution inferred from the events with m 1 ≥ m 2 > 3 M , discussed above, to the distribution inferred with GW190814. If GW190814 is a BBH system, the minimum BH mass must extend to m min = 2.18 +0.27 −0.16 M (see the dashed histograms in Fig. 8a). In Fig. 8b we show how the inclusion/exclusion of GW190814 affects the shape of the primary mass distribution below 5 M . We see that the two distributions are inconsistent at the low mass end, suggesting that there is a feature in the mass distribution between ∼ 2.6 M and ∼ 6 M that our models cannot capture. This effect can also be seen in the m 1% values inferred with/without GW190814, shown in Table 3. Assuming the Truncated and Power Law + Peak models, the m min posteriors inferred with GW190814 lie at the 0.25 +0.44 −0.20 and 0.78 +2.22 −0.69 percentiles, respectively, of the m min posteriors obtained without GW190814. Even using the Broken Power Law model, which admits greater overlap in the 1-dimensional m min posteriors inferred with/without GW190814, the addition of GW190814 significantly shifts the two-dimensional (δ m , m min ) pos-terior and the inferred mass spectrum. Assuming the Broken Power Law model with GW190814, the mass at which the mass spectrum turns over is shifted down to 2.59 +0.78 −0.39 M , which is inconsistent with the turnover mass (the low-mass local maximum) inferred without GW190814, 6.02 +0.78 −1.96 M . This indicates a failure of our models to fit GW190814 together with the BBH systems of GWTC-2. This finding is supported by additional studies described in Appendix C.3. Because GW190814 is a population outlier with respect to the BBH events of GWTC-2 and our choice of models, we exclude it from the analyses here unless otherwise indicated.
The distribution of mass ratios is broad. The GWTC-1 events are all individually consistent with q = 1. Describing the conditional mass-ratio distribution as a power law p(q|m 1 ) ∝ q βq , a population analysis of GWTC-1 allowed β q = 12 (our maximum prior bound), consistent with a mass-ratio distribution sharply peaked at equal-mass pairings (Abbott et al. 2019a;Fishbach & Holz 2020b). GWTC-2 saw the first detections of confidently asymmetric systems: GW190412 (Abbott et al. 2020a) and GW190814 (Abbott et al. 2020b). Excluding GW190814, our reconstruction of the mass ratio distribution is consistent with the results published in Abbott et al. (2020a) for the Power Law + Peak model and β q = 1.4 +2.5 −1.5 for the Broken Power Law model. We rule out distributions that are sharply peaked around q = 1, with β q < 2.9 (Power Law + Peak) and β q < 3.1 (Broken Power Law) at 90% credibility. However, we also disfavor distributions that prefer unequal-mass pairings, with β q > 0 at 92% (Power Law + Peak) and 94% (Broken Power Law) credibility. We find that 90% of systems in the underlying population have mass ratios q > 0.26 +0.14 −0.08 .

Spin Distribution
In this subsection, we highlight the results from the Gaussian, Default, and Multi Spin models. We fix the redshift distribution to a Non-Evolving merger rate. The Gaussian and Multi Spin models assume the mass distributions described in Appendix D.2 and D.3, respectively. For the Default spin model, we employ the Power Law + Peak mass model, simultaneously fitting the mass and spin distribution as in the previous subsection.
We observe spin-induced general relativistic precession of the orbital plane. As two BHs merge, the morphology of the resulting gravitational waveform depends on their spins. The spin-dependence of a gravitational waveform is determined in part by two phenomenological parameters. First, the effective inspiral spin pa-  rameter χ eff quantifies the spin components aligned with the orbital angular momentum (Damour 2001): Here, χ 1 and χ 2 are the dimensionless component spins, defined by χ i = |cS i /(Gm 2 i )| where S i is the spin angular momentum of component i, and θ 1 and θ 2 are the misalignment angles between the component spins and the orbital angular momentum. Second, spins with components perpendicular to the orbital angular momentum drive relativistic precession of the orbital plane (Apostolatos et al. 1994). The effect is quantified by the effective precession spin parameter (Schmidt et al. 2012;Hannam et al. 2014;Schmidt et al. 2015) A non-zero value of χ p indicates the presence of relativistic spin-induced precession of the orbital plane. Although the component spin tilts θ 1 and θ 2 appearing in Eqs. (5) and (6) generically evolve over the course of a binary inspiral, χ eff and χ p are themselves approximately conserved quantities (Kidder 1995;Schmidt et al. 2015).
The first unambiguous measurements of BH spin in gravitational-wave astronomy came from analyses of the BBH event GW151226. This system had χ eff > 0 at 99% credibility, with at least one of its components having spin magnitude χ > 0.2, and spin misalignment angles consistent with θ 1 = θ 2 = 0 (Abbott et al. 2016b). While analyses of GWTC-1 found no clear evidence for spin in the other events in GWTC-1 (Miller et al. 2020, but also see Zackay et al. 2019 andHuang et al. 2020), GWTC-1 is collectively inconsistent with a population of non-spinning BHs, if one allows for both spinning and non-spinning subpopulations (Kimball et al. 2020a). Moreover, population analyses of GWTC-1 mildly disfavor the scenario in which all spins are perfectly aligned (θ 1 = θ 2 = 0), although the degree of misalignment is degenerate with the spin magnitude distribution Farr et al. 2018;Tiwari et al. 2018;Abbott et al. 2019a;Wysocki et al. 2019b).
In GWTC-2, additional BBH events are observed with confidently positive effective inspiral spin parameter. No individual event is observed with confidently negative χ eff (Abbott et al. 2020c). Several events, including GW190521 (Abbott et al. 2020e,f) and GW190412 (Abbott et al. 2020a), show moderate evidence for non-zero χ p , but no single event unambiguously exhibits spininduced precession (Abbott et al. 2020c). Using the Default model described in Sec. 3 (see also Appendix D.1), we obtain evidence for non-vanishing spin-orbit misalignment among the population of BBH events in GWTC-2. The Default model describes the distribution of spin-orbit misalignments as a mixture between two components: a component with isotropicallyoriented spins and a preferentially-aligned component Left: Joint posterior on the fraction ζ of BBHs with preferentially-aligned spins (versus isotropic spins) and the spread σt of misalignment angles among this population obtained using the Default spin model (see Appendix D.1 for additional details). We rule out a population with perfectly aligned spins corresponding to ζ = 1 and σt = 0. The gray shaded region represents the region of parameter space inaccessible to our analysis. This region is artificially excluded due to sampling uncertainties even when analyzing uninformative samples drawn from the spin tilt prior. The dashed and solid contours mark the central 50% and 90% posterior credible regions, respectively, assuming a flat prior on ζ and σt. Right: Population predictive distributions for the effective precession spin parameter χp of BBH systems obtained using the Gaussian (blue) and Default (orange) spin models. Shaded regions show the central 90% credible bounds on p(χp) at a given spin value, while the solid lines show the median posterior prediction. The inset shows draws of the Gaussian χp distributions implied by the posterior on µp and σp. Broadly, we see support for two possible morphologies, indicated schematically by the dashed black curves. GWTC-2 is compatible with a χp distribution that is either broad, or one that is narrow and centered at µp ∼ 0.2.
with z = cos θ values centered at z = 0 (perfect alignment) with a Gaussian spread of width σ t . In the left side of Fig. 9, we show the joint posterior on σ t and the fraction ζ of events in the preferentially-aligned subpopulation. Perfect alignment corresponds to ζ = 1 and σ t = 0. We see that this case is ruled out at > 99% credibility. Thus, either a non-zero fraction of BBH events exhibit isotropically-oriented spins, or BBH spins are preferentially aligned to their orbits but with a nonvanishing spread. Either case constitutes an observation of in-plane spin components among the BBH population. The shaded gray tiles in the left side of Fig. 9 show the values of ζ and σ t that are artificially excluded by the prior and/or finite sampling effects; the true measurement using GWTC-2 lies well away from this artificial exclusion region. 7 In Fig. 10, discussed further below, we plot the range of component spin magnitude and tilt angle distributions recovered using the Default model. Although the data are consistent with tilt angle distributions that 7 In order to determine the artificial exclusion region, we generate prior samples for tilt angles θ 1,2 conditioned on the measured values of mass ratio q and spin magnitudes χ 1,2 . There are no prior samples in the gray tiles, which indicate that these tiles are artificially excluded due to finite sampling effects.
favor alignment, distributions that are highly peaked at cos θ 1,2 = 1 are ruled out.
A similar conclusion regarding the presence of in-plane spin components may be drawn using Gaussian spin model, which imposes an entirely different parameterization for the BH spin distribution and makes different assumptions regarding their masses. In particular, when measuring the mean µ p and standard deviation σ p of the χ p distribution, the case µ p = σ p = 0 is ruled out at > 99% credibility; fewer than 1% of posterior samples occur at µ p ≤ 0.05 and σ p ≤ 0.05. Since any non-zero µ p or σ p implies the existence of spin-induced precession, this result supports the observation of spin misalignment seen in the Default model. In the right side of Fig. 9, the dark blue curve and shaded blue region mark the median and 90% credible bound, respectively, on p(χ p ) as inferred by the Gaussian model. While the blue region in this figure suggests a χ p distribution that peaks at ∼ 0.2, there are in fact two morphologies preferred by the data according to the Gaussian model: the recovered χ p distribution is either broad-or narrowly peaked at χ p ≈ 0.2. This is illustrated by the inset, in which we plot an ensemble of distributions corresponding to individual draws from the (µ p , σ p ) posterior; the dashed black curves highlight traces representative of the two permitted morphologies.
For comparison, the orange curve in the right side of Fig. 9 shows the χ p distribution implied by the Default results discussed above. There are several potentially meaningful differences between the results from the Gaussian and Default models. In particular, the Default model predicts χ p distributions that are generally broader and peaked at lower values. This is due to additional physical constraints imposed by the Default spin model; component spins are presumed to preferentially cluster about θ = 0, an assumption that preferentially favors smaller χ p values. Nevertheless, the two models agree well within statistical uncertainties, 8 indicating that the identification of spin-induced precession is robust to the systematic modeling choices and prior uncertainties.
As mentioned above, GW190521 and GW190412 individually show mild evidence of precession, with χ p posteriors shifted away from their respective priors (Abbott et al. 2020a,e,f). To verify that our populationlevel conclusions are not driven primarily by these two events, we have repeated the Gaussian analysis excluding GW190521 and GW190412. Our results again exclude µ p = σ p = 0 at a similar level of confidence (> 99% credibility). This implies that the signature of precession observed here is due to the combined influence of many systems with only weakly measured χ p , consistent with expectations from simulation studies (Fairhurst et al. 2019;Wysocki et al. 2019b).
The injection sets used to quantify search selection effects (see Appendix A) contain only events whose component spins are perfectly aligned with their orbital angular momenta. The results in Fig. 9 therefore do not account for systematics possibly affecting our ability to detect events with misaligned spins. The matched filter template banks adopted by the GstLAL and PyCBC search pipelines, for instance, are composed of purely alignedspin waveforms, and so may have reduced sensitivity to events with high χ p (Harry et al. 2016;Calderón Bustillo et al. 2017). Selection effects can, however, only decrease the efficiency with which events with large in-plane spins are detected; incorporating such effects would further shift the posterior in Fig. 9 away from ζ = 1 and σ t = 0 and/or more strongly rule out a delta function at χ p = 0. Thus, the presence of in-plane spin components is robust to selection effects. The specific preference for µ p ≈ 0.2, though, may not be. In the future, accurately characterizing the effects of in-plane spins on detection efficiency will be crucial in order to robustly determine the shape of the cos θ and χ p distributions.
We observe anti-aligned spin, which may suggest the presence of more than one binary formation channel. Using the Gaussian model described in Sec. 3, we infer the presence of systems with negative effective inspiral spin parameter: χ eff < 0. Thus, there exist BBH systems with at least one component spin tilted by θ > 90 • relative to the orbital angular momenta. Figure 11 shows posteriors for the mean µ eff and standard deviation σ eff of the χ eff distribution, marginalized over µ p , σ p , and the covariance between the effective inspiral spin parameter and the effective precession spin parameter. With a peak at µ eff = 0.06 +0.05 −0.05 , we find that most systems have small but positive χ eff , in agreement with the inference from GWTC-1 (Miller et al. 2020;. With GWTC-2, we can now also constrain the width of the χ eff distribution. The result, σ eff = 0.12 +0.06 −0.04 , requires that a nonzero fraction of BBH systems have χ eff < 0. Unlike the constraints on the χ p distribution presented above, the results for the presence of negative effective inspiral spin parameter do incorporate selection effects via the prescription described in Sec. 4.
Analysis with the Default spin model is also suggestive of an anisotropic distribution of spin orientations. In Fig. 10, we plot the population distribution of cos θ 1,2 reconstructed using the Default model. While the cos θ 1,2 distribution shows a preference for primarily aligned spins, with cos θ 1,2 > 0, it also exhibits non-vanishing posterior support for cos θ 1,2 < 0, indicating the presence of component spins misaligned by more than 90 • . The χ eff distribution inferred with the Default model closely matches the distribution inferred using the Gaussian model; compare the orange and blue bands in the right panel of Fig. 11. The two models therefore agree on the fraction of systems with anti-aligned component spins.
To further verify that the apparent presence of events with negative χ eff is physical and not an artifact of our choice of models, we repeat our inference of the Gaussian χ eff distribution, this time permitting the minimum allowed effective inspiral spin parameter χ min eff (until now fixed to χ min eff = −1) to vary as an additional hyper-parameter to be inferred from the data. When fitting for χ min eff alongside µ eff and σ eff , we find that χ min eff is less than zero at 99% credibility (see Fig. 27 in the Appendix), confirming that the evidence for anti-aligned spin is not an artifact of our parameterization. Allowing χ min eff to vary yields similar results for the implied χ eff distribution, and in particular, the fraction of systems with negative χ eff . momenta. An isotropic spin orientation, which corresponds to a uniform distribution in cos θ1,2, is disfavored but not ruled out. The data do, however, rule out a highly peaked distribution at cos θ1,2 = 1. Rather, the data are consistent with a gently peaked distribution, with a modest preference for aligned spin (cos θ1,2 > 0). . Left: Posterior for the mean µ eff and standard deviation σ eff of the BBH χ eff distribution, obtained using the Gaussian model described in Appendix D.2. We marginalize over the parameters governing the distribution of the effective inspiral spin parameter and the effective precession spin parameter. While we infer a χ eff distribution that is peaked at positive values, its measured width implies that a non-zero fraction of BBH systems have negative χ eff , implying component spins misaligned by t1,2 > 90 • relative to the orbital angular momentum. Right: Population predictive distributions for the effective inspiral spin parameter χ eff obtained with both the Gaussian and Default spin models. Shaded regions show the central 90% credible bounds on p(χ eff ) and the solid lines show the median posterior prediction for the χ eff distribution.
The presence of BBH systems with negative effective inspiral spin parameter carries implications for the formation channels that give rise to stellar-mass BBH mergers. BBHs born in the field from isolated stellar progenitors are predicted to contain components whose spins are nearly aligned with their orbital angular momenta, although sufficiently strong supernova kicks might produce In particular, f n > 7% at 99% credibility. This result is in contrast to results obtained using GWTC-1 alone, which did not exhibit a confidently non-zero fraction of events with negative χ eff (Abbott et al. 2019a;Miller et al. 2020). Additionally, the relatively small fraction f v of binaries with vanishing spins may provide clues about how BHs gain angular momentum, given recent studies suggesting that most BHs are born slowly rotating (Fuller & Ma 2019). While we define the vanishing bin to be −0.01 ≤ χ eff ≤ 0.01, the exact choice of width for the vanishing bin does not strongly affect the values of f p and f n , relative to one another. We obtain nearly identical results, albeit with a slightly weaker lower bound on f n , when we additionally allow the minimum effective inspiral spin χ min eff to vary as described above; in this case f p = 0.65 +0.17 −0.16 and f n = 0.26 +0.17 −0.21 . As mentioned above, dynamical formation in dense clusters is not the only astrophysical explanation of negative effective inspiral spin parameter. If stellar progenitors experience both strong natal kicks in supernovae and inefficient spin realignment, 10% of BBH systems formed through isolated binary evolution may have With these qualifications in mind, if we interpret negative χ eff as indicative of dynamical formation in stellar clusters, then our constraints on f n can be used to infer the fraction of dynamically assembled binaries. We assume that dynamical assembly in dense stellar environments yields a χ eff distribution that is symmetric about zero, while isolated binary evolution produces only positive χ eff . Among the binaries with non-negligible spin (excluding those in the "vanishing" category above), the fractions f d and f i of binaries arising from dynamical and isolated channels are We find 0.25 ≤ f d ≤ 0.93 at 90% credibility, suggesting that both the field and the dynamical cluster scenarios contribute to the BBH mergers observed in GWTC-2.
Because the relative values of f n and f p are not sensitive to the width of the vanishing χ eff bin, this conclusion does not depend strongly on the definition of vanishing spin. At present, we are unable to include a systematic investigation of waveform error in our analysis of anti-aligned spin and orbital precession. However, preliminary studies suggest that waveform error is unlikely to significantly affect this and other results in this paper. As described in Abbott et al. (2020c), there is good agreement regarding the parameters of the GWTC-2 events when inferred with different waveforms. There is a caveat: our studies do not include eccentric waveforms. Romero-Shaw et al. For future work, it would be worthwhile to estimate the systematic error using different waveform approximants.
No strong evidence for variation of the spin distribution with mass. BHs born in hierarchical mergers inherit the orbital angular momenta of their progenitor systems, leading to significant spin magnitudes χ ≈ 0.7 for nearly equal-mass systems (Pretorius 2005;Berti & Volonteri 2008;Gerosa et al. 2018;Doctor et al. 2019;Rodriguez et al. , 2019Kimball et al. 2020a). If hierarchical mergers are present in GWTC-2, then one may expect correlations between the spins and masses of BBH systems, with more massive hierarchical mergers also possessing larger spins. We use the Multi Spin model to explore possible trends in the BBH spin distribution with mass, allowing for a distinct low-mass and high-mass subpopulation (with primary mass distributions parameterized by a power-law and Gaussian, respectively; see Sec. D.3), each with a distinct spin distribution. The low-mass power-law has a weak preference for smaller spins, as compared to the highmass Gaussian. Both subpopulations disfavor perfectly aligned systems, though the low-mass subpopulation has more support for small misalignments. In spite of these differences, the uncertainties on both of these subpopulations are broad enough that the two are fully consistent with each other, and we cannot confidently claim to detect a mass-dependence to the spin distribution at this stage. This is demonstrated in Fig. 13, which shows the posteriors for the spin distribution hyper-parameters associated with each mass subpopulation. These findings support the results of previous studies on GWTC-1, which could neither exclude nor confidently detect variation of the spin distribution with mass (Safarzadeh et al. 2020; Tiwari 2020).

Merger rate and redshift evolution
In this subsection we use the Power Law + Peak and Broken Power Law mass models with the Default spin model, and infer the merger rate using the Non-Evolving redshift model and the Power-law evolution redshift model.
We better constrain the binary black hole merger rate. Assuming a log-uniform prior, we find a BBH merger rate of R BBH = 23.9 +14.3 −8.6 Gpc −3 yr −1 using the Power Law + Peak mass distribution and the assumption of a non-evolving merger rate den- sity. We find that estimates of the BBH merger rate are robust to our choice of mass model, with excellent agreement between the Power Law + Peak, Broken Power Law, and Multi Peak models. The Truncated model yields a higher merger rate than the other models, but the results agree within statistical uncertainties: R BBH = 33 +22 −12 Gpc −3 yr −1 . Our BBH merger rate estimate is consistent with the hypothesis that a significant fraction of the merger rate is due to dynamically assembled binaries in globular clusters (Portegies Zwart & McMillan 2000;O'Leary et al. 2006;Moody & Sigurdsson 2009;Downing et al. 2011;Rodriguez et al. 2015;Park et al. 2017;Fragione & Kocsis 2018;Antonini & Gieles 2020), young/open star clusters (Banerjee et al. 2010;Ziosi et al. 2014), nuclear star clusters (O'Leary et al. 2009;Miller & Lauburg 2009;Antonini & Rasio 2016), or active galactic nuclei discs (McKernan et al. 2018;Tagawa et al. 2020;Gröbner et al. 2020).
In Table 4, we show the merger rate in different primary mass bins, as inferred with the Broken Power Law, Power Law + Peak and Multi Peak models. Taking the range between the lowest 5% and highest 95% estimate across these three models in each mass bin, we find the merger rate to be ∼ 4-14 Gpc −3 yr −1 in the range of 10-20 M , ∼ 1.3-5.3 Gpc −3 yr −1 in the range of 20-30 M , and ∼ 1.3-5.2 Gpc −3 yr −1 in the range of 30-40 M . In Fig. 7, we showed that the primary mass spectrum turns over between 4-10 M . We estimate the merger rate in this range to be ∼ 3-21 Gpc −3 yr −1 .
Our estimate of the BBH merger rate include only systems with m 1 ≥ m 2 > 3 M , which notably excludes GW190814. If we calculate the merger rate for all systems down to m 2 ≥ 2 M using our models, thereby including GW190814, we infer a higher merger rate: −26 Gpc −3 yr −1 for the Power Law + Peak model. The reason for this change is that including GW190814 increases the low-mass rate (see Fig. 8b). However, because our mass distribution models do not extrapolate well to m 2 < 3 M (see Section 5.1), the fit with GW190814 likely overestimates the rate of systems with masses between ∼ 2.6 M and ∼ 6 M . Because of the uncertainty regarding the nature of GW190814 and the low significance of GW190426_152155 (the other NSBH candidate in GWTC-2), we do not attempt to model the NSBH mass distribution, and do not calculate an NSBH merger rate. An estimate of the merger rate for GW190814-like systems can be found in Abbott et al. (2020b).
We update the binary neutron star merger rate. We give an update to the BNS rate based on the two confident BNS detections in GWTC-2, GW170817 and GW190425.
We estimate a rate R BNS = 320 +490 −240 Gpc −3 yr −1 by Monte-Carlo sampling from a non-spinning distribution of systems, with component masses uniformly distributed between 1 M and 2.5 M , 9 and uniformly distributed in comoving volume and detector-frame time with isotropic orientations and sky locations. The detectability of each simulated system was approximated by requiring a network signal-to-noise ratio above 10 with signal-to-noise ratios above 5 in at least two detectors. Because of the longer observing time and the lack of additional detections, we find a slightly smaller value for the BNS rate than previously reported: R BNS = 320 +490 −240 Gpc −3 yr −1 . Assuming that there are 0.01 Milky Way equivalent galaxies (MWEG)  Table 4. Merger rate estimate in different primary mass bins. These results assume a non-evolving merger rate density, and exclude GW190814 from the analysis.
in 1 Mpc 3 (Kopparapu et al. 2008), this implies a rate of R BNS = 32 +49 −24 MWEG −1 Myr −1 . The BBH merger rate probably increases with redshift, but slower than the star-formation rate. Figure 14 shows the merger rate as a function of redshift using the Power law evolution model (see Appendix E for additional details, and Fig. 30 for a posterior predictive check). When we allow the merger rate to evolve with redshift according to (1+z) κ , we find that the z = 0 merger rate is R(z = 0) = 19.3 +15.1 −9.0 Gpc −3 yr −1 . The posterior for the rate evolution parameter κ is shown in Fig. 15. Since GWTC-2 includes events with greater redshifts than the events in GWTC-1, we obtain a much tighter constraint on the evolution of the merger rate; compare our updated constraints of κ = 1.3 +2.1 −2.1 (Power Law + Peak model) and κ = 1.8 +2.1 −2.2 (Broken Power Law model) to the GWTC-1 result of κ = 8.4 +9.6 −9.5 . We find that the merger rate is consistent with a non-evolving distribution (κ = 0), but is more likely to increase with increasing redshift, with κ > 0 at 85% credibility (Power Law + Peak model) or 91% (Broken Power Law model).

p(κ)
Power Law + Peak Broken Power Law Figure 15. Posterior for the redshift evolution parameter κ from the Power-law Evolution model, which assumes that rate density scales like (1 + z) κ . We assume the Power Law + Peak and Broken Power Law mass models, and take a flat prior on κ.
catalog has highlighted the limitations of some early population models while yielding remarkable new signatures: 1. We find that the BBH primary mass spectrum is not well-described as a simple power-law with an abrupt cut-off; there is a strong statistical preference for other models with non-trivial features such as a peak or a tapering. These features occur at ≈ 37 M , where one might expect pair instability supernovae (and pulsational pair-instability supernovae) to shape the mass distribution of BHs.
2. At the opposite end of the spectrum, we observe a dearth of systems between NS and BH masses, suggesting that the BH mass spectrum likely turns over at ∼ 7.8 +1.8 −2.0 M . We constrain the minimum mass of BHs in BBH systems to be m min 5.7M at 90% credibility. This is greater than the mass of BH candidates in Galactic binaries, e.g., Thompson et al. (2019). These results hold only when we restrict our analysis to events with both component masses above 3 M .
3. Meanwhile, we find that our models fail to fit GW190814 together with the BBH systems with both components above 3 M . This may indicate that GW190814 belongs to a distinct population, or that there are additional features at the low mass end of the BBH mass spectrum that are missing in our models. This is perhaps unsurprising as the combination of mass ratio, merger rate, and secondary mass inferred from this system pose a challenge to our current understanding of compact binary formation (Abbott et al. 2020b;Arca Sedda 2020b;Zevin et al. 2020;Safarzadeh 2020).
4. We detect clear evidence of spin-induced, general relativistic precession of the orbital plane. We determine that this signature is not due to a single precessing merger, but from the overall preference of the data for precessing waveforms.
5. We observe that some fraction of the BHs in GWTC-2 are spinning with an orientation that is anti-aligned with respect to the orbital angular momentum of the binary. If we plausibly assume that all binaries with anti-aligned spins are assembled dynamically, this may imply that LIGO-Virgo events merge both dynamically and in the field. Based on the inferred mass and spin distributions, we find no clear evidence for or against hierarchical mergers in GWTC-2.
6. We compute the rate of compact binary mergers, finding R BNS = 320 +490 −240 Gpc −3 yr −1 and R BBH = 23.9 +14.3 −8.6 Gpc −3 yr −1 . The data are consistent with both a merger rate that is constant in time and one that tracks the SFR in the local universe, though the data prefer a merger rate that is somewhere in between. We find that the merger rate at z = 1 differs from the merger rate at z = 0 by a factor of R BBH (z = 1)/R BBH (z = 0) = 2.5 +8.0 −1.9 , to be compared with the SFR, R SFR (z = 1)/R SFR (z = 0) ∼ 6.
While a clearer picture is emerging of the population properties of compact binaries, key questions remain. How do we best characterize the deviations from powerlaw in the primary BH mass spectrum, and what is the physical origin of these new features? What is the origin of BBH mergers in the high-mass gap: hierarchical mergers, stars producing remnants heavier than expected from pair instability supernovae theory, or something else? What is the shape of the mass spectrum between NS and BH masses, and does the current dearth of systems between ∼ 3 M and ∼ 6 M represent an empty low-mass gap? If so, do systems like the secondary mass of GW190814 belong to the NS or BH side of the gap? Is the observation of anti-aligned spins indicative of dynamically assembled binaries? As the sensitivity of LIGO, Virgo, and KAGRA improves, and as more gravitational-wave transients are detected, we expect to begin to answer these questions. As future observations subject our models to increasing scrutiny, it is inevitable that refinements will be required to fit newly resolved features. This cycle of refining models to account for new data will reveal new questions while providing an evolving understanding of the conclusions presented here. We would like to thank all of the essential workers who put their health at risk during the COVID-19 pandemic, without whom we would not have been able to complete this work.
This is LIGO document number LIGO-P2000077.

A. ESTIMATING THE DETECTION FRACTION
A key ingredient in Eqs. (1) and (2) is the detection fraction ξ(Λ), the fraction of systems within some prior volume (redshift z < 2.3) that we expect to successfully detect. The detection fraction quantifies selection biases, and so it is critical to accurately characterize. For a population described by hyper-parameters Λ, the detection fraction is Here, P det (θ) is the detection probability: the probability that an event with parameters θ is detectable. The detection probability depends primarily on the masses and redshift of a system, and, to a lesser degree, on the spins. We calculate ξ(Λ) using injections. We simulate compact binary signals from a reference population and record which ones are successfully detected by the PyCBC and GstLAL search pipelines; see Abbott et al. (2020c). Following Tiwari (2018); Farr (2019); Vitale (2020); Loredo (2004), the point estimate for Eq. A1 is calculated using a Monte Carlo integral over found injections:ξ where N inj is the total number of injections, N found are the injections that are successfully detected, and p draw is the probability distribution from which the injections are drawn; see LIGO-Virgo (2020) for additional details. When sampling the population likelihood, we marginalize over the uncertainty inξ(Λ) following Farr (2019), and ensure that the effective number of found injections remaining after population re-weighting is sufficiently high (N eff > 4N det ). For the O3a observing period, we use the injection campaign described in Abbott et al. (2020c) and characterize the found injections as those recovered with a FAR below our threshold of 1 yr −1 in either PyCBC or GstLAL. For the O1 and O2 observing period, we supplement the O3a pipeline injections with mock injections drawn from the same distribution p draw above. For the mock injections, we calculate P det (m 1 , m 2 , z, χ 1,z , χ 2,z ) according to the semi-analytic approximation described in Abbott et al. (2019a), based on a single-detector signal-to-noise ratio threshold ρ = 8 and the Advanced LIGO Early-High Noise PSD (Abbott et al. 2013). We combine O1, O2 and O3 injection sets ensuring a constant rate of injections across the total observing time, yielding N inj ≈ 7.7 × 10 7 injections for O3a and N inj ≈ 7.1 × 10 7 for O1 and O2. To control computational costs, not all of the injections are performed in real data. Before injecting, the expected network signal-to-noise ratio of the injections is computed, and the hopeless injections with signal-to-noise ratio < 6 are removed.
Due to the finite number of injections, we approximate Eq. (A1) with a fixed spin distribution instead of the distribution implied by Λ. When combining the Truncated, Power Law + Peak, Broken Power Law and Multi Peak mass models together with the Default spin distribution, we assume that the aligned spin components χ 1,z , χ 2,z are independently drawn from a uniform distribution U (−0.5, 0.5). By making this approximation, we are in effect ignoring selection effects due to spin. Nevertheless, we expect this approximation to have a negligible impact on the inferred spin distribution compared to the statistical uncertainties. For aligned spin components in the range (−0.5, 0.5), the detection probability varies by no more than a factor of 2 (Ng et al. 2018a). Furthermore, our main conclusions regarding the spin distribution inferred from the Default model are supported by the Gaussian model, which requires no approximations for spin selection effects. The Multi Spin model calculates Eq. (A1) by calibrating a semi-analytic approximation to the list of found injections (Wysocki 2020).
The transfer function between the observed strain and astrophysical strain is subject to a systematic calibration uncertainty. We neglect this calibration uncertainty in our estimates of the search sensitivity above. For the O3a observing run, the amplitude uncertainty was 3% (Sun et al. 2020), which leads to a 10% systematic uncertainty in the sensitive spacetime volume and the inferred merger rate. This systematic uncertainty is subdominant to our uncertainties from Poisson counting error.

B. DETAILS OF MASS POPULATION MODELS
In this section we provide details about the population models described above in Section 3; see also Fig. 1. Each subsection includes a table with a summary of the parameters for that model and the prior distribution used for each parameter. The prior distributions are indicated using abbreviations: for example, U(0, 1) translates to uniform on the interval (0, 1) and LU(10 −6 , 10 5 ) translates to log-uniform on the interval 10 −6 , 10 5 .

B.1. Truncated mass model
This model is equivalent to "Model B" in Abbott et al. (2019a). The primary mass distribution for this model follows a power-law with spectral index α, and with a sharp cut-off at the lower end m min and the upper end of the distribution m max : Meanwhile, the mass ratio q ≡ m 2 /m 1 follows a power-law distribution with spectral index β q π(q|β q , m min , m 1 ) ∝ The hyper-parameters for this model are summarized in Table 5.

B.2. Power Law + Peak mass model
This is equivalent to "Model C" from Abbott et al. (2019a). It is motivated by the idea that the mass loss undergone by pulsational pair-instability supernovae could lead to a pile-up of BBH events before the pair-instability gap (Talbot & Thrane 2018). The primary mass distribution is an extension of Truncated with the addition of tapering at the lower mass end of the distribution and a Gaussian component: π(m 1 |λ peak , α, m min , δ m , m max , µ m , σ m ) = (1 − λ peak )P(m 1 | − α, m max ) + λ peak G(m 1 |µ m , σ m ) S(m 1 |m min , δ m ).

(B5)
Here, P(m 1 | − α, m max ) is a normalized power-law distribution with spectral index −α and high-mass cut-off m max . Meanwhile, G(m 1 |µ m , σ m ) is a normalized Gaussian distribution with mean µ m and width σ m . The parameter λ peak is a mixing fraction determining the relative prevalence of mergers in P and G. Finally, S(m 1 , m min , δ m ) is a smoothing function, which rises from 0 to 1 over the interval (m min , m min + δ m ): The conditional mass ratio distribution in this model also includes the smoothing term: π(q | β, m 1 , m min , δ m ) ∝ q βq S(qm 1 | m min , δ m ).
The hyper-parameters for this model are summarized in Table 6. In Fig. 16, we provide a corner plot representation of the posterior distribution for the Power Law + Peak hyper-parameters. The (µ m , λ peak ) panel describes the Gaussian component: µ m is the center of the Gaussian while  λ peak is the fraction of mergers taking place in the Gaussian (as opposed to the power-law distribution). Judging from this panel, it appears at first that λ peak peaks close to 0 (corresponding to no Gaussian peak). However, if we zoom in as in Fig. 6, we see that the posterior for λ peak is peaked clearly away from zero at ∼ 0.02. This is consistent with the large Bayes factor indicating preference for Power Law + Peak over Truncated.

B.3. Broken Power Law mass model
This model is an extension of Truncated. The primary mass distribution consists of a broken power-law. This is motivated by the potential tapering of the primary mass distribution at high masses. Also, the model employs a smoothing function to prevent a sharp cut-off at low masses. Here, is the mass where there is a break in the spectral index and b is the fraction of the way between m min and m max at which the primary mass distribution undergoes a break. Meanwhile, S(m 1 , m min , δ m ) is a smoothing function as in Eq. (B6). The conditional mass ratio distribution is the same as in the Power Law + Peak model; see Eq. B8. The hyper-parameters for this model are summarized in Table 7. In Fig. 17 we provide a corner plot for Broken Power Law. In the limit of no low-mass smoothing (δ m = 0), and in the limit of a second power-law with a steep slope that mimics a sharp cutoff (m break = m max ), this model reduces to Truncated. Above, we noted that the Broken Power Law model prefers a break in the primary mass spectrum near 40 M . On the other hand, if we believe that the feature represented by m break should be closer to a sharp cutoff, then the cut-off must occur at higher masses approaching the maximum mass of Truncated at m max = 74.6 +15.4 −8.6 M . This can be seen by the correlation between b and α 2 in Fig. 17. (1 − λ)P(m 1 | − α, m max ) + λλ 1 G(m 1 |µ m,1 , σ m,1 ) + λ(1 − λ 1 )G(m 1 |µ m,2 , σ m,2 ) S(m 1 |m min , δ m ).

B.4. Multi Peak mass model
Here, the parameters λ and λ 1 correspond to the fraction of binaries in any Gaussian component and the fraction of binaries in the lower-mass Gaussian of the Gaussian components, respectively. The distribution G(m 1 |µ m,1 , σ m,1 ) is a   normalized Gaussian distribution for the lower-mass peak with mean µ m,1 and width σ m,2 and G(m 1 |µ m,2 , σ m,1 ) is a normalized Gaussian distribution for the upper-mass peak with mean µ m,2 and width σ m,2 . The hyper-parameters for this model are summarized in Table 8. In Fig. 18 we provide a corner plot for Multi Peak for parameters corresponding to the two Gaussian peaks. The mean of the upper-mass peak µ m,2 = 68 +18 −14 M is located at approximately twice the mean of the lower-mass peak µ m,1 = 33.4 +4.4 −4.9 M . The remaining parameters are: α = 2.9 +1.9 −1.4 , β q = 0.9 +1.9 −1.3 , m min = 4.6 +1.3 −1.8 M and m max = 65 +31 −30 M . In Fig. 19, we provide a posterior predictive check for all of the mass models used in this analysis.

Parameter Description Prior α1
Power-law slope of the primary mass distribution for masses below m break . U(−4, 12) α2 Power-law slope for the primary mass distribution for masses above m break . U(−4, 12) βq Spectral index for the power-law of the mass ratio distribution. U(−4, 12) mmin Minimum mass of the power-law component of the primary mass distribution. U(2 M , 10 M ) mmax Maximum mass of the primary mass distribution.
The fraction of the way between mmin and mmax at which the primary mass distribution breaks, e.g. a break fraction of 0.4 between mmin = 5 and mmax = 85 means the break occurs at m1 = 32.  C. MASS MODEL CHECKING Section 5.1 describes the inferred mass distribution obtained with the Truncated, Broken Power Law, Power Law + Peak, and Multi Peak models, and compares the different models by calculating their Bayes factors. Here we assess the goodness-of-fit of the models using posterior predictive checks, comparing predicted and empirical catalogs of observed m 1 distributions in Fig. 19. The light colored bands show the cumulative distribution of m 1 as predicted by the model, while the darker bands show the empirical distribution based on the actual events observed in GWTC-2. The bands represent a family of curves, where each curve corresponds to a different draw from the population hyper-posterior. Each draw from the hyper-posterior updates both the predicted distribution (in the lighter color) and the empirical distribution (in the darker color), as the individual event posteriors are updated according to the inferred population distribution (Fishbach et al. 2020b;Galaudage et al. 2019;Miller et al. 2020). If the model is a good fit to the data, the dark colored bands should overlap with the light colored bands. Figure 19 shows the relatively poor fit for the Truncated model, which cannot capture the excess of events at ∼ 30-40 M compared to 40 M . The remaining panels show the improved fits with the Power Law + Peak, Broken Power Law and Multi Peak models. These results are consistent with the Bayes factors in Table 2, which conclude that the Truncated model is disfavored by a Bayes factor of 10-80 relative to the other models.

C.1. On GW190412
Other than GW190814, we find that GW190412 (Abbott et al. 2020a), when analyzed with a population informed prior, remains the only system for which we can confidently bound the mass ratio away from unity, yielding q < 0.53 at 99% credibility (using the Power Law + Peak mass model). All other events, when analyzed with a population-informed prior, are consistent with q = 1 at 99% credibility. Repeating the analysis in Abbott et al. (2020a), we perform a leave-one-out analysis without GW190412 and find β q = 4.0 +6.4 −3.2 (β q = 4.5 +5.9 Peak model; β q = 1.4 +2.5 −1.5 for the Broken Power Law model) has moderate (∼ 50%) overlap with the leave-one-out β q posterior, indicating that, consistent with the conclusion in Abbott et al. (2020a), GW190412 likely belongs to the low mass ratio tail of the distribution rather than a distinct subpopulation of asymmetric systems.

C.2. On GW190521
As discussed in Section 5.1, the most massive event, GW190521 (Abbott et al. 2020e), is an outlier with respect to the Truncated model (see Fig. 2), but fits well within the mass distributions inferred from the other models. In Fig. 20, we show the effect of GW190521 on the primary mass distribution. This event shifts the best-fit mass distribution, but this shift is within the statistical uncertainties. Thus, we find no evidence that GW190521 is an outlier within the framework of the Power Law + Peak and Broken Power Law mass models. This finding is supported by the posterior predictive check in subsection C.4. In Fig. 21, we show how the primary mass posterior distribution for GW190521 changes when we use the Power Law + Peak model to inform our prior. While the population-informed posterior on the primary mass prefers smaller masses (Fishbach et al. 2020b), the conclusion that the primary mass of

C.3. On GW190814
On the other hand, we see clear indication that GW190814 is an outlier with respect to the BBH population within the framework of the Power Law + Peak and Broken Power Law mass models, as discussed in Section 5.1. As an additional posterior predictive check, following the analysis described in Fishbach et al. (2020b) and Abbott et al. (2020a), we use the posterior predictive distribution, inferred without GW190814, to construct a distribution for the minimum m 2 detected in a sample of 45 events. When using both the Power Law + Peak and Broken Power Law models, we find that the observation of a system with a secondary mass equal to or smaller than the that of GW190814 (2.59 +0.08 −0.09 M ) is highly improbable, with probability < 0.02% for both Power Law + Peak and Broken Power Law; see Fig. 22b for the distribution of the minimum observed secondary mass in a sample of 45 events predicted by the Power Law + Peak model. The distribution for Broken Power Law is qualitatively similar. The mass ratio of GW190814 is also somewhat unusual according to this posterior predictive check; see Fig. 22a. Observing an event with the mass ratio of GW190814 or smaller, based on the fit to the other 44 BBH events, has probability < 0.02% in both the Power Law + Peak and Broken Power Law models. These posterior predictive checks suggest that GW190814 is not a typical BBH, and support the conclusion that there may be a dearth of systems between ∼ 2.6 M and ∼ 6 M . Future observations will reveal the precise shape of the mass distribution at low masses and extreme mass ratios, and better determine the nature of GW190814.

C.4. Mass and distance checks with a burst analysis
The earlier posterior predictive checks in this section compared simulated sets of BBH masses to the observed set of catalog events. As a complimentary posterior predictive check, we can simulate the gravitational-wave signals from these predicted events, run it through our search pipelines, and compare the synthetic data, as detected by the pipeline, to the observed data. As a proof of principle, we carry out this posterior predictive check with the Coherent WaveBurst (cWB) pipeline (Klimenko & Mitselmakher 2004;Klimenko et al. 2016), which is designed to detect unmodeled gravitational-wave transients. The cWB analysis resulted in the detection of 22 BBH events in GWTC-2. We investigate whether the set of cWB observations is consistent with the model predictions. We focus on assessing possible outliers at high masses and high redshifts, where cWB is especially sensitive to BBH signals. In particular, we examine whether GW190521 ,which was recovered with higher significance by the cWB saerch than the templated searches, is an outlier in the context of our mass and redshift models. Following Klimenko et al. (2016), we calculate the expected distribution of the central frequency f (which depends on the redshifted mass of the BBH) and coherent signal-to-noise ratio ρ (which, for a given redshifted mass, depends on the distance of the BBH) for two different population models: Power Law + Peak and Broken Power Law, using the Non-Evolving redshift model. We then compare the empirical distribution of (f, ρ), as recovered by the cWB pipeline to the distribution predicted by the population model. We quantify the comparison by calculating a p-value for each event i, which measures how unusual its observed (f i , ρ i ) is, given the distribution of predicted (f, ρ). The central frequency is a proxy for the redshifted mass while the signal-to-noise ratio is a proxy for the distance.
To compute the predicted distribution of (f, ρ), we inject simulated waveforms drawn from the Power Law + Peak and Broken Power Law distributions into the O1, O2 and O3a data and compile injections recovered by cWB   Wysocki et al. (2019a), the dimensionless spin magnitude distribution is taken to be a Beta distribution, where α χ and β χ are the standard shape parameters that determine the distribution's mean and variance. The Beta distribution is convenient because it is bounded on (0,1). The distributions for χ 1 and χ 2 are assumed to be the same. Following Talbot & Thrane (2017), we define z = cos θ 1,2 as the cosine of the tilt angle between component spin and a binary's orbital angular momentum, and assume that z is distributed as a mixture of two populations: Here, I(z) is an isotropic distribution, while G t (z|σ t ) is a truncated Gaussian, peaking at z = 0 (perfect alignment) with width σ t . The mixing parameter ζ controls the relative fraction of mergers drawn from the isotropic distribution and Gaussian subpopulations. The isotropic subpopulation is intended to accommodate dynamically assembled binaries, while G t is a model for field mergers. The hyper-parameters for this model and their priors are summarized in Table 9. Additional constraints to the priors on µ χ and σ 2 χ are applied by setting α χ , β χ > 1. In Fig. 24 we provide a corner plot for the Default spin model. This model prefers modest spin magnitudes. It favors the hypothesis that binaries are preferentially aligned (ζ → 1 and σ t 2), albeit with potentially large misalignment  angles (σ t > 0). The case of perfect alignment, which would correspond to ζ = 1 and σ = 0, is disfavored, lying outside the 99% credible bound on ζ and σ t . Within the main text, Fig. 10 shows the implied distributions of component spin magnitudes and tilt angles. The implied distributions of the effective precession spin parameter (χ p ) and the effective inspiral spin parameter (χ eff ) are shown in Figs. 9 and 11, respectively; these distributions are in good agreement with the results obtained using the Gaussian model described below. In particular, both models predict the existence of systems with anti-aligned spins (negative χ eff ), and in-plane spin components (non-zero χ p ).

D.2. Gaussian spin model
The Gaussian spin model offers an alternative description of BBH spins. It is convenient to measure the distribution of the effective inspiral spin parameter (χ eff ) and the effective precession spin parameter (χ p ), which are better constrained than individual component spin magnitudes or tilts. We parameterize the distributions of χ eff and χ p , using a bivariate Gaussian: π(χ eff , χ p |µ eff , σ eff , µ p , σ p , ρ) ∝ G(χ eff , χ p |µ µ µ, Σ Σ Σ).
The distribution has a mean µ µ µ = (µ eff , µ p ) and a covariance matrix The population parameters appearing in Eqs. (D14) and (D15) and their associated priors are summarized in Table 10. We truncate and normalize Eq. (D14) based on the allowed regions of the effective inspiral spin parameter: χ eff ∈ (−1, 1) and χ p ∈ (0, 1). The results from the Gaussian model are obtained assuming a Truncated mass model with α = −2.2, The fit excludes GW190814. The contours represent 50%, 90% credible bounds. A perfectly aligned spin distribution (σt = 0, ζ = 1) is ruled out at > 99% credibility, consistent with the results of the Gaussian model, but the data disfavor a purely isotropic distribution (ζ = 0 or σt 2). model to GWTC-2. We additionally assume a comoving merger rate density that grows as (1 + z) 2.7 . Although the Truncated model is disfavored relative to the more complex mass models discussed above, it is sufficient for purposes of constructing an informed mass ratio distribution, the primary confounding factor in efforts to measure χ eff and χ p (Ng et al. 2018a). The full posterior on parameters of the Gaussian model is shown in Fig. 25. We find no correlation between the parameters of the χ eff and χ p distributions, nor do we obtain any information regarding the degree of correlation ρ between the effective inspiral spin parameter and the effective precession spin parameter. As discussed in Sec. 5.2, analysis with the Gaussian spin model is consistent with the identification of spin-orbit misalignment using the Default spin model; with µ p = σ p = 0 disfavored. For the Default spin model, we verified that the signature of spin-orbit misalignment was not a spurious prior artifact, finding that our posterior lies safely outside the artificial exclusion region in Fig. 9. A similar exclusion region also exists for the Gaussain model around µ p = σ p = 0, but our estimate of its exact size is subject to sampling uncertainties driven by the relatively small number of prior samples close to χ p = 0 for each event. With the Gaussian model, we also find evidence that at least some BHs have anti-aligned spins, with θ > 90 • , such that χ eff < 0. To further evaluate the robustness of our Gaussian model fits, in Fig. 26 we show posterior predictive comparisons between predicted and empirical catalogs of χ eff and χ p measurements. The light blue bands mark 90% credible bounds on the predicted cumulative distribution of observed effective inspiral spin parameter values, given our posterior on the Gaussian model parameters. The dark shaded regions, meanwhile, show 90% credible bounds on the true distribution observed within GWTC-2, achieved by reweighting single event χ eff and χ p by repeated random draws from the Gaussian hyper-posterior. A similar predictive comparison between observation and an alternative strictly positive model, in which the effective inspiral spin distribution is truncated on the interval 0 ≤ χ eff ≤ 1, reveals possible tension; when asserting that all effective inspiral spins are positive, the resulting population model underpredicts the number of observations with χ eff < 0.1 approximately 75% of the time. As described in the main text, we can leverage the assumption that BBH with negative values of χ eff are formed dynamically to infer the fraction of binaries formed via dynamical (f d ) and isolated (f i ) channels; see Eq. (7). In Fig. 28 we show the corresponding posterior distributions for f i and f d .

Parameter Description Prior
There are several sources of possible bias that might influence our Gaussian model conclusions. One possible source of bias is the mass model presumed for the Gaussian spin analysis. As noted above, measurements of a binary's χ eff and mass ratio q are generally anti-correlated (Ng et al. 2018a). Therefore, our particular choice of β q = 1.3 could conceivably affect conclusions regarding the χ eff distribution. We have directly verified that the results in Fig. 27 remain robust under different fiducial choices of β q between −1.5 and 2.
Another source of bias may be the Gaussian functional form we impose on the χ eff and χ p distributions, enforcing a unimodal distribution with smooth tails. As discussed in Sec. 5.2, though, the Default spin model yields near-identical χ eff and χ p distributions, despite its different parametrization and different physical assumptions. As an additional check, we repeat the Gaussian spin analysis on our data, truncating the χ eff distribution not on (−1, 1), but on (χ min eff , 1), where χ min eff is inferred from the data. Figure 27 shows the marginal posterior for χ min eff . We find that χ min eff is constrained to be negative at 99% credibility, confirming that support for negative effective inspiral spin parameter is a feature of the data and not simply an artifact of the Gaussian model. In contrast, when we repeat the measurement of χ min eff using simulated catalogs drawn from (i ) a Gaussian population truncated to strictly positive values, and (ii ) a pair of delta functions at χ eff = 0 and 0.1, we correctly observe consistency with χ min eff = 0. Examining χ min eff is therefore a useful safeguard even when the true population is poorly fit by a Gaussian. However, if the χ eff distribution deviates too strongly from a Gaussian functional form, then it may remain possible to spuriously conclude that χ min eff < 0. In the case of significant tidal torques, for example, it is predicted that the χ eff distribution is strongly bimodal, with effective inspiral spins clustered near χ eff = 0 or 1, with no support at χ eff < 0 (Kushnir et al. 2016;Zaldarriaga et al. 2018;Bavera et al. 2019;Belczynski et al. 2020). To illustrate how results can be biased from model misspecification, we analyze mock observations drawn from a similar bimodal distribution. Using this intentionally misspecified model, we incorrectly conclude that χ min eff < 0. However, in cases of such extreme model mismatch, we expect that our data would fail the predictive check of Fig. 26. Indeed, Fig. 29 shows the result of such a predictive check in the case of the bimodal, tidally-torqued χ eff distribution. When fitting this population with a Gaussian model, we find that the model overpredicts the fraction of mock observations with negative χ eff as well as the range 0.3 χ eff 0.8, showing a clear deviation in the predicted and observed cumulative χ eff distributions. . Posterior distribution for χ min eff , below which we truncate the Gaussian χ eff distribution. While the results shown in the main text presume χ min eff = −1, in Sec. D.2 we elevate χ min eff to a free hyper-parameter to be determined by the data; the resulting marginalized posterior distribution is shown here. In this case, we exclude χ min eff ≥ 0 at 99% credibility. This finding affirms that the signatures of anti-aligned BH spins are present in our BBH catalog, and not a bias due to our choice of parameterized spin model.

D.3. Multi Spin model
This model is an extension of the Truncated mass model with an additional Gaussian component. It is similar to the Power Law + Peak model, but there are several differences. First, the high-mass subpopulation in Multi Spin is described by a Gaussian in both m 1 and m 2 (up to the m 1 ≥ m 2 truncation) while Power Law + Peak only models m 1 as a Gaussian, and assumes that all BBH systems are described by a power-law distribution in mass ratio q. Most importantly, as its name suggests, Multi Spin allows each subpopulation to have its own independent spin distribution, each of which follows the Default model, with ζ = 1. This allows us to probe whether the spin distribution varies with mass. The parameters for Multi Spin are summarized in Table 11. . Example of a failed posterior predictive check for the effective inspiral spin parameter χ eff distribution. We fit the Gaussian spin model to a mock catalog drawn from the strongly bimodal χ eff distributions predicted in the presence of tidal torques (Kushnir et al. 2016;Zaldarriaga et al. 2018;Bavera et al. 2019;Belczynski et al. 2020). The dark shaded region shows the central 90% credible bounds on the mock observed cumulative χ eff distribution, while the dark region corresponds to that predicted by the model. In this case, our Gaussian model is an extremely poor representation of the underlying χ eff distribution, and so significant tension is seen between the observed and predicted distributions, with the model overpredicting the fraction of observations with χ eff < 0 as well the fraction of observations with 0.3 χ eff 0.8.

E. REDSHIFT EVOLUTION MODELS
The power-law redshift evolution model parameterizes the merger rate density as where R 0 denotes the merger rate density at z = 0. This implies that the redshift distribution is where dV c /dz is the differential comoving volume, and C is related to R 0 by Parameter Description Prior R pl Local merger rate for the low-mass power-law subpopulation. U(0, 5000) Rg Local merger rate for the high-mass Gaussian subpopulation. U(0, 5000) αm Power-law slope of the primary mass distribution for the low-mass subpopulation.
U(−4, 12) βq Power-law slope of the mass ratio distribution for the low-mass subpopulation U(−4, 10) mmin Minimum mass of the primary mass distribution for the low-mass subpopulation. U(2, 10) mmax Maximum mass of the primary mass distribution for the low-mass subpopulation.
U(30, 100) µm 1 Centroid of the primary mass distribution for the high-mass subpopulation U(20, 50) σm 1 Width of the primary mass distribution for the high-mass subpopulation U(0.4, 10) µm 2 Centroid of the secondary mass distribution for the high-mass subpopulation U(20, 50) σm 2 Width of the secondary mass distribution for the high-mass subpopulation U(0.4, 10) Meanχ 1,pl Mean of the beta distribution of primary spin magnitudes for the low-mass subpopulation.
U(0, 1) Varχ 1,pl Variance of the beta distribution of primary spin magnitudes for the low-mass subpopulation.
U(0, 0.25) σ 1,pl Width of the truncated Gaussian distribution of cos(primary spin tilt angle) for the low-mass subpopulation.
U(0, 4) Meanχ 2,pl Mean of the beta distribution of secondary spin magnitudes for the low-mass subpopulation.
U(0, 1) Varχ 2,pl Variance of the beta distribution of secondary spin magnitudes for the low-mass subpopulation.
U(0, 0.25) σ 2,pl Width of the truncated Gaussian distribution of cos(secondary spin tilt angle) for the low-mass subpopulation.
U(0, 4) Meanχ1,g Mean of the beta distribution of primary spin magnitudes for the high-mass subpopulation.
U(0, 1) Varχ1,g Variance of the beta distribution of primary spin magnitudes for the high-mass subpopulation.
U(0, 0.25) σ1,g Width of the truncated Gaussian distribution of cos(primary spin tilt angle) for the high-mass subpopulation.
U(0, 4) Meanχ2,g Mean of the beta distribution of secondary spin magnitudes for the high-mass subpopulation.
U(0, 1) Varχ2,g Variance of the beta distribution of secondary spin magnitudes for the high-mass subpopulation.
U(0, 0.25) σ2,g Width of the truncated Gaussian distribution of cos(secondary spin tilt angle) for the high-mass subpopulation.
U(0, 4) We take z max = 2.3 in the analysis, as this is a conservative upper bound on the redshift at which we could detect BBH systems during O3a within the mass range considered here. When fitting this model, we employ a uniform prior on κ centered at κ = 0. The value κ = 0 corresponds to no evolution; i.e., a merger rate that is uniform-in-comoving volume and source-frame time. We take a sufficiently wide prior so that the likelihood is entirely within the prior range, κ ∈ (−6, 6).

F. GRAVITATIONAL-WAVE LENSING
It has been suggested that gravitational-wave lensing could bias the estimate of binary masses (Dai et al. 2017;Ng et al. 2018b;Li et al. 2018;Oguri 2018;Hannuksela et al. 2019), which could lead to a biased population inference. However, based on the predictions on the number of expected gravitational-wave sources and the distribution of galaxy lenses in the Universe, Li et al. (2018); Oguri (2018) predict that only around one in a thousand observed events are lensed, although this estimate can vary depending on the redshift evolution of the merger-rate density. The lensing rate