LIGO-Virgo-KAGRA’s Oldest Black Holes: Probing star formation at cosmic noon with GWTC-3

,


INTRODUCTION
The gravitational-wave (GW) detector network consisting of Advanced LIGO (LIGO Scientific Collaboration et al. 2015), Advanced Virgo (Acernese et al. 2015) and KAGRA (Akutsu et al. 2021) has observed GW radiation from binary black holes (BBHs) that merge at redshifts z merge ≲ 1 (e.g.Abbott et al. 2019aAbbott et al. , 2021a;; The LIGO Scientific Collaboration et al. 2021;Nitz et al. 2021;Olsen et al. 2022), with planned detector upgrades expanding the network sensitivity to z merge ≲ 2 (Miller et al. 2015;Abbott et al. 2018;Weizmann Kiendrebeogo et al. 2023).Observing stellar-mass binaries that merge beyond z merge ≳ 2, an era known as "cosmic noon" when the Universe formed most of its stars, requires a next generation of GW detectors proposed for the 2030s, such as Einstein Telescope (Maggiore et al. 2020) and Cosmic fishbach@cita.utoronto.caExplorer (Evans et al. 2021).These next-generation observatories would be sensitive to compact binary mergers past z merge > 50 (Hall & Evans 2019;Kalogera et al. 2021;Borhanian & Sathyaprakash 2022), likely observing every BBH merger in the Universe and opening up enormous discovery potential for learning about highredshift star formation, the first generation (Pop III) stars, the assembly and growth of the first galaxies, cosmic metallicity evolution, and the formation histories of BBH systems (Vitale et al. 2019;Ng et al. 2021Ng et al. , 2022;;Chruślińska 2022).
Although the current LIGO-Virgo-KAGRA (LVK) network cannot observe mergers that happened earlier than z merge ≈ 1, many of the observed low-redshift mergers may have formed at significantly higher redshifts.Merging binaries experience a delay time between the formation of the progenitor stars and the binary compact object merger.Because massive stars are short-lived, the bulk of this delay time typically consists of the GW-driven binary inspiral.The GW inspiral time  (2022a).We show histograms of the delay times per 10 6 M⊙ of stellar mass formed, assuming a flat-in-log distribution of metallicities, and bin sizes ∆ log t delay = 0.2.Thick dashed lines show the median for each formation channel, orange thin lines show the power laws used in this work.
τ inspiral is very sensitive to the orbital separation a, scaling as τ inspiral ∝ a 4 (Peters 1964).In fact, a binary needs to reach extremely short separations O(10R ⊙ ) just to merge within the age of the Universe (Peters 1964).Small increases in the orbital separation drastically increase the GW inspiral time, implying that even a narrow distribution of initial orbital separations will cause the distribution of delay times to extend out to very long delays.Indeed, different BBH formation channels (see e.g., Mandel & Farmer 2022;Mapelli 2020, for a review) typically predict delay time distributions with a long tail that extends beyond a Hubble time.This applies to dynamically-assembled BBH systems or stellar triples in various environments (e.g.Rodriguez et al. 2016;Antonini et al. 2017;Di Carlo et al. 2020;Yang et al. 2020;Michaely & Naoz 2022) as well as those resulting from isolated binary evolution (e.g.Dominik et al. 2013;Lamberts et al. 2016;Mapelli et al. 2017).Here we focus on predictions from isolated binary evolution.
Different isolated binary formation channels generally predict distinct distributions for the separations at BBH formation, and thus distinct delay time distributions.For example, the common envelope channel leads to shorter final separations than the stable mass transfer channel, because common envelopes shrink the binary orbit more efficiently than stable mass transfer (see Fig. 1, and e.g., Bavera et al. 2021;Gallegos-Garcia et al. 2021;van Son et al. 2022b).1 Similarly, chemically homogeneously evolving stars are expected to stay compact and tend towards shorter delay time distributions at sufficiently low metallicities (e.g., Marchant et al. 2016;du Buisson et al. 2020;Riley et al. 2021).
Because of these distinct predictions, directly measuring the delay time distribution probes the formation channels of GW sources.For mergers involving neutron stars, the delay time distribution can be inferred from a population of their host galaxies (Safarzadeh et al. 2019;Adhikari et al. 2020), the host galaxy properties of short gamma ray bursts (Zevin et al. 2022), or the r-process enrichment history (e.g., Naidu et al. 2022).For BBH sources without uniquely identified host galaxies, measuring the redshift evolution of the merger rate (Fishbach et al. 2018;Callister et al. 2020;Vitale et al. 2019;Safarzadeh et al. 2019;Callister & Farr 2023;Edelman et al. 2023) can inform the delay time distribution (Vitale et al. 2019;Fishbach & Kalogera 2021;Karathanasis et al. 2023).Fishbach & Kalogera (2021) found that the relatively steep redshift evolution of the BBH merger rate between z merge = 0 and z merge = 1, compared to the low-metallicity SFR model from Madau & Fragos (2017), favors short delay times, ruling out delay time distributions with typical delays ≳ 3 Gyr.In the absence of a direct model for the SFR, Mukherjee & Dizgah (2022) showed that the distribution of delay times between star formation and BBH mergers can be inferred by cross-correlating the redshift distributions of BBH mergers with electromagnetic tracers of star formation (e.g., line intensity mapping).
Conversely, if the delay time distribution is known, the redshift evolution of the BBH merger population directly constrains the BBH progenitor formation rate.The progenitor formation rate depends on the SFR, stellar initial mass function (IMF), and the cosmic metallicity as a function of redshift.These are highly important yet uncertain processes, particularly at high redshifts (Madau & Dickinson 2014;Smith 2020;Maiolino & Mannucci 2019).Therefore, GW mergers present an exciting opportunity to probe star-forming conditions in the high-redshift Universe (see, e.g.Chruślińska 2022, for a review).
Previous studies have studied the BBH progenitor formation rate within a population synthesis framework, simulating BBH merger events for a range of parameters that govern the metallicity-specific SFR.By comparing to LVK observations, one can place constraints on the input parameters (e.g.Riley & Mandel 2023).Alternatively, one can simultaneously fit the delay time distribution and the progenitor formation rate to the GW observations in a data-driven approach, but the two will be highly degenerate with each other unless one restricts the model flexibility by placing some physical priors (e.g.Vitale et al. 2019).
In this work, we assume a simplified form of the delay time distribution motivated by theoretical predictions, and use it to propagate the observed merger redshift of each BBH event in GWTC-3 backward to a probability distribution of its progenitor's formation redshift z form , while simultaneously inferring the progenitor formation rate.Our approach sits between a population synthesis forward model and a data-driven inference, and can be thought of as a highly simplified version of "backward population synthesis" (Andrews et al. 2018(Andrews et al. , 2021;;Wong et al. 2022).Unlike full backward population synthesis, we apply only a couple of predictions from population synthesis and combine them with GW observations, starting with a delay time distribution to infer BBH progenitor formation redshifts.Indeed, one application of the backward population synthesis approach is providing a straightforward consistency check that a given population synthesis simulation predicts delay times and metallicity-specific BBH rates that match both GW and SFR observations.The remainder of this paper is structured as follows.In §2, we introduce the theoretically-motivated delay time distributions and derive the relationship between delay times, merger redshifts and formation redshifts.In §3, we model the BBH merger rate in terms of a fixed delay time distribution and an unknown progenitor formation rate, and infer the progenitor formation rate from the GWTC-3 data.We then adopt a metallicitydependent yield dN BBH /dM SF (Z) motivated by population synthesis and a SFR model from galaxy observations to turn our inference of the BBH progenitor rate into a constraint on the cosmic metallicity evolution ( §4).In §5, we discuss what our population fit implies for the merger and formation redshifts of individual GWTC-3 BBH events, showing that GWTC-3 likely contains several systems that formed before cosmic noon.We discuss limitations of our method and future directions in §6 before concluding in §7.

DELAY TIME DISTRIBUTIONS
Before inferring the BBH formation rate in the following sections, we first review the relationship between the delay time distribution p(τ ), the merger rate R m and the formation rate R f , and explain our choice of delay time distribution models.Assuming a delay time τ , a binary that forms at lookback time t form merges at the lookback time: (1) Assuming a distribution of delay times p d (τ ), a binary that formed at t form will merge at a time t merge drawn from the distribution: The rate of mergers at a given time t merge depends on the formation rate in addition to the delay time distribution (e.g.Nakar 2007;Abbott et al. 2016): where in the second line we changed variables from t form to τ using Eq. 2. Meanwhile, by Bayes theorem, given a binary that merged at time t merge , the probability it formed at time t form is given by: Therefore, the BBH merger rate, which we can measure with GW observations, informs a combination of the progenitor formation rate and the delay time distribution (Eq.3) and allows us to infer the formation times of individual BBH events from their merger times (Eq.4).
Although the above equations are expressed in terms of lookback time, we assume a cosmological model (in this case, flat ΛCDM with parameters from Planck Collaboration et al. 2016) to convert between lookback time t and redshift z (Astropy Collaboration et al. 2018a), so that z merge (z form ) is the cosmological redshift at the lookback time t merge (t form ).
We assume a power-law parameterization for the delay time distribution with slope α τ , minimum delay time τ min , and maximum delay time τ max : where Θ denotes the indicator function.In reality, the delay time distribution varies depending on the BBH mass and formation metallicity.For simplicity, we neglect this dependence and assign the delay time distribution of Eq. 5 to every BBH system regardless of its mass and any assumptions about its formation metallicity, but see §6 for a discussion on this assumption.We bound our uncertainty in the delay time distribution by varying the power-law slope α τ .
In Fig. 1 we show variations of the delay time distribution inferred from the population synthesis predictions of van Son et al. (2022a).These results assume a universal Kroupa (2001) initial mass function (IMF) and average over a flat-in-log distribution of metallicities.Motivated by these models, we explore three slopes: α τ = −1 (our default model; similar to the prediction for "All BBHs", which is typically dominated by the common envelope channel), α τ = −0.35(similar to the prediction from the stable mass transfer channel), and an intermediate slope of α τ = −0.7.We further fix τ min = 10 Myr to represent the typical lifetime of a massive star, and fix τ max to the maximum lookback time of progenitor formation, which we take to be t 0 = 13.62 Gyr (corresponding to a maximum progenitor formation redshift z 0 = 20).Choices of τ max > t 0 are equivalent to changing the normalization of the progenitor formation rate.As we will see in the following, we are not yet sensitive to the choice of z 0 , as GWTC-3 is unlikely to contain any events that formed earlier than z form ≳ 10 for these choices of delay time distributions.

PROGENITOR FORMATION HISTORIES
We now use the GWTC-3 observations to fit the BBH merger rate, which we model as a convolution of an unknown progenitor formation rate and a theoreticallymotivated delay time distribution (Eq.3).The result is a measurement of the progenitor formation rate as a function of redshift (i.e., the formation rate of systems that will merge as BBHs within τ max = 13.62 Gyr).
Our approach is similar to the progenitor formation rate calculation by Fishbach & Kalogera (2021) with GWTC-2 (see their Fig. 7).However, while Fishbach & Kalogera (2021) assumed that the progenitor formation rate follows the low-metallicity SFR given by Madau & Fragos (2017) and fit for the metallicity threshold and scatter in the mean metallicity-redshift relation, here we adopt a more agnostic model, simply assuming that the progenitor formation rate can be described by the functional form from Katsianis et al. (2021) with free parameters N , a, b: where N is a normalization in units of Gpc −3 yr −1 , T (z) is the age (in Gyr) of the Universe at redshift z, b has units of inverse Gyr, and a is unitless.Although this function naturally falls to zero as z form approaches infinity (when T = 0), we set R f to zero for z form > z 0 with z 0 = 20.For each of the three delay time distributions we consider, we fit N , a and b, along with Figure 2. Inferred formation rate assuming the default ατ = −1 delay time distribution (green line shows median a posteriori formation rate as a function of redshift, while shaded bands enclose 50% and 90% posterior probability).We adopt the formation rate parametrization from Katsianis et al. ( 2021) with two parameters controlling the shape and one parameter controlling the amplitude (see Eq. 6); the prior is shown in yellow (dotted line shows median a priori formation rate, while shaded band encloses 90% prior probability).For comparison, we show the UV SFR (dashed light pink line) and UV+IR SFR (dot-dashed dark pink line) from Katsianis et al. (2021) normalized by 10 −6 M⊙.
the BBH mass and spin distribution in a hierarchical Bayesian framework (Loredo 2004;Mandel et al. 2019).Additional analysis details can be found in Appendix A.
Fig. 2 shows the fit to the progenitor formation history (green) assuming the default α τ = −1 delay time model.Although we fit the formation rate out to z 0 = 20, we only plot the formation rate for z form ≤ 6, because we expect the GWTC-3 events to have formed within this redshift range (see §5 and Fig. 6).For reference, we show the prior in yellow, the Katsianis et al. (2021) UV fit to the SFR as the dashed light pink line, and the Katsianis et al. (2021) UV+IR fit to the SFR as the dot-dashed dark pink line.Both SFRs are normalized by 10 −6 M ⊙ .
In the following, we use the UV SFR as the reference SFR because Katsianis et al. (2021) find that it better matches the observed stellar mass density evolution, whereas they note that the UV+IR SFR overestimates the observed stellar mass density by up to ∼ 0.5 dex.However, we caution that observational uncertainties affect both the SFR and the stellar mass density (e.g.Narayanan et al. 2023).These uncertainties are currently not significant compared to uncertainties associated with the GW inference and the choice of delay time model, so we adopt a fixed SFR model for the following proof-of-principle calculations.
We find that the inferred BBH progenitor formation rate rises more steeply with increasing redshift than the SFR, in particular over the range z form < 4 where we expect to get the most meaningful constraints with GWTC-3.This is illustrated in Fig. 3, where we show the yield dN BBH /dM SF inferred under the three different delay time models.The yield is defined as the number of BBH progenitors that will merge within τ max = 13.62 Gyr formed per stellar mass.For power-law delay time distributions with slopes α τ = −1 or shallower, we infer that the BBH yield decreases with decreasing z form , from 6.4 +9.4  −5.5 × 10 −6 M ⊙ −1 at z form = 4 to 0.134 +1.6  −0.127 × 10 −6 M ⊙ −1 at z form = 0 under the default α τ = −1 delay time model.
We find that compared to the default α τ = −1 delay time distribution, the shallower α τ = −0.7 and α τ = −0.35models provide a worse fit to the GW data in combination with our prior on the progenitor formation history.Under these shallow delay time models, our assumed formation rate model a priori limits the merger rate to evolve only by a factor of ≈ 2 between z merge = 0 and z merge = 1, whereas the data prefer a factor of ≈ 7 (Abbott et al. 2023).This worse fit is further illustrated by the maximum likelihood values of the fits under the different delay time assumptions, which differ by a factor of 26.5 (3.3) in favor of the α τ = −1 assumption relative to the α τ = −0.35(α τ = −0.7)assumption.Either our prior for the progenitor formation history is not sufficiently flexible (as we discuss in §6), or, if our prior captures the range of reasonable BBH progenitor formation rates, the α τ = −1 model is a better description of the true BBH delay time distribution.

METALLICITY EVOLUTION
We can use our measurement of the BBH yield dN BBH /dM SF (z form ) presented in Fig. 3 to infer the cosmic metallicity evolution with redshift.This is because we expect the BBH yield dN BBH /dM SF to depend strongly on the formation metallicity Z (Belczynski et al. 2010).Given a theoretically-motivated assumption for dN BBH /dM SF (Z), we can use our measurement of dN BBH /dM SF (z form ) to infer the metallicity distribution as a function of redshift p(Z | z form ).These quantities are related as: In Fig. 4  and Figs. 17 and 18 from Iorio et al. (2023).Motivated by these predictions we take dN BBH /dM SF (Z) to be of the form: y ln(w − log 10 Z)Θ(log 10 Z < w − 1), ( 8) so that y sets the BBH yield at low metallicities, and as log 10 Z approaches w − 1 from below, the BBH yield rapidly falls to zero.Typical predictions have 10 −5 < y < 6 × 10 −5 and 0.4 < w < 1.3.To bound these possibilities, we consider a "high yield" case with (y, w) = (6 × 10 −5 , 1.3), a "medium yield" case with (y, w) = (3.5 × 10 −5 , 0.85) and a "low yield" case with (y, w) = (10 −5 , 0.4), as shown by the red lines in Fig. 4. We approximate p(Z | z form ) with a log-normal distribution with width σ log 10 Z = 0.2 dex (though cf.Chruślińska 2022; van Son et al. 2023; we discuss the impact of the log-normal approximation in § 6).Using Eq. 7 with dN BBH /dM SF (Z) given by Eq. 8 and dN BBH /dM SF (z form ) shown in Fig. 3, we infer the mean log-metallicity ⟨log 10 (Z/Z ⊙ )⟩ as a function of redshift.
The results are shown in Fig. 5.The left panel shows ⟨log 10 (Z/Z ⊙ )⟩ inferred under the default α τ = −1 delay time distribution, while the right panel shows the inference under a shallow α τ = −0.35delay time distribution.Each panel displays the inferred metallicity under the three different dN BBH /dM SF (Z) assumptions: high yield (pink), medium yield (green) and low yield (brown).
These measurements can be compared against various metallicity evolution results in the literature (Maiolino & Mannucci 2019;Chruślińska 2022).For reference, we overplot three examples: the mean log-metallicity from Madau & Fragos (2017) ("MF17" or dotted, light blue line) and the peak metallicity curves from the left and right panels of Fig. 7 of Chruślińska et al. (2021), calculated under two different assumptions ("CNBL21-1" and "CNBL21-2" in the yellow dashed and gray dotdashed lines, respectively).While we can directly compare our results to Madau & Fragos (2017) who also adopt a log-normal metallicity distribution, the comparison to Chruślińska et al. (2021) is less straightforward, since their metallicity distributions have an extended low-metallicity tail, resulting in a peak log-metallicity that is higher than the average.We expect this distinction to be less significant than the various uncertain assumptions illustrated in Fig. 5, but we discuss the lognormal assumption further in §6.We also note that our results assume a constant scatter in the log-metallicity of 0.2 dex.If we instead assumed a larger (smaller) scatter, our uncertainty in the inferred ⟨log 10 (Z/Z ⊙ )⟩(z form ) would increase (decrease).
Meanwhile, different assumptions about the delay time distribution also affect the inferred metallicity evo-lution.A delay time model that favors longer delays, such as α τ = −0.35,implies lower BBH yields across the plotted range 0 < z form < 6 in order to avoid overpredicting the local merger rate, which largely consists of systems that formed at high redshifts.This lower BBH yield translates to higher inferred metallicities (right panel of Fig. 5).For example, under the medium yield assumption and the α τ = −0.35delay time model, we infer ⟨log 10 (Z/Z ⊙ )⟩(z form = 3) = −0.1 +0.1 −0.1 .However, as discussed in the previous section, the α τ = −0.35provides a poor fit to the GW data compared to the α τ = −1 model.
Our results illustrates the promise of BBH mergers as probes of the cosmic metallicity evolution, especially as GWs start providing improved constraints at formation redshifts past z form > 3 where the metallicity distribution inferred from other probes is highly uncertain (see, e.g., Fig. 7 in Chruślińska et al. 2021).Our inferred metallicity evolution is based on models of the delay time distribution, the BBH yield, and SFR.If we instead assumed that the metallicity evolution is known, we could use our measurement of dN BBH /dM SF (z form ) from Fig. 3 and apply Eq. 7 to infer the parameters governing the BBH yield as a function of metallicity, similar to the analysis in Fishbach & Kalogera (2021).In this study, the delay time distribution and BBH yield are inferred from population synthesis predictions, while we have used constraints from UV observations for the total SFR.Although each of these components is plagued by uncertainties, our inference of the metallicity evolution can be seen as a consistency check between these different model components.We find that our default delay time distribution (from the "All BBHs" prediction in Fig. 1), the medium yield assumption (dashed red line in Fig. 4) and the Katsianis et al. ( 2021) SFR imply a metallicity evolution that is consistent with external measurements in the literature (e.g.Madau & Fragos 2017).On the other hand, the high yield and low yield predictions, coupled with our other assumptions, require metallicities that are significantly different from the external measurements shown in Fig. 5 in order to match GW merger rates.

OLDEST BLACK HOLES IN GWTC-3
Within a hierarchical Bayesian population analysis, we simultaneously infer the population properties and update our inference of individual-event parameters (Fishbach et al. 2020;Galaudage et al. 2020;Miller et al. 2020;Essick & Fishbach 2021;Moore & Gerosa 2021).Therefore, our fit to the BBH merger rate in the previous sections, modeled in terms of the progenitor formation rate and delay time distribution, allows us to .Empirical distribution functions of the merger redshifts (gray) and formation redshifts (green and blue) of the 69 BBH events in GWTC-3 for fixed delay time distributions.The green filled band assumes a delay time distribution with power-law slope ατ = -1, and the blue, unfilled band assumes ατ = −0.35.The solid green and dashed blue lines denote the median EDF, and bands enclose 50% and 90% credibility.Both delay time distributions assume τmin = 10 Myr.The number of BBH events in GWTC-3 with a (merger or formation) redshift above a given value is denoted in the right-hand y-axis.
jointly infer the probable merger redshift, delay time, and formation redshift for each BBH event in GWTC-3.
Figure 6 shows the empirical distribution functions (EDFs) of the population-informed merger redshifts and formation redshifts of the 69 BBH events.To construct the EDFs, we draw one sample from each of the 69 population-informed redshift (z merge or z form ) posteriors and order the samples from smallest to largest.The EDF at the K-th ordered z sample takes the value K/69.Repeating this over 4000 draws, we calculate the median EDF of GWTC-3 redshifts and its 50% and 90% uncertainty bands.
The gray band in Fig. 6 shows the EDF of the merger redshifts z merge .We use the population-informed z merge posteriors inferred under our default population model (with α τ = −1), but the single-event z merge posteriors are similar across the different models we consider, because they are largely informed by the GW data.The EDF of the formation redshifts z form inferred under this default α τ = −1 population model is plotted in green (filled bands) in Fig. 6 form (zmerge) z 99% form (zmerge) z 99.9% form (zmerge) GWTC-3 Figure 7. Orange: Joint distribution of the merger and formation redshifts of detected BBH events at GWTC-3 sensitivity, inferred under the default population model (with ατ = −1 delay time distribution).We average over the population posterior uncertainty.Orange shading represents the probability density in a given (zmerge, z form ) hexagonal bin.For reference, top and right axes show corresponding lookback times.Green: At each zmerge, 90-, 99-, and 99.9percentiles of the conditional z form distribution (see Eq. 4).These represent the maximum formation redshift that we expect to probe given O(10), O(100) and O(1000) observations at a fixed zmerge.
α τ = −1 model, together with the inferred formation rate, which acts as a prior over the formation redshifts (see Eq. 4).Under the α τ = −0.35delay time distribution, events tends to experience longer delay times and therefore have higher formation redshifts, although the two EDFs overlap within statistical uncertainties due to the remaining uncertainty in the inferred formation histories.
We can see from Fig. 6 that although the observed BBH systems all merged at z merge ≲ 1 (lookback times t merge L ≲ 8 Gyr), some of them experienced long delays of several Gyr between formation and merger according to either delay time distribution.The maximum merger redshift in GWTC-3 is max(z merge ) = 1.1 +0.2 −0.2 (median and 90% credible interval).Nevertheless, for our default delay time distribution, more than 21 events in GWTC-3 formed earlier than z form > 1, 24 +26 −20 events formed earlier than z form > 2 and 13 +24 −12 events formed earlier than z form > 3 (with 90% credibility).Furthermore, at least one event in GWTC-3 formed earlier than z form > 4.4 (90% credibility), when the Universe had only formed ∼ 4% of its stellar mass (Katsianis et al. 2021).
To understand which merger redshifts in GWTC-3 are responsible for the highest formation redshifts (i.e. the oldest BHs), we plot the joint distribution of merger and formation redshifts in Fig. 7.The orange histogram shows the detected distribution of (z merge , z form ) according to our population inference under the default delay time model, with "detected" referring to BBH events observable at GWTC-3 sensitivity.We average over the population uncertainty.This population-averaged, detected distribution is often referred to as a posterior population distribution (PPD) in the GW literature (e.g.Abbott et al. 2019b).
Figure 7 shows that, thanks to the delay time distribution that peaks at τ min = 10 Myr, many BBH events experienced short delay times, leading to a region of high probability density at z form ≈ z merge in the detected distribution.However, there is a second region of high density centered at z merge ≈ 0.2, z form ≈ 2. This is caused by the fact that most observable BBH events merge at low redshifts z merge ≈ 0.2, but we infer that the formation rate is very small at similarly low z form . Therefore, many of these low-redshift mergers experienced long delay times and formed at much higher redshifts.This is broadly consistent with theoretical predictions for the formation redshifts of the observed, low-z merge BBH systems (Belczynski et al. 2016;Mapelli et al. 2018Mapelli et al. , 2019;;Boco et al. 2021).Our simplified "backward population synthesis" approach therefore provides a straightforward test of whether predictions for the delay time distribution (which we fixed here to our default α τ = −1, τ min = 10 Myr model) yield reasonable BBH formation redshifts.
Meanwhile, the green solid, dashed and dotted lines in Fig. 7 show the 0.9, 0.99, and 0.999 quantiles of the conditional probability distribution p(t form | t merge ) (given by Eq. 4), converting between lookback times to cosmological redshifts as necessary.With O(N ) events at a given merger redshift, we expect one of them to have formed at the N −1 N quantile of the z form distribution.Therefore, we expect to probe the 0.9, 0.99, and 0.999 quantiles with O(10), O(100) and O(1000) events, respectively.Among the 69 BBH events in GWTC-3, the maximum formation redshift is max(z form ) = 9.9 +9.1 −6.2 , which lies between the solid and dashed green lines.Interestingly, the largest z form in a GW sample is unlikely to correspond to the largest z merge .As the GW detectors' sensitivity improves, we will probe higher formation redshifts z form not directly because the detection horizon will increase to higher z merge , but because we will increase the BBH sample size, which allows us to probe higher quantiles of the z form distribution.The highest z form in the GW sample most likely corresponds to the z merge that hosts the majority of detections (z merge ≈ 0.2 for GWTC-3).
With thousands of observations at z merge < 1 (to be expected within the next couple of observing runs) we expect to probe progenitor formation past z form ≳ 15 according to our default delay time model and our inferred progenitor formation history (see the dotted green line in Fig. 7).This suggests that with enough observing time, the current generation of GW observatories will already be sensitive to the first star formation in the Universe.However, this prediction depends on the assumed delay time distribution.This corroborates the need for next-generation GW detectors, which will provide a direct measurement of the merger rate at high redshifts.

DISCUSSION ON SIMPLIFYING ASSUMPTIONS
In this work, we have made several simplifying assumptions that can be relaxed in future work.Here we briefly discuss the validity of these assumptions.
Mass and metallicity dependence of the delay time distribution -In Figure 8 we show the delay time distribution from van Son et al. (2022a) split between BBH systems formed at low (red), medium (yellow) and high (blue) metallicities.This shows a shift towards longer delay times for the highest metallicities.We suspect that this is caused by metallicity dependent winds, in particular during the Wolf-Rayet (WR) stage (e.g., Vink & de Koter 2005).When the system consists of a WR + BH (the final stage before BBH formation), stronger winds at higher metallicities lead to wider separations, and thus longer delay times (this was also concluded by e.g., Bavera et al. 2022;Riley et al. 2021).
The metallicity dependence will cause the delay time distribution to correlate with the formation redshift z form , so that systems forming at high z form likely experience shorter delays relative to systems forming at low z form .For a fixed BBH formation rate, this leads to a larger BBH merger rate at high redshifts compared to low redshifts, and therefore mimics the effect of changing the BBH formation rate.Neglecting this effect, we are likely overestimating the degree to which the BBH yield evolves with redshift.In other words, if we took into account the metallicity-dependence of the delay time distribution, we would infer a more gradual evolution of the BBH formation yield with redshift (Fig. 3), and correspondingly a weaker dependence of metallicity on redshift (Fig. 5).However, we expect the impact on our conclusions to be mild, because (a) the delay time distribution depends only weakly on metallicity over the relevant (low-metallicity) range for BBH formation, as seen by the small difference between the Z ≤ Z ⊙ /10 and Z ⊙ /10 < Z ≤ Z ⊙ /2 delay time distributions in Fig. 8; and (b) even neglecting the metallicity dependence of the delay time distribution, we infer relatively mild evolution of the average metallicity with redshift over the well-constrained range 0 < z form < 3 in Fig. 5.The delay time distribution could furthermore depend on BBH mass.In particular, van Son et al. (2022b) found that more massive BHs are formed exclusively by the stable mass transfer channel, while the common envelope channel dominates the formation of lower-mass systems.This result was confirmed by Belczynski et al. (2022) and Briel et al. (2023).Given the distinct delay time distributions of these two channels (Fig. 1), this implies a mass dependence of the delay time distribution.Currently, the contribution of each of these channels is an active area of research (Neijssel et al. 2019;Bavera et al. 2021;Marchant et al. 2021;Gallegos-Garcia et al. 2021).If lighter BBH systems tend to experience shorter delay times compared to the more massive systems, we expect a correlation between BBH mass and merger redshift, in which more massive BBH systems are more likely to merge at lower redshifts compared to lowmass BBH systems.With current GW data, there is no evidence for such a correlation, although it is not ruled out (Fishbach et al. 2021;van Son et al. 2022b;Abbott et al. 2023).Because the evolution of the mass distribution with redshift is degenerate with the evolution of the overall merger rate, if a negative correlation between BBH mass and merger redshift exists, we are likely underestimating the BBH merger rate at high redshifts, which is dominated by light systems that are harder to detect.We may therefore also be underestimating the progenitor formation rate at high redshifts, although we expect this effect to be subdominant to other uncer-tainties given that GWTC-3 does not clearly display a correlation between BBH mass and merger redshift.
In reality, the observed population of merging BBHs is most likely a mixture of multiple formation channels with a corresponding mixture of delay time distributions (e.g., Zevin et al. 2021;Bouffanais et al. 2021;Stevenson & Clarke 2022;Godfrey et al. 2023).Future work should simultaneously incorporate delay time distributions from several formation channels, including predictions from non-isolated binary evolution channels.The mass (or spin; e.g.Bavera et al. 2022) dependence of the delay time distribution might also provide a solution to differentiate between formation channels: if a formation channel predicts unique observable properties (such as massive and highly spinning black holes), then our analysis can be repeated for a subset of the observations that meets these criteria.For example, Fishbach & Fragione (2023) leveraged predicted delay time distributions from dynamical assembly of BBHs in globular clusters to infer the globular cluster formation history from the observed merger rate of systems with misaligned spins, because misaligned spins indicate a possible dynamical origin.
Flexible progenitor formation rate models -In this work, we used a simple three-parameter model for the progenitor formation rate, which follows a power-law in time at early times (high redshift) and exponentially decays at late times (low redshifts).Future work should allow for more flexible formation rate models, such as Gaussian processes (Vitale et al. 2019), splines (Edelman et al. 2023), or autoregressive processes (Callister & Farr 2023).In fact, there is already indication that our formation rate model provides a poor fit to the GWTC-3 data when we assume shallow delay time models (α τ = −0.7 or α τ = −0.35).This is consistent with the results of Fishbach & Kalogera (2021), who also found that short delay times were favored under the different models they considered for the BBH progenitor formation rate.However, because of the degeneracy between the delay time distribution and the progenitor formation rate, this may instead indicate that our formation rate parameterization fails to adequately fit the data.If the delay time distribution is indeed shallower than the default assumption of α τ = −1, the formation rate likely deviates from our simple parameterization at formation redshifts z form > 2.
Deviations from a log-normal metallicity distribution -When inferring the cosmic metallicity evolution in §4, we assumed a log-normal metallicity distribution at each redshift.However, there are observational and theoretical indications that the metallicity distribution may deviate from the log-normal approximation, featuring an extended low-metallicity tail that grows with redshift (see, e.g., the discussion in Chruślińska 2022).Depending on the BBH yield as a function of metallicity (see Fig. 4), we expect GW observations to be predominantly sensitive to the amount of low-metallicity star formation as a function of redshift.Therefore, current GW data cannot distinguish between a log-normal metallicity distribution that peaks at lower metallicities and a distribution with an extended low-metallicity tail that peaks at higher metallicities.Future work should fit for more flexible metallicity distributions (e.g.van Son et al. 2023) in order to account for this degeneracy in the location and shape of the metallicity distribution as a function of redshift.
Universality of the IMF -Our calculations have assumed a universal Kroupa (2001) IMF.However, there are many indications that the IMF varies with metallicity (see e.g., reviews by Kroupa et al. 2013;Hopkins 2018).The formation rate of BBH progenitors is unlikely to be strongly affected by variations in the high-redshift IMF within observational uncertainties, because the formation rate of high-mass stars, which is a combination of the SFR and the IMF, is better measured than the SFR or IMF independently (Klencki et al. 2018;Chruślińska et al. 2020).Nevertheless, a non-universal IMF may affect the delay time distribution and the BBH mass distribution.Our current model only considers the delay time distribution, the BBH yield, and the SFR derived under a fixed IMF.Adding the IMF to this equation also implies that we can in principle use a similar method to measure the IMF at high redshift, in particular the transition between the IMF of Pop III stars and the IMF today.

SUMMARY
The preceding sections have demonstrated that existing GW observations by the LVK are starting to reveal the conditions of BBH formation beyond cosmic noon.This argument stems from the fact that regardless of the precise formation scenario, some BBH systems are predicted to experience long delay times between the formation of their progenitor stars and their GW-driven merger.For this proof-of-principle study, we have adopted a few fixed delay time distributions motivated by predictions from isolated binary evolution.We then applied the assumed delay time distributions to GWTC-3 events to infer their progenitor formation redshifts, while simultaneously measuring the progenitor formation history out to z form ≳ 4 and implications for the cosmic metallicity evolution.Our main results are as follows: 1. Fixing the delay time distribution to our default distribution with power-law slope α τ = −1 and minimum delay of τ min = 10 Myr, we infer the BBH progenitor formation rate as a function of redshift z form . Assuming a star formation history model from Katsianis et al. (2021), we find that the number of BBH progenitor systems formed per stellar mass was likely higher in the past than today (for our default delay time model, the BBH yield was 6.4 +9.4 −5.5 × 10 −6 M ⊙ at z form = 4 compared to 0.134 +1.6  −0.127 × 10 −6 M ⊙ at z form = 0).
2. Combining our inferred BBH yield as a function of redshift with population synthesis predictions for the BBH yield as a function of metallicity, we measure the average metallicity as a function of redshift.For our default delay time, BBH yield, and star formation history models, we find that the mean log-metallicity ⟨log 10 (Z/Z ⊙ )⟩ today is 0.2 +0.2 −0.3 and was −0.3 +0.3 −0.4 at z form = 4.This is consistent with the metallicity evolution from Madau & Fragos (2017), highlighting that a simplified form of "backward population synthesis" (Andrews et al. 2018;Wong et al. 2022) can provide a powerful self-consistency check on the various components of BBH population modeling.
3. Our fit to the BBH population allows us to jointly infer the merger and formation redshifts of the 69 confident BBH events in GWTC-3.We find that GWTC-3 likely contains at least one BBH system that formed earlier than z form > 4.4.
The BBH merger rate is shaped by the SFR, IMF, BBH yield, and the delay time distribution.By fixing a few of these ingredients at a time and inferring the rest from the GW data, we can cross-check the various predictions of population synthesis with external measurements of the cosmic SFR and metallicity evolution.As the GW catalog grows and population synthesis simulations improve, the connection between these pieces will provide valuable insights into high-redshift and low-metallicity star formation, complementary to electromagnetic observations.

Figure 1 .
Figure1.Delay time distributions derived from all physics variations explored invan Son et al. (2022a).We show histograms of the delay times per 10 6 M⊙ of stellar mass formed, assuming a flat-in-log distribution of metallicities, and bin sizes ∆ log t delay = 0.2.Thick dashed lines show the median for each formation channel, orange thin lines show the power laws used in this work.
Figure6.Empirical distribution functions of the merger redshifts (gray) and formation redshifts (green and blue) of the 69 BBH events in GWTC-3 for fixed delay time distributions.The green filled band assumes a delay time distribution with power-law slope ατ = -1, and the blue, unfilled band assumes ατ = −0.35.The solid green and dashed blue lines denote the median EDF, and bands enclose 50% and 90% credibility.Both delay time distributions assume τmin = 10 Myr.The number of BBH events in GWTC-3 with a (merger or formation) redshift above a given value is denoted in the right-hand y-axis.
, while the formation redshifts inferred under the α τ = −0.35model is shown in blue (unfilled bands).The population-informed formation redshift inferred for each event depends on the assumed delay time distribution, which favors longer delay times for the α τ = −0.35model compared to the default

Figure 8 .
Figure 8. Similar to Figure 1, but binned by formation metallicity.This shows that the delay time distribution shifts towards longer delay times for high metallicities.