Unveiling the solution to the final-parsec problem by combining milli-Hertz gravitational-wave observation and AGN survey

Massive black hole binaries (MBHBs) could be the loudest gravitational-wave (GW) sources in milli-Hertz (mHz) GW band, but their dynamical evolution may stall when the black holes reach the innermost parsec of a galaxy. Such a"final-parsec problem"could be solved if MBHB forms in a gas-rich environment, such as an active galactic nucleus (AGN), but other solutions not involving AGNs also exist. Testing the correlation between these mHz GW sources and AGNs is difficult in real observation because AGNs are ubiquitous. To overcome this difficult, we use a statistical method, first designed to constrain the host galaxies of stellar-mass binary black holes, to search for a MBHB-AGN correlation in different astrophysical scenarios. We find that by detecting only one MBHB at $z \lesssim 0.5$, a mHz GW detector, such as the Laser Interferometer Space Antenna (LISA), can already distinguish different merger scenarios thanks to the precise localization of the source. Future detector networks and deeper AGNs surveys can further testify the MBHB-AGN correlation up to a redshift of $z\sim 2$ even if only a small fraction of MBHBs merge inside AGNs. These constraints will help settle the long-standing debate on the possible solutions to the final-parsec problem.


INTRODUCTION
So far gravitational waves (GWs) have been detected in two frequency bands.In the band of 1 − 10 3 Hz, GWs produced by merging stellar-mass black holes (BHs) or neutron stars have been detected by the ground-based detectors, such as Advanced LIGO and Virgo detectors (Abbott et al. 2016;Abbott et al. 2021).In the nano-Hertz band, a GW background is discerned recently by the pulsar timing arrays (Agazie et al. 2023;Antoniadis et al. 2023;Reardon et al. 2023;Xu et al. 2023).
However, a bottleneck exists.Soon after an MBHB shrinks to a size of about one parsec, it starts to eject the surrounding stars due to a slingshot effect (Quinlan 1996).Without an efficient way of replenishing stars to the vicinity of the binary, the simplest models predict that the evolution of the binary will stall (Begelman et al. 1980;Yu 2002).In this case, the number of MBHBs in the mHz GW band would be greatly diminished.Such an issue is called the "final-parsec problem" (Milosavljević & Merritt 2001).The problem can be partially mitigated (see Amaro-Seoane et al. 2023, for a review), by introducing more efficient stellar-relaxation mechanisms into the model (e.g.Yu 2002;Zhao et al. 2002;Merritt & Poon 2004;Berczik et al. 2006) or considering triple-MBH interaction (Blaes et al. 2002;Hoffman & Loeb 2007;Amaro-Seoane et al. 2010;Kulkarni & Loeb 2012;Bonetti et al. 2019) It is known that galaxy mergers often drive gas inflow towards the galactic centers.Subsequently, an accretion disk could form around an MBHB and convert the system into an active galactic nucleus (AGN) (Kauffmann & Haehnelt 2000).It has been speculated that the interaction with accretion disk could drive an MBHB to merger when stellar dynamics become insufficient (Begelman et al. 1980).Results from the early analytical studies (Gould & Rix 2000) as well as numerical simulations (Armitage & Natarajan 2002;Escala et al. 2005;Mayer et al. 2007;MacFadyen & Milosavljević 2008;Cuadra et al. 2009;Goicovic et al. 2016) seem to support this idea, but more recent hydrodynamical simulations cast some doubts (see Lai & Muñoz (2022) for a review).
If gas is the main driver of MBHB mergers, we could anticipate a spatial correlation between AGNs and the targets of mHz GW detectors (Haiman et al. 2009).The question is whether such a correlation can be observationally testified.Earlier studies show that LISA can localize an MBHB to a typical precision of ∼ 1 deg 2 in sky position and ∼ 1% in luminosity distance (Cutler 1998;Moore & Hellings 2002;Seto 2002;Cornish & Rubbo 2003;Vecchio 2004;Klein et al. 2016).Moreover, inside such a small error volume, luminous AGNs (e.g., brighter than 21 magnitude in B band) are scarce (Kocsis et al. 2006).Based on these results, it has been predicted that there is an one-to-one correlation between an MBHB merger and a luminous AGN, especially an AGN whose luminosity is varying with time, which is caused by the interaction between the MBHB and its surrounding accretion disk (Kocsis et al. 2007;Lang & Hughes 2008).The caveats of this prediction are that (i) AGNs, even without MBHBs, are intrinsically highly variable (e.g.Dotti et al. 2023), and (ii) in a typical error volume there are normally multiple less luminous AGNs which may or may not host MBHBs (Haiman et al. 2009).
Such a difficult motivates us to design a statistical test of the MBHB-AGN correlation.Several recent progress made such a statistical study feasible.(i) For LIGO/Virgo events, a statistical method has been proposed recently to test their correlation with AGNs ( Bartos et al. 2017).The efficacy of this method relies on the precision of sky localization, which is poor with the current ground-based detectors (Veronesi et al. 2022(Veronesi et al. , 2023)).But mHz GW detectors can greatly improve it.(ii) A network of space detectors, like LISA+TJ (Ruan et al. 2020) or LISA+TQ (Wang et al. 2019;Zhu et al. 2022), will further improve the sky localization by orders of magnitude (Wang et al. 2019;Ruan et al. 2020;Gong et al. 2021;Zhu et al. 2022;Shuman & Cornish 2022).(iii) A series of planned surveys, such as Euclid (Scaramella et al. 2022), Chinese Space Station Tele-scope (CSST) (Gong et al. 2019), Roman (Akeson et al. 2019), and Vera C. Rubin Observatory Legacy Survey of Space and Time (LSST) (Ivezić et al. 2019), are aiming at providing a complete catalog of galaxies and AGNs.
In the following, we will investigate the feasibility of using such a statistical method to reveal the role of gas in solving the final-parsec problem.Throughout this work, we choose the cosmological parameters H 0 = 70 km s −1 Mpc −1 , Ω M = 0.3, Ω Λ = 0.7.

MOCK DATA AND METHOD
The null hypothesis of our test is that MBHB mergers can happen in either normal galaxies or AGNs.Effectively, it means that the probability that a LISA MBHB coincides with an AGN is proportional to the fraction of AGNs among all types of galaxies.Our alternative hypothesis is that MBHBs merge predominantly in AGNs.In this case, we assume that all LISA MBHBs spatially coincide with AGNs.

Mock data
We first generate mock samples of MBHB mergers following Klein et al. (2016).In particular, we adopt three types of population models, namely, a light-seed model-popIII, and two heavy-seed models-Q3d and Q3nod (Klein et al. 2016).In all models BH seeds form at a redshfit range of z ∼ 15 − 20, but the formation mechanism of the seeds and the later merger history of the BHs are different.In model popIII, BH seeds have an initial mass of ∼ 10 2 M ⊙ , and they are produced by the remnants of population-III stars.In Q3d and Q3nod, BHs form due to the collapse of protogalactic disks and the initial masses are ∼ 10 5 M ⊙ .The difference between the latter two models is that Q3d accounts for the time delay between MBH mergers and galaxy mergers, while Q3nod assumes that MBHs coalescence as soon as galaxies merge.
We then use the IMRPhenomPv2 waveform model (Hannam et al. 2014;Schmidt et al. 2015) to simulate the GW signals of MBHBs.Our AGN catalog is compiled from cross-correlating the Milliquas catalog (Flesch 2021) with the Sloan Digital Sky Survey (SDSS) (Lyke et al. 2020).In this way, the SDSS provides us with the masses of the MBHs in our AGN catalog (Wu & Shen 2022).We use these masses to find candidate host AGNs of an MBHB generated by our mock population.Another criterion for finding a matching host AGN is that their spatial positions are consistent at 3σ confidence level.
As for mHz GW detectors, we consider three configurations, namely, LISA, LISA+TQ network, and LISA+TJ network.The detected MBHB catalog are generated according to Refs.(Klein et al. 2016;Ruan et al. 2020;Zhu et al. 2022).As a result, the total error in luminosity distance is typically ∆D L /D L ∼ 3%.The main contribution to ∆D L comes from weak lensing (Hirata et al. 2010;Cusin & Tamanini 2021) and peculiar velocities of the hosts (Kocsis et al. 2006), while detector configuration and MBHB population model play an insignificant role (Zhu et al. 2022).The latter two factors are more important to the error in sky localization, the effect can be seen in Figure 1.We find that with LISA alone, the median value of sky localization error varies between 1 and 100 deg 2 , depending on the population model.With LISA+TQ or LISA+TJ networks, the error can shrink by one to two orders of magnitude.

Analytical method
To statistically test our null and alternative hypotheses, we modify the method proposed in (Bartos et al. 2017) (also see Braun et al. (2008)) which was originally designed to test the association of LIGO/Virgo sources with AGNs.Here we define the spatial number density of AGN as ρ agn , and the error volume of the spatial localization of the ith detected MBHB as ∆V i .In our null hypothesis, the actual number of AGNs, N agn,i , within the error volume will follow a Poisson distribution with an expectation of λ i = ρ agn ∆V i .Mathematically, we can write the distribution of N agn,i as (1) In the alternative hypothesis, there is one guaranteed host AGN in ∆V i .Therefore, the number of the rest non-host AGNs is N agn,i − 1, and it follows the Poisson distribution with the expectation λ i .In this case we can write the distribution function as In reality, three factors could complicate the statistical probability distribution of N agn,i .(i) The number of AGNs within a typical spatial localization error volume of MBHB could be much larger than unity.In this case B i (N agn,i ) and S i (N agn,i ) become indistinguishable.(ii) It is possible that only a fraction f agn of MBHBs are driven to coalescence by AGNs.(iii) The host AGN of an MBHB may be missed by an astronomical survey if it is fainter than the limiting magnitude m * limit of the survey.This will affect our counting of N agn,i .Taking all three factors into account, we write the likelihood that a fraction of f agn of MBHBs are driven by AGNs as In the last equation, f compl,i refers to the completeness of an AGN catalog, i.e., the probability that the host AGN of the ith detected MBHB is included in the catalog.Its value varies from source to source because it is a function of the luminosity of the host AGN and the luminosity depends on the total mass of the MBHB.That is why we kept it in the last equation instead of absorbing it into the factor f agn (e.g., compare with Eq. (4) of Ref. (Bartos et al. 2017)).We calculate the completeness with where m * limit is the limiting magnitude (or depth) of a particular AGN survey and δ D is a Dirac delta function.Inside the Dirac function, m * (λ Edd , M i , D L,i ) is the apparent magnitude of the host AGN which depends on the Eddington ratio λ Edd of the AGN, the total mass M i of the MBHB, and its luminosity distance D L,i .We calculate the value of the apparent magnitude according to where M * ⊙ = +4.8mag and L ⊙ = 3.8 × 10 33 erg s −1 are the absolute magnitude and luminosity of the Sun, LEdd (M i ) = 1.3×10 38 (M i /M ⊙ ) erg s −1 is the Eddington luminosity of the host AGN of the ith MBHB, and BC is a bolometric correction (explained below).Moreover, p 0 (λ Edd ) is the distribution function of the Eddington ratio and α i is a normalization factor derived from integrating the numerator from negative infinity to positive infinity.In the following, we use a lognormal distribution of N [−1, 0.8] for p 0 (λ Edd ) (Wu & Shen 2022) and a bolometric correction of BC = 1.8 (Shen et al. 2020;Duras et al. 2020) to estimate the apparent magnitude of the host AGN of each MBHB.We use m * limit = +23 mag to calculate the completeness of the Milliquas catalog (Flesch 2021;Lyke et al. 2020).
To statistically constrain f agn in Eq. ( 3), we introduce the likelihood ratio defined in Ref. (Bartos et al. 2017).In our null hypothesis, λ follows a "background distribution" P bg (λ), the form of which has to be determined by Monte-Carlo simulations.What is important is that the distribution of λ will deviate from P bg (λ) in our alternative hypothesis.We evaluate the deviation using the p-value of MBHBs.
(i) If the p-value becomes less than 0.00135, we can reject the null hypothesis with a significance of 3σ and claim that a fraction of f agn of MBHBs are indeed merging in AGNs.(ii) Otherwise, we do not have sufficient evidence to support the alternative hypothesis to address the final-parsec problem.Since the statistical significance depends on the number of detected MBHBs, we denote N threshold GW,3σ as the minimum number of detected MBHBs that is required to reject the null hypothesis at a 3σ significance.

Power of LISA in testing the MBHB-AGN correlation
Figure 2 shows the dependence of N threshold GW,3σ on the mass and redshift of the source.Here we have assumed that only one detector, i.e., LISA is carrying out the observation.We find that in general N threshold GW,3σ is smaller for MBHBs at lower redshifts.The reason is that the error volume for localization shrinks with smaller redshift, so that the number of AGNs (N agn,i ) inside each error volume decreases.When N agn,i is small, its probability distribution (see Eqs. (1) and ( 2)) differs more significantly in the null hypothesis and the alternative one.We also find that N threshold GW,3σ peaks around a total BH mass of 10 7 M ⊙ but the signal-to-noise ratio (SNR) peaks at about 3 × 10 6 M ⊙ .The difference is caused by the incompleteness of the AGN catalog for low-mass GW,3σ , i.e., the minimum number of MBHBs that is needed for a single LISA to reject the null hypothesis, on the mass and redshift of the source.We assume m1 = m2 for the source and set fagn = 1 for our alternative hypothesis.The red contour line represents the average SNR in LISA.
MBHs, since MBHs with smaller masses are fainter on average.
Interestingly, Figure 2 shows that at z ≲ 0.5, even one detection of MBHB suffices to testify the existence of the MBHB-AGN correlation.This powerful constraint owes to the high spatial localization precision of LISA.In such low-redshift, the average number of AGNs inside a typical error volume of MBHB becomes much less than 1 and the the AGN catalog becomes complete.

Benefit of a long observing time and detector networks
Although we found that detecting one MBHB at z ≲ 0.5 by LISA would be sufficient to test the MBHB-AGN correlation, MBHB mergers are rare at low redshifts according to the current population models (Klein et al. 2016).For example, the models used in our work, namely, popIII, Q3d, and Q3nod, predict that about 0.13, 0.09 and 0.47 MBHB mergers would happen at z ≲ 0.5 per year.About a third of the mergers are suitable for our test because AGN surveys normally cover less than half of the sky.Given the above theoretical and observational constraints, a long observing time is preferred for space detectors to distinguish the null hypothesis from our alternative one.
Figure 3 shows the dependence of the required observing time on various factors.Here we have included detectable MBHBs from all redshifts.Our findings are as follows.
(i) The observing time is the shortest in the Q3nod model (red curves), because the event rate of MBHB mergers is the highest.In this model, even if we consider only one detector (LISA, see red solid line), an 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 observational period of 4 − 6 years can already put a 3σ constraint on f agn as long as f agn is greater than 0.6 − 0.8.In the other two models (black and blue solid lines), the required observing time would be longer than 6 years even if we assume f agn = 1.
(ii) The required observing time shortens as the value of f agn increases.Such a dependence is correlated with the behavior of the likelihood ratio λ (defined in Eq. ( 6)), which deviates more significantly from the background distribution P bg (λ) as f agn increases.
(iii) A network of detectors can significantly shorten the observing time.This gain in constraining power is caused by an increase of the precision of sky localization, which reduces the number of AGNs in the error volume.Consequently, given a fixed observing time, we can constrain an even smaller f agn .Take the Q3nod model for example (see the red lines), a 6-year observation by the LISA+TQ (LISA+TJ) network can constrain f agn to a value as small as 0.5 (0.3).

Importance of deeper AGN surveys
In the 2030s when space GW detectors are operating, it is likely that we will have more complete AGN catalogs provided by deeper surveys.A deeper AGN catalog could increase the completeness f compl,i in Eq. (3), which could in turn improve the statistical significance of the MBHB-AGN correlation if such a correlation indeed exists.
To investigate the effect of a deeper AGN survey, we show in Figure 4 the success rate of detecting a MBHB-AGN correlation when f agn = 1 (upper panel) as well as the minimum detectable f agn (lower panel).Each data  Success rate (upper panel) and minimum detectable fagn (bottom panel) as functions of the limiting magnitude of an AGN survey.Here we consider the LISA+TJ network and 6 years of observation.Different colors of the curves refer to different population models.The vertical lines show the limiting magnitude of several representative telescopes, including SDSS (York et al. 2000), Euclid (Scaramella et al. 2022), CSST (Gong et al. 2019), and LSST/Roman (Ivezić et al. 2019;Akeson et al. 2019).
point on a curve is derived from 100 trial simulations.The simulations are based on the LISA+TJ network and an observing time of 6 years.The population models for MBHBs are the same as before, but the AGN catalog is now generated according to an empirical AGN luminosity function Φ(M * , z) (Hopkins et al. 2007), where M * is the absolute magnitude of AGN.Here we have include the evolution of the luminosity function with redshift.Using the luminosity function, we calculate the expected number of AGNs inside an error volume ∆V i with where the upper limit of the integral M * limit,i is calculated according to the limiting magnitude m * limit,i of the chosen telescope and the cosmological redshift of the ith MBHB in the trial.In addition, we assume that only one third of the MBHBs are included in the analyses, in accordance with the sky coverage of the future surveys (Scaramella et al. 2022;Gong et al. 2019).
Figure 4 confirms that a deeper AGN survey can indeed help testify whether MBHB mergers are correlated with AGNs.First, the success rate increases with the limiting magnitude (see the upper panel), except for the Q3nod model in which case the success rate is already high (above 90%) even with a shallow survey.Second, as an AGN survey gets deeper, the minimum detectable f agn also becomes smaller (see lower panel).Take CSST for example, the chance of success is about 78%, 92%, and ∼ 100% in the three population models, and the minimum detectable f agn can reach about 0.73, 0.6, and 0.25.In the previous calculation of the luminosity of an AGN, we have relied on the total mass of the MBHB and the observed statistical distribution of the Eddington ratio (see Eq. 4).However, previous works found that for an AGN hosting a MBHB, the luminosity and its variability also depends on the mass ratio of the two BHs, since mass ratio affects the structure of the accretion disk (Farris et al. 2014;Muñoz et al. 2020;Liao et al. 2023).Unfortunately, the relationship between the accretion rate and mass ratio is not well understood, either theoretically or observationally (see Lai & Muñoz 2022, for a review).Future works are needed to improve this model component.
The mass ratio of MBHB also affects the sky localization.Figure 5 shows the error of sky localization as a function of the chirp mass and mass ratio of the MBHB.Here we are considering LISA observation alone.We can see that more unequal MBHBs have larger skylocalization errors.This result indicates that unequal MBHBs in general place weak constraints on the MBHB-AGN correlation.
Nevertheless, MBHBs with extreme mass ratios are difficult to form because of the long dynamical friction timescales associated with unequal galaxy mergers (Volonteri et al. 2003;Sesana et al. 2004;Klein et al. 2016).For this reason, the majority of the MBHBs in our mock samples have nearly equal masses.

Possible electro-magnetic counterparts to MBHB mergers
We would like to point out two caveats regarding the methodology presented in this work.First, we have assumed that no electro-magnetic (EM) counterparts to MBHBs are observed.In reality, MBH mergers could be accompanied by various EM signatures (see D'Orazio & Charisi 2023 and Amaro-Seoane et al. 2023 for reviews), such as a shift of broad emission lines (e.g.Nguyen et al. 2019Nguyen et al. , 2020;;Songsheng et al. 2019;Kelley 2021;Songsheng & Wang 2023), periodic photometric variation (e.g.Duffell et al. 2020;Gutiérrez et al. 2022;Charisi et al. 2022;Wang et al. 2023), or peculiar jet morpholo- gies (e.g.Begelman et al. 1980;Kulkarni & Loeb 2016;Bright & Paschalidis 2023).These features can help us identify the host galaxies of MBHBs and more accurately constrain the fraction of the MBH mergers inside AGNs.For these reasons, our constraint on the MBHB-AGN correlation can be considered as a conservative one.
Second, when testing our alternative hypothesis, we implicitly assumed that MBHB mergers and AGNs coincide in time.However, because the phase dominated by gravitational radiation can last millions of years (Begelman et al. 1980), the final merger, as well as the detection of the source, could be delayed with respect to the AGN phase.This delay may obscure the correlation between MBHB mergers and AGNs, but such merger events could correlate more tightly with post-starburst galaxies, which are considered to be the immediate descendants of AGNs (Yesuf et al. 2014).

CONCLUSION
In this work, we have shown the power of combing mHz GW observations and AGN surveys to constrain the main driver of MBHB mergers.Our main findings are as follows.(i) Thanks to the high precision of sky localization of LISA, detecting only one MBHB merger at z ≲ 0.5 is already sufficient to prove whether or not all MBHB mergers are driven by AGNs.(ii) However, realistic population models predict that MBHB mergers are rare at low redshifts.Therefore, only in optimistic conditions, e.g., in the Q3nod model, can LISA constrain the fraction of MBHB mergers inside AGNs.
But the strict preconditions could be relaxed if a network of detectors, such as LISA+TQ and LISA+TJ, are considered.(iii) Future deeper AGN surveys can significantly improve the constraint and detect a correlation between MBHBs and AGNs even when the fraction of MBHB mergers inside AGNs is as small as 20% (e.g., using LSST or Roman).Since AGNs constitute about (1 − 10)% of galaxies, a fraction of f agn ≃ 0.2 would indicate that MBHB mergers are closely correlated with AGNs.
Finally, we would like to point out two reminders.First, we only consider circular orbits for MBHBs because earlier works show that orbital eccentricity will be damped by gravitational radiation (Peters & Mathews 1963;Peters 1964).This assumption may not hold according to more resent simulations (Cuadra et al. 2009;Gualandris et al. 2022;Liao et al. 2023).Taking orbital eccentricities into account can further enhance the constraining power of our method because eccentrici-ties excite higher GW modes, which can be used to improve spatial localization (Mikóczi et al. 2012;Yang et al. 2022).Second, we note that the same statistical framework can be applied to another kind of mHz GW sources, e.g., the extreme-mass-ratio inspirals, to test their possible connection with AGNs (Pan et al. 2021).

Figure 1 .
Figure 1.Distribution of sky localization errors for the detectable MBHBs.Different colors indicate different detector configurations, including LISA (gray), LISA+TQ (blue), and LISA+TJ (purple).The dotted, dashed, and solid curves refer to, respectively, the popIII, Q3d, and Q3nod models.Therefore, the shaded areas indicate the uncertainties due to our poor knowledge of the MBHB population.

Figure 2 .
Figure 2. Dependence of N thresholdGW,3σ , i.e., the minimum number of MBHBs that is needed for a single LISA to reject the null hypothesis, on the mass and redshift of the source.We assume m1 = m2 for the source and set fagn = 1 for our alternative hypothesis.The red contour line represents the average SNR in LISA.

Figure 3 .
Figure 3. Detector observing time required to testify the MBHB-AGN correlation with a 3σ significance.Different colors refer to different population models.The solid, dashed, and dotted lines are derived for, respectively, a single LISA, LISA+TQ, and LISA+TJ.The error bars correspond to a 1σ confidence interval (CI).
Figure 4.Success rate (upper panel) and minimum detectable fagn (bottom panel) as functions of the limiting magnitude of an AGN survey.Here we consider the LISA+TJ network and 6 years of observation.Different colors of the curves refer to different population models.The vertical lines show the limiting magnitude of several representative telescopes, including SDSS(York et al. 2000), Euclid(Scaramella et al. 2022), CSST(Gong et al. 2019), and LSST/Roman(Ivezić et al. 2019;Akeson et al. 2019).
Effects of the MBHB mass ratio on AGN accretion and GW detection

Figure 5 .
Figure 5. Dependence of the sky localization error (1σ CI) of LISA on the mass ratio and chirp mass of the MBHBs at z = 1.The black dashed lines represent the results of equal-mass binaries with different total masses.