Impact of Overlapping Signals on Parameterized Post-Newtonian Coefficients in Tests of Gravity

Gravitational waves have been instrumental in providing deep insights into the nature of gravity. Next-generation detectors, such as the Einstein Telescope, are predicted to have a higher detection rate given the increased sensitivity and lower cutoff frequency. However, this increased sensitivity raises challenges concerning parameter estimation due to the foreseeable overlap of signals from multiple sources. Overlapping signals (OSs), if not properly identified, may introduce biases in estimating post-Newtonian (PN) coefficients in parameterized tests of general relativity (GR). We investigate how OSs affect −1PN to 2PN terms in parameterized GR tests, examining their potential to falsely suggest GR deviations. We estimate the prevalence of such misleading signals in next-generation detectors, and their collective influence on GR tests. We compare the effects of OSs on coefficients at different PN orders, concluding that overall the 1PN coefficient suffers the most. Our findings also reveal that while a nonnegligible portion of OSs exhibit biases in PN coefficients that might individually prefer to conclude deviations from GR, collectively, the direction to deviate is random and a statistical combination will still be in favor of GR.

1. INTRODUCTION Gravitational waves (GWs), predicted by general relativity (GR), have provided essential tests of gravity since their first direct detection by the LIGO/Virgo Collaboration in 2015 (Abbott et al. 2016).Current detected GWs are all generated from compact binary coalescences, such as the mergers of binary black holes (BBHs) or neutron stars, and carry information about the sources and the nature of gravity itself.To date, observatories including LIGO, Virgo, and KAGRA have identified approximately 90 GW events, offering us some of the most profound insights into GR and significantly enhancing our grasp of gravitational dynamics in the strong-field regime (Abbott et al. 2017a(Abbott et al. ,b, 2019a(Abbott et al. , 2023a)).Various tests of GR were also performed with current detections with no conclusive sign of GR being falsified (Abbott et al. 2019b(Abbott et al. ,c, 2021a,b;,b;Ghosh 2022).
Next generation (XG) detectors, for example, the Einstein Telescope (ET; Punturo et al. 2010;Hild et al. 2011;Sathyaprakash et al. 2012) and the Cosmic Explorer (CE; Reitze et al. 2019a,b), are expected to receive far more signals than current detectors do due to the largely increased sensitivity and a lower cut-off frequency (Sathyaprakash et al. 2019;Kalogera et al. 2021).In particular, ET, whose sensitivity is expected to be better by a factor of 10 than current-generation ground based detectors, has been shown to detect overlapping signals (OSs) in the mock data challenges (Regimbau et al. 2012;Meacher et al. 2016).
While the enhanced sensitivity of detectors is a significant achievement, it introduces a new challenge in the parameter estimation (PE) process that several studies have explored, notably in dealing with the potential overlap of signals from multiple GW sources (Samajdar et al. 2021;Pizzati et al. 2022;Relton & Raymond 2021;Regimbau & Hughes 2009).Such OSs can complicate the PE process, introducing biases across parameters (Antonelli et al. 2021;Janquart et al. 2022;Wang et al. 2024).In this work we investigate if misidentifying OSs can distort PE on parameters in the post-Newtonian (PN) waveform (Yunes & Pretorius 2009), suggesting deviations from GR.In the case where a weaker secondary signal overlaps with a primary signal, erroneously classifying this as a single, pure signal can lead to inaccuracies.It leads us to three crucial questions: • Is it possible for OSs, if inaccurately perceived as from a singular source, to misguide us into concluding a deviation from GR?
• Are such deceptive OSs frequent enough to pose a genuine concern for XG detectors?
• If a substantial number of such signals are identified, do they, as a group, indicate a deviation from GR?
There exist abundant studies on tests of parameterized PN coefficients for GR in general (Arun et al. 2006;Mishra et al. 2010;Cornish et al. 2011;Abbott et al. 2021b;Wang et al. 2022).For OSs, Hu & Veitch (2023) have shown that they may lead to incorrect measurement of deviation from the 0PN term in the GR waveform.However, a comprehensive assessment of the effects of OSs on PN coefficients is lacking.Our work aims to explore the aforementioned questions related to OSs regarding false-alarmed deviations in various PN coefficients, including −1PN, 0PN, 0.5PN, 1PN, 1.5PN, 2PN coefficients, and offers a comparative analysis of how each PN coefficient responds to misidentified OSs.We anticipate not only a potential for misleading deviations across all PN terms but also that some PN terms might be more adversely affected than others when leaving out the secondary signals in the OSs.
Our results show that there exist OSs resulting in large bias in multiple PN coefficients, and notably, these misleading signals represent a non-negligible portion of all overlapping instances to be registered by XG detectors.We also find that the relative size of the fractions of misleading OSs at different PN orders depends on how we define OSs.If only two GWs with very small mergertime difference can be identified as OSs, the fraction of misleading OSs at higher PN orders is larger.But generally, the impact of overlapping on the 1PN term is most significant among −1PN to 2PN coefficients, that is, GR-violation effects are most likely to be observed at 1PN when misidentifying an OS as one single GW signal.Fortunately, these signals only prefer models with deviation from GR individually, but not collectively, manifested in the rapid decrease of Bayes factor with increased occurrences of misleading signals.This aligns with the result in Hu & Veitch (2023), which considered the deviation from the 0PN term when OSs exist.
This paper is organized as follows.We introduce PE methods used in this work in Sec. 2, including how we calculate the Fisher matrix, biases, and Bayes factor for OSs.In Sec. 3 we list our simulation setup for both the waveforms and the population model for BBHs.Our results are given in Sec. 4, and Sec. 5 concludes the work.

PARAMETER ESTIMATION METHODS
To determine the parameters θ of the GW signal from data g(t) received by GW detectors, one can calculate the posterior distribution of the parameters using Bayes' theorem, P (θ|g(t)) ∝ P (θ)P (g|θ), where P (θ) is the prior of the parameters θ given assumed background knowledge, and P (g|θ) is the conditional probability of receiving data g(t) given parameters θ.The data g(t) consist of the GW signal h(t) and noise n(t).Assuming the noise is stationary and Gaussian, the likelihood reads (Finn 1992) where the inner product (u, v) is defined as where S n is the power spectral density of the noise, u(f ) and v(f ) are the Fourier transforms of u(t) and v(t), respectively.
In principle, given the GW data, one can use some sampling techniques, such as the Markov-Chain Monte Carlo (MCMC) method (Christensen & Meyer 1998;Christensen et al. 2004;Sharma 2017) and Nested Sampling (Skilling 2004(Skilling , 2006)), to obtain the posterior distributions of θ.However, the full Bayesian inference takes considerable computational time and resources, especially when the dimension of parameter space is high.In this work, we adopt the Fisher matrix (FM) as a fast approximation of the full Bayesian inference, which has been successfully applied to conducting PE for numerous OSs in previous works (Antonelli et al. 2021;Hu & Veitch 2023;Wang et al. 2024).Mathematically, the FM is defined as (Cutler & Flanagan 1994;Vallisneri 2008) where h ,α ≡ ∂h/∂θ α .Under the high signal-to-noise ratio (SNR) limit, or equivalently the linear signal approximation (LSA), the posterior can be well approximated by a Gaussian distribution (Finn 1992;Vallisneri 2008).
Denoting the inverse of the FM as Σ ≡ F −1 and assuming flat priors, the standard deviations and correlation coefficients of parameters are given by (no summation over repeated indices) Considering a superposition of two independent GW signals in the detector, the data can be written as where θ(i) represents the true parameters of the i-th signal.We consider the situation where an OS is mistakenly categorized as a signal containing a single event, and therefore only one-signal model h(t; θ) is used in the PE.Usually, the louder signal can be identified, while the weaker one induces biases in the PE of the louder signal (Pizzati et al. 2022).Without loss of generality, we assume that signal 1 is louder than signal 2.
When conducting PE, it is more convenient to consider the ratio of the bias relative to the statistical uncertainty of the parameters, that is, the reduced bias (Wang et al. 2024) where ∆θ α bias = Σ αβ h (1) ,β , h (2) refers to the absolute bias.Note that the index α is not summed.In this formula, we have assumed LSA, meaning that the SNR of signal 1 is large enough.
In PE, when |B α | exceeds 1, the bias is significant.When θ α is chosen as a PN-deviation parameter in the parameterized PN waveform test, we call the OSs with |B α | > 1 as "misleading OSs", which means that this kind of OSs may lead to a false conclusion of a deviation from GR.
Bayes factor is another useful indicator for assessing whether the data show a stronger preference for GR or for a deviation from GR.The Bayes factor K M1,M0 is defined as the ratio of evidences, where D stands for data, and M 1 , M 0 are two competing models.Bayes factor compares how well each model predicts the data.Roughly speaking, when K M1,M0 tends to an extreme large/small value, the data favor/disfavor M 1 over M 0 .Otherwise, the preference for M 1 and M 0 is comparable.In our calculation, D is the GW signal g(t), M 1 represents a model allowing the waveform deviating from GR at −1, 0, 1 and 2 PN orders, respectively denoted as M −1PN , M 0PN , M 1PN and M 2PN .The model M 0 represents GR.As an example, assuming flat priors and LSA, the Bayes factor of M −1PN against M 0 is given by (also see Eq. ( 13) in Hu & Veitch 2023) where p δϕ −1 is the prior probability for the −1PN deviation parameter δϕ −1 .We incline to consider p δϕ −1 as a constant.The residual chi-square, χ2 Mi , is defined as ,α ∆θ α bias,M i , (10) where ∆θ α bias,M i is the absolute bias of parameter α under hypothesis M i .

SIMULATION SETUP
We generate non-spinning, stellar-mass BBH signals using the IMRPhenomD waveform model.Similarly to Wang et al. (2024), we average the waveform over the sky position, inclination and polarization angles.This corresponds to a conservative scenario, where two GW signals come from the same direction with the same inclination angle and polarization angle.Using the angleaveraged waveform will affect the fraction of misleading OSs.However, here we are more interested in comparing the fractions of misleading OSs for different PN deviations, which are hardly affected by the angle-averaging.
When generating OSs, we use the GR waveform for both signals 1 and 2, and the free parameters for each signal are the chirp mass M in the detector frame, the symmetry mass ratio η, the luminosity distance d L , and the time of coalescence t c .When we recover with one single GR waveform, the estimated parameters are simply {M, η, d L , t c } of the primary signal.
For the GR-deviation model at the i-th PN order, we add a deviation parameter δϕ i , which leads to a phase correction in the Fourier-domain waveform (Abbott et al. 2021b) where u ≡ (πM f ) 1/3 , and M is the redshifted total mass.We only consider the modification in the inspiral stage, which cuts off when the frequency is twice the innermost stable circular orbit frequency.Usually, the deviation parameters at different PN orders are studied separately, assuming only one PN deviation parameter is nonzero, which is more effective to pick up possible deviations at different PN orders (Abbott et al. 2021b).Therefore, for the i-th PN deviation model, only 5 parameters, {M, η, d L , t c , δϕ i }, are estimated.
As for the detector, we choose the ET as an example, which is expected to detect thousands of OSs per year (Pizzati et al. 2022).We will use the designed ET-D sensitivity curve (Punturo et al. 2010;Hild et al. 2011;Maggiore et al. 2020).Now we introduce the population model.When generating OSs, we inject signal 1 and signal 2 with GR waveform, which requires the binary masses, the luminosity distance and the time of coalescence of the two signals.
We also request the SNR of signal 1 to be larger than 30 to ensure the validity of FM approximation.For the secondary signal, which is randomly drawn from population samples, we require its SNR to be smaller than that of the primary.Our treatment here is somehow arbitrary, but reasonable.More sophisticated treatments might be adopted in future studies.
The BBH mass population is generated according to the POWER+PEAK phenomenological model given in Abbott et al. (2019dAbbott et al. ( , 2023b)).For the distribution of the luminosity distance, we adopt the model given in Regimbau et al. (2017), which gives the merger rate in the detector frame, where V is the comoving volume.R m (z) is the merger rate per comoving volume at redshift z,  (Ando 2004;Belczynski et al. 2006;Nakar 2007;O'Shaughnessy et al. 2008;Dominik et al. 2012Dominik et al. , 2013)).The star formation rate R f (z) is given by where ν = 0.146 M ⊙ yr −1 Mpc −3 , a = 2.80, b = 2.46 and z p = 1.72 (Vangioni et al. 2015;Behroozi & Silk 2015).
The population model of t c needs more discussion.On the one hand, both the biases and the Bayes factor only depend on the difference between t (1) c and t (2) c , denoted as ∆t c , so we only need to consider the distribution of ∆t c .On the other hand, in this work we are interested in the fraction of misleading OSs in all OSs, but there is no unified criterion for OSs (Wang et al. 2024).Usually, this is done by setting a threshold ∆t c,th for ∆t c between two signals, which ranges from 0.1 s to several seconds in the previous works (Pizzati et al. 2022;Relton & Raymond 2021;Himemoto et al. 2021).To avoid the ambiguity of choosing ∆t c,th , we vary ∆t c,th in a wide range, from 0.1 s to O(10) s and generate the corresponding t c samples for each threshold.Consider an observation with a duration of T , and two GW events that are equally likely to occur at any time in this observation, i.e., both t (1) c and t (2) c obey a uniform distribution in the time interval (0, T ).We find that the conditional distribution of ∆t c given a threshold, P t c | ≤ ∆t c,th , can be well approximated by a uniform distribution.

RESULTS
As an illustration, we consider an OS with the following parameters, In Figure 1 we plot the change of the reduced bias with respect to ∆t c for this OS.It answers our first question in Sec. 1, namely, it is possible that, under the effects of a weaker signal 2, some PN terms of signal 1 possess a reduced bias larger than 1.It is also worth attention that large biases of −1PN, 0PN, 0.5PN, 1PN (and less significantly 1.5PN) coefficients are obtained around ∆t c = 1 s, while the bias of 2PN coefficient has peaked at a much smaller ∆t c around zero.This can be explained by the fact that the higher order PN terms are mainly prominent in the late inspiral phase of the signal, and we expect the deviation from a higher order PN term to be more significant when ∆t c between two signals is small.
Simply highlighting a single instance with parameters in Eq. ( 15) is not sufficient to underscore the risks posed by all OSs, as such instances may be too infrequent to a warrant general concern.To answer the second question in Sec. 1, we generate populations for different ∆t c,th and calculate the biases for every PN-deviation waveform under consideration.Afterwards, we calculated the fraction of misleading OSs in all OSs in each scenario, where the event numbers are large enough to ignore the statistical fluctuations.The findings are presented in Figure 2. The observation is that the fractions of misleading OSs, defined as |B δϕ | > 1, are around ten percent for populations with small ∆t c,th .Such fractions are not negligible.All fractions then decrease as ∆t c,th increases, because most signals with strong overlapping features happen at small ∆t c .When ∆t c,th is large, e.g., larger than 10 s, all fractions are inversely proportional to ∆t c,th .This means that almost all misleading OSs have ∆t c ≤ 10 s. 1  The behaviors of these fractions in Figure 2 at different PN orders are also noteworthy.We note that the fraction of misleading OSs in the 2PN-deviation and 1.5PNdeviation cases is larger than that in the 1PN-deviation case when ∆t c,th is within 0.5 s, while for a larger ∆t c,th it is the opposite.Since the 2PN and 1.5PN terms are 1 The fraction of the misleading OSs is If all misleading OSs have ∆tc ≤ 10 s, the first factor on the right-hand side of the above equation does not depend on ∆t c,th , while the second factor is inversely proportional to ∆t c,th .more prominent in the late inspiral phase, it is more likely to generate misleading OSs when ∆t c is small (as shown in Figure 1), corresponding to a relatively large fraction.As a trade-off, almost all misleading OSs in the 2PN-and 1.5PN-deviation cases have ∆t c ≲ 0.5 s, corresponding to the quick decrease after ∆t c,th ≃ 0.5 s.In comparison, in the 0PN, 0.5PN and 1PN deviation cases, similar behavior of these fractions is observed at ∆t c,th between 1 s and 10 s.The fraction of misleading OSs in the 1PN-deviation case decreases faster than that at the 0PN and 0.5PN orders, meaning that the number of misleading OSs in the 0PN-and 0.5PN-deviation cases with 1 s < ∆t c < 10 s is larger than that in the 1PNdeviation case.However, since ∆t c is relatively large when the 0PN and 0.5PN term dominates the phase evolution, it is difficult to generate large biases, and the total number of misleading OSs in the 1PN-deviation case is still larger than that in the 0PN-deviation case for a large ∆t c,th .In general, for most ∆t c,th (from 2 s to 70 s), the fraction of large bias in 1PN coefficient stays the highest among the PN-deviation cases we investigate in this work.
From the above findings, we observe a non-negligible proportion of misleading OSs to parameterized PN waveform tests.These OSs, when analyzed individually in tests of GR, tend to favor their respective PNdeviation model more strongly relative to the GR waveform.However, this is not the case when all misleading OSs are considered collectively because of the disagreement on the values of deviation parameters arising from each misleading OS. Figure 3 shows the evolution of Bayes factors of each PN-deviation model against GR with the increase of number of misleading OSs, denoted as N MOS .For all PN-deviation models, their corresponding Bayes factors exceed 1 only for the first few misleading OSs, and then start drastically decreasing as more misleading OSs enter the test of GR.It is evident that biases in PN deviation terms caused by OSs are not constant as they do not represent a "real" deviation.This is because these biases arise solely from the misclassification of an OS as a single-event signal.
However, in our PN-deviation models, the deviation parameters are assumed to be constant.When combining two misleading OSs, where, say, one has the reduced bias B α > 1 and the other has B α < −1, the Bayes factor will rapidly decrease due to the inconsistent biases.
To support this, we plot the distribution of reduced biases of the misleading OSs for each PN deviation model in Figure 4, from which we can observe symmetric distributions of reduced biases.
Another interesting aspect noted in Figure 3 is that the Bayes factor for the 2PN and 1.5PN deviation models decline more rapidly than those for the −1PN, 0PN, 0.5PN, and 1PN deviation models, especially at small ∆t c,th .In Figure 4, one can find that the distributions of biases of the −1PN, 0PN, 0.5PN and 1PN deviation models cluster more densely near 1, while the distributions for the 2PN and 1.5PN deviation models are broader, leading to a greater variance in individual bi-ased values.This is consistent with the argument above that it is more likely to have larger biases in the deviation parameter at the 2PN and 1.5PN orders.Therefore, the Bayes factor for the 2PN and 1.5PN deviation models decrease more rapidly with accumulation of misleading OSs.

CONCLUSIONS AND DISCUSSIONS
The XG ground-based detectors are anticipated to exhibit a significantly enhanced detection rate of BBHs relative to their forerunners, indicating the unavoidable appearance of OSs in future detection (Regimbau & Hughes 2009;Meacher et al. 2016;Kalogera et al. 2021).A primary concern regarding such OSs is how they can potentially introduce biases if an OS is mistakenly treated as a single-event signal, thereby misleading parameterized tests of GR.Hu & Veitch (2023) studied the biases in PE when testing GR at the 0PN order with these OSs, and left the comprehensive investigation at different PN orders for future work.
In this work, we extend Hu & Veitch (2023) and investigate how OSs affect −1PN to 2PN terms in parameterized tests of GR.We adopt the FM approximation to calculate the biases introduced by OSs.FM is a fast and effective method to study the biases of OSs (Finn 1992;Antonelli et al. 2021;Hu & Veitch 2023;Wang et al. 2024).For each PN deviation parameter, we calculate the proportion of signals exhibiting significant bias, and further explore the Bayes factor over accumulation of such misleading OSs.We find that a non-negligible fraction of OSs could produce a significant bias in each PN deviation parameter, especially when the difference in coalescence times, ∆t c , is small.It is worth noting that there are significant differences between the fractions of misleading OSs at different PN orders, while the fraction in the 1PN-deviation case usually stays the largest.This underscores the importance of consideration for GR deviations at different PN orders when discussing the impact of OSs on testing gravity.However, different misleading OSs lead to inconsistent values for deviation parameters, and if analyzed collectively, they do not show a preference for non-GR deviations over GR.This is supported by the decreasing trend of the Bayes factor with an increasing number of misleading OSs, shown in Figure 3.
In real detection, there exist other sources of PE biases when analyzing OSs, such as the waveform modeling (Hu & Veitch 2023), instrumental calibration (Sun et al. 2020;Hall et al. 2019) and detector glitches (Pankow et al. 2018).We leave a detailed assessment and comparison of all these effects on testing gravity for future work.However, as the misleading OSs discussed, we emphasize the importance of considering the effects at different PN orders to obtain a global understanding.

Figure 1 .Figure 2 .
Figure 1.Reduced bias of each PN deviation term as a function of ∆tc, for an OS characterized by parameters in Eq. (15).

Figure 3 .
Figure 3. Bayes factor divided by p(δϕ) over accumulation of misleading OSs, denoted as NMOS, with different ∆t c,th for each panel.The inset within each plot is a zoom-in plot of the initial behavior of the Bayes factor.

Figure 4 .
Figure 4. Distribution of large biases of each PN deviation parameter for different ∆t c,th .