How to Detect an Astrophysical Nanohertz Gravitational-Wave Background

Analysis of pulsar timing data have provided evidence for a stochastic gravitational wave background in the nHz frequency band. The most plausible source of such a background is the superposition of signals from millions of supermassive black hole binaries. The standard statistical techniques used to search for such a background and assess its significance make several simplifying assumptions, namely: i) Gaussianity; ii) isotropy; and most often iii) a power-law spectrum. However, a stochastic background from a finite collection of binaries does not exactly satisfy any of these assumptions. To understand the effect of these assumptions, we test standard analysis techniques on a large collection of realistic simulated datasets. The dataset length, observing schedule, and noise levels were chosen to emulate the NANOGrav 15-year dataset. Simulated signals from millions of binaries drawn from models based on the Illustris cosmological hydrodynamical simulation were added to the data. We find that the standard statistical methods perform remarkably well on these simulated datasets, despite their fundamental assumptions not being strictly met. They are able to achieve a confident detection of the background. However, even for a fixed set of astrophysical parameters, different realizations of the universe result in a large variance in the significance and recovered parameters of the background. We also find that the presence of loud individual binaries can bias the spectral recovery of the background if we do not account for them.

understand the effect of these assumptions, we test standard analysis techniques on a large collection of realistic simulated datasets.The dataset length, observing schedule, and noise levels were chosen to emulate the NANOGrav 15-year dataset.Simulated signals from millions of binaries drawn from models based on the Illustris cosmological hydrodynamical simulation were added to the data.We find that the standard statistical methods perform remarkably well on these simulated datasets, despite their fundamental assumptions not being strictly met.They are able to achieve a confident detection of the background.However, even for a fixed set of astrophysical parameters, different realizations of the universe result in a large variance in the significance and recovered parameters of the background.We also find that the presence of loud individual binaries can bias the spectral recovery of the background if we do not account for them.

INTRODUCTION
Pulsar timing arrays (PTAs) monitor millisecond pulsars over a timespan of decades.The times-of-arrival (TOAs) of these radio pulses are sensitive to various physical processes, including variations in the proper distance between the pulsars and the observer due to passing gravitational waves (GWs).PTAs recently reported evidence for a nanohertz stochastic GW background (GWB) in their most recent datasets (Agazie et al. 2023a;Antoniadis et al. 2023a;Reardon et al. 2023;Xu et al. 2023), and their results were shown to be consistent with each other (The International Pulsar Timing Array Collaboration et al. 2023).The most plausible source of such a background is a large collection of supermassive black hole binaries (SMBHBs, Agazie et al. 2023b), but other more exotic phenomena can also explain the signal (Afzal et al. 2023).
Several methods have been developed to search for a GWB in PTA datasets.While the GWB is expected to first appear as a common uncorrelated red noise process (CURN), the definitive signature of a GWB is considered to be the characteristic Hellings-Downs (HD, Hellings & Downs 1983) angular correlation pattern between pulsars.The most notable frequentist method is the so called optimal statistic (OS, Chamberlin et al. 2015), which is an unbiased maximum-likelihood estimator of the GWB amplitude based on the crosscorrelations between pulsars and the theoretical correlation pattern.It also allows one to define a signal-tonoise ratio (S/N), which is used as a detection statistic.Bayesian methods, on the other hand, marginalize over the parameters of a Fourier basis Gaussian process that describes the GWB (Lentati et al. 2013;van Haasteren & Vallisneri 2014).This can be constructed either with (HD model) or without HD correlations between pul- * Sloan Fellow † NASA Hubble Fellowship: Einstein Postdoctoral Fellow ‡ NANOGrav Physics Frontiers Center Postdoctoral Fellow § Deceased ¶ NSF Astronomy and Astrophysics Postdoctoral Fellow sars (CURN model).One can calculate the Bayes factor (BF) between these two models, which is the most important quantity in quantifying the level of Bayesian evidence for a GWB.
As with any statistical data analysis, one needs to make various assumptions when calculating either the OS S/N or the HD vs. CURN BF.Most commonly one assumes that the GWB is: i) Gaussian; ii) isotropic; iii) described by a power-law spectrum1 .However, if the GWB is produced by a finite number of SMBHBs, none of these assumptions hold exactly (see e.g.Bécsy et al. 2022).Methods have been developed that relax some of these assumptions.Assumption iii) is the one most often relaxed either by modeling the spectrum as a two-component so called broken power-law model or allowing the amplitudes of different frequency components to vary freely (see e.g.Sampson et al. 2015;Taylor et al. 2017;Agazie et al. 2023a,c;Meyers et al. 2023;Lamb et al. 2023;Gersbach et al. 2023).There are also specific searches looking for an anisotropic GWB (Mingarelli et al. 2013;Taylor & Gair 2013;Romano et al. 2015;Gair et al. 2015;Taylor et al. 2020;Ali-Haïmoud et al. 2020;Pol et al. 2022;Agazie et al. 2023d), thus relaxing assumption ii).One way of relaxing the assumption of Gaussianity is by modeling the background as a t-process, where the GWB power at each frequency follows a Student's t-distribution instead of a Gaussian distribution, thus allowing more flexibility (see Appendix D in Agazie et al. 2023a).While such generalized search algorithms are available, the flagship results in most GWB searches continue to rely upon these assumptions (see e.g.Agazie et al. 2023a;Antoniadis et al. 2023a;Reardon et al. 2023).This is in part due to their simplicity, which makes them easier to compute.In addition, it is also widely expected that their assumptions are good enough to make them strong and robust methods.
The expectation that models employing these assumptions are capable of detecting GWBs, even when these assumptions are broken is based largely on Cornish & Sesana (2013) and Cornish & Sampson (2016).In Cornish & Sesana (2013) it was shown that even a single SMBHB produces HD correlations in an array of isotropically distributed pulsars -showing that we can trade isotropy of the background for isotropy of the pulsar array.In Cornish & Sampson (2016) the authors studied both realistic GWBs based on SMBHB population models and also idealized Gaussian, isotropic GWBs with a power-law spectrum.They found that the significance with which one can detect these is similar, except for cases with an unrealistically low number of binaries.In this paper we carry out a similar analysis with a number of improvements: 1. We use a more up-to-date SMBHB population model; 2. Instead of pulsars with evenly sampled data, we use the actual observation times from the NANOGrav 15-year dataset (Agazie et al. 2023e); 3. In addition to white noise, we include pulsar red noise as well, and we set the level of those based on the noise properties found in the NANOGrav 15-year dataset (Agazie et al. 2023c); 4. Instead of simple frequency-domain analysis, we use the actual software used in Agazie et al. (2023a), that works in the time-domain, and takes into account covariances with the timing model.
Our analysis can also be considered as an extension of the consistency checks carried out in Johnson et al. (2023).There, these analysis pipelines were extensively stress tested to make sure they perform properly under the assumption that the data analyzed conforms to the models used.We extend the scope to see how these pipelines perform if their assumptions are not met perfectly, as is expected to be the case when analyzing real data.This is a special kind of model misspecification, where our signal model does not match the data.Understanding the effects of signal model misspecification, along with noise model misspecification (Goncharov et al. 2021) and developing consistency checks for our models (Meyers et al. 2023), will be increasingly important as PTAs get more and more sensitive.
The rest of the paper is organized as follows.In Section 2, we describe our realistic population-based simulated datasets and give more details about the statistical methods employed on them.In Section 3, we present our results, and in Section 4 we provide a brief conclusion and discuss future work.

Realistic simulated datasets
Our simulated datasets are based on the pulsars and their measured noise properties in the NANOGrav 15year dataset (Agazie et al. 2023e), following the simulation framework of Pol et al. (2021).We use the libstempo software package, and we simulate datasets with the actual observing times from the real dataset, but to reduce the data volume we only keep one observation per epoch.We simulate data with the white and red noise properties fixed to the maximum a posteriori values inferred from the NANOGrav 15-year dataset (Agazie et al. 2023c).
To simulate the contribution of a realistic GWB, we use an SMBHB population model implemented in the holodeck software package (Kelley et al. 2023), based on the Illustris cosmological hydrodynamical simulation (Vogelsberger et al. 2014).This model applies a post processing step to account for small-scale physics not captured by Illustris, uses a phenomenological model to describe binary evolution (for details see Agazie et al. 2023b), and produces a simulated list of binaries in the Universe (Kelley et al. 2017a(Kelley et al. ,b, 2018)).To add the contribution of these binaries, we model the brightest 1000 binaries in each frequency bin individually and add the rest as a stochastic background.This was shown to produce equivalent results to a full simulation, but with drastically reduced runtimes (Bécsy et al. 2022).We produce hundreds of realizations of these simulations by drawing new binary populations to account for cosmic variance, and randomizing their extrinsic parameters (sky location, inclination, polarization angle, phases).
We then analyze them with both Bayesian and frequentist statistical methods, which are described in the next sections.

Bayesian methods
To carry out a Bayesian statistical analysis of our datasets, we use the following likelihood function (for more details see, e.g., Eq. (7.35) in Taylor 2021): where δt are the timing residuals, and C is the noise covariance matrix that includes the effects of both white and red noise and the linearized timing model.The matrix C depends on parameters describing the noise.
In our analysis we fix the white noise parameters to their true values.We model the spectrum of intrinsic pulsar red noise with a power-law model, so that its power spectral density (PSD) in the ith pulsar is: where A i and γ i are the amplitude and the spectral slope of the RN in the ith pulsar, respectively, and f yr = (yr) −1 is the frequency corresponding to a period of a year.We marginalize over A i and γ i in our analysis.
Similarly, we model the timing residuals induced by the GWB as a red noise process with a power-law spectrum, such that its cross-power spectral density between the ith and jth pulsar is: where Γ ij is the overlap reduction function (ORF) between the two pulsars.For a GWB, the ORF is given by the HD curve (Hellings & Downs 1983), which introduces correlations between pulsars depending on their angular separation on the sky.Another useful model is a so called common uncorrelated red noise (CURN), which assumes Γ ij = δ ij , so no correlation between pulsars.This model is much more efficient to calculate, because C is block-diagonal.In practice, the posteriors on A GWB and γ GWB are similar between the CURN and HD models.This allows one to sample the simpler CURN model and reweight the posterior samples to get fair draws from the HD posterior (for details see Hourihane et al. 2023).The mean of the weights also serves as a measurement of the BF between the HD and CURN models, which is the primary quantity one needs to measure to claim a detection of the GWB.The variance of the weights can be used to approximate the number of effectively independent samples after reweighting.The ratio of this to the total number of samples defines the reweighting efficiency, which needs to be sufficiently high to get reliable results.Over the 773 realizations we analyzed, only 17 showed a reweighting efficiency of less than 5%, most of which resulted only in a modest error of the measured BFs, despite the low efficiency.Thus the use of the reweighting technique should not have a significant effect on our results.

Frequentist methods
While GWB searches usually focus on Bayesian methods, frequentist methods can also be useful, particularly given their significantly faster computational speed.The most common frequentist method used in searches for a GWB is the so called optimal statistic (OS, Chamberlin et al. 2015), which is an unbiased estimator of the GWB amplitude.It is constructed by maximizing the likelihood of the measured cross-correlation values under the assumption of an ORF and a (typically power-law) PSD model.This results in the estimated GWB amplitude ( Â) and the standard deviation of that estimate (σ A ).The ratio of these gives the signal-to-noise ratio, S/N, which can be used as a detection statistic for the presence of a GWB.
One disadvantage of the OS is that one needs to know the PSD describing the GWB and the intrinsic red noise of each pulsar, which is not known a priori in a real dataset.One can measure the PSD using Bayesian methods, and use a point estimate (usually the maximum a posteriori ) to calculate the OS.However, this does not take into account the uncertainty of the measured spectrum.To do so, Vigeland et al. (2018) developed the so called noise-marginalized OS, which calculates the OS for random draws of the spectrum from a Bayesian posterior, thus producing a distribution of OS values.Subsequently, Vallisneri et al. (2023) have also proposed a related method for interpreting the OS in the context of posterior predictive checking.
The OS can also be used to calculate the S/N of processes described by different ORFs.Two commonly calculated ORFs are the monopole and dipole, which could arise e.g. from systematic clock errors and ephemeris errors, respectively.One limitation of the standard OS, is that since these ORFs are usually not orthogonal, processes with less preferred ORFs can still produce high S/N values.To alleviate this problem, Sardesai & Vigeland (2023) developed the so called multi-component OS, which can allow multiple processes with different ORFs to fit the data simultaneously.As a result, different processes can be better decoupled.In this study we calculate S/N values with the noise-marginalized multicomponent OS method to assess the frequentist significance of the GWB, along with monopole or dipole correlated processes.

RESULTS
To produce robust results, we analyzed 773 different realizations of our simulated 15-yr dataset.The median (thick lines) and mean (thin lines) GWB spectra over these realizations is shown in Figure 1, both for all the binaries (red solid lines) and for all except the brightest binaries in each bin (green dashed lines).The shaded region represents amplitudes between the 5th and 95th percentile in each bin.We also show a spectrum with γ = 13/3 as a reference.Note that this has a slope between the median and mean lines for all sources, which give γ = 4.6 and γ = 4.2 in a linear fit to the first five frequency bins, respectively.We can also see that the median spectrum is similar to the mean spectrum once the brightest binary is removed.However, for the total spectrum, the mean spectrum is shallower and follows the expected 13/3 spectral slope better.This is due to the fact that the median is insensitive to outliers, and thus we lose the information about the bright sources.Figure 1.Mean and median GWB spectra over 773 realizations.Also shown is the spectrum after removing the brightest binary's contribution in each frequency bin.The shaded regions show the range of spectra between the 5th and 95th percentile.We also show a power-law spectrum with a canonical 13/3 slope as a reference.

Stochastic background
We carry out a Bayesian search for a 14-frequency CURN process in each realization, resulting in posterior distributions of the power-law parameters.Based on those, we calculate the median A GWB and γ GWB values for each realization.The distribution of these are shown on Figure 2 (blue), along with A GWB and γ GWB values estimated by a simple least squares fit to the theoretical spectrum using either the first 5 (orange) or the first 14 (green) frequency bins.Interestingly, fitting to the first 5 frequencies gives very similar values to the actual Bayesian CURN analysis, even though that uses the first 14 frequencies.This is due to the fact that the Bayesian analysis is dominated by the first few frequencies, where the signal is the strongest.Conversely, a 14-frequency fit to the spectra provides unrealistically accurate measurements, as it does not take into account the effect of red and white noise, the latter of which dominates the GWB signal at higher frequencies.Note that all of these distributions center roughly around the theoretically expected γ GWB = 13/3 value, and the amplitude to which this population was calibrated.However, they show a significant spread indicating that the lower γ GWB value and higher amplitude found in the NANOGrav 15-year GWB search (posterior distribution and median values shown by red contour and dot on Figure 2) is compatible with the binary population in our simulations.Note that this is consistent with the findings of Agazie et al. (2023a), where they found that the NANOGrav results are compatible with an SMBHB interpretation based on a comparison between the recovered A GWB and γ GWB values and fits to SMBHB population models. 2As we can see, Bayesian spectral analysis of population-based simulated datasets yield similar conclusions.This is not surprising given that we see a simple fit to the spectrum gives very similar results to a full analysis.
For each realization, we also produce HD posteriors using the reweighting method (Hourihane et al. 2023).We calculate the BF between the HD and CURN models which are shown as the blue histogram in Figure 3.Note the large variance of the BF, even though we fixed the astrophysical parameters of the simulation (more than eight orders of magnitude between the lowest and highest BF).To understand the source of this variance, it is worth looking at the recovered BFs as a function of the amplitude of the GWB. Figure 4 shows the BFs as a function of log 10 A 0.1yr −1 , i.e. the amplitude of the GWB referenced at a frequency of 0.1yr −1 instead of the usual 1yr −1 .We use this reference point, since there the amplitude is less covariant with γ GWB .We can see that while the BF is correlated with the amplitude, even at a given fixed amplitude we see significant scatter in the BF.Thus even for universes described by the same astrophysics and a GWB with the same amplitude, the recovered BF shows a significant variance.This is consistent with previous findings on both astrophysical and simple power-law backgrounds (Cornish & Sampson 2016;Hourihane et al. 2023).Thus this is a generic property of any stochastic background, not specifically of astrophysical backgrounds.
Figure 3  Mean of simulations (all) Simulations ( 14.13 < lgA0.1yr 1 < 13.88) Mean of simulations ( 14.13 < lgA0.1yr 1 < 13.88) NANOGrav 15-year w/ 14 frequencies Figure 3. Distribution of HD vs. CURN BFs for all 773 realizations (blue) and for 14 realizations with a recovered amplitude consistent with the amplitude recovered in the NANOGrav 15-year analysis at the 90% confidence level (purple).Also shown are the means of these distributions (dashed vertical lines), and the BF reported in Agazie et al. (2023a) based on the NANOGrav 15-year dataset (red).While this particular astrophysical model predicts amplitudes and BFs that tend to be lower than the one found in the NANOGrav 15-year dataset, if we select a subset of these that produce a consistent amplitude, they also show good consistency in terms of BF. with that reported in the NANOGrav 15-year GWB search (Agazie et al. 2023a), along with the actual measured BF in that search.We can see that the simulations show about four orders of magnitude scatter even in this narrow amplitude range, but they predict BFs that are roughly consistent with the real data.Note that this is not a rigorous consistency check of the 15-year results with the astrophysical predictions, since our astrophysical models tend to produce a lower amplitude than the one found in the 15-year dataset.Thus, the mean of our simulations produces a lower amplitude and lower BF than found in the 15-year data.However, once we filter out only those realizations that produce an amplitude close to the one found in the real data, we find that the BFs are consistent with the one measured on real data.This stands to show that our simplified analysis methods are capable of detecting the GWB with a significance reported in (Agazie et al. 2023a) even if the GWB is coming from a finite number of binaries.We also calculate the noise marginalized multicomponent optimal statistic for all realizations.Figure 5 shows the distribution of the mean monopole, dipole, and HD S/Ns, along with the HD vs. CURN BFs and log 10 A 0.1yr −1 .We also indicate the values found in the NANOGrav 15-year dataset on Figure 5 (red markers).Note that these are all roughly consistent with our simulations.Black lines represent zero S/Ns and BFs.Note that as expected both monopole and dipole S/N values We can see some interesting correlations between values shown on Figure 5.To quantify these, we calculate the Pearson correlation coefficient (r) for each parameter pair and show them on the figure where the correlation is significant (p-value < 10 −3 ).The strongest correlation of r = 0.82 can be observed between HD S/N and HD vs. CURN BF.This is reassuring, since these are both meant to quantify the significance of the presence of HD cross-correlations in the data.In fact, both Pol et al. (2021) and Vallisneri et al. (2023) gives an approximate mapping between these two quantities.Both the HD S/N and HD vs. CURN BF values are also correlated with log 10 A 0.1yr −1 , with r = 0.39 and r = 0.43, respectively.These relatively low correlation values are in line with our finding above, that a large variation remains in detection statistics even at fixed amplitude.The fact that the amplitude is more correlated with the BF than with the HD S/N can potentially be explained by the the fact that the Bayesian analysis uses uses both autocorrelation and cross-correlation information, while the OS relies solely on cross-correlations.Additionally, both monopole and HD S/N is anticorrelated with the dipole S/N.However, there seems to be no correlation between the HD and monopole S/N.

Individual binaries
We investigate detection prospects of individual binaries and how they relate to background properties by following Bécsy et al. (2022): we calculate the expected S/N of binaries defined as S/N = (s|s), where s is the simulated CW signal, i.e. we calculate the inner product of the waveform with itself.We do so for each binary with randomly assigned extrinsic parameters (sky location, inclination angle, polarization angle, phases), and we find the binary with the highest S/N in each real-ization.Figure 6 shows the S/N and GW frequency of that best binary in each realization, along with the log 10 A 0.1yr −1 and γ GWB values we find in our CURN analysis.Similarly to Bécsy et al. (2022) we find that the the highest-S/N sources tend to be found at moderate frequencies around 2-10 nHz.We find a higher mean S/N, which is not surprising given the inclusion of additional pulsars and updated noise models of NANOGrav 15-year pulsars.
We calculate the Pearson correlation coefficient (r) for each parameter pair and show it on Figure 6 where the correlation is significant (p-value < 10 −3 ).The most significant correlation is shown between the CW S/N and log 10 A 0.1yr −1 , with r = 0.51.This can be attributed to the fact that the CURN analysis does not include a CW model, so if a CW is present, the recovered amplitude is expected to be biased high.We also find significant anti-correlation between the CW S/N and γ GWB (r = −0.26).This is probably due to the fact that γ GWB can be biased low by a significant individual binary at moderate frequencies as we will see below.Finally, the only parameter that correlates with the frequency of the loudest binary is γ GWB , showing a negative correlation with r = −0.34.This is understandable by the fact that the higher the binary's frequency is, the more it can bias the GWB slope low.
To further investigate the covariance between individual binaries and the GWB, we analyzed a single realization with the model of an individual binary and CURN (CURN+CW).For this analysis we used QuickCW (Bécsy et al. 2022;Bécsy et al. 2023), a software package that builds on the enterprise (Ellis et al. 2019) and enterprise extensions (Taylor et al. 2021) libraries, but uses a custom likelihood calculation and a Markov chain Monte Carlo (MCMC) sampler tailored to the search for individual binaries.We selected this realization randomly under the constraints that it has a CW S/N>5 and a monopole S/N>1.5.The former ensured that there is a detectable individual binary in the data, while the latter was motivated by the subdominant monopolar signature found around 4 nHz in the NANOGrav 15-year dataset (Agazie et al. 2023a), which was also identified by the individual binary search carried out using that dataset (Agazie et al. 2023f).The selected realization has a monopole S/N of 1.8, a dipole S/N of −1.33, an HD S/N of 2.29, an HD vs CURN BF of 77, a log 10 A 0.1yr −1 = −14.4,a CW SNR of 6.2, and a CW frequency of 9 nHz.
Analyzing this dataset with the CURN+CW model we found that the CW signal was clearly detected, with a CURN+CW vs CURN BF of ∼25.Moreover, the parameter recovery of the CURN process was significantly affected by the inclusion of the CW model.Figure 7 shows the distribution of γ GWB and log 10 A under the CURN (red), HD (orange), CURN+CW (black), and HD+CW (blue) models.Posteriors with models including HD were produced by reweighting (Hourihane et al. 2023).We can see that in the case of the CURNonly and HD-only models, the unmodeled binary biases the γ GWB recovery low compared to the canonical 13/3 value, while the CURN+CW and HD+CW models produce γ GWB posteriors centered on values close to 13/3.This is understandable, as the CURN model can partially model the CW signal by introducing more power at higher frequencies where the CW is present.This is also consistent with the correlation between γ GWB and CW properties we found on Figure 6.While we cannot draw definitive conclusions from a single realization, this result suggests that modeling individual binaries along with the GWB may be crucial for unbiased spectral characterization of the GWB.We leave a detailed investigation of this problem to a future study.  .Distribution of GWB parameters under the CURN (red), HD (orange), CURN+CW (black), and HD+CW (blue) models for a particular realization with a clearly detectable binary (S/N = 6.2).Not modeling the CW signal biases the spectral slope low, while once we include it we recover the expected γ = 13/3 value.Note that in this particular example, the CURN and HD models show very similar posteriors, while the spectral recovery is slightly different between the CURN+CW and HD+CW models.
It is important to note that while the NANOGrav 15year analysis recovers a γ GWB value lower than 13/3, that does not significantly change with the inclusion of an individual binary model (Agazie et al. 2023f), and the γ GWB recovery is instead more affected by the particular noise models employed (Agazie et al. 2023a).This is unlike the EPTA DR2 analysis, where the inclusion of an individual binary completely changes the GWB recovery (Antoniadis et al. 2023b).This highlights that both a misspecified signal model and noise model can have adverse effects on parameter estimation.
Note that in this particular realization, the recovered spectra under the CURN and HD models are practically identical (red and orange on Figure 7).This is not particularly surprising, as this forms the basis of the reweighting technique (Hourihane et al. 2023).In the presence of an individual binary, the inclusion of correlations does have a small effect on the recovered spectrum (compare black and blue on Figure 7), but the difference is small enough that the reweighting technique can be effective.
In addition, the significance of the individual binary does not change significantly after reweighting.We show the distribution of some individual binary model parameters on Figure 8. Black shows posteriors from the CURN+CW run, while blue shows those reweighted to HD+CW.Red indicates the true parameters of the signal.In both cases, the binary is clearly detected.The reweighted posterior distributions are similar to the original ones, but are less constrained.We can also see that the recovery of some parameters show non-negligible bias.This run is well-converged, and QuickCW has been shown to produce unbiased parameter estimates (Agazie et al. 2023f) on CURN+CW simulations.Thus we suspect this bias is due to model mis-specification, e.g. because the background is modeled with a power-law spectrum, or because we assume there is only one significant binary present.In particular, the unmodeled contributions of subdominant binaries at similar frequencies could potentially result in strong biases in some parameters as the sampler tries to adjust the model to fit contributions from more than one binary.In this particular realizations there are three binaries with frequencies within 1/(2T obs ) ≃ 2 nHz, and amplitudes within a factor of three of the loudest binary.This effect is being investigated in a follow-up systematic injection study, where we analyze a large number of realizations under the CURN+CW model.If this bias is indeed the result of the single-binary assumption, an analysis modeling multiple binaries simultaneously could alleviate the problem.An efficient implementation of such an analysis is under development based on an existing multibinary pipeline (BayesHopper, Bécsy & Cornish 2020) and techniques from the efficient single-binary QuickCW algorithm (Bécsy et al. 2022).

CONCLUSION AND FUTURE WORK
We used state-of-the art statistical data analysis techniques to search for a stochastic GW background in realistic simulated datasets.We employed both Bayesian and frequentist techniques to estimate the parameters of the background and assess its significance.Our simulated datasets were produced based on an astrophysical population of supermassive black hole binaries, and as such the resulting stochastic backgrounds were anisotropic, non-Gaussian, and had non-power-law spectra.We found that standard analysis techniques were able to detect the characteristic Hellings-Downs correlations in these backgrounds, despite the fact that they assume isotropic, Gaussian, power-law backgrounds.
These analyses were also able to correctly characterize the spectrum of these backgrounds most of the time.However, we found the presence of a strong individual binary can bias the recovered amplitude high, and the recovered spectral slope low.We calculated the signalto-noise ratio of the loudest individual binary in each realization and examined how that relates to the background properties.
The fact that standard statistical analysis techniques used by the PTA community can indeed detect the signals from a realistic SMBHB population is reassuring.On the other hand, the possibility that individual binaries can bias spectral recovery is cause for concern, and suggests that joint analyses of individual binaries and the background might become necessary to avoid biased inference.Future studies investigating the interplay between these two source types will be crucial in making sure that any astrophysical interpretation is robust.In addition, it will be important to assess the importance of modeling multiple binaries simultaneously.We also plan to investigate the expected level of anisotropy in the stochastic background based on these realistic simulations using methods described e.g. in Agazie et al. (2023d) and Gardiner et al. (2023).While the results presented in this paper are based on a fixed astrophysical model, investigating the effect of astrophysical parameters on the resulting background and individual binary prospects is also being investigated (Gardiner et al. 2023).
Note that a similar independent analysis was reported concurrently in Valtolina et al. (2023), with results consistent with ours, based on simulated datasets resembling the latest dataset of the European Pulsar Timing Array.This highlights that our results are robust against the choice of the specific dataset we base our simulations on.One difference between the results is that Valtolina et al. (2023) finds that their spectral index recovery is biased high, while we see an unbiased recovery of the spectral index on average (see Figure 2).This is potentially due to the fact that the dataset we consider is significantly longer, which allows access to lower frequencies, where the spectrum is less affected by finite-number effects.

Figure 4 .
Figure 4. HD vs. CURN BFs as a function of the recovered GWB amplitude referenced at f = 0.1yr −1 (blue dots).While there is correlation between the amplitude and the BF, the BFs show significant variance even at a given amplitude.Also shown is the BF and amplitude reported inAgazie et al. (2023a) based on the NANOGrav 15-year dataset (red).

Figure 5 .
Figure 5. Distribution of mean OS S/Ns, HD vs. CURN BFs, and recovered GWB amplitude referenced at f = 0.1yr −1 .Grey lines represent zero S/Ns and unit BFs, and red lines represent the values reported in Agazie et al. (2023a) based on the NANOGrav 15-year dataset.We also indicate the Pearson correlation coefficient for every pair where the associated p-value is below 10 −3 .are centered around zero, while the HD S/N distribution is offset towards positive values.We can see some interesting correlations between values shown on Figure5.To quantify these, we calculate the Pearson correlation coefficient (r) for each parameter pair and show them on the figure where the correlation is significant (p-value < 10 −3 ).The strongest correlation of r = 0.82 can be observed between HD S/N and HD vs. CURN BF.This is reassuring, since these are both

Figure 6 .
Figure 6.Distribution of GWB spectrum parameters (γGWB, log 10 A 0.1yr −1 ) based on CURN analysis, and S/N and log 10 f of the highest-S/N CW source in each realization.Red lines represent the values reported in Agazie et al. (2023a) based on the NANOGrav 15-year dataset.We also indicate the Pearson correlation coefficient for every pair where the associated p-value is below 10 −3 .

Figure 7
Figure7.Distribution of GWB parameters under the CURN (red), HD (orange), CURN+CW (black), and HD+CW (blue) models for a particular realization with a clearly detectable binary (S/N = 6.2).Not modeling the CW signal biases the spectral slope low, while once we include it we recover the expected γ = 13/3 value.Note that in this particular example, the CURN and HD models show very similar posteriors, while the spectral recovery is slightly different between the CURN+CW and HD+CW models.
support of the NSF Physics Frontiers Center Award PFC-1430284 and the NSF Physics Frontiers Center Award PFC-2020265.L.B. acknowledges support from the National Science Foundation under awardAST-1909933  and from the Research Corporation for Science Advancement under Cottrell Scholar Award No. 27553.P.R.B. is supported by the Sci-Fellowship (6456).M.E.D. acknowledges support from the Naval Research Laboratory by NASA under contract S-15633Y.T.D. and M.T.L. are supported by an NSF Astronomy and Astrophysics Grant (AAG) award number 2009468.E.C.F. is supported by NASA under award number 80GSFC21M0002.G.E.F., S.C.S., and S.J.V. are supported by NSF award PHY-2011772.The Flatiron Institute is supported by the Simons Foundation.S.H. is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-1745301.The work of N.La.and X.S. is partly supported by the George and Hannah Bolinger Memorial Fund in the College of Science at Oregon State University.N.La.acknowledges the support from Larry W. Martin and Joyce B. O'Neill Endowed Fellowship in the College of Science at Oregon State University.Part of this research was carried out at the Jet Propulsion Laboratory, California Institute of Technology, under a contract with the National Aeronautics and Space Administration (80NM0018D0004).D.R.L. and M.A.M. are supported by NSF #1458952.M.A.M. is supported by NSF #2009425.C.M.F.M. was supported in part by the National Science Foundation under Grants No. NSF PHY-1748958 and AST-2106552.A.Mi. is supported by the Deutsche Forschungsgemeinschaft under Germany's Excellence Strategy -EXC 2121 Quantum Universe -390833306.The Dunlap Institute is funded by an endowment established by the David Dunlap family and the University of Toronto.K.D.O. was supported in part by NSF Grant No. 2207267.T.T.P. acknowledges support from the Extragalactic Astrophysics Research Group at Eötvös Loránd University, funded by the Eötvös Loránd Research Network (ELKH), which was used during the development of this research.S.M.R. and I.H.S. are CIFAR Fellows.Portions of this work performed at NRL were supported by ONR 6.1 basic research funding.J.D.R. also acknowledges support from start-up funds from Texas Tech University.J.S. is supported by an NSF Astronomy and Astrophysics Postdoctoral Fellowship under award AST-2202388, and acknowledges previous support by the NSF under award 1847938.S.R.T. acknowledges support from an NSF CAREER award #2146016.C.U. acknowledges support from BGU (Kreitman fellowship), and the Council for Higher Education and Israel Academy of Sciences and Humanities (Excellence fellowship).C.A.W. acknowledges support from CIERA, the Adler Planetarium, and the Brinson Foundation through a CIERA-Adler postdoctoral fellowship.O.Y. is supported by the National Science Foundation Graduate Research Fellowship under Grant No. DGE-2139292.
Agazie et al. (2023a)ered median log 10 A 0.1yr −1 and γGWB values from 14-frequency CURN analysis of 773 realizations (blue).Also shown are the values we get from a simple least-squares fit to the first 5 (orange) and first 14 (green) frequency bins of the simulated spectra, and the parameters reported inAgazie et al. (2023a)based on the NANOGrav 15-year dataset (red).Contours represent 99.7% levels (3σ), and dots show the medians.1-dimensional marginal distributions are also shown, with 95.4% intervals (2σ) highlighted for some of the distributions.While the NANOGrav results show a γGWB value lower than the 13/3 value expected from circular GW-driven GWB (gray), it is well within the spread of γGWB values found in our simulated datasets.