Comparing Recent Pulsar Timing Array Results on the Nanohertz Stochastic Gravitational-wave Background

The Australian, Chinese, European, Indian, and North American pulsar timing array (PTA) collaborations recently reported, at varying levels, evidence for the presence of a nanohertz gravitational-wave background (GWB). Given that each PTA made different choices in modeling their data, we perform a comparison of the GWB and individual pulsar noise parameters across the results reported from the PTAs that constitute the International Pulsar Timing Array (IPTA). We show that despite making different modeling choices, there is no significant difference in the GWB parameters that are measured by the different PTAs, agreeing within 1σ. The pulsar noise parameters are also consistent between different PTAs for the majority of the pulsars included in these analyses. We bridge the differences in modeling choices by adopting a standardized noise model for all pulsars and PTAs, finding that under this model there is a reduction in the tension in the pulsar noise parameters. As part of this reanalysis, we “extended” each PTA’s data set by adding extra pulsars that were not timed by that PTA. Under these extensions, we find better constraints on the GWB amplitude and a higher signal-to-noise ratio for the Hellings–Downs correlations. These extensions serve as a prelude to the benefits offered by a full combination of data across all pulsars in the IPTA, i.e., the IPTA’s Data Release 3, which will involve not just adding in additional pulsars but also including data from all three PTAs where any given pulsar is timed by more than a single PTA.

pulsars in the IPTA, i.e., the IPTA's Data Release 3, which will involve not just adding in additional pulsars but also including data from all three PTAs where any given pulsar is timed by more than a single PTA.

Introduction
Pulsar timing arrays (PTAs) seek to detect low-frequency gravitational waves (GWs) by monitoring a collection of millisecond radio pulsars (Foster 1990).When a GW is incident on a PTA, it induces shifts in the times of arrival of radio pulses.These shifts are correlated between pairs of pulsars depending on their angular separation, known as the Hellings-Downs (HD) correlations (Hellings & Downs 1983).
The most likely source of low-frequency GWs are supermassive black hole binaries (SMBHBs), although cosmological and other more exotic sources are also possible (Burke-Spolaor et al. 2019, and references therein).It is expected that an ensemble of SMBHBs can generate a stochastic GW background (GWB) that could be detected first, followed by the detection of individually resolvable SMBHB sources (Rosado et al. 2015).The spectrum of a stochastic GWB of SMBHB origin is affected by the evolution of SMBHBs and their environments (Taylor et al. 2017).In the case of purely GWdriven evolution, the pulsar timing residuals induced by the GWB follow a power law with a spectral index of γ = 13/3 in the convention used in this work, although more realistic backgrounds can deviate from this expectation (Phinney 2001;Sesana et al. 2008;Bécsy et al. 2022).
Currently, several independent groups organized by region are operating PTAs.The International Pulsar Timing Array (IPTA; Verbiest et al. 2016) is a consortium of four PTAs: the European PTA (EPTA; Desvignes et al. 2016), the Indian PTA (InPTA; Joshi et al. 2018), the North American Nanohertz Observatory for Gravitational waves (NANOGrav; Ransom et al. 2019), and the Australia-based Parkes PTA (PPTA; Manchester et al. 2013).Additionally, the Chinese PTA (CPTA; Lee 2016) and MeerKAT PTA (Miles et al. 2023) are not yet IPTA members, but they have observer status within the IPTA.
In 2020 and 2021, the EPTA, NANOGrav, and PPTA reported the discovery of a common uncorrelated red noise (CURN) process in their data but showed no evidence for or against HD correlations (Arzoumanian et al. 2020;Chen et al. 2021;Goncharov et al. 2021).These results were supported by the analysis of the second data release of the IPTA, a combination of older data from these three PTAs, where a consistent CURN process was also found (Perera et al. 2019;Antoniadis et al. 2022).
In Antoniadis et al. (2023a, hereafter EPTA+InPTA), Agazie et al. (2023a, hereafter NANOGrav), and Reardon et al. (2023a, hereafter PPTA), IPTA members presented analyses of their most recent data sets and reported evidence for an HD-correlated GWB with varying levels of significance.In this work we compare the results from these three studies: describing key features and results of the published data sets in Section 2, outlining the signal and noise models in Section 3, comparing the properties of the GWB using published posterior samples and a new factorized analysis in Section 4, and comparing individual pulsar noise properties in Section 5. Finally, in Section 6 we extend each PTA by adding in pulsars that are not timed by that PTA in a "pseudo-IPTA" combination using the factorized likelihood method.Note that this is different from a full data combination, currently underway as part of IPTA's third data release (IPTA-DR3), where the data from every PTA will be combined for each single pulsar using a unified timing and noise model.
The CPTA also reported on an HD-correlated GWB in Xu et al. (2023), which appears broadly consistent with the other PTAs.Unfortunately, we did not have access to those data and were unable to include their results in our study.

EPTAInPTA
EPTA+InPTA reported results from several permutations of the second data release of the EPTA (Antoniadis et al. 2023c).In this work, we consider the analysis using the data EPTA +InPTA named EPTA DR2new+.This data set contains the most recent 10.3 yr of EPTA observations for 25 pulsars.For 10 of these pulsars, about 3.5 yr of additional observations from InPTA were combined.The resulting combination has a total time span of about 11 yr.
The older EPTA data were excluded because they were collected using legacy observing systems.These observations had narrow radio bandwidths and were mostly collected at L band (1400 MHz) only.The lack of radio frequency coverage led to a covariance between dispersion measure (DM) variations and achromatic red noise (RN), potentially polluting the GW information.

NANOGrav
NANOGrav analyzed the NANOGrav 15 yr data set, which contains observations of 68 pulsars (Agazie et al. 2023c).One pulsar was excluded from the GWB analysis, having less than 3 yr of observations.NANOGrav reported a Bayes factor of about 200 in favor of HD-correlated GWB over CURN.This corresponds to a falsealarm probability of about 10 −3 or 5 × 10 −5 (3σ-4σ), depending on the background estimation method.Assuming a power law with fixed spectral index of 13/3, NANOGrav recovered an amplitude -+ -

PPTA
PPTA analyzed the third data release of the PPTA, using observations of 32 pulsars over 18 yr (Zic et al. 2023).Two pulsars were excluded from the GWB analysis because of the lack of data for one and the presence of strong steep RN for the other.The full data release contains additional legacy data, which were also excluded from the GWB analysis.
PPTA reported a Bayes factor of about 1.5 in favor of HDcorrelated GWB over CURN and a false-alarm probability of 0.02 (2σ) based on a likelihood ratio statistic.Assuming a power law with a fixed spectral index of 13/3, PPTA recovered an amplitude -+ -

Analysis Methods
The comparisons of published results reported in this work used samples collected by each PTA from relevant posterior distributions.EPTA+InPTA, NANOGrav, and PPTA all used the enterprise software package to evaluate the likelihood and priors (Ellis et al. 2020;Johnson et al. 2023), and all three PTAs used PTMCMCSampler to collect Markov Chain Monte Carlo (MCMC) samples (Ellis & van Haasteren 2017).All new MCMC runs for this work were conducted using the same enterprise and PTMCMCSampler software stack.

Signal and Noise Models
In three series of papers the PTAs presented a variety of analyses using different signal and noise models.Here we discuss the basic models used and point out where different PTAs do things differently.For a comprehensive look at the noise modeling done by each PTA, see Antoniadis et al. (2023d), Agazie et al. (2023d), andReardon et al. (2023b).

Published Analyses
Each PTA modeled the GWB as a power-law Fourier basis Gaussian process (GP) with HD correlations (van Haasteren et al. 2009;Lentati et al. 2013).The hyperparameters A log 10 HD and γ HD set the characteristic amplitude and spectral index of the power law, which acts as a prior on the Fourier coefficients, which describe the spectrum of the GWB.Each PTA set the fundamental frequency of their Fourier basis as the inverse total observation time, 1/T obs ; however, each PTA had a different T obs , resulting in different sampled frequencies.Each PTA performed an analysis to determine the number of frequencies across which there was a significant signal in their data, and based on this, 9, 14, and 28 frequencies were used in the GWB model for EPTA+InPTA, NANOGrav, and PPTA, respectively.This corresponds to frequency ranges of about 3-28 nHz, 2-30 nHz, and 1.7-47 nHz, respectively.
Each PTA used a similar power-law Fourier basis GP to model intrinsic pulsar RN.Like the GWB, this noise is achromatic: it does not depend on the radio frequency of the observation.Each pulsar had two parameters, A log 10 RN and γ RN , which characterize the power law.The fundamental Fourier frequency for RN matched that for GWB for NANOGrav, while it varied from pulsar to pulsar for EPTA+InPTA and PPTA, corresponding to the observation time of each pulsar.NANOGrav used 30 frequencies to model RN and included RN for every pulsar in their array.EPTA +InPTA performed Bayesian model selection to determine the noise models and related optimal number of frequencies to use for each pulsar.Pulsars with steeper RN spectra required fewer frequencies, the minimum being 10, whereas pulsars with shallower RN spectra required more, the maximum chosen being nearly 150.For some pulsars there was no intrinsic RN detected, so this model was not used for those pulsars.PPTA set the maximum RN frequency to 1/(240 days).For some pulsars, PPTA added an additional high-frequency achromatic noise, modeled as a power law with a maximum frequency 1/ (30 days).This combined model had a steep low-frequency spectrum and a shallow (but not flat) high-frequency spectrum.
One of the main sources of noise in pulsar timing data is DM variations that induce a chromatic (i.e., dependent on the radio frequency ν) delay in the pulse times of arrival proportional to ν −2 .EPTA+InPTA and PPTA used a power-law Fourier basis GP to model stochastic DM variations, known as DMGP.Again, each pulsar gained two parameters, A log 10 DM and γ DM , which characterized the power-law spectrum of deviations away from an initial deterministic fit.The deterministic component included a second-order Taylor series fit to the DM and a solar wind model, which took into account the relative position of the Sun and the pulsar when an observation was made.
EPTA+InPTA performed model selection to determine which pulsars experienced measurable DM variations and to set the number of Fourier frequencies used.PPTA set a maximum frequency of 1/(60 days).
For seven and one of their pulsars, respectively, PPTA and EPTA+InPTA included an additional nondispersive chromatic variation model, scattering noise, which modeled time delays with a ν −4 radio frequency dependence.This model used the same power-law GP framework.For each pulsar with scattering noise, PPTA also added a band noise model, an additional achromatic RN GP that affects only the lowest radio frequency band, ν < 960 MHz.This band noise accounted for excess noise that arose as a result of assuming a fixed chromatic scaling, ν −4 , in the scatter component.
The main NANOGrav results we considered in this work did not use a DMGP, instead using the DMX model to account for DM variations.DMX is a piecewise fit that determines a constant DM for each group of observations that occur near each other in time, effectively fitting a new DM for each pulsar every ∼6 days.This fit was part of the pulsar timing model and was analytically marginalized as part of the GWB search.
PPTA also included additional system noise to model timevarying instrumental noise for five pulsars.This model used the achromatic power-law GP framework, but applied only to observations made with a particular observing system.

Factorized Likelihood Analysis
As described above, the three data sets were processed using distinct timing pipelines, different modeling choices, and even different Fourier bases when the same models were used.To alleviate these procedural differences, we used the factorized likelihood approach (Taylor et al. 2022) to model all the pulsars from the different data sets using a standardized noise model.Despite the possibility that this choice might be suboptimal for some pulsars, this allowed us to make a more direct comparison between the properties of the noise and the GWB as seen by the difference PTAs.
In the factorized likelihood analyses, we used the NANO-Grav 15 yr data set (Agazie et al. 2023c), the PPTA DR3 (Reardon et al. 2023a), and the DR2New+ data set, which is a combination of the newest EPTA data with the InPTA data (Antoniadis et al. 2023b).For these data sets, we selected a global time span based on the longest observed pulsar in the data set, T span = 18.9 yr.This T span was used to define the fundamental frequency for all models with a Fourier domain GP.The frequencies in all Fourier bases were integer multiples f = n/T span , where The standardized noise model contained four pieces: intrinsic pulsar RN, interstellar DM variations, a deterministic solar wind model, and a fixed spectral index CURN to account for the GWB.The intrinsic RN was modeled using = n 30 max Fourier components for all pulsars.Each pulsar had two hyperparameters characterizing the spectrum: amplitude, A log 10 RN , and spectral index, γ RN .We modeled the DM variations in each pulsar using an = n 100 max -component power-law GP (e.g., Antoniadis et al. 2023d).This necessitated the removal of the DMX parameters in the pulsar timing models provided by NANOGrav, which were replaced by Taylor series expansions of DM up to a second-order polynomial term, matching the DM modeling done by EPTA +InPTA and PPTA.Each pulsar had two additional hyperparameters characterizing the DM variation spectrum: amplitude, A log 10 DM , and spectral index, γ DM .The deterministic solar wind was modeled as described in Hazboun et al. (2022), with the solar wind free electron density at Earth, n Earth , as the only free parameter.The pulsar J1713+0747 has shown evidence for multiple chromatic events, causing sudden radio-frequencydependent delays that relax over time, and two of these events are contained within the data sets analyzed here.We modeled these events as a deterministic exponential delay, where the amplitude, epoch, recovery timescale, and radio frequency dependence (i.e., ν − x ) are parameters of the model (Antoniadis et al. 2022(Antoniadis et al. , 2023d)).Finally, the CURN was modeled as a powerlaw process with = n 15 max Fourier components and a fixed spectral index of γ CURN = 13/3.The amplitude, A log 10 CURN , was left as a free parameter.After the individual pulsar analyses were completed, the posterior distributions for the individual A log 10 CURN values were combined to determine a truly joint posterior for the CURN amplitude.

Comparing the Published GWB Measurements
In previous IPTA work, we compared the posterior distributions of GWB parameters as observed by different PTAs using the Mahalanobis distance, which assumes the posteriors to be multivariate Gaussian distributions (Antoniadis et al. 2022).In this work, we adopted a non-Gaussian tension metric proposed by Raveri & Doux (2021) to assess the tension between the posterior distributions, obtained by different PTAs.To compute the metric, one first constructs a difference distribution by drawing a sample from each posterior distribution and taking the difference between the two, resulting in a single point of the difference distribution.This process is repeated until the difference distribution is sufficiently mapped.Thereafter, one determines the probability that the difference is zero by integration.The tension metric is then the Gaussian-equivalent "σ" value corresponding to the zero-difference probability.We performed a detailed comparison of all GWB and noise parameters using a modified version of the tensiometer144 package, some of which is presented here.The full comparison including visualizations for all GWB and noise parameters is available as supplementary material145 (IPTA 2024).
In EPTA+InPTA, NANOGrav, and PPTA, each PTA searched for an HD-correlated GWB with a power-law spectrum described by a spectral slope, γ, and characteristic amplitude, A log 10 .The 2D marginal posterior for these parameters from each PTA is shown in the right panel of Figure 1.To determine the joint posterior, we computed the simple intersection of the three individual PTA posterior distributions, by multiplying them and renormalizing.We did not weight the individual distributions in any way during this process.This is not, strictly speaking, fair, as the posteriors are not independent; some of the pulsars were observed by multiple PTAs.However, this simplistic combination shows that combining the output of the three experiments can provide an improvement in parameter estimation.
The left panel of Figure 1 shows the free spectral posteriors from each PTA.The individual violins show the measured HDcorrelated GWB amplitude at the corresponding frequency.Additionally, samples from the joint GWB posterior parameters are plotted as power laws.The samples give a sense of the uncertainty of power laws that are in good agreement with all of the individual free spectral results.
Figure 2 shows the difference distributions for the power-law GWB parameters for each pair of PTAs as computed by tensiometer.There is excellent agreement between all three pairs, as the tension metric is less than 1σ in all.Note that EPTA+InPTA had the widest posterior distribution, resulting in wider difference distributions for the two combinations involving those data.While the numerical difference between NANOGrav and PPTA is small, the tension is larger owing to the narrower posterior distributions.

Comparing the GWB Sensitivity of PTAs
A commonly used measure of GW detector performance is a frequency-dependent "sensitivity curve."This metric, which estimates the smallest amplitude of a GW-induced signal that a detector would detect, is often used in the GW community to assess detector performance (see Moore et al. 2014;Hazboun et al. 2019b;Kaiser & McWilliams 2021, and references therein).The hasasia (Hazboun et al. 2019a) package offers a means to efficiently compute such curves for PTAs.Specifically, the sensitivity curves we compare here are the sensitivity to interpulsar cross-correlations induced in the PTA by a GWB.As input, hasasia uses the original time-ofarrival data and the median noise parameters for all noise processes, including the GWB autocorrelations, which act as noise when trying to detect the cross-correlations.While the power-law GWB models that are used in actual searches are restricted to a small, low-frequency range, hasasia can determine the sensitivity to GWs at any frequency.This allows us to generate sensitivity curves from 1/T obs to the Nyquist frequency for each PTA.The low sensitivity at high frequency shows that each PTA's choice to cut off their GWB model at ∼30-50 nHz would not affect the efficacy of their searches.
In order to generate sensitivity curves for EPTAInPTA and PPTA, we made a few modifications to hasasia.This is because hasasia accounts for white noise and achromatic RN only.For analyses like NANOGrav, which modeled DM variations using DMX (which appears in the timing model), this is sufficient, but it is not for analyses that use DMGP, like EPTA+InPTA and PPTA.The system and band noise models used by PPTA must also be accounted for. 146he three resulting sensitivity curves are presented in Figure 3 and show the relative sensitivity of the PTA data sets.The general behavior of the curves follows the simple expectations based on the intrinsic properties of each data set.The differing lowfrequency sensitivity supports the reported evidence from each PTA for the presence of an HD-correlated GWB signal.The NANOGrav data set shows the best low-frequency sensitivity, reaching slightly lower frequencies than EPTAInPTA owing to the longer observing baseline of analyzed data.The PPTA data set that spans the longest time extends to the lowest frequencies but does not achieve the same low-frequency sensitivity as the other two.At the higher frequencies, the EPTAInPTA and PPTA data sets are more sensitive than NANOGrav owing to their higher observing cadences with observations occurring every ∼3 and ∼7 days, respectively.NANOGrav suffers at the high-frequency end owing to its lower observing cadence of roughly 30 or 14 days, depending on the pulsar.

Common Uncorrelated Red-noise Amplitude
In order to make a fair comparison of the observed GWB properties, we reanalyzed each PTA's data using the standardized noise models described in Section 3.2.In place of a full Bayesian analysis searching for HD-correlated GWB, which is computationally expensive, we used the factorized likelihood approach.Using this method, individual pulsars were analyzed independently in parallel and the results combined to arrive at a posterior distribution for the amplitude of CURN, A log 10 CURN , assuming a fixed spectral index of γ CURN = 13/3.This method did not include interpulsar crosscorrelations but acted as a good proxy to quickly determine the spectral properties of a common signal like the GWB.
We applied the standardized noise model described in Section 3.2 to every pulsar with a timing baseline longer than 3 yr.Only four pulsars were dropped owing to this time cutoff: J0614−3329 was dropped from NANOGrav, and J0900 −3144, J1741+1351, and J1902−5105 were dropped from PPTA.Each pulsar was independently analyzed, and the posteriors for the pulsars from a given PTA were combined, resulting in the CURN posteriors shown in color in Figure 4.There is broad agreement between the PTAs, and these new results agree well with the fixed spectral index GWB amplitude reported by the PTAs and stated in Section 2. The black boxes in Figure 4 are based on extending the individual PTA data sets and will be discussed in Section 6.2.

Dropout Factors
The factorized likelihood method can also be used to determine a pulsar's level of support, or lack thereof, for the CURN seen by the rest of the array (Aggarwal et al. 2019;Arzoumanian et al. 2020;Taylor et al. 2022).This is done by using leave-one-out cross-validation, comparing the ratio of Bayesian evidence for a CURN seen by the entire array versus that of the array without the pulsar in question.This calculation gives a dropout factor (DF) for each pulsar, where a DF greater than 1 means that the pulsar is consistent with the common signal seen by the rest of the PTA, and a DF less than 1 means that the pulsar is in tension with the rest of the PTA.A DF near 1 means that the pulsar does not have strong support for or against the common signal.
The left panel of Figure 5 shows the distribution of DFs for each PTA.Many pulsars, particularly those with short observing baselines, had a DF near unity.The right panel of Figure 5 shows the DFs for all pulsars observed by more than one PTA, with uncertainties determined by bootstrap resampling (which are very small).In some cases there was broad agreement, with pulsars like J1909-3744 and J1024-0719 having DFs greater than 1 and less than 1 for all PTAs, respectively.However, there are notable discrepancies for several pulsars examined.In some cases the differences were driven by observation time.If a PTA had a short observing baseline for a pulsar, it should not be sensitive to lowfrequency effects, and it should have no support for or against CURN.This was seen in J0030+0451, where PPTA has only 4 yr of data, and the other two PTAs have more than 10 yr.In J2124−3358 the case was reversed, with NANOGrav having only 3.5 yr of data.
Three additional pulsars, J1744−1134, J1600−3053, and J1857+0943, were observed with long timing baselines by each PTA, but this analysis resulted in disagreement in the DFs Each PTA used a different Fourier basis set by their own maximum observing time.The semitransparent gray lines are 100 samples from the joint 2D power-law posterior distribution, showing the spread of power-law models that are consistent with all of the PTA's data.Right: 2D posterior for HD-correlated power-law GWB parameters.Contours show 68%, 95%, and 99.7% of the posterior mass.The vertical dotted line is at γ = 13/3.among the PTAs.Because each pulsar is compared to the common signal as seen by its own array, it is challenging to understand what these differences mean.Using the forthcoming IPTA-DR3, where the data are fully combined, will provide a better opportunity to understand how these pulsars contribute to the detection of a common signal in the full IPTA.
In PPTA, J1744−1134 is one of four pulsars called out for being in tension with the rest.This status persists in our analysis.Interestingly, J1713+0747 had a high DF when using the PPTA data in this analysis, while in PPTA it had the lowest support for CURN.The reverse is true for J1600−3053, which had a high DF in PPTA but a low DF in this analysis.In all three cases the difference is likely attributed to noise modeling.Our standardized noise model differs from those employed in PPTA, where several additional components were modeled for each of these pulsars.This is discussed further in Section 5.

Comparing Interpulsar Correlations
Included in both the EPTA and NANOGrav analyses were searches for cross-correlation signals using a frequentist approach called the optimal statistic (Anholm et al. 2009;Chamberlin et al. 2015).This approach estimates the power of a cross-correlated GWB signal, Âgw 2 , by using a varianceweighted least-squares fit of individual pulsar pair correlated power.This results in a signal-to-noise ratio (S/N) for a crosscorrelated GWB.The noise-marginalized optimal statistic (NMOS) takes into account the posterior spread of the noise and CURN, resulting in a distribution of Âgw 2 and S/Ns (Vigeland et al. 2018).Note that Romano et al. (2021) showed that the Âgw 2 recovery is biased in cases where the amplitude has a comparable effect on the residuals to the noise.This can be attributed to the fact that the optimal statistic does not account for pulsar pair covariances from a common GWB.However, the use of S/N as a detection statistic is still valid.As a result, we will focus only on the S/N calculated using the optimal statistic framework.
The process of using a factorized likelihood analysis with the NMOS has been detailed within Taylor et al. (2022).In this case, we chose to marginalize over the CURN and the intrinsic pulsar RN only, while holding the DMGP parameters fixed at their   maximum likelihood values to reduce the computational complexity of the process.The resulting S/N distributions for each PTA are shown with colors in Figure 6.We measure a median S/N of 2.2, 4.9, and 1.0 for EPTA+InPTA, NANOGrav, and PPTA, respectively.The black boxes show extended data sets that will be discussed in Section 6.2.
When comparing these NMOS results with those published by NANOGrav and EPTA+InPTA, there are some discrepancies.For EPTA+InPTA there was a decrease in S/N from around 4 to just over 2, while for NANOGrav there was an increase in S/N from roughly 4 to 5. Both of these differences could be attributed to the differences in pulsar noise modeling.Our analysis modeled each pulsar with the standardized noise model, using a Fourier basis set with a time span of just under 19 yr.This is in contrast to the EPTA+InPTA published results, in which they used data-driven customized noise models for each pulsar.This, combined with the fact that the time span used to define the Fourier basis here was significantly longer than the EPTA+InPTA data set, may explain the overall lower S/N.For NANOGrav, our use of DMGP differed from the fiducial DMX model and could account for our observed increase in S/N.

Comparing Pulsar Noise Properties
While white noise arises during a particular observation and is partially associated with instrumental effects, other noise and signals can be astrophysical in nature and should be seen by all PTAs.As discussed in Section 3, each PTA had its own methods to model noise.Whatever the method, each PTA accounted for the two main expected noise components: achromatic RN intrinsic to each pulsar and DM variations, respectively referred to as "RN" and "DM" in what follows.Checking for consistency in these recovered noise model parameter posteriors acts as a further check of agreement between individual PTAs.It should be noted that our standardized noise models differ from the noise modeling done by each PTA.In particular, they are simpler than those used by EPTA+InPTA and PPTA and may not be the ideal noise model for many of the pulsars considered.
There are 11 pulsars that were present in all three data sets, and 10, 1, and 5 pulsars were shared among EPTA+InPTA and NANOGrav, EPTA+InPTA and PPTA, and NANOGrav and PPTA, respectively (see Figure 9).Here we compare their noise properties measured with each data set (1) as reported in the published posterior samples from noise analyses and (2) from new noise analyses that used the standardized models described in Section 3.2, but omitting the CURN component, marginalizing over solar wind electron density error from the timing model, and setting the fundamental Fourier frequency to be 1/T all , with T all the time duration between the first and the last observing epoch among all data sets.
Table 1 shows the tension metric computed for the 27 pulsars that were observed by multiple PTAs.For each pair of PTAs, the tensions for the RN (ΔRN) and the DM variation (ΔDM) 2D posterior distributions as reported in the published Figure 6.The NMOS distributions of S/N for each single PTA and combination of PTAs represented as box and whiskers.The boxes contain 50% of the distribution mass, and the center line marks the median.The whiskers contain 95% of the distribution mass.Note that the S/N measured for the EPTAInPTA data set is lower than that reported in EPTA+InPTA owing to our choice of Fourier basis frequencies being suboptimal for that data set.results (left) and obtained with the new analysis (right) are shown.The large majority of cases showed consistent results among the PTAs for both RN and DM variations.For most cases, the consistency was improved with the new analysis using standardized noise models.For some pulsars, significant tension can be explained by differences in observing time span or differences in observed radio bandwidth.The tensions 3σ are highlighted in the table, and the main discrepancies are discussed in more detail below.
When comparing the published results of EPTA+InPTA and NANOGrav, only the RN of PSR J1012+5307 has a significant tension (here >3.5σ), which is greatly reduced when using the standardized noise models.Figure 7 shows the timedomain realizations of the achromatic RN from the new analysis, which shows common features between the two data sets that are consistent with those reported in Antoniadis et al. (2023d).The tension of 2.3σ for RN of PSR J1857+0943 reflects the fact that the achromatic RN is well constrained Note.If one PTA did not observe the pulsar, "Ø" is shown.If one PTA did not use the relevant noise model, "−" is shown.Values on the left side of the column are calculated using the published posteriors from individual PTAs, which make different noise modeling choices.Values on the right side of the column are calculated using the new noise analysis, in which data from all PTAs are analyzed using the same noise models.Instances with tension 3σ are in boldface.
Comparing the published results of EPTA+InPTA and PPTA, we observed two and six tensions larger than 1σ for RN and DM variations, respectively.The use of standardized noise models significantly reduced the tension in RN for PSR J0900−3144 and PSR J1713+0747 and the tension in DM for PSR J1022+1001, PSR J1713+0747, PSR J1744−1134, PSR J1909-3744, and PSR J2124−3358.However, it did not improve DM consistency for PSR J0613−0200.Interestingly, the tension grew for the RN of PSR J1022+1001 and PSR J1909−3744 and for the DM of J1600−3053 and J1857+0943.Nevertheless, the time-domain realizations for the last three were highly consistent between EPTA+InPTA and PPTA.The slight discrepancy (<2σ) in the noise posteriors for these pulsars was likely caused by the significant difference of observing time spans.
The tension between the published RN posteriors in NANOGrav and PPTA was larger than 1σ for six pulsars and significantly reduced after using standardized noise models for five of them.This clearly shows how important the choice of noise modeling is on the observed RN, which for the noise analysis implicitly included any GWB signal.The new noise analysis resulted in more tension for PSR J0613−0200 and PSR J1744−1134, which are discussed below.The shorter observing time span of NANOGrav for PSR J1730−2304 (∼3 yr vs. >17 yr) and PSR J0437−4715 (∼5 yr vs. >15 yr) was very likely the cause of the large tension for these two pulsars.For PSR J1832−0836, the tension of 2.4σ for DM is likely caused by a lack of cadence and radio frequency coverage for PPTA.
Let us now focus on the main remaining inconsistencies: 1. PSR J0030+0451-This pulsar is known for its low ecliptic latitude, yielding significant solar wind effects that contribute to DM variations.The measured RN was consistent among the three data sets (<1σ).However, the posterior distributions for the DM were fully unconstrained for EPTA+InPTA and PPTA and highly constrained to a flat power law for the NANOGrav.The lack of sensitivity was likely caused by the low radio frequency resolution for EPTA+InPTA and due to the short observing time span for PPTA (∼4 yr). 2. PSR J0613−0200-The DM variations were highly consistent between NANOGrav and PPTA (0.7σ), but EPTA+InPTA displays a flatter-spectrum power law.This difference was likely caused by a longer time span of low radio frequency (<1 GHz) observations for the former two PTAs.The tension metric for the RN showed greater consistency between EPTA+InPTA and PPTA than for NANOGrav.However, the posterior distribution was poorly constrained for EPTA+InPTA and was constrained to differing values in the other two, as seen in Figure 7.Despite this, the time-domain realizations in Figure 7 show a consistent long-term trend for NANOGrav and PPTA.3. PSR J1022+1001-Observations of this pulsar are more affected by the solar wind owing to its low ecliptic latitude (Tiburzi et al. 2021).It also exhibits behavior consistent with profile evolution (Padmanabh et al. 2020, and references therein).Both of these factors made its chromatic and achromatic noise components more difficult to model than other pulsars.When using the standardized noise models, the DM variations between EPTA+InPTA and PPTA were broadly consistent (2σ).The short observing time span for NANOGrav (∼5 yr) compared with the other two (>10 yr) was a potential cause for the observed tension, yielding a consistent spectral slope but a higher amplitude at 1 yr −1 .The RN was poorly constrained for both EPTA+InPTA and NANOGrav.When using standardized noise models, the tension between EPTA+InPTA and PPTA increased from <0.1σ to 2.5σ, with the posterior distribution changing from unconstrained to constrained to a flat power law for PPTA.4. PSR J1640+2224-The tension metric for the RN was only 0.2σ between EPTA+InPTA and NANOGrav, where the former was unconstrained and the latter found a slightly constrained steep power law.However, the DM variations had a tension of 2.3σ, where EPTA+InPTA had a broader constraint for a very flat power law.As expected, we observed a better-constrained posterior distribution on DM variations for NANOGrav, which had better radio frequency coverage, with low-frequency data (<500 MHz) for the entire observing time span.This pulsar was not part of PPTA DR3. 5. PSR J1713+0747-The DM variation parameters were highly constrained and mainly consistent for the three data sets.The tension of 1.9σ between NANOGrav and PPTA reflects the difference in the amplitude, which was slightly larger for the latter.All the tensions for the RN are also less than 2σ, but slight differences were observed.The RN power law was flatter for EPTA +InPTA compared with the other two.Despite the very low tension (<0.1σ) between NANOGrav and PPTA, the posterior distributions contained visible differences: while the first was constrained to a single peak, the second was bimodal, with one mode consistent with NANOGrav and the second favoring a steeper power law (γ > 4.5).6. PSR J1744−1134-The constraints on DM variations were very consistent among the three data sets.Despite the low tension metric for the RN, we observed different behavior for each data set: unconstrained for EPTA+InPTA, steep power law broadly constrained for NANOGrav, and flat power law highly constrained for PPTA.The flat RN spectrum seen by PPTA is indicative of excess unmodeled white noise, not captured by our simpler standardized noise model.7. PSR J2124−3358-The RN was consistent among the three data sets, being fully unconstrained for EPTA +InPTA and NANOGrav but poorly constrained to a steep power law with PPTA.This behavior appears to be driven by the data set time spans, which are ∼4, ∼10, and >17 yr for NANOGrav, EPTA+InPTA, and PPTA, respectively.However, the DM variations were fully unconstrained for PPTA and poorly constrained for the two others, favoring flat power laws.
The use of standardized noise models allowed us to show high consistency among the three data sets.The remaining discrepancies discussed above will be studied in detail during the preparation for the full data combination of the IPTA-DR3.All plots produced for this analysis are available as supplementary material (see footnote 145; IPTA 2024).

Combined Free Spectral Refit
A realistic GWB spectrum will likely have more complicated features than a simple power law.The γ = 13/3 power law is the simplest model for a population of circular SMBHBs that are driven toward inspiral purely as a result of GW emission (Phinney 2001), although realistic backgrounds will deviate from this primitive spectral model (Sesana et al. 2008;Bécsy et al. 2022).For example, three-body interactions between the SMBHB and stars in the loss cone and interactions between the SMBHB and circumbinary gas disk will also drive the inspiral at larger SMBHB separations (Sesana 2013;Burke-Spolaor et al. 2019).This attenuates the GWB spectrum at lower frequencies, creating a "turnover" spectrum (Sampson et al. 2015).Detecting such a turnover could further resolve the "last parsec problem" (Milosavljević & Merritt 2003;Khan et al. 2013;Vasiliev et al. 2014;Agazie et al. 2023a;Antoniadis et al. 2023a), though we note that other sources have been proposed as a source of the GWB that can also produce a turnover spectrum (Antoniadis et al. 2023a;Afzal et al. 2023).This turnover spectrum can be modeled using two power laws with different spectral indices and a bend frequency where the transition occurs.
Typically, the GWB is assumed to be a GP.However, if SMBHBs are the source of the signal, it is possible that the occupation fraction of SMBHBs emitting in each frequency bin is too small and will cause a deviation from the Gaussianity assumption.To account for this possibility, we can apply a weight factor to each frequency of the standard power law that is distributed by a Student's t-distribution, modeling non-Gaussian deviations from the power-law spectrum.This model is known as a "t-process" (Agazie et al. 2023b).A localized peak in the spectrum will result in a large weight measured at that frequency.
The forthcoming IPTA-DR3 combined data set will allow for a precise search for turnovers or deviations away from a pure power law in the GWB spectrum.However, we can make a pseudo-IPTA combination using the fast spectral refit methods of Lamb et al. (2023), where we can estimate the recovered spectral index and amplitude by simultaneously refitting to the published free spectral posteriors of the three data sets (see Figure 1).We used the ceffyl (Lamb et al. 2023) software package to represent each posterior with highly optimized kernel density estimators (KDEs).For a given set of spectral parameters, we computed the probability density of the GWB spectral model at each frequency according to the KDE.The effective likelihood is simply the product of these probability densities over frequency bins.We confirmed that the posterior recovered when fitting a power law using this method is consistent with the joint posterior shown in Section 4.1.
The posteriors for this joint Bayesian fitting are shown in Figure 8.We recovered consistent posteriors across the turnover, t-process, and power-law models, which suggests that no spectral features beyond a power law are favored by the data.The turnover spectrum had a broadened posterior, showing more support for the γ = 13/3 power law.However, this was because the weak constraints on the first frequency bins of each data set allowed some support for a turnover near the low-frequency cutoff, and hence a steeper, higherfrequency spectral index than for a pure power law.The tprocess did not show significant support for any deviations away from the power law, except at high frequencies, where the spectrum was white noise dominated.There are some caveats to this result.As explained previously, each PTA used different methods to model their noise sources, such as DM variations.In addition, some pulsars were included in two or more of the data sets and are therefore being "double counted" in this pseudo-IPTA combination.

Extending Individual PTAs Using Factorized Likelihood
The flexibility and speed of the factorized likelihood method introduced in Section 3.2 provide an immediate approach to extending individual PTAs by sequentially adding pulsars observed by one PTA to those from another (being careful to only have one version of each pulsar) to achieve a pseudo-IPTA data set.We did this pseudo-IPTA construction in a piecewise fashion, where we first chose a "base" PTA data set and then added in pulsars from other data sets that are not timed by the base data set.
Since we only add the pulsars not already present in the data set, this addition operation is not commutative.For example, adding the new pulsars from EPTAInPTA to NANOGrav means that the pulsars in common will use NANOGrav data.The numbers of overlapping pulsars for each PTA are shown in Figure 9, where EPTAInPTA is represented by the blue circle, NANOGrav by the orange circle, and PPTA by the green circle.Combining all three data sets results in six different permutations to consider.This is similar to the approach of C. M. F. Mingarelli et al. (2024, in preparation) to combine individual pulsars from different PTAs in "lite" combination.
The effect of adding pulsars within the factorized likelihood analysis, producing a pseudo-IPTA data set, can be seen in the set of CURN posteriors that are shown in black in Figure 4.In each case, adding pulsars drawn from the other PTAs resulted in a more constrained CURN amplitude posterior than given by the original PTA alone, with a median thinning of the 68% credible interval by 15%.As a direct result of this, the Bayes factor for CURN over intrinsic RN alone also increased, with a median increase of 7 orders of magnitude.This is consistent with the findings of the IPTA-DR2 analysis (Antoniadis et al. 2022), where improvements to parameter estimation and detection significance are observed when PTA data sets are extended.
Starting from each of the factorized likelihood CURN posteriors from the above analysis, we again used the NMOS to determine the S/N distributions for each pseudo-IPTA data set.This method shows how adding in pulsars from other PTAs affects the significance of correlated power.
Figure 6 shows each individual PTA's NMOS S/N recovery alongside each possible pseudo-IPTA data set.These S/N distributions show that it did not matter which PTA we started with; adding additional pulsars will always result in a higher S/ N.This is consistent with scaling relations for the optimal statistic found in Siemens et al. (2013) and shows the strong promise of a full data combination.
Comparing the different combinations with each other, we see that there is a preference to the ordering of PTA additions.
Starting with NANOGrav data first in the process results in the highest S/Ns, and adding EPTAInPTA data before PPTA data results in higher S/Ns.These preferences are consistent with each PTA's individual S/N.

Fully Extended PTA Analysis
The extended analyses described in the previous section combined data using only one PTA's version of each pulsar at a time.It is possible to combine the full output of each PTA if extra care is taken to account for the fact that multiple copies of the same pulsar appear.This analysis is akin to a CURN factorized likelihood, where each PTA's CURN output is combined into a single joint posterior that is in turn passed to the NMOS.
There are two major complications.First, since the NMOS is a cross-correlation statistic, the correlations between the same pulsar as found in different PTA data sets must be ignored.Additionally, when we marginalize over the CURN and noise parameters, we must use a carefully constructed joint posterior that ensures that duplicate pulsars share common noise parameters.
This process begins with new CURN MCMC analyses for each PTA using the standardized noise model described in Section 3.2.The normalized product of the 2D marginalized posteriors for the CURN amplitude and spectral index from each PTA is the joint posterior, which we modeled as a KDE.We drew 10,000 samples from this KDE to use for marginalization.For each CURN sample we must determine the corresponding noise parameters for each pulsar.We did this by finding the nearest neighbor, i, in each PTA's MCMC samples  This process resulted in many duplicate MCMC samples, due to the restricted volume of the joint parameter space.We opted to use only unique samples.
The dashed box and whiskers of Figure 10 show the single PTA NMOS S/N for each PTA when using the new joint CURN posterior.While EPTAInPTA and NANOGrav show little change from the single PTA CURN analyses, PPTA shows an increased S/N.The PPTA CURN posterior had more mass at high spectral indices even compared to the PPTA HD posterior, whereas the joint posterior was much more concentrated near γ ∼ 4. When the high-γ region of parameter space was excluded from the PPTA analysis, more significant correlations were found.The dashed box and whiskers are more comparable to Figure 6, which used fixed spectral index γ = 13/3, than the solid ones.
The black box and whiskers of Figure 10 show the S/N for the fully extended data set using all pulsars from all PTAs.Here we found a notable increase in the S/N compared to any individual PTA and compared to the best single pulsar extension from Section 6.2.
We stress that it is challenging to understand what this increase in S/N means for detection significance, as the background distribution for the NMOS is known to be non-Gaussian (Hazboun et al. 2023).Determining the false-alarm probability for this new, ad hoc data extension would require many data simulations that are beyond the scope of this work.

Conclusions
We performed a comparison of the noise models used and the properties of the GWB recovered in the most recent data sets of EPTA+InPTA, NANOGrav, and PPTA.We found that a majority of the noise parameters were consistent between the three different data sets.Where there were significant (>3σ) differences, they could be attributed to different time spans and cadences, as well as the lack of frequency coverage in individual PTAs.These tensions also reduced significantly when standardized noise models were adopted.We also calculated and compared the relative sensitivity of the three data sets using sensitivity curves, showing that the different levels of reported evidence for the GWB were consistent with the sensitivities of each of these data sets.
Despite the different noise models used by each PTA, the GWB posterior distributions for all three PTA data sets were consistent within 1σ as calculated by the tension metric.The GWB spectra measured from all three PTA data sets were consistent with a single, "joint" power-law spectrum, with no evidence for deviations from this power-law spectral template.
Finally, we extended each of the three data sets using the factorized likelihood approach, to achieve a pseudo-IPTA analysis.The amplitudes of the CURN process estimated from a global posterior using different permutations of PTA extensions were found to be broadly consistent with each other, as well as with those reported by the individual PTAs.The addition of pulsars to any PTA also resulted in an increase in the measured HD S/N.We also performed a fully extended analysis using all of the pulsars from all of the PTAs, yielding a further increase in S/N.
The members of the IPTA, along with the MeerKAT PTA (Miles et al. 2023), are currently in the process of combining their most recent data sets, which will become IPTA-DR3.The comparisons presented here motivate improvements and best practices to be adopted in a unified analysis for the ongoing data combination by the IPTA.Based on our work, we believe that choosing the right noise model for each pulsar through Bayesian model selection will be an important step for future IPTA analyses.In particular, great care must be taken with the pulsars J0030+0451, J0613−0200, J1022+1001, J1640 +2224, J1713+0747, J1744−1134, and J2124−3358, where discrepancies between PTAs persisted when using the standardized noise model.While the results presented here adopted a much simplified approach as opposed to a true combination, these already hint at an enhancement in the significance of GWB detection in the full DR3 over those reported by EPTA+InPTA, NANOGrav, and PPTA individually.

Figure 1 .
Figure1.Left: free spectral posteriors for each PTA showing the measured HD-correlated GWB power in several frequency bins under no spectral shape assumption.Each PTA used a different Fourier basis set by their own maximum observing time.The semitransparent gray lines are 100 samples from the joint 2D power-law posterior distribution, showing the spread of power-law models that are consistent with all of the PTA's data.Right: 2D posterior for HD-correlated power-law GWB parameters.Contours show 68%, 95%, and 99.7% of the posterior mass.The vertical dotted line is at γ = 13/3.

Figure 2 .
Figure 2. Difference distributions for GWB parameters between pairs of PTAs as computed by tensiometer.The contours show 68% and 95% of the distribution mass.

Figure 3 .
Figure 3.Estimated sensitivity to the characteristic strain induced by a GWB as a function of GW frequency.The dashed line shows a power-law spectrum as determined by the joint 2D power-law posterior median from the right panel of Figure 1.

Figure 4 .
Figure 4. Amplitudes of CURN recovered using the factorized likelihood method.Extended data sets are also shown, where the pulsars of one data set are added to another, without repeating pulsars.The boxes contain 68% of the distribution mass, and the center line marks the median.The whiskers contain 95% of the distribution mass.

Figure 5 .
Figure 5. Left: the distribution of DFs for each PTA.DF is a measure of a pulsar's support for (>1) or against (<1) CURN.The majority of pulsars have a DF of 1, neither supporting nor rejecting the CURN seen by the rest of the array.Right: DFs for pulsars that are observed by more than one PTA, allowing for data set comparison under the standardized noise model.There are serious discrepancies for many of the pulsars, indicating the importance for fully combining the data set in IPTA-DR3.

Figure 7 .
Figure 7. Left: time-domain realizations of achromatic RN for PSR J1012+5307 measured with EPTA+InPTA and NANOGrav, using the standardized noise models.Right: time-domain realizations of achromatic RN for PSR J0613−0200 measured with EPTA+InPTA, NANOGrav, and PPTA, also using the standardized noise models.The plot at the right displays 2D marginalized posterior of the RN amplitude A RN and spectral slope γ RN (68%, 95%, 99.7% contours).For both time-domain figures, the colored areas represent 2σ credible intervals obtained from 300 realizations, and colored circles show the medians at observation epochs.

Figure 8 .
Figure8.Simultaneously fitting GWB spectral models to the free spectra of the three data sets.Left: 68% and 95% credible regions for the recovered GWB amplitude and spectral index for a power law, turnover, and t-process model.They are all consistent; however, the turnover model is significantly less constrained and has more support for a γ = 13/3 process.Right: the free spectral posteriors from each data set overlaid by power laws constructed from random draws from the ceffylgenerated posteriors.Colors correspond to the models in the left panel.

Figure 9 .
Figure 9.An unweighted Venn diagram showing the overlapping pulsars between each PTA's data sets.Note that for the factorized likelihood analyses four pulsars were dropped from the PPTA DR3 owing to baselines for those pulsars being shorter than 3 yr, but they are included in this diagram for completeness.

Figure 10 .
Figure10.The NMOS distributions of S/N for each single PTA and the fully extended combination of PTAs represented as box and whiskers.The boxes contain 50% of the distribution mass, and the center line marks the median.The whiskers contain 95% of the distribution mass.The colored solid ones were made using single PTA CURN runs.The dashed ones were made using single PTAs, but with the new joint posterior.Finally, the black box and whiskers show the fully extended analysis.

Table 1
Tension Metric (in Gaussian-equivalent "σ" Units) between the Data Sets for Noise Model Parameters, Sorted by Observing PTA