A publishing partnership

The following article is Free article

ARE WE THERE YET? TIME TO DETECTION OF NANOHERTZ GRAVITATIONAL WAVES BASED ON PULSAR-TIMING ARRAY LIMITS

, , , , , and

Published 2016 February 23 © 2016. The American Astronomical Society. All rights reserved.
, , Citation S. R. Taylor et al 2016 ApJL 819 L6DOI 10.3847/2041-8205/819/1/L6

2041-8205/819/1/L6

ABSTRACT

Decade-long timing observations of arrays of millisecond pulsars have placed highly constraining upper limits on the amplitude of the nanohertz gravitational-wave stochastic signal from the mergers of supermassive black hole binaries (∼10−15 strain at f = 1 yr−1). These limits suggest that binary merger rates have been overestimated, or that environmental influences from nuclear gas or stars accelerate orbital decay, reducing the gravitational-wave signal at the lowest, most sensitive frequencies. This prompts the question whether nanohertz gravitational waves (GWs) are likely to be detected in the near future. In this Letter, we answer this question quantitatively using simple statistical estimates, deriving the range of true signal amplitudes that are compatible with current upper limits, and computing expected detection probabilities as a function of observation time. We conclude that small arrays consisting of the pulsars with the least timing noise, which yield the tightest upper limits, have discouraging prospects of making a detection in the next two decades. By contrast, we find large arrays are crucial to detection because the quadrupolar spatial correlations induced by GWs can be well sampled by many pulsar pairs. Indeed, timing programs that monitor a large and expanding set of pulsars have an ∼80% probability of detecting GWs within the next 10 years, under assumptions on merger rates and environmental influences ranging from optimistic to conservative. Even in the extreme case where 90% of binaries stall before merger and environmental coupling effects diminish low-frequency gravitational-wave power, detection is delayed by at most a few years.

Export citation and abstractBibTeXRIS

1. INTRODUCTION

With the recent first discovery of gravitational waves (GWs) in the kilohertz frequency band, we are now firmly in the era of GW astronomy (Abbott et al. 2016). For the last 10 years, three international collaborations have been collecting precise timing observations of the most stable millisecond pulsars, with the aim of identifying the imprint of gravitational waves (GWs) in the nanohertz frequency band (Hobbs 2013; Kramer & Champion 2013; McLaughlin 2013). The best-motivated GW source for such pulsar-timing arrays (PTAs) is the stochastic background signal from the cosmological population of gravitationally bound supermassive black hole binaries (SMBHBs; Rajagopal & Romani 1995; Jaffe & Backer 2003; Wyithe & Loeb 2003) that are expected to form at the centers of galaxies after these merge (Sesana et al. 2004, 2008). If the binary inspirals are driven purely by GW radiation reaction at orbital separations corresponding to the PTA frequency band, the resulting timing-residual signal has a power-law spectrum with a characteristic exponent,

where hc(f) is the characteristic strain spectrum of the GW background (GWB). Theoretical expectations for the GW amplitude AGW center around a few 10−15 (Sesana 2013; McWilliams et al. 2014; Ravi et al. 2014). Recent observational upper limits from the three international pulsar-timing consortia (Arzoumanian et al. 2015a; Lentati et al. 2015; Shannon et al. 2015) imply AGW ≲ 10−15 at 95% confidence. These limits appear to be in contrast with theoretical expectations, so much so that it seems plausible that environmental effects in galactic centers either stall the formation of gravitationally bound SMBHBs or accelerate their inspiral; in both cases, the effective AGW is reduced at the frequencies where PTAs are most sensitive (Arzoumanian et al. 2015a; Shannon et al. 2015).

Indeed, the PTA detection of the stochastic GWB from SMBHBs, considered imminent only a few years ago (Siemens et al. 2013; McWilliams et al. 2014), now seems to be receding toward the future. Is this really the case? In this Letter, we answer this question quantitatively. Namely, given the upper limit Aul = 10−15 obtained by the Parkes PTA (PPTA; Shannon et al. 2015), we ask when we can expect to make a positive detection using different PTAs: the PPTA (Hobbs 2013), limited to the four low-timing-noise pulsars used for the upper limit; the North American Nanohertz Observatory for Gravitational-waves (NANOGrav; McLaughlin 2013); the European PTA (EPTA; Kramer & Champion 2013); the International PTA (IPTA; see, e.g., Hobbs et al. 2010); and the pulsar-timing project that will be supported by SKA1 (Janssen et al. 2015).

We adopt the frequentist formalism developed by Rosado et al. (2015), and we characterize the PTAs simply by listing, for each pulsar, the duration of the observation and the levels of measurement and timing noise. While GW searches and upper limits with PTAs have recently been given a Bayesian treatment (van Haasteren et al. 2009; Lentati et al. 2013; van Haasteren & Levin 2013), the frequentist approach based on optimal statistics is both convenient and appropriate for the purpose of this paper, because it dispenses with the simulation of actual data sets, and because the quantification of detection probability (DP) is intrinsically a frequentist statement (see, e.g., Vallisneri 2012). Furthermore, experience shows that frequentist upper limits and detection prospects are rather close to Bayesian results (see, e.g., Arzoumanian et al. 2015a).

We proceed as follows: for a specified PTA and true GWB amplitude (Atrue), we compute the DP as a function of observation time, using the detection statistic described below. Detection is a probabilistic endeavor since the actual realization of measurement and timing noise may obscure or expose the underlying GW signal. Indeed, we do not currently have access to Atrue, but only to the observed upper limit Aul. Setting upper limits is also a probabilistic endeavor because different realizations of noise lead to different Aul given the same Atrue, so we use the upper limit optimal statistic (also described below) to compute . By introducing an astrophysically motivated prior p(Atrue), we can then use Bayes's theorem to obtain . Finally, we obtain the expected DP(Aul) as .5

We conclude that, while small arrays of a few low-timing-noise pulsars (represented here by the Shannon et al. 2015 configuration) provide the most constraining upper limits on the nanohertz GWB, they are suboptimal for detection in comparison to larger arrays such as NANOGrav, EPTA, full PPTA, and the IPTA, which are expanded regularly with newly discovered pulsars. Indeed, detection and upper limits have rather different demands, and the PPTA itself may employ a ∼20-pulsar array for detection in the near future (Reardon et al. 2016). Encouragingly, we find that larger PTAs have ∼80% probability of making a detection in the next 10 years, even allowing for a reduced GWB signal due to significant binary stalling or environmental influences.

2. STATISTICS

The details of the frequency-domain statistical framework used here can be found in Rosado et al. (2015). In the frequentist context, a statistic X is a single number that summarizes the (noisy) data with respect to the presence of a (GW) signal, so that, on average, a higher Atrue yields a higher X, with fluctuations due to the range of possible noise realizations. Given a set of data, there is one observed , and setting an upper limit amounts to stating that if Atrue were equal to for 95% of noise realizations, the observed X would have been higher than .

Detection schemes based on statistics, on the other hand, proceed as follows: one considers separately the case where a signal is present in the data at a certain level, and the case where it is not, and computes the probability distribution of the statistic (over the ensemble of noise realizations) in both cases. The zero-signal distribution is used to set a detection threshold as a function of a false-alarm probability (FAP); the signal distribution is used to compute the probability that for a certain Atrue the statistic will exceed the threshold—that is, the DP (or detection efficiency).

In the following, we use cross-correlation statistics for both upper limit and detection considerations, making use of the correlated influence of a GWB signal on pulsars that are widely separated on the sky.

Upper limits—For the discussion of upper limits in this work, we will use the so-called A-statistic of Rosado et al. (2015) modified so that the standard deviation is measured in units of squared GWB amplitude and that the value of the statistic is equal to the injected GWB amplitude on average, as is done in Chamberlin et al. (2015). Assuming that the distribution of this statistic is Gaussian (a reasonable approximation at low signal-to-noise ratios; Chamberlin et al. 2015), we obtain

with

where denotes a sum over frequencies fk, denotes a sum over pulsar pairs, Γij is the overlap reduction function (i.e., this is the Hellings & Downs 1983 curve for an isotropic GWB) for pulsars i and j, is the modeled cross-power spectral density, and is the intrinsic noise power spectral density. From this Gaussian probability distribution, it can be shown that

where is the measured value of the GWB amplitude and C is our upper limit confidence (e.g., 95%). Thus, the distribution is a Gaussian distribution with standard deviation σ0 and mean . Note that to compute we perform a coordinate transformation to obtain = .

Detection—In the previous section, we used the A-statistic for upper limits in order to compare our results with upper limits computed using the statistic presented in Chamberlin et al. (2015). For assessing our DP, however, we will make use of the B-statistic of Rosado et al. (2015). In this case, we wish to determine the DP

where α0 is the FAP; μ1, σ1, and σ0 (distinct from that used previously) are the mean in the presence of a signal and the standard deviation in the presence (subscript 1) and in the absence (subscript 0) of a signal (see Equations (A16)–(A18) of Rosado et al. 2015 for more details). Both μ1 and σ1 grow with increasing signal amplitude. Here, we choose an FAP of 0.13% to match the 3σ detection threshold of Siemens et al. (2013).6

Implementation—In the work of Rosado et al. (2015), it is argued that timing model subtraction (namely, subtraction of the quadratic term of the timing model; due to pulsar spin down) can be emulated by not including the lowest frequency in the sum over k frequencies. However, we have found that these analytic results agree much better with simulations including timing model subtraction if all frequencies are included in the sum. Furthermore, to emulate data sets with different time spans, we only include frequencies in the sum that are greater than 1/Tmin, where Tmin is the shortest time span for the given pulsar pair.

In this work, we always include the GWB power spectral density in the intrinsic pulsar noise Pi. This is meant to mimic a real optimal statistic analysis where the intrinsic noise would be based on single-pulsar analyses in which there is no way to distinguish the intrinsic noise from the GWB power. This will make our analysis conservative in the sense that we are overestimating the intrinsic noise.

3. RESULTS AND DISCUSSION

In the following, we characterize detection prospects in terms of small and large PTAs, and specifically consider the following five configurations:

  • 1.  
    PPTA4, consisting of the four pulsars described in Shannon et al. (2015) with associated measurement (white) and timing (red) noise (Reardon et al. 2016; Shannon et al. 2015).
  • 2.  
    NANOGrav+, consisting of the 37 pulsars described in Table 3 of Arzoumanian et al. (2015b) with their associated noise properties, to be expanded in future observations by adding four new pulsars each year with 250 ns TOA measurement noise, in line with current expectations.
  • 3.  
    EPTA+, consisting of the 42 pulsars described in Table 3 of Caballero et al. (2015) with their associated noise properties, and regularly adding pulsars as in (2). Some of these additional pulsars are monitored with the Large European Array for Pulsars (LEAP; Kramer & Champion 2013; Bassa et al. 2016), which synthesizes a 194 m steerable dish from the five EPTA telescopes.
  • 4.  
    A conservative IPTA+, consisting of the 49 pulsars of the first IPTA data release, described in Verbiest et al. (2016) with measurement and timing noise7 properties as in Lentati et al. (2016), and expanding by six new pulsars each year, again with 250 ns TOA measurement noise.
  • 5.  
    A theorist's PTA (TPTA), consisting of the toy configuration of Rosado et al. (2015), which may be supported by an advanced radio telescope such as SKA1: 50 pulsars with 100 ns TOA measurement noise and no intrinsic timing noise.

These configurations are summarized in Table 1.8

Table 1.  PTAs Considered, as Characterized in Our Analysis

PTA Configuration Number of Pulsars
PPTA4 PPTA (Shannon et al. 2015) 4
NANOGrav+ NANOGrav 37 + 4 new/year with 250 ns
  (Arzoumanian et al. 2015b) TOA measurement precision
EPTA+ EPTA (Caballero et al. 2015) 42 + 4 new/year with 250 ns
    TOA measurement precision
IPTA+ IPTA Data Release 1 49 + 6 new/year with 250 ns
  (Verbiest et al. 2016) TOA measurement precision
  (Lentati et al. 2016)  
Theorist's PTA 100 ns TOA measurement precision 50

Download table as:  ASCIITypeset image

We consider four combined estimates for the amplitude and spectral shape of the stochastic GWB from SMBHBs, ranging from more optimistic (detection wise) to more conservative. For the amplitude, we adopt the observationally motivated lognormal distribution of Sesana (2013) with mean and standard deviation of 0.22; however, we also consider the case where 90% of binaries require longer than ∼9–13.7 Gyr before reaching the PTA band (i.e., the binaries stall; McWilliams et al. 2014; S. Burke-Spolaor & J. Simon 2015, private communication), which corresponds to a mean , with the same standard deviation. For each of these amplitude priors, we examine the purely GW-driven power-law spectrum of Equation (1), as well as the case where the GW strain has a turnover at f = 1/(11 year) caused by SMBHB interactions with stars in galactic nuclei, which accelerate binary inspiral and remove low-frequency power with respect to the pure power law (see, e.g., Arzoumanian et al. 2015a and Sampson et al. 2015). The 1/(11 year) turnover frequency was chosen to lie just beyond the low-frequency reach of the current PPTA data set since the Shannon et al. (2015) analysis still found ∼9% consistency with the purely GW-driven model of Sesana (2013) for f ≥ 1/(11 year). Similarly, the recent NANOGrav (Arzoumanian et al. 2015a) analysis finds 20% consistency using a nine-year data set (Arzoumanian et al. 2015b). Furthermore, our choice of a turnover at 1/(11 year) corresponds to reasonable assumptions about the density of stellar populations interacting with SMBHBs in galactic nuclei (see Figures 10 and 11 of Arzoumanian et al. 2015a). At very low frequencies, the frequency scaling of the timing-residual spectrum becomes S(f) ∝ 1/f due to the environmental influences.

In the top panel of Figure 1, we show the probability distribution obtained by setting the 95% A-statistic upper limit to the Shannon et al. (2015) value, 1 × 10−15, and by assuming no-stalling (solid blue line) and 90% stalling (faded solid red line) amplitude priors, with pure power-law true spectra. The dashed lines correspond to turnover spectra.9 For comparison, in the bottom panel of Figure 1, we show also as computed for NANOGrav's optimal-statistic upper limit 1.3 × 10−15 (Arzoumanian et al. 2015a). The underlying astrophysically motivated priors for no-stalling and 90% stalling are shown in each panel as blue and red dash-dotted lines, respectively.

Figure 1. Refer to the following caption and surrounding text.

Figure 1. The normalized probability distribution for the SMBHB GW amplitude Atrue, given upper limits from the PPTA (Shannon et al. 2015) and a large PTA that regularly adds pulsars (specifically NANOGrav; Arzoumanian et al. 2015a). The darker blue curves assume an amplitude prior based on Sesana (2013); the faded red curves modify it by assuming 90% binary stalling, reducing the amplitude -fold. The dashed curves reflect turnover spectra due to binary stellar hardening. The dash-dotted curves correspond to the astrophysically motivated priors on the amplitude for no-stalling (blue) and 90% stalling (red).

Standard image High-resolution image

Next, we compute the B-statistic GW DPs, Equation (5), for each PTA, as a function of Atrue and of the observation time T beyond current data sets. The resulting DPs are shown in Figures 2: the left and right columns correspond to the pure power-law and turnover spectra, respectively.10

Figure 2. Refer to the following caption and surrounding text.

Figure 2. Detection probabilities (DPs) for all arrays considered in this work, as a function of the true GW background amplitude Atrue and of observation time T beyond the existing data set. The left panels were derived for pure power-law GW background spectra, and the right panels for turnover spectra with a knee at f = 1/(11 years) due to stellar hardening.

Standard image High-resolution image

As mentioned above, for NANOGrav+/EPTA+ and IPTA+ we add new pulsars at rates of four and six per year, respectively. This annual expansion has a positive impact on the DP since it provides more independent pulsar pairs at differing angular separations, which help discriminate GWs with quadrupolar spatial correlations (the Hellings & Downs 1983 curve, or the more general anisotropic signatures discussed by Mingarelli et al. 2013 and Gair et al. 2014) from non-spatially correlated red-noise processes (Siemens et al. 2013). The unique GW correlation signature is the smoking gun for detection, and we must coordinate future efforts to maximize its measurement.

We obtain the expected DPs as a function of time by integrating DP (Atrue, T) against the curves of Figure 1 (top panel). The resulting are shown in Figures 3. As in Figure 1, the darker blue and lighter red curves correspond to no-stalling and 90% stalling amplitude priors. For the PPTA4 array, expected DPs remain below 10% throughout the next 20 years of observation, even in the most optimistic GW-signal scenario. This is unsurprising—a PTA consisting of a few exquisitely timed pulsars may provide very tight upper limits, but it will not usually yield convincing detection statistics, which require an array of many pulsars to map out the expected spatial correlation signature.

Figure 3. Refer to the following caption and surrounding text.

Figure 3. Summary of results for the growth of GWB detection probability (DP) with further observation time in each array. Blue and red lines are for zero and 90% binary stalling, respectively. Dashed lines correspond to the case where the true background spectrum has a turnover at f = 1/(11 years) due to binary stellar hardening.

Standard image High-resolution image

By contrast, large pulsar arrays such as NANOGrav+, EPTA+, and IPTA+ provide high DPs even with strong binary environmental couplings since they allow the quadrupolar spatial correlation signature of a GWB to be mapped by many different pulsar pairings. We expect these results to be qualitatively the same for a full PPTA that regularly adds pulsars to the array. In NANOGrav+, EPTA+, and IPTA+, DPs begin to grow rapidly after only five years of observation beyond current data sets. Binary stalling (the red curves) stunts this growth by three years at most. While the low-frequency turnover reduces DP, its effect is mitigated by the large number of pulsars and the annual catalog expansion.11

Finally, we see that the large number of well-timed pulsars in the TPTA builds highly convincing DPs after only a few years of operation. By the time the pulsars have been observed long enough that the influence of the turnover at f = 1/(11 year) may be noticeable, the DP is already close to unity. The same is true were we to add timing noise at currently known levels to these TPTA pulsars—the DP will already be close to unity when low-frequency timing noise begins to have a significant influence.

We stress that there are several important caveats to our analysis. We have focused on detecting the stochastic background rather than deterministic signals. Even for these, using data from more pulsars is desirable in order to build evidence for a coherent signal. We have assumed for each pulsar a single value of measurement noise that does not improve over time (as occurs in reality when receivers and backends are upgraded), nor do we consider interstellar medium effects such as dispersion (see, e.g., Stinebring 2013). The influence of a GWB spectral turnover depends on its frequency, which is a function of the typical environmental properties of galactic nuclei (either directly or in how these properties drive SMBHB orbital eccentricity)—our choice corresponds to reasonable assumptions about the stellar mass density of SMBHB environments. The timelines for the growth of detection in each PTA are approximate, intended to emphasize the differences between PTAs suited to upper limits versus detection and to demonstrate the influence of various binary stalling and environmental scenarios on DPs.

We conclude by emphasizing the different demands of placing stringent upper limits on the stochastic background versus actually detecting it.

  • 1.  
    Highly constraining, astrophysically significant upper limits are achievable with only a few exquisitely timed pulsars, but such a PTA is suboptimal for the detection of a stochastic GWB.
  • 2.  
    Timing many pulsars allows for the quadrupolar spatial correlation signature of the SMBHB background to be sampled at many different angular separations, enhancing prospects for detection.
  • 3.  
    Adding more pulsars regularly to PTAs will continually improve DP, in addition to the gains already made by timing existing pulsars for longer, and will help to mitigate the deleterious influences of binary stalling and environmental couplings.

It is our pleasure to thank Pablo Rosado, Alberto Sesana, Jonathan Gair, Lindley Lentati, Sarah Burke-Spolaor, Xavier Siemens, Maura McLaughlin, Joseph Romano, and Michael Kramer for very useful suggestions. We also thank the full NANOGrav collaboration for their comments and remarks. S.R.T. was supported by an appointment to the NASA Postdoctoral Program at the Jet Propulsion Laboratory, administered by Oak Ridge Associated Universities through a contract with NASA. M.V. acknowledges support from the JPL RTD program. J.A.E. and R.v.H. acknowledge support by NASA through Einstein Fellowship grants PF4-150120 and PF3-140116, respectively. C.M.F.M. was supported by a Marie Curie International Outgoing Fellowship within the European Union Seventh Framework Programme. This work was supported in part by National Science Foundation Physics Frontier Center award No. 1430284 and by grant PHYS-1066293 and the hospitality of the Aspen Center for Physics. This research was performed at the Jet Propulsion Laboratory, under contract with the National Aeronautics and Space Administration.

Footnotes

  • By assuming a prior, we are introducing a Bayesian element in a frequentist scheme, but this is necessary because we wish to make a statement about a range of astrophysical possibilities, which are all compatible to varying degrees with the observed upper limit.

  • This choice is clear if one notices that the 3σ range is a two-sided confidence limit; thus, 0.27% of the probability density function is outside of this range. To convert to a standard FAP, we need the one-sided limit that is simply 0.27/2 ∼ 0.13.

  • For IPTA pulsars, timing noise is taken to consist only of red spin noise, and not of any other system- or band-specific red noise. The latter components may be isolated with multi-system and multi-frequency observations, while the former is completely conflated with a GWB signal in the absence of cross-correlation measurements.

  • We rescale PPTA, NANOGrav, EPTA, and IPTA measurement noises (but not the timing noise) by a common factor for each PTA, so that the corresponding A-statistic upper limits match the results of Shannon et al. (2015), Arzoumanian et al. (2015a), Lentati et al. (2015), and Verbiest et al. (2016); this rescaling has little impact on DP in the future since white measurement noise becomes subdominant to red timing noise at low frequencies.

  • When analyzing these latter cases, we use the "wrong" spectrum—a pure power law—for the Pi in Equation (3), for consistency with existing analyses that assume GW-driven inspirals in deriving limits. As discussed in Rosado et al. (2015), doing so has negligible effects on the distribution of the A-statistic.

  • 10 

    In computing the B-statistic, we do assume that the analysis accounts correctly for the true spectral shape. Again, this choice has minimal effects.

  • 11 

    Shannon et al. (2015) advocate performing observations with higher cadence to improve sensitivity at f ≥ 0.2 yr−1, where the GWB would be less affected by environmental couplings. In fact, higher-cadence observations would improve sensitivity across all frequencies, but not if they come at the cost of reducing the number of monitored pulsars because of limited observing-time allocations. Furthermore, GW searches are actually sensitive to the spectrum of timing residuals rather than of strain itself. The two are related by , so even a turnover spectrum will leave a steep red-noise signature in the pulsar-timing residuals.

10.3847/2041-8205/819/1/L6
undefined