Brought to you by:
Paper

The NINJA-2 project: detecting and characterizing gravitational waveforms modelled using numerical binary black hole simulations

, , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , , and

Published 20 May 2014 © 2014 IOP Publishing Ltd
, , Citation J Aasi et al 2014 Class. Quantum Grav. 31 115004 DOI 10.1088/0264-9381/31/11/115004

0264-9381/31/11/115004

Abstract

The Numerical INJection Analysis (NINJA) project is a collaborative effort between members of the numerical relativity and gravitational-wave (GW) astrophysics communities. The purpose of NINJA is to study the ability to detect GWs emitted from merging binary black holes (BBH) and recover their parameters with next-generation GW observatories. We report here on the results of the second NINJA project, NINJA-2, which employs 60 complete BBH hybrid waveforms consisting of a numerical portion modelling the late inspiral, merger, and ringdown stitched to a post-Newtonian portion modelling the early inspiral. In a 'blind injection challenge' similar to that conducted in recent Laser Interferometer Gravitational Wave Observatory (LIGO) and Virgo science runs, we added seven hybrid waveforms to two months of data recoloured to predictions of Advanced LIGO (aLIGO) and Advanced Virgo (AdV) sensitivity curves during their first observing runs. The resulting data was analysed by GW detection algorithms and 6 of the waveforms were recovered with false alarm rates smaller than 1 in a thousand years. Parameter-estimation algorithms were run on each of these waveforms to explore the ability to constrain the masses, component angular momenta and sky position of these waveforms. We find that the strong degeneracy between the mass ratio and the BHs' angular momenta will make it difficult to precisely estimate these parameters with aLIGO and AdV. We also perform a large-scale Monte Carlo study to assess the ability to recover each of the 60 hybrid waveforms with early aLIGO and AdV sensitivity curves. Our results predict that early aLIGO and AdV will have a volume-weighted average sensitive distance of 300 Mpc (1 Gpc) for 10M + 10M (50M + 50M) BBH coalescences. We demonstrate that neglecting the component angular momenta in the waveform models used in matched-filtering will result in a reduction in sensitivity for systems with large component angular momenta. This reduction is estimated to be up to ∼15% for 50M + 50M BBH coalescences with almost maximal angular momenta aligned with the orbit when using early aLIGO and AdV sensitivity curves.

Export citation and abstract BibTeX RIS

1. Introduction

A network of second-generation laser interferometric gravitational-wave (GW) observatories is presently under construction. The US-based Advanced Laser Interferometer Gravitational Wave Observatory (aLIGO) [1] is expected to have its initial observing run in 2015 utilizing observatories in Hanford, Washington and Livingston, Louisiana (denoted 'H' and 'L', respectively). aLIGO will then work towards reaching design sensitivity, expected in 2018–20 [2]. The French–Italian Advanced Virgo (AdV) observatory [3, 4] (denoted 'V') is expected to follow shortly after the aLIGO instruments. The cryogenically cooled KAGRA observatory [5, 6] and a India-based aLIGO facility [7, 8] are due to begin operations around 2020, providing a 5-site network to explore the GW sky in detail.

These second-generation observatories will have an order of magnitude increase in sensitivity over their first-generation counterparts and will be sensitive to a broader range of GW frequencies [1, 4, 6]. One of the primary observational targets for this global network is the inspiral, merger and ringdown of a binary system containing two black holes (BHs) [9]. With aLIGO and AdV operating at their final design sensitivities it is expected that 0.4–1000 binary BH (BBH) coalescences will be observed per year of operation [10]. Directly observing the collision of two BHs will allow GW astronomers to understand the physics of BH spacetimes and to explore the strong-field conditions of the theory of general relativity [11].

Exploring the underlying mass and spin distributions of stellar-mass BHs can tell us a great deal about the end stages of massive-star evolution. The mass measurements of compact objects made to date suggest a gap between the most massive neutron stars (≲ 3M) [12] and the least massive BHs (≳ 5 M) [13]. It is still an open question as to whether this gap is real and the result of formation mechanisms, or simply due to observational biases [14]. Whether or not ground-based detectors will be able to distinguish between these regions in mass space is of great interest. Furthermore, from the distributions of BH spin magnitudes and tilts (orientation of the spin relative to the orbital angular momentum), more can be learned about supernovae kicks and compact binary formation environments. Stellar-mass BH spin measurements are currently done by modelling either the accretion disc's thermal continuum x-ray spectrum, or the profile of the broadened Fe Kα line [15]. Both methods are fundamentally based on assumptions about the location of the inner edge of the accretion disc, and also depend sensitively on very complicated physical models of the disc and its emission. GWs will provide an entirely new method of measuring BH spin which does not require the complicated modelling of accretion disc physics.

Of the stellar-mass BH angular momenta (spin) measurements made to date, half are found to have a magnitude a ≳ 0.8 [15]. With BBH observations, aLIGO and AdV will be able to provide independent measurements of the BH spin magnitudes. Therefore, it will be interesting to evaluate how well aLIGO and AdV will be able to constrain the magnitude of the BHs' component spins. The direction of the compact objects' angular momenta is also of interest, with particular implications for formation mechanisms [16]. Measuring systems with component spins misaligned with the orbital angular momentum is outside of the scope of this project. However, this study does include systems with component spins that are both aligned and anti-aligned with the orbital angular momenta, and we will evaluate the ability of aLIGO and AdV to distinguish such systems from one another.

The standard technique for observing BBH mergers involves matched-filtering data taken from GW observatories against 'template' waveforms that should closely match potential astrophysical signals [1719]. The observable BBH waveform includes the signal from the inspiral of the two BHs, as well as their merger and the resulting BH's ringdown. Search templates must include all of these features [20, 21]. As an alternative to matched-filter searches, a number of algorithms exist to perform searches for unmodelled GW signals [2224]. These algorithms do not require accurate knowledge of the waveforms to make observations, but are not as sensitive as matched-filter searches in cases where the waveform models are well understood.

Theoretical models of the inspiral, merger and ringdown of BBH systems are necessary to produce template banks for matched-filter searches and to use as model signals to test both matched-filter and unmodelled searches. The inspiral portion of the waveform can be modelled by analytic post-Newtonian (PN) calculations [20, 25], while numerical solutions of the General Relativity field equations are required to accurately model the final orbits and merger. Prior to breakthroughs in numerical relativity (NR) in 2005 [2628], template banks and search pipeline tests used only inspiral waveforms. Since 2007, NR waveforms have been used to calibrate analytical waveform models [2936]. Some of the analytical waveforms have been already employed in search pipelines [37]. However, there exists another useful and valuable avenue of communication between numerical relativists and GW astronomers. As NR pushes into new regions of parameter space the waveforms can be used directly to test searches employing previously-calibrated templates, and the degree to which these searches prove to be insufficient can motivate both new template models and additional simulations.

The Numerical INJection Analysis (NINJA) project was created in 2008. The project uses recent advances in NR ([38] and references therein) to test analysis pipelines by adding numerically modelled, physically-realistic signals to detector noise and attempting to recover these signals with search pipelines. The first NINJA project (NINJA-1) [39] utilized a total of 23 numerical waveforms, which were injected into Gaussian noise coloured with the frequency sensitivity of initial LIGO and Virgo. These data were analysed by nine data-analysis groups using both search and parameter-estimation algorithms [39].

However, there were four limitations to the NINJA-1 analysis. First, due to the computational cost of NR simulations, most waveforms included only ∼10 orbits before merger. Therefore the waveforms were too short to inject over an astrophysically interesting mass range without introducing artefacts into the data. The lowest mass binary considered in NINJA-1 had a total mass of 35 M, whereas the mass of BHs could extend below 5M [14, 40]. Second, the waveforms were only inspected for obvious, pathological errors and no cross-checks were performed between the submitted waveforms. It was therefore difficult to assess the physical fidelity of the results. Third, the NINJA-1 data set contained stationary noise with the simulated signals already injected into the data. Since the data set lacked the non-Gaussian noise transients present in real detector data, it was not possible to fully explore the response of the algorithms in a real search scenario. Finally, the data set contained only 126 simulated signals, this precluded detailed statistical studies of the effectiveness of search and parameter-estimation algorithms. Despite these limitations, the NINJA-1 project lead to a framework within which to perform injection studies using waveforms as calculated by the full nonlinear general theory of relativity, established guidelines for such studies (in particular a well-defined format for the exchange of NR waveforms [41]), and clarified where further work was needed.

This lead to the initiation of the second NINJA project (NINJA-2), whose goals were to build and improve upon NINJA-1 and perform a systematic test of the efficiency of data-analysis pipelines in preparation for the advanced detector era. A set of 60 NR waveforms were submitted by 8 NR groups for the NINJA-2 project [42]. These waveforms conform to a set of length and accuracy requirements, and are attached to PN inspiral signals to produce hybrid PN–NR waveforms that can be injected over the full range of physically relevant total binary masses. The construction and verification of these waveforms is described in a previous paper [42], and summarized here in section 2. In the NR and Analytical-Relativity collaboration, a project complementary to the NINJA collaboration, 22 new NR waveforms were produced and rigorously analysed. These newly produced NR waveforms were compared to the most recent calibrated analytical models, finding that the loss of event rates due to modelling is below 3% [43]. In this paper we study the ability of the search algorithms used in the last of the initial-LIGO and Virgo science runs to observe numerically modelled BBH waveforms from the set of 60 waveforms submitted to the NINJA-2 project. This is done using data taken during LIGO's sixth and Virgo's second science runs and recolouring that data to the sensitivities expected from early observation runs of aLIGO and AdV.

There are a wide range of search and parameter-estimation algorithms available within the GW astronomy community: those that were used in past analyses of detector data, old algorithms that have been updated and re-tuned following the experience gained in those analyses, plus many new algorithms under development. For both practical reasons, and to mark a clear point in the development and refinement of these methods, this work employed only search and parameter-estimation algorithms that were approved and used in the last initial-LIGO and Virgo science runs, without any additional tuning or modifications [22, 37, 44, 45]. By doing this we aim to provide a benchmark against which future algorithms can be compared.

A set of seven NR waveforms, with masses ranging from 14.4 to 124M were added into the recoloured data as an unbiased test of the process through which candidate events are identified for BBH waveforms. This data was distributed to analysts who knew that such 'blind injections' were present but had no information about the number, parameters or temporal location of these waveforms. This was similar to blind injection tests conducted by the LIGO and Virgo collaborations in their latest science runs [46]. Using a search for unmodelled GW transients we found that one of these signals was recovered, with an estimated false alarm rate (FAR) of 1 every 47 years. The remaining 6 signals were consistent with background. Using a matched-filtered algorithm with a bank of BBH IMR waveforms, which were not calibrated against the NR signals used in NINJA-2, 6 of the signals were recovered with more significance than all background events. This allowed upper limits on the FAR ranging between 1 every 5000 years and 1 every 40 000 years to be placed on each blind injection. The remaining signal was not recovered due to having a low network signal-to-noise ratio (SNR) and possessing a large anti-aligned spin, which was not modelled in the bank of waveforms used in the search.

Parameter-estimation algorithms have come a long way since the first NINJA project. Previously these analyses were unable to estimate the parameters of high mass systems accurately due to the use of inspiral-only models on data with little measurable inspiral. We show that these tools are now capable of reliably providing parameter estimates for both low and high mass systems. For all but one injection the masses and spins of the BHs were recovered within the estimated 95% credible regions. The remaining injection suffered small systematic biases due to non-Gaussian features present in the noise, the modelling of which is an ongoing endeavour. We find that strong intrinsic degeneracies between the masses and BH spins [47, 48] make it difficult to constrain the masses well, for three of the signals the presence of a neutron star cannot be ruled out. We also investigate the ability to constrain the sky localization of the various signals and demonstrate how even low power non-Gaussian noise transients in the data can effect the recovery of the intrinsic parameters of BBH systems.

We use large sets of known waveforms to assess the efficiency of the matched-filter BBH search algorithm as a function of the mass and angular momenta of the component BHs. These are the first such studies that have been done using real data recoloured to second-generation noise curves, which include the non-Gaussian features that will be present in the data taken with aLIGO and AdV. As these results were obtained using the search pipelines and techniques that were deployed in the final observing runs of Initial LIGO and Initial Virgo, they can therefore provide a benchmark against which improvements to the search techniques can be compared and assessed.

In our large-scale simulation studies we find evidence that incorporating search waveforms including the effects of spin will increase the efficiency of searches. The results shown here can be used, in the future, to compare with results of search pipelines including the effects of component spins that are aligned with the orbital angular momentum, which are currently under development [4951]. We also assess the efficiency of the matched-filter BBH search algorithm to recover waveforms generated by different groups with the same parameters. We find that the efficiency of the matched-filter BBH search algorithm to recover different waveforms, generated by different groups, but with identical physical parameters, is indistinguishable up to statistical errors.

In this paper we have not scaled observed masses and distances to account for cosmological effects, which will be important especially for high-mass BBH collisions. Therefore any masses and distances quoted should be interpreted as observed masses and luminosity distances.

The paper contains four analyses: first, an unmodelled search for the seven blind injections. Second, a modelled search for the seven blind injections using frequency domain TaylorF2 waveforms for templates with total mass <25M and effective-one-body (EOB) inspiral–merger–ringdown waveforms for total mass >25M. Third, a followup parameter-estimation analysis on the seven blind injections using three inspiral–merger–ringdown templates—PhenomB with no spin, EOBNRv2 and PhenomB with aligned spins. Finally, a large-scale simulation campaign, adding a large number of the submitted NR waveforms to the NINJA-2 recoloured data and attempting to recover these with the modelled search pipeline.

The paper is organized as follows: in section 2 we briefly summarize the waveform catalogue described more fully in [42]. Section 3 describes the LIGO/Virgo data used and the processing that was done to make it resemble anticipated advanced-detector noise. Section 4 describes how the parameters for the blind injections were chosen, reports the values that were selected and describes the distribution of parameters used for the large-scale simulation campaign. Section 5 describes the various detection and parameter-estimation algorithms that were used in our analyses. Section 6 reports the results of the unmodelled and modelled searches for the seven blind injections. Section 7 describes the results of the parameter-estimation analysis on the blind injections. Section 8 describes the results of the large-scale simulation campaign, which aims to quantify the sensitivity of the detection searches to the different hybrid waveforms. We conclude in section 9 with a discussion of how well the various algorithms performed, and implications for the advanced detector era.

2. PN–NR hybrid waveforms

The NINJA-2 waveform catalog contains 60 PN–NR hybrid waveforms that were contributed by eight NR groups. This catalog and the procedures used to validate it are described in detail in [42]. We briefly summarize the NINJA-2 catalog here.

Each waveform in the NINJA-2 waveform catalog consists of a PN portion modelling the early inspiral, stitched to a numerical portion modelling the late inspiral, merger and ringdown. This ensures accurate modelling of the late portions of the waveform while simultaneously ensuring that waveforms are long enough to be scaled to masses as low as 10 M without starting abruptly within the sensitive frequency band of the detectors. We require that for the NR portion of the waveform the amplitude be accurate to within 5% and the phase (as a function of GW frequency) have an accumulated uncertainty over the inspiral, merger and ringdown of no more than 0.5 rad. Since we do not have access to exact waveforms we define 'accuracy' by convergence of the numerical waveforms as resolution and waveform-extraction radius are increased. We also require at least five orbits of numerical data in order to ensure robust blending with the PN portion. No requirements were placed on the hybridization itself, although it is known that hybridization can introduce significant errors [34, 52, 53]. It was decided to limit NINJA-2 to systems without eccentricity, and with BH spins parallel or anti-parallel to the orbital angular momentum. This last condition avoids precession, which we do for two reasons; (i) precession greatly complicates waveform phenomonology and we prefer to first tackle a simpler subset which still maintains the main features of binary evolution and merger; and (ii) at the start of NINJA-2 the precessing-binary parameter space had been sampled by only a handful of numerical simulations. Waveforms were submitted in the format described in [41], and data was provided as strain decomposed into spherical harmonics of weight −2. Groups were encouraged to submit modes beyond (l, m) = (2, ±2) and many did so. However the techniques to validate these higher modes are a current research topic. In order not to delay the NINJA-2 project it was decided to validate only the (2, ±2) modes in [42] and employ only these modes for the first NINJA-2 analysis. Different groups employed different codes, as well as different methods for solving initial conditions, dealing with singularities, evolving Einstein's equations, and extracting GW information. In addition different PN approximants and different hybridization methods were used by different groups in constructing the full hybrid waveforms. It was found that the dominant source of disagreement between submissions was in the PN portion, and in particular overlaps between submissions were greater than 0.97 over the range of masses, including regions sensitive to differences in hybridization techniques. See [42] for details.

The parameter space for aligned-spin BBH systems is four dimensional; the masses and spin magnitudes of each of the two holes. However, in the absence of matter Einstein's equations possess a mass invariance, and a solution obtained by NR or other method may be trivially rescaled to any total mass. We therefore eliminate total mass from the parameter space of submissions leaving the ratio of the two masses, denoted q, and the dimensionless spins denoted χ1, 2 which must lie between −1 < χ1, 2 < 1.

Tables 1 and 2 give a summary of the submissions for systems where the masses of the two BHs are equal and unequal, respectively. The first column of tables 1 and 2 gives a label for each waveform, to ease referring to them in later sections. These labels of the form 'G2+20+20_T4' are constructed as follows: the first letter represents the group submitting the numerical simulation:

  • F: the NR group at Florida Atlantic University, also using the BAM code [5457]
  • G: the Georgia Tech group using MayaKranc [5864]
  • J: the BAM (Jena) code, as used by the Cardiff–Jena–Palma–Viennacollaboration [33, 55, 6568]
  • L: the Lean code, developed by Ulrich Sperhake [69, 70]
  • Ll: the Llama code, used by the AEI group and the Palma–Caltech groups [7173]
  • R: the group from Rochester Institute of Technology, using the LazEv code [27, 7476]
  • S: the SXS collaboration using the SpEC code [7786]
  • U: the group from The University of Illinois [87].

Immediately after this letter follows the mass ratio q = m1/m2, where the BHs are labelled such that q ⩾ 1. Subsequently are the components of the initial dimensionless spin along the orbital angular momentum, multiplied by 100 (e.g. '+20' corresponds to $\hat{L}\cdot \vec{S}_1 /m_1^2=0.2$) of the more massive and the less massive BH. The label closes with the Taylor-approximant being used for the PN portion of the waveform, with 'T1' and 'T4' representing TaylorT1 and TaylorT4, respectively. The Georgia Tech group submitted four pairs of simulations where each pair simulates systems with identical physical parameters, stitched to the same PN approximant. These waveforms are not identical however as each simulation within a pair has a different number of NR cycles and was generated at a different resolution. These are distinguished by appending '_1' and '_2' to the label.

Each NR group verified that their waveforms met the minimum NINJA-2 requirements as described above. The minimum-five-orbits requirement was easily verified by inspection, and the amplitude and phase uncertainties were estimated by convergence tests with respect to numerical resolution and waveform-extraction radius. The full catalog was then verified by the NINJA-2 collaboration. Submissions were inspected in the time and frequency domains to identify any obvious problems caused by hybridization or integration from the Newman–Penrose curvature scalar ψ4 to strain. Where multiple simulations were available for the same physical parameters these simulations were compared using the matched-filter overlap. The inner product between two real waveforms s1(t) and s2(t) is defined as

Equation (1)

where $\tilde{x}$ denotes the Fourier transform of x and Sn(f) is the power spectral density (PSD), which was taken to be the target sensitivity for the first advanced-detector runs, referred to as the 'early aLIGO' PSD. This is described in more detail in section 3.

The overlap is then obtained by normalization and maximization over relative time and phase shifts, Δt and Δϕ

Equation (2)

The investigations in [42] demonstrated that the submitted waveforms met the requirements as outlined above and in addition were consistent with each other to the extent expected. We therefore conclude that these submissions model real GWs with sufficient accuracy to quantitatively determine how data-analysis pipelines will respond to signals in next-generation GW observatories.

The NINJA-2 waveforms cover the three-dimensional aligned-spin parameter space rather unevenly, as indicated in figure 1. The configurations available fall predominantly into two one-dimensional subspaces: (i) Binaries of varying mass ratio, but with non-spinning BHs. (ii) Binaries of BHs with equal-mass and equal-spin, and with varying spin-magnitude. Future studies, with additional waveforms covering the gaps that are clearly evident in figure 1 and waveforms including precession [43, 88, 89], would be useful to more fully understand the response of search codes across the parameter space, and would help to better tune analytical waveform models including inspiral, merger and ringdown phases.

Figure 1.

Figure 1. Mass ratio q and dimensionless spins χi of the NINJA-2 hybrid waveform submissions. Reproduced from [42].

Standard image High-resolution image

3. Modified detector noise

In this section we describe the techniques used in this work to emulate data that will be taken by second-generation GW observatories. This was accomplished by recolouring data taken from the initial LIGO and Virgo instruments to predicted 2015–2016 sensitivities. Recolouring initial LIGO and Virgo data allows the non-Gaussianity and non-stationarity of that data to be maintained.

The predicted sensitivity curves of the advanced detectors as a function of time can be found in the living document [2]. For this work we are interested in the sensitivity of the advanced detectors in 2015–2016 and used a previous prediction of the sensitivity curves for this time period as given in [90] and shown in the left panel of figure 2. These curves were used as the updated predictions given in [2] were not available when we began this study. We refer to the 2015–2016 predicted noise curves as the early sensitivity curves. It is clear from the figure that the predicted sensitivity of early AdV is significantly greater than that of the early aLIGO curve, when using the predictions given in [90]. In the right panel of figure 2 we show the distance at which optimally oriented, optimally located, non-spinning, equal mass binaries would be detected with a SNR of 8 using both noise curves. This is commonly referred to as the horizon distance. The early AdV noise curve was rescaled by a factor of 1.61 so that the sensitive distance for a (10M, 10M) binary merger would be equal to the early aLIGO noise curve. This rescaling was found to better reflect the updated predicted sensitivities presented in [2]. The results in this paper are generated using the early aLIGO and rescaled early AdV sensitivity curves.

Figure 2.

Figure 2. Left: predicted sensitivity curves for aLIGO and AdV. Shown are both the design curves and predicted 2015–2016 early sensitivity curves. Also shown is the early AdV noise curve rescaled such that the horizon distance for a (10M, 10M) binary system is equal to that obtained with the early aLIGO noise curve. Right: Horizon distance as a function of observed total mass for the early aLIGO and rescaled early AdV sensitivity curves. This plot is made considering only equal mass, non-spinning systems and calculated using the EOBNRv2 [31] waveform approximant. Results in this paper are generated from the early aLIGO noise curve and the rescaled early AdV curve.

Standard image High-resolution image

As with the initial science runs, we expect data taken from these detectors, in the absence of GW signals, to be neither Gaussian nor stationary. It is important that search pipelines demonstrate an ability to deal with these features. To simulate data with advanced detector sensitivities and with realistic non-Gaussian and non-stationary features, we chose to use data recorded by initial LIGO and Virgo and recolour that data to the predicted early sensitivity curves of aLIGO and AdV. The data we chose to recolour was data taken during LIGO's sixth science run and Virgo's second science run.

The procedure for producing such recoloured data was accomplished in the following steps, which were conducted separately for the two LIGO detectors and Virgo.

  • Identify a two-month duration of initial detector data to be recoloured.
  • Measure the PSD for each distinct section of science mode data using the PSD estimation routines in the lal software package [91].
  • Calculate an average PSD over the two-month period by taking, for every discrete value of frequency recorded in the PSDs, the median value over each of the PSDs in the set.
  • Remove any line features from the resulting PSD and from the predicted early noise curves. This is done because it is difficult to remove or introduce line features from the data without introducing unwanted artefacts. Therefore it is simpler to remove line features in the PSDs, which will have the effect of preserving the line features of the original data into the recoloured data.
  • For each frequency bin, record the median value of the PSD over each section of science mode.
  • Take the ratio of the median PSD and the predicted early advanced detector noise curve. This is the reweighting to be used when recolouring.
  • Using the time domain filtering abilities of the gstlal software package [92], recolour the data using this reweighting factor.

In figure 3 we show some examples of the PSDs obtained from recolouring the data and compare with the predicted sensitivity curves. As there are some small stretches of data in the original science runs where the sensitivity was significantly different from the average, we show the 10% and 90% quantiles as well as the maximum and minimum values for the PSD of the recoloured data. We notice that the sensitivity of the detector still varies with time, as in the initial data, and that the lines in the initial spectra are still present.

Figure 3.

Figure 3. Sensitivity curves of the recoloured data for the LIGO Hanford detector (left) and the Virgo detector (right). In both cases the black dashed line shows the predicted 2015–2016 sensitivity curve (with the scaling factor added for Virgo). The dark coloured region indicates the range between the 10% and 90% quantiles of the PSD over time. The lighter region shows the range between the minimum and maximum of the PSD over time.

Standard image High-resolution image

Non-Gaussian features present in the original data will still be present in the recoloured data, albeit distorted by the recolouring process. An example of this is shown in figure 4 where we show the SNR time-series around a known glitch in both the original and recoloured data. While the recolouring does have some effect on the glitch, the two SNR time series are very comparable. As in searches on the original data, we attempt to mitigate the effect of such features. A set of data quality flags were created for the initial detector data [93, 94]. These attempt to flag times where a known instrumental or environmental factor, which is known to produce non-Gaussian artefacts in the resulting strain data, was present. To simulate these data quality flags in our recoloured data we simply used the same flags that were present in the original data and apply them to the recoloured data.

Figure 4.

Figure 4. SNR time series in a 20 s window around a known glitch in the original data (left) and in the recoloured data (right). While the SNR time series clearly change, the primary features of the glitch are preserved across the recolouring procedure. These SNR time series were obtained by matched-filtering a short stretch of recoloured and original data against a (23.7, 1.3)M template.

Standard image High-resolution image

4. Injection parameters

As an unbiased test of the process through which candidate events are identified for BBH waveforms, seven BBH waveforms were added to the recoloured data. The analysts were aware that 'blind injections' had been added, however the number and parameters of these simulated signals were not disclosed until the analyses were completed. This was similar to blind injection tests conducted by the LIGO and Virgo collaborations in their latest science runs [46]. These injections are self-blinded to ensure that no bias from knowing the parameters of the signal, or indeed whether a candidate event is a signal or a noise artefact, affects the analysis process.

The seven waveforms added to the data were taken from the NR simulations discussed in section 2. The parameters of the blind injections are given in table 3. The distribution of physical parameters used in these blind injections was not intended to represent any physical distribution. Instead, the injections were chosen to test the ability to recover BBH systems across a wide range of parameter space. We describe the results of searches for these blind injections in section 6 and of parameter-estimation studies on these signals in section 7.

Table 1. Summary of the contributions to the NINJA-2 waveform catalog with m1 = m2. Given are an identifying label, described in section 2, mass ratio q = m1/m2 which is always 1 for these simulations, magnitude of the dimensionless spins $\chi _i=S_i/m_i^2$, orbital eccentricity e, frequency range of hybridization in Mω, the number of numerical cycles from the middle of the hybridization region through the peak amplitude, and the post-Newtonian Taylor-approximant(s) used for hybridization.

Label q χ1 χ2 1000e 100 Mω Hyb.range # NR Cycles pN Approx
S1−95−95_T1 1.0 −0.95 −0.95  1.00 3.3–4.1 18.42 T1
J1−85−85_T1 1.0 −0.85 −0.85  2.50 4.1–4.7 12.09 T1
J1−85−85_T4             T4
J1−75−75_T1 1.0 −0.75 −0.75  1.60 4.1–4.7 13.42 T1
J1−75−75_T4             T4
J1−50−50_T1 1.0 −0.50 −0.50  2.90 4.3–4.7 15.12 T1
J1−50−50_T4             T4
S1−44−44_T4 1.0 −0.44 −0.44  0.04 4.3–5.3 13.47 T4
Ll1−40−40_T1 1.0 −0.40 −0.40   6.1–8.0  6.42 T1
Ll1−40−40_T4             T4
J1−25−25_T1 1.0 −0.25 −0.25  2.50 4.5–5.0 15.15 T1
J1−25−25_T4             T4
Ll1−20−20_T1 1.0 −0.20 −0.20   5.7–7.8  8.16 T1
Ll1−20−20_T4             T4
J1+00+00_T1 1.0  0.00  0.00  1.80 4.6–5.1 15.72 T1
J1+00+00_T4             T4
G1+00+00_T4        3.00 5.5–7.5  9.77 T4
Ll1+00+00_F2         5.7–9.4  8.30 F2
S1+00+00_T4        0.05 3.6–4.5 22.98 T4
G1+20+20_T4_1 1.0  0.20  0.20 10.00 6.0–7.5  6.77 T4
G1+20+20_T4_2        6.00 5.5–7.5 10.96 T4
J1+25+25_T1 1.0  0.25  0.25  6.10 4.6–5.0 18.00 T1
J1+25+25_T4             T4
G1+40+40_T4_1 1.0  0.40  0.40 10.00 5.9–7.5  7.70 T4
G1+40+40_T4_2        6.00 5.5–7.5 12.02 T4
Ll1+40+40_T1         7.8–8.6  6.54 T1
Ll1+40+40_T4             T4
S1+44+44_T4 1.0  0.44  0.44  0.02 4.1–5.0 22.39 T4
J1+50+50_T1 1.0  0.50  0.50  6.10 5.2–5.9 15.71 T1
J1+50+50_T4             T4
G1+60+60_T4_1 1.0  0.60  0.60 12.00 6.0–7.5  8.56 T4
G1+60+60_T4_2        5.00 5.5–7.5 13.21 T4
J1+75+75_T1 1.0  0.75  0.75  6.00 6.0–7.0 14.03 T1
J1+75+75_T4             T4
G1+80+00_T4 1.0  0.80  0.00 13.00 5.5–7.5 12.26 T4
G1+80+80_T4_1 1.0  0.80  0.80 14.00 5.9–7.5  9.57 T4
G1+80+80_T4_2        6.70 5.5–7.5 14.25 T4
J1+85+85_T1 1.0  0.85  0.85  5.00 5.9–6.9 15.36 T1
J1+85+85_T4             T4
U1+85+85_T1       20.00 5.9–7.0 15.02 T1
G1+90+90_T4 1.0  0.90  0.90  3.00 5.8–7.5 15.05 T4
S1+97+97_T4 1.0  0.97  0.97  0.60 3.2–4.3 38.40 T4

Table 2. Summary of the contributions to the NINJA-2 waveform catalog with m1 > m2. Given are an identifying label, described in section 2, mass ratio q = m1/m2 magnitude of the dimensionless spins $\chi _i=S_i/m_i^2$, orbital eccentricity e, frequency range of hybridization in Mω, the number of numerical cycles from the middle of the hybridization region through the peak amplitude, and the post-Newtonian Taylor-approximant(s) used for hybridization.

Label q χ1 χ2 1000e 100 Mω Hyb. range # NR Cycles pN Approx
J2+00+00_T1  2.0 0.00 0.00  2.30 6.3–7.8  8.31 T1
J2+00+00_T4             T4
G2+00+00_T4        2.50 5.5–7.5 10.42 T4
Ll2+00+00_F2         6.3–9.4  7.47 F2
S2+00+00_T2        0.03 3.8–4.7 22.34 T2
G2+20+20_T4  2.0 0.20 0.20 10.00 5.6–7.5 11.50 T4
J2+25+00_T1  2.0 0.25 0.00  2.00 5.0–5.6 15.93 T1
J2+25+00_T4             T4
J3+00+00_T1  3.0 0.00 0.00  1.60 6.0–7.1 10.61 T1
J3+00+00_T4             T4
S3+00+00_T2        0.02 4.1–5.2 21.80 T2
F3+60+40_T4  3.0 0.60 0.40  1.00 5.0–5.6 18.89 T4
J4+00+00_T1  4.0 0.00 0.00  2.60 5.9–6.8 12.38 T1
J4+00+00_T4             T4
L4+00+00_T1        5.00 5.1–5.5 17.33 T1
S4+00+00_T2        0.03 4.4–5.5 21.67 T2
S6+00+00_T1  6.0 0.00 0.00  0.04 4.1–4.6 33.77 T1
R10+00+00_T4 10.0 0.00 0.00  0.40 7.3–7.4 14.44 T4

Table 3. The details of the blind injections that were added to the NINJA-2 datasets prior to analysis. In this table the event ID will be used throughout the paper to refer to specific injections. The network SNR of each injection is denoted by ρN. This is the sum of the overlaps of the injection with itself in each detector, using 30 Hz as the starting frequency in the overlap integrals. M denotes the total mass and q the mass ratio. χ denotes the spin on each black hole, in all seven cases both black holes in the binary had the same spin. RA and dec give the right ascension and declination of the signals respectively. Dist. denotes the distance to the source. Detectors online lists the detectors for which data is present at the time of signal. Hybridization range gives the range of frequencies in which the signal is hybridized between the post-Newtonian and numerical components. Waveform label indicates which numerical waveform was used, as shown in tables 1 and 2.

                     
Event ID label ρN Waveform q M (M) χ RA (rad) Dec. (rad) Dist. (Mpc) Detectors Online Hybrid Range (Hz)
1 J4+00+00_T4 23.9 4 124  0.00 1.26 −0.76 569 HLV  15–18
2 Ll1−20−20_T4 14.1 1  35.5 −0.20 1.70 −0.03 244 HLV  52–71
3 Ll1+40+40_T4 16.2 1  14.4  0.40 4.18  0.07 170 HLV 175–193
4 G2+20+20_T4 15.1 2  26.8  0.20 2.19 −0.36 247 LV  68–90
5 L4+00+00_T1 19.2 4  19.1  0.00 1.68  0.14  83 HV  86–93
6 J1+25+25_T4 16.9 1  75.7  0.25 4.68  0.49 854 HV  20–21
7 J1−75−75_T1  9.8 1  19.3 −0.75 0.81 −0.07 292 HLV  69–79

As well as these blind injections, a large number of (non-blind) simulated signals were subsequently analysed to obtain sufficient statistics to adequately evaluate the sensitive distances at which the NR waveforms could be detected in the early aLIGO and early AdV simulated data sets. As mentioned in section 2, each of the NR waveforms is valid only at the values of mass ratio and dimensionless spins for which it was generated. However, it is possible to change the total mass, which just rescales the waveform, and the various orientation and location parameters, which will only affect the overall phase and amplitude of the signal. For each of the 60 NR waveforms given in table 1 and used in results in section 8, a set of ∼42 000 simulated signals was generated, necessarily with mass ratios and spins that have values corresponding to the given NR waveform. The total mass was chosen from a uniform distribution between 10 and 100M. The simulations were distributed uniformly in distance, however they were not injected beyond a distance where they could not possibly be detected. The mass-dependent maximum distance that we chose to use is given by

Equation (3)

Here 1.219M is the chirp mass of a (1.4 + 1.4)M binary system. The factor of $\mathcal {M}^{5/6}$ describes, to leading order, how the SNR of the inspiral-only portion of a compact-binary merger at fixed distance will scale with mass when the inspiral is bandwidth-limited [19, 95]. 175 Mpc is chosen because it is larger than the distance at which it would be possibly to detect a (1.4 + 1.4)M binary merger with the early noise curves. However, to include a large margin for safety ∼7000 of the signals were generated with chirp-weighted distances between 175 and 350 Mpc. The orbital orientations, polarization angles and sky directions are all chosen from isotropic distributions. The signal coalescence times are drawn from a uniform distribution within our analysis window. Coalescence times were limited to times where at least two observatories were operating and no data-quality flags were active. The results of analyses on the non-blind simulated signals are given in section 8.

5. Search pipelines

The goal of this paper was to evaluate the detection sensitivity to BBH systems, modelled from the latest numerical simulations, using the search pipelines that were used to search for GW transient signals in data taken during the final initial LIGO and Virgo joint observing run. The two pipelines that were used to do this were the dedicated compact binary coalescence (CBC) search pipeline 'ihope' [37, 44, 46, 9698] and the unmodelled burst pipeline 'Coherent WaveBurst' (cWB) [45, 99101]. The ihope pipeline was developed as a search pipeline for detecting compact binary mergers. It employs a matched-filtering algorithm against a bank of template waveforms [44]. The ihope pipeline was used to search for CBC systems (not just BBHs) with component masses ∈ [1, 99] M. As a complement to template-based specialized searches, cWB was developed as an all-purpose unmodelled search pipeline, hence, it does not require a priori knowledge of the signal waveforms. It is better suited for burst signals spanning a small time-frequency volume. Moreover, due to the lack of model constraints, cWB is more adversely affected by background noise than matched-filter searches. Past simulation studies with initial LIGO sensitivity curves have shown that cWB was sensitive to CBC mergers with total masses ∈ [25, 500] M over wide regions of the binary parameter space [102].

In addition to the ihope and cWB detection pipelines we also use parameter-estimation algorithms to provide estimates of the parameters of compact binary systems observed with the detection algorithms. In the following section we provide a brief overview of the detection and parameter-estimation pipelines. The results of running these searches on the data containing the NINJA-2 blind injections are presented in section 6 and parameter-estimation results given in section 7.

5.1. Coherent WaveBurst

Coherent WaveBurst is a multi-resolution algorithm for coherent detection and reconstruction of GW bursts [22]. The cWB algorithm has been used in various LIGO–Virgo burst searches [45, 99, 100] and more recently in the search for intermediate mass BH binaries [101]. Within the framework of the constrained maximum likelihood analysis [22], cWB identifies GW signals in data from multiple detectors and provides estimates of the signal parameters, e.g. sky location and waveforms. Along with the reconstruction of unmodelled burst signals, which imply random polarization, cWB can perform loosely modelled likelihood analyses assuming different polarization states, i.e. elliptical, linear or circular.

The NINJA2 cWB analysis uses the elliptical polarization constraint [101, 102] and searches for signals in the frequency band from 32 to 1024 Hz. The analysis is performed in several steps: first, the data streams from all GW detectors are processed with the Meyer's wavelet transformations with six different time-frequency resolutions of 4 × 1/8, 8 × 1/16, 16 × 1/32, 32 × 1/64, 64 × 1/128, 128 × 1/256 (Hz × s). Then the data are conditioned with a linear predictor error filter to remove power lines, violin modes and other predictable data components. Triggers are reconstructed as the coherent sets of samples (pixels) identified in the time-frequency data. For each trigger the coherent statistics are then computed. These include the network correlation coefficient, cc and the network energy disbalance, Λ, which are used to enable the signal consistency selection cuts. The cWB detection statistic is the coherent network amplitude, η, which is used to rank the events and thereby establish the significance against a sample of background events obtained with the time-shift analysis [22, 101, 102]. This shifting procedure is typically performed thousands of times in order to accumulate sufficient statistics.

5.2. ihope

The ihope pipeline is designed to search for GWs emitted by coalescing compact binaries [44]. It has been optimized for and used in LIGO and Virgo GW searches over the past decade [37, 46, 96, 97, 103, 104], and also in the mock Laser Interferometer Space Antenna (LISA) data challenges [105]. The NINJA-2 ihope analysis uses the same pipeline-tuning that was used in the searches performed during the final initial LIGO and Virgo joint observing run [46].

The pipeline matched-filters the detector data against a bank of analytically modelled compact binary merger waveforms [19, 44]. Only nonspinning compact binary merger signals are used as filters and the bank is created so as to densely sample the range of possible binary masses [106]. For each detector, the filtering stage produces a sequence of triggers which are plausible events with a high SNR ρ. The algorithm proposed in [107] is used to keep only those that are found coincident in more than one detector across the network, which helps remove triggers due to noise. Knowledge of the instrument and its environment is used to further exclude triggers that are likely due to non-Gaussian noise transients, or glitches. Periods of heightened glitch rate are removed (vetoed) from the analysis. The time periods where the rate of glitches is elevated are divided into three veto categories. Periods of time flagged by category 1 and 2 vetoes are not included in the analysis as known couplings exist between instrumental problems and the GW channel during these periods. Periods of time vetoed at category 3 are likely to have instrumental problems. A strong GW signal can still be detected during category 3 times, but including these periods in the background estimate can compromise our ability to detect weaker signals in less glitchy periods of time. For this reason the search is performed both before and after category 3 vetoes are applied. The significance of events that survived category 1–3 vetoes were calculated using the background that also survived categories 1–3. The significance of events that survived category 2 but were vetoed at category 3 were calculated using background that survived categories 1–2.

Signal-based consistency measures further help distinguish real signals from background noise triggers in those that are not vetoed and pass the coincidence test. The χ2 statistic proposed in [108] quantifies the disagreement in the frequency evolution of the trigger and the waveform template that accumulated the highest SNR for it, cf equation (4.14) of [108]. We weight the SNR with this statistic to obtain the reweighted SNRs for all coincident triggers. The exact weighting depends on the mass range the search is focused on, cf equations (17), (18) of [44]. The reweighted SNR is used as the ranking statistic to evaluate the significance, and thus the FAR, of all triggers.

Following the division of the mass-parameter space used in [37, 46], we performed both low mass and high mass ihope searches on the NINJA-2 data. The low-mass search focused on binaries with 2 Mm1 + m2 < 25 M, and used frequency domain 3.5 PN waveforms as templates [109111]. The high-mass search instead focused on the mass-range 25 Mm1 + m2 < 100 M, and used the EOB inspiral–merger–ringdown model calibrated to NR, as described in [29]. The exact χ2-weighting used to define the reweighted SNR varied between the two analyses [44]. The significance of the triggers found by both was estimated as follows. All coincident triggers are divided into four categories, i.e. HL, LV, HV and HLV, based on the detector combination they are found to be coincident in [46]. They are further divided into three mass-categories based on their chirp mass $\mathcal {M}_{c}=(m_{1}m_{2})^{3/5}(m_{1}+m_{2})^{-1/5}$ for the low-mass search, and two categories based on their length in time for the high-mass search [46]. The rate of background noise triggers, or false alarms, has been found to be significantly higher for shorter signals from more massive binaries, and also to be different depending on the detector combination, and these categorizations help segregate these effects for estimation of the background [44, 46].

For all the triggers the combined reweighted SNR $\hat{\rho }$ is computed, which is the quadrature sum of reweighted SNRs across the network of detectors. All triggers are then ranked according to their $\hat{\rho }$ in each of the mass/duration and coincidence sub-categories independently, allowing us to estimate the trigger FAR at a given threshold $\hat{\rho }=\hat{\rho }_{0}$. This is described by

Equation (4)

which can be understood as the number of background noise triggers louder than the threshold ($N(\hat{\rho }\ge x)$) divided by the total time analysed for that coincident category (Tc). The limiting precision on this quantity is at least of order 1/Tc. Therefore, in the limit where no background events are louder than $\hat{\rho }$ we quote the FAR as less than 1/Tc. This is described in more detail in [96, 112]. From (4), the smallest FAR we can estimate is 1/Tc, and to get a more precise estimate for our detection candidates we simulate additional background time. We shift the time-stamps on the time-series of single detector triggers by Δt relative to the other detector(s), and treat the shifted time-series as independent coincident background time. All coincident triggers found in the shifted times would be purely due to background noise. We repeat this process setting Δt = ±5 s, ±10 s, ±15 s, ..., recording all the time-shifted coincidences, until Δt is larger than the duration of the dataset itself. With the additional coincident background time Tc accumulated in this way, we can get a more precise estimate of the low FARs we expect for detection candidates.

The FARs computed this way are further multiplied by a trials factor that compensates for the fact that we rank events in their own template-mass and coincidence sub-categories independently, while each of the sub-category corresponds to an (independent) analysis of the same stretch of interferometric data. This factor is discussed in detail in section 4 of [46]. Taking the trials factor into account, the final combined FAR (cFAR) is reported in table 5, and the results are described in detail in section 6.2.

5.3. Parameter estimation

The detection methods described above produce times of interest where a GW may be present in the data (i.e. triggers), along with point estimates of the compact object masses from the signal, independently in each detector. These triggers are followed up with the goal of estimating the posterior probability density function of the parameters that describe the signal and to evaluate the evidence of different waveform models. In order to do so, we use Bayesian methods, in which the data from all detectors are analysed coherently.

The Bayesian parameter-estimation algorithms used in this work provide estimates of the posterior probability distribution function. The probability of a set of parameters $\vec{\theta }$ under a model M given the observed data d can be written as

Equation (5)

Here $p(d|\vec{\theta },M)$ is the likelihood of observing the measured data given the set of parameters $\vec{\theta }$, and p(d|M) is the marginal distribution of the data under model M, commonly referred to as the evidence. When only concerned with parameter estimation the evidence is a normalization constant that can be ignored. It becomes relevant however, when comparing how well two models do in describing the observed data. By marginalizing over all model parameters the evidence is obtained

Equation (6)

Given the evidence of two competing models, M1 and M2, the support for M1 over M2 can be quantified via the Bayes factor

Equation (7)

These techniques require the generation of ∼106–107 model waveforms to probe the 9 (for non-spinning BHs) to 15 (for fully spinning BHs) dimensional parameter space of compact binary systems in circular orbit, making it infeasible at present to use NR simulations. Instead approximate models (e.g. PN, EOB) that are computationally cheaper to produce are used to estimate the parameters of a measured signal. Numerous studies have assessed the statistical uncertainty in compact binary parameter estimates [113116], which use the same approximate model for injection and analysis. Few studies have been done to quantify the systematic uncertainty in parameter estimates due to the use of these approximate models [117, 118]. NR simulations provide us with the most accurate waveforms currently available, making them ideal for quantifying the systematic uncertainties inherent with using approximate models. This mock data challenge is the first time such a study has been conducted using models that account for the component angular momenta of the compact objects.

The Markov Chain Monte Carlo sampler lalinference_mcmc [119, 120], and two nested sampling implementations lalinference_nest [115] and MultiNest [121] from the LALInference package of the LSC Algorithm Library [91] were used to follow up GW candidates from the detection search pipelines. Due to the computational burden, we carried out the analysis with lalinference_mcmc, and as a consistency check for selected candidates and waveforms posterior estimates were also obtained with lalinference_nest and MultiNest. For model comparisons we have calculated the evidence by marginalizing each posterior estimate using thermodynamic integration.

Each candidate was analysed using two distinct waveform models: PhenomB and EOBNRv2. Both models describe the IMR phases of the GW from a compact binary merger. EOBNRv2 models non-spinning binaries using the EOB that re-sums the PN dynamics and energy flux, and describes the merger–ringdown signal as a superposition of quasi-normal modes [31]. PhenomB is a phenomenological model with a PN description of the inspiral phase building up on test-mass terms to 2PN order, fit to a set of spinning and non-spinning PN–NR hybrid waveforms [33]. Waveforms are generated in the frequency domain and model binaries with component spins aligned with the orbital angular momentum through a single spin parameter χ ≡ (1 + δ)χ1/2 + (1 − δ)χ2/2. Here δ ≡ (m1m2)/M and $\chi _i\equiv S_i/m_i^2$, where Si denotes the angular momentum of the ith component of the binary, and M the total mass of the system.

The mass-ratio-dependent and higher order terms used in PhenomB are calibrated to PN–NR hybrids that cover the late-inspiral, merger and ringdown. Therefore the accuracy of the model is expected to decline with decreasing total mass, as the inspiral phase of the waveform becomes a larger fraction of the total SNR of the signal, especially for comparable mass binaries. At the time of the analysis however, it was the only IMR waveform, including spin effects, that was computationally feasible for use, making it the most physically relevant waveform for the analysis. EOBNRv2 is more computationally expensive, but has been shown to be accurate enough for uncertainties in parameter estimates to be dominated by statistical error, rather than systematic [122]. It only models binaries with non-spinning components however, so the model is only relevant for non-spinning injections. SEOBNRv1 is the successor to EOBNRv2 that accounts for (aligned) spin [32], however it is currently too computationally expensive to be used for parameter estimation.

Due to the lack of astrophysical constraints on compact binary systems, it is difficult to physically motivate any particular choice for the prior distribution of the intrinsic parameters (i.e. masses and spins). For this study we have chosen to use distributions that are uniform in component masses and component spin magnitudes over the range of parameter values being injected. The prior distribution was also flat in coalesence time across a 200 ms window centred on the trigger time, isotropic in orientation angles (e.g. inclination), and volumetric, giving equal prior probability to all spatial locations.

6. Blind injection challenge results

In this section we present the results of using the detection pipelines described in section 5 to search for the blind injections listed in table 3.

6.1. Coherent WaveBurst

For the NINJA2 cWB analysis it was decided a priori to search for GW bursts in the entire available times during which all three detectors were operating (17.9 days) and to discard the remaining times. First the search was performed on a total of 12 000 time-lagged observation times, accumulating 563.7 years of effective background live time. The background events that survived the data quality and analysis selection cuts (i.e. cc > 0.7 and Λ < 0.4) were used for calculation of the significance of candidate events. This background sample contains all reconstructed events, most of which do not resemble expected compact coalescence waveforms, as we do not enforce any waveform model. As a result, our background distribution is populated by relatively high SNR events.

After completing the background analysis, the zero lag live time was analysed. The search detected an on-source event showing a chirping waveform compatible with a CBC at a SNR ∼22.1 and η = 7.1 . The FAR of the candidate was estimated at ∼1/47 yr−1 from comparison with the burst reference background, yielding a false alarm probability of ∼0.001. After the parameters of the blind injections were disclosed, this event was revealed to be the first blind injection of table 3. As a follow-up analysis, we investigated a posteriori all the times of the blind injections, as well as those on 2-fold exclusive live time. We found that the rest of the injected signals are either reconstructed with extremely low η or missed, see table 4. For massive systems, such as events 1 and 6, the cWB algorithm recovers a large fraction of the injected SNR ratio. For lighter binaries, as expected, the algorithm is largely sub-optimal136.

Table 4. The cWB search and follow-up results. The event labels correspond to those of each blind injection given in table 3. This association is based on the time of the candidates relative to the time of the injections. M denotes the total mass. The false alarm probability, FAR, and false alarm probability, FAP, of each event are estimated by comparison with the empirically-calculated background distribution of the corresponding network of detectors. All but the first event are well within the bulk of the corresponding FAR distributions. η is the network correlated amplitude, which is the main cWB detection statistic. The injected network SNR ($\rho _{{\rm inj}}^{{\rm net}}$) is the square root of the quadratic sum of the optimal SNR in each detector. The recovered network SNR ($\rho _{{\rm rec}}^{{\rm net}}$) is the cWB estimate of the injected network SNR. Events 5 and 7 were completely missed due to a low reconstructed SNR and/or because of nearby noise glitches.

Event ID Event label M 1/FAR (yr) FAP Network η $\rho _{{\rm rec}}^{{\rm net}}$ $\rho _{{\rm inj}}^{{\rm net}}$
1 J4+00+00_T4 124 47 0.001 HLV 7.1 22.1 22.8
2 Ll1−20−20_T4 35.5 HLV 2.8  9.1 13.9
3 Ll1+40+40_T4 14.4 HLV 2.7  9.2 15.7
4 G2+20+20_T4 26.8 LV 1.6  7.4 14.1
5 L4+00+00_T1 19.1 HV 18.5
6 J1+25+25_T4 76.7 HV 2.0 13.8 15.9
7 J1−75−75_T1 19.3 HLV  9.5

6.2. ihope

The results of the low-mass and high-mass ihope searches are presented in table 5. The event IDs correspond to the event IDs of the blind injections in table 3. The mapping between the ihope candidates and the blind injections is based on the event times of each. All injections except for injection 7 were found with high significance in one or both searches. Event 7 was missed because the injection's SNR was too small to be detected by the pipeline. The optimal SNR of this injection—obtained by finding the overlap of the injection with itself—was 5.7 in H, 6.0 in L, and 5.3 in V, giving a network SNR of 9.8. However, the injected SNR in Virgo was below the SNR threshold used by the ihope pipeline (=5.5). This means that, at best, the event could only surpass threshold in H and L, giving a maximum recoverable network SNR of 8.2; the FAR at this network SNR is order 103 per year.

Table 5. The ihope search results. The event IDs correspond to the event ID of each blind injection given in table 3; this association is based on the time of the candidates relative to the time of the injections. The cFARs are calculated from all possible 5 s time shifts over the entire two-month dataset. M and q give the total mass and mass ratio respectively that were recovered in each detector. The recovered SNR (ρrec) and reweighted SNR ($\hat{\rho }$) are reported separately for each detector. To calculate the cFARs, the quadrature sum of $\hat{\rho }$ was used. Unless noted, the cFARs were calculated after category 1–3 vetoes were applied.

$\epsfbox{images/cqg494984un01.eps}$

a Only used LV triggers for computing significance of this event; see section 6.2. b Only used HL triggers for computing significance of this event; see section 6.2.

For this analysis we used the same vetoes as were used in [46] and [37] applied to the corresponding times in the recoloured data. After veto categories 1–3 were applied, the total analysed time consisted of 0.6 days of coincident HL data, 5.4 days of coincident LV data, 6.5 days of coincident HV data and 8.9 days of coincident HLV data. FARs were calculated in each bin using the time-shift method described in section 5.2, then combined over all bins.

Table 5 also gives the total masses and mass ratios that were recovered by the ihope pipeline in each detector for each candidates. We see that the values reported by ihope can vary substantially from the injected parameters. This is not surprising: many of the injections had spin, and one injection (event 1) was outside of the mass range covered by the template bank. We also see that the high-mass search deviates from the actual mass parameters more than the low-mass search. This, too, is expected since the template bank in the high-mass search is more sparsely populated. In general, templates are placed in ihope so as to maximize detection probability across the parameter space while minimizing computational cost. ihope therefore only provides a rough estimate of candidate parameters. For more precise estimates we use the parameter-estimation techniques described in section 5.3, the results of which are presented in section 7. For compact binary systems where most of the SNR is obtained from the inspiral, ihope is expected to give a good estimation of the chirp mass of the system [48, 123, 124]. For the lowest mass systems in this study, ihope is not recovering the chirp mass to within 5% accuracy. For the higher mass systems the error on the chirp mass recovered by ihope grows to over 100% for event 1.

The greatest concern for a detection pipeline like ihope is whether the mismatch between templates and signals is small enough so as not to lose a substantial amount of reweighted SNR. The templates used in this search were able to recover enough SNR of the blind injections to make them stand significantly above background. One might think that the majority of the recovered SNR comes from templates matching the PN part of the injected waveforms. However, figure 5 shows the SNR recovered by ihope in the PN and NR parts of the injections as a fraction of the total available SNR. We see that most of the available NR SNR is recovered in every event even though template waveforms did not have merger and ringdown (in the case of events recovered by the low-mass search), or were not calibrated to these particular numerical waveforms (in the case of events recovered by the high-mass search). To more rigorously determine what effect mismatch between templates and signals may have on detection sensitivity, we perform a large-scale injection campaign with the NINJA-2 waveforms in section 8.

Figure 5.

Figure 5. SNR recovered by ihope as a fraction of total available SNR in each detector for each found injection. Hatched bars give the percentage of available SNR in the PN part of the injection; solid bars give the percentage of available SNR in the NR part. Colour bars indicate the amount of SNR recovered by ihope. 'LM' indicates events recovered by the low-mass search—which used 3.5 PN waveforms for templates—'HM' indicates events recovered by the high-mass search—which used EOBNRv1 waveforms for templates. The PN SNR is determined by terminating the matched filter at the frequency half-way between the hybridization range; the NR part is given by filtering from that frequency and up. Remarkably, both the low-mass and high-mass searches recovered most of the NR SNR despite the templates not having merger and ringdown (low mass) or not calibrated to the numerical waveforms used for the injections (high mass).

Standard image High-resolution image

Initially we used 100 time shifts to identify candidate events. All of the coincident events associated with the blind injections were louder than all background in the 100 time shifts. These were the only events to be louder than all background. Using 100 time shifts we could only bound the cFAR of the events to ≲ 10 yr−1, which is not small enough to claim a detection. To improve our estimate, we performed as many 5 s time shifts as possible in the NINJA-2 dataset. This is the same method that was used for the blind injection described in [46].

Two blind injections were found in all three detectors: event 2 in the high-mass search and event 3 in the low-mass search (before category 3 vetoes were applied). Estimating background using the extended slide method with three detectors adds computational complexity, and has not previously been performed (the blind injection in [46] was only coincident in two detectors). However, in both events 2 and 3 one of the three detectors had significantly less $\hat{\rho }$ than the other two (H in event 2 and V in event 3). We therefore did not include the detector with the smallest $\hat{\rho }$ when estimating the extended background for these two events.

Event 6 was vetoed at category 3. We therefore calculated its significance only after the first two veto categories were applied. All of the other events survived category 3 vetoes. Naïvely, we expect these events to have lower FARs if their significance is calculated after categories 1–3 have been applied. However, for event 3 the trigger in the H detector was vetoed at category 3, leaving only L and V. Since the H trigger contributed a substantial amount of the combined reweighted SNR, we might expect the resulting FAR to be higher for this event after category 3. A method to deal with partially vetoed events like this has not been proposed. We therefore simply report both results here.

Event 4 was found with high significance by both the high-mass and low-mass searches. This is not surprising as the injected total mass was 26.8 M, which is close to the boundary between the two searches. Currently no method has been established on how to combine the results from the low-mass and high-mass searches. We therefore give both results here.

7. Parameter estimation results

For each of the blind injections described in sections 4 and 6, parameter distributions were estimated using both non-spinning and spin-aligned models, as functions of eleven and nine parameters, respectively. PhenomB was used both as an aligned-spin model, as well as a non-spinning model by fixing the effective spin χ to 0, which we will refer to as PhenomB$_\mathtt {\chi =0}$. In addition, the EOBNRv2 model was used as an additional non-spinning waveform (see section 5.3 for model descriptions). From these posterior estimates, one can determine the marginalized distributions in parameters of interest, as well as the evidence for the given model (see equation (6)).

Figure 6 shows the 95% credible region of the marginalized posterior in component mass space for the non-spinning and spinning models. The credible region is systematically larger for the spinning model due to the strong degeneracy between the mass ratio and spin magnitude for aligned-spin models [47, 48], illustrated in figure 7.

Figure 6.

Figure 6. The 95% credible regions for the EOBNRv2 (blue), PhenomB$_\mathtt {\chi =0}$ (red), and PhenomB (green) models. Injected values are indicated by the 'x' and the neutron star and mass gap regions are indicated where relevant. A strong degeneracy between spin and mass ratio results in systematic biases and artificially strong constraints on mass estimates when spin is ignored (i.e. EOBNRv2, PhenomB$_\mathtt {\chi =0}$). By accounting for spin the PhenomB model produces estimates consistent with the injected sources.

Standard image High-resolution image
Figure 7.

Figure 7. The 90% (dashed), 95% (solid), and 99% (dotted) credible regions of the two-dimensional marginalized posterior for the mass ratio and effective spin χ using the PhenomB model. The strong correlation between mass ratio and spin is responsible for the systematically weaker constraints placed on component masses when analysing signals with an aligned-spin model.

Standard image High-resolution image

In a real detection scenario, we can never be certain that a source contains only non-spinning components. Therefore to evaluate the ability to distinguish between 'typical' stellar-mass BHs and those occupying the mass gap (∼3–5 M), we must restrict our attention to the spinning PhenomB model. Event 3 is a spinning system where both components have a mass of 7.18 M. Despite being well above the mass gap, the degeneracy of the spinning model results in constraining the lower-mass component to be outside of the mass gap with only 46% confidence. Event 5, on the other hand, contains a lower-mass component well inside the mass gap, with a mass of 3.83 M. Knowing a priori that this is a non-spinning system, we would be able to constrain the mass to be within the gap with 100% certainty. However, when including spin, the PhenomB model only constrains it to be within the gap with 21% certainty. The estimates of the masses and spin are summarized in table 6. These results highlight the need to take spin into account when making any statements about compact object mass from GW data. We note that spin-aligned systems are the most extreme case of this degeneracy; if spins are mis-aligned with respect to the binary angular momentum the binary precesses, which causes phase and amplitude modulations in the observed waveform. This effect provides additional information to break the degeneracy between masses and spins that may result in tighter constraints on the component masses. However, we expect the exact details to depend on the actual parameters of the observed systems (masses, spins and orientation of the angular momenta), and studies are ongoing to address this issue.

Table 6. Summary of the mass and spin estimates for events 1–7 for each of the models used. Median values are quoted along with the upper and lower bounds of the 95% credible intervals. The injected values for each event are included for reference.

$\epsfbox{images/cqg494984un02.eps}$

If the waveform and noise models exactly describe the data, then these Bayesian credible intervals would be equivalent to frequentist confidence intervals, meaning that the true parameters would fall within the 95% credible interval, for example, for 95% of injections. Any errors in the models however, will introduce systematic biases that break this equivalence. In these analyses such biases could be introduced both by the use of model waveforms that only approximate those that were injected, and by our assumption that the noise is purely Gaussian. Since the noise used for this study is real (recoloured) noise recorded by the initial LIGO and Virgo detectors, the non-Gaussianities inherent with real noise can introduce bias to the parameter estimates [125]. Such biases are most apparent in the first event. This non-spinning injection was loud enough that systematic uncertainties between the NR and EOBNRv2 waveforms should be less than the statistical uncertainty [122]. Omega scan [126] spectrograms show there to be a significant glitch in the Hanford detector at the time of the injection (see figure 8(a)). An additional MCMC analysis was performed on the same injection made into noiseless data, using the same PSD as estimated for the event 1 analysis. Figure 8(b) shows the 95% credible region of this analysis to indeed constrain the injected values, leading to the conclusion that the non-Gaussian features of the detector noise led to significant systematic biases. Such biases, to varying degrees, are likely for any signal in noise that is not properly modelled. It will be crucial for better noise models to be implemented before the first detections in order to avoid these biases in parameter estimates.

Figure 8.

Figure 8. (a) shows a spectrogram of the noise in Hanford centred on the end time of the injection (before the blind injection is added). A non-Gaussian feature is present with peak energy slightly after the end time of the injection. (b) shows the 95% credible regions for the EOBNRv2 analysis of the event 1 injection in real noise (solid) and noiseless data (dashed). Injected values are indicated by the 'x'. The fact that the injected values are well outside of the 95% credible region when real noise is present, and not in noiseless data, leads to the conclusion the non-Gaussian feature in the noise led to significant systematic biases in parameter estimates.

Standard image High-resolution image

The top of figure 9 shows the evidence for each model for the events analysed. From the non-spinning (PhenomB$_\mathtt {\chi =0}$) and spinning (PhenomB) evidences, the support for the presence of spin in each signal can be quantified using the Bayes factor, defined in equation (7). The bottom of figure 9 shows the Bayes factor and associated error estimates.

Figure 9.

Figure 9. Top: evidences for the non-spinning (EOBNRv2 and PhenomB$_\mathtt {\chi =0}$) and spin-aligned (PhenomB) models. Bottom: Bayes factors showing the support for the spin-aligned model over the non-spinning PhenomB$_\mathtt {\chi =0}$ model. Events 3 and 4 are found to be spinning with high certainty. Event 7, with the largest spin magnitude, has little support for spin, though this is likely due to its low SNR.

Standard image High-resolution image

These model comparisons show strong support for the presence of spin for events 3 (χinj = 0.4) and 4 (χinj = 0.20). Event 7 had the greatest spin magnitude at χinj = −0.75, however analyses showed little evidence of it. This can likely be attributed to the low network SNR of event 7 (see table 3), which was partly due to the faster phase evolution of systems with spins counter-aligned to the orbital angular momentum [127].

There has been much work done to quantify the ability of ground-based detectors to localize GW sources on the sky [2, 128130]. The accuracy of such localizations will be important for triggering electromagnetic (EM) followup of GW detections. Though most of this sky-localization work has focused on binary neutron star mergers due to their likely association with short gamma ray bursts [131133] and numerous other proposed emission mechanisms (e.g. r-process [134], etc), there is still much to be learned from accurate BBH merger localization. Since such mergers have never been observed, the possibility of an unexpected EM emission mechanism warrants the EM followup of the first detections.

Figure 10 shows the two-dimensional marginalized distribution for the sky position of each event. Estimates of each sky position and the areas of the 95% credible regions are given in table 7. The constraints on sky position are remarkably consistent across the models used. This is of great importance for low latency sky localization efforts, where time is of the essence. These results suggest that the more physically accurate, and computationally expensive, waveform models do not provide significantly more precise or accurate estimates of source sky positions.

Figure 10.

Figure 10. 95% credible regions for the sky position of all seven events, using the EOBNRv2 (blue), PhenomB$_\mathtt {\chi =0}$ (red), and PhenomB (green) models. Despite the substantial differences between these models, their sky localization ability is very consistent.

Standard image High-resolution image

Table 7. Summary of the sky location estimates for events 1–7 for each of the models used. Median values are quoted along with the upper and lower bounds of the 95% credible intervals, as well as the area of the 95% credible regions in the sky. The injected values for each event are included for reference.

$\epsfbox{images/cqg494984un03.eps}$

8. Sensitivity evaluation

A large set of simulated signals, distributed as described in section 4, was used to assess the sensitivity of the pipelines to observe NR signals buried in data taken from initial LIGO and initial Virgo and recoloured to predicted early advanced detector observing runs, as described in section 3. Here, we use the CBC search pipelines (low-mass and high-mass ihope) to assess search efficiency in the chosen mass region: total mass between 10 and 100 solar masses.

Each signal in these large simulation sets was added at a random time to the two-month period of recoloured data. Coalescence times were limited to times where at least two observatories were operating and no data-quality flags were active. We then search for each signal using the ihope pipeline, as described in section 5. In the plots that follow we treat a simulated signal as 'detected' if there was no louder background event in the 100 time-slide trials that were performed to estimate the search background. As two ihope searches are performed, 'low-mass' and 'high-mass', we treat simulated signals as detected if either search recovers the signal with more significance than its corresponding 100 background trials.

The results of the injection campaign are summarized in figure 11, which shows the dependence of the search sensitive distance on the chosen NR waveform and the total mass value. The sensitive distance is defined as the volume-weighted average distance up to which a CBC signal with a given waveform and total mass can be detected, where we average over the source extrinsic parameters and over the varying detector noise levels in the data set via Monte Carlo integration. In order to obtain a clear comparison between different NR waveforms, we used the same set of random parameters (including total masses, coalescence times and orientation angles) for each set of ∼24 000 injected signals, thus statistical errors will tend to be correlated between different waveforms.

Figure 11.

Figure 11. Volume-weighted average sensitive distance as a function of total mass, for a variety of NR waveforms. The sensitive distance is defined as the volume-weighted average distance up to which a CBC signal with a given waveform and total mass can be detected, where we average over the source extrinsic parameters and over the varying detector noise levels in the data set via Monte Carlo integration. Top left—Waveforms with equal component masses and various spins. Here the dotted lines represent predicted performances if an aligned-spin search had been performed. Details of this prediction are given in section 8. Top right—Waveforms with zero component spins and various mass ratios. Bottom left—Waveforms with equal-mass, non-spinning components. The dashed black line represents the predicted sensitivity curve of non-spinning waveforms with the aLIGO sensitivity curve, as described in 8. The dotted black line represents the predicted sensitivity with the AdV sensitivity curve. Bottom right—Waveforms with unequal spins, or unequal component masses and non-zero spins. The waveform designations are given in table 1. These plots were generated using the data described in section 3 and the distribution of signals described in section 4.

Standard image High-resolution image

In figure 11, top left, we plot the sensitive distance as a function of total mass for a number of equal mass NR waveforms with varying component spin magnitudes. The two waveforms with anti-aligned spins have smaller sensitive distances than the other waveforms (see also [37]). This is expected; systems with anti-aligned spin emit less power in GWs than non-spinning systems [33, 135] and also may not match well with the template bank of non-spinning waveforms that was used [50]. At the largest masses in this study we can detect the two aligned-spin waveforms at distance ∼10%–15% larger than the non-spinning waveform. However, at the smallest masses used, the non-spinning waveform can be seen to comparable distances. Again this is a combination of two factors. Aligned-spin waveforms emit more power in GWs than non-spinning systems, but the bank of non-spinning waveforms will not capture all of that power. This emphasizes that investigating the use of template waveforms incorporating spin effects is a necessity for the advanced detector era. To illustrate this further the dotted lines in this figure represent a prediction of the sensitivity that would have been obtained if a bank of aligned-spin template waveforms had been used. This prediction was obtained by taking the sensitive distances found for non-spinning NR waveforms recovered in the current CBC searches, which use non-spinning inspiral-only or IMR templates, and multiplying them by the ratio of optimal (matched filter) SNRs of (anti)aligned-spin NR waveforms to non-spinning NR waveforms as a function of total mass. This is described by

Equation (8)

Here i is used to denote the aligned-spin waveform for which the sensitive distance, $D_{\mathrm{sens}}^i$, is to be predicted. NS is used to denote the non-spinning waveform. The optimal matched filter SNR at fixed distance is proportional to the signal power σ, given by the inner product

Equation (9)

where hi is the spinning waveform labelled by i (or hNS for the non-spinning waveform) for an optimally oriented and located binary at fixed distance.

We can see from the plot that the distance sensitivity to spinning waveforms using a non-spinning bank is not as large as the predicted values from using (anti)aligned-spin template banks. This is especially true for the highly aligned-spin waveform where the predicted sensitive distance is ∼15% larger than the obtained distance for systems with total mass >40M. We note that the actual sensitivity improvement of a search using (anti)aligned-spin templates may not be as much as predicted here because such a search would require a larger number of templates and therefore provide more chances to obtain large SNR when matched-filtered against the underlying noise. Therefore the detection threshold when performing an aligned-spin search will increase with respect to a search using a non-spinning template bank. However, even a factor of 10 increase in the number of independent templates will only increase the expected SNR of the loudest background event by less than 5%, if Gaussian noise is assumed [51]. Work is ongoing to assess how much larger template banks of aligned-spin BBH waveforms are when compared to banks restricted to only non-spinning waveforms and to accurately compare the performance of such searches.

In the top right panel of figure 11 we plot the horizon distance as a function of total mass for a number of non-spinning NR waveforms with varying mass ratio. The sensitivity to these systems, for the same total mass, increases as the mass ratio (which we take to be greater than or equal to 1) decreases. This is because, to leading order, the GW power emitted during the inspiral phase is dependent on the chirp mass [95]; for the same total mass, a system with higher mass ratio will have a smaller chirp mass.

The bottom left panel of figure 11 shows the horizon distance as a function of total mass for the five non-spinning, equal-mass waveforms that were submitted. As expected, there is no significant discrepancy between these waveforms. We remind the reader that the same set of source parameters was used for each of these five sets of injections. Therefore statistical errors are strongly correlated between the results for different waveforms. The dashed and dotted black lines on this plot represent a prediction of the sensitive distance to non-spinning signals for the early aLIGO and early AdV noise curves respectively. This prediction was obtained by calculating the horizon distance, defined in section 3, for the $G1+00+00\_T4$ waveform for both the early aLIGO and early AdV sensitivity curves as a function of mass. This measurement is then rescaled by a factor of 2.26 to account for the fact that the observatories do not have equal power to all orientations and sky locations [136]. We note that the obtained results agree well with this prediction, and fall between the early aLIGO and early AdV predictions when the two diverge. This is expected, as detection in ihope is dominated by the sensitivity of the second most sensitive detector operating at the time [44]. For times when at least two observatories were operating, the NINJA-2 dataset approximately consists of 50% of time when only one of the LIGO detectors and Virgo were operating and 50% of time when both LIGO detectors were operating, including time when Virgo is operating and when it is not. As the early aLIGO sensitivity does not drop below the early AdV sensitivity for the mass range considered, we expect the obtained sensitivity curve to lie roughly in the middle of the two predictions, and this is what we observe. It is worth pointing out that the ihope search is able to achieve this Gaussian-noise-predicted sensitivity, even though this analysis is run on real data, which includes non-stationarity and non-Gaussian transients.

Finally, in the bottom right panel of figure 11 we show sensitive distances for a number of waveforms with unequal spins, or unequal masses and non-zero spins.

9. Conclusion

This paper presents the first systematic study to assess the ability to detect numerically modelled binary black hole data in real data taken from initial LIGO and Virgo and recoloured to predicted sensitivity curves of Advanced LIGO and Advanced Virgo in early observing runs. Building upon the work of the first NINJA project, this work, the culmination of the second NINJA project, studies the ability to do gravitational wave astronomy on a set of 60 binary black hole hybrid waveforms submitted by eight NR groups.

In this work, a set of seven numerically modelled binary black hole waveforms were added into the recoloured data. This data was distributed to analysts with no knowledge of the parameters of the systems. The unmodelled gravitational waveform search pipeline, cWB, was able to recover one of these signals with an estimated false alarm rate of 1 every 47 years. The matched-filtered compact binary merger search pipeline, ihope, using a bank of BBH IMR waveforms, which were not calibrated against the NR signals used in NINJA-2, was able to recover 6 of the waveforms with false alarm rate upper limits ranging between 1 every 300 years and 1 every 2500 years.

A range of parameter-estimation codes were run on the seven blind injections that were added to the data used in this work. Though only results from the MCMC sampler were shown, these results proved to be statistically equivalent to estimates produce by the nested sampling and multinest samplers. These results demonstrate that it will be difficult to produce precise estimate of black hole component masses and spins because of intrinsic degeneracies between these parameters in the emitted waveforms. For some of the BBH blind injections we find that a neutron-star–black-hole coalescence cannot be ruled out. We also demonstrate the sensitivity of current parameter-estimation algorithms to non-Gaussian features in the data and explore the ability to perform sky-localization of BBH observations.

A large-scale Monte Carlo study was conducted to assess the efficiency of the ihope search pipeline as a function of the mass and angular momenta of the component black holes. We find that for non-spinning equal mass waveforms the sensitivity of the ihope search pipeline in real noise, including non-Gaussian artefacts, agrees well with predictions obtained using a Gaussian-noise assumption. We have found evidence that adding waveform models that include the effects of spin into the search pipeline will increase the efficiency of binary black hole observation. We have also demonstrated that the ability to recover numerical relativity waveforms, with identical parameters, but submitted by different groups, is indistinguishable up to statistical errors.

These results represent the next step for the NINJA collaboration; they address shortcomings in NINJA-1 while paving the way for future work. In a sense this paper represents a baseline, as it measures the ability of current gravitational-wave analyses to detect and recover the parameters of an important subset of possible BBH signals in non-Gaussian noise in the advanced detector era. From this baseline there are multiple directions in which NINJA can expand. On the NR front, groups are continuing to fill in the parameter space. As shown in figure 1, even within the subspace of systems with (anti-)aligned spins there are large regions left to explore. Although NINJA-2 chose not to consider precessing signals many groups already have or are working on such simulations [36, 43, 88, 137143]. Similarly, although the analyses used only the ℓ = m = 2 mode in this work, it is expected that higher modes will be important for detection and parameter recovery [21, 31, 144146]. Additional modes have been provided for many of the waveforms in the NR catalog, although they have not yet been validated. In all cases, as additional waveforms and modes become available they can be injected into the noise allowing for systematic tests of both detection and parameter-estimation analyses.

In parallel the detection and parameter-estimation analyses continue to evolve and improve. There is much development work ongoing to improve the analytical waveform models that are used in analysis pipelines, particularly for inspiral–merger–ringdown waveforms. It seems likely that before the first aLIGO and AdV observation runs generic fast IMR precessing analytic models will be available [32, 34, 36, 147, 148]. Improvements in how detection pipelines deal with non-Gaussianities are being explored to attempt to achieve the maximum possible sensitivity to BBH signals across the parameter space. A number of efforts are ongoing to implement aligned-spin waveform models into search algorithms. As we have demonstrated here, this will increase sensitivity to BBH systems with aLIGO and AdV [4951]. Work is also underway to develop more realistic models of detector noise for parameter-estimation pipelines, which account for the non-stationarity and non-Gaussianity present in real noise [149]. Accounting for such features is expected to greatly reduce systematic biases in the recovered masses and spins, such as those seen in event 1. The results presented here can provide a measure against which these next-generation analyses can be compared, in a way that measures not only their response to signals but also to realistic noise.

Acknowledgments

The authors gratefully acknowledge the support of the United States National Science Foundation for the construction and operation of the LIGO Laboratory, the Science and Technology Facilities Council of the United Kingdom, the Max-Planck-Society, and the State of Niedersachsen/Germany for support of the construction and operation of the GEO600 detector, and the Italian Istituto Nazionale di Fisica Nucleare and the French Centre National de la Recherche Scientifique for the construction and operation of the Virgo detector. The authors also gratefully acknowledge the support of the research by these agencies and by the Australian Research Council, the International Science Linkages program of the Commonwealth of Australia, the Council of Scientific and Industrial Research of India, the Istituto Nazionale di Fisica Nucleare of Italy, the Spanish Ministerio de Economía y Competitividad, the Conselleria d'Economia Hisenda i Innovació of the Govern de les Illes Balears, the Foundation for Fundamental Research on Matter supported by the Netherlands Organisation for Scientific Research, the Polish Ministry of Science and Higher Education, the FOCUS Programme of Foundation for Polish Science, the Royal Society, the Scottish Funding Council, the Scottish Universities Physics Alliance, The National Aeronautics and Space Administration, OTKA of Hungary, the Lyon Institute of Origins (LIO), the National Research Foundation of Korea, Industry Canada and the Province of Ontario through the Ministry of Economic Development and Innovation, the National Science and Engineering Research Council Canada, the Carnegie Trust, the Leverhulme Trust, the David and Lucile Packard Foundation, the Research Corporation, and the Alfred P Sloan Foundation.

We gratefully acknowledge support from the National Science Foundation under NSF grants PHY-1305730, PHY-1212426, PHY-1229173, AST-1028087, DRL-1136221, OCI-0832606, PHY-0903782, PHY-0929114, PHY-0969855, AST-1002667, PHY-0963136, PHY-1300903, PHY-1305387, PHY-1204334, PHY-0855315, PHY-0969111, PHY-1005426, PHY-0601459, PHY-1068881, PHY-1005655, PHY-0653443, PHY-0855892, PHY-0914553, PHY-0941417, PHY-0903973, PHY-0955825, by NASA grants 07-ATFP07-0158, NNX11AE11G, NNX13AH44G, NNX09AF96G, NNX09AF97G, by Marie Curie Grants of the 7th European Community Framework Programme FP7-PEOPLE-2011-CIG CBHEO No. 293412, by the DyBHo-256667 ERC Starting Grant, and MIRG-CT-2007-205005/PHY, and Science and Technology Facilities Council grants ST/H008438/1 and ST/I001085/1. Further funding was provided by the Sherman Fairchild Foundation, NSERC of Canada, the Canada Research Chairs Program, the Canadian Institute for Advanced Research, the Ramón y Cajal Programme of the Ministry of Education and Science of Spain, contracts AYA2010-15709, CSD2007-00042, CSD2009-00064 and FPA2010-16495 of the Spanish Ministry of Science and Innovation, the German Research Foundation, grant SFB/Transregio 7, the German Aerospace Center for LISA Germany, and the Grand-in-Aid for Scientific Research (24103006). Computations were carried out on Teragrid machines Lonestar, Ranger, Trestles and Kraken under Teragrid allocations TG-PHY060027N, TG-MCA99S008, TG-PHY090095, TG-PHY100051, TG-PHY990007N, TG-PHY090003, TG-MCA08X009. Computations were also performed on the clusters "HLRB-2' at LRZ Munich, 'NewHorizons' and 'Blue Sky' at RIT (funded by NSF grant nos PHY-1229173, AST-1028087, DMS-0820923 and PHY-0722703), 'Zwicky' at Caltech (funded by NSF MRI award PHY-0960291), 'Finis Terrae' (funded by CESGA-ICTS-2010-200), 'Caesaraugusta' (funded by BSC grant nos AECT-2011-2-0006, AECT-2011-3-0007), 'MareNostrum' (funded by BSC grant nos. AECT-2009-2-0017, AECT-2010-1-0008, AECT-2010-2-0013, AECT-2010-3-0010, AECT-2011-1-0015, AECT-2011-2-0012), 'VSC' in Vienna (funded by FWF grant P22498), 'Force' at GaTech, and on the GPC supercomputer at the SciNet HPC Consortium [150]; SciNet is funded by: the Canada Foundation for Innovation under the auspices of Compute Canada; the Government of Ontario; Ontario Research Fund—Research Excellence; and the University of Toronto.

Footnotes

  • 136 

    Lately, a lot of work has been devoted to extend the sensitivity of the algorithm to lower total masses, which is part of the on-going upgrades of cWB.

Please wait… references are loading.
10.1088/0264-9381/31/11/115004