A guide to LIGO-Virgo detector noise and extraction of transient gravitational-wave signals

The LIGO Scientific Collaboration and the Virgo Collaboration have cataloged eleven confidently detected gravitational-wave events during the first two observing runs of the advanced detector era. All eleven events were consistent with being from well-modeled mergers between compact stellar-mass objects: black holes or neutron stars. The data around the time of each of these events have been made publicly available through the gravitational-wave open science center. The entirety of the gravitational-wave strain data from the first and second observing runs have also now been made publicly available. There is considerable interest among the broad scientific community in understanding the data and methods used in the analyses. In this paper, we provide an overview of the detector noise properties and the data analysis techniques used to detect gravitational-wave signals and infer the source properties. We describe some of the checks that are performed to validate the analyses and results from the observations of gravitational-wave events. We also address concerns that have been raised about various properties of LIGO-Virgo detector noise and the correctness of our analyses as applied to the resulting data.

The LIGO Scientific Collaboration and the Virgo Collaboration have cataloged eleven confidently detected gravitational-wave events during the first two observing runs of the advanced detector era. All eleven events were consistent with being from well-modeled mergers between compact stellar-mass objects: black holes or neutron stars. The data around the time of each of these events have been made publicly available through the Gravitational-Wave Open Science Center. The entirety of the gravitational-wave strain data from the first and second observing runs have also now been made publicly available. There is considerable interest among the broad scientific community in understanding the data and methods used in the analyses. In this paper, we provide an overview of the detector noise properties and the data analysis techniques used to detect gravitational-wave signals and infer the source properties. We describe some of the checks that are performed to validate the analyses and results from the observations of gravitational-wave events. We also address concerns that have been raised about various properties of LIGO-Virgo detector noise and the correctness of our analyses as applied to the resulting data.

Introduction
Gravitational-wave observations have become an important new means to learn about the Universe. The LIGO Scientific Collaboration and the Virgo Collaboration (LVC) have published a series of discoveries beginning with the first detected event, GW150914 [1], a binary black hole merger. Within a span of two years, that event was followed by nine other binary black hole detections (GW151012 [2,3], GW151226 [4], GW170104 [5], GW170608 [6], GW170729, GW170809, GW170814 [7], GW170818 and GW170823), and one binary neutron star merger, GW170817 [8]. Details about all of these confidently-detected gravitational-wave events have been published in a catalog, GWTC-1 [3].
The global gravitational-wave detector network currently consists of two Advanced LIGO detectors in the U.S. [9] in Hanford, Washington and Livingston, Louisiana; the Advanced Virgo detector in Cascina, Italy [10]; and the GEO 600 detector in Germany [11]. In the coming years this network will grow through the addition of the Japanese detector, KAGRA [12][13][14], and a third Advanced LIGO detector to be located in India [15]. The first observing run (O1) of Advanced LIGO took place from September 12, 2015 until January 19, 2016. The second observing run (O2) for the Advanced LIGO detectors began on November 30, 2016, and lasted until August 25, 2017. The Advanced Virgo detector formally commenced observations in O2 on August 1, 2017, enabling the first three-detector observations of gravitational waves [3]. A third LIGO-Virgo observing run, O3, began on April 1, 2019, with all three detectors operating with their best sensitivity to date.
Consistency between multiple detectors helps greatly to suppress instrumental backgrounds and to allow coherent analysis of gravitational-wave signals. All of the event detections published to date have involved both of the Advanced LIGO detectors, while GW170814 and GW170818 were triple-detections sensed by Virgo as well. Data from Virgo were also used in the parameter estimation analysis and sky localization determination for GW170729, GW170809, and GW170817. The Virgo data were especially critical in helping to find the source of GW170817 [16]. This binary neutron star merger represented a remarkable debut for multi-messenger astronomy with gravitational waves, as it was closely followed by a short gamma-ray burst, GRB 170817A [17,18], and the relatively precise localization obtained from the gravitationalwave data enabled the identification and thorough multi-wavelength study of kilonova and afterglow emission from an optical counterpart, SSS17a / AT 2017gfo [16,19].
As summarized in [3], the LVC detections were made using two independent matched-filter analyses to search for compact binary coalescences in O2 [20,21], as well as an unmodeled search for short-duration transient signals or bursts [22]. Thus, detection methods that were developed by the LVC and tested using simulated signals added to mock data, or to previous sets of real data where any possible signals were overwhelmed by noise, have now been demonstrated to be effective for astrophysical gravitational-wave signals. Testing and validation of LVC analyses was achieved using both (simulated) signal injections performed within the analysis, i.e. in software, and signal injections made in hardware by moving the detectors' test masses.
The growth of the number of observed gravitational-wave events has stimulated intense interest in the astrophysical implications of the detected sources, as well as interest in the gravitational-wave data. Currently, the LVC releases data through the Gravitational-Wave Open Science Center (GWOSC) [23,24]. LIGO data releases are described in the LIGO Data Management Plan [25], an agreement between the LIGO Laboratory and the US National Science Foundation. The LVC policy for releasing gravitational-wave triggers and event candidates is presented in [26,27]. For detections of compact binary mergers, about one hour (4096 seconds) of calibrated strain data around the event time are released at the time of publication. These data are available for all published detections in O1 and O2 [28].
Currently the bulk data from the initial LIGO Science Runs since 2005 are available on GWOSC [24], as are the Advanced LIGO data from the O1 observing run [29] and the Advanced LIGO and Advanced Virgo data from the O2 observing run [30]. Timing for release of data in future observing runs is described in the Data Management Plan [25]; for instance, the bulk data from the first 6 months of the O3 run will be released in April 2021. GWOSC is continually updating and releasing data products that address the needs and interests of the broader scientific community. Many of the analysis software packages used by the collaboration are publicly available as open source code; a list of these is available on the GWOSC web site [24]. Also, a number of intermediate data products are released through the LIGO Document Control Center, typically linked with LVC papers; e.g. see [31].
With the public release of the LIGO and Virgo data, groups outside these collaborations are analyzing the released data. Most of these analyses are producing results consistent with the LVC's [32][33][34][35][36][37][38][39], and some additional significant event candidates have been reported [40,41]. The noise properties of the LIGO data and the correctness of the LVC data analysis for GW150914 have also been questioned [42,43], although successive gravitational wave detections have strengthened confidence in our detection and parameter estimation methods [3]. Motivated by the widespread interest in analyzing LIGO and Virgo data, in this paper we provide an overview of the properties of the LIGO-Virgo data and its noise components. We also describe the essential features of data analysis procedures that have been used by LIGO and Virgo teams to detect and measure the properties of the cataloged gravitational-wave sources [3], as summarized in Figure 1. The analysis of LIGO and Virgo data in searching for gravitational-wave signals is complex, as is the correct treatment of the statistical properties of noise. The LVC encourages the broader scientific community to access and analyze its data, and will always be open to discussions about the methods it uses to arrive at its conclusions. The codes used to analyze LIGO-Virgo data are public. The special purpose codes used to generate many of the figures in this paper are also available [44]. In addition, the LVC has made available a Jupyter notebook to illustrate methods used to produce key figures and results in a simplified implementation [45]. Finally, many of the software packages used by the LVC to process the LIGO-Virgo data, search for events and characterize observed signals can be found at the GWOSC site [46]. The paper is organized as follows. In Section 2 we describe the properties of the LIGO-Virgo data, while in Section 3 we discuss the noise that affects those data. Section 4 describes the basic data processing steps used to properly Fourier transform the data and estimate the power spectrum. Section 5 describes wavelet based time-frequency methods that can be used to assess possible deviations from stationary detector noise. Section 6 addresses detector and calibration issues for LIGO and Virgo. Section 7 describes the noise model used to define the likelihood function used in parameter estimation studies. Section 8 gives a description of the means by which the LVC searches for gravitational-wave signals, while Section 9 presents the means by which the LVC infers the detected waveforms and estimates the physical parameters for the system that emitted the gravitational waves. To illustrate these concepts, Section 10 provides a simplified description of how the publicly released data surrounding GW150914 can be used to find a best fit waveform model and to study the correlation properties of the residuals. We also address claims made in [42,43] concerning correlations in detector noise, residuals, and the estimation of GW150914's source properties. In addressing these claims, the LVC notes that it is beneficial for gravitational-wave science that groups external to our collaboration can introduce new ideas and techniques. Finally, in Section 11 we provide a summary assessment of LIGO and Virgo data properties as well as LVC data analysis findings and validation.

Properties of LIGO-Virgo data
The Advanced LIGO [9] and Advanced Virgo [10] second-generation gravitationalwave detectors are large-scale enhanced Michelson interferometers. The detectors are sensitive to space time strain induced by passing gravitational waves, as well as equivalent terrestrial force and displacement noises, each causing the lengths of the arms to vary over time. Differences in relative arm length generate power variations in the enhanced Michelsons output, captured by photodiodes. The signal from these photodiodes serve as both the gravitational-wave readout and an error signal for controlling the relative arm length below roughly 100 Hz.
The Advanced LIGO gravitational-wave detectors are identical in design, with 4 km long arms. Advanced Virgo has a similar design, with 3 km long arms. Fabry-Perot cavities are used in the arms of the detectors to increase the interaction time with a gravitational wave, and power recycling is used to increase the effective laser power. Signal recycling has been added in the Advanced LIGO detectors to shape their frequency response [9]. Advanced Virgo has not yet implemented signal recycling, but will in the future [10].
A calibration procedure is applied to the interferometer photodiode output of each detector (see Section 6.1) to produce gravitational-wave strain data as a time series, sampled at 16384 Hz for LIGO data and 20 kHz for Virgo data. For the Advanced LIGO detectors, the calibration is valid above 10 Hz and below 5 kHz, as described in Section 6.1. For Advanced Virgo in O2 the calibration validity range was from 10 Hz to 8 kHz [47]. The detectors also record hundreds of thousands of auxiliary channels, time series recorded in addition to the strain signal, that monitor the behavior of the detectors and their environment. The GWOSC provides distilled additional channels of data in which flags pertaining to different levels of problems with the data quality are implemented ‡. We employ continuous monitoring of the detector performance to characterize noise sources that could negatively impact the sensitivity of the searches or the source property estimation [48,49]. Invalid data due to detector malfunction, calibration error, or data acquisition problems are flagged so that they can be removed from analyses, as described in Section 6 and [50].

Basic properties of detector noise
The data recorded by the Advanced LIGO and Advanced Virgo instruments are impacted by many sources of noise, including quantum sensing noise, seismic noise, suspension thermal noise, mirror coating thermal noise, and gravity gradient noise [9]. In addition, there are transient noise events, for example coming from anthropogenic sources, weather, equipment malfunctions [48], as well as occasional transient noise of unknown origin [51]. There is also persistent elevated noise confined to certain frequencies, manifesting as very narrow peaks in a plot of noise versus frequency, which we refer to as spectral lines; these are typically caused by electrical and mechanical devices or resonances [49]. The combination of all the noise sources in a detector produces a time series n(t) that can be represented by a vector n, with components given by the discrete time samples n i = n(t i ). The noise is described as a stochastic process with statistical properties given by the joint probability distribution p(n). This model can be used to define summary statistics such as the mean µ = E[n] (where E is defined as the expectation value) and covariance C ij = E[(n i − µ)(n j − µ)] where the expectation values are taken with respect to p(n). The mean can be estimated from the data asμ where N = dim(n) is the number of data samples. The full covariance matrix cannot be estimated from the data without making additional assumptions as we have only M = 1 measurements for each data point, rendering the sample covariance matrix formally undefined: Estimates of the covariance matrix can be made if noise is assumed to follow a particular distribution, or if the noise properties are unchanging in time. Note that in practice, analyses generally do not use all N samples at once, but rather use segments of contiguous data of various lengths from a few seconds up to hours depending on the intended application. Noise is referred to as Gaussian if the joint probability distribution follows a multi-variate normal distribution: where C −1 ij is the inverse of the covariance matrix at i, j. The noise is referred to as stationary if C ij depends only on the lag |i − j|. Stationary noise is characterized by the correlation function C(τ ), where τ = |t i − t j | is the time lag. Transforming to the Fourier domain, where the labels i, j now refer to frequencies f i , f j , stationary noise has a diagonal covariance matrix C ij = δ ij S n (f i ), which defines the power spectral density S n (f ). The power spectral density is given by the Fourier transform of the correlation function C(τ ). Amplitude spectral density is the square root of power spectral density and has units of Hz −1/2 . The noise is referred to as white if C ij = δ ij σ 2 in both the frequency domain and the time domain. White noise is, however, a poor approximation to LIGO-Virgo detector noise Understanding the noise is crucial to detecting gravitational-wave signals and inferring the properties of the astrophysical sources that generate them. Improper modeling of the noise can result in the significance of an event being incorrectly estimated, and to systematic biases in the parameter estimation. To guard against these unwanted outcomes, detector characterization and noise modeling are significant activities within the LVC [48,50]. While many textbook treatments of gravitationalwave data analysis [52][53][54] describe the idealized case of independent detectors with stationary, Gaussian noise, actual LVC analyses are careful to account for deviations from this ideal.
The Advanced LIGO and Advanced Virgo detector data have a rich structure in both time and frequency. For a given gravitational-wave source, the noise (as described by its spectral density) governs the measured signal-to-noise ratio (SNR). Figure  2 shows the spectral frequency content of the LIGO-Livingston detector averaged over a three-minute period shortly before the first detection of gravitational waves from a binary neutron star merger, GW170817. During the O1 and O2 runs, the Advanced LIGO detectors had an averaged measured noise amplitude of about 10 −23 Hz −1/2 at 100 Hz. (The target sensitivity at 100 Hz for Advanced LIGO is 4 × 10 −24 Hz −1/2 [9], while for Advanced Virgo it is 5 × 10 −24 Hz −1/2 [10].) The steep shape at low frequencies is dominated by noise related to ground motion. Above roughly 100 Hz, the Advanced LIGO detectors are currently quantum noise limited, and their noise curves are dominated by shot noise [9,55]. High amplitude noise features are also present in the data at certain frequencies, including lines due to the AC power grid (harmonics of 60 Hz in the U.S. and 50 Hz in Europe), mechanical resonances of the mirror suspensions, injected calibration lines, and noise entering through the detector control systems. For a detailed account of noise sources that appear at specific frequencies in the Advanced LIGO detectors, see [49]. For a list of the Advanced Virgo noise lines for observing run O2, see [56].

Fourier domain analysis
The noise in the LIGO-Virgo detectors is, with isolated exceptions, approximately stationary, and therefore can be most easily characterized in the frequency domain. Stationary, Gaussian noise is uncorrelated between frequency bins, and the noiseñ(f ) in each bin follows a Gaussian distribution with random phase and amplitude S 1/2 n (f ). The first step in many LVC analyses is to Fourier transform the time-domain data using a fast Fourier transform (FFT) [57][58][59]. Since the FFT implicitly assumes that the stretch of data being transformed is periodic in time, window functions [60,61] have to be applied to the data to suppress spectral leakage [61] using e.g. a Tukey (cosine-tapered) window function. Failing to window the data will lead to spectral leakage and spurious correlations in the phase between bins. For the analysis of transient data the use of Tukey windows is advantageous as signals will suffer less modification than, for example, Hanning or Flattop windows [61].
As an illustration, Figure 3 shows a sequence of processing steps applied to a stretch of calibrated strain data from the LIGO-Hanford detector around the time of GW150914. The raw data are dominated by low-frequency noise. A Tukey window with 0.5 s transition regions was applied to the raw data. Next, the data were whitened by dividing the Fourier coefficients by an estimate of the amplitude spectral density of the noise, which ensures that the data in each frequency bin has equal significance by down-weighting frequencies where the noise is loud. The data were then inverse Fourier transformed to return to the time domain: The whitened samples were scaled to have unit variance in the time domain. As a final step, the data were bandpass filtered using a zero-phase, eighth order Butterworth filter with pass band [35 Hz,350 Hz]. The bandpass enhances the visibility of features of interest in this band by removing noise outside of the band -seismic and related noise at low frequencies, and quantum sensing noise at high frequencies. Note that such narrow bandpassing is only used for visualization purposes and is not employed in the LVC analyses. The gravitational-wave signal GW150914 is visible in the whitened and bandpassed data shown in the lower panel of Figure 3. While the steps above can make loud transient signals like GW150914 more easily visible in the strain time series, LVC's statistical analysis pipelines typically use a different sequence of processing steps. LVC pipelines for detection and parameter estimation proceed by first high-pass filtering the data, to remove high-amplitude noise below the range of frequencies that will be analyzed by the pipelines which typically starts at ∼ 20 Hz. The data may also be down-sampled, after low-pass filtering to avoid aliasing, to reduce computational costs; thus its frequency content will be affected by the anti-aliasing filter at high frequency, with a formal cutoff at the Nyquist frequency of the down-sampled data [20,21]. The LVC parameter estimation pipelines do not apply any bandpass filter to the data, but limit the likelihood integral calculation to begin at some lower frequency cut-off (typically also 20 Hz).
GW150914 was originally identified with high significance by a generic search for coherent excess power across the detector network [1,62], as well as by matched-filtering analyses [2], as described in Section 8, but this loud signal is also clearly visible in the data even with the minimal processing described here.

Methods for measuring the noise spectrum
The power spectral density of the noise S n (f ) is not known a priori and must be estimated from the data. One can perform a complex FFT of the entire data stream around some time to be searched for signals, but that yields only two samples (real and imaginary parts) per frequency bin, hence the variance in the estimate of S n (f ) in any single frequency bin is large. To overcome this, either some form of averaging is used [63], or a fit is made to a physical model for the spectrum [64]. For example, Welch averaging [65] can be used to reduce the variance in the estimated power spectrum, but at the cost of either reducing the frequency resolution or requiring longer stretches of data. The spectral estimate used to whiten the data in Figure 3 was found by applying a Welch average to 1024 s of data centered on GPS time 1126259462 (the nearest integer GPS time to the peak of the GW150914 signal). The data were broken up into overlapping 4 s long chunks, each spaced by 2 s. The data in each chunk was Tukey filtered and Fourier transformed. The power spectrum from all the chunks was then averaged. Figure 4 compares the power spectrum of the Hanford data shown in Figure 3, before and after applying the Tukey window, to the power spectrum estimated using Welch averaging. The non-windowed spectrum is swamped by spectral leakage, and follows a 1/f 2 scaling. This scaling results from the abrupt step function at the beginning and end of the data to be Fourier transformed. This non-windowed data chunk arises from multiplying a longer stretch of data by a boxcar (or top hat) window. Thus, when it is Fourier transformed, the result is the convolution of the desired spectrum of the original data with the Fourier transform of the 4s-long boxcar window, i.e. a cardinal sine (sinc) function whose amplitude decreases as 1/f 2 . Since the noise spectrum rises much more rapidly than 1/f 2 towards low frequencies, the entire visible frequency range is then dominated by the leakage from this low-frequency component.
When the noise spectrum varies significantly over time other spectral estimation methods have to be used [20,66]. One approach used in LVC parameter estimation studies is to fit a parametrized spectral model to the data that has a smooth spline component and a collection of Lorentzian lines [64]. In Section 5 we also discuss in detail the issue of stationarity and non-stationarity of the data, and the effects this has on the data analysis.
In addition to causing spectral leakage, improper windowing of the data can result in spurious phase correlations in the Fourier transform. Figure 5 shows a scatter plot of the Fourier phase as a function of frequency for the same stretch of data shown in Figure 4, both with and without the application of a window function. The un-windowed data shows a strong phase correlation, while the windowed data does not.
The degree to which a time series is consistent with being stationary and Gaussian noise can be diagnosed by looking at the distribution of its Fourier transformed frequency samples. If the noise is stationary and Gaussian the real and imaginary parts of the whitened noise in each frequency bin will be a collection of independent and identically distributed (i.i.d.) random variables with zero mean and unit variance: x ∼ N (0, 1). Departures from stationarity result in correlations between samples in different Fourier bins, while departures from Gaussianity can be identified by comparing the distribution of samples to a unit normal distribution. Loud instrumental noise transients and loud gravitational-wave bursts do contribute to non-stationary and non-Gaussian features, but away from these transient disturbances the LIGO-Virgo  Figure 3. The spectrum for the non-windowed data are swamped by spectral leakage, and follow a 1/f 2 scaling. The Welch average was computed using a longer stretch of data. data can be approximated as stationary and Gaussian. Figure 6 shows the whitened Fourier amplitudes for a quiet stretch of data from the LIGO-Livingston observatory.

Time-frequency analysis and stationarity
The LIGO-Virgo data exhibit two main types of non-stationary behavior. The first is slow and continuous adiabatic drifts in the power spectrum occurring over minutes or hours, and the second is short-duration noise transients, which we refer to as glitches, that are typically localized in time and frequency. Additional non-stationarity has been observed in the vicinity of spectral lines, such as those due to electromagnetic couplings to the 50/60 Hz AC power supply. The adiabatic drifts in the power spectrum can be defined in terms of locally stationary processes [67,68]. A locally stationary process has a covariance function which is the product of a covariance function for a stationary process and a time-variable function.
The stationarity of the data is evaluated as part of candidate event validation [3,48]. Here we describe some simplified non-stationarity tests that can be applied to the data. Non-stationarity can in principle be identified by looking for correlations in the Fourier amplitudes, but it is easier to identify and classify non-stationary behavior using time-frequency methods. The simplest approach is to divide the data into small chunks of time centered on time t i , and compute a smoothed estimate for the power spectrum for each chunk S n (f, t i ). Figure 7 shows Bayesian power spectral density estimates [64] computed using 8-second stretches of data from the LIGO-Hanford instrument that are spaced at 64-second intervals. The instrument noise level was highly variable during this time period, showing large changes in the power spectral  Figure 3. If no window is applied before performing the FTT, as was the case in the analysis in [42], spectral leakage causes the phase to be correlated. When the Tukey window is applied the phases appear randomly distributed, as expected for Gaussian noise. The phases show some clustering around the 60 Hz power line, consistent with the deterministic origin of this noise component. density in the band between 32 Hz and 256 Hz (note that this particular period of time for this example was chosen due to observed large variations in the detector's sensitivity). Wavelets provide a more flexible analysis framework than short-time Fourier transforms. Continuous wavelet transforms are commonly used in LIGO-Virgo data studies to produce spectrograms that provide a visual indication of non-stationary behavior. Quantitative assessments of non-stationarity may also be made by using discrete, orthogonal wavelet transforms. These can be visualized using a scalogram, showing the amplitudes of the wavelet basis functions at each discrete time and frequency pixel. Figure 8 shows a scalogram of the same stretch of LIGO-Hanford data which were used to produce Figure 7. The data were first whitened using an amplitude spectral density estimate taken from 256 seconds of data centered at GPS time 1165067917. The whitened data were then transformed using discrete wavepackets §, built from Meyer wavelets [70], that were chosen to give uniform time and frequency coverage with tiles of size ∆t = 0.5 seconds and ∆f = 1 Hz. The average power at each time was then computed by summing the squares of the wavelet amplitudes (and dividing by a normalization constant) between 16 and 256 Hz. The noise level is elevated for almost a minute around the center of the data segment.
When this analysis is applied to stationary, Gaussian noise, the power in each time interval follows a chi-squared distribution with N f degrees of freedom, where N f is the § Note that the standard discrete wavelet transformation applies successive high and low pass filters in a particular order. The wavelet wave packet transform generalizes this to consider all possible combinations of filter application orders. A particular path through this transformation sequence defines some wavelet wave packet transformation. One is free to choose the decomposition path. Various criteria can be used to select an optimal or near optimal decomposition for a particular data set. In the study presented in this paper we use a path that gives a regular spacing in frequency bands because this choice provides a simple generalization of a discrete Fourier transform to the more flexible time-frequency case. For more information on this method, see [69]. number of frequency pixels that are summed over. The distribution of the average power can be compared to this reference distribution using e.g. an Anderson-Darling test [71], to yield a quantitative measure of the non-stationarity. Note that while stationary noise is stationary no matter what time span is considered, non-stationary noise will produce different measures of departure depending on the averaging scale (here the width of the wavelet pixels in time) and time span of the data. For visualization purposes it is convenient to transform the average power p(t) to a new variable s(t) via the Wilson-Hilferty transformation [72], such that s(t) follows a N (0, 1) Gaussian distribution when the noise is stationary and Gaussian.
Applying the Anderson-Darling test to the total power yields p-values for the hypothesis that the data are stationary. When applied to the quiet stretch of data shown in Figure 9 the test yields a p-value of p = 0.74, indicating that the hypothesis that the data are stationary cannot be rejected over this time period at this wavelet scale. Applying the same test to the data shown in Figure 10 yields a p-value of p = 2.3 × 10 −6 , and we can reject the hypothesis that the data are stationary with high confidence. Any analysis that attempted to detect or estimate the parameters of a possible gravitational-wave signal occurring in this stretch of data would then have to take steps to mitigate, suppress or otherwise account for the departure from stationary noise.

Detector calibration and data quality
In this section we provide the central concepts related to the calibration of the data as well as an overview of the data quality checks we perform. These procedures ensure that the strain data used for analyses (namely, the analyses used by the LVC in publication  results) and made public on the GWOSC is calibrated properly with known error bars, and that time periods of poor data quality can be avoided, as explained below.

Realtime interferometer control
Calibration pipeline Figure 11. Differential arm length control loop and calibration diagram of the LIGO detectors from the GW150914 companion paper on calibration [55]. The left (grey) box shows the realtime detector controls while the right (purple) box shows the calibration procedure. ∆L f ree is the unsuppressed change in differential arm length, and hence the desired quantity. The photodiodes (part of the sensing C) measure the residual differential arm length ∆Lres, which is suppressed by the feedback loop. The "error signal" derr, equal to ∆Lres multiplied by the sensing function C, is passed through digital filters, D, and applied to the differential arm length actuators through the actuation function, A. In order to reconstruct an estimate of ∆L f ree in units of strain we model A and C, denoted in the purple box.
x (P C) T denotes where in the loop we apply a force to the test mass mirrors, via radiation pressure (photon calibrator), in order to measure A and C, as functions of frequency. The output of the calibration pipeline is then a strain signal, h(t), that is a faithful representation of ∆L f ree /L.

Detector calibration
The Advanced LIGO [9,55,73,74] and Advanced Virgo [10,47,75] detectors use feedback loops to keep the optical cavities on resonance. The strain calibration must thus include models and measurements of all readout electronics, as well as of electronics and transfer functions of actuation hardware that act on the mirrors through multiple points in the suspension systems [76]. As shown in Figure 11, there are three main components of the differential arm control loop for Advanced LIGO: the actuation function A(f ), the sensing function C(f ), and the digital filters applied, D(f ). All three are measured and modeled as functions of frequency.
The digital filters are known to great precision so the calibration error and uncertainty come from the differences between the model and measurement (including measurement error) of the actuation and sensing functions, A and C. To independently measure the actuation and sensing functions, a pair of beams from auxiliary lasers are reflected off of each test mass mirror, with their intensities modulated at a known frequency and amplitude to actuate with radiation pressure. These auxiliary laser assemblies are referred to as photon calibrators [77]. Once A and C are known, the true differential arm length is extracted and translated to a strain by dividing by the common length of the arm L (4 km for LIGO, 3 km for Virgo), per the following equation: where C and A are time-domain filters derived from frequency-domain measurements of the actuation function A(f ) and the sensing function C(f ). Note that the gravitational-wave strain can also appear in the common-mode arm length changes, and in changes to the lengths of all degrees of freedom in the detector. However, only the sensing of the differential change in the interferometer's arm lengths is engineered to have low enough instrumental noise to be sensitive to the strain induced by gravitational waves. The other optical lengths are controlled in order to maintain an optimal and linear response to the gravitational-wave strain in the differential degree of freedom.
Calibration measurements are made periodically in each observing run. In addition, to monitor time dependent parameters such as optical gain, cavity pole frequency and actuation strength drifts, several calibration lines are continously injected, at specific frequencies, by applying sinusoidal forces on the test mass mirrors using the photon calibrators; these lines will be present in the raw strain data. The calibration line frequencies are different amongst the detectors.
For the second observing run O2 and onwards, the calibration lines are removed from the calibrated h(t) strain data channel (within the calibration accuracy) [47,78]; the calibration lines were not removed from the O1 data [74]. Even for O1, the presence of the calibration lines does not affect the search for compact binary coalescence gravitational-wave signals as the amplitude of data at the frequencies of the lines is suppressed via the whitening [79,80] of the data when the calculation of the detection statistic is made for the data from each detector (see Section 8). Similarly, for parameter estimation the presence of the noise spectral density in the likelihood (see Section 9) minimizes the influence of spectral lines including calibration lines. Because the spectral lines are narrow, this frequency-domain weighting has a negligible effect on signal searches and parameter estimation and does not lead to any spurious effects such as generation of false candidate events or parameter biases.
For the Advanced Virgo calibration in O2 it was necessary to account for the transfer function of the optical response of the interferometer. This requires a calibration of the longitudinal actuators for the mirrors, still based on the laser wavelength as length reference using the so-called free swinging Michelson configuration described in [81]. In addition, the interferometer's output power as determined by the readout electronics also requires calibration. Calibration measurements are made weekly in each observing run and have shown stable actuation strengths. To monitor the time dependent optical gain and cavity pole frequency, several calibration lines are continously injected, as in LIGO detectors, at specific frequencies, by applying sinusoidal forces on the test mass mirrors using the electro-magnetic actuators. By construction, the calibration lines are removed from the calibrated h(t) strain data channel (within the calibration accuracy). For Advanced Virgo in O2 the gravitational wave strain reconstruction removed the contributions from control signals to the test mass mirror motion. A photon calibrator, namely an auxiliary laser that is used to reflect photons off a mirror and induce momentum transfer, was used to verify the calibration and confirm the sign of the strain channel h(t). See [47] for more explicit details of the Advanced Virgo calibration system for O2. Calibrated strain data for the LIGO and Virgo detectors are created online for use in low-latency searches. After the completion of the observing runs, final time-dependent calibrations were generated for each detector. The results presented in GWTC-1 [3] use the full frequency-dependent calibration uncertainties described in [47,82,83]. It is important to note that the detector strain channel h(t) is only calibrated between 10 Hz and 5 kHz for Advanced LIGO and 10 Hz and 8 kHz for Advanced Virgo [47]; the channel is not a faithful representation of strain at lower or higher frequencies.

Data quality and terrestrial noise
As described in Sections 3 and 5, calibrated LIGO and Virgo data can be both nonstationary and non-Gaussian at certain times and frequencies. Glitches may mimic true transient astrophysical signals in individual detectors [48], while spectral lines such as those seen in Figure 2 can blind searches for long-duration signals at those specific frequencies [49]. In this section we outline how we identify and characterize these noise features so that we can either exclude the bad data or assess the impact of remaining artifacts on searches for gravitational-wave signals. Figure 12 shows an example of a glitch. Glitches with power comparable to detectable signals have historically occurred on the order of once per minute, with larger glitches occurring less frequently. Even in their nominal state, the detectors' data contain glitches introduced by behavior of the instruments or complex interactions between the instruments and their environment. Many of these glitches (but not all) can be associated with transient signals in auxiliary channels from various sensors which serve as "witnesses" to environmental disturbances coupling into the interferometer. These associations allow us to identify and catalog certain classes of glitches. See [48] for a detailed presentation on the characterization of transient noise in Advanced LIGO, especially pertaining to the observation of the gravitational-wave signal GW150914. Descriptions of various noise sources for Advanced Virgo in O2 can be found in [84][85][86][87].
In searches for transient gravitational-wave signals, identified glitches and periods of poor data quality are flagged [50,88,89]. Periods of data are vetoed at various levels or categories depending on the severity of the problems; the GWOSC open data releases make this information available [24]. Sections of strongly non-stationary data that would corrupt the noise power spectral density estimates are removed entirely from the searches. Times when noise sources with known physical coupling to the gravitational-wave strain channel of the detector are active, and thus likely to cause glitches, are identified, and candidates at or around these times may be removed (vetoed) from search results. For a more detailed explanation of the strategy for mitigating noise sources, see Section 4 of [48]. For searches for long-duration signals, frequency bands known to be dominated by instrument noise are omitted from the analysis [49]. The strategy of vetoing times of probable glitches is expected to increase confidence in detection candidates that survive the application of vetoes [50,90,91], and may thus increase the number of astrophysical signals that can be confidently detected. Detection methods are discussed in detail in Section 8.
Although the great majority of transient noise sources are of local origin and thus uncorrelated between detectors, some noise sources exist that are potentially correlated between detectors, such as electromagnetic pulses from lightning coupling inductively into the detectors [92]. A key feature of the LIGO and Virgo experimental design is an array of physical environment monitors designed to detect environmental disturbances, and to have greater sensitivity to those disturbances than the detector's gravitationalwave strain channel does. The LIGO environmental sensor array includes seismometers, microphones, accelerometers, radio receivers and magnetometers to monitor ambient noise [93] . Virgo has a similar array of sensors [10] ¶. The environmental sensors' sensitivities are verified via a suite of noise injections performed at the beginning and end of each observing run; acoustic, magnetic, radio frequency, and vibrational tests are done to quantify the coupling from ambient noise to the gravitational-wave strain data h(t) [48]. These injections are conducted at multiple locations around the detector such that sensor coupling functions to h(t) via multiple potential coupling paths are verified and well understood.
As shown in Figure 2 of [48], the external transient electromagnetic coupling of the ambient noise to the gravitational-wave data channel h(t) is on the order of a factor of 100 below the current strain level, such that any electromagnetic source would have to register in one of the magnetometers surrounding the detector an SNR of 100 before registering in the gravitational-wave signal channel. This is easily confirmed with the study of lightning strikes during nearby storms [48]. The description of the coherence between the detectors' output strain signal h(t) and magnetometers about the detector for the AC power frequencies (50/60 Hz) is nontrivial and described in [94]. In Virgo, a detailed study of the electromagnetic coupling to the gravitational-wave data channel was recently carried out [84,85].
In addition, a potential correlated noise source in searches for a stochastic gravitational-wave background is Schumann resonances, or low-frequency magnetic field resonances between the Earth's surface and the ionosphere excited by lightning [95,96]. These resonances are also monitored with sensitive magnetometers, with the future goal of subtracting their effect on gravitational-wave strain data [97]. The effect of Schumann resonances on the measured gravitational-wave strain is below the current Advanced LIGO -Advanced Virgo noise floor [98].

Noise model and likelihood
The likelihood that the gravitational-wave strain data contains a given signal is the central quantity in both detection and parameter estimation of gravitational-wave events. In this section we relate this likelihood to the model assumptions commonly made for the noise components of gravitational-wave detector strain data. The data time series d collected from an interferometer can be written as the sum of the gravitational wave response of the detector, h, and the combination of all the noise sources in that detector n such that d = n + h. Since the true gravitational-wave signal in the detector h is unknown, we resort to using signal models denoted by h. We consider a model h to be a good description of the signal in the data if the residuals r = d − h are consistent with our model for the instrument noise. More quantitatively, the likelihood that the data d contains a possible signal h is given by the probability that r is a realization of the noise model. In other words, the likelihood function is the noise model. For Gaussian noise the likelihood can be written: where C is the noise correlation matrix, and The repeated indices include a sum over the network of detectors I, J and the data samples k and m. If the noise is uncorrelated between detectors C (Ik)(Jm) = δ IJ S I km , where S is the noise spectral density. Moreover, if the noise is stationary -so that correlations depend on only the time lag between data samples -then the noise correlation matrix in each detector will be diagonal in the Fourier domain: S I km = δ km S I (f k ). In that case we have χ 2 (d, h) = (r|r) where (a|b) is the familiar noise-weighted inner product: The likelihood function (6) is central to Bayesian inference [99], and with the specification of priors [100] for the signal and noise models, allows for the calculation of the model evidence [101,102] -giving the odds that a signal is present -and posterior distributions for the model parameters, θ, such as the masses and spins of a binary system [103][104][105]. For stationary, Gaussian noise that is uncorrelated between detectors, the likelihood function takes the form where the sum is taken over the detectors in the network and (a|b) is the noise weighted inner product defined in equation (8).
The likelihood function can also be used to define a frequentist detection statistic [53,106] given by the likelihood ratio between a signal h being present or absent in the data. If the data were stationary and Gaussian this statistic would follow a known distribution and the false alarm rate for an event could be computed analytically. In practice the noise exhibits deviations from stationarity and Gaussianity, and the methods used to detect and characterize signals have to be modified. Robust search methods have been developed that take into account the measured properties of the noise. These are described in Section 8. The noise modeling and consistency checks applied to signal characterization and parameters estimation are described in Section 9.

Signal detection
In this section we describe how candidate signals are identified in LIGO-Virgo data and how the statistical significance of each candidate is quantified by comparison with the observed properties of the detector noise.

Model comparison and the matched filter
The LVC searches for gravitational waves compare the null hypothesis, H 0 , that a given stretch of data contains only noise, to the signal hypothesis, H 1 , that the stretch of data contains both noise and a gravitational-wave signal + . Most searches assume that general relativity correctly describes the gravitational-wave signals. The likelihood of observing the data under the two hypotheses can be written in terms of Eq. (6) by where H 0 assumes noise alone with no signal in the data, while H 1 assumes a signal parameterized by θ, h(θ) is present in addition to the noise. For the present, we will assume that each detector data stream is being analyzed independently, and we discuss below how data from multiple detectors is combined in LVC searches. The probability of the signal hypothesis given the observed data, known as the posterior probability, is given by Bayes' theorem as where p(H 0 ) and p(H 1 ) are our prior beliefs of whether a signal is absent or present in the data. Regardless of these prior beliefs, the posterior probability is seen to be monotonic in the likelihood ratio and so this quantity is the optimal test statistic [54]. For Gaussian noise, the log of the likelihood ratio can be written in terms of the inner product of Eq. (8) as Only the first term of this expression involves the data; it is then observed that the posterior probability is a monotonic function of (d | h(θ)), a quantity known as the matched filter, which, therefore, is also an optimal test statistic. + In reality all LIGO-Virgo data may contain some level of gravitational-wave signal, but a signal can only be detected if the null hypothesis is sufficiently disfavored relative to the signal hypothesis.

Signal-to-noise ratio and template banks
In a matched-filter search for gravitational-wave signals, the signal parameters θ will not be known in advance. The optimal detection statistic would be obtained by marginalizing [107] the likelihood ratio Λ(d|θ) over all unknown parameters by integrating the likelihood ratio over these parameters * .
Since the log likelihood ratio is a linear function of the signal model, its exponential -the likelihood ratio itself -will generally be sharply peaked about its maximum, thus the maximum value of Λ(d|θ) over unknown parameters θ is expected to be a good approximation to the marginalized likelihood ratio (up to a possible constant rescaling). This maximization procedure is equivalent to minimizing the residuals seen in the detector, which can be seen as follows: The log likelihood can be written as Now it is clear that the parametersθ that maximize the log likelihood ratio are those that minimize the residuals d − h(θ) in terms of the noise weighted inner product. The parameters θ describing the strain observed in a detector include the signal amplitude A observed in the detector (which is inversely proportional to the distance to a gravitational-wave source), the phase φ of the sinusoidally-varying signal observed in the detector, the arrival time t of the signal (usually defined by the moment when it reaches peak gravitational-wave amplitude at the detector), and other parameters µ describing the physical parameters of the source such as the masses and spins of the components. We write where p(t, µ) and q(t, µ) are in-phase (cosine) and quadrature-phase (sine) waveforms, normalized so that (p | p) = (q | q) = 1, and which are orthogonal, (p | q) = 0. Maximization over the amplitude and phase can be done algebraically as follows: Eq. (13) can be rewritten using Eq. (15) as and is the SNR time series for waveform templates with parameters µ. The log-likelihood log Λ is maximal for amplitudeÂ = ρ and phaseφ = ϕ with max A,φ log Λ(θ) ≡ log Λ(t,Â,φ, µ) = 1 2 ρ 2 (t, µ) .
(19) * The integration measure to obtain an optimal statistic is given by the probability density of gravitational-wave signals over the unknown parameters [108]: for example if the parameters θ include ι, the inclination of the binary orbit relative to the line of sight for a compact binary gravitational-wave source, the signal probability density over ι is uniform in cos ι [109] Peaks in this time series correspond to times at which a signal is most likely to be present. Under the signal (noise) hypothesis, and in the presence of stationary and Gaussian noise with a known power spectrum, ρ 2 (t peak , µ) follows a non-central (central) chi-squared distribution with two degrees of freedom. The SNR time series can be conveniently expressed in terms of a complex time series as Eq. (8) of [66] as ρ = |z| and the phase ϕ = arg(z);p(f, µ) is the Fourier transform of the in-phase waveform (see Eq. 15). Eq. (20) is the inverse Fourier transform of where Θ(f ) is the Heaviside step function. Parameters such as the masses and spins of the binary components (represented by µ) change the morphology of the gravitational wave. In order to detect signals with a wide range of possible masses and spins, a bank consisting of large numbers of signal templates spanning the parameter space is produced, and each template in the bank is used as a matched filter. The template bank used to initially find signals in the data is constructed with a density in parameter space sufficiently high that the loss in SNR between a true signal and the best-fit template is less than 3%. See [2,3,110] for more details on the template banks used for the O1 and O2 searches.
An important property of the SNR should be noted: in Eq. (20) it can be seen that the integrand is proportional tod(f )/S n (f ), so the data are not simply being whitened (which would have been the case if the denominator were S 1/2 (f )), but in fact noisier parts of the frequency spectrum (including narrow lines) are suppressed in the matched filter. Equivalently, the SNR integral can be seen as correlating a whitened data time series with a whitened template. The SNR therefore provides a natural way to down-weight the frequency bands where the noise is large, and effectively notches out the various lines.

Rejection of noise artifacts and construction of candidate ranking statistic
While the SNR is the optimal detection statistic in the case of stationary Gaussian noise, transient instrumental artifacts make it a non-optimal statistic with real detector noise. Although the matched filter naturally suppresses stationary noisy features in the data, glitches can cause certain templates to produce high SNR values [50,[111][112][113]. We address this in several different ways: (i) As explained in Section 6.2, we use witness sensors to identify times when the environment or the instrument introduces frequent glitches and we veto a subset of these times found to impact search performance from our analysis [50]. These sensors include those that monitor the physical environment about the gravitational-wave detector, as well as those that record signals from within the internal control systems of the interferometer.
(ii) We implement waveform consistency tests which characterize the deviation of the data d from the model n + h [20,21,66,114]. For signals from compact binary mergers, these tests are extremely powerful and allow us to reject many glitches which have not been identified and vetoed, though for short signals the discriminatory power of these tests is diminished [3,20,114,115]. The exact implementation of these signal consistency tests vary among search pipelines, but all are based on the following principle: if the gravitational-wave model waveform is subtracted from the data to produce residuals d − h, the residuals should be consistent with Gaussian noise if the signal hypothesis is true. These residuals are re-filtered with the matched filter over different time or frequency intervals to determine if non-noise-like features persist; evidence of such features suggest that the model waveform h is not a good match to the non-Gaussian feature in the data, and the detection ranking statistic is down-weighted accordingly.
For example, the consistency test described in [66,114] constructs a chisquared test statistic by dividing the matched filter into n frequency bands as where (a | b) i is the same as the inner product in Eq. (8) but with the integrand restricted to the frequency interval f i−1 < f < f i with f 0 = 0 and f n = ∞. Here the bands are chosen so that (p | p) i = (q | q) i = 1/n. If the residual d − h is Gaussian noise, χ 2 is chi-squared distributed with ν = 2n − 2 degrees of freedom; values of χ 2 ν are indicative of residual non-Gaussian features in the data after the model has been subtracted. A re-weighted ranking statistic proposed in [20] down-weights the SNR for large values of χ 2 . A similar time-domain based signal consistency test is described in [21] and is incorporated into a likelihood ranking statistic. (iii) For all detections published to date we have required that gravitational-wave signals be identified via matched filtering in at least two independent detectors with consistent parameters. For example, the arrival times of the gravitational waves at each detector must differ by no more than the the maximum time-of-flight between the detectors, e.g. 10 ms for the LIGO Hanford -LIGO Livingston pair, with an extra 5 ms added in order to account for uncertainty in the inferred coalescence time at each detector. This 5 ms addition to the coincidence window is also used when searching for simultaneous events for the LIGO Hanford -Virgo pair (27 ms light travel time), and the LIGO Livingston -Virgo pair (26 ms light travel time) [3]. However, having now established the existence and frequency of gravitational-wave signals, it may now also be possible to make detections when only one detector is operating, and thus this time coincidence test is not available [116].
The matched-filter based searches employed by the LVC construct ranking statistics from the SNR and the waveform consistency test statistics [3]. In addition an astrophysical signal received in several detectors will have a common set of parameters µ (within limits imposed by limited SNR) in all detectors, and, furthermore, the amplitude, phase, and time-of-arrival of the signals observed in each detector will be determined by the direction of propagation of the wave (i.e., from where on the sky the signal originates) and the polarization state of the signal. Since gravitational waves have two polarizations (in general relativity), referred to as the plus-polarization h + and the cross-polarization h × , the strain on detector I is determined by the detector's antenna response patterns F +,I and F ×,I by where α and δ are the right ascension and declination of the source of the gravitational waves, D is the distance to the source of the waves, ι is the inclination of the orbital plane of the binary system (which, for circular orbits and leading order quadrupole emission, determines the ellipticity angle), and τ I = τ I (α, δ, t) is the travel time of the signal from the geocenter to the detector. Although a fully-coherent search for gravitational waves across a network of detectors is possible, we opt instead to perform searches independently in each detector and then demand that triggers seen in different detectors have consistent times of arrival and the same parameters µ since this provides a powerful glitch rejection consistency test as described above. However, further signal consistency requirements are also possible. For the leading-order quadrupole emission from a circular binary, the ratios of the amplitudes seen in two detectors is while the difference in arrival time is t I − t J = τ I − τ J and the difference in phase is It is therefore possible to include an amplitude-phase-time consistency measure in likelihood-based ranking statistics [20,50,117]; this was done for the most recent searches for gravitational-wave signals in the LIGO-Virgo O1 and O2 data [3].

Background estimation and detection confidence
After the steps described above which mitigate the effects of noise transients, the probability of remaining transients occurring simultaneously (within a time window that takes into account the maximal travel time of a signal, for example 10 ms for the two LIGO detectors) in two detectors and producing a large joint ranking statistic value becomes extremely small. Different searches adopt different approaches for measuring this probability as a function of the ranking statistic [50]. The basic method is to examine the statistical properties of the non-simultaneous transients observed in each detector and to artificially treat them as if they did occur simultaneously. The statistical significance of any candidate event observed in two or more detectors is quantified by its false alarm rate, which is the expected rate of events per time due to noise which would be assigned an equal or larger ranking statistic than the candidate. One approach to estimating false alarm rates is to shift one detectors data stream in time (by a time interval larger than the maximum time-of-flight between detectors) and repeat the search. The resulting "time-shifted" coincidences are then treated as a background noise sample. This is done numerous times with different time shifts in order to obtain a probability distribution for the joint detector ranking statistics. Each coincident trigger is assigned a false alarm rate given by the number of background triggers with an equal or larger ranking statistic, divided by the total time searched for time-shifted coincidences. For example, in [1] it is found that the frequency of transients producing more significant events than GW150914 is less than once every 200 000 years in both of the matched-filter searches employed by the LVC.
Another similar approach is to accumulate single-detector triggers not having simultaneous (within the time-of-flight of gravitational waves) triggers in another detector and therefore likely not associated with gravitational-wave signals. The distribution of the ranking statistic under the background (noise only) hypothesis is then estimated by randomly drawing single detector triggers from the inferred single detector distributions and artificially treating them as if they were simultaneous when constructing the ranking statistic. The significance of an observed ranking statistic value can then be evaluated using this background distribution [21,117]. These two independent methods of determining the significance of observed gravitational-wave candidates have both yielded high significance for the gravitational waves that have been identified.
Terrestrial noise sources that are potentially correlated between detectors are not taken into account by these background estimation methods. Thus, as discussed in Section 6.2 above, a detailed examination of physical and environmental sensors which monitor such noise sources and an assessment of their coupling to the measured gravitational-wave strain channel is carried out in order to check the validity of a detection candidate.
The LVC also performs searches of LIGO and Virgo data without specific waveform models [22,118]. These searches first identify periods of excess power in each detector's data stream and then builds a detection statistic based on the cross-correlation between the data streams. The significance of a particular detection statistic value is again assessed using time shift analyses. In [1] it was shown that the frequency of noise transients producing more significant events than GW150914 in such a generic transient search is once every 8 400 years.
The fact that multiple searches, employing different methods, all found GW150914 to be a highly significant candidate bolsters our confidence that this event is not the product of coincident transient noise. Furthermore, the signal is well matched by the waveform predicted by general relativity for the coalescence of a binary black hole system. Various tests performed using the first ten binary black hole mergers detected by the LVC with high confidence have shown no significant deviation from general relativity models [119].

Measurement of search sensitivity
A final component of a search pipeline is the determination of the sensitivity of the search to astrophysical populations of signals. The purpose of doing so is two-fold: First, it provides a metric by which a search can be tuned to optimize its detection efficiency for particular classes of signals; second, it provides a means for interpreting the rate of signals detected by the pipeline to the rate at which signals are generated by the population.
The sensitivity of the search pipeline is normally measured via a Monte Carlo procedure (see, e.g., [120]) in which simulated signals drawn from a hypothetical source population (e.g., some distribution of binary component masses and spins, orientation angles, arrival times, and distance) are added to real detector noise, and the search is rerun to determine the fraction of signals from this population that are detected by the search pipeline. A simulated signal is considered to be detected if the search pipeline produces a trigger above some chosen ranking statistic threshold. The result is represented as a time-and population-averaged spacetime sensitivity V T for a fixed ranking statistic threshold which corresponds to a fixed false alarm rate threshold (e.g., a threshold could be chosen to be one false alarm per century of observation).

Inferring waveform and physical parameters
Once a candidate gravitational-wave signal is identified, and its significance is established, the next goal is to use the data to infer the physical parameters of the system that created the gravitational waves [102][103][104][105][124][125][126][127]. The detection of gravitational waves as well as the inference of the physical parameters relies on knowledge of the generic shape of the signal one is looking for as well as the distribution of the noise. Moreover, gravitational-wave signals are weak, therefore uncertainties in these parameters may be large and a priori assumptions about the typical amplitudes and phase evolution of such signals do have a significant impact on the reconstructed waveform. For these reasons, inference of the physical parameters of the system, such as masses, spins of the merging objects, is done within the framework of Bayesian parameter estimation. The central elements that need defining are a model M for the gravitational-wave signal that allows for the prediction of the form of the signal from the values of the physical parameters of the system, and the so-called background or prior information I.
The left hand side is referred to as the posterior probability density function, or simply the posterior for θ, while the three terms on the right hand side are the prior probability density function p(θ|M, I), the likelihood function p(d|θ, M I), given by equation (9) and the evidence, p(d|M, I) = dθ p(θ|M, I)p(d|θ, M, I) .
Within the Bayesian parameter estimation framework, the inference is reduced to the calculation of the posterior for θ given the model M and the analysis assumptions I which uniquely determine the prior distribution and the likelihood function.

Waveform models
Let us focus now on the choice of signal model M . The signal model M determines the functional form of h(t; θ) which is key to calculating the likelihood function. For definiteness, we will concentrate on parametric forms of h(t; θ), as obtained by solving Einstein's equations. For a discussion of non-parametric signal models see [128]. Exact analytic solutions of Einstein's equations are notoriously difficult to obtain; therefore data-analysis-ready models are either based on perturbative solutions, e.g. the Taylor family of waveforms [129] or the effective-one-body waveforms [130][131][132][133][134], or on hybrid/phenomenological approaches such as the Phenom family of waveforms [135][136][137][138]. We will not discuss further the details of the waveform models here, but we will restrict ourselves to the two main types of waveforms employed in the original analysis of GW150914. For GW150914, the two models used in [127] were SEOBNRv2 and IMRPhenomPv2 [127,139]. Both waveform models are full inspiralmerger-ringdown models that succeed in reproducing numerical waveforms, especially in the region of approximately equal mass and moderate spins magnitudes. The main difference between the two models lays in the treatment of the spin dynamics. SEOBNRv2 models the dynamics of the component of the spin vectors along the direction of the orbital angular momentum while IMRPhenomPv2 includes also an effective treatment of the dynamics of the in-plane components of the spins, and thus includes an approximate precessing dynamics . Because GW150914 was nearly a face-off system (orbital angular momentum vector pointing away from the Earth), the LIGO instruments were not sensitive to the in-plane spin components, hence the two waveform models are essentially equivalent. In [143], the LVC has empirically shown that the inferred properties of GW150914 depend relatively weakly on a change in the waveform model. This finding was confirmed by the analysis presented in [134] using an independent effective-one-body implementation and by [144], in which numerical relativity solutions were directly compared with GW150914 data.

Prior distributions
The final functions necessary for the application of Bayes' theorem, Eq. (27), are the prior probability distributions for the parameters of interest. These are all the parameters necessary to completely characterise the gravitational-wave signal emitted during a coalescence event. For quasi-circular orbits, these are: • the component masses m 1 and m 2 ; • the spin vectors S 1 and S 2 ; • the polarisation angle ψ and the angle θ jn between the total angular momentum J and the propagation direction of the gravitational waven; • the source luminosity distance D L ; • the source right ascension α and declination δ; • a reference phase ϕ 0 and a reference time, typically the gravitational-wave strain peak time, t 0 .
The functional form of the prior distribution must be specified for all parameters. In some cases the prior distribution is determined via invariance (symmetry) properties of the parameter space [145]; for instance, the prior for the source position D L , α, δ in the Universe is chosen from the requirement that the number density of sources is uniform in the cosmological co-moving volume in accordance with a Friedmann-Lemaître-Robertson-Walker cosmological model. Thus, the probability p(D L , α, δ|M I) ∝ dV At the time of the discovery of GW150914 another precessing waveform model was available, SEOBNRv3, which also includes in-plane spin components [140,141]. The original analysis, however, did not include results from this model, which were reported in [142]. and for redshift z 1 reduces to p(D L , α, δ|M I) ∝ D 2 L | cos(δ)|. In cases where invariance arguments do not apply, we choose simple forms of prior distribution so that the resulting posteriors are easily interpretable. Similar arguments determine the prior for the spin vectors S 1 , S 2 and orientation angles ψ, θ jn to be uniform over the azimuthal angles ranging between 0 and 2π as well as uniform in the cosine of the polar angles ranging between −1 and 1. Regarding the spin vector magnitudes, several possible priors are possible, e.g. p(| S i ||M, I) ∝ | S i | 2 or p(| S i ||M, I) ∝ 1. The main analysis of the events in the GWTC-1 catalog employed a uniform distribution over the norm of the spin vectors. For the component masses m 1 and m 2 , the chosen prior distribution is uniform, thus p(m 1 , m 2 |M I) ∝ 1, but limited from below so that m 1 , m 2 > 1M .

Calibration uncertainties
In addition to uncertainties induced by detector noise, the accuracy and precision of our source parameter estimates are also affected by uncertainties in the amplitude and phase response of the detectors. For this reason, this source of uncertainty on the data is modelled and included in the analysis. The calibration uncertainty model employed is based on empirical estimates of the error magnitudes in both amplitude and phase in specified frequency bands [47,74,82]. In particular, the model assumes the value of the errors to be distributed as a Gaussian distribution with zero mean and a variance given by the empirically determined error magnitudes. Calibration uncertainty curves are then constructed using third order spline interpolation over the data frequency space. Typically, this introduces a total of O(10) additional parameters per detector (half for the phase uncertainty and the other half for the amplitude), which are sampled in concert with the physical parameters of the system. Technical details for the LVC calibration model can be found in [146].

Numerical methods
The total number of parameters to be inferred is thus 15 for quasi-circular orbits and generic spin vectors, and 11 for models where spins are forced to be aligned with the orbital angular momentum. To the set of physical parameters, we must add the 10 parameters per detector necessary to specify the calibration uncertainty model. Hence, for a typical three-detector analysis, we sample a grand total of 45 parameters for a quasi-circular system with generic spin orientations. Parameter spaces of such high dimensionality cannot be efficiently explored with grid-based methods. Therefore, over many years members of the LVC developed the LALInference stochastic sampler library [105] which implements two algorithms, a parallel tempering Markov chain Monte Carlo [147] and a nested sampling [102]. The parallel tempering Markov chain Monte Carlo is designed to generate samples from the multidimensional posterior distribution (27), while the nested sampling instead is designed to calculate the evidence, Eq. (28) and generates samples from the posterior distribution as a by-product. More details are given in [105] and references therein. Other parameter estimation pipelines are routinely used by the LVC, such as rapidPE [148] and BILBY [149], but for the rest of the discussion we will focus on LALinference. However, the same considerations will apply to other Bayesian analysis methods.

Posterior distributions
The end products of the LALInference analyses are posterior samples for all parameters that characterise the gravitational-wave waveform. Of particular interest are posteriors for the intrinsic parameters, masses and spin vectors, which help ascertain the nature of the coalescing objects. For GW150914, the detector-frame masses (i.e., redshifted due to cosmological expansion) measured using the IMRPhenomPv2 waveform model were 38.5 +5.6 −3.6 M and 32.2 +3.6 −4.8 M ; see Table I in [127]. The quoted numbers are an extremely concise way of summarising the full posterior distribution. The output of LALInference analyses are samples from the full posterior distribution. In particular, a number such as 38.5 +5.6 −3.6 M comes from the marginalisation over 14 of the 15 physical source parameters of the full posterior, as well as the calibration parameters, to obtain a one-dimensional posterior from which the 90% credible region is then calculated. Naturally, correlations between different parameters are invisible in a one-dimensional representation. For a clearer picture, multidimensional posterior distributions help to display the information extracted from the analysis. Figure 13 shows the joint two-dimensional posterior distribution for the component masses m 1 and m 2 as an example. In particular, the bottom left panel shows the non-negligible correlation between the component masses.
The full details of the multi-dimensional posterior distribution can be visualised in a compact way by computing the posterior distribution over the waveform itself in the time domain. This is done simply by computing the predicted waveform over each of the posterior samples. Let θ i be the i-th posterior sample, the corresponding waveform will be h(t; θ i ). The waveform samples are the set {h(t; θ i )} i=1,...,N ≡ {h i }. Each of the waveform samples h i can be whitened, see Section 3, and then used to compute credible intervals at every time t j at which the original data were sampled. The result of this procedure is summarised in the presentation of Figure 6 of [127]. Figure 1 in [1] is representing a different procedure; the second row in this figure shows a comparison between the reconstructed 90% credible region obtained by the procedure described above, and a numerical relativity solution that, while not corresponding to any of the computed posterior samples, is consistent with the reconstructed 90% credible region.

Validation of source parameter estimates
The results from Bayesian inference are only as good as the models used in the analysis. If the waveforms used in the signal model or the underlying assumptions of the noise model are inaccurate, the results will suffer from systematic bias. A multitude of tests are used to check for possible mis-modeling error and to quantify the impact on the analyses. As discussed earlier in Section 9.1 the waveform models are compared to highly accurate numerical relativity simulations, and multiple waveform approximants are used in the analyses and cross-compared. The difference between the results found using different waveform models provides an estimate of the systematic error due to the signal model. The noise model can also be checked. Other checks include adding simulated signals with similar parameters to the astrophysical events into nearby stretches of data and checking that the parameters are properly recovered by the parameter estimation algorithms. Over long stretches of LIGO-Virgo data, the noise is known to be non-stationary and non-Gaussian. The overall noise levels fluctuate, and there are frequent low-SNR glitches, and less frequently high-SNR glitches, see Section 5. On the other hand, the gravitational-wave signals spend very little time in the LIGO-Virgo sensitive bandseconds or less for black hole binaries and minutes for neutron star binaries, and over these shorter stretches of time the noise is generally (but not always) well approximated as stationary and Gaussian. When a significant trigger is found by the search pipelines, the first thing the analysts look at are multi-resolution time-frequency scalograms of the data surrounding the trigger (known as Q-scans). Q-scans are qualitative checks which require visual inspection [151,152]. These scans reveal whether there are any loud glitches in the data, as was the case with the binary neutron star GW170817 [8]. Once the parameter estimation analyses have been run, Q-scans of the residuals are closely examined to see if any any unmodelled noise features might have affected the analyses. Figure 14 shows Q-scans of the whitened data and residuals surrounding GPS time 1126259462. The scans of the data reveal the signal from GW150914, while the residuals after subtracting the maximum likelihood waveform from the parameter estimation studies [127] show no visible evidence of glitches or correlated signal power.
In addition to these qualitative checks, more rigorous quantitative checks can be applied. One test that is routinely applied is to reanalyze the residuals using the wavelet-based BayesWave algorithm [128] which is able to identify any glitches and remaining coherent power. Coherent power in the residuals could be evidence of departures from general relativity, or evidence of shortcomings in the template models or the noise model used for parameter estimation. No significant coherent power was found in the residuals for any of the detected events. In the case of GW150914 the lack of a coherent residual was used to place interesting bounds on possible departures from general relativity [153]. In the case of the binary neutron star merger GW170817 [8], a loud incoherent glitch was seen to overlap the signal in the Livingston detector. The glitch was reconstructed and removed using the BayesWave algorithm. The glitch removal procedure has been shown to be safe in a study that injected simulated neutron star merger signals into data with similar loud glitches, followed by removing the glitches with BayesWave and accurately recovering the true signal parameters with the LVC parameter estimation algorithms [154].    [127]. These residuals were used to test for residual coherent power, which can be framed as a test of general relativity if we assume the template to be a sufficiently accurate solution of the theory. Applying the Anderson-Darling test of normality to the residuals yields p-values of 0.15 for LIGO-Hanford and 0.11 for LIGO-Livingston, indicating that the residuals are consistent with the Gaussian noise model used to define the likelihood.
Even if the residuals had failed the formal tests of stationarity and Gaussianity discussed here, it would not necessarily imply that the parameter estimation would be strongly biased. When the noise deviates from the model the analysis will suffer systematic bias. But for this bias to be significant it has to be large compared to the statistical spread in the posterior distributions. Extensive studies using simulated signals added to real LIGO-Virgo data have shown that systematic errors due to deviations from the noise models are generally negligible compared to the statistical uncertainties [104,143,[154][155][156][157]. One exception is when the simulated signals cover or overlap the times of glitches, in which case the biases can be large [158]. When glitches are present, tools such as BayesWave need to be used to model and remove the glitches, ideally in concert with the parameter estimation.

Parameter degeneracies and credible intervals
Gravitational-wave templates exhibit a variety of parameter degeneracies whereby templates with different parameters can have very similar amplitude and phase evolution, and yield very similar likelihoods. One example of such a degeneracy is evident in the posterior distribution for the component masses of GW150914 shown in Figure 13. The degree of similarity between templates with parameters λ, θ is measured by the match M(λ, θ) = (h(λ)|h(θ)) (h(λ)|h(λ))(h(θ)|h(θ)) .
If the true signal is described by h(λ), then the expectation value of the log likelihood for template h(θ), maximized over amplitude is where SNR is the optimal signal-to-noise ratio [159]. We see that signals that have similar morphology, as measured by the match, yield similar likelihoods. Now suppose we hold one parameter, θ k fixed, then maximize the likelihood with respect to all the other parameters. Up to an overall constant, we have [159,160] ln where FF(θ k ) is the fitting factor, or maximized match, between waveforms with θ k =θ k and the maximum likelihood waveform. Figure 16 compares the maximum likelihood as a function of the primary detector-frame mass for GW150914 to the fitting factor. The fitting factor as a function of m 1 was computed by maximizing the match between the overall maximum likelihood waveform and waveforms with fixed m 1 . Note that the posterior distribution for this event had a 90% credible interval of m 1 = 38.5 +5.6 −3.6 M , but templates with primary masses outside this interval continue to yield large fitting factors because other parameters can be adjusted to partly compensate the effects of the change in primary mass on the waveform. For example, we find a fitting factor between the maximum likelihood template for GW150914 and a template with a primary mass of m 1 = 70 M of FF = 0.95. Figure 16. The likelihood (in blue) and the fitting factor (in red) as a function of the detector-frame primary mass m 1 for GW150914. The dashed blue line shows the estimate of the likelihood in terms of the fitting factor from equation (31). The likelihood is scaled relative to the maximum likelihood value. The maximization was performed over the secondary mass, spins and extrinsic parameters.
The possibility of achieving high matches, or correlations, between templates with large primary masses and the maximum likelihood template have been cited as evidence that the LIGO-Virgo parameter estimation analysis for GW150914 and other systems may be flawed [43]. It was further hypothesized that the credible intervals were underestimated due to the instrument noise not conforming to the likelihood model [43]. However, our confidence in the reconstructed credible regions comes from extensive simulations designed to compare the cumulative distributions of simulated populations against the cumulative distribution of reconstructed credible regions. The agreement between the two, see for instance Figure 10 in [105], demonstrates that our algorithms are properly computing the credible intervals. Moreover, as we have shown, the noise properties for GW150914 are compatible with the likelihood model used in the parameter estimation studies, further reinforcing our confidence in the method used to compute the credible intervals.
There is no contradiction in having templates with parameters outside the quoted credible regions producing large fitting factors with the best fit model, since even small template mismatches come with a large penalty for high signal-to-noise ratio systems such as GW150914. For example, the difference in log likelihood between the signal with m 1 = 70 M and the global maximum is ∆ ln Λ = −32, which is what we expect to see for a SNR 25 signal and a template with a fitting factor of FF = 0.95. But the relative likelihood for the higher mass solution is e ∆ ln Λ = 10 −13.9 , thus while templates with large primary masses can produce relatively good matches to the data, the probability that the primary mass is this high is vanishingly small.

Residuals analysis of LIGO data around GW150914
The notion of a residual -the data minus the model -plays an important role in gravitational-wave data analysis. If the signal model matches well the true signal, then the residual should be consistent with a draw from the noise model p(n), the probability distribution for the noise. After known sources of correlation with independent witnesses are removed, we expect the instrument noise in the widely separated LIGO-Virgo detectors to be fully independent, and therefore the residuals in each detector to be uncorrelated. In contrast, gravitational-wave signals will excite a coherent response across the network of detectors, and this difference in correlation properties is one of the ways we are able to separate signals from noise.
As noted in Section 6, it is possible to have correlated transient noise due to lightning [92], but monitoring with magnetometers is presently adequate to rule that out as the cause of events like GW150914 [48]. Low-level correlated magnetic noise is more of a concern for the search for a stochastic gravitational-wave background [95][96][97]161]. Seismic noise is similarly monitored. Since the LIGO detectors share the same design and similar equipment, the frequencies associated with synchronized clocks (GPS), electrical power (60 Hz), and instrument resonances are monitored and suppressed in stochastic background and continuous-wave gravitational wave searches [49].
In this section we will use the data surrounding GW150914 to illustrate the discussion, but the same considerations apply in general, and analyses of the residuals have been reported for all significant events [119].

Signal and template comparisons
As introduced above, the physical parameters of the signal θ determine the shape and amplitude of the gravitational-wave signal h(t; θ). Numerical relativity simulations can be used to generate reference templates [24] using intrinsic parameters taken from the Bayesian parameter estimation studies. However, the templates still need to be projected onto the detectors using an appropriate set of extrinsic parameters. In a single detector the projection is equivalent to time shifting, phase shifting and rescaling the reference template:h(α, δt, δφ)(f ) = αh ref (f )e 2πif δt+iδφ . Figure 17 shows the reference numerical relativity template from Figure 2 of the GW150914 discovery paper [1] along with maximum likelihood projections onto each detector. A smooth taper has been applied to the start of the template to avoid spectral leakage when transforming to the Fourier domain. The data file for the template was taken from the original posting at the GWOSC [162] and originates from the simulation SXS:BBH:0305, calculated for a system with a mass ratio of q = 0.819, spins aligned with the orbital angular momentum with dimensionless magnitudes χ 1 = 0.330 and χ 2 = −0.440, and detector-frame total mass scaled to M = 74.6M . These waveform parameters are consistent with those eventually determined for GW150914, within uncertainties, but do not exactly maximize the likelihood globally. Using the maximization procedure described in Section 8 one finds that the signal arrived at the LIGO-Livingston detector 7.08 ms before the LIGO-Hanford detector, had a Figure 17.
The reference numerical relativity template provided by the GWOSC [162] for GW150914 is shown in the upper panel. The lower panels show the time, amplitude and phase shifted versions of the template that maximize the likelihood in each LIGO detector individually.
larger amplitude projected onto the antenna response pattern in LIGO-Hanford by a factor of 1.24, and had a phase difference of −2.9 radians. These are, however, based on finding maximum-likelihood matches to the detector data individually with a fixed waveform without constraining them to be consistent (for example, the relative time shift could in principle be greater than the maximum light travel time between the detectors), a simplified procedure compared to the simultaneous multi-detector likelihood maximization described in Section 9. When a loud signal is present in the data the individual and joint maximization techniques yield consistent results.
In Figure 1 of the GW150914 discovery paper [1] the LIGO-Hanford data were inverted (corresponding to a phase shift of ±π) and overlaid on the LIGO-Livingston data to illustrate the similarity of the signals in the two detectors with minimal processing of the raw data. In addition, the reference numerical relativity template described above was approximately matched to the LIGO-Hanford and LIGO-Livingston data by adjusting the relative phase, amplitude and time offset. These adjusted templates for each detector were passed through the same bandpass and notch (band-reject) filters as the data and were then subtracted to produce the residuals plotted in the third row of Figure 1 in that paper. Because those " Fig 1 PRL" residuals were not globally optimized and were calculated from filtered data, they produce a somewhat different result than minimizing the residuals in the whitened and bandpassed data, as we will see below. Figure 18 compares the whitened data to the whitened numerical relativity templates, maximized over arrival time, amplitude and phase, in the LIGO-Hanford Figure 18. The upper panels show the whitened and bandpassed data in the LIGO-Hanford and LIGO-Livingston detectors relative to GPS time 1126259462. The maximum likelihood whitened templates have been superimposed on the data. The lower panels show the residuals that are produced by subtracting the templates from the data. and LIGO-Livingston detectors. Also shown are the residuals produced by subtracting the templates from the data. Prior to the publication of the GW150914 discovery paper [1], multiple tests were applied to the residuals to verify they were consistent with noise. The whitened residuals in each detector were found to be consistent with a Gaussian distribution: the Fourier amplitudes pass the Anderson-Darling test (see Figure 15 in Section 9), and the Fourier phases were found to be randomly distributed. The residuals from Bayesian parameter estimation studies [127] were analyzed using a wavelet reconstruction algorithm [128] that is able to detect coherent signals of general morphology. The degree of coherence in the GW150914 residuals was found to be entirely consistent with noise [153].

Correlation analyses
A simpler, though less sensitive, test for coherence is to cross-correlate the data in the two detectors. The cross-correlation can be computed either in the time domain or the frequency domain using the whitened residuals. The correlation in the time domain is defined as: where H(t) and L(t) represent the data streams from LIGO-Hanford and LIGO-Livingston respectively. When working with a finite data segment of duration T the data may be taken to be periodic: H(t) = H(t + T ). The correlation measure is very sensitive to the positioning and duration of the time window and the bandpass filtering that is applied to the data. To make meaningful statements about the significance of the correlation we need to know the distribution of the correlation measure for uncorrelated white noise, and these distributions change depending on the duration and bandpass. When applied to uncorrelated, unit variance Gaussian noise, the correlation coefficients follow a zero mean Gaussian distribution with a variance that depends on the duration and bandpass. Following [42] we apply the correlation analysis to four different time windows. The standard deviations for white Gaussian noise are σ = 0.0870 for the 0.2 second segment, σ = 0.121 for either 0.1 second segment and σ = 0.193 for the 40 ms segment. First half Last half All 0.2s 0.39s-0.43s Figure 19.
Correlations between the LIGO-Hanford and LIGO-Livingston detector data using the same four time intervals used in the analysis in [42]. One time interval covers the full 0.2 seconds of data shown in Figure 18, and two others cover the first and last 0.1 seconds of the data. In addition, a very short time interval of duration 40 ms was selected that covers the peak of the signal. A time lag of 7 ms is highlighted as a dotted vertical line. The upper panel uses the filtered data from Figure 1 of the GW150914 discovery paper [1], while the lower panel uses the whitened data shown in Figure 18. Figure 19 shows the correlations using the whitened data shown in Figure 18 (bottom panel), and in addition, the bandpass/notch-filtered data used to produce the panels in Figure 1 of the GW150914 discovery paper [1] (top panel). There is a clear anti-correlation peak in the LIGO-Hanford -LIGO-Livingston data at a time lag of ∼ 7.3 ms, which is consistent with the time delay inferred for the gravitational-wave signal.  Correlations between the LIGO-Hanford and LIGO-Livingston residuals using the same four time intervals as Figure 19. The upper panel uses the residuals shown in Figure 1 of the GW150914 discovery paper [1], while the lower panel uses the whitened residual time series shown in Figure 18. The whitened residuals from the maximum likelihood signal subtraction show no significant correlations at any time lag for any of the time windows.
In contrast, Figure 20 shows the correlations in the residuals produced using the procedure described above. The residuals from Figure 18 show no notable anticorrelation at ∼ 7 ms (bottom panel), while those from Figure 1 of the GW150914 discovery paper [1] have a slight dip at this time lag (top panel), reflecting the fact that the reference waveform used for illustration in that paper was not the maximum likelihood waveform. For the shortest integration interval, the residuals from Figure 1 of the GW150914 discovery paper have a ∼ 3 sigma anti-correlation at a time lag of ∼ 7.45 ms, which while marginally consistent with noise, is evidence that the signal subtraction was imperfect. In contrast, the residuals produced using the amplitude/time/phase maximized NR waveforms and whitened data show no significant excursions, and are fully consistent with noise. This is also the case for the residuals from the Bayesian parameter estimation described in Section 9. Independent analyses of the GW150914 data have also found no significant correlations between the residuals in the Hanford and Livingston detectors [32,33].

Conclusions
In this article we presented a description of the properties of data from the LIGO and Virgo detectors, and an overview of the analysis methods used by the LVC in identifying and characterizing gravitational-wave signals from the coalescence of binary black hole and binary neutron star systems. We have especially looked closely at the data surrounding the first detection, GW150914 [1,3,163]. Contrary to the claims made in [42], there are no anomalous or unexpected correlations to be seen in association with the observed gravitational-wave events [119], including GW150914 [163]. Other analyses by independent researchers have come to similar conclusions about the correctness of the LIGO-Virgo results [32][33][34]39].
Proper handling of the LIGO and Virgo data is critical for conducting an analysis correctly. As an example, in this paper we have used the whitened maximum likelihood waveforms (as described in Sections 9 and 10) for GW150914, which when subtracted from the data, produce residuals that are consistent with Gaussian noise, and show no correlation between different detectors. If the template waveforms subtracted from the data are not sufficiently good matches to the real gravitational-wave signal, then a remainder of that signal will survive in the resulting residuals, which may thus exhibit nontrivial correlations. Figure 1 of [1] was constructed to show as simply as possible that the signal is compatible with general relativity. It does not illustrate the full LSC-Virgo statistical data analysis. The figure was described in [1] as a visualization of the gravitational-wave signal at the LIGO detectors and a comparison to one numerical relativity waveform which is consistent with the gravitational-wave data. A statistical claim about the numerical relativity waveform and the residuals of Figure 1 of [1] was not intended, although unfortunately the figure may have been interpreted in that way.
The LVC conducted extensive statistical studies of the GW150914 signal and of the surrounding noise, which are documented in [127]. Note that a whitened time series of GW150914 was presented in the parameter estimation companion paper for the discovery; see Figure 6 of [127]. Those studies, as well as the simpler investigations given here, support the interpretation that the signal is well matched by a black hole merger solution of general relativity. The validity of this conclusion has been supported by subsequent data and analysis by the LVC (including studies on all binary black hole produced gravitational-wave signals detected in observing runs O1 and O2 [119]) as well as independent analyses.
The gravitational-wave data for Advanced LIGO and Advanced Virgo can be characterized as locally stationary and Gaussian, with deviations when glitches are present. The LVC conducts extensive data quality, detector characterization, and calibration studies in order to be confident of the reported detections [47,48,50,74].
However, it is not necessary to assume that the data are stationary and Gaussian to search for, and to detect with high confidence, gravitational waves from compact binary coalescence. Instead, LIGO-Virgo searches for gravitational waves use various methods to estimate the false alarm rate directly from the data, for example, by introducing a relative time shift between the detectors.
Previous studies have also demonstrated that the LVC's parameter estimation results are reliable [104,143,[154][155][156][157]. The parameter estimation routines were also robust for the gravitational waves from the binary neutron star merger GW170817 where there was a noise glitch in the LIGO-Livingston data overlapping with the gravitational-wave signal [8,154]. Parameter estimates obtained by researchers outside the LVC for GW170817 are comparable with, and support the conclusions of, the LVC analyses [35][36][37]; these studies were made possible by the public release of the gravitational-wave data [24].
While the examples in this paper have concentrated on the events GW150914 and GW170817, the conclusions presented have been demonstrated to be valid for the analysis of the data containing all 11 gravitational-wave events detected by LIGO and Virgo to date [3,119]. As the LIGO and Virgo collaborations report more events [3,164], independent analyses of the data associated with these events by the broader scientific community will be highly valuable and may well produce new insights. To this end, in this paper we have tried to provide some guidance on the nature of LIGO and Virgo detector noise and on the extraction of gravitational-wave signals. The LVC encourages the scientific community to analyze its data; LIGO and Virgo data will continue to be made publicly available on the GWOSC website [24].