Subtracting glitches from gravitational-wave detector data during the third LIGO-Virgo observing run

D Davis; T B Littenberg; I M Romero-Shaw; M Millhouse; J McIver; F Di Renzo; G Ashton

doi:10.1088/1361-6382/aca238

1. Introduction

Data analysis of gravitational-wave signals from compact binary coalescences (CBCs) [1–4] requires precise understanding of the expected astrophysical waveforms as well as models of the noise in gravitational-wave detectors [5]. These analyses, and the identification of times containing gravitational waves, are limited by the amplitude of background noise, referred to as the sensitivity of the detectors. Numerous noise sources limit the sensitivity of ground-based gravitational-wave detectors [6–8], including fundamental noise sources that are due to the physical limitations of the design and materials of the detectors, and technical noise sources, that are due to the imperfections in the current detectors [9–11]. There are also short bursts of excess noise in the detectors, referred to as 'glitches', that further pollute the gravitational-wave strain data [10, 12, 13].

Typical Bayesian methods in gravitational-wave data analysis assume that the noise is stationary and Gaussian over the time period analysed [5]. These assumptions are violated, however, when glitches are present in the analysed data. When not accounted for as a part of the analyses, the presence of glitches can bias estimates of gravitational-wave properties [14–17]. There are some techniques proposed in which these assumptions have been relaxed [18–21], but these methods have not yet been used in an analysis by the LIGO-Virgo-KAGRA (LVK) collaboration.

Rather than being a rare edge case, it is quite common for glitches to be present near a gravitational-wave signal. The median rate of glitches in the LIGO and Virgo detectors during their third observing run (O3) [2, 3] were above 1 per minute for the majority of the run. At this rate, the probability a glitch to overlap a signal from a binary black hole is $\mathcal{O}$ (10%) and it is almost certain that a glitch will overlap a signal from a binary neutron star. These pessimistic estimates have held true for recent catalogues. In total, 18 of the events detected in O3 were deemed to require glitch subtraction before further analysis [2–4], including all confidently detected gravitational-wave signals from binary neutron stars [22, 23] and neutron star—black hole binaries [24, 25]. Since typical methods used to estimate gravitational-wave source properties do not account for these glitches, it is therefore important to develop alternate mitigation methods to reduce any potential biases.

Three main categories of glitch mitigation have been used in analyses of CBCs: removing the entire stretch of data containing a glitch; modelling glitches using the strain data and subtracting this model; and using an additional time series that witnesses the source of the glitch to subtract the relevant excess noise. The first approach is commonly used in searches for CBCs [26, 27], but requires additional modifications to the analysis techniques to be used in parameter estimation (PE) without introducing biases [20, 28]. For this reason, only the latter two techniques have been used in analyses by the LVK.

A wide variety of similar methods have been developed to model and subtract glitches. Many techniques have been developed to model glitches using gravitational-wave strain data alone, including modelled Bayesian approaches [19, 28–32] and some based on machine learning [33, 34]. Likewise, techniques that utilize the large number of additional auxiliary sensors at each observatory to model sources of noise with analytic methods [35–44] and machine-learning approaches [45–49] have been proposed.

Of these tools, the only algorithms that have been used to model and subtract glitches for LVK analyses [1–4] are BayesWave [28–30] and gwsubtract [35, 40]. BayesWave models glitches based on only the gravitational-wave strain data using wavelets. It has been used to subtract glitches in LVK analyses since the second observing run (O2) [22, 28] and is the most commonly used algorithm to date. Conversely, gwsubtract was used to subtract glitches in an LVK analysis for the first time in O3 [3]. However, this method was previously used for broadband noise subtraction in O2 [40]. gwsubtract models glitches using information from an additional auxiliary witness that is correlated with, but independent of, the strain data.

This work focuses on describing the versions of these tools that were used in O3, the methods for determining if glitch subtraction was needed, and the potential limitations of current glitch subtraction methods. We first provide background on these two methods in section 2, highlighting how the specific configuration of the tools used in O3 differs from previous or more recent versions of the tools. Section 2 also explains the methods used to evaluate the need for and results of glitch subtraction. We then focus on two events, GW190424_18 0648 and GW200129_06 5458 (which we will refer to as GW190424 and GW200129), in section 3 and investigate the result of glitch subtraction in each case and estimate the additional statistical errors introduced from glitch modelling. Finally, in section 4, we describe potential updates to these glitch mitigation methods for the next observing run and the expected challenges for glitch subtraction with a higher detection rate.

2. Methods

In the third observing run, two glitch subtraction methods were utilized as a pre-processing step before PE analyses were completed [2–4]. The first algorithm, BayesWave [28–30], was used for 15 events in O3 [2–4]. The second algorithm, gwsubtract [40], was used for only 1 event [3]. This section outlines the basic methods both algorithms use to model the nose contributions from a glitch. We also discuss the metrics we used to evaluate if glitch subtraction was needed and if the glitch subtraction was successful at removing the excess power from any identified glitches.

2.1. Glitch subtraction using wavelets

The BayesWave algorithm assumes that in each interferometer, the detector data h(t) can be expressed as

$\begin{align} h(t) = n(t) + s(t) + g(t) \end{align} \tag{ 1 }$

where n(t) is Gaussian noise, s(t) is an astrophysical signal, and g(t) is a non-Gaussian instrumental glitch. BayesWave models non-Gaussian features (both signal and glitch) as a sum of sine-Gaussian (also called Morlet-Gabor) wavelets. In the time domain, the wavelets are given by

$\begin{align} \Psi(t;\vec{\theta}) = A e^{-(t-t_0)^2/\tau^2}\cos\left(2\pi f_0(t-t_0)+\phi_0\right) \end{align} \tag{ 2 }$

where $\vec{\theta} = \{t_0, f_0, Q, A, \phi_0\}$ are the parameters of each wavelet, and $\tau = Q/(2\pi f_0)$ . The wavelet parameters, as well as the total number of wavelets used, is marginalized over via a transdimensional Markov chain Monte Carlo (MCMC).

For the glitch model, the wavelet parameters in each detector are independent. For the signal model, there is one common set of wavelets which are projected onto the detectors using a set of extrinsic parameters that describe the sky location and polarisation content. The power spectral density (PSD) is modelled using [50], which also uses a transdimensional MCMC to model the noise as a combination of a cubic spline and Lorentzians. Full details can be found in [29, 30].

The transient features in the data can be reconstructed using three different model assumptions:

(a)
The data contain Gaussian noise with an astrophysical signal.
(b)
The data contain Gaussian noise with an instrumental glitch.
(c)
The data contain Gaussian noise with both a signal and a glitch.

To identify and remove glitches from detector data, we use either the glitch-only model (case II above), or signal-plus-glitch model (case III). The glitch-only approach is used when the glitch is sufficiently separated in time and/or frequency from the part of the signal that BayesWave can reconstruct. When using the glitch-only approach, we only need to consider data from the single detector in which the glitch appears. If the signal and glitch overlap significantly, we use the signal-plus-glitch model. The signal-plus-glitch model requires data from multiple detectors in order to separate the coherent signal power and the incoherent glitch power. By simultaneously fitting the for signals and glitches, we ensure that we do not subtract any signal power during glitch mitigation.

As BayesWave uses an MCMC to marginalize over the parameters, the end result is a posterior distribution of time series of the glitch, g(t). To actually perform the glitch mitigation, we select a fair draw from the posterior [31] by randomly selecting a single sample from the posterior, and subtract it from the detector data. The timescale to model glitches and produce glitch-subtracted data using BayesWave is up to multiple days depending on the type of glitch that is being analysed [30]. The code for both BayesWave and the BayesWave glitch subtraction methods can be found at [51].

2.2. Linear subtraction with a witness

The codebase used to subtract glitches using an auxiliary witness, gwsubtract, is the same linear subtraction algorithm as was presented in [40]. In this section, we provide a short description of this method, and note how the use case considered in this work differs from the original use case presented in [40]. The code associated with gwsubtract is publicly available at [52].

In order to subtract noise, we first assume that the measured strain in the gravitational-wave detector, h(t), is a linear combination of time series from different sources, $n_i(t)$ . We further assume that at least one of these noise sources can be modelled as the convolution of a witness time series, a(t), and an unknown transfer function $c_{ah}(t)$ . In the time domain, the gravitational-wave strain can therefore be written as

$\begin{align} h(t) & = n_1(t) + n_2(t) + \ldots + n_N(t) \nonumber\\ & = n_1(t) + n_2(t) + \ldots + a(t) * c_{ah}(t) \nonumber\\ & = h^{^{\prime}}(t) + a(t) * c_{ah}(t) \end{align} \tag{ 3 }$

where all noise sources except one are included in $h^{^{\prime}}(t)$ . This can be conveniently cast into the frequency domain as

$\begin{align} \tilde{h}(f\,) = \tilde{h}^{^{\prime}}(f\,) + \tilde{a}(f\,) \cdot \tilde{c}_{ah}(f\,). \end{align} \tag{ 4 }$

Following the derivation from [35, 40], we can estimate the unknown transfer function by using the discrete Fourier transform of the strain data, $\tilde{Y}_h(f\,)$ , and the witness time series, $\tilde{Y}_h(f\,)$ . A single value is calculated for the frequency band $[\,f_1, f_2)$ via

$\begin{align} \tilde{c}_{ah}(f^{\,^{\prime}}) = \frac{df}{f_2-f_1} \sum_{f = f_1}^{f_2} \tilde{Y}_h(f\,) \tilde{Y}_a^{\,*}(f\,), \end{align} \tag{ 5 }$

where df is the frequency resolution of the data.

By averaging nearby frequencies when calculating the transfer function, contributions to the measurement from chance correlations are reduced. Specifically, the fractional measurement error from this averaging method is expected to be $\sqrt{\frac{df}{f_2-f_1}}$ if both h(t) and a(t) are stationary Gaussian processes.

Although the basic description of the process for glitch subtraction is quite similar to [35, 40], the process has a few practical differences. In [35, 40], multiple witness time series were used to subtract contributions from a single noise source, requiring that correlations between each of the witness time series were also calculated. In this work, only a single witness time series is considered. As the glitch subtraction can be completed by gwsubtract with a few straightforward calculations, the timescale to produce this data for one event is generally less than multiple hours.

An additional key difference is the stationary nature of the time series. Data that contains glitches is, by definition, not a stationary Gaussian time series. Regardless of the use case, if the transfer function is still stationary and linear, then theoretically this should not prevent these linear subtraction methods from being used. However, it is possible that the true transfer functions may have some non-linear features that are not captured by this method. It is also possible that some of the methods used in [40] to reduce the impact of false correlations could prevent the transfer function from being measured accurately. For these reasons, the ideal type of glitch to subtract with these methods is one that occurs at a high rate over a short time period. This increases the probability that the transfer function will be accurately measured.

Due to these practical differences between glitch subtraction and the use case that [40] was originally designed for, we completed additional validation tests to confirm that the methods of [40] were able to be used for glitch subtractions. These tests are summarized in appendix A.

2.3. Estimating the significance of residuals

One of the key steps in the glitch subtraction process, after deciding whether to subtract excess power in the data, is determining whether this excess power was successfully subtracted. For these purposes, we employ a residual test inspired by [53] to evaluate the statistical significance of any non-Gaussian features in the data. This test relies upon comparing the PSD of the data during the time of interest against the PSD measured over 512 s. Any positive fluctuations in the short-duration PSD are considered excess power.

Following the description in [53], we aim to measure the PSD variation, ρ, which is

$\begin{align} \rho = N \int_{t_0-\Delta t}^{t_0} |F(t) * s(t) |^2~\text{d}t \end{align} \tag{ 6 }$

for some data, s(t), at a given time, t₀. $\Delta t$ is the timescale on which the PSD variation is measured, F(t) is a filter function to weight different frequencies that are being analysed, and N is a normalizing constant. In the frequency domain with discrete samples, this is

$\begin{align} \rho = N \sum_{f = f_\text{low}}^{f_\text{high}} |F(f\,) s(f\,)|^2~\Delta f. \end{align} \tag{ 7 }$

We approximate the squared frequency spectrum of the data, $|s(f\,)|^2$ , as the PSD of the data measured using GWpy [54] over a short duration that overlaps the glitch. We refer to this as $S_G(f\,)$ . Rather than designing a filter function based on a gravitational-wave inspiral signal (as was done in [53]), we choose to use a filter function that is weighted by the sensitivity of the detector. Specifically, $F(f\,) = 1/S_L(f\,)$ , where $S_L(f\,)$ is the PSD of the data over 512 s of data measured using the 'median method' [55] with windows the same size as the short duration PSD and $50\%$ overlap. With these substitutions, the variation in the PSD, ρ is

$\begin{align} \rho = \sum_f S_L(f\,) \times \sum_f \frac{S_G(f\,)}{S_L^2(f\,)}, \end{align} \tag{ 8 }$

where the normalizing constant, N, is

$\begin{align} N = \sum_f \frac{S_L(f\,)}{\Delta f}. \end{align} \tag{ 9 }$

However, the variance of this statistic is dependent on both the time window used to estimate $S_G(f\,)$ , $\Delta t$ , and the 'effective bandwidth' considered, $\Delta f_\textrm{eff}$ [53], making it unsuitable for comparisons where these two quantities differ. Specifically, the variance is

$\begin{align} \mathrm{Var}(\rho) = (\Delta t \Delta f_\textrm{eff})^{-1}. \end{align} \tag{ 10 }$

The effective bandwidth is dependent on both the bandwidth considered, $\Delta f = f_\textrm{high}-f_\textrm{low}$ , and the shape of the PSD. This effective bandwidth is calculated via

$\begin{align} \Delta f_\textrm{eff} \approx \int_{f_\textrm{low}}^{f_\textrm{high}} \frac{\textrm{max} [S_L(f\,)]}{S_L(f\,)}df. \end{align} \tag{ 11 }$

To account for this dependence on $\Delta t$ and $\Delta f$ , we normalize the statistic by the standard deviation for the parameters we use in each instance and subtract the mean of the distribution. Hence our new statistic, $\tilde{\rho}$ is

$\begin{align} \tilde{\rho} = (\rho - 1) \times \sqrt{\Delta t \Delta f_\textrm{eff}}. \end{align} \tag{ 12 }$

This new statistic has a mean of 0 and a variance of 1 for all choices of $\Delta t$ and $\Delta f$ .

Although this new statistic is less susceptible to analysis choices for the reasons listed above, the true variance of the statistic may be different due to practical limitations in the measurement of the PSD. For example, it is not possible to calculate this statistic when $\Delta f \lt \Delta t^{-1}$ as there will be no points in the measured PSD in the bandwidth of interest. When $\Delta f \approx \Delta t^{-1}$ , we also see additional variance in the statistic that is not accounted for in the normalization due to the large frequency resolution of the PSD. If the frequency resolution of the PSD is sufficiently coarse, only a small number of data points are the bandwidth of interest, inflating the measured variance.

To address this issue, we calculate a p-value for the measured value of $\tilde{\rho}$ for a specific choice of $\Delta t$ and $\Delta f$ using simulated data. In each case, we generate 2048 s of stationary, Gaussian data with the measured long-duration PSD using Bilby [56, 57]. We then repeat the same procedure over each sequential short-duration time window for the whole time period simulated. These results are used to generate the distribution from which the p-value is estimated.

An example distribution of this statistic based on data around GW190425 [23] is shown in figure 1. Before subtraction, the measured statistic value ρ was ≈180, much larger than any trial in the simulated background. After subtraction, the statistic dropped to 1.4, which was higher than $18\%$ of the simulated noise, corresponding to a p-value of 0.18. This can be seen from the cumulative distribution of the simulated noise, shown on the left, and a histogram of the measured statistics in the noise, shown on the right. The plotted histogram also demonstrates that the peak of this distribution is indeed close to 0. The distribution of statistics is asymmetric, with a longer tail towards larger values. This asymmetry, even for cases that should be well-behaved, motivates the use of the p-value rather than a threshold on the measured statistic based on an assumed variance.

2.4. Glitch subtraction validation procedures

In order to ensure that the glitch-subtracted data used in downstream analyses do not introduce biases, a series of glitch-subtraction review tests were completed as a part of the O3 LVK analyses [2–4]. These tests focus on ensuring that either the data was consistent with Gaussian noise or that any residual instrumental excess power does not bias estimates of an event's source properties. A flow chart providing an overview of the steps in this process can be seen in figure 2. Previous investigations into the potential biases on PE when glitches overlap or are nearby signals [14–17] have not yet covered the entire parameter space of glitch types and where in the signal the glitch overlaps. We therefore use conservative thresholds for each test to minimize potential biases.

**Figure 2.** Flowchart of the glitch subtraction and parameter estimation (PE) procedures discussed in section 2.4. Note that the two steps involving 'Complete glitch subtraction with chosen algorithm' and 'Re-assess data after glitch subtraction with stationarity test' may be completed multiple times if the first attempt at glitch subtraction does not pass the stationarity test. The colouring corresponds to the general theme of each part of the process: blue is data evaluation steps, beige is modelling and subtracting identified glitches, red is tests with PE analyses, and purple is coordinating and completing PE analyses for GWTC analyses [2–4].
Download figure:
Standard image High-resolution image

In order to identify if data surrounding gravitational-wave events required glitch subtraction, we first utilize the results of event validation procedures completed for each event. These procedures for O3 are detailed in [10, 58]. The outcome of event validation specifically indicates if any glitches were nearby a gravitational-wave signal. Events that do not have any glitches flagged by these procedures are deemed not to require glitch subtraction.

For flagged events, the residual test described in section 2.3 is used to determine if the data is inconsistent with Gaussian noise. The time–frequency bounds to use for the residual test are determined by visual inspection of the data around events using spectrograms produced with the Q-transform [59]. Cases where the residual p-value was greater than 0.1 are deemed consistent with Gaussian noise and no glitch subtraction is attempted, cases with p-values less than 0.001 are passed on for glitch subtraction, and cases with p-values between these two thresholds require further follow up before a decision was reached. In the cases when the result is inconclusive, the overlap of the excess power and signal is used to determine if glitch subtraction is needed; glitch subtraction is generally recommended in cases where excess power directed overlapped the signal. As previously described, glitch subtraction was completed with BayesWave in all but 1 case where gwsubtract was used. The choice of algorithm for each event is made based on the suitability of each algorithm to subtract the glitches in glitches. Technical considerations are not a part of this choice as the time to completion and computing usage are not a limiting factor in the generation of reviewed, glitch-subtracted data.

Once the glitch subtraction was completed, the data is reevaluated to determine if the excess power is sufficiently subtracted. The residual metric is recomputed for the glitch-subtracted data using the same time–frequency regions. The same p-value thresholds are also used to decide if the data is now consistent with Gaussian noise.

In addition to the residual tests, comparisons of PE results using different mitigation methods are used to evaluate if further glitch subtraction is needed or if a different glitch mitigation method should be used¹¹ . To test if excess power in the data could bias analyses, we estimate the source properties of events using either the nominal lower frequency limit (20 Hz) or a higher lower limit that excludes the data containing glitches. In cases where the glitch subtraction is able to improve the stationarity of the data, analyses using the original and glitch subtracted data are also compared. If the glitch-subtracted data does not pass the residual test, different frequency bounds that exclude the flagged time–frequency regions are recommended if the different frequency bound changed the measured chirp mass¹² posterior by more than 1σ and that the signal-to-noise ratio (SNR) is reduced by less than $10\%$ . Other parameters, such as χ_p and ${\chi_\text{eff}}^{12}$ are also compared between analyses. Although not the deciding factor in determining which mitigation method is recommended, any differences in these parameters between analyses are flagged. Additional details about these tests can be found in appendix B.

Once either the data around an event passed the residual test or the identified glitches are deemed to not bias PE, the data is approved for use in downstream analyses. These procedures generally are completed for each event within a few days (for events not requiring glitch subtraction) to weeks (for events that require glitch subtraction). The timeline for this process is limited by the need for human input at multiple steps; in future observing runs, additional automation will be needed to significantly speed up these procedures.

3. Results with real events

In this section, we consider the data around two events, GW190424 [2] and GW200129 [3]. Both events directly overlapped glitches that were mitigated using the glitch subtraction procedures discussed in this work. Data around GW190424 was processed with BayesWave and data around GW200129 was processed using gwsubtract. GW190424 was one of the more challenging cases of glitches that were subtracted with BayesWave while GW200129 is the only event in O3 processed with gwsubtract. In both cases, however, there are well-known witnesses to the source of the glitches near each event. We are therefore confident that the power subtracted is due to glitches as opposed to an unmodelled astrophysical signal.

GW190424 directly overlapped two different types of glitches; one glitch was due to a mechanical camera shutter that was inadvertently left running on a timer [58] and one was due to scattered light that is reflected off a moving surface [60]. A data quality flag was developed that removed time segments containing glitches from the astrophysical searches due to these camera shutter glitches [61]. GW190424 is an event of interest as one of the signals in O3 that was detected using only the data from a single gravitational-wave detector. There was therefore no data from a different detector available to help disentangle coherent power (that is astrophysical) from incoherent power (that is instrumental). Instead, glitch power was identified using auxiliary witnesses and then modelled using BayesWave. For events in O3 that were detected in more than one detector, it was instead possible to disentangle incoherent and coherent power using the multiple strain time series. Although these noise sources were correlated with auxiliary witnesses, gwsubtract was not feasible to be used in this case as scattered light is not linearly correlated with any witnesses and the camera shutter glitches were too infrequent and short to accurately measure the relevant transfer function.

GW200129 directly overlapped glitches that were due to a fault in the 45 MHz electro-optic modulator driver system at LIGO Livingston [62]. This type of glitch was repeating for many minutes around the event and was well-witnessed by the modulation control system. Furthermore, the broadband nature of this glitching is challenging to model with the sine-Gaussian wavelets used in BayesWave. For these reasons, this glitch type was addressed using linear subtraction. These glitches were also vetoed by a data quality flag in O3 [61]. GW200129 is of interest astrophysically as the gravitational-wave event with potentially the strongest evidence to date in favour of spin-induced precession in a CBC [3, 63–65]. The gravitational-wave strain from a spin-precessing binary is lower at specific frequencies as compared to the corresponding non-precessing signal. As the effect of orbital precession could be mimicked by the presence of non-Gaussian noise, it is essential to the analysis of any signal that any glitches present are well-modelled and subtracted. Conversely, the uncertainties on glitch modelling could introduce additional uncertainties in the PE results [65].

Including these two events, a total of 16 events required glitch subtraction in O3. Table 1 lists all of these events, the interferometer from which the glitch was subtracted, and the time–frequency regions that were targeted for glitch subtraction. Note that due to the nature of the glitch subtraction algorithms used, glitch power may also have been subtracted from outside the listed time–frequency regions. However, these time–frequency bounds correspond to the time windows and bandwidth targeted for glitch subtraction and used for residual tests. An additional 10 events¹³ were assessed with these review processes but were determined to not require glitch subtraction. This was due to the identified time–frequency regions passing the residual test or being located outside the time window used in PE analyses. All glitch-subtracted data is publicly available [66–68].

Table 1. A list of all of the events from O3 LVK analyses [2–4] which required glitch subtraction. For each event, we list: the GPS time of the event; the interferometer(s) which required glitch subtraction; and the time window (relative to the event time) and frequency range targeted for glitch subtraction. These time–frequency regions were used for the residual tests of glitch-subtracted data. Note that additional excess power may have been subtracted outside of these time–frequency bounds. Glitches around all events were subtracted with BayesWave except for glitches around GW200129_06 5458, which were subtracted with gwsubtract.

Event	GPS time	IFO	Time (s)	Frequency (Hz)
GW190413_13 4308	12 391 98 206.74	L	[−1.7, 0.7]	[20, 35]
GW190424_18 0648	12 401 64 426.14	L	[−.3, 0.1]	[60, 100]
			[−1.5, 0.5]	[20, 40]
GW190425	12 402 15 503.02	L	[−61.57, −59.97]	[20, 50]
GW190503_18 5404	12 409 44 862.30	L	[−1.1, 0.5]	[20, 35]
		L	[−3.9, −1.9]	[20, 40]
GW190513_20 5428	12 418 16 086.75	L	[−5.65, 0.15]	[15, 40]
GW190514_06 5416	12 418 52 074.89	L	[−1.3, 2.7]	[15, 30]
		L	[−5.3, 0.3]	[15, 30]
GW190701_20 3306	12 460 48 404.80	L	[−1.28, 1.72]	[20,55]
GW190924_02 1846	12 533 26 744.85	L	[−5.0, −3.0]	[18 110]
GW191105_14 3521	12 569 99 739.93	V	[−7.05, −5.65]	[25, 50]
			[0.1, 7.1]	[15, 30]
			[−4.35, −3.15]	[16, 22]
			[−6.2, −5.0]	[18, 25]
GW191109_01 0717	12 572 96 855.22	H	[−3.1, −0.1]	[25, 45]
		L	[−1.5, 1.5]	[20, 32]
GW191113_07 1753	12 576 64 691.84	H	[−1.75, 1.75]	[25, 60]
			[−4.4, −3.6]	[25, 55]
GW191127_05 0227	12 588 66 165.55	H	[−2.125, 0.375]	[25, 40
			[0.0, 3.0]	[20, 45]
GW191219_16 3120	12 608 08 298.45	H	[−7.85, −4.35]	[20, 1000]
		L	[4.5, 7.0]	[20, 25]
GW200105_16 2426	12 622 76 684.06	L	[−4.25, 0.25]	[18, 22]
GW200115_04 2309	12 630 97 407.74	L	[−70, 30]	[20, 25]
GW200129_06 5458	12 643 16 116.44	L	[−0.525, 0.275]	[30, 60]
			[0.6, 1.6]	[30, 55]

3.1. Glitch models for events in O3

The process of glitch subtraction can be broken into two stages: glitch modelling and glitch subtraction. The subtraction process is done simply by finding the difference between the glitch model and the original data. However, as described in section 2, the methods used by the tools described in this work to estimate the glitch model differ significantly. These differences are highlighted in the two representative events we have considered in this section.

The first data visualizations we will consider for our comparisons are spectrograms generated using the Q-transform [59]. Spectrograms of the data around GW190424 and GW200129 from LIGO Livingston can be seen in figure 3. The panels of this figure show the data before subtraction, after subtraction, and the difference of the two spectrograms. The power subtracted near GW190424 is confined to two distinct burst of power: one corresponding to power from the camera shutter glitch at ≈70 Hz, and the other one from the arches in the nearby scattered light glitch. These bursts of power are not directly overlapping the signal, but are near enough that they might bias PE results. Conversely, the power subtracted near GW200129 is more broadband, with visible glitch power below 70 Hz subtracted for multiple seconds that directly overlaps the signal.

**Figure 3.** Spectrograms of the data around GW190424 and GW200129 before glitch subtraction (Top), the data after glitch subtraction (Middle), and the residual between the two which corresponds to the subtracted glitch model (Bottom). Data around GW190424 is shown on the left and data around GW200129 is shown on the right. All spectrograms are generated with GWpy [54] using the Q-transform [59]. The bottom panels highlight the time–frequency regions that were modified by glitch subtraction. Glitches around GW190424 were processed using `BayesWave`, resulting to the subtraction distinct time–frequency regions. Glitches around GW200129 were conversely processed with `gwsubtract`, resulting in broadband subtraction below 70 Hz.
Download figure:
Standard image High-resolution image

A comparison of the original strain data, the deglitched data, and the glitch models can be seen in figure 4. Both the glitch models and the strain data have been whitened using GWpy [54]. The same types of features that were visible in the figure 3 can be seen in the whitened time series. The glitch model for GW190424 is confined to two short bursts, while the glitch model for GW200129 is non-zero over the entire time considered.

These results also demonstrate limitations of these two methods. BayesWave was not able to successfully subtract all of the excess power from the scattering glitches, while the linear subtraction was not able to subtract any power above 70 Hz. Scattered light glitches are a known challenge for BayesWave. Recent investigations [31] have shown that in order to fully model scattered light glitches with BayesWave, low-SNR wavelets must be used. These investigations also find that wavelets at the required SNRs are strongly disfavoured when using priors similar to those used by the analyses discussed in this work. The linear subtraction method is also limited by the input data used to witness the glitch power. In the case of GW200129, the auxiliary sensor used in the subtraction was negligibly coherent with the gravitational-wave strain above 70 Hz. In other cases, such as the broadband noise subtraction performed in O2 [40], the sampling rate of the auxiliary sensors can limit the range of frequencies where subtraction is effective. Ultimately the effectiveness of the linear subtraction method relies upon how faithfully the relevant auxiliary sensors are able to witness the noise source of interest.

3.2. Estimated uncertainties

As previously stated, the uncertainty in the model of a glitch overlapping a gravitational-wave signal introduces another potential source of uncertainty in the PE results. Although it is possible to sample over the glitch model as an additional component in the PE (as is possible with recent updates to BayesWave [19, 31]), this was not done in the O3 analyses. The impact of accounting for the statistical uncertainties in the BayesWave glitch model while performing gravitational-wave PE is explored in [31]. Instead of these methods, a single glitch model was subtracted from the data as a preprocessing step. Although this means that any glitch modelling errors are not included in the analysis, this is not expected to significantly bias the results if the chosen glitch model is at least representative of the true glitch model. In this subsection, we discuss how we assess errors in glitch modelling for both BayesWave and gwsubtract, and demonstrate that statistical errors in the chosen glitch models are qualitatively low.

As discussed in section 2.1, the Bayesian nature of BayesWave results in a posterior distribution of the glitch time series. Previous studies of both simulated signals [69] and real glitches [31] have shown that the accuracy of BayesWave is related to the duration of the data analysed; BayesWave more accurately reconstructs short-duration ( $\lesssim$ 1 s) events. To produce the glitch-subtracted data, we use a fair draw from the posterior distribution. An alternative choice is to use the median of the glitch time series posterior, however in some cases this can lead to oversubtraction [31]. An example of the 90% credible interval of the glitch time series posterior for GW190424, along with the whitened glitch model (a fair draw from the posterior), is shown in figure 5. We see that although the fair draw is a good representation of a possible glitch model, we could get slightly different results depending on which glitch time series we happen to draw.

**Figure 5.** Top: The 90% credible interval of the whitened glitch time series posterior from `BayesWave`, and a fair draw of a single whitened waveform from the posterior (blue). Bottom: A comparison of the 90% confidence interval of the whitened glitch time series from `gwsubtract` (orange) and the whitened glitch waveform subtracted when measuring the transfer function over different time periods (blue). The different time series are consistent in amplitude and phase. Note the differing axis scales for each panel. The `BayesWave` 90% credible is much larger than the plotted `gwsubtract` 90% confidence interval due to the different methods used by each algorithm to estimate the errors. Due to the differing error estimation methods, the relative amplitude of the shown credible/confidence regions should not be used to compare the precision of the two methods.
Download figure:
Standard image High-resolution image

Statistical errors in the glitch model from gwsubtract come from both chance correlations between the two data streams considered and errors in the transfer function measurement itself. As described in section 2.2, the errors from chance correlations can be calculated directly from parameters used in the glitch modelling. In the case of GW200129 the 1σ statistical errors from chance correlations should be $\frac{1}{\sqrt{2048}} \approx 0.022$ . The transfer function measurement errors, however, are not directly estimated as part of the glitch modelling process. However, if we assume that the transfer function is stationary over a timescale longer than used to measure the transfer function, we can take multiple measurements to estimate this source of error.

In order to qualitatively estimate the significance of measurement errors, we recalculate the model of the glitches overlapping GW200129 after changing the parameters used in gwsubtract. Specifically, we shift the time window used to estimate the transfer function by $\{-1024, -512, 512, 1024\}$ s with respect to original run. The glitch model estimated for these four additional attempts is shown in figure 5 compared against the original model. The statistical errors on the original glitch model due to chance correlations is represented by the orange shaded region. All five runs show qualitative agreement in the amplitude and phase of the glitch, within the expected errors from chance correlations. We therefore conclude that the statistical errors from the glitch modelling procedure are low and should not introduce biases in the analysis.

It is important to note that these estimates of the errors do account for systematic errors due to modelling assumptions. Although the wavelet model used by BayesWave aims to be fully generic, this model, along with specific choices on the priors used in the analysis, is not able to fully model all glitch features for all glitches. Furthermore, specific glitch classes are known to be more problematic for BayesWave than others [31]. Likewise, gwsubtract is limited by the assumption that the witness sensor used to subtract the excess noise is a perfect witness for the noise source and that the transfer function between the witness sensor and the gravitational-wave strain is linear over the amplitudes of interest. These types of systematic errors are mitigated by investigations into the limitations of each glitch subtraction algorithm and the choice of which algorithm to apply in each scenario. This type of choice was one of the reasons that gwsubtract was used for one of the events in O3. Other problematic glitches impacting events in O3 were investigated using gwsubtract, but only the glitches around GW200129 were successfully mitigated with this algorithm.

The potential impact of unaccounted systemic error sources on PE analyses is not addressed in this work. However, studies of BayesWave-based PE suggest that systematic errors are low relative to other sources of error in estimation of gravitational-wave event source properties [31]. Systematic uncertainties in the case of gwsubtract are more difficult to probe, as these will vary with each noise source considered. Previous investigations into the use of gwsubtract and other tools for broadband subtraction did not identify biases [40, 70], but it is not clear if these studies are fully applicable in this case due to this differences explained in section 2.2. As the sensitivity of the detectors increases, these systematic uncertainties may have a larger impact on the measured source properties. Recent investigations into GW200129 have already shown that the evidence of orbital precession from this event may be impacted by systematic errors in the glitch modelling [65].

4. Future outlook

The methods described in this work have been used to subtract glitches nearby 16 of the gravitational-wave events identified by LVK analyses in O3, representing ≈20% the total population to date. The high rate of glitches and gravitational-wave signals means that many yet-to-be identified signals will overlap glitches as well. The need for this type of mitigation will only grow as the rate of discovery increases in future observing runs [71]. It is therefore important to consider glitch mitigation and subtraction as an inherent component of gravitational-wave data analysis.

As the detection rate grows further, there will be a need for additional automation in both the glitch subtraction algorithms and the process for deciding on their use and configuration. O3 represents the first time that procedures to identify the need for glitch subtraction were utilized due to the high rate of both glitches and signals. The need for analysis of more events in O4 will only reduce the amount of person power available on a per-event basis. One of the biggest foreseen challenges for future observing runs is reducing human input required to generate glitch-subtracted data. The procedures and methods described in this work will form a basis for the evaluation of events in future observing runs. This effort will also be significantly helped by investigations into the impact of glitches on specific gravitational-wave analyses and additional automation.

One key area of research that is required to be confident in the astrophysical implications of gravitational-wave signals overlapping glitches is the systematic uncertainties and limitations of different glitch subtraction methods. Further complicating these efforts is the potential variability between glitches of the same source, which makes it difficult to apply investigations from past observing runs to new gravitational-wave events. Although challenging to estimate, these systematic uncertainties may have significant impact on the conclusions derived from individual events as the sensitivity of the detectors increase. We therefore urge caution in interpreting results that are dependent on glitch subtraction algorithms when the potential systematics are not fully understood. Although this work and previous studies have indicated that such systematics are qualitatively small for tools similar to those used in O3 [31, 39, 45], this may not be true for all analyses [65]. It is also important to note, however, that while these previous studies have shown that these tools have statistically unbiased results, this does not mean that analyses of individual events will not be impacted.

The case studies considered in this work also demonstrate the benefit of multiple glitch subtraction tools. In addition to the personpower limitations created by relying upon a single tool, the wide variety of glitches and glitch-signal overlaps that are possible mean that it is unlikely that a single glitch subtraction method will be best suited for every scenario. Even tools that are limited in scope can be useful, as was demonstrated by GW200129. Additional glitch subtraction methods also could be used as a potential cross-check to understand potential systematic due to specific glitch subtraction tools, as was done in [65]. The authors therefore encourage additional investigation into alternative, targeted glitch subtraction methods. However, new tools are only optimally beneficial when their effectiveness is compared to current methods to understand the specific cases in which one tool is better suited than another.

As previously described, there are a wide variety of additional glitch subtraction methods that are already available [32, 36–39, 41–49] or may be ready for use by the next observing run. There are also updates already implemented to BayesWave [19, 31] that were not available for use in O3. However, the overall landscape for the data quality of the detectors and the need for glitch subtraction is expected to remain unchanged, as the sources of numerous classes of glitch remain unknown [10, 12, 13]. With the next observing run planned to start in the coming year [71], now is the time for identifying lessons learned from the O3 glitch subtraction experience and developing new methods to address current limitations.

Acknowledgment

The authors thank the LIGO-Virgo-KAGRA Detector Characterization and Parameter Estimation groups for their input and suggestions during the development of this work. We thank Sophie Hourihane and Katerina Chatziioannou for discussions about BayesWave. We also thank Ronaldas Macas for their comments during internal review of this paper. D D is supported by the NSF as a part of the LIGO Laboratory. I M R-S acknowledges support received from the Herchel Smith Postdoctoral Fellowship Fund.

This material is based upon work supported by NSF's LIGO Laboratory which is a major facility fully funded by the National Science Foundation. LIGO was constructed by the California Institute of Technology and Massachusetts Institute of Technology with funding from the National Science Foundation, and operates under cooperative agreement PHY-1764464. Advanced LIGO was built under award PHY-0823459. The authors are grateful for computational resources provided by the LIGO Laboratory and supported by National Science Foundation Grants PHY-0757058 and PHY-0823459. This work carries LIGO document number P2200192.

Parts of this research are supported by the Australian Research Council (ARC) Centre of Excellence for Gravitational Wave Discovery (OzGrav) (Project Number CE170100004) and ARC Discovery Project DP170103625.

Data availability statement

The data that support the findings of this study are available upon reasonable request from the authors.

Appendix A.: Linear subtraction tests with simulated data

As gwsubtract has not been previously used to intentionally subtract glitches from gravitational-wave strain data, it is important to confirm that the algorithm can be used in this scenario. While glitches should theoretically be subtracted using the procedures described in section 2.2, the practical differences listed in this section may make glitch subtraction difficult. Cases where glitches were subtracted were identified in investigations of the data produced in [40], but the glitch subtraction effectiveness was not assessed. To confirm that gwsubtract could be used to subtract glitches in a controlled setting, we test the algorithm with simulated data.

For this test, we first generate a time series with sine-Gaussian wavelets and a low amplitude background (i.e. an auxiliary witness) and a time series that includes both Gaussian noise coloured to match the spectrum of LIGO Livingston in O3 and the same-Gaussian wavelets (i.e. the gravitational-wave strain). The sine-Gaussian wavelets in each channel are related by a known transfer function. The additional Gaussian noise in the witness channel is not correlated with the strain data and requires the algorithm to disentangle correlated and uncorrelated noise in the witness. We then process the simulated strain data with gwsubtract to generate a glitch-subtracted time series. A comparison of the original strain data with glitches and the glitch-subtracted data is shown in the top panel figure A1. Alongside these time series, we also plot the original data without the sine-Gaussian glitches included.

**Figure A1.** A comparison of whitened time series for simulated data (Top) with a glitch in blue, simulated data with a glitch subtracted in orange, and simulated data that does not contain the glitch in green. The similarity of the three time series for times far away from the glitch show that there was correctly only minimal subtraction done for these time periods. The residual between the simulated data with a glitch subtracted and the simulated data that does not contain the glitch (Bottom) is also shown in blue. The expected standard deviation of the residual is ≈0.05, which roughly agrees with the measured standard deviation of 0.068.
Download figure:
Standard image High-resolution image

Ideally, the glitch-subtracted data should match the original data without glitches, up to expected statistical uncertainties. The difference between these two time series is shown in the bottom panel of figure A1. With the settings used in this analysis, the residual should have standard deviation of $\frac{1}{512} \approx 0.05$ . The true standard deviation of the residual is 0.068, which qualitatively agrees with expectations.

Appendix B.: PE studies of glitch subtraction

For events with relatively low-frequency glitches, there are two potential data mitigation strategies: (a) using only data above a certain cut-off frequency, below which the glitch resides; or (b) modelling and subtracting the glitch. To converge on the best approach to use for an event, we perform Bayesian inference on real data with various mitigation strategies applied. For each event, we then visually compare the posterior probability distributions for the network and per-detector signal-to-noise ratios (SNRs) and other parameters of interest to inform our choice of mitigation strategy. In every case for which these PE studies were performed, glitch subtraction was chosen as the preferred mitigation strategy.

We use Bilby and bilby_pipe [56, 57] to perform our parameter estimation and obtain posterior probability distributions on binary parameters. Aside from changing the data or lower frequency limits, the remainder of the settings are identical between runs on the same event. We use the IMRPhenomPv2 waveform approximant [72, 73], a sampling frequency of 2048 Hz, a maximum frequency of 1024 Hz, and a reference frequency of 20 Hz. Unless altered as part of a mitigation strategy, the default lower frequency of analysis is 20 Hz. We employ marginalisation in phase, time and distance.

Here we focus on the results of two case studies: GW191109_01 0717 (Henceforth referred to as GW191109) and GW200129¹⁴ . For both of these events, we perform analyses on 4 s of data. Data from LIGO Hanford and LIGO Livingston was used for both events, while Virgo data was included for runs with GW200129 (Virgo was offline at the time of GW191109). We use bilby default priors in both cases: the high_mass prior for GW191109, and the 4 s prior for GW200129 (see [57] for details of these priors).

GW191109 is one of the most likely events in O3 to exhibit signs of negatively-aligned spins. For GW191109, the data sets that we test each employ of the following mitigation strategies:

(a)
'H45, L20': limiting the frequency of Hanford data to above 45 Hz;
(b)
'H20, L45': limiting the frequency of Livingston data to above 40 Hz;
(c)
'H45, L40': limiting the frequency of Hanford and Livingston data to above 45 Hz and 40 Hz respectively;
(d)
'BW dg': BayesWave glitch subtraction.

GW200129 is one of the most likely events in O3 to contain evidence for spin-induced orbital precession. For GW200129, the compared data sets each individually have one of the following mitigation strategies applied:

(a)
'H20, L60': limiting the frequency of Livingston data to above 60 Hz;
(b)
'dg, non-linear': linear glitch subtraction on data previously cleaned with broadband non-linear 60 Hz subtraction;
(c)
'dg, linear': linear glitch subtraction on data previously cleaned with broadband linear 60 Hz subtraction.

We also compare these results against results obtained using unmitigated data ('H20, L20').

Results are shown for matched-filter Livingston SNR, effective aligned spin $\chi_\mathrm{eff}$ , and effective precession spin $\chi_\mathrm{p}$ in figure B2 for GW191109 and GW200129. We find that when we increase the lower-frequency limit of analysis for LIGO Livingston data, the Livingston optimal SNR is reduced, as expected. Meanwhile, there is little change to the SNR when the various glitch subtraction strategies are employed. There is also negligible difference between posteriors obtained with the linearly and non-linearly cleaned data for GW200129.

**Figure B2.** Plots showing the posteriors on matched-filter SNR in the LIGO Livingston detector ρ_L (Top), effective spin parameter $\chi_\mathrm{eff}$ (Middle), and effective precession parameter $\chi_\mathrm{p}$ (Bottom) under different mitigation strategies for GW191109 (Left) and GW200129 (Right). The unmitigated posterior is shown in blue in all plots. The 'successful' posterior (representing the chosen mitigation strategy) is shown in purple for GW191109 (Left; 'H20, L20 (BW dg)') and green for GW200129 (Right; 'H20, L20 (dg, linear)'). The Livingston SNR is highest when the nominal lower frequency of 20 Hz is used for the Livingston data. For GW191109, `BayesWave` deglitching preserves the support for negative $\chi_\mathrm{eff}$ , while raising the lower frequency at Livingston to 40 Hz pushes the majority of the support to positive values. In both cases, the most informative posteriors come from deglitched data with the nominal lower frequency limit of 20 Hz.
Download figure:
Standard image High-resolution image

The posteriors on spin parameters χ_eff and χ_p are dependent on our choice of data mitigation strategy. When the lower frequency limit is raised above 20 Hz, information about the spins of the system can be lost, as less of the inspiral track is contained within the reduced extent of the data. Therefore, posteriors on spin parameters with limited low-frequency data are more influenced by the prior and are less well-constrained. In both cases, deglitching the data narrows the posteriors on χ_eff and χ_p relative to the posteriors computed from unmitigated data and reduced low-frequency data.

Low frequencies contain most of the information about $\chi_\mathrm{p}$ , so the red histogram in both plots on the final row roughly traces the prior (uninformative) distribution on $\chi_\mathrm{p}$ ; both events have more informative $\chi_\mathrm{p}$ posteriors when deglitched than in the unmitigated case. In the case of GW191109, it appears that Hanford contains very little information about spin parameters: the χ_eff and χ_p posteriors with the Hanford lower frequency limit at 45 Hz and no deglitching (orange) follow the posteriors with no data mitigation strategy applied (blue). Since the SNR loss was significant when higher limits were placed on the lower frequency, our choice for both events demonstrated here was to deglitch the data and maintain a lower frequency limit of 20 Hz.

Subtracting glitches from gravitational-wave detector data during the third LIGO-Virgo observing run

Article metrics

Submit

Permissions

Author e-mails

Author affiliations

Author notes

ORCID iDs

Dates

Peer review information

Abstract

1. Introduction