Waves in a forest: a random forest classifier to distinguish between gravitational waves and detector glitches

The LIGO-Virgo-KAGRA (LVK) network of gravitational-wave (GW) detectors have observed many tens of compact binary mergers to date. Transient, non-Gaussian noise excursions, known as ‘glitches’, can impact signal detection in various ways. They can imitate true signals as well as reduce the confidence of real signals. In this work, we introduce a novel statistical tool to distinguish astrophysical signals from glitches, using their inferred source parameter posterior distributions as a feature set. By modelling both simulated GW signals and real detector glitches with a gravitational waveform model, we obtain a diverse set of posteriors which are used to train a random forest classifier. We show that random forests can identify differences in the posterior distributions for signals and glitches, aggregating these differences to tell apart signals from common glitch types with high accuracy of over 93%. We conclude with a discussion on the regions of parameter space where the classifier is prone to making misclassifications, and the different ways of implementing this tool into LVK analysis pipelines.


Introduction
Since the first direct detection of gravitational waves (GWs) in 2015, the LIGO-Virgo-KAGRA (LVK) network of detectors have detected many tens of candidates from compact binary coalescences (CBC) over three observing runs [1,2,3].Since gravitational waves from astrophysical sources are very weak, the detectors need a high degree of sensitivity to achieve the required signal fidelity.This also makes the detectors extremely sensitive to terrestrial noise sources, which often appear in the data, hampering searches for astrophysical signals [4,5]."Glitches" are a broad class of short-duration noise excursions which commonly appear in GW detector data.
They are transient, non-Gaussian noise of environmental origin that have persisted throughout all three observing runs thus far.As many glitches have a similar timefrequency morphology to GW transients, they can trigger false alarms in low-latency search pipelines [6].This can lead to wasted time and resources if telescopes attempt to follow-up these false alarms, as well as decrease the statistical significance of real events [6].
Glitches can exhibit a range of morphologies, and their origin is not necessarily known.In cases where the cause has been identified, the associated glitches can be eliminated [4,7,8].Despite these efforts, a large number of glitches still persist in the detector data [1,2,3].There have been various software approaches to identify likely glitches and mitigate them by down-weighting the significance of coincident candidates.For example, there are algorithms that use glitches in auxiliary witness channels to predict the likelihood of a glitch in strain data [9,10].Such algorithms can also be used to veto glitches entirely, and has been done to exclude candidates identified in the LIGO-Hanford data in previous observing runs, when they occurred at the same time as when snow plows were active on-site [11].Other algorithms estimate the likelihood of whether the event is astrophysical or a terrestrial noise based on estimates of the likelihood of noise producing a similar candidate [12], or using coherence information that requires multiple detectors [13,14,15,16].Transient GW searches will often have additional glitch-exclusion features, such as the excision of loud, short-duration excursions in the data, or gating [17,18,19,20,21], which are less effective for low SNR glitches.
Current glitch mitigation methods for CBC searches, e.g.chi-squared re-weighting of the signal-to-noise ratio (SNR) are based on the similarity of the morphological fit between the data and the template [22].But such methods do not robustly exclude glitches which have similar time-frequency properties to expected GW signals.It has been seen that light scattering can be mistaken for low-mass CBC sources, as well as tomte glitches being mistaken for high-mass CBC sources [7,23].Some glitch-vssignal classification algorithms, including GravitySpy [24,25,26] and GSpyNetTree [27] employ time-frequency spectrograms (a variant of which is called a qscan) [28] and image classification via a convolutional neural network to distinguish between different glitch and signal classes.These algorithms perform well; however, if a single detector captures an event candidate, time-frequency morphology alone may not be enough to confidently distinguish between a true signal and a detector glitch, or a signal coincident with a detector glitch.Choudhary et al. [29] constructed a deep-learning neural network to distinguish between binary blackhole mergers and blip glitches.There has also been work done that aims to infer the astrophysical population of GW sources, even when some noise transients might be included in the inference [30,31,32,33].
Fig 1 shows example qscans of the glitch classes we use in this work, along with the qscan of GW190521 [34], which due to its short temporal duration and loud nature, can be mis-identified as a tomte or low-frequency blip glitch [35,23].
As the detectors become even more sensitive, this can lead to new glitches appearing in the data due to instrument modifications or glitches that were not loud enough to be seen with a higher ambient noise background.In addition to cases where the glitches have a low SNR, do not have auxiliary witnesses, or have very similar morphology to GW signals, it is useful to look for different ways to distinguish between signals and glitches that are robust to new glitch classes and manifestations.
This manuscript is organized as follows: In section 2 we review the parameter estimation methods used in this work to model glitch and simulated GW populations; section 3 describes the set of features extracted from the posterior distributions; section 4 introduces the random forest classifier which is trained on this feature set to distinguish between glitches and GWs.

Methodology
We use the framework of Bayesian inference [36,37] and random forests [38] to develop a classifier between signals and glitches.We use Bayesian inference to perform parameter estimation on both signals and glitches, while we use random forests for classification, based on a set of features (mentioned in section 3) that are extracted from the posteriors obtained through parameter estimation.We use the Bilby package [39,40] to perform parameter estimation based on a GW signal model.The equation for the posterior probability distribution is given by where θ are the model parameters and h is the model, p is called the posterior, L represents the likelihood, π is the prior and Z is the marginalized likelihood, also called the evidence.The likelihood evaluation is based on the assumption that the noise is stationary and Gaussian [41].Therefore, we use a stationary Gaussian noise model weighted by the detector power spectral density (PSD), expressed in frequency domain as where f k are the frequency bins, d k and hk are respectively the discrete Fourier transforms of the strain data and waveform model, S n (f ) is the noise PSD, and T is the duration of the analyzed data segment.

Parameter Estimation
The GW signature of a compact binary coalescence is completely defined by 15 parameters [42].There are 8 intrinsic parameters, out of which 2 are for specifying the chirp mass M and mass ratio q = m 2 m 1 (m 2 < m 1 ), and 6 for specifying the spin vectors; primary and secondary spin magnitudes a 1 , a 2 , and t 1 , t 2 , φ JL , φ 12 for specifying the orientation of the spin vectors.The other 7 are extrinsic parameters describing the luminosity distance to the binary D L and its orientation θ JN , polarization angle (f) Koi Fish ψ, sky location in right ascension (ra, alpha) and declination (dec, delta) and the time and phase of coalescence t c , φ c .If one or both of the components is a neutron star, there can be another 2 tidal deformation parameters Λ 1 , Λ 2 as well.In this work, we use the IMRPhenomXPHM waveform family [43] as our waveform model.It is a fullyprecessing inspiral-merger-ringdown model that includes higher-order multipole modes as well.To obtain the posterior distribution, we also need to specify a prior.To keep our analysis as unbiased as possible, we use uniform priors or astrophysically agnostic priors depending on the parameter.The priors are similar to ones used in Ashton et al. [44], including standard priors used in gravitational wave astronomy, as defined in Veitch et al. [42].The non standard priors that we use are for chirp mass, for which we use a broad uniform prior in [8,200] M to encompass all our simulated gravitational waves under one common prior.For the luminosity distance, we use the standard uniform in source frame prior between [20,7500]Mpc.We used the same set of priors for signals and glitches for a fair comparison of their posteriors.Now that we have a model for our data and a prior, we can analyze it using a stochastic sampling algorithm.In this work we use the Bilby [39,40] implementation of Dynesty [45], which is a nested sampling algorithm that provides both the posterior samples as well as the evidence.

Glitches
In this work, we focus on 5 classes of glitches that are frequently seen in the data and are known to decrease the search sensitivity [11,35,46,47].These glitches have been observed in all the previous Advanced LIGO observing runs and are likely to remain in the future as their sources have not yet been eliminated.Table 1 lists the names of the glitch classes and the number of glitches in each class that we analyzed in this work.They are the same sample of glitches that were used in Ashton et al. [44].These glitches were initially classified by the GravitySpy pipeline at a 95% confidence threshold.We refer the reader to Ashton et al. [44] for more details on how the glitches were obtained from the Livingston & Hanford detectors.Ashton et.al. [44] used the IMRPhenomPv2 waveform family [48,49] as the model for parameter estimation of their glitches.However, it does not include the effects of higher order modes which are known to have significant effects on posterior distributions, especially for systems with asymmetric masses.Since glitch posteriors tend to prefer high spins and q << 1 ratios [44], we used the IMRPhenomXPHM waveform approximant which includes the effects of higher modes in this work.

Gravitational Waves
We simulated a large population of more than 300 gravitational waves each at LIGO Hanford and LIGO Livingston to create a complementary set to the glitches.We chose widely varying properties to cover the range of GW parameters that may be detectable by LIGO-Virgo-KAGRA in future observing runs.We used a single interferometer while performing parameter estimation as glitches would mostly appear in only one of the detectors.Hence it would be useful to compare glitches with single detector GW detection's.We used O3 sensitivity curves to simulate the noise in the detectors.We chose primary and secondary masses from a uniformly spaced grid between 5 M to 125 M .For each such combination, we assigned spin magnitudes a 1 and a 2 uniformly between 0 to 1.The rest of the parameters are assigned by sampling from their appropriate standard prior distributions [42].In particular, we sampled the luminosity distance from the uniform in comoving volume prior.Using these, we can compute the SNR for each binary.In case the SNR does not cross our detectable threshold of 8, we repeatedly sample from the luminosity distance prior and take the first value that does.Hence, all our simulated events lie inside the current detectable threshold for the detectors.We require this since we wish to build a diverse sample set of simulated GWs to complement the glitch set.It should be noted that this injection distribution is not astrophysically motivated [50], but we follow this approach since our aim is to only create a classifier that can find statistical differences between glitch posteriors and those expected from gravitational waves.Once we have a full set of simulated GW event parameters, we generate their corresponding waveforms using the IMRPhenomXPHM family.We perform parameter estimation on these injections using a similar set up as the glitches, except that the latter was done using data from the detectors while here we use synthetic noise.In addition to using the same priors for analysis, we also used the same time duration of 4 seconds of data for the parameter estimation of both glitches and simulated events for a fair comparison.

Features
It is computationally expensive to build a classifier using all the multi-dimensional posterior samples from each glitch and simulated GW.To reduce the complexity of our problem, we extract certain important features from the posterior distributions for different parameters and build a classifier using them.To incorporate the most likely values of a posterior distribution, we use the mode and median of the posterior samples for each parameter.To quantify the width in a distribution, we include both the 1σ (68 percentile) & 2σ (95 percentile) intervals as features.We used the skew of the posterior distribution to quantify any asymmetries in them.Posterior distributions of different parameters may be correlated and these correlations may turn out to be different for glitches and gravitational waves.To include these effects, we also include pairwise linear correlation coefficients for each pair of parameters, which have a value of +1 for perfect correlation, −1 for anti-correlation, and 0 for no correlation.

Random forests
The random forest [38] is an ensemble classifier that combines the result of many independent decision trees.The final classification is based on a majority voting of the decisions of all the constituent trees.The random forest was introduced partly to solve the overfitting problem to which individual decision trees are prone [38].It has been extensively used for various classification problems in astronomy, including, but not limited to, variable star classification [51], photometric classification of supernovae [52] and quasars [53], and sunspot classification [54]; it has also been used to improve gravitational wave data analysis [55], to improve the LIGO detector duty cycle [56], and to distinguish between NS-BH signals and noise transients in single-detector data [57].We choose the random forest as our classifier due to its ease of implementation, in addition to its robustness to noise and ability to assess input feature importance [38].
The training samples for the random forest consists of the features extracted from the posterior distributions, as described in Section 3.For each glitch in a glitch class, or a simulated gravitational wave, we have an n-tuple of numbers corresponding to the values of the posterior features for all parameters.It should be noted that our training data sets have some class imbalance, as we have more total glitches than simulated gravitational waves and even a difference in the number of glitches in each glitch class.To account for this, we use SMOTE (Synthetic minority oversampling technique) [58] which balances the training data set by generating extra samples in a way similar to bootstrap without adding any new information or bias to the training set.
It should be noted that our sample data has around 200 features for each posterior distribution including mean, median, widths, skew for all parameters and their pairwise linear correlation coefficients.A benefit to using a random forest is that we do not need to perform feature selection, as this is done by the algorithm when a random subset of features is obtained to be used for splitting at each node of each tree in the ensemble; such "bagging" of the features in addition to the trees themselves helps prevent overfitting [38].
We set up the random forest based on a few commonly used hyper-parameters which are the number of trees, the criterion for creating a split, and the maximum depth.We use 100 trees and the gini entropy criterion to measure the quality of a split.The trees are extended till all the leaf nodes are pure, i.e they consist of samples of only a single class.We use a 70 − 30 train-test split on our sample data to create and test the classifier.

Results
We divide this section into 2 parts.The first one describes the trends seen in the posterior distributions for glitches of different classes, and how they might differ from what the posteriors of real signals would look like.In the second part, we show the results of our random forest classifier, trained on the features extracted from the posteriors of glitches and simulated signals.

Glitch vs signal posterior distributions
In Fig 3, we notice that for each glitch class, the posteriors for the individual glitches tend to cluster around a narrow range of values for a few parameters, which is consistent with the results in Ashton et.al.It is most commonly seen in their chirp mass, mass ratio, total mass, spin magnitude posteriors as well as a few other parameters that vary depending on the glitch class.We also notice that the widths of the posteriors tend to be narrower for glitches than for simulated gravitational waves.Another feature, that was also seen in Ashton et al. [44], is that glitches tend to prefer high spin posteriors and asymmetric mass ratios (q 1).This differs quite significantly from the posteriors of simulated gravitational waves, which often tend to be uninformative in spin magnitude.However, it should be noted that posterior widths can be correlated with how strong the signal is, hence very high SNR gravitational wave events would have narrow posteriors as well.We also see a similar behaviour in various other parameters like χ eff , χ p , tilt angles, where the posteriors of simulated events are uninformative (i.e they reproduce the prior), while glitches tend to prefer certain specific regions for those parameters.

Classification
We train a random forest classifier for the glitches and simulated signals using the Hanford and Livingston sets separately.Due to the inherent randomness involved in training and testing random forests, we repeat this process 100 times for each of the detectors and report the median values in the confusion matrix along with the corresponding standard deviations.This would give a more accurate picture of the performance of the classifier than a single iteration of it.Since there is a class imbalance between the classes, we used SMOTE [58] to balance them in the training set.The results look promising, as seen in Fig 4 .We obtain a high mean accuracy averaged over all classes of 0.93 for the Hanford classifier and 0.94 for the Livingston classifier with a standard deviation of < 0.01 across all the iterations.We also achieve a mean chirp recall of 0.90 for the Hanford classifier and 0.88 for the Livingston classifier with standard deviations of 0.03 correspondingly in the chirp recall values across the 100 iterations.This means that not only are most of the glitches classified correctly and not misidentified as a gravitational wave, but we also have very few false negatives in identifying gravitational waves, which is essential as one of our top priorities is not rejecting true GW signals.We also notice that the classifier has a hard time in distinguishing between fast scattering and scattered light glitches.This is expected as their posteriors show quite similar distributions, hence their extracted features would be similar.This is not a problem as our aim is to build a GW vs Glitch classifier and not classification within different glitch families.We also find out that the most essential features that were useful for the random forest to make classifications were the modes and medians of chirp mass and total mass, medians of mass ratio and 2σ posterior widths of total mass, the tilt 2 angle and M. In general, the credible intervals of the marginalized posteriors for simulated gravitational waves tend to be broader than those of glitches, as all the glitches in our set have a very high SNR.The regions where the classifier might fail, that is, predict a real gravitational wave as a type of glitch is when the signal would be such that its posteriors lie in the regions where the posteriors of a type of glitch class tend to cluster.However, since we not only take into account the medians of the posterior, merely lying in the same region may not be a problem as the random forest would also use other features such as posterior widths for classification, and those would most likely not be similar.We additionally train another random forest classifier but just using modes as the feature set this time around.The resulting median confusion matrix is shown in Fig 5 .We obtain quite similar results as when it was trained on all features, which shows that the modes are the most essential features to classify between different types of glitches and gravitational waves.This is a particularly useful result, as it hints towards a similar type of random forest classifier that can be created using the matched filtering estimates of the astrophysical parameters of a candidate signal, which can be computed much faster that the posterior samples in parameter If there indeed turns out to be clusterings in the matched filter estimates for different types of glitches, a similar type of random forest classifier can be created and used in the low-latency pipeline for quick classifications.However, only a detailed study can verify if this would be possible.

Conclusion
In this work, we created a simulated set of gravitational waves and a set of glitches belonging to 5 glitch classes that were obtained from Ashton et al [44].We used Bayesian inference to perform parameter estimation on them using a precessing waveform model, that also includes the effects of higher modes.We extracted certain broad features from each parameter estimation run to efficiently describe the population properties of the posteriors of glitches and gravitational waves.We trained a random forest using the extracted features, and found that it has a high test accuracy as well a low false negative rate for classifying GWs.This could be a useful technique to validate single detector candidate detection's [59,60,61], either due to the other detectors being offline, or having a low SNR due to their orientation or sensitivity.This method could be used to validate candidate signals offline to assess which glitch class or gravitational wave they might belong to.This method could also find an application in online matched filter searches, where a random forest classifier is trained on the matched filtering parameter estimates of candidate detection's.This could be used in addition to existing methods to validate signals, thereby possibly increasing the confidence of detected signals or improving the reach of the detectors.

Acknowledgments
This material is based upon work supported by NSF's LIGO Laboratory which is a major facility fully funded by the National Science Foundation.Virgo is funded by the French Centre National de Recherche Scientifique (CNRS), the Italian Istituto Nazionale della Fisica Nucleare (INFN) and the Dutch Nikhef, with contributions by Polish and Hungarian institutes.We are also grateful to computing resources provided by the LIGO Laboratory computing clusters at California Institute of Technology and LIGO Hanford Observatory supported by National Science Foundation Grants PHY-0757058 and PHY-0823459.The majority of analysis performed for this research was done using resources provided by the Open Science Grid (https://osg-htc.org/)[62,63], which is supported by the National Science Foundation award #2030508.This research was enabled in part by support provided by the BC DRI Group and the Digital Research Alliance of Canada (alliancecan.ca).This work makes use of the SciPy [64], NumPy [65], GWpy [66], and PyCBC [67]

Figure 1 :
Figure 1: Qscans of a few glitches commonly seen in the LVK data and a real gravitational wave signal GW190521

Figure 2 :
Figure 2: The distribution of parameters for the simulated GW signal population.
Distribution of posterior modes of glitches and simulated GWs at Hanford Distribution of posterior modes of glitches and simulated GWs at Livingston

Figure 3 :
Figure 3: The above figures show the histogram of the modes that are used as features for the random forest classifier for some of the parameters

TTTFigure 4 :
Figure 4: Results of the random forest classifier trained using all features

TTTFigure 5 :
Figure 5: Results of the random forest classifier trained using only modes

Table 1 :
The number of glitches in each glitch class from each LIGO detector that are used in this work software for data analysis and visualisation.NS is supported by the Mitacs Globalink Research Internship, JM by the Canada Research Chairs program, and AMK by a Killam Doctoral Scholarship.DCS. is supported by National Science and Engineering Research Council of Canada grant number RGPIN/03985-2021.