Scattered light monitoring system at the Virgo interferometer: performance improvement and automation based on O3 data

Scattered light, also referred to as scattering, is a nonlinear non stationary noise that can affect data acquired by ground-based laser interferometers for the detection of gravitational waves. A methodology for the identification and daily monitoring of scattering sources, based on the tvf-EMD algorithm, was applied to a large dataset of 132 days of data. Time series of the differential arm motion (DARM) degree of freedom acquired by the Virgo detector during the third LIGO-Virgo scientific run, so called O3, which lasted from 1 April 2019 to 27 March 2020, were considered. The analysis focused on correlation with suspended West end optical bench (SWEB) position data, as SWEB was a known culprit of scattering witnessed in DARM during O3. Different configurations were tested, improving performances with respect to previously obtained results and at the same time making the methodology fully automated. This allows to employ it as a monitoring system both during the phases of detector’s upgrade and in scientific runs such as the fourth scientific run, O4, currently scheduled to start on 2023. The higher values of correlation obtained suggest that tvf-EMD could improve the performance of scattered light noise subtraction from DARM.


Introduction
Virgo, LIGO, and KAGRA are laser interferometers designed, built, and operated for the detection of gravitational waves [1][2][3][4][5].Such detectors are highly complex instruments, having orthogonal arms extending for 3-4 km in length, and are made of many optical components which must be operated jointly to reach the optimal working point of each instrument.The sensitivity of these ground based detectors can be affected by the surrounding environment through noise coupling, relevant examples being seismic, magnetic and acoustic noises [6][7][8][9].One type of noise consistently present in the detector's output, i.e. into the differential arm motion degree of freedom (DARM), is scattered light noise [10,11], also known as scattering.Scattering occurs when a fraction of the laser light is bounced-off reflective surfaces having a relative motion with respect to the optics of the detector.These fractions of the laser beam, possibly after several round trips, may recombine with the main laser beam.If the speed of the moving reflective surface generating the noise is high enough, or if several round trips occur before recombination, scattering noise can enter the detection band.This noise is often related to excess motion in the microseismic band (0.1 Hz-0.3 Hz) in the environment surrounding the detector [12][13][14][15][16]. Scattering is present in the spectrogram of the DARM data as arches having time of occurrence t and frequency f (t) which is related to the reflective surface's motion x sc (t) and its velocity v sc (t).Such surface is referred to as the culprit of scattering.This physical relationship is quantified by the so-called predictor equation, measured in Hz (see [17], equation (3)).
In equation (1), N is the number of round trips between the optical component and the reflective surface before recombination with the main beam, while λ is the laser wavelength.Identifying scattering noise sources by experimental means can be time-consuming in km-scale laser interferometers.Consequently, methods based on data analysis techniques for identifying and monitoring scattering culprits were developed and applied to Virgo and LIGO data [8,[18][19][20][21].These techniques use adaptive algorithms, such as empirical mode decomposition (EMD) and its recently developed time-varying filter version (tvf-EMD) [22,23], to extract the portion of the data affected by scattering.In particular, in [20], it is shown that it is possible to monitor the onset and time evolution of scattering in connection to microseismic activity variability using adaptive algorithms.In this paper, we considered 132 days of data of the Virgo gravitationalwave detector recorded during the third observation run, known as O3.The analysis has been carried out on a large amount of data, and aims to test different configurations of the scattering monitoring system described in [20] to improve its performance and automation.These improvements make the monitoring system a better asset for the noise mitigation tasks during the observing and commissioning periods of the detector.We describe the adopted methodology in detail in section 2 and the obtained results in section 3.

Methodology
A total of 132 d of DARM data acquired by the Virgo interferometer during O3 were selected looking at so called daily Omicron plots, obtained with the Omicron algorithm [24], where transients of excess power i.e. glitches, at low frequency and of unknown origin are visible.An example is shown in figure 1 for the 18 January 2020.The methodology described in [20] was modified applying to DARM a lowpass filter with a frequency cutoff derived from the predictor data of equation ( 1), referred to as f low = L.In [20], f low was instead chosen by visual inspection of the daily Omicron plots.This is needed in order for the monitoring system to provide results in an automated way.Furthermore, different configurations of the monitoring system were tested.This allowed to identify which configuration gives the highest values of correlation, improving previously obtained results and hence the monitoring system overall performances.For clarity, the adopted methodology is hereafter summarized • A total of 1440 intervals of 60 s are considered for each analyzed day.
• DARM data are downsampled to 100 Hz and then low-pass filtered using f low = L as cutoff, where L is the maximum value of the predictor (Hz) calculated using equation ( 1) for each 60 s long interval (the values of L used are around 20 Hz).• Low-pass filtered data are decomposed with tvf-EMD, obtaining a set of oscillatory modes called Intrinsic Mode Functions (IMFs).The pytvfemd package [25] is used to this end.The envelope of the IMFs is their instantaneous amplitude IA, and is a function of time.• For each of the time intervals, the IA is correlated with equation (1) computed for position data of the suspended West End Bench (SWEB).The Pearson correlation coefficient ρ is used to quantify correlation.
A total of 1440 values of correlation ρ are obtained each day depending on the locking condition of the detector i.e. availability of DARM data.The locked/unlocked status is indicated in the Omicron plot of figure 1 in the form of a green/red bar, respectively.The relative velocity between the West end test mass and the West end optical bench was used as scattering predictor.SWEB was considered as, due to a malfunctioning in the mechanical setting of the suspension, it was the most frequent culprit of scattering in Virgo during O3 [9].The malfunctioning has been identified and resolved after O3.The monitoring system automation was improved using f low = L. Furthermore, to evaluate its performance under different configurations, the first four IMFs obtained decomposing DARM with tvf-EMD are considered.The SWEB predictor from equation (1) was correlated with the sums of the obtained IA, which is referred to as ρ IA {1} , ρ IA {1,2} , ρ IA {1,2,3} and ρ IA {1,2,3,4} , respectively.A configuration, referred to as ρ DARM in which the DARM time series, low-passed as described above, is correlated with equation (1) without obtaining the IA's first, was also tested.This configuration performs very poorly, highlighting that adaptive analysis is needed to obtain a high correlation with equation (1).Results from [20] are referred to as ρIA {1} and are obtained using a manually selected f low to lowpass DARM.Top: Daily Omicron plot of DARM for the 18 January 2020, a day affected by high microseismic noise at the detector's location.Glitches of high signal to noise ratio are visible as red dots (Reproduced from [20].© IOP Publishing Ltd.All rights reserved.).Bottom: Moving average of correlation ρ, computed between the IA of DARM and the SWEB predictor, as obtained from the different configurations tested and described in section 2. The ρ time series is consistent with the glitches shown in the upper panel, indicating that they are due to scattered light originating from SWEB.

Comparison with previously obtained results
To evaluate performances of the algorithm when using f low = L and multiple IMF's to compute correlation, the methodology described in section 2 was first tested on eight days of Virgo data from the final months (winter time) of O3, considering the days already analyzed in [20]:  correlating the IA's with the SWEB predictor, under the different configurations described in section 2. It can be seen that the ρ values obtained are consistent with the time evolution of the glitches occurrences during the day.Results for the other seven days considered are shown in figure 2, while the Omicron plots of such days are in [20].Figures 1 and 2 allow a direct comparison with results from [20], in which the lowpass frequency was chosen by hand and furthermore only IA {1} , the envelope of the first IMF, was considered.Data from [20] are referred to as ρIA {1} and are in black in figures 1 and 2. Choosing f low = L and using combinations of IA greatly improves the algorithm's performances i.e. correlation values ρ are consistently higher than the ones from [20].In figure 1 it can be seen that using the first two modes gives higher correlation in the first part of the day, where the scattering frequency is higher, compared to the rest of the day where instead using more modes gives better results.This is likely due to the fact that expansions with adaptive algorithms provide modes of decreasing frequency, the last term of the expansion simply being the trend or baseline wandering of the data [22].From figures 1 and 2 it can also be seen that simply correlating the low passed DARM time series with the SWEB predictor yields almost no correlation, as ρ DARM (blue) is small.

Comparison using extended O3 dataset
The analysis was then applied to 124 more days, selected based on the presence of low frequency glitches in the daily Omicron plots, similar to figure 1.Hence, a total of 132 d spread over the whole O3 run were considered.The complete list of analyzed days, among with their starting GPS (UTC), is reported in table 1. Figures 3 and 4 show the obtained results in the form of histograms and probability density functions (PDF).

Performances using f low =
L and more than one IMF.The left panel of figure 3 includes only the cases where using f low = L yielded values of ρ IA {1} > 0.5.The difference ∆ρ was then carried out between those cases and ρ IA {1,2} (green), ρ IA {1,2,3} (red), ρ IA {1,2,3,4} (purple), computed at the same GPS time of ρ IA {1} .It can be seen that, for the subset of cases considered, choosing the first two IMFs gives the best results, since ρ IA {1,2} − ρ IA {1} > 0 in 74.4% of the cases.When three and four IMFs are considered, the PDF is instead greater than zero in 59.4% and 49.7% of the cases, respectively.The case in which DARM is simply low passed without decomposing it with tvf-EMD (ρ DARM ) is also shown (blue).It can be seen that simply low passing DARM, without using adaptive analysis, gives very low correlation with the predictor of equation (1).Finally, the left panel of figure 3 shows in black the correlation difference ∆ρ = ρIA {1} − ρ IA {1} between results from [20] and ρ IA {1} .Since in this case ∆ρ > 0 only for 22.5% of the cases, choosing f low = L results are improved.The right panel of figure 3 shows histograms of ρ values as obtained for the whole analyzed dataset, i.e. 1440 values of ρ for each of the 132 d.In the inset plot, only values of ρ > 0.5 are shown.The IA {1,2,3} configuration gives the best results as 13.3% of the cases have ρ > 0.5.This is true in 12% of the cases for the IA {1,2} configuration, while in 10.9% and in 4.3% of the cases for the IA {1,2,3,4} and IA {1} configurations, respectively.

Comparison of configurations performances using more than one IMF.
Having established that considering more IMFs yields better results, a more detailed analysis was performed with the goal of identifying the best configuration.In the left panel of figure 4 only cases in which the IA {1,2} , IA {1,2,3} and IA {1,2,3,4} configurations gave values of correlation As explained in [20], its comparison with Omicron daily plots and with microseismic noise data provides a scattering daily monitoring system.Higher values of ρ compared to results from [20], which are in black, mean a better performance of the monitoring system was obtained.Only cases having ρ IA {1} > 0.5 are considered.It is found that ∆ρ > 0 in 74.4%, 59.4% and 49.7% of the cases for the green, red and purple case, respectively.Hence, it can be seen that using more than one oscillatory mode improves results.Not using adaptive algorithms gives lower correlation instead (blue).The black plot is the correlation difference ∆ρ = ρIA {1} − ρ IA {1} , using the 8 d of data from [20] in which the lowpass frequency was manually chosen, referred to as ρIA {1} .In this scenario, ∆ρ > 0 in only 22.5% of the cases.Hence, choosing a lowpass frequency based on the data improved results.Right: Histograms of ρ values for all the 132 analyzed days, using f low = L and combinations of IMFs.The inset shows the same plot only for ρ > 0.5, which is obtained in 13.3%, 12%, 10.9% and 4.3% of the cases for the red, green, purple and orange plot, respectively.ρ > 0.5 were considered.For this cases only, results are compared with what was obtained in the IA {1} configuration.It is obtained that ∆ρ > 0 in 93.3% of the cases when using IA {1,2} , 92.2% of the cases when using IA {1,2,3} , and 88.8% of the cases when using IA {1,2,3,4} .While the percentage of these cases is similar, using IA {1,2,3} gives higher values of correlation with respect to IA {1,2} , as it can be seen comparing the average values of ∆ρ: 0.23 for IA {1,2} (green plot), 0.29 for IA {1,2,3} (red plot), and 0.27 for IA {1,2,3,4} (purple plot).Due to the previous results, a direct comparison between the best performing configurations, i.e.IA {1,2} and IA {1,2,3} , was also carried out.The right panel of figure 4 shows the correlation differences ∆ρ = ρ IA {1,2,3} − ρ IA {1,2} , considering a total of 200 83 common times for which the correlation is greater than 0.5 in both cases.It can be seen that using IA {1,2,3} gives better results with respect to IA {1,2} , as in the right panel of figure 4 ∆ρ > 0 in 52% of the cases.In summary, obtained results show that the best performing configurations are ρ IA {1,2,3} and ρ IA {1,2} , with the first one performing slightly better than the second one for the analyzed dataset.Given that the methodology is empirical, it could be useful to report results from both configurations in the framework of a monitoring system for detector monitoring and characterization, with no significant increase in computational cost.Alternatively, the best performing configuration among the ones considered could be used in each 60 s interval., ρ IA {1,2,3} and ρ IA {1,2,3,4} > 0.5.The percentage of cases in which ∆ρ > 0 is 93.3%, 92.2% and 88.8% for the green red and purple case, respectively.The average value of the distributions is the highest for the ρ IA {1,2,3} configuration.Right: Histogram of correlation difference ∆ρ = ρ IA {1,2,3} − ρ IA {1,2} .A total of 20 083 samples having the same GPS and for which both ρ IA {1,2,3} and ρ IA {1,2,3} are greater than 0.5 are used to produce the histogram.As in this case it is found that ∆ρ > 0 in 52% of the cases, this is indicative of the fact that the ρ IA {1,2,3} configuration gives better results with respect to the ρ IA {1,2} configuration.Furthermore the overall number of cases giving correlation higher than 0.5 is higher for the ρ IA {1,2,3} case, as also shown in the inset of figure 3.

Conclusions
As described in [18,19] adaptive algorithms can be used to identify sources of scattering using the predictor equation ( 1) computed for several different objects (a list of objects, possible culprits of scattering, are reported in appendix A of [18]).The methodology from [20] for the daily monitoring of scattering sources based on the tvf-EMD adaptive algorithm was improved and applied to 132 d of data acquired by the Virgo interferometer during O3.Such a daily monitoring system can track the onset and time evolution of scattered light noise, due to a given source, making use of the correlation time series shown in figures 1 and 2. Different configurations were tested with the aim of improving its overall automation and at the same time improving performances with respect to previously obtained results.In order to do so, a lowpass frequency f low inferred from the data using the maximum value L of equation (1) each considered interval was used to lowpass the DARM time series, before decomposing it with tvf-EMD.Then, several different configurations were considered.The first four IMF's extracted by the tvf-EMD algorithm were used to obtain the instantaneous amplitudes IA to be correlated with the predictor equation (1).These configurations are referred to as IA {1} , IA {1,2} , IA {1,2,3} and IA {1,2,3,4} , respectively.Obtained results suggest that for the analyzed dataset, the configuration employing IA {1,2,3} slightly outperforms the one employing IA {1,2} .
As the adopted methodology is empirical, this could not always be the case and all the different configurations could be reported in the summary pages of the monitoring system.Overall, obtained results are an improvement compared to what was done in [20].It was shown that is possible to employ the described daily monitoring system in a fully automated way using f low = L and at the same time performances were enhanced by considering more IMFs to compute correlation.In this regard, the high values obtained suggest that the tvf-EMD algorithm could improve the performance of scattering subtraction from the data.The predictor computed from the relevant auxiliary channel could be used as a reference to evaluate noise subtraction performances.This with the aim of improving parameter estimation [26][27][28].The application of tvf-EMD for scattering glitch subtraction will be the subject of future research, also making use of a standard glitch dataset to evaluate denoising metrics.Additionally, obtaining higher correlation values will help to better discriminate (visually) periods of scattered light in the data, facilitating the visual comparison with the daily Omicron plots and hence aiding detector characterization and commissioning efforts.Notably, a much larger dataset was analyzed compared to previously obtained results.As each analysed day is divided into 60 s intervals, giving 1440 values of correlation ρ each day, and given that 132 d are considered and four different configurations tested (IA {1} and IA {1,2} , IA {1,2,3} , IA {1,2,3,4} and low passed DARM), the analysis consisted of a total of 760 320 separate jobs.This large scale analysis was carried out at the Institut national de physique nuclaire et de physique des particules (in2p3, http://cc.in2p3.fr), using the Slurm framework [29].This allowed to run up to 3000 jobs in parallel, greatly reducing computational time.As the source of scattering witnessed in DARM was known to be SWEB, it was possible to mitigate it during the so called online noise subtraction, using the relevant photodiodes as witness channel.This way, scattered light noise had a reduced impact on the gravitational wave strain data, which are obtained from the DARM time series after calibration and noise subtraction.More details are in [3], describing the noise subtraction procedure.Furthermore, the SWEB control was improved after O3 and residual motion is now expected to be similar to the other terminal bench of the detector.The daily monitoring system of scattering sources described here will be used during the preparatory phase, referred to as commissioning, leading toward the fourth observing run O4 and then during the run itself.This will prove useful in situations when the origin of scattered light noise is not known and a quick identification of the source is needed.Using the position of multiple potential scattering surfaces to compute equation (1) will allow to simultaneously monitor a higher number of optical components.

Appendix B. Correlation results of different configurations of the monitoring system
Different configurations of the monitoring system were tested correlating IA obtained using combinations of the first four IMF's of DARM extracted by tvf-EMD.Figure B1 shows results obtained for Virgo data on the 2020/02/02 -02:20:00+60 s (UTC).In red is the SWEB predictor computed from equation (1) while in blue is the IA of DARM in the various configuration.Using the first two or three IMF's gives higher ρ compared to the other cases i.e. the noise is extracted more accurately from the DARM time series.It should be noted that simply correlating the predictor with the low passed DARM time series gave on average significantly lower values of correlation, as shown in figures 1-3.This shows that adaptive algorithms are a valuable tool to extract the amplitude modulation present in the DARM data due to the movement of the culprit.

Figure 1 .
Figure1.Top: Daily Omicron plot of DARM for the 18 January 2020, a day affected by high microseismic noise at the detector's location.Glitches of high signal to noise ratio are visible as red dots (Reproduced from[20].© IOP Publishing Ltd.All rights reserved.).Bottom: Moving average of correlation ρ, computed between the IA of DARM and the SWEB predictor, as obtained from the different configurations tested and described in section 2. The ρ time series is consistent with the glitches shown in the upper panel, indicating that they are due to scattered light originating from SWEB.Results from[20], referred to as ρIA {1} , are in black.

Figure 1 ,
Figure 1, top panel, shows the Omicron low frequency glitches present in DARM during the 18 January 2020.Bottom panel shows instead the moving average of the ρ time series obtained

Figure 2 .
Figure 2. Moving average of ρ values obtained under the different configurations tested.As explained in[20], its comparison with Omicron daily plots and with microseismic noise data provides a scattering daily monitoring system.Higher values of ρ compared to results from[20], which are in black, mean a better performance of the monitoring system was obtained.

Figure 3 .
Figure 3. Left: Probability density functions of ∆ρ = ρ − ρ IA {1} , i.e. differences in correlation obtained for the 132 d of O3 considered, for the various configuration tested.Only cases having ρ IA {1} > 0.5 are considered.It is found that ∆ρ > 0 in 74.4%, 59.4% and 49.7% of the cases for the green, red and purple case, respectively.Hence, it can be seen that using more than one oscillatory mode improves results.Not using adaptive algorithms gives lower correlation instead (blue).The black plot is the correlation difference ∆ρ = ρIA {1} − ρ IA {1} , using the 8 d of data from[20] in which the lowpass frequency was manually chosen, referred to as ρIA {1} .In this scenario, ∆ρ > 0 in only 22.5% of the cases.Hence, choosing a lowpass frequency based on the data improved results.Right: Histograms of ρ values for all the 132 analyzed days, using f low = L and combinations of IMFs.The inset shows the same plot only for ρ > 0.5, which is obtained in 13.3%, 12%, 10.9% and 4.3% of the cases for the red, green, purple and orange plot, respectively.