A study of deep neural networks for Newtonian noise subtraction at Terziet in Limburg—the Euregio Meuse-Rhine candidate site for Einstein Telescope

The Euregio Meuse-Rhine border region of Belgium, Germany and the Netherlands has been identified as a candidate site for hosting Einstein Telescope. Newtonian coupling of ground vibrations to the core optics of the detectors may limit the sensitivity of Einstein Telescope at frequencies below about 10 Hz. The contribution of Newtonian noise is site specific and depends on the ambient seismic field, which in turn depends on the site's geology and the distribution of surface and underground seismic-noise sources. We have investigated the application of machine learning in combination with the deployment of seismic sensor networks to predict seismic displacement noise at specific locations on the surface and underground. Moreover, we have modeled a deep neural network (DNN) that allows subtraction of Newtonian noise from the strain data measured by Einstein Telescope. The seismic-field model is based on a complete solution of the elastodynamic wave equations for a horizontally-layered soil structure. The geology features soft-soil layers on hard rock and was shown to be effective in attenuating Newtonian noise from surface waves below the required sensitivity. In addition, our model includes a random background of body waves with all possible angles of incidence. We show that a DNN is effective in predicting Newtonian noise. Application of our DNN allows Newtonian noise subtraction by a factor of up to 4.7 at 1 Hz and 2.5 at 5 Hz.


Introduction
Einstein Telescope is a planned third-generation subterranean gravitational-wave detector, to be based in Europe [1]. Einstein Telescope will improve upon the sensitivity of current detectors through novel sensor technology, greater seismic isolation and an evolved observatory design. Seismic perturbations in the subsurface around the gravitational-wave detector may limit the sensitivity of the observatory at low frequency. This is due to mass density fluctuations that induce acceleration noise in the optical components of the detector. This effect, called Newtonian noise or gravity-gradient noise, is expected to dominate [2] the measured detector strain below frequencies of about 10 Hz.
At present there are two candidate sites for hosting Einstein Telescope: the Euregio Meuse-Rhine (EMR) in the border region of the Netherlands, Belgium and Germany, and Sardinia in Italy. An initial study has been carried out to quantify Newtonian noise for the EMR candidate site [3]; the results show that for frequencies below 8 Hz Newtonian noise may be about a factor of two higher than the design sensitivity of Einstein Telescope (the so-called ET-D).
Here we discuss the use of a seismic sensor network to model and subtract Newtonian noise from the detected signal. Noise subtraction can be accomplished when the current state of the seismic field can be inferred from the data of the seismic sensor network, by numerically integrating the change in gravitational attraction on the optical components induced by this state.
A pragmatic approach to this problem is to apply Wiener filtering or machine learning to find the relation between the seismometer input and detector strain output. Note that this approach eliminates the necessity to calculate the seismic state of the field, which is generally a complex superposition of waves and transients in a non-homogeneous medium. We demonstrate a new approach that uses DNNs to predict seismic and Newtonian noise activity. Machine learning for estimating the Newtonian noise contribution to a gravitational-wave detector signal is a synthesis problem, in which the machine-learning algorithm attempts to reproduce a Newtonian-noise-induced signal given a number of seismic input sensors. Once the noise is estimated it can be subtracted from the observed signal, in effect reducing the noise and thus improving the quality of the detector output signal. In contrast to Wiener filters, a neural network can give non-linear solutions based on the information of several sensors. This is a crucial advantage, since the density fluctuations in the soil surrounding a test mass depend on the gradients of the displacement field, which change sign when a wave travels in the opposite direction. We will discuss Wiener filtering and compare our neural network results with a Wiener filter in section 7.
Einstein Telescope will be sensitive to frequencies from about 3 Hz to 10 kHz, a range that largely coincides with the audio band. In this band machine learning has been applied to increase signal to noise in active noise-canceling technology, which uses one or more external microphones to capture ambient noise and applies an algorithm to cancel this signal by mixing an estimated anti-signal into the audio stream [4]. In gravitational-wave research, studies have been carried out to reduce the technical noise of a detector strain signal by applying machine learning. Vajente et al [5] apply an algorithm based on Volterra series to estimate stationary and nonstationary noise sources. The machine-learning part relates mostly to using a gradient descent algorithm commonly used in machine learning to optimize the Volterra kernel. Ormiston et al [6] explored the application of a DNN called 'DeepClean' to improve upon the Wiener filter based noise reduction schemes in current detectors. However, no attempt was made to apply these techniques to Newtonian noise. Badaracco et al [7] find the optimal input seismometer placement by using a particle swarm optimizer, facilitating a Wiener filter based Newtonian noise cancellation pipeline.
Newtonian noise background subtraction features the following relevant characteristics: firstly, we can assume a relatively invariant ambient environment and a flexible selection of sensor channels and locations. Secondly, we can presume that there are no real-time constraints, and that ample processing resources are available. Finally, a Newtonian noise subtraction solution can perform in situ machine-learning model (re-)training, allowing flexibility and an adaptive approach should the ambient seismic noise environment change over time.
In the following we discuss how Wiener filters and machine learning can be used to predict seismic displacement noise and Newtonian noise. We first demonstrate the prediction of seismic noise for a surface sensor array for which we have an extensive seismic velocity data set, and where we expect both approaches to provide solid predictions. Next we show that such a surface array is also capable of predicting underground seismic noise at a depth of 250 m. Finally, we evaluate the capability of both a Wiener filter and a DNN to predict Newtonian noise for the geology at Terziet in Limburg, the Netherlands. This is a significantly more challenging case, as Newtonian noise arises from an integral over all relevant mass-element displacements in a complicated geology extending over several kilometers. As real data from a distributed seismic sensor network are not available, this study is based on synthetic data.

Predicting seismic-sensor signals with DNNs
To estimate Newtonian noise, a neural network needs to resolve how seismic waves in the Earth's subsurface relate to the strain signal measured by a gravitational-wave detector. Before embarking on this challenging task, we first carry out a feasibility study to see if it is possible to create a machine-learning model that can predict the signal of a single seismic sensor, the so-called witness sensor, based on the response of all other seismic sensors that are located in the surrounding area. This question was addressed by Liu et al [8], who used convolutional neural networks in the context of geophysics analysis for deep-seismic-prior-based reconstruction of seismic data. They analyzed seismic traces from sensors evenly spaced along a straight line (leading to a sub-surface mapping in 2D geometry). In our analysis, the sensors are placed randomly, at irregular distances in both the north-south and east-west direction, and the phase relations between channels are derived by the neural network itself during training.
We employ a parameterizable DNN and the implemented setup is depicted in figure 1. Once the network has been trained on an adequate amount of training data, the network is used to predict the signal of the witness sensor when using yet unseen input data. The goal is to demonstrate that the seismic activity can be retrieved (also in the time domain) at locations that are not directly monitored by a sensor. It is not assumed that the network can predict seismic activity at any arbitrary location without training; note that this is not a necessary requirement to subtract Newtonian noise, as we will demonstrate in section 5.

Preprocessing
A parameterizable DNN termed 'Seismonet' was developed using Keras [9] and Google TensorFlow [10] to solve the problem of predicting seismic activity for a witness sensor, based on data from other sensors in a seismic network.
While seismic sensors, such as geophones, provide one or more channels of displacement data in the time domain, Seismonet operates in the frequency domain, and signal transformation and signal conditioning are required. The seismic sensor data of all channels are digitized at a sample rate specific to the sensor. In the first preprocessing step, all sample data are decimated to a single sample rate, leading to a fixed Nyquist frequency, whose value depends on the application. Furthermore, the instrument response is removed, providing N time-series velocity signals (m s−1) z_n(t), where n represents an index for a specific channel, t the time index, and N the total number of channels. The witness (or target) signal z_0 is the signal we wish to predict based on the data z_{1...(N−1)} of the N − 1 so-called reference channels.
Hereafter each time-series is scaled with a channel-specific normalization factor, determined with a median operation (Med) over the channel data. Once chosen, the normalization factor should remain constant, but it can be determined from a representative subset of all channel data. Normalization such that most values are within the range of −1 . . . 1 is beneficial to DNN performance and common in machine-learning preprocessing pipelines.
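The exact definition of the normalization constant is not reproduced here; as an illustration, the following sketch derives a channel constant from the median of per-record peak amplitudes. The median-of-peaks choice and the 512-sample record length are assumptions for this sketch, not the paper's definition:

```python
import numpy as np

def normalize_channel(z, record_len=512):
    """Scale a channel so that most samples fall within [-1, 1].

    The channel constant is derived with a median operation, here
    (as an assumption) the median of per-record peak amplitudes.
    Once determined, the constant stays fixed for that channel.
    """
    records = z[: len(z) // record_len * record_len].reshape(-1, record_len)
    c = np.median(np.max(np.abs(records), axis=1))  # channel-specific constant
    return z / c, c
```

Because the constant is a median of peaks rather than a global maximum, occasional transients do not shrink the bulk of the data toward zero.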
The total dataset has to be segmented in records of fixed duration. Each record contains s samples of data for all channels z′_n. To arrive at an optimal value for s, we consider the maximal lag between the witness sensor and any reference sensor in the field for which one can expect sizable coherent contributions. This maximal lag depends on the spatial separation between the sensors and the velocities v with which waves travel through the field. Generally, for seismic waves at the surface, the group velocity is lower than the phase velocities in the wave packet, and all velocities decrease with increasing frequency. The lowest relevant velocity v_min is then mainly determined by the Rayleigh waves at the Nyquist frequency. Longer travel times between sensors can be neglected. The maximal lag τ_max in the correlation that needs to be considered can therefore be estimated as

τ_max = d_max / v_min,     (2)

where d_max is the maximal distance between the witness sensor and the furthest edge of the sensor network. Correlations need only be considered for time lags smaller than or equal to τ_max. We choose the record length s to equal

s = ⌈4 τ_max r⌉_2,     (3)

where r is the sample rate and ⌈. . .⌉_2 denotes rounding up to the nearest power of two (which is numerically convenient when using fast Fourier-transform calculations). An example of a (time-domain) record S is shown in figure 2. The input of the neural network is focused on the central s/2 samples. As signals in the shaded regions of length s/4 may receive contributions from previous or next records, all signals are windowed. Partly overlapping records are obtained by using a stride of s/2 samples. For our application, the chosen value of s is optimal, since longer records and higher values of s would increase the parameter size of the neural network and degrade the numerical performance.
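The record-length rule can be checked against the two worked cases quoted later in the text (d_max = 0.5 km, v_min = 100 m s−1, r = 25 S/s giving s = 512, and the underground case giving s = 128). Note that the factor-of-four relation between lag, rate and record length used below is inferred from those examples, not quoted from the paper:

```python
import math

def tau_max(d_max, v_min):
    """Maximal correlation lag in seconds: tau_max = d_max / v_min."""
    return d_max / v_min

def record_length(d_max, v_min, rate):
    """Record length s in samples: the central s/2 samples span twice
    the maximal lag, rounded up to the nearest power of two."""
    return 2 ** math.ceil(math.log2(4 * tau_max(d_max, v_min) * rate))
```

For the surface array, `record_length(500, 100, 25)` reproduces the quoted s = 512, and `record_length(3500, 200, 1.25)` reproduces s = 128 for the sparse underground case.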
Seismonet operates in the frequency domain and thus the time-domain record S needs to be transformed to its spectral representation R by using a discrete Fourier transform (F) as

R_n(i) = F(w_n · S_n(i)),     (4)

where i and n represent the record and channel index, respectively, and w_n represents a window function. The window function depends on whether it is the witness channel (n = 0), for which a padded Tukey window is used, or a reference channel (n ∈ {1 . . . N − 1}), for which a Hann window is used. For reference channels we mitigate spectral leakage and improve the spectral resolution by application of a Hann window. For the witness channel we concentrate the predictive capabilities of the DNN on the coherent time span. To achieve this, we utilize a padded Tukey window function applied to the witness channel data. The padded Tukey window has a value of 1 for the central s/2 portion of the window and smoothly tapers to 0 outside of this region. This approach not only reduces spectral leakage and focuses the DNN's predictions on the coherent time span, but also avoids degrading the prediction when the reference channels do not have enough history and/or future samples to make accurate predictions for the witness channel.
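A sketch of the two window functions; a cosine taper over the full s/4 shaded regions is assumed for the padded Tukey window, since the exact taper length is not specified:

```python
import numpy as np
from scipy.signal.windows import hann

def padded_tukey(s):
    """Witness-channel window: 1 over the central s/2 samples, tapering
    smoothly to 0 over the s/4 regions on either side (the taper
    length is an assumption)."""
    w = np.zeros(s)
    q = s // 4
    w[q:3 * q] = 1.0
    taper = 0.5 * (1.0 - np.cos(np.pi * np.arange(q) / q))  # cosine ramp
    w[:q] = taper
    w[3 * q:] = taper[::-1]
    return w

s = 512
w_witness = padded_tukey(s)   # witness channel (n = 0)
w_reference = hann(s)         # reference channels (n >= 1)

# spectral representation of one windowed record, as in equation (4)
record = np.random.default_rng(0).normal(size=s)
R = np.fft.rfft(w_witness * record)
```

The one-sided spectrum `R` has s/2 + 1 complex bins; the same transform is applied per channel and per record.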
In the following we denote the witness channel with y, such that S_0 = S_y and R_0 = R_y, and the reference channels with x, such that S_{1...N−1} = S_x and R_{1...N−1} = R_x.

DNN for predicting seismic displacement noise
Each record R(i) contains all needed context for seismic activity prediction in the coherent region of the witness sensor. The records are fed to Seismonet, and figure 3 shows that Seismonet is composed of three segments. The first 'band correlation' segment consists of the input layer and a number of sequential layers that process frequency-domain data in a number of overlapping bands. Each consecutive layer reduces the number of parameters (i.e. the number of neurons and their respective weights) for that specific band, guiding the optimizer to abstract the in-band correlations. The second 'full correlation' segment consists of a number of fully connected layers, allowing global abstractions. For this segment the number of neurons in the first layer and the subsequent neuron reduction per layer can be configured, though the last layer always contains the same number of neurons as the number of output values. Both 'band correlation' and 'full correlation' segments contain dropout layers [11] in between their layers and both use the exponential linear unit (ELU) [12] as activation function. The final segment contains a single layer whose only purpose is to translate and scale the output of the 'full correlation' segment to the output range. This layer has no activation function.
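The 'band correlation' input organization can be illustrated by slicing the one-sided spectrum into overlapping bands before feeding band-local layers. The band width and overlap below are given in frequency bins and are hypothetical values; the paper specifies them per trained model in physical units:

```python
import numpy as np

def overlapping_bands(spectrum, band_width, overlap):
    """Slice a one-sided spectrum into overlapping frequency bands,
    mimicking the input of the 'band correlation' segment.

    band_width: band size in frequency bins
    overlap:    fraction of band_width shared with the next band
    """
    stride = max(1, int(round(band_width * (1.0 - overlap))))
    starts = range(0, len(spectrum) - band_width + 1, stride)
    return np.stack([spectrum[i:i + band_width] for i in starts])
```

Each row of the result would be processed by its own stack of shrinking dense layers before the 'full correlation' segment combines all bands.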
While the global composition of Seismonet is fixed, it can be customized by hyperparameters, making it possible to adapt Seismonet to a certain bandwidth and to a specific number of channels. The following hyperparameters are available for customization: the width of each overlapping band, the overlap ratio between each band, the number of neurons in the first layer, the number of band correlation layers, the dropout factor used for band correlation layers, the ratio of reduction of each subsequent band correlation layer, the number of full correlation layers, the width and shape of the full correlation layers, and the dropout factor between each full correlation layer.
Finding optimal values for such a large parameter space cannot easily be done manually; for this reason, hyperparameter optimization (HPO) [13] is used to find a near-optimal set of values for a given training and validation data set.
The predicted output of Seismonet for record i is denoted R_ŷ(i), and represents Seismonet's inference for a specific training record. Seismonet uses the mean absolute error as loss function. A more comprehensive description of Seismonet is provided in the appendix.

Post-processing
Prediction R_ŷ(i) of Seismonet is a complex frequency spectrum which can be converted to the time domain by using the inverse Fourier transform S_ŷ(i) = F^−1(R_ŷ(i)). It is possible to reconstruct the full prediction in the time domain from S_ŷ(i) by merging multiple consecutive predictions. Merging requires overlap between records, where each overlapping record then needs to be merged with its neighbor using a power-preserving cross-fade. For this paper we have not reconstructed a continuous time-trace, and the results only show single-record predictions.
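Such a merge could be sketched as follows. The sin/cos (equal-power) fade shape is an assumption, chosen so that the squared fade weights sum to one across the overlap:

```python
import numpy as np

def merge_records(records):
    """Merge consecutive s-sample predictions, taken with a stride of
    s/2, into one trace using an equal-power sin/cos cross-fade."""
    n, s = records.shape
    h = s // 2
    theta = 0.5 * np.pi * (np.arange(h) + 0.5) / h
    fade_in, fade_out = np.sin(theta), np.cos(theta)
    # power preservation: fade_in**2 + fade_out**2 == 1 everywhere
    out = np.zeros(h * (n + 1))
    for i, rec in enumerate(records):
        w = np.concatenate([fade_in, fade_out])
        if i == 0:
            w[:h] = 1.0   # no earlier record to fade from
        if i == n - 1:
            w[h:] = 1.0   # no later record to fade into
        out[i * h:i * h + s] += rec * w
    return out
```

Equal-power fades preserve the summed power of uncorrelated prediction errors in the overlap region, at the cost of a small amplitude bump for fully coherent content.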

Estimating ambient surface displacement noise
Validation of Seismonet for the prediction of seismic surface displacement noise has been carried out by using data of a 2017 seismic survey [14] executed in the south of Limburg, near the village of Terziet, the Netherlands. The PSDs at the surface and at 250 m depth are shown in figure 4. For reference, Peterson's new low-noise model and new high-noise model are shown as dashed curves [15].

Sensor network topology
The employed seismic sensor network consists of two arrays of Quantum sensor nodes [16] that are placed on the surface in expanding concentric circles. Each node provides a single channel of seismic activity data in the vertical direction. The topology of the seismic network is shown in figure 5. Each subsequent circle has about twice the distance from the array center to ensure network sensitivity to a wide band of wavelengths and to avoid spatial aliasing. The sensor at the center of Array A (depicted by the green triangle in figure 5) acts as the witness sensor for which the neural network will estimate the seismic activity.

Input data
Each node provides a single channel of sensor data with a sample rate of 250 samples per second and 24 bits of velocity-noise resolution. Since we are only interested in frequencies below 10 Hz, all data are first converted to IEEE-754 32-bit float, decimated to r = 25 samples per second (and thus f_Nyquist = 12.5 Hz), and filtered with an 8th-order zero-phase type-I Chebyshev low-pass filter, with −3 dB at 10 Hz. Data are corrected for the transfer function of the 5 Hz geophone-based sensors and all channels were scaled with their channel-specific normalization factor.
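With SciPy, this chain might look as follows. Splitting it into an FIR decimation stage followed by a 4th-order Chebyshev run forward and backward (yielding an 8th-order zero-phase response, with the per-pass ripple halved so the combined response is −3 dB at 10 Hz) is one plausible realization, not the paper's exact implementation; instrument-response removal is omitted:

```python
import numpy as np
from scipy import signal

def preprocess(x, fs_in=250, fs_out=25, f_corner=10.0):
    """Convert to float32, decimate to 25 samples per second, and
    low-pass with a zero-phase type-I Chebyshev (-3 dB at 10 Hz).

    A 4th-order filter applied forward and backward gives an
    8th-order zero-phase response; 1.5 dB ripple per pass yields
    -3 dB at the corner after both passes."""
    x = np.asarray(x, dtype=np.float32)
    y = signal.decimate(x, fs_in // fs_out, ftype='fir', zero_phase=True)
    sos = signal.cheby1(4, 1.5, f_corner, btype='low', fs=fs_out, output='sos')
    return signal.sosfiltfilt(sos, y)
```

A 2 Hz test tone passes through with only the Chebyshev passband ripple, while content above the corner is strongly suppressed.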
The distance between the witness sensor and the edge of the network equals about d_max = 0.5 km. For a lower-bound velocity detection threshold of v_min = 100 m s−1, the required record length according to equation (3) equals s = 512 samples.
The data set consists of 12 days of sample data from 67 participating sensors.We employ a 75% overlap, and the final data set consists of 202 432 records of which 80% are used for training.

Training
Training was executed on an Nvidia Tesla V100 with the Nadam neural network optimization algorithm [17] with a learning rate of 10−5. To find the optimal model configuration, Hyperband [18] HPO with Keras Tuner [19] has been applied. The resulting model consists of over 66 M trainable weights and contains 18 input bands of 778 mHz width that have an 11% overlap. Convergence was reached when the validation loss did not decrease for 90 epochs; this condition occurred after 265 epochs.

Results
A representative validation record of the vertical surface seismic velocity information in the time domain with actual and predicted signals is shown in figure 6(a). The seismic velocity information estimated by the DNN (S_ŷ) accurately matches the signal measured by the witness sensor (S_y). Note that these time-domain data are high-pass filtered at f = 584 mHz and low-pass filtered at 10 Hz.
The performance of the Seismonet model in the frequency domain is shown in figure 6(b). The figure shows that the median power of the true and predicted signals are nearly identical in the frequency domain. The bands represent the 10-90 percentile, and the data are low-pass filtered at 10 Hz. A phase mismatch between actual and predicted signal would not show up in these spectra. The quality of the prediction should be judged by considering the residual power (|R_y − R_ŷ|²) that remains after subtracting the DNN prediction from the data. As shown in figure 6(c), the residual power is more than a factor of 20 smaller than the uncorrected power in the frequency band between 3 and 10 Hz. At lower frequencies the small size of the seismic sensor array limits the spatial resolution. Together with increased instrument noise, this limits the ability of the neural network to make accurate predictions.

Comparison to a Wiener filter approach
Wiener filters [20] are widely used in current gravitational-wave research. A Wiener filter subtracts the correlated noise from a signal in an optimal manner, provided that the influence of the noise is linear. In our application we want to subtract the acceleration noise δa, induced by seismic activity, by using seismometer signals s_i. The Wiener filter employs the measured cross-correlations between all the sensors and accelerations to minimize the noise in δa; i.e. it subtracts the correlated part of the signals between the seismometers and acceleration. For finite, discrete and real-valued data sets the circular cross-correlation between data streams s_i and s_j is defined as

(s_i ⊗ s_j)(τ) = Σ_k s_i(k) s_j((k + τ) mod N),

where the index k runs over all time samples and the subscript mod N indicates modulo N.
It is assumed that the finite discrete data set can be described as periodic with a period of N samples. The correlation vector (s_i ⊗ s_j)(τ) is part of a Fourier pair; the values in the time domain can be calculated from

(s_i ⊗ s_j)(τ) = F^−1(S_i*(f) S_j(f)),

with F^−1 the inverse fast-Fourier transform operation, S(f) the data stream in the frequency domain (S(f) = F(s(t))), and S* the complex conjugate of S. This allows one to build up the normalized cross-correlation matrix Corr_ij(f) in the Fourier domain by summing data from different time segments as

Corr_ij(f) = Σ_t S_i*(f, t) S_j(f, t) / (|S_i(f)| |S_j(f)|),

with

|S_i(f)| = √(Σ_t |S_i(f, t)|²).

For the Wiener filter, the prediction for the acceleration in the x-direction in the frequency domain is

â_x(f) = Σ_i C_i(f) S_i(f),

with the filter coefficients C_i(f) given by

C_i(f) = (|a_x(f)| / |S_i(f)|) Σ_j Corr^−1_ij(f) Corr_ja(f).

Here Corr_ja(f) represents the vector of normalized cross-correlations for frequency bin f between sensor j and acceleration a_x, Corr^−1_ij(f) the inverse of the normalized cross-correlation matrix Corr_ij(f), and |a_x(f)| and |S_i(f)| the average values for the amplitudes (the square root of the average power) of channels S_i and a_x.

(Figure 7 caption: the residual noise when applying a DNN to the data is shown by the orange curve, where R_ŷ represents the prediction of Seismonet. The green and red curves represent the residual power when using the unmodified Wiener filter P_wf and modified Wiener filter P_mwf, respectively; in the latter, seismic data segments with outliers are not used for filter construction. The geophone self-noise is shown by the dotted curve [21].)
Next we show that both Wiener filters and DNNs are highly capable of predicting the displacement field measured by a witness sensor by using the data from surrounding sensors. For Wiener filters this is the case as wave speeds are constant and the displacement field is continuous, so the optimal linear combination of a set of sensors surrounding the reconstructed sensor should have a high overlap with the witness sensor. The construction and application of the Wiener filter is done with the same data and segmentation as used for the neural network, where training data are used for construction of the filter and test data are used for prediction. Sudden local excitations in one of the input channels are excluded, as these may greatly reduce the Wiener filter's predictive capabilities. For this reason we drop seismic data segments from the training data if the RMS of one of the channels exceeds 3σ with respect to the median RMS of the previous seven hours of seismic data for that channel.
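This rejection rule could be realized as below, translating 'the previous seven hours' into a trailing window of segments and taking the threshold as the trailing median plus three trailing standard deviations (one reading of the rule; window size is an assumption):

```python
import numpy as np

def outlier_mask(segment_rms, window):
    """Flag segments whose RMS exceeds the median RMS of the trailing
    window (spanning ~7 hours of segments for one channel) by more
    than 3 standard deviations of that window."""
    n = len(segment_rms)
    bad = np.zeros(n, dtype=bool)
    for i in range(window, n):
        past = segment_rms[i - window:i]
        bad[i] = segment_rms[i] > np.median(past) + 3.0 * np.std(past)
    return bad
```

A segment is then dropped from the Wiener filter training data if any channel's mask flags it; the DNN required no such pruning.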
The PSDs of residual noise when using a Wiener filter (with and without outlier removal) and DNN are compared in figure 7. Good performance in the Wiener filter approach was obtained after removal of outliers from strong transient events (P mwf ), while such a procedure was not required for the DNN.

Estimating underground seismic displacement noise
We have shown that both a Wiener filter and Seismonet are capable of estimating surface seismic displacements. Next we study the performance for estimating underground seismic displacements.

Sensor network topology
A sparse seismic surface array in combination with a borehole sensor was used to predict vertical seismic activity at 250 m depth. The total number of input channels is 66 and their layout is shown in figure 8. For the sparse array, two weeks of survey data were obtained from a seismic survey executed in South Limburg, the Netherlands, which lasted from 14 to 27 November 2020. Survey data from 60 Quantum sensors, located within 3.5 km from the borehole, were used. In addition, data from three seismometers were added, which provide 3-channel data. The first station is located at seismic station Heimansgroeve [22], the Netherlands, at a depth of only 3 m. The other seismometers are located at Terziet, with a Nanometrics Trillium-240 seismometer placed at the surface and a Streckeisen STS-5A seismometer placed at the bottom of a 250 m deep borehole; the PSD of the latter is shown in figure 4. The vertical seismic displacement information of the borehole sensor acts as the witness sensor for the neural network. The horizontal channels of the borehole sensor are not used.
The network spans about d_max = 3.5 km but is less than optimal considering the off-center location of the witness sensor. Moreover, due to the sparseness of the array, we expect reasonable results only at frequencies below 500 mHz. For this reason all data are decimated to a sample rate of 1.25 samples per second. Assuming v_min = 200 m s−1 and applying equations (2) and (3) we obtain s = 128 samples per record per channel.
The data set consisted of 14 days of sample data from 67 channels. We employed an 87.5% overlap between each record, resulting in a data set of 94 336 records. Data from sensors were first collected and decimated to 5 samples per second. Finally the data were low-pass filtered and decimated to 1.25 samples per second with an 8th-order zero-phase type-I Chebyshev filter with the −3 dB point at 0.5 Hz. Instrument response was removed and all channels were scaled with their channel-specific normalization factor. An optimal Seismonet model with 32.5 M trainable weights was found with Hyperband HPO. The resulting model contains 32 input bands of 29 mHz width with 66% overlap.

Results
The validation record presented in figure 9(a) shows the measured seismic vertical velocity from the witness sensor at a depth of 250 m and the prediction from the neural network. Again, good consistency between the two waveforms in the time domain is evident. Figure 9(b) shows both the predicted and recorded power of the vertical seismic velocity at 250 m depth in the frequency domain. The residual power is shown in figure 9(c); it is about two orders of magnitude lower than the uncorrected power in the whole frequency band. Note that the employed one-sided sparse seismic network was not optimized. More accurate and broad-band results can be expected for an optimized seismic network topology, particularly if underground sensors are included.
We compare the performance of the neural network to that of the Wiener filter in figure 10. Both the DNN and the modified Wiener filter perform equally well. In contrast to the case of the surface array survey discussed in section 3, the unmodified Wiener filter performs relatively well. This can likely be attributed to the inclusion of seismometers, which have a lower noise floor and are better shielded from external events such as wind and rain. Outside of the microseismic peak the DNN seems to perform better. Whether this is due to the lower signal-to-noise ratio or due to the presence of non-linear effects has not been explored.

(Figure 10 caption: the true signal power is represented by the blue dashed curve. Seismonet's prediction for R_y is denoted R_ŷ and the power of the residual noise is represented by the orange curve. The green and red curves show the residual noise of the predictions P_wf and P_mwf for the Wiener filter and the modified Wiener filter, respectively. In P_mwf, records with outliers have been removed from the dataset.)

Newtonian noise subtraction
Subtraction of Newtonian noise requires knowledge of both the mass-density distribution of the local geology and the time-dependent seismic displacement field. The mass-density distribution can be obtained from passive and active seismic surveys, while information on the seismic displacement field can be obtained from a sensor network. Studies have shown [3] that in order to obtain accurate estimates of Newtonian noise, these distributions must be integrated over kilometer-scale distances from the test masses. For the geology at Terziet it was estimated that integration radii up to 2 km are required to capture more than 90% of the Newtonian noise, due to the contributions of a body-wave background.
The previous sections have shown that both a Wiener filter and a neural network can be used to predict the seismic displacement field at specific locations. However, it is impractical to characterize this field up to large distances and integrate it in order to estimate Newtonian noise. Newtonian noise background subtraction requires a solution that goes beyond what was presented in figure 1, one where the actual interferometer strains are used as witness channels.
We envision that a permanent seismic sensor network will be installed surrounding each corner station, as is shown in figure 11. Sensors are placed on the surface and in the tunnels of the observatory. This minimizes the construction effort and avoids the need for additional boreholes. For frequencies below about 10 Hz the strain outputs of the low-frequency interferometers of Einstein Telescope contain a sizable contribution from Newtonian noise, and this contribution de facto results from an integral over all relevant mass-element displacements. The difference between the predicted interferometer strain signal and the true signal will be minimized by the neural network.

Synthetic data generation
Wiener filter and Seismonet models can be trained on real data, and for Newtonian noise subtraction to be effective, the actual detector strain data must contain a significant contribution from Newtonian noise. This will be the case for the low-frequency interferometers of Einstein Telescope, where Newtonian noise is expected to dominate the strains for frequencies from 2 to about 8 Hz. Here the assumption is made that technical noise contributions (e.g. control noise) do not greatly limit the sensitivity.

(Figure caption: a seismic-sensor network surrounds each test mass and has sensors both on the surface and underground at the Einstein Telescope (ET) site. The seismic network output and the interferometer output are used to train a machine-learning algorithm for mapping seismic network output to the interferometer output. Once the model is trained it can be used to predict the interferometer output given a seismic network output. This signal can be subtracted from the true output of the interferometer. For a Wiener filter approach the two boxes labeled 'DNN' would be replaced by a box labeled 'Wiener filter cross-correlation'.)
Data from an array of seismic sensors surrounding the main mirrors are needed. At present such data do not exist. Therefore, synthetic data are used that are derived from the model by Bader et al [3]. This model is schematically shown in figure 12 and incorporates both the geology at Terziet in Limburg and seismic noise PSDs that were measured with a surface sensor array and 3C seismometers at the surface and at 250 m depth. The numerical calculations provide the synthetic displacement field around the test mass, which allows the Newtonian accelerations on the mirror to be calculated. Einstein Telescope features three low-frequency interferometers and thus three strain signals. The strain signal of each interferometer is due to the noise power of four mirrors.
In our analysis we assume the seismic noise at each corner station to be independent. Note that in reality the contributions due to Newtonian noise will be correlated in the case of three colocated interferometers, as each corner station hosts the input mirrors of one interferometer, but the end mirrors of the other interferometers. How such correlations can be exploited to minimize the effect of Newtonian noise remains to be investigated.
The displacement field has been calculated with the Matlab Elastodynamics Toolkit EDT [23] by using isotropically distributed excitations within a radius of 6 km around the center of the model space. Sources have random location, direction and phase, and Gaussian-distributed amplitude; the strength of the sources is matched to reproduce the mean PSD at the surface. In the numerical calculations, we integrate the seismic displacement fields to a radius of 4 km around the test mass. Other observables reported in the aforementioned survey by Koley et al [14], such as the frequency-dependent reduction of seismic amplitude with depth and the wave speeds and observed overtones in the beamforming data, are qualitatively reproduced by our model.
Apart from anthropogenic noise, we identified the presence of a body-wave background whose strength was measured by analyzing day-night ratios for the PSDs measured at the surface and at 250 m depth [14]. This body-wave background is currently poorly known; we only have data from a single tri-axial sensor at 250 m depth. It was modeled by randomly generated plane waves, isotropic in displacement and in traveling direction. No rescattering, reflections, damping or dispersion are included, although the density of the rock layers is taken into account. The body waves feature a P-wave component traveling at 4 km s⁻¹ and an S-wave component at 2.8 km s⁻¹. A more realistic model, with better parameters and going beyond plane waves, would require input data from a survey with more sensors at depth.
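A plane-wave background of this kind can be sketched as a superposition of randomly generated waves. The function below is a simplified illustration (P-waves only, with an arbitrary amplitude scale), not the EDT model used in the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def body_wave_field(points, n_waves=4, f=1.0, c_p=4000.0, amp=1e-9):
    """Displacement spectra (one frequency) of random P-type plane waves.

    Isotropic propagation directions, random phases and Gaussian
    amplitudes, with no scattering, damping or dispersion, as in the
    text. The amplitude scale `amp` is an illustrative placeholder.
    """
    u = np.zeros((len(points), 3), dtype=complex)
    k = 2 * np.pi * f / c_p                 # wavenumber magnitude
    for _ in range(n_waves):
        d = rng.normal(size=3)              # isotropic direction
        d /= np.linalg.norm(d)
        a = amp * rng.normal()              # Gaussian amplitude
        phi0 = rng.uniform(0, 2 * np.pi)
        phase = np.exp(1j * (k * points @ d + phi0))
        # P wave: displacement along the propagation direction
        u += a * d * phase[:, None]
    return u

pts = rng.uniform(-500, 500, size=(10, 3))  # sensor positions [m]
u = body_wave_field(pts)
print(u.shape)                              # (10, 3)
```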
The model is used to generate 50 000 training records by adding 20 randomly distributed surface sources and 4 random body waves. Using a combination of 20 sources for each training record leads to a more realistic description, since at any given moment in time the activity in the field cannot be described by the action of a single source. The oscillations in the subsurface are recorded by a virtual mirror placed at position (0, 0, −250) m in the model space, enclosed in a 10 m radius cavern, and by a number of virtual seismic sensors. For the virtual mirror, only the x-acceleration a_NN^true is derived, corresponding to the signal to be predicted. In addition, the model contains two types of virtual 3C seismic displacement noise sensors: 132 geophone-based sensors and 3 seismometers. Both types of sensors provide the same seismic displacement data, but with different frequency-dependent sensor self-noise. For the geophone-based sensors, the self-noise n [m (√Hz)⁻¹] can be parameterized as a function of the frequency f in Hz [24]. For the underground seismometers, the self-noise is an order of magnitude smaller. The positions of the sensors in the model space are shown in figure 12. Note that the network topology is by no means optimized, but solely serves to demonstrate the value of a Wiener filter or DNN for Newtonian noise subtraction. Each record contains seismic and Newtonian noise in a limited amount of time (a few seconds) for a single frequency and for 136 sensor locations. The synthetic data must be preprocessed to be suitable for the DNN. During preprocessing, sensor-specific self-noise is added to the synthetic data. All data are in the frequency domain and belong to a specific frequency band (1, 3, 5 or 10 Hz).
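The preprocessing step of adding sensor-specific self-noise can be sketched as follows. The self-noise levels used here are placeholders, since the geophone parameterization of [24] is not reproduced in this text; only the factor-of-ten difference between geophones and seismometers is taken from the description above.

```python
import numpy as np

rng = np.random.default_rng(2)

def add_self_noise(spectra, n_asd, df):
    """Add sensor self-noise to complex displacement spectra.

    spectra : (n_sensors, 3) complex amplitudes at one frequency
    n_asd   : per-sensor self-noise ASD [m/sqrt(Hz)] (placeholder values)
    df      : frequency-bin width [Hz]
    """
    sigma = n_asd * np.sqrt(df)             # rms noise per frequency bin
    noise = (rng.normal(size=spectra.shape) +
             1j * rng.normal(size=spectra.shape)) * sigma[:, None] / np.sqrt(2)
    return spectra + noise

spec = np.zeros((135, 3), dtype=complex)    # 132 geophones + 3 seismometers
n_asd = np.full(135, 1e-10)                 # geophone-level placeholder
n_asd[-3:] = 1e-11                          # seismometers: 10x quieter
noisy = add_self_noise(spec, n_asd, df=0.1)
```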
The neural network will be equally capable of reconstructing the acceleration of a single mirror, when trained on this parameter, as of subtracting Newtonian noise in the case of three co-located detectors in an equilateral triangular layout, when trained on combinations of the strains of these detectors.
This can be understood by considering three interferometers with the input and end mirrors of the Fabry-Perot cavities located in corner stations labeled 1, 2 and 3. The three interferometers provide three strain measurements S₁, S₂ and S₃ that measure the arm-length differences (ΔA − ΔC), (ΔB − ΔA) and (ΔC − ΔB), respectively. We choose an orthonormal, right-handed coordinate system with the x-axis pointing from corner 1 towards corner 2 (along arm A) and the z-axis pointing downwards, towards the center of the earth. We can then write the strains as differences in mirror displacements in the horizontal x- and y-directions (we ignore the small effects of displacements in the vertical direction).
Higher-order corrections of order δ²/L, with δ the mirror displacement and L the arm length, are neglected. It follows that the strain signal S₁ depends on 5 variables: the mirror displacements in the x- and y-directions at corners 1 and 3, and the x-displacement at corner 2. This may seem a more difficult task for the neural network than finding the acceleration of a single mirror in the x- and y-directions (2 independent parameters). However, these five variables are not independent; only one component of the displacement vector in each of the corners contributes.
For S₁, for instance, the relevant displacement for mirror 1 is in the horizontal direction with an azimuthal angle of 120° (parallel to arm B, i.e. the combination −½δx(1) + ½√3 δy(1)), for mirror 2 in the direction of arm A (δx(2)), and for mirror 3 in the direction of arm C, at an azimuthal angle of 60° (the combination ½δx(3) + ½√3 δy(3)). With a triangular layout, we have three different strain signals. Each of those strain signals depends on three independent variables, namely the displacements of the mirrors in a direction parallel to an arm; which arm depends on which strain signal is calculated. Thus, the accelerations in the corners contribute with different weights to the strains S₁, S₂ and S₃.
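The quoted combinations are simply projections of the horizontal mirror displacements onto unit vectors along the arms. A minimal check, assuming the arm orientations stated in the text (arm A at azimuth 0°, arm B at 120°, arm C at 60°) and an illustrative sign convention for S₁:

```python
import numpy as np

# Arm unit vectors, with x along arm A (corner 1 -> 2).
e_A = np.array([1.0, 0.0])
e_B = np.array([np.cos(2 * np.pi / 3), np.sin(2 * np.pi / 3)])  # 120 deg
e_C = np.array([np.cos(np.pi / 3), np.sin(np.pi / 3)])          # 60 deg

def s1_displacement(d1, d2, d3):
    """Displacement combination entering S1 (illustrative sign convention).

    d1, d2, d3: horizontal displacement vectors (dx, dy) of the mirrors
    in corner stations 1, 2 and 3.
    """
    return d1 @ e_B + d2 @ e_A + d3 @ e_C

# The projections reproduce the combinations quoted in the text:
d = np.array([0.3, -0.7])
assert np.isclose(d @ e_B, -0.5 * d[0] + 0.5 * np.sqrt(3) * d[1])
assert np.isclose(d @ e_C,  0.5 * d[0] + 0.5 * np.sqrt(3) * d[1])
```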
In the current study, we train the network to reproduce the acceleration of the test mass in the horizontal directions (a_x and a_y). In our model the noise in the x- and y-directions is uncorrelated, and the neural network must be able to minimize the uncertainty in 2 independent horizontal directions. This is tested by checking whether the neural network is equally capable of predicting the acceleration of an underground mirror in the x- or y-direction, both separately and simultaneously, on the basis of the seismic activity measured in our field. We expect that a neural network that is capable of reconstructing the independent accelerations of a mirror at a given location is equally capable of reconstructing three strain signals, each depending on the mirror accelerations at three different locations; although each strain signal depends on three degrees of freedom, these degrees of freedom are constrained by having three different strain measurements (e.g. the mirror displacement δx(1) contributes with different factors to the strains S₁, S₂ and S₃, and the neural network should be sensitive to this fact).

Analytical considerations
Newtonian noise depends in a complicated manner on the displacement fields in the surrounding soil and on the geometry of tunnels, caverns, hills and layer boundaries. For that reason, a numerical approach is warranted, both in predicting the noise and in subtracting it. It is not a priori clear that the displacement information is strongly correlated with the acceleration, or that a Wiener filter can capture the relevant information. To clarify this, and to better interpret the results we obtain in our study, we first proceed with a brief discussion of Newtonian noise in an analytical context.
Consider a cubical volume element with ribs of length Δ in an infinite homogeneous medium. The vector r points from the origin of our coordinate space to the center of the volume element. The normals n_i to the six faces of the cube are aligned with the unit vectors ±ê_i. The equilibrium density, the density of the undisturbed infinite medium, equals ρ₀. The presence of a time-dependent displacement field u_i(t) introduces mass currents of size J_i = ρ du_i/dt. From the continuity equation, we can relate the change in mass in an element of volume V to the current through the surface S of that element. The mass fluctuation with respect to the mass at equilibrium density is obtained by integrating the mass currents through the surface from the time t(u_i = 0), where u_i equals zero, to the present time t,

δm(t) = −∮_S ∫_{t(u_i=0)}^{t} J_i n_i dt′ dS = −ρ₀ ∮_S u_i(t) n_i dS,   (14)

with n_i the outward normal. The Einstein summation convention to sum over repeated indices is used here and throughout this section. The change in mass due to the integrated currents through the two faces of the cube with normals ±ê_i thus equates to

δm_i(t) = −ρ₀ Δ² [u_i(r + (Δ/2) ê_i, t) − u_i(r − (Δ/2) ê_i, t)].   (15)

The change in density δρ is obtained by dividing the change in mass δm in the volume element by its volume Δ³ as

δρ(r, t) = δm/Δ³ = −ρ₀ ∂u_i/∂x_i,   (16)

where in the last step we take the limit that Δ is small compared to the wavelengths present in the displacement field. One can decompose the seismic displacement field in terms of plane waves as

u_i(r, t) = Σ_w A_{w,i} e^{i(k_{w,j} r_j − ωt + φ₀)},   (17)

where A_{w,i} represents the amplitude of the displacement field in the i-direction, k_{w,j} the wave propagation vector (with length equal to 2π divided by the wavelength), ω the angular frequency, t the time and φ₀ the phase of the wave at t = 0 in the origin.
Inserting equation (17) into equation (16) yields for the density fluctuation associated with the displacement field at location r

δρ(r) = −i ρ₀ Σ_w k_{w,i} A_{w,i} e^{i(k_{w,j} r_j − ωt + φ₀)}.   (18)

One can observe that only longitudinal wave components, with k_{w,i} A_{w,i} ≠ 0, introduce density fluctuations inside the medium. Furthermore, the density fluctuation in a volume element is a scalar property; it depends on the total divergence of the displacement field, not on the direction of the field u_i or on the direction of the wave vector k_j (which are vector properties).
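A quick numerical check of this statement: for a plane wave, the continuity-equation density fluctuation δρ = −ρ₀ ∂u_i/∂x_i is proportional to the longitudinal component k·A and vanishes for a transverse wave (all parameter values illustrative).

```python
import numpy as np

rho0 = 2500.0                            # illustrative rock density [kg/m^3]
k = np.array([2 * np.pi / 1000, 0.0, 0.0])   # wave vector, 1 km wavelength

def div_u(A, r, h=1e-3):
    """Finite-difference divergence of u_i = A_i exp(i k.r) at t = 0."""
    d = 0.0
    for i in range(3):
        e = np.zeros(3); e[i] = h
        up = A[i] * np.exp(1j * (k @ (r + e)))
        um = A[i] * np.exp(1j * (k @ (r - e)))
        d += (up - um) / (2 * h)
    return d

r = np.array([100.0, -50.0, -250.0])

# Longitudinal wave (A parallel to k): |delta rho| = rho0 |k.A|
A_long = np.array([1e-9, 0.0, 0.0])
drho = -rho0 * div_u(A_long, r)          # delta rho from continuity
assert np.isclose(abs(drho), rho0 * abs(k @ A_long), rtol=1e-5)

# Transverse wave (A perpendicular to k): no density fluctuation
A_trans = np.array([0.0, 1e-9, 0.0])
assert abs(-rho0 * div_u(A_trans, r)) < 1e-20
```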
The contribution to the Newtonian noise from the density fluctuations in the cubical volume element with ribs of length Δ at location r, acting on a test mass located at the origin, can be written as

δa_i = G δρ(r) Δ³ r_i/r³ = G ρ₀ Δ³ (r_i/r³) Σ_w k_{w,j} A_{w,j} e^{i(k_{w,l} r_l − ωt + φ₀ − π/2)},   (19)

where δa_i represents the change in acceleration of the mirror in the i-direction and δρ Δ³ the mass fluctuation of the volume element considered; we assume here that Δ is small compared to the distance r. The additional phase shift of π/2 at the end of equation (19) stems from the prefactor −i in equation (18), which arose from taking the derivative of the displacement field. We notice, first of all, that the change in acceleration is in the direction of the volume element, but is not correlated with the direction of the displacement field. For example, P-waves in the x, y and z directions all introduce similar density fluctuations. Secondly, the shift of π/2 in the phase in equation (19) represents a 90-degree phase shift between the displacement-field component and the density fluctuation δρ in the center of the volume element. The maximal/minimal density is reached when the displacement amplitude crosses zero; δρ(t) is zero for the maxima/minima of u(t). Thirdly, for transverse waves, u_i k_i equals zero (the displacements in the volume element then do not contribute), whereas for a P-wave, the factor u_i k_i changes sign for waves traveling in opposite directions, leading to a non-linear contribution of the displacement field in the volume element to the acceleration of the mirror, and to vanishing cross-correlations in a Wiener-filter approach in the case of isotropically distributed waves. In that case a Wiener filter is not suited to predict the contribution of Newtonian noise from a homogeneous medium with isotropic fields on a test mass. It can estimate the magnitude of the contribution, but not the sign; filter coefficients for the cross-correlation will approach zero.
In practice, the test mass is not embedded in a homogeneous medium. It is located in a cavity in the rock, there are tunnels for the beams, and the medium does not continue above the surface, where rock is replaced with air. In order to calculate the full Newtonian noise, the boundary conditions of the geometry have to be taken into account. From the last part of equation (14) one can derive that mass fluctuations associated with the movement of a surface element r² dΩ introduce changes in acceleration amounting to

δa_i = G ρ₀ (u_k n_k) (r_i/r) dΩ,   (20)

where n_k represents a unit vector normal to the surface and r_i points towards the surface element. Note that for this term, only displacements in the direction normal to the surface contribute. These may be due to S-waves or P-waves, and indeed a large contribution to the Newtonian noise in our model calculations is due to vertical displacement fields at the surface. The direction of δa stems from the direction of the location of the surface element. Whereas the contributions from volume elements scale with 1/r, the surface contribution in a given solid-angle element dΩ does not depend on the distance to the test mass. Surface contributions arise from the excavations around the mirror and from the soil-air boundary, and depend on the geometry of these boundaries. In contrast to volume contributions, a Wiener-filter approach may accurately predict surface contributions, due to the fixed phase relation between sensor displacement and mirror acceleration. Depending on the geometry and seismic-field properties, the relative importance of surface and volume contributions may differ.
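The distance independence of the surface term can be verified directly: the r² in the patch area cancels the 1/r² of the gravitational attraction (a sketch of the scaling argument, with illustrative values throughout).

```python
import numpy as np

G = 6.674e-11
rho0 = 2500.0                       # illustrative rock density [kg/m^3]

def surface_accel(u_normal, r_vec, dOmega):
    """Acceleration from a surface patch subtending solid angle dOmega.

    delta_m = rho0 * u_normal * r^2 * dOmega, attraction ~ G delta_m / r^2,
    so the r^2 factors cancel.
    """
    r = np.linalg.norm(r_vec)
    dm = rho0 * u_normal * r**2 * dOmega
    return G * dm * r_vec / r**3

u_n, dOmega = 1e-9, 1e-4
a_near = surface_accel(u_n, np.array([0.0, 0.0, 100.0]), dOmega)
a_far = surface_accel(u_n, np.array([0.0, 0.0, 1000.0]), dOmega)
# Same solid angle -> same acceleration, independent of distance.
assert np.isclose(np.linalg.norm(a_near), np.linalg.norm(a_far))
```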

Wiener-filter simulation results
The synthetic data sets described in section 5.1 have been applied to a Wiener filter. These sets contain 134 tri-axial sensors and any witness channel that we calculated (we can generate synthetic data for accelerations and displacements at any location). First, we tested whether the seismic sensor network was dense enough. We calculated the displacement field at the position of the test mass (in the absence of the cavern around the mirror) and verified how well this field can be reproduced from the synthetic data of the 134 surrounding sensors by using the Wiener filter. As expected, the Wiener filter performs excellently here, since the problem of reproducing the displacement field is linear. The coherence between the most nearby sensor and the displacement field is above 0.99 for the frequencies 1, 3 and 5 Hz, and still 0.987 at 10 Hz. We use 40 000 different noise trials to build the correlation matrix and determine the Wiener-filter coefficients, and use 10 000 new trials to see how these coefficients perform on new data. In all cases, the noise can be reduced by more than 40 dB when the projection is made for the sum of the 40 000 training trials, i.e. if the measured correlations are applied to the full training set. For new trials (with different sources and different noise than in the training set) the results are still excellent; here the reduction factors equal 21 dB at 1 Hz, 37 dB at 3 Hz, 32 dB at 5 Hz and 24 dB at 10 Hz.
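A Wiener filter of this kind amounts to solving the normal equations built from the measured cross-correlations and applying the resulting coefficients to new trials. A toy frequency-domain sketch, with a handful of synthetic channels standing in for the 134 sensors (all sizes and noise levels illustrative):

```python
import numpy as np

rng = np.random.default_rng(3)

n_ch, n_train, n_test = 8, 4000, 1000
mix = rng.normal(size=n_ch)              # hidden coupling to the target

def make_trials(n):
    x = rng.normal(size=(n, n_ch)) + 1j * rng.normal(size=(n, n_ch))
    y = x @ mix + 0.05 * (rng.normal(size=n) + 1j * rng.normal(size=n))
    return x, y

Xtr, ytr = make_trials(n_train)
Xte, yte = make_trials(n_test)

# Wiener filter: solve Cxx w = cxy from the training trials.
Cxx = Xtr.conj().T @ Xtr / n_train
cxy = Xtr.conj().T @ ytr / n_train
w = np.linalg.solve(Cxx, cxy)

def rms(z):
    return np.sqrt(np.mean(np.abs(z) ** 2))

res = yte - Xte @ w                      # apply the filter to new trials
red_db = 20 * np.log10(rms(yte) / rms(res))
print(f"noise reduction on new trials: {red_db:.1f} dB")
```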
Next, we apply the Wiener filter to the acceleration of the test mass in the x-direction. We use 40 000 trials to obtain the filter coefficients and 10 000 new trials to establish the performance. The results for Newtonian noise are shown in figure 13.
We observe that the Wiener filter is capable of reproducing the noise at 1 Hz quite well, but the performance deteriorates for higher frequencies, and it performs only marginally at 10 Hz. This can be understood from the size and nature of the relative contributions to the Newtonian noise. Since for the Wiener filter we determined all normalized cross-correlations, we can inspect which sensors contribute most. At 1 Hz, the wavelength is long, typically kilometer scale for Rayleigh waves, and the soil moves largely coherently in the vicinity of the mirror. This implies that the cavern-wall displacements, and to a lesser extent the surface displacements (which are coherent over a long distance at 1 Hz), contribute relatively more than the local density fluctuations in the surrounding soil. The maximal correlation (0.605 in amplitude) occurs with the sensor in the x-tunnel 250 m downstream from the mirror, where the x, y, z coordinates are defined in figure 12. This is not the most nearby sensor; it seems that at this position the displacement contribution from the cavern wall and the density contribution from the P-wave component of the wave in the x-direction add coherently. Indeed, the coherence between the displacement field in the y-direction and the mirror acceleration in the x-direction is also substantial for this sensor (0.346). We interpret this as due to events with P-wave components traveling at an angle with respect to the x-axis; for some waves the displacement of the cavern can be close in phase with the density fluctuation in the soil at the sensor position 250 m in the x-direction (which is 90 degrees out of phase with the displacement field at the same location). It seems to help if the contraction is partly due to a y-component in the wave. The correlations for both x and y displacements with the acceleration in the x-direction are close to π radians out of phase, as should be expected from equations (19) and (20).
At 10 Hz, the wavelengths of surface waves are rather short (on the order of 25 m). The sizes of the regions that contribute coherently are small, leading to small correlations between the contributions from any given sensor and the acceleration of the mirror. The maximal correlation between displacement field and acceleration at 10 Hz is only 0.14 in amplitude. It occurs between the test-mass acceleration and the vertical displacement at the surface, measured by a sensor located at −400 m in the x-direction. The next-highest correlation is also with the vertical channel of a sensor at the surface, this time at −200 m in the x-direction. Both correlations show a phase close to zero, as expected from equation (20) for the terms due to surface displacement. The largest correlations for an underground sensor occur for a sensor 13 m away, at x = 6.7 m and y = −11.5 m, with coherences of 0.099 and 0.089 for the displacements in the x- and y-directions, respectively. That the largest correlations occur for surface sensors can be understood when one realizes that at 10 Hz, the displacement power spectral densities at the surface are four orders of magnitude larger than at 250 m depth. Even though the surface is relatively far away, it still dominates in the integral.
In conclusion, Wiener filters may be applied to subtract Newtonian noise, and are most effective at long wavelengths and in cases where the density fluctuations induced by P-waves in the volume of the surrounding rock are not a dominant contribution. For the Terziet geology, at frequencies around 10 Hz, we find that a Wiener-filter approach performs poorly.

DNN for Newtonian noise prediction
For the Newtonian noise prediction, a different implementation of Seismonet is required, which operates on records generated for a single frequency f ∈ {1, 3, 5, 10} Hz. Figure 14 shows that this DNN has two pipelines, one for linear correlations (bottom) and one for non-linear correlations (top). The synthetic input data consist of the real and imaginary parts of the spectra of 136 sensors that measure in the x, y and z directions, as described in section 5.1.

Figure 14.
The DNN model employed for the Newtonian noise prediction from synthetic data consists of a linear and a non-linear pipeline. For the linear layers the number of neurons is shown, while for the non-linear layers the numbers of neurons and the dropout values differ depending on frequency. Input consists of a 3D array of sensor id, direction (x, y, z) and complex amplitude (Re, Im); output is the predicted mirror acceleration (Re, Im) in the x-direction. The linear pipeline optimizes through a linear combination of channels, effectively similar to a Wiener filter after optimization. This pipeline has no hyperparameters. The non-linear pipeline consists of 4 fully connected layers with three activation functions and three dropout layers. This pipeline has been optimized for each frequency separately with HPO. The number of neurons for the fully connected layers l1…l3 and the dropout values for the dropout layers d1…d3 are provided in table 1. During HPO, the activation function was also optimized, and the best models all utilized the hyperbolic tangent (tanh) activation function.
Training was done with the same hardware and optimizer configuration as discussed in section 3.3. Training is considered converged when the validation loss has not decreased for 90 epochs.
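The two-pipeline structure can be sketched as a forward pass in which a purely linear combination of all input channels is summed with the output of a small tanh network. The layer sizes and weight scales below are illustrative, not the HPO-optimized values of table 1.

```python
import numpy as np

rng = np.random.default_rng(4)

n_in = 136 * 3 * 2      # sensors x (x, y, z) x (Re, Im), flattened
n_hidden = 64           # illustrative; the paper tunes sizes per frequency

# Linear pipeline: one linear combination of all channels per output
# component (after training, effectively a Wiener-filter-like map).
W_lin = rng.normal(size=(2, n_in)) * 1e-2

# Non-linear pipeline: a small tanh MLP (dropout omitted at inference).
W1 = rng.normal(size=(n_hidden, n_in)) * 1e-2
b1 = np.zeros(n_hidden)
W2 = rng.normal(size=(2, n_hidden)) * 1e-2

def predict(x):
    """Predicted mirror acceleration (Re, Im): linear + non-linear path."""
    return W_lin @ x + W2 @ np.tanh(W1 @ x + b1)

x = rng.normal(size=n_in)       # one preprocessed record
a_pred = predict(x)
assert a_pred.shape == (2,)
```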

Results for Newtonian noise prediction
The performance of the neural network is shown in figure 15, which presents validation data for frequencies of 1, 3, 5 and 10 Hz. For each frequency, the red histogram shows the true mirror acceleration a_NN^true. The figure shows that subtracting a_NN^pred from the true acceleration a_NN^true reduces the amplitude of the Newtonian noise by a factor of 2.8 at 10 Hz. The predictions are better at low frequencies, where seismic wavelengths are larger and coherence lengths are longer.
The DNN was successfully applied to minimize a_x and a_y separately, and to minimize both horizontal accelerations simultaneously. We expect that the neural network can be applied equally well to reconstruct the acceleration of a single mirror, when trained on this parameter, as to subtract Newtonian noise in the case of three co-located detectors in an equilateral triangular layout, when trained on combinations of the strains of these detectors; see section 5.1.
Figure 16 shows the predicted contribution of Newtonian noise to the sensitivity of Einstein Telescope at the EMR candidate site, with and without the use of seismic sensor arrays in combination with a DNN. The Newtonian noise contains contributions from seismic surface waves and a body-wave background. The prediction assumes the strain sensitivity for a single interferometer with 10 km arm length, and test masses at 250 m depth enclosed in 10 m radius caverns. Solid curves show the mean, while the colored bands show the 10-90 percentile. The ET-D sensitivity is represented by the dashed black curve. Our results show that a DNN can effectively predict mirror accelerations from seismic displacement data obtained with a network of tri-axial sensors. This allows Newtonian noise to be predicted, and our study indicates that a significant contribution can be subtracted from the strain output, especially at low frequency.

Summary and outlook
A DNN that employs a network of surface and underground sensors as input is a promising tool for subtracting Newtonian noise from gravitational-wave strain signals. Such a network is specifically effective in the band where Newtonian noise strongly contributes to the detector sensitivity.
The applied model includes the full solution of the elastodynamic wave equation for seismic surface excitation, while body waves are modeled as harmonic plane waves. Studies of underground geology and source distribution remain important to obtain a more realistic layered 3D geology and an enhanced characterization of both the local geology and the seismic wave field, most notably of body waves. Additionally, the accuracy of a DNN for Newtonian noise prediction may benefit from the use of three-component sensors. Also, the displacement noise generated by the facility infrastructure will be an important factor in the actual Newtonian noise experienced at the observatory. These effects need to be studied numerically as well.
For Einstein Telescope, with three low-frequency interferometers, the synthesis problem is complex. For each interferometer the strain signal depends on the accelerations of the 4 Fabry-Perot cavity mirrors at the ends of its 2 arms. For uncorrelated noise, we show that our neural network can perform equally well in subtracting Newtonian noise in the case of three colocated interferometers, where three strain signals are present. Each corner hosts the input mirror of one interferometer and the end mirrors of the other two interferometers. This causes the seismic noise to be partly correlated, especially at low frequency. A realistic test would require better knowledge of the body-wave background, implementation of the different geologies at the corner stations, the exact layout of tunnels, excavations and underground infrastructure, and the local environmental noise. Such detailed modeling will be the subject of future work.

Appendix. Seismonet
As discussed in section 2.1, Seismonet operates in the frequency domain and takes the complex spectra of reference channels for a specific segment of time as input, estimating the complex spectrum of a witness channel.
Seismonet has a sequential DNN architecture, illustrated in figure A1, where we follow the nomenclature of Keras [19]. Each layer in this architecture takes the input data from the previous layer, performs a specific operation, and passes the result to the next layer. The number of times a section is repeated (G and H in figure A1) is configurable by hyperparameters. Additionally, each repetition does not need to have the same number of neurons. Seismonet provides customization through hyperparameters, allowing for example adjustment of the number and width of bands in the band-correlation section, the number of neurons in the connected layers, and the deactivation ratio in the dropout layers.
The most important layers are the fully connected and locally connected layers, labeled Dense and LocallyConnected in figure A1, respectively. These are the only layers which can be optimized, and they involve one or more matrix multiplications of the type

o = Wi + b,   (A.1)

where i represents the vector of input values and o denotes the output vector. Vector b is optional and contains bias weights, while W represents the matrix of trainable weights and contains dim(o) × dim(i) values. The neural network is trained by using back-propagation learning. This is an iterative process involving neural-network prediction, computing the error between the predicted and expected data by using a loss function, and adjusting the weights of the network layers such that the error is reduced. Generally, training ends when the error no longer decreases substantially or when a specific number of training iterations has been completed. For a comprehensive review of DNNs see [25].
A fully connected layer performs a single matrix multiplication over all the input data, while a locally connected layer performs matrix multiplications on subsections of the data.This can be visualized as sliding a window over the input data, where each window position has its own corresponding W, and only the data within the window is used as input i.
Figure A2 provides a simplified example of our use of this sliding-window property to generate distinct frequency bands from the input data. The example assumes an input of 6 frequency bins from 6 reference channels. This input is separated into 3 bands by using a window of 3 × 6 (frequency bins × channels) with a stride of 2 in frequency. Since each spectrum has a real and an imaginary component, the input for each band contains 2 × 3 × 6 = 36 components. These values are transformed into a flattened vector i_n, with n the band index. By employing the band-specific weight matrix W_n and bias-weight vector b_n, this input vector is mapped to an output vector o_n, whose elements are denoted filters. In this example we assume that HPO yields that 4 filters are needed to optimize this layer. Consequently, the four elements of the output vector o_n for a specific band are now linear combinations of all the input values, and can generally no longer be attributed to a specific reference channel or frequency bin. Subsequently, the output vectors of the different bands are concatenated to form a new output comprising bands × filters = 3 × 4 = 12 components.

Figure A1. Seismonet consists of a sequence of layers, each serving a specific purpose. The purple layers modify the shape of the data while preserving its content. Orange layers contain neurons that perform a linear transformation on (a subset of) the data. The green activation layers apply a non-linear function to introduce non-linearity into the network. Blue dropout layers are incorporated to prevent over-fitting and improve generalization. Both the band-correlation and full-correlation layers contain sections that can be repeated a number of times.
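The banding operation of this example can be reproduced directly; the window width, stride, filter count, zero padding and the 148 trainable parameters per band follow the figure caption.

```python
import numpy as np

rng = np.random.default_rng(5)

n_bins, n_ch, width, stride, n_filters = 6, 6, 3, 2, 4
x = rng.normal(size=(n_bins, n_ch, 2))          # bins x channels x (Re, Im)

# Zero padding so that the sliding windows fit, as in the caption.
x = np.concatenate([x, np.zeros((1, n_ch, 2))], axis=0)   # now 7 bins

bands, n_params = [], 0
for start in range(0, n_bins, stride):          # band starts 0, 2, 4
    i_n = x[start:start + width].ravel()        # flattened input, 36 values
    W_n = rng.normal(size=(n_filters, i_n.size))    # band-specific weights
    b_n = rng.normal(size=n_filters)
    bands.append(W_n @ i_n + b_n)               # o_n = W_n i_n + b_n
    n_params += W_n.size + b_n.size
out = np.concatenate(bands)                     # bands x filters = 12 values

assert out.shape == (12,)
assert n_params == 3 * 148                      # 4 x 36 + 4 per band
```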
The advantage of using bands in the first section of the network is twofold. First, it reduces the number of weights required compared to a fully connected layer with the same number of inputs and outputs, thereby reducing model complexity. Second, it guides the optimization process to take into account bands with a lower signal-to-noise ratio than that of the full spectrum, allowing for more effective signal extraction. The addition of overlap between bands, achieved by setting the stride (in bins) to be less than the width of the band, enables the network to effectively capture frequency information at band transitions. Omitting this overlap leads to reduced accuracy in determining frequencies near the edges of the bands.
The way Seismonet applies the LocallyConnected2D layer decreases the number of dimensions of the data from three (channels × frequencies × {Re, Im}) to two (bands × filters). For this reason, subsequent layers only require a LocallyConnected1D layer to process data in separated bands. This layer is equivalent to a LocallyConnected2D, but operates on data with two dimensions: bands and filters.
The output of one layer is fed into the next through an activation function.

Figure A2. Example of how the LocallyConnected2D layer is applied to separate an array of 6 frequency bins and 6 channels of data into three separate frequency bands. A stride of 2 in frequency results in an overlap of 1 frequency bin per band. The output vector for band n is found by o_n = W_n i_n + b_n, where i_n denotes a vector of the flattened input values in band n, and W_n and b_n the weights specific for that band. In this example zero padding is applied first, as the number of bands would otherwise not fit the data. Hyperparameter optimization is assumed to indicate that 4 filters are optimal for this layer, leading to 4 × 36 + 4 = 148 trainable parameters for each band.
Through HPO we determined that for Seismonet the ELU [12] activation function produces the most accurate networks. ELU is given by

ELU(x) = x if x ⩾ 0, and α(eˣ − 1) otherwise,   (A.2)

where for Seismonet α = 1. The dropout layer is only active during training and effectively randomly disables a number of neurons in the previous layer. This is achieved by zeroing their output. The ratio of neurons which are disabled can be configured, but the specific set of neurons changes per iteration. Dropout forces the network to generalize over multiple inter-layer connections and prevents over-fitting.
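The ELU activation and the simple output-zeroing dropout described here can be written down directly (α = 1, as for Seismonet; the dropout shown zeroes outputs without rescaling the surviving activations).

```python
import numpy as np

rng = np.random.default_rng(6)

def elu(x, alpha=1.0):
    """ELU: x for x >= 0, alpha * (exp(x) - 1) otherwise."""
    return np.where(np.asarray(x) >= 0, x, alpha * np.expm1(x))

def dropout(x, rate, training=True):
    """Zero a random fraction `rate` of the activations during training."""
    if not training:
        return x
    return x * (rng.random(x.shape) >= rate)

assert elu(2.0) == 2.0
assert np.isclose(elu(-1.0), np.expm1(-1.0))    # alpha * (e^-1 - 1)
h = dropout(np.ones(1000), rate=0.3)
assert 0.2 < 1 - h.mean() < 0.4                 # about 30% disabled
```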
The complexity of Seismonet can be appreciated by considering some details for the DNN implementation used in section 3 to estimate ambient surface noise.Here, Seismonet was configured with the sequential model, with layer configuration, data dimensionality and number of trainable weights provided in table A1.
As an example, the derivation of the number of weights will be explained for the LocallyConnected2D layer. This layer is employed to partition the input data into 18 bands, with each band comprising 16 frequency bins and 66 input channels. The total number of inputs per band for this layer is therefore

Number of inputs per band = 16 × 66 × 2 = 2112,

where the factor 2 arises from the real and imaginary parts of the spectrum of each reference channel.
The next step maps these inputs to 1000 filters by using a weight matrix of dimension 2112 × 1000. Additionally, each filter is accompanied by an extra bias weight. Hence, the total number of weights per band is

Number of weights per band = 2112 × 1000 + 1000 = 2 113 000.

Considering that there are 18 bands, the overall number of weights in the LocallyConnected2D layer is

Number of weights in LocallyConnected2D = 2 113 000 × 18 = 38 034 000.
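The arithmetic above can be checked with a short script (the variable names are illustrative, not Keras attributes):

```python
# Re-deriving the LocallyConnected2D weight count quoted in the text.
bins_per_band = 16   # frequency bins per band
channels = 66        # reference channels
parts = 2            # real and imaginary spectrum values
filters = 1000       # filters per band
bands = 18

inputs_per_band = bins_per_band * channels * parts      # 2112
weights_per_band = inputs_per_band * filters + filters  # 2 113 000 (incl. biases)
total_weights = weights_per_band * bands                # 38 034 000
print(total_weights)  # 38034000
```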

Figure 1. Example seismic network for predicting the seismic displacement signal of a witness sensor (denoted by the red sensor in the center of the network).

Figure 2. Example of a record S showing the witness channel in red and the reference channels in blue. Signals in the shaded regions of length s/4 may receive contributions from previous or next records. All signals are windowed.

Figure 3. DNN used for seismic excitation prediction. The first three funnel-shaped layers process the seismic input data per band. The middle two layers allow cross-band correlations. The final layer is used for scaling.

Figure 4. Power spectral density at the surface as measured by a quantum sensor and at 250 m depth by a seismometer. The colored bands represent the 10–90th percentile. Peterson's NLNM and NHNM are shown as dashed curves.

Figure 5. Layout of Array A and Array B showing seismometers laid out on rings of different radii on a map of Terziet, Limburg. The borehole at the site is marked with a star. The witness sensor at the center of Array A is at GPS coordinates 50.753 158, 5.905 013 and is shown by the green triangle. Map data © 2020 Google.

Figure 6. Panel (a) shows the time trace of a typical single validation record of the vertical surface seismic velocity recorded by the witness sensor (S_y). The seismic velocity estimated by the DNN (S_ŷ) is in good agreement with the measured signal. Note that all data are high-pass filtered at f = 584 mHz and low-pass filtered at 10 Hz. Panel (b) shows the median power of the true signal (red curve) and the predicted signal (blue curve). The bands represent the 10–90th percentile, and the data are low-pass filtered at 10 Hz. For reference Peterson's NLNM and NHNM are provided as dashed curves. Panel (c) shows the residual surface seismic velocity when the DNN prediction is subtracted from the recorded data. For reference the median true signal is provided in red.

Figure 7. The PSD of the actual noise (labeled R_y) is represented by the blue dashed curve. The residual noise when applying a DNN to the data is shown by the orange curve, where R_ŷ represents the prediction of Seismonet. The green and red curves represent the residual power when using the unmodified Wiener filter P_wf and the modified Wiener filter P_mwf, respectively. In the latter, seismic data segments with outliers are not used for filter construction. The geophone self-noise is shown by the dotted curve [21].

Figure 8. Seismic sensor array used to estimate vertical underground seismic displacement noise at a depth of 250 m. Red pins indicate sensor nodes; the blue pin marks a permanent seismometer located at Heimansgroeve at a depth of 3 m. The center of the black circle shows the location of the Terziet borehole with a 3C sensor station at the surface and one at a depth of 250 m. The vertical displacement information of the borehole seismometer is used as witness channel. Map data © 2021 Google.

Figure 9. Panel (a) shows a time-trace validation record of the underground vertical seismic velocity (S_y) recorded by the witness sensor at 250 m depth. The seismic velocity estimated by the DNN (S_ŷ) is also shown and matches the measured signal reasonably well. All data are high-pass filtered at f = 29 mHz and low-pass filtered at 0.5 Hz. Panel (b) shows the median of the true power (red curve) and the predicted signal (blue curve) for the sensor at 250 m depth. The bands represent the 10–90th percentile and data are low-pass filtered at 0.5 Hz. For reference Peterson's NLNM and NHNM are provided as dashed curves [15]. Panel (c) shows the residual underground vertical seismic velocity after subtracting the DNN prediction from the recorded data. For reference the median true signal power is also provided (red curve).

Figure 10. The PSD of the actual noise (labeled |R_y|²) is represented by the blue dashed curve. Seismonet's prediction for R_y is denoted R_ŷ and the power of the residual noise is represented by the orange curve. The green and red curves show the residual noise of the predictions P_wf and P_mwf for the Wiener filter and the modified Wiener filter, respectively. In P_mwf, records with outliers have been removed from the dataset.

Figure 11. Overview of a machine-learning solution for Newtonian noise subtraction. A seismic-sensor network surrounds each test mass and has sensors both on the surface and underground at the Einstein Telescope (ET) site. The seismic network output and the interferometer output are used to train a machine-learning algorithm for mapping seismic network output to interferometer output. Once the model is trained it can be used to predict the interferometer output given a seismic network output. This signal can be subtracted from the true output of the interferometer. For a Wiener filter approach the two boxes labeled 'DNN' would be replaced by a box labeled 'Wiener filter cross-correlation'.

Figure 12. Surface and underground sensor arrays employed in the DNN model that is used to predict mirror acceleration due to Newtonian noise. Circles represent the 3D distribution of seismic sensors. An array of three-component geophones is placed on the surface (orange), and at 250 m depth a combination of geophones (blue) and three-component seismometers (red) follows the tunnels of Einstein Telescope. The pentagon shows the location of a mirror, which acts as the witness sensor and is enclosed in a 10 m radius cavern. The modeled soil layers are indicated by colored bands. The gray cylinder represents the boundary of the vertical access shaft of a corner station of Einstein Telescope.

Figure 13. Wiener filter validation record data for Newtonian noise at frequencies of 1, 3, 5 and 10 Hz. Histogram of the original x-direction mirror acceleration noise distribution a^NN_true (red outline) and the reduced x-direction acceleration noise distribution (a^NN_true − a^NN_pred) (blue filled). The mean of both distributions is shown as a vertical dashed line. The reduction factor mean(a^NN_true − a^NN_pred)/mean(a^NN_true) is provided below the arrow in each plot.

Figure 15. Validation record data for frequencies at 1, 3, 5 and 10 Hz. The original x-direction mirror acceleration noise distribution a^NN_true is shown as outline and the reduced x-direction acceleration noise distribution (a^NN_true − a^NN_dnn) as step-filled histogram. For reference the reduction when applying a Wiener filter to the same data, a^NN_true − a^NN_wiener, has also been provided. The mean of all distributions is shown as a vertical dashed line. The reduction factors mean(a^NN_true − a^NN_dnn)/mean(a^NN_true) and mean(a^NN_true − a^NN_wiener)/mean(a^NN_true) are provided after each arrow in each plot.

Figure 16. Prediction for the Newtonian-noise contribution to the sensitivity of Einstein Telescope at the Euregio Meuse-Rhine candidate site with and without the use of seismic sensor arrays in combination with a deep neural network. The Newtonian noise contains contributions from seismic surface waves and a body-wave background. The prediction assumes three interferometers in an equilateral triangle with 10 km arm length, and test masses at 250 m depth enclosed in 10 m radius caverns. Solid curves show the mean, while the colored bands show the 10–90th percentile. The ET-D design sensitivity is represented by the dashed black curve.
Here i_n^(k) = A(o_{n−1}^(k)), where i_n^(k) and o_{n−1}^(k) denote the values of element k of respectively the input of layer n and of the output vector of the previous layer. The activation function enables the network to perform non-linear functions and operates element-wise, thus preserving the shape of the input. With

Figure A2. Example of how the LocallyConnected2D layer is applied to separate an array of 6 frequency bins and six channels of data into three separate frequency bands. A stride of 2 in frequency results in an overlap of 1 frequency bin per band. The output vector for band n is found by o_n = W_n i_n + b_n, where i_n denotes a vector of the flattened input values in band n, and W_n and b_n the weights specific for that band. In this example zero padding is applied first, as the number of bands would otherwise not fit the data. Hyperparameter optimization is assumed to indicate that 4 filters are optimal for this layer, leading to 4 × 36 + 4 = 148 trainable parameters for each band.

Table 1. Parameters used for the non-linear pipeline of the DNN model shown in figure 14. The values have been determined using hyperparameter optimization. For each frequency f ∈ {1, 3, 5, 10} Hz, the number of neurons is provided for the fully connected layers l1–l3, and the dropout rate for dropout layers d1–d3.

Table A1. Configuration of layers, data dimensionality and trainable weights for estimating ambient surface noise (see section 3), as provided by the Keras model summary function of TensorFlow. Dropout layers are omitted for brevity. See figure A1 for a visual representation of the layers and the network.