Estimation of waveform deformation with the matched filter

In many particle physics experiments the data processing is based on the analysis of the digitized waveforms provided by the detector. While the waveform amplitude is usually correlated to the event energy, the shape may carry useful information for event discrimination. Thanks to the high signal to noise ratio they provide, matched filters are often applied. Their original design is however intended for the estimation of the waveform amplitude only. In this work we introduce an analytical extension of the original matched filter for the estimation of a possible shape deformation with respect to a reference template. The new filter is validated on simulations and, with respect to shape parameters calculated on unfiltered waveforms or derived from the original matched filter, it improves the discrimination capability by at least a factor 2 both at low and high signal to noise ratios, making it applicable to the data of several experiments.


Introduction
Particle detectors convert the physical interaction inside the target material into an output electrical signal.In order to enhance the signal to noise ratio (SNR) and to extract information efficiently, the digitized waveforms are often processed offline by applying digital filters [1].In particular, matched filters [2,3] are often exploited as they provide an accurate and robust estimation of the signal amplitude and, eventually, of the signal arrival time, in the presence of noise with arbitrary power spectrum.As the pulse amplitude may correlate with the energy of the event, this filtering technique represents often the ideal choice for the energy estimation.
On the other hand, when the goal is to assess the nature of an event, the attention is focused on the pulse shape [4] which can carry information on the type of interaction in the detector.In addition, waveforms whose shape is drastically different from the expected one are probably not generated by a particle interaction, but they could be due to electronic noise or detector instabilities.Pulse shape analysis can hence be used to identify the events of interest (signal) and reject the others (background), whether they are of particle or non-particle origin.
Pulse shape parameters are used to characterize the waveform.They can be evaluated on the unfiltered data, like in the case of rise and decay times [5,6], or can be derived after processing with a digital filter, such as the matched filter [7].When dealing with the raw waveform, the limiting factor in the evaluation of pulse shape parameters is the noise, that lowers the precision.When dealing with filtered waveforms, the limiting factor is the filter bandwidth, which may hide the features of different waveforms.These limits affect the discrimination potential, i.e. the capability of isolating the signal from the background, in particular in the case of low signal to noise ratio or small waveform deformation.
In this work, we expand the original mathematical formalism of the matched filter to estimate not only the amplitude and the time delay of the pulse, but also a possible deformation with respect to a template pulse.The distortion is formalized as a dilatation or contraction of the time by a factor , an additional degree of freedom that is used as pulse shape parameter.There is a conceptual difference between this approach and the one followed in the data analysis of gravitational waves experiments, in which the waveform is compared to a bank of different template signals, in order to find the one with the maximum correlation [8,9].In our case we are instead interested in finding any deviation from a given template, without modeling the unwanted signals.
The filter described in this paper is intended to improve the pulse shape discrimination capabilities for background rejection.This is the case for example of experiments looking for rare processes, such as low mass dark matter (DM) interactions [10], coherent elastic neutrino nucleus scattering (CENS) [11] and neutrinoless double beta decay (0) [12].
In DM and CENS experiments an extremely low detector energy threshold is required to identify the low energy nuclear recoil signals.Pulse shape parameters are used to clean the energy spectrum from detector artifacts and noise spikes close to threshold (see for example [13][14][15][16][17][18]) but their effectiveness is limited because of the low SNR, which can amount to just a few units.A non-identified excess background approaching the threshold is limiting several experiments in the field [19].
In 0 decay experiments, instead, the SNR can be very high, up to several thousands.However, a very good energy resolution and nearly zero background are needed for the signal identification.The LEGEND experiment, for example, identifies background events via their different pulse shapes [20,21], while pile-up events are expected to be the ultimate background of the CUPID experiment [22][23][24].

Mathematical formulation 2.1 Matched filter
The matched filter is based on the hypothesis that a waveform () can be decomposed as: where () is the known template signal with unitary amplitude, () is an additive noise component with known power spectrum,  is the unknown signal amplitude and  0 accounts for a possible time delay between the actual signal and the template.Since the noise may have a non-zero autocorrelation, a  2 minimization cannot be performed in the time domain and the estimation of  and  0 is instead performed in the frequency domain [2].This can be accomplished by defining: in which (  ) and (  ) are the Fourier transform of () and (), respectively, and  (  ) is the noise power spectral density.The minimization of this quantity, with respect to the two unknown variables  and  0 , provides the best estimators for these latter parameters.Minimizing Eq. 2.2 with respect to  yields the expression for the filtered waveform, which is a function of the delay  0 : where: It can be proved [25] that the best estimator for the pulse amplitude, Â, is the maximum of the filtered waveform Â( 0 ) in Eq. 2.3.Similarly, the time of the maximum of Â( 0 ) is the best estimator of the time delay, that will be referred as t0 .Finally, the expected amplitude resolution   amounts to: (2.5)

Extension to deformed pulses
The only two free parameters in the matched filter are the amplitude  and time delay  0 .This implies that the template signal () can only be scaled and time-translated to match the waveform ().In order to include a possible deformation of () with respect to (), we introduce the following modification to Eq. 2.1: where the additional degree of freedom  accounts for a possible dilatation ( < 1) or contraction ( > 1) of the template pulse shape ().If we assume that the deformation is small, the equations for the  2 minimization simplify considerably, as shown in the following.For small deformation we can write  as: and then perform a Taylor expansion of the deformed template pulse ( • ) around  = 0 as: in which the subsequent terms are  ( 2 ) and   is the "starting point" of the deformation.This approximated version of the deformed template pulse is the one that will be correlated to ().It has the property that, in the point  =   , it assumes the same value as the non-deformed template pulse (), while at increasing time distances from   the deformation increases.In practical applications,   can be chosen as the start time of the pulse to ensure a good Taylor approximation.Starting from Eq. 2.6 and using the approximation in Eq. 2.8, the frequency domain  2 now yields: in which Δ(  ) is the Fourier transform of the function Δ().The procedure is to minimize Eq. 2.9 with respect to the three unknown variables ,  and  0 .In particular, the estimated value for  is intended to be used as shape parameter, measuring the deformation of () with respect to ().
Minimizing with respect to  and  one gets their best estimates, Â( 0 ) and ε( 0 ), as a function of the time delay  0 : where: and . (2.13) The best estimate of the jitter, t0 , is obtained from the minimum of Eq. 2.9, after substituting  and  with Eqns.2.10 and 2.11.Then, the best estimators of amplitude and deformation are derived as Â = Â(t 0 ) and ε = ε(t 0 ), respectively.
The expected resolution on the parameters to be fitted,   , can be derived from the error matrix, . Therefore, the resolutions on  and  are, respectively: (2.14) (2.15)

Simulation
We validated our technique on simulated events.We generated pulses with rising and trailing edges described by a single time constant, respectively: where  is the discrete sample index, t is the pulse start time in samples,   and   are, respectively, the rise and decay constants of the pulse and is a normalization constant needed to have a model with unitary amplitude.The length of the window is chosen as  = 2000 samples, and the parameters in Eq. 3.1 are chosen as t = 400,   = 10 and   = 100 samples.A generic waveform hence has the form: where  is the pulse amplitude,  is the deformation,  0 is a time jitter with respect to t and [] is a simulated additive noise component with power spectral density following a  −1 behaviour and a unitary standard deviation .
For the data processing, Eqns. in Sec. 2 are ported to the discrete case replacing integrals with sums.In order to build the filter, the template pulse [] is estimated back from the generated data by averaging over 500 non-deformed ( = 0) and non-delayed ( 0 = 0) waveforms and the noise power spectral density,  [  ], is estimated from the average over 500 power spectra of generated noise traces.The parameter   in Eq. 2.8 in principle should be chosen equal to the true start time of the pulse t.However in real data the start time has to be estimated directly from the template pulse.In this simulation, we set   as the point at which the template reaches the 5% of its maximum amplitude on the rising edge.
In (3.4) As an example of simulation, Fig. 1 shows simulated waveforms with different values of  for SNR = 30 (left) and SNR = 1000 (right).

Low SNR results
A first dataset is composed of pulses with SNR ≤ 40 and a deformation like the one introduced in Sec.2.2.Every event in the simulation consists in a waveform of the form in Eq. and  0 extracted from an uniform distribution in the range [−10, 10] samples.Figure 2 shows the reconstructed parameters t0 , Â and ε as a function of the SNR.In the case of events with SNR = 0, which are noise traces, the time jitter is not estimated but we fix t0 = 0.In general, the reconstruction is well performed under the hypothesis of low deformation (blue and yellow dots in  The new parameter ε has been designed to discriminate between the shape of signal ( = 0) and background ( ≠ 0) events.In the following we compare its performance to other shape parameters often used for background rejection, such rise time (time interval between 10% and 90% of the signal maximum on the rising edge), decay time (time interval between 90% and 30% of the signal maximum on the trailing edge) and the  2 of the matched filter (Eq.2.2).
In order to compare the performance of different shape parameters  we define the discrimination potential as: where <  >  and  2 ,  are, respectively, the mean value and the variance of the parameter  evaluated on events with a deformation . Figure 3 displays the discrimination potential for the parameters considered, both in the case of low deformation ( = 0.2) and high deformation ( = −0.5),showing that our new shape parameter improves the discrimination at low SNR values by a factor 2.5 with respect to the rise time, 2 with respect to the decay time and 10 with respect to the  2 of the matched filter.
The variance of ε depends on the amplitude (see Eq. 2.15 and Fig. 2), and is not of practical use when one needs a selection variable.From Eq. 2.15, we can define a normalized pulse shape parameter as:  which has the advantage of having unitary standard deviation, being independent on amplitude and maintaining the same discrimination potential of ε.In this way, background events can be rejected with a constant cut that is energy independent, as shown in Fig. 4.
In order to assess the filter performance on events in which the deformation differs from the expected global time dilation/contraction, we simulated triangular waveforms with SNR = 10 intended to reproduce noise spikes.The length of such spikes is set to 50 samples, which is half of the pulse decay time constant   .A waveform of this kind, together with a signal pulse, is represented in Fig. 5. Also in this case, the pulse shape variable ε provided by the algorithm shows a clear separation between signal and background, as seen by the distributions in Fig. 5.The discrimination potential in this case amounts to Δ ε = 3.6, to be compared with Δ decay time = 3.0, Δ  2 = 1.4 and Δ rise time = 0.7.

High SNR results
We studied the performance of the algorithm on events with high SNR and very small deformation: SNR = 1000 ,  ∈ {0, 0.005, −0.01} . (3.8) In this case,  0 is reduced to the range [−2, 2] samples in order to emulate a more precise triggering given the higher SNR and for the estimation of t0 we interpolate the  2 function around its minimum with a cubic spline in order to reduce biases due to sampling (Eq.2.9 ported to the discrete case).
Then Â[ 0 ] and ε[ 0 ] are also evaluated through a local interpolation of their sampled values.Figure 6 shows the distributions of the reconstructed ε in the simulation, for different values of the true deformation .A small constant bias in the mean values, possibly still due to sampling, is present and amounts to around +0.001.The bias is however roughly the same for the different values of  and does not limit the discrimination.A similar bias is observed in the width of the distributions, which result 30 % higher than their theoretical value from Eq. 2.15.A strong separation between signal ( = 0) and background ( ≠ 0) is visible, and also in this case the discrimination potential of ε is more than a factor 2 higher than the one of the other parameters (see Tab. 1).
Finally, we tested the algorithm on pile-up events, simulated by summing up two identical pulses of the form 3.1 with a time delay Δ: Table 1.Comparison of the discrimination potential Δ  of the various parameters  considered in this analysis, for events with SNR = 1000.The first column is for events with deformation  = 0.005, the second one is for events with deformation  = −0.01.
and  chosen in order to have SNR = 500.As before, we are interested on the ability to identify signal events (Δ = 0) over background events (Δ ≠ 0) with our shape parameter ε.We tested small values of the separation, Δ ∈ {0, 1, 1.5, 2} samples, resulting in small differences between single and pile-up pulses, as shown in Fig. 7 (left).The discrimination potential Δ  is compared in Fig. 7 (right) to the rise time and the  2 parameters.The new shape parameter is once again more powerful in background identification than other variables.exploit the reconstructed deformation as a new pulse shape parameter, used to identify background events to be rejected.The signal/background separation provided by this new variable exceeds the one of other parameters by at least a factor 2 both at low and high SNR when the deformation is the expected one.This good behaviour is kept also with other kind of background events, like triangular noise spikes and pile-up pulses, making this filtering technique a promising tool for background rejection to be tested in a wide range of experiments.

Figure 1 .
Figure 1.Example of simulated pulses obtained from Eq. 3.3.Left: pulses with SNR = 30 and three values for the deformation .Right: pulses with SNR = 1000 and three small values for the deformation .In this case, no difference between different pulses is visible by eye.
the following we show the results of the application of the filter on different datasets, one featuring low SNR, emulating DM and CENS experiments, and one featuring high SNR, emulating 0 experiments.The values for the SNR are estimated from the original matched filter as: SNR =    .

Figure 2 .
Figure 2. Comparison between the estimates and the true values of the parameters, as a function of the SNR.In each plot, the colors identify different values of the true shape deformation .Data points and error bars are, respectively, mean values and standard deviations of the distributions.For display purposes, the points are slightly shifted on x-axis to prevent overlapping.The points are expected to lay on the dashed lines, in the case of a perfect reconstruction.Left: difference between the reconstructed and simulated time jitter t0 −  0 ; Middle: comparison between the reconstructed ( Â) and simulated () amplitudes; Right: reconstructed deformation ε.

Fig. 2
Fig.2are compatible with the dashed lines representing true values), proving the consistence of the mathematical formulation described in Sec.2.2.The new parameter ε has been designed to discriminate between the shape of signal ( = 0) and background ( ≠ 0) events.In the following we compare its performance to other shape parameters often used for background rejection, such rise time (time interval between 10% and 90% of the signal maximum on the rising edge), decay time (time interval between 90% and 30% of the signal maximum on the trailing edge) and the  2 of the matched filter (Eq.2.2).In order to compare the performance of different shape parameters  we define the discrimination potential as:

Figure 3 .
Figure 3. Discrimination potential Δ  () as a function of the SNR, for the pulse shape parameters  considered in this analysis: rise time, decay time, matched filter  2 and reconstructed deformation ε for  = 0.2 (left) and  = −0.5 (right).

Figure 4 .
Figure 4. Normalized pulse shape parameter εnorm as a function of SNR, for different values of  (colors).Mean values (points) and standard deviations (error bars) are obtained by performing a Gaussian fit to the distributions.For display purposes, points are slightly shifted on x-axis to prevent overlapping.The blue band represents a possible ±1 acceptance region to retain signal ( = 0) and discard background ( ≠ 0).

Figure 5 .
Figure 5. Left: comparison between a signal event and a triangular waveform, both with SNR = 10 after the matched filter.Right: distribution of the parameter ε on events of this kind.

Figure 6 .
Figure 6.Distribution of the variable ε on events with SNR = 1000.The blue distribution is for signal events ( = 0), while the red and the yellow are relative to background events with  ≠ 0. Gaussian fits of the distributions are represented as black solid lines.

Figure 7 .
Figure 7. Left: example of a simulated pile-up pulse (zoom on the rising edge), obtained from Eq. 3.9 adding two pulses with SNR = 500 and Δ = 2.For comparison, a single pulse with SNR = 1000 is also shown.Right: discrimination potential Δ  as a function of the time separation in pile-up pulses Δ, for different pulse shape parameters.