Pulse shape discrimination and exploration of scintillation signals using convolutional neural networks

We demonstrate the use of a convolutional neural network to perform neutron-gamma pulse shape discrimination, where the only inputs to the network are the raw digitised silicon photomultiplier signals from a dual scintillator detector element made of 6Li F:ZnS(Ag) scintillator and PVT plastic. A realistic labelled dataset was created to train the network by exposing the detector to an AmBe source, and a data-driven method utilising a separate photomultiplier tube was used to assign labels to the recorded signals. This approach is compared to the charge integration and continuous wavelet transform methods and a simpler artificial neural net. It is found to provide superior levels of discrimination, achieving an area under the curve of 0.996 ± 0.003. We find that the neural network is capable of extracting interpretable features directly from the raw data. In addition, by visualising the high-dimensional representations of the network with the t-SNE algorithm, we discover that not only is this method robust to minor mislabeling of the training dataset but that it is possible to identify an underlying substructure within the signals that goes beyond the original labelling. This technique could be utilised to explore and cluster complex, raw detector data in a novel way that may reveal more insights than standard analysis methods.


Introduction
Efficient pulse shape discrimination (PSD) techniques are required in many applications, from radiation measurements to particle physics experiments.PSD is a typical classification problem whereby digitised waveforms can be discriminated based on their time and energy features.Such features or characteristics are usually dictated by the underlying physical process that occurs in the detection medium.For example, in scintillation signals, a common characteristic used for PSD is the decay time of the detected scintillation light, which is dictated by the atomic or molecular structure of radiative states.
In this paper we focus on one specific application of PSD using 6 LiF:ZnS(Ag) phosphor screens similar to those used in the SoLid reactor neutrino experiment [1] and in other specific neutron detection applications where high discrimination is required.The neutron capture reaction on 6 Li produces highly ionising particles that excite the ZnS(Ag) scintillator: Thin sheets of 6 LiF:ZnS(Ag) provide high neutron detection efficiency due to the large 6 Li neutron capture cross section and the high scintillation yield of ZnS(Ag).Strong gamma-ray background rejection can be achieved as a result of the large difference between fast and slow ZnS(Ag) scintillation components for electron and nuclear interactions, respectively.As our focus here is the classification of those two type of signals, we make the distinction between these scintillation responses by defining as electron scintillation (ES) signals, the interactions from gamma-rays and other charged particles (electron, positrons, muons, etc.) and nuclear scintillation (NS) signals as those produced by nuclei such as protons, alpha particles or heavier ions.The 6 Li neutron capture reaction that produces a tritium and alpha therefore produces a distinctive NS signal in this detector medium.Traditional pulse-shape discrimination (PSD) techniques that utilise time domain information such as charge-integration [2] are popular for their robustness and reasonable performance.Other methods based on frequency information [3] can achieve superior performance by exploiting better the information available, but are more computationally expensive.The performance of these approaches is dependent on a number of factors that include the choice of scintillator, experimental requirements and read out technique.They also tend to have limitations with real data that exhibit a number of additional features such as pile up or background interactions.
Neural networks have the potential to be superior classifiers provided they are trained with realistic data.In particular, convolutional neural networks (CNN), are well suited to raw data that have high local correlations such as waveform signals.They are very successful in computational vision tasks and can reach high performance with limited datasets.Once trained, obtaining a classification output from a CNN is also very fast.
In this work, we present the results of using a CNN to discriminate between ES and NS signals using the raw scintillation pulse from a single PVT cube with 6 LiF:ZnS(Ag).First, we describe the experimental set up that was used to collect a labelled dataset for the training of the network.We then present the details of the CNN architecture as well as two other common PSD algorithms, in order to provide a set of benchmarks that are compared to the results of the CNN.Finally we investigate and discuss the features learned by the network, demonstrating that they are interpretable and can be utilised to identifty a clear substructure within the ES and NS signals.

Experimental Dataset
In this work, supervised learning was used to train the CNN and therefore a labelled dataset of ES and NS signals was required.Instead of using a set of Monte-Carlo generated signals which may contain unrealistic features, a dataset consisting of real scintillation signals was collected.A schematic of the experimental setup used to collect this data is shown on Fig. 1.The detector element consists of a neutron sensitive 6 LiF:ZnS(Ag) screen coupled to a polyvinyltoluene (PVT) scintillating cube, which is only sensitive to ES signals.The detector element was exposed to an AmBe source that emits neutrons and gamma-rays over a wide range of energies, and therefore provides a realistic range of signals for the training set.A small fraction of the dataset includes signals from background interactions such as muons and natural radioactivity and the reasonable activity of the source also results in a fraction of pile up pulses.Scintillation light from both scintillators was collected by wavelength shifting fibres placed in grooves on the side of the cube and registered by two Hamamatsu MPPC S12572-050P silicon photomulitpliers (SiPMs) attached to one end of each fibre.To enhance the reflection of light inside the cube and collection of that light by the fibres, the detector element was wrapped in Tyvek.
A Philips XP1911/UVPA photomultiplier tube (PMT) was placed on top of a hole cut into the Tyvek, and used to detect a large fraction of the blue scintillation light that is not collected by the fibres.The greater collection of emitted light in the PMT was intended to provide strong discrimination of NS and ES signal so that labels can be confidently assigned to the corresponding signals seen by the SiPMs.The data from both the PMT and SiPMs was digitised using a CAEN VX1724 digitiser card at 100 MS/s with a 14 bit range, and the larger PMT response was used to trigger the acquisition of the SiPMs signals.To ensure that the majority of the slow ZnS scintillation component was acquired, the maximum length of each waveform was set to be 1000 samples.
Labels were then assigned to the NS and ES signals using a simple PSD method based on the time-over-threshold (TOT) and maximum peak amplitude (MPA) of the PMT signals, which provides very good results for discriminating events at a low threshold.Fig. 2 shows the PSD parameter value as a function of signal amplitude for each of the PMT waveforms.As a result of the long shaping of the PMT pulse, NS events have a much longer decay time and are above threshold

Pulse Shape Discrimination
Pulse shape discrimination techniques utilise the dominant shape and amplitude features of digitised signals by selecting a subset of the total information and compressing this into a reduced quantity.An alternative approach that can take advantage of all the information within a signal is to directly use the digitised signals as an input to a suitable machine learning algorithm, which is trained to assign labels to individual signals.This has been shown to provide superior performance in specific applications [4] [5].An ideal machine learning algorithm for pulse shape discrimination are convolutional neural networks (CNN) [6] as they have the ability to efficiently extract and combine locally correlated features directly from raw data.As a result they have become the established technique for complex image recognition problems and in recent work have been applied in high-energy physics data analysis where they have been shown to provide comparable or improved performance in tasks such as background rejection [7] [8] [9], event reconstruction [10], and event classification [11] [12] [13] when trained on the raw detector outputs.In this section we introduce the optimised CNN architecture used for classifying ES and NS signals, as well as two commonly used PSD techniques: the charge integration and continuous wavelet transform methods whose results are used as a benchmark to compare against.

Charge Integration
The charge integration (CI) method [2] is a a robust and easily interpretable method of pulse shape discrimination.It uses the ratio of pulse area contained in the tail of a pulse to that contained within a short, initial time period.For a signal f (t), a short integration window, Q short = t short 0 f (t)dt, and long integration window Q long = t long 0 f (t)dt are defined and the quantity can be used to discriminate between pulse types.The optimal values of the parameters t short and t long were found to be 245 ns and 497 ns, respectively.

Continuous Wavelet Transform
The continuous wavelet transform (CWT) [3] provides a powerful method of pulse shape discrimination that is capable of utilising information from both the time and frequency domains of a signal.The continuous wavelet transform of a signal f (t) at a scale a and shift b is defined as which can be interpreted as the convolution of the signal with a series of scaled and shifted wavelets ψ(t).The energy density of the wavelet transform at a scale a is defined as which depends heavily on the shape of the signal.Two different scales, a 1 and a 2 , can therefore be chosen to discriminate between different signal shapes using the variables f 1 = P(a 1 ) and f 2 = P(a 1 )/P(a 2 ).A hyperplane in the ( f 1 , f 2 ) space can then be used to select different signal shapes.Using the Ricker wavelet, the optimal values of a 1 and a 2 were found to be 1 and 900, respectively.

Convolutional Neural Network
CNNs are an extension to feed-forward neural networks that are capable of extracting locally correlated features from a multi-dimensional input.This is achieved through the use of convolutional layers and pooling layers.For a one-dimensional input of length k, a convolutional layer is specified by a set of α filters, each with a width m and a set of learnable weights w i (i = 1...m).This layer transforms an input f with c separate channels each of length k through the operation where f β is the β channel of the input, h αβ i are the learnable weights of the filter of the α output channel applied to the β channel of the input, and g is a non-linear activation function.The resulting outputs F α i are the feature maps, which can be interpreted as a non-linear representation of the input that consists of features extracted by the filters.The filters are learnt during a supervised learning process that minimises a loss function, which for ES/NS discrimination is the standard binary logistic loss.The pooling layers perform dimensionality reduction of the feature maps, and therefore reduce the total number of learnable parameters of the network.Max-pooling, defined by taking the maximum value in a small region of the input, is used in this work as it provides a degree of translation invariance.Common CNN architectures use successive convolutional and pooling layers to produce multiple abstract representations of the input, which can then be combined to perform classification.Further explanation of CNNs can be found in [14].
In this work, a CNN architecture was developed that classifies the raw, digitised signals obtained from the experimental set-up introduced in section 2 as either ES or NS signals.For a fair comparison with conventional PSD techniques only one of the SiPM signals was used.A schematic of the complete architecture is given in Figure 4, with the specific parameters of the convolutional and pooling layers given in Table 1.The CNN transforms each signal through two successive convolution and pooling layers (window size of 2), where the non-linear ReLU activation function was used in both convolutional layers.The resulting feature maps are fed into a fully-connected layer with 64 neurons, that each have the ReLU activation.The output of the CNN consists of a single neuron with the softmax activation, and can therefore be be interpreted as the probability of the signal being NS.To classify signals, a threshold must be determined above (below) which all signals will be classified as NS (ES).Keras [15] was used with the Tensorflow [16] backend to train the CNN.The weights of the network were optimised by minimising the cross-entropy loss on a training sample of 12 000 signals with an equal proportion of ES and NS signals.The network was trained for a total of 50 epochs using the Adam optimsier [17]   batch size of 256 samples and initial learning rate of 0.001.

Results
The performance of the CNN as well as the CI and CWT methods was evaluated on a seperate test sample of 3 000 signals. Figure 5 compares the ROC curves of the CNN to the CI and CWT methods on this sample, with the AUC metrics given in Table 2. Errors on the ROC curves and AUC metric are estimated by repeatedly evaluating the performance on random samples of the test set.The CNN is able to achieve a significantly higher level of ES/NS discrimination compared to CWT and CI methods, with an AUC of 0.995 ± 0.003.

Feature Interpretation
The superior performance of the CNN is due to its ability to efficiently extract and combine the features relevent for discrimination, directly from the raw signal.To understand these features, the normalised feature maps of the two convolutional layers are visualised in Fig. 6 for representative NS and ES signals.It can be seen that in the first convolutional layer, most of the filters activate at the peaks of the signals and it can be see that filter 4 activates only at the maximum of each peak.This allows the CNN to encode both the number of peaks in a signal along with their relative amplitudes.Filter 6 appears to activate when the signals are below some threshold, which allows the CNN to extract quantities similar to time-over-threshold.It can be seen that in the second convolutional layer the feature maps become more complex, with multiple features of the first convolutional layer combined.For example, filter 4 appears to activate both on low ampltitude regions of the signal as well as at the steeply rising edges of high amplitude pulses.
Futher understanding of how the CNN interprets the signals can be obtained by visualising the high dimensional representation of each signal that is encoded within the 64-dimensional space of the fully-connected layer.The t-SNE algorithm [18] is used to create a two-dimensional embedding of this space, which preserves a level of the local and global structure of the original space.This allows us to visualise in a qualitative way both the features encoded by the CNN, and to determine clusters of signals that have similar representations.The results are shown in Fig. 7, with each signal in this space coloured by the magnitude of the CNN output.It can be seen that the t-SNE shows the signals spread along an S-shape, forming two distinct ES and NS clusters.Within these clusters, four regions (A, B, C, D) of distinctly different signals have been highlighted, with representative signals in these regions shown in Fig. 8.By examining these regions we find that there is is a continuous substructure of signals with varying energies, that gradually transforms from ES to NS signals, and that the CNN is using both the shape and energy information to discriminate between signals.The upper cluster consists entirely of NS signals, with those close to region A having a large amplitude pulse and a slowly decaying tail with many peaks, characteristic of high energy NS signals.Moving through this cluster we find waveforms with a continuous decrease in the amplitude and length of the NS pulse.
A similar structure is seen for the lower cluster of ES signals.the CNN is not able to confidently classify as either ES or NS.These appear to be a collection of low energy signals that have several prompt peaks, most likely to be the pile-up of single photon avalanches.
The signals highlighted in black are the NS (ES) signals that have a CNN output value of less (greater) than 0.5, and would therefore likely be misclassified when threshold cut is placed on the CNN output.The majority of these events appear in the cluster of ES signals and upon further inspection of these signals, they are clearly ES or pile-up signals that were incorrectly labelled by the simple selection shown in Figure 3.The fraction of misclassified signals is O(1%) and is therefore in agreement with the original estimation of the contamination of the NS sample.This demonstrates that the CNN is robust to, at least, a small proportion of mislabeled training data and can correct the originally mislabelled signals.
Furthermore, it is possible to use the discovered substructure shown in Fig. 8 to further subclassify and divide pulses beyond that of the initial labeling of ES and NS, and could be used for example to remove pile-up or introduce a more suitable set of labels for the signals.To summarise, our investigation suggests that first, it is not necessary to have a perfect set of labels, the CNN is robust to a small proportion of misclassification and second, supervised learning is effective at recognising a wide range of sub-structure in the data.Through the exploration and clustering of data in this low-dimensional embedding, it is possible to further explore and understand a raw detector dataset without requiring unsupervised learning techniques.

Conclusions
In this work we have demonstrated a convolutional neural network architecture that provides superior performance in the classification of digitised signals compared to traditional PSD methods, achieving an AUC of 0.995 ± 0.003 on the set of signals obtained from a single 6 Li PVT detector element.By investigating the representations learned by the CNN we have shown that it is possible to interpret the features extracted by the convolutional layers and therefore gain an understanding of how the CNN discriminates between different signal types.Further to this, we have shown that by visualising the high-dimensional representations it is possible to identify substructure within the signal types, even though the CNN was trained to perform a binary classification task.This approach could be utilised in many other areas of physics data analysis, such as to discover clusters of events in raw detector data without relying on hand-crafted variables.

Figure 1 :
Figure 1: Schematic of the experimental set-up used to record scintillation signals from the PVT cube.

Figure 2 :Figure 3 :
Figure 2: Distribution of PMT signals based on their time ove threshold value versus the amplitude of the maximum peak captured in the waveform.The dashed red line shows the selection cut used for labelling the data.

Figure 4 :
Figure 4: Schematic of the CNN architecture used in this work.The inputs to the network are individual signals of length 1000 samples.Two successive convolutional and pooling layers extract features from the signal which are combined in the fully-connected layer.The output layer consists of a single neuron with the softmax activation, which represents the probability of a signal being NS.

Figure 5 :
Figure 5: ROC curves of the CNN (green), CWT (blue) and CI (red) methods.The shaded bands represent the 1σ deviation on the true positive rate obtained by repeated sampling of the test set.

Figure 6 :Figure 7
Figure 6: Visualisation of the normalised filter outputs of the first (left) and second (right) convolutional layers for representative NS (blue) and ES (red) signals.In the right plot, the input signal has been downsampled to show the correlation with the second convolutional layer.

10 AmplitudeFigure 8 :
Figure 8: Representative signals for each of the labeled regions identified in the 2d t-SNE embedding.A: High energy NS signals.B: Pile up of single photon avalanche signals.C: Low energy ES signals.D: High energy muon signals.

Table 1 :
Parameters of the convolutional and pooling layers of the CNN architecture.

Table 2 :
Area under the curve (AUC) values for each of the PSD methods considered in this work, with 1σ uncertainties given in brackets.