Convolutional Neural Networks for Direct Detection of Dark Matter

,


I. INTRODUCTION
The mystery of Dark Matter is one of the main motivations to search for physics beyond the Standard Model of Particle Physics.The detection of Dark Matter interactions beyond gravitational would be a crucial step for both the Particle and Astroparticle Physics communities.Direct Detection experiments such as XENON1T search for instances where Dark Matter particles scatter with Standard Model (SM) particles and transfer energy to them inside a detector.
For particle Dark Matter direct detection experiments, the main building block is cryogenic material single-phase time projection chamber (TPC), dual-phase TPC, and bubble chambers.Among the leading direct detection experiments, TPC is used in DarkSide-50 [1], LUX [2], PandaX-II [3] and XENON1T [4].In this work, we focus on the output of TPC as a part of the XENON1T experiment.The light signals recorded by the photomultiplier tubes (PMTs) due to the prompt scintillation and secondary scintillation are used to infer the type of interactions, namely to distinguish WIMP and background events.The dominant background sources are beta particles, neutrons and gamma-ray photons.
Typically, in order to achieve a large signal-tobackground ratio in the data, one requires a substantial number of cuts to the data to be performed based on certain discrimination parameters.To squeeze as much signal as possible, it is crucial to improve on recall of anomalous signal events, and more effective limits to these discrimination parameters are sought in an attempt to improve the detection probability of such anomalous events.
Machine Learning (ML) provides a unique and flexible alternative to these rigid cut-based methods.In particular, Deep Learning models such as Convolutional Neural Networks (CNN) are able to heuristically learn patterns in highly-complex input space, leading to an ability to detect anomalous signals without the need to manipulate or remove as much data.In this paper, a novel, Deep Learning model is developed using a CNN which can discriminate between simulated background and WIMP waveform images from the XENON1T experiment.
In the field of Particle and Astrophysics, Deep Learning is showing promising results (see e.g.[5]).Convolution Neural Networks has also been found very efficient for simulating the Dark Matter in N-body simulations of the galaxies [6].They also offer improved sensitivity for cosmological observations from weak lensing maps [7].Machine learning shows promising reach for disentangling among collider Dark Matter searches [8], using substructure based Dark Matter probes for noncollider searches [9][10][11], and for cosmological Dark Matter [12].This paper is organised as follows.In Section II we describe the XENON1T experiment, explaining how the time projection chamber is used to look for proposed WIMP events and the types of backgrounds.Section III is devoted to the process of simulating Dark Matter and electronic recoil background events using the open source data processing software created by the XENON collaboration.The architecture of the CNN used is explained in section IV, as well as the training and testing procedure for the model.In the last section V, we discuss the results.

II. WIMP AND BACKGROUND EVENTS AT XENON1T
The XENON experiment is an underground research facility, operated at the Laboratori Nazionali del Gran Sasso (LNGS) in Italy.Starting in 2006, the XENON10 experiment used a 25 kg (14 kg target mass) Xenon dual phase time projections chamber (XeTPC) to search for WIMPs [13].This was followed by the XENON100 experiment in 2008, a liquid Xenon TPC (LXeTPC) containing 62 kg target mass of LXe (161 kg total) [14].The most recent experiment is the XENON1T experiment, a 3.2 tonne LXeTPC with a fiducial volume of roughly 2 tonnes [4].XENON1T is designed for detecting nuclear recoils (NRs) from WIMP particles scattering off the Xe nuclei [15].
The TPC used in the XENON1T experiment was built roughly 97 cm long by 96 cm wide and contained (2004 ± 5) kg of LXe, with another 1200 kg of LXe used as shielding [15].The TPC was then placed in a 10 m tall water tank with a diameter of 9.6 m, used to further shield it from any background radioactivity.A vital part of a TPC is the photomultiplier tubes.When a photon hits the photocathode in the PMT it produces electrons which are then accelerated by a high-voltage field.The number of electrons increases within a chain of dynodes by secondary emission.A total of 248 PMTs are used in the TPC to record signals.They are split into the top array (which contains 127 PMTs) and bottom array (121 PMTs), in order to maximise the scintillation-light collection efficiency [4].Fig. 1 shows the PMTs and the TPC used in the XENON1T experiment.The PMTs with the highest quantum efficiency were placed in the centre of the bottom array since that is where most of the scintillation light is collected.More information about the XENON1T TPC can be found in [4].If an incoming particle interacted with the liquid Xenon, it would produce a scintillation of light and ionization.The S1 signal is the light seen by the top and bot-tom PMTs (due to total internal reflection at the boundary).The electron charges that were released during ionisation then drift upwards towards the gaseous Xenon (GXe) due to the electric field between the cathode gate and anode.They are then extracted by a stronger extraction field, E extraction , creating another larger scintillation light signal (S2) seen by the top PMTs.The position of the original interaction can be reconstructed in 3 dimensions by using the S2 signal pattern (giving the lateral position) and the difference in time between S1 and S2 (depth of interaction) [4].
However, there are many other processes that can lead to the creation of a light signal within the XENON1T experiment.For example, in order to reduce cosmic rays reaching the Xenon, the experiment was carried out deep underground, under the Italian Apennines.To reduce natural background radioactivity the TPC was submerged in a water tank and made from shielding material.Furthermore, only events that occurred within the inner tonne of LXe were used, allowing for the rest of the LXe to be used as more shielding.This was possible due to the large mass density of LXe (almost 3000 kg/m 3 ), meaning particles, such as gammas, can only travel for a short distance before being stopped [16].
Despite the shielding, there are six types of background events that can be detected within the search region.Table I shows the expected number of each of these background events, as well as the expected number of events for a spin-dependent 500 GeV/c 2 WIMP with a crosssection of 10 −45 cm 2 , over the time period of the first science run of XENON1T (34.2 live-days).This WIMP benchmark is chosen among the allowed values by direct, indirect and collider Dark Matter searches [17].The expected number of events were calculated using Laidbax [18] and the blueice Monte Carlo model [19].These results agree with the XENON1T collaboration paper ( [15]), however, they used a WIMP mass of 50 GeV/c 2 and a 10 −46 cm 2 cross-section.
The particles in the detector release energy in the form of a nuclear or electronic recoil (NR or ER).This means the incoming particle will either scatter directly from the nucleus of the target atom, or it will interact with the electron cloud [16].Since we expect WIMPs to have a very low interaction probability (low cross-section), they should only cause nuclear recoils.Therefore, our ability to differentiate between NR and ER is very important when searching for WIMPs.Table I shows that the number of expected WIMP events is much higher than for any of the NR background.Hence, in this work we have focused on comparing the ER background with WIMPs WIMP mostly causes the Nuclear Recoils (NR) but there are some backgrounds also which can have NR.One way the XENON1T experiment cut down on NR background was to only look at low energy, single-scatter nuclear recoils, since those are the type we expect WIMPs to create.To do this, only events that had at least one S2 signal above 200 single photoelectrons (PE) were included and the signal duration of S2 had to be consistent with the interaction depth (calculated from the drift time) [20].

FIG. 2. Illustrative plot of cS1 and cS2 for ER (blue) and WIMP (black) interactions.
The XENON1T was designed as an ultra-low background experiment including high rejection of ER backgrounds.Even when set to 50% NR acceptance the XENON1T could detect WIMP-like events while also rejecting roughly 99.8% of all the ER events.The main source of the ER background is the beta decays of 85 Kr and 214 Pb, which causes a flat energy spectrum within the interested range [20].
When reconstructing the energy deposits of particles interacting with the LXe, the S1 and S2 signals need to be corrected in case of any time or spatial dependent signal losses.The light signal, S1, is corrected for the (x,y,z)-dependent light collection efficiency in the TPC (cS1) [21].The charge signal, S2, is corrected for the time and depth dependent electron lifetime since electrons can be lost while drifting in the LXe if they attach to impurities within the Xenon (cS2) [21].In Fig. 2 we plot the distribution of these events for ER and WIMP interactions.

III. EVENT SIMULATION
In the following we introduce the three types of images that were used during our analysis, and explain how we generated them using PaX (Processor for Analysing XENON) software, created by the XENON collaboration At the first step, we generate the energy spectrum of the Dark Matter particle, using wimprates [24].Given the proposed mass and cross-section of a WIMP, wimprates produces the differential rates of a WIMPnucleus scattering in the standard halo model, within a liquid Xenon detector [24].In this work, we used a WIMP mass of 500 GeV −1 and a cross-section of 10 −45 cm −2 for illustration.More details on how to set-up the simulation are given in App. A.
Next, Laidbax (Likelihood-And Interpolated Density Based Analysis for XENON) [18] is used to convert the energy spectrum into a model that is compatible with the PaX [22] software which we use to create the graphical output of the simulated WIMP and background events.PaX has an in-built simulator called FaX (Fake XENON) which requires an input csv file consisting of a set of variables for each interaction to construct the waveform of the event.These variables are: instruction (simply a number given to each interaction); recoil type (NR or ER); x-position, y-position and depth of the interaction (in cm); number of photons produced; number of electrons produced; time of the interaction (in ns).
The ER background energy spectrum we used came pre-built in Laidbax.Note, though, that the models used in Laidbax are not the official models approved by the XENON collaboration, which include several more uncertainties.For example, the ER background model is a polynomial ER fit [20] and the yields for the NR background model are specified by the parameterisation of the global fit found in [25].The Laidbax model produces a database of simulated interactions.The parameters of the interactions include; radial distance r 2 , angle θ, z-coordinate, number of photons produced, number of electrons produces, the cS1 and cS2 values (corrected S1 and S2 signals).As mentioned earlier, in Fig. 2, simulated WIMP and ER background events are shown.We can see the overlap between these two types of events.To convert the Laidbax model to the input file for FaX, a series of calculations and assumptions have to be made.For example, the polar coordinates need to be converted to cartesian coordinates.The time variable is not given by Laidbax, hence it is assumed that the actual time of the interaction is not relevant, as long as it occurred within a particular time range that an event is defined by.

A. Image Processing
The original graphical output of PaX [22] is shown in Fig. 3.The images show the largest S1 and S2 peaks (top left), the hitpatterns for the top and bottom PMTs (top right), all the S1 and S2 peaks in the event (middle), and the PMTs that detected the signal (bottom).
In order to use the images as an input to CNN, we edited them to remove unnecessary features that are the same in every image so that the CNN can focus on features that are unique to a WIMP or background event.First, we looked at whether we could differentiate a WIMP event from the background just using the S1 and S2 hitpatterns.The images were edited to remove the text labels in order to increase effective learning.The final image produced is 800×400 pixels and an example is shown in Fig. 4. Next, we looked at the largest S1 and S2 peak graphs.Fig. 5 shows an example of the edited peaks for a WIMP event.Finally, we combined the hitpatterns and peak graphs into one "HitPeak" image of size 800×800 (shown in Fig. 6).The similar image for the ER is shown in Fig. 7; comparing this to Fig 6 we can see that both events are visually very similar, hence the need for Machine Learning to differentiate them.

IV. CNN ARCHITECTURE
This section discusses the details of a Convolutional Neural Network used in this work (python code can be found in the github repository [26]).
The majority of Machine Learning problems involve a dataset X={(y i ,x i ), i=1,...,N}, a model g(θ) with parameters θ, and a cost function C(X,g(θ)) (also known as the loss function).The cost function allows you to judge how well the model performs on the given dataset.A model is fitted by calculating the value of θ that minimises the cost function.
The most common way to minimise the cost function is to use Gradient Descent; an algorithm that finds the local or global minima of a function.The parameters are adjusted in the direction where the gradient of the cost function is large and negative, and then the gradient is recalculated in the new position.After each iteration the model gradually converges towards a minimum (where any changes to the parameters will produce little or no change in the loss) resulting in the weights being optimised.Given the cost function C(θ i ), it simultaneously updates for each i = 0,...,n until convergence is reached: The learning rate, η, controls how large each step is taken during gradient descent.
Neural networks (NN) contain multiple neurons3 stacked into hidden layers.The output of each layer then serves as the input for the following one.Each neuron takes a vector of inputs, x, and produces a scalar output a i (x).The function a i depends on the NN but it can be separated into a linear operation (which weighs the importance of the inputs) and a non-linear transformation (performed by an activation function).In this paper we used a Leaky ReLU (Rectified Linear Unit) activation function.
A NN calculates the gradient of the cost function using backpropagation.This algorithm contains a forward pass (going from the input layer to output layer), calculates the weighted inputs and activations for all the neurons, and then backpropagates the error (output to input layer), calculating the gradients.
A Convolutional Neural Network is a type of Neural Network Machine Learning algorithm that primarily takes images as its input and assigns weights and biases to different parts of the image.A CNN is comprised of many layers of different types, including Convolutional Layers, Pooling Layers, and Fully-Connected (FC) Layers.The convolutional layer is used to extract features from the input image by passing a filter (kernel) over the image matrix.To perform different operations on the image, such as edge detection or sharpening, different types of filters are used.The layer outputs the image matrix at a reduced volume, depending on the size of the filter.A nonlinear activation function is applied after each Convolutional layer.
Pooling layers are used to reduce the dimensions of the data by combining the output of a neuron cluster in one layer to a single neuron in the next.The FC layer connects the neurons to all the activations in the previous layer.
FIG. 6. Example of a 800×800 HitPeak image showing both the hitpatterns and largest peaks together from a simulated WIMP event.The axis labels and numbers are removed so all images are in the same style when used in the CNN.
The final layer is non-linear classification layer and assigns decimal probabilities to each mutually exclusive class.
Before running the images through CNN, we scale down the image resolution.The original images had a resolution of 800×400 for the individual Hit and Peak graphs and 800×800 for the images showing both (Hit-Peak).A lot of computational power would be needed to run these images through the CNN due to the large number of parameters.We chose to scale the HitPeak images to 75x75 resolution and the separate Hit and Peak images to 100x50.Next, the images are converted into a single array, the shape of which depends on the image spatial resolution.Each pixel is one of 256 possible values (0-255) since the images are 8-bit, i.e. 256 = 2 8 .Most Machine Learning algorithms perform better on small, floating point values instead of large pixel values.Hence, we scale the image pixels to between 0 and 1 by dividing by 255.Each image is labelled as signal or background since we are using supervised learning.The data is split into training and test sets.We separated 90% of the images into the training set with the other 10% into the test set.A further step is to create another subset from the training data, assigning 20% of the training set as the validation set.The validation set provides an unbiased evaluation of the models fit on the training data while also tuning the hyperparameters 4 .The final model can then be used on the test dataset.
The CNN was created using the Keras Sequential model within the Tensorflow [27] environment to linearly stack the layers.The number of convolution layers and number of neurons are two of the hyperparameters we tested during this project (see Sections IV A 2 and IV A 3).Each convolution layer contains an L2 weight regularizer ("kernel regularizer") where the alpha value5 is 0.005.Regularization is used to prevent overfitting due to intrinsic noise.
Each convolutional layer also includes a LeakyReLU activation function and a pooling layer.Max pooling, with a pool size of 2x2, is used to reduce the spatial dimensions of the output volume.This means after each pooling layer the output volume is half that of the input volume.
The classification stage has the most parameters and so requires a "dropout" regularisation layer to prevent overfitting (with the dropout rate set to 25%).The data is flattened into a 1-dimensional array to connect the 2-dimensional convolutional and pool layers to the 1dimension FC layers.The output layer contains a single neuron which is used to make predictions.A sigmoid activation function produces a probability output between 0 and 1.
Finally, the model is compiled using an Adam optimiser [28] to minimise the cost function.Adam (derived from 'adaptive moment estimation') computes adaptive learning rates for each parameter.Binary cross entropy was used for the loss function (since we were completing a binary classification task).Fig. 8 shows the architecture of CNN, created using Tensorboard6 .Finally, the model is fitted with the defined parameters and the accuracy and loss for the test dataset are calculated.

A. Testing the CNN
We simulated 10,000 events for both WIMP and ER particles, making a total of 20,000 images.Most of the data (90%) was included in the training data set with the other 10% in the test set.The training set was then split again to include a validation set (14,400 training and 3600 validation images).The test accuracy and loss (value of the cost function) are given for each test.
The CNN assigns random weights and biases at the beginning of each iteration and so there will always be a slight variation in results after each run.Therefore, the model will give slightly different results in each test, even when using the same parameters.The CNN was run in a CPU TensorFlow environment.
In this project we compared the effect of using the three types of images as well as multiple different hyperparameters, including the number of neurons in a convolution layer, number of convolutional layers in the CNN, the batch size, and the number of epochs.

Image Type
First, we compared how the type of image affects the CNN output.To do this we tested each of the three image types (Hit, Peak and HitPeak) using a one convolution layer CNN with 20 neurons over 40 epochs, with a batch size of 100.Fig. 9 compares the accuracy and loss for the training and validation datasets of the three images.Table II shows the accuracy and loss for their test sets.We can see that the HitPeak images give far better results than the seperated images.Note, the HitPeak images have slightly more parameters (519,181) than the seperated images (471,181) due to the shape of the original figures.The rest of the tests discussed in this paper use the 75x75 HitPeak images.

Number of Neurons
Next, we explore how the choice of neuron number affected the performance of the CNN.The number of neurons represents the number of filters that run in parallel for a given input, allowing the model to learn multiple features.To do this we used a CNN with one convolutional layer and set the neuron number to 5, 10, 20, 40 and 60.Fig. 10 shows the training and validation results for the different number of neurons for the HitPeak images over 40 epochs with a batch size of 100.Table III shows the test accuracy and loss for each neuron number as well as the number of parameters in the CNN.The results show that the test accuracy increases with increasing neuron number before peaking at 20 neurons (0.865) and then decreasing.This is due to the fact that while more neurons can express more complicated function, using too many neuron on a simple dataset can lead to overfitting on the test data.

Number of Convolutional Layers
Having multiple convolutional layers allows a model to extract more complex features.The earlier layers (closer to the input image) learn the lower level features, with the complexity of the features increasing with each layer.To observe how the number of convolutional layers affect our model, CNNs with 1, 2 or 3 layers were used.Each layer contained 20 neurons since that was the optimal number found for one convolutional layer in the previous test.
The results, shown in Fig. 11 and Table IV, show that two convolutional layers give the best test accuracy (0.870) and loss (0.372), as well as the best training/validation accuracy and loss.Since our images are not very complex they do not have many features, and so a very deep CNN with three or more layers is not needed.This conclusion was also reached when increasing the number of neurons in each layer.
As the number of convolutional layers increases, the total number of parameters decreases (Table IV).This is because there is a pooling layer associated with each convolutional layer which reduces the shape of the output volume.Hence, even though each convolution layer introduces new parameters, the number due to the output volume (when the data is flatten into a 1D array) will be much less than for a CNN with one convolutional layer.

Batch Size
We tested the 75×75 HitPeak images over 40 epochs using the CNN with two convolutional layers (with 20 neurons each) using different batch sizes; 50, 100, 200, 500 and 700.Results are shown in Table V and Fig. 12.
We can see that the smaller the batch size, the faster the model will converge to a "good" solution (steeper curve at the beginning), since the model will start learning before seeing all the data.However, this does not guarantee the model will converge to the best possible result (as it would when using the whole dataset).The best test accuracy (0.870) and loss (0.385) were found using a batch size of 100, hence this batch size was used in the future tests.In our tests we have found that a CNN with two convolutional layers (both containing 20 neurons) using the HitPeak images scaled to 75×75 for 40 epochs and a batch size of 100, gives the best results overall.Finally, we test the CNN with all the optimal parameters.Fig. 14 shows the ROC Curve for the optimal CNN.Our final run gives an accuracy of 87.0% and loss of 36.7% for the HitPeak images.We also show the ROC curves for separate Hit and Peak images.

V. CONCLUSIONS AND OUTLOOK
In this work, we have introduced a new method of differentiating between proposed Dark Matter (WIMP) events and the background events (electronic recoils) in the XENON1T experiment using the graphical output of PaX.The CNN correctly identified a proposed WIMP event 87.0% of the time.When testing our CNN, we found that images which showed both the hitpattern and largest peak figures (HitPeak) gave the best accuracy, compared to images showing only one or the other.
We found that 20 neurons is the optimal number of neurons in a CNN with one convolutional layer (0.865 accuracy and 0.395 loss).The number of neurons in a convolution layer represents the number of filters used.The higher the neuron number, the higher the total number of trainable parameters.Since our images are not very complex, the CNN does not require many neurons.The best results were obtained when using a CNN with two convolution layers with 20 neurons in each (0.870 accuracy and 0.372 loss).This test was also carried out using different numbers of neurons in each layer (first layer having 20 neurons, then increasing by 20 each layer); the results followed the same pattern as using a constant number of neurons.The optimal batch size was found to be 100 (0.869 accuracy, 0.385 loss).The smaller batch sizes reached their maximum performance quicker than the larger ones since the model will start training before all the data is seen.If the batch size is too large it can lead to poor generalisation, resulting in lower test accuracy.Finally, 40 epochs gave the best results overall (0.870 accuracy and 0.370 loss).One could run CNN using a GPU (Graphics Processing Unit) which would be faster than the current CPU and would be able to use higher resolution images, possibly resulting in higher accuracy.This set-up could be used for the real Dark Matter and background models created by the XENON group to simulate more realistic events.The CNN can then be adapted and improved using this data, with the aim of running CNN using real data generated during the XENON1T experiment.If this stage is successful then the method could be used in the next XENON collaboration experiment; the XENONnT (an upgrade of the XENON1T).Experiments aiming for the detect Dark Matter via direct detection focus on the particular regions of the detector to improve the signal and background ratio.This method does not require any reduction in the data sets.It just tries to differentiate the events based on the PMTs signals.In fact same idea could be extended for other TPCs based detectors.Finally, this work could be further extended by the ML based models to identify the rare WIMP events among the many different type of background events, not just the dominant ER background.
VI. ACKNOWLEDGMENTS C.K.K. wishes to acknowledge support from the Royal Society-SERB Newton International Fellowship (NF171488).The work of V.S. by the Science Technology and Facilities Council (STFC) under grant number ST/P000819/1.We would like to thank Michael Soughton and Adam Matthews for many fruitful discussions.We would also like to acknowledge Prof. Christopher Tunnell and Dr Jelle Aalbers for their help in explaining how we could use PaX, and the other open-source software they created, to carry out this work.

FIG. 3 .
FIG. 3. Example of the graphical output of PaX for a simulated WIMP event.Top two plots on the left: largest S1 and S2 peaks.Top two plots on the right: hitpatterns for the top and bottom PMTs.Middle plot: S1 and S2 peaks in the event.Bottom plot: Red dots represent PMT channels that have detected coincident signals, whilst the green dots represent signals detected in a lone PMT.

FIG. 4 .
FIG.4.Example of the edited 800×400 hitpattern image from a simulated WIMP event.The axis labels and numbers are removed so all images are in the same style when used in the CNN.

FIG. 9 .
FIG. 9. Accuracy and Loss for the training and validation data for each type of image (Hit, Peak and HitPeak).

FIG. 10 .
FIG.10.Accuracy and Loss dependence on number of neurons in a single convolution layer(5, 10, 20, 40 and 60).20 neurons give the best results overall.

TABLE I .
1. Expected number of events for each type of background over the time period of the first science run of XENON1T (34.2 live-days) within the fiducial mass and a 500 GeV/c 2 , 10 −45 cm 2 WIMP (Generated using Laidbax).

TABLE II .
Test accuracy and loss for the three types of image; Hit, Peak and HitPeak.

TABLE III .
Test accuracy and loss for different neuron numbers in a CNN with one convolutional layer for HitPeak images.

TABLE IV .
Test accuracy and loss for 1, 2, or 3 convolutional layers for HitPeak images.There are 20 neurons in each convolution layers.

TABLE VI .
Test accuracy and loss for different number of epochs (CNN has two convolutional layers with 20 neurons each and batch size of 100).