The use of Wasserstein Generative Adversarial Networks in searches for new resonances at the LHC.

In the search for physics beyond the standard model, machine learning classifiers provide methods for extracting signals from background processes in data produced at the LHC. Semi-supervised machine learning models are trained on labelled background and unlabelled signal samples. When semi-supervised techniques are used to train machine learning models, over-training can lead to background events being incorrectly labelled as signal events. The extent of false signals generated must therefore be quantified before semi-supervised techniques can be used in resonance searches. In this study, a frequentist methodology is presented to quantify the extent of fake signals generated in the training of semi-supervised DNN classifiers when confronting the side-band and signal regions. The use of a WGAN is explored as a machine learning based data generator.


Introduction
Machine learning classifiers are mathematical algorithms that are successfully used to discriminate between signal and background processes contained in the data produced at the LHC. The extraction and further analysis of signal processes enables searches for new physics. The semi-supervised technique is used to train machine learning classifiers on labelled background processes without labelling the signal processes of interest. As this method uses unlabelled signal samples, the model reduces potential biases caused by preconceived notions defining the signal and therefore provides potential for exposing physics beyond the standard model. During the training of semi-supervised models, over-fitting can lead to background events being incorrectly labelled as signal events. In order for this method to be used to discover new physics, the extent of error produced, in the form of false signals, must first be quantified. The extent of false signals generated in the training of semi-supervised classifiers is the focus of this study and can be measured using the methodology proposed. When conducting these kinematic scans and/or resonance searches within a given mass range, the significance of observing a local excess of events must consider the probability of observing the excess elsewhere within the range. This is known as the "look elsewhere effect" and must be controlled for in resonance searches, Ref. [1]. The measurement of fake signal must therefore account for this effect using global and local significance measurements. Artificial intelligence, AI, based event generators can be used to scale datasets to have a sufficient number of events, and therefore statistics, to perform a frequentist study, overcome the "look elsewhere effect", and enable more efficient analyses.

Dataset
The Zγ final state dataset is an ideal dataset for this study as it contributes more than 90% of the total background in the production of a Higgs-like heavy scalar decaying to Zγ (pp → H → Zγ), where Z → e+e− or Z → µ+µ−. The Zγ Monte Carlo, MC, dataset, used to train the event generator, was simulated using MadGraph5 [14] and Delphes (v3) [15]. As the study measures fake signal generated in the training of machine learning classifiers, no signal events are included. The study is conducted around a fixed invariant mass, m_ℓℓγ, of 150 GeV. The features which the WGAN is trained to generate are shown in Figure A1. The features used to train and evaluate the semi-supervised DNN model are ΔR_ℓℓ, Δφ_ℓℓ, E_T^miss, φ(E_T^miss), Δφ(E_T^miss, Zγ), N_j, and N_cj.

Wasserstein Generative Adversarial Network as Data Generator
The Wasserstein Generative Adversarial Network, WGAN, with gradient penalty is a machine learning based data generator [16]. The WGAN has repeatedly demonstrated its ability to generate physics events, and is an optimal model for this problem [17]. The WGAN consists of two competing neural networks. The generator network is trained to synthesise the MC dataset as realistically as possible. The critic network is trained to classify whether a dataset accurately reflects the statistics of the MC training data. A detailed description of the WGAN loss functions can be found in Appendix A.
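As a rough illustration of the objectives the two networks optimise, the WGAN-GP losses can be sketched as follows. This is a minimal numpy sketch, not the paper's implementation: the function and argument names are illustrative, and the gradient norms are assumed to be pre-computed on samples interpolated between real and generated events.

```python
import numpy as np

def critic_loss(d_real, d_fake, grad_norms, lam=10.0):
    """WGAN-GP critic objective (to be minimised): the Wasserstein term
    plus a gradient penalty that keeps the critic approximately 1-Lipschitz."""
    wasserstein = np.mean(d_fake) - np.mean(d_real)
    penalty = lam * np.mean((grad_norms - 1.0) ** 2)
    return wasserstein + penalty

def generator_loss(d_fake):
    """Generator objective: push the critic's scores on generated events up."""
    return -np.mean(d_fake)
```

In practice the two losses are minimised in alternation, with several critic updates per generator update.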

Methodology
The frequentist methodology consists of repeating a given pseudo-experiment sufficiently many times to understand the likelihood and extent of false signals generated in the over-training of semi-supervised DNN classifiers. The objective of the pseudo-experiment is to quantify fake signal generated in the training of a semi-supervised DNN classifier. To achieve this, a pre-trained data generator is used to produce statistically independent background events for each experiment. The events are used to train a DNN, and a background rejection scan is applied to the response. The invariant mass, m_ℓℓγ, of each background rejection batch is fit to extract the significance of signal found around the fixed invariant mass of 150 GeV.

Pre-Trained Data Generator
The WGAN is trained on fast simulation Monte Carlo Zγ events, and its hyper-parameters are optimised to produce realistic events. For each pseudo-experiment the pre-trained WGAN is used to generate 200,000 events. The quality of the generated kinematic feature distributions is evaluated visually and with the Kolmogorov-Smirnov, KS, score and the bin-wise relative difference, RD, metrics. The correlations of the generated dataset are evaluated visually and with a metric of the maximum difference in correlation between real and generated events. The optimised architecture and hyper-parameters of the pre-trained WGAN are described in Appendix A.1.
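The KS score and the bin-wise relative difference could be computed as below. This is a minimal numpy sketch; the binning choice (50 bins over the common range of the two samples) is an assumption, not the paper's exact configuration.

```python
import numpy as np

def ks_score(real, gen):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum distance
    between the empirical CDFs of the real and generated samples."""
    xs = np.sort(np.concatenate([real, gen]))
    cdf_r = np.searchsorted(np.sort(real), xs, side="right") / len(real)
    cdf_g = np.searchsorted(np.sort(gen), xs, side="right") / len(gen)
    return np.max(np.abs(cdf_r - cdf_g))

def binwise_relative_difference(real, gen, bins=50):
    """Per-bin |real - generated| / real on normalised histograms;
    bins empty in the reference histogram are skipped."""
    lo, hi = min(real.min(), gen.min()), max(real.max(), gen.max())
    h_r, edges = np.histogram(real, bins=bins, range=(lo, hi), density=True)
    h_g, _ = np.histogram(gen, bins=edges, density=True)
    mask = h_r > 0
    return np.abs(h_r[mask] - h_g[mask]) / h_r[mask]
```

A KS score near zero and small per-bin relative differences indicate that the generated feature distribution tracks the MC distribution well.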

Semi-Supervised DNN Training
The DNN is trained on the generated Zγ background data. The dataset is divided into two samples for training using the invariant mass: the mass-window and the side-band. The mass-window, or signal region, is defined as events with 144 ≤ m_ℓℓγ ≤ 156 GeV. The side-band, or background region, is defined as 132 ≤ m_ℓℓγ < 144 GeV or 156 < m_ℓℓγ ≤ 168 GeV. As there is no signal in either sample, the DNN should not find significant separation between the samples. A DNN model optimised for semi-supervised classification, Ref. [18], is used in this analysis. The DNN is trained for 100 epochs, using a learning rate of 1 × 10^-3 and a batch size of 256. The DNN architecture consists of three hidden layers, each with 200 nodes and a ReLU activation function, and a single output node with a sigmoid activation function.
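The mass-window and side-band assignment described above can be sketched as a simple labelling function (numpy sketch; the convention of 1 for the signal region, 0 for the side-band, and -1 for events outside both regions is an assumption):

```python
import numpy as np

def semisupervised_labels(m_llg):
    """Assign semi-supervised training labels from the invariant mass (GeV):
    1 for the mass window (signal region), 0 for the side-bands,
    -1 for events outside both regions (dropped from training)."""
    m = np.asarray(m_llg, dtype=float)
    labels = np.full(m.shape, -1)
    window = (m >= 144.0) & (m <= 156.0)
    sideband = ((m >= 132.0) & (m < 144.0)) | ((m > 156.0) & (m <= 168.0))
    labels[window] = 1
    labels[sideband] = 0
    return labels
```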

Background Rejection Scan
In order to evaluate local signal excesses within the fixed mass range, a background rejection scan is used. Events are extracted from the DNN response distribution in batches of 50, 60, 70, 80 and 90% of the events. For each batch, events are mapped to their corresponding invariant mass distribution. Each batch's invariant mass distribution is fit with an exponential function, f(x), which exposes the distribution of background, B, events. A second fit, using the exponential function with an added Gaussian, g(x), is applied, with the Gaussian centred at the fixed invariant mass, 150 GeV, and σ equal to the resolution of the dataset, 2.4 GeV. The Gaussian is therefore able to represent any signal, S, events found within the mass window. As there are no signal events within the analysis dataset, any signals found can be assumed to be generated in the training of the DNN.
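A background rejection scan of this kind might be sketched as follows, interpreting the quoted percentages as the fraction of lowest-response events rejected in each batch (this interpretation, and the function names, are assumptions):

```python
import numpy as np

def background_rejection_batches(dnn_response, m_llg,
                                 fractions=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """For each rejection fraction, keep the events with the highest DNN
    response and return their invariant-mass values for fitting."""
    response = np.asarray(dnn_response, dtype=float)
    mass = np.asarray(m_llg, dtype=float)
    batches = {}
    for frac in fractions:
        cut = np.quantile(response, frac)  # reject the lowest `frac` of events
        batches[frac] = mass[response >= cut]
    return batches
```

Each returned mass array would then be fit with f(x) and f(x) + g(x) as described above.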

Signal Significance Calculation
The local signal significance, σ_k, for each background rejection batch, k, can therefore be calculated using Equation 1:

    σ_k = S / √B = (∫_a^b g(x) dx) / √(∫_a^b f(x) dx),    (1)

where a and b are the minimum, 132 GeV, and maximum, 168 GeV, of the invariant mass range, respectively.
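Numerically, the yields S and B can be obtained by integrating the fitted Gaussian and exponential components over [a, b] and combining them as S/√B, a common approximation for the local significance. The sketch below assumes that reading, with illustrative parameter names for the fitted amplitudes:

```python
import numpy as np

def local_significance(signal_amp, mu, sigma, bkg_norm, bkg_slope,
                       a=132.0, b=168.0, n_points=2001):
    """Approximate local significance sigma = S / sqrt(B), where S and B are
    the fitted Gaussian and exponential yields integrated over [a, b] GeV."""
    x = np.linspace(a, b, n_points)
    gauss = signal_amp * np.exp(-0.5 * ((x - mu) / sigma) ** 2)
    expo = bkg_norm * np.exp(bkg_slope * x)
    dx = x[1] - x[0]
    # trapezoidal integration of each component
    S = np.sum((gauss[1:] + gauss[:-1]) * 0.5 * dx)
    B = np.sum((expo[1:] + expo[:-1]) * 0.5 * dx)
    return S / np.sqrt(B)
```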

Results and Conclusions
The resulting feature distributions and correlations of the pre-trained WGAN used in this study are shown in Appendix A.2. The frequentist study consists of repeating the pseudo-experiment sufficient times to evaluate a 3σ effect. To this end the pseudo-experiment must be repeated more than 5 × 10^5 times. Each pseudo-experiment produces a local signal significance for each background rejection batch. For an initial evaluation, the pseudo-experiment is repeated 1000 times. The local and global signal significances can be understood using their frequency distributions, Figures 1 and 2. The probability of achieving a 3σ significance can therefore be calculated as the fraction of experiments with events of |σ| > 3. During the initial evaluation, no significance above 2.5 was achieved. Although this is a positive indication, enough pseudo-experiments must be run before results can be concluded.
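The fraction of pseudo-experiments exceeding a given significance threshold, together with a simple binomial uncertainty on that fraction, could be computed as follows (illustrative sketch):

```python
import numpy as np

def fake_signal_rate(local_sigmas, threshold=3.0):
    """Fraction of pseudo-experiments whose largest local significance
    (over all background-rejection batches) exceeds the threshold,
    with a simple binomial standard error on that fraction."""
    sig = np.asarray(local_sigmas, dtype=float)  # shape: (n_experiments, n_batches)
    exceed = np.any(np.abs(sig) > threshold, axis=1)
    p = exceed.mean()
    err = np.sqrt(p * (1.0 - p) / len(exceed))
    return p, err
```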

Figure A2. Final feature correlation comparison of MC data and WGAN generated data.