ELUQuant: Event-Level Uncertainty Quantification in Deep Inelastic Scattering

We introduce a physics-informed Bayesian Neural Network (BNN) with flow-approximated posteriors using multiplicative normalizing flows (MNF) for detailed uncertainty quantification (UQ) at the physics event level. Our method identifies both heteroskedastic aleatoric and epistemic uncertainties, providing granular physical insight. Applied to Deep Inelastic Scattering (DIS) events, our model effectively extracts the kinematic variables $x$, $Q^2$, and $y$, matching the performance of recent deep-learning regression techniques but with the critical enhancement of event-level UQ. This detailed description of the underlying uncertainty proves invaluable for decision-making, especially in tasks like event filtering, and allows for the reduction of true inaccuracies without direct access to the ground truth. A thorough DIS simulation using the H1 detector at HERA indicates possible applications for the future Electron Ion Collider (EIC), and paves the way for related tasks such as data quality monitoring and anomaly detection. Remarkably, our approach processes large event samples at high rates.


Introduction
In experimental nuclear (NP) and high-energy physics (HEP), data analyses typically regress fundamental quantities from observables measured in events produced and detected by experiments. A crucial aspect of these analyses is the corresponding event-level uncertainty quantification (UQ). The method introduced in this work, ELUQuant (dubbed ELUQ), to our knowledge pioneers this in NP/HEP by gleaning insights from computer vision [2] and Multiplicative Normalizing Flows (MNF) in Bayesian Neural Networks (BNN) [1], effectively capturing both heteroskedastic aleatoric and epistemic uncertainties, which influence the regression of fundamental quantities from measured observables. Deep inelastic scattering (DIS) has recently benefited from deep learning (DL) techniques. An innovative study by Diefenthaler et al. [3] employed deep neural networks (DNN) to infer the kinematic variables $Q^2$ and $x$ of neutral-current DIS, improving on traditional reconstruction methods through correlations revealed in the simulated datasets of the ZEUS experiment. Subsequently, Arratia [4] applied DNNs, capitalizing on the full kinematic information of both the scattered electron and the hadronic final state, to reconstruct the kinematics of neutral-current DIS events using H1 experiment simulations. Though both papers signify critical advancements in leveraging DNNs for DIS, they did not delve into the domain of UQ. Our endeavor with ELUQ distinctively bridges this gap and highlights the potential of detailed event-level UQ, a novelty among the referenced works. This methodology holds promise for any physics analysis demanding nuanced UQ. This manuscript is structured as follows: Sec. 2 introduces the DIS kinematics, the chosen case study, and both quantified uncertainty sources (aleatoric and epistemic); Sec. 3 delves into the ELUQ architecture, detailing its loss function, training procedures, and inference performance; Sec. 4 reports the results we obtained using H1's neutral-current DIS Monte Carlo dataset, also used in [4]. Finally, Sec. 5 evaluates the broader impacts, emphasizing the effectiveness of ELUQ in event-level UQ and its potential applications to data quality monitoring and anomaly detection.

Kinematic Reconstruction of DIS
Deep inelastic scattering (DIS) is a reaction used to probe the internal structure of hadrons. In this process, high-energy leptons are scattered off hadrons, revealing intricate details about quarks and gluons [5]. Historically, the experiments conducted at the HERA collider, which remains the only electron-proton collider ever constructed, have been instrumental in DIS studies [6,7]. The forthcoming Electron Ion Collider (EIC) [8] promises to venture into previously uncharted regions of the DIS kinematic spectrum. Fig. 1 depicts the DIS process, where $k$, $k'$, and $P$ are the four-momenta of the incoming electron, the scattered electron, and the proton, respectively, and HFS denotes the hadronic final state.
DIS kinematics involve: the squared four-momentum transfer $Q^2 = -q^2 = -(k - k')^2$; the inelasticity $y = \frac{q \cdot P}{k \cdot P}$, indicating the fraction of the electron's energy transferred to the nucleon; and the Bjorken scaling variable $x = \frac{Q^2}{sy}$, giving the momentum fraction carried by the struck quark. The kinematic variables are related by $Q^2 = sxy$, where $s = (k + P)^2$ is the squared center-of-mass energy. Momentum and energy conservation in DIS kinematics make it possible to calculate $x$, $Q^2$, and $y$ from measurements. Classical methods for their reconstruction differ (cf. [3,4,9]). We compare our results with methods such as Electron (EL), Double Angle (DA), and Jacquet-Blondel (JB). As highlighted in [3,4], the DIS process can be influenced by several factors, such as initial-state and final-state radiation (ISR, FSR). Moreover, higher-order QED and QCD corrections can also manifest in the process. Each reconstruction method has its strengths across the phase space and its own sensitivities to radiative effects. For instance, EL uses only measurements of the scattered lepton and excels at high $y$ but falters at low $y$. In contrast, JB uses only the HFS and performs better at low $y$. Hence, a regression linking measured quantities to the true kinematics is crucial. The true values of $x$, $Q^2$, and $y$ in our data are derived from generator-level particle four-vectors, accounting for effects like ISR and FSR.
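As an illustration of the relations above, the following is a minimal sketch of the electron (EL) method, assuming massless kinematics and the HERA convention in which the polar angle is measured from the proton beam; the beam energies and function name are illustrative, not taken from the text.

```python
import math

def electron_method(E_beam_e, E_beam_p, E_scat, theta):
    """Reconstruct DIS kinematics from the scattered electron alone (EL method).

    E_beam_e, E_beam_p: lepton and proton beam energies [GeV]
    E_scat: scattered-electron energy [GeV]
    theta:  scattered-electron polar angle [rad], measured from the proton beam
    Masses are neglected, so s ~ 4 * E_beam_e * E_beam_p.
    """
    s = 4.0 * E_beam_e * E_beam_p                              # squared CM energy
    Q2 = 2.0 * E_beam_e * E_scat * (1.0 + math.cos(theta))     # four-momentum transfer
    y = 1.0 - (E_scat / (2.0 * E_beam_e)) * (1.0 - math.cos(theta))
    x = Q2 / (s * y)                                           # Bjorken x from Q^2 = s x y
    return x, Q2, y

# HERA-like beams: 27.6 GeV electrons on 920 GeV protons
x, Q2, y = electron_method(27.6, 920.0, 20.0, 2.5)
assert abs(Q2 - (4.0 * 27.6 * 920.0) * x * y) < 1e-6           # closure: Q^2 = s x y
```

By construction $x$ satisfies $Q^2 = sxy$ exactly; the physically interesting question, addressed by the regression in this work, is how well such reconstructions track the generator-level truth in the presence of radiative effects.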

Synthetic dataset and network input
We utilize full simulations from the H1 experiment that encompass QED radiation and the Lund hadronization model. Table 1 summarizes the dataset statistics and size on disk. A total of 15 measured input features are used in our work, sourced from [4]. These encompass seven features sensitive to QED radiation: $p_T^{\mathrm{bal}} = 1 - P_{T,h}/p_{T,e}$, with $P_{T,h}$ the HFS transverse momentum and $p_{T,e}$ the electron's; $p_z^{\mathrm{bal}} = 1 - \frac{\Sigma_e + \Sigma}{2E_0}$, where $\Sigma_e = E_e - p_{z,e}$ and $\Sigma = \sum_{i \in \mathrm{HFS}} (E_i - p_{z,i})$; the energy, $\eta$, and $\Delta\phi$ of the photon nearest to the electron beam direction, where $\Delta\phi$ is relative to the electron; $E^{\mathrm{sum}}_{\mathrm{CAL}}/p_e$, the ECAL energy sum within a cone of $\Delta R < 0.4$ around the scattered electron; and the count of ECAL clusters within $\Delta R < 0.4$. These seven are merged with another eight: the scattered electron's $p_{T,e}$, $p_{z,e}$, and $E$; the HFS four-vector components $P_{T,h}$, $P_{z,h}$, and $E_h$; $\Delta\phi(e,h)$, the angle between the scattered electron and the HFS momentum; and the difference $\Delta\Sigma = \Sigma_e - \Sigma$.

ELUQuant Architecture
In this work, ELUQ is applied to the DIS simulated dataset of H1, extracting $x$, $Q^2$, $y$, and their related epistemic and aleatoric uncertainties from 15 measured input features. Epistemic uncertainty stems from knowledge gaps and improves with more data and refined models. Aleatoric uncertainty, on the other hand, arises from inherent system variability and remains unaffected by additional data.
ELUQ is a bicephalous regression network with Bayesian blocks characterized by MNF to approximate posteriors for event-level UQ. A representation of the architecture is reported in Fig. 2. This enables building posteriors over the weights at each layer. Thus, when sampling, the result is a diverse combination of weights within the network. After training, each forward pass through the network yields a distinct set of weights drawn from the learned posterior.
To effectively handle uncertainty in Bayesian networks, the objective is to calculate a posterior $q(w|D)$, where $w$ are the weights of the NN. This allows predictions on the regressed quantities through a posterior distribution $q(y|x, D)$, integrated over the space of the weights. However, the formulation of such a posterior is intractable, and therefore approximate Bayesian inference must be employed. Typically, fully factorized Gaussians are assumed as an approximate posterior $q(W)$, so that the evidence lower bound, which involves the KL divergence between the approximate posterior and the assumed prior, can be maximized. This is generally limiting and can underestimate the true uncertainty. Another approach is to utilize random auxiliary variables to improve the approximate posterior via a mixing density. Louizos et al. [1] parameterize the mixing density in terms of auxiliary variables $z$, which are in turn parameterized by normalizing flows, allowing flexibility and local reparameterizations. They reduce the computational overhead of using a normalizing flow by letting $z$ act multiplicatively on the means.
Given a set of Gaussian weights, the pre-activation of a neuron can be considered a linear combination of the weights, which is itself Gaussian. Louizos et al. [1] further show that sampling these pre-activations for each input within a mini-batch results in a different set of weights per example, lower variance in the gradients, and an overall more stable optimization. An algorithmic description can be found in [1] (Algorithm 1).
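The Gaussian pre-activation idea can be sketched as a single layer using the local reparameterization trick. This is a simplified illustration, not the ELUQuant implementation: the full MNF posterior of [1] additionally multiplies the weight means by a flow-transformed auxiliary variable $z$, which is omitted here, and all names are illustrative.

```python
import torch
import torch.nn as nn

class LocalReparamLinear(nn.Module):
    """Gaussian-weight linear layer sampled via the local reparameterization trick.

    Instead of sampling a weight matrix, the (Gaussian) pre-activations are
    sampled directly: mean = x @ mu, variance = x^2 @ sigma^2. Each example in
    the mini-batch thus effectively sees a different weight draw.
    """
    def __init__(self, n_in, n_out):
        super().__init__()
        self.w_mu = nn.Parameter(torch.randn(n_in, n_out) * 0.1)
        self.w_logvar = nn.Parameter(torch.full((n_in, n_out), -6.0))

    def forward(self, x):
        act_mu = x @ self.w_mu                        # mean of pre-activations
        act_var = (x ** 2) @ self.w_logvar.exp()      # variance of pre-activations
        eps = torch.randn_like(act_mu)                # fresh noise per forward pass
        return act_mu + act_var.sqrt() * eps

layer = LocalReparamLinear(15, 64)                    # 15 input features (Sec. 2)
out1 = layer(torch.ones(4, 15))
out2 = layer(torch.ones(4, 15))
# stochastic: repeated forward passes differ, as expected for a Bayesian layer
```

Repeated forward passes through such layers are exactly what the inference procedure of Sec. 3 averages over to separate the epistemic spread from the regressed aleatoric term.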

Loss function
The total loss function is the sum of different contributions. The regression loss, Eq. (2), acts on the DIS kinematic vector of observables we want to predict, namely $v = (x, Q^2, y)$, as well as on the corresponding heteroskedastic aleatoric term $\sigma = (\sigma(x), \sigma(Q^2), \sigma(y))$. The sum over $i$ runs over all vectors in the mini-batch, and the sum over $j$ runs over the elements of each vector; the epistemic term is captured by $\lVert v_i - \hat{v}_i \rVert$. Regressing the log-variance at the network output has been demonstrated in [2] to be more numerically stable than regressing the variance $\sigma^2$ directly. Looked at closely, Eq. (2) is the negative logarithm of a multivariate normal distribution, see [2]. The physics-informed term, Eq. (3), is applied to the regressed observables, which ideally should satisfy the ground-truth relation $Q^2 = sxy$; the Mandelstam variable $s$ is calculated at the ground-truth level. The Kullback-Leibler term, Eq. (4), is adapted from [1], which employs MNF in variational Bayesian Neural Networks to improve the posterior approximation. We deploy Gaussian priors, and the posterior distribution is given by the product of a Gaussian and a mixing density parameterized by a normalizing flow. As shown in [1], this parameterization is flexible, allowing nonlinear and multimodal dependencies between the weight elements.
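Since the equations themselves do not survive in this excerpt, the following is a minimal sketch of how the regression and physics-informed terms described above could be combined, assuming the standard log-variance parameterization of [2]; the KL term would be accumulated from the Bayesian layers and is passed in as a scalar here. Which of $\alpha$, $\beta$ weights which term is an assumption, as are all names.

```python
import torch

def eluq_style_loss(v_pred, log_var, v_true, s_true, kl, alpha=1.0, beta=0.01):
    """Sketch of a heteroskedastic regression loss with a physics-informed term.

    v_pred, v_true: (N, 3) tensors holding (x, Q^2, y) predictions / truth
    log_var:        (N, 3) predicted log-variances (aleatoric term)
    s_true:         (N,) squared CM energy at the ground-truth level
    kl:             scalar KL divergence from the Bayesian layers
    """
    # Gaussian negative log-likelihood with per-event, per-variable variance
    nll = 0.5 * (log_var + (v_true - v_pred) ** 2 / log_var.exp()).sum(dim=1).mean()
    # physics-informed penalty: regressed observables should satisfy Q^2 = s x y
    x, Q2, y = v_pred[:, 0], v_pred[:, 1], v_pred[:, 2]
    phys = ((Q2 - s_true * x * y) ** 2).mean()
    return nll + alpha * phys + beta * kl
```

Note how the log-variance output keeps the likelihood term numerically stable: the variance enters only through `log_var.exp()`, never as a raw regression target.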
The SELU activation functions, presented by Klambauer et al. [10], are employed for their inherent self-normalizing properties, which ensure non-vanishing gradients. Their self-normalizing nature can make batch normalization unnecessary in some cases, although this is data-dependent. We utilize SELU along with batch normalization to improve network convergence [10]. We also note that [4] utilizes these activation functions.
Training Training and inference are performed in a Python 3.9.12 environment with PyTorch 1.12.1 and CUDA 11.3. The model is trained for a maximum of 100 epochs with a batch size of 1028, and training is stopped early if the validation loss plateaus. The model is trained using the Adam optimizer with an initial learning rate of $5 \times 10^{-4}$, subject to a stepped learning-rate schedule that decays the rate by an order of magnitude ($\gamma = 0.1$) every 50 epochs (step size). It was found that decreasing the learning rate in this fashion allows the network to converge to a more stable, lower value. During training, it is important to weight the KL loss contribution such that it does not dominate, yet still allows convergence to informative posteriors. The same holds for the relation between the physics-informed and regression losses. Optimal values of $\alpha = 1.0$ and $\beta = 0.01$ were found through a simple grid-search optimization. The total dataset size is $\sim 12$ million events, split into a standard 70%/15%/15% training, validation, and testing split. This provides $\sim 2$ million events for testing purposes. Data injected into the network are scaled to the interval (-1, 1), and the targets are scaled to the same interval. Table 2 provides the specs for training.
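The optimizer and schedule described above can be sketched as follows; the model here is a stand-in placeholder, while the hyperparameters are those quoted in the text.

```python
import torch

model = torch.nn.Linear(15, 6)  # placeholder for the ELUQ network (15 input features)
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4)
# decay the learning rate by an order of magnitude every 50 epochs
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=50, gamma=0.1)

for epoch in range(100):
    # ... one pass over the training set, with early stopping on a plateaued
    # validation loss (omitted in this sketch) ...
    optimizer.step()     # after backpropagating the total loss
    scheduler.step()
```

After the full 100 epochs the learning rate has been decayed twice, ending at $5 \times 10^{-6}$.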
Inference At inference we sample each individual event 10k times, taking the mean value as the final prediction. The epistemic uncertainty component is given by the standard deviation of these predictions. Each individual forward pass also provides a corresponding aleatoric uncertainty, of which we again take the mean as the final aleatoric component. We perform this in batches of 100 events, which corresponds to an overall batch of 1 million forward passes. This results in an event-level inference time of 20 ms. Note that this pipeline could be further vectorized to decrease wall time. Table 3 provides the specs for inference.
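The sampling procedure above can be sketched generically for any stochastic model whose forward pass returns a prediction and an aleatoric width; function and variable names are illustrative, and the quadrature combination follows the $\sigma_{\mathrm{tot}}$ definition used later in Sec. 4.

```python
import torch

@torch.no_grad()
def mc_inference(model, x, n_samples=10_000):
    """Monte Carlo inference sketch for a stochastic (Bayesian) model.

    Assumes each forward pass returns (prediction, aleatoric_sigma), each of
    shape (batch, 3), with weights re-sampled on every call.
    """
    preds, sigmas = [], []
    for _ in range(n_samples):
        mu, sigma = model(x)
        preds.append(mu)
        sigmas.append(sigma)
    preds = torch.stack(preds)                          # (n_samples, batch, 3)
    mean_pred = preds.mean(dim=0)                       # final prediction
    epistemic = preds.std(dim=0)                        # spread across weight samples
    aleatoric = torch.stack(sigmas).mean(dim=0)         # mean predicted width
    total = (aleatoric ** 2 + epistemic ** 2).sqrt()    # sum in quadrature
    return mean_pred, aleatoric, epistemic, total
```

The loop over samples is what could be vectorized further, as noted above, by batching the weight draws themselves.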
Limitations Identifying a suitable network structure for a Bayesian network is generally not straightforward, given that the model is attempting to optimize a distribution of weights at each layer. The problem has been alleviated by utilizing the network's deterministic counterpart (a DNN) to identify a minimal-complexity model that provides acceptable performance; we then directly utilized this structure in ELUQ. Another aspect to consider, depending on the data, is that during the initial stages of training a poorly calibrated model may produce poor regression targets with small aleatoric components, resulting in unstable fluctuations. In light of this, we bounded the minimum and maximum variances to improve training stability. The numerical values of the bounds were set such that they do not influence the learned aleatoric component, i.e., the network does not default to the minimum or maximum allowed value.

Analysis and results
Our strategy began with training a streamlined DNN that, despite its reduced complexity compared to [4] (150k parameters versus 1.2M), achieved similar performance. While ELUQ and the DNN share similar layer sizes, ELUQ stands out by offering enhanced UQ not possible with a basic DNN. We utilized ELUQ to predict the DIS kinematic variables and their associated aleatoric and epistemic uncertainties. In what follows, we make comparisons with the traditional reconstruction methods, namely EL, JB, and DA, introduced in Sec. 2. We also incorporate the DNN into the comparative visualizations. The section is split into two parts: Sec. 4.1 discusses the general performance of the architecture in relation to other methods, similar to what is done in [4], and Sec. 4.2 provides detailed studies of the uncertainties produced by our model and how their utilization can increase performance.

Regression Performance
Fig. 3 shows the resolutions for $x$, $Q^2$, and $y$ in bins of $y$, comparing the various reconstruction methods. We immediately notice that the distributions of DNN and ELUQ look alike over the whole range in $y$ and for all the DIS kinematic variables. The binning in $y$ is chosen to reproduce and compare with the results in [4]. We also notice that DNN and ELUQ outperform the traditional methods. As expected, the traditional methods perform differently as a function of $y$: for example, the methods that mostly rely on the scattered electron yield the best resolution in events with large $y$, but their resolution on $x$ quickly diverges at low $y$. As already discussed, with ELUQ we can calculate uncertainties at the event level. Given an observable $\hat{O}_k$ for the $k$-th event and its associated uncertainty $\sigma_k$, the inverse-variance weighted average of the observable over the entire dataset, and its associated uncertainty, are given by $\langle \hat{O} \rangle_w = \frac{\sum_k \hat{O}_k/\sigma_k^2}{\sum_k 1/\sigma_k^2}$ and $\sigma_w = \left( \sum_k 1/\sigma_k^2 \right)^{-1/2}$. While other methods do not provide direct access to event-level uncertainty, comparisons between methods are still feasible. To facilitate this, the expected event-level uncertainty is approximated using the RMS, as detailed in [4]. The RMS is then compared to the event-level equivalent derived from the weighted uncertainty, given by $\approx \sigma_w \cdot \sqrt{N}$. Notice that due to the large statistics, the uncertainty on the averages is exceedingly small and may be challenging to discern visually otherwise.

Figure 6: Average predicted-to-true ratio in bins of $y$ for ELUQ and DNN with event-level uncertainty. In blue we also show, for ELUQ, the smaller event-level uncertainty on the weighted average compared to the RMS of ELUQ and DNN.
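The uncertainty-weighted average used for these method comparisons can be sketched as follows (names illustrative):

```python
import numpy as np

def weighted_average(values, sigmas):
    """Inverse-variance weighted average of per-event observables and its uncertainty.

    values: per-event estimates O_k
    sigmas: per-event uncertainties sigma_k
    """
    w = 1.0 / np.asarray(sigmas, dtype=float) ** 2   # inverse-variance weights
    mean = np.sum(w * np.asarray(values, dtype=float)) / np.sum(w)
    sigma_w = np.sum(w) ** -0.5                      # uncertainty on the weighted mean
    return mean, sigma_w
```

For comparison against the RMS of methods without per-event uncertainties, the event-level equivalent is then `sigma_w * sqrt(N)`, which for identical per-event uncertainties reduces to the common $\sigma_k$.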
Fig. 4 and Fig. 5 show the (weighted) average ratio of the predicted observables normalized to their ground truth, $\langle R_O \rangle = \langle \hat{O}_{\mathrm{pred}} / \hat{O}_{\mathrm{true}} \rangle$, and the event-level uncertainties, respectively, in bins of the inelasticity $y$.
Fig. 4 shows a drop in $\langle R_x \rangle$ at low $y$, where the RMS resolutions for $y$ and $x$ increase, even for the DNN and ELUQ reconstructions, as shown in Fig. 5. According to [4], this result may be attributed to acceptance, noise, or resolution effects that deteriorate the measurement of the HFS. Notice that the weighted average is slightly more affected by this flaw in reconstruction performance than the arithmetic average. Fig. 6 shows a comparison of the ratios between ELUQ and DNN; for ELUQ, we report both the RMS and the event-level equivalent weighted uncertainty. Notice that the total uncertainty at the event level for ELUQ is given by the sum in quadrature of the aleatoric and epistemic components, i.e., $\sigma_{\mathrm{tot}} = \sigma_{\mathrm{ale}} \oplus \sigma_{\mathrm{epi}}$.

Uncertainty Analysis
The validation criteria of our model are two-fold. In the previous section, we validated the regression performance against the model's deterministic counterpart (DNN), both with and without the inclusion of information from the uncertainties, showing the benefits of access to event-level uncertainty in relation to performance. In what follows we validate the event-level uncertainty components individually. We conduct a series of closure tests on the aleatoric component to show that the event-level quantities propagate correctly to the histogram level. We also conduct closure tests on the epistemic component, in which we show that the uncertainty generated by our model decreases as a function of model calibration. Fig. 7 shows a comparison between $\sigma_{\mathrm{ale}}$, $\sigma_{\mathrm{epi}}$, and the RMS from the DNN, for the three regressed variables $x$ (left), $Q^2$ (middle), and $y$ (right). Fig. 8 presents a detailed analysis of the histograms of event-level occurrences of the $\sigma_{\mathrm{ale}}$ and $\sigma_{\mathrm{epi}}$ uncertainties on $x$, $Q^2$, and $y$. These uncertainties are examined in bins of $y$.
Closure tests support the reliability of the extracted aleatoric and epistemic uncertainties. For instance, as shown in Table 4, the aleatoric uncertainties are consistent with the RMS of a DNN in bins of $y$ (visually depicted in Fig. 7) where the regressed observables manifest as Gaussian distributions unaffected by inaccuracy, that is, centered at the expected mean from the ground truth. Notably, the epistemic uncertainty, originating from the same multivariate normal distribution characterizing the aleatoric term in the loss function, amplifies in response to increased inaccuracy with respect to the ground truth, see Fig. 9. UQ studies have also been conducted to demonstrate the effect of the physics-informed term on the inaccuracy $|v - \hat{v}|$. Fig. 10 shows that Eq. (3) contributes to decreasing the inaccuracy on $Q^2$; it also confirms that the epistemic uncertainty increases as the inaccuracy gets larger.

Figure 11: Weighted average predicted-to-true ratio in bins of $y$ for ELUQ after applying a series of cuts at various thresholds on the relative weighted uncertainty of $x$, $Q^2$, $y$ to reject events.
We also demonstrate how event-level UQ can be employed to assess the quality of events, retaining those with higher confidence and discarding those with more pronounced uncertainties. Fig. 11 shows the effect of cutting events with large relative uncertainty at different thresholds. By excluding events with higher uncertainty, we mitigate the observed drop in the predicted-to-true ratio for the variable $x$.

Figure 12: Rejected fraction in bins of $y$ after applying a series of boolean OR cuts at various thresholds on the relative uncertainty of $x$, $Q^2$, $y$. An event that fails any one of these cuts is rejected.
However, this approach results in a reduction of statistics. Figure 12 illustrates the fraction of discarded events as a function of the severity of the cuts, in bins of $y$. The loosest cut removes 40% of the events in the lowest $y$ bin, predominantly driven by the high aleatoric uncertainty in $x$.
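The boolean-OR filtering described above can be sketched as follows, assuming per-event arrays of relative uncertainties on $(x, Q^2, y)$ are available from inference; names and threshold values are illustrative.

```python
import numpy as np

def filter_events(rel_unc, thresholds):
    """Boolean-OR event filter on relative uncertainties.

    rel_unc:    (N, 3) array of relative uncertainties on (x, Q^2, y)
    thresholds: length-3 sequence of cut values, one per variable
    Returns a boolean mask of events to KEEP: an event is rejected as soon
    as any one variable exceeds its threshold.
    """
    reject = (np.asarray(rel_unc) > np.asarray(thresholds)).any(axis=1)
    return ~reject

rel_unc = np.array([[0.05, 0.02, 0.01],    # passes all cuts
                    [0.30, 0.02, 0.01],    # fails the x cut -> rejected
                    [0.05, 0.02, 0.50]])   # fails the y cut -> rejected
keep = filter_events(rel_unc, thresholds=(0.2, 0.2, 0.2))
# keep -> [True, False, False]
```

Tightening the thresholds trades statistics for confidence, which is exactly the behavior quantified in Figs. 11 and 12.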

Conclusions
We present ELUQuant, a novel network that integrates physics-informed Bayesian Neural Networks with flow-approximated posteriors. This marks a major step forward for physics analyses, uniquely providing insight into both heteroskedastic aleatoric and epistemic uncertainties on an event-level basis. To our knowledge, this is a pioneering achievement in the field, realizing a long-sought benchmark. Validated by results from the H1 detector's DIS simulation at HERA, our work suggests promising extensions to the upcoming EIC for extracting essential kinematic observables, which can be affected by radiative effects, and their associated uncertainties. Closure tests support the reliability of the extracted aleatoric and epistemic uncertainties. For instance, the aleatoric uncertainties align with the RMS of a DNN in $y$-regions where the regressed observables manifest as Gaussian distributions unaffected by inaccuracy, centered at the expected mean from the ground truth. Notably, the epistemic uncertainty, originating from the same multivariate normal distribution characterizing the aleatoric term in the loss function, amplifies in response to increased inaccuracy with respect to the ground truth. While the impact of ELUQ on DIS data is evident, its versatility extends to a broader range of event-level physics analyses. The granularity ELUQ offers can transform decision-making in event filtering. Informed by uncertainties, it can mitigate true inaccuracies, showing promise for both data quality monitoring and anomaly detection. In computational terms, our approach at inference achieved a rate of 10,000 samples/event within 20 milliseconds on an RTX 3090, emphasizing real-world viability. In essence, ELUQuant's pioneering approach to event-level uncertainty quantification sets a new standard for comprehensive event-level physics analyses.

Figure 1 :
Figure 1: Neutral current DIS diagram with possible initial- and final-state radiation. Additional effects may arise from, e.g., higher-order QED corrections at the lepton vertex or QCD corrections (see [3,4]).

Figure 2 :
Figure 2: ELUQuant is a bicephalous regression network with Bayesian blocks characterized by MNF modules to approximate posteriors for event-level UQ.

Figure 3 :
Figure 3: Resolutions for $x$, $Q^2$, $y$ vs. $y$ for different reconstruction methods.

Figure 4 :Figure 5 :
Figure 4: The average predicted-to-true ratio of the DIS kinematics for ELUQ and DNN compared to the classical methods, in bins of $y$.

Figure 7 :Figure 8 :
Figure 7: Average event-level aleatoric and epistemic uncertainty on the reconstruction-to-true ratio for $x$ (left), $Q^2$ (middle), and $y$ (right) from ELUQ, compared to the RMS obtained from the DNN model.

Figure 10 :
Figure 10: Epistemic uncertainty vs. true inaccuracy in bins of $y$, for $x$, $Q^2$, $y$. The plots show ELUQ trained with and without the physics-informed loss, its inclusion providing a lower inaccuracy for $Q^2$.

Table 1 :
Dataset statistics and size on disk.

Table 2 :
Training Specs of the ELUQuant architecture: training was performed with an Intel i7-12700k 12 Core CPU, Nvidia RTX 3090 24GB GPU and 64GB memory.

Table 3 :
Inference Specs of the ELUQuant architecture. Same computing resources as in Table 2.

Table 4 :
Comparisons between the aleatoric uncertainty of ELUQ and the RMS of other methods, for the DIS kinematic variables $x$, $Q^2$, $y$.