Object condensation for track building in a backward electron tagger at the EIC

At the Electron-Ion Collider, quasi-real photoproduction measurements involve tracking scattered electrons at small angles relative to the beamline. These electrons act as effective beams of tagged almost-real photons, with a high flux compared to larger-Q² interactions. However, the proximity of the detector to the electron beam results in a very high flux of electrons from the bremsstrahlung process (about 10 electrons per 12 ns electron/ion bunch crossing over an area of approximately 100 cm²). Consequently, the tracking detector systems experience high occupancy. To address this, we propose using machine learning algorithms, specifically object condensation methods, which excel at track building in the quasi-real photon tagger. These algorithms achieve a track-finding efficiency of 95% or higher and a purity of 90% or higher, even in the presence of noise and hit-detection inefficiencies.


Introduction
The backwards electron tagger design and location at the ePIC experiment extends the Q² acceptance of the central detector from 0.1 GeV² down to effectively 0 GeV². The first group of outgoing electron beamline magnets, located before the tagger, acts as a magnetic spectrometer, spatially separating the electrons by energy. At the highest-luminosity running, the central ePIC detector will be required to record 500 kHz of events of interest. In the backwards tagger, however, hits from the Bremsstrahlung process will dominate, with a cross section 10⁴ times higher than that of deep inelastic scattering processes in proton collisions and 10⁸ times higher in collisions with gold ions. The tagger will be required to measure the momenta of all of the electrons, allowing the identification of the electrons associated with the events recorded in the central detector. In addition to the electrons originating at the interaction point, a high hit rate is expected from backgrounds such as synchrotron radiation from the beamline magnets and Bremsstrahlung from interactions between the beam and residual gas, which leave distinctly different tracks in the detector.
The proposed far-backward tagger design is composed of two separate detectors covering distinct ranges of scattered electron energy. Each detector will consist of four layers of 36 Timepix4 hybrid pixel detectors [1] arranged in a 6-by-6 grid. A Timepix4 chip has 55 μm square pixels in a 448 × 512 array, and is capable of reading out rates of tens of kHz per pixel. The high rate from Bremsstrahlung, shown in figure 1, is focused by the beamline dipoles onto a central band. Due to this high concentration of hits, forming track candidates across layers involves considering numerous permutations of hits. Analysing the incorrect track combinations leads to inefficient CPU usage and variable latency. We address this by using machine learning with a fixed latency to limit false combinations.
The importance of machine learning in modern high energy physics has been highlighted for the EIC [2, 3]. In particular, machine learning is known to deal well with computer vision tasks such as object recognition. These types of algorithms could be useful for the quasi-real photon tagger reconstruction, where the aim is to efficiently recognise an object, namely the electron track, from a collection of pixel hits. The performance of tracking algorithms can be evaluated using the tracking efficiency and purity: the efficiency measures how well the algorithm reconstructs true tracks, and the purity measures how few fake tracks it reconstructs. This paper demonstrates that object condensation algorithms are particularly well suited to building tracks in the far-backward tagger, achieving high tracking efficiencies and purities in the presence of noise and hit-detection inefficiencies.

Methodology
Graph Neural Networks (GNNs) [4] are well suited to tracking applications, as they represent data as nodes connected by edges, which is akin to hits connected by tracks. Several different architectures have been designed to build tracks using GNNs; in this work the object condensation method is investigated [5].
GravNet or GarNet layers [6] (GNet is used to refer to both) aim to create a one-shot model that builds a graph representation of a set of hits in a detector, which is then used directly for classification or regression tasks. For example, GNet layers were used to distinguish between two overlapping showers by classifying hits as belonging to one or the other [6]. GarNet layers have also been used to identify particle types and simultaneously predict the particle energy given a set of hits belonging to a cluster [7]. This latter application was implemented on FPGAs, achieving a 1 μs inference time.
Object condensation methods build on the GNet layers to produce a one-shot approach to track building and track parameter estimation [5]. These track parameters are, for example, the three-momentum of the track or the classification of the particle producing the track. Hits are passed to a model implementing GNet layers, which then predicts a latent-space representation of the hits, a β value that allows hits to be grouped together in the latent-space representation, and the track parameters. The graph structure, with a node for each hit, is maintained throughout the network, with information passing between the nodes depending on a different latent space for each GNet layer. The key to training the object condensation network is the loss function: in the output latent space a charge

q_i = arctanh²(β_i) + q_min

is calculated for each hit, which is attractive to hits from the same track source and repulsive to others. Here i is the hit index in an event, q_min is a user-defined parameter and β_i is a prediction from the network, trained to be high for a single hit per track, which is called the condensation point. Condensation points represent a potential minimum in the latent space around which the other hits belonging to the same track will be clustered. They also contain all of the multi-dimensional information gained from the network itself and will have the best estimation of the track parameters. For a complete description of the method see reference [5].
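The charge-based attractive and repulsive potentials described above can be illustrated with a minimal NumPy sketch. This is a simplification of the full loss in ref. [5] (which also contains terms driving β towards 1 for one hit per object and towards 0 for noise); the default q_min value and the unit repulsion range are assumptions for illustration only.

```python
import numpy as np

def condensation_potential_loss(coords, beta, obj_id, q_min=0.5):
    """Simplified object-condensation potential loss (after ref. [5]).

    coords : (N, 2) latent-space positions predicted for each hit
    beta   : (N,) condensation strength in [0, 1) predicted per hit
    obj_id : (N,) integer track label per hit (-1 for noise)
    """
    # Charge per hit: grows steeply as beta approaches 1
    q = np.arctanh(beta) ** 2 + q_min

    loss = 0.0
    for k in (t for t in np.unique(obj_id) if t >= 0):
        mask = obj_id == k
        # Condensation point: the hit with the highest beta in this track
        alpha = np.argmax(np.where(mask, beta, -1.0))
        d = np.linalg.norm(coords - coords[alpha], axis=1)
        # Attractive term pulls same-track hits onto the condensation point
        loss += np.sum(q[mask] * d[mask] ** 2 * q[alpha])
        # Repulsive hinge pushes other hits at least unit distance away
        loss += np.sum(q[~mask] * np.maximum(0.0, 1.0 - d[~mask]) * q[alpha])
    return loss / len(beta)
```

Minimising this loss drives hits of the same track into a tight cluster around their condensation point while separating different tracks, which is exactly the latent-space behaviour shown in figure 4.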

Training
Events generated using the GETaLM event generator [8] consist of a sample of quasi-real photoproduction electrons mixed with Bremsstrahlung electrons to represent a realistic electron-proton bunch crossing at 18 GeV. These event samples contained at most 15 electrons. In order to test the model in conditions reflecting the inclusion of background sources not included in the generator, the multiplicity was increased by combining 10 events into one. This led to samples containing up to 82 electrons per event. In addition, some samples have random noise injected uniformly across the detector. The bunched electrons are passed through the Geant4-based simulation of the ePIC detector [9], which maintains the Monte Carlo truth associations between hits and the particles which caused them.
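The event-mixing step can be sketched in a few lines. The event structure here (each event as a list of (hit, particle-id) pairs) is a hypothetical simplification of the actual simulation output; the key point is that particle identifiers must be re-labelled so that tracks from different source events never share a label in the merged event.

```python
import random

def mix_events(events, n_mix=10, seed=None):
    """Merge generator events to emulate higher-multiplicity bunch crossings.

    events : list of events, each a list of (hit, particle_id) pairs.
    Returns mixed events in which particle ids are made unique by
    tagging them with the index of their source event.
    """
    rng = random.Random(seed)
    mixed = []
    for start in range(0, len(events) - n_mix + 1, n_mix):
        merged = []
        for offset, ev in enumerate(events[start:start + n_mix]):
            for hit, pid in ev:
                merged.append((hit, (offset, pid)))  # unique label per source event
        rng.shuffle(merged)  # remove ordering correlated with the source event
        mixed.append(merged)
    return mixed
```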
The input to the model is the information contained in the variable number of tagger hits, such as their position (x, y, layer and detector), time, and energy deposition. The energy and time of the hits have not been used in this instance, as resolution effects are not accounted for in the simulation output and these are not currently well-studied properties of the detector. In addition, the timing structure of the event sample was known to be inaccurate. The output of the training data consists of an object identifier label associating each hit with an input particle, and the track parameters to be estimated. The specific track parameters investigated here are the three-momentum of the scattered electron, the classification of the production process as either Bremsstrahlung or photoproduction, and the classification of the hit as coming from a particle input into the simulation or from a secondary produced within the simulation (or injected noise).
A schematic representation of module 2 of the tagger is shown in figure 2, with tracks clearly distinguished by the line joining hits of the same colour and noise in black. In the current simulation most electrons only record a hit in a single cell, so tracks have at most one hit per layer; as the simulation evolves, the model will have to be updated to cluster hits within each layer as well as between layers. In figure 2 an efficiency of 80% was applied to the data, leading to some tracks recording hits in fewer than 4 layers. A level of 80% was chosen to investigate the robustness of the model and is much lower than expected for the real Timepix detectors. Finally, noise can be added to the data by random sampling from the X, Y, layer and module space. Between 0 and 200 noise hits are added to each event, in order not to bias the training to a particular level of noise. Figure 3 shows the distributions of the number of tracks per event, the number of tracks with hits in 2, 3 or 4 layers per event, and the number of noise hits per event.
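The hit-detection inefficiency and uniform noise injection described above can be sketched as a single degradation step. Treating each module as one contiguous pixel plane (rather than the actual 6-by-6 grid of Timepix4 chips) is an assumed simplification, as are the grid dimensions used here.

```python
import numpy as np

def degrade_event(hits, hit_efficiency=0.8, max_noise=200, rng=None,
                  n_x=448, n_y=512, n_layers=4, n_modules=2):
    """Apply hit-detection inefficiency and inject uniform noise hits.

    hits : (N, 4) integer array of (x, y, layer, module) pixel addresses.
    Returns the degraded hit array and a label array (1 = true hit, 0 = noise).
    """
    rng = np.random.default_rng(rng)
    # Drop each true hit independently with probability (1 - hit_efficiency)
    kept = hits[rng.random(len(hits)) < hit_efficiency]
    # Add between 0 and max_noise hits sampled uniformly over the detector
    n_noise = rng.integers(0, max_noise + 1)
    noise = np.column_stack([
        rng.integers(0, n_x, n_noise),
        rng.integers(0, n_y, n_noise),
        rng.integers(0, n_layers, n_noise),
        rng.integers(0, n_modules, n_noise),
    ])
    labels = np.concatenate([np.ones(len(kept), int), np.zeros(n_noise, int)])
    return np.vstack([kept, noise]), labels
```

Sampling the noise count uniformly per event, rather than fixing it, matches the stated goal of not biasing the training towards one particular noise level.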
The network architecture suggested in ref. [5] has been simplified to match the reduced complexity of our detector setup, using only two blocks of GravNet layers, the outputs of which are concatenated and passed to a dense layer with 32 nodes before feeding into the output tensor. The basic output tensor contains the β value and two coordinates of the latent space, which are used to calculate the loss via the potential of the system; it was later extended to include parameters associated with the original particle. The model is trained for 600 epochs with an ADAM optimiser [10] and a learning rate of 10⁻⁴. The model is trained first on the perfectly efficient and noiseless data, then on the data with both hit-detection inefficiencies and noise. This allows the performance of the object condensation model in a perfect scenario to be compared with a more realistic simulation of projected data taken by the far-backward tagger.
An example of the distribution of hits across the latent-space representation of the data before and after training is shown in figure 4. After training, hits from the same track are tightly grouped together and much more separated from different tracks, represented in different colours. During inference, condensation points are chosen by requiring a β value greater than 0.1. For each condensation point, the distance between the condensation point and all other hits is calculated. The closest hit in each layer is then assigned to the same track as the condensation point, as long as that closest hit is within a user-defined distance, typically set to 1. Requiring this maximum distance allows a layer not to be represented in a track, enabling the prediction of tracks with two or three hits instead of four. Tracks are then represented as arrays of the X, Y positions in each layer, ordered by layer.
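The inference procedure above can be sketched as a greedy clustering over the predicted latent space. Processing condensation points in descending β order and assigning each hit to at most one track are assumed details not spelled out in the text; the β threshold of 0.1 and maximum distance of 1 are the values quoted above.

```python
import numpy as np

def build_tracks(coords, beta, layer, beta_min=0.1, max_dist=1.0):
    """Cluster hits into tracks around condensation points in the latent space.

    coords : (N, 2) predicted latent positions
    beta   : (N,) predicted beta values
    layer  : (N,) detector layer index of each hit
    Returns a list of tracks, each a list of hit indices ordered by layer.
    """
    unused = set(range(len(beta)))
    tracks = []
    for alpha in np.argsort(-beta):            # highest-beta hits first
        if beta[alpha] < beta_min or alpha not in unused:
            continue
        track = {int(layer[alpha]): int(alpha)}
        unused.discard(int(alpha))
        d = np.linalg.norm(coords - coords[alpha], axis=1)
        for lyr in np.unique(layer):
            if int(lyr) in track:
                continue
            cands = [i for i in unused if layer[i] == lyr]
            if not cands:
                continue
            best = min(cands, key=lambda i: d[i])
            if d[best] < max_dist:             # otherwise the layer stays empty
                track[int(lyr)] = best
                unused.discard(best)
        tracks.append([track[l] for l in sorted(track)])
    return tracks
```

Because a layer is skipped whenever no hit falls within max_dist, the same routine naturally produces two- and three-hit tracks when hit-detection inefficiencies remove hits.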
Once trained, the track-building performance is measured using the efficiency and purity. The efficiency is calculated as the number of predicted tracks that exactly match a true track, divided by the number of true tracks in an event. The purity is measured as the fraction of matched tracks among all predicted tracks. Both metrics are averaged over all events in the testing set. The track efficiency and purity were roughly 99% on the data with perfect hit-detection efficiency and no noise. These dropped to a track efficiency and purity of 96% and 93% respectively when adding hit-detection inefficiencies and noise to the data. The method has proved very robust to high numbers of tracks, detector inefficiencies and the injection of noise hits. The efficiency and purity as a function of these variables are shown in figure 5. The track-construction efficiency stays above 95% across all variables, while the purity sees a sharp drop-off towards the upper range of the injected noise levels and when only 2 hits are left by a track.
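The exact-match efficiency and purity definitions above translate directly into code; representing each track as a tuple of hit identifiers is an assumed convention for illustration.

```python
def track_metrics(predicted, truth):
    """Per-event track-building efficiency and purity.

    predicted, truth : lists of tracks, each a sequence of hit ids.
    A predicted track counts as matched only if it is identical to a
    true track, following the exact-match definition in the text.
    """
    true_set = {tuple(t) for t in truth}
    matched = sum(1 for t in predicted if tuple(t) in true_set)
    efficiency = matched / len(truth) if truth else 1.0
    purity = matched / len(predicted) if predicted else 1.0
    return efficiency, purity
```

Averaging these two numbers over the testing set gives the quoted performance figures.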
Extending the model outputs to include classification of the tracks and estimation of their momentum has also shown great promise. Figure 6 shows the classification of condensation points as either photoproduction electrons or Bremsstrahlung electrons; the kinematics of these events have a large overlap which will be indistinguishable, but the model manages to separate out a higher fraction than simple cuts. Additional classification is demonstrated between tracks originating from a particle input to the simulation and secondaries produced within the simulation. The reconstruction of the track momentum has also been investigated, as shown in figure 7; this regression task proves very successful and, combined with a cut on the classification response, demonstrates a clean correlation. It is important to note that the information contained within the condensation point can already be more than that of the collection of hits it connects, meaning that collecting the hits and passing them to further analysis steps to carry out the parameterisation tasks could in principle lose information.

Discussion and outlook
The demonstrated performance of the object condensation method shows it is well suited to track building in the far-backward electron tagger, where dozens of tracks must be constructed from hundreds of hits, including noise, whilst maintaining a low and stable latency. The one-shot approach guarantees fast prediction times and is shown to achieve high track efficiency and purity. Further optimisation of the model is possible, such as adjusting the size and shape of the GNet layers and balancing the contributions of the separate condensation, regression and classification tasks to the total loss. Reconstruction of the components of the momentum and identification of tracks originating from the interaction point will be the focus of this optimisation.
The method will have to adapt as the input data are improved to more closely match the expected experimental data, such as the mixing of extra backgrounds with an improved timing structure, and a detailed sensor-response model and digitisation, spreading the charge over more than a single pixel and introducing more realistic temporal, spatial and energy resolutions.
The ePIC detector will use a streaming DAQ, with no hardware trigger to select only those tagger hits recorded in a coincidence window with signals from the central detector. Instead, a coincidence filter will need to be applied at a later stage of the DAQ, based on some preliminary processing. The suggestion is that the object condensation network, or an alternative GNN, would be deployed on an FPGA to filter data in real time at an early stage of the DAQ. Ref. [7] has shown that GNet layers can be deployed on FPGAs, and as such the object condensation method should also be deployable on an FPGA; this, however, remains to be tested. The tagger reconstruction will then be embedded within the general ePIC reconstruction software and online data acquisition, with the low tracking times demonstrated by the object condensation method well suited to such use cases.

Figure 1 .
Figure 1. The expected rates from Bremsstrahlung on the front layer of the higher energy/rate detector, overlaid with the Timepix4 geometry. The distribution across the layer shows most hits are concentrated around Y = 0 and high X.

Figure 2 .
Figure 2. Example hit distribution on tagger module 2 showing track hits (coloured) with inefficiencies and noise (black) injected into the data.

Figure 3 .
Figure 3. The distributions of the number of tracks per mixed event (left), the number of layers hit per track after efficiencies are applied (middle), and the number of noise hits added per event (right).

Figure 4 .
Figure 4. The predicted latent space before (left) and after (right) training. Tracks are represented by a single colour, although similar shades are repeated for various tracks due to the number of tracks per event. Noise hits are plotted in black. Transparency is related to the predicted β value: condensation points have almost no transparency whilst non-condensation points have some transparency, with the minimum transparency set to 0.2 for visibility. In this example some condensation points are formed from noise hits, which would lead to a lower tracking purity from the formation of tracks around a noise condensation point.

Figure 5 .
Figure 5. The purity and efficiency as a function of the number of tracks per event (left), the number of layers represented in a given track (middle) and the number of noise hits per event (right). This model was trained on data with hit-detection inefficiencies and noise added.

Figure 6 .
Figure 6. Response of two classifiers on the condensation points. Left: Bremsstrahlung/quasi-real photoproduction. Right: primary electron/secondary created in the simulation.

Figure 7 .
Figure 7. The true momentum magnitude of an electron at its origin plotted against the predicted momentum, for all condensation points (left) and filtered by a primary-classification response > 0.8 (right).