Application of graph networks to background rejection in Imaging Air Cherenkov Telescopes

Imaging Air Cherenkov Telescopes (IACTs) are essential to ground-based observations of gamma rays in the GeV to TeV regime. One particular challenge of ground-based gamma-ray astronomy is an effective rejection of the hadronic background. We propose a new deep-learning-based algorithm for classifying images measured using single or multiple Imaging Air Cherenkov Telescopes. We interpret the detected images as a collection of triggered sensors that can be represented by graphs and analyzed by graph convolutional networks. For images cleaned of the light from the night sky, this allows for an efficient algorithm design that bypasses the challenge of sparse images in deep learning approaches based on computer vision techniques such as convolutional neural networks. We investigate different graph network architectures and find a promising performance with improvements over previous machine-learning and deep-learning-based methods.


Introduction
Arrays of Imaging Air Cherenkov Telescopes (IACTs) enable precise observations of the very-high-energy gamma-ray sky in the GeV to TeV regime. When highly energetic particles penetrate the Earth's atmosphere, air showers (i.e., particle cascades) are initiated. IACTs image the Cherenkov radiation that the relativistic secondary particles emit while passing through the Earth's atmosphere.
For the precise observation of gamma-ray emitters, the hadronic background has to be reduced to a minimum to ensure a high sensitivity of the instruments. To reject this background, which consists mainly of proton-induced air showers, most algorithms rely on high-level image observables like Hillas parameters [1]. In this approach, the images are first cleaned of background noise, e.g., the night sky background. Subsequently, the images, consisting of the pixels remaining after cleaning, are reduced to multiple parameters by elliptical modeling of the first moments of the Cherenkov light distribution, where the width and length of the ellipse are critical parameters and decisive for γ/hadron separation. Since hadron-induced showers are subject to larger fluctuations due to the hadronic interactions in the shower development, the resulting light distribution is different and mainly results in increased widths. For an efficient background rejection, cut strategies exploit these differences, either by cutting directly on such image parameters or by exploiting their correlations using machine learning techniques [2].
With the recent advent of deep learning [3,4], the application of machine-learning algorithms to low-level data of physics instruments became computationally feasible [5]. These deep learning techniques have already demonstrated their effectiveness in Large Hadron Collider (LHC) [6], neutrino [7,8], and cosmic-ray physics [9,10]. In gamma-ray astronomy, too, various works focus on reconstructing IACT images using deep learning, i.e., neural networks, to extract information beyond the Hillas parameterization, which is too simplified to describe the characteristics of the Cherenkov light distribution in full detail. For example, proton showers, which on average develop far more inhomogeneously than photon showers, cannot be properly described by an ellipse. A simple analysis based on width and length can thus be interpreted as a principal component analysis (PCA) where only the first two components of the light distribution are considered. Hence, more sophisticated approaches aim to exploit the substructure of IACT images beyond these moments to improve the rejection of hadron-initiated showers.
The first application of deep learning to IACT images was studied in Ref. [11]. A subsequent comprehensive study on the application of deep learning to event reconstruction [12] forms the basis for many IACT deep learning applications [13-16]. The proposed hybrid architecture, consisting of convolutional and recurrent operations, showed promising results when applied to simulations. Whereas the recurrent network part was used to deal with image stereoscopy, i.e., the combination of multiple telescopes that imaged the same shower, the convolutional network part was used to exploit the shower image on a per-telescope basis. One particular challenge of this approach is the image sparsity: after image cleaning, typically only around 15% of the pixels remain active, as visible in Figure 1a. Since many IACT cameras feature a hexagonal arrangement of pixels, the sparsity of the image increases further, as a transformation into a Cartesian representation is needed (compare Figure 1b). These caveats make the application of convolutional neural networks (CNNs) less efficient and motivate algorithms that can deal more naturally with the measured light distribution in the camera. Without cleaning, more information could in theory be exploited, given that the cleaning has finite performance. Nevertheless, the major part of the image would then, on average, consist of background noise that must be described in full detail. We would therefore like to emphasize that investigating additional, more sophisticated cleaning procedures will also contribute decisively to the development of precise and efficient algorithms for IACT image analyses.
In this work, we present a novel approach for discriminating γ-ray-induced from proton-induced showers using graph networks, which were recently applied to physics data lying on non-regular [17-19] and non-Euclidean manifolds [20,21]. Our technique relies on cleaned images, which we interpret as sparse signal patterns that can be represented by graphs and thus efficiently analyzed using graph convolutional neural networks. Furthermore, since only the signal pixels after cleaning contribute to the construction of graphs, our algorithm is computationally significantly more efficient in terms of memory consumption and run time than current approaches based on CNNs or CNN-RNN-based models.
The work is structured as follows. In the first part, we introduce the data, followed by a summary of deep-learning-based IACT image analyses and our new approach to interpreting IACT images as graphs. Then, in Section 3, we review the two different graph convolutional neural network architectures we use in this work and describe the training of our algorithm using simulations of the High Energy Stereoscopic System (H.E.S.S.) [22]. Finally, in Section 4, we apply our method to the classification of mono and stereo events, examine the performance of our proposed classification algorithm, and compare the results to machine-learning and deep-learning-based classifiers presented in the literature.

Data
For the simulation of extensive air showers, the software package CORSIKA (COsmic Ray SImulations for KAscade) is used [23]. Initially developed for the KASCADE experiment, it is publicly available, open-source, and a standard tool for the broader astroparticle physics community. Furthermore, we use the software sim_telarray to fully simulate the detector response, ranging from the ray tracing of the photons to the measurement with photomultiplier tubes (PMTs) and its digitization [24].
The simulated events are calibrated and cleaned using the standard H.E.S.S. Analysis Program (HAP). H.E.S.S. is one of the currently operational IACT arrays and is located in Namibia. It consists of five telescopes: four smaller telescopes (CT1-4) arranged in a square with 120 m side length, and a larger fifth telescope (CT5) placed at its center. In this work, we use H.E.S.S. as a showcase scenario for our algorithm. Nevertheless, due to its flexible design, our proposed method can be applied to any IACT array configuration.
Once the data is recorded, a calibration procedure is needed to use the raw data, which entails converting the measured Analog-to-Digital-Converter (ADC) counts into the physical unit of photoelectrons (p.e.). In the following step, a dual-threshold cleaning procedure is applied to exclude all pixels without a shower signal and to keep only pixels with shower signals in local proximity. We used the so-called extended 4/7 cleaning [22] for CT1-5 in this paper. This means that, during the tail-cuts cleaning process, we keep a pixel if its intensity is >4 p.e. and at least one nearest-neighbor pixel has an intensity >7 p.e., or vice versa. Finally, we save the extended image, i.e., we also include the neighboring two rows of pixels around the 4/7-cleaned image. We further use selection cuts, part of the standard analysis chain of H.E.S.S., that select only bright camera images that are not truncated, ensuring high-quality reconstructions. Whereas the minimum image total amplitude, denoted as the image size parameter, guarantees bright images, the maximum local distance, i.e., the maximum allowed distance between the Hillas ellipse center of gravity and the camera center, ensures that the images are not truncated. We refer to the local distance cut as the preselection cut hereafter.
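The dual-threshold rule described above can be illustrated with a minimal Python sketch. This is not the HAP implementation: the function names, the neighbor-list data structure, and the one-dimensional toy geometry are assumptions made purely for illustration.

```python
def tailcuts_clean(signal, neighbors, low=4.0, high=7.0):
    """Return the set of pixel indices kept by a dual-threshold cleaning.

    signal    -- list of pixel intensities in p.e.
    neighbors -- neighbors[i] lists the indices of the pixels adjacent to i
    A pixel is kept if it exceeds `low` and has a neighbor above `high`,
    or if it exceeds `high` and has a neighbor above `low`.
    """
    kept = set()
    for i, s in enumerate(signal):
        if s > low and any(signal[j] > high for j in neighbors[i]):
            kept.add(i)
        elif s > high and any(signal[j] > low for j in neighbors[i]):
            kept.add(i)
    return kept


def extend(kept, neighbors, rows=2):
    """Add `rows` rings of neighboring pixels around the cleaned image."""
    extended = set(kept)
    for _ in range(rows):
        extended |= {j for i in extended for j in neighbors[i]}
    return extended


# Toy example: a one-dimensional chain of 8 pixels (hypothetical geometry).
signal = [0.5, 5.0, 9.0, 4.5, 1.0, 8.0, 0.2, 0.1]
neighbors = [[j for j in (i - 1, i + 1) if 0 <= j < len(signal)]
             for i in range(len(signal))]
cleaned = tailcuts_clean(signal, neighbors)   # the isolated bright pixel 5 is dropped
image = extend(cleaned, neighbors)            # cleaned pixels plus two rows around them
```

In the toy chain, the bright but isolated pixel is rejected because none of its neighbors exceeds the lower threshold, which is the intended effect of requiring signals in local proximity.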
In H.E.S.S., the reconstruction is performed with different configurations of the five telescopes, enabling observations at different energy ranges. These configurations are partly motivated by the start of operation of the individual telescopes: CT1-4 have been operational since 2003, and CT5 was added to the array in 2012. At the time of writing, there are three main configurations: mono, stereo, and hybrid. "Mono" corresponds to observations that solely include CT5 and is most effective at low energies. The "stereo" and "hybrid" configurations correspond to any combination of at least two telescopes from CT1-4 and CT1-5, respectively, having good-quality data, i.e., satisfying the required selection. Whereas stereo is most effective at the highest energies, hybrid covers the entire energy range of H.E.S.S.
In this work, for training the neural networks, all events were simulated as diffuse emission around 20° zenith and 0° azimuth, with an opening angle of 5°. The simulations were performed for the so-called phase2d3, which corresponds to the latest state of the H.E.S.S. array, taking into account effects such as the optical efficiency of the telescope mirrors, in an energy range of 10 GeV to 300 TeV with a spectral index of -2. The proton showers were simulated using the QGSJET-II model implemented in CORSIKA. Since the amount of Cherenkov light produced differs between photon and proton showers due to the hadronic component, the simulated energy range differs slightly between both classes to cover the full sensitivity range of H.E.S.S. We have pre-processed the simulated events using HAP to apply the image cleaning and the trigger condition for the array, and to calculate the Hillas parameters per image in mono and hybrid analysis in the so-called zeta configuration [25].
For detailed comparisons, we have also used a standard set of cuts, where the minimum size parameter of an image is 60 p.e. for CT1-4 and 80 p.e. for CT5, and the maximum local distance of an image is 0.525 m for CT1-4. For CT5, we do not apply a local distance cut. Our final data consists of 10^6 events in the mono data set, 650,000 events in the stereo data set with preselection, and 10^6 events in the stereo data set without the preselection, i.e., without the local distance cut applied. We used 85% for training and validation and 15% for final testing.
Finally, we compare our results to the BDT-based standard γ/hadron separation method used in H.E.S.S. as a benchmark. However, we note that the BDT-based method used in H.E.S.S. requires the full preselection and further uses point-source gamma-ray simulations, while real "off-run" data is used for the background description. These off-runs contain data collected by pointing at parts of the sky without any known gamma-ray source, enabling a stable application of the BDT to data. For our scenario, this is a conservative benchmark, as we likely overestimate the BDT performance on simulations, since diffuse γ's were used for the evaluation of the graph networks.

IACT images and graphs
Current deep-learning approaches for IACT-image analyses are based on CNNs. This approach, even if well motivated and showing promising results, suffers from the following two challenges. Firstly, IACT images are sparse, and the sparsity becomes even more prominent for cleaned images because image cleaning removes a significant fraction of the night-sky background. We found that, on average, only ∼15% of pixels hold signals. See Figure 1a for a visualization of a typical IACT image measured using CT5 after cleaning and pre-processing. Secondly, the design of IACT cameras usually features an arrangement of hexagonal pixels. Thus, re-indexing [26,27] or re-binning [12] has to be performed to allow for using CNNs, which are based on Cartesian grids. This can lead to performance issues if the re-binning resolution is too low, or to inefficient, i.e., sparse, representations [28]. For example, the axial representation introduces many zeros into the new images, indicated as grey pixels in Figure 1b. Therefore, to reduce the computational complexity, we follow a new, more natural approach.
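To give a feeling for the padding overhead of axial addressing, the following Python sketch counts the zero-padded entries for an idealized camera. It assumes a perfectly hexagon-shaped pixel arrangement described by a number of rings around a central pixel; real camera layouts deviate from this, and the function names are hypothetical.

```python
def hex_pixel_count(rings):
    """Number of pixels in a hexagon-shaped camera with `rings` rings
    around the central pixel."""
    return 3 * rings * (rings + 1) + 1


def axial_padding_fraction(rings):
    """Fraction of zero-padded entries when the hexagon is stored in the
    (2*rings+1) x (2*rings+1) array spanned by axial coordinates (q, r)."""
    side = 2 * rings + 1
    # Axial cells inside the hexagon satisfy |q|, |r|, |q + r| <= rings.
    inside = sum(1
                 for q in range(-rings, rings + 1)
                 for r in range(-rings, rings + 1)
                 if abs(q + r) <= rings)
    assert inside == hex_pixel_count(rings)
    return 1.0 - inside / side ** 2


for rings in (5, 10, 20):
    print(rings, round(axial_padding_fraction(rings), 3))
```

For this idealized geometry the padded fraction approaches one quarter of the array for large cameras, and it comes on top of the pixels that are emptied by the cleaning.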
We consider the cleaned IACT images as point clouds, i.e., a collection of triggered pixels with positions (x, y) in the camera frame, each yielding a signal value s in units of p.e. To conduct an efficient deep learning approach based on convolutional operations, we use the point cloud to create a graph, i.e., a collection of nodes that are connected via edges, to be analyzed by graph convolutional networks.
As IACT images can be interpreted as an overlay of images from different stages of shower development, the spatial structure of these recorded images in the camera frame is the fundamental feature for discriminating protons from photons. Thus, we construct a directed graph using the k-nearest-neighbor (kNN) algorithm applied to the pixel positions (x, y) only. Due to the hexagonal pixelization, we use k = 6 and include a self-loop so that each sensor is connected to its closest six neighbors and itself. In principle, two approaches are possible: using all pixels and building a single fixed graph for all events, or considering only the signal pixels to form a signal graph, which changes on an event-by-event basis, as shown in Figure 2. In this work, we focus on the signal graph approach due to its computational efficiency. We also performed a few tests using the fixed graph approach but observed no significant improvements.
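The graph construction can be sketched in a few lines of plain Python. The brute-force neighbor search, the function name `knn_edges`, and the toy pixel positions are assumptions for illustration; a production implementation would typically rely on a dedicated kNN routine of a graph library.

```python
import math


def knn_edges(positions, k=6):
    """Directed kNN graph on pixel positions (x, y), with self-loops.

    Returns a list of (source, target) pairs: every node i receives an edge
    from each of its k nearest neighbors and from itself.
    """
    edges = []
    for i, p in enumerate(positions):
        others = sorted((j for j in range(len(positions)) if j != i),
                        key=lambda j: math.dist(p, positions[j]))
        for j in others[:k]:
            edges.append((j, i))
        edges.append((i, i))  # self-loop
    return edges


# Hypothetical signal pixels: a central pixel surrounded by one hexagonal ring
# of unit spacing, plus one distant pixel.
ring = [(math.cos(a * math.pi / 3), math.sin(a * math.pi / 3)) for a in range(6)]
positions = [(0.0, 0.0)] + ring + [(5.0, 0.0)]
edges = knn_edges(positions, k=6)
```

Because the neighbor search runs only over the signal pixels, a distant pixel is still connected to its six closest signal pixels even when they are not adjacent in the camera, so the resulting graph differs from event to event.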
The final image representation is an unweighted signal graph with $N$ nodes (the number of pixels remaining after cleaning), depicted in Figure 2b. Each node $\mathbf{x}_i$ of the graph nodes $\{\mathbf{x}_1, ..., \mathbf{x}_N\}$ holds a three-dimensional feature vector with the x, y position and the measured signal, $\mathbf{x}_i = (x_i, y_i, s_i)$, forming the matrix of node feature vectors $X \in \mathbb{R}^{N \times 3}$. The edges of the graph indicate a direct neighborhood between two nodes (0 or 1). This relationship is described by the adjacency matrix $A \in \mathbb{R}^{N \times N}$. The diagonal degree matrix $D \in \mathbb{R}^{N \times N}$ holds the number of neighbors of each node.
Note that for a stereoscopic event of j triggered telescopes, j independent graphs are created. In contrast, non-triggered telescopes are modeled as a single-node graph with a zero feature vector and a self-loop. This approach simultaneously provides an efficient representation and retains the information on the telescopes' trigger state; the latter is vital for inferring shower properties.

Transformation of signals
The distribution of measured pixel signals features an exponential tail towards high values. Therefore, to improve the learning behavior of neural networks, we use a logarithmic re-scaling [9] of the measured pixel signals $s_i$ (Eq. 2.1), where $\sigma(s')$ denotes the standard deviation of the transformed signals $s'$ estimated over the entire dataset. Pixels with negative values are removed, as they did not show a performance improvement and reduced the computational efficiency of the algorithm.
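As an illustration of such a re-scaling, the Python sketch below compresses the signals logarithmically and divides by the standard deviation of the transformed values. The specific form log10(1 + s) / σ is an assumption made for this example and is not taken from Eq. (2.1); the function name and the toy signals are likewise hypothetical.

```python
import math
import statistics


def log_rescale(signals, sigma=None):
    """Logarithmic re-scaling of pixel signals (assumed form, see text above).

    Pixels with negative signals are dropped before the transformation.
    `sigma` is the standard deviation of the transformed signals; in practice
    it would be estimated once over the whole training set.
    """
    kept = [s for s in signals if s >= 0.0]
    transformed = [math.log10(1.0 + s) for s in kept]
    if sigma is None:
        sigma = statistics.pstdev(transformed)
    return [t / sigma for t in transformed]


# Hypothetical pixel signals in p.e., including one negative value.
signals = [-0.3, 0.0, 9.0, 99.0, 999.0]
scaled = log_rescale(signals)
```

The logarithm compresses the long tail of large signals, and the division brings the inputs to unit spread, which is a common choice for stabilizing the training of neural networks.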

Graph Convolutional Neural Networks
In recent years, the success of deep learning in speech recognition and computer vision has spread into the natural sciences, including physics. One of the driving forces of deep learning is convolutional neural networks (CNNs), which have proven enormously powerful when applied to regular and structured data such as images and time series. For the analysis of data lying on irregular grids or non-Euclidean manifolds and forming point clouds, however, the convolutional operations have to be substantially modified. Graph convolutional neural networks offer an elegant solution to this challenge. By constructing graphs to represent these point clouds, a clear neighborhood relation is defined, enabling the exploitation of the data structure using convolutions. Currently, two different approaches are used to perform the convolution operation on graphs. One ansatz, the so-called spectral convolution, utilizes a transformation into the Fourier domain, where the convolution acts point-wise. The second is a spatial approach in which the convolutional kernels are spatially localized, similar to CNNs, and the kernels are extended to non-Euclidean and irregular lattices. For a detailed introduction to deep learning on graphs and the different methods, refer to Refs. [5,29].
In this work, we investigate two different formulations that use the spatial ansatz, which is very flexible and has already been successfully utilized in physics [19-21,30].

EdgeConv
For the analysis of point clouds by means of graph networks, edge convolutions were proposed in Ref. [31]. Given a graph with $N$ nodes $\{\mathbf{x}_1, ..., \mathbf{x}_N\}$ as input, each holding various features, the same convolutional operation, followed by an activation function, is performed for each node $\mathbf{x}_i$ to extract and accumulate local features:

$$\mathbf{x}_i' = \mathop{\square}_{j \in \mathcal{N}(i)} h_\Theta(\mathbf{x}_i, \mathbf{x}_j - \mathbf{x}_i).$$

Here, $h_\Theta$ denotes an arbitrary function with trainable parameters $\Theta$, usually parameterized as a feed-forward network. It can be interpreted as a continuous kernel function, as it depends on the feature vector of the central node ($\mathbf{x}_i$) and the local relative feature difference to the neighboring node ($\mathbf{x}_j - \mathbf{x}_i$). After performing the operation with each neighboring node $\mathbf{x}_j$, the aggregation operation $\square_{j \in \mathcal{N}(i)}$ over the neighborhood of the node $\mathbf{x}_i$ is performed. Here, the aggregation operation "$\square$" is to be defined by the user. We will use $\frac{1}{6}\sum_{j \in \mathcal{N}(i)}$ throughout this manuscript.
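The operation can be made concrete with a small, dependency-free Python sketch. It is not the PyTorch Geometric implementation used in this work: the kernel function is reduced to a single dense layer with ReLU activation, the aggregation divides by the size of the neighborhood (which coincides with a 1/6 prefactor for six neighbors), and all names and numbers are hypothetical.

```python
def linear_relu(weights, bias, vector):
    """One dense layer with ReLU activation: max(0, W v + b)."""
    return [max(0.0, sum(w * v for w, v in zip(row, vector)) + b)
            for row, b in zip(weights, bias)]


def edge_conv(features, neighborhoods, weights, bias):
    """Edge convolution with averaged aggregation.

    features      -- list of node feature vectors x_i
    neighborhoods -- neighborhoods[i] lists the nodes j in N(i)
    weights, bias -- parameters Theta of the kernel function h_Theta, here a
                     single dense layer acting on (x_i, x_j - x_i)
    """
    output = []
    for i, x_i in enumerate(features):
        messages = []
        for j in neighborhoods[i]:
            diff = [a - b for a, b in zip(features[j], x_i)]
            messages.append(linear_relu(weights, bias, x_i + diff))
        # Average over the neighborhood, channel by channel.
        output.append([sum(channel) / len(messages) for channel in zip(*messages)])
    return output


# Toy graph with three nodes carrying (x, y, s) features (hypothetical values).
features = [[0.0, 0.0, 1.0], [1.0, 0.0, 2.0], [0.0, 1.0, 4.0]]
neighborhoods = [[0, 1, 2], [1, 0, 2], [2, 0, 1]]   # includes self-loops
# h_Theta with two output channels: channel 0 copies s_i, channel 1 copies s_j - s_i.
weights = [[0, 0, 1, 0, 0, 0],
           [0, 0, 0, 0, 0, 1]]
bias = [0.0, 0.0]
new_features = edge_conv(features, neighborhoods, weights, bias)
```

With these hand-picked weights, the second output channel measures how much brighter the neighbors of a node are on average, which illustrates how the kernel function can pick up local structure from the feature differences.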
Figuratively speaking, EdgeConv can be seen as an extension of the standard CNN operating on an image to continuously distributed data. In the CNN, the same kernel slides over the image and is, at each position, 'convolved' with the local image patch. That is, the point-wise product of the kernel with the pixel and its local neighborhood at the current kernel position is computed. Afterwards, the results are aggregated to a single value by summation, so that the structure of the image is preserved. Similarly, in the edge convolution, the convolution is performed at each node using the same kernel function $h_\Theta$, replacing the discrete filter used in CNNs with a feed-forward network. Hence, applying the kernel function to each neighboring node of the current node resembles the point-wise kernel-patch product in the CNN. Finally, the information is aggregated using $\square_{j \in \mathcal{N}(i)}$. Within this interpretation, the number of features (outputs) of the kernel network corresponds to the number of kernels used in a CNN layer.

TAGConv
Topology Adaptive Graph Convolutional Networks (TAGCN) were introduced in Ref. [32]. TAGCN extends the well-known fixed-size CNN-like filters to graph-convolutional networks in the spatial domain. In this approach, the graph-convolutional operation is achieved by simultaneously applying a set of fixed-size kernels, to be defined by the user, to each node of a given graph. The output at a given node is the weighted sum of the outputs of the individual kernels. The sizes of these kernels determine how far information is propagated to a given node from its neighbors. For example, kernels of size 1 and 2 carry the information from the nearest and next-to-nearest neighbors, respectively. Note that in this setup, the number of nodes considered in the convolution depends on the actual size of the one-hop or two-hop neighborhood. This makes these learnable fixed-size filters adaptive to the topology of the graphs. For a graph with a set of nodes X, the adjacency matrix A, and its degree matrix D, the TAGCN convolution operation is defined as:

$$X' = \sum_{k=0}^{K} \left(D^{-1/2} A D^{-1/2}\right)^{k} X \Theta_k,$$

where the summation is taken over the kernel parameter k, and $\Theta_k$ are the trainable parameters. More concretely, for the k-th term, a given node over which the kernel is applied will only collect the messages from the nodes that are k path lengths away, where one path length corresponds to an edge connection between two nodes. In this work, we have used K = 2, which also performed best in the original work. It is worth emphasizing that the normalized adjacency matrix $D^{-1/2} A D^{-1/2}$ is raised to the power k, which allows lower contributions in the message passed from a farther node than from a nearer one.
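A dependency-free Python sketch of this operation is given below. It is meant only to make the sum over powers of the normalized adjacency matrix explicit and is not the PyTorch Geometric layer used in this work; the helper names, the undirected toy graph, and the unit weights are assumptions.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^{-1/2} A D^{-1/2}, with D the diagonal degree matrix of A.

    Every node needs at least one edge (e.g. a self-loop), otherwise the
    inverse square root of the degree is undefined.
    """
    inv_sqrt_deg = [sum(row) ** -0.5 for row in adj]
    n = len(adj)
    return [[inv_sqrt_deg[i] * adj[i][j] * inv_sqrt_deg[j] for j in range(n)]
            for i in range(n)]


def tag_conv(adj, features, thetas):
    """Sum over k = 0..K of (D^{-1/2} A D^{-1/2})^k X Theta_k, K = len(thetas) - 1."""
    a_norm = normalized_adjacency(adj)
    propagated = features                        # k = 0: the node's own features
    n_out = len(thetas[0][0])
    output = [[0.0] * n_out for _ in features]
    for theta in thetas:
        term = matmul(propagated, theta)
        output = [[o + t for o, t in zip(o_row, t_row)]
                  for o_row, t_row in zip(output, term)]
        propagated = matmul(a_norm, propagated)  # move one hop further out
    return output


# Toy path graph 0 - 1 - 2 with a unit signal on node 0 and K = 2.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
thetas = [[[1.0]], [[1.0]], [[1.0]]]
out = tag_conv(adj, features, thetas)
```

In the toy example, the signal placed on the first node reaches the third node only through the k = 2 term, and with a smaller amplitude than at the direct neighbor, which reflects the damping of messages from farther nodes.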
An analogy can be drawn to CNN-based architectures, where typically a fixed-size filter is used, for example, a 3 × 3 filter in VGG-16 [33] or an 11 × 11 filter in AlexNet [34]. In GoogLeNet [35], however, a combination of filters of different sizes is used in each convolutional layer. Following a similar approach, TAGCN can be considered a graph-convolution operation where a set of up to K-localized filters is used simultaneously. We note, however, that in GoogLeNet the features from filters of different sizes are only concatenated, whereas in TAGCN their weighted sum is formed.
Although similar, TAGConv is not a simple extension of the well-known GCN architecture [36], as it is not an approximation of a graph convolution in the Fourier domain. Rather, it can be understood in the context of graph signal processing, as it utilizes a multiplication by polynomials of the adjacency matrix to consider the k-hop neighborhood.

Network design and training
We use a similar layout for both graph architectures to enable a fair comparison. Each graph part features six graph-convolutional layers of the respective type. For the stereo data set, the same graph part is applied to each of the four telescopes as four towers with shared weights to improve the generalization performance of the architecture. The following network part receives as input the output of each of the six graph convolutional layers. For the stereo architecture, the outputs of all telescopes are concatenated. To these concatenated tensors, a global pooling operation is applied to remove the dependence on the number of graph nodes. As this is a necessary but aggressive pooling operation, the outputs after each graph layer are utilized to keep information from different levels of the feature hierarchy. The pooling operation is followed by five ResNet modules [37] and the output layer. Figure 3 shows a sketch of the network design used for the stereo architecture, with four towers of graph convolutions processing the signal patterns of the four telescopes. For the mono data set, the architecture simplifies to a design with only a single tower of graph convolutional layers to process the graph input.
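How a global pooling removes the dependence on the number of nodes can be sketched as follows. The choice of mean pooling, the concatenation along the node axis, and all names and numbers are assumptions for illustration; the type of pooling used in the actual architecture is not specified in this section.

```python
def global_mean_pool(node_features):
    """Average each feature channel over all nodes of a graph."""
    n_nodes = len(node_features)
    return [sum(channel) / n_nodes for channel in zip(*node_features)]


def pooled_representation(layer_outputs_per_telescope):
    """Concatenate node features over telescopes, then pool each layer.

    layer_outputs_per_telescope[t][l] holds the node features produced by
    graph layer l for telescope t (one row per node).
    """
    n_layers = len(layer_outputs_per_telescope[0])
    pooled = []
    for layer in range(n_layers):
        nodes = [row for tel in layer_outputs_per_telescope for row in tel[layer]]
        pooled.extend(global_mean_pool(nodes))
    return pooled


# Two telescopes, two graph layers, two feature channels (hypothetical values).
tel_a = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],      # layer 0, three nodes
         [[0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]]      # layer 1
tel_b = [[[7.0, 8.0]],                              # layer 0, single node
         [[4.0, 5.0]]]                              # layer 1
vector = pooled_representation([tel_a, tel_b])
```

The length of the resulting vector is the number of layers times the number of channels, independent of how many pixels survived the cleaning in each telescope, so it can be passed to a fully-connected network part.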
The two graph network architectures were implemented using PyTorch Geometric [38]. For more details on the neural network architectures, refer to section A in the appendix. Each network was trained using the ADAM [39] optimizer with the AMSGRAD option [40] on a single Nvidia A100 GPU. The training duration amounted to roughly 10-20 hours. The details of the training parameters can be found in the appendix.
Note that for both the mono and the stereo data sets, we only train on the data without the preselection cut applied. We follow this procedure since using the data without preselection increases the statistics by 50%. This significant increase in training data helps to improve the generalization performance of our deep learning algorithms.

Classification of IACT images
A crucial aspect of ground-based observation of γ-rays is to separate the scarce γ-ray signal events from the abundant hadronic cosmic-ray background events. IACT arrays such as H.E.S.S. yield highly sensitive γ-ray observations only when the background events are effectively rejected. Currently, in H.E.S.S., the γ/hadron separation is performed using so-called Boosted Decision Trees (BDTs). Therefore, we use the BDT classification performance as a baseline comparison for our deep-learning-based classifiers.
In this approach, several parameters extracted from the observed shower images, typically derived from Hillas-based event reconstruction, are combined into one parameter, which describes the likeness of an event to be of hadronic or electromagnetic origin and allows the classification of the events [25]. In contrast, our approach directly utilizes the individual pixel-level information, mitigating the information loss due to image parameterizations.
In this work, we study the performance of the graph network approach and compare it to the traditional BDT method for two cases: "mono" and "stereo", as described in Section 2. To obtain realistic results, the networks are evaluated in the respective sensitivity regime of the configuration, which is 50 GeV to 300 TeV for mono and 100 GeV to 300 TeV for stereo.
To demonstrate the general performance of the classifiers, we use the "Receiver Operating Characteristic" (ROC) curve. The area under the ROC curve (AUROC) is used as the evaluation metric when comparing different classifiers; the closer the AUROC is to 1, the better the performance of the classifier.
Note that the γ/hadron separation inherently depends on the amount of information in the recorded images. That is, with an increasing amount of light in the recorded images, the performance of the classification task is expected to improve. Thus, we examine the AUROC as a function of the reconstructed energy (derived from the image-amplitude parameter), which is highly correlated with the amount of light observed. To derive the uncertainty on the AUROC value for a given bin in reconstructed energy, we used bootstrapping with 1000 resamples. We also compare the performance to the previous work on a deep CNN-RNN-based analysis for the "stereo" configuration [12].
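The evaluation metric and its bootstrap uncertainty can be illustrated with the following Python sketch. It uses the rank-based definition of the AUROC and resamples signal and background events separately; the resampling scheme, the function names, and the toy scores are assumptions and not necessarily identical to the procedure used for the figures.

```python
import random


def auroc(signal_scores, background_scores):
    """Probability that a random signal event scores higher than a random
    background event (ties count one half)."""
    wins = 0.0
    for s in signal_scores:
        for b in background_scores:
            if s > b:
                wins += 1.0
            elif s == b:
                wins += 0.5
    return wins / (len(signal_scores) * len(background_scores))


def bootstrap_auroc(signal_scores, background_scores, n_resamples=1000, seed=0):
    """Mean and standard deviation of the AUROC over bootstrap resamples."""
    rng = random.Random(seed)
    values = []
    for _ in range(n_resamples):
        sig = rng.choices(signal_scores, k=len(signal_scores))
        bkg = rng.choices(background_scores, k=len(background_scores))
        values.append(auroc(sig, bkg))
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / (len(values) - 1)
    return mean, var ** 0.5


# Hypothetical classifier outputs for one bin in reconstructed energy.
gammas = [0.9, 0.8, 0.7, 0.4]
protons = [0.6, 0.3, 0.2, 0.1]
value = auroc(gammas, protons)
mean, spread = bootstrap_auroc(gammas, protons)
```

The pairwise comparison scales quadratically with the number of events and is chosen here for readability; for large test sets, a rank-based computation on the sorted scores is preferable.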

Mono performance
Figure 4 shows the performance of our two GCN-based architectures, TAGConv and EdgeConv, for the mono configuration, i.e., considering CT5 only. In Figure 4a, we examine the general performance over the entire test data set. Both architectures give very similar performances, with an AUROC of 0.9790 for TAGConv and 0.9779 for EdgeConv. The energy-dependent performance is shown in Figure 4b. At reconstructed energies larger than 500 GeV, the performance reaches a stable plateau with an AUROC above 0.99 for both architectures.
In Figure 5, we display the performance of our classifiers in two energy bands of 0.05-1 TeV (lower) and 1-300 TeV (higher). At a gamma-ray efficiency of 50% (true positive rate of 0.5) in the lower energy band, we would misclassify 1 in ∼100 background events (false positive rate of 8 × 10^-3). Similarly, only 1 in ∼4000 background events would be misclassified in the higher energy band. Such a good performance of the γ/hadron separation using a single telescope only can be attributed to its large collection area, its dense pixelization, and the effectiveness of our deep learning approach. Owing to the dense pixelization, the sub-structures in the air-shower light pool can be resolved in more detail, which is especially important for hadron-induced events. Since the mono reconstruction is currently under development, we could not find any available BDT performance of the mono analysis for the H.E.S.S. experiment for comparison.
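The conversion between a working point on the ROC curve and a statement of the form "1 in n background events" can be sketched in Python as follows. The function name and the toy scores are hypothetical; the numbers are chosen such that the false positive rate equals 8 × 10^-3.

```python
def fpr_at_efficiency(signal_scores, background_scores, efficiency=0.5):
    """False positive rate at the score threshold that keeps the requested
    fraction of signal events."""
    ranked = sorted(signal_scores, reverse=True)
    n_keep = max(1, round(efficiency * len(ranked)))
    threshold = ranked[n_keep - 1]           # lowest score still accepted
    n_false = sum(1 for b in background_scores if b >= threshold)
    return n_false / len(background_scores)


# Hypothetical scores: 8 of 1000 background events pass the 50% threshold.
gammas = [0.95, 0.9, 0.7, 0.3]
protons = [0.99] * 8 + [0.1] * 992
fpr = fpr_at_efficiency(gammas, protons)
one_in = 1.0 / fpr                           # background events per misclassification
```

A false positive rate of 8 × 10^-3 thus corresponds to one misclassified background event in 125, i.e., of the order of one in a hundred.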

Stereo performance
The γ/hadron classification performance obtained in this work and its comparison to the BDT-based method for the stereo analysis are shown in Figure 6 and Figure 7 for the full energy range. In the case of the stereo evaluation, it is worth revisiting the cuts applied to the data sets. We studied the performance for two cases: first, where only the image amplitude cuts are applied (Figure 6); second, where the full preselection cuts are applied, i.e., only events are included that have at least two telescopes from CT1-4 passing the local-distance cut defined in Section 2 (Figure 7). We note that the local distance cut ensures a high-quality reconstruction using the BDT but significantly reduces the statistics by 30%.
[...] in the performance of the BDT-based method to ours can be clearly seen below 10 TeV and above 50 TeV. It is worth mentioning that the performance of the BDT-based method is obtained using point-source γ-ray simulations, while the graph networks are evaluated using diffuse γ-ray simulations. Hence, a further drop in performance can be expected for the BDT-based classifier, as it is comparatively more difficult to separate diffuse γ-ray simulations than point-source ones from a diffuse proton background. Finally, in Figure 8a and Figure 8b, we show the ROC curves obtained in two energy bands of 0.1-1 TeV and 1-300 TeV for the data set with preselection cuts. At a γ-ray efficiency of 50% (true positive rate of 0.5), we would misclassify 1 in 500 background events (false positive rate of 2 × 10^-3) in the energy band of 0.1-1 TeV, which is about an order of magnitude better than the BDT-based method and a significant improvement. For higher energies above 1 TeV, at 50% γ-ray efficiency, we would misclassify only 1 in ∼1500 background events, which is again an order of magnitude better than the BDT-based method.

Conclusion
In this work, we presented a novel algorithm based on graph-convolutional networks for discriminating between photon- and hadron-initiated air showers using the images detected by single or multiple Imaging Air Cherenkov Telescopes (IACTs). As an example IACT array, we used simulated proton and photon events for the current configuration of H.E.S.S. By interpreting the cleaned images as graphs, we overcame the inefficient and sparse image representations of approaches based on convolutional neural networks, enabling the application to large IACT arrays featuring different camera geometries, such as the Cherenkov Telescope Array (CTA) [41]. For two different graph convolutional networks, we have demonstrated that graph networks enable a more natural and efficient classification of IACT images. We studied our approach on two different data sets and found similar results for both methods. For the mono data set, only the recently added CT5 telescope was considered. Here, we found a performance enabling a strong separation between protons and photons. In particular, we found a very promising background rejection at low energies (below 1 TeV). Using the stereo data set, we demonstrated that stereoscopic observations using the CT1-4 telescopes can be analyzed with graph convolutional networks. We found that, using deep learning, we can significantly outperform the classical background rejection strategy based on BDTs. At a gamma-ray efficiency of 50%, the background rejection is improved by roughly one order of magnitude. We further found a promising improvement compared to previous deep-learning-based studies using H.E.S.S. simulations, both in the data with and without preselection cuts. The finding that the performance of the deep-learning-based algorithms does not strongly rely on the preselection cut is promising. It will make it possible to increase the statistics in future gamma-ray analyses by up to 50%, with a background rejection almost an order of magnitude better than that of the current generation of classifiers.
This work comprises a novel study of graph networks for IACT images. We expect that future architecture optimizations, e.g., including sophisticated attention mechanisms between telescopes to better resolve the stereoscopic nature of the images, improving the image cleaning, and considering pixel timing, will further enhance the performance. Finally, we would like to stress that it will be crucial to investigate the difference between data and simulation [42,43], including the night sky background and the instrument response, to exploit the full potential of deep learning algorithms for telescope observations under real operating conditions.
The learning rate was reduced by multiplying it by δ = 0.3 for EdgeConv and δ = 0.33 for TAGConv when the validation loss did not decrease for five epochs. The training was stopped when the validation loss did not decrease for ten epochs.
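The interplay of the learning-rate reduction and the stopping criterion can be illustrated with a small Python sketch that replays a sequence of validation losses. The exact counting of epochs, the function name, the initial learning rate, and the loss values are assumptions for illustration and do not reproduce the training code.

```python
def run_schedule(val_losses, lr, factor=0.3, lr_patience=5, stop_patience=10):
    """Replay a sequence of validation losses.

    The learning rate is multiplied by `factor` whenever the loss has not
    improved for `lr_patience` consecutive epochs; training stops once it has
    not improved for `stop_patience` consecutive epochs.
    Returns the number of epochs run and the final learning rate.
    """
    best = float("inf")
    since_best = 0          # epochs since the best loss (early stopping)
    since_change = 0        # epochs since the best loss or the last reduction
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, since_best, since_change = loss, 0, 0
        else:
            since_best += 1
            since_change += 1
        if since_change >= lr_patience:
            lr *= factor
            since_change = 0
        if since_best >= stop_patience:
            return epoch, lr
    return len(val_losses), lr


# Hypothetical loss curve: two improving epochs followed by a plateau.
losses = [1.0, 0.8] + [0.9] * 12
epochs_run, final_lr = run_schedule(losses, lr=1e-3)
```

In this toy run, the learning rate is reduced twice during the plateau before the stopping criterion ends the training.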

Figure 1: Representation of a typical event measured using CT5. (a) Detected Cherenkov light distribution at the camera after cleaning and pre-processing (point cloud). Marker sizes and colors indicate the signal strength. Grey points represent pixels without signal after cleaning and pre-processing. (b) Representation as a Cartesian image using axial addressing; grey pixels mark zero-padded regions, and white pixels indicate pixels without any signal after cleaning and pre-processing.

Figure 2: Representation of an IACT image as (a) fixed graph and (b) signal graph. Node sizes denote signal strength. Grey nodes indicate pixels without any signal after preprocessing. The black edges denote the neighborhood relations after constructing the graph using kNN. For better visibility, self-loops are not visualized.

Figure 3: Simplified sketch of the graph network design. Here, the more complex version of the stereo architecture is shown. In the visualized event, only three of the four telescopes satisfied the image amplitude cut. After processing the signal patterns of the telescopes using six graph convolutional layers, the resulting features are concatenated, and pooling is performed. Finally, a fully-connected network part follows, analyzing the obtained 'pooled' outputs after each convolutional layer of each telescope.

Figure 4: Classification performance for mono events without any selection cuts applied. (a) Comparison of ROC curves for the trained graph network classifiers. (b) Energy evolution of the classification performance. The numbers in the legend indicate the AUROC over the whole energy range.

Figure 5: Classification performance for mono events for the energy interval (a) 50 GeV to 1 TeV and (b) 1 TeV to 100 TeV.

Figure 7: Classification performance for stereo events with the preselection applied. (a) Comparison of the classification between the two graph networks and the BDT classifier of the standard reconstruction. (b) Energy-dependent classification performance of the classifiers.

Figure 8: Classification performance for stereo events with the preselection applied for the energy interval (a) 100 GeV to 1 TeV and (b) 1 TeV to 100 TeV.