Pipeline for performance evaluation of flavour tagging dedicated Graph Neural Network algorithms

Machine Learning is a rapidly expanding field with a wide range of applications in science. In the field of physics, the Large Hadron Collider, the world's largest particle accelerator, utilizes Neural Networks for various tasks, including flavour tagging. Flavour tagging is the process of identifying the flavour of the hadron that initiates a jet in a collision event, and it is an essential aspect of various Standard Model and Beyond the Standard Model studies. Graph Neural Networks are currently the primary machine-learning tool used for flavour tagging. Here, we present the AUTOGRAPH pipeline, a completely customizable tool designed with a user-friendly interface to provide easy access to the Graph Neural Network algorithms used for flavour tagging.


Introduction
The purpose of flavour tagging is to determine the initial parton of jets, i.e. the collimated cones of stable particles arising from the fragmentation and hadronization of a quark after a collision. Unique features of heavy hadrons (bound states involving bottom or charm quarks) are exploited to identify heavy jets; indeed, their decays exhibit distinctive topologies due to their lifetimes. Flavour tagging is particularly important for studying the Standard Model (SM) Higgs boson and the top quark and, additionally, in searches for several Beyond the Standard Model (BSM) resonances. The Large Hadron Collider (LHC) at CERN [1] employs Machine Learning techniques for online tagging of heavy flavours [2]. This approach enables the efficient management of large amounts of data, which is crucial for handling the integrated luminosity produced during the last years of Run 2 and expected during Run 3 [3]. The LHC general-purpose experiments, CMS [4] and ATLAS [5], utilize Graph Neural Networks (GNNs) for flavour tagging. GNNs have the distinct capability of carrying out multi-level inferences. For example, in the ATLAS experiment, the new GN1 tagger [6] makes predictions on three levels: graph classification (i.e. flavour tagging), node classification (i.e. prediction of the track truth origin), and edge classification (i.e. whether the tracks in a track pair belong to a common vertex). Furthermore, GNNs can encode the structure of physical events. For example, in the CMS experiment, the ParticleNet tagger [7] exploits particle-cloud structures to reconstruct and tag jets. Finally, GNN algorithms are applied in offline analyses for background-signal classification tasks at colliders [8]. GNNs are also involved in non-collider experiments. For instance, IceCube has developed a GNN algorithm for neutrino reconstruction [9], which improves reconstruction accuracy for low-energy events. GNNs are thus a crucial new technology in particle physics, but managing and utilizing them is complex; for this reason, a pipeline that allows easy access to this state-of-the-art technology has been developed.

The AUTOGRAPH pipeline
The AUTOGRAPH pipeline (Automatic Unified Training and Optimization for Graph Recognition and Analysis with Pipeline Handling) is an easy-to-use and customizable architecture designed to train and apply GNNs for flavour tagging. The framework consists of two main components, the user interface and the automated steps, as depicted in Figure 1. To manage the jet-graph structure, the network architecture, and the training hyperparameters, the user fills in a single configuration file via the provided interface. The pipeline's automated structure consists of Python modules managed by the user interface. To start the training process, the user executes a single Python script. For supervised Machine Learning algorithms like the one used in the pipeline, the training process involves adjusting the network parameters (also known as weights) to minimize the loss function. The loss function represents the difference between the network's prediction and the target value associated with the event, typically obtained from Monte Carlo simulations. The automatic training process consists of three main steps: pre-processing the data, defining the network architecture, and iterating through epochs to update the network weights. Moreover, the network is tested on a validation dataset, different from the one used for training, to assess its performance.
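As an illustration of the single configuration file that drives the automated steps, the choices mentioned above could be collected as follows. All key names and values here are hypothetical, chosen only to mirror the options described in the text, not the pipeline's actual schema.

```python
# Hypothetical sketch of an AUTOGRAPH-style configuration.
# Every key name below is an assumption for illustration only.
config = {
    "graph": {
        "tracks_per_jet": 40,                 # fixed number of tracks per jet-graph
        "track_features": ["pt", "d0", "z0"],  # per-node (track) features
        "global_features": ["jet_pt", "jet_eta"],  # per-graph (jet) features
    },
    "architecture": {
        "graph_layer": "GCN",                 # or "GAT"
        "num_graph_layers": 5,
        "hidden_nodes": 512,
        "pooling": "attention",
        "fcn_layers": 3,
        "fcn_hidden_nodes": 128,
    },
    "training": {
        "train_fraction": 0.7,                # default train/validation split
        "learning_rate": 1e-3,
        "batch_size": 256,
        "epochs": 500,
    },
}

def validate(cfg):
    """Minimal sanity checks a pipeline could run before starting a training."""
    assert 0.0 < cfg["training"]["train_fraction"] < 1.0
    assert cfg["architecture"]["graph_layer"] in ("GCN", "GAT")
    assert cfg["graph"]["tracks_per_jet"] > 0
    return True
```

A single entry script would then read such a file and dispatch the pre-processing, architecture definition, and training loop.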

Dataset handling
The pipeline works on simulated datasets previously pre-processed by the user. It requires an array representation of track-jets with a fixed number of associated tracks; the tracks should be associated with the jets before the pipeline acts. In AUTOGRAPH, the dataset undergoes an initial automated step to define the graph representation. The graph in Figure 2 illustrates a jet, with the tracks linked to the jet represented as nodes in the graph. Each node corresponds to a customizable set of track features the user selects. Finally, once the tracks and their related variables are specified, the user selects the global features associated with the graph structure and linked directly with the jet. Through the configuration file, the user can customize the graph architecture, selecting the number of tracks per jet, the features associated with each track, and the global jet features. The resulting graph list is converted into the PyTorch Geometric DataLoader format [10]. During the conversion, the original dataset is divided into two datasets: the training dataset, dedicated to the network's training, wherein the network parameters are updated at each epoch, and the validation dataset, used to monitor the network performance. The configuration file allows the user to choose the fraction of data used for training, which is set to 0.7 by default.

Network architecture
The network is trained by performing a classification task to identify the jet's originating flavour. To feed the GNNs, the pipeline recasts the array-like dataset in a format compatible with the graph layers, as explained in the previous section. The graph-jets are provided as input to a user-configurable number of graph layers. The user is given a choice between two message-passing-based layers: Graph Convolutional Layers (GCN) [11] and Graph Attention Layers (GAT) [12]. The configuration file allows the selection of the number of graph layers and the number of hidden nodes per graph layer. Once the graph-jet has been processed by the GCN and/or GAT layers, a pooling function is applied. The latter converts the updated graph representation into an array-like representation, using an embedding criterion over all track features. The user can select from various pooling functions in the configuration file; in particular, a pooling attention function that applies the attention mechanism as an embedding criterion [13] can be chosen. A three-class Fully Connected Network (FCN) performs the final classification, taking the pooled graph representation as input. The FCN outputs three values representing the probability that the input jet originates from a bottom quark, a charm quark, or a light quark. Through the configuration file, the user can set the number of linear layers and the number of hidden nodes per linear layer in the FCN. To sum up, the user can customize the entire architecture of the network as per their requirements. Moreover, the attention mechanism [14] can be integrated into both the graph part of the network and the pooling function. The final architecture of the network is exemplified in Figure 3.
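A minimal, library-free sketch of two of the automated steps above: building the fully connected edge list of a jet-graph over its tracks (PyTorch Geometric stores this as an edge_index tensor) and splitting the jet list into training and validation sets with the default 0.7 fraction. The function names are illustrative, not the pipeline's own.

```python
import itertools
import random

def fully_connected_edges(n_tracks):
    """Directed edge list of a fully connected graph over n_tracks nodes
    (both directions, no self-loops), as used for a jet-graph of tracks."""
    return [(i, j) for i, j in itertools.product(range(n_tracks), repeat=2)
            if i != j]

def train_val_split(jets, train_fraction=0.7, seed=0):
    """Shuffle the jet list and split it into training and validation sets."""
    rng = random.Random(seed)
    shuffled = jets[:]
    rng.shuffle(shuffled)
    n_train = int(len(shuffled) * train_fraction)
    return shuffled[:n_train], shuffled[n_train:]
```

For the 40-track jet-graph of Figure 2 this yields 40 x 39 = 1560 directed edges.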
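The attention-based pooling can be illustrated with a stdlib-only sketch: per-node scores are turned into softmax weights, and the jet-level vector is the weighted sum of the track features. This is a schematic of the mechanism, not the pipeline's implementation.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def attention_pool(node_features, scores):
    """Collapse per-track node features into one jet-level vector:
    attention weights come from the per-node scores, then each output
    component is the weighted sum of that feature across the tracks."""
    weights = softmax(scores)
    n_feat = len(node_features[0])
    return [sum(w * node[k] for w, node in zip(weights, node_features))
            for k in range(n_feat)]
```

With equal scores the pooling reduces to the plain mean over tracks; raising one node's score pulls the pooled vector toward that track.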

Training and performance evaluation
To train the previously defined network structure, the network weights need to be updated. The network adjusts the weights, based on the input features and the architecture, to differentiate between the different types of jets. The primary objective is to minimize the loss function. To achieve this, the optimizer calculates the gradient of the loss function with respect to the network weights and selects the weights that minimize the loss function. In the pipeline, the loss function and the optimizer are fixed to the Cross-Entropy Loss [15] and the Adam Optimizer [16], respectively. The user can define the optimizer learning rate, i.e. the step size at each iteration while moving toward a minimum of the loss function. Furthermore, the network is trained on the whole training dataset in steps of data batches; the batch size can be defined by the user. All the customizable hyperparameters affect the network performance and can be used to perform a grid search for network optimization.
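For one jet with three flavour logits, the quantity the pipeline minimizes (PyTorch's Cross-Entropy Loss applied per sample) reduces to the negative log of the softmax probability of the true class. A stdlib sketch, written in the numerically stable log-sum-exp form:

```python
import math

def cross_entropy(logits, target):
    """Cross-entropy loss for a single jet: -log softmax(logits)[target],
    computed as logsumexp(logits) - logits[target] for numerical stability."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_sum - logits[target]
```

A uniform prediction over the three flavours gives a loss of ln 3, while a confident correct prediction drives the loss toward zero.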

Application of the pipeline on simulated datasets
Comprehensive testing of the pipeline has been conducted to evaluate both its usability and performance. Two different datasets have been simulated at the Run 2 center-of-mass energy √s = 13 TeV. The test consists of a grid search on the graph portion of the network architecture, with the pooling function, the training hyperparameters, and the FCN structure fixed.
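The grid-search logic can be sketched with itertools. The scanned values below are assumptions chosen only to reproduce a 20-point grid like the one described in the Results, and train_and_eval stands in for a full training run returning the best-epoch loss.

```python
import itertools

# Hypothetical grid over the graph portion of the architecture;
# the exact values scanned are assumptions for illustration.
graph_layers = [1, 2, 3, 4, 5]
hidden_nodes = [64, 128, 256, 512]

grid = list(itertools.product(graph_layers, hidden_nodes))  # 20 points

def grid_search(train_and_eval, grid):
    """Train one model per (layers, hidden) point and keep the lowest loss."""
    results = {point: train_and_eval(*point) for point in grid}
    best = min(results, key=results.get)
    return best, results
```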

Dataset simulation
Three Monte Carlo simulation frameworks, interfaced with each other, have been exploited to obtain the datasets. Firstly, MadGraph_aMC@NLO [17] was used to generate the parton-level hard processes, followed by Pythia 8.3 [18], which provides the parton showering and hadronization. Finally, Delphes 3.5.0 [19] covers the detector response simulation. The ATLAS experiment card, included in Delphes by default, has been used. To obtain the value and the uncertainty of the transverse (d0) and longitudinal (z0) impact parameters, the Delphes configuration has been modified by adding track smearing. A next-to-leading order H dataset and a leading order Z' dataset have been used in the network training. Here, H refers to the Higgs boson and Z' is a hypothetical neutral vector boson that arises in extensions of the SM. Several models predict the existence of a Z' boson, such as the B-L model [20] and the leptophobic Z' model [21]. In this study, the Z' has a fixed mass of m_Z' = 2 TeV and is restricted to decay in the hadronic channel (Z' → qq̄). The purpose of the Z' dataset is to extend the jet transverse momentum (pT) range, as shown in the two top plots of Figure 4. The tracks have been associated with a jet following the ΔR criterion, where ΔR is defined as ΔR = √((Δη)² + (Δφ)²), with η the pseudorapidity and φ the azimuthal angle. A track is associated with a jet if ΔR ≤ 0.45 for jet pT ≤ 150 GeV, while ΔR ≤ 0.26 is required for jet pT > 150 GeV [22]. The resulting distributions for both datasets are reported in Figure 4. Using the Monte Carlo truth information, the jets have been labeled with three indices 0, 1, 2, corresponding to light, charm, and bottom jets, respectively. For the training, a dataset with 7 global features and 11 node features has been prepared; the description of each feature is reported in Table 1.
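The pT-dependent track-jet association above can be written directly (stdlib only); delta_r wraps the azimuthal difference into [-π, π] before taking the quadrature sum with the pseudorapidity difference.

```python
import math

def delta_r(eta1, phi1, eta2, phi2):
    """Angular distance between a track and the jet axis, with the
    azimuthal difference wrapped into [-pi, pi]."""
    d_eta = eta1 - eta2
    d_phi = (phi1 - phi2 + math.pi) % (2 * math.pi) - math.pi
    return math.hypot(d_eta, d_phi)

def is_associated(track_eta, track_phi, jet_eta, jet_phi, jet_pt_gev):
    """pT-dependent cone from the text: dR <= 0.45 for jet pT <= 150 GeV,
    dR <= 0.26 above."""
    cone = 0.45 if jet_pt_gev <= 150.0 else 0.26
    return delta_r(track_eta, track_phi, jet_eta, jet_phi) <= cone
```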

Results
We conducted a grid search using both datasets. In this search, we examined a total of 20 different architectures, each trained for 500 epochs with the hyperparameters listed in Table 2. This approach allowed us to explore the various architectures and identify the best-performing model. To analyze the performance of the different architectures tested, we extracted the minimum value assumed by the loss function during the training process over the 500 epochs. This metric provides insight into how well the model minimizes the difference between predicted and actual values. The obtained minimum loss value for each architecture is plotted in Figure 5 for both datasets. By examining the plot, we can compare the performance of the various architectures and determine which one achieves the lowest loss value. The results of our analysis indicate that the architecture with 5 GCN layers and 512 hidden nodes per GCN layer has the lowest value of the loss function for both datasets. This suggests that this architecture is the most suitable for our purposes. However, we acknowledge that evaluating the network's performance based solely on the loss function may not be sufficient. Therefore, the pipeline provides the distribution of the discriminant, D_b, which can offer additional insights into the network's performance. The discriminant is defined as D_b = ln[p_b / (f_c · p_c + (1 − f_c) · p_light)], where p_b, p_c, and p_light are the network output probabilities for b-, c-, and light-quark jets, respectively, and f_c is the fraction of c-jets in the dataset. This variable represents the network's capability to distinguish between jet flavours. By examining the D_b distribution, we can better understand how well the network discriminates between the different classes and make more informed decisions about its efficacy. The discriminant distribution obtained by training the architecture with 5 GCN layers and 512 hidden nodes per GCN layer on the H dataset is shown in Figure 6 (up). In order to gain a better understanding of events, it is essential
to extract physical information using tagging algorithms. The pipeline facilitates this by providing a ranking of the input features. By ranking these features based on their importance in the network classification task, we can gain insights into the physical properties of the events. Currently, the PyTorch Geometric Explainer method [23] is used in the pipeline to rank these features. This method provides a comprehensive approach to analyze and evaluate the features that are crucial to the classification of events. However, we are continuously working on improving this feature-ranking process to ensure that it is accurate and efficient. The result of the feature ranking obtained with the trained model used to produce the discriminant distribution is shown in Figure 6 (down). From the latter, it is evident that the network considers the impact parameters to be fundamental, as they provide crucial information for determining the characteristics of the jet being analyzed.
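The discriminant can be computed directly from the three network output probabilities. Assuming the standard form D_b = ln[p_b / (f_c · p_c + (1 − f_c) · p_light)], a one-line sketch:

```python
import math

def d_b(p_b, p_c, p_light, f_c):
    """b-tagging discriminant: log-ratio of the b-jet probability to the
    f_c-weighted mixture of the c-jet and light-jet probabilities."""
    return math.log(p_b / (f_c * p_c + (1.0 - f_c) * p_light))
```

Jets the network considers b-like receive large positive D_b values, while light-like jets receive negative ones, which is what makes the distribution of D_b a useful per-flavour performance summary.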
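The pipeline relies on the PyTorch Geometric Explainer for its feature ranking; as a rough, model-agnostic illustration of the same idea, one can instead shuffle one input feature at a time and record the score drop (permutation importance). This is a stand-in sketch of the concept, not the pipeline's method.

```python
import random

def permutation_importance(score, dataset, n_features, seed=0):
    """Shuffle one feature column at a time across the dataset and record
    how much the model score drops relative to the unshuffled baseline."""
    rng = random.Random(seed)
    baseline = score(dataset)
    importances = []
    for k in range(n_features):
        shuffled = [row[:] for row in dataset]          # copy rows
        column = [row[k] for row in shuffled]
        rng.shuffle(column)                             # break feature k
        for row, value in zip(shuffled, column):
            row[k] = value
        importances.append(baseline - score(shuffled))  # score drop
    return importances
```

Features whose shuffling leaves the score unchanged (constant columns, or inputs the model ignores) receive zero importance.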

Conclusion
The AUTOGRAPH pipeline, a state-of-the-art flavour tagging tool based on GNNs, has been presented. The pipeline consists of several components that work together to provide accurate and efficient tagging of jets of different flavours produced in high-energy physics experiments. Firstly, the input data is pre-processed to extract the relevant features, which are then fed into the GNN-based tagging algorithm. The GNN architecture comprises several layers of neural networks that learn to recognize patterns in the input data. The output of the GNN is a set of probabilities corresponding to each jet flavour. The AUTOGRAPH pipeline has been extensively tested on two simulated datasets to evaluate its usability and performance, using standard metrics. Overall, the AUTOGRAPH pipeline provides a straightforward and efficient way to perform flavour tagging in high-energy physics experiments.

Figure 1 .
Figure 1. Pictorial representation of the pipeline. The user interface section represents the configuration file wherein the user can choose the settings for the automated steps. The latter are divided into three main processes, illustrated schematically.

Figure 2 .
Figure 2. Fully connected graph representation of a jet in the pipeline. The forty nodes are the tracks associated with the jet. The graph is labeled from the Monte Carlo truth, in this case as 0, corresponding to a light jet.

Figure 3 .
Figure 3. Representation of the network architecture. The input graph, labeled by the MC truth-level information, is classified by the Fully Connected Network into three classes corresponding to light-, c-, or b-jets.

Figure 4 .
Figure 4. Jet transverse momentum distribution for the H next-to-leading order dataset (up-left) and for the Z' leading order dataset with m_Z' = 2 TeV (up-right). Distributions of the number of associated tracks for the H next-to-leading order dataset (down-left) and for the Z' leading order dataset with m_Z' = 2 TeV (down-right). The distributions are divided by jet truth flavour.

Figure 5 .
Figure 5. Results from the grid search. Heat map with the best-epoch loss value for the grid search on the H next-to-leading order dataset (left) and on the Z' leading order dataset (right).

Figure 6 .
Figure 6. Discriminant distribution for the best architecture with the H next-to-leading order dataset, divided per truth flavour (up). Representation of the ranking of the 18 input features (down).