A detailed study of interpretability of deep neural network based top taggers

Recent developments in the methods of explainable artificial intelligence (XAI) allow researchers to explore the inner workings of deep neural networks (DNNs), revealing crucial information about input–output relationships and how data connect with machine learning models. In this paper we explore the interpretability of DNN models designed to identify jets coming from top quark decay in high energy proton–proton collisions at the Large Hadron Collider. We review a subset of existing top tagger models and explore different quantitative methods to identify which features play the most important roles in identifying top jets. We also investigate how and why feature importance varies across different XAI metrics, how correlations among features impact their explainability, and how latent space representations encode information as well as correlate with physically meaningful quantities. Our studies uncover some major pitfalls of existing XAI methods and illustrate how they can be overcome to obtain consistent and meaningful interpretations of these models. We additionally illustrate the activity of hidden layers as neural activation pattern diagrams and demonstrate how they can be used to understand how DNNs relay information across their layers and how this understanding can help make such models significantly simpler by enabling effective model reoptimization and hyperparameter tuning. These studies not only facilitate a methodological approach to interpreting models but also unveil new insights about what these models learn. Incorporating these observations into augmented model design, we propose the particle flow interaction network model and demonstrate how interpretability-inspired model augmentation can improve top tagging performance.


Introduction
Machine learning (ML) models are ubiquitous in experimental High Energy Physics (HEP). With an ever-increasing volume of data coupled with complex detector phenomenology, these models are useful for extracting meaningful information from large datasets. Over time, machine learning models have grown in complexity, and simpler regression and classification models have been replaced by intricate deep neural networks. Owing to their intractably large number of trainable parameters and arbitrarily complex non-linear nature, deep neural networks (DNNs) have often been treated as black boxes. It has always been challenging to understand how different input features contribute to the network's computational process and how the interconnected neural pathways convey information. In recent years, advances in explainable Artificial Intelligence (XAI) [1] have made it possible to build intelligible relationships between an AI model's inputs, architecture, and predictions [2,3,4]. While some methods remain model agnostic, a substantial subset of these methods has been developed to interpret computer vision models, where intuitive reasoning can be extracted from human-annotated datasets to validate XAI techniques. However, for other data structures, such as large tabular data or relational data constructs like graphs, the use of XAI methods is still quite novel [5,6]. In recent times, XAI has been successful in learning the underlying physics of a number of problems in high energy detectors [7], including parton showers at the Large Hadron Collider (LHC) [8] and jet reconstruction using particle flow algorithms [9].
One of the major applications of ML in the field of HEP is the classification of jets, referred to as jet tagging. Jets represent hadronic showers observed as conical sprays of particles originating from quarks and gluons produced in high energy collisions at a collider experiment like the LHC. Identifying jets that originate from the decay products of a particle such as the top quark (t), and being able to separate them from other jet categories, such as jets originating from the quantum chromodynamics (QCD) background, is an important challenge in many physics analyses. Traditional top tagging algorithms based on kinematic features of jets and clustering of jet constituents (see Refs. [10,11,12,13] for example) have been used in particle phenomenology research as well as by the ATLAS and CMS experiments and their predecessors. In Run 1 physics analyses, these top tagging algorithms, along with low-complexity statistical models like decision trees, took center stage in top tagging [14,15,16]. However, owing to their superior performance, models based on DNNs started becoming popular in Run 2 at a higher center-of-mass energy of 13 TeV [17,18].
For top quarks produced with large momenta, the decay products can be packed close to one another and be reconstructed as a single jet. For such boosted jets, top tagging can be particularly challenging and requires a better analysis of jet substructure, a collection of constituents and their derivative properties that can offer better discrimination between jet classes. DNNs have proven useful in exploiting jet substructure properties for jet classification. A wide variety of deep learning models have been developed to optimize top tagging [19,20,21,22,23,24,25,26,27,28,29,30,31,32,33]. A comprehensive review and comparison of many of these models is given in Ref. [34]. Some of these models have exploited the DNN's capacity to approximate arbitrary non-linear functions [35] and their huge success with problems in the field of computer vision, while other models have been inspired by underlying physics information like jet clustering history [22], physical symmetries [23], and physics-inspired feature engineering [27]. These efforts have inspired novel model architectures and feature engineering by creating or augmenting input feature spaces with physically meaningful quantities [27,36,37].
The rich history of physics-inspired model development makes the problem of top tagging an excellent playground to better understand modern XAI tools themselves. This allows us to traverse a rare two-way bridge in exploring the relationship between data and models: our physics knowledge allows us to better understand the inner workings of modern XAI tools and perfect them, while those improved tools allow us to take a deeper look at the models, paving the way for analyzing and reoptimizing them. As pointed out in Ref. [38], such insights into the explainability of DNN-based models are important to validate them and to make them reliable and reusable. Additionally, the broader scope of uncertainty quantification in association with ML models relies on developing robust explanations [39], and in the field of HEP, problems like top tagging will require a dedicated understanding of how robust as well as interpretable these models are [40].
Yet another remarkable application of interpretability is to understand how the model conveys information and, in doing so, which parts of a DNN most actively engage in the forward propagation of information. Such studies could be useful to understand and reoptimize model complexity. Given that DNNs have shown remarkable success in jet and event classification, recent work has placed emphasis on developing DNN-enabled FPGAs for trigger-level applications at the LHC [41,42,43]. As the resource consumption and latency of FPGAs directly depend on the size of the network to be implemented, it is much easier to embed simpler networks on these devices. Hence, methods that allow interpreting a network's response patterns as well as providing critical insights about model optimization without compromising its performance can greatly benefit these budding fields of ML applications, especially for online event selection and jet tagging at current and future high energy colliders.
Application of state-of-the-art explainability techniques for interpreting jet tagger models has received more attention recently [37,44,45,46] and has been demonstrated to be successful in identifying feature importance for models like the Interaction Network [47]. In this paper, we study the interpretability of a subset of existing ML-based top tagging models. The models we have chosen use multi-layer perceptrons (MLPs) as the underlying neural architecture. Choosing a simpler neural architecture allows us to elucidate the applicability and limitations of existing XAI methods and develop new tools to examine them without confounding these efforts with the complexity of larger models or unorthodox data structures. To compare our results for different models as well as with existing benchmarks in the published literature, we use the dataset developed by the authors of Ref. [23] and later used in the top tagger model review in Ref. [34]. The models explored in this paper, along with the dataset, are reviewed in section 2. The model hyperparameters explained in this section constitute the baseline model in each category. Variants of each model are studied to better understand their interpretability, where the underlying architecture remains the same but model hyperparameters, input features, or data preprocessing might be changed. Section 3 reviews the modern XAI methods that we use in investigating the explainability of top tagger models. In section 4, we analyze the results of applying XAI methods on different top tagger models. Section 6 summarizes our findings and illustrates new dimensions to explore at the conjunction of XAI and HEP.

Review of Top Tagging Dataset and Models
The dataset used in this paper has been used to perform model benchmarking studies in Ref. [34] and is publicly available at Ref. [48]. This dataset consists of 1 million top (signal) jets and 1 million QCD (background) jets generated with Pythia8 [49] with its default tune at 14 TeV center-of-mass energy for proton-proton collisions. The detector simulation was performed with Delphes [50] and jets were reconstructed using the anti-k_T algorithm [51] with a jet radius of R = 0.8 using FastJet [52]. Only jets with transverse momenta within the range of 550 and 650 GeV are considered. For each jet, the dataset contains the four-momenta of up to 200 constituents, with zero-padded entries for missing constituents. The dataset is divided into training, validation, and testing sets with a 6:2:2 split. Some characteristic jet features from a random subsample of the training data are shown in Figure 1. In this paper we consider three different NN-based models for top tagging. Given that the taggers distinguish between two jet classes, minimizing the standard binary cross-entropy (BCE) loss has been used as the training objective for all models. The training is done using the Adam optimizer with minibatches. All networks showed comparable performance with different batch sizes. The architecture, hyperparameters, and data preprocessing for each of the baseline models are summarized below.
TopoDNN [19,17]: The simplest top tagging model we consider is a fully connected multi-layer perceptron (MLP) trained with the transverse momentum (p_T), azimuthal angle (φ), and pseudorapidity (η) of the 30 most energetic particles. Usually referred to as TopoDNN, this model represents a quintessential MLP network. Although TopoDNN is outperformed by many other ML models for top tagging, its simple architecture allows us to explore different XAI metrics, their limitations, and the best practices to overcome them. Since MLPs are still widely used in HEP for a wide variety of applications, our studies of modern XAI for
this model will also illustrate the best practices to interpret the input-output relations for such models. TopoDNN is trained on preprocessed data where (i) the jet is rotated on the η-φ plane to have the most energetic component aligned along the central coordinate (0,0), (ii) the second most energetic component falls along the negative-φ axis, and (iii) all momenta are scaled with an arbitrarily chosen factor of 1/1700. Transformations (i) and (ii) take advantage of the underlying Lorentz invariance of collider physics, while (iii) converts the momenta into unitless quantities and scales them down to a numerical range comparable to those of the η and φ quantities. The baseline model is constructed with 4 hidden layers with 300, 102, 12, and 6 nodes respectively and the ReLU activation function. The output layer consists of a single node which is transformed by the sigmoid function to represent the probability of the jet being classified as a signal jet.
Multi-body N-subjettiness (MBNS) [20,21]: Top tagging with N-subjettiness variables uses an MLP as the underlying trainable architecture. However, the input to the network is different from the usual kinematic variables. It uses the multi-body N-subjettiness variables [53], defined as

τ_n^(β) = (1 / p_T,J) Σ_i p_T,i min{ ΔR_1i^β, ΔR_2i^β, ..., ΔR_ni^β },
where p T,J and p T,i represent the transverse momenta of the jet and its i-th constituent and ∆R ki is the distance between the k-th jet axis and the i-th particle constituent.
The n jet axes chosen for calculating τ_n^(β) are obtained using the k_T algorithm [54] with E-scheme recombination [55]. Figure 2 shows the distribution of some of the τ variables for QCD and top jets. The input to the MBNS tagger is the set of subjettiness variables, supplemented by the jet p_T and jet mass (m_J) to provide a kinematic scale for the jet event. However, the latter inputs are scaled by a factor of 1/1000 to mitigate the several-orders-of-magnitude gap between their numerical range and those of the τs. In our work we consider the MB8S model, an MLP consisting of 4 hidden layers with (200, 200, 50, 50) nodes respectively. The ReLU activation function is used for the hidden layers. The output layer consists of 2 nodes transformed by the SoftMax function to respectively represent the probabilities of the jet being classified as a background or signal jet.
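The N-subjettiness definition above translates directly into code. The sketch below computes τ_n^(β) with numpy for a given set of jet axes; as a simplifying assumption of this illustration, the jet p_T in the denominator is taken as the scalar sum of constituent p_Ts, and the axes are supplied directly rather than obtained from k_T reclustering.

```python
import numpy as np

def tau_n(beta, axes, constituents):
    """N-subjettiness tau_n^(beta): axes and constituents are arrays of
    (pT, eta, phi) rows. Each constituent is associated with its nearest
    axis, and contributions are pT-weighted."""
    pt = constituents[:, 0]
    # Delta R between every constituent i and every candidate axis k
    deta = constituents[:, None, 1] - axes[None, :, 1]
    dphi = constituents[:, None, 2] - axes[None, :, 2]
    dphi = (dphi + np.pi) % (2 * np.pi) - np.pi  # wrap azimuthal difference
    dr = np.sqrt(deta**2 + dphi**2)
    # normalisation by the scalar-sum jet pT is an assumption of this sketch
    return np.sum(pt * np.min(dr, axis=1) ** beta) / np.sum(pt)
```

A jet whose constituents sit exactly on the axes gives τ = 0, reflecting perfect n-prong compatibility.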
Particle Flow Network (PFN) [24]: PFNs are built following the deep set [56] architecture, inherently making the network invariant under permutation of particle constituents. The model implements the relation
f = F( Σ_{i=1}^{N} Φ(p_i) ),
where F and Φ represent non-linear functions implemented as trainable NNs, N is the number of jet constituents, and p_i is the four-momentum of the i-th jet constituent. The Φ network learns to create a latent embedding of each constituent particle. These latent embeddings are summed over all constituent particles of a jet, creating jet-level latent embeddings and making the network invariant under permutation of particle constituents. Finally, the jet-level embeddings are passed on to the F network, which performs the jet classification. We train the network with the p_T, η, φ of jet constituents as input. As a part of data preprocessing, we standardized the constituents' η and φ by subtracting the jet's η and φ. Also, the p_T values of the jet constituents are scaled by the inverse of the sum of constituent p_Ts, i.e. 1/Σ_i p_T,i. The Φ network is implemented as an MLP with 3 layers of 100, 100, and 256 nodes respectively. Each layer is followed by a ReLU activation layer. The output layer of Φ represents a 256-dimensional latent space of jet representation. The F network consists of 3 hidden layers with 100 nodes per layer with ReLU activations. The output consists of two nodes transformed by the SoftMax operation to represent the probabilities corresponding to each jet class.
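The deep set structure F(Σ_i Φ(p_i)) and the permutation invariance it guarantees can be illustrated with a minimal numpy sketch. The layer sizes and random weights here are illustrative stand-ins, not the trained (100, 100, 256) and (100, 100, 100) networks described above.

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(dims):
    """Random ReLU-MLP weights; stands in for a trained network."""
    return [(rng.normal(size=(a, b)) / np.sqrt(a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def forward(layers, x):
    for i, (w, b) in enumerate(layers):
        x = x @ w + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)  # ReLU on hidden layers only
    return x

phi_net = mlp([3, 16, 8])   # per-constituent embedding Phi (toy sizes)
f_net = mlp([8, 16, 2])     # jet-level classifier F (toy sizes)

def pfn(constituents):
    # summing the per-constituent embeddings makes the output
    # independent of the ordering of the constituents
    latent = forward(phi_net, constituents).sum(axis=0)
    return forward(f_net, latent)
```

Feeding the same jet with its constituents in any order produces identical outputs, since the only interaction between constituents is the order-independent sum.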

Interpretability of Machine Learning Models: Tools and Methods
While a number of XAI techniques have been developed in the ML literature, how any one of these methods actually explains an ML model can be quite different from the others [57,58,59]. Often we find the XAI techniques producing diverging explanations, making it challenging to rely on these methods. We investigate a number of these techniques, compare the corresponding results, and try to understand what may contribute to their divergence. Here, we summarize the methods that we are going to use to explore the interpretability of top tagger models.
The ∆AUC method: Identifying feature importance has been an important part of studying classification models [60]. In standard feature selection tasks, a reasonable subset of the features that excels in some model performance metric is chosen. Although it is conceptually different from feature ranking in post-hoc model interpretation, many interpretation metrics also rely on identifying feature importance with a simpler surrogate model which is trained to minimize a model's performance loss [61]. One of the most useful model analysis tools for binary classification is the Receiver Operating Characteristic (ROC) curve, and the Area Under the Curve (AUC) serves as a scalar metric for evaluating model performance. ROC-AUC based feature ranking has been widely promoted in the ML literature [62,57,63]. We adapt those same principles for our model interpretation studies. One straightforward way of evaluating a feature's contribution to making predictions is to investigate the model's performance when that particular feature is masked from the input, by replacing it with a population-wide average value or a zero value, whichever is contextually relevant to the model's relationship with the training dataset.
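A minimal sketch of the ∆AUC procedure, assuming a generic scoring function `model` (a hypothetical placeholder for any trained classifier) and mean-masking one feature at a time; the AUC itself is computed with the rank-sum (Mann-Whitney) formula.

```python
import numpy as np

def auc(scores, labels):
    """ROC AUC via the rank-sum (Mann-Whitney) statistic."""
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = labels.sum()
    n_neg = len(labels) - n_pos
    return (ranks[labels == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def delta_auc(model, X, y):
    """Delta-AUC ranking: mask one feature at a time with its population
    mean and record the resulting drop in AUC."""
    base = auc(model(X), y)
    drops = []
    for k in range(X.shape[1]):
        Xm = X.copy()
        Xm[:, k] = X[:, k].mean()   # mean-mask feature k
        drops.append(base - auc(model(Xm), y))
    return np.array(drops)
```

A feature whose masking causes a large AUC drop is ranked as important; masking a feature the model ignores leaves the AUC unchanged.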

Shapley Additive Explanations (SHAP) [64]:
The SHAP scores represent a game theoretic approach to identifying the importance of different features. For each instance of the dataset, the input features are assigned an additive score that determines to what extent a particular feature contributes to the classifier prediction. An average model prediction is determined by replacing each feature with its population average, and then individual features are added back to the model to find their impact on leading the prediction towards the optimal value. For each feature, the SHAP score is determined by evaluating the average contribution of adding the feature over all possible feature subsets defined without that feature. Given that evaluating exact SHAP scores requires iterating over 2^n sets of feature combinations for each data instance with n features, several simplifying assumptions are made to reduce the computational complexity. In our work, we use the kernel SHAP method, a model agnostic approach to obtain local explanations similar to the LIME framework [65], obtained by generating random samples around the data point and performing a mean-squared-error-minimizing linear regression over the samples to evaluate the SHAP scores. In order to avoid any overfitting in obtaining the SHAP scores, the number of samples for each model was chosen to be at least twice the number of input features.
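The exact, exponential-cost Shapley computation that kernel SHAP approximates can be written out directly for a handful of features. The sketch below uses the mean-masking convention described above; `model`, `x`, and `background` are generic placeholders for a trained classifier, one data instance, and a reference dataset.

```python
import itertools
import math
import numpy as np

def shap_exact(model, x, background, n_features):
    """Exact Shapley values by enumerating all feature subsets.
    Features absent from a coalition are replaced by their population
    (background) mean."""
    mu = background.mean(axis=0)

    def value(subset):
        z = mu.copy()
        for i in subset:
            z[i] = x[i]          # present features keep their true value
        return model(z[None, :])[0]

    phi = np.zeros(n_features)
    for k in range(n_features):
        rest = [i for i in range(n_features) if i != k]
        for r in range(len(rest) + 1):
            for s in itertools.combinations(rest, r):
                # Shapley weight for a coalition of size |s|
                w = (math.factorial(len(s))
                     * math.factorial(n_features - len(s) - 1)
                     / math.factorial(n_features))
                phi[k] += w * (value(s + (k,)) - value(s))
    return phi
```

By construction the scores are additive: they sum to the difference between the prediction at x and the prediction at the all-mean baseline (the efficiency property).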

Layerwise Relevance Propagation (LRP) [66, 67]:
The LRP technique propagates the classification score predicted by the network backwards through the layers of the network and attributes a partial relevance score to each input. The backpropagation of LRP scores in an MLP network is obtained by the following relation:
r_j^(n) = Σ_k [ a_j^(n) w_jk:n / Σ_j' a_j'^(n) w_j'k:n ] r_k^(n+1),
where a_j^(n) and r_j^(n) are the activation and relevance scores of the j-th node in the n-th layer and w_jk:n is the weight that determines the contribution of the j-th activation in the n-th layer to the k-th node in layer n+1. The inputs to the network are identified as the 0-th layer and the relevance scores assigned to them are denoted as r_j^(0). The original LRP method was developed for simple MLP networks. Variants of this method have been explored to propagate relevance across convolutional neural networks [68] and graph neural networks [69]. While the basic LRP rule above conserves the total relevance score, i.e. the classifier network's output, depending on the distribution of weights and activations, relevance scores can become unbounded when Σ_j a_j^(n) w_jk:n → 0.
To overcome this, the LRP rule is modified to treat positive and negative weights asymmetrically. We use the so-called LRP-γ rule, defined as
r_j^(n) = Σ_k [ a_j^(n) (w_jk:n + γ w_jk:n^+) / Σ_j' a_j'^(n) (w_j'k:n + γ w_j'k:n^+) ] r_k^(n+1),
where w^+ = w · Θ(w), Θ being the Heaviside step function, and γ is a regularization parameter. In our representation, we always present the normalized relevance scores so that Σ_j r_j^(0) = 1.
Relative Neural Activity (RNA) and Neural Activation Pattern (NAP) diagrams: To examine how information propagates through the hidden layers, we evaluate the RNA score, defined as
RNA(j, k; S) = Σ_i a_j,k(s_i) / max_j' Σ_i a_j',k(s_i),
where S = {s_1, s_2, ..., s_N} represents a set of samples over which the RNA score is evaluated. The quantity a_j,k(s_i) is the activation of the j-th neuron in the k-th layer when the input to the network is s_i. When summed over all the samples in the evaluation set S, this represents the cumulative neural response of a node, which is normalized with respect to the largest cumulative neural response in the same layer to obtain the RNA score. Hence, in each layer, there will be at least one node with an RNA score of 1. Since the neurons are activated with ReLU activation in the models we consider, the RNA score will be strictly non-negative, and ≤ 1. In a qualitative way, we are trying to see which neurons most actively engage in obtaining the predictions from our models. Since the MLPs in our models consist of only Dense layers, each layer takes all the activations from the previous layer as inputs. As all nodes within a given layer are subject to the same set of inputs, we can reliably estimate how strongly they perceive and transfer that information to the next layer by looking at their activation values. For the same reason, we normalize the cumulative activation of a node with respect to the largest aggregate in the same layer. The NAP diagram is obtained by presenting the RNA scores for the different layers of the model as a two-dimensional heatmap, where the horizontal axis spans the different activation layers of the network and the vertical axis represents the different nodes in those activation layers. NAP diagrams illustrate the relative activity level of different nodes within each layer and hence can demonstrate the sparsity of the model's activity.
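A minimal numpy sketch of the LRP-γ backward pass for a ReLU MLP. The small stabilizer `eps` guards against vanishing denominators, and γ = 0.25 is an arbitrary illustrative choice, not a value taken from the text.

```python
import numpy as np

def lrp_gamma(weights, activations, r_out, gamma=0.25, eps=1e-9):
    """Backward LRP-gamma pass through Dense/ReLU layers.
    weights[n] maps layer n to n+1; activations[n] are the (post-ReLU)
    activations of layer n, with activations[0] the network inputs;
    r_out is the relevance assigned to the output layer."""
    r = r_out
    for w, a in zip(weights[::-1], activations[-2::-1]):
        w_gamma = w + gamma * np.clip(w, 0.0, None)  # boost positive weights
        z = a @ w_gamma
        z = z + eps * np.where(z >= 0, 1.0, -1.0)    # stabilized denominator
        # distribute each upper node's relevance over its inputs
        r = a * (w_gamma @ (r / z))
    return r
```

Because the denominator is the sum of the numerators, the total relevance is conserved (up to the eps stabilizer) as it propagates back to the inputs.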
The methods that we have explored so far adopt widely different approaches to understanding different aspects of a NN. A summary of their properties is given in Table 1. In our analyses, we use models with O(10-100) inputs, so scalability is not a major bottleneck for the application of these methods. On the other hand, although some of these methods allow exploring local explanations, i.e. explanations for individual data samples, we restrict our studies to global explanations alone.

[Table 1: comparison of ∆AUC, SHAP, LRP, and RNA/NAP along the criteria: scalability in input dimension, local explanation, global explanation, requires forward propagation, requires backward propagation, susceptible to spurious correlations, addresses model complexity, and requires retraining.]
Table 1. Comparison of different XAI methods in terms of their implementation heuristics. We consider a method scalable if the complexity of its implementation grows at most linearly. A local explanation refers to explanation metrics assigned to individual features for a given data point, while a global explanation refers to explanation metrics assigned to individual features for the entire dataset.

Model Interpretability for Top Taggers
Ideally, we expect an XAI metric to correctly identify the features that the NN considers most important. Hence, any post-hoc feature ranking XAI method should ideally identify the same set of features, though their relative rankings can moderately vary. However, there is no straightforward correlation among the XAI methods introduced in the previous section. In the following subsections, we first investigate and validate these methods in the context of the simpler TopoDNN and MB8S models. Building on the insights obtained from these studies, we use these tools to interpret the PFN model and investigate its latent space representation.

TopoDNN
Since TopoDNN is arguably the simplest NN-based model to perform top tagging, it is perhaps the ideal model with which to investigate different aspects of XAI. Given that correlation among features has been demonstrated to be an important aspect of identifying feature rankings in classical machine learning [70] as well as in modern XAI methods, we start by examining the pairwise Pearson correlation coefficients for a subset of the input features for background and signal jets in Figure 3. The correlation matrices for both jet categories are mostly sparse, except for some large anti-correlations between p_T,0, the transverse momentum of the most energetic jet constituent, and that of some of the low energy constituents. Given that the dataset has been generated within a limited jet p_T range, such anti-correlations are expected: the higher the energy of the most energetic constituent, the lower the energy of the remaining constituents. Given that the numerical range of the p_T of lower energy constituents is typically much smaller than that of the highest energy constituent, we can expect the impact of their anti-correlations with p_T,0 on the NN's performance to be rather small. We can indeed verify this in Figures 4a-4c, where we identify the important features for the TopoDNN model using the ∆AUC (Figure 4a) and SHAP scores (Figures 4b and 4c). The ∆AUC score cannot independently identify the features that contribute to the identification of signal and background jets. However, by evaluating the SHAP scores for subsets of the dataset that only contain one kind of jet, one can identify the features that most dominantly contribute to the identification of the corresponding jet class. However, as shown in Figures 4b and 4c, there is a significant overlap between the features that are identified as the most important ones for both jet categories. Unlike computer vision models that deal with images or videos as input data, where the importance distribution for different images can vary based on which pixels
carry the most relevant information, the same feature can contribute equally importantly to different classes in models with tabular data. Why the network treats the same set of variables as important becomes clearer upon inspecting the distributions of some of these preprocessed features, as shown in Figures 4d-4g. φ_1, φ_2, and η_2 all show strong classification characteristics, and given that these variables are either loosely correlated or almost uncorrelated, as shown in Figure 3, they all can independently contribute to the network's ability to tell apart the different jet classes. On the other hand, p_T,0 by itself is a modest discriminator and hence is identified as having a modest impact on the model's performance. This has also been verified by training a variant of the TopoDNN model that excludes the p_T,0 variable and, as shown in Table A1, performs almost equally as well as the baseline model. However, we see a stark difference in the distribution of the relevance scores (Figure 5) among different features, obtained from the LRP method, when compared to the other feature ranking metrics we have considered so far. Unlike the SHAP or ∆AUC scores, a subset of the p_T variables have the largest relevance scores for both jet categories. While the most highly ranked features from the previous two methods show strong discriminating characteristics, some of the highly ranked features from the LRP method show very little discriminating capacity. This difference can be understood from the nature of these ranking methods. Both ∆AUC and SHAP calculate the model's deviation from the mean behavior, i.e.
qualitatively, they both represent how much information is obtained from the inclusion of the true value of a feature instead of using the population mean as a feature mask. On the other hand, LRP calculates the feature's cumulative relevance, which additively includes the relevance scores attributed to each feature's mean behavior. Let x = {x_i} be a sample jet event taken from the set of events X and x_\k = (x \ {x_k}) ∪ {E(X_k)} be the event where we mask the k-th input feature by replacing it with its mean value. The linear order behavior of the NN can be approximated using the Deep Taylor Decomposition formalism [71]:
f(x) ≈ f(x_\k) + (∂f/∂x_k)|_{x_\k} (x_k - E(X_k)),     (7)
where f(x) represents the output of the NN before the final Sigmoid activation. Noting that the relevance scores additively distribute the functional output among the different inputs, i.e. f(x) = Σ_i r(x_i) and f(x_\k) = Σ_{i≠k} r(x_i) + r̄(x_k), we can rewrite Eqn. 7 as
Σ_i r(x_i) ≈ Σ_{i≠k} r(x_i) + r̄(x_k) + (∂f/∂x_k)|_{x_\k} (x_k - E(X_k)).     (8)
We define δr_k = (∂f/∂x_k)|_{x_\k} (x_k - E(X_k)) as the differential relevance score attributed to the corresponding feature. When the features are loosely correlated, collecting terms with equivalent indices in Eqn.
8 and ignoring higher order effects, we can write r(x_k) ≈ r̄(x_k) + δr_k, where we denote r̄(x_k) as the mean-behavior relevance score. Figure 6a shows the absolute mean-behavior relevance scores of different features; the relative size and distribution of these relevance scores are very similar to what we see in Figure 5. This shows that a large contribution to the LRP scores actually comes from the mean-behavior relevances, and has very little to do with the network's ability to distinguish different jet types. In fact, many of the angular variables that are regarded as highly important by the ∆AUC and SHAP methods also show large differential relevance, as shown in the distributions of Mean Absolute Differential Relevance (MAD Relevance) scores (normalized with respect to relevance scores from the baseline model) in Figures 6b and 6c. Now we turn our focus to examining the behavior of the internal architecture of the model with NAP diagrams. As discussed in Section 3, NAP diagrams plot the RNA scores of different nodes of the activation layers within the network. Figure 7 shows the 2D map of RNA scores for QCD and top jets, where the RNA scores of the former are plotted as negative values to allow simultaneous visualization. It can be readily seen that the network in Figure 7 is quite sparse, i.e.
most nodes show relatively small activations. We can heuristically quantify the sparsity of the network by the fraction of hidden activation nodes with an RNA score less than a given threshold. We arbitrarily choose this threshold to be 0.2 and find that the network's sparsity measures for the background and signal jet categories are 0.86 and 0.76 respectively, giving an overall sparsity measure of 0.70. This implies that about 70% of the nodes show a cumulative activity level of less than 20% compared to that of the most active node in the corresponding layer. We also see in Figure 7 that the most active nodes for different jet categories are almost completely disentangled by the time information propagates to the third layer. The large sparsity of the network and the early disentanglement of jet categories indicate that the network's complexity can be reduced without any noticeable compromise in its performance. To demonstrate this, we have trained variants of the TopoDNN model with lower complexity by simultaneously reducing the depth and width of the MLP model. As shown in Table A1, these simplified models perform almost equally well while the model complexity is significantly reduced.
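The RNA score and the sparsity heuristic used above can be sketched as follows, where each entry of `layer_activations` is a (samples × nodes) array of post-ReLU activations recorded for one hidden layer; the 0.2 threshold mirrors the choice in the text.

```python
import numpy as np

def rna_scores(activations):
    """Relative Neural Activity for one layer: each node's summed
    activation over the sample set, normalized by the largest summed
    activation in the layer (so the maximum RNA per layer is 1)."""
    total = activations.sum(axis=0)
    return total / total.max()

def sparsity(layer_activations, threshold=0.2):
    """Fraction of hidden nodes whose RNA score falls below threshold,
    aggregated over all hidden layers."""
    scores = np.concatenate([rna_scores(a) for a in layer_activations])
    return float((scores < threshold).mean())
```

With ReLU activations the scores are guaranteed non-negative, so the sparsity measure directly counts nodes that are nearly inactive relative to the busiest node in their layer.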

MB8S
Although the underlying architecture of the MB8S model is an MLP, there is a stark difference in the input features. Unlike the inputs to the TopoDNN model explored in Section 4.1, the inputs to the MB8S model are highly correlated for both jet categories (Figure 8). Such large correlations among features can make it hard to distinguish whether a feature truly conveys independent discriminating characteristics or whether a feature is deemed important by a model simply because it is correlated with another feature. Training a NN with correlated feature inputs can contribute to increased model complexity [72] and overfitting [73], and can obscure its interpretability [74]. The MB8S network has been trained with dropout layers [75] with a dropout rate of 0.2 (0.1) for the first (final) two hidden layers to protect it from the problem of overfitting. With such large correlations among input features, we want to differentiate between two aspects of a feature's importance: its independent contribution to a network's decision making process and its deemed importance in a certain instance of a trained model because of its correlations with other features. As shown in Figures 9a-9c, the sets of top ranking features for these different methods show significant overlap, but their ranking shows some noticeable differences, especially for the jet mass feature. As shown in Figure 1c, the distribution of jet mass is very different for the two types of jets and, naturally, it is expected to be a strong discriminator. This is also reflected in the distribution of SHAP scores. But the ∆AUC ranking places the jet mass variable at a lower rank compared to subjettiness variables whose distributions are shown in Figures 2a and 2b, both of which are identified as top ranking variables in both ∆AUC and SHAP rankings and have strongly discriminative distributions. We do not expect a one-to-one correspondence between the feature rankings from ∆AUC and SHAP scores. Nevertheless, having a lower rank for jet mass compared to variables that
display almost perfect correlations with other strong discriminators naturally raises the question of whether it is overshadowing the importance of variables that actually add new information to the classifier. It has been observed that the inclusion of jet mass significantly improves the performance of the MBNS taggers [20]. Hence, the jet mass distribution has the ability to contribute more to the network's decision-making process than the other, highly correlated subjettiness variables. In order to investigate whether the highly correlated variables can independently contribute to the network's performance, we train a variant of the MB8S network where only the τ_x^(1) variables are included, along with the mass and transverse momentum of the jets. This choice is inspired by the block-diagonal concentration of pairwise correlations in Figure 8. This alternate network has almost identical performance to the baseline MB8S network, as shown in Table A1. The feature rankings via ∆AUC and SHAP scores for this network also consistently identify τ_2 and jet mass as the most important features for this classification model. These top-ranking features display relatively weak correlations among themselves (correlation coefficients ≤ 0.4) and hence can contribute new information to the classifier's decision-making process. Moreover, since the two networks demonstrate almost equivalent performance, the highly correlated subjettiness variables only marginally impact the network's performance. The relatively high ∆AUC scores attributed to some of these strongly correlated variables may thus result from the feature correlations themselves. This, however, does not imply that these features are unimportant for the particular instance of the trained network we investigated. On the contrary, the large ∆AUC scores associated with these variables indicate that the trained MB8S model depends on these correlations for proper inference. But the large importance associated with these variables should not be interpreted as their independent contribution to jet classification.
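The masking procedure behind the ∆AUC metric, and the way a correlated near-duplicate can hide a feature's independent contribution, can be sketched with synthetic stand-ins. All names, the toy data, and the fixed "model" weights below are illustrative assumptions, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(0)

def auc(y, score):
    """AUC via the rank-sum (Mann-Whitney) statistic."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Synthetic stand-ins: f0 is informative, f1 is a near-copy of f0
# (like two highly correlated subjettiness variables), and g carries
# independent information (like the jet mass).
n = 4000
y = rng.integers(0, 2, n)
f0 = y + rng.normal(0, 1.0, n)
f1 = f0 + rng.normal(0, 0.1, n)
g = y + rng.normal(0, 1.0, n)
X = np.column_stack([f0, f1, g])
w = np.array([0.5, 0.5, 1.0])   # fixed weights standing in for a trained model

base_auc = auc(y, X @ w)

def delta_auc(k):
    """AUC drop when feature k is replaced by its mean."""
    Xm = X.copy()
    Xm[:, k] = X[:, k].mean()
    return base_auc - auc(y, Xm @ w)

scores = [delta_auc(k) for k in range(3)]
```

Masking either member of the correlated pair barely moves the AUC because its twin carries the same information, while masking the independent feature produces a much larger drop; this is exactly why per-feature ∆AUC scores must be read with the correlation structure in mind.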
Next we turn our attention to the relevance scores attributed to different input features by the LRP method. From our studies of the LRP method for the TopoDNN model, we know that relevance scores can be an unreliable measure of how important a feature really is. We see that pattern repeated for the MB8S network too. Figures 10a and 10c show the mean absolute relevance scores attributed to different features for background and signal jets. LRP attributes a large relevance score to p_T,J, the transverse momentum of the jet, for the background jets. However, this variable is one of the least expressive features for the network. It has almost no correlation with other features and has a very similar distribution for both jet types. Hence, assigning this feature a very large relevance score raises concerns about the reliability of the feature ranking obtained from LRP. The distribution of MAD relevance scores in Figure 10b gives a more appropriate distribution of feature importances. For the top jets, the MAD relevance distribution (Figure 10d) identifies τ_2 and m_J as the most important features. To demonstrate how the different hidden activation layers contribute to information propagation for the two jet categories, we show the 2D map of the RNA scores of different nodes in Figure 11. Although the network appears to be relatively sparse for each jet category, with sparsity measures of 0.74 and 0.64 for background and signal jets respectively (measured with respect to a threshold of 0.2 on the RNA scores), different nodes are most strongly activated for the signal and background jet categories. As a result, the network's overall sparsity is only 0.44. However, we can already see that the nodes that are most strongly activated by the two jet categories are almost completely disentangled at Layer 2.
This indicates that the network might be simplified by choosing a shallower architecture; this is indeed verified in Table A1, where we see that models trained without the final two layers have almost identical performance metrics.
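The sparsity numbers quoted above can be computed directly from the RNA maps. The sketch below assumes sparsity is the fraction of nodes whose RNA score falls below the 0.2 threshold, and that the overall sparsity counts only nodes that are inactive for both jet classes (which is why it can be lower than either per-class value); these are our readings, and the toy RNA values are invented:

```python
import numpy as np

def sparsity(rna_scores, threshold=0.2):
    """Fraction of nodes whose RNA score falls below the threshold."""
    rna_scores = np.asarray(rna_scores)
    return float(np.mean(rna_scores < threshold))

# Toy RNA maps for two jet classes over the same set of nodes.
rna_bkg = np.array([0.9, 0.05, 0.1, 0.0, 0.1])
rna_sig = np.array([0.1, 0.8, 0.7, 0.05, 0.1])

s_bkg = sparsity(rna_bkg)   # per-class sparsity
s_sig = sparsity(rna_sig)
# A node counts toward overall sparsity only if it is quiet for BOTH classes.
s_all = float(np.mean((rna_bkg < 0.2) & (rna_sig < 0.2)))
```

Because the two classes activate largely disjoint sets of nodes, the overall sparsity is necessarily no larger than either per-class value, matching the 0.44 versus 0.74/0.64 pattern discussed above.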

PFN
While the previously analyzed TopoDNN and MB8S models both employ a single MLP to perform the jet classification, PFN utilizes a deep set topology that linearly combines particle-level neural embeddings to obtain a jet-level latent space, which is eventually used to train a second MLP to learn the classification task. We start by examining the feature rankings from the ∆AUC and MAD relevance scores in Figure 12. We note that the ranking of features obtained from the ∆AUC metric is somewhat different from the ranking of the MAD relevance scores for the two jet categories. For instance, the azimuthal angle of the most energetic jet constituent, φ_0, appears with a low MAD relevance score for background QCD jets while appearing as the top-ranked feature for the signal jets and also reporting a relatively large ∆AUC value. On the other hand, the relative transverse momentum of that same constituent has the largest MAD relevance score for background jets as well as the largest ∆AUC score, while being ranked after φ_0 and φ_1 for the signal jets. This difference in MAD relevance rankings between the two jet classes is somewhat unlike what we have seen for the classifier models previously investigated. These deviations in relative rankings can be understood by examining the distributions of the corresponding features. Note that at linear order, the differential relevance δr_k ∝ x_k − x̄_k. Hence, larger deviations from the mean can lead to larger differential relevance scores. The distribution of φ_0 − φ̄_0 is sharply peaked at zero for background jets. Hence, it is reasonable for this variable to have a low MAD relevance score for background jets. On the other hand, the same variable shows a long-tailed distribution for the signal jets and hence has a larger impact on its MAD relevance score. Similarly, the distribution of p_t,0 − p̄_t,0 has long tails for both jet classes with a wider spread for the background jets. As a result, this variable ranks high for both jet classes, while
being ranked higher for background jets compared to signal jets.
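The point that the MAD ranking is driven by the spread of the relevance around its mean, rather than by the mean itself, can be sketched as follows. The helper name and toy data are ours: feature 0 mimics a variable with a large but nearly constant relevance, feature 1 a variable with a small mean but long tails.

```python
import numpy as np

def mad_relevance(relevance):
    """Mean absolute differential relevance per feature.

    relevance: (n_events, n_features) array of per-event relevance scores.
    The differential relevance of an event is its deviation from the
    class-mean relevance (delta r_k ~ x_k - xbar_k at linear order);
    MAD aggregates its magnitude per feature.
    """
    r = np.asarray(relevance, dtype=float)
    dr = r - r.mean(axis=0)
    return np.abs(dr).mean(axis=0)

rng = np.random.default_rng(1)
rel = np.column_stack([
    5.0 + rng.normal(0, 0.01, 1000),   # large offset, no event-to-event spread
    rng.standard_t(df=2, size=1000),   # small mean, long-tailed spread
])
scores = mad_relevance(rel)
```

The constant-offset feature gets a near-zero MAD score despite its large raw relevance, which is the same mechanism that demotes p_T,J in the MAD ranking while plain LRP ranks it high.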
The TopoDNN network is trained on a similar set of inputs, and hence it is natural to expect noticeable overlap in the sets of important features for these two networks. However, the input data preprocessing is very different for the two networks, which may result in significant differences in how these variables are treated by the corresponding models. TopoDNN always centers the most energetic constituent at the origin of the (η, φ) plane while using a fixed-scale normalization for the p_t variables. Hence, the φ_0 variable is trivially set to zero for all jets and does not appear in TopoDNN's list of top-ranked variables. While such differences make it difficult to obtain a one-to-one comparison between the two sets of feature rankings, some common conclusions can be drawn from both models. For instance, distributions pertaining to the highest-energy constituents are (trivially) more important than those of the lower-energy constituents. We also see that the angular distributions of these constituents play a more decisive role in determining the jet class for the top jets, while the distributions of their transverse momenta have a larger impact in classifying the background jets.
The particle-level embeddings obtained by the Φ network are summed to obtain a latent space representation of the jet. Characterization of latent spaces has been a topic of general interest in many areas of machine learning. For instance, disentangling semantic features of images via the latent spaces of Variational AutoEncoders (VAEs) [76] and their variants [77,78,79,80] has been widely studied in the modern machine learning literature. In the context of collider physics, how latent spaces embed information and can serve as effective candidates for anomaly detection and bump hunting has been studied in [30,24,81]. The authors of PFN performed detailed studies showing how the latent space representation forms discernible contours in the (η, φ) plane of jet image representations. While such studies are useful for divulging geometric features of the latent space configuration, interpreting how these features actually contribute to the network's decision-making process, especially for large latent spaces with O(100) dimensions, remains unexplored.
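The summation that builds the jet-level latent space can be sketched as a minimal deep set, with a single random linear layer standing in for the trained Φ network (weights, sizes, and names here are illustrative only); the permutation check makes the invariance of the construction explicit:

```python
import numpy as np

rng = np.random.default_rng(0)

# One random linear layer + ReLU stands in for the trained Phi network.
W = rng.normal(size=(3, 8))

def phi(particles):                    # (P, 3) -> (P, 8) per-particle embeddings
    return np.maximum(particles @ W, 0.0)

def jet_latent(particles):
    """PFN-style jet latent space: per-particle embeddings summed over the
    jet's constituents (a linear combination, so no inter-particle
    interaction can be encoded at this stage)."""
    return phi(particles).sum(axis=0)  # (8,)

jet = rng.normal(size=(30, 3))         # 30 constituents with (pt, eta, phi)
z = jet_latent(jet)
z_perm = jet_latent(jet[rng.permutation(30)])   # constituent order is irrelevant
```

The sum over constituents is what guarantees permutation invariance, but it is also the linearity that the PFIN augmentation later relaxes.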
Since the PFN network is trained to produce classification scores in two disentangled dimensions in the very last layer of the F network, it is expected that the information propagation pathways for the different types of jets will show some level of disentanglement within the hidden layers of the network. This is indeed verified by the NAP diagrams shown in Figure 13. These NAP diagrams reveal some crucial insights. First, we clearly see that the activity levels of different nodes in the final layer of the Φ network are very similar for background and signal jets. This implies that the network embeds the jet-level information in the same latent subspace for both classes. Second, the latent space appears to be very sparse, and we indeed found that many of the latent space variables are identically zero for all events in both jet categories. A third observation is that the F network effectively learns to disentangle the representations of the jet classes only at the third hidden layer (Figure 13b). Moreover, the first layer of the F network is quite sparse, with RNA scores for almost 40% of the nodes being very close to zero for both jet classes. With these observations in mind, we retrained variant networks with latent space dimensions of 64 and 32 while reducing the number of nodes in the first hidden layer of F, and still obtained comparable performance (Table A1).
Given the sparsity of the latent space representation, we expect that only a small subset of these latent features will have a strong contribution to the decision-making process of F in the baseline model. The rankings of the latent features by ∆AUC score and MAD relevance score are shown in Figures 14a-14c. There is noticeable overlap among the features that rank high with these methods, though the actual ordering of the latent variables, understandably, shows some differences. We show the distributions of some of these latent space embeddings in Figures 14d-14g. These distributions highlight a stark contrast between the latent space representation of PFN and those studied in the context of generative models like VAEs. The latter class of models is known to provide semantic disentanglement in the latent space, creating a clear separation between latent dimensions that account for different variabilities in the input distributions. For instance, VAEs trained on the popular celebrity portrait dataset CelebA [82] have successfully demonstrated disentanglement of semantics like age, gender, skin tone, and hairstyle [77] in the latent space. In the context of the top tagging dataset, it is the jet kinematics that is embedded in the latent space. As seen from the NAP diagram in Figure 13b and the latent space feature distributions in Figures 14d-14g, the PFN latent space does not provide any disentanglement between the jet classes. In fact, for models like PFN that are trained to learn the jet classes rather than the distribution of the training dataset, there is no additional incentive to disentangle the jet classes in the latent space. Rather, PFN learns to embed the information regarding jet classes in correlations among the latent features. This can be seen from the latent space correlation matrices of the two jet classes shown in Figure 15. To illustrate how the jet-class-identifying characteristics reside in the correlations among the latent features, we examine the distribution of variances in the dataset using Principal Component Analysis (PCA) [83]. PCA performs a linear transformation on these features to obtain a set of orthogonal feature dimensions with no cross-correlation among the transformed features. Since the latent space is large but sparse, we select the top-ranking subset of latent features such that simultaneously masking the latent features in the remaining subset causes at most a 1% drop in the AUC score on the test data. For our baseline PFN model, this requires choosing 95 of the 256 latent features. We found that 99% of the observed variance in the test data was described by the top 37 principal components. We show the distributions of the top four components in Figures 16a-16d. We can readily see how these PCA-transformed latent features differentiate between the two jet classes in Figures 16e-16f.
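The PCA step above can be sketched as follows: a toy "latent space" with only a few independent directions of variance stands in for the masked PFN latent features, and we count the principal components needed to reach 99% of the variance. Shapes, seeds, and names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy "latent space": 20 dimensions but only 3 independent directions of variance.
base = rng.normal(size=(5000, 3))
mix = rng.normal(size=(3, 20))
latent = base @ mix + rng.normal(0, 1e-3, size=(5000, 20))

# PCA via SVD of the centered data.
centered = latent - latent.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
explained = S**2 / np.sum(S**2)            # explained-variance ratio per component
cum = np.cumsum(explained)
n99 = int(np.searchsorted(cum, 0.99) + 1)  # components covering 99% of the variance

z_pc = centered @ Vt.T                     # decorrelated principal-component scores
```

By construction the principal-component scores are mutually uncorrelated, which is what allows class information hidden in latent-feature correlations to show up as separation along individual components.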
Having examined the disentanglement of the jet classes by the principal components of the latent space, it is also instructive to investigate the physical nature of the latent space learned by the network. While it is neither trivial nor obvious for neural networks to learn features that bear meaning to humans, it has been seen that latent spaces can occasionally encode physical variables [30]. In the case of PFN, since the latent space dimensions are highly correlated with each other, we chose to study the correlations between the principal components and jet features like the jet mass, the number of constituents, and the subjettiness variables, which, as shown in Figure 2, can have moderate to strong discriminative power. We found that the first principal component, z_pc,0 (Figure 16a), shows a strong correlation with jet mass for both jet categories, with correlation coefficients of 0.82 and 0.64 for background and signal jets respectively. z_pc,0 also shows strong correlations with the number of jet constituents. z_pc,1 and z_pc,2 show moderate to strong correlations with the subjettiness variables τ_1^(1) and τ_2^(1), implying that the PFN model also learns to approximately reconstruct distributions similar to these variables.

Interpretability Inspires: The Particle Flow Interaction Network (PFIN)
The performance metrics of PFN are found to be very similar to those of the MB8S network. Our studies suggest that PFN learns to loosely reconstruct some of the expressive jet features in its latent space. What constrains PFN's performance can be understood by examining the construction of the latent space. The PFN latent space is constructed by linearly combining the individual particle-level embeddings from the Φ network. Such linear combinations constrain the latent space's ability to learn any inter-particle interaction. The particle-level embeddings obtained from the Φ network do not take into account the ensemble of particles constituting the jet. As a result, the network learns to create per-particle embeddings by emphasizing particle-level features that are known to have moderately-to-strongly expressive distributions (Figure 12). Hence, we can expect a noticeable improvement in PFN's performance if the latent space can be augmented with interaction-level representations. In fact, modern architectures known to outperform the PFN model for top tagging take inter-particle interactions into account in some form.
Inspired by our observations regarding feature importance and latent space distributions for PFN as well as the trend in building modern architectures for top tagging, we propose an augmentation of PFN by including an Interaction Network (IN) [47,30] to demonstrate how particle-level interactions allow for better-performing models.
The dataflow of the proposed Particle Flow Interaction Network (PFIN) model is shown in Figure 17. Interactions are modeled in PFIN by constructing a fully connected undirected graph with N_pp = P(P−1)/2 edges, where P is the maximum number of constituent particles the network is trained with. Each particle is represented by N_p features; for our purpose, we use N_p = 3 with (p_t, η, φ) for each particle and the same preprocessing used for PFN. Each edge is initially represented by 2N_p features obtained by concatenating the individual particle-level features. This node-to-edge feature construction is facilitated by two interaction matrices of size P × N_pp, called R_R and R_S, whose rows are labeled by the particle ID and whose columns, labeled (i, j), indicate which particles are connected by each edge. The edge-level features are transformed by the Interaction Transformation (InTra) block into an N_I = 4 dimensional representation for each edge, given by the physics-inspired quantities [33,27] ln ∆, ln k_T, ln z, and ln m², where

∆ = √((η_1 − η_2)² + (φ_1 − φ_2)²),
k_T = min(p_t,1, p_t,2) ∆,
z = min(p_t,1, p_t,2) / (p_t,1 + p_t,2),
m² = (E_1 + E_2)² − |p⃗_1 + p⃗_2|².

The subscripts 1 and 2 represent the two particles associated with the edge, and each quantity in these relations is evaluated on the unpreprocessed values. Since these quantities are symmetric with respect to the particles, the actual ordering of the particles does not impact PFIN's dataflow, maintaining the permutation-invariant property of PFN. The interaction features are transformed into N_z dimensional interaction embeddings by the trainable Φ_I network. These embeddings are propagated back to the particle level using the interaction matrices, with only those interactions considered where both constituents are present. The particle-level interaction embeddings are concatenated with the original particle features and further transformed into N_z dimensional modified per-particle interaction embeddings via a trainable Φ_I,2 network. These embeddings are concatenated or summed with the per-particle embeddings from PFN's Φ network to obtain augmented particle embeddings, which are then summed over the jet's constituents to obtain the jet-level latent space. Finally, the F network computes the probability of each jet class from these jet-level latent features.
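A sketch of the node-to-edge bookkeeping and the interaction features is below. The R_R/R_S construction follows the standard interaction-network pattern and the four pairwise quantities follow their standard definitions; the edge ordering, helper names, and the massless four-vector helper are our assumptions, not the paper's code.

```python
import numpy as np
from itertools import combinations

def interaction_matrices(P):
    """Build R_R and R_S of shape (P, P*(P-1)/2) for a fully connected
    undirected graph: column e has a single 1 in the row of each of the
    two particles joined by edge e."""
    edges = list(combinations(range(P), 2))
    R_R = np.zeros((P, len(edges)))
    R_S = np.zeros((P, len(edges)))
    for e, (i, j) in enumerate(edges):
        R_R[i, e] = 1.0
        R_S[j, e] = 1.0
    return R_R, R_S

def four_vec(pt, eta, phi):
    """Massless four-vector helper: (pt, eta, phi, E, px, py, pz)."""
    return np.array([pt, eta, phi,
                     pt * np.cosh(eta),   # E
                     pt * np.cos(phi),    # px
                     pt * np.sin(phi),    # py
                     pt * np.sinh(eta)])  # pz

def pairwise_features(p1, p2):
    """(ln Delta, ln kT, ln z, ln m^2) for one edge, computed on
    unpreprocessed values; symmetric under particle exchange."""
    pt1, eta1, phi1 = p1[:3]
    pt2, eta2, phi2 = p2[:3]
    dphi = np.arctan2(np.sin(phi1 - phi2), np.cos(phi1 - phi2))
    delta = np.hypot(eta1 - eta2, dphi)
    kt = min(pt1, pt2) * delta
    z = min(pt1, pt2) / (pt1 + pt2)
    E, px, py, pz = (p1 + p2)[3:]
    m2 = E**2 - (px**2 + py**2 + pz**2)
    return np.log(delta), np.log(kt), np.log(z), np.log(m2)

R_R, R_S = interaction_matrices(4)             # P = 4 -> N_pp = 6 edges
X = np.arange(12, dtype=float).reshape(4, 3)   # toy (P, N_p) particle features
edge_in = np.concatenate([R_R.T @ X, R_S.T @ X], axis=1)   # (N_pp, 2*N_p)

feats = pairwise_features(four_vec(100.0, 0.1, 0.2),
                          four_vec(50.0, -0.3, 1.0))
```

Multiplying by the transposed interaction matrices scatters particle features onto edges (node-to-edge), and multiplying by R_R or R_S themselves gathers edge embeddings back onto particles, which is how the Φ_I outputs are propagated back to the particle level.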
The model hyperparameters and performance metrics for our implementation of PFIN are given in Table 2. The choices of the numbers of nodes in the hidden layers of Φ and F, as well as the size of the latent space, were inspired by the study of RNA scores and NAP diagrams for PFN. This allows us to keep the increase in the number of trainable parameters manageably small. Both versions of PFIN show notable improvement in performance, outperforming both the PFN and IN models. PFIN also outperforms LGN [29], and its performance is comparable to those of the ParticleNet and ResNeXt models while requiring a significantly smaller number of parameters, providing much faster training and model convergence. PFIN allows us to explore the impact of pairwise particle interactions on jet classification. We calculate the ∆AUC and MAD relevance scores for each pair of particles by masking the corresponding input to the Φ_I network and calculating the deviation of the model prediction from the baseline model's result. We additionally calculate the deviation in the model's jet class prediction probabilities. The results for the PFIN network with summation used for the augmented particle embeddings are shown in Figure 18. The pairwise particle interactions play a particularly important role in identifying the signal jets. The mean deviations in the background jet class probabilities are barely impacted by masking interaction features. For the signal jets, however, this impact is found to be rather large: the mean prediction probability is reduced by almost 20% when the interaction between the two most energetic constituents is masked. This clearly outlines the importance of particle interactions in detecting signal jets and, consequently, explains the improvement observed in model performance.
Similar to what we observed for the PFN model, the jet class information is found to be embedded in the correlations among the latent space features. Hence, we further investigated PFIN's latent space by performing PCA, and one crucial observation is a stronger correlation between the jet mass and the top principal component of the PFIN latent space. These correlation coefficients were found to be close to 90% for both jet classes and both variants of the PFIN model. The other principal components also showed moderately improved correlations with the subjettiness variables and the number of jet constituents. As a result, the latent jet features allow the construction of notably more expressive distributions, contributing to the observed improvement in jet tagging performance.

Conclusion
This paper presents a comprehensive study of the interpretability of DNN-based top tagger models. Our work has unveiled a number of important aspects of how these models connect with the corresponding datasets. We have observed intriguing inconsistencies in feature rankings from different ranking methods, especially how the LRP method can produce relevance distributions that lead to a misleading interpretation of feature importance. Modifying LRP to obtain differential relevance scores has been found to be more consistent with other approaches in XAI. Furthermore, explainability metrics need to be carefully studied and understood for models trained with highly correlated input features since, as we show with the MB8S model, the interpretability of AI models can be obscured in such cases. Our investigation suggests that models learn to embed jet class information in correlations among latent space embeddings and can learn to mimic distributions that closely resemble physical jet features. Meanwhile, RNA scores and NAP diagrams can lead to an effective understanding of how information is propagated through the different layers of a network and can enable efficient model reoptimization strategies. Using NAP diagrams to reduce network complexity can be especially beneficial when multiple variants of the same network need to be trained, e.g. to quantify uncertainties on event classification scores from systematic variations or to adapt an architecture to learn jet classification in different phase spaces. RNA scores and NAP diagrams also open the possibility of in-situ model optimization during training.
Observations regarding feature importance and model sparsity can also lead to better model building, as we demonstrate with the PFIN model, which learns to take advantage of individual particle-level feature embeddings as well as pairwise particle interactions to noticeably improve over the PFN model. PFIN's performance is better than or comparable to that of a number of other novel models, while its implementation needs a smaller number of parameters, ensuring faster training and quicker convergence.
Studying the impact of pairwise particle interactions on PFIN using ∆AUC and MAD relevance scores reveals how these interactions play an important role in identifying top jets. This work establishes a methodological paradigm for the usage of XAI tools in day-to-day applications of DNNs and MLPs in HEP. Many modern HEP analyses rely heavily on DNNs not only for jet tagging and object reconstruction, but also for event classification and the identification of signal events in searches for physics beyond the standard model. Our work lays the foundation for exploring quantitative and robust explanations of such models. Compared to many of the existing tools for determining feature importance, calculating ∆AUC and MAD relevance scores is much faster and can produce equally reliable explanations of model performance. While these methods may require further sophistication to be applicable to other data structures, we expect them to provide satisfactory performance for most classification problems relying on tabular data. Building on the results and observations presented in this work, our future work will take a deeper look into adapting these novel tools and methods to more general data types and interpreting novel architectures like graph networks and transformers in the context of top tagging and more general jet classification scenarios.

Figure 1 .
Figure 1. Distribution of (a) the number of constituent particles, (b) the jet transverse momentum (p_T,J), and (c) the jet mass (m_J) for background (QCD) and signal (top) jets.

Neural Activation Pattern (NAP) Diagrams [44, 45]: While the aforementioned methods help identify the importance of features for a trained NN model, the NAP diagrams visualize the information propagation pathways through the network's architecture. The NAP diagram visualizes the Relative Neural Activation (RNA) score, defined as the activation of each node, aggregated over the jets of a given class and normalized to the most strongly activated node in the same layer.
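The RNA computation can be sketched as follows, under our reading of the definition in refs. [44, 45]: per-node activations are summed over the jets of one class and normalized to the most strongly activated node of the same layer, so scores lie in [0, 1] and can be thresholded (e.g. at 0.2) to measure sparsity. The toy activations are invented.

```python
import numpy as np

def rna_scores(activations):
    """Relative Neural Activation per node of one layer for one jet class.

    activations: (n_jets, n_nodes) post-activation outputs.  Each node's
    activation is summed over the jets of the class and divided by the
    largest such sum in the layer, giving scores in [0, 1]."""
    total = np.abs(np.asarray(activations)).sum(axis=0)
    return total / total.max()

# Toy post-activation outputs for 2 jets over a 3-node layer.
acts = np.array([[0.0, 2.0, 1.0],
                 [0.0, 3.0, 0.5]])
scores = rna_scores(acts)
```

Stacking such per-class score vectors layer by layer (with one class plotted as negative values) yields the 2D NAP diagrams used throughout the paper.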

Figure 3 .
Figure 3. Feature correlation matrix for p_T, η, and φ of the 10 highest-energy constituents of (a) background QCD jets and (b) signal top jets.

Figure 4 .
Figure 4. Feature ranking obtained from the (a) ∆AUC scores, (b) SHAP scores for background QCD jets, and (c) SHAP scores for signal top jets. Only the 20 highest-ranked features are shown. Figures (d), (e), (f), and (g) show the distributions of the inputs φ_1, φ_2, η_2, and p_t,0 for the two jet categories after performing the preprocessing described in Section 2.

Figure 5. Figure 6.
Figure 5. Distribution of the relevance scores obtained from LRP for (a) background QCD jets and (b) signal top jets. Only the 20 highest-ranked features are shown for each jet category.

Figure 7 .
Figure 7. NAP diagram for the TopoDNN model visualizing a 2D map of RNA scores for different nodes of the activation layers. To simultaneously visualize the scores for QCD and top jets, we project the RNA scores of the former as negative values. The nodes in each layer are ordered according to their RNA scores for the top jets.

Figure 8 .
Figure 8. Feature correlation matrix for (a) background QCD jets and (b) signal top jets for the different input features of the MB8S network.

Figure 9 .
Figure 9. Feature ranking obtained from the (a) ∆AUC scores, (b) SHAP scores for background QCD jets, and (c) SHAP scores for signal top jets for the MB8S network. Only the 10 highest-ranked features are shown.

Figure 10 .
Figure 10. Distribution of (a) relevance scores for background QCD jets, (b) MAD relevance scores for background QCD jets, (c) relevance scores for signal top jets, and (d) MAD relevance scores for signal top jets for the MB8S network. Only the 10 highest-ranked features are shown for each jet category.
As discussed earlier, some highly correlated variables rank high in the ∆AUC metric, and we have explained how their importance primarily stems from their correlations with other expressive input features to the network.

Figure 11 .
Figure 11. NAP diagram for the MB8S model visualizing a 2D map of RNA scores for different nodes of the activation layers. To simultaneously visualize the scores for QCD and top jets, we project the RNA scores of the former as negative values. The nodes in each layer are ordered according to their RNA scores for the top jets.

Figure 12 .
Figure 12. Feature ranking obtained from the (a) ∆AUC scores, (b) MAD relevance scores for background QCD jets, and (c) MAD relevance scores for signal top jets for the PFN network. Only the 20 highest-ranked features are shown.

Figure 13 .
Figure 13. NAP diagrams for the (a) Φ and (b) F networks of the PFN model showing the RNA scores of different nodes in these networks for the two jet classes. To simultaneously visualize the scores for QCD and top jets, we project the RNA scores of the former as negative values. The nodes in each layer are ordered according to their RNA scores for the top jets.

Figure 14 .
Figure 14. Figures in the top row show the latent space feature rankings obtained from the (a) ∆AUC scores, (b) MAD relevance scores for background QCD jets, and (c) MAD relevance scores for signal top jets. Only the 20 highest-ranked features are shown for each jet category. Figures in the bottom row show the distributions of some of the highest-ranked latent space features for both jet categories.

Figure 15.
Figure 15. Correlation matrix of the latent space features for (a) background QCD jets and (b) signal top jets. Only the top 30 latent features obtained from the ∆AUC metric are shown.

Figure 16 .
Figure 16. Distributions of some of the principal components for background QCD jets and signal top jets (top row) and pairwise distributions of some of these components as a function of the predicted signal class probability (bottom row).

Figure 17 .
Figure 17. Model architecture and data flow for the PFIN model. N_b represents the batch size. The InTra block computes the pairwise particle interaction features given in Eqn. 10. The Cat/Sum block creates the augmented particle embeddings by either concatenating or summing the outputs of Φ_I,2 with those of Φ. The Φ, Φ_I, Φ_I,2, and F networks are implemented as fully connected MLPs with ReLU activation.

Figure 18 .
Figure 18. (a) ∆AUC, (b) normalized MAD relevance score, and (c) mean deviation in jet class prediction probabilities for the PFIN model (with summation used for particle-level feature augmentation) under pairwise masking of the particle interaction features of the five most energetic constituents. Particle constituents are arranged in decreasing order of their energies. For (b) and (c), the upper and lower diagonal entries represent the corresponding scores for the signal and background jet classes respectively.

Table 2 .
Model hyperparameters and performance metrics for the PFIN model. (s) and (c) respectively represent architectures where the per-particle interaction embeddings from Φ_I,2 are summed or concatenated with the particle embeddings from Φ. The background rejection rate 1/ε_B is evaluated at a signal efficiency of 30%.

Table A1 .
Performance and sparsity of the baseline and variant models for the different model architectures. The background rejection rate 1/ε_B is evaluated at a signal efficiency of 30%.