New angles on fast calorimeter shower simulation

The demands placed on computational resources by the simulation requirements of high energy physics experiments motivate the development of novel simulation tools. Machine learning based generative models offer a solution that is both fast and accurate. In this work we extend the Bounded Information Bottleneck Autoencoder (BIB-AE) architecture, designed for the simulation of particle showers in highly granular calorimeters, in two key directions. First, we generalise the model to a multi-parameter conditioning scenario, while retaining a high degree of physics fidelity. In a second step, we perform a detailed study of the effect of applying a state-of-the-art particle flow-based reconstruction procedure to the generated showers. We demonstrate that the performance of the model remains high after reconstruction. These results are an important step towards creating a more general simulation tool, where maintaining physics performance after reconstruction is the ultimate target.


Introduction
The detailed simulation of particle interactions with sophisticated detector systems is central to modern high energy physics, providing both the bridge between theory and experiment and the means to design and optimise detectors for future experiments. Traditionally, simulation tools have been based on Monte Carlo methods, with Geant4 [1] being the most prevalent. This approach can produce very high quality simulations, but comes at the price of consuming significant computational resources [2].
A major challenge for generative model based fast simulation approaches is that the next generation of collider experiments will feature detectors of ever increasing granularity. This is true, for example, of both the Calorimeter Endcap Upgrade for the CMS experiment in preparation for the upcoming High Luminosity phase of the LHC [19], and of experiments at so-called Higgs Factories. Such a facility would consist of an e+e− collider, either circular or linear, that would provide a clean environment to enable high precision studies of the Higgs, electroweak and top sectors [20] [21]. The high granularity of these detectors places high demands on the physics performance of a generative model based simulator, and also means that the data to be modelled are high dimensional.
Building on previous studies into the simulation of both electromagnetic [11] [22] and hadronic [12] showers in a highly granular calorimeter with a highly performant BIB-AE model, the contribution of this work is twofold. First, we extend the previous studies by demonstrating the ability to condition a BIB-AE on multiple parameters, while maintaining a high degree of physical fidelity with respect to Geant4. This is a crucial step towards being able to apply such simulators in a realistic environment, where particles can impact a detector with various angles as well as energies. Secondly, we perform a detailed study of the effect of applying a state-of-the-art reconstruction algorithm and demonstrate that the BIB-AE based simulator maintains a strong performance. This evaluation serves as a strong indicator of the quality of a model, which will ultimately be judged on its physics performance after reconstruction.
The paper is organised as follows. Section 2 gives details on the dataset and describes the particle flow reconstruction algorithm used. The generative model is described in Section 3, and a review of its performance before and after reconstruction, as well as its computational performance, is presented in Section 4.

Dataset and Reconstruction Scheme
The calorimeter system used in this study and the production of the dataset are described in Section 2.1, while the particle flow algorithm used for reconstruction is outlined in Section 2.2.

Dataset
For this work, we focus on the International Large Detector (ILD) [23], a detector concept proposed for the International Linear Collider (ILC), which is one option for a Higgs Factory and a high energy lepton collider. The ILD detector is optimised for the particle flow algorithm, which aims to reconstruct each individual particle in an event, with the goal of obtaining the best overall detector resolution possible.
This introduces a number of requirements on the detector, the most relevant to this study being the high granularity required of the calorimeters. The subject of this work is the simulation of photon showers in the Si-W option for the ILD electromagnetic calorimeter (ECAL). This detector is a sampling calorimeter which consists of 30 layers of active silicon sensors sandwiched between tungsten absorber layers. It features two sampling fractions, with the first 20 tungsten layers being 2.1 mm thick, while the last 10 layers have a thickness of 4.2 mm. The silicon layers feature cells of size 5 × 5 mm².
The iLCSoft [24] ecosystem is used by ILD for simulation and reconstruction of the detector response, as well as for subsequent analysis. The training data used in this work is produced via full simulation with Geant4 version 10.4 using the QGSP_BERT physics list and a realistically detailed detector model implemented in DD4hep [25].
The training dataset consists of showers in the ILD ECAL, which are initiated by photons fired from a virtual particle gun. This particle gun has a fixed position of (x, y, z) = (0.0 mm, 1810 mm, −50 mm) in the ILD coordinate system. This coordinate system is orientated such that z points along the beam axis, and y points vertically, i.e. perpendicularly to the calorimeter face. The shower images used for training are produced by projecting the ECAL hits into a regular grid of 30 × 60 × 30 voxels in the local (x, y, z) coordinate system. In these transformed coordinates, the z axis lies orthogonal to the face of the calorimeter, so each plane of voxels along this dimension corresponds to one of the 30 layers in the physical ECAL. In this projected space, the photons are incident at a fixed cell at index (i_x, i_y, i_z) = (15, 12, 0). Since the geometry of the ECAL is not perfectly regular, a staggered pattern tends to appear along the z direction of the images [11]. The projection from the physical geometry to a regular grid also tends to cause artefacts, which are corrected for such that each voxel in the regular grid corresponds to exactly one sensor in the calorimeter.
A total of 500k showers are present in the training dataset, produced by photons with varying incident energies and angles. The incident angle is varied in one direction such that it corresponds to the polar angle in the global ILD coordinate system (i.e. the angle with respect to the z-axis). The incident energy varies uniformly in the range of 10 to 100 GeV. The angle in the y-z plane is varied simultaneously, from 90 to 30 degrees to the calorimeter face. The angle in the local x-z plane (corresponding to the azimuthal angle in the global ILD coordinate system) is fixed to be 90 degrees. Finally, 9 test datasets, each consisting of 1900 showers produced by photons at fixed combinations of energies ({20, 50, 90} GeV) and polar angles ({40, 60, 85} degrees), are used to check the model performance across the phase space. This allows the evaluation of the single energy and angular response of the model, which is the end target of a simulator. A fixed calibration factor is applied to scale the hit energies in the last 10 layers to account for the change in the sampling fraction arising from the thicker absorber layers.
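The sampling-fraction calibration described above can be illustrated with a minimal Python sketch. The sparse hit representation `(ix, iy, iz, energy)`, the function name and, in particular, the value `CALIBRATION_FACTOR = 2.0` are assumptions made for illustration; the factor actually used is not stated in the text.

```python
# Hits are stored sparsely as (ix, iy, iz, energy_mev); iz is the layer index 0..29.
N_LAYERS = 30
FIRST_THICK_LAYER = 20       # layers 20..29 sit behind the 4.2 mm absorbers
CALIBRATION_FACTOR = 2.0     # hypothetical placeholder, not the value used in the paper


def calibrate_hits(hits, factor=CALIBRATION_FACTOR, first_thick_layer=FIRST_THICK_LAYER):
    """Scale hit energies in the thick-absorber layers by a fixed factor."""
    calibrated = []
    for ix, iy, iz, energy in hits:
        if not 0 <= iz < N_LAYERS:
            raise ValueError(f"layer index {iz} outside 0..{N_LAYERS - 1}")
        scale = factor if iz >= first_thick_layer else 1.0
        calibrated.append((ix, iy, iz, energy * scale))
    return calibrated


shower = [(15, 12, 0, 0.5), (15, 13, 19, 1.0), (15, 14, 20, 1.0), (16, 20, 29, 0.2)]
print(calibrate_hits(shower))
```

Only hits with layer index 20 or above are rescaled, so the first 20 layers are left untouched.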

Reconstruction Scheme
The precision physics goals of future e+e− Higgs factories require the ubiquitous adoption of particle flow reconstruction at these facilities. This approach aims to reconstruct each individual particle in an event using information from the best subdetector system for the given task, in order to optimise the overall detector resolution. In this work, we make use of PandoraPFA [26], the state-of-the-art pattern recognition algorithm used by ILD and other linear collider detector concepts.
The full reconstruction chain used by ILD is a multi-step process. Firstly, a digitisation procedure is applied to the hits produced by the simulation. This involves emulating effects which are, for instance, intrinsic to the sensor or arise from the readout electronics. Secondly, for the calorimeters a two step calibration procedure is applied to convert the total visible energy deposited in the detector back to the incident particle's energy in GeV. Next, after several pattern recognition algorithms have been run to reconstruct the tracks of charged particles, sophisticated clustering procedures are iteratively applied to the resulting calorimeter hits by PandoraPFA, using track information where appropriate. The final result is a list of reconstructed particles, or Particle Flow Objects (PFOs), with important information pertaining to the particles (energy, momentum, ID, etc.) associated. This associated information is then used for high level reconstruction and subsequent physics analysis.

Bounded Information Bottleneck Autoencoder
The Bounded Information Bottleneck Autoencoder (BIB-AE), first introduced in [27], generalises many of the features present in common Generative Adversarial Network (GAN) and Variational Autoencoder (VAE) architectures. The BIB-AE architecture was successfully applied to the problem of calorimeter shower simulation in [11].

Figure 1: The encoder reduces the input calorimeter showers to a low-dimensional latent space, which is regularised by KLD and MMD loss terms in addition to a latent critic. The decoder reconstructs shower images back from the latent space, with a dual purpose reconstruction critic simultaneously assisting reconstruction and judging the quality of the output shower. The Post Processor network is trained in a second step to adjust voxels individually. The blue and lilac lines represent an input conditioning on the energy and angle of the incident particle, while the red line represents a conditioning on the visible energy.

At its core, the BIB-AE is an autoencoder, with an encoder network N_E mapping calorimeter showers to a lower dimensional latent space, and a decoder network N_D reconstructing showers from the latent representation. Around this core, a number of auxiliary components aid specific elements of either the training or generation process.
The first set of additional components focuses on the latent space. As in standard VAE approaches, the latent space of the autoencoder must be regularised towards a known distribution, for which we make the standard choice of a Standard Normal distribution N(0, 1). Two terms are included in the loss to enforce this latent regularisation constraint. The first is a Kullback-Leibler divergence (KLD) term, defined as
\[ L_{\mathrm{KLD}} = D_{\mathrm{KL}}\left( N_E(x) \,\|\, \mathcal{N}(0, 1) \right), \tag{1} \]
where D_KL is the Kullback-Leibler divergence. It is defined between two probability distributions P and Q, with probability densities p and q respectively, as
\[ D_{\mathrm{KL}}(P \,\|\, Q) = \int p(x) \log \frac{p(x)}{q(x)} \, \mathrm{d}x. \tag{2} \]
The second latent regularisation contribution is a Maximum Mean Discrepancy (MMD) [29] term between the latent space and a Standard Normal distribution, defined as
\[ \mathrm{MMD}^2(P, Q) = E_{x, x'}\left[ k(x, x') \right] + E_{y, y'}\left[ k(y, y') \right] - 2\, E_{x, y}\left[ k(x, y) \right], \tag{3} \]
where x, x' and y, y' are independent samples drawn from distributions P and Q respectively, k is a positive definite function and E denotes the expectation. These are complemented by a loss contribution from a Wasserstein-GAN-like latent critic C_L, trained to distinguish between the latent space and a Standard Normal distribution, with loss given by
\[ L_{C_L} = E\left[ C_L(N_E(x)) \right] - E\left[ C_L(\mathcal{N}(0, 1)) \right]. \tag{4} \]
The approach to sampling from this latent space will be described in Section 3.2.
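To make the two latent regularisation terms concrete, the following Python sketch estimates the MMD expression from finite samples and evaluates the KLD in closed form for a one-dimensional Gaussian. The Gaussian (RBF) kernel, its bandwidth, the biased estimator and the assumption of a Gaussian encoder output are illustrative choices that are not specified in the text.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Positive definite RBF kernel between two latent vectors (assumed kernel choice)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-sq_dist / (2.0 * bandwidth ** 2))


def mean_kernel(xs, ys, bandwidth):
    """Average kernel value over all pairs (x, y), including identical pairs."""
    total = sum(gaussian_kernel(x, y, bandwidth) for x in xs for y in ys)
    return total / (len(xs) * len(ys))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased sample estimate of E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    return (mean_kernel(xs, xs, bandwidth) + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))


def kld_to_standard_normal(mu, sigma):
    """Closed-form D_KL(N(mu, sigma^2) || N(0, 1)) for one latent dimension."""
    return 0.5 * (mu ** 2 + sigma ** 2 - 1.0 - math.log(sigma ** 2))


rng = random.Random(0)
dim = 4
prior = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(100)]
latent_ok = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(100)]
latent_shifted = [[rng.gauss(2, 1) for _ in range(dim)] for _ in range(100)]
print(mmd_squared(latent_ok, prior), mmd_squared(latent_shifted, prior))
```

A latent sample that is shifted away from the Standard Normal yields a visibly larger estimate than one drawn from it, which is the behaviour the regularisation term penalises.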
The second role is fulfilled by a second Wasserstein-GAN-like reconstruction critic C, which contributes its own term to the loss. This critic serves a dual purpose: it not only aids image reconstruction from the latent space by comparing input and output shower images, but also provides feedback as to whether showers look realistic or not. Note that following the developments in [12], the latent and reconstruction critics illustrated in Figure 1 actually represent two identical network architectures with independent weights. One network is trained continuously, while the other has its weights and optimiser reset after each epoch. This helps to prevent artefacts induced by the continuously trained critic becoming desensitised to particular data features, while retaining the learnt behaviour that is lost when resetting the critic. This was originally introduced in [12] to deal with the additional sparsity present in hadronic showers; however, we find that it also benefits the network performance for photon shower generation in the larger grid size studied here.
The combination of each of these elements results in a total loss of the form
\[ L_{\mathrm{total}} = \sum_i \beta_i L_i, \tag{7} \]
with each term L_i being controlled by an independent hyperparameter β_i [12]. The final component of the architecture is a separate Post Processor Network. This network consists of a series of kernel size one convolutions, which biases the network towards the refinement of individual voxel energies, rather than their creation or removal. By incorporating loss terms based around the Mean Squared Error (MSE) and the Sorted Kernel Maximum Mean Discrepancy (SK-MMD), as well as a loss term based on comparisons averaged over a batch of showers, the description of the hit energy spectrum can be significantly improved [11]. This network is trained in a second step, once the training of the networks in the main BIB-AE setup is complete and their weights can be frozen, in order to enhance the stability of the Post Processor training [12].
Importantly, for the model to be useful as a simulator for physics it must be able to give a physically meaningful detector response that is dependent on the properties of the incident particle. For this reason, all of the networks in the BIB-AE architecture (with the exception of the latent critic) are conditioned on both the energy and polar angle of the incident particle. The Post Processor Network is additionally conditioned on the total visible energy each shower deposits in the calorimeter. The networks are built around combinations of 3D convolutions and fully connected layers, and are trained using the Adam optimiser [30] with an exponential learning rate decay. Minibatch discrimination is applied throughout the training. Prior to feeding the training data to the network, a threshold is applied to map hits below 1 × 10^−4 MeV to zero. The BIB-AE architecture was trained for a total of 50 epochs, after which the model was frozen and the Post Processor trained for a further 53 epochs. To select the best performing epochs, a similar approach to that described in [22] was followed. The selection was therefore based on a bin-wise area difference between distributions of key physics observables (see Section 4.1) for BIB-AE and Geant4 generated showers. Particular weight was given to the visible energy and angular response, and a selection made to find the best performing base BIB-AE and Post Processing epoch.
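The epoch selection criterion can be sketched as follows for a single observable. This is an illustration under stated assumptions: both distributions are histogrammed with identical, equal-width bins and normalised to unit area, and the area difference is the sum of absolute bin-wise differences times the bin width. The function names, the binning and the omission of the per-observable weighting are all hypothetical simplifications.

```python
def histogram(values, n_bins, lo, hi):
    """Unit-area histogram of `values` on n_bins equal-width bins over [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        if lo <= v < hi:
            counts[min(int((v - lo) / width), n_bins - 1)] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no entries inside the histogram range")
    return [c / (total * width) for c in counts], width


def area_difference(reference, generated, n_bins=10, lo=0.0, hi=1.0):
    """Bin-wise area between two unit-area histograms (0 = identical, 2 = disjoint)."""
    ref_density, width = histogram(reference, n_bins, lo, hi)
    gen_density, _ = histogram(generated, n_bins, lo, hi)
    return sum(abs(r - g) for r, g in zip(ref_density, gen_density)) * width


def select_best_epoch(reference, generated_by_epoch, **binning):
    """Return the epoch with the smallest area difference, plus all scores."""
    scores = {epoch: area_difference(reference, gen, **binning)
              for epoch, gen in generated_by_epoch.items()}
    return min(scores, key=scores.get), scores


geant4_like = [0.15, 0.25, 0.35, 0.45]
candidates = {1: [0.65, 0.75, 0.85, 0.95], 2: [0.15, 0.25, 0.35, 0.55]}
best_epoch, epoch_scores = select_best_epoch(geant4_like, candidates)
print(best_epoch, epoch_scores)
```

In practice one such score would be computed per observable and combined with weights; that combination is left out here because its details are not given.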

Sampling Strategy
Thus far, our description of the BIB-AE implementation has focused only on the steps necessary to train the model. However, if we are to use the model as a simulator, we must be able to draw samples from the learnt latent representation z. It is for this reason that generative autoencoder approaches introduce structure to the latent space via regularisation. This regularisation, however, comes at a cost: the more we drive our latent space to have a known structure, the less information we can encode. Inspired by the buffer VAE approach [31], previous BIB-AE studies [22] [12] have applied density estimation to permit an improved latent space sampling. The advantage of this method is that it reduces the regularisation constraint, and allows for an increased focus on information retention. Kernel Density Estimation [32] was previously used for this purpose in [22] [12], providing an accurate model of the latent space variables and their correlations, and resulting in a performant simulator. However, since both of these studies were restricted to conditioning on only the energy of the incident particle, they made use of a rejection sampling method to generate samples of a specific energy. While this sampling method is sufficiently fast for single parameter conditioning, it becomes a major bottleneck for multi-parameter conditioning. A new conditional density estimation strategy is therefore required.
A class of models that are well suited to this task are so-called Normalizing Flows [33] [34]. These models aim to learn a bijective mapping X = g(Z) with inverse f := g^{-1} between a simple base random variable z ∈ Z with a known and tractable probability density function p_z(z), and a random variable x ∈ X with some unknown probability density function p_x(x). Under the change of variables formula, p_x(x) can then be computed from p_z(z) via
\[ p_x(x) = p_z\left( f(x) \right) \left| \det \left( \frac{\partial f(x)}{\partial x} \right) \right|. \tag{8} \]
Our Normalizing Flow model is implemented using the Pyro [35] deep probabilistic programming library. The model itself is a hybrid, consisting of 8 blocks each with 7 coupling layers. Of these coupling layers, 6 are based on affine transformations [36] and one is based on element-wise rational spline bijections of linear order [37] [38]. Each layer is conditioned on a two dimensional context containing the energy and angular labels, which are pre-scaled by dividing by values of 100 and π/2 respectively. To train the model, the 500k shower samples are encoded with the pretrained BIB-AE model, resulting in 24 latent variables for each shower. Additionally, the corresponding energy sum of each shower in MeV scaled by a factor of 10^4 is appended, resulting in a 25-dimensional training distribution. The inclusion of the energy sum of the shower here not only provides the additional conditioning label to the Post Processor Network (see Section 3.1) during sampling, but also permits a per-shower re-scaling of the visible energy sum in a similar manner to [15] (see Figure 1). The Normalizing Flow model was trained for a total of 200 epochs, and the best performing epoch selected based on the loss. This parameter configuration of the model was then used for subsequent latent space sampling to feed into the pre-trained BIB-AE during inference.
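The mechanics of a conditional affine coupling layer and the change of variables formula can be illustrated in two dimensions with plain Python. This is a toy sketch and not the Pyro implementation used in this work: `toy_conditioner` is a hypothetical fixed function standing in for the neural network that would predict the scale and shift, and the angle is assumed to be given in radians so that dividing by π/2 maps perpendicular incidence to 1.

```python
import math


def scale_context(energy_gev, angle_rad):
    """Pre-scale the conditioning labels (assumes the angle is given in radians)."""
    return (energy_gev / 100.0, angle_rad / (math.pi / 2.0))


def toy_conditioner(z_a, context):
    """Hypothetical stand-in for the network that outputs (log_scale, shift)."""
    e, a = context
    return 0.1 * z_a + 0.2 * e - 0.1 * a, 0.5 * z_a + e + a


def coupling_forward(z, context):
    """x = g(z): keep z[0], transform z[1] affinely using z[0] and the context."""
    log_scale, shift = toy_conditioner(z[0], context)
    x = (z[0], z[1] * math.exp(log_scale) + shift)
    return x, log_scale  # second value is log|det dg/dz|


def coupling_inverse(x, context):
    """z = f(x), the exact inverse of coupling_forward."""
    log_scale, shift = toy_conditioner(x[0], context)
    return (x[0], (x[1] - shift) * math.exp(-log_scale))


def standard_normal_log_pdf(z):
    return sum(-0.5 * zi ** 2 - 0.5 * math.log(2.0 * math.pi) for zi in z)


def log_density(x, context):
    """log p_x(x) = log p_z(f(x)) + log|det df/dx|, with log|det df/dx| = -log_scale."""
    z = coupling_inverse(x, context)
    log_scale, _ = toy_conditioner(x[0], context)
    return standard_normal_log_pdf(z) - log_scale


context = scale_context(50.0, math.pi / 3.0)
z_sample = (0.3, -1.2)
x_sample, log_det = coupling_forward(z_sample, context)
print(x_sample, coupling_inverse(x_sample, context), log_density(x_sample, context))
```

Because the first component passes through unchanged, the conditioner can be evaluated identically in both directions, which is what makes the layer exactly invertible with a triangular Jacobian.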

Results
In order to benchmark the physics performance of the BIB-AE generator, we compare statistical distributions of key physics observables produced by the model to those of Geant4 in Section 4.1. This comparison is broken down into two parts.
Firstly, we look at simulation level. Here the direct output of the two simulators is used, providing an indication of how well the BIB-AE model has learnt from the training data. This has been the typical means of comparison in the vast majority of prior work (e.g. [11] [5]), and while it provides a useful and interesting comparison, it does not provide a complete picture.
For this reason, we additionally perform a comparison of observables at reconstruction level, which are ultimately the quantities that will be used in physics analyses. Since the reconstruction procedure (see Section 2.2) typically relies on applying a series of sophisticated topological clustering algorithms, it is by no means clear a priori which attributes of the high dimensional data space are important for a model to capture. This was demonstrated previously in [12]. Here we go beyond that study, which looked only at the effect of reconstruction on linearity and resolution, by providing a systematic study of the effect of reconstruction on a large number of observables.
We conclude the results by reporting the computational performance of the BIB-AE model during inference in comparison to Geant4 in Section 4.2.

Physics Performance
For each observable, we begin by studying the performance of the BIB-AE in comparison to Geant4 before reconstruction, but with a simple calibration factor to account for the two sampling fractions (as described in Section 2.1).
A cut is applied to cells with an energy deposition below 0.07875 MeV, which corresponds to cell energies that are less than half of the most probable energy loss of a minimum ionising particle (MIP).This emulates the situation in a real calorimeter, since this region lies below the noise floor.
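As a minimal sketch, the half MIP cut amounts to discarding all cells below the stated threshold before any observable is computed. The function names and the flat list representation of cell energies are illustrative assumptions.

```python
HALF_MIP_CUT_MEV = 0.07875  # threshold quoted in the text


def apply_mip_cut(cell_energies_mev, threshold=HALF_MIP_CUT_MEV):
    """Set cells below the threshold to zero and leave the rest unchanged."""
    return [e if e >= threshold else 0.0 for e in cell_energies_mev]


def count_hits(cell_energies_mev, threshold=HALF_MIP_CUT_MEV):
    """Number of cells at or above the threshold."""
    return sum(1 for e in cell_energies_mev if e >= threshold)


cells = [0.0, 0.05, 0.07875, 0.16, 3.2]
print(apply_mip_cut(cells), count_hits(cells))
```

Cells exactly at the threshold are kept here, since the text describes the cut as removing depositions below it.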
Subsequently, the effect of reconstruction (see Section 2.2) on each observable is studied. For this purpose, a selection criterion is applied to the data, such that only events containing one PFO are used for the performance evaluation. As a first step to quantify the quality of the BIB-AE simulator, we study the conditioning performance by considering observables that are highly correlated with the angle and energy of the incident photon.

Angular Response
The angular response is characterised by finding the principal axis of the shower with a principal component analysis (PCA). The resulting distributions for the reconstructed angles in degrees for 20 GeV (left), 50 GeV (center) and 90 GeV (right) showers are shown in Figure 2, comparing the Geant4 test data with BIB-AE generated data. In this figure and throughout this work, we apply a consistent colour scheme to represent distributions corresponding to showers with a fixed incident angle of 40 degrees (orange), 60 degrees (green) and 85 degrees (blue). Across the range of fixed angle and energy showers considered, the angular distributions of the BIB-AE generated showers match their Geant4 counterparts. The most noticeable discrepancy is a slight mismodelling by the BIB-AE of the very sharp peaks present in the Geant4 distributions, which tends to become more apparent as the energy of the incident photon is increased. The effect of reconstruction on the angular performance of the model can be seen in Figure 3. The reconstructed angle of each shower is obtained from the intrinsic direction of the clustered hits. Reconstruction has little visible effect on the angle, with the overall good agreement to Geant4, and the slight mismodelling of the sharp peaks at higher energies, being retained.
The angular distributions are characterised in more detail in Figure 4 through Gaussian fits to both the Geant4 and BIB-AE distributions. The mean µ and standard deviation σ of the fits are then extracted, and the means (left) and standard deviations (right) plotted as functions of the incident particle angles in order to obtain the effective angular linearity and widths for each fixed incident energy of 20 GeV, 50 GeV and 90 GeV. The means of the angular distributions are very well reproduced by the BIB-AE, with the maximum deviations from the Geant4 values reaching only the 1% level. Turning to the resolution plot on the right of the figure, the performance of the BIB-AE appears to degrade with increasing energy. The angular width for 20 GeV showers at 40 degrees shows excellent agreement, whereas a deviation of almost 40% is observed for 90 GeV showers at 85 degrees.
The effects of reconstruction on the angular linearity and width can be seen in Figure 5. This confirms that reconstruction has a minimal impact on the angular performance of the BIB-AE, with only minor changes in the means and widths of the angular distributions being introduced by the clustering procedure in reconstruction.
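The principal axis determination can be illustrated with a simplified two-dimensional sketch. It assumes that the shower is projected onto the local y-z plane (the plane in which the incident angle varies), that hits are weighted by their energy, and that the angle is measured from the y axis, which lies in the calorimeter face. None of these choices is confirmed by the text, which does not specify the details of the PCA. For a 2 × 2 covariance matrix the direction of the leading eigenvector has a closed form, so no linear algebra library is needed.

```python
import math


def principal_axis_angle_deg(hits):
    """Angle between the energy-weighted principal axis and the y axis, in degrees.

    hits: list of (y, z, energy) tuples in local coordinates (assumed layout).
    """
    total = sum(e for _, _, e in hits)
    if total <= 0.0:
        raise ValueError("shower has no energy")
    mean_y = sum(y * e for y, _, e in hits) / total
    mean_z = sum(z * e for _, z, e in hits) / total
    var_y = sum(e * (y - mean_y) ** 2 for y, _, e in hits) / total
    var_z = sum(e * (z - mean_z) ** 2 for _, z, e in hits) / total
    cov_yz = sum(e * (y - mean_y) * (z - mean_z) for y, z, e in hits) / total
    # Closed-form orientation of the leading eigenvector of a 2x2 covariance matrix.
    angle = 0.5 * math.atan2(2.0 * cov_yz, var_y - var_z)
    return math.degrees(angle) % 180.0


theta = math.radians(60.0)
inclined = [(t * math.cos(theta), t * math.sin(theta), 1.0) for t in range(10)]
perpendicular = [(0.0, float(t), 1.0) for t in range(10)]
print(principal_axis_angle_deg(inclined), principal_axis_angle_deg(perpendicular))
```

A full three-dimensional PCA would instead diagonalise the 3 × 3 covariance matrix of the hit positions and take the eigenvector with the largest eigenvalue.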

Energy Response
To study the energy conditioning performance of the BIB-AE, the total energy deposited in the active sections of the calorimeter is determined and shown in Figure 6. The energy sums of the BIB-AE distributions match their Geant4 equivalents across the board, with only minor deviations. This is a direct result of the re-scaling procedure using the per-shower energy sums generated by the Normalizing Flow (see Section 3.2). The energy sum distributions are characterised in further detail in Figure 7, with the mean (µ_90) and rms (σ_90) of the central 90% of the distributions being calculated. These are shown in the figure as the linearity and resolution as a function of the incident particle energy, for each of the various incident angles. The linearity (left) is particularly well described, especially for photons with a low incidence angle, with the maximum deviations from the Geant4 value restricted to well below the 1% level. While the resolution (right) exhibits larger deviations, it is also well described, with deviations being restricted to below the 10% level.
The effect of reconstruction on the energy conditioning performance of the model can be seen in Figure 8. In this case, the distributions show the energy of the Particle Flow Object (PFO) reconstructed by PandoraPFA, which corresponds to the reconstructed energy of the incident particle. Again, an excellent agreement between the BIB-AE and Geant4 distributions is observed. This is confirmed by the PFO linearity and resolution shown in Figure 9. The effect of reconstruction on the linearity is minimal. While a slight effect on the resolution can be seen, these deviations are still restricted to below the 10% level across all the test points.
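One possible reading of µ_90 and σ_90 is sketched below: the sample is sorted, an equal number of entries is removed from each tail so that 90% remain, and the mean and rms are computed on what is left. This symmetric trimming is an assumption; another convention takes the smallest interval containing 90% of the entries, and the text does not say which is used.

```python
import math


def central_90_stats(values, fraction=0.9):
    """Mean and rms of the central `fraction` of a sample (symmetric trimming)."""
    ordered = sorted(values)
    n = len(ordered)
    keep = round(fraction * n)
    if keep < 1:
        raise ValueError("sample too small for the requested fraction")
    drop_low = (n - keep) // 2
    core = ordered[drop_low:drop_low + keep]
    mu = sum(core) / len(core)
    sigma = math.sqrt(sum((v - mu) ** 2 for v in core) / len(core))
    return mu, sigma


energy_sums = list(range(1, 101))  # toy stand-in for per-shower energy sums
mu_90, sigma_90 = central_90_stats(energy_sums)
print(mu_90, sigma_90, sigma_90 / mu_90)
```

The ratio σ_90/µ_90 in the last line is the relative resolution that would be plotted against the incident energy.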

Cell Energy Spectrum
We now turn our attention to the cell energy spectra of the showers. The results are summarised in Table 1, by calculating the Jensen-Shannon Distance (JSD) between the Geant4 and the BIB-AE distributions for each combination of energy and angle. The JSD between two probability distributions P and Q, with probability densities p and q respectively, is the square root of the Jensen-Shannon divergence,
\[ \mathrm{JSD}(P \,\|\, Q) = \sqrt{ \tfrac{1}{2} D_{\mathrm{KL}}(P \,\|\, M) + \tfrac{1}{2} D_{\mathrm{KL}}(Q \,\|\, M) }, \tag{9} \]
where M = \tfrac{1}{2}(P + Q), and D_KL is the Kullback-Leibler divergence, defined in equation 2 in Section 3.1.
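For binned distributions the JSD can be computed directly from the histogram contents, as in the following sketch. It assumes that both histograms share the same binning and uses the natural logarithm, so that the distance ranges from 0 to √(ln 2); the base of the logarithm is not stated in the text.

```python
import math


def normalise(hist):
    """Convert bin contents to probabilities."""
    total = sum(hist)
    return [h / total for h in hist]


def kl_divergence(p, q):
    """Discrete D_KL(P || Q); bins with p = 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def jensen_shannon_distance(hist_p, hist_q):
    """Square root of the Jensen-Shannon divergence between two histograms."""
    p, q = normalise(hist_p), normalise(hist_q)
    m = [0.5 * (pi + qi) for pi, qi in zip(p, q)]
    divergence = 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
    return math.sqrt(max(divergence, 0.0))


print(jensen_shannon_distance([10, 20, 30], [10, 20, 30]))
print(jensen_shannon_distance([1, 0], [0, 1]))
```

The mixture M is strictly positive wherever P or Q is, so the divergence is always finite, unlike the plain Kullback-Leibler divergence between two histograms with empty bins.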
For the simulation level results, the best (50 GeV, 85 degrees) and worst (90 GeV, 40 degrees) combinations are highlighted and presented in Figure 10. The grey hatched region to the left of these plots indicates the region below half a MIP that will be removed in a real calorimeter (typically referred to as the MIP cut). Therefore, for the simulation level results, the reported JSD is only calculated in the region above the MIP cut. Larger deviations from the Geant4 ground truth are exhibited in the region around and below this cut-off. However, above the half MIP cut the variations in the shape of this spectrum with incident particle energy and angle tend to be accurately described. The key feature is the distinct peak that occurs in the spectrum around 1 MIP, which is well reproduced by the BIB-AE thanks to its Post Processor Network. This was first achieved by the original BIB-AE architecture [11], and these results indicate that this capability can be extended to more general simulation scenarios where multiple conditioning parameters are necessary.
The resulting JSD values for the cell energy spectrum of each energy and angle combination after reconstruction are summarised in the right of Table 1. The best (50 GeV, 85 degrees) and worst (now 20 GeV, 40 degrees) are highlighted in bold, and the distributions shown in Figure 11. These distributions end at the half MIP threshold, as lower energy hits are discarded during reconstruction. After reconstruction, the MIP peak that was smooth at simulation level is now smeared out as a result of the two MIP calibration factors that are applied during the process. Overall, the BIB-AE is able to reproduce the cell energy very well after reconstruction, with only minor deviations being visible, even for the worst performing distribution at 20 GeV, 40 degrees.

Number of Hits
Another distribution where the Post Processor Network can play a key role is the distribution of the number of hits with energy above the half MIP threshold, shown in Figure 12. This distribution is strongly affected by how well the cell energy spectrum is modelled around the cut, and even slight discrepancies can significantly affect the resulting number of hits above the half MIP threshold. The differences that can be seen in these distributions, which tend to become more pronounced for higher energies where the number of hits increases, can therefore be linked to the more noticeable deviations in the cell energy spectrum around the cut.
The distributions of the number of hits after reconstruction are shown in Figure 13. These distributions do not show any significant shifts from the simulation level distributions, indicating that the vast majority of hits above the half MIP threshold are retained after reconstruction, as would be expected for electromagnetic showers.

Center of Gravity
Shower shape observables are also crucial for a generative model to capture, as they can significantly impact downstream reconstruction. We begin by reviewing the performance of the model along the depth of the calorimeter, i.e. along the z axis. Firstly, the center of gravity along the depth of the calorimeter, given by the first moment along the z axis, was computed and the JSD values between BIB-AE and Geant4 distributions for each angle and energy combination are shown in Table 2 at both simulation and reconstruction level. At simulation level, the best (50 GeV, 85 degrees) and worst (20 GeV, 85 degrees) performing combinations of energy and angle are highlighted in bold, and the distributions shown in Figure 14. Overall, a good description of these distributions is achieved, with the most noticeable discrepancies arising from the BIB-AE tending to produce a somewhat narrower distribution than the Geant4 counterpart.
After reconstruction, the best and worst performing combinations of incident angle and energy are at 50 GeV, 40 degrees and 90 GeV, 85 degrees respectively. These distributions are shown in Figure 15. In the best case (right) the BIB-AE distribution matches the Geant4 distribution very closely, and in the worst (left) the BIB-AE again produces a slightly narrower distribution.
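The center of gravity discussed above is the energy-weighted first moment of the hit positions along z. A minimal sketch, assuming each hit is given as a `(z, energy)` pair where `z` may be a layer index or a physical depth:

```python
def center_of_gravity_z(hits):
    """Energy-weighted mean position along z; hits are (z, energy) pairs."""
    total = sum(e for _, e in hits)
    if total <= 0.0:
        raise ValueError("shower has no energy")
    return sum(z * e for z, e in hits) / total


toy_shower = [(0, 1.0), (10, 1.0), (20, 2.0)]
print(center_of_gravity_z(toy_shower))
```

Evaluating this quantity once per shower yields the distributions that are compared between the two simulators.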

Longitudinal Profile
The second physical distribution relating to the shower development along the depth of the calorimeter that was investigated is the longitudinal profile. The JSD values between BIB-AE and Geant4 distributions for each combination of incident particle energy and angle are shown in Table 3. At simulation level, the best (90 GeV, 85 degrees) and worst (20 GeV, 40 degrees) performing combinations of energy and angle are highlighted in bold, and the distributions shown in Figure 16. Note that the apparent discontinuity appearing around layer 20 arises from the calibration factor applied to account for the two sampling fractions. This feature therefore becomes more pronounced for particles closer to perpendicular incidence and with higher energy, where more energy is deposited in later layers of the calorimeter. The BIB-AE performs excellently at reproducing this distribution across the combinations of energy and angle studied. This includes an impressive reproduction of the step-like features that are present in the central layers for many energy and angle combinations (see for example the 90 GeV, 85 degree showers in the left panel of Figure 16), which result from the alternating layer structure of the ILD ECAL.
After reconstruction, the worst performing incident energy and angle combination is still 20 GeV, 40 degrees, but the best performing combination is now at 90 GeV, 60 degrees. The distributions for these combinations are shown in Figure 17. The excellent level of agreement between the BIB-AE and Geant4 distributions that was observed at simulation level is still present after the reconstruction procedure, across the range of energies and angles studied.
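A longitudinal profile can be sketched as the energy deposited in each layer, averaged over a set of showers. The precise definition used for the figures is not given in the text, so the averaging and the `(layer, energy)` hit layout below are assumptions.

```python
def longitudinal_profile(showers, n_layers=30):
    """Mean energy per layer; each shower is a list of (layer, energy) hits."""
    if not showers:
        raise ValueError("need at least one shower")
    profile = [0.0] * n_layers
    for shower in showers:
        for layer, energy in shower:
            profile[layer] += energy
    return [p / len(showers) for p in profile]


toy_showers = [
    [(0, 1.0), (1, 2.0), (1, 1.0)],
    [(0, 3.0), (2, 4.0)],
]
print(longitudinal_profile(toy_showers, n_layers=4))
```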

Radial Profile
The ability of the BIB-AE to capture the transverse development of showers is investigated by studying the radial profile of the showers around the principal axis. The JSD values between the BIB-AE and Geant4 for each fixed combination of incident particle energy and angle are shown in Table 4. At simulation level, the best (90 GeV, 60 degrees) and worst (20 GeV, 40 degrees) performing combinations are highlighted in bold, and shown in Figure 18. While the BIB-AE tends to be able to capture the profile around the high energy core very well, at larger radii the performance varies. In some cases the model reproduces the distribution at larger radii with a high degree of fidelity, while in others, typically where the showers have a high degree of inclination (i.e. 40 degree showers), the BIB-AE distributions fall off too quickly compared to the Geant4 showers. After reconstruction, the best and worst performing combinations of incident particle angle and energy are still 90 GeV, 60 degrees and 20 GeV, 40 degrees respectively. As at simulation level, the profile around the core of the shower is well reproduced across the board. In the best performing case (i.e. 90 GeV, 60 degrees), even at larger radii the BIB-AE maintains a strong agreement with Geant4. Interestingly, in the worst performing case at 20 GeV, 40 degrees, although a fall off at large radii is still observed, it appears to be somewhat less drastic than at simulation level. This could be significant in a more general case for distinguishing overlapping photon showers. We leave a comparison relative to Geant4 for such a scenario to future work.
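The radial profile around a given axis can be sketched by computing the perpendicular distance of each hit from the axis and summing the hit energies in rings of increasing radius. The hit layout `(x, y, z, energy)`, the bin width and the number of bins are illustrative assumptions; the axis would in practice be the principal axis obtained from the PCA.

```python
import math


def radial_distance(point, axis_point, axis_dir):
    """Perpendicular distance from `point` to the line through `axis_point` along `axis_dir`."""
    norm = math.sqrt(sum(d * d for d in axis_dir))
    d = [c / norm for c in axis_dir]
    r = [p - a for p, a in zip(point, axis_point)]
    cross = (r[1] * d[2] - r[2] * d[1],
             r[2] * d[0] - r[0] * d[2],
             r[0] * d[1] - r[1] * d[0])
    return math.sqrt(sum(c * c for c in cross))


def radial_profile(hits, axis_point, axis_dir, bin_width=1.0, n_bins=10):
    """Energy summed in radial bins around the axis; hits are (x, y, z, energy)."""
    profile = [0.0] * n_bins
    for x, y, z, energy in hits:
        idx = int(radial_distance((x, y, z), axis_point, axis_dir) / bin_width)
        if idx < n_bins:  # hits beyond the last bin are ignored
            profile[idx] += energy
    return profile


toy_hits = [(0.0, 0.0, 5.0, 1.0), (3.0, 4.0, 2.0, 2.0), (0.0, 2.5, 7.0, 0.5)]
print(radial_profile(toy_hits, axis_point=(0.0, 0.0, 0.0), axis_dir=(0.0, 0.0, 1.0)))
```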

Computational Performance
The suitability of a generative model as a surrogate simulator ultimately comes down to its inference time per shower. To this end, the generation time per sample is benchmarked on both CPU and GPU hardware. Table 5 shows the average time to generate a shower with energy and angle uniformly distributed in the 10-100 GeV range and 30-90 degree range respectively, using Geant4 and the BIB-AE. The generative model offers a significant speed-up relative to Geant4, reaching up to three orders of magnitude on a GPU.
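A benchmark of this kind could be organised roughly as in the sketch below, which times a batched generator on uniformly sampled energies and angles and reports the mean and standard deviation of the time per shower. The `generate_batch` callable and both function names are hypothetical placeholders, not the interface of the BIB-AE code; the sampling ranges are those quoted above.

```python
import time

import numpy as np


def mean_time_per_shower(generate_batch, n_showers, batch_size, rng):
    """Mean and standard deviation of the generation time per shower.

    generate_batch(energy, angle) is a hypothetical callable standing in for
    the generator under test. On a GPU it must block until the batch is
    finished (for example by synchronising the device) so that the timer
    measures the full computation.
    """
    per_shower = []
    done = 0
    while done < n_showers:
        n = min(batch_size, n_showers - done)
        energy = rng.uniform(10.0, 100.0, size=n)  # GeV
        angle = rng.uniform(30.0, 90.0, size=n)    # degrees
        start = time.perf_counter()
        generate_batch(energy, angle)
        elapsed = time.perf_counter() - start
        per_shower.extend([elapsed / n] * n)       # time shared equally in a batch
        done += n
    per_shower = np.asarray(per_shower)
    return per_shower.mean(), per_shower.std()


def speedup(t_reference, t_model):
    """Ratio of the reference time per shower to the model time per shower."""
    return t_reference / t_model
```

Scanning `batch_size` and keeping the fastest setting would mirror the selection of the best performing batch size described for Table 5.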

Conclusion
Generative models show potential to provide powerful tools for fast simulation, and to significantly reduce the computational resources required by experiments in particle physics. The contribution of this paper is twofold. In the first instance, we generalise the BIB-AE architecture, one of the most powerful generative models for calorimeter simulation, to a multi-parameter conditioning scenario in a highly granular calorimeter.
A key challenge for extending simulation tools based on generative models into such a multi-parameter conditioning scenario arises from the requirement to cover an increased phase space. This means a generative model not only needs a robust conditioning scheme, but also requires sufficient capacity and capability. Scaling of models designed for learning on regular grids must also be given some forethought, as larger grid sizes will be required for showers with varying angle of incidence.
In the second instance, we perform a detailed study into the effects of particle flow reconstruction on the performance of a generative model for the first time in such a highly granular detector. We demonstrate that it is possible to design a simulator that retains a strong performance after reconstruction; it is the physics performance after this processing that will provide the ultimate means of judging the suitability of a surrogate simulator.
The physics performance of the BIB-AE model was shown to be strong across a range of physics observables. From a conditioning perspective, the energy response was shown to be particularly strong, thanks in part to the per-shower re-scaling afforded by the use of a normalising flow to learn the total energy deposited in the active regions of the calorimeter. This approach helps to make the energy conditioning performance of the model more reliable than was the case in previous versions [11]. The angular response was somewhat weaker in comparison, but still provides a good description overall.
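The per-shower re-scaling mentioned above can be illustrated with a short sketch. It assumes that a target visible energy is available for each shower (in the paper this quantity is learned by a normalising flow) and simply scales every voxel of a generated shower so that the voxel sum matches that target. The function name, array layout and handling of empty showers are assumptions made for the example, and the position of this step relative to other processing in the actual model is not implied.

```python
import numpy as np


def rescale_to_visible_energy(showers, target_energy):
    """Scale each shower so that its summed voxel energy equals the target.

    showers       : (N, ...) array of voxel energies, one shower per entry
    target_energy : (N,) target visible energies, e.g. sampled from a flow
    Showers with zero total energy are returned unchanged.
    """
    showers = np.asarray(showers, dtype=float)
    target_energy = np.asarray(target_energy, dtype=float)
    voxel_axes = tuple(range(1, showers.ndim))
    current = showers.sum(axis=voxel_axes, keepdims=True)
    target = target_energy.reshape((-1,) + (1,) * (showers.ndim - 1))
    # Avoid division by zero: keep a scale factor of 1 for empty showers.
    scale = np.divide(target, current, out=np.ones_like(current), where=current > 0)
    return showers * scale
```

Because the scale factor is common to all voxels of a shower, the shower shape is left untouched and only the overall energy scale is adjusted.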
The BIB-AE retained the ability to learn the cell energy spectrum well across a range of angles and energies, as well as reproducing shower shapes with a high degree of fidelity. The most noticeable differences in the distributions appear in the radial profile for more inclined showers and at larger radii.
The improved computational performance of generative models relative to Geant4, which is by this point well established in the literature, remains true for the BIB-AE architecture in this extended setup. A speed-up of up to three orders of magnitude relative to Geant4 is consistent with previous results for this model [11] [12].
For future work, a crucial step will be to develop general methods to handle irregularities in the detector geometry. This will be necessary to allow the incident position on a detector surface to be varied, and entire detector subsystems to be handled. This step would also allow the effects of reconstruction to be studied in more general environments where overlapping showers are present. Studying the physics performance of generative models after reconstruction is an essential step, which will ultimately be necessary to evaluate the suitability of any generative model for calorimeter simulation.

Figure 1. Schematic diagram of the BIB-AE architecture setup during training, including each network and its corresponding loss terms. The encoder reduces the input calorimeter showers to a low-dimensional latent space, which is regularised by KLD and MMD loss terms in addition to a latent critic. The decoder reconstructs shower images back from the latent space, with a dual purpose reconstruction critic simultaneously assisting reconstruction and judging the quality of the output shower. The Post Processor network is trained in a second step to adjust voxels individually. The blue and lilac lines represent an input conditioning on the energy and angle of the incident particle, while the red line represents a conditioning on the visible energy.

Figure 4. Simulation level angular linearity (left) and width (right) for both Geant4 and BIB-AE generated showers. Curves are shown for each of the fixed incident energies of 20 GeV, 50 GeV and 90 GeV, which are coloured purple, dark cyan and red respectively. In the angular linearity plot on the left, the means for showers with energies of 20 GeV and 90 GeV are shifted by constant values of −10 degrees and +10 degrees respectively for visual purposes. The sub-panels in each figure show the relative deviation of the BIB-AE angular responses from their Geant4 equivalents.

Figure 5. Reconstructed angular linearity (left) and width (right) for both Geant4 and BIB-AE generated showers. Curves are shown for each of the fixed incident energies of 20 GeV, 50 GeV and 90 GeV, which are coloured purple, dark cyan and red respectively. In the angular linearity plot on the left, the means for showers with energies of 20 GeV and 90 GeV are shifted by constant values of −10 degrees and +10 degrees respectively for visual purposes. The sub-panels in each figure show the relative deviation of the BIB-AE angular responses from their Geant4 equivalents.

Figure 6. Visible energy deposited in the calorimeter at simulation level for both Geant4 and BIB-AE generated showers. The distributions are grouped according to incident photon angles of 40 degrees (left, orange), 60 degrees (center, green) and 85 degrees (right, blue).

Figure 7. Energy linearity (left) and resolution (right) at simulation level for both Geant4 and BIB-AE generated showers. The curves are grouped according to the three incident angles of 40 degrees (orange), 60 degrees (green) and 85 degrees (blue). The sub-panels in each figure show the relative deviation of the BIB-AE visible energy responses from their Geant4 equivalents.

Figure 8. Reconstructed particle (PFO) energy for both Geant4 and BIB-AE generated showers. The distributions are grouped according to incident photon angles of 40 degrees (left, orange), 60 degrees (center, green) and 85 degrees (right, blue).

Figure 9. Reconstructed particle (PFO) energy linearity (left) and resolution (right) for both Geant4 and BIB-AE generated showers. The curves are grouped according to the three incident angles of 40 degrees (orange), 60 degrees (green) and 85 degrees (blue). The sub-panels in each figure show the relative deviation of the BIB-AE visible energy responses from their Geant4 equivalents.

Figure 10. Simulation level cell energy spectra for the best (50 GeV, 85 degrees, left) and worst (90 GeV, 40 degrees, right) incident angle and energy combinations. The grey hatched area indicates the region below half a MIP.

Figure 11. Reconstructed cell energy spectra for the best (50 GeV, 85 degrees, left) and worst (20 GeV, 40 degrees, right) incident angle and energy combinations. The grey hatched area indicates the region below half a MIP.

Figure 12. Simulation level number of hits for both Geant4 and BIB-AE generated showers. The distributions are grouped according to incident photon angles of 40 degrees (left, orange), 60 degrees (center, green) and 85 degrees (right, blue).

Figure 13. Reconstructed number of hits for both Geant4 and BIB-AE generated showers. The distributions are grouped according to incident photon angles of 40 degrees (left, orange), 60 degrees (center, green) and 85 degrees (right, blue).

Figure 14. Simulation level center of gravity distributions for the best (50 GeV, 85 degrees, left) and worst (20 GeV, 85 degrees, right) incident angle and energy combinations.

Figure 15. Reconstructed center of gravity distributions for the best (50 GeV, 40 degrees, left) and worst (90 GeV, 85 degrees, right) incident angle and energy combinations.

Figure 19. Reconstructed radial profiles for the best (90 GeV, 60 degrees, left) and worst (20 GeV, 40 degrees, right) incident angle and energy combinations.

Table 1. Jensen-Shannon Distance (JSD) between the Geant4 and BIB-AE results for the simulation and reconstruction level cell energy spectrum for each combination of fixed energy and angle. For the simulation level results, the reported JSD is only calculated in the region above the MIP cut.

Table 2. Jensen-Shannon Distance (JSD) between the Geant4 and BIB-AE results for the simulation and reconstruction level center of gravity for each combination of fixed energy and angle.

Table 3. Jensen-Shannon Distance (JSD) between the Geant4 and BIB-AE results for the simulation and reconstruction level longitudinal profiles for each combination of fixed energy and angle.

Table 4. Jensen-Shannon Distance (JSD) between the Geant4 and BIB-AE results for the simulation and reconstruction level radial profiles for each combination of fixed energy and angle.

Table 5. Comparison of the computational performance of the BIB-AE generator and Geant4 on a single core of an Intel® Xeon® CPU E5-2640 v4 (CPU) and an NVIDIA® A100 with 40 GB of memory (GPU). For the BIB-AE, the best performing batch size is selected. The value shown is the mean obtained for a set of 2000 showers with energy uniformly distributed from 10 to 100 GeV and angle from 30 to 90 degrees, with the uncertainty given by the standard deviation.