A generative adversarial network to speed up optical Monte Carlo simulations

Detailed simulation of optical photon transport and detection in radiation detectors is often used for crystal-based gamma detector optimization. However, the time and memory burden associated with the track-wise approach to particle transport and detection in commonly used Monte Carlo codes makes optical simulation prohibitive at a system level, where hundreds to thousands of scintillators must be modeled. Consequently, current large system simulations do not include detailed detector models to analyze the potential performance gain with new radiation detector technologies. Generative adversarial networks (GANs) are explored as a tool to speed up the optical simulation of crystal-based detectors. These networks learn training datasets made of high-dimensional data distributions. Once trained, the resulting model can produce distributions belonging to the training data probability distribution. In this work, we present the proof of concept of using a GAN to enable high-fidelity optical simulations of nuclear medicine systems, mitigating their computational complexity. The architecture of the first network version and high-fidelity training dataset is discussed. The latter is generated through accurate optical simulation with GATE/Geant4, and contains the position, direction, and energy distributions of the optical photons emitted by 511 keV gamma rays in bismuth germanate and detected on the photodetector face. We compare the GAN and simulation-generated distributions in terms of similarity using the Jensen–Shannon distance. Excellent agreement was found with similarity values higher than 93.5% for all distributions. Moreover, the GAN speeded the optical photon distribution generation by up to two orders of magnitude. These very promising results have the potential to drastically change the use of nuclear imaging system optical simulations by enabling high-fidelity system-level simulations in reasonable computation times. The ultimate is to integrate the GAN within GATE/Geant4 since numerous applications (large detectors, bright scintillators, Cerenkov-based timing positron emission tomography) can benefit from these improvements.


Introduction
Optical Monte Carlo simulations are extensively used to understand, optimize, and design crystal-based gamma detectors in nuclear medicine and high-energy physics (Staelens et al 2003, Roncali et al 2017a, Ghabrial et al 2018. In such detectors, a detailed description of the light transport in the crystal and the light collection by the photodetector are needed to optimize their energy, spatial, and timing performance (Bauer et al 2009, Ariño-Estrada et al 2020, He et al 2022. The computation time to perform optical simulations is high since each optical photon emitted after a gamma interaction in a scintillator is tracked from emission to detection (Geant4 Collaboration 2021). For example, ∼1 s is needed to compute the spatial distributions of the detected optical photon originating from a single 511 keV gamma interaction in a 20 mm thick bismuth germanate (BGO) (light yield of ∼8500 ph MeV −1 ) 3 .
In addition to detector development, Monte Carlo simulations are widely used to simulate complete clinical and preclinical imaging systems (Ricci et al 2019, Ahmed et al 2020, Sarrut et al 2021a, Sanaat et al 2022, Zaidi and Andreo 2022. Detailed and complete system optical simulations of these systems, often composed of hundreds or thousands of scintillators, are computationally prohibitive and thus are always limited to the analysis of the first interaction gamma points to study the system performance (Gu et al 2015, Abbaszadeh et al 2018, Borghi et al 2018, Surti and Karp 2018. Consequently, current system simulations are not precise enough; they cannot include improved radiation detector technologies to analyze their potential gain in performance. Moreover, up to several gigabytes of output data are generated for each detector. In recent years several groups proposed the use of machine learning techniques to speed up and unburden extensive system Monte Carlo simulations (de Oliveira et al 2017, Paganini et al 2018, Erdmann et al 2019, Sarrut et al 2019, 2021b, Fanelli and Pomponi 2020, Hashemi et al 2023. They all used the recently developed generative adversarial networks (GANs) (Goodfellow et al 2020), a class of generative models composed of two neural networks capable of generating a multidimensional data distribution mimicking high-dimensional distributions of training data.
Our approach aims at developing a GAN model called optiGAN which will be capable of generating multidimensional distributions (energy, position, direction, and time) of optical photons at the photodetector face, conditional to their emission position (gamma interaction in the crystal). Once trained, the model can be directly integrated within the Monte Carlo simulation and used to bypass low-level detail optical photon tracking (figure 1). This can enable fast and accurate simulation of optical photons in radiation detectors and can be scaled up to a complete imaging system, modeling the multidimensional distributions of scintillation and prompt photons on the photodetector face simultaneously. To our knowledge, GANs have never been applied to this specific aim.
In this work, we developed a conditional Wasserstein generative adversarial network (WGAN) model with gradient penalty (GP) to enhance the training procedure, convergence, and stability. The training dataset was generated through high-fidelity optical Monte Carlo simulations using GATE (Sarrut et al 2021a), a toolkit dedicated to numerical simulations in medical imaging and radiotherapy based on the Geant4 simulator. The optiGAN proof-of-concept was performed using a BGO crystal, a good candidate for positron emission tomography (PET) applications, thanks to its high density, stopping power, high photo-fraction, refraction index, and low production cost (Kwon et al 2016, Roncali et al 2019, Kratochwil et al 2020. The transport of the optical photons generated by the high energy gamma (511 keV for PET applications) interaction in BGO was thoroughly modeled using the well-validated LUT Davis model at the crystal edges (Roncali et al 2017b). This led to highly accurate optical photon distributions at the photodetector face. The position, direction, and energy distributions of all the optical photons (scintillation and Cerenkov photons) on the photodetector face were saved and used as a high-fidelity training dataset (figure 1).
Here we present the structure of optiGAN and the training dataset. We compare optical photon distributions generated by the GAN to distributions from the training dataset and other distributions not in the training set. The latter allows us to discuss the computational time and the GAN ability to learn a multidimensional optical model and produce realistic optical photon distributions following this model.

Training data from high-fidelity optical Monte Carlo simulations
All training datasets were prepared with optical simulations performed in a 3 × 3 × 3 mm 3 BGO crystal with the open-source software GATE v. 9.0 (Geant4 v. 10.06.03) (Sarrut et al 2021a).
BGO bulk and optical properties (density, composition, emission spectra, decay time, absorption length, index of refraction) were accurately integrated into the simulation. More details can be found in our previous work (Roncali et al 2019).
To control the source emission position, electrons of 420 keV were used as the source, corresponding to electrons generated by 511 keV photoelectric interactions in BGO. The tortuous path of the energetic electron in BGO at 511 keV was modeled with a realistic inelastic mean free path in the order of a fraction of microns . The optical photons produced (scintillation or Cerenkov emission) were generated and tracked using the Livermore Model. The transport of optical photons was conducted using the LUT Davis model at the crystal edges. The model uses a crystal reflectance look-up-table computed from three-dimensional (3D) crystal surface measurements (Roncali et al 2013, Roncali et al 2017b to determine the photon reflection or transmission and its direction at each interaction with the crystal surfaces. Surfaces were modeled as polished and wrapped in Teflon, except the face in contact with the photodetector, which was modeled coupled to optical grease (index of refraction 1.56) for all simulations. The photodetector was considered ideal, with a photodetection efficiency of 1, including the geometrical and quantum efficiency.
A total of 140 emissions positions was simulated within the crystal volume at selected X-Y-Z positions to build the training dataset. Using the symmetry of the geometry, these 140 positions covered 1/8 of the crystal cross-section and five depths of interaction (figure 2(a)). Another 1000 electron emission points, distinct from the training points, were randomly selected within the whole crystal volume and used to test the prediction capabilities of the GAN (prediction dataset).
At each position, electrons were emitted to generate at least 30 000 optical photons on the photodetector face. The properties of all optical photons exiting at the coupling-photodetector interface were saved in a phase space GATE root file: the X and Y coordinates on the cross-section (Z position was constant and equal to the photodetector position), the 3D directions (dX, dY, and dZ), and the kinetic energy (EKine) were stored as tabular data making a total of six distributions. The 3D emission source position was used as the class label to tag the distributions.
The tabular data of the 140 training points were concatenated to generate a single training matrix (figure 2(b)). This constituted the multidimensional dataset to train our conditional GAN for a total of nine columns (six distributions-X, Y, dX, dY, dZ, EKine, and 3D class label) and 4.2M rows (140 points, each with 30 000 entries).
The tabular data of the remaining 1000 emission points were used to test the prediction capabilities of the GAN (section 2.4).

GAN architecture, parameters, and training
GANs (Goodfellow et al 2020) are a type of deep learning model designed to generate new, synthetic data samples that are indistinguishable from those belonging to the training dataset.
The optiGAN model was implemented in Python with the Keras framework. Its architecture comprised two neural networks, a generator G and a discriminator D (also called critic) .
As a general concept, G is responsible for generating a multidimensional distribution M g (generated) indistinguishable from the real one in the dataset (M t ). G takes as input: z, sampled from a simple multidimensional normal distribution, and a class label L (source 3D positions within the crystal). It outputs a multidimensional distributionx ∈ M g :x = G (z, L) targeting a real data distribution x ∈ M t (L) (figure 3(a)). D takes as input a multidimensional distribution with class labels (M t or M g ) and is responsible for classifying it as either real (drawn from the dataset) or fake (generated) ( figure 3(b)). The models are  trained together in an adversarial way, such that improvements in the generator come at the cost of a reduced capability of the discriminator.
Both D and G were composed of three linear fully connected hidden layers. As the input was not an image but a multidimensional matrix, the two neural networks were built using dense layers and did not require convolution layers. For G, the number of neurons in the hidden layers was set in ascending size, H, H × 2, H × 4 (figure 3, where H was 128). D had the same number of neurons but in descending size (figure 3). H was set empirically based on experimental results. As Arjovsky et al (2017) recommended, a rectified linear unit activation function was used for all layers except for the last one of G, where a linear activation function was used instead.
We used the Wasserstein (or Earth Mover's) distance as loss function  together with GP (Gulrajani et al 2017) as defined in equation (1), which both were shown to provide better GAN learning stability (Gulrajani et al 2017) than the conventional binary cross entropy loss function and gradient clipping (Goodfellow et al 2020. Together with improved stability of the optimization process, it was shown that the Wasserstein loss metric correlated with the generator's convergence and sample quality (Arjovsky et al 2017) (1) The first two terms represent the expectation values of the output of the discriminator when fed with a fake (x) or real (x) distribution, respectively. λ represents the GP coefficient used to control the strength of the penalty. It was set to 10, as suggested by Gulrajani et al (2017). We used the square hinge GP (Petzka et al 2017) as defined in equation (2) since it showed better results than other penalties when λ = 10 (Sarrut et al 2021b) (3) x (equation (3)), was sampled uniformly along straight lines between pairs of points sampled from the real data distribution and the generated distribution. ϵ was a random number ∈ U [0, 1]. Adam optimizer was used to minimize the loss function. The momentum term β 1 and β 2 were set to 0.5 and 0.9, respectively. The learning rate was chosen empirically and set to 10 −5 .
The model was trained for 10 000 epochs with 10 iterations per epoch. Stochastic batches of 420 000 samples (n b ) were used at each iteration. For D, each batch comprised a stochastic selection of the training dataset matrix rows ([n b , 9], selected from the multidimensional matrix of 4.2M rows, figure 2(b)). For G, the input batch included only the class label L ([n b , 3]) and z, whose dimension was empirically set to [n b , 10]. The discriminator was updated more often than the generator, three times per iteration versus once, as advocated by Arjovsky et al (2017).
The generator model was saved each 100 iterations.

Generation of synthetic distributions
Saving the generator model allowed the generation of the optical photon distributions (X, Y, dX, dY, dZ, EKine) of a given interaction point at a specific iteration. The generator produced synthetic (generated) distribution samples when fed with a random z and a specific 3D position within the crystal volumes. Both simulated and generated distributions were histogrammed with 150 bins. First, the optiGAN generation capability was tested by generating distributions using points belonging to the training dataset and comparing them with the corresponding simulations (used for training).
Then, we tested the GAN prediction capabilities by generating the synthetic distributions of the 1000 points outside the training dataset (prediction dataset) and comparing them with the corresponding simulations.

Quantitative comparison
Distributions were compared by quantifying the similarity (S) between a normalized generated (h G ) and simulated (h R ) optical photons distribution at the photodetector face using the Jensen-Shannon distance (JSD), a similarity measure of probability distributions (Menéndez et al 1997, Zunino et al 2022. The JSD represents the statistical distance between two distributions and is given by equation (4): where m represents the bin-wise mean of h R and h G , and D is the Kullback-Leibler divergence (also called relative entropy). JSD outcome is bounded in [0,1], where 0 indicates that the two distributions are the same, while 1 means they are nowhere similar. The similarity S was computed as 1 − JSD and estimated for all emission points, distributions (X, Y, dX, dY, dZ, EKine), and training iterations.

Computation time
With the final goal of integrating the trained optiGAN model within the Monte Carlo toolkit to skip the optical photon tracking process, we compared the CPU computation time to generate the optical photon distributions through accurate optical Monte Carlo simulations with the ones generated using the saved generator model. To this end, we simulated the emission of 5000 420 keV electrons at three representative points in the crystal cross-section: at the center, corner, and edge (X and Y equal to [0,0], [1.45,1.45] and [1.45, 0], respectively), at two different depths of interaction, one far from the photodetector (Z = 0.1 mm) and one close to the photodetector (Z = 2.8 mm), for a total of six points (six simulations). Others simulation and analysis details are as the ones described in section 2.1.
For each simulation, the simulation computation time was estimated starting from the emission of a parent electron until the tracking and eventual detection of all the emitted optical photons ended, and then summing the contribution of all electrons.
The GAN computation time was estimated by feeding the G model with z and a 3D emission position until the end of the synthetic distribution samples generation. For each 3D emission point, the dimensions of z and L were set to [10, n batch ] and [3, n batch ], respectively, where n batch matched the number of detected optical photons for the specific emission point extracted from the corresponding simulation. This allowed accounting for a realistic number of detected optical photons at different emission positions (due to different transport within the crystal and extraction at the photodetector face) in the synthetic optical photon distribution generation.
Results concerning the time needed to simulate the whole training dataset and the optiGAN training time are not discussed here, since we focus on comparing the computational gain experienced by users applying the trained network in future releases. The simulations, model training, distribution generation, and computation time estimation were all performed with an x86_64 Intel(R) Core (TM) i9-10900X CPU@3.70 GHz, CPU MHz 1200, 16 GB DRAM Dell system. Figure 4 shows the GAN loss function computed from equation (1) at each iteration. It started negative at the beginning of the training, when G was not sufficiently trained and, applied to generated data D E [D (x)], was less than D applied to real data E [D (x)]. Then, the loss function increased consistently as the training progressed until it converged to zero, showing the high training stability and adversarial behaviors of the training.

GAN synthetic distribution generation capability
Excellent agreement between the optiGAN-generated and training dataset distributions was found (figure 5). The GAN was able to simultaneously learn distributions corresponding to different emission point positions in the crystal that exhibited shapes and different degrees of complexity (figure 5). The agreement between the simulations and the GAN outputs was quantified through the similarity metric. For example, the distributions of optical photons emitted at the center of the crystal close to the photodetector and those of photons emitted close to the crystal edges far from the photodetector, shown in figures 5(a) and (b), were generated with a similarity S higher than 90% for all distributions.
It is interesting to note that both normal (e.g. X and Y distributions, figure 5(a), or EKine) and other shaped distributions with high skewness (e.g. X and Y, figure 5(b) and dZ distributions) were concurrently generated with a high degree of accuracy.
Similarity values as a function of the iteration number showed the excellent capability of the optiGAN architecture, together with the chosen hyperparameters, to learn multidimensional distributions ( figure 6). An average similarity (black lines in figure 6) greater than 87% was obtained for all distributions and all points starting from when more than 20 000 iterations were used. The mean similarity increased with the number of training iterations reaching at least 93.5% for all distributions as indicated in figure 6. The dZ distribution showed the lowest average similarity (93.5%), with a few points with S values lower than 90% (figure 6, dZ distribution). This was due to the large heterogeneity in the dZ distribution shapes (e.g. figures 5(a) and (b)). However, most emission points showed dZ distributions with similarity values higher than 90% (figure 6, colorscale). The same behavior is observed with dY. The highest similarity values were obtained with the kinetic energy EKine (figure 6), thanks to its normal-shaped distribution independent from the emission point location (figures 5(a) and (b)).

GAN synthetic distributions prediction capability
The ability of optiGAN to produce realistic distributions for emission points outside the training dataset was evaluated next. Excellent agreement was obtained between optiGAN-generated distributions and Monte Carlo simulation distributions of optical photons within the crystal volume not included in the training dataset. Figure 7 illustrates results for one point, with similarity values greater than 92%.
When considering all the points of the prediction dataset, average similarity values greater than 93.8% were obtained (black points, figure 8). The standard deviation was less than 1.9% (black points error bars, figure 8) among all points and distributions in the prediction dataset (1000 points). Moreover, the mean similarity values between the prediction dataset simulations and the optiGAN-generated distributions matched the ones between the training dataset and the optiGAN (in black and blue, respectively, in figure 8). Although there are no significant differences between them, the prediction dataset showed a slight increase in the similarity values for some distributions (dX, dY, dZ and EKine). This increase could be due to a majority of the prediction points having a high correlation with the better-performing points in the training data. As a Figure 6. Percentage similarity as a function of the training interaction for all the studied optical photon distribution on the photodetector face and all points. In each graph, the red curves represent the minimum and maximum accuracy obtained among all emission points during the training process. In black, the mean percentage accuracy values of all points as a function of the iterations. The text box depicts the mean accuracy value at 100 000 iterations. Note that the mean accuracy values (black lines) location shows that most points have high accuracy. Figure 7. Distributions of the six parameters studied (X and Y position, directions dX, dY, dZ, and kinetic energy EKine) of one representative point not included in the training dataset (blue). In black, distributions generated by the optiGAN G model saved after 100 000 iterations. All distributions were histogrammed with 150 bins. result, the model gave slightly higher average similarity values than the training dataset. However, the difference is not statistically significant. Table 1    The simulation computation time mainly depended on the X-Y emission position (table 1), where the tracking of optical photons happened to be computationally longer when emitted at the center of the crystal cross-section than at the edges. Although not changing the simulation computation time, the emission Z position affected the number of optical photons arriving on the photodetector face, where emission points closer to the photodetector favored detection (table 1, last column). This directly affected the optiGAN computation time, which linearly increased with the number of detected optical photons (table 1). Indeed, a larger number of optical photons arriving on the photodetector face meant larger z and L input dimensions fed to G, leading to larger synthetic distributions to generate.

Computational performance
Although different batch sizes for optiGAN had different computation demands, significant computation time improvements were obtained, no matter the emission position within the crystal volume.

Discussion and conclusions
Using the most recent cutting-edge development of generative deep neural networks, we developed and optimized a WGAN capable of generating multidimensional optical photons distributions on the photodetector of a radiation detector, preserving the accuracy and reproducing the distribution details of a complete optical Monte Carlo simulation. Similarity values higher than 93.5% were obtained between the simulated and GAN-generated distributions when considering emission points included or not in the training dataset. This not only shows the excellent distribution generation capabilities of the trained optiGAN but its potential to predict optical photon distributions on the photodetector face for interactions in the whole crystal volume without the need to perform their detailed and accurate Monte Carlo optical simulations and include them in the training dataset.
Moreover, computation time gains of up to two orders of magnitude have been obtained when using CPU. Note that the computation time gain is expected to improve when considering larger crystals (e.g. 20 mm thick crystals, as those commonly used in clinical imaging systems) due to the increased optical photon emission, transit, and spread within the material, which will strongly increase the simulation computation time but not the GAN generation time.
Although very promising results are presented on a specific configuration (3 × 3 × 3 mm 3 BGO), this work does not aim at replacing detailed Monte Carlo simulations of single crystals or individual radiation detectors, which are long but still feasible in terms of computational time. The presented proof-of-concept shows encouraging results that can benefit many physics applications where the accurate optical photon tracking within radiation detectors makes large systems simulations computationally prohibitive. Detailed optical Monte Carlo simulations are largely used by the community to design and optimize large radiation detectors, they are thus critical to analyzing their possibly enhanced performance. As soon as a set of multidimensional distributions (high-fidelity training dataset) of a specific radiation detector can be generated from accurate optical Monte Carlo simulations, researchers could benefit from the proposed approach. Different multidimensional distributions could be used as training datasets according to a specific application, and different crystals, geometries, surface treatments, and technologies (e.g. metamaterials (Konstantinou et al 2021)), as well as different sources and interaction processes, could be used.
Although very high similarity values (<93.5%) were obtained, there is still potential to further enhance the model generation capabilities and minimize the deviation between the optiGAN-generated data and the training data. These deviations may be attributed to the optiGAN approach's inherent uncertainty when learning complex distributions with very different features. They could depend on the model architecture (e.g. number of layers, number and type of distributions), selected hyperparameters and/or GP function chosen. Further investigations are undergoing.
We will test the inclusion of different variables (e.g. timing, detection efficiency) in the high-fidelity multidimensional training dataset and optimize it to target the generation of distributions useful to study the timing performance of radiation detectors. Indeed, large efforts are currently focused on improving detectors coincident timing resolutions to push the limits of time-of-flight PET techniques, with the final perspective of a clinical image signal-to-noise ratio improvement, resulting in a faster image acquisitions execution and thus reduced exposure rates for patients (Lecoq 2017). In this context, large system simulations are mandatory to test new radiation detector technologies but computationally prohibitive nowadays. To this aim, next optiGAN version will include the possibility of learning sparse distributions coming from optical photons with different physical features (scintillation or Cherenkov photons), as well as testing the effect of inter-crystal scattering.
Future works will focus on improving the optiGAN performance in terms of computational time by thoroughly optimizing the code for graphic processing unit (GPU) computation. Different training procedures will be analyzed (e.g. additional GPs and data normalization), and several optimizations will be performed (e.g. automatic mixed precision (Micikevicius et al 2017), Data loader optimization techniques, and batch size optimization) to optimize the training. Preliminary optimization showed a 10× reduction in training computation time when training the GAN on GPU 4 (∼21 s/epoch) compared to CPU (∼3 min 30 s/epoch). In addition, transfer learning will be investigated to maximize the applicability of a configuration-specific training dataset. In addition, transfer learning will be investigated to maximize the applicability of a configuration-specific training dataset.
Further developments will build on the presented results and continue progressing, with the final aim to integrate the optiGAN within the simulation toolkit GATE/Geant4 for the community to benefit from this new simulation approach. The optiGAN integration and use will be simplified and user-friendly since GATE will be directly managed by Python libraries (Sarrut et al 2022).

Data availability statement
The data cannot be made publicly available upon publication because they are not available in a format that is sufficiently accessible or reusable by other researchers. The data that support the findings of this study are available upon reasonable request from the authors.