Deep learning based linear energy transfer calculation for proton therapy

Objective. This study aims to address the limitations of traditional methods for calculating linear energy transfer (LET), a critical component in assessing relative biological effectiveness (RBE). Currently, Monte Carlo (MC) simulation, the gold standard for accuracy, is resource-intensive and too slow for dose optimization, while faster analytical approximations compromise accuracy. Our objective was to prototype a deep-learning-based model for calculating dose-averaged LET (LETd) from patient anatomy and dose-to-water (DW) data, facilitating real-time biological dose evaluation and LET optimization within proton treatment planning systems. Approach. 275 four-field prostate proton Stereotactic Body Radiotherapy (SBRT) plans were analyzed, yielding a total of 1100 fields. These were randomly split into 880, 110, and 110 fields for training, validation, and testing. A 3D Cascaded UNet model, along with data processing and inference pipelines, was developed to generate patient-specific LETd distributions from CT images and DW. The accuracy of the LETd on the test dataset was evaluated against MC-generated ground truth through voxel-based mean absolute error (MAE) and gamma analysis. Main results. The proposed model accurately inferred LETd distributions for each proton field in the test dataset. A single-field LETd calculation took around 100 ms with trained models running on an NVIDIA A100 GPU. The selected model yielded an average MAE of 0.94 ± 0.14 MeV cm−1 and a gamma passing rate of 97.4% ± 1.3% on the test dataset, with the largest discrepancies at the field edges, where the dose gradient was largest and counting statistics were lowest. Significance. This study demonstrates that deep-learning-based models can efficiently calculate LETd with high accuracy as a fast forward-calculation approach. The model shows great potential for optimizing the RBE of proton treatment plans.
Future efforts will focus on enhancing the model’s performance and evaluating its adaptability to different clinical scenarios.


Introduction
Proton therapy provides distinctive benefits compared to conventional photon radiation, such as non-penetrating radiation and precise dose delivery via the Bragg peak (Doyen et al 2016). This is especially valuable for pediatric patients, patients who have received prior courses of radiation, and cases where maintaining a low integral dose is a high priority. Nevertheless, due to the energy deposition characteristics of charged particles, the biological impact on cell destruction differs from traditional x-ray-based radiotherapy. To account for these differences, the concept of relative biological effectiveness (RBE) was introduced (Paganetti 2014), which relates the dose of reference x-ray radiation to the dose of other types of radiation (e.g. particles) needed to produce the same biological effect (Vitti and Parsons 2019).
In clinical practice, a constant RBE of 1.1 is widely used for proton radiation, though this is now known to be an oversimplification and has been shown by multiple studies to be non-ideal (Chaudhary et al 2014, Jones 2016, Peeler et al 2016). While RBE is affected by several factors such as radiosensitivity, fractionation, and endpoints, its primary variation comes from ionization density, which is quantified by a physical quantity known as Linear Energy Transfer (LET) (Paganetti et al 2002). The dose-averaged LET (LETd) is commonly utilized as a representation of LET in clinical dose distributions resulting from multi-energetic protons. Studies have reported a direct link between adverse treatment side effects and LETd (Eulitz et al 2019, Bahn et al 2020, Ödén et al 2020, Fjaera et al 2022), making it desirable to optimize LETd and biological dose alongside the physical dose to ensure patient safety. Although advanced proton centers have taken steps to mitigate the effects of variable LETd and RBE (Paganetti and Giantsoudi 2018, Mutter et al 2020), there is still no commercially available treatment planning system that integrates LETd and RBE optimization (Sørensen et al 2021). Consequently, the capability to utilize LETd and RBE is suboptimal in many clinical proton therapy centers, and patients may not be receiving optimal proton radiation therapy.
Due to the contribution of secondary particles produced in nuclear interactions, LET is best computed using Monte Carlo (MC) simulations, which involve tracking millions of individual particles and their interactions with matter. Despite providing accurate results, they are computationally demanding and time-consuming. Recent advancements in algorithms and GPU acceleration have made MC simulations feasible in clinical settings for dose evaluation in advanced proton centers (Wan Chan Tseung et al 2015, Tseung et al 2016, Deng et al 2020b). However, direct optimization with LET remains challenging, as it requires iterative calculations (Gu et al 2021). Faster analytical models have been proposed to estimate LET, utilizing simplified analytical forms and phenomenological functions to approximate local proton spectra, relative proton power ratios, and corrections (Wilkens and Oelfke 2003, Sanchez-Parcerisa et al 2016, Deng et al 2020a). However, these models still exhibit limitations, especially in the accurate depiction of complex physics and tissue heterogeneity, which inhibits clinical adoption (Sanchez-Parcerisa et al 2016, Deng et al 2020a).
In recent years, the rapid progress of artificial intelligence has generated increasing interest in employing deep learning (DL) models for dose calculation and knowledge-based dose prediction. Dose calculation involves solving the Boltzmann transport equation or its simplified versions to determine the physical dose distribution, given the patient anatomy and the specific beam arrangements. On the other hand, dose prediction aims to forecast the most clinically optimal dose distribution for a patient, focusing on understanding the desired clinical trade-offs among the various potential dose distributions that could be realistically delivered based on the patient's CT data. Unlike dose calculation, which is grounded in simulating the underlying physics, the dose prediction task incorporates the principles of physics only implicitly and partially. For both tasks, a multitude of DL models have been utilized, yielding notable accuracy. Examples of knowledge-based dose prediction models encompass various UNet-like architectures (Nguyen et al 2019, Ahn et al 2021, Liu et al 2021, Gronberg et al 2023), Generative Adversarial Networks (GANs) (Kearney et al 2020, Zhan et al 2022), and diffusion models (Feng et al 2023, Fu et al 2023, Zhang et al 2023). Despite variances in model performance and inference speed, these dose prediction models typically rely on the patient's anatomy and the physician's contours as input and do not use beam-specific information. In comparison, dose calculation models require specific beam-related information and allow the calculation of dose contributions from individual pencil beams without necessitating physician's contours. These methods leverage sequence models such as Long Short-Term Memory for proton dose (Neishabouri et al 2021) and transformer-type architectures for both proton (Pastor-Serrano and Perkó 2022) and photon dose (Pastor-Serrano et al 2023). In contrast to the numerous publications on physical dose prediction or calculation tasks,
exploration into LET calculation using DL approaches remains scarce, with very few studies reported, despite its critical importance in proton and particle therapy for treatment outcomes and patient safety. Previous studies have proposed simple Artificial Neural Network architectures for knowledge-based LET prediction in the brain using the patient's anatomy and physician contours (Pirlepesov et al 2022), while more recent efforts have utilized GANs to perform beam-specific LET calculation with physical dose as input, without CT (Gao et al 2024).
In this study, we focus on LET calculation with DL models and propose that a DL-based approach can present substantial advantages over traditional methods in LETd calculation. It retains the strengths of analytical methods in parameter fitting while offering a significantly larger parameterized space than any existing human-designed analytical solution. This expanded parameter space could potentially enhance the accuracy of the results. Moreover, a DL-based approach's rapid forward inference could be considerably faster than MC simulation. These combined features make DL-based approaches a compelling solution for real-time and direct optimization tasks. In this study, we establish a prototype Cascaded 3D UNet model to explore the DL-based approach's ability to swiftly and accurately calculate LETd. This serves as a foundational step towards incorporating DL-based techniques in real-time clinical settings and direct optimization processes.

Patient data
With approval from our Institutional Review Board, we retrospectively curated a dataset of 275 de-identified prostate plans from patients treated with SBRT using the Hitachi Probeat-V Proton Beam Therapy System, a half-gantry configuration (Umegaki et al 2003). All patients had granted consent for their medical records to be used in research. The prescription was 3625, 3800, or 4000 cGy to the target volumes in five fractions. Each treatment plan was single-field optimized (SFO) and consisted of four fields (Zhu et al 2010), of which two were right/left laterals and the other two were right/left anterior obliques.
Non-contrast abdominal CT scans for all patients were acquired with Siemens SOMATOM Definition AS 20 CT scanners, employing 120 kVp, a 1 mm slice thickness, and iterative metal artifact reduction. The LETd for individual treatment fields within each clinically delivered proton plan was calculated by a clinically commissioned MC simulation code and served as the ground truth for the DL model training. Valid dose-to-water (DW) can be calculated by either commercial analytical methods or MC. Given the established agreement between them during MC commissioning and clinical plan verification, we opted for the latter for easier batch processing in this study. Furthermore, to assess the impact of DW accuracy variations on the precision of the LETd calculation, we randomly selected 12 fields and computed DW using the Varian Eclipse Proton Superposition Convolution algorithm. We then used these analytically calculated DW as inputs to evaluate the model's calculation accuracy, comparing the results with those obtained using MC-calculated DW values.
Considering each beam configuration as an individual field, the dataset comprised a total of 1100 fields. These fields were then randomly divided into subsets of 880 for training, 110 for validation, and 110 for testing.

LETd calculation

Data preprocessing
The detailed workflow for preprocessing is shown in figure 1. To facilitate numerical computation, all images, including dose/LETd distribution maps and abdominal CT scans, were converted to the NIfTI (Neuroimaging Informatics Technology Initiative) format (Li et al 2016). As the data originated from different coordinate systems, an affine transformation was employed to align the center of each image to the proton field isocenter. Subsequently, the images underwent resampling and cropping to 128 × 128 × 32 voxels at an isotropic resolution of 5 mm/voxel. Low-resolution input was utilized in this work as a proof of principle and due to limited computational resources. The HU of the CT was truncated at 1000 and globally rescaled between −1 and 1; dose and LETd were normalized between 0 and 1. After intensity rescaling, the 3D images of CT and DW were stacked into a two-channel tensor, which served as the input for the training phase of the model.
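The intensity preprocessing described above can be sketched in a few lines of array code. This is a minimal illustration, not the authors' pipeline: the function and constant names (`preprocess_ct`, `HU_MAX`) are ours, and the symmetric lower truncation bound at −1000 HU is an assumption inferred from the [−1, 1] rescaling.

```python
import numpy as np

HU_MAX = 1000.0  # truncation point for CT numbers (symmetric bound assumed)

def preprocess_ct(ct_hu: np.ndarray) -> np.ndarray:
    """Truncate HU at 1000 and rescale globally to [-1, 1]."""
    return np.clip(ct_hu, -HU_MAX, HU_MAX) / HU_MAX

def preprocess_map(vol: np.ndarray) -> np.ndarray:
    """Normalize a non-negative dose (or LETd) map to [0, 1]."""
    peak = vol.max()
    return vol / peak if peak > 0 else vol

def stack_inputs(ct_hu: np.ndarray, dose: np.ndarray) -> np.ndarray:
    """Stack CT and DW into a (2, D, H, W) two-channel input tensor."""
    return np.stack([preprocess_ct(ct_hu), preprocess_map(dose)], axis=0)

# Example on a toy volume with the paper's 128 x 128 x 32 grid
ct = np.random.randint(-1024, 3000, size=(32, 128, 128)).astype(np.float32)
dw = np.random.rand(32, 128, 128).astype(np.float32) * 70.0  # Gy-like scale
x = stack_inputs(ct, dw)
```

The resulting two-channel array matches the input tensor shape fed to the model during training.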

Model structure
This work utilizes the Cascaded UNet architecture due to its recent success in image translation and synthesis, as well as its scalability and adaptability for transfer learning (Roth et al 2018, Liu et al 2021). UNet has emerged as a popular architecture in computer vision and gained considerable recognition in the medical imaging field. One of the primary reasons for UNet's widespread adoption is its efficient handling of biomedical images: it is capable of working with relatively few training images while maintaining robust performance, a valuable trait given the often-limited availability of annotated medical images (Ronneberger et al 2015). The architecture's design is well suited for voxel-to-voxel biomedical image processing, providing high-precision localization via a symmetric expanding path that allows for full spatial resolution recovery at the output (Çiçek et al 2016). Furthermore, the use of concatenated higher-resolution features from the contracting path facilitates contextual understanding (Milletari et al 2016).
In this study, the Cascaded UNet model was selected as it adopts a coarse-to-fine strategy and incorporates staged feature extraction (Roth et al 2018, Liu et al 2021), as illustrated in figure 2. This design facilitates fine-tuning when applying the pre-trained model to more specific datasets, matching the reality that spot sizes from different proton delivery systems may differ substantially. The Cascaded 3D UNet model utilizes two 3D UNet neural networks for a two-step LETd calculation process, which introduces another level of supervision at the coarser output. Initial inferences from the first network (UNet 1) are refined by the second (UNet 2), enhancing calculation accuracy compared to a standard UNet (Roth et al 2018). To evaluate the necessity of the two-part Cascaded UNet structure, we intentionally excluded the second UNet segment during one training process. This enabled a direct performance comparison between the Cascaded UNet and a common single-staged UNet of equivalent depth, aiming to determine the effectiveness and contribution of the cascaded design in our model. A similarly structured Autoencoder model (Bank et al 2023), without the residual connections between the encoder and decoder, was also trained on the same dataset. This allows for a direct performance comparison between the more complex Cascaded UNet model and simpler architectures.
Both UNet 1 and UNet 2 have symmetric structures, consisting of 4 down-sampling blocks and 4 up-sampling blocks, coupled with regular convolution blocks and followed by an output layer (a 1 × 1 × 1 convolution layer) (Ronneberger et al 2015). Each down-sampling block halves the input size and doubles the feature maps using a 3 × 3 × 3 convolutional layer (stride of 2), an instance normalization layer, and a Rectified Linear Unit (ReLU) as the activation function. Conversely, each up-sampling block enlarges the input size and reduces the feature maps using a trilinear up-sampling layer, a 3 × 3 × 3 convolutional layer (stride of 1), and a ReLU. The output feature maps number 16 for UNet 1 and 32 for UNet 2. As a result of the symmetric structure, the final output of UNet 2 maintains the input's spatial resolution.
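The down- and up-sampling blocks described above can be sketched in PyTorch roughly as follows. This is an illustrative reconstruction from the text, not the authors' code; class names are ours, and the tensor sizes in the example are reduced from the paper's 128 × 128 × 32 grid for brevity.

```python
import torch
import torch.nn as nn

class DownBlock(nn.Module):
    """Halves the spatial size and changes the channel count:
    3x3x3 convolution (stride 2), instance normalization, ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(x)

class UpBlock(nn.Module):
    """Doubles the spatial size: trilinear up-sampling followed by a
    3x3x3 convolution (stride 1), instance normalization, ReLU."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="trilinear",
                              align_corners=False)
        self.block = nn.Sequential(
            nn.Conv3d(in_ch, out_ch, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm3d(out_ch),
            nn.ReLU(inplace=True),
        )
    def forward(self, x):
        return self.block(self.up(x))

# Two input channels (CT + DW), toy spatial size for a quick check
x = torch.randn(1, 2, 16, 32, 32)
y = DownBlock(2, 16)(x)   # spatial dims halved, 16 feature maps
z = UpBlock(16, 16)(y)    # spatial dims restored
```

Stacking four such down-blocks and four up-blocks (with the skip concatenations of a standard UNet) reproduces the symmetric structure the text describes, so the final output keeps the input's spatial resolution.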
The model's inputs are two-channel tensors composed of a CT image and a DW map, and the output is a single-channel LETd map with the same spatial dimensions as the inputs. The selection of input channels was guided by the understanding that the medium's density and composition along the beam path significantly influence LET, while DW provides information on the characteristics of the beam. To evaluate the impact of the CT input on the model's performance, we conducted an ablation test in which only DW was used as the input, omitting the CT.

Model training
The Cascaded 3D UNet model was implemented using the DL framework PyTorch (Paszke et al 2019). Model weights were initialized randomly and updated iteratively using the Adaptive Moment Estimation (Adam) optimizer. In the training process, the learning rate was initially set to 0.0001, and a 2 × 10−5 decay rate was applied using the CosineAnnealingLR scheduler. To assess the impact of different loss functions on model performance, both Least Absolute Deviations (L1 loss) and Least Square Errors (L2 loss) were utilized to measure the discrepancy between the predicted LETd and the ground truth. The total loss function was a weighted sum of the losses from UNet 1 and UNet 2, with a higher weight assigned to the latter.
Loss = (1/N) Σ_i [ a · L_n(P1_i, GT_i) + L_n(P2_i, GT_i) ],

where the sum runs over the non-zero voxels i, L_n denotes the chosen loss function (L1 or L2), P the model inference, with P1 and P2 the calculations from UNet 1 and UNet 2 respectively, GT the ground truth, N the number of non-zero voxels in the ground-truth LETd image, and a the hyperparameter that adjusts the weight of the loss calculated from UNet 1.
During training, the batch size was set to 32, with 28 steps per epoch and a maximum of 160 000 steps; a was set to 0.5. The training and validation losses were updated using a moving average. The final model was selected based on the smallest validation loss.
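A hedged sketch of this total loss, with a = 0.5 and the average taken over the N non-zero ground-truth voxels as stated above. Restricting the per-voxel error itself to the non-zero voxels is our reading of the text, not a detail the authors spell out; the function name is illustrative.

```python
import torch

def cascaded_loss(p1, p2, gt, a=0.5, norm="l1"):
    """Weighted sum of the per-stage losses: a * loss(UNet 1) + loss(UNet 2),
    each averaged over the N non-zero ground-truth voxels."""
    mask = gt != 0                        # non-zero LETd voxels only (assumed)
    n = mask.sum().clamp(min=1)           # N, guarded against empty masks
    err = (lambda d: d.abs()) if norm == "l1" else (lambda d: d.pow(2))
    loss1 = err((p1 - gt)[mask]).sum() / n
    loss2 = err((p2 - gt)[mask]).sum() / n
    return a * loss1 + loss2

# Tiny worked example
gt = torch.tensor([0.0, 1.0, 2.0])
p1 = torch.tensor([0.0, 0.5, 2.0])   # coarse UNet 1 output
p2 = torch.tensor([0.0, 1.0, 2.5])   # refined UNet 2 output
total = cascaded_loss(p1, p2, gt, a=0.5, norm="l1")
```

With a = 0.5, the coarse stage contributes half as much as the refinement stage, matching the higher weight assigned to UNet 2.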

Performance evaluation
The performance of the selected DL model was evaluated on the test dataset. One of the post-processing steps was to zero out negative predicted LETd values, adhering to the physical principle that LETd cannot be negative. Following that, the model inference was inversely normalized to recover the absolute LETd values. The DL-calculated LETd distribution was then compared against the MC-calculated ground truth using the voxel-based mean absolute error (MAE) within regions where the ground truth LETd is non-zero. Due to the interplay between LETd and DW, it is not appropriate to determine the biological effect using only LETd or DW. Instead, we utilized the product of LETd and DW as an indicative measure for estimating the biological dose (Grün et al 2019, Gu et al 2021). Gamma passing rate analysis was performed on the voxel-wise product of LETd and DW, utilizing criteria of 3% accuracy, a 5 mm distance to agreement (DTA), and a 10% low LETd cutoff. The DTA of 5 mm was chosen based on the voxel size of the low-resolution input data.
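The post-processing and masked MAE described above amount to a few lines of array code. The sketch below uses an illustrative `let_scale` factor standing in for the inverse of the [0, 1] normalization; the names are ours.

```python
import numpy as np

def postprocess(pred_norm: np.ndarray, let_scale: float) -> np.ndarray:
    """Zero out negative predictions (LETd cannot be negative) and
    inversely normalize back to absolute LETd."""
    return np.clip(pred_norm, 0.0, None) * let_scale

def masked_mae(pred: np.ndarray, gt: np.ndarray) -> float:
    """MAE restricted to voxels where the ground truth is non-zero."""
    mask = gt != 0
    return float(np.abs(pred[mask] - gt[mask]).mean())

# Tiny worked example (values in MeV/cm after rescaling)
gt = np.array([0.0, 2.0, 4.0, 8.0])
raw = np.array([-0.01, 0.21, 0.39, 0.82])  # normalized model output
pred = postprocess(raw, let_scale=10.0)
mae = masked_mae(pred, gt)
```

The zero-ground-truth voxel is excluded from the error, so spurious background predictions do not dilute the reported MAE.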
Additional metrics included in our study were the Structural Similarity Index Measure (SSIM), Normalized Cross-Correlation (NCC), and Peak Signal-to-Noise Ratio (PSNR). SSIM evaluates the similarity between two images, taking into account aspects such as luminance, contrast, and structure. Given that SSIM was originally designed for 2D images, it was calculated slice-by-slice along the third dimension, and the average SSIM was determined by aggregating the individual slice scores. PSNR quantifies the difference between the original and the inferred result, calculated using the MAE and the maximum possible voxel value, while NCC assesses the correlation coefficient between two image datasets. Of note, the evaluations of MAE, SSIM, PSNR, and NCC were applied exclusively to the LETd distributions, rather than to the dose × LETd product used in the gamma analysis.
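For illustration, NCC and the MAE-based PSNR variant described above can be computed as follows. Note that the conventional PSNR definition uses the MSE; the MAE-based form here follows the text's description, and the function names are ours.

```python
import numpy as np

def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation (Pearson-style) between two volumes."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b)))

def psnr_from_mae(pred: np.ndarray, gt: np.ndarray, max_val: float) -> float:
    """PSNR built from the MAE and the maximum possible voxel value,
    per the description above (the usual definition uses the MSE)."""
    mae = np.abs(pred - gt).mean()
    return float(20.0 * np.log10(max_val / mae))

# Tiny worked example: a constant offset leaves NCC at 1.0
gt = np.linspace(0.0, 10.0, 100).reshape(10, 10)
pred = gt + 0.1
corr = ncc(gt, pred)
psnr = psnr_from_mae(pred, gt, max_val=10.0)
```

A uniform 0.1 offset on a 0–10 scale gives MAE = 0.1 and hence a PSNR of 40 dB under this MAE-based convention, while the correlation remains perfect.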

Results
With 4 GPUs (NVIDIA A100) operating in parallel, the training process for a single model took approximately 72 h. Once the one-off training process had been completed, a single-field LETd calculation with the trained model took 0.14 ± 0.02 s on a single A100 40 GB GPU.
Figures 3(a) and (b) show a visual comparison of the MC-calculated ground truth and the DL-calculated LETd for one test field. Table 1 summarizes the detailed evaluation results among the 110 test fields for the Cascaded UNet, a single-staged UNet, and an Autoencoder of matching network depth, using different input channels and loss functions. The reported metrics include MAE, Gamma Passing Rate (Gamma), SSIM, NCC, and PSNR. Upon further examination, the majority of data points failing the gamma index test are clustered near the edge of the beam path, where the dose statistics are low and the variance is large. This observation is supported by figures 3(c) and (d). For the 12 analytically calculated DW inputs, the inference results using the best-performing Cascaded UNet model yielded an average gamma passing rate of 95.3% ± 1.4% and a mean MAE of 1.12 ± 0.13 MeV cm−1, closely aligning with those achieved using MC-calculated DW as the input. Furthermore, to provide a more straightforward comparison of model performance, figure 4 shows a visual comparison of the MC-calculated ground truth versus inferences from different models.
For a clinical evaluation of the proposed model, Dose-LET volume histograms (DLVH) are plotted in figure 5 for a representative test patient. The DLVH serves as a tool for simultaneously presenting the distributions of both dose and LET within a given volume (Yang et al 2021). This makes it an effective method for evaluating the model's efficacy across various dose regions. Three distinct volumes are shown in figure 5, including areas where the dose to water exceeds 2 Gy, along with typical organs at risk (OARs) such as the rectum and bladder. The DLVH's X-axis represents the physical dose in Gy, while the Y-axis shows the LETd in MeV cm−1. The isovolume lines, labeled as DLv%, indicate the percentage volume of a structure receiving at least x Gy in dose (D) and y MeV cm−1 in LET (L). For a comprehensive comparison, the LET Volume Histograms (LETVH) and Dose Volume Histograms (DVH) are positioned adjacent to the DLVH, providing a better view of the corresponding LET and dose distributions.
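The DLv% iso-volume quantity described here reduces to a joint-threshold voxel fraction: the portion of a structure receiving at least x Gy and at least y MeV cm−1 simultaneously. A minimal sketch (function name ours):

```python
import numpy as np

def dlvh_fraction(dose: np.ndarray, letd: np.ndarray,
                  x_gy: float, y_let: float) -> float:
    """Fraction of voxels with dose >= x_gy AND LETd >= y_let,
    i.e. the value encoded by a DLv% iso-volume point."""
    joint = (dose >= x_gy) & (letd >= y_let)
    return float(joint.mean())

# Toy structure with four voxels
dose = np.array([1.0, 3.0, 5.0, 7.0])   # Gy
letd = np.array([2.0, 2.5, 3.0, 4.0])   # MeV/cm
frac = dlvh_fraction(dose, letd, x_gy=2.0, y_let=2.5)
```

Sweeping (x_gy, y_let) over a grid and contouring the resulting fractions reproduces the iso-volume lines plotted in a DLVH.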
To further evaluate the DL model's performance against the two conventional methods for calculating LETd, we constructed a Bland-Altman plot (Bland and Altman 1986) comparing the MC and DL models in figure 6(a). In the histogram of figure 6(b), the X-axis represents the relative difference between the LETd predicted by the DL model and the ground truth calculated by MC simulations, whereas the Y-axis shows the fraction of voxels falling within a bin size of 3%. This histogram also incorporates the corresponding data for the FoCa analytical model in the prostate site, thereby providing a platform for comparing the performances of the two methodologies. Since the reported data of the FoCa model is from a single case (Sanchez-Parcerisa et al 2016), we have presented histograms for both the cohort average and the field exhibiting the worst Gamma passing rate from our DL model.
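The Bland-Altman quantities referenced here, the per-voxel bias and the 95% limits of agreement (±1.96 standard deviations), can be computed as in the following sketch (function name ours):

```python
import numpy as np

def bland_altman(mc: np.ndarray, dl: np.ndarray):
    """Per-voxel differences plotted against per-voxel means, with the
    bias and the 95% limits of agreement (1.96 standard deviations)."""
    diff = dl - mc
    mean = (dl + mc) / 2.0
    bias = float(diff.mean())
    loa = 1.96 * float(diff.std())   # half-width of the agreement band
    return mean, diff, bias, loa

# Toy example: symmetric +/-0.2 MeV/cm deviations give zero bias
mc = np.array([2.0, 4.0, 6.0, 8.0])
dl = np.array([2.2, 3.8, 6.2, 7.8])
_, _, bias, loa = bland_altman(mc, dl)
```

Scatter-plotting `diff` against `mean` with horizontal lines at `bias` and `bias ± loa` reproduces the layout of figure 6(a).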

Discussion
This study introduces a new DL model that utilizes a Cascaded 3D UNet for voxel-based LETd calculation, marking one of the initial attempts at applying DL to accurately calculate LETd for prostate cancer patients undergoing proton SBRT. Distinct from previous methods, this model uses CT scans and dose-to-water distribution maps as inputs. This advancement not only eliminates the dependency on human-drawn contours but also integrates anatomical heterogeneity information, paving the way for generalization to other treatment sites. CT images and DW are readily accessible from all TPS systems, ensuring that utilizing this model for LETd calculation does not impose significant additional demands on a TPS. The DW can be sourced from a variety of algorithms, provided they meet clinically acceptable standards. The inference results using the analytically calculated DW were comparable to those using MC-calculated DW. It is worth noting that the model's accuracy could be impacted by lower DW accuracy, particularly in heterogeneous regions. In such cases, fine-tuning the model may be necessary to compensate for inaccuracies in the input data.
Remarkably, the computation time of the model is shortened to 0.14 s on a single GPU, coupled with a high gamma passing rate of 97.4% when benchmarked against the MC-calculated ground truth. In comparison, our in-house MC simulator, noted for its expedited calculation speed, requires approximately 9.8 s for a single-field computation. The calculation time of the Cascaded 3D UNet model is thus at least 50 times shorter than that of conventional MC methods and an order of magnitude shorter than that of hybrid analytical ones (Wan Chan Tseung et al 2015, Deng et al 2020a). With its combination of MC-level precision and swift processing speed, this model has the potential to become a good candidate for real-time LET computation or for integration within optimization processes, potentially improving the quality of radiation plans for patients.
As demonstrated in figure 3, the DL-calculated LETd is capable of emulating the finer details of the reference image generated by the MC simulator across different beam configurations. This suggests promising potential for further generalization across all beam configurations. As per figures 3(c) and (d), the DL model demonstrates good performance throughout the beam path except at the edge areas and the distal end of the range. An area of potential improvement lies in enhancing the model's ability to predict LETd in regions characterized by sharp gradients. This is further supported by the better performance observed in models trained using the L2 loss, which assigns more weight to voxels with higher deviations. A hybrid of higher-order losses or a Focal Loss could potentially establish a better balance between focusing on the LETd in edge regions and in the central high-dose areas (Lin et al 2017).
According to table 1, while the MAE, SSIM, and NCC metrics show similar results for the Cascaded UNet model with or without CT input, there is a significant drop in the average gamma passing rate, from 97.4% to 89.2%, when CT is excluded. A visual examination reveals that in regions where the medium is heterogeneous, the LET inference is less accurate without CT input. The decline in accuracy can be attributed to the vital role of CT in providing information about the medium's density and composition. Therefore, it is essential to include CT data as an input for enhanced model accuracy. Furthermore, incorporating proton spot maps, spot energies, and Bragg peak spectra as inputs could be beneficial, offering more comprehensive information to the model. Additionally, the performance metrics for the single-staged UNet and the Autoencoder models are consistently lower compared to the Cascaded UNet, highlighting the effectiveness of the more complicated cascaded design in improving overall model performance. The visual comparison of the outputs from different models in figure 4 further supports these findings. The Cascaded UNet models, utilizing both CT and DW as inputs, successfully replicated the accurate LET distribution across areas of high anatomical heterogeneity. In comparison, the other models produced more homogeneous LET inferences, failing to account for tissue heterogeneity.
Figure 5 displays LETVHs and DLVHs for three different volumes, demonstrating a close alignment between the predicted LET and the Monte Carlo-generated ground truth across all isovolume curves. This alignment is particularly notable in the high-dose areas, observable at the right end of the curves. Such a correlation underscores the clinical significance of our model, affirming its accuracy in predicting LET within the entire dose area as well as in the OARs, which are of great concern in proton therapy.
In the Bland-Altman plot between the MC and DL models (figure 6(a)), the bias of the difference is close to 0, suggesting that there is little systematic difference between the two methods on average. No discernible trend is evident across the low and high dose regions. The model's efficacy is further substantiated by the histogram of relative LETd differences in the non-zero LETd region (figure 6(b)). As shown in figure 6(b), this histogram is characterized by a peak around zero and a relatively confined distribution, which provides additional evidence of the model's accuracy relative to the MC simulation. Moreover, in the prostate site, our model outperforms the analytical FoCa model, as demonstrated by a more pronounced peak near zero and a narrower distribution, indicating potentially enhanced accuracy. When compared voxel by voxel, a small portion of voxels still present relatively large deviations (>10%) from the MC-based labels, which suggests the potential for further improvement in the modeling. However, as illustrated in the DLVH plots in figure 5, the deviating voxels are predominantly observed towards the middle to left end of the curves, corresponding to the low dose region. These deviations are not clinically significant within the context of this study. Of note, both figures 6(b) and 4 present a direct voxel-to-voxel comparison, which is an even stricter assessment than the gamma analysis.
We consider a Gamma passing rate above 97% for the product of LETd and DW to be clinically acceptable, especially in the high-dose, high-LETd region, although no gamma criteria have yet been investigated or established clinically for LET, which has significantly larger gradients than the usual DW distribution. What sets our proposed model apart from other artificial intelligence models is its ability to compute LETd at the field level with input dynamically correlated with beam data, potentially providing superior reliability and improved generalizability. To solve the degeneracy problem of similar doses corresponding to different LETd distributions, our workflow takes the whole beam path into consideration, rather than just the target or organ-at-risk dose distribution, to guarantee unique dose-to-LETd correlations. Although all plans in this study were SFO, we anticipate that our DL-based model can be extended to multi-field optimization (MFO) plans, given that the delivery parameters are fixed post-optimization. We are currently developing DL-based models specifically for LETd computation in MFO plans, and a future comparison of inference outcomes between the models would be interesting.
The developed DL-based model shows the potential to be utilized in direct LET/RBE optimization through integration into commercial treatment planning systems without significant alterations to existing algorithms, owing to its relatively low computational cost and its lack of need for specific expertise or expensive GPU clusters. This accessibility could help enhance patient care at clinics of all capacities. However, to handle complicated clinical scenarios and diverse anatomical heterogeneity, more investigations are warranted to extend its generalizability. Certain limitations of this study must be acknowledged. First, the training and testing of the model are currently limited to data from prostate cancer treatment with SBRT. Enhancing the model's generalizability beyond this specific dataset will necessitate fine-tuning and evaluation across various anatomical sites. Second, the images used in this study have a relatively low resolution of 5 mm per voxel, reflecting the spatial tolerance commonly adopted in proton clinics. Employing higher-resolution images would require a larger parameter space or patch stitching, potentially compromising inference speed. It is also important to note that the reported inference speed depends on the specific GPU model used. Third, our study focuses on a single DL architecture, without exploring alternative options such as vision transformers. Exploring more recent DL architectures might yield improved performance. Furthermore, the comparison of the DL model's outcomes with those from an analytical method is based on a single treatment site and a single case calculated by the referenced analytical method. While providing a straightforward assessment of the DL model's performance, this comparison cannot be deemed a comprehensive evaluation of either model. Further systematic studies are needed to ensure a more equitable comparison. Future work will focus on enhancing performance in high-gradient regions,
possibly through the deployment of a deeper network architecture at higher resolution. We are also extending the model to other treatment sites, including the brain, breast, and head and neck regions, to develop a treatment-site-independent model. This will be a significant step towards the broader application and utility of the model.

Conclusion
This study has successfully shown proof of concept with a Cascaded 3D UNet model that enables precise LETd calculation based on patient anatomy and dose-to-water distribution maps. Our model is capable of calculating LETd distributions in close agreement with the gold-standard MC simulations while keeping the computation time around 0.14 s. This combination of accuracy and efficiency makes it promising for clinical applications and for improving the quality of radiation plans.

Figure 1. Workflow diagram for the DL-based LETd calculation process.

Figure 2. Diagram of the 3D Cascaded UNet model. The number of channels is shown above each block, while the dimensions of the input/output are next to the corresponding images.
Figures 3(a) and (b) show a visual comparison of the MC-calculated ground truth (Label) versus the DL-calculated LETd (Inference) for one beam configuration of a patient in the test dataset. Figure 3(c) provides a one-dimensional comparison of the LETd along the beam path for both the label and the inference; dashed lines in figures 3(a) and (b) indicate the beam path under analysis. Figure 3(d) displays the Gamma index map for the voxels above 8 MeV cm−1 (the 10% low LETd cutoff), calculated between the label and the model inference, each multiplied by the DW map, which serves as a quantitative evaluation of the model performance. The color variations within the gamma index map indicate the level of correspondence: higher values suggest greater discrepancies, whereas values below 1 indicate an acceptable level of agreement.

Figure 3. Comparison between the LETd label and model inference. (a) Ground truth calculated by the MC simulator. (b) Calculation by the DL model. (c) 1D comparison along the beam path. (d) Gamma index map in the region where LETd > 8 MeV cm−1.
In figure 6(a), the differences between the MC and DL models are plotted against the mean of the two. Each point on this plot corresponds to a voxel from a representative LETd distribution map within the testing dataset. The central black dashed line denotes the mean difference of −0.61 MeV cm−1. The surrounding gray dashed lines indicate the limits of agreement (95% confidence interval), calculated as 1.96 times the standard deviation, resulting in 5.1 MeV cm−1. Concurrently, figure 6(b) presents histograms contrasting the analytical FoCa (ForwardCalculation) model (Sánchez-Parcerisa et al 2014) with our DL model; its X-axis represents the relative difference between the predicted and MC-calculated LETd values.

Figure 5. Dose-LET volume histograms (DLVH), LET volume histograms (LETVH), and dose volume histograms (DVH) for a single test patient: (a) within the DW > 2 Gy region, (b) within the rectum contour, (c) within the bladder contour. The DLv% lines in the DLVHs are iso-volume lines representing the percentage volume of a structure that receives a dose of at least x Gy and an LET of at least y MeV cm−1. In both the LETVHs and DLVHs, solid lines are the MC-calculated ground truth while dashed lines are inferences calculated by the Cascaded UNet model.

Figure 6. Comparison of DL model performance against the MC and analytical FoCa models. (a) Bland-Altman plot between the MC and DL models in the non-zero LETd region. (b) Histograms of the relative difference in LETd for the DL model within the non-zero voxel region, in comparison with the FoCa analytical model. Shown for the DL model are the histogram of the average LETd inference as well as the field with the lowest Gamma passing rate (92.3%).

Table 1. Evaluation results on the testing dataset for the Cascaded UNet, single-staged UNet, and Autoencoder models, with different input channels and loss functions. The evaluated metrics include MAE, Gamma, SSIM, NCC, and PSNR.