Robust deep learning-based forward dose calculations for VMAT on the 1.5T MR-linac

In this work we present a framework for robust deep learning-based VMAT forward dose calculations for the 1.5T MR-linac. A convolutional neural network was trained on the dose of individual multi-leaf-collimator VMAT segments and was used to predict the dose per segment for a set of MR-linac-deliverable VMAT test plans. The training set consisted of prostate, rectal, lung and esophageal tumour data. All patients were previously treated in our clinic with VMAT on a conventional linac. The clinical data were converted to an MR-linac environment prior to training. During training, gantry and collimator angles were randomized for each training sample, and the multi-leaf-collimator shapes were rigidly shifted to ensure robust learning. A Monte Carlo dose engine was used to generate the ground truth data at 1% statistical uncertainty per control point. For a set of 17 MR-linac-deliverable VMAT test plans, generated on a research treatment planning system, our method predicted highly accurate dose distributions, with a 3%/3 mm gamma pass rate of 99.7% ± 0.5% for the full plan predictions. Additional evaluation on previously unseen IMRT patients passed all clinical requirements, with a 3%/3 mm gamma pass rate of 99.0% ± 0.6%. The overall performance of our method makes it a promising plan validation solution for IMRT and VMAT workflows, robust to tumour anatomies and tissue density variations.


Introduction
Accurate dose calculations are the cornerstone of modern radiation therapy treatment workflows. During the treatment planning phase of a radiotherapy plan optimization cycle, parameters are iteratively adjusted and the dose is recalculated until a clinically acceptable solution is reached. The resulting plan should maximize tumour coverage while optimally sparing the adjacent healthy tissue and organs-at-risk (OARs).
Multiple dose calculation engines have been used in treatment planning systems (TPS), with Monte Carlo (MC) (Rogers 2006) based ones offering the best accuracy by modelling the full particle transport (Krieger and Sauer 2005). Thanks to recent computer hardware developments, GPU-based MC implementations (e.g. GPUMCD) (Hissoiny et al 2011, Jia et al 2011) can offer fast dose calculations with high dosimetric accuracy, suitable for both offline and online radiotherapy workflows. Recently, hybrid radiotherapy systems that combine magnetic resonance imaging (MRI) with a linear accelerator have been developed and are becoming widely available. These setups, known as MR-linacs, consist of an MRI scanner surrounded by a rotating gantry and can deliver radiation while providing real-time images with high soft-tissue contrast. The clinical introduction of MR-linacs (Mutic and Dempsey 2014, Raaymakers et al 2017) has enabled their use in adaptive radiation therapy treatment schemes, where daily anatomical changes can be incorporated in the treatment planning process (Winkel et al 2019) and a new plan can be generated while the patient lies on the treatment couch. The inclusion of this intrafraction information could significantly reduce OAR dose and hence radiation-related toxicity (Christiansen et al 2022).
The most commonly used radiation delivery techniques are Intensity-Modulated Radiation Therapy (IMRT) (Bortfeld et al 1994) and Volumetric-Modulated Arc Therapy (VMAT) (Otto 2007, Shaffer et al 2010).
While IMRT uses a step-and-shoot approach, VMAT delivers radiation continuously while the gantry rotates around the patient, achieving target coverage similar to IMRT (Yu and Tang 2011) and significantly reducing the time patients have to lie on the treatment couch (Palma et al 2010). However, no commercial solution for VMAT delivery on an MR-linac exists thus far, which limits the combination of an external magnetic field with arc therapy to research setups.
Naturally, moving towards agile online adaptive treatment requires robust software tools that can meet the clinical speed and efficiency challenges (Kontaxis et al 2017). Therefore, current research focuses on automating various parts of the treatment workflow to support such applications.
Artificial intelligence is already widely explored in radiation therapy, owing to its broad applicability and the speed, consistency and robustness it can offer. In particular, various deep learning (DL)-based techniques have been explored to model the typically time-consuming dose calculations in radiation therapy workflows. Previous approaches have mainly focused on Monte Carlo denoising for fast dose calculations in radiotherapy (Bai et al 2021, Neph et al 2021), proton dose calculations (Neishabouri et al 2021), dose predictions from OAR contours and density information (Kearney et al 2018, Mashayekhi et al 2022, Qilin et al 2022) and probabilistic dose predictions (Nilsson et al 2021).
While DL-based dose modelling has been explored using a variety of methods, not all of these methods encode particle-transport physics into the network inputs. Our research group has developed DeepDose, a DL framework that closely resembles a conventional dose engine: its intuitive, physics-based inputs describe different characteristics of the photon transport and thus allow the network to correlate multi-leaf-collimator (MLC) segments with 3D dose deposition. DeepDose was initially explored using prostate data from patients treated on a conventional linear accelerator (Kontaxis et al 2020) and was subsequently extended to abdominal tumours for a 1.5T MRI radiotherapy system (Tsekas et al 2021), where it successfully modelled the electron-return-effect (ERE) caused by the external magnetic field. Nonetheless, that model was unable to generalize to more tumour sites and could not be used to predict VMAT dose distributions.
In the present work we extend the DeepDose framework to VMAT dose calculations. To this end, we introduce VMAT data in our training set by modelling the continuous gantry motion using static gantry positions (Otto 2007). We also extend our method to more tumour sites, including lung cases, which are of particular interest due to their tissue inhomogeneity: the effects of the transverse magnetic field at internal air-tissue interfaces must be included in the calculations. A generalized training scheme is used to prevent potential biases towards specific beam-angle configurations and anatomies. The proposed DL framework can perform robust dose calculations for both IMRT and VMAT in a 1.5T MR-linac environment.

Materials and methods
Data preprocessing
A total of 124 patients (42 prostate, 26 rectal, 34 lung and 22 esophageal) were used, all previously treated in our clinic with VMAT on a conventional linac. Since no VMAT setup exists for the Elekta MR-linac, in this work we converted the data to the MR-linac space by approximating the dynamic gantry rotation with static gantry positions. This was achieved in two steps: first, partial arcs were approximated with static segments; subsequently, the VMAT MLC apertures (MLC_vmat) were estimated using the physical dimensions of the MR-linac MLC (MLC_mrl).
A partial VMAT arc is defined by two consecutive control points that typically cover an angle of a few degrees. To approximate such an arc, half of the monitor units (MU) to be delivered were assigned to the starting control point and the rest to the ending control point. Due to the differences between the Synergy MLC_vmat (Elekta AB, Stockholm, Sweden) and the Unity MLC_mrl (Elekta AB, Stockholm, Sweden) in leaf width (5 mm versus 7.15 mm) and MLC dimensions (40 cm × 40 cm versus 22 cm × 57 cm) at isocenter, the leaves were adjusted as follows. First, a concentric rectangle corresponding to the new MLC dimensions was used to approximate the reference MLC shape. Then, the new leaf positions were calculated by averaging the left and right leaf positions, respectively, of the intersecting original leaves. Finally, the beam energy was adjusted to match the MR-linac configuration (7 MV), and the new plans were validated against the original ones using gamma analysis.
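The two conversion steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: control points carry cumulative MU (as obtained, for example, from the cumulative meterset weight of DICOM RT Plan control points multiplied by the beam MU), both MLCs are modelled as banks of equal-width leaves centred on the isocenter, and the helper names `split_arc_mu`, `leaf_edges` and `resample_mlc` are hypothetical.

```python
import numpy as np


def split_arc_mu(cumulative_mu):
    """Assign MU to static beams at each control point (hypothetical helper).

    Each partial arc between control points i and i+1 delivers
    cumulative_mu[i+1] - cumulative_mu[i]; half of it goes to the starting
    control point and half to the ending control point.
    """
    cumulative_mu = np.asarray(cumulative_mu, dtype=float)
    arc_mu = np.diff(cumulative_mu)
    cp_mu = np.zeros_like(cumulative_mu)
    cp_mu[:-1] += 0.5 * arc_mu  # starting control point of each arc
    cp_mu[1:] += 0.5 * arc_mu   # ending control point of each arc
    return cp_mu


def leaf_edges(n_leaves, width_mm):
    """Boundaries (mm) of n equal-width leaves centred on the isocenter."""
    half = 0.5 * n_leaves * width_mm
    return np.linspace(-half, half, n_leaves + 1)


def resample_mlc(left, right, src_width_mm, dst_width_mm, n_dst, half_travel_mm):
    """Map leaf-pair tip positions from a source MLC onto a destination MLC.

    Each destination leaf takes the mean left/right tip position of all source
    leaves that intersect it; destination leaves outside the source bank stay
    closed at 0 mm. Tips are then clipped to the destination field extent
    along the leaf-travel direction (the concentric rectangle).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    src = leaf_edges(len(left), src_width_mm)
    dst = leaf_edges(n_dst, dst_width_mm)
    new_left = np.zeros(n_dst)
    new_right = np.zeros(n_dst)
    for j in range(n_dst):
        # source leaves whose extent overlaps destination leaf j
        overlap = (src[1:] > dst[j]) & (src[:-1] < dst[j + 1])
        if overlap.any():
            new_left[j] = left[overlap].mean()
            new_right[j] = right[overlap].mean()
    new_left = np.clip(new_left, -half_travel_mm, half_travel_mm)
    new_right = np.clip(new_right, -half_travel_mm, half_travel_mm)
    return new_left, new_right
```

Assuming 80 leaf pairs in both MLCs (80 × 5 mm = 40 cm and 80 × 7.15 mm ≈ 57 cm), with the 22 cm side of the MR-linac field along leaf travel, a Synergy aperture would be mapped with `resample_mlc(left, right, 5.0, 7.15, 80, 110.0)`. Averaging keeps each new leaf pair ordered (left tip ≤ right tip) whenever the source pairs are ordered.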
VMAT plans are typically centred on the tumour, whereas the isocenter is fixed on the MR-linac, so that many MR-linac segments are off-centre. To compensate for the resulting under-representation of off-centre segments in the training data, we introduced randomized shifts in two perpendicular directions of the beam's-eye-view: the leaf travel direction and the direction perpendicular to leaf travel. For each training segment, random shifts were applied in both directions, while ensuring that the segment did not reach the boundaries of the MLC and that a sufficient number of patient voxels were hit by the segment. Shifts ranged from −35 to +35 mm along the leaf travel direction and from −20 to +20 leaves perpendicular to it. The acceptance threshold was 100 voxels with a voxel size of 3 × 3 × 3 mm³, corresponding to a volume of 2.7 cm³. In addition, randomized gantry and collimator angles (between 0 and 359 degrees) were assigned to each training and validation segment, making our framework robust to various anatomies and eliminating potential beam configuration biases.
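A minimal sketch of this augmentation is shown below, assuming leaf tips stored as left/right arrays per leaf pair (closed leaves at 0 mm) and a caller-supplied `count_hit_voxels` function, a hypothetical stand-in for the ray-tracing step that counts patient voxels inside the beam. The function name `augment_segment` is also illustrative. Gantry and collimator angles are drawn inside the loop because the number of hit voxels depends on them.

```python
import numpy as np


def augment_segment(left, right, half_travel_mm, count_hit_voxels, rng,
                    max_shift_mm=35.0, max_leaf_shift=20, min_voxels=100,
                    max_tries=1000):
    """Randomly rotate and rigidly shift one MLC segment (illustrative only).

    Returns (left, right, gantry_deg, collimator_deg), or None if no valid
    shift is found within max_tries.
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    n_pairs = len(left)
    open_idx = np.flatnonzero(right > left)  # leaf pairs forming the aperture
    if open_idx.size == 0:
        return None
    for _ in range(max_tries):
        gantry = int(rng.integers(0, 360))       # 0..359 degrees
        collimator = int(rng.integers(0, 360))
        dx = rng.uniform(-max_shift_mm, max_shift_mm)                # along leaf travel
        dk = int(rng.integers(-max_leaf_shift, max_leaf_shift + 1))  # across leaves
        new_idx = open_idx + dk
        new_left = left[open_idx] + dx
        new_right = right[open_idx] + dx
        # Reject shifts that push the aperture past the MLC boundaries.
        if new_idx.min() < 0 or new_idx.max() >= n_pairs:
            continue
        if new_left.min() < -half_travel_mm or new_right.max() > half_travel_mm:
            continue
        shifted_left = np.zeros(n_pairs)
        shifted_right = np.zeros(n_pairs)
        shifted_left[new_idx] = new_left
        shifted_right[new_idx] = new_right
        # Reject segments hitting fewer than min_voxels patient voxels
        # (100 voxels x 27 mm^3 = 2.7 cm^3 on a 3 mm grid).
        if count_hit_voxels(shifted_left, shifted_right, gantry, collimator) < min_voxels:
            continue
        return shifted_left, shifted_right, gantry, collimator
    return None
```

Rejection sampling keeps the shift distribution uniform over the admissible range while guaranteeing that every accepted segment respects both the MLC boundaries and the voxel threshold.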

Input data preparation
The network inputs followed the DeepDose framework (Kontaxis et al 2020): the segment mask (containing the equivalent square area value), the distance from the linac source, the distance from the central beamline, the radiological depth and the density information were generated per segment. To generate the network inputs, the MLC apertures were first rasterized from the generated plan files. Then, the segment shapes were ray-traced through the 3D patient anatomy and treatment couch using a submillimeter step. Finally, to obtain an output at clinical resolution, the inputs were exported on a 3 × 3 × 3 mm³ grid.
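Three of the geometric inputs can be illustrated with plain NumPy. The sketch below is an assumption-laden simplification, not the DeepDose ray tracer: it treats the source as a point, samples densities by nearest voxel with a fixed 0.5 mm step (sampling at step midpoints, so the final partial step is approximated), ignores the MLC mask, and places voxel (0, 0, 0) at the grid origin. The function names are hypothetical.

```python
import numpy as np


def voxel_centres(shape, spacing_mm):
    """(N, 3) voxel-centre coordinates in mm; voxel (0,0,0) spans [0, spacing)."""
    idx = np.indices(shape).reshape(3, -1).T
    return (idx + 0.5) * spacing_mm


def geometric_inputs(shape, spacing_mm, source_mm, isocentre_mm):
    """Distance to the source and to the central beam axis for every voxel."""
    p = voxel_centres(shape, spacing_mm) - np.asarray(source_mm, dtype=float)
    axis = np.asarray(isocentre_mm, dtype=float) - np.asarray(source_mm, dtype=float)
    axis /= np.linalg.norm(axis)
    d_source = np.linalg.norm(p, axis=1)
    # distance to the beam axis = norm of the component orthogonal to it
    d_axis = np.linalg.norm(p - np.outer(p @ axis, axis), axis=1)
    return d_source.reshape(shape), d_axis.reshape(shape)


def radiological_depth(density, spacing_mm, source_mm, step_mm=0.5):
    """Water-equivalent depth (mm) from a point source to every voxel centre.

    Marches along each source-to-voxel ray with a fixed sub-millimetre step,
    sampling the nearest voxel; samples outside the grid count as zero density.
    """
    density = np.asarray(density, dtype=float)
    source = np.asarray(source_mm, dtype=float)
    rays = voxel_centres(density.shape, spacing_mm) - source
    length = np.linalg.norm(rays, axis=1)
    depth = np.zeros(len(rays))
    for n in range(int(np.ceil(length.max() / step_mm))):
        t = (n + 0.5) * step_mm          # midpoint of the n-th step
        active = t < length              # rays that have not reached their voxel yet
        pts = source + rays[active] * (t / length[active])[:, None]
        ijk = np.floor(pts / spacing_mm).astype(int)
        inside = np.all((ijk >= 0) & (ijk < density.shape), axis=1)
        rho = np.zeros(len(pts))
        rho[inside] = density[tuple(ijk[inside].T)]
        depth[active] += rho * step_mm
    return depth.reshape(density.shape)
```

In a uniform water phantom the radiological depth equals the geometric path length inside the grid, and halving the density halves it, which is a convenient sanity check for any ray tracer.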
We used synthetic CT (sCT) scans as density input to the network to closely resemble the clinical MR-linac pipeline at our hospital. Following this workflow, the sCTs were generated by bulk density assignment to the delineated contours of each patient. For the ground truth dose calculations, all segments were set to 100 MU to ensure a standardized dose output, and their dose was calculated using a research version of GPUMCD (Hissoiny et al 2011) at 1% statistical uncertainty per segment. An overview of the network inputs and the corresponding target dose is presented in figure 1.
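Bulk density assignment amounts to a lookup from structure label to a single density value. The sketch below assumes the delineations have been merged into one integer label map; the function name `bulk_density_sct` is hypothetical, and the density values used in practice are clinic- and site-specific, so none are prescribed here.

```python
import numpy as np


def bulk_density_sct(label_map, density_by_label, default=1.0):
    """Build a bulk-assigned density volume from a delineation label map.

    label_map: integer array with one label per delineated structure.
    density_by_label: dict mapping label -> relative density (placeholder
    values must be supplied by the caller).
    Voxels whose label is not in the dict receive `default`.
    """
    label_map = np.asarray(label_map)
    sct = np.full(label_map.shape, default, dtype=float)
    for label, rho in density_by_label.items():
        sct[label_map == label] = rho
    return sct
```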

Evaluation on VMAT_mrl plans
We tested our network on clinically deliverable VMAT_mrl plans, generated with a research version of the Monaco TPS (Elekta AB, Stockholm, Sweden) using an MLC_mrl configuration and including the 1.5T external magnetic field. All test patients had previously been treated in our clinic with VMAT on a conventional linac. The actual clinical prescriptions for each patient, which differ per tumour site and stage, were used to create the VMAT_mrl plans. A total of 4 prostate, 4 rectal, 7 lung and 2 esophageal patient plans were generated, with prescriptions of 66 Gy (30 fractions of 2.2 Gy), 25 Gy (5 fractions of 5 Gy), 48 Gy (4 fractions of 12 Gy) and 20 Gy (5 fractions of 4 Gy), respectively.
After generation, the VMAT_mrl plans were approximated using the same data preparation pipeline that was applied to the training and validation data, in order to generate the network inputs. The prepared inputs were then used to predict the dose distributions for the test patients.

Dataset
After preprocessing and data preparation, the total of 18 688 patient segments (5029 prostate, 5226 rectal, 4858 lung and 3575 esophageal) were split into training and validation sets, resulting in 16 830 training and 1858 validation segments. For inference, a total of 1437 patient segments (432 prostate, 506 rectal, 262 lung and 237 esophageal) were used. A detailed overview of the complete data processing workflow for both training and test purposes is presented in figure 2.

Network
The network used for the dose predictions was a 3D UNet, based on the original architecture (Özgün et al 2016). We adopted the 3D UNet implementation of the NiftyNet framework (Gibson et al 2018), slightly modified to meet our input/output requirements.
All input grid dimensions were set to 216 × 192 × 120 voxels to accommodate the large segments that arise in VMAT planning. The grid spacing was kept at the clinical 3 × 3 × 3 mm³ and the batch size was 32. The other hyperparameters were a learning rate of 10⁻⁴, the Adam optimizer and a root-mean-squared-error loss function. Training was patch-based, with patches of 32 × 32 × 32 voxels extracted from the training inputs without zero-padding, while whole-volume inference was performed at test time.
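The patch sampling and loss can be sketched in NumPy as below. This assumes the five inputs are stacked channel-first into one array aligned with the target dose; it does not reproduce NiftyNet's own samplers, and the function names are hypothetical. Drawing the patch corner so that the patch lies fully inside the volume is what makes zero-padding unnecessary.

```python
import numpy as np


def random_patch(inputs, target, patch=32, rng=None):
    """Extract one aligned random patch from the input channels and target dose.

    inputs: (C, X, Y, Z) network inputs; target: (X, Y, Z) dose.
    The corner is drawn so the patch lies fully inside the volume (no padding).
    """
    rng = np.random.default_rng() if rng is None else rng
    corner = [int(rng.integers(0, s - patch + 1)) for s in target.shape]
    sl = tuple(slice(c, c + patch) for c in corner)
    return inputs[(slice(None),) + sl], target[sl]


def rmse(pred, target):
    """Root-mean-squared-error loss."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))
```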
We trained our network for a total of 125 000 iterations (approximately 3 days) and stopped early to avoid potential overfitting. Given our batch size, the network processed 4 × 10⁶ image patches, corresponding to approximately 238 epochs. Training was performed on a workstation with dual Intel® Xeon® Gold 6132 CPUs, 128 GB RAM and an NVIDIA® Quadro GP100 card. The GPUMCD ground truth dose calculations were also generated on a single NVIDIA® Quadro GP100 card.
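The reported counts are mutually consistent if one epoch is taken as one patch per training segment (an assumption about how epochs were counted), using the batch size of 32 and the 16 830 training segments:

```latex
N_{\text{patches}} = 125\,000 \times 32 = 4.0 \times 10^{6},
\qquad
N_{\text{epochs}} \approx \frac{4.0 \times 10^{6}}{16\,830} \approx 237.7 \approx 238
```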

Evaluation on IMRT plans
To further assess the generality of our method, we tested our network on a subset of the IMRT test dataset used in the previous DeepDose version (Tsekas et al 2021). These data included abdominal tumours (prostate, rectal and oligometastatic cases) treated on a 1.5T MR-linac. We expected our current method to perform well on these cases, since VMAT segments are typically more complex than IMRT segments and the same magnetic field was included in the dose calculations of both datasets. A network trained on VMAT cases could therefore also yield accurate dose predictions for IMRT segments.

Robustness evaluation on out-of-training VMAT plans
To assess the robustness of our network and its potential generalizability to more tumour sites, we performed a test on two patient cases that were not included in the training set: one brain and one pancreatic VMAT patient. As the model was trained on a highly heterogeneous dataset, evaluation on these out-of-training examples could provide useful insights into its potential range of application.

Analysis
To evaluate our method, the dose differences between the predicted and ground truth dose distributions were calculated for each test segment, considering the voxels that received 10%–100% of the maximum dose. Additionally, gamma analysis of the predicted segments was performed for the 1%/1 mm, 2%/2 mm and 3%/3 mm criteria.
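For reference, a simplified global gamma pass-rate computation is sketched below. It is not the analysis tool used in this work: it assumes both doses lie on the same isotropic grid, normalizes the dose criterion to the reference maximum, searches only voxel centres of the evaluated dose (no sub-voxel interpolation, so it is coarse on a 3 mm grid), and uses a 10% low-dose cut-off as an illustrative default. Restricting the search to the distance-to-agreement radius is sufficient for a pass/fail decision, because any point farther away already contributes a distance term above 1.

```python
from itertools import product

import numpy as np


def gamma_pass_rate(ref, evl, spacing_mm, dd_percent, dta_mm, cutoff_percent=10.0):
    """Global gamma pass rate (%) of evl against ref on the same isotropic grid."""
    ref = np.asarray(ref, dtype=float)
    evl = np.asarray(evl, dtype=float)
    dd = dd_percent / 100.0 * ref.max()          # global dose-difference criterion
    r = int(np.floor(dta_mm / spacing_mm))       # search radius in voxels
    padded = np.pad(evl, r, mode="constant", constant_values=np.nan)
    gamma_sq = np.full(ref.shape, np.inf)
    nx, ny, nz = ref.shape
    for di, dj, dk in product(range(-r, r + 1), repeat=3):
        dist = spacing_mm * np.sqrt(di ** 2 + dj ** 2 + dk ** 2)
        if dist > dta_mm:
            continue  # farther points can never yield gamma <= 1
        shifted = padded[r + di:r + di + nx, r + dj:r + dj + ny, r + dk:r + dk + nz]
        term = (dist / dta_mm) ** 2 + ((shifted - ref) / dd) ** 2
        gamma_sq = np.fmin(gamma_sq, term)       # NaN (outside the grid) is ignored
    mask = ref >= cutoff_percent / 100.0 * ref.max()
    return 100.0 * np.mean(gamma_sq[mask] <= 1.0)
```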
The total predicted plan dose was obtained by summing all individual predicted segment doses, each weighted by its clinical MU value. Dose differences were then reported for each predicted plan in the test set for the voxels lying within 50%–100% of the D1cc (the minimum dose received by the hottest 1 cm³), and gamma analysis was performed. Gamma pass rates were reported for the entire test dataset as well as per tumour site. Target and OAR coverage were assessed using dose-volume histograms (DVHs).
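The plan-level quantities can be sketched as follows, assuming each segment dose was computed at the reference 100 MU used for the ground truth, so that it is rescaled by MU/100 before summation. D1cc is approximated on the voxel grid by rounding 1 cm³ up to whole voxels. The function names are hypothetical.

```python
import numpy as np


def plan_dose(segment_doses, segment_mu, reference_mu=100.0):
    """Sum per-segment doses computed at reference_mu, each rescaled to its clinical MU."""
    total = np.zeros(np.shape(segment_doses[0]), dtype=float)
    for dose, mu in zip(segment_doses, segment_mu):
        total += np.asarray(dose, dtype=float) * (mu / reference_mu)
    return total


def d1cc(dose, voxel_volume_cc):
    """Minimum dose in the hottest 1 cm^3, rounded up to whole voxels."""
    n_voxels = int(np.ceil(1.0 / voxel_volume_cc))
    return np.sort(np.ravel(dose))[::-1][n_voxels - 1]


def cumulative_dvh(dose, mask, dose_levels):
    """Percentage of the structure volume receiving at least each dose level."""
    d = np.asarray(dose)[mask]
    return np.array([100.0 * np.mean(d >= level) for level in dose_levels])
```

On the 3 × 3 × 3 mm³ grid each voxel is 0.027 cm³, so `d1cc(dose, 0.027)` uses the 38 hottest voxels.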
Furthermore, the accuracy of the ERE modelling was assessed qualitatively, by comparing ground truth and predicted central segment profiles, and quantitatively. Because the ERE extends over several millimetres, we used a uniform 1 cm expansion around the ipsilateral lung contour for all lung test cases. We then reported the voxelwise relative dose differences for all predicted voxels that belonged to the expanded contours and lay within 50%–100% of the D1cc.
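A sketch of this region-based analysis follows. Several details are assumptions not stated in the text: the 1 cm expansion is implemented as a dilation with a spherical neighbourhood on the voxel grid (which can only approximate 1 cm on a 3 mm grid), relative differences are normalized to the ground-truth dose in each voxel, and "50%–100% of the D1cc" is applied literally to the ground-truth dose. The function names are hypothetical.

```python
from itertools import product

import numpy as np


def expand_mask(mask, margin_mm, spacing_mm):
    """Dilate a binary mask by a spherical margin on an isotropic voxel grid."""
    mask = np.asarray(mask, dtype=bool)
    r = int(np.floor(margin_mm / spacing_mm))
    padded = np.pad(mask, r)  # pads with False
    out = np.zeros_like(mask)
    nx, ny, nz = mask.shape
    for di, dj, dk in product(range(-r, r + 1), repeat=3):
        if spacing_mm ** 2 * (di * di + dj * dj + dk * dk) > margin_mm ** 2:
            continue  # offset lies outside the spherical margin
        out |= padded[r + di:r + di + nx, r + dj:r + dj + ny, r + dk:r + dk + nz]
    return out


def ere_region_stats(pred, gt, lung_mask, d1cc_gt, margin_mm=10.0, spacing_mm=3.0):
    """Mean and SD of voxelwise relative dose differences (%) near the lung."""
    region = expand_mask(lung_mask, margin_mm, spacing_mm)
    region &= (gt >= 0.5 * d1cc_gt) & (gt <= d1cc_gt)
    rel = 100.0 * (pred[region] - gt[region]) / gt[region]
    return rel.mean(), rel.std()
```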

Dose calculation results
Overall, very good agreement was observed between the ground truth and the predicted dose distributions for both the per-segment and per-plan evaluation metrics. Different organ structures and regions of different tissue density did not noticeably affect the accuracy of the network's dose predictions. The average dose prediction time per segment was 1.5 s.
For the total dose distributions of the VMAT_mrl patients, highly accurate agreement was found, with an average dose difference of 0.2% ± 0.7% (0.2 ± 0.3 Gy). The average gamma pass rates were 83.4% ± 12.7%, 96.8% ± 4.4% and 99.7% ± 0.5% for the 1%/1 mm, 2%/2 mm and 3%/3 mm criteria, respectively. Table 1 summarizes the gamma pass rates per tumour site for the lung, rectal, prostate and esophageal cases in the test set for the 1%/1 mm, 2%/2 mm and 3%/3 mm criteria. Figure 3 presents a central transverse slice of a predicted dose distribution for a VMAT prostate case and the corresponding DVH compared with the ground truth.
We additionally evaluated our method on IMRT segments and compared it with our previous network version, obtaining comparable results. For the 15 IMRT test patients (5 oligometastatic, 5 prostate and 5 rectal), the average 3%/3 mm gamma pass rate per patient plan was 99.0% ± 0.6%, while the previous DeepDose network, trained on abdominal IMRT segments, achieved 99.3% ± 0.7% for the same analysis.
Finally, our network was tested on two previously unseen VMAT tumour sites, one brain and one pancreatic patient. Our method passed all clinical requirements for both cases, with 3%/3 mm gamma pass rates of 97.3% and 99.8%, respectively.

ERE modelling
Our network predictions effectively captured the ERE caused by the magnetic field in the ground truth dose distributions. Figure 4(a) presents a predicted lung VMAT segment, and figure 4(b) compares the corresponding predicted central dose profile with the ground truth. The ERE is evident close to air-tissue interfaces, where sharp dose increases occur, for example at the lung tissue boundaries and where the beam exits the patient body.
The voxelwise relative dose differences of the voxels within the expanded ipsilateral lung contours of all lung test patients (7 cases) averaged 0.3% ± 0.2%.

Discussion
In this work, we presented a robust DL-based dose engine for VMAT dose calculations on the MR-linac. By training a convolutional neural network on a large set of anatomical tumour cases, we demonstrated highly accurate dose predictions for a test set of MR-linac-deliverable VMAT patient plans, while dose variations caused by the 1.5T external magnetic field were accurately modelled. Accurate dose calculations were also demonstrated for a set of previously unseen IMRT tumour cases. The robustness of our method largely lies in training the network on multi-site data, thus covering a broad range of anatomies. In particular, we included tumour cases in the thoracic (lung and esophageal) and pelvic (prostate and rectal) regions, containing soft tissue, bony structures and air cavities. In addition, VMAT treatment planning typically results in more numerous and more complex segments than IMRT. Training our network on thousands of these diverse segment shapes therefore broadened the range of operation of our method, while the inclusion of a 1.5T magnetic field simulates a hybrid MRI radiotherapy environment in which arc therapy can be supported. The trained model predicted highly accurate dose distributions for a set of VMAT test plans generated using a research TPS. Moreover, the performance of our method was successfully benchmarked against the previous DeepDose model on a set of IMRT segments.
In addition, our DL model demonstrated prediction robustness across different tumour sites and regions with density variations. High dosimetric agreement was observed for all test patients for the 3%/3 mm and 2%/2 mm criteria, including the challenging lung cases, with a few exceptions for the 1%/1 mm criterion, mainly in some rectal and esophageal cases. Nonetheless, we believe that no systematic error exists in the model predictions for these tumour sites; further analysis of the prediction accuracy using the stricter 1%/1 mm criterion is left to future work. Moreover, the quantitative analysis of the ERE modelling revealed excellent agreement between the predicted and target dose distributions within the expanded contours of the ipsilateral lungs, where the ERE is most evident.
We furthermore evaluated our trained model on two previously unseen VMAT tumour sites, using one brain and one pancreatic patient. The predicted dose distributions of both out-of-training examples passed our clinical requirements. While the pancreatic case was expected to pass the test, as it lies within the abdominal region, which is well represented in the training set, the brain case was expected to perform worse. Nonetheless, the results on this small number of cases are presented only as an indication of the robustness of our approach. Further network training on new, out-of-training tumour sites will be needed to expand the range of operation of our method; however, we believe that our approach is easily generalizable to more tumour sites.
The feasibility of accurate dose calculations using the DeepDose method has been shown for different clinical environments (with and without an external magnetic field), tumour sites (prostate, rectum, lung, esophagus, oligometastases, pancreas, cervix), delivery modalities (IMRT and VMAT) and beam energies (7 MV, 10 MV). We therefore believe that we have provided sufficient evidence of the robustness of our framework and of its potential applicability to other radiotherapy systems, such as a 0.35T MR-linac (Mutic and Dempsey 2014), although the network will need to be retrained for different linac data and magnetic fields.
While the generality of our method was demonstrated, issues concerning its speed remain to be addressed. VMAT plans typically consist of hundreds of control points, each of which has to be represented by one segment in our approach and requires a full input preparation. In our current implementation, the input grid size was increased compared with the previous publication (Tsekas et al 2021) to fit some large VMAT segments, and the inference time grew accordingly. A full 3D segment dose prediction using our framework takes on average 1.5 s, while generating the network inputs takes on average 5.5 s per segment. By contrast, less than one second is needed for the forward dose calculation of a single segment using the gold-standard GPUMCD (3% statistical uncertainty per segment). We therefore acknowledge that our method is not yet ready for clinical introduction in an online workflow.
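As a rough illustration of this gap, consider a hypothetical plan of N = 200 segments and assume that input generation and inference run sequentially without parallelism, using the average per-segment timings above:

```latex
t_{\text{plan}} \approx N\,(t_{\text{input}} + t_{\text{inference}})
= 200 \times (5.5\,\text{s} + 1.5\,\text{s}) = 1400\,\text{s} \approx 23\,\text{min},
\qquad
t_{\text{GPUMCD}} < 200 \times 1\,\text{s} = 200\,\text{s}
```

Input generation accounts for most of this time, which is why the data preparation step is a natural first target for acceleration.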
Future work will focus on accelerating our DL-based dose calculations through both software- and hardware-related upgrades. Potential software improvements include the use of different DL frameworks or tensor representations. Further optimization could be achieved by reducing the number of inputs required by our model. While the inputs of the DeepDose framework have already been explored (Kontaxis et al 2020), with all five inputs yielding the best network performance, a more thorough investigation of different input combinations is needed. Also, generating masked ray-traced inputs, instead of the full 3D grids currently used, would drastically reduce our data preparation time. Ideally, future implementations will perform full-plan 3D dose predictions, thus reducing the time currently needed for whole-plan calculations, which rely on summing individual segment dose distributions. Additional network improvements could include full density information instead of synthetic CT scans. However, using full CTs as network inputs is expected to require longer training due to the voxel-by-voxel density variations. Finally, regarding hardware, upgrading the GPU is expected to improve the inference speed of our network.
We demonstrated that our standalone DL-based dose engine has the potential to be used for clinical-grade dose calculations. A potential short-term application would be a secondary dose check for plan verification, since it is robust to different anatomies in both IMRT and VMAT data and speed is less critical in this setting. Nonetheless, the clinical introduction of a new framework requires extensive quality assurance (QA) and risk analysis. We plan to study the QA workflow of our method in future work.

Conclusion
We proposed a generic DL-based dose engine that functions in a hybrid MRI-radiotherapy environment and performs highly accurate forward dose calculations for a variety of IMRT and VMAT plans. Network robustness was achieved by randomizing gantry and collimator angles as well as MLC segment positions during training. The trained model generated accurate 3D dose predictions for a set of test patients, passing all our clinical requirements. We believe that our approach is robust and, once accelerated, could be used in online workflows for plan QA purposes across a variety of tumour sites and delivery modalities.