A motion model-guided 4D dose reconstruction for pencil beam scanned proton therapy

Objective. 4D dose reconstruction in proton therapy with pencil beam scanning (PBS) typically relies on a single pre-treatment 4DCT (p4DCT). However, breathing motion during the fractionated treatment can vary considerably in both amplitude and frequency. We present a novel 4D dose reconstruction method combining delivery log files with patient-specific motion models, to account for the dosimetric effect of intra- and inter-fractional breathing variability. Approach. Correlation between an external breathing surrogate and anatomical deformations of the p4DCT is established using principal component analysis. Using motion trajectories of a surface marker acquired during the dose delivery by an optical tracking system, deformable motion fields are retrospectively reconstructed and used to generate time-resolved synthetic 4DCTs (‘5DCTs’) by warping a reference CT. For three abdominal/thoracic patients, treated with respiratory gating and rescanning, example fraction doses were reconstructed using the resulting 5DCTs and delivery log files. The motion model was validated beforehand using leave-one-out cross-validation (LOOCV) with subsequent 4D dose evaluations. Moreover, besides fractional motion, fractional anatomical changes were incorporated as proof of concept. Main results. For motion model validation, the comparison of 4D dose distributions for the original 4DCT and predicted LOOCV resulted in 3%/3 mm gamma pass rates above 96.2%. Prospective gating simulations on the p4DCT can overestimate the target dose coverage V95% by up to 2.1% compared to 4D dose reconstruction based on observed surrogate trajectories. Nevertheless, for the studied clinical cases treated with respiratory-gating and rescanning, an acceptable target coverage was maintained with V95% remaining above 98.8% for all studied fractions. For these gated treatments, larger dosimetric differences occurred due to CT changes than due to breathing variations. Significance. To gain a better estimate of the delivered dose, a retrospective 4D dose reconstruction workflow based on motion data acquired during PBS proton treatments was implemented and validated, thus considering both intra- and inter-fractional motion and anatomy changes.


Introduction
Treating tumours in the thorax and abdomen regions is challenging due to the respiratory motion which leads to large tumour displacements and induced density changes. Especially for pencil beam scanned (PBS) proton therapy, the interplay between the patient's motion and the dynamic beam delivery can result in complex dose degradation with dose inhomogeneities occurring within the target (Phillips et al 1992, Bert et al 2008. Using 4D dose calculations (4DDC), the dosimetric effects of motion on the planned dose distributions have been comprehensively investigated in various simulation studies (Bert et al 2008, Seco et al 2009, Zhang et al 2012, Grassberger et al 2013, Ammazzalorso and Jelen 2014, Zou et al 2014, Batista et al 2018, Dolde et al 2019, Meijers et al 2020, Steinsberger et al 2021, Lebbink et al 2022. The effectiveness of different motion mitigation techniques, such as breath hold, beam gating, rescanning and tracking, have also been analyzed extensively (Li et al 2006, Bert and Rietzel 2007, Knopf et al 2011, Grassberger et al 2015, Zhang et al 2015, Gorgisyan et al 2017, Ishihara et al 2017, Engwall et al 2018, Dolde et al 2019, Gorgisyan et al 2019. Nevertheless, prospective and patient-specific simulations of the effects of motion are inherently limited in their predictive power, due to the complexity of the 4D problem and its dependency on a large number of largely unknown variables and motion/delivery characteristics at the time of treatment. As such, a number of authors have investigated the possibility of reconstructing the delivered dose, taking into account directly measured motion surrogates in combination with time-resolved log file (LF) data of the actual delivery.
The use of log files for reconstructing 4D doses has been previously reported by Krieger et al (2018) for the validation of their analytical 4DDC algorithm. Further, for instance, Richter et al (2013Richter et al ( , 2014 and Batista et al (2018) incorporated breathing irregularity into the 4D dose reconstruction for scanned charged particle therapy by combining motion trajectories from an Anzai belt (Anzai Medical Co., Ltd) with time-resolved logs of the beam delivery sequence. A log file-based approach has also been adopted for clinical dose reconstructions using motion extracted from the planning 4DCT or weekly repeated 4DCT, with surrogate motion variations during treatment delivery being recorded also using the Anzai system (Meijers et al 2019, 2020, Spautz et al 2022. For the 4DDC in these approaches, each pencil beam is typically assigned to the closest phase of the pre-treatment planning 4DCT (p4DCT), based on either breathing signal amplitude or phase.
Nevertheless, while significant intra-and inter-factional changes in breathing motion (Von Siebenthal et al 2007a, Den Boer et al 2021) occur, a 4DCT typically only represents a single, averaged breathing cycle, acquired at the pre-treatment stage. Intuitively then, when assigning pencil beams to the corresponding geometry (4DCT phases/bins), motion amplitudes different to those observed in the p4DCT cannot be fully taken into account. A previous publication by our group has recently shown the impact of neglecting these effects (Duetschler et al 2023).
In order to take into account motion variability during the dose delivery, one approach is to simply scale the 4DCT motion along the cranio-caudal direction according to observed breathing signals (Kraus et al 2011). Alternatively, incorporating a more accurate motion model into the 4D dose calculation or reconstruction could be an effective approach. Many studies have shown that the 3D motion can be predicted from surrogate signals using either patient-specific or population-based motion models In this work, we add to the work of Meijers et al (2019Meijers et al ( , 2020 and Spautz et al (2022). By introducing a comprehensive workflow for 4D dose reconstruction based on log files, but with full 3D motion estimates derived from patient-specific, surrogate-driven motion models, our new methodology allows for changes in the breathing amplitude to also be taken into consideration. In particular, a new motion modelling and prediction framework based on principal component analysis (PCA) and partial data reconstruction (Blanz and Vetter 2002) has been developed and validated to incorporate online acquired 3D motion irregularity of a surrogate into the 4D dose reconstruction workflow. Consequently, information from delivery log files and surrogate motion trajectories have been combined for 4D dose reconstruction at a temporal resolution close to that contained in the log files to gain a more accurate estimate of the actually delivered dose. As such, the proposed workflow addresses the following three issues: (i) A patient-specific motion modelling and prediction framework was implemented to establish the correlation between surface marker motion, as recorded using an optical tracking system (OTS, Fattori et al (2022)) and deformation vector fields (DVFs) from a pre-treatment p4DCT. The time-resolved DVFs can be reconstructed from 3D marker motion tracked by the OTS during the treatment delivery. The accuracy of motion prediction was validated both geometrically and dosimetrically.
(ii) Within this framework a motion model, based on acquired OTS marker trajectories for both intra-and inter-fractional motion variations, has been used to derive time-resolved multi-cycle 4DCTs (referred to as 5DCTs). Combined with synchronized machine delivery log files, 4D fraction dose distributions can then be retrospectively reconstructed.
(iii) The joint impact of both motion variabilities and inter-fractional patient changes has additionally been investigated by combining repeated gated CTs, acquired regularly throughout the treatment course, with the deformable motion model to perform 'fraction' specific 4D dose reconstructions.
This proposed workflow has been demonstrated on three clinical 4D cases as a first proof of principle study of the method.

Materials and methods
In section 2.1, we first introduce the motion model-guided 4D dose reconstruction workflow, followed by the OTS motion trajectories and machine log files used in this work (section 2.1.1). The validation of the motion modelling and prediction framework is explained in section 2.2. In section 2.3, we describe the comparisons of the reconstructed 4D dose distributions considering intra-and inter-fractional variability. Finally, in section 2.4 the patient data used for this proof of concept study and the treatment planning approach are presented.

The motion model-guided 4D dose reconstruction workflow
An overview of the motion model-guided 4D dose reconstruction workflow is presented in figure 1. The workflow consists of three main sequential steps: motion extraction (section 2.1.2), motion modelling framework (section 2.1.3) and log file-based 4D dose reconstruction (section 2.1.4).

Surrogate motion and delivery log files
In our current clinical 4D treatment workflow, an infrared-reflective body marker is placed on the patient's chest using anatomical landmarks prior to pre-treatment (planning) 4DCT imaging and then before the start of each fraction (Fattori et al 2022), thus avoiding the use of skin tattoos. 3D motion trajectories are acquired during the pre-treatment imaging and during the treatment at a frequency of 60 Hz. Based on the work of Fattori et al (2017), NDI Polaris Spectra optical tracking technology (Northern Digital Inc., Waterloo, CA, USA) is used. During the treatment delivery, measured spot positions and delivered monitor units are also recorded in a machine delivery log file (Meier et al 2015, Scandurra et al 2016. These delivery log files are synchronized with the OTS trajectories over a network time protocol. As part of the data consistency check, we verified that all spot times read from the delivery log files fell within the gating window recorded in the OTS log, and from this, we could ensure adequate synchronisation of the systems for the purposes of this study.

Motion extraction by deformable image registration
At the planning stage, for each patient, a p4DCT is acquired and reconstructed based on the OTS-tracked surface motion (Fattori et al 2022) into eight phases (phase-sorting). An additional prospectively gated planning CT (pCT) is acquired for each patient, using a 30% breathing amplitude window around the end exhale (EE) breathing state (Fattori et al 2022). Motion DVFs were first extracted from all phases of the p4DCT with respect to the EE phase as a reference. The deformation vectors in the spine region and outside of the body were manually set to zero to avoid introducing any nonphysical motion into the motion model. Furthermore, the OTS marker motion shown on the p4DCT was extracted from the resulting DVFs and visually checked. The EE phase was registered to the respective pCT, which was used as reference geometry for the 4DDCs. If available, repeated gated daily CTs (dCTs) were also deformably registered to the pCT.
All image registrations in this work were performed with the open-source Plastimatch 8 software, with a 3D affine registration followed by a subsequent 3D multi-resolution B-spline registration using the mean-squared error cost function metric. The quality of all DIR results was visually checked.

Motion modelling and reconstruction
Previous publications by our group proposed and validated the reconstruction of the full 3D DVFs from surrogates tracked using beam's eye view 2D x-ray imaging (Zhang et al 2013(Zhang et al , 2014. This PCA motion modelling framework was re-implemented and adapted to reconstruct DVFs from 3D surrogate motion acquired with the OTS, which is used in our clinical workflow for respiratory gating. A brief explanation of the principle of PCA modelling and reconstruction can be found in appendix A. The motion model was trained based on the DVFs from the single breathing cycle p4DCT to establish a statistical correlation between the 3D surrogate OTS marker motion and the full DVF. Prior to motion modelling, however, the OTS motion trajectories were first downsampled to 2 Hz (similar to the typical frequency of a 4DCT) and normalized on the first EE state. The resulting patient-specific model was used to reconstruct the full 3D DVFs from OTS trajectories acquired over the whole treatment delivery. Moreover, the necessary density information for 4DDC was provided by the pCT, which was warped according to the predicted 3D DVFs.
Additionally, the predicted DVFs can also be used to warp dCTs for incorporating inter-fractional CT changes. As such anatomical changes are considered, but differences between the pCT and dCT could also originate from daily breathing variations and resulting differences in the EE phase definition used for the gated imaging. For this purpose, the pCT and dCTs were first rigidly aligned based on bony anatomy and anatomical correlation was established through DIR. The predicted multiple-breathing cycle 4DCTs are referred to as p5DCTs and d5DCTs, respectively, depending on the underlying CT image (see table 1). , which relies on ray-casting (Schaffner et al 1999) and has recently been implemented into our in-house treatment planning system and has been accelerated by the use of graphics processing units (GPUs). In contrast to deforming the dose, this 4DDC approach involves deforming the dose calculation grid based on the time-dependent DVFs extracted from 4D images. The water-equivalent depth for each phase is then computed for the deformed grid points using the corresponding DVFs. Since pencil beams are typically delivered between distinct phases, with this approach DVFs can be interpolated in time to achieve a finer temporal resolution, allowing for more precise 4DDC calculations based on individual pencil beam delivery times , Duetschler et al 2023. Further details can be found in appendix B. The beam model of PSI-Gantry2 (Pedroni et al 2004, Zenklusen et al 2010, Safai et al 2012 was used for all dose calculations. The same settings as used clinically were employed and the dose calculation grid was set to be 4 mm × 4 mm (lateral) × 2.5 mm (distal) with 4 mm lateral and 2.5 mm distal spot spacing.

Log file-based 4D dose reconstruction
In the delivery log file, for each delivered pencil beam the measured parameters are documented, including lateral spot position, spot energy, delivered monitor units and spot delivery time.

Performance evaluation of the motion model
The OTS-driven motion model (Zhang et al 2021) is a central part of the described workflow, as such the process of its evaluation will briefly be described here.

Geometric accuracy of the deformable motion reconstruction
The performance of the motion model was first validated for all patient 4DCTs used in this study (see section 2.4 below) using the leave-one-out-cross-validation (LOOCV) technique, for which the PCA model was trained on seven of eight phases while always leaving out one phase for testing. The OTS motion trajectory extracted from the remaining 4DCT phase was then used as input to test the model. Additionally, as a benchmark, the model was also trained and tested on all eight 4DCT phases as the best-case scenario (referred to as BEST). The model prediction accuracy for above two scenarios was assessed by comparing the predicted DVFs to the corresponding 'ground truth' (GT) DVFs from the p4DCTs. The point-wise 3D distance between the GT and predicted 3D motion vectors was assessed and the Euclidean distance was calculated for each phase. The median (P 50% ) and the 95th percentile (P 95% ) of the prediction error within the body were compared.

Dosimetric effects of the motion reconstruction
Synthetic 4DCTs were derived from the predicted LOOCV and BEST motion, by warping the pCT with the corresponding predicted DVFs. The resulting 4DCTs were then used as input for the 4DDC and compared to the 4D dose calculated using the GT 4DCT and its DVFs. The voxel-wise dose differences for 4DDC from LOOCV or BEST compared to GT scenarios were calculated. For analysis, a volume of interest (VOI 50% ) was defined by the 50% isodose of the static planned dose distribution. The percentages of voxels in the VOI with an absolute point dose difference to the GT dose distribution larger than 10% (V dosediff>10% ) and 5% (V dosediff>5% ) were calculated respectively for each scenario. We also calculated the global gamma pass rate using an absolute dose difference of 3% and a distance to agreement of 3 mm with a lower 10% dose threshold.

4D dose reconstruction considering both intra-and inter-fractional variability
Based on the motion model-driven log file-based 4D dose reconstruction framework described above, dose distributions have been retrospectively calculated for three example patients considering their fraction-specific logged delivery information and synthetic 5DCTs based on predicted DVFs from the corresponding fractionspecific OTS motion trajectories. In this proof of principle study, we examine the dosimetric differences at each fraction, due to the intra-fractional motion variability and inter-fractional motion (and anatomy) changes within both clinical target volumes (CTV) and organs at risk (OARs) through cumulative dose volume histograms (DVHs). The dose indices of V 95% and D 5% -D 95% were evaluated for the CTV of each case. Moreover, the reconstructed 4D dose was compared to the result of a prospective 4DDC of the gating treatment, using the p4DCT with a 30% gating window around the EE phase. For this, the spot delivery times without gating were first pre-calculated based on the delivery dynamics of PSI-Gantry2 (Pedroni et al 2004) and then adjusted according to the repeated gating signal calculated from the OTS marker motion extracted from the p4DCT. The resulting 4D dose distribution represents the expected dose as evaluated at the pre-treatment stage. The different 4DDC scenarios are summarized in table 1.

Patient data and treatment planning
The pre-treatment image data, OTS motion trajectories and delivery log files of three abdominal/thoracic cancer patients (annotated as Case A, B and C) treated by PSI-Gantry2 with respiratory beam gating were used. For one example patient (Case C), repeat gated CTs, taken at each fraction during the full treatment schedule, were available and used as updated geometry to evaluate the joint effects of motion variability and anatomic variations.
The CTV of all phases within the gating window were combined as the internal target volume (ITV), which was further extended by a 5-6 mm margin to account for setup and range uncertainties in the planning target volume (PTV). The PBS treatment plan with three (Case A and B) or two fields (Case C) was optimized on the pCT and PTV (see figure 2). The patients were treated using 2-4 times volumetric rescanning based on pretreatment 4D dose calculations considering the residual motion. All three patients were treated with a 30% amplitude gating window around the EE phase using the OTS (Fattori et al 2022). A short summary of the patient and treatment plan characteristics is given in table 2.
For this study, 10 example fractions (3 each for Case A and B; 4 for Case C) were selected (see table 2) for validating the usefulness of the motion model-guided 4D dose reconstruction workflow.

Results
3.1. Performance evaluation of the motion model 3.1.1. Geometric accuracy of the deformable motion reconstruction Figure 3 shows the motion-magnitude (top) and the prediction errors of the LOOCV and BEST validation (bottom) for all phases of the planning 4DCT of each patient. The median prediction error for all voxels within the body is below 0.6 mm for both validation scenarios for Case B and Case C. For these two cases, the 95th percentile of the prediction error is between 0.6/0.7 mm and 2.8/4.2 mm for the BEST/LOOCV validation. Larger prediction errors occur for Case A (figure 3(a)), which exhibits larger motion amplitudes. In figure 3(a) a pronounced peak in the prediction error for the 63% phase, with a median prediction error of 2.6 mm and P 95% of 13.0 mm for LOOCV, can be observed. The OTS marker position of the 63% phase is very similar to the 12% phase, however, these phases show considerable differences in motion (see motion-magnitude in figure 3(a)), which leads to the large prediction error for the 63% phase.
As expected, the prediction errors of the LOOCV scenario are generally larger than the BEST scenario, where all phases were already used for the model building, but these differences are not pronounced. In figure 3 a slight increase of the LOOCV prediction error for inhalation phases (00% and 85/90% phase) can be observed, as these have to be extrapolated from the remaining phases used for model building.
When the motion model is used for irregular motion, no ground truth is available to compare the predicted dense motion fields to. As such, the two validations here act as quality assurance of the motion model for the individual cases. For each patient case, the motion model is based on a single p4DCT and the correlation established between the OTS surrogate motion and the DVF is assumed to be similar for larger motion amplitudes and we expect the motion prediction accuracy for irregular OTS motion to be similar.

Dosimetric effects of the motion reconstruction
The results of 4DDCs after warping the planning CT with the motion predicted by the LOOCV and BEST modelling scenario were compared to 4DDC of the original 4DCT. In figure 4 the dose distribution for the  4DCT and the LOOCV scenario as well as their dose differences are shown. For the worst case example (Case A), the most pronounced local dose differences are located close to the diaphragm. The percentage of voxels within the 50% isodose VOI 50% with an absolute dose difference above 5% and 10% of the prescribed dose and the 3%/ 3 mm gamma pass rates are listed in table 3. For Case A, the motion differences due to model prediction lead to an absolute dose difference >5% for about 10% of voxels within the VOI 50% . For Case B and Case C, less than 1% (4%) of the VOI 50% has an absolute dose difference >10% (5%) for both modelling scenarios. Gamma pass rates above 99% are achieved for both cases.   Figure 5(a) shows example motion trajectories of Case A, comparing the OTS marker motion (anteriorposterior direction) of the p4DCT and of one example fraction (05) recorded during the delivery. We noticed a clear difference of around 3.5 mm between the marker amplitude on the p4DCT and on average during the treatment fraction as well as a drift of the marker position over time. Moreover, considerable intra-and interfractional motion variations in both frequency and amplitude were observed for all studied motion trajectories and for all three cases (see figure 5), especially for the later fractions. The largest average difference in the OTS marker amplitude was observed for fraction 10 of Case A, with a mean difference of 5.0 mm to the p4DCT. The smallest amplitude differences compared to the p4DCT were observed for Case B and on average remain below 0.9 mm. The largest intra-fractional amplitude variations (up to 25.7 mm) occur for Case C. The resulting CTV motion amplitude from the model prediction is shown in figures 6(a) and (b). The median (P 50% ) and 95th percentile (P 95% ) motion amplitude within the CTV was extracted from the p5DCTs. The largest CTV motion amplitude occurs for Case A, where a median displacement of 8.7 mm can be observed on the p4DCT. Case A also shows the largest motion variability, with displacements larger than 100 mm in parts of the tumour for some breathing cycles. The 95th percentile CTV motion amplitude of the p4DCT is 23.8 mm, 7.9 mm and 7.0 mm for Case A, Case B and Case C, respectively.

Predicted intra-and inter-fractional motion variability
All three patients were treated with respiratory-gated beam delivery to mitigate the motion. The residual CTV motion during the gated beam delivery as estimated by the motion model is presented in figures 6(c) and  (d). The 95th percentile CTV displacements within the 30% gating windows observed in the p4DCTS for Case A, Case B and Case C, respectively, are 7.8 mm, 2.0 mm and 3.0 mm.

Log file-based 4D dose reconstruction
The results of the motion model-guided 4D dose reconstruction workflow are presented in this section. All results were obtained by warping the pCT with predicted DVFs (i.e. using p5DCTs) considering surrogate trajectories during dose delivery, thus reflecting both intra-and inter-fractional variations in the respiratory motion. In figure 7(a) the 4D dose distribution simulating gating on the p4DCT is shown for the example Case C, while figures 7(b) and (c) show the reconstructed 4D dose for two example fractions (01, 06). Differences to the simulated gating scenario on the p4DCT can be observed for both fractions (figures 7(d) and (e)), as well as slightly smaller differences between the two example fractions (figure 7(f)). Most pronounced differences can be observed cranial and caudal to the CTV, which could be explained by changes in the breathing amplitude (see figure 6).
The resulting DVHs for the CTV and nearby OARs for the four studied fractions of Case C are shown in figure 7(g). Compared to the pre-treatment simulation of gating on the p4DCT, a small decrease in the CTV coverage can be observed for fractions 01 and 23. Very small DVH differences occur in the liver and spinal cord, while the largest differences can be observed in the pancreas, stomach and spleen. In figure 8 the DVHs for Case A and Case B are plotted. A small decrease in the CTV coverage can be observed for Case A especially for fraction 05, while the CTV coverage remains very similar for all fractions for Case B. For Case A, the largest differences occur in the stomach, while for Case C the esophagus is most sensitive to motion changes. Table 4 lists some dosimetric indices for the CTV and OARs for Case C. The reconstructed 4D dose distributions using the p5DCT result in a slight degradation of the plan quality for the studied fractions. The CTV coverage (V 95% ) is reduced by 0.1%-1.0% for the studied fractions compared to the p4DCT, while an increase of up to 1.6% in D 5% -D 95% is observed. Only D 2% to the spinal cord is marginally reduced for all reconstructed fractions by 0.3%-0.6%. Figure 9 illustrates dose statistics for the CTV and selected OARs for all studied fractions of the three cases. In general, the p5DCT-based reconstructed dose distributions are slightly worse than expected by the pretreatment simulation on the p4DCT, which leads to an overestimation of the CTV dose coverage and homogeneity for all studied fractions. On average, CTV V 95% is overestimated by 0.5% (max: 2.1%) and D 5% -D 95% underestimated by 0.9% (max: 3.1%). Only small differences in D 2% spinal cord and the mean dose to the left kidney occur (maximal differences to p4DCT of -0.5% and +1.2%, respectively). The differences in the mean left kidney and stomach dose are slightly larger with average differences to the p4DCT of 1.4% and 2.8%, respectively (max: 4.9% and 7.2%).
Altogether, the reconstructed fraction doses using the p5DCT show no substantial plan degradation compared to the pre-treatment evaluations, confirming the accuracy of the gated treatment under the conditions of irregular free breathing.

Joint impact of motion variability and anatomic changes
The results for Case C in the previous section, based on warping the planning CT (p5DCTS), will now be compared to results based on the daily CT (d5DCTs), thus also including inter-fractional anatomy changes. Overlays of the pCT and the dCTs acquired prior to the treatment delivery are shown in figure 10. The reconstructed d5DCT doses for fractions 01 and 06, using the predicted motion DVFs and the fractional CT image, are shown in figure 11. Their dose difference to the pre-treatment p4DCT-based calculation (see also figure 7(a)) are also shown and are more pronounced than the differences only due to motion variability (figures 7(d)-(f)).
In figure 11(e) the DVHs for 4D doses based on the p4DCT (solid line), the fractional p5DCTs (dotted lines) or d5DCTs (dashed lines) are compared. The inter-fractional CT changes lead to a reduced dose to the CTV for all studied fractions. On the other hand, while inter-fractional variations in the breathing motion (p5DCTs, dotted lines) lead to an increased dose burden to the OARs, a dose reduction to most OARs can be observed when considering dCT changes (d5DCTs, dashed lines). Only one fraction (01) results in a dose increase to some OARs based on the d5DCT compared to the p4DCT simulation.
Dose statistics are also listed in table 4 and visualized in figure 9. Using the d5DCT, for all fractions, a reduced target coverage V 95% compared to p4DCT and compared to the p5DCT results discussed in the previous section can be observed. However, with V 95% 97.2% an acceptable target coverage is maintained for all fractions. Larger inter-fractional differences in the dose statistics of the OARs can be observed due to daily anatomy changes compared to breathing variations alone.

Discussion
The treatment planning of moving tumours typically relies on a single pre-treatment 4DCT, representing one average breathing cycle. The 4D plan quality of actually delivered PBS proton treatments could however be severely degraded by the interplay effect due to the sensitive nature of protons to both intra-and inter-fractional motion variations and anatomical changes. Surrogate motion could provide true information on the breathing motion and its variations during the treatment delivery and has previously been used for 4D dose  reconstructions based on amplitude-or phase-sorting of individual spots onto the individual 3D images of a 4DCT (Richter et al 2013, Batista et al 2018, Meijers et al 2019, 2020, Spautz et al 2022. However, these approaches still rely on a 4DCT with one single breathing cycle, which could underestimate motion effects and therefore misjudge the quality of the delivered treatment (Duetschler et al 2023). We, therefore, presented a 4D dose reconstruction workflow using multiple-breathing cycle 4D image data sets ('5DCTs') reconstructed from measured surrogate trajectories based on PCA motion modelling. For each fraction, the 5DCT consisted of ∼3000-8000 3DCTs with a temporal resolution of 2 Hz. Moreover, the fractional anatomical variations can additionally be included using daily (or weekly) CTs as reference geometries. The new implementation of the 4D dose calculation in the in-house treatment planning system provides the basis for further studies of moving tumour treatments with both prospective and retrospective 4D dose evaluation functionalities. The flexible implementation allows the use of clinical 4DCTs representing one breathing cycle, but also the study of realistic irregular breathing scenarios e.g. through the use of motion  models, as done in this work, or using 4D numerical phantoms (Segars et al 2010, Duetschler et al 2022a. Furthermore, the simulation of different motion mitigation techniques (gating, rescanning, tracking) has also been implemented. Such studies are further facilitated by the reduced 4D dose calculation times through the implementation of dedicated Aparapi GPU kernels using Java (Oracle Corporation, Redwood Shores, CA, USA). Using a local NVIDIA Quadro P2200 GPU (Nvidia Corporation, Santa Clara, CA, USA) the 4D dose distribution of complex treatment plans using a clinical 4DCT, which previously took a few minutes up to a few hours, can be calculated in under one minute. On the other hand, the motion model reconstruction for the whole treatment duration requires long computation times and a large data storage capacity. In the future, motion modelling could be integrated into the treatment planning system and only the necessary breathing states for 4D dose reconstruction calculated on the fly.
Here, we built upon the previous PCA motion modelling method by Zhang et al (2013Zhang et al ( , 2014, but also other motion modelling approaches could be utilized. One limitation of the chosen approach is, that the motion model does not include any temporal information. Thus, if two phases in the p4DCT have a very similar maker position but different internal motion, this can lead to errors in the reconstructed DVFs. This for example led to a pronounced overestimation of the motion in the model validation for one phase of Case A (peak in figure 3(a)). Alternatively, separate modelling for inhalation and exhalation phases could also overcome this issue. Instead of a single surface marker, also multiple markers could be monitored simultaneously and used for motion modelling. PCA based on different surface surrogate signals has previously been investigated by Fayad et al (2012). Opposed to motion modelling based on surface markers, surrogate signals extracted from optical tracking of the patient's surface have also been studied (McClelland et al 2011, Fayad et al 2012, McClelland et al 2013, Fassi et al 2014, Wölfelschneider et al 2017. For this study, the OTS marker motion was extracted from the p4DCT, limiting the case selection for this application, since not all cases have the marker in the image range of the p4DCT. Alternatively, the motion model could directly take the OTS trajectories acquired during the p4DCT acquisition as input for the model generation. For example, an averaged OTS trajectory during the p4DCT acquisition could be used for the training of the motion model. However, we have to remark that with both these approaches, each motion model is trained using a single p4DCT breathing cycle assuming that the relationship between the breathing surrogate and the internal motion can be extrapolated for larger amplitudes. As such, the LOOCV results showed a slight increase for extreme breathing phases due to extrapolation errors (see figure 3). The validity of this assumption should be evaluated further, for example taking advantage of longer-duration datasets from 4DMRIor numerical phantoms, such as XCAT (Segars et al 2010) or 4DCT(MRI)s (Duetschler et al 2022a), which provide ground truth information about irregular breathing.
Previous surrogate-based 4D dose reconstruction approaches (Richter et al 2013, Batista et al 2018, Meijers et al 2019, 2020, Spautz et al 2022 were based on the Anzai system, which provides a relative 1D surrogate signal in arbitrary units, while the OTS records the 3D displacement of a point on the body surface in physical units. The placement of the OTS marker is based on anatomical landmarks (Fattori et al 2022), avoiding the use of skin tattoos. Therefore, differences in the OTS marker position on the patient's body surface could result in some inter-fractional differences in the OTS surrogate motion, which in turn could lead to motion modelling prediction errors. Therefore, systemically investigating the uncertainty of prediction performance due to possible predictor location variations is also important. Alternatively, the motion modelling framework could also be used with an internal surrogate signal (e.g. using online x-ray imaging (Zhang et al 2013(Zhang et al , 2014 or diaphragm position from ultrasound (Giger et al 2020)), which is subject to fewer location variations.
For this first proof of concept study, we randomly selected only a small number of fractions as examples and selected fractions with a smooth, uninterrupted treatment delivery. A comprehensive study including more patient cases, with more fractional motion trajectories and more fractional CTs, is necessary to systematically understand the joint dosimetric impacts of both intra-and inter-fractional motion variability and anatomic changes. We believe the workflow and framework established in this paper provide a good foundation to achieve such evaluations in the future. Further, the reconstructed fraction doses could be accumulated to account for fractionation, which will smear out some daily variations.
4D dose reconstruction and accumulation after the delivery of each fraction could also be used to make decisions related to treatment plan adaptations (Meijers et al 2019, Albertini et al 2020, Meijers et al 2020. This could involve e.g. increasing the number of rescans or reducing the gating window or, especially in the case of anatomical changes, a full plan re-optimization.
The real-time tracked surrogate motion was combined with delivery log files of the PBS proton plan to derive a more accurate and realistic 4D dose reconstruction considering intra-and inter-fractional motion variability. This was used to reconstruct the doses of three patients treated with respiratory beam gating combined with rescanning. The resulting reconstructed dose distributions showed no clinically relevant loss of target dose coverage, confirming the effectiveness of the gated treatment delivery. The use of daily gated CTs for one patient showed a larger influence on the delivered dose than variations in the breathing pattern alone. It has to be mentioned, that a perfect patient positioning was assumed and rigid registration of the bony anatomy was used for the daily gated CTs, which does not necessarily represent the clinical treatments using a target-based alignment. Further, the breathing motion was limited to residual motion within the gating window, and larger dose differences due to breathing variability could occur for treatments using only rescanning as a motion mitigation technique. Future simulations with rescanning alone could give insight into the impact of both intraand inter-fraction motion and anatomy changes.

Conclusion
A framework for retrospective 4D dose reconstruction based on delivery log files and acquired surrogate motion trajectories during PBS proton treatments delivered to cancer patients with moving tumours was successfully implemented and validated. The motion modelling and 4D dose calculation framework can consider both intra-and inter-fractional motion and anatomy changes and can be used to study the effect of realistic irregular breathing for different motion mitigation techniques.
In this first proof of concept study, we have combined all the available data to give a better insight into the delivered dose. For the studied clinical cases treated with respiratory-gated PBS proton therapy, an acceptable target dose coverage was maintained for all fractions. The next step is the computation of the eigenvectors s t and eigenvalues t 2 s of the covariance matrix C, such For every time step, we now want to reconstruct motion vectors v of size 3K from lower-dimensional surrogate data r (dimension N < 3K ). In the case of a 3D OTS surrogate signal N = 3 for every time step. We use the approach by Blanz and Vetter (2002), as described below.
In other words, we are searching for a map L: K The integral depth dose ID 0,n depends on the initial beam energy and energy spectrum and the water-equivalent range WER 0 of the grid point (s, v, u). For each pencil beam a single Gaussian approximation is used with standard deviations of σ 0,u and σ 0,v .

B.2. 4D dose calculation
To perform 4D dose calculation (4DDC) involves discretizing the continuous delivery timeline  into discrete time points represented by   . Further information on this can be found in the work of Zhang et al (2019). Linear interpolation of the deformation vector fields obtained from deformable image registration of the 4D images enables the computation of displacements Δv t and Δu t perpendicular to the beam direction at each dose grid point (s, v, u). The density information required for calculating the water-equivalent range WER t (s, v, u) is provided by the closest 4DCT image. Using equation (B.2), the deforming dose grid algorithm allows for the use of an arbitrary temporal resolution Δt in the discretization of the delivery time  . A common approach is a spot-wise 4DDC using the spot delivery time t spot for each pencil beam, e.g. extracted from delivery log files as done in this work.