Virtual 4DCT generated from 4DMRI in gated particle therapy: phantom validation and application to lung cancer patients

Objective. Respiration negatively affects the outcome of a radiation therapy treatment, with potentially severe effects especially in particle therapy (PT). If compensation strategies are not applied, accuracy cannot be achieved. To support the clinical practice based on 4D computed tomography (CT), 4D magnetic resonance imaging (MRI) acquisitions can be exploited. The purpose of this study was to validate a method for virtual 4DCT generation from 4DMRI data for lung cancers on a porcine lung phantom, and to apply it to lung cancer patients in PT. Approach. Deformable image registration was used to register each respiratory phase of the 4DMRI to a reference phase. Then, a static 3DCT was registered to this reference MR image set, and the virtual 4DCT was generated by warping the registered CT according to previously obtained deformation fields. The method was validated on a physical phantom for which a ground truth 4DCT was available and tested on lung tumor patients, treated with gated PT at end-exhale, by comparing the virtual 4DCT with a re-evaluation 4DCT. The geometric and dosimetric evaluation was performed for both proton and carbon ion treatment plans. Main results. The phantom validation exhibited a geometrical accuracy within the maximum resolution of the MRI and mean dose deviations, with respect to the prescription dose, up to 3.2% for target D 95%, with a mean gamma pass rate of 98%. For patients, the virtual and re-evaluation 4DCTs showed good correspondence, with errors on target D 95% up to 2% within the gating window. For one patient, dose variations up to 10% at end-exhale were observed due to relevant inter-fraction anatomo-pathological changes that occurred between the planning and re-evaluation CTs. Significance. Results obtained on phantom data showed that the virtual 4DCT method was accurate, allowing its application on patient data for testing within a clinical scenario.


Introduction
Particle therapy (PT) with protons or carbon ions has the potential to achieve highly conformal dose distributions with optimal target coverage and improved sparing of surrounding organs at risk (OARs) compared to conventional x-rays radiotherapy, thanks to its geometrical selectivity and radiobiological effectiveness (Durante et al 2017). Thoraco-abdominal tumors such as in the lung, liver and pancreas could potentially benefit from the use of PT (Wink et al 2014, Shibuya et al 2018, Kawashiro et al 2018, Li et al 2020.
Organ motion, i.e. anatomical changes of tumor and surrounding structures in the tumor site that occur in a long and short time scale (interand intra-fraction variations, respectively), introduces geometric uncertainties into this procedure, leading to risk of geometrical misses and interplay effects (Keall et al 2006, Kubiak 2016, Mori et al 2018, Mastella et al 2021. Respiration is the main cause of organ motion in the thoraco-abdominal site and needs to be accounted for during treatment planning and delivery.
Respiratory motion is depicted by acquiring respiratory-correlated four-dimensional (4D) imaging, with 4D computed tomography (4DCT) image being the current clinical standard (Keall 2004), which is the base for the treatment planning workflow. The main limitation is that the patient's anatomy is imaged only within the time interval of the acquisition and might fail to represent the actual motion during treatment (Keall et al 2006, Riboldi et al 2012. Re-evaluation 4D-CTs acquired during the treatment course provide a valuable support to verify the impact of organ motion variations on the treatment, however at the cost of extra imaging dose to the patient (Li et al 2022, Molinelli et al 2022.
The increasing use of magnetic resonance imaging (MRI) in photon radiotherapy to guide treatment is rapidly translating also into PT, with some feasibility studies on integrating in-room MRI with a proton beam (Hoffmann et al 2020), and others demonstrating the potential of offline MRI in supporting treatment planning (Meschini et al 2022b). Specifically for moving organs, the radiation-free modality combined with fast dynamic sequences allows for long and repeated acquisitions (Jaffray 2012, Paganelli et al 2018b, Keall et al 2022 to derive respiratory-correlated 4D magnetic resonance imaging (4DMRI) accounting for respiratory motion variability (Fontana et al 2016, Stemkens et al 2018, Paganelli et al 2018b, Paganelli et al 2019.
Although deep learning-based strategies to generate synthetic CTs from MRI data are currently being investigated to derive the information required for treatment planning (Cusumano et al 2020, Boulanger et al 2021, these are not yet clinically implemented. This is mainly due to difficulties in their validation, and a limited number of studies is present in the literature for mobile organs because of anatomical complexity (Parrella et al 2023). In this context, the method originally proposed by Boye et al (2013), consisting in the generation of a virtual 4DCT via deformable image registration (DIR), can be more easily validated and has already shown its potential for proton and carbon ion therapy of thoraco-abdominal tumors (Bernatowicz et al 2016, Dolde et al 2019, Meschini et al 2020, Duetschler et al 2022, Meschini et al 2022a. In this work we therefore aimed at the first experimental validation of the virtual 4DCT approach (Meschini et al 2020) on an ex-vivo porcine lung phantom representing patient-like data close to the reality of the clinical practice. The method was evaluated for applications in carbon ion and proton therapy treatments from both geometrical and dosimetric standpoints. In addition, we applied the validated method on lung cancer patients data treated with PT to evaluate breathing motion variabilities in a clinical scenario by repeating 4DMRI acquisitions and avoiding re-evaluation 4DCTs that would deliver additional non-therapeutic dose to the patient.

Phantom experimental setup and imaging dataset
The phantom composed of ex-vivo porcine lungs mimics human breathing motion with high reproducibility for ground truth (GT) data acquisition and offers the possibility to test a clinical scenario simulating realistic patient-like data (Rabe et al 2021).
Three datasets were acquired in total (table 1) for two distinct porcine lungs (one dataset for lung 1 and two datasets for lung 2) by repeating measurements three times with different motion patterns, through a specific setup adopted by Rabe et al (2021). A gelatin-water mixture was injected into the lung at different locations. The structures, once solidified, served as surrogate target lesions (four lesions for dataset 1; three lesions for datasets 2 and 3, respectively) with volumes between 10 and 17 cm 3 , which is comparable in size to stage T1 non-small cell lung cancer lesions. The lung was positioned in the MR-compatible porcine lung phantom artiCHEST (PROdesign GmbH, Heiligkreuzsteinach, Germany) in a geometry mimicking the human thorax (Biederer et al 2003) and inflated at atmospheric pressure by establishing a vacuum in its enclosure. Subsequently, artificial breathing motion was simulated by applying air pressure with an external pump to an artificial water-filled silicone diaphragm located inferior of the lung.
To reproduce a patient's breathing patterns, the phantom's control software enables arbitrary periodical diaphragm movements with adjustable breathing frequency and amplitude. The breathing frequency was the same for the three datasets (approximately 12 cycles min −1 ), only the amplitude and baseline pressure (lowest inflation of the diaphragm, corresponding to maximum inhale) changed. The main difference between datasets 2 and 3 was the motion amplitude (larger amplitude for dataset 2) with different baseline pressure. For the motion patterns an asymmetric breathing curve was used, with a longer time spent in the 'close-to-exhale' phases than in the 'close-to-inhale' phases, similar to the breathing motion of a patient (Lujan et al 1999).
For each dataset, we acquired a 3DCT and 4DCT and a 3DMRI and respiratory-correlated 4DMRI, with the 4DCT acquisition serving as GT dataset in the geometric and dosimetric analyzes.
For CT acquisitions, the phantom was placed on the patient couch of a Toshiba Aquilion LB (Canon Medical Systems, Japan) CT scanner, used for the acquisition of planning CTs for photon radiotherapy at the LMU University Hospital. The phantom motion was paused at the mid-exhale position, and a 3DCT was acquired (inplane pixel size: 1.074 × 1.074 mm 2 ; slice thickness: 3 mm; pixels: 512 × 512; x-ray tube voltage: 120 kV). Next, the phantom breathing motion was initiated, and projection data for 4DCT reconstruction were acquired simultaneously to a respiratory surrogate signal, recorded with a load cell coupled to the external pump pressure via a membrane adapter (Anzai Medical Co., Ltd, Tokyo, Japan). The projections were retrospectively assigned to ten breathing phase bins using the vendor's phase-based sorting algorithm to reconstruct a respiratorycorrelated 4DCT dataset (in-plane resolution: 1.074 × 1.074 mm 2 ; slice thickness: 3 mm; pixels: 512 × 512; x-ray tube voltage: 120 kV) with ten breathing phases (0%-90% with a 10% step size). After CT acquisition, the phantom was moved to a MRI-guided linear accelerator (MR-Linac) while the breathing motion was paused. During transport, the vacuum pump was connected to an uninterruptible power supply to ensure continuous evacuation of the phantom's lung enclosure to avoid positional changes of the lung between the scanners.
For MRI acquisitions, the phantom was scanned with a 0.35 T MRIdian MR-Linac (ViewRay Inc., USA) in clinical operation at LMU University Hospital. The breathing phantom was scanned with the clinically employed balanced steady state free precession sequence with a frame rate of 4 Hz sequentially at all sagittal slice positions spanning the whole phantom in left-right (LR) direction (70 frames per slice position; in-plane resolution: 3.5 × 3.5 mm 2 ; slice thickness: 5 mm; TR/TE = 2.41 ms/1.09 ms; flip angle: 60°; field-of-view: 35 × 35 cm 2 ; acquisition matrix: 100 × 100; receiver bandwidth: 1000 Hz px −1 ). Respiratory-correlated 4DMRI volumes were reconstructed by retrospective sorting based on a surrogate signal derived from the sum of all pixels in a binary thresholded region of interest of the 2D frames which included parts of the moving diaphragm, as described in Rabe et al (2021). Subsequently, the 4DMRI images were corrected for through-plane geometric distortions using the spherical harmonics coefficient optimization method (Janke et al 2004, Rabe et al 2021, deriving a sagittal dataset with resolution 3.5 × 3.5 × 5 mm 3 (5 mm in LR direction) with 10 breathing cycles.
The method for generating the virtual 4DCT was then applied as will be discussed in section 2.3. Due to the reproducibility of the phantom, limited motion between the end-exhale phase of the 4DCT and 4DMRI was observed, thus the static mid-exhale 3DCT was adopted as reference CT for the virtual 4DCT generation to simulate more challenging inter-fraction motion.

Patient dataset
The patient dataset consisted of three patients with lung cancer treated at the National Center for Oncological Hadrontherapy (CNAO, Pavia, Italy) with PT, using respiratory gated dose delivery centered on end-exhale  perforated body thermoplastic masks (Klarity Medical Products, USA). The study was approved by the local Ethical Committee and all patients signed an informed consent. The patient datasets (table 1) are provided with: 4DCT dataset, 4DMRI dataset and 3D volumetric interpolated breath-hold examination sequence serving as anatomical acquisition for the localization of the lesion.
The 4DCT scans were acquired during free breathing with a Siemens SOMATOM Sensation Open CT (Siemens Healthcare GmbH, Germany) scanner with voxel size of 0.98 × 0.98 × 2 mm 3 . 4DCT acquisition was coupled to a pressure sensor (AZ-733V system, Anzai Medical Co. Ltd, Japan) placed between the patient body and the solid mask. Projections were retrospectively assigned to eight breathing phase bins using the external respiratory surrogate signal and the scanner amplitude-based sorting software to reconstruct a respiratorycorrelated 4DCT dataset. The following respiratory phases were considered: end-exhale (0%EX), used for plan optimization; 30%-exhale (30%EX) and 30%-inhale (30%IN), to evaluate the impact of residual motion within the gating window; end-inhale (90%IN) to assess the total range of motion (ROM) (Meschini et al 2020, Molinelli et al 2022. On the same day of the planning 4DCT, multi-slice MRI sagittal images of the thoracic site were acquired during free breathing with the T2/T1-weighted balanced steady-state free precession sequence (TrueFISP) using a 3 T scanner (Magnetom Verio, Siemens Healthcare GmbH, Germany). The imaged volume consisted of 25 sagittal slices, including a field of view (FOV) limited to 12.5 cm in the LR direction. For each slice 30 frames were acquired with the following parameters: TR/TE: 228.07 ms/1.5 ms; flip angle: 33°; scan matrix: 256 × 256 pixels with spacing of 1.33 × 1.33 mm 2 ; slice thickness of 5 mm; acquisition time: 310 ms/slice with 25 slices × 30 frames; k-space percentage sampling: 65%; acceleration factor: 2 with a generalized auto-calibrating partially parallel acquisition using 16 auto-calibration lines. The respiratory-correlated 4DMRI was retrospectively reconstructed and the 0%EX, 30%EX, 30%IN and 90%IN respiratory phases were considered in the 4DMRI as in the 4DCT (Paganelli et al 2018a, Meschini et al 2019). Additionally, 4DMRI were acquired also in correspondence of potential additional re-evaluation 4DCTs (table 1).
The method for generating the virtual 4DCT was applied as described in section 2.3, adopting the end-exhale phase of the 4DCT as reference 3DCT. To cope with the limited FOV of 4DMRI, the virtual 4DCT approach was performed also including a step which allowed us to propagate the motion field and obtain physically plausible deformation outside of the 4DMRI FOV, as described and validated in Meschini et al (2022a).
For patients, we generated virtual 4DCTs using the planning CT and the 4DMRI both acquired on the same day. We also generated additional virtual 4DCTs using planning CT and 4DMRI belonging to different acquisitions. This was possible thanks to the availability of 4DMRIs acquired in correspondence of re-evaluation 4DCTs, with a scan temporal distance ranging from 14 to 45 d (table 1) with respect to the planning CT.
In this way, we analyzed the potential of the virtual 4DCT as support to eventually trigger a re-evaluation 4DCT acquisition and consequently reduce the number of times the patient is exposed to additional nontherapeutic radiation.

Virtual 4DCT generation
To compensate for setup errors, a rigid image registration between the reference 3DCT and the end-exhale 4DMRI (step 1 in figure 1) was performed. Then, the virtual 4DCT was produced by following the approach proposed by Meschini et al (2020) which involves three main steps. Firstly, mono-modal DIR was performed to obtain the deformation vector fields (DVFs) describing breathing motion within the 4DMRI, by registering the end-exhale MRI to all other 4DMRI respiratory phases (step 2 in figure 1). Multi-modal DIR was then applied to superimpose the reference static 3DCT to the end-exhale MRI (step 3 in figure 1) to account for non-rigid baseline variations occurring in case of inter-fraction motion between the acquisition of the reference CT and the 4DMRI scans (Meschini et al 2020). Lastly, the virtual 4DCT was generated by warping the registered endexhale CT according to the DVFs previously obtained (step 4 in figure 1).
Intra-and inter-modality DIR were performed using the Plastimatch B-splines algorithm (Shackleford et al 2010), using a multi-stage approach starting from a coarse resolution and B-spline grid spacing to finer ones. For the multi-modal DIR, mutual information was chosen as similarity metric and five stages were implemented. Instead, for the mono-modal DIR, the chosen metric was the mean square error, and three stages were implemented (Meschini et al 2020).

Phantom validation
The validation of the method on the phantom consisted in comparing the generated virtual 4DCT with the available GT 4DCT with both geometric and dosimetric metrics.
For the geometric analysis, we quantified: the performance of the mono-modal and multi-modal DIR, comparing the structures obtained by propagating the contours (with the DVFs retrieved from the mono-and multi-modal DIR) with those manually segmented; the target ROM, computed as the distance between the endexhale and the end-inhale phases; the performance of the virtual 4DCT computed as difference with respect to the 4DCT GT; the difference between the 4DMRI and the GT 4DCT because no variations should be ideally present between the two modalities (table S1 in supplementary materials). The comparisons were performed by means of different metrics using tumor masks (Crum et al 2006, Taha et al 2015: • Center of mass (COM) distance, calculated as: where COM 1 and COM 2 denote the COM of the first tumor mask and the COM of the second tumor mask, respectively.
• Dice similarity coefficient (DSC), which is an index that can vary between 0 and 1 and can be used for computing the similarity between two binary images. DSC was calculated as: where GTV 1 and GTV 2 (i.e. gross tumor volume) denote the first tumor mask and the second tumor mask.
• Hausdorff distance (HD), which is a measure of the maximum of the minimum distances between two contours. HD was calculated as: where x i is the position of the ith point on the first contour and x j is the position of the point closest to x i on the corresponding second contour. Dosimetric analyses were carried out using the RayStation Treatment Planning System (TPS) (RaySearch Laboratories, Sweden) used clinically for treatment planning. The procedure consisted in a first phase in which the dose distribution was optimized on the reference mid-exhale 3DCT, both for protons and carbon ions, to simulate a gated treatment, as clinically performed. Once the dose has been finalized, dose recalculation was performed on all the other respiratory phases of virtual and GT 4DCTs. An artificial prescription dose of 60 Gy (RBE) for 10 fractions was taken. Position and orientation of the different beams were adjusted to minimize the effects of artificial inhomogeneities in the beam path (figure S1 in supplementary materials).
The accuracy of the method was evaluated by computing the global gamma pass rate (3 mm/3% with analysis threshold at 10% of the prescription dose) and the difference of relevant dose-volume histogram (DVH) metrics such as the dose to 95% (D 95% ) of the target volume (ΔD95%| CT,vCT ) computed as: Figure 1. Schematic workflow of the procedure used to validate a method for virtual 4DCT generation from 4DMRI data and its evaluation in particle therapy of the thorax.

Dn
where Dp = prescription dose, Fx = number of fractions, n = 95%, whereas X and Y are GT 4DCT and virtual 4DCT, respectively.

Patient evaluation
In the considered patient group, treatment plans were optimized with the RayStation TPS and were defined for each patient in prone or supine setup.
The standard strategy for lung cancer treatment at CNAO consists of a robust optimization approach directly on the GTV structure (i.e. no subclinical disease, no CTV) (Chang et al 2017). The robust optimization followed the CNAO internal protocol including 3% range uncertainty and 3 mm setup uncertainty. Since gating treatments are clinically performed at CNAO, the robust optimization was performed on the end-exhale 0%EX by including the limits of the gating window (30%EX and 30%IN). The robust optimization was then verified by recalculating the plan on 30%EX and 30%IN, as target dose coverage must be guaranteed in the gating window for clinical approval.
Clinical plans were then recomputed for all respiratory phases of the virtual 4DCT and, due to the absence of a GT, on all phases of the re-evaluation 4DCT. Recalculations were done in order to evaluate the dosimetric impact of residual motion on the gating window phases (i.e. end-exhale, 30%EX and 30%IN) and at end-inhale.
The inter-and intra-fraction variabilities observed in the virtual 4DCT with respect to the re-evaluation 4DCT were quantified. Specifically, we evaluated the difference between the target volumes (GTV, gross tumor volume) of the virtual and the re-evaluation 4DCTs relying on COM distance, DSC and HD metrics. The tumor ROM in the re-evaluation 4DCT dataset, computed as the distance of tumor COM in the 0%EX and the 90%IN, was also quantified. Additionally, for the dosimetric analysis, we assessed the ΔD5%| CT,vCT and ΔD95%| CT,vCT of the target volume and the ΔD2%| CT,vCT to OARs in the beam paths as well as the global gamma pass rate (3 mm/3%). In the clinical practice, if D 95% < 95% of the prescribed dose for the target and D 2% > 5% of the prescribed dose for OARs, a re-evaluation 4DCT is acquired.

Phantom validation
In the phantom validation, the method accuracy was tested, and results related to (a) the performance of the DIR, (b) the difference between 4DMRI and 4DCT and (c) virtual 4DCT error, exhibited a geometrical accuracy within the maximum resolution of the MRI (i.e. 5 mm) (table S2 in supplementary materials).
The ROM, expressed as the average among tumors, between end-exhale and end-inhale phases of the 4DMRI was up to 9.2 mm, whereas it was up to 11.2 mm for 4DCT (table 2). The generated virtual 4DCTs were close to the GT data with an error within the voxel size of MR imaging (table 2, figure S2 in supplementary materials).
For the dosimetric analysis, figure 2 shows the results for the three phantom datasets, expressed as absolute mean, calculated on the different breathing phases, of the dose difference between GT 4DCT and virtual 4DCT, normalized with respect to the prescription dose. For the first dataset, the method exhibited DVH deviations up to 6% for target D 95% and a gamma pass rate above 91% for the end-exhale (figures 3(a) and (b), table S3 and S4 in supplementary materials), mainly due to phase shifts (i.e. variations related to nodule baseline shift and lung tissue deflation due to the repositioning of the phantom from the CT to the MRI scanner, as quantified in table S2 panel F and visible in figure S3 of supplementary materials) during CT/MRI acquisition. For the second and third datasets, results obtained showed a mean dose difference around 1% for protons and maximum 2% for carbon ions, except for lesion T3 of phantom M2. For this tumor, which presented the largest motion (13.32 mm and 15.03 mm quantified in 4DMRI and 4DCT respectively, table S2 panel D and E of supplementary materials), the dosimetric difference in D 95% observed between virtual 4DCT and 4DCT was mainly due to the fact that the smaller ROM of the 4DMRI (and thus of the virtual 4DCT) with respect to that of the 4DCT guaranteed a better target coverage (as optimized on the reference mid-exhale). Nevertheless, the values of the gamma pass rate for M2 and M3 were higher than 99% for the end-exhale phase (table S4, supplementary materials). Overall, the dose differences for carbon ion treatment plans were higher than proton plans with mean deviations up to 9% and a mean gamma pass rate of 98%.

Patient evaluation
Results related to virtual CT versus re-evaluation 4DCT in patients are reported in figure 4 and showed that the method performed with an overall mean error of (3.7 ± 1.1) mm within the gating window. Slightly worse but reasonable results up to 7.4 mm were achieved for breathing phases outside the gating window (i.e. end-inhale phase), where we expect greater variations between the 4DMRI and 4DCT acquisitions with respect to the exhale respiratory phase (tables S5-S7 in supplementary materials).
In terms of DVH metrics the virtual and the re-evaluation 4DCTs showed good agreement, with limited errors on tumors D 95% within the gating window up to 2% (figure 5) and a mean gamma pass rate of 94% (standard deviation of 4%) for the end-exhale. Particularly, for patient LP04, the D 95% showed variations below 2% although a degradation of the gamma pass rate was present (table S8 in supplementary materials).
For patient LP03_1 a dose variation up to 20% for target ΔD95%| CT,vCT , were observed inside the gating window. An example DVH for the target volume and OARs for this patient is depicted in figure 6(a) for the 0% EX phase together with the corresponding dose distribution on both re-evaluation and virtual CTs in comparison with results for LP04_2 ( figure 6(b)). Contours of the planning CT were propagated with the virtual CT approach and the structures manually delineated by the clinician on the re-evaluation 4DCT were used for comparison. To support the analysis, we quantified the motion between the planning 4DCT and re-evaluation 4DCTs that had occurred during the time between the scans (supplementary material table S9, panel A); the HD between the contours showed anatomo-pathological variations between the two acquisitions, especially for LP03_1 and LP04.
For the OARs in the beam path, the ΔD2%| CT,vCT was calculated (figure 7). Lowest variations were seen for the esophagus for which differences up to 4% were found in the carbon ion treatment plan recalculation. The highest variations were found for the heart (up to 20%) and for lung-GTV (up to 19%) structures.

Discussion
In this work, a method implemented in the literature (Meschini et al 2020) to derive virtual 4DCT from 4DMRI data has been validated by means of ex-vivo porcine lungs phantom able to mimic human breathing motion. Additionally, we quantified the results obtained, adopting the phantom-validated method, on data from real lung cancer patients treated at CNAO with PT, both from the geometric and dosimetric standpoints. In perspective this approach allows us to study organ motion due to respiration by acquiring repeated 4DMRI, depicting different breathing cycles, potentially avoiding 4DCT that would deliver additional nontherapeutic dose to the patient (Meschini et al 2020).
Only few studies in the literature focused on generating virtual CT data in the thoraco-abdominal region for proton or carbon ion therapy treatments (Boye et al 2013, Guerreiro et al 2019, Liu et al 2019, Meschini et al 2020, Duetschler et al 2022. The virtual 4DCT approach was validated on a computational phantom in Meschini et al (2020) for carbon ion treatments of abdominal targets, but up to now no published papers proposed a validation of the method on physical phantoms. The use of such phantoms is however crucial to obtain reproducible breathing motion and GT volumes with realistic patient-like data for validation purposes (Rabe et al 2021).
Results related to phantoms ROM showed that the motion of the 4DCT were higher compared to the MRI ROM. This could be due to (a) the resolution of the 4DMRI which is worse than that of the 4DCT in any direction, and (b) the different acquisition times of 4DCT and 4DMRI.
It should be also noted that variations between the 4DCT and 4DMRI acquisitions were present (table S2(F) and figure S3 in supplementary materials). These variations (less than 4.6 mm) were mainly related to nodule baseline shift and lung tissue deflation due to the repositioning of the phantom after transport from the CT to the MRI scanner. Nevertheless, the phantom represents a realistic patient-like data and accurate geometric results were obtained. Indeed, achieved results showed that the method exhibits a geometrical accuracy within the maximum resolution of the MRI.
From a dosimetric standpoint, in the first phantom dataset, the method presented DVH deviations with respect to the prescription dose up to 5% for protons and 6% for carbon ions for target D 95% (gamma pass rate > 91%). A greater difference between GT 4DCT and virtual 4DCT was obtained in this phantom dataset and for some lesions (e.g. T1) because of a phase shift during CT/MRI acquisitions which contributed to a greater error ( figure S3 in supplementary material). For the second phantom, the main dose difference was associated to the smaller ROM of the 4DMRI (and thus of the virtual 4DCT) with respect to that of the 4DCT, which guaranteed a better target coverage. For the third phantom dataset, geometrical and dosimetric evaluations were optimal (figure 2, gamma pass rate above 99%).
The phantom validation results obtained for protons and carbon ions treatment plans showed differences between the two types of treatment and, specifically, the biggest errors were obtained in the latter case. A greater difference could be expected for carbon ion whose dose distribution is more sensitive to changes in tissue density along the beam path (Meschini et al 2020).
To analyze the virtual 4DCT method from a clinical perspective, the evaluation on patients consisted in generating 4DCT not just relying on 4DMRI data acquired on the same day of the planning 4DCT, but also considering the generation of virtual 4DCTs using planning 4DCT and 4DMRI belonging to different acquisitions. Specifically, virtual 4DCTs were generated using the reference CT of the first acquisition (i.e. the same used also to plan the treatment) with a 4DMRI obtained after a month or more.
On a geometric level, errors were low and acceptable since their median values are within the MRI maximum voxel size. From a dosimetric standpoint, the virtual and the re-evaluation 4DCTs showed good agreement for most of the patients, with limited errors on tumors within the gating window (ΔD95%| CT,vCT up to 2%) and gamma pass rates with a mean value of 94%, both for protons and carbon ions. The mean gamma pass rate was affected by results of patient LP04 and LP03_1, where relevant anatomo-pathological variations were observed between the planning CT and the re-evaluation CT (table S9(a) and figure S4, of supplementary material), resulting in differences between the propagated contours of the planning CT with the virtual CT approach and the manually delineated contours on the re-evaluation 4DCT (figure 6(a) in supplementary material). In LP04 the target coverage was nevertheless guaranteed with D 95% variations within 2%, whereas in LP03_1 dose deviations in the target up to 10% at end-exhale were observed.
Considering that the virtual 4DCT approach relies on the propagation of the structures defined on the planning CT, if the contours have been re-delineated on the re-evaluation CT, then they should also be redefined on the virtual 4DCT for a fair comparison. As such, we would expect that a manual contouring would be directly performed on the MRI acquisition or that a manual adjustment of the contours derived with the virtual 4DCT approach should be taken into consideration also when the virtual 4DCT would be exploited as support to the clinical workflow, especially when long times between the scans are present as in our case.
For the OARs, highest variations were found for the heart structure (up to 20%) and, subsequently, on Lung-GTV. This is mainly because our approach accounts for respiratory motion and not for cardiac movement and   that there might be variations introduced by the propagation of the motion field to compensate for the limited FOV of the acquired MRI (Paganelli et al 2018c, Meschini et al 2022a. To further support that our method performs independently to the image modality, we also evaluated the creation of virtual 4DCTs directly on the re-evaluation 4DCT without considering the MRIs. The performance followed the same trend as that of the MRI with errors in the order of the CT resolution (supplementary material table S9(b)).
Finally, the main limitation of this study for the physical phantom dataset is connected to the spatial resolution of the imaging data, as this affects DIR accuracy and therefore is put forward as the minimum acceptable uncertainty in the virtual 4DCT method (Meschini et al 2020). For the phantom, the resolution depended on the clinical MR sequence of the MR-Linac used for the acquisitions. This affected the performance of the method on patients, as DIR was optimized on phantom data and then used for application on patients. We expect that better results can be achieved by optimizing the parameters for virtual CT generation directly on patients or by performing phantom acquisitions with the same resolution of patient data.
On the other hand, main limitations related to the patient datasets are connected to the low number of patients available, to re-evaluation 4DCTs that are not as reliable as real GTs and to the long temporal distance between planning and re-evaluation 4DCTs scans that led to relevant anatomo-pathological variations requiring re-delineation of contours on re-evaluation 4DCTs.
A possible solution to these issues is to include a larger patient cohort in the analysis and acquire MR images more frequently so that multiple virtual 4DCTs can be generated, on which contours can be redefined manually. These virtual 4DCTs could be used for robust plan optimization, allowing the plan to be less sensitive to the anatomo-pathological variations measured in patients.
Moreover, in presence of relevant anatomo-pathological variations (seen on MRIs) and dosimetric deviations (seen on virtual 4DCTs), the virtual 4DCT could also be exploited as support to trigger the acquisition of a re-evaluation 4DCT, reducing accordingly the number of times the patient is subjected to additional nontherapeutic radiation. Specifically, when clinical dosimetric constraints on the virtual 4DCT are not satisfied (D 95% < 95% of the prescribed dose for target and D 2% > 5% of the prescribed dose for OARs), a re-evaluation 4DCT is suggested.

Conclusions
In this study, we exploited CT/MRI acquisitions of a physical phantom to validate the virtual 4DCT approach in thoracic tumors treated with PT. The method was found to be accurate from both geometrical and dosimetric standpoints, allowing us to test it on real patients' data treated at CNAO to evaluate the potential of the virtual 4DCT as support to the clinical workflow. This study could encourage the use of 4DMRI in the treatment of lung cancer with protons or carbon ions to quantify breathing motion variations.