Assessment of range uncertainty in lung-like tissue using a porcine lung phantom and proton radiography

Thoracic tumours are increasingly considered indications for pencil beam scanned proton therapy (PBS-PT) treatments. Conservative robustness settings have been suggested due to potential range straggling effects caused by the lung micro-structure. Using proton radiography (PR) and a 4D porcine lung phantom, we experimentally assess range errors to be considered in robust treatment planning for thoracic indications. A human-chest-size 4D phantom hosting inflatable porcine lungs and corresponding 4D computed tomography (4DCT) were used. Five PR frames were planned to intersect the phantom at various positions. Integral depth-dose curves (IDDs) per proton spot were measured using a multi-layer ionisation chamber (MLIC). Each PR frame consisted of 81 spots with an assigned energy of 210 MeV (full width at half maximum (FWHM) 8.2 mm). Each frame was delivered five times while simultaneously acquiring the breathing signal of the 4D phantom, using an ANZAI load cell. The synchronised ANZAI and delivery log file information was used to retrospectively sort spots into their corresponding breathing phase. Based on this information, IDDs were simulated by the treatment planning system (TPS) Monte Carlo dose engine on a dose grid of 1 mm. In addition to the time-resolved TPS calculations on the 4DCT phases, IDDs were calculated on the average CT. Measured IDDs were compared with simulated ones, calculating the range error for each individual spot. In total, 2025 proton spots were individually measured and analysed. The range error of a specific spot is reported relative to its water equivalent path length (WEPL). The mean relative range error was 1.2% (1.5 SD 2.3 %) for the comparison with the time-resolved TPS calculations, and 1.0% (1.5 SD 2.2 %) when comparing to TPS calculations on the average CT. The determined mean relative range errors justify the use of 3% range uncertainty for robust treatment planning in a clinical setting for thoracic indications.


Introduction
Proton therapy is increasingly employed for the treatment of thoracic indications, such as non-small cell lung cancer (NSCLC) (Macdonald et al 2009). In thoracic patients, proton therapy may reduce radiation-induced complications, such as radiation pneumonitis, which is linked to the lung dose (Palma et al 2013, Appelt et al 2014, dysphagia, which correlates with dose to the oesophagus (Belderbos et al 2005, El Naqa et al 2006, Zhu et al 2010, Gomez et al 2012, Wijsman et al 2015 or a reduced life expectancy (two-year mortality), which is correlated with the dose to the heart (Wang et al 2017).
The adoption of proton therapy for the treatment of thoracic tumours, especially when applied with pencil beam scanning (PBS), has been hampered by numerous sources of uncertainty, which can potentially have severe dose distribution degrading effects. Sources of uncertainty are linked to organ motion, interplay effects (Bert andRietzel 2007, Paganetti 2012) and range uncertainty, among others. Regarding the range uncertainty, the literature suggests that uncertainty as high as 5% might be applicable for lung tissue.
In addition, it has been speculated that due to the micro-structure of the lung, which cannot be properly detected with computed tomography (CT) scans, proton beams may be affected by an increased range straggling effect, which would enlarge range uncertainty further (Uries et al 1986, Baumann et al 2019, España and Paganetti 2011, Perles et al 2011, Sell et al 2012, Titt et al 2015. Most of the recent work is based on Monte Carlo simulations, as experimental range measurements in lung-like tissue are not straightforward to perform. Practically, it is rather challenging to maintain lung tissue air-ventilated in ex vivo conditions. Therefore, when assessing range accuracy in lung-like tissue or performing CT calibration for proton dose calculations, simplified tissue substitutes are often used. These are materials similar to lung tissue in terms of physical density, however, they do not resemble the elemental composition or structure of actual lung tissue. As a result, lung-like tissue is associated with increased range uncertainty (Paganetti 2012). Nonetheless, artiChest (PROdesign, Heiligkreuzsteinach, Germany) (Biederer andHeller 2003, Etzold 2020) is a phantom, which allows one to ventilate porcine lungs within a plastic shell while mimicking the breathing motion.
Furthermore, proton radiography (PR) is a proton imaging technique introduced and investigated by numerous groups (Parodi 2020). It relies on detecting high energy protons passing through a sample-of-interest, positioned in the beam path. The detected signal can give insight into the water equivalent thickness (WET) of the materials in the beam path. This can be useful, for instance, for patient positioning or to provide an input for assessment of range uncertainty in the proton treatment planning process.
In our clinic we have adopted a PR technique (also referred to as range probing) as described by Farace et al (2016). It relies on measuring individual integral depth-dose curves (IDDs) per shoot-through proton pencil beams distally from the sample-of-interest, using a multi-layer ionisation chamber (MLIC). The technique has been demonstrated to be useful for validation and optimisation of CT calibration curves by Meijers et al (2020). In this previous study range errors were assessed in bone-and soft tissue-like phantoms. In particular, one of the phantoms was designed as a 'thorax' phantom, consisting of ribs, liver, fat and muscle attached to a low-density styrofoam block, which allowed one to assess range uncertainties in tissue types found in the thorax, except lung tissue itself. In the previous study 1.5 SD of range errors was found to be within the theoretical range uncertainty margin of 2.4% + 1 mm.
The objective of this study was to experimentally assess range errors in an air-ventilated porcine lung using PR and, in combination with the previous study, which investigated bone-and soft tissue-like phantoms (Meijers et al 2020), to justify the choice of range uncertainty for robust thorax treatment planning to be used in a clinical setting. In this context, range errors were determined as the discrepancy between treatment planning system (TPS) predicted ranges versus measured ranges through the porcine lung tissue phantom.

Preparation
A 4D phantom artiCHEST, which consists of two water-filled plastic shells in the shape and dimensions of a human thorax and which can host ex vivo porcine lungs, was used (see figure 1). Ex vivo porcine lungs, including parts of the trachea, were inspected for their intactness and were prepared for the experiment within 24 h after extraction. The 4D phantom was configured to mimic a 4 s long breathing cycle. This was accomplished by a two-way air pumping system. A first pump ensures vacuum in the chest cavity while the bronchia is open to atmospheric pressure, letting the porcine lungs inflate. A second pump drives an air-operated diaphragm, which can produce cyclical motion, following a user-defined breathing curve. A phase-binned 4DCT scan (Somatom Definition AS, Siemens Healthineers, Erlangen, Germany) was performed and reconstructed into scans of ten breathing phases and an average scan, all imported into TPS (RayStation 8B, RaySearch Laboratories, Stockholm, Sweden). The breathing signal of the 4D phantom was collected by using an ANZAI load cell (Anzai, Tokyo, Japan). The clinical CT calibration curve in the TPS was used to correlate mass densities to the CT numbers of the 4DCT scan. The CT calibration curve was defined following the stoichiometric method (Schneider et al 1996) with minor deviations and validation, as described by Meijers et al (2020). Proton stopping power ratios (SPR) are calculated by the TPS based on the provided mass density to CT number curve and internal material look-up table.
Five PR fields (i.e. frames) in the anterior-posterior (AP) direction were planned to intersect the 4D phantom at various positions. The centres of the frames (F2-F6), as well as the setup of the 4D phantom are  shown in figure 1. Areas, which were covered by PR frames, were subject to a motion of up to 9 mm. The highest extent of the motion was observed in frame 2.
The AP beam direction selection was driven by two aspects: (1) currently the acquisition is technically feasible only along the cardinal axis, and (2) the AP direction allowed one to design some of the frames (F5 and F6) in a way that only lung tissues are intersected, without soft tissue being in the beam path, this way allowing one to focus on range errors in lung tissues.
Each PR field consisted of 81 spots separated by 5 mm and covering an area of 4 × 4 cm 2 . All spots were assigned an energy of 210 MeV and 1 MU/spot. On the used system the full width at half maximum (FWHM) in air at the isocentre for spots with 210 MeV energy is 8.2 mm, according to the spot size measurements performed during the commissioning phase.
Focusing on porcine tissue sample anatomy, proton spots in frames 5 and 6 were intersecting lung tissue only, while frames 2, 3 and 4 were located in the mediastinum area, therefore proton spots were intersecting soft tissue, vessels and tracheas in addition to lung tissue.

Measurements
In the proton treatment room (Proteus Plus, IBA, Louvain-la-Neuve, Belgium) the 4D phantom was positioned on the treatment table supported by a 6D robotic positioning system and aligned to the treatment room isocentre by using the on-board cone-beam CT (CBCT) system. The positioning accuracy of the robotic arm is <0.5 mm, as demonstrated by recurrent quality assurance (QA) procedures.
The MLIC (Giraffe, IBA Dosimetry, Schwarzenbruck, Germany) was positioned distally from the 4D phantom below the treatment table 63 cm from the isocentre on a slab of solid water, as shown in figure 2. The MLIC was aligned to the isocentre along the beam axis.
Each of the five PR frames was delivered five times to ensure that various phases of the breathing cycle were sampled by each proton spot. The delivery time of a single frame is approximately 12.5 s. The phantom was repositioned between the frames by applying an offset with the robotic arm. The breathing signal of the 4D phantom was acquired using an ANZAI load cell during the acquisition of the PR frames. The cell was connected via an adaptor to the hose, which was operating the diaphragm. Before the experiment the ANZAI system and proton therapy system (PTS) were synchronised to the same time server to ensure a consistent absolute time reference for both systems. After delivery of the PR frames, treatment delivery log files of the PTS were collected.
Since measurements were performed for fields in the AP direction, proton beams were passing through the treatment table (Qfix Standard insert, Qfix, PA, USA). During the clinical commissioning of the patient positioning devices the WET of the table insert was determined to be 5 mm. This value was taken into account during data analysis, by shifting measured IDDs accordingly.
During measurements, the proton spots were passing through various materials in the 4D phantom in the following order (see figure 2(B)): plastic shell, water, plastic shell, porcine lung, plastic shell, water and plastic shell. The plastic shell of the 4D phantom is made of thermoplastic copolyester (physical density according to the material data sheet is 1.27 g cm −3 ). The WET of the shell was experimentally verified and agreed to the theoretical calculation within 0.5 mm, which is the uncertainty of the Giraffe-based WET measurements (Farace et al 2016). The data set for the experimental verification consisted of PR acquisition for frame 2 for an empty phantom (without porcine lung).
The inner surface of the plastic shell (the one in contact with lung tissue) was covered with a thin layer of ultrasound gel, to reduce friction and avoid tearing of the lung tissue. In terms of CT numbers, the gel corresponded to water-like material. No additional corrective actions were taken.

Analysis
PR frames were acquired during simulated breathing of the 4D phantom. A priori, it was not known which spots would be delivered during which specific breathing phase. Therefore, treatment delivery log files (referred to as logs) and ANZAI breathing patterns, acquired during beam delivery, were used to sort out spots into their corresponding breathing phase. For this purpose, the method as described by Meijers et al and in-house built scripts were used (2019). In addition to other data, logs contain information regarding position, monitor units (MU), energy and delivery timeline for every delivered spot. From the ANZAI breathing patterns, it is possible to determine in what phase the 4D phantom was at a specific moment in time. By combining these two datasets, a set of ten DICOM sub-plans was created, where each sub-plan contained only the spots corresponding to a specific breathing phase. These sub-plans per measured frame (5 frames × 5 repetitions per frame) were imported in the TPS, and each individual spot was calculated on the corresponding phase of the 4DCT image set. The end point of a spot delivery was considered as a time stamp for this spot. Assignment to the phase was carried out based on this time stamp. The approximate delivery time of a spot was 5 ms. Depending on the timing, it is possible that some of the spots were partially delivered in one phase and ended in the next one. However, this effect was considered negligible and no additional corrective actions were taken. To simulate the MLIC, a water slab was added distally to the 4D phantom. The TPS Monte Carlo dose calculation engine of RayStation with an accuracy of 0.5% on a dose grid of 1 mm was used to calculate TPS-predicted IDDs. For the purpose of dose calculations, the plastic shell of the phantom was contoured and overridden with a density of 1.27 g cm −3 . TPS-calculated dose distributions were integrated along the beam axis to create an IDD per spot.
A set of Matlab scripts based on openREGGUI open source code (Deffet et al 2017, OpenREGGUI 2020) was used to compare measured IDDs with the simulated ones and calculate the range error for every individual spot. The range error is obtained by the least squares method and corresponds to the optimal shift between two IDDs. Figure 3 shows measured (MLIC) and calculated (TPS) IDDs for one of the example spots passing through lung tissue (frame 5).
In addition to TPS calculations on the corresponding phase of the 4DCT, calculations were performed on the average CT, since it is common practice to perform clinical treatment planning on average CTs.

Results
In total 2025 proton spots were individually measured and analysed. The mean water equivalent path length (WEPL) through the 4D phantom was 143.7 mm (1 SD 32.5 mm), as determined based on MLIC measurements. All ten phases of the 4DCT were well represented in the measurement data set. On average 203 spots (1 SD 18 spots) were associated with an individual phase. The lowest number of spots (171) was associated with a 70% phase, and the highest number of spots (240) was associated with a 0% phase.
Although range uncertainty recipes, as suggested in the literature (Paganetti 2012), typically consist of relative and absolute components, in practice most of the commercial TPSs allow one to specify only the relative component for the purposes of robust optimisation or robustness evaluation. Therefore, a further on the range error of each specific spot is evaluated relative to the WEPL of this spot through the 4D phantom, which is determined based on MLIC measurement.    Figure 4 shows a histogram of relative range errors for all 2025 measured spots. For this data set the TPS dose calculation per spot was performed on the corresponding phase (0%-90%) of the 4DCT. The mean relative range error is 1.2% (1.5 SD 2.3 %). Figure 5 shows the histogram of relative range errors for the complete measurement data set, however, in this case, the TPS dose calculation for all spots was performed on the average CT. The mean relative range error here is 1.0% (1.5 SD 2.2 %).  Overlay of the WET map of the phantom along the proton path (coronal view PR) and relative range errors, as measured for proton spots included in frame 3. Every square represents a proton spot. For instance, the WET gradient over the area A is 30.5 mm.
Since frames 5 and 6 were intersecting lung tissue only (see figure 1), relative range error histograms for these frames are shown separately in figure 6.
The mean relative range error for comparison with calculations on 4DCT phases is 1.0% (1.5 SD 1.1 %), while for comparison with calculations performed on the average CT it is 0.8% (1.5 SD 1.2 %).

Discussion
Comparison of the TPS-predicted and experimentally measured ranges for shoot-through proton spots directed through a porcine lung in the 4D phantom showed good agreement, especially for spots travelling through lung tissue only (frames 5 and 6).
The introduction of soft/lung tissue intersections in the beam path creates high WET gradients that are in addition moving with the breathing cycles. Therefore, larger relative range errors are observed in frames located in the mediastinum region (especially frames 3 and 4). The sensitivity of the range accuracy towards high WET gradients in the presence of setup uncertainty is an observation that has already been reported in the literature several times (Knopf et al 2008, Farace et al 2016, Meijers et al 2020. As an example, figure 7 shows the transversal view of the phantom in the area of frame 3. One can observe rapid changes in tissue material in the lateral direction along the beam path of a spot (indicated by an orange line) that results in high WET gradients. Such WET gradients are correlating with increased range errors (area A in figure 7). The increase in range errors may be caused by local setup errors, resulting from positioning misalignments and breathing motion. This type of error has a random component; therefore, in the case of fractionated treatments, effects on the edges may smear out. However, further investigations should confirm this.
For technical and practical reasons, the 4D phantom was continuously kept under motion (simulating breathing) throughout the experiment: from the assembly of the phantom, during the CT scan, until the end of the measurement session, which is a time period of about 4-5 h. During this time, spatial configuration of the porcine tissue at some local areas may have changed, resulting in an increased setup error of some tissue versus the proton beam. In addition, it is not possible either to guarantee or to verify that the spatial configuration of tissue is exactly reproducible from one breathing cycle to another.
It can be observed that the evaluation performed on the average CT scan results in a slightly smaller mean range error and SD, compared to the evaluation performed on the phases of the 4DCT scan. Although the difference is relatively small, one possible explanation for this observation might be the more blurred edges of individual structures in the average CT image. This might make the comparison less sensitive to small local setup inconsistencies, which may be caused by spatial configuration variations as discussed in the previous paragraph. However, further investigations are necessary to confirm this.
The current study is primarily focussing on range error assessment. However, the data set potentially could be used for future investigations, in order to evaluate the effect of range mixing in more detail. Potentially, the goodness of fit may be used as a measure for such assessments.
A small, but systematic shift of relative range errors (about 1%) was observed, meaning that the TPS systematically overestimates the density of some materials in the beam path. This observation is potentially caused by an ambiguity in the contour definition of the shell of the phantom. The density of the thermoplastic shell was underestimated on the CT images (1.12 instead of 1.27 g cm −3 ), therefore contouring and override of the shell was necessary. However, the shell has a curved shape, it does not have an uniform thickness and the exact edges of the surfaces on the CT images are ambiguous, as illustrated in figure 8. For this reason, the definition of the shell is prone to uncertainty. As mentioned before, to obtain WET information of the shell, the PR of frame 2 was acquired for an empty phantom (without porcine lung). This measurement was used to directly correct for the shell density discrepancy of the PR of frame 2 acquired for the assembled phantom (with porcine lung), instead of using the override structure. For this direct correction, the mean relative range error for frame 2 (five repeated measurements) shifts towards zero (mean relative range error 0.3% instead of 1.1%), as demonstrated in figure 8. This suggests that ambiguity in the override approach to at least some extent may have contributed to the observed systematic shift.
The PR method used in this study is not suitable for establishing ground truth proton SPRs per tissue type. Individual proton spots are intersecting various mixtures of materials and tissues in lateral and longitudinal directions. Therefore, every acquired IDD is intrinsically integral by nature. To extract SPR information per tissue/material type, acquisitions of multiple projections would be necessary (Krah et al 2015, Meyer et al 2017, which would be a step towards proton CT (Dedes et al 2018).
The main purpose of the proposed method of investigation is to assess the range calculation accuracy of the TPS in a near-clinical setting using an end-to-end approach. In the case of major observed deviations between TPS-calculated and measured ranges, further investigations to identify the root cause would be necessary.
From the perspective of assessing individual tissue types corresponding to individual segments of the CT calibration curve, it is advantageous that PR acquisitions have been performed separately for lung-like tissue and soft/bone-like tissue (Meijers et al 2020). However, from the point of view of end-to-end testing, the absence of bone-like tissue in the current experimental setup may be considered as a limitation. The ability to perform PR measurements in vivo would provide additional confirmation for the made observations.
In conclusion, the range accuracy assessment in an ex vivo lung tissue using a 4D phantom showed good agreement between calculated ranges in the TPS and the PR measurements. The mean relative range error for a complete data set was 1.2% (1.5SD 2.3%), when TPS dose calculations were performed on the corresponding phases of the 4DCT, and 1.0% (1.5SD 2.2%), when TPS dose calculations were performed on the average 4DCT. The reported results are for the evaluation, where override for the shell has been used. This investigation, in combination with a previously conducted study (Meijers et al 2020) investigating bone-and soft tissue-like phantoms, allowed us to confirm the use of 3% range uncertainty (1.5 SD interval) for robust 4D CT and Monte Carlo based treatment planning in a clinical setting for thoracic indications as adequate.