Abdominal synthetic CT reconstruction with intensity projection prior for MRI-only adaptive radiotherapy

Sven Olberg; Jaehee Chun; Byong Su Choi; Inkyung Park; Hyun Kim; Taeho Kim; Jin Sung Kim; Olga Green; Justin C Park

doi:10.1088/1361-6560/ac279e

1. Introduction

The proliferation of magnetic resonance imaging (MRI)-based radiotherapy delivery systems in recent years has pushed applications of MRI-guided radiation therapy (MRgRT) to the forefront of RT (Wen et al, Fallone 2014, Keall et al 2014, Mutic and Dempsey 2014, Fischer-Valuck et al 2017, Pollard et al 2017, Raaymakers et al 2017). The superior soft tissue contrast of MRI compared to that of x-ray computed tomography (CT) improves target delineation in many sites and enables near real-time motion tracking and management during treatment (Noel et al 2015, Raaymakers et al 2017). The advantages of MRI also lend themselves well to managing interfractional changes in a patient's anatomy, as MRI setup scans acquired in an adaptive radiotherapy (ART) workflow capture the anatomy of the day without exposing the patient to additional ionizing radiation (Fischer-Valuck et al 2017). Because dose calculations require electron density information, however, these setup scans must be registered to a CT simulation scan that may have been acquired weeks before a given treatment fraction (Schmidt and Payne 2015). Problems arise when the anatomy represented in the two scans is incompatible because of changes in the geometry and position of organs of interest, a challenge that is especially relevant to gastrointestinal (GI) structures and intestinal gas pockets considered during ART in the abdomen. In these cases, the ideal, unconstrained MRgRT workflow would involve intensive manual contouring so that density overrides could be applied to the CT simulation scan to approximate the anatomy of the day captured in the MRI setup scan.

The uncertainty associated with multi-modal image registration and the burden of intensive contouring make an MRI-only workflow, one in which MRI is the sole imaging modality used for planning and guidance, an attractive alternative to the conventional MRgRT workflow. The primary challenge in an MRI-only workflow is generating synthetic CT (sCT) data that provide the electron density information used in dose calculations. Existing approaches to this task have been grouped into three general classes: atlas-based, voxel-based, and learning-based methods (Edmund and Nyholm 2017, Johnstone et al 2018). Recent investigations have focused primarily on the last category, namely deep learning (DL)-based approaches in which convolutional neural networks approximate a mapping between MRI and CT images (Spadea et al 2021).

Studies in this space have primarily been limited to regions of relatively static anatomy, including the head & neck (Han 2017) and general pelvis (Arabi et al 2018, Chen et al 2018, Maspero et al 2018). More recently, our group has extended these investigations into the thorax, where registration is challenged by pulmonary and cardiac motion, by generating sCT data for use in MRI-only breast RT (Olberg et al 2019). When considering applications in the more dynamic region of the abdomen, a primary challenge to employing these DL strategies becomes readily apparent: the majority of frameworks require paired data for training. The presence and hard-to-characterize motion of intestinal gas give rise to observable differences in bowel filling and position on numerous time scales: seconds in the course of a single scan, minutes during treatment delivery, and hours between the MRI and CT scans used for treatment planning (Nakamoto et al 2004, Feng et al 2009, Kumagai et al 2009, Mostafaei et al 2018, Corradini et al 2019). This challenge is even more clinically relevant for setup scans in MRI-guided adaptive treatments. In these situations, MRI setup scans acquired on the day of treatment are registered to a CT simulation scan that may have been acquired weeks previously. When large discrepancies in bowel filling and position exist, the MRI setup scan and simulation CT scan may be rendered largely incompatible. Corrective measures undertaken in an online adaptive workflow may significantly delay treatment delivery while the patient remains on-table (Henke et al 2018). As an alternative to paired-data approaches, one may consider an unpaired style-transfer approach, exemplified by CycleGAN, in settings in which abundant unpaired data exist (Zhu et al 2018, Spadea et al 2021).
However, the authors of the original CycleGAN paper acknowledge a gap between paired and unpaired results that may be hard or even impossible to close in some settings, especially those with some inherent ambiguity (Zhu et al 2018). This conclusion is mirrored in other studies of the sCT reconstruction task: Peng et al (2020) concluded that the conditional paired approach was 'preferable if high-quality MR-CT pairs were available' in the nasopharynx, another area challenged by the presence of air, and Fu et al (2020) found no benefit to adopting CycleGAN in the upper abdomen for liver cancer patients.

In light of these challenges, many existing studies on sCT reconstruction in the abdomen have abandoned DL-based approaches in favor of classification- or thresholding-based approaches in which manual steps are taken during sCT generation to account for the presence of air. Bredfeldt et al (2017) and Hsu et al (2019) used a fuzzy c-means clustering algorithm to classify tissues based on multiple MRI volumes acquired with different sequences, taking care to threshold image regions where air is expected before applying the classification algorithm. Alternatively, Ahunbay et al (2019) used deformable image registration between the daily MRI and simulation CT scans to transfer electron density information, while using thresholding operations in manually contoured regions to identify air. Guerreiro et al (2019) explored a hybrid atlas- and intensity-based conversion algorithm (Korhonen et al 2014, Koivula et al 2016, 2017) in which contoured regions of air were transferred directly from the simulation CT after Hounsfield units (HU) were assigned. Most recently, both Cusumano et al (2020) and Qian et al (2020) have reported experiences with DL-based sCT reconstruction in the abdomen. Neither study, however, deals explicitly with air discrepancies between corresponding MRI and CT scans. Cusumano et al (2020) instead employed exclusion criteria to select for training only MR and CT images with 'high correspondence in terms of air pockets location between the two images', and excluded a further six patients from the test set because of poor correspondence of air between the two modalities. Similarly, Qian et al (2020) do not discuss results for patients with poorly matched representations of air.

In the present study, we return to paired-data DL with a novel hybrid approach to abdominal sCT reconstruction, enabled by the creation of a training data set that is clinically unavailable. As discussed above, the primary barrier to adopting many DL-based algorithms in this setting is the requirement for paired training data: mismatches in intestinal gas between corresponding MRI and CT scans make it impossible to collect a well-matched training data set of sufficient size. Considering this, along with the challenges of an unpaired approach, we first use automated thresholding and morphological reconstruction operations to identify regions of air and propagate them between corresponding MR and CT images, producing a well-matched training data set. We present here a preliminary evaluation of this paired-data DL approach to sCT reconstruction in the abdomen, enabled by the novel use of the intensity projection prior, with a focus on the effects of intestinal gas differences on dose calculations. Dosimetric comparisons are made between two classes of test patients, separated by a qualitative distinction made at the time of data collection and prior to any evaluation: Class 1, consisting of well-matched patients with little involvement of intestinal gas, and Class 2, consisting of patients with notable differences in intestinal gas between corresponding MRI and CT scans. Dosimetric accuracy is established using the patients of Class 1, while comparisons of target coverage between the sCT-based plans and the simulation CT-based clinical plans for patients of Class 2 highlight the complications posed by intestinal gas during MRI-only ART in the abdomen.

2. Materials and methods

2.1. Patient population

Data sets used in the present study were retrospectively collected from a population of pancreatic cancer patients previously treated at our institution using MRgRT. In each case, patients underwent CT and MRI simulation scans prior to treatment planning. Scans were acquired in treatment position using an Alpha Cradle (Smithers Medical Products Inc., North Canton, OH) with no additional immobilization devices. The nominal prescription in the selected population was a total dose of 50 Gy delivered to 95% of the PTV in 5 fractions; a small number of cases deviated from this prescription, with total prescribed doses ranging from 30 to 40 Gy delivered in 5 fractions. For the nominal prescription, satisfying dose-volume constraints limiting the volume receiving 36 Gy to <0.5 cm³ for critical structures, including the duodenum, stomach, small bowel, and large bowel, was prioritized over target coverage. Whenever one of these constraints was violated, the calculated dose was normalized to satisfy the violated constraint, and the same normalization used in the clinical plan was also applied when evaluating the proposed method, as discussed later. In all patient cases, dose calculations in the clinical plan used electron density data derived directly from the simulation CT scan, with no additional density overrides to account for the presence of air.
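
The dose-volume constraint and the normalization step can be illustrated with a short NumPy sketch. It is not the clinical procedure: it assumes that normalization means uniformly rescaling the dose so that the minimum dose to the hottest 0.5 cm³ of the violating structure (D0.5cm³) equals 36 Gy, it assumes equal voxel volumes, and the function names are hypothetical.

```python
import numpy as np


def volume_at_least(dose_gy, mask, threshold_gy, voxel_cm3):
    """Volume (cm^3) of the masked structure receiving at least threshold_gy."""
    return np.count_nonzero(dose_gy[mask] >= threshold_gy) * voxel_cm3


def constraint_scale(dose_gy, mask, voxel_cm3, threshold_gy=36.0, volume_cm3=0.5):
    """Hypothetical uniform scale factor (<= 1) enforcing D_{volume_cm3} <= threshold_gy.

    Assumption: the plan is normalized by rescaling the whole dose distribution
    so that the minimum dose to the hottest volume_cm3 of the structure equals
    threshold_gy; the clinical normalization procedure may differ.
    """
    hottest_first = np.sort(dose_gy[mask])[::-1]
    n = max(1, int(round(volume_cm3 / voxel_cm3)))   # voxels in the hottest volume
    if n > hottest_first.size:
        return 1.0                                   # structure smaller than the volume
    d_vol = hottest_first[n - 1]                     # D_{volume_cm3}
    return min(1.0, threshold_gy / d_vol) if d_vol > 0 else 1.0


# Toy example: a 100-voxel structure (0.1 cm^3 voxels) with doses from 0 to 49.5 Gy.
dose = np.arange(100, dtype=float) * 0.5
mask = np.ones(dose.shape, dtype=bool)
f = constraint_scale(dose, mask, voxel_cm3=0.1)
print(round(f, 4))                                   # 0.7579 (= 36 / 47.5)
```

Because the factor is applied to the whole distribution, target coverage drops by the same proportion, which is why constraint-driven normalization trades coverage for OAR sparing.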

2.2. Training and testing data

A total of 89 patient data sets (one pair of corresponding MRI and CT scans per patient) were used in the present study, randomly split into 53 training, 3 validation, and 33 test patients. Ten-fold validation of the framework in the sCT reconstruction task has previously been carried out by our group (Olberg et al 2019), and an additional hold-out validation on 3 patients is performed here. The 33 test patients were qualitatively subdivided at the time of data collection, prior to testing, into the two classes described above: 13 well-matched patients in Class 1 and 20 patients with notable discrepancies in bowel filling in Class 2. CT simulation scans acquired on a dedicated simulation machine (Brilliance CT, Philips Medical Systems, Andover, MA) were registered with 0.35 T MRI scans (nominally 276 × 276 × 80 matrix, 1.63 × 1.63 × 3 mm³ voxels) acquired on the MRIdian system (ViewRay Inc., Oakwood Village, OH) with a bSSFP sequence before being exported for pre-processing. Processing yielded a training data set of 2017 paired images, which were padded to 520 × 520 and then randomly cropped to 320 × 320 during training. The framework was trained for 1500 epochs with a batch size of 1 using TensorFlow (Abadi et al 2015) v1.7.0 in Python on a 12 GB Titan Xp GPU (NVIDIA, Santa Clara, CA).
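
The padding and random-cropping step can be sketched in NumPy as follows. The centered placement and the fill values (0 for MR, -1000 HU for CT) are assumptions not stated above, and the function names are hypothetical; the essential point is that an MR slice and its paired CT slice share the same crop offsets so that they stay aligned.

```python
import numpy as np


def pad_to(image, size=520, fill=0.0):
    """Center-pad a 2D slice to size x size with a constant fill value."""
    h, w = image.shape
    if h > size or w > size:
        raise ValueError("slice larger than the padded size")
    top, left = (size - h) // 2, (size - w) // 2
    out = np.full((size, size), fill, dtype=float)
    out[top:top + h, left:left + w] = image
    return out


def paired_random_crop(mr, ct, crop=320, rng=None):
    """Cut the same random crop window from an MR slice and its paired CT slice."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = mr.shape
    y = int(rng.integers(0, h - crop + 1))
    x = int(rng.integers(0, w - crop + 1))
    return mr[y:y + crop, x:x + crop], ct[y:y + crop, x:x + crop]


rng = np.random.default_rng(0)
mr = pad_to(rng.random((276, 276)), fill=0.0)          # assumed MR background value
ct = pad_to(np.full((276, 276), 40.0), fill=-1000.0)   # assumed CT fill: air in HU
mr_crop, ct_crop = paired_random_crop(mr, ct, rng=rng)
print(mr_crop.shape, ct_crop.shape)                    # (320, 320) (320, 320)
```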

2.3. Pre-processing

A primary challenge in training a generative model to solve an image-to-image translation task such as this is constructing a set of training data consisting of well-matched pairs of images. This process becomes even more complicated in the abdomen when the variable presence of intestinal gas must be considered. Corresponding MRI and CT scans used in treatment planning may demonstrate notable mismatches in bowel filling and position that present a barrier to accurate dose calculations. While these mismatches could be handled during the treatment planning process in the clinical setting through intensive manual contouring to enable electron density overrides, we opt to avoid this entirely through the creation of an intensity projection prior. Here we adopt a novel approach to data augmentation for paired data DL applications in which incompatible representations of intestinal gas in corresponding MRI and CT scans are made compatible through the propagation of air from MRI to CT images. The handling of intestinal gas proceeded in the following steps:

(1)
Corresponding MRI and CT scans were rigidly registered in the ViewRay treatment planning system (ViewRay Inc., Oakwood Village, OH) to achieve a gross alignment, primarily of bony anatomy.
(2)
Using automated thresholding and morphological reconstruction operations, regions of air in each scan were identified. For CT images, Otsu's method (Otsu 1979) was used to compute a single threshold value with which the image could be quantized to produce a body mask that excludes, most notably, the couch. Within this mask, regions of air were thresholded with a histogram shape-based method by selecting intensities within an offset (defined here as 7 bin widths, nominally 140 HU) around the lower mode. Finally, binary erosion and dilation operations were performed to eliminate small, noisy regions and to produce a smoother segmentation, respectively. For MR images, a similar approach was taken to produce a body mask. In identifying regions of air within this mask, extra precautions were required to avoid selecting low-signal regions not containing air (e.g. the vertebral body). To this end, the erosion of an image quantized by five-level thresholding with Otsu's method was used as the basis for the morphological reconstruction (Vincent 1993) of segmented regions within the abdominal cavity, defined as the space enclosed by the body wall excluding the region around the vertebral body.
(3)
These regions in the CT images were infilled with realistic texture via harmonic inpainting to produce a CT image with no air-containing regions (Schönlieb 2015, Parisotto and Schönlieb 2016).
(4)
Regions of air identified in the MR images in step (2) were then propagated to the corresponding CT images. As a result, regions of air originally present in the CT image but not in the corresponding MR image retain the infilled texture from step (3). Finally, the scans were deformably registered using a statistical dependence measure algorithm (Shi 2013).
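
The CT side of step (2), and the reconstruction operation used on the MR side, can be sketched with NumPy and `scipy.ndimage` as below. This is an illustrative approximation under stated assumptions, not the authors' implementation: the body mask keeps the largest Otsu foreground component and fills its holes (an assumed way of excluding the couch while retaining internal gas), the "lower mode" is taken as the most populated 20 HU bin below a second Otsu split of the in-body intensities, and the marker for the MR reconstruction is passed in precomputed rather than derived from the eroded five-level Otsu quantization. All function names are hypothetical.

```python
import numpy as np
from scipy import ndimage


def otsu_threshold(values, bins=256):
    """Single Otsu threshold: the split maximizing between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = 0.5 * (edges[:-1] + edges[1:])
    w0 = np.cumsum(hist).astype(float)      # counts in bins 0..i
    w1 = w0[-1] - w0                        # counts in bins i+1..end
    s0 = np.cumsum(hist * centers)
    mu0 = s0 / np.maximum(w0, 1.0)
    mu1 = (s0[-1] - s0) / np.maximum(w1, 1.0)
    between = w0 * w1 * (mu0 - mu1) ** 2
    return edges[1:][np.argmax(between)]    # upper edge of the best lower class


def ct_body_mask(ct):
    """Otsu foreground -> largest connected component (assumed) -> holes filled."""
    fg = ct >= otsu_threshold(ct.ravel())
    labels, n = ndimage.label(fg)
    if n == 0:
        return fg
    sizes = np.bincount(labels.ravel())
    sizes[0] = 0                            # ignore the background label
    return ndimage.binary_fill_holes(labels == np.argmax(sizes))


def ct_air_mask(ct, bin_width=20.0, offset_bins=7, iterations=1):
    """Air inside the body: HU within offset_bins * bin_width of the lower mode."""
    body = ct_body_mask(ct)
    vals = ct[body]
    hist, edges = np.histogram(
        vals, bins=np.arange(vals.min(), vals.max() + bin_width, bin_width))
    split = otsu_threshold(vals)            # assumed air-vs-tissue split
    mode_bin = np.argmax(np.where(edges[:-1] < split, hist, -1))
    mode = 0.5 * (edges[mode_bin] + edges[mode_bin + 1])
    air = body & (np.abs(ct - mode) <= offset_bins * bin_width)
    air = ndimage.binary_erosion(air, iterations=iterations)           # drop specks
    return ndimage.binary_dilation(air, iterations=iterations) & body  # smooth


def reconstruct_from_marker(candidate, marker):
    """Binary morphological reconstruction: keep candidate regions hit by marker."""
    return ndimage.binary_propagation(marker & candidate, mask=candidate)


# Synthetic phantom: soft-tissue disk with one gas pocket and one noisy pixel.
yy, xx = np.mgrid[:128, :128]
ct = np.full((128, 128), -1000.0)
ct[(yy - 64) ** 2 + (xx - 64) ** 2 <= 50 ** 2] = 40.0
pocket = (yy - 60) ** 2 + (xx - 70) ** 2 <= 8 ** 2
ct[pocket] = -1000.0
ct[40, 40] = -1000.0
air = ct_air_mask(ct)
```

A practical version would also need a guard for slices without gas, where the lower mode would fall on fat rather than air; no such guard is described above, so it is left out here.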

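Steps (3) and (4) can be sketched as below. Harmonic inpainting fills a masked region with the solution of Laplace's equation whose boundary values come from the surrounding pixels; here it is approximated with plain Jacobi iterations, which is slow but transparent. The value assigned to propagated MR air (-1000 HU), the convergence settings, and the function names are assumptions, and the deformable registration of step (4) (Shi 2013) is not reproduced.

```python
import numpy as np


def harmonic_inpaint(image, mask, max_iter=5000, tol=1e-3):
    """Fill mask with a discrete harmonic function via Jacobi iteration."""
    u = image.astype(float).copy()
    if not mask.any():
        return u
    u[mask] = u[~mask].mean()                        # initial guess inside the hole
    for _ in range(max_iter):
        p = np.pad(u, 1, mode="edge")
        neighbour_avg = 0.25 * (p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:])
        change = np.abs(neighbour_avg[mask] - u[mask]).max()
        u[mask] = neighbour_avg[mask]                # only masked pixels are updated
        if change < tol:
            break
    return u


def make_training_ct(ct, ct_air, mr_air, air_hu=-1000.0):
    """Remove CT air by inpainting, then stamp in the air seen on the paired MR."""
    out = harmonic_inpaint(ct, ct_air)
    out[mr_air] = air_hu                             # assumed HU for propagated air
    return out


yy, xx = np.mgrid[:64, :64]
ramp = 2.0 * xx                                      # smooth toy "tissue" background
ct = ramp.copy()
ct_air = (yy - 32) ** 2 + (xx - 32) ** 2 <= 6 ** 2   # gas present only in the CT
ct[ct_air] = -1000.0
mr_air = (yy - 20) ** 2 + (xx - 45) ** 2 <= 4 ** 2   # gas present only in the MR
train_ct = make_training_ct(ct, ct_air, mr_air)
```

A linear ramp is used in the example because it is itself harmonic, so the inpainted region should converge back to the original ramp values.
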
2.4. Model and loss formulation

The task of sCT reconstruction viewed as a forward mapping from MRI to CT has been discussed previously (Olberg et al 2019). Briefly, the goal in establishing a generative model is to estimate a suitable operator that maps MRI to CT, a task challenged by the many-to-one correspondence of pixel intensities between the two modalities. We approach this image-to-image translation task using a generative adversarial network framework consisting of two competing networks: (1) a generative model G that produces sCT samples residing in the same space as true CT data and (2) a discriminator D that attempts to distinguish between samples generated by G and true samples. During training, G and D undergo alternating minimization steps of their respective loss functions, each of which depends on the generalized definition of sigmoid cross entropy loss (Abadi et al 2015):

$$L=\vec{x}-\vec{x}\ast\vec{z}+\log\left(1+\exp(-\vec{x})\right), \tag{1}$$

where $\vec{x}$ denotes the logits computed by D for a true or predicted image and $\vec{z}$ is the corresponding label, true ( $\vec{1}$ ) or predicted ( $\vec{0}$ ). Using this definition of sigmoid cross entropy loss, the generator loss function g_loss is defined by

$$g_{\mathrm{loss}}=L_{\mathrm{adv}}+l_{\mathrm{mae}}, \tag{2}$$

where the adversarial loss L_adv is the sigmoid cross entropy loss (equation (1)) with predicted image logits $\vec{x}$ assigned a true label ( $\vec{z}=\vec{1}$ ) and the mean absolute error (MAE) loss l_mae is the mean of the absolute difference between true images I_true and predicted images I_pred:

$$l_{\mathrm{mae}}=\mathrm{mean}\left(\left|I_{\mathrm{pred}}-I_{\mathrm{true}}\right|\right). \tag{3}$$

Minimizing this loss yields synthetic images that the discriminator classifies as true images, through the adversarial term L_adv, while maintaining pixel-wise agreement between the generated and true images through the MAE term l_mae.

While true labels are assigned to predicted images in g_loss, the discriminator aims to correctly identify true and predicted images. As such, the discriminator loss function d_loss depends only on the sigmoid cross entropy loss:

$$d_{\mathrm{loss}}=L_{\mathrm{pred}}+L_{\mathrm{true}}, \tag{4}$$

where L_true and L_pred are the sigmoid cross entropy loss (equation (1)) evaluated on true or predicted image logits $\vec{x}$ with labels $\vec{z}=\vec{1}$ or $\vec{z}=\vec{0}$ , respectively. Whereas G strives, through the adversarial term L_adv, to generate outputs that D classifies as true images by assigning a true label to the predicted image logits, d_loss uses the correct labels: true for true CT images and false for predicted CT images.
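
Equations (1) to (4) can be written compactly in NumPy. The sketch below uses the numerically stable rearrangement max(x, 0) - x z + log(1 + exp(-|x|)), which is algebraically equal to equation (1) but avoids overflow for large negative logits. Averaging each term over its elements and leaving the adversarial and MAE terms unweighted are assumptions that mirror equations (2) and (4) as written; the function names are hypothetical.

```python
import numpy as np


def sigmoid_ce(logits, labels):
    """Element-wise sigmoid cross entropy, equation (1) in a stable form."""
    x = np.asarray(logits, dtype=float)
    z = np.asarray(labels, dtype=float)
    return np.maximum(x, 0.0) - x * z + np.log1p(np.exp(-np.abs(x)))


def generator_loss(pred_logits, i_pred, i_true):
    """Equation (2): adversarial term (predicted logits labelled true) + MAE."""
    l_adv = sigmoid_ce(pred_logits, np.ones_like(pred_logits)).mean()
    l_mae = np.abs(np.asarray(i_pred) - np.asarray(i_true)).mean()   # equation (3)
    return l_adv + l_mae


def discriminator_loss(true_logits, pred_logits):
    """Equation (4): true logits labelled 1, predicted logits labelled 0."""
    l_true = sigmoid_ce(true_logits, np.ones_like(true_logits)).mean()
    l_pred = sigmoid_ce(pred_logits, np.zeros_like(pred_logits)).mean()
    return l_pred + l_true


print(round(float(sigmoid_ce(0.0, 1.0)), 4))   # 0.6931, i.e. log 2 for an undecided D
```

Note that the MAE term is measured in image intensity units, so its weight relative to L_adv depends on how the images are scaled before training.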

Trainable parameters of each of the competing networks are optimized through alternating minimization steps of the respective loss functions. For d_loss, TensorFlow's (Abadi et al 2015) gradient descent optimizer is used with an initial learning rate of 0.000 02. In the following minimization step, g_loss is minimized using the Adam gradient-based stochastic optimization algorithm (Kingma and Ba 2017) with an initial learning rate of 0.0002, β₁ = 0.7, β₂ = 0.999, and $\hat{\epsilon }={10}^{-8}$ . In both cases, learning rates decay every 10 000 steps following a staircase exponential function with a decay rate of 0.95.
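
The staircase schedule multiplies the initial rate by 0.95 once every 10 000 steps, i.e. lr(step) = lr0 · 0.95^floor(step / 10 000). A plain-Python sketch with the stated hyperparameters follows; the dictionary layout and names are illustrative only and do not reproduce TensorFlow's API.

```python
# Stated optimizer settings; the structure and key names are illustrative.
OPTIMIZERS = {
    "discriminator": {"type": "gradient_descent", "lr0": 2e-5},
    "generator": {"type": "adam", "lr0": 2e-4, "beta1": 0.7, "beta2": 0.999, "eps": 1e-8},
}


def staircase_lr(step, lr0, decay_steps=10_000, decay_rate=0.95):
    """Exponential decay applied in discrete jumps every decay_steps steps."""
    return lr0 * decay_rate ** (step // decay_steps)


g_lr0 = OPTIMIZERS["generator"]["lr0"]
for step in (0, 9_999, 10_000, 25_000):
    print(step, staircase_lr(step, g_lr0))
```

If each network takes one step per training image (batch size 1, 2017 images per epoch), 1500 epochs would span roughly 3.0 million steps, or about 302 decay intervals; this step count is an inference under that assumption, not a figure reported here.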

2.5. Network architecture

2.5.1. Generator

The fully convolutional DenseNet (Jégou et al 2017) employed here, illustrated in figure 1, consists of individual dense blocks arranged in a stacked encoder-decoder U-net (Ronneberger et al 2015) structure. The constituent dense blocks are related to residual blocks (He et al 2015) but iteratively concatenate intermediate feature maps rather than summing them, and are built from the following components: batch normalization (BN), ReLU activation, 3 × 3 convolution with same padding, and dropout with probability p = 0.2. The growth rate k (here k = 16) sets the number of feature maps computed by each layer. The feature maps computed by these intermediate layers are iteratively concatenated to form the output of the dense block, providing a form of implicit deep supervision that aids convergence through the short paths to all feature maps, in an architecture that is ultimately quite parameter efficient (Jégou et al 2017). Transition down (TD) operations in the encoder path, which reduce the spatial dimensions of the feature maps, consist of BN followed by ReLU activation, 1 × 1 convolution, dropout with probability p = 0.2, and 2 × 2 max pooling with stride 2. To recover the input dimensions along the decoder path, transition up (TU) layers perform 3 × 3 transpose convolution with stride 2. Skip connections between corresponding layers of the encoder and decoder transfer structural information that aids the reconstruction of fine detail as the full input resolution is recovered.
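
The effect of the growth rate on feature-map counts can be tracked with simple bookkeeping: a dense block with n layers adds n·k channels to its input, and each transition down keeps the channel count while halving the spatial size. The layer counts (4, 5, 7, 10, 12) and the 48-channel initial convolution below are hypothetical, modeled on the FC-DenseNet103 configuration described by Jégou et al (2017) rather than stated for this model.

```python
def encoder_shapes(size=320, first_conv=48, layers=(4, 5, 7, 10, 12), k=16):
    """(spatial size, channels) after each dense block + transition down."""
    shapes = []
    channels = first_conv
    for n in layers:
        channels += n * k      # each of the n layers concatenates k new feature maps
        size //= 2             # TD: 1x1 conv keeps channels; 2x2 max pool, stride 2, halves size
        shapes.append((size, channels))
    return shapes


print(encoder_shapes())
# [(160, 112), (80, 192), (40, 304), (20, 464), (10, 656)]
```

Five stride-2 poolings reduce a 320 × 320 crop to 10 × 10 without remainder, since 320 is divisible by 2⁵; this is a general property of the pooling arithmetic, not a design rationale stated above.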

2.5.2. Discriminator

The architecture of the discriminator is unchanged from the previous application to sCT generation in the breast (Olberg et al 2019). D is a straightforward encoder consisting of five convolutional layers that ultimately applies the sigmoid function to yield the probability of the evaluated image being a true CT image.

2.6. Evaluation

The proposed approach to sCT reconstruction is evaluated in two primary ways, each with a focus on the two classes of patients previously discussed. Pixel-wise image comparisons are made between true CT images and reconstructed sCT images for patients belonging to each class using the MAE and mean absolute percentage error (MAPE) measured in regions within the body contour not containing air:

$$\mathrm{MAE}=\frac{\sum_{i=1}^{n}\left|CT_{i}-sCT_{i}\right|}{n}, \quad \text{and} \tag{5}$$

$$\mathrm{MAPE}=\frac{100}{n}\sum_{i=1}^{n}\left|\frac{CT_{i}-sCT_{i}}{CT_{i}}\right|, \tag{6}$$

where n is the number of pixels not containing air in either the CT reference image or the generated sCT image. These image comparisons are made for the sCT outputs of both the proposed model and a 'blind' model trained on the same patient data but without the pre-processing of air regions, which represents the conventional DL-based approach to the present problem. Because the reference CT image for patients of Class 2 may be largely incompatible with the corresponding MR image, we also evaluate the overlap of air regions in the input MR images and the reconstructed sCT outputs of each model using the Dice similarity coefficient (DSC):

$$\mathrm{DSC}=\frac{2\left|X\cap Y\right|}{\left|X\right|+\left|Y\right|}, \tag{7}$$

where X and Y are the sets of pixels in the air masks of an MR image and the corresponding sCT image, respectively. Additionally, the structural similarity index (SSIM), which assesses similarity through three distinct terms for luminance, contrast, and structure (Olberg et al 2019), is calculated between the input MR image and the reconstructed sCT output of each model in these air-containing regions.
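
A minimal NumPy sketch of equations (5) to (7) on pixel arrays follows. It assumes boolean air masks have already been computed for both images, and the air threshold in the example is illustrative only. Because equation (6) is undefined where CT_i = 0 HU, the sketch skips such pixels; that choice is an assumption, not a detail given in the text. SSIM is omitted.

```python
import numpy as np


def mae_mape(ct, sct, ct_air, sct_air, body):
    """Equations (5)-(6) over body pixels that are air in neither image."""
    valid = body & ~ct_air & ~sct_air
    diff = ct[valid] - sct[valid]
    mae = np.abs(diff).mean()
    nonzero = ct[valid] != 0              # assumption: skip 0 HU pixels (undefined ratio)
    mape = 100.0 * np.mean(np.abs(diff[nonzero] / ct[valid][nonzero]))
    return float(mae), float(mape)


def dice(x, y):
    """Equation (7): Dice similarity coefficient of two binary masks."""
    x, y = np.asarray(x, dtype=bool), np.asarray(y, dtype=bool)
    total = x.sum() + y.sum()
    return float(2.0 * np.logical_and(x, y).sum() / total) if total else 1.0


ct = np.array([100.0, 50.0, -1000.0, 20.0])
sct = np.array([90.0, 55.0, -1000.0, -990.0])
ct_air = ct < -500                        # illustrative air threshold, not from the text
sct_air = sct < -500
body = np.ones(4, dtype=bool)
print(mae_mape(ct, sct, ct_air, sct_air, body))   # (7.5, 10.0)
print(dice(ct_air, sct_air))                      # 2*1 / (1 + 2)
```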

A subsequent dosimetric evaluation compares dose distributions calculated at the first treatment fraction in simulation CT-based clinical plans to those recalculated using sCT-derived electron density information and the same optimization parameters used in the clinical plans for each of the 33 test patients. Full dose-volume histograms (DVHs) for the target and surrounding tissues of well-matched patients in Class 1 are first used to establish the baseline accuracy of the proposed approach to sCT reconstruction. The same comparison is made for patients in Class 2, who demonstrate notable discrepancies in intestinal gas between corresponding MRI and CT scans, to explore the effect of these discrepancies. For both patient classes, we examine differences in prescribed dose coverage of the PTV between the clinical CT-based plans and the proposed sCT-based plans. The 3D gamma index with a 3%/3 mm criterion is computed to evaluate agreement between the CT-based and sCT-based dose distributions for patients in each class (Low et al 1998). Additionally, mean DVH differences for each structure of interest are computed and tested for statistical significance in each patient class. For patients of Class 2, the full width at half maximum (FWHM) of the profile of the difference in calculated target coverage is used to evaluate the uncertainty in high-dose coverage of the target due to intestinal gas. Dose calculations in each case were performed with the Monte Carlo algorithm integrated in the ViewRay treatment planning system, in the presence of the magnetic field, with a dose grid resolution of 0.3 cm and a calculation uncertainty of 1%.
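
The gamma index combines a dose-difference criterion and a distance-to-agreement criterion (Low et al 1998). The sketch below is a simplified 1D version with global normalization on a common grid and 3%/3 mm defaults; the evaluation described above is 3D, and details such as a low-dose threshold or sub-grid interpolation are not specified here and are omitted. The profiles are toy data.

```python
import numpy as np


def gamma_1d(x_mm, ref, evl, dose_frac=0.03, dta_mm=3.0):
    """Global gamma for each reference point against an evaluated profile."""
    dd = dose_frac * ref.max()                          # global dose criterion (Gy)
    dist = (x_mm[None, :] - x_mm[:, None]) / dta_mm     # rows: reference points
    ddose = (evl[None, :] - ref[:, None]) / dd          # columns: evaluated points
    return np.sqrt(dist ** 2 + ddose ** 2).min(axis=1)


def pass_rate(gamma):
    """Percentage of points with gamma <= 1."""
    return 100.0 * np.mean(gamma <= 1.0)


x = np.arange(0.0, 100.0, 0.5)                          # 0.5 mm grid
ref = 50.0 * np.exp(-((x - 50.0) / 15.0) ** 2)          # toy CT-based profile
shifted = 50.0 * np.exp(-((x - 51.0) / 15.0) ** 2)      # same profile shifted 1 mm
scaled = 1.1 * ref                                      # 10% dose error
print(pass_rate(gamma_1d(x, ref, shifted)), pass_rate(gamma_1d(x, ref, scaled)))
```

A pure 1 mm shift passes everywhere because every point finds an exact dose match within one third of the 3 mm distance criterion, whereas a 10% scaling fails near the peak, where no evaluated point within 3 mm matches the reference dose to within 3% of the maximum.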

3. Results

3.1. Image comparison

Completing 1500 epochs during training required 126 h in total. At deployment, inference requires approximately 0.26 s/slice.
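
The reported timings translate into per-volume and per-epoch figures with simple arithmetic. The sketch assumes a volume of 80 slices (the nominal MR matrix depth given in section 2.2) and one optimizer step per training image at batch size 1; both are illustrative assumptions.

```python
SECONDS_PER_SLICE = 0.26     # reported inference time
SLICES_PER_VOLUME = 80       # assumed: nominal MR matrix depth
TRAINING_HOURS = 126         # reported total training time
EPOCHS = 1500
IMAGES_PER_EPOCH = 2017      # paired training images, batch size 1

per_volume_s = SECONDS_PER_SLICE * SLICES_PER_VOLUME
per_epoch_s = TRAINING_HOURS * 3600 / EPOCHS
per_step_s = per_epoch_s / IMAGES_PER_EPOCH   # assumes one step per image
print(f"{per_volume_s:.1f} s/volume, {per_epoch_s:.1f} s/epoch, {per_step_s:.3f} s/step")
# 20.8 s/volume, 302.4 s/epoch, 0.150 s/step
```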

Each of the following comparisons shows an input MR image along with the sCT reconstructions produced by the blind and proposed models and the corresponding true CT image. Difference maps illustrate absolute differences in pixel intensities between the true CT image and the sCT image in units of HU. Axial slices for representative patients belonging to Class 1 are displayed in figure 2. Absolute difference maps (figures 2(e)–(f)) between the true CT image and the sCT reconstruction for each patient show general agreement in the bulk of the soft tissue for the proposed model, whereas the blind model fails to reproduce HU values accurately. This is reflected in the MAE values computed for patients of Class 1: the blind and proposed approaches achieve 143 ± 29 HU (MAPE = 14 ± 3%) and 90 ± 29 HU (MAPE = 9 ± 3%), respectively. The last row of this comparison also includes an example of the relatively rare case of a notable presence of intestinal gas that is well matched in corresponding MRI and CT scans.

**Figure 2.** Image comparisons for representative slices of well-matched patients of Class 1. Input MR images (a), output sCT images for the blind model (b) and proposed model (c), true CT images (d), and absolute difference maps (e)–(f) between the true CT images and generated sCT images for the blind and proposed model, respectively. Values in the absolute difference maps are in units of HU. The final row illustrates the rare case of a relatively well-matched slice with a notable presence of intestinal gas.

In contrast, figure 3 shows the same comparison for patients of Class 2, in whom notable discrepancies in intestinal gas between corresponding MRI and CT scans are observed. These discrepancies give rise to pixel-wise disagreements on the order of ±800 HU in the involved gas-containing regions. The blind model again fails to produce accurate HU values in soft tissue. The increased likelihood of discrepancies in soft tissue position between corresponding scans for patients of Class 2 also increases the MAE computed in regions not containing air in either image, to 164 ± 41 HU (MAPE = 18 ± 5%) for the blind model and 112 ± 41 HU (MAPE = 14 ± 4%) for the proposed model.

The overlap of regions of air in MR images and the corresponding sCT reconstructions was evaluated in a total of 158 images from the four patients included in figure 3 using the DSC. The average DSC improved from 0.56 ± 0.22 for the blind model to 0.80 ± 0.21 for the proposed model. In these air-containing regions, the SSIM computed between the input MR image and reconstructed sCT image improved from 0.14 ± 0.06 to 0.34 ± 0.07 with the adoption of the proposed model over the blind model.

3.2. Dosimetric evaluation

The dosimetric evaluation of the proposed approach to sCT reconstruction likewise focused on the two distinct classes of patients. In both cases, the optimization parameters selected in the CT-based clinical plans were used to recalculate dose distributions based on electron density information derived from the generated sCT images. The thirteen well-matched patients of Class 1, used as a baseline to establish the dosimetric accuracy of the proposed reconstructions, show differences in prescribed dose coverage of the PTV (V₁₀₀) of 1.3 ± 2.1% between the CT-based clinical plans and the sCT-based plans, with a gamma pass rate of 98.3 ± 1.3% using the 3%/3 mm criterion. The representative DVHs for patients belonging to Class 1 shown in figure 4 demonstrate close agreement in calculated target coverage and doses to surrounding tissues between the CT-based clinical plans and the sCT-based recalculations.
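
PTV V₁₀₀ and the cumulative DVHs used in this comparison can be computed from a dose grid and a structure mask, as sketched below. Equal voxel volumes, a dose grid already sampled on the structure mask, and the function names are assumptions; treatment planning systems compute DVHs with their own binning and partial-volume handling. The doses are toy values, not patient data.

```python
import numpy as np


def cumulative_dvh(dose_gy, mask, bin_gy=0.1):
    """Percent of structure volume receiving at least each dose level."""
    d = np.sort(dose_gy[mask])
    levels = np.arange(0.0, d[-1] + bin_gy, bin_gy)
    below = np.searchsorted(d, levels, side="left")     # voxels with dose < level
    return levels, 100.0 * (d.size - below) / d.size


def v_at(dose_gy, mask, level_gy):
    """V_level: percent of the structure receiving >= level_gy (V100 at Rx)."""
    return 100.0 * np.mean(dose_gy[mask] >= level_gy)


# Toy PTV of 20 voxels with illustrative doses.
ptv = np.ones(20, dtype=bool)
ct_dose = np.arange(40.0, 60.0)          # 40 ... 59 Gy
sct_dose = ct_dose - 2.0                 # uniformly 2 Gy lower
rx = 50.0
print(v_at(ct_dose, ptv, rx) - v_at(sct_dose, ptv, rx))   # 10.0 (percentage points)
```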

**Figure 4.** Representative DVHs for well-matched test patients of Class 1 comparing the CT-based clinical plans (dashed) and sCT-based plans (solid) recalculated using the same plan parameters. The prescribed dose was 50 Gy in all but the last case. The legend at top-left is applicable to all sub-figures.

For the twenty poorly matched patients of Class 2, notable discrepancies in the representation of intestinal gas between corresponding MRI and CT scans result in sizeable and variable differences in PTV V₁₀₀: 13.3 ± 11.0% on average. Because of these differences, the gamma pass rate is reduced to 93.9 ± 9.8% using the same 3%/3 mm criterion. These differences in target coverage, along with small discrepancies in the dose to closely involved tissues such as the duodenum, are visible in the representative DVHs in figure 5. Also plotted in figure 5 is the difference in target coverage at each dose level, which yields an approximately Gaussian profile. Averaged over all patients in Class 2, the FWHM of this profile spans 51.4 (SD = 1.3) to 58.2 (SD = 1.6) Gy. Mean DVH differences between the CT-based clinical plans and the sCT-based plans for all structures of interest are plotted in figure 6 for patients of each class. For patients in Class 1, differences between the CT-based and sCT-based plans are not statistically significant (consistent with a median of zero) according to a two-sided Wilcoxon signed rank test (Wilcoxon 1945) for each of the duodenum (p = 0.34), large bowel (p = 0.62), liver (p = 0.52), small bowel (p = 0.38), spinal cord (p = 0.91), stomach (p = 0.91), and PTV (p = 0.20). For patients in Class 2, differences are instead statistically significant (median differing from zero) according to a two-sided Wilcoxon signed rank test for each of the duodenum (p = 0.002), large bowel (p = 0.007), liver (p < 0.001), small bowel (p < 0.001), spinal cord (p = 0.01), stomach (p < 0.001), and PTV (p < 0.001).
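
The FWHM bounds of a coverage-difference profile, reported above as a dose range, can be extracted by locating where the profile crosses half its maximum and interpolating linearly between dose bins. The sketch assumes a single-peaked, non-negative profile sampled on an increasing dose axis; the Gaussian below is a toy stand-in, not patient data, and the function name is hypothetical.

```python
import numpy as np


def fwhm_bounds(x, y):
    """Dose levels where a single-peaked profile crosses half its maximum."""
    half = y.max() / 2.0
    above = np.nonzero(y >= half)[0]
    i0, i1 = above[0], above[-1]
    left = x[i0] if i0 == 0 else np.interp(half, [y[i0 - 1], y[i0]], [x[i0 - 1], x[i0]])
    right = x[i1] if i1 == y.size - 1 else np.interp(half, [y[i1 + 1], y[i1]], [x[i1 + 1], x[i1]])
    return float(left), float(right)


dose = np.linspace(40.0, 70.0, 3001)                       # dose axis (Gy)
diff = 20.0 * np.exp(-0.5 * ((dose - 55.0) / 2.0) ** 2)    # toy coverage difference (%)
lo, hi = fwhm_bounds(dose, diff)
print(round(lo, 2), round(hi, 2))                          # 52.65 57.35
```

For a Gaussian with standard deviation σ the width is 2√(2 ln 2)·σ ≈ 2.355σ, which gives a quick consistency check on the interpolated bounds.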

**Figure 5.** Representative DVHs for patients of Class 2 characterized by notable differences in the presence of intestinal gas between corresponding MRI and CT scans comparing the CT-based clinical plans (dashed) and sCT-based plans (solid) recalculated using the same plan parameters. Differences in calculated target coverage are plotted at each point (dotted) to yield an approximately Gaussian curve. The FWHM of these difference profiles for all patients of Class 2 covers an average range of 51.4–58.2 Gy. The legend at top-left is applicable to all sub-figures.

**Figure 6.** Summary of mean DVH differences between CT-based clinical plans and sCT-based recalculated plans for each structure of interest in well-matched patients of Class 1 (n = 13) and patients of Class 2 (n = 20) characterized by notable differences in the presence of intestinal gas between corresponding MRI and CT scans.

4. Discussion

The potential value of generating synthetic CT data for MRI-only ART in the abdomen is multifaceted. Although therapeutic gains may be achieved, adopting an online adaptive workflow adds time to treatment delivery through re-contouring, re-planning, and quality assurance, all of which must occur while the patient remains on-table. Re-contouring, which is needed to accommodate changes in normal tissue volumes and, ideally, the variable presence of intestinal gas that is our current focus, represents a significant portion of the total on-table time per fraction: up to 24 min in the worst case (Henke et al 2018). With the sCT reconstruction approach explored here, which centers on producing a clinically unavailable data set of well-matched representations of intestinal gas, sCT data can be produced rapidly (0.26 s/slice) in the clinical setting that accurately reflect both the intestinal gas shown in a patient's daily MRI scan and the HU values present in a true CT scan. In this way, one of the primary concerns prompting re-contouring in the adaptive setting may be eliminated. When paired with an auto-contouring strategy designed for MRI-guided ART (Fu et al 2018), the time burden associated with re-contouring may become negligible.

Another primary motivation for adopting an adaptive workflow is to achieve dose escalation under shifting anatomic conditions (Henke et al 2018). Plan adaptation is often performed to increase OAR sparing while also increasing target coverage (Henke et al 2018). The observed underdosing of the target for patients of Class 2, characterized by mismatched representations of intestinal gas, is especially relevant in scenarios where dose escalation is a specific aim of plan adaptation. The clinical plans, in which mismatches between the planning CT and setup MRI are not accounted for, show higher calculated target coverage than the sCT-based plans. This discrepancy, together with the uncertainty in high-dose regions (figure 5), represents a barrier to any dose escalation pursued in these scenarios. In the present study, we have explored this effect at the first treatment fraction for 33 test patients, demonstrating that even at the first fraction, a non-negligible portion of the patient population may experience uncertainties in simulation CT-based dose calculations due to the involvement of intestinal gas. This concern becomes even more important when considering dose accumulation over a course of treatment in which each fraction is adapted and these differences accrue. It is important to note, however, that the DVHs computed for a given treatment fraction and presented in the ViewRay treatment planning system are full rather than fractional DVHs. The DVHs examined in this study therefore convey the overall effect in the case that the magnitude of the discrepancies observed at the first fraction is carried forward through each subsequent fraction. Because gas motion is difficult to characterize, we do not examine here whether an interplay effect exists over the course of treatment.
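The coverage differences summarized in figure 5 come from subtracting two cumulative DVHs point by point and describing the resulting profile by its FWHM. The NumPy sketch below shows one way to do that bookkeeping. It assumes equal-volume voxels and a uniform dose-level grid, and it uses no interpolation; the dose values are synthetic, and modelling the sCT recalculation as a uniform 1 Gy reduction is an illustrative assumption, not a description of the actual plans.

```python
import numpy as np


def cumulative_dvh(dose_gy, dose_levels_gy):
    """Cumulative DVH: percent of structure volume receiving at least each dose level.

    Assumes every voxel represents the same volume.
    """
    dose = np.asarray(dose_gy, dtype=float).ravel()
    levels = np.asarray(dose_levels_gy, dtype=float)
    # Compare every level (rows) with every voxel (columns), then average over voxels.
    return 100.0 * (dose[None, :] >= levels[:, None]).mean(axis=1)


def fwhm_span(levels_gy, profile):
    """Dose levels where a single-peaked profile first and last reaches half its maximum.

    No interpolation: the result is only as fine as the dose-level spacing.
    """
    profile = np.asarray(profile, dtype=float)
    idx = np.flatnonzero(profile >= profile.max() / 2.0)
    return float(levels_gy[idx[0]]), float(levels_gy[idx[-1]])


# Synthetic target doses (not patient data). The sCT-based recalculation is
# modelled, purely for illustration, as a uniform 1 Gy reduction.
rng = np.random.default_rng(0)
ct_dose = rng.normal(loc=55.0, scale=2.0, size=20_000)
sct_dose = ct_dose - 1.0

levels = np.arange(40.0, 70.0, 0.1)
# Positive values mean the CT-based plan reports higher coverage at that dose level.
diff = cumulative_dvh(ct_dose, levels) - cumulative_dvh(sct_dose, levels)
low, high = fwhm_span(levels, diff)
print(f"FWHM of coverage difference: {low:.1f}-{high:.1f} Gy")
```

Because two similarly shaped cumulative DVHs offset in dose differ mainly where they fall off, their difference is single-peaked. That is consistent with the approximately Gaussian difference curves described for figure 5.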

We acknowledge a number of additional limitations to the present study. First and foremost, the image-to-image translation approach employed here struggles fundamentally when the anatomy represented in corresponding training image pairs differs. Although we took care in the present study to propagate regions of air, geometric differences in the soft tissues surrounding the target are not always handled sufficiently by multi-modal deformable image registration. These effects are apparent in the image comparisons, where outputs of the blind model appear uniformly lower in intensity than the true CT images. We attribute this to the increased ambiguity in pixel correspondences in the training image pairs: in effect, more pixel intensities are pushed toward the low intensity of air. In the proposed case with the pre-processed training data set, the limiting ambiguity becomes the difference in soft tissue representation between corresponding MR and CT images. The model is reasonably robust to the variations present in the training data set, but struggles to faithfully reconstruct the most dynamic tissues and structures. Style-transfer methods that do not rely on matched pairs of training data, as exemplified by CycleGAN (Zhu et al 2018), may be of particular use in this application to overcome the limitations of multi-modal deformable image registration. Even so, an unpaired approach may still benefit from semi-supervised data produced in the manner described here, which could compensate for its weaknesses in cases with inherent ambiguity, such as MRI signal intensity.

A second limitation of the trained model is reduced reconstruction accuracy at the superior and inferior extremes of a patient's image stack, which stems from the make-up of the training data set. While every patient data set was roughly centered on the target and surrounding tissues, slices containing the lungs and diaphragm or inferior portions of the abdomen were less well represented. These inaccuracies may be ameliorated by adopting a more robust 3D network architecture in place of the relatively lightweight 2D architecture, at the expense of substantially increased memory usage, which may not be a feasible trade-off in all settings. An additional concern regarding MR image quality in the abdominal sCT reconstruction task is susceptibility artifacts in the GI tract. The effects of these artifacts are lessened by the lower field strength of the MRgRT platform utilized here (Ginn et al 2017), but artifacts following the ingestion of fortified foods prior to treatment have been reported (Green et al 2018). Although such circumstances were not encountered in the present study, they remain an important consideration in this application.
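The attribution of the blind model's uniformly lower intensities to ambiguous pixel correspondences can be illustrated with a toy calculation. The training loss is not restated in this section, so the sketch below assumes a pixel-wise regression loss (L2 or L1) purely for illustration; any adversarial term in a generative model would alter this picture. It considers one MR appearance that is paired, across training examples, with CT targets that disagree because gas moved between scans. The HU values are assumed nominal values.

```python
import numpy as np

# Assumed nominal intensities (HU) for illustration only.
AIR_HU, SOFT_TISSUE_HU = -1000.0, 40.0


def optimal_predictions(targets_hu):
    """Constant prediction minimizing each loss over conflicting training targets.

    The mean minimizes squared error (L2); a median minimizes absolute error (L1).
    """
    t = np.asarray(targets_hu, dtype=float)
    return {"L2": float(t.mean()), "L1": float(np.median(t))}


# Ten hypothetical training pairs sharing the same MR appearance, a growing
# fraction of which show gas at that location on CT.
for air_fraction in (0.0, 0.2, 0.6):
    n_air = int(round(air_fraction * 10))
    targets = [AIR_HU] * n_air + [SOFT_TISSUE_HU] * (10 - n_air)
    print(air_fraction, optimal_predictions(targets))
```

Under an L2 loss, even a minority of air-labelled targets drags the optimal prediction well below soft tissue (to -168 HU with 2 of 10 air targets). Under an L1 loss, the prediction flips to air once air targets are the majority. Either way, mismatched pairs bias outputs toward the low intensity of air.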

An inherent challenge in this space is image evaluation when the underlying premise is that the 'ground truth' simulation CT data are incompatible with the setup MRI data because of intestinal gas and the motion of GI structures. Both the MAE and MAPE are computed from direct pixel-wise comparisons between the sCT images and the corresponding CT images, so any anatomical mismatch between the MRI and CT scans is counted as reconstruction error. Similarly, the multi-modality comparison between MR images and sCT reconstructions by way of the SSIM is confounded by modality-specific differences in contrast and luminance. These image comparisons are therefore imperfect, and we instead rely more heavily on the improvement in the representation of air in our sCT reconstructions as measured by the DSC, since that is the primary focus of this work. Nonetheless, a comparison to other methods is warranted. Ahunbay et al (2019) achieved an MAE of approximately 25 HU in the abdomen using a method relying entirely on multiple deformations of true CT images. Closer to the realm of true image synthesis, multiple atlas-based techniques have reported values ranging from 40 to 200 HU for sites including the pelvis, cranium, and general torso (Edmund and Nyholm 2017, Johnstone et al 2018, Guerreiro et al 2019). Finally, results of DL-based approaches at various sites, quantified using various metrics, have been collected by Spadea et al (2021) and offer a general point of comparison for the results achieved here. For abdominal cases demonstrating little involvement of intestinal gas, Cusumano et al (2020) report an MAE of 78.71 ± 18.46 HU within the body contour. In a similar cohort, Qian et al (2020) report MAE values ranging from approximately 42 to 79 HU for two DL-based methods. Although the MAE values we report do not distinguish between soft tissue and bone regions, they compare competitively with those of existing methods.
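For readers reproducing these comparisons, the sketch below computes two of the metrics discussed above: MAE in HU restricted to a body mask, and the DSC between air masks. It is a minimal NumPy sketch under stated assumptions. The -400 HU air threshold and the helper names are hypothetical, and the source of the reference air mask (CT, or a mask derived from the MRI) is left to the caller. MAPE and SSIM are omitted here.

```python
import numpy as np

AIR_THRESHOLD_HU = -400.0  # hypothetical gas threshold, not taken from this work


def mae_hu(sct_hu, ct_hu, body_mask):
    """Mean absolute error (HU) over voxels inside the body contour."""
    m = np.asarray(body_mask, dtype=bool)
    sct = np.asarray(sct_hu, dtype=float)
    ct = np.asarray(ct_hu, dtype=float)
    return float(np.mean(np.abs(sct[m] - ct[m])))


def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A∩B| / (|A| + |B|); 1.0 if both masks are empty."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0
    return float(2.0 * np.logical_and(a, b).sum() / total)


def air_mask(image_hu, body_mask, threshold=AIR_THRESHOLD_HU):
    """Voxels inside the body whose intensity falls below the (assumed) air threshold."""
    return (np.asarray(image_hu, dtype=float) < threshold) & np.asarray(body_mask, dtype=bool)


# Tiny hypothetical 2x2 example.
sct = np.array([[-1000.0, 40.0], [60.0, -900.0]])
ref = np.array([[-1000.0, 0.0], [40.0, 0.0]])
body = np.ones_like(sct, dtype=bool)
print(mae_hu(sct, ref, body), dice(air_mask(sct, body), air_mask(ref, body)))
```

The example also illustrates the evaluation problem noted above. If the reference image lacks a gas pocket that the sCT correctly reproduces from the daily MRI (the lower-right voxel), that voxel inflates the MAE and lowers the DSC even though the sCT matches the MRI anatomy.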

Finally, the separation of test patients into two classes performed here relied on a qualitative assessment of the relative involvement of intestinal gas and the degree to which representations of intestinal gas matched between corresponding MRI and CT scans. In some cases the characterization of the patient was clear (there were easily observable discrepancies, or gas was entirely uninvolved), but in other cases categorization was more challenging. Future work would therefore benefit from a quantitative approach to characterizing the involvement and similarity of representations of intestinal gas, which would in turn enable the exploration of trends in distinct patient groups.

5. Conclusions

The approach to sCT reconstruction in the abdomen evaluated here highlights the challenges posed by the presence of intestinal gas throughout the MRI-guided ART workflow. Removing the burden of handling intestinal gas from the clinical setting, by creating a clinically unavailable data set on which to train a paired-data generative model, offers the potential to streamline a time-intensive portion of the adaptive treatment workflow. These time savings come while also enabling accurate dose calculations in adaptive treatments, despite the variable presence of intestinal gas at each stage of treatment planning and delivery during MRI-only ART in the abdomen.

Acknowledgments

JSK acknowledges that this work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2020R1A4A101661911).

Conflicts of interest

The authors have no conflicts to disclose.
