Feasibility of CycleGAN enhanced low dose CBCT imaging for prostate radiotherapy dose calculation

Daily cone beam computed tomography (CBCT) imaging during the course of fractionated radiotherapy treatment can enable online adaptive radiotherapy but also exposes patients to a non-negligible amount of radiation dose. This work investigates the feasibility of low dose CBCT imaging capable of enabling accurate prostate radiotherapy dose calculation with only 25% of the projections, by overcoming under-sampling artifacts and correcting CT numbers with cycle-consistent generative adversarial networks (cycleGAN). Uncorrected CBCTs of 41 prostate cancer patients, acquired with ∼350 projections (CBCTorg), were retrospectively under-sampled to 25% dose images (CBCTLD) with only ∼90 projections and reconstructed using the Feldkamp–Davis–Kress (FDK) algorithm. We adapted a cycleGAN with a shape loss to translate CBCTLD into planning CT (pCT) equivalent images (CBCTLD_GAN). An alternative cycleGAN with a generator residual connection was implemented to improve anatomical fidelity (CBCTLD_ResGAN). Unpaired 4-fold cross-validation (33 patients) was performed so that the median of the 4 resulting models could be used as output. Deformable image registration was used to generate virtual CTs (vCT) for Hounsfield unit (HU) accuracy evaluation on 8 additional test patients. Volumetric modulated arc therapy plans were optimized on vCT and recalculated on CBCTLD_GAN and CBCTLD_ResGAN to determine dose calculation accuracy. CBCTLD_GAN, CBCTLD_ResGAN and CBCTorg were registered to pCT, and residual shifts were analyzed. Bladder and rectum were manually contoured on CBCTLD_GAN, CBCTLD_ResGAN and CBCTorg and compared in terms of Dice similarity coefficient (DSC) and average and 95th percentile Hausdorff distance (HDavg, HD95). The mean absolute error decreased from 126 HU for CBCTLD to 55 HU for CBCTLD_GAN and 44 HU for CBCTLD_ResGAN. For the PTV, the median differences of D 98%, D 50% and D 2% comparing CBCTLD_GAN to vCT were 0.3%, 0.3% and 0.3%, and comparing CBCTLD_ResGAN to vCT were 0.4%, 0.3% and 0.4%.
Dose accuracy was high, with 2% dose difference pass rates of 99% for both networks (10% dose threshold). Compared to the CBCTorg-to-pCT registration, the majority of mean absolute differences of rigid transformation parameters were less than 0.20 mm/0.20°. For bladder and rectum, the DSC were 0.88 and 0.77 for CBCTLD_GAN and 0.92 and 0.87 for CBCTLD_ResGAN compared to CBCTorg, and HDavg were 1.34 mm and 1.93 mm for CBCTLD_GAN, and 0.90 mm and 1.05 mm for CBCTLD_ResGAN. The computational time was ∼2 s per patient. This study investigated the feasibility of adapting two cycleGAN models to simultaneously remove under-sampling artifacts and correct image intensities of 25% dose CBCT images. High accuracy in dose calculation, HU and patient alignment was achieved. CBCTLD_ResGAN achieved better anatomical fidelity.


Introduction
In modern image-guided radiotherapy (IGRT), cone beam computed tomography (CBCT) is a routine in-room imaging technique. Most radiotherapy centers have medical linear accelerators equipped with a kilovoltage CBCT (kV-CBCT) scanner, which provides full three-dimensional (3D) information on the patient's anatomy at every treatment fraction. In the presence of inter-fractional anatomical changes between acquisition of the planning CT (pCT) and the treatment day, CBCT imaging data could be used for treatment adaptation, enabling accurate dose delivery (de Jong et al 2021, Moazzezi et al 2021, Sibolt et al 2021, Byrne et al 2022).
One primary problem in using CBCT for treatment adaptation is that CBCT image quality is typically insufficient to calculate and adapt the daily applied dose (Kurz et al 2015). In the current literature, CBCT intensity correction techniques have typically been investigated on standard full dose scans. The wide range of techniques includes look-up-table based solutions (Kurz et al 2015), the use of pCT-to-CBCT virtual CT (vCT) (Peroni et al [...] (Kurz et al 2015), might struggle in the pelvic region owing to the more pronounced and complex inter-fractional changes in anatomy. While DIR inaccuracies could be mitigated by using the vCT as a prior for projection based intensity correction (Niu et al 2010, Park et al 2015, Kurz et al 2016), the several minutes required to generate the corrected images hinder their use for online treatment adaptation. Similarly, MC based methods, which take up to several hours, are not suitable. Recently, the use of deep convolutional neural networks (CNNs) to speed up CBCT correction has received substantial interest. The U-Net architecture (Ronneberger et al 2015) has been employed to translate images across domains and correct CBCT intensities. In Kida et al (2018), a U-Net was trained with CBCT as input and vCT as target to translate the CBCT into a pCT equivalent image. Other U-Nets were trained for projection based image correction using MC simulated scatter distributions (Maier et al 2018, 2019) or corrected projections retrieved with a previously validated algorithm based on a vCT prior (Hansen et al 2018). Apart from the U-Net, generative adversarial networks (GAN) (Goodfellow et al 2014) have been applied to translate CBCT into pCT images. In particular, the cycle-consistent GAN (cycleGAN) architecture has received considerable attention for unpaired training.
For example, in studies of the brain and pelvic region (Harms et al 2019) (albeit using an additional paired loss term), the head-and-neck (HN) region (Liang et al 2019) and the pelvic region (Kida et al 2019), dosimetric analyses of cycleGAN-corrected CBCT images were included, demonstrating high dose calculation accuracy for photon therapy. Most deep learning based correction methods take less than a minute.
Using CBCT in IGRT increases the precision of the treatment but also adds to the dose delivered to healthy tissues. One additional concern is thus that the imaging dose received from repeated CBCT scans over 20-35 fractions might be considerable and increase the risk of secondary malignancies. Kan et al (2008) measured the dose from CBCT in a female anthropomorphic phantom with thermoluminescent dosimeters and reported the effective and absorbed doses to 26 organs for standard and low-dose imaging modes. The effective whole-body dose from standard-mode pelvic CBCT was 22.7 mSv per scan. They concluded that daily CBCT could add 2%-4% to the absolute secondary cancer risk. The radiation-induced cancer risk due to organ doses from kV-CBCT was also estimated by Kim et al (2013). Absorbed dose measurements in a cylindrical and in an anthropomorphic phantom yielded 170-187 mGy for the pelvic scan protocol, from which they concluded that 70% of the additional secondary cancer risk from radiotherapy of prostate patients can be attributed to CBCT imaging. Therefore, the excess radiation-induced cancer risk from CBCT is not negligible.
According to the report of the American Association of Physicists in Medicine (AAPM) Therapy Physics Committee Task Group 180 (Ding et al 2018), imaging dose should be considered in the treatment planning process if it exceeds 5% of the therapeutic target dose, and in general the 'as low as reasonably achievable' (ALARA) principle should be pursued for imaging. In current clinical practice, radiation oncologists typically use the lowest possible radiation dose to obtain CBCT images, or try to limit the frequency of CBCT imaging during treatment to reduce the risk of secondary cancers from cumulative CBCT dose. Lower dose CBCT at equivalent image quality could thus be favourable, as it offers higher flexibility in terms of pretreatment imaging frequency. Reducing dose, however, is challenging since CBCT image quality is further degraded, potentially leading, among other effects, to a loss of anatomical information.
Prior research has thoroughly investigated CBCT correction; however, it remains to be investigated whether advances in deep learning can be leveraged to substantially reduce CBCT dose while jointly correcting CBCT image intensity and retaining therapeutic dose calculation accuracy. To address the needs of (1) CBCT dose reduction and (2) improved image quality for dose adaptation, our study investigates a cycleGAN-based low dose CBCT approach that translates a CBCT reconstructed from a reduced number of projections (approximately 90), namely CBCT LD, into a pCT equivalent image, referred to as CBCT LD_GAN, by simultaneously removing under-sampling artifacts and correcting image intensities while preserving anatomical fidelity. In parallel to CBCT LD_GAN, we also implemented an alternative cycleGAN with a generator residual connection to improve anatomical fidelity, referred to as CBCT LD_ResGAN.

Materials and methods
Patient data
Data acquisition
In this study, pCT and CBCT imaging datasets were collected from 41 prostate cancer patients who received volumetric modulated arc therapy (VMAT) to a total dose of 70-76 Gy in 2 Gy fractions at the Department of Radiation Oncology of the LMU Munich University Hospital. All patients were advised to follow an in-house bladder and rectum filling protocol. The pCTs were acquired with a Toshiba Aquilion LB CT scanner (Canon Medical Systems, Japan) at a tube voltage of 120 kV. An image grid of 1.074 mm × 1.074 mm × 3.000 mm was used in combination with a 55 cm lateral field of view (FOV). No contrast agent was used.
To prevent saturation of the detector panel and body outline artifacts, all retrospectively selected CBCT images were acquired in treatment position with a scan protocol of 120 kV tube voltage, 20 ms exposure time and 20 mA x-ray tube current per projection, using the XVI system (version 5.52) of a Synergy medical linear accelerator (Elekta, Sweden). This is the lowest dose pelvic protocol at our institution. The lateral FOV was increased by using a laterally shifted detector panel in M position and a bowtie filter. Images with body outline truncation despite the increased FOV were excluded from the study. Around 350 projections (range 346-357) were acquired in each 360° scan.

Data preparation
To generate a low dose CBCT LD from the full dose CBCT org, the CBCT projection data were uniformly under-sampled by a factor of 4 (keeping 25% of the projections), from about 350 to about 90 projections, and reconstructed using the Feldkamp-Davis-Kress (FDK) implementation of the Reconstruction ToolKit (RTK) (Rit et al 2014) with 410 × 410 × 264 voxels on an isotropic 1.0 mm grid. The patient couch was removed from the CBCT image by thresholding and morphological masking, and the image was zero-padded to 512 × 512 pixels, with pixel intensities given as attenuation coefficients (μ) clipped to the range [0, 0.04] (values above 0.04 were set to 0.04). The first and last 35 image slices in the superior-inferior direction, which are affected by partial FOV cone truncation, were excluded. The pCTs were resampled to the same grid and image size using a linear interpolator from the SimpleITK library, and the table was also removed from these images. The pixel intensities of the CT images were empirically converted to the range of the CBCT images via (HU + 1024)/65536 (Park et al 2015), and the resulting intensities were clipped to the range [0, 0.05] (values above 0.05 were set to 0.05). Patients were instructed to lie with arms down and forearms folded up during acquisition. Since pCT slices showing limbs were excluded, the training data covered the pelvis and lower abdomen. To incorporate patient outline information in the training, a binary mask of each pCT and CBCT image was created by thresholding. All images were stored in 16 bit format before training. The data pre-processing workflow is illustrated in figure 1.
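The intensity conversions and projection under-sampling described above can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function names are hypothetical, the 65536 divisor and the clipping bounds are taken from the text, and the symmetric zero-padding is an assumption about how the 410 × 410 slices were brought to 512 × 512.

```python
import numpy as np


def hu_to_mu_like(hu, upper=0.05):
    """Empirical HU -> CBCT-like intensity, (HU + 1024) / 65536, clipped to [0, upper]."""
    mu = (np.asarray(hu, dtype=np.float64) + 1024.0) / 65536.0
    return np.clip(mu, 0.0, upper)


def mu_like_to_hu(mu):
    """Inverse of the empirical scaling, used to express network outputs in HU."""
    return np.asarray(mu, dtype=np.float64) * 65536.0 - 1024.0


def clip_cbct(mu, upper=0.04):
    """Clip reconstructed CBCT attenuation values to [0, upper]."""
    return np.clip(mu, 0.0, upper)


def undersample_projection_indices(n_projections, factor=4):
    """Uniformly keep every `factor`-th projection (factor 4 keeps ~25%)."""
    return np.arange(0, n_projections, factor)


def pad_to_square(slice_2d, size=512):
    """Zero-pad a 2D slice (approximately symmetrically) to size x size pixels."""
    h, w = slice_2d.shape
    top, left = (size - h) // 2, (size - w) // 2
    return np.pad(slice_2d, ((top, size - h - top), (left, size - w - left)),
                  mode="constant", constant_values=0)


# Example: ~350 acquired projections -> 88 kept for the low dose reconstruction.
kept = undersample_projection_indices(350)
print(kept.size)  # 88
```

Note that all CT values above roughly 2253 HU map to the 0.05 upper bound, so the inverse scaling cannot recover them.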

CycleGAN architecture and training
Forward and backward cycles and loss function
To correct the intensity of low dose CBCT LD, we adapted a cycleGAN architecture (Ge et al 2019) to learn the image translation between low dose CBCT LD (input) and pCT equivalent images (output) with unpaired patient data (planning and fraction images). The framework chains two generator-discriminator pairs. In the forward cycle, the generator aims to learn an efficient representation of CBCT LD from which a synthetic pCT can be generated slice by slice, and the discriminator is used to distinguish synthetic pCTs (output label 0) from true pCTs (label 1). In the backward cycle, the roles of the two image domains are reversed. The loss function for both generators and discriminators consists of the terms described below.
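In figure 2 (panel (a)), a generator G pCT learns a mapping from CBCT LD to pCT such that the distribution of images G pCT (CBCT LD) is indistinguishable from the distribution of pCTs for a discriminator D pCT, using an adversarial loss in the forward cycle:
$$\mathcal{L}_{\mathrm{adv}}(G_{\mathrm{pCT}}, D_{\mathrm{pCT}}) = \mathbb{E}_{\mathrm{CBCT}_{\mathrm{LD}}}\left[\log\left(1 - D_{\mathrm{pCT}}\left(G_{\mathrm{pCT}}(\mathrm{CBCT}_{\mathrm{LD}})\right)\right)\right] + \mathbb{E}_{\mathrm{pCT}}\left[\log D_{\mathrm{pCT}}(\mathrm{pCT})\right],$$
where G pCT aims to minimize the first term, $\mathbb{E}_{\mathrm{CBCT}_{\mathrm{LD}}}[\log(1 - D_{\mathrm{pCT}}(G_{\mathrm{pCT}}(\mathrm{CBCT}_{\mathrm{LD}})))]$, by generating synthetic images G pCT (CBCT LD) that closely resemble pCT, while D pCT aims to maximize both terms and become as good as possible at distinguishing between synthetic images G pCT (CBCT LD) and real pCTs.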
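In figure 2 (panel (b)), the second generator G CBCT LD was trained to establish the inverse mapping from pCT to CBCT LD with the help of the second discriminator D CBCT LD in the backward cycle:
$$\mathcal{L}_{\mathrm{adv}}(G_{\mathrm{CBCT}_{\mathrm{LD}}}, D_{\mathrm{CBCT}_{\mathrm{LD}}}) = \mathbb{E}_{\mathrm{pCT}}\left[\log\left(1 - D_{\mathrm{CBCT}_{\mathrm{LD}}}\left(G_{\mathrm{CBCT}_{\mathrm{LD}}}(\mathrm{pCT})\right)\right)\right] + \mathbb{E}_{\mathrm{CBCT}_{\mathrm{LD}}}\left[\log D_{\mathrm{CBCT}_{\mathrm{LD}}}(\mathrm{CBCT}_{\mathrm{LD}})\right].$$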
With the above adversarial loss, the generators G pCT and G CBCT LD are encouraged to generate realistic images of the target domain in order to fool the discriminators D pCT and D CBCT LD.
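To stabilize the training and ensure inverse-consistent mappings between the two image domains, a cycle consistency loss L cyc is introduced to enforce G CBCT LD (G pCT (CBCT LD)) ≈ CBCT LD and G pCT (G CBCT LD (pCT)) ≈ pCT. In the forward cycle, L cyc computes the L1 norm of the difference between the output of G CBCT LD, given the generated synthetic pCT as input, and the input low dose CBCT LD:
$$\mathcal{L}_{\mathrm{cyc}}^{\mathrm{fwd}} = \mathbb{E}_{\mathrm{CBCT}_{\mathrm{LD}}}\left[\left\| G_{\mathrm{CBCT}_{\mathrm{LD}}}\left(G_{\mathrm{pCT}}(\mathrm{CBCT}_{\mathrm{LD}})\right) - \mathrm{CBCT}_{\mathrm{LD}} \right\|_1\right].$$
In the backward cycle, the roles of CBCT LD and pCT are again swapped, and the corresponding cycle consistency loss is:
$$\mathcal{L}_{\mathrm{cyc}}^{\mathrm{bwd}} = \mathbb{E}_{\mathrm{pCT}}\left[\left\| G_{\mathrm{pCT}}\left(G_{\mathrm{CBCT}_{\mathrm{LD}}}(\mathrm{pCT})\right) - \mathrm{pCT} \right\|_1\right].$$
The cycle consistency loss, however, does not directly enforce structural similarity between the input CBCT LD and the generated CT images. A previous CBCT-to-CT study has shown measurable deviations in the patient body outline. To incorporate patient outline information and geometrically constrain the generator, we adapted a shape loss as suggested in Ge et al (2019). A U-Net shape extractor (SE) was first trained for 5 epochs with paired pCTs as input and the corresponding binary masks as ground truth output. During cycleGAN training, the shape extractor segments the patient outline of the CBCT LD_GAN image generated by G pCT, and the L1 loss is computed between this new mask and the corresponding ground truth mask M CBCT LD of the input low dose CBCT LD:
$$\mathcal{L}_{\mathrm{shape}} = \mathbb{E}_{\mathrm{CBCT}_{\mathrm{LD}}}\left[\left\| \mathrm{SE}\left(G_{\mathrm{pCT}}(\mathrm{CBCT}_{\mathrm{LD}})\right) - M_{\mathrm{CBCT}_{\mathrm{LD}}} \right\|_1\right].$$
The total loss was therefore
$$\mathcal{L} = \mathcal{L}_{\mathrm{adv}}(G_{\mathrm{pCT}}, D_{\mathrm{pCT}}) + \mathcal{L}_{\mathrm{adv}}(G_{\mathrm{CBCT}_{\mathrm{LD}}}, D_{\mathrm{CBCT}_{\mathrm{LD}}}) + \lambda_1\left(\mathcal{L}_{\mathrm{cyc}}^{\mathrm{fwd}} + \mathcal{L}_{\mathrm{cyc}}^{\mathrm{bwd}}\right) + \lambda_2\,\mathcal{L}_{\mathrm{shape}},$$
where λ1 and λ2 are hyperparameters that were empirically set to 25 and 1 in this study. The objective function to be solved was
$$G_{\mathrm{pCT}}^{*},\, G_{\mathrm{CBCT}_{\mathrm{LD}}}^{*} = \arg\min_{G_{\mathrm{pCT}},\, G_{\mathrm{CBCT}_{\mathrm{LD}}}}\ \max_{D_{\mathrm{pCT}},\, D_{\mathrm{CBCT}_{\mathrm{LD}}}} \mathcal{L}.$$
Since this min-max optimization aims to find model parameters that describe the distribution of each image domain rather than relying on pixel-wise comparison, unpaired datasets could be used in this study.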
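To make the weighting of the loss terms concrete, the total objective can be evaluated for given discriminator outputs and images in a few lines of NumPy. This is a hedged sketch of the log-likelihood form of the adversarial loss, with hypothetical function and argument names. The L1 norms are replaced by per-pixel means, as is common in cycleGAN implementations, and practical implementations (including the original cycleGAN code) often substitute a least-squares adversarial loss for the log form.

```python
import numpy as np

EPS = 1e-12  # avoids log(0) for saturated discriminator outputs


def adversarial_term(d_real, d_fake):
    """E[log(1 - D(G(x)))] + E[log D(y)]; D outputs are probabilities in [0, 1]."""
    return float(np.mean(np.log(1.0 - d_fake + EPS)) + np.mean(np.log(d_real + EPS)))


def l1(a, b):
    """Per-pixel mean absolute difference (a normalized L1 norm)."""
    return float(np.mean(np.abs(np.asarray(a) - np.asarray(b))))


def total_loss(d_pct_real, d_pct_fake, d_cbct_real, d_cbct_fake,
               cbct_ld, cbct_ld_cycled, pct, pct_cycled,
               se_mask_of_synthetic_pct, cbct_ld_mask,
               lam1=25.0, lam2=1.0):
    """Adversarial terms of both cycles + lam1 * cycle terms + lam2 * shape term."""
    l_adv = (adversarial_term(d_pct_real, d_pct_fake)
             + adversarial_term(d_cbct_real, d_cbct_fake))
    l_cyc = l1(cbct_ld_cycled, cbct_ld) + l1(pct_cycled, pct)
    l_shape = l1(se_mask_of_synthetic_pct, cbct_ld_mask)
    return l_adv + lam1 * l_cyc + lam2 * l_shape
```

With λ1 = 25, a cycle reconstruction that is off by 0.01 everywhere adds 0.25 to the objective, whereas the shape term enters with weight 1; setting `lam2=0.0` reproduces the weighting reported for CBCT LD_ResGAN.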
We additionally trained a cycleGAN variant in which a residual skip connection was added to the generator (see figure 3). This approach has been reported to improve geometric fidelity to the input image in the field of histopathology (de Bel et al 2021) and has been used in a previous CBCT-to-CT study (Deng et al 2022). Since anatomical fidelity is critical in our application, we adopted this approach. As shown in figure 3, panel (a), G pCT GAN was trained to convert CBCT LD directly into CBCT LD_GAN. For CBCT LD_ResGAN, G pCT ResGAN was trained to convert CBCT LD into an intermediate image with reversed intensities that suppresses the streak artifacts of the CBCT LD input image when added to it, as shown in panel (b). In the backward cycle, the other generator G CBCT LD of the CBCT LD_ResGAN approach was likewise trained to obtain its final output by adding the pCT input to its intermediate image. Hyperparameters λ1 and λ2 were empirically set to 25 and 0 for CBCT LD_ResGAN, since the shape loss did not improve the performance of CBCT LD_ResGAN, as opposed to CBCT LD_GAN. Supplementary figures S1 and S2 illustrate the λ2 experiments for one exemplary ensemble model validation patient (section 2.2.2) for CBCT LD_GAN and CBCT LD_ResGAN, respectively.
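The effect of a residual skip connection can be illustrated with a toy NumPy example: the network only has to predict an intermediate image that is added to its input, so a zero prediction leaves the input anatomy untouched. The `oracle_residual` function below is a hypothetical stand-in for a trained generator that predicts the negative of a synthetic streak pattern; it is not the network used in this study.

```python
import numpy as np


def residual_generator(x, residual_net):
    """Residual skip connection: final output = input + predicted intermediate image."""
    return x + residual_net(x)


rng = np.random.default_rng(0)
anatomy = rng.uniform(0.0, 0.04, size=(64, 64))              # toy "true" slice
streaks = 0.005 * np.sin(np.linspace(0.0, 20.0 * np.pi, 64))  # toy streak profile
cbct_ld = anatomy + streaks[None, :]                          # streaky low dose input


def oracle_residual(x):
    """Hypothetical ideal intermediate image with reversed streak intensities."""
    return -np.broadcast_to(streaks[None, :], x.shape)


def zero_residual(x):
    """An untrained residual branch that outputs zeros."""
    return np.zeros_like(x)


corrected = residual_generator(cbct_ld, oracle_residual)  # streaks cancelled
identity = residual_generator(cbct_ld, zero_residual)     # input passed through
```

The second case reflects one common argument for residual generators: the identity mapping, and hence the input anatomy, is trivially representable, so the network is biased towards small, local corrections.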

Network training
In a geometric augmentation pipeline, we applied two-dimensional (2D) horizontal flipping and affine transformations, including rotation within [−5°, 5°] and scaling within [0.9, 1.1] with bicubic interpolation over 4 × 4 neighbouring pixels, to the CBCT and pCT inputs and their masks to enhance the generalisability of the model.
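One way to implement this augmentation is to draw a single 2 × 2 linear transform per training sample and reuse it for the image and its mask, so that both stay aligned. The sketch below is an assumption about the implementation (the helper name is hypothetical). The matrix could then be passed, together with an offset that keeps the slice centre fixed, to an interpolator such as `scipy.ndimage.affine_transform`, whose `order=3` spline interpolation is related to, but not identical with, bicubic interpolation over 4 × 4 neighbours.

```python
import numpy as np


def sample_augmentation_matrix(rng, max_rot_deg=5.0, scale_range=(0.9, 1.1), p_flip=0.5):
    """Draw one random flip/rotation/scaling as a 2x2 matrix acting on (row, col) coordinates."""
    theta = np.deg2rad(rng.uniform(-max_rot_deg, max_rot_deg))
    scale = rng.uniform(*scale_range)
    rotation = np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta), np.cos(theta)]])
    # A horizontal flip mirrors the column (left-right) axis.
    flip = np.diag([1.0, -1.0]) if rng.random() < p_flip else np.eye(2)
    return scale * rotation @ flip


rng = np.random.default_rng(42)
m = sample_augmentation_matrix(rng)
# The same matrix m would be applied to the CBCT/pCT slice and to its binary mask.
```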
For the generator, the encoder contains two convolutional layers with stride 2 and the decoder contains two deconvolutional layers with stride 2, with nine residual blocks between the encoding and decoding operations (Johnson et al 2016). For the discriminator, a 70 × 70 PatchGAN was employed with a downsampling scheme from 256 × 256 to 32 × 32, obtained by applying four series of 2D convolutional layers, each followed by instance normalization (Ulyanov et al 2016) except for the first and last layer, and by LeakyReLU with a slope of 0.2 as nonlinearity except for the last layer. The receptive field of the network was 70 × 70, and each pixel in the output was evaluated as a scalar in the range [0, 1]. The networks were implemented in PyTorch (v1.12.0).
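The 70 × 70 receptive field and the 256 → 32 downsampling follow from the layer configuration. Assuming the standard PatchGAN layout of the reference cycleGAN implementation (five 4 × 4 convolutions with strides 2, 2, 2, 1, 1 and padding 1, an assumption not stated above), the arithmetic can be checked in plain Python:

```python
def receptive_field(kernels, strides):
    """Receptive field of one output pixel, accumulated from the last layer backwards."""
    r = 1
    for k, s in reversed(list(zip(kernels, strides))):
        r = (r - 1) * s + k
    return r


def output_size(n, kernels, strides, padding=1):
    """Spatial size after a stack of convolutions (floor division, as in PyTorch)."""
    for k, s in zip(kernels, strides):
        n = (n + 2 * padding - k) // s + 1
    return n


KERNELS = [4, 4, 4, 4, 4]   # assumed 4 x 4 kernels throughout
STRIDES = [2, 2, 2, 1, 1]   # three downsampling layers, then two stride-1 layers

print(receptive_field(KERNELS, STRIDES))           # 70
print(output_size(256, KERNELS[:3], STRIDES[:3]))  # 32 after the strided layers
print(output_size(256, KERNELS, STRIDES))          # 30 x 30 patch scores
```

Each of the 30 × 30 output scores therefore judges one overlapping 70 × 70 patch of the 256 × 256 input, rather than the whole slice.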
Training was initialized from the pre-trained model provided by Ge et al (2019); training without the pre-trained model did not converge within the same number of epochs. The Adam optimizer was used for both generator and discriminator. The learning rate was set to 0.0002 during the first 100 epochs and gradually reduced to zero over the next 100 epochs. For input to the network, image patches were resampled to 256 × 256 pixels for data augmentation. The batch size was set to one. An RTX A6000 graphics processing unit (GPU) (NVIDIA, California, USA) was used.
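The learning-rate schedule can be written as a small function of the epoch index. A linear decay is assumed here (the text only states a gradual reduction), and the function name is hypothetical. In PyTorch, the ratio `learning_rate(epoch) / 2e-4` could be supplied as the multiplier returned by the `lr_lambda` of `torch.optim.lr_scheduler.LambdaLR`.

```python
def learning_rate(epoch, base_lr=2e-4, n_constant=100, n_decay=100):
    """Constant for the first n_constant epochs, then (assumed) linear decay to zero."""
    if epoch < n_constant:
        return base_lr
    fraction_decayed = (epoch - n_constant) / n_decay
    return base_lr * max(0.0, 1.0 - fraction_decayed)


for epoch in (0, 99, 150, 199, 200):
    print(epoch, learning_rate(epoch))
```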
Of the 41 patient datasets, 30 were used for training with unpaired data in four folds, each containing 25 patients. Three patient datasets were used as an ensemble model validation set and eight as a final test set. After training, the generators G pCT GAN and G pCT ResGAN were used to correct the intensity of CBCT LD by translating it slice-by-slice into pCT equivalent images, labelled CBCT LD_GAN and CBCT LD_ResGAN. As illustrated in figure 4, since four different folds were used for training the cycleGAN, four G pCT GAN and four G pCT ResGAN models with identical training hyper-parameters were obtained and applied to the ensemble model validation set. The median of the four model outputs was used as the final output. Every 10th epoch, we computed the mean absolute error (MAE) and mean error (ME) for the three ensemble model validation cases with respect to the reference vCT (section 2.3.1) and visually compared the appearance of soft tissues, bones, air cavities and body outline to find the optimal stopping epoch.
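The ensemble step reduces the four fold-specific outputs to a voxel-wise median; the voxel-wise maximum-minus-minimum range is a convenient by-product for inspecting disagreement between the models. A NumPy sketch with a hypothetical helper name:

```python
import numpy as np


def ensemble_median(fold_outputs):
    """Voxel-wise median and max-min spread over the outputs of the fold models."""
    stack = np.stack(fold_outputs, axis=0)          # shape: (n_models, ...)
    median = np.median(stack, axis=0)               # even n -> mean of the two middle values
    spread = stack.max(axis=0) - stack.min(axis=0)  # disagreement between models
    return median, spread


outputs = [np.full((2, 2), v) for v in (1.0, 2.0, 3.0, 10.0)]
median, spread = ensemble_median(outputs)
```

Because the median ignores a single outlying value, an artifact produced by only one fold model (such as a spurious air pocket) is largely suppressed in the ensemble output, whereas it would shift a plain mean.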

Data evaluation
Reference vCT and scatter corrected CBCT
Since there could be substantial anatomical differences between pCT and CBCT LD due to changes in bladder and rectum filling, as well as in patient positioning, the obtained images were not directly compared to the pCT to determine the accuracy of CBCT LD_GAN or CBCT LD_ResGAN. Instead, we generated a vCT by mapping the pCT to the daily CBCT via a dedicated DIR approach. As described in Hofmaier et al (2017), the registration balances (1) image similarity, measured by normalized gradient fields, and (2) deformation regularity, enforced by curvature regularization. The optimization problem is solved in a discretize-then-optimize scheme using a quasi-Newton L-BFGS optimizer.
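The similarity term of this DIR can be illustrated with a normalized gradient field (NGF) distance. The sketch below uses one common ε-regularized form, the pixel average of 1 − ((∇R·∇T + ε²) / (‖∇R‖_ε ‖∇T‖_ε))² with ‖x‖_ε = √(x·x + ε²); the exact formulation used by Hofmaier et al (2017), the ε value and the function name are assumptions, and the curvature regularizer and L-BFGS optimization of the actual registration are not shown.

```python
import numpy as np


def ngf_distance(reference, template, eps=1e-2):
    """Mean NGF distance between two 2D images; 0 means identical edge orientations."""
    g_ref = np.stack(np.gradient(np.asarray(reference, dtype=np.float64)), axis=0)
    g_tpl = np.stack(np.gradient(np.asarray(template, dtype=np.float64)), axis=0)
    inner = (g_ref * g_tpl).sum(axis=0) + eps**2
    norm_ref = np.sqrt((g_ref**2).sum(axis=0) + eps**2)
    norm_tpl = np.sqrt((g_tpl**2).sum(axis=0) + eps**2)
    return float(np.mean(1.0 - (inner / (norm_ref * norm_tpl)) ** 2))


edge = np.zeros((32, 32))
edge[:, 16:] = 1.0                   # vertical edge
print(ngf_distance(edge, edge))      # ~0: aligned edges
print(ngf_distance(edge, edge.T))    # > 0: orthogonal edges
```

Since only the orientation of the gradients enters the measure, it is largely insensitive to the differing intensity scales of pCT and CBCT, which makes gradient-based measures attractive for this kind of cross-modality registration.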
A CBCT correction technique validated in Park et al (2015) and Kurz et al (2016) was employed as an alternative reference for evaluating the network results and their comparison to vCT for the eight test cases. This reference correction approach was fully described in the original publications of Niu et al (2010) and Niu et al (2012) and in follow-up studies by Hansen et al (2018) and Landry et al (2019). The vCT is first forward projected according to the geometry of the CBCT scanner to retrieve primary beam projections (I pri). The scatter and other low frequency deviations (I sca) are calculated as the difference between the original CBCT org projection (I org), scaled by an intensity scaling factor (ISF), and I pri, followed by a generous smoothing function f. The scatter corrected projection (I cor) is then estimated by subtracting the scatter contribution from the original measured CBCT org projection. From I cor, a scatter-corrected CBCT, in the following referred to as CBCT cor, can be reconstructed, with HU values equivalent to the pCT and ideally the same anatomy as CBCT org. In line with CBCT LD, CBCT cor was reconstructed using the FDK algorithm with the same reconstruction settings.
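Written as equations, the steps of this prior-based correction read I_sca = f(ISF · I_org − I_pri) and I_cor = I_org − I_sca. The NumPy sketch below follows this description literally for a single projection. The separable Gaussian used for the smoothing function f, its width, the scalar ISF and the intensity-domain projections are assumptions (the original publications should be consulted for the actual filter and scaling), and the function names are hypothetical.

```python
import numpy as np


def gaussian_smooth(img, sigma=8.0):
    """Separable Gaussian blur with edge padding (stand-in for the smoothing function f)."""
    radius = int(3 * sigma)
    x = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (x / sigma) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)


def scatter_correct(i_org, i_pri, isf=1.0, sigma=8.0):
    """Estimate low-frequency scatter from the vCT prior and remove it from I_org."""
    i_sca = gaussian_smooth(isf * i_org - i_pri, sigma)
    i_cor = i_org - i_sca
    return i_cor, i_sca


rng = np.random.default_rng(1)
i_pri = rng.uniform(0.2, 1.0, size=(40, 50))  # toy primary projection from the vCT
i_org = i_pri + 0.1                           # toy measurement with uniform scatter
i_cor, i_sca = scatter_correct(i_org, i_pri, isf=1.0, sigma=4.0)
```

A rationale commonly given for the generous smoothing is that anatomical mismatches between vCT and CBCT are mostly local, high-frequency differences, so they only weakly affect the low-frequency scatter estimate.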

CT number accuracy
For the eight test cases, CBCT LD, CBCT LD_GAN and CBCT LD_ResGAN were compared to vCT in terms of the MAE and ME in HU. All pixel intensities were converted from model output in μ to HU using the inverse of the empirical scaling used for the pCT. Pixels outside the joint body outline of vCT and CBCT LD_GAN/CBCT LD or CBCT LD_ResGAN/CBCT LD were excluded.
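The ME and MAE restricted to the body can be computed as below. Whether the "joint" body outline denotes the union or the intersection of the two masks is not specified, so the combining operation is left as a parameter (union by default, an assumption); the function name is hypothetical.

```python
import numpy as np


def me_mae_hu(test_hu, ref_hu, mask_test, mask_ref, combine=np.logical_or):
    """Mean error and mean absolute error (HU) inside the combined body outline."""
    body = combine(np.asarray(mask_test, bool), np.asarray(mask_ref, bool))
    diff = np.asarray(test_hu, float)[body] - np.asarray(ref_hu, float)[body]
    return float(diff.mean()), float(np.abs(diff).mean())


test_hu = np.array([[10.0, -10.0], [0.0, 500.0]])
ref_hu = np.zeros((2, 2))
mask_test = np.array([[True, True], [False, False]])
mask_ref = np.array([[True, False], [True, False]])
me, mae = me_mae_hu(test_hu, ref_hu, mask_test, mask_ref)
```

The toy example also shows why both metrics are reported: errors of opposite sign cancel in the ME (here 0 HU) but not in the MAE.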

Dosimetric analysis
To determine dosimetric accuracy, we generated and recalculated VMAT plans on vCT, CBCT LD_GAN and CBCT LD_ResGAN for the eight test patients in a research version of a commercial treatment planning system (TPS) (RayStation, version 10.01, RaySearch, Sweden). Contours of target structures and organs-at-risk (OARs) were transferred via DIR from pCT to vCT, on which single-arc VMAT plans were optimized on an isotropic 3.0 mm dose grid using a collapsed-cone dose engine. These plans were then recalculated on CBCT LD_GAN and CBCT LD_ResGAN. The generic Elekta Synergy beam model with Agility multi-leaf collimator in the TPS was employed. The prescription was 74 Gy in 37 fractions, and we aimed for 100% of the clinical target volume (CTV) and more than 95% of the planning target volume (PTV) to receive at least 95% of the prescription dose (V 95%). We aimed at fulfilling the dose-volume histogram (DVH) constraints given in the QUANTEC report (Marks et al 2010) for the rectum and the bladder. Identical generic CT number to electron density conversion tables were employed for vCT, CBCT LD_GAN and CBCT LD_ResGAN in all cases. The dose distributions on vCT, CBCT LD_GAN and CBCT LD_ResGAN were then compared using 1%, 2% and 3% dose difference criteria. Voxels receiving less than 10% of the prescribed dose were excluded. In addition, the VMAT dose distributions for vCT, CBCT LD_GAN and CBCT LD_ResGAN were compared with regard to DVH parameters of clinically relevant target structures and OARs: CTV and PTV D 98% and D 2%, together with PTV D 50% and V 95%, were analyzed, as well as V 50/60/65 Gy for the rectum and V 60/65 Gy for the bladder.
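A dose difference pass rate of this kind can be sketched as the percentage of considered voxels whose absolute dose difference stays within the criterion. The normalization to the prescription dose (a global criterion) and the application of the 10% threshold to the reference (vCT) dose are assumptions, as is the helper name.

```python
import numpy as np


def dd_pass_rate(dose_eval, dose_ref, prescription, criterion=0.02, threshold=0.10):
    """Percent of considered voxels with |dose difference| <= criterion * prescription."""
    dose_eval = np.asarray(dose_eval, float)
    dose_ref = np.asarray(dose_ref, float)
    considered = dose_ref >= threshold * prescription  # low dose voxels are excluded
    diff = np.abs(dose_eval[considered] - dose_ref[considered])
    return 100.0 * float(np.mean(diff <= criterion * prescription))


dose_ref = np.array([74.0, 74.0, 74.0, 1.0])   # last voxel falls below the 10% threshold
dose_eval = np.array([74.5, 76.0, 74.0, 50.0])
rate = dd_pass_rate(dose_eval, dose_ref, prescription=74.0)
```

With a 74 Gy prescription, the 2% criterion corresponds to 1.48 Gy, so the 2 Gy deviation in the second voxel fails while the others pass.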
To evaluate the robustness of the dosimetric results to the reference image, the VMAT plans were additionally recalculated on CBCT cor and the dose distribution compared to the one from vCT with a 1% dose difference criterion.

Positioning accuracy
Daily patient positioning is one of the primary purposes of in-room CBCT. To evaluate registration accuracy when using CBCT LD_GAN and CBCT LD_ResGAN, we rigidly registered these images to the pCT in the research TPS, using automated gray-level rigid registration with six degrees of freedom. The resulting transformations were compared to the one obtained from registering CBCT org to the pCT.
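This comparison reduces to per-parameter mean absolute differences between the six-degree-of-freedom results obtained for each network output and for CBCT org. A sketch with hypothetical names and toy numbers; rows are patients and columns follow the order RL, IS, PA (mm) and pitch, roll, yaw (degrees):

```python
import numpy as np

PARAMETERS = ("RL [mm]", "IS [mm]", "PA [mm]", "pitch [deg]", "roll [deg]", "yaw [deg]")


def mean_abs_param_diff(params_test, params_ref):
    """Per-parameter mean absolute difference over patients (rows)."""
    diff = np.asarray(params_test, float) - np.asarray(params_ref, float)
    return np.abs(diff).mean(axis=0)


params_ref = np.zeros((2, 6))  # toy CBCT_org-to-pCT results
params_test = np.array([[0.1, 0.0, 0.0, 0.2, 0.0, 0.0],
                        [0.0, 0.0, 0.0, -0.2, 0.0, 0.5]])
summary = dict(zip(PARAMETERS, mean_abs_param_diff(params_test, params_ref)))
```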

Anatomical fidelity
To evaluate the networks' capability to preserve the anatomy correctly, we evaluated organ shapes geometrically. For this purpose, two OARs, bladder and rectum, were segmented manually in the research TPS on CBCT org, CBCT LD_GAN and CBCT LD_ResGAN. All contours were thoroughly validated by a radiation oncologist with expertise in prostate cancer radiotherapy. Dice similarity coefficient (DSC) and average and 95th percentile Hausdorff distance (HD avg, HD 95) of the contours on CBCT LD_GAN and CBCT LD_ResGAN were computed to determine the fidelity of the organ shape in the network output, using CBCT org as ground truth.
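The overlap and distance metrics can be sketched in NumPy. DSC follows its standard definition; for HD avg and HD 95, definitions differ between software packages, and the version below (mean and 95th percentile of the pooled symmetric surface distances, computed by brute force between given surface point sets) is an assumption, as are the function names. For full-resolution contours, a k-d tree such as `scipy.spatial.cKDTree` would avoid the quadratic memory use.

```python
import numpy as np


def dice(mask_a, mask_b):
    """Dice similarity coefficient 2|A ∩ B| / (|A| + |B|) of two binary masks."""
    a = np.asarray(mask_a, bool)
    b = np.asarray(mask_b, bool)
    denom = a.sum() + b.sum()
    return 2.0 * np.logical_and(a, b).sum() / denom if denom else 1.0


def directed_surface_distances(points_a, points_b, spacing=(1.0, 1.0, 1.0)):
    """Distance (mm) from each surface point of A to its nearest surface point of B."""
    pa = np.asarray(points_a, float) * np.asarray(spacing, float)
    pb = np.asarray(points_b, float) * np.asarray(spacing, float)
    d = np.linalg.norm(pa[:, None, :] - pb[None, :, :], axis=-1)
    return d.min(axis=1)


def hd_avg_and_95(points_a, points_b, spacing=(1.0, 1.0, 1.0)):
    """Mean and 95th percentile of the pooled symmetric surface distances."""
    pooled = np.concatenate([directed_surface_distances(points_a, points_b, spacing),
                             directed_surface_distances(points_b, points_a, spacing)])
    return float(pooled.mean()), float(np.percentile(pooled, 95))


surface = np.array([[0, 0, 0], [0, 1, 0], [0, 2, 0]])
shifted = surface + np.array([1, 0, 0])   # same contour moved by one voxel
print(hd_avg_and_95(surface, shifted))    # (1.0, 1.0) with 1 mm voxels
```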

Model selection based on ensemble validation
The models of epoch 50 for CBCT LD_GAN and epoch 60 for CBCT LD_ResGAN, which had the lowest MAE and ME and high soft-tissue geometric fidelity upon visual inspection of the validation cases, were selected. Figure 5 shows the output images from the four trained G pCT GAN and G pCT ResGAN for an exemplary ensemble model validation patient (panels (a)-(d) and (g)-(j)), together with the calculated median images (panels (e) and (k)) and the pixel-wise difference between maximum and minimum HU values (panels (f) and (l)). For CBCT LD_GAN, deviations between the four models were most pronounced at the edges of the bony anatomy and at the patient body outline. We also observed variations in the bowels, with occasional generation of air pockets (panel (c)). For CBCT LD_ResGAN, deviations were generally less pronounced than for CBCT LD_GAN, and no spurious large air pockets were generated. In the following analysis, only the median images were considered.

Image analysis
We evaluated CBCT LD_GAN and CBCT LD_ResGAN on eight test patients. CBCT images of test patient 36 and their HU differences are shown in figure 6. In CBCT LD (panel (c)), streaks and under-sampling artifacts are clearly visible compared with CBCT org (panel (f)). In panels (d) and (e), CBCT LD_GAN and CBCT LD_ResGAN have successfully removed these artifacts. Figure 6 also shows the HU differences of all CBCT results with respect to vCT. CBCT LD (panel (g)) and CBCT org (panel (j)) show larger regions of both HU underestimation and overestimation, as well as pronounced deviations in the bony structures. As seen from the reduced differences to vCT, CBCT LD_GAN (panel (h)) and CBCT LD_ResGAN (panel (i)) have improved image intensities compared to CBCT org. The remaining differences of CBCT LD_GAN and CBCT LD_ResGAN with respect to vCT are located at the patient body outline and at bone interfaces. Figure 6 further shows the HU differences of all CBCT results with respect to CBCT cor, which are similar to the differences with respect to vCT but contain more noise.
To quantify image quality, we computed the average ME and MAE in HU of CBCT LD_GAN, CBCT LD_ResGAN and CBCT LD with respect to vCT for the training, validation and test patients, as shown in figure 7. In panels (a) to (c), the ME of CBCT LD was positive in almost all patients, while the ME of CBCT LD_GAN was negative in the majority of datasets. CBCT LD_ResGAN had slightly more negative than positive MEs. For each correction method, the MEs were comparable across datasets. In panels (d) to (f), CBCT LD_GAN and CBCT LD_ResGAN showed a substantially reduced MAE for all datasets compared to CBCT LD. Table 1 reports the average ME and MAE over all patient images in the training, validation and test datasets. For the test datasets, the average ME changed from 20 HU for CBCT LD to −6 HU for CBCT LD_GAN and −2 HU for CBCT LD_ResGAN, and the average MAE was reduced from 126 HU for CBCT LD to 55 HU for CBCT LD_GAN and 44 HU for CBCT LD_ResGAN.

Dosimetric analysis
The quantitative results of the dose difference analysis of the VMAT plans comparing CBCT LD_GAN and CBCT LD_ResGAN to vCT are given in table 2 for all test datasets and the investigated dose difference (DD) levels. For CBCT LD_GAN, the average 1% DD pass rate was 95.9%, ranging from 87.3% to 98.7%. For CBCT LD_ResGAN, the average 1% DD pass rate was 97.0%, ranging from 92.0% to 98.6%. This indicates high agreement of CBCT LD_GAN and CBCT LD_ResGAN with the reference vCT. In addition, the average 1% DD pass rate comparing vCT to CBCT cor over all test datasets was 98.4%, indicating excellent dosimetric agreement between the two benchmark images. The dose distribution and dose differences of test patient 38 are depicted in figure 8. Only minor dose differences in the planning target volume (PTV) region were found between CBCT LD_GAN, CBCT LD_ResGAN and vCT, with smaller magnitudes for CBCT LD_ResGAN than for CBCT LD_GAN. Figure 9 shows target and OAR DVH parameter differences with respect to vCT as boxplots over all patients. For most of the considered parameters, differences for both CBCT LD_GAN and CBCT LD_ResGAN were within 1.5 Gy for dose DVH parameters (D x) and below 1.5% for volume DVH parameters (V x). All deviations were below 2 Gy/2%. For the PTV, the median differences of D 98%, D 50% and D 2% with respect to vCT were 0.3%, 0.3% and 0.3% for CBCT LD_GAN, and 0.4%, 0.3% and 0.4% for CBCT LD_ResGAN.

Positioning accuracy
With respect to CBCT org-to-pCT, the mean absolute differences of the rigid transformation parameters for CBCT LD_GAN-to-pCT were 0.07 mm (right-left, RL), 0.05 mm (inferior-superior, IS), 0.01 mm (posterior-anterior, PA), 0.17° (pitch), 0.15° (roll) and 0.24° (yaw); for CBCT LD_ResGAN-to-pCT, they were 0.03 mm (RL), 0.05 mm (IS), 0.04 mm (PA), 0.16° (pitch), 0.19° (roll) and 0.26° (yaw). The majority of differences were thus less than 0.20 mm or 0.20°, except for the pitch of patient 34 (0.32° for CBCT LD_GAN), the yaw of patient 38 (0.82° for CBCT LD_GAN and 0.77° for CBCT LD_ResGAN), the roll of patient 39 (−0.60° for CBCT LD_GAN and −0.79° for CBCT LD_ResGAN) and the yaw of [...]
Anatomical fidelity
As shown in table 3, the average DSC of the bladder was 0.88 for CBCT LD_GAN and 0.92 for CBCT LD_ResGAN with respect to CBCT org. HD avg and HD 95 of the bladder were 1.34 mm and 6.03 mm for CBCT LD_GAN, and 0.90 mm and 4.05 mm for CBCT LD_ResGAN. As shown in table 4, the average DSC of the rectum was 0.77 for CBCT LD_GAN and 0.87 for CBCT LD_ResGAN with respect to CBCT org. HD avg and HD 95 of the rectum were 1.93 mm and 6.43 mm for CBCT LD_GAN, and 1.05 mm and 3.89 mm for CBCT LD_ResGAN. For both bladder and rectum, CBCT LD_ResGAN had a higher DSC and lower HD avg and HD 95 than CBCT LD_GAN. In addition, the bladder generally had a higher DSC and lower HD than the rectum for both CBCT LD_GAN and CBCT LD_ResGAN. Figure 10 illustrates that the rectum contour in CBCT LD_GAN (panels (b) and (e)) had a larger shape deviation than in CBCT LD_ResGAN (panels (c) and (f)) with respect to CBCT org (panels (a) and (d)), owing to a small spurious air pocket generated by the network, which would also be contoured as part of the rectum in clinical practice.

Discussion
The daily use of CBCT imaging during a fractionated radiotherapy course can deliver a considerable amount of radiation dose to patients. Moreover, owing to its insufficient image quality, uncorrected CBCT cannot be used for daily dose calculation and adaptation. Our study therefore aimed at addressing dose reduction and intensity correction simultaneously. We generated synthetic low dose CBCT LD images to train two cycleGAN architectures for (1) removing the under-sampling artifacts and (2) correcting the intensity of CBCT LD, and evaluated both approaches on a cohort of prostate cancer patients. The key finding of this study is that, with the use of cycleGAN, it was possible to reduce the CBCT imaging dose by 75% while enabling accurate VMAT dose calculation.
To obtain CBCTLD, the projections were subsampled by a factor of four, which led to severe streaking in the reconstructed images. The proposed CBCTLD_GAN and CBCTLD_ResGAN techniques successfully removed all streak artifacts by training the generators GpCT to map the CBCTLD input to the pCT domain, which contains no under-sampling noise. In addition, the cycle consistency loss regularized the body structures between CBCTLD and CBCTLD_GAN, and between CBCTLD and CBCTLD_ResGAN. The hyperparameter λ1 was increased from its default value of 10 to 25, as the importance of preserving the anatomical content in the loss function was previously demonstrated in Kurz et al (2019) and confirmed in our study. Furthermore, a shape loss was added to incorporate patient body outline information, as suggested in Ge et al (2019). The hyperparameter λ2 was reduced from its default value of 10 to 1 for CBCTLD_GAN; compared to the default, the smaller λ2 tended to produce soft tissue and organs with more accurate shapes in our experiments. For CBCTLD_GAN, λ2 = 1 was empirically found to be beneficial compared to using no shape loss, as shown in supplementary figure S1. For CBCTLD_ResGAN, λ2 = 0 gave the least variation in the min-max plots and thus a higher stability of the model outputs, as shown in supplementary figure S2.
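How λ1 and λ2 enter the training objective can be illustrated with a minimal sketch of the total generator loss: adversarial terms for both mapping directions, plus λ1 times the L1 cycle-consistency loss, plus λ2 times a shape term. The least-squares adversarial form is an assumption (it is the default in the original CycleGAN formulation), and `shape_term` is a placeholder value standing in for the body-outline shape loss of Ge et al (2019), whose exact form is not reproduced here. All function names are hypothetical.

```python
import numpy as np

LAMBDA_1 = 25.0  # cycle-consistency weight stated in the text (default 10)
LAMBDA_2 = 1.0   # shape-loss weight stated for CBCTLD_GAN (0 for CBCTLD_ResGAN)


def lsgan_generator_term(d_out):
    """Least-squares adversarial term (assumed form): push D(G(x)) towards 1."""
    return float(np.mean((np.asarray(d_out, float) - 1.0) ** 2))


def l1(a, b):
    """Mean absolute error, the usual cycle-consistency distance."""
    return float(np.mean(np.abs(np.asarray(a, float) - np.asarray(b, float))))


def generator_objective(d_on_fake_pct, d_on_fake_cbct,
                        cbct, cbct_cycled, pct, pct_cycled,
                        shape_term, lam1=LAMBDA_1, lam2=LAMBDA_2):
    """Total generator loss = adversarial + lam1 * cycle + lam2 * shape.

    d_on_fake_*: discriminator outputs on translated images.
    *_cycled: images after a full round trip (e.g. CBCT -> pCT -> CBCT).
    shape_term: placeholder value for the body-outline shape loss.
    """
    adversarial = (lsgan_generator_term(d_on_fake_pct)
                   + lsgan_generator_term(d_on_fake_cbct))
    cycle = l1(cbct, cbct_cycled) + l1(pct, pct_cycled)
    return adversarial + lam1 * cycle + lam2 * shape_term
```

Raising λ1 from 10 to 25 makes any round-trip error 2.5 times more costly, which is the mechanism by which anatomical content is weighted more heavily relative to the adversarial realism terms.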
Compared to previous unpaired CBCT-to-CT correction works using cycleGAN on pelvic scans, our model achieved a slightly higher MAE reduction. This could be explained by the fact that the CBCTLD input contains more noise than the standard full dose CBCT input used in other studies, leaving a larger initial error to be corrected. The MAE with respect to vCT was substantially reduced from 126 HU for CBCTLD to 55 HU for CBCTLD_GAN and to 44 HU for CBCTLD_ResGAN. Liu et al (2022) proposed a two-step method with phantom-based and patient-based models, and reduced the MAE of well-matched slices from 67 to 32 HU with respect to a deformably registered reference CT. In Deng et al (2022) […] (Kurz et al 2016) as reference, which has higher anatomical fidelity to CBCTorg but more noise than vCT.
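The MAE values compare each corrected CBCT with vCT voxel by voxel. A minimal sketch, assuming both volumes are already resampled to the same grid and that the evaluation is restricted to a body mask (the exact evaluation region is an assumption here):

```python
import numpy as np


def mae_hu(image, reference, body_mask):
    """Mean absolute HU error between two co-registered volumes inside a mask."""
    m = np.asarray(body_mask, dtype=bool)
    diff = np.abs(np.asarray(image, float) - np.asarray(reference, float))
    return float(diff[m].mean())
```

Restricting the average to the body matters: including the large air region outside the patient, where both images are near −1000 HU, would artificially lower the MAE.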
In terms of dose calculation accuracy, good results were achieved for VMAT when comparing CBCTLD_GAN and CBCTLD_ResGAN to vCT. For a 2% dose difference criterion, a mean pass rate of 99% was determined over the test patients for both proposed approaches. Despite the additional under-sampling artifacts in the low dose CBCT input, the dosimetric results of CBCTLD_GAN and CBCTLD_ResGAN are still comparable to those of the previous work by Kurz et al (2019), which used a fully sampled prostate dataset with a similar cycleGAN architecture (without shape loss or a generator residual connection). In line with this, for most cases CBCTLD_GAN and CBCTLD_ResGAN agreed very well with vCT in terms of clinically relevant DVH parameters. For VMAT, a trend of marginally overestimated doses on CBCTLD_GAN and CBCTLD_ResGAN was found in the target structures and OARs, with deviations below 1 Gy for dose DVH parameters (Dx) and below 1.5% for volume DVH parameters (Vx) in 7 of the 8 test cases.
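The dose-difference pass rate and the Dx and Vx DVH parameters can be sketched as follows. Assumptions not taken from the text: the 2% criterion is normalized to the maximum reference dose, voxels below 10% of that maximum are excluded, and Dx is read off with NumPy's default linear percentile interpolation. The function names are hypothetical.

```python
import numpy as np


def dose_difference_pass_rate(dose_test, dose_ref, criterion=0.02,
                              low_dose_cutoff=0.1):
    """Fraction of evaluated voxels with |test - ref| <= criterion * max(ref).

    Only voxels where the reference dose exceeds low_dose_cutoff * max(ref)
    are evaluated (global normalization and cutoff are assumptions).
    """
    test = np.asarray(dose_test, float)
    ref = np.asarray(dose_ref, float)
    d_max = ref.max()
    region = ref > low_dose_cutoff * d_max
    passed = np.abs(test - ref)[region] <= criterion * d_max
    return float(passed.mean())


def dose_at_volume(dose_in_structure, volume_percent):
    """Dx: dose received by at least x% of the structure volume."""
    d = np.asarray(dose_in_structure, float)
    return float(np.percentile(d, 100.0 - volume_percent))


def volume_at_dose(dose_in_structure, dose_gy):
    """Vx: percentage of the structure volume receiving at least x Gy."""
    d = np.asarray(dose_in_structure, float)
    return float(np.mean(d >= dose_gy) * 100.0)
```

With this convention, D98% of the PTV is the 2nd percentile of the voxel doses inside the PTV, and D2% is the 98th percentile.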
To investigate anatomical fidelity, two OARs were contoured in the network-generated images and compared to ground truth contours on CBCTorg. The DSC of the rectum was lower than that of the bladder, possibly due to the higher variability of the rectum shape and the random natural occurrence of air pockets in the rectum. In addition, the rectum is more difficult to segment, which increases the uncertainty of rectum contours. Notably, CBCTLD_ResGAN generally yielded a higher DSC and lower HDavg and HD95 than CBCTLD_GAN for both OARs. This demonstrates that CBCTLD_ResGAN can achieve improved geometrical accuracy, and indicates a positive effect of the generator residual connection.
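One common form of a generator residual connection is a global skip from input to output, so that the network only predicts a correction to the input image. Whether the residual connection in CBCTLD_ResGAN is exactly of this global form is an assumption here; `body` is a hypothetical stand-in for the convolutional generator network.

```python
import numpy as np


def residual_generator(x, body):
    """Generator with a global residual connection: output = input + body(input).

    `body` is a hypothetical callable standing in for the convolutional
    generator. Because the input is passed through unchanged, anatomy
    present in x is preserved unless body(x) explicitly alters it.
    """
    x = np.asarray(x, float)
    return x + body(x)
```

With `body` returning zeros the output equals the input exactly, which illustrates why such a connection biases the network towards keeping the CBCT anatomy and only adjusting intensities and artifacts.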
While providing high treatment dose calculation accuracy and enhanced anatomical fidelity, the proposed low dose CBCT techniques could reduce the imaging dose of a pelvic scan by at least 75%. To estimate the reduction in patient dose, we chose the cone beam dose index (CBDI), which provides a single number representing the mean volumetric dose in the CT dose index (CTDI) phantom, as reported by Hyer and Hintenlang (2010). For the same configuration as our protocol (M20 protocol with 120 kV and a bowtie filter on an Elekta XVI scanner), they reported a CBDI value of 1.62 mGy/100 mAs (table 2 in Hyer and Hintenlang (2010), chest protocol). By selecting only 90 of 350 projection frames, our CBCTLD thus reduced the patient dose from 2.27 to 0.57 mGy per scan (from a total exposure of 140 mAs to 36 mAs). For reference, another Elekta XVI CBCT-to-CT study using cycleGAN with a regular full dose scan in prostate cancer reported a total exposure of 288 mAs, without providing complete acquisition details such as the kV collimator type or the use of a bowtie filter (Kida et al 2019). In a recent deep learning low dose CBCT study using a U-Net, Yuan et al (2020) used a clinical head and neck (HN) protocol with 182 projections over 205°, which would correspond to 319 projections over 360°, i.e. a sampling rate about 3.5 times higher than in our approach.
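The dose estimate is a linear scaling of the quoted CBDI value with the total exposure, and the comparison with Yuan et al (2020) rescales their angular sampling to a full rotation. The arithmetic, using only the numbers given in the text:

```python
CBDI_MGY_PER_100MAS = 1.62  # Hyer and Hintenlang (2010), chest protocol, as quoted


def scan_dose_mgy(total_mas):
    """Dose estimate from the CBDI value, scaled linearly with exposure (mAs)."""
    return CBDI_MGY_PER_100MAS * total_mas / 100.0


full_dose = scan_dose_mgy(140.0)  # full protocol, ~350 projections
low_dose = scan_dose_mgy(36.0)    # CBCTLD, ~90 projections (140 * 90 / 350 = 36 mAs)

# Yuan et al (2020): 182 projections over 205 degrees, rescaled to 360 degrees
equiv_projections = 182 * 360 / 205
sampling_ratio = int(equiv_projections) / 90  # relative to our 90 projections
```

This gives 2.268 mGy for the full scan and about 0.58 mGy for 36 mAs. The 0.57 mGy stated in the text corresponds to exactly 25% of 2.27 mGy (0.5675 mGy); the small difference comes from 90/350 being about 26% rather than exactly 25%. The rescaled projection count is about 319.6, and 319/90 ≈ 3.5.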
The computational time of the investigated low dose CBCT techniques for correcting a 3D pelvic scan was shorter than that of vCT or the projection-based scatter correction approach CBCTcor of Kurz et al (2016), which require correction times on the order of 6-10 min per patient. The correction time of 10 ms per slice for CBCTLD_GAN or CBCTLD_ResGAN is identical to that of other prostate CBCT-to-CT works by Landry et al (2019) using a U-Net and by Kurz et al (2019) using a similar cycleGAN. It should be noted that iterative reconstruction methods using compressed sensing, e.g. Choi et al (2010), Lee et al (2012) and Park et al (2012), or total variation, e.g. Song et al (2014), have also been used to remove under-sampling artifacts in CBCT images. However, these would require an additional scatter correction step to convert the CBCT image intensities to diagnostic CT intensities. Since the proposed CBCTLD_GAN and CBCTLD_ResGAN techniques allow fast image correction within 2 s per patient (195 slices), they have the potential to be applied for CBCT-based online treatment plan adaptation.
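Because the correction time is quoted per slice, the per-patient time scales with the number of slices: 195 slices × 10 ms ≈ 1.95 s, consistent with the stated 2 s. The sketch below assumes a 2D network applied slice by slice along the first axis; `model_2d` is a hypothetical callable, not the study's actual inference code.

```python
import time

import numpy as np


def correct_volume_slicewise(volume, model_2d):
    """Apply a 2D image-translation model slice by slice along axis 0,
    restack the results into a 3D volume, and report the mean time per slice."""
    start = time.perf_counter()
    corrected = np.stack([model_2d(s) for s in volume], axis=0)
    per_slice = (time.perf_counter() - start) / volume.shape[0]
    return corrected, per_slice
```

In practice, slices are often batched on a GPU rather than processed one at a time, which lowers the per-slice time further.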
This study has some limitations. First, the evaluation of HU and dose calculation accuracy relies on vCT. The advantage of using vCT as a reference is that it has correct intensities and, ideally, anatomy identical to CBCTLD. However, vCT might not be a perfect ground truth due to uncertainties in DIR. This might be one potential cause of the small deviations found in the patient body outline in figure 6 panels (h) and (i), and in the dose difference maps in figure 8 panels (c) and (e). For this reason, we also compared the network results with an alternative ground truth, CBCTcor, to inspect deviations that might have been caused by DIR uncertainties. As shown in figure 6 panels (l) and (m), similar deviations in the patient body outline were also found in the comparison to CBCTcor, which implies that the DIR uncertainties did not affect the HU accuracy analysis. In addition, the average 1% dose difference (DD) pass rate comparing vCT to CBCTcor was 98.4%, as reported in section 3.4, which also implies that employing either vCT or CBCTcor as ground truth has only minimal impact on the dosimetric comparison of the network results.
Second, we observed that the predictions of some single models before ensembling can be geometrically unstable, especially for CBCTLD_GAN. Our approach stabilizes the output by taking the median of the 4 model outputs. Yet this does not control the variability of each individual model. In CBCTLD_ResGAN, this variability was reduced by the generator residual connection.
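The ensembling step can be sketched as a voxel-wise median over the outputs of the 4 cross-validation models. Taking the median voxel by voxel is an assumption about how "the median of 4 models" is computed. With an even number of models, NumPy's median averages the two middle values, so a single outlying model cannot pull the result arbitrarily far.

```python
import numpy as np


def median_ensemble(predictions):
    """Voxel-wise median over several model outputs of identical shape
    (e.g. the 4 fold models), stacked along a new first axis."""
    return np.median(np.stack(predictions, axis=0), axis=0)
```

For example, if three models predict 0, 10 and 20 HU at a voxel and a fourth, unstable model predicts 1000 HU, the ensemble output is 15 HU, whereas the mean would be 257.5 HU.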
In future work, we would like to investigate the feasibility of further reducing the CBCT dose and to explore under-sampling schemes that might make it possible to selectively avoid irradiating critical organs. We also plan to extend the proposed low dose CBCT imaging technique to other anatomical sites.

Conclusion
This study showed that it is possible to reduce the CBCT imaging dose by 75% in pelvic scans while enabling accurate VMAT dose calculation with the use of a cycle-consistent generative adversarial network. The network was successfully trained to simultaneously remove streaking artifacts and translate low dose CBCTLD into CT equivalent images using unpaired training data. The resulting low dose CBCTLD_GAN and CBCTLD_ResGAN images resemble planning CTs in HU accuracy and the daily in-room CBCTorg in anatomy. Clinically relevant DVH parameters were accurately predicted. CBCTLD_ResGAN improved anatomical fidelity compared to CBCTLD_GAN. Compared to the reference technique (vCT), CBCTLD_GAN and CBCTLD_ResGAN allow substantially faster correction and are not affected by DIR uncertainties in the presence of pronounced inter-fractional changes; they thus have the potential to be applied for online treatment adaptation.