Penalty weight tuning in high dose rate brachytherapy using multi-objective Bayesian optimization

Objective. Treatment plan optimization in high dose rate brachytherapy often requires manual fine-tuning of penalty weights for each objective, which can be time-consuming and dependent on the planner's experience. To automate this process, this study used a multi-criteria approach called multi-objective Bayesian optimization with q-noisy expected hypervolume improvement as its acquisition function (MOBO-qNEHVI). Approach. The treatment plans of 13 prostate cancer patients were retrospectively imported to a research treatment planning system, RapidBrachyMTPS, where fast mixed integer optimization (FMIO) performs dwell time optimization given a set of penalty weights to deliver 15 Gy to the target volume. MOBO-qNEHVI was used to find patient-specific Pareto optimal penalty weight vectors that yield clinically acceptable dose volume histogram metrics. The relationship between the number of MOBO-qNEHVI iterations and the number of clinically acceptable plans per patient (acceptance rate) was investigated. The performance time was obtained for various parameter configurations. Main results. MOBO-qNEHVI found clinically acceptable treatment plans for all patients. With increasing the number of MOBO-qNEHVI iterations, the acceptance rate grew logarithmically while the performance time grew exponentially. Fixing the penalty weight of the tumour volume to maximum value, adding the target dose as a parameter, initiating MOBO-qNEHVI with 25 parallel sampling of FMIO, and running 6 MOBO-qNEHVI iterations found solutions that delivered 15 Gy to the hottest 95% of the clinical target volume while respecting the dose constraints to the organs at risk. The average acceptance rate for each patient was 89.74% ± 8.11%, and performance time was 66.6 ± 12.6 s. The initiation took 22.47 ± 7.57 s, and each iteration took 7.35 ± 2.45 s to find one Pareto solution.Significance. MOBO-qNEHVI combined with FMIO can automatically explore the trade-offs between treatment plan objectives in a patient specific manner within a minute. This approach can reduce the dependency of plan quality on planner’s experience and reduce dose to the organs at risk.


Introduction 1.High dose rate brachytherapy
Interstitial High dose rate (HDR) brachytherapy for the treatment of prostate cancer is an effective therapy delivered either as monotherapy or as a boost in combination with external beam radiotherapy (EBRT) (Zwahlen et al 2010, Ágoston et al 2011, Neviani et al 2011, Aluwini et al 2012, Hoskin et al 2012, 2021, Kaprealian et al 2012, Wojcieszek and Białas 2012, Morton 2014, Spratt et al 2014, Chin et al 2017, Mendez & Morton 2018, Morton and Alrashidi 2020).A recent randomized trial of EBRT conducted alone and combined with HDR brachytherapy concluded that at 12 years, there remains a significant improvement in relapse-free survival after EBRT+HDR, with both modalities being equitoxic for severe late urinary and bowel events and urethral stricture (Hoskin et al 2021).Several other studies have reported dose to the urethra as the most important predictor of long-term genitourinary morbidity (Ishiyama et al 2009, Hsu et al 2010, Morton et al 2011).The low toxicity and control rate for HDR brachytherapy monotherapy delivered in a fraction of 19 Gy suggests that dose escalation may be beneficial (Mendez & Morton 2018).However, dose escalation to 21 Gy is limited by urethral dose constraints.Despite its effectiveness, HDR brachytherapy is more complex and timeconsuming than EBRT alone (Dutta et al 2018).A part of this complexity is attributed to the treatment plan optimization process (Cui et al 2018b, Bélanger et al 2019, Breedveld et al 2019, Shen et al 2019, 2020, van der Meer et al 2020, Morén et al 2021, Pu et al 2022).

Inverse treatment plan optimization
Inverse treatment plan optimization algorithms in HDR brachytherapy aim to create effective treatment plans by optimizing dwell times.These algorithms use contoured patient images to set constraints on tumour dose and organ at risk (OAR) dose.The goal is to find dwell times that meet these constraints, guided by objectives defined by dose-volume histograms (DVHs).It is common to start the dwell time optimization using a template plan to speed up this process.However, in cases where the template plan is far from an acceptable plan for a patient, manual fine-tuning of the treatment plan can take time for minimal improvements in the plan quality.The process is more difficult in the cases where the catheter configuration is not ideal due to low number of catheters or catheter deviation in the tissue.Accordingly, dwell time optimization is relevant as the last resort to ensure a successful plan generation for the patient.
The optimization algorithms currently used in the clinical treatment planning systems are inverse planning simulated annealing (IPSA) (Lessard and Pouliot 2001) and hybrid inverse planning and optimization algorithms (Baltas et al 2009).While the clinical optimization algorithms have been effective, they have two main limitations.First, these algorithms use linear penalty functions, resulting in fewer and longer dwell times compared to forward planning (Chajon et al 2007, Holm et al 2012, Morén et al 2019), where the planner manually selects the dwell times.Second, the penalty weights of the objective function have to be manually selected subjectively.More recently, volumetric evolutionary optimization algorithm (VEGO) was introduced in BrachyVision clinical TPS (BrachyVision algorithms Reference Guide, v.17.0 Varian Inc., Palo Alto, CA), to incorporate penalty terms for DVH metrics, dwell time smoothness between neighbouring dwell positions and hot spots.The weights of these penalty terms need to be selected manually by the planner.Antaki et al (2020) developed the fast mixed integer optimizer (FMIO), which uses mixed integer quadratic programming to find a guaranteed optimal solution to the dwell time optimization problem.In addition, FMIO benefits from a problem size reduction strategy that allows it to find the solution in 37 ± 18 s.FMIO uses linear and quadratic penalty functions.The linear penalty meets the DVH objective for the clinical treatment volume (CTV), while the quadratic penalty adjusts dwell times to minimize OAR doses and improve dose homogeneity.However, FMIO currently relies on the manual specification of penalty weights for the linear penalty in each DVH objective.Automating penalty weight tuning would transform FMIO into a fully automated dwell time optimization algorithm.
Previous studies have demonstrated the effectiveness of automatic penalty weight tuning in accelerating treatment plan optimization for HDR brachytherapy without a loss in treatment plan quality for prostate cancer patients (Cui et al 2018b, Bélanger et al 2019, Breedveld et al 2019) and cervical cancer patients (Shen et al 2019).However, penalty weight tuning based on a single objective (Shen et al 2019) or based on a population Pareto surface (Cui et al 2018a) limits the algorithm to a specific tumour site.Ideally, the algorithm should work for any cancer type.PNav was another algorithm that used an extensive database of treatment plans (11 250 cases) to navigate the Pareto optimal weight combinations neighbouring a solution generated by VEGO (Deufel et al 2020).The iCycle algorithm (Breedveld et al 2019) and the GPU-based multi-criteria optimization (gMCO) (Bélanger et al 2019) work on any cancer type given proper adjustments.However, iCycle requires the penalty function to be twice differentiable and continuous, which is not true for most clinical penalty functions (Breedveld et al 2019).gMCO requires graphical processing unit (GPU) and a solver algorithm that is compatible with GPU (Bélanger et al 2019).Fulfilling these requirements may not always be possible.

Bayesian optimization for penalty weight tuning
Bayesian optimization is a sampling-based method to efficiently find any black-box function's local maxima or minima, regardless of convexity or continuity.It has been widely used in machine learning for hyper-parameter tuning of neural networks.A recent study showed the effectiveness of Bayesian optimization in fine-tuning target doses in EBRT (Maass et al 2022).Multi-objective Bayesian optimization (MOBO) aims to explore the trade-offs between maximizing multiple objectives simultaneously.Theoretically, MOBO-qNEHVI can provide penalty weights on the Pareto surface between DVH objectives in HDR brachytherapy treatment planning.MOBO-qNEHVI does not rely on a training data set and models each treatment plan as an independent Gaussian process, making it a general algorithm applicable to various tumour sites and capable of providing patient-specific solutions.In addition, MOBO-qNEHVI is compatible with clinical penalty functions and does not require GPU, making it a practical choice for weight tuning in clinical practice.

Aim of this study
This study aimed to investigate MOBO-qNEHVI combined with FMIO for automatic weight tuning since MOBO-qNEHVI can provide penalty weights on the Pareto surface between the DVH objectives in HDR brachytherapy.

Materials and methods
This study investigates the weight tuning problem in HDR brachytherapy using MOBO with q-expected hypervolume improvement (qEHVI) as its acquisition function (Daulton et al 2021).However, before describing the methods, brief descriptions of MOBO and qNEHVI are provided in the following section.

MOBO
MOBO is an extension of single objective Bayesian optimization (SOBO), which is a sequential probabilistic approach to efficiently obtain the local maxima or minima of any black-box function, f (x), with minimum sampling.Since consequent function calls to f (x) are expensive, SOBO finds the maxima of a surrogate function,  is usually a probabilistic Gaussian process representing a posterior distribution given the true function evaluation data x f x , The more data is obtained from f (x), the more accurately the posterior (or surrogate) resembles f (x) at the cost of a high computation time.To ensure efficient sampling of f (x), an acquisition function ( ) is formulated.This acquisition function aims to determine the next sampling point to provide the most information about the maxima and minima of f (x).The inputs of α are a set of candidate points χ cand = {x i }; α feeds these points to the posterior and ranks their performance.Next, f (x) evaluates the highest-ranking candidate points.After evaluation, {x i , f (x i )} are added to , and the posterior is updated (figure 1(b)).The advantage of SOBO is that ranking χ cand using posterior-based α is computationally cheaper than calling f (χ cand ).The posterior distribution is generated by calculating the mean and standard deviation of the observed data and fitting a Gaussian function to these parameters.The efficiency and accuracy of the Bayesian optimization depend on how good the acquisition function is.If the gradient of α is available, gradient-based optimization of the acquisition function can speed up the process greatly.Greenhill et al (2020) summarizes an extensive analysis of various acquisition functions.
MOBO aims to explore the trade-offs between maximizing multiple objectives simultaneously.In MOBO, the black-box function outputs a vector Accordingly, the posterior is a multi-variate Gaussian distribution, which is the superposition of m single-valued Gaussian shows Pareto surface for 2 arbitrary objectives g 1 (w) and g 2 (w).Point A is not Pareto optimal on this graph since objective g 1 (w) can be improved without compromising g 2 (w).In contrast, point B is Pareto optimal because any improvement in g 1 (w) or g 2 (w) leads to a decline in the other objective.(b) Steps in a loop of Bayesian optimization.(c) HVI for w * and w'.Note HVI of w * (green area) is larger than the HVI of w¢ (orange area) with respect to the previous Pareto surface (gray area).Therefore, qNEHVI picks w * as the candidate point for the next iteration. processes: : th element in the output vector of the black box function : posterior distribution over , given : , The data from sampling at points. 1 2.2.q-NEHIV Finding a proper acquisition function for the multi-objective case is more complicated.A plausible acquisition function for MOBO is expected hyper-volume improvement (EHVI).However, with increasing number of objectives, the hyper-volume and the time complexity of EHVI grow exponentially.Daulton et al (2020) implemented EHVI in parallel and used a gradient-based optimization on this acquisition function to introduce qEHVI.In a subsequent paper, the authors adapted qEHVI to handle noisy Pareto surfaces to obtain q-noisy EHVI (q-NEHVI) (Daulton et al 2021).The authors demonstrated that optimizing qNEHVI is faster than other acquisition functions in MOBO.In addition, they extended qNEHVI to support outcome constraints, which is desired in many practical applications of MOBO.Although the reader is encouraged to refer to the qNEHVI paper (Daulton et al 2020(Daulton et al , 2021)), we briefly explain how qNEHVI works.
The purpose of the acquisition function, α, is to suggest the next sampling points X * for the black-box function based on the current state of the surrogate function,  f ( | )  as shown in equation (2).These points are expected to be either on the true Pareto surface of the black-box function or to be as close as possible to the Pareto surface.qEHVI ranks the candidate sampling points (χ cand ) by calculating the expected hyper-volume improvement that these points have in the surrogate function with respect to previously obtained data points () (3).qNEHVI is calculated using box decomposition in a parallel fashion, where the joint improvement in the hyper-volume from the points in a batch is the union of the HVI of each point (equation (4), and figure 1(c)).For each x i in the set χ j , the hyper-volume improvement is defined as the union of hyper-volume expansion, which is the distance between the value of the surrogate function at x i and the current Pareto surface  obtained from  as presented by the equation (5), where r is a reference point, which is a feasible desired output of the black-box function.Calculating box decomposition for each batch is computationally expensive, which limits the number of parallel batches.However, Daulton et al (2021) proposed cached box decomposition to avoid repeated box decomposition calculations for subsequent iterations of qNEHVI (Daulton et al 2021)

X
arg max 2 the super set of all subsets of HVI 1 HVI 3 , number of objectives, volume of the M dimensional hyper rectangle , , HV , HV , .5 In this study, the black-box function is the dwell time optimization obtained with FMIO (Antaki et al 2020).Prostate cancer is chosen as the cancer site.However, in theory, this method applies to any other cancer type.For prostate cancer, the variables are the four penalty weights, and the outputs are the three DVH objectives (6).The reference point is the clinical thresholds to be achieved (7).Before applying MOBO, we examined the numerical range of the weights for each objective and the relationship of the weights with respect to one another by implementing a simple heuristic called semi-continuous interval shrinkage exploration (SCISE).After understanding the nature of the weights, we used MOBO-qNEHVI on FMIO and fine-tuned its parameters to achieve optimal performance.SCISE and MOBO-qNEHVI were tested on 13 treatment plans using Intel Core i7-8700 CPU and 16 Gigabyte RAM  A1.
For optimization, the voxel size was set to 3 mm 3 , where the dose per dwell position was calculated.For the final dose calculation, where the dose for the optimized treatment plan with all the optimized dwell times was calculated, the voxel size was set to 1 mm 3 .The DVH metrics are evaluated based on the final dose map with 1 mm 3 voxel size.A total of 10 6 decay events were simulated per dwell position to obtain type A uncertainties less than 0.3% in the 3 mm 3 voxels within the CTV.At 1 mm 3 resolution, 5 × 10 7 histories were simulated per dwell position to obtain maximum uncertainty of 0.28% ± 0.05% in the CTV voxels, averaged over all patients.The uncertainty analysis was done using an in-house software called BrachyUtils and the external software DicomRTTools (Anderson et al 2021).Details regarding the Monte Carlo simulations are summarized in appendix table A2 according to the recommendations from the AAPM task group 268 (TG-268) (Sechopoulos et al 2018).These dose-rate maps were imported into RapidBrachyMCTPS, where one could manually set the penalty weights to a default value and run FMIO.However, manual adjustment of the penalty weights is not an efficient way to optimize the treatment plans.Each patient had 17 catheters, and the inter-dwell distances on each catheter was 5mm.The average volume of the CTV was 44.66 ± 11.63 cm 3 .The average number of dose points per patient and dwell positions for CTV and OAR are presented in table 1.Overall, the range of the total number of dose points was between 1793 to 4531, with an average of 2841 dose points.

FMIO as a function of penalty weights
To automate penalty weight tuning, the dwell time optimizer algorithm must be automatically called with various weights as its input.Accordingly, FMIO was reformulated as a callable function of penalty weights.In RapidBrachyMCTPS, FMIO formulates the optimization problem as a Mixed Integer Optimization model and passes it to Gurobi software, which is a general-purpose optimizer (Gurobi Optimization, LLC 2022).Gurobi solves the dwell time optimization problem and returns the ideal dwell times.Therefore, the patient geometry, dwell positions, target dose for each structure and penalty weights exist in the Gurobi model.For each voxel, FMIO has a linear penalty and a quadratic penalty.The penalty weights fine-tuned in this study belonged to the linear portion of the penalty function.The weights of the quadratic penalty were set to the default value of 1 to isolate the impact of the linear penalty weights on plan quality.The target dose for the CTV voxels was 15 Gy, which is required by the ESTRO-ACROP guidelines (Henry et al 2022); and the target dose for all voxels in OAR was set to 0 Gy to minimize the dose to the OARs as much as possible.In addition, the target dose to the tumour was added as a tunable parameter ranging from 15 Gy to 16.5 Gy (10% higher dose).To run FMIO with various weights, the model was extracted from RapidBrachyMCTPS as a Mathematical Programming System (MPS) file.Using the Gurobi Python API, the MPS files were automatically loaded, the penalty weights were modified, the dwell times were optimized, and the DVH metrics were calculated without manually interacting with the TPS.The whole process was setup as the single callable function in equation (6).After finding the optimal treatment plans, they were imported into RapidBrachyMCTPS and final DVH metrics were calculated with a resolution of 1 mm 3 .
2.5.Semi-continuous interval shrinkage exploration SCISE was developed to break the continuous range of the penalty weights into a discrete set.Then, it ran FMIO on all the possible combinations of penalty weight in the set.The penalty weight for each of the three objectives can be any real number between 1 and 1000.SCISE uses an interval shrinkage strategy to explore this interval while maintaining a resolution of 12.5.
The idea behind interval shrinkage was to reduce the number of options at each set, find the optimal weight and zoom in on the solution.In this context, zooming refers to using the neighbours of the solution as the new Repeating the steps above in 5 iterations allowed SCISE to reduce the step size from 200 to 12.5.Appendix figure A1 illustrates this algorithm over 3 iterations.The total number of FMIO calls was 716.We used the multiprocessing Python package to run SCISE on 12 CPUs in parallel.This way, the average computation time was 4.65 ± 1.80 min.The acceptance rate for SCISE was defined as the number of clinically acceptable solutions over the total number of calls to FMIO multiplied by 100.In addition, the penalty weights that passed the ESTRO-ACROP DVH thresholds were isolated.Then, the average coefficient of variation, which is the standard deviation over the mean value, was calculated for the weights of each structure and their corresponding DVH metrics.This analysis allowed for the characterization of the neighbourhood of the search space where clinical penalty weights could be found.

MOBO-qNEHVI on FMIO
MOBO-qNEHVI was implemented by Daulton et al (2021) in Ax API (Bakshy et al 2018).To interface with the AX API, we instantiated an AxClient object, then defined the number and the range of the parameters (penalty weights) and the number and the threshold of the objectives (DVH metrics).MOBO-qNEHVI works iteratively.At each iteration, it runs FMIO on the penalty weight combinations that qEHVI has proposed in the previous iteration.Data from each call to FMIO is used to adjust the Gaussian surrogate function to model the treatment plan better.Therefore, by increasing the number of iterations, MOBO-qNEHVI can capture the patient-specific Pareto front more accurately.However, running MOBO-qNEHVI on more iterations is time-consuming.Therefore, it is essential to know the minimum number of iterations that lead to a sufficiently accurate Pareto surface in a clinically acceptable time.Accordingly, we tested various parameter configurations and iteration numbers on the performance of MOBO.
The AxClient was tested on four parameter configurations.First, all four penalty weights ranged from 1 to 1000 (R1).Second, the weight of the CTV was fixed to 1000, and the other weights varied from 1 to 1000 (R2).Third, the weight of the CTV was fixed at 1000, but the range of the weights of the OARs varied from 1 to 500 (R3).Finally, we added the target dose as a parameter to range between the clinical value to 10% higher than the clinical target value (R4).The decision to add the target dose as a parameter was inspired by the findings of Maass et al (2022), where they concluded that the dose thresholds are more important than the penalty weights in controlling the DVH metrics of EBRT cases.We tested which of these parameter configurations allowed MOBO-qNEHVI to find more clinically acceptable plans with a lower number of iterations.ESTRO-ACROP thresholds were used as the reference point to guide MOBO-qNEHVI toward clinically acceptable solutions.To find the minimum number of iterations that MOBO-qNEHVI needs to capture the Pareto surface, we ran MOBO-qNEHVI for 5, 10, 20, 30, 40, 50, and 60 iterations.A summary of the various configurations above is presented in table 2.
The time complexity for the number of iterations was calculated throughout the experiments above.After the iterations are completed, the fraction of the solutions that passed the ESTRO-ACROP requirements was taken as the acceptance rate (equation ( 8)).We also implemented parallel random sampling to initialize the surrogate function before using MOBO-qNEHVI.This way, MOBO-qNEHVI does not spend time reconstructing the surrogate functions and can immediately focus on finding the Pareto surface.The average time of the initialization per patient was measured.To assess the Pareto front, we plotted the DVH metrics of each OAR (y-axis) versus the DVH metric of CTV (x-axis).As presented in the results section, 25 parallel initialization and 6 MOBO-qNEHVI iterations with R4 parameter configuration yield a sufficiently high acceptance rate.The same configuration was used to obtain plans with D 95% (CTV) greater than 15 Gy while respecting ESTRO-ACROP thresholds to the OARs.While treatment plan optimization was done with 3 mm 3 voxels, the final plan evaluation was done on 1 mm 3 voxels.The catheter configuration and the dose distribution from the delivered clinical plans and MOBO-qNEHVI plans with maximum D95%(CTV) were visualized in the transverse plane for each patient.In addition to the DVH metrics, we quantified the distribution of the dwell times along the catheters for each patient in both delivered clinical plans and the plans suggest by MOBO.

Results
3.1.SCISE on FMIO SCISE was used to run FMIO on 716 weight combinations on every patient.Although SCISE was able to find some solutions that passed the ESTRO-ACROP requirements, this algorithm had an average acceptance rate of 36.37% ± 21.93%, which was not effective on most of the patients, as shown in table 3.In addition, the execution time of SCISE was too long to be used in the clinic (4.65 ± 1.80 min).After analyzing the data from SCISE on the linear part of FMIO, we show that it is not the exact values of the linear penalty weights that impact the DVH objectives but the relative value of the weights with respect to one another (appendix table A3).In the set of 536 clinically acceptable solutions from all patients, we observe that the mean penalty weight of the CTV is 852.90 ± 189.17, which is much larger than the weight of the urethra 63.54 ± 102.70 and rectum 55.47 ± 61.93 on average (figure 2).The average coefficient of variation (σ/μ) across the penalty weights of OARs was 1.366, and σ/μ across the 3 DVH metrics was 0.0798.

MOBO-qNEHVI on FMIO
MOBO-qEHVI was promising since the algorithm was able to find ESTRO-ACROP solutions within 20 MOBO-qNEHVI iterations on all patients in less than 3 min using the basic parameter configuration (R1 in table 2).As opposed to SCISE where the acceptance rate was measured with respect to D90%(CTV), MOBO-qNEHVI was tasked with passing the D95%(CTV) threshold.As shown in figure 3, the time complexity grows super-linearly with increasing the number of iterations, while the growth in acceptance rate is logarithmic.Running MOBO-qNEHVI for more than 40 iterations took nearly 500 s and resulted in less than 5% improvement in acceptance rate.This suggests that running MOBO-qNEHVI for more than 40 iterations does not add much value to the quality of the penalty weight configurations, but it costs time.
The configuration that had the highest acceptance rate in the lowest number of iterations was the one where the weight of CTV was set to 1000, the range of OAR weights varied from 1 to 500, and the target dose was added as a parameter (R4 in table 2).At 20 iterations, the R4 configuration improved the acceptance rate by 68% ± 17% compared to R1 and 55% ± 30% compared to R3.This improvement can be mainly attributed to fixing the penalty weight of CTV to 1000 and adding the target dose as a modifiable parameter.With respect to   the ESTRO-ACROP DVH objective thresholds, R4 had a mean acceptance rate of 72.31% ± 15.89% and took 131.68 ± 17.73 s in 20 iterations.
Initializing the surrogate Gaussian functions with 25 random parallel sampling of FMIO followed by only 6 iterations MOBO-qNEHVI (31 FMIO calls in total) increased the average acceptance rate to 89.74% ± 8.11% in 1.11 ± 0.21 min on average for all patients.The 25 parallel quadratic FMIO calls took 22.47 ± 7.57 s on average, and the rest of the 6 MOBO-qNEHVI iterations took 44.13 ± 13.8 s.Therefore, each MOBO-qNEHVI iteration took only 7.3 s to find a solution.This experiment showed that parallel initialization increased the acceptance rate by 23.84% and drastically reduced the performance time.The results are presented in Table 4.The Pareto fronts for all the patients are presented in the appendix figures A2, and A3.Notably, patient 7 was a difficult case for the SCISE algorithm as its acceptance rate was only 1.6%.On the other hand, MOBO-qNEHVI found ESTRO-ACROP plans with a rate of 100% at 3 mm resolution (Figure 4).In MOBO-qNEHVI plans, visualizing the dwell time distributions for equidistant dwell positions with respect to the catheter tips showed that the dwell positions nearing the edges of the catheter have irregular time distributions while the central dwell positions are more stable and follow the same pattern as the delivered clinical plans (Figures A4 and A5).As shown in Figures A6, the coefficient of variation in the distribution of the delivered clinical plans was 0.80 ± 0.10, which was lower than that of the plans generated by .The catheter configuration and the dose distribution in the transverse plane are shown in Figures A7 and A8.

Discussion
While SCISE could find clinically acceptable plans for all patients, its performance was highly variable across the patients.This variability can be attributed to limited resolution in weight tuning, diverse patient anatomy and catheter placements.The finest weight resolution of SCISE was 12.5/1000, but MOBO-qNEHVI could fine-tune penalty weights down to 16 decimal points.Another limitation of SCISE was that the optimization moved toward the weight vectors with the highest value for D90%(CTV).This configuration was necessary to ensure the delivery of a high enough dose to the CTV.But often, it resulted in overdosing the urethra.The last insight from SCISE was that the coefficient of variation was much larger for the penalty weights of the OARs than the DVH metrics.This means that even if one of the penalty weights were doubled or halved, the DVH would not change substantially.Therefore, the gradient of the DVH metrics with respect to the penalty weights is very small, which can cause the human planner to spend a long time changing the weights for minimal improvement in the DVH metric quality of a treatment plan.
Two search algorithms similar to SCISE are employed in the treatment plan optimization of EBRT.Huang et al (2021) first projected the control parameter space onto the constraint Pareto surface, then used the bisection search to find solutions with clinically acceptable DVH metrics.The bisection search evaluates if the midpoint of an interval satisfies the user-defined requirements.If not, it breaks the interval into two smaller intervals and repeats the search on the smaller intervals (Huang et al 2021).In SISCE, the initial interval is broken down into 6 smaller intervals, and the search takes place in four dimensions.In the second algorithm, Maass et al used uniform random sampling to establish the relationship between penalty weight, target doses and DVH metrics in stereotactic radiotherapy (Maass et al 2022).In the same paper, the authors used the result of a uniform random sampler to better fine-tune their Bayesian optimization, which is similar to our strategy.In this study, MOBO-qNEHVI found solutions on the entire range of the DVH metrics (0-20 Gy).By manipulating the penalty weights' range, we could focus MOBO-qNEHVI to the clinically acceptable region of the DVH metric space.As shown in figure 3, fixing the weight of the CTV to 1000 improved the acceptance rate by 9.61% ± 22.19% at 20 iterations.Limiting the variation of OAR penalty weights from 1 to 500 did not significantly affect the acceptance rate (3.46% ± 33.61%).As we have seen from SCISE, the OAR weights of the clinical plans are often 2 orders of magnitude smaller than the weight of the CTV (table 2).Therefore, the effective search space is much smaller than the entire hyper-volume described by the range of the weights.Given that the reducing search space by 1/4 (half range for 2 OAR weights) did not impact the acceptance rate curve in figure 3, we conclude that MOBO-qNEHVI focuses only on the effective search space.On the other hand, adding the dose to the target as a parameter improved the acceptance rate by 68.08% ± 17.26% at 20 iterations, which agrees with the findings of Maass et al (2022).
The exponential growth in the performance time is due to the exhaustion of candidate points that can expand the hyper-volume significantly.Once the Pareto surface is found, qNEHVI must extensively sample the Gaussian surrogate function to find a candidate point that expands the hyper-volume.This results in qNEHVI taking longer and longer to converge after the saturation of the acceptance rate.Initializing the surrogate Gaussian function with 25 random FMIO calls before using qNEHVI allowed us to overcome the exponential growth in time and have an acceptance rate of nearly 90% with just 6 MOBO-qNEHVI iterations.Since the parallel initiation took 22.47 ± 7.57 s on average, the time for the 6 MOBO-qNEHVI iterations was 44.13 ± 14.7 s.Therefore, the average single MOBO-qNEHVI iteration took 7.35 ± 2.45 s as obtained by the calculations in appendix equations (E1) and (E2).It is important to note that the initial MOBO-qNEHVI iterations take less time than the later iterations.Therefore, reducing the number of MOBO-qNEHVI iterations will likely reduce the average time per single iteration.It is important to investigate further how increasing the number of objectives, adding more OARs or adding homogeneity requirements would impact the performance time of MOBO-qNEHVI.Also, catheter position configuration significantly impacts how easy it is to fulfill the DVH requirements.The success rate of random penalty weight tuning on FMIO (represented by the blue dots in figures A2 and A3) serves as an indicator of optimization complexity.From our small dataset, we observe that 4 out of 13 patients had straightforward optimization (attributed to high-quality catheter configuration), while the remaining nine patients required MOBO-qNEHVI to attain clinically acceptable DVH metrics.The catheter configuration is presented in figures A7 and A8.
Four recent algorithms perform automatic penalty weight optimization in HDR brachytherapy.The first one is the multi-criteria optimization developed by Cui et al (2018a).The authors adapted the sandwich algorithm from IMRT (Craft et al 2006) to find the Pareto surface for prostate cancer patients treated with HDR brachytherapy (Cui et al 2018a).This algorithm took 1.1 hours to converge.After mapping the Pareto surface for a large population of patients and isolating the clinical treatment plans, Cui et al (2018a) fitted a fourth-degree polynomial that related the DVH metrics to an ideal penalty weight.This regression model found a solution in less than 60 s.While this method is extremely fast and results in high-quality plans, it relies on a database of Pareto optimal treatment plans for each cancer type.
The second algorithm is the Erasmus iCycle software, which was first introduced for IMRT (Breedveld et al 2012) and then modified for HDR brachytherapy (Breedveld et al 2019).iCycle requires the objectives to be prioritized in a wish list (prioritized optimization).Then, it optimizes the most important objective, adds a slack variable to the optimal objective value and uses it as a constraint in optimizing the lower-priority objectives.Since constraints are non-negotiable, iCycle ensures that the most important objectives remain optimized as subsequent optimizations occur.iCycle produced a high-quality solution for every patient in less than 10 s, which was almost always (in 32/33 cases) preferred over the human-generated plan by the radiation oncologists.
In the same year as iCycle, the weight tuning policy network was introduced to fine-tune the penalty weights in a human-like manner (Shen et al 2019).The methods rely on a single objective reward function that combines the penalties from OARs.Accordingly, the network was not developed to find the Pareto solutions.Instead, it aimed to maximize the reward function in as few iterations as possible.It reduced the dose to the OARs by around 10% while maintaining a high dose to the CTV in a 5 min run time.Only one solution was generated, and unlike previous penalty weight tuning methods, the Pareto surface was not presented to the planner.
The last algorithm, GPU accelerated multi-criteria optimization (gMCO), was obtained by running thousands of dwell time optimizations in a GPU (Bélanger et al 2019).Instead of GPU-accelerated simulated annealing, which is based on IPSA, Bélanger et al (2019) found the quasi-Newton optimizer (gL-BFGS) to be a better solver for the gMCO.gMCO increased the fraction of clinically acceptable plans from 92.6% (humangenerate plans) to 99.8% of the test prostate patient cases.In addition, the fraction of the plans with V 100 above 95% was also improved by 27%.gMCO generated a pool of 1000 plans for each patient in 9.4 s, with a mean acceptance rate of 61.7% per patient concerning the radiation therapy oncology group thresholds (RTOG) (Hoskin et al 2012).The acceptance rate of gMCO per patient was 26.8% when looking for RTOG plans with the desired D 95% (CTV).In a recent study, 3 radiation oncologists were asked to choose between HDR brachytherapy prostate plans that gMCO and human planner generated.One radiation oncologist preferred gMCO plans in 19 out of 20 cases.The others preferred gMCO in 17/20 and 12/20 cases (Bélanger et al 2022).
In conclusion, multi-criteria optimization relies on a database of population Pareto optimal treatment plans for each cancer type.The weight tuning policy network (Shen et al 2019) and iCycle (Breedveld et al 2019) recommend a single optimal treatment plan instead of the trade-off solutions.In the weight tuning policy network, the reward function of the OARs is summed with a user-defined coefficient.These coefficients are unique to each cancer site.Similarly, the prioritized optimization approach that is used by iCycle needs the planner to set priorities to the OARs.In addition, iCycle is incompatible with most clinical and research objective functions as it strictly requires the penalty function to be twice differentiable and continuous.Lastly, gMCO finds patient-specific Pareto surfaces with the help of advanced GPU hardware, NVIDIA TITAN X (Pascal).All these algorithms find the Pareto surface with respect to the average dose to the organ and not the DVH metrics.On the other hand, MOBO-qNEHVI finds the patient-specific Pareto surface without any userdefined or population-based parameters, unlike the weight tuning policy network, multi-criteria optimization and iCycle.In addition, MOBO-qNEHVI is compatible with any penalty function, including piece-wise objective functions that are common in clinical practice of brachytherapy as well as more sophisticated mixed integer optimization models that incorporate pair-wise penalty (Morén et al 2019) or the mean tail dose (Morén et al 2019) in their objective function.This is because MOBO-qNEHVI was developed to be a general optimizer of any black-box function as long as that function is not highly multi-modal.
Unlike gMCO, which only requires the dwell time optimization algorithm on GPU, MOBO-qNEHVI supports dwell time optimization on GPU and central processing unit (CPU).With respect to D 95% (CTV) and the rest of the ESTRO-ACROP thresholds, MOBO-qNEHVI with parallel initialization was able to obtain an acceptance rate of over 90% with 31 dwell time optimizations.In comparison, the average acceptance rate of gMCO was 26.8% after 1000 dwell time optimizations per patient.Lastly, MOBO-qNEHVI directly explores the Pareto surface in the DVH domain, which includes the Pareto surface in the mean organ dose domain (Zarepisheh et al 2014).The drawback of MOBO-qNEHVI compared to gMCO is its overall performance time (a minute versus 9 s in gMCO).Note that after the initiation, each MOBO-qNEHVI iteration takes 7.35 ± 2.45 s, and the user may choose to run fewer iterations to reduce performance time since each iteration will produce an ESTRO-ACROP plan.This is the first time that penalty weight tuning for FMIO was presented, and Bayesian optimization was used for hyper-parameter tuning in HDR brachytherapy.
In the future, MOBO-qNEHVI will be tested on other cancer types, namely, breast and gynecologic cancers.In these cases, the dose homogeneity index is an important objective that should be controlled with linear and quadratic penalty weights.As seen in figure A6, the delivered clinical plans had a lower coefficient of variation for the inter-catheter dwell times than the MOBO-qNEHVI plans.This limitation is more significant for the dwell positions at the ends of the catheters.Note that the dwell time optimization in our algorithm is performed with FMIO (Antaki et al 2020), where a linear one-sided penalty function controls the DVH metrics, and a two-sided quadratic penalty term aims to increase the dose homogeneity in the target volume.In this study, only the linear penalty weights are optimized by MOBO-qNEHVI to find the DVH trade-offs.The quadratic weights will be added as a parameter, and MOBO-qNEHVI will be used to optimize the dose homogeneity index and other DVH objectives, thereby improving the distribution of inter-catheter dwell times.Additional penalty terms can be added to the objective function of FMIO that better capture the 3D spatial dose properties, such as the ones suggested by Kaplan and Korreman (2021).
It is anticipated that MOBO-qNEHVI will be applied to recent innovations in HDR brachytherapy, such as dynamic intensity modulated brachytherapy (Famulari et al 2020, Morcos et al 2021), as it necessitates no training data.Incorporating the capability to lock certain DVH metrics of a given solution and navigating the associated trade-offs using the PNaV algorithm would be beneficial (Deufel et al 2020).

Conclusion
For the FMIO algorithm, MOBO-qNEHVI was able to find clinically acceptable penalty weights on the DVH Pareto surface for all the simulated patients without prior training.Fixing the penalty weight of the CTV to 1000, limiting the range of the OAR weights to 1 and 500, adding the dose to the target as a parameter, and initializing the surrogate Gaussian function with random sampling resulted in the highest average acceptance rate with a reasonable performance time.Therefore, MOBO-qNEHVI can be considered a reliable Pareto surface calculator that, combined with the FMIO algorithm, can provide automatic treatment plans for HDR brachytherapy within a minute.

Appendix D. Average time calculation for one iteration of MOBO-qNEHVI
To calculate the average time for each iteration of MOBO-qNEHVI (μ(time 1 MOBO)), the mean initiation time (μ(Initiation time)) was subtracted from the mean total time (μ(Total time) and the result was divided by 6; as we had 6 MOBO-qNEHVI iterations in total for each patient.The standard deviation on the initiation time (σ (Initiation time)) and total time (σ(Total time)) were used to calculate the uncertainty on the mean MOBO-qNEHVI iteration time (σ(time 1 MOBO)

Appendix E. Inter-catheter dwell time distribution
To increase the dose homogeneity inside the target volume, it is recommended that the dwell times are well distributed between the dwell positions that are close in the transverse plane.Although the objective of this paper was to evaluate the performance of MOBO-qNEHVI in exploring the DVH objectives and not the dwell time distribution, it is important to report on the dwell time profile for transparency.Assuming that the catheter tips have close axial coordinate, we looked at the dwell times of the dwell positions that are equidistant from the catheter tip.Ideally, equidistant dwell positions have similar dwell times to spread the dose uniformly instead of relying on a few dwell positions with large dwell times.Note that since the equidistant dwell positions are located on different catheters, we refer to their distribution as the intercatheter dwell time distribution.The inter-catheter dwell time distributions for each patient is shown through the box-plots of figures A4 and A5.In addition, the coefficient of variation for inter-catheter dwell time distribution across the patients is presented on figure A6.For each patient, the analysis above considered four plans: the delivered clinical plan, and three clinically acceptable plans from MOBO-qNEHVI each optimizing one DVH objective while respecting the clinical thresholds for the remaining DVH objectives.
Figures A4 and A5 show that the delivered plans have more uniform dwell time distributions than plans from MOBO-qNEHVI.This is confirmed by the average coefficient of variation for the inter-catheter dwell time distribution, which was significantly less for the delivered plans than the plans generated by MOBO-qNEHVI figure A6.It is worth noting that excepts for the dwell positions on the extreme ends of the catheters, the general trend of the dwell times along the catheters is similar between the MOBO-qNEHVI plans and the delivered plans.Also, the delivered plans have been manually adjusted by the human planner, who correct the optimized plans to have well distributed dwell times and consequently high dose homogeneity.Therefore, the delivered plans may not represent the raw output of the clinical treatment plan optimizer.
Lastly, MOBO-qNEHVI only had access to the linear penalty weights and the target dose as these parameters directly control the DVH metrics.However, the quadratic penalty weights of FMIO control the dose homogeneity index.it is possible to provide these parameters to MOBO-qNEHVI and dose homogeneity index as an additional objective.This is to be investigated in the future.

Figure 1 .
Figure 1.Pareto surface, Bayesian optimization and hyper-volume improvement.(a)shows Pareto surface for 2 arbitrary objectives g 1 (w) and g 2 (w).Point A is not Pareto optimal on this graph since objective g 1 (w) can be improved without compromising g 2 (w).In contrast, point B is Pareto optimal because any improvement in g 1 (w) or g 2 (w) leads to a decline in the other objective.(b) Steps in a loop of Bayesian optimization.(c) HVI for w * and w'.Note HVI of w * (green area) is larger than the HVI of w¢ (orange area) with respect to the previous Pareto surface (gray area).Therefore, qNEHVI picks w * as the candidate point for the next iteration.
Gy, 18.75 Gy, 11.25  Gy .7 [ ] ( ) = 2.3.Patient data, simulation details and plan optimization Treatment plans for 13 randomly-selected prostate cancer patients were optimized using the FMIO optimizer implemented in RapidBrachyMCTPS, Famulari et al (2018), Antaki et al (2020), Glickman et al (2020).Patients' post-implant CT images were used with approval from our institutional review board with the numbers MEO-37-2021-2597 and MP-37-2020-5035.Contours were imported from DICOM RT Structure Set files.Dwell positions were extracted from the clinical DICOM RT Plan files.Monte Carlo-based dose rate maps were simulated for each dwell position using RapidBrachyMC, a Monte Carlo dose calculation engine included in RapidbrachyMCTPS (Famulari et al 2018).A voxelized phantom was created based on the patient's CT images.Voxel by voxel tissue composition and mass densities were assigned based on CT Hounsfield unit values.Tissue elemental compositions followed recommendations from the American Association of Physicists in Medicine (AAPM) task group 186 (TG-186) (Beaulieu et al 2012) summarized in appendix table

Figure 2 .
Figure 2. The penalty weights and the DVH metrics of all the 3114 ESTRO-ACROP treatment plans from all patients.

Figure 3 .
Figure3.Time and acceptance rate as a function of MOBO iterations for the experiment configurations in table 2. The solid line and the shaded area represent the mean and standard deviation over all prostate patients.

Figure 4 .
Figure 4. DVH space of patient seven after running six iterations of MOBO-qNEHVI with parameter configuration R4 and parallel initiation of the Gaussian surrogate function.Each point belongs to a single vector containing weights and the target dose.The top figure is the DVH space when D 90% (CTV) was given as an objective to MOBO-qNEHVI, while the bottom figure is the DVH space when D 95% (CTV) was given as an objective.The DVH metrics were calculated with voxel-size of 3 mm.

Figure A2 .
Figure A2.Pareto front of patient 1 to patient 6 as estimated by MOBO-qNEHVI.These results are obtained with 25 random sampling of FMIO to initialize the surrogate Gaussian function followed by 6 iterations of MOBO-qNEHVI with R4 and ESTRO-ACROP thresholds parameter configuration.D 95% (CTV ) was given to MOBO-qNEHVI as an objective.

Figure A3 .
Figure A3.Pareto front of patient 7 to patient 13 as estimated by MOBO-qNEHVI.These results are obtained with 25 random sampling of FMIO to initialize the surrogate Gaussian function followed by 6 iterations of MOBO-qNEHVI with R4 and ESTRO-ACROP thresholds parameter configuration.D 95% (CTV) was given to MOBO-qNEHVI as an objective.

Figure A4 .
Figure A4.Dwell time distribution of the dwell positions along the catheters for patients 1 to 6.The X axis represents the distance of the dwell position from the catheter tip and the Y axis is the box plot of dwell times.The blue box plots are obtained from the delivered clinical plans.The green, red and orange box plots are obtained from the MOBO-qNEHVI plans where the focus is on DVH objectives of rectum, CTV and urethra.

Figure A5 .
Figure A5.Dwell time distribution of dwell positions along the catheters for patients 7 to 13.

Table 1 .
Mean and standard deviation of the number of dwell positions and dose points across ten patients.search interval, then breaking this interval into the same number of discrete values and repeating the process.Accordingly, SCISE started by breaking the range of 1-1000 into the set of {1, 200, 400, 600, 800} (step size = 200).Then, the penalty weight search space was formed by generating all the possible combinations of the weights for the 3 objectives.Therefore, this iteration's total search space size was 6 3 = 216.Next, SCISE ran FMIO 216 times to find the corresponding DVH metrics for each penalty weight combination.The penalty weight combinations were ranked based on the D 90% (CTV) (from high to low), and the top weight combination was selected (for example, [1000, 200, 1]).SCISE halved the step size and defined a new search interval focused on the top weight combination.
The details of this analysis are presented in appendix E

Table 3 .
Result of testing SCISE on 13 prostate patients.The plans with absolute minimum and maximum D 90% (CTV)

Table 4 .
Result of initializing the surrogate Gaussian models with 25 parallel random FMIO calls, then running 6 MOBO-qNEHVI iterations with R4 parameter configuration from table 2. The DVH metrics were calculated with a voxel size of 1 mm.

Table A3 .
).The calculation for the error propagation are presented in the equations below The outcome of linear portion of FMIO is independent of the exact value of the weights.DVH objective values depend on the relative ratio of the weights with respect to one another.