Probabilistic Pareto plan generation for semiautomated multicriteria radiation therapy treatment planning

Objective. We propose a semiautomatic pipeline for radiation therapy treatment planning, combining ideas from machine learning–automated planning and multicriteria optimization (MCO). Approach. Using knowledge extracted from historically delivered plans, prediction models for spatial dose and dose statistics are trained and furthermore systematically modified to simulate changes in tradeoff priorities, creating a set of differently biased predictions. Based on the predictions, an MCO problem is subsequently constructed using previously developed dose mimicking functions, designed in such a way that its Pareto surface spans the range of clinically acceptable yet realistically achievable plans as exactly as possible. The result is an algorithm outputting a set of Pareto optimal plans, either fluence-based or machine parameter–based, which the user can navigate between in real time to make adjustments before a final deliverable plan is created. Main results. Numerical experiments performed on a dataset of prostate cancer patients show that one may often navigate to a better plan than one produced by a single-plan-output algorithm. Significance. We demonstrate the potential of merging MCO and a data-driven workflow to automate labor-intensive parts of the treatment planning process while maintaining a certain extent of manual control for the user.


Introduction
Having achieved a broad range of promising results in recent years, the application of machine learning methods to biomedical engineering is today established as a prosperous subject of research (Park et al. 2018, Siddique andChow 2020).Within automated treatment planning for radiation therapy, while the direct prediction of machine parameters of an optimal or desired plan remains an intractably high-dimensional and nonlinear problem, data-driven methods based on a prediction-mimicking pipeline has helped in homogenizing the labor-intensive process of creating clinically satisfactory plans (Berry et al. 2016).However, such a methodology has several fundamental drawbacks-the produced deliverable plan is, for example, highly dependent on the quality of the prediction, and it is in practice often close to, yet not sufficiently in line with, the clinician's preferences, entailing the need for further post-processing using manual tools (Cagni et al. 2018).Whereas some previous work has been devoted to address the former point (Babier et al. 2020b, Nilsson et al. 2021, T Zhang et al. 2021), the latter is especially concerning in that it implies not only strict requirements on the protocol used for creating the training data, but also significant time spent on model commissioning, which often involves setting up manual objectives supplementing the dose mimicking problem and, in turn, undermines the purpose of automated planning in the first place.Instead of continuing on the direction of developing a fully automated method, in this paper, we propose a new semiautomatic treatment planning workflow in which the a treatment planner or clinician may optionally articulate their own preferences by navigating in real time between Pareto optimal plans, combining ideas from machine learning and multicriteria optimization (MCO).
Automated treatment planning using machine learning, also known as knowledge-based planning, generally concerns the automatic plan generation using knowledge extracted from historically delivered clinical treatment plans (Ge and Wu 2019, Hussein et al. 2018, Ng et al. 2020, Wang et al. 2019).Usually, one uses available data to train a machine learning model to predict the parameters of some parameterized optimization problem such that the solution will, to the largest extent possible, correspond to a clinically satisfactory plan.While some work has been focused on predicting weights in a weighted-sum objective function (Boutilier et al. 2015), the majority of recent work has been aimed at predicting achievable dose-related quantitiesfor example, spatial dose distributions and dose-volume histograms (DVHs)-and setting up a dose mimicking optimization problem to minimize the deviation from the values evaluated on the actual dose to those predicted.
Of such a prediction-mimicking pipeline, the prediction part has been extensively investigated in the literature.Regarding spatial dose prediction, current state-of-the-art methods are mostly based on convolutional neural networks with a U-net-like architecture (Babier et al. 2020a, Campbell et al. 2017, Kearney et al. 2018, Nguyen et al. 2019, Nilsson et al. 2021, Shiraishi and Moore 2016).For DVH or dose statistic prediction, methods based on overlap volume histograms as inputs (Appenzoller et al. 2012, Jiao et al. 2019, Ma et al. 2019, Wall et al. 2018), some involving predicting principal component coefficients of DVHs (Yuan et al. 2012, Zhu et al. 2011), have been more recently joined by those using input images directly as input (Liu et al. 2020, Nguyen et al. 2020, T Zhang et al. 2021).As discussed by Babier et al. (2020b) and furthermore demonstrated in a previous paper by us (T Zhang et al. 2021), a substantial disconnect between the halves of a prediction-mimicking division may emerge from making deterministic inferences due to typical dose mimicking formulations being relatively non-robust to prediction errors.In this context, we showed (T Zhang et al. 2021) that probabilistic methods, which output predictive probability distributions expressing estimation uncertainties, may reduce the information loss between the prediction and mimicking parts.Indeed, much of other previous work (Covele et al. 2021, Fogliata et al. 2019, Nguyen et al. 2021, Nilsson et al. 2021) have already been directed toward precise quantification of predictive uncertainties for spatial dose or DVH statistics.
Despite this, the sensitivity of the final result to prediction errors brings to light more fundamental weaknesses of the current automated treatment planning paradigm.To approach clinical quality of automatically produced plans, the dataset used for training the prediction model must be sufficiently large as well as highly adherent to the protocol and planning standards set for creating the plans (Cagni et al. 2018, Van der Bijl et al. 2020)-however, even then, certain manual intervention in terms of additional objectives or post-processing is sometimes needed to approach satisfactory quality (McIntosh et al. 2021, T Zhang et al. 2021).In fact, as noted by J Zhang et al. (2020) and Van der Bijl et al. (2020), automatically produced plans are the product of a population-based decision made upon the planning tradeoffs without first estimating or exploring what they are.Considering the large prevalence of inter-planner variations even among experienced planners (Nelms et al. 2012), it becomes questionable whether the substantial efforts required to collect large and high-quality datasets is motivated by the end result, particularly if the dose mimicking formulation is heavily controlled by domain-knowledge objectives whose parameters need to be manually tuned and commissioned.Thus, instead of trying to further improve parts of a single-plan-output pipeline, we will direct attention to an MCO-based alternative semiautomatic workflow in which the output is rather a range of plans around what would have been the single-plan output, optionally letting the user make final adjustments through a real-time navigation interface.In contrast to conventional MCO, such a methodology enjoys the benefits of not needing to manually specify appropriate tradeoffs and constraints and of the Pareto surface covered being significantly smaller, thus requiring fewer Pareto plans to obtain comparable discretization accuracy.While the idea of restricting a standard Pareto surface to clinically acceptable regions has been found to be successful (Goli et al. 2018, Huang et al. 2021, Serna et al. 2009), in a machine learning context, recent efforts have been focused on exploring tradeoff directions in DVHs by residual modeling (J Zhang et al. 2020) or predicting the spatial dose distributions of plans Pareto optimal with respect to a prespecified MCO problem (Bohara et al. 2020, Jensen et al. 2021, Nguyen et al. 2020, Van der Bijl et al. 2020).Such methods have the advantage of speed but the disadvantage of, again, being vulnerable to prediction errors-for instance, if one were to navigate between a set of predicted Pareto optimal doses, it is not clear why the interpolated dose should correspond to or even be close to a physically realizable dose, especially in the presence of prediction errors.
In this paper, we propose a semiautomatic treatment planning pipeline in which knowledge extracted from historically delivered clinical plans is leveraged to produce for each new patient a set of Pareto optimal plans, enabling the possibility for the user to articulate their preferences before a final plan is decided upon.To achieve this, we combine ideas from previous work on data-driven methods in automated planning with the flexibility of MCO framework, estimating predictive probability distributions to be translated into tradeoff objective functions.In particular, apart from models predicting spatial dose and dose statistics neutrally, as a basis for the resulting MCO problem, we will also create biased-or, as we will call it, tilted-versions of the models simulating the cases of having optimized more aggressively toward certain groups of planning goals.In particular, a standard three-dimensional convolutional U-net is used to predict spatial doses, trained using a combination of a Sobolev space-inspired spatial loss and a DVH loss.The dose statistic prediction, on the other hand, is performed probabilistically by the similarity-based Bayesian mixture-of-experts model proposed in T Zhang et al. (2020a) and used in T Zhang et al. (2021).Tilted predictions are subsequently constructed using a change of probability measure in the predictive distribution of dose statistics and retraining of the neutral U-net model using an accordingly modified DVH loss.Translating the neutral and tilted predictions into tradeoff objectives, the resulting MCO problem is solved numerically to produce a set of Pareto optimal plans to be presented to the user for navigation.
Specifically, the proposed means of setting up a well-defined MCO problem enables the generation of, for example, fluence-based Pareto plans rather than merely predictions thereof, an idea which has not been previously explored in the literature.The result is a novel semiautomatic workflow in which the Pareto plans are automatically generated relatively quickly and, compared to conventional MCO, span more exactly the range of achievable as well as clinically acceptable plans, all while having relatively limited requirements on training data size and quality.Apart from the proposed pipeline itself, conceptual contributions of this paper include the loss functions used for training the spatial dose prediction model and the notion of tilting predictions to simulate a change of tradeoff preferences.Numerical experiments demonstrate that one may often navigate to a better plan than the single-plan output associated with the neutral predictions and that the corresponding deliverable plan is often better than the clinical ground truth, showcasing the potential of uniting classical MCO with the current data-driven treatment planning paradigm.

Method
Let X and D denote vector spaces of contoured CT images and spatial dose distributions, respectively, and let {(x n , d n )} N n=1 ⊂ X ×D be a training dataset of clinical, historically delivered treatment plans.For sets such as {s i } I i=1 or {s i } i∈R , we will use the shorthand notation {s i } i when the range of the index i is known from context, and analogously for vectors with parentheses instead of brackets.Using a linear voxel indexing, we can represent each x ∈ X and d ∈ D as vectors x = (x i ) i and d = (d i ) i of equal length.Let R = R target ∪ R OAR be the set of regions of interest (ROIs) defined for the treatment site, partitioned into targets R target and organs at risk (OARs) R OAR .For our purposes, we will only use the binary encodings of the ROIs and not the radiodensities in the CT images.Also, let {ψ j } j be a generic collection of dose statistics ψ j : D → R containing all dose-related quantities of interest-for example, dose-at-volume D v or lower/upper mean-tail-dose MTD ± v (Romeijn et al. 2006) levels at different volumes across different ROIs.We will often use y ∈ Y for the vector (ψ j (d)) j of dose statistic values evaluated on a dose d, with Y being the corresponding vector space.
For an out-of-sample patient x * ∈ X , our task is to predict the corresponding spatial dose d * ∈ D and its dose statistic values y * = (ψ j (d * )) j ∈ Y and solve an appropriately constructed MCO problem to obtain a set of Pareto optimal plans.While the dose statistics are directly evaluable using the predicted spatial dose, it will be crucial to estimate the multivariate predictive distribution over groups of dose statistics rather than merely a single, deterministic prediction, which motivates our employing a separate dose statistic prediction model-the interstatistic dependencies will help in creating tilted predictions, on which the MCO tradeoffs will be based.In particular, the proposed pipeline, shown in Figure 1

Spatial dose prediction
As a start, we need a spatial dose prediction model which can predict the dose distribution d * for each new patient x * .Ideally, one would prefer to obtain the full predictive distribution but although some previous work (Nguyen et al. 2021, Nilsson et al. 2021) has been devoted to uncertainty estimation for spatial dose prediction, no existing method is able to output the complete multivariate predictive distribution as a closed-form probability density.Instead, we will proceed with a deterministic U-net dose prediction model f : X → D and an associated regression problem on the form where = ( i ) i is a non-isotropic regression error for which we assume the i to be independent and normally distributed as i ∼ N(0, σ 2 i )-note, however, that the particular architecture used is not conceptually important.For each input x = (x i ) i ∈ X , we define the ith-position information x i as the (|R| + |R target |)-dimensional vector such that the first |R| components are the binary ROI encoding of voxel i and the last |R target | components are distances to the targets.Explicitly, this is written as where ⊕ denotes vector concatenation and DT R (i), shorthand for distance transform, is the Euclidean distance from voxel i to the voxel in R nearest to i.
For training, it is straightforward to see that a standard maximum-likelihood approach to fit the network weights in f is equivalent to a weighted mean squared error minimization under the current assumptions on , the weight of each voxel i proportional to σ −2 i .In addition to a mean squared error, we will introduce two additional loss contributions: one to account for the fact that a physical dose must vary smoothly in space, and one to enforce that the predicted DVHs are reasonable.For the first contribution, let u : R 3 → R be the three-dimensional scalar field representing a dose distribution d in continuous space, and note that the standard mean squared error |d − d| 2 in discrete space corresponds to the squared Inspired by theory on partial differential equations, where one often works in Sobolev spaces (Evans 2010), we will instead use the Sobolev Translated into discrete space and adapting to the voxel weights arising from the non-isotropic likelihood, we write the spatial loss contribution L spat : D 2 → R as where α > 0 controls the contribution of the gradient term and ∇d i is used to denote the central difference approximation to the spatial gradient in the voxel grid at index i.While similar loss functions have been used in image reconstruction (Lu andChen 2019, Van der Jeught et al. 2021) and meteorology (Höhlein et al. 2020) contexts, the idea is new to the dose prediction literature.In addition to the spatial loss, we will use a DVH-based loss L DVH : Y 2 → R similar to that proposed by Nguyen et al. (2020), but with squared differences taken in the dose direction rather in the volume direction.Specifically, if {ψ j } j∈S D is a subset of all dose statistics restricted to only those of dose-at-volume type, we have where the σ j are introduced to entail a weighting analogously as the σ i .Combining the spatial and DVH-based losses, the loss function L : D 2 → R used for training is written as where the total loss to be minimized is given by the mean over the training examples.To ensure differentiability of each ψ j (d) with respect to d, for each forward pass in the neural network, the local dose (d i ) i∈R in each ROI R is evaluated, sorted and linearly interpolated to be able to approximate the dose-at-volume value

Dose statistic prediction
When predicting the dose statistic values y * = (ψ j (d * )) j given x * , we now seek a probabilistic method to estimate the multivariate predictive density p(y * | x * , {(x n , y n )} n ).These are essentially the same preconditions-high-dimensional inputs and outputs compared to data size, complex input-output relations and possibly skewed distributions-that motivated the development of the dose statistic prediction method using a mixture-of-experts model in T Zhang et al. (2021), with the exception that a spatial dose prediction method has already been trained.Thus, instead of constructing and training a variational autoencoder for feature extraction as in T Zhang et al. (2021), we will use the vector (ψ j (f (x))) j of dose statistics evaluated on the predicted dose f (x) as inputs to the mixture-of-experts model.This will be combined with purely geometric features φ geom (x), such as the discretized target distance transforms used for comparison in T Zhang et al. (2021), to produce for each input x a total feature vector (2) To further regularize the input space by reducing its dimension, which will be beneficial for the subsequent mixture-of-experts model, we will fit an isomap transformation (Tenenbaum et al. 2000) to the total feature vectors {φ tot (x n )} n , creating corresponding embeddings {φ iso (x n )} n .The isomap is a generic dimensionality reduction algorithm producing embedding vectors for which similarities are based on geodesic rather than Euclidean distances, which suits our purposes well.Similarly, for the output space Y, we will extract the main principal components of the centered data {y n −y} n , where y is the sample mean, writing y n = y +P y n pc for each n with the principal components as columns in P and the y n pc being coefficient vectors.Here, as we shall see below, it is important that each y may be reconstructed as an affine transformation of its corresponding coefficient vector y pc .
Having obtained a preprocessed dataset {(φ iso (x n ), y n pc )} n , we can proceed by applying the same similarity-based mixture-of-experts model (T Zhang et al. 2020a) as used in T Zhang et al.
(2021), which outputs predictive distributions as multivariate Gaussian mixtures.This is a recently developed nonparametric Bayesian regression method in which predictions are based on input similarities rather than explicitly modeling input-output relations.With experts as multivariate normal distributions {N(µ c , Σ c )} C c=1 , for each new input φ iso (x * ), mixture weights are calculated by first evaluating the similarity τ n to each training input φ iso (x n ), and then the probability σ nc of each training output y n pc belonging to each expert class c-the total mixture weight π c for each expert class c is then given by π c = n τ n σ nc .In particular, input similarities are based on the Mahalanobis distances (φ iso (x) − φ iso (x )) T Λ(φ iso (x) − φ iso (x )), where Λ is a precision matrix.The model has a predictive likelihood on the form where θ = (Λ, {(µ c , Σ c )} c ) are the model parameters and where

Tilting
With the neutral spatial dose and dose statistic prediction models in place, in order to be able to construct an MCO problem rather than a single-plan dose mimicking problem, our next step is to tilt the predictions to be more aggressive toward one or several planning goals.More precisely, whereas the neutral prediction may be associated with a balanced prioritization of the planning goals used in the historical plans, we wish to predict the outcome had the prioritization been heavily biased toward certain goals-for example, an extra low rectum dosage for a prostate plan, possibly at the cost of sacrificing target coverage-without actually needing to include such plans in the training data.On the other hand, in contrast to conventional MCO, we also wish to exclude completely unrealistic plans-for example, one with unacceptably cold target dose to achieve an excessively low rectum dose-without having to specify constraints manually.In other words, the Pareto optimal solutions to the MCO problem, and thus also the range covered by the tilted predictions, should span as exactly as possible the plans which are achievable as well as clinically acceptable.
Noting that the predictive distributions of the dose statistics have infinite support, we want the dose statistic tilting to direct attention toward the distribution tails.For this, we will exploit the mixture-Gaussian form of the distributions and use exponential tilting, inspired by importance sampling techniques.Given a probability distribution p(y) for y, the ζ-exponentially tilted distribution p ζ (y) is obtained by a change of probability measure according to p ζ (y) ∝ e ζ T y p(y).In our case, it can be shown that the tilted predictive distribution is again a Gaussian mixture An interpretation of this is that the moment-generating function E N(y * | y+P µc,P ΣcP T ) e ζ T y * of y * under y * ∼ N(y + P µ c , P Σ c P T ) is multiplied to each original mixture weight π c when tilting-that is, more mass in the tilted distribution is assigned to those y * which are more parallel to ζ.Moreover, the mean shift for each expert class c will be P Σ c P T .This suggests that ζ should be parallel to the ideal direction of y * and of appropriate magnitude.It turns out that the main principal component p 1 is a good choice for this, with eventual sign modifications for targets in order for distributions of dose statistics to be maximized to be tilted upward, and vice versa.Using the sign convention that p 1 should be in the mostly positive direction and accounting for the fact that ζ should be in units inverse to y * , we may write ζ as where ι ≥ 0 is a constant and the denominator is the predictive standard deviation of y * along the direction p 1 .Using this, we may achieve good choices of ζ only by tuning a single parameter ι.Note that ζ = 0 corresponds to the neutral model.Figure 5 shows examples of tilted distributions in comparison to their neutral counterparts.Finally, to also obtain spatial dose predictions more compatible with the tilted dose statistic predictions, we need to modify the neutral dose prediction model.This is done by retraining the neutral model using the same spatial loss function but with a different DVH loss.Denoting by µ ζ (y) the ζ-exponentially tilted predictive mean of y, for each tilting ζ, we may obtain a retrained dose prediction model f ζ by minimizing the same total loss as described in Section 2.1 but with L replaced by -that is, the reference levels for the dose statistics are replaced by the mean of the tilted predictive distribution according to the mixture-of-experts model.

MCO
Equipped with a method of tilting predictions of both DVHs and spatial dose distributions, we are now set to construct the MCO problem.Unlike in conventional MCO, where each tradeoff should represent a naive ideal-e.g., max-dose functions at zero dose for OARs-we may utilize the prior information contained in the predictive models to design tradeoff objectives in such a way that penalties become relatively small beyond the range of realistic values.To convert the predictions to objectives, we will use the binary cross-entropy objectives presented in Nilsson et al. (2021) and T Zhang et al. (2021).Recall that the marginal predictive distributions for the dose in each voxel i and the value of each dose statistic j are Gaussian and mixture-Gaussian, respectively, following the assumptions on the regression error and properties of the mixtureof-experts model, and let the associated univariate cumulative distribution functions be denoted by F ζ i and F ζ j under a tilting ζ.For each voxel i achieving dose d i during optimization, the penalty contribution will be based on the binary cross-entropy where t i is a binary label set to 1 if d i is to be maximized and 0 otherwise-the contribution from achieving the value ψ j (d) in dose statistic j is calculated analogously, t j being the corresponding label.Such an expression has the property that the higher the certainty that the voxel dose or dose statistic value is around some range of values, the more will the objective penalize deviations from those values.At the same time, since the F i and F j are continuous and strictly increasing, the optimizer will always have incentive to improve even when beyond the range of typical values.For more details on such objectives-in particular, including an illustration of their behavior for different probability distributions-see T Zhang et al. (2021).
Hence, for each tradeoff, all of the voxels and dose statistics will be included, but with their respective distributions tilted differently.If ζ is a particular tilting, the associated tradeoff objective ψ ζ is written as where the r i are the relative volumes of the voxels with respect to the outline ROI, summing to unity, w spat and the w j are weights and S obj is a subset of all dose statistics considered.With a set Z 0 of appropriately constructed tiltings, where the zero tilt ζ = 0 represents the neutral model, the resulting MCO problem is written as where η denotes the optimization parameters, E its feasible set and the dose deposition is assumed to be defined through some mapping d = d(η), the details of which depend on the type of optimization used.One example is in fluence map optimization, where η ≥ 0 may represent discretized nonnegative fluences and where d is linear in η (Ehrgott et al. 2010); other examples include direct machine parameter optimization for techniques such as dynamic multileaf collimator delivery, sliding-window volumetric modulated arc therapy (VMAT), intensity-modulated proton therapy, tomotherapy and ordinary VMAT (Craft et al. 2014, Unkelbach et al. 2015), all but the last-mentioned also with the property of d being linear (or approximately linear) in the optimization variables η.Standard algorithms exist for numerically solving general MCO problems-for details, see Breedveld et al. (2019) and Bokrantz (2013).For example, one may optimize on the weighted-sum total objective ζ∈Z w ζ ψ ζ using different weight patterns (w ζ ) ζ∈Z -specifically, such an algorithm produces |Z| plans using one-hot vectors as weight pattern, called anchor plans, and one using the all-one vector, called the balance plan (Craft et al. 2014).In the end, regardless of the Pareto plan generation algorithm chosen, we obtain a set {d k Pareto } k of Pareto optimal plans with corresponding optimization parameters {η k Pareto } k .To decide upon a final plan, the user now has the opportunity to intervene by navigating on the surface of Pareto plans, which numerically amounts to choosing an interpolation Pareto with nonnegative coefficients λ = (λ k ) k of unit sum.In a treatment planning system such as RayStation (RaySearch Laboratories, Stockholm, Sweden), for instance, one can use sliders to balance the tradeoffs while inspecting in real time the spatial dose, DVHs and clinical goals of the navigated plan.For delivery techniques equipped with the abovementioned dose-variable linearity property, the analogous interpolation k λ k η k Pareto constitutes a feasible plan (if the feasible set E is convex) that recreates the navigated dose distribution, meaning that any navigated plan is directly realizable-otherwise, an additional optimization is performed to mimic the navigated plan using feasible optimization variables (Bokrantz 2012, Craft et al. 2014).To the extent that the navigated dose may guaranteed to be physically realistic, this motivates the generation of Pareto plans and associated optimization variables rather than directly performing navigation on the predicted doses.Alternatively to manual navigation, in RayStation, it is also possible to automatically navigate by specifying some objectives and constraints summarizing one's preferences, e.g. based on clinical goals.In particular, dividing a set of clinical goals {±ψ j (d) ≤ ± ψj } j∈S CG , signs depending on t j , into objectives and constraints S CG = S CG, obj ∪ S CG, constr , we may write the autonavigation optimization problem as minimize (x) ± denoting the positive or negative part of x.While autonavigation has the potential of making the proposed semiautomatic workflow fully automatic, due to the fact that the construction of ( 6) needs to be based on domain knowledge, it may also be viewed an enhanced manual step.

Computational study
To demonstrate the proposed semiautomatic pipeline, we performed numerical experiments using a dataset originating from Iridium Cancer Network (Antwerp, Belgium), which comprised 91 retrospective treatment plans of prostate cancer patients having undergone a prostatectomy prior to radiation therapy.The patients were treated with dual 360-degree VMAT arcs and prescriptions of 7000 cGy in the prostate bed and 5600 cGy in the seminal vesicles and pelvic nodes, with final doses calculated by the collapsed cone algorithm in RayStation.The dataset was split into 84 training or validation patients and 7 test patients-the models trained using the former set were used to make predictions and create Pareto plans for each patient in the latter.For all parts of the numerical experiments, the set R of ROIs considered were the prostate PTV, seminal vesicles PTV, rectum and bladder regions, of which the former two constituted R target .
The neural network used for the neutral and tilted spatial dose prediction models was implemented in TensorFlow 2.4, using the three-dimensional U-net architecture depicted in Figure 2. Convolutional and max-pooling layers were used for the encoder part and transposeconvolutional layers for the decoder part, along with a sequence of convolutional layers before the final output is produced, and layer normalization was applied after each rectified linear unit (ReLU) activation.The input images were preprocessed to arrays of size 64 × 48 × 83 × 6 with a 5 mm voxel resolution, where, as expanded in (1), the 6-dimensional information x i for each voxel i is the binary ROI encoding concatenated with the target distance transforms.Comprising around 2 • 10 7 weights, the models were fitted using a standard Adam optimizer and a training-validation split of 75 and 9 patients, respectively.The gradient term weight α in L spat was set to 1, and the dose statistics used in the DVH-based loss L DVH were dose-atvolume functions {D i/100 } 99 i=1 at each integer-percentage volume level from 1 % to 99 %, with individual dose statistic weights σ −2 j manually tuned such that targets had total weight 10 times that of OARs.For fitting the neutral model, the training lasted around 100 epochs, after which an approximate minimum in validation loss was reached.The tilted models were obtained by retraining the neutral network for another 10 epochs using the modified DVH loss (4), the length of the training set so as not to deviate too far from the neutral prediction while still achieving the desired tilted effect.
The set {ψ j } j of all dose statistics to be considered was dose-at-volume functions {D i/100 } 99 i=1 for all four ROIs, along with additional mean-tail-dose functions {MTD ± i/100 } 99 i=1 (Romeijn et al. 2006) for targets (using lower mean-tail-dose MTD − v for v ≥ 0.5 and upper mean-tail-dose MTD + v for v < 0.5) to increase control of distribution tails.In order to achieve better predictive accuracy given the small dataset, we used separate models for each ROI at the

Dose
Figure 2: Illustration of the architecture of the U-net used for neutral and tilted spatial dose predictions.The encoder part consists of convolutional (yellow) and max-pooling (red) layers, and the decoder upsamples using transpose-convolutional (blue) layers.After each layer, an activation is applied if indicated with a darker color, which is followed by layer normalization.cost of sacrificing inter-ROI dependencies.The raw inputs φ tot (x) were constructed according to (2), where the geometric inputs φ geom (x) were distance transforms from each ROI in R OAR to each ROI in R represented by histograms in a total of 88 dimensions.Concatenating with the dose statistic values predicted by the spatial model and using the isomap algorithm, embeddings φ iso (x) were reduced to 10 dimensions.Meanwhile, the 4 largest principal components were used in P , leading to inputs φ iso (x) and outputs y pc of the mixture-of-experts models being of dimension 10 and 4, respectively.Each mixture-of-experts model used C = 16 classes with a posterior sample size of 10 for each mean-covariance pair (µ c , Σ c ).The set Z of tilts was constructed by using (3) on each ROI in R at a time while letting predictions for other ROIs remain neutral, with the constant ι tuned manually, resulting in |Z| = |R| + 1 = 5 tilts in total including the original all-neutral model.
Constructing tradeoff objectives from the neutral and tilted predictions according to (2.4), where the subset S obj was chosen to be dose-at-volume statistics at volume levels 10 %, 20 %, . . ., 99 % for OARs and both dose-at-volume and mean-tail-dose statistics at 1 %, 2 %, 5 %, 10 %, 20 %, . . ., 90 %, 95 %, 98 %, 99 %, the MCO formulation (5) was implemented and solved numerically in a research version of RayStation 11A.A standard algorithm (Bokrantz 2013) was used to generate |Z| + 1 = 6 fluence-based Pareto plans-five anchor plans and one balance plan-each using 40 iterations of the in-house sequential quadratic programming solver.To evaluate the proposed methodology against a single-plan-output automated planning algorithm, we compared the neutral anchor plan to the best possible plan in the navigation space, here represented by the autonavigated plan using the optimization formulation (6) and the clinical goals shown in Table 1.Note that the use of autonavigation is mostly intended to serve as a means of evaluate the quality of the produced Pareto frontier approximation, replacing a human user for the purposes of this study.We also compared OAR sparing in the clinical plan to the deliverable VMAT plan constructed from the best navigated plan using the in-built conversion process in RayStation.

Results
Starting with the prediction models, although they are not the main focus of our numerical experiments, we may still validate qualitatively that their purposes are fulfilled.Figure 3 shows a comparison between the neutrally predicted spatial dose and the corresponding ground truth for a patient in the test dataset.While some noticeable differences in both spatial dose and DVH are present-for example, the predicted dose decays faster beyond the targets and has inferior target coverage in the prostate PTV-the overall quality of the dose prediction is quite sufficient for being a starting point in our pipeline.Furthermore, Figure 4 shows difference maps between the neutral and the rectum-and bladder-tilted models.One can see that while the main differences are located in the respective ROIs, the dose shifts are relatively smooth, extending beyond the ROIs subject to tilting, and also smaller in or near overlap regions with the targets, as would be expected of a physically realistic dose.These properties are indicative of the Sobolev and DVH loss functions achieving the desired effects of penalizing spatial unevenness and excessive DVH deviation, respectively, in combination with the natural smoothness implied by the convolutional neural network architecture.As for the dose statistic prediction, the dimensionality-normalized root mean squared error had means 137 cGy, 109 cGy, 722 cGy, 412 cGy and standard deviations 21 cGy, 13 cGy, 297 cGy, 111 cGy for the prostate PTV, seminal vesicles PTV, rectum and bladder ROIs, respectively, over the seven test patients.Moreover, Figure 5 shows pointwise DVH confidence bands for both the neutral and the tilted model over the four ROIs considered for the same test patient.One can see that the ground truth DVHs are well within the neutral prediction bands and that the tilted counterparts have predicted significantly more uniform doses to the targets and lower doses to the OARs.Hence, this assures us that the dose statistic prediction is reasonable and that the tilted models indeed correspond to more optimistic predictions.For each patient, obtaining the six Pareto plans took around five minutes.To visualize the range of plans achievable with the produced approximation of the Pareto set, Figure 6 shows the span of DVHs achievable for each ROI on the test patient alongside the fluence-based neutral and navigated plans, where the navigation has been performed by the autonavigation algorithm.One can see that the fluence-based navigated plan, representing for our purposes the best plan in the Pareto set, has better sparing of the rectum and bladder as well as better target coverage compared to the neutral anchor plan, which represents what would have been the output had we pursued a single-plan approach.As may be seen in Figures 3 and 5, the predictions for the prostate PTV are slightly too cold, which has indeed translated to a similar effect in the neutral anchor plan.On the other hand, by navigating closer to the prostate PTV anchor plan, we are able to improve the target coverage-analogously, we are able to simultaneously achieve better OAR sparing.This illustrates, in particular, the necessity for a means of easily adjusting the preliminary output of an automated treatment planning algorithm, for which the MCO framework is well-adapted.Furthermore, Table 2 shows the average differences in dose statistic values between the fluence-based navigated and neutral plans and between the machine parameter-based navigated and clinical plans.Here, it is evident that the fluence-based and machine parameterbased navigated plans present improvements over the neutral and clinical plans, respectively.One can also see in Figure 8 that the machine parameter conversion is able to keep differences between the machine parameter-based and fluence-based plans small, which reassures us that the fluence-based MCO, despite being an idealization, is physically realistic to a certain extent.Although similar experiments need to be performed on larger datasets and other treatment sites in order to be able to draw definitive conclusions, all in all, the results show that our proposed semiautomatic methodology has the potential of improving the quality of plans produced by fully automated pipelines while offering the additional flexibility of real-time MCO navigation.

Discussion
Despite easily being able to produce acceptable or near-acceptable plans in most cases, current automated treatment planning algorithms are approaching a plateau in performance as function of training data size and quality-in the last fraction of cases, they struggle with consistently achieving clinical quality without additional post-processing.Instead of trying to make further improvements of such algorithms, we have argued that this is indicative of more fundamental weaknesses of the automated planning paradigm and sought to develop a semiautomatic counterpart, where the output is a range of plans instead of a single plan and the final plan may be decided upon after optional manual adjustments.Suiting our purposes well, the established framework of MCO was combined with machine learning and optimization methods known from classical prediction-mimicking automated planning algorithms.In particular, a threedimensional convolutional U-net with specially developed loss functions was used to predict spatial dose, and a nonparametric Bayesian regression method was used to estimate multivariate predictive distributions of dose statistics.These neutral predictions were then tilted by a change of probability measure into ones biased toward different groups of planning goals at a timeimportantly, this idea allowed us to simulate different goal prioritizations without requiring such plans to be included in the training data.Constructing tradeoffs based on predictive distributions of spatial dose as well as dose statistics, the resulting MCO problem was numerically solved to produce Pareto plans, presenting the user with a navigation-ready setup.Compared to previous literature on the subject, our proposed method differs in that actual fluence-based or machine parameter-based plans are generated, as opposed to only considering predicted doses, and that we have completed the semiautomatic pipeline by performing navigation and converting to a deliverable plan.The results are promising, illustrating the facts that one may often improve the neutral anchor plan by navigating on the Pareto surface and that the corresponding deliverable plan may often be better than the clinical ground truth.Among the advantages of our proposed method are the limited demands on the size and quality of the input training data-as the output no longer needs to precisely match the clinical protocol at hand, the data requirements are arguably even less strict than for a conventional automated planning algorithm-the minimal need for tuning and commissioning domain-knowledge objectives in the dose mimicking problem, and the real-time navigation interface facilitating reaching consensus between treatment planners and radiation oncologists.Furthermore, if one prefers a completely automated approach, the manual navigation step may be replaced by an autonavigation algorithm based on clinical goals.While flucence-based VMAT Pareto plans, having the also important advantage of computational speed but the disadvantage of requiring an additional conversion step, were used for the numerical experiments in this paper, there are numerous other common delivery techniques for which the navigated plan is directly or almost 9ROXPH>?@'RVH>F*\@ 9ROXPH>?@379SURVWDWH 379VHPLQDOYHVLFOHV 5HFWXP %ODGGHU directly deliverable, examples including DMLC, sliding-window VMAT, intensity-modulated proton therapy and tomotherapy.On the other hand, our workflow has the additional step of creating tilted versions of the predictions, which is relatively sensitive to the estimated multivariate predictive distribution of the dose statistics and thus includes a certain extent of hand-crafting.
While the present work primarily focuses on conceptually demonstrating the proposed method, there are many interesting directions for future research.For instance, to further conform to the philosophy of probabilistic machine learning, one may seek to develop a spatial dose prediction method which is able to estimate the multivariate probability distribution over all voxel doses, e.g. using a probabilistic extension of a U-net (Kohl et al. 2018), Gaussian processes (Rasmussen and Williams 2006) or Markov random field methods (Murphy 2012).While the Sobolev space-inspired loss function used for spatial dose prediction is shown to work well, one would need further numerical experiments to confirm that it improves the dose prediction quality over ordinary loss functions.Other possibilities include investigating prediction of all dose statistics at once instead of using ROI-specific models and developing a more systematic and customized method of tilting dose statistic predictions, particularly in such a way that one may allocate higher importance to certain dose statistics instead of, as is the case with the current method, treating the whole DVH for each ROI relatively equally.For example, one could investigate ways of constructing more advanced tilting vectors ζ, e.g by analyzing regression residuals such as in J Zhang et al. (2020).Lastly, another important next step of evaluating the proposed methodology is to include human treatment planners and clinicians to perform the navigation part and evaluate the quality of the end result.Although more evaluations of the method on other datasets and treatment sites remain before definitive conclusions may be drawn regarding how the method compares against existing ones, the semiautomated data-driven workflow presented in this work has shown considerable promise in alleviating the radiation therapy treatment planning process from labor-intensive and monotonous tasks.

Conclusion
In this work, we have presented a new semiautomatic treatment planning pipeline in which knowledge extracted from historical plans is utilized to generate for each new patient a Pareto set representation, optionally allowing for manual adjustments through navigation before a final plan is settled upon.By unifying ideas from the current automated planning paradigm with MCO-in particular, where neutral spatial dose and dose statistic predictions are augmented with counterparts biased toward different planning goals-we are able to achieve this with minimal quality requirements on input training data.The computational study illustrates a substantial benefit from exploring different tradeoffs compared to a single-plan-output approach and the potential of ultimately obtaining deliverable plans better than the clinical ground truth.For clinics wishing to automate labor-intensive tasks in the treatment planning process while preferring to maintain a certain extent of manual control, the proposed methodology constitutes a viable option to the current fully automated planning paradigm.

Figure 1 :
Figure 1: Flowchart illustrating the different parts of the proposed pipeline.

Figure 3 :
Figure 3: (a) Transversal cuts of the neutrally predicted spatial dose (left) and the clinical ground truth counterpart (right).(b) Comparison in DVH between those of the neutrally predicted spatial dose (dashed) and the clinical ground truth (solid).

Figure 4 :
Figure 4: Difference maps for the rectum-and bladder-tilted spatial dose predictions (left and right, respectively) versus the neutral prediction.

Figure 5 :
Figure 5: Illustration of neutral (green) versus tilted (purple) dose statistic predictions, where the shaded bands correspond to 99 %, 95 % and 70 % prediction intervals for each dose-at-volume statistic Dv.The training DVHs are shown in gray and the ground truth in black.Note that since the prediction intervals are displayed pointwise for each v, there is no guarantee that the interval limits are monotonous in v.

Figure 6 :
Figure6: Visualization of the DVH spans of the produced Pareto plans for two test patients, where the shaded regions lie between minima and maxima for each dose-at-volume statistic Dv, pointwise for each v.The lines show the DVHs for the neutral anchor (dashed), the autonavigated (solid) and the clinical (dotted) plan.

Figure 7 :
Figure7: Boxplot of the spread of each dose statistic over the seven patients for the autonavigated, neutral anchor, deliverable and clinical plans.The acceptance levels for the clinical goals on the targets are marked with vertical gridlines.

Figure 8 :
Figure 8: (a) Comparison in DVH between those of the fluence-based autonavigated plan (dashed), the converted machine parameter-based deliverable plan (solid) and the clinical plan (dotted).(b) Difference map between the deliverable and the autonavigated plans.

Table 1 :
Clinical goals used in the autonavigation optimization problem (6), either as constraints or objectives.

Table 2 :
Differences in dose statistic values between the autonavigated (Nav) and the neutral anchor (Neu) plan, and between the deliverable (Del) and the clinical (Clin) plan.The displayed values are sample means and standard deviations of the pairwise differences over the test dataset of seven patients.