A machine learning approach to the accurate prediction of multi-leaf collimator positional errors

Joel N K Carlson; Jong Min Park; So-Yeon Park; Jong In Park; Yunseok Choi; Sung-Joon Ye

doi:10.1088/0031-9155/61/6/2514

1. Introduction

The introduction of volumetric modulated arc therapy (VMAT) as a method for delivering radiotherapy has decreased delivery time and monitor units (MU) as compared to conventional intensity modulated radiation therapy (IMRT) (Otto 2008). However, due to the highly choreographed nature of VMAT delivery, many potential sources of error arise, necessitating patient specific quality assurance (QA) and dosimetric verification techniques. The complex movement of the multi-leaf collimator (MLC) is one such source of errors between treatment planning and delivery. MLC positional errors are differences between the planned and delivered positions of the individual MLC leaves. These deviations can be studied by comparing the leaf positions encoded in the planning DICOM-RT files, which contain the intended leaf positions, to the machine reported DynaLog files, which contain the leaf positions during delivery. Although the manufacturer specified accuracy of DynaLog files is not present in the literature, DynaLog file reported MLC positions have been shown to be accurate through the analysis of film (Zygmanski et al 2003), 2D diode array (Li et al 2003), and electronic portal imaging device (Zeidan et al 2004) measurements.

Systematic shifts in leaf position and leaf gap have been shown to have detrimental effects on the accuracy of the delivery of dose distributions for both IMRT (Rangel and Dunscombe 2009, Yan et al 2009, Bai et al 2013) and VMAT (Oliver et al 2010, Tatsumi et al 2011). Some of the causes of leaf errors are known; for example, velocity of individual MLC leaves has been shown to have an approximately linear relationship with positional errors (Ramsey et al 2001, Losasso 2008). Miura et al also showed that gamma passing rates are correlated with MLC leaf velocity, indicating that errors in MLC positions due to large velocities may have negative effects on dosimetric accuracy (Miura et al 2014b). Furthermore, it has been shown that constraining the millimeters traveled per leaf per MU improves the delivery accuracy of the treatment plan (Chen et al 2011, Miura et al 2014a).

Due to the negative impact of MLC positional errors on the delivery accuracy of radiotherapy plans, it is advantageous to be able to predict how the errors will impact the delivery accuracy. To this end, several modulation indices have been developed in attempts to score the delivery accuracy of VMAT and IMRT plans (Li and Xing 2013, Masi et al 2013, Park et al 2014a, 2014b) before they are delivered. However, these methods are correlational, and appropriate thresholds for these values are difficult to define (Park et al 2014b). Furthermore, these indices do not give the treatment planner any information as to how the dose distribution as viewed in the treatment planning system (TPS) will be influenced by the errors.

Therefore, in this study we focused on creating a method for predicting MLC positional errors before delivery, and incorporated those errors into the dose distribution calculation to enable the treatment planner to see a more accurate representation of the dose as it will be delivered. To predict the errors, we first acquired planned and delivered MLC positions from a series of VMAT plans, and calculated the differences between the two. Next, we calculated leaf motion parameters of the plans which were hypothesized to lead to MLC errors. We then built machine learning models using these parameters as inputs to predict the errors between planned and delivered MLC positions. We then verified the accuracy of the predictions, and assessed their impact on QA and patient dosimetry.

The final outcome of the study is a model capable of taking a planned set of MLC positions in the form of a DICOM-RT file, and predicting the positions which will be delivered to a high degree of accuracy. By incorporating the predictions of the model into the TPS, we show that it is possible to achieve a more accurate representation of the true locations of MLC leaves, which allows treatment planners to see a realistic view of the dose that will be delivered to the patient.

2. Materials and methods

2.1. VMAT plans

A retrospectively selected set of 74 VMAT plans was acquired from three separate institutions for this study. The plans from Institution 1 were for head and neck (H&N) (N = 20), and prostate (N = 20) cancer. The plans from Institution 2 were also for H&N (N = 6), and prostate (N = 10). For Institution 3 there were H&N plans from various sites (N = 15) and prostate plans (N = 3).

All plans were generated in the Eclipse system (Varian Medical Systems, Palo Alto, CA) with the progressive resolution optimizer 3 (PRO3, ver.11.0.31, Varian Medical Systems, Palo Alto, CA). Dose distributions were calculated using the anisotropic analytic algorithm (AAA, ver.11.0.31, Varian Medical Systems, Palo Alto, CA) with a dose calculation grid of 2 mm. Two full arcs were used in each plan, and optimized such that the angular separation between control points (CPs) was 2.0341°, leading to 356 individual CPs per plan.

Each plan was delivered using a linear accelerator equipped with a Varian Millennium 120 MLC. All plans from each institution were delivered using a single linear accelerator, and therefore a single MLC from the respective institution. The Millennium 120 MLC consists of two banks of 60 MLC leaves, with the outer 20 and inner 40 on each side having widths of 1 cm and 0.5 cm, respectively. Initial calibration of all MLCs was performed by a qualified Varian engineer. In all institutions included in this study, TG-40 (Kutcher et al 1994) and TG-142 (Klein et al 2009) protocols are followed for MLC QA.

2.2. Determining MLC error magnitude

The planned positions of each individual MLC leaf at each CP for every plan were extracted from DICOM-RT files exported from the Eclipse system. Therefore, 42,720 leaf position data points were extracted from each plan (356 CPs for each of the 120 MLC leaves). The plans were then delivered, and the delivered positions of the individual MLC leaves were extracted from the DynaLog files of the MLC.
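The counts quoted above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only (the study's own analysis was carried out in R); it assumes that each arc has a control point at both its start and its end, which is not stated explicitly in the text.

```python
ARC_DEGREES = 360.0        # one full gantry rotation
CP_SPACING_DEG = 2.0341    # angular separation between control points (from the text)
ARCS_PER_PLAN = 2          # two full arcs per plan (from the text)
LEAVES_PER_MLC = 120       # Millennium 120 MLC: two banks of 60 leaves

# Assumption: each arc has a control point at both ends, so the number of
# control points is the number of angular intervals plus one.
intervals_per_arc = round(ARC_DEGREES / CP_SPACING_DEG)
cps_per_arc = intervals_per_arc + 1
cps_per_plan = cps_per_arc * ARCS_PER_PLAN
points_per_plan = cps_per_plan * LEAVES_PER_MLC

print(cps_per_arc, cps_per_plan, points_per_plan)  # 178 356 42720
```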

After extracting the planned positions from the DICOM-RT files and the delivered positions from the DynaLog files, the two sets of positions (planned and delivered) must be synchronized before the difference between positions can be calculated. Synchronization must take into account the differences between the sampling times of the DICOM-RT and DynaLog files. DynaLog files record the position every 0.05 s in units of motor counts, which were converted to millimeters according to manufacturer specifications. DICOM-RT positions are recorded at each CP in units of millimeters. At a CP where more than 4.238 MU is to be delivered, the gantry slows to allow successful delivery (Park et al 2015), changing the time between CPs. In this dataset no CP in any plan had a planned MU greater than this threshold, and therefore the time between CPs was taken to be a constant 0.424 s, corresponding to the maximum gantry speed (Park et al 2015). Therefore, after synchronization, the maximum time difference between the plan file and the DynaLog file is 0.025 s, that is, half of the sampling interval of the DynaLog files.
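One possible way to carry out this synchronization is to pick, for each CP, the DynaLog sample nearest in time. The following is a minimal sketch, not the in-house implementation: the function names are hypothetical, and it assumes that the first DynaLog sample coincides with the first CP and that the CP spacing is a constant 0.424 s.

```python
DYNALOG_DT = 0.05   # DynaLog sampling interval in seconds (from the text)
CP_DT = 0.424       # assumed constant time between control points in seconds

def nearest_sample_index(cp_index, cp_dt=CP_DT, log_dt=DYNALOG_DT):
    """Index of the DynaLog sample closest in time to a control point."""
    return int(cp_index * cp_dt / log_dt + 0.5)

def sample_at_control_points(log_positions, n_cps):
    """Pick one DynaLog position per control point (nearest in time)."""
    last = len(log_positions) - 1
    return [log_positions[min(nearest_sample_index(cp), last)]
            for cp in range(n_cps)]

def max_time_offset(n_cps, cp_dt=CP_DT, log_dt=DYNALOG_DT):
    """Largest time gap between a control point and its matched sample."""
    return max(abs(cp * cp_dt - nearest_sample_index(cp, cp_dt, log_dt) * log_dt)
               for cp in range(n_cps))
```

Under these assumptions the offset returned by `max_time_offset(356)` stays within half of the DynaLog sampling interval, in line with the 0.025 s bound quoted above.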

After synchronization, the differences between the positions present in the DICOM-RT file (planned positions) and the positions reported by the DynaLog file (delivered positions) for each leaf were calculated using an in-house program written in R (R Development Core Team 2010). The absolute value of this quantity for each leaf at each CP is the error magnitude.

2.3. Leaf motion characterisation

A number of parameters characterising MLC leaf motion were derived from the planned leaf positions. Each parameter was calculated for all MLC leaves at all CPs, thus each data point represents a single MLC leaf at a single CP. The position of each leaf at the CP of interest, and also at the previous and subsequent CPs was calculated. The instantaneous velocity for each leaf at each CP was calculated as the leaf position minus the leaf position at the previous CP, divided by the time between CPs, as described in equation (1):

$\text{Velocity}_{\text{CP}} = \frac{\text{Position}_{\text{CP}} - \text{Position}_{\text{CP}-1}}{0.424~\text{s}} \tag{1}$

Acceleration for each leaf was calculated in a similar fashion. Velocity and acceleration for each leaf were also calculated for the previous and subsequent CP. Velocity and acceleration of both adjacent MLC leaves were calculated under the hypothesis that friction from adjacent leaves may induce errors.
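As an illustration of equation (1), the sketch below computes velocity as a backward difference of the planned positions. The text says only that acceleration was calculated "in a similar fashion", so treating it as a backward difference of velocity is an assumption, as is the use of `None` for control points with no predecessor.

```python
CP_DT = 0.424  # time between control points in seconds (from the text)

def backward_difference(values, dt=CP_DT):
    """(value at CP - value at previous CP) / dt; None where undefined."""
    result = [None]  # the first control point has no predecessor
    for previous, current in zip(values, values[1:]):
        if previous is None or current is None:
            result.append(None)
        else:
            result.append((current - previous) / dt)
    return result

# Planned positions (mm) of one leaf at four consecutive control points.
positions = [0.0, 4.24, 8.48, 8.48]
velocity = backward_difference(positions)      # mm/s, equation (1)
acceleration = backward_difference(velocity)   # mm/s^2, assumed analogue
```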

Movement of the MLC leaves was also sorted into several categories. The first category describes the state of motion of the leaf at the given CP: 'at rest', 'moving', 'coming to a stop', or 'moving for only a single CP'. An MLC leaf was defined to be at rest if it did not move during the CPs before or after the CP of interest. Leaf movement direction was categorized to differentiate whether the leaf was moving towards or away from the isocenter of the MLC. To further investigate the effect of friction from adjacent MLC leaves on the movement of the leaf of interest, a category was defined to classify the two adjacent leaves as 'both moving in the same direction', 'both moving in the opposite direction', 'one moving in the opposite direction', or 'both at rest'. The CP at which the error occurred (i.e. 1 to 356), the arc number (i.e. '1' or '2'), and the leaf bank to which the leaf belonged (i.e. 'A' or 'B') were also extracted. The extraction of the errors between planning and delivery, and the calculation of predictive leaf motion parameters, are displayed in figure 1.
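A simplified sketch of the state-of-motion category is given below. Only the 'at rest' rule is taken from the text; the other labels, the three-CP window, and the tolerance `tol` are assumptions made for illustration, and the 'moving for only a single CP' category is not reproduced.

```python
def motion_state(positions, cp, tol=1e-6):
    """Classify one leaf at control point `cp` from its planned positions (mm)."""
    moved_before = cp > 0 and abs(positions[cp] - positions[cp - 1]) > tol
    moves_after = (cp < len(positions) - 1
                   and abs(positions[cp + 1] - positions[cp]) > tol)
    if not moved_before and not moves_after:
        return "at rest"        # rule stated in the text
    if moved_before and moves_after:
        return "moving"         # assumed rule
    # Assumed rules: motion on only one side of the control point.
    return "stopping" if moved_before else "starting"

planned = [0.0, 0.0, 1.0, 2.0, 2.0, 2.0]
states = [motion_state(planned, cp) for cp in range(len(planned))]
```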

**Figure 1.** Workflow of the extraction of errors between DICOM-RT and DynaLog files, and the extraction of leaf motion parameters from planned positions.

2.4. Model training, validation, and testing

For each institution, the data was split into three separate datasets, termed training, validation, and testing. Thus there were nine sets in total. Since no plan appeared in more than one dataset, each set may be considered independent of the remaining sets from the same institution.

A single plan was randomly chosen from each institution to be the training set for that institution (N = 1, 1, 1). The choice to use only a single plan for training from each institution was based on two observations. First, the large number of training data points available in each plan (42,720) was determined to be sufficient for training the machine learning algorithms used in this study, through cross validation with different sizes of training data sets. Second, the errors are dependent on the individual MLC rather than the plan itself, thus any sufficient amount of data from each unique MLC would be appropriate to train a model. Therefore, a model specific to the MLC of a given linear accelerator should be built.

A predictive model specific to each institution was fit using only the data from that institution's training plan. After each model was fit to the training plan, the accuracy of each model was tested on a validation set consisting of two randomly selected plans from the model's respective institution. The purpose of the validation set was to find the optimal combination of leaf motion parameters and to tune any parameter values the model may have. This tuning process is done on the validation set, rather than the training set, to avoid both over-fitting of the models, as well as overly optimistic accuracy assessments which do not hold out of sample. The leaf motion parameters and tunable model parameters were sequentially iterated over to minimize the root mean square error (RMSE) between predicted and delivered positions on the validation set, and the model with the lowest RMSE was chosen as the final model.

A final validation of the models was performed using the remaining plans from each institution, that is, the testing set (N = 37, 13, 15). Model performance on the testing set was then assessed using mean absolute error (MAE) and RMSE between the predicted and delivered positions. In this case, RMSE is used as an alternative to the more common standard deviation (SD), as the distributions of the errors do not follow the normal distribution. It should be noted, however, that the formulas for RMSE and SD share the same form (SD is the RMSE about the mean); the difference is in interpretation. The statistics reported in this study are from the test set only; this is an alternative and preferable method to using cross-validation, because the untouched test set can be thought of as real-world data, as the test data was not used during the training or validation process. This process is shown in figure 2.
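For reference, the two summary statistics can be written in a few lines. This is a generic sketch with hypothetical function names and toy numbers, not the study's R code.

```python
import math

def mean_absolute_error(predicted, delivered):
    """MAE: mean of |predicted - delivered|."""
    return sum(abs(p - d) for p, d in zip(predicted, delivered)) / len(predicted)

def root_mean_square_error(predicted, delivered):
    """RMSE: square root of the mean squared difference."""
    squared = [(p - d) ** 2 for p, d in zip(predicted, delivered)]
    return math.sqrt(sum(squared) / len(squared))

# Toy leaf positions in mm (illustrative values only).
predicted = [1.0, 2.0, 3.0]
delivered = [1.5, 2.0, 2.0]
mae = mean_absolute_error(predicted, delivered)
rmse = root_mean_square_error(predicted, delivered)
```

Because squaring weights large deviations more heavily, RMSE is never smaller than MAE for the same set of errors.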

**Figure 2.** Workflow for training, validating, and finally reporting the statistics of predictive models.

2.5. Testing independence of predictions from training plan

To examine the importance of the choice of training plan on the quality of predictions, the analysis of the predictions was repeated using a different randomly selected plan from each institution for model training than was used initially. The second set of models was trained with the same model parameters as the initial models.

Furthermore, to test whether models trained using a different MLC were able to make accurate predictions for other MLCs, a model trained from each institution was used to make predictions on testing plans from each of the other institutions. For example, a model trained using a single plan from Institution 1 was used to make predictions on plans from both Institutions 2 and 3, and the results compared to the predictions made using models trained using data from Institutions 2 and 3, respectively.

Due to the non-normal distribution of error predictions, the Mann–Whitney U test was performed to examine differences in model accuracy using alternative training plans. For each test performed, the p-value and the 95% confidence interval of the difference in median error prediction were reported.
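To make the test concrete, the sketch below computes the U statistic from midranks and a two-sided p-value from the normal approximation. It is a simplified illustration with hypothetical function names: no tie or continuity correction is applied, and it does not compute the confidence interval of the difference in medians reported in the study.

```python
import math

def midranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U for sample x, two-sided p-value by normal approximation)."""
    n1, n2 = len(x), len(y)
    ranks = midranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)  # no tie correction
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

For the small samples used in examples an exact test would normally be preferred; the normal approximation is more appropriate for the tens of thousands of errors per plan considered here.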

2.6. Predictive model parameters and types

Several different models were tested to find a model with the best predictive accuracy. The models included a simple linear regression model, a multiple linear regression model, a model based on the random forest algorithm, and a model based on the cubist algorithm (described below). The inputs to the models were the leaf motion parameters described above, and the target response for each model was the difference between the planned and delivered MLC leaf positions.

Of the leaf motion parameters extracted from the planned positions, two quantitative parameters (leaf position and instantaneous velocity) and four qualitative parameters (movement towards or away from the center, whether the leaf was at rest/starting/stopping/moving for a single CP, the CP number, and the leaf bank) were utilized in the final models. All other parameters were found to increase the RMSE on the validation set.

The R programming language (R Development Core Team 2010) was used for all data analysis and modeling.

2.6.1. Linear regressions.

For the linear regression modeling, two models were built. The first, LM_{V Only}, was a simple linear regression of velocity against the target response (difference between planned and delivered MLC leaf positions). The second was a multiple linear regression, regressing the parameters described in section 2.6 against the target response.
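The velocity-only model is an ordinary least-squares fit of a line, which has a closed form. The sketch below uses hypothetical function names and synthetic data; the slope of 0.129 mm per mm/s is borrowed from the results section purely to generate the toy targets, and the intercept of 0.05 mm is made up.

```python
def fit_simple_linear(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Synthetic, noise-free data: leaf velocity (mm/s) and positional error (mm).
velocities = [0.0, 5.0, 10.0, 15.0, 20.0]
errors = [0.05 + 0.129 * v for v in velocities]
slope, intercept = fit_simple_linear(velocities, errors)
```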

2.6.2. Random forest.

The random forest implementation used was based on Breiman and Cutler's algorithm (Breiman 2001) as implemented in the R package 'randomForest' (Liaw and Wiener 2002). Random forests create a predictive model by first randomly selecting a subset of a given number of features from the feature space. A sample of the training data is then taken, and the selected features are used to create a decision tree which separates the data such that the homogeneity of the samples at the terminal node of each branch is maximized. This process is repeated many times, and each decision tree produced in this way is saved to create a 'forest' of decision trees. To make predictions on new data, the new data point is fed into each tree, and the tree offers a prediction which is the average of all the data points used in training which follow the same path through the tree as the new data point. The prediction of the algorithm is then the average of the predictions from each tree in the forest.

For the random forest model, the number of features randomly sampled as candidates for each split of the decision tree was four, the value which minimized the RMSE on the validation set. Increasing the number of trees beyond 100 or the sample size beyond 4000 was found to have little impact on accuracy.

An example of a random forest as applied to the prediction of MLC positional errors is as follows. First, the algorithm selects four leaf motion parameters, for example leaf velocity, leaf position, whether the leaf is moving or resting, and whether the leaf is moving towards or away from the isocenter. Then, a sample of 4000 errors (differences between planned and delivered positions) and the associated leaf motion parameters for those errors are extracted. From these, a tree is built with a number of terminal nodes with criteria such as 'if leaf velocity is greater than X cm per second, and the leaf is moving towards the isocenter, the error is Y'. In this study, 100 such trees are built, each having up to 1000 terminal nodes.
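The procedure described above can be sketched in miniature as follows. This is a toy regression forest written for illustration, not the 'randomForest' package: all function names, the depth limit, and the minimum leaf size are assumptions, and candidate features are drawn at every split, as described for the tuning parameter above.

```python
import random

def sse(values):
    """Sum of squared deviations from the mean (node impurity)."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

def build_tree(rows, targets, mtry, rng, depth=0, max_depth=5, min_leaf=3):
    """Grow one regression tree; a leaf stores the mean training target."""
    leaf = sum(targets) / len(targets)
    if depth >= max_depth or len(rows) < 2 * min_leaf:
        return leaf
    best = None  # (impurity, feature index, threshold)
    n_features = len(rows[0])
    for f in rng.sample(range(n_features), min(mtry, n_features)):
        for thr in sorted({r[f] for r in rows})[:-1]:
            left = [t for r, t in zip(rows, targets) if r[f] <= thr]
            right = [t for r, t in zip(rows, targets) if r[f] > thr]
            if len(left) < min_leaf or len(right) < min_leaf:
                continue
            score = sse(left) + sse(right)
            if best is None or score < best[0]:
                best = (score, f, thr)
    if best is None:
        return leaf
    _, f, thr = best
    lo = [(r, t) for r, t in zip(rows, targets) if r[f] <= thr]
    hi = [(r, t) for r, t in zip(rows, targets) if r[f] > thr]
    return (f, thr,
            build_tree([r for r, _ in lo], [t for _, t in lo],
                       mtry, rng, depth + 1, max_depth, min_leaf),
            build_tree([r for r, _ in hi], [t for _, t in hi],
                       mtry, rng, depth + 1, max_depth, min_leaf))

def predict_tree(node, row):
    """Follow the splits until a leaf (a plain number) is reached."""
    while isinstance(node, tuple):
        f, thr, left, right = node
        node = left if row[f] <= thr else right
    return node

def build_forest(rows, targets, n_trees=100, mtry=1, sample_size=None, seed=0):
    """Fit each tree on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    size = sample_size or len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in range(size)]
        forest.append(build_tree([rows[i] for i in idx],
                                 [targets[i] for i in idx], mtry, rng))
    return forest

def predict_forest(forest, row):
    """The forest prediction is the average of the tree predictions."""
    return sum(predict_tree(tree, row) for tree in forest) / len(forest)
```

In the setting of this study, each row would hold the leaf motion parameters of one leaf at one CP (with categories encoded as numbers), and the target would be the corresponding positional error.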

2.6.3. Cubist.

Cubist is a rule-based model consisting of several different methodologies. The operation of the cubist algorithm is similar to that of the random forest, but with several optimizations. One such optimization is that while a random forest makes predictions using an average of the training points within the terminal node of a given branch, the cubist algorithm builds a linear regression model at each terminal node. There are several other optimizations, and the algorithm in its entirety is described in detail in Kuhn and Johnson (2013). Tunable parameters of the cubist algorithm include committees and neighbors. Committees are somewhat analogous to the number of decision trees contributing to the final prediction, and neighbors is the number of neighboring training points which can be used to aid in prediction. The values of committees and neighbors in the final model were 100 and 0, respectively. The R implementation of the cubist algorithm from the 'Cubist' package was used (Kuhn et al 2014).

2.7. Integration with treatment plan for gamma analysis

For each plan from Institution 1, 2D dose distributions of each VMAT plan as delivered with 6 MV photons were acquired with a MapCHECK2 detector array (Sun Nuclear Corporation, Melbourne, FL). The MapCHECK2 was inserted into a MapPHAN (Sun Nuclear Corporation, Melbourne, FL) during delivery. Before delivery, the relative responses of each detector in the MapCHECK2 array, as well as the absolute response of the detector to a known dose, were calibrated according to manufacturer specifications. The absolute dose of the Linac was also calibrated according to the American Association of Physicists in Medicine Task Group 51 (AAPM TG51) protocol (Almond et al 1999).

A CT image of the device setup was imported into the Eclipse system and used for the calculation of the 2D dose distributions of plans with either planned or predicted positions. The distributions were calculated with a 2 mm calculation grid, PRO3 optimizer, and AAA algorithm, as above.

After delivery and calculation, both global and local gamma evaluations were performed with SNC patient software (ver. 6.1.2, Sun Nuclear Corporation, Melbourne, FL). Gamma criteria of 3%/3 mm, 2%/2 mm, and 1%/2 mm were used with a 10% threshold for the ROI, as frequently cited in the literature (Iftimia et al 2010, Heilemann et al 2013). The passing rates obtained using the planned MLC positions and those obtained using the predicted MLC positions were compared using paired t-tests to assess the difference in mean passing rates between the two.
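The idea behind a gamma passing rate can be illustrated on a one-dimensional dose profile. The sketch below is not the SNC patient software algorithm: the function name is hypothetical, the measured profile is taken as the reference (an assumption), no interpolation between calculated points is performed, and global normalization uses the maximum measured dose.

```python
import math

def gamma_passing_rate(positions_mm, measured, calculated,
                       dose_pct, dta_mm, local=False, threshold_pct=10.0):
    """Percentage of measured points with gamma <= 1 (1D, no interpolation)."""
    max_dose = max(measured)
    if max_dose <= 0:
        raise ValueError("measured profile contains no dose")
    evaluated = passed = 0
    for x_m, d_m in zip(positions_mm, measured):
        if d_m < threshold_pct / 100.0 * max_dose:
            continue  # below the low-dose threshold, outside the ROI
        # Local gamma normalizes to the dose at the point itself,
        # global gamma to the maximum dose of the profile.
        dose_tol = dose_pct / 100.0 * (d_m if local else max_dose)
        gamma = min(math.hypot((x_c - x_m) / dta_mm, (d_c - d_m) / dose_tol)
                    for x_c, d_c in zip(positions_mm, calculated))
        evaluated += 1
        if gamma <= 1.0:
            passed += 1
    return 100.0 * passed / evaluated
```

For example, `gamma_passing_rate(x, measured, calculated, dose_pct=3, dta_mm=3)` corresponds to a global 3%/3 mm criterion, and passing `local=True` switches to the local variant.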

2.8. Integration with treatment plan for DVH analysis

For five H&N patients from Institution 1 for whom patient CT data was available, DICOM-RT files were reconstructed with predicted MLC positions. These, along with the planned DICOM-RT files, were imported into the Eclipse system. Dose distributions to the patient CT images were calculated using the same parameters as above, with the exception of the calculation grid size, which was reduced to 1 mm. For the target volume, clinically relevant dose-volumetric parameters such as the dose received by 95% of the target volume ( ${{D}_{\text{95} \%}}$ ), ${{D}_{\text{5} \%}}$ , the minimum dose, the maximum dose, and the mean dose were compared between planned and predicted VMAT plans. For organs at risk (OARs) in the H&N plans, the volume of each parotid gland receiving 50% of the dose ( ${{V}_{\text{50} \%}}$ ), and mean dose to each parotid gland and each sub-mandibular gland (SMG) were compared. Differences in the dose volumetric parameters between calculations using planned MLC positions versus delivered positions, and predicted positions versus delivered positions were compared using paired t-tests to assess the mean differences between the two.
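The dose-volumetric parameters named above can be read directly from the list of voxel doses in a structure. The sketch below uses hypothetical function names and assumes voxels of equal volume; interpreting the 50% level relative to a prescription dose is also an assumption, since the text does not state the reference dose.

```python
import math

def dose_at_volume(doses, percent):
    """D_x%: the minimum dose received by the hottest x% of the volume."""
    ranked = sorted(doses, reverse=True)
    count = max(1, math.ceil(percent * len(ranked) / 100))
    return ranked[count - 1]

def volume_at_dose(doses, dose_level):
    """Percentage of the volume receiving at least `dose_level`."""
    return 100.0 * sum(d >= dose_level for d in doses) / len(doses)

# Toy structure of 100 equal-volume voxels with doses of 1..100 (arbitrary units).
voxel_doses = list(range(1, 101))
prescription = 100                                      # hypothetical reference dose
d95 = dose_at_volume(voxel_doses, 95)                   # D95%
d5 = dose_at_volume(voxel_doses, 5)                     # D5%
v50 = volume_at_dose(voxel_doses, 0.5 * prescription)   # V50%
mean_dose = sum(voxel_doses) / len(voxel_doses)
```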

3. Results

3.1. Predictive leaf motion parameters

Several leaf motion parameters were particularly important in increasing model accuracy. The motion parameter which offered the most predictive ability was leaf velocity, which had an approximately linear relationship with error magnitude (β = 0.129, CI = 0.128 to 0.130, p < 0.001), with coefficient of determination, R², of 0.902 (p < 0.001). This relationship is shown in figure 3(A).

**Figure 3.** Predictive value of leaf motion characteristics. Error magnitude (the difference between planned and delivered positions) versus individual leaf velocity on 10 000 randomly sampled errors from all institutions is shown in plot (A), with a linear regression of the sample in blue. β represents the slope of the line (0.129 mm of error for every increase in velocity of 1 mm s⁻¹), and R² represents the coefficient of determination. Plot (B) shows the difference in median error magnitudes between MLC leaves moving toward or away from the isocenter of the MLC.

Whether the leaf was moving towards or away from the isocenter of the MLC also had a statistically significant effect on the mean error magnitude, making this category an important predictive motion parameter. The MAE of all leaves moving away from the center was 1.37 mm (RMSE = 0.99 mm), while the MAE of leaves moving towards the center was only 1.14 mm (RMSE = 0.97 mm). The difference in means was 0.235 mm, with a 95% confidence interval (CI) of 0.231 to 0.238 mm (p < 0.001) by the Welch two sample t-test. A boxplot expressing the difference between movement directions is shown in figure 3(B).

3.2. Predictive accuracy

The model based on the cubist algorithm outperformed all other models. Planned, delivered, and predicted leaf positions of a single MLC leaf for two representative sets of CPs are shown in figure 4. The figure shows that in all cases the predicted positions more closely coincide with the delivered positions than do the planned positions.

**Figure 4.** Planned, delivered, and predicted (cubist) positions of a single MLC leaf from an H&N plan over two sets of CPs. In plot (A), the leaf is planned to drop rapidly, with delivered positions lagging until the leaf slows. Plot (B) shows a set of CPs where heavy modulation is planned, but the delivered positions consistently fail to reach the target. In all cases the predicted positions are closer to the delivered positions than are the planned positions.

The MAE and RMSE between planned and delivered positions, and between predicted and delivered positions, for moving, resting, and all MLC leaves from each institution are summarized in tables 1, 2, and 3, respectively. The considerably lower error between predicted and delivered versus planned and delivered positions for the cubist model is shown in figure 5.

Table 1. Model performance in predicting delivered MLC positions for moving MLCs from the test set (N = 65).

Institution	Model	All plans MAE (mm)^a	All plans RMSE (mm)^b	H&N plans MAE (mm)	H&N plans RMSE (mm)	Prostate plans MAE (mm)	Prostate plans RMSE (mm)
1	Planned	1.284	1.636	1.358	1.354	1.086	1.489
	LM_{V Only}^c	0.324	0.45	0.354	0.476	0.244	0.373
	LM^d	0.282	0.407	0.302	0.423	0.227	0.359
	Random forest	0.275	0.395	0.29	0.407	0.237	0.36
	Cubist	0.253	0.371	0.269	0.384	0.21	0.332

2	Planned	1.409	1.699	1.458	1.735	1.361	1.663
	LM_{V Only}	0.313	0.409	0.315	0.408	0.311	0.41
	LM	0.286	0.372	0.291	0.375	0.281	0.369
	Random forest	0.284	0.384	0.29	0.387	0.279	0.381
	Cubist	0.278	0.387	0.285	0.393	0.272	0.38

3	Planned	1.145	1.495	1.153	1.504	1.075	1.412
	LM_{V Only}	0.356	0.501	0.354	0.5	0.375	0.517
	LM	0.305	0.448	0.3	0.443	0.346	0.483
	Random forest	0.314	0.454	0.313	0.454	0.318	0.451
	Cubist	0.274	0.426	0.273	0.424	0.286	0.44

^aMean absolute error. ^bRoot mean squared error. ^cLinear regression model using only leaf velocity. ^dLinear regression model with all leaf motion parameters.

Table 2. Model performance in predicting delivered MLC positions for MLCs at rest from the test set (N = 65).

Institution	Model	All plans MAE (mm)^a	All plans RMSE (mm)^b	H&N plans MAE (mm)	H&N plans RMSE (mm)	Prostate plans MAE (mm)	Prostate plans RMSE (mm)
1	Planned	0.084	0.158	0.159	0.246	0.043	0.075
	LM_{V Only}^c	0.085	0.158	0.16	0.246	0.044	0.075
	LM^d	0.052	0.085	0.097	0.121	0.027	0.057
	Random forest	0.039	0.074	0.061	0.106	0.028	0.048
	Cubist	0.027	0.054	0.056	0.088	0.012	0.017

2	Planned	0.037	0.109	0.051	0.132	0.033	0.1
	LM_{V Only}	0.039	0.109	0.052	0.132	0.034	0.1
	LM	0.016	0.045	0.022	0.06	0.015	0.038
	Random forest	0.021	0.051	0.029	0.068	0.018	0.044
	Cubist	0.005	0.013	0.007	0.019	0.005	0.01

3	Planned	0.033	0.129	0.037	0.136	0.023	0.11
	LM_{V Only}	0.034	0.129	0.038	0.136	0.024	0.11
	LM	0.025	0.087	0.026	0.082	0.022	0.099
	Random forest	0.022	0.084	0.023	0.08	0.019	0.095
	Cubist	0.009	0.04	0.009	0.033	0.01	0.053

^aMean absolute error. ^bRoot mean squared error. ^cLinear regression model using only leaf velocity. ^dLinear regression model with all leaf motion parameters.

Table 3. Model performance in predicting delivered MLC positions for moving and resting MLCs from the test set (N = 65).

Institution	Model	All plans MAE (mm)^a	All plans RMSE (mm)^b	H&N plans MAE (mm)	H&N plans RMSE (mm)	Prostate plans MAE (mm)	Prostate plans RMSE (mm)
1	Planned	0.513	0.987	0.802	1.247	0.24	0.651
	LM_{V Only}^c	0.17	0.298	0.264	0.387	0.082	0.176
	LM^d	0.134	0.253	0.207	0.32	0.065	0.164
	Random forest	0.124	0.244	0.183	0.307	0.067	0.162
	Cubist	0.108	0.226	0.17	0.288	0.049	0.145

2	Planned	0.39	0.867	0.636	0.124	0.281	0.724
	LM_{V Only}	0.109	0.228	0.162	0.282	0.086	0.199
	LM	0.086	0.193	0.134	0.246	0.064	0.163
	Random forest	0.089	0.2	0.138	0.255	0.067	0.169
	Cubist	0.075	0.196	0.122	0.254	0.055	0.164

3	Planned	0.576	1.048	0.647	1.115	0.292	0.72
	LM_{V Only}	0.191	0.362	0.211	0.38	0.114	0.278
	LM	0.162	0.319	0.176	0.332	0.105	0.259
	Random forest	0.165	0.323	0.182	0.34	0.096	0.242
	Cubist	0.139	0.299	0.153	0.314	0.08	0.227

^aMean absolute error. ^bRoot mean squared error. ^cLinear regression model using only leaf velocity. ^dLinear regression model with all leaf motion parameters.

**Figure 5.** MAE between planned and delivered positions, and between predicted and delivered positions for both resting and moving MLC leaves. In all cases the predicted positions are much closer to the delivered positions than are the planned positions.

For Institution 1, the MAE between the planned leaf positions and the delivered leaf positions of moving MLC leaves was 1.284 mm, with root mean squared error (RMSE) of 1.636 mm. The MAE between positions predicted by the Cubist model and the delivered positions was 0.253 mm (RMSE = 0.371 mm). Therefore, the predictions were, on average, greater than 1 mm closer to the delivered positions than were the planned positions.

Institutions 2 and 3 showed similar tendencies. For Institution 2, the MAE between planned and delivered positions of moving leaves was 1.409 (RMSE = 1.699) mm, and the difference between predicted and delivered was 0.278 (0.387) mm. These values for Institution 3 were 1.145 (1.495) mm and 0.274 (0.426) mm.

3.3. Independence of predictions from choice of training plan

The results of testing the dependence of the predictions on choice of training plan are shown in table 4. For Institutions 1 and 2 there was no significant difference in the predictions made using different plans for training the model. For Institution 3, there was a significant difference, with p-value < 0.001. However, the confidence interval was from 0.021 to 0.029 mm, indicating that the effect of using a different training plan was small.

Table 4. Differences in the predictions made by models trained using different plans from the same institution.

Institution	Difference in median (mm)	95% CI^a	p value
1	0.001	−0.001–0.003	0.294
2	0.001	−0.001–0.004	0.275
3	0.025	0.021–0.029	<0.001

^a95% Confidence interval.

The differences in predictions when using a model trained with data from a different institution (and therefore a different MLC) are presented in table 5. Table 5 shows that for all combinations of training institution and testing institution, there were significant differences between the predictions made by the model trained on data from the same institution as the testing data and the predictions made by the model trained on data from a different institution. However, the estimated differences in medians of the predictions were all less than 0.1 mm.

Table 5. Differences in the predictions made by models trained using different institutions than the testing data.

Model institution^a	Testing institution^b	Difference in median (mm)	95% CI^c	p value
1	2	0.048	0.045–0.051	<0.001
1	3	0.080	0.076–0.084	<0.001

2	1	0.058	0.055–0.060	<0.001
2	3	0.018	0.014–0.022	<0.001

3	1	0.064	0.062–0.066	<0.001
3	2	0.007	0.004–0.010	<0.001

^aThe institution from which the data used to train the model came. ^bThe institution from which the data used to test the model came. ^c95% Confidence interval.

3.4. Gamma analysis

The analysis of the improvements in gamma passing rates was separated into four categories: local and global passing rates for both H&N and prostate plans. This data is summarized in table 6 and figure 6. Table 6 presents the mean differences between the passing rates of the plans utilizing the planned MLC positions and those of the plans utilizing the predicted positions.

Table 6. The change in gamma passing rates due to the inclusion of predicted errors in the plans is shown. For all local criteria, and for all H&N plans the passing rate of the plan is improved when errors are predicted.

Site	Criterion	Local PR change (%)^a	Local p value	Local 95% CI^b	Global PR change (%)	Global p value	Global 95% CI
H&N	1%/2 mm	4.17	<0.001	3.32–5.03	3.53	<0.001	3.00–4.06
	2%/2 mm	3.6	<0.001	2.85–4.35	1.47	<0.001	1.07–1.86
	3%/3 mm	1.83	<0.001	1.42–2.24	0.41	0.002	0.18–0.64

Prostate	1%/2 mm	–	0.08	–	–	0.50	–
	2%/2 mm	0.83	0.005	0.29–1.34	−0.16	0.02	−0.29 to −0.03
	3%/3 mm	0.64	<0.001	0.35–0.92	−0.09	0.03	−0.17 to −0.01

^aPassing rate change from using planned positions to predicted positions. ^b95% confidence interval.

**Figure 6.** Boxplots showing the increase in passing rates through the utilization of predicted MLC positions. Plots (A) and (B) show the increases in local and global gamma passing rate when using predicted positions for H&N plans, respectively. Plots (C) and (D) show the same information for prostate plans.

In all cases for H&N plans, the passing rate is increased by calculating the dose plane with the predicted positions. This indicates that the predicted positions better represent the reality of delivery than do the planned positions. For prostate plans there was a similar trend; however, since the global passing rates for prostate plans were often near 100%, the difference was generally much smaller.
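The distinction between local and global passing rates can be illustrated with a minimal sketch. It assumes a simplified 1D dose profile on a uniform grid with no interpolation, and a 10% low-dose cutoff; the function name, the cutoff, and the example doses are hypothetical and are not taken from this study, which compared 2D dose planes. For each reference point the gamma value is the minimum, over all evaluated points, of the combined distance and dose difference, each scaled by its criterion. The local criterion scales the dose tolerance by the dose at the reference point, while the global criterion scales it by the maximum reference dose.

```python
def gamma_passing_rate(reference, evaluated, spacing_mm, dose_pct, dta_mm,
                       local=True, low_dose_cutoff=0.1):
    """Percentage of reference points with gamma <= 1 on a 1D dose profile.

    Simplified illustration: uniform grid, no interpolation between points.
    """
    d_max = max(reference)
    passed = 0
    total = 0
    for i, d_ref in enumerate(reference):
        # Skip low-dose points (cutoff is an assumed, illustrative value).
        if d_ref < low_dose_cutoff * d_max:
            continue
        # Local: tolerance relative to the dose at this point.
        # Global: tolerance relative to the maximum reference dose.
        norm = d_ref if local else d_max
        tol = dose_pct / 100.0 * norm
        gamma_sq = min(
            ((j - i) * spacing_mm / dta_mm) ** 2 + ((d_ev - d_ref) / tol) ** 2
            for j, d_ev in enumerate(evaluated)
        )
        total += 1
        if gamma_sq <= 1.0:
            passed += 1
    return 100.0 * passed / total if total else float("nan")


# Illustrative profiles only: the evaluated profile is 1.5% higher everywhere.
reference = [10.0, 50.0, 100.0, 50.0, 10.0]
evaluated = [10.15, 50.75, 101.5, 50.75, 10.15]
local_rate = gamma_passing_rate(reference, evaluated, 1.0, 1.0, 2.0, local=True)
global_rate = gamma_passing_rate(reference, evaluated, 1.0, 1.0, 2.0, local=False)
```

In this toy case the global criterion is more forgiving at low-dose points, which is consistent with global passing rates sitting closer to 100% and leaving less room for improvement.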

3.5. DVH analysis

Representative DVH curves for OARs and planning target volumes (PTVs) of the patient dose distributions as calculated using the planned, predicted, and delivered MLC positions are presented in figure 7. The average differences in dose volumetric parameters between planned and delivered, and between predicted and delivered, are shown in figure 8. In all cases the dose volumetric parameters calculated with the predicted positions are in closer agreement with the delivered parameters than are the planned parameters. Figure 8 shows that the largest differences are present in the OAR dose distributions. For instance, the average difference between planning and delivery of the volume of the right parotid receiving 50% of the dose was 8.16% (SD = 3.3%, p = 0.005), whereas the difference between the predicted and delivered values was not statistically significant (0.18%, SD = 0.96%, p = 0.7). The change in the PTVs was of the same general magnitude as the changes in the OARs, but owing to the much larger dose prescriptions they had smaller percent changes.
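A dose volumetric parameter of the kind compared above (the percentage of a structure's volume receiving at least a given dose) can be sketched as follows. This is a minimal illustration assuming equal-volume voxels; the function names and the dose values are hypothetical and do not come from the study data.

```python
def percent_volume_at_or_above(voxel_doses_gy, dose_level_gy):
    """Percentage of (equal-volume) voxels receiving at least dose_level_gy."""
    if not voxel_doses_gy:
        raise ValueError("structure contains no voxels")
    hits = sum(1 for d in voxel_doses_gy if d >= dose_level_gy)
    return 100.0 * hits / len(voxel_doses_gy)


def cumulative_dvh(voxel_doses_gy, dose_levels_gy):
    """Cumulative DVH: percent volume at or above each requested dose level."""
    return [percent_volume_at_or_above(voxel_doses_gy, lvl)
            for lvl in dose_levels_gy]


# Illustrative voxel doses (Gy) for one structure under two sets of MLC positions.
planned = [10.0, 20.0, 30.0, 40.0]
delivered = [12.0, 25.0, 30.0, 40.0]
level = 24.0  # hypothetical dose level in Gy

v_planned = percent_volume_at_or_above(planned, level)
v_delivered = percent_volume_at_or_above(delivered, level)
difference = v_delivered - v_planned
```

Repeating the same calculation with the predicted positions in place of the planned ones gives the second comparison (predicted versus delivered).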

**Figure 7.** Representative DVH curves showing the curves of (left to right) (A): left parotid, right parotid, left SMG, right SMG, and (B): PTV 48 Gy, PTV 54 Gy, PTV 67.5 Gy. In all cases the DVH curves calculated using the predicted positions are in closer agreement with the delivered curves than are the planned curves.

**Figure 8.** Average percent differences in dose volumetric parameters planned versus delivered positions, and predicted versus delivered positions. Plots (A)–(C) show the percent changes for PTVs, parotids, and SMGs, respectively. Stars above the bars indicate significance (* = p < 0.05, ** = p < 0.01, and *** = p < 0.001).

4. Discussion

A model capable of predicting errors for specific MLC leaves could help to inform better optimization algorithms for creating plans capable of being delivered as intended. In this study such a model was built and validated. First, it was shown that MLC errors are predictable to a high degree of accuracy. Second, it was shown that such MLC errors have an appreciable impact on the gamma passing rate of the plan, and that a new plan corrected for the errors raises the passing rates. Finally, dose volume histograms (DVHs) recalculated with predicted positions incorporated into the plan provide the treatment planner with a better representation of the deliverable dose distributions for the PTV and OARs. In this study, several parameters which offer predictive capability were established. Although much of the variance in the positional errors is captured by leaf velocity, the linear model taking only velocity into account was outperformed by all other models. This indicates that there are other patterns in the data, not related to velocity, which may result in discrepancies between planned and delivered positions. These other patterns were well predicted by the best performing cubist model.
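The velocity-only baseline can be sketched as an ordinary least-squares fit of positional error on leaf velocity, scored by RMSE. This is a minimal illustration and not the implementation used in this study; the function names and the numbers are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def rmse(predicted, observed):
    """Root mean square error between two equal-length sequences."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


# Illustrative data only: leaf velocity (arbitrary units) and positional error (mm).
velocity = [-2.0, -1.0, 0.0, 1.0, 2.0]
error_mm = [-0.4, -0.2, 0.0, 0.2, 0.4]

slope, intercept = fit_line(velocity, error_mm)
predicted = [slope * v + intercept for v in velocity]
baseline_rmse = rmse(predicted, error_mm)
```

Any model that uses additional leaf motion parameters can be compared with this baseline by computing the same RMSE on data held out from training.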

The inclusion of a leaf motion parameter in the final model does not necessarily imply that the parameter has a real effect on error magnitude. Leaf bank is an example of this: its inclusion increases predictive accuracy not because one leaf bank is more error prone, but because it allows the model to switch the direction of the predicted error when the orientation of the coordinate system switches after the first treatment arc. In this study there was no significant difference between the mean errors of leaf banks A and B by Welch's two-sample t-test (p = 0.15). This is in accordance with the findings of Kerns et al (2014), and in opposition to those of Stell et al (2004).
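For reference, the Welch two-sample statistic and its Welch-Satterthwaite degrees of freedom can be computed as in the sketch below. The function name and sample values are hypothetical. The sketch stops short of the p value, which requires the cumulative distribution function of Student's t distribution (available, for example, through `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Squared standard error of each sample mean (sample variance / n).
    se2_a = variance(sample_a) / n_a
    se2_b = variance(sample_b) / n_b
    t_stat = (mean(sample_a) - mean(sample_b)) / math.sqrt(se2_a + se2_b)
    dof = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1)
    )
    return t_stat, dof


# Illustrative values only (not errors measured in this study).
bank_a = [1.0, 2.0, 3.0, 4.0, 5.0]
bank_b = [2.0, 4.0, 6.0, 8.0, 10.0]
t_stat, dof = welch_t(bank_a, bank_b)
```

Unlike the pooled-variance t-test, this form does not assume that the two leaf banks have equal error variance.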

Some of the leaf motion parameters which were hypothesized to be related to error magnitude did not have an appreciable effect on the models, for example, leaf acceleration and the movement of adjacent MLC leaves. Inclusion of either of these parameters led to over-fitting of the training data, and consequently increased the RMSE of the models on the validation set, hindering the generalizability of the models.

It has been posited that dose errors correlate with gap error, and not necessarily with individual leaf position errors (Losasso 2008). In contrast to dynamic IMRT plans, where leaves on opposing leaf banks move in the same direction, in VMAT plans the leaves move back and forth in both directions. Therefore it is important to know how many of the errors assessed in this study are gap errors (where the opposing leaves have errors in opposite directions, leading to larger or smaller leaf gaps than intended), or shift errors (where opposing leaves are both shifted in the same direction, with little change to the leaf gap). For H&N plans in this study, the average proportion of errors which were gap errors was 31.77% (SD = 1.46%). Prostate plans generally showed a lower proportion, with 15.70% (SD = 1.34%) of the errors being gap errors. It was also found that, in general, when the errors of opposing leaves were in opposite directions, the average change in leaf gap was 1.74 mm (SD = 0.44 mm), whereas for shift errors the average change in leaf gap was 0.35 mm (SD = 0.20 mm). That is, although there are fewer gap errors, the magnitude of gap errors is typically much larger than that of shift errors.
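The distinction between gap errors and shift errors can be made concrete with a short sketch. It assumes that the errors of both opposing leaves are expressed as delivered minus planned position along the same axis, so that the change in leaf gap is the difference of the two errors. The function names, the handling of a zero error (counted here as a shift error), and the example values are assumptions made for illustration.

```python
def classify_pair(err_a_mm, err_b_mm):
    """Classify the errors of one opposing leaf pair.

    Returns ("gap" or "shift", absolute change in leaf gap in mm).
    Both errors are assumed to share one coordinate axis.
    """
    gap_change = abs(err_b_mm - err_a_mm)
    # Opposite signs: the leaves moved toward or away from each other.
    kind = "gap" if err_a_mm * err_b_mm < 0 else "shift"
    return kind, gap_change


def gap_error_proportion(pairs):
    """Percentage of leaf pairs whose errors are classified as gap errors."""
    kinds = [classify_pair(a, b)[0] for a, b in pairs]
    return 100.0 * kinds.count("gap") / len(kinds)


# Illustrative error pairs in mm (not data from this study).
pairs = [(0.5, -0.4), (0.3, 0.2), (-0.2, -0.5), (-0.6, 0.7)]
proportion = gap_error_proportion(pairs)
```

With errors of similar size, opposite signs add in the gap while equal signs largely cancel, which is why the gap change is typically larger for gap errors than for shift errors.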

It was shown that the accuracy of predictions for a given MLC was independent of the choice of training plan. However, using training data from a different MLC of the same model led to discrepancies in predictions. It is therefore recommended that a model specific to each MLC should be trained, and used for predictions only for that specific MLC.

MLC leaf position errors are potentially a contributing factor in radiotherapy treatment plans failing to be delivered as intended. The prediction of leaf position errors could be used as a component of a modulation index to predict the delivery accuracy of a plan pre-delivery. For example, an MLC modulation index could be built as a linear combination or ratio of the number of predicted errors above or below certain thresholds. Although methods such as these may be able to predict deliverability, pre-treatment QA should continue to be an important part of the treatment workflow.
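One hypothetical form of such an index is a weighted sum of the fractions of predicted errors whose magnitude exceeds each of several thresholds. The sketch below is only an illustration of the idea; the function name, the thresholds (0.5 mm and 1.0 mm), and the weights are assumptions and have not been validated against delivery accuracy.

```python
def error_count_index(predicted_errors_mm,
                      thresholds_mm=(0.5, 1.0),
                      weights=(1.0, 2.0)):
    """Weighted sum of the fractions of predicted errors above each threshold.

    Thresholds and weights are illustrative placeholders.
    """
    n = len(predicted_errors_mm)
    index = 0.0
    for threshold, weight in zip(thresholds_mm, weights):
        fraction = sum(1 for e in predicted_errors_mm if abs(e) > threshold) / n
        index += weight * fraction
    return index


# Illustrative predicted errors in mm (not data from this study).
predicted_errors = [0.1, 0.6, 1.2, -0.7]
index = error_count_index(predicted_errors)
```

A larger index would then flag plans with more, or larger, predicted leaf errors; any threshold on the index itself would need to be established empirically.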

Predictions may also be used to further investigate the dosimetric effects of MLC errors. Dosimetric effects of random MLC errors have been studied in the past by sampling from a Gaussian distribution (Rangel and Dunscombe 2009, Oliver et al 2010) or from a uniform distribution (Mu et al 2007, Yan et al 2009, Bai et al 2013). However, neither of these distributions accurately models a realistic error distribution, nor does either take into account the directional dependence of leaf errors on leaf velocity. Therefore, by utilizing the method for error prediction described in this study, more accurate assessments of the dosimetric effects of MLC errors may be made. This study is limited in that it considered only Varian Millennium 120 MLCs; however, there is nothing precluding the methods from being adapted to other MLCs, and this will be undertaken in future work.

It is important to note that this work is concerned with the internal representation of MLC positions used to calculate the dose distributions within the TPS. If the positions sent to the MLC controller were altered to be the predicted positions, there would still be positional errors.

5. Conclusions

In this study, it was shown that MLC leaf position errors can be predicted to a high degree of accuracy by utilizing statistical learning techniques. All models took only a single plan as input, are simple to implement, and take approximately one second to train. By utilizing the predicted positions, rather than the planned positions, to calculate dose distributions, it was shown that gamma passing rates can be increased, and that errors in MLC positions that impact dose volumetric parameters can be reduced. The methodology developed in this study was shown to be generalizable to other institutions through assessment of their own institutional data.

By incorporating and correcting for the predicted errors in MLC positions, optimization routines for encoding MLC leaf positions may be improved, allowing for more realistic calculation of the dose distributions as truly delivered to the patient.

Acknowledgments

This work was in part supported by the National Research Foundation of Korea (490-20150036, 490-20140041 and 5267-20150100) grant funded by the Korea government. The authors are grateful to the editor and associate editors for their valuable comments and review of this paper.
