Evaluation of atlas-based auto-segmentation for head and neck target volume delineation in adaptive/replan IMRT

IMRT for head and neck patients requires clinicians to delineate clinical target volumes (CTVs) on a planning-CT, which can take more than two hours per patient. When patients require a replan-CT, the CTVs must be re-delineated. This work assesses the performance of atlas-based autosegmentation (ABAS), which uses deformable image registration between the planning-CT and replan-CT to auto-segment CTVs on the replan-CT from the planning contours. Fifteen patients with a planning-CT and a replan-CT were selected. One clinician delineated CTVs on the planning-CTs and up to three clinicians delineated CTVs on the replan-CTs. Replan-CT volumes were auto-segmented by ABAS using the manual CTVs from the planning-CT as an atlas. ABAS CTVs were then edited manually to make them clinically acceptable. Clinicians were timed to estimate the time saved by using ABAS. CTVs were compared using the Dice similarity coefficient (DSC) and mean distance to agreement (MDA). Mean inter-observer variability (DSC>0.79 and MDA<2.1mm) was greater than intra-observer variability (DSC>0.91 and MDA<1.5mm). Comparing ABAS to manual CTVs gave DSC=0.86 and MDA=2.07mm. Once edited, ABAS volumes agreed more closely with the manual CTVs (DSC=0.87 and MDA=1.87mm). The mean clinician time required to produce CTVs fell from 169min to 57min when using ABAS. ABAS segments volumes with accuracy close to inter-observer variability; however, the volumes require some editing before clinical use. Using ABAS reduces contouring time by a factor of three.


Introduction
Manual delineation of CTV volumes for head and neck (H&N) IMRT patients on a planning-CT (pCT) can take more than two hours per patient and is subject to large inter- and intra-observer error [1][2]. H&N patients sometimes undergo significant anatomical changes during treatment, often due to weight loss [3]. This can lead to inadequate immobilisation, or to rotations that cannot be corrected by rigid-body translations. In these cases it is often necessary to acquire a replan-CT (rCT) so that the dosimetric implications of the anatomical changes for the original plan can be assessed and a decision made on whether a full re-plan is required. This involves the clinician re-delineating CTV volumes on the rCT and re-calculating the dose from the original plan to highlight regions of either under-dosage of the target or over-dosage of surrounding normal tissue. Re-delineating the CTV volumes on the rCT can again take more than two hours per patient and is subject to the same large inter- and intra-observer error [1][2]. Atlas Based Autosegmentation (ABAS) is an example of commercially available software that facilitates the transfer of structures between CT data sets using deformable image registration, so that contours can be generated automatically. This has the potential to reduce both contouring time and inter- and intra-observer variability. The performance of ABAS has been demonstrated for the lung [4], gynaecology [5], breast and anorectal [6] and H&N [7][8].
The aim of this work was to assess the accuracy of automatically delineated CTV volumes produced by ABAS on rCT and to compare them with volumes manually delineated by multiple observers at multiple time points, thus allowing inter- and intra-observer errors to be estimated. A further aim was to estimate the time that ABAS could save in the adaptive H&N pathway.

Method
Fifteen patients, previously treated with IMRT for H&N carcinoma, were included in this retrospective study. Each patient had a pCT and rCT acquired with a Siemens Somatom Sensation wide bore CT scanner. Patients had a range of prescription doses and tumour locations (five nasopharynx, two oral cavity, two tongue, two larynx, one tonsil, one glottis, one oropharynx and one nasopharynx). Four groups of volumes were delineated for each patient: manual delineation on the pCT; automatic segmentation on rCT; manual delineation on the rCT by up to three clinical oncologists (clin) at multiple time points; and manual editing of the automatic segmentations on rCT. A schematic of all groups of volumes produced is shown in figure 1. For inter-observer results three clinicians contoured a subset of five patients and for intra-observer results one clinician contoured a subset of two patients on three occasions separated by at least one week. Manual CTV volumes were delineated according to local protocols. Volumes were automatically segmented on rCT with the H&N algorithm in a research version of ABAS proto8 v0.53 (Elekta AB, Stockholm, Sweden) using the manual delineation from clinician 1 from pCT as an atlas. All volumes were observed in Focal 4D v4.8 (Elekta AB, Stockholm, Sweden).
Different volume sets were compared using the Dice similarity coefficient (DSC) [9], which ranges from zero (no overlap) to one (perfect agreement), and the mean distance to agreement (MDA), which decreases as agreement improves. The final analysis was to time observer 1 while manually delineating CTV volumes in steps 2 and 6 in figure 1. This allowed the potential time savings due to ABAS to be measured. Contouring was performed by a research fellow without routine clinical time pressures, so the recorded times may be longer than in the clinical setting.
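As an illustration of the two metrics, the sketch below computes DSC and one common form of MDA for a pair of binary masks. It is not the software used in this study. It assumes that each volume is represented as a set of voxel indices, that the surface of a volume is the set of voxels with at least one face-neighbour outside it, and that MDA is the symmetric mean of nearest-surface distances. The function names and the toy masks `planning` and `replan` are hypothetical, and other MDA definitions exist.

```python
from math import dist


def dice(a, b):
    """Dice similarity coefficient, 2|A n B| / (|A| + |B|), for two voxel sets."""
    if not a and not b:
        return 1.0  # two empty volumes are treated as identical (an assumption)
    return 2.0 * len(a & b) / (len(a) + len(b))


def surface(mask):
    """Voxels in the mask that have at least one face-neighbour outside it."""
    if not mask:
        return set()
    ndim = len(next(iter(mask)))
    # Unit steps along each axis, in both directions.
    offsets = [
        tuple(step if i == axis else 0 for i in range(ndim))
        for axis in range(ndim)
        for step in (-1, 1)
    ]
    return {
        v
        for v in mask
        if any(tuple(c + o for c, o in zip(v, off)) not in mask for off in offsets)
    }


def mean_distance_to_agreement(a, b, spacing):
    """Symmetric mean nearest-surface distance, in the units of `spacing`."""
    if not a or not b:
        raise ValueError("both volumes must be non-empty")
    # Convert surface voxel indices to physical coordinates.
    pa = [tuple(c * s for c, s in zip(v, spacing)) for v in surface(a)]
    pb = [tuple(c * s for c, s in zip(v, spacing)) for v in surface(b)]
    d_ab = [min(dist(p, q) for q in pb) for p in pa]
    d_ba = [min(dist(q, p) for p in pa) for q in pb]
    return (sum(d_ab) + sum(d_ba)) / (len(d_ab) + len(d_ba))


# Toy 2D example: two 4x4 squares offset by one voxel, with 1 mm pixels.
planning = {(x, y) for x in range(0, 4) for y in range(4)}
replan = {(x, y) for x in range(1, 5) for y in range(4)}

print(dice(planning, replan))                                    # 0.75
print(mean_distance_to_agreement(planning, replan, (1.0, 1.0)))  # 0.5
```

The brute-force nearest-neighbour search is adequate for a toy example only. For clinical-sized volumes a distance transform or a spatial index would normally be used instead.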

Results
CTV volumes were produced by (1) manual contouring by up to three observers; (2) automatic segmentation by ABAS; and (3) manual editing of the ABAS volumes. Differences in volumes were observed between the three manual delineations and the ABAS volume, which needed altering in the anterior aspect to make it clinically acceptable (figure 2).
Mean inter- and intra-observer errors were assessed, along with ABAS data against each manual observer (figure 3a,b). The mean inter-observer variability (DSC>0.79 and MDA<2.1mm) was greater than the intra-observer variability (DSC>0.91 and MDA<1.5mm). The variability between ABAS and manual delineation differed between observers (DSC 0.82-0.88 and MDA 1.63-2.61mm) and was comparable to the inter-observer variability. Finally, ABAS volumes produced acceptable results (observer 1 vs. ABAS for all 15 patients gave mean DSC=0.86 and MDA=2.07mm). Although the ABAS volumes agreed more closely with the manual CTVs after being edited by observer 1 (mean DSC=0.87 and MDA=1.87mm), the edited volumes remained closer to the volumes auto-segmented by ABAS.

Conclusion
Inter-observer errors in delineation are higher than intra-observer errors for the H&N volumes considered in this work. ABAS segments volumes with accuracy close to inter-observer variability. However, small regions with significant discrepancies may require editing before clinical use. Use of ABAS reduces contouring time by a factor of three, which helps facilitate adaptive radiotherapy for the H&N IMRT patient group.