Biofidelic image registration for head and neck region utilizing an in-silico articulated skeleton as a transformation model

Objective. We propose an integration scheme for a biomechanical motion model into a deformable image registration. We demonstrate its accuracy and reproducibility for adaptive radiation therapy in the head and neck region. Approach. The novel registration scheme for the bony structures in the head and neck regions is based on a previously developed articulated kinematic skeleton model. The realized iterative single-bone optimization process directly triggers posture changes of the articulated skeleton, exchanging the transformation model within the deformable image registration process. Accuracy in terms of target registration errors in the bones is evaluated for 18 vector fields of three patients between each planning CT and six fraction CT scans distributed along the treatment course. Main results. The median of target registration error distribution of the landmark pairs is 1.4 ± 0.3 mm. This is sufficient accuracy for adaptive radiation therapy. The registration performs equally well for all three patients and no degradation of the registration accuracy can be observed throughout the treatment. Significance. Deformable image registration, despite its known residual uncertainties, is until now the tool of choice towards online re-planning automation. By introducing a biofidelic motion model into the optimization, we provide a viable way towards an in-build quality assurance.


Introduction
Image registration is a fundamental part of image-guided and adaptive radiation therapy for cancer. As deformations during treatment are present in nearly every body region, deformable image registration (DIR) has superseded the application of rigid-body methods.
Commonly used intensity-based DIR techniques can be a computationally fast and simple way to approximate anatomical deformations. They are however susceptible to artifacts in the image due to their reliance on voxel intensities. Moreover, homogenous regions missing high-contrast features can cause misregistrations (Kirby et al 2011). In general, these approaches consider changes in the intensity distribution without concern for the biomechanical properties of the tissues. This can lead to unrealistic deformations.
Utilizing biomechanical models enables the incorporation of the available knowledge of anatomy and physiology into the registration process as a biofidelic transformation model. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.
Finite element methods are assumed to provide high accuracy and are of interest when complex and independent organ motion is considered (Brock et al 2005). They suffer however from long computation times limiting their practical use in adaptive radiation therapy.
Simplified approaches utilize the rigidity of independent individual bones and can be used to evaluate skeletal motion. However, they do not incorporate any motion propagation of soft tissue and ignore the biomechanical constraints given by the joints that are essential in skeletal motion (Yip et al 2014).
To enforce rigidity of bones in image registration, intensity-based DIR algorithms were previously enhanced with a rigidity constraint (Reaungamornrat et al 2014, König et al 2016. These approaches consider the motion in each rigid structure independently rather than incorporating the articulated nature of the skeleton.
For an articulated skeleton-based registration, the head and neck area is of special interest. Due to the close proximity of tumors to organs at risk, the utilized image registration has to be very accurate and robust to facilitate the optimal treatment in the presence of steep dose gradients. Furthermore, the head and neck region is governed by skeletal motion between the fractions influencing soft tissue deformations. This means soft tissue deformation can be modeled based on the skeletal motion.
In this study, we firstly propose the novel integration concept of the articulated kinematic skeleton model introduced by Teske et al (2017a) in a fully automated registration process by wrapping the articulated skeleton as the transformation model within the optimization process of the registration. Secondly, we quantify the accuracy of this model to represent imaged postures in the range of anatomical deformations present in typical inter-fractional radiotherapy treatment courses of the head and neck region. Additionally, we also test the interdependency between the targeted postures by the registration optimizer and the achievable restricted postures of the kinematic model, showing that a high accuracy and robustness of the registration of the skeleton is achievable.

Articulated biomechanical model
The model construction was previously published by Teske et al (2017a) and will be briefly summarized in the following, highlighting the newly introduced improvements. The biomechanical transformation model is composed of two different parts, a kinematics-based model for the description of transformations induced in the skeleton and a chainmail-based model subsequently deforming the adjacent soft tissue. In this study, the registration process was performed on the skeleton model, while soft tissue extrapolation was utilized to resample the deformed image after the registration.

Set-up of patient specific model geometry
The construction of the model requires input segmentation of individual skeleton bones. In this study, we used manual bone delineations on the planning CT. Skull bones are delineated as one connected structure excluding the brain and nasal cavities, ribs were delineated including the costal cartilages. Care was taken that contours of different bones do not overlap despite 3 mm slice thickness of the planning CT.

Joint positioning and parametrization
All joints are modeled as 3 degree-of-freedom ball-and-socket joints without mobility restrictions. Their position is either calculated as the nearest distance between a pair of connecting bones (Teske et al 2017a), or determined according to joint-specific rules (Teske et al 2017b). The newly positioned joints are shown in figure 1.
Atlanto-occipital joint located between the skull and the atlas is positioned on the curve, approximating the spinal curvature on a level with the points of contact of atlas and skull.
Atlanto-axial joint located between the atlas and axis (2nd cervical vertebra) is positioned within the odontoid process of the axis (dens). Its eigenvector with the smallest eigenvalue represents the direction vector of the dens. After projection of all vertebra voxels onto this eigenvector axis, the mean positions of 1% of the voxels each from both ends of the scale are used to calculate dens axis position. The joint is approximated to be in the middle of the line connecting both positions.
Intervertebral joint located between two adjacent vertebrae is positioned in the center of the line connecting the two body centroids to locate it centrally within the intervertebral disk.
Costovertebral joint located between each rib and adjacent vertebral body is positioned in the center of the shortest line connecting the center of the vertebral body and the medial axis of the rib.
Acromioclavicular joint located between the scapula and the clavicle is positioned in the center of the nearest distance between the medial axis transforms of both bones.
Glenohumeral joint located between humerus and scapula is positioned in the center of the humeral head (Veeger 2000). The latter is determined as the voxel with the largest distance to the humerus surface.

Kinematic tree and inverse kinematics solver
The described prototype is based on the open-source Simbody toolkit (Seth et al 2010, Sherman et al 2011, which is used for the construction of the kinematic model and for solving the inverse kinematic equations. The chosen solver is based on the L-BFGS optimization approach (Liu and Nocedal 1989). Three points with fixed positions relative to each other characterize the position of each bone. The posture of the skeleton is modified by moving these points according to the shift and rotation parameters chosen by the optimizer of the registration process. Figure 2 shows the schematic construction of the kinematic tree and the connections from the root element ( i.e. the skull). To avoid loops in the kinematic tree, the mandible and sternum are split during optimization and kept rigidly connected using a weld constraint. This constraint is included into the cost function of the optimizer to assure biofidelity.

Image data sets
The accuracy of the proposed non-rigid image registration approach incorporating an articulated kinematic skeleton model is retrospectively evaluated using imaging scans of three head and neck cancer patients. All patients have undergone postoperative fractionated radiation therapy using an integrated boost concept in 33 fractions. They were randomly selected from a patient cohort described previously including utilized fixation (Giske et al 2011) and planning dose prescription (Schwarz et al 2012). Written informed consent to use their data was obtained from all patients.
Besides the planning CT scans, daily kV-control-CT scans were available, showing typical inter-fractional anatomical deformations in the range from 0 to 9 mm after a rigid offset correction (Giske et al 2011). Patient 1 and 2 received multiple control-CT scans in some fractions, where image inspection uncovered re-positioning necessity. For patient 1 and 3, some of the fraction scans were absent due to the unavailability of the scanner on some treatment days.
In total 18 (6 per patient) fraction scans were utilized in this retrospective study. All image scans share a pixel spacing of 0.98 × 0.98 mm with a slice thickness of 3, 3, and 2 mm for patients 1, 2, and 3, respectively. Planning CT scans were acquired by a Toshiba Aquilon scanner (Toshiba, Otawara, Japan), and the fraction CT scans by a Siemens Primatom in-room single-slice spiral scanner (Siemens OCS, Malvern, PA). In all fraction CT scans, a stereotactic frame registration was applied to establish their spatial alignment with the planning CT scans utilizing a stereotactic frame described earlier (Giske et al 2011).
The imaging quality including the manual segmentation on the planning CT is shown in figure 3.

Reference annotation data
To evaluate the accuracy and robustness of the proposed registration, two quantification approaches were chosen alongside the visual assessment by image fusion. First, visibly identifiable points (landmarks) were manually localized on the planning CT and six fraction scans distributed along the treatment course of all patients. At least three landmarks on each bone were marked. For the landmarks, a combination of anatomical feature points and visually dominant intensity shifts (e.g. small fissures) were chosen to improve the detection on several image data sets. Outlier detection and a rigidity condition (<3 mm violation) were applied. For patient 1, we positioned at least 161 corresponding landmark pairs per fraction scan (see figure 4).
The inter-observer variability of the landmark identification was assessed on two out of six fractions for patient 1 by four independent observers and ranged from 0.1-2.9 mm.
Secondly, for patient 2 and 3, 63-70 corresponding landmark pairs were identified on 6 fraction scans to evaluate the robustness of the approach over several patients and the full treatment course.

Kinematic non-rigid skeleton registration
The registration pipeline established in this section is summarized in figure 5. The model is built-up from bones and automatic joint positioning. Then the optimizer maximizes the overlap of the model and the bone tissue in the fraction CT. The optimized motion is propagated through the surrounding soft tissue to generate the deformation vector field. Finally, the planning CT is deformed to align with the fraction CT.

Similarity metric
Due to similar image quality of the datasets, we utilize the overlap of bone voxels in the model and the target image as the similarity measure. As an objective function, this is simple and fast. Since bones are delineated in the planning CT scan for model build-up, we only evaluate voxels within these masks. The fraction CT scans are binarized to bones and background using a threshold at 120 HU.

Optimization scheme
The presented image registration prototype moves the complete skeleton depending on the kinematic rules. The optimization, however, considers each bone sequentially following a predefined hierarchical scheme. After each bone optimization, the respective bone is fixed and can only deviate slightly to conform to the kinematic rules as enforced during further optimization steps.
We have adopted a Nelder-Mead-Simplex optimization approach (Nelder and Mead 1965) for each rigidbone optimization. Each corner of the simplex is either a 3 or a 6 dimensional point, representing a transformation inducing either only rotations (R) or translations and rotations (R + T). Rotations are performed either relative to the bone's centroid or relative to the joint position. Bones that only undergo rotations are the clavicles, sternum, ribs, humeri, and mandible.

Soft tissue deformation propagation
To generate the deformation vector field necessary for image transformation, the optimized skeletal posture is forwarded to a chainmail-based soft tissue deformation model published by Teske et al (2017b). Each bone voxel is initialized with the transformation parameters obtained from the registration process. The soft tissue is parameterized by a material-transfer function, mapping the HU-values of the planning CT scan to modelspecific elasticity parameters. In this way, the skeletal posture is propagated into the surrounding soft tissue without the need for soft tissue segmentations.
The resulting forward vector field retains the rigidity of the bones and describes a consistent deformation of soft tissue in the vicinity of the skeletal bones. In order to resample the deformed planning CT scan, a vector field inversion as proposed by Aguilera et al (2015) is applied to the forward deformation field.

Comparison to intensity-based DIR
To investigate the performance of KinematicDIR in the context of existing intensity-based DIR approaches, the registration of patient 1 is also performed using 3-stage multi-resolution registration with Plastimatch (Pinter et al 2012) in the 3DSlicer extension (Fedorov et al 2012). In each stage, the previous result is used as the initial guess. Since the default parametrization of Plastimatch is optimized for fast registration, a second set of fine grid parameters is chosen to give the best registration accuracy. Table 1 summarizes the two sets of parameters for each stage.

Visual evaluation
The visual evaluation of the registration is shown in figure 6. The color fusion indicates that after stereotactic alignment, there are large deformations between the images. After the kinematic registration, the bones in the images align well and the residual errors are only present in the soft tissue. In the thoracic area, differences in the HU are mostly caused by contrast agent that is only present in the planning CT but not in any fraction CT. Additionally, residuals arise in the soft tissue at distant locations from bones due to the local nature of the implemented chainmail forward propagation. The blue and orange areas in the transversal and sagittal slice arise from the limited field of view of the fraction CT and the treatment couch and frame result, respectively.

Accuracy evaluation
For each fraction, the target registration error (TRE) was calculated for all landmarks. The boxplots in figure 7 show the distribution between the predefined corresponding landmarks after the registration transformation is applied.
The median TRE for patient 1 was 1.2 ± 0.1 mm with an inter-quartile range (IQR) of 0.9 ± 0.2 mm. There is no observable trend of the TRE within the data during the treatment course and the registration quality was consistent for all considered fractions.
The inter-observer variability is assessed on two fractions (F02 and F28) where four independent observers positioned the same landmarks. Figure 7 shows that after registration, there is no relevant difference in the distribution of residual errors for all observers. This means the inter-observer variability in the positioning of landmarks did not contribute to the accuracy evaluation of the proposed kinematic registration. All further landmarks were positioned by the same observer.
The comparison of KinematicDIR with the Plastimatch is shown in figure 8. The median TRE over all fractions with the Plastimatch default settings is 2.0 ± 0.3 mm and hence significantly worse than the KinematicDIR registration. For patient 1 the tuned fine grid parameters, Plastimatch achieves a median TRE of 1.2 ± 0.1 mm indicating comparable registration accuracy as observed for KinematicDIR.

Robustness evaluation
The robustness of the proposed registration approach was evaluated using the corresponding landmark pairs on 3 patients and a total of 18 fraction CT scans. Kinematic registration was performed for each fraction. For patient 1, the distribution of the TRE after registration is shown in figure 7. For patient 2 and 3, the distribution of the TRE per corresponding landmark is shown in figure 9. The median target registration error for patient 2 was 1.6 ± 0.2 mm with an IQR of 1.3 ± 0.4 mm. For patient 3, the TRE was 1.5 ± 0.1 mm with an IQR of 1.2 ± 0.1 mm.  . Distribution of the target registration error after kinematic registration for six fractions of patient 1. In fractions F02 and F28, the inter-observer variability was assessed by four observers (Obs1 -Obs4). The TRE after registration is 1.2 ± 0.1 mm with an IQR of 0.9 ± 0.2 mm. The kinematic registration shows accurate performance. The inter-observer variability is below 1 mm for both considered fractions and all observers.
Of all subjects, patient 1 showed the best results because the large field of view in the fraction CT included a large part of the sternum. This stabilizes the alignment of the ribcage. However, the quantitative evaluation of the TRE shows only negligible deviation between different fractions. Patient 2 showed slightly worse registration results for the last three considered fractions. This coincides with a re-planning after fraction 13 which resulted in a new patient positioning and hence an increased deformation in the image space between the original planning CT and the fraction CTs.
This led to an increased uncertainty when positioning the landmarks as reported by the observers which results in a broader distribution of the TRE without necessarily indicating a worse registration.
Overall, the TRE for all patients and fractions was robust under realistic conditions.

Model build-up
The presented articulated biomechanical model is built up from manual segmentations on the planning CT of each individual patient. This means we proposed a patient-specific model that incorporates the shape and size of  the bones. More importantly, this also means the joint position in the kinematic tree is specific to the individual and is therefore an accurate representation of the skeletal system. This requires human input for the manual segmentation of individual bones, which is expected to be, solved in the future using convolutional neural networks (Belal et al 2019). In particular with the development of the TotalSegmentator framework (Wasserthal et  Regarding the kinematic model, the use of ball and socket joint, which is known to be approximate for all human joints, was not detrimental to the registration quality. Anatomically, however, it appears reasonable that certain joints can be modeled using fewer degrees of freedom (Moore et al 2013). In particular, the motion of the ribs can be further restricted this way. This specific adaption-as an example-would improve the biofidelity further and reduce computation time.
Currently, the registration optimizer does not have any knowledge of the constraints imposed by the kinematic tree. Due to this, the optimizer will probe points that violate the model's constraints. We deal with this by using the implicit projection provided by the Simbody solver utilizing the closest feasible points, which are weighted projections onto the constraint surface. We evaluate the objective function at these points instead and modify our initial simplex to have new positions instead. This can cause problems due to the simplex collapsing at the boundary (Le Floc'h 2012), yet in our experiments, we did not encounter those cases.

Accuracy and robustness evaluation
The accuracy of the presented kinematic registration approach was evaluated on six fractions with more than 160 landmarks to assure that the quality of registration is comparable over the full course of a treatment fraction and does not deteriorate. The evaluation was limited to six fractions to limit the manual labor necessary to identify feature points.
The target registration error and its statistical distribution showed two properties: the accuracy of the skeleton registration is below 1.2 mm, which can be considered sufficient for adaptive workflows. In addition, there is no significant difference or trend to be observed throughout the treatment fractions. The kinematic registration approach retains the same level of performance even in the presence of anatomical changes in the surrounding soft tissue.
In two of the fractions, the inter-observer variability was investigated. The effect this inter-observervariability had on the median TRE was below 1 mm and in line with other publications (Sarrut et al 2006). Therefore, all other landmarks were identified by the same observer.
The comparison of KineamticDIR with the intensity-based Plastimatch DIR algorithm should be seen as a representative comparison to put KinematicDIR into the context of current intensity-based approaches. With tuned fine grid parameters, Plastimatch achieved a comparable accuracy as the proposed KinematicDIR. Therefore, the advancement of the articulated skeleton model was to maintain accuracy in the TRE while combining rigidity and enforced articulation in the approach as well as the more realistic motion modeling with explicit joint positioning. In particular, the kinematic skeleton model enforces the realistic motion of the complete skeleton. In addition, the coupling of the multi-body-physics model helps in prepositioning of the skeleton after optimizing the second bone, such that compared to multiple independent rigid body registration the number of evaluation steps is decreased. This helps to position bones in artifact-prone areas, where intensitybased methods fail frequently.
The robustness and generalizability of the presented approach were investigated using two additional patients with six fractions distributed along the treatment course in the same way as patient 1.
All patients underwent the same kinematic registration with the only adaptation being the optimization order for patients without visible sternum.
For patient 2 we identified a broader distribution of the TRE for the last three fractions that can be explained by a re-planning that occurred after F13. The resulting larger deformations between the original planning CT and consequent fraction CTs led to larger inaccuracies in the landmark positioning. The fact that the median TRE remains mostly unchanged indicates that the registration performance was still on a comparable level. This also indicates that the use of positioning devices remains useful even when using the proposed kinematic registration. Other than that, the analysis of the TRE showed no deterioration of the registration quality, and all considered fractions show a similar distribution of the error. This indicates that the proposed method is robust to be used with different patient anatomies and all fractions during a fractionated radiation treatment.
At the current stage of KinematicDIR, the registration is performed exclusively on the skeleton and all soft tissue is deformed using motion propagation in heterogeneous soft tissue. Ongoing work includes the incorporation of an optimization step of predefined soft tissue voxels during the soft tissue propagation.

Conclusion
In this study, for the first time, we have presented the application of an articulated kinematic multi-body skeleton model including joint positioning in deformable image registration. By incorporating biomechanical properties into the registration scheme we provide a robust, accurate, and biofidelic image registration.
We could also verify the high accuracy and robustness of the model to fit real patient postures as measured in CT scans during fractionated radiation therapy. We have shown the potential of kinematic model registration to be used in adaptive radiation therapy by providing an accurate transformation model for the skeleton in the head and neck area.