Motion-compensated scheme for sequential scanned statistical iterative dual-energy CT reconstruction

Abstract Objective. Dual-energy computed tomography (DECT) has been widely used to reconstruct numerous types of images due its ability to better discriminate tissue properties. Sequential scanning is a popular dual-energy data acquisition method as it requires no specialized hardware. However, patient motion between two sequential scans may lead to severe motion artifacts in DECT statistical iterative reconstructions (SIR) images. The objective is to reduce the motion artifacts in such reconstructions. Approach. We propose a motion-compensation scheme that incorporates a deformation vector field into any DECT SIR. The deformation vector field is estimated via the multi-modality symmetric deformable registration method. The precalculated registration mapping and its inverse or adjoint are then embedded into each iteration of the iterative DECT algorithm. Main results. Results from a simulated and clinical case show that the proposed framework is capable of reducing motion artifacts in DECT SIRs. Percentage mean square errors in regions of interest in the simulated and clinical cases were reduced from 4.6% to 0.5% and 6.8% to 0.8%, respectively. A perturbation analysis was then performed to determine errors in approximating the continuous deformation by using the deformation field and interpolation. Our findings show that errors in our method are mostly propagated through the target image and amplified by the inverse matrix of the combination of the Fisher information and Hessian of the penalty term. Significance. We have proposed a novel motion-compensation scheme to incorporate a 3D registration method into the joint statistical iterative DECT algorithm in order to reduce motion artifacts caused by inter-scan motion, and successfully demonstrate that interscan motion corrections can be integrated into the DECT SIR process, enabling accurate imaging of radiological quantities on conventional SECT scanners, without significant loss of either computational efficiency or accuracy.


Introduction
Compared to conventional single-energy CT (SECT), dual-energy CT (DECT) generates more informative and quantitative results from transmission sinograms acquired at two different x-ray spectra. In 1976, Alvarez and Macovski (1976) represented the linear attenuation coefficient (LAC) as the outer product of an energydependent basis function and a set of spatially-dependent coefficients and introduced the basic idea of multienergy CT to estimate the energy-independent spatial functions.
Since then, DECT has been further developed and is widely used in numerous clinical applications, including automated bone removal in CT angiography, blood-pool imaging, and virtual noncontrastenhanced imaging (McCollough et al 2015). In radiotherapy applications, DECT has improved the accuracy with which electron density (Tsunoo et al 2008), effective atomic number (Goodsitt et al 2011, Bonnin et al 2014, Hua et al 2018, Schaeffer et al 2021 and proton stopping power , Taasti et al 2016 can be imaged. More recently, statistical iterative reconstruction (SIR) algorithms for DECT have been introduced to reduce noise and improve resolution and accuracy (Fessler et al 2002, O'Sullivan et al 2004, Zhang et al 2014, Chen et al 2016, Zhang et al 2017. A novel SIR algorithm that jointly operates on different-energy raw-data sinograms demonstrated that it achieves sub-percentage uncertainty in imaging proton stopping-power (Zhang et al 2019, Medrano et al 2020, 2022. In current proton-therapy clinical practice, SECT is used to map stopping-power ratios, leading to 2-3.5% proton beam range uncertainty (Paganetti 2012, Park et al 2012. Dual-energy data acquisition methods include rapid-kVp switching scanning, single-source multi-layer scanning, and dual-source scanning (McCollough et al 2015), all of which require specialized hardware. In contrast, single-source sequential scanning can be implemented on any conventional SECT scanner. However, this method is more vulnerable to motion and deformation of the scan subject during the scanning process. Such patient motion includes changes in position, respiratory-and cardiac-induced organ motion, peristalsis, and intestinal gas transit, and can be as large as several cm over a 1-10 min period and are highly nonlinear and localized (Langen and Jones 2001). Since the two sinograms are acquired from differently deformed or positioned instances of the same anatomy, potentially large global artifacts result, significantly degrading the quantitative accuracy achieved by DECT SIR.
To address the dual-energy motion artifact, image registration has been incorporated with medical imaging techniques to reconstruct dual-energy images from measurements taken in different patient positions. Gang et al, Huang et al and Leng have all deployed deformable image registration (DIR) methods to align two sequentially scanned images at different energies before dual-energy material decomposition (Gang et al 2009, Leng et al 2015, Huang et al 2020 and demonstrated significant reduction of motion artifacts. However, these motion-compensated methods are designed for image-domain decomposition techniques, which cannot be simply applied to DECT SIR.
More relevant to our setting, is the literature on single-energy, motion-compensated 4D cone-beam CT (CBCT) image reconstruction. In this application, the source and flat-panel detector assembly slowly rotate (15-60 s/rotation) around the free-breathing patient, acquiring as many as 2000 projections. To minimize image blur and undersampling artifacts, Rit et al, followed by many others (Tang et al 2012, Brehm et al 2013, Mory et al 2016, proposed incorporating a 4D inverse deformation vector field (DVF) model into the backprojection operator so that all projections, regardless of breathing phase, can be backprojected onto a single artifact-free reference phase image. Several investigators (Jailin et al 2021, Wang and Gu 2013, Renders et al 2021 developed iterative 4D CBCT reconstruction algorithms, in which precomputed or iteratively updated 4D deformation fields are incorporated into each iterative image update. Inspired by these investigations, we propose to incorporate a 3D registration method into our joint statistical iterative DECT algorithm in order to reduce motion artifacts caused by inter-scan motion. To the best of our knowledge, embedding DIR in iterative DECT algorithms has not been previously reported. The proposed scheme is derived from the forward model for a deformed object and can be used by any DECT SIR. As well as testing the method on digital phantoms and patient images, we perform a first-order perturbation analysis to theoretically evaluate the estimation error introduced by registration target-registration errors (TREs).
We acknowledge that intrascan motion can also degrade DECT performance. However, in this work, we consider only the motion between two scans. Nevertheless, 4D-CT motion-compensated DVFs could be incorporated into the proposed scheme. Similarly, for more than two scans, the techniques here can be extended in a straightforward manner to more than two energies.

Materials and methods
2.1. General framework As shown in figure 1, the framework of the proposed DECT reconstruction algorithm with registration consists of 2 steps: registration maps evaluation and DECT algorithm.
The registration mappings are evaluated based on SECT images reconstructed from low-and high-energy data. The SECT images could be reconstructed by analytical algorithms or iterative algorithms. Single-energy alternating minimization algorithm is utilized in this work to reduce the influence of artifacts, so the registration algorithm is less likely to attempt to match the noise.
After the image registration, the iterative DECT algorithm is initialized by the analytical DECT images, and the deformation fields correct the system operator that maps the data domain and the image domain in both the initialization method and the iterative DECT algorithm.

Algorithm derivation from the forward model
3 be the coordinate systems describing the subject anatomy during high-and low-energy scanning, respectively. The true deformation transform Given the LAC map μ(x H , E) as a function of energy E, specified in the fixed coordinate system x H , the warped moving image can be written as μ(j 0 (x H ), E) = (μ•j 0 )(x H , E), where •denotes the function composition operator. Therefore, the transmission measurements for DECT are modeled by , the forward-projection of an image μ(x H , E) from the distorted image frame of reference can be rewritten as is the distorted anatomy system operator, which allows an image formed in the high-energy coordinate system to be compared with its low energy projection, which was acquired with the patient in the deformed state. By the definition of adjoint, we can derive the corresponding back-projection as  where BP(x H ) = ∫h(x H , y)g(y)dy is the back-projection of the sinogram g(y) into the high-energy image domain. The determinant of the Jacobian of the deformation field j -0 1 compensates for the variation of the space density in the epsilon-ball about the location x after the deformation field is applied. If the neighborhood of the point x is concentrated when the image is warped by the forward deformation field j, the point x tends to get more weight in the gradient calculation.
Because images and sinogram data are discretized, x, y can be treated as integer indices, i.e. so that image μ(x, E) can be represented as a matrix since we choose to discretize energy as well. In this case, The forward model can be represented as , 3 denote the discrete indices of the image space and measurement space, respectively, μ denotes the discretized image that only takes integer indices, μ( · , E) is the attenuation image at the energy E, and Interp (x: I) denotes the interpolated image that is indexed by  Î x 3 , estimated from the digital image I. The first proposed method for the discretized problem could be easily derived following the continuous case. The back-projection becomes Note that (9) provides the adjoint operator for an arbitrary interpolation method, which allows the use of nonlinear interpolations in the proposed scheme.
Conventionally, the interpolated image could be computed elementwise by a linear combination of the neighboring value and a set of coefficients where ( ) x H denotes the neighborhood indices of x H , and w(x H , x L ) denotes the coefficients based on the distance between x H and x L .
Moreover, for any interpolation of the form in (10), one could combine the deformation field and the interpolation as Then, the adjoint of the combination of interpolation and forward-projection to calculate the gradient could also be computed as which is the back-projection followed by the adjoint of the warped linear interpolation.

DECT SIR with image registration
In the previous section, we have shown that the warp of the image could be plugged into the forward-projection process to form a 'distorted forward-projection', and the adjoint of the distorted forward-projection could be represented as a back-projection followed by the inverse warping multiplied by the determinant of its Jacobian matrix (shown in (9)) or the adjoint of the interpolation if the interpolation function is linear (shown in (12)).
Since the substitution of the forward-projection does not influence the convergence property of the original algorithm, one could easily apply the proposed framework to any DECT SIR by warping the image before the target forward-projections and applying the adjoint warping after the corresponding back-projections.
This study employed a material decomposition model to simulate the target LAC, based on the assumption that the LAC of typical biological media can be accurately represented through a linear combination of distinct materials, denoted as ( , where c is the basis component weight, and i denotes the index of the basis. For the sake of simplicity, here we transform c 1 and c 2 into a 1-dimensional vector, denoting as  Î C N 2 X . Then, suppose the objective function of DECT SIR has the form of (O' Sullivan and Benac 2007, Huh and Fessler 2009, Xu et al 2009, Long and Fessler 2014, Zhang et al 2014 denotes the regularization term, and λ is a scalar that controls the penalty strength; j ä [L, H] denotes the scan index, and [ ( )] g C j y is the mean photon counts estimated from the basis images C. The update step can be written as a function involving backprojections of the reweighted mean sinograms and the reweighted measured sinograms, q y E C , : , : , 16 X Y denote the system operator (forward-projection) and its adjoint (backprojection), respectively, N X and N Y are the numbers of measurements and image voxels, respectively. Two Ń N Y X matrices on the diagonal of H are identical, and the Ń N Y X matrices off the diagonal of H are filled with zeros. Therefore, HC, S j (C) and P j (d j , C) are Ń 2 1 Y vectors.
(·) exp denotes the element-wise exponentiation, is the gradient function that evaluates the update direction based on the back-projections of the mean and measured transmissions as well as the penalty term.
The proposed method changes the updating step from (14) to X X is the interpolation operator given the deformation field j,˜j W is a dummy operator that represents j -W 1 or † j W , and † j W is the adjoint operator of W j if W j represents a linear operation. The flowchart is shown in figure 2.

Error analysis
Due to the inherent errors in approximating a deformation using the combination of the warping and the interpolation, there will always be errors between the ground truth and the computed image. In this section, we theoretically analyze the influence of warping error on the result of DECT SIR.
Let C f be the stationary point of the warped DECT SIR with the truth deformation field j 0 and perfect interpolation mapping W 0 . Note that the stationary point is achieved only when the first-order necessary condition is satisfied for (13). Then, C f satisfies (see the proof in the appendix) be the interpolation operator based on the estimated deformation field j e , and ΔW be the difference between the estimated interpolation and the 'truth' interpolation, then the estimated imageĈ satisfies˜(ˆ) where Λ L , Λ H , Ω(ΔW) and Q are defined in the appendix. Equation (22) describes a linear relationship between the small perturbation of the deformation field ΔW and the change in the estimated image ΔC. The first two terms in (22) give the Fisher Information Matrix of the loss function, which provides a manifold on the domain of c describing the mean of gained information over all the random realizations of the observed data. Intuitively, an uniform image will not propagate any error to the result even if the DVF residual is large. The deformation error is propagated through the application on the image C. Small components in the inverted matrix will amplify the error introduced by estimated DVF. However, increasing the penalty strength λ helps reduce the error.
In order to ensure that maximum accuracy is achieved relative to data quality, a plausible requirement is that estimated image errors due to DVF errors are small compared to the errors caused by Poisson noise in the transmission data (Qi and Huesman 2004) where  denotes the expectation with respect to the random measurement, η is a tolerance factor (i.e. 0.01), and Σ d denotes the covariance matrix caused by noise in data d. One can easily derive an approximation to Σ d in the case of DECT (Fessler 1996), which is As it was mentioned in Qi and Huesman (2004), the comparison of traces of the matrices could be a surrogate of (25) for the sake of computational simplicity, i.e.
In this section, we introduce the dual-energy alternating minimization (DEAM) algorithm as an example of DECT SIR. The DEAM algorithm is a statistical DECT algorithm that operates jointly on low and high-energy sinograms by decoupling the maximum log-likelihood problem at the energy domain using an alternatingminimizing strategy. The objective function of DEAM consists of the I-divergence and penalty term as Let R be the local regularization term that constrains the gradient of the image, such that x denotes the neighborhood of x, w R denotes the weight addressing the distance between x and ¢ x , and the potential function ψ is a element-wise convex symmetric smooth function. The derivation of the image update with the penalty of the form (29) is shown in the appendix.
In DEAM, ψ is assumed to be a Huber-type potential function: where δ controls the transition between ℓ 2 -norm and ℓ 1 -norm. With an appropriate choice of δ (see appendix A.3 for the penalty parameter selection), this differentiable function preserves sparsity in the gradient domain. Then, When the high-and low-energy images are perfectly aligned, DEAM yields decomposed images that can estimate physical properties, e.g. charged-particle stopping powers and electron densities, with subpercentage uncertainties (Medrano et al 2022) on both synthetic and measured sinogram data acquired from clinical CT scanners (Medrano et al 2020). The flowchart of the modified version of DEAM, called warped DEAM, is shown in figure 3, and the algorithm is shown in algorithm 1.
In this work, the warped DEAM algorithm is accelerated by the ordered subset technique (Erdogan and Fessler 1999) during the early iterations. The GPU-accelerated code has been implemented so that the algorithm is run on 4 NVIDIA V100 for time efficiency.

Deformable registration
In our setting, we require a DIR algorithm that yields diffeomorphic, topology-preserving mappings.
Since we need to repetitively and reciprocally warp image volumes between two different material energy domains (c 1 and c 2 ), a symmetric algorithm that guarantees the invertibility of the estimated transformation is critical to the whole reconstruction process. This guarantees the inverse consistency of the output mappings. Since the patient's anatomy may vary during two successive scans, the algorithm should also be stable with largescale deformation. Of the algorithms meeting these specifications (Oliveira and Tavares 2014), the state-of-theart advanced normalization tools (ANTs) (Avants et al 2009) toolkit was selected for our warped DEAM pipeline.
Designating the 90 and 140 kVp images as the fixed image J and moving image I. The ANTs DIR algorithm (symmetric image normalization method (SyN) (Avants et al 2008)) computes the optimal transformation j * within a transformation space that maps the coordinates x ä Ω of the moving image I(x) to a location ¢ Î W x which minimize a cost function E describing the similarity between I and J.
SyN generates transformations in a diffeomorphic space that forms a group of differentiable maps with differentiable inverses (Ebin and Marsden 1970) that is closed under composition. The diffeomorphism j is defined on the image domain Ω and is parameterized over time as a family of diffeomorphisms, j(x, t): d , described by the following ordinary differential equation (ODE) ( ) , , The final DVF yielded by j is u(x) = j(x, 1) − x. SyN derives the path by decomposing the final diffeomorphism j into two symmetric components j 1 and . This leads to an optimization problem as where  is the similarity metric depending on the images and the transformation, λ is a parameter controlling the tradeoff between smoothness and image similarity, L is a differential operator that enforces smoothness of the vector field, and Lv  represents the vector norm of v. Note that the regularization term in (33) is the total distance between the initial and the final transformations, which enforces a geodesic property on the resulting transformations. The resulting Euler-Lagrange equations are solved by variational optimization (Beg et al 2005).
To find the optimal v * , the variational energy E is minimized from either endpoint towards the midpoint of the transformation, as indicated by the similarity term. This strategy 'splits' the optimization dependence equally between both images. Thus, gradient-based iterative convergence deforms I and J along the geodesic diffeomorphism, j, to a fixed point midway (intuited by the notion of shape distance) motivating the moniker 'symmetric normalization' (SyN) for the solution strategy (Avants et al 2011).
As discussed in section 2.10, a multi-modality similarity term must be used since the attenuation coefficients of the same material differ significantly from 90 to 140 kVp.

Initialization
We use the iterative filtered back projection (iFBP) method to initialize DEAM (Yan et al 2000). iFBP is an imagedomain decomposition method that incorporates the polychromatic characteristics of the x-ray source into reconstruction and iteratively updates the image by the FBP of the discrepancy between the monochromatic and the polychromatic sinograms. However, since the estimated image is iteratively projected for the sinogram evaluation in iFBP, the inverse DVF should be applied prior to each forward projection.

Simulation studies
Two synthetic phantoms were used to evaluate the performance of the proposed motion-compensated DEAM method. The first was a modified high-contrast resolution phantom shown in figure 4 consisting of disks, squares, and grids at 1-5 intervals composed of 18% CaCl 2 solution embedded in a 30 cm diameter water cylinder. The resolution insert bars were composed of 45% K 2 HPO 4 solution. The flowchart for this test is shown in figure 4. The phantom was warped by a harmonic deformation field that has an exact analytical inverse. The original and warped phantoms were then back-projected to generate the non-warped (NW) and distorted geometry synthetic transmission data.
The dimensionality and voxel size of the phantom were 2440 × 2440 × 52 and 0.25 × 0.25 × 1 mm, respectively, so as to preserve its sub-millimeter information. The phantom is warped by a harmonic deformation field that has a strict analytical inverse. Then, the original DEAM and warped DEAM algorithms jointly operated on the low-and high-energy synthetic sinograms for the non-warped and warped phantom geometries.
For the anthropomorphic simulation study, we utilize the advanced version of the 4D digital phantom 4D Extended Cardiac-Torso (XCAT) (Segars et al 2010) as the ground truth. XCAT is an anthropomorphic phantom based on a non-uniform rational B-spline (NURBS) with highly detailed whole-body anatomies. The deformation fields are given by parameterized motion models with the heart and respiratory cycles. The XCAT phantom CT projector generates projections via continuous line integrals from the NURBS file rather than the discretized phantom. Synthetic polychromatic CT sinograms were constructed using the continuous XCAT projector in fan beam mode. We used a script to simulate the helical scanning process that duplicated Philips Big Bore scanner geometry and scan settings. The incident photon-fluence profile was scaled so that the variance of XCAT image reconstructed by the in-house FBP algorithm matched the variance of the clinical FBP image reconstructed by the same algorithm.

Clinical study
Helical 90 and 140 kVp scans of a patient subject (IRB study NCT03403361) were sequentially acquired on a 16row Philips Brilliance Big Bore CT scanner with 0.75 mm × 16 row (12 mm) collimation. The scanned region extended from the top of the head to the base of the skull to model a typical CNS patient and from the base of the skull to the top of the pelvis to represent a typical lung patient. The central-axis spectra used in the reconstruction process were experimentally determined by fitting the well-established Birch Marshall model to measured transmission profiles through aluminum and copper attenuators of varying thickness (Evans et al 2013).
The raw sinograms were exported directly from the scanner and preprocessed without beam hardening correction using proprietary software provided by the vendor. The sinograms were then reconstructed with our original and warped DEAM with a resolution of 1 × 1 × 1.034 mm 3 . The z-direction resolution was chosen to ensure that eight slices covered exactly a single gantry rotation. A research-oriented version of DEAM algorithm was run for 400 iterations with 33 ordered subsets followed by 500 iterations without ordered subsets. Each reconstruction took approximately 8 h for the component weights with a size of 610 × 610 × 146.

Similarity and image quality metrics
In this work we use two metrics, mutual information (MI) and local cross-correlation (LCC), for multispectral image registration.
MI quantifies the similarity between two images by predictability, which is the information gained about one image by observing the other. Let P I (a) denote the probability that value a appears in image I, P J (b) denote the probability that value b appears in image J, and P IJ (a, b) denote the joint probability that value a appears in image I and value b appears in image J. Then, The point-wise mutual information (PMI) is utilized as the misalignment indicator of two images to visualize the registration improvement (Rogelj et al 2003): In order to quantitatively assess the performance of the proposed motion-compensation framework with DEAM, relative LAC bias and mean absolute error (MAE) are introduced against the reference value at different energies In addition, we also use the SPR estimation error as another metric to assess the performance of the proposed method. The stopping power mappings are estimated from DEAM basis component images following the procedure outlined in previous studies [41], and the estimated SPR is compared to the ground truth value derived from known material composition with respect to the bias and the mean absolute error.

Resolution simulation results
The reconstructed images for the high-contrast study are shown in figure 5(a). Figures 5(b) and (c) show the zoomed details of the corresponding reconstruction. As expected, the DEAM-NW image has the cleanest edges, and the iFBP image has the most severe artifact, especially around the upper region in 5(b) and left region in 5(c). DEAM-Linear (DEAM-LI), DEAM-BSpine (DEAM-BS), and DEAM-Adjoint (DEAM-AD) share a similar pattern of the artifact in the region of 1 mm spacing bars, and DEAM-BS performs better than DEAM-AD and DEAM-LI. The lower left corn of the reconstructed grids in 5(c) are blurred, and the DEAM-LI and DEAM-AD images are slightly better than DEAM-BS. Moreover, with the increase of the energy for VMIs, the artifact would be reduced. Figure 5(d) shows profiles through the various bar patterns. Since the warping magnitude varies across the bar pattern, the profiles are not identical for bars within the same group. In profile 1, the outer and inner bars are over-and under-estimated, respectively, by the three warped DEAM algorithms. As bar spacing increases, the differences between bars in the same spacing group are reduced. For the 4 mm pattern, the warped DEAM profiles closely approximate those of the non-warped DEAM image. Regarding the dependence of image quality on DEAM interpolation scheme, none standards out as unambiguously superior. For example, in profile 1, DEAM-BS and DEAM-AD outperform DEAM-LI at 20 keV, whereas DEAM-LI and DEAM-AD outperform DEAM-BS at 140 keV. In profile 3, DEAM-LI and DEAM-AD outperform DEAM-BS at 20 keV, but in profile 4, DEAM-BS outperforms DEAM-LI and DEAM-AD at 20 keV. Considering both the overall performance and the computational complexity, DEAM with adjoint was used in later studies. Figure 6 shows the performance of DEAM with different deformation fields on three different slices of the XCAT phantom. The PMI images demonstrate the pointwise similarity between 140-kVp fixed image and deformed 90-kVp image.

Anthropomorphic simulation results
DEAM-SyN and DEAM-GT both outperform DEAM. The percentage MAEs in the region of the soft tissues and spine for DEAM-SyN are reduced by 12.55, 3.27, and 4.88-fold compared to the result without registration at 20, 60, and 150 keV, respectively.
For extra-lung tissue, incorporating warped DEAM using GT reduces MAEs from 14.82%-24.89% to 2.66%-2.76% for 20 keV LAC For 150 keV, warped DEAM with GT DVF reduces the errors from 1.53%-2.46% to 0.32%-0.35%. For lung parenchyma, the corresponding error reductions are 44.84%-77.02% for 20 keV LAC, and 4.95%-7.58% for 150 keV LAC The LAC of lung parenchyma is small compared to soft and bony tissues, leading to the small denominator in MAE. Therefore, MAEs for lung parenchyma are significantly larger than MAEs for extra-lung tissue.
DEAM-SyN errors are only modestly higher than DEAM-GT errors. It could be seen that the mismatches in PMI images and artifacts in reconstructed images are mainly concentrated in the heart region. The artifacts caused by misalignment match the shape of red regions in PMI. Most of the mismatching artifacts are corrected by the DVF from SyN. However, DEAM-GT still outperforms DEAM-SyN in reconstructing the boundaries of organs.  Figure 7 shows the accuracy with which DEAM XCAT reconstructions with and without motion compensation estimate mono-energetic linear attenuation coefficients in muscle (heart), adipose, muscle, and spine, respectively. The selected ROIs are 10 mm long cylinders. Figure 7(b) and (c) shows that motion compensation significantly reduces the magnitude of bias and MAE. For example, bias for heart muscle is reduced from 56% to 8% and 5% to 0.4% at 20 keV, 150 keV, respectively. The same trend can also be seen in other regions. The magnitude of bias for adipose and muscle is reduced from 1.7% and 4% to 0.2% and 0.1% at 40 keV, respectively. Both the bias and MAE for warped DEAM in the selected regions are within 1% from 40 to 150 keV, except for the heart muscle. The magnitude of bias and MAE for heart muscle is within 1% after 55 keV.
To assess the robustness of the proposed pipeline against noise, we conducted a quantitative comparison between the results reconstructed from the simulated normal-dose and low-dose measurements of the XCAT phantom, where the low-dose measurement was simulated at 1/10 of the normal dose. In order to highlight the superiority of our approach, we conducted a performance comparison between our proposed method and the pure image-domain decomposition method that is widely used in clinical practice. In image domain decomposition, two bead-hardening-corrected measurements were reconstructed by the filtered back projection, and the low-kVp image was warped by the estimated DVF. Subsequently, two basis component images are obtained by linearly combining the low-and high-kVp images, which we refer to as IDD-SyN. The results are presented in figure 8. Each histogram plots the pointwise errors between the estimated LAC and the ground truth LAC, divided by the ground truth LAC, in a 610 × 610 × 100 image volume. We only considered non-boundary voxels corresponding to soft and bony tissues in this assessment to avoid extremely small values being used as the denominator.  The evaluation results indicate that both DEAM-GT and DEAM-SyN algorithms reconstruct the results with negligible bias and variance in the normal dose and low dose studies. The majority of the errors fall within the range of −1% to 1% at 60 and 150 keV for normal dose. The histograms of DEAM-NW at 60 and 150 keV exhibit two peaks, where one peak represents the correctly reconstructed values, and the other denotes the values influenced by the misalignment. In the 20 keV results, the peak corresponding to the 'correct' values is greatly reduced, indicating that motion artifacts in DECT SIR can significantly affect the fixed region even when the target energy is relatively low.
When comparing DEAM-SyN with IDD-SyN in the normal-dose case, we found that the FWHM of the IDD-SyN histogram is approximately 1.8 times larger than that of the DEAM-SyN histogram at 60 keV. This difference becomes more pronounced for other energies. Specifically, the FWHM for DEAM-SyN is around 11.7 and 8.9 times larger than the FWHM for IDD-SyN at 20 and 150 keV, respectively. Additionally, the estimation biases of IDD-SyN are larger than those of DEAM-SyN. As a result, our study demonstrates that the imagedomain decomposition method with the estimated DVF leads to greater bias and variance compared to the proposed algorithm with the same DVF.
Moreover, since the penalty weight in DECT SIR can be adjusted to compensate for the noise, our proposed method is less likely to be affected by variance in the measurement than the image-domain decomposition method. In the low dose case, we multiply the penalty weight by a factor of 3 to balance the noise level and resolution of the reconstructed result. The FWHM of low-dose DEAM-SyN errors is approximately 1.6 times larger than that FWHM in the normal-dose case at 60 keV, while the FWHM of low-dose IDD-SyN errors is approximately 3.1 times larger than that at 60 keV. Figure 9 depicts a representative slice of the reconstructed image. As the simulated dose decreases, the image reconstructed by IDD exhibits a notable increase in noise levels, while DEAM-SyN with the appropriate penalty weight effectively suppresses the noise. However, DEAM-SyN-LD exhibits a larger boundary error in comparison to DEAM-SyN, especially at 20 keV. This discrepancy is likely due to the intensified penalty strength employed in DEAM-SyN-LD. Figure 10 shows propagated uncertainty in BVM component weights due to DVF inaccuracies, where Δc est is calculated from our first-order theoretical analysis, equation (22), and Δc real denotes the actual observed difference between warpred DEAM  reconstructions based on-SyN and ground-truth DVFs. Note that SyN registration errors are sufficiently large that the inequality (26) is not satisfied (the error mostly clustered around the organ boundaries.) Figure 11 displays reconstructed images by DEAM using different DVFs for two patient datasets. For the headneck patient, it is observed that the mismatches in the reconstructed images are concentrated in the regions of the pharynx, teeth, and shoulder. These regions of interest are magnified and displayed in the upper left and right corners of the reconstructed images. In the first head-neck slice of the DEAM images, mismatches are observed around the teeth (blue) and pharynx (orange), which exacerbate the artifacts in the DEAM-NW images. In the absence of motion compensation, the pharyngeal wall is reconstructed as a high-density structure in the 30 keV image due to the misalignment of its surface, as shown in the corresponding PMI image. In the second head- neck slice of the DEAM image, the motion artifact on the left shoulder (indicated in blue) has a similar shape to the mismatch identified by the inverse PMI. More mismatches are apparent on the boundary of the right shoulder (indicated in orange), affecting image intensity at the corresponding positions and in the surrounding areas. By incorporating motion compensation, both geometric mismatches (as indicated by PMI image) and the associated image artifacts are significantly reduced.

Clinical results
Due to the lack of the ground truth, we used the 140 kVp image as the structural ground truth, and quantitatively evaluated the image quality through the pointwise cross correlation (PCC) of VMI against the 140 kVp image. PCC images are displayed as 1−PCC, so a brighter region indicates a serverer mismatch. The number on the top right shows the mean value of the corresponding image. Both DEAM-MI and DEAM-CC outperform DEAM-NW with respect to CC images for all keVs, especially for the reconstructions at 30 keV. For the first HN slice, the means of PCC increase from 0.7854 to 0.8442 and 0.8503, and for the second HN slice, from 0.7918 to 0.8659 and 0.8750, respectively.
For the lung dataset, the mismatches are concentrated in the regions of the heart and lung parenchyma blood vessels. These mismatches are manifested as artifactual high-density structures in the 30 keV VMI estimated derived from DEAM-NW (1st, 2nd, 3rd orange and 3rd blue). The blue ROI of the first lung slice shows an overestimation of c 1 and underestimation of c 2 by DEAM-NW, which results in the underestimation of 30 keV attenuation coefficients in the heart. The mismatch and the motion artifact for the right bronchus are shown in the blue rectangles in the second lung slice, which is manifested as anomalously high attenuation coefficients in the bronchial wall. The means of PCC are increased from 0.8478 to 0.9018 (DEAM-MI) and 0.9048 (DEAM-CC) for the first lung slice, and 0.7710 to 0.8785 (DEAM-MI) and 0.8965 (DEAM-CC) for the second lung slice at 30 keV.
In conclusion, the results presented in figure 11 demonstrate that both DEAM-MI and DEAM-CC yield fewer mismatches and motion artifacts than DEAM without registration. Our analysis of the CC images suggests that DEAM with the proposed framework outperforms DEAM without registration, and the DVF derived from minimizing CC outperforms the DVF derived from minimizing MI in this task. This finding is consistent with the research of Brian Avants et al (Avants et al 2009), who demonstrated that CC is more effective at capturing local patterns and reducing the impact of artifacts and noise than MI. Figure 12 shows the histograms of PCC for clinical imaging reconstructed by DEAM, with and without the use of the DVF. Overall, the PCC values for DEAM-MI and DEAM-CC images exhibit a reduced number of occurrences near zero, indicating a successful reduction of motion artifacts. Additionally, this result also indicates that DEAM-CC outperforms DEAM-MI, as demonstrated by lower PCC counts in the range of 0-0.5.  Figure 13 compares the accuracy with which warped DEAM and uncompensated DEAM algorithms reconstruct LAC and SPRs in three cylindrical adipose ROIs in the head and neck dataset. Since DEAM-CC performed slightly better than DEAM-MI in the previous evaluation, the comparison is limited to DEAM to DEAM-CC. The following quantitative analyzes are based on the assumption that the materials property of the object is close to our anticipation. Since the weight fraction of lipids in adipose tissue is highly variable, we use three different lipid concentrations, 61%, 87%, and 94%, representing the lower limit, mean, and upper limits of lipid mass fraction documented in reference (Woodard and White 1986) to derive three difference adiposereference LACs.
Three regions of interest in the adipose are indicated in figure 13(a). For the reference adipose with 61%, the magnitude of bias and MAE for warped DEAM is less than the original DEAM result for the 20 to 67 keV energy range above which the uncompensated DEAM outperforms warped DEAM. However, since the bias and MAE in the least affected region (region 3) are larger than the bias and MAE in other regions, it is reasonable to assume that adipose with 61% lipid is unlikely to be the 'true' adipose composition for this patient. With the reference adipose with 87% and 94% lipid, the warped DEAM has a lower bias and MAE than the original DEAM at most  energies. The percentage MAE with 87% lipid reference adipose is within 2% after 60 keV, and the percentage MAE with 87% lipid reference adipose is within 2% after 47 keV.
Proton SPR estimation error is used as another performance metric. With 87% lipid reference adipose, the SPR bias of original DEAM is 5.7%, 3.4%, and -2.5%, respectively, while the SPR bias of warped DEAM is −1.2%, −1.1%, and −1.2%, respectively. With 94% lipid reference adipose, the magnitude of the SPR bias of warped DEAM could be as low as 0.3%. The same trend could be observed for the plot of MAE. With the proposed scheme, the MAE of DEAM result could be reduced from 5.8%, 3.7%, and 2.8% to 1.7%, 1.4%, and 1.5%. With 94% lipid reference adipose, the MAEs could be as low as 1.0%, 0.6%, and 0.6%. Figure 14 shows a similar analysis of two heart muscle ROIs, where mass fractions of water are varied over the range of documented compositions. Over this range, the variation of muscle LAC is much smaller than for adipose. The magnitude of bias and MAE for images from warped DEAM is less than the corresponding DEAM metrics for all three reference compositions. With the proposed DEAM-CC scheme, the SPR estimation bias is reduced from 8.4% and 4.4% to 0.4% and -0.3%, respectively, while MAE is reduced from 8.4% and 4.4% to 1.1% and 0.6%, respectively.

Discussion
In this manuscript, we explore the feasibility of utilizing DVFs in DECT SIR to compensate for interscan organ motion and tissue deformation that would otherwise compromise the accuracy of quantitative DECT applications. We fully appreciate that an 8 h reconstruction time is not acceptable for even off-line radiotherapy use cases and continue to investigate acceleration strategies. The DECT SIR process is dominated by the forward and back projection operations, which consume approximately 95% of the elapsed time. In contrast, the motion-correction step is relatively efficient, accounting for less than 1% of the total time as it involves only two trilinear interpolations per iteration. This paper successfully demonstrates that interscan motion corrections can be integrated into the DECT SIR process, enabling accurate imaging of radiological quantities on conventional SECT scanners, without significant loss of either computational efficiency or accuracy.
Nevertheless, the current total reconstruction time still falls far below the clinically acceptable threshold, due to the slow convergence rate of DECT SIR and high computational demands of forward-and back-projection. However, the potential exists to substantially reduce DECT SIR reconstruction time from several hours to on the order of ten minutes through a combination of more efficient update strategies (Degirmenci et al 2015, Zhang 2018, deploying additional computational resources (Mitra et al 2017), and incorporating deep learningbased acceleration. Recently, a novel model-based deep-learning technique for DECT SIR was proposed (Ge et al 2023). The model-based network was able to reconstruct clinical images in less than 6 min, achieving accuracy comparable to existing methods.
Reducing the computational burden associated with accurate radiological quantity mapping remains a challenge, requiring additional engineering efforts to achieve clinically acceptable reconstruction times, especially for online adaptive replanning applications which demand near real-time computational efficiency. However, as currently practiced, proton-treatment planning is an offline non-real-time process (1-2 h), for which a reconstruction time on the order of ten minutes is clinically acceptable. While a two-orders-ofmagnitude efficiency gain is a feasible engineering goal, its implementation is beyond the scope of this paper and is left to other ongoing and future investigations by our laboratory.

Conclusion
We developed a motion-compensated scheme that effectively mitigates motion artifacts in dual-energy sequential scanned reconstructions. This scheme is compatible with any DECT SIR that necessitates the assessment of projection and backprojection. The perturbation analysis described a linear relationship between the error in DVF and the error in the estimated DECT image.
The evaluation of the proposed scheme with a selected DECT SIR, DEAM, indicates a notable decrease in errors related to estimated mono-energetic linear attenuation coefficients. Specifically, our method reduces such errors from a range of 1.45%-21.16% to 0.40%-4.18% in both simulated and clinical cases. In the latter case, we observed significant reductions in motion artifacts in the pharynx, shoulder, and heart regions upon implementation of our scheme. : . A1 j y j j Then, the first order necessary condition (FONC) is achieved when for all x and i.
Let C f be the stationary point of the warped DECT SIR with the truth deformation field f 0 and perfect interpolation mapping f w 0 0 .˜f w 0 0 denotes the perfect adjoint or inverse of f w 0 0 . The stationary point is achieved when FONC of the dual-energy transmission problem is satisfied, i.e. where C f denotes the corresponding stationary point, and    The resolution metric is calculated by the integral from the frequency 0 to 0.5, i.e.
(˜)˜( ) ò w w MTF d , A 2 3 0 0.5 and the noise level metric is denoted by the variance of the selected uniform region.
In figure 15(a), five red circles indicate five selected edges for ESF evaluation, and the corresponding variance is computed on the region surrounded by each circle. Figure 15(b) and (c) show two examples of ESF, LSF, and MTF for the Vendor and DEAM images. Figure 15(d) shows the scatter plot of variance versus resolution of the DEAM result with different sets of penalty parameters compared with the vendor reconstruction. We selected (λ = 1e 5 , δ = 2e −3 ) as the parameter set for DEAM because DEAM with this set of penalty parameters generates images with higher resolution and less noise compared to the vendor reconstruction.