Robust residual-guided iterative reconstruction for sparse-view CT in small animal imaging

Objective. We introduce a robust image reconstruction algorithm named residual-guided Golub–Kahan iterative reconstruction technique (RGIRT) designed for sparse-view computed tomography (CT), which aims at high-fidelity image reconstruction from a limited number of projection views. Approach. RGIRT utilizes an inner-outer dual iteration framework, with a flexible least square QR (FLSQR) algorithm implemented in the inner iteration and a restarted iterative scheme applied in the outer iteration. The inner FLSQR employs a flexible Golub–Kahan bidiagonalization method to reduce the size of the inverse problem, and a weighted generalized cross-validation method to adaptively estimate the regularization hyper-parameter. The inner iteration efficiently yields the intermediate reconstruction result, while the outer iteration minimizes the residual and refines the solution by using the result obtained from the inner iteration. Main results. The reconstruction performance of RGIRT is evaluated and compared to other reference methods (FBPConvNet, SART-TV, and FLSQR) using projection data from both numerical phantoms and real experimental Micro-CT data. The experimental findings, from testing various numbers of projection views and different noise levels, underscore the robustness of RGIRT. Meanwhile, theoretical analysis confirms the convergence of residual for our approach. Significance. We propose a robust iterative reconstruction algorithm for x-ray CT scans with sparse views, thereby shortening scanning time and mitigating excessive ionizing radiation exposure to small animals.


Introduction
X-ray computed tomography (CT) has become an indispensable tool in various fields, including industrial inspection, security checks, and medical diagnosis of a wide range of diseases (Wang et al 2008).However, traditional CT scanning requires a dataset from a large number of projection angels, which can expose patients to excessive ionizing radiation, increasing the risk of cancer (Brenner and Hall 2007).To lower the CT radiation dose in clinical practice, sparse-view CT scanning with a small number of projection views was proposed.Sparse-view CT has the significant advantage of reducing the radiation dose proportionally to the number of projection views.It also improves scanning speed and hence temporal resolution by shortening data acquisition time, which is particularly useful for some dynamic imaging tasks, such as cardiac micro-CT imaging for small animals (Kudo et al 2013).
Despite the potential benefits of sparse-view CT, one significant challenge is image reconstruction due to insufficient projection data.Reconstructing images from under-sampled projection data is an ill-posed inverse problem and imposes a more strict requirement on the robustness of the algorithm in contrast to a fullprojection problem (Kim et al 2014).In general, CT reconstruction can be primarily classified into three categories: (1) direct or analytical reconstruction, (2) iterative reconstruction, and (3) deep-learning (DL)-based reconstruction (Wang et al 2020).The first type of direct reconstruction relies on the mathematical inverse of a forward model.Its embodiment in CT reconstruction is filtered back-projection (FBP) (Holschneider 1991).Despite its high efficiency, FBP may not constitute a proper method in a sparse-view problem, as it is vulnerable to appeared noise and produces highly degraded images with severe streaking artifacts (Singh et al 2010).Alternatively, iterative reconstruction combines a numerical forward model with a feedback loop, by progressively reducing the error between predicted sensor data and real measurement (Beister et al 2012).Examples include the Kaczmarz family of algorithms (e.g.algebraic reconstruction technique (ART) (Gordon et al 1970), simultaneous ART (SART) (Andersen and Kak 1984)), ordered-subsets expectation maximization (OSEM) (Shepp andVardi 1982, Hudson andLarkin 1994), separable quadratic surrogates (SQS) (Kim et al 2013(Kim et al , 2014)), penalized weighted least-squares (PWLS) (Fessler 1994, Niu et al 2014), and nonlocal means (NLM) algorithm (Kim et al 2016).Compared to direct reconstruction, iterative methods demonstrate significant improvement in reconstruction quality in reducing noise and artifacts.Nevertheless, these algorithms exhibit limited robustness regarding reconstruction with sparse-view projection data, attributed to the challenges of balancing data fidelity and regularization terms (e.g.Tikhonov regularization ( 2 l ) (Golub et al 1999), total variation (TV) regularization (Lu et al 2012, Liu et al 2013, Niu et al 2014, Kim et al 2016), or 1 l regularization (Kim et al 2014, Beck andTeboulle 2009)), and, a lack of theoretical convergence analysis (Scherzer 1995, Elfving et al 2014, 2017, Magreñán and Argyros 2018).In particular, for some regularizations such as TV and , 1 l it is mathematically challenging to show the theoretical convergence due to the non-smoothness and non-linearity of the regularization terms.More recently, DL-based reconstruction was conceived, showing promising results in sparse-view CT reconstruction (Lee et al 2018, Zhang et al 2018, 2020, Li et al 2022).However, DL models typically lack explainability and generalizability, which implies that these methods tend to be more susceptible to irregularities, biases, and noise inherent in the data, thereby compromising their robustness.Specifically, the effectiveness of supervised learning approaches is significantly influenced by the caliber of their training data, whereas the success of unsupervised learning techniques primarily relies on the convergence ability of the optimization solvers when minimizing their loss functions.Therefore, investigating robust methods capable of consistently producing accurate reconstruction results is of great significance in sparse-view CT reconstruction.
In this paper, we focus on improving the robustness of iterative methods and develop a new approach to tackle two primary challenges: meticulously fine-tuning the regularization parameter and the deficiency of the theoretical convergence analysis.To address the former issue, we consider integrating flexible least squares QR (FLSQR) (Chung and Gazzola 2019), an advanced inverse problem solver to adaptively choose appropriate regularization parameter in an iterative manner .The main ingredient of FLSQR is flexible Golub-Kahan (FGK) process (Chung and Gazzola 2019), an iterative dimension reduction technique which can progressively augments the solution space, thereby enabling the implementation of computationally demanding parameter selection methods (e.g.WGCV (Chung et al 2008), UPRE (Vogel 2002), DP (Colton et al 1997)).Although FLSQR shows promising simulation results in sparse-view CT, it still requires theoratical convergence guarantee and practical testing.To address the latter problem, we consider adopting an inner-outer scheme (Saad 1993, Eiermann et al 2000, Baker et al 2005, Ergül et al 2010) that uses the inner loop results to refine the solution at each outer loop iteration.This strategy can not only expedite convergence but also facilitate the convergence analysis.
Exploiting aspects of both FGK process and inner-outer framework, we propose residual-guided Golub-Kahan iterative reconstruction technique (RGIRT) for sparse-view CT, a more robust method that inherits the benefits of parameter selection in FLSQR, and ensures theoretical convergence of the residual.RGIRT utilizes a unique inner-outer iteration framework, in which FLSQR is implemented in the inner iteration, while a restarted iterative scheme is applied in the outer iteration.The outer iteration is capable of effectively minimizing the residual and refining the solution by integrating the intermediate results acquired from the inner iteration.By incorporating the residual, the theoretical convergence of the whole algorithm does not rely on the convergence of its inner solver, thus circumventing the major bottleneck of FLSQR which lacks a theoretical convergence proof.Based on the theoretical and experimental results, RGIRT offers two notable advantages compared with FLSQR and SART-TV (Lu et al 2012) which is a widely used TV-regularized CT reconstruction algorithm: first, RIGIRT enhances the precision of the reconstruction under sparse-view conditions.It is important to note that setting the number of outer iterations to one in RGIRT results in a solution equivalent to FLSQR, representing the worst-case scenario for RGIRT, as no correction is implemented; second, RIGIRT ensures the theoretical convergence of the residual and experimental convergence to the reference solution.It has a higher convergence rate and costs less memories.On the other hand, compared with FBPConvnet (Jin et al 2017), a representative DL-based method, RGIRT exhibits more robust performance in comparative experiments, especially in increasingly sparse-view condition and escalating noise levels.From an algorithmic perspective of RGIRT, under the correction mechanism applied in the outer iteration, high accuracy is not required during the inner iteration.This increases the tolerance of the algorithm, thereby in turn enhancing its robustness.

Method and experiments
2.1.Residual-guided Golub-Kahan iterative reconstruction technique 2.1.1.Inner-outer iterative scheme The ill-posed nature of sparse-view CT image reconstruction calls for the use of regularization to stabilize the inverse process.The mathematical model can be formulated as where Î x R N is the discrete image array; Î b R M is the corresponding measurement data (sinogram) that carries noise; Î Á R M N is the system matrix that models the forward process; M is the number of source- detector pairs, N is the number of CT image grids.( )  x denotes the regularization term, and ( ) l l > 0 is the regularization parameter that balances the data fidelity term   -Ax b 2 2 and the regularization term ( )  x .The primary framework of our method is based on inner-outer iterations (as shown in figure 1), where FLSQR is performed in the inner iterations to solve a subproblem until convergence, and the outer iteration refines the intermediate solution to the main problem (1) heuristically using the results from the inner iteration.Given an initial guess ( )  x , 0 we define the inner problem at the ( ) + 1 l th outer iteration as represents the residual at the lth outer iteration and ( ) x l is the approximated solution at the lth outer iterative; s represents the correction that will be used for updating ( )  x l in the outer iteration.Subsequently, we obtain the ( ) + 1 l th outer iterative result ( ) Although there are multiple choices for ( )  s , in this work we use 1 l regularization, which is defined as   ( ) =  s s .
1 Furthermore, we proved the convergence of residual for our proposed algorithm.We show that incorporating the correction in the outer iteration facilitates a gradual reduction of the residual, thereby mitigating the concern regarding semiconvergence.This can be expressed as where the proof is provided in appendix.2.1.2.Optimization via iterative reweighted norms Note that each inner problem of RGIRT can be regarded as a new minimization problem solved by FLSQR method.The implementation of FLSQR can be delineated into two steps: first, employing iterative reweighted norms (IRN) to break the original non-convex optimization problem into a sequence of weighted norm least-square problems; second, utilizing FGK process to perform iterative dimensionality reduction.IRN method has proven effective in solving inverse problems with 1 l regularization constraint, by approximating 1 l regularization with iteratively updated weight matrix (Gorodnitsky andRao 1992, Rodrguez andWohlberg 2008).The first step of IRN approach is to reduce 1 l regularized problem to a sequence of leastsquares problems involving a weighted 2 l norm.
with [ ] s i denoting the ith entry of s and a small threshold t > 0. To avoid nonlinearities of solving (4), we follow the common practice of approximating ( ) D s by the weighted matrix at the kth inner iteration ( ) where ( ) s k 1 is an approximation of the solution at the ( ) k 1 th inner iteration.Notice that seeking ( )  s k in (4) requires solving a large optimization problem with N unknowns at each iteration.To ameliorate the computational burden, we adopt the FLSQR method that can reduce problem size and automatically estimate regularization parameter.

Solution for the inner problem via FGK
The essence of dimension reduction lies in an iterative projection scheme, encompassing two main stages in each iteration.First, we generate a basis (a set of vectors) for the solution by exploiting the FGK process (Chung and Gazzola 2019).Second, we compute the coefficients of the basis by solving an optimization problem in the projected subspace, where the regularization parameter can be estimated automatically via WGCV (Chung et al 2008).
During the implementation of the first stage, at the kth iteration we obtain where is upper Hessenberg and whose column vectors are the basis for the solution of ( ) s ; 1 satisfy the orthogonality condition (in exact arithmetic) with identity matrix and approximate the solution of (4) via the relations is the solution to (9).To further simplify the problem (9), we perform QR factorization of (1997), where Q Z k , has orthonormal columns and R Z k , is an upper triangular matrix.Plugging the factorization results from equations ( 6) and (8) into problem (9), we arrive at a small-sized optimization problem in the projected subspace at the kth iteration Note that the size of the projected problem (10) is ( ) + k 1 by k.Therefore, RGIRT alleviates the computational complexity by transforming the challenge of solving N unknowns of s in the full-sized problem (4) into that of solving k unknowns of f k in the small-sized projected problem (10).After applying WGCV in (10) to estimate the regularization parameters l, we can determine the optimal coefficient ( ) Finally, when the stop criterion of the inner iteration is satisfied, we obtain the solution of problem (1) at the A general overview of RGIRT is illustrated in figure 1.Meanwhile, the implementation details of RGIRT are provided in algorithm 1, where d tol is the tolerance given by the user, and z z , in out are the maximum number of the inner and outer iterations, respectively.

Numerical simulation based on a Shepp-Logan phantom
We utilized a Shepp-Logan phantom to simulate sparse-view CT reconstruction, as it incorporates both highcontrast and low-contrast structures.The sinogram data were generated using ASTRA toolbox (van Aarle et al 2016).The detector row comprises 1024 elements with a pixel pitch of 0.05 mm.The distance from the focal spot of the x-ray source to the detector was set to 185.03 mm, while the distance to the system iso-centre was set to 141.52 mm.The detector array consists of 1024 elements.The dimension of each element is 0.05 mm.The matrix size of the reconstruction image is 512 × 512, and each pixel covers an area of 0.0765 0.0765mm . 2 A total of 400 projections were evenly distributed over 200°in the fan-beam geometry.The photon number was set to 1 × 10 6 for simulating Poisson noise in the sinogram.Different projection views (100, 75, 57, and 39 views) were sampled evenly among the 400 views and tested with different reconstruction methods for validation purposes.

Mouse cardiac micro-CT data
The sinogram data used in this study were experimentally collected from a mouse cardiac scan using a customized micro-CT scanner (Cao et al 2010).The scanning parameters were 50 kVp anode voltage, 2 mA anode current, 1024 detector elements of 0.05 mm pixel pitch for each detector row.The source-to-detector distance and source-to-object distance are 185.03mm and 141.52 mm, respectively.A total of 400 projections were acquired with a step angle of 0.5°, resulting in a total scan angle of 200°for a short-scan mode in the fanbeam geometry.The 400 projection views were acquired in the step-and-shoot mode with prospective gating to both the respiratory and cardiac signals of the mouse subject under free-breathing condition, so that all the projection views were acquired at the same phase of the cardiac and respiratory cycles.Similar to the Shepp-Logan phantom experiment, we designate the FBP reconstruction results obtained from 400 projection views as the reference image, and employ the reconstruction outcomes from 100 (or 75, 57, and 39) projection views to evaluate the performance of our algorithm.

Performance evaluation
We chose four different methods for comparisons including FBP, FBPConvNet (Jin et al 2017), SART-TV (Lu et al 2012), and FLSQR (Chung and Gazzola 2019).FBP relies on the radon transform and its inverse, usually viewed as much faster than iterative methods.FBPConvNet is a post-processing DL model based on convolutional neural network (CNN).To ensure a fair comparison, we trained four models for each experiment, Algorithm 1. Residual-guided Golub-Kahan iterative reconstruction technique (RGIRT).
end for Obtain ( )  x l using (12) = + 1 l l end while return final reconstruction ( )  x l with each model representing a distinct sparsity sampling condition (100, 75, 57, and 39 views, respectively).For each sampling rate, the training dataset consisted of 300 images of mouse with an image size of 512 × 512 pixels.Meanwhile, we used mean square error (MSE) as the loss function and employed the Adam (Kingma and Ba 2014) optimizer during network training.The MSE loss function can be written as where Y is the reference images reconstructed from the full 400 projection views by using FBP, X is the images reconstructed from the sparse-view sinograms via FBP, and ( ) F X stands for the images post-processed by FBPConvNet.The SART-TV and FLSQR are both iterative methods.In particular, FLSQR is the same algorithm as our inner iteration but without the outer restarted iterative scheme.As for quantitative measures, we used MSE, structural similarity index measure (SSIM) (Wang et al 2004), peak signal-to-noise ratio (PSNR), and computational time to evaluate the performance of each algorithm.

Reconstructions of numerical simulation results
Reconstructed images of the Shepp-Logan phantom under various view-sampling conditions were compared in figure 2. The matrix size of the reconstruction image is 512 × 512, and each pixel covers an area of 0.0765 0.0765mm .
2 For training the model of FBPConvNet, we generated a dataset consisting of 500 images representing ellipses with random intensity, size, and location.A fixed learning rate of ´- 1 10 4 was employed.For all iterative reconstruction methods, we fine-tuned the reguxlarization parameters to strike a balance between artifact removal and image resolution, following the examples outlined in the original literature.Specifically, in this case, the maximum number of iterations for the gradient descent step in the SART-TV method was set to 1000, with a regularization weight reduction factor of ´-2.5 10 , 4 a fixed step size of 0.2 that directly influences image refinement, and a tolerance of - 10 , 7 consistent with other iterative algorithms.For the FLSQR method, the maximum number of iterations was set to 100.As for RGIRT, the inner iteration number was set to 1, while the outer iteration number was set to 300.The regularization parameters for these two methods were selected automatically by WGCV.
As observed in figure 2, all the reconstructed images from 39 views are significantly degraded, exhibiting the streaking artifacts indicated by the green arrow.In contrast, relatively reasonable reconstruction results are achievable using 57 views.From the results, FBP exhibits streaking artifacts across all sparse-view sampling conditions.FBPConvNet recovers sharper edges attributed to its ability to learn representative features from the training dataset.However, as the number of views for reconstruction decreases, the smaller ellipses marked in the red box in the FBPConvNet-reconstructed images become increasingly blurred.This blurring effect is primarily caused by the adoption of mean squared error (MSE) as the loss function, where a lower MSE corresponds to more blurred images.To further illustrate this, we selected and magnified the region outlined by the red box in figure 2 for each method, as shown in figure 3.While the FBPConvNet results merge the three ellipses within the red box for the images reconstructed from 79, 57, and 39 views, other iterative methods successfully separate them.SART-TV results exhibit blurred edges in the images reconstructed with 75 and 100 views.Although the FLSQR and RGIRT results appear similar in figure 2 due to the simple structure of the Shepp-Logan phantom, RGIRT demonstrates superior performance in the subsequent experiment involving more complex mouse cardiac micro-CT images.
To further demonstrate the advantages of RGIRT in terms of artifact reduction and preservation of image details, figure 4 presents the absolute difference images, which highlight the absolute discrepancies between the results obtained from each method and the reference image.In the FBPConvNet results, fewer differences are observed within the phantom but more outside of it.These outside residual streaking artifacts may be attributed to discrepancies between the training and test datasets, a challenge often encountered in deep learning-based methods known as dataset shift (Takahashi and Braga 2020).Specifically, for the Shepp-Logan phantom experiment, the training dataset consist of 500 images, each containing 20-40 randomly positioned ellipses.The length of the horizontal and vertical axes is randomly chosen between 32 and 128 pixels, and each ellipse rotates with a random angle.Although the Shepp-Logan phantom is comprised of ellipses, not all its features are randomly incorporated in the current training dataset.Additionally, significant differences of the three ellipses are visible, particularly in the images reconstructed by 75, 57, and 39 views sinogram, aligning with the findings depicted in figure 3. The SART-TV results exhibit noticeable differences along the edge of the phantom, though there are fewer artifacts outside of it.Moreover, in comparison, the FLSQR and RGIRT results exhibit less differences along the edge and less discrepancy in the region containing the three ellipses.Compared to the FLSQR results, our RGIRT results have fewer artifacts outside of the phantom, especially in the 57-view reconstructions, further demonstrate the robustness in cases of extreme sparse-view conditions.Among the evaluated methods, RGIRT exhibits the most superior performance across multiple metrics, indicating its higher overall competence in terms of artifact reduction, preservation of image details, and fidelity to the   A quantitative analysis was conducted on the reconstructions of the sparse-view Shepp-Logan phantom and presented in table 1, with the best results highlighted in bold red and the second-best results in bold blue.Among all sparsity conditions, RGIRT demonstrated the smallest MSE and PSNR, except for the 100 views image, which closely resembled the performance of FLSQR.Furthermore, RGIRT ranked second highest in SSIM across all sparsity conditions, except for the 75 views image.While SART-TV outperformed RGIRT in SSIM, it exhibited significantly larger MSE.The results in table 1 demonstrate that RGIRT is capable of reducing artifacts and preserving image structure simultaneously, rather than excelling in one aspect while performing poorly in the other.Table 1.Quantitative results for sparse-view Shepp-Logan phantom.The best results are marked in red, and the second-best results are marked in blue.

Robustness analysis to noise
In practical applications, the presence of photonic noise during imaging can significantly impact the quality of reconstructed images.To evaluate the robustness of our proposed method, we carried out further comparative experiments within the Shepp-Logan experiment, assessing the performance of our proposed method under various noise levels.Poisson noise with five different intensities was introduced to the simulated sinogram of 57 projection views, corresponding to 1 10 , 6 5 10 , 5 1 10 , 5 5 10 , 4 1 10 4 photons.The reconstruction results using FBP, FBPConvNet, SART-TV, FLSQR, and RGIRT at different intensity levels are shown in figure 6. Notably, while all methods are affected by noise at the intensity level of 1 10 , 4 FBPConvNet exhibits a higher level of noise compared to the other methods.Reconstruction results (figure 6) and metrics (figure 7) show a good agreement that that RGIRT maintained its efficiency and accuracy both qualitatively and quantitatively, even in the presence of a increasing noise level.Although SART-TV also exhibits high noise tolerance (green curves in figure 7), its convergence is not guaranteed, resulting in a relatively poorer image quality when the stopping point is inappropriately selected.Another noteworthy observation is that, as the noise level increases, FBPConvNet experiences a sudden rise in MSE and a sharp decline in SSIM (figure 7), which emphasizes the susceptibility of FBPConvNet to noise interference, along with limited robustness and applicability.

Reconstructions of mouse cardiac micro-CT
The reconstructed images under various view-sampling conditions of the cardiac micro-CT are shown in figure 8.We can observe severe streaking artifacts (marked by the arrows) appearing in all the reconstructed images obtained from 39 views.As a comparison, the reconstructed results from 57 views have effectively eliminated the streaking artifacts.To train the FBPConvNet on mouse cardiac micro-CT images, the training dataset consisted of 300 images, with an additional 100 images allocated as the validation dataset.In this particular instance, RGIRT was configured with an inner iteration count of 3 and an outer iteration count of 20.The selection of the number of inner and outer iterations in contrast regions to be reconstructed.When the level of detail complexity is higher, we decrease the number of inner iterations and increase the number of outer iterations.The step size for SART-TV was set to 0.1.For all iterative methods, the tolerance was set to - 10 . 7All other parameters used in FBPConvNet training process and the three iterative methods are the same as the Shepp-Logan experiment.The FBPConvNet, SART-TV, and FLSQR methods effectively suppress the streaking artifacts in 100 views.However, as the views become sparser, the different methods degrade differently.FBPConvNet tends to blur low-contrast regions such as the heart and enhance high-contrast regions such as bones.Particularly, in the case of 57 views, the FBPConvNet blurs the heart entirely, while other methods preserve its structure.SART-TV, on the other hand, tends to blur the entire image, including both high-and low-contrast regions.This blurring effect can be clearly observed in the results of 75, 57, and 39 views (figure 8).Compared with the FBP and FBPConvNet, RGIRT and FLSQR successfully preserve the geometries of imaging objects in both high-and low-contrast regions in a small number of views (e.g.57 views and 75 views, figure 8).However, in the case of 57 views, FLSQR exhibits a slight degradation than RGIRT in high contrast regions, as clearly shown in the difference images in figure 8.
More detailed visual comparison is shown in figure 9, where the red box region of figure 8 (a) is enlarged to examine the thin interventricular septum more closely.The interventricular septum (indicated by the blue  arrow and red curve in the reference image of figure 9) is not discernable at all from the sparse-view CT image reconstructed by FBP.For the images reconstructed by FBPConvNet, it is somewhat visible from the 100-view case but barely visible from the remaining fewer-view (e.g.75, 57 and 39 views) scenarios.The same interventricular septum is visible from the images reconstructed from SART-TV, FLSQR, and RGIRT, but a close look reveals that the interventricular septum reconstructed by SART-TV and FLSQR are a little bit blurry due to the over-smoothing from those two methods.The interventricular septum reconstructed by our RGIRT method seems most clear, especially when examining the small structure details.Moreover, the contrast between the interventricular septum and the neighboring iodine-filled ventricles appears clearer than that from the other methods, which indicates RGIRT is capable of preserving soft tissue contrast and suppressing noise.Judging from figure 9, as the sparsity changes from 100 views to 75, 57 and 39 views, the superiority of our RGIRT method becomes even more obvious.
The effectiveness of RGIRT is further demonstrated in the difference images shown in figure 10, which were calculated as the absolute difference between the reference image and the sparse-view images in figure 8.In these difference images, the darker the color, the larger the error.We can see that the difference image of RGIRT has the smallest overall difference, which confirms RGIRT's superiority in preserving the image details.Interestingly, comparing the difference images between FBPConvNet and RGIRT, we found that RGIRT is better at preserving the soft tissues, while FBPConvNet is better at preserving the bone structures.Because the heart is a soft-tissue organ without bone structure, our RGIRT method is indeed better suited for sparse-view cardiac CT reconstruction.
Figure 11 shows the comparison of line intensity profiles of various methods passing through the yellow dash line (see figure 8) in the 57-view case.It is clear that the line intensity profile from RGIRT resembles most closely to the one from the reference CT image, especially for the peak and valley in the zoomed region.This demonstrates the advantage of our method over the other methods on persevering edge and small features in the images.
We also carried out quantitative analysis for the sparse-view cardiac micro-CT reconstructions and the results are presented in table 2. As shown in table 2, RGIRT and FBPConvNet clearly outperformed the other methods with smaller MSE and higher SSIM and PSNR.The FBPConvNet method has the best MSE among many sparse-view cases, mainly because it uses MSE as the loss function.However, as the aforementioned zoomed images (figure 9) clearly show, FBPConvNet has a poor performance in reducing the overall noise and noise textures, which led to the loss of small image details.Thus, although the FBPConvNet method provides good quantitative results, it cannot provide visual results as good as RGIRT.Compared to the other reference methods, the superiority of RGIRT is more obvious when it comes to fewer projection views.Therefore, we can conclude that RGIRT provides the most accurate reconstruction results for sparse-view CT.

Convergence analysis
For the three iterative methods (SART-TV, FLSQR, RGIRT), we analyzed their convergence behaviors from the following three aspects: whether it converges, the speed of convergence, and image error.This convergence analysis can be examined by plotting the MSE loss curves during iterations, whisch are shown in figure 12(a).This figure demonstrates the capability of the three iterative methods in stabilizing the semi-convergent behavior of the ill-posed problem in the 57-view mouse case.First, compared to FLSQR and RGIRT, SART-TV method has clear semi-convergent behavior, where MSE shows a trend of first decreasing and then slowly increasing as the number of iterations increases.As for the speed of convergence, it can be observed that RGIRT reaches the convergence point fastest with only 9 iterations.Furthermore, the image at the 9th iteration in figure 12(a) show that our RGIRT is able to obtain the most optimal result compared to the other methods.This is consistent with the performance of the MSE loss curve (see the zoomed area in figure 9).Combined with the fact that each iteration of RGIRT is to solve the dimensionality-reduced projected problem, the computing time of RGIRT spent on one iteration is the smallest among the three iterative methods.Thus, RGIRT is the fastest method to converge.In terms of image error, RGIRT produced the lowest MSE in all three iterative methods throughout the iterations.Therefore, among the three iterative methods studied in this work, RGIRT can provide the most accurate result with the fastest convergence speed.
When determining the maximum iteration number for inner and outer loops, it is essential to consider their respective roles-the inner iteration focuses on calculating solution space, while the outer iteration is dedicated to correcting the results obtained from the inner loop.An exploratory study concerning the selection of maximum iteration number for inner and outer loops is presented in figure 12(b).From the curve, we notice that the combination of fewer inner iterations with larger outer iterations often serves as a suitable parameter selection, which can improve the precision with high reconstruction speed.Theoretically speaking, FLSQR is a special case of RGIRT, wherein the number of outer iterations is set to 1 and no residual information is used for guidance.

Computational cost
Computational cost is an important factor for any reconstruction algorithm.All the algorithms in this work were implemented using MATLAB except the FBPConvNet, which was implemented with PYTHON.Considering that the training process for a DL method is very time consuming, the pre-training time was not included in comparison.Although the algorithms were realized using different programming languages, the efficiency can be roughly compared on a same computer.The MSE, SSIM and reconstruction time from the different methods when reconstructing the images for the 57-view mouse case are shown in figure 12(c).All metrics in figure 12(c) were normalized to the largest value from various methods.For example, the SSIM of RGIRT is scaled to 1, and the SSIMs of other methods are normalized to the SSIM of RGIRT.The average reconstruction time for FBPConvNet, FLSQR,and RGIRT is 1.57,84.66,5.33,and 3.89 s per CT slice, respectively.All the reconstructions were conducted on the same PC workstation (Intel Xeon Gold 6248 R CPU @ 3.00 GHz, 128 GB RAM and Nvidia Quadro RTX 4000 GPU card).Our RGIRT method clearly owns a considerable advantage in computational efficiency compared to the other iterative methods.

Discussion and conclusion
Although DL-based sparse-view CT reconstruction methods have gained increasing popularity in recent years, the interpretability and generalizability of trained networks remain unresolved issues in clinical applications (Antun et al 2020).A feasible strategy for resolving these issues is to combine deep learning networks with model-based iterative reconstruction methods (Zhang et al 2020, Xiang et al 2021, Su et al 2022).The focus of this study is to develop a robust iterative reconstruction algorithm for sparse-view CT, which is beneficial for the future research into sparse-view CT reconstruction methods that leverage the synergistic powers of data-driven and model-driven methods.The proposed RGIRT algorithm is based on an inner-outer iteration framework, where FLSQR is implemented in the inner iteration to build up the solution space and a restarted iterative scheme is applied in the outer iteration as a correction.In this study, we utilized data from 100, 75, 57, and 39 projection views for reconstruction and conducted comparative experiments involving four other algorithms (FBP, FBPConvNet, SART-TV and FLSQR) to explore the performance and limitations of RGIRT.Given sufficient projection views (e.g. 100 views), all reconstruction algorithms, including FBP, can yield outstanding reconstruction results.In contrast, in a situation with considerable data scarcity (e.g.39 views), noticeable artifacts emerge, indicating that 39 views exceed the practical limitations of all considered algorithms.Consequently, we selected reconstructions from 57 views for further evaluation, in which the performance differences (e.g.accuracy, stability) among various algorithms become particularly noticeable.It is important to mention that all the presented reconstruction methods including RGIRT cannot fully eliminate artifacts under extremely sparse-view conditions, e.g.57 and 39 views.One have to be careful in selecting an appropriate number of projection angles according to specific scenarios and image quality requirements.
The key strength of RGIRT lies in its exceptional robustness, primarily manifesting in the following aspects.Firstly, compared with other methods, RGIRT can deliver more accurate reconstruction results, especially under sparse-view conditions, since its inner-outer framework incorporates the residual-guided solution correction during the iterative process.We observe that the performance of all methods deteriorates as the number of views decreases.FBPConvNet tends to blur the details in low-contrast regions as shown in cardiac data of 57 and 39 views (figure 9) and numerical data of 75, 57, and 39 views (figure 3).SART-TV displaces a large error in the edges of high-contrast regions, which is clearly visible in both cardia data and numerical data for each case of view numbers (figures 4 and 10).Compared with these two methods, our method maintains a relatively smaller error in strong edges and depicts the low-contrast regions more clearly even under some extremely sparse-view conditions, e.g.57 views, for both in vivo and numerical experiments.Furthermore, The variation of SSIM with decreasing numbers of projection views indicates that RGIRT is distinguished by the lowest gradient of change, providing greater stability and robustness in more ill-posed cases with fewer projection views compared to other methods (figure 12(d)).
Secondly, the convergence of RGIRT is validated through both theoretical and experimental results.RGIRT employs a restarted iterative scheme in the outer loop to iteratively minimize the residual and refine the solution using the result obtained from the inner loop, thereby ensuring convergence.Classic iterative methods such as SART-TV have semi-convergent behaviour (figure 12(a)).Similarly, it has been shown that during network training practical DL networks fail to converge to optimum solutions (Kawaguchi and Sun 2021), due to the challenge of solving highly nonlinear optimization problems.We believe the convergence guarantee of our RGIRT framework would be valuable in designing hybrid learning methods that combine RGIRT and deep learning techniques, particularly in improving algorithmic stability and robustness, a claim that can be validated through proof-of-concept.This would certainly improve the explainability and generalizability of hybrid DL methods.Furthermore, RGIRT features a faster convergence speed, primarily attributed to its adoption of the efficient Golub-Kahan bidiagonalization method, and the use of residual-guided correction expedites the approximation of the optimal solution.
Thirdly, RGIRT outperforms other methods in terms of MSE and SSIM across different noise levels (figure 7).FBPConvNet occurs a sharp decrease in reconstruction accuracy when introducing Poisson noise with 1 10 4 intensity levels.For RGIRT, because of the correction in the outer loop, the inner solver doesn't require high degree of accuracy, which enhances the robustness of our algorithm against various levels of noise.Moreover, we can also reduce the number of iteration for the inner solver to save runtime memory.In figure 12(b), We investigated the performance of RGIRT across varying numbers of inner iterations and observed that a high number of outer iterations combined with a relatively smaller number of inner iterations can yield precise results.
Our work is subject to following limitations.Firstly, RGIRT is currently designed for a certain type of regularization, i.e. 1 l -norm regularization.In fact, other sparsity-encouraging prior knowledge can be adopted, such as 0 l -norm regularized dictionary learning method (Wu et al 2018).It is obvious that more complex and advanced priors may improve the reconstructed image quality, although they are more computationally demanding.In practice, one would balance reconstruction performance and computational cost.Secondly, the inner-outer double iteration scheme of RGIRT could cause some computational complexity, especially when the convergence speed is slow for some special CT geometries.In this study we mainly focused on the short-scan in fan-beam geometry, since for a cone-beam CT scanner the short-scan is advantageous in terms of decreased mechanical manipulation, increased scanning flexibility, and shortened acquisition time.Given the prevalent use of DL-based methods nowadays, we expect that our method can be hybridized with an DL-based method.This may involve a neural network to pinpoint an optimal mix of the regularization parameter and penalty function, followed by employing our algorithm for iterative reconstruction.Such framework will maintain the algorithmic convergence while reducing the uncertainties inherent in DL-methods.
In conclusion, we introduce a robust sparse-view CT reconstruction algorithm named RGIRT, which features an inner-outer dual iteration framework.The inner iteration efficiently yields an intermediate result, while the outer iteration minimizes the residual and refines the solution.Both numerical phantoms and real experimental Micro-CT data demonstrate the robustness and accuracy of RGIRT.Meanwhile, theoretical outcomes confirm the convergence of our approach.

Figure 1 .
Figure 1.The framework of RGIRT.RGIRT minimizes the residual in the outer iteration and employs the FGK process in the inner iteration to reduce its dimension adaptively.

Figure 2 .
Figure 2. Reconstruction results of the numerical study based on a Shepp-Logan phantom.Columns from left to right are the reference image and sparse-view image reconstructed via FBP, FBPConvNet, SART-TV, FLSQR and RGIRT, respectively.Images from top to bottom show reconstruction results obtained from 100, 75, 57, and 39 projection views, respectively.

Figure 3 .
Figure 3.The zoomed region marked by the red box in figure 2 (a).Columns from left to right are zoomed images from the reference image and sparse-view images reconstructed via FBP, FBPConvNet, SART-TV, FLSQR and RGIRT, respectively.Images from top to bottom show zoomed region for 100, 75, 57, and 39 projection views, respectively.

Figure 4 .
Figure 4.The difference images between the reference image and the sparse-view images in figure 2. Columns from left to right are difference images from FBP, FBPConvNet, SART-TV, FLSQR and RGIRT, respectively.Images from top to bottom show difference results for 100, 75, 57, and 39 projection views, respectively.

Figure 5 .
Figure 5. Line intensity profiles for the yellow dashed line in figure 2 (a3) (57 projection views).The inset shows the line from the our RGIRT method is closer to the reference line when compared to the other methods.

Figure 8 .
Figure 8. Reconstruction results for a mouse cardiac micro-CT dataset using different methods.Columns from left to right are the full-view reference image and sparse-view image reconstructed via FBP, FBPConvNet, SART-TV, FLSQR and RGIRT, respectively.Images from top to bottom show reconstruction results from 100, 75, 57, and 39 projection views, respectively.

Figure 10 .
Figure 10.The difference images between the reference image and the sparse-view images in figure 8. Columns from left to right are difference images from FBP, FBPConvNet, SART-TV, FLSQR and RGIRT, respectively.Images from top to bottom show difference results for 100, 75, 57, and 39 projection views, respectively.

Figure 11 .
Figure 11.Line intensity profiles for the yellow dashed line in figure 8(a) (57 projection views).The inset shows the line from the our RGIRT method is closer to the reference line when compared to the other methods.

Figure 12 .
Figure 12. (a)MSE loss curve of SART-TV (green), FLSQR (purple) and RGIRT (red) methods during iterations.The number of iterations for RGIRT is the product of the inner and outer iterations.(b) MSE loss curve of RGIRT during various number pairs of inner-outer iterations (57 views of mouse data).The number of inner iterations ranges from 1 to 30, while the number of outer iterations ranges from 1 to 200.(c) Quantitative evaluation of different reconstruction methods in the 57-view mouse case.(d) Variation of SSIM values with different projection views and reconstruction methods.RGIRT and FLSQR achieve a similarly smoother trend than FBPConvNet and SART-TV when the number of views changes.

Table 2 .
Quantitative results for sparse-view mouse cardiac micro-CT image.The best results are marked in red, and the second-best results are marked in blue.