Imaging based on Compton scattering: model-uncertainty and data-driven reconstruction methods

The recent development of scintillation crystals combined with $\gamma$-ray sources opens the way to an imaging concept based on Compton scattering, namely Compton scattering tomography (CST). The associated inverse problem raises many challenges: non-linearity, multiple-order scattering and high levels of noise. Already studied in the literature, these challenges lead unavoidably to uncertainty in the forward model. This work studies exact and approximated forward models and develops two data-driven reconstruction algorithms able to tackle the inexactness of the forward model. The first one is based on the projective method called regularized sequential subspace optimization (RESESOP). We consider here a finite dimensional restriction of the semi-discrete forward model and show its well-posedness and regularization properties. The second one builds on the unsupervised learning method deep image prior (DIP) and is inspired by the construction of the model uncertainty in RESESOP. The methods are validated on Monte-Carlo data.


Introduction
Initially a tool for visualizing the inside of the human body using X-rays with the advent of Computerized Tomography (CT), imaging nowadays pervades astrophysics, homeland security, landscape and environment monitoring and, of course, manufacturing processes, to cite only a few. This success is made possible by the technological progress in detection (cameras, crystals, etc.) but also in computing and storage capacities.
Computerized Tomography (CT) is a well-established and widely used technique which images an object by exploiting the penetration properties of X-rays. Due to the interactions of the photons with the atomic structure, the matter resists the propagation of a photon beam of energy E and intensity I(x) according to the well-known Beer-Lambert law $I(y) = I(x)\, e^{-\int_{x \to y} \mu(z,E)\,\mathrm{d}z}$, where µ stands for the lineic attenuation coefficient and x → y denotes the straight line {x + t(y − x), t ∈ [0, 1]}. Interpreting the measurement of the intensity in a CT scan is then possible with the help of the Radon transform in 2D and the X-ray transform in 3D, which map the attenuation map µ(x) into its line integrals, i.e.
$-\ln I(p, \theta)$ with $(p, \theta) \in \mathbb{R} \times S^1$, where s and d stand for the positions of the source and of the detection point. We refer to [30] for more information.
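As a simple numerical illustration of the Beer-Lambert law, the line integral of µ along x → y can be approximated by a midpoint rule. This is only a sketch under our own naming conventions, assuming the attenuation map is given as a callable on 2D points:

```python
import numpy as np

def beer_lambert(mu, x, y, I0=1.0, n=400):
    """Intensity measured at y for a beam emitted at x with intensity
    I0, attenuated as I(y) = I0 * exp(-int_{x->y} mu); the line
    integral over {x + t(y - x), t in [0, 1]} is approximated by the
    midpoint rule with n samples."""
    t = (np.arange(n) + 0.5) / n                      # midpoints in [0, 1]
    pts = x[None, :] + t[:, None] * (y - x)[None, :]  # sample points on the line
    length = np.linalg.norm(y - x)
    integral = mu(pts).sum() * length / n             # midpoint quadrature
    return I0 * np.exp(-integral)
```

For a homogeneous medium of constant attenuation µ0, the result reduces to I0 exp(−µ0 |y − x|), which provides a simple sanity check.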
The energy constitutes an important variable made accessible by the recent development of scintillation crystals and semiconductor detectors [23]. Currently the energy is exploited in multi-spectral CT as a supplementary variable split into several channels, delivering precious information on the attenuation coefficient at different energy levels; we refer to [4,34,15,42,28,19,18]. However, the recently achieved energy resolution of the current scintillation crystals, more precisely their FWHM, opens the way to consider the energy as a reliable dimension along with viewpoints and detector positions. In particle physics, the question of the energy intersects with Compton scattering. Indeed, when one focuses on the physics between matter and photons, four types of interactions come out: Thomson-Rayleigh scattering, photoelectric absorption, Compton scattering and pair production. In the classic range of applications of X-rays or γ-rays, [50, 1000] keV, photoelectric absorption and Compton scattering are the dominant phenomena, which leads to a model for the lineic attenuation coefficient due to Stonestrom et al. [41], which writes $\mu(x,E) = \lambda_{PE}(x)\,E^{-3} + \sigma(E)\, f(x)$, where $\lambda_{PE}$ is a factor depending on the materials and symbolizing the photoelectric absorption, σ(E) is the total cross-section of the Compton effect at energy E and f the electron density (generally noted $n_e$) at x.
The Compton effect stands for the collision of a photon with an electron. The photon transfers a part of its energy $E_0$ to the electron, which suffers a recoil, and the photon is scattered by a (scattering) angle ω with respect to the axis of propagation. The energy of the photon after scattering is expressed by the Compton formula [13], $E(\omega) = E_0 \big/ \left(1 + \tfrac{E_0}{mc^2}(1 - \cos\omega)\right)$, where $mc^2$ = 511 keV represents the energy of an electron at rest. Measuring accurately the variations of the energy can thus be interpreted in terms of scattering events characterized geometrically by the scattering angle, which is the foundation of Compton scattering tomography (CST), see [3,5,8,10,12,16,17,20,11,29,6,21,22,32,31,33,37,43].
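The Compton formula and its inverse (recovering the scattering angle from the measured energy) are straightforward to evaluate numerically; the function names below are ours:

```python
import math

MC2 = 511.0  # electron rest energy mc^2 [keV]

def scattered_energy(E0: float, omega: float) -> float:
    """Energy of a photon of initial energy E0 [keV] after Compton
    scattering through the angle omega [rad] (Compton formula)."""
    return E0 / (1.0 + (E0 / MC2) * (1.0 - math.cos(omega)))

def scattering_angle(E0: float, E: float) -> float:
    """Invert the Compton formula: the angle that maps E0 onto E,
    using 1 - cos(omega) = mc^2 (E0 - E) / (E0 * E)."""
    return math.acos(1.0 - MC2 * (E0 - E) / (E0 * E))

# Example: energy after 90-degree scattering of a 140 keV photon
E_90 = scattered_energy(140.0, math.pi / 2)
```

This one-to-one correspondence between measured energy and scattering angle (for a monochromatic source) is precisely what lets CST interpret each energy channel geometrically.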

Spectral data
Given a monochromatic γ-ray source s of energy $E_0$ and an energy-resolved detector d, the illumination of a specimen represented by its attenuation map µ leads by the Compton effect to a polychromatic response measured at d. This would also hold for a polychromatic source, as studied in [25,26], but for the sake of simplicity we consider in this work only monochromatic sources. Assuming only Compton scattering and photoelectric absorption events, we can decompose the spectrum Spec(E, d, s) measured at a detector d at energy E as follows: $\mathrm{Spec}(E, d, s) = \sum_{i \in \mathbb{N}} g_i(E, d, s)$.
The data $g_i$ stands for the measured radiation without scattering events for i = 0 and after i scattering events for i > 0. The ballistic data $g_0$ can be understood as the intensity I(d, θ) in eq. (2). Widely studied in 2D [25] and 3D [36], the first-order scattered radiation can be modeled by weighted circular or toric Radon transforms and shares similarities with $g_0$, in particular in terms of mapping properties. More generally, $g_i$, i ≥ 1, can be seen as a special case of integral transforms with a singular kernel $k_i(\cdot)$, Ω ⊂ $\mathbb{R}^d$, d = 2, 3, and (E, D, S) the domain of definition of (E, d, s). The complexity of handling $k_i$ computationally, already for i = 2 as studied in [25,36], combined with its nonlinearity with respect to f (already for the first-order scattering, as µ is a function of f), makes the use of multiple-order scattering intractable in practice, at least with the current level of technology. The exploitation of scattering in imaging is thus extremely challenging at theoretical and computational levels, and solving (5), i.e. finding f from Spec given the scattering model $L_1$, will lead to a large model inexactness.
Therefore, one needs appropriate reconstruction methods able to tackle this limitation of the model. Two approaches appear suited and are considered in this work. The first one is RESESOP (regularized sequential subspace optimization), developed in [9] for dealing with model inexactness. The principle of this method is to split the inverse problem into subproblems and to relax the solution set of each by using stripes instead of hyperplanes. The thickness of the stripes is then controlled by a model-uncertainty parameter. The second approach considers the widely used deep image prior (DIP) unsupervised learning technique, which was presented in [27] for denoising and inpainting problems. The reason to use this approach is twofold: (i) it does not require datasets, which at the moment do not exist for CST; (ii) it provides a very flexible architecture while sharing interesting properties from optimization.
As studied in [36], the shape and disposition of the detector array are important for the structure of the forward models. We denote by $S^{d-1}_{1/2} \subset \mathbb{R}^d$ the half-sphere of dimension d − 1, parametrized by angles $(\alpha_1, \ldots, \alpha_{d-1})$. We define the set of detector positions D(t) determined by s and a smooth function t. For the implementation, we consider here the case t(α) = cos α, which characterizes the sphere passing through 0, and will denote below D(cos) by D for the sake of readability.
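In 2D, the choice t(α) = cos α can be sampled directly; the sketch below assumes the parametrization d(α) = t(α)(cos α, sin α), which is our reading of the construction, and checks that the resulting positions indeed lie on the circle of diameter 1 through the origin:

```python
import numpy as np

def detector_positions(n: int) -> np.ndarray:
    """Sample detector positions d(alpha) = t(alpha) * (cos a, sin a)
    with t(alpha) = cos(alpha); in polar coordinates r = cos(alpha)
    traces the circle of diameter 1 passing through the origin."""
    alpha = np.linspace(-np.pi / 2, np.pi / 2, n)
    r = np.cos(alpha)
    return np.stack([r * np.cos(alpha), r * np.sin(alpha)], axis=1)

D = detector_positions(64)
# every sampled point is at distance 1/2 from the center (1/2, 0)
dist = np.linalg.norm(D - np.array([0.5, 0.0]), axis=1)
```

The identity |d(α) − (1/2, 0)| = 1/2 for all α confirms that this parametrization characterizes the sphere (here: circle) passing through 0.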

Outline and contributions
The paper is organized as follows. Section 2 recalls the forward models associated with the first- and second-order scattered parts of the spectrum, see [36,25]. We study the nonlinearity of the first term and discuss how standard algorithms could be exploited at the cost of heavy computations and favorable prior information. More flexible, a linear approximation of the forward operators leads to interesting mapping properties, see also [36,25], and is suited for reconstruction strategies. Due to the complexity of the second-order part, we focus the inverse problem on the first-order part. However, this approximation implies a strong model uncertainty, in particular when incorporating the second-order scattering. In order to solve the spectral inverse problem (5) with such an inaccuracy, we first propose in Section 3 to adapt the RESESOP method. In [9], the authors proved that the proposed RESESOP-Kaczmarz is a regularization method for the SESOP (sequential subspace optimization) method with exact model, see Theorem 3.8. The spectral problem is reformulated first as semi-discrete, and then as fully discrete; more precisely, we consider a finite dimensional restriction of the solution space. It follows by Corollary 3.10 that the RESESOP method adapted to the fully discrete problem regularizes the semi-discrete one. Furthermore, the constructed solution for the fully discrete problem converges to the minimum-norm solution of the semi-discrete problem for a suitable start iterate, see Theorem 3.13. Inspired by the RESESOP approach, we then derive in Section 4 an appropriate loss function for a DIP algorithm. Simulation results are presented in Section 5 for synthetic data and Monte-Carlo data for the second-order scattered radiation. A conclusion ends the manuscript.

Formulation of the mathematical problem
As explained in the Introduction, the measured spectrum is the sum of the primary radiation and of the scattered radiation of different orders. From a physical point of view, the lower the energy, the lower the probability of a scattering event. It follows that high-order scattering events, typically of order ≥ 3, represent a marginal part of the scattered radiation and, by the stochastic nature of the emission of photons, will be highly noisy. To reflect this physical point of view, we consider that $\mathrm{Spec} \approx g_0 + g_1 + g_2 + \varepsilon$ with ε a noisy perturbation. In this section, we recall the modelling of the first- and second-order scattered radiation and their properties, and detail the computation of the spectral data for a specific scanning architecture. The section ends with the presentation of a general reconstruction strategy.

The forward problem
As proven in [38,37], the first-order scattered radiation $g_1$ can be modelled by the integration of the electron density f along spindle tori (in 3D) or circular arcs (in 2D); for an illustration of the geometry of a circular arc, see Figure 1. The resulting operator involves a weight $W_1(\mu)$ quantifying the physical factors (attenuation and photometric dispersion) between s, x and d, and a level-set function φ associated with the inside (resp. outside) of the spindle torus when positive (resp. negative), expressed in terms of the euclidean norm $\|\cdot\|_2$.
As studied in [36,25], the second-order scattered radiation $g_2$ can be represented, akin to $g_1$, by a nonlinear integral transform whose weight $W_2(\mu)$ quantifies the physical factors between s, z and d with z = (x, y), $\tilde f = f \otimes f$, and ψ characterizes the locations of successive first- and second-order scattering events. We refer to [36,25] for more details.

A look at the nonlinear problem
The operators $L_1$ and $L_2$ are nonlinear w.r.t. f and $\tilde f$ respectively, but also difficult to handle numerically. The Fréchet (or Gâteaux) derivative is then essential in the construction of reconstruction schemes. Focusing on the first-order scattering, we can compute the corresponding Fréchet derivative. We neglect the photoelectric absorption in µ in eq. (3), i.e. µ(·, E) = σ(E)f, and consider, for the sake of simplicity, the operator $L_1(f) := L_1(\mu, f) : \mathcal{X} \to \mathcal{Y}$, with $\mathcal{X}, \mathcal{Y}$ two suitable Hilbert spaces equipped with their respective norms. Furthermore, the weight $W_1$ (see [36,25] for more details) can then be written in terms of the X-ray transform applied to the electron density f along the scattering path from s to x and from x to d (see eq. (1)).
Proof. Inspecting $W_1$ closer, we note that it is the composition of a smooth function with a linear operator, which implies that $W_1$ is Fréchet-differentiable. Given the Fréchet derivative of $W_1$, the linear part w.r.t. h on the right-hand side, i.e. the Fréchet derivative of $L_1$, can be identified. For Ω compactly supported and f bounded, it is clear that the linear operator $f\,(W_1)'_f + W_1(f)\,I$ is bounded and consequently the property holds for $(L_1)'_f$. We observe that the computation of the Fréchet derivative of $L_1$, for instance within Kaczmarz's method, would require the computation of $(W_1)'_f$ and $W_1(f)$ at each iterate, which constitutes, especially in 3D, an expensive task.
Besides the computation cost, the Fréchet derivative needs to satisfy the so-called tangential cone condition with some constant $c_t < 1$ in order that most of the iterative schemes applied to $L_1$ converge. Using the expression of $W_1$, one can bound the relevant terms in symbolic notation. We observe that the tangential cone condition might not hold for "large" h, as the second term explodes for large h. Therefore, dealing with the nonlinear problem might require an initial value a priori close to the solution, which is not always possible to guarantee.
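For reference, the tangential cone condition alluded to above is usually stated as follows (this is the standard form from the literature on iterative regularization of nonlinear problems, written here in the notation of the operator $L_1 : \mathcal{X} \to \mathcal{Y}$ above):

```latex
\left\| L_1(f + h) - L_1(f) - (L_1)'_f\, h \right\|_{\mathcal{Y}}
\;\le\; c_t \, \left\| L_1(f + h) - L_1(f) \right\|_{\mathcal{Y}},
\qquad c_t < 1,
```

for all f and f + h in a neighborhood of the sought solution; it guarantees local convergence of Landweber- and Kaczmarz-type iterations.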

Linear approximations and mapping properties
This is the reason why it is relevant to split the dependency on f and therefore study instead the linear approximations $L_1(\mu^*, \cdot)$ and $L_2(\mu^*, \cdot)$ with $\mu^*$ a known, a priori smooth approximation of the original µ. Such approximations have the following properties on the Sobolev scale.
In the next sections, we consider a semi-discrete and a fully discrete setting for the spectral data in order to better reflect data acquisition. To this end, we consider a set of source and detector positions $(s_k, d_k)_{k=1,\ldots,K}$ as well as a set of energies $(E_p)_{p=1,\ldots,P}$. The sampling step on the energy will in practice depend on the energy resolution (the FWHM) of the detector.
We also aim to connect both settings in terms of representation. However, similarly to the semi-discrete Radon transform, see e.g. [35, Chapter 6.3], we face the problem that sampling $L_1(\mu, f)$ on a finite set is not well-defined for arbitrary $f \in L^2(\Omega)$, as its equivalence class may not have a continuous representative. Indeed, for s fixed, the operator is not well-defined in the semi-discrete setting, as $L_1(\mu, f)$ would have no continuous representative, and therefore discretizing (E, d, s) would be improper with regard to the continuous case. However, we can exploit an embedding property of the Sobolev spaces. Before stating this property, we recall a geometric concept: a set Ω is said to have the cone property if there exists a cone $C_0$ in $\mathbb{R}^d$ such that for all x ∈ Ω there is a cone $C_x = C(x, r, U) \subseteq \Omega$ with vertex at x which is congruent to $C_0$; that is, $C_0$ and $C_x$ must be equal up to rotation and translation.
The embedding theorem below is a special case of Sobolev's Lemma, see [44, Theorem 6.2]. Therein, the set Ω not only needs to satisfy the cone property, but also the segment property of [44, Definition 2.1]. However, it is mentioned in [44] that for bounded Ω the cone property is sufficient.
Theorem 2.6 ([44, Corollary 6.1], [2]). Let $\Omega' \subset \mathbb{R}^n$ be a bounded region with the cone property. Then the Sobolev spaces $H^s_0(\Omega')$ and $H^s(\Omega')$ are continuously embedded into $C^m(\Omega')$ for all s > m + n/2.
In order to exploit this result, it is important to relax the locality constraint in Theorem 2.3. To this end, we consider a suitable smooth cut-off χ which vanishes at the boundary of (E, D) and study $\chi L_1$. Choosing k = 0, it follows that the operator has a continuous representative. The result holds similarly for $L_2$. Therefore, assuming $\mu \in C^\infty(\Omega)$ and $f \in H^\alpha_0(\Omega)$ with α > 1/2, we can now define the semi-discrete forward operators for the first- and second-order scattering by sampling at $(E_p, d_k, s_k)$, p = 1, ..., P, k = 1, ..., K.
Leaving aside the ballistic radiation $g_0$, which contributes only one value, at $E_0$, in the spectrum, the spectral problem (7) then reduces to the scattered parts. Most reconstruction techniques require the computation of the adjoint operator, here of $L^\mu_1$ and $L^\mu_2$, and consider the topology of the $L^2$-space in order to take into account perturbations in the measurement. However, akin to the semi-discrete Radon transform, see [35], the adjoint of $L^\mu_1$ is not continuous w.r.t. the $L^2$-topology, and its computation in the $H^\alpha$-topology can be a hard analytic and computational task. A way to circumvent this obstacle is to restrict the domain space to a finite dimensional subspace, allowing us to use for instance the $L^2$-topology, by equivalence of the norms. Therefore, we consider for the implementation the fully discrete case, formulated by restricting the domain space to a finite dimensional subspace $X_j$: the restricted operators are then bounded with respect to the $L^2$-norm and more standard approaches can be used to solve (13), where $\bar g_i$, i = 1, 2, denotes the sampled version of $g_i$. An interesting question is how to relate the solution of the subproblems (13) to the solution of (12). This is answered in Section 3.

Model uncertainty and reconstruction strategies
Focusing on the Compton part, the spectral problem (7) can be reformulated with the fully discrete setting in eq. (13), in which $\overline{\mathrm{Spec}} \in \mathbb{R}^{P \times K}$ is the sampled version of Spec.
Remark 2.7. Using $g_0$ in the reconstruction process is sensible. For instance, it is possible to reconstruct under sparsity constraints a first approximation of the attenuation map, which can help to refine the forward model, in particular the weight functions, see [26,25]. However, we discarded this part of the spectrum as we wanted to stress the model uncertainty using a weaker a priori on the attenuation map.
This inverse problem is particularly challenging in the following two respects: • complexity: while the computational cost of evaluating $L^\mu_1$ is similar to that of the semi-discrete Radon transform (assuming the weight function is precomputed), evaluating $L^\mu_2$ is much more expensive. Given a grid of $N^d$ elements, the complexity of $L^\mu_2$ scales like $N^{2d}$ against $N^d$ for $L^\mu_1$. In 2 dimensions and for N = 100, this means that the computation of the second-order scattering is 10000 times more expensive! This represents an important obstacle which encourages us to focus on the first-order scattered radiation and forces us to use different simulators, such as a Monte-Carlo approach, for the second order.
• model uncertainty: the linearization of the forward models by assuming a prior attenuation map $\mu^*$ leads to an inaccuracy in the model, i.e. an uncertainty bound between the exact and linearized operators with some $\eta_{1j} > 0$. The issue of the model uncertainty further increases when focusing on the first-order scattering as proposed above. In this case, the second-order (and, in practice, higher-order) scattering has to be treated as model error as well, which yields an uncertainty level $\eta_j$ that can be expected to be large.
Another reason to focus on the first-order part is the smoothness properties given in Theorem 2.3. Since $L_2$ is a smoother FIO than $L_1$, it tends to spread the features of f, and therefore the first-order part is richer for encoding f. To exploit the smoothness scale, it is possible to augment the inverse problem with a discretized differential operator (finite differences, for example) $P : \mathbb{R}^{P \times K} \to \mathbb{R}^{P \times K}$ acting on the energy variable, leading to a preconditioned problem with model uncertainty $\eta^P_j$. We observe empirically that the use of a differential operator reduces the model uncertainty, i.e. $\eta^P_j < \eta_j$, but this remains to be proved. This strategy was successfully applied in 3D using an FBP-type algorithm for the extraction of contours in [36], and in 2D using iterative total-variation (TV) regularization in [26,25]. However, errors and artifacts due to an inaccurate model can appear and need to be addressed using data-driven algorithms.
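A minimal sketch of such a discretized differential operator, assuming a forward finite difference along the energy axis of the sampled spectrum (the function name and the specific stencil are our choices):

```python
import numpy as np

def energy_derivative(spec: np.ndarray) -> np.ndarray:
    """Forward finite difference along the energy axis (axis 0) of a
    sampled spectrum of shape (P, K); a discrete stand-in for the
    differential operator P acting on the energy variable."""
    out = np.zeros_like(spec)
    out[:-1] = spec[1:] - spec[:-1]   # first-order forward difference
    return out
```

Applied consistently to both sides of the inverse problem, one then solves the preconditioned system P L f = P g instead of L f = g.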
In [9] the authors consider a CT problem affected by motion of the patient, which leads to an inexact forward operator, as the motion is not explicitly known. They proposed to apply regularized sequential subspace optimization (RESESOP) for solving inverse problems subject to model uncertainty and studied how the method is well-posed and regularizing for the exact inverse problem. In Section 3, we propose to adapt this strategy for solving the semi-discrete and fully discrete problems associated with CST. We also prove that the fully discrete RESESOP is a regularization method for the semi-discrete SESOP solution.
A second approach consists in implementing the deep image prior (DIP) unsupervised learning method in order to address our inexact inverse problem. As discussed in Section 4 and in Section 5, the standard loss function does not succeed in compensating for the model uncertainty in CST. Inspired by the RESESOP approach, we propose to adapt the loss function by incorporating the model uncertainty, which leads to results similar to those of the RESESOP method.

Study and construction of a RESESOP algorithm for CST
As discussed in Section 2, we are facing the issue of solving an inverse problem without explicitly knowing the forward operator. In order to get a valid reconstruction, we need to take the model uncertainty between the exact and inexact forward operators into account. In CST, the model uncertainty between $L^\mu_1$ and $L^{\mu^*}_1$ highly depends on the different source and detector positions, but also on the energy of the scattered photons. This is why we consider a system of inverse problems instead of a single one. To handle the model uncertainty issue for multiple inverse problems, we use the RESESOP-Kaczmarz procedure presented in [9]. In the first part of this section, we give a recap of the functionality and regularization properties of this method. Since we want to solve the fully discrete problem (13), the question arises whether the RESESOP outcome for the fully discrete problem regularizes the semi-discrete problem regarding $L^\mu_1$. This is inspected in the second part of this section, together with the question whether these reconstructions are stable with respect to the chosen subspace. Last but not least, we explain how the RESESOP framework can be applied to CST.

Consider finitely many linear bounded operators $A_k : \mathcal{X} \to \mathcal{Y}_k$, k = 0, ..., K − 1, where $\|\cdot\|_{\mathcal{X} \to \mathcal{Y}_k}$ denotes the operator norm of linear bounded operators between $\mathcal{X}$ and $\mathcal{Y}_k$. In what follows, we abbreviate all norms by $\|\cdot\|$ when there is no ambiguity. For the sake of readability, we avoid writing $\eta_k$ in the superscript of the inexact forward operators. Further, the following notation will be useful: [n] := n mod K.
The recap of the RESESOP-Kaczmarz procedure presented in [9] will be twofold. First, in case all $A_k$ are known and exact data $g_k$ in the range of $A_k$, denoted $\mathrm{Ran}(A_k)$, are available, we illustrate the concept of SESOP-Kaczmarz for solving this system of inverse problems. Second, we recall how this method can be extended when only inexact forward operators and noisy data are available.
Beforehand, we give an important definition.
Definition 3.1. Let u ∈ $\mathcal{X}$ and α ∈ ℝ. Define the corresponding hyperplane via $H(u, \alpha) := \{x \in \mathcal{X} : \langle u, x\rangle = \alpha\}$ and upper halfspace via $H_{\geq}(u, \alpha) := \{x \in \mathcal{X} : \langle u, x\rangle \geq \alpha\}$. In addition, for ξ ≥ 0 we define the corresponding stripe as $H(u, \alpha, \xi) := \{x \in \mathcal{X} : |\langle u, x\rangle - \alpha| \leq \xi\}$. Note that H(u, α) ⊂ H(u, α, ξ) for all ξ. The idea of Sequential Subspace Optimization is therefore to choose $w_n \in \mathcal{Y}_{[n]}$ and iteratively project onto the corresponding hyperplanes $H_n$.
That is, given a start iterate $f_0 \in \mathcal{X}$, we set $f_n := P_{H_n}(f_{n-1})$, where $P_{H_n}$ denotes the orthogonal projection onto the closed convex set $H_n$, see Figure 2. The projection onto a single hyperplane can be computed by the following formula. Lemma 3.2. Let u ∈ $\mathcal{X} \setminus \{0\}$ and α ∈ ℝ. Then the projection onto H(u, α) can be computed via $P_{H(u,\alpha)}(x) = x - \frac{\langle u, x\rangle - \alpha}{\|u\|^2}\, u$ for any x ∈ $\mathcal{X}$.
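The projection formula of Lemma 3.2, together with its natural extension to stripes (project only when outside, and only onto the nearer face), can be sketched in finite dimensions as follows; the function names are ours:

```python
import numpy as np

def project_hyperplane(x: np.ndarray, u: np.ndarray, alpha: float) -> np.ndarray:
    """Orthogonal projection of x onto H(u, alpha) = {z : <u, z> = alpha},
    following the formula of Lemma 3.2."""
    return x - (np.dot(u, x) - alpha) / np.dot(u, u) * u

def project_stripe(x: np.ndarray, u: np.ndarray, alpha: float, xi: float) -> np.ndarray:
    """Projection onto the stripe H(u, alpha, xi) = {z : |<u, z> - alpha| <= xi}:
    points inside the stripe are left unchanged, points outside are
    projected onto the nearer bounding hyperplane."""
    d = np.dot(u, x) - alpha
    if abs(d) <= xi:
        return x
    return x - (d - np.sign(d) * xi) / np.dot(u, u) * u
```

Note that project_stripe reduces to project_hyperplane for xi = 0, mirroring the fact that H(u, α) = H(u, α, 0).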
Instead of projecting onto a single hyperplane at each iteration, projecting onto the intersection of multiple hyperplanes may significantly increase the convergence rate, see [40], and leads to multiple search directions, see [9]. This effect is also illustrated in Figure 2. However, we did not empirically observe a significant benefit in the convergence rate. This is the reason why we consider below only one search direction.

Figure 3. Increasing the thickness of the hyperplane so that the resulting stripe contains the restricted solution set.

3.1.2. RESESOP-Kaczmarz for inexact forward operators

Let us now assume that only noisy data $g^\delta_k$ with noise levels $\delta_k \geq \|g_k - g^\delta_k\|$ and inexact forward operators $A^\eta_k$ are available, for k ∈ {0, 1, ..., K − 1}. At this point, we also set some convenient notation. We further make the assumption that for some constant ρ > 0 the restricted solution set is non-empty, that is, there is a solution whose norm is smaller than ρ.
The hyperplanes of the previous section may no longer contain the restricted solution set of the respective subproblem and hence neither $M^\rho_A(g)$. This problem is tackled by projecting onto stripes instead of hyperplanes, whose thickness is chosen in accordance with the noise level δ and the model inexactness η, see Figure 3. This approach, combined with the discrepancy principle, is summarized in the following algorithm. Algorithm 3.3 (similar to Algorithm 2.7 in [9]). Choose an initial value $f_0 := f^{\eta,\delta}_0 \in B_\rho(0) \subset \mathcal{X}$ and a constant τ > 1. If the current iterate $f^{\eta,\delta}_n$ fulfills the discrepancy principle (15) for the current subproblem, the iterate is left unchanged and the next subproblem is considered; otherwise, the iterate is projected onto the corresponding stripe.
Note that for (η, δ) = 0 the previous algorithm is just the SESOP-Kaczmarz procedure from the previous section. In this case we omit all superindices and write for example $f_n$ instead of $f^{\eta,\delta}_n$. As shown in [9], by construction the stripe $H^{\eta,\delta}_n$ contains the restricted solution set $M^\rho_A(g) := M_A(g) \cap B_\rho(0)$. Furthermore, it might happen that $u^{\eta,\delta}_n = 0$ although the discrepancy principle (15) is not fulfilled. In that case the stripe $H^{\eta,\delta}_n$ may be the empty set and hence the iteration step not well-defined. This leads to the following definition. Definition 3.4. We call a start iterate $f^{\eta,\delta}_0$ of Algorithm 3.3 feasible if $u^{\eta,\delta}_n \neq 0$ whenever the discrepancy principle (15) is not fulfilled.
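The loop of Algorithm 3.3 can be sketched in a simplified, finite-dimensional form with a single search direction per step. All names below are ours, and the stripe parameters (offset $\langle w_n, g^\delta_k\rangle$, half-width $\|w_n\|(\delta_k + \eta_k \rho)$) are our reading of the construction in [9], not a verbatim transcription:

```python
import numpy as np

def resesop_kaczmarz(ops, data, eta, delta, f0, rho, tau=1.5, sweeps=50):
    """Simplified RESESOP-Kaczmarz sketch: cycle over the subproblems
    A_k f = g_k and, whenever the discrepancy principle is violated,
    project the iterate onto a stripe containing the restricted
    solution set.  `ops` holds the (inexact) forward matrices A_k."""
    f = f0.copy()
    K = len(ops)
    for n in range(sweeps * K):
        k = n % K
        A, g = ops[k], data[k]
        w = A @ f - g                        # residual of subproblem k
        tol = tau * (delta[k] + eta[k] * rho)
        if np.linalg.norm(w) <= tol:         # discrepancy principle met
            continue
        u = A.T @ w                          # search direction u_n
        if np.linalg.norm(u) == 0:           # start iterate not feasible
            break
        alpha = w @ g                        # stripe offset <w_n, g_k>
        xi = np.linalg.norm(w) * (delta[k] + eta[k] * rho)
        d = u @ f - alpha                    # equals ||w_n||^2 > xi here
        f = f - (d - np.sign(d) * xi) / (u @ u) * u   # project onto stripe
    return f
```

For exact operators and noise-free data (η = δ = 0), the stripes collapse to hyperplanes and the loop reduces to the SESOP-Kaczmarz procedure of the previous section.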
If the forward operator is surjective, all start iterates are feasible.
Remark 3.6. Due to the form of $f^{\eta,\delta}_{n+1}$, the $u^{\eta,\delta}_n$ are also called search directions. Next we state two theorems from [9], which will be important in the next section. The first one is about the convergence of the SESOP-Kaczmarz iteration. Before stating that the RESESOP-Kaczmarz method is indeed a regularization of the inverse problems $A_k f = g^\delta_k$, k ∈ {0, ..., K − 1}, note that whenever (η, δ) ≠ 0, Algorithm 3.3 terminates after a finite number of iterations according to [9, Lemma 3.7], that is, the smallest index $n_*$ such that the discrepancy principle holds for all $\tilde n = n_*, \ldots, n_* + K - 1$ (16) is finite. It is called the finite stopping index; note that it is called auxiliary stopping index in [9].

Restriction of the domain space
As mentioned in Section 2, we aim for an approximate solution of a semi-discrete inverse problem by considering a fully discrete version of it. Therefore, in this section we assume $\mathcal{Y}$ to be finite dimensional and inspect what happens if we restrict the domain space $\mathcal{X}$ to some closed subspace $X_j$. To simplify the notation, we consider only a single forward operator A, $A^\eta : \mathcal{X} \to \mathcal{Y}$ in the following and denote the exact data by g ∈ Ran(A). However, note that together with ideas from [9] it might be possible to adapt the results of this section to the case of multiple forward operators. The restrictions of A and $A^\eta$ to $X_j$ are denoted by $A^j$ and $A^{\eta,j}$, respectively. These restrictions are also linear and bounded operators between the Hilbert spaces $X_j$ and $\mathcal{Y}$.
In the first part of this section, we apply the RESESOP-Kaczmarz method to the restricted operators and use the preceding theory to observe in Corollary 3.10 that, for a fixed subspace $X_j$, this yields, under some assumptions, a regularized solution of the semi-discrete inverse problem $Af = g^\delta$. In the second part, we prove stability with respect to the chosen subspace in Theorem 3.13.
3.2.1. RESESOP-Kaczmarz applied to a restricted forward operator

Throughout this section we make the assumption that $g \in \mathrm{Ran}(A^j)$ (17), which seems restrictive at first, but as $\mathcal{Y}$ is finite dimensional, the restricted operator $A^j$ has even a high chance of being surjective if the dimension of $X_j$ is sufficiently large. The assumption (17) implies that there exists some $\rho_j > 0$ such that $M^{\rho_j}_{A^j}(g) := M_{A^j}(g) \cap B_{\rho_j}(0) \neq \emptyset$, i.e. the (restricted) solution set is non-empty.
Also, we replace ρ by $\rho_j$ so that the stripes $H^{\eta,\delta,j}_n$ will contain the restricted solution set $M^{\rho_j}_{A^j}(g)$. Again, if (η, δ) = 0, we omit them in the superindex; for example, we then write $f^j_n$ instead of $f^{0,0,j}_n$. The theory presented in Section 3.1 is also applicable to the restricted operators $A^j$ and $A^{\eta,j}$, which also means that for (η, δ) ≠ 0 there is a finite stopping index $n_*(\eta, \delta, j)$ and for $n \geq n_*(\eta, \delta, j)$ it holds that $f^{\eta,\delta,j}_n = f^{\eta,\delta,j}_{n_*(\eta,\delta,j)}$. Moreover, from Theorem 3.7 and Theorem 3.8 we immediately obtain the following result. Corollary 3.9. For (η, δ) = 0 let $(f^j_n)_{n\in\mathbb{N}}$ be the sequence generated by Algorithm 3.3 for a feasible initial value $f^j_0 \in \mathcal{X}$. If the parameters $t^j_n$ from Lemma 3.5 are bounded, then $(f^j_n)_{n\in\mathbb{N}}$ strongly converges to the projection of $f^j_0$ onto the solution set $M_{A^j}(g)$. This means that in the case (η, δ) = 0, the application of the SESOP algorithm to the restricted operator $A^j$ indeed converges to a solution of Af = g. Further, in the case of model uncertainty or noisy data, applying RESESOP to $A^{\eta,j}$ regularizes the inverse problem regarding A in the following way.
Corollary 3.10. Let $((\eta, \delta)_l)_{l\in\mathbb{N}}$ be a null sequence with $(\eta, \delta)_l \neq 0$ for all l. Further, for some feasible start iterate $f^j_0 \in B_{\rho_j}(0)$ let $f^{\eta_l,\delta_l,j}_{n_*(\eta_l,\delta_l,j)}$ be the outcome of Algorithm 3.3. Assume that the parameters $t^{\eta_l,\delta_l,j}_n$ from Lemma 3.5 are bounded with respect to n, l ∈ ℕ. Then the outcomes converge, as l → ∞, to the projection of $f^j_0$ onto $M_{A^j}(g)$. However, at this point it is not clear whether the RESESOP reconstruction is stable with respect to the chosen subspace $X_j$. This is analyzed in the next subsection.
We now recall a descent property of the RESESOP iterates from [9], which will be helpful for the analysis in the next subsection. In particular, b) holds for all elements z of the restricted solution set $M^{\rho_j}_{A^j}(g)$. The following lemma addresses the computation of the adjoint of the restricted forward operator. Lemma 3.12. For all y ∈ $\mathcal{Y}$ it holds that $(A^j)^* y = P_{X_j} A^* y$, and similarly for $(A^{\eta,j})^*$.
Proof. Let y ∈ $\mathcal{Y}$. For all v ∈ $X_j$ it holds that $\langle (A^j)^* y, v\rangle = \langle y, A^j v\rangle = \langle y, A v\rangle = \langle A^* y, v\rangle$. Therefore, $(A^j)^* y - A^* y$ is orthogonal to $X_j$. As $(A^j)^* y \in X_j$, we conclude $(A^j)^* y = P_{X_j} A^* y$.
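Lemma 3.12 can be checked numerically in finite dimensions; the sketch below identifies $X_j$ with the span of an orthonormal basis B (random matrices, names ours):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))           # forward operator X = R^4 -> Y = R^3

# orthonormal basis B of a 2-dimensional subspace X_j of X
B, _ = np.linalg.qr(rng.standard_normal((4, 2)))
P = B @ B.T                               # orthogonal projection onto X_j

A_j = A @ B                               # restriction of A to X_j (in coordinates)

y = rng.standard_normal(3)
# Lemma 3.12: the adjoint of the restricted operator, embedded back
# into X, coincides with the projection of the full adjoint
lhs = B @ (A_j.T @ y)                     # (A^j)* y, expressed in X
rhs = P @ (A.T @ y)                       # P_{X_j} A* y
```

In coordinates w.r.t. B, the adjoint of A_j is simply its transpose; embedding the result back into $\mathcal{X}$ via B yields exactly the projected full adjoint.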

Stability with respect to the chosen subspace
In this section we consider a nested sequence of closed subspaces $X_j$ of $\mathcal{X}$, i.e. $X_j \subset X_{j+1}$ for all j ∈ ℕ.
Further, we assume that $\mathcal{N}(A)^\perp \subseteq \overline{\bigcup_{j\in\mathbb{N}} X_j}$ (20) and make the stronger assumption, compared to Section 3.2.1, that there exists some J ∈ ℕ such that the restriction $A^J$ of A to $X_J$ is surjective. Due to the nestedness (19), it follows that $A^j$ is surjective for j ≥ J. Therefore, without loss of generality we assume all $A^j$ to be surjective. Furthermore, note that for closed subspaces $V \subseteq X_j$ the expression $V^\perp$ stands for the orthogonal complement in $\mathcal{X}$; the orthogonal complement in $X_j$ is denoted by $V^{\perp_j}$.
The main goal of this section is to prove the following result. Theorem 3.13. Let $f^j_0 \in X_j \cap B_\rho(0)$, j ∈ ℕ, be start iterates for the RESESOP method applied to $A^{\eta,j}$ and assume that $f^j_0$ converges to some start iterate $f_0 \in \mathcal{N}(A)^\perp \cap B_\rho(0)$ for the SESOP method applied to A. Given some sequence $((\eta, \delta, j)_l)_{l\in\mathbb{N}}$ converging to (0, 0, ∞), we assume that the parameters $t^{(\eta,\delta,j)_l}_n$ and $t_n$ from Lemma 3.5 are bounded with respect to n, l ∈ ℕ. Then the reconstructions $f^{(\eta,\delta,j)_l}_{n_*(l)}$ converge to the minimum-norm solution of Af = g, where $n_*(l) := n_*((\eta, \delta, j)_l)$ is the finite stopping index from (18). Remark 3.14. We say that a sequence $(\eta, \delta, j)_l$ converges to (0, 0, ∞) if for all ε > 0 and N > 0 there exists some L ∈ ℕ such that for all l ≥ L: $|\eta_l| < \varepsilon$, $|\delta_l| < \varepsilon$ and $j_l > N$.
Moreover, we want to emphasize again that the assumption of $A^j$ being surjective implies that all start iterates $f^j_0 := f^{\eta,\delta,j}_0 \in X_j$ and $f_0 \in \mathcal{X}$, respectively, are feasible. Therefore, we omitted these conditions in Theorem 3.13.
In order to prove this theorem, some preparations are required. First, we inspect the projections onto solution sets.
Lemma 3.15. Let $f_0 \in N(A)^\perp$. Then $P_{M_A(g)} f_0 = A^+ g$.

Proof. Let $f \in M_A(g)$. As the solution set is an affine set, namely $M_A(g) = f + N(A)$, the corresponding orthogonal projection can be computed via $P_{M_A(g)} f_0 = f + P_{N(A)}(f_0 - f) = P_{N(A)^\perp} f$, using $f_0 \in N(A)^\perp$. This means that $P_{M_A(g)} f_0$ is the minimum-norm solution of $Af = g$.
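The projection onto the affine solution set can be sketched numerically. The snippet below is a small numpy illustration (not the paper's implementation), using a random surjective matrix as a stand-in for $A^j$: for surjective $A$, the orthogonal projection of $f_0$ onto $\{f : Af = g\}$ is $f_0 + A^+(g - Af_0)$, and projecting $f_0 = 0$ recovers the minimum-norm solution $A^+ g$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Underdetermined, surjective A: the solution set M_A(g) = {f : A f = g} is affine.
A = rng.standard_normal((4, 10))
f_true = rng.standard_normal(10)
g = A @ f_true

def project_onto_solution_set(A, g, f0):
    """Orthogonal projection of f0 onto {f : A f = g} for surjective A:
    P f0 = f0 + A^+ (g - A f0)."""
    return f0 + np.linalg.pinv(A) @ (g - A @ f0)

f0 = rng.standard_normal(10)
p = project_onto_solution_set(A, g, f0)

# Starting from f0 = 0 (which lies in N(A)^perp) the projection is the
# minimum-norm solution A^+ g, as in Lemma 3.15.
min_norm = project_onto_solution_set(A, g, np.zeros(10))
```

The minimum-norm solution has the smallest norm among all solutions, which the projection from a generic $f_0$ need not have.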
At this point, note that, as $A$ and the $A^j$ all map into the finite dimensional space $\mathcal{Y}$, their generalized inverses are bounded operators defined on the whole space $\mathcal{Y}$, see [35, Satz 2.1.8]. The next step is to show that $(A^j)^+ g$ converges to $A^+ g$. For that purpose we need some results on orthogonal projections.

Lemma 3.16. Let $(V_j)_{j\in\mathbb{N}}$ be a sequence of nested subspaces of $\mathcal{X}$ and set $V := \overline{\bigcup_{j\in\mathbb{N}} V_j}$. For all $f \in \mathcal{X}$ it holds that $\lim_{j\to\infty} P_{V_j} f = P_V f$. In particular, for all $f \in V$ it holds that $\lim_{j\to\infty} P_{V_j} f = f$.

Proof. Let $f \in V$ and $\varepsilon > 0$. By definition of $V$, there exist some $N \in \mathbb{N}$ and $f_\varepsilon \in V_N$ such that $\|f - f_\varepsilon\| < \varepsilon$. Due to the nestedness of the $V_j$ we conclude that $f_\varepsilon$ belongs to all $V_j$ with $j \ge N$. Thus, $\|P_{V_j} f - f\| \le \|f_\varepsilon - f\| < \varepsilon$ for all $j \ge N$, so $P_{V_j} f$ converges to $f = P_V f$. Due to the nestedness, the general case $f \in \mathcal{X}$ follows analogously by rewriting $f = P_V f + P_{V^\perp} f$.

Corollary 3.17. Let $(x_n)_{n\in\mathbb{N}}$ be a sequence in $\mathcal{X}$ converging to some $x \in \mathcal{X}$. It holds that $\lim_{n\to\infty} \big( P_{M_{A^n}(g)} x_n - P_{M_{A^n}(g)} x \big) = 0$, and $\lim_{n\to\infty} P_{M_{A^n}(g)} x$ exists.

Proof. As $g \in \operatorname{Ran}(A^j)$ for all $j \in \mathbb{N}$ and by the nestedness of the $X_j$, there is some $\bar{x} \in \mathcal{X}$ with $\bar{x} \in M_{A^j}(g)$ for all $j$. Therefore, we can write $M_{A^j}(g) = \bar{x} + N(A^j)$ with the nested subspaces $N(A^j) = N(A) \cap X_j$. Hence, we conclude by Lemma 3.16 that $P_{M_{A^j}(g)} x$ converges. Therefore, for $\varepsilon > 0$ there exists a $J \in \mathbb{N}$ such that $\|P_{M_{A^j}(g)} x_j - P_{M_{A^j}(g)} x\| \le \|x_j - x\| < \varepsilon$ for all $j \ge J$, by the non-expansiveness of projections onto convex sets, which ends the proof.

Lemma 3.18. For all $x \in N(A)^\perp$ it holds that $\lim_{j\to\infty} \operatorname{dist}\big(x, \operatorname{Ran}((A^j)^*)\big) = 0$.

Proof. Let $\varepsilon > 0$ and set $V := \overline{\bigcup_j X_j}$. As $A^*$ maps $\operatorname{Ran}(A)$ into a dense subset of $N(A)^\perp$, there exists an element $y \in \operatorname{Ran}(A)$ such that $\|x - A^* y\| < \varepsilon/2$. It follows from Lemma 3.16 that $\lim_{j\to\infty} P_{X_j} A^* y = A^* y$, as $A^* y \in N(A)^\perp \subseteq V$ by assumption (20). Therefore, there exists $J \in \mathbb{N}$ such that $\|P_{X_j} A^* y - A^* y\| \le \varepsilon/2$ for all $j \ge J$. Using Lemma 3.12, the desired result follows since for all $j \ge J$ we have $\operatorname{dist}\big(x, \operatorname{Ran}((A^j)^*)\big) \le \|x - (A^j)^* y\| \le \|x - A^* y\| + \|A^* y - P_{X_j} A^* y\| \le \varepsilon$.

Now we are able to prove the convergence of the sequence of generalized solutions $(A^j)^+ g$. Regarding the proof, we follow the ideas in [35, Section 6.1].

Theorem 3.19. It holds that $\lim_{j\to\infty} (A^j)^+ g = A^+ g$.
Proof. We set $f^+ := A^+ g$ and $f^+_j := (A^j)^+ g$. We recall the notation $N(A^j)^{\perp_j}$ for the orthogonal complement of $N(A^j)$ in $X_j$. We start with the estimate $\|f^+_j - f^+\| \le \|f^+_j - P_{N(A^j)^{\perp_j}} f^+\| + \|(I - P_{N(A^j)^{\perp_j}}) f^+\|$, where $I$ denotes the identity on $\mathcal{X}$. Regarding the second term, we apply [35, Theorem 2.1.9] to derive $\|(I - P_{N(A^j)^{\perp_j}}) f^+\| \le \operatorname{dist}\big(f^+, \operatorname{Ran}((A^j)^*)\big)$, which converges to 0, according to Lemma 3.18, as $f^+ \in N(A)^\perp$. It remains to show that also the first term above converges to zero. We start by mentioning that the generalized inverses of $A^j$ and $A$ are bounded operators. By the surjectivity of the $A^j$ we conclude for arbitrary $y \in \mathcal{Y}$ that $(A^j)^+ y$ solves the inverse problem $A^j f = y$. Due to the nestedness (19) of the $X_j$, it also solves $A^{j+1} f = y$. Thus, by definition of the generalized inverse we obtain $\|(A^{j+1})^+ y\| \le \|(A^j)^+ y\|$ for all $j \in \mathbb{N}$.
As a consequence, there is a $C > 0$ such that $\|(A^j)^+\| \le C$ for all $j \in \mathbb{N}$. Therefore, using $P_{N(A^j)^{\perp_j}} f^+ = (A^j)^+ A^j P_{X_j} f^+$, we are able to estimate $\|f^+_j - P_{N(A^j)^{\perp_j}} f^+\| = \|(A^j)^+ A (f^+ - P_{X_j} f^+)\| \le C \|A\| \|f^+ - P_{X_j} f^+\|$. As $f^+ \in N(A)^\perp \subseteq \overline{\bigcup_j X_j}$, the right-hand side of the inequality converges to 0, according to Lemma 3.16. Finally, we have shown that $(f^+_j)_j$ indeed converges to $f^+$.

Remark 3.20. If we had not assumed the surjectivity of the $A^j$, the estimate $\|(A^{j+1})^+ y\| \le \|(A^j)^+ y\|$ in the proof of the previous theorem might not hold in general.
Corollary 3.21. For all $x \in N(A)^\perp$ it holds that $P_{M_{A^j}(g)} x = (A^j)^+ g$, and hence $\lim_{j\to\infty} P_{M_{A^j}(g)} x = A^+ g$.

Proof. According to Lemma 3.15 we have that $P_{M_{A^j}(g)} x$ is the minimum-norm solution $(A^j)^+ g$, since $x \perp N(A^j) \subseteq N(A)$. The convergence then follows from Theorem 3.19.

We recall our assumption that $Af = g$ has a solution in $B_\rho(0)$ and that $A^j f = g$ has a solution in $B_{\rho_j}(0) \cap X_j$. Note that, due to the nestedness of the $X_j$, we may assume that $\rho_{j+1} \le \rho_j$ for all $j$. In all that follows, we consider for the application of RESESOP to $A^{\eta,j}$ and for the application of SESOP to $A$ a uniform radius $\bar\rho := \max_j\{\rho, \rho_j\} = \max\{\rho, \rho_1\}$. (22) This guarantees that the restricted solution sets $M_{A^j}(g) \cap B_{\bar\rho}(0)$ are non-empty for all $j$ and are contained in the respective stripes $H^{\eta,\delta,j}_n$ from Algorithm 3.3. In order to prove Theorem 3.13, we follow the same strategy as in [9] for proving Theorem 3.8 in the case $X_j = \mathcal{X}$. For that, we first show that the iterates $f^{\eta,\delta,j}_n$ from the application of RESESOP to $A^{\eta,j}$ converge to $f_n$, the iterates from the application of SESOP to $A$. To simplify the notation, we write $x^{\eta,\delta,j} \to x$ for $\lim_{(\eta,\delta,j)\to(0,0,\infty)} x^{\eta,\delta,j} = x$, and state a useful result.

Lemma 3.22. Assume that $f^{\eta,\delta,j}_n$ converges to $f_n$. Then the quantities $w^{\eta,\delta,j}_n$, $u^{\eta,\delta,j}_n$, $\alpha^{\eta,\delta,j}_n$ and $\xi^{\eta,\delta,j}_n$ from Algorithm 3.3 converge to $w_n$, $u_n$, $\alpha_n$ and $\xi_n$, respectively.

Proof. First, note that for $x \in X_j$ it holds that $A^j x = Ax$ and $A^{\eta,j} x = A^\eta x$. Note that $A^\eta$ is bounded, as $A^\eta$ converges to $A$. Moreover, $f^{\eta,\delta,j}_n$ converges to $f_n$ by assumption, so we conclude that $w^{\eta,\delta,j}_n$ converges to $w_n$ if $(\eta,\delta,j)$ converges to $(0,0,\infty)$. Using Lemma 3.12, we split $u^{\eta,\delta,j}_n - u_n$ into a term controlled by the convergence of the $w^{\eta,\delta,j}_n$ and the term $P_{X_j} A^* w_n - A^* w_n$. Since $A^* w_n \in N(A)^\perp \subseteq \overline{\bigcup_j X_j}$, the second term converges to 0 by Lemma 3.16. Therefore, it follows from the estimates above and the convergence of the $w^{\eta,\delta,j}_n$ that $u^{\eta,\delta,j}_n$ converges to $u_n$. It is now straightforward to see that also $\alpha^{\eta,\delta,j}_n$ and $\xi^{\eta,\delta,j}_n$ converge to $\alpha_n$ and $\xi_n$, respectively.
With this result we are able to prove the following.
Corollary 3.23. Under the assumptions of Theorem 3.13, $f^{\eta,\delta,j}_n$ converges to $f_n$ for every $n \in \mathbb{N}$.

Proof. We prove this statement by induction. The base case $n = 0$ holds by assumption. Assume now that $f^{\eta,\delta,j}_n$ converges to $f_n$; we show the claim for $n+1$ with the help of Algorithm 3.3 and Proposition 3.11. First, consider sequences of $(\eta,\delta,j)$ along which the discrepancy principle (15) is fulfilled at iteration index $n$, which means $f^{\eta,\delta,j}_{n+1} = f^{\eta,\delta,j}_n$. In this case, we conclude from the induction hypothesis and the previous Lemma 3.22 that $w_n = \lim w^{\eta,\delta,j}_n = 0$, which implies $\|Af_n - g\| = \|w_n\| = 0$ and hence $f_{n+1} = f_n = \lim f^{\eta,\delta,j}_n = \lim f^{\eta,\delta,j}_{n+1}$. Second, consider sequences along which the discrepancy principle is not fulfilled at index $n$. If $w_n \neq 0$, it follows by the feasibility of $f_0$ that $u_n \neq 0$. Therefore, we conclude by the induction hypothesis, Lemma 3.22 and Proposition 3.11 a) that $f^{\eta,\delta,j}_{n+1}$ converges to $f_{n+1}$. Let now $w_n = 0$, which means that $f_n$ is the outcome of the SESOP algorithm. Due to Theorem 3.7, we conclude $f_{n+1} = f_n = P_{M_A(g)} f_0$. Let $\varepsilon > 0$. We insert $z := P_{M_{A^j}(g)} f^j_0$ into Proposition 3.11 b) and obtain an estimate of $\|f^{\eta,\delta,j}_{n+1} - z\|$; the first term on its right-hand side converges to zero according to Corollary 3.21 and Corollary 3.17, whereas the second term converges to zero by the induction hypothesis. Since $P_{M_{A^j}(g)} f^j_0$ converges to $P_{M_A(g)} f_0 = f_{n+1}$, this means that $f^{\eta,\delta,j}_{n+1}$ converges to $f_{n+1}$. Altogether, we conclude that $f^{\eta,\delta,j}_{n+1}$ converges to $f_{n+1}$. Now we are able to prove the main result of this section.
Proof of Theorem 3.13. First, each subsequence of $(n_*(l))_l$ has a bounded or a monotonically increasing, unbounded subsequence. Therefore, it suffices to consider these two cases, namely $n_*(l)$ bounded or unbounded but monotonically increasing, and to show that in both cases $f^{(\eta,\delta,j)_l}_{n_*(l)}$ converges to the same element $P_{M_A(g)} f_0$.

(i) Assume $n_*(l)$ to be bounded and set $N := \sup_l n_*(l) < \infty$. We observe by the definition of the finite stopping index that $f^{(\eta,\delta,j)_l}_n = f^{(\eta,\delta,j)_l}_{n_*(l)}$ for all $n \ge n_*(l)$. As a consequence, for those $n \ge N$ the discrepancy principle is satisfied, so the residual $\|A^{\eta_l,j_l} f^{(\eta,\delta,j)_l}_n - g^{\delta_l}\|$ is bounded by the tolerance from (15), which tends to zero as $l \to \infty$. On the other hand, we have $\|A^{\eta_l,j_l} f^{(\eta,\delta,j)_l}_n - g^{\delta_l}\| = \|w^{(\eta,\delta,j)_l}_n\| \to \|w_n\|$ as $l \to \infty$, according to Corollary 3.23 and Lemma 3.22. This implies $w_n = 0$, so $Af_n = g$, which means that $f_n$ is the output of the SESOP algorithm. By Theorem 3.7 we conclude $f_n = P_{M_A(g)} f_0$ for all $n \ge N$, which together with Corollary 3.23 leads to $\lim_{l\to\infty} f^{(\eta,\delta,j)_l}_{n_*(l)} = P_{M_A(g)} f_0$.

(ii) Now assume that $n_*(l)$ is monotonically increasing and unbounded.
On the one hand, the SESOP iterates $(f_n)_n$ converge to $P_{M_A(g)} f_0$. On the other hand, by Corollary 3.21 and Corollary 3.17, $P_{M_{A^j}(g)} f^j_0$ converges to $P_{M_A(g)} f_0$. Therefore, let $\varepsilon > 0$ and choose an $N \in \mathbb{N}$ such that $\|f_n - P_{M_A(g)} f_0\| < \varepsilon$ and $\|P_{M_{A^j}(g)} f^j_0 - P_{M_A(g)} f_0\| < \varepsilon$ (23) for all $n, j \ge N$. According to Corollary 3.23 there exists some $l' \in \mathbb{N}$ with $l' \ge N$ such that $\|f^{(\eta,\delta,j)_l}_N - f_N\| < \varepsilon$ (24) for all $l \ge l'$. As $n_*(l)$ is unbounded and monotonically increasing, we can choose $l'' \ge l'$ such that $n_*(l) \ge N$ for all $l \ge l''$. For those $l$ we derive $\|f^{(\eta,\delta,j)_l}_{n_*(l)} - P_{M_{A^N}(g)} f^N_0\| \le \|f^{(\eta,\delta,j)_l}_N - P_{M_{A^N}(g)} f^N_0\|$, where the last step follows from Proposition 3.11 b), which is applicable as $P_{M_{A^N}(g)} f^N_0$ belongs to the restricted solution set $M^{\bar\rho}_{A^N}(g)$, which is a subset of $M^{\bar\rho}_{A^{n_*(l)}}(g)$. We further estimate $\|f^{(\eta,\delta,j)_l}_{n_*(l)} - P_{M_A(g)} f_0\| \le 4\varepsilon$ due to (23) and (24). Altogether, we have shown that for all $l \ge l''$ the iterate $f^{(\eta,\delta,j)_l}_{n_*(l)}$ lies within distance $4\varepsilon$ of $P_{M_A(g)} f_0$. Finally, we have shown convergence of $f^{(\eta,\delta,j)_l}_{n_*(l)}$ to $P_{M_A(g)} f_0$ in both cases and hence, by our introductory discussion, in all cases.
We have shown in the last theorem that the RESESOP iterates for $A^{\eta,j}$ converge to the SESOP outcome for $A$, namely $P_{M_A(g)} f_0$. By the choice of $f_0 \in N(A)^\perp \cap B_\rho(0)$, we have seen in Lemma 3.15 that $P_{M_A(g)} f_0$ must be the minimum-norm solution of $Af = g$. As we assumed that $Af = g$ has a solution in $B_\rho(0)$, we conclude that $P_{M_A(g)} f_0$ must also lie in $B_\rho(0)$, and not only in $B_{\bar\rho}(0)$ for $\bar\rho = \max\{\rho, \rho_1\}$.

Application to CST
In this section we specify how the previous framework can be applied to the semi-discrete and fully discrete operators of CST. We assume that the chosen subspace $X_j$ of $H^\alpha_0(\Omega)$ ensures surjectivity of the fully discrete operator $(L_1^{\mu})^j : X_j \to \mathbb{R}^{P \times K}$ from Section 2.3. We then set $A^j := (L_1^{\mu})^j$ and $A^{\eta,j} := (L_1^{\mu^*})^j$, as well as $A := L_1^{\mu}$ and $A^\eta := L_1^{\mu^*}$, for a chosen prior $\mu^*$ of the ground truth $\mu$. If we equip the domain spaces with the Sobolev norm, all operators above are continuous, so the theory of both Sections 3.2.1 and 3.2.2 is applicable. This means that applying RESESOP to $(L_1^{\mu^*})^j$ not only regularizes $L_1^{\mu} f = g$, see Corollary 3.10, but is also stable with respect to the chosen subspace in the sense of Theorem 3.13. However, as mentioned before, we do not have access to the adjoint operators with respect to the Sobolev norm, so for our numerical experiments we need to equip $X_j$ with the $L^2$-norm. In this case, it still holds that RESESOP applied to $(L_1^{\mu^*})^j$ converges to a solution of $L_1^{\mu} f = g$ as soon as the model uncertainty $\eta$ and the noise level $\delta$ go to zero. However, this reconstruction may not be stable with respect to the chosen $X_j$, as the semi-discrete forward operator $L_1^{\mu}$ is no longer continuous.
Since the model uncertainty highly depends on the respective source and detector positions and on the energies of the incoming photons, we split the operators up via $A^{\eta,j}_{p,k} f := ((L_1^{\mu^*})^j f)_{p,k} \in \mathbb{R}$, for $p \in \{1, \ldots, P\}$ and $k \in \{1, \ldots, K\}$, and analogously for the other operators. Accordingly, in our simulations in Section 5, we apply the RESESOP-Kaczmarz Algorithm 3.3 to the family $(A^{\eta,j}_{p,k})_{p,k}$.
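A cyclic sweep over such scalar subproblems can be sketched as follows. This is a deliberately simplified, single-search-direction variant of the RESESOP-Kaczmarz Algorithm 3.3 on a toy consistent system (no regularizing stopping rule, no $B_{\bar\rho}(0)$ restriction); each row of the matrix plays the role of one subproblem $A^{\eta,j}_{p,k} f = g_{p,k}$, and `xi` stands in for the per-subproblem tolerance built from $\eta$ and $\delta$.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy consistent system: one row per scalar subproblem.
A = rng.standard_normal((20, 30))
f_true = rng.standard_normal(30)
g = A @ f_true
xi = np.full(20, 0.1)           # per-subproblem tolerance (assumed known)

def stripe_project(f, a, beta, xi_i):
    """Metric projection onto the stripe {f : |<a, f> - beta| <= xi_i}."""
    w = a @ f - beta
    if w > xi_i:
        return f - (w - xi_i) / (a @ a) * a
    if w < -xi_i:
        return f - (w + xi_i) / (a @ a) * a
    return f                    # already inside: discrepancy principle holds

f = np.zeros(30)
for _ in range(500):            # Kaczmarz sweeps over all subproblems
    for i in range(20):
        f = stripe_project(f, A[i], g[i], xi[i])
```

Once an iterate satisfies the per-subproblem discrepancy bound, that subproblem no longer moves it, which is exactly the role of the stripes in Algorithm 3.3.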

A Deep Image Prior approach for CST
Solving the inverse problem in Compton scattering tomography using standard learning techniques would require large databases obtained from energy-resolved detectors with sufficient energy resolution and $\gamma$-ray sources. Unfortunately, such datasets do not exist, preventing the training of a neural network for the CST problem. Alternatively, it is possible to use unsupervised techniques such as the deep image prior (DIP), see [27]. Therefore, in this section we inspect how the DIP approach can be applied and adapted to the model inexactness in CST.
In this section, we use the same notation as in Section 3.3. For the sake of readability, we denote the fully discrete inexact forward operator $(L_1^{\mu^*})^j(\cdot)_{p,k}$ by $A^{\eta,j}_{p,k}$ and the exact operator $(L_1^{\mu})^j(\cdot)_{p,k}$ by $A^j_{p,k}$. In order to apply a DIP approach to these operators, we consider a suitable (neural network) mapping $\varphi_\theta : Z \to X_j$, where $\theta$ belongs to some parameter space $\Theta$. Given a single data point $g^\delta \in \mathbb{R}^{P \times K}$ and some random input $z \in Z$, the DIP approach seeks parameters $\theta_{\mathrm{opt}} \in \Theta$ that minimize some loss function $\ell$, i.e. $\theta_{\mathrm{opt}} \in \operatorname{argmin}_{\theta \in \Theta} \sum_{p,k} \ell\big(A^{\eta,j}_{p,k} \varphi_\theta(z), g^\delta_{p,k}\big)$.
The DIP reconstruction is then obtained by evaluating ϕ θopt (z).
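The fitting procedure above can be sketched in a few lines. The snippet is a toy numpy illustration, not the paper's setup: a tiny two-layer network with manual backpropagation stands in for the U-Net $\varphi_\theta$, a random matrix stands in for the forward operator, and the loss is the plain squared data misfit; all names (`A`, `W1`, `W2`, `z`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy stand-ins for the fully discrete forward operator and the data.
A = rng.standard_normal((40, 64)) / 8.0
f_true = rng.standard_normal(64)
g = A @ f_true

# Minimal "network" phi_theta(z) = W2 tanh(W1 z) with fixed random input z.
z = rng.standard_normal(16)
W1 = rng.standard_normal((32, 16)) * 0.1
W2 = rng.standard_normal((64, 32)) * 0.1

def loss(W1, W2):
    return float(np.sum((A @ (W2 @ np.tanh(W1 @ z)) - g) ** 2))

loss0 = loss(W1, W2)
lr = 1e-3
for _ in range(2000):                    # plain gradient descent on theta
    h = np.tanh(W1 @ z)
    r = A @ (W2 @ h) - g                 # data residual
    dphi = A.T @ r                       # gradient w.r.t. the network output
    dW2 = np.outer(dphi, h)
    dW1 = np.outer((W2.T @ dphi) * (1 - h ** 2), z)
    W2 -= lr * dW2
    W1 -= lr * dW1

reconstruction = W2 @ np.tanh(W1 @ z)    # plays the role of phi_{theta_opt}(z)
```

In the actual experiments the network is the U-Net described below and the optimizer is ADAM; the sketch only mirrors the structure "fixed random input, trainable weights, single data point".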
Usually, the construction and efficiency of such an approach requires: (i) a suitable network architecture, which should capture information about the "nature" of the images we are looking for, see also [14]; (ii) a stopping criterion in order to avoid noise over-fitting; and (iii) a proper loss function $\ell$, which, as we will see, should also contain information about the model uncertainty between $A^{\eta,j}_{p,k}$ and $A^j_{p,k}$. In our experiments we focus on inspecting the effect of including model uncertainty estimates in the loss function.

Network architecture
Motivated by the similarities between the model of the first-order scattering and the standard Radon transform, we consider the U-Net provided by J. Leuschner on GitHub ‡, which was also successfully used in [7] for CT reconstructions. In our simulation settings, see Section 5, we obtained the best results for seven layers with (32, 32, 64, 64, 128, 128, 256) channels. For this, we needed to slightly adapt the code, as it was designed for at most six layers. Moreover, we used six skip connections and a sigmoid function as the final step of the neural network. For the details of the down- and up-sampling parts of the U-Net we refer directly to the code mentioned above. However, it is reasonable to expect that a better-suited network architecture for CST could be constructed in the future, in particular to address the complexity of the model and the multiple-order scattering.

Loss functions
As a first and standard approach, suitable in case of an exact forward operator, see also [14], we consider the mean squared error loss function $\ell_1(\theta) := \sum_{p,k} \big| A^{\eta,j}_{p,k} \varphi_\theta(z) - g^\delta_{p,k} \big|^2$ for $\theta \in \Theta$. This means that no information on the model uncertainty is explicitly included. By this, we want to inspect whether the network itself is capable of reducing artifacts caused by considering the inexact forward operators $A^{\eta,j}_{p,k}$. As we will see in the next section, this is not the case.
Hence, we want to include information on the model uncertainty in the loss function. Motivated by the approach in [9] of including the model uncertainty in the RESESOP-Kaczmarz procedure, we propose the loss function $\ell_2(\theta) := \sum_{p,k} \big( | A^{\eta,j}_{p,k} \varphi_\theta(z) - g^\delta_{p,k} |^2 - c_{p,k}^2 \big)^2$, (26) where $c_{p,k} := \tau (\rho\, \eta_{p,k} + \delta_{p,k})$ is a model correction term inspired by the RESESOP method studied in the previous section. The connection to RESESOP-Kaczmarz is revealed by the following observation: if one of the summands in (26) is zero, then it is not difficult to see that $\varphi_\theta(z)$ belongs to the boundary of the restricted stripe $B_{\bar\rho}(0) \cap H(u^{\eta,\delta,j}_{p,k}, \alpha^{\eta,\delta,j}_{p,k}, \xi^{\eta,\delta,j}_{p,k})$, where $u^{\eta,\delta,j}_{p,k} := (A^{\eta,j}_{p,k})^* \big(A^{\eta,j}_{p,k} \varphi_\theta(z) - g^\delta_{p,k}\big)$, $\alpha^{\eta,\delta,j}_{p,k} := \big\langle A^{\eta,j}_{p,k} \varphi_\theta(z) - g^\delta_{p,k},\ g^\delta_{p,k} \big\rangle$ and $\xi^{\eta,\delta,j}_{p,k} := c_{p,k} \big| A^{\eta,j}_{p,k} \varphi_\theta(z) - g^\delta_{p,k} \big|$ are defined analogously to Algorithm 3.3. Hence, if $\theta_{\mathrm{opt}}$ is a minimizer of $\ell_2$, then $\varphi_{\theta_{\mathrm{opt}}}(z)$ is expected to be close to the boundary of all those stripes. As illustrated in Figure 3, the solution of $A^j f = g$ is expected to be close to the stripe boundaries. Further note that $\ell_2$ is differentiable with respect to $\theta$, given that $\varphi_\theta$ is differentiable, which enables backpropagation. One could also consider the correction terms $c_{p,k}$ as trainable parameters whose range is constrained to the desired interval. This way, the network could learn a better estimation of the model uncertainty and be less vulnerable to bad model uncertainty estimates. This idea might be valuable for further research. Moreover, an advantage of loss functions like $\ell_2$ is that they probably do not require a stopping criterion.
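The key property of an uncertainty-aware loss of this kind is that each summand vanishes when the residual magnitude equals the correction term, i.e. on the stripe boundary, rather than when the residual is zero. The sketch below assumes the quadratic form $\sum_{p,k}(r_{p,k}^2 - c_{p,k}^2)^2$ for the summands; the exact form used in the experiments may differ, but the boundary property illustrated here is the point of the construction.

```python
import numpy as np

def loss_mse(residual):
    """l1-type loss: plain sum of squared residuals, ignoring model uncertainty."""
    return float(np.sum(residual ** 2))

def loss_uncertainty(residual, c):
    """l2-type loss (assumed form): each summand vanishes iff |residual| = c,
    i.e. iff the reconstruction lies on the boundary of the corresponding stripe."""
    return float(np.sum((residual ** 2 - c ** 2) ** 2))

# Residuals whose magnitudes match the correction terms give l2 = 0 but l1 > 0.
c = np.array([0.2, 0.5, 0.1])
residual_on_boundary = np.array([0.2, -0.5, 0.1])
```

So minimizing the uncertainty-aware loss drives the data misfit toward the tolerated level $c_{p,k}$ instead of over-fitting the inexact operator and the noise.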
We end this section by describing the general training process. For minimizing the loss functions we used the stochastic optimizer ADAM from PyTorch ¶. We observed that in the beginning of the training process $\ell_2$ seems to be more sensitive than $\ell_1$ to the choice of the learning rate in the ADAM optimizer. More precisely, if the learning rate does not decrease quickly enough during training, it sometimes happened that the current reconstruction changed completely from one training step to another. This might be explained by inspecting the gradients of the loss functions with respect to $\theta$. For simplicity, we consider instead the scalar model functions $f_1(x) := x^2$ and $f_2(x) := (x^2 - c^2)^2$ and their gradients $f_1'(x) = 2x$ and $f_2'(x) = 4x(x^2 - c^2)$. Thus, if $x$ is not close to $c$, which is the case in the beginning of the optimization, the gradients of $f_2$ have a larger dynamic than those of $f_1$. So, if the learning rate is not small enough, the gradient descent step for $\ell_2$ is more likely to be too large.
In order to stabilize the minimization of $\ell_2$, the following strategies turned out to be efficient: first, starting with $\ell_1$ and later changing the loss function to $\ell_2$ is more robust to the choice of the learning rate; second, clipping the gradients during backpropagation turned out to be another good option to stabilize the minimization.
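The gradient-clipping strategy can be sketched as follows; the helper below mimics the behavior of PyTorch's `torch.nn.utils.clip_grad_norm_` in plain numpy (the function name and setup are illustrative, not taken from the paper's code).

```python
import numpy as np

def clip_gradients(grads, max_norm):
    """Rescale a list of gradient arrays so that their joint Euclidean norm
    is at most max_norm; returns the (possibly rescaled) gradients and the
    norm before clipping."""
    total = np.sqrt(sum(float(np.sum(gr ** 2)) for gr in grads))
    if total > max_norm:
        scale = max_norm / total
        return [gr * scale for gr in grads], total
    return grads, total

# A pathologically large gradient gets rescaled to the trust region.
big = [np.full((3, 3), 10.0), np.full(5, -10.0)]
clipped, norm_before = clip_gradients(big, max_norm=1.0)
norm_after = np.sqrt(sum(float(np.sum(gr ** 2)) for gr in clipped))
```

Clipping bounds the step size regardless of the large dynamic of the $\ell_2$ gradients, which is exactly what the instability described above calls for.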

Simulation results
In this section, we consider only the two-dimensional case of CST for convenience, as the three-dimensional case is significantly more expensive in terms of computations, as mentioned in Section 2. However, there is no obstacle to applying the analysis of both forward models and the reconstruction techniques directly in 3D.
We start by presenting the setting of our numerical experiments and then exhibit how the first- and second-order scattering data $g_1, g_2 \in \mathbb{R}^{P \times K}$ are simulated. Afterwards, we present the reconstruction results for the RESESOP and DIP approaches.
Image domain. During our experiments the region Ω ⊂ R 2 to be scanned is a square of 30 cm side-length and center at zero.
Architecture of the CST scanner. The detector space D is a sphere with radius 30 cm and center at zero. On one half of this sphere, $n_s = 10$ sources are evenly positioned. For each source we evenly sample 80% of the detector space at $n_d = 20$ locations for the detectors, which is illustrated in Figure 4; the remaining 20% are omitted because detectors close to the source would only receive a low signal. In total, we have $K = n_s \cdot n_d = 200$ source-detector pairs.

Sources and energy. The monochromatic sources are assumed to emit $\gamma$-rays at energy $E_0 = 1173$ keV, which corresponds to the maximal peak of Cobalt-60. Moreover, the total number of emitted photons per source is set to $I_0 = 8 \cdot 10^8$. Future research and simulations shall take into account the polychromacy of the source as in [26], but our proposed method can be adapted to this physical aspect. We also discard the backscattering, i.e. scattering angles $\omega \in (\pi/2, \pi)$, as the flux of this part of the spectrum is rather low, thus more heavily affected by noise, and further delivers poorer information for a fixed energy resolution, see [36]. Therefore, in accordance with the Compton formula (4), we equally sample the energy space E at $P = 80$ energies in the interval $(359.6, 1161.5)$ keV, i.e. for scattering angles $\omega \in (0, \pi/2)$.
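The relation between scattering angle and measured energy can be sketched with the standard Compton formula $E_\omega = E_0 / \big(1 + (E_0/m_ec^2)(1 - \cos\omega)\big)$, with $m_ec^2 = 511$ keV. This is a small illustrative computation, not the paper's code; the sampled interval quoted above excludes the endpoint angles, so the computed extreme energies need not match it exactly.

```python
import numpy as np

E0 = 1173.0        # keV, Cobalt-60 peak used in the experiments
MEC2 = 511.0       # keV, electron rest energy

def scattered_energy(omega, e0=E0):
    """Compton formula: energy of a photon scattered by angle omega (radians)."""
    return e0 / (1.0 + (e0 / MEC2) * (1.0 - np.cos(omega)))

# Energies for the forward-scattering range omega in [0, pi/2].
omegas = np.linspace(0.0, np.pi / 2, 80)
energies = scattered_energy(omegas)
```

The energy decreases monotonically with the scattering angle, which is why sampling the energy space is equivalent to sampling the angle range $(0, \pi/2)$.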
A modified Shepp-Logan phantom. For the ground truth $\mu$ we consider a bilinear interpolation of a modified Shepp-Logan phantom defined on a grid twice as fine as $\Omega_h$, so $\mu \notin X_j$. The original Shepp-Logan phantom has a very low contrast in its inner part, which is not suited to the level of model inexactness we consider. In order to still provide a challenge for the algorithms, we increased the contrast, but not as much as for the "Modified Shepp-Logan" defined in MATLAB. The electron densities of $\mu$ relative to water lie in the interval [1.36, 5.66], see Figure 5 a). This means its maximal electron density corresponds to bone. Note that the electron density of water is $3.23 \cdot 10^{23}$ electrons per cm$^3$. The horizontal and vertical diameters of the phantom are 19.5 cm and 26 cm, respectively. Regarding the prior $\mu^*$, see Figure 5 b), we choose the same shape as for $\mu$, but set the relative electron density of the interior constantly to 0.67. Both $\mu$ and $\mu^*$ are positioned in $\Omega$ such that their centers are at zero.
Restricting the domain space. The finite dimensional subspace $X_j$ of $H^\alpha_0(\Omega)$ is constructed in the following way: on $\Omega = (-15, 15)^2$, we consider a regular $100 \times 100$ grid with nodes $(x_n, y_m)$ and spacing $h$, and associate to each node a Gaussian of width $0.5\,h$, $e_{nm}(x,y) := c_{nm} \exp\big(-((x - x_n)^2 + (y - y_m)^2)/(0.5\,h)^2\big)$, where $c_{nm}$ is chosen such that $\|e_{nm}\|_{L^2(\Omega)} = 1$. Each Gaussian is truncated to the set $\Omega \cap B_r((x_n, y_m))$ for $r := 1.5\,h$. By setting $X_j$ as the linear span of the $e_{nm}$, we obtain a 10000-dimensional space.
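The construction of one such basis function can be sketched as follows. This is an illustrative numpy reimplementation under stated assumptions: the Gaussian profile with width $0.5\,h$ is our guess at the (garbled) formula in the text, while the grid size, truncation radius $1.5\,h$ and $L^2$-normalization follow it; the normalization is approximated on a sampling grid.

```python
import numpy as np

# Grid over Omega = (-15, 15)^2 with 100 nodes per axis, as in the text.
n = 100
h = 30.0 / n
centers = -15.0 + h * (np.arange(n) + 0.5)

def basis_function(xg, yg, xc, yc, h=h, r_factor=1.5):
    """Truncated, L^2-normalized Gaussian e_nm sampled on the grid (xg, yg)."""
    d2 = (xg - xc) ** 2 + (yg - yc) ** 2
    e = np.exp(-d2 / (0.5 * h) ** 2)          # assumed Gaussian profile
    e[d2 > (r_factor * h) ** 2] = 0.0          # truncate to B_r(center)
    dx = xg[0, 1] - xg[0, 0]
    e /= np.sqrt(np.sum(e ** 2) * dx * dx)     # grid approximation of ||e||_{L^2}=1
    return e

# Fine sampling window around the first node, large enough to hold the support.
t = np.linspace(-15.0, -14.0, 400)
xg, yg = np.meshgrid(t, t)
e = basis_function(xg, yg, centers[0], centers[0])
dx = t[1] - t[0]
norm = np.sqrt(np.sum(e ** 2) * dx * dx)
```

The 100 × 100 such functions span the 10000-dimensional space $X_j$ used in the experiments.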
Forward models. For the implementation of the first-order CST operator $L_1$, we used the trapezoidal rule for computing the involved integrals. The first-order scattering data $g_1 \in \mathbb{R}^{P \times K}$ is then computed by evaluating the semi-discrete $L_1^{\mu}$ at $\mu$ for the respective detector-source-energy triples described above. The second-order scattering data $g_2 \in \mathbb{R}^{P \times K}$ was generated using Monte-Carlo simulations [36]. Since "only" $I_0 = 8 \cdot 10^8$ photons were sent by each source, the second-order scattered radiation is subject to noise due to the stochastic nature of the emission of photons. The data $g_1$, $g_2$ and the sum $g_1 + g_2$ for one source position are depicted in Figure 5(c).
Inexact model. Regarding the inexact fully discrete operator $(L_1^{\mu^*})^j : X_j \to \mathbb{R}^{P \times K}$, we compute its matrix representation $(P_1^{\mu^*})^j$, a $16000 \times 10000$ matrix, with respect to the basis $(e_{nm})_{n,m}$ and the standard basis of $\mathbb{R}^{P \times K}$; that is, its columns are flattened versions of the vectors $L_1^{\mu^*}(e_{nm}) \in \mathbb{R}^{P \times K}$. This allows a fast evaluation of the fully discrete operator. For computing the matrix entries, we pre-computed the weight $W_1(\mu^*)$ on a grid twice as fine as $\Omega_h$ and used linear interpolation, which reduced the computation time.

Different scenarios. In our numerical experiments with RESESOP and DIP, we consider the reconstruction problems described below and use the notation introduced in the previous sections.
For our reconstructions we need accurate estimates of the model uncertainty for every subproblem. We computed them numerically by inspecting the discrepancy between data generated by the exact and inexact forward operators. Further, we use four different similarity measures in Tables 1, 2 and 3 for comparing the different reconstructions, namely: signal-to-noise ratio (SNR) †, peak signal-to-noise ratio (PSNR), structural similarity (SSIM) and normalized mean square error (NMSE).
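Three of these measures can be written down in a few lines; the definitions below follow common conventions and are assumptions on our part, since the paper does not spell them out (SNR in particular has several variants, and SSIM is omitted here because it requires windowed statistics).

```python
import numpy as np

def nmse(x, ref):
    """Normalized mean squared error ||ref - x||^2 / ||ref||^2."""
    return float(np.sum((ref - x) ** 2) / np.sum(ref ** 2))

def snr(x, ref):
    """Signal-to-noise ratio in dB (one common convention)."""
    return float(10.0 * np.log10(np.sum(ref ** 2) / np.sum((ref - x) ** 2)))

def psnr(x, ref):
    """Peak signal-to-noise ratio in dB, with the peak taken from the reference."""
    mse = float(np.mean((ref - x) ** 2))
    return float(10.0 * np.log10(ref.max() ** 2 / mse))

ref = np.array([0.0, 1.0])
rec = np.array([0.0, 0.5])
```

Higher SNR/PSNR and lower NMSE indicate a better reconstruction.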
In scenario (i), that is, dealing only with exact first-order scattering data $g_1$, we present the outcome of six reconstruction methods. Three of them, Landweber with early stopping, total variation (TV), see Remark 5.1, and the DIP approach with loss function $\ell_1$, do not take the model uncertainty into account; they are depicted in Figure 6 (a), (c) and Figure 7 (a), respectively. We observe that the overall contours of the different tissues are well recognizable, which is to be expected according to the results in [36]. However, the contrast between different tissues is badly reconstructed. In comparison, inspecting the RESESOP-Kaczmarz reconstruction in Figure 6 (e), the contrast between different tissues is much better retained. But a certain noisy pattern is noticeable in this reconstruction, which might be caused by considering multiple inverse problems instead of one. To deal with this problem one could, e.g., do some post-processing with a suitable denoiser. Empirically, a good strategy was to include a few TV-denoising steps after every 100 RESESOP-Kaczmarz sweeps. By that we obtained satisfactory results, see Figure 7 (g). Nevertheless, note that it is probably not easy to prove convergence for a combination of RESESOP and a non-projective method like TV denoising. Finally, the DIP reconstruction with loss function $\ell_2$ is depicted in Figure 7 (b) and looks quite similar to the RESESOP reconstruction, which is not very surprising, as the discrepancy term $c_{p,k}$ in $\ell_2$ was chosen as in the discrepancy principle (15) of the RESESOP Algorithm 3.3. The reconstruction errors are listed in Table 1; the best results were achieved by the combination of RESESOP and TV denoising, called RESESOP+TV in the following.
Remark 5.1. Introduced in [39], total variation has become a standard regularizer for inverse problems in imaging and image processing, and the resulting minimization problem is typically solved by primal-dual algorithms. Here, we considered the regularized TV introduced and analyzed in [1], i.e. for solving an inverse problem $Bf = g^\delta$ we find a minimizer of $\frac12 \|Bf - g^\delta\|^2 + \lambda \int_\Omega \sqrt{|\nabla f|^2 + \beta^2}\, dx$. The objective function is differentiable, so gradient methods can be used for deriving a solution. For the TV-denoising step in RESESOP+TV, we chose $B$ to be the identity operator.
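A smoothed-TV denoising step of this kind can be sketched in 1D as follows. This is a generic gradient-descent sketch under our own choice of the smoothed functional $\frac12\|f-g\|^2 + \lambda \sum_i \sqrt{(f_{i+1}-f_i)^2+\beta^2}$ with $B$ the identity; it is not the exact functional or solver from [1], and all parameter values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)

def tv_smooth(f, beta):
    d = np.diff(f)
    return float(np.sum(np.sqrt(d ** 2 + beta ** 2)))

def tv_smooth_grad(f, beta):
    d = np.diff(f)
    s = d / np.sqrt(d ** 2 + beta ** 2)
    grad = np.zeros_like(f)
    grad[:-1] -= s            # d_i = f_{i+1} - f_i contributes -s_i at index i
    grad[1:] += s             # ... and +s_i at index i+1
    return grad

def objective(f, g, lam, beta):
    return 0.5 * float(np.sum((f - g) ** 2)) + lam * tv_smooth(f, beta)

# Denoise a noisy step signal (B = identity) by plain gradient descent.
truth = np.concatenate([np.zeros(25), np.ones(25)])
g = truth + 0.1 * rng.standard_normal(50)
f = g.copy()
lam, beta, lr = 0.1, 0.05, 0.1
obj0 = objective(f, g, lam, beta)
for _ in range(300):
    f -= lr * ((f - g) + lam * tv_smooth_grad(f, beta))
```

Because of the smoothing parameter $\beta$ the objective is differentiable everywhere, which is what makes this plain gradient loop applicable.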
In scenario (ii) for corrupted data $g^\delta_1$, we observe that RESESOP, RESESOP+TV and DIP with loss function $\ell_2$ are able to handle both the model uncertainty and the noise if good approximations of the noise levels $\delta_{p,k}$ are known, see Figure 6 (f), (h) and Figure 7 (c). However, the corresponding reconstruction errors in Table 2 are a bit worse than in the noise-free case in Table 1, except for RESESOP+TV: here the reconstruction remains good, and is even slightly better, which might be due to a better hand-tuned choice of the TV-denoising parameter. In the Landweber and TV reconstructions in Figure 6 (b) and (d), some details are lost.
Incorporating the second-order scattering data $g_2$ into $g_1$ in scenario (iii) leads to a huge additional uncertainty, as the flux of $g_2$ is almost as high as the flux of $g_1$, that is, $\|g_2\|_1 \approx \|g_1\|_1$, where $\|g\|_1 := \sum_{p,k} |g_{p,k}|$. In this case, we see that RESESOP is no longer capable of handling it, see Figure 8 (d). Also TV, in Figure 8 (a), is no longer able to reconstruct the inner contours.
Therefore, in scenario (iv) we applied a finite difference operator $P$ to both sides of the problem $A^{\eta,j} f = (g_1 + g_2)$. This reduced the flux of both the first- and second-order data, but, more importantly, the latter decreased more: $\|Pg_2\|_1 \le 0.44 \cdot \|Pg_1\|_1$. Indeed, by this, RESESOP and RESESOP+TV lead to good reconstructions, see Figure 8 (e) and (f). However, the contrast between different tissues is not as well reconstructed as in the case of $g_1$ data only, see also Table 3 for the reconstruction errors. The DIP approach with loss function $\ell_2$ in Figure 7 (d) shows some artifacts in the form of scratches, but the contrast between tissues is better preserved, which results in reconstruction errors comparable to RESESOP and RESESOP+TV in Table 3.
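The effect of the difference operator can be illustrated on synthetic spectra. The shapes below are toy assumptions (a quickly varying first-order-like signal and a smooth, broad second-order-like one), chosen only to show the mechanism: differentiation along the energy index suppresses a smooth component much more strongly than a structured one.

```python
import numpy as np

# Toy spectra over P = 80 energy bins (shapes assumed for illustration).
p = np.arange(80)
g1 = np.abs(np.sin(0.5 * p))             # fast variation, mimicking g_1
g2 = np.exp(-((p - 40.0) / 30.0) ** 2)   # slow, broad bump, mimicking g_2

def finite_difference(g):
    """The operator P: first-order differences along the energy index."""
    return np.diff(g)

flux = lambda v: float(np.sum(np.abs(v)))
ratio_before = flux(g2) / flux(g1)
ratio_after = flux(finite_difference(g2)) / flux(finite_difference(g1))
```

After applying `finite_difference`, the relative weight of the smooth component drops sharply, which is why the preconditioned problem treats the second-order contribution as a much smaller model uncertainty.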
Remark 5.2. As the DIP reconstructions in Figure 7 also contain some noisy pattern, we tried to add a further denoising penalty to the loss function $\ell_2$ in the DIP approach. Unsurprisingly, the influence of the model uncertainty thereby becomes more visible again. Therefore, we propose to seek improvements rather through post-processing, for example by TV denoising.
Remark 5.3. The prior $\mu^*$ is very simple, so the model uncertainty is rather large. To decrease the model uncertainty, one could use one of the reconstructions as a new prior $\tilde\mu^*$ and consider $A^\eta = L_1^{\tilde\mu^*}$, which is probably a better approximation of $L_1^{\mu}$. Furthermore, a prior $g^*_2$ could be included in the discrepancy term, both for RESESOP and DIP, in order to reduce the model uncertainty. We did not consider these improvements, in order to stress the algorithms in terms of model uncertainty.

Table 3. Scenario (iv): Error measures for the different reconstructions and methods to solve $(PA)^{\eta_P,j} f = P(g_1 + g_2)$.

Conclusion
We have proposed two data-driven reconstruction strategies able to handle the model uncertainty occurring in imaging based on Compton scattering. The construction of these algorithms is based on a study of the properties of the forward models: non-linearity, mapping properties and model uncertainty. The first approach considers the RESESOP method, which is studied in terms of convergence and regularization for the fully discrete case in order to fit the restrictions of our spectral inverse problem. The second approach exploits the popular DIP method, which is well suited to the treated problem since, being unsupervised, it does not require a training dataset. We modified the learning loss function using the model-uncertainty construction employed in the first approach. Simulation results on synthetic data for the first-order scattering and on Monte-Carlo data for the second-order scattering validate both methods. The performed simulations assumed an almost perfect estimation of the model uncertainty for every subproblem, which is hard to achieve in practice and remains an open issue, both for the general RESESOP approach and for our modified DIP method. A first possibility would be to learn the model uncertainty coefficients from a synthetic or real dataset in the future. Another, more general approach would be to relax the uncertainty parameter in the RESESOP method, for instance by incorporating a minimization problem at each iterate to find the best parameter. These questions will be the core of future research.