Convergence rates for the joint solution of inverse problems with compressed sensing data

Compressed sensing (CS) is a powerful tool for reducing the amount of data to be collected while maintaining high spatial resolution. Such techniques work well in practice and at the same time are supported by solid theory. Standard CS results assume measurements to be made directly on the targeted signal. In many practical applications, however, CS information can only be taken from indirect data $h_\star = W x_\star$ related to the original signal by an additional forward operator. If inverting the forward operator is ill-posed, then existing CS theory is not applicable. In this paper, we address this issue and present two joint reconstruction approaches, namely relaxed $\ell^1$ co-regularization and strict $\ell^1$ co-regularization, for CS from indirect data. As main results, we derive error estimates for recovering $x_\star$ and $h_\star$. In particular, we derive a linear convergence rate in the norm for the latter. To obtain these results, solutions are required to satisfy a source condition and the CS measurement operator is required to satisfy a restricted injectivity condition. We further show that these conditions are not only sufficient but even necessary to obtain linear convergence.


Introduction
Compressed sensing (CS) allows one to significantly reduce the number of measurements while keeping high spatial resolution [4,7,9]. In mathematical terms, CS requires recovering a targeted signal $x \in X$ from data $y^\delta = Mx + z^\delta$. Here $M : X \to Y$ is the CS measurement operator, $X, Y$ are Hilbert spaces and $z^\delta \in Y$ is the unknown data perturbation with $\|z^\delta\| \le \delta$. CS theory shows that even when the measurement operator is severely underdetermined one can derive linear error estimates $\|x^\delta - x\| = O(\delta)$ for the CS reconstruction $x^\delta$. Such results can be derived uniformly for all sparse $x \in X$ assuming the restricted isometry property (RIP), which requires $\|Mx_1 - Mx_2\| \asymp \|x_1 - x_2\|$ for sufficiently sparse elements [5]. The RIP is known to be satisfied with high probability for a wide range of random matrices [1]. Under a restricted injectivity condition, related results for elements satisfying a range condition are derived in [11,12].
In [10] a strong form of the source condition has been shown to be sufficient and necessary for the uniqueness of $\ell^1$ minimization. In [12] it is shown that the RIP implies the source condition and the restricted injectivity for all sufficiently sparse elements.

Problem formulation
In many applications, CS measurements can only be made on indirect data $h = Wx$ instead of the targeted signal $x \in X$, where $W : X \to H$ is the forward operator coming from the specific application at hand. For example, in computed tomography, the forward operator is the Radon transform, and in microscopy, the forward operator is a convolution operator. The problem of recovering $x \in X$ from CS measurements of indirect data becomes: Recover $x$ from $y^\delta = AWx + z^\delta$, (1.1) where $A : H \to Y$ is the CS measurement operator. In this paper we study the stable solution of (1.1).
The naive reconstruction approach is a single-step approach that considers (1.1) as a standard CS problem with the composite measurement operator $M = AW$. However, CS recovery conditions (such as the RIP) are not expected to hold for the composite operator $AW$ due to the ill-posedness of the operator $W$. As an alternative, one may use a two-step approach in which one first solves the CS problem of recovering $Wx$ and afterwards inverts the operator equation of the inverse problem. Apart from the additional effort, both recovery problems need to be regularized, and the risk of error propagation is high.
Moreover, recovering $h \in H$ from sparsity alone suffers from increased non-uniqueness if $\operatorname{ran}(W) \subsetneq H$.

Proposed $\ell^1$ co-regularization
In order to overcome the drawbacks of the single-step and the two-step approach, we introduce two joint reconstruction methods for solving (1.1) using a weighted $\ell^1$-norm $\|\cdot\|_{1,\kappa}$ (defined in (2.1)) addressing the CS part, and variational regularization with an additional penalty $R$ addressing the inverse problem part. More precisely, we study the following two regularization approaches.
(a) Strict $\ell^1$ co-regularization: Here we construct a regularized solution pair $(x^\delta_\alpha, h^\delta_\alpha)$ as a minimizer of $\tfrac12\|Ah - y^\delta\|^2 + \alpha\,(R(x) + \|h\|_{1,\kappa})$ subject to $h = Wx$, (1.2) where $\alpha > 0$ is a regularization parameter. This is equivalent to minimizing $\tfrac12\|AWx - y^\delta\|^2 + \alpha\,(R(x) + \|Wx\|_{1,\kappa})$ over $x$ alone and setting $h^\delta_\alpha = Wx^\delta_\alpha$. (b) Relaxed $\ell^1$ co-regularization: Here we relax the constraint $h = Wx$ by adding a quadratic penalty and construct a regularized solution $(x^\delta_\alpha, h^\delta_\alpha)$ by minimizing $\tfrac12\|Ah - y^\delta\|^2 + \tfrac12\|Wx - h\|^2 + \alpha\,(R(x) + \|h\|_{1,\kappa})$. (1.3) The relaxed version in particular allows some defect between $Wx^\delta_\alpha$ and $h^\delta_\alpha$.
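As a minimal illustration of the relaxed model (not the Douglas–Rachford scheme used in the numerical section), the relaxed functional with the illustrative choices $R = \|\cdot\|^2/2$ and $\kappa_\lambda = 1$ can be minimized by proximal gradient steps: the prox of the separable penalty splits into a scaling in $x$ and soft-thresholding in $h$. All dimensions and operators below are made-up toy choices.

```python
import numpy as np

def relaxed_coreg(A, W, y, alpha, n_iter=5000):
    """Proximal-gradient minimization of the relaxed co-regularization
    functional ||A h - y||^2/2 + ||W x - h||^2/2
    + alpha * (||x||^2/2 + ||h||_1) over the pair (x, h)."""
    x = np.zeros(W.shape[1])
    h = np.zeros(W.shape[0])
    # Lipschitz bound of the smooth part: ||A||^2 + ||W||^2 + 1
    L = np.linalg.norm(A, 2) ** 2 + np.linalg.norm(W, 2) ** 2 + 1.0
    for _ in range(n_iter):
        r = W @ x - h                          # coupling residual W x - h
        gx = W.T @ r                           # gradient of smooth part in x
        gh = A.T @ (A @ h - y) - r             # gradient of smooth part in h
        x = (x - gx / L) / (1.0 + alpha / L)   # prox of alpha * ||x||^2 / 2
        z = h - gh / L
        h = np.sign(z) * np.maximum(np.abs(z) - alpha / L, 0.0)  # prox of alpha*||h||_1
    return x, h

# Toy problem: well-conditioned W, sparse h*, noise-free data.
rng = np.random.default_rng(1)
K, m, alpha = 60, 30, 1e-3
A = rng.standard_normal((m, K)) / np.sqrt(m)
W = np.eye(K) + 0.5 * np.roll(np.eye(K), 1, axis=1)
h_true = np.zeros(K)
h_true[[7, 23, 41]] = [1.5, -1.0, 2.0]
x_true = np.linalg.solve(W, h_true)
y = A @ h_true

x_rec, h_rec = relaxed_coreg(A, W, y, alpha)

def objective(x, h):
    return (np.linalg.norm(A @ h - y) ** 2 + np.linalg.norm(W @ x - h) ** 2) / 2 \
        + alpha * (np.linalg.norm(x) ** 2 / 2 + np.abs(h).sum())

val_rec, val_true = objective(x_rec, h_rec), objective(x_true, h_true)
```

Since the functional is convex, the iterates approach a global minimizer; in particular the attained objective value does not exceed the value at the true pair $(x_\star, h_\star)$ up to the iteration tolerance.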
Under standard assumptions, both the strict and the relaxed version provide convergent regularization methods [18].

Main results
As main results of this paper, under the parameter choice $\alpha \sim \delta$, we derive the linear convergence rates $D_R^\xi(x^\delta_\alpha, x_\star) = O(\delta)$ and $\|h^\delta_\alpha - Wx_\star\| = O(\delta)$ (see Theorems 2.8, 2.7), where $D_R^\xi$ is the Bregman distance with respect to $R$ and $\xi$ (see Definition 2.1), for strict as well as for relaxed $\ell^1$ co-regularization. In order to achieve these results, we assume a restricted injectivity condition for $A$ and source conditions for $x_\star$ and $Wx_\star$. The above error estimates are optimal in the sense that they cannot be improved even in the case $A = \mathrm{Id}$, which corresponds to an inverse problem only, or the case $W = \mathrm{Id}$, where (1.1) is a standard CS problem on direct data.
As a further main result, we derive converse results showing that the source condition and the restricted injectivity condition are also necessary to obtain linear convergence rates (see Theorem 3.4).
We note that our results and analysis closely follow [11,12], where the source condition and restricted injectivity are shown to be necessary and sufficient for linear convergence of $\ell^1$-regularization for CS on direct data. In that context, one considers CS as a particular instance of an inverse problem under a sparsity prior, using variational regularization with an $\ell^1$-penalty (that is, $\ell^1$-regularization). Error estimates in the norm distance for $\ell^1$-regularization based on the source condition were first derived in [14] and strengthened in [11]. In the finite-dimensional setting, the source condition (under a different name) for $\ell^1$-regularization has been used previously in [10]. For some more recent developments of $\ell^1$-regularization and source conditions see [8].
Further note that for the direct CS problem, where $W = \mathrm{Id}$ is the identity operator, and if we take the regularizer $R = \|\cdot\|^2/2$, then strict $\ell^1$ co-regularization reduces to the well-known elastic net regression model [19]. Closely following [11], error estimates for elastic net regularization have been derived in [13]. Finally, we note that another interesting line of research in the context of $\ell^1$ co-regularization would be the derivation of error estimates under the RIP. While we expect this to be possible, such an analysis is beyond the scope of this work.

Linear convergence rates
Throughout this paper, $X$, $Y$ and $H$ denote separable Hilbert spaces with inner product $\langle\cdot,\cdot\rangle$ and norm $\|\cdot\|$. Moreover, we make the following assumptions. (A.5) $(\phi_\lambda)_{\lambda\in\Lambda} \in H^\Lambda$ is an orthonormal basis (ONB) of $H$.
Recall that $R$ is wlsc (weakly lower semi-continuous) if $\liminf_{k\to\infty} R(x_k) \ge R(x)$ for all sequences $(x_k)_{k\in\mathbb{N}} \in X^{\mathbb{N}}$ weakly converging to $x \in X$. We write $\operatorname{ran}(W) := \{Wx \mid x \in X\}$ for the range of $W$. For a finite subset of indices $\Omega \subseteq \Lambda$, we write $H_\Omega := \operatorname{span}\{\phi_\lambda \mid \lambda \in \Omega\}$. Finally, $\|A\|$ denotes the standard operator norm.

Auxiliary estimates
One main ingredient for our results are error estimates for general variational regularization in terms of the Bregman distance. Recall that $\xi \in X$ is called a subgradient of a convex functional $Q : X \to [0,\infty]$ at $x_\star \in X$ if $Q(x) \ge Q(x_\star) + \langle \xi, x - x_\star\rangle$ for all $x \in X$. The set of all subgradients is called the subdifferential of $Q$ at $x_\star$ and denoted by $\partial Q(x_\star)$.
Definition 2.1 (Bregman distance). Given $Q : X \to [0, \infty]$ and $\xi \in \partial Q(x_\star)$, the Bregman distance between $x, x_\star \in X$ with respect to $Q$ and $\xi$ is defined by $D_Q^\xi(x, x_\star) := Q(x) - Q(x_\star) - \langle \xi, x - x_\star\rangle$. The Bregman distance is a valuable tool for deriving error estimates for variational regularization. Specifically, for our purpose we use the following convergence rates result.
Lemma 2.2. Let $F : X \to Y$ be linear and bounded, let $Q : X \to [0,\infty]$ be proper, convex and wlsc, and assume the source condition $\xi = F^* w \in \partial Q(x_\star)$ for some $w \in Y$ with $Fx_\star = y$. Then for all $\delta, \alpha > 0$, $y^\delta \in Y$ with $\|y^\delta - y\| \le \delta$ and every minimizer $x^\delta_\alpha$ of $\|Fx - y^\delta\|^2/2 + \alpha Q(x)$, we have $D_Q^\xi(x^\delta_\alpha, x_\star) \le (\delta + \alpha\|w\|)^2/(2\alpha)$. Proof. Lemma 2.2 has been derived in [12, Lemma 3.5]. Note that error estimates for variational regularization in the Bregman distance were first derived in [3].
For our purpose we will apply Lemma 2.2 where $Q$ is a combination formed by $R$ and $\|\cdot\|_{1,\kappa}$. We will use that the subdifferential of $\|\cdot\|_{1,\kappa}$ at $h$ consists of all $\eta = \sum_{\lambda\in\Lambda} \eta_\lambda \phi_\lambda$ with $\eta_\lambda = \kappa_\lambda \operatorname{sign}(\langle\phi_\lambda, h\rangle)$ whenever $\langle\phi_\lambda, h\rangle \ne 0$ and $|\eta_\lambda| \le \kappa_\lambda$ otherwise. Since the family $(\eta_\lambda)_{\lambda\in\Lambda}$ is square summable, $\eta_\lambda = \pm\kappa_\lambda$ can be attained for only finitely many $\lambda$, and therefore $\partial\|h\|_{1,\kappa}$ is nonempty if and only if $h$ is sparse.
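This subdifferential characterization is mirrored by the proximal map of the weighted $\ell^1$-norm, namely coefficient-wise soft-thresholding. The small check below verifies that the prox output satisfies exactly the stated subgradient conditions; the coefficients and weights are arbitrary illustrative values.

```python
import numpy as np

def prox_weighted_l1(c, kappa, t):
    """Prox of c -> t * sum_l kappa_l * |c_l|: coefficient-wise
    soft-thresholding with thresholds t * kappa_l.  In an ONB this
    acts independently on the coefficients <phi_l, h>."""
    return np.sign(c) * np.maximum(np.abs(c) - t * kappa, 0.0)

c = np.array([1.5, -0.2, 0.05, -3.0])   # coefficients <phi_l, h>
kappa = np.array([1.0, 1.0, 2.0, 0.5])  # weights kappa_l
t = 0.1
p = prox_weighted_l1(c, kappa, t)
# Optimality of the prox: eta := (c - p)/t lies in the subdifferential
# of the weighted l1-norm at p, i.e. eta_l = kappa_l * sign(p_l) on the
# support of p, and |eta_l| <= kappa_l off the support.
eta = (c - p) / t
```

Checking `eta` against the two subgradient conditions confirms the characterization used in the text.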

Relaxed $\ell^1$ co-regularization
First we derive linear rates for the relaxed model $\mathcal{B}_{\alpha,y^\delta}$. These results will be derived under the following condition.
Remark 2.6 (Product formulation). We introduce the operator $\mathbf{M} : X \times H \to Y \times H : (x,h) \mapsto (Ah, Wx - h)$ and the functional $\mathbf{Q}(x,h) := R(x) + \|h\|_{1,\kappa}$. Using these notions, one can rewrite the relaxed co-regularization functional $\mathcal{B}_{\alpha,y^\delta}$ as $\mathcal{B}_{\alpha,y^\delta}(x,h) = \tfrac12\|\mathbf{M}(x,h) - (y^\delta, 0)\|^2 + \alpha\,\mathbf{Q}(x,h)$. Because $W$ and $A$ are linear and bounded, $\mathbf{M}$ is linear and bounded, too. Moreover, since $R$ and $\|\cdot\|_{1,\kappa}$ are proper, convex and wlsc, $\mathbf{Q}$ has these properties, too. Here comes our main estimate for the relaxed model. Proof.
Using the product formulation as in Remark 2.6, the estimate follows by applying Lemma 2.2 to $\mathbf{M}$ and $\mathbf{Q}$.

Strict $\ell^1$ co-regularization
Next we analyze the strict approach (1.2). We derive linear convergence rates under the following condition.
Condition (2.2) is a source condition for the composite forward operator $AW$ and the regularization functional $R + \|W(\cdot)\|_{1,\kappa}$. Condition (2.3) assumes a splitting of the subgradient. Assumption (2.4) is the restricted injectivity condition.

Necessary Conditions
In this section we show that the source condition and restricted injectivity are not only sufficient but also necessary for linear convergence of relaxed $\ell^1$ co-regularization. In the following we restrict ourselves to the $\ell^1$-norm $\|\cdot\|_1$, that is, $\kappa_\lambda = 1$ for all $\lambda \in \Lambda$. We denote by $\mathbf{M}$ and $\mathbf{Q}$ the product operator and regularizer defined in (2.14), (2.15).

Auxiliary results
Condition 3 is clearly stronger than Condition 1. Below we will show that these conditions are actually equivalent. For that purpose we start with several lemmas. These results are in the spirit of [12], where necessary conditions for standard $\ell^1$ minimization have been derived.
If $z \in (\ker(A))^\perp$, then we choose $\xi := z$ and (3.5) is fulfilled. If, on the other hand, $z \notin (\ker(A))^\perp$, then $\dim(\ker(A_{\Omega\cup\Omega_1})) =: s \ge 1$ and there exists a basis $(w^{(1)}, \dots, w^{(s)})$ of $\ker(A_{\Omega\cup\Omega_1})$ with $\langle z, w^{(i)}\rangle = 1$ for all $i$. Because of the equality $1 = \langle z, w^{(i)}\rangle$, the admissible vectors $\tilde z$ in (3.7) are precisely those for which $\xi := z + \tilde z \in (\ker(A_{\Omega\cup\Omega_1}))^\perp$. Thus, the task of finding $\xi$ satisfying (3.5) reduces to showing that the value of (3.7) is strictly smaller than 1. Note that the dual of the convex function $\tilde z \mapsto \max_{\lambda\in\Omega_1} |\langle\phi_\lambda, \tilde z\rangle|$ is the function given in (3.8). Recalling that $\langle \tilde z, w^{(i)}\rangle = \sum_{\lambda\in\Omega_1} \langle\phi_\lambda, \tilde z\rangle \langle\phi_\lambda, w^{(i)}\rangle$, it follows that the dual problem to (3.7) is the constrained problem (3.9) on $\mathbb{R}^s$. Taking $w = \sum_{i=1}^s p_i w^{(i)} \in \ker(A)\cap H_{\Omega\cup\Omega_1}$, inequality (3.4) therefore implies that there exists $\mu \in (0,1)$ such that $|S(p)| \le \mu$ for every admissible vector $p \in \mathbb{R}^s$ for (3.9). Thus the value of $S(p)$ in (3.9) is greater than or equal to $-\mu$. Since the value of the primal problem (3.7) is the negative of the value of the dual problem (3.9), this shows that the value of (3.7) is at most $\mu$. As $\mu \in (0,1)$, this proves that the value of (3.7) is strictly smaller than 1 and, as we have shown above, this proves assertion (3.5).
Proof. See [12,Lemma 4.1]. The proof given there also applies to our situation.

Main result
The following theorem is the main result of this section and shows that the source condition and restricted injectivity are in fact necessary for linear convergence. (ii) $(x_\star, h_\star, y)$ satisfies Condition 1.
and therefore, by the definition of the Bregman distance and since the Bregman distance is nonnegative, the estimate follows. Now let (iv) hold and $\|y - y_k\| \le \delta_k$ with $\delta_k \to 0$. Choose $\alpha_k = C\delta_k$ and $(x_k, h_k) \in \operatorname{argmin} \mathcal{B}_{\alpha_k, y_k}$. The uniqueness of $(x_\star, h_\star)$ implies $\|(x_k, h_k) - (x_\star, h_\star)\| \to 0$ as $k \to \infty$; see [11, Prop. 7]. Moreover, there exists $(u, v) \in H \times Y$ such that $W^* u \in \partial R(x_\star)$ and $A^* v - u \in \partial \|h_\star\|_1$. Proposition 3.2 finally shows that Condition 3 holds, which concludes the proof.

Numerical example
We consider recovering a function from CS measurements of its primitive. The aim of this elementary example is to point out possible implementations of the two proposed models and to support the linear error estimates. A detailed comparison with other methods and an exploration of the limitations of each method is the subject of future research. The discrete operator $W : \mathbb{R}^N \to \mathbb{R}^N$ is taken as a discretization of the integration operator $f \mapsto \int_0^{(\cdot)} f(s)\,ds$. The CS measurement matrix $A \in \mathbb{R}^{m\times N}$ is taken as a random Bernoulli matrix with entries 0, 1. We apply strict and relaxed $\ell^1$ co-regularization with $R = \|\cdot\|^2/2$, $(\phi_\lambda)_{\lambda\in\Lambda}$ a Daubechies wavelet ONB with two vanishing moments, and $\kappa_\lambda = 1$. For minimizing the relaxed $\ell^1$ co-regularization functional we apply the Douglas–Rachford algorithm [6, Algorithm 4.2], and for strict $\ell^1$ co-regularization we apply the ADMM algorithm [6, Algorithm 6.4] to the constrained formulation $\operatorname{argmin}_{x,h} \{\|AWx - y^\delta\|^2/2 + \alpha\|x\|^2/2 + \alpha\|h\|_1 \mid Wx = h\}$.
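A possible ADMM iteration for the constrained (strict) formulation is sketched below. To keep the sketch dependency-free, we threshold in the canonical basis (so the primitive $h_\star$ itself is assumed sparse) instead of the Daubechies wavelet ONB of the actual experiment, $W$ is a plain cumulative-sum discretization of integration, and the Bernoulli matrix is rescaled; all dimensions and parameters are illustrative, not those of the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 64, 32
W = np.tril(np.ones((N, N)))                       # cumulative-sum "integration"
A = rng.integers(0, 2, size=(m, N)) / np.sqrt(m)   # rescaled Bernoulli 0/1 matrix

# Sparse primitive h* (canonical basis) and signal x* = W^{-1} h*.
h_star = np.zeros(N)
h_star[[10, 30, 50]] = [1.0, -1.5, 2.0]
x_star = np.linalg.solve(W, h_star)
y = A @ h_star                                     # noise-free data

# ADMM for argmin_{x,h} ||AWx - y||^2/2 + a||x||^2/2 + a||h||_1  s.t.  Wx = h
a, rho = 1e-3, 10.0
M = A @ W
G = M.T @ M + a * np.eye(N) + rho * W.T @ W        # x-update system matrix
x, h, u = np.zeros(N), np.zeros(N), np.zeros(N)
for _ in range(1000):
    x = np.linalg.solve(G, M.T @ y + rho * W.T @ (h - u))  # quadratic x-step
    z = W @ x + u
    h = np.sign(z) * np.maximum(np.abs(z) - a / rho, 0.0)  # soft-thresholding
    u += W @ x - h                                         # scaled dual update

def objective(x, h):
    return np.linalg.norm(M @ x - y) ** 2 / 2 \
        + a * np.linalg.norm(x) ** 2 / 2 + a * np.abs(h).sum()

val_rec, val_star = objective(x, W @ x), objective(x_star, h_star)
```

At an ADMM fixed point the constraint $Wx = h$ holds exactly, and by convexity the attained objective value does not exceed the value at the true pair $(x_\star, h_\star)$ up to the iteration tolerance.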
Results are shown in Figure 1. The top row depicts the targeted signal $x_\star \in \mathbb{R}^N$ (left), for which $Wx_\star$ is sparsely represented by $(\phi_\lambda)_{\lambda\in\Lambda}$ (right). The middle row shows reconstructions using the strict and the relaxed $\ell^1$ co-regularization from noisy data with $\|y^\delta - y\| \le 10^{-5}$. The bottom row plots $D_R^\xi(x^\delta_\alpha, x_\star)$ and $\|h^\delta_\alpha - Wx_\star\|$ as functions of the noise level. Note that for $R = \|\cdot\|^2/2$ the Bregman distance is given by $D_R^\xi(x, x_\star) = \|x - x_\star\|^2/2$ with $\xi = x_\star$.

Conclusion
While the theory of CS on direct data is well developed, this is by far not the case when compressed measurements are made on indirect data. For that purpose, in this paper we studied CS from indirect data, written as the composite problem $y^\delta = AWx + z^\delta$, where $A$ models the CS measurement operator and $W$ the forward model generating the indirect data, depending on the application at hand. For signal reconstruction we have proposed two novel reconstruction methods, named relaxed and strict $\ell^1$ co-regularization, for jointly estimating $x$ and $h = Wx$. The main conceptual difference between the proposed methods and standard CS is that we use the $\ell^1$ penalty for the indirect data $Wx$ instead of $x$, together with another penalty for $x$ accounting for the inversion of $W$, and jointly recover both unknowns.
As main results, for both reconstruction models we derived linear error estimates under source conditions and restricted injectivity (see Theorems 2.8, 2.7). Moreover, these conditions have been shown to be even necessary to obtain such results (see Theorem 3.4). Our results have been illustrated on a simple numerical example combining CS and numerical differentiation. In future work, further detailed numerical investigations are in order, comparing our models with standard CS approaches in practically important applications and demonstrating the strengths and limitations of the different methods. Potential applications include magnetic resonance imaging [2,15] and photoacoustic tomography [16,17].