Efficient regularization with wavelet sparsity constraints in photoacoustic tomography

In this paper, we consider the reconstruction problem of photoacoustic tomography (PAT) with a flat observation surface. We develop a direct reconstruction method that employs regularization with wavelet sparsity constraints. To that end, we derive a wavelet-vaguelette decomposition (WVD) for the PAT forward operator and a corresponding explicit reconstruction formula in the case of exact data. In the case of noisy data, we combine the WVD reconstruction formula with soft-thresholding, which yields a spatially adaptive estimation method. We demonstrate that our method is statistically optimal for white random noise if the unknown function is assumed to lie in any Besov-ball. We present generalizations of this approach and, in particular, we discuss the combination of PAT-vaguelette soft-thresholding with a total variation (TV) prior. We also provide an efficient implementation of the PAT-vaguelette transform that leads to fast image reconstruction algorithms supported by numerical results.


Introduction
Photoacoustic tomography (PAT) is a novel coupled-physics (hybrid) modality for non-invasive biomedical imaging that combines the high contrast of optical tomography with the high spatial resolution of acoustic imaging [7,68,102,105,107]. Its principle consists in illuminating a sample by an electromagnetic pulse that, due to the photoacoustic effect, generates pressure waves inside of the object; see figure 1. The generated pressure waves (the acoustic signals) then propagate through the sample and beyond, and the pressure is recorded outside of the sample. Finally, mathematical algorithms are used to reconstruct an image of the interior (see, for example [69,86,107]).
In this paper, we work with the standard model of PAT, where the acoustic pressure u : R^d × (0, ∞) → R solves the standard wave equation (see (3)). The goal of PAT is to recover the initial pressure distribution (at time t = 0), given by a function h : R^d → R, from measurements Uh := u|_{∂H_+ × (0,∞)} of the pressure recorded on the hyperplane ∂H_+. Here, H_+ denotes the half space R^{d−1} × (0, ∞). That is, we assume that we know u on ∂H_+ = R^{d−1} × {0} (the acquisition surface), and our goal is to reconstruct h from (possibly approximate) knowledge of Uh. This is an inverse problem which amounts to an (approximate) inversion of the operator U. We will refer to this problem as the inverse problem of PAT with a planar acquisition geometry. Note that similar approaches can be considered for other acquisition geometries as well.
Among iterative reconstruction techniques, sparse regularization approaches have gained a lot of attention in recent years, as they have proven to perform well for noisy data as well as for incomplete data problems. One of their main advantages is their ability to combine efficient regularization with good feature preservation and (to some extent) to compensate for missing data [10,46,56,90]. However, these advantages of sparse regularization methods come at the cost of typically significantly longer reconstruction times than FBP-type approaches, because the forward and adjoint operators have to be evaluated repeatedly. For that reason, FBP-type methods (or other direct approaches) are often preferred over the more elaborate sparse regularization techniques [3,5,16,88,89].
In this paper, we develop a numerically efficient reconstruction method for PAT with planar geometry that effectively deals with noisy data g = Uh + z, where regularization is achieved by enforcing sparsity constraints on the wavelet coefficients of h in the reconstruction. More precisely, we derive a direct method for calculating a minimizer of the ℓ1-Tikhonov functional. In the case of exact data, this gives rise to an inversion formula of the form f = Σ_λ κ_j^{−1} ⟨Af, v_λ⟩ ψ_λ. Given noisy data g = Af + z, we show that a combination of this formula with soft-thresholding s_w (see equation (11)), namely f̂ = Σ_λ s_{w_j}(⟨g, v_λ⟩) κ_j^{−1} ψ_λ, provides a minimizer of the functional (1). Additionally, we derive an efficient algorithm for the PAT-vaguelette transform g ↦ (⟨g, v_λ⟩)_λ and provide an implementation of the WVD reconstruction (2). Moreover, we show order optimality of our method in the case of deterministic noise as well as in the case of Gaussian random noise. We also consider generalizations of our method and, in particular, we show how the WVD reconstruction can be combined with an additional TV prior. Sparse regularization has been widely used as a reconstruction method for general inverse problems and there is a vast literature on the topic (see, for example, [14,31,42,50,51,54,72,91,94,101]). In the statistical setting, ℓ1-Tikhonov regularization is known as the LASSO [11,83,98,110] or basis pursuit [22]. In most cases, the reconstructions are computed by employing iterative algorithms (such as iterative soft-thresholding) to minimize the ℓ1-Tikhonov functional [8,12,27,28,31,39]. As mentioned above, those methods have the disadvantage of being slower than FBP-type methods, as the forward and adjoint problems have to be solved repeatedly.
WVD and generalizations like biorthogonal curvelet decompositions and shearlet decompositions have been derived for the classical Radon transform in 2D, see [1,17,26,35,66]. Wavelet methods for inverse problems and tomographic applications have also been studied, for example, in [24,33,65,73,92]. To the best of our knowledge, this paper is the first to provide a WVD for PAT as well as an efficient direct implementation of sparse regularization using wavelets for that case.

Organization of the paper
In section 2 we review the mathematical principles of PAT with a flat observation surface and collect results required for our further analysis. In section 3 we derive the WVD for the forward operator. The WVD is then used to define the soft-thresholding estimator in section 4. In that section we also discuss the equivalence to variational estimation such as ℓ1-Tikhonov regularization and introduce hybrid estimators combining the WVD with general regularization terms such as the TV prior. The efficient implementation of the WVD based estimators requires an efficient algorithm for evaluating the PAT-vaguelette transform. Such an algorithm is derived in section 5, where we also present results of our numerical simulations.

PAT with a flat observation surface
We consider the wave equation (3) with initial data h ∈ C_0^∞(H_+). We assume that the pressure data are observed on the hyperplane ∂H_+ = R^{d−1} × {0} and define the corresponding PAT forward operator by Uh := u|_{∂H_+ × (0,∞)}, where u denotes the unique solution of (3). Our aim is to recover h from exact or approximate knowledge of Uh. We are particularly interested in the cases d = 2 and d = 3, as they are of practical relevance in PAT (see [15,69]). Nevertheless, in what follows, we consider the case of general dimension, since this does not introduce additional difficulties.

Isometry property
The following isometry property for the wave equation is central in the analysis we derive below. For odd dimensions, it has been obtained in [13], and for even dimensions in [76]; see also [9].

Lemma 1 (Isometry property for the operator U). For any
Proof. See [13] for d odd and [76] for d even. □

From lemma 1 it follows that U extends to an isometry on the space cl(C_0^∞(H_+)) with respect to the scalar product ⟨h_1, h_2⟩ := ∫_{R^{d−1}} ∫_0^∞ h_1(x, y) h_2(x, y) y^{−1} dy dx. In view of the isometry property and the desired WVD, instead of the operator U (which represents the actual measured data), it is more convenient to work with the modified operator A defined in (4). It is not hard to see that A is an isometry with respect to the standard L² inner product. We have the following result.

Lemma 2 (Isometry property for the operator A). Let A be defined as in (4). Then, the following assertions hold:
(a) For every f ∈ C_0^∞(H_+) we have ‖Af‖_{L²(H_+)} = ‖f‖_{L²(H_+)}.
(b) The operator A uniquely extends to an isometry A : L²(H_+) → L²(H_+).

Proof.
(a) This follows from lemma 1 applied to f ∈ C_0^∞(H_+).
(b) The set C_0^∞(H_+) is dense in L²(H_+) and, according to (a), the operator A is an isometry on C_0^∞(H_+). Consequently, it follows from general Hilbert space theory that the operator A can be extended in a unique manner to an isometry A : L²(H_+) → L²(H_+). Finally, from the construction of A it is clear that A(L²(H_+)) ⊆ L²(H_+). □

Isometric extension to
For the following considerations, it will be convenient to apply the operator A to functions that are defined on R^d rather than on the half space H_+. For that purpose, we need to extend the operator A : L²(H_+) → L²(H_+) in a meaningful way to an operator A : L²(R^d) → L²(R^d). One possibility for doing this is to consider the wave equation (3) with initial data f ∈ C_0^∞(R^d) and then to proceed as above. However, any function that is odd in the last variable would be in the kernel of the resulting operator. Therefore, we use a different extension that leads to an isometric operator on L²(R^d).
To that end, we define the reflection operator S : L²(R^d) → L²(R^d), (Sf)(x, y) := f(x, −y). Then S is an isometric isomorphism with S^{−1} = S and S(L²(H_+)) = L²(H_−), where H_− := R^{d−1} × (−∞, 0). We are now able to define the announced extension of A.
Proof. Any function f ∈ L²(R^d) can be written in the form f = P_{L²(H_+)} f + P_{L²(H_−)} f. From definition 3 and lemma 2 we then conclude that the extended operator A is an isometry on L²(R^d). □
In what follows, we will also consider the adjoint operator A*.

Explicit expressions for A and its dual
In this section, we state explicit expressions for the operator A and its dual. For that purpose, we consider the spherical Radon transform M, which maps a function to its mean values over spheres centered on the hyperplane ∂H_+. Here S^{n−1} := {x ∈ R^n : |x| = 1} denotes the unit sphere in R^n and dS^{n−1} its surface measure. A simple calculation (an application of Fubini's theorem) yields the dual M* of the operator M. The operator M* is called the spherical backprojection operator, because (M* g)(x, z) integrates the function g over all spheres (with centers on the acquisition plane) that pass through the point (x, z). We will also consider certain (fractional) differentiation operators and their formal L² adjoints. We are now able to provide explicit expressions for the operator A and its dual A*.

Lemma 5. We have
Proof. The identity (5) follows from the well-known explicit expression for the solution of the wave equation (see [29, p 682] and [37, p 80]). The identity (6) follows from (5) by applying the calculation rules for adjoints. □

Wavelet vaguelette decomposition (WVD)
In this section, for a given wavelet basis (ψ_λ)_{λ∈Λ} of L²(R^d), we construct the WVD of the operators U and A defined in the previous section and prove inversion formulae for the case of exact data. To that end, we will in particular make use of the isometry relation proved in section 2.
The basic idea of the WVD is to start with an orthogonal wavelet basis and to construct a possibly non-orthogonal basis of the image space in such a way that the operator and the prior information are simultaneously (nearly) diagonalized [35]. For the reader's convenience, we summarize some basic facts about wavelets in the appendix.

The idea of the WVD
Let K : L²(R^d) ⊇ dom(K) → L²(R^d) be a linear, not necessarily bounded, operator and let (ψ_λ)_{λ∈Λ} be an orthonormal wavelet basis of L²(R^d). The construction of a WVD for the operator K amounts to finding families (u_λ)_λ, (v_λ)_λ in L²(R^d) and a family (κ_j)_j of positive numbers satisfying the following properties:
(WVD1) Kψ_λ = κ_j u_λ for all λ ∈ Λ;
(WVD2) K* v_λ = κ_j ψ_λ for all λ ∈ Λ;
(WVD3) (u_λ)_λ and (v_λ)_λ are biorthogonal, ⟨u_λ, v_μ⟩ = δ_{λ,μ}, and satisfy the norm equivalences ‖Σ_λ c_λ u_λ‖ ≍ ‖(c_λ)_λ‖_{ℓ²} ≍ ‖Σ_λ c_λ v_λ‖.
Here, ‖f‖ ≍ ‖g‖ means that there are constants A, B > 0 such that A‖g‖ ≤ ‖f‖ ≤ B‖g‖.
The collection ((ψ_λ)_λ, (u_λ)_λ, (v_λ)_λ, (κ_j)_j) is called a WVD for the operator K. Given such a WVD for an operator K, one can always obtain an explicit inversion formula for K of the form
f = Σ_{λ∈Λ} κ_j^{−1} ⟨Kf, v_λ⟩ ψ_λ. (7)
Note the analogy between (7) and inverting an operator via its singular value decomposition (SVD). The numbers κ_j depend only on the scale parameter j and have the same meaning as singular values in the SVD. Thus, the κ_j are referred to as quasi-singular values. As for the SVD, the decay of the quasi-singular values κ_j reflects the ill-posedness of the inverse problem g = Kf.
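To make the structure of (7) concrete, here is a minimal numerical sketch (not from the paper): a toy operator on R^4 that is diagonal in an orthonormal Haar basis, with quasi-singular values κ_j depending only on the scale j, and the WVD inversion formula recovering the input. Here u_λ = v_λ = ψ_λ, the simplest possible choice.

```python
import math

# Orthonormal Haar basis of R^4: scaling vector and wavelets on scales j = 0, 1.
PSI = [
    ([0.5, 0.5, 0.5, 0.5], 0),                      # scaling function, scale 0
    ([0.5, 0.5, -0.5, -0.5], 0),                    # coarse wavelet, scale 0
    ([1 / math.sqrt(2), -1 / math.sqrt(2), 0, 0], 1),   # fine wavelet, scale 1
    ([0, 0, 1 / math.sqrt(2), -1 / math.sqrt(2)], 1),   # fine wavelet, scale 1
]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

# Quasi-singular values: one per scale j, decaying with j
# (mimicking a mildly ill-posed problem).
kappa = {0: 1.0, 1: 0.5}

def apply_K(f):
    """Toy operator, diagonal in the Haar basis: K psi_lambda = kappa_j psi_lambda,
    so that u_lambda = v_lambda = psi_lambda in the WVD."""
    g = [0.0] * 4
    for psi, j in PSI:
        c = kappa[j] * dot(f, psi)
        g = [gi + c * pi for gi, pi in zip(g, psi)]
    return g

def wvd_inverse(g):
    """Inversion formula (7): f = sum_lambda kappa_j^{-1} <g, v_lambda> psi_lambda."""
    f = [0.0] * 4
    for psi, j in PSI:
        c = dot(g, psi) / kappa[j]
        f = [fi + c * pi for fi, pi in zip(f, psi)]
    return f

f = [1.0, 2.0, 0.0, -1.0]
f_rec = wvd_inverse(apply_K(f))   # recovers f exactly
```

Dividing by small κ_j amplifies whatever noise sits in the fine-scale coefficients, which is exactly the ill-posedness that the thresholding of the later sections is designed to control.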
A WVD has been constructed for classical computed tomography, modeled by the two-dimensional Radon transform, where κ_j = 2^{−j/2}; see [35]. For the two-dimensional Radon transform, a generalization of the WVD, a so-called biorthogonal curvelet decomposition, was constructed in [17]. In [26], the authors derived biorthogonal shearlet decompositions for the two- and three-dimensional Radon transforms.
In the case of PAT, there are no such decompositions available so far. In the next subsection, we establish a WVD for the operators U and A, which serve as forward operators for PAT with a planar observation surface, and so automatically obtain an inversion formula for exact data.

Construction of the WVD for PAT
The desired WVD of U and A is in fact a direct consequence of the isometry relation of theorem 4.

Theorem 6 (The WVD for PAT). For every
Proof. We start with an arbitrary function f ∈ L²(R^d) and express it in terms of a wavelet expansion f = Σ_{λ∈Λ} ⟨f, ψ_λ⟩_{L²} ψ_λ with respect to (ψ_λ)_{λ∈Λ}. Then, according to the isometry property, (u_λ)_{λ∈Λ} is an orthonormal basis of ran(A), and further A* u_λ = ψ_λ. In particular, this amounts to a WVD with v_λ = u_λ and κ_j = 1, and (8) is a consequence of the WVD. Next, let f = |y|^{−1/2} h. Applying (8) then yields (9). □
Note that the identity (9) in theorem 6 is an explicit inversion formula for the operator U in the spirit of a WVD. Instead of an orthonormal wavelet basis, it uses the family (|y|^{1/2} ψ_λ)_{λ∈Λ}, which is non-orthogonal with respect to the L² inner product. Restricted to functions in L²(R^d) that vanish outside K := K_+ ∪ S(K_+), where K_+ ⊆ H_+ is any compact subset, the operator h ↦ |y|^{1/2} h is an isomorphism. Then, (|y|^{1/2} ψ_λ)_{λ∈Λ} allows one to characterize the Besov norm ‖·‖_{B^r_{p,q}} of any function that is supported in K. Figure 2 shows a vertical, a horizontal and a diagonal Daubechies 6 wavelet (scaled and translated versions of ψ^{(1)}(x, y), ψ^{(2)}(x, y) and ψ^{(3)}(x, y); see appendix A.2) and the corresponding PAT-vaguelettes obtained by application of the operator A.
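The key step in the proof, that an isometry maps an orthonormal basis to an orthonormal system, can be illustrated with a minimal sketch (a rotation of R³ standing in for the operator A, and the standard basis standing in for the wavelets):

```python
import math

theta = 0.7  # arbitrary rotation angle

def A(v):
    """A toy isometry of R^3: rotation about the z-axis."""
    x, y, z = v
    return (math.cos(theta) * x - math.sin(theta) * y,
            math.sin(theta) * x + math.cos(theta) * y,
            z)

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

# Orthonormal "wavelet" basis (here simply the standard basis of R^3).
basis = [(1, 0, 0), (0, 1, 0), (0, 0, 1)]

# The "vaguelettes" u_lambda = A psi_lambda.  Since A preserves inner
# products, their Gram matrix is again the identity.
images = [A(v) for v in basis]
gram = [[dot(u, w) for w in images] for u in images]
```

Because the Gram matrix stays the identity, expansion coefficients with respect to the images can be read off by plain inner products, which is why κ_j ≡ 1 makes the PAT inversion formula so simple.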

Inversion from noisy data
In what follows, we assume (ψ_λ)_{λ∈Λ} to be an orthonormal compactly supported wavelet basis of L²(R^d). Further, by (u_λ)_{λ∈Λ} we denote the corresponding PAT-vaguelette basis of ran(A). If exact data are available, then the WVD (8) provides an exact reconstruction formula for the unknown f. However, in practical applications the data Af (or Uh) are only known up to some error (e.g. noise). We therefore assume that we are given erroneous (noisy) data g = Af + z, (10), where z is the error term and f the exact unknown. We consider both the deterministic and the stochastic case, in which different models for z are assumed: In the deterministic case, we assume that a bound ‖z‖_{L²} ≤ δ is available.
In the stochastic case, we assume that z = δZ, where Z is a white noise process.
In the deterministic situation the constant δ > 0 is referred to as the noise level; in the stochastic case it is the noise standard deviation. The goal in both situations is to estimate the unknown f ∈ L 2 (R d ) from data in (10).

PAT-vaguelette-thresholding estimator
In section 3 we have shown that the reproducing formula f = A* Af = Σ_{λ∈Λ} ⟨Af, u_λ⟩_{L²} ψ_λ holds in the case of exact data. If the data are corrupted by noise, i.e. g = Af + z, the inner products ⟨Af, u_λ⟩_{L²} cannot be computed exactly. Instead, they are estimated by first evaluating ⟨g, u_λ⟩_{L²} and then applying the soft-thresholding operation with appropriate (level-dependent) thresholds w_j.
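For concreteness, the soft-thresholding operation can be sketched as follows (this is the standard definition of the shrinkage nonlinearity; equation (11) itself is not reproduced in this excerpt):

```python
def soft_threshold(x, w):
    """Soft-thresholding: shrink x towards zero by w, setting |x| <= w to zero."""
    if x > w:
        return x - w
    if x < -w:
        return x + w
    return 0.0

# Applied coefficient-wise to estimated vaguelette coefficients:
coeffs = [2.0, -0.4, 0.9, -3.1]
denoised = [soft_threshold(c, 1.0) for c in coeffs]   # [1.0, 0.0, 0.0, -2.1]
```

Coefficients below the threshold, which are most likely pure noise, are removed entirely, while large coefficients are kept (shrunk by w), which is what makes the estimator spatially adaptive.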

Definition 7 (WVD soft thresholding estimator).
For any nonnegative sequence w = (w_j)_j, the WVD soft-thresholding estimator for the solution of (10) is defined by
S_w(g) := Σ_{λ∈Λ} s_{w_j}(⟨g, u_λ⟩_{L²}) ψ_λ, (12)
with the nonlinear soft-thresholding function s_{w_j} defined in (11).
In the deterministic case, we assume that the error term z is known to satisfy ‖z‖ ≤ δ. In this case, due to the isometry property of the operator A, one obtains a stable reconstruction by applying the adjoint A*. As the adjoint satisfies A* g = S_0(g), i.e. A* corresponds to the WVD-thresholding estimator with w = 0, it follows that the class of WVD-thresholding estimators also yields the optimal error estimate. Without additional knowledge about the noise, there is no need to apply the WVD soft-thresholding estimator with w ≠ 0, even if the function f is known to belong to some smoothness class, such as a Sobolev or Besov ball. However, if a portion of the noise energy is known to correspond to high-frequency components, then using non-zero thresholds may significantly outperform A*, given that f belongs to some smoothness class. This happens, for example, in the case of stochastic white noise, where the energy spectrum is uniformly distributed.

Optimality of PAT-vaguelette-thresholding
Suppose z = δZ, where Z is a white noise process. In the case of random noise, it is common to measure the performance of an estimator R : L²(R^d) → L²(R^d) by its worst-case risk ∆(R, δ, M) over a class M of unknowns. Further, define the nonlinear minimax risk ∆_N(δ, M), the minimax risk using the WVD estimator (12), and the linear minimax risk, respectively, as the infimum of ∆(R, δ, M) over all estimators, over all WVD soft-thresholding estimators, and over all linear estimators. From the definitions it is clear that no reconstruction method R can have worst-case risk ∆(R, δ, M) smaller than the non-linear minimax risk ∆_N(δ, M). We are particularly interested in the asymptotic behavior as δ → 0 when M is a ball in a Besov space.
Then, as δ → 0, the following hold.
Proof. Follows from [35, theorem 4] for the special case κ_j = 1. □
Theorem 8 implies that, despite its simplicity, the WVD-thresholding estimator is order optimal on any Besov ball, and the rate cannot be improved by any other estimator (up to constant factors). On the other hand, if p < 2, then the exponent in the linear minimax rate is strictly smaller than the exponent in the non-linear minimax rate,
(r + d(1/2 − 1/p_−)) / (r + d(1 − 1/p_−)) < r / (r + d/2).
Therefore, no linear estimator can give the optimal convergence order. In particular, this implies that the WVD-thresholding estimator outperforms filter-based regularization methods, including the truncated SVD and quadratic Tikhonov regularization.
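The gap between the two rate exponents can be checked numerically. The sketch below uses the exponent expressions as displayed above and assumes p_− = min(p, 2) (an assumption; the precise definition of p_− is given elsewhere in the paper), which makes the two exponents coincide for p ≥ 2 and the linear one strictly smaller for p < 2:

```python
def nonlinear_exponent(r, d):
    # Exponent of the non-linear minimax rate on a Besov ball: r / (r + d/2).
    return r / (r + d / 2.0)

def linear_exponent(r, d, p):
    # Exponent of the linear minimax rate from the displayed inequality,
    # assuming p_minus = min(p, 2) (labelled assumption, see lead-in).
    p_minus = min(p, 2.0)
    return (r + d * (0.5 - 1.0 / p_minus)) / (r + d * (1.0 - 1.0 / p_minus))

r, d = 1.0, 2.0
nl = nonlinear_exponent(r, d)        # non-linear minimax exponent
lin_p2 = linear_exponent(r, d, 2.0)  # p = 2: the two exponents coincide
lin_p1 = linear_exponent(r, d, 1.0)  # p = 1 < 2: strictly smaller exponent
```

For r = 1, d = 2 the non-linear exponent is 1/2, matched by the linear one at p = 2, while at p = 1 the linear exponent collapses to 0, so no linear method attains the optimal order.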

Variational characterizations and extensions
The WVD-based soft-thresholding estimator can be characterized via various variational minimization schemes that, as we shall discuss later, in turn offer several extensions of the WVD estimators.
Theorem 9 (Variational characterizations of vaguelette thresholding). Let (w_j)_j be a sequence of thresholds and let g, f̂ ∈ L²(R^d). Then the following assertions are equivalent:
(1) f̂ = S_w(g);
(2) f̂ = arg min_f { ½ ‖f − A* g‖²_{L²} + Σ_{λ∈Λ} w_j |⟨f, ψ_λ⟩| };
(3) f̂ is the minimizer of the ℓ1-Tikhonov functional (13).
Proof.
(1) ⇔ (2): Let f̂ denote the minimizer in (2). Expanding both f and A* g in the wavelet basis shows that the functional decouples into a sum over the wavelet coefficients, which is minimized by minimizing every summand independently in the first argument. Each one-dimensional minimizer is given by soft-thresholding, which gives ⟨f̂, ψ_λ⟩ = ⟨Af̂, u_λ⟩ = s_{w_j}(⟨g, u_λ⟩) and therefore f̂ = S_w(g).
(1) ⇔ (3): Let f̂ denote the solution of (13). Using that (ψ_λ)_{λ∈Λ} is an orthonormal basis and ψ_λ = A* u_λ shows that f̂ can be equivalently characterized as the minimizer of (14). The minimization problem (14) can again be solved separately for every component ⟨Af̂, u_λ⟩, which is again given by one-dimensional soft-thresholding, ⟨Af̂, u_λ⟩ = s_{w_j}(⟨g, u_λ⟩). As above, this implies f̂ = S_w(g). □
The variational characterizations of theorem 9 have several important implications. First, they provide an explicit minimizer of the ℓ1-Tikhonov functional, which in general has to be minimized by an iterative algorithm, such as the iterative soft-thresholding algorithm and its variants [8,27,28,31]. For the analysis of ℓ1-Tikhonov regularization for inverse problems, see [31,50,51,72]. Another important consequence is that the WVD soft-thresholding estimator can be generalized in various directions. In particular, one obtains a generalization of (13) by replacing the L²-norm in (13) by an appropriate regularization functional J (for example, J can be chosen as the total variation norm; see section 5.3).
That is, instead of solving problem (13), one aims at solving problem (15). This generalization constitutes a hybrid version that combines the WVD estimator with more general regularization functionals J. Such hybrid approaches have been introduced independently in [18,36,74,96] (see also [21,25]). It is also related to the Dantzig and multiscale estimators of [19,43,45,52,58,78].

Numerical implementation
Throughout the following, (ψ_λ)_{λ∈Λ} denotes an orthonormal wavelet basis of L²(R^d) and (u_λ)_{λ∈Λ} the corresponding orthonormal PAT-vaguelette basis of ran(A) ⊆ L²(R^d) defined by u_λ := Aψ_λ (see theorem 6). In this section, we provide algorithms for the calculation of the PAT-vaguelette transform g ↦ (⟨g, u_λ⟩)_{λ∈Λ} and the corresponding WVD estimator defined by (12). Moreover, we consider a hybrid version of the WVD estimator that combines wavelets and TV regularization and discuss its implementation. We also present results of our numerical studies.

Practical aspects
The considered noisy data model (10) assumes that the data can be collected on the whole hyperplane ∂H +. However, this is not feasible in practice since the data can be collected only on a finite subset. In this subsection, we discuss the effects of partial (or limited view) data and discretization.
First, we address the limited data issue. In the considered imaging setup, the data is collected on a subset Q × [0, T], where Q ⊆ R^{d−1} is the finite measurement aperture (see figure 3) and T ∈ (0, ∞) is the maximal measurement time. Such partial or limited view data can be modeled by
g^δ = χ_{Q×[0,T]} Af + δZ, (16)
where χ_{Q×[0,T]} denotes the characteristic function of Q × [0, T], and δZ is noise with variance δ². Using partial data, only certain features of f can be reconstructed in a stable way; see [6,47,80,97] and figure 3. Consequently, the practical problem of reconstructing the unknown function f from partial data is a limited data problem and therefore severely ill-posed. It is therefore common to incorporate a priori information into the reconstruction and thereby regularize it. In this work, we do so in two ways. First, we incorporate wavelet sparsity assumptions. This is what the WVD estimator does, which is implemented by evaluating f^δ := Σ_λ s_q(⟨g^δ, u_λ⟩_{L²}) ψ_λ (see definition 7). Here and below, for simplicity, we take the thresholding sequence w = (w_j)_j to be constant and denote its value by q. In the noise-free case, taking q = 0, noting u_λ = Aψ_λ, and using the linearity of A and A*, we obtain the partial reconstruction f_0 = A*(χ_{Q×[0,T]} Af), which is known to be a 'good' approximate reconstruction for f, since it recovers all visible boundaries of f correctly (as follows from [6,47,80,97]) and can be evaluated stably. Second, we combine wavelet sparsity with TV regularization by using the hybrid estimator (15) to impose even more regularization.
Another practical restriction is that only a finite number of samples of the pressure can be measured. Assuming equidistant sampling and limited view data g as in (16), the discrete data are modeled by
g[n, m] := g(n∆_N, m∆_M) + z[n, m]. (18)
Here N and M are natural numbers, ∆_N := X/N and ∆_M := T/M are the sampling step sizes, and z[n, m] describes the noise in the data.
Using Shannon sampling theory, it can be shown that (18) is correctly sampled for ∆_N = ∆_M ≤ π/Ω, where Ω is the essential bandwidth of f; see [61]. The precise analysis of discretization effects on the considered vaguelette estimators is beyond the scope of this paper.

Implementation of the PAT-vaguelette transform
Analogously to the wavelet transform, we can define the vaguelette transform of g ∈ L²(R^d) corresponding to the operator A by Vg := (⟨u_λ, g⟩_{L²})_{λ∈Λ}. Using the representation u_λ = Aψ_λ (compare (WVD1) and theorem 6), we see that the PAT-vaguelette transform is given by Vg = (⟨ψ_λ, A* g⟩_{L²})_{λ∈Λ}. Hence, the PAT-vaguelette transform can be computed by first applying the adjoint A* to the data g and then calculating the wavelet transform of A* g. Assuming discrete data g ∈ R^{N^{d−1}×M} of the form g[n, m] = g(n∆_N, m∆_M), the discrete PAT-vaguelette transform can be efficiently computed by algorithm 1. Both steps in this algorithm are well known and numerically efficient (compare with remark 10 below). For the first step, we use a numerical approximation of A* by numerically implementing (6) with a filtered backprojection (FBP) algorithm (see [15,16]). For evaluating the second step, we use the implementation of the fast discrete wavelet transform provided by the Matlab function wavedec2.
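The two-step structure of the PAT-vaguelette transform can be sketched as follows. This is an illustrative skeleton, not the paper's implementation: the adjoint is passed in as a placeholder (in practice an FBP routine), and a one-level 1D Haar transform stands in for the multilevel 2D transform (wavedec2):

```python
import math

def haar_analysis(signal):
    """One level of the orthonormal 1D Haar wavelet transform
    (a stand-in for the multilevel 2D transform used in the paper)."""
    s = 1 / math.sqrt(2)
    approx = [s * (signal[i] + signal[i + 1]) for i in range(0, len(signal), 2)]
    detail = [s * (signal[i] - signal[i + 1]) for i in range(0, len(signal), 2)]
    return approx, detail

def vaguelette_transform(g, adjoint):
    """Algorithm 1 in miniature: V g = wavelet transform of A* g.
    `adjoint` is a placeholder for the discrete A* (e.g. an FBP routine)."""
    return haar_analysis(adjoint(g))

# Toy usage with the identity standing in for the adjoint.
g = [2.0, 0.0, 1.0, 3.0]
approx, detail = vaguelette_transform(g, lambda x: list(x))
```

Because the Haar transform is orthonormal, the coefficient energy equals the signal energy, mirroring the isometry that makes the continuous vaguelette transform well behaved.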

Implementation of the reconstruction algorithms
For the numerical experiments presented in the next section, we implement the WVD soft-thresholding estimator defined by (12) as well as a hybrid vaguelette-TV approach defined by (15) with J(f) = ‖f‖_TV being the TV norm of f. In both cases, we take constant weights w_j = q, where q > 0 is some threshold. In the case of random noise, a typical choice for the threshold is Donoho's so-called universal threshold δ√(2 log(N^{d−1} M)), which can be derived from extreme value theory [34,57]. The generalization to non-constant weights is straightforward.
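A quick sketch of the universal threshold: with the noise level δ = 0.25 and the 512 × 512 data used in the numerical examples below, half the universal threshold comes out near the value q = 0.63 used there.

```python
import math

def universal_threshold(delta, n_coeffs):
    """Donoho's universal threshold delta * sqrt(2 * log n) for n empirical
    coefficients contaminated by white noise of level delta."""
    return delta * math.sqrt(2.0 * math.log(n_coeffs))

# Noise level and data size from the numerical examples.
q_univ = universal_threshold(0.25, 512 * 512)   # about 1.249
q_half = q_univ / 2.0                           # about 0.624
```

The universal threshold is large enough to suppress essentially all pure-noise coefficients with high probability, which is also why halving it (as done below) trades a little residual noise for less over-smoothing.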
The implementation of the WVD soft-thresholding estimator is summarized in algorithm 2. It is based on the discrete PAT-vaguelette transform presented in algorithm 1.
Algorithm 2 (WVD soft-thresholding estimator).
1: Compute the PAT-vaguelette coefficients c = Vg by applying algorithm 1.
2: Apply soft-thresholding to c using the threshold q.
3: Apply the inverse wavelet transform.
Remark 10 (Numerical complexity of the WVD soft-thresholding algorithm). In two spatial dimensions, and assuming N ∼ M, the FBP algorithm requires O(N³) operations (see [16]) and the discrete wavelet transform requires O(N² log N) operations. Therefore, the discrete PAT-vaguelette transform (algorithm 1) as well as the proposed WVD soft-thresholding estimator given by algorithm 2 have a complexity of O(N³), which is the same as for the standard FBP algorithm. Using Fourier algorithms for evaluating A* (see [61,63,67,82,108]), the numerical effort even reduces to O(N² log N).
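A miniature version of algorithm 2 (a sketch, not the paper's implementation): a one-level 1D Haar transform replaces the 2D wavelet transform, and the adjoint A* is passed in as a placeholder. Only the detail coefficients are thresholded here, a common simplification, whereas the algorithm above uses a single threshold for all vaguelette coefficients.

```python
import math

S = 1 / math.sqrt(2)

def haar_fwd(x):
    """One-level orthonormal Haar analysis (approximation, detail)."""
    a = [S * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]
    d = [S * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]
    return a, d

def haar_inv(a, d):
    """Inverse of haar_fwd."""
    x = []
    for ai, di in zip(a, d):
        x.extend([S * (ai + di), S * (ai - di)])
    return x

def soft(x, q):
    return math.copysign(max(abs(x) - q, 0.0), x)

def wvd_estimator(g, adjoint, q):
    """Algorithm 2 in miniature: vaguelette coefficients of g (wavelet
    coefficients of A* g), soft-thresholded, then inverse transformed.
    `adjoint` stands in for the discrete A* (an FBP routine)."""
    a, d = haar_fwd(adjoint(g))        # step 1: PAT-vaguelette transform
    d = [soft(c, q) for c in d]        # step 2: threshold detail coefficients
    return haar_inv(a, d)              # step 3: inverse wavelet transform
```

With q = 0 the estimator reduces to a plain adjoint (FBP-type) reconstruction, exactly as noted for the disc example below; a very large q wipes out all detail and returns local averages.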
Given discrete data, the hybrid vaguelette-TV approach can be written in the form (19), where B denotes the discrete FBP operator. One recognizes that (19) can be implemented by applying a hybrid wavelet-TV denoising algorithm to Bg. Following [44], we rewrite the optimization problem (19) in the form (20) and introduce the associated augmented Lagrangian. Then, (20) can be solved by the alternating direction method of multipliers (ADMM), introduced in [48,49], which alternately performs minimization steps with respect to f and v and maximization steps with respect to μ. The resulting ADMM algorithm for solving (19) is summarized in algorithm 3.
For performing the f-update in algorithm 3, we have to solve an unconstrained total variation minimization problem of the form min_f ½‖f − h‖²_{L²} + c‖f‖_TV. For this purpose, we use Chambolle's dual projection algorithm [20]. In two spatial dimensions, assuming M ∼ N and using the FBP algorithm for evaluating Bg (which is required only once), the numerical complexity of algorithm 3 is O(N³ + N² N_inner N_iter). Here N_iter is the number of outer iterations and N_inner the number of inner iterations of the Chambolle algorithm. In the numerical example presented in the following subsection, we use N = 512, N_iter = 100 and N_inner = 50. In this situation, the numerical complexity is dominated by N² N_inner N_iter, which is confirmed by the runtimes.
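The ADMM structure of algorithm 3 can be illustrated on a toy problem in which every update has a closed form: the TV proximal step (Chambolle's algorithm) is replaced by an ℓ1 soft-thresholding step, so the iteration can be checked against the known minimizer. This is a sketch of the splitting mechanics only, not the paper's algorithm 3.

```python
def soft(x, t):
    return (x - t) if x > t else (x + t) if x < -t else 0.0

def admm_denoise(b, q, rho=1.0, n_iter=200):
    """ADMM for min_f 0.5*||f - b||^2 + q*||f||_1 via the splitting v = f.
    In algorithm 3 the v-update would instead be a TV proximal step
    (Chambolle's dual projection algorithm)."""
    f = list(b)
    v = list(b)
    mu = [0.0] * len(b)
    for _ in range(n_iter):
        # f-update: quadratic subproblem, solved in closed form
        f = [(bi + rho * (vi - mi)) / (1.0 + rho)
             for bi, vi, mi in zip(b, v, mu)]
        # v-update: proximal step of the q*||.||_1 term (soft-thresholding)
        v = [soft(fi + mi, q / rho) for fi, mi in zip(f, mu)]
        # dual (multiplier) update
        mu = [mi + fi - vi for mi, fi, vi in zip(mu, f, v)]
    return v

b = [3.0, -0.2, 1.0]
f_hat = admm_denoise(b, q=0.5)   # converges to soft(b, 0.5) componentwise
```

For this separable toy problem the exact minimizer is componentwise soft-thresholding of b, so the ADMM iterates can be verified directly; in algorithm 3 only the v-update changes, with the coupling through the multiplier μ remaining the same.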

Numerical example 1: Disc phantom
In this section, we present a numerical example testing the PAT-vaguelette soft-thresholding and the hybrid approach. For that purpose, we first consider a simple phantom that consists of the sum of four non-overlapping uniformly absorbing discs, as illustrated in the top image in figure 4. This simple phantom has been chosen in order to clearly identify the reconstruction artefacts due to noise and the limited detection view. Note that the number of visible boundaries depends on the location of the disc relative to the observation surface (see figure 3): more boundaries are visible for the inner discs than for the outer discs. The phantom f is discretized using 128 × 512 equidistant samples. To simulate measurement errors, we added i.i.d. Gaussian noise with standard deviation δ = 0.25. The relative ℓ²-error in the noisy data g^δ ∈ R^{512×512} is ‖g^δ − g^0‖₂/‖g^0‖₂ = 1.05. The simulated data with noise added are shown in the middle row in figure 4. Note that the data are shown for times restricted to [0, 2] but simulated until the maximal measurement time T = 4, and have been multiplied by |t|^{1/2} to represent actual PAT measurement data. The last row in figure 4 shows a reconstruction from exact data using the FBP algorithm (denoted by B and implemented following [15]). Reconstructions from noisy data are presented below. We point out that the FBP algorithm equals our WVD soft-thresholding estimator when the threshold parameters are chosen to be constantly zero (see algorithms 1 and 2). Reconstruction results for noisy data are shown in figure 5. The top image shows the non-regularized reconstruction f^δ_FBP = Bg^δ, the middle image the WVD soft-thresholding reconstruction f^δ_WVD, and the bottom image the reconstruction with the hybrid vaguelette-TV method f^δ_hybrid, using 100 steps of the ADMM with 50 inner iterations of Chambolle's TV minimization algorithm.
For the WVD and the hybrid reconstruction, the weights w_j = q = 0.63 are chosen to be constant, with a value equal to half of Donoho's universal threshold δ√(2 log(512 × 512)). One clearly observes that the vaguelette estimators significantly reduce the error in the reconstruction. This can be quantified by the relative ℓ²-errors, which are ‖f^δ_FBP − f‖₂/‖f‖₂ = 0.66 for the unregularized (FBP) reconstruction, ‖f^δ_WVD − f‖₂/‖f‖₂ = 0.38 for the PAT-vaguelette thresholding estimator and ‖f^δ_hybrid − f‖₂/‖f‖₂ = 0.42 for the hybrid estimator. The relative ℓ²-error for the hybrid reconstruction is slightly larger than for the vaguelette-thresholding estimator. This is reasonable, since the PAT-vaguelette-thresholding estimator minimizes the ℓ²-norm (see (13)) among all potential reconstructions f that are compatible with the data, whereas the hybrid estimator minimizes the total variation. However, in a more appropriate error measure the hybrid reconstruction may outperform the thresholding estimator. Further note that all reconstruction methods contain some limited data artifacts, which cannot be removed completely by wavelet methods [6,47,61,79,97].

Numerical example 2: Shepp-Logan phantom
Next, we present reconstruction results using a more complex phantom. We also discuss the effects of different noise levels and different thresholds. The Shepp-Logan head phantom f is discretized using 256 × 512 equidistant samples; the reconstruction took 39.26 s for the hybrid estimator. As in the disc example, the reconstruction errors (measured in the L²-norm) are slightly larger for the hybrid method, while the visual results might be slightly more appealing. This again indicates that the L²-norm might not be the optimal error measure (at least) for the hybrid method. As a consequence, the following detailed error study is only performed for the WVD soft-thresholding method.
To investigate the dependence of the reconstructions on the noise variance and the selected thresholds in more detail, we applied the WVD soft-thresholding estimator to noisy data for a range of noise levels and thresholds; the resulting relative reconstruction errors E₁ are shown in figure 9. For any value of the standard deviation, the reconstruction errors first decrease with increasing threshold and then start to increase again. The relatively large values of the reconstruction errors are due to the limited view artifacts that are present in any reconstruction. The limited view artifacts are also partially removed by the soft-thresholding estimator, which explains why similar thresholds are optimal for a large range of noise levels (in the sense that they minimize E₁; note the solid white line with added white dots). In the right image in figure 9, we therefore also depict the relative distance E₂ to the noise-free partial reconstruction f_0 = Bg_0. The thresholds that minimize E₂ now show a clear (linear) dependence on the variance. The dashed lines in the images in the top row show Donoho's universal threshold δ√(2 log(512 × 512)), which is known to remove noise completely with high probability but is also known to yield over-smoothing [34,57].

Conclusion
In this paper, we developed a regularization framework using wavelet sparsity for photoacoustic tomography (PAT). For that purpose, we derived wavelet-vaguelette decompositions (WVDs) (see theorem 6) and an efficient implementation of the corresponding PAT-vaguelette transform (see algorithm 1). Using the WVD, we derived an explicit formula for minimizing the sparse Tikhonov functional that can be implemented without any iterative reconstruction procedure (see algorithm 2). The considered regularization approach has been shown to provide optimal error estimates in the deterministic as well as in the stochastic setting (see theorem 8). The computational complexity of the given implementation is O(N³) for recovering a phantom at N × N reconstruction points, which is the same as for the standard filtered backprojection (FBP) algorithm. In order to account for wavelet artifacts, we also developed hybrid regularization methods combining wavelet sparsity with total variation (see algorithm 3). Numerical results demonstrate the feasibility of our reconstruction approaches. In future work, we will extend our approach to more general measurement geometries in PAT as well as to other Radon-type inverse problems.
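For orientation, the explicit non-iterative minimizer mentioned above has, in generic WVD notation (quasi-singular values κ_j, vaguelettes ū_{j,k}; these symbols are assumptions here, not necessarily the paper's), the schematic form of coefficientwise soft-thresholding:

```latex
% Schematic WVD soft-thresholding estimator from data g (generic notation):
\hat{f}
  \;=\; \sum_{j,k}
    \mathcal{S}_{w_j}\!\left( \kappa_j^{-1} \,\langle g, \bar{u}_{j,k} \rangle \right)
    \psi_{j,k},
\qquad
\mathcal{S}_{w}(c) \;=\; \operatorname{sign}(c)\,\max\{\, |c| - w,\; 0 \,\}.
```

Each vaguelette coefficient of the data is rescaled and shrunk independently, which is why no iterative solver is needed.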

Appendix. Orthogonal wavelets
We recall some basic facts about orthogonal wavelets as needed for our purposes (in particular in section 3). For the task of function estimation, wavelets are known to sparsely represent many signals and, hence, they can be used to effectively encode prior information. Another useful property of wavelets is their ability to characterize several classical smoothness measures (e.g. Sobolev and Besov norms) in terms of the decay properties of wavelet coefficients. We also use this property in section 4. For a detailed introduction to wavelets we refer to [23,30,75].

A.1. One-dimensional wavelets
We first briefly recall the basic definitions and notations of orthonormal wavelet bases (wavelet ONBs) in one spatial dimension, which will then be extended to higher dimensions.
The construction of a wavelet ONB is based on the concept of a multiresolution analysis (MRA), which is given by a sequence of subspaces (V_j)_{j∈Z} in L²(R) that satisfies the following requirements (see [23,30,75]):
(i) For all j ∈ Z it holds that V_j ⊆ V_{j+1}.
(ii) The union ⋃_{j∈Z} V_j is dense in L²(R).
(iii) For all j ∈ Z, f ∈ V_j if and only if f(2·) ∈ V_{j+1}.
(iv) There is a function ϕ ∈ L²(R) such that the translates (ϕ(· − k))_{k∈Z} constitute an ONB of the scaling space V₀.
The function ϕ is called the scaling function, and the spaces V_j are called scaling (or approximation) spaces at scale j. To each scaling space V_j one associates a wavelet (or detail) space W_j, defined as the orthogonal complement of V_j in V_{j+1}, i.e. V_{j+1} = V_j ⊕ W_j. Because of the above properties it holds that, for each j ∈ Z, the functions t ↦ ϕ_{j,k}(t) := 2^{j/2} ϕ(2^j t − k) constitute an ONB for the scaling space V_j. One can also show that the existence of the scaling function implies the existence of a so-called generating wavelet (or mother wavelet) ψ ∈ L²(R) such that, for each j ∈ Z, the functions t ↦ ψ_{j,k}(t) := 2^{j/2} ψ(2^j t − k) constitute an ONB for the space W_j. Hence, for every j ∈ Z, the mappings ℓ²(Z) → V_j : (c_k)_k ↦ Σ_k c_k ϕ_{j,k} and ℓ²(Z) → W_j : (c_k)_k ↦ Σ_k c_k ψ_{j,k} are bijections. The above constructions provide the following decompositions of the signal space L²(R) into the sum of a scaling space V_{j₀} and the wavelet spaces W_j, or into a sum of only wavelet spaces:

L²(R) = V_{j₀} ⊕ ⨁_{j ≥ j₀} W_j = ⨁_{j∈Z} W_j.

From these decompositions we immediately get the following decompositions of signals f ∈ L²(R):

f = Σ_{k∈Z} ⟨f, ϕ_{j₀,k}⟩ ϕ_{j₀,k} + Σ_{j ≥ j₀} Σ_{k∈Z} ⟨f, ψ_{j,k}⟩ ψ_{j,k} = Σ_{j∈Z} Σ_{k∈Z} ⟨f, ψ_{j,k}⟩ ψ_{j,k}.

The coefficients ⟨f, ψ_{j,k}⟩ and ⟨f, ϕ_{j₀,k}⟩ of the above decomposition are called the wavelet and scaling coefficients of f, respectively, and the mapping that maps f to these coefficients is called the wavelet transform of f. A detailed construction of orthogonal (and biorthogonal) wavelet bases together with many interesting details may be found, for example, in [23,30,75].
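As a concrete finite-dimensional illustration of these decompositions (not part of the paper's construction), the discrete Haar system on 2^J grid points can be assembled explicitly and checked for orthonormality and perfect reconstruction:

```python
import numpy as np

n, J = 8, 3   # signals on n = 2^J grid points

def phi(j, k):
    """Discrete Haar scaling vector phi_{j,k}, orthonormal in R^n."""
    v = np.zeros(n)
    step = n // 2 ** j                     # support length at scale j
    v[k * step:(k + 1) * step] = 2.0 ** (j / 2) / np.sqrt(n)
    return v

def psi(j, k):
    """Discrete Haar wavelet vector psi_{j,k}: + on the first half
    of the support, - on the second half."""
    v = np.zeros(n)
    step = n // 2 ** j
    h = step // 2
    v[k * step:k * step + h] = 2.0 ** (j / 2) / np.sqrt(n)
    v[k * step + h:(k + 1) * step] = -(2.0 ** (j / 2)) / np.sqrt(n)
    return v

# Basis of R^n: one coarse scaling vector plus wavelets at scales j = 0..J-1,
# mirroring the decomposition V_{j0} + sum of W_j with j0 = 0.
basis = [phi(0, 0)] + [psi(j, k) for j in range(J) for k in range(2 ** j)]
W = np.array(basis)

print(np.allclose(W @ W.T, np.eye(n)))   # orthonormality of the system
f = np.array([4.0, 2, 5, 5, 1, 0, 3, 3])
coeffs = W @ f                           # scaling and wavelet coefficients
print(np.allclose(W.T @ coeffs, f))      # f is recovered from its coefficients
```

The rows of W play the roles of ϕ_{j₀,k} and ψ_{j,k}, and applying W is the (discrete) wavelet transform.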
For parameters p, q ≥ 1 and smoothness r > 0, the Besov space B^r_{p,q}(R) consists of all functions f for which the norm

‖f‖_{B^r_{p,q}} := ( Σ_{k∈Z} |⟨f, ϕ_{0,k}⟩|^p )^{1/p} + ( Σ_{j≥0} 2^{j(r+1/2−1/p)q} ( Σ_{k∈Z} |⟨f, ψ_{j,k}⟩|^p )^{q/p} )^{1/q}

is well defined and finite. We note that the given definition of ‖·‖_{B^r_{p,q}} is equivalent to the classical definition of the Besov norm. The above characterization holds as long as the generating wavelet has m > r vanishing moments and is m-times continuously differentiable.
In section 4, we use the above characterization of Besov norms in terms of wavelet coefficients in order to evaluate the performance of our method for functions that lie in balls of Besov spaces B^s_{p,q}(R^d) for given norm parameters p, q ≥ 1 and a smoothness parameter s > 0.
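For illustration, the wavelet characterization of Besov norms discussed above is easy to evaluate from given coefficients. The sketch below assumes a common one-dimensional normalization with coarsest scale j₀ = 0; the function name and toy coefficients are hypothetical.

```python
import numpy as np

def besov_seq_norm(scaling, details, r, p, q):
    """Wavelet sequence norm equivalent to the Besov B^r_{p,q} norm (d = 1).

    scaling : coefficients <f, phi_{0,k}> at the coarsest scale j0 = 0
    details : list where details[j] holds the coefficients <f, psi_{j,k}>
    """
    coarse = np.sum(np.abs(scaling) ** p) ** (1.0 / p)
    # Each scale j contributes 2^{j(r + 1/2 - 1/p)} times its l^p norm.
    terms = [
        (2.0 ** (j * (r + 0.5 - 1.0 / p))
         * np.sum(np.abs(d) ** p) ** (1.0 / p)) ** q
        for j, d in enumerate(details)
    ]
    return coarse + np.sum(terms) ** (1.0 / q)

# Toy example: for p = q = 2 and r = 0 the weights are 1, so the norm
# reduces to the l^2 norm of the scaling part plus that of the details.
val = besov_seq_norm(np.array([3.0, 4.0]),
                     [np.array([0.0]), np.array([5.0, 12.0])],
                     r=0.0, p=2, q=2)
print(val)
```

Larger r penalizes fine-scale coefficients more strongly, which is exactly how membership in a Besov ball encodes smoothness.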