Adaptive Spectral Inversion for Inverse Medium Problems

A nonlinear optimization method is proposed for the solution of inverse medium problems with spatially varying properties. To avoid the prohibitively large number of unknown control variables resulting from standard grid-based representations, the misfit is instead minimized in a small subspace spanned by the first few eigenfunctions of a judicious elliptic operator, which itself depends on the previous iterate. By repeatedly adapting both the dimension and the basis of the search space, regularization is inherently incorporated at each iteration without the need for extra Tikhonov penalization. Convergence is proved under an angle condition, which is incorporated into the resulting \emph{Adaptive Spectral Inversion} (ASI) algorithm. The ASI approach compares favorably to standard grid-based inversion using $L^2$-Tikhonov regularization when applied to an elliptic inverse problem. The improved accuracy resulting from the newly included angle condition is further demonstrated via numerical experiments from time-dependent inverse scattering problems.


Introduction
The solution of inverse medium problems entails the reconstruction of a medium's spatially varying properties, u(x), inside a bounded region of space Ω ⊂ R^d from partially available, often noisy observations of a state variable y. This is inverse to the direct or forward problem of determining y for a given medium u(x), where typically y satisfies a governing partial differential equation, be it stationary or time-dependent, which involves u(x). More abstractly, if we denote the solution to the forward problem by a forward operator F : H_1 → H_2 acting between two Hilbert spaces H_1, H_2, the associated inverse problem then consists in solving y† = F(u†) for a given right-hand side y† ∈ H_2, that is, to determine a solution u† ∈ H_1. Due to measurement errors, only approximate (noisy) data y^δ is available in practice, for a noise level δ ≥ 0. Then, the inverse medium problem consists in finding u^{†,δ} ∈ H_1 that satisfies

F(u^{†,δ}) = y^δ, (1.3)

which we reformulate as a least-squares minimization problem for the misfit J^δ : H_1 → R as

u^{†,δ} = argmin_{u ∈ H_1} J^δ(u) := argmin_{u ∈ H_1} (1/2) ‖F(u) − y^δ‖²_{H_2}. (1.4)
In general, (1.3) or (1.4) is ill-posed and cannot be solved directly without further regularization. Numerical methods for obtaining stable solutions to (1.3) or (1.4) essentially fall into three categories. First, starting from an initial guess u^(0), iterative regularization methods, such as the Landweber iteration [21, 28], a special case of gradient descent, improve upon u^(m) at iteration m by using the derivative of J^δ [3, 25]. Due to semi-convergence, the iteration needs to be stopped judiciously, as the regularization parameter corresponds to the reciprocal of m. Second, Tikhonov regularization adds a regularization term R(u) to J^δ(u) [12, 27]. Hence, instead of solving (1.4), one now seeks a minimizer of J^δ(u) + αR(u), where α is the regularization parameter which must be chosen carefully. Typical choices for the regularization term R(u) include the L²-norm, the TV-norm, or the H¹-semi-norm, depending on the expected smoothness in u†. Third, regularization by projection (or discretization) in preimage-space replaces in (1.4) or (1.3) the infinite dimensional space H_1 by a finite dimensional subspace Ψ^(m) ⊂ H_1 and then solves for u^(m) ∈ Ψ^(m) [16, 26]. Here, Ψ^(m) typically corresponds to the underlying subspace of a finite difference or finite element discretization whose mesh-size, h_m > 0, then acts as the regularization parameter. Alternative projection methods use finite-dimensional subspaces in image-space [23, 24], or even projection into both image- and preimage-space simultaneously [20, 33].
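The first two categories can be contrasted on a toy linear problem. The sketch below is our own illustration, not taken from the paper: it applies a Landweber iteration and a Tikhonov-regularized normal-equation solve to a synthetic operator with rapidly decaying singular values; all names and parameter values are illustrative.

```python
import numpy as np

# Toy linear ill-posed problem: F(u) = A u with rapidly decaying singular values.
rng = np.random.default_rng(0)
n = 50
U, _ = np.linalg.qr(rng.standard_normal((n, n)))
V, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = 2.0 ** -np.arange(n)                      # severe ill-conditioning
A = U @ np.diag(s) @ V.T
u_true = V[:, 0] + 0.5 * V[:, 1]              # exact medium, low-frequency
y_delta = A @ u_true + 1e-6 * rng.standard_normal(n)  # noisy data

# (1) Landweber iteration: gradient descent on the misfit; the iteration
#     count m acts as the reciprocal of the regularization parameter.
omega = 0.9 / s[0] ** 2                       # step size < 2 / ||A||^2
u = np.zeros(n)
for m in range(200):
    u = u + omega * A.T @ (y_delta - A @ u)
err_landweber = np.linalg.norm(u - u_true) / np.linalg.norm(u_true)

# (2) Tikhonov regularization: minimize ||A u - y||^2 + alpha ||u||^2.
alpha = 1e-6
u_tik = np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ y_delta)
err_tikhonov = np.linalg.norm(u_tik - u_true) / np.linalg.norm(u_true)
```

In both cases the stable reconstruction hinges on choosing the regularization parameter (m or α) to balance data fit against noise amplification.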
Unless an effective parametrization of u(x) is known a priori, inverse medium problems usually lead to prohibitively high-dimensional search spaces where the number of unknown parameters (or control variables) is determined by the degrees of freedom of the underlying finite difference or finite element discretization. Alternatively, sparsity promoting strategies attempt to remain sufficiently general while keeping the dimension of the search space small by applying ℓ¹-norm soft-thresholding, say, to promote a sparse representation in a fixed wavelet, curvelet, etc. basis or frame [8, 22, 29, 31].
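The ℓ¹-norm soft-thresholding step underlying such sparsity-promoting strategies can be sketched in a few lines; the function name and the coefficient vector below are illustrative.

```python
import numpy as np

def soft_threshold(c, tau):
    """Proximal map of the l1-norm: shrink each coefficient toward zero by tau."""
    return np.sign(c) * np.maximum(np.abs(c) - tau, 0.0)

# coefficients of u in some fixed wavelet/curvelet basis (illustrative values)
c = np.array([3.0, -0.2, 0.05, -1.5, 0.01])
c_sparse = soft_threshold(c, 0.3)             # small coefficients are zeroed out
```

Only the two coefficients exceeding the threshold survive, which keeps the effective number of control variables small.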
An adaptive inversion method was first introduced in [11, Section 5.4], where the search space consists of the first few eigenfunctions of a judicious elliptic differential operator L[u (m) ].
As L[u^(m)] itself depends on the previous iterate, u^(m), so do its eigenfunctions; hence, the current search space is repeatedly adapted at every iteration. To avoid division by zero in the presence of vanishing gradients, De Buhan and Kray later incorporated a small fixed parameter ε which led to the coercive elliptic operator L_ε[u^(m)]. Restricting the search space to the first few eigenfunctions of L_ε[u^(m)] has proved highly effective in a number of acoustic [10, 15], electromagnetic [9] and seismic [13] inverse scattering problems, but also in optimal control [34].
To take advantage of the regularizing effect of a low-dimensional search space, both the eigenfunctions and the dimension of the search space were adapted in [17]. Moreover, the connection between L_ε and TV-regularization motivated the use of different elliptic operators depending on the expected smoothness in the target medium [18]. Recently, a first step was taken in developing a mathematical theory underpinning the remarkable accuracy of that decomposition for the approximation of piecewise constant functions [2], which led to rigorous L²-error estimates in [1].
The remaining part of this paper is structured as follows. In Section 2, we introduce the general Adaptive Inversion iteration for the solution of inverse problems, which proceeds by solving (1.4) in a sequence of finite dimensional subspaces Ψ^(m) ⊂ H_1, not necessarily known a priori. Here we identify a key angle condition, which yields convergence of the Adaptive Inversion iteration, and also prove that it is a genuine regularization method. In Section 3, we present the Adaptive Spectral (AS) decomposition and recall its approximation properties [1, 2]. By combining the Adaptive Inversion iteration with the AS decomposition, we propose in Section 4 the Adaptive Spectral Inversion (ASI) Algorithm, which incorporates the new angle condition from Section 2 by including the sensitivities of the gradient into the construction of the search space at the subsequent iteration. Finally, in Section 5, we present several numerical experiments which illustrate the performance of the ASI method and verify the theory from Section 2. In particular, we apply the ASI method to time-dependent inverse scattering problems, which demonstrates that it yields a significant improvement over previous versions from [17, 2].

Adaptive Inversion
To determine a solution to the inverse problem (1.3) for given data y^δ, we shall minimize the misfit J^δ in (1.4) successively in a sequence of (closed) finite-dimensional subspaces Ψ^(m) ⊂ H_1, m ≥ 1, until its gradient DJ^δ vanishes. In doing so, we do not assume the entire sequence of subspaces {Ψ^(m)}_{m ≥ 1} known a priori. Hence, we consider the following adaptive inversion algorithm:

Algorithm 1: Adaptive Inversion
1. Set m = 1, choose an initial guess u^(0),δ and an initial search space Ψ^(1).
2. Compute the minimizer u^(m),δ of J^δ over Ψ^(m).
3. Determine the next search space Ψ^(m+1) with u^(m),δ ∈ Ψ^(m+1).
4. m ← m + 1

By ensuring that u^(m),δ also belongs to the new search space Ψ^(m+1), we guarantee in every iteration that the misfit does not increase,

J^δ(u^(m+1),δ) ≤ J^δ(u^(m),δ).

Therefore the sequence {u^(m),δ}_{m ≥ 1} obtained by the Adaptive Inversion Algorithm 1 is a minimizing sequence of J^δ. Without further assumptions, however, we do not know yet whether this sequence converges.
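The iteration above can be sketched for a linear forward operator, where minimizing J^δ over Ψ^(m) reduces to a least-squares solve. The nested spaces supplied by `basis_fn` are a simplifying assumption of this illustration (they automatically keep u^(m),δ in the next search space), not part of Algorithm 1 itself; all names are our own.

```python
import numpy as np

def adaptive_inversion(F_mat, y_delta, basis_fn, m_max=10):
    """Minimal sketch of the adaptive inversion loop for a linear forward map.

    basis_fn(m, u) returns the columns spanning the search space Psi^(m);
    here the spaces are nested, so u^(m) stays representable and the misfit
    cannot increase from one iteration to the next."""
    n = F_mat.shape[1]
    u = np.zeros(n)
    misfits = []
    for m in range(1, m_max + 1):
        Psi = basis_fn(m, u)                    # adapt the search space
        B = F_mat @ Psi                         # forward map restricted to Psi
        coef, *_ = np.linalg.lstsq(B, y_delta, rcond=None)
        u = Psi @ coef                          # minimizer of J over Psi
        misfits.append(0.5 * np.linalg.norm(F_mat @ u - y_delta) ** 2)
    return u, misfits

# Tiny test problem: recover u, which lies in the span of 3 basis vectors.
rng = np.random.default_rng(1)
n = 32
F_mat = rng.standard_normal((n, n)) / np.sqrt(n)
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # fixed orthonormal basis
u_true = Q[:, :3] @ np.array([1.0, -2.0, 0.5])
y_delta = F_mat @ u_true                          # noise-free data here
u_rec, misfits = adaptive_inversion(F_mat, y_delta,
                                    lambda m, u: Q[:, :m], m_max=5)
```

Because the spaces are nested, the misfit sequence is non-increasing, and it drops to (numerical) zero once the search space contains u_true.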

Convergence
To prove that the sequence {u^(m),δ}_{m ≥ 1} generated by the Adaptive Inversion Algorithm indeed converges, we proceed as follows. First, we identify a key angle condition, which ensures that for fixed δ ≥ 0, DJ^δ(u^(m),δ) → 0 as m → ∞. Then, under suitable assumptions from convex optimization theory, we conclude that the sequence indeed converges to u^{†,δ}.
Theorem 2.1. Let J^δ be Fréchet differentiable with Lipschitz continuous derivative, and further assume that J^δ(u) ≥ C > −∞ for all u ∈ H_1. Then, if the corrections d^(m),δ = u^(m+1),δ − u^(m),δ satisfy the angle condition

⟨−DJ^δ(u^(m),δ), d^(m),δ⟩_{H_1} ≥ ε_θ ‖DJ^δ(u^(m),δ)‖_{H_1} ‖d^(m),δ‖_{H_1}, 0 < ε_θ < 1, (2.4)

uniformly in m, we have DJ^δ(u^(m),δ) → 0 as m → ∞.

Proof. First, we claim the estimate (2.6), with constants as in (2.7). As both u^(m+1),δ, u^(m),δ ∈ Ψ^(m+1), the linearity of the derivative yields (2.6). Since J^δ is bounded from below and decreases in every iteration, there exists a constant C_0 ≥ 0 such that, by (2.6), the successive decrements of J^δ are summable. Next, we define the angle θ_m between the correction d^(m),δ and the steepest-descent direction −DJ^δ(u^(m),δ), and rewrite (2.12)–(2.14) using (2.7) as (2.16). Taking the limit M → ∞ then yields the well-known Zoutendijk condition [36],

Σ_{m ≥ 1} cos²(θ_m) ‖DJ^δ(u^(m),δ)‖²_{H_1} < ∞,

which immediately implies cos²(θ_m) ‖DJ^δ(u^(m),δ)‖²_{H_1} → 0 as m → ∞. Since we optimize for u^(m),δ such that the angle condition (2.4) holds uniformly in m, i.e. cos(θ_m) ≥ ε_θ for all m ≥ 1 with 0 < ε_θ < 1, we thus conclude that DJ^δ(u^(m),δ) → 0. □

Theorem 2.1 implies that the Adaptive Inversion Algorithm yields a minimizing sequence {u^(m),δ}_{m ≥ 1} such that the gradient of the misfit tends to zero; hence, the algorithm is well-defined. Without further assumptions, however, this does not imply that the sequence {u^(m),δ}_{m ≥ 1} converges (weakly) to a (local) minimizer or accumulation point. Under further standard assumptions from convex optimization theory, one can even prove convergence to a minimizer of (1.4), as in [4, Corollary 11.30]. Those assumptions, however, seldom hold in practice for inverse medium problems.
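The angle condition and the resulting gradient convergence can be illustrated numerically. The sketch below, our own illustration, runs steepest descent on a quadratic misfit, the extreme case where every correction is parallel to the negative gradient so that cos(θ_m) = 1; the quadratic and the step size are illustrative.

```python
import numpy as np

# Quadratic misfit J(u) = 0.5 u^T H u - b^T u with gradient g(u) = H u - b.
H = np.diag([1.0, 0.1])
b = np.array([1.0, 1.0])
grad = lambda u: H @ u - b

u = np.zeros(2)
cosines = []
for m in range(200):
    g = grad(u)
    d = -g                                   # steepest-descent correction
    # cosine of the angle between d and the negative gradient, cf. (2.4)
    cos_theta = -(g @ d) / (np.linalg.norm(g) * np.linalg.norm(d))
    cosines.append(cos_theta)
    u = u + 0.5 * d                          # fixed step length
g_final = np.linalg.norm(grad(u))
```

Since cos(θ_m) = 1 ≥ ε_θ uniformly, the Zoutendijk argument forces the gradient norm to zero, which the final value of `g_final` confirms.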

Regularization
In the presence of perturbed noisy data, it makes little sense to improve the approximate solution u^(m),δ at iteration m beyond the error δ in the observations. Instead, for τ > 1, we stop the Adaptive Inversion Algorithm at iteration m*(δ) when the discrepancy principle is satisfied:

‖F(u^(m*),δ) − y^δ‖_{H_2} ≤ τ δ. (2.20)

If we assume that the search spaces Ψ^(m) satisfy

‖u† − Π_{Ψ^(m)} u†‖_{H_1} → 0, m → ∞, (2.21)

where Π_{Ψ^(m)} denotes the projection into Ψ^(m) and u† the exact (noise-free) solution, the following lemma guarantees that the Adaptive Inversion Algorithm will always satisfy (2.20) after a finite number of steps.
Proof. By minimality of u^(m),δ, the residual is bounded by (2.24). From the continuity of F and (2.21) we obtain ‖F(u^(m),δ) − y^δ‖_{H_2} ≤ τδ for all m ≥ M sufficiently large. Thus m*(δ) := M is always well defined. □
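In practice the stopping rule (2.20) amounts to a one-line check on the residual history; the helper below, with illustrative residual values, is our own sketch.

```python
def stop_by_discrepancy(residuals, delta, tau=1.5):
    """Return the first iteration index m* with residual <= tau * delta,
    cf. the discrepancy principle (2.20), or None if it is never reached."""
    for m, r in enumerate(residuals, start=1):
        if r <= tau * delta:
            return m
    return None

# Decreasing residuals from a hypothetical minimizing sequence.
residuals = [1.0, 0.5, 0.2, 0.08, 0.03]
m_star = stop_by_discrepancy(residuals, delta=0.05, tau=1.5)
```

With τδ = 0.075, the iteration stops at the first residual below that threshold.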
Next, we consider the sequence {u^(m*),δ}_{δ > 0} obtained from the Adaptive Inversion Algorithm when stopped via the discrepancy principle (2.20). To show that it converges to the exact (noise-free) solution u† as δ → 0, we assume that the forward operator F is continuous with Fréchet derivative DF(u) and that the standard tangential cone condition (also known as Scherzer condition),

‖F(u) − F(v) − DF(v)(u − v)‖_{H_2} ≤ η ‖F(u) − F(v)‖_{H_2}, (2.27)

holds for some η ∈ (0, 1). Moreover, we assume that F is weakly sequentially closed. Under all the above assumptions, we conclude from [26, Theorem 3.4] that the Adaptive Inversion Algorithm is a genuine regularization method in the sense of [12, Definition 3.1]:

Theorem 2.3. Let Ψ^(m) satisfy (2.21), F be weakly sequentially closed, Fréchet differentiable and also satisfy (2.27). Then, the sequence {u^(m*),δ}_{δ > 0} obtained from the Adaptive Inversion Algorithm 1, stopped at iteration m* = m*(δ) according to (2.20), admits a subsequence converging to a minimizer of J^δ. Moreover, if the minimizer is unique, then the sequence {u^(m*),δ}_{δ > 0} converges to the exact (noise-free) solution u† of the inverse problem (1.3), that is

u^(m*(δ)),δ → u†, δ → 0. (2.29)

If the nullspace of the Fréchet derivative DF(u) is trivial for all u ∈ H_1, a standard contradiction argument yields the uniqueness of u†, and Theorem 2.3 then immediately implies the convergence of {u^(m*),δ}_{δ > 0} to the exact solution. Clearly, those assumptions may not hold in practice, for instance, in a situation of limited indirect observations.
In summary, the Adaptive Inversion Algorithm generates a minimizing sequence {u^(m),δ}_{m ≥ 1} for J^δ and, under standard assumptions, Theorem 2.1 implies that the gradient of J^δ tends to zero; hence, the algorithm is well defined and terminates. In fact, under standard assumptions from convex optimization theory, there exists for every δ a minimizer u^{†,δ} with u^(m),δ → u^{†,δ} as m → ∞. Moreover, under standard assumptions for inverse problems, we deduce that the Adaptive Inversion Algorithm yields a genuine regularization method; thus, u^(m*),δ converges to the exact (noise-free) solution u† of (1.3) as δ → 0.

Adaptive Spectral Decomposition
To apply the Adaptive Inversion Algorithm from Section 2, we must specify the search spaces Ψ^(m) in practice. Here we shall consider low-dimensional search spaces spanned by the first few eigenfunctions of a judicious elliptic operator, which itself depends on the previous iterate. By combining those search spaces with the Adaptive Inversion Algorithm, we shall devise in Section 4 the Adaptive Spectral Inversion Algorithm for the solution of inverse medium problems.
Consider an arbitrary bounded function u : Ω → R which vanishes on the boundary. Next, let V_h ⊂ H¹₀(Ω) be the finite element space of continuous, piecewise polynomials of degree r ≥ 1 on a regular and quasi-uniform mesh with mesh-size h > 0. Denoting by u_h ∈ V_h the standard P_r-FE interpolant of u, we introduce the differential operator

L_ε[u_h] v = −∇ · ( (|∇u_h|² + ε²)^{−1/2} ∇v ), v ∈ H¹₀(Ω), (3.1)

for a small fixed parameter ε > 0. In contrast to u, which may be discontinuous, u_h is continuous and piecewise polynomial; hence L_ε[u_h] is well defined and for h small even uniformly elliptic in Ω [1]; in practice, we always set ε = 10⁻⁸. Hence, there exists a non-decreasing sequence of strictly positive eigenvalues 0 < λ₁ ≤ λ₂ ≤ … with associated L²-orthonormal eigenfunctions ϕ_k,

L_ε[u_h] ϕ_k = λ_k ϕ_k, ϕ_k ∈ H¹₀(Ω). (3.2)

For K ≥ 1, we let Φ_K = span{ϕ₁, …, ϕ_K} and call the L²-projection

Π_K[u_h] u = Σ_{k=1}^K (u, ϕ_k)_{L²(Ω)} ϕ_k (3.3)

the (truncated) Adaptive Spectral (AS) decomposition of u. Note that the operator L_ε[u_h] depends on u_h (and ε), and hence so do its eigenfunctions as well as Π_K[u_h]. To illustrate the AS decomposition, consider the piecewise constant function u : Ω ⊂ R² → R shown in Figure 1a. Next, we (numerically) solve the eigenvalue problem (3.2) with ε = 10⁻⁸ using a P₁-FE discretization with mesh-size h = 0.01. As shown in Figure 2, the first three eigenfunctions ϕ_k, k = 1, 2, 3, capture the inclusions rather well, whereas the subsequent eigenfunctions resemble eigenfunctions of the Laplacian with eigenvalues that scale as 1/ε. Next, we project u into Φ₃ = span{ϕ₁, ϕ₂, ϕ₃}. As shown in Figure 1b, the projection Π₃[u_h]u is hardly distinguishable from u.
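A one-dimensional finite-difference analogue of the AS decomposition conveys the idea (the paper uses P₁ finite elements in 2D): assemble L_ε[u_h] for a single inclusion, compute its eigenpairs, and project u onto the first eigenfunction. The discretization choices below are our own illustration.

```python
import numpy as np

# 1D sketch: eigenfunctions of L_eps[u_h] v = -(mu v')' with
# mu = 1/sqrt(|u_h'|^2 + eps^2) and homogeneous Dirichlet BC.
n, eps = 200, 1e-8
h = 1.0 / n
x = np.linspace(h, 1.0 - h, n - 1)            # interior grid points
u = ((x > 0.3) & (x < 0.7)).astype(float)     # one inclusion, zero at boundary

# mu at the cell midpoints, from the finite-difference gradient of u
u_pad = np.concatenate(([0.0], u, [0.0]))     # include boundary values
du = np.diff(u_pad) / h                       # gradient on the n cells
mu = 1.0 / np.sqrt(du**2 + eps**2)            # huge in flat regions, small at jumps

# assemble the symmetric tridiagonal operator matrix
main = (mu[:-1] + mu[1:]) / h**2
off = -mu[1:-1] / h**2
A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

lam, phi = np.linalg.eigh(A)                  # ascending eigenvalues
phi1 = phi[:, 0]                              # first eigenfunction

# L2 projection of u onto span{phi1}  (phi1 is orthonormal, sign-invariant)
proj = (phi1 @ u) * phi1
rel_err = np.linalg.norm(proj - u) / np.linalg.norm(u)
```

The first eigenfunction is nearly the (normalized) indicator of the inclusion, so a single basis function already represents u to within a small relative error, while the spectral gap to λ₂ (which scales like 1/ε) is enormous.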
In [1, 2], rigorous L²-error bounds were derived for the AS decomposition. More precisely, consider a piecewise constant function with zero boundary values of the form

u = Σ_{k=1}^K α_k χ_{A_k}, α_k ∈ R, (3.5)

where χ_{A_k} is the characteristic function of Lipschitz domains A_k ⊂⊂ Ω with mutually disjoint and connected boundaries ∂A_k.

Next, we partition Ω = M_h ∪ D_h into two open sets (3.6), where M_h is the open tube with diameter h > 0 around the jump discontinuities of u and D_h its complement. In Figure 3, for instance, the sets A_k, M_h, and D_h are shown for u as in Figure 1a. Clearly, the FE interpolant u_h ∈ V_h of u satisfies u_h = u and ∇u_h = 0 in D_h. Moreover, for a sequence of regular and quasi-uniform meshes (T_h)_{h>0}, the measure of M_h vanishes as h → 0.

Fig. 3: AS decomposition: The sets A_k, k = 1, 2, 3, M_h, and D_h from (3.6) for u as in Figure 1b.
In [2], those properties led to the following upper bound for the first K eigenfunctions.
Theorem 3.1. Let u be given by (3.5) and u_h ∈ V_h be its FE interpolant. Then, for every ε > 0 and h > 0 sufficiently small, there exists a constant C > 0, independent of ε and h, such that

‖∇ϕ_k‖_{L²(D_h)} ≤ C √ε, k = 1, …, K.

From Theorem 3.1 we conclude that the first K eigenfunctions of L_ε[u_h] are almost piecewise constant in D_h, that is, away from any discontinuities, when u is piecewise constant with K inclusions. This indicates that the AS decomposition (3.3) is able to approximate u well throughout Ω, which was proved in [1]:

Theorem 3.2. Let u be given by (3.5) and Π_K[u_h]u be its projection. Then, for every ε > 0 and h > 0 sufficiently small, there exists a constant C > 0, independent of ε and h, but possibly depending on u, such that the L²-error ‖u − Π_K[u_h]u‖_{L²(Ω)} becomes arbitrarily small as ε and h decrease.

In summary, the AS decomposition Π_K[u_h]u of u approximates u arbitrarily well, when u is piecewise constant and consists of K inclusions. Note that more general versions of Theorem 3.1 and Theorem 3.2 were proved in [2, Theorem 5] and [1, Theorem 3.6], which also apply to piecewise constant functions with nonzero boundary values.
Finally, we remark that the operator L_ε[u] corresponds to the Fréchet derivative of the regularized TV-functional, TV_ε(u) = ∫_Ω √(|∇u|² + ε²) dx, which reduces to the standard TV-functional TV(u) = ∫_Ω |∇u| dx for ε = 0; see [17, Remark 1]. In fact, for v ∈ H¹₀, v ≈ u_h, the "energy-norm" associated to the operator L_ε[u_h] essentially corresponds to the TV-"energy",

⟨L_ε[u_h] v, v⟩ = ∫_Ω |∇v|² (|∇u_h|² + ε²)^{−1/2} dx ≈ TV_ε(v). (3.10)

The AS decomposition also bears a remarkable resemblance to the spectral decomposition of the nonlinear TV-functional [6, 14].
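The regularized TV-functional is straightforward to evaluate on a grid. The 1D sketch below, our own illustration, checks that an indicator function with two unit jumps has TV_ε ≈ 2, with the deviation controlled by ε.

```python
import numpy as np

def tv_eps(u, h, eps):
    """Regularized total variation of a 1D grid function (trapezoid-free sketch)."""
    du = np.diff(u) / h                       # cell-wise gradient
    return np.sum(np.sqrt(du**2 + eps**2)) * h

n = 1000
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
u = ((x > 0.25) & (x < 0.75)).astype(float)   # unit inclusion: exact TV = 2
tv = tv_eps(u, h, eps=1e-8)
```

The two jump cells each contribute ≈ 1, while the flat cells contribute only O(ε), so tv ≈ 2 as expected.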

Adaptive Spectral Inversion
We shall now combine the simple Adaptive Inversion iteration from Section 2 with the AS decomposition (3.3) from Section 3. Hence, at each iteration, we shall determine the new search space Ψ^(m+1) from the current minimizer u^(m),δ ∈ Ψ^(m) using the eigenfunctions ϕ_k of the elliptic operator L_ε[u^(m),δ] in (3.2). Here, we recall from Theorems 3.1 and 3.2 in Section 3 that a piecewise constant medium u with K inclusions is approximated with high accuracy by the first K eigenfunctions of L_ε[u_h]; see also [1, 2] for further details. First, we merge the current search space Ψ^(m) with those eigenfunctions and reduce its dimension, while ensuring that u^(m),δ is still well represented in the merged space. This first step corresponds to the algorithm previously used in [2]; see Remark 2 below for further details.
Lemma 4.1. Let N_∞ ≥ 1 be the largest index such that the criterion (4.2) holds. Then the angle condition (2.4) is satisfied.
Note that for ε_θ < 1 there always exists N_∞ ≥ 1 sufficiently large such that (4.2) is satisfied.
We refer to (4.2) as the ℓ^∞-criterion and to (4.7) as the ℓ²-criterion. If we let N_θ be as in (4.11), we ensure that there exists at least one element in the subspace Φ_{N_θ} that satisfies the newly introduced angle condition (2.4). Clearly, (4.2) is more stringent than (4.7) and thus N_∞ ≤ N_2.
As we only compute in practice a finite number of eigenfunctions ϕ_k, 1 ≤ k ≤ N, ordered according to their eigenvalues, none might in fact satisfy (4.2). Then we set N_∞ = 0 and thereby N_θ = N_2 in (4.11). In the unlikely case that no N_2 ≤ N satisfies (4.7), one needs to increase N until (4.7) holds to ensure the existence of d ∈ Φ_{N_θ} which satisfies (2.4). This yields the full Adaptive Spectral Inversion (ASI) Algorithm below.

8. Truncate Ψ̃^(m+1) while maintaining the accuracy of u^(m),δ; this yields Ψ̂^(m+1).

m ← m + 1

By combining the previous search space Ψ^(m) with promising new search directions determined from the current iterate u^(m),δ, Steps 6–8 construct a low-dimensional subspace Ψ̂^(m+1) able to represent u^(m),δ up to a small error. In Step 9, we then add the basis functions which will probably contribute most to the reduction of the misfit. Below, we discuss in detail Steps 6–9 of the ASI Algorithm 2.
Here we arbitrarily set the dimension of Φ (m+1) to that of Ψ (m) , though any other sufficiently large number of eigenfunctions could be chosen.
Step 8: Truncate. To truncate Ψ̃^(m+1) and retain only those basis functions ψ̃_k essential for representing u^(m),δ, we proceed in two steps. First, we compute an indicator v close to u^(m),δ, but with minimal TV-"energy", to remove noise and preserve edges. In doing so, we keep the computational costs low by minimizing the linearized TV-functional (3.10). Thus, the indicator v satisfies (4.13) for a prescribed truncation tolerance ε_Ψ > 0; typically, ε_Ψ = 5%. Since (4.13) is a quadratic optimization problem with quadratic inequality constraints, computing v is cheap. Second, we reduce the dimension of Ψ̃^(m+1) by discarding all basis functions that do not contribute much to the indicator v. Let γ_k denote the Fourier coefficients of v, sorted in decreasing order, and sort the L²-orthonormal basis functions ψ̃^(m+1)_k accordingly. Next, we determine the smallest index N_0 such that the relative L²-error in the Fourier expansion truncated at N_0 is below ε_Ψ. To avoid drastic changes in the dimension of the search space, we now calculate ρ = N_0/K_m. For prescribed ρ_0, ρ_1 such that 0 < ρ_0 ≤ 1 ≤ ρ_1, typically ρ_0 = 0.8 and ρ_1 = 1.2, we choose the dimension K̃_{m+1} of the truncated space Ψ̂^(m+1) as follows:

(i) If ρ_0 ≤ ρ ≤ ρ_1, set K̃_{m+1} = N_0.

(ii) If ρ < ρ_0, the dimension decreases too fast. Set K̃_{m+1} = ρ_0 K_m and halve ε_Ψ to avoid a rapid decrease in the dimension at the next iteration.

(iii) If ρ > ρ_1, the dimension increases too fast. Set K̃_{m+1} = ρ_1 K_m and double ε_Ψ to avoid a rapid increase in the dimension at the next iteration.
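The dimension-control rule of Step 8 can be summarized in a few lines; rounding the new dimension to an integer is our own choice, as the paper does not specify it, and the function name is illustrative.

```python
def control_dimension(N0, K_m, eps_psi, rho0=0.8, rho1=1.2):
    """Sketch of the Step-8 rule: damp changes of the search-space dimension.

    N0 is the dimension suggested by the truncated Fourier expansion of the
    indicator, K_m the current dimension; returns (K_next, eps_psi_next)."""
    rho = N0 / K_m
    if rho < rho0:                       # dimension shrinking too fast
        return round(rho0 * K_m), eps_psi / 2
    if rho > rho1:                       # dimension growing too fast
        return round(rho1 * K_m), eps_psi * 2
    return N0, eps_psi                   # moderate change: accept N0 as is

K_next, eps_next = control_dimension(N0=30, K_m=50, eps_psi=0.05)
```

With N_0 = 30 and K_m = 50, the ratio ρ = 0.6 falls below ρ_0, so the dimension is clipped to 0.8 · 50 = 40 and the tolerance is halved.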
In doing so, we use the ℓ^∞- and ℓ²-criteria (4.2) and (4.7) from Lemmas 4.1 and 4.2, respectively, to determine N_θ. In the unlikely event that the AS space Φ^(m+1) from Step 6 contains no eigenfunction that satisfies (4.2) or (4.7), we can either increase the dimension of Φ^(m+1), or set Φ̃^(m+1) = ∅ (and simply proceed). By combining Ψ̂^(m+1) and Φ̃^(m+1), we eventually obtain the subsequent search space Ψ^(m+1). Note that the sensitivity based selection procedure in Step 9 only ensures that the angle condition (2.4) is satisfied by at least one element d in Φ̃^(m+1) ⊂ Ψ^(m+1), but not necessarily by the defect d^(m),δ = u^(m),δ − u^(m−1),δ itself. The latter, stronger condition would require verifying (2.4) at every iteration, and possibly rejecting the new minimizer of (4.12) when not satisfied, while further increasing the search space. Instead, Lemmas 4.1 and 4.2 guarantee that at least one element in the search space satisfies the angle condition, thereby making it rather likely that it will also be satisfied by the correction d^(m),δ.

Remark 2. In the above Adaptive Spectral Inversion (ASI) Algorithm, Steps 1–8 correspond to the ASI Algorithm previously introduced in [2], yet with the added growth control on K_m in Step 8 (iii). Step 9, however, where further eigenfunctions are included based on their sensitivities (4.1), is new and will prove crucial for detecting small-scale features in the medium. Henceforth we denote by ASI₀ the above ASI Algorithm without Step 9, similar to that from [2], to distinguish it from the present ASI Algorithm.

Numerical Results
To illustrate the accuracy and usefulness of the ASI Algorithm from Section 4, we shall now apply it to two inverse medium problems of the form:

min_{u, y} (1/2) ‖y − y^δ‖²_{H_2} (5.1)

subject to the forward problem

A[u] y = f, (5.2)

for a given source f and noisy data y^δ ∈ H_2 as in (1.2). Each inverse problem is governed by a distinct forward problem (5.2) whose solution, for any given medium u, is y = A[u]^{−1} f. Thus, we can eliminate the constraint (5.2), which leads to the equivalent unconstrained minimization problem: Find u^{†,δ} ∈ H_1 such that

u^{†,δ} = argmin_u J^δ(u) := argmin_u (1/2) ‖A[u]^{−1} f − y^δ‖²_{H_2}. (5.3)

For each inverse problem, we shall attempt to recover separately the two different (unknown) media shown in Figure 4. The first consists of six discs, each with a different value and radius. The second consists of three inclusions: an open wedge with a sharp 90° interior angle, a convex drop-like inclusion with a sharp tip, and a kite-shaped inclusion, which is non-convex with a smooth boundary. Since the unknown medium u† must be strictly bounded away from zero, we write u† = 1 + u†₀ for u†₀ with zero boundary. Then, we apply the ASI Algorithm 2 to minimize the misfit

min_u J^δ(1 + u) (5.4)

with u equal to zero at the boundary. In all cases, we apply the ASI Algorithm 2 from Section 4 with the fixed parameter settings listed in (5.5). As initial guess, we always choose u^(0),δ = 1 constant and set the initial search space Ψ^(1) equal to the first K₁ = 50 or K₁ = 100 L²-orthonormal eigenfunctions of the Laplacian sorted in non-decreasing order w.r.t. their eigenvalues.
To assess the accuracy of the ASI method, we shall monitor the following quantities: the dimension K_m of the search space Ψ^(m), the relative error

e_m = ‖u^(m),δ − u†‖_{L²(Ω)} / ‖u†‖_{L²(Ω)}, (5.6)

the ratio τ_m from (2.20), and the total number of iterations m*, where m = m* + 1 is the first index such that τ_m ≤ τ_0; hence, the discrepancy principle (2.20) is then satisfied with τ = τ_{m*}.

Elliptic Inverse Problem
First, we consider as forward problem the elliptic differential equation

−∇ · (u ∇y) = f in Ω, y = 0 on ∂Ω, (5.8)

in the unit square Ω = (0, 1)² with constant right-hand side f = 100; hence, the observations y^δ are available throughout Ω. In [24, 25, 35], the one-dimensional version of (5.8) was considered and shown to satisfy the Scherzer condition (2.27).
Here we shall compare the ASI Algorithm 2 from Section 4 with a standard grid-based inversion method using Tikhonov regularization. We initialize the ASI method with the constant initial guess u^(0),δ = 1 and let Ψ^(1) equal the first K₁ = 100 Laplace eigenfunctions. The forward problem (5.8) is solved in V_h, where V_h corresponds to the subspace of continuous, piecewise linear P₁ finite elements (FE) with mesh size h > 0. Here, we use for both u and y the same fixed triangular mesh with vertices located on a 400 × 400 equidistant Cartesian grid. Despite the large number (160 000) of dof's in the FE representation, we recall that the ASI Algorithm determines the m-th iterate only in the much smaller subspace Ψ^(m) of dimension K_m. In the m-th step of the ASI Algorithm, the minimizer u^(m),δ of (4.12) is determined using standard BFGS together with Armijo [36] instead of Wolfe–Powell line search to keep the number of (expensive) gradient evaluations small.
For the standard Tikhonov L²-regularization approach, we minimize

min_u J^δ(u) + α ‖u‖²_{L²(Ω)}, (5.9)

where α > 0 denotes the regularization parameter. Again, we discretize u and y with P₁-FE on the same triangular mesh as for the ASI method, yet due to the resulting large number of unknowns, we now solve (5.9) using standard limited memory BFGS (L-BFGS) [36] together with Armijo line search. Following [12, Section 11.2], we set the regularization parameter α_n = 2^{−n} at the n-th L-BFGS iteration.
To avoid any potential inverse crime, the exact data y† = y[u†] was computed from a 20% finer mesh and perturbed at each grid point x_j by additive noise η_j (5.10), for a noise level δ ≥ 0. Here, η_j corresponds to Gaussian noise normalized such that the data misfit with y†, y^δ ∈ V_h(Ω) satisfies exactly

‖y^δ − y†‖_{L²(Ω)} = δ ‖y†‖_{L²(Ω)}. (5.11)

In Table 1, we list the relative error e_{m*} at the final iteration m* for the ASI method and L²-Tikhonov regularization, together with the dimension K_{m*} of the search space Ψ^(m*).

Tab. 1: Elliptic inverse problem, six discs: the relative error e_{m*}, the total number of iterations m*, and the dimension K_{m*} of the search space are shown for the ASI method and L²-Tikhonov regularization.

As expected, both methods require fewer iterations m* to achieve (2.20) as the noise level δ increases, while the relative error obtained by the ASI method is always smaller than that achieved by standard L²-Tikhonov regularization. Even with δ = 10%, the ASI method yields a smaller relative error than standard Tikhonov regularization using smaller δ. In Figure 5, we compare the reconstructed media u^(m*),δ for the two methods. Clearly, the reconstruction obtained by the ASI method displays sharper contrasts and crisper edges, while the coefficients inside each disc are more accurate. Moreover, for L²-Tikhonov regularization, the background appears noisy and the coefficients inside the inclusions are not accurately reconstructed. Remarkably, the better reconstruction obtained by the ASI method is achieved with as few as K_{m*} = 100 control variables only, in contrast to grid-based Tikhonov regularization with more than 160 000 unknowns.
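The noise normalization, scaling Gaussian noise so that the relative data misfit equals δ exactly, can be sketched as follows; the data vector below is illustrative.

```python
import numpy as np

# Perturb exact data so that ||y_delta - y_true|| = delta * ||y_true|| exactly,
# our reading of the normalization (5.11).
rng = np.random.default_rng(42)
y_true = np.sin(np.linspace(0.0, np.pi, 101))  # stand-in for the exact data
delta = 0.05                                   # 5% noise level

eta = rng.standard_normal(y_true.shape)
eta *= delta * np.linalg.norm(y_true) / np.linalg.norm(eta)  # rescale noise
y_delta = y_true + eta

rel_misfit = np.linalg.norm(y_delta - y_true) / np.linalg.norm(y_true)
```

By construction, `rel_misfit` equals δ up to floating-point rounding, so the discrepancy-principle threshold τδ is exactly calibrated to the data.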

Method
In Figure 6, we observe that the relative error e_m and ‖DJ^δ(u^(m),δ)‖ decrease throughout all iterations, as expected from Theorem 2.1 in Section 2, while τ_m tends to 1. Note that the dimension K_m, or equivalently the number of control variables, remains small for the ASI method regardless of δ, which keeps the computational effort low. Next, we consider the (unknown) medium u† shown in the right frame of Figure 4, which consists of three distinct inclusions. Again, the ASI Algorithm from Section 4 is able to recover the medium at the various noise levels δ, as shown in Figure 7: all three inclusions are clearly visible with good contrast and sharp edges, except for the reentrant corner of the open wedge with 5% noise. L²-Tikhonov regularization, however, results in more noisy and blurred reconstructions, where the inclusions become hardly visible beyond 5% noise.
From Table 2, we again infer that the relative error e_{m*} of the ASI method always remains below that obtained with L²-Tikhonov regularization: even with δ = 10% noise, the ASI method remains more accurate than L²-Tikhonov regularization with as little as 1% noise. Moreover, the number of control variables K_{m*} used in the ASI method never exceeds 160, in comparison to approximately 160 000 control variables used in the nodal FE representation for L²-Tikhonov regularization. As a consequence, the computational effort of the ASI method always remains well below that with standard Tikhonov regularization.
In Figure 8, we observe that the relative error e_m and ‖DJ^δ(u^(m),δ)‖ decrease with each iteration, while the ratio τ_m tends to 1 and the number of basis functions K_m of the search space Ψ^(m), i.e. the number of control variables, remains small, which again translates to a low computational effort. Note that τ_m ≈ 1 implies that u^(m),δ (nearly) yields an optimal data misfit, because ‖y† − y^δ‖_{L²(Ω)} is of order δ.
Tab. 2: Elliptic inverse problem, three inclusions: the relative error e_{m*}, the total number of iterations m*, and the dimension K_{m*} of the search space are shown for the ASI method and L²-Tikhonov regularization.

Time Dependent Inverse Scattering Problem
Next, we consider wave scattering from an unknown spatially distributed medium illuminated by surrounding point sources. Hence, the forward problem (5.2) now corresponds to the time-dependent wave equation (5.12) in Ω = (0, 1)², with homogeneous initial conditions and first-order absorbing boundary conditions. Here, u(x) denotes the squared wave speed, whereas the sources f_ℓ(x, t) = g_ℓ(x) r(t), ℓ = 1, …, N_s, correspond to smoothed Gaussian point sources in space, centered about distinct locations x_ℓ ∈ Ω, with s = 10⁻² and κ = 200, and a Ricker wavelet [10, 30] in time, with central frequency ν = 10. In Figure 9, snapshots of the solution to the forward problem (5.12) are shown at different times for the source located at the top left corner. The forward problem (5.12) is solved by a standard Galerkin-FE method, where we again discretize u(x) ∈ V_h using piecewise linear P₁-FE on a triangular mesh with vertices located on a 400 × 400 equidistant Cartesian grid. For y(·, t) ∈ V_h in (5.12), however, we use quadratic P₂-FE with mass-lumping [7, 32] on a separate triangular mesh with about 10 elements per wavelength, resulting in approximately 60 000 nodes. For the time integration, we use the standard (fully explicit) leapfrog method with time-step Δt ≈ 4.5 · 10⁻⁴.
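A minimal 1D leapfrog scheme conveys the structure of the explicit time integration; the equation y_tt = u y_xx with homogeneous Dirichlet boundaries is a simplified stand-in for the 2D forward problem with absorbing boundaries, and all parameters below are illustrative.

```python
import numpy as np

n = 200
h = 1.0 / n
x = np.linspace(0.0, 1.0, n + 1)
u_speed2 = np.full(n + 1, 1.0)                 # squared wave speed (constant here)
dt = 0.5 * h / np.sqrt(u_speed2.max())         # CFL-stable time step

y_old = np.exp(-200.0 * (x - 0.5) ** 2)        # Gaussian initial pulse
y_old[0] = y_old[-1] = 0.0
y = y_old.copy()                               # zero initial velocity

for _ in range(400):                           # leapfrog: y^{n+1} = 2y^n - y^{n-1} + dt^2 u y_xx
    lap = np.zeros_like(y)
    lap[1:-1] = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / h**2
    y_new = 2.0 * y - y_old + dt**2 * u_speed2 * lap
    y_new[0] = y_new[-1] = 0.0                 # homogeneous Dirichlet BC
    y_old, y = y, y_new
```

The scheme is fully explicit, so each step costs one stencil application; stability requires the CFL condition enforced when choosing dt.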
To generate the (synthetic) observations, we now place N_s = 32 sources located at x_ℓ equidistributed near the boundary and illuminate the medium, one source at a time. In contrast to the previous example from Section 5.1, here the data y^δ is only available at the boundary nodes, yet for all discrete time-steps t_n = nΔt, n = 0, 1, …, N_T, until the final time T = 2, when the incident wave has essentially left Ω. Hence, we consider the unconstrained formulation (5.3) with the misfit (5.15), which now accounts for the data y^δ_ℓ from the multiple sources ℓ = 1, …, N_s.
To avoid any potential inverse crime, the exact data y†_ℓ = y_ℓ[u†] was computed from a 20% finer mesh. The perturbed (noisy) data y^δ_ℓ ∈ H_2 was then obtained at each boundary node x_j and time-step t_n as in (5.16), for a noise level δ ≥ 0. Here, (η_ℓ)_{j,n} corresponds to normally distributed Gaussian noise, normalized such that the relative data misfit equals δ, cf. (5.17). To reduce computational cost during the inverse iteration, we do not minimize (5.15) directly, but instead use a standard sample average approximation (SAA) [19]: at each iteration, we combine all sources f_ℓ into a single "super-shot"

f^(m) = Σ_{ℓ=1}^{N_s} ξ_ℓ^(m) f_ℓ,

where the ξ_ℓ^(m) = ±1 follow a Rademacher distribution with zero mean. Thus, at each iteration m, we solve the minimization problem for the super-shot f^(m) with its corresponding boundary observations. First, we compare the ASI Algorithm from Section 4 with the former ASI₀ Algorithm, where we omit Step 9 and thus ignore the most sensitive AS basis functions required for the angle condition; see Remark 2. To do so, we consider the (unknown) medium u† consisting of six discs shown in Figure 4. In Table 3, we observe that the ASI and ASI₀ Algorithms perform similarly in terms of the relative error e_{m*} and the dimension of the search space K_{m*}. However, when we compare the reconstructed media u^(m*),δ in Figure 10, we observe that the ASI₀ Algorithm (bottom row) fails to reconstruct the smallest disc with increasing noise, whereas the ASI Algorithm (top row) always recovers even that smallest disc.
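The Rademacher source encoding of the SAA can be sketched as follows; the random stand-in sources are illustrative, and the empirical check at the end verifies the standard unbiasedness identity E‖Σ_ℓ ξ_ℓ f_ℓ‖² = Σ_ℓ ‖f_ℓ‖² that underlies source encoding for a linear source-to-data map.

```python
import numpy as np

rng = np.random.default_rng(7)
N_s, n = 32, 64
sources = rng.standard_normal((N_s, n))       # stand-ins for f_1, ..., f_Ns

def supershot(sources, rng):
    # f^(m) = sum_l xi_l f_l with Rademacher signs xi_l = +/- 1
    xi = rng.choice([-1.0, 1.0], size=sources.shape[0])
    return xi @ sources

f_m = supershot(sources, rng)                 # one encoded "super-shot"

# empirical check of E||f^(m)||^2 = sum_l ||f_l||^2
trials = [np.linalg.norm(supershot(sources, rng))**2 for _ in range(4000)]
ratio = np.mean(trials) / np.sum(np.linalg.norm(sources, axis=1)**2)
```

One forward solve per iteration with a fresh super-shot thus replaces N_s solves, at the cost of sampling noise in the misfit.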
In Figure 11, the relative error em decreases throughout all iterations, while Km, and hence the number of control variables, remains small, which keeps the overall computational cost low. As expected from Theorem 2.1, the norm of the gradient DJ δ (u(m),δ) decreases and the ratio τm from the discrepancy principle (5.7) tends to 1.

Finally, we consider the medium u† shown in Figure 4 with three geometric inclusions. As shown in Figure 12, the ASI Algorithm recovers the shape and height of all three inclusions with high fidelity, regardless of the noise level δ. In Table 4 and Figure 13, we observe that the relative error em remains low for all δ and decreases throughout all iterations. Again, the ratio τm from (5.7) tends to 1 while the norm of the gradient DJ δ (u(m),δ) decreases. For all noise levels δ, the number of basis functions Km never exceeds 140, which greatly reduces the computational effort compared to a standard nodal-based optimization approach with 160 000 control variables. Since the ASI and ASI0 Algorithms performed similarly, the results of the latter are omitted here.
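The stopping test can be sketched generically; the exact constants in (5.7) are not reproduced here, so the sketch below assumes the standard Morozov discrepancy principle with a parameter τ > 1:

```python
import numpy as np

def discrepancy_ratio(residual, delta, tau=1.5):
    """Ratio tau_m = ||F(u_m) - y_delta|| / (tau * delta).

    The iteration stops once tau_m <= 1, i.e. once the data misfit has
    dropped to the noise level.  The exact constants used in (5.7) of the
    paper may differ; tau > 1 is the generic Morozov choice.
    """
    return np.linalg.norm(residual) / (tau * delta)

# stop as soon as the residual norm falls below tau * delta
residual = np.array([0.01, -0.02, 0.015])
stop = discrepancy_ratio(residual, delta=0.05) <= 1.0
```

Monitoring this ratio alongside the gradient norm gives exactly the two convergence indicators plotted in Figures 11 and 13.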
Tab. 4: Inverse scattering problem, three inclusions: the relative error em*, the total number of iterations m*, and the dimension Km* of the search space are shown for the ASI method.

At each iteration, the ASI method minimizes the data misfit with added noise δ in a small subspace Ψ(m), which inherits key information from the previous step while adding promising new search directions from the first few eigenfunctions of the elliptic operator Lε[u(m),δ] in (3.2). Since the operator Lε[u(m),δ] itself depends on the current iterate u(m),δ, so do its eigenfunctions and thus also the new search space Ψ(m+1). The convergence of the ASI method hinges upon the angle condition (2.4), see Theorem 2.1, which has been newly included as a final step of each iteration. The full ASI Algorithm 2 is listed in Section 4. Under suitable assumptions, the ASI iteration stops after finitely many m* iterations when the discrepancy principle (2.20) is satisfied. Hence, the ASI Algorithm yields a genuine regularization method whose solution u(m*),δ converges to the exact (noise-free) solution as δ → 0, without the need for extra Tikhonov regularization; see Theorem 2.3. By adapting the search space at each iteration while keeping its dimension low, the ASI method achieves more accurate reconstructions than standard grid-based Tikhonov L²-regularization, together with a thousandfold decrease in the number of unknowns. Thanks to the newly incorporated angle condition (2.4), the ASI Algorithm is able to detect even the smallest inclusions in the medium, which previous versions of the algorithm [18] at times failed to identify with increasing noise.

Method
The added cost from the numerical solution of the eigenvalue problem (3.2) is rather small. On the one hand, the Galerkin FE discretization leads to a generalized eigenvalue problem which is sparse, symmetric, and positive definite. On the other hand, we only require a few eigenfunctions, which can be computed efficiently by a standard Lanczos iteration. A further reduction in the computational cost can easily be achieved by using an adaptively refined FE mesh when solving (3.2), as in [17], which need not coincide with the mesh used for discretizing the governing PDE. As the higher eigenfunctions become increasingly localized, they can easily be "sparsified" simply by setting to zero the smallest coefficients in their discrete FE representation [17].
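With SciPy, the few required eigenpairs of such a sparse, symmetric, positive definite generalized problem can be obtained via ARPACK's Lanczos-type iteration. A sketch on a 1D stand-in for the FE matrices (the 2D assembly, the operator Lε itself, and the paper's sparsification threshold are assumptions here):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# 1D P1-FE stand-in on (0,1): stiffness A and lumped mass M
n = 200
h = 1.0 / (n + 1)
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc") / h
M = sp.identity(n, format="csc") * h

# a few smallest eigenpairs of  A v = lam M v,  via shift-invert Lanczos
k = 6
vals, vecs = eigsh(A, k=k, M=M, sigma=0.0, which="LM")
idx = np.argsort(vals)
vals, vecs = vals[idx], vecs[:, idx]

# "sparsify" the basis by zeroing the smallest FE coefficients
tol = 1e-3 * np.abs(vecs).max(axis=0)
vecs_sparse = np.where(np.abs(vecs) < tol, 0.0, vecs)
```

For this Dirichlet Laplacian stand-in, the smallest eigenvalue approximates π², and only the k requested eigenpairs are ever formed, which keeps the cost per ASI iteration low.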
Although we have concentrated here on two-dimensional inverse medium problems, the AS decomposition in fact applies to arbitrarily many space dimensions [1]. When the target medium is not piecewise constant but instead smoothly varying, adaptive spectral bases resulting from different elliptic operators may be more effective [18]. The ASI approach may also prove useful for inverse source problems, for instance, or could be combined with alternative (globally convergent) inversion methods [5]. It could further be applied to other inverse problems, unrelated to wave scattering, where the state variable is governed by a different partial differential, or possibly even integral, equation.

Fig. 4: Inverse Problem: The two media u † used in the numerical experiments.Left: six discs; right: three inclusions.

Fig. 6: Elliptic inverse problem, six discs: The relative error (5.6), the ratio τm from the discrepancy principle (5.7), the norm of the gradient, and the dimension of the search space at iteration m for different noise levels δ.

Fig. 8: Elliptic inverse problem, three inclusions: The relative error (5.6), the ratio τm from the discrepancy principle (5.7), the norm of the gradient, and the dimension of the search space, at every iteration m for different noise levels δ > 0.

Fig. 11: Inverse scattering problem, six discs: The relative error (5.6), the ratio τm from the discrepancy principle (5.7), the norm of the gradient, and the dimension of the search space are shown at every iteration m for the ASI method and the different noise levels δ.
Tab. 3: Inverse scattering problem, six discs: the relative error em*, the total number of iterations m*, and the dimension Km* of the search space are shown for the ASI and ASI0 methods.