Towards optimal sensor placement for inverse problems in spaces of measures

The objective of this work is to quantify the reconstruction error in sparse inverse problems with measures and stochastic noise, motivated by optimal sensor placement. To be useful in this context, the error quantities must be explicit in the sensor configuration and robust with respect to the source, yet relatively easy to compute in practice, compared to a direct evaluation of the error by a large number of samples. In particular, we consider the identification of a measure consisting of an unknown linear combination of point sources from a finite number of measurements contaminated by Gaussian noise. The statistical framework for recovery relies on two main ingredients: first, a convex but non-smooth variational Tikhonov point estimator over the space of Radon measures and, second, a suitable mean-squared error based on its Hellinger-Kantorovich distance to the ground truth. To quantify the error, we employ a non-degenerate source condition as well as careful linearization arguments to derive a computable upper bound. This leads to asymptotically sharp error estimates in expectation that are explicit in the sensor configuration. Thus they can be used to estimate the expected reconstruction error for a given sensor configuration and guide the placement of sensors in sparse inverse problems.


Introduction
The identification of an unknown signal µ† comprising finitely many point sources lies at the heart of challenging applications such as acoustic inversion [30,20], microscopy [25,10], astronomy [35], low-rank tensor decomposition [23], linear system identification [3], as well as initial value identification [8,7,22]. Moreover, the recovery of an unknown function by one-hidden-layer neural networks [2,9,29] is intrinsically linked to this task. In all of these contexts the problem is to identify an unknown linear combination (superposition) of functions indexed by a nonlinear parameter from a finite number of measurements. Motivated by inverse point source location tasks, we will refer to the linear parameters as amplitudes and to the nonlinear parameters as locations. Moreover, we will assume that measurements are associated to certain spatial locations, motivated by point-wise measurements of physical quantities. Denoting by Ω_s ⊂ R^d and Ω_o ⊂ R^{d_o}, d, d_o ≥ 1, compact sets of possible source locations and measurement points, a common mathematical framework for the recovery of the locations y_n† ∈ Ω_s and amplitudes q_n† of its N_s† individual point sources can be given by equations of the form

z_j^d = Σ_{n=1}^{N_s†} q_n† k(x_j, y_n†) + ε_j,  j = 1, …, N_o. (1.1)

Here, k ∈ C(Ω_o × Ω_s) denotes a sufficiently smooth given integral kernel (resulting from the modeling of the physical process and the properties of the sensors), and x_j ∈ Ω_o denote measurement locations. Moreover, ε_j is a measurement error for each sensor that, for the purposes of this paper, is thought of as a random perturbation stemming from measurement noise. This type of ill-posed inverse problem is challenging for a variety of reasons. First and foremost, we neither assume knowledge of the amplitudes and positions of the sources nor of their number. This adds an additional combinatorial component to the generally nonlinear, nonconvex problem. Second, inference on µ† is only possible through a finite number of indirect measurements z_d. Additional challenges are given by the appearance of unobservable measurement noise ε in the problem.
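To fix ideas, the measurement model (1.1) can be sketched numerically. The Gaussian kernel and all numbers below are hypothetical illustrations, not choices made in the paper:

```python
import numpy as np

# Hypothetical smooth integral kernel k(x, y); any kernel satisfying the
# regularity assumptions of the paper would do.
def k(x, y, sigma=0.1):
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma**2))

rng = np.random.default_rng(0)

q_true = np.array([1.0, -0.5])               # amplitudes q_n of N_s = 2 sources
y_true = [np.array([0.3]), np.array([0.7])]  # locations y_n in Omega_s = [0, 1]
x_obs = np.linspace(0.0, 1.0, 5)             # N_o = 5 sensor locations x_j

# Noisy measurements z_j = sum_n q_n k(x_j, y_n) + eps_j, cf. (1.1).
noise_std = 0.01
z = np.array([sum(qn * k(np.array([xj]), yn) for qn, yn in zip(q_true, y_true))
              for xj in x_obs])
z += noise_std * rng.standard_normal(z.shape)
```

Neither the number of sources nor their amplitudes or locations are assumed known; only `z` and `x_obs` are available to the reconstruction.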
To alleviate some of these difficulties we identify µ† with a finite linear combination of Dirac measures

µ† = Σ_{n=1}^{N_s†} q_n† δ_{y_n†} ∈ M(Ω_s). (1.2)

Here, M(Ω_s) is the space of Radon measures defined on the location set Ω_s. At first glance, this might seem counter-intuitive: the space M(Ω_s) is much larger than the set of "sparse" signals of the form (1.2). Thus, this lifting should only contribute to the ill-posedness of the problem. However, it also bypasses the nonlinear dependency of k(x_j, ·) on the locations of the sources and enables the use of powerful tools from variational regularization theory for the reconstruction of µ†. In this work, stable recovery of µ† is facilitated by a variational Tikhonov estimator in the space of Radon measures [19,4], which amounts to solving a nonsmooth minimization problem over this space.
However, measurements stemming from experiments are always affected by errors, due to external influences, imperfections of the measurement devices, or human failure. These have to be taken into account in order to guarantee a stable recovery of µ†. In particular, it is evident that the choice of the measurement locations and the quality of the employed sensors are key factors for the successful and robust reconstruction of the signal. This directly leads to the problem of sensor design, which is to identify a measurement configuration leading to recovery guarantees with minimal error for the given effort, in a suitable way. Since the sensor design must usually be chosen before the exact source is known and before the practical measurement has been performed (thus yielding a realization of the noise), this calls for a stochastic framework for the noise. Although much is known about the error caused by deterministic noise [13,11,1,33,34], we are not aware of any works pertaining to the case of stochastic noise in this context. Moreover, existing deterministic bounds on the error of the recovery μ(ε) to the ground truth µ† are not explicit in terms of the measurement locations x_j, the statistical properties of the error ε_j, and the ground truth µ†, and thus cannot directly be used to quantify the influence of the measurement locations on the error. The explicit dependency on the measurement setup is needed to guide the choice of an optimal design that minimizes the expected recovery error for a given cost (often measured in terms of number and quality of sensors), while robustness with respect to the ground truth is desirable if only an approximate guess of the exact source is available (which is the realistic case in practice).
In addition, to quantify the error, estimates are often given separately in terms of positions and coefficients, which can then be translated into an upper bound on the error of the measure; this bound, however, may overestimate the true error by a large factor. To provide a useful bound for sensor placement, we instead start from the error in the recently developed Hellinger-Kantorovich metric [24], which we then link to the parameter errors, leading to a quantitative bound that is asymptotically sharp.
1.1. Sparse inverse problems with deterministic noise. Despite the popularity of sparse inverse problems, most of the existing work, to the best of our knowledge, focuses on deterministic noise ε. Central objects in this context are the (noiseless) minimum norm problem

min_{µ∈M(Ω_s)} ∥µ∥_{M(Ω_s)} subject to Kµ = Kµ† (P_0)

as well as the question whether µ† is identifiable, i.e., its unique solution. A sufficient condition for the latter is, e.g., the injectivity of the restricted operator K|_{supp µ†} as well as the existence of a so-called dual certificate η† ∈ C²(Ω_s), [11], i.e., a subgradient η† ∈ ∂∥µ†∥_{M(Ω_s)} which is in some sense minimal, satisfying a strengthened source condition

|η†(y)| ≤ 1 for all y ∈ Ω_s,  η†(y_n†) = sign(q_n†),  |η†(y)| < 1 for all y ∈ Ω_s \ {y_n†}_{n=1}^{N_s†}.

For example, in particular settings, the groundbreaking paper [6] shows that µ† is identifiable if the source locations y_n† are sufficiently well separated. In this context, several manuscripts, see, e.g., [11,1,34,13] for a non-exhaustive list, study the approximation of an identifiable µ† by solutions to the Tikhonov-regularized problem

μ(ε) ∈ M(ε) := argmin_{µ∈M(Ω_s)} (1/2) ∥Σ_0^{−1/2}(Kµ − z_d)∥_2² + β ∥µ∥_{M(Ω_s)}, (P_{β,ε})

where Σ_0 is a positive definite diagonal matrix and the regularization parameter β = β(∥ε∥) > 0 is adapted to the strength of the noise. This represents a challenging nonsmooth minimization problem over the infinite-dimensional and non-reflexive space of Radon measures. Moreover, due to its lack of strict convexity, its solutions are typically not unique. Under mild conditions on the choice of β, arbitrary solutions μ(ε) approximate µ† in the weak* sense as ε goes to zero. Moreover, it was shown in [11] that if the minimal dual certificate η† associated to problem (P_0) satisfies the strengthened source condition and its curvature does not degenerate around y_n†, then μ(ε) is unique and of the form

μ(ε) = Σ_{n=1}^{N_s†} q̄_n δ_{ȳ_n},

provided that ∥ε∥ and β are small enough.
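To make the variational estimator concrete, the following minimal sketch discretizes Ω_s on a fixed grid, so that (P_{β,ε}) reduces to an ℓ¹-regularized least-squares problem solved by proximal gradient (ISTA) iterations. The Gaussian kernel, the grid, Σ_0 = Id, and all parameters are hypothetical illustrations; gridding is only an approximation of the measure-space problem, not the method analyzed in the paper:

```python
import numpy as np

# Discretize Omega_s = [0, 1]: a measure is approximated by amplitudes q on a
# fixed grid, reducing (P_beta_eps) to least squares + l1 penalty.
def k(x, y, sigma=0.1):
    return np.exp(-((x - y) ** 2) / (2 * sigma**2))

x_obs = np.linspace(0, 1, 20)
grid = np.linspace(0, 1, 200)
K = k(x_obs[:, None], grid[None, :])          # forward matrix K[j, n] = k(x_j, grid_n)

rng = np.random.default_rng(1)
q_true = np.zeros(200)
q_true[60] = 1.0                              # two hidden sources on the grid
q_true[140] = -0.5
z = K @ q_true + 0.01 * rng.standard_normal(20)

beta = 0.05
L = np.linalg.norm(K, 2) ** 2                 # Lipschitz constant of the gradient
q = np.zeros(200)
for _ in range(2000):                         # ISTA: gradient step + soft shrinkage
    q = q - K.T @ (K @ q - z) / L
    q = np.sign(q) * np.maximum(np.abs(q) - beta / L, 0.0)

idx = int(np.argmax(np.abs(q)))               # dominant reconstructed source
```

The soft-shrinkage step is exactly the proximal map of the ℓ¹ penalty; the reconstructed amplitudes concentrate near the true source locations, mirroring the sparsity of solutions of (P_{β,ε}).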

Sparse inverse problems with random noise.
From a practical perspective, assuming knowledge of the norm of the error is very restrictive or even unrealistic, and a statistical model for the measurement error is more appropriate. While the literature on deterministic sparse inversion is very rich, there are only few works dealing with randomness in the problem. We point out, e.g., [5], in which the authors consider additive i.i.d. noise stemming from a low-pass filtering of the signal. A reconstruction μ(ε) is obtained by solving a constrained version of (P_{β,ε}), and the authors show that a reconstruction error estimate holds with high probability, in which the error is measured after applying Q_hi, a convolution with a high-resolution kernel. Moreover, in [34] the authors consider deterministic noise but allow for randomness in the forward operator K. Their main result provides an estimate on an optimal transport energy between two positive measures derived from source and reconstruction. These again hold with high probability. Finally, we also mention [12], in which the authors propose a first step towards Bayesian inversion for sparse problems, i.e., both the measurement noise as well as the unknown µ† are considered to be random variables. A suitable prior is constructed and well-posedness of the associated Bayesian inverse problem is shown.
In this paper, similar to [5], we adopt a frequentist viewpoint on sparse inverse problems and assume that the measurement errors follow a known probability distribution. In contrast, the unknown signal µ† is treated as a deterministic object. In more detail, we assume unbiased independent Gaussian noise with diagonal covariance matrix Σ = diag(σ_j), corresponding to independent measurements with variable quality sensors at different locations. We consider the Tikhonov-type estimator (P_{β,ε}) with Σ_0^{−1} = Σ^{−1}/p, where p = tr(Σ^{−1}), and investigate its error to the ground truth, where we have to account for the randomness of the noise. In statistical terms, Σ^{−1} is the precision matrix of the sensor array, and p can be interpreted as an overall precision of the combined measurement, roughly representing an analogue of 1/∥ε∥ in the stochastic setting. First and foremost, the uncertainty of the noise propagates to the estimator, and thus μ has to be interpreted as a random variable. Second, unlike the deterministic setting of [11], the asymptotic analysis cannot exclusively rely on smallness assumptions on the Euclidean norm of the noise: some realizations of ε might be very large, albeit with small probability. Consequently, reconstructions can exhibit undesirable features such as clustering phenomena around y_n† or spurious sources far away from the true support. In particular, the reconstructed signal may comprise more or fewer than N_s† sources. Thus, we require a suitable distance between signed measures that is compatible with weak* convergence on bounded subsets of M(Ω_s). We find a suitable candidate in generalizations of optimal transport energies [24]; cf. also [9,34].
Despite its various difficulties, stochastic noise also provides new opportunities. For example, unlike in the deterministic case, we are given a whole distribution of the measurement data and not only one particular realization. Clearly, the uncertainty in the estimate critically depends on the appropriate choice of the measurement locations x = (x_j)_{j=1,…,N_o}, the overall precision p, and the relative precision of each sensor, Σ_0^{−1}. Formalizing this connection enables the mathematical program of optimal sensor placement or optimal design, i.e., an optimization of the measurement setup to mitigate the influence of the noise before any data is collected in a real experiment. This requires a cheap-to-evaluate design criterion which allows one to compare the quality of different sensor setups. For linear inverse problems in Hilbert spaces, a popular performance indicator is the mean-squared error of the associated least-squares estimator, which admits a closed form representation through its decomposition into variance and bias; see, e.g., [18]. For nonlinear problems, locally optimal sensor placement approaches rely on a linearization of the forward model around a best guess for the unknown parameters; see, e.g., [36]. To the best of our knowledge, optimal sensor placement for nonsmooth estimation problems and for infinite-dimensional parameter spaces beyond the Hilbert space setting is uncharted territory.
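The closed-form mean-squared error in the linear Hilbert-space case can be illustrated numerically. The sketch below uses a hypothetical linear forward model `A` and verifies by Monte Carlo that the MSE of the unbiased generalized least-squares estimator equals the A-optimal design criterion trace((AᵀΣ⁻¹A)⁻¹); all names and numbers are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((10, 3))     # hypothetical linear model: 10 sensors, 3 parameters
sigma = np.full(10, 0.1)             # per-sensor noise standard deviations
Sigma_inv = np.diag(1.0 / sigma**2)  # precision matrix of the sensor array

# Unbiased GLS estimator m_hat = (A^T S^-1 A)^-1 A^T S^-1 z. Being unbiased, its
# MSE equals its total variance: trace((A^T Sigma^-1 A)^-1), the A-optimal criterion.
Fisher = A.T @ Sigma_inv @ A
mse_closed_form = np.trace(np.linalg.inv(Fisher))

# Monte Carlo confirmation of the closed form.
m_true = np.array([1.0, -2.0, 0.5])
B = np.linalg.solve(Fisher, A.T @ Sigma_inv)          # maps data z to estimate m_hat
Z = A @ m_true + sigma * rng.standard_normal((20000, 10))
M_hat = Z @ B.T
mse_mc = np.mean(np.sum((M_hat - m_true) ** 2, axis=1))
```

In the nonsmooth measure-space setting treated here, no such closed form exists, which is precisely the gap the paper's computable upper bound is meant to fill.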

Contribution.
Taking the mentioned difficulties in the stochastic setting into consideration, we are led to the analysis of the worst-case mean-squared error of the estimator,

MSE[μ] := E_{ε∼γ_p}[ sup_{μ∈M(ε)} d_HK(μ, µ†)² ], (1.3)

where d_HK denotes an extension of the Hellinger-Kantorovich distance introduced in [24] to signed measures (see Section 4) and γ_p is the noise distribution N(0, Σ). We point out that, in comparison to linear inverse problems in Hilbert space, MSE[μ] does not admit a closed form expression, and its computation requires both a sampling of the expected value as well as an efficient way to calculate the Hellinger-Kantorovich distance. This prevents its direct use in the context of optimal sensor placement for sparse inverse problems.
To enable efficient sensor design, we first need to select an appropriate regularization parameter, depending on the noise level. Here, we focus on the a priori choice rule β(p) = β_0/√p for some tunable β_0 > 0, which only takes into account the overall precision of the sensors. For this choice, we provide the following upper bound:

MSE[μ] ≤ (8/p) ψ_{β_0}(x, Σ_0) + c exp(−λ β_0²), (1.4)

where the constant ψ_{β_0}(x, Σ_0) (further detailed below) explicitly depends on the locations and relative precisions, while the constants c and λ depend on the problem setup (the kernel and domain, some basic bounds on the ground truth), on a non-degeneracy parameter of the dual certificate η† (further detailed below), and on quantities that can be bounded by ψ_{β_0}(x, Σ_0) but do not depend on p or β_0 for p ≥ p̲ > 0 and β_0 ≥ β̲_0 > 0; see Theorem 6.1. Thus, under these basic assumptions and by choosing β_0 large enough, the second term in (1.4) becomes negligible, while the first term dominates and closely predicts the mean-squared error. This behavior is confirmed by numerical examples; see Section 7.
To further illustrate the meaning of the constant ψ_{β_0}(x, Σ_0), let us denote by q = (q_1; …; q_{N_s}) and y = (y_1; …; y_{N_s}) the vectors of coefficients and positions of the sources, respectively. Additionally, we collect all the parameters of a given finite source µ in the vector m = (q; y) and introduce the parameter-to-observation map G(m) = Kµ, as well as its Jacobian G′(m†) evaluated at the parameters of the ground truth. Associated to this, we denote the Fisher information matrix I_0 by

I_0 := G′(m†)^⊤ Σ_0^{−1} G′(m†). (1.5)

Then the constant in the estimate above is computed as

ψ_{β_0}(x, Σ_0) = tr(W† I_0^{−1}) + β_0² ∥I_0^{−1}(ρ; 0)∥²_{W†},

with the sign vector ρ = sign q† and a weighted Euclidean norm ∥·∥_{W†} induced by a positive definite matrix W† connected to the ground truth m†. This clarifies how the multiplicative constant in the estimate explicitly depends on the measurement setup, and we note that it closely resembles the "classical" A-optimal design criterion; cf. [18]. Together with the estimate (1.4) and the smallness of the second term, this suggests that ψ_{β_0}(x, Σ_0) is a suitable criterion to quantify the quality of a given design in terms of the MSE (1.3).
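The assembly of the Fisher information matrix from the parameter-to-observation map can be sketched for a hypothetical one-dimensional Gaussian kernel. For illustration the weight W† is replaced by the identity (in the paper it is a specific matrix built from the ground truth), and all kernel parameters and source guesses are assumptions:

```python
import numpy as np

def k(x, y, s=0.1):                  # hypothetical Gaussian kernel, d = 1
    return np.exp(-((x - y) ** 2) / (2 * s**2))

def dk_dy(x, y, s=0.1):              # derivative of k with respect to y
    return k(x, y, s) * (x - y) / s**2

x = np.linspace(0, 1, 15)            # candidate sensor locations
q = np.array([1.0, -0.5])            # best guess for amplitudes
y = np.array([0.3, 0.7])             # best guess for locations, m = (q; y)
Sigma0_inv = np.eye(15)              # relative sensor precisions Sigma_0^{-1}

# Jacobian of G(m) = sum_n q_n k(x, y_n) with respect to m = (q; y).
Gq = np.stack([k(x, yn) for yn in y], axis=1)                       # dG/dq
Gy = np.stack([qn * dk_dy(x, yn) for qn, yn in zip(q, y)], axis=1)  # dG/dy
J = np.hstack([Gq, Gy])

I0 = J.T @ Sigma0_inv @ J            # Fisher information, cf. (1.5)

# Design criterion: variance part + bias part, with W replaced by the identity.
beta0 = 0.1
rho = np.concatenate([np.sign(q), np.zeros_like(y)])   # the vector (rho; 0)
bias = np.linalg.solve(I0, rho)
psi = np.trace(np.linalg.inv(I0)) + beta0**2 * bias @ bias
```

Minimizing `psi` over candidate sensor locations `x` is then a finite-dimensional surrogate for the sensor placement task.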
Concerning the smallness of the second term, we note that the constant λ also depends on a non-degeneracy constant θ > 0, which enters through a further tightening of the assumption on the dual certificate. This non-degenerate source condition on µ† requires the associated minimal norm dual certificate η† to fulfill

|η†(y)| ≤ 1 − θ min{ 1, min_n ∥y − y_n†∥² } for all y ∈ Ω_s (1.6)

for some θ > 0. This condition has been employed in many previous works and is known to hold uniformly in several settings under general assumptions on the measurements and a separation condition on the sources; see, e.g., [33] and the references therein.
The proof of the main result relies on a splitting of the set of measurement errors R^{N_o} into a set of "nice" events A_nice as well as an estimate of the probability of its complement R^{N_o} \ A_nice, related to the second term in (1.4). On A_nice, there is a unique optimal parameter m̄ = (q̄; ȳ) with the correct number of sources that parametrizes μ. Then, the distance between the reconstruction and the ground truth in the Hellinger-Kantorovich distance can be estimated by a weighted Euclidean distance of the parameters. The latter can be further estimated with a linearization of G, which leads to (1.4) after explicitly computing the expectation. This estimate is specific to the choice of d_HK and relies on its interpretation as an unbalanced Wasserstein-2 distance. While similar estimates can be derived for other popular metrics such as the Kantorovich-Rubinstein distance (related to the Wasserstein-1 distance; see Appendix C), this would introduce additional constants in the first term of (1.4) stemming from an inverse inequality between discrete ℓ¹ and weighted ℓ² norms. Thus, the first term in the modified estimate would overestimate the true error by a potentially substantial factor. In contrast, the first term in (1.4) is sharp in the sense that the convenient factor of 8 can, mutatis mutandis, be replaced by any c > 1, at the cost of increasing the constant in the second term.

Further related work.
Sparse minimization problems beyond inverse problems. Minimization problems over spaces of measures represent a sensible extension of ℓ¹-regularization towards decision variables on continuous domains. Consequently, problems of the form (P_{β,ε}) naturally appear in a variety of different applications, detached from inverse problems. We point out, e.g., optimal actuator placement, optimal sensor placement [26], as well as the training of shallow neural networks [2]. Non-degeneracy conditions similar to (1.6) play a crucial role in this context and form the basis for an in-depth (numerical) analysis of the problem, e.g., concerning the derivation of fast converging solution methods [9,14,31] or finite element error estimates [22].
Inverse problems with random noise. Frequentist approaches to inverse problems have been studied previously in, e.g., [16,37]. These works focus on the "lifting" of deterministic regularization methods, as well as of their consistency properties and convergence rates, to the random noise setting. This relies only on minimal assumptions on the inverse problem, e.g., classical source conditions, and thus covers a wide class of settings. Similar to the present work, an important role is played by a splitting of the possible events into a set on which the deterministic theory holds and its small complement. However, we want to stress that the proof of the main estimate in (1.4) is problem-tailored and relies on exploiting specific structural properties of inverse problems in spaces of measures. Moreover, our main goal is not the consistency analysis of an estimator but the derivation of a useful and mathematically sound design criterion for sparse inverse problems.
Organization of the paper. The paper is organized as follows: In Section 3, we recall some properties of the minimum norm problem (P_0) and the Tikhonov-regularized problem (P_{β,ε}) as well as of its solutions. In Section 4, we define the Hellinger-Kantorovich distance and investigate its properties. Section 5 is devoted to the study of the linearized estimate δm. Using these results, we then investigate sparse inverse problems with random noise in Section 6 and provide a sharp upper bound for MSE[μ] in Section 6.2. Finally, in Section 7 we present some numerical examples to verify our theory.

Notation and preliminaries
Before going into the main part of the paper, we introduce the basic notation used throughout the paper and gather preliminary assumptions concerning the considered integral kernels as well as pertinent facts on Radon measures.
2.1. Notation. Throughout the paper, c_i, C_i, i = 1, 2, … denote generic constants that may vary from line to line. By C = C(a, b, …), we indicate that C depends on a, b, …. We denote by Ω_s ⊂ R^d and Ω_o ⊂ R^{d_o} the compact location and observation sets, where d_o, d ≥ 1 and Ω_s has a nonempty interior. A vector in X^m for a set X and m > 1 will be written in bold face; for instance, y = (y_1; …; y_{N_s}) ∈ Ω_s^{N_s}, q = (q_1; …; q_{N_s}) ∈ R^{N_s}, and x = (x_1; …; x_{N_o}) ∈ Ω_o^{N_o} are the vectors of coefficients, positions of sources, and positions of observations, respectively, where the formal definitions are introduced in the sequel. We write (a_1, …, a_n) and (a_1; …; a_n) to stack vectors a_1, …, a_n horizontally and vertically, respectively. We write ∥·∥_p for the usual ℓ^p-norm on R^m. For a vector x ∈ R^m and a positive definite matrix W ∈ R^{m×m}, we define the weighted W-norm of x as ∥x∥_W := ∥W^{1/2} x∥_2; the closed ball in this weighted norm is denoted by B_W(x, r). For a linear map A : X → Y, the operator norm of A is given by ∥A∥_{X→Y} = sup_{∥x∥_X ≤ 1} ∥Ax∥_Y, and operator norms of bilinear maps are defined analogously. Furthermore, let k : Ω_o × Ω_s → R be a real-valued kernel. We introduce the following notation which turns k into a vector-valued kernel: for y ∈ Ω_s, we set k[x, y] := (k(x_1, y); …; k(x_{N_o}, y)) ∈ R^{N_o}. Similarly, for y ∈ Ω_s^{N_s} we have the matrix k[x, y] ∈ R^{N_o×N_s} with entries k[x, y]_{j,n} = k(x_j, y_n). If k(x, ·) is a smooth function in the variable y, we denote the tensor of r-th partial derivatives with respect to y by ∇^r_{y⋯y} k(x, y). In particular, ∇_y k(x, y) and ∇²_{yy} k(x, y) are the gradient and Hessian of k (with respect to the variable y), respectively. We note that ∇_y k : Ω_o × Ω_s → R^d is a vector-valued kernel, and thus we define ∇_y^⊤ k[x, y] as the matrix in R^{N_o×d} whose j-th row is ∇_y k(x_j, y)^⊤. Throughout the paper, by a slight abuse of notation, we denote by ε a variable deterministic noise, a random variable, or its realization; the meaning will be clear from the context. By γ_p we denote the distribution of a multivariate Gaussian random variable with expectation zero and covariance Σ.
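The vector- and matrix-valued kernel notation can be mirrored directly in code. The Gaussian kernel and the point sets below are hypothetical placeholders:

```python
import numpy as np

def k(x, y, s=0.2):                  # hypothetical smooth kernel on Omega_o x Omega_s
    return np.exp(-((x - y) ** 2) / (2 * s**2))

x = np.array([0.0, 0.5, 1.0])        # N_o = 3 observation points
y = np.array([0.25, 0.75])           # N_s = 2 source locations

# k[x, y_n] for a single location y_n: a vector in R^{N_o}.
k_vec = k(x, y[0])

# The N_o x N_s matrix k[x, y] with entries k(x_j, y_n).
k_mat = k(x[:, None], y[None, :])

# For d = 1 the column of grad_y^T k[x, y_n]: the derivative of k in y
# evaluated at each observation point x_j.
grad_y = k(x, y[0]) * (x - y[0]) / 0.2**2
```

This matrix `k_mat` is exactly the object appearing in the discretized forward map of the examples below.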
Further notation, specific to the present manuscript, will be introduced at first appearance.For quicker reference, a notation table can be found in Appendix D.

Preliminaries.
We also recall some basic facts and assumptions for inverse source location.
Integral kernels. Throughout the paper, we assume that the kernel is sufficiently regular:

(A1) k ∈ C(Ω_o × Ω_s), and k(x, ·) is three-times continuously differentiable in the variable y for every x ∈ Ω_o.

By means of the kernel k, we introduce the weak* continuous source-to-measurements operator

K : M(Ω_s) → R^{N_o}, (Kµ)_j := ∫_{Ω_s} k(x_j, y) dµ(y), j = 1, …, N_o.

Moreover, consider the operator

K* : R^{N_o} → C(Ω_s), (K*v)(y) := Σ_{j=1}^{N_o} v_j k(x_j, y).

Then K* is linear and continuous, and there holds (Kµ, v) = ∫_{Ω_s} K*v dµ for all µ ∈ M(Ω_s) and v ∈ R^{N_o}.

Space of Radon measures. We recall some properties of Radon measures. Let Ω ⊂ R^d, d ≥ 1, be a compact set. We define the space of Radon measures M(Ω) as the topological dual of the space C(Ω) of continuous functions on Ω endowed with the supremum norm. It is then a Banach space equipped with the dual norm

∥µ∥_{M(Ω)} = sup { ∫_Ω φ dµ : φ ∈ C(Ω), ∥φ∥_{C(Ω)} ≤ 1 }.

Weak* convergence of a sequence in M(Ω) will be denoted by "⇀*". More specifically, µ_n ⇀* µ if and only if ∫_Ω φ dµ_n → ∫_Ω φ dµ for all φ ∈ C(Ω). Next, by the definition of the total variation norm, its subdifferential is given by

∂∥µ∥_{M(Ω)} = { η ∈ C(Ω) : ∥η∥_{C(Ω)} ≤ 1, ∫_Ω η dµ = ∥µ∥_{M(Ω)} },

see for instance [11]. In particular, for a discrete measure µ = Σ_{n=1}^{N_s} q_n δ_{y_n}, any η ∈ ∂∥µ∥_{M(Ω)} satisfies η(y_n) = sign(q_n), n = 1, …, N_s. Finally, by M_+(Ω) we refer to the set of positive Radon measures on Ω.
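For a discrete measure µ = Σ_n q_n δ_{y_n}, the operators K and K* reduce to matrix-vector products, and the duality pairing (Kµ, v) = Σ_n q_n (K*v)(y_n) can be checked numerically. Kernel and data below are hypothetical:

```python
import numpy as np

def k(x, y, s=0.2):                            # hypothetical kernel
    return np.exp(-((x - y) ** 2) / (2 * s**2))

x = np.linspace(0, 1, 4)                       # measurement points x_j

# For mu = sum_n q_n delta_{y_n}:
#   (K mu)_j  = int k(x_j, .) dmu = sum_n q_n k(x_j, y_n)
#   (K* v)(y) = sum_j v_j k(x_j, y)
def K_apply(q, y):
    return k(x[:, None], y[None, :]) @ q

def Kstar_apply(v, y_eval):
    return k(x[:, None], y_eval[None, :]).T @ v

q = np.array([1.0, -0.5])
y = np.array([0.3, 0.7])
v = np.array([0.1, -0.2, 0.4, 0.3])

lhs = K_apply(q, y) @ v                        # (K mu, v)
rhs = q @ Kstar_apply(v, y)                    # int K* v dmu
```

The agreement of `lhs` and `rhs` is exactly the duality relation stated above, restricted to sparse measures.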

Sparse inverse problems with deterministic noise
Our interest lies in the stable recovery of a sparse ground truth measure by solving the Tikhonov regularization (P_{β,ε}) associated to the inverse problem z_d = Kµ, given noisy data z_d. In this preliminary section, we give some meaningful examples of this abstract setting and briefly recap the key concepts and results in the case of additive deterministic noise, i.e., z_d = Kµ† + ε for a fixed ε ∈ R^{N_o}. In particular, we clarify the connection between (P_{β,ε}) and (P_0) and recall a first qualitative statement on the asymptotic behavior of solutions to (P_{β,ε}) for a suitable a priori regularization parameter choice β = β(ε).

Examples.
Sparse inverse problems appear in a variety of interesting applications. In the following, we give some examples which fit into our setting.
Example 3.1. Consider the advection-diffusion equation

∂_t u + κ · ∇u − ν Δu = 0 in (0, ∞) × R², (3.2)

together with the initial value u(0, ·) = µ. The boundary condition is given by u → 0 as |x| → ∞. This equation describes the rate of change of the concentration u(t, x) of a contaminant. For simplicity, we consider a two-dimensional medium, and both the advection velocity κ = (κ_1, κ_2) and the diffusivity ν > 0 are constant. Here, the solution to (3.2) is given by the convolution u(t, ·) = G(·, t) * µ, where G(x, t) is the Green's function of the advection-diffusion equation, which is given by

G(x, t) = (4πνt)^{−1} exp(−∥x − κt∥² / (4νt)).

Hence, if one seeks to identify the initial value µ from a finite number of measurements at time T > 0, this fits the abstract setting with the kernel k(x_j, y) = G(x_j − y, T).

Example 3.2. Consider the advection-diffusion equation on a bounded smooth domain Ω, together with the Dirichlet boundary conditions u|_{(0,T)×∂Ω} = 0. Then there exists an associated kernel, given in terms of the Green's function of this boundary value problem; see, e.g., [15]. In this case, observations at time T > 0 again fit the abstract setting. For Ω_o ⊂ Ω (i.e., no observations near the boundary), the regularity requirements on ∂Ω are not necessary, since one can employ interior regularity arguments; see, e.g., [17].
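The free-space setting of Example 3.1 admits a direct numerical sketch. The classical 2-D Green's function with constant velocity `kappa` and diffusivity `nu` (both illustrative values) defines the measurement kernel; a source released at `y0` produces a plume whose peak at time `T` sits at the advected location:

```python
import numpy as np

kappa = np.array([1.0, 0.5])         # assumed constant advection velocity
nu = 0.1                             # assumed constant diffusivity

# G(x, t) = exp(-||x - kappa t||^2 / (4 nu t)) / (4 pi nu t)
def G(x, t):
    x = np.asarray(x, dtype=float)
    return float(np.exp(-np.sum((x - kappa * t) ** 2) / (4 * nu * t))
                 / (4 * np.pi * nu * t))

# Measurement kernel for observing u(T, .): k(x_j, y) = G(x_j - y, T).
T = 0.5
def kernel(xj, y):
    return G(np.asarray(xj) - np.asarray(y), T)

y0 = np.array([0.0, 0.0])            # a single unit source
x_peak = y0 + kappa * T              # plume center at time T
val_peak = kernel(x_peak, y0)
val_off = kernel(x_peak + np.array([1.0, 0.0]), y0)
```

This also hints at the sensor placement question: a sensor at `x_peak` is far more informative about this source than one a unit distance away.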

Tikhonov regularization of sparse inverse problems.
In this section, we briefly summarize some preliminary results concerning the regularized problem (P_{β,ε}) as well as its solution set. We start by discussing its well-posedness.
Proposition 3.3. Problem (P_{β,ε}) admits a solution μ. Furthermore, any solution μ to (P_{β,ε}) satisfies

∥μ∥_{M(Ω_s)} ≤ ∥µ†∥_{M(Ω_s)} + (1/(2β)) ∥Σ_0^{−1/2} ε∥_2²,

and the solution set M(ε) is weak* compact.

Proof. Existence of a minimizer of (P_{β,ε}) is guaranteed by [4, Proposition 3.1], noticing that the forward operator K is weak* continuous. For the upper bound we use the optimality of μ compared to µ† as well as the definition of z_d. Moreover, M(ε) is weak* closed since the objective functional in (P_{β,ε}) is weak* lower semicontinuous. Combining both observations, we conclude the weak* compactness of M(ε). □

In particular, note that M(ε) is, in general, not a singleton due to the lack of strict convexity in (P_{β,ε}). Moreover, we recall that the inverse problem was introduced as a lifting of the nonconvex and combinatorial integral equation (1.1). From the same perspective, (P_{β,ε}) can be interpreted as a convex relaxation of the parametrized problem

min_{N∈N, q∈R^N, y∈Ω_s^N} (1/2) ∥Σ_0^{−1/2}( Σ_{n=1}^N q_n k[x, y_n] − z_d )∥_2² + β ∥q∥_1. (3.3)

In the following proposition, we show that this relaxation is exact, i.e., there exists at least one solution to (3.3) and its minimizers parametrize sparse solutions to (P_{β,ε}).
Moreover, (P_{β,ε}) admits at least one solution of this form with N ≤ N_o.
Proof. Given (N, y, q) with y_i ≠ y_j for i ≠ j, note that the sparse measure µ = Σ_{i=1}^N q_i δ_{y_i} satisfies ∥µ∥_{M(Ω_s)} = ∥q∥_1. Hence, one readily verifies min (P_{β,ε}) = inf (3.3) as well as the claimed equivalence, due to the weak* density of the set of sparse measures in M(Ω_s) and since the objective functional in (P_{β,ε}) is weak* lower semicontinuous. The existence of a sparse solution to (P_{β,ε}) follows similarly to [30, Theorem 3.7]. □

The equivalence between both of these problems will play a significant role in our subsequent analysis. Additional insight on the structure of solutions to (P_{β,ε}) can be gained through the study of its first order necessary and sufficient optimality conditions. Since our interest lies in sparse solutions, we restrict the following proposition to this particular case.
Note that η is independent of the particular choice of the solution to (P_{β,ε}). We will refer to it as the dual certificate associated to (P_{β,ε}) in the following. Finally, we give a connection between (P_{β,ε}) and the minimum norm problem (P_0) in the vanishing noise limit. The following general convergence property follows directly from [19].

3.3. Radon minimum norm problems. Following Proposition 3.6, guaranteed recovery of the ground truth measure requires that µ† is identifiable, i.e., the unique solution of (P_0). In this section, we briefly summarize some key concepts regarding (P_0) and state sufficient assumptions for the latter. For this purpose, introduce the associated Fenchel dual problem

sup_{v∈R^{N_o}} (Kµ†, v) subject to ∥K*v∥_{C(Ω_s)} ≤ 1,

as well as the minimal-norm dual certificate η† := K*ζ†, where ζ† denotes the dual solution of minimal Euclidean norm. Note that the existence of ζ†, and therefore of the minimum-norm dual certificate η†, is guaranteed in this setting following [30, Proposition A.2] as well as due to K* : R^{N_o} → C²(Ω_s). Moreover, by standard results from convex analysis, a given µ ∈ M(Ω_s) is a solution to (P_0) if and only if η† ∈ ∂∥µ∥_{M(Ω_s)}. The following assumptions on µ† and η† are made throughout the paper:

(A2) Structure of µ†: We assume that µ† = Σ_{n=1}^{N_s†} q_n† δ_{y_n†} with pairwise distinct locations y_n† in the interior of Ω_s and nonzero amplitudes q_n†.

(A3) Source condition: We assume that the minimum-norm dual certificate η† satisfies η†(y_n†) = sign(q_n†) for all n = 1, …, N_s†.

(A4) Non-degeneracy: We assume that |η†(y)| < 1 for all y ∈ Ω_s \ {y_n†}_{n=1}^{N_s†} and that the operator K restricted to supp µ† is injective.

Here, Assumption A3 is equivalent to η† ∈ ∂∥µ†∥_{M(Ω_s)}, i.e., µ† is indeed a solution to (P_0), whereas Assumptions A2 and A4 imply its uniqueness. While Assumption A4 seems very strong at first glance, it can be explicitly verified in some settings (see, e.g., [6]) and is often numerically observed in practice. According to [11, Proposition 5] we have the following:

Proposition 3.7. Let Assumptions A2-A4 hold. Then µ† is the unique solution of (P_0).
As a consequence, Proposition 3.6 implies μ ⇀* µ†. Moreover, according to [11, Proposition 1], the dual certificates η associated to (P_{β,ε}) approximate the minimal norm dual certificate η† in a suitable sense. Taking into account Assumption A3 as well as Proposition 3.5, we thus conclude that the reconstruction of µ† from (3.3) is governed by the convergence of the global extrema of η towards those of η†. However, in order to capitalize on this observation in our analysis, we need to compute a closed form expression for η†. In general, this is intractable due to the global constraint |η†(z)| ≤ 1, z ∈ Ω_s. As a remedy, the authors of [11] introduce a simpler proxy, replacing this global constraint by finitely many linear conditions, namely η(y_n†) = sign(q_n†) and ∇η(y_n†) = 0 for n = 1, …, N_s†.

The computation of the associated vanishing derivative pre-certificate only requires the solution of a linear system of equations, and it coincides with η† under appropriate conditions; see [11, Proposition 7]. Finally, in order to derive quantitative statements on the reconstruction error between μ and µ†, we require the non-degeneracy of the minimal norm dual certificate of µ† in the sense of [11]. Since we aim to use (1.4) in the context of optimal sensor placement, that is, we need to track the dependence of the involved constants on the measurement setting, we utilize the following quantitative definition; cf. [33].
Definition 3.8. We say that η ∈ C²(Ω_s) is θ-non-degenerate (or θ-admissible) for the sparse measure µ = Σ_{n=1}^{N_s} q_n δ_{y_n} and θ ∈ (0,1] if (3.8) holds with the corresponding weights. Due to the regularity of η, one readily verifies that (3.8) is equivalent to a uniform curvature bound on η at each support point y_n, as well as a uniform gap |η(y)| ≤ 1 − θ away from the support.

Distances on spaces of measures
In order to quantitatively study the reconstruction error of estimators of the source µ†, we introduce a distance function on M(Ω_s) which measures the error between an estimated source measure µ and the reference measure µ†. An obvious choice of distance would be the total variation norm on M(Ω_s); however, it is not suitable for quantifying the reconstruction error. In fact, for two Dirac measures we have d_TV(q δ_{y_1}, q δ_{y_2}) = 2|q| whenever y_1 ≠ y_2; that is, d_TV does not quantify the reconstruction error of the source positions, and small perturbations of the source points lead to a constant error in the metric. Hence, in general, one cannot rely on the TV distance to evaluate the quality of the reconstruction. In the following, we consider an extension of the Hellinger-Kantorovich (H-K) metric [24] to signed measures, which possesses certain properties that will be discussed below. The construction of the H-K distance is more involved than that of another often used candidate, namely the Kantorovich-Rubinstein (K-R) distance (see, e.g., [28,21]) or flat metric, which is directly obtained as a dual norm over a space of Lipschitz functions (see Appendix C). The latter induces the same topology of weak* convergence and is bounded by the H-K metric [24]. Since our estimates are going to be asymptotically sharp in H-K, but only an upper bound in K-R, we focus on H-K in the following.
The Hellinger-Kantorovich metric [24] is a generalization of the Wasserstein-2 distance (see, e.g., [27]) to measures which are not necessarily of the same norm. We first consider the case of non-negative measures µ_1, µ_2 ≥ 0 and define the H-K metric in terms of the Wasserstein-2 metric over the cone R_+ × Ω_s. Here, P_2(R_+ × Ω_s) denotes the probability measures with finite second moment on R_+ × Ω_s, equipped with the two-homogeneous marginal map, and R_+ × Ω_s is endowed with a conic metric involving sin_+(z) := sin(min{z, π/2}). For a detailed study of this metric and its properties, as well as equivalent formulations in terms of Entropy-Transport problems, we refer to [24].
For signed measures, we note that any distance based on a norm (such as the TV or K-R distance) extends via the Jordan decomposition; the analogous construction for d_HK indeed yields a metric on M(Ω_s). In contrast to the total variation distance, the Hellinger-Kantorovich distance between two Dirac measures q_1 δ_{y_1} and q_2 δ_{y_2} can be computed in closed form, which is exactly the conic metric given in (4.1). This makes evident that small perturbations of both the source positions and the coefficients lead to only small changes in the H-K distance. Hence, it is reasonable to employ this type of distance to measure the reconstruction error.
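The two-Dirac case can be sketched numerically. We use the closed form from Liero-Mielke-Savaré as we read it from [24]; treat the exact formula below as an assumption of this sketch.

```python
import numpy as np

# Sketch: the Hellinger-Kantorovich distance between q1*delta_{y1} and
# q2*delta_{y2}, assuming the closed form
#   d_HK^2 = q1 + q2 - 2*sqrt(q1*q2)*cos(min(|y1 - y2|, pi/2)),
# consistent with the conic metric and the cutoff sin_+ described above.
def d_hk_dirac(q1, y1, q2, y2):
    return np.sqrt(q1 + q2
                   - 2 * np.sqrt(q1 * q2) * np.cos(min(abs(y1 - y2), np.pi / 2)))

# Small position perturbations give small distances (unlike d_TV = 2|q|):
print(d_hk_dirac(1.0, 0.0, 1.0, 1e-3))            # order 1e-3
# Far-apart spikes saturate (no transport takes place):
print(round(d_hk_dirac(1.0, 0.0, 1.0, 3.0), 6))   # -> 1.414214 = sqrt(2)
# Same position: pure Hellinger distance |sqrt(q1) - sqrt(q2)|:
print(round(d_hk_dirac(4.0, 0.0, 1.0, 0.0), 6))   # -> 1.0
```

The three regimes (near-Hellinger, transport, saturation) are what makes the distance robust for sparse reconstruction errors.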
A further advantage of the H-K distance is that it is compatible with the weak* topology on M(Ω_s): it induces weak* convergence on bounded sets in M(Ω_s). Proof. Assume that d_HK(µ_n, µ) → 0 as n → ∞. Writing the difference via the decomposition into non-negative parts, and since the HK-distance metrizes weak* convergence on bounded sequences of non-negative measures (see [24, Theorem 7.15]), we have µ¹_n − µ²_n ⇀* 0, which means that µ_n ⇀* µ. Conversely, assume that µ_n ⇀* µ. Consider the decomposition (4.4) and suppose that the distance d_HK(µ_n, µ) does not converge to zero. Then there exists a subsequence, denoted by the same symbol, such that (4.5) holds. We now use the fact that ∥µⁱ_n∥_M ≤ 2M to extract a further subsequence (again denoted by the same symbol) such that µⁱ_n ⇀* µⁱ. Due to (4.5) and the fact that the HK-distance metrizes weak* convergence on bounded sequences of non-negative measures, we have µ¹ ≠ µ² and thus µ_n − µ ⇀* µ¹ − µ² ≠ 0. Hence the subsequence {µ_n}_{n∈N} does not converge weak* to µ, and the original sequence {µ_n}_{n∈N} cannot converge to µ. □ To evaluate the reconstruction error, the distance between finitely supported measures is needed, since the reference measure as well as the reconstructed measure are known to be sparse. In fact, we only need a (sharp) upper bound for the H-K distance, which is provided for the finitely supported case below in terms of a (weighted) ℓ²-type distance. This is yet another advantage of the H-K distance in comparison to other distances. Proposition 4.2. Let µ and µ† be finitely supported with the same number N of support points and sign q_n = sign q†_n for all n = 1, …, N. Then the bound holds with the weight ratio R(q, q†). Loosely speaking, the H-K distance between two discrete measures µ and µ† with the same number of support points can be bounded above by a weighted ℓ²-type distance of their corresponding coefficients and positions.

Fully explicit estimates for the deterministic reconstruction error
The Hellinger-Kantorovich distance allows us to quantify the reconstruction error between the unknown source µ† and measures obtained by solving (P_{β,ε}). This will be done in two steps. First, we study the approximation of m† = (q†; y†), i.e., the support points and coefficients of the ground truth, by stationary points m = m(ε) of the nonconvex parametrized problem (5.1) posed over (R × Ω_s)^{N_s}, where the source-to-observable map G satisfies (5.2). By Assumption A1, the latter is three times differentiable. Notice that (5.1) is obtained from (3.3) by fixing N_s = N†_s source points in the formulation. Hence, solutions, let alone stationary points, of problem (5.1) do not parametrize minimizers of (P_{β,ε}) in general. Moreover, it is clear that problem (5.1) is primarily of theoretical interest, since its practical realization requires knowledge of N†_s. Thus, in a second step, we investigate for which noises ε the point m parametrizes the unique solution of (P_{β,ε}). While these results build upon similar techniques as [11], we give a precise, quantitative characterization of this asymptotic regime and clarify the dependence of the involved constants on the problem parameters, e.g., the measurement points x. This is necessary both for lifting these deterministic results to the stochastic setting in Section 6 and for utilizing the derived error estimates in the context of optimal sensor placement. However, since these are merely intermediate steps in the derivation of our main result, we omit a detailed exposition at this point and direct the interested reader to Appendix B. In the following, a central role will be played by the linearized problem (5.3). Note that here we have linearized both the mapping G and the nonsmooth regularization term. The following proposition characterizes the solutions of (5.1) and (5.3). Since its proof relies on standard computations, we omit it for the sake of brevity.
for some ρ ∈ ∂∥q∥_1. The solutions of (5.3) satisfy an analogous linearized stationarity condition. If G′(m†) has full column rank, then the Fisher information matrix is invertible and the unique solution of (5.3) is given by the closed form (5.6). Since (5.1) is nonconvex, the stationarity condition (5.4) is only necessary but not sufficient for optimality. In the following, we call any solution to (5.4) a stationary point.
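The closed-form solution of the linearized problem via the Fisher information matrix can be sketched as follows. The matrices are random stand-ins for G′(m†) and Σ_0, and the β-dependent offset in (5.6) is omitted; this is an illustrative weighted least-squares computation, not the paper's exact formula.

```python
import numpy as np

# Sketch: if G'(m) has full column rank, the linearized problem reduces to a
# weighted least-squares problem whose unique solution is obtained by solving
# the normal equations with the Fisher information matrix F = J^T Sigma0^{-1} J.
rng = np.random.default_rng(0)
No, M = 9, 6                              # sensors, stacked parameters (q; y)
J = rng.standard_normal((No, M))          # stand-in for G'(m^dagger)
Sigma0_inv = np.diag(1.0 / rng.uniform(0.5, 2.0, No))
eps = rng.standard_normal(No)             # a noise realization

F = J.T @ Sigma0_inv @ J                  # Fisher information matrix
delta_m = np.linalg.solve(F, J.T @ Sigma0_inv @ eps)

# Cross-check: the same solution from the whitened ordinary least-squares.
W = np.sqrt(Sigma0_inv)
ref, *_ = np.linalg.lstsq(W @ J, W @ eps, rcond=None)
print(np.allclose(delta_m, ref))          # -> True
```

Full column rank of J is exactly what makes F invertible and the solution unique.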

Error estimates for stationary points.
In this section, we show that for sufficiently small noise ε, problem (5.1) admits a unique stationary point m(ε) in the vicinity of m†. Moreover, loosely speaking, m† and m† + δm(ε) provide Taylor expansions of zeroth and first order, respectively, for m(ε). Proposition 5.2. Suppose that G′(m†) has full column rank. Then the corresponding expansions hold for some constant C_1. For the sake of brevity, we postpone the proof of Proposition 5.2 to Appendix B. Remark 5.3. We note that C_1 depends monotonically on the norm of the inverse Fisher information matrix; see Remark B.3. Moreover, the dependence on the ground truth µ† is only in terms of the norm ∥q†∥_1, the distances of y†_n to the boundary, and the distances of q†_n to zero.

Error estimates for reconstructions of the ground truth.
As mentioned in the preceding section, solving the stationarity equation (5.4) for m = (y, q) is not feasible in practice, since it presupposes knowledge of N†_s. Moreover, recalling that m is merely a stationary point, the parametrized measure is not necessarily a minimizer of (P_{β,ε}). In this section, our primary goal is to show that m indeed parametrizes the unique solution of problem (P_{β,ε}) if the minimum norm dual certificate η† associated to (P_0) is θ-admissible and if the set of admissible noises ε is further restricted. A fully explicit estimate for the reconstruction error between µ and the ground truth µ† in the Hellinger-Kantorovich distance then follows immediately. For this purpose, recall from [11, Proposition 7] that the non-degeneracy of η† implies that it coincides with η_PC, where η_PC denotes the vanishing derivative pre-certificate from Section 3.3.
We first prove that the certificate η associated with m is θ/2-admissible for certain ε and β.
The proof of Proposition 5.4 is then provided in Appendix B.
Remark 5.5.We note that C 2 depends monotonically on the norm of the inverse Fisher information matrix; see Remark B.5.Moreover, the dependency on the ground truth µ † is only in terms of the norm ∥q † ∥ 1 , and distances of y † n to the boundary and q † n to zero.
As a consequence, we conclude that the solution to (P_{β,ε}) is unique and parametrized by m. Moreover, its H-K distance to µ† can be bounded in terms of the linearization δm. Theorem 5.6. Let the assumptions of Proposition 5.4 hold. Then the solution of (P_{β,ε}) is unique and given by µ from (5.7), and the estimate (5.11) holds. Proof. From Proposition 5.4, we conclude that η is θ/2-admissible for µ. Consequently, we have η ∈ ∂∥µ∥_{M(Ω_s)}, i.e., µ is a solution of (P_{β,ε}). It remains to show its uniqueness. For this purpose, it suffices to argue that k[x, y] is injective; see, e.g., the proof of [31, Proposition 3.6]. Assume that this is not the case. Then, following [30, Theorem B.4], there is v ≠ 0 with k[x, y]v = 0 and τ ≠ 0 such that the measure μ parametrized by m = (q; y) with q = q + τv is also a solution of (P_{β,ε}) (choose the sign of τ so as not to increase the ℓ¹-regularization, and the magnitude small enough not to change the sign of q) and q ≠ q. For s ∈ (0, 1), set q_s = (1 − s)q + sq. By convexity of (P_{β,ε}), the measure parametrized by m_s = (q_s; y) is also a minimizer of (P_{β,ε}). Consequently, m_s is a solution of (5.1) and thus also a stationary point. Finally, noting that m_s ≠ m for s ∈ (0, 1) and lim_{s→0} m_s = m, we arrive at a contradiction to the uniqueness of stationary points in the vicinity of m†. The estimate in (5.11) follows immediately. □

Inverse problems with random noise

Finally, let (D, F, P) denote a probability space and consider the stochastic measurement model in which the noise is distributed according to ε ∼ γ_p = N(0, p⁻¹Σ_0) for some p > 0 representing the overall precision of the measurements. Mimicking the deterministic setting, we are interested in the reconstruction of the ground truth µ† by solutions obtained from (P_{β,ε}) for realizations of the random variable ε. By utilizing the quantitative analysis presented in the preceding section, we provide an upper bound on the worst-case mean-squared error for a suitable a priori parameter choice rule β = β(p). Note that the expectation is well-defined according to Appendix A.2.

A priori parameter choice rule.
Before stating the main result of the manuscript, let us briefly motivate the particular choice of the misfit term in (P_{β,ε}) as well as the employed parameter choice rule from the perspective of the stochastic noise model. Since we consider independent measurements, their covariance matrix Σ = p⁻¹Σ_0 is diagonal with Σ_jj = σ²_j for variances σ²_j > 0, j = 1, …, N_o. This corresponds to performing the individual measurements with independent sensors of variable precision p_j = 1/σ²_j. We call p the total precision of the sensor array. Its reciprocal σ²_tot = 1/p corresponds to the harmonic average of the variances divided by the number of sensors N_o. Therefore, the misfit in (P_{β,ε}) can be expressed through Σ_0 alone. For identical sensors and measurements ε ∼ N(0, Id_{N_o}), this simply leads to the scaled Euclidean norm. In general, by increasing the total precision p of the sensor setup, we improve the measurements by proportionally decreasing the variances by σ²_tot. While this decreases the expected level of noise through its distribution, it does not affect the misfit functional, which is only influenced by Σ_0, i.e., the normalized variances σ²_{0,j} = σ²_j/σ²_tot. Moreover, since ε ∼ N(0, Σ), we have Σ^{−1/2}ε ∼ N(0, Id_{N_o}), and by direct calculation a tail estimate holds. Hence, with high probability, realizations of the error fulfill ∥ε∥_{Σ_0^{−1}} ≲ 1/√p. Thus, we consider the expected noise level σ_tot = 1/√p as an (expected) upper bound for the noise. This motivates the parameter choice rule β(p) = β_0/√p for some β_0 > 0 large enough.
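The noise scaling behind this parameter choice rule can be sketched numerically. The normalized variances below are arbitrary illustrative values.

```python
import numpy as np

# Sketch: with eps ~ N(0, p^{-1} Sigma_0), the whitened noise satisfies
# E ||Sigma^{-1/2} eps||^2 = N_o (a chi-squared mean), so a typical realization
# has ||eps||_{Sigma_0^{-1}} of order 1/sqrt(p). This motivates
# beta(p) = beta_0 / sqrt(p).
rng = np.random.default_rng(1)
No, p = 9, 1e4
sigma0 = rng.uniform(0.5, 2.0, No)                 # normalized variances (assumed)
eps = rng.normal(0.0, np.sqrt(sigma0 / p), size=(20000, No))

whitened_sq = ((eps**2) * p / sigma0).sum(axis=1)  # ||Sigma^{-1/2} eps||^2 samples
print(whitened_sq.mean())                          # approx No = 9

noise_norm = np.sqrt((eps**2 / sigma0).sum(axis=1))   # ||eps||_{Sigma_0^{-1}}
print(noise_norm.mean() * np.sqrt(p))              # O(1): noise scales like 1/sqrt(p)
```

Increasing p shrinks every realization proportionally, while the misfit weights Σ_0 stay fixed, exactly as described above.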

6.2. Quantitative error estimates in the stochastic setting. We are now prepared to prove a quantitative estimate on the worst-case mean-squared error by lifting the deterministic result of Theorem 5.6 to the stochastic setting.
Theorem 6.1. Assume that η† is θ-admissible for θ ∈ (0, 1) and set β(p) = β_0/√p. Then there exists p̄ > 0 such that for p ≥ p̄, the estimate (6.1) holds. In addition, the expectation E_{γ_p}[∥δm∥²_{W†}] admits the closed form (6.2). Proof. Define the sets A_1 and A_2. By a case distinction, we readily verify the splitting of ∫ d_HK(µ, µ†)² dγ_p(ε) into I_1 and I_2; in particular, for ε ∈ A_1, ε satisfies (5.10). Moreover, expanding the square in the definition of A_2, we conclude that (5.9) also holds due to the estimate involving 2C_4 and ∥ε∥. Hence, for ε ∈ A_1 ∩ A_2, there holds M(ε) = {µ} and the supremum over µ ∈ M(ε) is controlled by Theorem 5.6. Next, we estimate I_1 by applying Proposition 3.3 and [24, Proposition 7.8]. Together with Lemma A.1, this yields the claimed bound, where the first inequality follows from ε ∉ A_2 and the second from ε ∈ A_1. Hence, if p̄ is chosen large enough, the intersection of the complement of A_2 with A_1 is empty and I_2 = 0. Together with (6.3)-(6.5), we obtain (6.1) for every p ≥ p̄. The equality in (6.2) follows immediately from the closed form expression (5.6) for δm and ε ∼ N(0, p⁻¹Σ_0). □ Let us interpret this result: by choosing β_0 large enough, the second term on the right-hand side of (6.6) becomes negligible, i.e., the bound is dominated by the linearized term up to a factor 1 + δ for some 0 < δ ≪ 1. As a consequence, due to its closed-form representation (6.2), E_{γ_p}[∥δm∥²_{W†}] provides a computationally inexpensive, approximate upper surrogate for the worst-case mean-squared error which vanishes as p → ∞. Moreover, due to its explicit dependence on the measurement setup, it represents a suitable candidate for an optimal design criterion in the context of optimal sensor placement for the class of sparse inverse problems under consideration. This potential will be further investigated in a follow-up paper. Remark 6.2. It is worth mentioning that the constant 8 appearing on the right-hand side of (6.6) is not optimal and is primarily a result of the proof technique. In fact, by appropriately selecting constants in Propositions B.2 and 5.2, it is possible to replace 8 by 1 + δ with 0 < δ ≪ 1, at the cost of increasing p̄. We will illustrate the sharpness of the estimate of the worst-case mean-squared error by E_{γ_p}[∥δm∥²_{W†}] in the subsequent numerical results.
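The computational cheapness of the linearized surrogate stems from the fact that δm is linear in the Gaussian noise, so its expected weighted norm is a trace. The following sketch verifies this against Monte-Carlo sampling; all matrices are random stand-ins and the β-dependent offset of (5.6) is omitted.

```python
import numpy as np

# Sketch: with delta_m = F^{-1} J^T Sigma0^{-1} eps and eps ~ N(0, Sigma0/p),
# one has Cov(delta_m) = F^{-1}/p, hence E||delta_m||_W^2 = trace(W F^{-1})/p.
rng = np.random.default_rng(2)
No, M, p = 9, 6, 1e4
J = rng.standard_normal((No, M))                   # stand-in for G'(m^dagger)
Sigma0 = np.diag(rng.uniform(0.5, 2.0, No))
Wt = np.diag(rng.uniform(0.5, 1.5, M))             # stand-in weight matrix W^dagger

F = J.T @ np.linalg.solve(Sigma0, J)               # Fisher information matrix
closed_form = np.trace(Wt @ np.linalg.inv(F)) / p  # exact expectation

A = np.linalg.solve(F, J.T @ np.linalg.inv(Sigma0))
eps = rng.multivariate_normal(np.zeros(No), Sigma0 / p, size=50000)
dm = eps @ A.T                                     # Monte-Carlo samples of delta_m
mc = np.einsum('ij,jk,ik->i', dm, Wt, dm).mean()
print(abs(mc - closed_form) / closed_form < 0.05)  # -> True (MC agrees)
```

In contrast, the full mean-squared error requires solving (P_{β,ε}) per sample, which is what the surrogate avoids.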
Remark 6.3. Relying on similar arguments as in the proof of Theorem 6.1, we are also able to derive pointwise estimates on the Hellinger-Kantorovich distance which hold with high probability. Indeed, noticing that (6.4) holds on the set A_1 ∩ A_2, we derive a lower bound for its probability. By invoking Lemma A.1, one obtains a corresponding tail estimate. Hence, since exp(−x²) → 0 as x → ∞, for every δ ∈ (0, 1) one can choose β_0 and p large enough such that the event A_1 ∩ A_2 has probability at least 1 − δ. Therefore, with probability at least 1 − δ, the pointwise estimate holds for realizations ε of the noise. Furthermore, employing Lemma A.1 again, we know that with probability at least 1 − δ, and independently of p, one has ∥ε∥²_{Σ⁻¹} ≤ −2N_o ln(δ/2). Hence, by Proposition 5.2, together with ε ∈ A_1 ∩ A_2, the assertion holds with probability at least 1 − 2δ.

Numerical results
We end this paper with the study of some numerical examples illustrating our theory. We consider a simplified version of Example 3.1:
• the source domain Ω_s and the observation domain Ω_o;
• the measurement points {x_1, …, x_{N_o}} ⊂ Ω_o, which vary between the individual examples and are marked by grey points in the respective plots.
The associated noise model is given by ε ∼ N(0, Σ) with Σ⁻¹ = pΣ_0⁻¹. Following our theory, we attempt to recover µ† by solving (P_{β,ε}) using the a priori parameter choice rule β(p) = β_0/√p. The regularized problems are solved by the Primal-Dual-Active-Points (PDAP) method [26,31], yielding a solution μ. Since the action of the forward operator K on sparse measures can be computed analytically, the algorithm is implemented in a grid-free manner. In addition, we compute a stationary point m of the nonconvex problem (5.1), inducing the measure µ from (5.7). This is done by an iteration similar to the Gauss-Newton sequence (B.10), with a nonsmooth adaptation to handle the ℓ¹-norm and an added globalization procedure to ensure convergence without restrictions on the data. We note that this solution depends on the initialization of the algorithm at m†, which is usually unavailable in practice. To evaluate the reconstruction results in a qualitative way, we follow [11] by considering the dual certificates and pre-certificates; see Section 3. Our Matlab implementation is available at https://github.com/hphuoctruong/OED_SparseInverseProblems.
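The generation of synthetic noisy measurements used in the examples can be sketched as follows. The Gaussian kernel, source configuration, and numerical values are illustrative assumptions mirroring the described setup, not the repository's actual parameters.

```python
import numpy as np

# Sketch: synthetic data z = K(mu^dagger)(x_j) + eps_j for a sparse ground
# truth mu^dagger = sum_n q_n delta_{y_n}, with eps ~ N(0, p^{-1} Sigma_0)
# and the a priori rule beta(p) = beta_0 / sqrt(p).
rng = np.random.default_rng(3)
s = 0.3
k = lambda x, y: np.exp(-(x - y)**2 / (2 * s**2))   # assumed kernel

x = np.linspace(-1.0, 1.0, 9)                  # measurement points (assumed)
y_true = np.array([-0.4, 0.5])                 # source locations (assumed)
q_true = np.array([1.0, -0.8])                 # source amplitudes (assumed)

p = 1e4
sigma0 = np.ones_like(x)                       # normalized variances Sigma_0 = Id
z_exact = k(x[:, None], y_true[None, :]) @ q_true   # (K mu^dagger)(x_j)
eps = rng.normal(0.0, np.sqrt(sigma0 / p))
z = z_exact + eps

beta = 2.0 / np.sqrt(p)                        # beta_0 = 2, as in Example 1
print(z.shape, beta)
```

The solver (PDAP or a Gauss-Newton-type iteration) then operates on the vector z alone.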
Example 1. In the first example, we illustrate the reconstruction capabilities of the proposed ansatz for different measurement setups, with and without noise in the observations. To this end, we attempt to recover the reference measure µ† using a variable number N_o of uniformly distributed sensors. For noisy data, the regularization parameter is selected as β = β_0/√p with β_0 = 2 and p = 10⁴. We first consider exact measurement data with N_o ∈ {6, 9, 11} and try to obtain µ† by solving (P_0). The results are shown in Figure 1. We observe that with 6 sensors, the pre-certificate η_PC is not admissible. Recalling [11, Proposition 7], this implies that µ† is not a minimum norm solution. In contrast, the experiments with 9 and 11 uniform sensors provide admissible pre-certificates. In these situations, the pre-certificates coincide with the minimum norm certificates and the ground truth µ† is indeed an identifiable minimum norm solution. Next, we consider noisy data and solve (P_{β,ε}) for the aforementioned choice of β(p). Following the observations for exact data, we only evaluate the reconstruction results obtained by 9 and 11 uniform sensors. In the absence of measurement data obtained from experiments, we generate synthetic noisy measurements in which the noise vector ε is a realization of the Gaussian random variable ε ∼ N(0, Σ). The results are shown in Figure 2. Since µ† is identifiable in these cases, µ and μ coincide and closely approximate µ† with high probability for an appropriate choice of β_0 and p large enough. Both properties can be clearly observed in the plots, where β_0 = 2.
Example 2. In the second example, we study the influence of the parameter choice rule on the reconstruction result. To this end, we fix the measurement setup to 9 uniformly distributed sensors. We recall that the a priori parameter choice rule is given by β(p) = β_0/√p. According to Section 6.2, selecting a sufficiently large value for β_0 is recommended to achieve a high-quality reconstruction. To determine a useful range of regularization parameters, we solve problem (P_{β,ε}) for a sequence of regularization parameters using PDAP. Here, we choose β_0 ∈ {0.5, 1, 2} and p ∈ {10⁴, 10⁵, 10⁶}. In Figure 3, different reconstruction results are shown for the same realization of the noise, β_0 ∈ {0.5, 1, 2}, and p = 10⁴. As one can see, for this particular realization of the noise, the number of spikes is recovered exactly in the case β_0 = 2, and we again observe that µ = μ. In contrast, for smaller β_0, the noisy pre-certificate is not admissible. Hence, while µ still provides a good approximation of µ†, μ admits two additional spikes away from the support of µ†. These observations can be explained by Theorem 6.1: the second term on the right-hand side of the inequality becomes negligible for increasing β_0 and large enough p. Thus, roughly speaking, the parameter β_0 controls the probability of the "good events" in which µ is the unique solution of (P_{β,ε}).
Finally, we address the reconstruction error from a quantitative perspective. For this purpose, we simplify the evaluation of the maximum mean-squared error (MSE) by inserting the solution μ computed algorithmically. We note that this can only lead to an under-estimation of the maximum error in the case of non-unique solutions of (P_{β,ε}), a degenerate case that is unlikely to occur in practice. Moreover, the expectation is approximated using 10³ Monte-Carlo samples. Additionally, we use the closed form expression (6.2) to evaluate the linearized estimate E_{γ_p}[∥δm∥²_{W†}] exactly. Here, the expectations are computed for β_0 ∈ {2, 0.5}. The results are collected in Table 1, with columns corresponding to p ∈ {10⁴, 10⁵, 10⁶}. We make several observations: clearly, the MSE decreases for increasing p, i.e., lower noise level. For increased β_0, the behavior differs: for the theoretical quantities m and δm, an increased β_0 only introduces additional bias and thus increases the error. For the estimator μ, the increased regularization however leads to generally improved results, since the probability of µ ≠ μ is decreased.

Table 1. Reconstruction results with β_0 = 2 and β_0 = 0.5.

We highlight in bold the estimator which performed best for each β_0. Here, the results conform to Theorem 6.1: for larger β_0, the second term on the right-hand side of (6.1) is negligible and the linearized estimate provides an excellent bound on the MSE for both µ and μ. We also note that the estimate is closer to the MSE in the limiting case of larger p. In contrast, for β_0 = 0.5, the linearized estimate and the MSE of µ are much smaller than the MSE of the estimator μ. This underlines the observation that Theorem 5.6 requires further restrictions on the admissible noises in comparison to Proposition 5.2.
Example 3. The final example is devoted to comparing the reconstruction results obtained by uniform designs with those of an improved design chosen by heuristics. To this end, we consider three measurement setups: uniformly distributed setups with 6 and 11 sensors, respectively, and one with 6 sensors selected on purpose. More precisely, in the latter case, we place the sensors at Ω_o = {−0.8, −0.6, −0.4, −0.1, 0.1, 0.4}. The different error measures are computed as in the previous example and the results are gathered in Table 2.

Table 2. Reconstruction results with different sensor setups (11 uniform sensors, 6 "selected" sensors, 6 uniform sensors).

We observe that the measurement setup with selected sensors performs better than the uniform ones. Moreover, the linearized estimate again provides a sharp upper bound on the error for both eleven uniform and six selected sensors, but yields numerically singular Fisher information matrices for six uniform sensors (denoted as Inf in the table), i.e., µ† is not stably identifiable in this case. Note that the estimator μ still yields somewhat useful results, which are however affected by a constant error due to the difference between the minimum norm solution and the exact source, as depicted in Figure 1, and do not improve with lower noise level. These results suggest that the reconstruction quality does not only rely on the number of measurements taken but also on their specific setup. In this case, we point out that the selected sensors are adapted to the sources: two sensors are placed on either side of each source. Thus, the obtained results imply that, given reasonable prior information on the source positions and amplitudes, one may obtain a better sensor placement by incorporating it in the design of the measurement setup. This leads to the concept of optimal sensor placement problems for sparse inversion, which we will consider in future work.
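The comparison of sensor configurations can be sketched through the design criterion suggested by the preceding analysis. The kernel, sources, and weights below are illustrative assumptions (Σ_0 = Id, W† = Id), so the numerical values do not reproduce Table 2; a (numerically) singular Fisher matrix signals that µ† is not stably identifiable under that design.

```python
import numpy as np

# Sketch: ranking sensor configurations x by the linearized-error surrogate
# trace(W F(x)^{-1}) / p, where F(x) is the Fisher matrix of m = (q; y).
s = 0.3
k  = lambda x, y: np.exp(-(x - y)**2 / (2 * s**2))
dk = lambda x, y: (x - y) / s**2 * np.exp(-(x - y)**2 / (2 * s**2))

y_true = np.array([-0.4, 0.5])
q_true = np.array([1.0, -0.8])

def criterion(x, p=1e4):
    # Columns of G'(m): d/dq_n and d/dy_n of sum_n q_n k(x, y_n).
    J = np.hstack([k(x[:, None], y_true[None, :]),
                   dk(x[:, None], y_true[None, :]) * q_true[None, :]])
    F = J.T @ J                              # Sigma_0 = Id assumed
    if np.linalg.cond(F) > 1e12:
        return np.inf                        # not stably identifiable
    return np.trace(np.linalg.inv(F)) / p    # W^dagger = Id assumed

uniform6  = np.linspace(-1.0, 1.0, 6)
uniform11 = np.linspace(-1.0, 1.0, 11)
selected6 = np.array([-0.8, -0.6, -0.4, -0.1, 0.1, 0.4])
for name, x in [("uniform-6", uniform6), ("uniform-11", uniform11),
                ("selected-6", selected6)]:
    print(name, criterion(x))
```

Minimizing such a criterion over the sensor positions is precisely the optimal sensor placement problem outlined above; note that it requires (prior knowledge of) y† and q†.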

Conclusion
In the present work, we have considered the inverse problem of estimating an unknown sparse signal µ† from finitely many measurements perturbed by Gaussian random noise, formulated as a linear, ill-posed operator equation in the space of Radon measures. The main result of the paper is an asymptotically sharp upper bound on the mean-squared error, defined in terms of the Hellinger-Kantorovich distance, of a nonsmooth Tikhonov-type estimator, which is confirmed by extensive numerical experiments. Its proof relies on three key concepts: a suitable a priori regularization parameter choice rule β = β(p) adapted to the overall precision p of the measurements, the non-degeneracy of the minimal-norm dual certificate, and a careful linearization argument for the H-K distance on a quantifiable set of random events. In comparison to the intractable mean-squared error, the new bound is easily computable and depends explicitly on the locations of the measurement sensors as well as their relative precision. In perspective, these observations suggest the application of this new-found upper estimate in the context of optimal sensor design for sparse inverse problems. However, we also point out that a practical realization of such an approach is not straightforward, since the derived upper bound, i.e. the prospective design criterion, depends on the unknown source µ† and on the non-degeneracy of the minimal-norm certificate, a property that itself inherently depends on the measurement setup. Addressing these problems goes beyond the scope of the current paper and will be the subject of future work. Moreover, the extension of the presented results towards vector measures as encountered, e.g., in acoustic inversion is of great interest.

and thus, using Lemma A.2, we conclude the measurability of the worst-case distance.
Proposition A.3. The function defined in (A.1) is γ_p-measurable. Moreover, the corresponding estimate holds. Hence Err[μ] is upper semicontinuous and thus measurable w.r.t. γ_p. Finally, we apply (A.2) to conclude.

Proofs of Propositions 5.2 and 5.4

In this section, we provide the omitted proofs of Propositions 5.2 and 5.4, respectively, as well as all the auxiliary results needed in their derivation.
Proposition B.1. The following estimates hold, where we choose w_n = |q_n| and estimate term by term. Then, for every m ∈ B_{W†}(m†, r†), there holds sign q = sign q† and R(q, q†) ≤ 2, where R(q, q†) is the maximal ratio of the weights w_n and w†_n from Proposition 4.2. In addition, for all m, m′ ∈ B_{W†}(m†, r†) and δm, corresponding Lipschitz estimates hold. This implies 1/2 ≤ q_n/q†_n ≤ 3/2 for all n = 1, 2, …, N_s. Hence, sign q = sign q† and (B.3) follows. Also, the condition r† ≤ d_{w†_n}(y†_n, ∂Ω)/2 guarantees that y_n ∈ Ω_s for all n = 1, …, N_s. By Proposition B.1 and (B.3), the fixed-point characterization with ρ = sign q† can now be seen. In order to obtain the claimed results, we aim to show that T is a contraction and argue similarly to the proof of the Banach fixed point theorem. However, since the correct domain of definition for the map T is difficult to determine beforehand, we provide a direct proof.
We start by showing that T is Lipschitz continuous on the ball B_{W†}(m†, r) for some as yet undetermined 0 < r ≤ r†, with Lipschitz constant κ(r) ≤ 1/2 if ε is chosen suitably. For this purpose, consider two points m and m′ in B_{W†}(m†, r), their difference δm = m − m′, and the difference of their images δm_T = T(m) − T(m′). We multiply the resulting equation from the left with (δm_T)⊤ and consider each term on the right-hand side separately. The first term is estimated using Proposition B.2; the second and third terms are estimated directly. Since m, m′ are contained in the ball B_{W†}(m†, r), the claimed estimate follows after dividing by ∥δm_T∥_{W†}. The contraction estimate above holds for any r ≤ r† under the assumption that the points under consideration lie in the appropriate ball. In order to ensure contraction, we need to establish an appropriate bound and assumptions on the data. For this, we consider the linearized estimate (B.6). Using the weighted W†-norm defined in Proposition B.1 and the full column rank of G′(m†), the assertion follows.

Notation: k[x, y]q denotes the measurements given m = (y; q), see (5.2); m(ε) and δm(ε) denote a stationary point of (5.1) and the linear approximation of m(ε), see (5.6).

Proposition 5.1. The solutions m to (5.1) fulfill the stationarity condition (5.4).