Brought to you by:
Paper

Variational regularization theory based on image space approximation rates

Published 10 May 2021 © 2021 IOP Publishing Ltd
, , Citation Philip Miller 2021 Inverse Problems 37 065003 DOI 10.1088/1361-6420/abf5bb

0266-5611/37/6/065003

Abstract

We present a new approach to convergence rate results for variational regularization. Avoiding Bregman distances and using image space approximation rates as source conditions we prove a nearly minimax theorem showing that the modulus of continuity is an upper bound on the reconstruction error up to a constant. Applied to Besov space regularization we obtain convergence rate results for 0, 2, q- and 0, p, p-penalties without restrictions on p, q ∈ (1, ). Finally we prove equivalence of Hölder-type variational source conditions, bounds on the defect of the Tikhonov functional, and image space approximation rates.

Export citation and abstract BibTeX RIS

1. Introduction

The subject of this paper are ill-posed equations Ax = g with A a bounded linear operator mapping from a Banach space $\mathbb{X}$ to a Hilbert space $\mathbb{Y}$. We analyze approximations of an unknown $x\in \mathbb{X}$ given noisy, indirect observations gδ satisfying ${\Vert}{g}^{\delta }-Ax{{\Vert}}_{\mathbb{Y}}{\leqslant}\delta $ with a fixed noise level δ > 0.

In this context ill-posedness means that the unknown x does not depend continuously on the observations gδ . As a naive application of the inverse of A may therefore amplify the noise indefinitely regularization is needed to compute stable approximations of the unknown. Here, we study variational regularization with a convex penalty $\mathcal{R}$ defined on $\mathbb{X}$. More precisely, we consider the Tikhonov functional given by

and denote its set of minimizers by

A central aim of regularization theory are upper bounds on the distance $L\left(x,{\hat{x}}_{\alpha }\right)$ between x and estimators ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$ with respect to some loss function L. For ill-posed problems the convergence of ${\hat{x}}_{\alpha }$ to x for δ → 0 can be arbitrarily slow in general. Therefore, upper bounds on the error require regularity conditions on the true solution x, which are referred to as source conditions in regularization theory. The name comes from the first such conditions in a Hilbert space setting, $x={\left({A}^{{\ast}}A\right)}^{\nu /2}\omega ,\enspace \nu { >}0$, where ω is referred to as source generating x. This condition implies the convergence rate ${\Vert}x-{\hat{x}}_{\alpha }{{\Vert}}_{\mathbb{X}}=\mathcal{O}\left({\delta }^{\frac{\nu }{\nu +1}}\right)$ in the Hilbert space norm that defines the penalty. In [21] convergence rates in Hilbert scales are proven under source conditions of the form x = φ(A*A)ω for more general functions φ. Nevertheless, we restrict our attention to Hölder-type convergence rates in this paper. A generalization of the above source condition for ν = 1 to convex or Banach space penalties is given by source-wise representations

Equation (1.1)

leading to the convergence rate $\mathcal{O}\left(\delta \right)$ in the Bregman divergence of $\mathcal{R}$ (see [3]). Slower rates of convergence in Banach space settings can be shown under variational source conditions [23, 24] or under approximate source conditions [13, 14]. We refer to [6] for a comparison of the latter two concepts. Recently in [11] convergence rates are shown under the condition ${\left({A}^{{\ast}}A\right)}^{\nu }\omega \in \partial \mathcal{R}\left(x\right)$ for convex penalties defined on Hilbert spaces. In [16] upper bounds on Tα (x, Ax) − Tα (xα , Ax) (defect of the Tikhonov functional) in terms of α are used as a source condition.

In this work we consider Hölder-type image space approximation rates, i.e. bounds of the form

Equation (1.2)

for some $\nu \in \left[\right.0,\infty \left.\right)$ and c ⩾ 0. These will play the role of source conditions. In many situations these kind of bounds can be proven under source conditions. (see e.g. [18, theorem 2.3], [16, proposition 6]).

We first prove a bound on $L\left(x,{\hat{x}}_{\alpha }\right)$ uniformly on the set of all x satisfying (1.2) in terms of the modulus of continuity. One main advantage of our analysis is the flexibility in the choice of the loss function L. Then for penalties given by Banach space norm powers we work out a characterization of condition (1.2) in terms of real interpolation spaces. This leads to convergence rate results with regularity conditions given by real interpolation spaces. As examples we consider weighted p -regularization, Besov space 0, p, p- and 0, 2, q-regularization. Our approach seems to allow for the first time to obtain minimax optimal rates for all p resp. q in (1, ). Finally we compare condition (1.2) to source conditions used in the literature. We prove equivalence of (1.2), Hölder-type variational source conditions (used e.g. in [7, 19]) and Hölder-type bounds on the defect of the Tikhonov functional. In particular, this equivalence yields a characterization of (1.2) that does not directly depend on the minimizers xα .

The structure of the paper is as follows: in section 2 we present our main results. Sections 35 are devoted for the proofs of the main results and establish some new techniques which may be of some independent interest in variational regularization theory. We finish with an outlook where we also discuss limitations of the present work.

2. Main results

To give an overview over the main results of this paper, we present and discuss the theorems in their precise mathematical form and refer to the proofs given in the sections 35.

2.1. Minimax convergence rates

Assumption 2.1. Let τ be a topology such that $\left(\mathbb{X},\tau \right)$ is a locally convex Hausdorff space and $\mathcal{R}:\mathbb{X}\to \left(-\infty ,\infty \right]$ a proper, convex function. We assume that the sublevel set $\left\{x\in \mathbb{X}:\mathcal{R}\left(x\right){\leqslant}\lambda \right\}$ is τ-compact for all $\lambda \in \mathbb{R}$. Note that this implies that $\mathcal{R}$ is lower semi-continuous on $\left(\mathbb{X},\tau \right)$.

Let $\mathbb{Y}$ be a Hilbert space and $A:\mathbb{X}\to \mathbb{Y}$ a linear, τ-to-weak continuous operator.

Assumption 2.1 implies τ-compactness of the sublevel sets of the Tikhonov functional. Using the finite intersection property of these sets one can show that Rα (g) is nonempty for all $g\in \mathbb{Y}$. Furthermore, for every g ∈ im(A) there exist a (possibly not unique) $\mathcal{R}$-minimal $x\in \mathbb{X}$ with Ax = g, i.e. $\mathcal{R}\left(x\right){\leqslant}\mathcal{R}\left(z\right)$ for all $z\in \mathbb{X}$ with Az = g. Let $\nu \in \left[\right.0,\infty \left.\right)$ and ϱ > 0. We define

Equation (2.1)

Note that xKν if and only if a bound (1.2) holds true and ϱν (x) is the smallest possible constant c > 0.

Let $L:\mathbb{X}{\times}\mathbb{X}\to \left[0,\infty \right]$ satisfy the triangle inequality. We use L to measure the reconstruction error.

The first and central result is a uniform bound in ${K}_{\nu }^{\varrho }$ on $L\left(x,{\hat{x}}_{\alpha }\right)$ with ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$ in terms of the modulus of continuity. Recall that the latter is given by

Equation (2.2)

for a subset $K\subset \mathbb{X}$.

We consider two parameter choice rules for the regularization parameter α. An a priori rule requiring prior knowledge of the parameter ν in (1.2) characterizing the regularity of the unknown x, and the discrepancy principle as most well-known a-posteriori rule.

Theorem 1. Let $\nu \in \left(\right.0,1 \left.\right]$ and ϱ, α > 0. Suppose $x\in {K}_{\nu }^{\varrho }$. Let α > 0 and ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$.

  • (a)  
    (A priori rule) Let cr cl > 0. If ${c}_{l}{\varrho }^{-\frac{1}{\nu }}{\delta }^{\frac{1}{\nu }}{\leqslant}\alpha {\leqslant}{c}_{r}{\varrho }^{-\frac{1}{\nu }}{\delta }^{\frac{1}{\nu }}$, then
    with ${c}_{1}{:=}1+{c}_{r}^{\nu }$ and ${c}_{2}{:=}2+{c}_{l}^{-\nu }$.
  • (b)  
    (Discrepancy principle) Let CD > cD > 1. If ${c}_{D}\delta {\leqslant}{\Vert}{g}^{\delta }-A{\hat{x}}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}{C}_{D}\delta $, then
    with d1 := 1 + CD and ${d}_{2}{:=}2+{\left({c}_{D}-1\right)}^{-1}$.

The proof of theorem 1 can be found in subsection 3.5. Under mild assumptions theorem 1 gives rise to an almost minimax result in the following manner. Recall that the worst case error of a reconstruction map $R:\mathbb{Y}\to \mathbb{X}$ on a set $K\subset \mathbb{X}$ is given by

and satisfies the lower bound

Equation (2.3)

(see [5, remark 3.12], [28, lemma 3.11] or [4, 4.3.1. proposition 1]). Let ${\bar{R}}_{\alpha }:\mathbb{Y}\to \mathbb{X}$ satisfy ${\bar{R}}_{\alpha }\left({g}^{\delta }\right)\in {R}_{\alpha }\left({g}^{\delta }\right)$ for all ${g}^{\delta }\in \mathbb{Y}$ with either α = α(δ) satisfying the a priori parameter choice given in theorem 1(a) or α = α(δ, gδ ) satisfying the discrepancy principle in theorem 1(b). In the case ${\Omega}\left(\delta ,{K}_{\nu }^{\varrho }\right)\sim {\varrho }^{e}{\delta }^{f}$ for some exponents e, f > 0 this yields a minimax result

This shows that up to a constant C no method can achieve a better approximation uniformly on ${K}_{\nu }^{\varrho }$.

Moreover, we would like to highlight the flexibility in the choice of the loss function L. Many recent works in Banach space or convex regularization theory are restricted to error bounds in the Bregman divergence (see e.g. [7, 16, 19, 29]). In some situations the meaning of the Bregman divergence is unclear and lower bounds on the Bregman distance are required to obtain more tangible statements. In [28] these lower bounds cause a restriction on the parameters s, p, q of the Besov scale. By applying theorem 1 to Besov space regularization we can overcome these restrictions.

2.2. Convergence rate theory for Banach space regularization

Here we consider $\mathcal{R}:\mathbb{X}\to \left[\right.0,\infty \left.\right)$ given by $\mathcal{R}\left(x\right)=\frac{1}{u}{\Vert}x{{\Vert}}_{\mathbb{X}}^{u}$ for fixed u ∈ [1, ). We assume ${\mathbb{X}}_{A}$ to be a Banach space with a continuous, dense embedding $\mathbb{X}\subset {\mathbb{X}}_{A}$ such that A extends to a norm isomorphism $A:{\mathbb{X}}_{A}\to \mathbb{Y}$, i.e. there exists a constant M ⩾ 1 such that

Equation (2.4)

Note that injectivity is necessary for (2.4). On the other hand injectivity of $A:\mathbb{X}\to \mathbb{Y}$ suffices for the existence of a space ${\mathbb{X}}_{A}$ such that (2.4) holds with M = 1. (Take the Banach completion of $\mathbb{X}$ in the norm $x{\mapsto}{\Vert}Ax{{\Vert}}_{\mathbb{Y}}$).

For example, in Besov space settings we will assume ${\mathbb{X}}_{A}$ a space with negative smoothness index, and we consider spaces $\mathbb{X}$ with smoothness index 0.

Moreover we need the following assumption on K1 and ϱ1 defined in (2.1). Recall that a quasi-norm satisfies the properties of norm except that the triangle inequality is replaced by ${\Vert}x+y{\Vert}{\leqslant}c\left({\Vert}x{\Vert}+{\Vert}y{\Vert}\right)$ for a constant c > 0. A complete and quasi-normed vector space is called a quasi-Banach space.

Assumption 2.2. Let u ∈ (0, ). Suppose K1 is a vector space and that there is a quasi-norm ||⋅||lin on K1 such that (K1, ||⋅||lin) is a quasi-Banach space. Moreover assume

This assumption is motivated by the computation of K1 for the examples below.

Recall that for a quasi-Banach space ${\mathbb{X}}_{S}$ with a continuous embedding ${\mathbb{X}}_{S}\subset {\mathbb{X}}_{A}$ and θ ∈ (0, 1) the real interpolation space ${\left({\mathbb{X}}_{A},{\mathbb{X}}_{S}\right)}_{\theta ,\infty }$ consists of all $x\in {\mathbb{X}}_{A}$ such that

Here the K-functional is given by

For the definition of the real interpolation spaces ${\left({\mathbb{X}}_{A},{\mathbb{X}}_{S}\right)}_{\theta ,q}$ for q ∈ (0, ) we refer to [2].

Theorem 2 (Error bounds). Suppose (2.4) and assumption 2.2 hold true. If $\mathbb{X}$ is not reflexive, suppose assumption 2.1 holds true.

Let ${\mathbb{X}}_{L}$ be a Banach space with a continuous embedding ${\mathbb{X}}_{L}\subset {\mathbb{X}}_{A}$. Let 0 < ξ < θ < 1 and δ, ϱ, α > 0 and cr cl > 0, CD > cD > 1. Suppose there is a continuous embedding ${\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\xi ,1}\subset {\mathbb{X}}_{L}$. Assume

Let ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$. There exists a constant C > 0 independent of x, δ and ϱ such that whenever α satisfies either

the bound

holds true.

We refer to subsection 4.4 for the proof of theorem 2.

Remark 2.3. The statement of the theorem remains valid in the limiting case θ = 1 where the source condition in terms of ${\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\theta ,\infty }$ has to be replaced by simply xK1 with ||x||linϱ. Here the a priori rule is αϱ−(u−1) δ.

We illustrate the impact of this result by applying it to three more concrete Banach space regularization setups.

Example 1:  Weighted p -norm penalization. Let Λ be a countable index set, p ∈ (0, ) and $\bar{\omega }={\left({\bar{\omega }}_{j}\right)}_{j\in {\Lambda}}$ a sequence of positive reals. We consider weighted sequence spaces ${\ell }_{\bar{\omega }}^{p}$ defined by

We assume that the forward operator maps a weighted 2-space isomorphically to the image space $\mathbb{Y}$. More precisely, we suppose that (2.4) holds true with ${\mathbb{X}}_{A}={\ell }_{\bar{a}}^{2}$ for $\bar{a}={\left({\bar{a}}_{j}\right)}_{j\in {\Lambda}}$ a sequence of positive real numbers.

Moreover let p ∈ (1, 2) and $\bar{r}={\left({\bar{r}}_{j}\right)}_{j\in {\Lambda}}$ a sequence of weights such that $\bar{a}{\bar{r}}^{-1}$ is bounded. We consider $\mathbb{X}={\ell }_{\bar{r}}^{p}\subset {\ell }_{\bar{a}}^{2}$ (see [20, proposition A.1]) with $\mathcal{R}\left(x\right)=\frac{1}{p}{\Vert}x{{\Vert}}_{\bar{r},p}^{p}$.

Furthermore we introduce weighted weak p -spaces. For $\mu ={\left({\mu }_{j}\right)}_{j\in {\Lambda}}$ and $\nu ={\left({\nu }_{j}\right)}_{j\in {\Lambda}}$ sequences of positive reals and t ∈ (0, ) those are defined by the following quasi-norms

We apply theorem 2 and obtain the following result.

Corollary 2.4 (Error bounds for weighted p-norm penalties). Let p ∈ (1, 2), t ∈ (2p − 2, p) and δ, ϱ, α > 0 and cr cl > 0, CD > cD > 1 and $\mu {:=}{\left({\bar{a}}^{2}{\bar{r}}^{-p}\right)}^{\frac{1}{2-p}},\nu {:=}{\left({\bar{a}}^{-1}\bar{r}\right)}^{\frac{2p}{2-p}}$. Assume $x\in {\ell }_{\mu ,\nu }^{t,\infty }$ with ||x||μ,ν,t ϱ and ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$. There is a constant C > 0 independent of x, δ and ϱ such that whenever α satisfies either

the bound

holds true.

The proof of corollary 2.4 can be found in subsection 4.4.

Remark 2.5. In the limiting case t = 2p − 2 the statement remains valid if one replaces ${\ell }_{\mu ,\nu }^{t,\infty }$ by ${K}_{1}={\ell }_{\bar{s}}^{2p-2}$ with $\bar{s}={\bar{a}}^{-\frac{1}{p-1}}{\bar{r}}^{\frac{p}{p-1}}$. Here we obtain the rate ${\Vert}x-{\hat{x}}_{\alpha }{{\Vert}}_{\bar{r},p}{\leqslant}C{\varrho }^{\frac{p-1}{p}}{\delta }^{\frac{1}{p}}$.

In [10] the rate $\mathcal{O}\left({\delta }^{\frac{1}{p}}\right)$ is already proven under a condition similar to (1.1). Here we obtain intermediate convergences rates between $\mathcal{O}\left({\delta }^{0}\right)$ and $\mathcal{O}\left({\delta }^{\frac{1}{p}}\right)$. This has the advantage that we obtain statements on the speed of convergences on larger sets.

Remark 2.6. Corollary 2.4 remains valid word by word in the case p = 1 (see [20, theorem 4.4]).

Example 2:  Besov 0, p, p-penalties. We introduce a scale of sequence spaces that allows to characterize Besov function spaces by decay properties of coefficients in wavelet expansions (see [26]).

Let ${\left({{\Lambda}}_{j}\right)}_{j\in {\mathbb{N}}_{0}}$ be a family of sets such that 2jd ⩽ |Λj | ⩽ CΛ2jd for some constant CΛ ⩾ 1 and all $j\in {\mathbb{N}}_{0}$. We consider the index set ${\Lambda}{:=}\left\{\left(j,k\right):j\in {\mathbb{N}}_{0},k\in {{\Lambda}}_{j}\right\}$.

For p, q ∈ (0, ) and $s\in \mathbb{R}$ we set ${b}_{p,q}^{s}=\left\{x\in {\mathbb{R}}^{{\Lambda}}:{\Vert}x{{\Vert}}_{s,p,q}{< }\infty \right\}$ with

with the usual replacements for p = or q = .

Let a > 0 and assume that the forward operator $A:{b}_{2,2}^{-a}\to \mathbb{Y}$ satisfies (2.4) with ${\mathbb{X}}_{A}={b}_{2,2}^{-a}$. Let p ∈ (1, ) (for p = 1 we refer to [20] again) with $\frac{d}{p}-\frac{d}{2}{\leqslant}a$. Then we have a continuous embedding ${b}_{p,p}^{0}\subset {b}_{2,2}^{-a}$ (see [27, 3.3.1.(6),(7), 3.2.4.(1)]).

We use $\mathbb{X}={b}_{p,p}^{0}$ with

Note that we have

Equation (2.5)

Hence for p < 2, this example is a special case of example 1.

Let $\tilde {s}=\frac{a}{p-1}$ and $\tilde {t}=2p-2$. For $0{< }s{< }\tilde {s}$ we set

Equation (2.6)

Here the application of theorem 2 yields the following error bound.

Corollary 2.7 (Error bounds for 0, p, p-penalties). Let $0{< }s{< }\tilde {s}$ and δ, ϱ, α > 0, cr cl > 0, CD > cD > 1. Assume xks with ${\Vert}x{{\Vert}}_{{k}_{s}}{\leqslant}\varrho $ and ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$. There is a constant C > 0 independent of x, δ and ϱ such that whenever α satisfies either

the bound

holds true.

The proof of corollary 2.7 can be found in subsection 4.4.

Remark 2.8. In the limiting case $s=\tilde {s}$ the result remains valid if one replaces ks by ${K}_{1}={b}_{\tilde {t},\tilde {t}}^{\tilde {s}}$ and we obtain the bound ${\Vert}x-{\hat{x}}_{\alpha }{{\Vert}}_{0,p,p}{\leqslant}C{\varrho }^{\frac{p-1}{p}}{\delta }^{\frac{1}{p}}$.

For p = 2 we have ${k}_{s}={b}_{2,\infty }^{s}$ (see [27, 3.3.6.(9)]). The following proposition provides a nesting of ks for p ≠ 2 by Besov sequence spaces.

Proposition 2.9. Let $0{< }s{< }\tilde {s}$ and $t=\frac{2pa}{\left(2-p\right)s+2a}$.

  • (a)  
    For p < 2 we have continuous embeddings
  • (b)  
    For p > 2 we have continuous embeddings ${b}_{t,t}^{s}\subset {k}_{s}\subset {b}_{2,\infty }^{s}$.

We refer to subsection 5.4 for a proof of proposition 2.9. For p < 2 the same argument as in [20, example 6.7] shows that describing the regularity of functions with jumps or kinks via their wavelet expansion in terms of ks allows for a higher value of s then using ${B}_{\mathrm{s},\infty }^{\mathrm{p}}\left({\Omega}\right)$ as in [28]. Therefore we obtain a faster convergence rate for this class of functions.

For p > 2 we measure the error in a stronger norm than the 2-norm. On the other hand the set on which we obtain convergence rates is smaller than ${b}_{2,\infty }^{s}$.

Example 3:  Besov 0, 2, q-penalties. Again we consider a > 0 and ${\mathbb{X}}_{A}={b}_{2,2}^{-a}$ with A satisfying (2.4). Let q ∈ (1, ). Then there is a continuous embedding $\mathbb{X}{:=}{b}_{2,q}^{0}\subset {b}_{2,2}^{-a}$ (see [27, 3.3.1.(7)]) and we choose

For a convergence analysis in the case q = 1 we refer to [17]. The application of theorem 2 provides:

Corollary 2.10 (Error bounds for 0, 2, q-penalties). Let $0{< }s{< }\frac{a}{q-1}$ and δ, ϱ, α > 0, cr cl > 0, CD > cD > 1. Assume $x\in {b}_{2,\infty }^{s}$ with ||x||s,2,ϱ and ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$. There is a constant C > 0 independent of x, δ and ϱ such that whenever α satisfies either

the bound

holds true.

The proof of corollary 2.10 can be found in subsection 4.4.

Remark 2.11. In the limiting case $s=\frac{a}{q-1}$ the result remains valid if one replaces ${b}_{2,\infty }^{s}$ by ${K}_{1}={b}_{2,\tilde {q}}^{\tilde {s}}$ with $\tilde {q}=2q-2$. Here we obtain ${\Vert}x-{\hat{x}}_{\alpha }{{\Vert}}_{0,2,2}{\leqslant}C{\varrho }^{\frac{q-1}{q}}{\delta }^{\frac{1}{q}}$.

In contrast to the analysis in [17] we measure the error in the 2-norm independent of the value of q, i.e. the error norm is not dictated by the penalty term.

The smaller q the larger is the region $0{< }s{< }\frac{a}{q-1}$ of regularity parameters for which we guarantee upper bounds. Furthermore we see that changing the fine index q while keeping p = 2 does not change the set where convergence rates are guaranteed, but it influences the parameter choice rule.

Example 4:  Radon transform. To give a more concrete example we discuss the Radon transform which appears as forward operator in computed tomography and positron emission tomography. This example also shows how our results apply to operators initially defined on function spaces.

Let $d\in \mathbb{N}$ with d ⩾ 2, ${\Omega}{:=}\left\{x\in {\mathbb{R}}^{d}:\vert x\vert {\leqslant}1\right\}$, ${S}^{d-1}{:=}\left\{x\in {\mathbb{R}}^{d}:\vert x\vert =1\right\}$ and $\mathbb{Y}{:=}{L}^{2}\left({S}^{d-1}{\times}\left[-1,1\right]\right)$. Then the Radon transform $R:{L}^{2}\left({\Omega}\right)\to {L}^{2}\left({S}^{d-1}{\times}\mathbb{R}\right)$ is given by

With $a=\frac{d-1}{2}$ it follows from [15, theorem 3.1] that R is a norm isomorphism from ${B}_{2,2}^{-a}\left({\Omega}\right)$ to $\mathbb{Y}$. Here ${B}_{2,2}^{-a}\left({\Omega}\right)$ denotes a Besov function space. We refer to the book [9] for an introduction to this scale of function spaces.

Furthermore, with Λ and the scale of spaces ${b}_{p,q}^{s}$ as introduced in example 2 and smax > a we consider a smax-regular wavelet system ${\left({\psi }_{\lambda }\right)}_{\lambda \in {\Lambda}}$ on Ω such that the synthesis operator

is well defined and a norm isomorphism for all $s\in \mathbb{R}$ and p, q ∈ (0, ] satisfying s ∈ (σp smax, smax) with ${\sigma }_{p}=\mathrm{max}\left\{d\left(\frac{1}{p}-1\right),0\right\}$ (see [26]). Now for $\mathcal{R}$ as in example 2 or example 3 we consider

Equation (2.7)

and obtain the following convergence rate results.

Corollary2.12 (Convergence rates for wavelet regularization of the Radon transform).

  • (a)  
    Let p ∈ (1, ). With the notation of example 2 suppose $0{< }s{< }\mathrm{min}\left\{\bar{s},{s}_{\text{max}}\right\}$ and δ, ϱ, α > 0, cr cl > 0, CD > cD > 1. Assume $f\in {B}_{t,t}^{s}\left({\Omega}\right)$ with ${\Vert}f{{\Vert}}_{{B}_{t,t}^{s}\left({\Omega}\right)}{\leqslant}\varrho $ and ${\hat{f}}_{\alpha }\in {S}_{\alpha }\left({g}^{\delta }\right)$ with $\mathcal{R}$ as in example 2. Then there is a constant C > 0 independent of f, δ and ϱ such that whenever α satisfies either
    the bound
    holds true. If p ⩽ 2 we also obtain the bound
  • (b)  
    Let q ∈ (0, ). Suppose $0{< }s{< }\mathrm{min}\left\{\frac{a}{q-1},{s}_{\mathrm{max}}\right\}$ and δ, ϱ, α > 0, cr cl > 0, CD > cD > 1. Assume $f\in {B}_{2,\infty }^{s}\left({\Omega}\right)$ with ${\Vert}f{{\Vert}}_{{B}_{2,\infty }^{s}\left({\Omega}\right)}{\leqslant}\varrho $ and ${\hat{f}}_{\alpha }\in {S}_{\alpha }\left({g}^{\delta }\right)$ with $\mathcal{R}$ as in example 3. Then there is a constant C > 0 independent of f, δ and ϱ such that whenever α satisfies either
    the bound
    holds true.

The proof of corollary 2.12 can be found in subsection 4.4. In corollary 2.12(a) it would also be sufficient to require $f\in \mathcal{S}{k}_{s}$ instead of $f\in {B}_{t,t}^{s}\left({\Omega}\right)$. Transferring the interpolation identity in (2.6) to function spaces shows that $\mathcal{S}{k}_{s}$ is independent of the chosen wavelet system (see [20, section 6.2] for a similar discussion).

In the same manner the presented theory can be applied to other linear, finitely smoothing forward operators as inverses of elliptic differential operators with smooth, periodic coefficients or specific periodic convolution operators (see [17, example 2.5] for more details).

2.3. Connections to source conditions

Assuming only assumption 2.1 we compare (1.2) to source conditions used in the literature. For a concave and upper semi-continuous function $\phi :\left[\right.0,\infty \left.\right)\to \left[\right.0,\infty \left.\right)$ we consider variational source conditions of the form

Equation (2.8)

In [19] this condition is used to prove convergence rates with respect to the twisted Bregman distance of $\mathcal{R}$ and it is shown that the source condition (1.1) implies (2.8) with $\phi \sim \sqrt{\cdot }$. In [7] necessity of (2.8) for convergence rates with respect to the twisted Bregman distance under a fixed parameter choice rule is proven.

Inspired by [16] we also study the defect of the Tikhonov functional

The following result shows that Hölder-type variational source conditions, Hölder-type bounds on the defect of the Tikhonov functional and Hölder type image space approximation rates are equivalent.

Theorem 3. Let $\nu \in \left(\frac{1}{2},1\right]$. Assume $x\in \mathrm{dom}\left(\mathcal{R}\right)$ is $\mathcal{R}$-minimal in A−1({Ax}) and xα Rα (Ax) for α > 0 is any selection of a minimizers for exact data. The following statements are equivalent:

  • (a)  
    There exists a constant c1 > 0 with ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}{c}_{1}{\alpha }^{\nu }$ for all α > 0.
  • (b)  
    There exists a constant c2 > 0 such that σx (α) ⩽ c2 α2ν−1.
  • (c)  
    There exists a constant c3 > 0 with (2.8) holds true for $\phi \left(t\right)={c}_{3}{t}^{\frac{2\nu -1}{2\nu }}$.

More precisely (a) implies (b) with ${c}_{2}=\frac{{c}_{1}^{2}}{4\nu -2}$, (b) implies (c) with ${c}_{3}=2{c}_{2}^{\frac{1}{2\nu }}$ and (c) implies (a) with ${c}_{1}={c}_{3}^{\nu }$.

We provide a proof of theorem 3 in subsection 4.4. The result allows the following representation of Kν in terms of variational source conditions:

Equation (2.9)

for all $\nu \in \nu \in \left(\frac{1}{2},1\right]$. Note that since the map $\left(\frac{1}{2},1\right]\to \left(0,\frac{1}{2}\right]$ given by $\nu {\mapsto}\frac{2\nu -1}{2\nu }$ is bijective this characterization grasps all Hölder type functions $\phi \left(t\right)=\mathcal{O}\left({t}^{\mu }\right)$ for $\mu \in \left(0,\frac{1}{2}\right]$. Due to [16, proposition 3] the largest meaningful exponent is $\mu =\frac{1}{2}$. Furthermore, (2.8) implies $x\in \mathrm{dom}\left(\mathcal{R}\right)$ and this in turn yields (a) with $\nu =\frac{1}{2}$. Therefore, we cannot expect a characterization of (a) for $\nu {< }\frac{1}{2}$ by variational source conditions. Hence all meaningful Hölder type variational source conditions of the form (2.8) are covered in theorem 3 and (2.9). In other words it is not possible to extend theorem 3 to a larger set of exponents.

Together with theorem 1 we see that Hölder-type variational source conditions imply upper bounds on the reconstruction error for any loss function given by the modulus of continuity. In contrast as far as the author knows all upper bounds in the literature derived from (2.8) are restricted to the twisted Bregman distance.

3. Minimax convergence rates on Kν

The aim of this section is to prove theorem 1. Here we only assume the topological assumptions given in assumption 2.1. We will follow an idea presented in the seminal paper [4]: any feasible procedure is nearly minimax (see [4, 4.3.1]). In our context feasibility means

  • (a)  
    Image space bounds: ${\Vert}Ax-A{\hat{x}}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}c\delta $,
  • (b)  
    Regularity of the minimizers: ${\varrho }_{\nu }\left({\hat{x}}_{\alpha }\right){\leqslant}c{\varrho }_{\nu }\left(x\right)$ for some constant c > 0.

After proving feasibility we use the same argument as in [4, 4.3.1, proposition 2] to obtain a nearly minimax result.

3.1. Characterization of ARα as proximity mapping

This subsection provides an important preliminary that we use in several places throughout the paper. We introduce a convex function $\mathcal{Q}$ on $\mathbb{Y}$ that can be seen as a push forward of $\mathcal{R}$ through the linear operator A. We show that the proximity mapping of $\alpha \mathcal{Q}$ equals ARα . Recall that for a convex, proper and lower semi-continuous function $\mathcal{Q}:\mathbb{Y}\to \left(-\infty ,\infty \right]$ and $g\in \mathbb{Y}$ there is a unique minimizer ${\text{Prox}}_{\mathcal{Q}}\left(g\right)$ of the function $y{\mapsto}\frac{1}{2}{\Vert}g-y{{\Vert}}_{\mathbb{Y}}^{2}+\mathcal{Q}\left(y\right)$. The single-valued mapping

is called proximity mapping of Q (see [1, 11.4, definition 12.23]).

Lemma 3.1. We define

with inf ∅ = . Then $\mathcal{Q}$ is convex, proper and lower semi-continuous, and we have $\mathrm{dom}\left(\mathcal{Q}\right)=A\left(\mathrm{dom}\left(\mathcal{R}\right)\right)$.

Proof. Let $\lambda \in \mathbb{R}$. First we prove that ${L}_{\lambda }{:=}\left\{g\in \mathbb{Y}\enspace :\enspace \mathcal{Q}\left(g\right){\leqslant}\lambda \right\}$ satisfies

To this end let gLλ . There exists $x\in \mathbb{X}$ with Ax = g and $\mathcal{R}\left(x\right){\leqslant}\mathcal{R}\left(z\right)$ for all $z\in \mathbb{X}$ with Az = g. Then $\mathcal{R}\left(x\right)=\mathcal{Q}\left(g\right){\leqslant}\lambda $. On the other hand if $x\in \mathbb{X}$ with $\mathcal{R}\left(x\right){\leqslant}\lambda $ then $\mathcal{Q}\left(Ax\right){\leqslant}\mathcal{R}\left(x\right){\leqslant}\lambda $.

Taking union over $\lambda \in \mathbb{R}$ yields $\mathrm{dom}\left(\mathcal{Q}\right)=A\left(\mathrm{dom}\left(\mathcal{R}\right)\right)$. Hence $\mathcal{Q}$ is proper as $\mathcal{R}$ is proper. The sublevel sets Lλ are convex as the image of a convex set under a linear map and closed as the image of a τ-compact set under a τ-to-weak continuous map. Hence $\mathcal{Q}$ is convex and lower semi-continuous.□

Remark 3.2. Note that in the case of an injective forward operator A, the map $\mathcal{Q}$ is given by $\mathcal{Q}\left(g\right)=\mathcal{R}\left({A}^{-1}g\right)$ if g ∈ im(A) and $\mathcal{Q}\left(g\right)=\infty $ if $g\in \mathbb{Y}{\backslash}\text{im}\left(A\right)$ where ${A}^{-1}:\text{im}\left(A\right)\to \mathbb{X}$ denotes the inverse map of A.

Proposition 3.3. Let $g\in \mathbb{Y}$ and α > 0. Then

In particular $A{\circ}{R}_{\alpha }={\mathrm{Prox}}_{\alpha \mathcal{Q}}$ is single-valued. Hence $A{\hat{x}}_{\alpha }$ and $\mathcal{R}\left({\hat{x}}_{\alpha }\right)$ do not depend on the particular choice of ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left(g\right)$.

Proof. Let $v\in \mathrm{dom}\left(\mathcal{Q}\right)$. By lemma 3.1 we have v ∈ im(A). There exists $z\in \mathbb{X}$ with Az = v and $\mathcal{R}\left(z\right){\leqslant}\mathcal{R}\left(y\right)$ for all $y\in \mathbb{X}$ with Ay = v. By definition of $\mathcal{Q}$ that is $\mathcal{R}\left(z\right)=\mathcal{Q}\left(v\right)$. The first identity follows from

Inserting $v=A{\hat{x}}_{\alpha }$ yields $\mathcal{R}\left({\hat{x}}_{\alpha }\right)=\mathcal{Q}\left(A{\hat{x}}_{\alpha }\right)=\mathcal{Q}\left({\text{Prox}}_{\alpha \mathcal{Q}}\left(g\right)\right)$.□

The statement in proposition 3.3 can be read as follows: the function $\mathcal{Q}$ on $\mathbb{Y}$ stores all relevant information on $\mathcal{R}$ and A to recover the mapping ARα in one object. Note that the definition of Kν can be rephrased only in terms of $\mathcal{Q}$.

Remark 3.4. Suppose $x\in \mathrm{dom}\left(\mathcal{R}\right)$, α > 0 and xα Rα (Ax). In [16] the authors study upper bounds on $\mathcal{R}\left(x\right)-\mathcal{R}\left({x}_{\alpha }\right)$ (defect for penalty) and on σx (α) (defect for Tikhonov functional) in terms of α. The first quantity bounds the second and it is bounded by the double of the second (see [16, proposition 2.4]). In [16, remark 2.5] the authors rely on this nesting to argue that changing the selection of minimizers changes the defect for penalty at most by a factor of 2. Proposition 3.3 actually shows that the defect for penalty is independent of the choice of xα Rα (Ax).

Exploiting firm non-expansiveness (see [1, definition 4.1]) of proximal operators we draw a further conclusion of proposition 3.3.

Corollary 3.5 (Firm non-expansiveness). Let $g,h\in \mathbb{Y}$, α > 0, ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left(g\right)$ and ${\hat{z}}_{\alpha }\in {R}_{\alpha }\left(h\right)$. Then

Proof. By [1, proposition 12.27] the proximity operator ${\text{Prox}}_{\alpha \mathcal{Q}}$ satisfies

for all $g,h\in \mathbb{Y}$. Inserting the first identity in proposition 3.3 yields the claim.□

3.2. Properties of the sets Kν

The following proposition captures properties of the sets Kν . In particular, we show that Kν is nontrivial for $\nu \in \left(\right.0,1 \left.\right]$.

Lemma 3.6. We have

  • (a)  
    ${K}_{0}=\mathbb{X}$.
  • (b)  
    ${K}_{{\nu }_{2}}\subset {K}_{{\nu }_{1}}$ for 0 ⩽ ν1ν2.
  • (c)  
    ${K}_{\nu }=\underset{z\in \mathbb{X}}{\mathrm{argmin}}\enspace \mathcal{R}\left(z\right)+\mathrm{ker}\left(A\right)$ for all ν > 1.
  • (d)  
    $\mathrm{dom}\left(\mathcal{R}\right)+\mathrm{ker}\left(A\right)\subseteq {K}_{1/2}$.

Proof. 

  • (a)  
    Let $x\in \mathbb{X}$. We set ${D}_{x}{:=}\mathrm{inf}\left\{{\Vert}Ax-Ay{{\Vert}}_{\mathbb{Y}}:y\in \underset{z\in \mathbb{X}}{\text{argmin}}\enspace \mathcal{R}\left(z\right)\right\}$. Let $y\in {\text{argmin}}_{z\in \mathbb{X}}\enspace \mathcal{R}\left(z\right)$, α > 0 and xα Rα (Ax). Then
    As $\mathcal{R}\left(y\right){\leqslant}\mathcal{R}\left({x}_{\alpha }\right)$ this implies ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}{\Vert}Ax-Ay{{\Vert}}_{\mathbb{Y}}$. Hence ϱ0(x) ⩽ Dx < .
  • (b)  
    Suppose $x\in {K}_{{\nu }_{2}}$. Then
    implies ${\varrho }_{{\nu }_{1}}\left(x\right){\leqslant}{\varrho }_{{\nu }_{2}}{\left(x\right)}^{\frac{{\nu }_{1}}{{\nu }_{2}}}{\varrho }_{0}{\left(x\right)}^{1-\frac{{\nu }_{1}}{{\nu }_{2}}}$.
  • (c)  
    Let ν > 1. Suppose xKν . From [1, proposition 16.34] and proposition 3.3 we obtain
    Since ηα → 0 and Axα Ax for α → 0 in the norm topology of $\mathbb{Y}$ this implies $0\in \partial \mathcal{Q}\left(Ax\right)$. Hence $Ax\in {\text{argmin}}_{g\in \mathbb{Y}}\enspace \mathcal{Q}\left(g\right)$. Let $y\in \mathbb{X}$ be $\mathcal{R}$-minimal with Ay = Ax. Then
    Hence
    On the other hand assume $x=y+k\in {\text{argmin}}_{z\in \mathbb{X}}\enspace \mathcal{R}\left(z\right)+\mathrm{ker}\left(A\right)$. Then
    yields ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}=0$. Hence xKν .
  • (d)  
    Let $x=y+k\in \mathrm{dom}\left(\mathcal{R}\right)+\mathrm{ker}\left(A\right)$. From
    we obtain ${\varrho }_{1/2}\left(x\right){\leqslant}\sqrt{2\mathcal{R}\left(y\right)}$.□

The set Kν does not change for ν > 1. As announced in subsection 2.2 we will see that K1 is the set of elements satisfying source condition (1.1).

Moreover note that the last inequality in the proof of lemma 3.6(b) resembles an interpolation inequality. This gives a first hint to a connection to interpolation theory in the case of Banach space regularization.

3.3. Image space bounds

This subsection is devoted to error bounds in the image space $\mathbb{Y}$ in terms of the deterministic noise level and the image space approximation error for exact data. Let δ ⩾ 0, $x\in \mathbb{X}$ and ${g}^{\delta }\in \mathbb{Y}$ with ${\Vert}{g}^{\delta }-Ax{{\Vert}}_{\mathbb{Y}}{\leqslant}\delta $.

Lemma 3.7. The following inequalities

Equation (3.1)

Equation (3.2)

hold true for all α > 0, ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$, xα Rα (Ax).

Proof. Corollary 3.5 with g = gδ and h = Ax yields

We neglect the first summand on the left-hand side and obtain

and the second for

Proposition 3.8. Let $\nu \in \left(\right.0,1 \left.\right]$ and α, ϱ > 0. Suppose $x\in {K}_{\nu }^{\varrho }$.

  • (a)  
    Let cr > 0. If $\alpha {\leqslant}{c}_{r}{\varrho }^{-\frac{1}{\nu }}{\delta }^{\frac{1}{\nu }}$ then
  • (b)  
    Let cD > 1. If ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$ satisfies ${c}_{D}\delta {\leqslant}{\Vert}{g}^{\delta }-A{\hat{x}}_{\alpha }{\Vert}$, then

Proof. Let xα Rα (Ax).

  • (a)  
    By (3.1) and the definition of ϱν we obtain
  • (b)  
    The bound (3.2) implies
    Subtracting δ and rearranging yields the claim.□

3.4. Regularity of the minimizers

First we recall the well-known fact that the source condition (1.1) implies a linear convergence rate in the image space (see e.g. [12, lemma 3.5]).

Lemma 3.9. Let $z\in \mathbb{X}$ and assume $\omega \in \mathbb{Y}$ with ${A}^{{\ast}}\omega \in \partial \mathcal{R}\left(z\right)$. Then

Proof. The first order optimality condition yields ${\xi }_{\alpha }{:=}\frac{1}{\alpha }{A}^{{\ast}}A\left(z-{z}_{\alpha }\right)\in \partial \mathcal{R}\left({z}_{\alpha }\right)$. Solving the inequality

for ${\Vert}Az-A{z}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ proves the claim.□

Lemma 3.10. Let α > 0, ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$. Furthermore let β > 0, ${\left({\hat{x}}_{\alpha }\right)}_{\beta }\in {R}_{\beta }\left(A{\hat{x}}_{\alpha }\right)$ and xβ Rβ (Ax).

  • (a)  
    If β ∈ (0, α] then
  • (b)  
    If β ∈ [α, ) then

Proof. 

  • (a)  
    By the first order optimality condition the element ${\hat{x}}_{\alpha }$ satisfies the prerequisite ${A}^{{\ast}}\omega \in \partial \mathcal{R}\left({\hat{x}}_{\alpha }\right)$ of lemma 3.9 with $\omega =\frac{1}{\alpha }\left({g}^{\delta }-A{\hat{x}}_{\alpha }\right)$.By lemma A.1 the map $\alpha {\mapsto}\frac{1}{\alpha }{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ is non increasing. Together with (3.2) we obtain
    Hence lemma 3.9 implies the claim.
  • (b)  
    We use first corollary 3.5 with g = Ax and $h=A{\hat{x}}_{\alpha }$ then (3.1) and finally non decreasingness of $\alpha {\mapsto}{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ (see lemma A.1) to estimate
    The triangle inequality finishes the proof.□

Proposition 3.11. Let $\nu \in \left(\right.0,1 \left.\right]$ and ϱ, cl , α > 0. Suppose $x\in {K}_{\nu }^{\varrho }$ and ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left({g}^{\delta }\right)$. If ${c}_{l}{\varrho }^{-\frac{1}{\nu }}{\delta }^{\frac{1}{\nu }}{\leqslant}\alpha $, then ${\varrho }_{\nu }\left({\hat{x}}_{\alpha }\right){\leqslant}\left(2+{c}_{l}^{-\nu }\right)\varrho $.

Proof. Let βα. With $\delta {\leqslant}{c}_{l}^{-\nu }\varrho {\alpha }^{\nu }$ we estimate

Furthermore

Together with ${\Vert}Ax-A{x}_{\beta }{{\Vert}}_{\mathbb{Y}}{\leqslant}\varrho {\beta }^{\nu }$ for all β > 0 and xβ Rβ (Ax) the result follows from lemma 3.10.□

3.5. Almost minimaxity on the sets Kν

Now we are in position to give the proof of theorem 1.

Proof of theorem  1

  • (a)  
    By proposition 3.8 we have ${\Vert}Ax-A{\hat{x}}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}{c}_{1}\delta $ and proposition 3.11 yields $x,{\hat{x}}_{\alpha }\in {K}_{\nu }^{{c}_{2}\varrho }$.
  • (b)  
    Using the triangle inequality we obtain
    Proposition 3.8 provides ${\left({c}_{D}-1\right)}^{\frac{1}{\nu }}{\varrho }^{-\frac{1}{\nu }}{\delta }^{\frac{1}{\nu }}{\leqslant}\alpha $. Therefore proposition 3.11 yields $x,{\hat{x}}_{\alpha }\in {K}_{\nu }^{{d}_{2}\varrho }$.In both cases the claim follows from the definition of the modulus Ω.□

4. Convergence rates theory for Banach space regularization

4.1. Source-wise representations and linear image space approximation

We start with a converse to lemma 3.9: a linear bound ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}=\mathcal{O}\left(\alpha \right)$ implies the source condition (1.1) and the minimal $\mathcal{O}$-constant ϱ1(x) agrees with the minimal norm ${\Vert}\omega {{\Vert}}_{\mathbb{Y}}$ attended by a source element ω. Similar results can be found in [12, lemma 4.1] and [22, proposition 4.1]. For sake of self-containedness we include a proof.

Proposition 4.1. Let $x\in \mathbb{X}$ with $\mathcal{R}\left(x\right)=\mathrm{inf}\left\{\mathcal{R}\left(z\right):z\in \mathbb{X}\enspace \;\text{with}\;\enspace Az=Ax\right\}$. Then

If this quantity is finite and xα Rα (Ax), α > 0 is any selection, then the net ${\left(\frac{1}{\alpha }\left(Ax-A{x}_{\alpha }\right)\right)}_{\alpha { >}0}$ convergences weakly for α ↘ 0 to the unique $\omega \in \mathbb{Y}$ with ${A}^{{\ast}}\omega \in \partial \mathcal{R}\left(x\right)$ and ${\Vert}\omega {{\Vert}}_{\mathbb{Y}}={\varrho }_{1}\left(x\right)$.

Proof. Taking the infimum over ω in lemma 3.9 yields

To prove the remaining inequality let $x\in \mathbb{X}$ with ϱ1(x) < . Then the net ${\left(\frac{1}{\alpha }\left(Ax-A{x}_{\alpha }\right)\right)}_{\alpha { >}0}$ is norm bounded in the Hilbert space $\mathbb{Y}$. By the Banach–Alaoglu theorem every null sequence of positive numbers has a subsequence αn > 0 such that $\frac{1}{{\alpha }_{n}}\left(Ax-A{x}_{{\alpha }_{n}}\right)$ converges weakly to some $\omega \in \mathbb{Y}$ with ${\Vert}\omega {{\Vert}}_{\mathbb{Y}}{\leqslant}{\varrho }_{1}\left(x\right)$. Lemma A.2 and the minimality assumption yield

Together with ${\Vert}Ax-A{x}_{{\alpha }_{n}}{{\Vert}}_{\mathbb{Y}}{\leqslant}{\varrho }_{1}\left(x\right){\alpha }_{n}$ we obtain $\mathcal{R}\left({x}_{{\alpha }_{n}}\right)\to \mathcal{R}\left(x\right)$. The first order optimality condition yields $\frac{1}{{\alpha }_{n}}{A}^{{\ast}}A\left(x-{x}_{{\alpha }_{n}}\right)\in \partial \mathcal{R}\left({x}_{{\alpha }_{n}}\right)$. Hence for $z\in \mathbb{X}$ we obtain

This shows ${A}^{{\ast}}\omega \in \partial \mathcal{R}\left(x\right)$. Therefore the stated identity is proven.

Being the preimage of the convex set $\partial \mathcal{R}\left(x\right)$ under the linear map A* the set $\left\{\omega \in \mathbb{Y}:{A}^{{\ast}}\omega \in \partial \mathcal{R}\left(x\right)\right\}$ is convex. Strict convexity of ${\Vert}\cdot {{\Vert}}_{\mathbb{Y}}$ yields uniqueness of ω. In particular this implies the convergence of the net.

Corollary 4.2. We have ϱ1(x) = 0 if and only if $x\in {\mathrm{argmin}}_{z\in \mathbb{X}}\enspace \mathcal{R}\left(z\right)$.□

Proof. By the second statement in proposition 4.1 we have ρ1(x) = 0 if and only if $0\in \partial \mathcal{R}\left(x\right)$. Hence the first order optimality condition $x\in {\text{argmin}}_{z\in \mathbb{X}}\enspace \mathcal{R}\left(z\right)$ if and only if $0\in \partial \mathcal{R}\left(x\right)$ yields the claim.

Example 4.3. Let p ∈ [1, 2], $\mathbb{X}={\ell }^{p}{:=}{\ell }^{p}\left(\mathbb{N}\right)$, $\mathbb{Y}={\ell }^{2}{:=}{\ell }^{2}\left(\mathbb{N}\right)$, A: p 2 the embedding operator given by xx and $\mathcal{R}$ given by $\mathcal{R}\left(x\right)=\frac{1}{p}{\Vert}x{{\Vert}}_{{\ell }^{p}}$. Let xp .

If p > 1 then $\partial \mathcal{R}\left(x\right)=\left\{\xi \right\}$ with |ξj | = |xj |p−1. The adjoint A* identifies with the embedding operator 2ℓp' with p' the Hölder conjugate of p. Hence xK1 if and only if ${\Vert}\xi {{\Vert}}_{{\ell }^{2}}{< }\infty $, and we have

Therefore assumption 2.2 is satisfied in this case.

For p = 1 we have $\xi \in \partial \mathcal{R}\left(x\right)$ if and only if ξj = 1 for xj > 0, ξj = −1 for xj < 0 and |ξj | ⩽ 1 for xj = 0. Hence K1 consists of all elements with finitely many non vanishing coefficients. We have ${\varrho }_{1}\left(x\right)=\#{\left\{j\in \mathbb{N}:{x}_{j}\ne 0\right\}}^{1/2}$ and assumption 2.2 is not fulfilled.

4.2. Computation of K1 for Banach space regularization

In this subsection we assume ${\mathbb{X}}_{A}$ is a Banach space with a dense, continuous embedding $\mathbb{X}\subset {\mathbb{X}}_{A}$ and that A extends to ${\mathbb{X}}_{A}$ such that (2.4) is satisfied. Let u ∈ [1, ) and consider the penalty given by $\mathcal{R}\left(x\right)=\frac{1}{u}{\Vert}x{{\Vert}}_{\mathbb{X}}^{u}$.

If $\mathbb{X}$ is reflexive we choose τ to be the weak topology on $\mathbb{X}$. Then the sublevel sets of $\mathcal{R}$ are τ-compact by the Banach–Alaoglu theorem. Moreover A is weak-to-weak continuous as it is bounded. Therefore, assumption 2.1 is automatically satisfied in this case.

In this subsection we compute K1 for the three penalties covered in the examples in subsection 2.2. We start with a tool that helps computing the function ϱ1 up to equivalence. Note that the density $\mathbb{X}\subset {\mathbb{X}}_{A}$ allows us to view the adjoint of the embedding as an embedding ${\mathbb{X}}_{A}^{\prime }\subset {\mathbb{X}}^{\prime }$.

Proposition 4.4. We have xK1 if and only if $\partial \mathcal{R}\left(x\right)\cap {\mathbb{X}}_{A}^{\prime }\ne \varnothing $. The function

satisfies

Proof. Suppose $\xi \in \partial \mathcal{R}\left(x\right)\cap {\mathbb{X}}_{A}^{\prime }$. Let $z\in {\mathbb{X}}_{A}$, then

Proposition B.2 provides $\omega \in \mathbb{Y}$ with ${\Vert}\omega {{\Vert}}_{\mathbb{Y}}{\leqslant}M{\Vert}\xi {{\Vert}}_{{{\mathbb{X}}_{A}}^{\prime }}$ and ${A}^{{\ast}}\omega =\xi \in \partial \mathcal{R}\left(x\right)$.

Together with proposition 4.1 this yields the first inequality.

Let $\omega \in \mathbb{Y}$, such that ${A}^{{\ast}}\omega \in \partial \mathcal{R}\left(x\right)$. Then

for all $z\in \mathbb{X}$. Hence ${\Vert}{A}^{{\ast}}\omega {{\Vert}}_{{\mathbb{X}}_{A}^{\prime }}{\leqslant}M{\Vert}\omega {{\Vert}}_{\mathbb{Y}}$. This proves the second inequality.□

Computation of K1 for weighted p -norm penalization. We revisit the first example in subsection 2.2. Recall ${\mathbb{X}}_{A}={\ell }_{\bar{a}}^{2}$ and $\mathbb{X}={\ell }_{\bar{r}}^{p}$ with p ∈ (1, 2).

Proposition 4.5. Let $\bar{s}={\bar{a}}^{-\frac{1}{p-1}}{\bar{r}}^{\frac{p}{p-1}}$. Then ${K}_{1}={\ell }_{\bar{s}}^{2p-2}$ with

Proof. Let $x\in {\ell }_{\bar{r}}^{p}$. Then $\partial \mathcal{R}\left(x\right)=\left\{\xi \right\}$ with $\vert {\xi }_{j}\vert ={r}_{j}^{p}\vert {x}_{j}{\vert }^{p-1}$. With ${\bar{\varrho }}_{1}$ as in proposition 4.4 and in view of proposition B.1 we obtain

Proposition 4.4 yields the result.□

Computation of K1 for Besov 0, p, p -penalties. Next we characterize K1 for example 2. Recall ${\mathbb{X}}_{A}={b}_{2,2}^{-a}$ and $\mathbb{X}={b}_{p,p}^{0}$ with p ∈ (1, ).

Proposition 4.6. Let $\tilde {s}=\frac{a}{p-1}$ and $\tilde {t}=2p-2$. Then ${K}_{1}={b}_{\tilde {t},\tilde {t}}^{\tilde {s}}$ with

Proof. The proof works along the lines of the proof of proposition 4.5 by identifying the expression for ||ξ||a,2,2 with ${\Vert}x{{\Vert}}_{\tilde {s},\tilde {t},\tilde {t}}$.□

Computation of K1 for Besov 0, 2, q -penalties. Finally we compute K1 for example 3 with ${\mathbb{X}}_{A}={b}_{2,2}^{-a}$ and $\mathbb{X}={b}_{2,q}^{0}$ with q ∈ (1, ).

Proposition 4.7. Let $\tilde {s}=\frac{a}{q-1}$ and $\tilde {q}=2q-2$. Then ${K}_{1}={b}_{2,\tilde {q}}^{\tilde {s}}$ with

Proof. If $x\in {b}_{2,q}^{0}$, then $\partial \mathcal{R}\left(x\right)=\left\{\xi \right\}$ with ${\xi }_{j,k}={\left({\sum }_{{k}^{\prime }}\vert {x}_{j,{k}^{\prime }}{\vert }^{2}\right)}^{\frac{q}{2}-1}\vert {x}_{j,k}\vert $. With ${\bar{\varrho }}_{1}$ as in proposition 4.4 and using proposition B.1 we obtain ${\bar{\varrho }}_{1}\left(x\right)={\Vert}\xi {{\Vert}}_{a,2,2}={\Vert}x{{\Vert}}_{\tilde {s},2,\tilde {q}}^{q-1}.$ proposition 4.4 yields the result.

Note that assumption 2.2 holds true for all three examples.

4.3. Characterizations of Kν

Kν via approximation by elements of K1. In [3, proposition 1] the authors point out that the set of elements satisfying the source condition (1.1) is the set of possible minimizers of the Tikhonov functional. Therefore one might suggest that the approximation error of $x\in \mathbb{X}$ by xα Rα (Ax) is determined by the best approximation from the family of sets

We consider the best approximation error

The function γx is well defined as corollary 4.2 yields $\varnothing \ne {\text{argmin}}_{z\in \mathbb{X}}\enspace \mathcal{R}\left(z\right)\subset {B}_{r}$ for all r ⩾ 0. Moreover it is non increasing as ${B}_{{r}_{1}}\subseteq {B}_{{r}_{2}}$ for r1r2.

The following proposition is the starting point to prove equivalence of Hölder-type bounds on γx and on ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}$.

Proposition 4.8. Let $x\in \mathbb{X}$, α > 0 and xα Rα (Ax). Then

Proof. Proposition 4.1 and the first order optimality condition $\frac{1}{\alpha }{A}^{{\ast}}A\left(x-{x}_{\alpha }\right)\in \partial \mathcal{R}\left({x}_{\alpha }\right)$ provide ${\varrho }_{1}\left({x}_{\alpha }\right){\leqslant}\frac{1}{\alpha }{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}$. This proves the first inequality by definition of γg .

To show the second inequality let zBr . By proposition 4.1 there is $\omega \in \mathbb{Y}$ with ${\Vert}\omega {{\Vert}}_{\mathbb{Y}}{\leqslant}r$ and ${A}^{{\ast}}\omega \in \partial \mathcal{R}\left(z\right)$ hence

From 2αTα (xα , Ax) ⩽ 2αTα (z, Ax) and the last inequality we deduce

Taking the infimum over zBr and estimating the third summand using $ab{\leqslant}\frac{1}{2}{a}^{2}+\frac{1}{2}{b}^{2}$ we obtain

Hence ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}2{\gamma }_{x}\left(r\right)+2\alpha r$ and the choice $r=\frac{1}{4\alpha }{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ yields the second inequality.□

As announced we see equivalence of Hölder-type bounds on γx and on ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ as a consequence.

Proposition 4.9. Let ν ∈ (0, ) and $x\in \mathbb{X}$. The following statements are equivalent:

  • (a)  
    There exists a constant c1 > 0 such that γx (r) ⩽ c1 rν for all r > 0.
  • (b)  
    There exists a constant c2 > 0 such ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}{c}_{2}{\alpha }^{\frac{\nu }{1+\nu }}$ for all α > 0 and xα Rα (Ax).

More precisely (a) implies (b) with ${c}_{2}=4{c}_{1}^{\frac{1}{1+\nu }}$ and (b) implies (a) with ${c}_{1}={c}_{2}^{1+\nu }$.

Proof. (a) ⇒ (b): the second inequality in proposition 4.8 yields

Multiplying by ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}^{\nu }$ and taking the power $\frac{1}{1+\nu }$ yields

(b) ⇒ (a): let r > 0. For $\alpha ={c}_{2}^{1+\nu }{r}^{-\left(1+\nu \right)}$ we obtain $\frac{1}{\alpha }{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}{c}_{2}{\alpha }^{-\frac{1}{1+\nu }}=r$. Hence the first inequality in proposition 4.8 yields

Kν via real interpolation. Again we assume ${\mathbb{X}}_{A}$ is a Banach space such that (2.4) holds true. The next lemma shows that under assumption 2.2 the spaces ${\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\theta ,\infty }$ classify the image space approximation precision.

Proposition 4.10 (Kν as a real interpolation space). Suppose assumption 2.2 holds true. Let θ ∈ (0, 1) and $\nu {:=}\frac{\theta }{\left(1-\theta \right)\left(u-1\right)+\theta }$. We have ${K}_{\nu }={\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\theta ,\infty }$ with

with constants C1, C2 > 0 depending only on u, θ and M.

Proof. Assume ϱ := ϱν (x) < . Proposition 4.9 provides the bound

Let t > 0. We choose r := ϱ(1−θ)(u−1)+θ t−(1−θ)(u−1). If ɛ > 0 then there exists zK1 with ϱ1(z) ⩽ r and ${\Vert}Ax-Az{{\Vert}}_{\mathbb{Y}}{\leqslant}{\gamma }_{x}\left(r\right)+\varepsilon $. Therefore we obtain

For ɛ → 0 we obtain

This proves the first inequality.

Assume $n{:=}{\Vert}x{{\Vert}}_{{\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\theta ,\infty }}{< }\infty $. We prove a bound on γx and apply proposition 4.9. Let r > 0. We choose $t{:=}2{M}^{\frac{1}{\left(1-\theta \right)\left(u-1\right)}}{n}^{\frac{1}{1-\theta }}{r}^{-\frac{1}{\left(1-\theta \right)\left(1-u\right)}}$. Since 21−θ > 1 there exists $z\in {\mathbb{X}}_{A}$ such that

Neglecting the first summand on the left-hand side we obtain ϱ1(z) ⩽ r. Therefore

Proposition 4.9 yields ${\varrho }_{\nu }\left(x\right){\leqslant}8M{n}^{\frac{u-1}{\left(1-\theta \right)\left(u-1\right)+\theta }}$.□

Remark 4.11. As already exposed in example 4.3 we cannot expect assumption 2.2 to hold true for 1-type norms like Besov 0, 1, 1 or 0, 2, 1-norms. Nevertheless one may use proposition 4.9 directly to characterize the sets Kν in this case. Applying theorem 1 then reproduces the convergence rates results for the 0, 2, 1-penalty in [17] and for weighed 1-penalties in [20] in the case of linear operators.

4.4. Error bounds

We apply theorem 1 to obtain error bounds measured in the norm of certain Banach spaces ${\mathbb{X}}_{L}$ with a continuous embedding ${\mathbb{X}}_{L}\subset {\mathbb{X}}_{A}$.

To this end we consider the loss function $L:\mathbb{X}{\times}\mathbb{X}\to \left[0,\infty \right]$ given by $L\left({x}_{1},{x}_{2}\right)={\Vert}{x}_{1}-{x}_{2}{{\Vert}}_{{\mathbb{X}}_{L}}$ if ${x}_{1}-{x}_{2}\in {\mathbb{X}}_{L}$ and L(x1, x2) = if ${x}_{1}-{x}_{2}\notin {\mathbb{X}}_{L}$. Before we prove theorem 2 we state a proposition that characterizes for which spaces ${\mathbb{X}}_{L}$ Hölder-type bounds on the modulus of continuity on balls of a given quasi-Banach space ${\mathbb{X}}_{S}\subset \mathbb{X}$ are satisfied.

Proposition 4.12 (Bound on the modulus). Let ${\mathbb{X}}_{S}\subset \mathbb{X}$ be a quasi-Banach space and ${\mathbb{X}}_{L}$ a Banach space with continuous embeddings ${\mathbb{X}}_{S}\subset {\mathbb{X}}_{L}\subset {\mathbb{X}}_{A}$ and e ∈ (0, 1). For ϱ > 0 we denote

The following statements are equivalent:

  • (a)  
    There is a continuous embedding ${\left({\mathbb{X}}_{A},{\mathbb{X}}_{S}\right)}_{e,1}\subset {\mathbb{X}}_{L}$.
  • (b)  
    There exists a constant c > 0 with ${\Omega}\left(\delta ,{K}_{{\mathbb{X}}_{S}}^{\varrho }\right){\leqslant}c{\varrho }^{e}{\delta }^{1-e}$ for all δ, ϱ > 0.

Proof. By [2, section 3.5, theorem 3.11.4] statement (a) is equivalent to an interpolation inequality

Equation (4.1)

Let ${x}_{1},{x}_{2}\in {K}_{{\mathbb{X}}_{S}}^{\varrho }$ with ${\Vert}A{x}_{1}-A{x}_{2}{{\Vert}}_{\mathbb{Y}}{\leqslant}\delta $. The quasi-triangle inequality yields ${\Vert}{x}_{1}-{x}_{2}{{\Vert}}_{{\mathbb{X}}_{S}}{\leqslant}2c\varrho $ and from (2.4) we obtain ${\Vert}{x}_{1}-{x}_{2}{{\Vert}}_{{\mathbb{X}}_{A}}{\leqslant}M\delta $. Hence (4.1) with z = x1x2 yields ${\Vert}{x}_{1}-{x}_{2}{{\Vert}}_{{\mathbb{X}}_{L}}{\leqslant}C{M}^{1-e}{\left(2c\right)}^{e}{\varrho }^{e}{\delta }^{1-e}$. Taking the supremum over x1, x2 yields (b).

Assuming a bound on the modulus we obtain (4.1) from

Next we give the proof of theorem 2.

Proof of theorem  2.For ν as in proposition 4.10 the second inequality therein yields

with $\bar{\varrho }={\left({C}_{2}\varrho \right)}^{\frac{u-1}{\left(1-\theta \right)\left(u-1\right)+\theta }}$.

In view of theorem 1 it remains to prove an upper bound on ${\Omega}\left({c}_{1}\delta ,{K}_{\nu }^{{c}_{2}\bar{\varrho }}\right){\leqslant}C{\varrho }^{\frac{\xi }{\theta }}{\delta }^{1-\frac{\xi }{\delta }}$ for constants c1, c2 > 0 given therein. The first inequality in proposition 4.10 provides

The reiteration theorem (see [2, theorem 3.11.5]) yields

Equation (4.2)

with equivalent quasi-norms. In particular ${\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\theta ,\infty }\subset {\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\xi ,1}\subset {\mathbb{X}}_{L}\subset {\mathbb{X}}_{A}$. Hence proposition 4.12 with ${\mathbb{X}}_{S}={\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\theta ,\infty }$ yields a constant c4 with

with $C={c}_{4}{c}_{3}^{\frac{\xi }{\theta }}{c}_{1}^{1-\frac{\xi }{\theta }}$.

For the discrepancy principle the bound ${\Omega}\left({d}_{1}\delta ,{K}_{\nu }^{{d}_{2}\bar{\varrho }}\right){\leqslant}C{\varrho }^{\frac{\xi }{\theta }}{\delta }^{1-\frac{\xi }{\delta }}$ follows by replacing c1 by d1 and c2 by d2.

Remark 4.13. The statement in remark 2.3 for the limiting case θ = 1 follows along the same lines leaving out the step involving the reiteration theorem.□

Remark 4.14. The relation ${\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\xi ,1}\subset {\mathbb{X}}_{L}$ is necessary to obtain error bounds as in theorem 2 in the following sense: assuming ${\mathbb{X}}_{L}$ satisfies an error bound

for some e ∈ (0, 1) and all $x\in {K}_{{\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\theta ,\infty }}^{\varrho }$ under some a priori parameter choice α = α(δ), then the lower bound (2.3) yields

Thus the converse implication in proposition 4.12 and the identity (4.2) provides

Error bounds for weighted p -norm penalization. To prove corollary 2.4 we return to the setting of example 1.

Proof of corollary  2.4.First note that assumption 2.2 holds true by proposition 4.5. By [8, theorem 2, remark] we have

Hence by [2, theorem 3.4.1(b); section 3.11] there is a continuous embedding ${\left({\ell }_{\bar{a}}^{2},{\ell }_{\bar{s}}^{2p-2}\right)}_{\xi ,1}\subset {\ell }_{\bar{r}}^{p}$. Hence the choice ${\mathbb{X}}_{L}={\ell }_{\bar{r}}^{p}$ satisfies the assumption of theorem 2.

The interpolation spaces ${\left({\mathbb{X}}_{A},{K}_{1}\right)}_{\theta ,\infty }={\left({\ell }_{\bar{a}}^{2},{\ell }_{\bar{s}}^{2p-2}\right)}_{\theta ,\infty }$ are characterized by weighted weak p -spaces ${\ell }_{\mu ,\nu }^{t,\infty }$ in the following manner:

with equivalent quasi-norms (see [8, theorem 2]).

The application of theorem 2 yields corollary 2.4 and remark 2.5 follows from remark 2.3.□

Error bounds for Besov 0, p, p -penalties. Next we revisit example 2.

Proof of corollary  2.7.Here assumption 2.2 holds true by proposition 4.6. The identification [8, theorem 2, remark] for p ≠ 2 and [27, 3.3.6.(9)] for p = 2 yield

Equation (4.3)

Hence the choice ${\mathbb{X}}_{L}={b}_{p,p}^{0}$ satisfies the assumption in theorem 2.

We apply theorem 2 to obtain corollary 2.7. Remark 2.8 follows from remark 2.3.□

Furthermore we prove the nestings given in proposition 2.9.

Proof of proposition 2.9. Let $\theta =\frac{p-1}{p}\frac{s+a}{a}$. Then $\frac{1}{t}=\frac{1-\theta }{2}+\frac{\theta }{\tilde {t}}$. With [8, theorem 2, remark] and [2, theorem 3.4.1(b)] we obtain

in both cases.

Suppose p < 2. Then $\tilde {t}{< }2$ and $t\in \left(\tilde {t},2\right)$. Let ɛ > 0 be such that $t-\varepsilon \in \left(\tilde {t},2\right)$. There are $s{< }{s}^{\prime }{< }\tilde {s}$ and θ < θ' < 1 such that ${b}_{t-\varepsilon ,t-\varepsilon }^{{s}^{\prime }}={\left({b}_{2,2}^{-a},{b}_{\tilde {t},\tilde {t}}^{\tilde {s}}\right)}_{{\theta }^{\prime },t-\varepsilon }$. The reiteration theorem (see [2, theorem 3.11.5]) yields ${k}_{s}={\left({b}_{2,2}^{-a},{b}_{t-\varepsilon ,t-\varepsilon }^{{s}^{\prime }}\right)}_{\frac{\theta }{{\theta }^{\prime }},\infty }$. From tɛ < 2 we obtain the continuous embeddings ${b}_{2,2}^{-a}\subset {b}_{2,\infty }^{-a}\subset {b}_{t-\varepsilon ,\infty }^{-a}$ (see [27, 3.2.4(1), 3.3.1(9)]). Together with the interpolation result ${b}_{t-\varepsilon ,\infty }^{s}={\left({b}_{t-\varepsilon ,\infty }^{-a},{b}_{t-\varepsilon ,\infty }^{r}\right)}_{\theta ,\infty }$ (see [27, 3.3.6 (9)]) and [27, 2.4.1 remark 4] we therefore obtain the second inclusion ${k}_{s}\subset {b}_{t-\varepsilon ,\infty }^{s}$ for all $0{< }\varepsilon {< }t-\tilde {t}$.

For p > 2 we have ${b}_{\tilde {t},\tilde {t}}^{\tilde {s}}\subset {b}_{2,\tilde {t}}^{\tilde {s}}$ (see [27, 3.3.1(9)]). Hence [27, 3.3.6 (9)] and [27, 2.4.1 remark 4] yield ${k}_{s}\subset {\left({b}_{2,2}^{-a},{b}_{2,\tilde {t}}^{\tilde {s}}\right)}_{\theta ,\infty }={b}_{2,\infty }^{s}$.□

Error bounds for Besov 0, 2, q-penalties. Next we treat example 3.

Proof of corollary 2.10. Due to proposition 4.7 assumption 2.2 is satisfied. By [27, 3.3.6.(9)] we have

Therefore the choice ${\mathbb{X}}_{L}={b}_{2,2}^{0}$ satisfies the assumption on ${\mathbb{X}}_{L}$ in theorem 2.

Moreover for $0{< }s{< }\frac{a}{q-1}$ we have

Hence the application of theorem 2 yields corollary 2.10, and remark 2.3 yields remark 2.11.□

Error bounds for the Radon transform. Finally we turn to the proof of the convergence rate result with the Radon transform as forward operator.

Proof of corollary 2.12.

  • (a)  
    Since a < smax the synthesis operator $\mathcal{S}$ is a norm isomorphism ${b}_{2,2}^{-a}\to {B}_{2,2}^{-a}\left({\Omega}\right)$. Hence the operator $R{\circ}\mathcal{S}$ satisfies (2.4) with ${\mathbb{X}}_{A}={b}_{2,2}^{-a}$. The inequality $\frac{d}{p}-\frac{d}{2}{\leqslant}a$ implies ${\sigma }_{t}{\leqslant}{s}_{\text{max}}$ and ${\sigma }_{t}{\leqslant}s$. Hence $\mathcal{S}:{b}_{t,t}^{s}\to {B}_{t,t}^{s}\left({\Omega}\right)$ is a norm isomorphism. Let c1 be the operator norm of the inverse of $\mathcal{S}$. Then $f=\mathcal{S}x$ with $x\in {b}_{t,t}^{s}$ and ${\Vert}x{{\Vert}}_{{b}_{t,t}^{s}}{\leqslant}{c}_{1}\varrho $. Let c2 be the embedding constant of ${b}_{t,t}^{s}\subset {k}_{s}$ (see proposition 2.9). Then we obtain xks with ${\Vert}x{{\Vert}}_{{k}_{s}}{\leqslant}{c}_{1}{c}_{2}\varrho $. With ${\hat{x}}_{\alpha }$ given by corollary 2.7 we obtain the bound
    with a constant $\tilde {C}{ >}0$ independent of f, δ and ϱ. Hence the first bound in corollary 2.7 implies
    with c3 the operator norm of $\mathcal{S}:{b}_{p,p}^{0}\to {B}_{p,p}^{0}\left({\Omega}\right)$. The bound in the Lp-norm for p ⩽ 2 follows from the continuity of the embedding ${B}_{p,p}^{0}\left({\Omega}\right)\subset {L}^{p}\left({\Omega}\right)$ (see [27]).
  • (b)  
    This follows along the lines of the proof of (a) using corollary 2.10 instead of corollary 2.7.

5. Connection to other source conditions

In this section we return to the setting of subsection 2.1 and assume only the conditions stated in its first lines. The aim of this section is the proof of theorem 3.

5.1. A preliminary: differentiability of the minimal value function

Definition 5.1 (Minimal value function). For $g\in \mathbb{Y}$ we define
${\vartheta }_{g}\left(\alpha \right){:=}{T}_{\alpha }\left({\hat{x}}_{\alpha },g\right)=\underset{x\in \mathbb{X}}{\mathrm{min}}\enspace {T}_{\alpha }\left(x,g\right)\quad \text{for}\enspace \alpha { >}0,$
independent of the choice ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left(g\right)$.

The main result of this subsection is the differentiability of the minimal value function. The approximation error ${\Vert}g-A{\hat{x}}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ can then be recovered from the derivative of ϑg .

Recall that the Moreau envelope function of some function $\mathcal{Q}:\enspace \mathbb{Y}\to \left(-\infty ,\infty \right]$ for α > 0 is given by
${\mathcal{Q}}_{\alpha }\left(g\right){:=}\underset{y\in \mathbb{Y}}{\mathrm{inf}}\left(\mathcal{Q}\left(y\right)+\frac{1}{2\alpha }{\Vert}y-g{{\Vert}}_{\mathbb{Y}}^{2}\right)$
and the infimum is uniquely attained at ${\text{Prox}}_{\alpha \mathcal{Q}}\left(g\right)\in \mathbb{Y}$. The key ingredient is the following result by T. Strömberg:

Lemma 5.2 (See [25, proposition 3(iii)]). Let $\mathcal{Q}:\enspace \mathbb{Y}\to \left(-\infty ,\infty \right]$ be convex, proper and lower semi-continuous. The family of Moreau envelope functions ${\mathcal{Q}}_{\alpha }:\mathbb{Y}\to \mathbb{R}$, α > 0 satisfies
$\frac{\partial }{\partial \alpha }{\mathcal{Q}}_{\alpha }\left(g\right)=-\frac{1}{2}{\Vert}\nabla {\mathcal{Q}}_{\alpha }\left(g\right){{\Vert}}_{\mathbb{Y}}^{2}\quad \text{for}\enspace \text{all}\enspace \alpha { >}0,\enspace g\in \mathbb{Y}.$
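As a concrete illustration of lemma 5.2 (a standard one-dimensional example, not taken from the paper): for $\mathbb{Y}=\mathbb{R}$ and $\mathcal{Q}=\vert \cdot \vert $ the Moreau envelope is the Huber function and the proximity mapping is soft-thresholding, so

${\mathcal{Q}}_{\alpha }\left(g\right)=\frac{{g}^{2}}{2\alpha }\enspace \text{if}\enspace \vert g\vert {\leqslant}\alpha \quad \text{and}\quad {\mathcal{Q}}_{\alpha }\left(g\right)=\vert g\vert -\frac{\alpha }{2}\enspace \text{if}\enspace \vert g\vert { >}\alpha ,$

with ${\mathcal{Q}}_{\alpha }^{\prime }\left(g\right)=\frac{g}{\alpha }$ respectively $\mathrm{sign}\left(g\right)$. One verifies $\frac{\partial }{\partial \alpha }{\mathcal{Q}}_{\alpha }\left(g\right)=-\frac{{g}^{2}}{2{\alpha }^{2}}$ respectively $-\frac{1}{2}$, which is exactly $-\frac{1}{2}{\left({\mathcal{Q}}_{\alpha }^{\prime }\left(g\right)\right)}^{2}$.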
We apply lemma 5.2 to the function $\mathcal{Q}$ defined in lemma 3.1. Note that due to proposition 3.3 we have

${\vartheta }_{g}\left(\alpha \right)={\mathcal{Q}}_{\alpha }\left(g\right)\quad \text{for}\enspace \text{all}\enspace \alpha { >}0,\enspace g\in \mathbb{Y}.$

Equation (5.1)

Proposition 5.3. Let $g\in \mathbb{Y}$ and ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left(g\right),\alpha { >}0$ any selection. The function ϑg is convex, non-increasing and continuously differentiable with

Proof. The Moreau envelope function ${\mathcal{Q}}_{\alpha }$ is convex, real valued and continuous with the Fenchel conjugate ${\left({\mathcal{Q}}_{\alpha }\right)}^{{\ast}}={\mathcal{Q}}^{{\ast}}+\frac{\alpha }{2}{\Vert}\cdot {{\Vert}}_{\mathbb{Y}}^{2}$ (see [1, propositions 12.15 and 13.21]). The biconjugation theorem implies

Hence ϑg is convex and non-increasing being the supremum of affine non-increasing functions.

By [1, proposition 12.29] ${\mathcal{Q}}_{\alpha }$ is Fréchet differentiable with $\nabla {\mathcal{Q}}_{\alpha }=\frac{1}{\alpha }\left({\text{Id}}_{\mathbb{Y}}-{\text{Prox}}_{\alpha \mathcal{Q}}\right)$. Lemma 5.2 yields differentiability of $\alpha {\mapsto}{\mathcal{Q}}_{\alpha }\left(g\right)$ with derivative $-\frac{1}{2}{\Vert}\left(\nabla {\mathcal{Q}}_{\alpha }\right)\left(g\right){{\Vert}}^{2}$ for all $g\in \mathbb{Y}$. Therefore, ϑg is differentiable and we conclude with proposition 3.3

Finally, ϑg ' is continuous as ϑg is convex and differentiable.□
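The derivative formula can also be checked numerically. The following sketch assumes the convention ${T}_{\alpha }\left(x,g\right)=\frac{1}{2\alpha }{\Vert}Ax-g{{\Vert}}_{\mathbb{Y}}^{2}+\mathcal{R}\left(x\right)$ implicit in lemma 5.4(a); the quadratic penalty $\mathcal{R}\left(x\right)=\frac{1}{2}{\Vert}x{{\Vert}}^{2}$ and the random matrix are illustrative assumptions, chosen so that ${\hat{x}}_{\alpha }$ has a closed form.

```python
import numpy as np

# Sketch: compare the finite-difference derivative of the minimal value
# function theta_g with the formula of proposition 5.3 for the toy choice
# R(x) = 0.5*||x||^2, where x_alpha = (A^T A + alpha*I)^{-1} A^T g.
rng = np.random.default_rng(0)
m, n = 12, 8
A = rng.standard_normal((m, n))
g = rng.standard_normal(m)

def minimizer(alpha):
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ g)

def theta(alpha):  # minimal value of the Tikhonov functional
    x = minimizer(alpha)
    return np.linalg.norm(A @ x - g) ** 2 / (2 * alpha) + 0.5 * np.linalg.norm(x) ** 2

for alpha in [0.1, 1.0, 10.0]:
    h = 1e-6 * alpha
    fd = (theta(alpha + h) - theta(alpha - h)) / (2 * h)          # finite difference
    x = minimizer(alpha)
    formula = -np.linalg.norm(g - A @ x) ** 2 / (2 * alpha ** 2)  # proposition 5.3
    print(f"alpha={alpha:5.1f}  finite diff={fd: .6e}  formula={formula: .6e}")
```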

5.2. Defect function and its link to variational source conditions

For the rest of this paper we always assume $x\in \mathrm{dom}\left(\mathcal{R}\right)$ is $\mathcal{R}$-minimal in A−1({Ax}) and xα Rα (Ax) for α > 0 is any selection of a minimizer for exact data.

If A is injective then the minimality is trivially satisfied for all $x\in \mathrm{dom}\left(\mathcal{R}\right)$.

As already mentioned we consider the defect of the Tikhonov functional ${\sigma }_{x}:\left(0,\infty \right)\to \left[\right.0,\infty \left.\right)$ given by
${\sigma }_{x}\left(\alpha \right){:=}{T}_{\alpha }\left(x,Ax\right)-{T}_{\alpha }\left({x}_{\alpha },Ax\right).$
The next proposition collects properties of the defect function.

Lemma 5.4. 

  • (a)  
    σx is concave, non-decreasing and continuously differentiable with ${\sigma }_{x}^{\prime }\left(\alpha \right)=\frac{1}{2{\alpha }^{2}}{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}^{2}$.
  • (b)  
    We have limα↘0σx (α) = 0.
  • (c)  
    The function $\left(0,\infty \right)\to \left[\right.0,\infty \left.\right)$ given by $\alpha {\mapsto}{\sigma }_{x}\left(\frac{1}{\alpha }\right)$ is convex and continuous.

Proof. We have ${\sigma }_{x}\left(\alpha \right)=\mathcal{R}\left(x\right)-{\vartheta }_{Ax}\left(\alpha \right)$ with the minimal value function ϑAx from definition 5.1. Hence (a) follows from proposition 5.3. Lemma A.2 yields (b) because of the $\mathcal{R}$-minimality assumption on x.

Let h be the function given in (c). Then h is differentiable and (a) yields

By lemma A.1(b) the function $\alpha {\mapsto}{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ is non-decreasing. Hence h' is non-decreasing. Therefore h is convex. Continuity follows from the first statement.□

Let α > 0. We write

to note a similarity to the distance function in [7, (3.1)], [6, chapter 12] and [16, chapter 3] used to derive variational source conditions of the form (2.8). In [16, proposition 4] it is shown that a variational source condition (2.8) implies bounds on the defect function σx . The next result provides a sharp connection between bounds on the defect function and variational source conditions. We introduce two partially ordered sets of functions

with pointwise ordering. Here l.s.c. is an abbreviation for lower semi-continuous. Moreover, we consider the map $\mathcal{F}:{\Sigma}\to {\Phi}$ given by

$\left(\mathcal{F}\left(\sigma \right)\right)\left(t\right)=\underset{\alpha { >}0}{\mathrm{inf}}\left(\sigma \left(\alpha \right)+\frac{t}{2\alpha }\right),\quad t{\geqslant}0.$

Equation (5.2)

In lemma C.2 we prove that $\mathcal{F}$ is well-defined, order preserving and bijective. The order preserving inverse ${\mathcal{F}}^{-1}:{\Phi}\to {\Sigma}$ is given by

${\left({\mathcal{F}}^{-1}\left(\phi \right)\right)}\left(\alpha \right)=\underset{t{\geqslant}0}{\mathrm{sup}}\left(\phi \left(t\right)-\frac{t}{2\alpha }\right),\quad \alpha { >}0.$

Equation (5.3)

By lemma 5.4 we have σx ∈ Σ. It turns out that $\phi =\mathcal{F}\left({\sigma }_{x}\right)$ is the minimal function in Φ satisfying (2.8).
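Before turning to the lemma, a numerical sanity check of the pair (5.2), (5.3) may be instructive. The Hölder-type choice of σ and the logarithmic grids below are illustrative assumptions; up to discretization error the round trip ${\mathcal{F}}^{-1}\left(\mathcal{F}\left(\sigma \right)\right)=\sigma $ is reproduced.

```python
import numpy as np

# Discretize (5.2) and (5.3) on log grids and verify the round trip for
# the Hoelder-type defect sigma(alpha) = c * alpha^(2*nu - 1), nu in (1/2, 1].
c, nu = 0.7, 0.8
alphas = np.logspace(-4, 4, 2000)
ts = np.logspace(-6, 6, 2000)
sigma = c * alphas ** (2 * nu - 1)

# (5.2): (F sigma)(t) = inf_alpha ( sigma(alpha) + t/(2*alpha) )
phi = np.min(sigma[None, :] + ts[:, None] / (2 * alphas[None, :]), axis=1)

# (5.3): (F^{-1} phi)(alpha) = sup_{t >= 0} ( phi(t) - t/(2*alpha) )
sigma_back = np.max(phi[None, :] - ts[None, :] / (2 * alphas[:, None]), axis=1)

mid = slice(500, 1500)  # interior of the grid, where the extrema are attained
rel_err = np.abs(sigma_back[mid] - sigma[mid]) / sigma[mid]
print("max relative error:", rel_err.max())
```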

Lemma 5.5. Let ϕ ∈ Φ. Then the following statements are equivalent:

  • (a)  
    $\mathcal{F}\left({\sigma }_{x}\right){\leqslant}\phi $.
  • (b)  
    ${\sigma }_{x}{\leqslant}{\mathcal{F}}^{-1}\left(\phi \right)$.
  • (c)  
    $\mathcal{R}\left(x\right)-\mathcal{R}\left(z\right){\leqslant}\phi \left({\Vert}Ax-Az{{\Vert}}_{\mathbb{Y}}^{2}\right)\quad \text{for}\enspace \text{all}\enspace z\in \mathbb{X}$.

In particular, we always have

$\mathcal{R}\left(x\right)-\mathcal{R}\left(z\right){\leqslant}\left(\mathcal{F}\left({\sigma }_{x}\right)\right)\left({\Vert}Ax-Az{{\Vert}}_{\mathbb{Y}}^{2}\right)\quad \text{for}\enspace \text{all}\enspace z\in \mathbb{X}.$

Equation (5.4)

Proof. The equivalence of (a) and (b) is immediate by lemma C.2. Next we prove (5.4). To this end let $z\in \mathbb{X}$ and α > 0. Then
${T}_{\alpha }\left({x}_{\alpha },Ax\right){\leqslant}{T}_{\alpha }\left(z,Ax\right)$
and $\mathcal{R}\left(x\right)={T}_{\alpha }\left(x,Ax\right)$. We obtain
$\mathcal{R}\left(x\right)-\mathcal{R}\left(z\right)={T}_{\alpha }\left(x,Ax\right)-{T}_{\alpha }\left(z,Ax\right)+\frac{1}{2\alpha }{\Vert}Ax-Az{{\Vert}}_{\mathbb{Y}}^{2}{\leqslant}{\sigma }_{x}\left(\alpha \right)+\frac{1}{2\alpha }{\Vert}Ax-Az{{\Vert}}_{\mathbb{Y}}^{2}.$
Taking the infimum over α on the right-hand side yields (5.4).

Hence (a) implies (c). Assuming (c) we estimate
${\sigma }_{x}\left(\alpha \right)=\mathcal{R}\left(x\right)-\mathcal{R}\left({x}_{\alpha }\right)-\frac{1}{2\alpha }{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}^{2}{\leqslant}\phi \left({\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}^{2}\right)-\frac{1}{2\alpha }{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}^{2}{\leqslant}\left({\mathcal{F}}^{-1}\left(\phi \right)\right)\left(\alpha \right).$
Hence ${\sigma }_{x}{\leqslant}{\mathcal{F}}^{-1}\left(\phi \right)$. This yields (a) as $\mathcal{F}$ is order preserving.□

Remark 5.6. Inequality (5.4) is sharp for z = xα Rα (Ax) for all α > 0. To see this note that by definition $\left(\mathcal{F}\left({\sigma }_{x}\right)\right)\left(t\right){\leqslant}{\sigma }_{x}\left(\alpha \right)+\frac{1}{2\alpha }t$ for all t ⩾ 0 and α > 0. By (5.4) we have

5.3. Link between defect function and image space approximation

The result of this subsection is that σx , and hence also the smallest index function ϕ allowing for a variational source condition (2.8), depends only on the net ${\left({\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}\right)}_{\alpha { >}0}$. Furthermore, we give a condition under which a bound ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}\psi \left(\alpha \right)$ implies a bound on the defect function σx .

Lemma 5.7. We have

${\sigma }_{x}\left(\alpha \right)={\int }_{0}^{\alpha }\frac{1}{2{\beta }^{2}}{\Vert}Ax-A{x}_{\beta }{{\Vert}}_{\mathbb{Y}}^{2}\enspace \mathrm{d}\beta \quad \text{for}\enspace \text{all}\enspace \alpha { >}0.$

Equation (5.5)

Proof. Let 0 < ɛ < α. Lemma 5.4(a) yields
${\sigma }_{x}\left(\alpha \right)-{\sigma }_{x}\left(\varepsilon \right)={\int }_{\varepsilon }^{\alpha }\frac{1}{2{\beta }^{2}}{\Vert}Ax-A{x}_{\beta }{{\Vert}}_{\mathbb{Y}}^{2}\enspace \mathrm{d}\beta .$
In view of lemma 5.4(b) the expression for σx follows by taking the limit ɛ → 0.□

Proposition 5.8 (Image space approximation).

  • (a)  
    We have
  • (b)  
    Let $\psi :\left[\right.0,\infty \left.\right)\to \left[\right.0,\infty \left.\right)$ be continuous. Assume that there is a constant Cψ > 0 with
    ${\int }_{0}^{\alpha }\frac{\psi \left(\beta \right)}{\beta }\enspace \mathrm{d}\beta {\leqslant}{C}_{\psi }\psi \left(\alpha \right)\quad \text{for}\enspace \text{all}\enspace \alpha { >}0.$
    Equation (5.6)
    Then a bound ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}\sqrt{2\alpha \psi \left(\alpha \right)}$ for all α > 0 implies σx (α) ⩽ Cψ ψ(α) for all α > 0.

Proof. 

  • (a)  
    By lemma 5.4 the continuous extension of σx to $\left[\right.0,\infty \left.\right)$ is concave. Hence the claim follows from
  • (b)  
    Using (5.5) and (5.6) we obtain
    ${\sigma }_{x}\left(\alpha \right)={\int }_{0}^{\alpha }\frac{1}{2{\beta }^{2}}{\Vert}Ax-A{x}_{\beta }{{\Vert}}_{\mathbb{Y}}^{2}\enspace \mathrm{d}\beta {\leqslant}{\int }_{0}^{\alpha }\frac{\psi \left(\beta \right)}{\beta }\enspace \mathrm{d}\beta {\leqslant}{C}_{\psi }\psi \left(\alpha \right).\enspace \square $
5.4. Equivalence theorem for Hölder-type bounds

Proof of theorem 3. (a) ⇒ (b). Consider the continuous function
$\psi \left(\alpha \right){:=}\frac{{c}_{1}^{2}}{2}{\alpha }^{2\nu -1},\quad \alpha { >}0.$
Then ${c}_{1}{\alpha }^{\nu }=\sqrt{2\alpha \psi \left(\alpha \right)}$ for all α > 0. We have
${\int }_{0}^{\alpha }\frac{\psi \left(\beta \right)}{\beta }\enspace \mathrm{d}\beta =\frac{{c}_{1}^{2}}{2}{\int }_{0}^{\alpha }{\beta }^{2\nu -2}\enspace \mathrm{d}\beta =\frac{{c}_{1}^{2}}{2\left(2\nu -1\right)}{\alpha }^{2\nu -1}=\frac{1}{2\nu -1}\psi \left(\alpha \right).$
Hence (5.6) is satisfied with ${C}_{\psi }=\frac{1}{2\nu -1}$. Proposition 5.8 implies ${\sigma }_{x}\left(\alpha \right){\leqslant}\frac{{c}_{1}^{2}}{4\nu -2}{\alpha }^{2\nu -1}$.

(b) ⇒ (c): for $\sigma \left(\alpha \right){:=}{c}_{2}{\alpha }^{2\nu -1}$ inserting $\alpha ={\left(\frac{t}{2{c}_{2}}\right)}^{\frac{1}{2\nu }}$ yields

$\left(\mathcal{F}\left(\sigma \right)\right)\left(t\right){\leqslant}\sigma \left(\alpha \right)+\frac{t}{2\alpha }={\left(2{c}_{2}\right)}^{\frac{1}{2\nu }}{t}^{1-\frac{1}{2\nu }}.$
Lemma 5.5 with $\phi =\mathcal{F}\left(\sigma \right)$ yields the claim.

(c) ⇒ (a): (see also [16, proof of proposition 4]) the first order condition

provides

Solving for ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ yields the claim.□
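One way to carry out this last step explicitly (a sketch using only the minimality ${T}_{\alpha }\left({x}_{\alpha },Ax\right){\leqslant}{T}_{\alpha }\left(x,Ax\right)$ and (c) with the index function $\phi \left(t\right)={c}_{3}{t}^{1-\frac{1}{2\nu }}$; the precise first order condition used above is not reproduced here):

$\frac{1}{2\alpha }{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}^{2}{\leqslant}\mathcal{R}\left(x\right)-\mathcal{R}\left({x}_{\alpha }\right){\leqslant}{c}_{3}{\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}^{2-\frac{1}{\nu }},$

so that ${\Vert}Ax-A{x}_{\alpha }{{\Vert}}_{\mathbb{Y}}{\leqslant}{\left(2{c}_{3}\alpha \right)}^{\nu }$.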

6. Discussion and outlook

We close this paper by addressing some open questions and possible extensions.

The identification of ARα as a proximity mapping (see subsection 3.1) seems to be a new structural insight in convex regularization theory. It allows one to apply convex analysis tools, leading to interesting statements and new, simple proofs (see e.g. corollary 3.5, proposition 5.3, lemma 3.7, lemma A.1). So far the presented theory is limited to Hilbert space data fidelity terms. It would be interesting to generalize the arguments in section 3 to Banach spaces $\mathbb{Y}$. A generalization to nonlinear operators seems even more challenging.

So far the presented theory is restricted to Hölder-type convergence rates. To also cover exponentially ill-posed problems it is of interest to investigate logarithmic convergence rates and source conditions. At first sight condition (5.6) seems to fail for index functions not of Hölder-type. Thus it remains open whether an equivalence between image space approximation rates and variational source conditions remains valid for more general upper bounds.

As for approaches using variational source conditions the fastest convergence rate we are able to prove for a p-homogeneous penalty term is $\mathcal{O}\left({\delta }^{\frac{1}{p}}\right)$ (see remarks 2.5, 2.8 and 2.11). It seems to be an interesting question to extend the presented approach to higher order convergence rates.

Another direction is the application to further concrete settings beyond the three presented examples. An idea is to formulate a weaker version of assumption 2.2 by requiring a nesting ${\mathbb{X}}_{1a}\subseteq {K}_{1}\subseteq {\mathbb{X}}_{1b}$ with quasi-Banach spaces ${\mathbb{X}}_{1a},{\mathbb{X}}_{1b}$ and to try to prove a generalized version of theorem 2. The author believes that this approach would cover e.g. Besov norm penalties with mixed indices p, q with p ≠ 2.

Data availability statement

No new data were created or analysed in this study.

Acknowledgments

I would like to thank Thorsten Hohage and Benjamin Sprung for fruitful discussions, Matthew Tam and Russell Luke for their support concerning convex analysis topics and Thomas Strömberg for lemma 5.2. Financial support by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) through Grant RTG 2088—B01 is gratefully acknowledged.

Appendix A.: Elementary facts from regularization theory

Lemma A.1. Let $g\in \mathbb{Y}$ and ${\hat{x}}_{\alpha }\in {R}_{\alpha }\left(g\right)$, α > 0 any selection.

  • (a)  
    The function $\left(0,\infty \right)\to \mathbb{R}$ given by $\alpha {\mapsto}\mathcal{R}\left({\hat{x}}_{\alpha }\right)$ is non-increasing.
  • (b)  
    The function $\left(0,\infty \right)\to \left[\right.0,\infty \left.\right)$ given by $\alpha {\mapsto}{\Vert}g-A{\hat{x}}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ is non-decreasing.
  • (c)  
    The function $\left(0,\infty \right)\to \left[\right.0,\infty \left.\right)$ given by $\alpha {\mapsto}\frac{1}{\alpha }{\Vert}g-A{\hat{x}}_{\alpha }{{\Vert}}_{\mathbb{Y}}$ is non-increasing.

Proof. To prove (a) and (b) let α < β. Set $m=\frac{1}{2}{\Vert}g-A{\hat{x}}_{\alpha }{{\Vert}}_{\mathbb{Y}}^{2}-\frac{1}{2}{\Vert}g-A{\hat{x}}_{\beta }{{\Vert}}_{\mathbb{Y}}^{2}.$ From ${T}_{\alpha }\left({\hat{x}}_{\alpha },g\right){\leqslant}{T}_{\alpha }\left({\hat{x}}_{\beta },g\right)$ and ${T}_{\beta }\left({\hat{x}}_{\beta },g\right){\leqslant}{T}_{\beta }\left({\hat{x}}_{\alpha },g\right)$ we obtain
$\frac{m}{\alpha }{\leqslant}\mathcal{R}\left({\hat{x}}_{\beta }\right)-\mathcal{R}\left({\hat{x}}_{\alpha }\right){\leqslant}\frac{m}{\beta }.$
Hence $\frac{m}{\alpha }{\leqslant}\frac{m}{\beta }$, and since α < β this yields m ⩽ 0, which proves (b). Consequently $\mathcal{R}\left({\hat{x}}_{\beta }\right)-\mathcal{R}\left({\hat{x}}_{\alpha }\right){\leqslant}\frac{m}{\beta }{\leqslant}0$, which proves (a). (c) follows from proposition 5.3.□
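The monotonicity statements can also be observed numerically in the same illustrative quadratic setting as before (an assumption for the sketch, not part of the lemma):

```python
import numpy as np

# Check lemma A.1 (a)-(c) for R(x) = 0.5*||x||^2, where
# x_alpha = (A^T A + alpha*I)^{-1} A^T g has a closed form.
rng = np.random.default_rng(1)
A = rng.standard_normal((12, 8))
g = rng.standard_normal(12)

alphas = np.logspace(-3, 3, 25)
penalty = np.empty_like(alphas)
residual = np.empty_like(alphas)
for i, alpha in enumerate(alphas):
    x = np.linalg.solve(A.T @ A + alpha * np.eye(8), A.T @ g)
    penalty[i] = 0.5 * np.linalg.norm(x) ** 2
    residual[i] = np.linalg.norm(g - A @ x)

assert np.all(np.diff(penalty) <= 1e-12)           # (a) non-increasing
assert np.all(np.diff(residual) >= -1e-12)         # (b) non-decreasing
assert np.all(np.diff(residual / alphas) <= 1e-9)  # (c) non-increasing
print("lemma A.1 monotonicity checks passed")
```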

Lemma A.2. Let $x\in {\mathbb{X}}_{A}$ and xα Rα (Ax), α > 0 any selection. Then

Proof. Due to (5.1) and [1, proposition 12.32] we have

with $\mathcal{Q}$ defined in lemma 3.1 and ${\mathcal{Q}}_{\alpha }$ its Moreau envelope (see subsection 5.1).□

Appendix B.: Properties of Banach spaces

Proposition B.1. 

  • (a)  
Let p ∈ [1, ∞) and $\omega ={\left({\omega }_{j}\right)}_{j\in {\Lambda}}$ a sequence of positive reals. Let p' ∈ (1, ∞] with $\frac{1}{p}+\frac{1}{{p}^{\prime }}=1$. Then the pairing
    is well defined and gives rise to an isometric isomorphism ${\left({\ell }_{\omega }^{p}\right)}^{\prime }\cong {\ell }_{{\omega }^{-1}}^{{p}^{\prime }}$.
  • (b)  
Let p, q ∈ [1, ∞) and $s\in \mathbb{R}$. Then the pairing
    is well defined and gives rise to an isometric isomorphism ${\left({b}_{p,q}^{s}\right)}^{\prime }\cong {b}_{{p}^{\prime },{q}^{\prime }}^{-s}$. (See [27, 2.11.2 (1)])

Proposition B.2. [23, lemma 8.21]. Let $A:\mathbb{X}\to \mathbb{Y}$ be a bounded linear operator between Banach spaces and $\xi \in {\mathbb{X}}^{\prime }$. The following statements are equivalent:

  • (a)  
    There exists a constant c ⩾ 0 such that $\langle \xi ,x\rangle {\leqslant}c{\Vert}Ax{{\Vert}}_{\mathbb{Y}}$ for all $x\in \mathbb{X}$.
  • (b)  
    There exists $\omega \in {\mathbb{Y}}^{\prime }$ with ${\Vert}\omega {{\Vert}}_{{\mathbb{Y}}^{\prime }}{\leqslant}c$ and A*ω = ξ.

Appendix C.: Index function calculus

Let ${\Gamma}{:=}\left\{f:\mathbb{R}\to \left(-\infty ,\infty \right]:f\;\text{is}\;\text{proper,}\;\text{convex}\;\text{and}\;\text{lower}\;\text{semi}-\text{continuous}\right\}$.

Lemma C.1. Suppose f ∈ Γ. Then

  • (a)  
f is positive with dom(f) ⊆ (−∞, 0] if and only if ${f}^{{\ast}}{\vert }_{\left[\right.0,\infty \left.\right)}{\leqslant}0$.
  • (b)  
    f is non-decreasing if and only if $\mathrm{dom}\left({f}^{{\ast}}\right)\subseteq \left[\right.0,\infty \left.\right)$.

Proof. 

  • (a)  
f is positive with dom(f) ⊆ (−∞, 0] if and only if ${\chi }_{\left(-\infty ,0\right]}{\leqslant}f$. ${f}^{{\ast}}{\vert }_{\left[\right.0,\infty \left.\right)}{\leqslant}0$ if and only if ${f}^{{\ast}}{\leqslant}{\chi }_{\left[\right.0,\infty \left.\right)}$. Hence the claim follows from ${\chi }_{\left[\right.0,\infty \left.\right)}^{{\ast}}={\chi }_{\left(-\infty ,0\right]}$.
  • (b)  
    Suppose f is non-decreasing and let t < 0. Let β0 ∈ dom(f). Then
    $\beta t-f\left(\beta \right){\geqslant}\beta t-f\left({\beta }_{0}\right)\quad \text{for}\enspace \text{all}\enspace \beta {\leqslant}{\beta }_{0}.$
    As βtf(β0) → ∞ for β → −∞ this shows ${f}^{{\ast}}\left(t\right)={\mathrm{sup}}_{\beta \in \mathbb{R}}\left(\beta t-f\left(\beta \right)\right)=\infty $. Hence $\mathrm{dom}\left({f}^{{\ast}}\right)\subseteq \left[\right.0,\infty \left.\right)$. Vice versa, assume $\mathrm{dom}\left({f}^{{\ast}}\right)\subseteq \left[\right.0,\infty \left.\right)$. Then $f\left(\beta \right)={\mathrm{sup}}_{t{\geqslant}0}\left(\beta t-{f}^{{\ast}}\left(t\right)\right)$ is non-decreasing as a supremum over non-decreasing functions.□

Lemma C.2. The map $\mathcal{F}$ defined in (5.2) is well-defined, order preserving and bijective. The expression (5.3) holds true.

Proof. We define the following sets

By lemma C.1 the Fenchel conjugation *: Γ1 → Γ2 is an order reversing bijection and its inverse is given by the Fenchel conjugation *: Γ2 → Γ1. We will construct bijections ${\mathcal{G}}_{1}:{\Sigma}\to {{\Gamma}}_{1}$ and ${\mathcal{G}}_{2}:{{\Gamma}}_{2}\to {\Phi}$ such that $\mathcal{F}={\mathcal{G}}_{2}\enspace {\circ}\enspace {\ast}\enspace {\circ}\enspace {\mathcal{G}}_{1}$.

Let σ ∈ Σ. Then we define

Then fσ is proper, non-decreasing and dom(fσ ) ⊂ (−∞, 0]. Convexity and lower semi-continuity of $\sigma \left(\frac{1}{\cdot }\right)$ yield convexity and lower semi-continuity of fσ on (−∞, 0). We have

Hence fσ is convex and lower semi-continuous.

It is easy to see that ${\mathcal{G}}_{1}:{\Sigma}\to {{\Gamma}}_{1}$ given by σfσ is an order preserving bijection. Its inverse is given by $\left({\mathcal{G}}_{1}^{-1}\left(f\right)\right)\left(\alpha \right)=f\left(-\frac{1}{2\alpha }\right)$.

Moreover, the map Γ2 → Φ given by $g{\mapsto}-\left(g{\vert }_{\left[\right.0,\infty \left.\right)}\right)$ is well defined, bijective and order reversing. Its inverse is given by ϕgϕ with

If σ ∈ Σ and t ⩾ 0 then

Hence

This shows $\mathcal{F}={\mathcal{G}}_{2}\enspace {\circ}\enspace {\ast}\enspace {\circ}\enspace {\mathcal{G}}_{1}$. Therefore $\mathcal{F}$ is an order preserving bijection. It remains to compute ${\mathcal{F}}^{-1}={\mathcal{G}}_{1}^{-1}\enspace {\circ}\enspace {\ast}\enspace {\circ}\enspace {\mathcal{G}}_{2}^{-1}$. If ϕ ∈ Φ and α > 0, then
${\left({\mathcal{F}}^{-1}\left(\phi \right)\right)}\left(\alpha \right)={\left({\mathcal{G}}_{2}^{-1}\left(\phi \right)\right)}^{{\ast}}\left(-\frac{1}{2\alpha }\right)=\underset{t{\geqslant}0}{\mathrm{sup}}\left(\phi \left(t\right)-\frac{t}{2\alpha }\right).\enspace \square $