Adaptive minimax optimality in statistical inverse problems via SOLIT — Sharp Optimal Lepskiĭ-Inspired Tuning



Introduction
In this paper, we consider statistical linear inverse problems of the form
$$Y = T f^\dagger + \sigma Z, \qquad (1)$$
with a linear and injective Hilbert-Schmidt operator $T\colon \mathcal{X} \to \mathcal{Y}$ mapping between separable Hilbert spaces $\mathcal{X}$ and $\mathcal{Y}$, and the standard Gaussian Hilbert space process $Z\colon \mathcal{Y} \to L^2(\Omega, \mathcal{A}, \mathbb{P})$.
Here and in what follows, we denote by $f^\dagger \in \mathcal{X}$ the true (but unknown) solution, by $\sigma > 0$ the noise level, and by $Y$ the available measurements. Note that $Z$ in (1) can be identified with an element of $\mathcal{Y}$ only if $\dim(\mathcal{Y}) < \infty$. Otherwise, we have $Y \notin \mathcal{Y}$ a.s., which implies that the model (1) has to be understood in a weak sense, i.e., for all $y \in \mathcal{Y}$ we have access to the real-valued random variable $\langle Y, y\rangle := \langle T f^\dagger, y\rangle + \sigma Z(y)$.
Statistical inverse problems of the form (1) constitute a widely investigated field, as they arise in applications ranging from astronomy and medical imaging to engineering, see, e.g., [7,17,32]. The assumption of Gaussian white noise as in (1) is most common and can be justified either by central limit theorems or by asymptotic equivalence statements [34].
To reconstruct the quantity of interest $f^\dagger$ from the available data $Y$, we consider the filter-based regularization
$$\hat f_\alpha = q_\alpha(T^*T)\, T^*Y, \qquad (2)$$
where $(q_\alpha)_{\alpha>0}$ is an ordered filter (cf. Definition 1 below). Spectral regularization schemes (2) based on filters were introduced by Bakushinskii [3] and yield a variety of regularized estimators for the desired quantity $f^\dagger$. They have also received substantial attention in the literature (see, e.g., the monograph [11] for a deterministic treatment and [8,23,29] for a stochastic investigation).
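To fix ideas, here is a minimal Python sketch (all numerical choices, such as the eigenvalue decay and the coefficients of $f^\dagger$, are purely illustrative assumptions) of the sequence-space form of model (1) together with the filter-based estimator (2) for the Tikhonov filter $q_\alpha(\lambda) = 1/(\lambda+\alpha)$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical diagonal discretization: T*T has eigenvalues lambda_k ~ k^{-a}
n, a, sigma = 500, 2.0, 1e-3
k = np.arange(1, n + 1)
lam = k ** (-a)                      # eigenvalues of T*T
s = np.sqrt(lam)                     # singular values of T

f_true = k ** (-1.5)                 # assumed coefficients of f^dagger in the eigenbasis
y = s * f_true + sigma * rng.standard_normal(n)   # sequence-space form of model (1)

def tikhonov(lam, alpha):
    """Ordered filter q_alpha(lambda) = 1 / (lambda + alpha)."""
    return 1.0 / (lam + alpha)

def filter_estimate(y, lam, s, alpha, q=tikhonov):
    """Filter-based estimator (2): f_hat = q_alpha(T*T) T* Y, in the eigenbasis."""
    return q(lam, alpha) * s * y

alpha = 1e-2                          # some fixed regularization parameter
f_hat = filter_estimate(y, lam, s, alpha)
print("squared error:", np.sum((f_hat - f_true) ** 2))
```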
An important property of the estimators $\hat f_\alpha$ under investigation is their behavior as the noise level $\sigma$ tends to $0$. Due to the famous result by Schock [35], this behavior, measured in the rate of convergence of the mean squared error (MSE) $\mathbb E\big[\|\hat f_\alpha - f^\dagger\|^2\big]$, can be arbitrarily slow. However, under additional smoothness conditions on $f^\dagger$, it is in many situations possible to derive convergence rates
$$\mathbb E\big[\|\hat f_{\alpha_{\mathrm{opt}}} - f^\dagger\|^2\big] = \mathcal O\big(\psi(\sigma)\big) \quad \text{as } \sigma \to 0$$
with a monotonically increasing and continuous rate function $\psi\colon \mathbb R_{\geq 0} \to \mathbb R_{\geq 0}$, $\psi(0) = 0$, for the ideal parameter choice $\alpha_{\mathrm{opt}} = \alpha_{\mathrm{opt}}(\sigma)$ given by the minimizer of the MSE over $\alpha > 0$. We stress that the focus of this paper is on the order of the convergence rate, namely $\psi(\sigma)$, rather than the multiplying constant hidden in Landau's big-$\mathcal O$ notation. Thus, for brevity, optimality statements throughout are meant for the order of the convergence rate, not the multiplying constant, without explicit declaration. Under reasonable assumptions on the filter $(q_\alpha)_{\alpha>0}$, these rates of convergence can actually be shown to be minimax optimal, i.e., best possible among all estimators uniformly over a smoothness class (cf. Section 2 for details). This is a strong indicator that the chosen family of estimators will perform well in practice, but it is unfortunately useless for applications since the oracle parameter choice $\alpha_{\mathrm{opt}}$ cannot be implemented (as $f^\dagger$ is unknown). Therefore, one is actually interested in practically implementable choices of $\alpha$ (i.e., not depending on unknown quantities such as $f^\dagger$ or its smoothness) that mimic the minimax convergence behavior.
In this paper, we investigate a specific a posteriori parameter choice rule, which we call Sharp Optimal Lepskiĭ-Inspired Tuning (SOLIT). In terms of the estimator (2), the corresponding parameter $\hat\alpha$ is constructed as follows. We first choose a suitable finite set of candidates $\{\alpha_0, \dots, \alpha_{m_{\max}}\}$, ordered increasingly with respect to variance (or model complexity), and then set $\hat\alpha := \alpha_{\hat m}$ with
$$\hat m := \min\Big\{ m_1 \in \{0,\dots,m_{\max}\} : \big\|\hat f_{\alpha_{m_2}} - \hat f_{\alpha_{m_1}}\big\|_{\mathcal X}^2 \leq \eta_{m_1,m_2} \ \text{ for all } m_2 > m_1 \Big\},$$
where $\eta_{m_1,m_2} \geq 0$ are explicitly computable critical values depending only on the noise level $\sigma$, the filter $q_\alpha$, and the forward operator $T$ (see Section 3 for the formal definition and Section 5.2 for computational issues). It is derived from a more general principle proposed in [37], and our analysis exploits and extends results from that paper. The SOLIT rule $\hat\alpha = \alpha_{\hat m}$ can be seen as a variant of the Lepskiĭ-type balancing principle. Lepskiĭ-type balancing principles are well known in inverse problems and are typically of the form $\alpha_{\mathrm{LEP}} = \alpha_{m_{\mathrm{LEP}}}$ with
$$m_{\mathrm{LEP}} := \min\Big\{ m \in \{0,\dots,m_{\max}\} : \big\|\hat f_{\alpha_m} - \hat f_{\alpha_k}\big\|_{\mathcal X} \leq 4\kappa\, \mu_k \ \text{ for all } k > m \Big\}, \qquad (3)$$
where $\mu_k := \sigma \sqrt{\operatorname{trace}\big(q_{\alpha_k}(T^*T)^2\, T^*T\big)}$ and $\kappa \geq 1$ is a tuning parameter, see, e.g., [4, 27-29, 40, 41]. Parameter choice rules of the form $\alpha_{\mathrm{LEP}}$ were originally introduced in [22], and their usage in inverse problems is summarized in the overview paper [27]. It has been shown in [6,29] that $\alpha_{\mathrm{LEP}}$ satisfies an oracle inequality which allows one to compare the MSE under $\alpha_{\mathrm{LEP}}$, i.e., $\mathbb E\big[\|\hat f_{\alpha_{\mathrm{LEP}}} - f^\dagger\|^2\big]$, with the oracle MSE with high probability, up to a $\sqrt{m_{\max}}\,\exp(-C\kappa^2)$ term. From this one can deduce that a reasonable choice of $\kappa$ (depending on the minimax rate of convergence, see Section 2 below, and hence on the ill-posedness of $T$) leads to minimax rates of convergence up to a logarithmic factor.
In contrast, we will show that the SOLIT rule $\hat\alpha = \alpha_{\hat m}$ yields the best possible rate of convergence (precisely the adaptive minimax rate, see also Section 2 below) without loss of logarithmic factors and without the necessity to adjust any tuning parameters to the ill-posedness of $T$, over a wide range of smoothness classes for $f^\dagger$. One reason behind this is that SOLIT employs a subtle upper bound $\eta_{m_1,m_2}$ depending on both $m_1$ and $m_2$, while the Lepskiĭ-type balancing principle $\alpha_{\mathrm{LEP}}$ utilises an upper bound $\mu_{m_2}$ depending just on $m_2$. Also importantly, SOLIT requires a special choice of the candidate parameters $\alpha_0, \dots, \alpha_{m_{\max}}$, see Section 3.2.1 below.
The outline of this paper is as follows. In Section 2 we discuss the questions of minimax optimality and adaptation in inverse problems and provide the corresponding rates of convergence for a wide class of problems. This also serves as a review of known results while adding some new aspects. Section 3 is then devoted to filter-based regularization and a detailed derivation of our parameter choice SOLIT. In this section, we also state a generic oracle inequality for the risk under the SOLIT rule. The final result on rates of convergence, including all necessary assumptions and computations, is presented in Section 4. In Section 5 we discuss how SOLIT can be implemented and provide details on the computation of the critical values $\eta_{m_1,m_2}$ involved in $\hat\alpha$. Furthermore, we present various numerical simulations supporting our analysis, before we end this paper with some conclusions in Section 6.

Minimax optimality and adaptation
Before we define and analyze the SOLIT rule, let us recall the minimax paradigm in statistical inverse problems. Note that the forward operator $T$ in (1) is assumed to be compact, injective and Hilbert-Schmidt. Thus, there is an eigendecomposition of $T^*T$ with strictly positive eigenvalues $\{\lambda_k\}_{k\in\mathbb N}$, ordered decreasingly, and corresponding eigenfunctions $\{e_k\}_{k\in\mathbb N}$, such that
$$T^*T = \sum_{k\in\mathbb N} \lambda_k\, \langle \cdot, e_k\rangle\, e_k. \qquad (4)$$
We stress that this decomposition is only used in our theoretical analysis and is not necessarily required for the construction of our estimator or the SOLIT parameter choice, see Section 5.
The degree of ill-posedness of the model (1) is often characterized by the decay rate of $\lambda_k \to 0$ as $k \to \infty$. Most commonly assumed is a polynomial rate of decay
$$c_{\min}\, k^{-a} \leq \lambda_k \leq c_{\max}\, k^{-a} \quad \text{for all } k \in \mathbb N \qquad (5)$$
with $0 < c_{\min} \leq c_{\max}$ and $a > 0$. This is often referred to as mild ill-posedness. Applications where this occurs include medical X-ray tomography (with $T$ the $d$-dimensional Radon transform, $d \in \mathbb N_{\geq 2}$, which satisfies (5) with $a = d-1$, see [31]), parameter identification problems in partial differential equations (see, e.g., [18]), and density estimation in statistics (see, e.g., [39]), to name a few. Also widely accepted are models with an exponential decay of the singular values
$$c_{\min}\, k^{-a} \exp(-b k^{\vartheta}) \leq \lambda_k \leq c_{\max}\, k^{-a} \exp(-b k^{\vartheta}) \quad \text{for all } k \in \mathbb N \qquad (6)$$
with $a \in \mathbb R$ and $b, \vartheta > 0$, which is typically called severe ill-posedness. Some examples include inverse heat equation problems [12] as well as image deblurring and light microscopy (where $T$ is the convolution operator with, e.g., a Gaussian kernel) [20]. In most of this paper, we will focus on mild ill-posedness (5), but in Section 4.2 we will also study the severely ill-posed case (6).
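As a small illustration (with made-up parameter values), the following snippet generates eigenvalue sequences of the two types (5) and (6) and shows how quickly a severely ill-posed spectrum drops below machine precision:

```python
import numpy as np

k = np.arange(1, 51)

# Mildly ill-posed, cf. (5): polynomial decay with illustrative degree a = 1,
# e.g. the 2D Radon transform where a = d - 1 = 1.
lam_mild = k ** (-1.0)

# Severely ill-posed, cf. (6): exponential decay, e.g. the periodic heat equation
# with lambda_k = exp(-2 k^2 t_bar); t_bar = 0.1 is an illustrative choice.
lam_severe = np.exp(-2.0 * k ** 2 * 0.1)

print("first index with lambda_k below double precision (severe case):",
      int(np.argmax(lam_severe < np.finfo(float).eps)) + 1)
```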
As mentioned in the Introduction, nothing can be said without further assumptions on the unknown $f^\dagger$. Typically, one poses spectral source conditions of the form
$$f^\dagger \in W_\varphi(\rho) := \big\{ f \in \mathcal X : f = \varphi(T^*T)\, g \ \text{ for some } g \text{ with } \|g\|_{\mathcal X} \leq \rho \big\} \qquad (7)$$
with an index function $\varphi\colon \mathbb R_{\geq 0} \to \mathbb R_{\geq 0}$ (i.e., monotonically increasing, continuous and $\varphi(0) = 0$). With the eigendecomposition of $T^*T$ in (4), we can rewrite the source condition (7) equivalently as the ellipsoid
$$W_\varphi(\rho) = \Big\{ f \in \mathcal X : \sum_{k\in\mathbb N} w_k\, \langle f, e_k\rangle^2 \leq \rho^2 \Big\} \quad \text{with } w_k = \varphi(\lambda_k)^{-2} \text{ for } k \in \mathbb N.$$
For several operators $T$, spectral source conditions with specific index functions $\varphi$ can be interpreted as classical smoothness conditions in terms of Sobolev spaces. Note that for every $f^\dagger$ there exist an index function $\varphi$ and a parameter $\rho > 0$ such that $f^\dagger \in W_\varphi(\rho)$ (see [26]), but the decay behavior of $\varphi$ at $0$ can be arbitrarily slow.
We are particularly interested in index functions $\varphi$ of the form
$$\varphi(x) = \varphi_{\nu,\tau}(x) := x^{\nu}\, (-\log x)^{-\tau}, \qquad (8)$$
where either $\nu > 0$ and $\tau \in \mathbb R$ (referred to as polynomial smoothness), or $\nu = 0$ and $\tau > 0$ (referred to as logarithmic smoothness). To ensure well-definedness of $\varphi_{\nu,\tau}(T^*T)$, we assume that $\lambda_1 \leq \exp(-1)$, without loss of generality. Given an index function $\varphi$ and a parameter $\rho > 0$, one can now ask for the minimax risk of the MSE, denoted by
$$\psi(\sigma) = \psi(\sigma, \varphi, \rho, T) := \inf_{\hat f}\ \sup_{f^\dagger \in W_\varphi(\rho)} \mathbb E\big[\|\hat f - f^\dagger\|^2\big]. \qquad (9)$$
Here and below (e.g., in (12)), an estimator $\hat f$ is any measurable mapping from $\mathcal Y^*$ to $\mathcal X$, equipped with the usual Borel $\sigma$-algebras, where $\mathcal Y^*$ denotes the space of linear mappings from $\mathcal Y$ to $L^2(\Omega, \mathcal A, \mathbb P)$. To derive the minimax risk (9), it turns out that spectral cutoff, corresponding to the filter-based regularization (2) with $q_\alpha(\lambda) := \lambda^{-1}\mathbf 1\{\lambda \geq \alpha\}$, is particularly helpful. It has been shown, e.g., in [13,19,25], that, for known smoothness parameters $\nu$ and $\tau$, the spectral cutoff estimator achieves the minimax rate of convergence (this is the same rate as the minimax risk in (9) as $\sigma \to 0$) with the index function $\varphi_{\nu,\tau}$ in (8). Furthermore, a modified spectral cutoff [33], often known as Pinsker's filter, is shown to achieve asymptotically the exact minimax risk (i.e., the leading constant is also optimal) in many cases, e.g., when $\sup_{k\in\mathbb N} \lambda_k/\lambda_{k+1} < \infty$ and the radius $\rho$ is also known, see [14]. However, note that the minimax risk (9) allows the estimator to have knowledge of $\varphi$, which is typically unknown in practice. A more realistic setting is to suppose that $\varphi \in \Phi$ for a collection of index functions $\Phi$. For example, it might be a priori known that $f^\dagger$ is of polynomial smoothness, which corresponds to
$$\Phi_{\nu_0} = \{ \varphi = \varphi_{\nu,\tau} : 0 < \nu \leq \nu_0,\ \tau \in \mathbb R \}, \qquad (10)$$
or that $f^\dagger$ is of logarithmic smoothness, i.e.,
$$\Phi_{\tau_0} = \{ \varphi = \varphi_{\nu,\tau} : \nu = 0,\ 0 < \tau \leq \tau_0 \}, \qquad (11)$$
where $\nu_0, \tau_0 > 0$ are known. In this case, one is interested in adapting to the unknown smoothness over $\Phi$, and wants to achieve the minimal price of adaptation over $\Phi$ with respect to the MSE, defined as
$$\varpi(\sigma) = \varpi(\sigma, \Phi, \rho, T) := \inf_{\hat f}\ \sup_{\varphi \in \Phi}\ \frac{\sup_{f^\dagger \in W_\varphi(\rho)} \mathbb E\big[\|\hat f - f^\dagger\|^2\big]}{\psi(\sigma, \varphi, \rho, T)}, \qquad (12)$$
with $\psi(\sigma, \varphi, \rho, T)$ defined in (9). For a fixed $\varphi \in \Phi$, we refer to
$$\psi^*(\sigma) = \psi^*(\sigma, \varphi, \Phi, \rho, T) := \varpi(\sigma, \Phi, \rho, T)\, \psi(\sigma, \varphi, \rho, T) \qquad (13)$$
as the adaptive minimax risk of the MSE over $\Phi$. Clearly, the adaptive minimax risk $\psi^*(\sigma, \varphi, \Phi, \rho, T)$ is always no better (i.e., decays no faster as the argument tends to $0$) than the minimax risk $\psi(\sigma, \varphi, \rho, T)$, since $\varpi(\sigma) \geq 1$. There are situations in which both risks coincide up to a constant factor, and also situations in which they differ by a logarithmic factor in $\sigma$ even when $\Phi$ contains only two index functions. Determining the adaptive minimax risk in (13), or equivalently the minimal price of adaptation in (12), is more delicate. In the case of logarithmic smoothness (i.e., $\Phi_{\tau_0}$), the convergence rate of the adaptive minimax risk (13) coincides with that of the (non-adaptive) minimax risk (9) even for $\Phi := \bigcup_{\tau_0 > 0} \Phi_{\tau_0}$. Interestingly, the order of the optimal adaptation rates can be attained by, e.g., the spectral cutoff with regularization parameter $\alpha \asymp \sigma$, which depends solely on the noise level and thus requires no knowledge of the smoothness parameter $\tau$ and the radius $\rho$ (see Proposition A.1 in Appendix A for general ordered filters). In the case of polynomial smoothness (i.e., $\Phi_{\nu_0}$), the relation between the minimax risk and the adaptive minimax risk is subtle and depends on the degree of ill-posedness. For mildly ill-posed problems (which we mainly consider), the optimal adaptation rates can be equal to the optimal rates, as shown by [9] using a spectral cutoff variant together with a penalized blockwise Stein's rule, see also [23]. For severely ill-posed problems (i.e., eigenvalues of $T^*T$ decaying at an exponential rate; cf. (6) and Section 4.2), the optimal adaptation rates can differ from the optimal rates by a logarithmic factor. This has, e.g., been shown in [38], and we will discuss it in detail in Section 4.2. The difference to the minimax risk is exactly the price one has to pay for the missing information about the underlying smoothness of the truth.
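For orientation, the following standard bias-variance heuristic (a rough sketch only; the precise statements are Lemma 3.6 and Theorem 4.1 below) indicates where such rates come from in the mildly ill-posed case (5) with $\varphi(x) = x^{\nu}$ (i.e., $\tau = 0$), using that the squared bias is of order $\varphi(\alpha)^2$ and the variance of order $\sigma^2\alpha^{-1-1/a}$:
$$\alpha^{2\nu} \;\asymp\; \sigma^2\, \alpha^{-1-1/a} \quad\Longrightarrow\quad \alpha_{\mathrm{opt}} \asymp \sigma^{\frac{2}{2\nu+1+1/a}}, \qquad \psi(\sigma) \asymp \sigma^{\frac{4\nu}{2\nu+1+1/a}}.$$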
One of the main results of this paper is that filters of the form (2) with the SOLIT rule are able to attain the sharp order of the optimal adaptation rates for different combinations of smoothness and mild ill-posedness without any manual adjustment of tuning parameters. In particular, this shows that the common understanding in inverse problems that one has to lose a log-factor (see, e.g., Theorem 1 in [29]) when applying Lepskiĭ-type methods is not correct.

Filter-based regularization and SOLIT
Before we introduce the SOLIT rule, let us briefly set up the notation for filter-based regularization of (1).

Filter-based regularization
In this paper, we consider the estimator $\hat f_\alpha = q_\alpha(T^*T)\, T^*Y$ as in (2) with an ordered filter $q_\alpha(\cdot)$, where $\alpha$ is a tuning parameter (i.e., regularization parameter).

Ordered filters
Definition 1 (Ordered filters). The function $q_\alpha(\cdot)\colon [0, \|T^*T\|] \to \mathbb R_{\geq 0}$, indexed by $\alpha \in A$, is called an ordered filter if it satisfies:
1. For all $x \in [0, \|T^*T\|]$ and all $\alpha \in A$, $\alpha\, q_\alpha(x) \leq C_q^1$ and $x\, q_\alpha(x) \leq C_q^2 = 1$.
2. For all $x \in [0, \|T^*T\|]$ and all $\alpha_1, \alpha_2 \in A$ with $\alpha_1 \geq \alpha_2$, it holds $q_{\alpha_1}(x) \leq q_{\alpha_2}(x)$ (ordering).
Here $A \subset \mathbb R_{>0}$ has an accumulation point at $0$.
Remark 3.1. In comparison with the literature (e.g., [23]), we make here two additional but rather weak assumptions, namely the nonnegativity of the filter, $q_\alpha(\cdot) \geq 0$, and the nonexpansiveness $C_q^2 = 1$. In particular, Definition 1 still includes common examples of regularization methods, such as spectral cutoff, Tikhonov, iterated Tikhonov, Landweber, and Showalter. Ž
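For concreteness, here is a minimal sketch of these filter functions together with a numerical check of the two bounds in Definition 1(i); the spectrum normalization, the Landweber step size (taken as 1 after normalizing $\|T^*T\| = 1$) and the number of iterated Tikhonov steps are illustrative assumptions:

```python
import numpy as np

# Common ordered filters q_alpha(lambda), cf. Remark 3.1.
def spectral_cutoff(lam, alpha):
    return np.where(lam >= alpha, 1.0 / np.maximum(lam, alpha), 0.0)

def tikhonov(lam, alpha):
    return 1.0 / (lam + alpha)

def iterated_tikhonov(lam, alpha, n=2):
    # q_alpha(lambda) = (1 - (alpha/(alpha+lambda))^n) / lambda  (n-fold Tikhonov)
    return (1.0 - (alpha / (alpha + lam)) ** n) / lam

def landweber(lam, alpha):
    # alpha = 1/nu with nu iterations; unit step size assumed (||T*T|| = 1)
    nu = int(np.ceil(1.0 / alpha))
    return (1.0 - (1.0 - lam) ** nu) / lam

def showalter(lam, alpha):
    return (1.0 - np.exp(-lam / alpha)) / lam

lam = np.linspace(1e-6, 1.0, 1000)   # spectrum normalized so that ||T*T|| = 1
for q in (spectral_cutoff, tikhonov, iterated_tikhonov, landweber, showalter):
    vals = q(lam, 0.05)
    # Check the two bounds of Definition 1(i): alpha*q <= C_q^1 and lambda*q <= 1.
    print(q.__name__, (0.05 * vals).max(), (lam * vals).max())
```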

Spectral source condition
For the analysis of filter-based regularizations, further conditions on $f^\dagger$ are necessary. Going beyond the specific setting discussed in the Introduction, we will work in a more general framework here (i.e., with general source conditions and a general decay of the singular values of the forward operator).

We assume
where ϕ ´1p¨q is the inverse function of ϕp¨q.
Assumption 1(ii) is made only for technical convenience and imposes no restriction, as one can always work with a convex surrogate of $\varphi$ in case this assumption is violated. In particular, the index function in (8) satisfies Assumption 1.
We further require some compatibility between ordered filters and smoothness characterizations.
Assumption 2 (Qualification). The index function $\varphi$ in (7) is a qualification of the ordered filter $q_\alpha$ in the sense that
$$\sup_{x \in [0, \|T^*T\|]} \varphi(x)\,\big(1 - x\, q_\alpha(x)\big) \leq C_\varphi\, \varphi(\alpha) \qquad \text{for all } \alpha \in A.$$
As a consequence of Definition 1(i) and Assumptions 1(i) and 2, it holds that $\lim_{\alpha\to 0} x\, q_\alpha(x) = 1$ for all $x \in (0, \|T^*T\|]$. It further follows that $q_\alpha(T^*T) \to (T^*T)^{-1}$ as $\alpha \to 0$ (in a pointwise sense). Thus, filter-based regularization can be seen as a stable approximation of the least squares estimate.
In this paper, the notation $\lim_{\alpha\to 0}$ or $\alpha \to 0$ means that $\alpha$ goes to zero while remaining in $A$. This is possible since $0$ is an accumulation point of $A$.

SOLIT
Let us now introduce the SOLIT rule in detail.

Candidates for the regularization parameter α
As for many other parameter choice methods in the literature, our method relies heavily on a suitably chosen set of candidate parameters $\alpha_0 > \alpha_1 > \dots > \alpha_{m_{\max}} > 0$. Generally, the value of the regularization parameter can be interpreted as some sort of "degree of model complexity": the smaller $\alpha_j$ (and hence the larger $j$), the smaller the bias and the larger the variance of the investigated estimator $\hat f_{\alpha_j}$. A standard way is to consider geometrically equispaced candidates, i.e.,
$$\alpha_j := \big[\alpha_0\, \theta^{-j}\big]_A, \qquad j = 0, \dots, m_{\max},$$
where $[x]_A := \arg\min_{z \in A} |z - x|$ and $\theta > 1$ is a tuning parameter. The maximal index $m_{\max}$ is determined to ensure that the smallest candidate $\alpha_{m_{\max}}$ is of the same order as $\sigma^2$. The "rounding" function $[x]_A$ is necessary to ensure that only regularization parameters in the admissible set $A$ are considered as candidates. For instance, in the case of spectral cutoff, we have $\alpha_m \in \{\lambda_1, \lambda_2, \dots\}$, as $[x]_A = \lambda_{k_x}$ with $k_x = \arg\min_{k\in\mathbb N} |x - \lambda_k|$.
For the SOLIT rule, we proceed differently and assume that we are able to discretize $A$ in such a way that not the candidate parameters $\alpha_j$ themselves change geometrically, but the variances of $\hat f_{\alpha_j}$ do. Let us therefore introduce the function
$$V_{T,q}(\alpha) := \sum_{k=1}^{\infty} \lambda_k\, \big(q_\alpha(\lambda_k)\big)^2, \qquad \alpha \in A, \qquad (14)$$
which is independent of the noise level $\sigma$ and depends only on the operator $T$ (via its singular values) and the ordered filter $q_\alpha$. We briefly collect some properties of $V_{T,q}$:
Lemma 3.2. Let $q_\alpha$ be an ordered filter such that some index function $\varphi$ is a qualification. Then the function $V_{T,q}\colon A \to \mathbb R$ obeys the following properties: 1. $V_{T,q}$ is decreasing.
Proof.As discussed above, the function V T,q is decreasing.Furthermore, the series in the definition (14) converges uniformly over all compact sets of p0, }T ˚T }s according to the first upper bound in Definition 1(i).As α Þ Ñ q α pλq is assumed to be continuous for fixed λ, this shows that V T,q as a uniformly convergent sum over continuous functions is also continuous itself on p0, }T ˚T }s.
Now we set $\alpha_0 > 0$ arbitrarily and generate the sequence $(\alpha_n)_{n\in\mathbb N}$ recursively by
$$\alpha_n = \max\big\{ \alpha < \alpha_{n-1} : V_{T,q}(\alpha) = \theta_1\, V_{T,q}(\alpha_{n-1}) \big\}, \qquad n \in \mathbb N.$$
As $V_{T,q}$ is continuous and obeys Lemma 3.2(ii), this maximum is attained by the intermediate value theorem. This choice clearly obeys (15) with $\theta_2 = \theta_1$.
The above lemma, in particular, implies that Assumption 3 is satisfied if we consider filters such as Tikhonov, iterated Tikhonov or Showalter.
Given a sequence $(\alpha_n)_{n\in\mathbb N}$ satisfying Assumption 3, we now determine our candidates as $\alpha_0 > \alpha_1 > \dots > \alpha_{m_{\max}}$ and collect them in
$$A_{\theta_1,\theta_2} := \{\alpha_0, \dots, \alpha_{m_{\max}}\} \quad \text{with} \quad m_{\max} = \min\big\{ m \in \mathbb N : \sigma^2\, V_{T,q}(\alpha_m) \geq 1 \big\}. \qquad (16a)$$
Then it holds that
$$\theta_1 \leq \frac{v_m}{v_{m-1}} \leq \theta_2 \qquad \text{for } m = 1, \dots, m_{\max}, \qquad (16b)$$
with
$$v_m := \sigma^2\, V_{T,q}(\alpha_m), \qquad (16c)$$
and $v_{m_{\max}} \approx 1$ in our asymptotic considerations as $\sigma \to 0$. Ideally, $\alpha_0$ should be chosen such that $v_0 \approx \sigma^2$ to ensure a good performance in practice, but this does not influence our asymptotic results at all, as $\alpha_0$ is a fixed constant. Besides $v_m$ (a shorthand notation for $V_{T,q}(\alpha_m)$ up to $\sigma^2$), we introduce another size of variance as
$$u_m := \sigma^2\, \sup_{k\in\mathbb N} \lambda_k\, \big(q_{\alpha_m}(\lambda_k)\big)^2.$$
This measures the "amplification" effect if we view the covariance $\operatorname{Var}\big(\hat f_{\alpha_m}\big) = \sigma^2\, q_{\alpha_m}(T^*T)\, T^*T\, q_{\alpha_m}(T^*T) = \sigma^2\, T^*T\, q_{\alpha_m}(T^*T)^2$ as a linear operator on $\mathcal X$. More precisely, it holds that $u_m = \big\|\operatorname{Var}\big(\hat f_{\alpha_m}\big)\big\|$, where $\|\cdot\|$ denotes the operator norm.
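As a small illustration (assuming a finite spectral discretization and a generic filter `q`), the quantities $v_m$ and $u_m$ can be computed as follows:

```python
import numpy as np

def variance_sizes(lam, alphas, sigma, q):
    """For candidates alpha_m return
       v_m = sigma^2 * sum_k lam_k * q_{alpha_m}(lam_k)^2   (trace of Var(f_hat), cf. (16c))
       u_m = sigma^2 * max_k lam_k * q_{alpha_m}(lam_k)^2   (operator norm of Var(f_hat))."""
    v = np.array([sigma ** 2 * np.sum(lam * q(lam, a) ** 2) for a in alphas])
    u = np.array([sigma ** 2 * np.max(lam * q(lam, a) ** 2) for a in alphas])
    return v, u

# Example: lam = np.arange(1, 501)**-2.0; q = lambda l, a: 1.0/(l + a)
#          variance_sizes(lam, [1e-1, 1e-2], 1e-3, q)
```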

The SOLIT rule
Now let $\beta > 0$ be a fixed parameter, e.g., $\beta = 1$. In the end, $\beta$ can be seen as a tuning parameter of SOLIT, even though the results proven later hold independently of its specific value.
For $m_1, m_2 \in \{0, 1, \dots, m_{\max}\}$, we define the deterministic part of the comparison between $m_1$ and $m_2$ as $b_{m_1,m_2}$, the random part as $n_{m_1,m_2}$, and the variance (arising from the random part) as $v_{m_1,m_2}$. These terms arise naturally from the corresponding decomposition of $\|\hat f_{\alpha_{m_1}} - \hat f_{\alpha_{m_2}}\|^2$. Note that $b_{m_1,m_2} = b_{m_2,m_1}$, $n_{m_1,m_2} = n_{m_2,m_1}$ and $v_{m_1,m_2} = v_{m_2,m_1}$, but we often use this notation with the convention that the first index is smaller than the second. Given $m_1 \leq m_2$, it holds that $\alpha_{m_1} \geq \alpha_{m_2}$ and $0 \leq q_{\alpha_{m_1}}(\lambda_k) \leq q_{\alpha_{m_2}}(\lambda_k)$ by the filter properties in Definition 1. Similar in spirit to standard Lepskiĭ-type principles, we introduce an oracle choice $\alpha^* := \alpha_{m^*}$, defined in (17), see also [37]; if the set in (17) is empty, we define $m^* = m_{\max}$. Note that $m^*$ is deterministic and independent of the noise $Z$. However, this oracle rule is practically useless, since $b_{m_1,m_2}$ is not accessible. A natural modification is to replace $b_{m_1,m_2}$ by its empirical counterpart. This leads to the following rule of parameter choice.
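A schematic Python sketch of how such a comparison-based rule can be implemented is given below; the threshold callable `eta` stands for the critical values $\eta_{m_1,m_2}$ of Definition 2 (computed as in Section 5.2), and the squared-norm comparison is our schematic reading of the rule, not a verbatim transcription of Definition 2:

```python
import numpy as np

def solit_index(f_hats, eta):
    """
    Schematic SOLIT-type selection.

    f_hats : list of estimators [f_hat_{alpha_0}, ..., f_hat_{alpha_mmax}]
             (each a NumPy array), ordered from most to least regularized.
    eta    : callable eta(m1, m2) returning the critical value eta_{m1,m2}.

    Returns the smallest m1 such that
        ||f_hat_{alpha_{m2}} - f_hat_{alpha_{m1}}||^2 <= eta(m1, m2)  for all m2 > m1,
    and m_max if no such index exists.
    """
    m_max = len(f_hats) - 1
    for m1 in range(m_max + 1):
        if all(np.sum((f_hats[m2] - f_hats[m1]) ** 2) <= eta(m1, m2)
               for m2 in range(m1 + 1, m_max + 1)):
            return m1
    return m_max
```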

Adaptive minimax convergence rates via SOLIT
With the minimax paradigm from Section 2, we are now in a position to derive results for filter-based regularization (2) with $\alpha$ chosen according to SOLIT.

An oracle inequality
Using the results from [37], we can now state the following oracle inequality.
Theorem 3.4 (Oracle inequality [37]). Assume the model in (1), and let Assumptions 1 and 4 hold. Let $\hat f_\alpha = q_\alpha(T^*T)\, T^*Y$ with $q_\alpha(\cdot)$ an ordered filter, consider the oracle choice $\alpha_{m^*}$ in (17) and the SOLIT rule $\alpha_{\hat m} \in A_{\theta_1,\theta_2} = \{\alpha_0, \dots, \alpha_{m_{\max}}\}$ from (16a)-(16c) with $\hat m = \hat m(\beta, \gamma)$ as in Definition 2, and set $R_{m^*} := \mathbb E\big[\|\hat f_{\alpha_{m^*}} - f^\dagger\|^2\big]$. Then the MSE of $\hat f_{\alpha_{\hat m}}$ is bounded in terms of $R_{m^*}$ and the quantities $C_{1,m^*}$ and $C_{2,m^*}$ discussed in Remark 3.5 below.
Proof. This can be shown by casting the abstract and general result developed in [37] into our particular setup. To this end, we only need to check the following two conditions: (i) decreasingly ordered bias, i.e.
with C a constant independent of m ˚and γ.
Note that properties (1) and (2) of the ordered filter $q_\alpha(\cdot)$ in Definition 1, in particular the assumptions $C_q^2 = 1$ and $q_\alpha(\cdot) \geq 0$, imply that
$$0 \leq 1 - \lambda_k\, q_{\alpha}(\lambda_k) \leq 1 - \lambda_k\, q_{\tilde\alpha}(\lambda_k) \qquad \text{for all } k \in \mathbb N \text{ and } \alpha < \tilde\alpha.$$
Thus, the condition (i) above is satisfied.
From (16b) it follows that Further, noting θ 1 ą 1, we have i.e. the condition (ii) above is also satisfied for every γ ą 0.
Then, the assertion of this theorem follows immediately from Theorem B.1 and Proposition B.2 in the supplement of [37].
Remark 3.5. In many cases, we have $v_0/v_{m^*} \to 0$ as $\sigma \to 0$, and then $C_{1,m^*} \to 0$. Thus, the price that one needs to pay for the SOLIT rule, in comparison with the oracle choice, is asymptotically determined by $C_{2,m^*}$. The first term in $C_{2,m^*}$ can be bounded using the fact that $v_{m^*} \leq R_{m^*}$. Note that the $u_m$ are increasing with respect to $m$, and we can thus bound the second term in $C_{2,m^*}$ by bounding $m^*$ from above. The term $\log(1 + m_{\max})$ in $C_{2,m^*}$ is of order $\log\log(1/\sigma^2)$ by (16b), and is thus asymptotically negligible in usual situations. Ž

Oracle risks
Next we recall some results from the literature on the convergence rates of a priori parameter choice rules. To state those, we require a rather general assumption on the decay behaviour of the singular values. Functionals of $T^*T$ can be expressed through the eigenvalues $\{\lambda_k\}_{k\in\mathbb N}$ of $T^*T$, for instance, $\operatorname{tr}(T^*T) = \sum_{k=1}^{\infty} \lambda_k$. Calculations that involve summations over $\{\lambda_k\}_{k\in\mathbb N}$ can be formulated as Lebesgue-Stieltjes integrals with respect to the counting measure
$$\Sigma(x) := \Sigma\big([x, \infty)\big) := \#\{ k \in \mathbb N : \lambda_k \geq x \}. \qquad (19)$$
We employ the techniques developed in [8] to tackle the difficulty caused by the discontinuity of $\Sigma(\cdot)$ and assume:
Assumption 4 (Smooth surrogate).
1. There exists a continuous surrogate function $S\colon (0, \lambda_1] \to [0, \infty)$ of $\Sigma(\cdot)$ in (19).
2. There exists a constant $C_S > 0$ such that $\frac{1}{\alpha}\int_0^{\alpha} \Sigma(t)\, dt \leq C_S\, S(\alpha)$ for all $\alpha \in A$.
Lemma 3.6 (Bias-variance decomposition [8]). Assume the model in (1) and suppose that Assumptions 1, 2 and 4 hold. Let $\hat f_\alpha = q_\alpha(T^*T)\, T^*Y$ with $q_\alpha(\cdot)$ an ordered filter. Then there exist constants $C_b$ and $C_v$ such that
$$\sup_{f^\dagger \in W_\varphi(\rho)} \mathbb E\big[\|\hat f_\alpha - f^\dagger\|^2\big] \leq C_b\, \varphi(\alpha)^2 + C_v\, \sigma^2\, \frac{S(\alpha)}{\alpha}.$$
More precisely, $C_b := C_\varphi^2\, \rho^2$ and $C_v := C_S\, \max\{C_q^1, C_q^2\}^2$, with $C_q^1$ and $C_q^2 = 1$ as in Definition 1, and $C_\varphi$ as in Assumption 2.
Proof.It follows essentially from Theorem 3 in [8].As we slightly simplify (or modify) the requirement on the smooth surrogate of [8], we provide below the upper bound on the variance part for completeness.Let C q :" maxtC 1 q , C 2 q u.Then, by Definition 1, Note that lim αÑ0 αΣpαq " lim αÑ0 `αSpαq ˘`Σpαq{Spαq ˘" 0, by Assumption 4(1).Further, we apply partial integration and obtain where the last inequality is due to Assumption 4(2).
Remark 3.7. In fact, the term $C_b\, \varphi(\alpha)^2$ in Lemma 3.6 is an upper bound for the squared bias part $\|f_\alpha - f^\dagger\|^2$, and the term $C_v\, \sigma^2 S(\alpha)/\alpha$ is one for the variance part $\mathbb E[\|\hat f_\alpha - f_\alpha\|^2]$. In order to derive as sharp an upper bound for the risk as possible, one often searches for the smallest possible smooth surrogate $S(\cdot)$ of $\Sigma(\cdot)$. For such $S(\cdot)$, we usually have $\limsup_{\alpha\to 0} S(\alpha)/\Sigma(\alpha) < \infty$, even though this is not required in Assumption 4 (cf. Section 4). In this situation, the bounds in Lemma 3.6 are often sharp up to multiplying constants, since they lead to minimax optimality in order in many situations, as detailed in [8]. Ž
As a consequence, we can find a reasonable $\alpha$ by minimizing the upper bound of the risk in Lemma 3.6. Recall that we consider candidate parameters in $A_{\theta_1,\theta_2} = \{\alpha_0, \dots, \alpha_{m_{\max}}\}$, a discrete subset of $A$. Thus, we define
$$m^{\diamond} := \operatorname*{arg\,min}_{m \in \{0, \dots, m_{\max}\}} \Big( C_b\, \varphi(\alpha_m)^2 + C_v\, \sigma^2\, \frac{S(\alpha_m)}{\alpha_m} \Big). \qquad (20)$$
Based on this choice, the ordered filters are able to achieve (up to possible logarithmic factors) the minimax optimal rates over various smoothness classes for mildly and severely ill-posed problems (cf. [8]). Recall, however, that the performance of SOLIT is evaluated with respect to a different oracle rule, defined in (17). Thus, an important step in deriving the explicit rates for ordered filters with the SOLIT rule in particular situations is to investigate the relations between these two oracle rules.
Theorem 3.8 (Comparison of oracles).Assume the model in (1) and suppose that Assumptions 1, 2 and 4 hold.Let p f α " q α pT ˚T qT ˚Y with q α p¨q an ordered filter.Let also m ˚be defined in (17), and m ˛in (20).Then, m ˚ď log θ1 `C1 C 2 Spα m˛q {α m˛˘a nd, for f : P W ϕ pρq, and C 2 is a constant depending only on ρ, ϕ, S, C 1 q .Proof.Recall that A θ1,θ2 " tα 0 , . . ., α mmax u in (16a)-(16c).We define where b m :" › › f αm ´f : › › .Note that b m is decreasing (see the proof of Theorem 3.4), while v m is increasing, with respect to m.Thus, m ˚˚is well-defined.
We split the proof into four steps.
Step 1.We will show m ˚ď m ˚˚.To this end, we consider m 2 ą m 1 ą m ˚˚.The ordered property and C 2 q " 1 in Definition 1 imply that where the last inequality is due to m 1 ą m ˚˚and the definition of m ˚˚.
By the Cauchy-Schwarz inequality, we have Note further that v m2 {v m1 ě θ 1 ą 1 from (16b).Then, Combining the derived inequalities above, we obtain Namely, m ˚˚lies in the set in (17), so m ˚ď m ˚˚.
Step 2. We will show R m˚À R m˚˚.Because of m ˚ď m ˚˚and the definition of m ˚in (17), it holds that where the last inequality is due to the ordered property and nonnegativity of q α p¨q, see Definition 1. Further, by the triangle inequality, we obtain, for f : P W ϕ pρq, Step 3. We will show R m˚˚À R m˛.We consider two cases: • Case: m ˛ă m ˚˚.The definition of m ˚˚and the decreasing of b m imply where the first inequality above is due to (16b).
Thus, combining both cases, we have . This together with Step 2 and Lemma 3.6 leads to the second part of the assertion.
Step 4. We will show m ˚À log `Spα m˛q {α ˛˘.By the definition of m ˛in (20), and v 0σ 2 , we have where C 2 is a constant depending only on C b , C v , ϕ and S. The forms of C b , C v in Lemma 3.6 further imply that C 1 depends only on ρ, ϕ, S and C 1 q .Note that and, due to (16b), that v m˚˚ě v 0 θ m˚1 .Thus, which is the first part of the assertion.
Remark 3.9. In usual cases, the upper bound on $R_{m^{\diamond}}$ in the form given in Lemma 3.6 tends to $0$ as $\sigma \to 0$. In particular, we have $\sigma^2 S(\alpha_{m^{\diamond}})/\alpha_{m^{\diamond}} \to 0$, and thus $C_1 C_2\, S(\alpha_{m^{\diamond}})/\alpha_{m^{\diamond}} \leq 1/\sigma^2$ for sufficiently small $\sigma$. This together with Theorem 3.8 implies that $m^* \leq -\log_{\theta_1}(\sigma^2)$. Moreover, note that $u_{m^*} = \sigma^2 \sup_{k\in\mathbb N} \lambda_k\, q_{\alpha_{m^*}}(\lambda_k)^2 \leq C_q^1\, \sigma^2/\alpha_{m^*}$, where the bound is based on the filter property in Definition 1(i). Thus, the (possible) price of adaptation for SOLIT in Theorem 3.4 is an additional term of order $\sigma^2(-\log\sigma^2)/\alpha_{m^*}$. Ž

Adaptive minimax rates via SOLIT
Combining Theorems 3.4 and 3.8, we can derive the minimax adaptation rates in various cases for the SOLIT rule p α in Definition 2.

The mildly ill-posed case
We further require a lower bound on the variance part that arises in the bias-variance decomposition (cf.Lemma 3.6).This is formulated as follows.
Assumption 5. There exists a lower bound on $v_m$ similar to the upper bound in Lemma 3.6, namely, for some constant $c_v > 0$,
$$v_m \geq c_v\, \sigma^2\, \frac{S(\alpha_m)}{\alpha_m} \qquad \text{for all } m \in \{0, \dots, m_{\max}\}.$$
Note that $v_m/\sigma^2 = V_{T,q}(\alpha_m) = \sum_{k=1}^{\infty} \lambda_k\, (q_{\alpha_m}(\lambda_k))^2$, and $\lambda_k \asymp k^{-a}$ by (5). One can show that Assumption 5 holds for common ordered filters, including spectral cutoff, (iterated) Tikhonov, Landweber and Showalter, with $S(x) \asymp x^{-1/a}$. For instance, in the case of spectral cutoff, i.e., $q_\alpha(\lambda) = \lambda^{-1}\mathbf 1\{\lambda \geq \alpha\}$, it holds that
$$V_{T,q}(\alpha) = \sum_{k\colon \lambda_k \geq \alpha} \frac{1}{\lambda_k} \asymp \alpha^{-1-1/a},$$
which is $S(\alpha)/\alpha$ up to a multiplying constant.
Theorem 4.1 (Mildly ill-posed problems).Assume the model in (1), and suppose that the eigenvalues of the forward operator decay at a polynomial rate as in (5) with a ą 1.Let Assumptions 2 and 5 hold, where the index function ϕ " ϕ ν,τ is given in (8) with the parameters ν ą 0 and τ P R, or ν " 0 and τ ą 0. Let p f α " q α pT ˚T qT ˚Y with an ordered filter q α , and consider SOLIT α x m P A θ1,θ2 " tα 0 , . . ., α mmax u in (16a)-(16c) with p m " p mpβ, γq as in Definition 2 as the parameter choice rule.Then, for some constant C ą 0, sup Proof.It follows from the polynomial decay of tλ k u kPN that there is a smooth surrogate function S satisfying Assumption 4, which takes the form of Spxqx ´1{a .Recall that the index function ϕ in ( 8) meets the requirement of Assumption 1.Thus, applying Lemma 3.6, we obtain, with m ˛in (20), α m˛-pσ 2 q 1 2ν`1`1{a p´log σ 2 q 2τ 2ν`1`1{a , and sup Further, by Theorem 3.8 and Assumption 5, it holds that which yields α m˛À α m˚.Hence, applying again Theorem 3.8 (see also Remark 3.9), we have m ˚À logp1{σ 2 q and u m˚À σ 2 {α m˚À σ 2 {α m˛.Note that m max -logp1{σ 2 q by Assumption 5 and the form of Sp¨q, see also Remark 3.5.It follows that C 2,m˚i n Theorem 3.4 satisfies C 2 2,m˚À σ 2 logp1{σ 2 q{α m˛! σ 2 Spα m˛q {α m˛.Thus, the assertion of theorem follows now from Theorems 3.4 and 3.8.
Remark 4.2. The convergence rate in Theorem 4.1 is minimax optimal for the function class $W_\varphi(\rho)$, see, e.g., Theorem 1 in [33]. Note that the SOLIT rule does not require knowledge of the exact smoothness of the truth, namely the parameters $\nu$ and $\tau$ of the index function $\varphi$. Thus, the ordered filters with the SOLIT rule automatically adapt to the smoothness of the truth. In particular, $\nu > 0$ and $\tau \in \mathbb R$ corresponds to polynomial smoothness, and $\nu = 0$ and $\tau > 0$ corresponds to logarithmic smoothness. In the latter case, the convergence rate in Theorem 4.1 is of order $(-\log \sigma^2)^{-2\tau}$, because $\nu = 0$. Ž

Severely ill-posed problems
In this section, we consider severely ill-posed problems, namely, that the eigenvalues λ k of T ˚T in (4) decay exponentially as in ( 6) for some a P R and b, ϑ ą 0.

Technical challenges
We start with the additional technical issues arising from severe ill-posedness.
The super-fast decay of the eigenvalues poses a difficulty in the selection of candidate parameters (cf. Section 3.2.1): finding parameters with geometrically changing variance may be impossible. For instance, in the case of spectral cutoff, we have $A = \{\lambda_1, \lambda_2, \dots\}$ and consequently $V_{T,q}$ can be seen as a function of $n \in \mathbb N$ with $V_{T,q}(n) = \sum_{k=1}^{n} \lambda_k^{-1}$. It can be readily seen that Assumption 3 can be fulfilled only if the eigenvalues $\lambda_k$ decay not faster than $k^{a}\exp(-b k^{\vartheta})$ with parameters $b > 0$, $a < 0$ and $\vartheta \in [0, 1]$. As soon as $\vartheta > 1$, we already have
$$\lim_{n\to\infty} \frac{V_{T,q}(n)}{V_{T,q}(n-1)} = \infty,$$
so that (15) cannot be valid for any sequence of regularization parameters in $A$.
Another difficulty is that the analysis technique of replacing the summation over eigenvalues by an integral with respect to a surrogate function (cf. Assumption 4; originally developed in [8]) is no longer sharp in rates; the gap, however, consists only of log-factors. More precisely, provided that $\lambda_k \asymp k^{a}\exp(-b k^{\vartheta})$ with $b, \vartheta > 0$, the variance of spectral cutoff is of order $\sigma^2\, \alpha^{-1}(-\log\alpha)^{1/\vartheta - 1}$ for $\alpha \in A = \{\lambda_1, \lambda_2, \dots\}$, while the bound in Lemma 3.6 gives that the variance part is of order $\sigma^2\, \alpha^{-1}(-\log\alpha)^{1/\vartheta}$. In the severely ill-posed situation, a sharp bound on the variance part remains open for general (ordered) filters, and is an interesting topic for future research.

Performance of SOLIT
As mentioned above, the upper bound on the variance part in Lemma 3.6 is sub-optimal in case of serious ill-posedness.Thus, we need a weaker assumption than Assumption 5 as follows.
Assumption 6. We assume a lower bound on $v_m$ that is slightly smaller than the upper bound in Lemma 3.6; more precisely, for some constant $c_v > 0$,
$$v_m \geq c_v\, \sigma^2\, \frac{S(\alpha_m)}{\alpha_m}\,\big(-\log\alpha_m\big)^{-1} \qquad \text{for all } m \in \{0, \dots, m_{\max}\}.$$
Similarly to Assumption 5 for mildly ill-posed problems, one can show that Assumption 6 is satisfied for common ordered filters, e.g., spectral cutoff, (iterated) Tikhonov, Landweber and Showalter, in the severely ill-posed case. As a showcase, we consider the spectral cutoff $q_\alpha(\lambda) = \lambda^{-1}\mathbf 1\{\lambda \geq \alpha\}$. Then, by (6), we obtain
$$V_{T,q}(\alpha) = \sum_{k\colon \lambda_k \geq \alpha} \frac{1}{\lambda_k} \asymp \frac{1}{\alpha}\,\big(-\log\alpha\big)^{1/\vartheta - 1}.$$
Note that the exponential decay of the eigenvalues in (6) implies that $S(x) \asymp (-\log x)^{1/\vartheta}$. Thus, Assumption 6 holds for the spectral cutoff.
Theorem 4.3 (Severely ill-posed problems).Assume the model in (1), and suppose that the eigenvalues of the forward operator decay at an exponential rate as in (6) with a P R, b ą 0 and 0 ă ϑ ď 1{2.Let Assumptions 2, 4 and 6 hold, where the index function ϕ " ϕ ν,τ is given in (8) with the parameters ν ą 0 and τ P R, or ν " 0 and τ ą 0. Let p f α " q α pT ˚T qT ˚Y with an ordered filter q α , and consider SOLIT α x m P A θ1,θ2 " tα 0 , . . ., α mmax u as in (16a)-(16c) with p m " p mpβ, γq as in Definition 2 as the parameter choice rule.Then, for some constant C ą 0, sup Proof.It can be proven in exactly the same way as Theorem 4.1, so, to avoid repetitions, we list only the differences in detailed computation: • The smooth surrogate function takes the form of • The convergence rate corresponding to m ˛in (20) is sup • By Assumption 6 and Theorem 3.8 we have which together with the form of Sp¨q implies that (cf.Remark 3.9) by Theorems 3.4 and 3.8.Remark 4.4.We consider the logarithmic and the polynomial smoothness separately.
First, in the case of logarithmic smoothness (i.e., index function $\varphi$ with $\nu = 0$ and $\tau > 0$), the convergence rate in Theorem 4.3 is in fact of order $(-\log \sigma^2)^{-2\tau}$, which is minimax optimal (cf. [33]). As in the mildly ill-posed case, the ordered filters with the SOLIT rule perform as well as if the exact smoothness specified by $\tau$ were known.
Second, in the case of polynomial smoothness (i.e., index function $\varphi$ with $\nu > 0$ and $\tau \in \mathbb R$), the convergence rate in Theorem 4.3 is slower than the minimax optimal rate on $W_\varphi(\rho)$ by a factor of $(-\log \sigma^2)^{2\nu/(2\nu+1)}$, and is thus not minimax optimal. However, it is unclear whether this slower rate can be improved, as it is required to hold simultaneously over more than one smoothness class. In [38], it is shown that the optimal adaptation rate (cf. Section 2) in the special case of $\vartheta = 1$ is exactly the same as the rate obtained in Theorem 4.3, which, however, requires $\vartheta \in (0, 1/2]$. We thus conjecture that the convergence rate of the SOLIT rule is adaptively minimax optimal over the whole range of smoothness classes specified by the index function $\varphi$ with $\nu > 0$ and $\tau \in \mathbb R$. Ž
Remark 4.5 (Highly severe ill-posedness, $\vartheta > 1/2$). There is a restriction on the decay speed of the eigenvalues in Theorem 4.3, namely $\vartheta \leq 1/2$. The reason behind this is the sub-optimality of the upper bound on the variance part in Lemma 3.6. If a sharp bound were available, the result of Theorem 4.3 would hold for $\vartheta \leq 1$. Moreover, in the super-fast decaying regime $\vartheta > 1$, it turns out that the discretization of the regularization parameter $\alpha$ uniform in geometric scaling, detailed in (16a)-(16c), is no longer appropriate, as mentioned above. In fact, one should instead consider a smaller set $A = A_{\theta_1,\theta_2} = \{\alpha_m : m \in \mathbb N_0\}$ such that
$$\infty > \theta_2 \geq \frac{\exp\big((\log v_m)^{1/\vartheta}\big)}{\exp\big((\log v_{m-1})^{1/\vartheta}\big)} \geq \theta_1 > 1 \qquad \text{for all } m.$$
Under such a discretization, it seems to be possible to show that the convergence rate of the SOLIT rule is adaptively minimax optimal for every ϑ ą 0. This would require a generalization of Theorem 3.4.Thus, the details are left as part of future research.Ž

Implementation and numerical simulations
To investigate the performance of SOLIT in practice, we start with a possible implementation. Note that for suitable filters $q_\alpha(\cdot)$, it is possible to compute $\hat f_\alpha$ without using the SVD of $T$. As an exemplary case, we consider Tikhonov regularization. Here $\hat f_\alpha$ can be computed as the solution of
$$(\alpha\,\mathrm{Id} + T^*T)\, \hat f_\alpha = T^*Y,$$
where $\mathrm{Id}$ denotes the identity operator on $\mathcal X$. Similar computational possibilities exist, e.g., for Landweber and Showalter regularization. Given an SVD-free method to compute $\hat f_\alpha$, it is also possible to compute the SOLIT parameter $\hat\alpha$ without making use of the SVD, as we will detail below.
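A minimal sketch of this SVD-free computation for Tikhonov regularization (with a dense direct solve and a purely illustrative random discretization; large problems would use an iterative solver such as CG instead):

```python
import numpy as np

def tikhonov_estimate(A, y, alpha):
    """Solve (alpha*Id + T*T) f_hat = T*Y for a matrix discretization A of T,
    i.e. Tikhonov regularization without an SVD."""
    n = A.shape[1]
    return np.linalg.solve(alpha * np.eye(n) + A.T @ A, A.T @ y)

# Illustrative use on a random toy discretization (all sizes/values are assumptions):
rng = np.random.default_rng(1)
A = rng.standard_normal((200, 100)) / 200.0
f_true = rng.standard_normal(100)
y = A @ f_true + 1e-3 * rng.standard_normal(200)
print(tikhonov_estimate(A, y, alpha=1e-2)[:3])
```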

Choosing the candidate parameters
To implement SOLIT, we first have to generate a sequence of regularization parameters $\alpha_0 > \alpha_1 > \dots > \alpha_{m_{\max}} > 0$ obeying (16b). This can be done using a line search for the function $V(\alpha) := \operatorname{tr}\big(\operatorname{Var}(\hat f_\alpha)\big)$. By recursively splitting an interval into halves, this algorithm provides an approximation $\hat\alpha$ to the solution $\alpha$ of $V(\alpha) = \lambda$ with $|V(\alpha) - V(\hat\alpha)| \leq \epsilon$ for a prescribed error level $\epsilon$ in $\mathcal O(\log(1/\epsilon))$ steps. If no such $\hat\alpha$ exists, an alternative value $\hat\alpha$ and the corresponding error $\epsilon$ are returned. Now we fix a tuning parameter $\theta > 1$, set $\theta_2 = \theta + \frac{\theta-1}{2}$ and $\theta_1 = \theta - \frac{\theta-1}{2}$, and choose $\alpha_0$ as the line search approximation of $V(\alpha_0) = \sigma^2$ with $\epsilon = \frac{\theta-1}{2}$. If no such approximation exists, we take the smallest possible value $\alpha_0$ such that $V(\alpha_0) > \sigma^2$ and enlarge $\theta_2$ correspondingly. Given $\alpha_{m-1}$, the relation (16b) corresponds to
$$\log(\theta_1) + \log\big(V(\alpha_{m-1})\big) \leq \log\big(V(\alpha_m)\big) \leq \log(\theta_2) + \log\big(V(\alpha_{m-1})\big),$$
i.e., we can choose $\alpha_m$ as the line search approximation to the equation $\log(V(\alpha_m)) = \log(\theta) + \log(V(\alpha_{m-1}))$ with $\epsilon = \frac{\theta-1}{2}$. Again, if no such solution exists, we take a potentially larger $\alpha_m$ and enlarge $\theta_2$.
Note that the above procedure is in fact just a discrete approximation of the technique used in the proof of Lemma 3.3. However, it also works in the case of discontinuous filters $\alpha \mapsto q_\alpha(\lambda)$, as the sequence needs to be constructed only down to a minimal value $\alpha_{m_{\max}} > 0$. From this point of view, the upper bound in (16b) is trivial (as only finitely many $\alpha$'s are considered), but the above procedure ensures a good (i.e., small) value of $\theta_2$.
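The following Python sketch implements this candidate construction (with the tolerance handled on the log scale and a simplified fallback behavior compared with the procedure described above):

```python
import numpy as np

def line_search_alpha(V, target_logV, lo, hi, eps, max_iter=200):
    """Bisection for alpha in [lo, hi] with log(V(alpha)) close to target_logV,
    assuming V is decreasing in alpha; returns the best point found."""
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if np.log(V(mid)) > target_logV:
            lo = mid            # variance still too large -> increase alpha
        else:
            hi = mid
        if abs(np.log(V(0.5 * (lo + hi))) - target_logV) <= eps:
            break
    return 0.5 * (lo + hi)

def candidate_grid(V, sigma, theta=1.5, alpha_min=1e-12, alpha_max=1.0):
    """Candidates alpha_0 > alpha_1 > ... with log V(alpha_m) ~ log(theta) + log V(alpha_{m-1}),
    started at V(alpha_0) ~ sigma^2 and stopped once V(alpha_m) >= 1 (cf. Section 5.1)."""
    eps = (theta - 1.0) / 2.0
    alphas = [line_search_alpha(V, np.log(sigma ** 2), alpha_min, alpha_max, eps)]
    while V(alphas[-1]) < 1.0 and len(alphas) < 10_000:   # crude cap against non-termination
        target = np.log(theta) + np.log(V(alphas[-1]))
        alphas.append(line_search_alpha(V, target, alpha_min, alphas[-1], eps))
    return np.array(alphas)

# Illustrative use: V(alpha) = tr(Var(f_hat_alpha)) for Tikhonov on a diagonal toy spectrum.
lam, sigma = np.arange(1, 501) ** (-2.0), 1e-3
V = lambda a: sigma ** 2 * np.sum(lam / (lam + a) ** 2)
print(candidate_grid(V, sigma)[:5])
```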
Since $\operatorname{Var}(\hat f_\alpha) = \sigma^2\, T^*T\, q_\alpha(T^*T)^2$, we obtain in the case of Tikhonov regularization that $V(\alpha) = \sigma^2\, \operatorname{tr}\big((\alpha\,\mathrm{Id} + T^*T)^{-2}\, T^*T\big)$. To compute this function exactly, one has to evaluate $T^*T$ on an arbitrary orthonormal system of $\mathcal X$, which numerically is as expensive as computing the SVD itself. However, by making use of randomized trace estimation algorithms [2], the function $V$ can be approximated with high accuracy without computing the SVD and at reasonable computational cost.
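A possible randomized implementation (a sketch assuming a matrix discretization `A` of $T$ and the Tikhonov filter; the probe number and the dense solve are illustrative simplifications):

```python
import numpy as np

def hutchinson_V(A, alpha, sigma, n_probes=50, rng=None):
    """
    Randomized estimate of V(alpha) = sigma^2 tr((alpha*Id + T*T)^{-2} T*T) for the
    Tikhonov filter, using Hutchinson's trace estimator with Rademacher probes.
    Only linear solves with (alpha*Id + A^T A) are needed, no SVD.  For large-scale
    problems the dense solve below would be replaced by a matrix-free iterative solver.
    """
    rng = np.random.default_rng(rng)
    B = A.T @ A                                     # discretization of T*T
    M = alpha * np.eye(B.shape[0]) + B
    acc = 0.0
    for _ in range(n_probes):
        z = rng.choice([-1.0, 1.0], size=B.shape[0])
        w = np.linalg.solve(M, z)                   # w = (alpha*Id + B)^{-1} z
        acc += w @ (B @ w)                          # = z^T (alpha*Id+B)^{-1} B (alpha*Id+B)^{-1} z
    return sigma ** 2 * acc / n_probes
```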

Computing the critical values z m,m 1
Given the regularization parameters $\alpha_m$, the implementation of SOLIT only requires the thresholds in (18b), which are given in terms of the critical values $z_{m_1,m_2}(x)$ in (18c). The latter correspond to quantiles of a quadratic functional of standard Gaussian random variables. In our implementation, the operator $T$ is approximated by a matrix $A$, and hence the corresponding sum is truncated at $k = n$. This implies that $n_{m_1,m_2}$ has the same distribution as $Z_{m_1,m_2} := \varepsilon^{\top} A_{m_1,m_2}\, \varepsilon$ with the matrix $A_{m_1,m_2} = A^*A\,\big(q_{\alpha_{m_1}}(A^*A) - q_{\alpha_{m_2}}(A^*A)\big)^2$ and a random vector $\varepsilon \sim \mathcal N(0, I_n)$, i.e., an $n$-dimensional standard Gaussian random vector. Note that $Z_{m_1,m_2}$ is a generalized $\chi^2$ random variable, whose distribution can either be simulated by a Monte Carlo method or be approximated by a non-central $\chi^2$ approximation as described in [24]. There it is shown that the distribution of $\chi^2_l(\delta)$ (where $l$ is the number of degrees of freedom and $\delta$ the non-centrality parameter) is a reasonable approximation of the distribution of $Z_{m_1,m_2}$ in the following sense. With $c_k = \operatorname{tr}\big(A_{m_1,m_2}^k\big)$ for $k = 1, 2, 3, 4$ and the values $s_1 = c_3/c_2^{3/2}$ and $s_2 = c_4/c_2^2$, which correspond to multiples of the skewness and kurtosis of $Z_{m_1,m_2}$, we define $a = 1/s_1$, $\delta = s_1 a^3 - a^2$, and $l = a^2 - 2\delta$. Then it holds
$$\mathbb P\{Z_{m_1,m_2} > t\} \approx \mathbb P\Big\{ \chi^2_l(\delta) > \sqrt{2}\, a\, \frac{t - c_1}{\sqrt{2 c_2}} + l + \delta \Big\}.$$
We compared this approximation with the exact distribution of $Z_{m_1,m_2}$ (as computed in [10]) and with Monte Carlo simulations, and found that the error in the quantiles is negligibly small; hence we use this approximation in our implementation. The necessary traces can once more be computed efficiently and without using the SVD of $T$ by means of randomized approximations, see [2].
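A sketch of this quantile computation (using SciPy's noncentral $\chi^2$ distribution; the case distinction for $a$ is our assumption following the cited approximation, since the displayed formulas above only cover $a = 1/s_1$):

```python
import numpy as np
from scipy.stats import chi2, ncx2

def gen_chi2_quantile(eigs, x):
    """
    Approximate (1 - x)-quantile of Z = eps^T A eps (eps ~ N(0, I_n)) via the
    noncentral chi-square moment matching described above; 'eigs' are the
    eigenvalues of A, so that c_k = tr(A^k) = sum(eigs^k).
    """
    c1, c2, c3, c4 = (np.sum(eigs ** k) for k in (1, 2, 3, 4))
    s1, s2 = c3 / c2 ** 1.5, c4 / c2 ** 2
    if s1 ** 2 > s2:
        a = 1.0 / (s1 - np.sqrt(s1 ** 2 - s2))   # assumed case distinction
    else:
        a = 1.0 / s1
    delta = max(s1 * a ** 3 - a ** 2, 0.0)
    l = a ** 2 - 2.0 * delta
    q = ncx2.ppf(1.0 - x, df=l, nc=delta) if delta > 0 else chi2.ppf(1.0 - x, df=l)
    # invert  sqrt(2)*a*(t - c1)/sqrt(2*c2) + l + delta = q  for the threshold t
    return c1 + np.sqrt(c2) / a * (q - l - delta)

print(gen_chi2_quantile(np.array([0.5, 0.3, 0.1, 0.05]), x=0.05))
```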

Numerical simulations
In this section we report on numerical experiments supporting the minimax behaviour of SOLIT. We follow the simulation of convergence rates in [40] and consider the same synthetic problems as there:
Antiderivative. A mildly ill-posed problem with a Fredholm integral operator $T\colon L^2([0,1]) \to L^2([0,1])$ such that $(Tf)'' = -f$ for all $f \in L^2([0,1])$, see also [16] for details. It is known that the eigenvalues $\lambda_k$ of $T^*T$ have the decay behaviour $\lambda_k \asymp k^{-4}$.
2D-Gradiometry. A severely ill-posed problem, which occurs when one aims to reconstruct the earth's gravitational potential on the surface from satellite measurements of the potential's second derivative in radial direction. In the two-dimensional case, using Fourier coefficients $\hat f(k)$, the corresponding operator $T\colon L^2(S, \mu) \to L^2(RS, \mu)$ has the explicit representation
$$(Tf)(x) = \sum_{k\in\mathbb Z} |k|\,(|k|+1)\, R^{-|k|-2}\, \exp(ikx)\, \hat f(k),$$
where $\mu$ is the surface measure on the unit sphere $S = \{x \in \mathbb R^2 : \|x\|_2 = 1\}$ and the relative radius $R > 1$ of the satellite is known, see, e.g., [21]. Thus, the eigenvalues $\lambda_k$ of $T^*T$ are given by $\lambda_k = |k|^2(|k|+1)^2 R^{-2|k|-4}$. Furthermore, it follows that the Fourier coefficient $\hat f(0)$ cannot be determined from $Tf$.

Heat. A severely ill-posed problem where one aims at reconstructing the source $f$ in the periodic heat equation
$$\partial_t u(x,t) = \partial_x^2 u(x,t) \ \text{ in } (-\pi,\pi] \times (0,\bar t\,], \qquad u(x,0) = f(x) \ \text{ on } [-\pi,\pi], \qquad u(-\pi,t) = u(\pi,t) \ \text{ for } t \in (0,\bar t\,],$$
from measurements of the final heat distribution $g = u(\cdot,\bar t\,)$ with $\bar t > 0$. By separation of variables, an explicit representation of $T\colon L^2([-\pi,\pi]) \to L^2([-\pi,\pi])$ in form of a Fourier series can be computed, i.e.,
$$(Tf)(x) = \sum_{k=-\infty}^{\infty} \exp(-k^2 \bar t\,)\, \exp(ikx)\, \hat f(k),$$
where $\hat f(k)$ is the $k$-th Fourier coefficient of $f$. Consequently, the eigenvalues $\lambda_k$ of $T^*T$ are given by $\lambda_k = \exp(-2k^2 \bar t\,)$.

We simulate the rate of convergence for specific exact solutions $f^\dagger$ described below when using three different ordered filters, namely:
• Tikhonov regularization, where $q_\alpha(\lambda) = \frac{1}{\lambda + \alpha}$. In this case, all requirements of Definition 1 are satisfied with $C_q^1 = 1$, and the maximal qualification is $\varphi(x) = x$. Moreover, the mapping $\alpha \mapsto q_\alpha(\lambda)$ is obviously continuous and hence Lemma 3.3 is applicable.
• Showalter regularization, where $q_\alpha(\lambda) = \frac{1 - \exp(-\lambda/\alpha)}{\lambda}$, for which again all requirements of Definition 1 are satisfied and Lemma 3.3 is applicable.
• Spectral cutoff regularization, where $q_\alpha(\lambda) = \frac{1}{\lambda}\,\mathbf 1_{[\alpha,\infty)}(\lambda)$. Again, all requirements of Definition 1 are met with $C_q^1 = 1$ and an arbitrary index function $\varphi$ as qualification, but in this case Lemma 3.3 is not applicable and, as discussed in Section 4.2.1, the choice of suitable candidate $\alpha$'s is questionable.
Besides the rate of convergence, we also depict the price of adaptation $\big(\sqrt{R_{m^*}} + C_{2,m^*}\big)^2$ (21) appearing on the right-hand side of the oracle inequality in Theorem 3.4.

The antiderivative problem
We choose a continuous function as the exact solution $f^\dagger$. As argued in [23], the optimal rate of convergence in this situation is $\mathcal O(\sigma^{3/4-\varepsilon})$ for any $\varepsilon > 0$. To avoid an inverse crime, the exact data $g = Tf^\dagger$ is implemented analytically. In Figure 1 we depict the ideal rate $\mathcal O(\sigma^{3/4})$, the convergence rate obtained with SOLIT, the convergence rate obtained by the oracle $\alpha^*$ in (17), and the convergence rate obtained for the optimal choice $\alpha_{\mathrm{opt}} := \arg\min_{\alpha \in \{\alpha_0, \dots, \alpha_{m_{\max}}\}} \mathbb E\big[\|\hat f_\alpha - f^\dagger\|^2\big]$. The results show that SOLIT clearly attains the minimax rate and yields a surprisingly small loss compared to the oracle choice. For Tikhonov and Showalter regularization, the oracle choice and the optimal choice cannot even be distinguished; for spectral cutoff regularization (where determining the candidate $\alpha$'s is more subtle), a less regular behavior can be observed. It is also visible that the price of adaptation (21) is of the same order as the optimal rate, supporting our analysis.

The 2D-gradiometry problem
As $\hat f(0)$ cannot be determined from $Tf^\dagger$, we adapt the exact solution from Section 5.3.1 accordingly. As discussed in [40], the optimal rate of convergence in this situation is of logarithmic order in $\sigma$. The corresponding figure depicts the empirical mean squared errors over the Monte Carlo runs for different parameter choices $\alpha$, namely SOLIT (eqs. (18a)-(18d)), Lepskiĭ ((3) with $\kappa = 1$), and the optimal rate of convergence.

Conclusion and outlook
In the setup of statistical linear inverse problems with white (i.e., independent) Gaussian noise, we have investigated the SOLIT rule for choosing the regularization parameter in (ordered) filter-based regularization methods. SOLIT is an automatic and data-adaptive procedure, as it requires only knowledge of the forward operator and the noise level. From a computational perspective, it involves $\mathcal O(\log(1/\sigma))$ computations of the estimator $\hat f_\alpha$ and $\mathcal O\big((\log(1/\sigma))^2\big)$ evaluations of the variance estimate $\operatorname{tr}\big(\operatorname{Var}(\hat f_\alpha)\big)$. The overall computational complexity of SOLIT is thus of nearly (i.e., up to log-factors) the same order as that of evaluating $\hat f_\alpha$ for a fixed $\alpha$. In particular, a singular value decomposition of the forward operator is not required, unless spectral cutoff and its variants are used as regularization methods. In short, SOLIT is practically attractive for various applications (even with massive data), which has also been demonstrated by our simulation study in Section 5.
Besides its desirable empirical performance, the SOLIT rule is shown to attain the sharp order of the optimal adaptation rates under various smoothness conditions (including logarithmic and polynomial smoothness) for mildly ill-posed problems. This is in sharp contrast to the standard Lepskiĭ rule, where one often unnecessarily loses a log-factor in the convergence rates. In the case of severely ill-posed problems, we have also established convergence rates for SOLIT under general smoothness conditions, but it remains open whether these rates are adaptively optimal or not.
In addition, we would like to point out that there are at least two ways of estimating the noise level (required by SOLIT) when it is not available in practice. One approach is to employ bootstrap procedures as done in [37], which allows one to estimate the threshold $\eta_{m_1,m_2}$ (cf. Definition 2) without knowing the noise level. The other approach is to estimate the noise level directly from the data, see, e.g., [1,30,36]. Further, it may be possible to extend SOLIT beyond Gaussian noise. If the distribution of the noise is known, e.g., Poisson, one could modify SOLIT by simply replacing $z_{m_1,m_2}(x)$ (cf. Definition 2) with the $(1-x)$-quantile of the corresponding noise distribution. Otherwise, one could attempt to estimate such a quantile from the data, e.g., by following the idea in [42]. All of these extensions of SOLIT are interesting topics for future research.

A Adaptation rates
We show a simple and interesting fact: the ordered filters with the noise-level-dependent parameter choice $\alpha \asymp \sigma$ attain the sharp order of the optimal adaptation rates over logarithmic smoothness classes. This is because the upper bound on the MSE in Proposition A.1 below coincides (up to multiplying constants) with the well-known lower bound for this smoothness class (see, e.g., [33]).
Combining two cases concludes the proof.


Figure 4: Comparison of SOLIT with the classical Lepskiĭ-type balancing principle (3) for all three different test problems with Tikhonov regularization. Depicted are different noise levels $\sigma$ (x-axis) against empirical mean squared errors $\mathbb E\big[\|\hat f_\alpha - f^\dagger\|_{\mathcal X}^2\big]$ simulated in $M = 10^4$ Monte Carlo runs.