Optimal-order convergence of Nesterov acceleration for linear ill-posed problems*

Published 4 May 2021 © 2021 The Author(s). Published by IOP Publishing Ltd
Citation: Stefan Kindermann 2021 Inverse Problems 37 065002, DOI 10.1088/1361-6420/abf5bc

Abstract

We show that Nesterov acceleration is an optimal-order iterative regularization method for linear ill-posed problems provided that a parameter is chosen according to the smoothness of the solution. This result is proven both for an a priori stopping rule and for the discrepancy principle under Hölder source conditions. Furthermore, some converse results and logarithmic rates are verified. The essential tool for obtaining these results is a representation of the residual polynomials via Gegenbauer polynomials.

Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

One option to calculate a regularized solution to a linear ill-posed problem Ax = y, with A: X → Y linear and bounded and X, Y Hilbert spaces, when only noisy data yδ with ||y − yδ|| = δ are available is to employ iterative regularization schemes. Here, approximate solutions ${x}_{k}^{\delta }$ are calculated iteratively, combined with a stopping rule as regularization parameter choice. The simplest such scheme is Landweber iteration (cf, e.g. [7]), which has the downside of being rather slow. To speed up convergence, acceleration schemes may be used, such as the following Nesterov acceleration:

Equation (1)

where ||A*A|| ⩽ 1 is assumed and where the sequence αk is chosen, for instance, as

Equation (2)

Here, β is a parameter; common choices are, for example, β = 1 or β = 2. We remark that other choices of the sequence αk are possible as well, but for the main analysis of this paper we only consider (2).
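Since the displayed formulas (1) and (2) are not reproduced in this version of the text, the following minimal Python sketch assumes the standard form of the scheme as used in [14, 15], namely zk = xk + αk(xk − xk−1), xk+1 = zk − A*(A zk − yδ) with αk = (k − 1)/(k + β); the names `A`, `y_delta`, and `beta` are placeholders of this sketch, not notation from the paper.

```python
import numpy as np

def nesterov_iteration(A, y_delta, beta=1.0, n_iter=100):
    """Sketch of Nesterov-accelerated Landweber iteration, assuming the
    standard form z_k = x_k + a_k (x_k - x_{k-1}),
    x_{k+1} = z_k - A^T (A z_k - y_delta), with a_k = (k - 1)/(k + beta).
    Requires ||A^T A|| <= 1 (rescale A and y_delta otherwise)."""
    x_prev = np.zeros(A.shape[1])
    x = x_prev.copy()
    iterates = [x.copy()]
    for k in range(1, n_iter + 1):
        a_k = (k - 1.0) / (k + beta)                 # momentum parameter, cf (2)
        z = x + a_k * (x - x_prev)                   # extrapolation step
        x_prev, x = x, z - A.T @ (A @ z - y_delta)   # Landweber step applied to z
        iterates.append(x.copy())
    return iterates
```

Setting αk ≡ 0 in this sketch recovers plain Landweber iteration, while a larger β (e.g. β = 4 as in the experiments of section 4) corresponds to the optimal-order regime for higher smoothness discussed below.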

This iteration (in a general nonlinear context) was suggested by Yurii Nesterov for general convex optimization problems [14]. It is an instance of a method that achieves the best rate of convergence (in the sense of objective-function decrease) that is generally possible for a first-order method. Nesterov acceleration can be employed to speed up the convergence of gradient methods in nonlinear or convex optimization. A particularly successful instance is the FISTA algorithm of Beck and Teboulle [3] for nondifferentiable convex optimization.

In the realm of ill-posed problems, Hubmer and Ramlau [12] performed a convergence analysis for the nonlinear case, and showed the efficiency of the method.

We note that although the main field of application of Nesterov acceleration lies in nonlinear optimization, in this paper we treat only the case of linear operator equations and the acceleration properties of the method for linear ill-posed problems. Other recent acceleration schemes proposed in the literature use, e.g., Hilbert scale preconditioning [6], the continuous version of Nesterov's scheme [4, 9], or fractional asymptotical regularization [17].

The background and main motivation of the present article is the recent interesting work of Neubauer [15] for ill-posed problems in the linear case. He showed that (1) is an iterative regularization scheme and, more importantly, proved convergence rates, which are of optimal order only for a priori parameter choices and in the case of low smoothness of the solution, while being suboptimal otherwise. What is puzzling is that the method shows a quite unusual 'semi-saturation' phenomenon (we explain this term below in section 3.1).

Our contribution in this article is twofold: at first, we prove a formula for the residuals of the iteration (1) involving Gegenbauer polynomials. On this basis, we can build a convergence rate analysis, which improves and extends the results of Neubauer. In particular, we show that the method can always be made an optimal-order method if the parameter β is chosen according to the (Hölder-)smoothness index of the solution. This result holds both for an a priori stopping rule and for the discrepancy principle.

Our analysis also explains the quite nebulous role that this parameter plays in the iteration; it turns out that it is related to the index of the orthogonal polynomials appearing in the residual formula.

Moreover, the above mentioned residual representation also clearly elucidates the semi-saturation phenomenon because the iteration can be interpreted as a mixture of a saturating iteration (Brakhage's ν-method) and a non-saturating one (Landweber method).

In the following we employ some standard notation of regularization theory as in [7]: δ = ||Ax† − yδ|| is the noise level and x† denotes the minimum-norm solution to the operator equation Ax = y with exact data y = Ax†. The index δ of yδ indicates noisy data, and analogously, ${x}_{k}^{\delta }$ denotes the iterates of (1) with noisy data yδ, while the lack of δ indicates exact data: xk denotes the iteration (1) with the exact data y in place of yδ.

2. Residual polynomials for Nesterov acceleration

Our work follows the general theory of spectral-filter-based regularization methods as in [7], where the convergence analysis results from estimates of the corresponding filter function. The first main result, theorem 1, is quite useful for this purpose, as it represents the residual function in terms of known polynomials.

The iteration (1) is a Krylov-space method, and the residual can be expressed as

with the residual polynomials satisfying the recurrence relation (cf [15])

Equation (3)

This is a simple consequence of the definition in (1). The kth iterate can be expressed via spectral filter functions

Observe that the three-term recursion (3) is not of the form required to apply Favard's theorem [8]; hence rk does not agree with any family of polynomials orthogonal with respect to some weight function. (Note that Favard's theorem fully characterizes three-term recurrence relations that lead to orthogonal polynomials.)

Before we proceed, we may compare the residual polynomials with other well-known cases. For classical Landweber iteration [7], which is obtained by setting αk = 0 and thus ${z}_{k}^{\delta }={x}_{k}^{\delta }$, the corresponding residual function ${r}_{k}= :\enspace {r}_{k}^{\left(\mathrm{L}\mathrm{W}\right)}$ is

On the other hand, another class of well-known iteration methods for ill-posed problems that are based on orthogonal polynomials is that of two-step semiiterative methods [10]. They have the form

where μk and ωk are appropriately chosen sequences. The corresponding residual functions satisfy the recurrence relation

Equation (4)

and thus, rk (λ) form a sequence of orthogonal polynomials. Of special interest in ill-posed problems are the ν-methods of Brakhage [5, 10], defined by the sequences, for k > 1,

the initial values x0 = 0, ${x}_{1}=\frac{4\nu +2}{4\nu +1}{A}^{{\ast}}{y}^{\delta }$, and with ν > 0 a user-selected parameter. The associated residual polynomials ${r}_{k}= :\enspace {r}_{k}^{\left(\nu \right)}$ related to (4) with r0 = 1, ${r}_{1}=1-\lambda \frac{4\nu +2}{4\nu +1}$, have the representation [5]

where ${C}_{n}^{\left(\alpha \right)}$ denotes the Gegenbauer polynomials (also known as ultraspherical polynomials); cf [1].

We now obtain the corresponding representation for the Nesterov residual polynomials, which is the basis of this article.

Theorem 1. Let β > −1. The residual polynomials for the Nesterov acceleration (1) with (2) are

Equation (5)

with the Gegenbauer polynomials ${C}_{n}^{\left(\alpha \right)}$.

Proof. Defining ${h}_{k}\left(\lambda \right)={r}_{k}\left(\lambda \right){\left(1-\lambda \right)}^{-\frac{k+1}{2}}$ and multiplying (3) by ${\left(1-\lambda \right)}^{-\frac{k+2}{2}}$ leads to the relation

Equation (6)

We note that ${C}_{n}^{\left(\frac{\beta +1}{2}\right)}\left(x\right)$ satisfy the recursion relation (cf [1, p 782])

Equation (7)

with

Using the recurrence relation with x = 1 leads to

with

Dividing (7) by ${C}_{k}^{\left(\frac{\beta +1}{2}\right)}\left(1\right)$ and using this relation yields

Equation (8)

By induction (or by well-known formulae [1, 16]), it can easily be verified that $\frac{{C}_{k-1}^{\left(\frac{\beta +1}{2}\right)}\left(1\right)}{{C}_{k-2}^{\left(\frac{\beta +1}{2}\right)}\left(1\right)}=\frac{k+\beta -1}{k-1}$, from which it follows that ${\theta }_{k}-1={\alpha }_{k}^{-1}$ as well as $1-{\theta }_{k}^{-1}={\left(1+{\alpha }_{k}\right)}^{-1}$. Thus, $\frac{{C}_{k}^{\left(\frac{\beta +1}{2}\right)}\left(x\right)}{{C}_{k}^{\left(\frac{\beta +1}{2}\right)}\left(1\right)}$ satisfies the same recursion as hk+1(λ), and the corresponding initial values for k = 0, 1 agree when setting $x=\sqrt{1-\lambda }$. This allows us to conclude that

which proves the theorem. □

This theorem relates the residual function of Nesterov acceleration to other known iterations. In particular, the residual rk is roughly the product of that of $\frac{k}{2}$ Landweber iterations and that of $\frac{k}{2}$ iterations of a ν-method with $\nu =\frac{\beta +1}{4}$.
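The identity of theorem 1 can also be checked numerically. The sketch below assumes the recurrence (3) in the form rk+1(λ) = (1 − λ)[(1 + αk)rk(λ) − αk rk−1(λ)] with r0 = 1, r1 = 1 − λ, and the closed form (5) as rk(λ) = (1 − λ)^((k+1)/2) C_{k−1}^{((β+1)/2)}(√(1−λ))/C_{k−1}^{((β+1)/2)}(1); both expressions are reconstructions from the surrounding text rather than verbatim copies of the displayed equations.

```python
import numpy as np
from scipy.special import eval_gegenbauer

def residuals_by_recurrence(lam, beta, K):
    # r_0 = 1, r_1 = 1 - lambda, then the (assumed) three-term recurrence (3)
    r = [np.ones_like(lam), 1.0 - lam]
    for k in range(1, K):
        a_k = (k - 1.0) / (k + beta)
        r.append((1.0 - lam) * ((1.0 + a_k) * r[k] - a_k * r[k - 1]))
    return r

def residual_closed_form(lam, beta, k):
    # (assumed) closed form (5) via Gegenbauer polynomials C_{k-1}^{((beta+1)/2)}
    alpha = (beta + 1.0) / 2.0
    x = np.sqrt(1.0 - lam)
    return ((1.0 - lam) ** ((k + 1) / 2.0)
            * eval_gegenbauer(k - 1, alpha, x) / eval_gegenbauer(k - 1, alpha, 1.0))

lam = np.linspace(0.0, 1.0, 201)
beta, K = 2.0, 20
r = residuals_by_recurrence(lam, beta, K)
dev = max(np.max(np.abs(r[k] - residual_closed_form(lam, beta, k))) for k in range(1, K + 1))
print("max deviation between recurrence and closed form:", dev)  # close to machine precision
```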

Remark 1. Gegenbauer polynomials are special cases of Jacobi polynomials and they themselves embrace several other orthogonal polynomials as special cases. Certain values of β in (1) yield various specializations in (5): the choice β = 0 leads to Legendre polynomials, the often encountered choice β = 1 leads to Chebyshev polynomials of the second kind [1].

We note that the result of theorem 1 even holds for β = −1. In this case, only α1 is not well-defined, but it is always 0 for β > −1. Thus, we may extend the definition of the iteration to β = −1 by setting ${x}_{k}^{\delta }{:=}{\mathrm{lim}}_{\beta \to -1}\enspace {x}_{k}^{\delta }$. (This just amounts to slightly modifying (2) by setting α1 = 0 for k = 1; the remaining iteration is well-defined by (1) and (2).) In this case, we may use [1, equation (22.5.28)] to conclude that the resulting polynomials are Chebyshev polynomials of the first kind.

Before we proceed with the convergence analysis, we state for generality the corresponding theorem for the Nesterov acceleration (1) with a general sequence αk .

Theorem 2. Consider the iteration (1) with a positive sequence αk . Then the corresponding residual function can be expressed as

Equation (9)

where Pk is a sequence of orthogonal polynomials obeying the recurrence relation

Equation (10)

with cn and dn recursively defined to satisfy

Equation (11)

Conversely, given a sequence of orthogonal polynomials defined by the recurrence relation (10) with sequences cn, dn, there exists a sequence αk (defined via (11)) such that the corresponding Nesterov iteration (1) has a residual function as in (9).

Proof. The function ${h}_{k}\left(\lambda \right){:=}{r}_{k}\left(\lambda \right){\left(1-\lambda \right)}^{-\frac{k}{2}}$ satisfies, for k ⩾ 1, the recursion (6) with h0(λ) = 1 and ${h}_{1}\left(\lambda \right)=\sqrt{1-\lambda }$. As in the proof of theorem 1, we may conclude that (10) leads to a similar recursion as (8):

with

From (10) we can conclude by some algebraic manipulations that

If (11) holds, then from the recursion for θk it follows that we can perform an induction step: ${\theta }_{k-1}^{-1}=1+{\alpha }_{k-1}^{-1}$ implies ${\theta }_{k}^{-1}=1+{\alpha }_{k}^{-1}$. Since ${\theta }_{1}^{-1}=1+{\alpha }_{1}^{-1}$ by definition, we obtain that hk (λ) and $\frac{{P}_{k}\left(x\right)}{{P}_{k}\left(1\right)}$ satisfy identical recursions and have identical initial conditions with the setting $x=\sqrt{1-\lambda }$.

Conversely, if (10) is given and the sequence αk is recursively defined by (11), then it follows in a similar manner that $\frac{{P}_{k}\left(x\right)}{{P}_{k}\left(1\right)}$ has the same recursion and initial conditions as hk (λ) and thus both functions agree. □

The polynomials Pk (x) in this theorem correspond to $x{C}_{k-1}^{\frac{\beta +1}{2}}\left(x\right)$ in theorem 1. As an illustration, we may consider the peculiar choice of αk in Nesterov's original paper [14], which is also used in the well-known FISTA iteration [3]: first, a sequence is defined recursively,

and then the sequence αk is given by

Note that tk+1 is the positive root of the equation ${t}_{k+1}\left({t}_{k+1}-1\right)={t}_{k}^{2}$. Using this identity, we may calculate that

Thus, coefficients for a recurrence formula for orthogonal polynomials that correspond to such an iteration are

However, this does not seem to be related to any common polynomial family, to the knowledge of the author.
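For concreteness, the standard choice of [3, 14] reads t1 = 1, tk+1 = (1 + √(1 + 4tk²))/2 and αk = (tk − 1)/tk+1; the short sketch below generates this sequence (this is the textbook form, quoted here because the displayed formulas above are not reproduced).

```python
import math

def fista_momentum(K):
    """Standard FISTA/Nesterov momentum sequence:
    t_1 = 1, t_{k+1} = (1 + sqrt(1 + 4 t_k^2)) / 2, alpha_k = (t_k - 1) / t_{k+1}."""
    t = 1.0
    alphas = []
    for _ in range(K):
        t_next = (1.0 + math.sqrt(1.0 + 4.0 * t * t)) / 2.0
        alphas.append((t - 1.0) / t_next)
        t = t_next
    return alphas

print(fista_momentum(5))  # alpha_1 = 0; the sequence increases towards 1
```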

On the other hand, we may design Nesterov iterations from the recurrence relation of classical polynomials. For instance, the Hermite polynomials obey a relation (10) with ck = 2, dk = 2k. Thus, the sequence αk has to satisfy the recursion

3. Convergence analysis

We consider the iteration (1) with the usual αk-sequence (2) and show that it is an optimal-order regularization method (of course, when combined with a stopping rule).

3.1. Convergence rates and semi-saturation

In the classical analysis of regularization schemes [7], one tries to bound the error in terms of the noise level δ: ${\Vert}{x}_{k\left(\delta \right)}^{\delta }-{x}^{{\dagger}}{\Vert}{\leqslant}\psi \left(\delta \right)$, where ψ is some function decreasing to 0 as δ → 0. Often, Hölder-type rates are considered, with $\psi \left(\delta \right)=C{\delta }^{\xi }$ for some ξ > 0. For such estimates, one has to impose smoothness conditions in the form of a source condition

Equation (12)

It is also well-known [7] that the optimal rate of convergence under (12) is of the form

and a regularization scheme that achieves this bound is called of optimal order.

The phenomenon of saturation is the effect that for certain regularization methods, the convergence rate ψ(δ) does not improve even when the smoothness is higher, i.e., μ is larger. This happens, for instance, for Tikhonov regularization at μ = 1 or for the ν-methods at μ = ν; see [7].

For the Nesterov acceleration (1), a detailed analysis has been performed by Neubauer [15] with the result that, assuming a usual source condition (12) and an appropriate a priori stopping rule, the resulting iterative regularization scheme is of optimal order for $\mu {\leqslant}\frac{1}{2}$, and, for $\mu { >}\frac{1}{2}$, the convergence rates improve with μ but in a suboptimal way. More precisely, the convergence rates proven in [15] are

Thus, contrary to saturating methods, the order still improves beyond the 'saturation index' $\mu =\frac{1}{2}$, but in a suboptimal way. This is what we call 'semi-saturation', and, to the knowledge of the author, this has not been observed before for a classical regularization method. A further result of [15] is that, using the discrepancy principle as a stopping rule, convergence rates are obtained, which are, however, always suboptimal.

Our second main contribution is an improvement of Neubauer's result in the sense that we show that the Nesterov iteration is of optimal order for a smoothness index $\mu {\leqslant}\frac{\beta +1}{4}$ with an a priori stopping rule. Moreover, contrary to [15], we also obtain optimal-order rates with the discrepancy principle provided that $\mu {\leqslant}\frac{\beta -1}{4}$. These findings allow one to always achieve optimal-order convergence provided that β is chosen sufficiently large.

Moreover, the phenomenon of semi-saturation is made transparent by referring to the representation in theorem 1: the residual is a product of Landweber-type and ν-type residuals, and keeping in mind that Landweber iteration does not show saturation for Hölder indices while the ν-methods do, it is clear that a product as in (5) leads to the above-described semi-saturation.

3.2. Convergence analysis

In this section we perform a convergence analysis for the iteration (1). By theorem 1, we may base our investigation on the known results for Landweber iteration and the ν-methods.

We collect some useful known estimates:

Equation (13)

This is well-known and follows from [16, equations (7.33.1) and (4.73)]. From this we immediately obtain that

Equation (14)

which has already been shown in [15]. Moreover, we may conclude from (13) and (5) as well that

Equation (15)

Recall that we denote by xk the iteration with yδ replaced by the exact data. As usual, this allows one to split the total error into an approximation and stability term. We estimate the stability term:

Proposition 1. Let ||A*A|| ⩽ 1 and define ${x}_{k}^{\delta }$ by (1) and (2) with β > −1. Let xk be the corresponding noise-free iteration with yδ replaced by y = Ax†. Then we have the estimate

Equation (16)

Proof. Following [7], it is enough to estimate

where we used the mean value theorem with $\tilde {\lambda }\in \left(0,\lambda \right)$. The derivative may be calculated from (5) as

We use Markov's inequality (cf [7, equation (6.16)]) and (13) to conclude that

Thus,

Equation (17)

The result now follows with [7, theorem 4.2] and (14). □

Note that this estimate is a slight improvement over the corresponding estimate in [15, equation (3.2)], which has a factor 2 on the right-hand side, as for the ν-methods.

From this we may conclude convergence:

Theorem 3. Let ||A*A|| ⩽ 1 and β > −1. If the iteration is stopped at a stopping index k(δ) that satisfies k(δ)δ → 0 and k(δ) → ∞ as δ → 0, then the iteration (1) is convergent:

Proof. We estimate

The first term converges to 0 by the assumption on k(δ), and the second term does so because k(δ) → ∞ and by the dominated convergence theorem using (14) and (15), as in [7]. □

We now consider convergence rates, and for this, the following rather deep estimate for orthogonal polynomials is needed. It was derived by Brakhage [5] as well as by Hanke [7, appendix A.2], [10] on the basis of Hilb-type estimates for Jacobi polynomials.

Proposition 2. Let β > −1. Then there is a constant cβ with

Equation (18)

Proof. For k even, this is [7, equation (6.22)] (with k there meaning 2k here), or [10, theorem 4.1]. However, the result there is based on the Hilb-type formula ([16, theorems 8.21.12, 8.21.13]), which holds for all k, as in [5, p 170]. Thus, by following the steps in [7, appendix A.2], the result is obtained. □

Note that in case −1 < β < 1, the constant cβ may be explicitly calculated from [16, equation (7.33.5)].

The corresponding estimates for the residuals of Landweber iteration are standard; cf [7, equation (6.8)]:

Equation (19)
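For the reader's convenience, the elementary bound underlying such estimates can be spelled out; this is our reconstruction of the standard inequality behind (19), not a verbatim copy of the display:

```latex
\sup_{0 \leqslant \lambda \leqslant 1} \lambda^{\mu}(1-\lambda)^{k}
  = \Big(\frac{\mu}{\mu+k}\Big)^{\mu}\Big(\frac{k}{\mu+k}\Big)^{k}
  \leqslant \Big(\frac{\mu}{\mu+k}\Big)^{\mu}
  \leqslant c_{\mu}\,(k+1)^{-\mu},
  \qquad k \geqslant 1,\ \mu > 0 .
```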

As a consequence, we may state our main convergence rate result for an a priori stopping rule:

Theorem 4. Let ||A*A|| ⩽ 1 and β > −1, and suppose that a source condition (12) is satisfied with some μ > 0.

  • (a)  
    If $\mu {\leqslant}\frac{\beta +1}{4}$ and the stopping index is chosen as
    then optimal order convergence is obtained:
  • (b)  
    If $\mu { >}\frac{\beta +1}{4}$ and the stopping index is chosen as
    Equation (20)
    then the following suboptimal order convergence is obtained:

Proof. For λ ⩽ 1, the estimate (18) together with ${\left(1-\lambda \right)}^{\frac{k+1}{2}}{\leqslant}1$ yields (by interpolation) that

Equation (21)

In the case $\mu { >}\frac{\beta +1}{4}$, additionally using (19), we have

Equation (22)

The result now follows by standard means:

Solving for k by equating the two terms in the last bounds yields the a priori parameter choice and the corresponding rates. □
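To make the balancing step explicit, here is a sketch of the computation in the optimal case μ ⩽ (β + 1)/4, assuming (as in the standard error decomposition) an approximation error of order k−2μρ from (21) and a propagated noise term of order kδ from (16):

```latex
k^{-2\mu}\rho \sim k\,\delta
\;\Longrightarrow\;
k(\delta) \sim \Big(\frac{\rho}{\delta}\Big)^{\frac{1}{2\mu+1}}
\;\Longrightarrow\;
\Vert x_{k(\delta)}^{\delta}-x^{\dagger}\Vert
 = O\!\big(k(\delta)^{-2\mu}\rho + k(\delta)\,\delta\big)
 = O\!\big(\delta^{\frac{2\mu}{2\mu+1}}\rho^{\frac{1}{2\mu+1}}\big).
```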

By a slight refinement, we may even show o(.)-rates as in [15]:

Corollary 1. With the same assumptions as in theorem 4, if the stopping index is chosen such that

(cf, [15, equation (3.1)]), then the same rates for k(δ) as in theorem 4 hold for the stopping index and we obtain the convergence rates

Proof. Following [15], we only have to improve the approximation rates ||xk − x†|| to o(.)-rates. From (21), it follows that for $\mu {\leqslant}\frac{\beta +1}{4}$

Equation (23)

In case of $\mu { >}\frac{\beta +1}{4}$, we note that for all λ ∈ (0, 1] and ξ > 0, ${\mathrm{lim}}_{k\to \infty }{\left(1-\lambda \right)}^{\frac{k+1}{2}}{k}^{\xi }=0$. Thus, we may conclude from (22) that for all λ ∈ (0, 1],

Equation (24)

Thus, by the theorem of dominated convergence, we obtain o(.)-rates for the approximation error:

The result now follows in exactly the same way as in [15]. □

These results correspond to those of Neubauer when β = 1. However, for β > 1 this is an improvement, as we obtain optimal-order convergence if β is chosen larger than 4μ − 1. We note that in the optimal-order case, the number of iterations needed is $O\left({\delta }^{-\frac{1}{2\mu +1}}\right)$, which is the same order as for semiiterative methods and for the conjugate gradient method. The corresponding number of iterations for the Landweber method is $O\left({\delta }^{-\frac{2}{2\mu +1}}\right)$; cf, e.g. [7, theorem 6.5]. Since $\frac{1}{\mu +\frac{\beta +1}{4}+1}{< }\frac{2}{2\mu +1}$, the number of iterations is, in general, smaller than for the Landweber method even in the suboptimal case. Thus, Nesterov acceleration certainly qualifies as a fast method.

3.3. Converse results and logarithmic rates

We present some further contributions to the regularization theory of Nesterov acceleration, namely converse results and logarithmic rates. In this section, we denote by Eλ the spectral family of A*A (cf [7]).

Converse results are statements that certain rates for the approximation error ||xk − x†|| imply some regularity of the true solution x† in the form of source conditions. These are converse to standard convergence rate results as, e.g., in theorem 4, where a decay of the approximation error follows from a regularity condition. The results of (21) and (22) state that a given rate of approximation

Equation (25)

for some ξ > 0, is obtained for a smoothness index μ with μ ⩾ μ*, where

Equation (26)

Now, concerning converse results for Nesterov acceleration, we may verify, similarly to [7, proposition 4.13], that a given rate of approximation requires a certain smoothness index, so that our convergence results are rather sharp in that respect. Unfortunately, we can prove this only for the optimal-order range of indices.

Theorem 5. Let ||A*A|| ⩽ 1. For β > −1 fixed, assume that the approximation error obeys a certain rate (25) for some ξ > 0. Then x† has to satisfy a source condition

for any ε > 0, with ${\mu }^{{\ast}}=\frac{\xi }{2}$.

Proof. In (17), we established the bound $\vert {g}_{k}\left(\lambda \right)\vert {\leqslant}{k}^{2}$ for k ⩾ 1. This yields that

Using spectral theory, it follows that

Thus, a convergence rate of ||xk − x†|| = O(k−ξ) implies ${\Vert}{E}_{\frac{1}{2{k}^{2}}}{x}^{{\dagger}}{\Vert}=O\left({\left(\frac{1}{{k}^{2}}\right)}^{\frac{\xi }{2}}\right)$, which implies the source condition with $\mu {\leqslant}\frac{\xi }{2}-{\epsilon}$ for any positive ε by [7, lemma 4.12] (see also [2]). This proves the result. □

Remark 2. The result of theorem 5 is comparable to well-known result in the optimal-order situation. It is an open problem whether it can be verified that for a rate $\frac{\xi }{2}{ >}\frac{\beta +1}{4}$, i.e., in the suboptimal case, also a higher smoothness index as in (26) (second line) is needed. We could not establish results in that direction, mainly because it is difficult to find lower bounds for the Gegenbauer polynomials (which may have zeros in the spectrum).

Some more general rates and converse results have been established in [2]. There (see also [4]), the so-called best worst case error has been defined as (adapted to our notation)

which represents the best δ-rate that one can get for x† satisfying certain smoothness conditions. What has been found in [2] (cf proposition 3.3) and [4, theorem 2.16] is that, for certain methods, a convergence rate of ${{\Vert}{x}_{k}-{x}^{{\dagger}}{\Vert}}^{2}{\leqslant}\phi \left(k\right)$ is equivalent to a convergence rate for the best worst case error bwc(δ) ⩽ ψ(δ) and also equivalent to a decay rate of ${{\Vert}{E}_{t}{x}^{{\dagger}}{\Vert}}^{2}=\phi \left(t\right)$ (which is related to a source condition). These results are established for a general class of monotone regularization schemes. However, in our case, this monotonicity (and various positivity assumptions) need not be satisfied, as the Gegenbauer polynomials are not monotone in k. Thus, a further investigation of such equivalences and converse results is an open problem.

As an illustration of this theory and as an example of convergence rates under general smoothness classes similar to those in, e.g. [2, 4, 13], we can verify logarithmic rates for the Nesterov acceleration scheme. We define the logarithmic (monotone) rate function (cf [2, 4]) for some ν > 0:

with the continuous extension φν (0) = 0.

Proposition 3. Let ||A*A|| ⩽ 1 and β > −1. Suppose that the following logarithmic source condition holds:

Equation (27)

Then Nesterov acceleration shows a logarithmic best worst case rate

for δ → 0.

Proof. From log(x) ⩽ x − 1 for x > 0, it follows that log(1 − λ) ⩽ −λ for λ ∈ [0, 1], hence

Combining this with (5) and (13) yields the bound

We may proceed similarly to [4]. Using [4, estimate (4.6)] with α = k−1 yields

Thus,

As in [4, (2.18)] we find with (27) that

Furthermore,

Since the supremum is easily seen to be bounded, it follows that this integral can be bounded by O(φν (k−1)). Altogether, we find, for k−1 ⩽ e−(1+ν) and with some generic constant C, that

where we used (27). Since additionally φν (k−1) ⩽ 2ν φν (k−2), we observe with a different constant C that

By balancing the two terms, we obtain an equation for k which, when put back into the bound, yields as in [2, p 533] (with k−2 playing the role of α) the upper bound

for δ sufficiently small. Taking the inf and sup on the left-hand side establishes the result. □

3.4. Discrepancy principle

With the improved estimates, we can also strengthen the result of [15] when the iteration is combined with the well-known discrepancy principle. Recall that it defines a stopping index k(δ) a posteriori as the first (smallest) k that fulfils the inequality

Equation (28)

where τ > 1 is fixed. The corresponding convergence rates can be obtained by a slight modification of the proof in [15] and the general theory in [7].
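As a usage illustration, the following sketch combines the iteration with the discrepancy principle, assuming (28) has the usual form ||A xkδ − yδ|| ⩽ τδ; the argument names are placeholders of this sketch.

```python
import numpy as np

def nesterov_discrepancy(A, y_delta, delta, beta=3.0, tau=1.01, max_iter=10000):
    """Nesterov iteration stopped by the discrepancy principle, assuming (28)
    reads ||A x_k - y_delta|| <= tau * delta (standard form, tau > 1)."""
    x_prev = np.zeros(A.shape[1])
    x = x_prev.copy()
    for k in range(1, max_iter + 1):
        if np.linalg.norm(A @ x - y_delta) <= tau * delta:    # stopping test (28)
            return x, k - 1                                   # first index fulfilling (28)
        a_k = (k - 1.0) / (k + beta)                          # momentum, cf (2)
        z = x + a_k * (x - x_prev)
        x_prev, x = x, z - A.T @ (A @ z - y_delta)            # Landweber step applied to z
    return x, max_iter
```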

Theorem 6. Let ||A*A|| < 1, β > −1, and assume that a source condition (12) is satisfied. If the iteration (1) is stopped by the discrepancy principle (28), then the following convergence rates are obtained:

  • (a)  
    If $\mu +\frac{1}{2}{\leqslant}\frac{\beta +1}{4}$, then optimal-order convergence rates are achieved
    with a stopping index k(δ) being of the same order as in (20).
  • (b)  
    If $\mu +\frac{1}{2}{ >}\frac{\beta +1}{4}$, then it holds that
    and a rate of
    is achieved.

Proof. The proof of [15, theorem 4.1] needs only minor modifications. The estimate [15, equation (4.3)]

is valid independently of our new rate results; hence it follows as in [15, equation (4.4)] that ${\Vert}{x}_{k\left(\delta \right)}-{x}^{{\dagger}}{\Vert}{\leqslant}o\left({\delta }^{\frac{2\mu }{2\mu +1}}\right)$. It remains to estimate ${\Vert}{x}_{k\left(\delta \right)}^{\delta }-{x}^{{\dagger}}{\Vert}$ by (16) combined with an upper bound for k(δ). The estimate [15, equation (4.2)] and the discrepancy principle yield

for k = k(δ).

To obtain o(.)-estimates, we slightly refine the bound (21). By interpolation, we obtain from (18) and (13) that

Equation (29)

Thus,

where ${\mathrm{lim}}_{k\to \infty }\enspace {\gamma }_{1}\left(k,\lambda \right)={\mathrm{lim}}_{k\to \infty }\enspace {\gamma }_{2}\left(k,\lambda \right)=0$ pointwise for λ ∈ [0, 1). Thus, in the case $\mu +\frac{1}{2}{\leqslant}\frac{\beta +1}{4}$, we obtain by the theorem of dominated convergence that

Hence,

which yields (20), and with (16) we obtain ${\Vert}{x}_{k\left(\delta \right)}^{\delta }-{x}^{{\dagger}}{\Vert}=o\left({\delta }^{\frac{2\mu }{2\mu +1}}\right)$, which proves the result in the optimal case.

In case that $\mu +\frac{1}{2}{ >}\frac{\beta +1}{4}$, since γ2(k, λ) = o(k), the corresponding estimate is

from which the result in the second case follows. □

These rates agree with those of [15] when setting β = 1. There, however, only the suboptimal case (b) was possible. Our improvement is to show that we may achieve optimal-order results even with the discrepancy principle provided β is sufficiently large.

Remark 3. It is clear that in practice β should be selected in the regime of optimal rates, i.e. β > 4μ − 1 for a priori choices and β > 4μ + 1 for the discrepancy principle. However, it is a rule of thumb to choose such parameters as small as possible or, more precisely, in such a way as to come close to the saturation point, i.e., β ∼ 4μ − 1 and β ∼ 4μ + 1, respectively.

Remark 4. For semiiterative methods, a modified discrepancy principle [7, 10] has been defined, where the residual in (28) is replaced by an expression of the form (yδ, sk(AA*)yδ) with a constructed function sk. This yields an order-optimal method, as for the a priori stopping rule. An adaptation of this strategy for the Nesterov iteration is certainly possible and should yield order-optimal rates for all $\mu {\leqslant}\frac{\beta +1}{4}$. However, the strategy is quite involved and it is not completely clear to us how to incorporate it into the iteration efficiently. We thus do not investigate such modifications in this article.

4. Numerical results

In this section we present some small numerical experiments to illustrate the semi-saturation phenomenon and to investigate the performance of Nesterov's iteration, in particular, with respect to the optimal-order results.

In a first example, we consider a simple diagonal operator $A=\mathrm{diag}\left(\frac{1}{{n}^{2}}\right)$, for n = 1, ..., 1000, as well as an exact solution ${x}^{{\dagger}}={\left(\frac{1}{{n}^{4}}{\left(-1\right)}^{n}\right)}_{n=1}^{\mathrm{1000}}$, which amounts to a source condition being satisfied with index μ = 0.75. Thus, we are in a case of higher smoothness, where the results of the present article really improve those of [15]. We added standard normally distributed Gaussian noise to the exact data and applied various iterative regularization schemes: Landweber iteration, the ν-method, and the Nesterov iteration, the latter two with various settings of the parameters ν and β, respectively.

We calculated the stopping index either by the discrepancy principle (28) with τ = 1.01 or, since we have the luxury of an available exact solution in this synthetic example, we also calculate the oracle stopping index, which is defined as

In other words, kopt is the theoretically best possible stopping index.
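For reproducibility, the diagonal test problem and the oracle index can be set up in a few lines, reusing the `nesterov_iteration` sketch from section 1; scaling the Gaussian noise so that ||y − yδ|| = δ is our assumption, since the exact noise scaling is not spelled out above.

```python
import numpy as np

n = np.arange(1, 1001)
A = np.diag(1.0 / n**2)                       # A = diag(1/n^2)
x_true = (-1.0)**n / n**4                     # exact solution, smoothness index mu = 0.75
y = A @ x_true

rng = np.random.default_rng(0)
delta = 1e-3
noise = rng.standard_normal(y.size)
y_delta = y + delta * noise / np.linalg.norm(noise)   # noise scaled so that ||y - y_delta|| = delta

# oracle stopping index: the iterate closest to the exact solution
iterates = nesterov_iteration(A, y_delta, beta=4.0, n_iter=500)
errors = [np.linalg.norm(xk - x_true) for xk in iterates]
k_opt = int(np.argmin(errors))
print("k_opt =", k_opt, ", error =", errors[k_opt])
```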

In figure 1, we display the error ${\Vert}{x}_{k\left(\delta \right)}^{\delta }-{x}^{{\dagger}}{\Vert}$ against various noise levels on a log–log scale. The curves correspond to convergence rates for the Nesterov iteration (full line, blue), Landweber iteration (dotted line, black), and the ν-method (dash-dotted line, red). The parameters were chosen as β = 4 and ν = 1, i.e., we are in the optimal-order case covered by item (a) of theorems 4 and 6. On the left-hand side we employ the oracle stopping rule using kopt and on the right-hand side we use the discrepancy principle.

Figure 1. Log–log plot of the error ${\Vert}{x}_{k\left(\delta \right)}^{\delta }-{x}^{{\dagger}}{\Vert}$ versus the noise level δ for the Nesterov iteration (full line, blue), Landweber iteration (dotted line, black), and the ν-method (dash-dotted line, red). Left: optimal stopping rule. Right: stopping by discrepancy principle. The parameters β, ν are in an optimal-order regime.

As can be observed, all three methods show a similar (optimal-order) rate, as stated in theorems 4 and 6. In particular, this verifies one of our findings that the discrepancy principle for Nesterov's iteration leads to an optimal-order method provided β is chosen appropriately.

In figure 2, we illustrate the semi-saturation phenomenon: here β and ν are deliberately chosen too small (β = 0, ν = 0.4 on the left-hand side and β = −0.5, ν = 0.3 on the right-hand side). We observe that for small ν, the convergence rate of the ν-method is slow as a result of its saturation. On the other hand, the Nesterov iteration also has a slower rate than the non-saturating Landweber iteration but, as can be expected from our residual polynomial representation, it lies in between the other two.

Figure 2. The same plot as in figure 1 (left) for various iteration parameters in a suboptimal-order regime. Left: ν = 0.4 and β = 0. Right: ν = 0.3 and β = −0.5. Stopping by the optimal stopping rule kopt.

We remark that the ν-methods show some unpleasant behaviour when ν is chosen small. The residual is highly oscillatory, and for small noise levels we could not even reach the prescribed discrepancy; if we did, the number of iterations was quite high, even higher than for Landweber iteration. This might be attributed to our quite aggressive setting of the discrepancy principle with τ = 1.01. In that respect, the Nesterov iteration was very well behaved, and we had no problems with a small β, which is probably due to the robust Landweber component in the representation (5).

The optimal-order convergence only partly illustrates the effective performance of the methods. In table 1 we therefore provide the ratio of error values, i.e., the numbers in the table are $\frac{{\Vert}{x}_{\mathrm{m}\mathrm{e}\mathrm{t}\mathrm{h}\mathrm{o}\mathrm{d},k}^{\delta }-{x}^{{\dagger}}{\Vert}}{{\Vert}{x}_{\mathrm{N}\mathrm{e}\mathrm{s}\mathrm{t}\mathrm{e}\mathrm{r}\mathrm{o}\mathrm{v},{k}_{\text{opt}}}^{\delta }-{x}^{{\dagger}}{\Vert}}$, where ${x}_{\mathrm{N}\mathrm{e}\mathrm{s}\mathrm{t}\mathrm{e}\mathrm{r}\mathrm{o}\mathrm{v},{k}_{\text{opt}}}^{\delta }$ denotes the Nesterov iteration with the optimal stopping rule and ${x}_{\mathrm{m}\mathrm{e}\mathrm{t}\mathrm{h}\mathrm{o}\mathrm{d},k}^{\delta }$ the iterate of the respective method with the respective stopping rule. All results correspond to an optimal-order regime of parameters (those of figure 1). The number of iterations (both for the oracle stopping rule and the discrepancy principle) is given in table 2. In these tables, we also include the corresponding results for the conjugate gradient iteration CGNE [11].

Table 1. Errors compared to Nesterov iteration: $\frac{{\Vert}{x}_{\mathrm{m}\mathrm{e}\mathrm{t}\mathrm{h}\mathrm{o}\mathrm{d},k}^{\delta }-{x}^{{\dagger}}{\Vert}}{{\Vert}{x}_{\mathrm{N}\mathrm{e}\mathrm{s}\mathrm{t}\mathrm{e}\mathrm{r}\mathrm{o}\mathrm{v},{k}_{\text{opt}}}^{\delta }-{x}^{{\dagger}}{\Vert}}$.

Method      Stopping      δ = 10−5   10−4   10−3   10−2   10−1
Nesterov    kopt          1          1      1      1      1
Landweber   kopt          1.15       0.83   0.96   1.05   1.06
ν-method    kopt          1.02       1.06   1.01   1.26   0.97
CGNE        kopt          1.02       0.82   1.05   1.02   0.84
Nesterov    Discrepancy   1.58       1.10   1.41   2.84   1.90
Landweber   Discrepancy   2.23       1.17   1.41   2.80   1.98
ν-method    Discrepancy   1.02       1.13   1.00   1.56   1.88
CGNE        Discrepancy   1.81       1.19   1.05   2.51   1.97

Table 2. Number of iterations for various methods; setting as in table 1.

Method      Stopping      δ = 10−5   10−4   10−3   10−2   10−1
Nesterov    kopt          371        163    65     26     15
Landweber   kopt          11 000     2193   512    145    36
ν-method    kopt          190        82     33     22     9
CGNE        kopt          10         6      4      3      2
Nesterov    Discrepancy   260        111    39     13     1
Landweber   Discrepancy   5106       1080   220    37     1
ν-method    Discrepancy   190        96     33     10     1
CGNE        Discrepancy   8          5      4      2      1

In terms of the number of iterations, the Nesterov iteration is slightly slower than the ν-methods (approximately by a constant factor of 1.5), but both show a similarly modest increase in the number of iterations when δ is decreased. Both need more iterations than the CGNE method, which, of course, is the fastest one by design. The slightly higher number of iterations might be attributed to the better error estimate in (16). (Note that the ν-methods have a 2 in place of $\sqrt{2}$ there.) It might appear a little paradoxical that a better estimate leads to slower convergence, but this is clear from the theory, as the number of iterations is a decreasing function of δ and thus also of any factor in front of δ. This factor, however, pays off when considering the total error of the method, and we observe that the Nesterov iteration with the optimal choice kopt indeed almost always has a slightly smaller error than the ν-method. Surprisingly, it is in several instances also better than the CGNE method. However, the Nesterov method sometimes loses some of its advantages over the ν-method when using the discrepancy principle, but the performance is still acceptable.

Some further experiments indicate that the results are rather insensitive to overestimating β. As stated in remark 3, the best choice is usually related to the smoothness index, but no serious problems arose when β was chosen larger.

Further numerical experiments have been performed in [15]: even though the value of β was not reported there, the results are consistent with our theory with the choice β = 1. The forward operator there was the Green's function for the solution of the 1D boundary value problem −u'' = f with homogeneous boundary conditions. Exact solutions of various smoothness are stated there: example 5.1 with $\mu =\frac{1}{8}$, example 5.2 with $\mu =\frac{5}{8}$, and example 5.3 with $\mu =\frac{17}{8}$. We used the same problem and the same examples, but we computed A from an FEM discretization of the boundary value problem, with A being the corresponding discrete solution operator. For simplicity, we ignored discretization errors and took the discretized (projected) solution as x†.
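A minimal stand-in for this forward operator can be built from a simple finite-difference discretization of −u'' = f with homogeneous Dirichlet boundary conditions (the paper uses an FEM discretization; the grid size and the normalization below are assumptions of this sketch):

```python
import numpy as np

def discrete_solution_operator(N=200):
    """Discrete solution operator of -u'' = f on (0, 1) with homogeneous
    Dirichlet boundary conditions, via finite differences on N interior nodes."""
    h = 1.0 / (N + 1)
    L = (2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)) / h**2   # discrete -d^2/dx^2
    A = np.linalg.inv(L)                      # discrete analogue of the Green's operator
    return A / np.linalg.norm(A, 2)           # rescale so that ||A^T A|| <= 1

A = discrete_solution_operator()
print(A.shape, np.linalg.norm(A, 2))          # spectral norm 1 after rescaling
```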

The main purpose of this experiment is to verify that the discrepancy principle (τ = 1.1) can be made an optimal-order method. We chose β = 3.5 for the first two examples and β = 9.5 for the third, which should in any case lead to an optimal-order situation. In figure 3, we plot the error versus the relative noise level on a logarithmic scale for the three examples with this choice of β, indicated by the marker 'x'. As a comparison, we also indicate the predicted optimal rate by a solid line. Furthermore, also shown and marked with '+' are the corresponding results for β = 1, i.e., in the suboptimal case.

Figure 3. Convergence rates for the examples in [15]. Left: example 1, smoothness index $\mu =\frac{1}{8}$. Centre: example 2, smoothness index $\mu =\frac{5}{8}$. Right: example 3, smoothness index $\mu =\frac{17}{8}$. Displayed are the errors versus the noise level on a logarithmic scale. A marker 'x' indicates optimal choice of β, and '+' indicates suboptimal choice β = 1. The full line indicates the optimal order rate.

These results clearly illustrate that for the discrepancy principle we may achieve the optimal-order rates with the correct choice of β, and that for a wrong choice of β the rate deteriorates. For low smoothness as in example 1 (left picture in figure 3), however, there seems to be almost no deterioration, contrary to expectation.

5. Conclusion

We have provided a representation of the residual polynomials for Nesterov's acceleration method for linear ill-posed problems as a product of Gegenbauer polynomials and Landweber-type residuals. This allowed us to prove optimal-order rates for an a priori stopping rule and the discrepancy principle as long as β in (2) is sufficiently large. The number of iterations is shown to be of the same order as for other fast methods such as the ν-method or the conjugate gradient method. Moreover, our representation clearly explains the observed semi-saturation phenomenon.

Within the class of linear iterative methods, Nesterov acceleration is an excellent choice, as it is a fast method as well as a quite robust one. It must be conceded, though, that it cannot compete with the conjugate gradient method in terms of the number of iterations. However, this is compensated by its flexibility and simplicity of use, which allows one to easily integrate it into existing gradient methods and also to apply it in nonlinear cases.

Data availability statement

No new data were created or analysed in this study.

Footnotes

  • Dedicated to A Neubauer on the occasion of his 60th birthday.
