An alternating iterative minimisation algorithm for the double-regularised total least squares functional

The total least squares (TLS) method is a successful approach for linear problems in which both the right-hand side and the operator are contaminated by noise. For ill-posed problems, a regularisation strategy has to be employed to stabilise the computed solution. Recently a double regularised TLS method was proposed within an infinite dimensional setup; it reconstructs both the function and the operator, the latter reflected in a bilinear form. Our main focus is on the design and implementation of an algorithm, with particular emphasis on an alternating minimisation strategy, for solving not only the double regularised TLS problem but a vast class of optimisation problems: the minimisation of a bilinear functional of two variables.


Introduction
In [2], the authors described a new two-parameter regularisation scheme for solving an ill-posed operator equation. The task consists of the inversion of a linear operator $A_0 \colon \mathcal{X} \to \mathcal{Z}$ defined between Hilbert spaces,
\[ A_0 f = g_0 . \qquad (1) \]
In contrast to standard inverse problems, where the task is to solve (1) from given noisy data, a more realistic setup is considered where both data and operator are not known exactly. For the reconstruction, a cost functional with two penalisation terms based on the TLS (total least squares) technique is used.
The approach presented in [2] focuses on linear operators that can be characterised by a function, as is, e.g., the case for linear integral operators, where the kernel function determines the behaviour of the operator. Moreover, it is assumed that the noise in the operator is due to an incorrect characterising function. A penalty term is used not only to stabilise the reconstruction of the unknown solution, as is the case in [10-12], but also to stabilise the unknown operator. As a drawback, the regularisation scheme becomes nonlinear even for linear equations. However, the potential advantage is that not only the unknown solution is reconstructed, but also a suitable characterising function and thus the governing operator describing the underlying data. Additionally, convergence rates for the reconstruction of both solution and operator have been derived.
The double regularised total least squares (dbl-RTLS) approach allows us to treat the problem in the framework of Tikhonov regularisation rather than as a constrained minimisation problem. More precisely, the regularised solution is obtained by minimising a nonlinear, nonconvex and possibly non-differentiable functional over two variables, which is computationally not always straightforward. Thus the goal of this paper is the development of an efficient and convergent numerical scheme for the minimisation of the Tikhonov-type functional for the dbl-RTLS approach.
The rest of the paper is organised as follows: in section 2 we formulate the underlying problem and give a short summary of the dbl-RTLS method. Section 3 is dedicated to the development of an algorithm based on an alternating minimisation strategy, as well as its convergence properties. In section 4, numerical results for the proposed algorithm are provided and the efficiency of the method is discussed. For the convenience of the reader, the appendix collects important concepts and fundamental results used throughout this article.

Problem formulation and the dbl-RTLS method
As mentioned above, we aim at the inversion of the linear operator equation (1). To be more specific, we consider operators $A_0$ characterised by a function $k_0 \in \mathcal{Y}$, with $\mathcal{Y}$ also a Hilbert space, through a bilinear operator $B \colon \mathcal{Y} \times \mathcal{X} \to \mathcal{Z}$ with $A_0 f = B(k_0, f)$, fulfilling, for some $C > 0$, the inequality
\[ \| B(k, f) \| \leq C \, \| k \| \, \| f \| . \qquad (2) \]
Associated to the bilinear operator $B$, we also define the linear operator $B_k := B(k, \cdot)$; by (2),
\[ \| B_k - B_{k_0} \| \leq C \, \| k - k_0 \| , \]
i.e. the operator error norm is controlled by the error norm of the characterising functions. Now we can formulate our problem as follows: given noisy data $g^{\delta}$ and a noisy characterising function $k^{\epsilon}$, reconstruct an approximation to the pair $(k_0, f_{\mathrm{true}})$ determined by
\[ B(k_0, f_{\mathrm{true}}) = g_0 , \qquad (5a) \]
\[ \| g^{\delta} - g_0 \| \leq \delta , \qquad (5b) \]
\[ \| k^{\epsilon} - k_0 \| \leq \epsilon . \qquad (5c) \]
Please note that the problem with explicitly known $k_0$ (or the operator $A_0$) is often ill-posed and needs regularisation for a stable inversion. Therefore we will also propose a regularising scheme for the problem (5a)-(5c). Due to our assumptions on the structure of the operator $A_0$, the inverse problem of identifying the function $f_{\mathrm{true}}$ from noisy measurements $g^{\delta}$ and an inexact operator $A^{\epsilon} = B_{k^{\epsilon}}$ can now be rewritten as the task of solving the inverse problem: find $f$ s.t.
\[ B(k_0, f) = g_0 . \qquad (6) \]
In most applications, the 'inversion' of $B$ will be ill-posed (e.g. if $B$ is defined via a Fredholm integral operator), and a regularisation strategy is needed for a stable solution of the problem (6). For the solution of (6) from given data $(k^{\epsilon}, g^{\delta})$ fulfilling
\[ \| g^{\delta} - g_0 \| \leq \delta , \qquad \| k^{\epsilon} - k_0 \| \leq \epsilon , \qquad (7) \]
we use the dbl-RTLS method proposed in [2], where the approximations to the solutions are computed as minimisers of the Tikhonov-type functional
\[ J_{\alpha, \beta}^{\delta, \epsilon}(k, f) := T^{\delta, \epsilon}(k, f) + R_{\alpha, \beta}(k, f) , \qquad (8) \]
\[ T^{\delta, \epsilon}(k, f) := \tfrac{1}{2} \, \| B(k, f) - g^{\delta} \|^2 + \tfrac{\gamma}{2} \, \| k - k^{\epsilon} \|^2 ,
\qquad
R_{\alpha, \beta}(k, f) := \tfrac{\alpha}{2} \, \| L f \|^2 + \beta \, \mathcal{P}(k) . \]
Here, $\alpha$ and $\beta$ are the regularisation parameters which have to be chosen properly, $\gamma$ is a scaling parameter (arbitrary but fixed), $L$ is a bounded linear and continuously invertible operator and $\mathcal{P} \colon \mathcal{Y} \to [0, +\infty]$ is a proper, convex and weakly lower semi-continuous functional. The functional $J_{\alpha, \beta}^{\delta, \epsilon}$ is composed as the sum of two terms: one which measures the discrepancy of data and operator, and one which promotes stability. The functional $T^{\delta, \epsilon}$ is a data-fidelity term based on the TLS technique, whereas the functional $R_{\alpha, \beta}$ acts as a penalty term which stabilises the inversion with respect to the pair $(k, f)$. As a consequence, we have two regularisation parameters, as also occurs in double regularisation, see, e.g. [17].
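To make the structure of (8) concrete, the following minimal numerical sketch evaluates the functional in a discretised setting; the bilinear map B, the penalty P, the operator L and all parameter names are illustrative placeholders supplied by the user, not part of the method in [2].

import numpy as np

def dbl_rtls_functional(k, f, B, g_delta, k_eps, L, P, alpha, beta, gamma):
    # T: TLS-type data-fidelity term measuring the misfit of data and operator
    residual = B(k, f) - g_delta
    T = 0.5 * np.linalg.norm(residual) ** 2 \
        + 0.5 * gamma * np.linalg.norm(k - k_eps) ** 2
    # R: penalty term stabilising the inversion with respect to the pair (k, f)
    R = 0.5 * alpha * np.linalg.norm(L(f)) ** 2 + beta * P(k)
    return T + R

For instance, B could be a discrete convolution, L the identity map and P a weighted l1 norm of basis coefficients, matching the setting used later in section 4.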
The domain of the functional $\mathcal{P}$ can be extended over $\mathcal{Y} \times \mathcal{X}$ by setting $\tilde{\mathcal{P}}(k, f) := \mathcal{P}(k)$. Then $\tilde{\mathcal{P}}$ is a proper, convex and weakly lower semicontinuous functional on $\mathcal{Y} \times \mathcal{X}$.
It has been shown that the sequence of pairs of solutions $(k_n, f_n)$ of (8) converges to a minimum-norm solution as $(\delta, \epsilon) \to (0, 0)$, i.e. it is a regularisation method (see [2, theorem 4.5]). However, the task of finding minimisers of (8) has not been addressed properly; this will be done in the following sections.

An algorithm for the minimisation of the dbl-RTLS functional
In this section, we will formulate the first-order necessary condition for critical points of the functional $J_{\alpha, \beta}^{\delta, \epsilon}$, which requires in particular the derivative of the bilinear operator $B$. The core of this section is to design an algorithm to minimise $J_{\alpha, \beta}^{\delta, \epsilon}$, which is not a trivial task, as the functional is most likely nonconvex and nonlinear.

Optimality condition
It is well known that the study of local behaviour of nonsmooth functions can be achieved by the concept of subdifferentiality which replaces the classical derivative at non-differentiable points.
The first-order necessary condition based on subdifferentiability is stated as follows: if $(\bar{k}, \bar{f})$ minimises the functional $J_{\alpha, \beta}^{\delta, \epsilon}$, then
\[ 0 \in \partial J_{\alpha, \beta}^{\delta, \epsilon}(\bar{k}, \bar{f}) . \qquad (9) \]
We denote by $\partial J_{\alpha, \beta}^{\delta, \epsilon}(k, f)$ the set of all subderivatives of the functional $J_{\alpha, \beta}^{\delta, \epsilon}$ at $(k, f)$ and we name it the subdifferential of $J_{\alpha, \beta}^{\delta, \epsilon}$ at $(k, f)$. For a quick revision on subdifferentiability we refer to the appendix.
The first result gives us the derivative of a bilinear operator B.
Lemma 3.1. The derivative of the bilinear operator $B$ at $(k, f)$ in the direction $(u, v)$ is given by
\[ B'(k, f)(u, v) = B(u, f) + B(k, v) . \]
Since $B$ is bilinear, we have
\[ B(k + u, f + v) = B(k, f) + B(u, f) + B(k, v) + B(u, v) , \]
and thus $B'(k, f)$ exists and is a bounded linear operator whenever both $\mathcal{Y}$ and $\mathcal{X}$ (and hence $\mathcal{Y} \times \mathcal{X}$) are Hilbert spaces. In order to analyse the optimality condition (9) we shall compute the subdifferential of a functional of two variables. As pointed out in [6, proposition 2.3.15], for a general function $h$ the subdifferential $\partial h(x_1, x_2)$ and the product set $\partial_1 h(x_1, x_2) \times \partial_2 h(x_1, x_2)$ are not necessarily contained in each other. Here, $\partial_i h$ denotes the partial subgradient with respect to $x_i$ for $i = 1, 2$. However, this is not the case for the functional we are interested in, as will be shown in the following theorem.
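A short verification sketch, using only the bound (2) and the standard product norm, of why the remainder in the expansion above is indeed of higher order:
\[
\frac{\big\| B(k+u, f+v) - B(k, f) - B(u, f) - B(k, v) \big\|}{\| (u, v) \|}
= \frac{\| B(u, v) \|}{\| (u, v) \|}
\leq \frac{C \, \| u \| \, \| v \|}{\max(\| u \|, \| v \|)}
\leq C \min(\| u \|, \| v \|) \longrightarrow 0
\quad \text{as } \| (u, v) \| \to 0 .
\]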
Theorem 3.2. Let $J$ be a functional of the form
\[ J(x_1, x_2) = Q(x_1, x_2) + \varphi(x_1) + \psi(x_2) , \qquad (10) \]
where $Q$ is a (nonlinear) differentiable term and $\varphi$, $\psi$ are proper convex functionals. Then
\[ \partial J(x_1, x_2) = \{ Q'(x_1, x_2) \} + \partial \varphi(x_1) \times \partial \psi(x_2) , \qquad (11) \]
and $0 \in \partial J(x_1, x_2)$ holds if and only if $0 \in \partial_1 J(x_1, x_2)$ and $0 \in \partial_2 J(x_1, x_2)$.

Proof. In general the subdifferential of a sum of functions does not equal the sum of its subdifferentials. However, if $Q$ is differentiable and $\varphi$, $\psi$ are convex, some inclusions and even equalities hold true (combining [6, proposition 2.3.3; corollary 3; proposition 2.3.6]), as for instance
\[ \partial J(x_1, x_2) = Q'(x_1, x_2) + \partial (\varphi + \psi)(x_1, x_2) . \]
Since $Q$ is differentiable, calling the previous results, the (partial) subderivative is unique [6, proposition 2.3.15]. Note that the subderivative of the sum of two separable convex functionals satisfies
\[ \partial (\varphi + \psi)(x_1, x_2) = \partial \varphi(x_1) \times \partial \psi(x_2) , \]
which together gives (11). The last implication of this theorem follows straightforwardly from the definition of the partial subderivative and (11). □

Please note that the above proof holds for all definitions of subdifferentials introduced in the appendix, as for convex functionals all the definitions are equivalent, and for differentiable (possibly nonlinear) terms the subdifferential is a singleton and the subderivative equals the derivative. Based on theorem 3.2 we can now calculate the derivative of the functional $J_{\alpha, \beta}^{\delta, \epsilon}$, which is the gist for building up the upcoming algorithm; note the structure of (10) and of the proposed functional $J_{\alpha, \beta}^{\delta, \epsilon}$.
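For orientation, and assuming the form of the functional displayed in (8) above, combining theorem 3.2 with lemma 3.1 yields partial first-order conditions of the following shape (a sketch; here $B(k, \cdot)^{*}$ and $B(\cdot, f)^{*}$ denote the adjoints of the linear maps $v \mapsto B(k, v)$ and $u \mapsto B(u, f)$):
\[
0 = B(k, \cdot)^{*} \big( B(k, f) - g^{\delta} \big) + \alpha \, L^{*} L f ,
\qquad
0 \in B(\cdot, f)^{*} \big( B(k, f) - g^{\delta} \big) + \gamma \, ( k - k^{\epsilon} ) + \beta \, \partial \mathcal{P}(k) .
\]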

Proof. The result follows straightforwardly from lemma 3.1 and theorem 3.2. □

Up to now, we did not specify the functional $\mathcal{P}$; it is only required to be convex and lower semi-continuous. We are particularly interested in, e.g., the $L^p$ norm or the weighted $\ell^p$ norm of basis coefficients. Its subdifferential is given in section 4. An easy way to compute the subderivatives of functionals $\mathcal{P}$ with a specific structure is given by the following lemma.
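For later reference (section 4 works with p = 1), we record the standard subdifferential formula, stated here under the assumption that $\mathcal{P}$ is the weighted $\ell^1$ norm of coefficients with respect to an orthonormal basis $\{\phi_{\lambda}\}_{\lambda}$ and weights $w_{\lambda} > 0$:
\[
\mathcal{P}(k) = \sum_{\lambda} w_{\lambda} \, \big| \langle k, \phi_{\lambda} \rangle \big| ,
\qquad
\partial \mathcal{P}(k) = \Big\{ \xi : \ \langle \xi, \phi_{\lambda} \rangle = w_{\lambda} s_{\lambda} \ \text{with} \ s_{\lambda} = \operatorname{sign} \langle k, \phi_{\lambda} \rangle \ \text{if} \ \langle k, \phi_{\lambda} \rangle \neq 0 , \ s_{\lambda} \in [-1, 1] \ \text{otherwise} \Big\} .
\]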

An alternating minimisation algorithm
Coordinate descent methods are based on the idea that the minimisation of a multivariable function can be achieved by minimising it along one direction at a time. It is a simple and surprisingly efficient technique. The coordinates can be chosen arbitrarily, in any permutation, but one can also replace them by blocks of coordinates (for more details see [16] and references therein). This method is closely related to coordinate gradient descent (CGD), Gauss-Seidel and SOR methods, which have been studied previously by several authors and are described in various optimisation books, e.g. [1, 14]. In the unconstrained setting the method is called alternating minimisation (AM) when the variables are split into two blocks. The computation of a solution of dbl-RTLS is not straightforward, as determining the minimum of the functional (8) with respect to both parameters is a nonlinear and nonconvex problem over two variables. Nevertheless we shall overcome this problem by applying coordinate descent techniques.
In the following we shall denote the dbl-RTLS functional by $J$ instead of $J_{\alpha, \beta}^{\delta, \epsilon}$, as the parameters of the functional are kept fixed during the minimisation process.
In the AM algorithm, the functional is minimised iteratively with two alternating minimisation steps. Each step minimises the problem over one variable while keeping the second variable fixed:
\[ f^{n+1} \in \operatorname*{argmin}_{f} \, J(k^n, f) , \qquad (13a) \]
\[ k^{n+1} \in \operatorname*{argmin}_{k} \, J(k, f^{n+1}) . \qquad (13b) \]
The notation $J(k, f)|_u$ means that we minimise the functional $J$ with $u$ fixed, where $u$ can be either $k$ or $f$. Thus we minimise in each cycle the two reduced functionals $f \mapsto J(k^n, f)$ and $k \mapsto J(k, f^{n+1})$; a structural sketch is given after the following list. We highlight some important facts:
1. For each subproblem, the considered operators are linear, and the functional is convex. Thus a local minimum is global.
2. The first step is a standard quadratic minimisation problem.
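A minimal organisational sketch of the AM iteration (13a)-(13b); the two sub-solvers and the stopping parameters below are hypothetical placeholders, not part of the analysis above.

import numpy as np

def alternating_minimisation(k0, f0, solve_f_subproblem, solve_k_subproblem,
                             max_cycles=5, tol=1e-4):
    # alternate the two convex subproblems until the updates become small
    k, f = k0, f0
    for _ in range(max_cycles):
        f_new = solve_f_subproblem(k)       # (13a): minimise f -> J(k, f) with k fixed
        k_new = solve_k_subproblem(f_new)   # (13b): minimise k -> J(k, f_new) with f fixed
        converged = (np.linalg.norm(f_new - f) < tol and
                     np.linalg.norm(k_new - k) < tol)
        k, f = k_new, f_new
        if converged:
            break
    return k, f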

The existence of a minimiser of the functional $J$ has already been proven in [2, theorem 4.2]. The goal of the following results is to prove that the sequence generated by the alternating minimisation algorithm has at least one subsequence which converges to a critical point of the functional. Throughout this section, let us make the following assumptions.
Moreover, we define the weighted norm for given $\gamma > 0$ as
\[ \| (k, f) \|_{\gamma}^2 := \gamma \, \| k \|^2 + \| f \|^2 \]
for all $f \in \mathcal{X}$ and for all $k \in \mathcal{Y}$.
Proof. As the iterates of the AM algorithm can be characterised as the minimisers of a reduced dbl-RTLS functional, see (13a), (13b), we observe
\[ J(k^{n+1}, f^{n+1}) \leq J(k^n, f^{n+1}) \leq J(k^n, f^n) \leq \cdots \leq J(k^0, f^0) , \]
so in particular $\tfrac{\alpha}{2} \| L f^n \|^2$, $\tfrac{\gamma}{2} \| k^n - k^{\epsilon} \|^2$ and $\beta \, \mathcal{P}(k^n)$ remain uniformly bounded. Hence the sequence $(k^n, f^n)$ is bounded in the weighted norm and, by Alaoglu's theorem, it has a weakly convergent subsequence $(k^{n_j}, f^{n_j}) \rightharpoonup (\bar{k}, \bar{f})$.
The second inequality in (14) is proven similarly. In summary, the AM algorithm yields a bounded sequence and hence a weakly convergent subsequence. The next two results extend the convergence to the strong topology, for both variables. For the sake of notational simplicity we denote the index of the convergent subsequence by $m$ for the remainder of the proof. As all summands in (17) are considered, let us assume that the convergence in norm fails; this is in contradiction to (19). Thus we have shown the convergence of $k^{m+1}$ to $\bar{k}$ in norm. The last part of this proof focuses on the convergence of the partial subdifferential of $J$ with respect to $k$.

Now, in the limit, $0 \in \partial_k J(\bar{k}, \bar{f})$ means that (20) holds, i.e. the right-hand side of (20) converges and the limit of the sequence of subderivatives belongs also to the subdifferential set $\partial \mathcal{P}(\bar{k})$. The first part of the statement above can be seen by using condition (A2), whereas the second part is obtained from the assumption that $\mathcal{P}$ is a convex functional, because in this case the Fenchel subdifferential coincides with the limiting subdifferential, which is a strong-weakly closed mapping (see appendix).
The strong convergence for the second variable is obtained as follows. As the limit inferior exists, we can in particular extract a further subsequence converging to $(\bar{k}, \bar{f})$. The second half of this proof refers to the convergence of the partial subdifferential of $J$ with respect to $f$ and its limit.
Since $f^{m+1}$ solves the sub-minimisation problem (13a), the optimality condition reads as $0 \in \partial_f J(k^m, f^{m+1})$. Our goal is to show that the limit of the sequence of subderivatives given in (21) satisfies the corresponding optimality condition at $(\bar{k}, \bar{f})$; subtracting the latter expression from the first one gives the desired limit relation.

Numerical experiments

For our test computations we choose $\mathcal{P}$ to be the weighted $\ell^p$ norm of the coefficients of the characterising function $k$ with respect to an orthonormal basis $\{\phi_{\lambda}\}_{\lambda}$. Among all possible choices of $p$, it is well known that the choice $p = 1$ promotes sparsity [8], in the sense that the minimiser of the related Tikhonov functional has only few nonzero coefficients with respect to the underlying basis. We are particularly interested in wavelet bases, as many signals (1D) or images (2D) exhibit piecewise smooth behaviour punctuated by transients.
One cycle of the alternating minimisation scheme (13) consists of two steps, where each step amounts to the minimisation of a convex functional involving a linear operator. Firstly, solving (13a), we fix $k^n$ and find the solution $f^{n+1}$ through, e.g., a conjugate gradient method. Secondly, solving (13b), we fix $f^{n+1}$ from the previous step and solve the shrinkage minimisation problem described in [8] to get $k^{n+1}$. We test the performance of our algorithm for a 2D convolution problem
\[ f_{\mathrm{true}} * k_0 = g_0 . \]
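The following sketch shows how one AM cycle could be realised for the discrete convolution model, under several simplifying assumptions: periodic convolution implemented via the FFT, L taken as the identity, the penalty applied directly to the pixel coefficients instead of a wavelet basis, and the k-step carried out by a fixed number of iterated soft-thresholding (shrinkage) steps in the spirit of [8]; all parameter values are illustrative.

import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

def conv2(a, b):
    # periodic 2D convolution via the FFT
    return np.real(np.fft.ifft2(np.fft.fft2(a) * np.fft.fft2(b)))

def corr2(a, b):
    # adjoint of x -> a * x (periodic correlation with a)
    return np.real(np.fft.ifft2(np.conj(np.fft.fft2(a)) * np.fft.fft2(b)))

def f_step(k, g_delta, alpha):
    # (13a): minimise f -> 0.5||k*f - g||^2 + 0.5*alpha*||f||^2 via CG on the normal equations
    shape, n = g_delta.shape, g_delta.size
    A = LinearOperator((n, n),
                       matvec=lambda x: (corr2(k, conv2(k, x.reshape(shape)))
                                         + alpha * x.reshape(shape)).ravel())
    b = corr2(k, g_delta).ravel()
    f, _ = cg(A, b, maxiter=200)
    return f.reshape(shape)

def k_step(f, g_delta, k_eps, gamma, beta, w, n_shrink=50):
    # (13b): minimise k -> 0.5||f*k - g||^2 + 0.5*gamma*||k - k_eps||^2 + beta*||w.*k||_1
    # by iterated soft thresholding (ISTA)
    step = 1.0 / (np.max(np.abs(np.fft.fft2(f))) ** 2 + gamma)  # 1 / Lipschitz bound
    k = k_eps.copy()
    for _ in range(n_shrink):
        grad = corr2(f, conv2(f, k) - g_delta) + gamma * (k - k_eps)
        z = k - step * grad
        k = np.sign(z) * np.maximum(np.abs(z) - step * beta * w, 0.0)  # shrinkage
    return k

One AM cycle then consists of calling f_step with the current kernel, followed by k_step with the updated image.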
More precisely, the image $f_{\mathrm{true}}$ is represented numerically as a matrix of size $256 \times 256$, using the command imread to read the original JPEG image file composed of three levels of grey. The blurring kernel $k_0$ is described by a Gaussian function. Numerical experiments are performed from given measurements in order to reconstruct both the function and the kernel. An example of the initial noisy data and noisy kernel is illustrated in figure 2, where we added 8% relative white noise.
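For completeness, a small data-generation sketch in the same spirit; the Gaussian kernel parameters, the normalisation and the noise routine below are illustrative assumptions, not the exact choices used in the experiments reported here.

import numpy as np

def make_test_data(f_true, c1=1.0, c2=1e-3, noise_level=0.08, seed=0):
    # Gaussian blurring kernel on a centred grid, normalised to unit mass
    rng = np.random.default_rng(seed)
    M, N = f_true.shape
    X, Y = np.meshgrid(np.linspace(-0.5, 0.5, M),
                       np.linspace(-0.5, 0.5, N), indexing="ij")
    k0 = c1 * np.exp(-(X ** 2 + Y ** 2) / c2)
    k0 /= k0.sum()
    # exact data: periodic convolution of the true image with the (centred) kernel
    g0 = np.real(np.fft.ifft2(np.fft.fft2(np.fft.ifftshift(k0)) * np.fft.fft2(f_true)))
    # relative white noise on both the data and the characterising function
    def add_noise(x):
        n = rng.standard_normal(x.shape)
        return x + noise_level * np.linalg.norm(x) * n / np.linalg.norm(n)
    return k0, g0, add_noise(k0), add_noise(g0)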
Figure 3 illustrates the significant improvement of the dbl-RTLS solution over the initially given noisy data (see figure 2 (right), with 8% relative noise, SNR = 4.199). For comparison, we also give a reconstruction result obtained by using Tikhonov regularisation applied to the linear convolution problem with the noisy kernel kept fixed. The reconstruction results show that our approach, which considers both the function and the kernel as variables, leads to a better reconstruction. However, this effect becomes less prominent when the noise level becomes considerably smaller.
The numerical results are given in figure 4, which displays in each row three graphics: the approximated image, the reconstructed kernel and its convolution. It shows a collection of numerical solutions computed from four samples with 8%, 4%, 2% and 1% relative error (RE) on both measurements, respectively, in each row from top to bottom. Moreover, we compare the numerical reconstruction with the true image and kernel; the errors in norm are displayed in table 1. Both numerically and visually one can conclude that dbl-RTLS is indeed a regularisation method, since its reconstruction and computed data improve as the noise level decreases. The parameters $c_1$, $c_2$ and $0 < \mu, \nu \leq 1$ of the Gaussian kernel have been picked heuristically. The $M \times N$ matrix represents the underlying function; in our numerical example $M = N = 256$. Depending on the noise level, up to 5 AM cycles have been carried out. The iteration has been stopped whenever the norms of the computed updates were below the threshold $10^{-4}$.