Topical Review

Higher-order total variation approaches and generalisations

Kristian Bredies and Martin Holler

Published 3 December 2020 © 2020 The Author(s). Published by IOP Publishing Ltd

Citation: Kristian Bredies and Martin Holler 2020 Inverse Problems 36 123001, DOI 10.1088/1361-6420/ab8f80


Abstract

Over the last decades, the total variation (TV) has evolved to be one of the most broadly-used regularisation functionals for inverse problems, in particular for imaging applications. When first introduced as a regulariser, higher-order generalisations of TV were soon proposed and studied with increasing interest, which led to a variety of different approaches being available today. We review several of these approaches, discussing aspects ranging from functional-analytic foundations to regularisation theory for linear inverse problems in Banach space, and provide a unified framework concerning well-posedness and convergence for vanishing noise level for respective Tikhonov regularisation. This includes general higher orders of TV, additive and infimal-convolution multi-order total variation, total generalised variation, and beyond. Further, numerical optimisation algorithms are developed and discussed that are suitable for solving the Tikhonov minimisation problem for all presented models. Focus is laid in particular on covering the whole pipeline starting at the discretisation of the problem and ending at concrete, implementable iterative procedures. A major part of this review is finally concerned with presenting examples and applications where higher-order TV approaches turned out to be beneficial. These applications range from classical inverse problems in imaging such as denoising, deconvolution, compressed sensing, optical-flow estimation and decompression, to image reconstruction in medical imaging and beyond, including magnetic resonance imaging, computed tomography, magnetic-resonance positron emission tomography, and electron tomography.


Original content from this work may be used under the terms of the Creative Commons Attribution 4.0 licence. Any further distribution of this work must maintain attribution to the author(s) and the title of the work, journal citation and DOI.

1. Introduction

In this paper we give a review of higher-order regularisation functionals of total-variation type, encompassing their development from their origins to generalisations and most recent approaches. Research in this field has been triggered, on the one hand, by the success of the total variation (TV) as a regularisation functional for inverse problems and, on the other hand, by the insight that tailored regularisation approaches are indispensable for solving ill-posed inverse problems in theory and in practice. The last decades have seen active development of the latter topic, which resulted in a variety of different strategies for TV-based regularisation functionals that model data with some inherent smoothness, possibly of higher order or multiple orders. For these functionals, this paper especially aims at providing a unified presentation of the underlying regularisation aspects, giving an overview of numerical algorithms suitable to solve associated regularised inverse problems as well as showing the breadth of respective applications.

Let us put classical and higher-order total-variation regularisation into an inverse problems context. From the inverse problems point of view, the central theme of regularisation is the stabilisation of the inversion of an ill-posed operator equation, which is commonly phrased as finding a u ∈ X such that

$Ku=f$

for given K : X → Y and f ∈ Y, where X and Y are usually Banach spaces. Various approaches for regularisation exist, e.g., iterative regularisation, Tikhonov regularisation, regularisation based on spectral theory in Hilbert spaces, or regularisation by discretisation. Being a regularisation and providing a stable inversion is mathematically well-formalised [84], and usually comprises regularisation parameters. Essentially, stable inversion means that each regularised inverse mapping from data to solution space is continuous in some topology, and being a regularisation requires in addition that, in case the measured data approximates the noiseless situation, a suitable choice of the regularisation parameters allows to approximate a solution that is meaningful and matches the noiseless data. These properties are typically referred to as stability and convergence for vanishing noise, respectively. For general non-linear inverse problems, they usually depend on an interplay between the selected regularisation strategy and the forward operator K, where often, derivative-based assumptions on the local behaviour around the sought solution are made [84, 113]. In contrast, for linear forward operators, unified statements are commonly available such that regularisation properties solely depend on the regularisation strategy. We therefore consider linear inverse problems throughout the paper, i.e., the solution of Ku = f where K : X → Y is always assumed to be linear and continuous.

Variational regularisation, which is the stabilised solution of such an inverse problem via energy minimisation methods, then encompasses—and is often identified with—Tikhonov regularisation (but comprises, for instance, also Morozov regularisation [136] or Ivanov regularisation [114]). Driven by its success in practical applications, it has become a major direction of research in inverse problems. Part of its success may be explained by the fact that variational regularisation allows to incorporate a modelling of expected solutions via regularisation functionals. In a Tikhonov framework, this means that the solution of the operator equation Ku = f is obtained via solving

$\underset{u\in X}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+{\mathcal{R}}_{\alpha }\left(u\right)$

where Sf : Y → [0, ∞] is an energy that measures the discrepancy between Ku and the measured data f, and ${\mathcal{R}}_{\alpha }:X\to \left[0,\infty \right]$ is the regularisation functional that depends on regularisation parameters α. From the analytical perspective, two main features of ${\mathcal{R}}_{\alpha }$ are important: first, it needs to possess properties that allow to guarantee that the corresponding solution map enjoys the stability and convergence properties as mentioned above (typically, lower semi-continuity and coercivity in some topology). Second, it needs to provide a good model of reasonable/expected solutions of Ku = f in the sense that ${\mathcal{R}}_{\alpha }\left(u\right)$ is small for such reasonable solutions and ${\mathcal{R}}_{\alpha }\left(u\right)$ is large for unreasonable solutions that suffer, for instance, from artefacts or noise.
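To make the interplay of discrepancy and regularisation functional concrete, the following minimal sketch (our own illustration, not part of the original text) assembles a finite-dimensional Tikhonov functional with a quadratic discrepancy and the classical quadratic penalty, for which the minimiser can be computed by solving a linear system; the TV-type penalties discussed below replace this quadratic choice of ${\mathcal{R}}_{\alpha }$. The operator, data and parameter values are arbitrary assumptions for demonstration purposes.

```python
import numpy as np

# Toy Tikhonov regularisation: S_f(Ku) = 0.5*||Ku - f||^2, R_alpha(u) = alpha*||u||^2.
# For this quadratic choice, the minimiser solves the normal equations
# (K^T K + 2*alpha*I) u = K^T f.
rng = np.random.default_rng(0)
K = rng.standard_normal((20, 50))            # underdetermined, hence ill-posed
u_true = np.zeros(50); u_true[10:20] = 1.0   # piecewise constant "ground truth"
f = K @ u_true + 0.01 * rng.standard_normal(20)

alpha = 0.1
u_reg = np.linalg.solve(K.T @ K + 2 * alpha * np.eye(50), K.T @ f)

def tikhonov_value(u):
    # value of the Tikhonov functional S_f(Ku) + R_alpha(u)
    return 0.5 * np.sum((K @ u - f) ** 2) + alpha * np.sum(u ** 2)

print(tikhonov_value(u_reg))
```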

While the first requirement is purely qualitative and known to be true for a wide range of norms and seminorms, the second requirement involves the modelling of expected solutions as well as suitable quantification, having in particular in mind that the outcome should be simple enough to be amenable to numerical solution algorithms. Suitable models are for instance provided by various classical smoothness measures such as Hilbert scales of smooth functions, i.e., by Hs -norms where s ⩾ 0, but also reflexive Banach-space norms such as Lp -norms, associated Sobolev-space seminorms in Hk,p for 1 < p < ∞, and Besov-space seminorms based on wavelet-coefficient expansions [39, 69, 170]. The reflexivity of the underlying spaces then helps to turn an ill-posed equation into a well-posed one, since the direct method in the calculus of variations can be employed with weak convergence.

However, there are reasons to consider Banach spaces that lack reflexivity, with L1-spaces and spaces of Radon measures being prominent examples. Indeed, L1-type norms as penalties in variational energies have seen a tremendous rise in popularity in the past two decades, most notably in the theory of compressed sensing [77]. This is due to their property of favouring sparsity in solutions, which allows to model more specific a priori assumptions on the expected solutions than generic smoothness, for instance. While sparsity in L1-type spaces over discrete domains, such as spaces of wavelet coefficients, is directly amenable to analysis, sparsity for continuous domains requires to consider spaces of Radon measures and corresponding Radon-norm-type energies which are natural generalisations of L1-type norms. Being the dual of a separable normed space then mitigates the non-reflexivity of these spaces. As a consequence, they play a major role in continuous models for sparsity-promoting variational regularisation strategies.

A particular example is the total variation functional [59, 162], see section 2 below for a precise definition, which can be interpreted as the Radon norm realised as a dual norm on the distributional derivative of u. As such, TV(u) is finite if and only if the distributional derivative of u can be represented by a finite Radon measure. The TV functional then penalises variations of u via a norm on its derivative while still being finite in the case of jump discontinuities, i.e., when u is piecewise smooth. In particular, its minimisation realises sparsity of the derivative which is often considered a suitable model for piecewise constant functions. In addition, it is convex and lower semi-continuous with respect to Lp -convergence for any p ∈ [1, ∞], and coercive up to constants in suitable Lp -norm topologies. These features make TV a reasonable model for piecewise constant solutions and allow to obtain well-posedness of TV regularisation for a broad class of inverse problems. They can be considered as some of the main reasons for the overwhelming popularity of TV in inverse problems, imaging sciences and beyond.

Naturally, the simplicity and desirable properties of TV come with a cost. As previously mentioned, interpreting TV as a functional that generalises the L1-norm of the image gradient, compressed sensing theory suggests that this enforces sparsity of the gradient and hence piecewise constancy, i.e., one might expect that a TV-regularised function is non-constant only on low-dimensional subsets of its domain. While this might in fact be a feature if the sought solution is piecewise constant, it is not appropriate for general piecewise smooth data. Indeed, for non-piecewise-constant data, TV has the defect of producing artificial plateau-like structures in the reconstructions which became known as the staircasing effect of TV. This effect is nowadays well-understood analytically in the case of denoising [54, 139, 158], and recent results also provide an analytical confirmation of this fact in the context of inverse problems with finite-dimensional measurement data [26, 29]. The appearance of staircasing artefacts is in particular problematic since jump discontinuities are features which are, on the one hand, very prominent in visual perception and typically associated with relevant structures, and, on the other hand, important for automatic post-processing or interpretation of the data. As a result, it became an important research question in the past two decades how to improve upon this defect of TV regularisation while maintaining its desirable features, especially the sparsity-enforcing properties.

One first possible answer to this question is to consider the second-order total variation which is the Radon norm of the distributional second derivative. Indeed, if first-order total variation enforces sparse gradients, i.e., a distributional derivative that is supported on lower-dimensional subsets and hence corresponds to a piecewise constant function, one might expect an analogous behaviour for the second derivative, resulting in piecewise affine functions. The smooth linear ramps that are characteristic for the latter are then typically not perceived as staircasing artefacts, so that one might hope for a reduction of these artefacts by employing second-order total variation. These considerations motivate, generally, higher-order total variation as sparsity-enforcing regulariser for smooth regions in an image. However, as also described in section 3 below, pure higher-order total variation regularisation comes with some drawbacks, mainly the inability to recover jump discontinuities. To account for this, several extensions and alternatives have been proposed in the literature which still have the common theme of using higher-order derivatives and sparsity-enforcing functionals to model smooth regions in images.

This review is concerned with these developments with a focus on approaches that are related to the incorporation of higher-order derivatives and maintenance of the sparsity concepts realised by the Radon norm and the underlying spaces of Radon measures. These developments indeed resulted in a variety of different variational regularisation strategies, some of which are very successful in achieving the goal of providing an amenable model for piecewise smooth solutions. It is also a central message of this review that the success of higher-order TV models in terms of modelling and regularisation effect depends very much on the structure and the functional-analytic setting in which the higher-order derivatives are included. Following this insight, we will discuss different higher-order regularisation functionals such as higher-order total variation, the infimal convolution of higher-order TV as well as the total generalised variation (TGV), which carries out a cascadic decomposition into different orders of differentiation. Starting from the analytical framework of the total-variation functional and functions of bounded variation, we will introduce and analyse several higher-order approaches in a continuous setting, discuss their regularisation properties in a Tikhonov regularisation framework, introduce appropriate discretisations as well as numerical solution strategies for the resulting energy minimisation problems, and present various applications in image processing, computer vision, biomedical imaging and beyond.

Nevertheless, due to the broad range of the topic as well as the many works published in its environment, it is impossible to give a complete overview. The various references to the literature given throughout the paper therefore only represent a selection. Let us also point out that we selected the presented material in particular on a basis that, on the one hand, enables a treatment that is as unified as possible. On the other hand, a clear focus is put on approaches for which the whole pipeline ranging from mathematical modelling, embedding into a functional-analytic context, proof of regularisation properties, numerical discretisation, optimisation algorithms and efficient implementation can be covered. In addition, extensions and further developments will shortly be pointed out when appropriate. Especially, many of the applications in image processing, computer vision, medical imaging and image reconstruction refer to these extensions. The applications were further chosen to represent a wide spectrum of inverse problems, their variational modelling and higher-order TV-type regularisation, and, not least, a successful realisation of the presented theory. We finally aimed at providing a maximal amount of useful information regarding theory and practical realisation in this context.

2. Total-variation (TV) regularisation

Before discussing higher-order total variation and how it may be used to regularise ill-posed inverse problems, let us begin with an overview of first-order total variation. Throughout the review, we mainly adopt a continuous viewpoint which means that the objects of interest are usually functions on some fixed domain Ω, i.e., a non-empty, open and connected subset Ω ⊂ Rd of the d-dimensional Euclidean space. This requires in particular a common functional-analytic context with which we assume the reader to be familiar; we refer to the books [1, 85, 204] for further information. In the following, we will make, for instance, use of the Lebesgue spaces Lp (Ω, H) for H-valued functions where H is a finite-dimensional real Hilbert space as well as their measure-theoretic and functional-analytic properties. Also, concepts of weak differentiability and properties of the associated Sobolev spaces Hk,p (Ω, H) will be utilised without further introduction. This moreover applies to the classical spaces such as $\mathcal{C}\left(\overline{{\Omega}},H\right)$, ${\mathcal{C}}_{\mathrm{c}}\left({\Omega},H\right)$ and ${\mathcal{C}}_{0}\left({\Omega},H\right)$, i.e., the space of uniformly continuous functions on $\overline{{\Omega}}$, the space of compactly supported continuous functions on Ω and the closure of the latter with respect to the supremum norm, respectively. As usual, the respective spaces of k-times continuously differentiable functions are denoted by ${\mathcal{C}}^{k}\left(\overline{{\Omega}},H\right)$, ${\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},H\right)$ and ${\mathcal{C}}_{0}^{k}\left({\Omega},H\right)$ where k could also be infinity, leading to spaces of test functions.

We further employ, throughout this section, basic concepts from convex analysis and optimisation. At this point, we would like to recall that for a convex function $F:X\to \left.\right]- \infty ,\infty \left.\right]$ defined on a Banach space X, the subgradient ∂F(u) at a point u ∈ X is the collection of all w ∈ X* that satisfy the subgradient inequality

$F\left(v\right){\geqslant}F\left(u\right)+{\langle w,\enspace v-u\rangle }_{{X}^{{\ast}}{\times}X}\quad \text{for all}\enspace v\in X.$
For F proper, the Fenchel dual or Fenchel conjugate of F is the function ${F}^{{\ast}}:{X}^{{\ast}}\to \left.\right]- \infty ,\infty \left.\right]$ defined by

${F}^{{\ast}}\left(w\right)=\underset{u\in X}{\mathrm{sup}}\enspace {\langle w,\enspace u\rangle }_{{X}^{{\ast}}{\times}X}-F\left(u\right).$
The Fenchel inequality then states that ${\langle w,\enspace u\rangle }_{{X}^{{\ast}}{\times}X}{\leqslant}F\left(u\right)+{F}^{{\ast}}\left(w\right)$ for all u ∈ X and w ∈ X* with equality if and only if w ∈ ∂F(u). For more details regarding these notions and convex analysis in general, we refer to research monographs covering this subject, for instance [83, 198].

2.1. Functions of bounded variation

Generally, when solving a specific ill-posed inverse problem with, for instance, Tikhonov regularisation, one usually has many choices regarding the regularisation functional. Now, while functionals associated with Hilbertian norms or seminorms possess several advantages such as smoothness and allow, in addition, for regularisation strategies that can be computed by solving a linear equation, they are often not able to provide a good model for piecewise smooth functions. This can, for instance, be illustrated as follows.

Example 2.1. Classical Sobolev spaces cannot contain non-trivial piecewise constant functions. Let Ω ⊂ Rd be a domain and Ω' ⊂ Ω be non-empty, open with ∂Ω' a null set. Then, the characteristic function u = χΩ', i.e., u(x) = 1 if x ∈ Ω' and 0 otherwise, is not contained in H1,p (Ω) for any p ∈ [1, ∞]. To see this, suppose that v ∈ Lp (Ω, Rd ) is the weak derivative of u. Let $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({{\Omega}}^{\prime }\right)$ be a test function. Clearly,

${\int }_{{\Omega}}v\varphi \enspace \mathrm{d}x=-{\int }_{{\Omega}}u\nabla \varphi \enspace \mathrm{d}x=-{\int }_{{{\Omega}}^{\prime }}\nabla \varphi \enspace \mathrm{d}x=0.$
Hence, v = 0 on Ω'. Likewise, one sees that also v = 0 on ${\Omega}{\backslash}\overline{{{\Omega}}^{\prime }}$. In total, v = 0 almost everywhere and as v is the weak derivative of u, u must be constant which is a contradiction.

The defect which is responsible for the failure of characteristic functions being (classical) Sobolev functions can, however, be remedied by allowing weak derivatives to be Radon measures. These are in particular able to concentrate on Lebesgue null-sets, a property that is necessary as the previous example just showed. In the following, we introduce some basic notions and results about vector-valued Radon measures, in particular with an eye towards embedding them into a functional-analytic framework. Moreover, we would like to have these notions readily available when dealing with higher-order derivatives and the associated higher-order total variation.

Throughout this section, let Ω ⊂ Rd be a domain and H a non-trivial finite-dimensional real Hilbert space with ⋅ and $\left\vert \cdot \right\vert $ denoting the associated scalar product and norm, respectively. As usual, the case H = R corresponds to the scalar case and H = Rd to the vector-field case, but, as we will see later, H could also be a space of higher-order tensors. The following definitions and statements regarding basic measure theory can, for instance, be found in [7].

Definition 2.2. A vector-valued Radon measure or H -valued Radon measure on Ω is a function $\mu :\mathcal{B}\left({\Omega}\right)\to H$ on the Borel σ-algebra $\mathcal{B}\left({\Omega}\right)$ associated with the standard topology on Ω satisfying the following properties:

  • (a)  
    It holds that μ(Ø) = 0,
  • (b)  
    For each pairwise disjoint countable collection A1, A2, .... in $\mathcal{B}\left({\Omega}\right)$ it holds that $\mu \left({\bigcup }_{i\in \mathbf{N}}{A}_{i}\right)={\sum }_{i=1}^{\infty }\mu \left({A}_{i}\right)$ in H.

A positive Radon measure is a function $\mu :\mathcal{B}\left({\Omega}\right)\to \left[0,\infty \right]$ satisfying (a), (b) (with H replaced by [0, ∞]) as well as μ(K) < ∞ for each compact K ⊂⊂ Ω. It is called finite if μ(Ω) < ∞.

Naturally, vector-valued Radon measures can be associated to an integral. For μ an H-valued Radon measure and step functions $u={\sum }_{j=1}^{N}{c}_{j}{\chi }_{{A}_{j}}$, $v={\sum }_{j=1}^{N}{v}_{j}{\chi }_{{A}_{j}}$ with c1, ..., cN R, v1, ..., vN H and ${A}_{1},\dots ,{A}_{N}\in \mathcal{B}\left({\Omega}\right)$, the following integrals make sense:

For uniformly continuous functions $u:\overline{{\Omega}}\to \mathbf{R}$ and $v:\overline{{\Omega}}\to H$, the integrals are given as

where {un } and {vn } are sequences of step functions converging uniformly to u and v, respectively. Of course, the above integrals are well-defined, meaning that there are approximating sequences as stated and the above limits exist independently of the specific choice of the approximating sequences. The following definition is the basis for introducing a norm for H-valued Radon measures.

Definition 2.3. For a vector-valued Radon measure μ on Ω the positive Radon measure $\left\vert \mu \right\vert $ given by

$\left\vert \mu \right\vert \left(A\right)=\mathrm{sup}\left\{\sum _{i=1}^{\infty }\left\vert \mu \left({A}_{i}\right)\right\vert \enspace :\enspace {A}_{i}\in \mathcal{B}\left({\Omega}\right)\enspace \text{pairwise disjoint},\enspace A={\bigcup }_{i\in \mathbf{N}}{A}_{i}\right\}$
is called the total-variation measure of μ.

The total-variation measure is always positive and finite, i.e., $0{\leqslant}\left\vert \mu \right\vert \left(A\right){< }\infty $ for all $A\in \mathcal{B}\left({\Omega}\right)$. By construction, μ is absolutely continuous with respect to $\left\vert \mu \right\vert $, i.e., μ(A) = 0 whenever $\left\vert \mu \right\vert \left(A\right)=0$ for a $A\in \mathcal{B}\left({\Omega}\right)$. By Radon–Nikodým's theorem, we thus have that each H-valued Radon measure μ can be written as $\mu ={\sigma }_{\mu }\left\vert \mu \right\vert $ with ${\sigma }_{\mu }\in {L}_{\left\vert \mu \right\vert }^{\infty }\left({\Omega},H\right)$ such that ${{\Vert}{\sigma }_{\mu }{\Vert}}_{\infty }{\leqslant}1$ and $\left\vert {\sigma }_{\mu }\right\vert =1$ almost everywhere with respect to $\left\vert \mu \right\vert $. In this light, integration can also be phrased as

for $u:\overline{{\Omega}}\to \mathbf{R}$, $v:\overline{{\Omega}}\to H$ uniformly continuous. The following proposition, which is a direct consequence of [163, theorem 6.19], provides a useful characterisation of the space of vector-valued measures as the dual of a separable space.

Proposition 2.4. The space $\mathcal{M}\left({\Omega},H\right)$ of all vector-valued Radon measures equipped with the norm ${{\Vert}\mu {\Vert}}_{\mathcal{M}}=\left\vert \mu \right\vert \left({\Omega}\right)$ for $\mu \in \mathcal{M}\left({\Omega},H\right)$ is a Banach space.

It can be identified with the dual space ${\mathcal{C}}_{0}{\left({\Omega},H\right)}^{{\ast}}$ as follows. For each $T\in {\mathcal{C}}_{0}{\left({\Omega},H\right)}^{{\ast}}$ there exists a unique $\mu \in \mathcal{M}\left({\Omega},H\right)$ such that

$T\left(\varphi \right)={\int }_{{\Omega}}\varphi \cdot \mathrm{d}\mu \quad \text{for all}\enspace \varphi \in {\mathcal{C}}_{0}\left({\Omega},H\right).$
In particular, one has a notion of weak*-convergence of Radon measures. For a sequence {μn } and an element μ* in $\mathcal{M}\left({\Omega},H\right)$ we have that ${\mu }^{n}{\ast}{\rightharpoonup }{\mu }^{{\ast}}$ in $\mathcal{M}\left({\Omega},H\right)$ if

$\underset{n\to \infty }{\mathrm{lim}}{\int }_{{\Omega}}\varphi \cdot \mathrm{d}{\mu }^{n}={\int }_{{\Omega}}\varphi \cdot \mathrm{d}{\mu }^{{\ast}}\quad \text{for all}\enspace \varphi \in {\mathcal{C}}_{0}\left({\Omega},H\right).$
As the predual space ${\mathcal{C}}_{0}\left({\Omega},H\right)$ is separable, the Banach–Alaoglu theorem yields in particular the sequential relative weak*-compactness of bounded sets. That means for instance that a bounded sequence always admits a weakly*-convergent subsequence, a property that may compensate for the lack of reflexivity of $\mathcal{M}\left({\Omega},H\right)$.

The interpretation as a dual space as well as the density of test functions in ${\mathcal{C}}_{0}\left({\Omega},H\right)$ also allows to conclude that in order for a linear functional T to define a Radon measure, it suffices to test against $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},H\right)$ and to establish $\left\vert T\left(\varphi \right)\right\vert {\leqslant}C{{\Vert}\varphi {\Vert}}_{\infty }$ for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},H\right)$ and C > 0 independent of φ. This is useful for derivatives, i.e., the derivative of a $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},H\right)$ defines a Radon measure in $\mathcal{M}\left({\Omega},{H}^{d}\right)$ if

$\mathrm{sup}\left\{{\int }_{{\Omega}}u\cdot \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x\enspace :\enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{H}^{d}\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}{< }\infty .\qquad (1)$

In this case, we denote by $\nabla u\in \mathcal{M}\left({\Omega},{H}^{d}\right)$ the unique Hd -valued Radon measure for which ${\int }_{{\Omega}}$ φ ⋅ d∇u = −${\int }_{{\Omega}}$ u  div  φ dx for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{H}^{d}\right)$. Here, Hd is equipped with the scalar product $x\cdot y={\sum }_{i=1}^{d}{x}_{i}\cdot {y}_{i}$ for x, y ∈ Hd . In the case where (1) fails, there exists a sequence {φn } in ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathbf{R}}^{d}\right)$ with ${{\Vert}{\varphi }^{n}{\Vert}}_{\infty }=1$ and $\left\vert {\int }_{{\Omega}}u\cdot \mathrm{d}\mathrm{i}\mathrm{v}\enspace {\varphi }^{n}\enspace \mathrm{d}x\right\vert \to \infty $ as n → ∞. Thus, allowing the supremum to take the value ∞, this yields the following definition.

Definition 2.5. The total variation of a $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},H\right)$ is the value

$\mathrm{T}\mathrm{V}\left(u\right)=\mathrm{sup}\left\{{\int }_{{\Omega}}u\cdot \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x\enspace :\enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{H}^{d}\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}.$
Clearly, in case TV(u) < ∞, we have $\nabla u\in \mathcal{M}\left({\Omega},{H}^{d}\right)$ with ${{\Vert}\nabla u{\Vert}}_{\mathcal{M}}=\mathrm{T}\mathrm{V}\left(u\right)$. Trivially, for scalar functions, i.e., H = R, one recovers the well-known definition [7, 162]. Also, one immediately sees that TV is invariant to translations and rotations, or, more generally, to Euclidean-distance preserving transformations. This is the reason that this definition is also referred to as the isotropic total variation.
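On the implementation side, the supremum in definition 2.5 is usually not evaluated directly; instead, on a regular pixel grid one replaces the gradient by finite differences and sums the pointwise Euclidean norms. The following sketch is such an assumed forward-difference discretisation with unit grid spacing (one of several common choices, not the definition itself); applied to the characteristic function of a set, it roughly reproduces the perimeter appearing in example 2.6 below.

```python
import numpy as np

def discrete_tv(u, h=1.0):
    """Isotropic total variation of a 2D array u using forward differences;
    differences across the boundary are set to zero (Neumann-type boundary)."""
    dx = np.diff(u, axis=0, append=u[-1:, :])   # u[i+1, j] - u[i, j]
    dy = np.diff(u, axis=1, append=u[:, -1:])   # u[i, j+1] - u[i, j]
    return h * float(np.sum(np.sqrt(dx ** 2 + dy ** 2)))

# For a binary square, the discrete TV is close to its perimeter (cf. example 2.6).
u = np.zeros((64, 64))
u[16:48, 16:48] = 1.0
print(discrete_tv(u))   # approximately 4 * 32 = 128 for this axis-aligned square
```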

Example 2.6. Piecewise constant functions may have a Radon measure as derivative. Let Ω' ⊂ Ω be a subdomain such that ∂Ω' ∩ Ω can be parameterised by finitely many Lipschitz mappings. Then, the outer normal ν exists almost everywhere in ∂Ω' ∩ Ω with respect to the Hausdorff ${\mathcal{H}}^{d-1}$ measure and one can employ the divergence theorem. This yields, for u = χΩ' and $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathbf{R}}^{d}\right)$ with ||φ||∞ ⩽ 1 that

${\int }_{{\Omega}}u\enspace \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x={\int }_{{{\Omega}}^{\prime }}\mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x={\int }_{\partial {{\Omega}}^{\prime }\cap {\Omega}}\varphi \cdot \nu \enspace \mathrm{d}{\mathcal{H}}^{d-1},$
so $\nabla u=-\nu {\mathcal{H}}^{d-1}\enspace \llcorner \enspace \left(\partial {{\Omega}}^{\prime }\cap {\Omega}\right)$ is a Radon measure. One sees, for instance via approximation, that ${{\Vert}\nabla u{\Vert}}_{\mathcal{M}}={\mathcal{H}}^{d-1}\left(\partial {{\Omega}}^{\prime }\cap {\Omega}\right)$.

The class of sets Ω' ⊂ Ω for which χΩ' possesses a Radon measure as weak derivative is actually much larger than the class of bounded Lipschitz domains. These are the sets of finite perimeter, denoted by $\mathrm{P}\mathrm{e}\mathrm{r}\left({{\Omega}}^{\prime }\right)={{\Vert}\nabla {\chi }_{{{\Omega}}^{\prime }}{\Vert}}_{\mathcal{M}}$. On the other hand, for u ∈ H1,1(Ω), we have $\mathrm{T}\mathrm{V}\left(u\right)={\int }_{{\Omega}}\left\vert \nabla u\right\vert \enspace \mathrm{d}x$ and the weak derivative as Radon measure is just $\nabla u{\mathcal{L}}^{d}$, i.e., the Sobolev derivative interpreted as a weight on the Lebesgue measure. Collecting all functions whose weak derivative is a Radon measure, we arrive at the following space.

Definition 2.7. The space

$\mathrm{B}\mathrm{V}\left({\Omega},H\right)=\left\{u\in {L}^{1}\left({\Omega},H\right)\enspace :\enspace \mathrm{T}\mathrm{V}\left(u\right){< }\infty \right\}$
is the space of H-valued functions of bounded variation. In case H = R, we denote by BV(Ω) = BV(Ω, R) and just refer to functions of bounded variation.

Proposition 2.8. The space BV(Ω, H) endowed with the norm ${{\Vert}u{\Vert}}_{\mathrm{B}\mathrm{V}}={{\Vert}u{\Vert}}_{1}+\mathrm{T}\mathrm{V}\left(u\right)$ is a Banach space. The total variation functional TV is a continuous seminorm on BV(Ω, H) which vanishes exactly at the constant functions, i.e., ker(TV) = H 1, with H 1 being the set of constant, H-valued functions.

The total variation functional possesses many convenient properties [7].

Proposition 2.9. 

  • The functional TV is proper, convex and lower semi-continuous on each Lp (Ω, H), i.e., for 1 ⩽ p ⩽ ∞.
  • For 1 ⩽ p < ∞, each u ∈ BV(Ω, H) ∩ Lp (Ω, H) can smoothly be approximated as follows: for ɛ > 0, there exists ${u}^{\varepsilon }\in {\mathcal{C}}^{\infty }\left({\Omega},H\right)\cap \mathrm{B}\mathrm{V}\left({\Omega},H\right)\cap {L}^{p}\left({\Omega},H\right)$ such that
  • If Ω is a bounded Lipschitz domain, then there exists a constant C > 0 such that for each u ∈ BV(Ω, H) with ${\int }_{{\Omega}}$ u  dx = 0, the Poincaré–Wirtinger estimate ${{\Vert}u{\Vert}}_{{L}^{d/\left(d-1\right)}\left({\Omega},H\right)}{\leqslant}C\enspace \mathrm{T}\mathrm{V}\left(u\right)$ holds.

From the regularisation-theoretic point of view, the fact that TV is proper, convex and lower semi-continuous on Lebesgue spaces is relevant, a property that fails for the Sobolev-seminorm ||∇ ⋅ ||1. The Poincaré–Wirtinger estimate can be interpreted as a coercivity property on a subspace with codimension 1. Also note that this estimate is the same as for H1,1(Ω, H)-functions and the respective constants C coincide. Consequently, the embedding properties of the latter space transfer immediately.

Proposition 2.10. Let Ω be a bounded Lipschitz domain. Then,

  • The embedding BV(Ω, H) ↪ Ld/(d−1)(Ω, H) (with d/(d − 1) = ∞ for d = 1) exists and is continuous.
  • The embedding BV(Ω, H) ↪ Lp (Ω, H) is compact for each 1 ⩽ p < d/(d − 1).
  • Each bounded sequence {un } in BV(Ω, H) possesses a subsequence $\left\{{u}^{{n}_{k}}\right\}$ which converges to a u ∈ BV(Ω, H) weak* in BV(Ω, H), which we define as ${u}^{{n}_{k}}\to u$ in L1(Ω, H), $\nabla {u}^{{n}_{k}}{\ast}{\rightharpoonup }\nabla u$ in $\mathcal{M}\left({\Omega},{H}^{d}\right)$ as k → ∞.

Consequently, the total variation is suitable for regularising ill-posed inverse problems in certain Lp -spaces.

2.2. Tikhonov regularisation

Let us now turn to solving ill-posed inverse problems with Tikhonov regularisation and BV-based penalty, i.e., solving

for some data f in a Banach space Y. As mentioned in the introduction, since the focus of this review is on regularisation terms rather than tackling inverse problems in the greatest possible generality, we restrict ourselves here to linear and continuous forward operators K : Ld/(d−1)(Ω) → Y. Nevertheless we note that, building on the results developed here for the linear setting, an extension to non-linear operators typically boils down to ensuring additional requirements on the non-linear forward model rather than the regularisation term, see for instance [84, 106, 182].

Measuring the discrepancy in terms of the norm in Y, the problem is then to solve

$\underset{u\in {L}^{d/\left(d-1\right)}\left({\Omega}\right)}{\mathrm{min}}\enspace \frac{1}{q}{{\Vert}Ku-f{\Vert}}_{Y}^{q}+\alpha \mathrm{T}\mathrm{V}\left(u\right)$
for some exponent q ⩾ 1. Usually, Y is some Hilbert space and q = 2, resulting in a quadratic discrepancy, which is often used in case of Gaussian noise. For impulsive noise (or salt-and-pepper noise), the space Y = L1(Ω'), with Ω' a domain, turns out to be useful. In case of Poisson noise, however, it is not advisable to take the norm but rather the Kullback–Leibler divergence between Ku and f, i.e. KL(Ku, f), where KL is given, for f ∈ L1(Ω') with f ⩾ 0 almost everywhere, according to the non-negative integral

$\mathrm{K}\mathrm{L}\left(v,f\right)={\int }_{{{\Omega}}^{\prime }}v-f+f\enspace \mathrm{log}\left(\frac{f}{v}\right)\mathrm{d}x\qquad (2)$

provided that v ⩾ 0 a.e., and ∞ else. In particular, in this context, we agree to set the integrand to v where f = 0 and to ∞ where v = 0 and f > 0.
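For completeness, a discrete version of the Kullback–Leibler discrepancy with the conventions just agreed upon may be written as follows; this is a plain pixelwise sum (quadrature weights of one are an assumption made here) and is only meant to illustrate the special cases f = 0 and v = 0.

```python
import numpy as np

def kl_divergence(v, f):
    """Discrete Kullback-Leibler discrepancy  sum( v - f + f*log(f/v) ),
    with the conventions of the text: the summand equals v where f == 0 and
    +infinity where v == 0 while f > 0.  Both inputs must be non-negative."""
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    if np.any(v < 0) or np.any(f < 0):
        raise ValueError("v and f must be non-negative")
    if np.any((v == 0) & (f > 0)):
        return np.inf
    total = np.sum(v[f == 0])        # summand reduces to v where f vanishes
    mask = f > 0                     # on this set v > 0 is guaranteed
    vf, ff = v[mask], f[mask]
    total += np.sum(vf - ff + ff * np.log(ff / vf))
    return float(total)

print(kl_divergence([1.0, 2.0, 0.5], [1.0, 1.5, 0.0]))   # about 0.5685
```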

In the following, we assume to have given a discrepancy functional Sf : Y → [0, ∞] that is proper, convex, lower semi-continuous and coercive. This is not the most general case but will be sufficient for us in order to ensure existence of minimisers of the Tikhonov functional.

Theorem 2.11. Let Ω be a bounded Lipschitz domain, Y be a Banach space, K : Ld/(d−1)(Ω) → Y linear and continuous (weak*-to-weak-continuous in case d = 1), Sf : Y → [0, ∞] a proper, convex, lower semi-continuous and coercive discrepancy functional associated with some data f and α > 0. Then, there exist solutions of

$\underset{u\in {L}^{d/\left(d-1\right)}\left({\Omega}\right)}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+\alpha \mathrm{T}\mathrm{V}\left(u\right).\qquad (3)$

If Sf is strictly convex and K is injective, the solution is unique whenever the minimum is finite.

We provide the proof for the sake of completeness and as a prototype for the generalisation to higher-order functionals.

Proof. Assume that the objective functional in (3) is proper, otherwise, there is nothing to show. For a minimising sequence {un }, the Poincaré–Wirtinger inequality gives boundedness of $\left\{{u}^{n}-{\left\vert {\Omega}\right\vert }^{-1}{\int }_{{\Omega}}{u}^{n}\enspace \mathrm{d}x\right\}$ in Ld/(d−1)(Ω) while the coercivity of Sf yields the boundedness of {Kun }. By continuity, $\left\{K\left({u}^{n}-{\left\vert {\Omega}\right\vert }^{-1}{\int }_{{\Omega}}{u}^{n}\enspace \mathrm{d}x\right)\right\}$ must be bounded, so if K 1 ≠ 0, then {${\int }_{{\Omega}}$ un  dx} is bounded as otherwise, {Kun } would be unbounded. In the case that K 1 = 0, we can without loss of generality assume that ${\int }_{{\Omega}}$ un  dx = 0 for all n as shifting along constants does not change the functional value. In each case, {${\int }_{{\Omega}}$ un  dx} is bounded, so {un } must be bounded in Ld/(d−1)(Ω). Hence, by compact embedding (proposition 2.10) we have ${u}^{{n}_{k}}\to {u}^{{\ast}}$ in L1(Ω) as k → ∞ for a subsequence $\left\{{u}^{{n}_{k}}\right\}$ and u* ∈ BV(Ω). Reflexivity and continuity of K (weak* sequential compactness and weak*-to-weak continuity in case d = 1) give $K{u}^{{n}_{k}}\rightharpoonup K{u}^{{\ast}}$ in Y for another subsequence (not relabelled). By lower semi-continuity, u* has to be a solution to (3).

Finally, if Sf is strictly convex and K is injective, then Sf K is already strictly convex, so minimisers have to be unique. □

Example 2.12. 

  • The discrepancy functional ${S}_{f}\left(v\right)=\frac{1}{q}{{\Vert}v-f{\Vert}}_{Y}^{q}$ for some f ∈ Y is obviously proper, convex, lower semi-continuous and coercive.
  • It follows from lemma A.1 in the appendix that the discrepancy Sf (v) = KL(v, f) defined on Y = L1(Ω') for f ∈ L1(Ω') with f ⩾ 0 almost everywhere is proper, convex and coercive in L1(Ω'). Lower semi-continuity in turn follows as a special case of lemma A.2.

Remark 2.13. Note that if the inversion of K : Lp (Ω) → Y is well-posed for some p ∈ [1, ∞], then solutions of (3) still exist (even for α = 0). Clearly, the TV penalty is not necessary for obtaining a regularising effect for these problems. In this case, minimising the Tikhonov functional with TV penalty may be interpreted as denoising. The most prominent example might be the Rudin–Osher–Fatemi problem [162] which reads as

$\underset{u\in {L}^{2}\left({\Omega}\right)}{\mathrm{min}}\enspace \frac{1}{2}{{\Vert}u-f{\Vert}}_{2}^{2}+\alpha \mathrm{T}\mathrm{V}\left(u\right)$

for f ∈ L2(Ω). Here, as the identity is 'inverted', the effect of total-variation regularisation can be studied in detail. Minimisation problems of this type with other regularisation functionals are thus a good benchmark test for the properties of this functional.
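To give the reader something executable for this benchmark, the sketch below solves a discrete version of the Rudin–Osher–Fatemi problem with a first-order primal–dual iteration in the spirit of Chambolle and Pock. This anticipates the algorithmic part of the review and is only one of many possible schemes; the forward-difference discretisation, step sizes and the synthetic test image are assumptions made purely for illustration.

```python
import numpy as np

def grad(u):
    gx = np.zeros_like(u); gy = np.zeros_like(u)
    gx[:-1, :] = u[1:, :] - u[:-1, :]
    gy[:, :-1] = u[:, 1:] - u[:, :-1]
    return gx, gy

def div(px, py):
    # negative adjoint of grad, i.e. <grad u, p> = -<u, div p>
    dx = np.zeros_like(px); dy = np.zeros_like(py)
    dx[0, :] = px[0, :]; dx[1:-1, :] = px[1:-1, :] - px[:-2, :]; dx[-1, :] = -px[-2, :]
    dy[:, 0] = py[:, 0]; dy[:, 1:-1] = py[:, 1:-1] - py[:, :-2]; dy[:, -1] = -py[:, -2]
    return dx + dy

def rof_primal_dual(f, alpha, iters=300):
    """Approximately solve min_u 0.5*||u - f||^2 + alpha*TV(u) on a pixel grid."""
    u = f.copy(); u_bar = f.copy()
    px = np.zeros_like(f); py = np.zeros_like(f)
    tau = sigma = 1.0 / np.sqrt(8.0)            # tau*sigma*||grad||^2 <= 1
    for _ in range(iters):
        gx, gy = grad(u_bar)
        px, py = px + sigma * gx, py + sigma * gy
        scale = np.maximum(1.0, np.sqrt(px ** 2 + py ** 2) / alpha)  # project onto |p| <= alpha
        px, py = px / scale, py / scale
        u_old = u
        u = (u + tau * div(px, py) + tau * f) / (1.0 + tau)          # prox of 0.5*||.-f||^2
        u_bar = 2.0 * u - u_old
    return u

# Synthetic usage: denoise a noisy piecewise constant image.
rng = np.random.default_rng(1)
clean = np.zeros((64, 64)); clean[20:44, 20:44] = 1.0
noisy = clean + 0.2 * rng.standard_normal(clean.shape)
denoised = rof_primal_dual(noisy, alpha=0.15)
```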

The stability of solutions in case of varying f depends, of course, on the dependence of Sf on f. The appropriate notion here is the convergence of the discrepancy functional, i.e., for a sequence {fn } and limit f, we say that ${S}_{{f}^{n}}$ converges to Sf if

${S}_{f}\left(v\right){\leqslant}\underset{n\to \infty }{\text{lim inf}}\enspace {S}_{{f}^{n}}\left({v}^{n}\right)\enspace \text{whenever}\enspace {v}^{n}\rightharpoonup v\enspace \text{in}\enspace Y\quad \text{and}\quad \underset{n\to \infty }{\text{lim sup}}\enspace {S}_{{f}^{n}}\left(v\right){\leqslant}{S}_{f}\left(v\right)\enspace \text{for all}\enspace v\in Y.\qquad (4)$

Moreover, we say that $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive if there is a coercive function S0 : Y → [0, ∞] such that ${S}_{{f}^{n}}{\geqslant}{S}_{0}$ in Y for each n.

Theorem 2.14. In the situation of theorem 2.11, assume that ${S}_{{f}^{n}}$ converges to Sf in the sense of (4) and $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive. Then, for each sequence of minimisers {un } of (3) with discrepancy ${S}_{{f}^{n}}$,

  • Either ${S}_{{f}^{n}}\left(K{u}^{n}\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{n}\right)\to \infty $ as n → ∞ and (3) with discrepancy Sf does not admit a finite solution.
  • Or ${S}_{{f}^{n}}\left(K{u}^{n}\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{n}\right)\to {\mathrm{min}}_{u\in {L}^{d/\left(d-1\right)}\left({\Omega}\right)}{S}_{f}\left(Ku\right)+\alpha \mathrm{T}\mathrm{V}\left(u\right)$ as n → ∞ and there is, possibly up to constant shifts, a weak accumulation point u ∈ Ld/(d−1)(Ω) (weak* accumulation point for d = 1) that minimises (3) with discrepancy Sf .

For each subsequence $\left\{{u}^{{n}_{k}}\right\}$ weakly converging to some u in Ld/(d−1)(Ω) (${u}^{{n}_{k}}{\ast}{\rightharpoonup }u$ in case d = 1), it holds that $\mathrm{T}\mathrm{V}\left({u}^{{n}_{k}}\right)\to \mathrm{T}\mathrm{V}\left(u\right)$ as k → ∞ and u solves (3) with discrepancy Sf . If solutions to the latter are unique, we have un ⇀ u in Ld/(d−1)(Ω) (${u}^{n}{\ast}{\rightharpoonup }u$ in case d = 1).

Proof. Let, in the following, ${\int }_{{\Omega}}$ un  dx = 0 for all n if K 1 = 0 and denote by F = Sf ∘ K + αTV as well as ${F}_{n}={S}_{{f}^{n}}\enspace {\circ}\enspace K+\alpha \mathrm{T}\mathrm{V}$. First of all, suppose that {Fn (un )} is bounded. As $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive, we can conclude as in the proof of theorem 2.11 that {un } is bounded. Therefore, a weak accumulation point (weak* in case d = 1) exists.

Suppose that ${u}^{{n}_{k}}\rightharpoonup u$ as k → ∞. Then,

as well as, for each u' ∈ Ld/(d−1)(Ω)

Thus, u is a minimiser for F and plugging in u' = u we see that ${\mathrm{lim}}_{k\to \infty }{F}_{{n}_{k}}\left({u}^{{n}_{k}}\right)=F\left(u\right)$. In order to obtain ${\mathrm{lim}}_{k\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{n}_{k}}\right)=\mathrm{T}\mathrm{V}\left(u\right)$, suppose that ${\text{lim sup}}_{k\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{n}_{k}}\right){ >}\mathrm{T}\mathrm{V}\left(u\right)$, such that

which is a contradiction. Thus, ${\mathrm{lim}}_{k\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{n}_{k}}\right)=\mathrm{T}\mathrm{V}\left(u\right)$. Finally, if u is the unique minimiser for (3) with discrepancy Sf , then un ⇀ u as n → ∞ for the whole sequence (${u}^{n}{\ast}{\rightharpoonup }u$ in case d = 1) as any subsequence has to contain another subsequence that converges weakly (weakly*) to u.

In order to conclude the proof, suppose that lim infn→∞ Fn (un ) < ∞. In that case, the above arguments yield an accumulation point as stated as well as a minimiser u ∈ BV(Ω) of F with F(u) ⩽ lim infn→∞ Fn (un ). In particular, F is proper. By convergence of ${S}_{{f}^{n}}$ to Sf and minimality, we have

so the whole sequence of functional values converges.

Finally, in case Fn (un ) → ∞ as n → ∞, F cannot be proper: otherwise, we obtain analogously to the above that ∞ > F(u) ⩾ lim supn→∞ Fn (u) ⩾ lim infn→∞ Fn (un ) for some u ∈ BV(Ω) which is a contradiction. □

Remark 2.15. The convergence of discrepancies as in (4) is related to gamma-convergence. Indeed, the difference is that, for the latter, on the right-hand side of the lim sup inequality, an arbitrary sequence converging to v is allowed (instead of the constant sequence). In this context, as can be seen in the proof of the stability result above, one could still weaken the lim sup-assumption in (4) by allowing not only the constant recovery sequence but any sequence for which the regularisation functional converges. However, in order to maintain an assumption on the discrepancy term that is independent of the choice of regularisation, we chose the slightly stronger condition.

Example 2.16. 

  • A typical discrepancy is some power of the norm-distance in Y, i.e., ${S}_{f}\left(v\right)=\frac{1}{q}{{\Vert}v-f{\Vert}}_{Y}^{q}$ for some q ⩾ 1. It is easy to show that whenever fn → f in Y, ${S}_{{f}^{n}}$ converges to Sf in the above sense. Also, the equi-coercivity of $\left\{{S}_{{f}^{n}}\right\}$ is immediate.
  • For the Kullback–Leibler divergence, let Y = L1(Ω') for some Ω' and assume that {fn }, f in L1(Ω') are such that fn ⩽ Cf a.e. in Ω' for some C > 0 and KL(f, fn ) → 0 as n → ∞. Then, it follows from lemma A.2 in the appendix that ${S}_{{f}^{n}}=\mathrm{K}\mathrm{L}\left(\cdot ,{f}^{n}\right)$ converges to Sf = KL(⋅, f), and also that ||fn − f||1 → 0. The latter in particular implies boundedness of {fn } in L1(Ω') which, together with the coercivity estimate of lemma A.1 shows that $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive.

In addition to well-posedness of the Tikhonov-functional minimisation, one is of course interested in regularisation results, i.e., the convergence of solutions to a minimum-TV-solution provided that the data converges and α → 0 in some sense. For this purpose, let u† ∈ BV(Ω) be a minimum-TV-solution of Ku = f† for some data f† in Y, i.e., TV(u†) ⩽ TV(u) for each u with Ku = f†, suppose that for each δ > 0 one has given a fδ ∈ Y such that ${S}_{{f}^{\delta }}\left({f}^{{\dagger}}\right){\leqslant}\delta $, and denote by uα,δ a solution of (3) for parameter α > 0 and data fδ .

Theorem 2.17. In the situation of theorem 2.11, let the discrepancy functionals $\left\{{S}_{{f}^{\delta }}\right\}$ be equi-coercive and converge to ${S}_{{f}^{{\dagger}}}$ in the sense of (4) for some data f† ∈ Y with ${S}_{{f}^{{\dagger}}}\left(v\right)=0$ if and only if v = f†. Choose for each δ > 0 the parameter α > 0 such that

$\alpha \to 0\quad \text{and}\quad \frac{\delta }{\alpha }\to 0\quad \text{as}\enspace \delta \to 0.$

Then, again up to constant shifts, {uα,δ } has at least one weak accumulation point in Ld/(d−1)(Ω) (weak* in case d = 1). Each such accumulation point is a minimum-TV-solution of Ku = f† and limδ→0 TV(uα,δ ) = TV(u†).

Proof. Again we assume that ${\int }_{{\Omega}}$ uα,δ dx = 0 for all (α, δ) if K 1 = 0. Using the optimality of uα,δ for (3) compared to u† gives

${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{\alpha ,\delta }\right){\leqslant}{S}_{{f}^{\delta }}\left({f}^{{\dagger}}\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right){\leqslant}\delta +\alpha \mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right).$

Since α → 0 as δ → 0, we have that ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)\to 0$ as δ → 0. Moreover, as also δ/α → 0, it follows that lim supδ→0 TV(uα,δ ) ⩽ TV(u†). This allows to conclude that {uα,δ } is bounded in BV(Ω) and, by embedding, admits a weak accumulation point in Ld/(d−1)(Ω) (weak* in case d = 1).

Next, let u* be such an accumulation point associated with {δn }, δn → 0 as well as the corresponding parameters {αn }. Then, ${S}_{{f}^{{\dagger}}}\left(K{u}^{{\ast}}\right){\leqslant}{\text{lim inf}}_{n\to \infty }{S}_{{f}^{{\delta }_{n}}}\left(K{u}^{{\alpha }_{n},{\delta }_{n}}\right)=0$, so Ku* = f†. Moreover, $\mathrm{T}\mathrm{V}\left({u}^{{\ast}}\right){\leqslant}{\text{lim inf}}_{n\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{\alpha }_{n},{\delta }_{n}}\right){\leqslant}\mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right)$, hence u* is a minimum-TV-solution. In particular, TV(u*) = TV(u†), so ${\mathrm{lim}}_{n\to \infty }\mathrm{T}\mathrm{V}\left({u}^{{\alpha }_{n},{\delta }_{n}}\right)=\mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right)$.

Finally, each sequence {δn } with δn → 0 contains a subsequence along which $\mathrm{T}\mathrm{V}\left({u}^{{\alpha }_{n},{\delta }_{n}}\right)\to \mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right)$ as n → ∞, so TV(uα,δ ) → TV(u†) as δ → 0. □

Finally, if a respective source condition is satisfied, we can, under some circumstances, give rates for some Bregman distance with respect to TV associated with respect to a particular subgradient element [48]. Recall that the Bregman distance ${D}_{{x}^{{\ast}}}^{F}\left(y,x\right)$ of x, y ∈ X for a convex functional $F:X\to \enspace \left.\right]- \infty ,\infty \left.\right]$ and subgradient element x* ∈ ∂F(x) is given by

${D}_{{x}^{{\ast}}}^{F}\left(y,x\right)=F\left(y\right)-F\left(x\right)-{\langle {x}^{{\ast}},\enspace y-x\rangle }_{{X}^{{\ast}}{\times}X}.$
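As an elementary illustration (not taken from the text): for the squared norm of a Hilbert space the Bregman distance reduces to the squared distance, while for TV it is in general neither symmetric nor a metric.

$\text{For}\enspace F=\tfrac{1}{2}{{\Vert}\cdot {\Vert}}_{H}^{2}\enspace \text{and}\enspace {x}^{{\ast}}=x:\quad {D}_{x}^{F}\left(y,x\right)=\tfrac{1}{2}{{\Vert}y{\Vert}}^{2}-\tfrac{1}{2}{{\Vert}x{\Vert}}^{2}-\langle x,\enspace y-x\rangle =\tfrac{1}{2}{{\Vert}y-x{\Vert}}^{2}.$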

The convergence rate results are then a consequence of the following proposition.

Proposition 2.18. In the situation of theorem 2.17, let K*w† ∈ ∂TV(u†) for some w† ∈ Y*. Then,

Equation (5)

Proof. Using the minimality of uα,δ yields ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)+\alpha \mathrm{T}\mathrm{V}\left({u}^{\alpha ,\delta }\right){\leqslant}\alpha \mathrm{T}\mathrm{V}\left({u}^{{\dagger}}\right)+\delta $. Rearranging, adding ⟨K*w†, u† − uα,δ ⟩ on both sides as well as using Fenchel's inequality twice yields

Subtracting ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)$ and dividing by α gives the result. □

For well-known discrepancy terms, one easily gets parameter choice rules that lead to rates for ${D}_{{K}^{{\ast}}{w}^{{\dagger}}}^{\mathrm{T}\mathrm{V}}\left({u}^{\alpha ,\delta }\right)$.

Example 2.19. 

  • For ${S}_{{f}^{\delta }}\left(v\right)=\frac{1}{q}{{\Vert}v-{f}^{\delta }{\Vert}}_{Y}^{q}$ with q > 1, ${S}_{{f}^{\delta }}^{{\ast}}\left(w\right)=\frac{1}{{q}^{{\ast}}}{{\Vert}w{\Vert}}_{{Y}^{{\ast}}}^{{q}^{{\ast}}}+\langle {f}^{\delta },\enspace w\rangle $ where 1/q + 1/q* = 1, hence (5) reads as
    In the non-trivial case of w† ≠ 0, the right-hand side becomes minimal for $\alpha ={{\Vert}{w}^{{\dagger}}{\Vert}}_{{Y}^{{\ast}}}^{-1}{\left(\frac{{q}^{{\ast}}}{{q}^{{\ast}}-1}\right)}^{1/{q}^{{\ast}}}{\delta }^{1/{q}^{{\ast}}}$ giving the well-known rate of $\mathcal{O}\left({\delta }^{1/q}\right)=\mathcal{O}\left({{\Vert}{f}^{\delta }-{f}^{{\dagger}}{\Vert}}_{Y}\right)$ for the Bregman distance.
  • For the Kullback–Leibler discrepancy, i.e., ${S}_{{f}^{\delta }}\left(v\right)=\mathrm{K}\mathrm{L}\left(v,{f}^{\delta }\right)$ on L1(Ω'), a direct, pointwise computation shows that the dual functional obeys ${S}_{{f}^{\delta }}^{{\ast}}\left(w\right)+{S}_{{f}^{\delta }}^{{\ast}}\left(-w\right)={\int }_{{{\Omega}}^{\prime }}-{f}^{\delta }\enspace \mathrm{log}\left(1-{w}^{2}\right)\enspace \mathrm{d}x$ if $\left\vert w\right\vert {\leqslant}1$ almost everywhere, setting −t  log(0) = ∞ for t > 0 and −0  log(0) = 0, and ${S}_{{f}^{\delta }}^{{\ast}}\left(w\right)+{S}_{{f}^{\delta }}^{{\ast}}\left(-w\right)=\infty $ else. As w† ∈ L∞(Ω'), we may choose α > 0 such that $\alpha {{\Vert}{w}^{{\dagger}}{\Vert}}_{\infty }{\leqslant}\frac{1}{\sqrt{2}}$. Then, the equivalence
    holds. Assuming ${\int }_{{{\Omega}}^{\prime }}{f}^{{\dagger}}{\left({w}^{{\dagger}}\right)}^{2}\enspace \mathrm{d}x{ >}0$, the weak convergence fδ ⇀ f† in L1(Ω') (see lemma A.2) implies ${S}_{{f}^{\delta }}^{{\ast}}\left(\alpha {w}^{{\dagger}}\right)+{S}_{{f}^{\delta }}^{{\ast}}\left(-\alpha {w}^{{\dagger}}\right)\sim {\alpha }^{2}$ independently of δ. Hence, choosing $\alpha \sim \sqrt{\delta }$ yields the rate $\mathcal{O}\left(\sqrt{\delta }\right)$ for the Bregman distance as δ → 0.

2.3. Further first-order approaches

Besides these functional-analytic properties, functions of bounded variation admit interesting structural and fine properties. Let us briefly discuss the structure of the gradient ∇u for a u ∈ BV(Ω). By Lebesgue's decomposition theorem, ∇u can be split into an absolutely continuous part ∇a u with respect to the Lebesgue measure and a singular part ∇s u. We tacitly identify ∇a u with the Radon–Nikodým derivative, i.e., ∇a uL1(Ω, Rd ) via the measure ${\nabla }^{a}u{\mathcal{L}}^{d}$.

The singular part ∇s u therefore has to capture the jump discontinuities of u. Indeed, introducing the jump set, it can further be decomposed. Recall that a u ∈ L1(Ω) is almost everywhere approximately continuous, i.e., for almost every x ∈ Ω there exists a z ∈ R such that

$\underset{r\to 0}{\mathrm{lim}}\enspace \frac{1}{\left\vert {B}_{r}\left(x\right)\right\vert }{\int }_{{B}_{r}\left(x\right)}\left\vert u\left(y\right)-z\right\vert \enspace \mathrm{d}y=0.$

The collection Su of all points for which u is not approximately continuous is called the discontinuity set of u.

Definition 2.20. Let $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega}\right)$ and x ∈ Ω.

  • (a)  
    The function u is called approximately differentiable in x if there exists a v ∈ Rd such that
    The vector ∇ u(x) = v is called the approximate gradient of u at x.
  • (b)  
    The point x is an approximate jump point of u if there exist u+(x) > u−(x) and a ν ∈ Rd , $\left\vert \nu \right\vert =1$ such that
    where ${B}_{r}^{+}\left(x,\nu \right)$ and ${B}_{r}^{-}\left(x,\nu \right)$ are balls cut by the hyperplane perpendicular to ν and containing x, i.e.,
    The set Ju of all approximate jump points is called the jump set of u.

Theorem 2.21. ([7]). Let u ∈ BV(Ω). Then,

  • (a)  
    u is almost everywhere approximately differentiable with ∇a u = ∇ u in L1(Ω, Rd ).
  • (b)  
    The jump set satisfies ${\mathcal{H}}^{d-1}\left({S}_{u}{\backslash}{J}_{u}\right)=0$ and we have $\nabla u\enspace \llcorner \enspace {J}_{u}=\left({u}^{+}-{u}^{-}\right){\nu }_{u}{\mathcal{H}}^{d-1}$.
  • (c)  
    The restriction ∇u ⌞ (Ω\Su ) is absolutely continuous with respect to ${\mathcal{H}}^{d-1}$.

In particular, the involved sets and functions are Borel sets and functions, respectively.

Denoting by

${\nabla }^{j}u=\nabla u\enspace \llcorner \enspace {J}_{u},\qquad {\nabla }^{c}u={\nabla }^{s}u\enspace \llcorner \enspace \left({\Omega}{\backslash}{S}_{u}\right),$

where ∇j u and ∇c u are the jump and Cantor part of ∇u, respectively, the gradient of a u ∈ BV(Ω) can be decomposed into

$\nabla u={\nabla }^{a}u\enspace {\mathcal{L}}^{d}+\left({u}^{+}-{u}^{-}\right){\nu }_{u}{\mathcal{H}}^{d-1}\enspace \llcorner \enspace {J}_{u}+{\nabla }^{c}u\qquad (6)$

with ∇c u being singular with respect to ${\mathcal{L}}^{d}$ and absolutely continuous with respect to ${\mathcal{H}}^{d-1}$.

This construction allows in particular to define penalties beyond the total variation seminorm (see, for instance [7, section 5.5]). Letting g : Rd → [0, ∞] be a proper, convex and lower semi-continuous function and g∞ be given according to

${g}^{\infty }\left(x\right)=\underset{t\to \infty }{\mathrm{lim}}\enspace \frac{g\left(tx\right)}{t}$

with ∞ allowed, then the functional

${\mathcal{R}}_{g}\left(u\right)={\int }_{{\Omega}}g\left({\nabla }^{a}u\right)\enspace \mathrm{d}x+{\int }_{{J}_{u}}{g}^{\infty }\left(\left({u}^{+}-{u}^{-}\right){\nu }_{u}\right)\enspace \mathrm{d}{\mathcal{H}}^{d-1}+{\int }_{{\Omega}}{g}^{\infty }\left({\sigma }_{{\nabla }^{c}u}\right)\enspace \mathrm{d}\left\vert {\nabla }^{c}u\right\vert \qquad (7)$

where ${\sigma }_{{\nabla }^{c}u}$ is the sign of ∇c u, i.e., ${\nabla }^{c}u={\sigma }_{{\nabla }^{c}u}\left\vert {\nabla }^{c}u\right\vert $, is proper, convex and lower semi-continuous on BV(Ω). With the Fenchel-dual functional, i.e., ${g}^{{\ast}}\left(y\right)=\underset{x\in {\mathbf{R}}^{d}}{\mathrm{sup}}\enspace x\cdot y-g\left(x\right)$, it can also be expressed in (pre-)dual form as

Obviously, the usual TV-case corresponds to g being the Euclidean norm on Rd . Also, g∞(x) = ∞ for some $\left\vert x\right\vert =1$ does not allow jumps in the direction of x, so one usually assumes that g∞(x) < ∞ for each $\left\vert x\right\vert =1$ in order to obtain a genuine penalty in BV(Ω). In addition, if there are c0 > 0 and R > 0 such that $g\left(x\right){\geqslant}{c}_{0}\left\vert x\right\vert $ for each $\left\vert x\right\vert {\geqslant}R$, then there is a constant C > 0 such that

for all u ∈ BV(Ω), i.e., ${\mathcal{R}}_{g}$ is as coercive as TV. Consequently, the well-posedness and convergence statements in theorems 2.11, 2.14 and 2.17 as well as in proposition 2.18 can be adapted to ${\mathcal{R}}_{g}$ in a straightforward manner with the proofs following the same line of argumentation.

Example 2.22. There are several possibilities for replacing the non-differentiable norm function $\left\vert \cdot \right\vert $ in the TV-functional by a smooth approximation in 0.

Choosing an ɛ > 0, consider

both being continuously differentiable in Rd and approximating $\left\vert \cdot \right\vert $ for ɛ → 0.

The associated penalties ${\mathcal{R}}_{{g}_{\varepsilon }^{1}}$ and ${\mathcal{R}}_{{g}_{\varepsilon }^{2}}$ are often referred to as Huber-TV and smooth TV, respectively.
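Since the explicit formulas for ${g}_{\varepsilon }^{1}$ and ${g}_{\varepsilon }^{2}$ are displayed only in the published version, the sketch below uses the standard Huber function and the smoothed norm $\sqrt{{\left\vert x\right\vert }^{2}+{\varepsilon }^{2}}-\varepsilon $, the latter matching the penalty quoted in the caption of figure 1; both concrete forms are assumptions made here for illustration and approximate $\left\vert \cdot \right\vert $ as ɛ → 0.

```python
import numpy as np

def huber(x, eps):
    """Assumed form of g_eps^1 (standard Huber function):
    quadratic near the origin, linear growth further out, continuously differentiable."""
    r = float(np.linalg.norm(x))
    return r ** 2 / (2 * eps) if r <= eps else r - eps / 2

def smooth_norm(x, eps):
    """Assumed form of g_eps^2, cf. the penalty in the caption of figure 1."""
    return float(np.sqrt(np.sum(np.asarray(x) ** 2) + eps ** 2) - eps)

x = np.array([0.3, -0.1])
for eps in (1.0, 0.1, 0.01):
    print(eps, huber(x, eps), smooth_norm(x, eps), float(np.linalg.norm(x)))
    # both values approach |x| ~ 0.3162 as eps decreases
```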

Example 2.23. Taking g as a non-Euclidean norm on Rd yields functionals of anisotropic total-variation type. The common choice is g = |⋅|1 which is also often referred to as anisotropic TV.

Remark 2.24. It is worth noting that g as above can also be made spatially dependent, which has applications for instance in the context of regularisation for inverse problems involving multiple modalities or multiple spectra. Under some assumptions, functionals ${\mathcal{R}}_{g}$ as in (7) with spatially dependent g are again lower semi-continuous on BV [6] and well-posedness results for TV apply [105].

2.4. Colour and multichannel images

Colour and multichannel images are usually represented by functions mapping into a vector-space. Total-variation functionals and regularisation approaches can easily be extended to such vector-valued functions; definition 2.5 already contains an isotropic variant for functions with values in a finite-dimensional space H, where we used the Hilbert-space norm $\vert x\vert ={\left({\sum }_{i=1}^{d}{x}_{i}\cdot {x}_{i}\right)}^{1/2}$ as pointwise norm on Hd for the test functions $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{H}^{d}\right)$.

However, in contrast to the scalar case, this is not the only choice yielding TV-functionals that are invariant under distance-preserving transformations. The essential property for a norm |⋅| on Hd needed for the latter is

where ${\left(Ox\right)}_{i}={\sum }_{j=1}^{d}{o}_{ij}{x}_{j}$. We call such norms unitarily left invariant. Denoting by |⋅|* the dual norm, the associated total variation for a $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},H\right)$ is given by

and invariant to distance-preserving transformations. If the norm |⋅| is moreover unitarily right invariant, i.e.,

where ${\left(xO\right)}_{i}={\left(O\left(x\right)\right)}_{i}$, then it can be written as a unitarily invariant matrix norm and hence |x| only depends on the singular values of the mapping associated with x in a permutation- and sign-invariant manner. More precisely, there exists a norm |⋅|Σ on Rd with |Pσ|Σ = |σ|Σ for all σ ∈ Rd and P ∈ Rd×d with $\left\vert P\right\vert $ being a permutation matrix, such that |x| = |σ|Σ for all x ∈ Hd , where σ are the singular values of the mapping H → Rd given by $y{\mapsto}{\left({x}_{i}\cdot y\right)}_{i}$. Conversely, any such norm on Rd induces a unitarily invariant matrix norm. Common choices are the norms generated by the p-vector norms, the Schatten-p-norms. For p = 1, p = 2 and p = ∞, they correspond to the nuclear norm, the Frobenius norm and the usual spectral norm, respectively, all of which have been proposed in the literature for use in conjunction with TV, see, e.g., [79, 164]. Among those possibilities, the nuclear norm appears particularly attractive as it provides a relaxation of the rank functional [156]. Hence, solutions with low-rank gradients and more pronounced edges can be expected from nuclear-norm-TV regularisation.
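The pointwise Schatten norms just described are straightforward to evaluate from the singular values of each pixel's Jacobian. The sketch below is an assumed forward-difference discretisation for a multichannel image (unit grid spacing, our own helper names) that contrasts the nuclear, Frobenius and spectral variants of the resulting vectorial TV.

```python
import numpy as np

def vectorial_tv(u, p=1):
    """Discrete colour TV: a Schatten-p norm of each pixel's (2 x channels)
    Jacobian of forward differences; p=1 nuclear, p=2 Frobenius, p=np.inf spectral."""
    dx = np.diff(u, axis=0, append=u[-1:, :, :])       # shape (H, W, C)
    dy = np.diff(u, axis=1, append=u[:, -1:, :])
    J = np.stack([dx, dy], axis=-2)                    # per-pixel Jacobian, shape (H, W, 2, C)
    s = np.linalg.svd(J, compute_uv=False)             # singular values, shape (H, W, 2)
    if p == 1:
        pixel_norm = s.sum(axis=-1)                    # nuclear norm
    elif p == 2:
        pixel_norm = np.sqrt((s ** 2).sum(axis=-1))    # Frobenius norm
    else:
        pixel_norm = s.max(axis=-1)                    # spectral norm
    return float(pixel_norm.sum())

rgb = np.random.default_rng(2).random((32, 32, 3))
print(vectorial_tv(rgb, 1), vectorial_tv(rgb, 2), vectorial_tv(rgb, np.inf))
```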

Also here, the well-posedness and convergence results in theorems 2.11, 2.14 and 2.17 as well as in proposition 2.18 are transferable to the vector-valued case, as can be seen from equivalence of norms.

Moreover, functionals of the type (7) are possible with g : Hd → [0, ∞] proper, convex and lower semi-continuous such that g∞ exists. However, u takes values in H which calls for some adaptations which we briefly describe in the following. First, concerning definition 2.20(a), we are able to generalise in a straightforward way by considering v ∈ Hd , the norm in H and the scalar product in Hd such that the approximate gradient of u at x is ∇ u(x) ∈ Hd . For jump points according to (b), we are no longer able to require u+(x) > u−(x) such that we have to replace this by u+(x) ≠ u−(x) and arrive at a meaningful definition replacing the absolute value by the norm in H. However, u+, u− and ν are then only unique up to a sign. Nevertheless, (u+ − u−) ⊗ ν according to ${\left(\left({u}^{+}-{u}^{-}\right)\otimes \nu \right)}_{i}=\left({u}^{+}-{u}^{-}\right){\nu }_{i}$ is still unique. The analogue of theorem 2.21 and (6) holds with these notions, with the following adaptation:

with the Cantor part being of rank one, i.e., ${\nabla }^{c}u={\sigma }_{{\nabla }^{c}u}\left\vert {\nabla }^{c}u\right\vert $ where ${\sigma }_{{\nabla }^{c}u}$ is rank one $\left\vert {\nabla }^{c}u\right\vert $-almost everywhere [7, theorem 3.94]. The functional ${\mathcal{R}}_{g}$ according to

then realises a regulariser with the same regularisation properties as its counterpart for scalar functions.

3. Higher-order TV regularisation

First-order regularisation for imaging problems might not always lead to results of sufficient quality. Recall that taking the total variation as regularisation functional has the advantage that the solution space BV(Ω) naturally allows for discontinuities along hypersurfaces ('jumps') which correspond, for imaging applications, to object boundaries. Indeed, TV has a good performance in edge preservation which can also be observed numerically.

However, for noisy data, the solutions suffer from non-flat regions appearing flat in conjunction with the introduction of undesired edges. This effect is called the staircasing effect, see figure 1, in particular panel (c). Thinking of TV as a one-norm type penalty for the gradient, this is, on the one hand, due to the 'linear growth' of the Euclidean norm $\left\vert \cdot \right\vert $ at infinity (which implies BV(Ω) as solution space). On the other hand, $\left\vert \cdot \right\vert $ is non-differentiable in 0 which can be seen to be responsible for the flat regions in the solutions.

As we have seen in subsection 2.3, the latter can be remedied by considering convex functions of the measure ∇u instead of TV which are smooth in the origin and have linear growth at ∞, also see example 2.22. Then, ${\mathcal{R}}_{g}$ can be taken as a first-order regulariser under the same conditions as for TV regularisation leading to solutions which are still in BV(Ω) and may, in particular, admit jumps. Additionally, less flat regions tend to appear in solutions for noisy data as we no longer have a singularity at 0. However, this feature comes with two drawbacks: first, compared to TV, noise removal seems not to be as strong in numerical solutions. Second, in addition to the regularisation parameter for the inverse problem, one has to choose the parameter ɛ appropriately. Too small a choice might again cause staircasing to appear, while choosing ɛ too big may lead to edges being lost, see figure 1(d). The question remains whether we can improve on this.

Here, we would like to discuss and study the use of higher-order derivatives for regularisation in imaging. This can be motivated by modelling images as piecewise smooth functions, i.e., assuming that an image is several times differentiable (in some sense) while still allowing for object boundaries where the function may jump. With this model in mind, higher-order variational approaches arise quite naturally and we refer for instance to [17, 73, 104] for spaces and regularisation approaches related to second-order variational approaches.

3.1. Symmetric tensor calculus

For smooth functions, higher-order derivatives can be represented as tensor fields, i.e., the derivative represents a tensor in each point. As the order of partial differentiation may be interchanged, these tensors turn out to be symmetric. Symmetric tensors are therefore a suitable tool for representing these objects independently of indices. There are several ways to introduce and motivate tensors and vector spaces of tensors. For our purposes, the following definition will be sufficient. Note that there and throughout this chapter, l ⩾ 0 will always be a tensor order.

Figure 1. First-order denoising example. (a) Ground truth, (b) noisy image with additive Gaussian noise (PSNR: 13.9 dB), (c) TV-regularised solution (PSNR: 29.3 dB), (d) regularisation with smooth TV-like penalty $\varphi \left(x\right)=\sqrt{{x}^{2}+{\varepsilon }^{2}}-\varepsilon $ (PSNR: 29.7 dB). All parameters were manually tuned via grid search to yield highest PSNR.

Definition 3.1. We define

${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)=\left\{\xi :{\mathbf{R}}^{d}{\times}\cdots {\times}{\mathbf{R}}^{d}\to \mathbf{R}\enspace \vert \enspace \xi \ \text{multilinear}\right\}\quad \text{and}\quad {\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)=\left\{\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\enspace \vert \enspace \xi \ \text{symmetric}\right\}$

as the vector space of l -tensors and symmetric l -tensors, respectively.

Here, $\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ is called symmetric, if ξ(a1, ..., al ) = ξ(aπ(1), ..., aπ(l)) for all a1, ..., al ∈ Rd and π ∈ Sl , where Sl denotes the permutation group of {1, ..., l}.

For $\xi \in {\mathcal{T}}^{k}\left({\mathbf{R}}^{d}\right)$, k ⩾ 0 and $\eta \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ the tensor product is defined as the element $\xi \otimes \eta \in {\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)$ obeying

$\left(\xi \otimes \eta \right)\left({a}_{1},\dots ,{a}_{k+l}\right)=\xi \left({a}_{1},\dots ,{a}_{k}\right)\eta \left({a}_{k+1},\dots ,{a}_{k+l}\right)$

for all a1, ..., ak+l ∈ Rd .

Note that the space of l-tensors is actually the space of (0, l)-covariant tensors, however, we will not need to distinguish between co- and contravariant tensors. We have

while for low orders, the symmetric tensor spaces coincide with well-known spaces Sym0(Rd ) ≡ R, Sym1(Rd ) ≡ Rd and Sym2(Rd ) ≡ Sd×d , the space of symmetric d × d matrices.

In the following, we give a brief overview of the tensor operations that are the most relevant to define regularisation functionals on higher-order derivatives.

Remark 3.2. The space ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ can be associated with a unit basis. Indexed by p ∈ {1, ..., d}l , its elements are given by ${e}_{p}\left({a}_{1},\dots ,{a}_{l}\right)={\prod }_{i=1}^{l}{a}_{i,{p}_{i}}$ while the respective coefficient for a $\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ is given by ${\xi }_{p}=\xi \left({e}_{{p}_{1}},\dots ,{e}_{{p}_{l}}\right)$. Each $\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ thus has the representation

$\xi =\sum _{p\in {\left\{1,\dots ,d\right\}}^{l}}{\xi }_{p}{e}_{p}.$

The identity of vector spaces ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)={\mathbf{R}}^{d{\times}\cdots {\times}d}$ is evident from that.

The space Syml (Rd ) is obviously a (generally proper) subspace of ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$. A (non-symmetric) tensor $\xi \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ can be symmetrised by averaging over all permuted arguments, i.e.,

$\left(\vert \vert \vert \xi \right)\left({a}_{1},\dots ,{a}_{l}\right)=\frac{1}{l!}\sum _{\pi \in {S}_{l}}\xi \left({a}_{\pi \left(1\right)},\dots ,{a}_{\pi \left(l\right)}\right).$

The symmetrisation operator $\vert \vert \vert :{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\to {\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)$ obviously defines a projection. A basis for Syml (Rd ) is given by ${\mathrm{e}}_{p}^{\mathrm{S}\mathrm{y}\mathrm{m}}=\vert \vert \vert {e}_{p}$ for p ranging over all tuples in {1, ..., d}l with non-decreasing entries. The coefficients ξp can still be obtained by ${\xi }_{p}=\xi \left({e}_{{p}_{1}},\dots ,{e}_{{p}_{l}}\right)$.
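As a concrete illustration of these operations (our own sketch; the helper names are ad hoc), an l-tensor can be stored as an l-dimensional array of shape (d, ..., d); the tensor product then corresponds to an outer product and the symmetrisation ||| to an average over axis permutations:

```python
import itertools
import numpy as np

def tensor_product(xi, eta):
    """(xi ⊗ eta)(a_1, ..., a_{k+l}) = xi(a_1, ..., a_k) * eta(a_{k+1}, ..., a_{k+l})."""
    return np.multiply.outer(xi, eta)

def symmetrise(xi):
    """The operator |||: average of xi over all permutations of its arguments."""
    perms = list(itertools.permutations(range(xi.ndim)))
    return sum(np.transpose(xi, p) for p in perms) / len(perms)

rng = np.random.default_rng(1)
d, l = 3, 3
xi = rng.standard_normal((d,) * l)             # a generic (non-symmetric) 3-tensor on R^3
sym = symmetrise(xi)
assert np.allclose(symmetrise(sym), sym)       # ||| is a projection
a, b = rng.standard_normal(d), rng.standard_normal(d)
assert np.allclose(tensor_product(a, b), np.outer(a, b))   # for 1-tensors: outer product
print(sym[0, 1, 2], sym[2, 1, 0])              # coefficients of a symmetric tensor agree
```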

We would like to equip the spaces with a Hilbert space structure.

Definition 3.3. For $\xi ,\eta \in {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$, the scalar product and Frobenius norm are defined as

$\xi \cdot \eta =\sum _{p\in {\left\{1,\dots ,d\right\}}^{l}}{\xi }_{p}{\eta }_{p}\quad \text{and}\quad \vert \xi \vert =\sqrt{\xi \cdot \xi }.$
Example 3.4. For ξ ∈ Syml (Rd ), the norm corresponds to the absolute value for l = 0, the Euclidean norm in Rd for l = 1 and in case l = 2, we can identify ξ ∈ Sym2(Rd ) with

With the Frobenius norm, tensor spaces become Hilbert spaces of finite dimension and the symmetrisation becomes an orthogonal projection, see, e.g., [99].

Proposition 3.5. 

  • (a)  
    With the above scalar-product and norm, the spaces ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$, Syml (Rd ) are finite-dimensional Hilbert spaces with $\mathrm{dim}{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)={d}^{l}$ and $\mathrm{dim}{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)=\left(\genfrac{}{}{0pt}{}{d+l-1}{l}\right)$.
  • (b)  
    The symmetrisation | | | is the orthogonal projection in ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ onto Syml (Rd ).

Tensor-valued mappings ${\Omega}\to {\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ on the domain Ω ⊂ Rd are called tensor fields. The tensor-field spaces $\mathcal{C}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, ${\mathcal{C}}_{\mathrm{c}}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ and ${\mathcal{C}}_{0}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ as well as the Lebesgue spaces ${L}^{p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ are then given in the usual manner. Also, measures can be tensor-valued, giving $\mathcal{M}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, the space of l-tensor-valued Radon measures. Duality according to proposition 2.4 holds, i.e., $\mathcal{M}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)={\mathcal{C}}_{0}{\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}$. Note that for all spaces, the Frobenius norm is used as pointwise norm in the respective definitions of the tensor-field norm. Furthermore, all the above applies analogously to symmetric tensor fields, i.e., mappings between Ω → Syml (Rd ).

Turning to differentiation, the kth Fréchet derivative of a sufficiently smooth l-tensor field, where from now on k ⩾ 1 will always denote an order of differentiation, is naturally a (k + l)-tensor field which we denote by ${\nabla }^{k}\otimes u:{\Omega}\to {\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)$ according to

The fact that gradient tensor fields are not symmetric in general gives rise to considering the k th symmetrised derivative given by ${\mathcal{E}}^{k}u=\vert \vert \vert {\nabla }^{k}\otimes u$. This definition is consistent as ${\mathcal{E}}^{{k}_{2}}{\mathcal{E}}^{{k}_{1}}={\mathcal{E}}^{{k}_{1}+{k}_{2}}$ for k1, k2 ⩾ 0. Divergence operators are then, up to the sign, formal adjoints of these differentiation operators. They are given as follows. Introducing the trace of a tensor $\xi \in {\mathcal{T}}^{l+2}\left({\mathbf{R}}^{d}\right)$ according to

gives an l-tensor. It can be interpreted as the tensor contraction of the first and the last component of the tensor. As for the vector-field case, the divergence is now the trace of the derivative. For k-times differentiable $v:{\Omega}\to {\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)$, the kth divergence is thus given by

Again, this is consistent with repeated application, i.e., ${\mathrm{d}\mathrm{i}\mathrm{v}}^{{k}_{1}+{k}_{2}}={\mathrm{d}\mathrm{i}\mathrm{v}}^{{k}_{2}}{\mathrm{d}\mathrm{i}\mathrm{v}}^{{k}_{1}}$. Note that there might be other choices of the divergence, such as contracting the derivative with any component of the tensor other than the last. This affects, however, only non-symmetric tensor fields. For symmetric tensor fields, the result is independent of the choice of the contraction components and is always a symmetric tensor field.
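The adjointness between the symmetrised derivative and the divergence can also be checked in a simple discrete setting; the following sketch (our own, one of many possible discretisations, using forward/backward differences on a periodic 2D grid) verifies the identity ⟨Ev, w⟩ = −⟨v, div w⟩ for symmetric 2 × 2 tensor fields numerically:

```python
import numpy as np

def grad(u):
    """Forward differences with periodic boundary; returns shape (2,) + u.shape."""
    return np.stack([np.roll(u, -1, axis=0) - u,
                     np.roll(u, -1, axis=1) - u])

def sym_grad(v):
    """Symmetrised gradient E v of a vector field v (shape (2, n, m)) as a 2x2 field."""
    g = np.stack([grad(v[0]), grad(v[1])])        # g[i, j] = d_j v_i
    return 0.5 * (g + g.transpose(1, 0, 2, 3))

def div_sym(w):
    """Divergence of a symmetric 2x2 tensor field w (shape (2, 2, n, m)),
    using backward differences so that -div is the adjoint of sym_grad."""
    def dback(f, axis):
        return f - np.roll(f, 1, axis=axis)
    return np.stack([dback(w[0, 0], 0) + dback(w[0, 1], 1),
                     dback(w[1, 0], 0) + dback(w[1, 1], 1)])

rng = np.random.default_rng(2)
v = rng.standard_normal((2, 16, 16))              # a random vector field
w = rng.standard_normal((2, 2, 16, 16))
w = 0.5 * (w + w.transpose(1, 0, 2, 3))           # make the tensor field symmetric
lhs = np.sum(sym_grad(v) * w)                     # discrete  ∫ E v · w dx
rhs = -np.sum(v * div_sym(w))                     # discrete -∫ v · div w dx
assert np.isclose(lhs, rhs)
```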

Example 3.6. The symmetrised gradient of scalar functions Ω → Sym0(Rd ) coincides with the usual gradient while the divergence for mappings Ω → Sym1(Rd ) coincides with the usual divergence.

The cases ${\mathcal{E}}^{2}{u}^{0}$ and $\mathcal{E}{u}^{1}$ for u0 : Ω → Sym0(Rd ) and u1 : Ω → Sym1(Rd ) can be handled with the identification of Sym2(Rd ) and symmetric matrices Sd×d :

Analogously, for the divergence of a v : Ω → Sym2(Rd ), we have that

In particular, for k ⩾ 1, there are the usual spaces of continuously differentiable tensor fields which are denoted by ${\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ and equipped with the usual norm ${{\Vert}u{\Vert}}_{k,\infty }={\mathrm{max}}_{0{\leqslant}m{\leqslant}k}\enspace {{\Vert}{\nabla }^{m}\otimes u{\Vert}}_{\infty }$. Likewise, we consider k-times continuously differentiable tensor fields with compact support ${\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ where k = ∞ leads to the space of test tensor fields. Also, for finite k, the space ${\mathcal{C}}_{0}^{k}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is given as the closure of ${\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ in ${\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$. Of course, the analogous constructions apply to symmetric tensor fields, leading to the spaces ${\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, ${\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ and ${\mathcal{C}}_{0}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ as well as the space of test symmetric tensor fields ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$.

As Ω is assumed to be a connected set, we are able to describe the kernels of ∇k and ${\mathcal{E}}^{k}$ for (symmetric) tensor fields in terms of finite-dimensional spaces of polynomials.

Proposition 3.7. Let $u\in {\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ such that ∇k u = 0. Then, u is a ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$-valued polynomial of maximal order k − 1, i.e., there are ${\xi }^{m}\in {\mathcal{T}}^{l+m}\left({\mathbf{R}}^{d}\right)$, m = 0, ..., k − 1 such that

Equation (8)

If ${\mathcal{E}}^{k}u=0$ for $u\in {\mathcal{C}}^{k+l}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, then u is a Syml (Rd )-valued polynomial of maximal order k + l − 1, i.e., the above representation holds for ξm ∈ Syml+m (Rd ), m = 0, ..., k + l − 1 with the sum ranging from 0 to k + l − 1.

Proof. At first we note that any ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$- and Syml (Rd )-valued polynomial of maximal order k − 1 and k + l − 1, respectively, admits a representation as claimed. In case ∇k u = 0 for $u\in {\mathcal{C}}^{k}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ it follows directly from a basis representation of u(x) that u is a ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$-valued polynomial of maximal order k − 1.

Now in case ${\mathcal{E}}^{k}u=0$ for $u\in {\mathcal{C}}^{k+l}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, we get that ∇k+l u = 0, see lemma A.3. This implies that u is a Syml (Rd )-valued polynomial of maximal degree k + l − 1 as claimed. □

Next, we would like to introduce and discuss weak forms of differentiation for (symmetric) tensor fields. Starting point for this is a version of the well-known Gauss–Green theorem for smooth (symmetric) tensor fields [28].

Proposition 3.8. Let Ω ⊂ Rd be a bounded Lipschitz domain, $u\in \mathcal{C}\left(\overline{{\Omega}},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, $v\in {\mathcal{C}}^{1}\left(\overline{{\Omega}},{\mathcal{T}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$. Then, a Gauss–Green theorem holds in the following form:

with ν being the outward unit normal on ∂Ω.

If $u\in \mathcal{C}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, $v\in {\mathcal{C}}^{1}\left(\overline{{\Omega}},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$, the identity reads as

If one of the tensor fields u or v has compact support in Ω, the boundary term does not appear and the identities are valid for arbitrary domains Ω.

As usual, being able to express integrals of the form ${\int }_{{\Omega}}$(∇ ⊗ u) ⋅ v  dx and ${\int }_{{\Omega}}\mathcal{E}u\cdot v\enspace \mathrm{d}x$ for test tensor fields without the derivative of u allows us to introduce a weak notion of ∇ ⊗ u and $\mathcal{E}u$, respectively, as well as associated Sobolev spaces.

Definition 3.9. For $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, $w\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathcal{T}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$ is the weak derivative of u, denoted w = ∇ ⊗ u, if for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$, it holds that

${\int }_{{\Omega}}w\cdot \varphi \enspace \mathrm{d}x=-{\int }_{{\Omega}}u\cdot \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \enspace \mathrm{d}x.$

Likewise, for $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, $w\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$ is the weak symmetrised derivative of u, denoted $w=\mathcal{E}u$, if the above identity holds for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$.

Like the scalar versions, ∇ and $\mathcal{E}$ are well-defined and constitute closed operators between the respective Lebesgue spaces with dense domain.

Definition 3.10. The Sobolev space of tensor fields of order l of differentiation order k and exponent p ∈ [1, ∞] is defined as

while ${H}_{0}^{k,p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is the closure of the subspace ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ with respect to the ||⋅||k,p -norm.

Replacing ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ by Syml (Rd ) and letting

defines the Sobolev space of symmetric tensor fields, denoted by Hk,p (Ω, Syml (Rd )). The space ${H}_{0}^{k,p}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is again the closure ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ with respect to the corresponding norm.

By closedness of the differential operators, the Sobolev spaces are Banach spaces. Also, since weak derivatives are symmetric, we have that ${H}^{k,p}\left({\Omega},{\mathcal{T}}^{0}\left({\mathbf{R}}^{d}\right)\right)={H}^{k,p}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{0}\left({\mathbf{R}}^{d}\right)\right)$ in the sense of Banach space isometry, as well as coincidence with the usual Sobolev spaces. For l ⩾ 1, the space ${H}^{k,p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ corresponds to the space where all components of u are in Hk,p (Ω). However, generally, for l ⩾ 1, the norm of Hk,p (Ω, Syml (Rd )) is weaker than the norm in ${H}^{k,p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, such that only ${H}^{k,p}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)\hookrightarrow {H}^{k,p}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ in the sense of continuous embedding and the latter is a strictly larger space.

Nevertheless, equality holds if some kind of Korn's inequality can be established which is, for instance, the case for the spaces ${H}_{0}^{1,p}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{1}\left({\mathbf{R}}^{d}\right)\right)$ for 1 < p < ∞ [120, section 5.6] as well as the spaces ${H}_{0}^{1,2}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ for l ⩾ 1 (which follows from [28, proposition 3.6] via smooth approximation).

Finally, let us briefly discuss (symmetric) tensor-valued distributions and the distributional forms of ∇k and ${\mathcal{E}}^{k}$.

Definition 3.11. A ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ -valued distribution on Ω is a linear mapping $u:{\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)\to \mathbf{R}$ that satisfies the following continuity estimate: for each K ⊂⊂ Ω, there is an m ∈ N and a C > 0 such that

$\vert u\left(\varphi \right)\vert {\leqslant}C{{\Vert}\varphi {\Vert}}_{m,\infty }\quad \text{for all}\enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)\enspace \text{with}\enspace \mathrm{supp}\left(\varphi \right)\subset K.$

The distribution u is regular if there is a $\bar{u}\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ such that

$u\left(\varphi \right)={\int }_{{\Omega}}\bar{u}\cdot \varphi \enspace \mathrm{d}x\quad \text{for all}\enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)\right).$
A Syml (Rd )-valued distribution on Ω and its regularity is analogously defined by replacing ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$ by Syml (Rd ) in the above definition.

Then, the distributional (symmetrised) derivatives are given by (∇k u)(φ) = (−1)k u(divk φ), $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$ and $\left({\mathcal{E}}^{k}u\right)\left(\varphi \right)={\left(-1\right)}^{k}u\left({\mathrm{d}\mathrm{i}\mathrm{v}}^{k}\varphi \right)$, $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$ which makes them a ${\mathcal{T}}^{k+l}\left({\mathbf{R}}^{d}\right)$- and Symk+l (Rd )-valued distribution, respectively. We then have the following generalisation of proposition 3.7 which will be useful for analysing functionals that depend on (symmetrised) distributional derivatives.

Proposition 3.12. If ∇k u = 0 for a ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$-valued distribution, then u is regular and a ${\mathcal{T}}^{l}\left({\mathbf{R}}^{d}\right)$-valued polynomial of maximal degree k − 1.

If ${\mathcal{E}}^{k}u=0$ for a Syml (Rd )-valued distribution, then u is regular and a Syml (Rd )-valued polynomial of maximal degree k + l − 1.

Proof. This can be deduced from proposition 3.7 via mollification arguments similar as in [28, proposition 3.3]. □

3.2. Functions of higher-order bounded variation

In the following, we discuss functions whose derivative is a Radon measure for a fixed order of differentiation. As higher-order derivatives of scalar functions are always symmetric, it suffices to consider only the symmetrised higher-order derivative ${\mathcal{E}}^{k}$ in this case as well as symmetric tensor fields. However, as we are also interested in intermediate differentiation orders, we moreover discuss spaces of symmetric tensors for which the symmetrised derivative of some order is a Radon measure.

In the following, recall that k ⩾ 1 denotes a differentiation order and l ⩾ 0 denotes a tensor order.

Definition 3.13. Let Ω ⊂ Rd be a domain.

  • (a)  
    In the case l = 0, for $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega}\right)$, the total variation of order k is defined as
    ${\mathrm{T}\mathrm{V}}^{k}\left(u\right)=\mathrm{sup}\enspace \left\{{\int }_{{\Omega}}u\enspace {\mathrm{d}\mathrm{i}\mathrm{v}}^{k}\varphi \enspace \mathrm{d}x\enspace \vert \enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k}\left({\mathbf{R}}^{d}\right)\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}.$
    For general l ⩾ 0 and $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$, the total deformation of order k is
    ${\mathrm{T}\mathrm{D}}^{k}\left(u\right)=\mathrm{sup}\enspace \left\{{\int }_{{\Omega}}u\cdot {\mathrm{d}\mathrm{i}\mathrm{v}}^{k}\varphi \enspace \mathrm{d}x\enspace \vert \enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}.$
  • (b)  
    The normed space according to
    ${\mathrm{B}\mathrm{D}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)=\left\{u\in {L}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)\enspace \vert \enspace {\mathrm{T}\mathrm{D}}^{k}\left(u\right){< }\infty \right\},\qquad {{\Vert}u{\Vert}}_{{\mathrm{B}\mathrm{D}}^{k}}={{\Vert}u{\Vert}}_{1}+{\mathrm{T}\mathrm{D}}^{k}\left(u\right),$
    is called the space of symmetric tensor fields of bounded deformation of order k. The scalar case, i.e., l = 0, is referred to as the space of functions of bounded variation of order k. The latter spaces are denoted by BVk (Ω).

We note that the Hilbert-space norm on the tensor space for the definition of TVk leads to a corresponding pointwise norm on the derivatives. While this choice is rather natural, and does not require to distinguish primal and dual norms, also other choices are possible for which we refer to [128] in the second-order case.
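For a discretised image, one common (but by no means unique) realisation of TV2 with the Frobenius pointwise norm sums the Frobenius norms of finite-difference Hessians; the following sketch (our own ad-hoc discretisation) also illustrates that affine images yield the value zero, in line with the kernel description of proposition 3.21 below:

```python
import numpy as np

def tv2(u):
    """Discrete second-order total variation: sum of Frobenius norms of the
    finite-difference Hessian (interior points only)."""
    uxx = u[2:, 1:-1] - 2 * u[1:-1, 1:-1] + u[:-2, 1:-1]
    uyy = u[1:-1, 2:] - 2 * u[1:-1, 1:-1] + u[1:-1, :-2]
    uxy = 0.25 * (u[2:, 2:] - u[2:, :-2] - u[:-2, 2:] + u[:-2, :-2])
    return np.sum(np.sqrt(uxx**2 + uyy**2 + 2 * uxy**2))

n = 64
x, y = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
affine = 0.3 * x - 0.1 * y + 2.0           # a polynomial of degree <= 1
bumpy = np.sin(0.2 * x) * np.cos(0.3 * y)
print(tv2(affine))                         # 0 up to round-off: affine images are in the kernel
print(tv2(bumpy))                          # strictly positive
```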

Let us analyse some of the basic properties of these spaces.

Proposition 3.14. Let Ω ⊂ Rd be a domain, p ∈ [1, ∞]. Then:

  • (a)  
    TDk is proper, convex and a lower semi-continuous seminorm on Lp (Ω, Syml (Rd )).
  • (b)  
    TDk (u) = 0 if and only if ${\mathcal{E}}^{k}u=0$. In particular, TDk (u) = 0 implies that u is a Syml (Rd )-valued polynomial of maximal degree k + l − 1.

Proof. With p* being the dual exponent to p, each test tensor field obeys ${\mathrm{d}\mathrm{i}\mathrm{v}}^{k}\varphi \in {L}^{{p}^{{\ast}}}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ for $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$. The functional TDk is thus a pointwise supremum over a set of continuous linear functionals and, consequently, convex and lower semi-continuous. By definition, it is obviously proper and positively homogeneous since if divk φ is a test vector field, then also −divk φ is.

By definition of TDk we see that TDk (u) = 0 if and only if ${\int }_{{\Omega}}$ u ⋅ divk φ dx = 0 for each $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$. But this is equivalent to ${\mathcal{E}}^{k}u=0$ in the distributional sense such that in particular, the polynomial representation follows from proposition 3.12. □

In order to show more properties, for instance, that BDk (Ω, Syml (Rd )) is a Banach space, let us adopt a more abstract viewpoint. We say that a function $\left\vert \cdot \right\vert :X\to \left[0,\infty \right]$ for X a Banach space is a lower semi-continuous seminorm on X if $\left\vert \cdot \right\vert $ is positive homogeneous, satisfies the triangle inequality and is lower semi-continuous. The kernel of $\left\vert \cdot \right\vert $, denoted $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$, is the set $\left\{x\in X\enspace \vert \enspace \left\vert x\right\vert =0\right\}$ which is a closed linear subspace of X.

Lemma 3.15. Let $\left\vert \cdot \right\vert $ be a lower semi-continuous seminorm on the Banach space X with norm ||⋅||X . Then,

$Y=\left\{x\in X\enspace \vert \enspace \left\vert x\right\vert {< }\infty \right\},\qquad {{\Vert}x{\Vert}}_{Y}={{\Vert}x{\Vert}}_{X}+\left\vert x\right\vert ,$

is a Banach space. The seminorm $\left\vert \cdot \right\vert $ is continuous in Y.

Proof. It is immediate that Y is a normed space. Let {xn } be a Cauchy sequence in Y which is obviously a Cauchy sequence in X. Hence, a limit xX exists for which the lower semi-continuity yields $\left\vert x\right\vert {\leqslant}{\text{lim inf}}_{n\to \infty }\left\vert {x}^{n}\right\vert {< }\infty $, the latter since $\left\{\left\vert {x}^{n}\right\vert \right\}$ is a real Cauchy sequence. In particular, xY.

To obtain convergence with respect to $\left\vert \cdot \right\vert $, choose, for ɛ > 0, an n such that for all m ⩾ n, $\left\vert {x}^{n}-{x}^{m}\right\vert {\leqslant}\varepsilon $. Letting m → ∞ gives, as xn − xm → xn − x in X,

$\left\vert {x}^{n}-x\right\vert {\leqslant}\underset{m\to \infty }{\text{lim inf}}\left\vert {x}^{n}-{x}^{m}\right\vert {\leqslant}\varepsilon .$

This implies xn → x in Y which is what we intended to show.

Finally, the continuity of $\left\vert \cdot \right\vert $ follows from the standard estimate $\left\vert \left\vert {x}^{1}\right\vert -\left\vert {x}^{2}\right\vert \right\vert {\leqslant}\left\vert {x}^{1}-{x}^{2}\right\vert {\leqslant}{{\Vert}{x}^{1}-{x}^{2}{\Vert}}_{Y}$ for x1, x2Y. □

It is then obvious from proposition 3.14 and lemma 3.15 that BDk (Ω, Syml (Rd )) is a Banach space. In order to examine the structure of these spaces, it is crucial to understand the case k = 1, i.e., BD(Ω, Syml (Rd )) = BD1(Ω, Syml (Rd )), where the symmetrised derivative is only a measure. For l ⩾ 1, these spaces are strictly larger than BV(Ω, Syml (Rd )) as a consequence of the failure of Korn's inequality. Important properties of these spaces are summarised as follows.

Theorem 3.16. ([32, theorem 2.6]). If u is a Syml (Rd )-valued distribution on a bounded Lipschitz domain Ω with $\mathcal{E}u\in \mathcal{M}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+1}\left({\mathbf{R}}^{d}\right)\right)$, then u ∈ BD(Ω, Syml (Rd )).

Theorem 3.17. ([28, theorems 4.16 and 4.17]). For Ω a bounded Lipschitz domain and 1 ⩽ p ⩽ d/(d − 1), the space BD(Ω, Syml (Rd )) is continuously embedded in Lp (Ω, Syml (Rd )). Moreover, for p < d/(d − 1), the embedding is compact.

Theorem 3.18. (Sobolev–Korn inequality [28, corollary 4.20]). For Ω a bounded Lipschitz domain and ${R}_{l}:{L}^{d/\left(d-1\right)}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)\to \mathrm{ker}\left(\mathcal{E}\right)$ a linear and continuous projection onto the kernel of $\mathcal{E}$, there exists a constant C > 0 such that for each u ∈ BD(Ω, Syml (Rd )) it follows that

${{\Vert}u-{R}_{l}u{\Vert}}_{d/\left(d-1\right)}{\leqslant}C{{\Vert}\mathcal{E}u{\Vert}}_{\mathcal{M}}.\qquad \left(9\right)$

Note that the projection Rl as stated always exists as $\mathrm{ker}\left(\mathcal{E}\right)$ is finite-dimensional (see proposition 3.12).

Now, for general k and u ∈ BDk (Ω, Syml (Rd )) fixed, $w={\mathcal{E}}^{k-1}u$ is a Syml+k−1(Rd )-valued distribution with the property

for $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$. In other words, $\mathcal{E}w={\mathcal{E}}^{k}u\in \mathcal{M}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$, thus theorem 3.16 implies that ${\mathcal{E}}^{k-1}u=w\in \mathrm{B}\mathrm{D}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l-1}\left({\mathbf{R}}^{d}\right)\right)$ and, in particular, we have u ∈ BDk−1(Ω, Syml (Rd )). Hence, the spaces are nested:

Let us look at the norms: by the Sobolev–Korn inequality (9), for some linear projection ${R}_{k+l-1}:\mathrm{B}\mathrm{D}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l-1}\left({\mathbf{R}}^{d}\right)\right)\to \mathrm{ker}\left(\mathcal{E}\right)$, we see

which implies

Now, $u{\mapsto}{R}_{k+l-1}{\mathcal{E}}^{k-1}u$ is well-defined on BDk (Ω, Syml (Rd )), linear, has finite-dimensional image and is hence continuous. We may therefore estimate

Proceeding inductively, we arrive at the estimate

Equation (10)

for some C > 0 independent of u. Therefore, we obtain the following theorem.

Theorem 3.19. If Ω ⊂ Rd is a bounded Lipschitz domain, then the norm equivalence

Equation (11)

holds on BDk (Ω, Syml (Rd )). The embeddings

are continuous.

Proof. The nontrivial estimate to establish norm equivalence has just been shown in (10). The continuity of the embedding follows from the fact that the norm on the right-hand side in (11) is increasing with respect to k. □

In the scalar case, we can furthermore establish Sobolev embeddings.

Theorem 3.20. Let Ω be a bounded Lipschitz domain and 0 ⩽ m < k.

For k − m ⩽ d: the space BVk (Ω) is continuously embedded in Hm,p (Ω) for $1{\leqslant}p{\leqslant}\frac{d}{d-\left(k-m\right)}$, where we set $\frac{d}{d-\left(k-m\right)}=\infty $ for k − m = d.

If $p{< }\frac{d}{d-\left(k-m\right)}$, then the embedding is compact.

For k − m > d: the space BVk (Ω) is compactly embedded in ${\mathcal{C}}^{m,\alpha }\left(\overline{{\Omega}}\right)$ for each $\alpha \in \left.\right]0,1 \left[\right. $.

Proof. In the scalar case, ${{\Vert}u{\Vert}}_{1}+{\sum }_{\left\vert \beta \right\vert {\leqslant}k-1}{{\Vert}\nabla {\partial }^{\beta }u{\Vert}}_{\mathcal{M}}$ for β ∈ Nd a multiindex and u ∈ BVk (Ω) constitutes an equivalent norm on BVk (Ω), as a consequence of theorem 3.19. By the Poincaré inequality in BV(Ω),

for each $\left\vert \beta \right\vert {\leqslant}k-1$. This establishes the continuous embedding BVk (Ω) ↪ Wk−1,d/(d−1)(Ω). Application of the well-known embedding theorems for Sobolev spaces (see [1, theorems 5.4 and 6.2]) then gives the results for the cases k − m < d and k − m > d as well as for the case k − m = d and p < ∞.

For the case k − m = d and p = ∞ we note that again by Sobolev embeddings [1, theorem 5.4] we get for a constant C > 0 and all u ∈ Hk,1(Ω) that

Approximating u ∈ BVk (Ω) with a sequence {un } in ${\mathcal{C}}^{\infty }\left({\Omega}\right)$ ∩ BVk (Ω) strictly converging to u in BVk (Ω) as in lemma A.4, the result follows from applying this estimate to each un and using lower semi-continuity of the L∞-norm with respect to convergence in L1. □

We would like to employ TDk as a regulariser and first characterise its kernel. For that purpose, we note that TDk (u) = 0 for some u ∈ BDk (Ω, Syml (Rd )) implies that ${\mathcal{E}}^{k}u=0$ in the distributional sense, hence proposition 3.12 implies that u is a Syml (Rd )-valued polynomial of maximal degree k + l − 1. This yields the following result.

Proposition 3.21. The space ker(TDk ) is a subspace of polynomials of degree less than k + l. If l = 0, then ker(TVk ) = Pk−1 = {u : Ω → R|u polynomial of degree ⩽ k − 1}.

Next, we would like to discuss coercivity of the higher-order total variation functionals.

Proposition 3.22. Let k ⩾ 1, l ⩾ 0 and Ω be a bounded Lipschitz domain. Then, TDk is coercive in the following sense: for each linear and continuous projection R : Ld/(d−1)(Ω, Syml (Rd )) → ker(TDk ), there is a C > 0 such that

${{\Vert}u-Ru{\Vert}}_{d/\left(d-1\right)}{\leqslant}C\enspace {\mathrm{T}\mathrm{D}}^{k}\left(u\right)\quad \text{for all}\enspace u\in {\mathrm{B}\mathrm{D}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right).$
Proof. At first note that by the embeddings BDk (Ω, Syml (Rd )) ↪ BD(Ω, Syml (Rd )) ↪ Ld/(d−1)(Ω, Syml (Rd )) the left-hand side of the claimed inequality is well defined and finite.

We use a contradiction argument in conjunction with compactness. Suppose for R as stated above there is a sequence {un } such that ${{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{d/\left(d-1\right)}=1$ and TDk (un ) → 0 as n → ∞. This implies $\left\{{{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{1}\right\}$ being bounded, TDk (un − Run ) → 0 and by theorems 3.19 and 3.17 {un − Run } has to be precompact in L1(Ω, Syml (Rd )), i.e., without loss of generality, we may assume that un − Run → u in L1(Ω, Syml (Rd )). By lower semi-continuity,

hence u ∈ ker(TDk ) = rg(R). On the other hand, R(un − Run ) = 0 for each n as R is a projection, thus, Ru = 0 and, consequently, u = 0. In total, we have limn→∞(un − Run ) = 0 in BDk (Ω), and again by continuous embedding, also in Ld/(d−1)(Ω) which is a contradiction to ${{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{d/\left(d-1\right)}=1$ for all n. Consequently, coercivity has to hold. □

Corollary 3.23. In the scalar case, for p ∈ [1, ∞] with $p{\leqslant}\frac{d}{d-k}$ if k < d, we also have

${{\Vert}u-Ru{\Vert}}_{p}{\leqslant}C\enspace {\mathrm{T}\mathrm{V}}^{k}\left(u\right)\quad \text{for all}\enspace u\in {\mathrm{B}\mathrm{V}}^{k}\left({\Omega}\right).$
Proof. This follows with the embedding theorem 3.20:

Remark 3.24. The above coercivity estimate also implies that the Fenchel conjugate of TVk is the indicator functional of a closed convex set in ${L}^{{p}^{{\ast}}}\left({\Omega}\right)\cap \mathrm{ker}{\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{\perp }$ with non-empty interior. Indeed, for $\xi \in {L}^{{p}^{{\ast}}}\left({\Omega}\right)\cap \mathrm{ker}{\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{\perp }$ such that ${{\Vert}\xi {\Vert}}_{{p}^{{\ast}}}{\leqslant}{C}^{-1}$ it follows for any u ∈ Lp (Ω) that

which means that ${\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{{\ast}}\left(\xi \right)=0$. On the other hand, if $\xi \in {L}^{{p}^{{\ast}}}\left({\Omega}\right){\backslash}\mathrm{ker}{\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{\perp }$, then ⟨ξ, u⟩ > 0 for some u ∈ ker(TVk ). Thus, ⟨ξ, u⟩ > TVk (u) so ${\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{{\ast}}\left(\xi \right)=\infty $.

It is interesting to note that a coercivity estimate similar to the one of corollary 3.23 also holds between two higher-order TV functionals of different order.

Lemma 3.25. Let Ω be a bounded Lipschitz domain, 1 ⩽ k1 < k2 be two orders of differentiation, $p\in \left[\right.1,\infty \left[\right.$ with pd/(dk2) if k2 < d and $R:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ be a continuous, linear projection. Then there exists a constant C > 0 such that

${\mathrm{T}\mathrm{V}}^{{k}_{1}}\left(u-Ru\right){\leqslant}C\enspace {\mathrm{T}\mathrm{V}}^{{k}_{2}}\left(u\right)\qquad \left(12\right)$

holds for each $u\in {\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$.

Proof. Assume the opposite, i.e., the existence of {un } such that ${\mathrm{T}\mathrm{V}}^{{k}_{1}}\left({u}^{n}-R{u}^{n}\right)=1$ and ${\mathrm{T}\mathrm{V}}^{{k}_{2}}\left({u}^{n}\right)\to 0$ as n → ∞. Then, by compact embedding ${\mathrm{B}\mathrm{D}}^{{k}_{2}-{k}_{1}}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{{k}_{1}}\left({\mathbf{R}}^{d}\right)\right)\to {L}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{{k}_{1}}\left({\mathbf{R}}^{d}\right)\right)$, we have ${\nabla }^{{k}_{1}}\left({u}^{n}-R{u}^{n}\right)\to v$ as n → ∞ for some $v\in {L}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{{k}_{1}}\left({\mathbf{R}}^{d}\right)\right)$ for a subsequence (not relabelled). On the other hand, the Poincaré estimate gives ${{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{p}{\leqslant}C{\mathrm{T}\mathrm{V}}^{{k}_{2}}\left({u}^{n}\right)$, so un − Run → 0 as n → ∞ in L1(Ω). By closedness of ${\nabla }^{{k}_{1}}$ this yields v = 0. By convergence in ${L}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{{k}_{1}}\left({\mathbf{R}}^{d}\right)\right)$, this gives the contradiction ${\mathrm{T}\mathrm{V}}^{{k}_{1}}\left({u}^{n}-R{u}^{n}\right)\to 0$ as n → ∞. □

3.3. Tikhonov regularisation

The coercivity which has just been established can be regarded as the most important step towards existence for variational problems with TVk -regularisation. Here, we first prove an existence result for linear inverse problems in a general abstract version.

Theorem 3.26. Let X be a reflexive Banach space, Y be a Banach space, K : X → Y be linear and continuous, Sf : Y → [0, ∞] a proper, convex, lower semi-continuous and coercive discrepancy functional associated with some data f, $\left\vert \cdot \right\vert :X\to \left[0,\infty \right]$ a lower semi-continuous seminorm and α > 0. Assume that there exists a linear and continuous projection $R:X\to \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ and a C > 0 such that

${{\Vert}u-Ru{\Vert}}_{X}{\leqslant}C\left\vert u\right\vert \quad \text{for all}\enspace u\in X,$
and either

  • (a)  
    $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ is finite-dimensional or, more generally,
  • (b)  
    $\mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ admits a complement Z in $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ and ||u||X C||Ku||Y for some C > 0 and all uZ.

Then, the Tikhonov minimisation problem

Equation (13)

is well-posed, i.e., there exists a solution and the solution mapping is stable in the sense that, if ${S}_{{f}^{n}}$ converges to Sf as in (4) and $\left\{{S}_{{f}^{n}}\right\}$ is equi-coercive, then for each sequence of minimisers {un } of (13) with discrepancy ${S}_{{f}^{n}}$,

  • Either ${S}_{{f}^{n}}\left(K{u}^{n}\right)+\alpha \left\vert {u}^{n}\right\vert \to \infty $ as n → ∞ and (13) with discrepancy Sf does not admit a finite solution,
  • Or ${S}_{{f}^{n}}\left(K{u}^{n}\right)+\alpha \left\vert {u}^{n}\right\vert \to {\mathrm{min}}_{u\in X}\enspace {S}_{f}\left(Ku\right)+\alpha \left\vert u\right\vert $ as n → ∞ and there is, possibly up to shifts by functions in $\mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$, a weak accumulation point u ∈ X that minimises (13) with discrepancy Sf .

Further, in case (13) with discrepancy Sf admits a finite solution, for each subsequence $\left\{{u}^{{n}_{k}}\right\}$ weakly converging to some u ∈ X, it holds that $\left\vert {u}^{{n}_{k}}\right\vert \to \left\vert u\right\vert $ as k → ∞. Also, if Sf is strictly convex and K is injective, finite solutions u of (13) are unique and un ⇀ u in X.

The same result is true if, for instance, instead of being reflexive, X is the dual of a separable space, and we replace weak convergence by weak* convergence in the (lower semi-) continuity assumptions on K, $\left\vert \cdot \right\vert $, Sf and in (4).

Proof. At first note that $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ being finite-dimensional implies condition (b) above, hence we can assume that (b) holds. We start with existence. Assume that the objective functional in (13) is proper as otherwise, there is nothing to show. For a minimising sequence {un }, by the coercivity assumption, {un − Run } is bounded in X. Now, (b) implies the existence of a linear and continuous projection ${P}_{Z}:\mathrm{ker}\left(\left\vert \cdot \right\vert \right)\to Z$ such that id − PZ projects $\mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ onto $\mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$. With vn = PZ Run , we see that also {un − Run + vn } is a minimising sequence and it suffices to show boundedness of {vn } to obtain a convergent subsequence. But the latter holds true since by assumption ${{\Vert}{v}^{n}{\Vert}}_{X}{\leqslant}C{{\Vert}K{v}^{n}{\Vert}}_{Y}$, such that ${{\Vert}K{v}^{n}{\Vert}}_{Y}{\leqslant}{{\Vert}K\left({u}^{n}-R{u}^{n}+{v}^{n}\right){\Vert}}_{Y}+{\Vert}K{\Vert}{{\Vert}{u}^{n}-R{u}^{n}{\Vert}}_{X}$, with the right-hand side being bounded as a consequence of the coercivity of Sf and the boundedness of {un − Run }. Hence, as X is reflexive, a subsequence of {un − Run + vn } converges weakly to a limit u ∈ X. By continuity of K and lower semi-continuity of both Sf and $\left\vert \cdot \right\vert $ it follows that u is a solution to (13). In case Sf is strictly convex and K is injective, Sf ∘ K is already strictly convex, so finite minimisers of (13) have to be unique.

Now let {un } be a sequence of minimisers of (13) with discrepancy ${S}_{{f}^{n}}$. We denote by $F={S}_{f}\enspace {\circ}\enspace K+\alpha \left\vert \cdot \right\vert $ as well as ${F}_{n}={S}_{{f}^{n}}\enspace {\circ}\enspace K+\alpha \left\vert \cdot \right\vert $ and first suppose that {Fn (un )} is bounded. We can then add ${v}^{n}-R{u}^{n}\in \mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ to un , with vn = PZ Run , and from equi-coercivity of $\left\{{S}_{{f}^{n}}\right\}$ obtain boundedness of {un − Run + vn } as before.

This shows that shifting the minimisers within $\mathrm{ker}\left(K\right)\cap \mathrm{ker}\left(\left\vert \cdot \right\vert \right)$ always leads to a bounded sequence, i.e., we may assume without loss of generality that {un } is bounded such that a weak accumulation point exists. Suppose that ${u}^{{n}_{k}}\rightharpoonup u$ as k → ∞. Then, estimating as in the proof of theorem 2.14, we can obtain that u is a minimiser for F and that ${\mathrm{lim}}_{k\to \infty }{F}_{{n}_{k}}\left({u}^{{n}_{k}}\right)=F\left(u\right)$ as well as ${\mathrm{lim}}_{k\to \infty }\left\vert {u}^{{n}_{k}}\right\vert =\left\vert u\right\vert $. Also, if u is the unique minimiser for (13) with discrepancy Sf , un ⇀ u as n → ∞ follows since any subsequence has to contain another subsequence that converges weakly to u.

The result for the two remaining cases lim infn→∞ Fn (un ) < ∞ and Fn (un ) → ∞, respectively, finally follows analogously to theorem 2.14. □

Given that ker(TVk ) is finite-dimensional, the above result immediately implies well-posedness for $\left\vert \cdot \right\vert ={\mathrm{T}\mathrm{V}}^{k}$ with X = Lp (Ω), as stated in the following corollary. The crucial ingredient here is the estimate ||u − Ru||p ⩽ C TVk (u), which restricts the exponent of the underlying Lp -space to p ⩽ d/(d − k) if k < d. This shows that the higher the order of differentiation used in the regularisation, the weaker the requirements on the underlying spaces and, consequently, on the continuity of the operator K.

Corollary 3.27. With X = Lp (Ω), Ω being a bounded Lipschitz domain, and Sf and K as in theorem 3.26,

$\underset{u\in {L}^{p}\left({\Omega}\right)}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+\alpha \enspace {\mathrm{T}\mathrm{V}}^{k}\left(u\right)\qquad \left(14\right)$

is well-posed in the sense of theorem 3.26 whenever $p\in \left.\right]1,\infty \left[\right.$ with p ⩽ d/(d − k) if k < d.
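As a small concrete instance of (14), the following sketch sets up a one-dimensional deconvolution problem with a discrete TV2 penalty, using CVXPY as a generic convex solver; the forward operator, noise level and regularisation parameter are ad-hoc choices for illustration only and do not correspond to the numerical methods developed in section 6.

```python
import numpy as np
import cvxpy as cp

n = 100
x = np.linspace(0, 1, n)
u_true = np.where(x < 0.5, x, 1.2 - x)                  # piecewise linear ground truth

# forward operator K: (circular) convolution with a Gaussian kernel
kernel = np.exp(-0.5 * ((np.arange(n) - n // 2) / 3.0) ** 2)
K = np.array([np.roll(kernel, k - n // 2) for k in range(n)]) / kernel.sum()

rng = np.random.default_rng(3)
f = K @ u_true + 0.01 * rng.standard_normal(n)

D2 = np.diff(np.eye(n), n=2, axis=0)                    # discrete second differences
u = cp.Variable(n)
alpha = 1e-3
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(K @ u - f)
                                 + alpha * cp.norm1(D2 @ u)))
problem.solve()
print(np.linalg.norm(u.value - u_true) / np.linalg.norm(u_true))
```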

As can be easily seen from the respective proofs, also the convergence result of theorem 2.17 and the result on convergence rates as in proposition 2.18 transfer to TVk regularisation.

Theorem 3.28. With the assumptions of corollary 3.27, let u† ∈ BVk (Ω) be a minimum-TVk -solution of Ku = f† for some data f† in Y and for each δ > 0 let fδ be such that ${S}_{{f}^{\delta }}\left({f}^{{\dagger}}\right){\leqslant}\delta $ and denote by uα,δ a finite solution of (14) for parameter α > 0 and data fδ . Let the discrepancy functionals $\left\{{S}_{{f}^{\delta }}\right\}$ be equi-coercive and converge to ${S}_{{f}^{{\dagger}}}$ in the sense of (4) and ${S}_{{f}^{{\dagger}}}\left(v\right)=0$ if and only if v = f†. Choose for each δ > 0 the parameter α > 0 such that

$\alpha \to 0\quad \text{and}\quad \frac{\delta }{\alpha }\to 0\quad \text{as}\enspace \delta \to 0.$

Then, up to shifts by functions in ker(K) ∩ Pk−1, {uα,δ } has at least one Lp -weak accumulation point. Each Lp -weak accumulation point is a minimum-TVk -solution of Ku = f† and limδ→0 TVk (uα,δ ) = TVk (u†).

Proposition 3.29. In the situation of theorem 3.28, let K*w ∈ ∂TVk (u†) for some w ∈ Y*. Then,

Equation (15)

The last result in particular guarantees convergence rates for the settings of example 2.19. Note also that the above results remain true in case p = 1 or in case p = d/(d − k) = ∞ and K is weak*-to-weak continuous.

Let us finally note some first-order optimality conditions. For this purpose, recall that for X a Banach space, the normal cone ${\mathcal{N}}_{K}\left(u\right)$ of a set K ⊂ X at u ∈ K is given by the collection of all w ∈ X* for which ${\langle w,\enspace v-u\rangle }_{{X}^{{\ast}}{\times}X}{\leqslant}0$ for all v ∈ K. If we set ${\mathcal{N}}_{K}\left(u\right)=\varnothing$ for u ∉ K, we have that ${\mathcal{N}}_{K}=\partial {\mathcal{I}}_{K}$ where ${\mathcal{I}}_{K}$ is the indicator function of K, i.e., ${\mathcal{I}}_{K}\left(u\right)=0$ if u ∈ K and ∞ otherwise.

Proposition 3.30. In the situation of corollary 3.27, if ${S}_{f}\left(v\right)=\frac{1}{2}{\Vert}v-f{{\Vert}}_{Y}^{2}$ and Y is a Hilbert space, u* ∈ Lp (Ω) is a solution of

$\underset{u\in {L}^{p}\left({\Omega}\right)}{\mathrm{min}}\enspace \frac{1}{2}{{\Vert}Ku-f{\Vert}}_{Y}^{2}+\alpha \enspace {\mathrm{T}\mathrm{V}}^{k}\left(u\right)\qquad \left(16\right)$

if and only if

where ${\mathcal{N}}_{{\mathrm{T}\mathrm{V}}^{k}}$ is the normal cone associated with the set $\overline{{\mathcal{B}}_{{\mathrm{T}\mathrm{V}}^{k}}}$ where

Proof. As $u{\mapsto}\frac{1}{2}{{\Vert}Ku-f{\Vert}}_{Y}^{2}$ is Gâteaux differentiable, it is continuous with unique subgradient, so, by subdifferential calculus, optimality of u* is equivalent to K*(fKu*) ∈ α∂TVk (u*) which can also be expressed as

Now since ${\mathrm{T}\mathrm{V}}^{k}={\mathcal{I}}_{{\mathcal{B}}_{{\mathrm{T}\mathrm{V}}^{k}}}^{{\ast}}$, it follows that ${\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{{\ast}}={\mathcal{I}}_{{\mathcal{B}}_{{\mathrm{T}\mathrm{V}}^{k}}}^{{\ast}{\ast}}={\mathcal{I}}_{\overline{{\mathcal{B}}_{{\mathrm{T}\mathrm{V}}^{k}}}}$, so $\partial {\left({\mathrm{T}\mathrm{V}}^{k}\right)}^{{\ast}}={\mathcal{N}}_{{\mathrm{T}\mathrm{V}}^{k}}$. □

Remark 3.31. In the situation of proposition 3.30, it is also possible to give an a-priori estimate for the solutions of (16) in case K is injective on Pk−1. Indeed, with R : Lp (Ω) → Pk−1 a continuous, linear projection onto the kernel of TVk and C > 0 the coercivity constant, i.e., ||u − Ru||p ⩽ C TVk (u) for all u ∈ Lp (Ω), by optimality, a solution u* satisfies $\alpha {\mathrm{T}\mathrm{V}}^{k}\left({u}^{{\ast}}\right){\leqslant}\frac{1}{2}{{\Vert}f{\Vert}}_{Y}^{2}$ and consequently, ${{\Vert}{u}^{{\ast}}-R{u}^{{\ast}}{\Vert}}_{p}{\leqslant}\frac{1}{2\alpha }C{{\Vert}f{\Vert}}_{Y}^{2}$. Likewise, comparing with u* − Ru*, optimality also gives ${{\Vert}K{u}^{{\ast}}-f{\Vert}}_{Y}^{2}{\leqslant}{{\Vert}K\left({u}^{{\ast}}-R{u}^{{\ast}}\right)-f{\Vert}}_{Y}^{2}$, which is equivalent to ${{\Vert}KR{u}^{{\ast}}{\Vert}}_{Y}^{2}{\leqslant}2\langle KR{u}^{{\ast}},\enspace f-K\left({u}^{{\ast}}-R{u}^{{\ast}}\right)\rangle $. Using $ab{\leqslant}\frac{1}{4}{a}^{2}+{b}^{2}$, the latter leads to ${{\Vert}KR{u}^{{\ast}}{\Vert}}_{Y}^{2}{\leqslant}4{{\Vert}f-K\left({u}^{{\ast}}-R{u}^{{\ast}}\right){\Vert}}_{Y}^{2}$, where the right-hand side can further be estimated, using ${\left(a+b\right)}^{2}{\leqslant}\left(1+\varepsilon \right)\left({a}^{2}+\frac{1}{\varepsilon }{b}^{2}\right)$ with $\varepsilon =\frac{1}{4{\alpha }^{2}}{C}^{2}{{\Vert}K{\Vert}}^{2}$ to give

Now, as K is injective on Pk−1 = rg(R), there is a c > 0 such that c||Ru||p ⩽ ||KRu||Y for all u ∈ Lp (Ω). Consequently, employing the triangle inequality and estimating yields

Equation (17)

which is an a priori bound that only requires the knowledge of the Poincaré–Wirtinger-type constant C, the constant c in the inverse estimate for K on Pk−1, as well as an estimate on ||K||. Beyond being of theoretical interest, such a bound can for instance be used in numerical algorithms, see section 6, example 6.24.

If the Kullback–Leibler divergence is used instead of the quadratic Hilbert space discrepancy, i.e., Sf (v) = KL(v, f), Y = L1(Ω'), and data f ⩾ 0 a.e., then one has to choose a u0 ∈ BVk (Ω) such that KL(Ku0, f) < ∞. Set Cf = KL(Ku0, f) + αTVk (u0). Then, an optimal solution u* will satisfy ${\mathrm{T}\mathrm{V}}^{k}\left({u}^{{\ast}}\right){\leqslant}\frac{{C}_{f}}{\alpha }$. Further, we have ||v||1 ⩽ 2KL(v, f) + 2||f||1 for v ∈ L1(Ω') with v ⩾ 0 a.e., see lemma A.1, such that, if c > 0 is a constant with c||Ru||p ⩽ ||KRu||1 for all u ∈ Lp (Ω), we get

and finally arrive at

Equation (18)

This constitutes an a priori estimate similar to (17) for the Kullback–Leibler discrepancy, however, with the difference that a suitable constant Cf also has to be determined.

Remark 3.32. In order to show the effect of TV2 regularisation in contrast to TV regularisation, we performed a numerical denoising experiment for f shown in figure 2(a), i.e., solved ${\mathrm{min}}_{u\in {L}^{2}\left({\Omega}\right)}\frac{1}{2}{{\Vert}u-f{\Vert}}_{2}^{2}+{\mathcal{R}}_{\alpha }\left(u\right)$ where ${\mathcal{R}}_{\alpha }=\alpha \mathrm{T}\mathrm{V}$ or ${\mathcal{R}}_{\alpha }=\alpha {\mathrm{T}\mathrm{V}}^{2}$. One clearly sees that TV2 regularisation (figure 2(c)) reduces the staircasing effect of TV regularisation (figure 2(b)) and piecewise linear structures are well recovered. However, TV2 regularisation also blurs the object boundaries which appear less sharp in contrast to TV regularisation.

This is due to the fact that TVk regularisation for k ⩾ 2 is not able to produce solutions with jump discontinuities. Indeed, TVk regularisation implies that a solution u has to be contained in BVk (Ω) which embeds into the Sobolev space Hk−1,1(Ω) ↪ H1,1(Ω). As we have seen, for instance, in example 2.1, this means that characteristic functions cannot be solutions. More generally, for u ∈ H1,1(Ω) ⊂ BV(Ω), the derivative ∇u interpreted as a measure is absolutely continuous with respect to the Lebesgue measure such that the singular part satisfies ∇s u = 0. Theorem 2.21 then implies that the jump set Ju is a ${\mathcal{H}}^{d-1}$-negligible set, i.e., u cannot jump on (d − 1)-dimensional hypersurfaces.
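The qualitative behaviour described in remark 3.32 can be reproduced with very simple means; the sketch below (our own, using smoothed surrogates of TV and TV2, periodic finite differences and ad-hoc parameters rather than the method behind figure 2) performs gradient descent on the two denoising functionals:

```python
import numpy as np

def d(u, axis):              # forward difference, periodic
    return np.roll(u, -1, axis=axis) - u

def dT(p, axis):             # backward difference (negative adjoint of d)
    return p - np.roll(p, 1, axis=axis)

def tv_grad(u, eps=0.1):     # gradient of sum(sqrt(|grad u|^2 + eps^2))
    ux, uy = d(u, 0), d(u, 1)
    nrm = np.sqrt(ux**2 + uy**2 + eps**2)
    return -(dT(ux / nrm, 0) + dT(uy / nrm, 1))

def tv2_grad(u, eps=0.1):    # gradient of sum(sqrt(|Hessian u|_F^2 + eps^2))
    uxx, uxy = d(d(u, 0), 0), d(d(u, 0), 1)
    uyx, uyy = d(d(u, 1), 0), d(d(u, 1), 1)
    nrm = np.sqrt(uxx**2 + uxy**2 + uyx**2 + uyy**2 + eps**2)
    return (dT(dT(uxx / nrm, 0), 0) + dT(dT(uxy / nrm, 0), 1)
            + dT(dT(uyx / nrm, 0), 1) + dT(dT(uyy / nrm, 1), 1))

def denoise(f, penalty_grad, alpha, step=0.05, iters=500):
    """Gradient descent on 0.5*||u - f||^2 + alpha * R(u) for a smooth surrogate R."""
    u = f.copy()
    for _ in range(iters):
        u -= step * ((u - f) + alpha * penalty_grad(u))
    return u

rng = np.random.default_rng(4)
x, y = np.meshgrid(np.linspace(0, 1, 64), np.linspace(0, 1, 64), indexing="ij")
clean = (x + y) * (x + y < 1.0)                     # linear ramp with a jump along x + y = 1
noisy = clean + 0.1 * rng.standard_normal(clean.shape)
u_tv = denoise(noisy, tv_grad, alpha=0.1)
u_tv2 = denoise(noisy, tv2_grad, alpha=0.02)
for name, u in [("TV", u_tv), ("TV^2", u_tv2)]:
    print(name, np.sqrt(np.mean((u - clean) ** 2)))
```

One expects the TV result to keep the jump sharper while staircasing the ramp, and the TV2 result to recover the ramp while smearing the jump, mirroring figure 2.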

Figure 2. Second-order total-variation denoising example. (a) Noisy image (PSNR: 13.9 dB), (b) regularisation with TV (PSNR: 29.3 dB), (c) regularisation with TV2 (PSNR: 27.8 dB), (d) regularisation with ${{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ (PSNR: 26.3 dB). All parameters were manually tuned via grid search to give highest PSNR with respect to the ground truth (figure 1(a)).

Remark 3.33. Instead of taking higher-order TV, which is based on the full gradient, one could also try to regularise with other differential operators, for instance with the (weak) Laplacian:

${\mathcal{R}}_{\alpha }\left(u\right)=\alpha {{\Vert}{\Delta}u{\Vert}}_{\mathcal{M}}=\alpha \enspace \mathrm{sup}\enspace \left\{{\int }_{{\Omega}}u{\Delta}\varphi \enspace \mathrm{d}x\enspace \vert \enspace \varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega}\right),\enspace {{\Vert}\varphi {\Vert}}_{\infty }{\leqslant}1\right\}.$
However, the kernel of this seminorm is the space of p-integrable harmonic functions on Ω, the Bergman spaces, which are infinite-dimensional. Therefore, in view of theorem 3.26, to use ${\mathcal{R}}_{\alpha }$ for the regularisation of ill-posed linear inverse problems, the forward operator K must be continuously invertible on a complement of $\mathrm{ker}\left({\mathcal{R}}_{\alpha }\right)\cap \mathrm{ker}\left(K\right)$, i.e., well-posed. This limits the applicability of this regulariser. Nevertheless, denoising problems can, for instance, still be solved, see figure 2(d), leading to 'speckle' artefacts in the solutions. Another possibility would be to add more regularising functionals, which is discussed in the next section.

Higher order TV for multichannel images. In analogy to TV, higher-order TV can also be extended to colour and multichannel images represented by functions mapping into a vector space, say Rm . This is achieved by testing with ${\mathrm{S}\mathrm{y}\mathrm{m}}^{k}{\left({\mathbf{R}}^{d}\right)}^{m}$-valued tensor fields, where

and requires to choose a norm for this space. While, also in view of the Frobenius norm used in Symk (Rd ), the most natural choice seems to pick the norm that is induced by the inner product

as with TV, this is not the only possible choice and different norms imply different types of coupling of the multiple channels. Generally, we can take |⋅| to be any norm on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{k}{\left({\mathbf{R}}^{d}\right)}^{m}$, set |⋅|∗ to be the corresponding dual norm and extend TVk to functions $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathbf{R}}^{m}\right)$ as

Equation (19)

where ||φ||∞,∗ is the pointwise supremum of the scalar function x ↦ |φ(x)|∗. By equivalence of norms in finite dimensions, the functional-analytic properties of TVk and the results on regularisation for inverse problems transfer one-to-one to its multichannel extension. Further, TVk is invariant under rotations whenever the tensor norm |⋅| is unitarily invariant in the sense that for any orthogonal matrix O ∈ Rd×d and $\left({\xi }_{1},\dots ,{\xi }_{m}\right)\in {\mathrm{S}\mathrm{y}\mathrm{m}}^{k}{\left({\mathbf{R}}^{d}\right)}^{m}$ it holds that

where we define (ξi O)(a1, ..., ak ) = ξi (Oa1, ..., Oak ) for i = 1, ..., m.
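In the discrete first-order case (k = 1), for instance, the difference between a Frobenius-type coupling of the channels and a fully decoupled channel-by-channel penalty can be written down in a few lines (our own sketch):

```python
import numpy as np

def grad(u):                                   # forward differences per channel, periodic
    return np.stack([np.roll(u, -1, axis=1) - u,
                     np.roll(u, -1, axis=2) - u])   # shape (2, m, n1, n2)

def tv_coupled(u):
    """Channels coupled through the Frobenius norm over (derivative, channel)."""
    g = grad(u)
    return np.sum(np.sqrt(np.sum(g**2, axis=(0, 1))))

def tv_decoupled(u):
    """Each channel penalised separately (sum of scalar total variations)."""
    g = grad(u)
    return np.sum(np.sqrt(np.sum(g**2, axis=0)))

rng = np.random.default_rng(5)
u = rng.standard_normal((3, 32, 32))           # an m = 3 channel image
print(tv_coupled(u), tv_decoupled(u))          # decoupled >= coupled by the norm inequality
```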

Fractional-order TV. Recently, ideas from fractional calculus started to be transferred to construct new classes of higher-order TV, namely fractional-order total variation. The latter is based on fractional partial differentiation with respect to the coordinate axes. The partial fractional derivative of non-integral order α > 0 of a compactly supported function $u: \left.\right]a,b\left[\right.\to \mathbf{R}$ can, for instance, be defined as

where k ∈ N is such that k − 1 < α < k and, denoting by Γ the gamma function, i.e., ${\Gamma}\left(t\right)={\int }_{0}^{\infty }{s}^{t-1}{\mathrm{e}}^{-s}\enspace \mathrm{d}s$,

as well as

This fractional-order derivative corresponds to a central version of the Riemann–Liouville definition [140, 199]. However, one has to mention that there are also other possibilities to define fractional-order derivatives [146]. On a rectangular domain ${\Omega}=\left.\right]{a}_{1},{b}_{1}\left[\right.{\times}\cdots {\times} \left.\right]{a}_{d},{b}_{d}\left[\right.\subset {\mathbf{R}}^{d}$ and for test vector fields $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathbf{R}}^{d}\right)$, the fractional divergence of order α can then be defined as ${\mathrm{d}\mathrm{i}\mathrm{v}}^{\alpha }\varphi ={\sum }_{i=1}^{d}\frac{{\partial }_{\left[{a}_{i},{b}_{i}\right]}^{\alpha }{\varphi }_{i}}{\partial {x}_{i}^{\alpha }}$ which is still a bounded function. Consequently, the fractional total variation of order α for uL1(Ω) is given as

It is easy to see that this defines a proper, convex and lower semi-continuous functional on each Lp (Ω) which makes the functional suitable as a regulariser for denoising [194, 199], typically for 1 < α < 2. The use of TVα for the regularisation of linear inverse problems, however, seems to be unexplored so far, and not many properties of the solutions appear to be known.
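Numerically, fractional derivatives of this kind are often approximated by Grünwald–Letnikov-type differences, a closely related but distinct discretisation from the central Riemann–Liouville form above; the following sketch (our own, one-sided and with ad-hoc parameters) illustrates the idea:

```python
import numpy as np

def gl_weights(alpha, n):
    """Grünwald–Letnikov weights w_j = (-1)^j * binom(alpha, j), j = 0, ..., n-1."""
    w = np.empty(n)
    w[0] = 1.0
    for j in range(1, n):
        w[j] = w[j - 1] * (j - 1 - alpha) / j
    return w

def frac_diff(u, alpha, h):
    """One-sided fractional difference of order alpha on a uniform grid with spacing h."""
    n = u.size
    w = gl_weights(alpha, n)
    return np.array([np.dot(w[:i + 1], u[i::-1]) for i in range(n)]) / h**alpha

# sanity check: for u(x) = x and alpha close to 1, the result is approximately
# u'(x) = 1 away from the left boundary; alpha = 1.5 gives a genuinely fractional order
x = np.linspace(0, 1, 200)
print(frac_diff(x, 0.99, x[1] - x[0])[100:105])
print(frac_diff(x, 1.5, x[1] - x[0])[100:105])
```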

4. Combined approaches

We have seen that employing higher-order total variation for regularisation yields well-posedness results for general linear inverse problems that are comparable to first-order TV regularisation, where the use of higher-order differentiation even weakens the continuity requirements on the forward operator. On the other hand, TVk regularisation, for k > 1, does not allow to recover jump discontinuities, as we have shown analytically and observed numerically (see remark 3.32). An interesting question in this context is how combinations of TV functionals with different orders behave with respect to these properties. Addressing this question, we consider, in the following, the combination of higher-order TV functionals via addition and infimal convolution. While these two approaches yield regularisation methods with rather different analytical properties, each of them has advantages and disadvantages with respect to continuity requirements on the forward operator and the possibility to recover non-smooth structures such as jumps. Also, a combination of two or more functionals introduces additional parameters. To account for that, we consider multi-parameter convergence results for each of the two approaches, which yield, in particular, an interpretation of the involved parameters and can be helpful for parameter choice in practice.

4.1. Additive multi-order regularisation

In this section, we consider the additive combination of total variation functionals with different orders. That is, we are interested in the following Tikhonov approach:

$\underset{u}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+{\alpha }_{1}\enspace {\mathrm{T}\mathrm{V}}^{{k}_{1}}\left(u\right)+{\alpha }_{2}\enspace {\mathrm{T}\mathrm{V}}^{{k}_{2}}\left(u\right)\qquad \left(20\right)$

with αi > 0 for i = 1, 2 and 1 ⩽ k1 < k2. With k1 = 1, k2 = 2, such an approach has for instance been considered in [142] for the regularisation of linear inverse problems.

The following proposition summarises, in the general setting of seminorms, basic properties of the function spaces arising from the additive combination of two different regularisers. Its proof is straightforward.

Proposition 4.1. Let |⋅|1 and |⋅|2 be two lower semi-continuous seminorms on the Banach space X. Then,

  • (a)  
    The functional $\left\vert \cdot \right\vert =\vert \cdot {\vert }_{1}+\vert \cdot {\vert }_{2}$ is a seminorm on X.
  • (b)  
    We have
  • (c)  
    The seminorm $\left\vert \cdot \right\vert $ is lower semi-continuous and
    constitutes a Banach space.
  • (d)  
    With Yi the Banach spaces arising from the norms ||⋅||X + |⋅|i , i = 1, 2 (see lemma 3.15),

Setting $\vert \cdot {\vert }_{i}={\alpha }_{i}{\mathrm{T}\mathrm{V}}^{{k}_{i}}$ for i = 1, 2 shows in particular that the function space associated with the additive combination of the ${\mathrm{T}\mathrm{V}}^{{k}_{i}}$ is embedded in ${\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$, i.e., the BV space corresponding to the highest order. Hence non-trivial combinations of different ${\mathrm{T}\mathrm{V}}^{{k}_{i}}$ again do not allow the recovery of jumps and, as the following theorem shows, in fact even yield the same space as the single TV term with the highest order.

Theorem 4.2. Let 1 ⩽ k1 < k2, α1 > 0, α2 > 0 and Ω be a bounded Lipschitz domain. For X = L1(Ω) and the seminorm $\left\vert \cdot \right\vert ={\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}$, let Y be the associated Banach space according to lemma 3.15. Then,

$Y={\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$

in the sense of Banach space equivalence, and for p ∈ [1, ∞], p ⩽ d/(d − k2) if k2 < d, $R:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{1}}\right)$ a continuous, linear projection, there is a C > 0 independent of u such that

for all u ∈ Lp (Ω).

Proof. For the claimed norm equivalence, one estimate is immediate, while the other one follows from theorem 3.19. Denoting by ${R}_{2}:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ a continuous, linear projection, the estimate on ||u − Ru||p follows from corollary 3.23 and norm equivalence in finite-dimensional spaces as

with C > 0 a generic constant. □

Tikhonov regularisation. For employing ${\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}$ as regularisation in a Tikhonov setting, the coercivity estimate in theorem 4.2 is crucial since it allows the well-posedness result of theorem 3.26 to be transferred. Observe in particular that $\mathrm{ker}\left({\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ is finite-dimensional, such that assumption (a) in theorem 3.26 is satisfied.

Proposition 4.3. With X = Lp (Ω), $p\in \left.\right] 1,\infty \left[\right.$, Ω a bounded Lipschitz domain, Y a Banach space, K : X → Y linear and continuous, Sf : Y → [0, ∞] proper, convex, lower semi-continuous and coercive, 1 ⩽ k1 < k2, α1 > 0, α2 > 0, the Tikhonov minimisation problem

$\underset{u\in {L}^{p}\left({\Omega}\right)}{\mathrm{min}}\enspace {S}_{f}\left(Ku\right)+{\alpha }_{1}\enspace {\mathrm{T}\mathrm{V}}^{{k}_{1}}\left(u\right)+{\alpha }_{2}\enspace {\mathrm{T}\mathrm{V}}^{{k}_{2}}\left(u\right)\qquad \left(21\right)$

is well-posed in the sense of theorem 3.26 whenever p ⩽ d/(d − k2) if k2 < d.

It is interesting to note that the necessary coercivity estimate on ${\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}$ uses a projection onto the smaller kernel of ${\mathrm{T}\mathrm{V}}^{{k}_{1}}$ and an Lp norm with a larger exponent corresponding to ${\mathrm{T}\mathrm{V}}^{{k}_{2}}$. Hence, in view of the assumptions in theorem 3.26, the additive combination of ${\mathrm{T}\mathrm{V}}^{{k}_{1}}$ and ${\mathrm{T}\mathrm{V}}^{{k}_{2}}$ inherits the best properties of the two summands, i.e., the ones that are the least restrictive for applications in an inverse problems context.
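In a discretised one-dimensional setting, the additive model can be written down directly; the following sketch (again with CVXPY and ad-hoc weights, purely for illustration) denoises a signal containing both a jump and a linear ramp with α1 TV + α2 TV2:

```python
import numpy as np
import cvxpy as cp

n = 120
x = np.linspace(0, 1, n)
u_true = np.where(x < 0.4, 0.2, x)                      # jump followed by a linear ramp
rng = np.random.default_rng(6)
f = u_true + 0.05 * rng.standard_normal(n)

D1 = np.diff(np.eye(n), n=1, axis=0)                    # first differences
D2 = np.diff(np.eye(n), n=2, axis=0)                    # second differences
u = cp.Variable(n)
alpha1, alpha2 = 2e-2, 2e-3
problem = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(u - f)
                                 + alpha1 * cp.norm1(D1 @ u)
                                 + alpha2 * cp.norm1(D2 @ u)))
problem.solve()
print(np.linalg.norm(u.value - u_true) / np.linalg.norm(u_true))
```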

Regarding the convergence result of theorem 3.28 and the rates of proposition 3.29, a direct extension to regularisation with α1TVk1 + α2TVk2 can be obtained by regarding the weights α1, α2 as fixed and introducing an additional factor α > 0 for both terms, which then acts as the regularisation parameter. A more tailored approach, however, is to regard both α1, α2 as regularisation parameters and to study the limiting behaviour of the method as α1, α2 converge to zero in some sense. This is covered by the following theorem.

Theorem 4.4. In the situation of proposition 4.3, let for each δ > 0 the data fδ be given such that Sfδ(f†) ⩽ δ, let {Sfδ} be equi-coercive and converge to Sf† for some data f† ∈ Y in the sense of (4), with Sf†(v) = 0 if and only if v = f†.

Choose the positive parameters α = (α1, α2) in dependence of δ such that max{α1, α2} → 0 and δ/max{α1, α2} → 0 as δ → 0, and such that (α̃1, α̃2) = (α1, α2)/max{α1, α2} → (α1†, α2†) as δ → 0. Set k = k1 if α2† = 0 and k = k2 otherwise, assume p ⩽ d/(d − k) in case of k < d, and assume that there exists u0 ∈ BVk(Ω) such that Ku0 = f†. Then, up to shifts in ker(K) ∩ Pk1−1, any sequence {uα,δ}, with each uα,δ being a solution to (20) for parameters (α1, α2) and data fδ, has at least one Lp-weak accumulation point. Each Lp-weak accumulation point is a minimum-(α1†TVk1 + α2†TVk2)-solution u† of Ku = f† and limδ→0 (α̃1TVk1 + α̃2TVk2)(uα,δ) = (α1†TVk1 + α2†TVk2)(u†).

Proof. First note that, as a consequence of theorem 3.26 and the fact that u0 ∈ BVk(Ω) with Ku0 = f†, there exists a minimum-(α1†TVk1 + α2†TVk2)-solution to Ku = f†, which we denote by u†, such that TVk(u†) < ∞. Using optimality of uα,δ compared to u† gives

Since max{α1, α2} → 0 as δ → 0, we have that ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)\to 0$ as δ → 0. Moreover, as also δ/max{α1, α2} → 0, it follows that

The choice of k allows us to conclude that {TVk(uα,δ)} is bounded which, in the case k = k1, means that {TVk1(uα,δ)} is bounded. We now show that {TVk1(uα,δ)} is bounded also in the other case, k = k2. To this end, denote by R : Lp(Ω) → ker(TVk2) and PZ : ker(TVk2) → Z linear, continuous projections, where Z is a complement of ker(K) ∩ ker(TVk2) in ker(TVk2), i.e., id − PZ projects ker(TVk2) onto ker(K) ∩ ker(TVk2). Then, by optimality and invariance of K and TVk2 on ker(K) ∩ ker(TVk2), we estimate

which, together with lemma 3.25, norm equivalence on finite-dimensional spaces and injectivity of K on the finite-dimensional space Z, yields

Now, the last expression is bounded due to boundedness of TVk2(uα,δ) and equi-coercivity of {Sfδ}. Hence, {TVk1(uα,δ)} is always bounded and, again using the equi-coercivity of {Sfδ} and the techniques in the proof of theorem 3.26, one sees that, after possible shifts in ker(K) ∩ Pk1−1, {uα,δ} is bounded in BVk(Ω). Therefore, by continuous embedding and reflexivity, it admits a weak accumulation point in Lp(Ω).

Next, let u* be an Lp-weak accumulation point associated with {δn}, δn → 0, as well as the corresponding parameters {αn} = {(α1,n, α2,n)}. Then, Sf†(Ku*) ⩽ lim infn→∞ Sfδn(Ku^{αn,δn}) = 0 by convergence of Sfδ to Sf†, so Ku* = f†. Moreover,

hence, u* is a minimum-$\left({\alpha }_{1}^{{\dagger}}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}^{{\dagger}}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$-solution. In particular,

so

Finally, each sequence {δn} with δn → 0 contains a further subsequence (not relabelled) for which (α̃1,nTVk1 + α̃2,nTVk2)(u^{αn,δn}) → (α1†TVk1 + α2†TVk2)(u†) as n → ∞, so we have (α̃1TVk1 + α̃2TVk2)(uα,δ) → (α1†TVk1 + α2†TVk2)(u†) as δ → 0. □

Remark 4.5. 

  • Theorem 4.4 shows that, for the additive combination of higher-order TV functionals, the maximum of the parameters plays the role of the regularisation parameter. The regularity assumption on u0 such that Ku0 = f†, on the other hand, depends on whether some parameters converge to zero faster than the maximum or not, i.e., it depends on the ratio of the parameters. Assuming for instance that α2/max{α1, α2} → 0 leads to the weaker BVk1-regularity requirement for u0. In view of a practical parameter choice strategy, this motivates a parametrisation via (α1, α2) = α(λ1, λ2), where α > 0 is interpreted as the regularisation parameter and chosen in dependence of the noise level, and (λ1, λ2) with max{λ1, λ2} = 1 is interpreted as a model parameter, which is chosen (or learned) once for a particular class of image data and then fixed independently of the concrete forward model or noise level; see the short sketch after this remark.
  • Although (20) incorporates multiple orders, a solution is always contained in ${\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$. Since k2 ⩾ 2, this space is always contained in H1,1(Ω), so jump discontinuities cannot appear. One can observe that for numerical solutions, this is reflected in blurry reconstructions of edges while higher-order smoothness is usually captured quite well, see figure 2(c).
  • Naturally, it is also possible to consider the weighted sum of more than two TV-type functionals for regularisation, i.e.,
    α1TVk1(u) + α2TVk2(u) + ⋯ + αmTVkm(u)    (22)
    with orders k1, ..., km ⩾ 1 and weights α1, ..., αm > 0. Solutions then exist, for appropriate p, in the space BVk (Ω) for k = max{k1, ..., km }.
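
To make the parametrisation from the first item concrete, the following minimal Python sketch (referenced above) scales a fixed model parameter (λ1, λ2) with max{λ1, λ2} = 1 by a single regularisation parameter α chosen by a simple discrepancy-type sweep in the denoising case K = id. The routine solve_additive_tikhonov is a hypothetical placeholder for any solver of the additive multi-order Tikhonov problem; it is not part of the original text.

```python
import numpy as np

def choose_alpha(f_delta, delta, solve_additive_tikhonov, lam=(1.0, 0.25)):
    """Sweep the single regularisation parameter alpha for the additive model
    (alpha_1, alpha_2) = alpha * (lambda_1, lambda_2), max(lambda) = 1, and return
    the first (largest) alpha whose residual drops below the noise level delta.
    `solve_additive_tikhonov(f, a1, a2)` is assumed to return a minimiser of
    1/2*||u - f||^2 + a1*TV^{k1}(u) + a2*TV^{k2}(u)."""
    lam = np.asarray(lam, dtype=float)
    lam = lam / lam.max()                       # normalise the model parameter
    for alpha in np.logspace(1, -4, 30):        # from strong to weak regularisation
        u = solve_additive_tikhonov(f_delta, alpha * lam[0], alpha * lam[1])
        if 0.5 * np.sum((u - f_delta) ** 2) <= delta:   # discrepancy-type criterion
            break
    return alpha, u
```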

Optimality conditions. As for TVk , one can also consider optimality conditions for variational problems with ${\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}+{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}$ as regularisation. Again, in the case that Y is a Hilbert space, q = 2 and ${S}_{f}\left(v\right)=\frac{1}{2}{{\Vert}v-f{\Vert}}_{Y}^{2}$, one can argue according to proposition 3.30 and obtain that u* is optimal for (20) if and only if

or, equivalently,

A difficulty with a further specification of these statements, however, is that it is neither immediate that the subdifferential is additive in this situation, nor that the Fenchel dual of the sum of the TVki equals the infimal convolution of the duals (see definition 4.6 in the next subsection for a definition of the infimal convolution). A possible remedy is to consider the original minimisation problem in the space BVk1(Ω) instead, such that TVk1 becomes continuous. This, however, yields subgradients in the dual of BVk1(Ω) instead of Lp*(Ω), making the optimality conditions again difficult to interpret.

A priori estimates. In order to obtain a bound on a solution u* for a quadratic Hilbert-norm discrepancy, i.e., Sf(v) = ½||v − f||Y² with Y a Hilbert space, one can proceed analogously to remark 3.31, provided that K is injective on the space Pk1−1. We then also arrive at (17), with α replaced by α1, C being the coercivity constant for TVk1 and c the inverse bound for K on Pk1−1. Of course, in case K is still injective on the larger space Pk2−1, the analogous bound can be obtained with α2 instead of α1 and respective constants C, c. In case of the Kullback–Leibler discrepancy, i.e., Sf(v) = KL(v, f), the analogous statements apply to the estimate (18).

Denoising performance. Figure 3 shows the effect of α1TV + α2TV2 regularisation compared to pure TV-regularisation. While staircase artefacts are slightly reduced, the overall image is more blurry than the one obtained with TV, see figures 3(b) and (c). This is expected as additive regularisation inherits the analytical properties of the stronger regularisation term, hence α1TV + α2TV2 is not able to recover jumps. The result is not much different when ${{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ is used instead of TV2, see figure 3(d). Nevertheless, although not discussed in this paper, the issue of limited applicability of ${{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ for regularisation of general inverse problems, as mentioned in remark 3.33, is overcome in an additive combination with TV since the properties of TV are sufficient to guarantee well-posedness results.
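
To give a concrete impression of the problem behind figure 3, the following Python sketch sets up a discrete version of the additive multi-order denoising problem. The anisotropic finite-difference discretisation and the use of the generic CVXPY solver are choices made only for this illustration; the tailored first-order algorithms developed in the numerical part of this review are better suited for realistic image sizes.

```python
import numpy as np
import cvxpy as cp

def dx(U):   # forward difference in the first image dimension
    return U[1:, :] - U[:-1, :]

def dy(U):   # forward difference in the second image dimension
    return U[:, 1:] - U[:, :-1]

def tv1(U):  # anisotropic discrete total variation
    return cp.sum(cp.abs(dx(U))) + cp.sum(cp.abs(dy(U)))

def tv2(U):  # anisotropic discrete second-order TV (mixed difference counted twice)
    return (cp.sum(cp.abs(dx(dx(U)))) + cp.sum(cp.abs(dy(dy(U))))
            + 2 * cp.sum(cp.abs(dy(dx(U)))))

def denoise_additive(f, alpha1=0.1, alpha2=0.1):
    """Discrete counterpart of  min_u 1/2||u - f||^2 + alpha1 TV(u) + alpha2 TV^2(u)."""
    U = cp.Variable(f.shape)
    objective = 0.5 * cp.sum_squares(U - f) + alpha1 * tv1(U) + alpha2 * tv2(U)
    cp.Problem(cp.Minimize(objective)).solve()
    return U.value

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    truth = np.zeros((48, 48))
    truth[12:36, 12:36] = 1.0                    # piecewise constant test image
    rec = denoise_additive(truth + 0.1 * rng.standard_normal(truth.shape))
```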


Figure 3. Additive multi-order denoising example. (a) Noisy image (PSNR: 13.9 dB), (b) regularisation with TV (PSNR: 29.3 dB), (c) regularisation with α1TV + α2TV2 (PSNR: 29.5 dB), (d) regularisation with ${\alpha }_{1}\mathrm{T}\mathrm{V}+{\alpha }_{2}{{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ (PSNR: 29.4 dB). All parameters were manually tuned via grid search to give highest PSNR with respect to the ground truth (figure 1(a)).


4.2. Multi-order infimal convolution

In order to overcome the smoothing effect of total variation of order two and higher, and of additive combinations thereof, another idea is to model an image u as the sum of a first-order part and a second-order part, i.e.,

This was originally proposed in [59]; different variants have subsequently been analysed in [16] and considered in a discrete setting in [174, 175].

Obviously, such a decomposition exists for each u ∈ BV(Ω) but is, of course, not unique. The parts u1 and u2 are now regularised with first- and second-order total variation, associated with some weights α1 > 0, α2 > 0. The associated Tikhonov minimisation problem reads as

min_{u1, u2}  Sf(K(u1 + u2)) + α1TV(u1) + α2TV2(u2).

As we are only interested in u, we rewrite this problem as

min_{u}  Sf(Ku) + inf_{u = u1 + u2}  α1TV(u1) + α2TV2(u2).    (23)

This regularisation functional is called infimal convolution of α1TV and α2TV2.

Definition 4.6. Let F1, F2 : X → ]−∞, ∞]. Then,

(F1 △ F2)(u) = inf_{u1 + u2 = u}  F1(u1) + F2(u2),    u ∈ X,

is the infimal convolution of F1 and F2.

An infimal convolution is called exact if for each u ∈ X there is a pair u1, u2 ∈ X with u1 + u2 = u and (F1 △ F2)(u) = F1(u1) + F2(u2).
The infimal convolution may or may not be exact and may or may not be lower semi-continuous, even if both F1, F2 are lower semi-continuous. The next proposition, which should be compared to proposition 4.1 above, provides basic properties and the function spaces associated with infimal convolutions.
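
As a simple worked instance of definition 4.6 (a side remark not taken from the original text, and pairing a seminorm with a quadratic rather than with a second seminorm): for α, β > 0, the infimal convolution of α|⋅| and (β/2)|⋅|^2 on R is the classical Huber function,

(α|⋅| △ (β/2)|⋅|^2)(t) = min_{s ∈ R} α|t − s| + (β/2)s^2 = (β/2)t^2 if |t| ⩽ α/β,  and  α|t| − α^2/(2β) if |t| > α/β.

Here the minimum is attained for every t, so this infimal convolution is exact, and it is continuous, hence lower semi-continuous; the quadratic component acts for small arguments while the absolute value caps the growth of the penalty for large ones.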

Proposition 4.7. Let |⋅|1 and |⋅|2 be two lower semi-continuous seminorms on the Banach space X. Then,

  • (a)  
    The functional $\left\vert \cdot \right\vert =\vert \cdot {\vert }_{1}{\triangle}\vert \cdot {\vert }_{2}$ is a seminorm on X.
  • (b)  
    We have ker(|⋅|1) + ker(|⋅|2) ⊂ ker(|⋅|1△|⋅|2),
    with equality if |⋅|1△|⋅|2 is exact.
  • (c)  
    If |⋅|1△|⋅|2 is lower semi-continuous, then
    the space Y arising from the norm ||⋅||X + |⋅|1△|⋅|2 (see lemma 3.15) constitutes a Banach space.
  • (d)  
    With Yi the Banach spaces arising from the norms ||⋅||X + |⋅|i , i = 1, 2 (see lemma 3.15), both Y1 and Y2 embed continuously into Y.
  • (e)  
    It holds that (|⋅|1△|⋅|2)* = |⋅|1* + |⋅|2* and, if |⋅|1△|⋅|2 is exact, then ∂(|⋅|1* + |⋅|2*) = ∂|⋅|1* + ∂|⋅|2*.

Proof. The seminorm axioms can easily be verified for $\left\vert \cdot \right\vert $. If u = u1 + u2 for ui ∈ ker(|⋅|i ), i = 1, 2, then

The converse inclusion follows directly from the exactness.

The third statement is a direct consequence of lemma 3.15, while the fourth immediately follows from ||u||X + |u| ⩽ ||u||X + |u|i for i = 1, 2.

For the fifth statement, the assertion on the Fenchel dual follows by direct computation. Regarding equality of the subdifferentials, let w ∈ X* and u ∈ X be such that u ∈ ∂(|⋅|1* + |⋅|2*)(w). Then, the Fenchel identity yields

Equation (24)

for the minimising u1, u2 ∈ X with u1 + u2 = u. As, by the Fenchel inequality,

Equation (25)

the equation (24) can only be true when there is equality in (25). But this means, in turn, that ${u}_{1}\in \partial {\vert \cdot {\vert }_{1}}^{{\ast}}\left(w\right)$ and ${u}_{2}\in \partial {\vert \cdot {\vert }_{2}}^{{\ast}}\left(w\right)$. Hence, $\partial \left({\vert \cdot {\vert }_{1}}^{{\ast}}+{\vert \cdot {\vert }_{2}}^{{\ast}}\right)\subset \partial {\vert \cdot {\vert }_{1}}^{{\ast}}+\partial {\vert \cdot {\vert }_{2}}^{{\ast}}$. The other inclusion holds trivially. □

The statement (e) will be relevant for obtaining optimality conditions and we note that, as can be seen from the proof, it holds true for arbitrary convex functionals, not necessarily seminorms.

The previous proposition shows in particular that lower semi-continuity and exactness of the infimal convolution are important for obtaining an appropriate function space setting. Regarding the infimal convolution of TV functionals, this holds true on Lp -spaces as follows.

Proposition 4.8. Let Ω be a bounded Lipschitz domain, 1 ⩽ k1 < k2 and p ∈ [1, ∞] with p ⩽ d/(d − k1) if k1 < d. Then, for α = (α1, α2), α1 > 0, α2 > 0, the infimal convolution

Rα = α1TVk1 △ α2TVk2    (26)

is exact and lower semi-continuous in Lp (Ω).

Proof. By continuous embedding, we may assume without loss of generality that p < ∞. Take a sequence {un} converging to some u in Lp(Ω) for which lim infn→∞ Rα(un) < ∞. For each n, we can select u1n, u2n ∈ BVk1(Ω) such that un = u1n + u2n,

and u1n is in the complement of ker(TVk1) in the sense that Ru1n = 0 for R : Lp(Ω) → ker(TVk1) a linear and continuous projection. The latter condition can always be satisfied since both TVk1 and TVk2 are invariant on ker(TVk1). Now, by coercivity of TVk1 as in corollary 3.23, we get that {u1n} is bounded in BVk1(Ω). Hence, by the embedding of BVk1(Ω) into either L∞(Ω) or Ld/(d−k1)(Ω) in case of k1 < d as in theorem 3.20, the choice of p and convergence of {un} in Lp(Ω), we can extract (non-relabelled) subsequences of {u1n} and {u2n} converging weakly to some u1 and u2 in Lp(Ω), respectively, such that u = u1 + u2. Thus, lower semi-continuity of both TVk1 and TVk2 implies

such that lower semi-continuity holds. Finally, exactness for u ∈ Lp(Ω) with Rα(u) < ∞ follows from choosing {un} as the constant sequence un = u. □

Given this, the special case |⋅|i = αiTVki and X = L1(Ω) of proposition 4.7 shows that both BVk1(Ω) and BVk2(Ω) are embedded in the Banach space Y. Hence, in contrast to the sum of different TV terms, their infimal convolution allows jumps to be recovered whenever k1 = 1, independently of k2. In fact, as the following theorem shows, the space Y is even equivalent to the BV space corresponding to the lowest order, in particular to BV(Ω) for k1 = 1. Again, the result should be compared to theorem 4.2 above.

Theorem 4.9. Let 1 ⩽ k1 < k2, α1 > 0, α2 > 0, Ω be a bounded Lipschitz domain, and Y be the Banach space associated with X = L1(Ω) and total-variation infimal convolution according to (26). Then,

in the sense of Banach space equivalence, and for p ∈ [1, ∞], p ⩽ d/(d − k1) if k1 < d, and for R : Lp(Ω) → ker(TVk2) a linear, continuous projection, there exists a C > 0 such that

Equation (27)

for all uLp (Ω).

Proof. We first show the claimed norm equivalence. For this purpose, note that one estimate corresponds to the fourth statement in proposition 4.7.

For the converse estimate, let $u\in {\mathrm{B}\mathrm{V}}^{{k}_{1}}\left({\Omega}\right)$ and $R:{L}^{1}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ be a projection. Then,

Equation (28)

for C independent of u, w ∈ BVk1(Ω). Indeed, suppose (28) fails; then there are sequences {un} and {wn} with TVk1(un) = 1 and ||un||1 → 0 as well as TVk1(un − Rwn) → 0, meaning that un → 0 in L1(Ω) and ∇k1(un − Rwn) → 0 in ℳ(Ω, Symk1(Rd)). The latter implies that {∇k1Rwn} is bounded in a finite-dimensional space, hence there is a convergent subsequence (not relabelled) with limit v ∈ ℳ(Ω, Symk1(Rd)). Then, ∇k1un → v in ℳ(Ω, Symk1(Rd)) and the closedness of ∇k1 yields v = 0, which contradicts TVk1(un) = 1 for all n.

Using this, together with the estimate

Equation (29)

from lemma 3.25, it holds for $u\in {\mathrm{B}\mathrm{V}}^{{k}_{1}}\left({\Omega}\right)$ and $w\in {\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$ that

Taking the infimum over all $w\in {\mathrm{B}\mathrm{V}}^{{k}_{2}}\left({\Omega}\right)$, adding ||u||1 on both sides as well as observing that ${\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\mathrm{T}\mathrm{V}}^{{k}_{2}}{\leqslant}\mathrm{min}{\left\{{\alpha }_{1},{\alpha }_{2}\right\}}^{-1}\left({\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ yields

and, consequently, the desired norm estimate. Likewise, the estimate ${\Vert}u-Ru{{\Vert}}_{p}{\leqslant}C\left({\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)\left(u\right)$ follows in analogy to proposition 3.22 and corollary 3.23, which immediately gives the claimed estimate for arbitrary α1 > 0, α2 > 0. □

Tikhonov regularisation. Again, the second estimate in theorem 4.9 is crucial, as it allows the well-posedness result of theorem 3.26 to be applied.

Proposition 4.10. With X = Lp(Ω), p ∈ ]1, ∞[, Ω being a bounded Lipschitz domain, Y a Banach space, K : X → Y linear and continuous, Sf : Y → [0, ∞] proper, convex, lower semi-continuous and coercive, 1 ⩽ k1 < k2, α1 > 0, α2 > 0, the Tikhonov minimisation problem

min_{u ∈ Lp(Ω)}  Sf(Ku) + (α1TVk1 △ α2TVk2)(u)    (30)

is well-posed in the sense of theorem 3.26 whenever p ⩽ d/(d − k1) if k1 < d.

Compared to the sum of different TV terms, we see that now the necessary coercivity estimate incorporates a projection to the larger kernel of ${\mathrm{T}\mathrm{V}}^{{k}_{2}}$ and an Lp norm with a smaller exponent corresponding to ${\mathrm{T}\mathrm{V}}^{{k}_{1}}$. Hence, in view of the assumptions of theorem 3.26, the infimal convolution of ${\mathrm{T}\mathrm{V}}^{{k}_{1}}$ and ${\mathrm{T}\mathrm{V}}^{{k}_{2}}$ inherits the worst properties of the two summands, i.e., the ones that are more restrictive for applications in an inverse problems context. Nevertheless, such a slightly more restrictive assumption on the continuity of the forward operator is compensated by the fact that the infimal convolution with k1 = 1 allows to reconstruct jumps. In addition, each solution u* of a Tikhonov functional admits an optimal decomposition ${u}^{{\ast}}={u}_{1}^{{\ast}}+{u}_{2}^{{\ast}}$ with ${u}_{i}^{{\ast}}\in {\mathrm{B}\mathrm{V}}^{{k}_{i}}\left({\Omega}\right)$, i = 1, 2, which follows from the exactness of the infimal convolution.

Regarding the convergence result of theorem 3.28 and the rates of proposition 3.29, again a direct extension to regularisation with α1TVk1 △ α2TVk2 can be obtained by regarding the weights α1, α2 as fixed and introducing an additional factor α > 0 for both terms, which then acts as the regularisation parameter. Considering the limiting behaviour for both weights converging to zero, a counterpart of theorem 4.4 can be obtained as follows. There, we also allow for infinite weights αi, i.e., for αi = ∞ we set αiTVki(u) = 0 if u ∈ ker(TVki) and αiTVki(u) = ∞ else. We first need a lower semi-continuity result.

Lemma 4.11. Let Ω be a bounded Lipschitz domain, p ∈ [1, ∞[ with 1 ⩽ p ⩽ d/(d − k1) if k1 < d, let {(α1,n, α2,n)} be a sequence of positive parameters converging to some (α1†, α2†) ∈ ]0, ∞]² and let {un} be a sequence in Lp(Ω) weakly converging to u* ∈ Lp(Ω). Then,

Proof. By moving to a subsequence, we can assume that $\left({\alpha }_{1,n}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2,n}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)\left({u}^{n}\right)$ converges to the limes inferior on the right-hand side of the claimed assertion and that the latter is finite. Choose $\left\{{u}_{1}^{n}\right\}$, $\left\{{u}_{2}^{n}\right\}$ sequences such that for each n, we have ${u}_{1}^{n}+{u}_{2}^{n}={u}^{n}$, $\left({\alpha }_{1,n}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2,n}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)\left({u}^{n}\right)={\alpha }_{1,n}{\mathrm{T}\mathrm{V}}^{{k}_{1}}\left({u}_{1}^{n}\right)+{\alpha }_{2,n}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\left({u}_{2}^{n}\right)$ and ${u}_{1}^{n}$ being in a complement of $\mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{1}}\right)$ in the sense that ${u}_{1}^{n}\in \mathrm{ker}\left(R\right)$ for a linear, continuous projection $R:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{1}}\right)$. Setting ${\hat{\alpha }}_{i}=\mathrm{inf}\enspace \left\{{\alpha }_{i,n}\right\}{ >}0$, we obtain

In particular, for a constant C > 0 it holds that

which implies that $\left\{{u}_{1}^{n}\right\}$ is bounded in ${\mathrm{B}\mathrm{V}}^{{k}_{1}}\left({\Omega}\right)$. By the embedding of theorem 3.20 and since {un } is convergent, both $\left\{{u}_{1}^{n}\right\}$ and $\left\{{u}_{2}^{n}\right\}$ admit subsequences (not relabelled) weakly converging to some ${u}_{1}^{{\ast}}$ and ${u}_{2}^{{\ast}}$ in Lp (Ω), respectively. Now, in case ${\alpha }_{i}^{{\dagger}}{< }\infty $, we can conclude

Otherwise, we get by boundedness of ${\alpha }_{i,n}{\mathrm{T}\mathrm{V}}^{{k}_{i}}\left({u}_{i}^{n}\right)$ that ${\mathrm{T}\mathrm{V}}^{{k}_{i}}\left({u}_{i}^{n}\right)\to 0$ and by lower semi-continuity that ${\mathrm{T}\mathrm{V}}^{{k}_{i}}\left({u}_{i}^{{\ast}}\right)=0$. Together, this implies

This implies the desired statement. □

Theorem 4.12. In the situation of proposition 4.10 and for p ∈ ]1, ∞[ with p ⩽ d/(d − k1) in case of k1 < d, let for each δ > 0 the data fδ be given such that Sfδ(f†) ⩽ δ, let {Sfδ} be equi-coercive and converge to Sf† for some data f† ∈ Y in the sense of (4), and Sf†(v) = 0 if and only if v = f†.

Choose the parameters α = (α1, α2) in dependence of δ such that min{α1, α2} → 0 and δ/min{α1, α2} → 0 as δ → 0, and assume that (α̃1, α̃2) = (α1, α2)/min{α1, α2} → (α1†, α2†) ∈ ]0, ∞]² as δ → 0. Set k = k2 if α1† = ∞ and k = k1 otherwise, and assume that there exists u0 ∈ BVk(Ω) such that Ku0 = f†.

Then, up to shifts in ker(K) ∩ Pk2−1, any sequence {uα,δ}, with each uα,δ being a solution to (30) for parameters (α1, α2) and data fδ, has at least one Lp-weak accumulation point. Each Lp-weak accumulation point is a minimum-(α1†TVk1 △ α2†TVk2)-solution u† of Ku = f† and limδ→0 (α̃1TVk1 △ α̃2TVk2)(uα,δ) = (α1†TVk1 △ α2†TVk2)(u†).

Proof. First note that, with $R:{L}^{p}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$ a linear, continuous projection, for any uLp (Ω), we have

and by the choice of k as well as u0 ∈ BVk(Ω), that (α1†TVk1 △ α2†TVk2)(u0) < ∞. Hence, as a consequence of theorem 3.26, there exists a minimum-(α1†TVk1 △ α2†TVk2)-solution u† ∈ BVk(Ω) to Ku = f†. Using optimality of uα,δ compared to u† gives

Now since $\left({\alpha }_{1}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)\left({u}^{{\dagger}}\right){\leqslant}{\mathrm{min}}_{i=1,2}\left\{{\alpha }_{i}{\mathrm{T}\mathrm{V}}^{{k}_{i}}\left({u}^{{\dagger}}\right)\right\}$ and min{α1, α2} → 0 as δ → 0, we have that ${S}_{{f}^{\delta }}\left(K{u}^{\alpha ,\delta }\right)\to 0$ as δ → 0. Moreover, as also δ/min{α1, α2} → 0, it follows that

for ɛ > 0 independent of uα,δ and letting ɛ → 0, we obtain

In particular, using (27), we can conclude that {uα,δ − Ruα,δ} is bounded in Lp(Ω). By introducing appropriate shifts in ker(K) ∩ Pk2−1 as done in theorem 3.26 and using the equi-coercivity of {Sfδ}, one can then achieve that {uα,δ} is bounded in Lp(Ω), such that by reflexivity, it admits an Lp-weak accumulation point.

Next, let u* be an Lp-weak accumulation point associated with {δn}, δn → 0, as well as the corresponding parameters {αn} = {(α1,n, α2,n)}. Then, Sf†(Ku*) ⩽ lim infn→∞ Sfδn(Ku^{αn,δn}) = 0 by convergence of Sfδ to Sf†, so Ku* = f†. Moreover, employing lemma 4.11, we get

hence, u* is a minimum-$\left({\alpha }_{1}^{{\dagger}}{\mathrm{T}\mathrm{V}}^{{k}_{1}}{\triangle}{\alpha }_{2}^{{\dagger}}{\mathrm{T}\mathrm{V}}^{{k}_{2}}\right)$-solution. The remaining assertions follow as in the proof of theorem 4.4 by replacing the sum with the infimal convolution. □

Remark 4.13. 

  • Compared to the convergence result for the sum of higher-order TV functionals as in theorem 4.4, we see that now the minimum of the parameters plays the role of the regularisation parameter, but again the ratio of the parameters determines the required regularity assumption on u0 such that Ku0 = f†. For a parameter choice in practice, this again motivates the choice (α1, α2) = α(λ1, λ2) as in remark 4.5, with α > 0 being interpreted as the regularisation parameter and (λ1, λ2) with min{λ1, λ2} = 1 being interpreted as a model parameter.
  • It is also possible to construct infimal convolutions of more than two TV-type functionals and, of course, of functionals other than TVk.
  • Introducing orders k1, ..., km ⩾ 1 and weights α1, ..., αm > 0, one can consider
    α1TVk1 △ α2TVk2 △ ⋯ △ αmTVkm.    (31)
    Solutions then exist, for appropriate p, in the space BVk (Ω) for k = min{k1, ..., km }.
  • The latter is in contrast to the multi-order TV regularisation (20) where the solution space is determined by the highest effective order of differentiation. Letting ki = 1 for some i, the solution space is then BV(Ω) which allows for discontinuities; a desirable property for image restoration.

Optimality conditions. Again, in the situation that Y is a Hilbert space, q = 2 and Sf(v) = ½||v − f||Y², we obtain some first-order optimality conditions. Noting that the dual of the infimal convolution of two functionals is the sum of the respective duals, and arguing according to proposition 3.30, a u* is optimal for (30) if and only if

By proposition 4.7, the subgradients are additive, so in terms of the normal cones introduced in proposition 3.30, the optimality condition reads as

Equation (32)

A priori estimates. Also here, in the above Hilbert space situation, i.e., ${S}_{f}\left(v\right)=\frac{1}{2}{{\Vert}v-f{\Vert}}_{Y}^{2}$ and Y Hilbert space, an a-priori bound of solutions u* can be derived thanks to the coercivity estimate (27). One indeed has ${{\Vert}u-Ru{\Vert}}_{p}{\leqslant}\frac{1}{2\mathrm{min}\left\{{\alpha }_{1},{\alpha }_{2}\right\}}C{{\Vert}f{\Vert}}_{Y}^{2}$ with R and C coming from (27). Hence, assuming that K is injective on ${\mathbf{P}}^{{k}_{2}-1}$, which leads to c||Ru||p ⩽ ||KRu||Y for all u and some c > 0, one proceeds analogously to remark 3.31 to obtain the bound (17) with α replaced by min{α1, α2}. By analogy, for Sf (v) = KL(v, f) being the Kullback–Leibler discrepancy, an a priori estimate of the type (18) follows.

Moreover, it is possible to control w* up to Pk1−1 whenever (α1TVk1 △ α2TVk2)(u*) = α1TVk1(u* − w*) + α2TVk2(w*). Let, in the following, Cf ⩾ 0 be an a priori estimate for the optimal functional value, for instance, Cf = ½||f||Y² in case of Sf(v) = ½||v − f||Y², and Cf = KL(Ku0, f) + (α1TVk1 △ α2TVk2)(u0) for a u0 ∈ BVk1(Ω) with KL(Ku0, f) < ∞ in case of Sf(v) = KL(v, f). Further, denoting by C̃ > 0 a constant such that TVk1(u) ⩽ C̃(||u||1 + min{α1, α2}^{−1}(α1TVk1 △ α2TVk2)(u)) for all u ∈ BVk1(Ω) (which exists by virtue of the norm equivalence in theorem 4.9), we see that TVk1(u*) ⩽ C̃(|Ω|^{1/p}||u*||p + min{α1, α2}^{−1}Cf), hence

Consequently, we obtain the bound

Equation (33)

which gives an a priori estimate when plugging in the already-obtained bound on ||u*||p. Moreover, this estimate implies a bound on ||w* − Rw*||p by the Poincaré–Wirtinger inequality. However, the norm of w* cannot fully be controlled, since adding an element of Pk1−1 = ker(TVk1) to w* would still realise the infimum in the infimal convolution. Thus, an estimate of the type (33) is the best one can expect in the considered setting.

Denoising performance. Figure 4 shows that it is indeed beneficial for denoising to regularise with α1TV△α2TV2 compared to pure TV-regularisation: higher-order features as well as edges are recognised by this image model. Nevertheless, staircase artefacts are still present, see figure 4(c). Essentially, this does not change when the second-order component of the infimal convolution is replaced, for instance by ${{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ as in remark 3.33, see figure 4(d). (For the latter penalty functional, basically the same problems as the ones mentioned in remark 3.33 appear.)
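
Analogously to the additive sketch given after the discussion of figure 3, the following Python/CVXPY fragment illustrates the discrete infimal-convolution denoising problem by carrying the decomposition u = u1 + u2 explicitly; the anisotropic discretisation and the generic solver call are again purely illustrative choices.

```python
import cvxpy as cp

def denoise_infconv(f, alpha1=0.1, alpha2=0.1):
    """Discrete sketch of  min_{u1,u2} 1/2||u1 + u2 - f||^2 + alpha1 TV(u1) + alpha2 TV^2(u2),
    i.e. denoising with the infimal convolution of alpha1 TV and alpha2 TV^2."""
    dx = lambda U: U[1:, :] - U[:-1, :]          # forward differences
    dy = lambda U: U[:, 1:] - U[:, :-1]
    U1, U2 = cp.Variable(f.shape), cp.Variable(f.shape)
    tv_first = cp.sum(cp.abs(dx(U1))) + cp.sum(cp.abs(dy(U1)))
    tv_second = (cp.sum(cp.abs(dx(dx(U2)))) + cp.sum(cp.abs(dy(dy(U2))))
                 + 2 * cp.sum(cp.abs(dy(dx(U2)))))
    objective = 0.5 * cp.sum_squares(U1 + U2 - f) + alpha1 * tv_first + alpha2 * tv_second
    cp.Problem(cp.Minimize(objective)).solve()
    return U1.value + U2.value, U1.value, U2.value
```

Shifting a constant between U1 and U2 leaves the objective unchanged, which reflects the non-uniqueness of the decomposition noted at the beginning of this subsection.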


Figure 4. Infimal-convolution denoising example. (a) Noisy image (PSNR: 13.9 dB), (b) regularisation with TV (PSNR: 29.3 dB), (c) regularisation with α1TV△α2TV2 (PSNR: 28.9 dB), (d) regularisation with ${\alpha }_{1}\mathrm{T}\mathrm{V}{\triangle}{\alpha }_{2}{{\Vert}{\Delta}\cdot {\Vert}}_{\mathcal{M}}$ (PSNR: 28.6 dB). All parameters were manually tuned via grid search to give highest PSNR with respect to the ground truth (figure 1(a)).


5. Total generalised variation (TGV)

5.1. Basic concepts

As a motivation for TGV, consider the formal predual ball associated with the infimal convolution ${\mathcal{R}}_{\alpha }={\alpha }_{1}\mathrm{T}\mathrm{V}{\triangle}{\alpha }_{0}{\mathrm{T}\mathrm{V}}^{2}$ for α = (α0, α1), α0, α1 > 0. Then

and

Neglecting the closure for a moment, this leads to the predual ball according to

Equation (34)

Each φ ∈ B possesses a representation as an L∞-bounded first- and second-order divergence of some φ1 and φ2. However, as the kernel of the divergence is non-trivial (and even infinite-dimensional for d ⩾ 2), we can only conclude that φ1 = div φ2 + η for some η with div η = 0. Enforcing η = 0 thus gives the set

which leads, interpreted as a predual ball, to a seminorm which also incorporates first- and second-order derivatives but is different from infimal convolution: the total generalised variation [37].

There is also a primal version of this motivation via the (TV–TV2)-infimal convolution, which reads as follows: writing

(α1TV △ α0TV2)(u) = inf_{v ∈ BV2(Ω)}  α1‖∇u − ∇v‖ℳ + α0‖ℰ(∇v)‖ℳ,

we see that the infimal convolution allows one to subtract a vector field w = ∇v from the derivative of u at the cost of penalising its derivative ∇w = ℰw, where the equality is due to the symmetry of the weak Hessian ∇2v. While, by the embedding BD(Ω, Rd) ↪ Ld/(d−1)(Ω, Rd), necessarily w ∈ BD(Ω, Rd), it is not arbitrary among such functions but still restricted to be the gradient of some v ∈ BV2(Ω). Omitting this additional constraint (in the predual version above, this corresponds to enforcing η = 0), we arrive at

u ↦ min_{w ∈ BD(Ω, Rd)}  α1‖∇u − w‖ℳ + α0‖ℰw‖ℳ,    (35)

which, as will be shown in this section, is an equivalent formulation of the TGV functional.

Definition 5.1. Let Ω ⊂ Rd be a domain, k ⩾ 1 and α0, ..., αk−1 > 0. Then, the total generalised variation of order k with weight α for $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega}\right)$ is defined as the value of the functional

TGVαk(u) = sup { ∫Ω u divk φ dx : φ ∈ Cck(Ω, Symk(Rd)), ‖divi φ‖∞ ⩽ αi for i = 0, ..., k − 1 },    (36)

which takes the value ∞ in case the respective set is unbounded from above.

For symmetric tensors $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ of order l ⩾ 0, the total generalised variation is given by

TGVαk,l(u) = sup { ∫Ω u · divk φ dx : φ ∈ Cck(Ω, Symk+l(Rd)), ‖divi φ‖∞ ⩽ αi for i = 0, ..., k − 1 }.    (37)

The space

BGVαk(Ω, Syml(Rd)) = { u ∈ L1(Ω, Syml(Rd)) : TGVαk,l(u) < ∞ }

is called the space of symmetric tensor fields of bounded generalised variation of order k with weight α. The special case l = 0 is denoted by BGVαk(Ω).

Remark 5.2. For k = 1 and α > 0, the definition coincides, up to a factor, with the total deformation of symmetric tensor fields of order l, i.e., ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{1,l}=\alpha \mathrm{T}\mathrm{D}$, in particular ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{1,0}={\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{1}=\alpha \mathrm{T}\mathrm{V}$. Hence, we can identify the spaces ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)=\mathrm{B}\mathrm{D}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$. In particular, ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{1}\left({\Omega}\right)=\mathrm{B}\mathrm{V}\left({\Omega}\right)$.

In the following, we will derive some basic properties of the total generalised variation.

Proposition 5.3. The following basic properties hold:

  • (a)  
    TGVαk,l is a lower semi-continuous seminorm on Lp(Ω, Syml(Rd)) for each p ∈ [1, ∞].
  • (b)  
    The kernel satisfies $\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\right)=\mathrm{ker}\left({\mathrm{T}\mathrm{D}}^{k}\right)$ for the kth order total deformation for symmetric tensor fields of order l. In particular, $\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\right)$ is a finite-dimensional subspace of polynomials of order less than k + l. For l = 0, we have $\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\right)={\mathbf{P}}^{k-1}$.
  • (c)  
    ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is a Banach space independent of α.

Proof. Observe that ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ is the seminorm associated with the predual ball

Equation (38)

By definition, each element of ${\mathcal{B}}_{{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}}$ can be associated with an element of the dual space of Lp (Ω, Syml (Rd )), so ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ is convex and lower semi-continuous as pointwise supremum over a set of linear and continuous functionals. The positive homogeneity finally follows from $\lambda {\mathcal{B}}_{{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}}\subset {\mathcal{B}}_{{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}}$ for each $\left\vert \lambda \right\vert {\leqslant}1$.

The statement about the kernel of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ is a consequence of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\left(u\right)=0$ if and only if ⟨u, divk φ⟩ = 0 for each $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{k+l}\left({\mathbf{R}}^{d}\right)\right)$ (compare with proposition 3.21).

Finally, ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ is a Banach space by lemma 3.15. The equivalence for parameter sets α0, ..., αk−1 > 0 and ${\tilde {\alpha }}_{0},\dots ,{\tilde {\alpha }}_{k-1}{ >}0$ can be seen as follows. Choosing C > 0 large enough, we can achieve that

This implies

Interchanging roles we get ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\tilde {\alpha }}^{k,l}{\leqslant}C{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$, so the spaces ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ and ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\tilde {\alpha }}^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ have equivalent norms. □

Remark 5.4. As ${\mathrm{B}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ are all equivalent for different α, we will drop, in the following, the subscript α.

Proposition 5.5. The scalar total generalised variation, i.e., ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ possesses the following invariance and scaling properties:

  • (a)  
    TGVαk is translation invariant, i.e. for x0 ∈ Rd and u ∈ BGVk(Ω) we have that ũ given by ũ(x) = u(x + x0) is in BGVk(Ω − x0) and TGVαk(ũ) = TGVαk(u),
  • (b)  
    TGVαk is rotationally invariant, i.e. for each orthonormal matrix O ∈ Rd×d and u ∈ BGVk(Ω) we have, defining ũ(x) = u(Ox), that ũ ∈ BGVk(OTΩ) with TGVαk(ũ) = TGVαk(u),
  • (c)  
    For r > 0 and u ∈ BGVk(Ω), we have, defining ũ(x) = u(rx), that ũ ∈ BGVk(r^{−1}Ω) with

Proof. See [37]. □

The derivative versus the symmetrised derivative. In both ways to motivate the second-order TGV functional as presented at the beginning of this section, we see that symmetric tensor fields and a symmetrised derivative appear naturally in the penalisation of higher-order derivatives. Indeed, in the motivation via the predual ball B of the infimal convolution, symmetric tensor fields (resulting in a symmetrised derivative in the primal version) appear as the most economic way to write the predual ball, since for φ ∈ Cc2(Ω, T2(Rd)) one has div2 φ = div2(sym φ), i.e., only the symmetric part of φ contributes to the second-order divergence. In the primal version, the symmetrised derivative results from writing ∇2v = ℰ∇v and then relaxing ∇v to be an arbitrary vector field w. Nevertheless, also non-symmetric tensor fields and the equality ∇2v = ∇(∇v) could have been used in these motivations. For the TGV functional, this would have resulted in a primal version of second-order TGV according to

u ↦ min_{w ∈ BV(Ω, Rd)}  α1‖∇u − w‖ℳ + α0‖∇w‖ℳ,

which is genuinely different from the definition in (35). The following example provides some insight into the differences between using the derivative and the symmetrised derivative of vector fields of bounded variation in a Radon-norm penalty.

Example 5.6. On ${\Omega}=\left\{\left({x}_{1},{x}_{2}\right)\in {\mathbf{R}}^{2}\enspace \vert \enspace {x}_{1}^{2}+{x}_{2}^{2}{< }\frac{1}{4}\right\}$ define, for given $\nu ,n\in {\mathcal{S}}^{1}$ (the unit sphere in R2),

where n^⊥ = (n2, −n1). Then, w ∈ BV(Ω, R2) and, with L = {λn^⊥ | λ ∈ ]−1/2, 1/2[},

A direct computation shows that

thus, the symmetrised derivative depends on the angle of the vector field relative to the jump set, while the derivative does not. In particular, whenever ν2 = 0 such that the vector field can be written as the gradient of a function in BV2(Ω), the two notions coincide. See figure 5 for a visualisation of w for different values of ν.
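
To make the comparison quantitative, here is a hedged reconstruction of the computation under the assumption that w equals ν on the part of Ω on one side of L and vanishes on the other (the normalisation in the original example may differ). Both distributional derivatives then concentrate on L, with densities ν ⊗ n and (ν ⊗ n + n ⊗ ν)/2 (up to sign) with respect to H1⌊L, so that, using H1(L) = 1,

‖∇w‖ℳ = |ν ⊗ n| H1(L) = 1,    ‖ℰw‖ℳ = |(ν ⊗ n + n ⊗ ν)/2| H1(L) = √((1 + (ν·n)^2)/2).

The first value is independent of ν, whereas the second equals 1 when ν is parallel to n, i.e. when w is a gradient field, and drops to 1/√2 when ν ⊥ n, in line with the angle dependence stated above.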


Figure 5. Visualisation of the function w of example 5.6 and values of ${\Vert}\nabla w{{\Vert}}_{\mathcal{M}}$ and ${\Vert}\mathcal{E}w{{\Vert}}_{\mathcal{M}}$ for different choices of ν. The blue lines show the level lines of a function v such that w = ∇v.


5.2. Functional analytic and regularisation properties

We would like to characterise TGVαk,l in terms of a minimisation problem. This characterisation will be based on Fenchel–Rockafellar duality. Here, the following theorem from [8] is employed. Recall that the domain of a function F : X → ]−∞, ∞] is defined as dom(F) = {x ∈ X | F(x) < ∞}.

Theorem 5.7. Let X, Y be Banach spaces and Λ : XY linear and continuous. Let $F:X\to \left.\right]- \infty ,\infty \left.\right]$ and $G:Y\to \left.\right]- \infty ,\infty \left.\right]$ be proper, convex and lower semi-continuous. Assume that

Equation (39)

Then,

Equation (40)

In particular, the maximum on the right-hand side is attained.

As a preparation for employing this duality result, we note:

Lemma 5.8. Let l ⩾ 0, i ⩾ 1 and ${w}_{i-1}\in {\mathcal{C}}_{0}^{i-1}{\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i-1}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}$, ${w}_{i}\in {\mathcal{C}}_{0}^{i}{\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}$ be distributions of order i − 1 and i, respectively. Then,

Equation (41)

with the right-hand side being finite if and only if $\mathcal{E}{w}_{i-1}-{w}_{i}\in \mathcal{M}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$ in the distributional sense.

Proof. Note that in the distributional sense, $\langle {w}_{i}-\mathcal{E}{w}_{i-1},\enspace \varphi \rangle =\langle {w}_{i-1},\enspace \mathrm{d}\mathrm{i}\mathrm{v}\enspace \varphi \rangle +\langle {w}_{i},\enspace \varphi \rangle $ for all $\varphi \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$. Since ${\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$ is dense in ${\mathcal{C}}_{0}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$, the distribution ${w}_{i}-\mathcal{E}{w}_{i-1}$ can be extended to an element in ${\mathcal{C}}_{0}{\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}=\mathcal{M}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l+i}\left({\mathbf{R}}^{d}\right)\right)$ if and only if the supremum in (41) is finite. In case of finiteness, it coincides with the Radon norm by definition. □

This enables us to derive the problem which is dual to the maximisation problem in (37). We will refer to the resulting problem as the minimum representation of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$.

Theorem 5.9. For k ⩾ 1, l ⩾ 0, Ω a bounded Lipschitz domain and ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ according to (37), we have for each uL1(Ω, Syml (Rd )):

TGVαk,l(u) = min { Σ_{i=1}^{k} αk−i ‖ℰwi−1 − wi‖ℳ : wi ∈ BD(Ω, Syml+i(Rd)) for i = 1, ..., k − 1, w0 = u, wk = 0 }    (42)

with the minimum being finite if and only if u ∈ BD(Ω, Syml ( R d )) and attained for some w0, ..., wk where wi ∈ BD(Ω, Syml+i ( R d )) for i = 0, ..., k and w0 = u as well as wk = 0 in case of u ∈ BD(Ω, Syml ( R d )).

Proof. First, take $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$ such that ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\left(u\right){< }\infty $. We will employ Fenchel–Rockafellar duality. For this purpose, introduce the Banach spaces

the linear operator

and the proper, convex and lower semi-continuous functionals

With these choices, the identity

follows from the definition in (37).

In order to show the representation of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}\left(u\right)$ as in (42), we would like to obtain

Equation (43)

This follows as soon as (39) is verified. For the purpose of showing (39), let ψ ∈ Y and define backwards recursively: φk = 0 ∈ C0k(Ω, Symk+l(Rd)), φi = ψi − div φi+1 ∈ C0i(Ω, Symi+l(Rd)) for i = k − 1, ..., 1. Hence, φ ∈ X and −Λφ = ψ. Moreover, choosing λ > 0 large enough, one can achieve that ‖λ^{−1}φi‖∞ ⩽ αk−i for all i = 1, ..., k, so λ^{−1}φ ∈ dom(F) and, since 0 ∈ dom(G), we get the representation ψ = λ(0 − Λλ^{−1}φ). Thus, the identity (43) holds and the minimum is attained in Y*. Now, Y* can be written as

with elements w = (w1, ..., wk−1), ${w}_{i}\in {\mathcal{C}}_{0}^{i}{\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{i+l}\left({\mathbf{R}}^{d}\right)\right)}^{{\ast}}$, for 1 ⩽ ik − 1. Therefore, with w0 = u and wk = 0 we get, as G* = 0, that

From lemma 5.8 we obtain that each supremum is finite and coincides with ‖ℰwi−1 − wi‖ℳ if and only if ℰwi−1 − wi ∈ ℳ(Ω, Syml+i(Rd)) for i = 1, ..., k. Then, as wk = 0, according to theorem 3.16, this already yields wk−1 ∈ BD(Ω, Symk+l−1(Rd)), in particular wk−1 ∈ ℳ(Ω, Symk+l−1(Rd)). Proceeding inductively, we see that wi ∈ BD(Ω, Syml+i(Rd)) for each i = 0, ..., k. Hence, it suffices to take the minimum in (43) over all BD-tensor fields, which gives (42).

In addition, the minimum in (42) is finite if u ∈ BD(Ω, Syml(Rd)). Conversely, if TD(u) = ∞, then also ‖ℰw0 − w1‖ℳ = ∞ for all w1 ∈ BD(Ω, Syml+1(Rd)). Hence, the minimum in (42) has to be ∞. □

Remark 5.10. In the scalar case, i.e., l = 0 it holds that

Equation (44)

Remark 5.11. The minimum representation also allows TGVαk,l to be defined recursively:

TGVαk+1,l(u) = min_{w ∈ BD(Ω, Syml+1(Rd))}  αk‖ℰu − w‖ℳ + TGVα'k,l+1(w),    (45)

where α' = (α0, ..., αk−1) if α = (α0, ..., αk).

Remark 5.12. For the scalar TGVα2, the minimum representation reads as

TGVα2(u) = min_{w ∈ BD(Ω, Rd)}  α1‖∇u − w‖ℳ + α0‖ℰw‖ℳ.

This can be interpreted as follows. For u ∈ BV(Ω), ∇u is a measure which can be decomposed into a regular and a singular component with respect to the Lebesgue measure. The singular part is always penalised with the Radon norm, while from the regular part an optimal vector field w of bounded deformation is extracted. This vector field is penalised by TD which, like TV, implies certain regularity but also allows for jumps. Thus, ℰw essentially contains the second-order derivative information of u.

Provided that w is optimal, the total generalised variation of second order then penalises the first-order remainder ∇u − w, which essentially contains the jumps of u, as well as the second-order information ℰw.
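
A hedged discrete illustration of this minimum representation (again Python/CVXPY with anisotropic forward differences, chosen for brevity rather than for fidelity to the algorithms discussed later in this review): the image u and the vector field w = (w1, w2) are optimised jointly, with ∇u − w weighted by α1 and the symmetrised Jacobian ℰw by α0.

```python
import cvxpy as cp

def denoise_tgv2(f, alpha1=0.1, alpha0=0.2):
    """Discrete sketch of  min_{u,w} 1/2||u - f||^2 + alpha1 ||grad u - w||_1 + alpha0 ||E w||_1,
    an anisotropic finite-difference counterpart of second-order TGV denoising."""
    dx = lambda U: U[1:, :-1] - U[:-1, :-1]      # forward differences restricted to a
    dy = lambda U: U[:-1, 1:] - U[:-1, :-1]      # common (n-1) x (m-1) grid
    n, m = f.shape
    U = cp.Variable((n, m))
    W1 = cp.Variable((n - 1, m - 1))             # first component of the field w
    W2 = cp.Variable((n - 1, m - 1))             # second component of the field w
    first_order = cp.sum(cp.abs(dx(U) - W1)) + cp.sum(cp.abs(dy(U) - W2))
    # symmetrised Jacobian of w: two diagonal entries plus twice the symmetrised off-diagonal
    sym_grad = (cp.sum(cp.abs(dx(W1))) + cp.sum(cp.abs(dy(W2)))
                + 2 * cp.sum(cp.abs(0.5 * (dy(W1) + dx(W2)))))
    objective = 0.5 * cp.sum_squares(U - f) + alpha1 * first_order + alpha0 * sym_grad
    cp.Problem(cp.Minimize(objective)).solve()
    return U.value
```

Fixing w = 0 recovers plain TV denoising with weight α1; letting the solver choose w jointly with u realises the decomposition of ∇u described in the remark.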

The next step is to examine the spaces BGVk(Ω, Syml(Rd)). Our aim is to prove that these spaces coincide with BD(Ω, Syml(Rd)) for fixed l ⩾ 0 and all k ⩾ 1. We will proceed inductively with respect to k and hence vary k and l but leave Ω fixed, assuming throughout that Ω is a bounded Lipschitz domain. For what follows, we choose a family of projection operators onto the kernel of TGVαk,l, which equals ker(ℰk) (see proposition 5.3).

Definition 5.13. For each k ⩾ 1 and l ⩾ 0, denote by ${R}_{k,l}:{L}^{d/\left(d-1\right)}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)\to \mathrm{ker}\left({\mathcal{E}}^{k}\right)$ a linear and continuous projection.

As $\mathrm{ker}\left({\mathcal{E}}^{k}\right)$ (on Ω and for symmetric tensor fields of order l) is finite-dimensional, such a Rk,l always exists but is not necessarily unique. A coercivity estimate in Ld/(d−1)(Ω, Syml (Rd )) for ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}$ will next be formulated and proven in terms of these projections. As we will see, the induction step in the proof requires an intermediate estimate as follows.

Lemma 5.14. For each k ⩾ 1 and l ⩾ 0 there exists a constant C > 0, only depending on Ω, k and l such that for each u ∈ BD(Ω, Syml (Rd )) and wLd/(d−1)(Ω, Syml+1(Rd )),

Proof. If this is not true for some k and l, then there exist {un } in BD(Ω, Syml (Rd )) and {wn } in Ld/(d−1)(Ω, Syml+1(Rd )) such that

This implies that {Rk,l+1wn} is bounded in terms of ‖⋅‖ℳ in the finite-dimensional space ker(TGVαk,l+1) = ker(ℰk), see proposition 5.3. Consequently, there exists a subsequence, again denoted by {wn}, such that Rk,l+1wn → w as n → ∞ with respect to ||⋅||1. Hence, ℰun → w as n → ∞. Further, we have that un → 0 as n → ∞ and thus, by closedness of the weak symmetrised gradient, ℰun → 0 as n → ∞ in ℳ(Ω, Syml+1(Rd)), which contradicts ‖ℰun‖ℳ = 1 for all n. □

Proposition 5.15. For each k ⩾ 1 and l ⩾ 0, there exists a constant C > 0 such that

Equation (46)

Equation (47)

for all u ∈ BD(Ω, Syml (Rd )).

Proof. We prove the result by induction with respect to k. In the case k = 1 and l ⩾ 0 arbitrary, the first inequality is immediate while the second is equivalent to the Sobolev–Korn inequality in BD(Ω, Syml (Rd )), see theorem 3.18.

Now assume that both inequalities hold for a fixed k and each l ⩾ 0 and perform an induction step with respect to k, i.e., we fix l ∈ N, α = (α0, ..., αk) with αi > 0 for i = 0, ..., k. We assume that assertion (46) holds for α' = (α0, ..., αk−1) and any l' ∈ N.

We will first show the uniform estimate for ${\Vert}\mathcal{E}u{{\Vert}}_{\mathcal{M}}$ for which it suffices to consider u ∈ BD(Ω, Syml (Rd )), as otherwise, according to theorem 5.9, ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k+1,l}\left(u\right)=\infty $. Then, with the projection Rk,l+1, the help of lemma 5.14, the continuous embeddings

and the induction hypothesis, we can estimate for arbitrary w ∈ BD(Ω, Syml+1(Rd )),

for C > 0 suitable generic constants. Taking the minimum over all such w ∈ BD(Ω, Syml+1(Rd )) then yields

by virtue of the recursive minimum representation (45).

The coercivity estimate can be shown analogously to proposition 3.22 and corollary 3.23. First, assume that the inequality does not hold true for α = (1, ..., 1). Then, there is a sequence {un } in Ld/(d−1)(Ω, Syml (Rd )) such that

By $\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k+1,l}\right)=\mathrm{r}\mathrm{g}\left({R}_{k+1,l}\right)$, we have ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k+1,l}\left({u}^{n}-{R}_{k+1,l}{u}^{n}\right)={\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k+1,l}\left({u}^{n}\right)$ for each n. Thus, since we already know the first estimate in (46) to hold,

Equation (48)

implying, by continuous embedding, that {un − Rk+1,l un} is bounded in BD(Ω, Syml(Rd)). By compact embedding (see theorem 3.17), we may therefore conclude that un − Rk+1,l un → u in L1(Ω, Syml(Rd)) for some subsequence (not relabelled). Moreover, as Rk+1,l(un − Rk+1,l un) = 0 for all n, the limit has to satisfy Rk+1,l u = 0. On the other hand, by lower semi-continuity (see proposition 5.3),

hence u ∈ ker(ℰk+1) = rg(Rk+1,l). Consequently, limn→∞ un − Rk+1,l un = u = Rk+1,l u = 0. From (48) it follows that also ℰ(un − Rk+1,l un) → 0 in ℳ(Ω, Syml+1(Rd)), so un − Rk+1,l un → 0 in BD(Ω, Syml(Rd)) and by continuous embedding also in Ld/(d−1)(Ω, Syml(Rd)). However, this contradicts ||un − Rk+1,l un||d/(d−1) = 1 for all n, and thus, the claimed coercivity holds for the particular choice α = (1, ..., 1). The result for general α then follows from monotonicity of TGVαk+1,l with respect to each component of α. □

Corollary 5.16. For k ⩾ 1 and l ⩾ 0 there exist C, c > 0 such that for all u ∈ BD(Ω, Syml (Rd )) we have

Equation (49)

In particular, BGVk (Ω, Syml (Rd )) = BD(Ω, Syml (Rd )) in the sense of Banach space isomorphy.

Proof. The estimate on the right is a consequence of (46) while the estimate on the left follows by the minimum representation (42) which gives ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k,l}{\leqslant}{\alpha }_{k-1}\mathrm{T}\mathrm{D}$. □

Tikhonov regularisation. Once again, the second estimate in proposition 5.15 is crucial to transfer the well-posedness result of theorem 3.26 as follows.

Proposition 5.17. With X = Lp (Ω), $p\in \left.\right]1,\infty \left[\right.$, Ω being a bounded Lipschitz domain, Y a Banach space, K : X → Y linear and continuous, Sf : Y → [0, ∞] proper, convex, lower semi-continuous and coercive, k ⩾ 1, α = (α0, ..., αk−1) with αi > 0 for i = 0, ..., k − 1, the Tikhonov minimisation problem

Equation (50)

is well-posed in the sense of theorem 3.26 whenever p ⩽ d/(d − 1) if d > 1.

Regarding the assumptions of theorem 3.26 on the kernel of the seminorm and the constraint on the exponent p in the underlying Lp -space, we see that, as one would expect, TGVk resembles the situation of the infimal convolution of TV-type functionals rather than their sum; in particular, the constraint p ⩽ d/(d − 1) is the same as with first-order TV regularisation.

This is also true for the following convergence result, which should be compared to the results of theorems 4.4 and 4.12 for the sum and the infimal convolution of higher-order TV functionals, respectively. Here, similar as with the infimal convolution, we extend ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ to weights in $ \left.\right]0,\infty \left.\right]$ by using the minimum representation and defining ${\alpha }_{i}{{\Vert}\cdot {\Vert}}_{\mathcal{M}}={\mathcal{I}}_{\left\{0\right\}}$ for αi = .

Theorem 5.18. In the situation of proposition 5.17 and $p\in \left.\right]1,\infty \left[\right.$  with p ⩽ d/(d − 1) if d > 1, let for each δ > 0 the data fδ be such that ${S}_{{f}^{\delta }}\left({f}^{{\dagger}}\right){\leqslant}\delta $, and let the discrepancy functionals $\left\{{S}_{{f}^{\delta }}\right\}$ be equi-coercive and converge to ${S}_{{f}^{{\dagger}}}$ for some data f† in Y in the sense of (4) and ${S}_{{f}^{{\dagger}}}\left(v\right)=0$ if and only if v = f†.

Choose the parameters α = (α0, ..., αk−1) in dependence of δ such that

and assume that $\left({\tilde {\alpha }}_{0},\dots ,{\tilde {\alpha }}_{k-1}\right)=\left({\alpha }_{0},\dots ,{\alpha }_{k-1}\right)/\mathrm{min}\left\{{\alpha }_{0},\dots ,{\alpha }_{k-1}\right\}\to \left({\alpha }_{0}^{{\dagger}},\dots ,{\alpha }_{k-1}^{{\dagger}}\right)\in \left.\right]0,\infty \left.\right]^{k}$ as δ → 0. Set

and assume that there exists u0 ∈ BVm (Ω) such that Ku0 = f†.

Then, up to shifts in ker(K) ∩ Pk−1, any sequence {uα,δ }, with each uα,δ being a solution to (50) with parameters (α0, ..., αk−1) and data fδ , has at least one Lp -weak accumulation point. Each Lp -weak accumulation point is a minimum-${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }^{{\dagger}}}^{k}$-solution of Ku = f† and ${\mathrm{lim}}_{\delta \to 0}{\mathrm{T}\mathrm{G}\mathrm{V}}_{\tilde {\alpha }}^{k}\left({u}^{\alpha ,\delta }\right)={\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }^{{\dagger}}}^{k}\left({u}^{{\dagger}}\right)$.

Proof. The proof is analogous to the one of [32, theorem 4.8], which considers the case ${S}_{f}\left(v\right)=\frac{1}{q}{\Vert}v-f{{\Vert}}_{Y}^{q}$ for $q\in \left[\right.1,\infty \left[\right. $. Alternatively, one can proceed along the lines of the proof of theorem 4.12 with the infimal convolution replaced by TGV to obtain the result. □

Remark 5.19. (parameter choice). Similar to the infimal convolution of TV functionals, also for TGV, the minimum of the involved parameters determines the overall regularisation, while the ratios between the minimum and the individual parameters reflect the regularity assumption on the ground-truth data. In practice, this suggests a parameter choice as (α0, ..., αk−1) = α(λ0, ..., λk−1), where again α > 0 is chosen in dependence of the noise level and (λ0, ..., λk−1) with min{λ0, ..., λk−1} = 1 is fixed once for a given class of image data and then left constant, independently of the concrete forward model or the noise level.

A priori estimates. In case of Hilbert-space data and quadratic norm discrepancy, i.e., ${S}_{f}\left(v\right)=\frac{1}{2}{{\Vert}v-f{\Vert}}_{Y}^{2}$ for Y a Hilbert space, one can, in the situation of proposition 5.17, once again find an a priori bound thanks to the coercivity estimate (47). Let C > 0 be a constant such that ${{\Vert}u-Ru{\Vert}}_{p}{\leqslant}C\mathrm{min}{\left\{{\alpha }_{0},\dots ,{\alpha }_{k-1}\right\}}^{-1}{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left(u\right)$ for a linear and continuous projection operator R onto Pk−1 for all u ∈ BV(Ω). Further, assume that K is injective on Pk−1 and c > 0 is chosen such that c||Ru||p ⩽ ||KRu||Y for all u ∈ Lp (Ω). Then, for a solution u* of the minimisation problem

the norm ${{\Vert}{u}^{{\ast}}{\Vert}}_{p}$ obeys the a priori estimate (17) with α replaced by min{α0, ..., αk−1}. Also here, if the discrepancy is replaced by the Kullback–Leibler discrepancy Sf (v) = KL(v, f), then ${{\Vert}{u}^{{\ast}}{\Vert}}_{p}$ can be estimated analogously in terms of (18). Let, again, Cf ⩾ 0 be an a priori estimate for the optimal functional value, analogous to the Cf that leads to (33). Moreover, analogous to the multi-order infimal-convolution case in subsection 4.2, it is possible to estimate each tuple $\left({w}_{1}^{{\ast}},\dots ,{w}_{k-1}^{{\ast}}\right)$ that realises the minimum in the primal representation (42) of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({u}^{{\ast}}\right)$. Now, in order to estimate, for instance, ${{\Vert}{w}_{1}^{{\ast}}{\Vert}}_{1}$, set ${w}_{0}^{{\ast}}={u}^{{\ast}}$ and note that we already have the bound ${{\Vert}{w}_{0}^{{\ast}}{\Vert}}_{1}{\leqslant}{\left\vert {\Omega}\right\vert }^{1/p}{{\Vert}{u}^{{\ast}}{\Vert}}_{p}$ where (17) or (18) provides an a-priori estimate of the right-hand side. Choosing a C1 > 0 such that ${{\Vert}\mathcal{E}{w}_{0}{\Vert}}_{\mathcal{M}}{\leqslant}{C}_{1}\left({{\Vert}{w}_{0}{\Vert}}_{1}+\mathrm{min}{\left\{{\alpha }_{0},\dots ,{\alpha }_{k-1}\right\}}^{-1}{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}\left({w}_{0}\right)\right)$ for all w0 ∈ BD(Ω, Sym0(Rd )) = BV(Ω), we obtain ${{\Vert}\mathcal{E}{w}_{0}^{{\ast}}{\Vert}}_{\mathcal{M}}{\leqslant}{C}_{1}\left({{\Vert}{w}_{0}^{{\ast}}{\Vert}}_{1}+\mathrm{min}{\left\{{\alpha }_{0},\dots ,{\alpha }_{k-1}\right\}}^{-1}{C}_{f}\right)$ and, consequently,

We thus obtain the bound

Equation (51)

which is similar to (33), but involves a norm and not a seminorm due to the structure of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$. Using this line of argumentation, one can now inductively obtain bounds on w2, ..., wk−1 according to

Equation (52)

for i = 1, ..., k − 1, where each Ci > 0 is a constant such that ${{\Vert}\mathcal{E}{w}_{i-1}{\Vert}}_{\mathcal{M}}{\leqslant}{C}_{i}\left({{\Vert}{w}_{i-1}{\Vert}}_{1}+\mathrm{min}{\left\{{\alpha }_{0},\dots ,{\alpha }_{k-i}\right\}}^{-1}{\mathrm{T}\mathrm{G}\mathrm{V}}_{\left({\alpha }_{0},\dots ,{\alpha }_{k-i}\right)}^{k-i+1}\left({w}_{i-1}\right)\right)$ for all wi−1 ∈ BD(Ω, Symi−1(Rd )), whose existence is guaranteed by proposition 5.15. This provides an a-priori estimate for u* and ${w}_{1}^{{\ast}},\dots ,{w}_{k-1}^{{\ast}}$.

Denoising performance. In figure 6, one can see how second-order TGV regularisation (figure 6(d)) performs in comparison to first-order TV (figure 6(b)) and α1TV△α2TV2 (figure 6(c)) as regulariser for image denoising. It is apparent that TGV covers higher-order features more accurately than the associated infimal-convolution regulariser with the staircase effect being absent, while at the same time, jump discontinuities are preserved as for first-order TV. This is in particular reflected in the underlying function space for TGV being BV(Ω), see corollary 5.16. In conclusion, the total generalised variation can be seen as an adequate model for piecewise smooth images and will, in the following, be the preferred regulariser for this class of functions.


Figure 6. Total generalised variation denoising example. (a) Noisy image (PSNR: 13.9 dB), (b) regularisation with TV (PSNR: 29.3 dB), (c) regularisation with α1TV△α2TV2 (PSNR: 28.9 dB), (d) regularisation with ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ (PSNR: 30.4 dB). All parameters were manually tuned via grid search to give highest PSNR with respect to the ground truth (figure 1(a)).


5.3. Extensions

TGV for multichannel images. Again, in analogy to TV and higher-order TV, TGV can also be extended to colour and multichannel images represented by functions mapping into the vector space Rm by testing with ${\mathrm{S}\mathrm{y}\mathrm{m}}^{k}{\left({\mathbf{R}}^{d}\right)}^{m}$-valued tensor fields. This requires defining pointwise norms on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{l}{\left({\mathbf{R}}^{d}\right)}^{m}$ for l = 1, ..., k where, apart from the standard Frobenius norm, one can take any norm $\vert \cdot {\vert }_{{{\circ}}_{l}}$ on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{l}{\left({\mathbf{R}}^{d}\right)}^{m}$, noting that different norms imply different types of coupling of the multiple channels. With each $\vert \cdot {\vert }_{{{\ast}}_{l}}$ denoting the dual norm of $\vert \cdot {\vert }_{{{\circ}}_{l}}$, ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ can be extended to functions $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega},{\mathbf{R}}^{m}\right)$ as

Equation (53)

where ${{\Vert}\psi {\Vert}}_{\infty ,{{\ast}}_{l}}$ is the pointwise supremum of the scalar function $x{\mapsto}\vert \psi \left(x\right){\vert }_{{{\ast}}_{l}}$ on Ω for $\psi \in {\mathcal{C}}_{\mathrm{c}}\left({\Omega},{\mathrm{S}\mathrm{y}\mathrm{m}}^{l}\left({\mathbf{R}}^{d}\right)\right)$. As before, by equivalence of norms in finite dimensions, the functional-analytic and regularisation properties of TGV transfer to its multichannel extension, see e.g. [27, 33]. Rotational invariance holds whenever all tensor norms $\vert \cdot {\vert }_{{{\ast}}_{l}}$ are unitarily invariant. For k = 2, particular instances that are unitarily invariant can be constructed by choosing ${\left\vert \cdot \right\vert }_{{{\ast}}_{1}}$ as a unitarily invariant matrix norm and ${\left\vert \cdot \right\vert }_{{{\ast}}_{2}}$ as either the Frobenius tensor norm or ${\left\vert \xi \right\vert }_{{{\ast}}_{2}}={\sum }_{i=1}^{m}{\left\vert {\xi }_{i}\right\vert }_{{{\ast}}_{1}}$, i.e., a decoupled norm. This allows, for instance, to penalise the nuclear norm of first-order derivatives and the Frobenius tensor norm of the second-order component, as it was done, e.g., in [123].
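To make the effect of the pointwise coupling concrete, the following Python sketch (our own minimal illustration and not code from [123]; the simple forward-difference discretisation and all names are our choices) evaluates a discretised multichannel first-order term with the nuclear norm as pointwise coupling, i.e., the sum over all pixels of the nuclear norm of the m × 2 Jacobian. Replacing the singular-value sum by the Frobenius norm or by a channel-wise sum recovers the other couplings mentioned above.

```python
import numpy as np

def forward_diff(u):
    """Forward differences with zero extension at the boundary.
    u: (m, N1, N2) multichannel image; returns the (m, 2, N1, N2) Jacobian."""
    du = np.zeros(u.shape[:1] + (2,) + u.shape[1:])
    du[:, 0, :-1, :] = u[:, 1:, :] - u[:, :-1, :]
    du[:, 1, :, :-1] = u[:, :, 1:] - u[:, :, :-1]
    return du

def nuclear_norm_tv(u):
    """Sum over all pixels of the nuclear norm of the (m x 2) Jacobian matrix."""
    J = forward_diff(u)
    m, d, N1, N2 = J.shape
    A = J.reshape(m, d, -1).transpose(2, 0, 1)      # one (m x 2) matrix per pixel
    s = np.linalg.svd(A, compute_uv=False)          # singular values per pixel
    return s.sum()

u = np.random.rand(3, 64, 64)                       # a random RGB test image
print(nuclear_norm_tv(u))
```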

Infimal-convolution TGV. Beyond the realisation of different couplings of multiple colour channels, the extension to arbitrary pointwise tensor norms in the definition of TGV can also be beneficial in the context of scalar-valued functions. In [109], the infimal convolution of different TGV functionals with different, anisotropic norms was considered in the context of dynamic data as well as anisotropic regularisation for still images. With ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{i}}^{{k}_{i}}$ for i = 1, ..., n denoting TGV functionals according to (53) for m = 1 of order ki and each βi denoting a tuple of pointwise norms, the functional ${\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\beta }^{n}$ can be defined for $u\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}\left({\Omega}\right)$ as

As shown in [109], this functional is equivalent to ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ for k = max{ki } and α any parameter vector, and, in case ${\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\beta }^{n}\left(u\right){< }\infty $, the minimum is attained for ui ∈ Ld/(d−1)(Ω) for i = 1, ..., n − 1. Hence, the coercivity estimate on ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{k}$ transfers to ${\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\beta }^{n}$ and again, all results in the context of Tikhonov regularisation apply.

For applications in the context of dynamic data, the norms for the different ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{i}}^{{k}_{i}}$ can be chosen to realise different weightings of spatial and temporal derivatives. This allows, in a convex setting, for an adaptive regularisation of video data via a motion-dependent separation into different components, see, for instance, figure 7.


Figure 7. Frame of an image sequence showing a juggler (left), and three frames of a decomposition into components capturing slow (top right images) and fast (bottom right images) motion that was achieved with ICTGV regularisation. Reprinted from [35] by permission from Springer Nature Customer Service Centre GmbH: Springer © 2015.


Similarly, for still image regularisation, one can choose ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{1}}^{{k}_{1}}$ to employ isotropic norms and correspond to the usual total generalised variation, and each ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{i}}^{{k}_{i}}$ for i = 2, ..., n to employ different anisotropic norms that favour one particular direction. This yields again an adaptive regularisation of image data via a decomposition into an isotropic and several anisotropic parts and can be employed, for instance, to recover certain line structures for denoising [109] or applications in CT imaging [125].

Oscillation TGV and its infimal convolution. The total generalised variation model can also be extended to account for functions with piecewise oscillatory behaviour, which is, for instance, useful to model texture in images [90]. The basic idea to include oscillations is to fix a direction ω ∈ Rd , ω ≠ 0 and to modify the definition of second-order TGV such that its kernel corresponds to oscillatory functions in the $\omega /\left\vert \omega \right\vert $-direction with frequency $\left\vert \omega \right\vert $:

where, as before, α = (α0, α1), α0, α1 > 0. Indeed, the kernel of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha ,\omega }^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}$ is spanned by the functions x ↦ sin(x · ω) and x ↦ cos(x · ω). Further, the functional is proper, convex and lower semi-continuous in each Lp (Ω), and admits the minimum representation

With ${R}_{\omega }:{L}^{d/\left(d-1\right)}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha ,\omega }^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\right)$ a linear and continuous projection, a coercivity estimate holds as follows:

for all u ∈ BV(Ω), see [90]. The functional can therefore be used as a regulariser in all cases where TV is applicable.

In order to obtain a texture-aware image model, one can now take the infimal convolution of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{0}}^{2}$ with parameter vector ${\alpha }_{0}\in { \left.\right]0,\infty \left[\right. }^{2}$ and ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{i},{\omega }_{i}}^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}$ for parameter vectors ${\alpha }_{1},\dots ,{\alpha }_{n}\in {\left.\right]0,\infty \left[\right. }^{2}$ and directions ω1, ..., ωn ∈ Rd with ωi ≠ 0 for i = 1, ..., n, i.e.,

which again yields a proper, convex and lower semi-continuous regulariser on each Lp (Ω) which is coercive in the sense that ${{\Vert}u-Ru{\Vert}}_{d/\left(d-1\right)}{\leqslant}C{\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha ,\omega }^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\left(u\right)$ for a linear and continuous projection $R:{L}^{d/\left(d-1\right)}\left({\Omega}\right)\to \mathrm{ker}\left({\mathrm{I}\mathrm{C}\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha ,\omega }^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\right)=\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{0}}^{2}\right)+\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{1},{\omega }_{1}}^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\right)+\cdots +\mathrm{ker}\left({\mathrm{T}\mathrm{G}\mathrm{V}}_{{\alpha }_{n},{\omega }_{n}}^{\mathrm{o}\mathrm{s}\mathrm{c}\mathrm{i}}\right)$, see again [90]. It is therefore again applicable as a regulariser for inverse problems whenever TV is applicable. See figure 8 for an example of ICTGVosci-based denoising and its benefits for capturing and reconstructing textured regions.


Figure 8. Example of ICTGVosci denoising. In the top row, the whole image is depicted, while a closeup of the respective marked region is shown in the bottom row. (a) A noisy image (PSNR: 26.0 dB). (b) Results of TGV2-denoising (PSNR: 34.8 dB). (c) Results of ICTGVosci denoising (PSNR: 36.6 dB). Parameters were manually optimised via grid search towards best peak signal-to-noise ratio (PSNR) with respect to the ground truth (not shown). Figure taken from [90]. Copyright © 2018 Society for Industrial and Applied Mathematics. Reprinted with permission. All rights reserved.


TGV for manifold-valued data. In different applications in inverse problems and imaging, the data of interest takes values not in a vector space but rather in a non-linear space such as a manifold. Examples are sphere-valued data in synthetic aperture radar (SAR) imaging or data with values in the space of positive matrices, equipped with the Fisher–Rao metric, which is used in diffusion tensor imaging. Motivated by such applications, TV regularisation has been extended to cope with manifold-valued data, using different approaches and numerical algorithms [68, 98, 129, 192]. A rather simple extension of TV for discrete and finite, univariate signals ${\left({u}_{i}\right)}_{i}$ living in a complete Riemannian manifold $\mathcal{M}\subset {\mathbf{R}}^{d}$ with metric ${d}_{\mathcal{M}}$ is given as

For this setting, and an extension to bivariate signals, the work [192] provides simple numerical algorithms which yield, in case $\mathcal{M}$ is a Hadamard space, globally optimal solutions of variational TV denoising for manifold-valued data. While this allows in particular to extend edge-preserving regularisation to non-linear geometric data, it can again be observed that TV regularisation has a tendency towards piecewise constant solutions with artificial jump discontinuities. To overcome this, different works have proposed extensions of this approach to higher-order TV [10], the (TV–TV2)-infimal convolution [14, 15] and second-order TGV [15, 36]. Here we briefly sketch the main underlying ideas, presented in [36], for an extension of TGV to manifold-valued data. For simplicity, we consider only the case of univariate signals ${\left({u}_{i}\right)}_{i}$ and assume that length-minimising geodesics are unique (see [36] for the general case and details on the involved differential-geometric concepts).
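As a concrete illustration, the following Python sketch (our own example, not code from [192]) evaluates this discrete TV for a univariate signal with values on the unit sphere ${\mathcal{S}}^{2}$, using the geodesic distance ${d}_{\mathcal{M}}\left(u,v\right)=\mathrm{arccos}\left(\langle u,v\rangle \right)$.

```python
import numpy as np

def sphere_dist(u, v):
    """Geodesic (arc-length) distance between points on the unit sphere S^2."""
    return np.arccos(np.clip(np.sum(u * v, axis=-1), -1.0, 1.0))

def manifold_tv(u):
    """Discrete TV of a univariate S^2-valued signal u of shape (n, 3):
    the sum of geodesic distances between consecutive samples."""
    return np.sum(sphere_dist(u[:-1], u[1:]))

# a short test signal: points moving along a great circle
t = np.linspace(0.0, np.pi / 2, 20)
u = np.stack([np.cos(t), np.sin(t), np.zeros_like(t)], axis=-1)
print(manifold_tv(u))   # equals pi/2 for this path along a great circle
```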

From the continuous perspective, a natural approach to extend TGV for manifold-valued data, at least in a smooth setting, would be to use tangent spaces for first-order derivatives and, for the second-order term, invoke a connection on the manifold for the differentiation of vector fields. In contrast to that, the motivation for the definition of TGV as in [36] was to exploit a discrete setting in order to avoid high-level differential-geometric concepts but rather to come up with a definition of TGV that can be written only in terms of the distance function on the manifold. To this aim, we identify tangential vectors $v\in {T}_{a}\mathcal{M}$, with ${T}_{a}\mathcal{M}$ denoting the tangent space at a point $a\in \mathcal{M}$, with point tuples [a, b] via the exponential map b = expa (v). A discrete gradient operator then maps a signal ${\left({u}_{i}\right)}_{i}$ to a sequence of point-tuples ${\left(\left[{u}_{i},{u}_{i+1}\right]\right)}_{i}$, where we regard (∇u)i = [ui , ui+1], which generalises first-order differences in vector spaces, since in this case, ${\mathrm{exp}}_{{u}_{i}}\left({u}_{i+1}-{u}_{i}\right)={u}_{i+1}$. Vector fields whose base points are ${\left({u}_{i}\right)}_{i}$ can then be identified with a sequence ${\left(\left[{u}_{i},{y}_{i}\right]\right)}_{i}$ with each ${y}_{i}\in \mathcal{M}$ and, assuming $D:{\mathcal{M}}^{2}{\times}{\mathcal{M}}^{2}\to \left[\right.0,\infty \left[\right.$ to be an appropriate distance-type function for such tuples, an extension of second-order TGV can be given as

The difficulty here is in particular how to define D for two point tuples with different base points, as those represent vectors in different tangent spaces. To overcome this, a variant for D as proposed in [36] uses the Schild's ladder [119] construction as a discrete approximation of the parallel transport of vector fields between different tangent spaces. In order to describe this construction, denote by [u,v]t for $u,v\in \mathcal{M}$ and t ∈ R the point reached at time t after travelling on the geodesic from u to v, i.e., [u,v]t = expu (t logu (v)), where log is the inverse exponential map. Then, the parallel transport of [u, v] (which represents ${\mathrm{log}}_{u}\left(v\right)\in {T}_{u}\mathcal{M}$) to the base point $x\in \mathcal{M}$ is approximated by [x, y'] where ${y}^{\prime }={\left[u,{\left[x,v\right]}_{\frac{1}{2}}\right]}_{2}$ (which represents ${\mathrm{log}}_{x}\left({y}^{\prime }\right)\in {T}_{x}\mathcal{M}$). Using this, a distance on point tuples, denoted by DS , can be given as

Exploiting the fact that ${D}_{S}\left(\left[u,v\right],\left[u,w\right]\right)={d}_{\mathcal{M}}\left(v,w\right)$ for tuples having the same base point, a concrete realisation of discrete second order TGV for manifold-valued data is then given as

The S-TGV denoising problem for ${\left({f}_{i}\right)}_{i}$ some given data with ${f}_{i}\in \mathcal{M}$ then reads as

and a numerical solution (which can only be guaranteed to deliver stationary points due to non-convexity) can be obtained, for instance, using the cyclic proximal point algorithm [9, 36]. Figure 9 shows the results for this setting using both TV and second-order TGV regularisation for the denoising of ${\mathcal{S}}^{2}$-valued image data, which is composed of different blocks of smooth data with sharp interfaces. It can be seen that both TV and TGV are able to recover the sharp interfaces, but TV suffers from piecewise-constancy artefacts which are not present with TGV.
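To make the geodesic operations used in this construction concrete, the following Python sketch (our own illustration for the unit sphere ${\mathcal{S}}^{2}$ under the stated uniqueness assumptions, not code from [36]) implements the exponential and inverse exponential maps, the geodesic points [u, v]t and the Schild's-ladder approximation y' = [u, [x, v]1/2]2 of parallel transport.

```python
import numpy as np

def sphere_exp(a, v):
    """Exponential map on S^2: follow the geodesic from a with tangent vector v (v orthogonal to a)."""
    nv = np.linalg.norm(v)
    return a if nv < 1e-12 else np.cos(nv) * a + np.sin(nv) * (v / nv)

def sphere_log(a, b):
    """Inverse exponential map: the tangent vector at a pointing towards b."""
    p = b - np.dot(a, b) * a                 # project b onto the tangent space at a
    norm_p = np.linalg.norm(p)
    if norm_p < 1e-12:
        return np.zeros(3)
    return np.arccos(np.clip(np.dot(a, b), -1.0, 1.0)) * (p / norm_p)

def geo(u, v, t):
    """The point [u, v]_t = exp_u(t * log_u(v)) on the geodesic from u to v."""
    return sphere_exp(u, t * sphere_log(u, v))

def schilds_ladder(u, v, x):
    """Schild's ladder: y' = [u, [x, v]_{1/2}]_2 approximates the parallel transport
    of the tangent vector represented by [u, v] to the base point x."""
    return geo(u, geo(x, v, 0.5), 2.0)

u = np.array([1.0, 0.0, 0.0])
v = sphere_exp(u, np.array([0.0, 0.2, 0.0]))     # a point close to u
x = sphere_exp(u, np.array([0.0, 0.0, 0.5]))     # base point to transport to
print(sphere_log(x, schilds_ladder(u, v, x)))    # approximately (0, 0.2, 0)
```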


Figure 9. Example of variational denoising for manifold-valued data. The images show noisy ${\mathcal{S}}^{2}$-valued data (left) which is denoised with TV-regulariser (middle) and TGV-regulariser (right). The sphere ${\mathcal{S}}^{2}$ is colour-coded with hue and value representing the longitude and latitude, respectively. All parameters were optimised via grid search towards the best result with respect to a suitable distance measure for manifold-valued data, see [36] for details.


Image-driven TGV. In case of denoising problems, i.e., f ∈ Lp (Ω), second-order TGV can be modified to incorporate directional information obtained from f, resulting in image-driven TGV (ITGV) [152]. The latter is defined by introducing a diffusion tensor field into the functional:

where $D:\overline{{\Omega}}\to {\mathrm{S}\mathrm{y}\mathrm{m}}^{2}\left({\mathbf{R}}^{d}\right)$ is assumed to be continuous and positive semi-definite in each point. Denoting by fσ = f*Gσ a smoothed version of the data f obtained by convolution with a Gaussian kernel Gσ of variance σ > 0 and suitable extension outside of Ω, the diffusion tensor field D may be chosen according to

with parameters γ > 0 and β > 0. If the smallest eigenvalue of D is uniformly bounded away from 0 in $\overline{{\Omega}}$, then ITGV admits the same functional-analytic and regularisation properties as second-order TGV. We refer to [152] for an application and numerical results regarding this regularisation approach in stereo estimation.

Non-local TGV. The concept of non-local total variation (NLTV) [94] can also be transferred to the total generalised variation. Recall that instead of taking the derivative, non-local total variation penalises the differences of the function values of u for each pair of points by virtue of a weight function:

where the weight function a : Ω × Ω → [0, ∞] is measurable and a.e. bounded from below by a positive constant. We note that, alternatively, the weight function a may also be chosen as a(x, y) = |x − y|−(θ+d) with θ ∈ ]0, 1[ such that low-order Sobolev–Slobodeckij seminorms can be realised [76]. In the context of non-local total variation, a allows one to incorporate a priori information for the image to reconstruct. For instance, if one already knows disjoint segments Ω1, ..., Ωn where the solution is piecewise constant, one can set

where c1 ≫ c0 > 0. This way, the difference between two function values of u in Ωi is forced to 0, meaning u is constant in Ωi .
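As a small illustration of this construction (a minimal sketch under our own simplified assumptions: a flattened, one-dimensional pixel set and densely stored weights, which is only feasible for small problems), the following Python snippet builds such segment-based weights and evaluates the resulting discrete non-local TV.

```python
import numpy as np

def segment_weights(labels, c0, c1):
    """Weight a(x, y): c1 if x and y belong to the same segment, c0 otherwise.
    labels: (N,) integer segment labels of the flattened pixels."""
    same = labels[:, None] == labels[None, :]
    return np.where(same, c1, c0)

def nltv(u, a):
    """Discrete non-local TV: weighted sum of all pairwise differences of u."""
    return np.sum(a * np.abs(u[:, None] - u[None, :]))

labels = np.array([0, 0, 0, 1, 1])
u = np.array([1.0, 1.1, 0.9, 5.0, 5.2])
print(nltv(u, segment_weights(labels, c0=0.01, c1=10.0)))
```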

Non-local total generalised variation now gives the possibility to enforce piecewise linearity of u in the segments by incorporating the vector field w corresponding to the slope of the linear part in a non-local cascade. This results in

with two weight functions a0, a1 : Ω × Ω → [0, ∞], again measurable and bounded a.e. away from zero [154]. In analogy to NLTV, a priori information on, for instance, disjoint segments where the sought solution is piecewise linear, allows one to choose weight functions such that the associated NLTGV2 regulariser properly reflects this information. See figure 10 for a denoising example where non-local TGV turns out to be beneficial, in particular in the regions near the jump discontinuities of the sought solution.


Figure 10. Example for non-local total generalised variation denoising. (a) A noisy piecewise linear image. (b) Results of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$-denoising together with a surface plot of its graph. (c) Results of non-local TGV-denoising together with a surface plot of its graph. Images taken from [154]. Reprinted by permission from Springer Nature.


6. Numerical algorithms

Tikhonov regularisation with higher-order total variation, its combination via addition or infimal convolution, as well as total generalised variation poses a non-smooth optimisation problem in an appropriate Lebesgue function space. In practice, these minimisation problems are discretised and solved by optimisation algorithms that exploit the structure of the discrete problem. While there are many possibilities for a discretisation of the considered regularisation functionals as well as for numerical optimisation, most of the algorithms that can be found in the literature are based on finite-difference discretisations and first-order proximal optimisation methods. In the following, we provide an overview of the building blocks necessary to solve the considered Tikhonov functional minimisation problems numerically. We will exemplarily discuss the derivation of respective algorithms on the basis of the popular primal-dual algorithm with extragradient [60] and, as an alternative, briefly address implicit and preconditioned optimisation methods.

6.1. Discretisation of higher-order TV functionals

We discretise the discussed functionals in 2D; higher dimensions follow by analogy. Moreover, for the sake of simplicity, we assume a rectangular domain, i.e., ${\Omega}=\left.\right]0,{N}_{1}\left[\right.{\times}\left.\right]0,{N}_{2}\left[\right.\subset {\mathbf{R}}^{2}$ for some positive N1, N2 ∈ N. A generalisation to non-rectangular domains will be straightforward.

Following essentially the presentation in [37] we first replace Ω by the discretised grid

One consistent way of discretising higher-order derivatives is to define partial derivatives as follows: a discrete partial derivative takes the difference between two neighbouring elements in the grid with respect to a specified axis. This difference is associated with the midpoint between the two grid elements, resulting in staggered grids. For a finite sequence of directions p ∈ ⋃k⩾0{1,2}k , this results, on the one hand, in the recursively defined grids

Equation (54)

Note that ${{\Omega}}_{h}^{p}$ does not depend on the order of the pi and one could use multiindices in N2 instead. Likewise, the discrete partial derivatives recursively given by

Equation (55)

yield well-defined functions ${\partial }_{h}^{p}u:{{\Omega}}_{h}^{p}\to \mathbf{R}$ for u : Ωh → R which do not depend on the order of the entries in p. The discrete gradient ${\nabla }_{h}^{k}u$ of order k ⩾ 1 for u : Ωh → R is then the tuple that collects all the partial derivatives of order k:

Note that due to this construction, the partial derivatives ${\partial }_{h}^{p}u$ are generally defined on different grids. However, in order to define the Frobenius norm of ${\nabla }_{h}^{k}$, and, consequently, an ℓ1-type norm, a common grid is needed. There are several possibilities for this task (such as interpolation) which have been studied mainly for the first-order total variation. Here, we discuss a strategy that results in a simple definition of a discrete higher-order total variation. It is based on collecting, for (i, j) ∈ Z2, all the nearby points in the different ${{\Omega}}_{h}^{p}$. This can, for instance, be done by moving half-steps forward and backward in the directions indicated by p ∈ {1,2}k :

Equation (56)

where e1, e2 are the unit vectors in R2. The Frobenius norm in a point (i, j) ∈ Z2 is then given by

Equation (57)

where ${\partial }_{h}^{p}u$ is extended by zero outside of ${{\Omega}}_{h}^{p}$. Note that here, although ${\partial }_{h}^{p}$ does not depend on the order of discrete differentiation, the point (ip , jp ) does. Thus, a different Frobenius norm for the kth discrete derivative would be constituted by symmetrisation, which means symmetrising ${\nabla }_{h}^{k}u$ and taking the Frobenius norm afterwards. In this context, it makes sense to average over the grid points as follows. Denoting by α(p) ∈ N2 the multiindex associated with p ∈ {1,2}k , i.e., α(p)i = #{m|pm = i}, we define

Equation (58)

for (i, j) ∈ Z2 and α ∈ N2, where $\left(\genfrac{}{}{0pt}{}{\left\vert \alpha \right\vert }{\alpha }\right)=\frac{\left({\alpha }_{1}+{\alpha }_{2}\right)!}{{\alpha }_{1}!{\alpha }_{2}!}$. Then, the grid associated with an α ∈ N2 reads as

while the α-component of the symmetrised gradient is given by

where (i, j) ∈ Z2 is chosen such that $\left({i}_{\alpha },{j}_{\alpha }\right)\in {{\Omega}}_{h}^{\alpha }$ and ${\left({\partial }_{h}^{p}u\right)}_{{i}_{p},{j}_{p}}$ is zero for points outside of ${{\Omega}}_{h}^{p}$. This results in the symmetrised derivative as follows:

The Frobenius norm of ${\mathcal{E}}_{h}^{k}u$ in a point (i, j) ∈ Z2 can finally be obtained by

Equation (59)

Remark 6.1. For α ∈ N2 with $\left\vert \alpha \right\vert $ even, we have (iα , jα ) = (i, j) for each (i, j) ∈ Z2. Indeed, for $p\in {\left\{1,2\right\}}^{\left\vert \alpha \right\vert }$ and the reversed tuple $\overline{p}=\left({p}_{\left\vert \alpha \right\vert },\dots ,{p}_{1}\right)$ it holds $\alpha \left(p\right)=\alpha \left(\overline{p}\right)$. Further, either $p=\overline{p}$ leading to (ip , jp ) = (i, j) or $p\ne \overline{p}$ leading to $\left({i}_{p},{j}_{p}\right)+\left({i}_{\overline{p}},{j}_{\overline{p}}\right)=2\left(i,j\right)$. Consequently, (iα , jα ) = (i, j) according to the definition. In other words, the symmetrisation of the discrete gradient is a natural way of aligning the different grids ${{\Omega}}_{h}^{p}$ to a common grid in this case.

For $\left\vert \alpha \right\vert $ odd, the grid points still do not align. However, we can say that for (i, j), the point (iα , jα ) lies on the line connecting $\left(i+\frac{1}{2},j\right)$ and $\left(i,j+\frac{1}{2}\right)$. Indeed, for $p\in {\left\{1,2\right\}}^{\left\vert \alpha \right\vert }$ with α(p) = α we can consider $\overline{p}=\left({p}_{\left\vert \alpha \right\vert -1},\dots ,{p}_{1},{p}_{\left\vert \alpha \right\vert }\right)$. If $p=\overline{p}$, then (ip , jp ) is either $\left(i+\frac{1}{2},j\right)$ or $\left(i,j+\frac{1}{2}\right)$. In the case $p\ne \overline{p}$, the point $\frac{1}{2}\left({i}_{p},{j}_{p}\right)+\frac{1}{2}\left({i}_{\overline{p}},{j}_{\overline{p}}\right)$ is either $\left(i+\frac{1}{2},j\right)$ or $\left(i,j+\frac{1}{2}\right)$. As (iα , jα ) is a convex combination of such points, it lies on the line connecting $\left(i+\frac{1}{2},j\right)$ and $\left(i,j+\frac{1}{2}\right)$. Hence, the symmetrisation of the gradient leads to more localised grid points.

We now have everything at hand to define two versions of a discrete total variation of arbitrary order.

Definition 6.2. Let k ∈ N, k ⩾ 1, be a differentiation order. Then, for u : Ωh → R, the discrete total variation is defined as

with ${\left\vert {\nabla }_{h}^{k}u\right\vert }_{i,j}$ according to (57), and discrete total variation for the symmetrised gradient is defined as

with ${\left\vert {\mathcal{E}}_{h}^{k}u\right\vert }_{i,j}$ according to (59).
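As a minimal illustration (our own simplified sketch: all partial derivatives are computed by forward differences, extended by zero and, deviating from the staggered-grid bookkeeping above, simply evaluated on the common grid Ωh , so the grid-alignment and symmetrisation subtleties are ignored), the discrete kth-order total variation can be evaluated as follows.

```python
import numpy as np

def partial_h(u, axis):
    """Forward difference along the given axis; the last slice is set to zero,
    mimicking the zero extension of the derivative outside its grid."""
    du = np.zeros_like(u)
    lo = [slice(None)] * u.ndim
    hi = [slice(None)] * u.ndim
    lo[axis], hi[axis] = slice(0, -1), slice(1, None)
    du[tuple(lo)] = u[tuple(hi)] - u[tuple(lo)]
    return du

def grad_k(u, k):
    """All k-th order partial derivatives of u (shape (N1, N2)),
    stacked along a leading axis of length 2**k."""
    comps = [u]
    for _ in range(k):
        comps = [partial_h(c, ax) for c in comps for ax in (0, 1)]
    return np.stack(comps)

def tv_k(u, k):
    """Discrete k-th order TV: l1-norm of the pointwise Frobenius norm."""
    return np.sum(np.sqrt(np.sum(grad_k(u, k) ** 2, axis=0)))

u = np.random.rand(64, 64)
print(tv_k(u, 1), tv_k(u, 2))   # discrete TV and TV^2 of a random test image
```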

In order to define a discrete version of the total generalised variation, we still need to discuss the discretisation of the total deformation for discrete symmetric tensor fields. For this purpose, we say that the components of a discrete symmetric tensor field of order l ∈ N live on the grids ${{\Omega}}_{h}^{\alpha }$, resulting in

realising a discrete symmetric tensor field of order l. Its Frobenius norm is given in the points (i, j) ∈ Z2 according to

Equation (60)

which is compatible with (59) if one plugs in ${\mathcal{E}}_{h}^{l}u$ for some u : Ωh → R. The partial derivative of order k described by p ∈ {1,2}k applied to uα is then also given by (55), but acts on the grid ${{\Omega}}_{h}^{\alpha }$ and results in a discrete function on the grid ${{\Omega}}_{h}^{\alpha ,p}$ which is given in analogy to (54) by replacing Ωh with ${{\Omega}}_{h}^{\alpha }$. The symmetrised derivative ${\mathcal{E}}_{h}^{k}u$, whose components are indexed by β ∈ N2, $\left\vert \beta \right\vert =k+l$, is then defined in a point $\left({i}_{\beta },{j}_{\beta }\right)\in {{\Omega}}_{h}^{\beta }$ where (i, j) ∈ Z2 by

Equation (61)

where

This is sufficient to define a discrete total deformation.

Definition 6.3. Let k, l ∈ N, k ⩾ 1 and l ⩾ 0. Then, for $u={\left({u}_{\alpha }\right)}_{\alpha \in {\mathbf{N}}^{2},\left\vert \alpha \right\vert =l}$, the discrete total deformation of order k is defined as

with ${\mathcal{E}}_{h}^{k}u$ according to (61) and ${\left\vert \cdot \right\vert }_{i,j}$ according to (60).

For the sake of completeness, the respective definitions for non-symmetric tensor fields of order l read as

Equation (62)

and the pth component, p ∈ {1,2}k+l , of the discrete gradient of order k is given as

Equation (63)

For numerical algorithms, it is necessary to write the discrete total variation and total deformation as the one-norm of a (symmetric) tensor with respect to the respective discrete differentiation operator. We therefore introduce the underlying spaces.

Definition 6.4. Let l ∈ N and q ∈ [1, ∞]. The ℓq -space of discrete l-tensors on Ωh is given by

with ${{\Omega}}_{h}^{p}$ according to (54), and norm

with pointwise norm according to (62). The space ${\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{l}\left({\mathbf{R}}^{2}\right)\right)$ is equipped with the scalar product

Analogously, the ℓq -space of discrete symmetric l-tensors on Ωh is defined as

with an analogous norm using (60) as pointwise norm. The scalar product on ℓ2(Ωh , Syml (R2)) is given by

For k ∈ N, equation (61) then defines a linear operator mapping

and (63) induces a linear operator mapping

The norm of these operators can easily be estimated:

Lemma 6.5. We have ${\Vert}{\nabla }_{h}^{k}{\Vert}{\leqslant}{8}^{k/2}$ and ${\Vert}{\mathcal{E}}_{h}^{k}{\Vert}{\leqslant}{8}^{k/2}$ independent of l.

Proof. As ${\nabla }_{h}^{k}={\nabla }_{h}\cdots {\nabla }_{h}$ on the respective discrete tensor fields, it is sufficient to prove the statement for k = 1 and l arbitrary. For this purpose, observe that for $u:{{\Omega}}_{h}^{p}\to \mathbf{R}$, p ∈ {1,2}l , we have

and an analogous estimate for ${{\Vert}{\partial }_{h}^{\left(2,{p}_{l},\dots ,{p}_{1}\right)}u{\Vert}}_{2}^{2}$. Consequently, ${{\Vert}{\nabla }_{h}u{\Vert}}_{2}^{2}{\leqslant}8{{\Vert}u{\Vert}}_{2}^{2}$ for such u. If $u\in {\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{l}\left({\mathbf{R}}^{2}\right)\right)$, then

so the claim follows.

For the symmetrised gradient, it is possible to pursue the same strategy since ${\mathcal{E}}_{h}^{k}={\mathcal{E}}_{h}\cdots {\mathcal{E}}_{h}$ on the respective discrete symmetric tensor fields. Indeed, with the Cauchy–Schwarz inequality and Vandermonde's identity (which reduces to the standard recurrence relation for binomial coefficients in most cases), one obtains

It is then easy to see that ${{\Vert}{\mathcal{E}}_{h}u{\Vert}}_{2}^{2}{\leqslant}{\sum }_{p=1}^{2}{{\Vert}{\partial }_{h}^{p}u{\Vert}}_{2}^{2}{\leqslant}8{{\Vert}u{\Vert}}_{2}^{2}$. □
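The bound of lemma 6.5 can also be checked numerically. The following Python sketch (our own illustration) implements the first-order forward-difference gradient with zero extension together with its negative adjoint, a discrete divergence, and estimates ${\Vert}{\nabla }_{h}{\Vert}$ by power iteration applied to $-{\mathrm{d}\mathrm{i}\mathrm{v}}_{h}{\nabla }_{h}$; on a finite grid the estimate stays below $\sqrt{8}\approx 2.83$ and approaches this value as the grid is refined.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero extension at the boundary."""
    du = np.zeros((2,) + u.shape)
    du[0, :-1, :] = u[1:, :] - u[:-1, :]
    du[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return du

def div(p):
    """Discrete divergence, the negative adjoint of grad: <grad u, p> = -<u, div p>."""
    dv = np.zeros(p.shape[1:])
    dv[:-1, :] += p[0, :-1, :]; dv[1:, :] -= p[0, :-1, :]
    dv[:, :-1] += p[1, :, :-1]; dv[:, 1:] -= p[1, :, :-1]
    return dv

# power iteration on -div(grad(.)) to estimate the largest singular value of grad
u = np.random.rand(64, 64)
for _ in range(500):
    v = -div(grad(u))
    u = v / np.linalg.norm(v)
print(np.sqrt(np.sum(grad(u) ** 2)))   # estimate of ||grad_h||, below sqrt(8) ~ 2.83
```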

Remark 6.6. For p ∈ {1,2}l and p0 ∈ {1, 2}, consider the discrete partial derivative ${\partial }_{h}^{{p}_{0}}:{{\Omega}}_{h}^{p}\to {{\Omega}}_{h}^{\left({p}_{0},p\right)}$ and its negative adjoint ${\partial }_{h,0}^{{p}_{0}}$, i.e., $\langle {\partial }_{h}^{{p}_{0}}u,\enspace v\rangle =-\langle u,\enspace {\partial }_{h,0}^{{p}_{0}}v\rangle $ for $u:{{\Omega}}_{h}^{p}\to \mathbf{R}$, $v:{{\Omega}}_{h}^{\left({p}_{0},p\right)}\to \mathbf{R}$. For $u:{{\Omega}}_{h}^{\left({p}_{0},p\right)}\to \mathbf{R}$, this results in

for $\left({i}_{p},{j}_{p}\right)\in {{\Omega}}_{h}^{p}$, where u is extended by 0 outside of ${{\Omega}}_{h}^{\left({p}_{0},p\right)}$. (In contrast, ${\partial }_{h}^{{p}_{0}}u$ is only defined for $\left({i}_{p},{j}_{p}\right)\in {{\Omega}}_{h}^{\left({p}_{0},{p}_{0},p\right)}$.)

Consequently, the negative adjoint of the discrete gradient induces a divergence for discrete tensor fields $u\in {\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{l+1}\left({\mathbf{R}}^{2}\right)\right)$ such that ${\mathrm{d}\mathrm{i}\mathrm{v}}_{h}u\in {\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{l}\left({\mathbf{R}}^{2}\right)\right)$ and

for p ∈ {1,2}l . For the discrete divergence that arises as the negative adjoint of the symmetrised gradient on ℓ2(Ωh , Syml (Rd )), one has to take the symmetrisation into account: for u ∈ ℓ2(Ωh , Syml+1(R2)), we have

Here, the operators ${\partial }_{h,0}^{{p}_{0}}$ act on functions on the grid ${{\Omega}}_{h}^{\alpha +\alpha \left({p}_{0}\right)}$ and yield functions on the grid ${{\Omega}}_{h}^{\alpha }$. Note that in the grid point (iα , jα ), these partial derivatives have to be evaluated in the grid points $\left({i}_{\alpha +{e}_{1}}+\frac{1}{2}{\left(-1\right)}^{l},{j}_{\alpha +{e}_{1}}\right)$ and $\left({i}_{\alpha +{e}_{2}},{j}_{\alpha +{e}_{2}}+\frac{1}{2}{\left(-1\right)}^{l}\right)$, respectively. This way, the discrete divergence operator is consistently defined.

Remark 6.7. While the above approach provides a way of discretising higher-order regularisation approaches with finite differences up to an arbitrary order of differentiation, fundamental questions of numerical analysis such as consistency and stability of such discretisations, and consequently, of the existence of error bounds that converge to zero as the discretisation level becomes finer, were not addressed. In view of the non-smooth minimisation problems associated with regularisation approaches in imaging, a first approach to answer such questions is typically to consider convergence of minimisers of the discrete energies to minimisers of the continuous counterpart, e.g., by ensuring gamma-convergence. While for first-order TV regularisation convergence of minimisers is known [57], an extension of such results to higher-order approaches still seems to be open. With respect to error bounds, the situation is similar and we refer to [191] for results on finite-difference discretisations of first-order TV in certain Lipschitz spaces.

6.2. A general saddle-point framework

Having appropriately discretised versions of higher-order regularisation functionals available, we now deal with the numerical solution of corresponding Tikhonov approaches for inverse problems. To this aim, we first consider a general framework and then derive concrete realisations for different regularisation approaches.

Let Ωh be the discretised grid of subsection 6.1 and define Uh = ℓ2(Ωh ). We assume a discrete linear forward operator Kh : Uh → Yh , with $\left({Y}_{h},{\Vert}\cdot {{\Vert}}_{{Y}_{h}}\right)$ a finite-dimensional Hilbert space, and a proper, convex, lower semi-continuous and coercive discrepancy term ${S}_{{f}_{h}}:{Y}_{h}\to \left[0,\infty \right]$ with corresponding discrete data fh to be given. Further, we define ${\mathcal{R}}_{\alpha }:{U}_{h}\to \left[0,\infty \right]$ to be a regularisation functional given in a general form as ${\mathcal{R}}_{\alpha }\left(u\right)={\mathrm{min}}_{w\in {W}_{h}}{\Vert}{D}_{h}\left(u,w\right){{\Vert}}_{1,\alpha }$, with Dh : Uh × Wh → Vh , $\left(u,w\right){\mapsto}{D}_{h}^{1}u+{D}_{h}^{2}w$ a discrete differential operator and Wh , Vh finite-dimensional Hilbert spaces. The expression ||⋅||1,α here denotes an appropriate ℓ1-type norm weighted using the parameters α and will be specified later for concrete examples. Its dual norm is denoted by ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$. We consider the general minimisation problem

Equation (64)

for which we will numerically solve the equivalent reformulation

Equation (65)

Remark 6.8. Note that here, the auxiliary variable w and the space Wh are used to include balancing-type regularisation approaches such as the infimal convolution of two functionals. Setting, for example, Wh = Uh , ${V}_{h}={\ell }^{2}\left({\Omega},{\mathcal{T}}^{1}\left({\mathbf{R}}^{2}\right)\right){\times}{\ell }^{2}\left({\Omega},{\mathcal{T}}^{2}\left({\mathbf{R}}^{2}\right)\right)$, ${D}_{h}\left(u,w\right)=\left({\nabla }_{h}u-{\nabla }_{h}w,{\nabla }_{h}^{2}w\right)$ and ||(v1, v2)||1,α = α1||v1||1 + α2||v2||1 for positive α = (α1, α2) yields

Total-variation regularisation can, on the other hand, be obtained by choosing Wh = {0}, ${V}_{h}={\ell }^{2}\left({{\Omega}}_{h},{\mathcal{T}}^{1}\left({\mathbf{R}}^{2}\right)\right)$, Dh (u, 0) = ∇h u and ||v||1,α = α||v||1 for α > 0.
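The two instances of remark 6.8 can be spelled out directly. The following Python sketch (our own minimal illustration, again using forward differences with zero extension) implements Dh (u, w) = (∇h u − ∇h w, ∇h2 w) and the weighted norm ||(v1, v2)||1,α = α1||v1||1 + α2||v2||1 underlying the TV–TV2 infimal-convolution model; the TV case corresponds to fixing w = 0 and keeping only the first component.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero extension; u has shape (N1, N2)."""
    du = np.zeros((2,) + u.shape)
    du[0, :-1, :] = u[1:, :] - u[:-1, :]
    du[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return du

def D_h(u, w):
    """D_h(u, w) = (grad u - grad w, grad^2 w) for the TV-TV^2 infimal convolution."""
    return grad(u) - grad(w), np.stack([grad(g) for g in grad(w)])

def norm_1_alpha(v, alpha):
    """||(v1, v2)||_{1,alpha} with pointwise Frobenius norms, cf. definition 6.4."""
    v1, v2 = v
    n1 = np.sum(np.sqrt(np.sum(v1 ** 2, axis=0)))
    n2 = np.sum(np.sqrt(np.sum(v2 ** 2, axis=(0, 1))))
    return alpha[0] * n1 + alpha[1] * n2

u, w = np.random.rand(32, 32), np.random.rand(32, 32)
print(norm_1_alpha(D_h(u, w), alpha=(1.0, 2.0)))
```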

Remark 6.9. The dual norm ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$ will become relevant in the context of primal-dual algorithms via its Fenchel dual. Indeed, we have the identity ${\left({{\Vert}\cdot {\Vert}}_{X}\right)}^{{\ast}}={\mathcal{I}}_{\left\{{{\Vert}\cdot {\Vert}}_{{X}^{{\ast}}}{\leqslant}1\right\}}$ for ||⋅||X the norm of a general Banach space X and ${{\Vert}\cdot {\Vert}}_{{X}^{{\ast}}}$ its dual norm on X*. This is a consequence of $\langle w,\enspace u\rangle -{{\Vert}w{\Vert}}_{{X}^{{\ast}}}{{\Vert}u{\Vert}}_{X}{\leqslant}0$ for all u ∈ X and w ∈ X*, so ${\left({{\Vert}\cdot {\Vert}}_{X}\right)}^{{\ast}}\left(w\right)=0$ for ${{\Vert}w{\Vert}}_{{X}^{{\ast}}}{\leqslant}1$. For ${{\Vert}w{\Vert}}_{{X}^{{\ast}}}{ >}1$ one can find a u ∈ X, ||u||X ⩽ 1 such that, for a c > 0, ⟨w, u⟩ ⩾ 1 + c ⩾ ||u||X + c. For each t > 0, we get ⟨w, tu⟩ − ||tu||X ⩾ tc, hence ${\left({{\Vert}\cdot {\Vert}}_{X}\right)}^{{\ast}}\left(w\right)=\infty $.

Remark 6.10. While the setting of (64) allows to include rather general forward operators Kh and discrepancy terms ${S}_{{f}_{h}}$, it will still not capture all applications of higher-order regularisation that we consider later in subsections 7 and 8. It rather comprises a balance between general applicability and uniform presentation, and we will comment on possible extensions later on such that the interested reader should be able to adapt the setting presented here to the concrete problem setting at hand.

From a more general perspective, the reformulation (65) of (64) constitutes a non-smooth, convex optimisation problem of the form

Equation (66)

with $\mathcal{X},\mathcal{Y}$ Hilbert spaces, $\mathcal{F}:\mathcal{Y}\to \left[0,\infty \right]$, $\mathcal{G}:\mathcal{X}\to \left[0,\infty \right]$ proper, convex and lower semi-continuous functionals and $\mathcal{K}:\mathcal{X}\to \mathcal{Y}$ linear and continuous. For this class of problems, duality-based first-order optimisation algorithms of ascent/descent-type have become very popular in the past years as they are rather generally applicable and yield algorithms for the solution of (66) that provably converge to a global optimum, while allowing a simple implementation and practical stepsize choices, such as constant stepsizes. The algorithm of [56], for instance, constitutes a relatively early step in this direction, as it solves the TV-denoising problem with constant stepsizes in terms of a dual problem.

For problems of the type (66), it is often beneficial to consider a primal-dual saddle-point reformulation instead of the dual problem alone, in particular in view of general applicability. This is given as

Equation (67)

with ${\langle \cdot ,\cdot \rangle }_{\mathcal{Y}}$ denoting the inner product in $\mathcal{Y}$. By interchanging minimum and maximum and minimising with respect to x, one further arrives at the dual problem which reads as

Equation (68)

Under certain conditions, the minimum in (66) and maximum in (68) admit the same value and primal/dual solution pairs for (66) and (68) correspond to solutions of the saddle-point problem (67), see below.

Now, indeed, many different algorithmic approaches for solving (67) are nowadays available (see for instance [41, 42, 60, 61, 124]) and which one of them delivers the best performance typically depends on the concrete problem instance. Here, as exemplary algorithmic framework, we consider the popular primal-dual algorithm of [60] (see also [144, 203]), which has the advantage of being simple and yet rather generally applicable.

Conceptually, the algorithm of [60] solves the saddle-point problem (67) via implicit gradient descent and ascent steps with respect to the primal and dual variables, respectively. With $\mathcal{L}\left(x,y\right)={\langle \mathcal{K}x,y\rangle }_{\mathcal{Y}}+\mathcal{G}\left(x\right)-{\mathcal{F}}^{{\ast}}\left(y\right)$, carrying out these implicit steps simultaneously in both variables would correspond to computing the iterates {(xn , yn )} via

Equation (69)

where ∂x and ∂y denote the subgradients with respect to the first and second variable, respectively, and σ, τ are positive constants. To obtain computationally feasible iterations, the implicit step for yn+1 in the primal-dual algorithm uses an extrapolation ${\overline{x}}^{n}=2{x}^{n}-{x}^{n-1}$ of the previous iterate instead of xn+1, such that the descent and ascent steps decouple and can be re-written as

Equation (70)

The mappings ${\left(\mathrm{i}\mathrm{d}+\sigma \partial {\mathcal{F}}^{{\ast}}\right)}^{-1}$ and ${\left(\mathrm{i}\mathrm{d}+\tau \partial \mathcal{G}\right)}^{-1}$ used here are so-called proximal mappings of ${\mathcal{F}}^{{\ast}}$ and $\mathcal{G}$, respectively, which, as noted in proposition 6.11 below, are well-defined and single-valued whenever $\mathcal{G},{\mathcal{F}}^{{\ast}}$ are proper, convex and lower semi-continuous. The resulting algorithm can then be interpreted as a proximal-point algorithm (see [102, 161]) and weak convergence in the sense that (xn , yn ) ⇀ (x*, y*) for (x*, y*) being a solution to the saddle-point problem (67) can be ensured for positive stepsize choices σ, τ such that $\sigma \tau {\Vert}\mathcal{K}{{\Vert}}^{2}{< }1$, see for instance [61, 143], or [60] for the finite-dimensional case. In contrast, explicit methods for non-smooth optimisation problems such as subgradient descent, for instance, usually require stepsizes that converge to zero [138] and could stagnate numerically.
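To illustrate the iteration (70) in its simplest instance, the following Python sketch (our own minimal example) applies it to TV denoising, i.e., to minimising $u{\mapsto}\alpha {\Vert}{\nabla }_{h}u{{\Vert}}_{1}+\frac{1}{2}{\Vert}u-f{{\Vert}}_{2}^{2}$: the dual update uses the pointwise projection onto the α-ball (see lemma 6.16 below), the primal update uses the proximal mapping of the quadratic discrepancy (see lemma 6.15(a) below), and the stepsizes are chosen such that $\sigma \tau {\Vert}{\nabla }_{h}{{\Vert}}^{2}{< }1$ by lemma 6.5.

```python
import numpy as np

def grad(u):
    """Forward-difference gradient with zero extension."""
    du = np.zeros((2,) + u.shape)
    du[0, :-1, :] = u[1:, :] - u[:-1, :]
    du[1, :, :-1] = u[:, 1:] - u[:, :-1]
    return du

def div(p):
    """Discrete divergence, the negative adjoint of grad."""
    dv = np.zeros(p.shape[1:])
    dv[:-1, :] += p[0, :-1, :]; dv[1:, :] -= p[0, :-1, :]
    dv[:, :-1] += p[1, :, :-1]; dv[:, 1:] -= p[1, :, :-1]
    return dv

def tv_denoise(f, alpha, n_iter=300):
    """Primal-dual iteration (70) for min_u alpha*||grad u||_1 + 0.5*||u - f||_2^2."""
    sigma = tau = 0.35                      # sigma * tau * 8 < 1, cf. lemma 6.5
    u, u_bar = f.copy(), f.copy()
    p = np.zeros((2,) + f.shape)
    for _ in range(n_iter):
        # dual ascent step: projection onto {||.||_{infty,alpha^{-1}} <= 1}
        p = p + sigma * grad(u_bar)
        p = p / np.maximum(1.0, np.sqrt(np.sum(p ** 2, axis=0)) / alpha)
        # primal descent step: prox of the quadratic discrepancy, then extrapolation
        u_old = u
        u = (u + tau * (div(p) + f)) / (1.0 + tau)
        u_bar = 2.0 * u - u_old
    return u

f = np.random.rand(64, 64)                  # some noisy test data
print(np.round(tv_denoise(f, alpha=0.2).std(), 4))
```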

Overall, the efficiency of the iteration steps in (70) crucially depends on the ability to evaluate $\mathcal{K}$ and ${\mathcal{K}}^{{\ast}}$ and to compute the proximal mappings efficiently. Regarding the latter, this is possible for a large class of functionals, in particular for many functionals that are defined pointwise, which is one of the reasons for the high popularity of these kinds of algorithms. We now consider proximal mappings in more detail and provide concrete examples later on.

Proposition 6.11. Let H be a Hilbert space, $F:H\to \left.\right]- \infty ,\infty \left.\right]$ proper, convex and lower semi-continuous, and σ > 0.

  • (a)  
    Then, the mapping
    Equation (71)
    is well-defined.
  • (b)  
    For u ∈ H, u* = proxσF (u) solves the inclusion relation
    i.e., proxσF = (id + σF)−1.
  • (c)  
    The mapping proxσF is Lipschitz-continuous with constant not exceeding 1.

Proof. See, for instance, [178, proposition IV.1.5, corollary IV.1.3], or [13, proposition 12.15, example 23.3, corollary 23.10]. □

In general, the computation of proximal mappings can be as difficult as solving the original optimisation problem itself. However, if, for instance, the corresponding functional can be 'well separated' into simple building blocks, then proximal mappings can be reduced to some basic ones which are simple and easy to compute.

Lemma 6.12. Let H = H1 ⊥⋯⊥ Hn with closed subspaces H1, ..., Hn , the mappings P1, ..., Pn their orthogonal projectors,

with each ${F}_{i}:{H}_{i}\to \left.\right]- \infty ,\infty \left.\right]$ proper, convex and lower semi-continuous. Then,

Proof. This is immediate since the corresponding minimisation problem decouples. □

Furthermore, Moreau's identity (see [160], for instance) provides a relation between the proximal mapping of a function F and the proximal mapping of its dual F* according to

Equation (72)

This immediately implies that for general σ > 0, the computation of (id + σF)−1 is essentially as difficult as the computation of ${\left(\mathrm{i}\mathrm{d}+\sigma \partial {F}^{{\ast}}\right)}^{-1}$, in particular the latter can be obtained from the former as follows.

Lemma 6.13. Let H be a Hilbert space and $F:H\to \left.\right]- \infty ,\infty \left.\right]$ be proper, convex and lower semi-continuous. Then, for σ > 0,

In some situations, the computation of the proximal mappings of the sum of two functions decouples into the composition of two mappings.

Lemma 6.14. Let H be a Hilbert space, $F:H\to \left.\right]- \infty ,\infty \left.\right]$ be proper, convex and lower semi-continuous and σ > 0.

  • (a)  
    If $F\left(u\right)=G\left(u\right)+\frac{\alpha }{2}{\Vert}u-{u}_{0}{{\Vert}}_{H}^{2}$ with $G:H\to \left.\right]- \infty ,\infty \left.\right]$ proper, convex and lower semi-continuous, u0 ∈ H and α > 0, then
  • (b)  
    If H = ${\mathbf{R}}^{M}$ equipped with the Euclidean norm and $F\left(u\right)={\sum }_{m=1}^{M}{\mathcal{I}}_{\left[{a}_{m},{b}_{m}\right]}\left({u}_{m}\right)+{F}_{m}\left({u}_{m}\right)$ with dom(Fm ) ∩ [am , bm ] ≠ $\mathrm{\varnothing}$ for each m = 1, ..., M, then
    for m = 1, ..., M, where
    is the projection to [am , bm ] in $\mathbf{R}$.

Proof. Regarding (a), we note that by first-order optimality conditions (additivity of the subdifferential follows from [83, proposition I.5.6]), we have the following equivalences:

which proves the explicit form of proxσF . The intermediate equality follows from ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma \frac{\alpha }{2}{\Vert}\cdot -{u}_{0}{{\Vert}}_{H}^{2}}\left(u\right)=\frac{u+\sigma \alpha {u}_{0}}{1+\sigma \alpha }$, which can again be seen from the optimality conditions.

In order to show (b), first note that, using lemma 6.12, it suffices to show the assertion for $F\left(u\right)={\mathcal{I}}_{\left[a,b\right]}\left(u\right)+f\left(u\right)$ with u ∈ R, a ⩽ b and $f:\mathbf{R}\to \left.\right]- \infty ,\infty \left.\right]$ proper, convex and lower semi-continuous such that dom(f) ∩ [a, b] ≠ Ø. Also, the identity ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {\mathcal{I}}_{\left[a,b\right]}}={\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left[a,b\right]}$ as well as the explicit form of the projection is immediate. Now, set u* = proj[a,b]◦proxσf (u) and write it as u* = νuF + (1 − ν)uf with uF = proxσF (u) and uf = proxσf (u) and ν ∈ [0, 1]. To see that this is possible, note that in case of uf ∈ [a, b], u* = uf , in case uf < a we have that uf < a = u* ⩽ uF and similarly in case of uf > b. But, with $E\left(\bar{u}\right)=\frac{\vert \bar{u}-u{\vert }^{2}}{2}+\sigma f\left(\bar{u}\right)$, convexity and minimality of uf imply that

Since both u* and uF are in [a, b], the result follows from uniqueness of minimisers. □

In the following we provide some examples of explicit proximal mappings for some particular choices of F that are relevant for applications. For additional examples and further, general results on proximal mappings, we refer to [13, 66].

Lemma 6.15. Let H be a Hilbert space, σ > 0 and $F:H\to \left.\right]- \infty ,\infty \left.\right]$. Then, the following identities hold.

  • (a)  
    For $F\left(u\right)=\frac{\alpha }{2}{{\Vert}u-f{\Vert}}_{H}^{2}$ with f ∈ H, α > 0,
  • (b)  
    For $F={\mathcal{I}}_{C}$ for some non-empty, convex and closed set C ⊆ H,
    with projC denoting the orthogonal projection onto C.
  • (c)  
    For F(u) = G(u − u0) with $G:H\to \left.\right]- \infty ,\infty \left.\right]$ and u0 ∈ H,
  • (d)  
    For $H={\mathcal{H}}_{1}{\times}\cdots {\times}{\mathcal{H}}_{M}$ the product of the Hilbert spaces ${\mathcal{H}}_{1},\dots ,{\mathcal{H}}_{M}$, ${{\Vert}u{\Vert}}_{H}^{2}={\sum }_{m=1}^{M}\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}^{2}$ with each $\vert \cdot {\vert }_{{\mathcal{H}}_{m}}$ denoting the norm on ${\mathcal{H}}_{m}$, $\alpha \in {\left.\right]0,\infty \left[\right.}^{ M}$, and $F={\mathcal{I}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ where ${\Vert}u{{\Vert}}_{\infty ,{\alpha }^{-1}}={\mathrm{max}}_{m=1,\dots ,M}{\alpha }_{m}^{-1}\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}$,
  • (e)  
    In the situation of (d) and $F\left(u\right)={\sum }_{m=1}^{M}{\alpha }_{m}\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}$,
    and
    with ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{\vert \cdot {\vert }_{{\mathcal{H}}_{m}}{\leqslant}{\alpha }_{m}\right\}}$ as in (d).

Proof. The assertion on proxσF in (a) follows from first-order optimality conditions as in lemma 6.14, the assertion on ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {F}^{{\ast}}}$ is a consequence of lemma 6.13. Assertion (b) is immediate from the definition of the orthogonal projection in Hilbert spaces and (c) follows from a simple change of variables for the minimisation problem in the definition of the proximal mapping. Regarding (d), using lemma 6.12 and noting that ${{\Vert}u{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1$ if and only if $\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}{\leqslant}{\alpha }_{m}$ for m = 1, ..., M, it suffices to show that for each ${u}_{m}\in {\mathcal{H}}_{m}$ and m = 1, ..., M,

To this aim, observe that by definition of the projection in Hilbert spaces,

From this, it is easy to see that the minimum in the last line is achieved for t = 1 in case $\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}{\leqslant}{\alpha }_{m}$ and $t={\alpha }_{m}/\vert {u}_{m}{\vert }_{{\mathcal{H}}_{m}}$ otherwise, from which the explicit form of the projection follows. Considering assertion (e), we have by remark 6.9 that

so the statements follow by lemma 6.13 and assertion (d). □
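As a complement, the following Python sketch (our own illustration) collects the proximal mappings of lemma 6.15(a) and (d) for a single vector and combines them via Moreau's identity (72) into the proximal mapping of $u{\mapsto}\alpha \vert u\vert $, i.e., soft shrinkage of the norm as in assertion (e).

```python
import numpy as np

def prox_quadratic(u, f, alpha, sigma):
    """prox of F(u) = (alpha/2) * |u - f|^2, cf. lemma 6.15(a)."""
    return (u + sigma * alpha * f) / (1.0 + sigma * alpha)

def proj_ball(u, alpha):
    """Projection onto the closed ball {|u| <= alpha}, cf. lemma 6.15(d)."""
    return u / max(1.0, np.linalg.norm(u) / alpha)

def prox_norm(u, alpha, sigma):
    """prox of F(u) = alpha * |u| via Moreau's identity (72):
    shrinks |u| by sigma*alpha (to zero if |u| <= sigma*alpha)."""
    return u - sigma * proj_ball(u / sigma, alpha)

u = np.array([3.0, -4.0])                      # |u| = 5
print(prox_norm(u, alpha=1.0, sigma=2.0))      # [1.8, -2.4], norm shrunk from 5 to 3
print(prox_quadratic(u, f=np.zeros(2), alpha=1.0, sigma=2.0))
```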

Now considering the minimisation problem (65), our approach is to rewrite it as a saddle-point problem such that, when applying the iteration (70), the involved proximal mappings decouple into simple and explicit mappings. To achieve this while allowing for general forward operators Kh , we dualise both the regularisation and the data-fidelity term and arrive at the following saddle-point reformulation of (65):

Equation (73)

recalling that the dual norm of ||⋅||1,α is denoted by ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$ and that ${\left({\Vert}\cdot {{\Vert}}_{1,\alpha }\right)}^{{\ast}}={\mathcal{I}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$, see remark 6.9. The following lemma provides some instances of ||⋅||1,α that arise in the context of higher-order TV regularisers and their generalisations. In particular, for these instances, the corresponding dual norm ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$ and the proximal mappings ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {\mathcal{I}}_{\left\{{{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}}={\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ will be provided. Concrete examples will be discussed in example 6.19 below.

Lemma 6.16. With MN, $\alpha \in {\left.\right]0,\infty \left[\right.}^{ M}$, l1, ..., lM N, and ${\mathcal{H}}_{m}\in \left\{{\mathcal{T}}^{{l}_{m}}\left({\mathbf{R}}^{2}\right),{\mathrm{S}\mathrm{y}\mathrm{m}}^{{l}_{m}}\left({\mathbf{R}}^{2}\right)\right\}$ for m = 1, ..., M, let

for v = (v1, ..., vM ) ∈ Vh , where Vh is equipped with the induced inner product ${\langle u,v\rangle }_{{V}_{h}}={\sum }_{m=1}^{M}{\langle {u}_{m},{v}_{m}\rangle }_{{\ell }^{2}\left({{\Omega}}_{h},{\mathcal{H}}_{m}\right)}$ and norm, and the one-norm on each ${\ell }^{2}\left({\Omega},{\mathcal{H}}_{M}\right)$ is given in definition 6.4. Then, the dual norm ${{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}$ satisfies

with the -norm again according to definition 6.4. Further, we have, for m = 1, ..., M, that

where the right-hand side has to be interpreted in the pointwise sense, i.e., for $u\in {\ell }^{2}\left({{\Omega}}_{h},{\mathcal{H}}_{m}\right)$ and (i, j) ∈ Ωh , it holds that ${\left(\mathrm{max}{\left\{1,{\alpha }_{m}^{-1}\vert u{\vert }_{{\mathcal{H}}_{m}}\right\}}^{-1}u\right)}_{i,j}=\mathrm{max}{\left\{1,{\alpha }_{m}^{-1}\vert {u}_{i,j}{\vert }_{{\mathcal{H}}_{m}}\right\}}^{-1}{u}_{i,j}$.

Proof. By definition, we have for u, vVh with ||u||1,α ⩽ 1 that

hence, ${{\Vert}v{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}{\mathrm{max}}_{m=1,\dots ,M}{\alpha }_{m}^{-1}{{\Vert}{v}_{m}{\Vert}}_{\infty }$. With (m, i, j) a maximising argument of the right-hand side above, equality follows, in case of v ≠ 0, from choosing u according to ${\left({u}_{m}\right)}_{i,j}={\alpha }_{m}^{-1}{\left({v}_{m}\right)}_{i,j}/\vert {\left({v}_{m}\right)}_{i,j}{\vert }_{{\mathcal{H}}_{m}}$ and 0 everywhere else. The case v = 0 is trivial. Also, since Ωh = {1, ..., N1} × {1, ..., N2} is finite, one can interpret each ${\ell }^{2}\left({\Omega},{\mathcal{H}}_{m}\right)$ as ${\mathcal{H}}_{m}^{{N}_{1}{N}_{2}}$, such that lemma 6.15(e) applied to $H={V}_{h}={\mathcal{H}}_{1}^{{N}_{1}{N}_{2}}{\times}\cdots {\times}{\mathcal{H}}_{M}^{{N}_{1}{N}_{2}}$ immediately yields the stated pointwise identity for the proximal mapping/projection. □

Under mild assumptions on ${S}_{{f}_{h}}$, equivalence of the primal problem (65) and the saddle-point problem (73) indeed holds and existence of a solution to both (as well as a corresponding dual problem) can be ensured.

Proposition 6.17. Under the assumptions stated for problem (65), there exists a solution. Further, if ${S}_{{f}_{h}}$ is such that ${Y}_{h}={\bigcup }_{t{\geqslant}0}t\left(\mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)+\mathrm{r}\mathrm{g}\left({K}_{h}\right)\right)$, then there exists a solution to the dual problem

Equation (74)

and to the saddle-point problem (73). Further, strong duality holds and the problems are equivalent in the sense that ((u, w), (v, λ)) is a solution to (73) if and only if (u, w) solves (65) and (v, λ) solves (74).

Proof. At first note that existence for (65) can be shown as in theorem 3.26.

By virtue of theorem 5.7, choosing X = Uh × Wh , Y = Vh × Yh , $F:X\to \left.\right]- \infty ,\infty \left.\right]$ as F = 0, $G:Y\to \left.\right]- \infty ,\infty \left.\right]$ as $G\left(v,\lambda \right)={{\Vert}v{\Vert}}_{1,\alpha }+{S}_{{f}_{h}}\left(\lambda \right)$, and Λ : XY as ${\Lambda}\left(u,w\right)=\left({D}_{h}\left(u,w\right),{K}_{h}u\right)$, we only need to verify (39) to obtain existence of dual solutions and strong duality. But since dom(F) = X and $\mathrm{d}\mathrm{o}\mathrm{m}\left(G\right)={V}_{h}{\times}\mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)$, the latter condition is equivalent to ${Y}_{h}={\bigcup }_{t{\geqslant}0}t\left(\mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)+\mathrm{r}\mathrm{g}\left({K}_{h}\right)\right)$. Also, since ${F}^{{\ast}}={\mathcal{I}}_{\left\{0\right\}}$ and ${\left({{\Vert}\cdot {\Vert}}_{1,\alpha }\right)}^{{\ast}}={\mathcal{I}}_{\left\{{{\Vert}\cdot {\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$, see remark 6.9, the maximisation problem in (40) corresponds to (74). Finally, the equivalence of the saddle-point problem (73) to (65) and (74) then follows from [83, proposition III.3.1]. □

Remark 6.18. In some applications, it is beneficial to add an additional penalty term on u in form of Φ : Uh → [0, ] proper, convex and lower semi-continuous to the energy of (65), whereas in other situations when $u{\mapsto}{S}_{{f}_{h}}\left({K}_{h}u\right)$ has a suitable structure, a dualisation of the data term is not necessary, see the discussion below. Regarding the former, the differences when extending proposition 6.17 is that existence for the primal problem needs to be shown differently and that the domain of Φ needs to be taken into account for obtaining strong duality. Existence can, for instance, be proved when assuming that either Φ is the indicator function of a polyhedral set (see [35, proposition 1]), or that ker(Kh ) ∩ ker(Dh ) = {0}. Duality is further obtained when ${Y}_{h}={\bigcup }_{t{\geqslant}0}t\left(\mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)-{K}_{h}\enspace \mathrm{d}\mathrm{o}\mathrm{m}\left({\Phi}\right)\right)$. Regarding the latter, a version of proposition 6.17 without the dualisation of the data term ${S}_{{f}_{h}}\left({K}_{h}u\right)$ holds even without the assumption on the domain of ${S}_{{f}_{h}}$, however, with a different associated dual problem and saddle-point problem.

In particular, not dualising the data term has impact on the primal-dual optimisation algorithms. In view of the iteration (70), the evaluation of the proximal mapping for $u{\mapsto}{S}_{{f}_{h}}\left({K}_{h}u\right)$ then becomes necessary, so this dualisation strategy is only practical if the latter proximal mapping can easily be computed. Furthermore, in case of a sufficiently smooth data term, dualisation of ${S}_{{f}_{h}}$ can also be avoided by using explicit descent steps for ${S}_{{f}_{h}}$ instead of proximal mappings, where the Lipschitz constant of the derivative of ${S}_{{f}_{h}}$ usually enters in the stepsize bound. See [61] for an extension of the primal-dual algorithm in that direction.

In view of proposition 6.17, we now address the numerical solution of the saddle-point problem (73). Applying the iteration (70), this results in algorithm 1, which is given in a general form. A concrete implementation still requires an explicit form of the proximal mapping ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}$, a concrete choice of Vh , Wh and Dh as well as an estimate on ||(Dh , Kh )|| for the stepsize choice and a suitable stopping criterion. These building blocks will now be provided for different choices of ${\mathcal{R}}_{\alpha }$ and ${S}_{{f}_{\alpha }}$ in a way that they can be combined to an arrive at a concrete, application-specific algorithm. After that, two examples will be discussed.

Algorithm 1. Primal-dual scheme for the numerical solution of (73).

1: function Tikhonov(Kh , fh , α)
2:  $\left(u,w,\overline{u},\overline{w}\right){\leftarrow}\left(0,0,0,0\right),\left(v,\lambda \right){\leftarrow}\left(0,0\right)$
3:  Choose σ, τ > 0 such that $\sigma \tau {{\Vert}\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right){\Vert}}^{2}{< }1$
4:  repeat
5:     Dual updates
6:     $v{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}\left(v+\sigma \left({D}_{h}^{1}\overline{u}+{D}_{h}^{2}\overline{w}\right)\right)$
7:     $\lambda {\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}\left(\lambda +\sigma {K}_{h}\overline{u}\right)$
8:     Primal updates
9:     ${u}_{+}{\leftarrow}u-\tau \left({\left({D}_{h}^{1}\right)}^{{\ast}}v+{K}_{h}^{{\ast}}\lambda \right)$
10:   ${w}_{+}{\leftarrow}w-\tau \left({\left({D}_{h}^{2}\right)}^{{\ast}}v\right)$
11:   Extrapolation and update
12:   $\left(\overline{u},\overline{w}\right){\leftarrow}2\left({u}_{+},{w}_{+}\right)-\left(u,w\right)$
13:   (u, w) ← (u+, w+)
14:   until stopping criterion fulfiled
15:   return u
16: end function

Proximal mapping of ${S}_{{f}_{h}}^{{\ast}}$. Depending on the application of interest, and in particular on the assumption on the underlying measurement noise, different choices of ${S}_{{f}_{h}}$ are reasonable. The one which is probably most relevant in practice is

which, from a statistical perspective, is the right choice under the assumption of Gaussian noise. In this case, as discussed in lemma 6.15, the proximal mapping of the dual is given as

A second, practically relevant choice is the Kullback–Leibler divergence as in (2). For discrete data ${\left({\left({f}_{h}\right)}_{i}\right)}_{i}$ satisfying ${\left({f}_{h}\right)}_{i}{\geqslant}0$ for each i, and a corresponding discrete signal ${\left({\lambda }_{i}\right)}_{i}$, this corresponds to

Equation (75)

where we again use the convention ${\left({f}_{h}\right)}_{i}\enspace \mathrm{log}\left(0\right)=- \infty $ for ${\left({f}_{h}\right)}_{i}{ >}0$ and $0\enspace \mathrm{log}\left(\frac{{\lambda }_{i}}{0}\right)=0$ whenever λi ⩾ 0. A direct computation (see for instance [122]) shows that in this case

Another choice that is relevant in the presence of strong data outliers (e.g., due to transmission errors) is

in which case

can be obtained from lemmas 6.13 and 6.15.

As already mentioned in remark 6.18, in case the discrepancies term is not dualised, a corresponding version of the algorithm of [60] requires the proximal mappings of $\tau {S}_{{f}_{h}}$ which can either be computed directly or obtained from ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}$ using Moreau's identity as in lemma 6.13. Further, there are many other choices of ${S}_{{f}_{h}}$ for which the ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}$ is simple and explicit, such as, for instance, equality constraints on a subdomain in the case of image inpainting or box constraints in case of dequantisation or image decompression.

Choice of ${\mathcal{R}}_{\alpha }$ and proximal mapping. As we show now, the general form ${\mathcal{R}}_{\alpha }\left(u\right)={\mathrm{min}}_{w\in {W}_{h}}{\Vert}{D}_{h}\left(u,w\right){{\Vert}}_{1,\alpha }$ covers all higher-order regularisation approaches discussed in the previous sections.

Example 6.19. 

  •   
    Higher-order total variation. The choice ${\mathcal{R}}_{\alpha }\left(u\right)=\alpha {\Vert}{\nabla }^{k}u{{\Vert}}_{1}$, with k ⩾ 1 the order of differentiation, can be realised with
    which yields, according to lemma 6.16, for (i, j) ∈ Ωh that
    Equation (76)
    Here, we used that whenever Wh = {0}, one can ignore the second argument of Dh : Uh × Wh Vh and regard it as operator Dh : Uh Vh .
  •   
    Sum of higher-order TV functionals. The choice ${\mathcal{R}}_{\alpha }\left(u\right)={\alpha }_{1}{\Vert}{\nabla }_{h}^{{k}_{1}}u{{\Vert}}_{1}+{\alpha }_{2}{\Vert}{\nabla }_{h}^{{k}_{2}}u{{\Vert}}_{1}$, with k2 > k1 ⩾ 1 and αi > 0 for i = 1, 2 differentiation orders and weighting parameters, respectively, can be realised with
    and yields, according to lemma 6.16,
    Equation (77)
    with ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{i}\right\}}$ as in (76).
  •   
    Infimal convolution of higher-order TV functionals. The infimal convolution
    can be realised via
    where ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ is given as in (77).
  •   
    Second-order total generalised variation. Let α0, α1 > 0. The choice
    can be realised via
    where ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ is given again as in (77) with α2 replaced by α0.
  •   
    Total generalised variation of order k . The total generalised variation functional of arbitrary order kN, k ⩾ 1, and weights $\alpha =\left({\alpha }_{0},\dots ,{\alpha }_{k-1}\right)\in {\left.\right]0,\infty \left[\right.}^{k}$, i.e.,
    can be realised via
    In this case,
    where ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{m}\right\}}$ is given as in (76).

Stepsize choice and stopping rule. Algorithm 1 requires to choose stepsizes σ, τ > 0 such that $\sigma \tau {\Vert}\mathcal{K}{{\Vert}}^{2}{< }1$ where $\mathcal{K}=\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right)$. This, in turn, requires to estimate ${\Vert}\mathcal{K}{\Vert}$ which we discuss on the following. The operator Kh is application-dependent and we assume an upper bound for its norm to be given. Regarding the differential operator ${D}_{h}=\left({D}_{h}^{1},{D}_{h}^{2}\right)$, an estimate on the norm of its building blocks ${\nabla }_{h}^{k}$, ${\mathcal{E}}_{h}^{k}$ is provided in lemma 6.5. As the following proposition shows, an upper bound on ${\Vert}\mathcal{K}{\Vert}$ as well as on the norm of more general block-operators, can then be obtained by computing a simple singular value decomposition of a usually low-dimensional matrix.

Lemma 6.20. Assume that $\mathcal{K}:\mathcal{X}\to \mathcal{Y}$ with $\mathcal{X}={\mathcal{X}}_{1}{\times}\cdots {\times}{\mathcal{X}}_{N}$, $\mathcal{Y}={\mathcal{Y}}_{1}{\times}\cdots {\times}{\mathcal{Y}}_{M}$ is given as

and that ${\Vert}{\mathcal{K}}_{m,n}{\Vert}{\leqslant}{L}_{m,n}$ for each m = 1, ..., M, n = 1, ..., N. Then,

where σmax denotes the largest singular value of a matrix.

Proof. For $x=\left({x}_{1},\dots ,{x}_{N}\right)\in \mathcal{X}$ we estimate

from which the claimed assertion follows since the matrix norm induced by the two-norm corresponds to the largest singular value. □

This result can be applied in the setting (73), i.e., $\mathcal{K}=\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right)$, leading to

Alternatively, one could use the result when ${D}_{h}^{1}$ or ${D}_{h}^{2}$ have block structures and a norm estimate is known for each block in addition to an estimate on ||Kh ||. Two concrete examples will be provided at the end of this section below.

Remark 6.21. In practice, provided that Lm,n is a good upper bound for ${\Vert}{\mathcal{K}}_{m,n}{\Vert}$, the norm estimate of lemma 6.20 is rather tight such that, depending on ${\Vert}\mathcal{K}{\Vert}$, the admissible stepsizes can be sufficiently large. Furthermore, the constraint $\sigma \tau {\Vert}\mathcal{K}{{\Vert}}^{2}{< }1$ still allows to choose an arbitrary positive ratio θ = σ/τ and, in our experience, often a choice θ ≪ 1 or θ ≫ 1 can accelerate convergence significantly. Finally, we also note that in case no estimate on ${\Vert}\mathcal{K}{\Vert}$ can be obtained, or in case an explicit estimate only allows for prohibitively small stepsizes, also an adaptive stepsize choice without prior knowledge of ${\Vert}\mathcal{K}{\Vert}$ is possible, see for instance [34].

Remark 6.22. It is worth mentioning that, in case of a uniformly convex functional in the saddle-point formulation (73) (which is not the case in the setting considered here), a further acceleration can be achieved by adaptive stepsize choices, see for instance [60].

Remark 6.23. Regarding a suitable stopping criterion, we note that often, the primal-dual gap, i.e., the gap between the energy of the primal and dual problem (66) and (68) evaluated at the current iterates, provides a good measure for optimality. Indeed, with

$\mathfrak{G}\left(x,y\right){\geqslant}0$ and $\mathfrak{G}\left(x,y\right)=0$ if and only if (x, y) is optimal such that, in principle, the condition $\mathfrak{G}\left({x}^{n},{y}^{n}\right){< }\varepsilon $ with (xn , yn ) the iterates of (70) can be used as stopping criterion. In case this condition is met, xn as well as yn are both optimal up to an ɛ-tolerance in terms of the objective functionals for the primal and dual problem, respectively.

In the present situation of (73), however, the primal and dual problem (65) and (74) yield

While for the iterates (un , wn , vn , λn ), we always have ${{\Vert}{v}^{n}{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1$ as well as ${\lambda }^{n}\in \mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}^{{\ast}}\right)$, algorithm 1 does not guarantee that ${K}_{h}{u}^{n}\in \mathrm{d}\mathrm{o}\mathrm{m}\left({S}_{{f}_{h}}\right)$ and $\left({D}_{h}^{1}\right){v}^{n}+{K}_{h}^{{\ast}}{\lambda }^{n}=0$ as well as ${\left({D}_{h}^{2}\right)}^{{\ast}}{v}^{n}=0$, such that the primal-dual gap is always infinite in practice and the stopping criterion is never met. With some adaptations, however, it is sometimes still possible to obtain a primal-dual gap that converges to zero and hence, to deduce a stopping criterion with optimality guarantees. There are several possibilities for achieving this. Let us, for simplicity, assume that both ${S}_{{f}_{h}}$ and ${S}_{{f}_{h}}^{{\ast}}$ are finite everywhere and hence, continuous. This is, for example, the case for ${S}_{{f}_{h}}=\frac{1}{2}{{\Vert}\cdot -{f}_{h}{\Vert}}^{2}$. Next, assume that a priori norm estimates are available for all solution pairs (u*, w*), say ${{\Vert}{u}^{{\ast}}{\Vert}}_{{\tilde{U}}_{h}}{\leqslant}{C}_{u}$ and ${{\Vert}{w}^{{\ast}}{\Vert}}_{{\tilde {W}}_{h}}{\leqslant}{C}_{w}$ for norms ${{\Vert}\cdot {\Vert}}_{{\tilde{U}}_{h}}$, ${{\Vert}\cdot {\Vert}}_{{\tilde {W}}_{h}}$ on Uh , Wh that do not necessarily correspond to the Hilbert space norms. Such estimates may, for instance, be obtained from the observation that ${S}_{{f}_{h}}\left({u}^{{\ast}}\right)+{{\Vert}{D}_{h}\left({u}^{{\ast}},{w}^{{\ast}}\right){\Vert}}_{1,\alpha }{\leqslant}{S}_{{f}_{h}}\left(0\right)$ and suitable coercivity estimates, as discussed in sections 35. Then, the primal problem can, for instance, be replaced by

where (t)+ = max{0, t} for tR, which has, by construction, the same minimisers as the original problem (65), but a dual problem that reads as

where ${{\Vert}\cdot {\Vert}}_{{\tilde{U}}_{h}^{{\ast}}}$ and ${{\Vert}\cdot {\Vert}}_{{\tilde {W}}_{h}^{{\ast}}}$ denote the respective dual norms. By duality and since the minimum of the primal problem did not change, the modified dual problem also has the same solutions as the original dual problem. Now, as the iterates (vn , λn ) satisfy ${{\Vert}{v}^{n}{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1$ and ${\lambda }^{n}\in \mathrm{d}\mathrm{o}\mathrm{m}\enspace {S}_{{f}_{h}}^{{\ast}}$, the dual objective is finite for the iterates and converges to the maximum as n. Analogously, plugging in the sequence (un , wn ) into the modified primal problem yields convergence to the minimum, hence, the respective primal-dual gap converges to zero for the primal-dual iterates (un , wn , vn , λn ). In summary, the functional

yields the stopping criterion $\tilde {\mathfrak{G}}\left({u}^{n},{w}^{n},{v}^{n},{\lambda }^{n}\right){< }\varepsilon $ which will be met for some n and gives ɛ-optimality of (un , wn ) for the original primal problem (65).

The examples below show how this primal-dual gap reads for specific applications. For other strategies of modifying the primal-dual gap to a functional that is positive and finite, converges to zero and possibly provides an upper bound on optimality of the iterates in terms of the objective functional, see, for instance [31, 34, 41].

Concrete examples.

Example 6.24. As first example, we consider the minimisation problem

Equation (78)

i.e., second order-TV regularisation for a linear inverse problem with Gaussian measurement noise. In this setting, we choose Wh = {0}, Vh = 2h , Sym2(R2)), ${D}_{h}={\nabla }_{h}^{2}$ and ||v||1,α = α||v||1. Assuming that ||Kh || ⩽ 1 (after possible scaling of Kh ), lemma 6.20 together with the estimate ${\Vert}{\nabla }_{h}^{2}{\Vert}{\leqslant}8$ from lemma 6.5 yields

The resulting concrete realisation of algorithm 1 can be found in algorithm 2. Here, ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}\alpha \right\}}$ is given explicitly in (76), ${\mathrm{d}\mathrm{i}\mathrm{v}}_{h}^{2}={\mathrm{d}\mathrm{i}\mathrm{v}}_{h}{\mathrm{d}\mathrm{i}\mathrm{v}}_{h}$ is the adjoint of ${\nabla }_{h}^{2}$ and the modified primal-dual gap $\tilde {\mathfrak{G}}$ evaluated on the iterates (u, v, λ) of the algorithm reduces to

where Cu > 0 is an a priori bound on ${{\Vert}{u}^{{\ast}}{\Vert}}_{2}$ for solutions u* according to (17), for instance.

Algorithm 2. Implementation for solving the L2–TV2 problem (78).

1: function L2–TV2-Tikhonov(Kh , fh , α)⊳ Requirement: ||Kh || ⩽ 1
2:  $\left(u,\overline{u}\right){\leftarrow}\left(0,0\right),\left(v,\lambda \right){\leftarrow}\left(0,0\right)$  
3:  Choose σ, τ > 0 such that $\sigma \tau {< }\frac{1}{65}$  
4:  repeat  
5:     Dual updates  
6:     $v{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}\alpha \right\}}\left(v+\sigma {\nabla }_{h}^{2}\overline{u}\right)$  
7:     $\lambda {\leftarrow}\left(\lambda +\sigma \left({K}_{h}\overline{u}-{f}_{h}\right)\right)/\left(1+\sigma \right)$  
8:     Primal updates  
9:     ${u}_{+}{\leftarrow}u-\tau \left({\mathrm{d}\mathrm{i}\mathrm{v}}_{h}^{2}v+{K}_{h}^{{\ast}}\lambda \right)$  
10:   Extrapolation and update  
11:   $\overline{u}{\leftarrow}2{u}_{+}-u$  
12:   uu+  
13:   until stopping criterion fulfiled 
14:   return u  
15: end function  

Example 6.25. As second example, we consider ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ regularisation for an inverse problem with Poisson noise and discrete non-negative data ${\left({\left({f}_{h}\right)}_{i}\right)}_{i}$, which corresponds to solving

Equation (79)

with KL being the discrete Kullback–Leibler divergence as in (75).

In this setting, we choose Wh = 2h , Sym1(R2)), Vh = 2h , Sym1(R2)) × 2(Ω, Sym2(R2)), ${D}_{h}=\left(\begin{matrix}\hfill {\nabla }_{h}\hfill & \hfill -\mathrm{i}\mathrm{d}\hfill \\ 0\hfill & \hfill {\mathcal{E}}_{h}\hfill \end{matrix}\right)$ and ||(v1, v2)||1,α = α1||v1||1 + α0||v2||1. Setting

and again assuming ||Kh || ⩽ 1, lemma 6.20 together with the estimates ${\Vert}{\nabla }_{h}{\Vert}{\leqslant}\sqrt{8}$ and ${\Vert}{\mathcal{E}}_{h}{\Vert}{\leqslant}\sqrt{8}$ from lemma 6.5 yields

The resulting, concrete implementation of algorithm 1 can be found in algorithm 3. Here, again ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{i}\right\}}$ is given explicitly in (76), and, abusing notation, divh is both the negative adjoint of ∇h and ${\mathcal{E}}_{h}$, depending on the input. The modified primal-dual gap $\tilde {\mathfrak{G}}$ evaluated on the iterates (u, w, v1, v2, λ) of the algorithm reduces to

where ${\mathrm{K}\mathrm{L}}^{{\ast}}\left(\lambda ,{f}_{h}\right)=-{\sum }_{i}{\left({f}_{h}\right)}_{i}\enspace \mathrm{log}\left(1-{\lambda }_{i}\right)$ whenever λi ⩽ 1 for each i where ${\left({f}_{h}\right)}_{i}\enspace \mathrm{log}\left(0\right)=\infty $ for ${\left({f}_{h}\right)}_{i}{ >}0$, 0  log(0) = 0, and KL*(λ, fh ) = else. Further, Cu is an a priori bound on the two-norm of ${u}^{{\ast}}={w}_{0}^{{\ast}}$ analogous to (18) while Cw is an a priori bound on the one-norm of ${w}^{{\ast}}={w}_{1}^{{\ast}}$ according to (51).

Algorithm 3. Implementation for solving the KL–${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ problem (79).

1: function KL–${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$-Tikhonov(Kh , fh , α)⊳ Requirement: ||Kh || ⩽ 1
2:  $\left(u,w,\overline{u},\overline{w}\right){\leftarrow}\left(0,0,0,0\right),\left({v}_{1},{v}_{2},\lambda \right){\leftarrow}\left(0,0,0\right)$  
3:  Choose σ, τ > 0 such that $\sigma \tau {\leqslant}\frac{6}{71}$  
4:  repeat  
5:     Dual updates  
6:     ${v}_{1}{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{1}\right\}}\left({v}_{1}+\sigma \left({\nabla }_{h}\overline{u}-\overline{w}\right)\right)$  
7:     ${v}_{2}{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty }{\leqslant}{\alpha }_{0}\right\}}\left({v}_{2}+\sigma {\mathcal{E}}_{h}\overline{w}\right)$  
8:     $\lambda {\leftarrow}\lambda +\sigma {K}_{h}\overline{u}$  
9:     $\lambda {\leftarrow}\lambda -\frac{\lambda -1+\sqrt{{\left(\lambda -1\right)}^{2}+4\sigma {f}_{h}}}{2}$  
10:   Primal updates  
11:   ${u}_{+}{\leftarrow}u+\tau \left({\mathrm{d}\mathrm{i}\mathrm{v}}_{h}{v}_{1}-{K}_{h}^{{\ast}}\lambda \right)$  
12:   w+w + τ(v1 + divh v2) 
13:   Extrapolation and update  
14:   $\left(\overline{u},\overline{w}\right){\leftarrow}2\left({u}_{+},{w}_{+}\right)-\left(u,w\right)$  
15:   (u, w) ← (u+, w+) 
16:   until stopping criterion fulfiled 
17:   return u  
18: end function  

We refer to, e.g., [27] for more examples of primal-dual-based algorithms for TGV regularisation.

6.3. Implicit and preconditioned optimisation methods

Let us shortly discuss other proximal algorithms for the solution of (65). One popular method is the alternating direction method of multipliers (ADMM) [89, 95] which bases on augmented Lagrangian formulations for (65), for instance,

Equation (80)

which results in the augmented Lagrangian

where τ > 0. For (80), the ADMM algorithm amounts to

Here, the first subproblem amounts to solving a least-squares problem and the associated normal equation is usually stably solvable since ${D}_{h}^{1}$ and ${D}_{h}^{2}$ involve discrete differential operators and hence, the normal equation essentially corresponds to the solution of a discrete elliptic equation that is perturbed by ${K}_{h}^{{\ast}}{K}_{h}$. For this reason, ADMM is usually considered an implicit method. The second step turns out to be the application of the proximal mappings for ${S}_{{f}_{h}}$ and ||⋅||1,α while the last update steps have an explicit form, see algorithm 4. By virtue of Moreau's identity (72) (also see lemma 6.13), the operators ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\tau {{\Vert}\cdot {\Vert}}_{1,\alpha }}$ and ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\tau {S}_{{f}_{h}}}$ can easily be computed knowing ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}$ and ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{{\tau }^{-1}{S}_{{f}_{h}}^{{\ast}}}$. We have, for instance,

where the projection operator usually has an explicit representation, see lemma 6.16 and example 6.19. Further, for the discrepancies discussed in subsection 6.2, it holds that

Algorithm 4. ADMM scheme for the numerical solution of (80).

1: function Tikhonov-ADMM(Kh , fh , α)
2:  $\left(u,w\right){\leftarrow}\left(0,0\right),\left(v,\lambda \right){\leftarrow}\left(0,0\right),\left(\overline{v},\overline{\lambda }\right){\leftarrow}\left(0,0\right)$
3:  Choose τ > 0
4:  repeat
5:     Linear subproblem
6:     (u, w) ← solution of
      $\left(\begin{matrix}\hfill {K}_{h}^{{\ast}}{K}_{h}+{\left({D}_{h}^{1}\right)}^{{\ast}}{D}_{h}^{1}\hfill & \hfill {\left({D}_{h}^{1}\right)}^{{\ast}}{D}_{h}^{2}\hfill \\ \hfill {\left({D}_{h}^{2}\right)}^{{\ast}}{D}_{h}^{1}\hfill & \hfill {\left({D}_{h}^{2}\right)}^{{\ast}}{D}_{h}^{2}\hfill \end{matrix}\right)\left(\begin{matrix}\hfill u\hfill \\ \hfill w\hfill \end{matrix}\right)=\left(\begin{matrix}\hfill {K}_{h}^{{\ast}}\left(\lambda -\tau \overline{\lambda }\right)+{\left({D}_{h}^{1}\right)}^{{\ast}}\left(v-\tau \overline{v}\right)\hfill \\ \hfill {\left({D}_{h}^{2}\right)}^{{\ast}}\left(v-\tau \overline{v}\right)\hfill \end{matrix}\right)$
7:     Proximal subproblem
8:     $v{\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\tau {{\Vert}\cdot {\Vert}}_{1,\alpha }}\left({D}_{h}^{1}u+{D}_{h}^{2}w+\tau \overline{v}\right)$
9:     $\lambda {\leftarrow}{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\tau {S}_{{f}_{h}}}\left({K}_{h}u+\tau \overline{\lambda }\right)$
10:   Lagrange multiplier
11:   $\overline{v}{\leftarrow}\overline{v}+\frac{1}{\tau }\left({D}_{h}^{1}u+{D}_{h}^{2}w-v\right)$
12:   $\overline{\lambda }{\leftarrow}\overline{\lambda }+\frac{1}{\tau }\left({K}_{h}u-\lambda \right)$
13:   until stopping criterion fulfiled
14:   return u
15: end function

While ADMM has the advantage of converging for arbitrary stepsizes τ > 0 (see, e.g. [25]), the main drawback is often considered the linear update step which amounts to solving a linear equation (or, alternatively, a least-squares problem) which can be computationally expensive. The latter can be avoided, for instance, with preconditioning techniques [43, 74]. Denoting again by $\mathcal{K}=\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right)$, the linear solution step amounts to solving ${\mathcal{K}}^{{\ast}}\mathcal{K}\left(\begin{matrix}\hfill u\hfill \\ \hfill w\hfill \end{matrix}\right)={\mathcal{K}}^{{\ast}}\left(\begin{matrix}\hfill v-\tau \overline{v}\hfill \\ \hfill \lambda -\tau \bar{\lambda }\hfill \end{matrix}\right)$. Introducing the additional variables (u', w') ∈ Uh × Wh as well as the constraint $\left({u}^{\prime },{w}^{\prime }\right)={\left(\rho \enspace \mathrm{i}\mathrm{d}-{\mathcal{K}}^{{\ast}}\mathcal{K}\right)}^{1/2}\left(u,w\right)$ for $\rho { >}{{\Vert}\mathcal{K}{\Vert}}^{2}$, we can consider the problem

which is equivalent to (80). The associated ADMM procedure, however, simplifies. In particular, the linear subproblem only involves ρ  id whose solution is trivial. Also, the Lagrange multipliers of the additional constraint are always zero within the iteration and the evaluation of the square root ${\left(\rho \enspace \mathrm{i}\mathrm{d}-{\mathcal{K}}^{{\ast}}\mathcal{K}\right)}^{1/2}$ can be avoided. This leads to the linear subproblem of algorithm 4 being replaced by the linear update step

Also, the procedure then requires, in each iteration, only one evaluation of Kh , ${D}_{h}^{1}$, ${D}_{h}^{2}$ and their respective adjoints as well as the evaluation of proximal mappings, such that the computational effort is comparable to algorithm 1. As a special variant of the general ADMM algorithm, the above preconditioned version converges for τ > 0 if $\rho { >}{{\Vert}\mathcal{K}{\Vert}}^{2}$ is satisfied. Thus, an estimate for ${\Vert}\mathcal{K}{\Vert}$ is required which can, e.g., be obtained by lemma 6.20 (also confer the concrete examples in subsection 6.2). While this is the most common preconditioning strategy for ADMM, there are many other possibilities for transforming the original linear subproblem into a simpler one such that, e.g., the preconditioned problem amounts to the application of one or more steps of a symmetric Gauss–Seidel iteration or a symmetric successive over-relaxation (SSOR) procedure [43].

Another class of methods for solving (65) is given by the Douglas–Rachford iteration [80, 130], which is an iterative procedure for solving monotone inclusion problems of the type

in Hilbert space, where A, B are maximally monotone operators. It proceeds as follows:

where σ > 0 is a stepsize parameter. As only the resolvent operators (id + σA)−1 and (id + σB)−1 are involved, the Douglas–Rachford iteration is also considered an implicit scheme. In the context of optimisation problems, the operators A and B are commonly chosen based on first-order optimality conditions, which are subgradient inclusions [45, 89]. Here, we choose the saddle-point formulation (73) and the associated optimality conditions:

For instance, choosing A and B as the first and second operator in the above splitting, respectively, leads to the iteration outlined in algorithm 5: Indeed, in terms of x = (u, w), y = (v, λ) and $\mathcal{K}=\left(\begin{matrix}\hfill {D}_{h}^{1}\hfill & \hfill {D}_{h}^{2}\hfill \\ \hfill {K}_{h}\hfill & \hfill 0\hfill \end{matrix}\right)$, the resolvent for the linear operator A corresponds to solving the linear system

which is reflected by the linear subproblem and dual update in algorithm 5. The resolvent for B further corresponds to the application of proximal mappings, also see proposition 6.11, where the involved proximal operators are the same as for the primal-dual iteration in algorithm 1. The iteration can be shown to converge for each σ > 0, see, e.g., [45].

Algorithm 5. Douglas–Rachford scheme for the numerical solution of (73).

1: function Tikhonov-DR(Kh , fh , α)
2:  $\left(u,w\right){\leftarrow}\left(0,0\right),\left(v,\lambda \right){\leftarrow}\left(0,0\right),\left(\overline{v},\overline{\lambda }\right){\leftarrow}\left(0,0\right)$
3:  Choose σ > 0
4:  repeat
5:     Linear subproblem
6:     $\left(u,w\right){\leftarrow}{\left(\begin{matrix}\hfill \mathrm{i}\mathrm{d}+{\sigma }^{2}\left({K}_{h}^{{\ast}}{K}_{h}+{\left({D}_{h}^{1}\right)}^{{\ast}}{D}_{h}^{1}\right)\hfill & \hfill {\sigma }^{2}{\left({D}_{h}^{1}\right)}^{{\ast}}{D}_{h}^{2}\hfill \\ {\sigma }^{2}{\left({D}_{h}^{2}\right)}^{{\ast}}{D}_{h}^{1}\hfill & \hfill \mathrm{i}\mathrm{d}+{\sigma }^{2}{\left({D}_{h}^{2}\right)}^{{\ast}}{D}_{h}^{2}\hfill \end{matrix}\right)}^{-1}\cdot \left(\begin{matrix}\hfill u-\sigma \left({K}_{h}^{{\ast}}\overline{\lambda }+{\left({D}_{h}^{1}\right)}^{{\ast}}\overline{v}\right)\hfill \\ w-\sigma {\left({D}_{h}^{2}\right)}^{{\ast}}\overline{v}\hfill \end{matrix}\right)$
7:     Dual update
8:     $v{\leftarrow}\overline{v}+\sigma \left({D}_{h}^{1}u+{D}_{h}^{2}w\right)$
9:     $\lambda {\leftarrow}\overline{\lambda }+\sigma {K}_{h}u$
10:   Proximal update
11:   $\overline{v}{\leftarrow}\overline{v}+{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{\left\{{\Vert}\cdot {{\Vert}}_{\infty ,{\alpha }^{-1}}{\leqslant}1\right\}}\left(2v-\overline{v}\right)-v$
12:   $\overline{\lambda }{\leftarrow}\overline{\lambda }+{\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{x}}_{\sigma {S}_{{f}_{h}}^{{\ast}}}\left(2\lambda -\overline{\lambda }\right)-\lambda $
13:   until stopping criterion fulfiled
14:   return u
15: end function

As for ADMM, the linear subproblem in algorithm 5 can be avoided by preconditioning. Basically, for the above Douglas–Rachford iteration, the same types of preconditioners can be applied as for ADMM, ranging from the Richardson-type preconditioner that was discussed in detail before to symmetric Gauss–Seidel and SSOR-type preconditioners [40]. In particular, the potential of the latter for TGV-regularised imaging problems was shown in [41].

While all three discussed classes of algorithms, i.e., the primal-dual method, ADMM, and the Douglas–Rachford iteration can in principle be used to solve the discrete Tikhonov minimisation problem we are interested in, experience shows that the primal-dual method is usually easy to implement as it only involves forward evaluations of the involved linear operators and simple proximal operators, and thus suitable for prototyping. It needs, however, norm estimates for the forward operator and a possible rescaling. ADMM is, in turn, a very popular algorithm whose advantage lies, for instance, in its unconditional convergence (the parameter τ > 0 can be chosen arbitrarily). Also, in comparison to the primal-dual method, ADMM is observed to admit, in relevant cases, a more stable convergence behaviour, meaning less oscillations and faster objective functional reduction in the first iteration steps. However, ADMM requires the solution of a linear subproblem in each iteration step which might be expensive or call for preconditioning. The same applies to the Douglas–Rachford iteration which is also unconditionally convergent, comparably stable and usually involves the solution of a linear subproblem in each step. In contrast to ADMM it bases, however, on the same saddle-point formulation as the primal-dual methods such that translating a prototype primal-dual implementation into a more efficient Douglas–Rachford implementation with possible preconditioning is more immediate.

7. Applications in image processing and computer vision

7.1. Image denoising and deblurring

Image denoising is a simple yet heavily addressed problem in image processing (see for instance [127] for a review) as it is practically relevant by itself and, in addition, allows to investigate the effect of different smoothing and regularisation approaches independent of particular measurements setups or forward models. The standard formulation of variational denoising assumes Gaussian noise and, consequently, employs an L2-type data fidelity. Allowing for more general noise models, the denoising problem reads as

where we assume Sf : Lp (Ω) → [0, ], with p ∈ [1, ], to be proper, convex, lower semi-continuous and coercive, and ${\mathcal{R}}_{\alpha }$ to be an appropriate regularisation functional. This setting covers, for instance, Gaussian noise (with ${S}_{f}\left(u\right)=\frac{1}{2}{\Vert}u-f{{\Vert}}_{2}^{2}$), impulse noise (with Sf (u) = ||uf||1) and Poisson noise (with Sf (u) = KL(u, f)). With first- or higher-order TV regularisation, additive or infimal-convolution-based combinations thereof, or TGV regularisation, the denoising problem is well-posed for any of the above choices of Sf . For ${S}_{f}\left(u\right)=\frac{1}{q}{\Vert}u-f{{\Vert}}_{q}^{q}$ and q > 1, also regularisation with ${\mathcal{R}}_{\alpha }\left(u\right)=\alpha {\Vert}{\Delta}u{{\Vert}}_{\mathcal{M}}$ is well-posed and figure 11 summarises, once again, the result of these different approaches for q = 2 and Gaussian noise on a piecewise affine test image. It further emphasises again the appropriateness of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ as a regulariser for piecewise smooth images.

Figure 11.

Figure 11. Comparison of different first- and second-order image models for variational image denoising with L2-discrepancy. Left column: the original image (top) and noisy input image (bottom). Columns 2–4: results for variational denoising with different regularisation terms. The parameters were manually optimised for best PSNR (see figures 14 and 6 for the PSNR values).

Standard image High-resolution image

In order to visualise the difference between different orders of TGV regularisation, figure 12 considers a piecewise smooth image corrupted by Gaussian noise and compares TGV regularisation with orders k ∈ {2, 3}. It can be seen there that third-order TGV yields a better approximation of smooth structures, resulting in an improved PSNR, while the second-order TGV regularised image has small defects resulting from a piecewise linear approximation of the data.

Figure 12.

Figure 12. Comparison of second- and third-order TGV for denoising for a piecewise smooth noisy image (PSNR: 26.0 dB). The red lines in the original image indicate the areas used in the line and surface plots. A close look on these plots reveals piecewise-linearity defects of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$, while the ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{3}$ reconstruction yields a better approximation of smooth structures and an improved PSNR (${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$: 40.7 dB, ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{3}$: 42.3 dB). Note that in the line plots, the value 0.05 was subtracted from the TGV-denoising results in order to prevent the respective plots from significantly overlapping with the plots of the original data.

Standard image High-resolution image

Another problem class is image deblurring which can be considered as a standard test problem for the ill-posed inversion of linear operators in imaging. Pick a blurring kernel kL0) with bounded domains Ω0, Ω' ⊂ Rd such that Ω' − Ω0 ⊂ Ω. Then, K : L1(Ω) → L2(Ω') given by

is well-defined, linear and continuous. Consequently, by theorems 2.11, 2.14 and proposition 5.17,

for 1 < pd/(d − 1) and ${\mathcal{R}}_{\alpha }\in \left\{\alpha \mathrm{T}\mathrm{V},{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}\right\}$ admits a solution that stably depends on the data fL2(Ω'), which we assume to be a noise-contaminated image blurred by the convolution operator K. A numerical solution can again be obtained with the framework described in section 6 and a comparison of the two choices of ${\mathcal{R}}_{\alpha }$ for a test image can be found in figure 13. We can observe that both TV and ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ are able to remove noise and blur from the image, however, the TV reconstruction suffers from staircasing artefacts which are not present with ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$.

Figure 13.

Figure 13. Deconvolution example. The original image uorig [92] has been blurred and contaminated by noise resulting in f. The images uTV and ${u}_{{\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}}$ are the regularised solutions recovered from f. Reproduced from [92]. CC BY 2.0.

Standard image High-resolution image

7.2. Compressed sensing

The next problem we would like to discuss is compressive sampling with total variation and total generalised variation [27]. More precisely, we aim at reconstructing a single-channel image from 'single-pixel camera' data [78], an inverse problem with finite-dimensional data space. Here, an image is not observed directly but only the accumulated grey values over finitely many random pixel patterns are sequentially measured by one sensor, the 'single pixel'. This can be modelled as follows. For a bounded Lipschitz image domain Ω ⊂ R2, let the measurable sets E1, ..., EM ⊂ Ω be the collection of random patterns where each Em is associated with the mth measurement. The image u is then determined by solving the inverse problem

and fRM is the measurement vector, i.e., each fm is the output of the sensor for the pattern Em . As the set of u solving this inverse problem is an affine space with finite codimension, the compressive imaging approach assumes that the image u is sparse in a certain representation which is usually translated into the discrete total variation TV(u) being small. A way to reconstruct u from f is then to solve

Equation (81)

In this context, also higher-order regularisers may be used as sparsity constraint. For instance, in [27], total generalised variation of order 2 has numerically been tested:

Equation (82)

Figure 14 shows example reconstructions for real data according to discretised versions of (81) and (82). As supported by the theory of compressed sensing [52, 53], the image can essentially be recovered from a few single-pixel measurements. Here, TGV-minimisation helps to reconstruct smooth regions of the image such that in comparison to TV-minimisation, more features can still be recognised, in particular, when reconstructing from very few samples. Once again, staircasing artefacts are clearly visible for the TV-based reconstructions, a fact that recently was made rigorous in [26, 29].

Figure 14.

Figure 14. Example for TV/TGV2 compressive imaging reconstruction for real single-pixel camera data [157]. Top: TV-based reconstruction of a 64 × 64 image from 18.75%, 9.375%, 6.25% and 4.6875% of the data (from left to right). Bottom: TGV2-based reconstruction obtained from the same data. Figure taken from [27]. Reprinted by permission from Springer Nature.

Standard image High-resolution image

7.3. Optical flow and stereo estimation

Another important fundamental problem in image processing and computer vision is the determination of the optical flow [110] of an image sequence. Here, we consider this task for two consecutive frames f0 and f1 in a sequence of images. This is often modelled by minimising a possibly joint discrepancy ${S}_{{f}_{0},{f}_{1}}\left(u\left(0\right),u\left(1\right)\right)$ for u : [0, 1] × Ω → R subject to the optical flow constraint $\frac{\partial u}{\partial t}+\nabla u\cdot v=0$, see, for instance, [24]. Here, v : [0, 1] × Ω → Rd is the optical flow field that shall be determined. In order to deal with ill-posedness, ambiguities as well as occlusion, the vector field v needs to be regularised by a penalty term. This leads to the PDE-constrained problem

where ${\mathcal{R}}_{\alpha }$ is a suitable convex regulariser for vector field sequences. Usually, ${S}_{{f}_{0},{f}_{1}}$ is chosen such that the initial condition u(0) is fixed to f0, for instance, ${S}_{{f}_{0},{f}_{1}}\left({u}_{0},{u}_{1}\right)={\mathcal{I}}_{\left\{{f}_{0}\right\}}\left({u}_{0}\right)+\frac{1}{2}{{\Vert}{u}_{1}-{f}_{1}{\Vert}}_{2}^{2}$, see [24, 64, 103, 118].

In many approaches, this problem is reformulated to a correspondence problem. This means, on the one hand, replacing the optical flow constraint by the displacement introduced by a vector field v0 : Ω → R2, i.e., u(0) = u0 and u(1) = u0 ◦ (id + v0). The image u0 : Ω → R is either prespecified or subject to optimisation. For instance, choosing again ${S}_{{f}_{0},{f}_{1}}\left({u}_{0},{u}_{1}\right)={\mathcal{I}}_{\left\{{f}_{0}\right\}}\left({u}_{0}\right)+\frac{1}{2}{{\Vert}{u}_{1}-{f}_{1}{\Vert}}_{2}^{2}$ leads to the classical correspondence problem

see, for instance, [110], which uses the square of the H1-seminorm as a regulariser. On the other hand, other approaches have been considered for the discrepancy (and regularisation), see [47, 196]. In this context, a popular concept is the census transform [195] that describes the local relative behaviour of an image and is invariant to brightness changes. For an image f : Ω → R, measurable patch Ω' ⊂ R2 and threshold ɛ > 0, it is defined as

Here, one usually sets u0 = f0 and u1 = f1 such that the discrepancy only depends on the vector field v0, such as, for instance,

leading to the optical-flow problem

see, for instance, [137, 188]. A closely related problem is stereo estimation which can also be modelled as a correspondence problem. In this context, f0 and f1 constitute a stereo image pair, for instance, f0 being the left image and f1 being the right image. The stereo information is then usually reflected by the disparity which describes the displacement of the right image with respect to the left image. This corresponds to setting the vertical component of the displacement field v0 to zero, for instance, ${\left({v}_{0}\right)}_{2}=0$. Census-transform based discrepancies are also used for this task [152], leading to the stereo-estimation model

Equation (83)

with a suitable convex regulariser ${\mathcal{R}}_{\alpha }$ for scalar disparity images.

Both optical flow and stereo estimation are non-convex due to the non-convex data terms and require dedicated solution techniques. One possible approach is to smooth the discrepancy functional such that it becomes (twice) continuously differentiable, and approximate it, for each x ∈ Ω, by either first or second-order Taylor expansion. For the latter case, if one also projects the pointwise Hessian to the positive semi-definite cone, one arrives at the convex problem

where v0 is the base vector field for the Taylor expansion and ${\mathrm{p}\mathrm{r}\mathrm{o}\mathrm{j}}_{{S}^{+}}:{S}^{2{\times}2}\to {S}_{+}^{2{\times}2}$ denotes the orthogonal projection to the cone of positive semi-definite matrices ${S}_{+}^{2{\times}2}$. Besides classical regularisers such as the H1-seminorm, the total variation has been chosen [193], i.e., ${\mathcal{R}}_{\alpha }=\alpha \mathrm{T}\mathrm{V}$, which allows the identification of jumps in the displacement field associated with object boundaries. The displacement field is, however, piecewise smooth such that ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ turns out to be advantageous. Further improvements can be achieved by non-local total generalised variation NLTGV2, see [154], leading to sharper and more accurate motion boundaries, see figure 15. For stereo estimation, a similar approach using first-order Taylor expansion and image-driven total generalised variation ${\mathrm{I}\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ also yields very accurate disparity images [152].

Figure 15.

Figure 15. Example for higher-order approaches for optical flow determination. (a) An optical flow field obtained on a sample dataset from the Middlebury benchmark [11] using a ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ regulariser. (b) An enlarged detail of (a). (c) The optical flow field obtained by a NLTGV2 regulariser. (d) An enlarged detail of (c). Images taken from [154]. Reprinted by permission from Springer Nature.

Standard image High-resolution image

A different concept for solving the non-convex optical flow/stereo estimation problem is functional lifting [4, 55]. For the stereo estimation problem, this means to recover the characteristic function of the subgraph of the disparity image, i.e., ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$. Assume that the discrepancy for the disparity w0 can be written in integral form, i.e., ${S}_{{f}_{0},{f}_{1}}\left({w}_{0}\right)={\int }_{{\Omega}}g\left(x,{w}_{0}\left(x\right)\right)\enspace \mathrm{d}x$ with a suitable g : Ω × RR that is possibly non-convex with respect to the second argument. If w0 is of bounded variation, then ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$ is also of bounded variation and the weak derivative with respect to x and t, respectively, are Radon measures. Denoting by vx and vt the respective components of a vector, i.e., v = (vx , vt ) ∈ R2 × R, these derivatives satisfy the identity $\frac{\partial }{\partial t}{\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}={\left(\frac{\nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}}{\left\vert \nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}\right\vert }\right)}_{t}\left\vert \nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}\right\vert $ as well as ${\nabla }_{x}{\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}={\left(\frac{\nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}}{\left\vert \nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}\right\vert }\right)}_{x}\left\vert \nabla {\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}\right\vert $. The discrepancy term can then be written in the form

which is convex with respect to ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$. In many cases, regularisation functionals can also be written in terms of ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$, for instance, by the coarea formula,

which is again convex with respect to ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$. As the set of all ${\chi }_{\left\{t{\leqslant}{w}_{0}\right\}}$ is still non-convex, this constraint is usually relaxed to a convex set, for instance, to the conditions

Equation (84)

where the limits have to be understood in a suitable sense. Then, the stereo problem (83) with total-variation regularisation can be relaxed to the convex problem

Then, optimal solutions u* for the above problem yield minimisers of the original problem when thresholded, i.e., for $s\in \left.\right]0,1\left[\right.$, the function ${\chi }_{\left\{s{\leqslant}{u}^{{\ast}}\right\}}$ is the characteristic function of the subgraph of a w0 that is optimal for (83) for the assumed discrepancy and total-variation regularisation [145].

Unfortunately, a straightforward adaptation of this strategy to higher-order total-variation-type regularisation functionals is not possible. For TGV2, one can nevertheless benefit from the convexification approach. Considering the ${\mathrm{T}\mathrm{G}\mathrm{V}}_{2}^{\alpha }$-regularised problem

Equation (85)

one sees that the problem is convex in w and minimisation with respect to w0 can still be convexified by functional lifting. For fixed w, the latter leads to

which is again convex and whose solutions can again be thresholded to yield a ${w}_{0}^{{\ast}}$ that is optimal with respect to w0 for a fixed w. Alternating minimisation then provides a robust solution strategy for (85) based on convex optimisation [155], see figure 16 for an example. In this context, algorithms realising functional lifting strategies for TV and TGV regularisation have recently further been refined, for instance, in order to lower the computational complexity associated with the additional space dimension introduced by the lifting, see, e.g. [135, 181].

Figure 16.

Figure 16. Total-generalised-variation-regularised stereo estimation based on functional lifting and convex optimisation for an image pair of the KITTI dataset [91]. (a) The reference image. (b) The disparity image obtained with TGV2-regularisation [155]. Reprinted from [153] by permission from Springer Nature Customer Service Centre GmbH: Springer © 2013.

Standard image High-resolution image

7.4. Image and video decompression

Pixelwise representations of image or image sequence data require, on the one hand, a large amount of digital storage but contain, on the other hand, enough redundancy to enable compression. Indeed, most digitally stored images and image sequences, e.g., on cameras, mobile phones or the world-wide web are compressed. Commonly-used lossy compression standards such as JPEG, JPEG2000 for images and MPEG for image sequences, however, suffer from visual artefacts in decompressed data, especially for high compression rates.

Those artefacts result from errors in the compressed data due to quantisation, which is not accounted for in the decompression procedure. These errors, however, can be well described using the data that is available in the compressed file and in particular, precise bounds on the difference of the available data and the unknown, ground truth data can be obtained. This observation motivates a generic approach for an improved decompression of such compressed image or video data, which consists of minimising a regularisation functional subject to these error bounds, see for instance [5, 31, 200] for TV-based works in this context. Following this generic approach, we present here a TGV-based reconstruction method (see [33, 34]) that allows for a variational reconstruction of still images from compressed data that is directly applicable to the major image compression standards such as JPEG, JPEG2000 or the image compression layer of the DjVu document compression format [108]. A further extension of this model to the decompression of MPEG encoded video data will be addressed afterwards.

The underlying principle of a broad class of image and video compression standards, and in particular of JPEG and JPEG 2000 compression, is as follows: first, a linear transformation is used to transform the image data to a different representation where information that is more and less important for visual image quality is well separated. Then, a weighted quantisation of this data (according to its expected importance for visual image quality) is carried out and the quantised data (together with information that allows to obtain the quantisation accuracy) is stored. Thus, defining K to be the linear transformation used in the compression process and D to be a set of admissible, transformed image data that can be obtained using the information available in the compressed file, decompression amounts to finding an image u such that KuD. Using the TGV functional to regularise this compression procedure and considering colour images u : Ω → R3, we arrive at the following minimisation problem:

Equation (86)

where K : L2(Ω, R3) → 2 is an analysis operator related to a Riesz basis of L2(Ω, R3), and a Frobenius-norm-type coupling of the colour channels is used in TGV, see subsection 5.3. The coefficient dataset D2 reflects interval restrictions on the coefficients, i.e., is defined as D = {v2|vn Jn for all nN} for {Jn } a family of closed intervals. In case D is bounded, well-posedness of this approach can be obtained via a direct extension of proposition 5.17 to R3-valued functions, which in particular requires a multi-channel version of the Poincaré inequality for TGV as in proposition 5.15. The latter can straightforwardly be obtained by equivalence of norms in finite dimensions, see for instance [27, 33]. Beyond that, existence of a solution to (86) can be guaranteed also in case of a non-coercive discrepancy when arbitrarily many of the intervals Jn are unbounded, provided that only finitely many of them are half-bounded, i.e., are the form ${J}_{n}= \left.\right]- \infty ,{c}_{n} \left.\right]$ or ${J}_{n}=\left[\right.{c}_{n},\infty \left[\right. $ for cn R, see [33]. In compression, half-bounded intervals would correspond to knowing only the sign but not the precision of the respective coefficient, a situation which does not occur in JPEG, JPEG2000 and DjVu. Thus, in all relevant applications, all intervals are either bounded or all of R, and hence, solutions exist. Further, under the assumption that all but finitely many intervals have a width that is uniformly bounded from below, again an assumption which holds true in all anticipated applications, optimality conditions for (86) can be obtained.

In the application to JPEG decompression, colour images are processed in the YCbCr colour space and the basis transformation operator K corresponds to a colour subsampling followed by a block- and channel-wise discrete cosine transformation, which together can be expressed as a Riesz-basis transform. The interval sequence {Jn } can be obtained from the quantisation matrix that is available in the encoded file, and each interval Jn is bounded.
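As an illustration of how the intervals Jn arise, the following minimal Python sketch reconstructs coefficient bounds for one 8 × 8 block under the standard JPEG convention that a coefficient c is stored as q = round(c/Q) with quantisation step Q; the function name and the toy data are hypothetical and not taken from any JPEG library.

```python
import numpy as np

def jpeg_coefficient_intervals(quantised_coeffs, quant_matrix):
    """Interval bounds [lower, upper] for the unquantised DCT coefficients.

    If a coefficient c was stored as q = round(c / Q), then c must lie in
    [(q - 0.5) * Q, (q + 0.5) * Q]; these are the bounded intervals J_n
    used in the JPEG decompression model.
    """
    q = np.asarray(quantised_coeffs, dtype=float)
    Q = np.asarray(quant_matrix, dtype=float)
    return (q - 0.5) * Q, (q + 0.5) * Q

# toy usage: a flat quantisation matrix and a block with a single DC coefficient
Q = np.full((8, 8), 16.0)
q = np.zeros((8, 8))
q[0, 0] = 13.0
lower, upper = jpeg_coefficient_intervals(q, Q)
```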

In the application to JPEG2000 decompression, again the YCbCr colour space is used and K realises a colour-component-wise biorthogonal wavelet transform using Le Gall 5/3 or CDF 9/7 wavelets as defined in [65, tables 6.1 and 6.2]. Obtaining bounds on the precision of the wavelet coefficients is more involved than with JPEG (see [33, section 4.3]), but can be done by studying the bit truncation scheme of JPEG2000 in detail. As opposed to JPEG, however, the intervals Jn might either be bounded or unbounded.

A third application of the model (86) is the variational decompression of the image layers of a DjVu-compressed document. DjVu [100] is a storage format for digital documents. It encodes document pages via a separation into fore- and background layers as well as a binary switching mask, where the former are encoded using a lossy, transform-based compression and the latter using a dictionary-based compression. While the binary switching mask typically encodes fine details such as written text, the fore- and background layers encode image data, which again suffers from compression artefacts that can be reduced via variational decompression. Here, the extraction of the relevant coefficient data together with error bounds has to account for the particular features of the DjVu compression standard (we refer to [108] and its supplementary material for a detailed description and software that extracts the relevant data from DjVu-compressed files), but the overall model for the image layers is again similar to the one for JPEG and JPEG2000 decompression. In particular, the encoding of the fore- and background layers can be modelled with the operator K, in this case corresponding to a colour-component-wise wavelet transformation using the Dubuc–Deslauriers–Lemire (DDL) (4, 4) wavelets [75], and with data intervals Jn that are again either bounded or all of R.

In all of the above applications, a numerical solution of the corresponding instance of the minimisation problem (86) can be obtained using the primal-dual framework as described in section 6 (see [34] for details). We refer to figure 17 for exemplary results using second-order TGV regularisation. Regarding the implementation, relevant differences arise depending on whether the projection onto the constraint set {u | Ku ∈ D} can be carried out explicitly or not, the latter requiring a dualisation of this constraint and an additional dual variable. Only in the application to JPEG decompression is this projection explicit, due to the orthonormality of the cosine transform and the particular structure of the colour subsampling operator. This has the particular advantage that, at any iteration of the algorithm, the iterate is feasible; one can, for instance, apply early-stopping techniques to obtain substantially improved decompressed images in a computationally cheap way.
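The explicit projection in the JPEG case can be sketched as follows. This is a minimal illustration assuming that K is an orthonormal transform (as the block-wise DCT is), supplied here as a pair of hypothetical forward/adjoint callables; the colour subsampling part of the operator is ignored.

```python
import numpy as np

def project_onto_constraint_set(u, K, Kt, lower, upper):
    """Projection of u onto {u : K u in D} for box constraints D and an
    orthonormal transform K (K^T K = K K^T = I).

    K and Kt are user-supplied forward/adjoint transform callables; for an
    orthonormal K the projection amounts to clipping the coefficients K u
    to their intervals and transforming the correction back.
    """
    c = K(u)
    c_clipped = np.clip(c, lower, upper)
    return u + Kt(c_clipped - c)
```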


Figure 17. Example of variational image decompression. Standard (left column) and TGV-based (right column) decompression for a JPEG image (top row) compressed to 0.15 bits-per-pixel (bpp), a JPEG2000 image (middle row) compressed to 0.3 bpp, and a DjVu-compressed document page (bottom row) with close-ups. Results from [34] (rows 1 and 2) and [108] (bottom row). Reproduced with permission from [90]. Copyright © 2018 Society for Industrial and Applied Mathematics.


Variational MPEG decompression. The MPEG video compression standard builds on JPEG compression for storing frame-wise image data, but incorporates additional motion prediction and correction steps which can significantly reduce the storage size of video data. In MPEG-2 compression, which is a tractable blueprint for the MPEG compression family, video data is processed as subsequent groups of pictures (typically 12–15 frames) which can be handled separately. In each group of pictures, different frame types (I, P and B frames) are defined and, depending on the frame type, image data is stored by using motion prediction and correction followed by a JPEG-type compression of the corrected data. Similar to JPEG compression, colour images are processed in the YCbCr colour space and additional subsampling of colour components is allowed.

While these are the main features of a typical MPEG video encoder, as usual for most compression standards, the MPEG standard defines the decompression procedure rather than compression. Hence, since compression might differ for different encoders, we build a variational model for MPEG decompression that works with a decoding operator (see [35] for more details on MPEG and the model): using the information (in particular motion correction vectors and quantisation tables) that is stored in the MPEG compressed files, we can define a linear operator K that maps encoded, motion corrected cosine-transform coefficient data to (colour subsampled) video data. Furthermore, bounds on the coefficient data can be obtained. Using a second operator S to model colour subsampling and choosing a right-inverse Ŝ, MPEG decompression amounts to finding a video u such that

where v ∈ D with D being the admissible set of cosine-coefficient data, and s ∈ ker(S) compensates for the colour upsampling of Ŝ. Incorporating the infimal convolution of second-order spatio-temporal TGV functionals as regularisation for video data (see subsection 5.3 and [35, 109]), decompression then amounts to solving

Again, the minimisation problem can be solved using duality-based convex optimisation methods as described in section 6 and we refer to figure 18 for a comparison of standard MPEG-2 decompression and the result obtained with this model.
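Based on the description above, plausible forms of the decompression constraint and of the resulting minimisation problem (the precise model is given in [35]) are

$u = \hat{S}(Kv) + s, \quad v \in D, \quad s \in \ker(S), \qquad \min_{u,v,s}\ \mathrm{ICTGV}_\alpha(u) \quad \text{subject to these constraints,}$

so that the regulariser selects, among all videos consistent with the encoded data, one with small infimal-convolution TGV value.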


Figure 18. Example of variational MPEG decompression. Standard (top row) and ICTGV-based (bottom row) decompression of the Juggler image sequence from [11]. On the left, the second frame (P-frame) is shown in detail while on the right, all 8 frames are depicted. Reprinted from [35] by permission from Springer Nature Customer Service Centre GmbH: Springer © 2015.


8. Applications in medical imaging and image reconstruction

8.1. Denoising of dual-energy computed-tomography (CT) data

Since its development in the 1970s, computed x-ray tomography (CT) has become a standard tool in medical imaging. As CT is based on x-rays, the health risk associated with ionising radiation is certainly a drawback of this imaging technique. Further, the acquired images do not, in general, allow one to differentiate between objects of the same density. Regarding the former point, a low radiation dose is an important goal which is, of course, in conflict with the demand for a high signal-to-noise ratio (SNR). Regarding the differentiation of objects of the same density [88, 115], a recently developed approach is based on an additional dataset from a second x-ray source (typically placed at a 90° offset) which possesses a different spectrum (or energy) compared to the standard x-ray emitter in CT; such a scanner is called a dual-energy CT device, see figure 19(a).

Objects of different material that have the same response for one x-ray source may have a different response for the second source, making a differentiation possible. A relevant application of this principle is, for instance, the quantification of contrast-agent concentration. Adjusting a dual-energy CT device such that normal tissue gives the same response for both x-ray sources while an administered contrast agent does not allows one to infer the agent's concentration from the difference of the two acquired images, see figure 19(b). This may be useful, for instance, for recognising perfusion deficits and thus aid the diagnosis of, e.g., pulmonary embolism in the lung [131]. However, due to the low dose of the dual-energy CT scan as well as a limited sensitivity with respect to the contrast agent, the difference image can be noisy and denoising is required in order to obtain a meaningful interpretation, see figure 19(c).


Figure 19. Example of L1–TGV2 denoising for dual energy computed tomography. (a) A schematic of a dual-energy CT device. (b) A pair of (reconstructed) dual-energy CT images. (c) A noisy difference image with marked perfusion deficit region. (d) Difference image of the TGV-denoised dataset (3D denoising, only one slice is shown).


In the following, a variational denoising approach is derived that takes the structure of the problem into account. First, let A0 and B0 be the noisy CT reconstructions associated with the respective x-ray sources. Then, as the difference image contains the relevant information, we would like to impose regularity on the difference image A − B as well as on a 'base' image B instead of penalising each image separately. As we may assume that the contrast-agent concentration as well as the density are piecewise smooth, both admit a low total generalised variation, and hence we choose this functional as a penalty, for instance of second order. Furthermore, as the results should be usable for quantification, we have to account for this and therefore choose an L1-fidelity term, as this is known to possess desirable contrast-preservation properties in conjunction with TV and TGV [38, 62]. In total, this leads to the variational problem

where A0, B0 ∈ L1(Ω) are given and α = (α0, α1) as well as α' = (α0', α1') are positive regularisation parameters. With the application in mind, the domain Ω is typically a bounded three-dimensional domain with Lipschitz boundary. Then, existence of minimisers can be obtained using the tools from section 5, see proposition 5.17, which nevertheless requires some straightforward adaptations. Due to the lack of strict convexity, however, the solutions might be non-unique. Further, numerical algorithms can be developed along the lines of section 6, for instance, a primal-dual algorithm as outlined in subsection 6.2. In case of non-uniqueness, the minimisation procedure 'chooses' one solution in the sense that it converges to one element of the solution set, so that the variational model and the optimisation algorithm cannot be cleanly separated and other results might be obtained using different optimisation algorithms.
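A plausible form of this variational problem, based on the description above (the exact formulation is given in the cited works), is

$\min_{A,B}\ \|A - A_0\|_{L^1(\Omega)} + \|B - B_0\|_{L^1(\Omega)} + \mathrm{TGV}_\alpha^2(A - B) + \mathrm{TGV}_{\alpha'}^2(B),$

with the two TGV terms carrying the regularisation of the difference image and of the base image, respectively.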

Figure 19(d) shows denoising results for the primal-dual algorithm, where a clear improvement of image quality for the difference image in comparison to figure 19(c) can be observed. In particular, the total generalised variation model is suitable to recover the smooth distribution of the contrast agent within the lung, including the perfusion deficit region, as well as the discontinuities induced by bones, vessels, etc. Further, one can see that the dedicated modelling of the problem as a denoising problem for a difference image based on two datasets turns out to be beneficial. A denoising procedure that only depends on the noisy difference image would not allow for such an improvement of image quality.

8.2. Parallel reconstruction in magnetic resonance imaging

Magnetic resonance imaging (MRI) is a tomographic imaging technique that is heavily used in medical imaging and beyond. It builds on an interplay of magnetic fields and radio-frequency pulses, which allows for localised excitation and, via induction of current in receiver coils, for a subsequent measurement of the proton density inside the object of interest [46]. In the standard setting, MRI delivers qualitative images visualising the density of hydrogen protons, e.g., inside the human body. Its usefulness is in particular due to an excellent soft tissue contrast (as opposed to computed tomography) and a high spatial resolution of MR images. The trade-off, in particular for the latter, is the long measurement time, which comes with obvious drawbacks such as patient discomfort, limitations on patient throughput and imaging artefacts resulting from temporally inconsistent data due to patient motion.

Subsampled data acquisition and parallel imaging [97, 149, 179] (combined with appropriate reconstruction methods) are nowadays standard techniques to accelerate MRI measurements. As the data in an MR experiment is acquired sequentially, a reduced number of measurements directly implies a reduced measurement time; however, in order to maintain the same image resolution, the resulting lack of data needs to be compensated for by other means. Parallel imaging achieves this to some extent by using not a single but multiple measurement coils and combining the corresponding measured signals for image reconstruction. On top of that, advanced mathematical reconstruction methods such as compressed sensing techniques [21, 132] or, more generally, variational reconstruction have been shown to allow for a further, significant reduction of measurement time with a negligible loss of image quality.

In this context, transform-based regularisation techniques [132, 133] and derivative-based techniques [21, 121] are among the most popular approaches. More recently, also learning-based methods building on the structure of variational approaches have become very popular [101, 150]. Here, we focus on variational regularisation approaches with first- and higher-order derivatives. To this aim, we first deal with the forward model of parallel, static MR imaging.

In a standard setting, the MR measurement process can be modelled as measuring Fourier coefficients of the unknown image. In order to include measurements from multiple coils, spatially varying sensitivity profiles of these coils also need to be included in the forward model via a pointwise multiplication in image space. Subsampled data acquisition then corresponds to measuring the Fourier coefficients only on a certain measurement domain in Fourier space, which is defined by a subsampling pattern. Let ${c}_{1},\dots ,{c}_{k}\in {\mathcal{C}}_{0}\left({\mathbf{R}}^{d},\mathbf{C}\right)$ be functions modelling some fixed coil sensitivity profiles for k receiver coils, let σ be a positive, finite Radon measure on Rd that defines the sampling pattern, and let Ω ⊂ Rd be a bounded Lipschitz domain that represents the image domain. Then, following the lines of [30], we define, for p ∈ [1, ], the MR measurement operator $K:{L}^{p}\left({\Omega},\mathbf{C}\right)\to {L}_{\sigma }^{2}{\left({\mathbf{R}}^{d},\mathbf{C}\right)}^{k}$ as

Equation (87)

where we extend u by zero to Rd . Note that for each uLp (Ω, C), Ku as a function on Rd is bounded and continuous which follows from

Thus, since σ is finite, K indeed linearly and continuously maps into ${L}_{\sigma }^{2}\left({\mathbf{R}}^{d},\mathbf{C}\right)$.
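A plausible form of the operator in (87) and of the omitted estimate, assuming the standard (unnormalised) Fourier-transform convention, reads

$(Ku)_j(\xi) = \int_{\mathbf{R}^d} c_j(x)\, u(x)\, \mathrm{e}^{-\mathrm{i}\xi\cdot x}\, \mathrm{d}x, \quad j = 1,\dots,k,$

together with $|(Ku)_j(\xi)| \leq \|c_j\|_\infty \int_\Omega |u|\, \mathrm{d}x \leq \|c_j\|_\infty\, |\Omega|^{1-1/p}\, \|u\|_p$ for all $\xi \in \mathbf{R}^d$, which yields the claimed boundedness and continuity.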

While here we assume the coil sensitivities to be known (such that the forward model is linear), obtaining them prior to image reconstruction is non-trivial and we refer to [21, 166, 185, 190] for some existing methods. In the experiments discussed below, we followed the approach of [21] and employed, for each individual coil, a variational reconstruction with a quadratic regularisation of the derivative (H1-regularisation) followed by a convolution with a smoothing kernel. For each coil, the sensitivity profile was then obtained by division by the sum-of-squares image (which is, despite its name, the pointwise square root of the sum of the squared moduli of the individual coil images).
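The sum-of-squares combination and the subsequent normalisation can be sketched in a minimal NumPy/SciPy example as follows; the smoothing width sigma and the stabilisation constant eps are illustrative choices and not taken from the cited works.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def estimate_sensitivities(coil_images, sigma=5.0, eps=1e-6):
    """Rough coil-sensitivity estimate from individual (complex) coil images.

    coil_images has shape (k, ny, nx). Each coil image is smoothed and then
    divided by the sum-of-squares image, i.e. the pointwise square root of
    the sum of the squared moduli of the coil images.
    """
    coil_images = np.asarray(coil_images, dtype=complex)
    sos = np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=0))
    smoothed = np.stack([
        gaussian_filter(c.real, sigma) + 1j * gaussian_filter(c.imag, sigma)
        for c in coil_images
    ])
    return smoothed / (sos + eps)
```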

A regularised reconstruction from MR measurement data $f\in {L}_{\sigma }^{2}{\left({\mathbf{R}}^{d},\mathbf{C}\right)}^{k}$ can be obtained by solving

Equation (88)

where we test with both ${\mathcal{R}}_{\alpha }=\alpha \mathrm{T}\mathrm{V}$ and ${\mathcal{R}}_{\alpha }={\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$, in which case we can choose 1 < p ⩽ d/(d − 1). Note that well-posedness for (88) follows from theorems 2.11 and 2.14 in the case of TV and from proposition 5.17 in the case of ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ (where straightforward adaptations are necessary to include complex-valued functions). Numerically, the optimisation problem can be solved using the algorithmic framework described in section 6, where again, some modifications are necessary to deal with complex-valued images.
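Based on the preceding description, a plausible form of the Tikhonov problem (88) is

$\min_{u \in L^p(\Omega,\mathbf{C})}\ \frac{1}{2}\|Ku - f\|_{L^2_\sigma(\mathbf{R}^d,\mathbf{C})^k}^2 + \mathcal{R}_\alpha(u),$

with K the parallel MR forward operator from (87) and $\mathcal{R}_\alpha$ one of the two regularisers above.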

Figure 20 compares the results between these two choices of regularisation functionals and a conventional reconstruction based on direct Fourier inversion using non-uniform fast Fourier transform (NUFFT) [87] for different subsampling factors and a dataset for which a fully sampled ground truth is available. Undersampled 2D radial spin-echo measurements of the human brain were performed with a clinical 3T scanner using a receive-only 12 channel head coil. Sequence parameters were: TR = 2500 ms, TE = 50 ms, matrix size 256 × 256, slice thickness 2 mm, in-plane resolution 0.78 mm × 0.78 mm. The sampling direction of every second spoke was reversed to reduce artefacts from off-resonances [22], and numerical experiments were performed using 96, 48 and 24 projections. As $\frac{\pi }{2}N$ projections (402 for N = 256 in our case) have to be acquired to obtain a fully sampled dataset in line with the Nyquist criterion [18], this corresponds to undersampling factors of approximately 4, 8 and 16. The raw data was exported from the scanner, and image reconstruction was performed offline.


Figure 20. Parallel undersampling MRI of the human brain (256 × 256 pixels) from 96, 48 and 24 radial projections (top, middle, bottom row). Left column: conventional NUFFT reconstruction. Middle column: reconstruction with TV regularisation. Right column: reconstruction with TGV2 regularisation. All reconstructed images are shown with a closeup of the lower right brain region.


It can be seen that in particular at higher subsampling factors, variational, derivative-based reconstruction reduces artefacts stemming from limited Fourier measurements. Both TV and TGV perform well, while a closer look reveals that staircasing artefacts present with TV can be avoided using second-order TGV regularisation.

8.3. Diffusion tensor imaging

Magnetic resonance imaging offers, apart from obtaining morphological images as outlined in subsection 8.2, many other possibilities to acquire information about the imaged objects. Among these possibilities, diffusion tensor imaging (DTI) is one of the more recent developments. It aims at measuring the diffusion directions of water protons at each spatial point. The physical background is given by the Bloch–Torrey equation which describes the spatio-temporal evolution of the magnetisation vector taking diffusion processes into account [183]. Based on this, diffusion-weighted imaging can be performed, which uses dedicated MR sequences depending on a direction vector q ∈ R3 in order to obtain displacement information associated with that direction.

This leads to the following model. Assume that ρ0 : R3R is the proton density to recover and ρt : R3 × R3R is the function such that for each x, x' ∈ R3, the value ρt (x, x') represents the probability of a proton moving from x to x' during the time t > 0. By applying a diffusion-sensitive sequence (such as, e.g., a pulsed-gradient spin echo [180]) associated with the vector qR3, one is able to measure in k-space as follows:

where k ∈ R3, see [51, 67]. Note that in practice, also the coil sensitivity profile would influence the measurement as outlined in subsection 8.2; however, for the sake of simplicity, we neglect this aspect in the following. Now, sampling q across R3 would then, in principle, allow one to recover the six-dimensional function u : (x, x') ↦ ρ0(x)ρt (x, x') by inverse Fourier transform, since $S\left(k,q\right)=\left(\mathcal{F}u\right)\left(k-q,q\right)$ for each k, q ∈ R3. The 6D-space spanned by the coordinates k and q is called kq-space. Assuming that for a fixed q ∈ R3, the k-space is fully sampled then allows one to recover fq : R3 → C by inverse Fourier transform, where

Obtaining and analysing fq for a coverage of the q-space is called q -space imaging which also is the basis of orientation-based analysis such as q -ball imaging [184]. However, as these techniques require too much measurement time in practice, one usually makes assumptions about the structure of ρt in order to avoid the measurement of fq for too many q.
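For orientation, plausible forms of the measured signal and of fq, consistent with the relation $S\left(k,q\right)=\left(\mathcal{F}u\right)\left(k-q,q\right)$ stated above under the unnormalised Fourier convention, are

$S(k,q) = \int_{\mathbf{R}^3}\int_{\mathbf{R}^3} \rho_0(x)\,\rho_t(x,x')\, \mathrm{e}^{-\mathrm{i}k\cdot x}\, \mathrm{e}^{-\mathrm{i}q\cdot(x'-x)}\, \mathrm{d}x'\, \mathrm{d}x, \qquad f_q(x) = \rho_0(x)\int_{\mathbf{R}^3} \rho_t(x,x')\, \mathrm{e}^{-\mathrm{i}q\cdot(x'-x)}\, \mathrm{d}x',$

so that, for fixed q, the signal S(·, q) is the Fourier transform of fq.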

Along this line, the probably simplest model is to assume that for each x, ρt (x, ⋅) follows a Gaussian distribution centred around x with symmetric positive definite covariance matrix 2tD(x) ∈ S3×3, i.e.,

For fixed xR3, this can be interpreted as the fundamental solution of the diffusion equation

shifted by x and evaluated at time t. The model for ρt thus indeed reflects linear diffusion through a homogeneous medium. This makes sense as diffusion during the measurement process is usually orders of magnitude smaller than the spatial scale one is interested in, but the homogeneity assumption might be violated when microstructures are present. Nevertheless, with this assumption, in the above case of full k-space sampling, one gets

Equation (89)

Clearly, for q = 0, we have f0 = ρ0, and assuming ρ0 > 0 almost everywhere leads to the following pointwise equation that is linear in D:

Hence, one can recover D by measuring f0 and ${f}_{{q}_{1}},\dots ,{f}_{{q}_{m}}$ for q1, ..., qm ∈ R3 suitably chosen, i.e., such that, in particular, D is uniquely determined by D ⋅ (qi ⊗ qi) for i = 1, ..., m. For this, one requires that the symmetric tensors q1 ⊗ q1, ..., qm ⊗ qm span the space Sym2(R3), meaning that m must be at least 6. Note that according to (89), fq must be real and non-negative, such that in practice, it suffices to reconstruct the absolute value of fq, for instance, by computing the sum-of-squares image.
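Plausible forms of (89) and of the subsequent pointwise equation, inferred from the Gaussian model for ρt, read

$f_q(x) = \rho_0(x)\, \exp\!\left(-t\, q\cdot(D(x)q)\right) \qquad \text{and} \qquad -\frac{1}{t}\,\log\!\left(\frac{f_q(x)}{f_0(x)}\right) = q\cdot(D(x)q) = D(x)\cdot(q\otimes q),$

respectively, the latter being linear in D.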

The inverse problem for D can then be described as follows. Restricting the considerations to a bounded domain Ω ⊂ R3 and letting p ∈ [1, ] such that g1, ..., gm Lp (Ω) where ${g}_{i}=-\frac{1}{t}\;\mathrm{log}\left({f}_{{q}_{i}}/{f}_{0}\right)$, we aim at solving

Equation (90)

for D ∈ Lp (Ω, Sym2(R3)). It is easy to see that this problem is well-posed, but regularisation is still necessary in practice as the measurements and the reconstruction are usually very noisy. To this end, one can, in the case p = 2, minimise a Tikhonov functional with a quadratic discrepancy term and positive semi-definiteness constraints:

Equation (91)

Here, {D ⩾ 0} denotes the set of symmetric tensor fields that are positive semi-definite almost everywhere in Ω. Further, the regulariser ${\mathcal{R}}_{\alpha }$ is preferably tailored to the structure of symmetric tensor fields. Since the D to be recovered can be assumed to admit discontinuities, for instance, at tissue borders, the total deformation TD for Sym2(R3)-valued functions as described in subsection 3.2 constitutes a meaningful regulariser. In this context, higher-order regularisation via ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ for Sym2(R3)-valued functions according to definition 5.1 can be beneficial as, e.g., principal diffusion directions might vary smoothly within the same tissue type [186, 187]. In both cases, problem (91) is well-posed and admits a unique solution which can, once discretised, be found numerically by the algorithms outlined in section 6.
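For reference, plausible forms of (90) and (91), based on the description above, are

$D\cdot(q_i\otimes q_i) = g_i \quad \text{a.e. in } \Omega, \quad i = 1,\dots,m,$

and

$\min_{D \in L^2(\Omega,\mathrm{Sym}^2(\mathbf{R}^3)),\ D \geqslant 0}\ \frac{1}{2}\sum_{i=1}^m \|D\cdot(q_i\otimes q_i) - g_i\|_{L^2(\Omega)}^2 + \mathcal{R}_\alpha(D),$

respectively.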

Once the diffusion tensor field D is obtained, one can use it to visualise some of its properties. For instance, in the context of medical imaging, the eigenvectors and eigenvalues of D play a role in interpreting DTI data. Based on the fractional anisotropy [12], which is defined as

where ${\lambda }_{1},{\lambda }_{2},{\lambda }_{3}:{\Omega}\to \left[\right. 0,\infty \left[\right.$ are the eigenvalues of D as a function in Ω, one is able to identify isotropic regions (FAD ≈ 0) as well as regions where diffusion only takes place in one direction (FAD ≈ 1). The latter case indicates the presence of fibres whose orientation then corresponds to a principal eigenvector of D. Figure 21 shows an example of DTI reconstruction from noisy data using TD and TGV2 regularisation for symmetric tensor fields, using principal-direction/fractional-anisotropy-based visualisation. It turns out that also here, higher-order regularisation is beneficial for image reconstruction [187]. In particular, the faithful recovery of piecewise smooth fibre orientation fields may improve advanced visualisation techniques such as DTI-based tractography.
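The fractional anisotropy can be computed from the eigenvalues of D as in the following minimal NumPy sketch, which uses the standard formula of [12]; the small constant guarding against division by zero is an illustrative choice.

```python
import numpy as np

def fractional_anisotropy(D):
    """Fractional anisotropy of a field of symmetric 3x3 tensors.

    D has shape (..., 3, 3). FA = sqrt(3/2) * ||lam - mean(lam)|| / ||lam||,
    where lam are the eigenvalues; FA is close to 0 for isotropic diffusion
    and close to 1 if diffusion takes place in essentially one direction.
    """
    lam = np.linalg.eigvalsh(D)                      # eigenvalues, shape (..., 3)
    mean = lam.mean(axis=-1, keepdims=True)
    num = np.sqrt(((lam - mean) ** 2).sum(axis=-1))
    den = np.sqrt((lam ** 2).sum(axis=-1))
    return np.sqrt(1.5) * num / np.maximum(den, 1e-12)
```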


Figure 21. Example of TD- and TGV2-regularised diffusion tensor imaging reconstruction. (a) Ground truth, (d) direct reconstruction from noisy data, (b) direct inversion of (90) followed by TD-denoising, (c) Tikhonov regularisation according to (91) with TD-regulariser, (e) direct inversion of (90) followed by TGV2-denoising, (f) Tikhonov regularisation according to (91) with TGV2-regulariser. All images visualise one slice of the respective 3D tensor fields.


8.4. Quantitative susceptibility mapping

Magnetic resonance imaging also has capabilities for the quantification of certain material properties. One of these properties is the magnetic susceptibility which quantifies the ability of a material to magnetise in a magnetic field such as the static field that is used in MRI. Recovering the susceptibility distribution of an object is called quantitative susceptibility mapping (QSM) [72, 177].

Assuming that the static field is aligned with the z-axis of a three-dimensional coordinate system, this susceptibility can be related to the z-component of the static field inhomogeneity δB0 : R3R that is caused by the material, which in turn induces a shift in resonance frequency and, consequently, a phase shift in the complex image data. For instance, if ${\varphi }_{t}:{\mathbf{R}}^{3}\to \left[\right. -\pi ,\pi \left[\right.$ denotes the phase of an MR image acquired with a gradient echo (GRE) sequence with echo time t > 0, the relation between δB0 and φt can be stated as:

where ${\varphi }_{0}:{\mathbf{R}}^{3}\to \left[\right. -\pi ,\pi \left[\right.$ is the time-independent phase offset induced by a single measurement coil and γ is the gyromagnetic ratio. Using multiple coils, the phase offset φ0 can be recovered [159] such that we may assume, in the following, that φ0 = 0. Pursuing a Lorentzian sphere approach and assuming that in the near field, the magnetic dipole moments that cause the magnetisation are randomly distributed, one is able to relate δB0 with the magnetic susceptibility χ associated with the static field orientation approximately as follows [172]:

where B0 is the static field strength and d : R3\{0} → R is the dipole kernel according to

Assuming further that the susceptibility is isotropic, i.e., does not depend on the orientation of the static field, it may be recovered from the phase data φt . However, the phase image φt is only well-defined where the magnitude of the MR image is non-zero (or above a certain threshold). Denoting by Ω ⊂ R3 a Lipschitz domain that describes where φt is available, recovering χ then amounts to solving

for χ : R3R. This problem poses several challenges. First, the values on the left-hand side are only available up to integer multiples of 2π, such that phase unwrapping becomes necessary. There is a plethora of methods available for doing this for discrete data [159], however, in regions of fast phase change, these methods might not correctly resolve the ambiguities introduced by phase wrapping. Consequently, the unwrapped phase image ${\varphi }_{t}^{\mathrm{u}\mathrm{n}\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}}$ might be inaccurate.

With unwrapped phase data being available, the next challenge is to obtain χ on the whole space from a noisy version of χ * d on Ω, which is an underdetermined problem. The usual approach for this challenge is to split χ into its contributions on Ω and R3\Ω and only aim at reconstructing χ on Ω. Now, as the dipole kernel d is harmonic on R3\{0}, the function $\chi {\vert }_{{\mathbf{R}}^{3}{\backslash}{\Omega}}{\ast}d$ is harmonic in Ω. Thus, one can write

Equation (92)

and solve this equation instead. For QSM, one often estimates ψ first and subtracts this estimate from the data. This step is called background field removal in this context and there are many different approaches for that [173]. Depending on the accuracy of the background field estimate, this step may introduce further errors into the data. Nevertheless, the procedure results in a foreground field estimate ${\varphi }_{t}^{\mathrm{f}\mathrm{g}}$ for which only the deconvolution problem

has to be solved. As this problem is ill-posed, it needs to be regularised. A Tikhonov regularisation approach can then be phrased as follows:

for 1 < p < ∞, dt = 2πγtd and ${\mathcal{R}}_{\alpha }$ a regularisation functional on Lp (Ω). As the convolution with dt results in a singular integral, the operation χ ↦ χ * dt is only continuous Lp (Ω) → Lp (Ω) by the Calderón–Zygmund inequality [50], i.e., does not increase regularity. In this context, first-order regularisers (H1 and TV) have been used [20], but also TGV2 has been employed [63]. Note that in these approaches, one usually considers p = 2, which might cause problems regarding well-posedness for TV and TGV2 since, in 3D, coercivity only holds in L3/2(Ω). This problem can, for instance, be avoided by setting dt to zero in a small ball around the origin, a strategy that also seems consistent with the modelling of the forward problem [172]. A numerical solution for χ then finally gives a susceptibility map of the region of interest Ω. However, since the overall procedure involves three sequential steps, each possibly introducing an error that propagates, an integrative variational model that essentially only depends on the original wrapped phase data φt is desirable.
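A plausible form of this Tikhonov approach, based on the above description, is

$\min_{\chi \in L^p(\Omega)}\ \frac{1}{2}\|\chi * d_t - \varphi_t^{\mathrm{fg}}\|_{L^2(\Omega)}^2 + \mathcal{R}_\alpha(\chi),$

with the convolution restricted to Ω.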

Such a model can indeed be derived. First, observe that in case of sufficient regularity, the Laplacian of the unwrapped phase can easily and directly be obtained from φt :

Equation (93)

such that ${\varphi }_{t}^{\mathrm{u}\mathrm{n}\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}}$ is known up to an additive harmonic contribution. Indeed, this is the concept behind Laplacian phase unwrapping [168]. Further, introducing the wave-type operator

and noticing that d = □Γ, where Γ : R3\{0} → R is the fundamental solution of the Laplace equation, i.e., ${\Gamma}\left(x,y,z\right)=\frac{1}{4\pi }{\left({x}^{2}+{y}^{2}+{z}^{2}\right)}^{-1/2}$, it follows from (92) that

Equation (94)

In particular, the harmonic contribution from the background field vanishes and the data obtained in (93) can directly be used on the right-hand side. Thus, only a wave-type partial differential equation has to be solved, for which background field correction is no longer needed. The equation, however, lacks boundary conditions, such that one cannot expect to recover χ in all circumstances. Under a priori assumptions on χ, the lack of boundary conditions can be mitigated by the introduction of a regularisation functional. Indeed, assuming that χ is piecewise constant and of bounded variation, the minimisation of TV subject to (94) recovers χ up to an additive constant [44].

Since the data φt might be noisy, the variational model should also account for errors on the right-hand side of (94) and introduce a suitable discrepancy term. Assuming Gaussian noise for φt , the right-hand side ${\Delta}{\varphi }_{t}^{\mathrm{u}\mathrm{n}\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}}$ is perturbed by noise in H−2(Ω), which suggests an H−2-discrepancy term for (94). The latter can be realised by requiring ${\Delta}\psi =2\pi \gamma t\square \chi -{\Delta}{\varphi }_{t}^{\mathrm{u}\mathrm{n}\mathrm{w}\mathrm{r}\mathrm{a}\mathrm{p}}$ for a ψL2(Ω) and measuring the L2-norm of ψ. In total, this leads to

Equation (95)

where 1 ⩽ p < ∞, the constraint has to be understood in the distributional sense, and ${\mathcal{R}}_{\alpha }$ is a regularisation functional on Lp (Ω) realising a priori assumptions on χ that compensate for the lack of boundary conditions. In [126], the choice ${\mathcal{R}}_{\alpha }={\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ was proposed and studied. Choosing $p=\frac{3}{2}$, the functional in (95) is coercive up to finite dimensions and the linear PDE-constraint is closed, so one can easily see that an optimal solution always exists and yields finite values once there is a pair (χ, ψ) ∈ BV(Ω) × L2(Ω) that satisfies the constraints.
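Based on the constraint just described, a plausible form of (95) is

$\min_{\chi \in L^p(\Omega),\ \psi \in L^2(\Omega)}\ \frac{1}{2}\|\psi\|_{L^2(\Omega)}^2 + \mathcal{R}_\alpha(\chi) \quad \text{subject to} \quad \Delta\psi = 2\pi\gamma t\,\square\chi - \Delta\varphi_t^{\mathrm{unwrap}} \ \text{in } \Omega.$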

A numerical algorithm for the discrete solution of (95) with TGV2-regularisation can easily be derived by employing the tools of section 6 and, e.g., finite-difference discretisations of the operator □. In [126], a primal-dual algorithm has been implemented and tested for synthetic as well as real-life data. It turns out that the integrative approach (95) is very robust to noise and can in particular be employed for fast 3D MRI-acquisition schemes that may yield a low signal-to-noise ratio, such as 3D echo-planar imaging (EPI) [147]. It has been tested on raw phase data, see figure 22, where the benefits of higher-order regularisation also become apparent. Due to the short scan time that is possible with this approach as well as its robustness, it might additionally contribute to advancing QSM further towards clinical applications.


Figure 22. Example for integrative TGV-regularised susceptibility reconstruction from wrapped phase data. (a) Magnitude image (for brain mask extraction). (b) Input phase image φt (single gradient echo, echo time: 27 ms, field strength: 3 T). (c) Result of the integrative approach (95) (scale from −0.15 to 0.25 ppm). All images visualise one slice of the respective 3D image.


8.5. Dynamic MRI reconstruction

As mentioned in subsection 8.2, data acquisition in MR imaging is relatively slow. This can be compensated for by subsampling and variational reconstruction techniques such that in controlled environments, as for instance with brain or knee imaging, a good reconstruction quality can be obtained. The situation is more difficult when imaging parts of the body that are affected, for instance, by breathing motion, or when one aims to image certain dynamics such as with dynamic contrast-enhanced MRI or heart imaging. Regarding unwanted motion, there exists a large amount of literature on motion correction techniques (see [197] for a review), which can be separated into prospective and retrospective motion correction and which often rely on additional measurements to estimate and correct for unwanted motion. In contrast to that, dynamic MRI aims to capture certain dynamic processes such as heartbeats or the flow of blood or contrast agent. Here, the approach is often to acquire highly subsampled data, possibly combined with gating techniques, such that motion consistency can be assumed for each single frame of a time series of measurements. The severe lack of data for each frame can then only be mitigated by exploiting temporal correspondences between different measurement times. One way to achieve this is via Tikhonov regularisation of the dynamic inverse problem, which, for instance, amounts to

Equation (96)

where p ∈ [1, ∞], T > 0, Ω ⊂ Rd is the image domain, and for almost every $t\in \left.\right]0,T\left[\right.$, σt is a positive, finite Radon measure on Rd that represents the possibly time-dependent Fourier sampling pattern at time t, such that ${K}_{t}:{L}^{p}\left({\Omega}\right)\to {L}_{{\sigma }_{t}}^{2}{\left({\mathbf{R}}^{d},\mathbf{C}\right)}^{k}$ according to (87) models the MR forward operator, and ${f}_{t}\in {L}_{{\sigma }_{t}}^{2}{\left({\mathbf{R}}^{d},\mathbf{C}\right)}^{k}$ represents the associated measurement data. Further, ut denotes the evaluation of u at time t, which, for almost every t, is a function in Lp (Ω). As usual, ${\mathcal{R}}_{\alpha }$ corresponds to the regularisation functional that can be used to enforce additional regularity constraints. Note that in order to obtain a well-defined formulation for the time-dependent integral, the σt have to vary in a measurable way with t such that the associated Kt and the data ft are also measurable in a suitable sense. We refer to [30] for details on the necessary notions and spaces, and an analysis of the above problem in the context of optimal-transport-based regularisation.
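A plausible form of the dynamic Tikhonov problem (96), consistent with the description above, is

$\min_{u}\ \int_0^T \frac{1}{2}\|K_t u_t - f_t\|_{L^2_{\sigma_t}(\mathbf{R}^d,\mathbf{C})^k}^2\, \mathrm{d}t + \mathcal{R}_\alpha(u).$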

In the context of clinical MR applications, temporal Fourier transforms [117], temporal derivatives [2] or combinations thereof [86] have, for instance, been proposed for temporal regularisation. More recently, methods that build on motion-dependent additive decomposition of the dynamic image data into different components have been successful. The work [141] achieves this in a discrete setting via low-rank and sparse decomposition which, for the low-rank component, penalises the singular values of the matrix containing the vectorised frames in each column. In contrast to that, by employing the ICTGV functional presented in subsection 5.3, the work [167] achieves an additive decomposition and adaptive regularisation of the dynamic data via penalising differently weighted spatio-temporal derivatives. There, problem (96) is solved for the choice

where the ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{i}}^{2}$ are second-order spatio-temporal TGV functionals that employ different weightings of the components of the spatio-temporal derivatives in such a way that for ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{1}}^{2}$, changes in time are penalised more strongly than changes in space, while ${\mathrm{T}\mathrm{G}\mathrm{V}}_{{\beta }_{2}}^{2}$ acts the other way around. The numerical solution of (96) can again be obtained within the algorithmic framework presented in section 6 and we refer to [171] for a GPU-accelerated open-source implementation and demo scripts.
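The infimal-convolution choice described above can plausibly be written as

$\mathcal{R}_\alpha(u) = \min_{u_1 + u_2 = u}\ \mathrm{TGV}_{\beta_1}^2(u_1) + \mathrm{TGV}_{\beta_2}^2(u_2),$

so that each additive component of the dynamic data is regularised by the spatio-temporal TGV functional that fits it best.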

Figure 23 shows the result of ICTGV-regularised reconstruction of a multi-coil cardiac cine dataset (subsampled with factor ≈ 11) and compares it to the straightforward sum-of-squares (SOS) reconstruction. Since the SOS reconstruction does not account for temporal correspondences, it is not able to obtain a useful result for a high subsampling factor, while the ICTGV-based reconstruction resolves fine details as well as motion dynamics rather well. Figure 24 shows a comparison to the low-rank and sparse (L + S) method of [141] for a second cine dataset with a different view. Here, the parameters for the L + S method were optimised for each experiment separately using the (in practice unknown) ground truth, while for ICTGV, the parameters were trained a priori on a different dataset and fixed afterwards. It can be seen in figure 24 that both methods perform rather well up to the high subsampling factors, where the ICTGV-based method is able to recover fine details (highlighted by arrows) that are lost with L + S reconstruction.


Figure 23. Comparison of straightforward sum-of-squares (left) and ICTGV-regularised (right) reconstruction for a dynamic MR dataset with subsampling factor ≈ 11. Each image shows one frame of the reconstructed image sequence along with the temporal evolution of one horizontal and vertical cross section indicated by the red and blue line, respectively.


Figure 24. Comparison of L + S- and ICTGV-regularised dynamic MR reconstruction. The first column shows, from top to bottom, a frame of the ground truth image sequence along with the temporal evolution of a vertical and horizontal cross section (indicated by red dotted lines) as well as a close up. Columns 2 and 3 depict the reconstruction results for L + S regularisation, while columns 4 and 5 depict the corresponding results for ICTGV regularisation (subsampling factors r = 12 and r = 16). The red arrows indicate details that are lost by L + S regularisation but maintained with ICTGV regularisation. Reproduced from [167] John Wiley & Sons. © 2016 International Society for Magnetic Resonance in Medicine.


8.6. Joint MR-PET reconstruction

We have seen in subsections 8.2 and 8.5 that image reconstruction from parallel, subsampled MRI data is non-trivial and can greatly be improved with variational regularisation. Beyond MRI and CT, a further medical imaging modality of high clinical relevance is positron emission tomography (PET). As opposed to standard MR imaging, PET imaging is quantitative and builds on reconstructing the spatial distribution of a radioactive tracer that is injected into the patient prior to the measurement. The forward model in PET imaging is the x-ray transform (often combined with resolution modelling) and, since measurements correspond to photon counts, the noise in PET imaging is typically assumed to be Poisson distributed. Reconstructing images from PET measurement data is a non-trivial inverse problem, where difficulties arise, for instance, from high Poisson noise due to limited data acquisition time, dosage restrictions for the radioactive tracer, as well as from limited measurement resolution due to finite detector size and photon acollinearity. As a result, variational reconstruction methods and in particular TV regularisation are employed also in PET imaging to improve reconstruction (see, for instance, [116, 165]).

In a clinical workflow, often both MR and PET images are acquired, which provides two complementary sources of information for diagnosis. This can also be exploited for reconstruction: in particular, MR-prior-based PET reconstruction methods, which incorporate structural information from the MR image into the PET reconstruction, are now well established in theory [105] and in practice [81, 169, 189]. While those methods regard an a priori reconstructed MR image as a fixed, anatomical prior for PET, joint, synergistic reconstruction is also possible and has recently become more popular due to the availability of joint MR-PET scanners [82, 122]. An advantage of the latter is that neither of the two images is fixed a priori and, in principle, a mutual benefit for both modalities due to joint reconstruction is possible. To this aim, the regularisation term needs to incorporate an appropriate coupling of the two modalities, and here we discuss the coupled TGV-based approach of [122] that achieves this.

At first, we consider the forward model for PET imaging, which consists of a convolution followed by an attenuated x-ray transform and additive corrections. With Ω ⊂ Rd the image domain such that Ω ⊂ BR (0) for some R > 0, the x-ray transform can be defined as a linear operator $P:{L}^{p}\left({\Omega}\right)\to {L}_{\mu }^{1}\left({\Sigma}\right)$, where p ∈ [1, ∞] and ${\Sigma}\subset \left\{\left(\vartheta ,x\right)\enspace \vert \enspace \vartheta \in {\mathcal{S}}^{d-1},\enspace x\in {\left\{\vartheta \right\}}^{\perp },\enspace {\Vert}x{\Vert}{< }R\right\}$ is a non-empty and open subset of the tangent bundle to ${\mathcal{S}}^{d-1}$, via

Note that here, u is extended by zero outside Ω and the measure μ on Σ is induced by the functional

see [134, section 3.4] for details. We further denote by kL1(Br (0)) a convolution kernel with width r > 0 that models physical limitations in PET imaging, for instance, due to finite detector size and photon acollinearity, see [151]. The PET forward model is then defined as ${K}_{\mathrm{P}\mathrm{E}\mathrm{T}}:{L}^{p}\left({\Omega}\right)\to {L}_{\mu }^{1}\left({\Sigma}\right)$

where u * k denotes the convolution of u and k (using again zero extension), $a\in {L}_{\mu }^{\infty }\left({\Sigma}\right)$ with a > 0 a.e. includes a correction for attenuation and detector sensitivities and $c\in {L}_{\mu }^{1}\left({\Sigma}\right)$ with c ⩾ 0 a.e. accounts for additive errors due to random and scattered events. Assuming the noise in PET to be Poisson distributed, we use the Kullback–Leibler divergence as defined in (2) for data fidelity.

For the MR forward model, we use again the parallel MR operator KMR according to (87) in subsection 8.2 that includes coil sensitivity profiles and a measurement trajectory defined via a finite, positive Radon measure σ on Rd .

For regularisation, we use an extension of second-order TGV to multi-channel data as discussed in subsection 5.3, which, as in parallel MR reconstruction, is adapted to complex-valued data. That is, we define ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ for $u=\left({u}_{1},{u}_{2}\right)\in {L}_{\mathrm{l}\mathrm{o}\mathrm{c}}^{1}{\left({\Omega},\mathbf{C}\right)}^{2}$ similarly to (53), where we use the spectral norm as pointwise dual norm on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{1}{\left({\mathbf{C}}^{d}\right)}^{2}$ and the Frobenius norm on ${\mathrm{S}\mathrm{y}\mathrm{m}}^{2}{\left({\mathbf{C}}^{d}\right)}^{2}$. In the primal version of TGV analogous to (35), this results in particular in a pointwise nuclear-norm penalisation of the first-order derivative information ∇u − w and is motivated by the goal of enforcing pointwise rank one of ∇u − w in a discretised setting and hence an alignment of level sets.

With these building blocks, a variational model for coupled MR-PET reconstruction can be written as

where ${\left({f}_{1}\right)}_{1},\dots ,{\left({f}_{1}\right)}_{k}\in {L}_{\sigma }^{2}\left({\mathbf{R}}^{d},\mathbf{C}\right)$ and ${f}_{2}\in {L}_{\mu }^{1}\left({\Sigma}\right)$, f2 ⩾ 0 almost everywhere, are the given measurement data for MR and PET, respectively, and λ1, λ2 > 0 are the weights for the different data terms. Well-posedness for this model follows again by a straightforward adaptation of proposition 5.17 to the multi-channel setting. Regarding the regularisation parameters λ1, λ2, this is a particular case of coupled multi-discrepancy regularisation and we refer to [107] for results on convergence and parameter choice for vanishing noise. A numerical solution can again be obtained with the techniques described in section 6, where the discrete forward operator Kh is vectorised as Kh = diag(KMR,h , KPET,h ) with discretised operators KMR,h and KPET,h , and the discrepancy ${S}_{{f}_{h}}$ is the component-wise sum of the two discrepancies above in a discrete version.
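Assembling the pieces above, a plausible form of the PET forward operator and of the coupled variational model (the exact formulation is given in [122]) reads

$K_{\mathrm{PET}}u = a\, P(u * k) + c \qquad \text{and} \qquad \min_{u = (u_1,u_2)}\ \frac{\lambda_1}{2}\sum_{j=1}^k \|(K_{\mathrm{MR}}u_1)_j - (f_1)_j\|_{L^2_\sigma}^2 + \lambda_2\, \mathrm{KL}(K_{\mathrm{PET}}u_2, f_2) + \mathrm{TGV}_\alpha^2(u_1, u_2),$

with KL the Kullback–Leibler discrepancy from (2).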

Numerical results for 3D in-vivo data obtained with this method, together with a comparison to a standard method, can be found in figure 25 (see also [122] for a more detailed evaluation). As can be seen there, the coupling of the two modalities yields improved reconstruction results, in particular for the PET channels, making sharp features and details more visible.


Figure 25. Slices of fused, reconstructed 3D in vivo MR and PET images from an MPRAGE contrast with two-fold subsampling and a 10 min fluorodeoxyglucose (FDG) PET head scan, respectively (left to right: transversal view, coronal view, sagittal view). Top row: standard methods (CG sense [148] for MR, expectation maximisation [176] for PET). Bottom row: nuclear-norm-TGV-based variational reconstruction.


8.7. Radon inversion for multi-channel electron microscopy

Similar to joint MR-PET reconstruction, coupled higher-order regularisation can also be used in multi-channel electron microscopy imaging for improving reconstruction quality. As a particular technique in electron microscopy, scanning transmission electron microscopy (STEM) allows for three-dimensional imaging of nanomaterials down to atomic resolution and is heavily used in materials science and nanotechnology, e.g. for quality control and troubleshooting in the production of microchips. Beyond providing pure density images, spectroscopy methods in STEM imaging also allow one to image the 3D elemental and chemical make-up of a sample.

Standard techniques for density and spectroscopy imaging in STEM are high-angle annular dark-field (HAADF) imaging and energy-dispersive x-ray spectroscopy (EDXS), respectively. For both imaging methods, measurement data can be acquired simultaneously while raster-scanning the material sample with a focussed electron beam. HAADF imaging records the number of electrons scattered to a specific annular range, while EDXS allows one to record characteristic x-rays of specific elements, which are emitted when electrons change their shell position. For each position of the electron beam, both HAADF and EDXS measurements correspond to measuring (approximately) the density of a weighted sum of all elements and of a single element, respectively, integrated along the line of the electron beam that intersects the sample. Scanning over the entire sample orthogonal to an imaging plane, the acquired signals hence correspond to a slice-wise Radon transform of an overall density image (HAADF) and of different elemental maps (EDXS).

Due to physical restrictions in the imaging system, the number of available projections (i.e., measurement angles) as well as the signal-to-noise ratio, in particular for EDXS, is limited and volumetric images obtained with standard image reconstruction methods, such as the simultaneous iterative reconstruction technique (SIRT) [93], suffer from artefacts and noise. As a result, regularised reconstruction is increasingly used also for electron tomography, with total-variation-based methods being a popular example [96]. While TV regularisation works well for piecewise-constant density distributions with sharp interfaces, the presence of gradual changes between different sample regions, e.g., due to diffusion at interfaces, motivates the usage of higher-order regularisation approaches for electron tomography [111], also in a single-channel setting [3]. In a multi-channel setting as discussed here, an additional coupling of different measurement channels is very beneficial in particular for the reconstruction of elemental maps and has been carried out with first-order TV regularisation in [201, 202] and second-order TGV regularisation in [111]. In the following, we discuss the TGV-based approach of [111] in more detail and provide experimental results.


Figure 26. Example of multi-channel electron tomography. The images show density maps (top row) and elemental maps (bottom row) of one slice of a 3D multi-channel electron tomography reconstruction using different reconstruction strategies [111]. Left column: SIRT [93] method. Middle column: uncoupled TGV-based regularisation. Right column: coupled TGV-based regularisation. Reproduced from [111]. CC BY 3.0.


With Σ = Σ1 × Σ2, ${{\Sigma}}_{1}\subset {\mathcal{S}}^{1}{\times}\left.\right]-R,R\left[\right.$ non-empty, open, ${{\Sigma}}_{2}\subset \left.\right]-R,R\left[\right.$ non-empty, open, for some R > 0 and $\mu =\left({\mathcal{H}}^{1}\enspace \llcorner \enspace {\mathcal{S}}^{1}{\times}{\mathcal{L}}^{1}\right){\times}{\mathcal{L}}^{1}$, we define, for ${\Omega}={B}_{R}\left(0\right){\times}\left.\right]-R,R\left[\right.$ where BR (0) ⊂ R2 and p ∈ [1, ], the forward operator for electron tomography as ${K}_{\mathrm{T}\mathrm{E}\mathrm{M}}:{L}^{p}\left({\Omega}\right)\to {L}_{\mu }^{1}\left({\Sigma}\right)$ via

which corresponds to a slice-wise 2D Radon transform. By continuity of the Radon transform from L1(BR (0)) to ${L}^{1}\left({\mathcal{S}}^{1}{\times}\left.\right]-R,R\left[\right.\right)$ (see [134, section 3.4]) there is a C > 0 such that, for every uLp (BR (0)) and almost every $z\in \left.\right]-R,R\left[\right.$,

Integrating over Σ2, it follows that KTEM is bounded from Lp (Ω) to ${L}_{\mu }^{1}\left({\Sigma}\right)$. Now assume f1, ..., fn to be given multi-channel measurement data and the forward model for the ith measurement channel to be described by ${\left({K}_{\mathrm{T}\mathrm{E}\mathrm{M}}\right)}_{i}\in \mathcal{L}\left({L}^{p}\left({\Omega}\right),{L}_{\mu }^{1}\left({\Sigma}\right)\right)$ for i = 1, ..., n. In the example considered below, f = (fHAADF, fYb, fAl, fSi), with fHAADF the HAADF data, and (fYb, fAl, fSi) the EDXS data for ytterbium, aluminium and silicon, respectively. With ${\mathrm{T}\mathrm{G}\mathrm{V}}_{\alpha }^{2}$ the multi-channel extension of second-order TGV as discussed in subsection 5.3, using a Frobenius-norm coupling of the different channels, we consider
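A plausible form of the slice-wise operator and of the omitted slice-wise estimate, based on the description above, is

$(K_{\mathrm{TEM}}u)(\vartheta, s, z) = \int_{\mathbf{R}} u(s\vartheta + t\vartheta^{\perp}, z)\, \mathrm{d}t, \qquad \|(K_{\mathrm{TEM}}u)(\cdot,\cdot,z)\|_{L^1(\Sigma_1)} \leq C\, \|u(\cdot,z)\|_{L^p(B_R(0))},$

where u is extended by zero outside Ω and $\vartheta^{\perp}$ denotes the rotation of ϑ by 90°.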

for the reconstruction of multi-channel image data, for which well-posedness again results from a multi-channel extension of proposition 5.17. A numerical solution can be obtained using the framework as described in section 6 where again the discrete forward operator Kh and the discrepancy term ${S}_{{f}_{h}}$ are vectorised accordingly, similar as in subsection 8.6.

Experimental results for this setting and a comparison to other methods can be found in figure 26, where in particular, separate TGV regularisation of each channel is compared to the Frobenius-norm-based coupling as mentioned above. It can be seen in figure 26 that using TGV regularisation significantly improves upon the standard SIRT method. Also, a coupling of the different channels is very beneficial in particular for the elemental maps, making material inclusions visible that can hardly be seen with an uncoupled reconstruction. We refer to [111] for a more detailed evaluation (and comparison to TV-based regularisation).

9. Conclusions

The higher-order total variation strategies and application examples discussed in this review show once again that while regularisation makes it possible to solve ill-posed inverse problems in the first place, the actual choice of the regularisation strategy has a tremendous impact on the qualitative properties of the regularised solutions and can be decisive for whether the inverse problem is considered solved in practice. In the considered context of Tikhonov regularisation, convex regularisation functionals offer great flexibility in terms of functional-analytic properties and a priori assumptions on the solutions. With the total variation being an established regulariser with desirable properties such as the ability to recover discontinuities, higher-order total variation regularisers offer additional possibilities, mainly the efficient modelling of piecewise smooth regions in which the derivative of some order may jump. We have seen in this paper that a regularisation theory for these functionals can be established and that the overall theory is now sufficiently advanced such that the favourable properties of both first- and higher-order TV can be obtained with suitable functionals, for instance by infimal convolution. The underlying concepts are in particular suitable for various generalisations. The total generalised variation, for instance, is based on TV-type penalties for a multiple-order differentiation cascade and thus makes it possible to realise the a priori assumption of piecewise smoothness with jump discontinuities. Further, as higher-order derivatives are intrinsically connected to symmetric tensor fields, a generalisation to dedicated regularisation approaches for the latter is immediate. All these approaches and generalisations are indeed beneficial for applications and the solution of concrete inverse problems. This is in particular the case for inverse problems in medical imaging.

Of course, there are still several directions of future research, open questions and topics that have not been covered by this review. For instance, one of the major differences between first-order TV and higher-order approaches is the availability of a co-area formula that can be used to describe the total variation of a scalar function in terms of its sublevel sets. Generalisations to vector-valued functions or higher-order derivatives either do not exist or are not practical from the viewpoint of regularisation theory for inverse problems. As the co-area formula allows one, for instance, to obtain geometrical properties for TV regularisation [58, 112], it would be interesting to bridge the gap to higher-order TV approaches such that similar statements can be made. Some recent progress in this direction might be the connection between the solutions of certain linear inverse problems and the extremal points of the sublevel sets of the regulariser [26, 29], since the extremal points of the TV-balls are essentially characteristic functions. However, for higher-order TV and the generalisations discussed in this paper, a characterisation of the extremal points is not known to date. Further, in the context of TV regularisation, only natural orders of differentiation have been considered in detail so far, with regularisation theory for fractional-order TV just emerging [70, 194, 199]. Indeed, there are many open questions for fractional-order TV regularisation, ranging from the properties of the fractional derivative operators and their underlying spaces to the optimal selection of the fractional differentiation parameter as well as the construction of efficient numerical algorithms. Finally, with all the possibilities of combining distributional differentiation and Radon-norm penalisation, which are the essential building blocks of the regularisers discussed in this paper, the question arises whether their structure, parameters and differential operators can also be learned by data-driven optimisation. Some results in this direction can already be found in the literature [49, 71], and we expect that more will follow in the future.

Acknowledgments

The authors gratefully acknowledge support of the Austrian Science Fund (FWF) within the project P 29192. The Institute of Mathematics and Scientific Computing at the University of Graz is a member of NAWI Graz (www.nawigraz.at) and BioTechMed Graz (www.biotechmedgraz.at).

Appendix A. Additional proofs

Lemma A.1. With Ω' ⊂ Rd measurable, in accordance with equation (2), let the functional KL on L1(Ω')2 be given as

$\mathrm{KL}(v,f) = \begin{cases}\displaystyle\int_{\Omega'} v - f - f\,\log\Bigl(\dfrac{v}{f}\Bigr)\,\mathrm{d}x & \text{if } v \geq 0,\ f \geq 0 \text{ a.e.},\\ \infty & \text{else},\end{cases}$

where we set the integrand to v where f = 0 and to ∞ where v = 0 and f > 0. Then, KL is well-defined, non-negative, convex and lower semi-continuous. In case f ⩾ 0 a.e., it holds that KL(v, f) = 0 if and only if v = f. Further, for all v, f ∈ L1(Ω'),

Equation (A.1)

and in particular,

Equation (A.2)

for all f, v ∈ L1(Ω').

Proof. At first note that, in case f, v ⩾ 0, KL is given by integrating g : [0, ∞[2 → [0, ∞] with $g(x,y)=x-y-y\,\log\bigl(\frac{x}{y}\bigr)$ for $x,y\in [0,\infty[$, where we use the conventions $0\,\log\bigl(\frac{v}{0}\bigr)=0$ for v ⩾ 0 and $-f\,\log\bigl(\frac{0}{f}\bigr)=\infty$ for f > 0. It is easy to see that g is non-negative, convex and lower semi-continuous, hence KL is well-defined, non-negative, convex, lower semi-continuous and, in case f ⩾ 0 a.e., KL(v, f) = 0 if and only if v = f. Also, a simple computation (see [23]) shows that for all $x,y\in [0,\infty[$,

from which the estimate (A.1) follows with the Cauchy–Schwarz inequality applied to the square root of the above estimate. Now, for the first estimate in (A.2), we take f, v ∈ L1(Ω') and note that in case ||v||1 ⩽ ||f||1, the estimate holds trivially. In the other case, v ≠ 0 and we observe that (A.1) implies

from which the claimed estimate follows from rearranging, dividing by ||v||1 and noting that ||f||1/||v||1 ⩽ 1. The second estimate in (A.2) follows analogously. □
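To make the conventions of lemma A.1 concrete, the following minimal Python sketch (our own illustration, not part of the original text; the uniform-grid quadrature, the array names and the treatment of negative entries as +∞ are assumptions made here) evaluates a discretised version of KL(v, f) and illustrates that it is non-negative and vanishes exactly for v = f.

import numpy as np

def kl_discrepancy(v, f, dx=1.0):
    # Discretised KL(v, f) = sum of (v - f - f*log(v/f)) * dx on a uniform grid.
    # Conventions as in lemma A.1: integrand = v where f = 0, and +inf where
    # v = 0 and f > 0; negative entries are assigned the value +inf (assumption).
    v = np.asarray(v, dtype=float)
    f = np.asarray(f, dtype=float)
    if np.any(v < 0) or np.any(f < 0) or np.any((v == 0) & (f > 0)):
        return np.inf
    out = np.empty_like(v)
    zero_f = (f == 0)
    out[zero_f] = v[zero_f]        # convention: integrand = v where f = 0
    pos = ~zero_f                  # here f > 0 and, by the check above, v > 0
    out[pos] = v[pos] - f[pos] - f[pos] * np.log(v[pos] / f[pos])
    return float(np.sum(out) * dx)

x = np.linspace(0.0, 1.0, 201)
dx = x[1] - x[0]
f = 1.0 + 0.5 * np.sin(2.0 * np.pi * x)
v = 1.0 + 0.5 * np.sin(2.0 * np.pi * x + 0.3)
print(kl_discrepancy(f, f, dx))    # 0.0, since KL(f, f) = 0
print(kl_discrepancy(v, f, dx))    # strictly positive, since v differs from f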

Lemma A.2. For {fn } and f in L1(Ω'), let KL(f, fn ) → 0. Then, ||f − fn ||1 → 0 and for each sequence {vn } in L1(Ω') with vn → v as n → ∞ for v ∈ L1(Ω'), it holds that

$\mathrm{KL}(v, f) \leq \liminf_{n\to\infty} \mathrm{KL}(v^{n}, f^{n}).$

If, in addition, fn ⩽ Cf a.e. in Ω' for all n and some C > 0, then for all v ∈ L1(Ω'), we have

$\limsup_{n\to\infty} \mathrm{KL}(v, f^{n}) \leq \mathrm{KL}(v, f).$
Proof. Assume that KL(f, fn ) → 0. It follows from the second estimate in (A.2) that {KL(f, fn )} bounded implies {||fn ||1} bounded which, using (A.1), yields that fn → f in L1(Ω'). The lim inf estimate then follows from lower semi-continuity as in lemma A.1. Now assume that additionally, fn ⩽ Cf a.e. in Ω' for all n and some C > 0. By L1-convergence we can take a subsequence $\{f^{n_k}\}$ such that $f^{n_k}\to f$ pointwise a.e., and $\lim_{k\to \infty }\mathrm{KL}(v,f^{n_k})=\limsup_{n\to \infty }\mathrm{KL}(v,f^{n})$. As $\mathrm{KL}(f,f^{n_k})\to 0$ as k → ∞, we have $\int_{\Omega'} f^{n_k}\,\log(f/f^{n_k})\,\mathrm{d}x\to 0$. Also, since f log(v/f) ∈ L1(Ω') and $f^{n_k}/f$, where we set $f^{n_k}/f=0$ where f = 0, is bounded a.e. uniformly with respect to k, we have $\int_{\Omega'}(f^{n_k}/f)\,f\,\log(v/f)\,\mathrm{d}x\to \int_{\Omega'} f\,\log(v/f)\,\mathrm{d}x$ by virtue of Lebesgue's theorem. Together, we get

which is what we wanted to show. □

Lemma A.3. Let k ⩾ 1, l ⩾ 0 and u : Ω → Syml (Rd ) be (k + l)-times continuously differentiable such that ${\mathcal{E}}^{k}u=0$ in Ω. Then, ∇k+l u = 0 in Ω.

Proof. The statement is a slight generalisation of [28, proposition 3.1] and its proof is analogous. We present it for the sake of completeness. Choose a1, ..., a2l+k ∈ Rd . We show that (∇k+l u)(x)(a1, ..., a2l+k ) = 0 for each x ∈ Ω. For this purpose, let L ⊂ {1, ..., 2l + k} with $\vert L\vert = l$ and denote, dropping the dependence on x, by

$u_{L} = u(a_{\pi(1)}, \ldots, a_{\pi(l)})$
for some bijective π : {1, ..., l} → L, giving a (k + l)-times differentiable uL : Ω → R. Observe that by symmetry, uL does not depend on the choice of π but indeed only on L. Likewise, denote by

$\dfrac{\partial^{k+l} u_{L}}{\partial a_{\complement L}} = \dfrac{\partial^{k+l} u_{L}}{\partial a_{\sigma(1)} \cdots \partial a_{\sigma(k+l)}}$
for some bijective σ : {1, ..., k + l} → ∁L. By symmetry of the derivative, $\frac{\partial^{k+l}u_{L}}{\partial a_{\complement L}}:{\Omega}\to \mathbf{R}$ only depends on L. We also introduce an analogous notation for the symmetrised derivative ${\mathcal{E}}^{k}$:

$(\mathcal{E}^{k}u)_{\complement L} = (\mathcal{E}^{k}u)(a_{\sigma(1)}, \ldots, a_{\sigma(k+l)})$

and, for some π : {1, ..., l} → L bijective,

$\dfrac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{L}} = \dfrac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{\pi(1)} \cdots \partial a_{\pi(l)}}.$
Now, as for π : {1, ..., l} → L bijective, the definitions as well as symmetry yield

we see that each $\frac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{L}}$ can be written as a linear combination of the $\frac{\partial^{k+l}u_{M}}{\partial a_{\complement M}}$. Up to the factor $\binom{k+l}{l}^{-1}$, the linear mapping that takes the formal vector $\bigl(\frac{\partial^{k+l}u_{M}}{\partial a_{\complement M}}\bigr)_{M}$ indexed by all M ⊂ {1, ..., 2l + k}, $\vert M\vert = l$, to the formal vector $\bigl(\frac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{L}}\bigr)_{L}$ indexed by all L ⊂ {1, ..., 2l + k}, $\vert L\vert = l$, corresponds to multiplication by the adjacency matrix of the Kneser graph K2l+k,l , see, for instance, [19] for a definition. The latter is regular, which can, for instance, be seen by looking at its eigenvalues, which are known to be

$(-1)^{i}\binom{k+l-i}{l-i}, \qquad i = 0, \ldots, l,$
see again [19]; in particular, all eigenvalues are non-zero. Thus, we can find real numbers ${\left({c}_{L}\right)}_{L}$ indexed by all L ⊂ {1, ..., 2l + k}, $\vert L\vert = l$, independent of u and a1, ..., a2l+k , such that for M = {k + l + 1, ..., 2l + k}, the identity

$\dfrac{\partial^{k+l}u_{M}}{\partial a_{\complement M}} = \sum_{L \subset \{1,\ldots,2l+k\},\ \vert L\vert = l} c_{L}\, \dfrac{\partial^{l}(\mathcal{E}^{k}u)_{\complement L}}{\partial a_{L}}$
holds. If ${\mathcal{E}}^{k}u=0$, then the right-hand side is 0 while the left-hand side corresponds to ∇k+l u(a1, ..., a2l+k ). This completes the proof. □
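The combinatorial mechanism of the proof can be made explicit in the simplest non-trivial case k = l = 1; the following worked instance is our own illustration and not part of the original argument. For a twice continuously differentiable vector field u : Ω → Rd and directions a, b, c ∈ Rd , the symmetry of second derivatives gives the classical identity

$\dfrac{\partial^{2}(u\cdot c)}{\partial a\,\partial b} = \dfrac{\partial\,(\mathcal{E}u)(b,c)}{\partial a} + \dfrac{\partial\,(\mathcal{E}u)(a,c)}{\partial b} - \dfrac{\partial\,(\mathcal{E}u)(a,b)}{\partial c},$

so that $\mathcal{E}u = 0$ indeed forces ∇2 u = 0. Here 2l + k = 3, the Kneser graph K3,1 is the complete graph on three vertices with eigenvalues 2, −1, −1, and the coefficients cL ∈ {1, 1, −1} in the identity above are, up to the factor $\binom{2}{1} = 2$, the entries of a row of the inverse of this adjacency matrix.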

Lemma A.4. Let k ⩾ 1 and Ω ⊂ Rd be a bounded Lipschitz domain. For each u ∈ BVk (Ω) and δ > 0, there exists a uδ ∈ BVk (Ω) ∩ C∞(Ω) such that for δ → 0,

$\Vert u^{\delta} - u\Vert_{1} \to 0 \quad\text{and}\quad \mathrm{TV}^{m}(u^{\delta}) \to \mathrm{TV}^{m}(u) \quad\text{for } m = 1, \ldots, k,$
i.e., {uδ } converges strictly in BVk (Ω) to u as δ → 0.

Proof. The proof builds on the result [32, lemma 5.4] and techniques from [7, 85]. Choose a sequence of open sets {Ωn } such that Ω = ⋃n∈N Ωn , ${\overline{{\Omega}}}_{n}\subset \subset {\Omega}$ for all n ∈ N and any point of Ω belongs to at most four sets Ωn (cf [7, theorem 3.9] for a construction of such sets). Further, let {φn } be a partition of unity relative to {Ωn }, i.e., ${\varphi }^{n}\in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({{\Omega}}_{n}\right)$ with φn ⩾ 0 for all n ∈ N and ${\sum }_{n=1}^{\infty }{\varphi }^{n}=1$ pointwise in Ω. Finally, let $\rho \in {\mathcal{C}}_{\mathrm{c}}^{\infty }\left({\mathbf{R}}^{d}\right)$ be a standard mollifier, i.e., ρ is radially symmetric, non-negative and satisfies ${\int }_{{\mathbf{R}}^{d}}\rho \enspace \mathrm{d}x=1$. Denote by ρɛ the function given by ρɛ (x) = ɛ−d ρ(x/ɛ) for ɛ > 0.

As ρ is a mollifier and φn has compact support in Ωn , we can find, for any n ∈ N, an ɛn > 0 such that $\mathrm{supp}\bigl((v{\varphi }^{n}){\ast}{\rho }_{{\varepsilon }_{n}}\bigr)\subset {{\Omega}}_{n}$ for any v ∈ BD(Ω, Syml (Rd )), l ∈ N. Further, as shown in [32, lemma 5.4], for any v ∈ BD(Ω, Syml (Rd )) fixed, for any δ > 0 we can pick a sequence $\left\{{\varepsilon }_{n}^{\delta }\right\}$ with each ${\varepsilon }_{n}^{\delta }$ being small enough such that with ${v}^{\delta }={\sum }_{n=1}^{\infty }\left(v{\varphi }^{n}\right){\ast}{\rho }_{{\varepsilon }_{n}^{\delta }}$, we have

In particular, for u ∈ BVk (Ω) fixed and vl = ∇l u ∈ BDk−l (Ω, Syml (Rd )) for l = 0, ..., k − 1, we can pick a sequence $\left\{{\varepsilon }_{n}^{\delta }\right\}$ with each component small enough such that

Equation (A.3)

since $\nabla \otimes {v}_{l}=\mathcal{E}{v}_{l}$. Further, we note that, as an additional consequence of the Sobolev–Korn inequality of theorem 3.18, vl ∈ Hk−1−l,1(Ω, Syml (Rd )) and hence, by the product rule $\mathcal{E}\left({v}_{l-1}{\varphi }^{n}\right)=\left(\mathcal{E}{v}_{l-1}\right){\varphi }^{n}+\vert \vert \vert \left({v}_{l-1}\otimes \nabla {\varphi }^{n}\right)$, we get

In addition,

Since each | | |(vl−1 ⊗ ∇φn ) ∈ Hk−l,1(Ω, Syml (Rd )), by an adaptation of standard mollification results [85, theorem 5.2.2] we can further reduce any ${\varepsilon }_{n}^{\delta }$ to be small enough such that for each m = 1, ..., k,

and consequently,

Now, setting ${u}^{\delta }={v}_{0}^{\delta }$, we estimate for m = 1, ..., k, using the second estimate in (A.3) and that $\mathcal{E}{v}_{m-1}={\nabla }^{m}u$ as well as ${\mathcal{E}}^{m}{u}^{\delta }={\nabla }^{m}{u}^{\delta }$,

This shows in particular that uδ ∈ BVk (Ω) and by construction, ${u}^{\delta }\in {\mathcal{C}}^{\infty }\left({\Omega}\right)$. Taking the limit δ → 0 and using the lower semi-continuity of TVm , we finally obtain

$\lim_{\delta\to 0} \mathrm{TV}^{m}(u^{\delta}) = \mathrm{TV}^{m}(u)$
for m = 1, ..., k which, together with the first estimate in (A.3), implies the assertion. □
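The statement of lemma A.4 can also be illustrated numerically. The following one-dimensional Python sketch is our own simplification (a single global mollifier applied to a step function, rather than the partition-of-unity construction of the proof; grid, mollifier and parameters are illustrative choices). It shows the typical behaviour of strict convergence: the L1 distance to the step function vanishes as the mollification width shrinks, while the discrete total variation of the smoothed function stays equal to TV(u) = 1, even though the smoothed functions do not converge to u in BV-norm, since the jump is never matched exactly.

import numpy as np

def mollify(u, eps, dx):
    # Convolve u with a compactly supported, normalised bump of width eps.
    r = int(np.ceil(eps / dx))
    t = np.linspace(-1.0, 1.0, 2 * r + 1)
    rho = np.exp(-1.0 / (1.0 - t ** 2 + 1e-15)) * (np.abs(t) < 1)
    rho /= rho.sum()
    return np.convolve(np.pad(u, r, mode='edge'), rho, mode='valid')

dx = 1e-3
x = np.arange(-1.0, 1.0, dx)
u = (x > 0).astype(float)                  # step function, TV(u) = 1

for eps in [0.2, 0.05, 0.01]:
    u_eps = mollify(u, eps, dx)
    l1_err = np.sum(np.abs(u_eps - u)) * dx
    tv = np.sum(np.abs(np.diff(u_eps)))    # discrete total variation
    print(f"eps={eps:5.2f}:  ||u_eps - u||_1 = {l1_err:.4f},  TV(u_eps) = {tv:.4f}")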
